JP Patent App. 2026-046625 (Subset B)

PSDP — Within-language Parallelism

Phase-Synchronous Deterministic Parallelism. Parallelize a sequential program in the same language, without changing a single bit of its output. A commutator-norm phase-sync mechanism gives an algebraic guarantee of determinism under parallel execution.

Headline numbers

3.38× compute-heavy: NENC kernel sweep (Java 8), near-linear scaling with core count
2.17× graph-class kernels: GRA sweep (Java 8), bandwidth-bound, plateaus accordingly
1.02× database / TPC: I/O bound, but bit-exactness is preserved
3072 / 3072 AVX-512 blocks: 32-lane SIMD, bit-exact PASS, ±5% vs AVX2 (memory-bound)
11 safety mechanisms: claims 29–32, exposed as 5 profiles in SlimeSyCUDA (GAME_MIN to SAFETY_MAX)
12+ languages supported: each Subset A target language ships with a paired PSDP variant

Mechanism — commutator-norm phase synchronization

A sufficient condition for parallel execution to produce the same result as sequential execution is derived from the commutator norm ‖[A, B]‖ of the relation operators involved. At runtime, we enforce phase synchronization to keep the commutator below a threshold, and the Böttcher–Wenzel inequality then yields a closed-form upper bound on numerical drift. This is what eliminates the familiar phenomenon of "numbers change when the code goes parallel."
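As a rough illustration of the quantities involved, the sketch below computes the commutator [A, B] = AB − BA for two small matrices, takes its Frobenius norm, and compares it with the Böttcher–Wenzel bound ‖[A, B]‖_F ≤ √2 ‖A‖_F ‖B‖_F. It assumes the operators can be represented as square real matrices and that the Frobenius norm is the one intended. The class name, method names, and the threshold check are hypothetical and are not taken from the PSDP implementation.

```java
// Illustrative sketch: CommutatorBound and its methods are hypothetical names, not PSDP API.
final class CommutatorBound {

    /** Plain n-by-n matrix product. */
    static double[][] multiply(double[][] x, double[][] y) {
        int n = x.length;
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += x[i][k] * y[k][j];
                }
                out[i][j] = sum;
            }
        }
        return out;
    }

    /** Frobenius norm: square root of the sum of squared entries. */
    static double frobeniusNorm(double[][] m) {
        double sumOfSquares = 0.0;
        for (double[] row : m) {
            for (double v : row) {
                sumOfSquares += v * v;
            }
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Frobenius norm of the commutator [A, B] = AB - BA. */
    static double commutatorNorm(double[][] a, double[][] b) {
        double[][] ab = multiply(a, b);
        double[][] ba = multiply(b, a);
        int n = a.length;
        double[][] diff = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                diff[i][j] = ab[i][j] - ba[i][j];
            }
        }
        return frobeniusNorm(diff);
    }

    /** Right-hand side of the Boettcher-Wenzel inequality: sqrt(2) * ||A||_F * ||B||_F. */
    static double boettcherWenzelBound(double[][] a, double[][] b) {
        return Math.sqrt(2.0) * frobeniusNorm(a) * frobeniusNorm(b);
    }

    /** Hypothetical runtime gate: true when the commutator norm is at most epsilon. */
    static boolean withinThreshold(double[][] a, double[][] b, double epsilon) {
        return commutatorNorm(a, b) <= epsilon;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 2.0}, {0.0, 1.0}};
        double[][] b = {{1.0, 0.0}, {3.0, 1.0}};
        System.out.println("commutator norm        = " + commutatorNorm(a, b));
        System.out.println("Boettcher-Wenzel bound = " + boettcherWenzelBound(a, b));
        System.out.println("within epsilon 1e-9?   = " + withinThreshold(a, b, 1e-9));
    }
}
```

Two operators that commute exactly (for example, two diagonal matrices) give a commutator norm of zero, so they pass any non-negative threshold. How PSDP maps program operations onto such operators is not shown here.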

// PSDP example (excerpt from OrderBatchProcessor_PSDP.java)
// The bench verifies that the SHA-256 of the input/output exactly
// matches the sequential original (OrderBatchProcessor_ORIGINAL.java).
// Order, Result, settle and PSDP are defined elsewhere in the sample.
import java.util.List;

public class OrderBatchProcessor {
    public Result process(List<Order> orders) {
        return orders.parallelStream()             // only this is parallel
            .map(this::settle)                     // per-order logic is unchanged
            .collect(PSDP.phaseSyncReduce(...));   // phase-sync reduce
    }
}

The parallelization touches only a handful of API calls. Logic is not rewritten. A regression bench checks that sequential and parallel forms produce identical SHA-256 outputs.
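A regression check of this kind might look like the following minimal sketch. It is not the project's bench: the settle function, the input data, and the class name are hypothetical stand-ins, and the parallel path uses the standard Collectors.toList() rather than PSDP.phaseSyncReduce, whose signature is not given here. The sketch only shows the comparison step: serialize both outputs the same way, hash them with SHA-256, and require the digests to match.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: names and data are hypothetical, not the project's bench.
final class BitExactRegressionCheck {

    /** Hypothetical per-order step: pure integer arithmetic, no shared state. */
    static long settle(long amountInYen) {
        return amountInYen * 108 / 100;
    }

    static List<Long> runSequential(List<Long> orders) {
        return orders.stream()
                .map(BitExactRegressionCheck::settle)
                .collect(Collectors.toList());
    }

    static List<Long> runParallel(List<Long> orders) {
        // For an ordered source, Collectors.toList() keeps encounter order
        // even when the stream is parallel.
        return orders.parallelStream()
                .map(BitExactRegressionCheck::settle)
                .collect(Collectors.toList());
    }

    /** SHA-256 over a canonical text form: one decimal value per line. */
    static String sha256Hex(List<Long> values) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        for (long v : values) {
            digest.update(Long.toString(v).getBytes(StandardCharsets.UTF_8));
            digest.update((byte) '\n');
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        List<Long> orders = new ArrayList<>();
        for (long i = 1; i <= 10_000; i++) {
            orders.add(i * 37);
        }
        String sequential = sha256Hex(runSequential(orders));
        String parallel = sha256Hex(runParallel(orders));
        System.out.println(sequential.equals(parallel) ? "bit-exact PASS" : "MISMATCH");
    }
}
```

Hashing a canonical serialization rather than comparing objects in memory makes the check independent of how either side builds its result, and the digest can be stored alongside a release for later comparison.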

Three layers

PSDP applies the same principle at three different layers. Identity of result is established via "phase sync + commutator-norm threshold" at every layer.

Layer 1 (Algorithm): Algorithmic parallelization. Sequential batch → parallel batch. Java parallelStream and equivalents are introduced "without changing the output."
Layer 2 (QP): Quantization-parameter parallelism. Parallel QP control for video/signal pipelines, used in SlimeCodec and SlimeSyCUDA.
Layer 3 (SIMD): Instruction-level parallelism. AVX-512 32-lane svt_av1_quantize_fp variant: bit-exact PASS on 3072 blocks, ±5% vs AVX2 (memory-bound).

Eleven safety mechanisms (claims 29–32)

We classify the situations where parallel execution might leave deterministic territory into 11 categories, and provide a safety mechanism for each. SlimeSyCUDA (the GPU extension) exposes these as five staged profiles:

GAME_MIN: Lowest overhead, intended for games. Safety mechanisms minimized; frame time is the priority.
BALANCED: Default profile. General-purpose parallelization.
STRICT: Strict profile for finance / scientific workloads. Bit-exact verification on every loop.
AUDIT: Full audit logging. Connects to the Subset A audit chain for round-trip proof.
SAFETY_MAX: All 11 mechanisms enabled. Intended for mission-critical applications such as aviation and medical systems.

Languages with PSDP support

In addition to the five Subset A targets, the research implementation extends to a wider set of languages (23 converters under track_c/converter_*). Because both "sequential" and "PSDP-parallel" forms are emitted from the same Slot IR, migration (Subset A) and parallelization (Subset B) can be combined in a single tooling pipeline.

Java 8 / 17 / 21, Rust, C / C++, C#, Kotlin, Kotlin coroutines, Go, Scala, Clojure, Erlang, Python, Node.js, PHP, Swift, FORTRAN, Common Lisp

Benchmarks (Java 8, core sweep)

Category | Kernel                        | Speed-up / pass count | Notes
Compute  | NENC (numerical equivalence)  | 3.38×                 | nearly linear in core count, CPU-bound
Graph    | GRA (graph kernels)           | 2.17×                 | memory-bandwidth bound, bit-exact preserved
Database | TPC (transactions)            | 1.02×                 | I/O bound, but result invariance is guaranteed
SIMD     | svt_av1_quantize_fp (AVX-512) | 3072 / 3072 blocks    | bit-exact PASS, ±5% vs AVX2 (memory-bound)

Note: we do not accept the customary trade-off of "result drifts in exchange for speed-up." Bit-exactness is an absolute constraint. The 1.02× ceiling on the database benchmark is an I/O-bound physical limit, not a cost imposed by the safety mechanisms.

Audit suitability

  • Bit-exact: Sequential and parallel outputs match by SHA-256. The standard failure mode of "the numbers shift slightly when we go parallel" is eliminated.
  • Phase-sync guarantee: ‖[A, B]‖ ≤ ε is enforced at runtime. The Böttcher–Wenzel inequality then yields an algebraic upper bound on numerical drift.
  • 11 safety mechanisms: Covered by claims 29–32, exposed as 5 staged profiles (GAME_MIN through SAFETY_MAX) for application-specific tuning.
  • Audit-chain coupling: Connects to the Subset A audit chain (claim 9), giving a single pipeline that produces a bidirectional proof for both transformation and parallelization.
  • Regression resilience: Same input + same version → bit-identical SHA-256. Output does not drift across parallel or GPU execution.
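For context on why parallel results usually drift: floating-point addition is not associative, so a reduction whose grouping depends on thread scheduling can change the low-order bits. The sketch below first shows that effect on three doubles, then shows one generic way to avoid it, a reduction with a fixed chunk layout and a fixed combine order. This is a standard technique used here for illustration only. It is not the PSDP phase-sync mechanism, and the class name and chunk size are hypothetical.

```java
import java.util.stream.IntStream;

// Illustrative sketch of a generic fixed-order reduction; not the PSDP mechanism.
final class FixedOrderReduction {

    static final int CHUNK = 1024; // hypothetical chunk size, fixed regardless of core count

    /** Sums one chunk left to right; the grouping inside a chunk never changes. */
    static double sumChunk(double[] data, int chunkIndex) {
        int from = chunkIndex * CHUNK;
        int to = Math.min(from + CHUNK, data.length);
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += data[i];
        }
        return sum;
    }

    /** Computes per-chunk partial sums (optionally in parallel), then combines them in index order. */
    static double reduce(double[] data, boolean parallel) {
        int chunks = (data.length + CHUNK - 1) / CHUNK;
        double[] partial = new double[chunks];
        IntStream range = IntStream.range(0, chunks);
        if (parallel) {
            range = range.parallel();
        }
        // Each task writes only its own slot, so scheduling cannot affect the values.
        range.forEach(c -> partial[c] = sumChunk(data, c));
        double total = 0.0;
        for (double p : partial) {
            total += p; // fixed left-to-right combine order
        }
        return total;
    }

    public static void main(String[] args) {
        double left = (0.1 + 0.2) + 0.3;
        double right = 0.1 + (0.2 + 0.3);
        System.out.println("regrouping changes the sum: " + (left != right));

        double[] data = new double[1_000_003];
        for (int i = 0; i < data.length; i++) {
            data[i] = 1.0 / (i + 1);
        }
        long sequentialBits = Double.doubleToLongBits(reduce(data, false));
        long parallelBits = Double.doubleToLongBits(reduce(data, true));
        System.out.println("bit-identical: " + (sequentialBits == parallelBits));
    }
}
```

Comparing the raw bits with Double.doubleToLongBits, rather than comparing within a tolerance, is what makes this a bit-exact check.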

Typical use cases

Financial batches: Parallelize nightly batch jobs to shorten the window, while guaranteeing "not a single yen drift." The SAFETY_MAX profile is intended for audit settings.
Scientific computing: FFT, Conv2D, LU decomposition and similar numerical kernels are parallelized bit-exactly. The familiar problem "we can't compare the results because parallel changed them" simply does not arise.
Video coding: Apply bit-exact parallelization to SlimeCodec QP control and the SVT-AV1 AVX-512 quantizer. Encoder parallel performance and reproducibility coexist.
Games / real-time: SlimeSyCUDA (GPU variant) exposes the GAME_MIN profile to minimize safety overhead and prioritize frame time.

Relationship with Subset A: migrate legacy code with Subset A (cross-language transpilation), then apply PSDP (Subset B) to the migrated code in the same pipeline. Migration and parallelization happen in a single tool, providing a single path from legacy code to "audit-grade parallel modern systems."

Technical specifications

Patent: JP Patent App. 2026-046625 (Subset B = PSDP) / JP Patent App. 2026-046620 (Subset A coupling)
Claims: Phase synchronization (Layers 1–3) / 11 safety mechanisms (claims 29–32) / Audit chain coupling (claim 9)
Paper: PSDP Paper JP v5d (910 KB PDF, 2026-03-04)
Implementations: Java 8 PoC / AVX-512 MVP-A (svt_av1_quantize_fp 32-lane) / Rust + WASM demo (index_psdp.html) / 12+ language converters bundled
Standard tests: NENC / GRA / TPC kernel sweeps (Java 8) / 3072 AVX-512 blocks bit-exact
License model: Combined licensing with Subset A. Converter is licensed; converted output is unlicensed. Ed25519 3-hop activation.

Related documentation

  • PSDP paper: PSDP Paper JP v5d (910 KB PDF, primary technical paper)
  • Specification: 指示書_PSDP_v2並列開発.md (v2 parallel development specification)
  • Reference samples: OrderBatchProcessor_ORIGINAL.java vs OrderBatchProcessor_PSDP.java (bit-exact contrast of sequential / parallel forms)
  • Implementation log: PSDP_IMPLEMENTATION_LOG.md (Phase A/B Rust + WASM implementation record)
  • Patent specifications: JP Patent App. 2026-046625 (Subset B) / 2026-046620 (Subset A coupling)
