Hi @programmerjake,
Thanks for weighing in. You are 100% correct: if a single execution thread generates 2^40 distinct canonical contents, no deduplication strategy on Earth can save you, and the heap chain will eventually exhaust its allocation pools.
However, the way Yon handles this under the hood is tied to its core philosophy: I prefer a hard, predictable specification over a soft, non-deterministic runtime degradation (like a GC pause).
Here is how Yon currently deals with the scenario you described, and how I am extending it to support ephemeral computation without losing the categorical properties:
1. The Space Isolation Model (MMU-enforced)
In Yon, a program doesn’t live in a single global heap. It splits into isolated “Spaces”: separate OS processes whose isolation is enforced by the hardware MMU, similar to the Postgres multi-process model. Each Space has a fixed capacity: a chain of up to 256 heaps of 196,560 slots each, about 50.3M distinct contents per process (256 × 196,560 = 50,319,360). The slot count 196,560 is the kissing number of the Leech lattice, so each heap in the chain is one shell.
If a specific Space runs a rogue loop like your use_all_memory() generating 2^40 distinct coordinates on Λ₂₄, it doesn’t destabilize the system or trigger a 10-second GC stop-the-world pause. It hits its own process boundary: slot exhaustion returns a structured failure, and raw payload memory is bounded by RAM, enforced by the OS at process granularity. The damage stays confined to that worker, and its memory is reclaimed wholesale via the standard MMU page tables; an orchestration API to catch the failure and restart the subsystem is part of the roadmap below.
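To make the boundary concrete, here is a minimal Python sketch of a bounded, content-addressed chain that returns a structured failure instead of growing. It is not Yon's implementation: `HeapChain`, `Exhausted`, and the SHA-256 key are hypothetical stand-ins for illustration. Only the two constants come from the numbers above.

```python
import hashlib
from dataclasses import dataclass

SLOTS_PER_HEAP = 196_560  # from the post: one Leech-lattice shell per heap
MAX_HEAPS = 256           # from the post: maximum chain length
CAPACITY = SLOTS_PER_HEAP * MAX_HEAPS  # 50_319_360 distinct contents


@dataclass(frozen=True)
class Exhausted:
    """Structured failure returned when every slot in the chain is taken."""
    capacity: int


class HeapChain:
    """Hypothetical bounded store: one slot per distinct content."""

    def __init__(self, slots_per_heap=SLOTS_PER_HEAP, max_heaps=MAX_HEAPS):
        self.slots_per_heap = slots_per_heap
        self.max_heaps = max_heaps
        self.index = {}  # content digest -> (heap number, slot number)

    @property
    def capacity(self):
        return self.slots_per_heap * self.max_heaps

    def put(self, content: bytes):
        # SHA-256 stands in for the canonical coordinate (an assumption).
        key = hashlib.sha256(content).digest()
        handle = self.index.get(key)
        if handle is not None:
            return handle  # already present: deduplicated, no new slot
        used = len(self.index)
        if used >= self.capacity:
            return Exhausted(capacity=self.capacity)  # no exception, no GC
        handle = divmod(used, self.slots_per_heap)
        self.index[key] = handle
        return handle
```

Under these assumptions, 2^40 distinct contents exceed the chain capacity by a factor of more than 20,000, so the rogue loop reaches `Exhausted` long before it finishes, while contents that are already stored keep resolving to their existing handles.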
2. Real-World Target Workloads & Benchmarks
The content-addressed heap on Λ₂₄/Co₀ is optimized for state-exploration, graph algorithms, and data pipelines where structural sharing is massive.
I just benchmarked an immutable state-tracking scenario (the “Magazzino” test) that checks and inserts 100,000 deep states in a generative loop. Thanks to global deduplication and the O(1) handle lookup, execution runs flat at 185 ns/state (~5.4 million states/sec on a single core), and memory stays stable because revisited or symmetric states collapse into the same machine handles. On complex data like Merkle trees (3 trees, 16,384 leaves each, ~98,000 nodes total), building identical trees a second time results in zero new allocations.
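The "zero new allocations" effect can be reproduced with a small hash-consing sketch in Python. This is an illustration under assumptions, not the Yon heap: `Store` and `merkle_root` are hypothetical names, SHA-256 plays the role of the canonical coordinate, and the leaf count must be a power of two.

```python
import hashlib


class Store:
    """Hypothetical content-addressed store: equal content, equal handle."""

    def __init__(self):
        self.handles = {}  # content digest -> integer handle

    def put(self, content: bytes) -> int:
        key = hashlib.sha256(content).digest()
        if key not in self.handles:
            self.handles[key] = len(self.handles)  # allocate a new slot
        return self.handles[key]

    def __len__(self):
        return len(self.handles)


def merkle_root(store: Store, leaves) -> int:
    """Build a binary Merkle tree bottom-up; returns the root handle.

    Assumes len(leaves) is a power of two. Internal nodes are keyed by
    their children's handles, so identical subtrees share one slot.
    """
    level = [store.put(b"leaf:" + leaf) for leaf in leaves]
    while len(level) > 1:
        level = [
            store.put(b"node:%d:%d" % (level[i], level[i + 1]))
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

With 16,384 distinct leaves the first build allocates 32,767 slots (16,384 leaves plus 16,383 internal nodes), which is consistent with the ~98,000 nodes quoted for three trees. A second build over the same leaves returns the same root handle and leaves `len(store)` unchanged.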
3. Immediate Roadmap: Ephemeral Sub-Runtimes via spawn / promote
Your stress-test highlights the exact need for sandboxed, unpredictable pipelines. Adding a naive free() or temporary keyword at the function level would violate the language’s mathematical contracts and extensionality (since multiple parts of a program may share identical structures via their handles).
Instead, I am treating sub-runtimes as isolated blocks with a single exit point. I am currently implementing a spawn { ... } block that hooks directly into the frontend reducer and MLIR lowering pipeline:
- Isolation: a spawn block instantiates an ephemeral under-Space with an un-indexed linear allocation arena.
- The promote operation: when the calculation finishes, only the final return value is intercepted. Its canonical coordinate is calculated, and it is promoted up to the parent Space’s heap chain via a global put_chain verification.
- O(1) drop: immediately after promotion, the entire memory of the ephemeral under-Space is released in a single munmap call (the OS unmaps the pages through the MMU), with no element-by-element tracing.
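The three steps above can be approximated in a few lines of Python. This is only a sketch of the idea, not the MLIR lowering: `ParentSpace`, `spawn`, and `heavy_search` are hypothetical, the arena is an anonymous `mmap` region used as a bump allocator, and SHA-256 stands in for the canonical coordinate.

```python
import hashlib
import mmap


class ParentSpace:
    """Hypothetical parent heap: deduplicates promoted values by content."""

    def __init__(self):
        self.handles = {}  # content digest -> handle
        self.values = []   # handle -> payload

    def promote(self, payload: bytes) -> int:
        key = hashlib.sha256(payload).digest()
        handle = self.handles.get(key)
        if handle is None:
            handle = len(self.values)
            self.values.append(payload)
            self.handles[key] = handle
        return handle


def spawn(parent: ParentSpace, compute, arena_bytes: int = 1 << 20) -> int:
    """Run compute in a throwaway arena and promote only its result."""
    arena = mmap.mmap(-1, arena_bytes)  # anonymous mapping, no index
    try:
        result = compute(arena)  # must return bytes, not a view of the arena
    finally:
        arena.close()            # whole arena unmapped at once, no tracing
    return parent.promote(result)


def heavy_search(arena) -> bytes:
    """Toy workload: writes intermediates linearly, returns one value."""
    offset, best = 0, 0
    for i in range(1000):
        arena[offset:offset + 8] = i.to_bytes(8, "little")  # bump allocation
        offset += 8
        best = max(best, (i * 7919) % 1009)
    return best.to_bytes(8, "little")
```

Calling `spawn(parent, heavy_search)` twice yields the same parent handle, because the promoted value is deduplicated like any other content, and none of the intermediates ever touch the parent's index.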
A sketch of the upcoming syntax (illustrative, may still change). Your loop, confined and streamed:
    fun analyze(seed: number): number {
        be result holds spawn {
            heavy_search(seed) // 2^40 intermediates live and die in here
        } follow.map(score).collect()
        return result // only the promoted value survives
    }
And explicit Space lifecycle management:
    be worker holds Space.spawn(Heavy)
    be partial holds worker.call(batch)
    be _ holds Space.drop(worker) // process exits, OS reclaims its whole heap
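For readers who want to see the process-level lifecycle outside Yon, here is a rough Python analogue using a child process. It is an assumption-laden sketch: `WorkerSpace`, its `call`/`drop` methods, and the line-based protocol are all hypothetical and do not reflect the planned orchestration API.

```python
import subprocess
import sys

# Hypothetical worker body: reads one integer per line, replies with a sum.
WORKER_SOURCE = r"""
import sys
for line in sys.stdin:
    n = int(line)
    print(sum(i * i for i in range(n)), flush=True)
"""


class WorkerSpace:
    """A worker confined to its own OS process (own address space)."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def call(self, n: int) -> int:
        # Only the request and the final result cross the process boundary.
        self.proc.stdin.write(f"{n}\n")
        self.proc.stdin.flush()
        return int(self.proc.stdout.readline())

    def drop(self) -> int:
        # Closing stdin ends the worker loop; the OS then reclaims all of
        # the child's memory at once when the process exits.
        self.proc.stdin.close()
        code = self.proc.wait()
        self.proc.stdout.close()
        return code
```

Whatever the worker allocates while serving `call` stays inside the child; `drop` returns its exit code, which is where a supervisor could detect a failed worker and decide to restart it.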
The core primitive for spawn/promote is simple enough in the MLIR infrastructure that I expect to land it in the repo very soon (possibly by tonight or tomorrow).
Thanks for pushing the boundaries of the language. This kind of architectural stress-testing is precisely why I open-sourced Yon.