Decision support at point of care.
- Symptom-to-differential in < 10 ms
- Drug-interaction screening, on-prem
- HIPAA-compliant by construction
We rebuilt retrieval the way the brain does it. Knowledge that fires in milliseconds, runs on the energy of a candle, and never makes things up — because nothing is generated, only recalled.
We didn't build a faster RAG. We replaced retrieval.
Vector search assembles answers from fragments: embed, retrieve, rerank, prompt, generate, hope. We taught a small spiking network to recognize a question and converge directly on the answer. No tokens. No fabrication. No round trip.
The Case for Biological Retrieval
Same problem, two substrates. One is an assembly line; the other is a single instant of recognition. Read across.
The conventional pipeline: six round trips, two model calls, one autoregressive sampler that can, and does, invent. Every step is a place for latency, cost, and a fabricated answer to accumulate.
The spiking pipeline: four stages, no autoregressive decoder, no second model. The query perturbs a recurrent network of leaky neurons; it falls into the attractor closest to the right answer, in milliseconds, on a CPU.
Reference implementation, single CPU baseline, 300-pair medical Q&A benchmark. CUDA adds another 10–100× headroom on top.
Layer by layer, here is what is actually happening inside each block of the network, and why it costs almost nothing.
A small pre-trained sentence transformer (all-MiniLM-L6-v2) compresses the query into a dense vector. This is the only conventional step in the pipeline — and the only step whose cost is amortized away by what follows.
Embedding magnitudes drive firing rate; salience modulates spike latency; populations carry distributed meaning. The continuous becomes discrete, and time becomes a free axis of computation.
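As a rough illustration of the two encodings named above, the sketch below turns a dense vector into spike trains with rate coding and turns salience scores into first-spike times with latency coding. It is a minimal sketch, not the product's encoder: the function names, `max_rate`, `n_steps`, and the assumption that salience lies in [0, 1] are all hypothetical, and population coding is not shown.

```python
import random

def rate_encode(embedding, n_steps=50, max_rate=0.5, seed=0):
    """Rate coding (illustrative): larger |value| gives a higher per-step spike probability."""
    rng = random.Random(seed)
    peak = max(abs(v) for v in embedding) or 1.0  # avoid dividing by zero
    trains = []
    for v in embedding:
        p = max_rate * abs(v) / peak  # spike probability per timestep
        trains.append([1 if rng.random() < p else 0 for _ in range(n_steps)])
    return trains

def latency_encode(salience, n_steps=50):
    """Latency coding (illustrative): more salient inputs fire earlier.

    Assumes salience values in [0, 1]; values outside are clamped.
    """
    times = []
    for s in salience:
        s = min(max(s, 0.0), 1.0)
        times.append(int((1.0 - s) * (n_steps - 1)))
    return times

# Hypothetical three-dimensional "embedding" for demonstration only.
trains = rate_encode([0.0, 0.5, -1.0])
first_spikes = latency_encode([1.0, 0.0])
```

A zero-valued dimension never spikes, while the largest-magnitude dimension spikes on roughly half of the timesteps; the most salient input fires at step 0 and the least salient at the last step.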
500 leaky integrate-and-fire neurons, 90% sparse recurrent connectivity. The query perturbs the network; dynamics pull it toward the closest stored attractor. No backprop through time, no catastrophic forgetting.
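To make the reservoir description concrete, here is a minimal pure-Python sketch of 500 leaky integrate-and-fire neurons with 90% of possible recurrent connections absent. Only the neuron count and sparsity come from the text; the leak factor, threshold, weight scale, reset rule, and function names are assumptions, and the weights are random rather than trained, so this shows the dynamics but not how attractors are stored.

```python
import random

def build_reservoir(n=500, sparsity=0.9, weight_scale=0.05, seed=0):
    """Return outgoing[j] = list of (target, weight) for each existing synapse j -> target."""
    rng = random.Random(seed)
    outgoing = []
    for j in range(n):
        targets = []
        for i in range(n):
            # Keep roughly (1 - sparsity) of the possible connections, no self-loops.
            if i != j and rng.random() > sparsity:
                targets.append((i, rng.gauss(0.0, weight_scale)))
        outgoing.append(targets)
    return outgoing

def run_lif(outgoing, input_current, n_steps=50, leak=0.9, threshold=1.0):
    """Simulate discrete-time LIF neurons and return per-neuron spike counts."""
    n = len(outgoing)
    v = [0.0] * n            # membrane potentials
    spiked = [False] * n     # spikes emitted on the previous step
    counts = [0] * n
    for _ in range(n_steps):
        # Gather recurrent input from neurons that spiked on the previous step.
        recurrent = [0.0] * n
        for j in range(n):
            if spiked[j]:
                for i, w in outgoing[j]:
                    recurrent[i] += w
        for i in range(n):
            v[i] = leak * v[i] + input_current[i] + recurrent[i]
            spiked[i] = v[i] >= threshold
            if spiked[i]:
                counts[i] += 1
                v[i] = 0.0   # reset after a spike (assumed reset rule)
    return counts

reservoir = build_reservoir()
# Hypothetical query: constant drive into the first 50 neurons only.
drive = [0.3 if i < 50 else 0.0 for i in range(500)]
counts = run_lif(reservoir, drive)
```

With no input at all the network stays silent, which is the property the later sections rely on: activity has to be driven and sustained before anything reaches the output.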
Lateral inhibition between output groups turns parallel evidence into a single, calibrated answer. Weak hypotheses are actively suppressed — confidence is a measured property of the dynamics, not a softmax estimate.
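The competition described above can be sketched with a small rate-based model in which each output group integrates its own evidence and is pushed down by the activity of every other group. This is an assumed simplification of the spiking version: the parameters, the function name, and the definition of confidence as the winner's share of surviving activity are illustrative choices, not the product's calibration method.

```python
def lateral_inhibition(evidence, inhibition=0.1, leak=0.8, n_steps=200):
    """Iterate rectified leaky dynamics with mutual inhibition between groups.

    Returns (winner index, confidence, final activities).
    """
    act = [0.0] * len(evidence)
    for _ in range(n_steps):
        total = sum(act)
        act = [
            # Own leaky trace plus evidence, minus inhibition from all other groups.
            max(0.0, leak * a + e - inhibition * (total - a))
            for a, e in zip(act, evidence)
        ]
    total = sum(act)
    winner = max(range(len(act)), key=lambda k: act[k])
    confidence = act[winner] / total if total > 0 else 0.0
    return winner, confidence, act

# Three hypothetical output groups with strong, medium, and weak evidence.
winner, confidence, activity = lateral_inhibition([1.0, 0.8, 0.3])
# winner == 0; the weakest group is driven to exactly 0.0,
# and confidence settles near 2/3 (activities approach [4, 2, 0]).
```

The weakest hypothesis is not merely outvoted; its activity is clipped to zero by the other two, so it contributes nothing to the readout.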
The output is an index into the learned manifold plus a calibrated confidence. Pass it to your existing answer formatter, summarizer, or template — the heavy lifting is already done, and it cost you 5 milliseconds.
For each question, spikes travel from the query source, sweep across the reservoir, and converge on the matching attractor. The answer is then read out, locked.
We don't sample. We don't generate tokens. The network is a landscape of attractor basins — uncertain answers don't accumulate enough membrane potential to cross threshold, so they never reach the output stage at all.
LIF neurons exponentially decay membrane potential between spikes. If incoming evidence isn't consistent across timesteps, the trace dies before threshold.
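A single leaky neuron is enough to show the effect. In the sketch below, the same total input either arrives on consecutive timesteps or is scattered with gaps; the leak factor of 0.8, the threshold of 1.0, and the input values are assumptions chosen for illustration.

```python
def first_spike_step(inputs, leak=0.8, threshold=1.0):
    """Return the first timestep at which the membrane potential crosses threshold, else None."""
    v = 0.0
    for t, x in enumerate(inputs):
        v = leak * v + x  # exponential decay of the old trace, plus new evidence
        if v >= threshold:
            return t
    return None

# Same total evidence (4 pulses of 0.35), different temporal consistency.
burst = [0.35] * 4 + [0.0] * 12
scattered = [0.35 if t % 4 == 0 else 0.0 for t in range(16)]

print(first_spike_step(burst))      # -> 3
print(first_spike_step(scattered))  # -> None
```

In the scattered case the trace decays to about 41% of its value between pulses, so the potential can never climb past roughly 0.59 and the neuron stays silent.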
Trained domains form stable points in state space. Off-distribution inputs land in shallow basins that decay rather than converge — a structural form of "I don't know".
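The idea of stable points plus structural abstention can be illustrated with a classic Hopfield network, which is a swapped-in stand-in here and not the spiking model described on this page. Two patterns are stored with a Hebbian rule; a noisy copy of one settles onto it, while an input unrelated to both never reaches a stored state and is rejected. The pattern values, `min_overlap` threshold, and function names are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian weights for +/-1 patterns, with no self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, patterns, query, n_steps=10, min_overlap=0.9):
    """Run synchronous updates, then return the matched pattern index or None."""
    state = list(query)
    n = len(state)
    for _ in range(n_steps):
        state = [
            1 if sum(w[i][j] * state[j] for j in range(n)) >= 0 else -1
            for i in range(n)
        ]
    # Overlap of 1.0 means the state sits exactly on a stored pattern.
    overlaps = [sum(a * b for a, b in zip(p, state)) / n for p in patterns]
    best = max(range(len(patterns)), key=lambda k: overlaps[k])
    return best if overlaps[best] >= min_overlap else None

stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hopfield(stored)

near_first = [-1, 1, 1, 1, -1, -1, -1, -1]   # first pattern with one unit flipped
unrelated = [1, 1, -1, -1, 1, 1, -1, -1]     # orthogonal to both stored patterns

print(recall(weights, stored, near_first))  # -> 0
print(recall(weights, stored, unrelated))   # -> None
```

The abstention here comes from a readout threshold on overlap rather than from membrane decay, so it is an analogy for the "I don't know" behavior and not a reproduction of it.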
Output neurons compete laterally. The strongest, most temporally coherent population suppresses the rest — confidence is a measured property, not a softmax estimate.
There is no autoregressive decoder to fabricate plausible nonsense. The output is a pointer into a learned manifold — with a real, calibrated confidence score attached.
Anywhere a millisecond matters, a watt-hour is rationed, or a packet can't leave the perimeter — a brain-inspired retrieval layer is the more honest fit.
Annualized savings against a comparable cloud LLM + vector DB pipeline. Conservative assumptions, every line item visible.
At a million queries per day, your cloud inference line item shrinks to a rounding error — and your edge devices stop needing one at all.
Neuromorphic hardware (Loihi 2, BrainChip Akida) is finally shipping. The pipeline is built to compile down to it without rewriting your application.
We're bringing that solution to your infrastructure. Five-minute demo, fifteen-minute training, sub-ten-millisecond queries from there on.
A real engineer reads every request. We'll reply within one business day with a sandbox and a 30-minute walkthrough.