Network online · 47 jobs running right now

The GPUs already exist.
Let's use those.

Billions of capable GPUs are sitting idle in phones, laptops, and desktops while another data center gets built somewhere. Warppool runs your embarrassingly parallel workloads (Monte Carlo, ML inference, parameter sweeps, rendering) across those devices instead. It runs in any WebGPU-capable browser tab. No new silicon, no new substation, no new cooling tower. Stop the madness.

Researchers run free. Everyone else pays only for the milliseconds their job actually consumed.

Live demo: π estimate (target π = 3.141592…), with running counts of samples, hits, throughput, and chunks completed out of 64.
warppool worker-client · v0.3.1
This is a real Monte Carlo job. Hits are atomic-summed across GPU workgroups; π = 4·hits/samples.
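The estimator behind the demo can be sketched on the CPU. This is a minimal Python illustration of π = 4·hits/samples with one seed per chunk, not the WGSL kernel the worker actually runs; the function names and the per-chunk sample count are assumptions made for the example.

```python
import random


def run_chunk(seed: int, samples: int) -> int:
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # each chunk gets its own independent seed
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(chunks: int = 64, samples_per_chunk: int = 10_000) -> float:
    """Sum hits over all chunks, then apply pi = 4 * hits / samples."""
    total_hits = sum(run_chunk(seed, samples_per_chunk) for seed in range(chunks))
    total_samples = chunks * samples_per_chunk
    return 4.0 * total_hits / total_samples
```

Because every chunk depends only on its seed, chunks can run in any order on any worker and the summed hit count is the same.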
Workers online: 12,438 (87 countries · +312 today)
Compute pooled: 184.2 TFLOPS FP32 (peak today 241.6 TFLOPS)
Jobs in flight: 47 (3,812 chunks dispatched / min)
Sponsor sites: 218 (sites running the worker tag)
Why Warppool exists

The cheapest compute is the compute we already built.

Global data centers consumed roughly 415 TWh in 2024, about 1.5% of all electricity, and that figure is forecast to double by 2030. Every marginal AI workload arrives as more substations, more cooling water, more concrete poured somewhere with cheap land.

Meanwhile, the device you're reading this on has a GPU that's idle most of the day. So does almost every device on the network. The math capacity is already paid for, manufactured, and shipped — it just isn't being used.

Move the compute to the silicon, instead of building more silicon. Browsers are the only thing we need.

Global data-center electricity, 2024
~415 TWh / yr
≈ entire annual electricity use of France. Forecast to double by 2030 (IEA).
Personal-GPU duty cycle
< 5% while powered on
Most discrete GPUs sit idle nearly the whole time they're powered on. The capacity is paid for whether we use it or not.
Who's contributing

One dot per connected worker. Right now.

Every dot is a browser tab somewhere with a discrete or integrated GPU, lent for a few seconds at a time.

Legend: Running · Idle · Recently dropped
How it works

Two doors into the same network.

You either ship work into the pool, or you lend cycles to the pool. Same protocol, opposite ends.

For job submitters

Send a kernel. Get an answer back.

Wrap your problem as a WGSL shader (or pick one from the kernel library), describe how to chunk it, and submit. The scheduler farms the chunks out, verifies them by redundancy, and hands you the aggregated result.

  1. Frame it as parallel chunks.

    Each chunk needs to run independently and finish in a few seconds on a mid-range GPU. Monte Carlo seeds, image tiles, batched inputs all qualify.

  2. Upload kernel + chunk plan.

    WGSL source for the compute kernel, plus a JSON manifest describing chunk count, parameters, and the reducer (sum, concat, max, custom).

  3. Scheduler matches workers to chunks.

    Capability profile (memory, workgroup size, throughput score) and latency to the coordinator determine who gets what. Bigger GPUs get fatter chunks.

  4. Each chunk runs on 2–3 workers.

    Results are compared within a numeric tolerance. Disagreement triggers a third opinion. Lying nodes get quietly downranked.

  5. Reduced result returns.

    You get back the aggregate, the per-chunk receipts, and an audit trail of which workers ran what.
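The chunk plan and reducer from steps 1, 2, and 5 can be sketched as follows. The manifest field names (`chunk_count`, `params`, `reducer`) are hypothetical and chosen for illustration; the source only says the manifest describes chunk count, parameters, and a reducer such as sum, concat, or max.

```python
from functools import reduce
from typing import Any, Callable

# Hypothetical manifest; field names are illustrative, not Warppool's schema.
manifest = {
    "chunk_count": 4,
    "params": {"samples_per_chunk": 1000},
    "reducer": "sum",
}

# Built-in reducers named in the text; "custom" is omitted from this sketch.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {
    "sum": lambda a, b: a + b,
    "max": max,
    "concat": lambda a, b: list(a) + list(b),
}


def plan_chunks(m: dict) -> list[dict]:
    """Expand the manifest into one independent parameter set per chunk."""
    return [{"chunk_id": i, **m["params"]} for i in range(m["chunk_count"])]


def reduce_results(m: dict, results: list) -> Any:
    """Fold per-chunk results into the aggregate using the chosen reducer."""
    return reduce(REDUCERS[m["reducer"]], results)
```

For example, `reduce_results(manifest, [1, 2, 3, 4])` folds four per-chunk hit counts into a single sum.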

For site owners & visitors

Lend cycles. Skip the ads.

Drop a script tag on your site. Visitors who opt in run a worker inside a hidden iframe sandbox while they read your content. You get credit; they get a site without ad networks tracking them.

  1. Embed the worker tag.

    One <script> line. The visitor sees a small, unobtrusive consent strip explaining what it does before any compute begins.

  2. Visitor opts in (or doesn't).

    If they say no, the tag does nothing. If they say yes, a Web Worker thread attaches to the coordinator and waits for chunks.

  3. Browser GPU runs chunks.

    Each chunk is bounded: a fixed memory ceiling, a fixed wall-clock deadline, and throttling when the page moves to a background tab.

  4. Compute credit accrues.

    Per worker-second, scaled by the worker's benchmark score. Credit either pays out to your site, or funds research-pool jobs in your name.

  5. Visitor leaves; worker detaches.

    No persistent state. Closing the tab releases the GPU instantly. The coordinator reassigns any in-flight chunk to another worker.
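Step 4's credit rule (worker-seconds scaled by benchmark score) might look like the sketch below. The `WorkerSession` type and the convention that a score of 1.0 means a reference GPU are assumptions for illustration; the source does not define the score's scale.

```python
from dataclasses import dataclass


@dataclass
class WorkerSession:
    seconds_computing: float  # time spent running chunks in this session
    benchmark_score: float    # assumed scale: 1.0 = reference GPU


def accrued_credit(sessions: list[WorkerSession]) -> float:
    """Credit = sum over sessions of worker-seconds times benchmark score."""
    return sum(s.seconds_computing * s.benchmark_score for s in sessions)
```

Under this assumption, ten seconds on a reference GPU and five seconds on a GPU twice as fast earn the same credit.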

Is your workload a fit?

Warppool wins when the math is heavy and the data is light.

Shipping a chunk across consumer connections costs real time. The job pays for itself only if computing the chunk takes meaningfully longer than transferring the bytes that describe it. Rule of thumb: high FLOPs-per-byte, embarrassingly parallel, tolerant of stragglers.
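The rule of thumb can be turned into a rough back-of-envelope check. This sketch compares compute time with transfer time for one chunk; the function names and the 10× threshold are assumptions chosen for illustration, not a Warppool policy.

```python
def compute_to_transfer_ratio(
    flops: float,
    payload_bytes: float,
    worker_flops_per_s: float,
    link_bytes_per_s: float,
) -> float:
    """Seconds spent computing a chunk divided by seconds spent moving it."""
    compute_s = flops / worker_flops_per_s
    transfer_s = payload_bytes / link_bytes_per_s
    return compute_s / transfer_s


def is_good_fit(
    flops: float,
    payload_bytes: float,
    worker_flops_per_s: float,
    link_bytes_per_s: float,
    min_ratio: float = 10.0,  # assumed threshold for "meaningfully longer"
) -> bool:
    ratio = compute_to_transfer_ratio(
        flops, payload_bytes, worker_flops_per_s, link_bytes_per_s
    )
    return ratio >= min_ratio
```

A Monte Carlo chunk that ships a 1 KB seed block and computes for about two seconds passes easily; an ETL-style chunk that ships 100 MB and computes for under a millisecond does not.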

Built for this
  • Monte Carlo & stochastic simulation: seeds in, scalar out. Highest FLOPs-per-byte you'll ever see.
  • Parameter sweeps & hyperparameter search: same kernel, different constants per chunk.
  • Batched ML inference: vision model + 10k images, where each image is its own chunk.
  • Tiled offline rendering: path tracing, scientific visualization, splat extraction.
  • Brute-force search: cryptanalysis benchmarks, combinatorial enumeration, scan-and-filter.
Look elsewhere
  • Training large models: chunks must talk mid-step. Use a tightly coupled cluster.
  • Latency-critical inference: round-trip over WebSocket dominates anything under ~150 ms.
  • Big-data ETL: moving terabytes through volunteer pipes is the wrong direction.
  • Regulated data: volunteer nodes are untrusted; PHI, PII, and classified work don't belong here.
  • Bit-exact reproducibility: different GPUs round floats differently. Use integer kernels if exactness matters.
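One reason floating-point results are not bit-exact is that float addition is not associative, so a different reduction order can give a different last bit. The short Python sketch below shows the effect on the CPU and contrasts it with integer accumulation, which is exact in any order; the hit counts are made-up illustrative values.

```python
# Same three numbers, two grouping orders: the float sums differ in the last bit.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
floats_agree = a == b  # False on IEEE 754 doubles

# Integer accumulation (e.g. hit counts) gives the same total in any order.
hits_per_chunk = [7853, 7861, 7840]  # illustrative values
int_sum_forward = sum(hits_per_chunk)
int_sum_reverse = sum(reversed(hits_per_chunk))
ints_agree = int_sum_forward == int_sum_reverse  # True
```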
In production today

What people run on the pool.

Climate ensemble forecasts

A weather lab runs 4,000-member regional precipitation ensembles overnight, distributed across volunteer browsers in the same time zone as the model domain.

Type: Monte Carlo · Median chunk: 2.4 s

Batched FastVLM inference

A classroom batches 220k video frames through Apple's open FastVLM model for captioning research. Each chunk is one frame; chunks stream to whichever idle WebGPU device picks them up first.

Type: ML inference · Frames/hour: 38,400

Path-traced architectural renders

A studio renders 8k stills as 64×64 tiles. The coordinator stitches them back as chunks return. Coffee-break renders, not real-time.

Type: Render tile · Pixels/render: 33M

Hyperparameter sweeps

An ML team sweeps 20k learning-rate × batch-size combinations on a small classifier. Each chunk is one config × 5 seeds. The coordinator returns the best 50.

Type: Sweep · Configs/run: 20,000

Protein dynamics replicas

Replica-exchange MD simulations split into independent temperature ladders. Volunteers run 200 ps slices and ship back trajectory deltas.

Type: Replica MC · Ladders: 32

Combinatorial enumeration

A graph-theory grad student enumerates 4-regular graphs on up to N=22 vertices, up to isomorphism. Each chunk owns a slice of canonical labelings; the union is returned.

Type: Search · Graphs/day: 1.4B
Trust & verification

Built for two-way distrust.

Submitters can't trust volunteer machines; volunteers can't trust submitter kernels. The protocol is built so neither side has to.

Redundant dispatch

Every chunk goes to 2–3 independent workers. Results are compared within numeric tolerance. Disagreement triggers re-runs on disjoint nodes; persistent disagreement quarantines the kernel for human review.
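A minimal sketch of the comparison step, assuming scalar results and a relative tolerance. The function name, the default tolerance, and the "accept the first agreeing pair" rule are illustrative assumptions; the source only states that results are compared within a numeric tolerance and that disagreement triggers re-runs.

```python
import math
from typing import Optional


def verify_chunk(results: list[float], rel_tol: float = 1e-6) -> Optional[float]:
    """Return an accepted value if any two results agree within tolerance.

    Returns None when no pair agrees, which the caller would treat as
    "dispatch this chunk again on a different worker".
    """
    for i, a in enumerate(results):
        for b in results[i + 1:]:
            if math.isclose(a, b, rel_tol=rel_tol):
                return a
    return None
```

With two results that disagree, the function returns `None`; once a third result matches one of them, that value is accepted.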

Canary chunks

A fraction of dispatched chunks have known answers seeded by the coordinator. Workers don't know which. Returning the wrong answer to a canary downranks the worker silently.
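Canary scoring could be sketched like this. The `Reputation` class, the starting score of 1.0, and the halving penalty are all assumptions made for illustration; the source does not say how much a wrong canary answer costs a worker.

```python
import math


class Reputation:
    """Tracks a per-worker trust score; higher is better (assumed convention)."""

    def __init__(self) -> None:
        self.scores: dict[str, float] = {}

    def record_canary(
        self,
        worker_id: str,
        returned: float,
        expected: float,
        rel_tol: float = 1e-6,
        penalty: float = 0.5,  # assumed: halve the score on a wrong answer
    ) -> float:
        score = self.scores.get(worker_id, 1.0)
        if not math.isclose(returned, expected, rel_tol=rel_tol):
            score *= penalty  # downrank without notifying the worker
        self.scores[worker_id] = score
        return score
```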

Browser sandbox

Workers run WGSL inside the browser's WebGPU sandbox. No filesystem access, no native APIs, no persistent state across reloads. Bounded memory per chunk. Visitor closes the tab — the worker is gone.

Per-job audit trail

Every dispatch and result for your job is appended to a log keyed by worker ID and chunk hash. You can replay it, verify the reduction, and check that no worker's output was silently dropped. The submitter chooses whether to publish the log or keep it private.
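A sketch of how such a log might be keyed and replayed, assuming the chunk hash is a SHA-256 of the chunk's parameters serialized as sorted JSON and that the reducer is a sum. Both choices, and the entry field names, are hypothetical.

```python
import hashlib
import json


def chunk_hash(params: dict) -> str:
    """Stable hash of a chunk's parameters (assumed scheme: SHA-256 of sorted JSON)."""
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()


def record(log: list, worker_id: str, params: dict, result: float) -> None:
    """Append one result entry; entries are never edited or removed."""
    log.append(
        {"worker_id": worker_id, "chunk_hash": chunk_hash(params), "result": result}
    )


def replay_sum(log: list) -> float:
    """Re-derive a sum reduction, counting each chunk once despite redundant runs."""
    first_result_per_chunk: dict[str, float] = {}
    for entry in log:
        first_result_per_chunk.setdefault(entry["chunk_hash"], entry["result"])
    return sum(first_result_per_chunk.values())
```

Replaying the log and comparing `replay_sum(log)` with the aggregate you received is one way to confirm the reduction.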

Important caveat

Volunteer compute is not zero-trust compute. We protect submitters from lying workers via redundancy, and volunteers from malicious kernels via the WebGPU sandbox — but kernels you submit are visible to the workers that run them. Don't put secrets in your shader source.

Read the threat model →
Under the hood

One coordinator. Many tabs. WGSL shaders moving in between.

The coordinator is a small FastAPI service. Workers are browser tabs. The messages between them are JSON. The actual math is WGSL on each visitor's GPU.


      
Submitter: REST client · POST /api/jobs { kernel, manifest }
Coordinator: FastAPI + WebSocket · scheduler · registry · verifier · aggregator
Workers (1–N): browser tab + WebGPU · WGSL shader · atomic hit aggregator

Protocol: WebSocket JSON · welcome / gpu_capabilities / compute_chunk / chunk_result
Kernel: WGSL
Shader runtime: WebGPU (Chrome 113+ · Edge 113+ · Firefox 147+ · Safari 26+)
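The four message types could be exchanged roughly as sketched below. Only the type names come from the protocol line; the `type` key and every other field (`worker_id`, `max_workgroup_size`, `chunk_id`, `params`, `hits`) are hypothetical placeholders.

```python
import json

# Message type names as listed in the protocol summary.
MESSAGE_TYPES = ("welcome", "gpu_capabilities", "compute_chunk", "chunk_result")


def make_message(msg_type: str, **fields) -> str:
    """Serialize one protocol message as JSON; payload fields are illustrative."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    return json.dumps({"type": msg_type, **fields})


# One possible exchange for a single chunk, in order.
exchange = [
    make_message("welcome", worker_id="w-123"),                       # coordinator -> worker
    make_message("gpu_capabilities", max_workgroup_size=256),         # worker -> coordinator
    make_message("compute_chunk", chunk_id=7, params={"seed": 7}),    # coordinator -> worker
    make_message("chunk_result", chunk_id=7, hits=7853),              # worker -> coordinator
]
```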
Common questions

Things people ask before submitting a kernel.

Is this just a cryptominer with extra steps?

No, and we built the network specifically to be the opposite. Cryptominers run inside ads, without consent, and benefit the site operator at the visitor's expense. Warppool requires explicit per-session opt-in, surfaces what's running, throttles to background-tab limits, and pays the credit to whoever the visitor authorized. Making fraud easy to spot is the first item on our roadmap.

What does it cost to run a job?

Researchers with verifiable institutional affiliation (university, public lab, school) run free — no cap, just don't be silly with it. Everyone else is billed per millisecond of completed compute: you pay only for chunks that returned a verified result. Failed chunks, timeouts, canary chunks, and redundant re-runs cost nothing. Indicative rate is around $0.04 per GPU-hour. Embed the worker tag on your own site and you earn credits back at the same rate.
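As a worked example of the billing rule, the sketch below charges only verified chunks at the indicative $0.04 per GPU-hour, converted to a per-millisecond rate. The chunk record shape and the status labels are assumptions for illustration.

```python
RATE_USD_PER_GPU_HOUR = 0.04  # indicative rate quoted above
MS_PER_HOUR = 3_600_000


def job_cost_usd(chunks: list[dict]) -> float:
    """Bill only chunks that returned a verified result.

    Each record is assumed to look like {"ms": 2400, "status": "verified"}.
    Statuses such as "failed", "timeout", "canary", and "redundant" cost nothing.
    """
    billable_ms = sum(c["ms"] for c in chunks if c["status"] == "verified")
    return billable_ms * RATE_USD_PER_GPU_HOUR / MS_PER_HOUR
```

At this rate, one million verified chunks of 2.4 s each (about 667 GPU-hours) would come to roughly $26.67.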

How fast is one browser GPU, really?

A mid-range integrated GPU (M-series, recent Intel) returns ~1.5 GFLOPS-equivalent on the Monte Carlo benchmark you see in the hero. A discrete RTX-class card returns 30–80×. Calibration runs on join, so the scheduler always knows roughly how big a chunk a given worker can swallow in ~2 seconds.
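Chunk sizing from the calibration score reduces to simple arithmetic: measured throughput times the target duration. The sketch below assumes throughput is measured in samples per second; the function name and the minimum size of one sample are illustrative.

```python
def chunk_size(
    samples_per_second: float,
    target_seconds: float = 2.0,  # the ~2 s target mentioned above
    min_size: int = 1,
) -> int:
    """Pick a chunk size the worker should finish in about target_seconds."""
    return max(min_size, int(samples_per_second * target_seconds))
```

A worker calibrated at 50× the throughput of another gets a chunk 50× larger, which is how bigger GPUs end up with fatter chunks.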

What happens when a worker disappears mid-chunk?

Every chunk has a wall-clock deadline. If the result doesn't come back in time, the scheduler reassigns the chunk to another worker. Most reassignments happen within 200 ms, well before a human would notice. The original worker simply doesn't get credit for that chunk.
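Deadline-based reassignment can be sketched as a pure function over the in-flight table. The data layout (chunk ID mapped to worker ID and dispatch time) and the first-idle-worker policy are assumptions for illustration, not the scheduler's actual logic.

```python
def reassign_expired(
    in_flight: dict[str, tuple[str, float]],
    idle_workers: list[str],
    now: float,
    deadline_s: float,
) -> dict[str, tuple[str, float]]:
    """Return a new in-flight table with overdue chunks moved to idle workers.

    in_flight maps chunk_id -> (worker_id, dispatched_at_seconds).
    """
    updated = dict(in_flight)
    idle = list(idle_workers)  # copy so the caller's list is not mutated
    for chunk_id, (_worker_id, dispatched_at) in in_flight.items():
        overdue = now - dispatched_at > deadline_s
        if overdue and idle:
            updated[chunk_id] = (idle.pop(0), now)  # restart the clock
    return updated
```

Chunks that are overdue but have no idle worker available stay assigned and are picked up on a later pass.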

Can the kernel I submit see what's on the visitor's machine?

No. WebGPU runs in the browser's sandbox. The kernel has access only to the buffers the worker explicitly binds, which contain only the chunk parameters you sent. It can't read the disk, open network connections, or access host-side memory.

What languages do I write my kernel in?

WGSL, the standard WebGPU shading language. NumPy/PyTorch-style kernels need to be ported; there's a library of common compute primitives (matmul, reduce, scan, sort, FFT) you can import. We're working on a Python → WGSL transpiler for simple element-wise kernels.

Who runs the coordinator?

Warppool runs the production coordinator — a small FastAPI service that sits between submitters and workers. It's the only piece of the network that needs a server. Enterprise customers with sensitive workloads can run on a private coordinator (a dedicated deployment on isolated infrastructure); same protocol, same kernel library.

→ Run jobs

Ship your first kernel to the pool.

Walk through the submit flow, see the kernel format. Researchers: apply for a free grant. Everyone else: pay per millisecond of completed compute.

Submitter docs
→ Host workers

Replace your ad network with compute.

Embed the worker tag. Visitors opt in. You get the credit; they get a site without trackers.

Site-owner guide