For job submitters

Ship a kernel. Get math back.

Warppool runs your WGSL compute kernel across thousands of browser GPUs. You hand the coordinator a kernel and a chunk plan; it returns the reduced result with a per-chunk audit trail. Free for researchers; commercial workloads pay per millisecond of completed compute.

12,438 workers attached · 184.2 TFLOPS pooled · p50 chunk return 1.8 s · largest job this month 4.1B samples
Submit flow

Five things you hand the coordinator.

That's it. There's no SDK to install, no SaaS account, no provisioning step. Anything you can do with curl works.

  1. The WGSL kernel.

    A shader that takes a chunk's parameters in a uniform buffer and writes its output to a storage buffer. Standard WebGPU compute. If you've written a CUDA kernel, the mental model is identical; the syntax is closer to Rust.

  2. The chunk manifest.

    A JSON document describing how the job partitions into chunks. The basic form is "N chunks, here's the parameter vector for each one." The coordinator hands one entry to each worker.

  3. The reducer.

    How to combine per-chunk results into one answer. Built-ins: sum, concat, max, min, argmax, topk. Custom: a second tiny WGSL kernel the coordinator runs on the assembled chunk outputs.

  4. The redundancy policy.

    Default k=2 with a third opinion on disagreement. Higher k for higher-stakes runs; k=1 with a public canary frequency for cheap exploratory sweeps.

  5. An optional deadline.

    By default the coordinator pulls from the cheapest workers it can find. Add a deadline if you need the job finished by a specific wall-clock time; the scheduler will spend more credits to recruit faster nodes.
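
To make the built-in reducers from item 3 concrete, here is a minimal Python sketch of what each one might compute over per-chunk outputs. The semantics are assumptions for illustration (for instance, that argmax returns a chunk index and that topk returns the k largest values, largest first); the coordinator's actual behavior may differ.

```python
from typing import Callable, Sequence

# Hypothetical local model of the built-in reducers named above.
# Each reducer receives the per-chunk outputs in chunk order.

def reduce_sum(outputs: Sequence[int]) -> int:
    return sum(outputs)

def reduce_concat(outputs: Sequence[Sequence[int]]) -> list[int]:
    # Assumed: chunk outputs are joined in chunk order.
    return [value for chunk in outputs for value in chunk]

def reduce_argmax(outputs: Sequence[float]) -> int:
    # Assumed: index of the chunk with the largest output (first one on ties).
    return max(range(len(outputs)), key=lambda i: outputs[i])

def reduce_topk(outputs: Sequence[float], k: int) -> list[float]:
    # Assumed: the k largest outputs, largest first.
    return sorted(outputs, reverse=True)[:k]

REDUCERS: dict[str, Callable] = {
    "sum": reduce_sum,
    "concat": reduce_concat,
    "max": max,
    "min": min,
    "argmax": reduce_argmax,
    "topk": reduce_topk,
}

# Example per-chunk hit counts (illustrative values).
chunk_hits = [78_539_701, 78_540_234, 78_539_955]
print(REDUCERS["sum"](chunk_hits))
print(REDUCERS["argmax"](chunk_hits))
print(REDUCERS["topk"](chunk_hits, 2))
```

Anything that does not fit one of these shapes would go through the custom-reducer path, a second WGSL kernel over the assembled outputs.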

A complete example

Submit a Monte Carlo π job.

What a real submission looks like end-to-end — kernel, manifest, and the REST call. About 60 lines including comments.

monte_carlo_pi.wgsl
@group(0) @binding(0) var<uniform> params: Params;
@group(0) @binding(1) var<storage, read_write> hits: atomic<u32>;

struct Params {
  seed_start:         u32,
  num_samples:        u32,
  samples_per_thread: u32,
}

// PCG hash — fast, good statistical properties
fn pcg(x: u32) -> u32 {
  var s = x * 747796405u + 2891336453u;
  var w = ((s >> ((s >> 28u) + 4u)) ^ s) * 277803737u;
  return (w >> 22u) ^ w;
}

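@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let tid = gid.x;
  // Assumes the dispatch may round up to whole workgroups: idle the
  // padding threads so no more than num_samples samples are drawn.
  if (tid >= params.num_samples / params.samples_per_thread) { return; }

  // Hash the seed twice so neighbouring thread ids start far apart.
  var state = pcg(pcg(params.seed_start + tid));
  var local_hits: u32 = 0u;
  for (var i: u32 = 0u; i < params.samples_per_thread; i = i + 1u) {
    // Keep the top 24 bits: they convert to f32 exactly, so x and y
    // stay in [0, 1) instead of occasionally rounding up to 1.0.
    state = pcg(state);
    let x = f32(state >> 8u) / 16777216.0;
    state = pcg(state);
    let y = f32(state >> 8u) / 16777216.0;
    if (x*x + y*y < 1.0) { local_hits = local_hits + 1u; }
  }
  // One atomic add per thread keeps contention on the shared counter low.
  atomicAdd(&hits, local_hits);
}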
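
Before uploading, it can help to sanity-check the kernel logic on a CPU. The following is a rough Python sketch that mirrors the kernel's integer hash and sampling loop; it is not part of Warppool. The float conversion runs in double precision here, so borderline samples can land differently than on a GPU, and the hit counts should be treated as approximate.

```python
MASK32 = 0xFFFFFFFF

def pcg(x: int) -> int:
    """32-bit PCG hash, mirroring the WGSL function with explicit wraparound."""
    s = (x * 747796405 + 2891336453) & MASK32
    w = (((s >> ((s >> 28) + 4)) ^ s) * 277803737) & MASK32
    return (w >> 22) ^ w

def count_hits(seed_start: int, num_threads: int, samples_per_thread: int) -> int:
    """Emulate num_threads GPU invocations one after another and total their hits."""
    hits = 0
    for tid in range(num_threads):
        state = pcg(pcg((seed_start + tid) & MASK32))
        for _ in range(samples_per_thread):
            state = pcg(state)
            x = state / 2**32
            state = pcg(state)
            y = state / 2**32
            if x * x + y * y < 1.0:
                hits += 1
    return hits

# A tiny slice of one chunk, small enough to run in well under a second.
threads, per_thread = 64, 256
hits = count_hits(seed_start=0, num_threads=threads, samples_per_thread=per_thread)
print(4 * hits / (threads * per_thread))
```

With only 16,384 samples the estimate should land near 3.14 but will be noisy; the point is to confirm the loop structure, not the digits.
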
submit.sh
# 1. Upload the kernel
$ curl https://warppool.tech/api/kernels \
    -F "src=@monte_carlo_pi.wgsl"
> { "kernel_id": "wgsl-pi-7f3" }

# 2. Build a chunk manifest — 100 chunks of
# 100M samples each, distinct seed ranges
$ python -c "
import json
print(json.dumps({
  'chunks': [
    {'seed_start': i * (1<<24),
     'num_samples': 100_000_000,
     'samples_per_thread': 256}
    for i in range(100)
  ],
}))" > manifest.json

# 3. Submit the job
$ curl https://warppool.tech/api/jobs \
    -H "Authorization: Bearer $WARPPOOL_TOKEN" \
    -F "kernel_id=wgsl-pi-7f3" \
    -F "manifest=@manifest.json" \
    -F "reducer=sum" \
    -F "redundancy=2"
> { "job_id": "job-9a2c",
    "queued_at": "2026-05-27T14:02:11Z",
    "chunks": 100, "est_seconds": 38 }

# 4. Stream results back
$ curl https://warppool.tech/api/jobs/job-9a2c/stream
event: chunk_done
data: {"chunk":0,"hits":78539701,"by":["w-3f1a","w-92e0"]}
event: chunk_done
data: {"chunk":1,"hits":78540234,"by":["w-441e","w-2bf8"]}
...
event: job_done
data: {"pi":3.14159265,"hits":7853981633}
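
The stream in step 4 is plain `event:` / `data:` line pairs, so it can be consumed with a few lines of code. Below is a minimal Python sketch that parses the sample output shown above (with the elided `...` portion left out), sums the hits from the chunks that are visible, and recomputes π from the final hit count. The helper names are hypothetical, and the total of 100 × 100M samples comes from the manifest in step 2.

```python
import json

SAMPLE_STREAM = """\
event: chunk_done
data: {"chunk":0,"hits":78539701,"by":["w-3f1a","w-92e0"]}
event: chunk_done
data: {"chunk":1,"hits":78540234,"by":["w-441e","w-2bf8"]}
event: job_done
data: {"pi":3.14159265,"hits":7853981633}
"""

def parse_events(text: str) -> list[tuple[str, dict]]:
    """Pair each 'event:' line with the JSON payload on the 'data:' line after it."""
    events = []
    event_name = None
    for line in text.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:") and event_name is not None:
            events.append((event_name, json.loads(line[len("data:"):])))
            event_name = None
    return events

def estimate_pi(hits: int, samples: int) -> float:
    # The fraction of points inside the quarter circle approximates pi / 4.
    return 4 * hits / samples

events = parse_events(SAMPLE_STREAM)
partial_hits = sum(p["hits"] for name, p in events if name == "chunk_done")
final = next(p for name, p in events if name == "job_done")

print(partial_hits)
print(round(estimate_pi(final["hits"], 100 * 100_000_000), 8))
```

The same arithmetic applies to any prefix of the stream: divide the hits seen so far by the samples those chunks covered and multiply by four.
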
No SDK, no lock-in.

The submit protocol is HTTPS and JSON; the worker protocol is JSON over WebSocket. Anything that speaks REST can submit a job. Enterprise customers running on a private pool use the same API, unchanged.
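
As one illustration of "anything that speaks REST", here is a sketch of the job-submit call from step 3 using only the Python standard library. The endpoint and field names are copied from the curl example; the helper functions are hypothetical, and whether the API accepts exactly this multipart encoding is an assumption. The request is built but not sent.

```python
import json
import uuid
import urllib.request

API = "https://warppool.tech/api"  # base URL taken from the curl calls above

def build_manifest(num_chunks: int, num_samples: int, samples_per_thread: int) -> dict:
    """Same manifest as the python -c one-liner: one seed range per chunk."""
    return {"chunks": [
        {"seed_start": i * (1 << 24),
         "num_samples": num_samples,
         "samples_per_thread": samples_per_thread}
        for i in range(num_chunks)
    ]}

def encode_multipart(fields: dict[str, str], files: dict[str, tuple[str, bytes]]):
    """Encode form fields and file parts as multipart/form-data, like curl -F."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f'\r\n\r\n{value}\r\n'.encode())
    for name, (filename, content) in files.items():
        header = (f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"\r\n'
                  f'Content-Type: application/octet-stream\r\n\r\n')
        parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def build_job_request(token: str, kernel_id: str, manifest: dict) -> urllib.request.Request:
    body, content_type = encode_multipart(
        {"kernel_id": kernel_id, "reducer": "sum", "redundancy": "2"},
        {"manifest": ("manifest.json", json.dumps(manifest).encode())},
    )
    return urllib.request.Request(
        f"{API}/jobs", data=body, method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )

manifest = build_manifest(100, 100_000_000, 256)
request = build_job_request("example-token", "wgsl-pi-7f3", manifest)
print(request.get_method(), request.full_url, len(request.data), "bytes")
# To actually submit, pass the request to urllib.request.urlopen(request).
```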

Pricing

Pay for what you used. Down to the millisecond.

Compute is billed in milliseconds of completed work: one billed millisecond equals one millisecond of GPU time on a calibrated mid-range device (the M-series benchmark). You only pay for chunks that returned a verified result. Failed chunks, timeouts, canary chunks, and redundant re-runs cost you nothing. Researchers run free.

Researchers · free

Free, no cap, for researchers.

Anyone with a verifiable institutional affiliation (university, public lab, K-12 school, registered research org) runs free. No monthly cap; the only ask is that you not be silly with it. Class projects, replication studies, exploratory grad work all qualify.

Cap: none · Verification: EDU email + handshake
Sponsor pool

Sites earn credits; you spend them.

Embed the worker tag on your own site and every completed millisecond of visitor compute lands in your account. Spend it on your own jobs, or push it into the researcher pool. 1 GPU-ms in, 1 GPU-ms out — no transfer fee, no cut.

Earn rate: 1 GPU-ms / device-ms · Transfer fee: 0
Pay as you go

Billed per millisecond. Completion only.

For commercial workloads, compute is metered at the millisecond and billed only when the chunk returns a result that passes verification. Failed, retried, and canary chunks are absorbed by the network. The indicative rate works out to about $0.04 per GPU-hour.

Indicative: $0.04 / GPU-hour · Granularity: per ms
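
To see what per-millisecond billing means in dollars, here is a small Python sketch that converts billed GPU-milliseconds at the indicative $0.04 per GPU-hour. The job size used below (100 verified chunks at 1,800 calibrated GPU-ms each) is a hypothetical figure chosen for illustration, not a quoted price.

```python
MS_PER_HOUR = 3_600_000

def cost_usd(billed_gpu_ms: float, usd_per_gpu_hour: float = 0.04) -> float:
    """Convert billed GPU-milliseconds to dollars at an hourly rate."""
    return billed_gpu_ms / MS_PER_HOUR * usd_per_gpu_hour

# Hypothetical job: 100 verified chunks, 1,800 calibrated GPU-ms each.
# Redundant re-runs and canaries are not billed, so each chunk counts once.
billed_ms = 100 * 1_800
print(f"${cost_usd(billed_ms):.4f}")
```

At this rate a single GPU-millisecond costs roughly 1.1e-8 dollars, which is why the meter can afford to be this fine-grained.
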
Enterprise · private pool

Private coordinator for sensitive workloads.

For teams that need an isolated worker fleet, a dedicated coordinator, regional data residency, or contractual guarantees we don't offer on the public pool — talk to us about a private deployment. Same API, same kernels.

Pricing: by contract · SLA: available
Job dashboard

Chunk-level audit. Every job.

Who ran what. When they returned it. Whether their answer matched the redundant runs. The submitter sees a complete trail and can replay any of it. You choose whether to make your job's log public.

job-9a2c · RUNNING · 73%
Monte Carlo π · 100 × 100M samples · redundancy k=2
submitter @oconnell-lab · 2026-05-27 14:02
CHUNK     WORKERS (k=2)                                  HITS        LATENCY                STATUS
chunk_00  w-3f1a (RTX 4070) · w-92e0 (M3 Pro)            78,539,701  1.4 s · 1.8 s          ✓ agree
chunk_01  w-441e (RX 7800) · w-2bf8 (Intel Arc A770)     78,540,234  1.0 s · 2.1 s          ✓ agree
chunk_02  w-8c10 (M2 Air) · w-fa19 (RTX 3060)            78,539,955  2.6 s · 1.1 s          ✓ agree
chunk_03  w-009d (disagreed) · w-7b22 · w-0f3e (re-run)  78,540,012  1.4 s · 1.9 s · 1.3 s  ✓ verified
chunk_04  w-aa01 (RTX 4090) · w-d7c4 (M3 Max)            78,539,818  0.4 s · 0.9 s          ✓ agree
chunk_05  w-1f88 · w-c4b2 (M1 Pro)                       78,540,447  1.6 s · 1.4 s          ✓ agree
chunk_06  w-31f2 · w-ab09 (in flight)                                0.8 s elapsed          … running
73 / 100 chunks done · 7.3B samples consumed · π so far = 3.141592… · est. completion in 12 s
Download full audit log →
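
The agree / disagreed / re-run states in the table follow from the default redundancy policy (k=2 with a third opinion on disagreement). A minimal Python sketch of that decision is shown here; it assumes agreement means exact equality of the returned values, and the function name and the dissenting value for chunk_03 are made up for illustration.

```python
from collections import Counter
from typing import Optional

def resolve_chunk(results: dict[str, int], quorum: int = 2) -> tuple[Optional[int], list[str]]:
    """Return (accepted value, dissenting workers), or (None, []) if no quorum yet.

    results maps worker id -> the value that worker returned for one chunk.
    """
    if not results:
        return None, []
    value, votes = Counter(results.values()).most_common(1)[0]
    if votes < quorum:
        return None, []  # disagreement: ask one more worker
    dissenters = [w for w, v in results.items() if v != value]
    return value, dissenters

# chunk_00: both workers agree on the first pass.
print(resolve_chunk({"w-3f1a": 78_539_701, "w-92e0": 78_539_701}))

# chunk_03: the first two disagree, so a third run breaks the tie.
first_pass = {"w-009d": 78_111_111, "w-7b22": 78_540_012}  # w-009d value is invented
print(resolve_chunk(first_pass))
print(resolve_chunk({**first_pass, "w-0f3e": 78_540_012}))
```

Raising `quorum` models the higher-k policies for higher-stakes runs: more matching results are required before a chunk is accepted.
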
Get started

Apply for a free-tier grant.

Two-line description of your project, an institutional email, and a sample kernel. We turn most grants around in 48 hours. Commercial workloads can submit any time.

Request free-tier access · Buy credits

What you get on day one

  • 01 Unlimited compute, free. Per institution. Just don't be silly with it.
  • 02 Kernel library access. Common reductions, FFTs, matmul: drop-in.
  • 03 Job dashboard. Citable URL per job, audit log included.
  • 04 Forum + office hours. We can help you port your kernel.