Run Jobs — Warppool

Why Warppool

Compute that runs on hardware that already exists.

Every job on Warppool runs on consumer GPUs that are already manufactured, already powered on, and already idle in browser tabs. No new data center capacity needed.

$0

For researchers

EDU email gets unlimited compute. No grant, no cap, no waitlist.

~$0.04

Per GPU-hour, commercial

Billed per millisecond. Only verified results. Failed tasks are free.

0

New servers spun up

Runs on idle consumer GPUs. No new hardware. No new carbon.

curl

Is all you need

No SDK, no cloud account, no Terraform. HTTPS + JSON, that's it.

The environmental case

Data centers consume ~1.5% of global electricity, and that share is growing. Most of that power feeds hardware that sits idle between bursts. Warppool flips the model: instead of building more servers, use the GPUs already plugged in everywhere. Every job you run here is a job that didn't need a new rack, a new cooling system, or a new power contract.

Submit flow

Five things you hand Warppool.

That's the whole interface — no SDK, no cloud account. Anything that speaks HTTPS and JSON can submit a job.

01 Kernel A compute shader written in WGSL, the WebGPU Shading Language. Or write Python and compile it (see below).

02 Manifest How the job splits into independent work units: seed ranges, sample counts, per-chunk parameters.

03 Reducer How results combine — sum, max, mean, or a custom reduction kernel.

04 Redundancy Each task is independently verified by 2–3 workers. Disagreements are re-run; you never see them.

05 Deadline Optional. The scheduler prioritizes chunks to land the full result before your cutoff.

Full walkthrough in the docs →
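To make the redundancy and reducer steps concrete, here is a minimal Python sketch of how per-task results from several workers could be checked and then combined. It is an illustration only, not Warppool's scheduler: the exact-match quorum rule and the names `verified_value` and `reduce_job` are assumptions introduced for this example.

```python
from collections import Counter
from statistics import mean

# Reducer names taken from the list above; the mapping itself is hypothetical.
REDUCERS = {"sum": sum, "max": max, "mean": mean}


def verified_value(reports, quorum=2):
    """Return the value at least `quorum` workers agree on, else None.

    `reports` holds one result per worker for a single task.
    Exact-match agreement is an assumption of this sketch.
    """
    if not reports:
        return None
    value, votes = Counter(reports).most_common(1)[0]
    return value if votes >= quorum else None


def reduce_job(task_reports, reducer="sum", quorum=2):
    """Combine verified per-task values and list the tasks that need a re-run."""
    verified, rerun = [], []
    for task_id, reports in sorted(task_reports.items()):
        value = verified_value(reports, quorum)
        if value is None:
            rerun.append(task_id)
        else:
            verified.append(value)
    result = REDUCERS[reducer](verified) if verified else None
    return result, rerun


reports = {
    0: [78539701, 78539701],         # two workers agree
    1: [78540234, 78540234, 12345],  # one outlier is outvoted
    2: [111, 222],                   # disagreement: needs a re-run
}
total, rerun = reduce_job(reports, reducer="sum")
print(total, rerun)  # 157079935 [2]
```

In this sketch a disagreeing task is simply reported for a re-run and left out of the reduction, which mirrors the idea that disagreements never reach the submitter.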

A complete example

Submit a Monte Carlo π job.

What a real submission looks like end-to-end — kernel, manifest, and the REST call. About 60 lines including comments.

monte_carlo_pi.wgsl

@group(0) @binding(0) var<uniform> params: Params;
@group(0) @binding(1) var<storage, read_write> hits: atomic<u32>;

struct Params {
  seed_start:         u32,
  num_samples:        u32,
  samples_per_thread: u32,
}

// PCG hash — fast, good statistical properties
fn pcg(x: u32) -> u32 {
  var s = x * 747796405u + 2891336453u;
  var w = ((s >> ((s >> 28u) + 4u)) ^ s) * 277803737u;
  return (w >> 22u) ^ w;
}

@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let tid = gid.x;
  var state = pcg(pcg(params.seed_start + tid));
  var local_hits: u32 = 0u;
  for (var i: u32 = 0u; i < params.samples_per_thread; i = i + 1u) {
    state = pcg(state);
    let x = f32(state) / 4294967296.0;
    state = pcg(state);
    let y = f32(state) / 4294967296.0;
    if (x*x + y*y < 1.0) { local_hits = local_hits + 1u; }
  }
  atomicAdd(&hits, local_hits);
}
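Before uploading, it can help to sanity-check the kernel logic on a CPU. The sketch below is a hypothetical Python port of the shader above, not part of Warppool: it masks to 32 bits to mimic `u32` wraparound and uses double-precision division, so individual samples near the circle boundary may not match a GPU's `f32` results exactly. The helper names `thread_hits` and `chunk_hits` are made up for this example.

```python
MASK32 = 0xFFFFFFFF  # emulate u32 wraparound in plain Python ints


def pcg(x):
    """Port of the kernel's pcg() hash using 32-bit wrapping arithmetic."""
    s = (x * 747796405 + 2891336453) & MASK32
    w = (((s >> ((s >> 28) + 4)) ^ s) * 277803737) & MASK32
    return (w >> 22) ^ w


def thread_hits(seed_start, tid, samples_per_thread):
    """Count the points one GPU thread would place inside the unit quarter circle."""
    state = pcg(pcg((seed_start + tid) & MASK32))
    hits = 0
    for _ in range(samples_per_thread):
        state = pcg(state)
        x = state / 4294967296.0
        state = pcg(state)
        y = state / 4294967296.0
        if x * x + y * y < 1.0:
            hits += 1
    return hits


def chunk_hits(seed_start, num_threads, samples_per_thread):
    """Sum hits over all threads, standing in for the kernel's atomicAdd."""
    return sum(
        thread_hits(seed_start, tid, samples_per_thread)
        for tid in range(num_threads)
    )


# A deliberately tiny run: 64 threads x 256 samples.
threads, per_thread = 64, 256
hits = chunk_hits(0, threads, per_thread)
estimate = 4.0 * hits / (threads * per_thread)
print(estimate)
```

With only about 16 thousand samples the estimate is rough; the point is to confirm the hashing and counting logic, not to match the full job's precision.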

submit.sh

# 1. Upload the kernel
$ curl https://warppool.tech/api/kernels \
    -F "src=@monte_carlo_pi.wgsl"
> { "kernel_id": "wgsl-pi-7f3" }

# 2. Build a chunk manifest — 100 chunks of
# 100M samples each, distinct seed ranges
$ python -c "
import json
print(json.dumps({
  'chunks': [
    {'seed_start': i * (1<<24),
     'num_samples': 100_000_000,
     'samples_per_thread': 256}
    for i in range(100)
  ],
}))" > manifest.json

# 3. Submit the job
$ curl https://warppool.tech/api/jobs \
    -H "Authorization: Bearer $WARPPOOL_TOKEN" \
    -F "kernel_id=wgsl-pi-7f3" \
    -F "manifest=@manifest.json" \
    -F "reducer=sum" \
    -F "redundancy=2"
> { "job_id": "job-9a2c",
    "queued_at": "2026-05-27T14:02:11Z",
    "chunks": 100, "est_seconds": 38 }

# 4. Stream results back
$ curl https://warppool.tech/api/jobs/job-9a2c/stream
event: chunk_done
data: {"chunk":0,"hits":78539701,"by":["w-3f1a","w-92e0"]}
event: chunk_done
data: {"chunk":1,"hits":78540234,"by":["w-441e","w-2bf8"]}
...
event: job_done
data: {"pi":3.14159265,"hits":7853981633}
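The same flow can be checked from Python. The sketch below rebuilds the step 2 manifest, confirms the seed ranges do not overlap, and parses stream text shaped like the step 4 output. It is a sketch under assumptions: it treats each chunk as launching `num_samples // samples_per_thread` threads seeded at `seed_start + tid` (implied by the kernel, not stated), and the helpers `parse_sse`, `seed_ranges_disjoint`, and `pi_from_hits` are names invented here rather than part of any Warppool client.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from 'event:' / 'data:' line pairs."""
    event = None
    for line in lines:
        line = line.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event is not None:
            yield event, json.loads(line[len("data:"):].strip())
            event = None


def seed_ranges_disjoint(chunks):
    """Check that no two chunks share a thread seed (assumed layout, see above)."""
    spans = sorted(
        (c["seed_start"],
         c["seed_start"] + c["num_samples"] // c["samples_per_thread"])
        for c in chunks
    )
    return all(prev_end <= start
               for (_, prev_end), (start, _) in zip(spans, spans[1:]))


def pi_from_hits(total_hits, total_samples):
    """Quarter-circle area ratio scaled to pi."""
    return 4.0 * total_hits / total_samples


# Same manifest as step 2 of submit.sh.
manifest = {"chunks": [
    {"seed_start": i * (1 << 24),
     "num_samples": 100_000_000,
     "samples_per_thread": 256}
    for i in range(100)
]}
total_samples = sum(c["num_samples"] for c in manifest["chunks"])

# Stream text copied from the step 4 output; elided events stay elided.
stream = """event: chunk_done
data: {"chunk":0,"hits":78539701,"by":["w-3f1a","w-92e0"]}
event: chunk_done
data: {"chunk":1,"hits":78540234,"by":["w-441e","w-2bf8"]}
...
event: job_done
data: {"pi":3.14159265,"hits":7853981633}
"""

events = list(parse_sse(stream.splitlines()))
partial_hits = sum(d["hits"] for e, d in events if e == "chunk_done")
final = next(d for e, d in events if e == "job_done")
pi = pi_from_hits(final["hits"], total_samples)
print(seed_ranges_disjoint(manifest["chunks"]), partial_hits, round(pi, 8))
```

Recomputing pi from the reported hit count is a cheap way to confirm that the `sum` reducer and the manifest's sample total agree with the `pi` field in the final event.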

Prefer Python? Skip the WGSL.

Our open-source py2wgsl package (pip install py2wgsl) compiles annotated Python functions straight to WGSL compute shaders — write your kernel as ordinary Python, get the shader and its bindings generated for you.

No SDK, no lock-in.

The submit protocol is HTTPS and JSON; the worker protocol is JSON over WebSocket. Anything that speaks REST can submit a job. Enterprise customers running on a private pool use the same API, unchanged.

Pricing

Pay for what you used. Down to the millisecond.

Compute is billed in milliseconds of completed work: one billed millisecond equals one millisecond of GPU time on a calibrated mid-range device (the M-series benchmark). You only pay for tasks that returned a verified result. Failed tasks, timeouts, canary tasks, and redundant re-runs cost you nothing. Researchers run free.

Researchers · free

Free, no cap, for researchers.

Anyone with a verifiable institutional affiliation (university, public lab, K-12 school, registered research org) runs free. No monthly cap; the only ask is that you not be silly with it. Class projects, replication studies, and exploratory grad work all qualify.

Cap: none · Verification: EDU email + handshake

Sponsor pool

Sites earn credits; you spend them.

Embed the worker tag on your own site and every completed millisecond of visitor compute lands in your account. Spend it on your own jobs, or push it into the researcher pool. 1 GPU-ms in, 1 GPU-ms out — no transfer fee, no cut.

Earn rate: 1 GPU-ms / device-ms · Transfer fee: 0

Pay as you go

Billed per millisecond. Completion only.

For commercial workloads, compute is metered at the millisecond and billed only when the task returns a result that passes verification. Failed, retried, and canary tasks are absorbed by the network. Indicative rate works out to about $0.04 per GPU-hour.

Indicative: $0.04 / GPU-hour · Granularity: per ms
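As a worked example of the metering described above, the sketch below converts the indicative $0.04 per GPU-hour into a per-millisecond rate and bills only verified tasks. The task status labels (`verified`, `failed`, `canary`) and the `job_cost` helper are hypothetical, chosen for illustration; actual invoices may be computed differently.

```python
from fractions import Fraction

MS_PER_HOUR = 3_600_000
RATE_PER_GPU_HOUR = Fraction(4, 100)            # indicative $0.04 from the text
RATE_PER_MS = RATE_PER_GPU_HOUR / MS_PER_HOUR   # exactly 1/90,000,000 dollars


def job_cost(tasks):
    """Sum the cost of verified tasks only.

    `tasks` is an iterable of (gpu_ms, status) pairs; the status
    strings are assumptions made for this sketch.
    """
    billable_ms = sum(ms for ms, status in tasks if status == "verified")
    return billable_ms * RATE_PER_MS


tasks = [
    (1_800_000, "verified"),
    (1_800_000, "verified"),
    (500_000, "failed"),   # not billed
    (250_000, "canary"),   # not billed
]
cost = job_cost(tasks)
print(float(cost))  # 0.04
```

Using `Fraction` keeps the arithmetic exact at millisecond granularity, which avoids float rounding when very small per-task charges are summed.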

Enterprise · private pool

Private Warppool instance for sensitive workloads.

For teams that need an isolated worker fleet, a dedicated scheduler, regional data residency, or contractual guarantees we don't offer on the public pool — talk to us about a private deployment. Same API, same kernels.

Pricing: by contract · SLA: available

Is your workload a fit?

Warppool wins when the math is heavy and the data is light.

Shipping a task across consumer connections costs real time. The job pays for itself only if computing the task takes meaningfully longer than transferring the bytes that describe it. Rule of thumb: high FLOPs-per-byte, embarrassingly parallel, tolerant of stragglers.
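The rule of thumb can be turned into a rough back-of-the-envelope check. The sketch below compares estimated compute time with estimated transfer time for a single task. Every number in it is an assumption for illustration: the device throughput, the link speed, the task sizes, and the 0.15 s round-trip overhead (chosen to echo the ~150ms figure on this page). The `fit_ratio` helper is not a Warppool tool.

```python
def fit_ratio(flops, payload_bytes,
              device_flops_per_s=5e12,   # assumed consumer GPU throughput
              link_bytes_per_s=1.25e6,   # assumed 10 Mbit/s consumer link
              round_trip_s=0.15):        # assumed fixed per-task overhead
    """Return compute time divided by transfer time for one task.

    Values well above 1 suggest the task pays for its own shipping;
    values well below 1 suggest the network is the bottleneck.
    """
    compute_s = flops / device_flops_per_s
    transfer_s = payload_bytes / link_bytes_per_s + round_trip_s
    return compute_s / transfer_s


# Hypothetical Monte Carlo chunk: a few seed parameters in, one scalar out.
monte_carlo = fit_ratio(flops=5e12, payload_bytes=64)

# Hypothetical ETL task: 1 GB moved for very little arithmetic.
etl = fit_ratio(flops=1e9, payload_bytes=1e9)

print(f"monte carlo: {monte_carlo:.2f}, etl: {etl:.2e}")
```

Plugging in your own measured kernel time and payload size gives a quick first read on fit before you write a manifest.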

✓ Built for this

Monte Carlo & stochastic simulation: seeds in, scalar out. Highest FLOPs-per-byte you'll ever see.
Parameter sweeps & hyperparameter search: same kernel, different constants per task.
Batched ML inference: vision model + 10k images, where each image is its own task.
Tiled offline rendering: path tracing, scientific visualization, splat extraction.
Brute-force search: cryptanalysis benchmarks, combinatorial enumeration, scan-and-filter.

! Look elsewhere

Training large models: tasks must talk mid-step. Use a tightly coupled cluster.
Latency-critical inference: round-trip over WebSocket dominates anything under ~150ms.
Big-data ETL: moving terabytes through volunteer pipes is the wrong direction.
Regulated data: volunteer nodes are untrusted; PHI, PII, and classified work don't belong here.
Bit-exact reproducibility: different GPUs round floats differently. Use integer kernels if exactness matters.
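One reason float results can differ between runs is that floating-point addition is not associative, so the order in which partial results are combined can change the last bits. The short Python sketch below shows the effect with CPU doubles and contrasts it with integer counts. It illustrates the general principle only and does not model any particular GPU.

```python
# Same three values, two groupings: the float results differ in the last bit.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)
print(left_first == right_first)  # False

# Integer hit counts sum to the same total in any order.
hits = [78539701, 78540234, 78539950]
forward = sum(hits)
backward = sum(reversed(hits))
print(forward == backward)  # True
```

This is why a kernel that accumulates integer counts, like the hit counter in the Monte Carlo example, can be compared exactly across workers.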

Get started

Apply for a free-tier grant.

Two-line description of your project, an institutional email, and a sample kernel. We turn most grants around in 48 hours. Commercial workloads can submit any time.

Request free-tier access · Read the docs

What you get on day one

01 Unlimited compute, free. Per institution. Just don't be silly with it.
02 Kernel library access. Common reductions, FFTs, matmul: all drop-in.
03 Job dashboard. Citable URL per job, audit log included.
04 Forum + office hours. We can help you port your kernel.