Compute Shaders for physical constraints
Written by Andre Molnar 2026-03-18
I wanted to understand how I could puppet a mesh with physical constraints between vertices in the most efficient way.
For a small number of vertices, CPU solving is usually good enough.
But what about tens of thousands of vertices?
The multi-pass WebGPU buffer pipelines below are deliberately silly toys for learning about GPU-first constraint solving.
GPU Compute Canvas
Drag the yellow dot to 'puppet' the mesh. The slight shimmer you might see comes from the tiny subdivided triangles.
GPU Compute Canvas 2 (High Solve Density)
This variant targets higher solve-triangle density and changes the solver data model:
- Vertex-to-edge and vertex-to-triangle adjacency lists remove global scans in accumulate passes.
- Vertex repulsion uses a rest-space spatial hash broadphase (local candidate lists) instead of all-to-all comparisons.
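To make the adjacency-list idea concrete, here is a minimal sketch of a vertex-to-edge table in a flat offsets-plus-items layout, which is one common way to pack per-vertex lists into GPU storage buffers. The names `Adjacency`, `buildVertexToEdge`, and `edgesOfVertex` are hypothetical, and the flat `[a0, b0, a1, b1, ...]` edge layout is an assumption, not necessarily what the demos use.

```typescript
// Hypothetical CSR-style adjacency: for vertex v, its incident edge indices
// are items[offsets[v]] .. items[offsets[v + 1] - 1].
interface Adjacency {
  offsets: Uint32Array; // length vertexCount + 1
  items: Uint32Array;   // edge indices grouped by vertex
}

// `edges` is assumed to be a flat list of vertex index pairs: [a0, b0, a1, b1, ...].
function buildVertexToEdge(vertexCount: number, edges: Uint32Array): Adjacency {
  const offsets = new Uint32Array(vertexCount + 1);

  // Count how many edges touch each vertex (stored one slot to the right).
  for (let i = 0; i < edges.length; i++) offsets[edges[i] + 1]++;

  // Prefix sum turns the counts into start offsets.
  for (let v = 0; v < vertexCount; v++) offsets[v + 1] += offsets[v];

  // Scatter edge indices into each vertex's slice.
  const items = new Uint32Array(edges.length);
  const cursor = offsets.slice(0, vertexCount);
  for (let e = 0; e < edges.length / 2; e++) {
    items[cursor[edges[2 * e]]++] = e;
    items[cursor[edges[2 * e + 1]]++] = e;
  }
  return { offsets, items };
}

// What an accumulate pass would read for one vertex, instead of scanning every edge.
function edgesOfVertex(adj: Adjacency, v: number): number[] {
  return Array.from(adj.items.subarray(adj.offsets[v], adj.offsets[v + 1]));
}
```

With this layout, a per-vertex pass only touches its own slice of `items`, so the work per vertex scales with its number of neighbours rather than with the total edge count. The same construction would apply to a vertex-to-triangle table.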
App Flow
What is the same in each implementation
- Start from the same closed shape path, resample the perimeter, and generate interior points.
- Build a triangle mesh, then capture rest-state constraints (edge lengths and triangle areas).
- Drag the same control point to lock the nearest 4 vertices and preserve their control offsets while dragging.
- Solve prediction + constraint passes each frame and render diagnostics in an overlay.
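Two of the shared steps above, capturing rest-state constraints and choosing which vertices to lock on drag-start, can be sketched roughly as follows. This assumes 2D positions stored as a flat `[x0, y0, x1, y1, ...]` array; `captureRestState` and `nearestVertices` are hypothetical names, and storing the signed area (rather than its absolute value) is a choice made here for illustration.

```typescript
interface RestState {
  edgeLengths: Float32Array; // one rest length per edge
  triAreas: Float32Array;    // one signed rest area per triangle
}

// positions: [x0, y0, x1, y1, ...], edges: [a0, b0, ...], tris: [a0, b0, c0, ...]
function captureRestState(
  positions: Float32Array,
  edges: Uint32Array,
  tris: Uint32Array,
): RestState {
  const edgeLengths = new Float32Array(edges.length / 2);
  for (let e = 0; e < edgeLengths.length; e++) {
    const a = edges[2 * e], b = edges[2 * e + 1];
    edgeLengths[e] = Math.hypot(
      positions[2 * b] - positions[2 * a],
      positions[2 * b + 1] - positions[2 * a + 1],
    );
  }

  const triAreas = new Float32Array(tris.length / 3);
  for (let t = 0; t < triAreas.length; t++) {
    const a = tris[3 * t], b = tris[3 * t + 1], c = tris[3 * t + 2];
    const abx = positions[2 * b] - positions[2 * a];
    const aby = positions[2 * b + 1] - positions[2 * a + 1];
    const acx = positions[2 * c] - positions[2 * a];
    const acy = positions[2 * c + 1] - positions[2 * a + 1];
    // Half the 2D cross product; positive for counter-clockwise triangles.
    triAreas[t] = 0.5 * (abx * acy - acx * aby);
  }
  return { edgeLengths, triAreas };
}

// Indices of the k vertices closest to the point (px, py), nearest first.
function nearestVertices(positions: Float32Array, px: number, py: number, k = 4): number[] {
  const n = positions.length / 2;
  const dist2 = (v: number) =>
    (positions[2 * v] - px) ** 2 + (positions[2 * v + 1] - py) ** 2;
  const order = Array.from({ length: n }, (_, v) => v);
  order.sort((a, b) => dist2(a) - dist2(b));
  return order.slice(0, k);
}
```

A full sort is more than is needed to pick four vertices, but it keeps the sketch short and only runs once per drag-start.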
What changes by implementation
- CPU baseline (Tens of Tris): all solving and rendering are CPU-side (JavaScript + Canvas2D).
- GPU v1 (Thousands of Tris): same core constraints, but solving runs in multi-pass WebGPU compute and mesh rendering is WebGPU-native; readback is only used on drag-start for nearest-vertex lock selection.
- GPU v2 (Tens of thousands of Tris): targets higher solve-triangle density and changes the solver data model:
  - vertex-to-edge and vertex-to-triangle adjacency lists replace global accumulate scans,
  - vertex repulsion uses a rest-space spatial-hash broadphase (local candidate lists) instead of all-to-all checks.
Algorithmic structure still dominates performance even with acceleration.
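As an illustration of why the broadphase matters, here is a minimal sketch of a spatial hash that turns all-to-all repulsion checks into short local candidate lists. It assumes the grid is keyed on rest positions, so the lists can be built once rather than every frame; `buildCandidateLists` and the `cellSize` parameter are hypothetical, and a GPU version would more likely flatten the result into offset and item buffers than use a `Map`.

```typescript
// restPositions: [x0, y0, x1, y1, ...] in rest space.
// Returns, for each vertex, the other vertices in its own and the 8 neighbouring cells.
function buildCandidateLists(restPositions: Float32Array, cellSize: number): number[][] {
  const n = restPositions.length / 2;
  const cellOf = (v: number): [number, number] => [
    Math.floor(restPositions[2 * v] / cellSize),
    Math.floor(restPositions[2 * v + 1] / cellSize),
  ];

  // Bucket every vertex by its grid cell.
  const grid = new Map<string, number[]>();
  for (let v = 0; v < n; v++) {
    const [cx, cy] = cellOf(v);
    const key = `${cx},${cy}`;
    const bucket = grid.get(key);
    if (bucket) bucket.push(v);
    else grid.set(key, [v]);
  }

  // Gather candidates from the 3x3 block of cells around each vertex.
  const candidates: number[][] = [];
  for (let v = 0; v < n; v++) {
    const [cx, cy] = cellOf(v);
    const list: number[] = [];
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) {
        for (const u of grid.get(`${cx + dx},${cy + dy}`) ?? []) {
          if (u !== v) list.push(u);
        }
      }
    }
    candidates.push(list);
  }
  return candidates;
}
```

If `cellSize` is at least the repulsion radius, any pair of vertices closer than that radius in rest space shares a 3x3 neighbourhood, so the narrow phase only needs to test each vertex against its own list.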
CPU Baseline Canvas
For comparison, the CPU version of the app runs the same flow but with all constraint solving done in JavaScript on the CPU.
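For a sense of what one CPU frame might look like, here is a rough sketch of a prediction pass followed by edge-length constraint passes, in the style of position-based dynamics. That style is an assumption on my part, as are the `solveStep` name, the `damping` parameter, and the flat array layouts. Area constraints and vertex repulsion are left out to keep it short.

```typescript
// pos / prev: [x0, y0, x1, y1, ...]; edges: [a0, b0, ...]; locked[v] is 1 for dragged vertices.
function solveStep(
  pos: Float32Array,
  prev: Float32Array,
  edges: Uint32Array,
  restLengths: Float32Array,
  locked: Uint8Array,
  iterations = 4,
  damping = 0.99,
): void {
  // Prediction: carry over last frame's motion (Verlet-style), skipping locked vertices.
  for (let v = 0; v < pos.length / 2; v++) {
    if (locked[v]) continue;
    for (let k = 0; k < 2; k++) {
      const i = 2 * v + k;
      const current = pos[i];
      pos[i] = current + (current - prev[i]) * damping;
      prev[i] = current;
    }
  }

  // Constraint passes: nudge each edge back toward its rest length.
  for (let it = 0; it < iterations; it++) {
    for (let e = 0; e < restLengths.length; e++) {
      const a = edges[2 * e], b = edges[2 * e + 1];
      const dx = pos[2 * b] - pos[2 * a];
      const dy = pos[2 * b + 1] - pos[2 * a + 1];
      const len = Math.hypot(dx, dy);
      const wa = locked[a] ? 0 : 1;
      const wb = locked[b] ? 0 : 1;
      if (len === 0 || wa + wb === 0) continue;

      // Split the length error between the two endpoints that are free to move.
      const s = (len - restLengths[e]) / (len * (wa + wb));
      pos[2 * a] += dx * s * wa;
      pos[2 * a + 1] += dy * s * wa;
      pos[2 * b] -= dx * s * wb;
      pos[2 * b + 1] -= dy * s * wb;
    }
  }
}
```

This loop updates positions in place as it goes, which is simple and works well on a single thread. A GPU version cannot safely do that from many threads at once, which is where per-vertex accumulate passes come in.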
In all honesty, most of the time you don't need more than this. Why would I want to apply constraints to that many triangles anyway?