Sports tech · US — Implementation case

The computer vision that turns a phone into a coach that sees.

For a US sports-tech startup, we're building the machine-learning vision layer of their consumer training app — models that read live phone video during real practice, judge the athlete's form and the outcome of each rep in real time, and feed the coaching and competition the whole product is built around.

01
The starting point

Most apps tell you what happened. The hard part is explaining why.

The gap

Plenty of training apps can tell you the result of a rep. The product we're helping build goes further: it coaches. To coach, the app has to read how the movement actually looked (the athlete's form and mechanics), not just whether the rep worked. That is a hard ML problem, and it's the one the whole product sits on.

Harder still: this has to work on a phone propped on a tripod during real practice — not a lab rig with controlled lighting and fixed cameras. Different angles, different light, different environments, and feedback that has to land between reps, not minutes later. Reading video that well, that fast, on consumer hardware, was the layer that didn't exist yet.

02
What we built

A vision layer that reads mechanics and outcomes.

Not a demo that works in one corner. A real-time computer-vision system — models plus the pipeline around them — built to run on a phone in the field and feed everything the product does.

Vision models

Reading the movement

We're building the models that track the athlete's body and the movement through each rep — capturing how the form looked, frame by frame, so the app can judge mechanics and not just count results.

Mechanics + outcomes

What happened, and how

The vision layer detects both sides of every rep: the outcome — what the result was — and the mechanics — how the movement was executed. Reading both is what lets the product explain why a rep went the way it did.
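As a rough illustration of what "both sides of every rep" could look like in data, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `RepRead` schema, the `explain` helper, the outcome labels, and the measurement names are hypothetical placeholders, not the client's actual schema or sport.

```python
from dataclasses import dataclass, field


@dataclass
class RepRead:
    """One rep as a vision layer might report it (hypothetical schema)."""
    outcome: str  # assumed label, e.g. "success" or "miss"
    # Named form measurements for the rep; the names are placeholders.
    mechanics: dict[str, float] = field(default_factory=dict)


def explain(rep: RepRead, targets: dict[str, tuple[float, float]]) -> list[str]:
    """List the mechanics measurements that fall outside their target range."""
    notes = []
    for name, value in rep.mechanics.items():
        if name not in targets:
            continue  # no target defined for this measurement
        low, high = targets[name]
        if value < low:
            notes.append(f"{name} low ({value} < {low})")
        elif value > high:
            notes.append(f"{name} high ({value} > {high})")
    return notes


# Placeholder values, not real data.
rep = RepRead("miss", {"joint_angle_deg": 70.0, "release_height_m": 2.1})
targets = {"joint_angle_deg": (80.0, 100.0), "release_height_m": (2.0, 2.4)}
print(rep.outcome, explain(rep, targets))
```

Keeping the outcome and the mechanics on the same record is what lets downstream code pair a result with the part of the movement that plausibly caused it.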

Real-time pipeline

Built for the phone, in the field

The models run on live video from a phone on a tripod — varied lighting, angles, and environments — and return their read fast enough that feedback lands between reps, inside a real training session, not after it.
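One common way to keep a live read from lagging is to drop frames rather than queue them when inference is slower than the camera. The sketch below shows that idea only; it is not a description of this pipeline, and the function name `select_frames`, the fixed inference cost, and the timestamps are all assumptions for illustration.

```python
def select_frames(frame_times_ms: list[int], infer_ms: int) -> list[int]:
    """Pick which frames to process so inference never falls behind live video.

    frame_times_ms: capture timestamps in milliseconds, ascending.
    infer_ms: assumed fixed cost of one model pass, in milliseconds.
    """
    processed = []
    busy_until = 0  # time at which the model finishes its current frame
    for t in frame_times_ms:
        if t >= busy_until:
            # The model is free: process this frame.
            processed.append(t)
            busy_until = t + infer_ms
        # Otherwise the frame is dropped instead of queued, so lag cannot pile up.
    return processed


# Roughly 30 fps capture with an assumed 50 ms model pass.
frames = [0, 33, 66, 100, 133, 166, 200]
print(select_frames(frames, infer_ms=50))  # [0, 66, 133, 200]
```

Under these assumptions every result is at most one model pass old, which is the property that matters when feedback has to land between reps.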

Product foundation

The layer everything sits on

Instruction, AI coaching feedback, and the gamified competition layer all depend on what the camera understands. We're building the vision layer as the foundation those experiences are built on — get it right and the rest of the product becomes possible.

03
How it runs

Embedded with the founders, building to be owned.

01

Define what to see

We worked with the founding team to pin down what the vision layer actually has to read in each rep (the mechanics that matter and the outcomes that count) and what "good enough" means in the field.

02

Build the models

We're building and training the vision models against real practice footage — the messy lighting, angles, and environments a phone on a tripod actually sees — not clean lab conditions.

03

Make it real-time

We tune the models and the pipeline around them so the read is fast and reliable enough to feed coaching feedback between reps, live, on consumer hardware.

04

Build to transfer

The models and pipeline are built to be owned and iterated by the client's team — so they can keep improving the vision layer alongside their own product roadmap.

04
Outcomes

A camera that understands the whole session.

As the engagement continues, the vision layer is becoming the thing the product can lean on: a phone that doesn't just record a session but understands it — reading the athlete's form and the outcome of each rep well enough that the app can explain why, not just report what. That read is what turns instruction, feedback, and competition from features into a coach.

We're embedded with the founding team and building alongside their roadmap. The models and the pipeline are being built so the client's team can own them and keep iterating as the product grows. We build the layer; they're set up to run it.

Client name withheld by agreement. Happy to walk through the details on a call.

If your product depends on reading video well, let's talk.

Book a 30-minute call. We'll map what your app needs to see, what real-time computer vision on consumer hardware would take to build — and how to set your team up to own it.