AI2SEE
Case Study · Fitness Tech · Computer Vision
Consumer Fitness App · On-Device Computer Vision

We gave a gym app eyes — 30 FPS rep counting, zero wearables, fully on-device.

A fitness startup needed a phone camera that could recognize exercises and count reps in real time. No cloud. No hardware. Just computer vision that works inside a sweaty gym.

98% · Rep-count accuracy within ±1 rep across 12 exercise classes
30 FPS · On-device inference, no cloud round-trip
14 weeks · Kickoff → production-ready app

Delivery timeline
Wk 0 · Scoping & data audit
Wk 3 · Working PoC on a real device
Wk 8 · 30 FPS milestone hit
Wk 14 · Beta launch with 400 users

Gym tech relies on hardware. We were asked to remove it.

A consumer fitness startup had a simple — and genuinely hard — product brief: a phone propped on a gym shelf that recognises your exercise and counts every rep, with no wearables, no chest strap, and no internet connection.

Their existing options were either cloud-dependent (unworkable in gyms with patchy Wi-Fi) or wearable-dependent (hardware cost killed conversion). What they needed was a computer vision system capable of real-time inference on a mid-range Android phone, a device with roughly a hundredth of a server GPU's compute budget.

Every approach they had tried before either ran too slowly to be useful in-session, broke under real gym conditions (poor lighting, busy backgrounds, varied clothing), or required per-user calibration that killed the out-of-the-box experience.


A two-stage on-device pipeline, purpose-built for the constraints.

We designed a pipeline that separates what the person is doing from how many times they've done it — two distinct inference problems that each have a right-sized answer. Decoupling them meant we could tune each independently without touching the other, and swap components as the hardware targets changed.

Pipeline: camera to counter
Camera Input · 30 FPS · square crop · normalised
  ↓ RGB frames
Pose Extraction · 17-point skeleton · <8 ms/frame
  ↓ keypoints
Action Recognition · rolling buffer · runs at 4–6 Hz
  ↓ exercise label
Rep Counter · cycle detection · <200 ms lag
  ↓ rep count ✓

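
The payoff of decoupling is that the two stages can run at different rates: pose on every frame, classification far less often. The sketch below illustrates that scheduling under stated assumptions and is not the production pipeline. `extract_pose` and `classify` are hypothetical stand-ins for the two models, the classifier fires on every 6th frame (30 ÷ 6 = 5 Hz, inside the 4–6 Hz band above), the 32-frame buffer length is an invented value, and the rep counter is left out.

```python
from collections import deque

FPS = 30              # camera rate from the pipeline above
CLASSIFY_EVERY = 6    # assumption: 30 / 6 = 5 Hz, inside the 4–6 Hz band
WINDOW = 32           # hypothetical rolling-buffer length, in frames


def extract_pose(frame):
    """Hypothetical stand-in for the pose model: 17 (x, y) keypoints per frame."""
    return [(0.0, 0.0)] * 17


def classify(window):
    """Hypothetical stand-in for the action classifier over a keypoint window."""
    return "squat"


class Pipeline:
    def __init__(self):
        self.buffer = deque(maxlen=WINDOW)  # oldest keypoints drop off automatically
        self.frame_idx = 0
        self.label = None
        self.classifier_calls = 0

    def on_frame(self, frame):
        keypoints = extract_pose(frame)     # stage 1: every frame
        self.buffer.append(keypoints)
        self.frame_idx += 1
        # Stage 2: only once the buffer is full, then every CLASSIFY_EVERY frames.
        if len(self.buffer) == WINDOW and self.frame_idx % CLASSIFY_EVERY == 0:
            self.label = classify(list(self.buffer))
            self.classifier_calls += 1
        return keypoints, self.label


pipe = Pipeline()
for _ in range(3 * FPS):  # 3 seconds of frames
    pipe.on_frame(frame=None)
print(pipe.classifier_calls, pipe.label)  # 10 squat
```

Over 90 frames the pose stage runs 90 times but the classifier only 10 times, because it waits for a full buffer and then fires on every 6th frame.
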
Phase 01

Data Foundation

Audit of available data, gap analysis, data collection protocol for gym-environment clips, annotation pipeline.

Phase 02

Model Development

Pose backbone selection, exercise classifier fine-tuning, rep-counting module development, thread-budget profiling.

Phase 03

On-Device Optimisation

Model quantisation, runtime export, device-tier benchmarking, thermal stress testing across 20+ handsets.

Phase 04

SDK & Integration

Native mobile SDK (iOS + Android), API surface for form feedback, workout summary module, beta instrumentation.
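
Phase 03's device-tier benchmarking comes down to comparing per-frame latency against the frame budget: at 30 FPS each frame gets 1000 / 30 ≈ 33.3 ms end to end, of which the pose stage above takes under 8 ms. A minimal, hypothetical harness might look like this; the function names, the 300-frame run, the 30-frame warm-up and the choice of p95 as the pass criterion are all assumptions, not the project's actual test plan.

```python
import math
import time

FRAME_BUDGET_MS = 1000.0 / 30  # ≈ 33.3 ms per frame at 30 FPS


def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarise(latencies_ms):
    """p50/p95 latency and whether the tail fits inside one frame budget."""
    p50 = percentile(latencies_ms, 50)
    p95 = percentile(latencies_ms, 95)
    return {"p50_ms": p50, "p95_ms": p95, "within_budget": p95 <= FRAME_BUDGET_MS}


def benchmark(step, n_frames=300, warmup=30):
    """Time one call of `step` per frame, after a warm-up, in milliseconds."""
    for _ in range(warmup):
        step()  # discard early, unrepresentative iterations
    latencies = []
    for _ in range(n_frames):
        t0 = time.perf_counter()
        step()
        latencies.append((time.perf_counter() - t0) * 1000.0)
    return summarise(latencies)


print(summarise([float(ms) for ms in range(1, 101)]))
```

Judging on a tail percentile rather than the mean suits a real-time loop, since occasional slow frames are what a user perceives as stutter. Repeating the same measurement late in a long session is one way to surface the thermal throttling that the stress tests target.
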


98% accuracy. Real gyms. Mid-range phones. No cloud.

98% · Rep-count accuracy within ±1 rep across all 12 exercise classes
30 FPS · Sustained on-device inference on Pixel 6a and above
<200 ms · Latency from motion peak to on-screen counter increment
400+ · Beta users across 3 cities, zero calibration required

"We'd tried two other vendors before AI2SEE. Both delivered something that worked on a good phone in a well-lit office. AI2SEE delivered something that works in an actual gym — and they did it in 14 weeks."
Head of Product, Consumer Fitness App (Series A, US)

The app passed the client's internal "shelf test": propped 8 feet away against gym equipment, it counted through 12 exercises across six testers without manual recalibration. Thermal headroom after a 45-minute continuous session stayed within spec on every test device. The beta cleared the client's internal accuracy threshold (≥95% within ±1 rep) in the first week of user testing.
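
Both the 98% headline and the ≥95% beta threshold are "within ±1 rep" accuracies. The source doesn't define the unit of evaluation, so this sketch assumes each entry is one set with a predicted and a ground-truth rep count, and scores the fraction of sets where the two differ by at most one. The function name and the example counts are hypothetical.

```python
def within_tolerance_accuracy(predicted, actual, tolerance=1):
    """Fraction of sets whose predicted rep count is within ±tolerance of the truth."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("expected two non-empty lists of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


BETA_THRESHOLD = 0.95  # the client's bar: ≥95% within ±1 rep

# Hypothetical example: four sets, one of them off by two reps.
predicted = [10, 12, 8, 15]
actual = [10, 10, 9, 15]
acc = within_tolerance_accuracy(predicted, actual)
print(acc, acc >= BETA_THRESHOLD)  # 0.75 False
```
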

"The hardest part wasn't the models. It was teaching two models to share a phone politely."

How the system is structured under the hood.

We don't publish model names or exact architectures publicly — the stack is a competitive advantage for the client. What we can share is the structural logic, which applies to any project in this class.

Stage 1 · Recognition

Action Classification

A compact video-understanding model fine-tuned on skeleton sequences. Optimised for the 20-class fitness taxonomy — not a general-purpose benchmark model running on a server.
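
Skeleton input already strips out clothing and background pixels. A common extra preprocessing step, shown here as an assumption rather than the client's actual pipeline, is to make each frame invariant to where the person stands and how far they are from the camera: centre the keypoints on the hip midpoint and divide by torso length. The joint indices assume the COCO 17-keypoint layout; the source only specifies a 17-point skeleton.

```python
import math

# Assumption: COCO-style ordering of the 17 keypoints (the source only says
# "17-point skeleton"). Indices of the joints used for normalisation:
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 5, 6, 11, 12


def _midpoint(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def normalise_frame(keypoints):
    """Centre 17 (x, y) keypoints on the hip midpoint and scale by torso length."""
    hip = _midpoint(keypoints[L_HIP], keypoints[R_HIP])
    shoulder = _midpoint(keypoints[L_SHOULDER], keypoints[R_SHOULDER])
    torso = math.dist(hip, shoulder) or 1.0  # avoid dividing by zero on a collapsed pose
    return [((x - hip[0]) / torso, (y - hip[1]) / torso) for x, y in keypoints]


def normalise_sequence(frames):
    """Normalise every frame of a (T, 17, 2) keypoint sequence before classification."""
    return [normalise_frame(f) for f in frames]
```

After this step the same squat filmed from 5 feet or 10 feet yields nearly the same sequence, which is one plausible way to reduce the need for per-user calibration.
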

Stage 2 · Counting

Temporal Cycle Detection

A counting module that detects motion-cycle boundaries at a cost that scales linearly, not quadratically, with session length. This matters most in sessions longer than 10 minutes.
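
The source doesn't describe the counting algorithm, so this is a deliberately simple stand-in that shows what linear scaling means in practice: a two-threshold (hysteresis) state machine over a hypothetical 1-D per-frame signal such as a knee angle. Each update does constant work and keeps constant state, so an n-frame session costs O(n); a method that compares every frame with every other frame (a full self-similarity matrix) grows as O(n²). The thresholds and the angle trace are invented for the example.

```python
class RepCounter:
    """Streaming rep counter: a two-threshold (hysteresis) state machine.

    Illustrative only. Assumes one scalar per frame (e.g. a knee angle) that
    swings from above `high` to below `low` and back once per rep.
    Each update is O(1) time and memory, so a session of n frames is O(n).
    """

    def __init__(self, low, high):
        if not low < high:
            raise ValueError("low threshold must be below high threshold")
        self.low, self.high = low, high
        self.state = "up"  # assume the set starts at the top of the movement
        self.count = 0

    def update(self, value):
        if self.state == "up" and value < self.low:
            self.state = "down"  # reached the bottom of the movement
        elif self.state == "down" and value > self.high:
            self.state = "up"    # back at the top: one full cycle
            self.count += 1
        return self.count


squat = [170, 160, 120, 95, 90, 105, 98, 130, 155, 170]  # one rep, jitter near the bottom
counter = RepCounter(low=100, high=150)
for angle in squat * 3:
    counter.update(angle)
print(counter.count)  # 3
```

The gap between the two thresholds is what keeps the 95 → 105 → 98 wobble near the bottom from registering as extra reps, which a single cut-off at 100 would do.
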

Thread Management

Async Inference Orchestration

A producer-consumer architecture that keeps the camera thread at 30 FPS while model inference runs asynchronously on a background thread — the standard pattern for mobile CV.
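
A minimal Python illustration of the pattern, assuming a single producer and a single consumer: the camera loop hands frames to inference through a queue of capacity 1 and replaces any frame that hasn't been picked up yet, so inference always works on the newest frame and the camera never blocks on it. On a phone the producer would be the platform's camera callback and the consumer a native inference thread; the function names, the integer standing in for an image, and the 100 ms simulated inference time are all hypothetical.

```python
import queue
import threading
import time


def offer_latest(q, item):
    """Non-blocking put that replaces a frame the consumer hasn't taken yet.

    Safe for a single producer: only this thread ever adds items, so once
    the stale item is gone there is room for the new one.
    """
    try:
        q.put_nowait(item)
    except queue.Full:
        try:
            q.get_nowait()  # drop the stale frame
        except queue.Empty:
            pass  # the consumer took it in the meantime
        q.put_nowait(item)


def camera_loop(q, n_frames, fps=30):
    """Producer: emits frames at a fixed rate and never waits on inference."""
    for i in range(n_frames):
        offer_latest(q, i)  # the integer i stands in for a camera frame
        time.sleep(1 / fps)
    q.put(None)  # end-of-stream sentinel; may wait briefly for the consumer


def inference_loop(q, results, infer_s=0.1):
    """Consumer: runs (simulated) slow inference on the newest available frame."""
    while True:
        frame = q.get()
        if frame is None:
            break
        time.sleep(infer_s)  # stand-in for model inference
        results.append(frame)


def run(n_frames=30):
    q = queue.Queue(maxsize=1)  # capacity 1: at most one frame waits
    results = []
    worker = threading.Thread(target=inference_loop, args=(q, results))
    worker.start()
    camera_loop(q, n_frames)  # producer runs on the calling thread here
    worker.join()
    return results


frames_seen = run()
print(f"{len(frames_seen)} of 30 frames reached inference:", frames_seen)
```

With inference about three times slower than the frame interval, the consumer processes roughly every third frame and skips stale ones, while the camera loop keeps its 30 FPS cadence.
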

Deployment

Native Mobile SDK

Exported to a mobile-optimised runtime, quantised to 8-bit where precision allowed, and wrapped in a clean SDK surface the client's iOS and Android engineers could integrate in a sprint.
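
The source only says weights were quantised to 8-bit "where precision allowed". To show what that step does, here is one common scheme, symmetric per-tensor int8, in plain Python. It is an illustration, not the project's export path: production runtimes typically add per-channel scales and calibration data, and the per-layer decision about where precision allowed it is not modelled here.

```python
def quantise_int8(weights):
    """Symmetric per-tensor int8 quantisation: w ≈ scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
print(q)  # [50, -127, 0, 127]
# Rounding error is at most half a quantisation step per weight.
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2)
```

Each weight then takes one byte instead of four (for float32), at the cost of a rounding error bounded by half of `scale`; layers whose accuracy can't absorb that error are the natural candidates to keep in higher precision.
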

Computer Vision · Pose Estimation · Action Recognition · On-Device ML · Mobile SDK · Model Quantisation · MLOps

AI2SEE · Proven in weeks, not years

Building something similar?

We work with teams shipping computer vision and on-device AI into real hardware constraints. Start with a free 30-minute scoping call — no pitch deck required.


We respond within 24 hours. First call is free.