
Flow — Frame Upload + ML Recognition

The flagship Sense flow. Mobile records video, slices frames, uploads them, and gets back ML detections.

Narrative

A field inspector drives a route. The mobile app records video and, every few meters, slices a frame. While still in the field, often on intermittent connectivity, the app uploads frames, commits their metadata, and confirms the track. The Sense Worker hands off to Vision, a Python pipeline that runs a Triton-based quality check, dispatches frames to per-customer external detectors over HTTP, accepts their results back via a push API, then clusters bounding boxes into real-world objects and persists everything to ClickHouse. Map matching runs in parallel, decoupled from frame recognition.

The flow has four mobile-driven gRPC calls and two async fan-outs (frame recognition + map matching) that run concurrently in Vision; a client-side sketch follows the list:

  1. CreateTrack — open a track session.
  2. UploadFrameMetadataBatch — reserve N frames, get N presigned S3 URLs, upload bytes directly to S3.
  3. CommitFrameBatch — once bytes are in S3, send the metadata.
  4. ConfirmTrack — declare the track done; Sense Worker emits FrameRecognitionRequest per frame and TrackMatchingRequestEvent per track. Vision picks both up.
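
A minimal client-side sketch of the four calls, written in Python for readability. The stub module (road_data_pb2 / road_data_pb2_grpc), message names, and field names are assumptions; the real contract is the Sense proto served by RoadDataServiceImpl.

```python
# Hypothetical client-side sketch of steps 1-4. The stub module, message names,
# and field names are assumptions; the real contract lives in the Sense protos.
import grpc
import requests  # plain HTTP PUT against the presigned S3 URL

from road_data_pb2 import (CommitFrameBatchRequest, ConfirmTrackRequest,
                           CreateTrackRequest, UploadFrameMetadataBatchRequest)
from road_data_pb2_grpc import RoadDataServiceStub


def upload_track(stub: RoadDataServiceStub, frames: list) -> None:
    # 1. Open a track session.
    track = stub.CreateTrack(CreateTrackRequest())

    # 2. Reserve N frames; the response carries N presigned S3 URLs.
    reservation = stub.UploadFrameMetadataBatch(
        UploadFrameMetadataBatchRequest(track_id=track.track_id,
                                        frame_count=len(frames)))

    # Upload bytes directly to S3 -- the API never touches image data.
    for frame, slot in zip(frames, reservation.slots):
        requests.put(slot.presigned_url, data=frame.jpeg_bytes, timeout=30)

    # 3. Bytes are confirmed in S3; now commit the frame metadata.
    stub.CommitFrameBatch(CommitFrameBatchRequest(
        track_id=track.track_id, frames=[frame.metadata for frame in frames]))

    # 4. Declare the track done; Sense Worker fans out recognition and matching.
    stub.ConfirmTrack(ConfirmTrackRequest(track_id=track.track_id))


stub = RoadDataServiceStub(grpc.insecure_channel("sense-api:50051"))
```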

Sequence

The frame-recognition and track map-matching legs share no Kafka topics. They run concurrently from ConfirmTrack onward and only re-converge inside ClickHouse. They're shown as two diagrams below because that's how they actually run in production. Solid arrows are Kafka messages; dashed arrows are HTTP / gRPC calls.

Mobile → Sense Backend (steps 1–4)

High-level overview

Mobile → Sense Backend high-level flow

Detailed sequence

Mobile → Sense Backend detailed sequence

Vision: frame recognition (per frame)

High-level overview

Vision frame recognition high-level flow

Detailed sequence

Vision frame recognition detailed sequence

Vision: track map matching (per track, parallel)

This leg is triggered by the same ConfirmTrack that started recognition but is fully decoupled: Clusterization does not wait for matched coordinates, and is_map_matched=true on axion_sense.frames / axion_sense.tracks is set by the Sense Worker after consuming the result, not by Vision.

Vision track map matching flow

Both ends of the matching loop publish to the same axion.sense.track.metadata topic with track_id as the partition key. This guarantees per-track ordering (TRACK_MATCHING_RESULT lands on the same partition as the request) and lets Sense Worker and Vision Matching coexist on one topic via the message-type header filter.
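
A consumer-side sketch of that filter, assuming confluent_kafka, a "message-type" header carrying the event name, and a placeholder handler; the actual header key and consumer-group names may differ.

```python
# Sketch of the header filter on the shared topic, using confluent_kafka. The
# header name, group id, and handler are assumptions.
from confluent_kafka import Consumer


def handle_matching_result(track_id: bytes, payload: bytes) -> None:
    ...  # placeholder: apply TRACK_MATCHING_RESULT to ClickHouse


consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "sense-worker-track-matching",
    "enable.auto.commit": False,
})
consumer.subscribe(["axion.sense.track.metadata"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    headers = dict(msg.headers() or [])
    # Sense Worker only reacts to results; Vision Matching, on the same topic,
    # only reacts to requests. Both key by track_id, so ordering holds per track.
    if headers.get("message-type") != b"TRACK_MATCHING_RESULT":
        consumer.commit(msg)
        continue
    handle_matching_result(msg.key(), msg.value())
    consumer.commit(msg)
```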

Key invariants

  • Detection rows in the shared axion_sense.detections table are written only from Vision Clusterization (clusterize_detections) — the push path. Vision Worker writes only image-quality verdicts and dispatch-error rows. There is no Kafka roundtrip of results back to Sense Backend; Sense reads detections directly from the same table.
  • Vision Worker dispatches frames to external detectors but never receives their results via Kafka. Results return strictly via the Vision Detections HTTP API.
  • A frame's chain through vision_frames_lifecycle ends as soon as either (a) the per-detector loop acks a successful HTTP dispatch with no further next_step_detectors, or (b) quality returns verdict=false. The chain resumes only when an external detector pushes results back via the Detections API.

Why this shape

Why mobile uploads frames directly to S3

The API is the bottleneck if you proxy bytes through it. Direct presigned PUT means the API never sees the image data — it just hands out URLs. This decouples upload throughput from API capacity entirely.
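
A minimal sketch of handing out a presigned PUT URL. The real API is C#; boto3 is used here for brevity, and the bucket name, key layout, and expiry are assumptions.

```python
# Presigned PUT sketch: the API only hands out URLs; the bytes go straight from
# mobile to S3. Bucket name, key layout, and expiry are assumptions.
import boto3

s3 = boto3.client("s3")


def presign_frame_upload(track_id: str, frame_id: str, expires_s: int = 900) -> str:
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "sense-frames",
                "Key": f"frames/{track_id}/{frame_id}.jpg"},
        ExpiresIn=expires_s,
    )

# Mobile then uploads directly: requests.put(url, data=jpeg_bytes)
```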

Why two steps for "frame committed"

Reserving (UploadFrameMetadataBatch) and committing (CommitFrameBatch) are separate so the metadata write only happens after the bytes are confirmed in S3. If the app loses connectivity between the two, the bytes are still in S3 (orphaned but recoverable) and the metadata simply never lands — no half-committed state.

Why per-track Kafka key

All Kafka topics in this flow use track_id as the message key. That guarantees per-track ordering on the Worker side: the metadata → confirmed sequence for one track is processed in order, but tracks are processed in parallel.
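
A producer-side sketch of the keying rule, assuming confluent_kafka; the topic, payload, and header value shown are illustrative.

```python
# Keyed publishing sketch with confluent_kafka: every message for a track uses
# the same key, so it lands on one partition and is consumed in order.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka:9092", "enable.idempotence": True})


def publish(topic: str, track_id: str, payload: bytes, message_type: str) -> None:
    producer.produce(
        topic,
        key=track_id.encode(),   # same track -> same partition -> ordered
        value=payload,
        headers={"message-type": message_type},
    )


publish("axion.sense.track.metadata", "track-42", b"{}", "TRACK_MATCHING_REQUEST")
producer.flush()
```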

Why recognition is async

Recognition latency varies (model size, queue depth on each external detector, frame complexity). Putting it in the request path would couple our SLA to a fleet of third-party detectors. With Kafka between Sense and Vision, and HTTP-with-retries from Vision out to detectors, each detector scales independently and we get back-pressure for free.
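
A sketch of the Vision-to-detector dispatch with bounded retries, using requests with a urllib3 retry policy; the detector URL, payload shape, and retry budget are assumptions.

```python
# Dispatch sketch: POST the frame reference to the detector with bounded retries
# on transient failures. URL, payload, and retry budget are assumptions.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(total=5, backoff_factor=1.0,
              status_forcelist=[429, 500, 502, 503, 504],
              allowed_methods=["POST"])
session.mount("https://", HTTPAdapter(max_retries=retry))


def dispatch_frame(detector_url: str, frame_ref: dict) -> None:
    # Raises after the retry budget is exhausted; the caller records the error
    # (see the failure-modes section for what happens on a failed dispatch).
    resp = session.post(detector_url, json=frame_ref, timeout=30)
    resp.raise_for_status()
```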

Why results come back over HTTP, not Kafka

External detectors are run by partners and customers. Asking each one to operate a Kafka client is a deployment burden we can't reasonably impose. A push API (POST /api/v1/detections/{detector_name}) is universal, and Vision converts the push into the same LocationEstimationRequiredEvent the rest of the pipeline expects, so internal handling stays event-driven.
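
A sketch of the push ingress as a FastAPI route; the request model, the helper that republishes the internal event, and the event shape are assumptions, since the real schemas live in vision/common/schemas.

```python
# Sketch of the push ingress: a FastAPI route that accepts detector results and
# re-emits them as an internal event. Payload model, helper name, and the event
# shape are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class DetectionPush(BaseModel):
    frame_id: str
    track_id: str
    bboxes: list[dict]


async def publish_location_estimation_required(event: dict) -> None:
    ...  # placeholder for the internal event producer


@app.post("/api/v1/detections/{detector_name}")
async def push_detections(detector_name: str, body: DetectionPush) -> dict:
    # Convert the external push into the same event the rest of the pipeline
    # consumes, so internal handling stays event-driven.
    event = {"detector": detector_name, **body.model_dump()}
    await publish_location_estimation_required(event)
    return {"status": "accepted"}
```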

Why a separate quality stage

Most rejected frames (motion blur, glare, dark) shouldn't be sent to the more expensive external detectors. A Triton quality check up front turns the second stage into a function of "passed quality" frames only, and the verdict is persisted regardless so we can tune the threshold against historical data.
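
A sketch of the gate itself, with the scoring call, threshold, and persistence/dispatch helpers left as placeholders; only the control flow (persist the verdict regardless, dispatch only on pass) reflects the text above.

```python
# Quality-gate sketch. The scoring call, threshold value, and helpers below are
# placeholders; the verdict is persisted whether or not the frame passes.
QUALITY_THRESHOLD = 0.5  # tuned against the persisted historical verdicts


def run_quality_model(image: bytes) -> float:
    ...  # placeholder for the Triton inference call


def persist_quality_verdict(frame_id: str, score: float, verdict: bool) -> None:
    ...  # placeholder: verdict row written in every case


def dispatch_to_external_detectors(frame) -> None:
    ...  # placeholder: the expensive second stage


def process_quality_check(frame) -> None:
    score = run_quality_model(frame.image)
    verdict = score >= QUALITY_THRESHOLD
    persist_quality_verdict(frame.frame_id, score, verdict)
    if verdict:
        dispatch_to_external_detectors(frame)
    # Rejected frames stop here: no detector traffic is generated for them.
```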

Why map-matching runs on confirm

We need all the GPS samples before we can map-match correctly. If we tried to match each batch as it arrived, the matched track would jitter at batch boundaries. ConfirmTrack is the signal that the route is complete.

Why map-matching is decoupled from recognition

The two flows share no Kafka topics. Frame recognition uses recognition_requests / vision_frames_lifecycle / clusterization_requests; map matching uses axion.sense.track.metadata. They fan out concurrently from the Sense Worker on independent consumer groups and only re-converge inside ClickHouse. Clusterization never blocks on matched coords, and is_map_matched=true is set by the Sense Worker after consuming the matching result — Vision does not write that flag.

Failure modes

Each entry lists the failure, what happens, and how it recovers.

  • Presigned URL expires before mobile uploads (network outage). What happens: the PUT returns 403; mobile re-requests via UploadFrameMetadataBatch and re-uploads; the old object key is left unreferenced. Recovery: a lifecycle policy on the bucket cleans orphans (e.g., 7-day delete on frames/.../tmp/).
  • Mobile crashes between UploadFrameMetadataBatch and CommitFrameBatch. What happens: bytes are in S3, no metadata. Recovery: mobile resumes the track from local state and re-issues CommitFrameBatch; the Sense API is idempotent on (track_id, frame_id).
  • Mobile uploads frame N+1 to S3, then the app dies before commit. What happens: same as above; bytes are orphaned, no detection. Recovery: manual cleanup or the bucket lifecycle policy.
  • Kafka producer error in the API after the DB write. What happens: the API returns success but the downstream message is missing. Recovery: outbox-style retry; the API uses a transactional outbox or simply retries the publish, with Kafka linger.ms + acks=all and an idempotent producer.
  • Quality (Triton) unavailable. What happens: the vision_frames_lifecycle consumer for the quality group lags; no quality verdicts are produced. Recovery: Vision Quality fails fast at startup if vision.config.qualityDetector is not set; transient Triton outages back-pressure naturally on Kafka.
  • External detector returns an HTTP error or times out. What happens: Vision Worker logs every attempt to worker_external_api_calls and publishes a PredictionDoneEvent with ErrorInfo; process_prediction_done persists an error row in detections with no object_id linkage, so the frame appears in dashboards as "dispatch failed". Recovery: operators replay from worker_external_api_calls once the detector is healthy.
  • Detector pushes results to the Detections API but the API is down. What happens: the detector retries per its own policy; the push API is the single ingress for non-error results, with no Kafka backup path. Recovery: the Detections API runs ≥2 replicas behind the gateway, with bearer-token auth and per-request raw body logging via RawRequestLoggingMiddleware, so missed pushes can be reconstructed from detections_api_raw_requests.
  • Vision Worker / Quality / Clusterization crashes mid-batch. What happens: the Kafka offset is not committed; on restart, the priority polling loop reprocesses. Recovery: save_objects and save_frame_detections use DELETE + INSERT (atomic by primary key) on the shared axion_sense.detections table, so duplicate processing collapses to a single row (a sketch follows this list).
  • Valhalla / Map Matching API unavailable. What happens: Vision Matching produces a TrackMatchingResultEvent with error populated; the track stays is_map_matched=false. Recovery: CoverageSyncJob (Sense Worker) re-runs map-matching on a schedule for unmapped tracks.
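
A sketch of the replay-safe write pattern from the list above (DELETE + INSERT keyed by frame), assuming clickhouse_connect and illustrative column names.

```python
# Replay-safe persistence sketch: remove any prior rows for the frame, then
# insert the fresh ones, so a reprocessed Kafka batch leaves a single copy.
# Client setup and column names are assumptions.
import clickhouse_connect

client = clickhouse_connect.get_client(host="clickhouse", database="axion_sense")


def save_frame_detections(frame_id: str, rows: list[list]) -> None:
    client.command("DELETE FROM detections WHERE frame_id = %(frame_id)s",
                   parameters={"frame_id": frame_id})
    client.insert("detections", rows,
                  column_names=["frame_id", "object_id", "bbox", "confidence"])
```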

Observability

  • Every span carries track_id, org_id, request_id.
  • Mobile → API gRPC: span name = RPC method.
  • API → Kafka: span name = kafka.produce <topic>.
  • Worker consume: span name = kafka.consume <topic>.
  • Worker → ClickHouse: span name = clickhouse.insert <table>, with rows_inserted attribute.
  • The full trace from CreateTrack to the last detections insert is one trace tree in SigNoz, joined by request_id propagation.
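
A sketch of how a consumer step might attach those attributes with the OpenTelemetry Python SDK; the tracer name and message fields are assumptions, while the attribute keys and span name mirror the list above.

```python
# Span-attribute sketch with the OpenTelemetry Python SDK; tracer name and
# message fields are assumptions.
from opentelemetry import trace

tracer = trace.get_tracer("vision.worker")


def process_recognition_request(msg: dict) -> None:
    with tracer.start_as_current_span("kafka.consume recognition_requests") as span:
        span.set_attribute("track_id", msg["track_id"])
        span.set_attribute("org_id", msg["org_id"])
        span.set_attribute("request_id", msg["request_id"])
        # ... frame handling; child spans such as "clickhouse.insert <table>"
        # join the same trace via request_id propagation.
```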

Code references

Sense Backend (axion.sense.backend):

  • RoadDataServiceImpl (gRPC handlers): src/Axion.Sense.Api/Services/RoadDataServiceImpl.cs
  • Kafka producers registered in: src/Axion.Sense.Api/Program.cs
  • Worker consumers:
      • TrackMetadataBatchMiddleware: produces FrameRecognitionRequest and TrackMatchingRequestEvent
      • ProcessTrackMatchingResultOperation: applies TrackMatchingResultEvent to ClickHouse
      • CoverageSyncJob: Hangfire job for map-matching catch-up

Vision (axion.sense.vision):

  • Worker entry points: vision/worker/ (process_recognition_request, process_prediction_done, _process_external_recognition)
  • Quality: vision/quality/ (Triton client + process_quality_check)
  • Clusterization: vision/clusterization/ (estimate_location priority loop + clusterize_detections subscriber)
  • Matching: vision/matching/ (Valhalla / external map-matching backend)
  • Detections API: vision/api/ (FastAPI + RawRequestLoggingMiddleware)
  • Lifecycle event schemas: vision/common/schemas/lifecycle_events.py

For the canonical narrative, see:

  • Sense side: axion.sense.backend/docs/FrameUploadFlow.md
  • Vision side: axion.sense.vision/docs/AxionSenseVisionArchitecture.md