Computer Vision · May 15, 2026 · 6 min read

Computer Vision Deployment at Scale: Lessons from 200 Branches

Deploying a computer vision model in a lab is straightforward. Deploying it across 200 retail branches, each with different lighting, camera angles, network conditions, and hardware, is an entirely different engineering problem. This is what we learned building and operating a retail CV system at that scale.

Edge vs. cloud: it's not a simple choice

The edge-vs-cloud decision is usually framed as a binary, but in practice it's a spectrum. Our system runs inference on edge devices (NVIDIA Jetson units at each branch) and sends aggregated analytics to a central cloud service. Raw video never leaves the store; that's a hard requirement for both privacy and bandwidth.

Edge inference solves latency and bandwidth problems but introduces a fleet management problem. You now have 200 devices that need model updates, OS patches, health monitoring, and remote diagnostics. We built a lightweight agent on each device that reports health metrics every 30 seconds and pulls model updates during off-hours. When a device goes dark, we know within two minutes.
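The "dark within two minutes" check can be sketched on the central side as a comparison between each device's last heartbeat and the current time. This is a minimal illustration, not our production agent: the device IDs, the `last_seen` mapping, and the function name are hypothetical, and only the 30-second interval and two-minute window come from the description above.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_INTERVAL = timedelta(seconds=30)  # reporting cadence from the text
DARK_AFTER = timedelta(minutes=2)           # detection window from the text


def find_dark_devices(last_seen, now):
    """Return IDs of devices whose last heartbeat is older than DARK_AFTER."""
    return sorted(dev for dev, ts in last_seen.items() if now - ts > DARK_AFTER)


# Hypothetical fleet snapshot: device ID -> time of last heartbeat.
now = datetime(2026, 5, 15, 12, 0, 0, tzinfo=timezone.utc)
last_seen = {
    "branch-001": now - timedelta(seconds=20),   # healthy
    "branch-002": now - timedelta(seconds=150),  # missed several heartbeats
    "branch-003": now - timedelta(seconds=95),   # late, but inside the window
}
dark = find_dark_devices(last_seen, now)
print(dark)
```

With a 30-second heartbeat, a two-minute window tolerates a few dropped reports before alerting, which keeps flaky store networks from paging anyone.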

The key insight: edge deployment isn't about compute; it's about operations. Your model serving code is maybe 15% of the codebase. The rest is device management, update orchestration, failure recovery, and monitoring.

Latency requirements are branch-specific

Not all 200 branches have the same latency requirements. High-traffic stores need near-real-time product interaction tracking, under 200 ms per frame. Lower-traffic locations can tolerate batch processing every few seconds. We built a tiered processing pipeline: a fast path for real-time detections (customer presence, product picks) and a slower path for analytics aggregation (heatmaps, dwell time).
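One way to picture the tiering is as a routing table plus a per-tier latency budget. The sketch below is illustrative only: the task names, tier labels, and function names are assumptions made for the example, and the only figure taken from the text is the 200 ms per-frame budget for high-traffic stores.

```python
# Hypothetical task names mapped to the two paths described above.
FAST_PATH_TASKS = {"customer_presence", "product_pick"}
SLOW_PATH_TASKS = {"heatmap", "dwell_time"}

# Per-frame budget in milliseconds; None means the tier is batch-processed
# and has no per-frame budget. Tier labels are assumptions for this sketch.
FRAME_BUDGET_MS = {"high_traffic": 200, "low_traffic": None}


def route(task):
    """Return which path a task belongs to: 'fast' or 'slow'."""
    if task in FAST_PATH_TASKS:
        return "fast"
    if task in SLOW_PATH_TASKS:
        return "slow"
    raise ValueError(f"unknown task: {task}")


def over_budget(tier, frame_latency_ms):
    """True when a tier has a per-frame budget and this frame exceeded it."""
    budget = FRAME_BUDGET_MS[tier]
    return budget is not None and frame_latency_ms > budget


print(route("product_pick"), route("heatmap"))
print(over_budget("high_traffic", 250), over_budget("low_traffic", 250))
```

Keeping the budget in configuration rather than in code means a branch can be moved between tiers without redeploying the model.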

This matters because running every model at maximum speed on every device wastes power and generates unnecessary heat. In an enclosed retail environment, thermal management of edge hardware is a real constraint nobody warns you about. Two of our early Jetson deployments throttled themselves due to ambient temperature in equipment closets. We added thermal monitoring to our health checks after that.
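A thermal health check can be as small as reading the kernel's thermal zones, which Linux exposes under `/sys/class/thermal` with temperatures in millidegrees Celsius. The sketch below assumes that layout is present on the device; the 70 °C warning threshold and the function names are hypothetical and would need tuning per hardware.

```python
from pathlib import Path

WARN_C = 70.0  # hypothetical warning threshold, not a vendor figure


def read_zone_temps(root="/sys/class/thermal"):
    """Read every thermal_zone*/temp file and return {zone name: degrees C}."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            # The kernel reports millidegrees Celsius as a plain integer.
            temps[zone.name] = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that are unreadable or malformed
    return temps


def hot_zones(temps, warn_c=WARN_C):
    """Return only the zones at or above the warning threshold."""
    return {name: t for name, t in temps.items() if t >= warn_c}


# Example with hand-written readings so the sketch runs anywhere.
sample = {"thermal_zone0": 45.5, "thermal_zone1": 82.0}
print(hot_zones(sample))
```

Reporting the hot zones alongside the regular health metrics makes a warm equipment closet visible before the device starts throttling.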

Monitoring at scale

When you have 200 devices running inference, you need monitoring that's both aggregated and drill-down capable. We track three tiers:

System health: CPU/GPU temp, memory usage, disk space, network connectivity, uptime. These tell you if the hardware is okay.
Model health: Inference latency (p50, p95, p99), detection counts per hour, confidence score distributions. A sudden shift in the confidence distribution often means the camera moved or the lighting changed. That isn't a model problem, but it produces the same symptoms.
Business metrics: Detection accuracy validated against spot checks, false positive rates, coverage gaps. These are the metrics the client actually cares about.
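A confidence-distribution shift like the one mentioned under model health can be flagged by comparing a histogram of today's scores against a baseline. The sketch below uses total variation distance as the comparison, which is one reasonable choice rather than a description of our pipeline; the bin count, the 0.3 threshold, and the sample scores are all assumptions.

```python
def histogram(scores, bins=10):
    """Bin confidence scores in [0, 1] and return normalized frequencies."""
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    return [c / len(scores) for c in counts]


def total_variation(p, q):
    """Total variation distance between two distributions (0 = same, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def is_shifted(baseline, current, threshold=0.3):
    """Flag a shift when the distance exceeds a hypothetical threshold."""
    return total_variation(histogram(baseline), histogram(current)) > threshold


# Made-up scores: a healthy baseline and a day after the camera was bumped.
baseline = [0.91, 0.88, 0.93, 0.86, 0.95, 0.89, 0.92, 0.94]
today = [0.62, 0.58, 0.71, 0.66, 0.91, 0.55, 0.68, 0.63]
print(is_shifted(baseline, today))
```

A flag from this check only says the distribution moved. Telling a bumped camera from a genuine model regression still takes a look at the device.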

The hardest monitoring problem isn't detecting failures; it's detecting degradation. A camera that slowly drifts out of alignment over weeks will gradually reduce detection accuracy. If you only alert on sudden drops, you miss it. We run weekly accuracy audits against a sample of manual annotations to catch drift.
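The difference between a sudden drop and slow drift shows up clearly when weekly audit results are compared against a fixed baseline instead of against the previous week. The sketch below is an illustration under assumed numbers: the accuracy series, the four-week baseline, and both 0.05 thresholds are hypothetical.

```python
from statistics import mean


def sudden_drop(weekly_accuracy, threshold=0.05):
    """True if any week-over-week drop exceeds the threshold."""
    pairs = zip(weekly_accuracy, weekly_accuracy[1:])
    return any(prev - cur > threshold for prev, cur in pairs)


def gradual_drift(weekly_accuracy, baseline_weeks=4, threshold=0.05):
    """True if the latest week sits more than `threshold` below the baseline mean."""
    baseline = mean(weekly_accuracy[:baseline_weeks])
    return baseline - weekly_accuracy[-1] > threshold


# Made-up weekly audit accuracy for one camera that is slowly drifting.
audits = [0.94, 0.93, 0.94, 0.93, 0.92, 0.91, 0.90, 0.88, 0.87]
print(sudden_drop(audits), gradual_drift(audits))
```

In this series no single week loses more than two points, so a sudden-drop alert never fires, yet the camera ends up well below where it started.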

Failure handling

At 200 branches, something is always broken. On any given day, two or three devices are offline, one has a network issue, and one has a camera that's been bumped. The system has to tolerate this gracefully.

Our failure hierarchy: if a camera feed fails, the system logs the gap and continues processing other cameras at that branch. If the edge device fails, analytics for that branch stop, but the central dashboard clearly shows a data gap rather than reporting zero activity. If the central service fails, edge devices buffer locally, holding up to 48 hours of data, and sync when connectivity is restored.
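Two pieces of that hierarchy lend themselves to a short sketch: a local buffer that keeps at most 48 hours of records and drains once the central service is reachable, and a dashboard series that marks missing hours as gaps rather than zeros. The class, function, and record shapes below are hypothetical; only the 48-hour retention comes from the text.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=48)  # local storage window from the text


class LocalBuffer:
    """Holds (timestamp, payload) records, oldest first, within RETENTION."""

    def __init__(self, retention=RETENTION):
        self.retention = retention
        self._records = deque()

    def append(self, timestamp, payload):
        self._records.append((timestamp, payload))
        # Evict anything older than the retention window.
        while timestamp - self._records[0][0] > self.retention:
            self._records.popleft()

    def flush(self, send):
        """Send oldest first; stop at the first failure so nothing is lost."""
        sent = 0
        while self._records and send(self._records[0]):
            self._records.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._records)


def hourly_series(reports, hours):
    """None marks a data gap; 0 means the device reported zero activity."""
    return [reports.get(h) for h in hours]


start = datetime(2026, 5, 15, tzinfo=timezone.utc)
buf = LocalBuffer()
for h in range(61):  # 60 hours offline, one record per hour
    buf.append(start + timedelta(hours=h), {"detections": h})
buffered = len(buf)
sent = buf.flush(lambda record: True)  # stand-in for a successful upload
series = hourly_series({9: 14, 10: 0, 12: 7}, hours=range(9, 13))
print(buffered, sent, series)
```

Stopping the flush at the first failed send keeps records in order and means a connection that drops mid-sync costs nothing but time.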

The most common failure isn't hardware or software; it's environmental. Staff tape over cameras. Displays get moved in front of sensors. Someone unplugs the edge device to charge their phone. We learned to design for humans as the primary failure mode. That means tamper detection, cable lock recommendations in our installation guide, and clear labeling on every piece of hardware.
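For the taped-over-camera case, one simple tamper signal is that a covered lens produces a nearly uniform frame. The sketch below flags frames whose pixel variance is very low. It is a deliberately naive illustration and not the detector we shipped: the variance threshold and the sample pixel values are assumptions, and a real check would also need to handle dark stores after closing.

```python
from statistics import pvariance


def looks_covered(gray_pixels, max_variance=25.0):
    """True when a grayscale frame (flat list of 0-255 values) is nearly uniform."""
    return pvariance(gray_pixels) < max_variance


# Made-up samples: a lens covered with tape versus an ordinary shelf view.
covered_frame = [12, 13, 12, 14, 13, 12, 13, 12]
normal_frame = [10, 200, 90, 45, 230, 120, 60, 180]
print(looks_covered(covered_frame), looks_covered(normal_frame))
```

Requiring the condition to hold across many consecutive frames before alerting would avoid false alarms from someone briefly walking past the lens.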

What we'd do differently

If we were starting this project today, we'd invest more in remote diagnostics upfront. Being able to remotely view a camera's perspective (privacy-safe thumbnails, not full video) saves an enormous number of truck rolls. We'd also standardize camera hardware more aggressively: supporting three different camera models across the fleet creates a testing matrix that grows faster than you'd expect.

Scale doesn't just multiply problems. It introduces new categories of problems you never see at five or ten locations. Plan for operations from day one, not after the pilot.
