There you are, about to walk into your CFO’s office to ask for $10 Million for GPUs to deploy your Computer Vision initiative at scale. I bet you’re wondering how you got here. 

Most Computer Vision (abbreviated CV) projects start as proofs of concept, built in the controlled environment of a cloud-based development setup. That’s no small feat, since training models, building applications, and creating initial integrations can be huge cost drivers, but let’s assume the models are trained, the results look promising, and the business case seems solid. You’ve figured out a place where turning cameras into spreadsheets, instead of relying on someone manually recording information with a clipboard, makes a lot of business sense.

Then comes the gut punch: deployment at scale requires a massive investment in GPUs, either in the cloud (with hefty ingress and processing fees) or on-premises (with expensive hardware, power, and maintenance costs). Suddenly, what looked like a game-changing initiative is stuck in limbo, unable to deliver a return on investment.

Computer Vision is one of the most mature and promising fields in AI, with potential applications ranging from manufacturing automation to retail analytics. But if you’ve ever tried to deploy a CV initiative at scale, you already know that the industry has a dirty little secret: the costs don’t just add up—they explode.

 

 

[Image: hype cycle for artificial intelligence]

Why CV Projects Fail Before They Scale: The Computational Demands of Computer Vision

Graphics Processing Units (GPUs) have become integral to AI and CV applications because of their architecture, which allows for parallel processing. Unlike Central Processing Units (CPUs), which are optimized for sequential tasks, GPUs can handle many operations simultaneously. That makes them ideal for the parallelizable workloads common in deep learning and image processing (learn more here), which involve analyzing large amounts of image data and performing extensive matrix multiplications, the core operations in neural networks.

GPUs can handle thousands of computations in parallel, significantly accelerating model inference and training. This parallelism is crucial when, for instance, a single 1080p image comprises over 2 million pixels, and processing such detailed information requires substantial computational power. Moreover, advanced CV tasks often necessitate complex models, such as neural networks with numerous layers and parameters. These models are designed to recognize patterns, detect objects, and interpret scenes, but their complexity translates to increased computational requirements. 
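To make those numbers concrete, here is a quick back-of-the-envelope calculation in plain Python. The only assumptions are an uncompressed 3-byte RGB pixel and a 30 fps stream:

```python
# Simple pixel arithmetic for one 1080p camera -- a sketch, not a benchmark.
width, height = 1920, 1080
pixels_per_frame = width * height          # 2,073,600 pixels ("over 2 million")
bytes_per_frame = pixels_per_frame * 3     # ~6.2 MB uncompressed (3 bytes per RGB pixel)

fps = 30
bytes_per_second = bytes_per_frame * fps   # ~187 MB/s of raw pixels, per camera

print(f"Pixels per frame: {pixels_per_frame:,}")
print(f"Raw data rate:    {bytes_per_second / 1e6:.0f} MB/s per camera")
```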

However, this performance comes at a cost. GPUs can cost 22 times more than CPUs, both in terms of initial investment and operational expenses (learn more here). They consume more power and often require specialized cooling and maintenance, adding to the total cost of ownership. Additionally, the demand for GPUs in AI applications has surged, leading to supply constraints and further driving up prices.

An Architect’s Dream is a Budget’s Nightmare

In the interest of performance, the typical approach to Computer Vision assumes a 1:1 relationship between applications and GPUs. Every time a camera feeds video into a CV model, that data “has to” be processed in real-time, either by shipping it to the cloud for inference or by investing in powerful (and costly) edge hardware. This is commonly where projects fall apart. It’s in the best interest of the CV expert building the POC not to show the full deployment costs, because even a back-of-the-envelope estimate would have meant nobody accepted the project when it was proposed: GPU sticker shock would have stopped it before it began.

To compound this problem, a single CV application might perform several CV processes, each requiring a different model, and therefore ANOTHER GPU. Say you have a camera on a factory floor and want to perform automated Quality Control. You’ll need to find the object in the frame, so you build an object detection model. Then you want to know if it contains defects, so you compare it against a known-good reference version of the product. If it fails that comparison, you’ll want to know whether something is being done imprecisely by your machinery and it needs maintenance (which requires a dimension measurement model), or whether a flaw is being introduced and a floor supervisor needs to find the root cause (so you need a model for defect detection). If any of your data might pass through the EU or California, you’ll also need a model to detect faces and blur them for privacy compliance. And all of these run on every frame, streaming in 30 times a second.
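Here is a minimal sketch of what that fan-out looks like per frame. The function names and pipeline structure are hypothetical stand-ins for the five models described above, not a real implementation; the point is that a single frame can trigger up to five separate inference calls, 30 times a second, per camera.

```python
# Hypothetical per-frame QC pipeline. Each run_* function stands in for a
# separate model, each of which would claim its own GPU in a 1:1 design.

def run_object_detection(frame): ...        # find the product in the frame
def run_golden_sample_compare(frame): ...   # compare against a known-good reference
def run_dimension_measurement(frame): ...   # check machining precision / tolerances
def run_defect_detection(frame): ...        # classify the flaw for the floor supervisor
def run_face_blur(frame): ...               # blur faces for EU / California compliance

def process_frame(frame):
    if not run_object_detection(frame):     # model 1
        return None                         # nothing in view, nothing to report
    if run_golden_sample_compare(frame):    # model 2
        return {"status": "pass"}
    return {                                # failed comparison: figure out why
        "status": "fail",
        "dimensions": run_dimension_measurement(frame),   # model 3
        "defects": run_defect_detection(frame),           # model 4
        "redacted_frame": run_face_blur(frame),           # model 5
    }

# At 30 fps, process_frame() runs 30 times a second for every camera,
# and a single frame can touch all five models.
```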


[Image: computer vision projects]

So now I need five models on five GPUs…

The False Choice of Deployment Options

Many organizations begin their CV initiatives with a proof of concept that runs efficiently at a limited scale. When they scale the deployment up to multiple locations, however, they realize that the “yadda yadda” about finding efficiencies later was just a lie. You, as the stakeholder of the CV initiative, are then faced with two options for getting video data to your models:

  1. Cloud-Based GPU Processing: Leveraging cloud services provides on-demand access to GPUs, but this approach incurs costs for data ingress, processing, and storage (all marked up for the provider’s convenience). High-resolution video streams can lead to substantial data transfer fees, and continuous processing means expenses that escalate in lockstep with every camera you add.
  2. Edge-Based GPU Processing: Deploying GPUs on-premises or at the edge reduces data transfer costs and latency but requires a significant upfront investment in hardware. Additionally, managing and optimizing these systems necessitates ongoing maintenance and specialized expertise.

Either option makes large-scale deployment financially prohibitive and halts the initiative, as the anticipated return on investment diminishes when confronted with the realities of scaling.
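One way to see why both paths stall is to put the scaling into a tiny cost model. Every numeric value below is a placeholder assumption (replace them with your own vendor quotes); the sketch only illustrates that, under a 1:1 camera-to-GPU design, both cloud and edge costs grow linearly with camera count rather than flattening out.

```python
# Illustrative scaling model only -- every number is a placeholder assumption,
# not a quote from any provider or hardware vendor.

def cloud_monthly_cost(cameras, gpu_hour_rate=1.50, hours_per_month=730,
                       gb_transferred_per_camera=500, per_gb_fee=0.05):
    gpu_time = cameras * gpu_hour_rate * hours_per_month   # one GPU stream per camera
    transfer = cameras * gb_transferred_per_camera * per_gb_fee
    return gpu_time + transfer

def edge_monthly_cost(cameras, gpu_unit_cost=2500, amortization_months=36,
                      power_and_maintenance=50):
    hardware = cameras * gpu_unit_cost / amortization_months
    return hardware + cameras * power_and_maintenance

for n in (10, 100, 1000):
    print(f"{n:>5} cameras   cloud ~ ${cloud_monthly_cost(n):>10,.0f}/mo"
          f"   edge ~ ${edge_monthly_cost(n):>10,.0f}/mo")
```

Whichever column looks better, both grow in a straight line with camera count; that linear scaling is what the preprocessing approach below is meant to break.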

Cheaper Preprocessing to Reduce GPU Needs

The whole point of a business Filter (a CV application) is to turn cameras into spreadsheets; this is a form of what’s called semantic compression. A single uncompressed 4K video frame is roughly 24.9 million bytes (3840 × 2160 pixels at 3 bytes per pixel), but the relevant information held in it might be summarized in a JSON file using only 150 bytes. Eliminating irrelevant visual data is the fundamental task of Computer Vision for business.
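To ground that ratio, here is a sketch comparing the raw size of a 4K frame to a hypothetical JSON summary of the same scene. The JSON fields are invented for illustration; the arithmetic is just pixel math.

```python
import json

# One uncompressed 4K RGB frame: 3840 x 2160 pixels, 3 bytes per pixel.
raw_bytes = 3840 * 2160 * 3          # 24,883,200 bytes

# A hypothetical "camera into spreadsheet" summary of the same frame.
summary = {
    "ts": "2024-01-01T12:00:00Z",
    "camera": "line-3",
    "object": "widget",
    "qc": "fail",
    "defect": "scratch",
    "x": 812,
    "y": 440,
}
summary_bytes = len(json.dumps(summary).encode("utf-8"))

print(f"Raw frame:    {raw_bytes:,} bytes")
print(f"JSON summary: {summary_bytes} bytes")
print(f"Ratio:        ~{raw_bytes // summary_bytes:,}:1")
```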

[Image: cheaper video preprocessing]

Since it’s all fundamentally digital data, you could semantically compress all of the relevant information into a file size that’s equivalent to the tiny pink box in the bottom right corner.

To address these challenges, preprocessing raw video on less-expensive CPUs before it reaches the GPU can significantly reduce computational loads and associated costs. By ensuring that only relevant frames—those containing predefined critical events—are processed, businesses can decrease GPU workload and achieve substantial cost savings—up to 90% in some cases. Implementing such preprocessing strategies allows organizations to scale their CV applications more effectively, making deployments financially viable and operationally efficient.
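Here is a minimal sketch of that gating idea, assuming OpenCV (cv2) is available and using simple frame differencing as the cheap “is anything happening?” test. The threshold value and the send_to_gpu stub are placeholder assumptions; the point is that the inexpensive CPU check runs on every frame, while GPU inference runs only on the frames that matter.

```python
import cv2

MOTION_THRESHOLD = 0.02   # placeholder: fraction of pixels that must change

def send_to_gpu(frame):
    """Stand-in for the expensive inference call (cloud API or local GPU)."""
    pass

def process_stream(source=0):
    cap = cv2.VideoCapture(source)
    prev_gray = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (21, 21), 0)
        if prev_gray is not None:
            # Cheap CPU check: how much of the frame actually changed?
            delta = cv2.absdiff(prev_gray, gray)
            changed = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)[1]
            if cv2.countNonZero(changed) / changed.size > MOTION_THRESHOLD:
                send_to_gpu(frame)   # only "interesting" frames reach the GPU
        prev_gray = gray
    cap.release()
```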

If you want Vision at the Edge, you’ll need to leverage a combination of utilities to compress raw video into structured, efficient inputs (a short sketch of a couple of these steps follows the list):

  • Color Description: Uses lightweight color analysis to identify the frames where meaningful action is taking place, so only those frames move on for processing.
  • Image Transformation: Cropping and scaling frames down to just the critical components.
  • Event-Based Processing: Triggers analysis only when predefined events occur, rather than continuously.
  • Compression and Encoding Optimization: Ensures that the necessary frames are stored and transmitted in the most efficient format possible.
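As referenced above, here is a small sketch of the transformation and encoding steps, again assuming OpenCV. The region of interest and JPEG quality are placeholder assumptions; the idea is that a cropped, downscaled, well-encoded frame is a fraction of the raw one before any GPU ever sees it.

```python
import cv2

ROI = (600, 300, 640, 480)   # hypothetical region of interest: (x, y, width, height)
JPEG_QUALITY = 70            # placeholder trade-off between size and retained detail

def preprocess(frame):
    """Crop to the part of the frame that matters, downscale, and re-encode compactly."""
    x, y, w, h = ROI
    crop = frame[y:y + h, x:x + w]          # image transformation: crop to the component
    small = cv2.resize(crop, (320, 240))    # scale down to what the model actually needs
    ok, jpeg = cv2.imencode(".jpg", small,
                            [cv2.IMWRITE_JPEG_QUALITY, JPEG_QUALITY])  # encoding optimization
    return jpeg.tobytes() if ok else None
```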

By offloading these preprocessing tasks to inexpensive CPU-based systems, businesses can dramatically reduce GPU usage, cutting cloud processing fees, edge hardware requirements, or both. Most Computer Vision projects fail not because the models don’t work, but because the economics don’t work. By shifting the heavy lifting away from GPUs and toward lightweight preprocessing techniques, we make it possible to deploy CV applications at scale—without blowing your budget.

 
