Computer vision has the power to transform how businesses monitor, measure, and act on visual data. But before you dive into model training or deployment, it’s essential to understand the conditions that enable successful outcomes. At Plainsight, we follow six foundational principles, which we call “Rules of Thumb,” that help us and our customers determine whether a computer vision project is set up for success.

 

1. A model must have clearly defined, structured outputs, such as spreadsheet-ready fields, before training begins.

Before you build a model, define the outcome.

Can the customer articulate what they want to extract from the video stream? Do they know how they’ll use the outputs in downstream systems? If you can’t list what should be in a spreadsheet at the end of the pipeline, like object counts, duration, dimensions, or positions, then it’s too early for model development.

This rule ensures projects start with clarity and measurable objectives. Computer vision doesn’t generate “magic”; it maps visual input to specific data fields.

Ask: “If we gave you a CSV right now, what columns would it contain? How would you use it?”
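One way to make that question concrete is to write the schema down before any training starts. The sketch below is a minimal Python illustration, not Plainsight’s actual output format: the `DetectionRow` class and every field name in it are hypothetical, chosen only to mirror the examples above (counts, duration, dimensions, positions).

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DetectionRow:
    """One spreadsheet-ready row. All field names are hypothetical."""
    timestamp_s: float   # seconds from the start of the video stream
    object_label: str    # what was detected
    object_count: int    # how many were visible
    duration_s: float    # how long the object stayed in view
    width_px: int        # bounding-box width
    height_px: int       # bounding-box height
    center_x_px: int     # horizontal position of the box center
    center_y_px: int     # vertical position of the box center


def rows_to_csv(rows):
    """Serialize rows to CSV text, with a header taken from the schema."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DetectionRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


example = DetectionRow(12.5, "pallet", 3, 4.2, 240, 180, 960, 540)
csv_text = rows_to_csv([example])
print(csv_text)
```

If the customer can’t agree on a header row like this one, or can’t say which downstream system would read each column, that is a sign the project is not yet ready for model development.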

 

2. A model must be trained on large, diverse, real-world image sets, including edge cases.

A model is only as good as the data used to train it.

To build a reliable filter, you need hundreds to thousands of representative images that capture variations in the object’s appearance, lighting, angles, motion, and background environments. One or two “hero shots” aren’t enough. We need the gritty, real-world edge cases too.

Ask: “Can you give us image examples from bad lighting, motion blur, occlusions, and cluttered scenes?”
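A quick way to see whether a dataset covers those conditions is to count images per condition. The following is a rough sketch under stated assumptions: the tag names, the `(filename, tags)` manifest layout, and the `min_per_condition` threshold are all hypothetical, and the three-image manifest is a toy stand-in for the hundreds to thousands of images a real project needs.

```python
from collections import Counter

# Hypothetical condition tags, taken from the variations named above.
REQUIRED_CONDITIONS = ["bad_lighting", "motion_blur", "occlusion", "cluttered_scene"]


def coverage_gaps(manifest, min_per_condition):
    """Return conditions with fewer tagged images than min_per_condition."""
    counts = Counter(tag for _, tags in manifest for tag in tags)
    return {c: counts[c] for c in REQUIRED_CONDITIONS if counts[c] < min_per_condition}


# Toy manifest of (filename, tags) pairs, for illustration only.
manifest = [
    ("img_001.jpg", ["bad_lighting"]),
    ("img_002.jpg", ["motion_blur", "cluttered_scene"]),
    ("img_003.jpg", ["bad_lighting", "occlusion"]),
]
gaps = coverage_gaps(manifest, min_per_condition=2)
print(gaps)
```

Any condition that shows up in `gaps` is one to go back and ask the customer for.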

 

3. A model must detect objects that occupy at least 2% of the frame and appear in at least 20% of consecutive frames.

Two quick checks help determine whether an object is viable for real-time detection:

  • 2% Minimum Size: The object should occupy at least 2% of the total image frame.
  • 20% Frame-to-Frame Overlap: In video, the object should appear in at least 20% of consecutive frames.

Why? Smaller or inconsistently visible objects degrade detection accuracy and make tracking nearly impossible. This rule helps screen out use cases that are visually too small, fast-moving, or intermittent to be useful.

Ask: “How large is the object in the frame? Can we see it clearly for several frames in a row?”
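Both checks reduce to simple arithmetic, sketched below in Python. Two assumptions are baked in and may not match how the rule is applied in practice: “occupies” is approximated by bounding-box area, and the 20% check is read as the share of frames in a window where the object is visible (the “frame-to-frame overlap” wording could also be read as box overlap between neighboring frames). The function names are illustrative, not part of any Plainsight API.

```python
def area_fraction(box_w, box_h, frame_w, frame_h):
    """Fraction of the frame area covered by the object's bounding box."""
    return (box_w * box_h) / (frame_w * frame_h)


def visible_fraction(visible_flags):
    """Fraction of frames in a window where the object was visible."""
    if not visible_flags:
        return 0.0
    return sum(visible_flags) / len(visible_flags)


def passes_quick_checks(box_w, box_h, frame_w, frame_h, visible_flags,
                        min_area=0.02, min_visible=0.20):
    """Apply both rule-of-thumb thresholds (2% size, 20% of frames)."""
    big_enough = area_fraction(box_w, box_h, frame_w, frame_h) >= min_area
    seen_enough = visible_fraction(visible_flags) >= min_visible
    return big_enough and seen_enough


# A 1920x1080 frame has 2,073,600 pixels, so 2% of it is 41,472 pixels.
# The object is visible in 3 of these 10 frames (30%).
flags = [True, True, False, True, False, False, False, False, False, False]
print(passes_quick_checks(240, 180, 1920, 1080, flags))  # 43,200 px box -> True
print(passes_quick_checks(200, 200, 1920, 1080, flags))  # 40,000 px box -> False
```

The second call fails only on size: a 200 by 200 pixel object sounds large, but in a full HD frame it falls just under the 2% line.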

 

4. A model must only target patterns a trained human could reliably classify in under one second.

This one’s simple: If a well-trained human could accurately classify the object in under a second, chances are good we can train a machine to do it.

This rule serves as a practical gut check. It’s not about AI replacing humans; it’s about whether the visual distinctions are strong and repeatable enough to be learned programmatically.

Ask: “Would a person, given this footage, consistently spot and label the item in under 1 second?”

 

5. A model must report raw data, not insights, leaving interpretation to downstream analytics.

Computer vision filters are built to detect and measure things (size, color, location, duration, and counts), not to interpret them.

If the customer wants answers like “Was this a safety violation?” or “Is this person behaving unusually?”, that’s a job for an analytics or rules-engine layer that sits after the computer vision system. Don’t ask the model to make judgments. It should simply deliver structured observations from visual input.

Ask: “Are you asking for data (what happened), or for an insight (what it means)?”
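The split between the two layers can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the `Observation` fields, the `loading_dock` zone, and the five-second threshold are invented, and a real rules engine would be configured by the customer’s own policies.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Raw data from the vision layer. Field names are hypothetical."""
    object_label: str
    zone: str
    duration_s: float


def is_safety_violation(obs, restricted_zone="loading_dock", max_seconds=5.0):
    """Downstream rule: the judgment lives here, not in the model."""
    return (
        obs.object_label == "person"
        and obs.zone == restricted_zone
        and obs.duration_s > max_seconds
    )


observations = [
    Observation("person", "loading_dock", 12.0),
    Observation("person", "walkway", 30.0),
    Observation("forklift", "loading_dock", 45.0),
]
violations = [obs for obs in observations if is_safety_violation(obs)]
print(violations)
```

Because the rule is ordinary code outside the model, changing the policy (a different zone, a longer grace period) means editing a threshold, not retraining a filter.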

6. A model must only detect what is clearly visible in the frame, consistently and in real-world conditions.

This rule is as old as data science itself. If the information the customer wants isn’t visually available in the frame, clearly, repeatedly, and under real-world conditions, no model can conjure it.

Examples: Trying to measure fill level in a sealed metallic container? Detecting heat without thermal imagery? Counting objects behind walls? No go.

Ask: “Can you literally see what you want the model to detect? Every time?”

Closing Thoughts

These six rules save time, cost, and frustration. They’re not barriers; they’re reality checks that help qualify and scope a successful computer vision project. When these conditions are met, the likelihood of building a performant and valuable filter goes way up.

At Plainsight, we use these rules during discovery and solution design to ensure our partners are ready to go from vision to measurable results. If you are interested in building out your enterprise computer vision use case, let’s chat!
