Taking a Data-Driven and Model-Driven Approach to AI

It’s been decades now since organizations first began to explore artificial intelligence in earnest by determining priorities, making investments, and, of course, collecting lots of data. Despite the near-incalculable quantities of information at their disposal, most organizations haven’t taken what thought leader and Landing AI CEO Andrew Ng calls a truly data-centric approach. He suggests a lack of true data centricity may be to blame when organizations fail to realize the expected results from their AI investments.

“Enterprises know they need a wealth of data, but disappointing results show that many are not taking the right approach in putting that data to use,” says Plainsight’s Co-Founder and Chief Product Officer, Elizabeth Spears. “Even if they’re focused on data, enterprises are not what Ng would call ‘data centric.’ Plainsight’s vision AI solutions and platform make it simpler for organizations to achieve true visual-data centricity, generating new insights about their products, processes, and customers to drive innovation.”

What Is Data-Centric AI?

Ng describes data-centric AI as an approach to technology that “systematically engineer[s] the data needed to build a successful AI system.” Such an approach would see businesses place a new emphasis on the quality of their data (even if it means sacrificing quantity) and the consistency of their labeling. By improving the quality of their data, Ng contends, they’ll unlock the value of AI and potentially see the results they’ve been waiting for.

Common Data Problems

What’s keeping organizations from making the most of their data and realizing the impressive powers of AI? Ng suggests a number of data management mistakes and bad habits could be holding businesses back.

Labeling inconsistencies: Organizations in a range of industries (like manufacturing, agriculture, and pharmaceuticals) deploy models to recognize issues like product and packaging defects. Issues can arise, however, when the people who train these models have different definitions for what constitutes a defect. Ambiguous definitions can potentially confuse models and lead to inaccuracies and inconsistencies in their performance. For Ng, part of taking a data-centric approach to AI means ensuring the quality of data by eliminating any quirks that might thwart a model’s efforts.
Too much of a not-so-good thing: When it comes to data, more isn’t always better. Enterprises that place an emphasis on collecting data without simultaneously focusing on the quality and potential usefulness of that data are potentially wasting lots of time and resources. In certain instances, it’s better to have a slightly smaller amount of high-quality data than to amass more data and risk reducing the quality of the dataset.
Do-it-yourself curation: Organizations, Ng suggests, have been far too reliant on the know-how of individual data scientists for far too long. Historically, individuals have often been tasked with (or have taken it upon themselves to handle) identifying and rectifying data-related problems. Instead, a more systematic approach to data collection, curation, and management could lead to quicker and more dependable results.
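To make the labeling-inconsistency problem concrete, here is a minimal Python sketch of one way a team might surface ambiguous labels systematically rather than relying on one person to spot them. It is an illustration only, not a method from Ng or Plainsight: the function name `flag_inconsistent_labels`, the `min_agreement` threshold, and the sample image IDs and labels are all hypothetical.

```python
from collections import Counter


def flag_inconsistent_labels(annotations, min_agreement=1.0):
    """Return items whose annotators agree less than min_agreement.

    annotations maps an item ID to the list of labels that different
    annotators assigned to it (hypothetical data layout).
    The result maps each flagged item ID to (most common label, agreement).
    """
    flagged = {}
    for item_id, labels in annotations.items():
        if not labels:
            continue  # nothing to compare for unlabeled items
        top_label, top_count = Counter(labels).most_common(1)[0]
        agreement = top_count / len(labels)  # share of annotators picking the top label
        if agreement < min_agreement:
            flagged[item_id] = (top_label, agreement)
    return flagged


# Hypothetical labels from several annotators inspecting packaging images.
annotations = {
    "img_001": ["defect", "defect", "defect"],
    "img_002": ["defect", "ok", "ok"],
    "img_003": ["ok", "ok", "ok"],
    "img_004": ["defect", "ok"],
}

flagged = flag_inconsistent_labels(annotations)
for item_id, (label, agreement) in sorted(flagged.items()):
    print(f"{item_id}: majority label {label!r}, agreement {agreement:.2f}")
```

Items flagged this way are candidates for a second review or for a sharper written definition of what counts as a defect, which is the kind of systematic data work the list above describes.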

Data-Centric AI vs. Model-Centric AI

When AI experts and professionals discuss data-centric AI, they often juxtapose it against model-centric AI. This alternative approach to AI-enabled solutions and machine learning focuses on how changes to an AI model (updates to its algorithm, for example) can affect its results. Developers may prefer this approach because it sees them working to directly solve specific problems rather than poring over datasets to ensure their quality.

Ng isn’t the only one to argue that such an approach could find businesses struggling to make use of their AI models. Writing for Towards Data Science on Medium, Dario Radečić acknowledges that a model-centric approach is definitely more fun for developers. That’s probably why Ng claims on YouTube that an overwhelming portion of recent papers on machine learning have taken such an approach.

Radečić is just as emphatic as Ng, however, that a data-centric approach is preferable. It not only leads to higher-quality results but also ultimately takes less time than model-centric machine learning, which often requires laborious fine-tuning.

Plainsight Is Data- and Model-Driven

Taking the first step toward more data-centric AI is as simple as recognizing the immense value your data holds as well as the value inherent in curating it carefully. Managing visual data is no small feat and requires a dedicated expert with the appropriate skills and an effective toolset.

But centralizing and systematizing your organization’s approach to AI isn’t enough, nor is it enough to focus on updating and testing the models you deploy at the expense of data quality. Even better than a data-centric approach to AI is a data- and model-driven approach, one that focuses on both the quality of the data collected and the efficacy of the models this data is used to train.

A lifecycle wheel showing the 5 parts of the Plainsight vision AI platform: Collecting Data, Labeling Images, Training Models, Deploying Solutions and Operationalizing the Insights.

Ng argues that what he calls data-centric AI is the surest way to realize the value of AI-powered technology. What’s more, he concludes that what organizations ultimately need is not a large, multi-purpose AI system, but custom-built systems trained on their own data.

Plainsight’s experts arm enterprises with custom models trained on their visual data and a platform that allows them to scalably maintain model efficacy throughout the computer vision lifecycle. Some use cases include:

Precision Livestock Counting for attaining precise counts in real time and eliminating loss.
Tank Fill-Level Monitoring for remote measurement of liquids inside tanks without opening them.
Real-Time Leak Detection for spotting volatile organic compound leaks earlier to keep employees, communities, and the environment safe.
Product Defect Detection for recognizing minute imperfections in product and packaging, potentially stopping recalls before they occur.

Vision AI for Detecting Packaging Defects

Due to customer privacy, the videos above are for demonstration purposes only.

See More with Visual Data-Driven Models from Plainsight

Plainsight provides the unique combination of AI strategy, a visual data science toolset and deep learning expertise to develop, implement, and oversee transformative computer vision solutions for enterprises. Schedule a call to learn more.