Visual Synthetic Data Generation: a Paradigm Shift for Computer Vision


By Guglielmo Iozzia, Associate Director – Data Science, ML/AI, Computer Vision, MSD

Computer Vision (CV) is a branch of Artificial Intelligence (AI) whose use cases keep growing across different industries.

You can identify 2012 as the boom year for CV: that is when Neural Network algorithms (which had already existed for a while) started matching humans in several visual tasks. This progress was driven by multiple factors that converged around the same time: continuous improvement and falling production costs of computational hardware, the advent of the cloud, larger research investments in both academic and industrial environments, the open sourcing of novel models, and greater availability of data sources.

Manufacturing has been the slowest industry to adopt AI-based CV, but nowadays you can find many applications (in production or ready to go) for diverse tasks such as quality inspection, safety, robotics, packaging, and XR.

Unfortunately, as these systems are deployed to production, we often hit a brick wall because of one or a combination of the following issues:

  • Unreliable labelled data: labelling is typically done by a pool of SMEs who do not all have the same level of expertise in the required knowledge domain, which makes the assigned labels subjective.
  • Little effort from academia to get good data: most CV research has focused, and still focuses, mostly on algorithms.
  • Lack of generalization: many CV algorithms fail when applied to a domain different from the one they were originally trained on.
  • Last but not least, lack of data: all or part of the data needed to train a model for a given CV task may simply not be available.

The latter is a situation you often face in manufacturing use cases such as visual inspection or quality control.

For the reasons listed above, a new paradigm has emerged and is already being adopted by diverse organizations. The goal is not to create better algorithms but to increase performance by changing the data itself, taking more control of the data sourcing process. This can be achieved by balancing or enhancing existing datasets through synthetic data generation. The real goal here is not simply adding more data to training datasets but adding more variety of data (quality over quantity) to make models more robust. In other words, moving from more complex to smarter: instead of building larger models and using more computational power to solve problems, we can be smart about how we source the data from which algorithms learn. Algorithms don't need more of the same data to learn; they need a variety of everything.
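To make the idea concrete, here is a minimal sketch (my own illustration, not a prescribed recipe) of how synthetic images could be folded into an existing training set to rebalance an under-represented class. It assumes PyTorch and torchvision; the directory layout and paths are hypothetical.

# Minimal sketch: blending real and synthetic images to rebalance a training set.
# Assumes two ImageFolder-style directories ("data/real", "data/synthetic"),
# organized by class; both paths are hypothetical placeholders.
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

real = datasets.ImageFolder("data/real", transform=transform)
synthetic = datasets.ImageFolder("data/synthetic", transform=transform)

# Treat real and synthetic samples identically at training time.
combined = ConcatDataset([real, synthetic])

# Weight each sample inversely to its class frequency so rare classes
# (the ones the synthetic data is meant to enrich) are seen more often.
targets = [t for _, t in real.samples] + [t for _, t in synthetic.samples]
class_counts = torch.bincount(torch.tensor(targets))
weights = [1.0 / class_counts[t].item() for t in targets]
sampler = WeightedRandomSampler(weights, num_samples=len(targets), replacement=True)

loader = DataLoader(combined, batch_size=32, sampler=sampler)

The weighted sampler reflects the "variety over volume" argument: the rare, synthetically enriched classes are drawn more often, rather than simply adding more of the same majority-class images.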

Deep Generative Models are at the core of this new paradigm. These models usually hit the headlines only when they are used for malicious purposes, such as generating Deep Fakes, but in reality they also have several applications for good across diverse industries.
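As a tangible example, the sketch below samples candidate synthetic images from a pretrained text-to-image diffusion model. It assumes the Hugging Face diffusers library and a publicly available model; the model id and prompt are illustrative choices of mine, not an endorsement of a specific tool, and generated images would still need domain-expert review before entering a training set.

# Minimal sketch: sampling synthetic images from a pretrained text-to-image
# diffusion model (assumes the `diffusers` library is installed and a GPU is
# available; the model id and prompt are illustrative only).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "macro photo of a scratched stainless-steel surface, industrial lighting"
images = pipe(prompt, num_images_per_prompt=4, guidance_scale=7.5).images

for i, img in enumerate(images):
    img.save(f"synthetic_scratch_{i:03d}.png")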

There is already a market for this: dozens of startups with significant revenue offer services to generate synthetic data (images and videos in particular). Unfortunately, the majority of these companies focus on general knowledge areas such as automotive/autonomous driving, satellite or aerial imagery, and retail, while there is very little or no coverage for domains that require specific knowledge, such as manufacturing or chemistry.

With reference to CV, a truly scalable way of creating training data is through Virtual Reality (VR) engines. In terms of fidelity, their output has become indistinguishable from the real world, and this approach gives users full control of the scene, allowing them to generate smart data with which to better train algorithms. It not only overcomes the lack of required data but also reduces the time-consuming and expensive process of manual data labelling, since labels can be derived directly from the scene definition.
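To illustrate what full scene control means in practice, the sketch below uses Blender's Python API (bpy) as a stand-in for any scriptable 3D/VR engine; the object, randomization ranges, and output paths are placeholders, and the scene is assumed to already contain a camera. Because the scene is defined programmatically, the ground truth (here, the object pose) is known exactly for every rendered frame, which is what removes the manual labelling step.

# Minimal domain-randomization sketch using Blender's Python API (bpy) as an
# example of a scriptable 3D engine; meant to be run inside Blender, in a scene
# that already has a camera. Paths and ranges are placeholders.
import bpy
import json
import random

scene = bpy.context.scene
labels = []

for i in range(10):  # number of synthetic frames to render
    # Randomize the pose of the object of interest (a simple cube here).
    bpy.ops.mesh.primitive_cube_add(location=(
        random.uniform(-2.0, 2.0),
        random.uniform(-2.0, 2.0),
        0.0,
    ))
    obj = bpy.context.active_object

    # Randomize lighting so the downstream model becomes robust to it.
    bpy.ops.object.light_add(type='POINT', location=(
        random.uniform(-4.0, 4.0),
        random.uniform(-4.0, 4.0),
        random.uniform(2.0, 6.0),
    ))
    light = bpy.context.active_object

    # Render the frame; the label (object pose) is known exactly by construction.
    scene.render.filepath = f"//renders/frame_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
    labels.append({"image": f"frame_{i:04d}.png", "location": list(obj.location)})

    # Remove the randomized objects before the next iteration.
    bpy.data.objects.remove(obj, do_unlink=True)
    bpy.data.objects.remove(light, do_unlink=True)

with open(bpy.path.abspath("//renders/labels.json"), "w") as f:
    json.dump(labels, f, indent=2)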

But reality is much more than what the human eye can see, and many use cases are only understandable by humans with dedicated knowledge and expertise. The algorithms built and trained with data generated through VR-based engines are mostly focused on what a human can understand and label. In the future, we will probably be in a position to build algorithms for sensors that measure beyond human perception and/or for niche domains, but at present these tasks require a programmatic effort. There are already organizations pioneering this space for such challenging use cases; examples have been shared at major ML/AI conferences from the energy, semiconductor, chemistry, and pharma manufacturing sectors. Because of the lack of dedicated third-party services, an organization that wants to pursue a synthetic data generation strategy by itself has to:

  • Think out of the box.
  • Have a clear idea of the business value to achieve.
  • Once a value assessment has been completed, move fast in order to fail fast, get the answers needed, and succeed faster.
  • Be agile.
  • Don’t constantly reinvent the wheel: give existing papers and Open Source models/tools a chance.
  • Reduce technical debt: working on cloud infrastructure and using a dedicated ML/AI platform helps keep the focus on the specific business problem.
  • Most of all: invest in people, and attract and retain talent:
      • Domain experts, as they have the knowledge
      • Data Scientists and ML Engineers, as they can make ideas reality

This is definitely a challenging step, but I have good reasons to believe that the time for wide adoption of visual synthetic data is now.