Automated visual inspection systems give manufacturers the ability to monitor and respond to production issues in real time, reducing costs and improving quality. Today, most visual inspection systems consist of some type of image capture hardware and an integrated or discrete computer equipped with specialized software to process images. At the heart of this software is a computer vision algorithm that takes in the array of numbers that represents the image of the product, performs some mathematical operations on these numbers, and computes a final result. For example, the computer vision algorithm may determine that an entire product is defective, detect the type and location of a defect on a product, check for the presence of certain subcomponents, or measure the overall quality of finish.

In traditional machine vision systems, this computer vision algorithm is broken into two steps. In the first step, typically called feature extraction, a set of mathematical operations is performed on the raw pixel values of the image. For example, when searching for defects in an image of a product, the feature extraction step may consist of sliding a small window across the entire image and, at each window location, computing the contrast (the difference between the brightest and darkest pixel values) of the pixels within the window. This feature could be useful in making a final determination, because windows with higher contrast may be more likely to contain defects.
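As a minimal sketch of this feature extraction step, the Python function below computes the contrast at every window position. It assumes a grayscale image stored as a list of rows of integer intensities, square windows, and a stride of one pixel; the function name `window_contrasts` and the window size are illustrative choices, not taken from any particular machine vision product.

```python
from typing import List


def window_contrasts(image: List[List[int]], size: int = 3) -> List[List[int]]:
    """Return the contrast (max - min pixel value) for every size x size window.

    Hypothetical helper for illustration: assumes a rectangular grayscale image
    given as a list of rows, square windows, and a stride of one pixel.
    """
    rows, cols = len(image), len(image[0])
    features = []
    for r in range(rows - size + 1):
        row_features = []
        for c in range(cols - size + 1):
            # Gather every pixel inside the window whose top-left corner is (r, c).
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row_features.append(max(window) - min(window))
        features.append(row_features)
    return features


# A bright, uniform 4x4 surface with a single darker pixel at row 1, column 1.
image = [
    [100, 100, 100, 100],
    [100, 80, 100, 100],
    [100, 100, 100, 100],
    [100, 100, 100, 100],
]
print(window_contrasts(image, size=2))  # → [[20, 20, 0], [20, 20, 0], [0, 0, 0]]
```

Every window that overlaps the dark pixel reports the same contrast of 20, so the feature map localizes the anomaly to a neighborhood rather than to a single pixel.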


In the second and final processing step, the features computed in the first step are combined to make a final decision about the image. This decisioning step is often accomplished using a combination of manually tuned parameters or thresholds. For example, our computer vision algorithm may flag an image as defective if any window has a contrast greater than ten.
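The hand-tuned decision rule can be sketched in a few lines as well. This assumes the per-window contrasts have already been computed into a nested list, one inner list per row of window positions; the threshold of ten comes from the example above, while the name `is_defective` is hypothetical.

```python
from typing import List

# Manually chosen parameter, mirroring the "greater than ten" rule in the text.
CONTRAST_THRESHOLD = 10


def is_defective(contrasts: List[List[float]], threshold: float = CONTRAST_THRESHOLD) -> bool:
    """Flag the image if any window's contrast is strictly greater than the threshold."""
    return any(value > threshold for row in contrasts for value in row)


print(is_defective([[2, 3], [4, 25]]))  # → True: one window exceeds 10
print(is_defective([[2, 3], [4, 10]]))  # → False: 10 is not greater than 10
```

The entire decision rests on one number chosen by hand; nothing in the rule itself indicates whether 10, 15, or 20 best separates good parts from defective ones.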

Now, as you can imagine, this approach may work well in some cases but fail in others. Not every high-contrast region of an image represents a defect; a sharp edge or a printed marking can be perfectly normal. These types of errors often result in high false positive rates, where machine vision systems flag good products as defective. To mitigate these issues, some systems use many different types of features in an effort to make finer-grained distinctions. This approach can result in better performance, but it comes with a real cost. More features mean more parameters or thresholds to tune, making these systems difficult to adapt to changing conditions on the factory floor, even for the most experienced operators and engineers.

So this two-step approach of feature extraction followed by decisioning, at the heart of many machine vision systems, can in practice be very difficult to successfully deploy and maintain.

This is not just a problem in manufacturing: the two-step approach shows up in many other computer vision applications as well, and for decades researchers have been searching for a more robust and scalable way forward.


One interesting alternative approach is to replace our two-step pipeline with a single unified model that is capable of both extracting features from our images and decisioning. Of course, if we set out to engineer or design a unified model like this, we may end up right back where we started, with two distinct steps.

The real trick here is, instead of explicitly programming the unified model, to design it in such a way that it can learn from labeled data.
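To make "learning from labeled data" concrete, here is a deliberately tiny stand-in, not a deep network: a single-layer perceptron trained directly on raw pixel values. There is no hand-engineered feature and no hand-picked threshold; the weights and the bias are fitted to the labels. The 2x2 "images" (flattened to four pixels) and their labels are made up for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple


def predict(weights: List[float], bias: float, x: List[float]) -> int:
    """Return 1 (defective) if the weighted sum of pixels plus bias is positive, else 0 (good)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples: List[List[float]], labels: List[int],
                     max_epochs: int = 1000) -> Tuple[List[float], float]:
    """Classic perceptron learning rule on raw pixels; labels are 1 or 0."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            pred = predict(weights, bias, x)
            if pred != y:
                # Shift the decision boundary toward the correct side of this example.
                step = y - pred  # +1 or -1
                weights = [w + step * xi for w, xi in zip(weights, x)]
                bias += step
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: every example is classified correctly
            break
    return weights, bias


# Made-up data: good parts are uniformly bright; defective parts have one dark pixel.
samples = [
    [1.0, 1.0, 1.0, 1.0], [0.9, 1.0, 1.0, 0.95], [1.0, 0.95, 0.9, 1.0],  # good
    [0.2, 1.0, 1.0, 1.0], [1.0, 0.2, 1.0, 1.0], [1.0, 1.0, 0.2, 1.0], [1.0, 1.0, 1.0, 0.2],  # defective
]
labels = [0, 0, 0, 1, 1, 1, 1]

weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])  # → [0, 0, 0, 1, 1, 1, 1]
```

Because this toy data is linearly separable, the perceptron is guaranteed to stop making mistakes on it. Deep learning models stack many layers with nonlinear activations and train on far more labeled images, which lets the intermediate features be learned as well; this single-layer sketch only illustrates replacing hand-set thresholds with parameters fitted to labels.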


Unfortunately, for most of the history of computer vision, no one really knew how to accomplish this. Researchers came close in the 1980s and 1990s, developing computational models called neural networks; but even as recently as 2010, it really wasn't clear whether these models could solve the types of general computer vision problems we care about solving. [1]

Then, in 2012, researchers at the University of Toronto published breakthrough work showing, for the first time, that a neural network many layers deep could be successfully trained on a large-scale dataset (ImageNet). [2]

This unified learning approach is called deep learning, and over the last few years it has completely revolutionized the field of computer vision. For example, on the challenging ImageNet image classification benchmark, deep learning based approaches drove the top-5 error rate from roughly 26% (the best result in 2011) to less than 4% in 2015, surpassing published estimates of human performance across the challenge's 1000 distinct image classes.


For visual inspection applications, deep learning offers dramatic performance improvements over traditional feature extraction and decisioning methods. By learning from human-expert-labeled examples specific to the manufacturing problem at hand, deep learning models can emulate expert decisioning at large scale and high speed. Further, since both feature extraction and decisioning are learned, deep learning systems do not require endless manual tuning to adapt to changing conditions; they can instead be retrained on newly labeled examples.


Deep learning offers a powerful alternative to traditional machine vision approaches, and when deployed in the right applications, and on top of the right infrastructure, can deliver tremendous business value.

Watch a full presentation hosted by Vision Systems Design, or contact us to see how we can help improve your machine vision process.


[1] Remarkably, most of the key mathematics and ideas for deep learning were in place in the 1990s. The biggest missing ingredients were computing power and large labeled datasets. See: LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

[2] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems 25 (2012): 1097-1105.

This post was authored by Stephen Welch.