A Deeper Dive on Deep Learning OCR

You need to know everything there is to know about this new AI-based text reading tool, mainly because it can tell you so much about your products’ quality and status.

My colleague Jim Witherspoon caused quite a stir when he claimed in a recent blog post that deep learning OCR was one of the most significant recent advancements in vision technology. Most people don’t think of OCR as “advanced” in any way, shape, or form, except maybe advanced in age. (It’s over 50 years old.) However, deep learning OCR will ace tests that would stump conventional OCR. So, I argue that deep learning OCR tools should be as commonplace in manufacturing, warehousing, shipping, and receiving facilities as smartphones are in modern society.

Any facility where products and packaging must constantly be scrutinized before being cleared and sorted for onward movement could benefit from deep learning OCR tools that have been trained to home in on even the tiniest discrepancies in text engraved on the smallest chips, pills and components. Seeing inconsistencies in label data? Worried that the wrong information was printed on packaging? Deep learning OCR tools will tell you if your hunch is right, and they will do so in a split second.

That’s right: contrary to popular belief, it is possible to teach an old dog new tricks, and this old dog (OCR) can learn a lot of new tricks fast thanks to deep learning AI models. We’re talking minutes.

So, keep an open mind as you keep reading because deep learning OCR is NOT the traditional OCR you’re thinking of right now, and it won’t create the same problems that gave traditional OCR a bad rap over the years. In fact, it solves many, if not all, of the biggest challenges you’ve probably experienced with traditional OCR techniques. For example, deep learning OCR:

  • Does NOT need a lot of training time. There are now pre-trained models you can have up and running in 5-10 minutes. Just “unpack” the neural network, give it a few directions, and it will get to work.
  • Remains stable even as environmental settings such as lighting change.
  • Handles complex use cases like a pro, in part because the neural network is trained for industrial, clean room and supply chain scenarios where “complex” is normal. (We’ve seen the Aurora Deep Learning OCR™ neural network achieve up to 97% accuracy straight out of the box, even when dealing with very difficult cases.)
  • Eliminates the need for AI or machine vision “experts” to be on your payroll. Showing the pre-trained AI algorithm how to work in your environment is as easy as drawing a box around the characters in whatever you need it to inspect and letting the tool do the rest. Your team just sets the character height, minimum confidence score and match string to have things up and running. (There’s a minimal sketch of that kind of setup right after this list.) If you need to make a change, inspections can be rapidly adjusted on the fly to account for new printing methods or font changes.
  • Works on any smart camera or PC-based platform. It can be deployed on many different devices running many different operating systems: think Windows, Linux or Linux ARM embedded desktops and compact devices (like Raspberry Pi or Nvidia Jetson), Android handheld devices and, of course, smart cameras. It can run on a GPU or CPU. And you can take “full control over development and integration with other applications in C++ or .NET using Zebra’s Aurora Vision Library,” as my friend Donato Montanari has reminded the world on many occasions.
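
To make that “draw a box, set a confidence score and match string” workflow concrete, here’s a minimal sketch in Python. It uses the open-source EasyOCR library as a stand-in for a pretrained deep learning OCR engine; Zebra’s Aurora Deep Learning OCR has its own configuration tools, so treat the file name, region of interest, and threshold values below as illustrative assumptions rather than its actual interface.

```python
# Minimal sketch: a pretrained deep learning OCR model reading one region
# of interest, filtered by a minimum confidence score and a match string.
# EasyOCR is an open-source stand-in; the file name, ROI coordinates, and
# thresholds are illustrative assumptions, not Zebra's API.
import cv2
import easyocr

reader = easyocr.Reader(["en"])  # downloads/loads a pretrained model once

def inspect_label(image_path, roi, match_string, min_confidence=0.80):
    """Read the text inside roi = (x, y, width, height) and check it."""
    x, y, w, h = roi
    image = cv2.imread(image_path)
    crop = image[y:y + h, x:x + w]          # the "box you drew" around the text

    for _bbox, text, confidence in reader.readtext(crop):
        if confidence >= min_confidence and match_string in text:
            return True, text, confidence   # pass: expected string, high confidence
    return False, None, None                # fail: absent, wrong, or low confidence

ok, text, conf = inspect_label("carton.png", roi=(120, 60, 400, 80),
                               match_string="LOT 4821")
print("PASS" if ok else "FAIL", text, conf)
```

The specific library isn’t the point; the point is that the model arrives already trained, so “setup” amounts to pointing it at a region and deciding what counts as a pass.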

Think of it this way: deep learning OCR (at least the model Zebra offers) is akin to the brain of an engineer who has already been trained on hundreds of thousands of images and learned to accommodate different scenarios. That brain is ready to be put to work and make an immediate impact after a 5-10 minute debrief. Conventional OCR, on the other hand, is like asking a five-year-old kid to decipher what’s in front of them and detect “what’s wrong with this picture/phrase” with perfection even though they’re just learning to read. The kid may be able to understand the few letters and identify the types and colors of fonts they have seen, but that’s about it.

That’s why it’s hard to find faults with deep learning OCR techniques. It’s just so good at finding faults in text on the products and labels coming off the line, and not just because the alternative inspection method is a more rudimentary/conventional OCR tool.

With deep learning OCR, it doesn’t matter if the characters you ask the AI to read are obscured, damaged, engraved, embossed, custom to your company, reflective, on a curved surface, or appearing differently than the original training set because of lighting variances. It will tell you if something is present or absent, right or wrong, ready to go to the customer or needing to be pulled from inventory. And it will do so in milliseconds!

A Zebra partner did a demo at SPS Italia a little over a year ago to show how quickly deep learning OCR could read different types of markings, and the average execution time was 12 milliseconds. Honestly, though, it’s not unusual for execution times to be as low as 8-10 milliseconds even in what would typically be considered a “complex” scenario for traditional OCR.

Now, you might be wondering when deep learning OCR is the right inspection method and whether you should retire your conventional OCR systems and replace them with deep learning OCR.

There are plenty of examples of when the Zebra team advises customers to use deep learning for machine vision. Often, you’ll see deep learning OCR used in similar operating environments and workflows, but for slightly different purposes. For example, deep learning OCR can handle the following “challenges” with ease:

  • Reading identification, compliance, safety and other markings on vehicle tires
  • Test tube label and cap analysis
  • Blood pack label inspection
  • Waybill document reviews

Of course, end-of-line inspection, traceability of parts, and presence/absence are prime workflows where deep learning OCR can deliver value, as “getting it right” is important here. However, I walked through several other potential deep learning OCR applications, from easiest to hardest, in this recent webinar if you want a better feel for what it can do.

Honestly, here’s the best way to sum up when/where/how you should use deep learning OCR:

Whether you need to read best-before dates, serial numbers, lot numbers, vehicle identification numbers (VINs), or label symbology, deep learning OCR will tell you what you need to know: whether the correct components and parts are in the right place at that moment in time based on defined safety, compliance, and customer requirements.
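
To make that last step concrete, here’s a small sketch of how an OCR read might be checked against defined requirements. The field names and patterns are illustrative assumptions; in practice, the rules come straight from your safety, compliance, and customer specifications.

```python
import re
from datetime import date

# Illustrative validation rules; real rules come from your safety,
# compliance, and customer requirements.
RULES = {
    "lot_number": re.compile(r"^LOT\s?\d{4,6}$"),
    "vin": re.compile(r"^[A-HJ-NPR-Z0-9]{17}$"),  # VINs never contain I, O, or Q
    "best_before": re.compile(r"^(\d{4})-(\d{2})-(\d{2})$"),
}

def validate_read(field, text):
    """Return True if the OCR result for `field` meets the defined rule."""
    match = RULES[field].match(text.strip().upper())
    if not match:
        return False
    if field == "best_before":
        year, month, day = (int(g) for g in match.groups())
        return date(year, month, day) >= date.today()  # reject expired product
    return True

print(validate_read("lot_number", "LOT 4821"))          # True
print(validate_read("vin", "1HGCM82633A004352"))        # True
print(validate_read("best_before", "2023-01-01"))       # False (already expired)
```

Whether the check is a simple match string or a pattern like these, the OCR result either satisfies the requirement or triggers a reject, and it happens in milliseconds.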

Still On the Fence About Deep Learning OCR?

I know that you may have your doubts about AI and machine vision, as many business leaders are still trying to sort things out. (Only 40% of tier 1 auto suppliers, 35% of tier 2 auto suppliers, and 49% of automotive OEMs in Germany have embraced AI machine vision to date, according to this recently released study.) However, let me call out a few things that could help you twist someone’s arm or even convince yourself that deep learning OCR is the right investment – and a low-risk move – to make right now:

  • No one buys machine vision systems/smart cameras because they’re cool. They do it because they are super helpful. The same is true of deep learning OCR tools.
  • Speech is easy. It’s a natural learning process. However, reading is hard for humans. It takes years and a ton of effort to learn how to read! It’s also a modern cultural invention. So, you may in fact be offering relief to your workers by taking “reading” off their to-do lists, especially given how tedious reading tends to be in the package/product/parts inspection process. Plus, have you ever tried to decipher someone else’s handwriting? Yeah, not a confidence booster. Reserve that for the party games, not the loading dock or production/packing lines.
  • Machine vision is trying to mimic humans; it’s an abstraction of humans. It’s a camera trying to read like a human would. We know how hard it is for humans to read, and OCR is essentially a camera trying to read from a picture. Without deep learning algorithms to assist, OCR is going to stay at that elementary reading level forever. That’s why deep learning OCR is so different from conventional OCR (and so much more valuable). That said…
  • Traditional, teachable OCR works well if you’re trying to read a basic, standard image and the text it’s reading is what you’re expecting it to read, i.e., if everything is consistent and perfect. The problem is that we don’t live in a perfect world! Therefore, traditional OCR is going to struggle to be “perfect” when it’s challenged to read something that looks different than what it learned to read. If something is unexpected, it’s going to seize up. (Well, it won’t be that dramatic, but conventional OCR is likely going to cause some drama because it’s always going to tell you, “Stop introducing optical distortion! Don’t change the lighting, don’t change the font size, don’t change the contrast. I don’t like it. I can’t do what you want me to do. Just show me what you taught me to look for.”)

Now, I’ll play devil’s advocate for a moment because I know it’s hard to accept that change is necessary (and will cost you some amount of money). If you want to make conventional OCR work, you could absolutely reteach it what it needs to know and create a super-rich library of fonts, variations, etc., if you have the skills and the time. But what if the next item has a different background? How many times are you going to reteach conventional OCR what deep learning OCR has already learned to do? I mean, deep learning OCR works on color images, can read almost any text in any condition (including handwriting), and can be online within minutes, trained on your own CNN AI model if you want. You don’t have to train fonts or maintain libraries for deep learning OCR, either.
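
To make that contrast tangible, here’s a rough sketch using two open-source engines as stand-ins: Tesseract (via pytesseract) for the conventional, configuration-heavy route and EasyOCR for the pretrained deep learning route. The page-segmentation mode, character whitelist, and file name are illustrative assumptions tuned to one hypothetical label.

```python
# Rough contrast sketch: a conventional engine that needs per-job tuning
# vs. a pretrained deep learning engine used as-is. pytesseract and easyocr
# are open-source stand-ins; the config values and file name are assumptions.
import easyocr
import pytesseract
from PIL import Image

label = Image.open("lot_code.png")

# Conventional route: constrain the engine to this exact job (a single text
# line with a known character set) and hope the conditions never change.
conventional_text = pytesseract.image_to_string(
    label, config="--psm 7 -c tessedit_char_whitelist=LOT0123456789"
)
print("conventional:", conventional_text.strip())

# Deep learning route: a pretrained model reads the image as-is and reports
# its own confidence; no font training or character whitelist to maintain.
reader = easyocr.Reader(["en"])
for _bbox, text, confidence in reader.readtext("lot_code.png"):
    print(f"deep learning: {text} (confidence {confidence:.2f})")
```

The conventional call only behaves if the image keeps looking like the one it was tuned for; the deep learning call carries its training with it.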

So, while your conventional OCR setup is not obsolete, you do need to understand when it’s the right choice and when deep learning OCR is the better choice.

What I can say in one sentence is this: conventional OCR should be used when you want the camera to read ABC, you want to confirm it is reading ABC, and the text should be consistent. However, I explain all the criteria for making that determination in this 30-minute online demo, so I highly recommend watching it when you have a few minutes before you decide whether conventional OCR could work for you or if you need a deep learning OCR tool. (You’ll see I try to paint a very real picture of what could happen if you try to make conventional OCR “work.”)

The Takeaway

Many OCR tools require you to invest a lot of time in something that works in perfect conditions but too often struggles to work perfectly. The exception is deep learning OCR. It offers a flexible experience for you, your industrial imaging engineers and, quite frankly, anyone who is tired of misreads or missed reads and wants to fix things.

Plus, don’t you want an adult (rather than a kid) checking what you’re putting out into the market? And wouldn’t it be better if that adult had superhuman powers and could work at warp speed? That’s what I thought.

So, stop thinking that deep learning OCR isn’t for you because you’re exactly who deep learning OCR was trained to help. Take advantage of this old dog’s new tricks because those tricks will keep the human brain from playing tricks on you and costing you a lot of heartache and money.

This blog is contributed by Zebra Technologies.
