How to break down any AI problem?

FAQs for AI Zero → One (Episode #6)

Sandeep Uttamchandani
Jun 2, 2022


Transcript:

Hey, everyone. Welcome to episode number six of FAQs on AI, going from zero to one. Today's question is how to break down an AI problem into basic building blocks. Let me first explain why this is required and important. Consider an example: you're building an AI-based receipt scanner where your goal is not only to scan the receipts and map the right fields, but also to flag violations based on the policies you have within your company.
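To make the last step of that example concrete, here is a minimal sketch of policy flagging, assuming the receipt fields have already been extracted. The `Receipt` fields, the `CATEGORY_LIMITS` and `BANNED_CATEGORIES` tables, and the `flag_violations` function are all hypothetical illustrations of simple rule-based checks, not an actual company policy or the author's implementation.

```python
from dataclasses import dataclass


@dataclass
class Receipt:
    """Fields assumed to have already been extracted from a scanned receipt."""
    merchant: str
    category: str
    total: float
    sales_tax: float


# Hypothetical company policy: per-category spending limits and a banned category.
CATEGORY_LIMITS = {"meals": 75.0, "travel": 500.0}
BANNED_CATEGORIES = {"alcohol"}


def flag_violations(receipt: Receipt) -> list[str]:
    """Return human-readable policy violations for one receipt (empty if compliant)."""
    violations = []
    if receipt.category in BANNED_CATEGORIES:
        violations.append(f"category '{receipt.category}' is not reimbursable")
    limit = CATEGORY_LIMITS.get(receipt.category)
    if limit is not None and receipt.total > limit:
        violations.append(
            f"total {receipt.total:.2f} exceeds {receipt.category} limit {limit:.2f}"
        )
    # Sanity check: a tax larger than the total suggests an upstream extraction error.
    if receipt.sales_tax > receipt.total:
        violations.append("sales tax is larger than the total; extraction may be wrong")
    return violations


print(flag_violations(Receipt("Cafe", "meals", 92.50, 7.40)))
# prints ['total 92.50 exceeds meals limit 75.00']
```

Note that this step only works if the earlier steps produce trustworthy fields, which is exactly why the problem needs to be decomposed.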

Now think about it. Building an application that can do this actually involves multiple sub-tasks that you can decompose the problem into. Here is one simple way; again, you can make this more granular, but I wanted to keep it to three sub-tasks. The first task is text box detection: where is the text in the image? The second is OCR, optical character recognition: what do those characters say? Is it a number, a word, a single letter? The third, call it making sense of it: is this the sales tax? Is this the total? Is this an item price? And so on.
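The three sub-tasks can be sketched as a chain of functions, each with a narrow input and output. This is a minimal illustration under stated assumptions: `detect_text_boxes` and `recognize_text` are hypothetical stubs that read from a fake "image" dictionary standing in for a real detection model and OCR engine, and `label_field` is a deliberately simple keyword rule set, not a production classifier.

```python
import re
from dataclasses import dataclass


@dataclass
class Box:
    """A rectangular region (x, y, width, height) believed to contain text."""
    x: int
    y: int
    w: int
    h: int


def detect_text_boxes(image: dict) -> list[Box]:
    """Sub-task 1 (hypothetical stub): where is the text in the image?"""
    return image["boxes"]


def recognize_text(image: dict, box: Box) -> str:
    """Sub-task 2 (hypothetical stub): what characters are inside this box?"""
    return image["text"][(box.x, box.y)]


def label_field(text: str) -> str:
    """Sub-task 3: what does this line mean? Simple keyword rules, checked in order."""
    lowered = text.lower()
    if "tax" in lowered:
        return "sales_tax"
    if "total" in lowered:
        return "total"
    if re.search(r"\d+\.\d{2}$", text):  # ends in a price like 4.50
        return "item_price"
    return "other"


def scan_receipt(image: dict) -> list[tuple[str, str]]:
    """Chain the three sub-tasks: detect -> recognize -> interpret."""
    results = []
    for box in detect_text_boxes(image):
        text = recognize_text(image, box)
        results.append((label_field(text), text))
    return results


# Fake "image" holding pre-baked detector and OCR output so the sketch runs end to end.
fake_image = {
    "boxes": [Box(10, 10, 120, 12), Box(10, 30, 120, 12), Box(10, 50, 120, 12)],
    "text": {(10, 10): "Coffee 4.50", (10, 30): "Sales Tax 0.36", (10, 50): "Total 4.86"},
}
print(scan_receipt(fake_image))
```

Because each stage has its own interface, any one of them can be swapped (say, replacing the keyword rules with a trained model) without touching the others.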

Now, for each of these decomposed tasks, you need to keep a running tally. For each building block, start by asking: where is the leap-of-faith assumption? Where can you use standard off-the-shelf solutions that match your data needs and what you want to accomplish, versus areas where you will have to innovate, perhaps by decomposing the task further or by integrating a combination of techniques, from rule-based to model-based to ML, to solve the problem? In this example, text box detection is a well-defined topic. There are several tutorials out there, and I encourage you to check them out; they apply different techniques to work out where on the image the text is written.
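One way to keep that running tally is a small record per building block. The sketch below is a hypothetical structure: the `Approach` categories mirror the rule-based, model-based, and ML options mentioned above, and the example entries, assumptions, and `off_the_shelf_fits` values are placeholders you would fill in from your own evaluation, not findings from this episode.

```python
from dataclasses import dataclass
from enum import Enum


class Approach(Enum):
    OFF_THE_SHELF = "off-the-shelf"
    RULE_BASED = "rule-based"
    MODEL_BASED = "model-based"
    CUSTOM_ML = "custom ML"


@dataclass
class BuildingBlock:
    name: str
    leap_of_faith: str        # the assumption that must hold for this block to work
    approach: Approach        # the technique currently planned for this block
    off_the_shelf_fits: bool  # does an existing solution match our data needs?


def needs_innovation(blocks: list[BuildingBlock]) -> list[str]:
    """Blocks with no fitting off-the-shelf solution: candidates to decompose further or build."""
    return [b.name for b in blocks if not b.off_the_shelf_fits]


# Hypothetical placeholder entries; the real assessment depends on your data.
tally = [
    BuildingBlock("text box detection", "our receipts resemble tutorial datasets",
                  Approach.OFF_THE_SHELF, True),
    BuildingBlock("OCR", "receipt fonts are legible to a generic OCR engine",
                  Approach.MODEL_BASED, True),
    BuildingBlock("field interpretation", "labels like 'Total' are consistent across merchants",
                  Approach.RULE_BASED, False),
]
print(needs_innovation(tally))
# prints ['field interpretation']
```

Writing the leap-of-faith assumption down next to each block makes it easy to see which ones to validate first.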

When it comes to OCR, the second bit, it's interesting that when many folks think about optical tactile…


Sandeep Uttamchandani

Sharing 20+ years of real-world exec experience leading Data, Analytics, AI & SW Products. O’Reilly book author. Founder AIForEveryone.org. #Mentor #Advise