36.7 Document Understanding: OCR, Layout Analysis, and DocVQA
Right, so you want a machine to actually read a document, not just scan it. We’re past the point of plain Optical Character Recognition (OCR), which, frankly, is about as useful as a typesetter who hands you the text and throws away the font, the layout, and the coffee stains. Modern Document Understanding is the whole package: the OCR itself, the spatial awareness to make sense of how a page is arranged (that’s Layout Analysis), and the ability to answer questions about what’s on it (Document Visual Question Answering, or DocVQA). It’s the difference between getting a text file and getting an intern who actually understands the memo.
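To make the "text file versus intern" gap a little more concrete, here is a minimal sketch of what is lost when the layout is thrown away. Everything in it is an assumption for illustration: the `Word` record, the `plain_text` and `group_into_lines` helpers, the pixel coordinates, and the toy invoice are all made up, and the line-grouping rule is a naive heuristic, not how any particular layout model works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word plus its bounding box (hypothetical pixel coordinates)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def y_center(self) -> float:
        return (self.y0 + self.y1) / 2


def plain_text(words):
    """Text-only view: join words in whatever order they arrived, ignoring position."""
    return " ".join(w.text for w in words)


def group_into_lines(words, y_tol=5.0):
    """Naive layout-aware view: cluster words into lines by vertical position,
    then read each line left to right. y_tol is an assumed tolerance in pixels."""
    lines = []
    for w in sorted(words, key=lambda w: w.y_center):
        # Start a new line unless this word sits close to the previous word vertically.
        if lines and abs(w.y_center - lines[-1][-1].y_center) <= y_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    return [
        " ".join(w.text for w in sorted(line, key=lambda w: w.x0))
        for line in lines
    ]


# Toy input, deliberately out of reading order (illustrative data, not real OCR output).
words = [
    Word("Total:", 10, 100, 60, 112),
    Word("Invoice", 10, 10, 70, 22),
    Word("$42.00", 200, 101, 250, 113),
    Word("#1001", 80, 11, 130, 23),
]

print(plain_text(words))        # words in arrival order, no structure
print(group_into_lines(words))  # words regrouped into lines using their boxes
```

With the boxes kept, "Total:" and "$42.00" end up on the same line even though they arrived apart, which is the kind of spatial cue a question like "what is the total?" depends on. Real pages with columns, tables, and skew need far more than this heuristic.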