TLDR: This article explains how OCR extracts account and transaction data from bank statements uploaded as PDFs or images. OCR analyses pixel patterns to recognise characters and converts them into machine-readable data.
Bank statement data extraction from PDFs and images
We use optical character recognition (OCR) to extract the account information and transaction data from bank statements uploaded as an image or PDF. OCR works by analysing the pixels of the image, identifying patterns that form letters, numbers and other characters, and comparing these patterns with a database of known characters. The OCR software then assigns a corresponding value to each recognised character, essentially "reading" the text and converting it into machine-readable data.
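The pattern-matching idea described above can be illustrated with a deliberately tiny sketch. It is not how Thirdfort's OCR is implemented; production OCR engines use far more sophisticated models. Everything in it is hypothetical and exists only for illustration: the three-character `KNOWN_GLYPHS` "database", the 3x5 pixel grids, and the function names. Each scanned glyph is compared pixel by pixel with the known characters, and the closest match is taken as the recognised value.

```python
# Toy "database of known characters": each glyph is a 3x5 grid of pixels,
# where "#" is ink and "." is blank. (Hypothetical data for illustration.)
KNOWN_GLYPHS = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def pixel_distance(a, b):
    """Count the pixels that differ between two glyphs of equal size."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognise(glyph):
    """Return the known character whose pixel pattern is closest to `glyph`."""
    return min(KNOWN_GLYPHS, key=lambda ch: pixel_distance(glyph, KNOWN_GLYPHS[ch]))


def read_glyphs(glyphs):
    """Convert a sequence of glyph images into a machine-readable string."""
    return "".join(recognise(g) for g in glyphs)


# A slightly noisy "7" (one stray pixel) followed by a clean "1" and "0".
noisy_seven = ["###", "..#", ".##", ".#.", ".#."]
scanned = [noisy_seven, KNOWN_GLYPHS["1"], KNOWN_GLYPHS["0"]]
print(read_glyphs(scanned))  # prints "710"
```

The noisy "7" is still read correctly because it differs from the stored "7" by a single pixel, while it differs from the other known characters by many more. This also hints at why very blurry or low-resolution uploads can be harder to read accurately.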
This guide is for Thirdfort Clients using the new CDD platform. This article may not apply if you have not yet been migrated to the new platform or if you access Thirdfort via a partner or reseller.