Problem
Techfin Solutions has a partner in Europe that delivers a supply chain traceability solution for verifying claims made about products. This service enhances consumer trust by examining and verifying the documentation that supports claims such as fair trade, vegan, organic, recycled, etc. The company performed the document verification manually, an approach that demanded a significant amount of hands-on involvement from experts. The process was costly, slow and error-prone because it relied on a large number of manual tasks spread across many people. The company wanted a digital solution that automates a significant portion of this operation in order to reduce costs and improve accuracy.
Solution
We designed and implemented a GenAI-based solution for ingesting, versioning and processing the various types of documents involved in supply chains, such as invoices, purchase orders, certifications, product catalogues, bills of lading, lab reports, audit reports, etc. The system used the Google Vision API to perform layout-preserving OCR on documents that are not machine-readable, such as images and scans. The extracted text was then processed through the OpenAI API with tailor-made prompts in order to organize the information into an ontology, which served as the underlying data model. Other Natural Language Processing techniques, such as NER (Named Entity Recognition) and relationship maps, were used to supplement this process and improve accuracy. New algorithms were developed to infer column relationships in CSV files, which were the most common structured file type used in supply chains. Unstructured files such as PDF, plain text and Word documents were processed using vector database techniques.
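To give a feel for what inferring column relationships in a CSV file can mean, here is a minimal sketch. It is not the algorithm used in the project, which is not described in this post. It assumes one simple notion of a relationship, a functional dependency, where every value in one column always maps to the same value in another. The function name, the column names and the sample data are all hypothetical.

```python
import csv
import io
from itertools import permutations


def infer_column_dependencies(csv_text):
    """Return (a, b) pairs where each value of column a maps to exactly one value of column b.

    Hypothetical helper for illustration only.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    columns = list(rows[0].keys())
    dependencies = []
    for a, b in permutations(columns, 2):
        mapping = {}
        holds = True
        for row in rows:
            # Remember the first b-value seen for this a-value, then compare later rows to it.
            seen = mapping.setdefault(row[a], row[b])
            if seen != row[b]:
                holds = False
                break
        if holds:
            dependencies.append((a, b))
    return dependencies


# Made-up sample data, not taken from the project.
SAMPLE = """supplier_id,supplier_name,product_id,material
S1,Alpha Mills,P1,cotton
S1,Alpha Mills,P2,wool
S2,Beta Weavers,P3,cotton
"""

deps = infer_column_dependencies(SAMPLE)
for a, b in deps:
    print(f"{a} determines {b}")
```

In this sample, supplier_id and supplier_name determine each other, which hints that they describe the same entity. Note that a column with unique values (product_id here) trivially determines every other column, so a practical version would need extra evidence before treating such a pair as meaningful.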
When a batch of files was uploaded as input, the system was able to derive the supply chain configuration (product and supplier relationships), including fine-grained details about each supplier, product and material.
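As a rough illustration of what a derived supply chain configuration might look like, the sketch below groups already-extracted records into supplier-to-product and product-to-material relationships. The record fields, the function name and the sample values are assumptions made for this example. The real system stored this information in an ontology whose structure is not covered here.

```python
from collections import defaultdict


def build_supply_chain(records):
    """Group flat extracted records into supplier/product/material relationships.

    Hypothetical helper for illustration only.
    """
    supplier_products = defaultdict(set)   # supplier -> products it supplies
    product_materials = defaultdict(set)   # product -> materials it contains
    for rec in records:
        supplier_products[rec["supplier"]].add(rec["product"])
        product_materials[rec["product"]].add(rec["material"])
    # Sort the sets so the output is stable and easy to read.
    return {
        "supplier_products": {s: sorted(p) for s, p in supplier_products.items()},
        "product_materials": {p: sorted(m) for p, m in product_materials.items()},
    }


# Made-up records standing in for facts extracted from uploaded documents.
records = [
    {"supplier": "Alpha Mills", "product": "T-shirt", "material": "organic cotton"},
    {"supplier": "Alpha Mills", "product": "Scarf", "material": "wool"},
    {"supplier": "Beta Weavers", "product": "T-shirt", "material": "recycled polyester"},
]

config = build_supply_chain(records)
print(config["supplier_products"])
print(config["product_materials"])
```

Using sets while collecting means that the same fact appearing in several documents (for example, an invoice and a purchase order) is recorded only once.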
Results
The final system was tested against 500+ documents extracted from two European textile supply chains. The system delivered an accuracy higher than 95% in extracting content from structured files, while the accuracy with unstructured files was around 85%. False positives were kept below 1%. This led to a new process that eliminated manual verification to a great extent, resulting in lower costs, faster operation and fewer errors.
About The Author: Dileepa Jayathilake