Accelerating Operations with pdf2txt, GPT Summarization, and Azure OpenAI Service in the Open Source Ecosystem

In today's fast-paced world, businesses are constantly looking for ways to streamline their processes and accelerate decision-making. One such area is the evaluation and ranking of complex documents. This presentation will explore how leveraging open-source tools like pdf2txt, GPT summarization, and Azure OpenAI Service can help organizations automate and optimize their document evaluation processes while maintaining anonymity and confidentiality.

We will demonstrate a three-level approach that combines the power of GPT models with human evaluation criteria, allowing organizations to efficiently analyze and rank documents based on their relevance and quality:

  • Basic Matching and Ranking: In this level, we will use pdf2txt for document extraction and GPT models to match description criteria with items. The system will then rank the items based on their relevance, providing a basic analysis of the documents.
  • Incorporating Human Evaluation Criteria: Building on the basic matching and ranking, we will introduce human evaluation criteria that guide the GPT models in their analysis. This approach allows for a more nuanced and human-like understanding of the documents, enabling more accurate and meaningful ranking.
  • GPT Model Fine-tuning with Human-ranked Samples: In the final level, we will provide the GPT models with samples of ranked items evaluated by human experts. The models will learn from these samples and fine-tune their ranking capabilities. Once trained, the GPT models can then be applied to new sets of documents, resulting in a more accurate and efficient ranking process that closely mirrors human judgment.

This presentation will showcase how the combination of open-source technologies and innovative approaches can help organizations accelerate their operations and improve decision-making. Attendees will gain insights into the practical applications of these techniques and learn how to implement them in their own organizations to streamline processes and enhance efficiency.

Quick Info
Event Type
Is Topic
Target Audience
Developer, Power User, General User

Dr. Chung, NG

Dr. Chung is a SVP at Group CTO Office of the HKT/PCCW Group, where he’s responsible for leading the group’s product and technology roadmap and strategic development. He also represents the group as board members of Lynx Analytics and Bindo Labs.

Before HKT/PCCW, Chung contributed to the Big Data/AI strategy at Telstra as well as its international growth strategy. Prior to Telstra, Chung was an Associate Partner of Cluster Technology Limited which serves the Greater China market with professional services and solutions in high-performance computing, machine learning, big data, and public cloud.

In 2008, Chung joined McKinsey & Company in the Hong Kong office. He received his DPhil in Information Engineering from the University of Oxford and held the Croucher Foundation Scholarship to work toward his research degree in wireless ad-hoc networks. Chung also received BEng and MPhil in Information Engineering from the Chinese University of Hong Kong.

Country / Region
Hong Kong
Is Remote Presentation