LNshuti/imf-articlefour-tldr

International Monetary Fund (IMF) Article IV TL;DR

Scrape and summarize IMF Article IV reports by country, and build a chatbot to interrogate these reports for a given country and year. This project demonstrates a practical application of Large Language Models (LLMs) to complex knowledge work. We implement a document summarization application from first principles. We aim for a good user experience by focusing on low latency (fast responses) and accuracy (correct and relevant answers about specific technical reports).

Data

We use a combination of:

  1. PDF documents: official IMF Article IV reports downloaded directly from the IMF website.

  2. Finetuning datasets generated using GPT-3.5 Turbo (exact model name: gpt-3.5-turbo-0125). We generate 100 examples by running the following prompt over the ingested document:

Prompt:

Create a finetuning dataset with 100 examples from this document. The dataset has three columns: question, answer, context. The context should cite the paragraph the answer was extracted from. Return the dataset in jsonl format.

    {
        "question": "What percentage of Guyana's economy is accounted for by the oil sector?",
        "answer": "The oil sector accounts for over 60 percent of Guyana's economy.",
        "context": "After almost three years of exporting oil, Guyana’s oil sector accounts for over 60 percent of its economy, raising Guyana’s growth prospects for both its oil and non-oil economy."
    },
    {
        "question": "How has GDP per capita changed in Guyana since the start of oil production?",
        "answer": "GDP per capita increased from one of the lowest in the LAC region in the early nineties to US$18,342 in 2022.",
        "context": "After oil production commenced in end-2019, the Guyanese economy has tripled in size. GDP per capita increased from one of the lowest in the Latin America and Caribbean (LAC) region in the early nineties to US$18,342 in 2022."
    },
    {
        "question": "What is the projected oil production in Guyana by 2028?",
        "answer": "Oil production in Guyana is projected to almost triple by 2028, reaching about 1 million barrels per day.",
        "context": "Oil production is ramping up rapidly, supporting extraordinarily high real GDP growth. With the coming on stream of Liza-2, Guyana more than doubled its oil production in 2022, to about 400,000 thousand barrels per day (bpd). As a result, Guyana recorded the highest real GDP growth rate in the world, 62.3 percent. Going forward, oil production will almost triple by 2028 to about 1 million bpd as four new approved fields will come on stream by 2028."
    },
    {
        "question": "How does the Guyanese government plan to manage oil revenue?",
        "answer": "The Guyanese government plans to use oil revenues to support the growth of the non-oil economy, invest in human capital, and increase infrastructure resilience.",
        "context": "The government plans to use the oil revenues transferred to the budget to support growth of the non-oil economy, through human capital (building schools and hospital), increasing access to energy and lowering energy costs, investing in transport infrastructure (highway to Brazil, bridge to Suriname, bridge and deep water port, plus rehabilitation of the road and railway network) and in infrastructure to increase the economy’s resilience to natural disasters and climate change, and increasing value added exports (e.g. agribusiness)."
    },
    {
        "question": "What fiscal strategies is Guyana implementing to handle the oil revenue?",
        "answer": "Guyana has implemented a Natural Resource Fund to manage oil revenues, aiming to ensure fiscal sustainability and support national development.",
        "context": "The government has two sets of rules based on which it receives and saves a large share of oil proceeds to avoid building up rapid macro economic imbalances. One set of rules governs how oil proceeds are shared with the Exxon consortium. Since oil extraction requires large upfront investments, recoveries for exploration and development costs received by the private sector companies represent a high share of the oil receipts in the first years, reducing the portion of oil production considered as profit. Once costs are fully recovered, a larger portion of oil production will be considered profit and an increasingly large share of oil proceeds will be received by the government. The second set of rules are laid out in the Natural Resource Fund (NRF) Act, approved in 2019 and amended in 2021, which establishes a sovereign wealth fund, the NRF, to delink public spending from the volatility in natural resource revenues, to “ensure that natural resource revenues do not lead to a loss in competitiveness, to transfer natural resource wealth across generations fairly, and to use natural resources wealth to finance national development priorities and any initiatives aimed at realizing an inclusive green economy.” Withdrawals from the NRF to the budget are subject to annual ceilings and parliamentary approval, and can be used only to finance national development priorities and “essential projects that are directly related to ameliorating the effect of a major natural disaster.” In 2020 the first oil proceeds were received and deposited in an offshore account of the NRF, and in 2022 the first funds were transferred to the budget (just over 4 percent of GDP)."
    }
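
The prompt asks for JSONL, while the examples above are shown as comma-separated objects, so a small validation and conversion step can help before finetuning. The following is a minimal sketch, assuming each record is a dict with the three columns named in the prompt; the helper names `validate_record` and `to_jsonl` are hypothetical and not part of this repository.

```python
import json

REQUIRED_KEYS = ("question", "answer", "context")  # columns named in the prompt


def validate_record(record: dict) -> bool:
    """Return True if all three columns are present as non-empty strings."""
    return all(
        isinstance(record.get(key), str) and record[key].strip()
        for key in REQUIRED_KEYS
    )


def to_jsonl(records: list[dict]) -> str:
    """Serialize valid records as JSONL: one JSON object per line."""
    lines = [
        json.dumps({key: r[key] for key in REQUIRED_KEYS}, ensure_ascii=False)
        for r in records
        if validate_record(r)
    ]
    return "\n".join(lines)


examples = [
    {
        "question": "What percentage of Guyana's economy is accounted for by the oil sector?",
        "answer": "The oil sector accounts for over 60 percent of Guyana's economy.",
        "context": "After almost three years of exporting oil, Guyana’s oil sector accounts for over 60 percent of its economy, raising Guyana’s growth prospects for both its oil and non-oil economy.",
    },
    # Hypothetical malformed record: empty answer and no context, so it is dropped.
    {"question": "Incomplete record", "answer": ""},
]

jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

Dropping malformed records silently is a design choice for this sketch; raising an error instead may be preferable when the dataset is small enough to fix by hand.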

For affordable inference, we finetune nous-hermes/llama3-70B on 100 examples, and nous-hermes/llama3-7b and nous-hermes/mistral-7B-v3 on 1000 examples each.

Unit Testing and Evaluation: OpenPipe

We track the following three metrics:

Latency: the time the search application takes to return results after a query is submitted. This metric is crucial for user experience because users generally prefer faster responses.

Latency Components:

  • Query Time: The time taken to process the user's query.
  • Model Inference Time: The time taken by the model (e.g., nous-hermes/llama3-70B) to generate a response.
  • Data Retrieval Time: The time taken to fetch the relevant data from the database or search index.
  • Network Latency: The time taken for data to travel across the network.
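
The server-side components above can be timed separately with a small context manager. This is a minimal sketch, assuming a hypothetical `answer_query` pipeline whose retrieval and inference steps are placeholders rather than the real search index or model call. Network latency is not captured here; it would have to be measured on the client as the round-trip time minus the server-side total.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage (stage names are illustrative).
timings: dict[str, float] = {}


@contextmanager
def timed(stage: str):
    """Add the elapsed time of the enclosed block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


def answer_query(query: str) -> str:
    """Hypothetical pipeline: process the query, retrieve data, run inference."""
    with timed("query"):
        normalized = query.strip().lower()
    with timed("retrieval"):
        # Placeholder for a database or search index lookup.
        passages = [f"passage about {normalized}"]
    with timed("inference"):
        # Placeholder for a model call.
        response = f"Answer based on {len(passages)} passage(s)"
    return response


with timed("total"):
    result = answer_query("  Guyana oil production  ")

for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.3f} ms")
```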

Faithfulness: measured using a combination of recall and F1.

Recall: the fraction of the relevant answers in the context that the model actually returns. Higher recall means that the model misses fewer of the relevant facts contained in the IMF reports loaded by the application.

$$ \text{Recall} = \frac{\text{Number of Relevant Answers Retrieved}}{\text{Total Number of Relevant Answers in Context}} $$

F1: the harmonic mean of precision and recall, where precision is the fraction of returned answers that are relevant. It provides a single score that accounts for both false positives and false negatives.

$$ \text{F1 Score} = 2 \times \left(\frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\right) $$
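
As a worked example of the two formulas, the sketch below computes precision, recall, and F1 over sets of answer identifiers. The identifiers and the function name `precision_recall_f1` are hypothetical; how answers are matched to relevant items in practice is an assumption left outside this sketch.

```python
def precision_recall_f1(
    retrieved: set[str], relevant: set[str]
) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from sets of answer identifiers."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Four answers returned, three relevant answers in the context, two in common.
retrieved = {"a1", "a2", "a3", "a4"}
relevant = {"a1", "a2", "a5"}

precision, recall, f1 = precision_recall_f1(retrieved, relevant)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here precision is 2/4, recall is 2/3, and F1 is 4/7 (about 0.571), which sits between the two and closer to the lower value.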

  1. Inference/Deployment Stack: Modal Labs FastAPI App on NVIDIA H100s

    Gradio Space on Hugging Face

    Gradio with Groq for Ultra-Fast Inference?

We begin with prompt engineering, then build keyword, semantic, and hybrid search versions of the application. We implement a RAG (Retrieval-Augmented Generation) app on the same set of documents. Finally, we finetune the open source models nous-hermes/llama3-7B and nous-hermes/llama3-70B on our test datasets, and deploy the application using Gradio from Hugging Face.
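
One common way to combine a keyword ranking and a semantic ranking into a hybrid result is reciprocal rank fusion (RRF). The sketch below illustrates that technique only; it is an assumption, not necessarily how the hybrid version in this project is built. The passage identifiers and both input rankings are hypothetical, and `k = 60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document scores the sum of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by identifier for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical passage identifiers returned by each search variant.
keyword_ranking = ["guyana_2023_p4", "guyana_2023_p9", "guyana_2023_p2"]
semantic_ranking = ["guyana_2023_p4", "guyana_2023_p7", "guyana_2023_p9"]

hybrid = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(hybrid)
```

Because RRF uses only ranks, it avoids having to normalize keyword scores and embedding similarities onto a common scale.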

Using quantitative metrics and "vibes" (subjective evaluation of response quality), we systematically compare all the versions. The "winning" application, which returns the desired responses, is fast, and balances performance with cost, is deployed to production using Modal Labs. To use this application, click the link below.

References

  1. Modal Labs Text Generation Inference. https://github.com/LNshuti/modal-examples/blob/main/06_gpu_and_ml/embeddings/text_embeddings_inference.py