Financial Portfolio Information Extraction from SEC Filings

Overview

This use case demonstrates how you can use the GenAI Builder Snaps to extract specific details from a PDF that is densely packed with information.

Problem

SEC filings, especially for larger companies, contain vast amounts of detailed financial and business information. The sheer volume of data can make it difficult to locate specific pieces of information quickly. In general, financial reports often use complex financial terminology and industry-specific jargon, which can be difficult for the average reader to understand and navigate.

Moreover, the lack of complete standardization in how financial data is presented across different companies and industries poses an even greater challenge. For example, SEC filings are organized into various forms (such as the 10-K, 10-Q, and 8-K) and sections, which can be confusing for those unfamiliar with the structure. Because SEC filings often contain time-sensitive information, the pressure to meet strict deadlines can sometimes result in a less user-friendly presentation of data. Finally, SEC reporting requirements change over time, which can lead to inconsistencies in how information is presented across different years or reporting periods.

Solution

Developing a comprehensive AI-driven system that integrates with existing information can effectively address the challenge of analyzing SEC filings. This solution employs intelligent document processing to extract specific information from a sample SEC filing.

Understanding the solution

This use case demonstrates how to use the GenAI Builder Snaps to create an information extraction pipeline, integrated with a chatbot, that answers questions about a 10-K form SEC filing. This example consists of two pipelines:

  • Indexer: Loads data from the source document. This pipeline extracts text from a PDF file of a 10-K form, breaks it into chunks, generates embeddings for each chunk using the Amazon Titan Embedder Snap, and indexes the embeddings in OpenSearch for context retrieval.
  • Retriever: Retrieves data to answer questions. This pipeline connects to an LLM endpoint using Amazon Bedrock Snaps. It takes in a question, generates embeddings using the Amazon Titan Embedder Snap, queries OpenSearch for similar embeddings using the OpenSearch Query Snap, constructs a context with the Amazon Bedrock Prompt Generator Snap, and passes it to the Anthropic Claude on AWS Messages Snap to generate a response.
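The division of labor between the two pipelines can be illustrated outside SnapLogic with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_embed` is a bag-of-words stand-in for the Amazon Titan embedding model, the in-memory list stands in for the OpenSearch index, and the sample chunks are invented rather than taken from a real filing. It only shows the idea that the Indexer stores one embedding per chunk and the Retriever ranks stored chunks by similarity to the question.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Indexer role: store each chunk next to its embedding."""
    return [{"chunk": c, "embedding": toy_embed(c)} for c in chunks]


def retrieve(index, question, k=2):
    """Retriever role: return the k chunks most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda d: cosine(q, d["embedding"]), reverse=True)
    return [d["chunk"] for d in ranked[:k]]


# Invented sample chunks, not taken from an actual 10-K.
chunks = [
    "net sales increased during the fiscal year",
    "the company leases office space in several countries",
    "risk factors include supply chain disruption",
]
index = build_index(chunks)
top = retrieve(index, "what happened to net sales", k=1)
print(top)
```

In the actual pipelines, the retrieved chunks are not returned directly; they become the context that is passed to the LLM along with the question.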

Prerequisites

Tip:

You can download the pipeline from here:

Indexer pipeline configuration steps

In this example, we will load data into a vector database using the Indexer pipeline.



Note: The following steps assume you have already imported the pipelines or are building them as you go through the steps.
  1. In SnapLogic Designer, configure the File Reader Snap to read a source document for information extraction. In the File field, upload the source document (in this case, the Apple Inc. SEC Filing 10-K form).
  2. Add a PDF Parser Snap and, for the Parser type, select Text Extractor.
  3. Configure the Chunker Snap with the settings shown in the following image:
    Shows Chunker Snap configuration settings.

  4. In the Amazon Titan Embedder Snap, select the model in the Model Name field, and in the Text to embed field, enter $chunk.
  5. Configure the Mapper Snap with the following three mappings, shown in the image:
    Shows the Mapper Snap configuration settings.

  6. In the OpenSearch Upsert Snap, give the Index an appropriate name, such as apple-10k-index.
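To give a feel for what the chunking step produces before embedding, here is a minimal sketch in Python. It is not the Chunker Snap's implementation: splitting by character count, the `chunk_size` and `overlap` parameters, and their default values are all assumptions for illustration, since the Snap's actual settings appear only in the configuration image. The only detail taken from the steps above is the field name `chunk`, which the Amazon Titan Embedder Snap reads as `$chunk`.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split text into fixed-size character windows that overlap.

    chunk_size and overlap are hypothetical parameters, not Snap settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def to_documents(chunks):
    """Wrap each chunk in a record with a 'chunk' field for the embedder."""
    return [{"chunk": c} for c in chunks]


docs = to_documents(chunk_text("abcdefghij", chunk_size=4, overlap=1))
print(docs)
```

The overlap repeats a little text at each boundary so that a sentence cut by one chunk is still partly visible in the next, which tends to help retrieval when an answer straddles a boundary.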

Retriever pipeline configuration steps

In this example, we will retrieve data from the vector database to answer a question using the Retriever pipeline.



Note: The following steps assume you have already imported the pipelines or are building them as you go through the steps.
  1. In the JSON Generator Snap, enter the prompt.

    For example, in the JSON editor, enter the prompt, as shown below:

  2. Configure the Amazon Titan Embedder Snap with the same settings as the one in the Indexer pipeline.
  3. Configure the Mapper Snap as shown in the following image:
    Configuration settings in the first Mapper Snap.

  4. Configure the settings in the OpenSearch Query Snap, as shown in the following image:
    Configuration settings in the OpenSearch Query Snap.

  5. In the second Mapper Snap, configure the settings as shown in the following image:

  6. For the Amazon Bedrock Prompt Generator Snap, use the following prompt.
    Context: {{#data}}{{{.}}}{{/data}}
    Question: {{prompt}}
    Answer:
    Tip: Learn more about Mustache syntax in the following SnapLogic Community article:

    Using Mustache Templating with the Prompt Generator Snap in SnapLogic

  7. In the Anthropic Claude on AWS Messages Snap, select the model, and set the value in the Prompt field to $prompt.
  8. Validate the pipeline, then open the Data Preview for the Anthropic Claude on AWS Messages Snap.

    Result: The preview contains the answer, as shown below:
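To make the prompt template in step 6 concrete, the following Python sketch reproduces by hand what that specific Mustache template expands to. It is an approximation, not the Prompt Generator Snap's code or a general Mustache engine: the function name `render_prompt` is hypothetical, the template is assumed to have no leading indentation, and `html.escape` is used as a stand-in for Mustache's HTML escaping of `{{prompt}}`. The section `{{#data}}...{{/data}}` repeats its body once per list item, and `{{{.}}}` emits the current item without escaping.

```python
import html


def render_prompt(data, prompt):
    """Hand-expand 'Context: {{#data}}{{{.}}}{{/data}}' plus question lines.

    data:   list of retrieved text chunks (emitted unescaped, back to back)
    prompt: the user question (HTML-escaped, as double braces imply)
    """
    # The section body is only {{{.}}}, so items are joined with no separator.
    context = "".join(str(item) for item in data)
    return f"Context: {context}\nQuestion: {html.escape(prompt)}\nAnswer:"


# Invented example values for illustration only.
rendered = render_prompt(["First chunk. ", "Second chunk."], "What is in the filing?")
print(rendered)
```

Because the template places nothing between iterations, adjacent chunks run together unless each chunk ends with whitespace or the upstream Mapper adds a separator.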