Financial Portfolio Information Extraction from SEC Filings
Introduction
This use case demonstrates how to use the GenAI Builder Snaps to extract specific information from a densely packed PDF document.
Problem
SEC filings, especially for larger companies, contain vast amounts of detailed financial and business information. The sheer volume of data can make it difficult to locate specific pieces of information quickly. In general, financial reports often use complex financial terminology and industry-specific jargon, which can be difficult for the average reader to understand and navigate.
Moreover, the lack of complete standardization in how financial data is presented across companies and industries poses an even greater challenge. For example, SEC filings are organized into various forms (such as the 10-K, 10-Q, and 8-K forms) and sections, which can be confusing for those unfamiliar with the structure. Because SEC filings often contain time-sensitive information, the pressure to meet strict deadlines can result in a less user-friendly presentation of data. Finally, SEC reporting requirements change over time, which can lead to inconsistencies in how information is presented across different years or reporting periods.
Solution
Developing a comprehensive AI-driven system that integrates with existing information can effectively address the challenge of analyzing SEC filings. SnapLogic AgentCreator can streamline the process of extracting and processing data from filing forms through its intelligent integration and automation capabilities. This solution employs intelligent document processing to extract specific information from a sample SEC filing.
Understanding the solution
This use case demonstrates how to use the GenAI Builder Snaps to create an information extractor pipeline integrated with a chatbot that answers questions about an SEC 10-K form filing. This example consists of two pipelines:
- Indexer: Loads data from the source document. This pipeline extracts text from a PDF file of a 10-K form, breaks it into chunks, generates embeddings for each chunk using the Amazon Titan Embedder Snap, and indexes the embeddings in an OpenSearch vector database for context retrieval.
- Retriever: Retrieves data to answer questions. This pipeline connects to an LLM endpoint using Amazon Bedrock Snaps. It takes in a question, generates embeddings using the Titan Embedder Snap, queries similar embeddings with the OpenSearch Query Snap, constructs a context with the Amazon Bedrock Prompt Generator Snap, and passes it to the Anthropic Claude on AWS Messages Snap to generate a response.
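The division of labor between the two pipelines can be sketched in plain Python. This is only an illustration of the index-then-retrieve idea, not how the Snaps are implemented: the bag-of-words `embed` function is a toy stand-in for the Titan embedding model, the in-memory list stands in for OpenSearch, and the sample sentences are invented placeholder text rather than content from any filing.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Indexer role: store each chunk alongside its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Retriever role: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Invented placeholder chunks (not taken from a real filing).
chunks = [
    "Net sales increased during the fiscal year.",
    "The company leases office space in several cities.",
    "Research and development expense grew year over year.",
]
index = build_index(chunks)
top = retrieve(index, "How did net sales change?", k=1)
print(top)
```

The retrieved chunks play the role of the context that the Retriever pipeline hands to the prompt generator before calling the LLM.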
Prerequisites
- Access to an OpenSearch instance (download at https://opensearch.org/downloads.html) with an existing index. Alternatively, you can use a local cluster or AWS services to host an OpenSearch instance.
- Valid AWS account with access to foundation models on Amazon Bedrock, such as Amazon Titan.
- A source document to upload, such as the SEC Form 10-K for Apple Inc. for 2003.
- Download the patterns from the SnapLogic Public Pattern Library
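If you need to create the index yourself, the request body for a vector-enabled OpenSearch index might look like the sketch below. The field names `embedding` and `content` are hypothetical, and the dimension of 1536 is an assumption that matches Amazon Titan Embeddings G1 - Text; other Titan models return vectors of a different size, so check the model you select.

```python
import json

# Assumption: 1536 matches Amazon Titan Embeddings G1 - Text. Adjust this
# value to the output size of the embedding model you actually use.
EMBEDDING_DIMENSION = 1536


def knn_index_body(vector_field: str, text_field: str, dimension: int) -> dict:
    """Build an index-creation body with one k-NN vector field and one text field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {"type": "knn_vector", "dimension": dimension},
                text_field: {"type": "text"},
            }
        },
    }


# "embedding" and "content" are hypothetical field names for illustration.
body = knn_index_body("embedding", "content", EMBEDDING_DIMENSION)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of an index-creation request. Whatever field names you choose must match the ones the Mapper Snap produces in the Indexer pipeline.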
Indexer pipeline configuration steps
In this example, we load data into a vector database using the Indexer pipeline:
- In SnapLogic Designer, configure the File Reader Snap to read a source document for information extraction. In the File field, upload the source document - in this case, the Apple Inc. SEC 10-K form filing.
- Add a PDF Parser, and for the Parser type, select Text Extractor.
- Configure the Chunker Snap with the settings.
- In the Amazon Titan Embedder Snap, select the model in the Model Name field, and in the Text to embed field, enter $chunk.
- Configure the Mapper Snap with the following three mappings:
- In the OpenSearch Upsert Snap, give the Index an appropriate name, such as apple-10k-index.
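To make the chunking step concrete, here is a minimal sketch of splitting extracted text into overlapping, fixed-size chunks and wrapping each one in a document. It is not the Chunker Snap's actual algorithm: the character-based splitting, the `chunk_size` and `overlap` defaults, and the `id` and `source` fields are assumptions for illustration. Only the `chunk` key mirrors the `$chunk` field referenced in the Embedder step.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character chunks that overlap their neighbors."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def to_documents(chunks: list[str], source: str) -> list[dict]:
    """Wrap each chunk in a document; 'id' and 'source' are hypothetical fields."""
    return [
        {"id": f"{source}-{i}", "source": source, "chunk": chunk}
        for i, chunk in enumerate(chunks)
    ]


docs = to_documents(chunk_text("abcdefghij", chunk_size=4, overlap=1), "sample")
for doc in docs:
    print(doc)
```

Overlap keeps a sentence that straddles a chunk boundary visible in both neighboring chunks, at the cost of indexing some text twice.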
Retriever pipeline configuration steps
In this example, we retrieve data from the vector database to answer a question using the Retriever pipeline:
- In the JSON Generator Snap, enter the prompt.
For example, in the JSON editor, enter the prompt, as shown below:
- Configure the Titan Embedder Snap with the same settings as the one in the Indexer pipeline.
- Configure the Mapper Snap as shown in the following image:
- Configure the settings in the OpenSearch Query Snap, as shown in the following image:
- In the second Mapper Snap, configure the settings as shown in the following image:
- For the Amazon Bedrock Prompt Generator Snap, use the following prompt.
Context: {{#data}}{{{.}}}{{/data}} Question: {{prompt}} Answer:
Tip: Learn more about Mustache syntax in the following SnapLogic Integration Nation article: Using Mustache Templating with the Prompt Generator Snap in SnapLogic
- In the Anthropic Claude on AWS Messages Snap, select the model, and set the value in the Prompt field to $prompt.
- Validate the pipeline, then open the Data Preview for the Anthropic Claude on AWS Messages Snap. Result: The preview contains the answer.
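To see how the Mustache prompt in the Prompt Generator step expands, the sketch below renders that exact template with a hand-written function. It handles only the two constructs the template uses (a list section with `{{{.}}}` and an escaped `{{name}}` variable) and is not a full Mustache implementation or the Snap's own renderer. The sample context strings and question are invented.

```python
import html
import re

TEMPLATE = "Context: {{#data}}{{{.}}}{{/data}} Question: {{prompt}} Answer:"


def render(template: str, values: dict) -> str:
    """Render the small Mustache subset used by the prompt template above."""

    def expand_section(match: re.Match) -> str:
        items = values.get(match.group(1)) or []
        body = match.group(2)
        # Inside a list section, {{{.}}} is the current item, inserted unescaped.
        return "".join(body.replace("{{{.}}}", str(item)) for item in items)

    # Expand {{#name}}...{{/name}} once per list item.
    out = re.sub(
        r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", expand_section, template, flags=re.S
    )
    # Replace {{name}} with the HTML-escaped value, approximating Mustache escaping.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: html.escape(str(values.get(m.group(1), ""))),
        out,
    )


# Invented sample values for illustration.
rendered = render(
    TEMPLATE,
    {"data": ["First chunk. ", "Second chunk."], "prompt": "What is X?"},
)
print(rendered)
```

Each retrieved chunk is concatenated into the context, and the question follows it. The triple braces keep characters such as `&` and `<` in the filing text from being HTML-escaped.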