Extract data from unstructured document

This example pipeline demonstrates how to extract structured content from a raw, unstructured document using the Partition API and further segment it into different types of data: text, images, and tables.

  1. Configure the File Reader Snap to read the contents of the wildfire_stats.pdf file.
    On validation, the Snap displays the contents of the wildfire_stats.pdf file in binary format.
  2. Configure the Partition API Snap to segment and process the extracted data from the unstructured document. Set the Strategy to auto to generate output containing tables and text.
    On validation, the Snap displays the partitioned output. The unstructured data is segmented into structured components such as Title, NarrativeText, Table, Image, and others.

    Partition API Snap Configuration


    Partition API Snap Output

  3. Configure the Router Snap to route documents from the upstream Snap to different output views based on data type: table, image, or text.
    On validation, the Snap displays the three specified routes along with the corresponding data types.
    Router Snap Configuration

    Router Snap Output (Table)


    Router Snap Output (Image)


    Router Snap Output (Text)
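The routing in step 3 can be sketched outside SnapLogic as a small Python function. This is a minimal sketch, not the Router Snap's implementation. It assumes that each element returned by the Partition API arrives as a dictionary with a `type` field holding values such as Title, NarrativeText, Table, or Image. The function name `route_elements`, the route names, and the sample elements are hypothetical and for illustration only.

```python
from typing import Any

# Assumption: each partitioned element is a dict with a "type" key,
# for example {"type": "Table", "text": "..."}.
Element = dict[str, Any]

# Hypothetical mapping of element types to the table and image routes;
# every other type goes to the text route.
TABLE_TYPES = {"Table"}
IMAGE_TYPES = {"Image"}


def route_elements(elements: list[Element]) -> dict[str, list[Element]]:
    """Split elements into 'table', 'image', and 'text' routes by their type."""
    routes: dict[str, list[Element]] = {"table": [], "image": [], "text": []}
    for element in elements:
        element_type = element.get("type")
        if element_type in TABLE_TYPES:
            routes["table"].append(element)
        elif element_type in IMAGE_TYPES:
            routes["image"].append(element)
        else:
            # Title, NarrativeText, and any other types fall through to text
            routes["text"].append(element)
    return routes


# Illustrative input only; not taken from wildfire_stats.pdf
sample = [
    {"type": "Title", "text": "Example title"},
    {"type": "Table", "text": "col1 col2"},
    {"type": "Image", "text": ""},
    {"type": "NarrativeText", "text": "Example paragraph."},
]
routes = route_elements(sample)
print({name: len(items) for name, items in routes.items()})
# {'table': 1, 'image': 1, 'text': 2}
```

Because the three conditions are mutually exclusive, each element lands in exactly one route, which mirrors the three separate output views described in step 3.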

To successfully reuse pipelines:
  1. Download and import the pipeline into SnapLogic Platform.
  2. Configure Snap accounts, as applicable.
  3. Provide pipeline parameters, as applicable.