Partition unstructured document

This example pipeline demonstrates how to extract structured content from a raw unstructured document using the Partition API.

  1. Configure the File Reader Snap to read the contents of the unstructured-doc.pdf file.
    On validation, the Snap displays the content of the unstructured-doc.pdf file in binary format.
  2. Configure the Partition API Snap to segment and process the extracted data from the unstructured document. Set the Strategy to auto to generate output containing tables and text.
    On validation, the Snap displays the partitioned output. The unstructured data is segmented into structured components such as Header, Title, NarrativeText, Table, Image, ListItem, and PageNumber.

    Partition API Snap Configuration


    Partition API Snap Output
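To illustrate what the segmented output makes possible, here is a minimal Python sketch of downstream handling outside the pipeline. It assumes the partitioned output is a list of elements, each a JSON object with `type` and `text` fields (the shape used by Unstructured's element JSON). The sample data and the helper names `group_by_type` and `narrative_text` are hypothetical and for illustration only. They are not part of the Partition API Snap.

```python
from collections import defaultdict

# Hypothetical sample shaped like partitioned output: each element is a dict
# with "type" and "text" keys. The values are illustrative, not real Snap output.
sample_elements = [
    {"type": "Title", "text": "Quarterly Report"},
    {"type": "NarrativeText", "text": "Revenue grew during the quarter."},
    {"type": "ListItem", "text": "New customers onboarded"},
    {"type": "Table", "text": "Region Revenue North 100 South 80"},
    {"type": "PageNumber", "text": "1"},
]


def group_by_type(elements):
    """Group element texts by element type, preserving document order."""
    groups = defaultdict(list)
    for element in elements:
        # Skip elements without a type instead of failing on them.
        element_type = element.get("type")
        if element_type is None:
            continue
        groups[element_type].append(element.get("text", ""))
    return dict(groups)


def narrative_text(elements, keep=("Title", "NarrativeText", "ListItem")):
    """Join the text of the selected element types, dropping page numbers, tables, etc."""
    return "\n".join(e.get("text", "") for e in elements if e.get("type") in keep)


if __name__ == "__main__":
    # Print how many elements of each type were found.
    for element_type, texts in group_by_type(sample_elements).items():
        print(f"{element_type}: {len(texts)}")
    # Print only the readable body text.
    print(narrative_text(sample_elements))
```

Filtering by element type like this is one way to use the structure the Snap produces. For example, you can keep prose for summarization while routing Table elements to a separate process.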

To successfully reuse pipelines:
  1. Download and import the pipeline into the SnapLogic Platform.
  2. Configure Snap accounts, as applicable.
  3. Provide pipeline parameters, as applicable.