This example pipeline demonstrates how to extract structured content
from a raw unstructured document using the Partition API.
- Configure the File Reader Snap to read the
  contents of the unstructured-doc.pdf file.
  On validation, the Snap displays the contents of the
  unstructured-doc.pdf file in binary format.
- Configure the Partition API Snap to
  segment and process the data extracted from the unstructured document. Set the
  Strategy to auto to generate output
  containing tables and text.
  On validation, the Snap displays the partitioned output. The unstructured data is
  segmented into structured components such as Header, Title, NarrativeText, Table,
  Image, ListItem, and PageNumber.
Figures: Partition API Snap Configuration | Partition API Snap Output
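To make the shape of the partitioned output more concrete, the following is a minimal Python sketch of how a downstream consumer might group partitioned elements by component type. It is not part of the pipeline. The sample records and the field names `type` and `text` are assumptions made for illustration, and the actual documents emitted by the Partition API Snap may be structured differently. The helper `group_by_type` is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample of partitioned output. The "type" and "text" field
# names are assumptions; check the Snap's real output before relying on them.
sample_elements = [
    {"type": "Title", "text": "Quarterly Report"},
    {"type": "NarrativeText", "text": "Revenue grew during the quarter."},
    {"type": "Table", "text": "Region Sales North 10 South 12"},
    {"type": "ListItem", "text": "Expand into new markets"},
    {"type": "NarrativeText", "text": "Costs remained flat."},
    {"type": "PageNumber", "text": "1"},
]


def group_by_type(elements):
    """Collect the text of each element under its component type."""
    grouped = defaultdict(list)
    for element in elements:
        # Elements without a type are collected under "Unknown".
        grouped[element.get("type", "Unknown")].append(element.get("text", ""))
    return dict(grouped)


grouped = group_by_type(sample_elements)
for element_type, texts in grouped.items():
    print(f"{element_type}: {len(texts)} element(s)")
```

Grouping by type is one way to route components separately, for example sending Table elements to one branch and NarrativeText elements to another.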
To successfully reuse pipelines:
- Download and import the pipeline into the SnapLogic Platform.
- Configure Snap accounts, as applicable.
- Provide pipeline parameters, as applicable.