This example pipeline demonstrates how to extract structured content
from a raw, unstructured document using the Partition API Snap and then separate it further into
different types of data: text, images, and tables.
-
Configure the File Reader Snap to read the
contents of the wildfire_stats.pdf file.
On validation, the Snap displays the content of the
wildfire_stats.pdf file in binary format.
-
Configure the Partition API Snap to
segment and process the data read from the unstructured document. Set
Strategy to auto to generate output
containing tables and text.
On validation, the Snap displays the partitioned output. The unstructured data is
segmented into various structured components, such as Title, NarrativeText, Table, and Image
elements.
Partition API Snap Configuration | Partition API Snap Output
-
Configure the Router Snap to
route documents from the upstream Snap to different output views based on data type: table,
image, and text.
On validation, the Snap displays the three specified routes along with the
corresponding data types.
Router Snap Configuration | Router Snap Output (Table) | Router Snap Output (Image) | Router Snap Output (Text)
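The routing step can be illustrated outside SnapLogic with a short Python sketch. It assumes each partitioned document is a dictionary with a `type` field holding an element category such as `Title`, `NarrativeText`, `Table`, or `Image`. The field name, the sample elements, and the `route_elements` function are hypothetical and are not taken from the Partition API Snap or Router Snap. In the pipeline itself, routing is configured in the Router Snap rather than written as code.

```python
# Hypothetical sketch: assumes each partitioned element is a dict with a
# "type" key such as "Title", "NarrativeText", "Table", or "Image".
def route_elements(elements):
    """Group partitioned elements into table, image, and text routes."""
    routes = {"table": [], "image": [], "text": []}
    for element in elements:
        element_type = element.get("type")
        if element_type == "Table":
            routes["table"].append(element)
        elif element_type == "Image":
            routes["image"].append(element)
        else:
            # Titles, narrative text, and any other element types
            # fall through to the text route.
            routes["text"].append(element)
    return routes


if __name__ == "__main__":
    # Illustrative sample elements, not actual output for wildfire_stats.pdf.
    sample = [
        {"type": "Title", "text": "Example title"},
        {"type": "NarrativeText", "text": "Example paragraph."},
        {"type": "Table", "text": "Example table text"},
        {"type": "Image", "text": ""},
    ]
    routes = route_elements(sample)
    print({name: len(docs) for name, docs in routes.items()})
    # prints {'table': 1, 'image': 1, 'text': 2}
```

Sending every element that is not a table or an image to the text route is one possible design choice; a stricter configuration could match text types explicitly and drop or separately route anything unrecognized.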
To successfully reuse pipelines:
- Download and import the pipeline into the SnapLogic Platform.
- Configure Snap accounts, as applicable.
- Provide pipeline parameters, as applicable.