Partition API

Overview

You can use this Snap to extract raw unstructured data and transform it into document elements with the Unstructured API. The document elements include Title, NarrativeText, Table, Image, FigureCaption, and ListItem. Learn more about the supported documents.


Partition API Snap Settings

Prerequisites

None.

Limitations and known issues

None.

Snap views

View Description Examples of upstream and downstream Snaps
Input This Snap has at most one binary input view. It requires a raw unstructured document in binary format.
Output This Snap has at most one document output view. The output is a structured representation of the document, divided into elements such as Title, NarrativeText, Table, Image, FigureCaption, and ListItem. Table data can be extracted as plain text, HTML, or structured table cells. Images can be extracted as Base64-encoded strings. Only image and PDF inputs can produce image elements in the output. Other document types produce only textual elements, with no image extraction.
Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution Stops the current pipeline execution when the Snap encounters an error.
  • Discard Error Data and Continue Ignores the error, discards that record, and continues with the remaining records.
  • Route Error Data to Error View Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.
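
The output view described above can be consumed downstream by grouping elements by type and decoding any Base64 image payloads. The following is a minimal sketch, not the Snap's actual implementation: the sample records and the field names type, text, metadata, and image_base64 are assumptions modeled on the Unstructured element JSON format, and the helpers group_by_type and decode_image are hypothetical.

```python
import base64
from collections import defaultdict

# Hypothetical sample of the Snap's output. The field names are assumed
# from the Unstructured element JSON format, not confirmed for this Snap.
sample_elements = [
    {"type": "Title", "text": "Annual Report", "metadata": {}},
    {"type": "NarrativeText", "text": "Revenue grew.", "metadata": {}},
    {
        "type": "Image",
        "text": "",
        "metadata": {
            "image_base64": base64.b64encode(b"fake-image-bytes").decode("ascii")
        },
    },
]


def group_by_type(elements):
    """Group element dicts by their "type" value, preserving input order."""
    grouped = defaultdict(list)
    for element in elements:
        grouped[element.get("type", "Unknown")].append(element)
    return dict(grouped)


def decode_image(element):
    """Return the raw image bytes of an element, or None if it has no image."""
    encoded = element.get("metadata", {}).get("image_base64")
    return base64.b64decode(encoded) if encoded else None


grouped = group_by_type(sample_elements)
image_bytes = decode_image(grouped["Image"][0])
```

Grouping first keeps the text-only path (Title, NarrativeText, ListItem) separate from the image path, which exists only for image and PDF inputs.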

Snap settings

Note:
  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.
  • Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
  • Add icon: Indicates that you can add fields in the field set.
  • Remove icon: Indicates that you can remove fields from the field set.
Field / Field set Type Description
Label String

Required. Specify a unique name for the Snap. Change the default to something more descriptive, especially if the pipeline contains more than one instance of the same Snap.

Default value: Partition API

Example: Extract_Financial_Data_ReportsFY'22
Strategy Dropdown list/Expression Required. Choose the partitioning strategy to use. Available options include:
  • auto: Selects automatically based on the document characteristics.
  • fast: Prioritizes speed, outputs text only (no tables or images).
  • hi_res: Prioritizes accuracy when processing the input.
Note: This field is only applicable for image and PDF document types and has no effect on other file types.

Default value: auto

Example: fast
Extract elements as images

Appears when you select auto or hi_res in the Strategy dropdown list.

Use the following fields to extract the element types as Base64 images.
Figures Checkbox/Expression
Select this checkbox to extract figures as Base64 encoded images.
Note: This field does not support input values from the upstream Snap.

Default status: Selected

Tables Checkbox/Expression
Select this checkbox to extract tables as Base64 encoded images.
Note: This field does not support input values from the upstream Snap.

Default status: Selected

Advanced configuration Use the following fields to set the advanced configuration to customize the output.
Generate element IDs Checkbox/Expression
Select this checkbox to assign UUIDs (Universally Unique Identifiers) as element IDs, which guarantees uniqueness. When you deselect this checkbox, each element ID is derived from a SHA-256 hash of the element text.
Note: This field does not support input values from the upstream Snap.

Default status: Deselected

Include page breaks Checkbox/Expression
Select this checkbox to include page breaks in the output (if the file type supports it).
Note: This field does not support input values from the upstream Snap.

Default status: Deselected

Snap execution Dropdown list
Select one of the three modes in which the Snap executes. Available options are:
  • Validate & Execute. Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.
  • Execute only. Performs full execution of the Snap during pipeline execution without generating preview data.
  • Disabled. Disables the Snap and all Snaps that are downstream from it.

Default value: Validate & Execute

Example: Execute only
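
To make the relationship between the settings above concrete, the sketch below translates Snap-style settings into request form fields. It is illustrative only: the function build_partition_fields is hypothetical, the field names strategy, unique_element_ids, include_page_breaks, and extract_image_block_types are modeled on Unstructured API parameters, and how the Snap actually maps its settings to the API is an assumption.

```python
import json

# Strategy values listed in the Snap settings.
VALID_STRATEGIES = {"auto", "fast", "hi_res"}


def build_partition_fields(strategy="auto", figures=True, tables=True,
                           generate_element_ids=False,
                           include_page_breaks=False):
    """Translate Snap-style settings into hypothetical request form fields.

    Defaults mirror the default values documented for the Snap.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unsupported strategy: {strategy!r}")

    fields = {
        "strategy": strategy,
        "unique_element_ids": str(generate_element_ids).lower(),
        "include_page_breaks": str(include_page_breaks).lower(),
    }

    # The image extraction fields appear only for auto and hi_res,
    # so they are omitted for the fast strategy.
    if strategy in {"auto", "hi_res"}:
        block_types = []
        if figures:
            block_types.append("Image")
        if tables:
            block_types.append("Table")
        if block_types:
            fields["extract_image_block_types"] = json.dumps(block_types)

    return fields


default_fields = build_partition_fields()
fast_fields = build_partition_fields(strategy="fast")
```

With the defaults, the sketch requests both figures and tables as images. With fast, it drops the image extraction field entirely, matching the text-only behavior described for that strategy.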

Examples
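
The difference between the two ID modes of the Generate element IDs setting can be illustrated with a short sketch. It assumes the simplest reading of the setting, a random UUID versus a SHA-256 hash of the element text. The function make_element_id is hypothetical, and the hashing done by the Unstructured API may use additional inputs or truncate the digest.

```python
import hashlib
import uuid


def make_element_id(text, generate_uuid=False):
    """Return a random UUID, or a deterministic SHA-256 hex digest of the text."""
    if generate_uuid:
        # Random (version 4) UUID: unique even for identical text.
        return str(uuid.uuid4())
    # Deterministic: identical text always yields the same ID.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


hash_a = make_element_id("Quarterly results")
hash_b = make_element_id("Quarterly results")
uuid_a = make_element_id("Quarterly results", generate_uuid=True)
uuid_b = make_element_id("Quarterly results", generate_uuid=True)
```

Hash-based IDs are reproducible across runs but collide when two elements carry identical text. UUIDs avoid such collisions at the cost of changing on every run.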