Extract

Overview

You can use this Snap to extract text, tables, and figures from a binary PDF file.

Unstructured Adobe Extract

Prerequisites

A valid Adobe account.

Limitations and known issues

None.

Snap views

Input
This Snap has at most one binary input view. It requires a PDF file in binary format.
Example upstream Snap: File Reader

Output
This Snap has at most one document output view. It returns the extracted content (structure) as a document and the extracted files (CSV, PNG) as base64-encoded strings. The output document contains:
  • structuredData: The JSON output of the extract result.
  • tables: The table files, base64-encoded, if Full table is selected.
    Note: If you select the Figure checkbox, the PNG files of the tables are placed in the tables array instead of the images array.
  • images: The image files, base64-encoded, if Figure is selected.
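Downstream consumers often need the extracted files as real files rather than base64 strings. The Python sketch below decodes the tables and images arrays of one output document. It assumes each array entry is a plain base64 string, which this page does not spell out; the helper names and the tables_0.csv-style file names are hypothetical. Because table PNGs can land in the tables array, the extension is chosen by checking the PNG file signature rather than the array name.

```python
import base64
from pathlib import Path

# Every PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_extracted_files(doc: dict) -> list[tuple[str, bytes]]:
    """Decode the tables and images arrays of one output document.

    Assumption: each array entry is a plain base64 string. Returns
    (hypothetical_file_name, raw_bytes) pairs in array order.
    """
    files = []
    for key in ("tables", "images"):
        for index, encoded in enumerate(doc.get(key) or []):
            data = base64.b64decode(encoded)
            # Sniff the content: table PNGs may appear in the tables array.
            ext = "png" if data.startswith(PNG_SIGNATURE) else "csv"
            files.append((f"{key}_{index}.{ext}", data))
    return files


def write_files(files: list[tuple[str, bytes]], out_dir: str) -> None:
    """Write decoded files to out_dir, creating the directory if needed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, data in files:
        (out / name).write_bytes(data)
```

Sniffing the signature keeps the helper correct whether or not the Figure checkbox moved table renditions into the tables array.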
Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution: Stops the current pipeline execution when the Snap encounters an error.
  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.
  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.
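The three options differ only in what happens to a failing record. The Python sketch below is a conceptual model of that behavior, not SnapLogic's implementation: run_snap, PipelineStopped, and the "error"/"original" fields of the error document are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Any, Callable, Iterable


class WhenErrorsOccur(Enum):
    STOP = "Stop Pipeline Execution"
    DISCARD = "Discard Error Data and Continue"
    ROUTE = "Route Error Data to Error View"


class PipelineStopped(Exception):
    """Hypothetical exception modeling Stop Pipeline Execution."""


def run_snap(records: Iterable[Any], process: Callable[[Any], Any],
             mode: WhenErrorsOccur) -> tuple[list, list]:
    """Model the When errors occur options; returns (output_view, error_view).

    The error document fields ("error", "original") are hypothetical.
    """
    output_view, error_view = [], []
    for record in records:
        try:
            output_view.append(process(record))
        except Exception as exc:
            if mode is WhenErrorsOccur.STOP:
                # Stop: abandon the remaining records.
                raise PipelineStopped(str(exc)) from exc
            if mode is WhenErrorsOccur.ROUTE:
                # Route: keep the failing record on a separate view.
                error_view.append({"error": str(exc), "original": record})
            # Discard: drop the failing record and continue.
    return output_view, error_view
```

In this model, Discard and Route produce the same output view; only Route keeps a copy of each failure for later inspection.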

Snap settings

Note:
  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.
  • Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
  • Add icon: Indicates that you can add fields to the field set.
  • Remove icon: Indicates that you can remove fields from the field set.
Field / Field set Type Description
Label String

Required. Specify a unique name for the Snap. Modify it to be more descriptive, especially if the pipeline contains more than one instance of this Snap.

Default value: Extract

Example: Extract
Extract Options Extract options enable you to configure the elements to extract from the PDF file.
Text Checkbox/Expression

Select this checkbox to extract text elements.

Default status: Selected

Full table Checkbox/Expression

Select this checkbox to extract the table elements.

Default status: Deselected

Figure Checkbox/Expression

Select this checkbox to extract figures from the PDF as PNG files.

Default status: Deselected
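With Text selected, the extracted text arrives inside structuredData. The Python sketch below pulls it out in document order. It assumes structuredData follows Adobe's Extract JSON layout, where an elements list holds items with a Path (such as //Document/P) and, for text, a Text field; those field names come from Adobe's schema rather than this page, so verify them against your own output. The collect_text helper is hypothetical.

```python
def collect_text(structured_data: dict, include_tables: bool = False) -> list[str]:
    """Return the Text of each element in document order.

    Assumes an "elements" list whose items carry a "Path" and, for text,
    a "Text" field. Elements whose Path contains "/Table" are skipped
    unless include_tables is True.
    """
    texts = []
    for element in structured_data.get("elements", []):
        text = element.get("Text")
        if not text:
            continue  # figures and container elements may carry no Text
        if not include_tables and "/Table" in element.get("Path", ""):
            continue
        texts.append(text.strip())
    return texts
```

Filtering on the Path keeps table cell text out of the running text when Full table is also selected.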

Advanced Extract Options Configure the additional elements to extract from the PDF file.
Add character info Checkbox/Expression

Select this checkbox to add character-level bounding boxes to the output.

Default status: Deselected

Get styling info Checkbox/Expression

Select this checkbox to add styling information to the output.

Default status: Deselected
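Character-level bounding boxes make it possible to locate a word or phrase on the page. The Python sketch below computes the box around the first occurrence of a substring within one text element. It assumes that, with Add character info selected, each text element carries a CharBounds list with one [left, bottom, right, top] box per character of Text, following Adobe's Extract JSON schema; that key and its alignment with Text are assumptions to check against real output. The substring_box helper is hypothetical.

```python
from typing import Optional


def substring_box(element: dict, needle: str) -> Optional[list[float]]:
    """Union box [left, bottom, right, top] of needle's first occurrence.

    Assumes element["CharBounds"] holds one box per character of element["Text"].
    Returns None if needle is empty, absent, or cannot be mapped to boxes.
    """
    text = element.get("Text", "")
    char_bounds = element.get("CharBounds")
    start = text.find(needle)
    if not needle or start == -1 or not char_bounds:
        return None
    boxes = char_bounds[start:start + len(needle)]
    if len(boxes) < len(needle):
        return None  # CharBounds shorter than Text; cannot map reliably
    # In PDF coordinates bottom < top, so min/max gives the enclosing box.
    return [
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    ]
```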

Snap execution Dropdown list
Select one of the three modes in which the Snap executes. Available options are:
  • Validate & Execute. Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.
  • Execute only. Performs full execution of the Snap during pipeline execution without generating preview data.
  • Disabled. Disables the Snap and all Snaps that are downstream from it.

Default value: Validate & Execute

Example: Execute only

Troubleshooting

Error: Failed to process the JSON output.
Reason: Invalid configuration.
Resolution: Verify the settings and try again.

Error: Failed to process request.
Reason: The usage limit has been reached.
Resolution: Address the usage limit and try again.

Error: An error occurred while attempting to connect to Adobe Services.
Reason: Either the Client ID or the Client secret is incorrect.
Resolution: Verify that the account settings are valid and try again.
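If errors are routed to an error view, a downstream step can match their messages against the troubleshooting entries above. The Python sketch below does a simple substring lookup. It assumes the error message is available as a plain string; the TROUBLESHOOTING table and diagnose function are hypothetical, with their text adapted from this section.

```python
from typing import Optional

# Message -> (reason, resolution), adapted from the Troubleshooting section.
TROUBLESHOOTING = {
    "Failed to process the JSON output.": (
        "Invalid configuration.",
        "Verify the settings and try again.",
    ),
    "Failed to process request.": (
        "The usage limit has been reached.",
        "Address the usage limit and try again.",
    ),
    "An error occurred while attempting to connect to Adobe Services.": (
        "Either the Client ID or the Client secret is incorrect.",
        "Verify that the account settings are valid and try again.",
    ),
}


def diagnose(error_message: str) -> Optional[tuple[str, str]]:
    """Return (reason, resolution) for a known message, else None."""
    for known, advice in TROUBLESHOOTING.items():
        # Ignore the trailing period so messages with extra detail still match.
        if known.rstrip(".") in error_message:
            return advice
    return None
```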

Examples