PDF Parser

Overview

Use this Snap to extract fields, text, images, or tables from a PDF document. A PDF account is required only when you parse a locked PDF file.

Parse-type Snap
Works in Ultra Tasks

Prerequisites

None.

Known issues

The PDF Snaps validate files before processing them. If a Snap finds a PDF file that is not well-formed during this validation, it displays the error "Only PDF files are supported", even though the file is a PDF.
Workaround: Fix the PDF file using an online or in-house tool and retry.
The PDF Parser Snap might encounter issues when it parses tables that are embedded images, lack borders, have a complex row-column structure, or span multiple pages.
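
Before applying the workaround for the validation issue above, it can help to check whether a file is obviously malformed. The following is a minimal sketch, not the Snap's own validation logic: it only relies on the general PDF convention that a file starts with a %PDF- header and ends with a %%EOF marker. The function name looks_like_pdf and the 1024-byte search window are assumptions made for illustration.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Rough well-formedness check (hypothetical helper, not the Snap's validator).

    Returns True when a %PDF- header appears near the start of the data
    and a %%EOF marker appears near the end.
    """
    window = 1024  # assumed search window, chosen for illustration
    has_header = b"%PDF-" in data[:window]
    has_eof = b"%%EOF" in data[-window:]
    return has_header and has_eof


# In-memory samples; for a real file, read it with open(path, "rb").read().
complete = b"%PDF-1.7\n1 0 obj\n<< >>\nendobj\n%%EOF\n"
truncated = b"%PDF-1.7\n1 0 obj\n<< >>\nendobj\n"

print(looks_like_pdf(complete))   # True
print(looks_like_pdf(truncated))  # False
```

A file that fails this check is a likely candidate for repair. A file that passes can still be rejected by the Snap, because the check does not inspect the document's internal structure.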

Snap views


View	Description	Examples of upstream and downstream Snaps
Input	This Snap has exactly one binary input view.	JSON Generator, HTML Generator
Output	This Snap supports exactly one binary or document output view.	File Writer, S3 File Writer
Error	Error handling is a generic way to handle errors without losing data or failing the Snap execution. To handle the errors that the Snap might encounter when running the pipeline, choose one of the following options from the When errors occur list under the Views tab: Stop Pipeline Execution: Stops the current pipeline execution when an error occurs. Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records. Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution. Learn more about Error handling in Pipelines.

Snap settings

Legend:

Expression icon: Allows using JavaScript syntax to access SnapLogic Expressions to set field values dynamically (if enabled). If disabled, you can provide a static value.
SnapGPT: Generates SnapLogic Expressions based on natural language using SnapGPT.
Suggestion icon: Populates a list of values dynamically based on your Snap configuration. You can select only one attribute at a time using the icon. Type into the field if it supports a comma-separated list of values.
Upload: Uploads files.

Learn more about the icons in the Snap settings dialog.


Field / Field set	Type	Description
Label	String	Required. Specify a unique name for the Snap. Change the name to something more descriptive, especially if the pipeline contains more than one instance of the same Snap.
Pages	String	Required. Specify the pages or page ranges that you want to split into separate PDF documents. Use a hyphen for a range, and separate pages and ranges with commas. For example: 2-17 (takes the content of pages 2 through 17 and creates a separate PDF document); 1-3, 5-7, 10 (takes the content of pages 1 through 3, 5 through 7, and page 10 to create a separate PDF document). Default value: N/A. Example: 1-3, 5-7
Parser type	Dropdown list	Select how the parser should act on the pages specified in the Pages field. The available options are: Text extractor, Images extractor from pages, Pages to images converter, Table parser. Default value: Text extractor. Example: Table parser
Snap execution	Dropdown list	Choose one of the three modes in which the Snap executes. The available options are: Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime. Execute only: Performs full execution of the Snap during pipeline execution without generating preview data. Disabled: Disables the Snap and all Snaps that are downstream from it.
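
To illustrate the Pages syntax described in the table above, the following minimal sketch expands a page specification into the individual page numbers it covers. It is an illustration of the syntax only, not the Snap's implementation; the function name expand_pages is hypothetical. As a convenience, the sketch also treats an en dash as a hyphen, because en dashes often appear when text is copied from formatted documents. Whether the Snap itself accepts en dashes is not stated here.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a spec such as "1-3, 5-7, 10" into a sorted list of page numbers.

    Hypothetical helper for illustration; it does not reproduce the Snap's parser.
    """
    # Assumption: normalize en dashes to hyphens before parsing.
    spec = spec.replace("\u2013", "-")
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_pages("1-3, 5-7, 10"))  # [1, 2, 3, 5, 6, 7, 10]
```

The result is deduplicated and sorted, so overlapping entries such as 1-3, 2-4 yield each page once.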