S3 Select

Overview

You can use this Snap to execute a SELECT query on CSV, JSON, and Parquet S3 objects, which may be compressed in GZIP or BZIP2 format.

Note:
  • AWS S3 does not define any folder object and the '/' characters are part of the S3 object key names.

  • This Snap captures metadata and lineage information from the input document.



Prerequisites

  • A valid S3 account with the required permissions.

Limitations

The conditional fields Field Delimiter, Record Delimiter, Quote Character, Quote Escape Character, and File Header Info associated with the CSV Data Format are not displayed for non-CSV data formats. They are also not displayed if the Data Format field is expression-enabled. If you plan to make the Data Format expression-enabled to process the CSV-related properties, follow these steps:

  1. Set CSV as the Data Format.

  2. Configure the conditional fields.

  3. Enable the expression.

You must follow these steps because the values of these properties are used during the execution only if the value of the Data Format is set to CSV, even when the associated conditional fields are not displayed.

Snap views

View Description Examples of upstream and downstream Snaps
Input An upstream Snap is optional. Any document with key-value pairs to evaluate expression properties.
Output An example of the output binary data is as following:

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution Stops the current pipeline execution when an error occurs.
  • Discard Error Data and Continue Ignores the error, discards that record, and continues with the remaining records.
  • Route Error Data to Error View Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap settings

Legend:
  • Expression icon (): Allows using pipeline parameters to set field values dynamically (if enabled). SnapLogic Expressions are not supported. If disabled, you can provide a static value.
  • SnapGPT (): Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
  • Suggestion icon (): Populates a list of values dynamically based on your Snap configuration. You can select only one attribute at a time using the icon. Type into the field if it supports a comma-separated list of values.
  • Upload : Uploads files. Learn more.
Learn more about the icons in the Snap settings dialog.
Field / Field set Type Description
Label String

Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline.

Default value: S3 Select

Example: Select Invoices from Q4 2018
Bucket String/Expression/ Suggestion

Required. Specify the S3 bucket name where S3 objects are located.

Warning: Do not add S3:/// before bucket name, because the Snap can fail.
Note:
  • Bucket names are unique globally and can be accessed without the region name in most cases. If you cannot access a bucket name without its region name, you can specify the region information using the following syntax:

    • <S3_bucket_name>@<region_name>

  • You can also access the S3 bucket in an S3 Virtual Private Cloud (VPC) endpoint by specifying the bucket name using the following syntax:

    • <S3_bucket_name>@<VPC_S3_endpoint>

Note: If you enter an incorrect region name, but the bucket name is valid, the AWS S3 service may successfully access the bucket without any errors.

Default value: None.

Examples:
Object Key String/Expression/ Suggestion

Required. Specify the S3 object key name, which may include one or more forward-slash ('/') characters.

Note: The forward-slash character is part of the S3 object key name and there is no folder object defined in AWS S3. The maximum length of the suggested list is 1,000.

Default value: None.

Examples:
  • test.csv

  • abc/test.json

  • abc/xyz/test.xml

Select Query String/Expression

Enter a SELECT query to be executed on the S3 object. For a detailed description on using the SELECT command, refer to the Amazon documentation:SELECT command - Amazon Simple Storage Service

Default value: None.

Example: select * from S3Object

Input Data Format
Compression Type Dropdown list/Expression Required. Select one of the following compression types: NONE, GZIP, or BZIP2.
Note: The Amazon S3 Select service does not support Compression Type BZIP2 or GZIP for the Parquet input data type.

Default value: None.

Example: GZIP

Data format Dropdown list/Expression

Required. Select one of the data formats for the input data: CSV, JSON or Parquet.

Default value: CSV

Example: JSON

Field Delimiter String/Expression Appears when you select CSVfor Data Format.

Enter the field delimiter character used in the input data.

Default value: , (Comma)

Example:
  • | (Pipe)

  • \t (tab)

Record Delimiter String/Expression/ Suggestion Appears when you select CSVfor Data Format.

Enter the record delimiter used in the input data.

Default value: LF (new line)

Example: CRLF

Quote Character String/Expression Appears when you select CSVfor Data Format.

Enter the quote character used in the input data.

Default value: " (double quote)

Example: ' (single quote)

Quote Escape Character String/Expression Appears when you select CSVfor Data Format.

Enter the escape character used to escape quote characters inside the values, where values are wrapped around quote characters in the input data. Note that some CSV file uses double quote character to escape double quotes inside values. For example, "He said ""Yes""", which means: He said "Yes" when it is parsed.

Default value: \

Example: "

File Header Info Dropdown list/Expression Appears when you select CSVfor Data Format.

Specify whether to use the header information in the SELECT query.

Default value: USE

Example:
  • NONE

  • IGNORE

Show Advanced Properties Checkbox Select this checkbox to display the advanced properties.

Deselect this checkbox to hide the properties.

Default status: Deselected

Maximum Retries Integer/Expression

Appears when you select the Show Advanced Properties checkbox.

Required. Specify the maximum number of retry attempts to perform in case of a temporary network loss.

Default value: 3

Example: 2

Snap execution Dropdown list
Choose one of the three modes in which the Snap executes. Available options are:
  • Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.
  • Execute only: Performs full execution of the Snap during pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only

Example: Validate & Execute

Examples