Athena Bulk Load

Overview

You can use this Snap to support bulk upload in Athena. All input documents are written into S3 objects in Parquet or ORC format, and then Athena table is created.


Athena Bulk Load Overview

Prerequisites

None.

Known issues

  • Choosing the Overwrite option for the If table exists or S3 directory is not empty field deletes the existing Athena table and its associated S3 objects when the Snap execution starts. Use this option cautiously to prevent data loss.

Snap views

View Description Examples of upstream and downstream Snaps
Input This Snap has two document input views. You can use the second input view to specify the column metadata of the Athena table.
Important: If the second input view is not provided, the Snap attempts to generate one automatically from the first document. Therefore, some data types it guessed may not be optimal, or some columns may be missing.
Output This Snap has one document output view.
Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution Stops the current pipeline execution when an error occurs.
  • Discard Error Data and Continue Ignores the error, discards that record, and continues with the remaining records.
  • Route Error Data to Error View Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap settings

Legend:
  • Expression icon (): JavaScript syntax to access SnapLogic Expressions to set field values dynamically (if enabled). If disabled, you can provide a static value. Learn more.
  • SnapGPT (): Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
  • Suggestion icon (): Populates a list of values dynamically based on your Account configuration.
  • Upload : Uploads files. Learn more.
Learn more about the icons in the Snap settings dialog.
Field / field set Type Description
Label String

Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline.

Default value: Athena Bulk Load

Example: Athena Bulk Load - Client list
Database String/Expression/Suggestion

Required. Specify the name of the Athena database, that is a logical grouping of Athena tables.

Default value: N/A

Example: mydatabase
Table String/Expression/Suggestion

Required. Specify the name of the Athena table for the Snap to create and to perform the bulk load operation.

Default value: N/A

Example: Sales_db
If table exists or S3 directory is not empty Dropdown list/Expression
Required. Choose the action to perform if the target table name already exists or the specified S3 directory is not empty. Available options are:
  • Overwrite: The Snap overwrites the existing table by deleting the existing table and all associated S3 objects.
    Warning: Choosing the Overwrite option may delete the Athena table and its associated S3 objects when the Snap execution starts. Use this option cautiously to prevent data loss.
  • Ignore: The Snap does nothing and terminates the execution.
  • Error: The Snap throws an exception if the table already exists or the specified S3 directory is not empty. This option is default in order to avoid an accidental data loss.

Default value: Error

Example: Overwrite
S3 directory path String/Expression

Required. Specify the Amazon S3 directory path where the Parquet/ORC data is stored.

Default value: s3://<bucket>/<path>/

Example: s3://test-bucket/parquet/partition_test/
Partitioned by Use this field set to define the partition of the columns.
Column String/Expression/Suggestion

Specify the list of column names to write input data into multiple S3 objects.

Important:
  • If Partitioned by is empty, the Snap creates one S3 object in the specified S3 directory. Its file name is <table_name>.<parquet or orc>.
  • For example, Partitioned by has columns year and month, the Snap appends year=<value>/month=<value>/<table_name>.<parquet or orc> to the specified S3 directory path for each S3 object it creates.

Default value: N/A

Example: partnername
Data format Dropdown list/Expression Required. Choose the format of the data being loaded into the Athena table. Available options are:
  • Parquet
  • ORC

Default value: Parquet

Example: ORC
Compression for Parquet Dropdown list/Expression

Appears when Parquet is selcted in the Data format field.

Choose the compression type for Parquet files. Available options are:
  • None
  • SNAPPY
  • GZIP
  • LZO

Default value: None

Example: LZO
Compression for ORC Dropdown list/Expression

Appears when ORC is selcted in the Data format field.

Choose the compression type for ORC files. Available options are:
  • None
  • SNAPPY
  • ZLIB
  • LZO

Default value: None

Example: SNAPPY
Client settings Use this field set to define the client settings.
Maximum retries Integer/Expression Required. Specify the maximum number of retry attempts.

Default value: 3

Example: 10

Multipart size (MB) Integer/Expression Required. Specify the size of each part in megabytes (MB) used during S3 multipart upload.
Important: The maximum number of parts in the S3 multipart upload is 10,000. Therefore, if the expected S3 object is larger than 100 GB, you must enter a value larger than 10. The maximum size of S3 object is 5 TB.

Minimum value: 5

Maximum value: 5000

Default value: 10

Example: 100

Snap execution Dropdown list
Select one of the three modes in which the Snap executes. Available options are:
  • Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.
  • Execute only: Performs full execution of the Snap during pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only

Default value: Validate & Execute

Additional information

The Athena Bulk Load Snap supports the following data types:

Athena data type ‘Data format’ = Parquet 'Data format’ = ORC
BOOLEAN Supported Supported
TINYINT Supported (stored as INT32 in the parquet file) Supported
SMALLINT Supported (stored as INT32 in the parquet file) Supported
INT, INTEGER Supported Supported
BIGINT Supported Supported
FLOAT Supported Supported
DOUBLE Supported Supported
DECIMAL (precision, scale) Not supported (Changed to DOUBLE, precision and scale are ignored) Supported
STRING Supported Supported
CHAR (length) Supported (length is required) Supported (length is required)
CHAR Not supported Not supported
VARCHAR (length) Supported (length is required) Supported (length is required)
VARCHAR Not supported Not supported
BINARY Supported Supported
TIME Not supported (use STRING data type) Not supported (use STRING data type)
DATE Supported (The input value should be a string in the format yyyy-MM-dd, e.g., 2024-10-15) Supported (The input value should be a string in the format yyyy-MM-dd, e.g., 2024-10-15)
TIMESTAMP Supported (The input value should be a string in the format yyyy-MM-dd HH:mm:ss.SSS, e.g., 2024-10-15 11.12.13.456) Supported (The input value should be a string in the format yyyy-MM-dd HH:mm:ss.SSS, e.g., 2024-10-15 11.12.13.456)
ARRAY <element_type> ARRAY of primitive type or STRUCT is supported; Recursive ARRAY of ARRAY is not supported (for example, ARRAY<ARRAY<String>>) Supported
STRUCT <field_name_1:field_type_1, field_name_2:field_type_2, …> Supported Supported
MAP (key_type, value_type) Not supported Not supported

Learn more about Data types in Amazon Athena.

Troubleshooting

Failed to convert to epoch millis / Failed to convert to epoch days

The provided date/time format is incorrect or incompatible with the expected format.

Ensure the date/time follows the required format:
  • For epoch millis: yyyy-MM-dd'T'HH:mm:ss.SSS
  • For epoch days: yyyy-MM-dd

Examples