Athena Bulk Load
Overview
You can use this Snap to support bulk upload in Athena. All input documents are written into S3 objects in Parquet or ORC format, and then Athena table is created.

Write-type Snap
Does not support Ultra Tasks
Prerequisites
None.
Known issues
- Choosing the Overwrite option for the If table exists or S3 directory is not empty field deletes the existing Athena table and its associated S3 objects when the Snap execution starts. Use this option cautiously to prevent data loss.
Snap views
View | Description | Examples of upstream and downstream Snaps |
---|---|---|
Input | This Snap has two document input views. You can use the second input view to
specify the column metadata of the Athena table. Important: If the second
input view is not provided, the Snap attempts to generate one automatically from
the first document. Therefore, some data types it guessed may not be optimal, or
some columns may be missing. |
|
Output | This Snap has one document output view. | |
Error |
Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:
Learn more about Error handling in Pipelines. |
Snap settings
- Expression icon (
): JavaScript syntax to access SnapLogic Expressions to set field values dynamically (if enabled). If disabled, you can provide a static value. Learn more.
- SnapGPT (
): Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
- Suggestion icon (
): Populates a list of values dynamically based on your Account configuration.
- Upload
: Uploads files. Learn more.
Field / field set | Type | Description |
---|---|---|
Label | String | Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline. Default value: Athena Bulk Load Example: Athena Bulk Load - Client list |
Database | String/Expression/Suggestion | Required. Specify the name of the Athena database, that is a logical grouping of Athena tables. Default value: N/A Example: mydatabase |
Table | String/Expression/Suggestion | Required. Specify the name of the Athena table for the Snap to create and to perform the bulk load operation. Default value: N/A Example: Sales_db |
If table exists or S3 directory is not empty | Dropdown list/Expression | Required. Choose the action to perform if the target
table name already exists or the specified S3 directory is not empty. Available
options are:
Default value: Error Example: Overwrite |
S3 directory path | String/Expression | Required. Specify the Amazon S3 directory path where the Parquet/ORC data is stored. Default value: s3://<bucket>/<path>/ Example: s3://test-bucket/parquet/partition_test/ |
Partitioned by | Use this field set to define the partition of the columns. | |
Column | String/Expression/Suggestion | Specify the list of column names to write input data into multiple S3 objects. Important:
Default value: N/A Example: partnername |
Data format | Dropdown list/Expression | Required. Choose the format of the data being loaded
into the Athena table. Available options are:
Default value: Parquet Example: ORC |
Compression for Parquet | Dropdown list/Expression | Appears when Parquet is selcted in the Data format field. Choose the compression type for Parquet files. Available options are:
Default value: None Example: LZO |
Compression for ORC | Dropdown list/Expression | Appears when ORC is selcted in the Data format field. Choose the compression type for ORC files. Available options are:
Default value: None Example: SNAPPY |
Client settings | Use this field set to define the client settings. | |
Maximum retries | Integer/Expression | Required. Specify the maximum number of retry
attempts. Default value: 3 Example: 10 |
Multipart size (MB) | Integer/Expression | Required. Specify the size of each part in megabytes
(MB) used during S3 multipart upload. Important: The maximum number of
parts in the S3 multipart upload is 10,000. Therefore, if the expected S3 object
is larger than 100 GB, you must enter a value larger than 10. The maximum size of
S3 object is 5 TB. Minimum value: 5 Maximum value: 5000Default value: 10 Example: 100 |
Snap execution | Dropdown list |
Select one of the three modes in which the Snap executes.
Available options are:
Default value: Execute only Default value: Validate & Execute |
Additional information
The Athena Bulk Load Snap supports the following data types:
Athena data type | ‘Data format’ = Parquet | 'Data format’ = ORC |
---|---|---|
BOOLEAN | Supported | Supported |
TINYINT | Supported (stored as INT32 in the parquet file) | Supported |
SMALLINT | Supported (stored as INT32 in the parquet file) | Supported |
INT, INTEGER | Supported | Supported |
BIGINT | Supported | Supported |
FLOAT | Supported | Supported |
DOUBLE | Supported | Supported |
DECIMAL (precision, scale) | Not supported (Changed to DOUBLE, precision and scale are ignored) | Supported |
STRING | Supported | Supported |
CHAR (length) | Supported (length is required) | Supported (length is required) |
CHAR | Not supported | Not supported |
VARCHAR (length) | Supported (length is required) | Supported (length is required) |
VARCHAR | Not supported | Not supported |
BINARY | Supported | Supported |
TIME | Not supported (use STRING data type) | Not supported (use STRING data type) |
DATE | Supported (The input value should be a string in the format yyyy-MM-dd, e.g., 2024-10-15) | Supported (The input value should be a string in the format yyyy-MM-dd, e.g., 2024-10-15) |
TIMESTAMP | Supported (The input value should be a string in the format yyyy-MM-dd HH:mm:ss.SSS, e.g., 2024-10-15 11.12.13.456) | Supported (The input value should be a string in the format yyyy-MM-dd HH:mm:ss.SSS, e.g., 2024-10-15 11.12.13.456) |
ARRAY <element_type> | ARRAY of primitive type or STRUCT is supported; Recursive ARRAY of ARRAY is not supported (for example, ARRAY<ARRAY<String>>) | Supported |
STRUCT <field_name_1:field_type_1, field_name_2:field_type_2, …> | Supported | Supported |
MAP (key_type, value_type) | Not supported | Not supported |
Learn more about Data types in Amazon Athena.
Troubleshooting
Failed to convert to epoch millis / Failed to convert to epoch days
- For epoch millis: yyyy-MM-dd'T'HH:mm:ss.SSS
- For epoch days: yyyy-MM-dd