Snowflake - Bulk Load
Overview
You can use the Snowflake - Bulk Load Snap to load data from input sources or from files stored on external object stores, such as Amazon S3, Google Cloud Storage, and Azure Blob Storage, into the Snowflake data warehouse.

Write-type Snap
Works in Ultra Tasks
Prerequisites
You must have the minimum required permissions on the database to execute Snowflake Snaps. To check whether you already have them, retrieve the current set of permissions with the following commands:
SHOW GRANTS ON DATABASE <database_name>
SHOW GRANTS ON SCHEMA <schema_name>
SHOW GRANTS TO USER <user_name>
- An external volume must be created in a Snowflake worksheet or through the Snowflake - Execute Snap (see the sketch after this list). Learn more about creating an external volume.
- Usage (DB and Schema): Privilege to use the database, role, and schema.
- Create table: Privilege to create a temporary table within this schema.
grant usage on database <database_name> to role <role_name>;
grant usage on schema <database_name>.<schema_name> to role <role_name>;
grant "CREATE TABLE" on database <database_name> to role <role_name>;
grant "CREATE TABLE" on schema <database_name>.<schema_name> to role <role_name>;
Learn more about Snowflake privileges: Access Control Privileges.
- Amazon S3 Bucket: The Snowflake account must include the S3 Access Key ID, S3 Secret Key, S3 Bucket Name, and S3 Folder Path.
- Amazon S3 Bucket: The S3 bucket must be located in the same region as your Snowflake cluster.
- Microsoft Azure Blob Storage: A valid and active Snowflake Azure database account is required.
- Google Cloud Storage: Ensure the storage bucket has the correct public access settings and access control permissions configured on the Google Cloud Platform.
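The following is a minimal sketch of the external-volume prerequisite above, run from a Snowflake worksheet or a Snowflake - Execute Snap. The volume name, bucket, and IAM role ARN are placeholder assumptions; see the Snowflake documentation for provider-specific parameters.

```sql
-- Sketch only: volume name, bucket, and role ARN are placeholders.
CREATE EXTERNAL VOLUME iceberg_ext_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-iceberg-bucket/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-iceberg-role'
    )
  );
```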
Limitations and Known Issues
- The special character ~ is not supported in the temporary directory name on Windows. It is reserved for the user's home directory.
- Snowflake supports cross-account access through Storage Integration. This method allows you to use Cross Account IAM in external staging without passing credentials explicitly. Access is granted via the named stage or integration object (see the sketch after this list). Learn more about configuring Cross Account IAM role support.
- The Snowflake Bulk Load Snap expects the column order in the input data to match the target table's structure. Mismatched column order will cause data validation to fail.
- If the Snowflake Bulk Load Snap fails due to insufficient memory on the JCC node while using an Input View as the data source and Internal Stage as the staging location, consider switching to an external staging location such as Amazon S3, Azure Blob, or Google Cloud Storage.
- When a bulk load operation fails due to invalid input that lacks default columns, the error view may not display the incorrect columns properly. This is a known Snowflake issue tracked under JIRA SNOW-662311 and JIRA SNOW-640676.
- This Snap does not support creating Iceberg tables with an external catalog in Snowflake. Currently, Snowflake only allows read-only access to such tables and does not support write capabilities. Learn more about Iceberg catalog options.
- Snowflake does not support cross-cloud or cross-region Iceberg tables when using Snowflake as the Iceberg catalog. If the Snap displays an error such as External volume <volume_name> must have a STORAGE_LOCATION defined in the local region, ensure that the external volume uses a storage location in the same region as your Snowflake account.
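As a hedged illustration of the Storage Integration approach described in the cross-account limitation above, the following sketch creates an integration and a named stage that references it; the integration name, role ARN, and bucket path are placeholder assumptions.

```sql
-- Sketch only: names, role ARN, and bucket are placeholders.
CREATE STORAGE INTEGRATION s3_cross_account_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-access-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/data/');

-- Named stage that grants access through the integration,
-- so no credentials are passed explicitly.
CREATE STAGE my_ext_stage
  URL = 's3://my-bucket/data/'
  STORAGE_INTEGRATION = s3_cross_account_int;
```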
Behavior change
- Add the following key-value pair in the Global properties section of the Node Properties tab: Key: jcc.jvm_options; Value: -Dhttp.useProxy=true
- Add the following key-value pairs in the URL properties of the Snap under Advanced properties.
Known Issues
Known issue: The create table operation fails if the table contains a geospatial data type column.
Workaround: Create the table manually ahead of time, or use the Snowflake - Execute Snap to create the table (see the sketch below).
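For illustration, here is a minimal sketch of this workaround, creating the target table (including its geospatial column) ahead of time in a worksheet or through the Snowflake - Execute Snap; the database, schema, table, and column names are hypothetical.

```sql
-- Hypothetical table: create it before running the Bulk Load Snap
-- so the Snap never attempts to create the geospatial column itself.
CREATE TABLE IF NOT EXISTS demo_db.public.store_locations (
  id       NUMBER,
  name     VARCHAR,
  location GEOGRAPHY
);
```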
Because of performance issues, all Snowflake Snaps now ignore the Cancel queued queries when pipeline is stopped or if it fails option for Manage Queued Queries, even when selected. Snaps behave as though the default Continue to execute queued queries when the Pipeline is stopped or if it fails option were selected.
Snap views
View | Description | Examples of upstream and downstream Snaps |
---|---|---|
Input | Documents containing the data to be uploaded to the target location. Note: This Snap has one document input view by default. You can add a second input view for table metadata as a document, so that when the target table is absent, the table can be created in the database with a schema similar to the source table. This schema is usually from the second output of a database Select Snap. If the schema is from a different database, the data types might not be handled properly. Learn more about adding metadata for the table in the second input view from the example Providing Metadata For Table Using The Second Input View. | |
Output | If an output view is available, the output document displays the number of input records and the status of the bulk upload. | |
Error | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the pipeline by choosing one of the following options from the When errors occur list under the Views tab: Stop Pipeline Execution, Discard Error Data and Continue, or Route Error Data to Error View. Learn more about Error handling in Pipelines. | |
Snap settings
- Expression icon: Allows using JavaScript syntax to access SnapLogic Expressions to set field values dynamically (if enabled). If disabled, you can provide a static value. Learn more.
- SnapGPT: Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
- Suggestion icon: Populates a list of values dynamically based on your Snap configuration. You can select only one attribute at a time using the icon. Type into the field if it supports a comma-separated list of values.
- Upload: Uploads files. Learn more.
Field / Field set | Type | Description |
---|---|---|
Label | String | Required. Specify a unique name for the Snap. Modify this to be more specific, especially if there is more than one of the same Snap in the pipeline. |
Schema Name | String/Expression/Suggestion | Required. Specify the database schema name. If it is not defined, the suggestion for Table Name retrieves the table names of all schemas. The property is suggestible and retrieves the available database schemas during suggest values. Note: The values can be passed using the Pipeline parameters but not the upstream parameter. Default value: N/A Example: Schema_demo |
Table Name | String/Expression/Suggestion | Required. Specify the name of the table on which to execute the bulk load operation. Note: The values can be passed using the Pipeline parameters but not the upstream parameter. Default value: N/A Example: employees_table |
Create table if not present | Checkbox | Select this checkbox to automatically create the target table if it does not exist. Note: This option should not be used in production because the table is created with no indexes or integrity constraints on any column, and the default varchar() column is over 30k bytes. Note: The create table operation fails if the table contains a geospatial data type column. Workaround: Create the table manually ahead of time, or use the Snowflake - Execute Snap to create the table. Default status: deselected |
Iceberg table | Checkbox | Appears when you select Create table if not present. Select this checkbox to create an Iceberg table with the Snowflake catalog. Learn more about how to create an Iceberg table with Snowflake as the Iceberg catalog. Default status: deselected |
External volume | String/Expression/Suggestion | Required. Appears when you select the Iceberg table checkbox. Specify the external volume for the Iceberg table. Learn more about how to configure an external volume for Iceberg tables. Default value: N/A Example: employees |
Base location | String/Expression | Required. Appears when you select the Iceberg
table checkbox. Specify the Base location for the Iceberg table.
Note: The base location is the relative path from the external
volume. Default value: N/A Example: employees |
Data source | Dropdown list | Select the source from which to load the data. The available options are Input view and Staged files. Note: When Input view is selected, leave the Table Columns field empty; when Staged files is selected, provide the column names for the Table Columns to which the records are to be added. Default value: Input view Example: Staged files |
Preserve case sensitivity | Checkbox |
Select this checkbox to preserve the case sensitivity of the column names.
Default status: deselected |
Load empty strings | Checkbox | Select this checkbox to load empty string values in the input documents as empty strings into string-type fields. Otherwise, empty string values in the input documents are loaded as null. Null values are loaded as null regardless. Default status: deselected |
Truncate data | Checkbox | Select this checkbox to truncate the existing data before performing the data load. Instead of truncating and then updating with the Bulk Update Snap, a Bulk Insert would be faster. Default status: deselected |
Staging location | Dropdown list/Expression | Select the type of staging location to be used for data loading. The available options are Internal and External. Note: The Snap creates temporary files in JCC when the Staging location is Internal and the Data source is Input view. These temporary files are removed automatically once the Pipeline completes execution. If you want to delete the temporary files from the S3 bucket, we recommend that you assign the delete object permission policy to the S3 user. Learn how to assign delete object permission to an S3 user in AWS S3. If you do not want to delete the temporary files, you can add an error view to the Snap and run the pipeline. Default value: Internal Example: External |
Flush chunk size (in bytes) | String/Expression | Appears when you select Input view for Data
source and Internal for Staging
location. When using internal staging, data from the input view is written to a temporary chunk file on the local disk. When the size of a chunk file exceeds the specified value, the current chunk file is copied to the Snowflake stage and then deleted. A new chunk file simultaneously starts to store the subsequent chunk of input data. The default size is 100,000,000 bytes (100 MB), which is used if this value is left blank. Default value: N/A Example: 400000000 |
Target | String/Expression | Specify an internal or external location to load the data. If you select External for Staging location, a staging area is created in Azure, GCS, or S3 as configured; otherwise, a staging area is created in Snowflake's internal location. Default value: N/A Example: s3://test_bucket |
Storage Integration | String/Expression | Specify the predefined storage integration used to authenticate the external stages. Note: When you use expression values, the value must be provided as a pipeline parameter and cannot come from an upstream Snap, for performance reasons. Default value: N/A Example: External |
Staged file list | Use this field set to define the staged file(s) to be loaded to the target table. | |
Staged file | String | Appears when you select Staged files for Data source. Specify the staged file to be loaded to the target table. Default value: N/A Example: File_name |
File name pattern | String | Appears when you select Staged files for Data source. Specify a regular expression pattern string, enclosed in single quotes, with the file names and/or paths to match. Default value: N/A Example: .length |
File format object | String | Specify an existing file format object to use for loading data into the table.
The specified file format object determines the format type such as CSV, JSON, XML,
AVRO, or other format options for data files. Default value: N/A Example: jsonPath() |
File format type | String | Specify a predefined file format object to use for loading data into the table. The available file formats include CSV, JSON, XML, and AVRO. Note: This Snap supports only CSV and NONE as file format types when the Data source is Input view. Default value: N/A Example: CSV |
File format option | String | Specify the file format option. Separate multiple options with blank spaces and commas. Note: You can use various file format options, including a binary format, which passes through in the same way as other file formats. Learn more: File Format Type Options. Before loading binary data into Snowflake, you must specify the binary encoding format so that the Snap can decode the string type to binary types before loading into Snowflake; do this by setting the appropriate binary file format option (for example, BINARY_FORMAT=UTF8). Default value: N/A Example: BINARY_FORMAT=UTF8 |
Table Columns | Use this field set to specify the columns to be used in the COPY INTO command (see the sketch after this table). This applies only when the Data source is Staged files. | |
Columns Default | String | Appears when you select Staged files for Data source. Specify the staged file
to be loaded to the target table. Default value: N/A Example: File_name |
Select Query | String/Expression | Appears when the Data source is Staged files. Specify the SELECT query to transform the data before loading it into the Snowflake database. The SELECT statement transform option enables querying the staged data files by either reordering the columns or loading a subset of table data from a staged file. For example, select $1:location, $1:dimensions.sq_ft, $1:sale_date, $1:price from @mystage/sales.json.gz t loads the file sales.json from the internal stage mystage (which stores the data files internally), where location, dimensions.sq_ft, and sale_date are the objects. Alternatively, select substr(t.$2,4), t.$1, t.$5, t.$4 from @mystage t reorders the column data from the internal stage mystage before loading it into a table; the SUBSTR/SUBSTRING function removes the first few characters of a string before inserting it. A complete COPY INTO sketch appears after this table. Note: We recommend that you do not use a temporary stage while loading your data. Default value: N/A Example: select substr(t.$2,4), t.$1, t.$5, t.$4 from @mystage t |
Encryption type | Dropdown list | Specify the type of encryption to be used on the data. The available options are None, Server-Side Encryption, and Server-Side KMS Encryption. The KMS Encryption option is available only for S3 accounts (not for Azure or GCS accounts) with Snowflake. Note: If Staging location is Internal and Data source is Input view, the Server-Side Encryption and Server-Side KMS Encryption options are not supported, because Snowflake encrypts data loaded into its internal staging area and does not allow the user to specify the type of encryption in the PUT API. Learn more: Snowflake PUT Command Documentation. Default value: None Example: Server-Side Encryption |
KMS key | String/Expression | Specify the KMS key that you want to use for S3 encryption. Learn more about the KMS key: AWS KMS Overview and Using Server-Side Encryption. Default value: N/A Example: <Encrypted> |
Buffer size (MB) | String/Expression | Specify the amount of data, in MB, to be loaded into the S3 bucket at a time. This property is required when bulk loading to Snowflake using AWS S3 as the external staging area. Minimum value: 5 MB. Maximum value: 5000 MB. Note: S3 allows a maximum of 10000 parts to be uploaded, so this property must be configured accordingly to optimize the bulk load. Refer to Upload Part for more information on uploading to S3. Default value: 10MB Example: 20MB |
Manage Queued Queries | Dropdown list | Select this property to determine whether the Snap should continue or cancel the execution of queued Snowflake Execute SQL queries when you stop the pipeline. If you select Cancel queued queries when the pipeline is stopped or if it fails, the read queries under execution are canceled, whereas the write queries under execution are not. Snowflake internally determines which queries are safe to cancel and cancels those queries. Default value: Continue to execute queued queries when the Pipeline is stopped or if it fails Example: Cancel queued queries when the Pipeline is stopped or if it fails |
On Error | Dropdown list | Select an action to perform when errors are encountered in a file. The available actions include ABORT_STATEMENT, CONTINUE, SKIP_FILE, SKIP_FILE_*error_limit*, and SKIP_FILE_*error_percent_limit*%. |
Error Limit | Integer | Appears when you select SKIP_FILE_*error_limit* for On Error. Specify the error limit; file loading is skipped when the number of errors exceeds this limit. Default value: 0 Example: 3 |
Error Percentage Limit | Integer | Appears when you select SKIP_FILE_*error_percent_limit*% for On Error. Specify the percentage of errors at which to skip file loading. Default value: 0 Example: 1 |
Size Limit | Integer | Specify the maximum size (in bytes) of data to be loaded. Note: At least one file is loaded regardless of the value specified for SIZE_LIMIT, unless there is no file to be loaded. A null value indicates no size limit. Default value: 0 Example: 5 |
Purge | Checkbox | Appears when the Staging location is
External. Specify whether to purge the data files from the
location automatically after the data is successfully loaded. Default status: deselected |
Return Failed Only | Checkbox | Specify whether to return only files that have failed to load while
loading. Default status: deselected |
Force | Checkbox | Specify if you want to load all files, regardless of whether they have been
loaded previously and have not changed since they were loaded. Default status: deselected |
Truncate Columns | Checkbox | Select this checkbox to truncate column values that are larger than the maximum
column length in the table. Default status: deselected |
Strings considered as null | String/Expression | Specify the strings to be considered as null when loading into Snowflake. You
can configure multiple case-sensitive comma-separated strings from the input data as
null placeholders. This ensures effective handling of null values without impacting
existing data and optimizes the storage space in the internal stage. Default value: N/A Example: ^_AB, _nullIf |
Validation Mode | Dropdown list | Select the validation mode for visually verifying the data before loading it. The available options are NONE, RETURN_n_ROWS, RETURN_ERRORS, and RETURN_ALL_ERRORS. Default value: N/A Example: RETURN_n_ROWS |
Validation Errors Type | Dropdown list | Appears when Validation Mode is not NONE. Select one of
the following methods for displaying the validation errors:
Default value: Full error Example: Do not show errors |
Rows to Return | Integer | Appears when you select RETURN_n_ROWS, RETURN_ERRORS, or RETURN_ALL_ERRORS for Validation Mode. Specify the number of rows to return. These rows are not loaded into the target table; instead, the data to be loaded is validated and results are returned based on the validation option specified (RETURN_n_ROWS, RETURN_ERRORS, or RETURN_ALL_ERRORS). Default value: 0 Example: 5 |
Snap execution | Dropdown list | Choose one of the three modes in which the Snap executes: Validate & Execute, Execute only, or Disabled. Default value: Execute only Example: Validate & Execute |
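To make the staged-files settings above concrete, the following is a hedged sketch of the kind of COPY INTO statement that corresponds to them; it is not the Snap's literal output, and the table, stage, and option values are illustrative assumptions.

```sql
-- Sketch only: table, stage, pattern, and option values are placeholders.
-- The column list corresponds to the Table Columns field set; PATTERN,
-- ON_ERROR, SIZE_LIMIT, and PURGE correspond to the matching Snap settings.
COPY INTO demo_db.public.employees_table (id, name, salary)
FROM @my_ext_stage/employees/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
PATTERN = '.*employees_[0-9]+[.]csv[.]gz'
ON_ERROR = SKIP_FILE_3
SIZE_LIMIT = 100000000
PURGE = TRUE;

-- Variant using a SELECT transform (the Select Query field)
-- to reorder staged columns before the load:
COPY INTO demo_db.public.employees_table
FROM (SELECT substr(t.$2, 4), t.$1, t.$5 FROM @my_ext_stage t)
FILE_FORMAT = (TYPE = CSV);

-- Variant corresponding to Validation Mode: validate the staged
-- data and return results without loading any rows.
COPY INTO demo_db.public.employees_table
FROM @my_ext_stage/employees/
FILE_FORMAT = (TYPE = CSV)
VALIDATION_MODE = 'RETURN_10_ROWS';
```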
Instead of building multiple Snaps with interdependent DML queries, we recommend you use the Stored Procedure or the Multi Execute Snaps.
In a scenario where a downstream Snap depends on the data processed by an upstream database Bulk Load Snap, use the Script Snap to add a delay so that the data is available.
For example, when performing a create, insert, and delete function sequentially on a pipeline, the Script Snap helps to create a delay between the insert and delete function. Otherwise, the delete function may get triggered before inserting the records into the table.
Troubleshooting
Error | Reason | Resolution |
---|---|---|
Snowflake Google Storage Database accounts do not support external staging when the Data source is the Input view. | Data can only be read from Google Cloud Storage (GCS) with the supplied account credentials, not written to it. | Use internal staging if the data source is the Input view, or change the data source to Staged files for Google Storage external staging. |