BigQuery Bulk Load

Overview

This Snap executes a bulk load of the incoming documents into the Google BigQuery database directly without any intermediary such as Google Cloud Storage. The data is loaded in batches of configurable size. Once a batch is successfully loaded into the target table, the next batch is loaded. The batch load is also subject to a timeout. If the timeout is reached before all the documents in the batch are loaded, the batch is flushed and the next batch's bulk load is started. If the bulk load fails, then the Snap will try to load the batch again after a time delay. Retry occurs for a set number of times. The batch timeout limit, size of a batch, the time delay between retries, and the number of retries are all configurable.

Important:

Snaps in Google BigQuery Snap Pack

Write datetime values to the database tables, always in UTC format.
Convert any non-UTC values in the incoming data to UTC before writing them.
Consider datetime values without the time zone mentioned, as in UTC.

So, ensure that you include the time zone in all the datetime values that you load into Google BigQuery tables using this Snap.

For example: "2020-08-29T18:38:07.370 America/Los_Angeles", "2020-09-11T10:05:14.000-07:00", "2020-09-11T17:05:14.000Z"

Snap Type: Write
Does not support Ultra Tasks

Supported Accounts

Prerequisites

Write access to the Google BigQuery Account is required.

Limitations and Known Issues

The batch size can be a maximum of 10,000.

Known Issue

Copying data by creating a table with the same name in Google BigQuery immediately after deleting it, may not insert the rows as expected. This behavior is due to the way the tables are cached and the internal table ID is propagated throughout the system.

Workaround

We recommend you avoid rewriting in Google BigQuery and suggest the following workarounds. You can choose to use them individually or in unison, to suit your requirement.

Truncate the existing table instead of deleting it.
Add some randomly generated prefix to the table name each time you create the table (new or unique table name every time).

Behavior Change

In the 4.27 (427patches13615) release and later, pipelines that truncated (or deleted and re-created) a table and then used the BigQuery Write or BigQuery Bulk Load (Streaming) Snaps were allowed to retry the load operation. This lead to potential data loss because Google BigQuery uses an eventually consistent backend where one server might respond that it's okay to write, but other servers still truncate the table.

In the 4.32 (432patches20298) release and later, we disabled retries on truncated tables in these two Snaps to prevent a potential loss of data. We recommend that you use the BigQuery Bulk Load - Cloud Storage Snap instead.

Configurations

Account & Access

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. See Configuring Google BigQuery Accounts for information on setting up this type of account.

Snap Views


Type	Format	Number of Views	Examples of Upstream and Downstream Snaps	Description
Input	Document	Min: 1 Max: 1	Mapper JSON Generator	This Snap has exactly one document input view.
Output	Document	Min: 0 Max: 1	Mapper	This Snap has at most one document output view.
Learn more about Error handling.

Snap Settings

Note: Learn about the common controls in the Snap settings dialog.


Field/Field set	Description
Label String	Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline. Default value: N/A Example: BigQuery Bulk Load Streaming
Project ID String/Expression	Required. Project ID of the project billed for the query. This is a suggestible field and can be populated based on the Account settings. Default value: None. Example: case1234
Dataset ID String/Expression	Required. Dataset ID of the destination table. This is a suggestible field and all the datasets in the specified project will be populated. Default value: None. Example: babynames
Table ID String/Expression	Required. Table ID of the destination table. This is a suggestible field and all the tables in the datasets will be listed. Default value: None Example: customerdata1
Create table if not present Checkbox	Specifies that the table should be created if not already present in the database. Default: Deselected
Batch size Integer/Expression	The number of records batched per request. If the input has 10,000 records and the batch size is set to 100, the total number of requests would be 100. Default value: 1000 Example: 100
Batch timeout (milliseconds) Integer/Expression	Time in milliseconds to elapse following which the batch if not empty will be processed even though it might be lesser than the given batch size. Default value: 2000 Example: 500 Important: Batch timeout value must be set with care. When this limit is reached, the batch will be flushed irrespective of whether all the records in the batch were loaded.
Batch retry count Integer/Expression	The number of times the server should try to load a failed batch. Default value: 0 Example: 5
Batch retry delay (milliseconds) Integer/Expression	The time delay between each retry. Default value: 500 Example: 500
	Choose one of the three modes in which the Snap executes. Available options are: `Validate & Execute`: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime. `Execute only`: Performs full execution of the Snap during pipeline execution without generating preview data. `Disabled`: Disables the Snap and all Snaps that are downstream from it. Default value: Execute only Example: Validate & Execute

Implicit retries in BigQuery Snaps

The BigQuery Snaps handle all retriable BigQuery errors (BigQuery exception, IO exception, and Runtime exception) internally.

429 (Too Many Requests):
- Retry attempts: Maximum of 5 retries.
- Delay Between Retries: Backoff strategy with jitter (random variation) is applied to prevent synchronized retries and reduce load.
401 (Unauthorized):
- Retry attempts: Maximum of 3 retries.
- Delay Between Retries: Backoff strategy is applied.
- Additional Actions: Reloads the BigQuery account on the retry event.
IOException and 500, 502, 503, 504 (Server Errors):
- Retry attempts: Maximum of 3 retries.
- Delay Between Retries: Backoff strategy is applied.

Note on Writing Numeric Values

Important:

Writing numeric values into Google BigQuery tables

Google BigQuery tables support columns with a NUMERIC data type to allow storing big decimal numbers (up to 38 digits with nine decimal places). But Snaps in Google BigQuery Snap Pack that load data into tables cannot create numeric columns. When the Create table if not present check box is selected, the Snaps create the required table schema, but map big decimals to a FLOAT64 column. So, to store the data into numeric columns using these Snaps, we recommend the following actions:

Create the required schema, beforehand, with numeric columns in Google BigQuery.
Pass the number as a string.

The Google API converts this string into a number with full precision and saves it in the numeric column.

Example:


Value Passed Through Snap	Value Stored in BigQuery	Remarks
`"12345678901234567890123456789.123456789"`	12345678901234567890123456789.123456789	As per this issue logged in Google Issue Tracker, if you send the values as strings, the values are never converted to floating-point form, so this works as expected.
`12345678901234567890123456789.123456789`	123456789012345678000000000000	Big decimal values sent as non-string values lose precision.

BigQuery Bulk Load - Streaming