BigQuery Bulk Upsert - Streaming

Performs bulk update or insert (upsert) operations into a BigQuery table from existing tables or any input data stream.

Overview

This Snap enables you to perform bulk update or insert (upsert) operations into a BigQuery table from existing tables or any input data stream. The upsert operation updates existing rows if the specified value exists in the target table and inserts a new row if the specified value does not exist in the target table.



Prerequisites

Write access for the Google BigQuery Account is required.

Limitations and Known Issues

None.

Supported Accounts

This Snap works with the following account types. For more information, see Configuring Google BigQuery Accounts.

Snap Views

Type Format Number of Views Examples of Upstream and Downstream Snaps Description
Input Document
  • Min: 1
  • Max: 1
  • CSV Parser
  • JSON Parser
  • JSON Generator

This Snap has exactly one document input view.

Input can come from any Snap that can pass a document to the output view, such as Structure or JSON Generator. Pipeline parameters can also be passed for project ID, dataset ID, and table ID, and so on.

Output Document
  • Min: 0
  • Max: 1
  • Mapper
  • Google BigQuery Execute

The output is in document view format. The data from the incoming document that is loaded to the destination table is the output from this Snap. It gives the load statistics after the operation is completed

The output view contains information about the bulk load details in the temporary table to better understand the flow. This also helps with error handling.

The output view also lists the number of rows that were updated, modified, or inserted in the target table.

Learn more about Error handling.

Snap Settings

Note: Learn about the common controls in the Snap settings dialog.
Field/Field set Description
Label

String

Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Default value: BigQuery Bulk Upsert (Streaming)

Example: GBQ Load Employee Tables

Project ID

String/Expression

Specify the project ID in which the dataset resides.

Default value: N/A

Example: test-project-12345

Dataset ID

String/Expression

Specify the dataset ID of the destination.

Default value: N/A

Example: dataset-12345

Table ID

String/Expression

Specify the table ID of the table you are creating.

Default value: N/A

Example: table-12345

Batch size

String

The number of records batched per request. If the input has 10,000 records and the batch size is set to 100, the total number of requests would be 100.

Default value: 1000

Batch timeout (milliseconds)

String

Time in milliseconds after which the batch will be processed (even though it might be less than the specified batch size).

Batch timeout value must be set with care. When this limit is reached, the batch will be flushed whether or not all the records in the batch were loaded.

Default value: 2000

Batch retry count

String

The number of times the server should try to load a failed batch.

Default value: 0

Batch retry delay (milliseconds)

String

The time delay between each retry.

Default value: 500

Snap execution

Dropdown list
Choose one of the three modes in which the Snap executes. Available options are:
  • Validate & Execute. Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.
  • Execute only. Performs full execution of the Snap during pipeline execution without generating preview data.
  • Disabled. Disables the Snap and all Snaps that are downstream from it.

Default value: Validate & Execute

Example: Execute only

Troubleshooting

Error/Error scenario Reason Resolution
In Google BigQuery, recently streamed data might not be available for DML operations for up to 90 minutes. As a result, you might encounter an error if you execute the BigQuery Bulk Upsert (Streaming) Snap on a table that has been modified within the last 30 minutes. This is a limitation of the BigQuery service when using DML statements. Learn more: https://cloud.google.com/bigquery/docs/data-manipulation-language#dml-limitations

Although Google states that it can take up to 90 minutes to be able to perform upserts on a recently modified table, the time needed to wait is often much less than that and data may be available within minutes.

To attempt retries, configure the Batch retry count and Batch retry delay (milliseconds) settings in the BigQuery Bulk Upsert (Streaming) Snap.
Key column name is required. No key column(s) specified for checking for existing entries. Please enter one or more key column names.
Key column name is not present in target table. Incorrect key column(s) specified for checking for existing entries. Please select one or more key column names from the suggestion box.
All columns in target table are key columns. The merge will fail as all columns in the target table are key columns. Please select one or more (but not all) key column names from the suggestion box.