How AutoSync loads data

SnapLogic® AutoSync retrieves data from the sources you choose and loads it to a destination Cloud Data Warehouse (CDW). Source data can come from CDWs, relational databases, files such as CSV and JSON, or applications such as Salesforce, Zendesk, and Gainsight. View the list of all supported endpoints.

This page describes how AutoSync loads data from sources with schemas and from applications. Refer to How AutoSync loads files for information on file loading.

After the initial load, if you set a schedule, AutoSync synchronizes automatically. To optimize throughput, AutoSync loads in batches. Depending on the source and target, synchronization might be a full load (where target data is overwritten), an incremental load, append (for CSV files only), or SCD2, which captures historical changes. You can change the synchronization schedule or load type after creating a data pipeline.
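Incremental loads use upsert semantics: new records are inserted and changed records overwrite their earlier versions. A minimal sketch of that behavior using SQLite (purely illustrative; this is not AutoSync's internal implementation, and the table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT)")

# Initial full load
conn.executemany("INSERT INTO contact VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])

# Incremental load: insert new rows, update rows whose key already exists
conn.executemany(
    "INSERT INTO contact VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
    [(2, "Linus"), (3, "Grace")],
)

rows = conn.execute("SELECT * FROM contact ORDER BY id").fetchall()
# [(1, 'Ada'), (2, 'Linus'), (3, 'Grace')]
```

A full load, by contrast, would drop or truncate the target table and reload everything, which is why it transfers more data.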

To verify the number of records that AutoSync processed, check the data pipeline details panel. Open the panel by clicking the data pipeline card. For the last run, the panel lists the number of new or updated records and those that failed to load.

To choose which tables or objects to load from a source, consider the following:
  • Size: The number and size of table rows or objects directly affect how long loading and synchronization take, as well as any storage costs in staging and in the destination. Some destinations limit table or record size. Load only the data you need to improve performance and reduce costs. When creating a data pipeline, you can exclude columns that you do not need to synchronize.
  • Synchronization: Choose a synchronization schedule and load type that meet your requirements for data freshness with the least amount of data transfer.
  • Related information: If a table or object references other tables or objects, you must select the related entities to have AutoSync load them.

Names and data types in the target

For the loaded data, AutoSync:
  • Uses the source table name, file name, or object name for the target table. If the table doesn't exist, AutoSync creates it.
  • Converts the data into a compatible type supported by the destination. For sources without a schema, such as CSV files, AutoSync uses a string type in the target. For JSON files, AutoSync can infer string, integer, float, boolean, and object types.
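As an illustration of type inference from JSON, a standard JSON parser maps values to native types in much the same way. The record below is hypothetical, and AutoSync's actual inference rules may differ:

```python
import json

# Hypothetical source record with one value of each inferable kind
record = json.loads(
    '{"name": "Ada", "age": 36, "score": 9.5, '
    '"active": true, "address": {"city": "London"}}'
)

# Map each field to the type the parser inferred
inferred = {key: type(value).__name__ for key, value in record.items()}
# {'name': 'str', 'age': 'int', 'score': 'float',
#  'active': 'bool', 'address': 'dict'}
```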
  • Adds columns that AutoSync uses internally. The column names are prepended with AUTOSYNC_. For example, the following shows the column definitions of a table that AutoSync loaded into Snowflake:
    Columns created by AutoSync

For example, the following image shows how AutoSync loads a Salesforce Contact object as a Contact table in Snowflake. The image shows only the first three fields and columns in alphabetic order. AutoSync converts the Salesforce Reference and String values to VARCHAR in Snowflake. Because Snowflake supports the DATE type, AutoSync does not convert the birth dates.

A Salesforce Contact object loads as a Contact table in Snowflake

Learn more about how AutoSync maps source data types to the target.

Reserved words and name casing

Destination endpoints typically have a set of reserved keywords that cannot be used as table or column names without an escape character. Some destinations, such as Snowflake, use a convention of uppercase letters for table and column names. By default, AutoSync converts names to the destination's case convention and escapes any reserved keywords. You might want to override this behavior if you have case-sensitive queries that expect the casing of the source.

When configuring a Snowflake endpoint, you can select the option to Enclose all target table and column names in double quotes to preserve the case. The following table uses a reserved keyword, CONSTRAINT, to illustrate the results in the destination and the impact on the SQL query.

Case preserved | Name in source | Name in target | SQL query
Yes | Constraint | "Constraint" | SELECT * FROM "Constraint"
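In standard SQL (including Snowflake), wrapping an identifier in double quotes preserves its exact case and allows reserved words as names. A small helper sketch showing generic identifier quoting (this is not an AutoSync API):

```python
def quote_ident(name: str) -> str:
    # Standard SQL quoting: double any embedded double quotes, then wrap
    return '"' + name.replace('"', '""') + '"'

# A reserved word used as a table name, case preserved
query = f"SELECT * FROM {quote_ident('Constraint')}"
# SELECT * FROM "Constraint"
```

Without the quotes, Snowflake would treat the identifier as case-insensitive and fold it to uppercase.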

How AutoSync handles errors

Loading errors can have many causes, including:

  • Invalid source data.
  • The source data exceeds destination limits, for example, on the number of rows or columns or on the total size in bytes.
  • Source table or column names are not valid in the destination, for example, names with unsupported characters or with too many characters.

On the initial run of a data pipeline, AutoSync does a full load. Except for file sources and endpoints that do not support incremental load, subsequent runs load incrementally, adding only new records and updating changed records. On some endpoints, you have the option to use SCD2, which also loads only new and updated records. AutoSync handles errors differently for full and incremental loads:

  • If errors occur during a full load, AutoSync generates an error log and attempts to finish the load. Environment admins (Org admins) can Find error logs for records that failed to load.
  • If errors occur during upsert, synchronization fails.
The data pipeline details panel reports errors and records that failed during loading. Open the details panel by clicking a data pipeline in the card or list view. Error messages contain a link to the Execution overview page, where you can view information about the failures:

For the incremental load type, AutoSync can log errors in a destination table:
  • The error table name is the source table name prepended with autosync_error_table_.
  • The first column in the table holds the primary key of the record that failed.
  • The ERROR_TYPE column identifies the failure: Insert Error for an insert error during a full or an incremental load, and Update Error for an update error during an incremental load.
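Following the naming convention above, the error table name can be derived from the source table name. A hedged sketch (the helper function and the example table name are illustrative, not an AutoSync API):

```python
def error_table_name(source_table: str) -> str:
    # Per the documented convention: prepend autosync_error_table_
    return f"autosync_error_table_{source_table}"

# Hypothetical source table named CONTACT
name = error_table_name("CONTACT")
# autosync_error_table_CONTACT
```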
Currently, error tables are supported for Snowflake and Google BigQuery. To enable error tables, select the error table checkbox when creating or editing credentials. The following example shows the checkbox for Snowflake:
Error table checkbox