Load from files

SnapLogic AutoSync can load CSV and JSON files:

  • The CSV file endpoint loads files from a network file system.
  • The Amazon S3 endpoint loads CSV or JSON files from an Amazon S3 bucket.
  • The SFTP source loads CSV or JSON files using Secure File Transfer Protocol (SFTP)

The endpoints have different features and constraints:

  • The CSV file endpoint allows you to select up to 100 files, each of which must be 100 MB in size or less. The data pipeline runs when you save it. You can add data to the same table by changing the load type to Append and uploading a file with the same name.
  • The Amazon S3 and SFTP endpoints allow you to select either CSV or JSON files. Each data pipeline can load multiple files of the same format. AutoSync doesn't load the files until you run the integration. When the data pipeline runs, AutoSync performs a full load. You can schedule synchronization, but be aware that a second load of a file with the same name overwrites the data in that table.

The following sections describe loading from files. Refer to How AutoSync loads data for information on other sources.

CSV file endpoint

The source CSV files must have comma-separated values. On the first run, AutoSync creates a table for each CSV file. AutoSync uploads the files to temporary storage, loads them to the destination, and deletes them from temporary storage.

AutoSync stores CSV values in the target as strings, unless you enable Infer data types for CSV files in the connection configuration. To infer column types, AutoSync checks up to the first 100 rows and can detect the following types:
  • STRING
  • NUMBER
  • BOOLEAN
  • TIMESTAMP
With type inference enabled, AutoSync treats empty strings as null values.

You can configure transformations for the CSV file endpoint. If you later upload files with the same name, AutoSync applies the configured transformations. Learn more.

The default load type for CSV files is Full load. Change the load type to Append to support these use cases:

  • A file exceeds the 100 MB size limit and you want all data in the same destination table. Split the file into multiple files with the same name and upload them one at a time.
  • To upload data from multiple files into the same destination table, use the same file name and upload them one at a time.
Tip: With the Append load type, if you upload a file with content that was previously uploaded, the content is duplicated in the destination. This is because CSV files don't have primary keys to distinguish data as unique.

S3 and SFTP endpoints

For each source file, AutoSync:

  • Creates a table in the destination with the same name as the file.
  • Sets the destination column types:
    • For JSON files, by inferring a basic set of data types from source values.
    • AutoSync stores CSV values in the target as strings, unless you enable Infer data types for CSV files in the connection configuration. To infer column types, AutoSync checks up to the first 100 rows and can detect the following types:
      • STRING
      • NUMBER
      • BOOLEAN
      • TIMESTAMP
      With type inference enabled, AutoSync treats empty strings as null values.
  • Performs a full load. If a table with the same name as the file exists in the destination, AutoSync removes table data and loads values from the file referenced by the data pipeline. AutoSync only supports the Full load type for files loaded from S3.

File source Full load use cases

Full load supports two use cases. One is to upload new data without overwriting and the other is to overwrite stale data:

  • To upload new data without overwriting existing data:

    Use a unique file name each time. For example, a tool might generate a weekly report named weekly-report.csv. To avoid having a data pipeline overwrite last week's report, append a date to the name before uploading the file. Each report loads in a new table. In contrast, for CSV files, if you use the Append load type and don't change the file name, subsequent runs load to the original table.

  • To overwrite existing data periodically:

    Use the same file name as you uploaded before. The new file must have the same structure. For example, a CSV file should have the same header and a JSON file should have the same fields and value types.