CSV file as a source

You can use comma-delimited CSV files as a source for a data pipeline. AutoSync uploads the files to temporary storage, loads them to the destination, and deletes them from temporary storage. Each source file loads into a target table of the same name.

AutoSync stores CSV values in the target as strings, unless you enable Infer data types for CSV files in the connection configuration. To infer column types, AutoSync checks up to the first 100 rows and can detect the following types:
  • STRING
  • NUMBER
  • BOOLEAN
  • TIMESTAMP
With type inference enabled, AutoSync treats empty strings as null values.

AutoSync supports the Full load and Append load types for CSV files. The first run of the data pipeline uses Full load. You can change the load type to Append by editing the data pipeline and selecting the Auto synchronize tab. Learn more about how AutoSync loads files and load types.

You cannot schedule synchronization for a data pipeline with CSV files as a source. To use the data pipeline again, select more files. The Upload CSV file action is available from the card, the three-dot menu, the list view, and the details panel.

Tip: This endpoint uploads files from a network location. You can also upload CSV files from an Amazon S3 bucket using the S3 endpoint or by using the SFTP endpoint.

Requirements

File names cannot include single or double quotes or a special character, such as #, $, or %. Files must not be empty. To load correctly, each file must:

  • Have a header row.
  • Be uncompressed.
  • Be less than 100 MB in size.
  • Use single commas as delimiters.
  • Use quotes to enclose meaningful spaces and commas that are part of a field value. AutoSync trims leading and trailing spaces.
  • Use quotes to enclose fields that include commas in the value. For example, in a source that lists vehicles where the header specifies the year, make, model, and description columns, the quotes preserve the comma in the description column: 1997, Ford, E350, "Extended cab, luxury truck"

Transformations

A data pipeline using the CSV file endpoint doesn't save the file names after the data loads to the target. However, AutoSync does save any transformation configuration. You can use the same data pipeline to upload one or more files with the same name, and AutoSync applies any previously configured transformations. This is true whether you use the Upload CSV file link on the data pipeline card, or edit it — when you select files with the same name.

For example, you might create a data pipeline named CSV-employees to upload data from an employees.csv file to Snowflake. This file lists phone numbers that shouldn't be exposed in the Snowflake table. When creating the data pipeline, remove the Phone column in the Filter columns dialog to prevent AutoSync from loading it. Later, the HR department hires more people and adds them to the employees.csv file.

You can add the new hires to the target table in one of the following ways:
  • From the CSV-employees data pipeline card:
    1. Click Upload CSV file.
    2. Select the updated employees.csv file.
    3. Click Save to upload the data without the phone numbers.
  • Double-click the CSV-employees data pipeline to edit it:
    1. In the Endpoints tab, select the updated employees.csv file.
    2. Click the Transformations tab and AutoSync prompts you to refresh the preview.
    3. Click the Refresh link in the message.
    4. The Removed columns list still contains the Phone column.
    5. Click Save to load the updated file without the phone numbers.