File Reader

Overview

You can use this Snap to read data from various sources (such as SLDB, HTTP, S3, SFTP, and HDFS) and produce a binary data stream at the output.

Note:

If you use the ABFS (Azure Blob File System) file protocol with Azure Data Lake Gen 2 for bulk operations, you must install the AzCopy utility. The utility must be installed on the Snaplex so that its file path can be fetched. If the path is null, the native Azure Storage SDK is used for all operations. Learn more about the AzCopy command. If the AzCopy utility is not installed, ABFS file transfers are slower because a REST call is invoked for each file's content instead of a bulk operation.

The SnapLogic Platform does not support the installation of utilities or processes on Cloudplexes. Learn more.

Important:

We plan to introduce additional S3 features exclusively in the Amazon S3 Snaps; Binary Snaps with S3 support will not receive these updates. We therefore recommend that you use the Amazon S3 Snap Pack for all S3 operations in your pipelines. Binary Snaps are retained as is to maintain backward compatibility, but we will no longer provide S3 support for them.

Learn more: Migrate from Binary to S3 Snaps.



Prerequisites

IAM Roles for Amazon EC2

The 'IAM_CREDENTIAL_FOR_S3' feature is used to access S3 files from an EC2 Groundplex without an Access-key ID and Secret key in the AWS S3 account in the Snap. The IAM credential stored in the EC2 metadata is used to gain access rights to the S3 buckets. To enable this feature, set the following Global property (Key-Value parameter) and restart the JCC:

jcc.jvm_options = -DIAM_CREDENTIAL_FOR_S3=TRUE

This feature is supported in the EC2-type Groundplex only. Learn more.

Connect to FTP server

To connect to an FTP server that needs to reuse the session for data transfer over the TLS protocol, add one of the following properties as a JVM option under the Global properties of the Node Properties tab:
-DFTPS_SSL_TLS_PROTOCOL=TLSV1.2
-DFTPS_SSL_TLS_PROTOCOL=TLSV1.3

Limitations

  • For most file protocols, the Snap behaves the same in both Cloudplex and Groundplex. However, the HDFS protocol works only in the Groundplex. The Hadoop cluster must be open to the Groundplex server instance without any authentication.

  • When reading a file over HTTP, the File Reader Snap displays an error if the number of bytes consumed does not match the Content-Length header value present in the response.

  • Do not use sldb as a file system or storage. File Assets are intended only for specialized files that a pipeline uses to reference specific data, such as accounts, expressions, or JAR files. Use a Cloud storage provider to store production data. File Assets should not be used as a file source or as a destination in production pipelines. When you configure the File Reader Snap, set the file path to a cloud provider or an external file system.
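
The Content-Length limitation above can be illustrated outside the Snap. The following Python sketch is not the Snap's implementation: the function name check_content_length is hypothetical, and it assumes the check is a plain comparison between the number of bytes read and the header value.

```python
from typing import Mapping, Optional


def check_content_length(headers: Mapping[str, str], bytes_consumed: int) -> Optional[str]:
    """Return an error message when bytes_consumed differs from Content-Length.

    Returns None when the header is absent (for example, a chunked response)
    or when the two values match. Hypothetical helper, for illustration only.
    """
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    declared = normalized.get("content-length")
    if declared is None:
        return None
    expected = int(declared.strip())
    if expected != bytes_consumed:
        return f"expected {expected} bytes but consumed {bytes_consumed}"
    return None
```

A mismatch usually points to a truncated download or a proxy that rewrote the response body, so checking the source server's response headers is a reasonable first step.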

Known issues

  • This Snap fails for an SMB file path with the error: unable to create new native thread.
  • This Snap Pack does not natively support SHA1-based algorithms to connect to SFTP endpoints. With the August 2023 GA release, you can use the properties specified in the Configuration settings for Snaps to add support for algorithms that are disabled on your Snaplex.
  • If the Snap encounters a file with the same name as your Project Space, it can result in an error when you attempt to use that file's name within the Mapper Snap. For instance, if your Project Space is named "servicenow/to_snowflake" and the file being read is named "servicenow_to_snowflake_demo.json," you may encounter issues. Consider using the complete file path instead of just the file name as a workaround.

Snap views

View Description Examples of upstream and downstream Snaps
Input Input may contain value(s) to evaluate the JavaScript expression in the File property.
  • Upstream Snap is optional. Any Snap with a document output view can be connected upstream.
Output Binary data read from the source specified in the File property with header information about the binary stream.
Warning: When reading a file, all the characters in the filename (except dot, hyphen, underscore, space, alphabet, and digit) are replaced with an underscore (_) in the content-disposition column of the output. If you want to get the exact file name in the output column, consider using content-location as the identifier instead of content-disposition.
An example of the output preview on the File property value of "http://www.facebook.com" is as follows:
[
  {
    "": "Preview binary0...",
    "content-type": "text/html; charset=utf-8",
    "x-frame-options": "DENY",
    "connection": "keep-alive",
    "transfer-encoding": "chunked",
    "date": "Thu, 23 Oct 2014 00:24:40 GMT",
    "content-location": "https://www.facebook.com",
    "pragma": "no-cache",
    "p3p": "CP=\"Facebook does not have a P3P policy. Learn why here: http://fb.me/p3p\"",
    "cache-control": "private, no-cache, no-store, must-revalidate",
    "x-xss-protection": "0",
    "x-content-type-options": "nosniff",
    "x-fb-debug": "N6wiHWAvz9kzpPUoM5vTm+yZzCZyiSrHXFXumHQixfMd0Qi+VDm514PkrrmQu2ISuuMTTFtUTqDZgDVG4blPTw==",
    "expires": "Sat, 01 Jan 2000 00:00:00 GMT",
    "set-cookie": "reg_ext_ref=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=.facebook.com"
  }
]
Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution: Stops the current pipeline execution when an error occurs.
  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.
  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.
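
The content-disposition warning in the Output view can be sketched in Python as follows. This is an approximation under stated assumptions, not the Snap's code: sanitize_filename is a hypothetical name, and "alphabet" is assumed to mean the ASCII letters A-Z and a-z.

```python
import re

# Characters kept as-is: dot, hyphen, underscore, space, ASCII letters, digits.
# Everything else is replaced with an underscore.
_REPLACED = re.compile(r"[^A-Za-z0-9._\- ]")


def sanitize_filename(name: str) -> str:
    """Approximate the file name that appears in content-disposition."""
    return _REPLACED.sub("_", name)


print(sanitize_filename("report (final)#1.csv"))  # prints "report _final__1.csv"
```

Because the replacement is lossy, two different source names can map to the same content-disposition value, which is why content-location is the safer identifier.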

Snap settings

Legend:
  • Expression icon: Allows using pipeline parameters to set field values dynamically (if enabled). SnapLogic Expressions are not supported. If disabled, you can provide a static value.
  • SnapGPT: Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
  • Suggestion icon: Populates a list of values dynamically based on your Snap configuration. You can select only one attribute at a time using the icon. Type into the field if it supports a comma-separated list of values.
  • Upload: Uploads files. Learn more.
Learn more about the icons in the Snap settings dialog.
Field / Field set Type Description
Label String

Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if the pipeline contains more than one File Reader Snap.

Default value: File Reader

Example: File Reader
File String/Expression Required. Specify the URL for a regular file that must begin with a file protocol. The supported file protocols are:
  • http:
  • https:
  • s3:
  • ftp:
  • ftps:
  • sftp:
  • hdfs:
  • sldb:
  • smb:
  • file: (only for use with a Groundplex)
  • wasb:
  • wasbs:
  • gs:
  • adl:

You can also upload a file using the Upload icon. You can preview the uploaded file using the Preview icon. Learn more about previewing a file.

Note:
  • This Snap supports S3 Virtual Private Cloud (VPC) endpoints. For example:

    s3://my-bucket@bucket.vpce-028b7814794578709-vu0vvauy.s3.us-west-2.vpce.amazonaws.com

  • This Snap supports Oracle Object Storage endpoints when used with pre-authenticated requests. For example:

    https://objectstorage.us-sanjose-1.oraclecloud.com/p/123AbcdEFG12345_xyz123/n/MyNamespace/b/snaplogic-academy/o/sample.json

  • To create a pre-authenticated request, refer to the instructions in the following Oracle article: Using Pre-Authenticated Requests.
Warning: The File value should be an absolute path for all protocols except SLDB. For files in SLDB, the Snap can read only files in the same Project Directory or the Shared Project Directory. It cannot access files from other Projects. Typically, the file names in the Reader Snaps are read from the incoming document, which might have a structure different from the relative path. For optimal results, we recommend that you build absolute paths to your projects and then add the file name.

Note: When you provide a file path that contains more than five entities (for example, entity1/entity2/entity3/entity4/file1.json) the Snap displays a Lint Warning in your Pipeline.

Note:
  • "://" separates the file protocol from the rest of the URL. The hostname and the port number go between "://" and "/". The hostname and port number are omitted in the SLDB and S3 protocols. If the port number is omitted, the default port for the protocol is used.
  • The file:/// protocol is supported only on a Groundplex. In Cloudplex configurations, use SLDB or other file protocols. When you use the file:/// protocol, file access uses the permissions of the user assigned or associated with the Snaplex (by default, Snapuser). Use file system access with caution; it is the customer's own responsibility to ensure that the file system is cleaned up after use.

Default value: N/A

Example:
  • "asset.json" or "sldb:///asset.json"

  • "shared/asset.json" or "sldb:///shared/asset.json"

  • s3://my-bucket@bucket.vpce-028b7814794578709-vu0vvauy.s3.us-west-2.vpce.amazonaws.com
  • s3:///<S3_bucket_name>@s3.<region_name>.amazonaws.com/<path>/<file_name>

    For region names and their details, see AWS Regions and Endpoints.

    Example: s3:///[email protected]/test.json

  • sftp://ftp.snaplogic.com:22/dir/filename

  • smb://smb.snaplogic.com:445/test_files/csv/input.csv

  • $filename (The value of the $filename is obtained from the input document and the document should have an entry with the "filename" key.)

  • _filename (A key/value pair with "filename" key should be defined as a pipeline parameter.)

  • file:///D:/testFolder/ (if the Snap is executed in the Windows Groundplex and needs to access D: drive)

  • wasb:///Snaplogic/testDir/sample.csv (to read 'sample.csv' file in the 'testDir' folder in the 'Snaplogic' container)

  • gs:///mybucket/csv/test.csv (to read 'test.csv' file in the 'csv/' folder of the 'mybucket' bucket)

  • adl://storename/folder/filename (to read the file from a location of the storage)

Prevent URL encoding Checkbox

Select this checkbox to prevent the Snap from automatically URL encoding the file path (including the query string, if it exists). Enable this setting to use the file path value as-is.

Refer to the section Encoding of Characters in a URL.

Default status: Deselected

Enable staging Checkbox If selected, the Snap downloads the source file into a local temporary file. When the download is complete, it streams the data from the temporary file to the output view. This setting prevents the Snap from being blocked by a slow downstream pipeline. The local disk must have free space at least as large as the expected file size.
Warning: Some Snaps may take a long time to process large amounts of data. This, in turn, could lead to connection timeouts, causing the pipeline to fail. Selecting this property saves the data on your local disk, enabling you to avoid such timeouts.

Default status: Deselected

Number of retries Integer/Expression Specify the maximum number of retry attempts that the Snap must make in case there is a network failure, and the Snap is unable to read the target file.
Note:
  • If the value is larger than 0, the Snap first downloads the target file into a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and attempts to download the file again from the beginning. When the download is successful, the Snap streams the data from the temporary file to the downstream Pipeline. All temporary local files are deleted when they are no longer needed.

  • Ensure that the local drive has sufficient free disk space to store the temporary local file.

Minimum value: 0

Default value: 0

Example: 3
Retry interval (seconds) Integer/Expression Specify the minimum number of seconds for which the Snap must wait before attempting recovery from a network failure.

Minimum value: 1

Default value: 1

Example: 3

Additional output headers Use this field set to define key-value pairs as additional headers in the binary output.
Note: If the downstream Snap is HTTP Client Snap, these headers are used to evaluate expressions and perform the multipart file upload from various file protocols multiple times.
Key String Specify the key name for the additional output header.
Value Dropdown list/Expression Specify the value for the additional output header.
Advanced properties Use this field set to define specific settings for polling files. Click to add a new row for defining an advanced property. This field set contains the following fields:
  • Properties
  • Values
Properties Dropdown list The URI of the Shared Access Signature (SAS) to be accessed. Supported SAS types are:
  • Service SAS on container

  • Service SAS on blob

  • Account SAS

Values String/Expression Specify the value for the SAS URI.
Warning: Ensure that the URI is specified in the format described here.

If the SAS URI value is provided in the Snap settings, then the settings provided in the account (if any account is attached) are ignored.

Default value: N/A

Example: https://myaccount.blob.core.windows.net/sascontainer/sasblob.txt?sv=2015-04-05&st=2015-04-29T22%3A18%3A26Z&se=2015-04-30T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D

Snap execution Dropdown list
Choose one of the three modes in which the Snap executes. Available options are:
  • Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.
  • Execute only: Performs full execution of the Snap during pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only

Example: Validate & Execute
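
The interaction between Number of retries and Retry interval described in the settings above can be sketched in Python. This is a simplified illustration, not the Snap's implementation: download_with_retries and open_source are hypothetical names, OSError stands in for a network failure, and the function returns the bytes at once instead of streaming them downstream.

```python
import os
import tempfile
import time
from typing import BinaryIO, Callable


def download_with_retries(
    open_source: Callable[[], BinaryIO],
    retries: int = 0,
    retry_interval: float = 1.0,
    chunk_size: int = 64 * 1024,
) -> bytes:
    """Stage the source in a temporary file, restarting from the beginning on error."""
    failures = 0
    while True:
        fd, tmp_path = tempfile.mkstemp()
        try:
            # Download the whole source into the temporary file first.
            with os.fdopen(fd, "wb") as tmp:
                with open_source() as src:
                    while chunk := src.read(chunk_size):
                        tmp.write(chunk)
            # Only a complete download is handed on to the caller.
            with open(tmp_path, "rb") as staged:
                return staged.read()
        except OSError:
            failures += 1
            if failures > retries:
                raise
            time.sleep(retry_interval)
        finally:
            # Temporary files are deleted when they are no longer needed.
            os.remove(tmp_path)
```

With retries set to 3, the sketch makes at most four attempts in total. Each attempt writes a fresh temporary file, so the local disk needs room for one full copy of the source.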

Reading files from Project and Shared Project Spaces

  • If a Pipeline is created in a project other than the shared project and you want to read the "asset.json" file from the same project, enter "asset.json" or "sldb:///asset.json".
  • If a Pipeline is created in the shared project and you want to read the "asset.json" file from the shared project, enter "asset.json" or "sldb:///asset.json".
  • If a Pipeline is created in a project other than the shared project and you want to read the "asset.json" file from the shared project, enter "shared/asset.json" or "sldb:///shared/asset.json".
  • Ensure that the file name, folder name, or file path does not contain the '?' character. It is not fully supported, and the Snap might fail when it is present.

File value as an Expression

The File value can be a JavaScript expression which is evaluated with values from the input view document and the pipeline parameters. The syntax for file value is: [protocol]://[host][:port]/[path]

  • $filename (The value of the $filename is obtained from the input document and the document should have an entry with the "filename" key.)
  • _filename (A key/value pair with "filename" key should be defined as a pipeline parameter.)
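
The [protocol]://[host][:port]/[path] syntax above can be checked with Python's standard urllib.parse module. This sketch only shows how a File value decomposes; describe_file_url is a hypothetical helper, and the Snap's own parsing may differ.

```python
from urllib.parse import urlsplit


def describe_file_url(url: str) -> dict:
    """Split a File value into protocol, host, port, and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,  # None when the host is omitted, as in sldb:///
        "port": parts.port,      # None when the port is omitted
        "path": parts.path,
    }


print(describe_file_url("sftp://ftp.snaplogic.com:22/dir/filename"))
print(describe_file_url("sldb:///shared/asset.json"))
```

The second call shows why SLDB values have three slashes: the host between "://" and "/" is empty.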

Encoding of Characters in a URL

Following are some of the common characters that are automatically encoded by the Snap:

Character name   Character   URL Encoded value
Backslash        \           %5C
Left-angle       <           %3C
Left-square      [           %5B
Percent          %           %25
Pound            #           %23
Right-angle      >           %3E
Left-curly       {           %7B
Right-curly      }           %7D
Right-square     ]           %5D
Space            (space)     %20

Following are some of the characters that are not automatically encoded by the Snap:

Character name   Character   URL Encoded value
Semi-colon       ;           %3B
Question mark    ?           %3F
Plus             +           %2B
Forward slash    /           %2F
Equals           =           %3D
Dollar           $           %24
Comma            ,           %2C
Colon            :           %3A
Ampersand        &           %26
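
The two tables above can be approximated with urllib.parse.quote from the Python standard library. This is a sketch, not the Snap's encoder: encode_path is a hypothetical name, and quote also encodes characters that the tables do not list (for example, "@" and quotation marks), so treat the output as indicative only.

```python
from urllib.parse import quote

# Characters the second table lists as NOT automatically encoded.
NOT_ENCODED = ";?+/=$,:&"


def encode_path(path: str) -> str:
    """Percent-encode a path while leaving the reserved characters above untouched."""
    return quote(path, safe=NOT_ENCODED)


print(encode_path("/dir/my file#1.csv"))  # prints "/dir/my%20file%231.csv"
print(encode_path("q?x=1&y=2;z"))         # prints "q?x=1&y=2;z"
```

If a file path already contains encoded sequences such as %20, encoding it again turns "%" into "%25". That case is what the Prevent URL encoding checkbox is for.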

Preview File

To preview a file, click the Preview icon in the File field.



The Preview Type contains the following options:
  • Hex: Displays the preview data in hexadecimal format.
  • Text: Displays the preview data in text format.
  • Render text with whitespace: Renders whitespaces as dots "." and tabs as underscores "_" in the preview data.
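
The three Preview Type options above can be imitated in a few lines of Python. The function names are hypothetical and UTF-8 decoding is an assumption; the actual preview dialog may format its output differently.

```python
def preview_hex(data: bytes) -> str:
    """Hex: show each byte as two hexadecimal digits, separated by spaces."""
    return data.hex(" ")


def preview_text(data: bytes, encoding: str = "utf-8") -> str:
    """Text: decode the bytes, replacing any undecodable sequences."""
    return data.decode(encoding, errors="replace")


def preview_with_whitespace(data: bytes, encoding: str = "utf-8") -> str:
    """Render text with whitespace: spaces become dots, tabs become underscores."""
    return preview_text(data, encoding).replace(" ", ".").replace("\t", "_")


print(preview_hex(b"Hi\t"))                 # prints "48 69 09"
print(preview_with_whitespace(b"a b\tc"))   # prints "a.b_c"
```

The whitespace rendering is useful for spotting a tab-delimited file that was expected to be space-delimited, or trailing spaces in a field.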

Troubleshooting

Error Reason Resolution

Response code: 400, unable to import the file <file name>

Request from elastic.snaplogic.com returned an error.

The name of the file that is being read by the Snap cannot be the same as the Project Space name.

Provide the complete path of the file (instead of only the file name) in this format: /orgname/projectspace/project/filename

For example: /snaplogic/shared/analytics/ga4.json

Response code: 400, unable to import expression library: <file_ProjectSpace>

Request from elastic.snaplogic.com returned an error.

Path names at root level are not allowed.

Provide the complete path of the file (instead of only the file name) in this format: /orgname/projectspace/project/filename

For example: /snaplogic/shared/analytics/ga4.json
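
The complete-path format used in both resolutions can be assembled with a small helper. This Python sketch is illustrative only: build_asset_path is a hypothetical name, and it assumes each segment is a plain name without nested folders.

```python
def build_asset_path(org: str, project_space: str, project: str, filename: str) -> str:
    """Join the four segments as /orgname/projectspace/project/filename."""
    segments = [part.strip("/") for part in (org, project_space, project, filename)]
    if any(not segment for segment in segments):
        raise ValueError("every path segment must be non-empty")
    return "/" + "/".join(segments)


print(build_asset_path("snaplogic", "shared", "analytics", "ga4.json"))
# prints "/snaplogic/shared/analytics/ga4.json"
```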

Examples