Multi File Reader

Overview

You can use this Read-type Snap to read binary data from sources such as SLDB, HTTP, S3, SFTP, and HDFS and produce a binary data stream at the output. Unlike the File Reader Snap, this Snap can read more than one file in the given directory and, recursively, in its subdirectories.
Important:

We plan to introduce additional S3 features exclusively in Amazon S3 Snaps; Binary Snaps with S3 support will not receive these updates. Therefore, we recommend that you use the Amazon S3 Snap Pack for all S3 operations in your pipelines. Binary Snaps will be retained as is to maintain backward compatibility, but be aware that we will no longer provide S3 support for the Binary Snaps.

Learn more: Migrate from Binary to S3 Snaps.



Prerequisites

IAM Roles for Amazon EC2

The 'IAM_CREDENTIAL_FOR_S3' feature is used to access S3 files from an EC2 Groundplex without an Access-key ID and Secret key in the AWS S3 account in the Snap. The IAM credential stored in the EC2 metadata is used to gain access rights to the S3 buckets. To enable this feature, set the following Global property (key-value parameter) and restart the JCC:

jcc.jvm_options = -DIAM_CREDENTIAL_FOR_S3=TRUE

This feature is supported in the EC2-type Groundplex only. Learn more.

Connect to FTP server:

To connect to an FTP server that needs to reuse the session for data transfer over the TLS protocol, add one of the following properties as a JVM option under the Global properties of the Node Properties tab:

-DFTPS_SSL_TLS_PROTOCOL=TLSV1.2

or

-DFTPS_SSL_TLS_PROTOCOL=TLSV1.3



Limitations

  • For most file protocols, the Snap behaves the same way in both Snaplex and Groundplex. However, the HDFS protocol works only in a Groundplex. The Hadoop cluster must be open to the Groundplex server instance without any authentication.
  • Do not use sldb as a file system or storage. File Assets are intended only for specialized files that a pipeline uses to reference specific data, such as accounts, expressions, or JAR files. Use a cloud storage provider to store production data. File Assets should not be used as a file source or as a destination in production pipelines. When you configure the Multi File Reader, set the file path to a cloud provider or external file system.

Known issues

  • This Snap Pack no longer natively supports RSA-SHA1 authentication with the Secure File Transfer Protocol (SFTP). To enable support for RSA-SHA1 authentication, set the following property from the Node Properties section of the Configuration Options:

-Djsch.server_host_key=ssh-rsa -Djsch.client_pubkey=ssh-rsa

With the 4.33 GA release of the Binary Snap Pack, support for some SFTP connection-negotiation algorithms was removed, both to improve security and because the library used to connect to SFTP sources was updated. To revert to the previous settings, set the following jcc.jvm_options in the Node Properties section of the Configuration Options. To update Cloudplexes, contact SnapLogic Support.

-Djsch.kex=ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1
-Djsch.server_host_key=ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
-Djsch.client_pubkey=ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
-Djsch.cipher=aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
-Djsch.check_ciphers=aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
-Djsch.check_kexes=diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521
-Djsch.check_signatures=ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521

Account

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. This Snap supports Basic Auth, AWS S3, SSH Auth, SMB, Azure Storage, and Google Storage accounts, or no account. See Configuring Binary accounts for information on setting up accounts that work with this Snap. Account types supported by each protocol are as follows:

Protocol | Account types
sldb | no account
s3 | AWS S3
ftp | Basic Auth
sftp | Basic Auth, SSH Auth
ftps | Basic Auth
hdfs | no account
http | no account
https | no account
smb | SMB
wasb | Azure Storage
wasbs | Azure Storage
gs | Google Storage
Warning:

The FTPS file protocol works only in explicit mode. The implicit mode is not supported.

Required settings for account types are as follows:

Account type | Settings
Basic Auth | Username, Password
AWS S3 | Access-key ID, Secret key
SSH Auth | Username, Private key, Key Passphrase
SMB | Domain, Username, Password
Azure Storage | Account name, Primary access key
Google Storage | Approval prompt, Application scope, Auto-refresh token (Read-only properties are Access token, Refresh token, Access token expiration, OAuth2 Endpoint, OAuth2 token and Access type.)
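
The two tables above can be read as simple lookups. The following Python sketch is an illustration only, not part of the Snap: the function name check_account and the dictionary names are hypothetical, and the data is copied from the tables. It flags an account type that is not listed for a URL's protocol and any setting the second table lists that is missing. It treats "no account" as always acceptable, which is an assumption.

```python
from urllib.parse import urlsplit

# Account types listed per protocol (copied from the first table above).
ACCOUNT_TYPES = {
    "sldb": set(), "hdfs": set(), "http": set(), "https": set(),
    "s3": {"AWS S3"},
    "ftp": {"Basic Auth"},
    "sftp": {"Basic Auth", "SSH Auth"},
    "ftps": {"Basic Auth"},
    "smb": {"SMB"},
    "wasb": {"Azure Storage"},
    "wasbs": {"Azure Storage"},
    "gs": {"Google Storage"},
}

# Settings listed per account type (copied from the second table above).
REQUIRED_SETTINGS = {
    "Basic Auth": ["Username", "Password"],
    "AWS S3": ["Access-key ID", "Secret key"],
    "SSH Auth": ["Username", "Private key", "Key Passphrase"],
    "SMB": ["Domain", "Username", "Password"],
    "Azure Storage": ["Account name", "Primary access key"],
    "Google Storage": ["Approval prompt", "Application scope", "Auto-refresh token"],
}


def check_account(url, account_type=None, settings=None):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in ACCOUNT_TYPES:
        return [f"unsupported protocol: {scheme}"]
    if account_type is None:
        # Assumption: running without an account is always allowed.
        return []
    problems = []
    if account_type not in ACCOUNT_TYPES[scheme]:
        problems.append(
            f"account type {account_type} is not listed for protocol {scheme}"
        )
    provided = settings or {}
    for name in REQUIRED_SETTINGS.get(account_type, []):
        if not provided.get(name):
            problems.append(f"missing setting: {name}")
    return problems
```

For example, check_account("ftp://host/dir", "SSH Auth", {...}) reports that SSH Auth is not listed for ftp, because the table lists only Basic Auth for that protocol.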

Snap views

View Description Examples of upstream and downstream Snaps
Input N/A N/A
Output Binary data read from the source specified in the Selected files property.
Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution: Stops the current pipeline execution when an error occurs.
  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.
  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap settings

Legend:
  • Expression icon: Allows using pipeline parameters to set field values dynamically (if enabled). SnapLogic Expressions are not supported. If disabled, you can provide a static value.
  • SnapGPT: Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
  • Suggestion icon: Populates a list of values dynamically based on your Snap configuration. You can select only one attribute at a time using the icon. Type into the field if it supports a comma-separated list of values.
  • Upload: Uploads files. Learn more.
Learn more about the icons in the Snap settings dialog.
Field / Field set Type Description
Label String

Required. Specify a unique name for the Snap. Change it to something more descriptive, especially if the pipeline contains more than one instance of this Snap.

Default value: Multi File Reader

Example: Multi File Reader
Selected Files Required.

Use this field set to define data sources.

Warning: All selected files must be under the same protocol.
Folder/File String/Expression Specify the URL for the data source, which can be a directory or a file. It should begin with a file protocol. The supported file protocols are:
  • http:
  • https:
  • s3:
  • sftp:
  • ftp:
  • ftps:
  • hdfs:
  • sldb:
  • smb:
  • wasb:
  • wasbs:
  • gs:
Note: This Snap supports S3 Virtual Private Cloud (VPC) endpoint. For example, s3://my-bucket@bucket.vpce-028b7814794578709-vu0vvauy.s3.us-west-2.vpce.amazonaws.com
The File property should have the syntax: [protocol]://[host][:port]/[path]
  • _filename (A key/value pair with "filename" key should be defined as a pipeline parameter.)
  • If a Pipeline is created in a project other than the shared project and you want to read the "asset.json" file from the shared project, enter "shared/asset.json" or "sldb:///shared/asset.json".
  • If a Pipeline is created in the shared project and you want to read the "asset.json" file from the shared project, enter "asset.json" or "sldb:///asset.json".
Note: "://" separates the file protocol from the rest of the URL. The hostname and the port number go between "://" and the next "/". If the port number is omitted, the default port for the protocol is used. The hostname and port number are omitted in the sldb and s3 protocols.
Note:
  • Ensure that the file name, folder name, and file path do not contain the '?' character. It is not fully supported, and the Snap might fail when it is present.
  • The File property should be an absolute path for all protocols except sldb. For sldb files, the Snap can access only files in the same project directory or the shared project directory, and cannot access files in other projects.
  • For the sldb, http, and https protocols, enter the URL of a regular file. Folders are not supported for these protocols. If this property is a regular file, the Wildcard and Include subfolders properties are ignored.
Warning: In the SnapLogic 4.3.2 release, WASB (Windows Azure Storage Blob) or WASBS protocol (wasb:/// or wasbs:///) support has been added to the Binary Snaps.
In the WASB and WASBS file URL, the top directory should be the name of the 'Azure Storage container'.
  • If an account is not used within the Snap, use: s3://yourAccessKeyID:yourSecretKey@s3/yourBucketName/folder1/folder2/
  • If an account is not used within the Snap, use: s3://yourAccessKeyID:yourSecretKey@s3/yourBucketName/folder1/rawData.csv
  • If the Snap is executed in a Windows Groundplex and needs to access the D: drive, use file:///D:/testFolder/
  • To read files in the 'testDir' folder in the 'Snaplogic' container, use wasb:///Snaplogic/testDir/sample.csv
  • If the bucket name is 'testBucket', use gs:///testBucket/testDir/

Default value: None.

Example:
  • s3:///<S3_bucket_name>@s3.<region_name>.amazonaws.com/<path>

    For region names and their details, see AWS Regions and Endpoints.

  • sftp://ftp.snaplogic.com:22/dir/filename

  • smb://smb.Snaplogic.com:445/test_files/csv/input.csv
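
To illustrate the [protocol]://[host][:port]/[path] syntax described above, the following Python sketch splits a URL into its parts with the standard urllib.parse module. It is a rough illustration and not how the Snap itself parses the field; the helper names describe and same_protocol are hypothetical. The second helper mirrors the rule that all selected files must use the same protocol.

```python
from urllib.parse import urlsplit


def describe(url):
    """Hypothetical helper: split a Folder/File URL into its parts."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,  # None when the host is omitted (for example, sldb:///)
        "port": parts.port,      # None when the port is omitted
        "path": parts.path,
    }


def same_protocol(urls):
    """Hypothetical helper: True if every URL starts with the same protocol."""
    return len({urlsplit(u).scheme.lower() for u in urls}) <= 1
```

For sftp://ftp.snaplogic.com:22/dir/filename, describe returns the protocol sftp, the host ftp.snaplogic.com, the port 22, and the path /dir/filename. For sldb:///shared/asset.json, the host and port are both None.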
Wildcard String/Expression Specify the wildcard pattern, if the URL in the Folder/File property is for a directory. All files matching the wildcard pattern are selected. This property is not supported for the sldb, http, and https protocols. The asterisk pattern character ("*", also called "star") and the question mark ("?") are supported. The "*" character matches zero or more characters. The "?" matches exactly one character.

Default value: None.

Example:
  • *.*
  • *.csv
  • *.json
  • *.??? (matches all files with three-character extensions)
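
The "*" and "?" semantics described above can be tried out with Python's standard fnmatch module, which uses the same meaning for these two characters. This is a sketch for experimenting with patterns, not the Snap's own matcher: fnmatch also accepts bracket expressions such as [abc], which this document does not list as supported. The file names and the helper name select are made up for the illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names in a folder.
NAMES = ["input.csv", "data.json", "notes.txt", "README"]


def select(pattern, names=NAMES):
    """Return the names that match the wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]
```

With these names, select("*.csv") keeps only input.csv, and select("*.???") keeps input.csv and notes.txt, because "json" has four characters and README has no extension.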
Include Subfolders Checkbox

Select to search subfolders for the specified Wildcard if Folder/File is set to a directory.

If you select this checkbox and the Folder/File property is a folder, all files in the subfolders matching the given wildcard pattern are selected. This checkbox is not supported for the sldb, http, and https protocols.

Default status: Deselected

Number of retries Integer/Expression

Specify the maximum number of retry attempts the Snap must make in case there is a network failure, and the Snap is unable to read the target file.

If the value is larger than 0, the Snap first downloads the target file to a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and attempts to download the file again from the beginning. When the download is successful, the Snap starts to stream the data from the temporary file to the downstream pipeline. All temporary local files are deleted when they are no longer needed.

Note:
  • Ensure that the local drive has sufficient free disk space to store the temporary local file.

  • The retry operation is applied for each file the Snap downloads.

Minimum value: 0

Default value: 0

Example: 3

Retry interval (seconds) Integer/Expression Specify the minimum number of seconds for which the Snap must wait before attempting recovery from a network failure.

Minimum value: 1

Default value: 1

Example: 3
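
The retry behavior described for Number of retries and Retry interval can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Snap's implementation: download is a hypothetical callable that writes the remote file into an open file object and raises OSError on a network failure, and the sketch assumes one initial attempt plus up to the configured number of retries.

```python
import os
import tempfile
import time


def stream_with_retries(download, retries, retry_interval=1,
                        chunk_size=64 * 1024, sleep=time.sleep):
    """Download to a temporary file with retries, then yield its bytes in chunks."""
    fd, tmp_path = tempfile.mkstemp(prefix="multi-file-reader-")
    os.close(fd)
    try:
        for attempt in range(retries + 1):
            try:
                # Mode "wb" truncates the file, so every attempt
                # restarts the download from the beginning.
                with open(tmp_path, "wb") as tmp:
                    download(tmp)
                break
            except OSError:
                if attempt == retries:
                    raise  # no attempts left
                sleep(retry_interval)
        # Stream downstream only after the download has succeeded.
        with open(tmp_path, "rb") as tmp:
            while chunk := tmp.read(chunk_size):
                yield chunk
    finally:
        # The temporary file is deleted once it is no longer needed.
        os.remove(tmp_path)
```

Because the whole file is written locally before any data is streamed, the sketch also shows why the local drive needs enough free space for the temporary file.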

Advanced Properties Use this field set to define additional properties.
SAS URI Dropdown list Specify the Shared Access Signature (SAS) URI of the resource you need to access. You can generate the SAS URI either from the SAS Generator Snap or from the Azure portal → Shared access signature.
The supported SAS types are:
  • Service SAS on a container
  • Service SAS on blob
Warning:
  • Ensure that the URI is specified in the format described here.
  • If you provide SAS URI in this field, then:
    • the Primary access key given in the account settings is overridden during authentication. If you do not provide the SAS URI, the Snap uses the Primary access key in the account settings.
    • only this URL is used and the Snap ignores the SAS URI settings that you have configured in the associated account.
Warning: If the SAS URI value is provided in the Snap settings, then the settings provided in the account (if any account is attached) are ignored.

Default value: N/A

Example: https://myaccount.blob.core.windows.net/sascontainer/sasblob.txt?sv=2015-04-05&st=2015-04-29T22%3A18%3A26Z&se=2015-04-30T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D
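
To see what a SAS URI like the example above carries, the following Python sketch reads its query parameters with the standard urllib.parse module. It is an inspection aid only and does not validate the signature; the helper name sas_summary is hypothetical. The parameter names (sr for the resource type, sp for permissions, se for the expiry time, sig for the signature) are the standard Azure service SAS fields.

```python
from urllib.parse import parse_qs, urlsplit

# Azure service SAS resource codes relevant to the supported SAS types.
RESOURCE_TYPES = {"b": "blob", "c": "container"}


def sas_summary(sas_uri):
    """Hypothetical helper: summarize the main fields of a SAS URI."""
    query = parse_qs(urlsplit(sas_uri).query)

    def first(key):
        # parse_qs returns a list per key; take the first value or None.
        return query.get(key, [None])[0]

    resource = first("sr")
    return {
        "resource": RESOURCE_TYPES.get(resource, resource),
        "permissions": first("sp"),
        "expires": first("se"),  # kept as the raw, URL-decoded string
        "has_signature": first("sig") is not None,
    }
```

For the example URI, the summary shows a blob resource with rw permissions that expires at 2015-04-30T02:23:26Z.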

Values String/Expression Specify the value for the property.
Snap execution Dropdown list
Choose one of the three modes in which the Snap executes. Available options are:
  • Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.
  • Execute only: Performs full execution of the Snap during pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only

Example: Validate & Execute

Warning:

The Pipeline validation (achieved by pressing "Retry") imposes a 5-minute timeout. If there are a large number of files to be read by the Snap as a result of Wildcard and Include subfolders settings, the Snap validation may fail due to this 5-minute timeout limit.

Output Fields for the Different Protocols

The output fields that the Multi File Reader Snap generates depend on the protocol you select. The following table lists the output fields for the different protocols supported by the Snap:

Protocol Output Fields
S3
  • content-type
  • content-length
  • last-modified: _snaptype_datetime
  • etag
  • accept-ranges
  • content-location
  • content-disposition
SLDB
  • content-type
  • date
  • x-amz-meta-md5
  • content-length
  • server
  • x-amz-server-side-encryption
  • x-amz-meta-length
  • x-amz-meta-create_time
  • last-modified: _snaptype_datetime
  • x-amz-meta-file_id
  • x-amz-meta-ttl
  • content-disposition
  • x-amz-meta-owner
  • x-amz-meta-expire_time
  • etag
  • x-amz-request-id
  • x-amz-meta-mimetype
  • x-amz-id-2
  • accept-ranges
  • content-location
WASB, SMB, SFTP, GStorage
  • content-type
  • content-location
  • content-disposition

Examples