Databricks Account

Overview

You can use this account type to connect Databricks Snaps to data sources, with any of the following endpoints as the source or target:
  • Amazon S3
  • Azure Blob Storage
  • Azure Data Lake Storage Gen2
  • DBFS
  • Google Cloud Storage
  • JDBC


Behavior changes

Starting with the May 2025 (main31019) Snap Pack version, you might encounter a socket timeout error if you use the bundled Databricks JDBC driver (version 2.6.40).

Workaround: Set the SocketTimeout value to 0 in the account URL properties field set of the Databricks Account.
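
For example, the workaround amounts to a single row in the URL properties field set, which has the same effect as appending the property to the connection string (the URL shown is illustrative, with placeholders):

    URL property name:  SocketTimeout
    URL property value: 0

    jdbc:databricks://<host>:443/default;transportMode=http;ssl=1;httpPath=<http-path>;SocketTimeout=0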

Account settings



Legend:
  • Expression icon: Allows using JavaScript syntax to access SnapLogic Expressions to set field values dynamically (if enabled). If disabled, you can provide a static value. Learn more.
  • SnapGPT: Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
  • Suggestion icon: Populates a list of values dynamically based on your Snap configuration. You can select only one attribute at a time using the icon. Type into the field if it supports a comma-separated list of values.
  • Upload: Uploads files. Learn more.
Learn more about the icons in the Snap settings dialog.
Field / Field set Type Description
Label String

Required. Specify a unique label for the account.

Default value: N/A

Example: STD DB Acc DeltaLake AWS ALD
Account Properties Use this field set to configure the information required to establish a JDBC connection with the account.
Download JDBC Driver Automatically Checkbox Select this checkbox to allow the Snap account to download the certified JDBC Driver for DLP. The following fields are disabled when this checkbox is selected:
  • JDBC JAR(s) and/or ZIP(s): JDBC Driver
  • JDBC driver class
We recommend that you use the bundled JAR file version (databricks-jdbc-2.6.40) in your pipelines. However, you may choose to use a custom JAR file version. To use a custom JDBC Driver:
  1. Deselect the Download JDBC Driver Automatically checkbox.

  2. In the JDBC Driver field, click the Upload icon.
  3. Select the desired JAR file, and click Upload File.

Note: You can use a JAR file version other than the ones listed.
Note: Spark JDBC and Databricks JDBC
If you do not select this checkbox and use an older JDBC JAR file (earlier than version 2.6.40), ensure that you use:
  • The old-format JDBC URL (jdbc:spark://) instead of the new one (jdbc:databricks://):
    • For JDBC driver versions earlier than 2.6.40, the JDBC URL starts with jdbc:spark://
    • For JDBC driver version 2.6.40 or later, the JDBC URL starts with jdbc:databricks://
  • The older JDBC driver class com.simba.spark.jdbc.Driver instead of the new com.databricks.client.jdbc.Driver.

Default status: Deselected
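
To make the pairing concrete, the sketch below shows the underlying JDBC convention in plain Java (host values are placeholders; this illustrates the driver-class/URL-prefix pairing, not SnapLogic's internal code):

    // Sketch: which driver class pairs with which URL prefix.
    public class DriverPairing {
        public static void main(String[] args) throws ClassNotFoundException {
            // Driver 2.6.40 or later: Databricks-branded class, jdbc:databricks:// URL
            Class.forName("com.databricks.client.jdbc.Driver");
            String currentUrl = "jdbc:databricks://<host>:443/default;transportMode=http;ssl=1";

            // Driver earlier than 2.6.40: Simba Spark-branded class, jdbc:spark:// URL
            Class.forName("com.simba.spark.jdbc.Driver");
            String legacyUrl = "jdbc:spark://<host>:443/default;transportMode=http;ssl=1";
        }
    }

In practice only one of the two driver JARs is on the classpath, so you would load just the class that matches your JAR version.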

JDBC URL String Required. Enter the JDBC driver connection string for connecting to your DLP instance, using the syntax shown below. Learn more in Microsoft's JDBC and ODBC drivers and configuration parameters.

jdbc:spark://dbc-ede87531-a2ce.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/6968995337014351/0521-394181-guess934;AuthMech=3;UID=token;PWD=<personal-access-token>

Note: Avoid passing a Password inside the JDBC URL

If you specify the password inside the JDBC URL, it is saved as-is and is not encrypted. We recommend using the Password field instead to ensure that your password is encrypted.

Default value: N/A

Example:
jdbc:spark://adb-2409532680880038.18.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/2409532680880038/0326-212833-drier754;AuthMech=3;
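
Per the note above about keeping the password out of the JDBC URL, plain JDBC accepts credentials separately from the connection string. A minimal sketch with placeholder values, assuming the 2.6.40+ driver and token authentication (UID is the literal string token, PWD is a personal access token); this is an illustration, not the Snap's internal code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class DlpConnect {
        public static void main(String[] args) throws Exception {
            // Transport settings live in the URL; credentials stay out of it.
            String url = "jdbc:databricks://<host>:443/default;"
                    + "transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3";

            Properties props = new Properties();
            props.put("UID", "token");                    // literal "token" for PAT auth
            props.put("PWD", "<personal-access-token>");  // placeholder; keep this secret

            try (Connection conn = DriverManager.getConnection(url, props)) {
                System.out.println("Connected: " + !conn.isClosed());
            }
        }
    }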
Authentication type Dropdown list Choose the authentication type to use. Available options are:
  • Token authentication
  • Password authentication
  • M2M authentication

Default value: Token authentication

Example: M2M authentication
Token String/Expression

Appears when you select Token-based as the Authentication type.

Required. Specify the token for Databricks Lakehouse Platform authentication.

Default value: N/A

Example: dapi1234567890abcdef1234567890abcdef

Username String Appears when you select Password as the Authentication type.

Specify the username that is allowed to connect to the database. The username will be used as the default username when retrieving connections. The username must be valid to set up the data source.

Default value: N/A

Example: snapuser

Password String Appears when you select Password as the authentication type.

Specify the password used to connect to the data source. The password will be used as the default password when retrieving connections. The password must be valid to set up the data source.

Default value: N/A

Example: <Encrypted>

Client ID String Appears when you select M2M as the Authentication type.

Specify the unique identifier that is assigned to the application when it is registered with the OAuth2 provider.

Default value: N/A

Example: 12345678-abcd-1234-efgh-56789abcdef0

Client secret String Appears when you select M2M as the Authentication type.

Specify the confidential key assigned to the application with the Client ID.

Default value: N/A

Example: ABCD1234abcd5678EFGHijkl9012MNOP
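
For reference, the Databricks JDBC driver documents OAuth machine-to-machine (M2M) authentication through URL properties along these lines; whether the Snap assembles the URL exactly this way is an assumption, and all values are placeholders:

    jdbc:databricks://<host>:443/default;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=11;Auth_Flow=1;OAuth2ClientId=<client-id>;OAuth2Secret=<client-secret>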

Database name String Required. Enter the name of the database to use by default. This database is used if you do not specify one in the Databricks Select or Databricks Insert Snaps.

Default value: N/A

Example: Default

Source/Target Location Dropdown list Select the source or target data warehouse from or into which the data must be loaded. The available options are:
  • None: Select None when using Read Snaps, or when you do not want to write anything to the target data warehouse.

  • Amazon S3

  • Azure Blob Storage

  • Azure Data Lake Storage Gen2

  • DBFS

  • Google Cloud Storage

  • JDBC

Learn more about settings specific to each data warehouse: Source/Target location

Advanced Properties
URL properties Use this field set to define the account parameter's name and its corresponding value. Click + to add the parameters and the corresponding values. Add each URL property-value pair in a separate row.
URL property name String Specify the name of the parameter for the URL property.

Default value: N/A

Example: queryTimeout

URL property value String Specify the value for the URL property parameter.

Default value: N/A

Example: 0

Batch size Integer Required. Specify the number of queries that you want to execute at a time.
  • If the Batch size is 1, the query is executed as-is; that is, the Snap skips batching (non-batch execution).

  • If the Batch size is greater than one, the Snap performs the regular batch execution.

Default value: N/A

Example: 3

Fetch size Integer

Required. Specify the number of rows a query must fetch for each execution.

Larger values could cause the server to run out of memory.

Default value: 100

Example: 12
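
To illustrate what these two settings control at the JDBC level (a sketch with a placeholder table, not the Snap's internal code): batching groups statements into a single round trip, and the fetch size caps how many rows each result-set round trip retrieves.

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class BatchAndFetch {
        static void run(Connection conn) throws SQLException {
            // Batch size > 1: queue several queries, execute them together.
            try (Statement stmt = conn.createStatement()) {
                stmt.addBatch("INSERT INTO demo VALUES (1)");  // 'demo' is a placeholder table
                stmt.addBatch("INSERT INTO demo VALUES (2)");
                stmt.addBatch("INSERT INTO demo VALUES (3)");
                stmt.executeBatch();  // batch size 3: one batched execution
            }
            // Fetch size: rows retrieved per round trip while reading results.
            try (Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(100);  // mirrors the default of 100 above
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM demo")) {
                    while (rs.next()) {
                        // process one row at a time
                    }
                }
            }
        }
    }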

Min pool size Integer Required. Specify the minimum number of idle connections that you want the pool to maintain at a time.

Default value: 3

Example: 0

Max pool size Integer Required. Specify the maximum number of connections that you want the pool to maintain at a time.

Default value: 15

Example: 0

Max life time Integer Required. Specify the maximum lifetime of a connection in the pool, in seconds:
  • Ensure that the value you enter is a few seconds shorter than any database or infrastructure-imposed connection time limit.

  • 0 (zero) indicates an infinite lifetime, subject to the Idle Timeout value.

  • An in-use connection is never retired. Connections are removed only after they are closed.

Minimum value: 0

Maximum value: No limit

Idle Timeout Integer Required. Specify the maximum amount of time in seconds that a connection is allowed to sit idle in the pool.

0 (zero) indicates that idle connections are never removed from the pool.

Minimum value: 0

Maximum value: No limit
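
These four settings are standard JDBC connection-pool knobs. As an illustration only, here is how the same limits would look in HikariCP (HikariCP is assumed purely for the sketch; the Snap's internal pool implementation is not specified here, and HikariCP takes milliseconds where the fields above take seconds):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class PoolSettings {
        public static void main(String[] args) {
            HikariConfig cfg = new HikariConfig();
            cfg.setJdbcUrl("jdbc:databricks://<host>:443/default");  // placeholder URL
            cfg.setMinimumIdle(3);                // Min pool size (default 3)
            cfg.setMaximumPoolSize(15);           // Max pool size (default 15)
            cfg.setMaxLifetime(30 * 60 * 1000L);  // Max life time: 1800 s -> ms (illustrative)
            cfg.setIdleTimeout(10 * 60 * 1000L);  // Idle Timeout: 600 s -> ms; 0 = never removed
            try (HikariDataSource ds = new HikariDataSource(cfg)) {
                // hand out connections with ds.getConnection() as needed
            }
        }
    }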

Source/Target Location

ADLS Gen2

If you select Azure Data Lake Storage Gen2 as the target data warehouse, the account displays the following fields:



Field/ Field set Type Description
Azure storage account name Dropdown list Required. Enter the name of the Azure storage account. The Bulk Load Snap automatically appends the '.blob.core.windows.net' domain to the value of this property.

Default value: N/A

Example: testblob

Azure Container String/Expression Required.

Enter the name of an existing Azure container.

Default value: N/A

Example: sl-bigdata-qa

Azure folder String/Expression Required.

Enter the name of an existing Azure folder in the container to be used for hosting files.

Default value: N/A

Example: test-data

Azure Auth Type Dropdown list

Select the authorization type to use while setting up the account. Options available are:

  • Storage Account Key
  • Shared Access Signature

Default value: Shared Access Signature

Example: Shared Access Signature

Azure storage account key String/Expression

Appears when Azure Auth Type is Storage Account Key.

Required. Enter the access key ID associated with your Azure storage account.

Default value: N/A

Example: ABCDEFGHIJKL1MNOPQRS

SAS Token String/Expression

Appears when Azure Auth Type is Shared Access Signature.

Required. Enter the SAS token, which is the part of the SAS URI associated with your Azure storage account. Learn more: Getting Started with SAS.

Default value: N/A

Example: ?sv=2020-08-05&st=2020-08-29T22%3A18%3A26Z&se=2020-08-30T02%3A23%3A26Z&sr=b&sp=rw&sip=198.1.2.60-198.1.2.70&spr=https&sig=A%1DEFGH1Ijk2Lm3noI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D
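
For orientation, the SAS token is the query-string portion of a full SAS URI; combined with a storage endpoint it looks roughly like this (all placeholders):

    https://<storage-account>.blob.core.windows.net/<container>/<path>?sv=...&sig=...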

Azure Blob Storage

If you select Azure Blob Storage as the target data warehouse, the account displays the following fields:



Field/ Field set Type Description
Azure storage account name Dropdown list Required. Enter the name of the Azure storage account. The Bulk Load Snap automatically appends the '.blob.core.windows.net' domain to the value of this property.

Default value: N/A

Example: testblob

Azure Container String/Expression Required. Enter the name of an existing Azure container.

Default value: N/A

Example: sl-bigdata-qa

Azure folder String/Expression Required. Enter the name of an existing Azure folder in the container to be used for hosting files.

Default value: N/A

Example: test-data

Azure Auth Type Dropdown list

Select the authorization type to use when setting up the account. Options available are:

  • Storage Account Key
  • Shared Access Signature: Select when you want to enter the SAS Token associated with the Azure storage account.

Default value: Shared Access Signature

Example: Shared Access Signature

SAS Token String/Expression

Appears when Azure Auth Type is Shared Access Signature.

Required. Enter the SAS token which is the part of the SAS URI associated with your Azure storage account. Learn more in Getting Started with SAS.

Default value: N/A

Example: ?sv=2020-08-05&st=2020-08-29T22%3A18%3A26Z&se=2020-08-30T02%3A23%3A26Z&sr=b&sp=rw&sip=198.1.2.60-198.1.2.70&spr=https&sig=A%1DEFGH1Ijk2Lm3noI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D

Azure storage account key String/Expression

Appears when Azure Auth Type is Storage Account Key.

Required. Enter the access key ID associated with your Azure storage account.

Default value: N/A

Example: ABCDEFGHIJKL1MNOPQRS

AWS S3

If you select AWS S3 as the target data warehouse, the account displays the following fields:



Field/ Field set Type Description
S3 Bucket String Required. Specify the name of the S3 bucket that you want to use for staging data to Databricks.

Default value: N/A

Example: https://sl-bucket-ca.s3.<ca>.amazonaws/<sf>

S3 Folder String Required. Specify the relative path to a folder in the S3 bucket listed in the S3 Bucket field. This is used as a root folder for staging data to Databricks.

Default value: N/A

Example: https://sl-bucket-ca.s3.<ca>.amazonaws/<sf>

Aws Authorization type Dropdown list Select the authentication method to use for accessing the source data. Available options are:
  • Source/Target Location Credentials. Select this option when you do not have a storage integration setup in your S3. Activates the Access Key and Secret Key fields for S3.
  • Source/Target Location Session Credentials. Select this option if you have session credentials to access the source location in S3. Activates the Session Access Key, Session Secret Key, and Session Token fields.

  • Storage Integration. Select this option when you want to use the storage integration to access the selected source location. Activates the Storage Integration Name field.

Default value: Source Location Credentials for S3 and Azure, Storage Integration for Google Cloud Storage.

Example: Storage Integration

S3 Access-key ID String

Required. Specify the S3 access key ID that you want to use for AWS authentication.

Default value: N/A

Example: 2RGiLmL/6bCujkKLaRuUJHY9uSDEjNYr+ozHRtg

S3 Secret Key String

Appears when Source/Target Location Credentials is selected in Aws Authorization type.

Required. Specify the S3 secret key associated with the S3 Access-ID key listed in the S3 Access-key ID field.

Default value: N/A

Example: 2RGiLmL/6bCujkKLaRuUJHY9uSDEjNYr+ozHRtg

S3 AWS Token String

Appears when Source/Target Location Session Credentials is selected in Aws Authorization type.

Required. Specify the S3 AWS Token to connect to private and protected Amazon S3 buckets.

Tip:

The temporary AWS Token is used when:

  • Data is staged in an S3 location.
  • Data is coming from the input and the files are staged in an external staging location.

Default value: N/A

Example: AQoDYXdzEJr

Google Cloud Storage

If you select Google Cloud Storage as the target location, the account displays the following fields:



Field/ Field set Type Description
GCS Bucket String/Expression Appears when Google Cloud Storage is selected for Source/Target Location.

Required. Specify the GCS Bucket to use for staging data to be used for loading to the target table.

Default value: N/A

Example: sl-test-bucket

GCS Folder String/Expression

Appears when Google Cloud Storage is selected for Source/Target Location.

Required. Specify the relative path to a folder in the GCS Bucket. This is used as a root folder for staging data.

Default value: N/A

Example: test_data

GCS Authorization type Dropdown list Appears when Google Cloud Storage is selected for Source/Target Location.

Select the authentication type to use for loading data. By default, the authentication type is Service Account.

Default value: Service Account

Service Account Email String/Expression

Appears when Google Cloud Storage is selected for Source/Target Location

Required. Specify the service account email allowed to connect to the BigQuery database. This is used as the default username when retrieving connections. The email must be valid to set up the data source.

Default value: N/A

Example: [email protected]

Service Account Key File Path String/Expression

Appears when Google Cloud Storage is selected for Source/Target Location.

Required. Specify the path to the key file used to authenticate the service account email address with the BigQuery database.

Default value: N/A

Example: 7f7c54a1c19b.json

JDBC (Any SQL Database)

If you select JDBC as the target data warehouse, the account displays the following fields:



Field/ Field set Type Description
Source JDBC URL String

Required. Specify the JDBC URL of the source table.

Default value: N/A

Example: jdbc:snowflake://snaplogic.east-us-2.azure.snowflakecomputing.com

Source username String

Specify the username of the external source database.

Default value: N/A

Example: db_admin

Source password String

Specify the password for the external source database.

Default value: N/A

Example: M#!ikA8_0/&!

DBFS

If you select DBFS as the target data warehouse, the account displays the DBFS Folder path (source for loading Databricks table) field.



Field/ Field set Type Description
DBFS Folder path (source for loading Databricks table) String

Enter the folder path for the source files to be loaded from. The path must begin with a forward slash /.

Default value: N/A

Example: /data_folder/path

Troubleshooting

Error: Socket Timeout error when connecting to Databricks.
Reason: With v2.6.35 of the Databricks JDBC driver, a new feature [SPARKJ-688] was introduced in the connector that turns on the socket timeout by default for HTTP connections. The default socket timeout is set to 30 seconds. If the server does not respond within 30 seconds, a SocketTimeoutException is displayed.
Resolution: Set the SocketTimeout value to 0 in the URL properties field set of the Databricks Account.