Scale

Use this Snap to scale numeric values in fields to specific ranges or to apply statistical transformations before applying an ML algorithm.

Overview

The Scale Snap is a Transform-type Snap that scales numeric values in fields to specific ranges or applies statistical transformations. It helps prepare data before a machine learning algorithm is applied to it.

The Scale Snap supports the following four transformations:

  • Scale to range [0,1]
  • Scale to range [-1,1]
  • Z-transformation
  • Log-transformation

Scale Snap Settings

Prerequisites

None.

Limitations and known issues

None.

Snap views

View Description Examples of upstream and downstream Snaps
Input This Snap has exactly two document input views: the Data input view and the Profile input view.
  1. Data input view: A document with numeric fields.
  2. Profile input view: A document containing data statistics computed by the Profile Snap. This view is required for the Range [0,1], Range [-1,1], and Z-transformation rules. For the Log-transformation rule, it can remain unconnected.
Note:

The Scale Snap processes data as a stream, whereas the Profile Snap consumes all the data before it derives any statistics. Therefore, compute the statistics in a separate pipeline:

  • Build a pipeline with the Profile Snap to generate the data statistics, and store them in JSON format using the JSON Formatter and File Writer Snaps.
  • In the pipeline that contains the Scale Snap, use the File Reader and JSON Parser Snaps to read the statistics computed by the Profile Snap and feed them into the Scale Snap. In the Profile Snap, you must select Value distribution and set Top value limit according to the number of unique values, or set it to 0 (unlimited).
Output This Snap has exactly one document output view: a document whose numeric fields are transformed according to the selected transformation rule.
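The two-pipeline pattern in the note above can be sketched in Python as two passes: the first must see every value before it can emit min, max, mean, and sd, while the second scales documents one at a time. This is a minimal illustration, not SnapLogic code. The JSON layout of the statistics, the `age_z` result field, and the use of the population standard deviation are all assumptions; the Profile Snap's actual output format and standard-deviation definition may differ. An in-memory `io.StringIO` stands in for the File Writer and File Reader Snaps.

```python
import io
import json
import statistics


def profile(docs, field):
    """Pass 1 (analogous to the Profile Snap): needs every value before emitting stats."""
    values = [d[field] for d in docs]
    return {
        field: {
            "min": min(values),
            "max": max(values),
            "mean": statistics.fmean(values),
            # Assumption: population standard deviation; the Profile Snap may differ.
            "sd": statistics.pstdev(values),
        }
    }


def scale_stream(docs, field, stats):
    """Pass 2 (analogous to the Scale Snap): z-transforms one document at a time."""
    s = stats[field]
    for d in docs:
        out = dict(d)
        out[field + "_z"] = (d[field] - s["mean"]) / s["sd"]  # hypothetical result field
        yield out


data = [{"age": 20}, {"age": 30}, {"age": 40}]

# Pipeline 1: profile the data and persist the statistics as JSON.
buf = io.StringIO()
json.dump(profile(data, "age"), buf)

# Pipeline 2: read the statistics back and stream the data through the scaler.
buf.seek(0)
stats = json.load(buf)
for doc in scale_stream(iter(data), "age", stats):
    print(doc)
```

Splitting the work this way keeps the scaling pass streaming: only the small statistics document, not the full data set, has to be available before scaling begins.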
Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution: Stops the current pipeline execution when an error occurs.
  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.
  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.
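The three error-handling options can be approximated as control flow around each record. The sketch below is hypothetical: the mode strings `stop`, `discard`, and `route` and the shape of the error record are assumptions for illustration, and it does not reflect how SnapLogic implements error views. It uses a log transform so that non-numeric, missing, or non-positive values raise errors.

```python
import math

MODES = ("stop", "discard", "route")  # hypothetical names for the three options


def log_transform(docs, field, mode):
    """Log-transform `field` in each document, handling bad records per `mode`."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    output, error_view = [], []
    for doc in docs:
        try:
            value = doc[field]  # KeyError if the field is missing
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{field!r} is not numeric: {value!r}")
            output.append({**doc, field: math.log(value)})  # ValueError if value <= 0
        except (KeyError, TypeError, ValueError) as exc:
            if mode == "stop":
                raise  # Stop Pipeline Execution: abort on the first bad record
            if mode == "route":
                # Route Error Data to Error View: keep the record and the reason.
                error_view.append({"document": doc, "reason": str(exc)})
            # "discard": drop the record and continue with the next one.
    return output, error_view


docs = [{"x": 1.0}, {"x": "abc"}, {"x": 2.0}]
ok, errors = log_transform(docs, "x", "route")
print(len(ok), len(errors))
```

With `route`, the two valid records reach the output and the non-numeric one is kept in the error list; with `discard`, the error list stays empty; with `stop`, the `TypeError` propagates to the caller.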

Snap settings

Legend:
  • Expression icon: JavaScript syntax to access SnapLogic Expressions to set field values dynamically (if enabled). If disabled, you can provide a static value. Learn more.
  • SnapGPT: Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
  • Suggestion icon: Populates a list of values dynamically based on your Account configuration.
  • Upload: Uploads files. Learn more.
Learn more about the icons in the Snap settings dialog.
Field / field set Type Description
Label String

Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if the pipeline contains more than one Snap of the same type.

Default value: Scale

Example: Scale numerics
Policy

Specify how each field is transformed. Each policy consists of an input Field, a transformation Rule, and a Result field. The Snap transforms the values in the input field and writes them to the Result field.

Note:

You can apply multiple transformations to the same input field, but each transformation must write to a different result field. If the result field is the same as the input field, the Snap overwrites the input values with the transformed values.

Field Expression/Suggestion

The field to transform. This field suggests all the fields in the dataset. The Snap displays an error message for non-numeric fields.

Default: None.

Rule Dropdown list

The type of transformation to be performed on the selected field. The available options are:

  • Range [0,1]: Scales values into [0,1]. The minimum value is scaled to 0 and the maximum to 1; any other value x is scaled to (x - min) / (max - min).
  • Range [-1,1]: Scales values into [-1,1]. The minimum value is scaled to -1 and the maximum to 1; any other value x is scaled to 2(x - min) / (max - min) - 1.
  • Z-transformation: Standardizes values as z = (x - mean) / sd, where sd is the standard deviation.
  • Log-transformation: Applies the natural logarithm, ln(x), to each value.

For example, if you want to transform 35 with the following statistics:

  • min = 0
  • max = 50
  • mean = 25
  • sd = 10

The result for each transformation rule is as follows:

  • Range [0,1]: 0.7
  • Range [-1,1]: 0.4
  • Z-transformation: 1
  • Log-transformation: 3.56
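Substituting the statistics above into each rule's formula reproduces the listed results. The Range [-1,1] line assumes the linear mapping 2(x - min) / (max - min) - 1, which sends min to -1 and max to 1 and is consistent with the listed result of 0.4.

```latex
\begin{aligned}
\text{Range } [0,1]: &\quad \frac{35 - 0}{50 - 0} = 0.7 \\
\text{Range } [-1,1]: &\quad 2 \cdot \frac{35 - 0}{50 - 0} - 1 = 0.4 \\
\text{Z-transformation}: &\quad \frac{35 - 25}{10} = 1 \\
\text{Log-transformation}: &\quad \ln 35 \approx 3.56
\end{aligned}
```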
Result field Expression/Suggestion

The result field to use in the output map. If the Result field is the same as Field, the input values are overwritten. If the Result field does not exist in the original input document, the Snap adds it as a new field.

Default: None.
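How multiple policies and the Result field interact can be sketched in Python. Everything here is illustrative: the `Policy` class, the rule ids, and the per-field statistics dict are hypothetical stand-ins, not SnapLogic APIs. The sketch reads every input value from the original document, so a policy that overwrites a field does not change what later policies see; that ordering is an assumption of the sketch.

```python
import math
from dataclasses import dataclass

# Hypothetical rule ids mapped to the formulas described for the Rule setting.
RULES = {
    "range01": lambda x, s: (x - s["min"]) / (s["max"] - s["min"]),
    "range11": lambda x, s: 2 * (x - s["min"]) / (s["max"] - s["min"]) - 1,
    "z": lambda x, s: (x - s["mean"]) / s["sd"],
    "log": lambda x, s: math.log(x),  # needs no statistics
}


@dataclass
class Policy:
    field: str         # input field to transform
    rule: str          # one of the keys in RULES
    result_field: str  # where the transformed value is written


def apply_policies(doc, policies, stats):
    """Apply each policy to `doc`, returning a new document."""
    out = dict(doc)  # leave the input document unchanged
    for p in policies:
        x = doc[p.field]  # always read the original input value
        out[p.result_field] = RULES[p.rule](x, stats.get(p.field, {}))
    return out


stats = {"amount": {"min": 0, "max": 50, "mean": 25, "sd": 10}}
policies = [
    Policy("amount", "z", "amount_z"),  # new result field is added
    Policy("amount", "log", "amount"),  # result field == input field: overwritten
]
print(apply_policies({"id": 7, "amount": 35}, policies, stats))
```

Both policies read the same input field but write to different result fields, matching the rule that multiple transformations of one field need distinct results.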

Snap execution Dropdown list
Select one of the three modes in which the Snap executes. Available options are:
  • Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.
  • Execute only: Performs full execution of the Snap during pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only

Example: Validate & Execute