Compute statistics for data analysis

This example pipeline demonstrates how to compute data statistics with and without Value distribution enabled.

  1. Configure the JSON Generator Snap to pass your input data.
    Note: This example uses the JSON Generator Snap, but you can replace it with any Snap that supplies your input data, such as the Chunker, Constant, File Reader, or S3 File Reader Snap.

    JSON Generator Snap - Edit JSON

  2. Configure the Profile Snap to compute data statistics with Value distribution enabled, so that the output describes how values are distributed in addition to the summary measures.
    On validation, the Snap displays a summary of the computed data statistics, including measures such as mean, median, mode, and standard deviation.
    ML Analytics Profile Snap (with Value distribution) Configuration

    ML Analytics Profile Snap (with Value distribution) Output

  3. Configure the Profile Snap to compute data statistics with Value distribution disabled.
    On validation, the Snap displays a basic summary of key statistical measures, such as mean, median, mode, and standard deviation.
    ML Analytics Profile Snap (without Value distribution) Configuration

    ML Analytics Profile Snap (without Value distribution) Output

    Note: After the Profile Snap generates its output, you can use Snaps such as the Filter and Aggregate Snaps for advanced processing. You can also use GenAI App Builder to integrate machine learning models.

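
To make the difference between the two Profile Snap configurations above more concrete, the following Python sketch computes per-field statistics with and without a value distribution. It is an illustration only, not the Profile Snap's implementation: the sample records, the `profile` function, and the output key names (including `value_distribution`) are all hypothetical, and the Snap's actual output fields may differ.

```python
import statistics
from collections import Counter

# Hypothetical input records, standing in for the JSON Generator Snap's output.
records = [
    {"age": 34, "city": "Austin"},
    {"age": 28, "city": "Boston"},
    {"age": 34, "city": "Austin"},
    {"age": 45, "city": "Denver"},
]


def profile(records, value_distribution=False):
    """Return summary statistics per field for a list of flat records.

    Assumes every record has the same keys as the first one and that
    each numeric field has at least two values (needed for stdev).
    """
    result = {}
    for field in records[0]:
        values = [r[field] for r in records if r.get(field) is not None]
        stats = {"count": len(values), "mode": statistics.mode(values)}

        # Mean, median, and standard deviation only apply to numeric fields.
        if all(isinstance(v, (int, float)) for v in values):
            stats["mean"] = statistics.mean(values)
            stats["median"] = statistics.median(values)
            stats["stdev"] = statistics.stdev(values)

        # Optional extra: how often each distinct value occurs.
        if value_distribution:
            stats["value_distribution"] = dict(Counter(values))

        result[field] = stats
    return result


basic = profile(records)                           # like Value distribution disabled
full = profile(records, value_distribution=True)   # like Value distribution enabled

print(basic["age"])
print(full["city"]["value_distribution"])
```

In this sketch, both calls return the same summary measures. The second call additionally attaches a per-value frequency count to each field, which is the kind of extra detail the Value distribution option is meant to suggest.
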
To successfully reuse pipelines:
  1. Download and import the pipeline into the SnapLogic Platform.
  2. Configure Snap accounts, as applicable.
  3. Provide pipeline parameters, as applicable.