Compute statistics for data analysis

This example pipeline demonstrates how to compute data statistics with and without Value distribution enabled.

  1. Configure the JSON Generator Snap to pass your input data.
    Note: In this example, we use the JSON Generator Snap. However, you can replace the JSON Generator Snap with any Snap of your choice, such as the Chunker, Constant, File Reader, or S3 File Reader Snaps.

    JSON Generator Snap - Edit JSON

  2. Configure the Profile Snap to compute data statistics with Value distribution enabled, providing comprehensive insights into your dataset's characteristics.
    On validation, the Snap displays a summary of the computed data statistics, including measures such as mean, median, mode, standard deviation, and more.
    Profile Snap (with Value distribution) - Configuration and Output

  3. Configure the Profile Snap to compute data statistics with Value distribution disabled.
    On validation, the Snap displays a basic summary of the computed data statistics, including measures such as mean, median, mode, and standard deviation.
    Profile Snap (without Value distribution) - Configuration and Output

    Note: After the data is generated, you can use Snaps such as the Filter and Aggregate Snaps for advanced processing. Further, you can use GenAI Builder to integrate machine learning models.
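
To make the difference between the two configurations concrete, the following is a minimal Python sketch of the general idea, not the Profile Snap's implementation or output schema. The function name `profile_column`, the result keys, and the sample values are all hypothetical and chosen only for illustration.

```python
import statistics
from collections import Counter


def profile_column(values, value_distribution=False):
    """Summarize one numeric column (hypothetical helper, not a SnapLogic API).

    With value_distribution=True, the summary also includes how often
    each distinct value occurs.
    """
    summary = {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }
    if value_distribution:
        # Frequency of every distinct value in the column.
        summary["value_distribution"] = dict(Counter(values))
    return summary


# Hypothetical sample data standing in for the JSON Generator input.
ages = [23, 35, 35, 41, 52]

basic = profile_column(ages)                               # distribution disabled
detailed = profile_column(ages, value_distribution=True)   # distribution enabled
```

In this sketch, `basic` holds only the summary measures, while `detailed` additionally maps each distinct value to its frequency, which mirrors the trade-off between a short summary and a fuller picture of the data.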

To successfully reuse pipelines:
  1. Download and import the pipeline into SnapLogic.
  2. Configure Snap accounts as applicable.
  3. Provide pipeline parameters as applicable.