Apply Sampling Algorithms

This pipeline demonstrates the use of various sampling algorithms in the Sample Snap.

This example applies the following sampling algorithms using separate configurations of the Sample Snap:

  • Streamable Sampling
  • Strict Sampling
  • Stratified Sampling
  • Weighted Stratified Sampling

Data Sampling Pipeline

Download this pipeline.

  1. Configure the CSV Generator Snap to generate the input dataset.
    CSV Generator Output

    The generated dataset includes 50 records with two classes in the $Gender field: M and F.

  2. Use the Copy Snap to create five parallel streams from the input dataset.

    Four of these streams go into separate Sample Snaps, and one goes into a Profile Snap to generate statistics.


    Profile Snap Output

    The Profile Snap shows:

    • Total records: 50
    • Classes: M (33), F (17)
  3. Configure the Sample Snap for Streamable Sampling.

    This configuration randomly passes approximately 50% of the documents.


    Streamable Sampling Settings


    Streamable Sampling Output

    The Profile Snap confirms that 28 records were sampled, which is close to the 50% pass-through rate.


  4. Configure the Sample Snap for Strict Sampling.

    In this mode, the Snap selects exactly 50% of the records.


    Strict Sampling Settings


    Strict Sampling Output

    The Profile Snap confirms that exactly 25 records were sampled.

  5. Configure the Sample Snap for Stratified Sampling.

    Select $Gender as the stratified field. The Snap selects an equal number of documents from each class while maintaining the pass-through percentage.


    Stratified Sampling Settings

    The Profile Snap confirms a balanced subset of 24 records (12 per class).


    Stratified Sampling Output

  6. Configure the Sample Snap for Weighted Stratified Sampling.

    Specify $Gender as the stratified field. The Snap preserves the class distribution ratio in the sampled output.


    Weighted Stratified Sampling Settings

    The Profile Snap confirms that 24 records were sampled, maintaining the original M:F ratio (approximately 2:1).


    Weighted Stratified Sampling Output

  7. Optional: Write each sampled output to separate files using the File Writer Snap.
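
The four sampling modes in steps 3 through 6 can be approximated outside the pipeline with the following Python sketch. It is not the Sample Snap's implementation. The function names, the record layout, and the floor rounding are assumptions, with the rounding chosen only because it reproduces the totals reported above (25, 24, and 24) for a 50-record input with 33 M and 17 F records.

```python
import math
import random
from collections import Counter, defaultdict


def streamable_sample(records, pass_through, rng):
    """Keep each record independently with probability pass_through."""
    return [r for r in records if rng.random() < pass_through]


def strict_sample(records, pass_through, rng):
    """Keep exactly floor(len(records) * pass_through) records."""
    k = math.floor(len(records) * pass_through)
    return rng.sample(records, k)


def _group_by(records, field):
    """Group records into lists keyed by the value of `field`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[field]].append(r)
    return groups


def stratified_sample(records, field, pass_through, rng):
    """Keep the same number of records from every class."""
    groups = _group_by(records, field)
    target = math.floor(len(records) * pass_through)
    # Assumed rule: split the target evenly, capped by the smallest class.
    per_class = min(target // len(groups), min(len(g) for g in groups.values()))
    out = []
    for g in groups.values():
        out.extend(rng.sample(g, per_class))
    return out


def weighted_stratified_sample(records, field, pass_through, rng):
    """Keep floor(class_size * pass_through) records from each class."""
    out = []
    for g in _group_by(records, field).values():
        out.extend(rng.sample(g, math.floor(len(g) * pass_through)))
    return out


# Hypothetical stand-in for the CSV Generator output: 33 M and 17 F records.
records = [{"Gender": "M"} for _ in range(33)] + [{"Gender": "F"} for _ in range(17)]
rng = random.Random(0)

streamed = streamable_sample(records, 0.5, rng)
strict = strict_sample(records, 0.5, rng)
stratified = stratified_sample(records, "Gender", 0.5, rng)
weighted = weighted_stratified_sample(records, "Gender", 0.5, rng)

print(len(streamed))                             # varies from run to run, around 25
print(len(strict))                               # 25
print(Counter(r["Gender"] for r in stratified))  # 12 M and 12 F
print(Counter(r["Gender"] for r in weighted))    # 16 M and 8 F
```

If streamable sampling keeps each record independently, as assumed here, the number of records kept from 50 inputs at a 50% rate follows a binomial distribution with a mean of 25 and a standard deviation of about 3.5. A result of 28 is therefore within the expected range.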
To successfully reuse pipelines:
  1. Download and import the pipeline into the SnapLogic Platform.
  2. Configure Snap accounts, as applicable.
  3. Provide pipeline parameters, as applicable.