Calculate common word frequency

This example pipeline demonstrates how to calculate the frequency of the most common words in a dataset using the Tokenizer and Common Words Snaps.

  1. Configure the File Reader Snap to read the contents of the Employee data.json file.
    On validation, the Snap displays the contents of the file and offers a binary stream as output.
    Figures: File Reader Snap Configuration, File Reader Snap Preview

  2. Configure the JSON Parser Snap to parse JSON data from the binary input data.
    On validation, the Snap outputs a document for use by the Tokenizer Snap.
  3. Configure the Tokenizer Snap with $businesses so that it reads from the business data.
    On validation, the Snap tokenizes the content of $businesses and outputs it as an array of tokens.
    Figures: Tokenizer Snap Configuration, Tokenizer Snap Output

  4. Configure the Common Words Snap to compute the frequency of each word that appears in the array of tokens.
    On validation, the Snap displays the frequency of each of the most common words in the dataset.
    Figures: Common Words Snap Configuration, Common Words Snap Output

    Note: After the word frequencies are generated, you can use Snaps such as Filter and Aggregate for further processing. You can also use GenAI Builder to integrate machine learning models.
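
For readers who want to see the logic of the steps above outside SnapLogic, the following Python sketch parses JSON from a binary input, tokenizes one field, and counts the most common tokens. It is an illustration only, not how the Snaps are implemented. The sample data, the helper names tokenize and common_words, and the simple lowercase word-splitting rule are all assumptions; the Tokenizer Snap may split text differently.

```python
import json
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def common_words(documents, field, top_n=3):
    """Count token frequencies across one field of every document."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc[field]))
    # most_common returns (word, frequency) pairs, highest frequency first.
    return counts.most_common(top_n)


# Hypothetical binary input standing in for the file read in step 1.
raw = (
    b'[{"businesses": "Great coffee and great pastries"},'
    b' {"businesses": "Coffee shop with great service"}]'
)

documents = json.loads(raw)  # step 2: parse JSON from the binary data
top = common_words(documents, "businesses", top_n=2)  # steps 3 and 4
print(top)  # [('great', 3), ('coffee', 2)]
```

Filtering the resulting pairs by a minimum frequency would correspond loosely to the further processing mentioned in the note.
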
To successfully reuse pipelines:
  1. Download and import the pipeline into SnapLogic.
  2. Configure Snap accounts as applicable.
  3. Provide pipeline parameters as applicable.