Tokenize text data

This example pipeline demonstrates how to convert sentences into an array of tokens, which can be further used in other ML NLP Snaps.

  1. Configure the CSV Generator Snap to pass your input data which contains a set of sentences as documents.
    Note: In this example, we use the CSV Generator Snap. However, you can replace the CSV Generator Snap with any Snap of your choice, such as the Chunker, JSON Generator Constant, File Reader, or S3 File Reader Snaps.

    CSV Generator Snap

  2. Configure the Tokenizer Snap with $text to tokenize the content provided in the field and generate an array of tokens as output.
    On validation, the Snap displays the tokenized array of words, with a clear view of the processed text data. Each word in the input sentences becomes a token, and sentences in each input document becomes an array.
    Tokenizer Snap Configuration Tokenizer Snap Output

    Tokenizer Snap Configuration

    Tokenizer Snap Output

    Note: After the data is generated, you can use Snaps such as the Filter and Aggregate Snaps for advanced processing. Further, you can use GenAI Builder to integrate machine learning models.
To successfully reuse pipelines:
  1. Download and import the pipeline into SnapLogic.
  2. Configure Snap accounts as applicable.
  3. Provide pipeline parameters as applicable.