Tokenize text data

This example pipeline demonstrates how to convert sentences into an array of tokens, which can be further used in other ML NLP Snaps.

Download this pipeline.

Configure the CSV Generator Snap to pass your input data, which contains a set of sentences as documents.

Note: In this example, we use the CSV Generator Snap. However, you can replace the CSV Generator Snap with any Snap of your choice, such as the Chunker, JSON Generator Constant, File Reader, or S3 File Reader Snaps.

Configure the Tokenizer Snap with $text to tokenize the content provided in the field and generate an array of tokens as output.

On validation, the Snap displays the tokenized array of words, with a clear view of the processed text data. Each word in the input sentences becomes a token, and sentences in each input document becomes an array.


Tokenizer Snap configuration	Tokenizer Snap output

Note: After the data is generated, you can use Snaps such as the Filter and Aggregate Snaps for advanced processing. You can also use AgentCreator to integrate machine learning models.

To successfully reuse pipelines:

Download and import the pipeline into the SnapLogic Platform.
Configure Snap accounts, as applicable.
Provide pipeline parameters, as applicable.