Tokenizer

Overview

You can use this Snap to convert sentences into an array of tokens.

Note:

In the context of the ML Natural Language Processing Snap Pack, a token can be a word or a special character. Other Snaps in this Snap Pack require an array of tokens as input to perform Natural Language Processing (NLP) operations.
This Snap uses the Apache OpenNLP library, a machine-learning-based toolkit for processing natural language text. The library supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
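
As a rough illustration of what "an array of tokens" means here, the following Python sketch splits a sentence into words and special characters. It is only an approximation under a stated assumption: it uses a simple regular expression and does not reproduce the Apache OpenNLP tokenizer that the Snap actually uses. The function name `tokenize` is hypothetical.

```python
import re
from typing import List

# Assumption: a token is either a run of word characters or a single
# non-whitespace special character. The real OpenNLP tokenizer applies
# more nuanced rules; this pattern is for illustration only.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence: str) -> List[str]:
    """Return the words and special characters in `sentence`, in order."""
    return TOKEN_PATTERN.findall(sentence)


print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

The comma and the exclamation mark each become their own token, which matches the definition of a token given in the note above.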

Prerequisites

None.

Limitations and known issues

None.


Type	Description	Examples of upstream and downstream Snaps
Input	This Snap supports a maximum of one document input view and it requires an input document.	Any Snap that offers documents, for example: Mapper, CSV Generator
Output	This Snap supports a maximum of one document output view and it produces a document containing an array of tokens.	Any Snap that accepts documents, for example: Common Words, Bag of Words, Mapper
Learn more about Error handling.

Legend:

Expression icon: Allows using JavaScript syntax to access SnapLogic Expressions to set field values dynamically (if enabled). If disabled, you can provide a static value.
SnapGPT: Generates SnapLogic Expressions based on natural language using SnapGPT.
Suggestion icon: Populates a list of values dynamically based on your Snap configuration. You can select only one attribute at a time using the icon. Type into the field if it supports a comma-separated list of values.
Upload: Uploads files.

Learn more about the icons in the Snap settings dialog.


Field/Field set	Type	Description
Label	String	Required. Specify a unique name for the Snap. Change the default to something more descriptive, especially if the pipeline contains more than one Tokenizer Snap. Default value: Tokenizer Example: Customer data token
Text	String/Suggestion	Required. Specify the text from the upstream Snap that contains the sentences to tokenize. Default value: N/A Example: $text
Word only	Checkbox	Select this checkbox to exclude special characters from the output. Default status: Selected
Snap execution	Dropdown list	Choose one of the three modes in which the Snap executes. Available options are: `Validate & Execute`: Performs limited execution of the Snap and generates a data preview during pipeline validation, then performs full execution of the Snap (unlimited records) during pipeline runtime. `Execute only`: Performs full execution of the Snap during pipeline execution without generating preview data. `Disabled`: Disables the Snap and all Snaps that are downstream from it. Default value: Validate & Execute Example: Execute only
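
The effect of the Text and Word only settings can be sketched in Python as follows. This is a hedged illustration, not the Snap's implementation: `run_tokenizer` is a hypothetical function, the regular expressions only approximate the OpenNLP tokenizer, and the output key `tokens` is an assumption about the shape of the output document.

```python
import re

# Assumption: words are runs of word characters; special characters are
# single non-whitespace, non-word characters. OpenNLP behaves differently
# in edge cases such as contractions.
WORD = re.compile(r"\w+")
WORD_OR_SPECIAL = re.compile(r"\w+|[^\w\s]")


def run_tokenizer(document, text_field="text", word_only=True):
    """Hypothetical stand-in for the Snap.

    `text_field` plays the role of the Text setting (for example, $text),
    and `word_only` plays the role of the Word only checkbox.
    """
    text = document[text_field]
    pattern = WORD if word_only else WORD_OR_SPECIAL
    # Assumption: the output document stores the array under "tokens".
    return {"tokens": pattern.findall(text)}


doc = {"text": "Great product, fast delivery!"}

# Word only selected (the default): special characters are dropped.
print(run_tokenizer(doc))
# {'tokens': ['Great', 'product', 'fast', 'delivery']}

# Word only deselected: special characters appear as separate tokens.
print(run_tokenizer(doc, word_only=False))
# {'tokens': ['Great', 'product', ',', 'fast', 'delivery', '!']}
```

Leaving Word only selected is usually the simpler choice when punctuation carries no meaning for the downstream Snap.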