Tokenizer
Overview
- In the context of the ML Natural Language Processing Snap Pack, a token can be a word or a special character. To perform Natural Language Processing (NLP) operations with other Snaps in this Snap Pack, you need an array of tokens.
- This Snap uses the Apache OpenNLP library, a machine learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.

- This is a Transform-type Snap.
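As a rough illustration of what "an array of tokens" means, the following Python sketch splits text into word tokens and special-character tokens. The regular expression and the `tokenize` function are assumptions for illustration only; this is not the Apache OpenNLP tokenizer the Snap uses, whose output may differ (for example, on contractions or abbreviations).

```python
import re

# Simplified regex tokenizer (an assumption, not Apache OpenNLP's algorithm):
# a token is either a run of word characters or a single special character.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into an array of word and special-character tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

Whitespace never becomes a token here; every other character ends up either inside a word token or as its own special-character token.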
Works in Ultra Tasks
Prerequisites
None.
Limitations and known issues
None.
Snap views
| Type | Description | Examples of upstream and downstream Snaps |
|---|---|---|
| Input | This Snap supports a maximum of one document input view and requires an input document. | Any Snap that offers documents. |
| Output | This Snap supports a maximum of one document output view and produces a document containing an array of tokens. | Any Snap that accepts documents. |

Learn more about Error handling.
Snap settings
- Expression icon: When enabled, lets you use JavaScript syntax to access SnapLogic Expressions and set field values dynamically. When disabled, you can provide a static value.
- SnapGPT: Generates SnapLogic Expressions from natural language using SnapGPT.
- Suggestion icon: Populates a list of values dynamically based on your Snap configuration. You can select only one attribute at a time using the icon. If the field supports a comma-separated list of values, type the values into the field.
- Upload: Uploads files.
| Field/Field set | Type | Description |
|---|---|---|
| Label | String | Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if the pipeline contains more than one Snap of the same type. Default value: Tokenizer Example: Customer data token |
| Text | String/Suggestion | Required. Specify the text from the upstream Snap that contains the sentences to tokenize. Default value: N/A Example: $text |
| Word only | Checkbox | Select this checkbox to exclude special characters from the output. Default status: Selected |
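| Snap execution | Dropdown list | Choose one of the three modes in which the Snap executes. Available options are: Validate & Execute (performs limited execution and generates a data preview during pipeline validation, then performs full execution at runtime), Execute only (performs full execution at runtime without generating preview data), and Disabled (disables the Snap and all Snaps downstream from it). Default value: Validate & Execute Example: Execute only |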
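The following Python sketch models how the Text and Word only settings interact for a single input document. The function name `tokenize_document`, the `tokens` output field, and the regex-based tokenization are hypothetical stand-ins; they are not the Snap's actual output schema or the Apache OpenNLP algorithm.

```python
import re


def tokenize_document(doc: dict, text_field: str = "text", word_only: bool = True) -> dict:
    """Hypothetical model of one Tokenizer run: one input document in, one output document out."""
    # Mirrors the Text setting (for example, $text): read the text from the input document.
    text = doc.get(text_field, "")
    # Mirrors Word only: keep only word tokens, or also keep each special character
    # as a separate token (regex tokenization is an assumption, not OpenNLP).
    pattern = r"\w+" if word_only else r"\w+|[^\w\s]"
    tokens = re.findall(pattern, text)
    # The "tokens" output field name is an assumption for illustration.
    return {"tokens": tokens}


print(tokenize_document({"text": "Hi, Bob!"}))
print(tokenize_document({"text": "Hi, Bob!"}, word_only=False))
```

With Word only selected (the default), special characters such as `,` and `!` are dropped from the token array; clearing the checkbox keeps each of them as a separate token.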