Tokenizer
Overview
- In the context of the ML Natural Language Processing Snap Pack, a token can be a word or special characters. In order to perform Natural Language Processing (NLP) operations with other Snaps in this Snap Pack, an array of tokens is required.
- This Snap uses Apache OpenNLP Library, which is a machine-learning based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
- Transform-type Snap
- Works in Ultra Tasks
Prerequisites
None.
Limitations and known issues
None.
Snap views
View | Description | Examples of upstream and downstream Snaps |
---|---|---|
Input | This Snap supports a maximum of one document input view and it requires an input document. | Any Snap that offers documents. Examples:
|
Output | This Snap supports a maximum of one document output view and it requires a document containing an array of tokens. | Any Snap that accepts documents. Examples:
|
Error |
Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:
Learn more about Error handling in Pipelines. |
Snap settings
- Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.
- Expression icon (): Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
- Add icon (): Indicates that you can add fields in the field set.
- Remove icon (): Indicates that you can remove fields from the field set.
Field / Field set | Type | Description |
---|---|---|
Label | String |
Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if more than one of the same Snaps is in the pipeline. Default value: Tokenizer Example: Customer data token |
Text | String/Suggestion | Required. Specify the text containing the sentences that must be tokenized from the upstream Snap. Default value: N/A Example: $text |
Word only | Checkbox | Select this checkbox to exclude special characters in the output. Default status: Selected |
Snap execution | Dropdown list |
Select one of the three modes in which the Snap executes.
Available options are:
Default value: Validate & Execute Example: Execute only |