Chunker
Overview
Use this Snap to split knowledge content into chunks according to the Snap configuration. You can use the output of this Snap to query vector data.
- Transform-type Snap
- Works in Ultra Tasks
Prerequisites
None.
Limitations and known issues
None.
Snap views
View | Description | Examples of upstream and downstream Snaps |
---|---|---|
Input | This Snap supports a maximum of one binary or document input view. | |
Output | This Snap has at most one document output view. It emits one output document for each generated chunk, and each output document also carries the original input document/header. You can then embed the resulting chunks and store them in a vector database. | Pinecone Upsert |
Error | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab: Stop Pipeline Execution (stops the current pipeline execution when an error occurs); Discard Error Data and Continue (ignores the error, discards that record, and continues with the remaining records); Route Error Data to Error View (routes the error data to an error view without stopping the Snap execution). Learn more about Error handling in Pipelines. | |
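The one-document-per-chunk behavior of the Output view can be pictured with a short Python sketch. This is an illustration only, not the Snap's implementation: the function name `to_output_documents` and the output field names `chunk` and `original` are hypothetical, so confirm the real output schema by validating a pipeline.

```python
from typing import Any, Dict, List


def to_output_documents(input_doc: Dict[str, Any],
                        chunks: List[str]) -> List[Dict[str, Any]]:
    """Return one output document per chunk (hypothetical field names)."""
    # Each output document gets its own shallow copy of the original input,
    # so downstream changes to one document do not affect the others.
    return [{"chunk": text, "original": dict(input_doc)} for text in chunks]


# Two chunks produce two output documents, each carrying the input document.
for doc in to_output_documents({"id": 7}, ["abcd", "efgh"]):
    print(doc)
```

A downstream Snap such as Pinecone Upsert would then receive each chunk as a separate document after it has been embedded.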
Snap settings
- Suggestion icon: Indicates a list that is dynamically populated based on the configuration.
- Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
- Add icon: Indicates that you can add fields in the field set.
- Remove icon: Indicates that you can remove fields from the field set.
Field / field set | Type | Description |
---|---|---|
Label | String | Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if the pipeline contains more than one instance of the same Snap. Default value: Chunker. Example: Create chunk of report |
Content | String/Expression | Appears when the Input type is Document in the Views tab. Required. Specify the text content to be chunked. Default value: N/A. Example: "test text" |
Chunk unit | Dropdown list/Expression | Required. Specify the unit in which the chunk size and the chunk overlap are measured. Available options include: Char, Token. Default value: Char. Example: Token |
Chunk size | Integer/Expression | Required. Specify the maximum size of each chunk, measured in the selected Chunk unit. Chunk size is the total size of each chunk; if a header is specified, the header counts toward the chunk size. Default value: 1000. Example: 100 |
Chunk overlap | Integer/Expression | Required. Specify how much content, measured in the selected Chunk unit, consecutive chunks can share. Default value: 0. Example: 1 |
Chunk header configuration | Field set | Use this field set to define chunk header properties. |
Header type | Dropdown list/Expression | Specify the chunk header type. Available options include: None, First characters of content, User-defined header. Default value: None. Example: User-defined header |
Header length | Integer/Expression | Appears when you select First characters of content in the Header type field. Specify the number of characters from the beginning of the content to add to the beginning of every output chunk. Default value: N/A. Example: 10 |
Header content | String/Expression | Appears when you select User-defined header in the Header type field. Specify the user-defined header to add to the beginning of every output chunk. Default value: N/A. Example: Long header |
Snap execution | Dropdown list | Select one of the three modes in which the Snap executes. Available options are: Validate & Execute (performs limited execution and generates a data preview during pipeline validation, then performs full execution during pipeline runtime); Execute only (performs full execution during pipeline runtime without generating preview data); Disabled (disables the Snap and all Snaps downstream from it). Default value: Validate & Execute. Example: Execute only |
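The chunking settings can be approximated with a sliding-window chunker. The following Python sketch is an assumption about the semantics, not the Snap's actual code: it measures Chunk size and Chunk overlap either in characters (Char) or in whitespace-separated words (a stand-in for Token, because the Snap's tokenizer is not documented here), prepends the header to every chunk, and counts the header toward the chunk size as described for the Chunk size field. The names `chunk` and `resolve_header` are hypothetical.

```python
from typing import List


def resolve_header(content: str, header_type: str,
                   header_length: int = 0, header_content: str = "") -> str:
    """Return the header text for a Header type value (assumed semantics)."""
    if header_type == "None":
        return ""
    if header_type == "First characters of content":
        # Header length counts characters from the start of the content.
        return content[:header_length]
    if header_type == "User-defined header":
        return header_content
    raise ValueError(f"unknown header type: {header_type}")


def chunk(content: str, unit: str = "Char", chunk_size: int = 1000,
          chunk_overlap: int = 0, header: str = "") -> List[str]:
    """Sliding-window chunker; size and overlap are measured in `unit`."""
    if unit == "Char":
        units, header_units, joiner = list(content), list(header), ""
    elif unit == "Token":
        # Assumption: whitespace-separated words stand in for real tokens.
        units, header_units, joiner = content.split(), header.split(), " "
    else:
        raise ValueError(f"unknown chunk unit: {unit}")

    # The header counts toward the chunk size, leaving less room for content.
    body_size = chunk_size - len(header_units)
    if body_size <= 0:
        raise ValueError("header leaves no room for content")
    if not 0 <= chunk_overlap < body_size:
        raise ValueError("chunk overlap must be smaller than the content room")

    step = body_size - chunk_overlap  # how far each window advances
    prefix = header + joiner if header else ""
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(prefix + joiner.join(units[start:start + body_size]))
        if start + body_size >= len(units):
            break  # this window already reached the end of the content
    return chunks


text = "abcdefghij"
print(chunk(text, "Char", chunk_size=4, chunk_overlap=1))
print(chunk(text, "Char", chunk_size=7,
            header=resolve_header(text, "First characters of content", 3)))
```

With a 3-character header and a chunk size of 7, each chunk keeps 4 characters of content, so the second call prints `['abcabcd', 'abcefgh', 'abcij']`. The first call shows the overlap: consecutive chunks `abcd`, `defg`, and `ghij` share one character.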
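To estimate how many output documents a given configuration produces, you can work out the chunk count directly. This is a sketch under two assumptions that the Snap documentation does not spell out: the header counts toward the chunk size, and each chunk starts (chunk size − header length − chunk overlap) units after the previous one. Here $L$ is the content length, $S$ the Chunk size, $H$ the header length, and $O$ the Chunk overlap, all in the selected Chunk unit, with $0 \le O < S - H$:

$$
b = S - H, \qquad s = b - O, \qquad
n =
\begin{cases}
0, & L = 0 \\
1, & 0 < L \le b \\
\left\lceil \dfrac{L - b}{s} \right\rceil + 1, & L > b
\end{cases}
$$

For example, 2,500 characters with the default settings ($S = 1000$, $H = 0$, $O = 0$) give $b = s = 1000$ and $n = \lceil 1500 / 1000 \rceil + 1 = 3$ chunks.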