Chunker

Overview

Use this Snap to split knowledge content into chunks according to the Snap configuration. You can use the resulting chunks to populate and query vector data.

Transform-type Snap
Works in Ultra Tasks

Prerequisites

None.

Limitations and known issues

None.

Snap views


View	Description	Examples of upstream and downstream Snaps
Input	This Snap has at most one input view, which can be binary or document. Binary input type: Requires a binary file as input. Document input type: Requires a document as input and provides the Content field, which is expression-enabled by default, for the text to chunk.	Mapper, JSON Generator
Output	This Snap has at most one document output view. It emits one output document for each chunk generated, and each output document also carries the original input document/header. You can then embed the resulting chunks and store them in a vector database.	Pinecone Upsert
Error	Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are: Stop Pipeline Execution: Stops the current pipeline execution when an error occurs. Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records. Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution. Learn more about Error handling in Pipelines.
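As a rough illustration of the output view's one-document-per-chunk behavior, the following Python sketch fans a list of chunks out into separate output documents, each carrying a copy of the original input. The function `to_output_documents` and the keys `chunk`, `chunk_index`, and `original` are hypothetical; this page does not describe the Snap's actual output schema.

```python
import copy


def to_output_documents(input_doc, chunks):
    """Emit one output document per chunk (hypothetical shape).

    The keys "chunk", "chunk_index", and "original" are illustrative
    assumptions, not the Snap's documented output schema.
    """
    documents = []
    for index, chunk in enumerate(chunks):
        documents.append({
            "chunk": chunk,
            "chunk_index": index,
            # Deep-copy the input so that a downstream change to one
            # output document does not affect the others.
            "original": copy.deepcopy(input_doc),
        })
    return documents


docs = to_output_documents({"id": 1, "title": "Report"}, ["first part", "second part"])
for d in docs:
    print(d["chunk_index"], d["chunk"], d["original"]["title"])
```

Copying the input per document is a design choice of this sketch; it keeps each output document independent when downstream steps modify it.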

Snap settings

Legend:

Expression icon: Allows using JavaScript syntax to access SnapLogic Expressions to set field values dynamically (if enabled). If disabled, you can provide a static value. Learn more.
SnapGPT: Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
Suggestion icon: Populates a list of values dynamically based on your Snap configuration. You can select only one attribute at a time using the icon. Type into the field if it supports a comma-separated list of values.
Upload: Uploads files. Learn more.

Learn more about the icons in the Snap settings dialog.


Field / Field set	Type	Description
Label	String	Required. Specify a unique name for the Snap. Modify this to be more descriptive, especially if the pipeline contains more than one instance of the same Snap. Default value: Chunker Example: Create chunk of report
Content	String/Expression	Appears when the Input type is Document in the Views tab. Required. Specify the text content to be chunked. Default value: N/A Example: "test text"
Chunk unit	Dropdown list/Expression	Required. Specify the unit of the chunk size and the chunk overlap. Available options include: `Char`: Refers to character-based chunking. If you choose this option as the chunk unit, the text is divided into chunks based on the number of characters, and the chunk size and chunk overlap are measured in characters. `Token`: Refers to token-based chunking. If you choose this option as the chunk unit, the text is divided into chunks based on the number of tokens. This option supports only GPT-3.5 Turbo and GPT-4 models. Default value: Char Example: Token
Chunk size	Integer/Expression	Required. Specify the maximum size of each chunk, in the unit selected in Chunk unit (characters or tokens). Chunk size is the total size of each chunk: if a header is specified, the header counts toward the chunk size. Default value: 1000 Example: 100
Chunk overlap	Integer/Expression	Required. Specify the number of units (characters or tokens, as selected in Chunk unit) that can overlap between consecutive chunks. Default value: 0 Example: 1
Chunk header configuration	Use this to define chunk header properties.
Header type	Dropdown list/Expression	Specify the chunk header type. Available options include: `None`, `User-defined header`, and `First characters of content`. Default value: None Example: User-defined header
Header length	Integer/Expression	Appears when you select First characters of content in the Header type field. Specify how many characters from the beginning of the content to add to the beginning of every output chunk. Default value: N/A Example: 10
Header content	String/Expression	Appears when you select User-defined header in the Header type field. Specify the user-defined chunk header to be added to the beginning of every chunk of the output. Default value: N/A Example: Long header
Snap execution	Dropdown list	Choose one of the three modes in which the Snap executes. Available options are: Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime. Execute only: Performs full execution of the Snap during pipeline execution without generating preview data. Disabled: Disables the Snap and all Snaps that are downstream from it. Default value: Validate & Execute Example: Execute only
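The following Python sketch shows one way character-based chunking with Chunk size, Chunk overlap, and the header options could behave. It is an illustration under stated assumptions, not the Snap's implementation: the function name `chunk_text` is hypothetical, chunks are cut at fixed character offsets with no attempt to respect word or sentence boundaries, and the header is prepended with no separator. As described for Chunk size, the header counts toward the chunk size, so each chunk holds at most Chunk size minus the header length characters of content.

```python
def chunk_text(content, chunk_size=1000, chunk_overlap=0, header_type="None",
               header_content="", header_length=0):
    """Split content into character-based chunks (hypothetical sketch).

    Assumptions: fixed-offset windows, header prepended to every chunk with
    no separator, and the header counting toward chunk_size.
    """
    # Resolve the header according to the Header type option.
    if header_type == "User-defined header":
        header = header_content
    elif header_type == "First characters of content":
        header = content[:header_length]
    else:  # "None"
        header = ""

    # The header uses part of the chunk size, leaving the rest for content.
    body_size = chunk_size - len(header)
    if body_size <= 0:
        raise ValueError("header leaves no room for content within chunk_size")
    if not 0 <= chunk_overlap < body_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than the content part of a chunk")

    # Each window starts (body_size - chunk_overlap) characters after the last,
    # so consecutive chunks share chunk_overlap characters of content.
    step = body_size - chunk_overlap
    chunks = []
    for start in range(0, len(content), step):
        chunks.append(header + content[start:start + body_size])
        if start + body_size >= len(content):
            break  # the last window already reached the end of the content
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
print(chunk_text("abcdefghij", chunk_size=7, header_type="User-defined header",
                 header_content="H: "))
```

For example, with the content abcdefghij, a Chunk size of 4, and a Chunk overlap of 1, the sketch returns abcd, defg, and ghij. With the Token unit, the same windowing would apply to token counts from the model's tokenizer instead of character counts.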

Examples

Generate manageable chunks of an article