Pipeline Design

The SnapLogic platform provides data integration, transformation, and orchestration capabilities for diverse data sources, such as databases, spreadsheets, and applications. The core asset in SnapLogic is the pipeline. SnapLogic pipelines provide a powerful, flexible way to handle your data integration and transformation needs, from simple file transfers to complex multi-system orchestrations. Pipelines are composed of endpoint-specific processing components called Snaps, which are grouped logically into Snap Packs. The Snap paradigm and the platform architecture lend themselves to an endless variety of pipeline designs. This article focuses on the core components of the platform and describes some of the common patterns that SnapLogic customers have used successfully.

A pipeline is a collection of Snaps designed to solve a business use case. For example, a pipeline might extract some data, transform or map it, and then load it somewhere else. The execution engine for pipelines is the Snaplex, which can run in the cloud or in an on-premises environment.

SnapLogic provides a variety of Snap Packs that connect to endpoints across applications, databases, cloud data warehouses, and data lakes. You can create accounts that manage the connections to those endpoints.

The following sections describe common pipeline design patterns.

Data Mapping

The Mapper Snap, part of the core Transformation Snap Pack, provides a robust set of tools for transforming data and mapping source and target data fields. The Snap supports the SnapLogic expression language and infers the input schema to offer a visualization of the data set. You can also use AutoPrep to prepare data for analysis, reporting, and machine learning.
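To make the mapping concrete, here is a minimal Python sketch of the kind of per-document field mapping a Mapper Snap performs. The field names and expressions are hypothetical; in the Mapper Snap you would define them as expression-to-target-path rows rather than code.

```python
# Hypothetical illustration of Mapper-style field mapping (not SnapLogic code).

def map_document(doc: dict) -> dict:
    """Map source fields to target fields, as a Mapper Snap's rules would."""
    return {
        # Comparable to a Mapper expression such as: $first_name + " " + $last_name
        "full_name": f"{doc['first_name']} {doc['last_name']}",
        # Comparable to mapping a parsed $amount to the target path $order.total
        "order": {"total": float(doc["amount"])},
    }

print(map_document({"first_name": "Ada", "last_name": "Lovelace", "amount": "19.99"}))
# -> {'full_name': 'Ada Lovelace', 'order': {'total': 19.99}}
```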

Orchestration

The most common design pattern for automation of data processing jobs is the parent/child pipeline construct. With the Pipeline Execute Snap, you can call child pipelines from a parent pipeline, which is useful for orchestrating many data processing jobs.

Structure complex pipelines into smaller segments through child pipelines. You can add a Pipeline Execute Snap in a child pipeline to call another generation of child pipelines. In standard mode, a new child pipeline execution is started for each input document. In reuse mode, a pool of child pipeline instances is started once, and each instance can process multiple input documents from the parent. The Snap's batching and pooling settings enable parallel processing in the child pipelines.
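As a rough analogy (not SnapLogic's implementation), the Python sketch below contrasts the two modes: standard mode runs the child logic once per document, while reuse mode starts a fixed pool of workers that each handle many documents, mirroring the Snap's pooling option.

```python
# Analogy for Pipeline Execute modes; "child_pipeline" stands in for the
# child pipeline's processing logic.

from concurrent.futures import ThreadPoolExecutor

def child_pipeline(doc: dict) -> dict:
    return {**doc, "processed": True}

documents = [{"id": i} for i in range(10)]

# Standard mode: one child execution per input document.
results_standard = [child_pipeline(doc) for doc in documents]

# Reuse mode: a pool of child instances is started once, and each instance
# processes multiple documents in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results_reuse = list(pool.map(child_pipeline, documents))
```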

Scheduling Jobs

For recurring jobs like bulk updates, you can schedule a pipeline run. SnapLogic supports complex pipeline schedules, including cron-style expressions. However, a pipeline whose run time exceeds its schedule interval is usually not an effective design: for example, a job scheduled every five minutes that takes ten minutes to run will have executions pile up.
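The overlap concern can be checked directly. The sketch below, which assumes the third-party croniter package and a hypothetical run duration, compares a pipeline's run time against the gap between scheduled fire times.

```python
from datetime import datetime, timedelta
from croniter import croniter  # third-party package for parsing cron expressions

schedule = croniter("*/5 * * * *", datetime(2024, 1, 1))  # fire every 5 minutes
first_fire = schedule.get_next(datetime)
second_fire = schedule.get_next(datetime)

run_duration = timedelta(minutes=12)  # hypothetical pipeline run time
if run_duration > (second_fire - first_fire):
    print("Run time exceeds the schedule interval; executions will pile up.")
```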

Moving high-volume data and integration scheduling

You can also build endpoint-to-endpoint integrations in AutoSync, a stand-alone application that provides a different UI from Designer. The interface makes building and scheduling these data integrations easy. You can launch AutoSync from the IIP platform.

Pipeline Invocation

Pipelines are commonly exposed as APIs. SnapLogic provides flexibility in how you design and implement these pipelines.

Low-volume APIs

The most common way to start a pipeline is to assign it a URI endpoint and call that URI. In the SnapLogic platform, this is called a Triggered Task, and it can be used to expose a pipeline as a web API endpoint. A Triggered Task also allows passing data into and retrieving data from the pipeline, and because it has the properties of an API, it can be shared and used in automation. Use Triggered Tasks for low-volume APIs or for situations where a swift response is not required. They are also appropriate for batch work with on-premises sources and targets, where running the task on a Groundplex (self-managed Snaplex) reduces latency.
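As an illustration, the sketch below invokes a Triggered Task over HTTPS using Python's third-party requests package. The URL and bearer token are placeholders; SnapLogic generates the actual task URL and authorization token when you create the task.

```python
import requests

# Placeholder URL; the real one is generated when the Triggered Task is created.
TASK_URL = "https://example.snaplogic.io/api/1/rest/slsched/feed/MyOrg/MyProject/MyTriggeredTask"
headers = {"Authorization": "Bearer <task-token>"}

# Pass data into the pipeline as the request body; the response carries
# whatever the pipeline produces on its open output view.
response = requests.post(TASK_URL, headers=headers, json=[{"id": 1}])
response.raise_for_status()
print(response.json())
```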

High-volume APIs

Most web application endpoints can be accessed through APIs. For production pipelines that serve high-volume APIs, an Ultra pipeline task provides continuous, low-latency processing. Because an Ultra pipeline is always running in a prepared state, it is a reliable vehicle for delivering data to web applications. The most popular Ultra pipeline design is a straightforward request-response construct used as a data access layer for real-time web services; this design turns a pipeline into a continuously running job. Documents are supplied to the pipeline through a FeedMaster node inside the Snaplex, which maintains a queue of documents. The pipeline processes the incoming documents, and responses are returned through the FeedMaster to the caller. Ultra is a subscription feature. Although an Ultra task is similar to a Triggered Task in that it is called with a URL, Ultra also limits the use of specific Snaps. Learn more about using Ultra task pipelines to power a web application.
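From the caller's perspective, the request-response construct looks like any synchronous API call, as in the hedged sketch below; the endpoint URL and token are placeholders, and the queuing through the FeedMaster is invisible to the client.

```python
import requests

# Placeholder Ultra task endpoint and token.
ULTRA_URL = "https://example.snaplogic.io/api/1/rest/feed/MyOrg/MyProject/MyUltraTask"
headers = {"Authorization": "Bearer <task-token>"}

# The FeedMaster queues the document, the always-running pipeline processes
# it, and the response is returned synchronously to the caller.
resp = requests.post(ULTRA_URL, headers=headers, json={"customer_id": 42})
resp.raise_for_status()
print(resp.json())
```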

Cloud Data Warehouse: same source and target

Use the ELT Snap Pack to design pipelines whose source and target are the same cloud data warehouse, transforming your data while leveraging the pushdown optimization the warehouse offers. You can also use a Pipeline Execute Snap to call ELT pipelines when moving data from your data lake. Snap accounts enable you to access resources on the endpoint, and you can create an application in the endpoint that establishes the connection between SnapLogic and the endpoint.
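To show what pushdown optimization means in practice, the sketch below runs a transform entirely inside the warehouse as a single SQL statement instead of pulling rows out. The table names are hypothetical, and the ELT Snap Pack generates this kind of SQL for you; the sketch assumes a generic Python DB-API connection to the warehouse.

```python
def load_daily_totals(conn) -> None:
    """Run an ELT-style transform inside the warehouse (pushdown)."""
    sql = """
        INSERT INTO analytics.daily_totals (order_date, total)
        SELECT order_date, SUM(amount)
        FROM raw.orders
        GROUP BY order_date
    """
    cur = conn.cursor()  # conn: a DB-API connection to the warehouse
    cur.execute(sql)     # the warehouse does the aggregation work
    conn.commit()
```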

Polling different data sources

If your pipeline polls different data sources, you can use a Headless Ultra design. This pipeline type enables you to leverage the always-prepared pipeline state without using a FeedMaster node in your Snaplex. Because the pipeline continuously polls the messaging service, this design is preferable to running a standard pipeline on a schedule with intervals of less than one minute.
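As a rough analogy for the always-on behavior, the sketch below keeps a process running and polling continuously rather than cold-starting on a schedule; fetch_messages and process are hypothetical stand-ins for the polling Snap and the rest of the pipeline.

```python
import time

def fetch_messages() -> list[dict]:
    return []  # stand-in: read a batch from the messaging service

def process(message: dict) -> None:
    ...  # stand-in: the downstream Snaps in the pipeline

# The process stays up, like a Headless Ultra pipeline; no scheduler involved.
while True:
    for message in fetch_messages():
        process(message)
    time.sleep(1)  # brief pause between polls
```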

Data connection interruptions

If data connection interruptions require suspension and manual resumption of data processing, you can create a Resumable Pipeline. When the endpoint is down, the pipeline goes into a suspended state, from which it can only be resumed manually.

AI Agents

AgentCreator is a feature that enables you to build pipelines to power AI agents. Along with AI-based Snaps that connect to LLMs, AgentCreator provides pipeline patterns that make it easy to create functioning, scalable agents that can draw on many data sources.