Pipeline execution flow

Integration execution includes the following stages:


Figure: The four integration execution stages

The Snaplex execution flow topic shows how the control plane and data plane (Snaplex nodes) divide the work during execution.

Initialization

During initialization, the system establishes the context and assigns resources for the upcoming pipeline execution. Initialization begins when the control plane receives an execution request from one of the following:

  • A scheduled task

  • An external trigger, such as an API call or an event notification (see the example after this list)

  • Manual initiation by a user from the Designer or Classic Manager
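
In practice, an external trigger is often just an authenticated HTTP request to a pipeline's trigger URL. The sketch below shows the general shape of such a call; the URL, token, and parameter names are placeholders invented for this example, not SnapLogic's actual endpoint format or authentication scheme:

```python
import requests

# Placeholder trigger URL and token -- illustrative assumptions only,
# not SnapLogic's actual endpoint format.
TRIGGER_URL = "https://example.com/api/rest/trigger/my-pipeline"
TOKEN = "my-bearer-token"

# POST to the trigger URL; the JSON body carries pipeline parameters.
resp = requests.post(
    TRIGGER_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"input_param": "value"},
    timeout=30,
)
resp.raise_for_status()  # raise if the control plane rejected the request
```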

For multi-node Snaplexes, the control plane assesses the status and resource capacity of the Snaplex nodes. It ranks nodes based on workload, processing power, and memory, and selects the most appropriate nodes to execute the integration. One node is designated as the leader.
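
The exact ranking criteria are internal to the control plane, but a minimal sketch of the idea, assuming a simple ordering on current workload, free CPU, and free memory (all class, field, and function names here are illustrative assumptions), might look like this:

```python
from dataclasses import dataclass

@dataclass
class NodeStatus:
    name: str
    active_pipelines: int  # current workload
    cpu_free: float        # fraction of CPU available, 0.0-1.0
    memory_free_mb: int    # available memory in MB

def rank_nodes(nodes: list[NodeStatus]) -> list[NodeStatus]:
    """Order nodes from most to least suitable: fewest running
    pipelines first, then most free CPU, then most free memory."""
    return sorted(
        nodes,
        key=lambda n: (n.active_pipelines, -n.cpu_free, -n.memory_free_mb),
    )

nodes = [
    NodeStatus("node-a", active_pipelines=4, cpu_free=0.35, memory_free_mb=2048),
    NodeStatus("node-b", active_pipelines=1, cpu_free=0.80, memory_free_mb=6144),
]
leader = rank_nodes(nodes)[0]  # the top-ranked node is designated the leader
print(leader.name)             # -> node-b
```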

Preparation

During preparation, the control plane and the data plane coordinate to ensure that all necessary components are in place for pipeline execution. This stage allows the system to identify and address potential configuration issues.

The prepare stage includes:

  • Retrieval of metadata from the control plane. This includes configuration and dependencies, such as referenced Snaps, shared resources, and connections.
  • Pre-execution checks to identify any missing or invalid mandatory attributes in the Snap configuration and to retrieve the account credentials needed by the Snaps (see the sketch after this list).
  • Verification of connections to endpoints using the configured protocols to ensure smooth data flow during execution.
  • Initial resource allocation on the designated nodes based on pipeline needs and expected workload. This ensures that sufficient memory and processing power are available for pipeline execution.
  • Distribution of pipeline code and components to the designated Snaplex node. This involves transferring the necessary files and configuration data for execution.
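
A minimal sketch of the pre-execution checks described above, assuming a simplified dictionary representation of a pipeline and its metadata (the keys and function names are assumptions for illustration, not the platform's internal data model):

```python
def prepare(pipeline: dict, metadata: dict) -> list[str]:
    """Return a list of configuration problems found before execution."""
    problems = []
    for snap in pipeline["snaps"]:
        # Pre-execution check: every mandatory attribute needs a value.
        for attr in snap.get("mandatory_attributes", []):
            if not snap.get("settings", {}).get(attr):
                problems.append(f"{snap['name']}: missing mandatory attribute '{attr}'")
        # Account check: the referenced account must exist in the
        # metadata retrieved from the control plane.
        account = snap.get("account")
        if account and account not in metadata.get("accounts", {}):
            problems.append(f"{snap['name']}: unknown account '{account}'")
    return problems

pipeline = {
    "snaps": [
        {"name": "Read CSV", "mandatory_attributes": ["file_path"],
         "settings": {"file_path": ""}, "account": None},
    ],
}
print(prepare(pipeline, {"accounts": {}}))
# -> ["Read CSV: missing mandatory attribute 'file_path'"]
```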

Execution

During the execution stage, data integration, transformation, and orchestration happen in real time. Snaplex nodes communicate with the control plane to report the status of task execution, provide updates, and receive further instructions.

  • Snap execution: The individual Snaps begin processing data following the directives provided by the control plane.
  • Endpoint interactions: The pipeline establishes connections to any required external endpoints (for example, databases, applications, and cloud services) using the specified protocols.
  • Data flow orchestration: The pipeline coordinates the flow of data between Snaps and endpoints, ensuring that data moves through the pipeline in the correct sequence and format (see the sketch after this list).
  • Resource management: The Snaplex node dynamically manages resources (memory, CPU, network) during execution for optimal performance and to prevent bottlenecks. The node collects execution metrics, such as processing time, data volume, and error rates.
  • Pipeline execution statistics: As a pipeline executes, the statistics update periodically so that you can monitor its progress. For more details, refer to the Snap Statistics tab in View pipeline details.
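
A minimal sketch of data flow orchestration and metric collection, assuming each Snap is a function that transforms one document and that failed documents are simply counted (in a real pipeline they would be routed to error views); all names are illustrative:

```python
import time

def run_pipeline(snaps, documents):
    """Push each document through the Snaps in sequence, collecting
    the kinds of execution metrics described above."""
    metrics = {"records": 0, "errors": 0, "start": time.time()}
    for doc in documents:
        try:
            for snap in snaps:      # each Snap: document in, document out
                doc = snap(doc)
            metrics["records"] += 1
        except Exception:
            metrics["errors"] += 1  # in practice, routed to an error view
    metrics["end"] = time.time()
    return metrics

def parse_amount(doc):
    return {**doc, "amount": float(doc["amount"])}

def validate(doc):
    if doc["amount"] <= 0:
        raise ValueError("non-positive amount")
    return doc

print(run_pipeline([parse_amount, validate],
                   [{"amount": "5.0"}, {"amount": "-1"}]))
# -> {'records': 1, 'errors': 1, 'start': ..., 'end': ...}
```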

Completion

The completion stage begins after pipeline execution finishes. During this stage, the node releases its resources, and the pipeline sends a comprehensive set of execution metrics to the control plane (a sketch of such a report follows this list), including:

  • Execution start and end timestamps: Timestamps marking when the pipeline execution began and ended.
  • Data volume processed: The amount of data that the pipeline ingests, transforms, and processes during execution.
  • Number of records processed: The count of individual data records ingested, transformed, and processed by the pipeline during its execution.
  • Success or failure status: A successful status indicates that the pipeline completed its tasks without encountering errors or issues. A failure status indicates that the pipeline encountered errors or exceptions during execution.
  • Any errors or warnings encountered: The issues or notifications that arise during the execution of the pipeline. These can include data validation failures, connectivity issues with endpoints, resource constraints, or any other issues that might impact the successful processing of the data.
  • Resource usage statistics: This includes the following statistics:
    • CPU utilized: The percentage of available CPU resources used by the pipeline during execution.
    • Memory usage: The amount of memory (RAM) consumed by the pipeline to store data, code, and intermediate results.
    • Network usage: The amount of network traffic generated by the pipeline, typically in bytes or megabytes, for data transfer and communication with external systems.
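
Taken together, these metrics form the completion report sent to the control plane. A sketch of what such a report could contain, with illustrative field names and values rather than the platform's actual schema:

```python
completion_report = {
    "start_time": "2025-05-01T10:00:00Z",
    "end_time": "2025-05-01T10:04:12Z",
    "bytes_processed": 104_857_600,   # data volume processed
    "records_processed": 250_000,     # individual records handled
    "status": "Completed",            # or "Failed"
    "errors": [],                     # errors encountered, if any
    "warnings": ["endpoint latency above threshold"],
    "resource_usage": {
        "cpu_percent": 42.5,          # share of available CPU used
        "memory_mb_peak": 1024,       # peak RAM consumed
        "network_bytes": 20_971_520,  # traffic to external systems
    },
}
```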