Pipeline execution flow

From the standpoint of an integration, the execution flow includes four stages: Initialization, Preparation, Execution, and Completion. This topic describes those stages; Snaplex execution flow shows how the control plane and Snaplex nodes divide responsibility during execution.

Initialization

Initialization encompasses the preparatory steps required to set up for execution. During initialization, the system establishes the context and assigns resources for the upcoming pipeline execution. This stage includes:

  1. Incoming request: The control plane receives the pipeline execution request from one of the following:
    • Scheduled task

    • External triggers, such as API calls and event notifications (see the request sketch after this list)

    • Manual initiation by a user from the Designer or Classic Manager

  2. Leader node decision: The control plane assesses the status and resource capacity of the Snaplex nodes, prioritizes them based on workload, processing power, and memory, and then selects and prepares the most appropriate node for pipeline execution.
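
To make the incoming-request step concrete, the following is a minimal sketch of an external trigger sent to a triggered task's URL. The task URL and bearer token shown here are hypothetical placeholders; the actual values come from the task's details in your own environment.

```python
import requests

# Hypothetical triggered-task URL and token; substitute the values
# shown for your own task. The URL pattern is illustrative only.
TASK_URL = "https://example.elastic.snaplogic.com/api/1/rest/slsched/feed/MyOrg/MyProject/MyTask"
TOKEN = "<bearer-token>"

# POST a document to the task endpoint. The control plane receives this
# as an incoming execution request and begins the Initialization stage.
response = requests.post(
    TASK_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"order_id": 12345},  # optional input document for the pipeline
    timeout=30,
)
response.raise_for_status()
print(response.status_code, response.text)
```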

Preparation

During this stage, the control plane and the data plane communicate to ensure that all necessary components are in place for pipeline execution. Preparation allows the system to identify and address potential configuration issues. The stage includes:

  • Retrieval of metadata and dependencies: The system fetches pipeline metadata, including its structure, configuration, and dependencies (such as referenced Snaps, shared resources, and connections) from the control plane.
  • Pre-execution checks: The system identifies any missing or invalid mandatory attributes in the Snap configuration and retrieves the account credentials needed by the Snaps (see the validation sketch after this list).
  • Endpoint interaction: The system prepares and verifies connections to endpoints using the configured protocols, which sets the stage for smooth data flow during the execution stage.
  • Initial resource allocation: Depending on pipeline needs and expected workload, the control plane allocates initial resources on the designated Snaplex node or nodes, ensuring that sufficient memory and processing power are available for pipeline execution.
  • Pipeline distribution: The pipeline code and components are distributed to the designated Snaplex nodes. This involves transferring the necessary files and configuration data for execution.
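
As a way to picture the pre-execution checks, the sketch below validates hypothetical Snap metadata against a list of mandatory attributes. The metadata layout and attribute names are illustrative assumptions, not the platform's internal format.

```python
# Illustrative only: a simplified stand-in for the platform's internal
# pre-execution validation. The metadata layout is an assumption.
snap_metadata = {
    "label": "Load Orders",
    "settings": {"table": "orders", "batch_size": 200},
    "account": None,  # account reference not yet resolved
}

MANDATORY_ATTRIBUTES = ["table", "batch_size"]

def pre_execution_check(snap):
    """Flag missing or invalid mandatory attributes before execution."""
    errors = []
    for attr in MANDATORY_ATTRIBUTES:
        value = snap["settings"].get(attr)
        if value in (None, ""):
            errors.append(f"missing mandatory attribute: {attr}")
    if snap.get("account") is None:
        errors.append("no account credentials configured")
    return errors

problems = pre_execution_check(snap_metadata)
if problems:
    print("configuration issues found:", problems)  # surfaced before execution
```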

Execution

During the execution stage, the integration tasks run, and the orchestrated flow of data defined in the pipeline happens in real time. Throughout this process, the Snaplex nodes communicate with the control plane to report the status of task execution, provide updates, and receive further instructions.

  • Snap execution: The individual Snaps begin processing data. They perform the designated task according to the directives provided by the control plane.
  • Endpoint interactions: The pipeline establishes connections to any required external endpoints (for example, databases, applications, and cloud services) using the specified protocols. This enables the Snaps to process data from these systems.
  • Data flow orchestration: The pipeline coordinates the flow of data between Snaps and endpoints, ensuring that data moves through the pipeline in the correct sequence and format (see the streaming sketch after this list).
  • Resource management: The Snaplex node dynamically manages resources (memory, CPU, network) during execution for optimal performance and to prevent bottlenecks. The pipeline collects execution metrics, such as processing time, data volume, and error rates.
  • Pipeline execution statistics: The execution statistics report the status of a pipeline as it runs. As a pipeline executes, the statistics update periodically so that you can monitor progress. For more details, refer to the Snap Statistics tab in View pipeline details.
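
To make the data-flow idea concrete, here is a minimal, language-level sketch of documents streaming through a sequence of snap-like stages. It models only the streaming behavior described above, not the actual Snaplex runtime.

```python
# Minimal sketch of streamed document flow between snap-like stages.
# Each "snap" is a generator: it consumes upstream documents and yields
# downstream documents, so data moves through in sequence without
# materializing the whole dataset (mirroring streaming through a pipeline).

def read_snap():
    for i in range(3):
        yield {"id": i, "amount": 100 + i}

def transform_snap(docs):
    for doc in docs:
        doc["amount_with_tax"] = round(doc["amount"] * 1.08, 2)
        yield doc

def write_snap(docs):
    processed = 0
    for doc in docs:
        print("writing:", doc)   # stands in for an endpoint write
        processed += 1
    return processed

# The "pipeline" is just the composition of stages, executed lazily.
count = write_snap(transform_snap(read_snap()))
print(f"records processed: {count}")
```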

Completion

The Completion stage begins after execution finishes and the node releases its resources. During this stage, the pipeline sends a comprehensive set of execution metrics to the control plane (a sample report appears after this list), including:

  • Execution start and end timestamps: Timestamps that mark when the pipeline execution began and ended.
  • Data volume processed: The amount of data that the pipeline ingests, transforms, and processes during execution.
  • Number of records processed: The count of individual data records ingested, transformed, and processed by the pipeline during its execution.
  • Success or failure status: A successful status indicates that the pipeline completed its tasks without encountering errors or issues, while a failure status indicates that the pipeline encountered errors or exceptions during execution.
  • Any errors or warnings encountered: The issues or notifications that arise during the execution of the pipeline. These can include data validation failures, connectivity issues with endpoints, resource constraints, or any other issues that might impact the successful processing of the data.
  • Resource usage statistics: These include the following:
    • CPU utilized: The percentage of available CPU resources used by the pipeline during execution.
    • Memory usage: The amount of memory (RAM) consumed by the pipeline to store data, code, and intermediate results.
    • Network usage: The amount of network traffic generated by the pipeline, typically in bytes or megabytes, for data transfer and communication with external systems.
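
As an illustration of the metrics above, the sketch below assembles a sample completion report. The field names and values are assumptions chosen for readability, not the platform's actual reporting schema.

```python
from datetime import datetime, timezone

# Illustrative completion report mirroring the metrics listed above.
# Field names and values are assumptions, not the real schema.
execution_report = {
    "start_time": "2024-05-01T09:15:00Z",
    "end_time": datetime.now(timezone.utc).isoformat(),
    "data_volume_bytes": 48_203_776,
    "records_processed": 12_500,
    "status": "Completed",            # or "Failed"
    "errors": [],                     # e.g. validation or connectivity issues
    "warnings": ["retry on endpoint 'orders-db' (attempt 2)"],
    "resource_usage": {
        "cpu_percent": 37.5,          # share of available CPU used
        "memory_mb": 512,             # peak RAM consumed
        "network_mb": 46.0,           # traffic to/from external systems
    },
}

# A node would send this to the control plane at completion; here we
# just print it to show the shape of the payload.
print(execution_report)
```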