Jupyter Notebook Integration

Integrating and working with Jupyter Notebook in SnapLogic

The Remote Python Script (Python) Snap executes a Python script natively on local or remote Python executors. The script is stored inside the Snap and can be edited using the built-in script editor. However, the built-in editor does not provide the interactive features found in Jupyter Notebook. Integrating SnapLogic with Jupyter Notebook lets you connect the Python Snap to a notebook, which gives you access to those interactive features and makes the Python script easier to develop. You can develop the script inside the notebook with sample data, using a local executor on your laptop. Once the script is ready, you can publish it to the pipeline, where it is executed as part of the pipeline in the production environment.

The image below shows a pipeline with a Python Snap. The Snap is shown on the left, and the built-in script editor is on the right. The built-in script editor is appropriate for quick editing but not for developing large Python scripts.


Pipeline with Python Snap and Built-in Editor

The video below shows the main menu in Jupyter Notebook, where you specify the Python Snap you want to use, develop the script, publish it, and validate the pipeline. The input, output, error, and console data can be displayed inside the notebook as a table, as JSON, or in raw format.

Installation

Python 3.6

The recommended version of Python is 3.6, which can be downloaded from the official Python website.

SnapLogic package

The SnapLogic package can be installed using the pip command. All dependencies are installed automatically.

pip3 install snaplogic

Installing on Linux:

To install Jupyter Notebook on a Linux machine:

  1. Install the SnapLogic package

    pip3 install snaplogic
  2. Install Jupyter Notebook

    sudo apt install jupyter-notebook

Run Jupyter Notebook to verify that all the packages have been installed properly. If the login page is not displayed properly, the Jupyter widgets are probably not set up correctly. To set up the widgets, install and enable them by executing the following commands:

  1. Install widgets

    pip3 install ipywidgets
  2. Enable widgets

    jupyter nbextension enable --py widgetsnbextension

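If the Paste Info button is missing in the main menu, Pyperclip is probably unable to access the system clipboard. On Linux, Pyperclip relies on a clipboard tool such as xclip, which you can install using the following command: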

sudo apt-get install xclip
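
As a quick check before starting the server, the following sketch reports which of the pieces mentioned in this section are visible to the current Python interpreter. The module names (snaplogic, notebook, ipywidgets, pyperclip) are assumptions inferred from the install commands, and the helper names are hypothetical. The check only confirms that each item can be found, not that it works correctly.

```python
import importlib.util
import shutil
import sys

# Module names are assumptions inferred from the install commands above.
REQUIRED_MODULES = ["snaplogic", "notebook", "ipywidgets", "pyperclip"]


def module_available(name):
    """Return True if the module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def check_environment():
    """Collect a simple found/missing report for the local setup."""
    report = {name: module_available(name) for name in REQUIRED_MODULES}
    # xclip is a system tool, so look for it on PATH instead of importing it.
    report["xclip"] = shutil.which("xclip") is not None
    report["python>=3.6"] = sys.version_info >= (3, 6)
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'found' if ok else 'missing'}")
```

Any item reported as missing points back to the corresponding install command in this section.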

Getting Started

Starting Jupyter Server

In the terminal, start the Jupyter server using the following command. Update the port number as appropriate. Once the server starts, the URL is displayed in the console along with the access token. In most cases, you will be redirected automatically.

jupyter notebook --port 9999

Creating Notebook

On the right, click New, then Python 3 to create a new notebook.


Creating a New Python 3 Notebook

Using the SnapLogic - Jupyter Notebook Integration

Logging In

In a notebook cell, execute the following script to display the login form.

from snaplogic.jupyter import SnapLogic
sl = SnapLogic()

Jupyter Login

Alternatively, the username and password can be passed as parameters to log in programmatically without the login form.

from snaplogic.jupyter import SnapLogic
sl = SnapLogic(username="[email protected]", password="**********")
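
To keep the password out of the notebook itself, the credentials could be read from the environment before logging in. This is a minimal sketch, not part of the SnapLogic package: the helper load_credentials and the variable names SNAPLOGIC_USERNAME and SNAPLOGIC_PASSWORD are hypothetical.

```python
import os


def load_credentials(env=None):
    """Read login details from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    username = env.get("SNAPLOGIC_USERNAME")
    password = env.get("SNAPLOGIC_PASSWORD")
    if not username or not password:
        raise ValueError("Set SNAPLOGIC_USERNAME and SNAPLOGIC_PASSWORD first")
    # Keys match the keyword arguments used in the login example above.
    return {"username": username, "password": password}
```

The returned dictionary can then be unpacked into the constructor shown above, for example SnapLogic(**load_credentials()).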

Displaying Main Menu

After logging in, execute the following script to launch the main menu.

sl.display_main_menu()

Jupyter Main Menu

Linking Snap

In SnapLogic Designer canvas, right-click the Remote Python Script Snap and select Copy Info.


Linking Snap to Notebook

Then, click Paste Info to connect the Snap to the notebook.


Linked Snap in Notebook

Fetching Script

Once the Snap is connected to the notebook, click Fetch Script to download the script from the Snap. The script will be placed in the cell below.

Below is the starter script. There are three main functions:

  1. snaplogic_init
  2. snaplogic_process
  3. snaplogic_final

The first function (snaplogic_init) is executed once, before any input data is consumed. The second function (snaplogic_process) is called on each incoming document. The last function (snaplogic_final) is called after all incoming documents have been consumed by snaplogic_process.

Fetching Script from Snap
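
The call order of the three functions can be illustrated with a small sketch. The function signatures and return values below are assumptions, so the starter script fetched from the Snap may differ. The run_locally driver is hypothetical and only mimics the order in which the Snap is described as calling the functions.

```python
# Assumed shape of the three functions; the real starter script fetched from
# the Snap may use different signatures and return values.
state = {"count": 0}


def snaplogic_init():
    # Runs once before any input document is consumed.
    state["count"] = 0
    return None


def snaplogic_process(row):
    # Runs once per incoming document; here it tags each document with an index.
    state["count"] += 1
    return {**row, "index": state["count"]}


def snaplogic_final():
    # Runs once after all documents have been processed.
    return {"total": state["count"]}


def run_locally(documents):
    """Hypothetical driver that mimics the call order described above."""
    outputs = []
    snaplogic_init()
    for doc in documents:
        result = snaplogic_process(doc)
        if result is not None:
            outputs.append(result)
    final = snaplogic_final()
    if final is not None:
        outputs.append(final)
    return outputs


if __name__ == "__main__":
    sample = [{"sepal_length": 5.1}, {"sepal_length": 4.9}]
    for out in run_locally(sample):
        print(out)
```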

Publishing and Validating

Now, you can develop the script inside the notebook. Click Save Script to save the script locally. The saved script is displayed in the Script tab for review. In this case, we use the script from the tutorial Iris Flower Classification using Neural Networks. The input is the Iris flower classification dataset, and the output is the neural network model.


Saving the Script

Then, you can publish the script to the pipeline by double-clicking Publish. Alternatively, double-click Cloud Validate to publish the script and also validate the pipeline, which generates the preview data. The input, output, and error data can be viewed as a table, as JSON, or in raw format. In addition, the standard output from print statements or verbose operations is displayed in the console.

Publishing the Script

Validated Pipeline

Output Preview Data Displayed

Local Validation

The Cloud Validate operation can take a long time. It comprises the following steps:

  1. Publish the script to the pipeline.
  2. Perform pipeline validation.
    1. Pipeline description is sent to the Snaplex.
    2. Snaplex validates the pipeline based on the first 50 input documents.
    3. The preview data is encrypted and saved.
  3. Download the preview input, output, error, and console data to the notebook.
  4. Decrypt and display.

Alternatively, the Local Validate operation executes the script locally using the input data from the Cloud Validate operation. This operation can be completed quickly.

Note:

The result of local validation may differ slightly from the Cloud Validate result, since the local environment may differ from the production environment on the Snaplex.
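
The idea behind local validation can be sketched as follows: run the per-document function on a limited number of input documents and collect the output, error, and console data separately. This is an illustration under stated assumptions, not the package's actual implementation. The names local_validate and example_process are hypothetical, and example_process stands in for a snaplogic_process function.

```python
import contextlib
import io
import itertools

PREVIEW_LIMIT = 50  # Validation is described as using the first 50 input documents.


def local_validate(process, documents, limit=PREVIEW_LIMIT):
    """Run `process` on up to `limit` documents; collect output, errors, console."""
    outputs, errors = [], []
    console = io.StringIO()
    # Capture anything the script prints so it can be shown as console data.
    with contextlib.redirect_stdout(console):
        for doc in itertools.islice(documents, limit):
            try:
                outputs.append(process(doc))
            except Exception as exc:
                # A failing document goes to the error view instead of stopping the run.
                errors.append({"input": doc, "error": str(exc)})
    return {"output": outputs, "error": errors, "console": console.getvalue()}


def example_process(row):
    # Hypothetical stand-in for snaplogic_process.
    print("processing", row["id"])
    return {"id": row["id"], "double": row["value"] * 2}


if __name__ == "__main__":
    docs = [{"id": i, "value": i} for i in range(3)] + [{"id": 99}]
    preview = local_validate(example_process, docs)
    print(preview["output"])
    print(preview["error"])
    print(preview["console"])
```

Because everything runs in the local interpreter, package versions and available resources may not match those on the Snaplex.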