
Nodes and workflows

Nodes and workflows are a feature of sisl that lets you define functional, lazily computed workflows in a very simple way. Some of the reasons you might want to use the sisl.nodes framework are:

  • Cleaner and more maintainable code: The framework forces you to write your workflows in a functional style. That is, each piece of functionality must be packed into a node, which must be a pure function, i.e. a function whose result depends only on its inputs. If you manage to do this, your code will be much more reusable and reproducible by other people. It also helps a lot in testing your code, since you can very easily test each piece individually.

  • Easier to use from an external interface: Graphical user interfaces (GUIs) can have a hard time interacting with code if there is no clear division of functionality or the inputs to provide are very complex. Nodes are pieces of functionality with simple input fields. Whenever an input to a node is too complex, you can always generate it from another node, simplifying the input that the user needs to provide. Linking a node’s output to another node’s input is therefore a very efficient way of creating “workchains” from a GUI by providing only simple inputs.

Note: The plan is to convert `sisl-gui <https://pypi.org/project/sisl-gui/>`__ to use these nodes and workflows, so whatever you develop within the framework will automatically be usable by the GUI.

Nodes

In sisl.nodes, you have the Node class:

[1]:
import sisl
from sisl.nodes import Node
info:0: SislInfo: Please install tqdm (pip install tqdm) for better looking progress bars

You can easily create a node from a function with the from_func method:

[2]:
@Node.from_func
def my_sum(a: int, b: int):
    print(f"SUMMING {a} + {b}")
    return a + b


# Instead of using it as a decorator, if you want to keep the pristine function,
# you can always create the node later:
#
# def my_sum(a: int, b: int):
#     print(f"SUMMING {a} + {b}")
#     return a + b
#
# my_sum_node = Node.from_func(my_sum)

By default, nodes compute lazily. That is, they only run when you explicitly ask for the result. Therefore, calling your node won’t run the function; it will just create a new node instance.

[3]:
my_sum(2, 5)
[3]:
<__main__.my_sum at 0x7f8ca00edf40>

It is only when you call .get() on it that it will compute its result.

[4]:
result = my_sum(2, 5)

result.get()
SUMMING 2 + 5
[4]:
7

The result is then stored in the node, so if you keep requesting it, the node will not need to recompute; it will just return the stored result:

[5]:
# This won't execute the function, so we won't see the printed message.
result.get()
[5]:
7

Nodes will typically be part of a workflow. If you want to change an input in a section of your workflow, you should not need to replace the node. Therefore, nodes have a method to update their inputs:

[6]:
result.update_inputs(a=8)
[6]:
<__main__.my_sum at 0x7f8ca82b4e30>

And now, when you need the value again, it will understand that the stored output is outdated and recompute:

[7]:
result.get()
SUMMING 8 + 5
[7]:
13

A node’s context defines how it behaves. One of the context keys is lazy, which determines whether the node waits until its output is requested (True, the default) or recomputes automatically each time its inputs change (False).

[8]:
auto_result = my_sum(2, 5)

auto_result.context.update(lazy=False)

auto_result.get()
auto_result.update_inputs(a=8)
SUMMING 2 + 5
SUMMING 8 + 5
[8]:
<__main__.my_sum at 0x7f8c7494b5f0>

And now comes the most useful thing about nodes. If you pass a node as an input to another node, the nodes are recursively resolved until they reach a leaf that is not a node.

In the following example, we will create a node that depends on another node. We will see that whenever you need the result for the final node, all its dependencies are computed.

[9]:
# Compute a first value
first_val = my_sum(2, 5)
# Use the first value to compute our final value
final_val = my_sum(first_val, 5)

final_val.get()
SUMMING 2 + 5
SUMMING 7 + 5
[9]:
12

Exactly as in the case where we had only one node, if you update the inputs of any node, the results also get recomputed when the value is requested.

In the following example we update an input of the first node. When we request the output of the last node, the first node realizes “Wait a moment, I am outdated, I need to recompute my value”. Once that value is recomputed, the final node also recomputes its value with the new input.

[10]:
first_val.update_inputs(a=7)
final_val.get()
SUMMING 7 + 5
SUMMING 12 + 5
[10]:
17

And if a node doesn’t need to be recomputed, it will just return the stored output. In the following cell we update the inputs of our second node, but the first one still has the same inputs and therefore doesn’t need to recompute:

[11]:
final_val.update_inputs(b=20)
final_val.get()
SUMMING 12 + 20
[11]:
32

When nodes are passed as inputs, they are not only recursively resolved: a connection is made between them so that they can propagate information through the tree. That is, when a node updates its inputs, it sends a signal up the tree that its output is outdated. In this way, if some node up the tree wants automatic recalculation, it will trigger a recompute of itself, which will recursively reach the outdated node.

Let’s create again two nodes, but this time the final one will have automatic recalculation:

[12]:
# Compute a first value
first_val = my_sum(2, 5)
# Use the first value to compute our final value, which we want to
# automatically recompute when there are changes.
final_val = my_sum(first_val, 5)
final_val.context.update(lazy=False)

# Get the value
final_val.get()
SUMMING 2 + 5
SUMMING 7 + 5
[12]:
12

Now, when we update the inputs of the first node, the second one will notice and trigger a recompute of the whole tree, just as if we had called its .get() method.

[13]:
# Update the inputs of the first node, which will trigger recalculation
first_val.update_inputs(a=7)
SUMMING 7 + 5
SUMMING 12 + 5
[13]:
<__main__.my_sum at 0x7f8c749840e0>

This can be useful for creating “event listeners” that enable live updating naturally. We might introduce “async” nodes at some point.
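As a minimal sketch of such a listener (reusing the my_sum node from above; log_value is just a hypothetical illustration, not part of sisl), a non-lazy node attached to another node re-runs whenever that node changes:

@Node.from_func(context={"lazy": False})
def log_value(value):
    # Hypothetical listener: since this node is not lazy, it re-runs
    # whenever any of its inputs signals a change.
    print(f"Current value: {value}")


tracked = my_sum(1, 1)
listener = log_value(tracked)

# Resolve the listener once; after this, updating the tracked node's
# inputs should automatically trigger the listener again.
listener.get()
tracked.update_inputs(a=3)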

Workflows

At this point, the need to somehow pack the workchain that we created arises naturally. What you would usually do is wrap your code in a function, and that is also what we do here. We call this wrapper function a workflow, and workflows can be created just as nodes are:

[14]:
from sisl.nodes import Workflow


def my_sum(a, b):
    print(f"SUMMING {a} + {b}")
    return a + b


# Define our workchain as a workflow.
@Workflow.from_func
def triple_sum(a: int, b: int, c: int):
    first_val = my_sum(a, b)
    return my_sum(first_val, c)


# Again, if you want to keep the pristine function,
# don't use the decorator
#
# def triple_sum(a: int, b: int, c: int):
#    first_val = my_sum(a, b)
#    return my_sum(first_val, c)
#
# my_workflow = Workflow.from_func(triple_sum)
warn:0: SislWarning: Decorators are ignored for now on workflow creation. Ignoring 1 decorators on triple_sum

When a workflow is defined, the nodes within it are discovered and stored in dryrun_nodes:

[15]:
wf_nodes = triple_sum.dryrun_nodes
wf_nodes
[15]:
<sisl.nodes.workflow.WorkflowNodes at 0x7f8c75d0a750>

You can print them to get an idea of the nodes that you have there:

[16]:
print(wf_nodes)
Inputs: {'a': <sisl.nodes.workflow.WorkflowInput object at 0x7f8ca8fc91c0>, 'b': <sisl.nodes.workflow.WorkflowInput object at 0x7f8c74e31070>, 'c': <sisl.nodes.workflow.WorkflowInput object at 0x7f8c777716d0>}

Workers: {'my_sum': <__main__.my_sum object at 0x7f8c7494b500>, 'my_sum_1': <__main__.my_sum object at 0x7f8c74f4cf50>}

Output: <sisl.nodes.workflow.WorkflowOutput object at 0x7f8c74984830>

Named nodes: {'first_val': 'my_sum'}

One important thing that you can see here is that the workflow gives names to the nodes it uses so that it can easily find them when needed. The name is usually just the node’s name, but if there are multiple nodes of the same type it can get a bit messier, because a suffix (_1, _2, _3, …) is added.

For that reason, it is always a good idea to give nodes more understandable names. On workflow creation, variable assignments are automatically discovered, and the workflow uses the variable name as an alias for the node. In this way, you can very easily give nodes more meaningful names so that you can find them afterwards!

[17]:
wf_nodes.first_val
[17]:
<__main__.my_sum at 0x7f8c7494b500>

Accessing the workflow’s nodes is nice, but it is often difficult to get an idea of the whole workflow once it gets a bit complex.

It is always easier to understand a workflow by visualizing it. For that, you can use the visualize method of its network attribute, but you need networkx and pyvis installed on your computer, both of which you can install through pip.

[18]:
triple_sum.network.visualize(notebook=True)
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/IPython/core/display.py:431: UserWarning: Consider using IPython.display.IFrame instead
  warnings.warn("Consider using IPython.display.IFrame instead")

There are many tweaks that you can try on the visualization, but we are not going to go into the details. You can play with it to find the most appropriate representation!

Workflows are just a way of organizing nodes, so they work exactly the same. By default they are lazy, so calling your workflow class will just give you a workflow instance:

[19]:
result = triple_sum(2, 3, 4)
result
[19]:
<__main__.triple_sum at 0x7f8ca8f67a70>

And then whenever you ask for the value, the workflow runs.

[20]:
result.get()
SUMMING 2 + 3
SUMMING 5 + 4
[20]:
9

Workflows link their inputs to inputs of the nodes they contain. In this way, if you update some input of the workflow, only the nodes that used that input will get updated, and only the necessary recomputation will be performed, exactly as we saw before.

[21]:
result.update_inputs(c=8)
[21]:
<__main__.triple_sum at 0x7f8ca8f67a70>
[22]:
result.get()
SUMMING 5 + 8
[22]:
13

Once the workflow has been instantiated, it will contain instantiated nodes, which are different from the nodes that are produced during the discovery run.

[23]:
result.nodes
[23]:
<sisl.nodes.workflow.WorkflowNodes at 0x7f8c73353c20>

One can imagine reusing the results of these nodes for something else.

As an example, we can create an automatically recalculating node that simply informs us whenever the intermediate value changes:

[24]:
@Node.from_func(context={"lazy": False})
def alert_change(val: int):
    print(f"VALUE CHANGED, it now is {val}")


# We feed the node that produces the intermediate value into our alert node
my_alert = alert_change(result.nodes.first_val)

# Now when we update the inputs of the workflow, the node will propagate the information through
# our new node.
result.update_inputs(a=10)
VALUE CHANGED, it now is 5
SUMMING 10 + 3
VALUE CHANGED, it now is 13
[24]:
<__main__.triple_sum at 0x7f8ca8f67a70>

It might sometimes be useful to provide methods for a workflow. In that case, workflows can also be defined with class syntax, with the workflow’s function provided as a static method called function.

[25]:
class TripleSum(Workflow):
    # Define the function that runs the workflow, exactly as we did before.
    @staticmethod
    def function(a: int, b: int, c: int):
        first_val = my_sum(a, b)
        return my_sum(first_val, c)

    # Now, we have the possibility of adding new methods to it.
    def scale(self, factor: int):
        self.update_inputs(
            a=self.get_input("a") * factor,
            b=self.get_input("b") * factor,
            c=self.get_input("c") * factor,
        )
warn:0: SislWarning: Decorators are ignored for now on workflow creation. Ignoring 1 decorators on function

We can now use the workflow exactly as we did before.

[26]:
result = TripleSum(2, 3, 4)
result.get()
SUMMING 2 + 3
SUMMING 5 + 4
[26]:
9

But now with the added possibility of using the method we provided:

[27]:
result.scale(4)
result.get()
SUMMING 8 + 12
SUMMING 20 + 16
[27]:
36
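As a side note, the get_input method used inside scale can also be handy on its own: it retrieves the value currently stored for a given workflow input. A small sketch of how one might use it (the exact values depend on the updates performed so far):

# Query the current value of one of the workflow's inputs ...
current_a = result.get_input("a")

# ... and, for instance, update the inputs relative to it, just as the
# `scale` method does internally.
result.update_inputs(a=current_a + 1)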

It is important to know that any calculation you do in your workflow’s code will be converted to a node. This means that its inputs and outputs are stored. As a not-so-obvious example, if you sum three values, the intermediate sum will be stored as a node:

[28]:
@Workflow.from_func
def sum_triple(a, b, c):
    val = a + b + c
    return val


sum_triple.network.visualize(
    notebook=True,
)
warn:0: SislWarning: Decorators are ignored for now on workflow creation. Ignoring 1 decorators on sum_triple

So if you don’t want that, you should pack everything that you don’t want to be saved into a separate function, and then use that function in the workflow:

[29]:
def operation(a, b, c):
    return a + b + c


@Workflow.from_func
def sum_triple(a, b, c):
    val = operation(a, b, c)
    return val


sum_triple.network.visualize(notebook=True)
warn:0: SislWarning: Decorators are ignored for now on workflow creation. Ignoring 1 decorators on sum_triple