Welcome!
Welcome to flow, a system for defining and running parallel, dataflow programs.
Page through this book using the '>' and '<' buttons on the side of the page, or by navigating directly to a section using the Table of Contents on the left.
The top-level sections are:
- Installing flow
- Introduction to Flow
- Your First Flow
- Defining Flows
- flowr's context functions
- Running flows
- Debugging Flows
- The Flow Standard Library
- Example flows
- Developing Flow
- Internals of the Flow Project
Installing flow on your system
There are three main options for getting a working install of flow on your system:
- from source
- downloading a release
- homebrew tap
From Source
All pretty standard:
- clone this repo
- install pre-requisites with `make config`
- build and test with `make`
That will leave binaries (such as `flowc` and `flowrcli`) in `target/debug`, and `flowstdlib` installed into `$HOME/.flow/lib`. You can use them from there, or you can install them using `cargo install`.
Downloading the latest release
From the latest GitHub release, download and manually install the executables for your target system:
- flowc
- flowrcli
- flowrex
- flowrgui
Then download the portable WASM `flowstdlib` and expand it into the directory `$HOME/.flow/lib/flowstdlib`.
Homebrew tap
A homebrew tap repo is maintained here which you can use to install with homebrew:
> brew tap andrewdavidmackenzie/dataflow
> brew install dataflow
That should install the binaries and the portable `flowstdlib` WASM library, ready for running flows.
What is 'flow'
flow is a system for defining and running inherently parallel, data-dependency-driven 'programs'.
Wikipedia defines dataflow programming as "a programming paradigm that models a program as a directed graph of the data flowing between operations", which pretty much sums it up.
A flow program is created by defining a directed graph of processes that process data and that are connected by connections.
A process can have zero or more inputs and produces zero or one output. Processes have no side-effects.
There is no shared memory.
In flow, a process is a generic term. A process can be a function that directly implements the processing of data, or it can be a nested "sub-flow": another flow definition, that in turn may contain functions and/or other sub-flows.
When we wish to refer to them indistinctly, we will use the term process. When distinctions need to be made, we will use function, flow or sub-flow.
Thus, a flow is an organizational object, used to hierarchically organize sub-flows and functions, and functions are what actually get work done on data.
Flows can be nested infinitely, but eventually end in functions. Functions consist of a definition (for the compiler and the human programmer) and an implementation (for the runtime to use to process data).
The connections between processes are explicit declarations of the data dependencies between them: what data is required for a process to be able to run, and what output it produces.
Thus a flow is inherently parallel, without any further need to express the parallelism of the described algorithm.
As part of describing the connections, I would like flow to also be visual, making the data dependencies visible and directly visually "author-able"; but this is still a work in progress, and a declarative text format for flow definitions was a step on the way and is what is currently used.
Functions and sub-flows are interchangeable and nestable, so that higher level programs can be constructed by combining functions and nested flows, making flows reusable.
I don't consider flow a "programming language", as the functionality of the program is created from the combination of functions, that can be very fine grained and implemented in many programming languages (or even assembly, WebAssembly or something else).
Program logic (control flow, loops) emerges from how the processes are "wired together" in 'flows'.
I have chosen to implement the functions included with flow (in the `flowstdlib` standard library and the context functions of the `flowrcli` flow runner) in Rust, but they could be in other languages.
I don't consider flow (or the flow description format) a DSL. The file format is simply what was chosen for describing a flow in text. The file format is not important, providing it can describe the flow (processes and connections).
I chose TOML as there was good library support for parsing it in Rust, and it's a bit easier on the eyes than writing JSON. I later implemented multiple deserializers, so the flow description can be in other formats (including JSON and YAML), and descriptions in multiple formats can even be mixed and combined.
Q. Is it high-level or low-level?
A. "Well...yes".
The level of granularity chosen for the implementation of the functions that flows are built from is arbitrary.
A function could be as simple as adding two numbers, or it could implement a complex algorithm.
Interchangeability of functions and sub-flows as processes
A number of simple primitive functions can be combined together into a flow which appears as a complex process to the user, or it could be a complex function that implements the entire algorithm in code in a single function.
The users of the process should not need to know how it is implemented. They see the process definition of its inputs and outputs, a description of the processing it performs, and use it indistinctly.
Fundamental tenets of 'flow'?
The 'tenets', or fundamental design principles, of flow that I have strived to meet include:
No Global or shared memory
The only data within flow is that flowing on the connections between processes. There is no way to store global state, share variables between functions, nor persist data across multiple function invocations.
Pure Functions
Functions have no side-effects (except context functions, which I'll describe later). Jobs for functions are created with a set of inputs and they produce an output; the output should only depend on the input, along the lines of "pure" functions in Functional Programming. Thus a function should be able to be invoked multiple times and always produce the same output. Also, functions can be executed by different threads, processes, machines and machine architectures and always produce the same output.
This helps make flow execution predictable, but also parallelizable. Functions can be run in parallel or interleaved without any dependency on the other functions that may have run before, those running at the same time, or those in the future - beyond their input values.
This can also enable novel tracing and debugging features, such as "time travel" (going backwards in a program) or "un-executing" a function (stepping backwards).
Encapsulation
The complexity of a process is hidden inside its definition; you don't need to know its implementation to know how to use it.
- Public specification of a process: inputs and outputs for the compiler and user, and a text description of what processing it performs on its inputs and what output(s) it produces, for the human programmer.
- Private implementation: a process implementation can be a function implemented in code or an entire sub-flow containing many sub-layers and eventually functions.
A process's implementation should be able to be changed, and changed from a function to a sub-flow or vice versa, without affecting flow programs that use it.
Re-usability
Enabled by encapsulation. A well defined process can be used in many other flows via references to it. This facilitates the "packing" of processes (be they functions or sub-flows) for re-use by others in other flows.
Portability
The intention is that the run-time can run on many platforms. The libraries have been written to be able to compile to WASM and be portable across machines and machine architectures.
The function implementations in libraries are compiled to native for performance but also to WASM for portability. Function implementations provided by the user as part of a flow are compiled to WASM once, then distributed with the flow and run by any of the run-times, making the flow portable without re-compilation.
Polyglot
Although the compiler and runtimes are written in one language (Rust), other versions could be written in other languages; there should be nothing in flow semantics or the flow definition specific to one language.
Process implementations supplied with a flow could be written in any language that can compile to WASM, so it can then be distributed with the flow and then loaded and run by any run-time implementation.
Functional Decomposition
Enable a problem to be decomposed into a number of communicating processes, and those in turn can be decomposed, and so on down in a hierarchy of processes until functions are used. Thus the implementation is composed of a number of processes, some of which may be reused from elsewhere and some specific to the problem being solved.
Structured Data
Data that flows between processes can be defined at a high-level, but consist of a complex structure or multiple levels of arrays of data, and processes and sub-processes can select sub-elements as input for their processing.
Inherently Parallel
By making data dependencies between processes the basis of the definition of a flow, the non-parallel aspects of a flow (when one process depends on data from a previous process) are explicit, leading to the ability to execute all processes that can execute (due to the availability of data for them to operate on) at any time, in parallel with other executions of other processes, or of other instances of the same process.
The level of concurrency in a flow program depends only on the data structures used and the connections between the processes that operate on them. Then the level of parallelism exploited in its execution depends on the resources available to the flow runner program running the flow.
Distributable
As the functions are pure, and only depend on their inputs, they may be executed across threads, cores, processes, machines and (via portability and WASM) even a heterogeneous network of machines of different CPU architectures and operating systems.
Separate the program from the context
There is an explicit separation between the flow program itself and the environment in which it runs.
Flows contain only pure functions, but they are run by a "flow runner" program (such as `flowrcli`) that provides "impure" context functions for interacting with the context in which it runs, for things like STDIO, the file system, etc.
Efficiency
When there is no data to process, no processes are running and the flow and the flow runner program running it are idle.
Project Components and Structure
Here is a summary of the project components, their purpose, and a link to their README.md:
- flowcore - a set of core structs and traits used by `flowr` and `flowc`, plus code to fetch content from file/http and resolve library (lib://) references
- flowmacro - a macro used to help write function implementation code that compiles natively and to WASM
- flowc - the `flowc` flow compiler binary: a CLI built around `flowrclib` that takes a number of command line arguments and source files or URLs and compiles the flow or library referenced. `flowrclib` is the library for compiling flow program and library definitions from TOML files, producing generated output projects that can be run by `flowrcli` or `flowrgui`.
- flowrlib - the flow runner library that loads and executes compiled flows
- flowr - the `flowr` flow runner binary that can be used to run and debug flows compiled with a flow compiler such as `flowc`
- flowrex - a minimal flow job executor, intended for use across the network associated with `flowrcli` or `flowrgui` (above)
- flowstdlib - the flow "standard library", which contains a set of functions that can be used by flows being defined by the user
- examples - a set of example flows that can be run
The Inspirations for 'flow'
I have had many sources of inspiration in this area over the past three decades.
Without realizing it, they started to coalesce in my head, and seemingly unrelated ideas from very different areas came together to form what I eventually called 'flow', and I started to work on it.
The impetus to actually implement something, instead of just thinking about it, came when I was looking for some "serious", more complex project in order to learn Rust (and later adding WebAssembly to the mix).
It should be noted that this project was undertaken in a very "personal" (i.e. idiosyncratic) way, without any formal background in the areas of functional programming, dataflow programming, communicating sequential processes or similar. When starting it, I wanted to see if any of my intuitions and ideas could work, ignoring previous efforts, established knowledge and previous projects. I didn't want to get "sucked in" to just re-implementing someone else's ideas.
I have done quite a bit of reading of papers in these areas after getting a reasonable version of flow working, and saw I was repeating a number of existing ideas and techniques... no surprise!
Specific inspirations from my career
I have worked with the technologies listed below over the decades (from University until now), and they all added something to the idea for flow in my head.
- The Inmos transputer chip and its Occam parallel programming language (which I studied at University in the '80s), without realizing that this was based on Hoare's CSP
  - Parallel programming language (although not based on data dependencies)
  - Parallel hardware (and software processes) that communicated by sending messages over connections (some virtual in software, others over hardware between chips)
- Structured Analysis and Design, from my work with it at HP in the '80s
  - Hierarchical functional decomposition
  - Encapsulation
  - Separation of Program from Context
- UNIX pipes
  - Separate processes, each responsible for limited functionality, communicating in a pipeline via messages (text)
- Trace scheduling for compiler instruction scheduling, based on data dependencies between instructions (operations): work done at Multiflow and later HP by Josh Fisher, Paolo Faraboschi and others
  - Exploiting inherent parallelism by identifying data dependencies between operations
- The Amoeba distributed OS by Andrew Tanenbaum, that made a collaborating network of computers appear as one to the user of a "workstation"
  - Distribution of tasks not requiring "IO"; abstraction of what a machine is and how a computer program can run
- The Yahoo! Pipes system for building "Web Mashups"
  - Visual assembly of a complex program from simpler processes by connecting them together with data flows
Non-Inspirations
There are a number of things that you might suspect were part of my set of inspirations for creating 'flow', or maybe you think I even copied the idea from them, but that in fact (you'll have to trust me on this one) is not true.
I didn't study Computer Science, and if I had I may well have been exposed to some of these subjects a long-time ago. That would probably have saved me a lot of time.
But, then I would have been implementing someone else's ideas and not (what I thought were) my own. Think of all the satisfaction I would have lost out on while re-inventing thirty to forty year-old ideas!
While implementing the first steps of 'flow' I started to see some materials come up in my Internet searches that looked like they could be part of a theory of the things I was struggling with. The main one would be Hoare's 1976 paper on the "Theory of Communicating Sequential Processes" (or CSP for short).
It turns out that work was the basis for some of my inspirations (e.g. the Inmos Transputer and Occam language), unbeknownst to me.
But I decided to deliberately ignore them as I worked out my first thoughts, did the initial implementation and got some simple examples up and running!
Later, I looped back and read some papers, and confirmed most of my conjectures.
I got a bit bored with the algebra approach to it (and related papers) though and didn't read or learn too much.
One Hoare paper refers more to a practical implementation, and does hit on a number of the very subjects I was struggling with, such as the buffering (or not) of data on "data flows" between functions (or processes in his terms).
Once I progress some more, I will probably go back and read more of these papers and books and find solutions to the problems I have struggled to work out on my own - but part of the purpose of this project for me is the intellectual challenge to work them out for myself, as best as I can.
Parallelism
Using flow, algorithms can be defined that exploit multiple types of parallelism:
- Data Parallelism
- Pipelining
- Divide and Conquer
Data Parallelism
Also known as "Single Program Multiple Data".
In this case the data is such that it can be segmented and worked on in parallel, using the same basic algorithm for each chunk of data.
An example would be some image processing or image generation task, such as generating the Mandelbrot set (see the mandlebrot example in `flowr/examples`).
The two-dimensional space is broken up into a 2D Array of pixels or points, and then they are streamed through a function or sub-flow that does some processing or calculation, producing a new 2D Array of output values.
Due to the data-independence between them, all of them can be calculated/processed in parallel, across many threads or even machines, exploiting the inherent parallelism of this "embarrassingly parallel" algorithm.
They need to be combined in some way to produce the meaningful output. This could be using an additional sub-flow to combine them (e.g. produce an average intensity or color of an image), that is not parallel, or it could be to render them as an image for the user.
In the case of producing a file or image for the user, functions from the flow runner's context functions can be used for that, leaving the flow itself totally parallel.
In a normal procedural language, an image would be rendered in memory in a 2D block of pixels and then written out to file sequentially, so that the pixels are placed in the correct order/location in the file.
In a flow program the same could be done, although accumulating the 2D array in memory may represent a bottleneck.
`flowrcli`'s image buffer context function is written such that it can accept pixels in any random order and render them correctly, by having the following inputs (a wiring sketch follows the list):
### Inputs
* `pixel` - the (x, y) coordinate of the pixel
* `value` - the (r, g, b) triplet to write to the pixel
* `size` - the (width, height) of the image buffer
* `filename` - the file name to persist the buffer to
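As an illustration only, wiring pixel data to such a buffer might look like the sketch below; the `context://` route and the upstream process name are hypothetical, not the exact paths used by `flowrcli`:

```toml
# hypothetical route - adjust to the actual context function provided by the runner
[[process]]
source = "context://image/image_buffer"

# "render" is a hypothetical upstream function producing pixel coordinates and values
[[connection]]
from = "render/pixel"
to = "image_buffer/pixel"

[[connection]]
from = "render/value"
to = "image_buffer/value"
```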
Map Reduce
Map-Reduce can be done similarly to the above, using a more complex initial step to form independent data "chunks" ("Mapping") that can be processed totally in parallel, and a combining phase ("Reducing") to produce the final output.
Pipelining
A flow program to implement pipeline processing of data is trivial, and there is a pipeline example in `flowr/examples`.
A series of processes (they can be functions or sub-flows) are defined. Input data flows to the first, whose output is sent to the second, and so on, and the output is rendered for the user.
When multiple data values are sent in short succession (additional values are sent before the first value has propagated out of the output), multiple of the processes can run in parallel, each one operating on a different data value, as there is no data or processing dependency between the data values.
If there are enough values (per unit time) to demand it, multiple instances of the same processes can be used to increase parallelism, doing the same operation multiple times in parallel on different data values.
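To illustrate the wiring only (this is a sketch, not the repo's pipeline example), a two-stage pipeline could be built by chaining two aliased add processes from flowstdlib; an Always initializer on one input of each turns it into an "add N" stage, and a stream of values arriving at stage1's "i2" input would keep both stages busy in parallel:

```toml
flow = "pipeline_sketch"

# stage 1: adds 1 to every value arriving on its "i2" input
[[process]]
alias = "stage1"
source = "lib://flowstdlib/math/add"
input.i1 = { always = 1 }

# stage 2: adds 10 to every value arriving on its "i2" input
[[process]]
alias = "stage2"
source = "lib://flowstdlib/math/add"
input.i1 = { always = 10 }

[[process]]
source = "context://stdio/stdout"

# the pipeline: stage1 --> stage2 --> stdout
[[connection]]
from = "stage1"
to = "stage2/i2"

[[connection]]
from = "stage2"
to = "stdout"
```

A data source (for example a generator function, or a context function that reads input) would be connected to "stage1/i2" to feed the pipeline.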
Divide and Conquer
Just as a large problem can be broken down into separate pieces and programmed separately in procedural programming, this can be done with flow.
A complex problem could be broken down into two (say) largely independent sub-problems. Each one can be programmed in different sub-flows, and fed different parts (or copies) of the input data. Then when both produce output they can be combined in some way for the user.
As there is no data dependency between the sub-flow outputs (intermediate values in the grander scheme of things) they can run totally in parallel.
If the two values just need to be output to the user, then each can proceed at its own pace (in parallel) and each one is output when complete. In this case the order of the values in the output to the user might vary, and appropriate labelling to understand them will be needed.
Depending on how the values need to be combined, or if a strict order in the output is required, a later ordering or combining step may be needed. This step will necessarily depend on both sub-flows' output values, thus introducing a data dependency, and this final step will operate without parallelism.
Provided the final (non-parallel) step is less compute intensive than the earlier steps, an overall gain can be made by dividing and conquering (and then combining).
Status
The semantics of flows, processes and connections, along with the implementation of the `flowc` compiler, the `flowrcli` runner, the context functions and the `flowstdlib` library, have allowed for the creation of a set of example flows that execute as expected.
There is pretty good overall test coverage (> 84%) that allows for safer refactoring.
The book is reasonably extensive but can always be improved. It probably needs "real users" (not the author) to try to use it, and flow, to make the next round of improvements. There are issues in the repo and the project related to improving the book.
I have added an experimental GUI for running flows in `flowrgui`, which uses the Rust Iced GUI toolkit.
First flow
Without knowing anything about flow and its detailed semantics, you might be able to guess what the flow below does when executed, and what its output to STDOUT will be.
It is a fibonacci series generator.
Understanding the flow
NOTE: You can find a complete description of flow semantics in the next section, Defining Flows.
Root flow
All flows start with a root "flow definition". Other sub-flows can be nested under the root, via references to separate flow description files, to enable encapsulation and flow reuse.
In this case it is the only one, and no hierarchy of flow descriptions is used or needed.
You can see the TOML root flow definition for this flow in the `flowrcli` crate's fibonacci example: root.toml
Interaction with the execution environment
The root flow defines what the interaction with the surrounding execution environment is, such as Stdout, or any other context function provided by the flow runtime being used (e.g. `flowrcli`).
The only interaction with the execution environment in this example is the use of `stdout` to print the numbers in the series to the Terminal.
Functions
Functions are stateless and pure, and just take a set of inputs (one value on each of their inputs) and produce an output.
When all the inputs of a function have a value, the function can run and produce an output - or produce no output, as in the case of the impure `stdout` function.
This flow uses two functions (shown as orange ovals):
- `stdout` from the context functions, as described above. `stdout` only has one, unnamed, default input and no outputs. It will print the value on the STDOUT of the process running the flow runner (`flowrcli`) that is executing the flow.
- the `add` function from the flow standard library `flowstdlib`, to add two integers together. `add` has two inputs "i1" and "i2" and produces the sum of them on the only, unnamed, "default" output.
Connections
Connections (the solid lines) take the output of a function when it has run, and send it to the inputs of connected functions. They can optionally have a name.
When a function has run, the input values used are made available again at the output.
In this case the following three connections exist:
- the "i2" input value is connected back to the "i1" input
- the output of "add" (the sum of "i1" and "i2") is connected back to the "i2" input. This connection has optionally been called "sum"
- the output of "add" (the sum of "i1" and "i2") is connected to the default input of "Stdout". This connection has optionally been called "sum"
Initializations
Inputs of processes (flows or functions) can be initialized with a value "Once" (at startup) or "Always" (each time it runs) using input initializers (dotted lines).
In this example, two input initializers are used to set up the series calculation (see the sketch below):
- a "Once" initializer with value "1" on the "i2" input of "add"
- a "Once" initializer with value "0" on the "i1" input of "add"
Running the flow
This flow exists as an example in the `flowr/examples/fibonacci` folder. See the root.toml root flow definition file.
You can run this flow and observe its output from the terminal, while in the flow project's root folder:
> cargo run -p flowc -- -C flowr/src/bin/flowrcli flowr/examples/fibonacci
`flowc` will compile the flow definition from the root flow definition file (`root.toml`), using the context functions offered by `flowrcli` (defined in the `flowr/src/bin/flowrcli/context` folder), to generate a compiled flow manifest, `manifest.json`, in the `flowr/examples/fibonacci` folder.
`flowc` then runs `flowrcli` to execute the flow.
`flowrcli` is a Command Line flow runner and provides implementations of context functions to read and write to stdio (e.g. `stdout`).
The flow will produce a fibonacci series printed to Stdout on the terminal.
> cargo run -p flowc -- -C flowr/src/bin/flowrcli flowr/examples/fibonacci
Compiling flowstdlib v0.6.0 (/Users/andrew/workspace/flow/flowstdlib)
Finished dev [unoptimized + debuginfo] target(s) in 1.75s
Running `target/debug/flowc flowr/examples/first`
1
2
3
5
8
...... lines deleted ......
2880067194370816120
4660046610375530309
7540113804746346429
Step-by-Step
Here we walk you through the execution of the previous "my first flow" (the fibonacci series example).
Compiled flows consist of only functions, so flow execution consists of executing functions, or more precisely, jobs formed from a set of inputs, and a reference to the function implementation.
Init
The flow manifest (which contains a list of Functions and their output connections) is loaded.
Any function input that has an input initializer on it, is initialized with the value provided in the initializer.
Any function that has either no inputs (only context functions are allowed to have no inputs, such as Stdin) or has a value on all of its inputs, is set to the ready state.
Execution Loop
The next function that is in the ready state (has all its input values available, and is not blocked from sending its output by other functions) has a job created from its input values and the job is dispatched to be run.
Executors wait for jobs to run, run them and then return the result, that may or may not contain an output value.
Any output value is sent to all functions connected to the output of the function that the job ran for. Sending an input value to a function may make that function ready to run.
The above is repeated until there are no more functions in the ready state, then execution has terminated and the flow ends.
Specific Sequence for this example
Below is a description of what happens in the flow runtime to execute the flow.
You can see log output (printed to STDOUT and mixed with the number series output) of what is happening, using the `-v, --verbosity <Verbosity Level>` command line option to `flowrcli`.
- Values accepted (from less to more output verbosity) are: `error` (the default), `warn`, `info`, `debug` and `trace`.
Init:
- The "i2" input of the "add" function is initialized with the value 1
- The "ii" input of the "add" function is initialized with the value 0
- The "add" function has a value on all of its inputs, so it is set to the ready state
- STDOUT does not have an input value available so it is not "ready"
Loop Starts
Ready = ["add"]
- "add" runs with Inputs = (0, 1) and produces output 1
- value 1 from output of "add" is sent to input "i2" of "add"
- "add" only has a value on one input, so is NOT ready
- value 1 from output of "add" is sent to default (only) input of "Stdout"
- "Stdout" has a value on all of its (one) inputs and so is marked "ready"
- input value "i2" (1) of the executed job is sent to input "i1" of "add"
- "add" now has a value on both its inputs and is marked "ready"
- value 1 from output of "add" is sent to input "i2" of "add"
Ready = ["Stdout", "add"]
- "Stdout" runs with Inputs = (1) and produces no output
- "Stdout" converts the
number
value to aString
and prints "1" on the STDOUT of the terminal - "Stdout" no longer has values on its inputs and is set to not ready
- "Stdout" converts the
Ready = ["add"]
- "add" runs with Inputs = (1, 1) and produces output 2
- value 2 from output of "add" is sent to input "i2" of "add"
- "add" only has a value on one input, so is NOT ready
- value 2 from output of "add" is sent to default (only) input of "Stdout"
- "Stdout" has a value on all of its (one) inputs and so is marked "ready"
- input value "i2" (1) of the executed job is sent to input "i1" of "add"
- "add" now has a value on both its inputs and is marked "ready"
- value 2 from output of "add" is sent to input "i2" of "add"
Ready = ["Stdout", "add"]
- "Stdout" runs with Inputs = (2) and produces no output
- "Stdout" converts the
number
value to aString
and prints "2" on the STDOUT of the terminal - "Stdout" no longer has values on its inputs and is set to not ready
- "Stdout" converts the
Ready = ["add"]
The above sequence proceeds, until eventually:
- the add function detects a numeric overflow in the add operation and outputs no value
  - No value is fed back to the "i1" input of "add", so "add" only has a value on one input and is NOT ready
  - No value is sent to the input of "Stdout", so "Stdout" no longer has values on its inputs and is set to not ready
Ready = []
No function is ready to run, so flow execution ends.
Resulting in a fibonacci series being output to Stdout
1
2
3
5
8
...... lines deleted ......
2880067194370816120
4660046610375530309
7540113804746346429
Debugging your first flow
Command line options to flowc
When running `flowc` using `cargo run -p flowc`, you should add `--` to mark the end of the options passed to cargo and the start of the options passed to `flowc`.
You can see what they are using `--help`, which produces output similar to this:
cargo run -p flowc -- --help
Finished dev [unoptimized + debuginfo] target(s) in 0.12s
Running 'target/debug/flowc --help'
flowc 0.8.8
USAGE:
flowc [FLAGS] [OPTIONS] [--] [ARGS]
FLAGS:
-d, --dump Dump the flow to .dump files after loading it
-z, --graphs Create .dot files for graph generation
-h, --help Prints help information
-p, --provided Provided function implementations should NOT be compiled from source
-s, --skip Skip execution of flow
-g, --symbols Generate debug symbols (like process names and full routes)
-V, --version Prints version information
OPTIONS:
-L, --libdir <LIB_DIR|BASE_URL>... Add a directory or base Url to the Library Search path
-o, --output <OUTPUT_DIR> Specify a non-default directory for generated output. Default is $HOME/.flow/lib/{lib_name} for a library.
-i, --stdin <STDIN_FILENAME> Read STDIN from the named file
-v, --verbosity <VERBOSITY_LEVEL> Set verbosity level for output (trace, debug, info, warn, error (default))
ARGS:
<FLOW> the name of the 'flow' definition file or library to compile
<flow_args>... Arguments that will get passed onto the flow if it is executed
Command line options to flowrcli
By default `flowc` uses `flowrcli` to run the flow once it has compiled it. It also defaults to passing the `-n/--native` flag to `flowrcli`, so that flows are executed using the native implementations of library functions.
In order to pass command line options on to `flowrcli`, you separate them from the options to `flowc` with another `--` separator.
`flowrcli` accepts the same `-v/--verbosity` options as `flowc`.
Getting debug output
If you want to follow what the run-time is doing in more detail, you can increase the verbosity level (the default level is ERROR) using the `-v/--verbosity` option.
So, if you want to walk through each and every step of the flow's execution, similar to the previous step-by-step section, you can do so by using `-v debug` and piping the output to `more` (as there is a lot of output!):
cargo run -p flowc -- flowr/examples/fibonacci -- -v debug | more
which should produce output similar to this:
INFO - 'flowr' version 0.8.8
INFO - 'flowrlib' version 0.8.8
DEBUG - Loading library 'context' from 'native'
INFO - Library 'context' loaded.
DEBUG - Loading library 'flowstdlib' from 'native'
INFO - Library 'flowstdlib' loaded.
INFO - Starting 4 executor threads
DEBUG - Loading flow manifest from 'file:///Users/andrew/workspace/flow/flowr/examples/fibonacci/manifest.json'
DEBUG - Loading libraries used by the flow
DEBUG - Resolving implementations
DEBUG - Setup 'FLOW_ARGS' with values = '["my-first-flow"]'
INFO - Maximum jobs dispatched in parallel limited to 8
DEBUG - Resetting stats and initializing all functions
DEBUG - Init: Initializing Function #0 '' in Flow #0
DEBUG - Input initialized with 'Number(0)'
DEBUG - Input initialized with 'Number(1)'
DEBUG - Init: Initializing Function #1 '' in Flow #0
DEBUG - Init: Creating any initial block entries that are needed
DEBUG - Init: Readying initial functions: inputs full and not blocked on output
DEBUG - Function #0 not blocked on output, so added to 'Ready' list
DEBUG - =========================== Starting flow execution =============================
DEBUG - Job #0:-------Creating for Function #0 '' ---------------------------
DEBUG - Job #0: Inputs: [[Number(0)], [Number(1)]]
DEBUG - Job #0: Sent for execution
DEBUG - Job #0: Outputs '{"i1":0,"i2":1,"sum":1}'
DEBUG - Function #0 sending '1' via output route '/sum' to Self:1
DEBUG - Function #0 sending '1' via output route '/sum' to Function #1:0
DEBUG - Function #1 not blocked on output, so added to 'Ready' list
DEBUG - Function #0 sending '1' via output route '/i2' to Self:0
DEBUG - Function #0, inputs full, but blocked on output. Added to blocked list
DEBUG - Job #1:-------Creating for Function #1 '' ---------------------------
DEBUG - Job #1: Inputs: [[Number(1)]]
DEBUG - Function #0 removed from 'blocked' list
DEBUG - Function #0 has inputs ready, so added to 'ready' list
DEBUG - Job #1: Sent for execution
DEBUG - Job #2:-------Creating for Function #0 '' ---------------------------
DEBUG - Job #2: Inputs: [[Number(1)], [Number(1)]]
1
DEBUG - Job #2: Sent for execution
DEBUG - Job #2: Outputs '{"i1":1,"i2":1,"sum":2}'
DEBUG - Function #0 sending '2' via output route '/sum' to Self:1
DEBUG - Function #0 sending '2' via output route '/sum' to Function #1:0
DEBUG - Function #1 not blocked on output, so added to 'Ready' list
DEBUG - Function #0 sending '1' via output route '/i2' to Self:0
DEBUG - Function #0, inputs full, but blocked on output. Added to blocked list
Defining Flows
We will describe the syntax of definition files, but also the run-time semantics of flows, functions, jobs, inputs etc., in order to understand how a flow will run when defined.
A flow is a static hierarchical grouping of functions that produce and consume data, connected via connections into a graph.
Root Flow
All flows have a root flow definition file.
The root flow can reference functions provided by the "flow runner" application that will execute the flow, for the purpose of interacting with the surrounding environment (such as file IO, standard IO, etc). These are the context functions.
The root flow (as any sub-flow can) may include references to sub-flows and functions, joined by connections between their inputs and outputs, and so on down in a hierarchy.
The root flow cannot have any input or output. As such, all data flows start or end in the root flow. What you might consider "outputs", such as printing to standard output, is done by describing a connection to a context function that interacts with the environment.
Flows in General
Any flow can contain references to functions it uses, plus zero or more references to nested flows via Process References, and so on down.
Data flows internally between sub-flows and functions (collectively known as "processes"), as defined by the connections.
All computation is done by functions. A flow is just a hierarchical organization method that allows grouping and abstracting groups of functions (and sub-flows) into higher level concepts. All data that flows originates in a function and terminates in a function.
Flow and sub-flow nesting is just an organizational technique to facilitate encapsulation and re-use of functionality, and does not affect program semantics.
Whether a certain process in a flow is implemented by one more complex function - or by a sub-flow combining multiple, simpler functions - should not affect the program semantics.
Valid Elements of a flow definition
Valid entries in a flow definition include:
- `flow` - a string naming this flow (obligatory)
- `docs` - an optional name of an associated markdown file that documents the flow
- `version` - a SemVer compatible version number for this flow (optional)
- `authors` - an array of strings of names and emails of the authors of the flow (optional)
- `input`|`output` - zero or more inputs/outputs of this flow, made available to any parent including it (note that the root flow may not contain any inputs or outputs). See IOs for more details.
- `process` - zero or more references to sub-processes to include under the current flow. A sub-process can be another `flow` or a `function`. See Process References for more details.
- `connection` - zero or more connections between the IOs of sub-processes and/or the IOs of this flow. See Connections for more details.
A skeleton flow definition using these entries is sketched below.
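As an illustration only (the names are placeholders and the library function is just an example), a flow definition using these entries might look like:

```toml
flow = "example"
docs = "example.md"
version = "0.1.0"
authors = ["Jane Doe <jane@example.com>"]

# an input this flow exposes to any parent flow that includes it
[[input]]
name = "value"

# an output this flow exposes to any parent flow that includes it
[[output]]
name = "result"

# a sub-process: the "add" function from the flow standard library
[[process]]
source = "lib://flowstdlib/math/add"

# connect this flow's input to the sub-process's "i1" input
[[connection]]
from = "input/value"
to = "add/i1"

# connect the sum to this flow's output
[[connection]]
from = "add"
to = "output/result"
```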
Complete Feature List
The complete list of features that can be used in the description of flows is:
- Flow definitions
  - Named inputs and outputs (except the root flow, which has no parent)
  - References to sub-processes to use them in the flow via connections
- Functions
  - Provided functions
  - Library functions
  - Context functions
- Sub-flows
  - Arbitrarily from the file system or the web
  - From a library
- Initializers for sub-process inputs and the flow outputs
  - Once initializers that initialize the input/output with a value just once at the start of flow execution
  - Always initializers that initialize the input/output every time it is emptied by the creation of a job that takes the value
- Use of aliases to refer to sub-processes with different names inside a flow, facilitating the use of the same function or flow multiple times for different purposes within the sub-flow
- Connections between outputs and inputs within a flow
  - Connections can be formed between the inputs of the flow or the outputs of one process (function or flow), and the outputs of the flow or the inputs of a process
  - Multiple connections from a source
  - Multiple connections to a destination
  - Connection to/from a default input/output by just referencing the process in the connection
  - Destructuring of an output struct in a connection to connect just a sub-part of it
  - Optional naming of a connection to facilitate debugging
- Function definitions
  - With just inputs
  - With just outputs
  - With inputs and outputs
  - Default single input/output, named single input/output, named multiple inputs/outputs
  - Author and versioning meta-data and references to the implementation
- Libraries of processes (functions and flows) can be built and described, and referenced in flows
Name
A string used to identify an element.
IO
IOs produce or consume data of a specific type, and are where data enters/leaves a flow or function (more generally referred to as "processes").
- `name` - used to identify an input or output in connections to/from it
- `type` (optional) - an optional data type for this IO
Default inputs and outputs
If a function only has one input or one output, then naming that input/output is optional. If not named, it is referred to as the default input. Connections may connect data to/from this input/output just by referencing the function.
Generic Inputs or Outputs
If an input or output has no specific data type specified, then it is considered generic and can take inputs of any type. What the function does, or what outputs it produces, may vary depending on the input type at runtime, and should be specified by the implementor of the function and understood by the flow programmer using it.
Example: a print function could accept any type and print out some human readable representation of all of them.
Example: an add function could be overloaded so that, if provided two numbers, it would sum them, but if provided two strings, it could concatenate them.
Process Reference
Flows may reference another flow or a function (generically referred to as a process) which is defined in a separate definition file. These are "process references".
Process Reference Fields
- `source` - a Url (or relative path) of a file/resource where the process is defined
For example, here we reference a process called `stdout` (see context functions):
[[process]]
source = "context://stdio/stdout"
This effectively brings the function into scope with the name `stdout`, and it can then be used in connections as a source or destination of data, as sketched below.
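For example, assuming an "add" process is also referenced in the same flow, a connection to stdout's default input could be written:

```toml
# send the default output of "add" to the default (only) input of "stdout"
[[connection]]
from = "add"
to = "stdout"
```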
Alias for a Process Reference
- `alias` - an alias to use to refer to a process in this flow
  - This can be different from the `name` defined by the process itself
  - This can be used to create two or more instances of a process in a flow, with the ability to refer to them separately and distinguish them in connections
For example, here the process called `add` is aliased as `sum`, and can then be referred to using `sum` in connections:
[[process]]
alias = "sum"
source = "lib://flowstdlib/math/add"
Source Url formats
The following formats for the `source` Url are available:
- No "scheme" in the URI --> `file:` is assumed. If the path starts with `/` then an absolute path is used. If the path does not start with `/` then the path is assumed to be relative to the location of the file referring to it.
- `file:` scheme --> look for the process definition file on the local file system
- `http:` or `https:` scheme --> look for the process definition file on the web
- `lib:` --> look for the process in a library that is loaded by the runtime. See flow libraries for more details on how this Url is used to find the process definition file provided by the library.
- `context:` --> a reference to a function in the context, provided by the runner application. See context functions for more details on how the process definition file is used.
File source
This is the case when no scheme, or the `file://` scheme, is used in the `source` Url.
The process definition file is in the same file system as the file referencing it.
- in the flow's directories, using relative file paths
  - e.g. `source = "my_function"`
  - e.g. `source = "my_flow"`
  - e.g. `source = "subdir/my_other_function"`
  - e.g. `source = "subdir/my_other_process"`
- in a different flow's directories, using relative file paths
  - e.g. `source = "../other_flow/other_function"`
  - e.g. `source = "../other_flow/other_flow"`
- elsewhere in the local file system, using absolute paths
  - e.g. `source = "/root/other_directory/other_function"`
  - e.g. `source = "/root/other_directory/other_flow"`
Web Source
When the `http` or `https` Url scheme is used for `source`, the process definition file is loaded via an http request to the specified location.
- e.g. `source = "http://my_flow_server.com/folder/function"`
- e.g. `source = "https://my_secure_flow_server.com/folder/flow"`
Initializing an input in a reference
Inputs of a referenced process may be initialized, in one of two ways:
- `once` - the value is inserted into the input just once on startup, and thereafter the input will remain empty if a value is not sent to it from a process
- `always` - the value will be inserted into the input each time after the process runs
Example: initializing the `add` function's `i1` and `i2` inputs to 0 and 1 respectively, just once at the start of the flow's execution:
[[process]]
source = "lib://flowstdlib/math/add"
input.i1 = { once = 0 }
input.i2 = { once = 1 }
Example: initializing the `add` function's `i1` input to 1 every time it runs. The other input is free to be used in connections, and this effectively makes this an "increment" function that adds one to any value sent to it on the `i2` input.
[[process]]
source = "lib://flowstdlib/math/add"
input.i1 = { always = 1 }
Initializing the default input
When a process only has one input, and it is not named, then you can refer to it by the name `default` for the purposes of specifying an initializer.
Example: initializing the sole input of the `stdout` context function with the string "Hello World!" just once at the start of flow execution:
[[process]]
source = "context://stdio/stdout"
input.default = {once = "Hello World!"}
Function Definitions
A function is defined in a definition file that should be alongside the function's implementation files (see later).
Function Definition Fields
- `function` - declares that this file defines a function, and gives the name of the function. This is required to link the definition with the implementation, and to allow the flow compiler to find the implementation of the function and include it in the generated project. The name must match exactly the name of the object implemented.
- `source` - the file name of the file implementing the function, relative to the location of the definition file
- `docs` - a markdown file documenting the function, relative to the location of the definition file
- `input` - zero (for impure) | one (for pure) or more inputs (as per IO)
- `output` - zero (for impure) | one (for pure) or more outputs (as per IO)
- `impure` - optional field to declare an impure function
A sketch of a definition file using these fields follows.
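As an illustration (the function name, file names and types are invented for this sketch), a pure function's definition file might look like:

```toml
# hypothetical definition file for a pure "multiply" function
function = "multiply"
source = "multiply.rs"
docs = "multiply.md"

[[input]]
name = "i1"
type = "number"

[[input]]
name = "i2"
type = "number"

[[output]]
type = "number"
```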
Types of Function Definitions
Functions may reside in one of three locations:
- A context function provided by a flow runner application, as part of a set of functions it provides to flows to allow them to interact with the environment, user etc. E.g. `readline` to read a line of text from STDIN.
- A library function provided by a flow library, that a flow can reference and then use to help define the overall flow functionality. E.g. `add` from the `flowstdlib` library, to add two numbers together.
- A provided function, where the function's definition and implementation are provided within the flow hierarchy. As such, they cannot be easily re-used by other flows.
Impure (or context) functions
An impure function is a function that is either just a source of data (e.g. stdin, which interacts with the execution environment to get the data and then outputs it) or just a sink of data (e.g. stdout, which takes an input and passes it to the execution environment, producing no output in the flow).
The output of an impure function is not deterministic based just on the inputs provided to it, but depends on the system or the user using it. It may have side-effects on the system, such as outputting a string or modifying a file.
In flow these are referred to as context functions, because they interact with (and are provided by) the execution context where the flow is run. For more details see context functions.
Impure functions should only be defined as part of a set of context functions, not as a function in a library nor as a provided function within a flow.
Impure functions should declare themselves impure in their definition file using the optional `impure` field.
Example: the `stdin` context function declares itself impure:
function = "stdin"
source = "stdin.rs"
docs = "stdin.md"
impure = true
...
Pure functions
Functions that are used within a flow (whether provided by the flow itself or from a library) must be pure (they must not depend on input other than the provided input values, nor have side-effects on the system) and have at least one input and one output.
- If a function had no input, there would be no way to send data to it and it would be useless
- If it had no output, then it would not be able to send data to other functions and would also be useless
Thus, such a pure function can be run anytime, anywhere, with the same input, and it will produce the same output.
Function execution
Functions are made available to run when a set of inputs is available on all of their inputs. Then a job is created containing one set of input values (a value taken from each of the function's inputs) and sent for execution. Execution may produce an output value which, using the connections defined, will be passed on to the connected inputs of one or more other functions in the function graph. That in turn may cause those functions to run, and so on and so forth, until no function can be found available to run.
Default inputs and outputs
If a function only has one input or one output, then naming that input/output is optional. If not named, it is referred to as the default input. Connections may connect data to/from this input/output just by referencing the function.
Types
By default flow supports the JSON types:
- `null`
- `boolean`
- `object`
- `array`
- `number`
- `string`
Connection
Connections connect a source of data (via an IO Reference) to a sink of data (via an IO Reference) of a compatible type within a flow. A sketch follows this list.
- `name` (optional) - an optional name for the connection. This can be used to help in debugging flows.
- `from` = IO Reference to the data source that this connection comes from
- `to` = IO Reference to a data sink that this connection goes to
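For illustration, using the fields above (the process names are assumed from the earlier examples):

```toml
# an optionally named connection from "add"'s default output to "stdout"
[[connection]]
name = "sum"
from = "add"
to = "stdout"
```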
Connections at multiple levels in the flow hierarchy
A flow is a hierarchy from the root flow down, including functions and sub-flows (collectively sub-processes).
Connections are defined within each flow or sub-flow from a source to a destination.
Within a flow sources include:
- an input of this flow
- an output from one of the sub-processes
and destinations include
- an input of one of the sub-processes
- an output of this flow
A connection may be defined with multiple destinations, and/or there may be multiple connections from one source or to one destination.
Connection "branching"
Within a sub-flow there may exist a connection to one of its outputs, as a destination. At the next level up in the flow hierarchy, that sub-flow output becomes a possible source for connections defined at that level.
Thus a single connection originating at a single source in the sub-flow may "branch" into multiple connections, reaching multiple destinations.
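A sketch of this branching, with hypothetical process and IO names:

```toml
# inside the sub-flow's definition file: a single connection to its output
[[connection]]
from = "compute"
to = "output/result"

# in the parent flow's definition file: the same data branches to two destinations
[[connection]]
from = "subflow/result"
to = "stdout"

[[connection]]
from = "subflow/result"
to = "add/i1"
```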
Connection feedback
It is possible to make a connection from a process's output back to one of its inputs. This is useful for looping, recursion, accumulation etc as described later in programming methods
Connection from input values
The input values used in an execution are made available at the output, alongside the output value calculated, when the function completes execution. Thus a connection can be formed from this input value, and the value is sent via connections when the function completes, similar to the output value. It is also possible to feed this input value back to the same or a different input, for use in recursion. An example of this can be seen in the fibonacci example flow definition.
# Loop back the input value #2 from this calculation, to be the input to input #1 on the next iteration
[[connection]]
from = "add/i2"
to = "add/i1"
Connection Gathering and Collapsing
When a flow is compiled, sources of data (function outputs) are followed through the layers of the sub-flow/super-flow definitions of the flow hierarchy, forming a "tree" of connections that are eventually connected (possibly branching to become multiple connections) to destination(s).
The chain of connections involved in connecting a source to each of the destinations is "collapsed" as part of the compilation process, to leave a single connection from the source to each of the destinations.
Connection Optimizing
Through flow re-use, some connections may end up not reaching any destination. The compiler optimizes these connections away by dropping them.
If, in the process of dropping dead connections, a function ends up not having any output and/or input (for "pure" functions), it may be removed, and an error or warning reported by the compiler.
IO References
An IO Reference uniquely identifies an Input/Data-source (flow/function) or an Output/Data-sink in the flow hierarchy.
If any flows or functions defined in other files are referenced with an alias, then it should be used in the IO references to inputs or outputs of that referenced flow/function.
Thus valid IO reference formats to use in connections are:
Data sources
- `input/{input_name}` (where `input` is a keyword, and thus a sub-flow cannot be named `input` or `output`)
- `{sub_process_name}/{output_name}`, or `{sub_process}` for the default output
Where `sub_process_name` is a `process` referenced in this flow, and may be a function or a sub-flow. The reference uses the process's name (if the process was not given an alias when referenced) or its alias.
Data sinks
- `output/{output_name}` (where `output` is a keyword, and thus a sub-flow cannot be named `input` or `output`)
- `{sub_process_name}/{input_name}`, or `{sub_process}` for the default input
A sketch using these formats follows.
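Illustrating these formats with hypothetical names, inside one flow definition:

```toml
# from an input of this flow to a sub-process's named input
[[connection]]
from = "input/value"
to = "add/i1"

# from a sub-process's default output to an output of this flow
[[connection]]
from = "add"
to = "output/result"
```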
Selecting parts of a connection's value
A connection can select to "connect" only part of the data values passed on the source of the connection. See below Selecting sub-structures of an output for more details.
Run-time Semantics
An input IO can be connected to multiple outputs, via multiple connections.
An output IO can be connected to multiple inputs on other flows or functions via multiple connections.
When data is produced on the output by a function, the data is copied to each destination function using all the connections that exist from that output.
Data can be buffered at each input of a function.
The order of data arrival at a function's input is the order of creation of the jobs executed by that function. However, that does not guarantee the order of completion of the jobs.
A function cannot run until data is available on all inputs.
Loops are permitted from an output to an input, and are used as a feature to achieve certain behaviours.
When a function runs it produces a result that can contain an output. The result also contains all the inputs used to produce any output. Thus input values can be reused by connecting from this "output input-value" in connections to other processes, or looped back to an input of the same function.
Example, the fibonacci example uses this to define recursion.
...
# Loop back the input value #2 from this calculation, to be the input to input #1 on the next iteration
[[connection]]
from = "add/i2"
to = "add/i1"
...
Type Match
For a connection to be valid and used in execution of a flow, the data source must be found, the data sink must be found and the two must be of compatible DataTypes.
If those conditions are not met, then the connection will be dropped (with an error message output), and the build and execution of the flow will be attempted without it.
By not specifying the data type on intermediary connections through the flow hierarchy, the flow author can create connections that are not constrained by the intermediate inputs/outputs used, and those types do not need to be known when the flow is being authored. In this case the type check will pass on the intermediate connections to those "generic" inputs or outputs.
However, once the connection chain is collapsed down to one end-to-end connection, the source and destination types must also pass the type check. This includes intermediate connections that may select part of the value.
Example
- Subflow 1 has a connection: a function `series` with default output Array/Number --> a generic output of the subflow
  - The destination of the connection is generic, and so the intermediate type check passes
- The root flow (which contains Subflow 1) has a connection: the generic output of the subflow --> the function `add`'s input `i1` (which has data type `Number`), including selection of an element of the array of numbers via `/1`
  - The source is generic, so the intermediate type check passes
- A connection chain is built from the `series` output, through the intermediate connection, to the `add` function's input `i1`
- The connection chain is collapsed to a connection from the array element at index 1 of the `series` function's output to the `add` function's input `i1`
- The `from` and `to` types of this collapsed connection are both `Number`, and so the type check passes
A sketch of this example in TOML follows.
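The two connections might be written like this (the file layout and names are illustrative):

```toml
# in subflow_1's definition file: connect "series" to a generic (untyped) output
[[output]]
name = "numbers"

[[connection]]
from = "series"
to = "output/numbers"
```

```toml
# in the root flow's definition file: select element 1 of the sub-flow's output
[[connection]]
from = "subflow_1/numbers/1"
to = "add/i1"
```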
Runtime type conversion of Compatible Types
The flow runtime library implements some type conversions during flow execution, permitting non-identical types on an output and an input to be connected by the compiler, knowing the runtime will handle the conversion.
These are known as compatible types. At the moment the following conversions are implemented, but more may be added over time:
Matching Types
- Type 'T' --> Type 'T'. No conversion required.
Generics
- Generic type --> any input. This assumes the input will check the type and handle appropriately.
- Array/Generic type --> any input. This assumes the input will check the type and handle appropriately.
- any output --> Generic type. This assumes the input will check the type and handle appropriately.
- any output --> Array/Generic type. This assumes the input will check the type and handle appropriately.
Array Deserialization
- Array/'T' --> 'T'. The runtime will "deserialize" the array and send its elements one-by-one to the input. NOTE that 'T' may be any type, including an Array, which is just a special case.
- Array/Array/'T' --> 'T'. The runtime will "deserialize" the array of arrays and send its elements one-by-one to the input.
Array Wrapping
- 'T' --> Array/'T'. The runtime will wrap the value in an array and send that one-element array to the input. Again, 'T' can be any type, including an Array.
- 'T' --> Array/Array/'T'. The runtime will wrap the value in an array within an array and send that one-element array of arrays to the input.
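Connections relying on these conversions look no different in a flow definition; the compiler accepts the compatible type difference and the runtime converts the values. A sketch, assuming a sequence process with an output of type array/number connected to an add input i1 of type number (names chosen for illustration):

# Array/Number --> Number: the runtime deserializes the array and sends its
# elements one-by-one to 'add/i1'
[[connection]]
from = "sequence"
to = "add/i1"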
Default input or output
If a function has only one input or one output, then naming that input/output is optional. If not named, it is referred to as the default input/output, and connections may connect data to/from it just by referencing the function.
For example, the stdout context function has only one input, and it is not named:
function = "stdout"
source = "stdout.rs"
docs = "stdout.md"
impure = true
[[input]]
and a connection to it can be defined thus:
[[connection]]
from = "add"
to = "stdout"
Named inputs
If an input is defined with a name, then connections to it should include the function name and the input name, to identify the input being used.
Example
[[connection]]
from = "add"
to = "add/i2"
Selecting an output
When a function runs it produces a set of outputs, producing data on zero or more of its outputs, all at once.
A connection can be formed from an output to another input by specifying the output's route as part of the IO Reference in the from field of the connection.
Example:
[[connection]]
from = "function_name/output_name"
to = "stdout"
Selecting sub-structures of an output
As described in types, flow supports Json data types. This includes two "container types": "object" (a Map) and "array".
If an output produces an object, a connection can be formed from an entry of the map (not the entire map) to a destination input. This allows (say) connecting a function that produces a Map of strings to another function that accepts a string. This is done by extending the route used in the IO Reference of the connection with the output name (to select the output) and the key of the map entry (to select just that map entry).
Example: a function called "function" has an output named "output" that produces a Map of strings, and one of those Map entries has the key "key". The string value associated with that key can then be used in a connection:
[[connection]]
from = "function/output/key"
to = "stdout"
Similarly, if the output is an array of values, a single element of the array can be selected in the connection using a numeric subscript.
Example: a function called "function" has an output named "output" that produces an array of strings. A single string from the array can then be sent to a destination input thus:
[[connection]]
from = "function/output/1"
to = "stdout"
If a function runs and produces an output that does not contain the sub-structure selected by a connection then, for the destination of that connection, it is just as if the output had not been produced, or the function had not run. No value will arrive at the destination function, and it will not run.
Connecting to multiple destinations
A single output can be connected to multiple destinations by creating multiple connections referencing the output.
But, to make it easier (less typing) to connect an output to multiple destinations, the [[connection]] format permits specifying more than one to = "destination".
Example
[[connection]]
from = "output"
to = ["destination", "destination2"]
Flow Libraries
Libraries can provide functions (definition and implementation) and flows (definition) that can be re-used by other flows.
An example is the flowstdlib library, but others can be created and shared by developers.
Library structure
A flow library's structure is up to the developer to determine, starting with a src subdirectory, with optional sub-directories for modules and sub-modules.
Native crate structure
In order to support native linking of the library, it must be a valid rust crate, and so must have a Cargo.toml file in the source that references a lib.rs file, which in turn references mod.rs files in sub-folders that reference the sources, so that everything is included in the crate when compiled.
Example
[lib]
name = "flowstdlib"
path = "src/lib.rs"
Parallel WASM crate structure - WASM library build speed-up
Each function (see below) contains its own Cargo.toml used to compile it to WASM. If left like this, each function would re-compile all of its source dependencies, even though many of those dependencies are shared across all the functions, making the library compile to WASM very slowly.
To speed up library builds, a solution ("hack") is used: a cargo workspace is defined in parallel with the Native crate mentioned above, with its root workspace Cargo.toml in the {lib_name}/src/ folder. This workspace includes as members references to all the Cargo.toml files of the functions (see below).
Thus, when any of them are compiled, they share a single target directory and the common dependencies are only compiled once.
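A rough sketch of what such a workspace Cargo.toml might contain (the module and function names here are hypothetical):

[workspace]
members = [
    "math/add",       # each member references a function's own crate
    "math/multiply",
]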
Including a flow
Flow definition files may reside at any level. For example, the sequence flow definition is in the math module of the flowstdlib library.
Alongside the flow definition, a documentation Markdown file (with .md extension) can be included. It should be referenced in the flow definition file using the docs field (e.g. docs = "sequence.md").
Including a function
Each function should have a subdirectory named after the function ({function_name}), which should include:
- Cargo.toml - build file for rust implementations
- {function_name}.toml - function definition file. It should include these fields:
  - type = "rust" - obligatory, and "rust" is the only type currently implemented
  - function = "{function_name}" - obligatory
  - source = "{function_name}.rs" - obligatory, and the file must exist
  - docs = "{function_name}.md" - optional documentation file that, if referenced, must exist
- {function_name}.md - if referenced in the function definition file then it will be used (copied to output)
- {function_name}.rs - referenced from the function definition file. Must be valid rust and implement the required traits
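Putting those fields together, a minimal function definition file might look like this sketch (the function name compare and its IOs are hypothetical):

type = "rust"
function = "compare"
source = "compare.rs"
docs = "compare.md"

[[input]]
name = "left"
type = "number"

[[input]]
name = "right"
type = "number"

[[output]]
name = "equal"
type = "boolean"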
Compiling a library
Flow libraries are compiled using the flowc flow compiler, specifying the library root directory as the source url.
This will compile and copy all required files from the library source directory into a library directory. This directory is then a self-contained, portable library.
It can be packaged, moved, unpackaged and used elsewhere, provided it can be found by the compiler and runtime (using either the default location $HOME/.flow/lib, the FLOW_LIB_PATH env var, or the -L, --libdir <LIB_DIR|BASE_URL> option).
The output directory structure will have the same structure as the library source (subdirs for modules) and will include:
- manifest.json - generated library manifest, in the root of the directory structure
- *.md - Markdown source files copied into the output directory, corresponding to the source directory
- *.toml - flow and function definition files copied into the output directory, corresponding to the source directory
- *.wasm - function WASM implementations compiled from the supplied function source and copied into the output directory, corresponding to the source directory
- *.dot - 'dot' (graphviz) format graph descriptions of any flows in the library source
- *.dot.svg - flow graphs rendered into SVG files from the corresponding 'dot' files. These can be referenced in doc files
Lib References
References to flows or functions are described in more detail in the process references section. Here we will focus on specifying the source for a process (flow or function) from a library using the "lib://" Url format.
The process reference to refer to a library provided flow or function is of the form:
lib://lib_name/path_to_flow_or_function
Breaking that down:
- "lib://" Url scheme identifies this reference as a reference to a library provided flow or function
- "lib_name" (the hostname of the Url) is the name of the library
- "path_to_flow_or_function" (the path of the Url) is the location withing the library where the flow or function resides.
Not specifying a location (a file with file:// or a web resource with http:// or https://) allows the system to load the actual library, with its definitions and implementations, from different places in different flow installations. Thus flows that use library functions are portable, provided the library is present and can be found wherever the flow is run.
The flowrlib runtime library by default looks for libraries in $HOME/.flow/lib, but can accept a "search path" of additional places to look (using the library's name "lib_name" from the Url).
Different flow runners (e.g. flowrcli or flowrgui or others) provide a command line option (-L) to add entries to the search path.
Default location
If the library you are referencing is in the default location ($HOME/.flow/lib
) then there is no need to
configure the library search path or provide additional entries to it at runtime.
Configuring the Library Search Path
The library search path is initialized from the contents of the $FLOW_LIB_PATH
environment variable.
This path may be augmented by supplying additional directories or URLs to search, using one or more instances of the -L command line option.
Finding the referenced lib process
The algorithm used to find files via process references is described in more detail in the process references section. An example of how a library function is found is shown below.
A process reference exists in a flow with source = "flowstdlib://math/add"
- Library name = flowstdlib
- Function path within the library = math/add
All the directories in the search path are searched for a top-level sub-directory that matches the library name.
If a directory matching the library name is found, the path to the process within the library is used to try and find the process definition file.
For example, if a flow references a process thus:
[[process]]
source = "flowstdlib://math/add"
Then the directory /Users/me/.flow/lib/flowstdlib is looked for.
If that directory is found, the process path within the library (math/add) is used to create the full path to the process definition file: /Users/me/.flow/lib/flowstdlib/math/add.
(refer to the full algorithm in process references)
If the file /Users/me/.flow/lib/flowstdlib/math/add.toml exists, it is parsed and made available to the flow for use in connections.
Context Functions
Each flow runner application can provide a set of functions (referred to as context functions) to flows for interacting with the execution environment.
They are identified by a flow defining a process reference that uses the context:// Url scheme (see process references for more details).
In order to compile a flow the compiler must be able to find the definition of the function.
In order to execute a flow the flow runner must either have an embedded implementation of the function or know how to load one.
Different runtimes may provide different functions, and thus it is not guaranteed that a function is present at runtime.
Completion of Functions
Normal "pure" functions can be executed any number of times as their output depends only on the inputs and the (unchanging) implementation. They can be run any time a set of inputs is available.
However, a context function may have a natural limit to the number of times it can be ran during the execution of a flow using it. An example would be a function that reads a line of text from a file. It can be ran as many times as there are lines of text in the file, then it will return End-Of-File and a flag to indicate to the flow runtime that it has "completed" should not be invoked again.
If this was not done, as the function has no inputs, it would always be available to run, and be executed indefinitely, just to return EOF each time.
For this reason, each time a function is run, it returns a "run me again" flag that the runtime uses to determine if it has "completed" or not. If it returns true, then the function is put into the "completed" state and it will never be run again (during that flow's execution)
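To make the mechanism concrete, here is a small self-contained Rust sketch. This is not the actual flowrlib API; the trait and names are invented for illustration of the "run me again" flag:

use std::io::{BufRead, BufReader, Read};

// Hypothetical stand-in for the runtime's function trait: run once,
// returning an optional output value plus a "run me again" flag.
trait RunnableFunction {
    fn run(&mut self) -> (Option<String>, bool);
}

struct LineReader<R: Read> {
    reader: BufReader<R>,
}

impl<R: Read> RunnableFunction for LineReader<R> {
    fn run(&mut self) -> (Option<String>, bool) {
        let mut line = String::new();
        match self.reader.read_line(&mut line) {
            // 0 bytes read means EOF: no output, and do NOT run again,
            // so the runtime moves this function to the "completed" state
            Ok(0) => (None, false),
            // A line was read: output it, and ask to be run again
            Ok(_) => (Some(line.trim_end().to_string()), true),
            // On error, also signal completion
            Err(_) => (None, false),
        }
    }
}

fn main() {
    let input = "first\nsecond\n";
    let mut reader = LineReader { reader: BufReader::new(input.as_bytes()) };
    loop {
        let (output, run_again) = reader.run();
        if let Some(line) = output {
            println!("{line}");
        }
        if !run_again {
            break; // function has "completed"
        }
    }
}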
Specifying the Context Root
At compile time the compiler must know which functions are available and their definitions.
Since it is the flow runner that provides the implementations and knows their definitions, it must make these discoverable and parseable by the compiler, as a set of function definition files.
This is done by specifying to the flowc compiler what is called the context root: the root folder where the targeted runtime's context functions reside.
Context Function Process References
A reference to a context function process (in this case it is always a function), such as STDOUT, is of the form:
[[process]]
source = "context://stdio/stdout"
The context:// Url scheme identifies it as a context function, whose definition should be sought below the Context Root. The rest of the Url specifies the location under the Context Root directory (once found).
Example
The flow project directory structure is used in this example, with flow located at /Users/me/flow and flow in the user's $PATH.
The fibonacci example flow is thus found in the /Users/me/flow/flowr/examples/fibonacci directory.
The flowrcli flow runner directory is at /Users/me/flow/flowr/src/bin/flowrcli.
Within that folder, flowrcli provides a set of context function definitions for a Command Line Interface (CLI) implementation in the context sub-directory.
From the root directory of the flow project, using relative paths, an example flow can be compiled and run using the -C, --context_root <CONTEXT_DIRECTORY> option to flowc:
> flowc -C flowr/src/bin/flowrcli flowr/examples/fibonacci
The flowc compiler sees the "context://stdio/stdout" reference. It has been told that the Context Root is at flowr/src/bin/flowrcli/context, so it searches for (and finds) a function definition file at flowr/src/bin/flowrcli/context/stdio/stdout/stdout.toml, using the algorithm described in process references.
Provided Functions
As described previously, flows can use functions provided by the flow runner app (e.g. flowrcli) and by flow libraries.
However, a flow can also provide its own functions (a definition, for the compiler, and an implementation, for the runtime).
The process references section describes the algorithm for finding the function's files (definition and implementation) using relative paths within a flow file hierarchy.
Using relative paths means that flows are "encapsulated" and portable (by location): they can be moved between directories, file systems and systems/nodes, and the relative locations of the provided functions allow them to still be found, and the flow compiled and run.
Examples of Provided Functions
The flowr crate has two examples that provide functions as part of the flow:
- Reverse Echo, in the folder flowr/examples/reverse-echo - a simple example that provides a function to reverse a string
- Mandlebrot, in the folder flowr/examples/mandlebrot - provides two functions:
  - pixel_to_point to do conversions from pixels to points in 2D imaginary coordinate space
  - escapes to calculate the value of a point in the mandlebrot set
What a provided function has to provide
In order to provide a function as part of a flow the developer must provide:
Function definition file
Definition of the function in a TOML file. Example: escapes.toml
The same as any other function definition, it must define:
- function - field to show this is a function definition file, and to provide the function's name
- source - the name of the implementation file (relative path to this file)
- type - to define what type of implementation is provided ("rust" is the only supported value at this time)
- input - the function's inputs, as described in IOs
- output - the function's outputs, as described in IOs
- docs - documentation markdown file (relative path). Example: escapes.md
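A rough sketch of such a definition file (the exact inputs and outputs of the real escapes.toml may differ; these IO names are illustrative):

type = "rust"
function = "escapes"
source = "escapes.rs"
docs = "escapes.md"

[[input]]
name = "point"

[[input]]
name = "limit"
type = "number"

[[output]]
type = "number"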
Implementation
Code that implements the function, of the type specified by type, in the file specified by source. Example: escapes.rs
This may optionally include tests, that will be compiled and run natively.
Build file
In the case of the rust type (the only type implemented!), a Cargo.toml file that is used to compile the function's implementation to WASM as a stand-alone project.
How are provided function implementations loaded and run
If the flow runner app (using the flowrlib library) is statically linked, how can it load and then run the provided implementation?
This is done by compiling the provided implementation to WebAssembly, using the provided build file. The .wasm byte code file is generated when the flow is compiled, and then loaded when the flow is loaded by flowrlib.
Programming Methods
flow provides the following facilities to help programmers create flows:
Encapsulation
Functionality can be encapsulated within a function at the lowest level by implementing it in code, defining the function via a function definition file with its inputs and outputs, and describing the functionality it provides in an associated markdown file.
Sets of functions, combined to provide some defined functionality, can be grouped together and connected in a graph in a flow, described in a flow definition file. This flow's functionality can be defined via its inputs and outputs, just like a function, and its functionality described in an associated markdown file.
Flow definitions in turn can reference and incorporate other flows, alongside functions, until the desired functionality is reached.
Thus functionality is encapsulated via a "process" definition file, where a "process" can be either a function or a flow.
The mechanism to reference a process in a flow definition file is common to both types; in fact the referencing flow does not "know" whether the process referenced is implemented as a function or a flow. At a later date the sub-process could be changed from a function to a flow (or vice versa) with no semantic difference, no required change to the program, and no impact on its execution result.
Semantics of Output sending
When a job executes and its results are received by the runtime, the output values (if any) are sent on to the destination functions at the same time, before any other job's results are processed, and before creating or dispatching any new jobs.
The outputs of a function's jobs are all handled, and sent to their destinations, at the same time.
Value deserialization
If the output of a job's function is (say) an array of numbers (array/number) and it is connected in the flow graph to another function whose input is of type number, then that array may be deserialized into a stream of numbers and sent to the destination one after another (all while the job result is being processed).
This can mean that the destination function's input rapidly gathers a number of values able to be used in job creation.
The values are sent in the order of their appearance in the "higher order structure" (array) that contains them.
Value wrapping
Conversely, if the output value is of lower order than the destination (say a number being sent to an input that accepts array/number) then the runtime may "wrap" the single value in an array and send it to the destination.
Job Creation
Jobs are created by gathering a set of input values from a function's inputs. The job is put into the ready_jobs queue with the values and a reference to the function's implementation.
The order of values at a function's inputs is the order in which the values were sent to those inputs. The order of jobs created respects this order, so the order of job creation for a function follows the order of values sent to that function.
When creating jobs, a runtime may decide to create as many jobs as can be created, to increase the potential for parallel execution later.
Thus, for a stream of deserialized values at a function's input, the runtime may attempt to maximize parallelization and create as many jobs as there are sets of input values it can take. The order of the jobs created will follow the order of the deserialized stream.
Job Dispatch
Different jobs for the same function are independent of each other. They will be dispatched in the order of job creation (which follows the order of input value arrival).
When dispatching jobs, a runtime can decide to dispatch as many jobs as possible, or to limit the number, in order to increase the potential for parallel execution of the jobs later.
Thus, many jobs may be created from a deserialized stream, but the order of the jobs will follow the order of the deserialized stream.
Job Completion Order and Determinism
Jobs may be executed by the same or a different executor, on the same or a different machine, with the same or a different CPU architecture, with jobs being sent and results being received back over the network.
Thus, the order of job completion is not guaranteed to match the order of job creation.
In the deserialized stream case, this is where the order may be lost. Algorithms that exploit this parallelism in execution, but that need to preserve the order of the stream for some reason, may have to handle and preserve the order themselves (e.g. adding an index and later combining results using that index).
The order of a flow or sub-flow's output is determined by the data dependencies of the flow expressed in the graph.
Examples of ways to create determinism are:
- the fibonacci example's use of a feedback connection, so that one value is used in the calculation of the next value, thus guaranteeing the order of the series.
- the sequence example's use of a "data flow control" function (join) to ensure that a string is not sent to the stdout function until a specific condition (end-of-sequence) is met:

# Output a string to show we're done when the Sequence ends
[[process]]
source = "lib://flowstdlib/control/join"
input.data = {once = "Sequence done"}
In imperative, procedural programming we often either assume, or can rely on, order, such as the order of execution of statements within a for loop. But with flow and its focus on concurrency this is much less so. A series of jobs (similar to the for loop example) may calculate a number of values, but they may all be generated at once (or soon after each other) and executed in parallel, with the calculations completing out of order.
Also, in flow libraries such as flowstdlib, some functions are written differently from what you might expect: they don't assume order, and the results may differ from what you expect. This is reflected in the naming of functions also; sequence is named carefully to communicate that the values are generated in a specific order.
The range function does not guarantee order, only that all the numbers in the range will be output.
Thus it may generate the numbers in the range out of order, unlike what one would expect from a procedural language.
Re-use
flow
provides a number of mechanisms to help re-use, namely:
- definition and implementation of a function once, so it can later be incorporated into any number of flows via a process reference
- definition of a flow, combining sub-flows and/or functions, so it can later be incorporated into any number of flows via a process reference
- definition of portable libraries, containing flows and/or functions, that can be shared between programmers and incorporated into any number of flows via process references
Connection "branching"
As described in more detail in connections, a connection within a re-used flow to one of its outputs can be "branched" into multiple connections to multiple destinations when the flow is compiled, without altering the definition of the original flow.
Control flow via Data flow
In flow
, everything is dataflow, and dataflow is everything. There are no other mechanisms to produce values,
or coordinate activity. There are no loops, if-then-else or other logic control mechanisms.
The flowstdlib library provides the control module, with a series of functions and flows that you can use to control the flow of data, and hence program "control". Some of these are described below.
Looping
Looping is not a natural construct in flow. Looking at how we would translate some uses of loops from a procedural language to flow may illustrate things.
For example, to perform an action or calculation 'n' times, we might well generate a range of 'n' values, create a process that does the desired action or calculation, and then combine the two with a 'data flow control' function such as join. Thus, the action/calculation can only produce an output for use downstream 'n' times, triggered (possibly all in parallel) by the 'n' values that "gate" its output.
Accumulating
In procedural programming a loop can be used to accumulate a value (such as the total of the values in an array).
In flow there is no global state, and no variables that persist for a function across multiple invocations of it.
The mechanism we use in flow is the add function: initializing one input Once with zero, sending values to the other input, and looping the output (the partial sum) back to the first input, so that the sum (initialized to zero) is accumulated as values flow through it. A sketch of this is shown below.
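A minimal sketch of such an accumulator in a flow definition (the input names i1 and i2 follow the add examples shown earlier in this book; the rest is illustrative):

[[process]]
source = "lib://flowstdlib/math/add"
input.i1 = { once = 0 }   # initialize the partial sum to zero, once

# Feed the output (the partial sum) back to input i1, so each new value
# arriving at i2 is added to the running total
[[connection]]
from = "add"
to = "add/i1"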
The same technique can be used to gather values into "chunks" of a determined size. One input of accumulate is initialized with an empty array ([]), the other input receives the elements to gather, and we feed back the array of elements gathered so far, and so on, until the desired size of chunk is accumulated.
Nested Loops
What would be a nested for loop in a procedural program can be implemented by putting two flows in series, with one feeding the other.
For example, in the sequence-of-sequences example a first instance of a sequence flow generates a series of "limits" for the sequence of sequences to count up to.
A value for the start of each sequence, and the series of sequence limits, is fed into another instance of the sequence flow. This second flow generates a sequence each time it receives a set of inputs specifying the start and end of the sequence (see the sketch after the list below).
- a first sequence is defined with start=1, end=10, step=1, and hence generates: 1..10
- a second sequence is defined
  - the start input is always initialized to 0
  - the step input is always initialized to 1
  - a connection is defined from the output of the first sequence to the end input of the second sequence
- thus it generates 0,1,0,1,2,0,1,2,3 ending 0,1,2,3,4,5,6,7,8,9,10
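A sketch of how the two instances might be wired together. The alias field (to distinguish the two instances), the always initializer, and the input names start/end/step are assumptions made for this illustration:

[[process]]
alias = "limits"
source = "lib://flowstdlib/math/sequence"
input.start = { once = 1 }
input.end = { once = 10 }
input.step = { once = 1 }

[[process]]
alias = "inner"
source = "lib://flowstdlib/math/sequence"
input.start = { always = 0 }
input.step = { always = 1 }

# Each limit produced by 'limits' becomes the 'end' of a new inner sequence
[[connection]]
from = "limits"
to = "inner/end"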
Wrapping processes for convenience
Another mechanism used for convenience (it may abbreviate written flows) is to have a simple flow wrap a function or process for a common use case, perhaps initializing an input with a pre-defined value, or creating feedback loops around the process to create a specific behaviour.
flowrcli Context Functions
Each flow runner application can provide a set of functions (referred to as context functions) to flows for interacting with the execution environment.
flowrcli is a command-line oriented flow runner, and it provides a set of context functions to interact with the file system and standard input/output.
Args (//context/args)
Functions to handle run-time arguments, command line arguments from invocation, etc
- get - get the arguments the flow was invoked with
Args (//context/args/get)
Get the arguments the flow was executed with
Include using
[[process]]
source = "context://args/get"
Inputs
Output
- string - Array of Strings of the command line arguments the flow was invoked with.
- json - Array of Json parsed values of the command line arguments the flow was invoked with.
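As a usage sketch (process names here follow the last path segment of the includes shown in this section), the arguments could be printed by connecting the string output of get to stdout:

[[process]]
source = "context://args/get"

[[process]]
source = "context://stdio/stdout"

[[connection]]
from = "get/string"
to = "stdout"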
File (//context/file)
Functions to interact with the Environment, related to file input and output.
Write (//context/file/file_write)
Writes the bytes of data supplied to the file named filename, creating the file if necessary.
Include using
[[process]]
source = "context://file/file_write"
Inputs
- bytes - the data to be written to the file
- filename - String with the name of the file to be written, absolute or relative to the current working directory of the process invoking the flow.
Outputs
Read (//context/file/file_read)
Reads bytes of data from the file with path path.
Include using
[[process]]
source = "context://file/file_read"
Inputs
- path - String with the path of the file to be read, absolute (starting with /) or relative to the current working directory of the process invoking the flow.
Outputs
- bytes - the raw data read from the file
- string - the data read from the file, as a string
- path - String with the path of the file that was read, as passed to the input.
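As a usage sketch (a minimal flow; the file name data.txt is hypothetical, and process names follow the includes above), a file's contents can be read and printed to standard output:

[[process]]
source = "context://file/file_read"
input.path = { once = "data.txt" }

[[process]]
source = "context://stdio/stdout"

[[connection]]
from = "file_read/string"
to = "stdout"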
Image (//context/image)
Functions to write to Images
ImageBuffer (//context/image/image_buffer)
Write pixels to an image buffer.
Include using
[[process]]
source = "context://image/image_buffer"
Inputs
- pixel - the (x, y) of the pixel
- value - the (r, g, b) triplet to write to the pixel
- size - the (width, height) of the image buffer
- filename - the file name to persist the buffer to
Stdio (//context/stdio)
Functions to interact with the Environment, related to standard input and output (and error).
Values read by these functions are read from the standard input of the process that launched the flow, causing the function to block until input (or EOF) is detected. Output is printed on the STDOUT/STDERR of the process invoking the flow.
Readline (//context/stdio/readline)
Read a line of text from the STDIN of the process invoking the flow. The line is terminated by EOL but leading and trailing whitespace are trimmed before being output.
The function will be scheduled for running again, until EOF is detected, after which it will not run again.
Include using
[[process]]
source = "context://stdio/readline"
Inputs
- prompt - String prompt, or "" (empty string) for none.
Output
- text - Line of text read from STDIN - with leading and trailing whitespace trimmed.
- json - Json value parsed from STDIN
Stdin (//context/stdio/stdin)
Read text from the STDIN of the process invoking the flow until EOF is detected, after which it will not run again. If you wish to get the value of each line (i.e. after ENTER is pressed), then use readline.
Include using
[[process]]
source = "context://stdio/stdin"
Inputs
Output
- text - Text read from STDIN - with leading and trailing whitespace (including EOF) trimmed.
- json - Json value parsed from STDIN
Stdout (//context/stdio/stdout)
Output text to the STDOUT of the process invoking the flow. If an array is passed then each element is output on a separate line.
Include using
[[process]]
source = "context://stdio/stdout"
Input
- (default) - the object to output a String representation of (String, boolean, Number, array)
Output
Stderr (//context/stdio/stderr)
Output text to the STDERR of the process invoking the flow. If an array is passed then each element is output on a separate line.
Include using
[[process]]
source = "context://stdio/stderr"
Input
- (default) - the object to output a String representation of (string, boolean, number, array)
Output
flowrgui's Context Functions
Each flow runner can provide a set of functions (referred to as context functions) to flows for interacting with the execution environment.
flowrgui is a GUI flow runner, and it provides a set of context functions to interact with the file system and standard input/output.
Args (//context/args)
Functions to handle run-time arguments, command line arguments from invocation, etc
- get - get the arguments the flow was invoked with
Args (//context/args/get)
Get the arguments the flow was executed with
Include using
[[process]]
source = "context://args/get"
Inputs
Output
- string - Array of Strings of the command line arguments the flow was invoked with.
- json - Array of Json parsed values of the command line arguments the flow was invoked with.
File (//context/file)
Functions to interact with the Environment, related to file input and output.
Write (//context/file/file_write)
Writes the bytes of data supplied to the file named filename, creating the file if necessary.
Include using
[[process]]
source = "context://file/file_write"
Inputs
- bytes - the data to be written to the file
- filename - String with the name of the file to be written, absolute or relative to the current working directory of the process invoking the flow.
Outputs
Read (//context/file/file_read)
Reads bytes of data from the file with path path.
Include using
[[process]]
source = "context://file/file_read"
Inputs
- path - String with the path of the file to be read, absolute (starting with /) or relative to the current working directory of the process invoking the flow.
Outputs
- bytes - the raw data read from the file
- string - the data read from the file, as a string
- path - String with the path of the file that was read, as passed to the input.
Image (//context/image)
Functions to write to Images
ImageBuffer (//context/image/image_buffer)
Write pixels to an image buffer.
Include using
[[process]]
source = "context://image/image_buffer"
Inputs
- pixel - the (x, y) of the pixel
- value - the (r, g, b) triplet to write to the pixel
- size - the (width, height) of the image buffer
- filename - the file name to persist the buffer to
Stdio (//context/stdio)
Functions to interact with the Environment, related to standard input and output (and error).
Values read by these functions are read from the standard input of the process that launched the flow, causing the function to block until input (or EOF) is detected. Output is printed on the STDOUT/STDERR of the process invoking the flow.
Readline (//context/stdio/readline)
Read a line of text from the STDIN of the process invoking the flow. The line is terminated by EOL but leading and trailing whitespace are trimmed before being output.
The function will be scheduled for running again, until EOF is detected, after which it will not run again.
Include using
[[process]]
source = "context://stdio/readline"
Inputs
- prompt - String prompt, or "" (empty string) for none.
Output
- text - Line of text read from STDIN - with leading and trailing whitespace trimmed.
- json - Json value parsed from STDIN
Stdin (//context/stdio/stdin)
Read text from the STDIN of the process invoking the flow until EOF is detected, after which it will not run again. If you wish to get the value of each line (i.e. after ENTER is pressed), then use readline.
Include using
[[process]]
source = "context://stdio/stdin"
Inputs
Output
- text - Text read from STDIN - with leading and trailing whitespace (including EOF) trimmed.
- json - Json value parsed from STDIN
Stdout (//context/stdio/stdout)
Output text to the STDOUT of the process invoking the flow. If an array is passed then each element is output on a separate line.
Include using
[[process]]
source = "context://stdio/stdout"
Input
- (default) - the object to output a String representation of (String, boolean, Number, array)
Output
Stderr (//context/stdio/stderr)
Output text to the STDERR of the process invoking the flow. If an array is passed then each element is output on a separate line.
Include using
[[process]]
source = "context://stdio/stderr"
Input
- (default) - the object to output a String representation of (string, boolean, number, array)
Output
Running Flows
In order to run a flow it must first be compiled. Then a "flow runner" (such as flowrcli) can be used to run the compiled flow manifest.
For convenience, flowc, the flow compiler, compiles the flow and then uses flowrcli to run it for you (unless you specify otherwise). That is the easiest way to run a flow, and is the method used below.
If you have run make or make install_flow then you will have flowc and flowrcli installed on your system. Be sure they are in your $PATH so that they can be invoked directly.
Then you can run flows easily using flowc.
If you do not wish to install flowc, then you can run it using cargo from the root of the project directory, by substituting cargo run -p flowc -- for flowc in the examples below.
Your terminal's Current Working Directory should be the root directory of the flow project.
Compiling flowstdlib if you have used cargo install to install flowstdlib
If you have not compiled the project from source using make, then it is probable that flowstdlib has not been compiled to WASM. However, there should be a flowstdlib binary on your system. This should be run, passing it the path to the flowstdlib source folder (the root, not src inside it), in order to compile it.
This will take considerable time, and leave the compiled WASM files in $HOME/.flow/lib/flowstdlib.
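For example (the source path here is an assumption; use wherever the flowstdlib source resides on your system):

> flowstdlib ~/workspace/flow/flowstdlib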
Finding Libraries
In order for flowc and flowrcli to be able to find library functions, the default directory where flowstdlib is built ($HOME/.flow/lib) is searched.
Directories to add to the library search path, to help find other libraries used, can be passed to flowc via one or more instances of the -L, --libdir <LIB_DIR|BASE_URL> option (see below for an example).
Full List of flowc Options
See the next section flowc for a description of the command line arguments it accepts.
Example Invocations
- flowc -C flowr/src/bin/flowrcli flowr/examples/fibonacci
  - uses the context_functions provided by flowrcli and runs the flow whose root flow is defined in ./flowr/examples/fibonacci/root.toml. No arguments are passed to the flow.
  - You should get a fibonacci series output to the terminal.
- echo "Hello" | flowc -C flowr/src/bin/flowrcli flowr/examples/reverse-echo
  - This example reads from STDIN, so we echo in some text.
  - You may see some output like:

Testing /Users/andrew/workspace/flow/flowr/examples/reverse-echo/reverse/Cargo.toml WASM Project
Compiling /Users/andrew/workspace/flow/flowr/examples/reverse-echo/reverse/Cargo.toml WASM project

  the first time this example is run, as the provided function is tested and compiled to WASM, followed by olleH, which is the input string "Hello" reversed.
- flowc -C flowr/src/bin/flowrcli flowr/examples/fibonacci
  - You should get a fibonacci series output to the terminal.
- flowc -C flowr/src/bin/flowrcli flowr/examples/sequence 10
  - as the previous examples, except that after the source_url a flow_argument of "10" is passed in.
  - A short sequence of numbers (2, 5, 8) and a string will be printed. The "10" represents the maximum of the sequence.
Running a flow from the web
As stated, the source_url can be a Url to a web resource: a flow definition hosted on a web server.
Example running a flow from the web
We can use a flow that is part of the flow project, whose flow definition is hosted on the web by GitHub:
flowc -C flowr/src/bin/flowrcli "https://raw.githubusercontent.com/andrewdavidmackenzie/flow/master/flowcore/tests/test-flows/hello-world/root.toml"
That will pull the flow definition content from the web, compile it and run it, producing the expected output:
Hello World!
flowc Command Line Arguments
flowc is the flow "compiler", although compiling a flow is very different from a procedural language compile. What it and other components do is described in more detail later in the Internals section.
This section describes command line arguments that can be supplied to flowc
and what they are useful for.
Getting help
Use -h, --help (e.g. flowc -h or cargo run -p flowc -- -h) to print out help for the usage of flowc.
This will print something like this:
Usage: flowc [OPTIONS] [source_url] [flow_args]...
Arguments:
[source_url] path or url for the flow or library to compile
[flow_args]... List of arguments get passed to the flow when executed
Options:
-d, --debug
Generate symbols for debugging. If executing the flow, do so with the debugger
-c, --compile
Compile the flow and implementations, but do not execute
-C, --context_root <CONTEXT_DIRECTORY>
Set the directory to use as the root dir for context function definitions
-n, --native
Compile only native (not wasm) implementations when compiling a library
-L, --libdir <LIB_DIR|BASE_URL>
Add a directory or base Url to the Library Search path
-t, --tables
Write flow and compiler tables to .dump and .dot files
-g, --graphs
Create .dot files for graphs then generate SVGs with 'dot' command (if available)
-m, --metrics
Show flow execution metrics when execution ends
-w, --wasm
Use wasm library implementations when executing flow
-O, --optimize
Optimize generated output (flows and wasm)
-p, --provided
Provided function implementations should NOT be compiled from source
-o, --output <OUTPUT_DIR>
Specify a non-default directory for generated output. Default is $HOME/.flow/lib/{lib_name} for a library.
-v, --verbosity <VERBOSITY_LEVEL>
Set verbosity level for output (trace, debug, info, warn, error (default))
-i, --stdin <STDIN_FILENAME>
Read STDIN from the named file
-h, --help
Print help information
-V, --version
Print version information
Options
- -d, --debug - Generate symbols for debugging. If executing the flow, do so with the debugger
- -c, --compile - Compile the flow and implementations, but do not execute
- -C, --context_root <CONTEXT_DIRECTORY> - Set the directory to use as the root dir for context function definitions
- -n, --native - Compile only native (not wasm) implementations when compiling a library
- -L, --libdir <LIB_DIR|BASE_URL> - Add a directory or base Url to the Library Search path
- -t, --tables - Write flow and compiler tables to .dump and .dot files
- -g, --graphs - Create .dot files for graphs then generate SVGs with 'dot' command (if available)
- -m, --metrics - Show flow execution metrics when execution ends
- -w, --wasm - Use wasm library implementations (not any statically linked native implementations) when executing flow
- -O, --optimize - Optimize generated output (flows and wasm)
- -p, --provided - Provided function implementations should NOT be compiled from source
- -o, --output <OUTPUT_DIR> - Specify a non-default directory for generated output. Default is $HOME/.flow/lib/{lib_name} for a library.
- -v, --verbosity <VERBOSITY_LEVEL> - Set verbosity level for output (trace, debug, info, warn, error (default))
- -i, --stdin <STDIN_FILENAME> - Read STDIN from the named file
- -h, --help - Print help information
- -V, --version - Print version information
source_url
After the Options you can supply an optional field for where to load the root flow from. This can be a relative or absolute path when no Url scheme is used, an absolute path if the file:// scheme is used, or a web resource if either the http or https scheme is used.
- If no argument is supplied, it assumes the current directory as the argument, and continues as below
- If it's a directory then it attempts to load "root.toml" from within the directory
- If it's a file then it attempts to load the root flow from that file
flow_args
If a flow directory or filename is supplied for source_url, then any arguments after that are assumed to be arguments for the flow itself. When it starts executing, the flow can retrieve the value of these arguments using context functions.
Passing Command Line Arguments
Arguments are passed to the flow being executed by flowc by placing them after the flow name in the execution string (either using cargo run -p flowc -- or flowc directly).
e.g. cargo run -p flowc -- flowr/examples/mandlebrot mandel.png 4000x3000 -1.20,0.35 -1,0.20
The context functions include a function called args/get that can be used to read the arguments, allowing them to then be processed in the flow like any other inputs.
Include the args/get function in your flow:
[[process]]
source = "lib://flkowr/args/get"
Then create a connection from the desired output (the second arg in this example) of args/get to another function:
[[connection]]
from = "function/get/2"
to = "function/parse_bounds/input"
Specifying the flow's root file to load
Supported File Extensions and Formats
flowc supports TOML, JSON and YAML file formats. It assumes these file extensions: ".toml", ".yaml"/".yml" or ".json".
Flow root file argument
The flow "path" argument (if present) can be a local (relative or absolute) file name, a "file:///" Url or an "http://" or "https://" Url.
When the argument is not present it assumes a local file is being loaded, from the Current Working Directory, using the Local File algorithm described below.
When the "file:///" Url scheme is used it assumes a local file as described below.
When "http://" or "https://" schemes are used, it will use the Url loading algorithm described below.
Local File
flowc tries to load a flow from its root file using one of these methods:
- If an existing directory path is specified, it looks for the default root flow file name ("root.{}") in that directory, for each of the supported extensions. The first matching filename.extension is loaded.
  - E.g. flowc will load ./root.toml if it exists
  - E.g. flowc dirname will load ./dirname/root.toml if the file exists
  - E.g. flowc /dirname will load /dirname/root.toml if the file exists
- If a path to an existing file is passed, it uses that as the filename of the flow root file.
  - E.g. flowc path/to/root.toml will load root.toml from the ./path/to/ directory
  - E.g. flowc path/to/root.yaml will load root.yaml from the ./path/to/ directory, even if root.json and root.toml also exist
- If a path to a non-existent file or directory is passed, it will look for matching files with supported extensions.
  - E.g. flowc root will load ./root.toml if it exists in the Current Working Directory
  - E.g. flowc root will load ./root.json if root.toml doesn't exist but root.json does
  - E.g. flowc path/to/root will load path/to/root.toml if it exists
  - E.g. flowc path/to/root will load root.yaml from the ./path/to/ directory, if it exists and root.toml does not
- If a path to an existing directory is specified, it looks for a file named ("dirname.{}") in that directory (where dirname is the name of the directory), for each of the supported extensions.
Urls and loading from the web
The flow root file (an http resource) is attempted to be loaded from the Url thus:
- The Url supplied, as-is
- The Url supplied, appending each of the supported extensions (see above)
- The Url supplied, appending "/root.{extension}" for each of the supported extensions
- The Url supplied, appending "/" and the last path segment, for each of the supported extensions
Why the dirname option?
The dirname option above in the file and url algorithms exists so that a flow (or library or other file) can be named after the directory it is in, and found by specifying a shorter filename or url. Thus path/dirname will find a file called path/dirname/dirname.toml.
Standard Input
context provides functions to read from STDIN. You can pipe input to the flow by piping it to the cargo run -p flowc or flowc command line used to execute the flow.
If not piped in, then the stdin function will attempt to read STDIN, blocking that function until input (or EOF) is provided. If input is read, then it will be passed on by that function at its output.
The function will indicate to the run-time that it should be run again (to read more lines of STDIN)
and it will be re-added to the ready list and executed again later.
When EOF is detected, that function will indicate to the run-time that it does not want to be run again and will not be added back to the ready list for re-execution.
Standard Output & Standard Error
context provides functions to send output to STDOUT/STDERR. This output is printed on the stdout or stderr of the process that executed the cargo run -p flowc or flowc command to execute the flow.
Writing to Files
context supplies the file_write function (context://file/file_write) that allows flows to write to files hosted by the file system where the flow runner is running.
Here is an example of a flow that writes the ASCII string "hello" to a file called "pipe":
flow = "test"
[[process]]
source = "context://file/file_write"
input.bytes = { once = [104, 101, 108, 108, 111] }
input.filename = { once = "pipe" }
You can run that flow from the command line using: flowc -C flowr/src/bin/flowrcli root.toml
See that it has worked using: cat pipe, which will show the text hello.
Then clean up: rm pipe
Named Pipes
On most *nix systems (including macos) there exist what are called "named pipes", which allow inter-process communication via something that looks to the processes like files.
An example of how to use that, using the above flow is:
- Terminal Window 1
  - mkfifo pipe
  - cat pipe
  - (the process should block reading from that file and not display anything)
- Terminal Window 2
  - Run the flow as before using flowc -C flowr/src/bin/flowrcli root.toml
  - The process blocked above in Terminal Window 1 will unblock and display hello
  - The flow will run to completion in Terminal Window 2
You can also run the flow first; it will block writing to the pipe. Then read from the pipe using cat pipe. Both processes will run to completion and hello will be displayed.
Exceptions and Panics
Currently, there are no special provisions for handling or recovering from run-time exceptions. The functions are implemented in rust, and when they fail they will panic as usual in rust. The panic will be caught by the runtime, a crash avoided and an error logged, but nothing else is done.
This may cause the result of the flow to not be what is expected, or execution to terminate early due to a lack of jobs to execute.
Running flows with flowrcli
In order to run a flow it must first be compiled. Then a "flow runner" such as flowrcli can be used to run the compiled flow manifest.
Flow runners in general, and flowrcli in particular, run the compiled flow manifest (by default named manifest.json).
In order to compile a flow definition down to a flow manifest that can be run, use flowc as usual, with the addition of the -c, --compile option. This compiles the flow but does not invoke flowrcli to run it.
Then flowrcli, as described below, can be used to run the compiled flow.
This section describes the command line arguments that can be supplied to flowrcli and what they are useful for.
Getting help for flowrcli
Use -h, --help (e.g. flowrcli -h or cargo run -p flowr -- -h) to print out help for the usage of flowrcli.
This will print something like this:
Usage: flowr [OPTIONS] [flow-manifest] [flow_args]...
Arguments:
[flow-manifest] the file path of the 'flow' manifest file
[flow_args]... A list of arguments to pass to the flow.
Options:
-d, --debugger Enable the debugger when running a flow
-m, --metrics Calculate metrics during flow execution and print them out when done
-n, --native Link with native (not WASM) version of flowstdlib
-s, --server Launch flowr with a coordinator only, no client
-c, --client <port> Launch flowr with a client only, no coordinator, to connect to a flowr coordinator
-C, --context Execute only 'context' (not general) jobs in the coordinator
-j, --jobs <MAX_JOBS> Set maximum number of jobs that can be running in parallel)
-L, --libdir <LIB_DIR|BASE_URL> Add a directory or base Url to the Library Search path
-t, --threads <THREADS> Set number of threads to use to execute jobs (min: 1, default: cores available)
-v, --verbosity <VERBOSITY_LEVEL> Set verbosity level for output (trace, debug, info, warn, default: error)
-h, --help Print help information
-V, --version Print version information
Similarly to flowc, in order to locate libraries used in flow execution, flowrcli needs to know where to find them. As for flowc, you can rely on the default ($HOME/.flow/lib), modify it using the $FLOW_LIB_PATH environment variable, or use one or more instances of the -L, --libdir <LIB_DIR|BASE_URL> option.
flow-manifest
After the Options you can supply an optional field for where to load the flow manifest from. This can be a relative or absolute path when no Url scheme is used, an absolute path if the file:// scheme is used, or a web resource if either the http or https scheme is used.
- If no argument is supplied, it assumes the current directory as the argument, and continues as below
- If it's a directory then it attempts to load "manifest.json" from within the directory
- If it's a file then it attempts to load the flow manifest from that file
flow_args
Any arguments after flow-manifest are assumed to be arguments for the flow itself. When it starts executing, the flow can retrieve the value of these arguments using context functions.
Example Invocations
For each of these examples, there is first a flowc line showing how the flow can be compiled. This will leave a compiled manifest.json flow manifest alongside the flow's root definition file. That manifest is then run using flowrcli.
- flowc -C flowr/src/bin/flowrcli -c flowr/examples/fibonacci - compile the fibonacci example only
- flowrcli flowr/examples/fibonacci - run the pre-compiled fibonacci example flow manifest
  - You should get a fibonacci series output to the terminal
- flowc -C flowr/src/bin/flowrcli -c flowr/examples/sequence - compile the flow only, do not run it
- flowrcli flowr/examples/sequence 10 - run the compiled flow; a short sequence of numbers (2, 5, 8) and a string will be printed. The "10" represents the maximum of the sequence.
- flowrcli flowr/examples/sequence/manifest.json 10 - run the compiled flow, specifying the full path to the manifest.json file
flowrgui
Similar to flowrcli, which interacts with the terminal and the file system for IO, flowrgui is another runner for flows, but with a Graphical User Interface (GUI). It displays STDOUT and STDERR on the UI, displays images written to, and tracks writes to files during execution.
Most (but not all) of the same command line options as flowrcli are supported, and help can be seen using:
flowrgui --help
Running a flow in client/server mode of flowrcli
flowrlib architecture
The flowrlib library is designed to be used not just in flowrcli-style CLI flow runners, but in other incarnations such as a GUI application, web application, etc.
In order for flowrlib to work well in such applications, it avoids running any context function that interacts with the environment (Read/Write to a File, Read/Write to STDIO, etc.), and that may block, on the main thread running the "coordinator" that manages flow execution.
Different applications, like a GUI App, may need to provide totally different implementations for some of those functions, provided by the application and not the library.
For this reason, it implements a "client/server" architecture, where a "server" thread runs the coordinator and exchanges messages with a client thread (in the flow runner app) that runs the context functions, whose implementations are provided by the flow runner application that links the flowrlib library.
flowrcli - an example of a flow runner app
flowrcli is one example of a flow runner app that uses flowrlib to build an application to run flows.
It implements a set of client functions, that interact with STDIO etc., on a client thread.
The flowrcli process running that client thread must be able to interact with STDIO.
In normal use, flowrcli runs the client and server threads in the same process, and the user is unaware of this separation.
Separating the client from the server
However, flowrcli can be run as two separate processes: one "client" process that executes the context functions and interacts with STDIO, and another "server" process with a thread that runs the coordinator, plus a number of threads running executors for job execution.
These two "client" and "server" processes exchange messages over the network. The two processes can be on the same node/machine or on separate machines. The one running the "client" should be able to interact with the FileSystem and STDIO, and with the user. The "server" does not run any such functions and does not need to interact with the user.
They use mDNS and service discovery to discover the network address and port of the other process, running within the same network.
Example of running a flow with "client" separate from "server"
First let's compile the fibonacci example (but not run it) by using flowc
with the -c, --compile
option:
> flowc -c -C flowr/src/bin/flowrcli flowr/examples/fibonacci
Let's check that worked:
> ls flowr/examples/fibonacci/manifest.json
flowr/examples/fibonacci/manifest.json
In Terminal 1, let's start the server that will wait for a flow to be submitted for execution, using flowrcli with debug logging verbosity to be able to see what it's doing:
> flowrcli -n -s -v debug
which will log some lines, ending with:
INFO - Server is waiting to receive a 'Submission'
In Terminal 2, let's start a client using flowrcli with the -c, --client option.
This will submit the flow to the server for execution over the network, reading the flow manifest from the File System. It will then execute the client functions, in response to messages from the server, providing STDIO (just standard out in this example):
> flowr -c flowr/examples/fibonacci
That will produce the usual fibonacci series on the STDOUT of Terminal 2.
Logs of what is happening in order to execute the flow will be produced by the server in Terminal 1, ending with
INFO - Server is waiting to receive a 'Submission'
which indicates the server has returned to the initial state, ready to receive another flow for execution.
You can execute the flow again by repeating the same command in Terminal 2.
In order to exit the server, in Terminal 1 just hit Control-C.
Distributed execution of jobs with flowrcli and flowrex
Job Dispatch and Job Execution
The flowrlib library that is used by flow runner applications to execute a flow has two important functions:
- job dispatch - manages the state of the flow, the dispatch of jobs for execution, and the distribution of results received back, passing those results on to other functions in the flow etc.
- job execution - the execution of "pure" functions: receiving a set of input data and a reference to the function's implementation, executing it with the provided input, and returning the job including the results.
Job dispatch is done by the server thread running the coordinator, responsible for maintaining a consistent state for the flow and its functions and coordinating the distribution of results and the enabling of new functions to be run.
Additional threads are started for job execution, allowing many jobs to be executed concurrently, and in parallel on a multi-core machine. Job execution on "pure" functions can run in isolation, just needing the input data and the function implementation.
Normal Execution
Normally, the flowrcli process runs the coordinator in one thread and a number of executors in additional threads.
However, due to the "pure" nature of the job execution, it can be done anywhere, including in additional processes, or on processes in additional machines.
flowrex executor binary
flowrex is an additional small binary that is built.
It cannot coordinate the execution of a flow, but it can execute jobs (just library jobs for now).
Additional instances of flowrex can be started in other processes on the same machine to execute some of the jobs, increasing compute resources and the concurrency/parallelism of flow execution.
It is possible to start flowrcli with 0 executor threads and force flowrex to execute all the (library) jobs.
flowrex can also be run on another node on the network, even one with a different architecture such as ARM, and have job execution done entirely by it or shared with flowrcli.
How many jobs are done in one process/machine or another depends on the number of executors, and on network and CPU speed.
The flowrcli flow runner and the flowrex job executor discover each other using mDNS, and then jobs are distributed out over the network and results are sent back to the coordinator running in flowrcli, also over the network.
TODO
It is pending to allow flowrex to also execute provided functions, by distributing the architecture-neutral WASM function implementations to other nodes and hence allowing them to load and run those functions also.
Example of distributed execution
This can be done in two terminals on the same machine, or across two machines of the same or different CPU architecture.
Terminal 1
Start an instance of flowrex that will wait for jobs to execute (we start with debug logging level to see what's happening):
> flowrex -v debug
The log output should end with
INFO - Waiting for beacon matching 'jobs._flowr._tcp.local'
indicating that it is waiting to discover the flowrcli
process on the network.
Terminal 2
First let's compile the fibonacci example (but not run it) by using flowc
with the -c, --compile
option:
> flowc -c -C flowr/src/bin/flowrcli flowr/examples/fibonacci
Let's check that worked:
> ls flowr/examples/fibonacci/manifest.json
flowr/examples/fibonacci/manifest.json
Then let's run the example fibonacci flow, forcing zero executor threads so that we see flowrex executing all (non context) jobs:
> flowrcli -t 0 flowr/examples/fibonacci
That will produce the usual fibonacci series on the STDOUT of Terminal 2, and then flowrcli will exit.
Logs of what is happening in order to execute the flow jobs will be produced in Terminal 1, ending with the same line as before:
INFO - Waiting for beacon matching 'jobs._flowr._tcp.local'
indicating that it has returned to the initial state and is ready to discover a new flowrcli dispatching jobs to it.
The Flow Debugger
NOTE: To be able to use the flow debugger that is part of flowrcli, flowrcli must be compiled with the "debugger" feature enabled. If not, the debugger code is not included in flowrcli.
Compiling with Debug Symbols
The debugger can be used to debug any flow, but flows compiled by flowc using the -g or --symbols option will have extra human readable content included in the compiled manifest (names of processes etc.) and be more convenient to debug.
Running the flow with the debugger
To start debugging a flow, run it using flowrcli as normal, but using the -d or --debugger options.
The compiled flow manifest will be loaded by flowrcli as usual, functions initialized and a command prompt for the debugger will be shown.
You can use the 'h' or 'help' command at the prompt to get help on debugger commands.
If you want to inspect the state of the flow at a particular point to debug a problem or understand its execution then you will probably want to either set some breakpoints initially before running the flow, or to step through the flow's execution one function invocation at a time.
Those can be done using the Break command to set breakpoints, the List command to list the breakpoints set, the Run command to start flow execution, the Continue command to continue execution after a breakpoint triggers, and the Step command to step forward one function invocation.
Debugger Commands
- Break: Set a breakpoint on a function (by id), an output or an input using a spec:
  - function_id
  - source_id/output_route ('source_id/' for the default output route)
  - destination_id:input_number
  - blocked_process_id->blocking_process_id
- Continue: Continue execution until the next breakpoint or the end of execution
- Delete: Delete the breakpoint matching {spec}, or all breakpoints with '*'
- Exit: Stop flow execution and exit the debugger
- Help: Display this help message
- List: List all breakpoints
- Print: Print the overall state, or the state of process number 'n'
- Quit: Stop flow execution and exit the debugger (same as Exit)
- Run: Run the flow, or if already running then reset the state to the initial state
- Step: Step over the next 'n' jobs (default = 1) then break
- Validate: Run a series of defined checks to validate the status of the flow
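For illustration, a hypothetical session using the commands above might look like this (the exact prompt and messages may differ):
> flowrcli -d flowr/examples/fibonacci
Debug> Break 1        set a breakpoint on the function with id 1
Debug> Run            run the flow until the breakpoint triggers
Debug> Print 1        print the state of process number 1
Debug> Step           step over the next job
Debug> Continue       continue until the next breakpoint or the end of execution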
flowstdlib
Library
flowstdlib
is a standard library of functions and flows for flow
programs to use.
Modules
flowstdlib
contains the following modules:
Use by the Compiler
In order for the compiler to be able to find the library's flow and function definitions, the directory containing this library must be in the default location ($HOME/.flow/lib), be part of FLOW_LIB_PATH, or be specified using an instance of the -L command line option to flowc.
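For example (the library directory and flow name here are hypothetical), a library in a non-default location could be made findable like this:
> flowc -L $HOME/my-flow-libs my-flow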
NOTE: flows are compiled down to a graph of functions at compile time, and do not exist at runtime.
Building this library from Source
Libraries like flowstdlib are built using flowc, specifying the library root folder as the source url.
This builds a directory tree (in target/{lib_name}) of all required files for a portable library, including:
- documentation files (.md MarkDown files, .dot graphs of flows, graphs rendered as .dot.svg SVG files)
- TOML definition files for flows and functions
- Function implementations compiled to a .wasm WASM file
- A manifest.json manifest of the library's functions and where the implementations (.wasm files) can be found. This is used by the Runtime to be able to load it.
Dual nature of flow libraries
Flow libraries such as flowstdlib have a dual nature. They can be compiled and linked natively into a binary such as flowr, or when compiled by flowc all the function implementations are compiled to .wasm WASM files.
Native use by a Runtime
flowr offers the -n/--native option for the flowstdlib library to be used natively. When used, the functions it contains will be run natively (machine code), as opposed to the WASM implementations of the functions.
WASM use by a Runtime
If the -n/--native option is not used, and the library manifest (manifest.json) is found by the flow runner (e.g. flowr) at runtime (using the default location, FLOW_LIB_PATH or -L), then the manifest is read and the functions' WASM implementations are found and loaded.
When a job is executed that requires one of these library functions, the WASM implementation is run.
features
There are no features to enable.
Control (//flowstdlib/control)
Functions and flows to control the flow of data in a flow based on control inputs.
List of Functions and Flows
CompareSwitch (//flowstdlib/control/compare_switch)
Description
Compares two input values and outputs the right hand and left hand values on different outputs, depending on the comparison result:
- equal: The left/right value is output on the "equal" output
- greater than: The left value is output on the "left-gt" output, the right value on the "right-gt" output
- greater than or equal: The left value is output on the "left-gte" output, the right value on the "right-gte" output
- less than: The left value is output on the "left-lt" output, the right value on the "right-lt" output
- less than or equal: The left value is output on the "left-lte" output, the right value on the "right-lte" output
Usage
[[process]]
source = "lib://flowstdlib/control/compare_switch"
Index (//flowstdlib/control/index)
Pass through a value based on the index of an item in the stream of values
Include using
[[process]]
source = "lib://flowstdlib/control/index"
index_f (//flowstdlib/control/index_f)
A flow that wraps the index function to simplify its use by supplying most frequently used initializers to some inputs.
Use it to select an item from a stream of items by index.
Include using
[[process]]
source = "lib://flowstdlib/control/index_f"
Flow Graph
Click image to navigate flow hierarchy.
Join (//flowstdlib/control/join)
Control the flow of a piece of data by waiting for a second value to be available
Include using
[[process]]
source = "lib://flowstdlib/control/join"
Route (//flowstdlib/control/route)
Route data to one or another based on a boolean control value.
Include using
[[process]]
source = "lib://flowstdlib/control/route"
Select (//flowstdlib/control/select)
Select which data to output, based on a boolean control value.
Include using
[[process]]
source = "lib://flowstdlib/control/select"
Tap (//flowstdlib/control/tap)
Control the flow of data (flow it through this function, or have it disappear) based on a boolean control value.
Include using
[[process]]
source = "lib://flowstdlib/control/tap"
Data (//flowstdlib/data)
Some generic processes that act on data.
List of Functions and Flows
Accumulate (//flowstdlib/data/accumulate)
Accumulate input values into an array up to the limit specified
Include using
[[process]]
source = "lib://flowstdlib/data/accumulate"
Append (//flowstdlib/data/append)
Append two strings
Include using
[[process]]
source = "lib://flowstdlib/data/append"
Count (//flowstdlib/data/count)
Takes a value on its input, sends the same value on its output, adds one to the count received on the 'count' input and outputs the new count on the 'count' output
Include using
[[process]]
source = "lib://flowstdlib/data/count"
Duplicate (//flowstdlib/data/duplicate)
Takes a value on its input and sends the same value factor times in an array output
Include using
[[process]]
source = "lib://flowstdlib/data/duplicate"
Enumerate (//flowstdlib/data/enumerate)
Enumerate the elements of an array
With an input array such as ["a", "b"]
it will assign an index to each element
and produce an output array of tuples (array of two elements) such as [[0, "a"], [1, "b"]]
Include using
[[process]]
source = "lib://flowstdlib/data/enumerate"
Info (//flowstdlib/data/info)
Output info about the input value
Include using
[[process]]
source = "lib://flowstdlib/data/info"
OrderedSplit (//flowstdlib/data/ordered_split)
Split a string into (possibly) its constituent parts based on a separator.
It guarantees to produce an array of strings, ordered the same as the input string.
Include using
[[process]]
source = "lib://flowstdlib/data/ordered_split"
Remove (//flowstdlib/data/remove)
Remove a value from a vector of values
Include using
[[process]]
source = "lib://flowstdlib/data/remove"
Sort (//flowstdlib/data/sort)
Sort an array of numbers
Include using
[[process]]
source = "lib://flowstdlib/data/sort"
Split (//flowstdlib/data/split)
Split a string into (possibly) two parts and a possible token, based on a separator.
This function is implemented in a deliberate way to be able to showcase parallelization.
Instead of going through the string in order looking for the separator and gathering an array of sections it takes an alternative approach.
It starts in the middle of the string looking for a separator character from there towards the end. If it finds one then the string is split in two and those two sub-strings are output as an array of strings on the partial output. NOTE that either or both of these two sub-strings may have separators within them, and hence need further subdivision.
For that reason, the partial output is fed back to the string input, and the runtime will serialize the array of strings to the input as separate strings.
If no separator is found from the middle to the end, then it tries from the middle backwards towards the beginning. If a separator is found, the two sub-strings are output on the partial output as before.
If no separator is found in either of those cases, then the string doesn't have any and is output on the token output.
Thus, strings with separators are subdivided until strings without separators are found, and each of those is output as a token.
Due to the splitting and recursion approach, the order of the output tokens is not the order they appear in the string.
Include using
[[process]]
source = "lib://flowstdlib/data/split"
Zip (//flowstdlib/data/zip)
Takes two arrays of values and produces an array of tuples of pairs of values from each input array.
Include using
[[process]]
source = "lib://flowstdlib/data/zip"
Fmt (//flowstdlib/fmt)
Functions for the formatting of values and conversion from one type to another.
List of Functions and Flows
Reverse (//flowstdlib/fmt/reverse)
Reverse a String
Include using
[[process]]
source = "lib://flowstdlib/fmt/reverse"
ToJson (//flowstdlib/fmt/to_json)
Convert a String to Json
Include using
[[process]]
source = "lib://flowstdlib/fmt/to_json"
ToString (//flowstdlib/fmt/to_string)
Convert an input type to a String
Current types supported are:
- null - A null will be printed as "null"
- boolean - boolean JSON value
- number - A JSON Number
- string - a bit redundant, but it works
- array - A JSON array of values that can be converted; they are converted one by one
- object - a Map of names/objects that will also be printed out
Include using
[[process]]
source = "lib://flowstdlib/fmt/to_string"
Math (//flowstdlib/math)
Math Functions and flows
List of Functions and Flows
Add (//flowstdlib/math/add)
Add two inputs to produce a new output
Include using
[[process]]
source = "lib://flowstdlib/math/add"
Compare (//flowstdlib/math/compare)
Compare two input values and output different boolean values depending on if the comparison is equal, greater than, greater than or equal, less than or less than or equal.
Include using
[[process]]
source = "lib://flowstdlib/math/compare"
Divide (//flowstdlib/math/divide)
Divide one input by another, producing outputs for the dividend, divisor, result and the remainder
Include using
[[process]]
source = "lib://flowstdlib/math/divide"
Multiply (//flowstdlib/math/multiply)
Multiply one input by another
Include using
[[process]]
source = "lib://flowstdlib/math/multiply"
Range (//flowstdlib/math/range)
Generate numbers within a range
Include using
[[process]]
source = "lib://flowstdlib/math/range"
Flow Graph
Click image to navigate flow hierarchy.
RangeSplit (//flowstdlib/math/range_split)
Split a range of numbers into two sub-ranges, or output the number if they are the same
Include using
[[process]]
source = "lib://flowstdlib/math/range_split"
Sequence (//flowstdlib/math/sequence)
Generate a sequence of numbers
Include using
[[process]]
source = "lib://flowstdlib/math/sequence"
Flow Graph
Click image to navigate flow hierarchy.
Sqrt (//flowstdlib/math/sqrt)
Calculate the square root of a number
Include using
[[process]]
source = "lib://flowstdlib/math/sqrt"
Subtract (//flowstdlib/math/subtract)
Subtract one input from another to produce a new output
Include using
[[process]]
source = "lib://flowstdlib/math/subtract"
Matrix (//flowstdlib/matrix)
Operations on two dimensional matrices.
List of Functions and Flows
DuplicateRows (//flowstdlib/matrix/duplicate_rows)
Duplicate the rows of a matrix
Include using
[[process]]
source = "lib://flowstdlib/matrix/duplicate_rows"
Multiply (//flowstdlib/matrix/multiply)
Multiply two matrices.
This flow is designed to stress particular aspects of the runtime:
- deserialization of an array of objects (a matrix, or array/array/numbers in this case) into a lower order structure (array/number in this case, which are the rows and columns of the matrices that are fed to multiply_row).
- The send of one value (a matrix with its rows repeated) from a previous function: the matrix is deserialized and produces many writes of many values (rows) in one "tick", thus piling up multiple values at the destination function's inputs.
- When taking those values from the function to create new jobs on the ready queue, the runtime attempts to maximize parallelization and creates as many jobs as input sets of values it can take.
- When dispatching new jobs for execution, taking those jobs from the ready job queue, the runtime again tries to maximize parallelization and creates many jobs for the same function at once. Those jobs are dispatched and start executing in parallel (how many and in which order depends on the maximum number of parallel jobs allowed, if that limit is set, the number of cores and hence job executors being used, and previous jobs completing on those executors and their input queues).
- So, the order of jobs dispatched will match the order of the elements of the original structure that was deserialized.
- But the order of completion of jobs is not guaranteed, and they can arrive out of order.
- When constructing the final matrix that is the multiplication of the two input matrices, the order of elements in rows, and rows in the matrix, is critical.
- Thus the matrix multiplication algorithm here attaches row and column indexes as part of the values, and they proceed through the flowstdlib matrix functions, which preserve and combine them into (row, column) pairs.
- These pairs are used at the end of the algorithm by compose_matrix to write the elements calculated into the correct (row, column) positions in the matrix, giving the correct result.
Writing algorithms like this, that require strict preservation of order in some parts while seeking to maximize parallelization of execution, requires extra work and is a bit of a pain. We will look into ways that the language and runtime can help make this easier in the future, without breaking the maxim that "things can happen out of order" and that programmers should not rely on any inherent order of things happening that is not determined by the data dependencies expressed in the graph.
Include using
[[process]]
source = "lib://flowstdlib/matrix/multiply"
Flow Graph
Click image to navigate flow hierarchy.
MultiplyRow (//flowstdlib/matrix/multiply_row)
Multiply two matrix rows to a product
Include using
[[process]]
source = "lib://flowstdlib/matrix/multiply_row"
Transpose (//flowstdlib/matrix/transpose)
Transpose a matrix's rows and columns
Include using
[[process]]
source = "lib://flowstdlib/matrix/transpose"
ComposeMatrix (//flowstdlib/matrix/compose_matrix)
Compose a matrix from a set of matrix elements
Include using
[[process]]
source = "lib://flowstdlib/matrix/compose_matrix"
Flow Examples
examples contains a set of example flows used to demonstrate flow (flowc, flowr, flowstdlib) and the different semantics and characteristics of flows that can be written, and to test them to ensure they continue to run correctly over time.
Each subdirectory holds a self-contained flow example, with flow definition, docs etc and some of them provide their own function implementations that get compiled to WASM by flowc when the flow is compiled.
Flow enables higher levels of parallelization of execution of 'jobs' within flows by allowing many jobs to be run in parallel, which then may be executed out of order. This can lead to unpredictable ordering of the output values of some operations. To embrace this, the examples typically avoid requiring a specific ordering of the output values.
Environment Variable Requirements
If you are using make, then temporary additions to $PATH will be made for you so that the required flow executables (flowc and flowr) are found.
However, if you wish to run an example from the command line, then you will need to make sure the flowc and flowr executables (built by the Makefile) are in your path (or use the full path when running them).
You can do this using:
export PATH="target/debug:target/release:$PATH"
from the project root directory.
Building all examples
cargo test
Builds all examples to make sure they compile, but they are not run.
cargo build --examples
Builds all examples
Running one example
cargo run --example $example-name
This can be run from the root folder or the flowr folder. The named example is built and run.
The flow will be run with the arguments and standard input defined in files within each directory (if they are not present then those args or input is zero).
Testing one example
cargo test --example $example-name
The flow will be run with the arguments and standard input defined in files within each directory (if they are not present then those args or input is zero) and the output compared to the expected output (defined in files in the directory). If the output does not match the expected output then the test fails.
Testing all examples
cargo test --examples
Will run the tests in all examples.
args
Description
A test flow that tests that args passed to flowc are correctly passed to flowr, and on to the flow, and can be used in the flow execution
Features Used
- Root Flow
- Context Functions used (get and stdout)
- Connecting from an element of an array output (array of args from get)
args-json
Description
A test flow that tests that args passed to flowc are correctly passed to flowr, and on to the flow, and can be used in the flow execution, using the JSON formatted output of the context get function
Features Used
- Root Flow
- Context Functions used (get and stdout)
- Connecting from an element of a JSON array output (array of args from get)
arrays
Description
Sample to show the capabilities of:
- gathering a stream of outputs of type Object to an input of type Object, of a specified size. This is done by the P2 'composer' (ComposeArray) function.
- decomposing an output of type array of objects to a stream of objects. This is done by the runtime when it sees a connection from an array of Type to Type.
The processes are:
- P1 - sequence - generates a stream of outputs of type number
- P2 - accumulator - accumulates the stream of numbers into arrays of numbers of size 4
- P3 - adder - input of type Number and output of type Number, adding 1 in the process
- P4 - print - prints the output (the original sequence with '1' added to each number)
This example (with default settings on a multi-core machine) shows parallelism of the add function, dispatching multiple jobs for it in parallel as the array of numbers output from the previous process is deserialized (from array/number to number) in the connection from accumulator to adder, creating a job for each Number. You can see this by using the -j option of the runtime to limit the number of outstanding jobs and the -m option to dump metrics after execution. The "Max Number of Parallel Jobs" should be similar to, or greater than, 4, which is the size of the array of numbers formed.
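For example (assuming the example has already been compiled, as shown for fibonacci earlier), a run limiting outstanding jobs and dumping metrics might look like:
> flowrcli -j 4 -m flowr/examples/arrays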
Root
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Setting initial value of a Function's inputs
- Multiple connections into and out of functions and values
- Library Functions
- Array of numbers construction (using the "accumulate" function) from a stream of numbers. This uses a loop-back connection of partially accumulated arrays. By specifying a "chunk size" of four, a stream of arrays of four numbers (the "chunks") is produced.
- Implicit conversion between arrays of (four) numbers and a stream of numbers, done automatically by the run-time, from the accumulator's output of arrays of four numbers to "add"'s input of a single number.
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
debug-help-string
Description
Test of printing debug help string from the debugger
debug-print-args
Description
Check that we can execute a flow in the debugger and that it works.
The flow features used are not important for this test and the flow is the same as ../print-args.
double-connection
Description
Check that a flow with two connections from one function to another function compiles and runs correctly. In this case it takes the first and third of the input arguments and prints them to standard out.
factorial
Description
A flow that calculates the factorial of a number and prints it out on stdout.
Root
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Connections between functions
- Loop-back connections to accumulate a multiplication result
- Initializing function inputs with values, once and constantly
- Multiple connections into and out of functions
- Library Functions to_json, multiply, subtract from flowstdlib
- Library Functions tap, compare from flowstdlib
- Use of aliases to refer to functions with different names inside a flow
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
fibonacci
Description
A flow that generates a Fibonacci series of numbers and prints it out on stdout.
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Child flow described separately, with named outputs to parent flow
- Connections between Input/Outputs of parent/child flows
- Setting initial value of a Value at startup
- Multiple connections into and out of functions and values
- context Functions used (stdout)
- Library Functions used (buffer and add from flowstdlib)
- Use of aliases to refer to functions with different names inside a flow
- Connections between flows, functions and values
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
hello-world
Description
A simple flow that prints "Hello World!" on stdout
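A minimal sketch of what such a flow definition could look like (the context://stdio/stdout URL follows the context URL scheme mentioned later in this book, but the input name and initializer here are assumptions - the real definition is in the example's TOML files):
flow = "hello-world"

[[process]]
source = "context://stdio/stdout"
input.default = { once = "Hello World!" }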
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Library Functions used (stdout from flowstdlib)
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
line-echo
Description
A trivial flow that takes a line on stdin, and prints it on stdout
Features Used
- Root Flow
- Library Functions used (stdin and stdout from context)
- Connections between functions
mandlebrot
Render a Mandelbrot set into an image file, with the output image size and imaginary number coordinate space configured via input parameters.
The pre-configured test (input arguments in test.args) renders a very small Mandelbrot set (20x15 pixels) in order to keep the test running time short and be able to use it in CI runs.
Description
Notably, there is also a standalone rust project in the project (Cargo manifest) folder.
The functions are used in the rust program that is built, and are also made available as functions to the Flow project that is described in the toml files - showing how native code can live alongside, and be used by, the flow.
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- subflow described separately, with named outputs to parent flow
- Connections between Input/Outputs of parent/child flows
- Setting initial value of a function with a Once initializer
- Multiple connections into and out of functions and sub-flows
- Library Functions used to convert Number to String and to add numbers
- Use of aliases to refer to functions with different names inside a flow
- Connections between flows, functions and values
- flowr context function used to render output to an Image Buffer
- provided functions in rust that get compiled to WASM and then loaded and executed by the runtime
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
SubFlows and Functions Description
- Subflow parse_args reads the arguments passed to the flow and outputs the filename to render to, the size (width, height in array/number) and the bounds of the coordinate space (an array of 2 imaginary numbers, where an imaginary number is two numbers, so expressed as array/array/number) to calculate the set for
- Subflow generate pixels enumerates the 2D array of pixels to calculate, producing "a stream" of pixels (x, y coordinates) used to calculate the appropriate value for each pixel
- Subflow render uses the functions below to take the pixels, calculate each one's location in the 2D imaginary space, calculate the value in the set for that point and then render the value at the pixel in the image buffer
- Function pixel to point calculates the corresponding location in the 2D imaginary coordinate system for each pixel
- Function escapes calculates the appropriate value (using the core Mandelbrot algorithm) for each pixel
Escapes
Try to determine if 'c' is in the Mandelbrot set, using at most 'limit' iterations to decide. If 'c' is not a member, return 'Some(i)', where 'i' is the number of iterations it took for 'c' to leave the circle of radius two centered on the origin.
If 'c' seems to be a member (more precisely, if we reached the iteration limit without being able to prove that 'c' is not a member), return 'None'.
Pixel To Point function
Given the row and column of a pixel in the output image, return the corresponding point on the complex plane.
pipeline
Description
An example that shows a simple "pipeline flow" with a number of functions organized into a pipeline. When supplied with a "stream" of inputs, multiple functions are able to run in parallel, utilizing more than one core on the machine.
Using command line options (-j, -t) the flow can be invoked with just one worker thread and it becomes sequential. The metrics of how many jobs were able to be processed in parallel can be viewed using the -m command line option.
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Setting initial value of a Value at startup
- Multiple connections into and out of functions and values
- Library Functions used from flowstdlib
- Use of aliases to refer to functions with different names
- Connections between functions and values
- Referring to a function's input by name in connections
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
prime
WIP
Description
A flow that finds prime numbers up to the maximum specified in the input arguments
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Sub-flows
- Value (with an initial value set)
- Constant Value
- Arguments parsing for the execution
- Constant Value used
- Connections between functions
- Library Functions used:
  - Args to parse arguments
  - sequence to generate a sequence of numbers
  - divide to divide two numbers
  - compare function to produce outputs based on comparing two input values
  - switch function to stop or pass a data flow based on another one
  - ToString to convert Numbers to Strings
  - stdout to print a String to standard output
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
primitives
Description
A flow that takes a value and a constant and adds them, then takes the result and adds it to the constant again, and then prints the final value to stdout. It also uses the switch function to stop a flow with a false value, and compares the result of the add to a value, printing to stdout if it is greater than or equal to it.
The purpose is not to do anything useful, but just to show the use of and stress the semantics of a number of the primitives.
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Library Functions used (add and stdout from flowstdlib)
- Value used (with an initial value set)
- Constant Value used
- Connections between functions
- Two functions of the same name in the same flow, distinguished by alias
- switch function to stop or pass a data flow based on another one
- compare function to produce outputs based on comparing two input values
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
reverse-echo
Description
Trivial flow that takes a line on stdin, reverses it and then prints it on stdout
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Library Functions used (stdin and stdout from flowstdlib)
- Custom function (in rust) with a structure on the output with sub-elements
- Connections between functions
- Connections from sub-elements of a function's output
- Function with single input (stdout) not requiring input name
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
router
Description
This example implements the algorithm (as described here https://github.com/andrewdavidmackenzie/router) for calculating the shortest route from a start-point to an end-point through a simplified road network.
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- The selection of a single Value from an array of values that is one of a number of outputs (not the only output). This involves selecting the structure from the output by route, and then the value from that by index.
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
sequence
Description
A flow that generates a sequence of output numbers in a range between two input numbers
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Initial value setting on flow inputs
- Connections between functions
- Library function join used
- Library subflow sequence used
- Detecting end of sequence and outputting a message when completed
- context Functions used: stdout to print a String to standard output
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
sequence-of-sequences
Description
A flow that generates a sequence of numbers, and for each of those numbers it generates a sequence from 1 up to that number.
This shows nested flows, with the inner flow running an "inner loop".
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Library Functions used (stdout from flowstdlib)
- Library Flows used (sequence from flowstdlib)
- Connections between functions
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
tokenizer
Description
Sample to show the possible parallelization (depending on the runtime implementation) of splitting a string into tokens using the string splitter function from the flowstdlib.
Root Diagram
Click image to navigate flow hierarchy.
Features Used
- Root Flow
- Setting initializer of a Function's input with a constant initializer
- Library Functions
- Iteration (possibly in parallel) via feedback of partial output values back to the same function's input.
- Implicit conversion between arrays of strings and strings done by the run-time, in the feedback loop to the same process
Functions Diagram
This diagram shows the exploded diagram of all functions in all flows, and their connections.
Click image to view functions graph.
two-destinations
Description
Check that a flow that specifies two destinations in a single connection, compiles and executes correctly.
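A sketch of the kind of connection this example exercises (process and input names are hypothetical, and it is assumed that a connection's 'to' can be given as an array of destinations):
[[connection]]
from = "generator"
to = ["consumer_a/input", "consumer_b/input"]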
Developing flow
Supported Operating Systems
The CI tests for 'flow' run on Mac OS X and Linux. Others may well work, as rust projects are pretty portable, but I develop on Mac OS X and don't know the status on other OSes.
Pre-requisites required to build and test
These are the pre-requisites that are required to build and test 'flow':
- rust toolchain (rustup, cargo, rustc, etc.)
  - with the wasm32-unknown-unknown target for compiling to wasm
- clippy for checking coding best practices
- zmq (Zero Message Queue) library
- graphviz utilities for automatic generation of SVG files for the book
- for building the book: mdbook and the mdbook-linkcheck plug-in
Installing pre-requisites
You have to install rustup, cargo and rust toolchain yourself. I decided to stop short of futzing with people's installed toolchains.
There is a Makefile target config that will install the other dependencies:
make config
That will add the wasm32-unknown-unknown target, clippy, graphviz, mdbook and mdbook-linkcheck.
Building flow
Install Pre-requisites and Build
You need Git to clone the repo.
Clone the repo
From your command line:
git clone https://github.com/andrewdavidmackenzie/flow.git
Install build tools
You need make and a rust toolchain (cargo, rustc, clippy) to build from source (I suggest using rustup).
Once you have those, you can install the remaining pre-requisites using:
make config
These include libraries like ssl and tools like mdbook, mdbook-linkcheck and graphviz to build the book.
The make config target should install them all for you. It should work on macos and linux variants using the apt-get or yum package managers (PRs to the Makefile are welcome for other linux package managers).
Build and test
To build and test (including running the examples and checking their output is correct), as well as building the book and ensuring all links are valid, use:
make
NOTE
The first time you build (using make or make all), it will take a long time.
This is due to the function implementations in the flowstdlib standard library being compiled to WASM.
After the first build, dependencies are tracked by the flowc compiler and implementations are only re-compiled when required.
Make book changes
The book is rebuilt as part of every PR and merge to master to ensure it is not broken. The book is rebuilt and deployed here on every release.
Project components and structure
The Project is structured into a number of rust crates that form part of a rust cargo "workspace".
Currently, the following binaries are built: flowc (the flow compiler) and two flow runners, flowrcli and flowrgui.
See the Project Components and Structure section of the book for more details.
Contributing
I organize all issues in a Github Project and choose things to work on from the "Next" column. I have only marked a couple of issues with the "help wanted" label, but I can do more if there is interest. If in doubt, reach out to me by email, or GitHub issue.
Crates in the flow project
The flow project is split into a number of different crates for a number of reasons:
- proc macros need to be in their own crate
- sharing of structures and code across compiler and runner crates
- desire to separate core functionality in libraries from CLI binaries and enable UI applications using only the libraries
- provide CLI versions of compiler and runner
- avoid cyclic dependencies between parts
- allow optional compilation without some features, not using code in a crate
- separate library implementation from compiler and runner
The following sections provide a brief description of the crates in the project.
flowcore
flowcore is a library of structs and traits related to flow that are shared between multiple crates in the flow project.
Implementation trait
This is a trait that implementations of flow 'functions' must implement in order for them to be invoked by the flowrlib (or other) run-time library.
An example of a function implementing the Implementation trait can be found in the docs.rs documentation for Implementation.
Provider
This implements a content provider that resolves URLs and then gets the content of the url.
Features
The flowcore crate supports a number of "features" for conditional compilation with more or fewer features.
features
These are the conditionally compiled features of flowcore:
- default - none are activated by default
- context - makes this crate aware of the flow context functions or not
- debugger - feature to add the debugger
- online_tests - run any tests activated by this feature
- meta_provider - include the meta provider for resolving "lib://" and "context://" Urls
- file_provider - include a provider to fetch content from the file system
- http_provider - include a provider to fetch content from the web
Examples
- the flowrlib library crate compiles flowcore activating the "file_provider", "http_provider", "context" and "meta_provider" features
- flowr compiles flowcore activating the "context" feature as it provides context functions. It has a number of features that, if activated, activate corresponding features in flowcore (flowr's "debugger" feature activates the "flowcore/debugger" feature), and it depends on flowrlib (above) which in turn activates features
- flowrex compiles flowcore with the default set of features (which is the minimal set in the case of flowcore) as it does not provide any context functions ("context" feature), nor does it coordinate flow running and provide a debugger ("debugger" feature), nor does it need to run "online_tests", and lastly it does not fetch content via any of the various "providers" ("meta_provider", "file_provider" and "http_provider" features)
flowmacro
flow_function is a proc macro to be used on the structure that provides an implementation for a function (by implementing the FlowImpl trait), so that when compiled for the wasm32 target, code is inserted to help read the inputs, help form the outputs and allocate memory (alloc), as well as to serialize and deserialize the data passed across the native/wasm boundary.
Features
flowmacro
has no features
flowc
flowc is the "compiler and linker" for flows and flow libraries, although it is not very similar to what you might be familiar with as a compiler or linker.
It is part of the overall flow project (README.md).
It loads flow definition files, and builds the flow hierarchy reading from referenced flows/functions and library references, and builds the flow in memory.
Then it connects all functions via data flows through the hierarchy and removes most of the flow structure leaving a "network of functions" which it then optimizes (removing unused functions and connections).
It checks that types match and required connections exist.
It also checks for some illegal cases, or cases that would prove problematic at runtime (specific types of "loops" or contention for a connection).
Lastly it generates a manifest describing the flow, which can be executed by flowr.
It may then (depending on the command line options used) invoke flowr (using cargo to ensure it is up to date and built).
It is structured as a library with a thin CLI wrapper around it that offers command line arguments and then uses the library to compile and optionally run the compiled flow.
flowrclib
This library contains most of the compilation and linking logic for flowc.
features
These are the conditionally compiled features of the flowc crate:
- default - The "debugger" feature is enabled by default
- debugger - feature to add the debugger
Flowc Parser and Compiler Test flows
These are a number of test flows to exercise specific features of flowc
's parser and compiler and are not intended
to be "real world programs" or solutions to problems.
They are contained in the flowc/tests/test-flows folder in the code.
To understand each test, see the test code in:
flowr
flowr includes the flowrlib library for running flows (see below for details).
flowr includes a number of "runner" applications (built using the flowrlib library) for running flows:
- flowrcli to run flows from the command line
- flowrgui a flow runner with a graphical user interface (GUI) built using Iced
- flowrex a binary that only executes jobs (does not coordinate flow execution) and can be used over the network by a coordinator as a way to have more execution resources executing a flow's jobs
They handle the execution of Functions forming a Flow according to the defined semantics.
flowrlib
It is responsible for reading a flow definition in a Manifest file, loading the required libraries from LibraryManifest files and then coordinating the execution by dispatching Jobs to be executed by Function Implementations, providing them the Inputs required to run, gathering the Outputs produced and passing those Outputs to other connected Functions in the network of Functions.
features
These are the conditionally compiled features of the flowr crate:
- submission - include the ability to receive a submission of a flow for execution
- context - makes this crate aware of the flow context functions or not
- debugger - feature to add the debugger
- metrics - feature for tracking of metrics during execution
- flowstdlib - (an optional dependency, which acts like a feature flag) to allow native versions of flowstdlib functions to be compiled and linked or not (relying on the wasm versions instead)
By default, the following are enabled: "debugger", "metrics", "context", "submission", "flowstdlib"
flowrcli and flowrgui Context Functions
The context folder implements the context functions that each runtime provides for flows to interact with the environment (such as Standard IO and the File System), as well as providing definitions of the context functions to be used when compiling a flow.
These are all impure functions, or functions with side effects, not part of the flow itself.
Those functions are organized into the following modules, each with multiple functions:
- args - used to get arguments that flow was invoked with
- file - used to interact with the file system
- image - used to create image files
- stdio - used to interact with stdio
flowrex
You can find more details about how to use it in running flows in the distributed section.
Important make targets
- (default) make will build, run local tests and build the book and check links are valid
Other targets you can run to perform only a part of the whole build:
- make build will build the libs and binaries
- make clippy will run clippy on all code including tests
- make test will run all tests, including testing that the flowr examples run and pass
- make book will build the book and check all links are valid
Contributing
There are many ways of contributing:
- adding an issue with a bug report, an enhancement request or a new feature idea
- pick up an issue and try to fix it (I am not very good at labelling these, but will try)
- adding to or correcting the code documentation or the book
- adding a new example
- improvements to the libraries, compiler, standard library, run-time
- improvements to unit or integration tests
- improvements to build processes
To get started, fork the repo, clone it locally and build from source, as described in building.
Maybe run an example or two. Examples that don't require specific arguments or standard input to work correctly (such as fibonacci) are the easiest to get started with.
Then, once you know everything is working correctly, choose an issue to work on from the GitHub project kanban.
Create a branch to work on, and dive in. Try to make descriptive commit messages of limited scope, and stick to the scope of the issue.
Make sure all code builds, there are no clippy errors, tests pass, and the book builds, before pushing, by running `make`.
When you think you are done, you can create a Pull-Request to the upstream project. If you include "Fixes #xyz" in the PR description, it will close issue #xyz when it is merged.
If you are not sure if it is ready or want some early feedback, prefix the name of the PR with "WIP - " and I will know it's not intended to be merged yet.
If in doubt, just reach out to me by email to [email protected], create an issue in GitHub, comment an existing issue or message me on matrix (andrewdavidmackenzie:matrix.org).
Issues
Issues can be found in the repo. If you are not yet a project contributor, just add a comment to one to say you'd like to work on it, and I will avoid doing the same.
I work on issues Kanban-style in this GitHub Project.
Adding new issues you find can also be helpful, although with my limited time on the project, fixing issues and sending PRs is even more welcome! :-)
PRs
If you want to contribute code or a test or to the book:
- if no issue exists for it yet, create one so we can agree on what to do before starting (a good idea to make later PR merges easier to accept, I think!)
- if an issue exists already add a comment to it so I know you want to work on it
- fork the repo
- create a branch for the issue in your repo
- make your changes and update tests, code docs, the book and examples as required
- run local build and test ('make') before pushing to your branch
- submit the PR; referencing the issue in it is a good idea
- wait for the GH Actions on the PR to pass
Continuous Integration testing of flow
The CI build and test run on each push to a branch or PR includes unit and integration tests and rust doc-tests, and it also compiles, generates, runs and checks the output of all the examples in `flowr`.
It also checks that clippy passes without warnings and that the book builds and does not have any broken links.
Before pushing to GitHub, you should make sure that `make` passes.
Internals Overview
In this section we provide some more details on what flowc does when you use it to compile, generate, build and run a flow.
The process includes these areas described in more detail in the following pages:
- Flow loading: the process of reading in the flow description and building an internal representation of it
- Flow compiling: takes the hierarchical flow representation loaded in the previous stage and "compiles it down" to a flatter representation better suited to generating the flow project for execution.
- Flow execution: the generated project is loaded by the generic run-time library (flowrlib) and the functions are executed in turn.
Flow Loading
Read in the hierarchical definition, recursively reading all nested flows until everything is loaded.
Build the connections between values, functions, inputs and outputs using the unaliased routes to functions and subflows.
Check that the from/to types on connections match.
Flow Compiling
From the hierarchical definition of a flow program produced by the loading stage:
Connection Reducing
Build a flat table of connections.
For every connection that ends at a flow:
- Look through all other connections and, for each one that starts where this connection ends:
- Replace this connection's destination with that connection's destination.
- Delete that connection.
When done, there should be no connections ending at flows. Any connection left that starts at a flow is unconnected and can be deleted.
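Below is a minimal sketch of this reduction in Rust, assuming a simplified `Connection` type with string routes and boolean flow-boundary markers (the real types in `flowc` are richer, and this assumes flow-boundary connections are acyclic):

```rust
#[derive(Clone, Debug)]
struct Connection {
    from: String,        // route the connection starts at
    to: String,          // route the connection ends at
    from_is_flow: bool,  // source is a flow boundary rather than a function output
    to_is_flow: bool,    // destination is a flow boundary rather than a function input
}

/// Collapse connections that pass through flow boundaries so that every
/// remaining connection runs directly from a function output to a function input.
fn reduce(mut connections: Vec<Connection>) -> Vec<Connection> {
    // While some connection still ends at a flow boundary, splice it together
    // with each connection that starts where it ends, then drop the original.
    while let Some(i) = connections.iter().position(|c| c.to_is_flow) {
        let via = connections.remove(i);
        let spliced: Vec<Connection> = connections
            .iter()
            .filter(|c| c.from == via.to)
            .map(|cont| Connection {
                from: via.from.clone(),
                from_is_flow: via.from_is_flow,
                to: cont.to.clone(),
                to_is_flow: cont.to_is_flow,
            })
            .collect();
        connections.extend(spliced);
    }
    // Any connection still starting at a flow boundary is unconnected: delete it.
    connections.retain(|c| !c.from_is_flow);
    connections
}
```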
Value and Function Tables
Build a table of values and functions.
Pruning Value and Function Tables
Drop the following, with warnings:
- values that don't have connections from them.
- values that have only outputs and are not initialized.
- functions that don't have connections from their output.
- functions that don't have connections to all their inputs.
Flow Execution
In this section we describe how a flow is executed by a flow runner.
Components of Execution
A compiled flow consists of:
- Context functions - references to functions provided by the flow runner
- Flow functions - references to functions used in the flow
- Connections - connections between a function's output and one or more other functions' inputs
- Jobs - created by the runner as execution progresses
Context Functions
Context functions are functions provided by the flow runner program for interacting with the surrounding execution environment, such as standard IO, the file system, etc.
These are "impure" functions, whose outputs are not derived solely from their inputs. Some of them will have inputs only (e.g. a stdout "print" function). Some of them will have outputs only (e.g. a stdin "readline" function). None of them will have both inputs AND outputs.
Flow Functions
A flow is a graph of connected functions (including Context functions) where outputs of one function are connected to inputs of another. A flow references the functions used, which may come either from a library or be provided by the flow itself via custom source functions that are compiled to WASM for running by the flow runner.
These functions are all "pure", with no side-effects, and their outputs are derived solely from their inputs, in a reliable way. A function does not store any value or state.
Such functions must have one or more defined inputs and an output: a (non-Context) function without an input cannot receive data to run with and would never be invoked, and a (non-Context) function without an output has no effect and does not need to be run.
A function has only one output, but the output may produce structured data, and a connection can be made from a data element to another function's input using the "output route" selector.
A function's output will be connected to one or more other functions' inputs. It is possible to connect a function's output back to one of its inputs for the purpose of recursion or iteration. These are called "loopback" connections.
A function can only run when a value is available at each of its inputs and the destinations it sends values to are all free to be sent to. It is blocked from running until these conditions are met.
When the (job that uses the) function completes, it will produce an optional output value. If present, a copy of this output value (or part of it, selected via an "Output Route") will be sent to each connected destination function's input, possibly enabling those functions to run.
"RunAgain" and the Completed State
A function also returns a "RunAgain" value that indicates whether it can/should be run again by the runner. This is mainly used by Context functions, such as reading from standard input using "readline".
The "readline" function can be invoked many times; each time it will read a line of text from the standard input and return TRUE for RunAgain, until EOF is reached, when it will return FALSE for RunAgain.
When that happens, the runner will put the function into the `Completed` state and it will not be invoked again for the remainder of this flow's execution.
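To make the shape of this contract concrete, here is a hedged sketch of what a function implementation's signature could look like; the `Implementation` trait, `run` method and `RunAgain` alias shown are illustrative, not necessarily the exact flow API:

```rust
use serde_json::Value; // flow passes JSON-style values between functions

/// true = the runner may schedule the function again; false = it Completed
type RunAgain = bool;

/// Illustrative shape of a function implementation's contract.
trait Implementation {
    /// Called with one value per input; returns an optional output value
    /// plus whether the function may run again.
    fn run(&self, inputs: &[Value]) -> (Option<Value>, RunAgain);
}

/// An illustrative pure function: adds its two numeric inputs.
struct Add;

impl Implementation for Add {
    fn run(&self, inputs: &[Value]) -> (Option<Value>, RunAgain) {
        let sum = inputs[0].as_i64().unwrap_or(0) + inputs[1].as_i64().unwrap_or(0);
        (Some(Value::from(sum)), true) // a pure function can always run again
    }
}
```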
Input Initializers
A function's inputs in a flow can be configured with an "Input Initializer". Initializers are not part of the function's definition, allowing a function to be re-used in multiple flows, or in multiple locations in a flow, with different initializers.
An initializer can be of type "Once", where the input value is initialized with the provided value just once, or of type "Constant", in which case the input is re-initialized with the provided value each time after the function has run. Once an input is supplied the value from the initializer, everything functions the same as if the value had come from another function's output.
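A minimal sketch of the two initializer kinds and when each refills an input (the types and `refill` helper are illustrative, not flow's actual definitions):

```rust
use serde_json::Value;

/// How an input may be filled by the flow itself rather than by a connection.
enum InputInitializer {
    Once(Value),     // fill the input once, at initialization time
    Constant(Value), // refill the input every time after the function has run
}

/// Called at initialization (first_time == true) and after each run of the
/// owning function (first_time == false).
fn refill(input_queue: &mut Vec<Value>, init: &InputInitializer, first_time: bool) {
    match init {
        InputInitializer::Once(v) if first_time => input_queue.push(v.clone()),
        InputInitializer::Constant(v) => input_queue.push(v.clone()),
        _ => {} // a Once initializer does nothing after the first time
    }
}
```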
Connections
Values in the flow graph proceed via connections between functions' outputs and functions' inputs. An output can be the entire value produced or a part of it selected via an "Output Route". On a given execution a function may or may not produce an output. Also, the output data structure may vary, and a given "Output Route" may or may not have data in it.
If no value is present, then nothing is sent to the destination function's input and it will remain waiting.
Jobs
Jobs are created in order to execute a function with a given set of inputs. Initially they contain the input values and a reference to the function to execute. Once run, they will also contain the results ("RunAgain" and an optional value produced).
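As a rough sketch (field names here are hypothetical):

```rust
use serde_json::Value;

/// A unit of work dispatched by the runner: one run of one function.
struct Job {
    function_id: usize,       // which function to execute
    input_values: Vec<Value>, // one value taken from each of its inputs
    // Filled in once the job has executed: (optional output value, RunAgain)
    result: Option<(Option<Value>, bool)>,
}
```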
Generalized Rules
- Functions can have zero or more inputs
- Each input can be connected to one or more Outputs that may send values to it during execution
- Each (non-Context) function has one output, that can send values to one or more destinations
- Non-Context functions must have 1 or more inputs and an output
- Connections to destinations may consume the whole output value/struct, or may select a portion of it using a route
- If no output is produced, or there is no value at the selected route, then nothing is sent to destinations
- A function can only be run (via a Job) once a value is available at each of its inputs and the output is free to send to the destinations it is connected to. It is blocked from running until these conditions are met
- Once run, it produces an output that is sent to all destinations connected to its output
- Each of the destinations consumes (a copy of) the output value only once
- Once the output has been consumed (once) by all of the destinations, the function may be run again
- The only things that determine whether a function is available to run are the availability of data at its inputs, and the ability to produce the result at its output, the destination inputs being free
- If a destination function hasn't consumed its input, then the sending function will be blocked
- A flow's execution ends when there are no functions left in the "ready" state available for execution
Parallelized Execution
A core goal of 'flow' is to enable parallel execution of programs, without explicitly programming the parallel execution, but allowing the inherent parallelism in an algorithm to occur.
This is possible because a flow definition describes functions on data, with the data dependencies made explicit via "Connections" between functions, and with execution not occurring until data is available.
Thus multiple functions (via Jobs containing input data and then output results) may be executing in parallel, as governed by the data dependency and execution rules above; in fact, multiple instances of the same function (in different jobs) may be executing in parallel.
The level of parallelism is determined by the algorithm as defined in the flow, the flow execution rules and the number of cores in the execution machine(s) executing jobs.
Execution Order
Dataflow execution like that done by 'flow', and especially if parallel execution is performed, does not guarantee any specific order of function/job execution or completion. Data dependencies expressed in the flow should govern results.
This requires some unlearning of rules learned in procedural languages, as some old assumptions are no longer valid. E.g. a Range of numbers from 0..10 could "appear" as data values in the graph as 3,8,1,2,0,9,6,5,7,4 instead of the expected 0,1,2,3,4,5,6,7,8,9.
If a specific order is required in output data, then either the algorithm should enforce it inherently, or some specific functions that impose order can be used (preferably just prior to output) at the expense of parallelism.
At a given time there can be a number of functions in the flow graph ready for execution and having Jobs created for them. They may be executed in different orders by the runner, while still producing "correct" output (e.g. if the order of output is not important, two different orders of output values are both considered "correct").
The `flowr` runner has two `ExecutionStrategy` options that affect the order of job execution:
- "InOrder" - execution is in the order that the functions became ready to execute: first come, first served
- "Random" - functions are selected at random from within the set of those `Ready`
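A hedged sketch of how a runner could implement the two strategies (not `flowr`'s actual code; assumes the `rand` crate):

```rust
use rand::Rng; // assumed dependency for the Random strategy

/// The two job-selection strategies described above.
enum ExecutionStrategy {
    InOrder,
    Random,
}

/// Pick which of the currently-Ready functions to create the next Job for.
fn next_ready(ready: &[usize], strategy: &ExecutionStrategy) -> Option<usize> {
    match strategy {
        ExecutionStrategy::InOrder => ready.first().copied(), // first come, first served
        ExecutionStrategy::Random => {
            if ready.is_empty() {
                None
            } else {
                Some(ready[rand::thread_rng().gen_range(0..ready.len())])
            }
        }
    }
}
```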
Note that the time taken to execute different jobs may differ, and each may vary on a given machine; if the flow is distributed across a network, then network effects and other machines can also affect job execution, and hence job completion time. So, beyond the different execution orders mentioned above, there are also no guarantees about job completion order. Flow programs should be written to be robust to this.
Execution States
Prior to initialization, all functions will be in the `Initial` state.
The Initialization step described below is run, after which all functions will be in one or more of the following states (see the `State` struct in `run_state.rs`):
- `Ready` - Inputs are satisfied, the output destinations are free and it can be run
- `Blocked` - One or more of the destination inputs this function sends to is full, blocking execution
- `Waiting` - One or more of the inputs lacks data, so the function cannot run
- `Running` - There is at least one job running that is using this function
- `Completed` - The function has returned FALSE for "RunAgain" and is not available for execution
Execution Process
Submission
A flow is sent for execution by a client application sending a `Submission`, containing a reference to the compiled flow manifest, to the runner application.
Loading
All functions are loaded as they are read from the flow manifest. If they refer to library functions, then they are loaded from the library reference (either a pre-loaded native implementation or a WASM implementation).
If they are WASM implementations supplied by the flow itself, then they are also loaded.
Initialization
Any functions with "Input Initializers" (of "Once" or "Constant" type) will have the relevant inputs initialized with the specified value.
This may satisfy the function's need for input data, in which case it will be set into the `Ready` state.
Since the function's input is now full, this may cause a block on other functions pending to send to that input.
Some Context functions that have no inputs (e.g. the stdin "readline") may be placed immediately into the `Ready` state (they are always ready until they return FALSE to "RunAgain").
Now, the execution loop is started.
Execution Loop
A function in the `Ready` state is selected to run (depending on the `ExecutionStrategy` discussed above).
A Job is created using the function's available input values and is sent for execution.
- this may unblock another function that was blocked sending to this function because its input was full

Execution of jobs by "pure" functions is non-blocking by nature, as their execution depends only on their input values (which are in the job) and the availability of an executor to run them on. So, once they start executing, they complete as soon as possible.
A blocking wait on completed jobs is performed. For each completed job that is received:
- any output value in the result (whole, or a part of it selected using an "Output Route") is made available to the inputs of connected functions
- this may satisfy those functions' inputs, causing them to transition to the `Ready` state

If the function has an "Always" ("Constant") initializer on any of its inputs, it is run, possibly refilling one or more of its inputs.
According to the availability of data at its inputs and its ability to send to its outputs, a function may transition to the `Ready`, `Waiting` (for inputs) or `Blocked` (on sending) state. If a function returns `DontRunAgain` then it will be moved to the `Completed` state. This is used in particular for "impure" functions (such as `readline`), so that when they read EndOfFile (and running them again makes no sense) they may complete, and when their outputs are processed this may cause the entire flow to complete. As such a function has no inputs, it would otherwise always be `Ready`, always be rerun, and "livelock" would occur: the flow would never end.
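The following is a heavily simplified, single-threaded sketch of this loop, with illustrative stand-in types; it omits the `Blocked` state (destination input fullness) and the parallel dispatch to executors that a real runner performs. Note that a function with no inputs is always ready here until completed, which is exactly the livelock the `Completed` state avoids:

```rust
use serde_json::Value;
use std::collections::VecDeque;

// Illustrative stand-ins for the real runtime types.
struct Function {
    inputs: Vec<VecDeque<Value>>,                           // values queued at each input
    implementation: fn(&[Value]) -> (Option<Value>, bool),  // returns (output, run_again)
    destinations: Vec<(usize, usize)>,                      // (function index, input index)
    completed: bool,
}

// A function is Ready when it is not Completed and every input has a value.
fn is_ready(f: &Function) -> bool {
    !f.completed && f.inputs.iter().all(|q| !q.is_empty())
}

// Single-threaded sketch: run Ready functions until none are left.
fn execution_loop(functions: &mut Vec<Function>) {
    while let Some(id) = functions.iter().position(is_ready) {
        // Create a "job" by taking one value from each input.
        let inputs: Vec<Value> = functions[id]
            .inputs
            .iter_mut()
            .map(|q| q.pop_front().expect("ready implies a value at every input"))
            .collect();
        let (output, run_again) = (functions[id].implementation)(&inputs);
        if !run_again {
            functions[id].completed = true; // the Completed state
        }
        // Send a copy of the output to every connected destination input.
        if let Some(value) = output {
            for (dest, input) in functions[id].destinations.clone() {
                functions[dest].inputs[input].push_back(value.clone());
            }
        }
    }
    // The loop, and so the flow, terminates when nothing is Ready.
}
```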
Termination
The execution of a flow terminates when there are no functions left in the `Ready` state to execute.
Depending on the options used and the runner, this may cause the output of some statistics, the unloading of loaded objects, and either runner program exit or a return to waiting for a `Submission`, whereupon the whole process starts again.
Flow Execution State Machine
States
Prior to initialization, all functions will be in the `Initial` state.
The Initialization step described below is run, after which all functions will be in one or more of the following states (see the `State` struct in `run_state.rs`):
- `Ready` - Inputs are satisfied, the output destinations are free and it can be run
- `Blocked` - One or more of the destination inputs this function sends to is full, blocking execution
- `Waiting` - One or more of the inputs lacks data, so the function cannot run
- `Running` - There is at least one job running that is using this function
- `Completed` - The function has returned FALSE for "RunAgain" and is not available for execution
Events that cause state changes
The following top-level events trigger evaluation of a function's state using the state variables, and may cause a function to transition to a new state:
- `NewJob` - a new job is created by the runner for a specific function, taking values from its inputs. This will cause the flow containing the function in the job to also be marked as busy, preventing functions from outside the flow sending to it while it is busy. Functions inside the same flow that were previously blocked sending to this function are now unblocked, as the inputs are available to send to (unblock_internal_flow_senders). Functions from outside the flow attempting to send to it are blocked by "flow_blocks" that are removed when the flow goes idle later (as all the functions within it go idle).
- `JobDone` - a job that was using a function completes, returning a result to the runner that includes the `run_again` value. This may cause a change of state in the function that was running in the job, and via the `ValueReceived` event below it may also affect other functions it sends values to.
- `ValueReceived` - a function receives a value on one of its inputs, caused either by:
  - an "Input Initializer" on the function being run
  - a value sent to it from another function on `JobDone`
  - a value sent to it from itself upon `JobDone` (loopback)
- `UnBlock` - previously, a function was blocked from running because a destination it sends to had its inputs full. That destination function ran and its inputs were freed, and the sender is not blocked on any other destination, so it can now be unblocked. Multiple functions may have been blocked sending to the function used in the job, so this can produce multiple `UnBlock`s.
State Variables
State variables for a function can be calculated at any time, based on the state of its inputs and of other functions. They are used to determine the next state that a function should transition to when an event occurs:
- `needs_input` - the function has at least one input that has no data on it, and so the function cannot run
- `output_blocked` - the function has at least one destination input that is full, so it cannot send a result value to that destination, and hence the function cannot run
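Computed over illustrative types, these two variables could look like:

```rust
use serde_json::Value;

/// needs_input: at least one of this function's inputs has no data queued.
fn needs_input(inputs: &[Vec<Value>]) -> bool {
    inputs.iter().any(|queue| queue.is_empty())
}

/// output_blocked: at least one destination input is already full, so a
/// result could not be delivered there.
fn output_blocked(destination_input_full: &[bool]) -> bool {
    destination_input_full.iter().any(|&full| full)
}
```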
State Transitions
An event may cause the affected functions to transition to new states, based on their state variables:
- `NewJob`
  - `Ready` --> `Running` - the function used in the job transitions to `Running`
- `JobDone` (job_done)
  - `Running` --> `Completed` - !`run_again`
  - `Running` --> `Waiting` - `run_again` && `needs_input`
  - `Running` --> `Blocked` - `run_again` && !`needs_input` && `output_blocked` (make_ready_or_blocked)
  - `Running` --> `Ready` - `run_again` && !`needs_input` && !`output_blocked` (make_ready_or_blocked)
- `ValueReceived` - a function receives a value on one of its inputs (send_a_value)
  - `Waiting` --> `Waiting` - `needs_input`
  - `Waiting` --> `Blocked` - !`needs_input` && `output_blocked` (make_ready_or_blocked)
  - `Blocked` --> `Blocked` - !`needs_input` && `output_blocked` (make_ready_or_blocked)
  - `Waiting` --> `Ready` - !`needs_input` && !`output_blocked` (make_ready_or_blocked)
- `UnBlock` - (remove_blocks) <-- (unblock_flows, unblock_internal_flow_senders) <-- (job_done)
  - `Blocked` --> `Ready`
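A condensed, illustrative Rust rendering of the `JobDone` and `ValueReceived` transitions above (not the actual code in `run_state.rs`):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum State { Initial, Ready, Blocked, Waiting, Running, Completed }

// Transition applied when a job using the function completes.
fn job_done(run_again: bool, needs_input: bool, output_blocked: bool) -> State {
    match (run_again, needs_input, output_blocked) {
        (false, _, _) => State::Completed,
        (true, true, _) => State::Waiting,
        (true, false, true) => State::Blocked,
        (true, false, false) => State::Ready,
    }
}

// Transition applied when a value arrives on one of the function's inputs.
fn value_received(current: State, needs_input: bool, output_blocked: bool) -> State {
    match (current, needs_input, output_blocked) {
        (State::Waiting, true, _) => State::Waiting,
        (State::Waiting | State::Blocked, false, true) => State::Blocked,
        (State::Waiting, false, false) => State::Ready,
        (s, _, _) => s, // other states are unaffected by this event alone
    }
}
```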
State Transition Diagram
+---------+
| Initial |
+---------+
|
|ValueReceived (via InputInitializer)
v
UnBlock +---------+ ValueReceived
+---------------> | Ready |<--------------------+
| +---------+ |
| ^ | |
| JobDone| |NewJob |
+---------+ | | +---------+
| Blocked |<------------|-------|-------------------| Waiting |
+---------+ | | +---------+
^ | | ^
| | v |
| JobDone +---------+ JobDone |
+-----------------| Running |-----------------------+
+---------+
|
|JobDone
v
+---------+
|Completed|
+---------+