Distributed execution of jobs with flowrcli and flowrex
Job Dispatch and Job Execution
The flowrlib library that is used by flow runner applications to execute a flow has two important functions:
- job dispatch - manages the state of the flow, dispatches jobs for execution, and distributes the results received back, passing them on to other functions in the flow.
- job execution - executes a "pure" function: given a set of input data and a reference to the function's implementation, it runs the implementation with the provided inputs and returns the job including its results.
Job dispatch is done by the server thread running the coordinator, which is responsible for maintaining a consistent state for the flow and its functions, coordinating the distribution of results, and enabling new functions to be run.
Additional threads are started for job execution, allowing many jobs to be executed concurrently, and in parallel on a multi-core machine. Because the functions are "pure", each job can be executed in isolation, needing only the input data and the function implementation.
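This split can be illustrated with a minimal sketch, using hypothetical types rather than flowrlib's actual API: a dispatcher sends self-contained jobs over a channel to an executor thread, which runs the function implementation on the inputs and sends the result back.

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative types only, not flowrlib's actual ones.
// A job carries its input data and a reference to its implementation.
struct Job {
    id: usize,
    inputs: Vec<i64>,
    implementation: fn(&[i64]) -> i64,
}

struct JobResult {
    id: usize,
    output: i64,
}

// An example "pure" implementation: depends only on its inputs.
fn sum(inputs: &[i64]) -> i64 {
    inputs.iter().sum()
}

fn main() {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (result_tx, result_rx) = mpsc::channel::<JobResult>();

    // Executor thread: runs jobs in isolation, needing only the
    // input data and the function implementation.
    let executor = thread::spawn(move || {
        for job in job_rx {
            let output = (job.implementation)(&job.inputs);
            result_tx.send(JobResult { id: job.id, output }).unwrap();
        }
    });

    // Dispatcher side: decide which jobs are ready, send them for execution,
    // then distribute the results back into the flow's state.
    job_tx
        .send(Job { id: 0, inputs: vec![1, 2], implementation: sum })
        .unwrap();
    drop(job_tx); // no more jobs; the executor's loop ends

    for result in result_rx {
        println!("job {} produced {}", result.id, result.output);
    }
    executor.join().unwrap();
}
```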
Normal Execution
Normally, the flowrcli process runs the coordinator in one thread and a number of executors in additional threads.
However, due to the "pure" nature of job execution, jobs can be executed anywhere: in additional processes on the same machine, or in processes on other machines.
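Conceptually, for a job to cross a process or machine boundary it must carry its input data with it and refer to its implementation by name rather than by an in-process pointer, so that the remote executor can resolve and run it. The sketch below illustrates that idea with serde and JSON; the RemoteJob type, the lib:// reference string and the wire format are illustrative assumptions, not flow's actual protocol.

```rust
use serde::{Deserialize, Serialize};

// Illustrative only: a job that can cross process or machine boundaries
// references its implementation by name and carries all its input data.
#[derive(Serialize, Deserialize)]
struct RemoteJob {
    id: usize,
    implementation_ref: String, // e.g. a library reference (hypothetical here)
    inputs: Vec<serde_json::Value>,
}

// A remote executor resolves the reference to a local implementation
// and runs it with the supplied inputs.
fn execute(job: &RemoteJob) -> serde_json::Value {
    match job.implementation_ref.as_str() {
        "lib://flowstdlib/math/add" => {
            let sum: f64 = job.inputs.iter().filter_map(|v| v.as_f64()).sum();
            serde_json::json!(sum)
        }
        _ => serde_json::Value::Null,
    }
}

fn main() {
    let job = RemoteJob {
        id: 1,
        implementation_ref: "lib://flowstdlib/math/add".into(),
        inputs: vec![serde_json::json!(1.0), serde_json::json!(2.0)],
    };
    // Serialize, send over the network, deserialize on the other side...
    let bytes = serde_json::to_vec(&job).unwrap();
    let received: RemoteJob = serde_json::from_slice(&bytes).unwrap();
    println!("job {} -> {}", received.id, execute(&received));
}
```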
flowrex executor binary
flowrex is an additional small binary that is built as part of the project.
It cannot coordinate the execution of a flow, but it can execute jobs (only library jobs, for now).
Additional instances of flowrex can be started in other processes on the same machine to execute some of the jobs, increasing the compute resources available and the concurrency/parallelism of flow execution.
It is possible to start flowrcli with 0 executor threads and force flowrex to execute all the (library) jobs.
flowrex can also be run on another node on the network, even one with a different architecture such as ARM, and have job execution done entirely by it or shared with flowrcli.
How many jobs are executed in one process or machine versus another depends on the number of executors in each, and on network and CPU speed.
The flowrcli flow runner and the flowrex job executor discover each other using mDNS, and then jobs are distributed out over the network and results are sent back, also over the network, to the coordinator running in flowrcli.
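The sketch below shows the general shape of such discovery using a plain UDP beacon listener: the executor waits on a well-known port until it receives a beacon naming the expected service, and then knows the address of the job dispatcher. The port number and message format are illustrative assumptions; flow's actual discovery mechanism differs in detail.

```rust
use std::net::UdpSocket;

// Illustrative sketch of beacon-based discovery (not flow's actual protocol).
const BEACON_PORT: u16 = 9002; // hypothetical port
const SERVICE_NAME: &str = "jobs._flowr._tcp.local"; // matches the log line shown below

fn main() -> std::io::Result<()> {
    // The executor listens on a well-known port for beacons from a dispatcher.
    let socket = UdpSocket::bind(("0.0.0.0", BEACON_PORT))?;
    let mut buf = [0u8; 1024];
    println!("Waiting for beacon matching '{}'", SERVICE_NAME);
    loop {
        let (len, sender) = socket.recv_from(&mut buf)?;
        let message = String::from_utf8_lossy(&buf[..len]);
        if message.contains(SERVICE_NAME) {
            // At this point the executor knows where the coordinator is and
            // could connect back to it to start accepting jobs over the network.
            println!("Discovered job dispatcher at {}", sender);
            break;
        }
    }
    Ok(())
}
```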
TODO
It is pending to allow flowrex to also execute provided functions, by distributing the architecture-neutral WASM function implementations to other nodes and hence allowing them to load and run those functions too.
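As an illustration of what that would involve, the sketch below loads a WASM module (whose bytes would arrive over the network in that scenario) and calls an exported function. It assumes the wasmtime crate and a module exporting an add function; flow's actual WASM runtime and function ABI may differ.

```rust
use wasmtime::{Engine, Instance, Module, Store};

// Hypothetical sketch: how a remote executor could load an architecture-neutral
// WASM implementation and call a function in it.
fn main() -> anyhow::Result<()> {
    let engine = Engine::default();

    // In the scenario described above, these bytes would arrive from the
    // coordinator rather than being read from a local file.
    let wasm_bytes = std::fs::read("add.wasm")?;
    let module = Module::new(&engine, &wasm_bytes)?;

    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;

    // Assumes the module exports `add(i32, i32) -> i32`.
    let add = instance.get_typed_func::<(i32, i32), i32>(&mut store, "add")?;
    let result = add.call(&mut store, (2, 3))?;
    println!("add(2, 3) = {}", result);
    Ok(())
}
```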
Example of distributed execution
This can be done in two terminals on the same machine, or across two machines of the same or different CPU architecture.
Terminal 1
Start an instance of flowrex that will wait for jobs to execute.
(we start it with the debug logging level to see what's happening)
> flowrex -v debug
The log output should end with:
INFO - Waiting for beacon matching 'jobs._flowr._tcp.local'
indicating that it is waiting to discover the flowrcli process on the network.
Terminal 2
First let's compile the fibonacci example (but not run it) by using flowc with the -c, --compile option:
> flowc -c -C flowr/src/bin/flowrcli flowr/examples/fibonacci
Let's check that worked:
> ls flowr/examples/fibonacci/manifest.json
flowr/examples/fibonacci/manifest.json
Then let's run the example fibonacci flow, forcing zero executor threads so that we see flowrex executing all the (non-context) jobs:
> flowr -t 0 flowr/examples/fibonacci
That will produce the usual fibonacci series on the STDOUT of Terminal 2, and then flowrcli will exit.
Logs of the work done to execute the flow's jobs will be produced in Terminal 1, ending with the same line as before:
INFO - Waiting for beacon matching 'jobs._flowr._tcp.local'
indicating that it has returned to its initial state and is ready to discover a new dispatcher of jobs on the network.