Programming Methods
`flow` provides the following facilities to help programmers create flows:
Encapsulation
Functionality can be encapsulated within a function at the lowest level by implementing it in code, defining the function via a function definition file that declares its inputs and outputs, and describing the functionality it provides in an associated markdown file.
Sets of functions that combine to provide some defined functionality can be grouped together and connected in a graph to form a flow, described in a flow definition file. This flow's functionality is defined via its inputs and outputs, just like a function, and described in an associated markdown file.
Flow definitions in turn can reference and incorporate other flows, alongside functions, until the desired functionality is reached.
Thus functionality is encapsulated via a "process" definition file, where a "process" can be either a function or a flow.
The mechanism for referencing a process in a flow definition file is common to both types; in fact, the flow does not "know" whether the referenced process is implemented as a function or a flow. At a later date the sub-process's implementation should be able to be changed from a function to a flow (or vice versa) with no semantic difference, no change required in the program that uses it, and no impact on its execution result.
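For example, a flow definition might reference a process as sketched below. The flow name and the `reverser` process are hypothetical, and the top-level key is an assumption based on the `[[process]]` syntax shown later in this section; the point is only that the reference reads the same whether `reverser` is implemented as a function or as a flow.

```toml
# Hypothetical flow definition referencing a process.
# The reference below is written the same way whether "reverser"
# is implemented as a function or as a flow.
flow = "reverse_echo"

[[process]]
source = "reverser.toml"   # could point at a function definition or a flow definition
```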
Semantics of Output sending
When a job executes and its results are received by the runtime, the output values (if any) are sent on to the destination functions immediately, before any other job's results are processed and before any new jobs are created or dispatched.
A job's outputs are all handled, and sent to their destinations, at the same time.
Value deserialization
If the output of a job's function is, say, an array of numbers (`array/number`) and it is connected in the flow graph to another function whose input is of type `number`, then that array may be deserialized into a stream of numbers and sent to the destination one after another (all while the job result is being processed).
This can mean that the destination function's input rapidly gathers a number of values that can be used in job creation.
The values are sent in the order of their appearance in the "higher order" structure (the array) that contains them.
Value wrapping
Conversely, if the output value is of a lower order than the destination (say a `number` being sent to an input that accepts `array/number`), then the runtime may "wrap" the single value in an array and send it to the destination.
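A minimal sketch of such a connection follows. The `generator` and `printer` processes and their route names are hypothetical, and the `[[connection]]` syntax is an assumption based on the `[[process]]` syntax used elsewhere in this document.

```toml
# Hypothetical fragment: "generator" outputs array/number, "printer"
# accepts a single number on its input.
[[process]]
source = "generator.toml"

[[process]]
source = "printer.toml"

# The runtime may deserialize the array output into a stream of numbers
# sent one after another to "printer". In the converse case (a number
# sent to an array/number input) it may wrap the value in an array.
[[connection]]
from = "generator/output"
to = "printer/input"
```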
Job Creation
Jobs are created by gathering a set of input values from a function's inputs. The job is put into the `ready_jobs` queue with the values and a reference to the function's implementation.
The order of values at a function's inputs is the order in which the values were sent to those inputs. Job creation respects this order, so the order of job creation for a function follows the order of values sent to that function.
When creating jobs, a runtime may decide to create as many jobs as can be created, increasing the potential for parallel execution later.
Thus, for a stream of deserialized values at a function's input, the runtime may attempt to maximize parallelization and create as many jobs as there are complete sets of input values. The order of the jobs created will follow the order of the deserialized stream.
Job Dispatch
Different jobs for the same function are independent of each other. They will be dispatched in the order of job creation (which follows the order of input value arrival).
When dispatching jobs, a runtime can decide to dispatch as many jobs as possible, in order to increase the potential for parallel execution of the jobs, or it can limit the number dispatched.
Thus, many jobs may be created from the deserialized stream, but the order of the jobs will follow the order of the deserialized stream.
Job Completion Order and Determinism
Jobs may be executed by the same or a different executor, on the same or a different machine, with the same or a different CPU architecture, with jobs being sent and results being received back over the network.
Thus, the order of job completion is not guaranteed to match the order of job creation.
In the deserialized stream case, the order may be lost. Thus algorithms that exploit this parallelism in execution, but that need to preserve the order of the stream for some reason, may have to handle ordering themselves (e.g. adding an index to each value and later combining results using that index).
The order of a flow or sub-flow's output is determined by the data dependencies of the flow expressed in the graph.
Examples of ways to create determinism are:
- the fibonacci example's use of a feedback connection, so that one value is used in the calculation of the next value, thus guaranteeing the order of the series.
- the sequence example's use of a "data flow control" function (`join`) to ensure that a string is not sent to the `stdout` function until a specific condition (end-of-sequence) is met:

```toml
# Output a string to show we're done when the Sequence ends
[[process]]
source = "lib://flowstdlib/control/join"
input.data = {once = "Sequence done"}
```
In imperative, procedural programming we often either assume, or can rely on, order, such as the order of execution of statements within a for loop. But with `flow` and its focus on concurrency this is much less so. A series of jobs (similar to the for loop example) may be created to calculate a number of values, but they may all be generated at once (or soon after each other) and executed in parallel, with the calculations completing out of order.
Also, in flow libraries such as `flowstdlib`, some functions are written differently from what you might expect: they don't assume order, and their results may differ from what you expect. This is reflected in the naming of functions too. For example, `sequence` is named carefully to communicate that the values are generated in a specific order, whereas the `range` function does not guarantee order, only that all the numbers in the range will be output. Thus it may generate the numbers in the range out of order, unlike what one would expect from a procedural language.
Re-use
`flow` provides a number of mechanisms to help re-use, namely:
- definition and implementation of a function once, which can then be incorporated into any number of flows via a process reference
- definition of a flow, combining sub-flows and/or functions, which can then be incorporated into any number of other flows via a process reference
- definition of portable libraries containing flows and/or functions that can be shared between programmers and incorporated into any number of flows via process references
Connection "branching"
As described in more detail in connections, a connection within a re-used flow to one of its outputs can be "branched" into multiple connections to multiple destinations when the flow is compiled, without altering the definition of the original flow.
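A sketch of how this might look in an outer flow follows. The process names are hypothetical and the `[[connection]]` syntax is an assumption; the re-used `counter` flow's own definition is not changed by being connected to two destinations.

```toml
# Hypothetical outer flow re-using a "counter" flow. The single connection
# inside "counter" to its output is "branched" at compile time so the value
# reaches both destinations, without altering "counter" itself.
[[process]]
source = "counter.toml"

[[process]]
source = "printer.toml"

[[process]]
source = "logger.toml"

[[connection]]
from = "counter/output"
to = "printer/input"

[[connection]]
from = "counter/output"
to = "logger/input"
```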
Control flow via Data flow
In `flow`, everything is dataflow, and dataflow is everything. There are no other mechanisms to produce values, or coordinate activity. There are no loops, if-then-else or other logic control mechanisms.
The `flowstdlib` library provides the `control` module with a series of functions and flows that you can use to control the flow of data, and hence program "control". These include functions such as `join`, used in the looping example below.
Looping
Looping is not a natural construct in `flow`. Looking at how we would translate some uses of loops from a procedural language to `flow` might illustrate things.
For example, to perform an action or calculation 'n' times, we might well generate a range of 'n' values, create a process that does the desired action or calculation, and then combine the two with a 'data flow control' function such as `join`. Thus, the action/calculation can only produce an output for use downstream 'n' times, triggered (possibly all in parallel) by the 'n' values that "gate" its output.
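A sketch of that "gating" arrangement might look something like the following. The `lib://flowstdlib/control/join` path and its `data` input appear earlier in this section, but the `range` path, the `calc` process and the other route names are assumptions for illustration.

```toml
# Hypothetical sketch: gate the output of "calc" with "join" so a result
# only flows downstream once per value produced by "range".
[[process]]
source = "lib://flowstdlib/math/range"   # assumed path

[[process]]
source = "calc.toml"                     # the action or calculation

[[process]]
source = "lib://flowstdlib/control/join"

[[connection]]
from = "calc/result"     # assumed route name
to = "join/data"

[[connection]]
from = "range/number"    # the 'n' gating values (assumed route name)
to = "join/control"      # assumed input name
```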
Accumulating
In procedural programming a loop can be used to accumulate a value (such as the total of the values in an array).
In `flow` there is no global state and no variables that persist for a function across multiple invocations of it.
The mechanism we use to do this in `flow` is to use the `add` function, initializing one input `once` with zero, sending values to the other input, and looping the output (the partial sum) back to the first input, so that the sum (initialized to zero) is accumulated as values flow through it.
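A sketch of that accumulator follows, assuming a `lib://flowstdlib/math/add` path and `i1`/`i2`/`sum` route names (both assumptions); the `once` initializer syntax is taken from the `join` example above.

```toml
# Hypothetical accumulator: "add" with one input initialized once to zero
# and the partial sum fed back into it.
[[process]]
source = "lib://flowstdlib/math/add"   # assumed path
input.i1 = {once = 0}                  # running total starts at zero

[[connection]]
from = "add/sum"   # feed the partial sum back...
to = "add/i1"      # ...into the first input

# values to be totalled are sent to "add/i2" from elsewhere in the flow
```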
The same technique can be used to gather values into "chunks" of a determined size. One input of `accumulate` is initialized with an empty array (`[]`), the other input receives the elements to gather, and the array of elements gathered so far is fed back, and so on until the desired size of chunk is accumulated.
Nested Loops
What would be a nested for loop in a procedural program can be implemented by putting two flows in series, with one feeding the other.
For example, in the sequence-of-sequences example, a first instance of a `sequence` flow generates a series of "limits" for the sequence of sequences to count up to. A value for the start of each sequence, and the series of sequence limits, are fed into another instance of `sequence`. This second instance generates a sequence each time it receives a set of inputs specifying the start and end of the sequence.
- a first sequence is defined with start=1, end=10, step=1 and hence generates: 1..10
- a second sequence is defined:
  - its start input is initialized always to 0
  - its step input is initialized always to 1
  - a connection is defined from the output of the first sequence to the `end` input of the second sequence
- thus it generates 0,1,0,1,2,0,1,2,3 ... ending 0,1,2,3,4,5,6,7,8,9,10
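A sketch of the two sequences in series might look like this; the library path for `sequence`, the `alias` key, the `always` initializer and the route names are assumptions for illustration.

```toml
# Hypothetical sketch of one sequence feeding another.
[[process]]
alias = "outer"
source = "lib://flowstdlib/math/sequence"   # assumed path
input.start = {once = 1}
input.end = {once = 10}
input.step = {once = 1}

[[process]]
alias = "inner"
source = "lib://flowstdlib/math/sequence"
input.start = {always = 0}                  # assumed 'always' initializer
input.step = {always = 1}

[[connection]]
from = "outer/number"   # each limit from the outer sequence...
to = "inner/end"        # ...becomes the end of a new inner sequence
```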
Wrapping processes for convenience
Another mechanism used for convenience (it can shorten written flows) is to have a simple flow wrap a function or process for a common use case, perhaps initializing an input with a pre-defined value or creating feedback loops around the process to create a specific behaviour.
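For instance, the accumulator pattern above could be wrapped in a small flow so it can be re-used via a single process reference. This is only a sketch: the top-level key, the path to `add` and the route names are assumptions, and the declaration of the wrapper's own inputs and outputs is omitted.

```toml
# Hypothetical "tally" wrapper flow: the add-with-feedback accumulator
# packaged for re-use via a single process reference.
flow = "tally"

[[process]]
source = "lib://flowstdlib/math/add"   # assumed path
input.i1 = {once = 0}                  # pre-initialized for the common use case

[[connection]]
from = "add/sum"
to = "add/i1"                          # feedback loop baked into the wrapper

# (declaring the wrapper flow's own input and output is omitted here)
```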