Gripple: a graphical user interface to the BIC pipelining system - design ideas

Pipelining systems as a whole lend themselves very well to a graphical representation. Most pipelines are, in fact, initially designed as a rough flow-chart which describes the data-flow that is desired in that pipeline. It would therefore seem useful to allow the user to structure the pipeline in the graphical way that is inherently intuitive for pipelining systems, and to then let the software take care of the details of code generation. This page therefore contains some ideas, hopefully coherent, about how such a user interface to a pipelining system might look. As such, it is still under construction and new ideas will hopefully be added on a daily basis!

Feature list and requirements

There are some basic features that such a programme would need. The following is an inherently incomplete and unordered list, but it is a list nonetheless.

Basic Goal: The basic goal of the programme is to allow the user to create a pipelining system by modeling the data-flow in a graphical fashion. As such it will need the ability to represent all of the tools (such as mincresample, mritotal, etc.), to determine variables (filenames, optional arguments ...), and to connect the variables and tools together.

System independence: Ideally, the graphical interface would be entirely separate from the actual pipelining system. The communication between the two should take place using some combination of pipes, scripts, and CORBA. A change in the underlying process control system would therefore only require that a new translator be written, which would (ideally) be feasible without changing the core code of the GUI, just by creating a new plugin for the underlying system.
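
One way the translator idea might look in code is sketched below. Everything in it is an assumption made for illustration: the Translator base class, the TRANSLATORS registry, the register decorator, and the "shell" plugin are hypothetical names, and the filenames in the usage note are placeholders. The point is only that the GUI core would talk to the abstract interface, and each process control system would supply its own subclass.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping a backend name to its translator class.
TRANSLATORS = {}


def register(name):
    """Class decorator that records a translator plugin under the given name."""
    def decorator(cls):
        TRANSLATORS[name] = cls
        return cls
    return decorator


class Translator(ABC):
    """Interface the GUI core would depend on (an assumption, not a fixed design)."""

    @abstractmethod
    def emit(self, stages):
        """Turn a list of stages (each a list of command words) into backend code."""


@register("shell")
class ShellTranslator(Translator):
    """Toy backend: one shell command line per stage."""

    def emit(self, stages):
        return "\n".join(" ".join(words) for words in stages)
```

Calling TRANSLATORS["shell"]().emit([["mritotal", "native.mnc", "native_tal.xfm"]]) returns a single command line; supporting a different underlying system would mean registering another subclass, with no change to the code that calls emit.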

Tool Builder: The programme would also need to separate the design of individual pipelines from the design of pipeline stages. This would allow for individual stages to be created (using some form of a tool builder) with all the necessary inputs and outputs representing the required files and options. A stage created in such a fashion could then be dropped onto the canvas of the pipeline being designed at the moment. Moreover, each stage created in that way could be reused in different pipelines; the only thing that would change would be the way in which the stages are connected together (and connected with the variables representing filenames and options) on the canvas of that pipeline.

Pipeline reuse: The designing of tools should be modular enough to allow for whole pipelines to be considered a tool. One could, for example, build a pipeline which takes care of talairach registration and resampling, and then create a tool out of that pipeline to be reused in a larger pipeline.

Connecting stages: The connection of different pipelining stages should allow for two separate options: (1) the connection of a variable, i.e. an entire filename which will be left untouched; and (2) the connection with an output of a different stage, in which case some predefined variable substitution would be called for. For example, if the native file (mni_icbm_00201_t1.mnc.gz) is connected to mritotal, the output would be mni_icbm_00201_t1_tal.xfm. Furthermore, connecting the inputs and outputs in this fashion allows the programme to establish data dependencies and prerequisites ... though these can be overridden manually if the user so desires.
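
The substitution in that example could be sketched roughly as follows. The rule itself (strip a known extension, append a per-stage suffix and a new extension) is an assumption, as are the names KNOWN_EXTENSIONS and substitute_output; only the input and output filenames come from the example above.

```python
# Assumption: these are the input extensions a stage should strip.
KNOWN_EXTENSIONS = (".mnc.gz", ".mnc")


def substitute_output(input_name, suffix, extension):
    """Derive an output filename from an input filename.

    Strips the first matching known extension, then appends the
    stage-specific suffix and the new extension.
    """
    base = input_name
    for ext in KNOWN_EXTENSIONS:
        if base.endswith(ext):
            base = base[: -len(ext)]
            break
    return f"{base}{suffix}{extension}"


print(substitute_output("mni_icbm_00201_t1.mnc.gz", "_tal", ".xfm"))
# prints mni_icbm_00201_t1_tal.xfm
```

A variable connection (option 1) would bypass this function entirely and pass the filename through untouched.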

Output directories: Along with the variable name one ought to be able to connect an output directory to each output port of a stage.

DSID builder: Another necessary component of a graphical pipeline system would be a dataset-id builder. One ought to be able to set up a strategy of parsing the dsid from an input (usually a filename, though the system would have to be able to handle different kinds of input as well), as well as actually generate a list of dsids with which to run the pipeline.
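
As a minimal sketch of one such parsing strategy, the snippet below assumes that the dsid is the run of digits following an "mni_icbm_" prefix, as in the filename mni_icbm_00201_t1.mnc.gz used elsewhere on this page. That convention, the pattern, and the function names are all assumptions; a real dsid builder would let the user define the strategy.

```python
import re

# Assumption: the dsid is the digit run right after the "mni_icbm_" prefix.
DSID_PATTERN = re.compile(r"^mni_icbm_(?P<dsid>\d+)_")


def parse_dsid(filename):
    """Return the dsid embedded in a filename, or None if it does not match."""
    match = DSID_PATTERN.match(filename)
    return match.group("dsid") if match else None


def build_dsid_list(filenames):
    """Collect the distinct dsids found in a list of filenames, in first-seen order."""
    dsids = []
    for name in filenames:
        dsid = parse_dsid(name)
        if dsid is not None and dsid not in dsids:
            dsids.append(dsid)
    return dsids
```

Given the filenames mni_icbm_00201_t1.mnc.gz, mni_icbm_00201_t2.mnc.gz and mni_icbm_00202_t1.mnc.gz, build_dsid_list would yield the two ids 00201 and 00202.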

Pipeline test runs: The graphical interface will also be able to visually display the data running through the pipeline, allowing the user to see exactly what filenames are generated and how the data progresses through the pipeline.

"Live" monitoring of data: The graphical system would also be able to show you which datasets are at what point in a running pipeline. Moreover, this should allow one to rerun a specified ID only from a particular point.

Implementation Issues

The most difficult part of the above project will be implementing the canvas to be used in the pipeline design. There will be a need for a series of efficient algorithms to match the inputs and outputs of pipeline stages based on drawing lines between the two, the ability to correctly redraw all of the wires if one or more of the tools have been moved on the canvas, and a way of determining data dependencies based on the wiring of stages. So here's the beginning of a strategy to answer those concerns.

Each item on the canvas will be represented by an xy coordinate, a type, and a name. For example, you might have, at location 35 70, a "stage" which is "mincresample". Each stage will have a series of connector knobs protruding from it, the number and location of which are to be determined by the type of tool. Before I continue on this thread it might be a good idea to sidetrack onto how tools will be represented.

Each tool will effectively represent a graphical front-end to a command-line programme. The gripple front end will therefore have to have the ability to pass all files and options on to the command-line programme that it is wrapping. These tools will be definable using their own series of dialogs, and the end result will be a defining tag for every possible infile, outfile, and option. Along with defining the possibilities, the tool-designer can also set default options and variable substitutions.

When a tool that has been defined in this way is placed on the pipelining canvas, the structure of the tool will be parsed and the appropriate connection knobs will be generated. The designer of the pipeline can then connect variables to options or inputs, and connect the outputs of one stage to the input of another. The canvas will have to include some visual cues to clearly communicate to the designer exactly which knobs of each stage are being connected together.
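
A tool definition and the knobs derived from it might be modelled along these lines. This is a sketch under stated assumptions: the Port and ToolDefinition classes, the left/right placement rule, and the port tags given to mritotal ("native", "xfm", "model") are invented for illustration and are not the tool's real argument names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Port:
    """One defining tag of a tool: an infile, an outfile, or an option."""
    tag: str
    kind: str                      # "infile", "outfile", or "option"
    required: bool = True
    default: Optional[str] = None  # default value set by the tool designer


@dataclass
class ToolDefinition:
    name: str
    ports: List[Port] = field(default_factory=list)

    def knobs(self) -> List[Tuple[str, str]]:
        """Generate one knob per port.

        Assumed layout rule: outputs sit on the right edge of the stage,
        inputs and options on the left.
        """
        return [("right" if p.kind == "outfile" else "left", p.tag)
                for p in self.ports]


# Hypothetical definition; the tags are placeholders, not real mritotal arguments.
mritotal = ToolDefinition("mritotal", [
    Port("native", "infile"),
    Port("xfm", "outfile"),
    Port("model", "option", required=False),
])
```

Dropping this definition on the canvas would then amount to calling mritotal.knobs() and drawing one connector per returned pair.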

The connecting wires will therefore be designated by a start-point (knob x of tool y), intermediate xy coordinates (since just being able to draw straight lines would look hideous), and an endpoint (knob a of tool b). The colour of the wire will depend on what kind of data it is carrying.
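
The item and wire representations described so far could be sketched as the data structures below. The class names, the colour table, and the knob names in the usage note are assumptions. Storing each wire end as a (tool, knob) reference rather than as a coordinate is one possible answer to the redraw concern: when a tool moves, only the knob position lookup changes and the wire itself is untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[int, int]
KnobRef = Tuple[str, str]  # (tool name, knob name)


@dataclass
class CanvasItem:
    x: int
    y: int
    type: str  # e.g. "stage" or "variable"
    name: str


@dataclass
class Wire:
    start: KnobRef
    end: KnobRef
    waypoints: List[Point] = field(default_factory=list)  # intermediate xy coordinates
    data_kind: str = "file"


# Hypothetical mapping from the kind of data carried to a wire colour.
WIRE_COLOURS = {"file": "blue", "option": "green", "transform": "orange"}


def wire_colour(wire: Wire) -> str:
    return WIRE_COLOURS.get(wire.data_kind, "black")


def polyline(wire: Wire, knob_positions: Dict[KnobRef, Point]) -> List[Point]:
    """Resolve a wire into the points to draw, using current knob positions."""
    return [knob_positions[wire.start], *wire.waypoints, knob_positions[wire.end]]
```

For instance, a wire from ("mritotal", "xfm") to ("mincresample", "transformation") with one waypoint resolves to three points, and resolves to different end points after either tool is moved, without the wire being edited.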

Aside from tools, the designer will also have the option of placing variables on the canvas, in order to be able to represent hard-coded options and files. For example, one might want a hardcoded file variable to designate the model used for mincresample. These variables can naturally also be wired to tools.

Upon completion of the design of a pipeline, the canvas has to be parsed along the connecting wires to establish the exact data flow. This is also the stage where warnings can be emitted if required options have not been linked, etc. If all of the requirements have been met, the final output can be generated.
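
One plausible way to do this parsing is a topological sort over the stages, with a wire from stage A to stage B meaning that B depends on A. The sketch below uses Python's standard graphlib module (Python 3.9 or later) for the sort; the analyse function, its argument shapes, and the knob names in the example are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def analyse(wires, required_inputs):
    """Derive an execution order and warnings from the wiring.

    wires: list of ((source item, source knob), (target stage, target knob)).
    required_inputs: dict mapping each stage to the set of knobs that must be wired.
    Items that are not keys of required_inputs are treated as variables,
    which satisfy a knob but create no dependency.
    """
    predecessors = {stage: set() for stage in required_inputs}
    connected = {stage: set() for stage in required_inputs}
    for (src, _src_knob), (dst, dst_knob) in wires:
        connected.setdefault(dst, set()).add(dst_knob)
        predecessors.setdefault(dst, set())
        if src in required_inputs:
            predecessors[dst].add(src)
    warnings = [
        f"{stage}: required knob '{knob}' is not connected"
        for stage, knobs in required_inputs.items()
        for knob in sorted(knobs - connected[stage])
    ]
    # static_order() raises graphlib.CycleError if the wiring contains a loop.
    order = list(TopologicalSorter(predecessors).static_order())
    return order, warnings


# Hypothetical wiring: a file variable feeds both stages, mritotal feeds mincresample.
wires = [
    (("native_file", "value"), ("mritotal", "native")),
    (("mritotal", "xfm"), ("mincresample", "transformation")),
    (("native_file", "value"), ("mincresample", "infile")),
]
required = {
    "mritotal": {"native"},
    "mincresample": {"infile", "transformation", "like"},
}
order, warnings = analyse(wires, required)
```

Here order places mritotal before mincresample, and warnings contains a single entry noting that the 'like' knob of mincresample was left unconnected.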

The final output will in all likelihood be a structured XML document, as that format is designed for representing structured documents (which a pipeline essentially is) while remaining human-readable.
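
To give a feel for what such a document might contain, here is a minimal sketch that serialises stages and wires with Python's standard xml.etree.ElementTree module. The element and attribute names (pipeline, stage, wire, from, to, tool, knob) are purely hypothetical, since no schema has been decided, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def pipeline_to_xml(name, stages, wires):
    """Serialise a pipeline to an XML string (hypothetical schema).

    stages: list of dicts with "name", "x", and "y" keys.
    wires: list of ((source tool, source knob), (target tool, target knob)).
    """
    root = ET.Element("pipeline", name=name)
    for stage in stages:
        ET.SubElement(root, "stage", name=stage["name"],
                      x=str(stage["x"]), y=str(stage["y"]))
    for (src, src_knob), (dst, dst_knob) in wires:
        wire = ET.SubElement(root, "wire")
        ET.SubElement(wire, "from", tool=src, knob=src_knob)
        ET.SubElement(wire, "to", tool=dst, knob=dst_knob)
    ET.indent(root)  # pretty-print so the file stays human-readable
    return ET.tostring(root, encoding="unicode")


xml_text = pipeline_to_xml(
    "talairach",
    [{"name": "mritotal", "x": 10, "y": 40},
     {"name": "mincresample", "x": 35, "y": 70}],
    [(("mritotal", "xfm"), ("mincresample", "transformation"))],
)
```

A translator plugin for a given process control system could then read this document instead of inspecting the canvas directly.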

Gripple will also include a run-time representation of the dataflow, which will come in two flavours: real-time monitoring as well as simulation. For both of these cases the monitoring can take place for a single dsid as well as for the entire dataset (or a specified subset thereof). Simulation would graphically show (through the use of floating "tooltips" as well as highlighting of wires and tools) how the data would flow, whereas real-time monitoring would check the status of each dsid using some form of plugin (remember the desired goal of actual system independence) and represent the status in the same graphical fashion.