Pipelines by TenFiftyTwo
v2.0
Pipelines is inspired by and based upon CMS Pipelines, an enterprise systems
utility originally designed and developed by IBM.
Pipelines, which runs under ooRexx, lets you modify the contents of one or
more text or data files quickly and easily. You can specify that only certain
sections of a file are to be changed; confine those changes to a column, word
or field range; translate words and phrases; and discard or insert lines of
data. You can perform a whole range of operations on a file or files using
only a simple set of commands. You may find Pipelines useful for data-mining
or updating extremely large log files, or for searching for and replacing
values based on simple pattern matching or complex regular expressions, in
reverse record order if necessary. A pipeline can call third-party WIN32
programs and issue system (CMD) and PowerShell commands, capturing console
output in order to operate on the data. You can connect multi-purpose
pipelines together to quickly construct an on-the-fly solution to a wide range
of transformation problems that might otherwise consume a great deal of your
time.
Pipelines builds on the concept of directing the output of one process to the
input of another, commonly known as pipelining. It is an old idea, and almost
all operating systems support an implementation of varying usefulness. In
general they support the linear, single-stream model: if you lay the processes
out in a straight line, data starts in the first process and passes into the
next, where it is changed in some way, and so on down the pipeline in sequence
until it reaches a sink. For example:
stage1 | stage2 | stage3 | ... | stagen
Pipelines extends this mechanism by allowing you to create multi-stream
pipelines, where the topology is no longer horizontal and linear but
two-dimensional: records travel up and down the pipeline chain through
intersecting joints which control the flow of data. Multi-stream pipelines
allow you to select and operate on specific sets of records, routing
unselected records through a joint into and out of other sections of the
pipeline.

● Pipelines treats its input data as lines or records, reading them one at a
time from its input and writing them one at a time to its output. As a result,
unless the entire input needs to be loaded into memory, Pipelines consumes
only a fraction of the memory that might otherwise be required, because only a
handful of records are in the pipeline at any one time.
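
The record-at-a-time behaviour described above can be illustrated outside Pipelines. The following is a minimal Python sketch, not Pipelines syntax; the function names `numbered_records`, `to_upper` and `take` are hypothetical and exist only for this illustration. Each stage is a generator, so a record is produced only when the next stage asks for it.

```python
from itertools import islice

def numbered_records(count):
    """Source stage: produce records lazily instead of building a list."""
    for n in range(1, count + 1):
        yield f"record {n}"

def to_upper(records):
    """Transformation stage: upper-case each record as it passes through."""
    for record in records:
        yield record.upper()

def take(records, n):
    """Selection stage: pass on the first n records, then stop reading."""
    return islice(records, n)

# Ten million records are described, but never held in memory together:
# only the records actually requested by the final stage are generated.
pipeline = take(to_upper(numbered_records(10_000_000)), 3)
first_three = list(pipeline)
print(first_three)
```

Because the final stage stops after three records, the source stage never generates the remaining ones, which is the property that keeps memory use low.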

● Pipelines allows you to operate on files of any size in a single pass,
isolating sections of the file without having to buffer or sort the data
merely to maintain the relative record order. Consider the following simple
pipeline which, using only six stages, reads the file myfile.txt and, in a
single pass, changes the word hello to goodbye only in records that contain
the word friend.

   /* Change hello to goodbye only in records that contain friend */
   Address Rxpipe

   'pipe (endchar ?)',
      '< myfile.txt',
      '| a: locate /friend/',
      '| change /hello/ /goodbye/',
      '| b: faninany',
      '| > myfile.txt',
      '?',
      'a:',
      '| take *',
      '| b:'

   Exit 0
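
To make the data flow of the pipeline above easier to follow, here is a minimal Python sketch of the stated behaviour. It is not Pipelines code, and the function name `locate_change_merge` is hypothetical. It assumes, as the description says, that records containing the search word are changed, that all other records pass through untouched, and that the original record order is preserved when the two streams are merged. The two streams are collapsed into a single if/else for brevity.

```python
def locate_change_merge(records, needle, old, new):
    """Model the two-stream pipeline as one pass over the records."""
    for record in records:
        if needle in record:
            # Primary stream: selected records are transformed.
            yield record.replace(old, new)
        else:
            # Secondary stream: unselected records pass through unchanged.
            yield record

lines = [
    "hello friend",
    "hello stranger",
    "my friend says hello",
]
result = list(locate_change_merge(lines, "friend", "hello", "goodbye"))
print(result)
```

Only the first and third records contain friend, so only those have hello replaced; the second record is emitted unchanged and in its original position.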

● Pipelines comprises a range of input, output, selection and transformation
stages which provide a number of useful manipulation functions, including
splitting records, stripping characters, joining records, collating, sorting
and more. On the whole, similar operations are performed by a single stage,
which means that you do not have to remember the names of an unnecessarily
lengthy list of stages. For example, to strip characters from a record,
Pipelines provides a single stage called STRIP, which removes characters from
the beginning and/or the end of a record.

● Pipelines also has a number of sub-commands (PEEKTO, READTO and OUTPUT)
that can be used to create ooRexx scripts that work as user-defined stages.
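
The shape of a user-defined stage can be sketched in Python under one loudly stated assumption: that these sub-commands behave like their CMS Pipelines namesakes, where PEEKTO inspects the next input record without consuming it, READTO consumes it, and OUTPUT writes a record to the output stream. The class `StageIO` and the function `reverse_stage` are hypothetical and are not part of Pipelines.

```python
from collections import deque

class StageIO:
    """Hypothetical stand-in for the streams a user-defined stage sees."""

    def __init__(self, records):
        self._input = deque(records)
        self.written = []

    def peekto(self):
        """Return the next input record without consuming it (None at end)."""
        return self._input[0] if self._input else None

    def readto(self):
        """Consume and return the next input record (None at end)."""
        return self._input.popleft() if self._input else None

    def output(self, record):
        """Write one record to the output stream."""
        self.written.append(record)

def reverse_stage(stage):
    """Example user-defined stage: write each record reversed."""
    while stage.peekto() is not None:
        record = stage.peekto()
        stage.output(record[::-1])
        stage.readto()  # consume the record only after it has been written

stage_io = StageIO(["abc", "hello"])
reverse_stage(stage_io)
print(stage_io.written)
```

Peeking first and consuming last is the ordering commonly used in CMS Pipelines stages; whether this implementation requires the same ordering is not stated in the text above.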

● Pipelines is general purpose; it has not been developed with any particular
field in mind. It is simply a line/record orientated text processing utility
that is useful for manipulating data. The design of Pipelines is essentially a
compromise between speed and flexibility. A bespoke, dedicated program may
out-perform Pipelines. However, with a dedicated program, each change in your
requirements means altering the source code (and, if it is compiled,
re-building it as well). This is not a problem when the program is small or
simple. But when we start to talk about pattern, field, word and column
selection, recursive sorting, collating, and splitting and joining records
from multiple, possibly large, input files, then we have a different scenario.
Pipelines is designed with this type of processing in mind; it is intended to
offer a quick and efficient processing utility that can help you manipulate
data into a format that suits your needs.

● Pipelines itself is extensible; it includes a Visual Studio/VC++ (VS8 VC++
9) stage command API library which contains all the stage initialisation
parsing functions and runtime extraction routines that support the current set
of built-in stage filters. The API allows you to create new stage DLLs that
augment the current built-in set. The API addresses most of the needs that a
stage might reasonably have: console locking and synchronisation, multi-stream
connectivity, multiple column, word and field isolation, pre-process
functionality, character range expansions, input and output record
availability and more. Pipelines ships with DEBUG and RELEASE versions of the
API library. The Pipelines stage command API uses the Microsoft Foundation
Class (MFC) CString class extensively, and other MFC-specific classes under
the covers as and when required.

● Pipelines supports third-party non-API WIN32 console applications/modules
through the SHELLEXECUTE stage command. SHELLEXECUTE will load and service any
WIN32 application, reading input records from that process's STDOUT and STDERR
I/O streams and writing them to the SHELLEXECUTE stage's primary and secondary
output streams, respectively.
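
The idea of keeping a child process's STDOUT and STDERR apart as two record streams can be shown with a short Python sketch. This is not how SHELLEXECUTE is implemented; the function name `run_and_capture` is hypothetical, and the child process here is simply the Python interpreter itself, chosen so that the example is self-contained.

```python
import subprocess
import sys

def run_and_capture(argv):
    """Run a console program and return (stdout records, stderr records)."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    primary = completed.stdout.splitlines()    # analogous to the primary stream
    secondary = completed.stderr.splitlines()  # analogous to the secondary stream
    return primary, secondary

# The child writes one line to each of its standard streams.
child = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]
primary, secondary = run_and_capture(child)
print(primary, secondary)
```

Unlike a streaming stage, `subprocess.run` waits for the child to finish before returning, so this sketch shows only the separation of the two streams, not record-at-a-time delivery.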

● Pipelines provides a convenient and easy way to create a new ooRexx script:
simply right-click anywhere on your desktop or within a folder to access the
‘New->Pipelines file’ option. Selecting this option creates a very simple
skeleton ooRexx file with the extension .REX. File associations under Windows
can be troublesome, especially when you try to rename a file by extension;
using this method, you can create a new ooRexx file with the minimum of
effort.

You may well find that ooRexx/Pipelines will help you solve your problem
quickly and easily, saving you time and effort that could be better spent on
other tasks. Pipelines is free; there are no evaluation caveats, and you may
download it and use it as you please.
Pipelines is designed and maintained by James Laing. If you have any questions
or comments, please contact TenFiftyTwo.