Pipelines by TenFiftyTwo
v1.6
Pipelines is inspired by and based upon CMS Pipelines, an enterprise systems utility originally designed and developed by John Hartmann of IBM.
Pipelines allows you to modify the contents of one or more text or data files, quickly and easily.
You can specify that only certain sections of a file are to be changed; you can
confine those changes to a column, word or field range, translate words and
phrases, and discard or insert lines of data. You can perform a whole range of
operations on a file or files, using only a simple set of commands. You may
find Pipelines useful for data mining or for updating extremely large log files:
searching for and replacing values based on simple pattern matching or on complex
regular expressions, in reverse record order if necessary. A pipeline can call
third-party WIN32 programs and issue system (CMD) and PowerShell commands,
capturing their console output in order to operate on the data. You can connect
multi-purpose pipelines together to quickly construct an on-the-fly solution to
a wide range of transformation problems that might otherwise consume a great
deal of your time.
Pipelines builds on the concept of directing the output of one process to the
input of another, commonly known as pipelining. It is an old idea, and almost
all operating systems support an implementation of it, with varying degrees of
usefulness. In general they support the linear, single-stream model: if you lay
the processes out in a straight line, data starts in the first process, passes
into the next, where it is changed in some way, and so on down the pipeline
chain in sequential fashion until it reaches a sink.
For example:
stage1 | stage2 | stage3 | ... | stagen
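The linear model can be sketched in Python with generators, where each stage pulls records from the stage before it. This is only an illustration of the idea, not how Pipelines itself is implemented; the stage names used here (`source`, `upper`, `number`, `sink`) are made up for the example.

```python
def source(lines):
    """First stage: emit records one at a time."""
    for line in lines:
        yield line


def upper(records):
    """Transformation stage: change each record in some way."""
    for record in records:
        yield record.upper()


def number(records):
    """Another transformation stage: prefix each record with its position."""
    for n, record in enumerate(records, start=1):
        yield f"{n}: {record}"


def sink(records):
    """Final stage: collect whatever reaches the end of the chain."""
    return list(records)


# Equivalent in spirit to: source | upper | number | sink
result = sink(number(upper(source(["alpha", "beta"]))))
```

Each function only ever sees the output of its left-hand neighbour, which is the single-stream restriction described above.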
Pipelines extends this mechanism, allowing you to create multi-stream pipelines,
where the topology is no longer horizontal and linear but two-dimensional:
records travel up and down the pipeline chain through intersecting joints which
control the flow of data. Multi-stream pipelines allow you to select and operate
on specific sets of records, routing unselected records through a joint into and
out of other sections of the pipeline.
● Pipelines treats its input data as lines or records, reading them one at a
time from its input and writing them one at a time to its output. As such,
unless the entire input needs to be loaded into memory, Pipelines consumes only
a fraction of the memory that might otherwise be required, because only a
handful of records are in the pipeline at any one time.
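The effect of record-at-a-time processing can be sketched in Python. The example below is an analogy rather than a description of Pipelines internals: `records` is a hypothetical unbounded input, and `change` is a stand-in transformation written for this sketch.

```python
from itertools import count, islice


def records():
    """Hypothetical unbounded input: each record is produced on demand."""
    for n in count(1):
        yield f"record {n}"


def change(stream, old, new):
    """Rewrite one record at a time; nothing is held back or buffered."""
    for record in stream:
        yield record.replace(old, new)


# Only the three records actually requested are ever created, even though
# the input has no end.
first_three = list(islice(change(records(), "record", "line"), 3))
```

Because no stage asks for more than the record it is working on, the size of the input has no bearing on memory use.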
● Pipelines allows you to operate on files of any size in a single pass,
isolating sections of the file without having to needlessly buffer or sort the
data simply in order to maintain the relative record order.
Consider the following simple pipeline which, using only 6 stages, reads the
file myfile.txt and, in a single pass, changes the word hello to goodbye only in
records that contain the word friend.
pipe (endchar ?)
  < myfile.txt
  | a: locate 'friend'
  | change 'hello' 'goodbye'
  | b: faninany
  | > myfile.txt
? a: | take *
  | b:
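The behaviour of this pipeline can be approximated in Python as follows. This is a simplified sketch, not the Pipelines implementation: the two streams are modelled as a selected/unselected flag on each record, and the function names merely echo the stage names above.

```python
def locate(records, target):
    """Tag each record: True if it contains target (primary stream),
    False otherwise (the secondary stream that bypasses the change)."""
    for record in records:
        yield (target in record), record


def change_and_rejoin(tagged, old, new):
    """Apply the change to selected records only, then merge both streams
    back together in their original relative order."""
    for selected, record in tagged:
        yield record.replace(old, new) if selected else record


lines = [
    "hello friend",
    "hello stranger",
    "my friend says hello",
]
result = list(change_and_rejoin(locate(lines, "friend"), "hello", "goodbye"))
```

Records without friend pass through untouched, and because every record is handled as it arrives, the output order matches the input order without any sorting.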
● Pipelines comprises a range of input, output, selection and transformation
stages which provide a number of useful manipulation functions, including
splitting records, stripping characters, joining records, collating, sorting
and more. On the whole, similar operations are performed by a single stage,
which means that you do not have to remember the names of an unnecessarily
lengthy list of stages.
For example, to strip characters from a record, Pipelines provides a single
stage called STRIP, which removes characters from the beginning and/or the end
of a record.
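The idea of one stage covering several related operations can be sketched in Python. The function `strip_record` and its `where` and `chars` parameters are hypothetical names chosen for this illustration; they are not the actual operands of the STRIP stage.

```python
def strip_record(record, where="both", chars=" "):
    """Remove the given characters from the start, the end, or both ends
    of a record, depending on a single option."""
    if where == "leading":
        return record.lstrip(chars)
    if where == "trailing":
        return record.rstrip(chars)
    if where == "both":
        return record.strip(chars)
    raise ValueError(f"unknown option: {where}")


both = strip_record("  abc  ")
leading = strip_record("  abc  ", where="leading")
trailing_zeros = strip_record("1000", where="trailing", chars="0")
```

A single entry point with an option replaces three separately named operations, which is the design choice described above.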
● With Pipelines, the pipeline can be specified on the system command line
(CMD), in a batch file or in a Pipelines file (extension .PPL).
You design the pipeline in your favourite editor and save it; to execute the
pipeline you simply double-click the file icon and Pipelines will launch it.
You can specify pipelines which accept arguments that substitute stage operands
and even stage names; coupled with the capability to connect pipelines
together, this allows you to build a range of utility pipelines that can be
called upon whenever you need them.
● Pipelines is general purpose; it has not been developed with any particular
field in mind. It is simply a line/record-oriented text processing utility that
is useful for manipulating data. The design of Pipelines is essentially a
compromise between speed and flexibility. A bespoke, dedicated program may
outperform Pipelines. However, with a dedicated program, each time your
requirements change you must alter the source code (and, if it is compiled,
rebuild it as well). This is not a problem when the program is small or simple.
But when we start to talk about pattern, field, word and column selection,
recursive sorting, collating, splitting and joining records from multiple input
files, possibly large files, then we have a different scenario. Pipelines is
designed with this type of processing in mind; it is intended to offer a quick
and efficient processing utility that can help you manipulate data into a
format that suits your needs.
● The latest version of Pipelines, 1.6, includes a stage command API for Visual
Studio/VC++ (VS8 VC++ 9). The API provides all the initialisation and runtime
routines that support the current built-in stage set, comprising stage command
parsing, runtime extraction and stream connection routines, together with a
stage DLL project wizard which creates a fully functional skeletal Pipelines
stage DLL.
You may find Pipelines of use in cases where you might otherwise have to write
a program to solve the problem, and it may well save you some time and effort
that could be better spent on other tasks. Pipelines is free; there are no
evaluation caveats, and you may download it and use it as you please.
Pipelines is designed and maintained by James Laing; if you have any questions
or comments, please contact TenFiftyTwo.