Processing multiple files

Pipelines v1.6

 

    

Method 1


 

This first method can be used only when the pipeline writes exactly one output record for each input record that it reads. The pipeline must not reorder the records or change their number; it can, however, translate (modify) the contents of each record.
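Why this constraint matters: in this method, > takes one filename from its secondary input for each record on its primary input, so the two streams are matched purely by position. The Python sketch below is a simplified model, not the Pipelines implementation. The function `pair_streams` and the sample data are hypothetical, and the model ignores buffering and any stalls a real pipeline might hit when its streams fall out of step.

```python
def pair_streams(filenames, records):
    """Pair each record with the filename at the same position, as > sees them."""
    return list(zip(filenames, records))


# One filename record per data record: two records from a.txt, one from b.txt.
names = ["a.txt", "a.txt", "b.txt"]
records = ["a-line1", "a-line2", "b-line1"]

# A one-to-one translation (upper-casing) keeps every record next to its filename.
translated = [r.upper() for r in records]
print(pair_streams(names, translated))
# [('a.txt', 'A-LINE1'), ('a.txt', 'A-LINE2'), ('b.txt', 'B-LINE1')]

# Dropping one record shifts the data stream: b.txt's record is paired with a.txt.
filtered = [r for r in records if r != "a-line2"]
print(pair_streams(names, filtered))
# [('a.txt', 'a-line1'), ('a.txt', 'b-line1')]
```

Translating records is harmless, but removing a single record sends data from b.txt to a.txt.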

 

The following example shows how to write a pipeline that processes multiple files.

 

pipe (endchar ?)
     filelist noh       .* Generate the list of input files.
     | specs w5-* 1     .* Discard the file stats.
     | a: <             .* Open and read each file; route the filename to the secondary output stream.
     | specs w2-* 1     .* Discard the record line number.
 
     .* ...
     .* ...
     .* Do your processing here.
     .* You can put multi-stream input and output intersection labels on stages that route
     .* the records into and out of this section, as long as those stages do not alter the
     .* number of records or their sequence.
     .* ...
     .* ... 
 
     | b: >             .* Write the records to the file specified in the secondary input stream.
     ?
     a:
     | take *           .* Select all the records.
     | b:               .* Route to the secondary input stream of >.
 

The pipeline works as follows:

 

Each time the FILELIST stage finds a file, it writes a record that contains the name of that file to its primary output stream. After SPECS has discarded the file statistics, the < stage reads this record, opens the file it names for input, and begins reading records. For each record that < reads, it first writes a record containing the same input filename to its secondary output stream and then writes the file record to its primary output stream.
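The ordering that < produces can be modelled in a few lines of Python. This is a hypothetical sketch, not how Pipelines implements <. `read_files` and the in-memory `files` dictionary are illustrative stand-ins for the stage and the disk. The line-number word that the example strips with `specs w2-* 1` is not modelled.

```python
def read_files(filenames, files):
    """Model of < with a connected secondary output.

    Yields (stream, record) pairs in the order described: for every record
    read, the filename goes to the secondary output first, then the record
    itself goes to the primary output. `files` maps each filename to its
    list of records (an in-memory stand-in for the disk).
    """
    for name in filenames:
        for record in files[name]:
            yield ("secondary", name)
            yield ("primary", record)


files = {"a.txt": ["alpha", "beta"], "b.txt": ["gamma"]}
for event in read_files(["a.txt", "b.txt"], files):
    print(event)
```

This prints six events. Each filename record comes immediately before the data record it belongs to, so a.txt is announced twice and b.txt once.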

 

Between < and > you place the stages that operate on the records; they must conform to the constraints described above. The last of these stages must write its records to its primary output stream.

 

The > stage then reads a record from its secondary input stream: the filename record that < wrote to its secondary output stream. This record names the file to write to. If the filename differs from the previous one, > closes the current output file and opens the new one. Then > reads a record from its primary input stream and writes it to the output file.
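The switching logic of > can be sketched the same way. This Python model is hypothetical: `write_files` stands in for the stage, a dictionary stands in for the output files, and it assumes that > replaces (rather than appends to) a file when it opens it.

```python
def write_files(events):
    """Model of > reading its secondary input (filename) and primary input (record).

    `events` is a sequence of ("secondary", filename) and ("primary", record)
    pairs, each filename preceding the record it applies to. Returns the
    output "files" as a dict and a log of the open/close actions taken.
    """
    outputs = {}
    log = []
    current = None    # file currently open for output, if any
    requested = None  # filename most recently read from the secondary input
    for stream, value in events:
        if stream == "secondary":
            requested = value
            continue
        # A primary record: switch files only when the filename has changed.
        if requested != current:
            if current is not None:
                log.append(("close", current))
            current = requested
            log.append(("open", current))
            outputs[current] = []  # assumption: > replaces the file's contents
        outputs[current].append(value)
    if current is not None:
        log.append(("close", current))
    return outputs, log


events = [
    ("secondary", "a.txt"), ("primary", "ALPHA"),
    ("secondary", "a.txt"), ("primary", "BETA"),
    ("secondary", "b.txt"), ("primary", "GAMMA"),
]
outputs, log = write_files(events)
print(outputs)  # {'a.txt': ['ALPHA', 'BETA'], 'b.txt': ['GAMMA']}
print(log)      # [('open', 'a.txt'), ('close', 'a.txt'), ('open', 'b.txt'), ('close', 'b.txt')]
```

Because the model switches files only when the name changes, a.txt is opened once even though its name arrives twice.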

 

This process continues until FILELIST finds no more files and terminates, which causes the pipeline to end.

 

    

Method 2

 

The second method lets you process multiple files with stages that change the number and sequence of the records: you can sort, split, or discard records, or introduce new ones. To do this, however, you need to write two separate pipelines.

 

Consider the two pipelines, list.ppl and format.ppl, shown below.

 

Each time the FILELIST stage in the first pipeline, list.ppl, finds a file, it writes a record that contains the name of that file to its primary output stream. The SPECS stage reads this record, isolates the filename, encloses it in quotation marks ("), and writes the modified record to its primary output stream. Finally, the RUNPIPE stage reads this record from its primary input stream and launches the specified pipeline, format.ppl, with the record as its command-line argument.
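What `specs /"/ 1 w5-* n /"/ n` does to one record can be approximated in Python. This is a rough model with several assumptions. It assumes that words 1 to 4 of a FILELIST record are file statistics, as the "Discard the file stats" comment in Method 1 implies. It assumes that everything from word 5 onward is the filename. The sample record is invented. Presumably the quotes are added so that a filename containing blanks reaches format.ppl as a single argument. `quote_filename` is a hypothetical name.

```python
def quote_filename(record):
    """Rough model of: specs /"/ 1 w5-* n /"/ n

    Assumes words 1-4 are file statistics and the rest of the record is the
    filename, which may itself contain blanks.
    """
    parts = record.split(None, 4)  # at most 5 pieces: 4 stat words + the rest
    name = parts[4].rstrip()       # a word range ends at its last word
    return '"' + name + '"'        # N places each item right after the previous one


# Hypothetical FILELIST record; the real field layout may differ.
rec = "2024-01-05 10:22:31 1024 A C:\\My Files\\notes.txt"
print(quote_filename(rec))  # "C:\My Files\notes.txt"
```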

 

In the second pipeline, format.ppl, the placeholder &arg1 is replaced with the command-line argument, which is the name of the file to update.
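Placeholder substitution can be modelled as a text replacement performed before the pipeline runs. This Python sketch is illustrative only. `substitute_args` is hypothetical. It assumes the arguments have already been separated from the command line, and it does not try to reproduce how Pipelines treats the surrounding quotation marks.

```python
import re


def substitute_args(pipeline_text, args):
    """Replace each &argN placeholder with the Nth argument (1-based)."""
    # A replacement function (not a replacement string) inserts the argument
    # literally, so backslashes in Windows paths are not treated as escapes.
    return re.sub(r"&arg(\d+)", lambda m: args[int(m.group(1)) - 1], pipeline_text)


template = "pipe < &arg1 | > &arg1"
print(substitute_args(template, ["notes.txt"]))
# pipe < notes.txt | > notes.txt
```

Matching `\d+` means a multi-digit placeholder such as `&arg10` is read as one number. A plain replacement of the string `&arg1` would get that case wrong.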

 

list.ppl
 
pipe filelist noh
     | specs /"/ 1 w5-* n /"/ n
     | runpipe format.ppl
 
format.ppl
 
pipe < &arg1
 
     .* ...
     .* ...
     .* Do your processing here.
     .* You can put multi-stream input and output intersection labels on stages that route
     .* the records into and out of this section.
     .* ...
     .* ... 
 
     | > &arg1
 

A pipeline launched by the RUNPIPE stage (with the default WAIT operand) always passes its return code back to the calling pipeline. This lets you build a chain of pipelines that unwinds when any one of them fails with an error return code.
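The unwinding behaviour can be pictured with a small Python model. The model assumes that the calling pipeline stops at the first nonzero return code and passes that code up. `run_chain` stands in for list.ppl. `fake_format` stands in for format.ppl launched through RUNPIPE with WAIT. Both names and the return code 8 are arbitrary and hypothetical.

```python
def run_chain(filenames, run_child):
    """Model of a caller that launches one child pipeline per record and waits.

    run_child(name) returns the child's return code. The first nonzero code
    stops the caller, and that code becomes the caller's own return code.
    """
    for name in filenames:
        rc = run_child(name)
        if rc != 0:
            return rc
    return 0


def fake_format(name):
    # Hypothetical child pipeline: fails with an arbitrary code for one file.
    return 8 if name == "missing.txt" else 0


print(run_chain(["a.txt", "b.txt"], fake_format))                 # 0
print(run_chain(["a.txt", "missing.txt", "b.txt"], fake_format))  # 8
```

In the failing call, b.txt is never processed, because the nonzero code ends the loop before its child would run.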