FRI architecture
FRI is a declarative language. The FRI virtual machine's workspace is composed of three items: an input character stream, an output character stream and a dictionary of rules. A rule is a sequence of bytes that can represent characters to be matched against the input stream, characters to be written to the output stream, or opcodes specifying general actions that can modify either stream, or the dictionary itself, or initiate a new rule. A rule matches if it runs to its end, or fails if an input character or an inner nested rule does not match.
FRI's alphabet is restricted to 7-bit ascii. Bytes with the 8th bit set, when they appear in the dictionary among the characters to be recognized, are interpreted as opcodes specifying a variety of operations.
The input stream is generally attached to an input file to be manipulated, but can also be modified by explicit instructions (opcodes). The output stream is not automatically attached to a file and must be written out explicitly when desired. It is internally subdivided into a stack of strings which can be individually recalled. Every time a rule matches, all of its output strings except the top one are forgotten, while the top string is returned to the calling rule (either appended to its output or prepended to its input).
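The way a matching rule hands back only its top output string can be sketched in Python as follows. This is an illustrative model, not FRI code: the function name finish_rule, the use of a Python list as the output stack, and the choice to return an empty string when the rule opened no strings are all assumptions made for the sketch.

```python
def finish_rule(output_stack, rule_base):
    """Model what happens to the output stack when a rule matches.

    output_stack: list of strings, the last element being the top.
    rule_base: index of the first string opened by the rule that matched
               (hypothetical bookkeeping, not described in the source).
    Returns the string handed back to the calling rule.
    """
    own_strings = output_stack[rule_base:]
    # Everything the rule opened is removed from the stack...
    del output_stack[rule_base:]
    # ...and only its top string survives (empty if none: an assumption).
    return own_strings[-1] if own_strings else ""


stack = ["caller text"]          # strings belonging to the calling rule
base = len(stack)
stack += ["scratch", "result"]   # strings opened by the called rule
returned = finish_rule(stack, base)
```

The caller would then either append the returned string to its own output or prepend it to its input, as described above.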
The dictionary of rules is alphabetically ordered and, with one exception, independent of the order in which the rules are entered. Each new rule inserted in the dictionary gets assigned to a progressively numbered scope. A new scope can be opened at any time by an explicit command and will contain subsequently generated rules, all of which disappear when the scope is eventually closed. A certain number of opcodes with no action are reserved as strands, markers of rules with related meaning. Each rule must start with a strand opcode, but strand opcodes can also appear elsewhere with a user-assigned meaning. If a rule, or more accurately a strand of alternate rules, is viewed as a sort of subroutine, this subroutine can be called by inserting its strand byte in the input stream.
When one rule fails, execution restarts with the next one in the dictionary. If two rules have a common root, only the part of the rule that failed undergoes a new attempt; side effects of the part that matched are not repeated. Execution terminates, as a match, when any one rule's byte sequence reaches its end (or an explicit end opcode) or, as a failure, when the dictionary runs out of rules. For debugging purposes, it can also be terminated with an error message by suitable opcodes.
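The fall-through from one rule to the next can be sketched in Python for the simplest case, where every rule is a plain literal string. This is only a model under stated assumptions: Python's string ordering stands in for the dictionary's alphabetical order, opcodes and their priorities are not modeled, and the function name run is hypothetical. Since literal rules have no side effects, the shared-root optimization described above makes no observable difference here.

```python
def run(rules, text):
    """Try literal-only rules in sorted order.

    Returns (rule, remaining_input) for the first rule that matches the
    start of text, or None when the dictionary runs out of rules.
    """
    for rule in sorted(rules):
        if text.startswith(rule):
            # The rule ran to its end: execution terminates as a match.
            return rule, text[len(rule):]
        # Otherwise the rule failed; move on to the next one.
    return None


rules = ["pear", "apple", "apricot"]
matched = run(rules, "apricot jam")   # "apple" fails first, then "apricot" matches
no_match = run(rules, "plum")         # every rule fails
```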
Language opcodes
Character matching
Symbol pronounced action
ascii char any 7-bit ascii character matches itself.
-- except matches if the following character does not match the input.
- through matches if the input is not alphabetically enclosed (in ascii
order) between the next two characters, excluded.
]= cork (coded as ascii char 127) automatically added to the input
stream every time new input is provided; like all ascii
characters, matches itself.
{ { (ascii code 18) if found in the input stream, groups a block of
zero or more characters up to the next } (ascii code 19) for
copying purposes. Can be nested. Like all ascii characters,
matches itself.
copy copy copies one (non-cork) character or one {}-enclosed block from
the input to the output stream.
∗ copy up to copies zero or more characters or {}-enclosed blocks from
the input to the output stream until the following character
(or a cork) is encountered. The matching character is popped
from the input stream but not written out.
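The copy and copy up to entries can be modeled in Python as below. This is a sketch, not the FRI implementation: printable braces stand in for the block delimiters, the copied block is assumed to include its braces, the terminating character (or cork) is assumed to be consumed, and running off the end of the input or meeting an unbalanced block is treated as failure. The function names are hypothetical.

```python
CORK = "\x7f"  # the cork is coded as ascii char 127


def copy_one(text, pos):
    """Copy one non-cork character or one {}-enclosed (possibly nested) block.

    Returns (copied, new_pos), or None at a cork or at the end of input.
    """
    if pos >= len(text) or text[pos] == CORK:
        return None
    if text[pos] != "{":
        return text[pos], pos + 1
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[pos:i + 1], i + 1
    return None  # unbalanced block: treated as failure (assumption)


def copy_up_to(text, pos, stop):
    """Copy characters or blocks until stop (or a cork) is met.

    The terminating character is consumed but not copied.
    """
    copied = []
    while pos < len(text) and text[pos] not in (stop, CORK):
        step = copy_one(text, pos)
        if step is None:
            return None
        piece, pos = step
        copied.append(piece)
    if pos >= len(text):
        return None  # neither stop nor cork found (assumption: failure)
    return "".join(copied), pos + 1


result = copy_up_to("a{b;c}d;rest", 0, ";")
```

Note that the semicolon inside the block does not stop the copy, because the block is taken as a single unit.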
Character writing
=[ put opens a new string in the output stack and switches from input
to output mode, which will last until the next unescaped opcode
starting with a ], -> or ∗.
ascii char in output mode an ascii character is just written; even opcodes
are written rather than executed, with the exception of the
above output terminators, which restore input mode and are then
executed, and #, ^, delete, UniqueNo, LineNo which are executed
without leaving output mode.
^ escape the character following an escape is written to the output
stream unconditionally, in either mode and even if it is an
opcode. A sequence of $n$ escapes is written as a sequence of
$n-1$ escapes plus the following character, except if the
following character is a null (ascii 0) which is not written.
# number recalls an output string identified by the number that follows,
and appends it to the current output. It can be followed by a
positive number, denoting a stack count backwards from the top,
a negative number, denoting a stack count backwards from the
first string of the present rule, or a symbolic label assigned
by the @ pseudo-opcode.
@ label an assembler pseudo-opcode that assigns the following
alphanumeric string as a label to the current position in the output
stack, so it can later be recalled (within the same rule only)
by name.
] end exits output mode or the current control sequence or the
current rule.
]= cork exits output mode and adjoins the top two output strings,
decreasing the stack count by one.
backspace backspace drops the top string from the output stack.
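The stack effects of put, cork and backspace can be pictured with a Python list whose last element is the top of the output stack. This is only a sketch: the function names are hypothetical, and the order in which cork adjoins the two strings (top string appended to the one below it) is an assumption.

```python
def put(stack, text):
    """Open a new string on top of the output stack."""
    stack.append(text)


def cork(stack):
    """Adjoin the top two strings, decreasing the stack count by one."""
    top = stack.pop()
    stack[-1] += top  # assumption: the top string is appended to the one below


def backspace(stack):
    """Drop the top string from the output stack."""
    stack.pop()


out = []
put(out, "Hello, ")
put(out, "world")
put(out, "scratch")
backspace(out)   # stack is now ["Hello, ", "world"]
cork(out)        # stack is now ["Hello, world"]
```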
Flow control
-> pipe into pops the top output string and prepends it as a new input
extent to the current input, separated by a ]= (cork) byte. The
new input shall either be passed as such to the rest of the
rule, if it starts with a 7-bit ascii character, or be
processed by a new instance of the rule-matching virtual
machine if it starts with an opcode. After the called-up rule
matches, its output becomes the new input and the calling rule
continues, but in case of a later failure the rule-matching
machine is invoked again and again until a match is obtained or
no more rules are available.
convert convert null opcode, matches itself and returns what follows in its
input extent. Typically used before another opcode in a ->
(pipe into) operation, in order for its contents to get a
chance to be examined as input by the rest of the rule before
being ``converted'' by a new rule-matching passage.
] end successfully terminates the current rule, the output mode, or
any of the compound sub-rules described below.
1[ optional executes the enclosed subrule (up to a matching ]) 0 or 1
times, reporting success to the containing rule.
[ repeat repeats the enclosed subrule (up to a matching ]) 0 or more
times until it fails, reporting success to the containing rule
and adjoining each output to the output stream.
r[ recur repeats the enclosed subrule (up to a matching ]) 0 or more
times until it fails, reporting success to the containing rule
and replacing the output stream by each output.
ifnot[ if not succeeds if the enclosed rule fails, and fails if the enclosed
subrule (up to a closing ]) matches, discarding its output.
until[ until succeeds if the enclosed rule fails and fails if the enclosed
subrule (up to a closing ]) matches, saving its output in the
output stream.
# number when used in input mode, recalls a string from the output stack
(by the same numbering scheme as in output mode), and executes
it as a string of bytecodes, as if they appeared at the current
position in the current rule.
cut cut like the Prolog ``cut'', asserts that failure beyond its
position is not intended to occur. If the present rule fails
after a cut, an error will be reported rather than moving on to
the next rule. In addition, cut has lower priority than all
other opcodes except ], so if two rules are identical up to
where a cut appears, the cut rule will be tried last.
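The difference between repeat and recur described above lies only in what happens to the outputs of successive iterations. The Python sketch below models a subrule as a function returning (output, new_position) or None on failure; that interface, the function names, and the guard that stops the loop when an iteration consumes no input are assumptions of the sketch, not documented FRI behavior.

```python
def repeat(subrule, text, pos):
    """Run subrule 0 or more times, adjoining each output."""
    out = ""
    while True:
        step = subrule(text, pos)
        # Stop on failure; also stop if nothing was consumed (sketch-only guard).
        if step is None or step[1] == pos:
            return out, pos  # success is always reported
        piece, pos = step
        out += piece


def recur(subrule, text, pos):
    """Run subrule 0 or more times, each output replacing the previous one."""
    out = ""
    while True:
        step = subrule(text, pos)
        if step is None or step[1] == pos:
            return out, pos
        out, pos = step


def upper_letter(text, pos):
    """Example subrule: match one lowercase letter, output it in uppercase."""
    if pos < len(text) and text[pos].islower():
        return text[pos].upper(), pos + 1
    return None


adjoined = repeat(upper_letter, "abc1", 0)
replaced = recur(upper_letter, "abc1", 0)
```

Both calls consume the same three characters; repeat keeps all three outputs while recur keeps only the last.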
Dictionary manipulation
]-> save saves the top output string to the dictionary as a new rule, in
the same scope as rules starting with the following opcode, or
in the current scope if the following character is null.
openscope open scope opens a new scope.
closescope close scope closes the current scope and erases all rules generated
since the last openscope.
delete delete deletes from the dictionary all rules that start with the byte
sequence contained in the top output string and belong in the
current scope.
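The scope mechanism can be sketched in Python with a small class that tags each rule with the scope in which it was saved. The class and method names are hypothetical, rules are modeled as plain strings, and the variant of save that targets the scope of an existing rule is not modeled.

```python
class Dictionary:
    """Toy model of the rule dictionary with numbered scopes."""

    def __init__(self):
        self.scope = 0
        self.rules = []  # list of (scope, rule) pairs

    def save(self, rule):
        """Save a rule in the current scope."""
        self.rules.append((self.scope, rule))

    def open_scope(self):
        self.scope += 1

    def close_scope(self):
        """Erase every rule generated since the matching open_scope."""
        if self.scope == 0:
            raise RuntimeError("no open scope to close")
        self.rules = [(s, r) for s, r in self.rules if s < self.scope]
        self.scope -= 1

    def delete(self, prefix):
        """Delete current-scope rules that start with prefix."""
        self.rules = [
            (s, r) for s, r in self.rules
            if not (s == self.scope and r.startswith(prefix))
        ]


d = Dictionary()
d.save("A")
d.open_scope()
d.save("B1")
d.save("B2")
d.delete("B1")        # removes only "B1", which is in the current scope
inner = list(d.rules)
d.close_scope()       # "B2" disappears with its scope
```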
File handling
open open opens a file for output, taking its name from the top output
string.
close close closes the current output file, resuming output to the
previously open one.
]->out write out writes the top output string to the current output file and
pops it off the output stack.
include include opens a file for input, taking its name from the top output
string, and includes it at the current position in the input
stream, followed by a ]= (cork). When the end of this file is
reached, it is closed and the previous input resumes
automatically.
closein close input forces closing the current input file, resuming input from
where it was in the previously opened one.
Shortcut opcodes
spaces spaces matches a sequence of zero or more blank characters (spaces and
tabs).
∗]= copy all copies all remaining characters from the last input extent to
the output stream, reading but not copying the next cork.
Nearly equivalent to ∗ ]=, but notice that the boundary of an
input extent is internally marked, so ∗]= will copy all of it
even if it contains an internal ]= (cork).
=[] put end opens an empty new string in the output stack and goes back to
input mode; single opcode equivalent to =[ ].
do do starts execution of the rule that begins with the next bytecode
in a new virtual machine; replaces a single-byte -> (pipe).
]> append appends the top output string to the rule that starts with the
following bytecode. Useful for the temporary storage of
arbitrary strings.
get get a do (do) replacement that only executes a rule composed of a
single =[ (put), presumably created by a sequence of ]>
(append) operations, and deletes the rule at the same time. It
does its job by reference, which for long strings speeds up
execution.
Debugging
]->error cast error writes the top output string to stderr and terminates
execution in error.
trace trace toggles in and out of tracing mode, where an interactive
animation of the rule-matching process is displayed on the
controlling terminal.
LineNo line number writes the current input-file line number to the output
stream.
list-> list extracts from the dictionary all the rules that start with the
byte sequence provided in the top output string, and prepends
them to the input one after the other, each one as a new input
extent separated by a ]= (cork), with an empty input extent
terminating the whole list.
wrsym write symbol converts the opcode provided in the top output character to
its symbolic name.
display display displays the top output string on the controlling terminal.
Miscellaneous, less frequently used operations
## scope number only valid in output mode, recalls the current scope
identifier so it can later be used in a ]-> (save) operation.
=> force write appends everything that follows to the output stream in an
8-bit safe way, including any opcodes that might terminate a
normal =[ (put).
AssignStrand assign strand decrements an internal counter and outputs the next
available unused null opcode (designated as a rule strand).
UniqueNo unique number increments and outputs (in decimal ascii) the
internal Unique Number counter, made available in order to
assign unambiguous identifiers to variables.
pushNo push number pushes the value of the Unique Number counter (as a binary
integer) onto the output stream.
popNo pop number pops the value of the Unique Number counter (as a binary
integer) from the output stream.
writeimage write image writes a binary image of the whole dictionary to the
output file (so it can later be quickly reloaded rather than retranslated).
readimage read image reads in a binary dictionary image prepared by
writeimage.
younger younger matches if the current input file is younger (has a more
recent modification time) than the file named in the top output
string, or the latter does not exist.
wrinputpos write input position writes the input-stream position pointer (as a
binary sequence of bytes) to the output stream.
rdinputpos read input position pops the input-stream position pointer (as a
binary sequence of bytes) from the output stream, effectively
resuming input from where it was when wrinputpos was invoked.
cpinputpos copy input position outputs the input-stream chunk going from the
position previously marked by wrinputpos to the current input
position.
ascii ascii outputs the character whose ascii code appears in hexadecimal
notation in the next two input characters.
shell command shell command passes the top output string to the operating system shell for execution as a command.
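The test performed by the younger opcode listed above resembles the timestamp comparison used by make-like tools. A minimal Python sketch of that comparison, assuming a strict "more recent" test and using the hypothetical function name younger with explicit file-name arguments, could read:

```python
import os


def younger(current_input, other):
    """Return True if current_input was modified more recently than other,
    or if other does not exist."""
    if not os.path.exists(other):
        return True
    # Strict comparison: equal timestamps do not count as younger (assumption).
    return os.path.getmtime(current_input) > os.path.getmtime(other)
```

A rule file could use such a test to decide whether an output file needs to be regenerated from its source.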
The fri command line
Usage: fri [rule file] [-e command] [-i] [data file] ...
All arguments are optional. fri called with no arguments displays its copyright and version message and exits.
The rule file contains a list of FRI definitions and optionally a -e clause. It can also redefine the syntax of the command line itself, as for instance the CPL compiler does, in which case no -e clause need appear. An initial line beginning with # is ignored, and allows the rule file to be made executable in a unix shell by declaring fri as its interpreter.
If the -e clause appears either inside the rule file or on the command line, the remaining parameters are interpreted as data files to act upon. A file name of - stands for stdin. If no data files are listed, input is taken from stdin.
The command specified in FRI syntax by the -e clause is executed in a loop until the end of file, copying one character from input to output every time the command fails. The command that succeeds may consume one or more or even all characters.
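The loop just described can be sketched in Python. This is a model of the documented behavior, not of fri's internals: a command is represented as a function returning (output, new_position) on success or None on failure, the names run_command and orange_to_apple are hypothetical, and a command that succeeds without consuming anything is treated as a failure so that the sketch always terminates.

```python
def run_command(command, text):
    """Apply command repeatedly until the end of input, copying one
    character from input to output every time the command fails."""
    out = []
    pos = 0
    while pos < len(text):
        step = command(text, pos)
        if step is None or step[1] <= pos:
            # Failure (or no progress, a sketch-only guard): copy one character.
            out.append(text[pos])
            pos += 1
        else:
            produced, pos = step
            out.append(produced)
    return "".join(out)


def orange_to_apple(text, pos):
    """Stand-in for a command that matches 'orange' and writes 'apple'."""
    if text.startswith("orange", pos):
        return "apple", pos + len("orange")
    return None


converted = run_command(orange_to_apple, "two oranges, one orange")
```

Characters that the command does not recognize pass through unchanged, which is why a command that always fails leaves the input intact.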
If the -i option is present, each data file is overwritten in place; otherwise the files are processed in sequence and their results concatenated to stdout.
EXAMPLES
cat replacement:
fri -e copy
tac replacement:
fri -e 'r[ =[ ∗ \n =[\n #2]=]'
substitute apples for oranges:
fri -e '
orange
=[
apple
]'