CPL
Page: (fri)Top, Next: FRI architecture, Up: (cpl)cpl Index

The FRI interpreter

The design of most compilers passes through a formal description of the language that is compiled by a parser generator like yacc into a C program. At the heart of CPL's extensibility there is a different structure, where a formal description of the language is never compiled to generate the compiler, but rather interpreted each time a compilation is run. The engine that does this is named fri, the Formal Rule Interpreter. If yacc is a compiler-compiler, fri may be called a compiler-interpreter. In fact, fri is a quite general pattern-matching engine with its own FRI meta-language, and only learns about CPL when it reads the cpl file every time the compiler is invoked. Translators for many other languages can in principle be written in it. The interpreted nature of the CPL compiler, in addition to having allowed a smooth continued development and fine-tuning over the years, has the effect that its syntax can be extended on the fly by the compiled program itself with a few lines of FRI code written in a (cpl)FRI SECTION. This abilility is used by (cpl)Library modules in order to define, for instance, complex-number notation, matrix-algebra operations or symbolic manipulations.
Page: (fri)FRI architecture, Next: Assembler, Prev: Top, Up: Top Index

FRI architecture

FRI is a declarative language. Much like a Turing machine, the FRI virtual machine's workspace is composed of three items: an input character stream, an output character stream and a dictionary of rules. A rule is a sequence of bytecodes that can represent characters to be matched against the input stream, characters to be written to the output stream, or opcodes specifying general actions that can modify either stream, or the dictionary itself, or initiate a new rule. A rule matches if it runs to its end, or fails if an input character or an inner nested rule does not match. FRI's alphabet is restricted to 7-bit ascii, and bytes with the 8th bit set, when they appear in a rule amidst the characters to be recognized, are given the meaning of opcodes specifying a variety of operations. The input stream is generally attached to an input file to be manipulated, but can also be modified by explicit instructions (opcodes). The output stream is not automatically attached to a file and must be written out explicitly when desired. It is internally subdivided into a stack of strings which can be individually recalled. Every time a rule matches, all but its top output string are forgotten while its top string is returned to the calling rule (either appended to its output or prepended to its input). When a rule fails, the input counter is reset to where it was when the rule began, and the output counter is reset so that all of its output strings are erased. The dictionary of rules is alphabetically ordered and, with one exception, independent of the order in which the rules are entered. Additionally, each new rule inserted in the dictionary gets assigned to a numbered scope. A new scope can be opened at any time by an explicit command and will contain subsequently generated rules, all of which disappear when the scope is eventually closed. A certain number of opcodes with no action are reserved as strands, markers of rules with related meaning. Each rule must start with a strand opcode, but strand opcodes can also appear elsewhere with a user assigned meaning. If a rule, or more accurately a strand of alternate rules, is viewed as a sort of subroutine, this subroutine can be called by inserting its strand byte in the input stream. When one rule fails, execution restarts with the next one in the dictionary. If two rules have a common root, only the part of the rule that failed undergoes a new attempt; side effects of the part that matched will not be repeated more than once. Execution terminates, as a match, when any one rule's byte sequence attains its end (or an explicit end opcode) or, as a failure, when the dictionary runs out of rules. For debugging purposes, it can also be terminated in an error message by suitable opcodes.
Page: (fri)Assembler, Next: Opcodes, Prev: FRI architecture, Up: Top Index

The FRI assembler

Much like machine code of an actual CPU (into which fri might one day be implemented, perhaps), FRI bytecode is hard to type in by a human being. When fri boots up, the rule dictionary is initially filled with just enough rules to assemble human-readable FRI into FRI bytecode. The FRI assembler uses symbolic names for Opcodes, listed below, and user-assigned names for strands. Additionally it provides the @ pseudo-opcode by which a stack element can be assigned a label and later referred to by this label rather than by its position on the stack. The two strands FRIsymbol and FRIdefinition contain code for the assembler itself: at the basics FRIsymbol translates a symbolic name to its corresponding bytecode, and FRIdefinition assembles a sequence of FRIsymbol's into a rule and adds it to the dictionary. The assembler, like everything FRI, can modify its own rules, and frills and whistles are added this way. A FRI assembler program is a text file with one FRI rule per line (and possible comments). The order of lines is largely irrelevant because they will be alphabetically sorted at the time they are inserted in the dictionary, with the exception that new FRIsymbol's and FRIdefinition's (like the definitions of strand names or of shortcuts) must come before they are used, and that rules that branch in output mode, after a =[ (put) opcode, are kept in last-to-first order. The branching-tree structure of the dictionary makes it so that, in rules having a common stem, the stem will be executed only once, even if one rule fails and the next one is tried. Branching is not allowed to split an output string even if the first few characters are coincident with another's, and will always occur at the =[ opcode that opens it. The last-to-first ordering allows a dynamically generated, more recent rule to temporarily override an older one, with the older rule automatically restored when the overriding one is deleted.
Page: (fri)Opcodes, Next: Command line, Prev: Assembler, Up: Top Index

Language opcodes

FRI opcodes are represented by single bytes (with the 8th bit set), or in some cases by two bytes the first of which is a shift opcode, and in the assembler by symbolic names or punctuation symbols. The meaning of each symbolic name is described below, categorized according to function. Character matching Symbol pronounced action (ascii char) any 7-bit ascii character matches itself. -- except (followed by a single byte) matches if the following character does not match the input. - through (followed by two bytes) matches if the input is not alphabetically enclosed (in ascii order) between the next two characters, excluded. ]= cork (coded as ascii char 127) automatically added to the input stream every time new input is inserted, to mark the end of the new input extent; like all ascii characters, matches itself. { { (ascii code 18) if found in the input stream, groups a block of zero or more characters up to the next } (ascii code 19) for copying purposes. Can be nested. Like all ascii characters, matches itself. copy copy copies one (non-cork) character or one {}-enclosed block from the input to the output stream. ∗ copy up to (followed by a single byte) copies zero or more characters or {}-enclosed blocks from the input to the output stream until the following character (or a cork) is encountered. The terminating character is popped from the input stream but not written out. Character writing =[ put opens a new string in the output stack and switches from input to output mode, which will last until the next unescaped opcode starting with a ], -> or ∗. (ascii char) in output mode an ascii character is just written; even opcodes are written rather than executed, with the exception of the output terminators above, which restore input mode and are then executed, and #, ^, delete, UniqueNo, LineNo which are executed without leaving output mode. ^ escape (followed by a single byte) the character following an escape is written to the output stream unconditionally, in either mode and even if it is an opcode. A sequence of $n$ escapes is written as a sequence of $n-1$ escapes plus the following character, except if the following character is a null (ascii 0) which is not written. # number (followed by a decimal digit or label) recalls an output string identified by the number that follows, and appends it to the current output. It can be followed by a positive number, denoting a stack count backwards from the top, a negative number, denoting a stack count backwards from the first string of the present rule, or a symbolic label assigned by the @ pseudocode. @ label (followed by alphanumeric label) an assembler pseudo-opcode that assigns the following alphanumeric string as a label to the current position in the output stack, so it can later be recalled (within the same rule only) by name. ] end exits output mode, or the current control sequence, or the current rule. ]= cork exits output mode and adjoins the top two output strings, decreasing the stack count by one. backspace backspace drops the top string from the output stack. Flow control -> pipe into pops the top output string and prepends it as a new input extent to the current input, separated by a ]= (cork) byte. The new input shall either be passed as such to the rest of the rule, if it starts with a 7-bit ascii character, or be processed by a new instance of the rule-matching virtual machine if it starts with an opcode. After the called-up rule matches, its output becomes the new input and the calling rule continues, but in case of a later failure the rule-matching machine is invoked in a loop until a match is obtained (success) or no more rules are available (failure). convert convert null opcode, matches itself and returns what follows in its input extent. Typically used before another opcode in a -> (pipe into) operation, in order for its contents to get a chance to be examined as input by the rest of the rule before being ``converted'' by a new rule-matching passage. ] end successfully terminates the current rule, the output mode, or any of the compound sub-rules described below. 1[ optional executes the enclosed subrule (up to a matching ]) 0 or 1 times, reporting success to the containing rule. [ repeat repeats the enclosed subrule (up to a matching ]) 0 or more times until it fails, reporting success to the containing rule and appending each output to the top output string. r[ recur repeats the enclosed subrule (up to a matching ]) 0 or more times until it fails, reporting success to the containing rule and replacing the top output string by each output. ifnot[ if not succeeds if the enclosed subrule fails, and fails if the enclosed subrule (up to a closing ]) matches, discarding its output. unless[ unless succeeds if the enclosed subrule fails and fails if the enclosed subrule (up to a closing ]) matches, appending its output to the output stream. # number (followed by a decimal digit or label) when used in input mode, recalls a string from the output stack (by the same numbering scheme as in output mode), and executes it as a string of bytecodes, as if they were inserted in the current rule at the current position. cut cut like the Prolog cut, asserts that failure beyond its position is not intended to occur. If the present rule fails after a cut, an error will be reported rather than moving on to the next rule. In addition, cut has lower priority than all other opcodes except ], so if two rules are identical up to where a cut appears, the cut rule will be tried last. Dictionary manipulation ]-> save (followed by a single byte) saves the top output string to the dictionary as a new rule, in the same scope as rules starting with the following opcode, or in the current scope if the following byte is null. openscope open scope opens a new scope. closescope close scope closes the current scope and erases all rules generated since the last openscope. mergescope merge scope closes the current scope and merges all rules generated since the last openscope into the previous scope. delete delete deletes from the dictionary all rules that begin with the byte sequence contained in the top output string and belong in the current scope. File handling open open opens a file for output, taking its name from the top output string. close close closes the current output file, resuming output to the previously opened one. ]->out write out writes the top output string to the current output file and pops it off the output stack. include include opens a file for input, taking its name from the top output string, and includes it at the current position in the input stream, followed by a ]= (cork). At the end of this file the file will be closed and the previous input will resume automatically. closein close input forces closing the current input file, resuming input from where it was in the previously opened one. Shortcut opcodes spaces spaces matches a sequence of zero or more blank characters (spaces and tabs). ∗]= copy all copies all remaining characters from the last input extent to the output stream, reading but not copying the next cork. Nearly equivalent to ∗ ]=, but notice that the boundary of an input extent is internally marked, so ∗]= will copy all of it even if it contains an internal ]= (cork). =[] put end opens an empty new string in the output stack and goes back to input mode; single opcode equivalent to =[ ]. do do starts execution of the rule that begins with the next bytecode in a new virtual machine; functionally equivalent to a single-byte -> (pipe). ]> append appends the top output string to the rule that starts with the following bytecode. Useful for the temporary storage of multi-line strings. get get a do (do) replacement that only executes a rule composed of a single =[ (put), presumably created by a sequence of ]> (append) operations, and deletes the rule at the same time. It operates by reference, which for long strings speeds up execution. Debugging ]->error cast error writes the top output string to stderr and terminates execution in error. trace trace toggles in and out of Trace mode, where an interactive animation of the rule-matching process is displayed on the controlling terminal. LineNo line number writes the current input-file line number to the output stream. list-> list extracts from the dictionary all the rules that begin with the byte sequence provided in the top output string, and prepends them one after another to the input, each one as a new input extent separated by a ]= (cork), with an empty input extent terminating the whole list. wrsym write symbol disassembles the opcode provided in the top output byte and replaces it by its symbolic name. display display displays the top output string on the controlling terminal. Miscellaneous, less frequently used operations ## scope number only valid in output mode, recalls the current scope identifier so it can later be used in a ]-> (save) operation. => force write appends everything that follows to the output stream in an 8-bit safe way, including any opcodes that might terminate a normal =[ (put). AssignStrand assign strand decrements an internal counter and outputs the next available rule strand (still unused null opcode). UniqueNo unique number increments and outputs (in decimal ascii) the internal Unique Number counter, made available in order to assign unambiguous identifiers to variables. pushNo push number pushes the value of the Unique Number counter (as a binary integer) onto the output stream. popNo pop number pops the value of the Unique Number counter (as a binary integer) from the output stream. writeimage write image writes a binary image of the whole dictionary to the output file (so it can later be quickly reloaded rather retranslated). readimage read image reads in a binary dictionary image prepared by writeimage. newer newer matches if the file named in the top output string exists and is newer (has an equal or more recent modification time) than the current input file. wrinputpos write input position writes the input-stream position pointer (as a binary sequence of bytes) to the output stream. rdinputpos read input position pops the input-stream position pointer (as a binary sequence of bytes) from the output stream, effectively resuming input from where it was when wrinputpos was invoked. cpinputpos copy input position outputs the input-stream chunk going from the position previously marked by wrinputpos to the current input position. ascii ascii outputs the character or bytecode denoted in hexadecimal notation by the next two input characters. \ (followed by two bytes) outputs the character or bytecode denoted in hexadecimal notation by the next two rule bytes. shell command shell command passes the top output string to the operating system shell for execution as a command.
Page: (fri)Command line, Next: Trace mode, Prev: Opcodes, Up: Top Index

The fri command line

Usage: fri [rule file] [-e command] [-i] [data file] ... All arguments are optional. fri called with no arguments displays its copyright and version message and exits. The rule file contains a list of FRI rules and optionally a -e clause. It can also redefine the syntax of the command line itself, as for instance the CPL compiler does, in which case no -e clause need appear. An initial line beginning with # is ignored, and allows the rule file to be made executable in a unix shell by declaring fri as its interpreter. If the -e clause appears either inside the rule file or on the command line, the remaining parameters are interpreted as data files to act upon. A file name of - stands for stdin. If no data files are listed, input is taken from stdin. The command specified in FRI syntax by the -e clause is executed in a loop until the end of input, copying one character from input to output every time the command fails. This feature allows simple sed-like scripts to be specified on the command line itself. The command that succeeds may consume one or more or even all characters. If the -i option is present, each data file is overwritten in place; otherwise each file is acted upon in a sequence and its result catenated to stdout. EXAMPLES cat replacement: fri -e copy tac replacement: fri -e 'r[ =[ ∗ \n =[\n #2]=]' change oranges to apples: fri -e 'orange =[apple]'
Page: (fri)Trace mode, Next: Arithmetic mode, Prev: Command line, Up: Top Index

Trace mode

When the fri compiler-interpreter runs in trace mode, which can be toggled by the trace bytecode, a screenful of information is displayed on the user's ANSI terminal, in a format where ascii characters are colored in yellow and FRI opcodes and strands are disassembled to their symbolic names colored in blue. The top two lines contain a running view of the input buffer, with a marker pointing at the current position of the input pointer, and the current input line number and file name displayed at its left. The next two lines contain a similar view of the output buffer, centered about the current position of the output pointer. The statistics collected at its left are: current scope depth in the dictionary, cut toggle, occupied number of bytes in the output buffer, current depth of the output stack. The following lines contain a running view of the presently executing dictionary rule. Every time a -> (pipe into) is executed a new line is started, so that the suspended calling lines remain visible above it. The green label on the left is the strand (first bytecode) of each rule. The next branching to follow the current instruction is also displayed, with alternatives vertically listed below the currently executing line. Tracing proceeds as an animation, with one instruction being executed every 0.7 seconds and the display being updated accordingly. The following single-key commands modify it interactively: ? or / help display a list of these single-key commands. q quit stop execution and quit the interpreter. c continue exit trace mode and continue normally. s slow toggle in and out of slow mode, which also traces single ascii characters and other trivial bytecodes that are normally skipped over. p or esc pause pause execution. Any key will resume. <- -> horizontally scroll the input, output and current-instruction lines. Focus is re-centered on exit from pause. l list followed by the stem of a dictionary rule in assembler notation, displays all the dictionary rules that start with that stem. # or 3 number followed by a number and Enter, displays the corresponding output stack line. Followed by just Enter, displays the last few lines of the stack. b backfiles displays the stack of currently open input filenames. tab complete run to the end of the current displayed line with tracing suspended, and resume tracing upon return to the line above it. space fast-forward through tracing as long as the key is pressed, and resume the normal 0.7-second rate once released.
Page: (fri)Arithmetic mode, Prev: Trace mode, Up: Top Index

Arithmetic mode

For the purpose of a more traditional CPU emulation (used by (cpl)cpl i), fri provides an additional set of opcodes that specify arithmetic operations. These become active in an ad-hoc arithmetic mode that exists alongside input and output modes. Most arithmetic opcodes manipulate the fri output buffer as a stack of fixed-width objects (binary integers, machine addresses, floats) irrespective of its subdivision in strings. Nevertheless the subdivision remains in action, and output strings serve double use as stack frames. A stack address in what follows is a binary integer expressing a byte count relative to the frame. Symbol pronounced action compute[ compute activates arithmetic mode for the following bytecodes up to a closing ] (end). # number (followed by a single byte) when in arithmetic mode recalls the stack address of the corresponding string as a binary integer (instead of copying the string itself). goto (followed by a binary integer) go to the indicated bytecode address, relative to the current instruction. bcall (followed by a binary address) call the machine-coded function at the given machine address, obtained for instance from the dl library. bytecall (followed by a binary address) interpret the bytecode function at the given machine address. spop pop a binary integer len from the top of stack and further shorten the stack by len bytes. gres pop a binary integer len from the top of stack, malloc a field of len bytes, and return its address to the top of stack. gdel pop a binary address from the top of stack and free the malloc'ed field at that address. 0 push binary integer 0 to the stack. 1 push binary integer 1 to the stack. 2 push binary integer 2 to the stack. 3 push binary integer 3 to the stack. 4 push binary integer 4 to the stack. 8 push binary integer 8 to the stack. pushint (followed by a binary integer) push that integer to the stack. NULL push binary address NULL to the stack. pushaddr (followed by a binary address) push that address to the stack. pushreal (followed by a binary real) push that real to the stack. pushshort (followed by a binary short integer) push that short integer to the stack. pushbyte (followed by a single byte) push that byte to the stack. pushn (followed by a binary integer count plus count bytes) push count bytes to the stack. saddr (followed by a binary integer) convert the given stack address to an absolute machine address and push it onto the stack. sframe open a new output string (similar to what =[ does) for use as a stack frame. sres pop a binary integer len from the stack and advance the stack counter by len bytes (reserving space for stack variables). sdel pop the present output string (delete stack frame). sloadint (followed by a binary integer) get a binary integer from the given stack address, and push it onto the stack. sload8 (followed by a binary integer) get a binary 8-byte real from the given stack address, and push it onto the stack. sload (followed by a binary integer len and a binary integer addr) get len bytes from stack address addr and push them onto the stack. dload (followed by a binary integer len and a machine address addr) get len bytes from machine address addr and push them onto the stack. iloadint pop a stack address from the stack, and push the integer found at that address. iloadreal pop a stack address from the stack, and push the real found at that address. iloadaddr pop a stack address from the stack, and push the binary address found at that address. iload (followed by a binary integer len) pop a stack address from the stack, and push len bytes found at that address. nload pop a machine address addr, pop a binary integer len, get len bytes from machine address addr and push them onto the stack. ssave4 (followed by a binary integer addr) pop a binary integer from the stack and save it to stack address addr. ssave8 (followed by a binary integer addr) pop a binary real from the stack and save it to stack address addr. ssave (followed by a binary integer len and a binary integer addr) pop len bytes from the stack and save them to stack address addr. dsave4 (followed by a machine address addr) pop a binary integer from the stack and save it to machine address addr. dsave8 (followed by a machine address addr) pop a binary real from the stack and save it to machine address addr. dsave (followed by a binary integer len and a machine address addr) pop len bytes from the stack and save them to machine address addr. isave (followed by a binary integer len) pop a machine address addr from the stack, then pop len bytes from the stack saving these bytes to machine address addr. prs pop a binary integer len and a machine address addr from the stack, then save at most len-1 bytes of the current output string to address addr. iadd pop the two topmost integers and replace them by their sum. isub pop the two topmost integers and replace them by their difference. imul pop the two topmost integers and replace them by their product. idiv pop the two topmost integers and replace them by their integer ratio. imod pop the two topmost integers and replace them by their integer division remainder. iand pop the two topmost integers and replace them by their bitwise and. ior pop the two topmost integers and replace them by their bitwise or. ixor pop the two topmost integers and replace them by their bitwise xor. irshift pop the two topmost integers and replace them by the first one shifted right according to the second. ilshift pop the two topmost integers and replace them by the first one shifted left according to the second. itoc pop the topmost integer and convert it to a single-byte character. itoa (followed by a null-terminated string f) pop the topmost integer and convert it to a string according to printf format f. addradd pop a binary integer a, pop a machine address m, and replace them by machine address m+a. addrsub pop two machine addresses m1, m2, and replace them by their integer difference m2-m1. imuladd (followed by a binary integer mul) pop two binary integers i1, i2, and replace them by the integer i1∗mul+i2. imuladdaddr (followed by a binary integer mul) pop a binary integer i1, a machine address m2, and replace them by the machine address i1∗mul+m2. bytetoint pop the topmost single byte and convert it to an integer. seq (followed by a binary integer s) pop two s-byte objects o1, o2, and replace them by the integer boolean result of o2=o1. sne (followed by a binary integer mul) pop two s-byte objects o1, o2, and replace them by the integer boolean result of o2#o1. sgt pop two machine addresses m1, m2, and replace them by the integer boolean result of m2>m1. sge pop two machine addresses m1, m2, and replace them by the integer boolean result of m2>=m1. slt pop two machine addresses m1, m2, and replace them by the integer boolean result of m2<m1. sle pop two machine addresses m1, m2, and replace them by the integer boolean value m2<=m1. ieq pop two binary integers i1, i2, and replace them by the integer boolean result of i2=i1. ine pop two binary integers i1, i2, and replace them by the integer boolean result of i2#i1. igt pop two binary integers i1, i2, and replace them by the integer boolean result of i2>i1. ige pop two binary integers i1, i2, and replace them by the integer boolean result of i2>=i1. ilt pop two binary integers i1, i2, and replace them by the integer boolean result of i2<i1. ile pop two binary integers i1, i2, and replace them by the integer boolean result of i2<=i1. dgt pop two binary reals r1, r2, and replace them by the integer boolean result of r2>r1. dge pop two binary reals r1, r2, and replace them by the integer boolean result of r2>=r1. dlt pop two binary reals r1, r2, and replace them by the integer boolean result of r2<r1. dle pop two binary reals r1, r2, and replace them by the integer boolean result of r2<=r1. dtoi convert real to integer. dadd pop the two topmost reals and replace them by their sum. dsub pop the two topmost reals and replace them by their difference. dmul pop the two topmost reals and replace them by their product. ddiv pop the two topmost reals and replace them by their ratio. dtoa (followed by a null-terminated string f) pop the topmost real and convert it to a string according to printf format f. daddto pop a machine address m, pop a binary real r, and increase the real at address m by r. dsubfrom pop a machine address m, pop a binary real r, and decrease the real at address m by r. ftod convert float to real. itod convert integer to real. not boolean not on integer value ichs sign change integer value ineg bitwise not on integer value iabs integer absolute value dchs sign change real value dabs real absolute value iaddto pop a machine address m, pop a binary integer i, and increase the integer at address m by i. isubfrom pop a machine address m, pop a binary integer i, and decrease the integer at address m by i. if (followed by a binary signed integer s) pop a boolean integer t, if t is true then continue else skip s instruction bytes. nif (followed by a binary signed integer s) pop a boolean integer t, if t is false then continue else skip s instruction bytes. and (followed by a binary signed integer s) pop a boolean integer t, if t is true then continue else re-push t and skip s instruction bytes. or (followed by a binary signed integer s) pop a boolean integer t, if t is false then continue else re-push t and skip s instruction bytes. pushstr (followed by a null-terminated string s) push s to the stack. iloadstr pop a machine address, push the null-terminated string at that address onto the stack. nmove pop machine addresses m1, m2 and integer n; copy n bytes from address m2 to address m1. smove (followed by binary integers len,s) push len bytes from stack address s. prf pop FILE pointer, print current stack frame (or output string) to the given file descriptor. popaddr read machine address from the fri input stream and push it onto the stack. popint read binary integer from the fri input stream and push it onto the stack. popbyte read single byte from the fri input stream and push it onto the stack. popreal read binary real from the fri input stream and push it onto the stack. sizetoint convert machine address size (C ssize_t) to an integer. inttosize convert integer to machine address size (C ssize_t). i8toint convert 8-bit integer to machine integer. inttoi8 convert machine integer to 8-bit integer. savestr pop machine address, save top output string to that address.