Runtime internals
Much more needed here…
Stack
The runtime stack is a zlist of zvalue structs. All variables and temporary values are stored on the stack. The “special variables” (FS, NF, RS, CONVFMT, etc.) are stored at the bottom of the stack. Program globals are stored above that. As the program runs, values are pushed, popped, operated on, etc. Function call arguments and other related information, including variables local to functions, are put on the stack as well, as discussed below.
In earlier versions, the zlist_append() function was called for every stack push. This ensured that the stack would never overflow unless memory was exhausted, but profiling showed that this was expensive. Now, a healthy margin of stack is checked (only) at every function call, where the stack is expanded as needed to ensure the margin is maintained. Nearly all awk statements are “stack neutral”: the stack is at the same point before and after the statement, except during function calls. One exception is for (index in array)..., which maintains some array iteration information on the stack inside the for loop. The margin MIN_STACK_LEFT (currently 1024) ensures that only a pathological statement can overflow the stack, although overflow remains possible. It would be possible to analyze the maximum stack usage of each statement during compilation and so rule out overflow entirely, but I believe it is not worth the added complexity.
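The margin scheme can be illustrated with a minimal sketch. Only MIN_STACK_LEFT and its value come from the text; the value and vstack types, and the names ensure_margin() and push(), are hypothetical stand-ins for the real zvalue and zlist machinery, and the doubling growth policy is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real zvalue and zlist types are richer. */
typedef struct { double num; } value;

typedef struct {
    value *base;   /* start of stack storage */
    size_t top;    /* number of slots in use */
    size_t cap;    /* number of slots allocated */
} vstack;

#define MIN_STACK_LEFT 1024   /* margin named in the text */

/* Called once per function call: grow the stack until at least
 * MIN_STACK_LEFT free slots remain. Doubling is an assumed policy.
 * Returns 0 on success, -1 if memory is exhausted. */
static int ensure_margin(vstack *s)
{
    if (s->cap - s->top >= MIN_STACK_LEFT) return 0;
    size_t newcap = s->cap ? s->cap : MIN_STACK_LEFT;
    while (newcap - s->top < MIN_STACK_LEFT) newcap *= 2;
    value *p = realloc(s->base, newcap * sizeof *p);
    if (!p) return -1;
    s->base = p;
    s->cap = newcap;
    return 0;
}

/* Unchecked push: the fast path relies on the margin instead of
 * testing for room on every push. The assert only documents that. */
static void push(vstack *s, double x)
{
    assert(s->top < s->cap);
    s->base[s->top++].num = x;
}
```

In this sketch the cost of the capacity test is paid once per call rather than once per push, which is the trade-off the paragraph above describes.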
Function definition, call, return, stack frame design
A function definition function f(a, b, c,...) { ... } generates:
tkfunction function_number
(code for function body)
tknumber uninitvalue
tkreturn number_of_params
As each parameter is parsed, it is added to a table of local variables for this function. The tkreturn is added to return an uninitialized value if the code falls off the end of the function.
When a return keyword is encountered, the expression following it is compiled and its value is left on the stack. If no expression follows the return, then a tknumber 0 is compiled to push a zero on the stack.
A function call f(a, b, c,...) generates:
opprepcall function_number
(code to push args)
tkfunc number_of_args
At runtime, these work as follows:
The runtime keeps an index into the stack called parmbase (a local variable of the main interpreter function interpx(), initially 0) that points into the current call stack frame as follows:
return_value parmbase-4
return_addr parmbase-3
prev_parmbase parmbase-2
arg_count parmbase-1
function_number parmbase
arg1 parmbase+1
arg2 parmbase+2
...
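The frame layout above can be written down as offsets from parmbase. This is only a sketch: the offsets are taken from the table, but the enum names and the helpers frame_index() and arg_index() are hypothetical and are not claimed to exist in the interpreter source.

```c
#include <assert.h>

/* Slot offsets relative to parmbase, as listed in the table.
 * The names are hypothetical; only the offsets come from the text. */
enum frame_slot {
    SLOT_RETURN_VALUE  = -4,
    SLOT_RETURN_ADDR   = -3,
    SLOT_PREV_PARMBASE = -2,
    SLOT_ARG_COUNT     = -1,
    SLOT_FUNCTION_NUM  =  0
};

/* Stack index of a bookkeeping slot in the frame at parmbase. */
static int frame_index(int parmbase, enum frame_slot slot)
{
    return parmbase + (int)slot;
}

/* Stack index of the n-th argument or local, counting from 1. */
static int arg_index(int parmbase, int n)
{
    assert(n >= 1);
    return parmbase + n;
}
```

With parmbase at stack index 10, for example, the return value slot is index 6 and the second argument is index 12.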
A function call executes as follows:
opprepcall pushes placeholder 0 values for the return value, return address, previous parmbase, and argument count. Then it pushes the function number (from the opprepcall function_number code sequence).
The arguments are pushed onto the stack after that.
tkfunc retrieves the number of arguments (arg count) from the tkfunc number_of_args code sequence, calculates where the parmbase should be by subtracting the arg count from the current stack top index, then fills in the return address (offset of next zcode word in the zcode table) and the arg count in the stack frame. Finally, the tkfunc op pushes the arg count on the stack and sets the ip (zcode instruction pointer) to go to the tkfunction op of the function definition.
tkfunction finds the local variable table for the function and gets the number of parameters. It then pops the arg count (number of actual arguments) from the stack, calculates the new parmbase (stack top index minus arg count), stores the previous parmbase in the stack frame, and sets parmbase to the new parmbase. Next, it loops to drop excess calling arguments if more args have been supplied in the call than there are params defined for the function. (NOTE: This is an error and should be caught at compile time! FIXME) Then, if the number of supplied args is less than the number of defined params, additional “args” are pushed to be used as local variables by the function. This is where the “maybe map” variables may have to be created, as explained in “Parsing awk”.
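The call sequence described above can be traced with a small simulation. It is a sketch under stated assumptions: the stack holds plain doubles rather than zvalue structs, zero stands in for an uninitialized local, no “maybe map” handling is attempted, and the names stk, top, op_prepcall(), op_func(), and op_function() are hypothetical.

```c
#include <assert.h>

#define STACK_MAX 64
static double stk[STACK_MAX];  /* simplified stack of plain numbers */
static int top = -1;           /* index of the top stack slot */
static int parmbase = 0;       /* current frame base, initially 0 */

static void push(double v) { assert(top + 1 < STACK_MAX); stk[++top] = v; }
static double pop(void)    { assert(top >= 0); return stk[top--]; }

/* opprepcall: four placeholder slots, then the function number. */
static void op_prepcall(int function_number)
{
    for (int i = 0; i < 4; i++) push(0);
    push(function_number);
}

/* tkfunc: runs after the args are pushed. Fills in the return address
 * and arg count, then pushes the arg count for tkfunction to pop. */
static void op_func(int nargs, int return_addr)
{
    int base = top - nargs;        /* slot holding function_number */
    stk[base - 3] = return_addr;
    stk[base - 1] = nargs;
    push(nargs);
}

/* tkfunction: switch to the new frame, then make the number of
 * arg slots match the number of declared params. */
static void op_function(int nparams)
{
    int nargs = (int)pop();
    int base = top - nargs;
    stk[base - 2] = parmbase;                  /* save caller's parmbase */
    parmbase = base;
    while (top - parmbase > nparams) pop();    /* drop excess args */
    while (top - parmbase < nparams) push(0);  /* missing args become locals */
}
```

Calling a four-parameter function with two arguments, for instance, leaves two extra zero slots above the supplied arguments, which the function body then uses as its locals.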
When the function returns, either via a return keyword or by “falling off the end” of the function (where a tkreturn op has been compiled), the tkreturn op picks up the param count (from the tkreturn number_of_params code sequence) [NOTE this is unused; remove it? FIXME], gets the arg count from the stack frame, copies the return value from the stack into the return value slot in the stack frame, and drops the return value from the stack. At this point, the stack should hold just the arguments, including the locals created beyond the args supplied. The tkreturn op loops through the locals not supplied by the caller, releasing any map (array) data and dropping each local “arg” from the stack. Now only the args actually supplied by the caller remain on the stack, and these are dropped. Finally, the ip (instruction pointer) is set from the return address in the stack frame, the previous parmbase value is restored from the stack frame, and execution continues with the next instruction.
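The return sequence can be sketched in the same simplified style. The stack of plain doubles and the names stk, top, ip, and op_return() are hypothetical, and releasing array data is reduced to a comment. One step is an assumption rather than something the text states: after the args are dropped, the sketch also discards the four bookkeeping slots below the return value slot, so that the return value ends up on top of the stack for the caller.

```c
#include <assert.h>

#define STACK_MAX 64
static double stk[STACK_MAX];  /* simplified stack of plain numbers */
static int top = -1;           /* index of the top stack slot */
static int parmbase = 0;       /* current frame base */
static int ip = 0;             /* zcode offset to resume at */

/* tkreturn: on entry the return value is on top of the stack,
 * above the supplied args and any locals. */
static void op_return(void)
{
    assert(parmbase >= 4 && top > parmbase);
    int nargs = (int)stk[parmbase - 1];

    stk[parmbase - 4] = stk[top];  /* copy return value into its frame slot */
    top--;                         /* drop it from the stack top */

    /* Locals not supplied by the caller; the real op also releases
     * any map (array) data held by each one. */
    while (top > parmbase + nargs) top--;

    /* Args actually supplied by the caller. */
    while (top > parmbase) top--;

    ip = (int)stk[parmbase - 3];          /* return address */
    int prev = (int)stk[parmbase - 2];    /* caller's parmbase */

    /* Assumption: discard the bookkeeping slots so the return
     * value slot becomes the new stack top. */
    top = parmbase - 4;
    parmbase = prev;
}
```

Starting from a frame at parmbase 6 with two supplied args, two locals, and a return value on top, op_return() leaves the return value at index 2 and restores the caller's parmbase of 0.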