english correction 2
@@ -77,48 +77,44 @@
\input{semantics.tex}

An excerpt of the operational semantics for \lambdammm\ is shown in Figure \ref{fig:semantics}. This big-step semantics conceptually explains the evaluation process: when the current time is $n$, the evaluation environment from $t$ samples prior can be referred to as $E^{n-t}$. If the time is less than 0, any term is evaluated to the default value of its type (0 for numeric types).

Naturally, if we attempted to execute this semantics directly, we would need to recalculate from time 0 to the current time for every sample, saving all variable environments at each step. In practice, however, a virtual machine is defined that accounts for the internal memory space used by $delay$ and $feed$, and \lambdammm\ terms are compiled into instructions for this machine before execution.
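
To make the cost of direct execution concrete, the following Rust sketch models the environment history for a single variable. It is a hypothetical illustration, not code from the mimium compiler. \texttt{NaiveHistory} stores the value of one variable for every sample since time 0, so $E^{n-t}$ becomes a plain index and times below 0 yield the default value 0.

\begin{lstlisting}[language=Rust]
/// Hypothetical naive evaluator state: the value of one variable at
/// every sample since time 0, so memory grows without bound.
struct NaiveHistory {
    values: Vec<f64>, // values[k] holds the value at time k
}

impl NaiveHistory {
    fn new() -> Self {
        NaiveHistory { values: Vec::new() }
    }

    /// Record the value computed at the current sample.
    fn push(&mut self, v: f64) {
        self.values.push(v);
    }

    /// Value of the variable `t` samples before time `n` (n must be a
    /// time already recorded). Times below 0 evaluate to the default
    /// value of the numeric type, 0.0.
    fn lookup(&self, n: usize, t: usize) -> f64 {
        if t > n {
            0.0
        } else {
            self.values[n - t]
        }
    }
}
\end{lstlisting}

The vector grows by one entry per sample. This is why the VM keeps only the fixed amount of memory that each $delay$ and $feed$ actually needs.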

\section{VM Model and Instruction Set}
\label{sec:vm}

The virtual machine model and its instruction set for running \lambdammm\ are based on the Lua version 5 VM \cite{ierusalimschy2005}.

When executing a computational model based on lambda calculus, a key challenge is handling the data structure known as a closure. A closure captures the variable environment in which the inner function is defined, allowing it to refer to variables from the outer function’s context. If the inner function is paired with a dictionary of variable names and values, the compiler (or interpreter) implementation is straightforward, but runtime performance is limited.

In contrast, runtime performance can be improved by using a process called closure conversion (or lambda lifting). This process analyzes all the outer variables referenced by the inner function and transforms the inner function by adding arguments so the outer variables can be referred to explicitly. However, implementing this transformation in the compiler is relatively complex.
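
The difference can be sketched in Rust, whose own closures capture their environment automatically. In this hypothetical example, \texttt{make\_scaler} returns an inner function that refers to the outer variable \texttt{gain}. After lambda lifting, \texttt{scaler\_lifted} receives \texttt{gain} as an explicit extra argument, and every call site must be rewritten to pass it.

\begin{lstlisting}[language=Rust]
/// Before conversion: the inner function refers to `gain`, a variable
/// of the outer function, so the runtime must keep it alive.
fn make_scaler(gain: f64) -> impl Fn(f64) -> f64 {
    move |x| x * gain
}

/// After lambda lifting: the free variable `gain` has become an
/// explicit parameter, so no captured environment is needed.
fn scaler_lifted(gain: f64, x: f64) -> f64 {
    x * gain
}
\end{lstlisting}

The compiler complexity mentioned above comes from finding every free variable and rewriting all call sites. That is trivial here but not for deeply nested functions.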

The Lua VM adopts a middle-ground approach between these two methods by adding the VM instructions \texttt{GETUPVALUE} and \texttt{SETUPVALUE}, which allow outer variables to be dynamically referenced at runtime. The implementation of the compiler and VM using \textit{upvalues} is simpler than full closure conversion, while still avoiding significant performance degradation. In this approach, outer variables are accessed via the call stack, rather than heap memory, unless the closure escapes the original function's context \cite{nystrom2021}.

Additionally, \textit{upvalues} facilitate interoperability with other programming languages. Lua can be easily embedded through its C API, and when implementing external libraries in C, programmers can access the upvalues of the Lua runtime, not just the stack values available via the C API.

\subsection{Instruction Set}
\label{sec:instruction}

The VM instructions for \lambdammm\ differ from those of the Lua VM in the following aspects.

\begin{enumerate}
\item Since mimium is a statically typed language, unlike Lua, instructions for basic arithmetic operations are provided for each type\footnote{In the actual implementation, instructions such as \texttt{MOVE} include an additional operand to specify the word size of values, particularly for handling aggregate types like tuples.}.
\item The call operation is split into normal function calls and closure calls, both because of static typing and to handle higher-order stateful functions (see Section \ref{sec:vmstructure} for details).
\item Conditional statements are implemented using a combination of two instructions, \texttt{JMP} and \texttt{JMPIFNEG}, whereas the Lua VM employs a dedicated \texttt{TEST} instruction.
\item Instructions related to for-loops, the \texttt{SELF} instruction used in object-oriented programming, and the \texttt{TABLE}-related instructions for metadata references to variables are omitted in mimium as they are unnecessary.
\item Instructions related to list-like data structures are also excluded from this paper, as the implementation of data structures such as tuples and arrays is outside the scope of the \lambdammm\ description here.
\end{enumerate}
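
As an illustration of the third point, the following Rust sketch interprets a conditional compiled into \texttt{JMPIFNEG} and \texttt{JMP}. It is hypothetical. The instruction set is reduced to three variants, and \texttt{MoveConst} is an invented helper. Jump offsets are taken relative to the next instruction, as in the Lua VM. We also assume that \texttt{JMPIFNEG} jumps when its condition register holds a value that is not positive.

\begin{lstlisting}[language=Rust]
/// Reduced, hypothetical instruction set for this example only.
#[derive(Clone, Copy)]
enum Instr {
    MoveConst(usize, f64), // R(A) := constant
    JmpIfNeg(usize, i32),  // if R(A) <= 0 then pc += sBx (assumed)
    Jmp(i32),              // pc += sBx
}

/// Execute `code` on the register file `regs`.
fn run(code: &[Instr], regs: &mut [f64]) {
    let mut pc: i32 = 0;
    while (pc as usize) < code.len() {
        match code[pc as usize] {
            Instr::MoveConst(a, k) => regs[a] = k,
            Instr::JmpIfNeg(a, off) => {
                if regs[a] <= 0.0 {
                    pc += off;
                }
            }
            Instr::Jmp(off) => pc += off,
        }
        pc += 1; // offsets are relative to the next instruction
    }
}

/// R(1) := if R(0) then 10 else 20, compiled with two jumps.
fn if_program() -> Vec<Instr> {
    vec![
        Instr::JmpIfNeg(0, 2),     // condition false: go to the else branch
        Instr::MoveConst(1, 10.0), // then branch
        Instr::Jmp(1),             // skip the else branch
        Instr::MoveConst(1, 20.0), // else branch
    ]
}
\end{lstlisting}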

Each instruction of the \lambdammm\ VM is a 32-bit word consisting of an operation tag and three operands. Currently, the tag and each operand are all 8 bits wide\footnote{This layout is easy to implement as an \texttt{enum} in Rust, the host language of the latest mimium compiler. The bit widths and alignment of the operands may change in the future.}.
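
The following Rust sketch shows one way to pack such an instruction into a 32-bit word. The byte order, with the tag in the most significant byte, is an assumption made for illustration only. The actual compiler represents instructions as a Rust \texttt{enum} rather than as hand-packed words.

\begin{lstlisting}[language=Rust]
/// Pack an 8-bit operation tag and three 8-bit operands A, B, C into
/// one 32-bit word; the tag occupies the most significant byte here.
fn encode(tag: u8, a: u8, b: u8, c: u8) -> u32 {
    ((tag as u32) << 24) | ((a as u32) << 16) | ((b as u32) << 8) | (c as u32)
}

/// Split a 32-bit word back into (tag, A, B, C).
fn decode(word: u32) -> (u8, u8, u8, u8) {
    (
        (word >> 24) as u8,
        (word >> 16) as u8,
        (word >> 8) as u8,
        word as u8,
    )
}
\end{lstlisting}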

The VM for \lambdammm\ is a register machine, like the Lua VM since version 5. However, it has no physical registers: a register number simply denotes an offset from the base pointer of the current function on the call stack during VM execution. The first operand of most instructions specifies the register in which the result of the operation is stored.

The list of instructions is presented in Figure \ref{fig:instructions} (some basic arithmetic operations are omitted). The notation follows the Lua VM paper \cite[p.13]{ierusalimschy2005}: each entry shows, from left to right, the operation name, its list of operands, and pseudo-code describing the operation. When each of the three operands is used as an unsigned 8-bit integer, they are written as \texttt{A B C}. An operand used as a signed integer is prefixed with \texttt{s}, and when two operand fields are combined into a single 16-bit value, the suffix \texttt{x} is added. For example, when \texttt{B} and \texttt{C} are merged and treated as a signed 16-bit value, they are written as \texttt{sBx}.
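
The \texttt{Bx} and \texttt{sBx} notations can be illustrated with the following Rust sketch, which merges operands \texttt{B} and \texttt{C} into one 16-bit field. We assume that \texttt{B} is the high byte and that \texttt{sBx} uses two's complement. The reference Lua implementation instead stores \texttt{sBx} with an excess-K bias, so the exact encoding in the \lambdammm\ VM may differ.

\begin{lstlisting}[language=Rust]
/// Merge operands B (high byte) and C (low byte) into the unsigned
/// 16-bit field Bx.
fn bx(b: u8, c: u8) -> u16 {
    ((b as u16) << 8) | (c as u16)
}

/// Reinterpret the same 16 bits as the signed field sBx
/// (two-complement encoding assumed).
fn sbx(b: u8, c: u8) -> i16 {
    bx(b, c) as i16
}
\end{lstlisting}

Under this assumption, a jump back by two instructions would be encoded with \texttt{B} $=$ \texttt{0xFF} and \texttt{C} $=$ \texttt{0xFE}.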

In the pseudo-code, \texttt{R(A)} denotes data being moved in and out of the register (or call stack) at the base pointer + \texttt{A} for the current function. \texttt{K(A)} refers to the \texttt{A}-th entry in the static variable section of the compiled program, and \texttt{U(A)} accesses the \texttt{A}-th upvalue of the current function.
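
A minimal Rust sketch of this notation follows. The \texttt{Frame} type and the \texttt{addf} method are hypothetical. They only show how \texttt{R(A)}, \texttt{K(A)}, and \texttt{U(A)} map onto the call stack, the static variable section, and the upvalues. For simplicity, upvalues are treated as already-resolved values.

\begin{lstlisting}[language=Rust]
/// Hypothetical view of the data visible to the running function.
struct Frame {
    stack: Vec<f64>,     // the whole call stack
    base: usize,         // base pointer of the current function
    constants: Vec<f64>, // static variable section of the program
    upvalues: Vec<f64>,  // upvalues of the current function, resolved
}

impl Frame {
    /// R(A): register A, i.e. stack slot base + A.
    fn r(&self, a: usize) -> f64 {
        self.stack[self.base + a]
    }
    fn set_r(&mut self, a: usize, v: f64) {
        self.stack[self.base + a] = v;
    }
    /// K(A): the A-th entry of the static variable section.
    fn k(&self, a: usize) -> f64 {
        self.constants[a]
    }
    /// U(A): the A-th upvalue of the current function.
    fn u(&self, a: usize) -> f64 {
        self.upvalues[a]
    }
    /// A typed arithmetic instruction: R(A) := R(B) + R(C).
    fn addf(&mut self, a: usize, b: usize, c: usize) {
        let v = self.r(b) + self.r(c);
        self.set_r(a, v);
    }
}
\end{lstlisting}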

In addition to Lua's upvalue operations, four new operations, \texttt{GETSTATE}, \texttt{SETSTATE}, \texttt{SHIFTSTATE}, and \texttt{DELAY}, have been introduced to compile the $delay$ and $feed$ expressions of \lambdammm.
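
The behavior of \texttt{DELAY} can be pictured as a ring buffer kept in state storage. The following Rust sketch is a hypothetical model rather than the actual implementation. \texttt{RingDelay} assumes that the maximum delay length is fixed when the buffer is created and that a delay time of 0 returns the current input. The buffer starts filled with zeros, which matches the rule that terms evaluated before time 0 yield the default value 0.

\begin{lstlisting}[language=Rust]
/// Hypothetical ring buffer backing one DELAY instruction.
struct RingDelay {
    buf: Vec<f64>, // max_delay + 1 slots, zero-initialized
    write: usize,  // slot written at the current sample
}

impl RingDelay {
    fn new(max_delay: usize) -> Self {
        RingDelay { buf: vec![0.0; max_delay + 1], write: 0 }
    }

    /// Store `input` and return the input from `time` samples ago.
    fn process(&mut self, input: f64, time: usize) -> f64 {
        let len = self.buf.len();
        assert!(time < len, "delay time exceeds the buffer size");
        self.buf[self.write] = input;
        let read = (self.write + len - time) % len;
        let out = self.buf[read];
        self.write = (self.write + 1) % len;
        out
    }
}
\end{lstlisting}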

\begin{figure*}[ht]
\tt
\small
@@ -165,19 +161,19 @@
\subsection{Overview of the VM Structure}
\label{sec:vmstructure}

The overall structure of the virtual machine (VM), program, and instantiated closures for \lambdammm\ is depicted in Figure \ref{fig:vmstructure}. In addition to the usual call stack, the VM has a dedicated storage area (a flat array) to manage internal state data for feedback and delay.

This storage area is accompanied by pointers indicating the positions from which internal state data are retrieved via the \texttt{GETSTATE} and \texttt{SETSTATE} instructions. These positions are shifted forward or backward using the \texttt{SHIFTSTATE} instruction. The actual data layout in the state storage memory is statically determined during compilation by analyzing function calls involving references to \texttt{self}, \texttt{delay}, and other stateful functions, including those that invoke such functions recursively. The \texttt{DELAY} operation takes two inputs: \texttt{B}, representing the input value, and \texttt{C}, representing the delay time in samples.
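
The following Rust sketch models the flat state storage and its position, under the assumption that every state slot holds one \texttt{f64}. \texttt{StateStorage} and its methods are hypothetical names that mirror \texttt{GETSTATE}, \texttt{SETSTATE}, and \texttt{SHIFTSTATE}. They are not taken from the mimium source.

\begin{lstlisting}[language=Rust]
/// Hypothetical flat state storage laid out by the compiler.
struct StateStorage {
    data: Vec<f64>, // one slot per state variable, zero-initialized
    pos: usize,     // position used by GETSTATE and SETSTATE
}

impl StateStorage {
    fn new(size: usize) -> Self {
        StateStorage { data: vec![0.0; size], pos: 0 }
    }
    /// GETSTATE: read the state at the current position.
    fn get_state(&self) -> f64 {
        self.data[self.pos]
    }
    /// SETSTATE: overwrite the state at the current position.
    fn set_state(&mut self, v: f64) {
        self.data[self.pos] = v;
    }
    /// SHIFTSTATE: move the position forward or backward.
    fn shift_state(&mut self, offset: isize) {
        self.pos = (self.pos as isize + offset) as usize;
    }
}
\end{lstlisting}

Consider two stateful computations laid out one after the other. The generated code reads and writes slot 0, shifts forward by one, uses slot 1, and shifts back by one before returning.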

However, for higher-order functions, which take a function as an argument or return one, the internal state layout of the passed function is unknown at compile time. Therefore, a separate internal state storage area is allocated for each instantiated closure, distinct from the global storage area held by the VM instance. The VM also maintains an additional stack of pointers to state storage areas. Each time a \texttt{CALLCLS} instruction is executed, the VM pushes a pointer to the closure's state storage onto this state stack, and when the closure call finishes, it pops the pointer off the stack.
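
The state stack can be sketched in Rust as follows. This is a hypothetical model. Storage areas are referred to by indices into a vector instead of raw pointers, and index 0 stands for the global storage of the VM instance. \texttt{enter\_closure} and \texttt{exit\_closure} correspond to the push on \texttt{CALLCLS} and the pop at the end of the closure call.

\begin{lstlisting}[language=Rust]
/// Hypothetical state stack; storage areas are addressed by index.
struct StateStack {
    storages: Vec<Vec<f64>>, // [0] is the global storage of the VM
    stack: Vec<usize>,       // active storage areas, top is last
}

impl StateStack {
    fn new(global_size: usize) -> Self {
        StateStack { storages: vec![vec![0.0; global_size]], stack: vec![0] }
    }
    /// Allocate the state area of a newly instantiated closure.
    fn new_closure_storage(&mut self, size: usize) -> usize {
        self.storages.push(vec![0.0; size]);
        self.storages.len() - 1
    }
    /// CALLCLS: make the state area of the closure the active one.
    fn enter_closure(&mut self, id: usize) {
        self.stack.push(id);
    }
    /// End of the closure call: restore the state area of the caller.
    fn exit_closure(&mut self) {
        self.stack.pop();
    }
    /// State area that GETSTATE and SETSTATE currently address.
    fn current(&mut self) -> &mut Vec<f64> {
        let id = *self.stack.last().expect("global storage is always present");
        &mut self.storages[id]
    }
}
\end{lstlisting}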

Instantiated closures also maintain their own storage area for upvalues. Until a closure exits the context of its parent function (such a closure is called an ``Open Closure''), its upvalues hold negative offsets into the call stack of the ongoing execution. These offsets can be determined at compile time and are stored in the function prototype in the program. Additionally, an upvalue may refer not only to a local variable but also to an upvalue of the parent function (a situation that arises when at least three functions are nested). Thus, the array of upvalue indices in the function prototype stores pairs consisting of a tag, indicating whether the value is a local stack variable or an upvalue of the parent function, and the corresponding index (either the negative stack offset or the upvalue index in the parent function).

For example, suppose the upvalue indices of a function prototype are \texttt{[upvalue(1), local(3)]}. The instruction \texttt{GETUPVALUE 6 1} then reads entry 1 of this array (counting from 0), which is \texttt{local(3)}. The VM therefore takes the value at \texttt{R(-3)}, that is, 3 slots below the base pointer, and stores it in \texttt{R(6)}.
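
The lookup in this example can be written as a short Rust sketch. The names \texttt{UpIndex} and \texttt{exec\_getupvalue} are hypothetical, and values are simplified to \texttt{f64}. The upvalues of the parent function are passed in as already-resolved values.

\begin{lstlisting}[language=Rust]
/// Tag stored in a function prototype for each upvalue slot.
#[derive(Clone, Copy)]
enum UpIndex {
    Local(usize),   // negative stack offset from the current base pointer
    Upvalue(usize), // index into the upvalues of the parent function
}

/// GETUPVALUE A B: resolve entry B of the prototype, store it in R(A).
fn exec_getupvalue(
    stack: &mut [f64],
    base: usize,
    a: usize,
    b: usize,
    upindices: &[UpIndex],
    parent_upvalues: &[f64],
) {
    let value = match upindices[b] {
        UpIndex::Local(off) => stack[base - off],  // R(-off)
        UpIndex::Upvalue(i) => parent_upvalues[i], // upvalue i of the parent
    };
    stack[base + a] = value; // R(A) := value
}
\end{lstlisting}

With a base pointer of 10, \texttt{GETUPVALUE 6 1} in this sketch reads stack slot 7 and writes slot 16.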

When a closure escapes its original function context through the \texttt{RETURN} instruction, a \texttt{CLOSE} instruction inserted before \texttt{RETURN} moves the open upvalues from the stack to the heap. Because these upvalues may be referenced from multiple locations, especially with nested closures, some form of garbage collection is required to free the memory once they are no longer referenced.
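
The following Rust sketch illustrates the transition performed by \texttt{CLOSE}. It is a hypothetical model. An open upvalue is an absolute stack index, and a closed upvalue is a heap value shared through \texttt{Rc}, whose reference counting stands in for the garbage collection mentioned above. Because \lambdammm\ has no reassignment, closing can simply copy the value off the stack.

\begin{lstlisting}[language=Rust]
use std::rc::Rc;

/// Hypothetical upvalue representation.
#[derive(Debug, Clone, PartialEq)]
enum Upvalue {
    Open(usize),     // absolute index of the captured variable on the stack
    Closed(Rc<f64>), // value moved to the heap by CLOSE
}

/// Read an upvalue regardless of whether it is open or closed.
fn read(up: &Upvalue, stack: &[f64]) -> f64 {
    match up {
        Upvalue::Open(i) => stack[*i],
        Upvalue::Closed(v) => **v,
    }
}

/// CLOSE: copy every open upvalue of the escaping closure to the heap.
fn close_all(ups: &mut [Upvalue], stack: &[f64]) {
    for up in ups.iter_mut() {
        if let Upvalue::Open(i) = *up {
            *up = Upvalue::Closed(Rc::new(stack[i]));
        }
    }
}
\end{lstlisting}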

In the current specification of \lambdammm, evaluation is call-by-value and there is no reassignment expression, so the VM has no \texttt{SETUPVALUE} instruction. This also simplifies open upvalues. If reassignment were allowed, an open upvalue would have to be a shared memory cell, because multiple closures could refer to it, and that cell would need to be converted into a closed value when a \texttt{CLOSE} instruction is executed.

\begin{figure*}[ht]
\centerline{\includegraphics[width=\hsize]{lambdammm_vm_structure}}
@@ -203,15 +199,15 @@
RETURN 3 1
\end{lstlisting}

Listing \ref{lst:bytecodes_onepole} shows a basic example of how the mimium code in Listing \ref{lst:onepole} is compiled into VM bytecode. When \texttt{self} is referenced, the value is retrieved using the \texttt{GETSTATE} instruction, and the internal state is updated by storing the return value with the \texttt{SETSTATE} instruction before returning it via the \texttt{RETURN} instruction. In this case, the actual return value is obtained by the second \texttt{GETSTATE} instruction, which ensures that the initial state value is returned when time = 0.

For example, if a time counter is written as \texttt{| | \{self + 1\}}, whether the return value at time = 0 should be 0 or 1 is a design choice of the compiler. Returning 1 does not strictly follow the E-FEED rule in Figure \ref{fig:semantics}. However, if the compiler is designed to return 1 at time = 0, the second \texttt{GETSTATE} instruction can be omitted, and \texttt{R(2)} should be used as the operand of the \texttt{RETURN} instruction.
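
The two design choices can be compared with a short Rust simulation of the counter. \texttt{run\_counter} is hypothetical and abstracts away the bytecode. Each iteration performs the \texttt{GETSTATE}, addition, and \texttt{SETSTATE} steps. The flag \texttt{return\_old} selects whether the value read before the update (0 at time 0, as in E-FEED) or the updated value (1 at time 0) is returned.

\begin{lstlisting}[language=Rust]
/// Simulate `| | {self + 1}` for `n` samples under either design.
fn run_counter(n: usize, return_old: bool) -> Vec<f64> {
    let mut state = 0.0; // initial value of self (numeric default)
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        let old = state;     // GETSTATE: read self
        let new = old + 1.0; // self + 1
        state = new;         // SETSTATE: store the updated state
        out.push(if return_old { old } else { new });
    }
    out
}
\end{lstlisting}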

A more complex example and its expected bytecode are shown in Listings \ref{lst:fbdelay} and \ref{lst:bytecodes_fbdelay}. The code defines a delay with feedback, \texttt{fbdelay}. Another function, \texttt{twodelay}, uses two feedback delays with different parameters, and \texttt{dsp} finally uses two instances of \texttt{twodelay}.

After each reference to \texttt{self} through the \texttt{GETSTATE} instruction, and after each call to another stateful function, a \texttt{SHIFTSTATE} instruction is inserted to advance the state storage position in preparation for the next non-closure function call. Before the function exits, another \texttt{SHIFTSTATE} instruction resets the state position to where it was when the current function context began. Hence, the operands of all \texttt{SHIFTSTATE} instructions within a function must always sum to 0. Figure \ref{fig:fbdelay_spos} illustrates how the state position moves with \texttt{SHIFTSTATE} operations during the execution of the \texttt{twodelay} function.
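
The invariant on \texttt{SHIFTSTATE} can be checked mechanically. In the following hypothetical Rust sketch, \texttt{trace\_positions} lists the state position relative to the function entry after each shift, and \texttt{is\_balanced} tests that the operands sum to 0.

\begin{lstlisting}[language=Rust]
/// State positions, relative to the position at function entry,
/// after each SHIFTSTATE instruction of one function.
fn trace_positions(shifts: &[isize]) -> Vec<isize> {
    let mut pos: isize = 0;
    shifts
        .iter()
        .map(|s| {
            pos += *s;
            pos
        })
        .collect()
}

/// Compiler invariant: all shifts within a function cancel out.
fn is_balanced(shifts: &[isize]) -> bool {
    shifts.iter().sum::<isize>() == 0
}
\end{lstlisting}

For example, the invented operands \texttt{[2, 3, -5]} visit positions 2, 5, and 0, so the function ends where it started. These values do not describe the actual layout of \texttt{twodelay}.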

By representing the internal state as a relative position within the state storage, the state data can be laid out as a flat array, which keeps the compiler implementation simple. The previous implementation of mimium instead had to analyze the call tree from its root to build a tree-shaped state structure. This approach parallels the way upvalues simplify the compiler by expressing free variables as relative positions on the call stack.

\begin{lstlisting}[float,floatplacement=H,label=lst:fbdelay,language=Rust,caption=\it Example code that combines self and delay without closure call.]
fn fbdelay(x,fb,dtime){