Assembly language
The tenyr assembly language is algebraic, rather than mnemonic like most common assembly languages : it doesn't use keywords like mov or xor to signal operations to the assembler. Examples will clarify :
_start:
b <- 10 // set up loop constraint
// comments can appear anywhere on a line
top:
b <- b - 1 // decrement loop variable
c <- b > a // compare b to 0
p <- c & -3 + p // jump back
done:
illegal
There are 16 general-purpose registers, all 32 bits wide, named A through P. Register A, when read from, is always 0 ; it can be written to, but with no effect. Register P is the program counter / instruction pointer ; it can be written to directly, but otherwise will increment by 1 with every executed instruction. Its current value is the address of the executing instruction, plus one.
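The register rules above can be modelled in a few lines of Python. This is only a sketch, not tenyr's own code : the class RegisterFile and its methods are hypothetical names, and wrapping written values to 32 bits is an assumption.

```python
MASK32 = 0xFFFFFFFF


class RegisterFile:
    """Hypothetical model of the 16 registers A..P (not tenyr's own code)."""

    NAMES = "ABCDEFGHIJKLMNOP"

    def __init__(self):
        self.values = [0] * 16

    def read(self, name):
        index = self.NAMES.index(name.upper())
        # A always reads as 0, whatever was written to it.
        return 0 if index == 0 else self.values[index]

    def write(self, name, value):
        index = self.NAMES.index(name.upper())
        if index != 0:  # writes to A are accepted but have no effect
            self.values[index] = value & MASK32  # assumed 32-bit wraparound

    def begin_instruction(self, address):
        # While the instruction at `address` executes, P reads as address + 1.
        self.values[15] = (address + 1) & MASK32


regs = RegisterFile()
regs.write("A", 123)
regs.write("B", 10)
regs.begin_instruction(4)
print(regs.read("A"), regs.read("B"), regs.read("P"))  # prints: 0 10 5
```

This is also why the loop example adds -3 to P to reach top : when the jump instruction at address top + 2 executes, P already reads as top + 3.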
All operations in tenyr have essentially one form :
- Z arrow W op X + Y
where arrow is <- or -> (the latter is allowed only if the right side is dereferenced) ; op is one of the supported operations ; Z is a named register ; and W, X, and Y are two named registers and one 12-bit sign-extended immediate value (exactly one of the three is an immediate, in any position, and the remaining two are registers). Regardless of the op, the operation W op X always occurs logically before the addition of Y. Dereferencing (using the value as a pointer into memory, and retrieving or storing through that pointer) is also possible ; however, only one side of the arrow can be dereferenced per instruction. It is possible to leave out any one or two of the elements on the right side of the arrow ; the assembler fills in missing operands with A (which is always zero) and missing operations with bitwise-or.
For example, using real register names this time (whitespace is not significant, and is inserted only for comparison) :
b <- c # load b with the value c
b <- [c + d + 1] # load b with the value at address c+d+1
[b] <- c + 1 # store to address b the value (c|a)+1
b <- c * 3 # load b with the value c*3+a
b <- [ 3 + d] # load b with the value at address (a|3)+d
[b] <- 3 # store to address b the value (a|3)+a
b <- 3 * c # load b with the value 3*c+a
and when the right-hand side is dereferenced, the arrows can be reversed :
b -> [c + d + 1] # store b to address c+d+1
b -> [0x333] # store b to address 0x333
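The way omitted elements are filled in can be replayed with a small Python sketch. The function right_side is a hypothetical helper, not part of tenyr ; it covers only three operations and assumes results wrap to 32 bits. Omitted operands default to 0 (the value of A) and an omitted operation defaults to bitwise-or, as described above.

```python
import operator

MASK32 = 0xFFFFFFFF

# Only a few of the operations, enough to replay the examples above.
OPS = {"|": operator.or_, "+": operator.add, "*": operator.mul}


def right_side(w=0, op="|", x=0, y=0):
    """Value of `W op X + Y`; omitted operands default to 0 (register A)
    and an omitted op defaults to bitwise-or."""
    inner = OPS[op](w, x) & MASK32  # W op X is computed first
    return (inner + y) & MASK32     # then Y is added


c, d = 7, 100  # arbitrary register contents chosen for the demonstration

print(right_side(w=c))                    # b <- c       : (c|a)+a -> 7
print(right_side(w=c, y=1))               # [b] <- c + 1 : (c|a)+1 -> 8
print(right_side(w=c, op="*", x=3))       # b <- c * 3   : c*3+a   -> 21
print(right_side(x=3, y=d))               # [3 + d]      : (a|3)+d -> 103
print(right_side(w=c, op="+", x=d, y=1))  # [c + d + 1]  : c+d+1   -> 108
```

For the dereferenced forms, the printed number is the address that would be loaded from or stored to, not the value found there.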
There are a few accepted forms of syntactic sugar :
b <- -c # equivalent to b <- a - c
b <- ~c # equivalent to b <- a ^~ c
b -> c # equivalent to c <- b
Currently the following operations are supported. These operations have not yet been finalised, but they are not likely to change.
- X + Y : add
- X - Y : subtract
- X * Y : multiply
- X < Y : compare less than
- X == Y : compare equal
- X >= Y : compare greater than or equal to
- X <> Y : compare not equal
- X | Y : bitwise or
- X & Y : bitwise and
- X &~ Y : bitwise and ones' complement
- X ^ Y : bitwise xor
- X ^~ Y : bitwise xor ones' complement
- X << Y : shift left
- X >> Y : shift right logical
- X >>> Y : shift right arithmetic
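The bitwise and shift operations in this list can be sketched in Python as follows. The table BITWISE_OPS and the helper to_signed are hypothetical names ; shift counts are assumed to lie between 0 and 31 ; and the comparison operations are left out because the value they produce is not stated here. Note that >> is the logical shift and >>> the arithmetic one, which is the reverse of Java's convention.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


# Shift counts are assumed to lie in 0..31; larger counts are not modelled.
BITWISE_OPS = {
    "|":   lambda x, y: (x | y) & MASK32,
    "&":   lambda x, y: (x & y) & MASK32,
    "&~":  lambda x, y: (x & ~y) & MASK32,
    "^":   lambda x, y: (x ^ y) & MASK32,
    "^~":  lambda x, y: (x ^ ~y) & MASK32,
    "<<":  lambda x, y: (x << y) & MASK32,
    ">>":  lambda x, y: (x & MASK32) >> y,             # logical: zero fill
    ">>>": lambda x, y: (to_signed(x) >> y) & MASK32,  # arithmetic: sign fill
}

c = 0x80000010
print(hex(BITWISE_OPS[">>"](c, 4)))        # 0x8000001
print(hex(BITWISE_OPS[">>>"](c, 4)))       # 0xf8000001
print(hex(BITWISE_OPS["&~"](0xFF, 0x0F)))  # 0xf0
print(hex(BITWISE_OPS["^~"](0, c)))        # 0x7fffffef
```

The last line shows why b <- ~c can be written as b <- a ^~ c : with A reading as zero, a ^~ c is just the ones' complement of c.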
Immediate values are always sign-extended from 12 bits to 32 bits, regardless of the operation and instruction type.
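As an illustration, sign extension from 12 to 32 bits can be sketched in Python. The function sign_extend_12 is a hypothetical helper written for this page, not something tenyr provides.

```python
def sign_extend_12(imm):
    """Sign-extend the low 12 bits of `imm` to a 32-bit pattern."""
    imm &= 0xFFF
    if imm & 0x800:        # bit 11 is the sign bit
        imm |= 0xFFFFF000  # copy it into bits 12..31
    return imm


print(hex(sign_extend_12(0x333)))  # 0x333
print(hex(sign_extend_12(0x7FF)))  # 0x7ff
print(hex(sign_extend_12(0x800)))  # 0xfffff800
print(hex(sign_extend_12(0xFFD)))  # 0xfffffffd
```

A 12-bit immediate therefore covers the range -2048 to 2047 ; the last line is the encoding of the -3 used in the loop example.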
Labels are strings from 1 to 31 characters long that are used to mark and refer to addresses in code symbolically. They must match the regular expression /[A-Z_][A-Z0-9_]{1,30}|[Q-Z_][A-Z0-9_]{0,30}/ ; i.e., starting with an alpha character or underscore, less than 32 characters long, and not the single letters A through P. A label is created by suffixing it with a colon, and is referred to by prefixing it with an at-sign :
b <- @foo
foo:
.word 0xbeef
at which point register B now contains the address represented by foo, which points to a 32-bit word containing the value 0x0000beef. Here, .word is a directive.
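The label rule can be checked with a short Python sketch. It uses the regular expression as given, applied to the whole name. Case-insensitive matching is an assumption, made because the examples on this page use lowercase labels ; is_valid_label is a hypothetical helper.

```python
import re

# The pattern from the text, applied to the whole string. IGNORECASE is an
# assumption, made because the examples use lowercase labels.
LABEL_RE = re.compile(
    r"[A-Z_][A-Z0-9_]{1,30}|[Q-Z_][A-Z0-9_]{0,30}", re.IGNORECASE
)


def is_valid_label(name):
    """True if `name` matches the label pattern in full."""
    return LABEL_RE.fullmatch(name) is not None


for name in ["foo", "_start", "q", "_", "b", "p", "9lives", "x" * 31, "x" * 32]:
    print(f"{name!r}: {is_valid_label(name)}")
```

Single letters from Q to Z and a lone underscore are accepted, while the single letters A through P are rejected because they name registers.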
The . character can be used as a sort of special label reference, to the current instruction or directive. For example, if bar in the following example represents address 6, then the value at bar will be 11.
bar: .word . + 5
The directives currently supported are :
- .word : creates 32-bit words containing the values of the expressions following it (multiple expressions must be separated by commas).
- .ascii : packs the double-quoted string following it at 8 bits per character into 32-bit words, little end first.
- .utf32 : creates one 32-bit word for each character in the double-quoted string following it. It's not really UTF-32, but it intends to be.
- .global : takes a label name and marks that label as global to the linker.
Examples of .word :
.word 0
.word 2 * 3, 3 * 4, 0x2
.word (@bar + 7) * @foo - .
Examples of .ascii and .utf32 :
.ascii "hello, world"
.ascii "this" " " "is a series of " "concatenations"
.utf32 "hello, world"
.utf32 "this" " " "is a series of " "concatenations"
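The two string directives can be sketched in Python under some assumptions : "little end first" is read as placing the first character in the least significant byte of each word, the last word is padded with zeros when the string length is not a multiple of four, no terminator is appended, and .utf32 stores each character's code point. The functions pack_ascii and pack_utf32 are hypothetical helpers, not the assembler's own code.

```python
def pack_ascii(text):
    """Pack 8-bit characters four to a 32-bit word, first character in the
    least significant byte. Zero padding of the last word is an assumption."""
    data = text.encode("latin-1")
    data += b"\0" * (-len(data) % 4)  # pad up to a whole number of words
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


def pack_utf32(text):
    """One 32-bit word per character, holding its code point (assumed)."""
    return [ord(ch) for ch in text]


print([hex(w) for w in pack_ascii("hello, world")])
# ['0x6c6c6568', '0x77202c6f', '0x646c726f']
print([hex(w) for w in pack_utf32("hi")])
# ['0x68', '0x69']
```

Under these assumptions the twelve characters of "hello, world" occupy three words with .ascii and twelve words with .utf32.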
Three kinds of comments are supported : C89-style comments (non-nesting) delimited by /* and */ ; C99-style comments starting with // ; and shell-style comments starting with # . The latter two types of comments extend only to the end of line, and as there is no line continuation character, every commented line must have its own // or # character.