-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathLISP-OLD.DOC
2883 lines (2414 loc) · 130 KB
/
LISP-OLD.DOC
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
St. Vitus' Lisp
Lisp Interpreter for MS-DOS machines.
Copyright (C) 1988 - 1991
by
Antti J. Karttunen
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 1, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License (file GPL.TXT) for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
PREFACE
Note: In this document words "cons cells" and "list nodes"
are used interchangeably. Also () and nil mean the same thing.
Sometimes acronym SVL is used for St. Vitus' Lisp. Acronym i*86
is used for all the Intel series processors 8086, 8088, 80126,
80286, 80386 and so on.
Note: I do not use any version numbering with St. Vitus's
Lisp, instead when the program is started it will print
it's header:
St. Vitus's Lisp (X) Copyright (C) 1991 A. Karttunen. Enter (help) for help.
where X is number of the internal functions. Generally this will
be greater in future when more functions are added to the system.
Currently X is 206.
And this version is still PREMATURE! Even this document is
premature and contains perhaps erroneous information. I know that
many things should be done better, but my plane is leaving soon and
I have no time to debug and check everything.
INTRODUCTION
Saint Vitus' Lisp is relatively small, relatively effective,
and relatively unconventional lisp interpreter for MS-DOS
machines with Intel's i*86 series of processors.
Some of the basic ideas behind the St. Vitus' Lisp:
1) St. Vitus' Lisp doesn't attempt to be a compatible
with any standard, not at least with Common Lisp.
However, some functions does have the same names as in
Franz Lisp of unix/vms, and maybe SVL gets closer to
Franz in future.
2) St. Vitus' Lisp uses freely all kind of features and
concepts from various dialects of lisp (Franz Lisp,
Interlisp, Commonlisp), and from the other programming
languages too, most notably from the C language.
3) St. Vitus' Lisp is not so "high-level" language as
lisps usually, instead it's more like between the C and
traditional Lisps. Emphasis is not so much on "high-level"
AI-coding than providing simple, powerful interpreter for
all kind of little tasks where I have usually used C.
Of course you can use SVL also for the traditional AI-tasks,
for the Lisp as general purpose language is suitable for
almost all kind of problems.
Especially I have been interested about reading and
parsing sophisticated symbolic input possibly with
nesting list structures, and for compiling and translating
this data. (Optimizers, assemblers, compilers, human-text
analyzers). For this reason, special emphasis is put on
the flexibility of the
I/O-system. This is acquired with user definable read
macros, possibility to define own integer subtypes
and output-formats for them, and by tuning the
I/O-system by modifying the *charflags* and *io-flags*
system variables.
4) Although St. Vitus' Lisp is interpreter it attempts
to be comparatively fast and memory-effective. Following
things contribute to the speediness of system:
a) There's no type-checking not at all so much as in
traditional lisps. See section TYPE CHECKING.
b) St. Vitus' Lisp uses a new innovative method for
transferring the actual arguments to the called
function. See section FUNCTION CALLING.
c) Source code doesn't attempt to be easily portable. Most
fundamental functions like car, cdr, argument
pushing functions, etc. are coded in assembly
language of i*86 processors. However, it shouldn't
be ultimately impossible to port the SVL to machines
using MC68k, for example.
d) Data structures make optimal use of the segment:offset
long pointers of Intel's (awful) architecture, so that
basic objects of St. Vitus' Lisp take only 3 bytes
in memory, cons cells only six bytes, and compact
lists, another innovation, only half of that, three
bytes per node. In this respect, however, some of the
speed is sacrificed for the data compactness.
But if you are still wondering why code runs so slowly
on some particular occassion, you must remember that
this is only interpreter, after all. And of course execution
time also depends whether you run software on 4.77 Mhz
original PC/XT or on some 33 Mhz monster.
5) SVL doesn't attempt to be a pedantic and strict language
like some Pascal, which concentrates more on what programmer
SHOULDN'T do, than what (s)he COULD do. Instead it tries
to be exact opposite, so that almost everything should be
possible, like in the assembly language, even if that may
result bad, unpractical, weird-looking, unlegible, or even
buggy code. Self-modifying code is the one example.
So it's attempted to give programmer the full control of
internal machinery of the system, even if that's "dangerous".
I assume that programmer using this system has at least some
knowledge of the C and assembly languages (not to talk about
lisp!), and of the internal workings of the processor
at machine code level, rather than to be some theoretical-minded
"high-level" type. And so I hope that (s)he has the self-discipline
to write the correct code.
6) Note that SVL isn't ready yet, probably never, as all my
programs are under the constant development. There are
probably numerous bugs, and numerous other misfeatures.
REQUIREMENTS
You can run St. Vitus' Lisp on any computer having some
Intel's i*86 processor and some reasonable version of
MS-DOS. So this includes at least IBM/PC-clones, NEC/PC-9801
and clones, and maybe even Nokia's obsolete MikroMikko 2 ????
I am not sure how much of the ram SVL needs, but I hope
that you have enough. I have run this on machine having 512K,
so at least it suffices. SVL doesn't use any graphics-modes,
so any screen-controller goes. The global variable *screen*
points to the beginning of CGA-videomemory, but that's easily
changed.
I have run SVL on these machines, among others:
1) PC/AT clone with 512K of ram and MS-DOS 3.30
2) PC/386 with the megabytes of ram and MS-DOS 5.00
(This distribution is compiled with this machine.)
3) Epson PC-286/LE with 640K of ram and Japanese version
of MS-DOS 4.01 This computer is compatible with the Japanese
NEC PC-9801.
Modules have been compiled with Aztec-C version 3.40a,
assembled with Aztec 8086 Assembler version 3.40a and
linked with Aztec C Linker Vers. 3.40a. See the file
COMPILE.LOG which contains the log of all modules compiled
and linked, and also some virus-checking.
Note that some assembly-code is extremely Aztec-specific
and it's dubious whether this compiles with even any other
C-compiler & assembler for the PC-clones. Assembly-functions
have been written relying on the fact that only SI & DI
registers and CS, DS & SS segment registers need to be saved
between function calls, but AX, BX, CX, DX and ES can be
modified freely. Sixteen-bit integer results are returned
in AX and 32-bit integer results in DX and AX, so that former
contains the high word, and latter the low word.
Functions coded with C are called so that arguments are pushed
from right to left so that first argument in list is left
topmost on the stack. Calling function unwinds the stack
with ADD SP,somevalue instruction after the called function
has returned with RET instruction.
Also there is some Aztec-specific library functions like
ptrtoabs, _ptradd and _ptrdiff for the conversion of long
pointers to 20-bit absolute addresses and so on.
Function scdir scans the directory in the function dir (lispfuns.c)
and scr_getc() in sighandler (lisp.c) function gets the one character
from the keyboard without echo and without waiting the CR.
COMMAND LINE OPTIONS & ENVIRONMENT VARIABLES
You can start St. Vitus' Lisp simply by entering
lisp
and that starts the interpreter.
Currently there is only one option, -l, which tells
the interpreter to automatically load the file whose
name is following the option. If filename doesn't
contain the period (.), then the default extension .LSP
is used. There can be space between -l and filename,
but doesn't have to. Example:
lisp -lveba this stuff is ignored -l hiba.l
loads files veba.lsp and hiba.l to interpreter at the
starting time. Note that all other command line arguments
are ignored at starting time, and can be later accessed
with argv-function.
There is also following environment variables which
affect the workings of SVL:
MAXMEM
This is the most important one, because this affects how
much memory is allocated at the initialization of list system.
If this environment variable is not defined, or is not numeric
then default value 150000 is used. If value of this variable
is too small then memory can be exhausted, and when that
condition is encountered SVL stops the execution brutally,
and exits back to shell. So it's generally good idea to set
this environment variable as big as possible. However, if
you want to use shell-function (which jumps to shell), then
you should leave enough space for command interpreter of the
operating system. If MAXMEM is set to too big value, then
the error message like this is printed:
initlists: can't set heap pointer to 0x9d0b0008 ! heap: 0x786c0008 allocspace:
150000.
and you should set MAXMEM to smaller value respectively.
OBTABSIZE
This is an environment variable used to set the size of
the internal symbol table *obtable*. By default it's 251.
For more information see description of global variable *obtable*.
LINEBUFSIZE
This is an environment variable used to set the size of
the internal read buffer. By default it should be big enough
for the most purposes, so you probably don't need to use
this one.
TYPES OF DATA OBJECTS IN LISP
One of the most fundamental differences between Lisp and
more conventional programming languages is that while latter
have usually the compiletime-typing (like int x or char c in C),
lisp has the runtime typing. This means that every data object
in lisp contains also little piece of information (usually called
a tag) which tells the type of that data object. That is, whether
it's a list, integer, symbol or something else. And that
information is used at runtime to decide what to do for the object.
Here is the detailed description of the various data types of
St. Vitus' Lisp and how they are implemented at C and assembly
level:
Fundamental type of the list system is TOB (at source level!),
meaning simply 'Typed OBject'. It's defined in lists.h as:
typedef ULI TOB;
ULI meaning 'Unsigned Long Integer', i.e. it's 32-bit unsigned
entity (i.e. doubleword). TOB is composed of the following fields:
3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 : 1 1 1 1 1 1
1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 : 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
: : : : : :
:Highest: T :C: : :
:nybble : Y :O: : these bits are : OFFSET of the pointer,
:of the : P :M: : set to zeroes : or 16-bit integer if
:segment: E :P:*: when object is : the type is INT
: or : :A: : loaded from :
:integer: :C: : the memory. :
:subtype: :T: : :
: : : : : :
Bits 27-26 contain the type of TOB, allowing 4 different types.
Compact bit (25) is explained later, and bit 24 is reserved for
the future use. Low word, bits 15-0 contains the offset of pointer,
or 16-bit integer if type is INT. Four highest bits 31-28 contain
highest nybble of segment, other bits of segment set to zeroes,
when fetching data. However, if type is INT, then those bits contain
printing subtype of integer.
So TOB is 4 bytes, when it's loaded "out from memory" (as with car,
or value), but only 3 bytes when in memory, because bits 23-16 are
not needed, and in 4 byte mode they are always zeroes.
If type of TOB is pointer to somewhere, then bits 27-24 are just
masked off, and result is used as pointer.
Conanized pointers
With canonized pointer is usually understood the long pointer
of the i*86-processors, which is guaranteed to be in the format:
????:000?
that is, segment part can be anything, but offset part is always
between 0-15, other nybbles than lowest one being zeroes.
Correspondignly with "conanized pointer" I mean a pointer which
is guaranteed to be in the format: ?000:????
I.e. offset part can be anything from 0 to 65535, but in segment
part only the highest nybble is significant, other ones being zeroes.
And because all 20-bit addresses from 0 to 1048575 can be uniquely
expressed with this format, other bits in segment can be used for
expressing run time type, or something else. Note that you get
absolute 20-bit address from conanized pointer just by putting
highest nybble to the front of other nybbles in offset part.
(That is: multiply the highest nybble with 65536 and add an offset).
There is functions conanize and t_conanize in the file conanize.asm,
and they are used for converting long pointers in arbitrary format
to the conanized form. t_conanize additionally sets the type byte
also.
There are four basic types, there being just two bits reserved
for the type.
They are:
CONS CELLS
These are list-nodes, list-nodes being six-byte entities
composed from the two fields, called for the lisp-historical
reasons CAR & CDR. Both can contain any other TOB object.
Usually car-field contains the 'head' of the list, and cdr-field
the 'tail', i.e. pointer to the rest of list. So if we have the
list which, when printed, looks like (A B C), it's in memory
looking like this:
(Exclamation-columns (!) stand for pointers to the
corresponding objects (in this case to symbols A, B and C),
arrows in CDR fields meaning a pointers to the next cons cells
and zeroes at upper lefthand corners mean that all compact
bits are off).
CAR CDR CAR CDR CAR CDR
+---+---+ +---+---+ +---+---+
:0 :0 : :0 :0 : :0 :0 :
: ! : --------> : ! : --------> : ! :NIL:
+-!-+---+ +-!-+---+ +-!-+---+
! ! !
A B C
So this list takes altogether 3*(2*3) = 18 bytes in the memory.
However, if COMPACT bit (bit 25) is on in the CAR field of cons-cell,
then it means that rest of the list is not got by following the
pointer in the CDR field, instead it is just the address of this
cons cell plus size of TOB in memory (i.e. 3 bytes). List whose
all cons nodes have compact bit on (except last) is the so called
"compact list", and is effectively a vector, i.e. one-dimensional
array, except that elements of it can still be of the different
types. So the same list (A B C) as compact list is in memory:
(c at upper lefthand corner meaning that compact bit is on, 0 that
it is off).
Compact list Same list after
(A B C): the cdr-function,
resulting (B C):
CAR CAR CAR CDR CAR CAR CDR
+---+---+---+---+ +---+---+---+
:c :c :0 :0 : :c :0 :0 :
: ! : ! : ! :NIL: : ! : ! :NIL:
+-!-+-!-+-!-+---+ +-!-+-!-+---+
! ! ! ! !
A B C B C
So now it takes only 4*3 = 12 bytes in memory. Note that the
last cons cell in compact list must always be normal one,
compact bit off, and cdr containing NIL, because there is no
other way to mark the end of the compact list. Note that there
can exist even mixed lists of compact and normal cons cells,
e.g. if normal list is concatenated to the end of the compact
list or vice versa.
What is this mysterious NIL anyway ? Well, it's the most
important object of the list system, being just a longword zero,
and defined at source level as: #define NIL ((TOB) 0)
It corresponds to the NULL of the C language, and is used with same
way to mark all kind of special conditions, and is also used as
truth value "false", to be opposite of all other TOB's which are
considered "true". So when the cdr-field of the cons cell is NIL,
it means that list ends here, so that list accessing functions
shouldn't go any further. Note that, although NIL has same type
bits as CONS cells (i.e. 00), it is still not proper CONS cell,
because there is no CAR or CDR field in it, it is just NIL,
containing no other information. However, CONS cells and NIL
belong to the same supertype which is LIST, because NIL also
stands for an empty list, i.e. the list where is no elements
(whose length is zero), and is printed as: ()
This is also only way to refer NIL in kanjidic, in lisp it can
be also referenced with the name nil.
SYMBOLS
Symbols may look just like some kind of strings at first glance,
but the print name of symbols which is visible, is only one aspect
of the symbols. There is two or three parts in symbols, which are:
Print name, which is just string containing the name of symbol.
Value, which is field which can contain any TOB object, usually
some information associated with symbol.
Plist, meaning "property list", kind of second value, which can also
used for keeping information about symbol. However, space for
plist field is allocated only if global variable *plist-flag* is
set to 1.
(By default this is on in St. Vitus' Lisp).
Symbol with print name "muuliaasi" is allocated to memory as:
value
+---+---+-----------+
: : :muuliaasi\0:
+---+---+-----------+
plist ^
(optio- :
nal) Address of symbol points to the beginning of print name.
So it takes length of print name plus 4 bytes (3 for value,
1 for ending zero) or length of print name plus 7 bytes if
plist fields are used. Note that because address of symbols points
to the beginning of print name, it can be fetched just by converting
the symbol to the type string.
The most important point in symbols is that everyone is unique
(or at least should be), i.e. that there is never more than one symbol
in the lisp system with given name. This is guaranteed by functions
lookup & intern (in lists1.c), which maintain hash table of symbols
read into the system so far. So when string muuliaasi is encountered
in input stream, it is first checked that it is not already in symbol
table (called oblist, as for OBject LIST), and if it is, then that
old symbol is returned. Only if it is not previously met, is the new
symbol allocated.
Symbols are used in St. Vitus' Lisp for the names of the (global)
variables, and for the names of the functions, among other things.
INTEGERS
Integers are 16-bit words, which can be interpreted as signed
or unsigned numbers or as bit-masks, or characters, or for whatever
purpose user likes to apply them. Because these are not pointers
the segment bits 31-28 at the most significant nybble of high word
can be used for the other purposes. So they are used to denote the
printing subtype of integer, which affects how the integer is printed.
They are:
DEC Ordinary decimal. This is default, (segbits are zero).
HEX Unsigned hexadecimal, eg. 0xABCD
OCT Unsigned octal, eg. 0177777
CHAR Single character, eg. `c` That quotecharacter can be
changed, or it can be printed without quotes, with princ
for example.
PICK Pick-forms. These are used for fetching function arguments
from the system stack. More about this below.
More printing subtypes can be invented at future, because there
is 2^4 = 16 possibilities. Integers with same magnitude but with
different subtypes are considered (or at least should be) equal,
although macro eq (in C-sources, defined in lists.h) which just
uses the == operator of the C, treats them as different. However,
function L_eq (in fundamen.asm) which is used by eq function of lisp,
treats them correctly as equal.
OTHERS
This is a typeclass containing all the other types, which are:
STRING
This is just a pointer to C styled string. There isn't any
values or other things associated with it, just characters
of string followed by '\0', i.e. zero, as in C.
FP
This is the File Pointer, corresponding to FILE *fp type of C,
which is just conanized. These can be distinguished from other
subtypes of OTHER because they all point to that determined area
where C keeps its file pointers (stdin, stdout, etc.)
FUN or BCD (= binary code)
This is pointer to compiled function, i.e. binary coded, and
is currently distinguished from other subtypes by the fact
that because currently kanjidic and lisp are compiled in
"small code, large data" memory model, all the addresses of the
code are just 16 bits, so segment bits are zero. And because
there is operating system & other kind of garbage in low memory,
there (I hope so) isn't any pointers to data in area 0000:0000 to
0000:FFFF.
However, if list system is compiled in the future with "large code"
memory model, or for different processor (for M68k for example), type
testing must be done differently. Fortunately, most systems put
code and data to different sections, so it would suffice just to
test that pointer points to somewhere in code section.
ENDMARK
This is special object which is used to represent special conditions
where NIL cannot be used. It is simply 0xFFFF:FFFF, so it cannot
be stored in value of symbol or in list without corrupting it,
because when stored to memory and fetched back it's corrupted
to 0xFF00:FFFF because the second highest byte is not stored
to memory. Despite this defect it is still useful in certain
situations, for example, read-functions in lists1.c return
it if they detect some error (e.g. unbalanced parentheses), or
End Of File condition. It would be stupid to return NIL on
those occassions, because () can be part of legal input stream.
Also functions which take variable number of arguments,
like listn & clistn require this to be last argument of theirs,
so that they can detect when arguments end. (Because C doesn't
support variable number of arguments in any special way).
SYNTACTICAL CONVENTIONS
In general, for every internal lisp object there is corresponding
"external" ascii-represetation. It's the duty of the read functions
to convert to and allocate corresponding internal lisp objects
when it parses the external ascii-represetations, and the task
of the print functions is to do the translation from internal
binary objects to external ascii-format. Generally, reading
and parsing is much harder to do than the printing. Also,
every internal lisp object produces some output when printed
(at least garbage), but there is some lisp-objects which
cannot be read in, instead they must be created at the execution
time (like pick-expressions).
Here is some of the syntax explained, but I unfortunately haven't
time now to describe everything.
COMMENTS
There is two kind of comments. Everything after and including
semicolon (;) upto the end of line is ignored. This is the traditional
comment character used by the Lisp and most of the assembly
languages. Also everything inside /* and */ are ignored. Like
in C these don't nest.
; For example, this is the semicolon-comment.
/* And this is the "C-style" comment. */
NUMBERS
Octal numbers are preceded by the digit 0, followed by one or
more octal digits (0-7) and delimited by some non-continuous
character, except period. For example, 0123456 is an octal number.
0 by itself is not.
Hexadecimal numbers are preceded by the prefix 0x followed
by one or more hexadecimal digits (0-9A-Fa-f).
Normal decimal integers begin with any other digit than zero,
followed by zero or more digits and delimited by
some non-continuous character. However, if value of the
global variable ibase is 8 then these kind of numbers are also
read in as octals. If number ends with a dot (.) then it's
understood as decimal even if its first digit is zero,
or ibase is 8. Also single digit 0 is understood as normal
decimal integer, not octal. Numbers greater than 65535
are read in as longcells, i.e. cons-cell is created for them,
car-field containing the low-word and cdr-field the high-word.
Longcells haven't been fully implemented yet, but maybe in
future they are also printed automatically as longwords and
some arithmetic functions like plus and times can take them
as arguments and return as results. But not yet.
Note that token beginning with digits but containing some
other continuous characters after them (e.g. letters),
is read as symbol, not number. E.g. 2SWAP is read as symbol
2SWAP not separately as the number 2 and symbol SWAP.
Character constants are enclosed by backquotes (`) and
contain one or two characters between them. Backslash
escape system can be also used, as with strings. If there
is one character, like `A` then it's put to the low byte
of resulting char-integer, and high byte is set to zero.
If there is two characters, like `QW` ("doublecharacter"),
then the first char is put to high byte, and second one
to low byte.
Strings begin with a doublequote (") and end with same
character, which should be at the same line. New string
is allocated for the characters between. Backslash escape
extensions are done also for them. (See the file PARSECHR.C
what escape-expressions are recognized).
Traditional interpreted lisp system has four main elements:
1) Read routines which do the hard work in translating
the data, i.e. lists, symbols, integers and so on from the
external format to the internal binary format.
It parses the input, allocates memory for the objects
encountered, and makes sure that there is always only
one symbol in system with same printname, by using
*oblist* to collect all encountered symbols.
(This is called interning, and it uses hashing for
speedy retrieval of symbols).
In this process all kind of fancy things are possibly
done, like macro expansion, etc.
2) Print routines which convert data from the internal
binary format to external ascii format. This is
generally a little bit easier than reading, but all
kind of fancy processing can be done on this time too.
3) Evaluating routines which evaluate the given lisp-expression
and return the result.
4) Garbage Collector which is periodically invoked when memory
is exhausted, and which examines all the used memory,
and frees those list nodes which are no longer accessible (used).
St. Vitus' Lisp doesn't have a Garbace Collector. More on
this below.
Then there is of course lots of functions to handle all
kind of fundamental things, like for following list nodes
(car & cdr) and constructing them (cons), allocating memory,
doing arithmetic, freeing list nodes, and so on.
Now the lisp interpreter is just the loop which first reads
expression from the standard input, then evaluates it, and
prints it out. In lisp this is expressed as (print (eval (read)))
And then that is done again and again, until user exits.
Source for parts 1 & 2 of St' Vitus Lisp is mainly in file
lists1.c and that code is also used by many other programs,
like KANJIDIC and ODE11.
Source for part 3 is mainly in files LISP.C and PUSHARGS.ASM.
Then there is also lots of files written in C and Assembly
for various built-in functions included in St. Vitus' Lisp.
MEMORY MANAGEMENT
Why there is no garbage collector in St. Vitus' Lisp ?
This is partly accidental, partly intentional.
Originally I thought to code it (that's why there is still
one surplus bit in TOBs, for gc-bit), but then I noticed that
it's very hard problem to keep track of all used list nodes,
because:
1) There is things called compact lists, which cannot be
followed by traditional "reversing-pointers" method.
And because normal and compact lists can be mixed
that makes things even more harder.
2) There's no clear boundary between interpreted lisp-level
and binary-coded C-assembly-level, because both interpreted
lisp-functions and machine-coded internal functions are
called essentially with same way, i.e. arguments are
pushed to system stack (sp) just in same way as they are
pushed in C. And so it's very hard to dig from the stack
every "still-in-use" TOB pointing to list node, because there
is a lot of miscellaneous binary stuff of underlying
C- and assembly-functions.
3) Properly working garbage collectors are anyway notoriously
hard to code (if lisp system is not very simple), so I
thought it's better to go ahead with other things than
to start spending rest of my life fighting with lots of
insidious bugs crashing now and then the whole lisp system.
So what are the consequences of this ?
I have supplemented two functions for freeing not-anymore-used
lists and list-nodes. They are called free-list and free-cons.
They free the whole list and one cons cell (= list node)
respectively, i.e. put them back to freelist for later usage.
Of course using them requires extreme caution and understanding
how user's code works, because after the list is freed,
NO REFERENCES OF ANY KIND should be made to it or its toplevel parts.
Also user should note that every time (s)he uses cons
it takes six bytes of valuable memory, and list functions
which contain copying-routine inherently, like append and
reverse, all devour memory.
So there's three strategies for writing memory-effective code:
1) After data is read in use only "pure", "list-following"
functions like car, cdr, member, etc. to traverse various paths
of list-structures, but don't produce any new list nodes.
2) Use no-side-effecting functions like append and reverse which
copy their arguments before using them, but keep track of lists
which are no longer used, and then return them with free-list
and free-cons.
3) Use side-effecting functions rplaca, rplacd, rplacx, nconc,
nreverse and so on, but keep data in same list structures,
and change car-parts of them with rplaca, rplacx, and like,
but don't allocate any new list nodes.
In practice all of the techniques mentioned are used
together, and you can only hope that there's enough memory
and no logical bugs in your code.
Anyway, in other programming languages, like in C, you must
also keep track of memory usage by yourself. And at least
SVL doesn't halt mysteriously in the middle of the execution
for garbage collection, feature I find sometimes annoying, as it
may take a lot of time in some lisps.
I have thought that in future it would be possible to allocate
cons cells from the system stack (where is also arguments of
functions), so that they would be automatically deallocated
when returning from the enclosing function. But of course
this also creates a danger of so called "dangling references"
if so created cons cells are returned someway back to
calling function. (Also stack segment should be aligned to
an area of X000:0000 - X000:FFFF).
TYPE AND ERROR CHECKING
Generally only the most "low-level" functions like car, cdr, rplaca,
rplacd, setvalue (used by setq), value (fetchs value of symbol),
etc. do the type-checking so they catch errors flowing down from the
more sophisticated functions like reverse, memq, nconc and so on.
Most arithmetic functions doesn't have type-checking at all, and
neither have some string-using functions. Purpose of this is to
make code speedier and compacter but also to ensure that
programmer knows at the CODING TIME what's going for what ever function
(like (s)he must know also with C or assembly), instead of finding
it out with the trial and error method at the run-time.
If you supply invalid argument to function, for example:
(car 'a) then interpreter prints the following error message:
**ERROR: argument for function car is invalid !
03E8 05DE 0779
7400:1E5F a
**Back to toplevel.
->
And returns back to toplevel.
03E8 05DE 0779 are the last three return-addresses dug from
the system stack, so that the righmost address points to
that location which called function car, middle one (05DE)
is address to function which called that function, and
leftmost is address to the function which in turn called that.
You can find from file LISP.SYM, in which built-in function
those addresses reside. For example, 0655 is function gapply_
and 07BC is function quit_, so car was called from the
gapply_, from where we can see that error was caught in
function which was directly called from the interpreter,
i.e. car. That's because all the internal functions called
from interpreted code go via internal function _gapply,
whose source is in the file lisp.c
Hex-number below that (7400:1E5F) is argument printed
in internal format, and after that is argument printed in
the normal symbolic format.
More examples:
(setq () 'kala) ; Trying to set value for nil produces the
following output:
**ERROR: 1. argument for function setvalue is invalid !
05DE 0779 0868
0000:0000 7400:1FAB
1st arg: ()
2nd arg: kala
When multiple argument function catches error it also says
which argument was erroneous, three return addresses, and
and then prints all arguments in their hex-formats and then
in normal format.
(append 'a '(b c)) ; Trying to append symbol to list, produces
the following output:
**ERROR: argument for function car is invalid !
0779 28DF 2AD0
7400:1E5F a
By examining LISP.SYM we find that 2AD0 is in the function
topcopy, 28DF is in the append, and 0779 is in the gapply.
(Of course those actual addresses can be changed in future
versions, when more stuff is added to executable
Actually, they are not anymore same in this distribution's
LISP.SYM these ones in this document).
From this we can see that interpreter called the
append, which called the topcopy (to make a new copy
of first argument), which in turn called the function car,
which detected the erroneous type of the argument.
Some of the list functions, namely last, length and memqnth
detect if their argument is a circular list by keeping
count of the length, and when it goes over 65535 back to
zero they output error message. Of course this means that
you cannot give them lists of longer than 65535 elements,
and because integers are just sixteen bits, the result
would be incorrect anyway.
(Compact list of 65536 elements takes at least 196611 bytes
of the memory and a normal list with the same length takes
393216 bytes).
FUNCTION CALLING
When in traditional dynamic binding lisp system function like
(defun zup (x y) (cons y x))
is called like (zup 'a 'b)
the first thing to do is save the old values of symbols x and y
to the stack, assign a and b to them, execute the function body
(in this case (cons y x) which results a dotted pair (b . a))
and when returning from the function, restore the old values of
x and y.
In St. Vitus' Lisp function is first "pre-compiled" by
defun to new form: (lambda 2 (cons <#pick 8> <#pick 4>))
where formal argument list is replaced by length of it, and all
formal arguments in body are replaced by pick-forms so that first
formal argument becomes <#pick 4>, second one becomes <#pick 8>,
third one becomes <#pick 12> and so on. Those pick-forms are
actually just integers which are printed in that peculiar way.
When function is called the actual arguments are evaluated from
left to right, but they are put into stack "as they were pushed"
from right to left, as in underlying C. When body of the called
function is executed after that and when eval subsequently
encounters pick-forms they are added to the current framepointer
to fetch corresponding actual argument from the stack.
So that's almost like C and Assembly language work at machine code
level. Of course this is strictly lexical binding.
There is three so called "disciplines" in SVL. Lambda, nlambda and
vlambda. First is normal case where arguments are evaluated
and pushed to stack one by one. Nlambda functions are internally
called just by one argument, their unevaluated argument list.
It's the duty of function in those cases to evaluate its
arguments. In general most of the control-forms like progn,
cond, if, or, and, while and so on are nlambdas. Vlambda is
my own invention, meaning "variable argcount lambda". Currently
it's almost the same as lambda except when function has been
defined as vlambda having zero arguments, then maximum _max_args
arguments from the arglist are pushed to stack, and followed always
by the ENDMARK. _max_args is the internal variable which can be set
by the user with set-max-args function. Otherwise lambda and vlambda
functions put ENDMARK after their arguments if there was less
arguments in actual argument list than in defined formal arglist.
Of course it would be better that with lambda discipline the
rest of the formal args (possibly some defined with &opt and &aux)
would be filled with ()'s (it would be even quite fast to do that
with REPE STOSW instruction in module pushargs.asm), but I have
no time to do that now. So you must always check the optional
arguments with endmarkp function, because the first argument
in formal args not specified in calling function is set to
ENDMARK and all args after that ARE UNDEFINED. (I.e. whatever
is left on the stack).
DESCRIPTION OF THE BUILT-IN FUNCTIONS
Here is the description of the all internal C- or assembly-coded
functions of St. Vitus' Lisp. First is calling format described
in kind of "metalanguage", so that for example, optional arguments
are enclosed in brackets []. Then is so called discipline of
the function, which is LAMBDA, NLAMBDA or VLAMBDA, which affects
how arguments are handled. After that is "classification letters"
of function, one or more of the following characters:
P Function is pure in a sense that it doesn't modify its
arguments, but can still have side effects (PS in that case).
L(count) Function consumes normal list nodes.
C(count) Function consumes compact list nodes.
R(count) Function consumes string space.
Count in parentheses specifies how many objects of that type
is consumed from memory. Normal list nodes take six
bytes, compact nodes three bytes, and every character of
string space takes one byte, of course. If count is
specified as ? then memory usage is variable. And anyway,
those counts are not always exact.
M Function modifies one or more of its arguments physically.
S Function is executed mainly for its side-effects.
! One or more exclamation marks indicate the degree of
"dangerousness" of the function.
Sign => means "results".
LIST ACCESSING FUNCTIONS.
(car x) LAMBDA P
Returns car-part of the cons-cell x, which can be a normal or
compact list. If x is NIL, then result is NIL.
Example: (car '(a b c)) => a
(cdr x) LAMBDA P
Returns cdr-part of the cons-cell x, which can be a normal or
compact list. If x is NIL, then result is NIL.
Example: (cdr '(a b c)) => (b c)
Composites of the car & cdr:
(caar x) = (car (car x)) LAMBDA P
(cadr x) = (car (cdr x)) LAMBDA P
(cdar x) = (cdr (car x)) LAMBDA P
(cddr x) = (cdr (cdr x)) LAMBDA P
(last l) LAMBDA P
Returns a last cons cell from list l, which can be a normal
or compact one.
Examples:
(last '(a b c)) => (c)
(last '(a)) => (a)
(last ()) => ()
(last '(a b c . d)) => (c . d) ; Note! You cannot actually enter
; that kind of mixed lists/dotted-pairs directly from the toplevel!
Note! Use (car (last l)) to return last toplevel element of list l.
(length l) LAMBDA P
Returns a length of list l, i.e. number of its toplevel list nodes.
Examples:
(length ()) => 0
(length '(a b c)) => 3
(length '(a b c . d)) => 3
(length '(a (b c d) e) => 3
(memq x l) LAMBDA P
Returns first occurrence of expression x in list l, and () if x
is not found.
Examples:
(memq 'c '(a b c d e)) => (c d e)
(memq 'x '(a b c d e)) => ()
Note that memq uses eq for testing the equality, so it can't
be used for digging sublists from lists, if they are not physically
same. So (memq '(c d) '((a b) (c d) (e f)) => ()
Use member for that purpose.
Name memq is from Franz Lisp and is probably acronym for
member quick, or member using eq.
(member x l [:TEST or :TEST-NOT] [test-fun]) VLAMBDA P
With two arguments behaves like memq, but uses equal instead
of eq, so detects also logically, even if not physically same objects.
Example: (memq '(c d) '((a b) (c d) (e f)) => ((c d) (e f))
When there is keyword :test or :test-not (can be in uppercase also)
as third argument then fourth argument is used as testing function
to decide when to return part of l as result. If keyword is
:test-not then the result is negated before decision.
Note! (member x l test-fun) is equivalent to (member x l :test test-fun)
x and every toplevel element of l are given to test-fun so that
x is the first arg. and element of l is the second one.
Examples:
(member 4 '(8 5 3 0 15) :test greaterp) => (3 0 15)
(member 4 '(8 5 3 0 15) :test-not greaterp) => (8 5 3 0 15)
(member 'lahna '(a b c) :test-not #'(lambda (x y) (set y x))))
returns (), but sets also values of symbols a, b and c to be lahna.
(memqnth x l) LAMBDA P
Returns the position of the first x in list l, so if x is first
one in l returns 0, if second one returns 1, and if not in list l
at all returns (). Note, this functions uses eq not equal.
Examples:
(memqnth 'a '(a b c)) => 0
(memqnth 'b '(a b c)) => 1
(memqnth 'x '(a b c)) => ()
(nth n l) LAMBDA P
Returns n:th element from the list l, so that indexes are zero-based.
() if n is equal or more than length of l (i.e: out of bounds).
Refer also qnth.
Examples:
(nth 0 '(a b c)) => a
(nth 1 '(a b c)) => b
(nth 3 '(a b c)) => ()
(nthcdr n l) LAMBDA P
Returns n:th cdr-part from the list l, () if n is out of bounds.
Refer also qnthcdr.
Examples:
(nthcdr 0 '(a b c)) => (a b c)
(nthcdr 1 '(a b c)) => (b c)
(nthcdr 3 '(a b c)) => ()
Note that (nth n l) is equivalent to (car (nthcdr n l)).
LIST CREATION FUNCTIONS
(append list1 list2) LAMBDA PL(length list1)
Returns lists list1 and list2 appended together. List1 is copied
before it is concatenated with list2. Both lists can be normal
or compact lists. This function is equivalent to:
(nconc (topcopy list1) list2)
Examples:
(append '(a b c) '(1 2 3)) => (a b c 1 2 3)
(append () ()) => ()
(append () '(a)) => (a)
(append '(a) ()) => (a)
(append '(a b c) 'd) => (a b c . d)
(cons x y) LAMBDA PL(1)
Returns a new cons-cell (= normal list node) constructed from
the expressions x and y.
(list x1 x2 x3 ... xn) VLAMBDA PL(number of arguments)
Returns a normal list constructed from its arguments.
Refer also clist.
Examples:
(list) => ()
(list 'a) => (a)