Information about the rattrapage is now available here: please be sure to read it, particularly the part about group projects.

Luca Saiu - Free Software - "Long" TP

The final six-hour TP consists in modifying a piece of free software.

Please also read the requirements.

2008-04-13: You can not submit solutions of the "striken-through" tasks (the ones written like this) for the rattrapage. I'm sorry, but I've helped so many students with those tasks that their solutions have become public knowledge now. Other tasks of about the same difficulty are available.

This is intended as a simulation of the experience of working on code which you have never seen before and you don't fully understand. This is what happens every time you start modifying some free software: even experts need a long time before really getting to know a piece of code written by someone else.

There are some simplifications compared to a real situation:

Despite all of this, this is a kind of software you're probably not familiar with, and you'll probably not understand all of it; nonetheless you should try to undertand and modify at least some part of it, trying not to break the rest.
This is not a particularly strange situation: it's exactly what happens in real life, and on a much smaller scale: so don't panic.


News


The software you will work on

I've called the software you're going to modify nanolisp: it's a very simple Lisp interpreter written in C. The language recognized by nanolisp is a subset of Scheme, essentially identical to the language I showed in the first lesson.

I've implemented nanolisp in the course of just three evenings (but I went to bed quite late every time, I admit it), hence it can't be a mature full-featured implementation.
It's already usable, it looks stable and is not too slow, but you can make it much better by adding features.

Here's nanolisp. I've written it for you students, but I seize this occasion to also release it to the general public. It's a small, cute project: have fun with it.

nanolisp
Copyright © 2007 Luca Saiu
nanolisp is free software, released under the GNU General Public License.
It comes with absolutely no warranty.

Please use the latest version if you have not started yet; if you have already started then you can safely continue to use the version you have. No change is particularly important and worthy an upgrade.


Your task

You can choose from a list of tasks of widely varying difficulty to extend some part of nanolisp. There are tasks suitable for every student, including beginners.

I'd be very happy if you completed more than one task, but I suggest to start from a single, easy one.

If you have some good idea for a task I didn't think of, tell me; I will happily accept reasonable suggestions.

Submission

You can optionally send me your solution by e-mail. Plese don't send any binaries, but only (compressed, if possible) sources. Good submissions may yield a (possibly substantial) bonus on the final evaluation; bad solutions, as usual, will not affect your evaluation.

You should ideally do your work in the six hours scheduled for the TP, but I will also accept submissions later.

Please put copyright and license headers where appropriate.


Practical instructions for working on the lab machines

Because you don't have a superuser account on the lab machines you have to install the needed software in some place where you have write permission. I've made the procedure automatic for you:

Execute this, all on a single line:

pushd /tmp && rm -f go && wget -O go /lipn/tp/install-everything-from-sources && . ./go && rm -f go && popd

The command line above installs all the needed software into your local computer's temporary directory /tmp. Note that the contents of /tmp may be lost after each reboot.
The command line above takes some minutes to execute, because it compiles everything from sources. I'm sorry for the time it takes, but I think this is the only way to make the software work on all the lab machines, which are configured in several different ways.
While you are waiting you can
choose your task.

You can also start working on nanolisp in another terminal before the command line above has completed: the important libraries are installed first.

Every time you open a new terminal window (or tab, or virtual text terminal) for working on nanolisp you have to simply execute

source ~/add-prefix /tmp/my-usr

to be able to use the software installed under /tmp. add-prefix changes some environment variables including PATH, C_INCLUDE_PATH, and LD_LIBRARY_PATH so that the correct directories are searched.

Decide where you want to work; don't use a temporary directory for this, because you will have to save your modifications, so you should use your home directory or a subdirectory of it; you don't want your modified sources to disappear when rebooting the machine!
Enter your working directory with cd, for example by typing

cd ~

if you work in your home directory, or otherwise something like

cd ~/DEVL

if you like to work in ~/DEVL (which must already exist, of course).

Now (unless you have already downloaded and modified nanolisp: you don't want to overwrite your modifications!) download and uncompress nanolisp (again, on a single line):

wget /lipn/tp/nanolisp-0.2.tar.gz && tar xfvz nanolisp-0.2.tar.gz && cd nanolisp-0.2 && . ~/add-prefix /tmp/my-usr

Now you're ready to explore nanolisp. To compile it just run

make

It should generate an executable called nanolisp. Play with it for some minutes to be able to understand it, then give a quick read to the sources, starting from main.c and nanolisp.h (nanolisp.h is the most important file to understand).
There are also some examples in examples/. Try to run them.

At this point you can choose your task if you've not already done so, and begin.

If you want to do everything manually without following my instructions, please see the next section. You can also find all the files downloaded by the command lines above in /lipn/tp/.


Instructions for working on other GNU/Linux machines

Install everything from sources or use the packaging system (if you choose a packaging system then be sure to also intall the "development" packages, which contain the header files you need to compile. Development packages usually have a name ending with -dev or -devel) of your distro. It's easy.

You need:

Some tasks also require other software like:

Compiling from sources on your machine

If you want to compile from sources and install into /tmp/my-usr, like in the lab, then: first of all you should add /tmp/my-usr to the relevant environment variables: an easy way to do this is with my script add-prefix (the following command line assumes that you have it in your home directory):

source ~/add-prefix /tmp/my-usr

Then, for each package you should uncompress the tarball into a temporary directory, enter the directory which is crated by tar, run ./configure --prefix=/tmp/my-usr and finally make && make install.

It's all very popular, mature and easy to install software. You shouldn't have any problem when compiling from sources.


Resources

Use the web, particularly search engines, to get the information you need: read tutorials and howtos, and consult manuals when you need to understand something in depth.

GNU manuals can be accessed from the Info system; I've installed many of them into your alternative prefix. You can also find them in many formats including HTML on http://www.gnu.org/manual/manual.html.
You will almost surely need at least the libc manual. Some tasks require flex and Bison.


Tasks


Name: use GNU Readline instead of gets()
Difficulty: trivial
Requirements: none
Comment: nanolisp currently uses gets(), which is unsafe: modify the code to use readline() instead. The Makefile already contains a commented-out line linking the library.
Of course you need the Readline manual in order to know how to call readline(). The very little information you need is
here.

Name: add support for floats
Difficulty: easy
Requirements: none
Comment: nanolisp only supports integer numbers, implemented with the C int type. You can add support for floats by adding a new case in nanolisp_type_t, writing C functions for creating float objects and do the type checking, and update the printer. Support in the scanner and parser is already present: you should just uncomment it.
You can make primitive procedures like + and - support both floats and integers (all "real" Lisp implementation do that); otherwise you can also just add new different primitives which only work on floats (for example you can call them +f, -f and so on). If you want you can add primitive procedures like sin and cos.


Name: add support for bignums or rationals
Difficulty: medium
Requirements: none
Comment: bignums are arbitrary-precision integer numbers; Guile has them: try to execute the factorial example to compute the factorial of 100 in Guile and in nanolisp to see the difference. In the same spirit, you can add support for exact fractions.
The needed functionality is implemented in GMP (GNU Multiple Precision library: it's installed), a C library. You can add a new s-expression type and implement primitive procedures by calling GMP functions.
Note that bignums are strictly more powerful than the current implementation of integers; it makes no sense to keep both, so if you implement bignums you should just replace the implementation of integers.

Name: add support for strings
Difficulty: easy/medium
Requirements: modifying the scanner flex source (the parser part is trivial)
Comment: Similar to the float task; when you allocate a string on the heap you should use the function in gc.c instead of malloc(): we have a garbage collector, so let's use it! It makes things much simpler.
Primitive functions for strings can be something like append, symbol->string and string->symbol.

Name: add support for arrays
Difficulty: easy/medium
Requirements: none
Comment: An array of three elements containing the integer 42, the boolean #t and the empty list is written in Lisp as
#(42 #t ())
An array is syntactically distinguished from a list by the initial character #; arrays don't have a dotted notation (it would not make sense for them, as they are not made of conses).
As you can see from the example above, arrays can have as elements s-expressions of different types: as it's usual in Lisp, there is no restriction to which type you can use for an array element: in particular it's allowed for an array to contain other arrays, conses and even procedures (by the way, this doesn't complicate the implementation in any way: you'll see that it's perfectly natural).
Similar to the float and string tasks, but the implementation is slightly more complicated. Support in the scanner and parser is already implemented, and commented out. Arrays need some primitive procedures similar to these: make-array (taking an integer parameter size; it creates an array of the given size, with all elements initialized to #t), lookup-array (it takes an array a and an integer parameter i, and returns the i-th element of a); and set-array! (it takes an array, and index and a new element, writes the new element at the given position in the given array, and returns #t).
Primitives should fail by calling nanolisp_error() in case of out-of-bounds lookup or set.
You can also write another procedure to resize an existing array, if you want.

Name: make the implementation of the symbol table correct
Difficulty: easy
Requirements: none
Comment: The current implementation of the symbol table has fixed size, and overflows a buffer when completely filled. Make the table dynamically-allocated. You can use malloc() and realloc(), or the garbage collector we're already using. Using the garbage collector is even easier.

Name: make the implementation of symbol interning efficient
Difficulty: easy/medium
Requirements: implement (or reuse) a hash table or balanced binary tree in C
Comment: When a symbol is parsed it's interned into the symbol table: this is currently implemented by scanning the table, which is inefficient: you can change the implementation of the symbol table so that it uses a hash table or balanced binary tree instead of an array. You can also reuse an existing implementation of the data structure, but it may cost you more work. Also pay attention to the license in that case!
Hint: if you implement the new data structure from scratch you can use the garbage collector internally. It makes things easier.

Name: implement imperative operators
Difficulty: easy
Requirements: a very basic understanding of interpreters
Comment: Implement set!, set-car! and set-cdr! (you can Google for them). set! can be implemented in terms of set-cdr!, because it has to modify a binding in a local environment (environments are represented as association lists in nanolisp: see interpreter.c) in some cases. Updating the global environment is trivial: see how define is implemented in nanolisp_eval_combination().

Name: implement a while loop
Difficulty: easy/medium
Requirements: undertanding interpreters
Comment: Implement a while loop; this makes much more sense if you also implement imperative operators, but you can also choose only this task: in this case you could use define as a (dirty) way to assign a global variable.
You can use any any "syntax" you like, but I'd prefer the "usual"
(while CONDITION FORM-1 ... FORM-n), or the slightly simpler (while CONDITION FORM) which always takes exactly two subforms.
Hint 1: you need to touch nanolisp_eval_combination().
Hint 2: if you implement while with the first suggested syntax you can expand the body into the form (begin FORM-1 ... FORM-n): there's already a similar case in the interpreter, for lambda.
Hint 3: the size of the needed modification is very small (I'd say about 15 lines).

Name: add bindings for a graphic library like SDL
Difficulty: easy/medium
Requirements: knowing how to use a graphic library from C
Comment: Add bindings to a graphic library: adding a primitive procedure drawing a single colored pixels in single a window is enough. The procedure would take as parameters the pixel coordinates and the color. SDL is installed, but you can also use other libraries like Xlib or OpenGL instead.

Name: add bindings for some other library you like
Difficulty: easy/medium
Requirements: knowing how to use the library from C
Comment: Similar to the graphic library task. Use any library you like.

Name: Make the interpreter faster
Difficulty: hard
Requirements: none
Comment: Speed up the benchmark example by at least 20% by changing the C sources (you should measure performance by always compiling in optimized mode, with the same optimizations). Don't make the implementation unsafe: for example you should not remove needed type checks to make the implementation faster; but of course you can remove unneeded type checks: there are some.

Name: Re-implement s-expressions in a more efficient way by using an unboxed representation where possible
Difficulty: very hard
Requirements: very low-level C programming: type casts and bitmask operations
Comment: Change the implementation of s-expressions: use just a few bits of a machine word to store the type of an object and (when possible) the rest of the word to store the content, instead of always using a heap-allocated struct which contains an enum; differnt types of s-expressions would be allocated in a different way (for example a cons would be heap-allocated and garbage-collected: an integer or symbol would be just as large as a machine word, and would not need to be allocated on the heap.
For example, if you use two bits for the type, the content of an integer s-expression would be represented with (size-of-the-machine-word - 2) bits (30 bits on a 32-bit architecture or 62 bits on a 64-bit architecture); in this case integers would not need to be heap-allocated or garbage-collected.
Hint 1: the Boehm garbage-collector (and also the GNU implementation of malloc(), by the way) always allocates objects with an alignment of a power of two, not less then 8 bytes: so the rightmost three bits of a pointer to an heap-allocated object are guaranteed to always be 000. You can use (some of) the three rightmost bits for the runtime type information.
Note that the garbage-collector needs to be able to always recognize pointers, so you should not change the bit representation of pointers.
Hint 2: You don't need to store all the type information within the type bits: you can say, for example, that if the type bits are 00 then the object is boxed, the nanolisp_sexpression_t contains a pointer, and the actual type is stored as a field in the heap-allocated object (this is what the current implementation always does; your task is to modify it so that it does it less often).
An unboxed representation is a big performance win, hence practically all "real" Lisp systems do this.

Name: Implement let
Difficulty: easy/medium
Requirements: interpreters
Comment: Implement a let construct with static scoping. Expanding into an application is permitted (but there are other ways to do it). You can do it with a "syntax" different from the Scheme syntax.

Name: Make the frontend recognize a "mundane" syntax
Difficulty: medium
Requirements: knowledge of parsers and yacc/bison
Comment: Change the scanner and parser so that the concrete syntax is more "mundane", with infix operators and syntactic sugar. Continue to generate the same s-expressions from the parser.
Hint: This requires to change only the scanner and the parser.

Name: Make the frontend recognize also a "mundane" syntax
Difficulty: medium/hard
Requirements: knowledge of lex/flex and yacc/bison
Comment: Update the scanner and parser so that they can recognize both s-expressions and a mundane syntax with infix operators and syntactic sugar. You should require braces or brackets to mark a piece of s-expression syntax nested within a piece of non-sexpression syntax and vice versa; you can choose what kind of syntax is required at the top level.
Parsing the following examples (here I've chosen to require s-expressions at the top level) would yield the same s-expression:
(if (< a b) (- (+ 1 2) a) (f a b))
(if {a < b} {1 + 2 - a} (f a b))
(if {a < b} {1 + 2 - a} {f(a, b)})
{if a < b then 1 + 2 - a else f(a, b) end if}
Of course the concrete sugared syntax shown above for if and function application is only an example.
Hint: Also this requires to change only the scanner and the parser: all the rest of the implementation, including the interpreter, doesn't need to be touched.

Name: Implement gensym
Difficulty: easy/medium
Requirements: none
Comment: gensym is a function of zero arguments which returns a new symbol, which is guaranteed to be fresh, i.e. not currently present in the symbol table.

Name: Implement gensym efficiently
Difficulty: medium/hard
Requirements: implementing in C (or reusing) a hash table or balanced binary tree data structure
Comment: Implement gensym without requiring a full scan of the symbol table. Pay attention to licenses if you reuse some code.
Hint: if you implement the new data structure from scratch you can use the garbage collector internally. It makes things easier.

Name: Implement a load primitive procedure
Difficulty: trivial
Requirements: none
Comment: Implement a primitive procedure which loads a text file with the given name, parses it and evaluates each s-expression it contains. If you've not implemented the support for strings you can use a symbol as a parameter instead of a string (for example (load 'qqq) would load a file called "qqq" (or "qqq.scm", as you like)).
This can be implemented with a cut/paste if you understand main.c.

Name: Implement the boolean connectives and, or and not as primitive procedures
Difficulty: ridiculously easy, even easier than "trivial"
Requirements: none
Comment: Implement and and or as primitive procedures which take two arguments, and (of course) not as a primitive procedure which takes one argument.
Note 1: When and and or are implemented this way they can't be short-circuit, because they receive all their arguments already evaluated, like all Lisp procedures; hence for example in
(and #f (very-complicated-and-slow-procedure 42))
both arguments are evaluated, even if the value returned by the second one is clearly not needed.
Note 2: understanding the note above about short-circuit operators is not needed for this task. However I strongly urge you to try to understand it, for your personal culture: it's quite important. Of course you can ask me.


Name: Implement the boolean connectives and and or as short-circuit special forms, and not
Difficulty: easy
Requirements: some very basic understanding of the interpreter
Comment: Make and and or take exactly two arguments (this is a simplification compared to "real" Lisps).
Hint 1: if the first argument of and is #f then the result is #f, and there is no need to evaluate the second one; if the first argument of or is #t then the result is #t, and there is no need to evaluate the second one.
Hint 2: you need to add new cases to nanolisp_eval_combination().
Hint 3: you can implement not as a special form in the intepreter or as a primitive procedure; not is unary, so it can't be short-circuit.

Name: Implement variadic non-primitive procedures
Difficulty: medium
Requirements: understanding the interpreter
Comment: What is known in the C jargon as a variadic function is a procedure taking an unlimited number of parameters. This is also possible in Lisp, and in fact it's much easier than in C: you just have to write a single symbol or a dotted-list whose cdr is a symbol as the paramter list in a lambda.
Examples:
(lambda all-the-arguments all-the-arguments)
(lambda (argument-1 argument-2 . other-arguments) other-arguments)
In the examples above the symbols all-the-arguments is bound to the list of all the arguments of the procedure within the body, and other-arguments is bound to the list of all the arguments except the first two.
Many predefined functions are variadic in "real" Lisp systems, for example + (you can sum zero or more numbers, and not just two like in nanolisp). Other famous examples are and, or and list: the first example above, in fact, is a perfectly good definition of list.
nanolisp does not currently support this kind of procedures, but support is not difficult to add. You need to modify the lambda case in nanolisp_eval_combination(), adjust nanolisp_apply() and maybe one or two helper functions.
You can make experiments using guile, which of course supports variadic procedures: try to make nanolisp behave in the same way.

Name: Implement list as a special form
Difficulty: trivial
Requirements: a very superficial understanding of the interpreter: you can understand it in class even if you've never seen an interpreter before
Comment: list is a procedure taking any number of parameters (zero or more) and returning a list with them (of course parameters are evaluated before the call happens: this is true for all Lisp procedures). For example (list (+ 1 2) 'a 4 #f) returns the list (3 a 4 #f), (list 42) returns (42) and (list) returns the empty list ().
If you implement variadic procedures (see the task above) it is extremely easy to define list in Lisp; but in this task I propose you to implement it not as a Lisp procedure, but as a special form in the interpreter.
Hint 1: Modify nanolisp_eval_combination() adding a new case for the operator list. Of course the parameters (not yet evaluated) will be the operands.
Hint 2: You have to return the list of the results of evaluating all the s-expressions in a list (in this case the list of the operands). There is already a function which does exactly this in the interpreter, nanolisp_eval_list(). You just have to call it.
Hint 3: You have to only add 4 or 5 lines of code; there's no need to delete or modify any existing line.
I highly recommend this task to everybody because it's a way to start to understand interpreters with a very simple excercise: it's good for your culture. I'd suggest you to do this as your second task, so that you already know something about the nanolisp source before starting.
Name: Show me any bug you discover in my code
Difficulty: maybe not too hard (I've written this very quickly...)
Requirements: ?
Comment: I guarantee a bonus in your evaluation if you are the first person to show me a real bug in my code. An extra bonus if you fix it.

Back to the course page...
Back to my home page...


Luca Saiu
Last modified: 2007-12-26
Copyright © 2007 Luca Saiu
Verbatim copying and redistribution of this entire page are permitted provided this notice is preserved.