This page is kept for historical purposes only. Please see the home page for my current situation. —Luca Saiu.
DLL rattrapage Project
Notice that I have changed some rules for the rattrapage: be sure to read them all before starting.
All text added since the first publication of this page is written in green (like this), in order to help you quickly find it.
Make sure you correctly deal with legal issues such as copyright
and license notices; you can reuse any piece of published free
software you like, as long as you respect its license. (This
means that you cannot take any random
piece of code from the net where no license is specified, and of
course you can never pretend you wrote anything you didn't write:
that's cheating and
makes me very angry). The finished combined
software must be a clean source tarball, with clear authorship,
copyright and license information. You can omit the actual
license and refer to a dummy "license file" if you don't want to
really release your code as free software, but legal notices must
be there.
You must subscribe to the mailing list.
If you only have Internet access at the university lab, that will be enough.
Even if that's not realistic for large free software projects, for
simplicity
I allow you to write mailing list messages in French rather than in English (if you want).
Variable names, comments and documentation must be in English; I'm not asking you to
write great literature; let's say that your English should be at least as
good as my French.
In collaborative mode you have to use bzr for getting the latest
sources -- unless a volunteer comes forward I'll manage the
central repository for you, but you can maintain your branches if
you want. I'll try not to participate too much in
development; I can help you, but you have to write the
code.
A mark of 10/20 is very easy to get: any fourth-year student
should be able to satisfy the minimum requirements.
But you can do much more. Do a glorious hack. Surprise me.
Rules
-
There are now two ways of doing the project. You have to choose only one:
-
a) collaborative development: you work on
implementing a part of the software, while other students
(who also chose collaborative development) implement the
rest. You use a shared bzr repository, and you use our
mailing list to communicate with the other people also using
the collaborative model.
-
This possibility is new:
b) solo development: you work alone without
collaborating with any other student, and when you have
finished you send me your complete project (you don't need
to use bzr if you work solo). You cannot reuse any code or
documentation written by others. Of course pretending you
wrote anything not written by you is cheating.
The barème (marking scheme) is the same in either case: each student's mark
will be the sum of the values of the tasks she works on, minus
penalties.
Of course, in any case, incomplete or
imperfect solutions may be worth partial credit, and very good
solutions may be worth more than full credit.
-
New rule: no pairs or groups.
You have to work strictly alone for the rattrapage project.
Of course if you choose collaborative development you can
cooperate with other people by communicating on the mailing list
and changing their code, but all such communication must be
explicit, in written form, and public (to the mailing list).
If somebody claims she worked with another person when
sending a patch or submitting a solo project, all the
people involved will get zero points, and the work will be
wasted. No exceptions.
-
One intelligent student wrote me that the rules of the February project
"privileged quantity over quality". I think she was right, and
her interpretation can explain the uncontrollable flow of
rubbish-quality contributions, containing code with syntactic
errors which clearly had never been tested once, only extremely vague
ideas, or tiny modifications lost in whole pages of code copied
with no clear purpose. Moreover some students also proved not to
have understood anything about free software, copyright and
licensing. The situation is even worse than you think: most of
such exchanges happened between a student and me, without
involving the mailing list.
The following new rules address the shortcomings above:
-
New rule: if you choose collaborative development, all
submissions must be sent to the mailing list, and not
directly to me.
Of course solo projects must be submitted only to me.
If you choose collaborative mode, I will only take into
account your patches on the mailing list.
The idea is shaming people into behaving correctly.
-
New rule: for collaborative development, all submissions
must be in the form of (reasonable) patches, attached to
mailing list messages, and must specify the purpose of
the change in the e-mail message (not in the patch itself);
for example, "this patch solves bug #424" or
"this patch improves memory use in frobbing mode". If
a patch contains correct code or documentation but does
not include the required legal notices, I will penalize
you.
Solo projects with missing or incorrect
legal notices will also be penalized.
If you choose collaborative development you will have to
use bzr diff; I have already written how to
work with patches and bzr in a simple way in a mailing list
message. You have to ensure that your patch works with the
most recent version of the sources at the time of your
writing; I will attempt to do simple ports if it's not too
much work for me, but otherwise I may ask you to port your
patch to a new version if needed, before giving you any
points.
-
New rule: whenever you send a contribution with one or
more trivial errors your mark goes down by at least 3
points.
Notice that this also holds for documentation sources.
In any case, I will only re-examine your submission after you
send a new patch without trivial errors.
I reserve the right, at my discretion, to ignore any positive merit of your contribution if it contains syntactic errors, until you send a fixed version.
-
New rule: whenever you write something wrong about
copyright or licenses, proving you didn't understand the
theoretical part of the course, your mark goes down
by at least 3 points.
If your final mark is negative I will crop
it to zero only when bringing it to the jury, which
means that you will have to compensate for negative points
by doing more work.
Example situation:
- The project begins, and your initial mark is 0/20. You choose collaborative development;
- As your first contribution you send some code that you didn't bother to ever test, and of course there are syntactic errors. I don't bother looking at your code, and your mark goes down to -3/20;
- You fix your code, and send a new patch; I look at your contribution, and I say it's worth 2 points. Your mark becomes -1/20;
- You send another good patch worth 5 points. Your mark is now 4/20.
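The arithmetic of the example can be replayed with a few lines of code. This is an illustration only, written in Python rather than Scheme; the names `running_marks` and `mark_for_jury` are made up for the sketch, and it assumes nothing beyond what the example states: penalties and task values are simply summed, and a negative total is cropped to zero only at the very end.

```python
def running_marks(deltas, start=0):
    """Return the mark after each event; negative marks are kept as they are."""
    marks = []
    mark = start
    for delta in deltas:
        mark += delta
        marks.append(mark)
    return marks


def mark_for_jury(mark):
    """Crop a negative mark to zero, which happens only when reporting to the jury."""
    return max(0, mark)


# A 3-point penalty, then two accepted patches worth 2 and 5 points.
history = running_marks([-3, 2, 5])
print(history)                     # [-3, -1, 4]
print(mark_for_jury(history[-1]))  # 4
```

Because the cropping is applied last, a student sitting at -3/20 still needs 3 points of good work just to get back to zero.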
-
The project should have enough documentation (in English) for
somebody not following DLL to be able to use it; of course the
documentation must be written in some reasonable format
supported by free software; I strongly
recommend Texinfo;
in any case, absolutely not OpenOffice, or any other
"office" thing: we're writing software documentation, and
serious software documentation is written in appropriate
systems. Of course LaTeX is also OK, but it's much harder to
use than Texinfo.
New rule: the documentation must be in Texinfo.
In collaborative mode, every code contribution introducing
some feature should also include documentation about it (for
example, a section to be added to the Texinfo manual), written
by the same author who wrote the code; but it is also
acceptable to have somebody (in collaborative mode) working on
documentation only. In this case, however, whoever writes the
documentation must perfectly understand the code she
deals with: the exercise becomes one about code comprehension
-- of course you can ask questions to the authors.
Excuses such as "I can't program and I can't read code, so I
will write documentation instead" won't work: you're a
Master-level student, and this is a course about
development. If you can't program, you
will fail.
-
The sources should look clean and familiar to somebody used to
working with source tarballs;
we should have README and INSTALL
files, a file containing the license, examples, and sources with
readable comments.
We're using Guile for the
implementation: is there some convention to follow for projects
written in Guile? If so, we should follow it.
-
Deadline:
Friday 2012-06-08 (originally Tuesday 2012-06-05), less than two weeks before the jury. I had said that this time there would be no extension, but I'm a sucker.
-
Copying code from the net, submitting anything not written by you
or allowing others to contribute in your name
is cheating: don't
even think of doing that; it makes me very angry.
Project description
The project must be implemented in Scheme and run on Guile, on
GNU/Linux systems. If you submit something different I will
ignore it.
You can use Guile version 1.8.x or
2.0.x, as you prefer. It shouldn't make a lot of difference for
the tasks you have to work on.
This project is about text analysis: frequency analysis,
substitution ciphers, Huffman coding, substring search,
regular expressions and
simple automatic translation.
Clarification: you have to support all 7-bit ASCII characters
(not only lower-case letters as in some examples you can find on
the Net), including for example every character you find in
the corpora below.
Support for all 8-bit characters in iso-8859-1, a superset of
ASCII, is optional and worth more points. Still more points if
your 8-bit solution correctly supports both Guile 1.8.x and
Guile 2.0.x.
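To make the character-set requirement concrete, here is a minimal sketch of frequency counting over iso-8859-1 input. It is written in Python, not Scheme, so it is only an illustration and not a solution to any task; the function name `char_frequencies` and the sample string are invented for the example.

```python
from collections import Counter


def char_frequencies(data: bytes) -> Counter:
    """Count the characters of a byte string encoded in iso-8859-1.

    Python's iso-8859-1 codec maps each of the 256 byte values to exactly
    one character, so decoding cannot fail, and 7-bit ASCII is covered
    as a subset.
    """
    return Counter(data.decode("iso-8859-1"))


# Sample text with upper-case, accented and punctuation characters.
sample = "Été à l'école, OK?\n".encode("iso-8859-1")
freq = char_frequencies(sample)
print(freq["é"])  # 2
print(freq["É"])  # 1 (upper and lower case are distinct characters)
print(freq[" "])  # 3
```

Note that spaces, punctuation and the newline are counted like any other character, which is the point of the clarification: a table indexed only by lower-case letters would not be enough.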
Specific tasks are on our Bugzilla installation [not available any longer]; you can do as many of them as you want.
Bugzilla entries contain some links to additional information you
will need, and an indicative "value" of each task in points (over
a total of 20). I didn't add a lot of tasks since I expect most
students will choose solo development, but if you have more (on
topic) ideas, ask me. If they are reasonable I will add them as
new tasks.
Resources
- Our mailing list web page,
which you can use to change settings and manage your subscription.
- Bugzilla: this bugzilla installation contains bug/task
specifications for everybody, collaborative or solo
developers; but the bug status reflects its status in
collaborative development --- for example: bug #234234 might be closed and
bug #234235 might be in progress, but of course solo developers can
independently work on the same tasks.
-
Our Bugzilla installation
[not available any longer]
We use Bugzilla as a task tracker more
than as a bug tracker; the distinction between tasks and bugs
is a little blurred for us, but that's not important.
-
Bzr repositories are for collaborative development only. I
shouldn't need to say this again, but these should be used
with a bzr client and not with a web browser.
-
Our "official" repository (read-only for everyone except the one student in charge, and me):
[not available any longer]
-
A playground repository you can play with, without asking anybody's permission:
[not available any longer]
bzr+ssh://dllstudent@ageinghacker.net/home/dllstudent/playground
I've already sent the password to the mailing list.
-
Just to avoid any misunderstanding, I've renamed the old pre-rattrapage repository:
what was called
[not available any longer]
is now called
[not available any longer]. Of course you
must not use the old repo for the rattrapage.
-
Corpora (sing.: corpus). I've collected some texts
written in several different languages, which you can use as
representative samples.
All files are
plain text, encoded in the iso-8859-1 charset (8 bits per
character) with Unix-style newlines.
Documentation
Possibly the hardest part in this project is to familiarize
yourself with the software we use. You have an opportunity to
learn something you will use again in the future; or if not,
you will at least learn to learn quickly, and to use the
documentation -- this is really important.
Hints
Here are some suggestions:
-
Guile and Texinfo are installed in system directories at
least in rooms F206 and F207 (I've checked with the system
administrator).
When working in other rooms at the Institut Galilée labs, you will need to run the usual hack
described at the beginning of TP pages: I have installed Guile, bzr and Texinfo in my home
directory, and running the line written in red will let you
access my directories.
Of course you don't need that if you work on your computer: if you have root
privileges you can install everything system-wide.
Frequently-Asked Questions
Most of these questions were frequently asked during the project which ended in February.
-
Je ne parle pas anglais ; pouvez-vous tout ré-expliquer en français ? (I don't speak English; can you explain everything again in French?)
Tough. As a Master-level student you are required to understand and use English.
-
I only know Java...
I'm aware that most of you haven't used Scheme before; in fact
you're supposed to learn something new, and the project is
balanced with this fact in mind: it would be trivial if you had
experience with the tools we use.
By the way only knowing one programming language, or two very
similar ones, is a very bad sign for a computer
scientist. Seize this opportunity and learn Scheme well.
-
But -blah blah blah- web service -blah blah blah- database blah -blah blah blah- semantic web...
There are some people who are fixated on the web and refuse to
learn anything else. Again, tough. You have to learn something
you don't know already. If you're unable to learn, then your
mark will be zero.
-
But the project is so incredibly difficult...!
No, it's not.
Any real-world project, free software or not, is huge
compared to what you have to do here. Big projects are made of
millions of lines of code. Average projects still have tens or
hundreds of thousands of lines. Wherever you go after
university, the projects you will work on will be much more
complex than this.
-
But my Licence studies didn't prepare me at all...!
Sadly, you're probably right. I've raised this issue with
my experienced colleagues, and I'm not the only teacher thinking
that the education you got in your Licence was very, very poor
(some professors claim that
this year the best M1 students come from IUTs; and indeed, my
limited experience seems to confirm this). So, you will have to
learn now what you didn't learn before. Sorry, I will not take
part in giving Master's degrees to a generation of incompetent
computer scientists.
Luca Saiu
Last modified: 2012-05-31
Copyright © 2011, 2012 Luca Saiu
Verbatim copying and redistribution of this entire page are permitted provided this notice is preserved.