-
Notifications
You must be signed in to change notification settings - Fork 0
Exercises on Data Structures
Rough notes about data structures to look at, to see if it is worth while to make exercises about them.
There are basically four "families" of abstract data types, and a number of interesting data structures which are members of the families.
- Arrays
-
Bit Array, pretending an
int64
is an array of 64 bits, for example - Circular Buffer
- Dynamic Arrays when you need to add an unknown amount of elements to an array "on the fly"
-
Bit Array, pretending an
- Linked Structures, when arrays just aren't good enough
- Linked List
- Association List, a linked list of key-value pairs
-
Trees
-
Search Trees useful for searching for stuff fast
- B-Tree used all the time in database implementations
- Binary Search Tree
-
Trie
- Hash Trees -- c.f., paper on Ideal Hhttps://en.wikipedia.org/wiki/Binary_decision_diagramash Trees
- Hash Array Map Trie -- c.f., paper
- Heap
-
Search Trees useful for searching for stuff fast
-
Graphs
- Binary Decision Diagram, very useful for automated theorem proving
There are, of course, abstract data types which do not fall under these four families. The exceptions are:
- List
- Stack
- Queue
- Set
As Rob Pike points out
The following data structures are a complete list for almost all practical programs:
- array
- linked list
- hash table
- binary tree
Of course, you must also be prepared to collect these into compound data structures.
I should spend some time discussing further these four data structures...
Remark (Hashing Function).
I also should note the djb2
hashing function for strings, which in C looks like:
uint32_t
djb2_hashCode(unsigned char *str) {
uint32_t hash = 5381;
int c;
while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
It's a really good and fast hashing function for strings. Ostensibly, if we represent an object as an array of bytes, this could be used as well (quite dangerous though!). See also Why are 5381 and 33 so important in the djb2 algorithm? and a discussion About 5381. (End of Remark)
I suppose one approach is to notice how useful arrays are.
First, we move away from the concrete syntax of array[index]
to the more abstract array_get(array, index)
and array_set(array, index, value)
. (Exercise 1)
But it would be nice if the array could resize automatically as needed, which is precisely the dynamic array data structure. (Exercise 2)
We observe how nice it is to have an array, and treat the index as somehow meaningful (e.g., in a census program, the index could be the age in years, and the value would be the number of people at this age). It would be nice if we could let the index be structured, in the sense of making it a string or a custom data structure. This gives us association lists (Exercise 3)
But then we can do some formal analysis of the association list (exercise 4) and notice how terrible it is. This gives motivation for...trying anything else.
This gives a natural motivation for hash tables (Exercise 5), tries (exercise 6), and so forth (exercises 7 through infinity).
Remark: Sets for Free. Actually, what's really interesting is...once we allow "anything" to be used as the index ("key"), the array is called an "associative array" (to each "key" is an associated "value"), a "dictionary", or more generally "map". The interesting thing is we can use this to implement a set, by making the key equal to the value we guarantee there are no duplicate entries.
So, whenever discussing a new data structure, the basic questions one should ask are:
- What are the basic operations on it? (What are the signatures of those operations, i.e., what are the inputs and outputs?)
- What are the axioms for those operations?
The next questions one should ask, when implementing the thing:
- What does the
struct
look like? (What fields does it have, and what type are those fields?) - What functions implement the operations defined on the type?
- Can we implement the axioms as preconditions/postconditions?
The first batch of questions (what operations are defined, and what are their axioms) are hard to jump into...even asking what the struct
looks like, and what functions are defined on it is hard.
So it seems to me the best approach is to gradually get the student to think about such things. Initially, a generic exercise should look like: given the data structure
typedef struct { /* ... */ } foo
Please implement the following functions
-
foo* new_foo()
creates a newfoo
object in memory, and returns a pointer to it (ornull
if there is no memory left) -
void free_foo(foo *obj)
will delete thefoo
object in memory, and haveobj
be a null pointer; ifobj
is null already, do nothing -
char* foo_toString(foo *f)
will produce a string representation off
-
bool foo_equals(foo *lhs, foo *rhs)
tests if the twofoo
objects are equal, intuitively iflhs == rhs
; if either point isnull
, returns false always -
hash_t foo_hashCode(foo *obj)
returns a hash code forobj
; null pointers return 0 by convention - etc.
Basically, give the function signature, and specify the contract expected. Perhaps add some more axioms, like
- if
foo_equals(x, y)
thenfoo_hashCode(x) == foo_hashCode(y)
(equivalentfoo
objects have equivalent hash codes) - if
foo_equals(x, y)
thenstrcmp(foo_toString(x), foo_toString(y)) == 0
(equalfoo
objects have identical string representations) - etc.
After a few of these, perhaps the struct
details can be left to the reader to decide, but more explanation of the data structure be given. (Having diagrams like the ones used in the interpreter handbook may be really useful, too.)