Skip to content

Exercises on Data Structures

Alex edited this page Jul 6, 2018 · 3 revisions

Rough notes about data structures to look at, to see if it is worth while to make exercises about them.

There are basically four "families" of abstract data types, and a number of interesting data structures which are members of the families.

  1. Arrays
  2. Linked Structures, when arrays just aren't good enough
  3. Trees
  4. Graphs

There are, of course, abstract data types which do not fall under these four families. The exceptions are:

  • List
  • Stack
  • Queue
  • Set

As Rob Pike points out

The following data structures are a complete list for almost all practical programs:

  • array
  • linked list
  • hash table
  • binary tree

Of course, you must also be prepared to collect these into compound data structures.

I should spend some time discussing further these four data structures...

Remark (Hashing Function). I also should note the djb2 hashing function for strings, which in C looks like:

uint32_t
djb2_hashCode(unsigned char *str) {
    uint32_t hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}

It's a really good and fast hashing function for strings. Ostensibly, if we represent an object as an array of bytes, this could be used as well (quite dangerous though!). See also Why are 5381 and 33 so important in the djb2 algorithm? and a discussion About 5381. (End of Remark)

Sequence of exercises with motivation

I suppose one approach is to notice how useful arrays are.

First, we move away from the concrete syntax of array[index] to the more abstract array_get(array, index) and array_set(array, index, value). (Exercise 1)

But it would be nice if the array could resize automatically as needed, which is precisely the dynamic array data structure. (Exercise 2)

We observe how nice it is to have an array, and treat the index as somehow meaningful (e.g., in a census program, the index could be the age in years, and the value would be the number of people at this age). It would be nice if we could let the index be structured, in the sense of making it a string or a custom data structure. This gives us association lists (Exercise 3)

But then we can do some formal analysis of the association list (exercise 4) and notice how terrible it is. This gives motivation for...trying anything else.

This gives a natural motivation for hash tables (Exercise 5), tries (exercise 6), and so forth (exercises 7 through infinity).

Remark: Sets for Free. Actually, what's really interesting is...once we allow "anything" to be used as the index ("key"), the array is called an "associative array" (to each "key" is an associated "value"), a "dictionary", or more generally "map". The interesting thing is we can use this to implement a set, by making the key equal to the value we guarantee there are no duplicate entries.

Structure of Exercises

So, whenever discussing a new data structure, the basic questions one should ask are:

  • What are the basic operations on it? (What are the signatures of those operations, i.e., what are the inputs and outputs?)
  • What are the axioms for those operations?

The next questions one should ask, when implementing the thing:

  • What does the struct look like? (What fields does it have, and what type are those fields?)
  • What functions implement the operations defined on the type?
  • Can we implement the axioms as preconditions/postconditions?

The first batch of questions (what operations are defined, and what are their axioms) are hard to jump into...even asking what the struct looks like, and what functions are defined on it is hard.

So it seems to me the best approach is to gradually get the student to think about such things. Initially, a generic exercise should look like: given the data structure

typedef struct { /* ... */ } foo

Please implement the following functions

  • foo* new_foo() creates a new foo object in memory, and returns a pointer to it (or null if there is no memory left)
  • void free_foo(foo *obj) will delete the foo object in memory, and have obj be a null pointer; if obj is null already, do nothing
  • char* foo_toString(foo *f) will produce a string representation of f
  • bool foo_equals(foo *lhs, foo *rhs) tests if the two foo objects are equal, intuitively if lhs == rhs; if either point is null, returns false always
  • hash_t foo_hashCode(foo *obj) returns a hash code for obj; null pointers return 0 by convention
  • etc.

Basically, give the function signature, and specify the contract expected. Perhaps add some more axioms, like

  • if foo_equals(x, y) then foo_hashCode(x) == foo_hashCode(y) (equivalent foo objects have equivalent hash codes)
  • if foo_equals(x, y) then strcmp(foo_toString(x), foo_toString(y)) == 0 (equal foo objects have identical string representations)
  • etc.

After a few of these, perhaps the struct details can be left to the reader to decide, but more explanation of the data structure be given. (Having diagrams like the ones used in the interpreter handbook may be really useful, too.)

Clone this wiki locally