[Sugar-devel] I'm looking for a tree...

Albert Cahalan acahalan at gmail.com
Mon Jun 8 06:13:24 EDT 2009


Benjamin M. Schwartz writes:

> I am looking for a fast data structure with the following properties:
>
> Maintains an indexed list of arbitrary, non-ordered objects
> (like a python List or C array)
> Allows fast:
> Insertion at any location
> Deletion at any location
> Lookup of an object by its index
> Reverse lookup, to determine the index of an object

It may have been better to describe what you wanted to do.
For example:

Might the amount of data ever exceed the size of RAM?

What data type is an index? (string, int, arbitrary blob...)

What data type is an object? (string, int, arbitrary blob...)

How can you say "Insertion at any location" yet "non-ordered"?
This doesn't seem to make sense. Did you change your mind?

> Python's List has O(1) lookup, but O(N) insert, delete, and
> reverse-lookup.

Heh. A "high level" language is supposed to not have that
kind of problem. Try perl. :-)

> To make reverse lookup O(1) I could maintain a separate
> Dict mapping objects to indices, but this would cost an
> additional O(N) on every insertion and deletion.

If it did, you're still O(N) because O(N)+O(N)==O(N).
It should not take O(N) for insert/delete, even if you
decide you want things ordered.

Reverse mapping is easy: store a copy of the index in (with)
the object.

> A linked list has O(1) insertion and deletion,

I wouldn't say that. You have to find the object's location
to do that, so you need to include the O(N) lookup cost.

> A standard self-balancing tree cannot be used because the objects
> are not ordered, and self-balancing trees require ordered keys.
> I could use the index of an object as the sort key, but then
> insertion and deletion are O(N) because all subsequent keys must
> be altered.  I could fabricate new sort keys to ensure that
> insertions occur at the desired location, but then the length of
> the keys will grow like O(N), making all operations at least O(N).

Standard self-balancing trees don't have these problems.
Neither do hash tables, assuming reasonable table size.

Here you go:

http://en.wikipedia.org/wiki/Judy_array
http://en.wikipedia.org/wiki/Red-black_tree
http://en.wikipedia.org/wiki/AVL_tree
http://en.wikipedia.org/wiki/Splay_tree
http://en.wikipedia.org/wiki/B%2B_tree
http://en.wikipedia.org/wiki/Radix_tree
http://en.wikipedia.org/wiki/Scapegoat_tree
http://en.wikipedia.org/wiki/AA_tree
http://en.wikipedia.org/wiki/Heap_(data_structure)
http://en.wikipedia.org/wiki/Skip_list
http://en.wikipedia.org/wiki/Hash_table

If you end up swapping or otherwise going to disk, pick a
tree with a large branching factor.


More information about the Sugar-devel mailing list