Hello,
On 03/25/2011 07:13 PM, David Robillard wrote:
Sorry. I still have plenty of releases left on the
table before I have
time to get around to this stuff... and yes, I took a little break for a
change ;)
Okay, no problem. And congratulations to you and Stefano for the new releases.
I also should have mentioned I found an old radix tree
implementation of
mine that should be pretty solid. I will contrast/compare these when I
get the chance.
Okay. My implementation is pretty much experimental by the way. The remove()
function is broken, and I think it would be better to use a linked list
internally to avoid reallocs. And it needs some refactoring.
The radix tree is for interning URIs. You could also
use a hash table,
but hash tables are ugly and clunky IMO, I prefer more elegant
structures that grow organically (and radix trees are simple to
implement), and as mentioned we have many common prefixes here which
makes it a suitable choice.
Interning URIs is used to get a number to represent a URI string. This
is necessary for even remotely appropriate performance because it means
statement lookup is a series of integer comparisons, and not a series of
vastly more expensive (and redundant) string comparisons.
For a store of n statements with URIs of length k, a search will thus be
O(lg(n) + k). Without interning it would be O(k*lg(n)), and have more
fragmented memory access as well.
In regard to performance here is what I get when running the attached benchmark:
olivier@etoile:~/dev/lv2/sord/src$ ./radix_benchmark all_words.txt
Read 21110 words from all_words.txt
Inserted words into the radix tree in 17ms
Inserted words into the hash table in 8ms
Looked up all words from the radix tree in 9ms
Looked up all words from the hash table in 4ms
Removed all words from the radix tree in 11ms
Removed all words from the hash table in 8ms
It comares the performance of the GHashTable and my radix tree (on a quad core).
It's done using a 20k list of english words, which I think isn't very
appropriate, because it leads to a lot of single-character branches, and the
tree never gets deep. Given this, it performs reasonably well I think.
Anyway, if you need something more sophisticated and suited to your specific
needs, it's a bit hard for me to help you I think, because there may be many
details you're thinking about which I don't know.
So, right now, I suppose that I could remove the glib dependency for my own
needs, by using drop-in replacements for the hash table and sequence, even if
they're slow. I don't need such performance for parsing tiny LV2 files anyway.
--
Olivier