On 24/03/11 06:10 PM, Olivier Guilyardi wrote:
On 03/24/2011 07:49 AM, Stefano D'Angelo wrote:
Hi Olivier,
2011/3/19 Olivier Guilyardi<list(a)samalyse.com>om>:
On 03/18/2011 06:06 PM, Olivier Guilyardi wrote:
Hi!
On 03/11/2011 07:22 PM, David Robillard wrote:
> On Fri, 2011-03-11 at 12:08 +0100, Olivier Guilyardi wrote:
>
>> I will try and submit a patch to remove glib. It'll take some time because I
>> have dozens of other things to do, but I will work on this. I had a quick look
>> at sord, it seems it only needs glib's sequence and hash table. Is this
correct,
>> or will you need some more utilities?
>>
> Overall I need sequence, hash table (or hash table like thing, I'll
> probably use a radix tree)
>
Alright, attached is a minimal radix tree implementation. I just wrote it from
scratch. Would that work for sord?
If so, I'll try and benchmark it and do some more tests.
Attached is an updated version with a couple of fixes and
optimizations.
I see nobody answered yet... but don't despair, it's probably
because
the long-awaited LV2r4 release made David want to run away for a
while. :-)
However, I have no idea why he needs such a thing and I have to admit
my ignorance in this regard (the only thing I ever read about radix
trees is the Wikipedia article, I'm afraid).
Yes, same thing here, the wikipedia article ;) But it was rather fun to
write.
There are quite a few things to improve, and I've realized that the remove()
function is broken. It's pretty experimental, but my first benchmarks are quite
good. I'll try and work some more on it soon.
I think that in the context of RDF, David expects many keys to share common
(sub)prefixes, and so the radix tree may be the more appropriate for speed, when
compared to a standard hash table.
With LV2, RDF is pretty small so it's really not a big deal IMO.. But sord could
also be used out of LV2 I guess, for some other RDF needs.
Sorry. I still have plenty of releases left on the table before I have
time to get around to this stuff... and yes, I took a little break for a
change ;)
I also should have mentioned I found an old radix tree implementation of
mine that should be pretty solid. I will contrast/compare these when I
get the chance.
The radix tree is for interning URIs. You could also use a hash table,
but hash tables are ugly and clunky IMO, I prefer more elegant
structures that grow organically (and radix trees are simple to
implement), and as mentioned we have many common prefixes here which
makes it a suitable choice.
Interning URIs is used to get a number to represent a URI string. This
is necessary for even remotely appropriate performance because it means
statement lookup is a series of integer comparisons, and not a series of
vastly more expensive (and redundant) string comparisons.
For a store of n statements with URIs of length k, a search will thus be
O(lg(n) + k). Without interning it would be O(k*lg(n)), and have more
fragmented memory access as well.
Thanks,
-dr