[SWIPL] what improvements can I make to improve the search of a large set of static facts

Richard O'Keefe ok at cs.otago.ac.nz
Fri Aug 19 07:26:35 MEST 2011


On 19/08/2011, at 2:50 PM, T M wrote:

> is there a faster way to compile or index a large number of facts to improve
> speed? I have roughly 100K facts of the form linked(groupid,id)
> 
> I am try to find connections given an id, but the performance has been
> really slow.  I consult the facts and  I use the rules
> 
> connected(Id1,Id2) :-
> linked(Grp1,Id1),
> linked(Grp1,Id2),
> \+Id1=Id2.

Something terrible happened to your indentation on the way to us.

connected(Id1, Id2) :-
    % let us assume that both Id1 and Id2 are known.
    linked(Grp1, Id1),
    % OOPS! Grp1 is certainly unknown.  It will take a linear
    % search through linked/2 to find this.
    linked(Grp1, Id2),
    % OK! Grp1 is known now.
    \+ Id1 = Id2.

> 
> connected(Id1,Id2) :-
> connected(Id1,Id3),
> connected(Id3,Id2),
> \+Id1=Id2,!.

Oh-oh!  You are trying to do a transitive closure.  This is about the
worst way you could do it.  Suppose we call
	?- connected(x, y).
=>	?- connected(x, Id3),   connected(Id3, y), \+ x = y.
=>	?- connected(x, Id3'),  connected(Id3', Id3), \+ x = Id3,
	                        connected(Id3, y), \+ x = y.
=>	?- connected(x, Id3''), connected(Id3'', Id3'), \+ x = Id3',
                                connected(Id3',  Id3),  \+ x = Id3,
                                connected(Id3,   y),    \+ x = y.

In SWI Prolog you can use the :-index directive to get
indexing on the *second* argument of linked/2.

:- index(linked(0, 1)).

connected(Id1, Id2) :-
    % let us assume that both Id1 and Id2 are known.
    linked(Grp1, Id1),
    % OK! Id1 is known, so 2nd-argument indexing finds Grp1.
    linked(Grp1, Id2),
    % OK! Id2 is known, so 2nd-argument indexing finds Grp1 again.
    Id1 \== Id2.

If Id2 might be unknown, then it could be worth using

:- index(linked(1, 1)).

to get indexing on both arguments (taken separately).

This still leaves the second clause of connected/2 as an
infinite loop eager to happen.

If an Id can only belong to one Grp, you don't need the
second clause at all.  If Grps can overlap, you would
probably do well to precompute equivalence classes of
Grps and do

	connected(Id1, Id2) :-
	    linked(Grp1, Id1),
	    grp_equivalence_class(Grp1, EQ),
            linked(Grp2, Id2),
            grp_equivalence_class(Grp2, EQ),
            Id1 \== Id2.







More information about the SWI-Prolog mailing list