[Oz] beginner: distributing data fields over documents
Denys Duchier
Denys.Duchier at ps.uni-sb.de
Tue Nov 28 14:56:35 CET 2000
suegar at gmx.de (Garibald Hennes) writes:
> I have a set of documents, containing data fields, say docs 1-30, and
> datafield 1-200, and a lot of datafields show up in more than one doc. I
> now want the minimal number of "data blocks" to describe the documents,
>
> i.e. if I have only doc1(field1 field2 field3 field4 field5) and
> doc2(field1 field2) and doc3(field4 field5) I would search for
> block1(field1 field2) block2(field3) and block3(field4 field5)
>
> (and accordingly get doc1(block1 field3 block2) doc2(block1)
> doc3(block2))
Here is my interpretation of your problem:
* there is a finite set of fields
* each document is described by a subset of these fields
* you want to partition the set of fields into blocks
such that the documents can now be described in terms
of blocks (i.e. for each block and each document, the block
is either a subset or disjoint with the set of fields of
the document)
* furthermore, you want the smallest number of blocks
Here is my solution. Function SolutionPredicate takes 1 argument: a
list of set values where each element corresponds to the fields
describing 1 document. It returns a predicate appropriate for search.
A solution is a pair of a list of sets of fields (each corresponds to
a block), some of which may be empty and should simply be ignored when
reading the solution, and the number of empty blocks. It is easy (and
left as an exercise to the reader) to compute the descriptions of each
document in terms of these blocks.
declare
fun {SolutionPredicate Docs}
Fields = {FS.unionN Docs}
in
proc {$ Solution}
Blocks = {FS.var.list.decl {FS.card Fields}}
%% the blocks form a partition of the fields
{FS.partition Blocks Fields}
BlocksAndCards = {Map Blocks fun {$ B} B#{FS.card B} end}
Empties = {Map BlocksAndCards fun {$ _#C} C=:0 end}
NumberOfEmpties = {FD.int 0#{Length Blocks}}
{FD.sum Empties '=:' NumberOfEmpties}
in
Solution = Blocks#NumberOfEmpties
%% blocks are properly nested in documents
{ForAll BlocksAndCards
proc {$ B#C}
{ForAll Docs
proc {$ D}
thread
%% when C==0 the disjunction would otherwise
%% remain undecided
or C\=:0 {FS.subset B D}
[] {FS.disjoint B D}
end
end
end}
end}
%% eliminate symmetries
{List.forAllTail
BlocksAndCards
proc {$ (B1#C1)|L}
{ForAll L
proc {$ B2#C2}
C1>=:C2
thread
or C1>:C2
[] C1=C2
or C2=0
[] C2\=:0 {FS.int.min B1}<:{FS.int.min B2}
end
end
end
end}
end}
{FS.distribute naive Blocks}
end
end
%% here is an order predicate for `branch and bound' search of the
%% best solution. A solution is better if it has more empty blocks,
%% i.e. fewer `real' blocks.
declare
proc {Better _#N1 _#N2}
N1<:N2
end
%% here are the descriptors for your example
declare
Docs = [{FS.value.make [1 2 3 4 5]}
{FS.value.make [1 2]}
{FS.value.make [4 5]}]
%% now we can search for the best solution which will appear
%% right-most in the explorer:
{Explorer.best {SolutionPredicate Docs} Better}
%% Cheers,
--
Dr. Denys Duchier Denys.Duchier at ps.uni-sb.de
Forschungsbereich Programmiersysteme (Programming Systems Lab)
Universitaet des Saarlandes, Geb. 45 http://www.ps.uni-sb.de/~duchier
Postfach 15 11 50 Phone: +49 681 302 5618
66041 Saarbruecken, Germany Fax: +49 681 302 5615
-
Please send submissions to users at mozart-oz.org
and administriva mail to users-request at mozart-oz.org.
The Mozart Oz web site is at http://www.mozart-oz.org/.
More information about the mozart-users
mailing list