[Oz] beginner: distributing data fields over documents

Denys Duchier Denys.Duchier at ps.uni-sb.de
Tue Nov 28 14:56:35 CET 2000


suegar at gmx.de (Garibald Hennes) writes:

> I have a set of documents, containing data fields, say docs 1-30, and
> datafield 1-200, and a lot of datafields show up in more than one doc. I
> now want the minimal number of "data blocks" to describe the documents,
> 
> i.e. if I have only doc1(field1 field2 field3 field4 field5) and
> doc2(field1 field2) and doc3(field4 field5) I would search for
> block1(field1 field2) block2(field3) and block3(field4 field5) 
> 
> (and accordingly get doc1(block1 field3 block2) doc2(block1)
> doc3(block2))

Here is my interpretation of your problem:
* there is a finite set of fields
* each document is described by a subset of these fields
* you want to partition the set of fields into blocks
  such that the documents can now be described in terms
  of blocks (i.e. for each block and each document, the block
  is either a subset or disjoint with the set of fields of
  the document)
* furthermore, you want the smallest number of blocks

Here is my solution.  Function SolutionPredicate takes 1 argument: a
list of set values where each element corresponds to the fields
describing 1 document.  It returns a predicate appropriate for search.
A solution is a pair of a list of sets of fields (each corresponds to
a block), some of which may be empty and should simply be ignored when
reading the solution, and the number of empty blocks.  It is easy (and
left as an exercise to the reader) to compute the descriptions of each
document in terms of these blocks.

declare
fun {SolutionPredicate Docs}
   Fields = {FS.unionN Docs}
in
   proc {$ Solution}
      Blocks = {FS.var.list.decl {FS.card Fields}}
      %% the blocks form a partition of the fields
      {FS.partition Blocks Fields}
      BlocksAndCards = {Map Blocks fun {$ B} B#{FS.card B} end}
      Empties = {Map BlocksAndCards fun {$ _#C} C=:0 end}
      NumberOfEmpties = {FD.int 0#{Length Blocks}}
      {FD.sum Empties '=:' NumberOfEmpties}
   in
      Solution = Blocks#NumberOfEmpties
      %% blocks are properly nested in documents
      {ForAll BlocksAndCards
       proc {$ B#C}
	  {ForAll Docs
	   proc {$ D}
	      thread
		 %% when C==0 the disjunction would otherwise
		 %% remain undecided
		 or C\=:0 {FS.subset B D}
		 [] {FS.disjoint B D}
		 end
	      end
	   end}
       end}
      %% eliminate symmetries
      {List.forAllTail
       BlocksAndCards
       proc {$ (B1#C1)|L}
	  {ForAll L
	   proc {$ B2#C2}
	      C1>=:C2
	      thread
		 or C1>:C2
		 [] C1=C2
		    or C2=0
		    [] C2\=:0 {FS.int.min B1}<:{FS.int.min B2}
		    end
		 end
	      end
	   end}
       end}
      {FS.distribute naive Blocks}
   end
end

%% here is an order predicate for `branch and bound' search of the
%% best solution.  A solution is better if it has more empty blocks,
%% i.e. fewer `real' blocks.

declare
proc {Better _#N1 _#N2}
   N1<:N2
end

%% here are the descriptors for your example

declare
Docs = [{FS.value.make [1 2 3 4 5]}
	{FS.value.make [1 2]}
	{FS.value.make [4 5]}]

%% now we can search for the best solution which will appear
%% right-most in the explorer:

{Explorer.best {SolutionPredicate Docs} Better}

%% Cheers,


-- 
Dr. Denys Duchier			Denys.Duchier at ps.uni-sb.de
Forschungsbereich Programmiersysteme	(Programming Systems Lab)
Universitaet des Saarlandes, Geb. 45	http://www.ps.uni-sb.de/~duchier
Postfach 15 11 50			Phone: +49 681 302 5618
66041 Saarbruecken, Germany		Fax:   +49 681 302 5615
-
Please send submissions to users at mozart-oz.org
and administriva mail to users-request at mozart-oz.org.
The Mozart Oz web site is at http://www.mozart-oz.org/.





More information about the mozart-users mailing list