Lazy-Batched Stream Processing

Denys Duchier duchier at ps.uni-sb.de
Mon Jul 25 22:49:16 CEST 2005


Peter Van Roy <pvr at info.ucl.ac.be> writes:

> I propose to make an extremely simple Fault module in the new distributed
> Mozart implementation, with just two properties: (1) a distributed operation
> that cannot be completed will simply block, and (2) any problems will be
> signaled through a stream that is generated by a watcher.  So far, in all the
> fault tolerant abstractions we have done, this is all the functionality we
> really need.

You have _NO_ idea how happy this makes me.  This is precisely the model that
Christian and I have been lobbying for from the beginning: a _nice_ stream ;-) I
have no opinion about the blocking (although I remember Donatien's talk at
MOZ2004), but the stream-based "fault watching" definitely gets my vote.
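A minimal sketch of what consuming such a watcher-generated fault stream could look like. It is written in Python rather than Oz: a generator plays the role of the Oz stream, and a queue fed from another thread plays the role of the watcher. The class `Watcher`, its methods, and the entity names are hypothetical, not the Mozart API. The state strings are only loosely modelled on Mozart's `tempFail`/`permFail`.

```python
import queue
import threading

# Hypothetical sketch of a watcher-generated fault stream. None of these
# names belong to Mozart; a Python generator stands in for an Oz stream.

_END = object()  # sentinel that closes the stream


class Watcher:
    def __init__(self):
        self._events = queue.Queue()

    def report(self, entity, state):
        """Called by the (hypothetical) runtime whenever a fault state changes."""
        self._events.put((entity, state))

    def close(self):
        self._events.put(_END)

    def stream(self):
        """Yield (entity, state) events in order, blocking until each arrives."""
        while True:
            event = self._events.get()
            if event is _END:
                return
            yield event


if __name__ == "__main__":
    watcher = Watcher()

    def runtime():
        # Simulated runtime reporting faults from another thread.
        watcher.report("site-B", "tempFail")
        watcher.report("site-B", "ok")
        watcher.report("site-C", "permFail")
        watcher.close()

    t = threading.Thread(target=runtime)
    t.start()
    for entity, state in watcher.stream():  # blocks while no fault is pending
        print(entity, state)
    t.join()
```

The consumer just iterates over the stream. There is no callback registration and no polling, which is what makes the stream-based interface pleasant to compose.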

> I think that better support for streams, either lazy, distributed, or both, is
> really an idea whose time has come.

OK.  So let's see how we can make this natural and convenient.  There is one issue
with composing "on-demand batch" stream abstractions: if the batch sizes are
badly chosen, each stage of the stream pipeline can end up precomputing far more
data than its consumers will ever demand.
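To make the overcomputation concrete, here is a small Python analogue in which generators stand in for lazy Oz streams. The helpers `source` and `batched_map` are hypothetical. Whenever a stage needs input, it pulls a whole batch of `by` elements from upstream and transforms all of them at once. Asking for a single element at the end of a two-stage pipe, with batch sizes 100 and 64, forces 100 source elements, 100 stage-1 results and 64 stage-2 results.

```python
from itertools import islice


def source(counter):
    """Infinite stream 0, 1, 2, ...; counts every element it produces."""
    n = 0
    while True:
        counter["source"] += 1
        yield n
        n += 1


def batched_map(f, by, inp, counter, name):
    """Hypothetical batch-at-a-time stage: on demand, pull `by` inputs at once,
    apply f to all of them eagerly, then hand the results out one by one."""
    while True:
        batch = list(islice(inp, by))
        if not batch:
            return
        out = [f(x) for x in batch]
        counter[name] += len(out)
        yield from out


if __name__ == "__main__":
    counts = {"source": 0, "stage1": 0, "stage2": 0}
    stage1 = batched_map(lambda x: x + 1, 100, source(counts), counts, "stage1")
    stage2 = batched_map(lambda x: x * 2, 64, stage1, counts, "stage2")
    print(next(stage2))  # the consumer asks for just one element: 2
    print(counts)        # {'source': 100, 'stage1': 100, 'stage2': 64}
```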

For this reason, in my admittedly vague design, I propose to have, for each
transducer module Foo, both:

	{Foo.encode      IN OUT}
and     {Foo.encodeBy BY IN OUT}

where BY specifies how many elements to generate for each batch, and Foo.encode
simply calls Foo.encodeBy with a reasonable default for BY.
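Here is a minimal Python sketch of the encode/encodeBy convention. It assumes a Base64 encoder as the transducer Foo and a stream of byte values as IN. The names `encode`, `encode_by` and `DEFAULT_BY` are hypothetical stand-ins for Foo.encode, Foo.encodeBy and its default. BY counts output characters per batch. Base64 maps 3 bytes to 4 characters, so BY is restricted to multiples of 4; every non-final batch then consumes a whole number of 3-byte groups.

```python
import base64
from itertools import islice

# Hypothetical analogue of a transducer module "Foo": a lazy Base64 encoder
# over an iterator of byte values (0..255). Not an existing API.

DEFAULT_BY = 76  # assumed default: one MIME-sized line of output per batch


def encode_by(by, inp):
    """Lazily Base64-encode `inp`, producing `by` characters per batch
    (the last batch may be shorter and padded with '=')."""
    if by <= 0 or by % 4:
        raise ValueError("BY must be a positive multiple of 4")
    need = by // 4 * 3  # input bytes consumed per output batch
    it = iter(inp)
    while True:
        chunk = bytes(islice(it, need))
        if not chunk:
            return
        yield from base64.b64encode(chunk).decode("ascii")


def encode(inp):
    """encode is just encode_by with a reasonable default batch size."""
    return encode_by(DEFAULT_BY, inp)
```

A standalone caller uses `encode`. A larger abstraction that knows its neighbours' batch sizes calls `encode_by` directly with its own BY.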

This makes it possible for abstractions built on top of other stream-processing
abstractions to choose batch sizes that keep all the stages of the pipeline
proceeding in "lock step".
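Continuing the same kind of Python sketch, both stages below are hypothetical. An upstream stage that computes 48 bytes per batch feeds a Base64 stage. With BY = 64 the encoder needs exactly 48 input bytes per batch, so the two stages advance in lock step. With BY = 76 it needs 57 bytes, so the first output character already forces two upstream batches (96 bytes), leaving 39 bytes unused in the buffer.

```python
import base64
from itertools import islice

# Hypothetical stages: `produce_by` computes `by` bytes per batch;
# `encode_by` is a Base64 stage needing `by // 4 * 3` bytes per batch.


def produce_by(by, counter):
    n = 0
    while True:
        batch = bytes((n + i) % 256 for i in range(by))  # whole batch at once
        n += by
        counter["computed"] += by
        yield from batch


def encode_by(by, inp):
    need = by // 4 * 3
    while True:
        chunk = bytes(islice(inp, need))
        if not chunk:
            return
        yield from base64.b64encode(chunk).decode("ascii")


def upstream_cost(upstream_by, downstream_by, n_out):
    """Bytes computed upstream to obtain `n_out` encoded characters."""
    counter = {"computed": 0}
    out = encode_by(downstream_by, produce_by(upstream_by, counter))
    list(islice(out, n_out))
    return counter["computed"]


if __name__ == "__main__":
    print(upstream_cost(48, 64, 1))  # 48: one upstream batch, fully consumed
    print(upstream_cost(48, 76, 1))  # 96: two upstream batches for 57 bytes
```

The choice of BY is made by the enclosing abstraction, which is the only place that knows both neighbours' batch sizes. That is why Foo.encodeBy needs to expose it.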

Cheers,

-- 
Dr. Denys Duchier - IRI & LIFL - CNRS, Lille, France
+33 (0)6 25 78 25 74    http://www.lifl.fr/~duchier/



More information about the mozart-hackers mailing list