Improving the open connection procedure (suggestions)
Valentin Mesaros
valentin at info.ucl.ac.be
Wed Jan 21 15:26:34 CET 2004
Erik Klintskog wrote:
>
> Donatien Grolaux wrote:
>
> > erik at sics.se wrote:
> >
> > >So, a proper solution to the problem would be to declare the target process
> > >PermFailed.
> > >
> > As a generalisation, wouldn't it be a good idea to add the possibility
> > to inject a permFail into a distributed entity at the Oz level? Oz
> > applications would otherwise just continuously try to connect to failed
> > sites, which is a waste of resources that is very bad for continuously
> > running servers, for example.
> >
>
> I agree. That can easily be provided. However, the semantics of such an
> operation on any entity except a port is unclear. Defining an entity as
> being permfail will result in defining the process the entity was created
> at as crashed. This will affect other entities originating from this
> process.
>
> The timeout between unsuccessful connection attempts is increased according
> to the values of some properties (I don't recall them right now). You can
> define the maximum timeout allowed.
> In practice this means that the resources used for a connection that is
> impossible to establish will eventually be negligible.
Since it is not clear what the implications of a primitive for
injecting a PermFail would be, I propose to simply have the connection
procedure notify the DL that the target is not reachable (after trying
for a certain time, fixed by the user). This would be interpreted by the
DL as a PermFail for the target.
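To make the proposal concrete, here is a minimal sketch of the retry policy described above: the delay between attempts grows up to a maximum, and once the user-fixed limit would be exceeded the target is reported as PermFail. It is written in Python rather than Oz purely for illustration; the names retry_delays, connect_with_deadline, give_up_after and the default values are assumptions, not existing DL properties.

```python
import time

def retry_delays(initial, factor, max_delay):
    """Yield the waits between attempts: grow by `factor`, capped at `max_delay`."""
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)

def connect_with_deadline(try_connect, give_up_after,
                          initial=1.0, factor=2.0, max_delay=60.0,
                          sleep=time.sleep):
    """Retry `try_connect` with growing delays (hypothetical policy).

    Returns ("connected", waited) on success, or ("permfail", waited) once
    the next wait would exceed `give_up_after` seconds of total waiting.
    `waited` counts only the time spent sleeping between attempts.
    """
    waited = 0.0
    for delay in retry_delays(initial, factor, max_delay):
        if try_connect():
            return ("connected", waited)
        if waited + delay > give_up_after:
            # The caller would notify the DL here that the target is PermFail.
            return ("permfail", waited)
        sleep(delay)
        waited += delay
```

The point of the sketch is that the decision to give up is made by the connection procedure, with a limit chosen by the user, and the DL only receives the verdict.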
> >
> > >The OpenTimeout is a safety mechanism to abort connection procedures that
> > >hang for any reason. This is to minimize the utilization of fd resources.
> > >Actually, it is a tool to protect the DL from buggy connection procedures.
> > >I assume that you know that the user can supply the DL with customized
> > >connection procedures.
> > >
> > Could you document that feature? The documentation states one can do it,
> > but not how. The source is also complicated to understand and not
> > very helpful on how to implement a connection procedure.
> >
>
> It is actually not that complicated. Have a look at ConnectionProcedure.oz
> in the source.
>
> >
> > >>The way it is designed today, the connection opening procedure gives up
> > >>without considering the TCP status. That is, the connection procedure is
> > >>aborted after a default OpenTimeout=3s. In the case of connections that
> > >>involve NAT or wireless peer-to-peer, opening a connection may take more
> > >>than 3s. As a result, the connection procedure gives up and retries
> > >>without ever succeeding. Of course, the programmer can set
> > >>OpenTimeout greater than 3s and establish the connection. However, the
> > >>problem of useless reconnection attempts remains when connecting to
> > >>inaccessible machines.
> > >>
> > >>Q2: When establishing a connection, why not just rely on the transport
> > >>protocol timeouts?
> > >>
> > >>
> > >
> > >See above. Just raise the timeout.
> > >
> > Raising the timeout is a way of avoiding the problem, not a way to solve
> > it!
>
> No, I don't agree.
>
> The problem the timeout solves is to protect the DSS from connection
> procedures that do not terminate. It solves this perfectly fine.
>
> Raising the timeout solves your problem of slow connection establishments.
Come on Erik, since the programmer is allowed to design his own
connection procedure, I don't see why the DL _has_ to interfere.
If opening a connection hangs, there may be a good reason for that (the
DL has no clue why). Otherwise, it may be a bug that the programmer has
to fix, since he took it upon himself to design his own connection
procedure. In any case, I don't think it's a good idea to have the DL
taking decisions on behalf of the programmer.
Moreover, given the strict decoupling between the two layers, I don't
really see how the connection module can hurt the DL.
Maybe there are other reasons for having OpenTimeout that I don't know of.
Otherwise, we think that it is not really useful and, more importantly,
it bothers the programmer.
> > I believe the writer of an Oz application shouldn't have to decide
> > a value by himself. Also, the DL shouldn't try to protect itself in
> > such a way from a buggy connection procedure: if a developer decides
> > to change the default one, it's his responsibility to write one that is
> > not buggy. Instead, it would be a very nice feature if two computers can
> > open a TCP connection => Mozart processes can communicate together.
> > Unfortunately, it's not the case for now.
>
> It is, if you raise the timeout to a higher value.
>
> >
> > >>Q3: Wouldn't it be better to support multihoming?
> > >>
> > >>
> > >>
> > >
> > >Isn't that obvious? Yes. The problem, however, is that the IP number is
> > >closely associated with the identity of a Mozart process. I don't think
> > >this is easy to achieve at DL level.
> > >
> > What about the DSS then? Will there be this restriction again?
>
> Nope. I think you should be familiar with the possibilities of the DSS, since
> Valentin wrote a paper together with me on the new component-based messaging
> layer of the DSS. Please read the paper if you're interested.
Come on Erik, just because we wrote a paper doesn't mean Donatien has
read it :-)
Could you elaborate on Mozart process identity? What, exactly, are the
drawbacks?
The first big drawback that I see is that the DSite is stored in a
hashtable under a key made of the IP and port number. I think this can
be easily changed.
Inside the DL, a DSite is the local representation of a remote process.
Why do you speak about the Mozart process identity? Isn't that used only
locally?
The multi-homing support should not have too many implications, even for
the DL.
Look, this is how I see it. The offerer creates its DSite, as before,
with respect to a certain network interface. It will return a list of
tickets (one for each interface). The first in the list is the ticket
corresponding to the interface the DSite was set to. The taker will try
to connect to the offerer via the provided tickets. If it succeeds with
the first ticket, that's it; the connection is established. Otherwise,
it will try with another ticket.
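The taker side of this could be sketched as follows. This is only an illustration in Python, not Oz: representing a ticket as an (ip, port) pair and the names connect_via_tickets and try_connect are assumptions made for the example, not the real ticket format or DL API.

```python
def connect_via_tickets(tickets, try_connect):
    """Try each ticket in the order given by the offerer.

    `tickets` is a list of hypothetical (ip, port) pairs, the first one being
    the interface the DSite was created on. `try_connect` is any callable that
    returns True when a connection through that ticket succeeds.
    Returns the ticket that worked, or None if every ticket failed.
    """
    for ticket in tickets:
        if try_connect(ticket):
            return ticket
    return None

# Example with the two interfaces mentioned in this thread: only the
# private address is reachable from the taker's network.
tickets = [("130.104.8.9", 9000), ("192.168.0.13", 9000)]
reachable = {"192.168.0.13"}
chosen = connect_via_tickets(tickets, lambda t: t[0] in reachable)
```

Here the taker falls back to the second ticket because the first interface is not reachable from where it sits.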
Here comes a problem. Although TCP succeeds in connecting, the DL will
consider that the remote site is dead. This is because of the IP-dependent
hashing. If we use a different key for storing DSites, I think we can
support multi-homing.
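A small sketch of the difference, again in Python and under loud assumptions: the class SiteTable, the string site identifiers and the port number are all hypothetical and only illustrate the keying idea, not how the DL actually stores DSites.

```python
# Keyed by (ip, port), as today: the second interface of the same process
# is simply not found, so the site looks unknown (or dead).
by_address = {("130.104.8.9", 9000): "site-A"}
found_via_second_interface = by_address.get(("192.168.0.13", 9000))  # None

class SiteTable:
    """Hypothetical table keyed by an address-independent site identifier."""

    def __init__(self):
        self._addresses = {}  # site_id -> set of (ip, port) pairs

    def register(self, site_id, addresses):
        """Record every address under which the site may be reached."""
        self._addresses.setdefault(site_id, set()).update(addresses)

    def knows(self, site_id):
        return site_id in self._addresses

    def accepts(self, site_id, address):
        """True if `address` is one of the known addresses of `site_id`."""
        return address in self._addresses.get(site_id, set())

table = SiteTable()
table.register("site-A", [("130.104.8.9", 9000), ("192.168.0.13", 9000)])
```

With such a key, a connection arriving through either interface maps to the same DSite, which is all that multi-homing needs at this level.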
Since you haven't commented on points (4) and (5) from my last mail,
should I suppose that you agree with us?
> >
> > In the meantime, would it be possible to change the way Mozart processes
> > connect together to remove the need to connect through a particular IP
> > only (i.e. if a computer has network interfaces 192.168.0.13 and
> > 130.104.8.9 and the ticket was created using 130.104.8.9, then an
> > incoming connection from 192.168.0.13 is still accepted)?
>
> Everything is possible, it is just a matter of resources. I don't have the
> resources to restructure the connection mechanisms of Mozart. All my time
> goes into the DSS and the schedule is quite tight, a lot of things have to be
> done this year. However, I think that Valentin could do the job.
YES. As I've already said at the beginning of this discussion, we _intend_
to spend some effort on it. However, we might need your expertise.
Valentin