Continuing the discussion which Lars Hansen started in the lecture ... Professor Patt-Shamir has clarified ... Computing the subgraph determined by a traversal specification ================================== This problem is central to adaptive programming and therefore deserves a closer analysis. Professor Patt-Shamir below clarifies the differences between algorithm A (the RED-BLUE algorithm) and the algorithm by Lars Hansen. I looks like that currently the winning algorithm is the RED-BLUE with on-the-fly construction of the reversed graph. (See CLR (Corman, Leiserson, Rivest, Algorithms) to learn about graph traversal.) There is an important other improvement which we can make: Assume we have n traversal specifications t1, ... ,tn and one class dictionary G. We need to compute the n subgraphs s1, ... ,sn. Can we preprocess G so that it is easier to compute the n subgraphs? The idea is to shrink G. The answer is positive and this is implemented in the Demeter Tools/C++. What is the answer? -- Karl L. From boaz@ccs.neu.edu Fri Oct 18 17:47:08 1996 From: Boaz Patt-Shamir To: Karl Lieberherr Cc: boaz@ccs.neu.edu, dougo@ccs.neu.edu Subject: propagation graph computation The problem with "algorithm B" is that its time and space complexities are quadratic. > A word about efficiency. > > Assuming G is connected, algorithm A is roughly O(|E|) in the worst case > (in the worst case, the entire graph is the subgraph). > > Algorithm B also needs to follow O(|E|) edges but in addition uses some > data structures (the sets) whose complexities are less well understood > because the representations are not chosen in the description of the > algorithm. Briefly, the set 'visited' can be a boolean vector; visit! > and visited? therefore take O(1) time. 'Subgraph' can be a boolean > vector as well. That's not true. The complexity of finding whether a certain bit in an n-length boolean vector is at least log n (think about large n!) The way to implement this efficiently would be to use the famous union-find algorithm (see CLR, ch.22). > However, keeping 'path' as one makes the Union operator > take O(|V|) time, not a good thing. A better organization is to choose > path as a list of nodes and observe that a node can be on a path only > once (i.e, it's a stack), thereby making adjoin O(1) and the Union > operation O(|path|). This is not as good as Algorithm A, because nodes > in the path will take part in the union several times. One can fix this > by making observations about the nature of 'path', but I leave that as > an exercise. > So this implementation is actually quadratic time! ouch... > > > I dislike Algorithm A chiefly because it relies on destructive updates > > > to the graph during the traversal (prohibiting, for example, concurrent > > > traversals of the same graph). > > > >This is only if you interpret "color a node red" as "set a variable on > >a node" > > [...] > > Not at all. It's the link reversal that gets to me... as I think I > pointed out further down in the message, the color sets can be kept > separate in a number of ways, but I'm unconvinced that Algorithm A can > be implemented with, say, a doubly-linked graph that would allow the > update of the links to be avoided. > The way to fix that is that while doing the first pass, to create the travesed graph on the fly with back-edges. Once the first pass is done, use the temporary graph to do the second pass. This solves both the coloring and the reversal requirements without disrupting the original graph. There is a slight difficulty about freeing the temporary memory, but this can left as an execise to the reader... Boaz Hi Boaz: here is the sequence of message which resulted. It would be great if you can clarify. -- Karl -------------------- From lth@ccs.neu.edu Fri Oct 18 10:50:03 1996 Received: from denali.ccs.neu.edu (root@denali.ccs.neu.edu [129.10.113.75]) by amber.ccs.neu.edu (8.7.5/8.7.3) with ESMTP id KAA23215; Fri, 18 Oct 1996 10:49:55 -0400 (EDT) Received: from localhost.ccs.neu.edu (lth@localhost.ccs.neu.edu [127.0.0.1]) by denali.ccs.neu.edu (8.7.5/8.6.4) with SMTP id KAA23639; Fri, 18 Oct 1996 10:49:54 -0400 (EDT) Message-Id: <199610181449.KAA23639@denali.ccs.neu.edu> X-Authentication-Warning: denali.ccs.neu.edu: Host lth@localhost.ccs.neu.edu [127.0.0.1] didn't use HELO protocol To: com3360@denali.ccs.neu.edu cc: lth@denali.ccs.neu.edu Subject: subgraph searching Date: Fri, 18 Oct 96 10:49:53 -0400 From: Lars Thomas Hansen Status: R [The message is longish, but I thought it better that way.] Re: the discussion about computing the subgraph in last night's class. The problem is this: given a graph G=(V,E), and a, b in V, find all vertices v in V s.t. there is a path from a to b in G (inclusive). Dr. Lieberherr presented the following algorithm (I'm paraphrasing slightly), which I will call Algorithm A: 1. Color 'a' RED 2. While there is e=(i,j) in E s.t. i is RED and j is not RED, traverse (i,j) and color j RED. Set E = E-{(i,j)}+{(j,i)} (i.e., reverse the link.) This can be a depth-first search. 3. Color 'b' BLUE. 4. While there is e=(i,j) in E s.t. i is BLUE and j is not BLUE, traverse (i,j) and color j BLUE. This can be a depth-first search. 5. The nodes we want are now the ones colored both RED and BLUE. I dislike Algorithm A chiefly because it relies on destructive updates to the graph during the traversal (prohibiting, for example, concurrent traversals of the same graph). Dr. Lieberherr pointed out that the graph can be copied beforehand, which is clearly true, but this is expensive for larger graphs or for graphs with large nodes. In the context of Demeter the approach is plausibly affordable. Dr. Lieberherr also suggested that one can avoid the link reversal by keeping doubly-linked graphs; since the colors can clearly be kept external to the graph this would avoid all updates to the graph. However, double links would not by themselves solve the problem, as the link reversal serves a crucial role in locking out dead-ends from the graph before the second traversal (consider the graph presented in class, for example). So I consider this a conjecture about which I am not convinced. With that as a motivation, here is a different algorithm, Algorithm B. You will pardon me for using an actual language to describe it; the level of detail should not be off-putting. (define subgraph (make-empty-set)) (define target-node ) (define (search node path) (if (same-node? node target-node) (set! subgraph (union subgraph path))) (if (not (visited? node)) (begin (visit! node) (let ((path (adjoin node path))) (for-each (lambda (child) (search child path)) (children node)))))) To keep track of visited nodes, we use another set: (define visited (make-empty-set)) (define (visit! node) (set! visited (adjoin node visited))) (define (visited? node) (member node visited)) To start, we run (search (make-empty-set)). Algorithm B computes the set of nodes in the global variable 'subgraph', using the global variable 'visited' as a set of nodes that have already been visited. It keeps a set of nodes, 'path', that are the nodes along the path from a to whatever node it's currently at. When it gets to the target node, it adds the nodes in the path to the nodes of the subgraph. It should be clear that the 'subgraph' and 'visited' sets can be made non-global and that one can therefore perform concurrent searches of the graph. (For the purists, one can even avoid the assignments.) The reasons I like Algorithm B is that (a) it is nice and abstract -- there is no mention of the graph implementation, and (b) it does not change the graph as it goes along. A word about efficiency. Assuming G is connected, algorithm A is roughly O(|E|) in the worst case (in the worst case, the entire graph is the subgraph). Algorithm B also needs to follow O(|E|) edges but in addition uses some data structures (the sets) whose complexities are less well understood because the representations are not chosen in the description of the algorithm. Briefly, the set 'visited' can be a boolean vector; visit! and visited? therefore take O(1) time. 'Subgraph' can be a boolean vector as well. However, keeping 'path' as one makes the Union operator take O(|V|) time, not a good thing. A better organization is to choose path as a list of nodes and observe that a node can be on a path only once (i.e, it's a stack), thereby making adjoin O(1) and the Union operation O(|path|). This is not as good as Algorithm A, because nodes in the path will take part in the union several times. One can fix this by making observations about the nature of 'path', but I leave that as an exercise. --lars From lth@ccs.neu.edu Fri Oct 18 10:52:24 1996 Received: from denali.ccs.neu.edu (root@denali.ccs.neu.edu [129.10.113.75]) by amber.ccs.neu.edu (8.7.5/8.7.3) with ESMTP id KAA23356; Fri, 18 Oct 1996 10:52:22 -0400 (EDT) Received: from localhost.ccs.neu.edu (lth@localhost.ccs.neu.edu [127.0.0.1]) by denali.ccs.neu.edu (8.7.5/8.6.4) with SMTP id KAA23740; Fri, 18 Oct 1996 10:52:20 -0400 (EDT) Message-Id: <199610181452.KAA23740@denali.ccs.neu.edu> X-Authentication-Warning: denali.ccs.neu.edu: Host lth@localhost.ccs.neu.edu [127.0.0.1] didn't use HELO protocol To: Lars Thomas Hansen cc: com3360@denali.ccs.neu.edu Subject: Re: subgraph searching In-reply-to: Your message of "Fri, 18 Oct 96 10:49:53 EDT." <199610181449.KAA23639@denali.ccs.neu.edu> Date: Fri, 18 Oct 96 10:52:18 -0400 From: Lars Thomas Hansen Status: R >The problem is this: given a graph G=(V,E), and a, b in V, find all >vertices v in V s.t. there is a path from a to b in G (inclusive). _through v_, of course. Sigh. --lars From dougo@ccs.neu.edu Fri Oct 18 13:01:46 1996 Received: from k2.ccs.neu.edu (dougo@k2.ccs.neu.edu [129.10.112.116]) by amber.ccs.neu.edu (8.7.5/8.7.3) with ESMTP id NAA28454; Fri, 18 Oct 1996 13:01:42 -0400 (EDT) From: Doug Orleans Received: (dougo@localhost) by k2.ccs.neu.edu (8.7.5/8.6.4) id NAA13697; Fri, 18 Oct 1996 13:01:41 -0400 (EDT) Date: Fri, 18 Oct 1996 13:01:41 -0400 (EDT) Message-Id: <199610181701.NAA13697@k2.ccs.neu.edu> To: lth@ccs.neu.edu cc: lieber@ccs.neu.edu, com3360@ccs.neu.edu Subject: Re: subgraph searching Status: R > I dislike Algorithm A chiefly because it relies on destructive updates > to the graph during the traversal (prohibiting, for example, concurrent > traversals of the same graph). This is only if you interpret "color a node red" as "set a variable on a node"; if you interpret it instead as "add a node to a set", then you have essentially your algorithm: the "visited" set is the set of red nodes, and the "subgraph" set is the set of blue & red nodes. The remaining difference is that you do it in one pass instead of two, so you don't need to reverse the links (I use the same optimization in demjava). The tradeoff is basically the simplicity of directly attaching the color to the nodes, versus the complexity of making an efficient set implementation (which you left as an exercise...), but I think in an abstract sense the algorithms are identical. --Doug From lth@ccs.neu.edu Fri Oct 18 13:05:33 1996 Received: from denali.ccs.neu.edu (root@denali.ccs.neu.edu [129.10.113.75]) by amber.ccs.neu.edu (8.7.5/8.7.3) with ESMTP id NAA28574; Fri, 18 Oct 1996 13:05:30 -0400 (EDT) Received: from localhost.ccs.neu.edu (lth@localhost.ccs.neu.edu [127.0.0.1]) by denali.ccs.neu.edu (8.7.5/8.6.4) with SMTP id NAA28126; Fri, 18 Oct 1996 13:05:28 -0400 (EDT) Message-Id: <199610181705.NAA28126@denali.ccs.neu.edu> X-Authentication-Warning: denali.ccs.neu.edu: Host lth@localhost.ccs.neu.edu [127.0.0.1] didn't use HELO protocol To: Doug Orleans cc: lth@ccs.neu.edu, lieber@ccs.neu.edu, com3360@ccs.neu.edu Subject: Re: subgraph searching In-reply-to: Your message of "Fri, 18 Oct 96 13:01:41 EDT." <199610181701.NAA13697@k2.ccs.neu.edu> Date: Fri, 18 Oct 96 13:05:26 -0400 From: Lars Thomas Hansen Status: R > > I dislike Algorithm A chiefly because it relies on destructive updates > > to the graph during the traversal (prohibiting, for example, concurrent > > traversals of the same graph). > >This is only if you interpret "color a node red" as "set a variable on >a node" [...] Not at all. It's the link reversal that gets to me... as I think I pointed out further down in the message, the color sets can be kept separate in a number of ways, but I'm unconvinced that Algorithm A can be implemented with, say, a doubly-linked graph that would allow the update of the links to be avoided. --lars =============== See page 495, excercise 15.23 and chapter 15 for related information.