assertion -- cxn->writeBlockedTimerId == 0 is back

Russ Allbery rra at stanford.edu
Tue Jul 5 00:43:07 UTC 2005


Russ Allbery <rra at stanford.edu> writes:

> I looked at this for a little while, and I see that in various places we
> check to see if a write is pending as well as seeing if the queue is
> empty before calling cxnIdle with a comment that some hosts return
> responses before we're done sending data, but in other places we don't
> do this.

> I wonder if making this uniform so that every place where we send an
> article to a peer, we check for a pending write before marking the
> connection idle would fix this problem.  It at least appears to be a
> harmless fix.  I'm going to go ahead and make that fix and see if it
> helps.

I went back and took another comprehensive look at this, and as a result,
I applied the following patch.  Here's the commit message:

| In all of the response handlers that idle the connection if there's
| nothing left in the queue, don't idle if there are writes pending.  The
| cases where this could possibly trigger are obscure and involve the
| remote peer doing evil things, but the rest of the code handles it
| correctly and we were still seeing assertion failures, indicating that
| evil may be happening.
| 
| In issueStreamingCommands, make certain that there are no pending writes
| before idling the connection.
| 
| Add the code to ihaveBodyDone that was already in commandWriteDone to
| idle the connection if the queue is empty in case we'd had to defer the
| idle in the response handler due to an unfinished write.
| 
| Whenever doSomeWrites is called with writes still pending, add a work
| callback to do the write at the next opportunity.  This should eliminate
| a temporary connection deadlock state on flushing, where the response to
| the IHAVE body arrived before we finished writing it.  Before,
| doSomeWrites would have failed to call issueQUIT because writes were
| still pending, and then after the writes complete, there's no code to go
| back and issue it until the read timeout expires.

That took me about three hours, and I now understand innfeed a lot better.
Let's hope I don't recycle those brain cells very quickly.

Index: connection.c
===================================================================
--- connection.c	(revision 7347)
+++ connection.c	(working copy)
@@ -2222,10 +2222,16 @@
         cxnSleep (cxn) ;
     }
   else
-    /* The article has been sent, so start the response timer. */
-    initReadBlockedTimeout (cxn) ;
+    {
+      /* Some hosts return a response even before we're done sending, so don't
+         go idle until here. */
+      if (cxn->state == cxnFeedingS && cxn->articleQTotal == 0)
+        cxnIdle (cxn) ;
+      else
+        /* The command set has been sent, so start the response timer. */
+        initReadBlockedTimeout (cxn) ;
+    }
 
-
   freeBufferArray (b) ;
 
   return ;
@@ -2275,8 +2281,8 @@
     }
   else
     {
-      /* Some(?) hosts return the 439 response even before we're done
-         sending, so don't go idle until here */
+      /* Some hosts return a response even before we're done sending, so don't
+         go idle until here */
       if (cxn->state == cxnFeedingS && cxn->articleQTotal == 0)
         cxnIdle (cxn) ;
       else
@@ -2609,6 +2615,11 @@
  * take a long time. Instead the Connection just tries its next article on
  * tape or queue, and if that's no good then it registers this callback so
  * that other Connections have a chance of being serviced.
+ *
+ * This function is also put on the callback queue if we called doSomeWrites
+ * but there was already a write pending, mostly so that we don't deadlock the
+ * connection when we got a response to the body of an IHAVE command before we
+ * finished sending the body.
  */
 static void cxnWorkProc (EndPoint ep UNUSED, void *data)
 {
@@ -2833,7 +2844,7 @@
   else
     {
       remArtHolder (artHolder, &cxn->checkRespHead, &cxn->articleQTotal) ;
-      if (cxn->articleQTotal == 0)
+      if (cxn->articleQTotal == 0 && !writeIsPending (cxn->myEp))
         cxnIdle (cxn) ;
       hostArticleDeferred (cxn->myHost, cxn, artHolder->article) ;
       delArtHolder (artHolder) ;
@@ -2894,7 +2905,7 @@
       cxn->checksRefused++ ;
 
       remArtHolder (artHolder, &cxn->checkRespHead, &cxn->articleQTotal) ;
-      if (cxn->articleQTotal == 0)
+      if (cxn->articleQTotal == 0 && !writeIsPending (cxn->myEp))
         cxnIdle (cxn) ;
       hostArticleNotWanted (cxn->myHost, cxn, artHolder->article);
       delArtHolder (artHolder) ;
@@ -2955,7 +2966,7 @@
       cxn->takesSizeOkayed += artSize(artHolder->article);
 
       remArtHolder (artHolder, &cxn->takeRespHead, &cxn->articleQTotal) ;
-      if (cxn->articleQTotal == 0)
+      if (cxn->articleQTotal == 0 && !writeIsPending (cxn->myEp))
         cxnIdle (cxn) ;
       hostArticleAccepted (cxn->myHost, cxn, artHolder->article) ;
       delArtHolder (artHolder) ;
@@ -3049,7 +3060,7 @@
             {
               cxn->checksRefused++ ;
               remArtHolder (artHolder, &cxn->checkRespHead, &cxn->articleQTotal) ;
-              if (cxn->articleQTotal == 0)
+              if (cxn->articleQTotal == 0 && !writeIsPending (cxn->myEp))
                 cxnIdle (cxn) ;
               hostArticleNotWanted (cxn->myHost, cxn, artHolder->article);
               delArtHolder (artHolder) ;
@@ -3065,8 +3076,7 @@
       cxn->takesSizeRejected += artSize(artHolder->article);
 
       remArtHolder (artHolder, &cxn->takeRespHead, &cxn->articleQTotal) ;
-      /* Some(?) hosts return the 439 response even before we're done
-          sending */
+      /* Some hosts return the 439 response even before we're done sending */
       if (cxn->articleQTotal == 0 && !writeIsPending(cxn->myEp))
         cxnIdle (cxn) ;
       hostArticleRejected (cxn->myHost, cxn, artHolder->article) ;
@@ -3130,7 +3140,7 @@
       cxn->takesOkayed++ ;
       cxn->takesSizeOkayed += artSize(artHolder->article);
       
-      if (cxn->articleQTotal == 0)
+      if (cxn->articleQTotal == 0 && !writeIsPending (cxn->myEp))
         cxnIdle (cxn) ;
 
       hostArticleAccepted (cxn->myHost, cxn, artHolder->article) ;
@@ -3810,7 +3820,10 @@
 
   /* If there's a write pending we can't do anything now. */
   if ( writeIsPending (cxn->myEp) )
-    return ;
+    {
+      addWorkCallback (cxn->myEp,cxnWorkProc,cxn) ;
+      return ;
+    }
   else if ( writesNeeded (cxn) ) /* something on a queue. */
     {
       if (cxn->doesStreaming)
@@ -4050,7 +4063,7 @@
      was a big backlog of missing articles *and* we're running in
      no-CHECK mode, then the Host would be putting bad articles on the
      queue we're taking them off of. */
-  if (cxn->missing && cxn->articleQTotal == 0)
+  if (cxn->missing && cxn->articleQTotal == 0 && !writeIsPending (cxn->myEp))
     cxnIdle (cxn) ;
   for (p = cxn->missing ; p != NULL ; p = q)
     {

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.


More information about the inn-workers mailing list