Wednesday, April 27, 2005

BizTalk Zombies and more..

If you have received the error "Completed with discarded messages" from BizTalk and are wondering what it means, you may be seeing what the BizTalk world calls a zombie. A zombie is usually a valid message or response that is left without a subscription to process it at that point in time, though this one-line definition is a crude explanation. If you are dealing with scenarios such as sequential convoys and batching, you are likely to see zombies, especially when you have a large volume of transactions and your system is operating close to high load (that is when I have seen them the most, though zombies can also occur under little or no load).


If you really want to learn more about zombies, read this blog from the BizTalk Core Engine team.


Now, if your solution involves a while loop surrounding a Listen shape, with one branch containing a Receive and the other containing a Delay shape followed by an Expression shape that sets the loop counter and controls loop termination, you will more than likely see this scenario, because the delay can fire and the message can be delivered afterwards.

The reason for the delay and timeout is that we don't want to wait for a message forever, so we time out after an interval (Tn). But there is no deterministic way to choose Tn, and it is usually set to a small number.
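To make the race concrete, here is a toy model in Python. It is not BizTalk code and does not reflect how the engine routes messages; the function name run_convoy and the logical clock are my own assumptions for illustration. It treats every listen as "take the next message if it arrives within Tn, otherwise let the delay branch end the loop", and counts anything that arrives after that as a discarded message.

```python
def run_convoy(arrival_times, timeout):
    """Toy model of a while loop around a listen (receive vs. delay).

    arrival_times: logical times at which messages reach the instance.
    timeout: the delay interval Tn used by the delay branch.
    Returns (consumed, zombies). Hypothetical helper, not a BizTalk API.
    """
    consumed, zombies = [], []
    clock = 0.0  # logical time at which the current listen starts
    pending = sorted(arrival_times)
    for i, t in enumerate(pending):
        if t <= clock + timeout:
            # Receive branch wins; the message may already be waiting.
            clock = max(clock, t)
            consumed.append(t)
        else:
            # Delay branch wins and the loop ends; later messages
            # no longer have a receive waiting for them.
            zombies = pending[i:]
            break
    return consumed, zombies


if __name__ == "__main__":
    arrivals = [0, 2, 5, 30, 31]
    print(run_convoy(arrivals, timeout=10))   # short Tn
    print(run_convoy(arrivals, timeout=120))  # longer Tn
```

With a Tn of 10 the messages at 30 and 31 are left behind; with a Tn of 120 all five are consumed. The model is simplified on purpose: it treats every late message as a zombie, which is a cruder rule than the real engine applies.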


For an app that we had, increasing the delay from 10 seconds to 2 minutes (a value chosen at random) did away with zombie occurrences. I would also suggest decreasing the default retry interval of 5 minutes to a smaller value if you are in an aggregation scenario where you are collecting one message at a time. 2 minutes looks like a long interval to wait for one message, but keep in mind that this delay does not add up across iterations and is not incurred on every listen. Most often the message is already there to be consumed, and the delay happens ONLY when there is no message to consume. Even on a stressed BizTalk box, our scenario, which is an end-of-day (EOD) movement, gave us the liberty of waiting an extra couple of minutes.
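The point that the delay does not add up can be checked with a small Python sketch. Again, this is a toy model with a made-up function name (total_wait) and a logical clock, not anything from BizTalk. It sums only the time the loop actually spends blocked in a listen, assuming the loop counter ends the loop once all messages are in.

```python
def total_wait(arrival_times, timeout):
    """Time spent blocked in listens, in the same units as the inputs.

    Hypothetical helper for illustration. A listen costs nothing when the
    message is already there, and at most `timeout` when none arrives.
    """
    waited = 0.0
    clock = 0.0  # logical time at which the current listen starts
    for t in sorted(arrival_times):
        if t > clock + timeout:
            # Delay branch fires: we paid the full timeout once, then exit.
            waited += timeout
            break
        # Receive branch: wait only for the gap until the message arrives.
        waited += max(0.0, t - clock)
        clock = max(clock, t)
    return waited


if __name__ == "__main__":
    arrivals = [0, 2, 5, 30, 31]
    print(total_wait(arrivals, timeout=120))      # actual time blocked
    print(len(arrivals) * 120)                    # naive "120 per listen"
    print(total_wait([0, 0, 0, 0], timeout=120))  # everything already queued
```

For these arrivals the loop is blocked for 31 time units in total, far less than the 600 you would get by charging 120 to each of the five listens, and a fully queued batch costs no waiting at all.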



Another solution that works is to add, once the main loop is over, a second loop that follows the same correlation set and drains these late messages for a certain amount of time (DRAINING). You can do this before or after you send out the final message. In the latter case you can send the drained messages as a second message or log them as errors.
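The draining idea can be sketched the same way. This Python snippet is only an illustration under my own assumptions (the name drain, a fixed drain window, logical arrival times); in a real orchestration this would be a second loop with a receive following the same correlation set.

```python
def drain(late_arrivals, loop_exit_time, drain_window):
    """Collect messages that show up after the main loop has ended.

    Hypothetical helper for illustration. Messages arriving within
    drain_window of loop_exit_time are drained; anything later is
    still missed.
    """
    deadline = loop_exit_time + drain_window
    drained = [t for t in late_arrivals if loop_exit_time < t <= deadline]
    missed = [t for t in late_arrivals if t > deadline]
    return drained, missed


if __name__ == "__main__":
    drained, missed = drain([30, 31, 200], loop_exit_time=15, drain_window=60)
    # Drained messages could go out as a second message or be logged as errors.
    print("drained:", drained)
    print("still missed:", missed)
```

Here the messages at 30 and 31 are picked up by the drain, while the one at 200 arrives after the window closes, so draining narrows the gap without closing it completely.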

A WMI event is also generated for a zombie occurrence; I haven't come across the details of it yet.
