One fairly common issue that most BizTalk designers face is debatching a message and then batching the results back together. Having faced this situation myself, I must say that I have not yet come up with a single best solution to this problem, though there are several approaches you could employ.
To explain the problem first: you have a set of files coming in, say, to a folder (the transport does not matter), and each file consists of a set of transactions. You need to process the transactions individually and then send them out again as one file.
The first instinct is to use a BizTalk XML or flat file disassembler to split the file, publish the individual messages to the MessageBox, and process them independently. The problem then becomes batching them back together.
Solution 1
If you have only one file coming in, say during an end-of-day (EOD) process, doing the split and then writing each message to a single output file with append mode turned on would do the trick most of the time.
But this is not a clean solution. If more than one file comes in almost simultaneously, it will not hold water, because once BizTalk has done the split it is no longer possible to trace a transaction back to its original source file.
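To make the failure concrete, here is a minimal Python sketch that simulates Solution 1 outside BizTalk. It assumes two hypothetical source files whose transactions carry no source identifier, splits them, and appends every message to one output file while the two files are processed at the same time (modeled as round-robin interleaving). The `<Batch>`/`<Transaction>` element names are illustrative only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from itertools import chain, zip_longest

# Two hypothetical source files; nothing in a transaction says where it came from.
SOURCE_A = "<Batch><Transaction>100</Transaction><Transaction>200</Transaction></Batch>"
SOURCE_B = "<Batch><Transaction>300</Transaction><Transaction>400</Transaction></Batch>"


def debatch(xml_text):
    """Split a batch into individual transactions (the disassembler's job)."""
    return [ET.tostring(t, encoding="unicode") for t in ET.fromstring(xml_text)]


def append_to_output(path, message):
    """Write one message with append mode on, like a File send port set to append."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(message + "\n")


def run(output_path, sources):
    split = [debatch(s) for s in sources]
    # Interleave messages round-robin to mimic files being processed at the same time.
    for message in chain.from_iterable(zip_longest(*split)):
        if message is not None:
            append_to_output(output_path, message)
    with open(output_path, encoding="utf-8") as f:
        return f.read().splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for line in run(os.path.join(d, "out.xml"), [SOURCE_A, SOURCE_B]):
            print(line)
```

With a single source file the output is correct; with two, the transactions 100, 300, 200 and 400 end up mixed in one file and nothing in them tells you which file each came from.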
Solution 2
One way around this is to promote a context property in the pipeline that identifies the original file. In the orchestration, use that property to decide which batch each message belongs to, create a unique GUID based on the context, set the SourceFileName property to it, and then use the append method as before, so that all transactions from the same source file end up in the same output file.
This is slightly more complicated, as writing pipeline components is not the easiest thing to do and you have to take care that your component is streaming. For why it needs to be streaming, read this.
Pros: This is really fast and great for content-based routing (CBR), since it uses streaming XPath.
Cons: All or nothing. If something fails in the pipeline or map, the entire message will be lost.
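A minimal Python sketch of the Solution 2 flow, assuming a plain dict stands in for the BizTalk message context and `BatchId` is a hypothetical name for the promoted property. `xml.etree.ElementTree.iterparse` plays the role of the streaming pipeline component: it emits one transaction at a time instead of loading the whole file first. The send step uses the GUID as the file name and appends, so concurrently processed files no longer mix.

```python
import io
import os
import tempfile
import uuid
import xml.etree.ElementTree as ET
from itertools import chain, zip_longest


def disassemble(stream, received_file_name):
    """Streaming splitter: yield (context, body) one transaction at a time.

    The dict stands in for the message context; "BatchId" is a hypothetical
    name for the promoted property.
    """
    batch_id = str(uuid.uuid4())  # one unique value per incoming file
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "Transaction":
            context = {"ReceivedFileName": received_file_name, "BatchId": batch_id}
            yield context, ET.tostring(elem, encoding="unicode")
            elem.clear()  # drop the emitted transaction's contents


def send(output_dir, context, body):
    """Orchestration + send port: the batch GUID becomes the output file name."""
    path = os.path.join(output_dir, context["BatchId"] + ".xml")
    with open(path, "a", encoding="utf-8") as f:  # append mode on
        f.write(body + "\n")


def run(output_dir, sources):
    streams = [disassemble(io.BytesIO(data), name) for name, data in sources]
    # Interleave messages round-robin to mimic files arriving at the same time.
    for item in chain.from_iterable(zip_longest(*streams)):
        if item is not None:
            send(output_dir, *item)
    result = {}
    for name in sorted(os.listdir(output_dir)):
        with open(os.path.join(output_dir, name), encoding="utf-8") as f:
            result[name] = f.read().splitlines()
    return result


SOURCES = [
    ("a.xml", b"<Batch><Transaction>100</Transaction><Transaction>200</Transaction></Batch>"),
    ("b.xml", b"<Batch><Transaction>300</Transaction><Transaction>400</Transaction></Batch>"),
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, lines in run(d, SOURCES).items():
            print(name, lines)
```

Even though the messages are interleaved, each output file holds exactly one source file's transactions, in order. In real BizTalk this would be a .NET pipeline component writing to the message context; the sketch only shows the data flow.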
Solution 3
Another solution is not to split the message with the document/envelope schemas at all, but to do the split with XPath inside the orchestration. This approach uses XPath to count the message elements and then uses XPath again to loop over them. To see a sample and a write-up of this implementation, try this.
Pros: Excellent flexibility inside the Orchestration. Great control of the processing of your document! This process is sequential and ordered by default. You can loop over anything you can XPath and can easily (this is always a relative term) build mini-batches if needed. This is something Receive Pipelines are not able to do without extensive custom code.
Cons: Performance degrades quickly as message size increases. Downright slow on large messages and a resource hog. In some cases, the sequential and ordered processing of this type of debatching may be limiting to the process.
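A minimal Python sketch of the count-then-loop pattern, using ElementTree's limited XPath support as a stand-in for the orchestration's `xpath()` function. The orchestration expressions in the comments are approximate, the element names are illustrative, and `process` is a hypothetical placeholder for the per-transaction work. The `mini_batch_size` parameter shows the mini-batching mentioned in the pros.

```python
import xml.etree.ElementTree as ET

BATCH = ("<Batch><Transaction>100</Transaction><Transaction>200</Transaction>"
         "<Transaction>300</Transaction></Batch>")


def process(transaction):
    """Hypothetical per-transaction work: double the amount."""
    return int(transaction.text) * 2


def xpath_debatch(xml_text, mini_batch_size=2):
    """Count with one XPath-style query, then fetch each transaction by position."""
    root = ET.fromstring(xml_text)  # the whole message is held in memory
    # Roughly the orchestration's xpath(msg, "count(/Batch/Transaction)")
    count = len(root.findall("Transaction"))
    batches, current = [], []
    for i in range(1, count + 1):  # XPath positions start at 1
        # Roughly xpath(msg, "/Batch/Transaction[" + i + "]")
        transaction = root.find(f"Transaction[{i}]")
        current.append(process(transaction))
        if len(current) == mini_batch_size:  # close a mini-batch
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(xpath_debatch(BATCH))  # [[200, 400], [600]]
```

Processing is sequential and ordered by construction. It also hints at the cons: the whole document stays in memory, and each positional lookup rescans the siblings, so the cost grows quickly with message size.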
Solution 4
The issue with Solution 3 above is performance on large messages; read the story of BizTalk and large files from Lee.
The solution we tried for our issue was a two-stage orchestration. The first orchestration contains a map that adds a unique value (a GUID) in an additional node for every instance of the file, effectively creating an intermediate schema.
The result is then fed to a disassembler, but now each split message retains information about its parent through the GUID. Finally, each message is written out with the GUID value as the file name and append mode on.
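A minimal Python sketch of the two-stage approach, assuming a hypothetical `BatchId` element is what the intermediate schema adds to each transaction. Unlike Solution 2, the GUID travels in the message body rather than in the context, so it survives the split without a custom pipeline component. The element names and the round-robin interleaving are illustrative only.

```python
import os
import tempfile
import uuid
import xml.etree.ElementTree as ET
from itertools import chain, zip_longest

SOURCE_A = ("<Batch><Transaction><Amount>100</Amount></Transaction>"
            "<Transaction><Amount>200</Amount></Transaction></Batch>")
SOURCE_B = ("<Batch><Transaction><Amount>300</Amount></Transaction>"
            "<Transaction><Amount>400</Amount></Transaction></Batch>")


def stage1_map(xml_text):
    """First stage: stamp every transaction with one GUID per incoming file.

    "BatchId" is a hypothetical element added by the intermediate schema.
    """
    root = ET.fromstring(xml_text)
    batch_id = str(uuid.uuid4())
    for transaction in root.findall("Transaction"):
        ET.SubElement(transaction, "BatchId").text = batch_id
    return ET.tostring(root, encoding="unicode")


def stage2_disassemble(xml_text):
    """Second stage: split; each message still carries its parent's GUID."""
    return [ET.tostring(t, encoding="unicode") for t in ET.fromstring(xml_text)]


def write_out(output_dir, message):
    """Use the GUID from the message body as the file name, with append mode on."""
    batch_id = ET.fromstring(message).findtext("BatchId")
    with open(os.path.join(output_dir, batch_id + ".xml"), "a", encoding="utf-8") as f:
        f.write(message + "\n")


def run(output_dir, sources):
    split = [stage2_disassemble(stage1_map(s)) for s in sources]
    # Interleave messages round-robin to mimic files processed at the same time.
    for message in chain.from_iterable(zip_longest(*split)):
        if message is not None:
            write_out(output_dir, message)
    result = {}
    for name in sorted(os.listdir(output_dir)):
        with open(os.path.join(output_dir, name), encoding="utf-8") as f:
            result[name] = [ET.fromstring(line).findtext("Amount")
                            for line in f.read().splitlines()]
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run(d, [SOURCE_A, SOURCE_B]))
```

Each output file is named after one source file's GUID and contains only that file's amounts.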
Solution 5
Send your output to SQL instead, promoting business fields that help you identify the batch, and then use a SQL receive port that polls at periodic intervals to aggregate the results with some intelligent querying.
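A minimal Python sketch of the staging-table idea, with in-memory SQLite standing in for SQL Server. The table, its columns, and the completeness rule (a batch is ready once its row count reaches an `expected_count` carried with each transaction) are all assumptions; the post only says "intelligent querying", and your business fields will differ.

```python
import sqlite3

# Hypothetical staging table; in-memory SQLite stands in for SQL Server here.
SCHEMA = """
CREATE TABLE TransactionStaging (
    batch_id TEXT NOT NULL,           -- business field identifying the batch
    seq INTEGER NOT NULL,             -- position of the transaction in its batch
    expected_count INTEGER NOT NULL,  -- how many transactions the batch should have
    payload TEXT NOT NULL
)
"""


def stage(conn, batch_id, seq, expected_count, payload):
    """Send side: each processed transaction lands in the staging table."""
    conn.execute("INSERT INTO TransactionStaging VALUES (?, ?, ?, ?)",
                 (batch_id, seq, expected_count, payload))
    conn.commit()


def poll(conn):
    """Receive side, run periodically: aggregate only batches that are complete."""
    complete = [row[0] for row in conn.execute(
        "SELECT batch_id FROM TransactionStaging GROUP BY batch_id "
        "HAVING COUNT(*) = MAX(expected_count) ORDER BY batch_id")]
    batches = {}
    for batch_id in complete:
        rows = conn.execute(
            "SELECT payload FROM TransactionStaging WHERE batch_id = ? ORDER BY seq",
            (batch_id,)).fetchall()
        batches[batch_id] = [payload for (payload,) in rows]
        # Remove the rows so the next poll does not pick the batch up again.
        conn.execute("DELETE FROM TransactionStaging WHERE batch_id = ?", (batch_id,))
    conn.commit()
    return batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    stage(conn, "A", 2, 2, "a2")
    stage(conn, "B", 1, 2, "b1")
    stage(conn, "A", 1, 2, "a1")
    print(poll(conn))  # {'A': ['a1', 'a2']}; batch B is still incomplete
    stage(conn, "B", 2, 2, "b2")
    print(poll(conn))  # {'B': ['b1', 'b2']}
```

Because the poll only releases complete batches and orders them by `seq`, transactions may arrive in any order and from any number of concurrent files.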
So what is the best solution? It depends. As I said before, there is no clean solution or magic bullet; it really depends on the app you are trying to build.
But as a rule of thumb, for small messages you can get by with XPath, and for large messages you should go for pipeline-based splitting.
I welcome comments and opinions.
Ciao