I’ve been normalizing a table with 98 million rows for a couple of days now, and a recent change lets me run my batches in groups of 100,000 instead of just 10,000. The program doing the normalization pulls down 100,000 records at a time, analyzes and groups them, and performs an UPDATE for each group, passing that group’s primary keys back to the server in an IN () clause. Everything was fine for a while, but after the jump to 100,000, with some groupings sending thousands of keys in the IN clause, I started to get the following error:
Internal Query Processor Error: The query processor ran out of stack space during query optimization.
Microsoft says to insert the keys into a temp table and join against it, but that doesn’t really work for me. Instead I decided to break the UPDATE query into several queries with more manageable IN clauses.
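For reference, the temp-table approach Microsoft suggests would look roughly like this. This is only a sketch, assuming `con` is an open SqlConnection, `TheList` holds integer keys, and the table/column names match the examples below; SqlBulkCopy is used here to load the keys, which works for a session-scoped #temp table as long as the same connection is used throughout:

```vb
' Sketch: load the keys into a temp table, then UPDATE via a join
' instead of a huge IN () list. Table/column names are illustrative.
Using cmd As New SqlCommand("CREATE TABLE #Keys (Col1 INT PRIMARY KEY)", con)
    cmd.ExecuteNonQuery()
End Using

' Stage the keys in a DataTable for bulk copy.
Dim keys As New DataTable()
keys.Columns.Add("Col1", GetType(Integer))
For Each k In TheList
    keys.Rows.Add(k)
Next

' Bulk-load the keys into the temp table on the same connection.
Using bulk As New SqlBulkCopy(con)
    bulk.DestinationTableName = "#Keys"
    bulk.WriteToServer(keys)
End Using

' One set-based UPDATE joined against the keys, no IN () needed.
Using cmd As New SqlCommand(
        "UPDATE T SET T.Col2=xyz FROM Tbl T JOIN #Keys K ON T.Col1 = K.Col1", con)
    cmd.ExecuteNonQuery()
End Using
```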
Originally I had something like:
Dim mySQL = String.Format("UPDATE Tbl SET Col2=xyz WHERE Col1 IN ({0})", Join(TheList.ToArray(), ","))
Using Com As New SqlCommand(mySQL, con)
    Com.CommandType = CommandType.Text
    Com.ExecuteNonQuery()
End Using
The new code breaks things into more manageable sizes:
Dim AmountToProcess = 1000
Do While TheList.Count > 0
    ' Grab up to AmountToProcess keys; Take never over-reads.
    Dim Y = TheList.Take(AmountToProcess)
    Dim mySQL = String.Format("UPDATE Tbl SET Col2=xyz WHERE Col1 IN ({0})", Join(Y.ToArray(), ","))
    Using Com As New SqlCommand(mySQL, con)
        Com.CommandType = CommandType.Text
        Com.ExecuteNonQuery()
    End Using
    ' Drop the keys we just processed and repeat until none remain.
    TheList.RemoveRange(0, Math.Min(TheList.Count, AmountToProcess))
Loop
The new code uses the wonderful Take extension method to grab 1,000 items at a time. Take conveniently doesn’t error if you ask for more items than are available. The smaller UPDATE query runs, and then we remove the first 1,000 items (or however many are left) from the list. Repeat until the list is empty.
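That forgiving behavior of Take is easy to see in isolation. A quick sketch, using a small throwaway list purely for illustration:

```vb
Dim nums As New List(Of Integer) From {1, 2, 3}
' Asking for more items than exist is safe: Take simply stops early
' rather than throwing, so grabbed ends up holding just 1, 2, 3.
Dim grabbed = nums.Take(5).ToList()
```

This is why the loop above needs no special handling for the final, short batch.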