Monday, November 26, 2012

List Item Performance - Updating Large Lists

copyright from: e-junkie-chronicles.blogspot.com

As part of my series on performance, I thought I'd look at updating large (20,000 item) lists in bulk, and what are the fastest ways to do this.

I've gone through a few different ways of doing this, finding out where the slowness is coming from, and I present the results of the analysis in this blog. Hopefully someone will take away some value from this!



The Control Test

First of all, I decided to use the worst possible method as a control for the experiement: iterating over SPList.Items. Microsoft have stated quite clearly that one should avoid doing this whereever possible, and I've found out myself that isn't not even advisable to pass in an empty SPQuery to GetItems.

Here's the code I used to update the items with this method:


using System;
using System.Linq;
using Microsoft.SharePoint;

namespace ItemUpdatePerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            SPSite site = new SPSite("http://demolab-sps2010/");
            SPWeb web = site.RootWeb;

            SPList list = web.Lists["PerfTest"];

            string startTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Starting process... - " + startTime);

            foreach (SPListItem item in list.Items)
            {
                Console.WriteLine("Updating item {0}...", item.ID);
                item["Title"] = item["Title"].ToString() + "_1";
                item.Update();
            }

            string endTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Finished process...\n Time taken: {0} - {1}", startTime, endTime);

            Console.ReadLine();
        }
    }
}


And here's a screenshot of the result from this test:

As we can see, 3 minutes 13 seconds for our control, using the "worst possible method" for both accessing and updating the list. One thing to note about this method: the delay between the output "Starting Process..." and the first "Updating Item..." was something in the order of 15 seconds. Imagine a page load delay of that long, even if you're not iterating over all the items in SPList.Items, but just setting a variable to that property! If you simply access SPList.Items, a trip is made to the content database to evaluate all the items in your query, so that when you do eventually come to iterate over the SPListItemCollection, it is somewhat more performant. Next up, the main hope is to reduce this delay, as well as trying to avoid updating each item one at a time.

Experiement 1: Batch Updating

For the next few tests, I decided to look at the MSDN article about Batch Updating list items, using the ProcessBatchCommand method of SPWeb.

I took the code samples on their site and structured it into my solution, here's how it looks:

using System;
using System.Linq;
using Microsoft.SharePoint;
using System.Text;
using Microsoft.SharePoint.Utilities;

namespace BulkUpdate
{
    class Program
    {
        static void Main(string[] args)
        {

            SPSite site = new SPSite("http://demolab-sps2010/");
            SPWeb web = site.RootWeb;

            SPList list = web.Lists["PerfTest"];

            string startTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Starting process... - " + startTime);


            StringBuilder methodBuilder = new StringBuilder();
            string batch = string.Empty;
            DateTime currentDate = DateTime.Now;
            string formattedDate = SPUtility.CreateISO8601DateTimeFromSystemDateTime(currentDate);

            string batchFormat = "" +
              "{0}";

            string methodFormat = "" +
             "{1}" +
             "Save" +
             "{2}" +
             "{3}" +
             "";

            SPQuery query = new SPQuery();
            query.RowLimit = 200;

            int i = 0;

            do
            {
                SPListItemCollection items = list.GetItems(query);

                methodBuilder.Remove(0, methodBuilder.ToString().Length);

                foreach (SPListItem item in items)
                {
                    methodBuilder.AppendFormat(methodFormat, item.ID, list.ID.ToString(), item.ID, item["Title"] + "_4");
                }

                batch = string.Format(batchFormat, methodBuilder.ToString());

                Console.WriteLine("Sending batch {0} for processing...", i);
                web.ProcessBatchData(batch);
                i++;
                query.ListItemCollectionPosition = items.ListItemCollectionPosition;

            } while (query.ListItemCollectionPosition != null);


            string endTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Finished process...\n Time taken: {0} - {1}", startTime, endTime);

            Console.ReadLine();
        }
    }
}



And important line to look out for here is the query.RowLimit = 200; line. I change this for the next couple of experiments, to see how the size of a page I'm getting from the list varies with each update.

Here's the screenshot from the first run, with 200 items:

Here: 4 minutes, 39 seconds. Oh dear, even slower! Let's try a bigger batch. This time, 500 items at a time.


This time, 3 minutes 32 seconds! Much better, bit still slower than the forbidden way!

What happens with smaller batches? 50 items per batch this time:


Uhm, yeah. 10 minutes 53 seconds for that one. Scratch that idea.

So it seems bigger batches are going in the right direction. Let's try the default query throttling limit of 2000 items.


Whoa there, Betsy. 2 minutes 51 seconds! That's a record for us!

Experiment 3: Data Tables

Lastly, I want to see if I can speed up the "getting" of all the items in my list. I want to see if the GetDataTable() method of SPListItemCollection will get me the items any faster than just the Items property. At least, I want to see if I can iterate over the rows in a DataTable faster than I can the Items in the collection. I'm going to again use the Batch method with 2000 rows per time, as this got the best results yet.

Here's the code that uses GetDataTable()

using System;
using System.Linq;
using Microsoft.SharePoint;
using System.Text;
using Microsoft.SharePoint.Utilities;
using System.Data;

namespace BulkUpdate
{
    class Program
    {
        static void Main(string[] args)
        {

            SPSite site = new SPSite("http://demolab-sps2010/");
            SPWeb web = site.RootWeb;

            SPList list = web.Lists["PerfTest"];

            string startTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Starting process... - " + startTime);


            StringBuilder methodBuilder = new StringBuilder();
            string batch = string.Empty;
            DateTime currentDate = DateTime.Now;
            string formattedDate = SPUtility.CreateISO8601DateTimeFromSystemDateTime(currentDate);

            string batchFormat = "" +
              "{0}";

            string methodFormat = "" +
             "{1}" +
             "Save" +
             "{2}" +
             "{3}" +
             "";

            SPQuery query = new SPQuery();
            query.RowLimit = 2000;

            int i = 0;

            do
            {
                SPListItemCollection items = list.GetItems(query);
                DataTable table = list.Items.GetDataTable();

                methodBuilder.Remove(0, methodBuilder.ToString().Length);

                foreach (DataRow row in table.Rows)
                {
                    string moo = row["Title"].ToString();
                    methodBuilder.AppendFormat(methodFormat, row["ID"], list.ID.ToString(), row["ID"], row["Title"] + "_6");
                }


                batch = string.Format(batchFormat, methodBuilder.ToString());

                Console.WriteLine("Sending batch {0} for processing...", i);
                web.ProcessBatchData(batch);
                i++;
                query.ListItemCollectionPosition = items.ListItemCollectionPosition;

            } while (query.ListItemCollectionPosition != null);


            string endTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Finished process...\n Time taken: {0} - {1}", startTime, endTime);

            Console.ReadLine();
        }
    }
}


And here's the results:

4 minutes to send the first batch... and there's 10 of them (2000 items for a 20,000 item list!). Screw that.


I had also been trying, as a final test to see if REST would be any quicker.

Unforunately REST has a couple of major limitations for large bulk operations. The first is that the "GetItems" equivalent only gets 1000 items at a time. The second is that the BulkUpdate command can only handle 100 items at a time. Not really adequate. If someone has been able to successfully do large operations with the REST API, please let me know how you get on! :)

Finally...

So, in summary, here's the table of results from these tests. If you have larger batches, Iterating over the ListItemCollections takes longer but the data is written to the server in one go, if you have smaller batches, Iterating over the ListItemCollections is indeed quicker, but there is the delay whilst many batches are written.

MethodTime
SPList.Items3:13
Batches of 200 items4:39
Batches of 500 items3:32
Batches of 50 items10:53
Batches of 2000 items2:51
Batches of 2000, Iterating over DataTable> 30 mins

Be sure to have a look at my other two posts on performance with Item-Level Permissions too (here and here), if you found this interesting/helpful.

No comments: