Monday, November 26, 2012

List Item Performance - Updating Large Lists

copyright from: e-junkie-chronicles.blogspot.com

As part of my series on performance, I thought I'd look at updating large (20,000 item) lists in bulk, to find out what the fastest ways of doing this are.

I've gone through a few different ways of doing this, finding out where the slowness is coming from, and I present the results of the analysis in this blog. Hopefully someone will take away some value from this!



The Control Test

First of all, I decided to use the worst possible method as a control for the experiment: iterating over SPList.Items. Microsoft have stated quite clearly that one should avoid doing this wherever possible, and I've found out myself that it isn't even advisable to pass an empty SPQuery to GetItems.

Here's the code I used to update the items with this method:


using System;
using System.Linq;
using Microsoft.SharePoint;

namespace ItemUpdatePerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            SPSite site = new SPSite("http://demolab-sps2010/");
            SPWeb web = site.RootWeb;

            SPList list = web.Lists["PerfTest"];

            string startTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Starting process... - " + startTime);

            foreach (SPListItem item in list.Items)
            {
                Console.WriteLine("Updating item {0}...", item.ID);
                item["Title"] = item["Title"].ToString() + "_1";
                item.Update();
            }

            string endTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Finished process...\n Time taken: {0} - {1}", startTime, endTime);

            Console.ReadLine();
        }
    }
}


And here's a screenshot of the result from this test:

As we can see: 3 minutes 13 seconds for our control, using the "worst possible method" for both accessing and updating the list. One thing to note about this method: the delay between the output "Starting process..." and the first "Updating item..." was in the order of 15 seconds. Imagine a page load delay that long, even if you're not iterating over all the items in SPList.Items but merely assigning that property to a variable! Simply accessing SPList.Items makes a trip to the content database to evaluate every item in the list, which is why the subsequent iteration over the SPListItemCollection is then comparatively quick. Next up, the main hope is to reduce this delay, as well as to avoid updating each item one at a time.
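Incidentally, the timings in these tests come from comparing two formatted clock strings. For anyone reproducing them, System.Diagnostics.Stopwatch measures the elapsed interval directly and makes a more reliable harness; a minimal sketch (plain .NET, no SharePoint dependency — the Thread.Sleep simply stands in for the list work):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class StopwatchDemo
{
    static void Main()
    {
        // Stopwatch measures elapsed time directly, rather than
        // subtracting two formatted wall-clock strings.
        Stopwatch timer = Stopwatch.StartNew();

        Thread.Sleep(100); // stand-in for the actual list update work

        timer.Stop();
        Console.WriteLine("Time taken: {0} ms", timer.ElapsedMilliseconds);
    }
}
```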

Experiment 1: Batch Updating

For the next few tests, I decided to look at the MSDN article about batch updating list items, using the ProcessBatchData method of SPWeb.

I took the code samples from their site and structured them into my solution; here's how it looks:

using System;
using System.Linq;
using Microsoft.SharePoint;
using System.Text;
using Microsoft.SharePoint.Utilities;

namespace BulkUpdate
{
    class Program
    {
        static void Main(string[] args)
        {

            SPSite site = new SPSite("http://demolab-sps2010/");
            SPWeb web = site.RootWeb;

            SPList list = web.Lists["PerfTest"];

            string startTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Starting process... - " + startTime);


            StringBuilder methodBuilder = new StringBuilder();
            string batch = string.Empty;
            DateTime currentDate = DateTime.Now;
            string formattedDate = SPUtility.CreateISO8601DateTimeFromSystemDateTime(currentDate);

            string batchFormat =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
                "<ows:Batch OnError=\"Continue\">{0}</ows:Batch>";

            string methodFormat =
                "<Method ID=\"{0}\">" +
                "<SetList Scope=\"Request\">{1}</SetList>" +
                "<SetVar Name=\"Cmd\">Save</SetVar>" +
                "<SetVar Name=\"ID\">{2}</SetVar>" +
                "<SetVar Name=\"urn:schemas-microsoft-com:office:office#Title\">{3}</SetVar>" +
                "</Method>";

            SPQuery query = new SPQuery();
            query.RowLimit = 200;

            int i = 0;

            do
            {
                SPListItemCollection items = list.GetItems(query);

                methodBuilder.Length = 0;

                foreach (SPListItem item in items)
                {
                    methodBuilder.AppendFormat(methodFormat, item.ID, list.ID.ToString(), item.ID, item["Title"] + "_4");
                }

                batch = string.Format(batchFormat, methodBuilder.ToString());

                Console.WriteLine("Sending batch {0} for processing...", i);
                web.ProcessBatchData(batch);
                i++;
                query.ListItemCollectionPosition = items.ListItemCollectionPosition;

            } while (query.ListItemCollectionPosition != null);


            string endTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Finished process...\n Time taken: {0} - {1}", startTime, endTime);

            Console.ReadLine();
        }
    }
}
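One caveat with this batch approach: the field values are injected straight into the batch XML by string formatting, so a Title containing a character like & or < would produce an invalid batch. SecurityElement.Escape (in the System.Security namespace) handles this; a small standalone sketch, using a made-up title just for illustration:

```csharp
using System;
using System.Security;

class EscapeDemo
{
    static void Main()
    {
        // Reserved XML characters in field values must be escaped
        // before being formatted into the ows:Batch payload.
        string rawTitle = "Bits & Pieces <draft>";
        string safeTitle = SecurityElement.Escape(rawTitle);

        Console.WriteLine(safeTitle);
        // Bits &amp; Pieces &lt;draft&gt;
    }
}
```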



An important line to look out for here is query.RowLimit = 200;. I change this for the next couple of experiments, to see how the page size affects the overall update time.

Here's the screenshot from the first run, with 200 items:

Here: 4 minutes, 39 seconds. Oh dear, even slower! Let's try a bigger batch. This time, 500 items at a time.


This time, 3 minutes 32 seconds! Much better, but still slower than the forbidden way!

What happens with smaller batches? 50 items per batch this time:


Uhm, yeah. 10 minutes 53 seconds for that one. Scratch that idea.

So it seems bigger batches are going in the right direction. Let's try 2000 items at a time.


Whoa there, Betsy. 2 minutes 51 seconds! That's a record for us!

Experiment 2: Data Tables

Lastly, I want to see if I can speed up the "getting" of all the items in my list. I want to see if the GetDataTable() method of SPListItemCollection will get me the items any faster than the Items property alone; at the least, I want to see if I can iterate over the rows of a DataTable faster than over the items in the collection. I'm going to use the batch method again with 2000 rows at a time, as this got the best results yet.

Here's the code that uses GetDataTable():

using System;
using System.Linq;
using Microsoft.SharePoint;
using System.Text;
using Microsoft.SharePoint.Utilities;
using System.Data;

namespace BulkUpdate
{
    class Program
    {
        static void Main(string[] args)
        {

            SPSite site = new SPSite("http://demolab-sps2010/");
            SPWeb web = site.RootWeb;

            SPList list = web.Lists["PerfTest"];

            string startTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Starting process... - " + startTime);


            StringBuilder methodBuilder = new StringBuilder();
            string batch = string.Empty;
            DateTime currentDate = DateTime.Now;
            string formattedDate = SPUtility.CreateISO8601DateTimeFromSystemDateTime(currentDate);

            string batchFormat =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
                "<ows:Batch OnError=\"Continue\">{0}</ows:Batch>";

            string methodFormat =
                "<Method ID=\"{0}\">" +
                "<SetList Scope=\"Request\">{1}</SetList>" +
                "<SetVar Name=\"Cmd\">Save</SetVar>" +
                "<SetVar Name=\"ID\">{2}</SetVar>" +
                "<SetVar Name=\"urn:schemas-microsoft-com:office:office#Title\">{3}</SetVar>" +
                "</Method>";

            SPQuery query = new SPQuery();
            query.RowLimit = 2000;

            int i = 0;

            do
            {
                SPListItemCollection items = list.GetItems(query);
                // Note: this calls GetDataTable() on SPList.Items, not on the
                // paged "items" collection, so every pass pulls the whole list.
                DataTable table = list.Items.GetDataTable();

                methodBuilder.Length = 0;

                foreach (DataRow row in table.Rows)
                {
                    methodBuilder.AppendFormat(methodFormat, row["ID"], list.ID.ToString(), row["ID"], row["Title"] + "_6");
                }


                batch = string.Format(batchFormat, methodBuilder.ToString());

                Console.WriteLine("Sending batch {0} for processing...", i);
                web.ProcessBatchData(batch);
                i++;
                query.ListItemCollectionPosition = items.ListItemCollectionPosition;

            } while (query.ListItemCollectionPosition != null);


            string endTime = DateTime.Now.ToLongTimeString();
            Console.WriteLine("Finished process...\n Time taken: {0} - {1}", startTime, endTime);

            Console.ReadLine();
        }
    }
}


And here's the results:

4 minutes to send the first batch... and there are 10 of them (batches of 2000 for a 20,000-item list!). Screw that.


I had also been trying, as a final test, to see if REST would be any quicker.

Unfortunately, REST has a couple of major limitations for large bulk operations. The first is that the "GetItems" equivalent only returns 1000 items at a time. The second is that the BulkUpdate command can only handle 100 items at a time. Not really adequate. If someone has been able to successfully do large operations with the REST API, please let me know how you get on! :)

Finally...

So, in summary, here's the table of results from these tests. With larger batches, iterating over the SPListItemCollection takes longer, but the data is written to the server in fewer round trips; with smaller batches, each iteration is quicker, but there is a delay while the many batches are written.

Method                                      Time
SPList.Items                                3:13
Batches of 200 items                        4:39
Batches of 500 items                        3:32
Batches of 50 items                         10:53
Batches of 2000 items                       2:51
Batches of 2000, iterating over DataTable   > 30 mins

Be sure to have a look at my other two posts on performance with Item-Level Permissions too (here and here), if you found this interesting/helpful.

SharePoint 2010 List Throttling

copyright from: adicodes.com

Introduction

List throttling is a mechanism that lets you configure a SharePoint list so that no single operation returns too many results at once.
List throttling is one of the improvements in SharePoint 2010. In this article we will look at the various options we have for avoiding list throttling exceptions.

List Throttling

The default list throttling limit (the List View Threshold) in SharePoint 2010 is 5000 items.
To modify the default setting, go to
Central Administration > Manage Web Applications > General Settings > Resource Throttling > List View Threshold
Changing this option affects the whole web application, which is usually not the right approach. So, we will look at the various options we have for controlling throttling from code.
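For completeness, the same threshold can also be changed per web application programmatically, via the MaxItemsPerThrottledOperation property of SPWebApplication. A sketch, assuming it runs under a farm administrator account and reusing the demo URL from earlier (your web application URL will differ):

```csharp
using System;
using Microsoft.SharePoint.Administration;

class ThresholdDemo
{
    static void Main()
    {
        // Look up the web application and raise its List View Threshold.
        // This affects every list in the web application, so use with care.
        SPWebApplication webApp =
            SPWebApplication.Lookup(new Uri("http://demolab-sps2010/"));
        webApp.MaxItemsPerThrottledOperation = 10000;
        webApp.Update();
    }
}
```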

List throttling options

Consider an example where a list threshold error will occur:

string query = @"<Query>
                    <Where>
                        <Contains>
                            <FieldRef Name='Title' />
                            <Value Type='Text'>A</Value>
                        </Contains>
                    </Where>
                </Query>";
SPQuery listQuery = new SPQuery();
listQuery.Query = query;
SPList list = SPContext.Current.Web.Lists["EmpList"];
SPListItemCollection items = list.GetItems(listQuery);
The above query will return too many results if the list is large. We will now look at the various options for handling throttling from code.

SPList.EnableThrottling

Setting the SPList.EnableThrottling property to false will disable throttling for a particular list.
In the above example, if we change the code as follows, it won't give the throttling exception:
SPList list = SPContext.Current.Web.Lists["EmpList"];
list.EnableThrottling = false;
SPListItemCollection items = list.GetItems(listQuery);
But in practice this is not a good option to use, as throttling is disabled for the whole list.

SPQuery.QueryThrottleMode

Setting the SPQuery.QueryThrottleMode property to SPQueryThrottleOption.Override disables throttling for a particular query.
This is a better-scoped way to disable throttling.
Note: in order for the override to avoid the list throttling exception, two preconditions must hold: the web application's 'Object Model Override' setting must be 'Yes', and the query must be executed under a super user account.
In the above example the code will look like:
SPQuery listQuery = new SPQuery();
listQuery.Query = query;
listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
SPList list = SPContext.Current.Web.Lists["EmpList"];
SPListItemCollection items = list.GetItems(listQuery);
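The 'Object Model Override' precondition mentioned above corresponds to the AllowOMCodeOverrideThrottleSettings property of SPWebApplication; if it is false, the Override value is ignored. A sketch of enabling it from code, assuming farm-administrator rights and the same hypothetical web application URL as before:

```csharp
using System;
using Microsoft.SharePoint.Administration;

class AllowOverrideDemo
{
    static void Main()
    {
        // QueryThrottleMode = Override only takes effect when the web
        // application permits object model code to override throttling.
        SPWebApplication webApp =
            SPWebApplication.Lookup(new Uri("http://demolab-sps2010/"));
        webApp.AllowOMCodeOverrideThrottleSettings = true;
        webApp.Update();
    }
}
```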

ContentIterator

We should consider using the ContentIterator class (in the Microsoft.Office.Server.Utilities namespace) for accessing large lists with throttling enabled, or when the list is likely to grow in future:
SPQuery listQuery = new SPQuery();
listQuery.Query = query;
SPList list = SPContext.Current.Web.Lists["EmpList"];
string iteratorName = "EmpList Iterator";
ContentIterator ci = new ContentIterator(iteratorName);
ci.ProcessListItems(list, listQuery, ItemProcessor, ErrorProcessor);

private static void ItemProcessor(SPListItem item)
{
    // Put the code here to process a list item.
}

private static bool ErrorProcessor(SPListItem item, Exception e)
{
    // Logic here to handle exceptions thrown while processing the list.
    // Do not rethrow; return true to keep iterating.
    return true;
}

Conclusion

List throttling is not only a restriction on retrieving data from a list; it also restricts the amount of data that can be affected. For example, if a single action deletes 5000 items from the list, the throttling exception will also occur.
So throttling is determined by the number of rows affected in the content database.
