Improving Blob Upload Speeds

The Challenge:

Upload a 144 MB file into Azure Blob Storage as fast as possible.

CloudBlob.UploadFile()

The CloudBlob.UploadFile() method doesn’t do a bad job. Behind the scenes it chunks the file into blocks and uses the ‘PutBlock’ method with the Task Parallel Library to upload them in parallel.

However, the Task Parallel Library will not upload all of the blocks at once; instead, it uses a thread pool to upload a few at a time. This is good, but we can do better.

All at once

After looking through some performance metrics on blob upload, it seems that Azure storage performance gets better when you use 20-40 threads to upload your file.

I modified my code to split the 144 MB file into 36 chunks of 4 MB each, then used 36 threads to upload all the chunks simultaneously.
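As a rough illustration of the all-at-once idea, here is a minimal Python sketch (not the code used for the benchmarks). The names block_ranges, upload_all_at_once, put_block and commit are all hypothetical: put_block and commit stand in for whatever storage client calls upload a single block and then commit the block list, so the sketch only shows the chunking and the one-thread-per-block fan-out.

```python
import base64
import math
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB per block, as in the text


def block_ranges(total_size, block_size=BLOCK_SIZE):
    """Return a (block_id, offset, length) tuple for every block."""
    count = math.ceil(total_size / block_size)
    ranges = []
    for index in range(count):
        offset = index * block_size
        length = min(block_size, total_size - offset)
        # Zero-padded, Base64-encoded IDs so every ID has the same length.
        block_id = base64.b64encode(f"{index:06d}".encode()).decode()
        ranges.append((block_id, offset, length))
    return ranges


def upload_all_at_once(data, put_block, commit, block_size=BLOCK_SIZE):
    """Upload every block on its own thread, then commit the block list.

    put_block(block_id, chunk) and commit(block_ids) are hypothetical
    callables supplied by the caller; they are assumptions, not a real API.
    """
    ranges = block_ranges(len(data), block_size)
    # One worker per block: 36 workers for a 144 MB file with 4 MB blocks.
    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        futures = [
            pool.submit(put_block, block_id, data[offset:offset + length])
            for block_id, offset, length in ranges
        ]
        for future in futures:
            future.result()  # re-raise any upload error before committing
    commit([block_id for block_id, _, _ in ranges])
```

Capping max_workers at a smaller number turns this back into the "a few at a time" behaviour, which may be the better choice on connections that limit simultaneous outbound connections.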

The result: it’s faster (most of the time).

Benchmarks

My intention is to create a more complete list of benchmarks, but here’s what I’ve got at the moment.

Location                                      CloudBlob.UploadFile()   All at once technique
Azure Instance (Extra Large)                  6.1 seconds              5.3 seconds
T1 Connection                                 21 seconds               14 seconds
10 Mbps                                       143 seconds              615 seconds *
Domestic ADSL Line (0.36 Mbps upload speed)   Timeout                  923 seconds

(*) The network equipment is throttling the number of simultaneous outbound connections.

Where to go from here?

More performance gains can be made, but with added complexity. Peer networking could provide one answer. If the file exists in more than one place, perhaps uploading different parts from each location would help?

Another answer is compression. The particular file I used was already highly compressed. However, a file which could be heavily compressed could be uploaded to Azure compute instances in a GZipped stream, and then inflated and inserted into blob storage by a background process.

I have tested this approach, and significant gains can be made: the more compressible the data, the bigger the gain. Obviously there is a cost implication, since compute instances have to be running to do the inflating.
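To get a feel for how much compression might help before building anything, a small Python sketch like the one below can compare raw and GZipped sizes and estimate transfer time at a given upload speed. The function names and the sample data are illustrative assumptions, and the time estimate ignores protocol overhead and latency.

```python
import gzip
import os


def compression_ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(data)) / len(data)


def estimated_seconds(size_bytes, mbps):
    """Idealised transfer time at the given upload speed in megabits/s."""
    return size_bytes * 8 / (mbps * 1_000_000)


def inflate(compressed):
    """What the background process would do before writing to blob storage."""
    return gzip.decompress(compressed)


if __name__ == "__main__":
    repetitive = b"timestamp,level,message\n" * 50_000  # compresses very well
    random_like = os.urandom(len(repetitive))  # like already-compressed data

    for label, payload in (("repetitive", repetitive), ("random", random_like)):
        ratio = compression_ratio(payload)
        raw_time = estimated_seconds(len(payload), 10)
        gz_time = estimated_seconds(len(payload) * ratio, 10)
        print(f"{label}: ratio {ratio:.3f}, "
              f"{raw_time:.2f}s raw vs {gz_time:.2f}s gzipped at 10 Mbps")
```

Random bytes behave like an already-compressed file here: the ratio stays close to 1, so compressing again buys nothing.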
