CPU-Z on an Azure Compute Instance
Taken from a small instance running in the North Europe (Dublin) data centre.
Following a recent post about how slow table storage can be, I decided to investigate how fast table storage could be.
This article has some excellent tips on improving the write performance:
This page has some metrics produced by Microsoft’s Extreme Computing Group:
http://azurescope.cloudapp.net/BenchmarkTestCases/
Apparently speeds of up to 4000 entities/second have been reached. Whether this is a sustainable write speed or not, I’m not sure, but I’m going to give it a go!
Richard and Andrew Newland are discussing.
Hi Rich, the second link is not resolving to the correct URL. Interesting post though. I was inspired by your first post and started my own testing too. I got 1000 writes in 18 seconds for table storage. This was unfair as I was using the dev store and not a live one.
Looking forward to the follow up.
Thanks Andrew, I have fixed the link, but it doesn’t point straight to the stats page now; you have to go and follow the link on the page. Weird.
I’ve got 1000 entries down to 11 seconds now (from outside Azure), but I think it’ll go faster. I wouldn’t trust the emulator’s performance; its behaviour is quite different to real life.
I am frequently asked “How can I build an application on Azure, and use the Azure features (such as blobs, tables and queues), but still have the ability to deploy on premises without having two codestreams?”.
One answer is: you can still use Azure features from your on premises deployment. However, people are generally wary of too much tie-in to the platform, and want the option of deploying without the reliance on any Azure features.
Another answer is: Inversion of Control. If you use a dependency injection framework you can abstract away the implementation of the features you want to use, and code against a common interface. This means you can work with a variety of underlying technologies, without your code being aware of that detail. The choice would be made by your configuration. I’ll show you how to do it using my favourite IoC framework: Spring.NET.
Start by creating an interface for each of the Azure features you want to use. For a queue, it could look like this:
public interface IQueue
{
    void Push(string message);
    string Pop();
}
Then create a concrete class that implements this feature for Azure:
public class AzureQueue : IQueue
{
    public AzureQueue(string connectionString)
    {
        this.Account = CloudStorageAccount.Parse(connectionString);
    }

    public CloudStorageAccount Account { get; private set; }

    public string QueueName { get; set; }

    public void Push(string message)
    {
        if (null == message) throw new ArgumentNullException("message");
        var queue = GetQueue();
        queue.AddMessage(new CloudQueueMessage(message));
    }

    public string Pop()
    {
        var queue = GetQueue();
        var message = queue.GetMessage();
        if (null == message)
        {
            return null;
        }
        queue.DeleteMessage(message);
        return message.AsString;
    }

    private CloudQueue GetQueue()
    {
        if (null == this.Account) throw new NullReferenceException("Account");
        if (null == this.QueueName) throw new NullReferenceException("QueueName");
        var queueClient = this.Account.CreateCloudQueueClient();
        var queue = queueClient.GetQueueReference(this.QueueName);
        return queue;
    }
}
Notice how the connection string is passed in to the constructor, and the queue name is a property. We don’t manually load anything from app settings.
Add the Spring.NET framework using nuget:
Install-Package Spring.CodeConfig
Add a spring section to your config file, and configure your object:
<object id="queue" type="Two10.IoC.AzureQueue" singleton="true">
    <constructor-arg name="connectionString" value="UseDevelopmentStorage=true" />
    <property name="QueueName" value="test" />
</object>
Notice how the connection string and queue name are set in this configuration.
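If you haven’t used Spring.NET in a project before, the object definition sits inside a spring section which also needs registering in configSections. The surrounding boilerplate looks roughly like this (standard Spring.NET configuration rather than anything specific to this example):

<configuration>
  <configSections>
    <sectionGroup name="spring">
      <section name="context" type="Spring.Context.Support.ContextHandler, Spring.Core" />
      <section name="objects" type="Spring.Context.Support.DefaultSectionHandler, Spring.Core" />
    </sectionGroup>
  </configSections>
  <spring>
    <context>
      <resource uri="config://spring/objects" />
    </context>
    <objects xmlns="http://www.springframework.net">
      <!-- the object definitions shown above go here -->
    </objects>
  </spring>
</configuration>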
When you want an instance of the queue, you ask Spring for it:
static void Main(string[] args)
{
    IQueue queue = Create<IQueue>("queue");
    queue.Push("test message");
    string message = queue.Pop();
    Console.WriteLine(message);
    Console.ReadKey();
}

public static T Create<T>(string name) where T : class
{
    IApplicationContext context = ContextRegistry.GetContext();
    T obj = context.GetObject(name, typeof(T), null) as T;
    return obj;
}
Using a different queue provider is simply a matter of creating another implementation of IQueue, and modifying the configuration to return that instead of the Azure queue. You could have different configuration files for different builds, and Visual Studio will apply the correct configuration depending on where the application is deployed.
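For example, to switch to an in-memory queue (the example project linked below includes a MemoryQueue class; the type name here assumes it lives in the same namespace), only the configuration changes:

<object id="queue" type="Two10.IoC.MemoryQueue" singleton="true" />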
The other real advantage with this approach is that it makes your code much more testable. By adding a mock for your interface, or a concrete class designed specifically for testing, you can write tests that run in isolation from Azure.
Download an example project here, which also includes a MemoryQueue implementation.
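For reference, a minimal MemoryQueue might look something like this (a sketch, not necessarily identical to the class in the download):

using System;
using System.Collections.Concurrent;

// An in-memory implementation of IQueue, handy for unit tests and for running
// on premises without any dependency on Azure storage.
public class MemoryQueue : IQueue
{
    private readonly ConcurrentQueue<string> queue = new ConcurrentQueue<string>();

    public void Push(string message)
    {
        if (null == message) throw new ArgumentNullException("message");
        this.queue.Enqueue(message);
    }

    public string Pop()
    {
        string message;
        return this.queue.TryDequeue(out message) ? message : null;
    }
}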
grahamrhay and Richard are discussing.
XML IoC configuration? What is this, 2003?
Ha! What would you suggest Ninject or something? Doing all the injection from code is fine, but how would you nicely switch between two concrete classes?
You can still use config values (or some other aspect of the environment) when you’re configuring using code or, better yet, conventions. Say no to xml!
Are you suggesting reading values from app settings, and adding switch statements to your code is better than xml? I prefer the world as it was in 2003 (but with .NET 4)!
I was going to mention the benefits of compile time safety, but that doesn’t really work with conventions 🙂 All the cool kids are doing it?
Which is faster, SQL Azure or Table Storage? I thought it would be SQL, so I constructed a test to find out. I wrote a console application that inserted 1000 records into SQL Azure (1GB web edition), then the same number into Table Storage. The data was very simple (two small fields), the application used a single thread, and looped 1000 times running one insert statement at a time.
Everything was within the North Europe data centre, the test was run on a small compute instance.
The results:
SQL = 3 seconds
Table Storage = 207 seconds
The SQL insert was a raw SQL statement, whereas the Table Storage was using the SDK, and therefore using an object which had to be serialized. So, whilst it’s not a completely fair test, I don’t think the time was lost in serialization.
What this test doesn’t show is high parallelization. If you had many processes writing simultaneously to SQL Azure, you would eventually hit a bottleneck. With table storage this threshold is, in theory, much higher.
My code was very badly written. In real life you would parallelize, use asynchronous calls and batch the writes, but it shows that Table Storage, the very cheap, highly scalable storage resource, can be a bit slow.
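For contrast, here is a rough sketch of how the same 1000 writes could be grouped into entity group transactions with the storage SDK (the TestEntity class, table name and connection string are made up for illustration; this is not what my test did):

using System.Data.Services.Client;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Sketch only: batching the writes into entity group transactions.
public class TestEntity : TableServiceEntity
{
    public string Value { get; set; }
}

public static class BatchInsertExample
{
    public static void Run(string connectionString)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        var tableClient = account.CreateCloudTableClient();
        tableClient.CreateTableIfNotExist("TestTable");
        var context = tableClient.GetDataServiceContext();

        for (int i = 0; i < 1000; i++)
        {
            // All entities in a batch must share the same partition key,
            // and a single batch can hold at most 100 entities.
            context.AddObject("TestTable", new TestEntity
            {
                PartitionKey = "batch",
                RowKey = i.ToString("D4"),
                Value = "test"
            });

            if ((i + 1) % 100 == 0)
            {
                context.SaveChangesWithRetries(SaveChangesOptions.Batch);
            }
        }
    }
}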
franksz, Richard, and Lee Smith are discussing.
Really? Try saving in 1 batch (SaveChangesOptions.Batch)
Hi Lee, thanks for your comment.
You’re right, there are a number of ways to improve the performance of my code, but what I was trying to do was understand which technology was quicker for a single write operation. I exaggerated the effect by repeating it a thousand times. Sorry for not making this clearer.
Perhaps I’ll write a post showing how quick table storage can be if used correctly!
Rich.
Apparently this is because you are appending records to the same partition. This is a no-no. Also Azure restricts writes to the same partition to a few hundred per second. To different partitions the account should allow several thousand per second.
‘Inspired’ by GitHub, here is some styling you can apply to links and buttons (shown here with/without hover):
HTML:
<a class="button" href="/"><span>Button</span></a>
<a class="button danger" href="/"><span>Danger Button</span></a>
CSS:
<style>
.button {
    height: 23px;
    padding: 0px 10px;
    line-height: 23px;
    font-size: 11px;
    font-weight: bold;
    color: white;
    text-shadow: -1px -1px 0 #333;
    -webkit-border-radius: 3px;
    -moz-border-radius: 3px;
    text-decoration: none;
    -webkit-text-stroke: 1px transparent;
    height: 34px;
    padding: 0;
    position: relative;
    top: 1px;
    margin-left: 10px;
    font-family: helvetica,arial,freesans,clean,sans-serif;
    font-weight: bold;
    font-size: 12px;
    color: #637DB0;
    text-shadow: 1px 1px 0 white;
    white-space: nowrap;
    border: none;
    overflow: visible;
    background: #DDD;
    filter: progid:DXImageTransform.Microsoft.gradient(GradientType=0,startColorstr='#ffffff',endColorstr='#e1e1e1');
    background: -webkit-gradient(linear,0% 0,0% 100%,from(white),to(#E1E1E1));
    background: -moz-linear-gradient(-90deg,white,#E1E1E1);
    border-bottom: 1px solid #EBEBEB;
    -webkit-border-radius: 4px;
    -moz-border-radius: 4px;
    border-radius: 4px;
    -webkit-box-shadow: 0 1px 4px rgba(0,0,0,0.3);
    -moz-box-shadow: 0 1px 4px rgba(0,0,0,0.3);
    box-shadow: 0 1px 4px rgba(0,0,0,0.3);
    cursor: pointer;
    -webkit-font-smoothing: subpixel-antialiased!important;
    display: inline-block;
}

.button:hover {
    background-position: 0 -23px;
    text-decoration: none;
    color: #FFF;
    filter: progid:DXImageTransform.Microsoft.gradient(GradientType=0,startColorstr='#81a8ce',endColorstr='#5e87b0');
    background: -webkit-gradient(linear,0% 0,0% 100%,from(#81a8ce),to(#5e87b0));
    background: -moz-linear-gradient(-90deg,#81a8ce,#5e87b0);
    text-shadow: 1px 1px 0 #333;
    border-bottom: 1px solid #5e87b0;
}

.button.danger, a.button.danger {
    color: #900;
}

.button.classy:hover:visited {
    color: #FFF;
}

.button.danger:hover {
    filter: progid:DXImageTransform.Microsoft.gradient(GradientType=0,startColorstr='#ce8181',endColorstr='#aa0000');
    background: -webkit-gradient(linear,0% 0,0% 100%,from(#ce8181),to(#a00));
    background: -moz-linear-gradient(-90deg,#ce8181,#a00);
    color: white;
    text-shadow: 1px 1px 0 #333;
    border-bottom: 1px solid #a00;
}

.button span, a.button span {
    display: inline;
    height: 34px;
    padding: 0 13px;
    line-height: 36px;
}

input.button {
    padding-left: 15px;
    padding-right: 15px;
}
</style>
MapReduce is a pattern for transforming large quantities of data across a cluster of machines.
A common example is to take an input file, such as a text document. The file is split up and distributed across a number of nodes. Each node runs a mapping process on the file; in this case the mapping process identifies every word in the file. An intermediate file, usually as large as, or larger than, the input file, is produced as the output. The reduce process then takes this file and transforms it into the desired output format; in this case, a count of every word.
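To make the word-count example concrete, here is a minimal, single-machine sketch of the two steps in plain C# (the method shapes are illustrative only, not the API of any particular framework):

using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCount
{
    // Map: emit a (word, 1) pair for every word in a chunk of the input.
    public static IEnumerable<KeyValuePair<string, int>> Map(string chunk)
    {
        foreach (var word in chunk.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            yield return new KeyValuePair<string, int>(word.ToLowerInvariant(), 1);
        }
    }

    // Reduce: group the intermediate pairs by word and sum the counts.
    public static IDictionary<string, int> Reduce(IEnumerable<KeyValuePair<string, int>> pairs)
    {
        return pairs
            .GroupBy(p => p.Key)
            .ToDictionary(g => g.Key, g => g.Sum(p => p.Value));
    }
}

A real framework runs Map on each node against its own share of the data, and groups the intermediate pairs by key before handing them to Reduce.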
Whilst this example is quite contrived, a real use case is not too dissimilar. Typical scenarios include parsing web log files to understand website usage patterns and referral information, and other analysis of unstructured data.
Project Daytona is an Azure-centric implementation of the MapReduce framework. When you compile the project, you have two roles: a master (which you have one instance of) and a slave (which you can have multiple instances of). You then derive a few classes to provide your implementation of the algorithm (i.e. the logic to perform the mapping and reducing, and also the retrieval and splitting of the data).
What is interesting is the separation your implementation has from the Azure infrastructure. When you submit your MapReduce ‘job’ to the master node in your Azure deployment, behind the scenes Daytona is uploading your assemblies. This means that you can submit new types of work without having to recompile and re-deploy your cloud project. Smart stuff.
I wouldn’t say it was straightforward to get Daytona up and running, but it’s certainly worth a look if you have large amounts of data in blobs or tables, and you want to do some analysis.
AzureARR is an example project which configures Application Request Routing on a web role, to enable sticky session load balancing in Windows Azure.
The solution contains three projects.
Incoming requests to Azure are handled by one of the ARR Roles. If this is the first request made by a client, they will be directed to a WebRole instance using a round robin algorithm. A cookie will be issued, which allows the ARR Role to direct subsequent requests to the same machine (i.e. a sticky session).
The ARR Roles continually query the instances running in your deployment to keep the web farm up-to-date with the current network topology.
Two10.Azure.Arr has the following settings:
The role being load balanced must have an internal endpoint configured, called ‘Internal’ (case sensitive).
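For reference, that internal endpoint is declared in the service definition of the role being load balanced, something like this (the role name and port number are just examples):

<WebRole name="WebRole1">
  <Endpoints>
    <!-- The ARR roles look for an internal endpoint named 'Internal' (case sensitive) -->
    <InternalEndpoint name="Internal" protocol="http" port="8080" />
  </Endpoints>
</WebRole>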
The following script will create a new web site on an Azure Web Role:
REM Settings for the site
SET ID=2
SET NAME="Default Web Site"
SET PORT=80
SET PHYSICAL_PATH=d:\inetpub\wwwroot
SET APP_POOL="ASP.NET v4.0"

REM Discover the IP Address
IPCONFIG |FIND "IPv4 Address" > %temp%\TEMPIP.txt
FOR /F "tokens=2 delims=:" %%a in (%temp%\TEMPIP.txt) do set IP=%%a
del %temp%\TEMPIP.txt
set IP=%IP:~1%

REM Configure IIS
%systemroot%\system32\inetsrv\appcmd add site /name:%NAME% /id:%ID% /bindings:http/%IP%:%PORT%: /physicalPath:%PHYSICAL_PATH%
%systemroot%\system32\inetsrv\appcmd set site /site.name:%NAME% /[path='/'].applicationPool:%APP_POOL%
The IP address of the machine is required in the binding, which is why IPCONFIG is used to discover the address.
Stand by for the fanfare! Microsoft seem to have enabled the loopback address on Windows Azure. Both localhost and 127.0.0.1 were previously disabled on Azure roles, but the address now seems to work. The port does not need configuring as an endpoint in the service definition file.
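A quick way to check from inside a role instance is something like this (a throwaway sketch; the port is arbitrary):

using System;
using System.Net;
using System.Net.Sockets;

static void Main(string[] args)
{
    // Listen on the loopback address and connect back to it.
    // Previously this connection would have failed on an Azure role.
    var listener = new TcpListener(IPAddress.Loopback, 8081);
    listener.Start();

    using (var client = new TcpClient())
    {
        client.Connect("127.0.0.1", 8081);
        Console.WriteLine(client.Connected ? "Loopback works" : "Loopback blocked");
    }

    listener.Stop();
}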
Richard is discussing.
In fact, if you set up the port as an internal endpoint, it doesn’t work.