Your seat in the clouds awaits you

This blog post is also available in French here.

Back in November 2011 easyJet announced that, starting in the spring of 2012, we would begin a trial of allocated seating, and on April 12th we went live on five routes from London Luton and Glasgow. Since then we have gradually extended the trial: we now offer allocated seating on almost all routes on our network, and by the end of November we will be at 100% operational delivery. This is a huge change for easyJet. Free seating, referred to by many of our passengers as “the scrum”, was part of our DNA. It was how we had always operated. It had become part of the definition of easyJet. As our CEO Carolyn McCall said in the article above, the trial could only be deemed successful if it met all three of the following criteria:

1. It had to increase customer satisfaction. We work hard to have happy customers. It’s another thing that’s part of our DNA. Allocated seating had to really make a difference to the passenger experience. Many people said that they wanted it but, once we gave it to them, would it really make the difference they thought?

2. It had to work operationally. easyJet operates one of the quickest turn-around times in the industry. If boarding passengers into allocated seats was seen to have a negative effect on our On Time Performance (OTP) it would not have been considered viable.

3. It had to work commercially. Allocated seating had to prove itself a commercial success as a revenue generating product.

Trial by fire

This highlights the fact that such a move was a calculated risk. We were not sure it would work but it required significant change and investment to find out. One of the major changes was to our reservation system. Our home-grown reservation system did not support allocated seating. The primary advantage of maintaining a bespoke system is that it can be tailored to your exact business needs. No extraneous functionality cluttering up the works. It does, on the other hand, support bookings from around 58 million passengers a year and take over £4 billion in revenue.

Changing the beating heart of our enterprise, our various sales channels like easyJet.com and our operational systems to support allocated seating was no small undertaking, quite apart from the changes to our operational processes. Making those changes to support a trial, an experiment? That called for a quite special approach.

Our first decision was that we definitely did not want to have to conduct open-heart surgery on our reservation system to add this functionality. The I/O load from selling 58 million non-specific seats a year is already a veritable fire-hose. Scaling and refactoring to support the tracking and locking of over 58 million specific seats on a system that can book up to 1500 seats a minute would be a huge project.

“Seat-allocation-as-a-service”

Would it be possible, we wondered, to buy “seat-allocation-as-a-service” (SaaaS?) from a third party? Get someone else to do the heavy lifting of tracking the availability of every single seat we have on sale while we just stored the output, a few bytes that represented the selections made for the seats we have actually sold?

Apparently not. However, the idea of a separate “seat-allocation-as-a-service” solution attached via a very light-weight integration was too attractive to let go, so we decided to build our own.

What this means, in summary, is that the tens of millions of seats we have available at any given time are tracked via partitioned SQL Azure databases and cached in the Azure AppFabric Cache. All the logic, business rules and data relating to …

  • selecting seats
  • handling contention for seats
  • aircraft types
  • seating layouts and configurations
  • price bands
  • which passengers can sit where
  • seating access for passengers with restricted mobility
  • algorithms for automatically allocating seats to passengers who chose not to make a selection
  • and the million-and-one other things that have to be taken into consideration when seating an aircraft

…all this is done in the cloud. Even the interactive UI that displays the graphical map of the aircraft is served from Azure and injected into the booking pages on easyJet.com.

The ingenious work to achieve this using JSONP, Ajax and Knockout.js (amongst other things) is a tribute to the fantastic development team at easyJet and may be the subject of a subsequent blog post.

The overall approach however has allowed us to implement an incredibly significant change to the way we operate and sell our flights and deliver it at massive scale without needing to implement much more than small refactorings in our core operational and retail systems. The low cost and massive scale of Azure has made the whole notion of experimenting with something so fundamental an achievable reality. This calculated risk has become a bet we can much more easily afford to make.

Most importantly it has massively reduced the cost of failure. We had to conduct a thorough trial. We couldn’t be sure that it would work. Whether it worked or not was primarily a business decision rather than a technical one.

Now that it has been successful, we have delivered a solution that works technically, works operationally, works commercially, improves customer experience and has transformed our enterprise. However, if it had not worked and we had needed to turn it all off and walk away, we could have done so without having incurred huge risk, technical debt or cost.

Tablet History

http://twitter.com/#!/bertcraven/status/215212109666066435

Azure Service Bus

Azure Service Bus : Connect All the Things !

GOTCHA : Using Silverlight with the Azure AppFabric Access Control Service (ACS)

Version 2 of the Azure AppFabric Access Control Service now serves up a proper ClientAccessPolicy.xml file to Silverlight clients. Here is what you used to get under version 1 if you went to

https://yournamespace.accesscontrol.windows.net/clientaccesspolicy.xml

<access-policy>
	<cross-domain-access>
		<policy>
			<allow-from http-request-headers="*" http-methods="*">
				<domain uri="https://*"/>
				<domain uri="http://*"/>
			</allow-from>
			<grant-to>
				<resource path="/" include-subpaths="true"/>
			</grant-to>
		</policy>
	</cross-domain-access>
</access-policy>

Here’s what you get now :

<access-policy>
	<cross-domain-access>
		<policy>
			<allow-from http-request-headers="*" http-methods="*">
				<domain uri="https://*"/>
				<domain uri="http://*"/>
			</allow-from>
			<grant-to>
				<resource path="/WRAPv0.9" include-subpaths="true"/>
				<resource path="/v2/OAuth2-13" include-subpaths="true"/>
				<resource path="/v2/wstrust" include-subpaths="true"/>
				<resource path="/v2/wsfederation" include-subpaths="true"/>
				<resource path="/v2/mgmt/service" include-subpaths="true"/>
				<resource path="/FederationMetadata/2007-06/FederationMetadata.xml" include-subpaths="true"/>
				<resource path="/v2/wstrust/mex" include-subpaths="true"/>
				<resource path="/v2/metadata/IdentityProviders.js" include-subpaths="true"/>
			</grant-to>
		</policy>
	</cross-domain-access>
</access-policy>

Here’s the gotcha : this may break previously working code because Silverlight considers those paths to be case sensitive !

If you call the ACS from Silverlight and try to get a simple web token from the WRAP endpoint by calling https://yournamespace.accesscontrol.windows.net/WRAPV0.9 you will get a Silverlight security exception BEFORE Silverlight even attempts to make the call. Basically it will get the client access policy, compare the URL to the permitted resource paths and then throw an exception because /WRAPV0.9 does not match /WRAPv0.9. It will not give you ANY CLUES !
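The comparison described above can be illustrated with a minimal sketch. It assumes the check boils down to an ordinal (case-sensitive) match of the request path against the granted resource paths; it is not Silverlight's actual implementation, and PolicyCheck and IsPathGranted are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;

public static class PolicyCheck
{
    // Resource paths copied from the version 2 clientaccesspolicy.xml shown above.
    private static readonly string[] GrantedPaths =
    {
        "/WRAPv0.9",
        "/v2/OAuth2-13",
        "/v2/wstrust",
        "/v2/wsfederation",
        "/v2/mgmt/service",
        "/FederationMetadata/2007-06/FederationMetadata.xml",
        "/v2/wstrust/mex",
        "/v2/metadata/IdentityProviders.js"
    };

    // Assumption: a path is granted if it equals a resource path or sits beneath it
    // (include-subpaths="true"), compared character by character with case significant.
    public static bool IsPathGranted(string requestPath)
    {
        return GrantedPaths.Any(p =>
            string.Equals(requestPath, p, StringComparison.Ordinal) ||
            requestPath.StartsWith(p + "/", StringComparison.Ordinal));
    }
}
```

Under that assumption PolicyCheck.IsPathGranted("/WRAPv0.9") returns true while PolicyCheck.IsPathGranted("/WRAPV0.9") returns false, which is the mismatch that produces the security exception. Under the version 1 policy, which granted "/", casing further down the path could not cause a mismatch.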

More video…

I just found this video while I was looking for easyJet-related resources on the web. Hadn’t even realised it had been filmed. I gave this talk at BAFTA for a Microsoft event called Migrating Businesses to the Cloud. Pretty nerve-wracking as the other speakers were Bob Muglia and David Chappell; tough acts to follow by anyone’s standards.

Implementing a REST-ful service using OpenRasta and OAuth WRAP with Windows Azure AppFabric ACS.

I’ve been building prototypes again and I wanted to build a service that exposed a fairly simple, resource-based data API with granular security, i.e. some of my users would be allowed to access one resource but not another, or they might be allowed to read a resource but not create or update it.

To do this I’ve used OpenRasta and created a security model based on OAuth/WRAP claims issued by the Windows Azure AppFabric Access Control Service (ACS).

The client can now make a REST call to the ACS passing an identity and secret key. In return they will be issued with a set of claims. A typical claim encompasses a resource in my REST service and the action(s) the user is allowed to perform, so their claim set might show that they were allowed to execute GET against resource foo but not POST or PUT.

In my handler in OpenRasta I add method attributes that indicate what claims are required to invoke that method, for instance in my handler for resources of type foo I might have the following method  :

[RequiresClaims("com.somedomain.api.resourceaction", "GetFoo")]
public OperationResult GetFooByID(int id)
{
	//elided
}

In my solution I have created an OpenRasta interceptor which examines inbound requests, validates the claim set and then compares the claims required by the method attribute to the claims in the claim set. Only if there is a match can the request be processed.
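The matching step can be sketched independently of OpenRasta. The code below is an illustration under stated assumptions, not the source from this post: the RequiresClaimsAttribute definition is a stand-in reconstructed from the usage shown above, ClaimsCheck and FooHandler are hypothetical names, the claim set is modelled as a dictionary of claim type to issued values, and every required claim is assumed to be mandatory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Stand-in for the attribute used on the handler method above (assumed shape).
[AttributeUsage(AttributeTargets.Method, AllowMultiple = true)]
public sealed class RequiresClaimsAttribute : Attribute
{
    public RequiresClaimsAttribute(string claimType, string claimValue)
    {
        ClaimType = claimType;
        ClaimValue = claimValue;
    }

    public string ClaimType { get; private set; }
    public string ClaimValue { get; private set; }
}

// Hypothetical handler, mirroring the example method above.
public class FooHandler
{
    [RequiresClaims("com.somedomain.api.resourceaction", "GetFoo")]
    public string GetFooByID(int id)
    {
        return "foo " + id;
    }
}

public static class ClaimsCheck
{
    // Returns true only if every claim required by the method's attributes
    // appears in the (already validated) claim set.
    public static bool IsAuthorised(MethodInfo method, IDictionary<string, ISet<string>> claims)
    {
        var required = method
            .GetCustomAttributes(typeof(RequiresClaimsAttribute), false)
            .Cast<RequiresClaimsAttribute>();

        return required.All(r =>
        {
            ISet<string> values;
            return claims.TryGetValue(r.ClaimType, out values) && values.Contains(r.ClaimValue);
        });
    }
}
```

An interceptor would call something like ClaimsCheck.IsAuthorised(typeof(FooHandler).GetMethod("GetFooByID"), claims) before letting the operation run. Token validation (signature, expiry, audience) is deliberately left out of the sketch.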

After a few tweets with @serialseb I refactored the above into an IAuthenticationScheme that validates the claims, leaving the original OperationInterceptor to check the claims required by the method to be invoked. I also added an extension method so that the whole thing can be fluently configured like so :

ResourceSpace.Uses.AzureClaimsAuthenticationScheme();

I was going to write a long blog post about how to build this from scratch with diagrams and screen shots and code samples but I found that I couldn’t be arsed. If you’d like more info on how to do this just drop me a line. In the meantime I’ve dropped the source files as follows :

Amazon recommends…

Hi Bert, this is the Amazon Recommendation Selection Engine or A.R.S.E. Today we would like to recommend :

  1. Things you have already bought from us because you can never be too careful or have too much stuff.
  2. Things that are almost identical to things you have already bought from us because, God knows, once you have bought a second one of the thing you already bought, a third, slightly different one will be hard to resist.
  3. Things you have already bought as gifts for other people and that were obviously gifts because they were completely outside of your known buying patterns.
  4. Things that are almost identical to things you already bought as gifts for other people just in case they need two of everything too.
  5. Things that you looked at accidentally or only because someone sent you a link entitled “LOL! Look at this ridiculous, furry, raccoon-head ski helmet”.
  6. Things that are similar to ridiculous, furry, raccoon-head ski helmets but probably a lot less LOL.
  7. 1 item that is completely inexplicable.

Basically I think that the A.R.S.E. is an algorithm modelled on the Junk Lady from the film Labyrinth.

Scrummerfall

Scrummerfall. n. The practice of combining Scrum and Waterfall so as to ensure failure at a much faster rate than you had with Waterfall alone. – Brad Wilson.

I like this definition 🙂

Azure AppFabric ACS Gotchas : Longest Prefix Matching

I recently got bitten by a bit of Access Control Service logic related to the way it identifies which scope to issue claims for.

I have a service namespace foo. My Azure Service Bus scope for this namespace is therefore http://foo.servicebus.windows.net. When I create the solution for this namespace two ACS instances are created. One is https://foo.accesscontrol.windows.net which is a general ACS and the other is https://foo-sb.accesscontrol.windows.net which is scoped specifically to the Service Bus. This second ACS has a default Token Policy and Scope which I cannot change. NB : this has now changed. When you provision an AppFabric namespace now you can specify which services should be available (ACS, Service Bus and Cache). If you specify Service Bus you will get the bus-scoped ACS instance. You only get the generic ACS if you specifically request ACS as a service.

In my Service Bus solution I am exposing endpoints at http://foo.servicebus.windows.net/bar and http://foo.servicebus.windows.net/baz. I have an issuer (Alice) with claims for the scope http://foo.servicebus.windows.net who is able to create endpoints at both http://foo.servicebus.windows.net/bar and http://foo.servicebus.windows.net/baz. I have another issuer (Bob) who is able to send messages to endpoints at both http://foo.servicebus.windows.net/bar and http://foo.servicebus.windows.net/baz.

I introduce a new issuer (Ivan) to whom I only want to grant access to http://foo.servicebus.windows.net/baz. To this end I create a new scope specifically for http://foo.servicebus.windows.net/baz and create claims for Ivan in this new scope.

Here’s where I get bitten. Alice can still expose endpoints at http://foo.servicebus.windows.net/bar and Bob can still send messages to them. Ivan can send messages to http://foo.servicebus.windows.net/baz but not to http://foo.servicebus.windows.net/bar which is exactly as intended. However, Alice can no longer expose endpoints at http://foo.servicebus.windows.net/baz and Bob could not send to them even if she could. The reason for this is that although Alice and Bob have claims for http://foo.servicebus.windows.net when they try to access anything at http://foo.servicebus.windows.net/baz they automatically fall into the new scope, for which they have no claims. The ACS matches scopes using the longest possible prefix and if there are no claims it will not check parent scopes.

The solution is simple – add new claims for Alice and Bob in the new scope, but the problem is, at first, counter-intuitive.
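The behaviour can be modelled with a minimal sketch. This is not how the ACS is implemented; it only assumes what is described above: the scope with the longest matching prefix is selected and parent scopes are never consulted. ScopeResolver and HasClaims are hypothetical names, and each scope is reduced to the set of issuers that hold claims in it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScopeResolver
{
    // scopes: scope URI -> issuers that hold claims in that scope (simplified model).
    public static bool HasClaims(
        IDictionary<string, ISet<string>> scopes, string issuer, string requestUri)
    {
        // Pick the single scope whose URI is the longest prefix of the request URI.
        var scope = scopes.Keys
            .Where(s => requestUri.StartsWith(s, StringComparison.Ordinal))
            .OrderByDescending(s => s.Length)
            .FirstOrDefault();

        // No fallback: if the issuer has no claims in the chosen scope,
        // shorter (parent) scopes are not checked.
        return scope != null && scopes[scope].Contains(issuer);
    }
}
```

With Alice and Bob in the scope http://foo.servicebus.windows.net and Ivan in the scope http://foo.servicebus.windows.net/baz, HasClaims returns true for Alice on /bar but false for Alice on /baz, because /baz resolves to the new scope. Adding Alice and Bob to the /baz scope makes it return true again, which is the fix described above.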

RX Extensions & Threading

Like a lot of people, I’ve been playing with Rx recently, the Reactive Extensions for .NET. To find out a bit more about what these are have a look here. One of the things I found interesting was the use of Rx as a fluent interface for creating sets of asynchronous tasks and then subscribing to the collective result with both a result handler and error handler.

I’ve worked up a little example here that shows some code performing some fairly standard asynchronous tasks. When called, my code needs to call a web service, query a database and fetch a value from some cache. It then needs to combine the results. Since these processes each take different and often varying amounts of time and I want to minimise the execution time of my code I’m going to do them in parallel.

I have shown 4 different methods of achieving this, using my own threads, the thread pool’s threads by way of BeginInvoke(), the Tasks namespace new in .NET 4.0 and Rx.

namespace RXThreadingExample
{
	using System;
	using System.Collections.Generic;
	using System.Diagnostics;
	using System.Linq;
	using System.Threading;
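	using System.Threading.Tasks;
	// Observable (used in RXExtensions below) comes from the Rx assemblies, which must be referenced.
	// In later Rx releases it lives in System.Reactive.Linq, and the plan-based Observable.Join is Observable.When.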

	public class Program
	{
		public static void Main()
		{
			const bool throwException = false;

			ClassicAsync(throwException);
			WaitHandles(throwException);
			Tasks(throwException);
			RXExtensions(throwException);

			Console.ReadKey(true);
		}

		private static void ClassicAsync(bool throwException)
		{
			var sw = Stopwatch.StartNew();
			var wsResult = 0;
			string dbResult = null;
			string cacheResult = null;

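			// NB : an exception thrown inside one of these threads is NOT caught by the
			// try/catch below; it is raised on the worker thread and will bring down the process.
			var callWebService = new Thread(() => wsResult = CallWebService());
			var queryDB = new Thread(() => dbResult = QueryDB(throwException, "Async"));
			var fetchCacheItem = new Thread(() => cacheResult = FetchCacheItem());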

			try
			{
				callWebService.Start();
				queryDB.Start();
				fetchCacheItem.Start();

				callWebService.Join();
				queryDB.Join();
				fetchCacheItem.Join();

				Console.WriteLine(dbResult, wsResult, cacheResult, sw.ElapsedMilliseconds);
			}
			catch (Exception ex)
			{
				Console.WriteLine("Exception : {0}", ex);
			}
		}

		private static void WaitHandles(bool throwException)
		{
			var sw = Stopwatch.StartNew();

			var waitHandles = new List<WaitHandle>();
			Func<int> callWebService = CallWebService;
			var wsAsyncResult = callWebService.BeginInvoke(null, null);			
			waitHandles.Add(wsAsyncResult.AsyncWaitHandle);

			Func<bool, string, string> queryDB = QueryDB;
			var dbAsyncResult = queryDB.BeginInvoke(throwException, "WaitHandles", null, null);
			waitHandles.Add(dbAsyncResult.AsyncWaitHandle);

			Func<string> queryLocalCache = FetchCacheItem;
			var cacheAsyncResult = queryLocalCache.BeginInvoke(null, null);
			waitHandles.Add(cacheAsyncResult.AsyncWaitHandle);

			try
			{
				WaitHandle.WaitAll(waitHandles.ToArray());

				var wsResult = callWebService.EndInvoke(wsAsyncResult);
				var dbResult = queryDB.EndInvoke(dbAsyncResult);
				var cacheResult = queryLocalCache.EndInvoke(cacheAsyncResult);

				Console.WriteLine(dbResult, wsResult, cacheResult, sw.ElapsedMilliseconds);
			}
			catch(Exception ex)
			{
				Console.WriteLine("Exception : {0}", ex);
			}
		}

		private static void Tasks(bool throwException)
		{
			var sw = Stopwatch.StartNew();
			var wsResult = 0;
			string dbResult = null;
			string cacheResult = null;

			var tasks = new List<Task>
				{
					new Task(() => wsResult = CallWebService()),
					new Task(() => dbResult = QueryDB(throwException, "Tasks")),
					new Task(() => cacheResult = FetchCacheItem())
				};

			try
			{
				tasks.ForEach(t => t.Start());
				Task.WaitAll(tasks.ToArray());
				Console.WriteLine(dbResult, wsResult, cacheResult, sw.ElapsedMilliseconds);
			}
			catch (Exception ex)
			{
				Console.WriteLine("Exception : {0}", ex);
			}

		}

		private static void RXExtensions(bool throwException)
		{
			var sw = Stopwatch.StartNew();
			Observable.Join(
				Observable.ToAsync<int>(CallWebService)()
					.And(Observable.ToAsync<bool, string, string>(QueryDB)(throwException, "RX"))
					.And(Observable.ToAsync<string>(FetchCacheItem)())
					.Then((wsResult, dbResult, cacheValue) =>
						new { WebServiceResult = wsResult, DatabaseResult = dbResult, CacheValue = cacheValue })
				).Subscribe(
					o => Console.WriteLine(o.DatabaseResult, o.WebServiceResult, o.CacheValue, sw.ElapsedMilliseconds),
					e => Console.WriteLine("Exception: {0}", e));
		}

		private static int CallWebService()
		{
			Thread.Sleep(500);
			return new Random().Next(1,33);
		}

		private static string QueryDB(bool throwException, string name)
		{
			Thread.Sleep(1500);
			if (throwException)
			{
				throw new Exception("You asked for it !");
			}
			return name + " can rescue {0} Chilean miners in {2} ms. {1}";
		}

		private static string FetchCacheItem()
		{
			return new[]{"Awesome!", "Cool!", "Amazing!", "Jinkies!"}[new Random().Next(0,4)];
		}
	}
}

I have to say, I like the syntactic sugar of Rx. In the tests I have run, however, using Task consistently produces the fastest results. More to follow, I think.
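One possible variation on the Tasks method above, offered as a sketch rather than as something from the original tests: Task<TResult> (also part of .NET 4.0) lets each task return its value, so the results do not have to be captured in outer variables. TaskResults and Combine are hypothetical names, and the three delegates stand in for CallWebService, QueryDB and FetchCacheItem.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskResults
{
    // Runs the three operations in parallel and formats the combined result.
    // queryDb is expected to return a format string with {0} and {1} placeholders.
    public static string Combine(
        Func<int> callWebService, Func<string> queryDb, Func<string> fetchCacheItem)
    {
        var ws = Task.Factory.StartNew(callWebService);
        var db = Task.Factory.StartNew(queryDb);
        var cache = Task.Factory.StartNew(fetchCacheItem);

        // Blocks until all three have finished; if any task faulted,
        // WaitAll throws an AggregateException wrapping the failures.
        Task.WaitAll(ws, db, cache);

        return string.Format(db.Result, ws.Result, cache.Result);
    }
}
```

For example, TaskResults.Combine(() => 3, () => "{0} miners. {1}", () => "Cool!") produces "3 miners. Cool!". No claim is made here about how this compares in speed with the four methods timed above.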