Integrating BITS with Amazon S3 (Part 1)
Recently, I was working on a group project where we had a simple requirement: download the prerequisites for the application we were building. That is, the application needed and expected certain components to be installed on the system (e.g. SQL Server Express, Crystal Reports, device drivers). After some research, we decided to leverage the Background Intelligent Transfer Service (BITS) that is built into Windows. BITS transfers files between client and server, and it has built-in features like transfer throttling based on available bandwidth and persistence of transfer state even if the network connection drops. We also decided to store the files to be downloaded in Amazon S3 because of its reliability and scalability. Below is a diagram of the different components.
The application we were building is a WPF application written in C#. We tried to use as many existing libraries as possible in order to expedite development, so for communicating with BITS we used SharpBITS, a .NET wrapper of the BITS API. Using SharpBITS is outside the scope of this post but if you’d like more information on using this library, there is a nice post that is very educational here. Using publicly accessible URLs to the files stored in Amazon S3, things worked beautifully. BITS is a workhorse and it did its job well until we decided to lock down those resources and added security into the mix. The rest of this post describes the hurdles we ran into while integrating these two components. The solution we built to address the limitations that become apparent as we delve into the intricacies of Amazon S3 and BITS is covered in part 2.
A great way to lock down files in Amazon S3 while still being able to access them is to generate presigned URLs. These URLs are only valid for a limited time period. The secret key itself is never sent; instead, it is used to compute a signature, and that signature travels along with the access key ID and the expiration time as query string parameters in the URL. Here’s a sample presigned URL:
The process of signing and generating these URLs is rather involved, but one of the key ingredients is that the HTTP verb (e.g. GET, HEAD, POST) is included in the data being signed. So a request for a file named test.txt would generate two completely different signatures for a GET and for a HEAD request, for example. This little detail is the root cause of the trouble when using BITS to download files. Here’s why.
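To make the verb dependency concrete, here is a minimal sketch in Python (chosen for brevity, not the C# used in our app). It assumes the legacy query string authentication scheme (Signature Version 2), where the string to sign starts with the HTTP verb; the post's own URLs may have been produced differently, and newer signature versions use a different layout. The access key, secret key, bucket name and expiry below are made-up placeholders, and the helper names are my own.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode

# Placeholder credentials, for illustration only.
ACCESS_KEY = "AKIAEXAMPLEKEYID"
SECRET_KEY = "example-secret-key"


def sign(verb, bucket, key, expires):
    """Signature for one verb, assuming the legacy V2 query string scheme."""
    resource = "/" + bucket + "/" + quote(key)
    # Layout: verb, Content-MD5, Content-Type, expiry, resource.
    # The MD5 and type fields are left empty for a plain download.
    string_to_sign = "\n".join([verb, "", "", str(expires), resource])
    digest = hmac.new(
        SECRET_KEY.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def presigned_url(verb, bucket, key, expires):
    """Build a URL that is only valid for the given verb until `expires`."""
    query = urlencode({
        "AWSAccessKeyId": ACCESS_KEY,
        "Expires": expires,
        "Signature": sign(verb, bucket, key, expires),
    })
    return "https://" + bucket + ".s3.amazonaws.com/" + quote(key) + "?" + query


if __name__ == "__main__":
    expires = 1700000000  # arbitrary Unix timestamp
    print(presigned_url("GET", "example-bucket", "test.txt", expires))
    print(presigned_url("HEAD", "example-bucket", "test.txt", expires))
```

The two printed URLs are identical except for the Signature parameter, because the only input that changed is the verb at the front of the string to sign.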
Whenever a download is scheduled for a file URL with BITS, it first makes a HEAD request in order to get information about the file, such as the file size. It then makes one or more GET requests in order to download the file itself. However, we can only give BITS a single URL per file, so which one do you give it: the one signed for HEAD or the one signed for GET? Regardless of which one you use, one request will succeed and the other will get a 403 access denied response.
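The dead end can be shown with a small toy model in Python. This is not the real S3 signing format or the real BITS client; it only assumes what is described above: the server recomputes the signature using the verb of the incoming request, and the client sends a HEAD followed by a GET with the same URL. All names and the secret are hypothetical.

```python
import hashlib
import hmac

SECRET = b"hypothetical-secret-key"  # placeholder, not a real credential


def signature(verb, resource, expires):
    """Toy signature: any scheme that mixes in the verb behaves the same way."""
    message = (verb + "\n" + resource + "\n" + str(expires)).encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def server_status(verb, resource, expires, presented_signature):
    """Stand-in for the storage server: verify against the verb actually used."""
    expected = signature(verb, resource, expires)
    return 200 if hmac.compare_digest(expected, presented_signature) else 403


def bits_style_download(resource, expires, presented_signature):
    """Stand-in for the client: HEAD first, then GET, same URL both times."""
    return [
        (verb, server_status(verb, resource, expires, presented_signature))
        for verb in ("HEAD", "GET")
    ]


if __name__ == "__main__":
    resource, expires = "/example-bucket/test.txt", 1700000000
    for signed_for in ("GET", "HEAD"):
        result = bits_style_download(
            resource, expires, signature(signed_for, resource, expires)
        )
        print("URL signed for", signed_for, "->", result)
```

Whichever verb the URL was signed for, exactly one of the two requests comes back with a 403, which is the situation we found ourselves in.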
In part 2 of this post, we will look at the solution to this problem and how we can work around the way BITS behaves by acting as a middleman in the communication between BITS and Amazon S3.