Recently, I was working on a group project with a simple requirement: download prerequisites for the application we were building. That is, the application expected certain components to be installed on the system (e.g. SQL Server Express, Crystal Reports, device drivers). After some research, we decided to leverage the Background Intelligent Transfer Service (BITS) that is built into Windows. BITS lets you transfer files between client and server, and it has built-in features like transfer throttling based on available bandwidth and persistence of transfer state even across network disconnects. We also decided to store the files to be downloaded in Amazon S3, which offers great reliability and scalability. Below is a diagram of the different components.
The application we were building is a WPF application written in C#. We tried to leverage as many existing libraries as possible in order to expedite development, so for communicating with BITS we used SharpBITS, a .NET wrapper of the BITS API. Using SharpBITS is outside the scope of this post, but if you’d like more information on the library, there is a very educational post here. Using publicly accessible URLs to the files stored in Amazon S3, things worked beautifully. BITS is a workhorse and it did its job well, until we decided to lock down those resources and added security into the mix. The following describes the hurdles and obstacles we encountered while integrating these two components, and the solution we built to address certain limitations that will become apparent as we delve into the intricacies of Amazon S3 and BITS.
A great way to lock down files in Amazon S3 while still allowing access to them is to generate presigned URLs. Each URL is only valid for a limited time period, and the credentials are embedded into the signing process and sent along as query string parameters in the URL. Here’s a sample presigned URL:
The process of signing and generating these URLs is rather involved, but one of the key ingredients is that the HTTP verb (e.g. GET, HEAD) is included in the signature. So a request for a file named test.txt would generate two completely different signatures for a GET and for a HEAD request, for example. And this little detail is the root cause of all our trouble when using BITS to download files. Here’s why.
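To make this concrete, here is a minimal sketch of S3’s legacy query-string signing (signature version 2) in Python. The bucket name, key, and credentials are made up for illustration. Note how the verb is the very first line of the string to sign, so a GET URL and a HEAD URL for the same file end up with different signatures:

```python
import base64
import hashlib
import hmac
import urllib.parse

# Made-up credentials and object, purely for illustration.
ACCESS_KEY = "AKIAEXAMPLE"
SECRET_KEY = "secret-key-example"
BUCKET, KEY = "my-bucket", "test.txt"

def presign(verb, expires):
    """Build a legacy (signature v2) presigned S3 URL for the given HTTP verb."""
    # The verb is part of the string to sign, so GET and HEAD sign differently.
    string_to_sign = f"{verb}\n\n\n{expires}\n/{BUCKET}/{KEY}"
    digest = hmac.new(SECRET_KEY.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return (f"https://{BUCKET}.s3.amazonaws.com/{KEY}"
            f"?AWSAccessKeyId={ACCESS_KEY}&Expires={expires}"
            f"&Signature={signature}")

get_url = presign("GET", 1175139620)
head_url = presign("HEAD", 1175139620)
# The two URLs are identical except for their Signature parameter.
```

A URL signed for GET will return 403 Access Denied if used for a HEAD request, and vice versa.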
Whenever a download is scheduled with BITS for a file URL, BITS first makes a HEAD request in order to get information about the file, like its size. It then makes one or more GET requests in order to download the file itself. However, we can only give BITS a single URL per file, so which presigned URL do you give it? Whichever one you choose, one type of request will succeed and the other will get a 403 Access Denied response.
In part 2 of this post, we will look at the solution to this problem and how we can work around the way BITS operates by essentially becoming a middle man in the communications process.
In part 1 of this series we talked about the pain points of integrating BITS with Amazon S3. In this part we will look at how to overcome the issue described there.
At this point, we obviously cannot change the way BITS works, and we cannot change how Amazon S3 secures its resources either. What we can do, however, is become a middle man in the process. In other words, instead of downloading a file from Amazon S3 directly, why not download it from another service (i.e. a proxy service that acts on Amazon S3’s behalf) that knows how to deal with these intricacies? The following illustrates all the components with the proxy service taken into account.
So now, instead of giving BITS a file URL that points to S3, we give it a URL that points to our proxy service.
This is the approach we took and it worked out really well. You still have to generate both a presigned URL for the HEAD request and one for the GET request, but when BITS makes either request to our proxy service, we can respond appropriately:
- When BITS makes a HEAD request to the proxy service, we make a GET request to Amazon S3 and return the obtained response back to BITS.
- When a GET request comes in, we simply redirect (302) to the actual Amazon S3 presigned GET URL, and S3 responds to BITS directly.
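The two behaviors above can be sketched with a small HTTP handler. This is a minimal illustration in Python rather than the WCF service we actually built; the presigned URL is a placeholder, and the HEAD branch stubs out the S3 round trip with fixed metadata instead of really contacting Amazon:

```python
import http.server
import threading

# Hypothetical presigned GET URL, generated elsewhere (one per verb).
PRESIGNED_GET_URL = "https://my-bucket.s3.amazonaws.com/test.txt?Signature=GETSIG"

class S3ProxyHandler(http.server.BaseHTTPRequestHandler):
    def do_HEAD(self):
        # In the real service we would call S3 here (using the HEAD-signed
        # URL) and copy back the file metadata. Stubbed for illustration.
        self.send_response(200)
        self.send_header("Content-Length", "1024")
        self.send_header("Accept-Ranges", "bytes")
        self.end_headers()

    def do_GET(self):
        # Redirect BITS straight to the presigned GET URL; S3 serves the bytes.
        self.send_response(302)
        self.send_header("Location", PRESIGNED_GET_URL)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the sketch quiet

def start_proxy(port=0):
    """Start the proxy on a background thread; port 0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), S3ProxyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

BITS follows the 302 transparently, so from its point of view it is downloading one ordinary URL the whole time.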
Initially, we hosted the proxy service within the process of the client application. This presented some drawbacks: what happens if the user kills the client application while we are downloading? BITS would encounter a transient error and keep trying to reestablish communications, but only for a limited amount of time. What if the application stayed down longer than that? So we decided to host the proxy service in a Windows Service application on the local machine. That way, it was always up, and it would automatically start on system reboot.
I hope you found this post helpful. When we encountered this issue, we searched for a solution and found very little information, aside from a post here or there that eventually led us in the right direction. Although a basic WCF service implementing this functionality is rather simple, I will be posting the code for the proxy service in the near future, so please check back.