I've been a long-time paying customer of CrashPlan. I use it to back up my most important files from my home networked-attached storage (NAS) device, such as photos, important documents, etc. That means I have these files duplicated, both on my NAS (two hard drives in mirroring mode), but also up in the cloud, and this gives me great comfort that my important files will never get lost to a 'digital blackhole'.
I would run CrashPlan on my home server (a Mac Mini I have sitting in a closet), and it would monitor specified directories. Whenever these directories change (files added, modified, or deleted), CrashPlan would make sure that the changes were mirrored up into its cloud. So all I had to do was configure my backup sets (one for photos, one for important docs, etc), and then forget about it. This worked remarkably well for the most part, and I was more than happy paying the ~$100 a year it cost me.
Unfortunately, CrashPlan announced that this service was going away, and they were only focusing on the enterprise market. There are alternative options, but I have been too busy to investigate and pick one. In the mean time, my job has changed and I now work at Microsoft - so I thought perhaps I could spend a few hours exploring the Azure Storage Java SDK to build my own backup client that works in a similar fashion to CrashPlan.
As with all projects, the devil is in the detail, and in fact, the actual code to connect to and use the Azure Storage service is really quite minimal. The bulk of the work I have needed to do is embed a database to record the state of the storage, so that on application shutdown / restart the state is able to be known and we don't need to either query remotely or (stupidly) re-upload all files again!
Also, as alluded to in the headline, this is a 'part one' post - I plan to write more as the app evolves. The source code is available online, and I would love to see others commit to it. At present, the only code that is written is the backup engine. That means there is currently no user interface, or even ability to download partial or full backups. So, here's a small list of tasks remaining to be done:
- There needs to be some mechanism to communicate with the backup engine. I'm wondering if we embed a web server to expose the actions as a REST API, or if we simply have a socket server. Anyway, the backup engine needs some way to be communicated with. There is a bunch of API to still be developed:
- Adding / removing / configuring backup sets
- Full or partial restoring of a backup set from the cloud to the local file system
- The backup engine should run in a background service, so that it is able to always sync updates to the cloud.
- There needs to be a client-side user interface built for calling the API provided by the backup engine. This might be JavaFX, or a web frontend.
Lots of hacking still to do! :-) I'll do some posts in the coming days about various aspects of the implementation, but for now, if anyone is interested in helping to build out any of these features, please let me know!
Thoughts on “Building a cloud backup app in Java with Azure, part one”