Backblaze one year review: 70% data loss unexplained, mediocre software, distrust

tldr;

This article is a review and record of my unfortunate experience with Backblaze Personal Backup service, software, and support. If reading quickly, please go down the page to read the conclusion first and then any other sections after. It will take approximately 18 minutes to read the entire article.

About Backblaze

As an IT professional, I am regularly called upon for my opinion on both personal and business services that need a reliable solution and accordingly, when I get the opportunity to try out a service, I like to take advantage of the chance and have personal experience with the product to make the best recommendations later.

Backblaze promises “Never lose a photo, video, or file again” and “Cloud backup made easy and automatic,” and as a special offer, a six-month Backblaze trial was included in February 2017’s Humble Bundle. Having made use of Backblaze’s hard drive data in the past, I thought this would be a good occasion to assess their service and platform. When the six months passed, I elected to pay for an entire year to review the service comprehensively.

PC Specifications

The PC I have been running Backblaze on uses Windows 10, an i7-4790K non-overclocked CPU, 32 GB of RAM and six hard disks comprising 6 TB of data total. The disks use NTFS, MBR, separate drive letters and are in a non-RAID configuration. Two of the disks are 500 GB SSDs and the others are HDDs of assorted sizes.

Disaster strikes

At the end of January, I was moving data from one hard drive to another. During the move, the transfer speed suddenly dropped, and a dreaded cyclic redundancy check error appeared. After a reboot, the disk no longer mounted, and I quickly realized it had failed completely.

I had noticed that the same failed disk contained some bad sectors just a few months ago, so this failure was not a shock.

With 1.5 TB of data on the failed disk, this was my occasion to try a full drive restore with Backblaze. With a new hard drive ordered, I set off to the Backblaze website to get the restore process underway.

Trying to restore data

To restore data from the web on Backblaze, one selects folders to produce a non-compressed ZIP archive, up to 500 GB at a time. To start restoring my disk’s files, I selected about 70 GB, and let the server produce the archive.

Uncharacteristically quickly, I received an e-mail indicating that the restore archive was ready to download, but I immediately noticed that the size of the archive was only 1.8 GB. Downloading the archive confirmed that most of the files were missing.

As there had been no error message to suggest a problem, I attempted to restore a single file that was appearing on Backblaze but missing in the archive. Within seconds of attempting the single file restore, I received an email with the subject: “We weren’t able to complete your restore request” and on the Backblaze website, the restore showed as Failed.

Backblaze: We weren't able to complete your restore request.

Backblaze showing failed restore

Dealing with support

Before this event, I had sent Backblaze seven support requests. In one, the upload process was stuck on “Producing file lists”; in another, files were not showing up on the website despite being uploaded (Backblaze says it takes 48 hours, in reality it can take weeks); and in a third, I made a feature request for the expected behaviour of a notification area-based icon (more about that in the software section below).

In my penultimate request in December, I described how after a reboot, the software had appeared to have lost its place, starting the amount of uploaded data from zero, and appeared to be re-uploading files that were already there. Backblaze’s response was that this behavior was due to a “self-healing” process and after uploading pre-existing files for about a day, the software seemed to be back on track. You will hear more about “self-healing” in a moment.

To be concise, the below does not include all the text of my support request. You can click on the hyperlinks in each section or see all the support tickets on my other post.

I began my request by explaining my situation:

But the restore [archive] is only 1.8GB, and in the zip archive most of the files are missing. I selected some of the files that are missing manually, and the restore took only a few seconds to prepare and shows up as “Failed”. Is there anything I can do?

I received a response 12 hours later:

In order to see what’s going on with your backup we’ll need to get the password for your account.

I am pleased Backblaze needs my password to get into my account, but I would hope they could already look at their server logs and see the errors. But regardless, after changing my password, I provided more insight and exact examples of the missing data:

I’ve done more attempts at restores, and for the most part, no restores seem to be complete. In one case I selected a whole group of files at 75GB, and got back a 128KB zip with a bunch of text files. Trying to restore individual files from those folders results in a ‘Failed’.

As an example, if you attempt to restore T:\Lorax, which is an old video project of mine I did for one of my sister’s plays. The selected files are 8,344.64 MB, but the resulting zip is only 1.55GB, and attempting to restore the main file (also missing from the zip), T:\Lorax\Lorax Youtube 1080p.mp4, results in a Failed response.

Another example, T:\TapesEdit, which are my childhood cassettes I digitized and edited at one point. Selected: 7,409.11 MB, but the resulting zip is 463.7 MB. Attempting to restore one of the missing files in the zip like T:\TapesEdit\Flubber.wav, results in a Failed [notification].

Five hours later, I received a response:

After further review, it appears that several large files in your backup from the T drive are missing parts from their file. Typically when this happens Backblaze will go through a process called self-healing where the missing parts of a file will be reuploaded. However in your case it looks like 2 files from the 2 drive were never able to be backed up at all for some reason which kept the backup in an initial backup state. The self-healing only occurs once the initial backup is complete, so because your backup never left the initial backup state the self-healing never occurred to fix these files in your Backblaze backup.

I had noticed that the software was stuck on two files before (or maybe 508? Or 482? More on that shortly) but had yet to open a support case for that problem. The software gives no indication of why it would be stuck in this way, and no error message appeared. I would not have known it was “stuck” if I didn’t open the software regularly.

As mentioned before, I had seen this “self-healing” in the previous request. However, they go on to say there’s no hope for my files (emphasis mine):

Unfortunately because of this we do not have the several files from your T drive in your backup and there is not a way for us recover them as good versions of the files were never uploaded to our servers successfully. What appears when you download folders from the T drive now is the entirety of what is on our servers for the drive. This was a failure of the Backblaze system to work correctly with your computer and drives and if you did want to cancel your Backblaze service we would be happy to offer a full refund of your subscription. If you would like to continue with us then a fresh backup would be necessary to make sure that files are all backed up correctly.

The “several files” amounts to hundreds of files and at least 143 GB with more soon to be discovered. This answer sounds like they have done little investigation and are solely relying on the information I gave them and somehow blaming my “computer and drives” without even asking for the Backblaze software log files.

Additionally, if my options are to get a refund or start a fresh backup, why wouldn’t I take the refund and just sign up again? Furthermore, despite a major data loss, there is no further examination, no detailed description of the events leading to the failure, and no explanation of why this would not happen again.

Not being satisfied with this answer, I started doing my own research. After some testing, I found that the data loss affected all drives, not just the one that failed:

I did some further tests tonight, and the data loss is not limited to the T drive. For example, trying to recover a folder of 13.4GB from the D drive, results in a 43.08MB zip file. In addition, I did manage to determine what’s left of T on Backblaze and I’ve found that only 423GB is still on the server, which means 1,103.12 GB has been lost from T.

I also mentioned that I had seen the self-healing previously, providing the support request numbers back in December. Next, I looked at the software’s log files to see if I could figure out why the two files were stuck, particularly since Backblaze insists they are the root cause of why the data had gone missing:

Looking again through bztransmit23.log, I see “ATTEMPTING TO DECLARE VICTORY on initial upload” and then “BackupSummaryUserWouldSee=Selected_1,508,660_files_/_6,028,700_MB__Remaining_508_files_/_3,728_MB
NOT DECLARING VICTORY on initial upload – too much left”
However, the remaining files that are noted in the log never goes lower than 482, but the UI only shows two (and you also mentioned that there’s two).

I have zipped/attached my full logs to this message and hopefully you can figure it out better than I.
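For anyone wanting to track the same numbers in their own logs, here is a minimal sketch in Python of pulling the counts out of the summary line quoted above. The pattern is inferred only from that one quoted line, and the function and field names are my own invention, so treat the format as an assumption rather than a documented Backblaze log layout.

```python
import re

# Pattern inferred from the single log line quoted above; the real log
# format may differ between versions, so this is an assumption.
SUMMARY_RE = re.compile(
    r"Selected_(?P<sel_files>[\d,]+)_files_/_(?P<sel_mb>[\d,]+)_MB"
    r"__Remaining_(?P<rem_files>[\d,]+)_files_/_(?P<rem_mb>[\d,]+)_MB"
)

def parse_backup_summary(line):
    """Return the selected/remaining counts from a summary line, or None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    # Strip the thousands separators before converting to integers.
    return {key: int(value.replace(",", "")) for key, value in match.groupdict().items()}

line = ("BackupSummaryUserWouldSee=Selected_1,508,660_files_/_6,028,700_MB"
        "__Remaining_508_files_/_3,728_MB")
summary = parse_backup_summary(line)
print(summary)
```

Running this over every summary line in a log would show whether the remaining file count ever drops below a given value, which is how the 482 versus 2 discrepancy could be checked.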

Since they did not request my log files, and their explanation seems to be based on the backup status screen on the website, I decided to attach them on my own, hoping for a better explanation:

Is there any explanation for how this happened and why it won’t happen again? From my perspective, this plane fell out of the sky, half the people died, and the solution is to build another one and hope for the best? I already know that at least a terabyte is gone and I’ll gladly provide any info or data you need to better analyze and figure this out, but from this side, it seems like you guys don’t care to know or you do know and it’s some sort of secret.

Finally, I revealed that I had not exclusively trusted Backblaze with the only copy of my data and that I wish to see their self-healing in action:

I have another backup of the T drive here, which was done the night before this all happened and I’ve restored it to a new disk tonight. With my data back, I would like to test whether this self-healing solution will work.

Three days later I received a response (emphasis mine):

Your description of self-healing is correct – large files from your backup on our servers are missing chunks of the file and Backblaze goes through the self-healing process to correct this and make files available once more via the Restore browser by attempting to reupload the missing chunks of the file(s) in question.

My request to know more about the stuck files was ignored, and there does not seem to have been any further analysis of my attached log files:

We can’t say exactly why this happened but it’s likely that either [sic] the drive had been malfunctioning in some capacity for some time that could have caused this.

This does not make any sense as most of my files were uploaded prior to this hard drive problem, and this massive loss of data affects multiple drives. Moreover, if one drive failing can destroy all your uploaded files, doesn’t that defeat the point of the backup?

The self-healing is attempting to complete but has not been able to and will not be able to because of the drive failure. I’m glad to hear you were able to recover the data off the T drive. The only way to test this would be to start a new backup which will rebuild the data files that keep track of the hashes of these chunks to prevent this from happening in the future.

I had restored all the data to a new drive using my own backups, which they acknowledge. However, in their prior response, the only thing preventing self-healing was two stuck files. But now they claim that self-healing won’t work, and I need to start again, even with the stuck files no longer being an issue. Once again, their explanation does not make sense.

Furthermore, how would you restart your backup from the beginning if all your data was already lost by Backblaze?

This final message came on a Saturday, and by this time I had already concluded that the data loss affected every drive, and nearly all the data was not restorable on Backblaze. By Monday, they had automatically closed the ticket.

There has been no follow-up, no investigation, and no further communication from Backblaze since.

Backblaze made it precisely clear how much they value a customer and their data, and that is the full value of a refund, exactly 50 US dollars, if you ask for it.

How much data lost?

I meticulously went through all six of my drives, noting what Backblaze’s website thinks the size of the restored files should be and then comparing that with the actual size after requesting the files.

I had 5884.73 GB backed up to Backblaze, and of that, only 1683.98 GB was recoverable, with 4200.75 GB lost. That is 71.38% of all my files that are gone without explanation.
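The arithmetic behind those figures can be checked with a few lines of Python. The two input values come straight from the paragraph above, and the TB conversion uses the 1 TB = 1024 GB convention from the technical notes; the variable names are just for illustration.

```python
# Figures taken from the paragraph above (GB, with 1 TB = 1024 GB).
backed_up_gb = 5884.73
recoverable_gb = 1683.98

lost_gb = backed_up_gb - recoverable_gb
lost_percent = lost_gb / backed_up_gb * 100

print(f"Lost: {lost_gb:.2f} GB ({lost_gb / 1024:.2f} TB)")
print(f"Share lost: {lost_percent:.2f}%")
```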

Were the files there before?

I was curious if Backblaze had lost my data after it had been on their servers for a time or if the files had never transferred properly in the first place.

Thankfully, to answer this, I had previously made use of my Backblaze account to quickly transfer several single files (500 MB and 3 GB respectively) and one large folder of Messenger install files (6.3 GB) by restoring to a remote computer that others could download from. An attempt to restore the single files was completely unsuccessful, and the folder (previously successfully restored on December 7th) also failed, as only 176 MB of the data now remains.

Neither the files nor the folder, which previously restored successfully and are now unrecoverable, were stored on the drive that failed.

This is confirmation that files previously uploaded, stored, and fully restorable have since been damaged on Backblaze’s servers and lost permanently by the company.

Backblaze Software

There is a lot on the surface that could be discussed regarding the Backblaze software: a strange drop-down menu that requires you to hold down the mouse to select anything, no keyboard support in the main Control Panel, backup exceptions that require editing of a text file to include a drive letter, and the lack of being able to back up files that are open/locked by Windows.

However, as Backblaze insists the software caused the loss of my files, let’s review some of the fundamentals of the software, as it does not follow the normal software design/engineering rules of Windows applications, and the result is confusion, strange behaviours and significant performance problems.

Notification Area/System Tray icon

Now and then, the Windows taskbar/explorer needs restarting manually, or it restarts on its own. When this happens, the applications that use the notification area (also known as the system tray, by the clock) need to re-add their icon or the icon will no longer appear. Mercifully back in 1998, Microsoft added an easy and uncomplicated way to re-add the icon, and I remember as a kid being particularly excited, rapidly adding this feature to all my self-made programs. Today, all Windows-supporting frameworks have this function as a feature that is built-in.

But when I started using Backblaze, any time Explorer restarted, the icon for Backblaze in the notification area would vanish. Eventually I got annoyed enough and sent their support a request, along with a Microsoft developer link to explain how they could add the feature into their software. The next day I received word that they had made this change and shortly after an updated version of Backblaze arrived with a restore-able notification area icon.

Although I was pleasantly surprised this was added so quickly without further fuss, realistically all Windows software that operates in the notification area has been expected to do this since 1998.

Missing version, description and information resource

Backblaze’s software runs multiple processes and updates them to new versions often. Do you want to know what those processes are, or what version you’re using? Don’t plan on opening up the properties of the files, or checking Task Manager to find out, as none of the Backblaze programs have any of the expected information (known as a VERSIONINFO resource in Windows).

(For current users: the About window hides in the menu on the notification area icon.)

Loose Threads

Modern software uses a concept known as threads to perform many actions at once, and for the scenario of backup, you use threads to maximize the speed of transferring data by sending multiple files (or chunks of files) all at the same time.

Backblaze offers a non-optimal form of threading for both uploading and downloading. For uploading, Backblaze threading is enabled by default and it would be painfully slow otherwise, as the software divides up files larger than 30 MB into small 10 MB file chunks and then uploads each chunk separately.

However, Backblaze’s concept of threading on Windows is employed poorly. Unlike on Unix-based/inspired operating systems like Mac OS or Linux, starting new processes in Windows is performance-intensive, so implementing threads this way should be avoided in most situations. Instead, threads are expected to be implemented in the same process the program is running in.
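To illustrate the difference, here is a minimal sketch in Python of chunked uploading with worker threads that all live inside one process. The 10 MB chunk size comes from the description above; `upload_chunk` is a hypothetical stand-in for a real network transfer, and nothing here reflects how Backblaze's own code is written.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB chunks, as described above

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split a bytes object into (index, chunk) pairs."""
    return [
        (index, data[offset:offset + chunk_size])
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]

def upload_chunk(indexed_chunk):
    """Hypothetical stand-in for a network upload; returns bytes 'sent'."""
    index, chunk = indexed_chunk
    return index, len(chunk)

def upload_file(data, workers=4):
    """Upload all chunks using worker threads inside this one process."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The same threads are reused for every chunk; no process is
        # started or stopped per chunk.
        return dict(pool.map(upload_chunk, split_into_chunks(data)))

sent = upload_file(bytes(35 * 1024 * 1024))  # a 35 MB file of zero bytes
print(sent)
```

The pool keeps its threads alive between chunks, so the per-chunk cost is only the transfer itself rather than process start-up and configuration file reads.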

The Backblaze software has been designed against these Windows fundamentals: for uploading, they have 21 identical “transmit” executables, named sequentially from 00 to 19, and the ‘original’ bztransmit.exe. Then all these files are duplicated with 64-bit versions.

Backblaze threading files

For the Backblaze downloader tool, the files follow the pattern with the tool duplicated 13 times for 13 threads:

During the upload process, these thread [worker] processes start from nothing, close, start again, with every 10MB file chunk they transfer, writing and reading a different configuration file each time, thereby unnecessarily reducing computer performance. Using Process Explorer, you can see this in action, the green highlighted processes are starting, red highlighted processes are ending:

Loose Threads, Unresponsive Edition

Staying responsive to the user is an important rule for software and to do this, the above-mentioned threads do ‘work’ in the background, leaving the user free to keep using the application, and most importantly, being able to cancel the operation in progress.

The Backblaze software ignores this key necessity, and as a result the windows consistently dim and are then deemed unresponsive.

For example, when Backblaze does a full scan for new files, within seconds the window[2] stops responding and Windows will prompt you to forcibly close it. In the year I have used Backblaze, I never saw the progress bar get past the beginning:

In the Backblaze Downloader, once a download starts, any interaction causes the window to go unresponsive, making it impossible to see the speed of the transfer and making it impossible to cancel the download without fighting with the window (additional software used below to highlight mouse actions):

Backblaze Downloader goes unresponsive
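As a sketch of the expected pattern, the Python below runs the long task on a background thread and checks a cancel flag between items, so the thread that would own the window is never blocked. The file list and the per-file delay are placeholders I made up; a real application would do this with its UI framework's own worker facilities.

```python
import threading
import time

def scan_files(paths, cancel_event, progress):
    """Background work: checks for cancellation between items."""
    for path in paths:
        if cancel_event.is_set():
            return
        time.sleep(0.01)  # placeholder for real per-file work
        progress.append(path)

cancel_event = threading.Event()
progress = []
paths = [f"file_{n}" for n in range(1000)]  # hypothetical file list

worker = threading.Thread(target=scan_files, args=(paths, cancel_event, progress))
worker.start()

# The "UI" thread stays free; here it simply waits briefly, then cancels.
time.sleep(0.05)
cancel_event.set()
worker.join(timeout=1)

print(f"Scanned {len(progress)} of {len(paths)} before cancel")
```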

The NeverEnding Polling

Windows supports multiple ways for processes to send information back and forth directly in memory (named pipes are a personal favourite), but despite Backblaze always running several processes, none of these communication methods appear to be used. Instead, Backblaze software reads several XML files containing the backup status from your hard drive constantly, even when no backup is in progress and the files have not changed. See below the various parts of the Backblaze Control Panel with their associated XML files and contents highlighted:

Using Process Monitor, let’s observe this action, from bad to worse to ugly. First, when the backup is not running or paused, the Backblaze Service is parsing the same XML files over and over, every ten seconds:

Process Monitor showing Backblaze Service reloading XML every 10 seconds

Then it gets worse: the Backblaze Control Panel, when opened, reads the same XML files every second:

Process Monitor showing Backblaze Control Panel reloading the same XML files every 1 second

Finally, while uploading to Backblaze, the re-reading of the XML files speeds up to virtually no delay (this is in addition to all other reads/writes needed to do the backup):

Process Monitor showing Backblaze processes going insane reloading XML files during backup process

The good news is that Windows will cache most of these reads, but if you are low on memory or doing intensive work, your performance will needlessly suffer from running the Backblaze software. Additionally, if you are on battery power, this infinite loop polling and parsing is going to lower your available battery life.
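Even if polling were kept, the repeated parsing could be avoided. The sketch below, in Python, re-parses a status file only when its modification time or size has changed. The class name, file name, and XML attribute are all made up for the demonstration and are not Backblaze's actual files.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

class CachedXmlStatus:
    """Re-parse an XML status file only when it has changed on disk."""

    def __init__(self, path):
        self.path = path
        self._stamp = None
        self._root = None
        self.parse_count = 0

    def read(self):
        info = os.stat(self.path)
        stamp = (info.st_mtime_ns, info.st_size)
        if stamp != self._stamp:
            # File is new or modified: parse it and remember its stamp.
            self._root = ET.parse(self.path).getroot()
            self._stamp = stamp
            self.parse_count += 1
        return self._root

# Hypothetical status file; the element and attribute names are made up.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "status.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('<status remaining_files="2"/>')

    status = CachedXmlStatus(path)
    for _ in range(10):  # ten "polls" of an unchanged file
        root = status.read()
    remaining = root.get("remaining_files")
    parses = status.parse_count
    print(remaining, parses)
```

This still checks the file's metadata on every poll; a change notification from the operating system, or one of the in-memory communication methods mentioned earlier, would remove the polling loop altogether.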

Restoring files

Although restore is a key marketing point for Backblaze, most notably being able to request a USB or hard drive with your data sent to you, the restore experience on the website is utterly frustrating at times due to its lack of standard file search/browsing features.

Search

Each time you visit the restore page of the website, there is a perfectly reasonable waiting period to load up all your files. However, typing in the search field and pressing the enter key does not perform a search; instead, the restore page reloads from scratch, loading all the files again with no search results displayed. To perform a successful search, you need to use the mouse and click on the search button.

Browsing folders

To find a file based on a folder, you make use of a standard tree-hierarchy user interface. However, if a folder name is too long to display, the name is truncated with an appended ellipsis, with no way to show the full name. (The name does not appear while hovering or in the source code of the page.)

Additionally, there is no way to search for a folder; you are stuck finding folders yourself by browsing.

Browsing folders on Backblaze being unable to see the name of the folder

Browsing files

Once you find the folder you want, browsing the files can be even more difficult. There is no way to sort by Name, or Size, or Date, and although sometimes the listing appears to be in alphabetical order, in some folders it seems completely random or the alphabetical order restarts half way through:

Browsing files in nonsortable random order on Backblaze

Conclusion

Backblaze lost file pie chart

I uploaded nearly 6 TB in total from 6 individual internal hard drives onto Backblaze. When one drive failed, I attempted a full restore and found that not only was most of that drive not restorable, but Backblaze had lost 4.2 TB, or 71% of all my files.

Backblaze’s support team acknowledged the issue but could not provide a solution or explanation that fit the facts, blaming my failed drive for their software being unable to “self-heal”/re-upload my formerly intact files from all my six drives, ignoring my follow-up questions, and refusing to do any investigation.

Support proposed I request a refund or alternatively remove and restart my backup from the beginning. After I revealed that I had recovered the lost data from my own backup and sought to see their purported “self-healing” fix the problem, I was told that was not an option. My support request was closed automatically, and no further communication was received.

Given Backblaze’s insistence on their ability to reconstruct data, and “focus on ensuring data integrity”, my expectation is that any data loss of this magnitude would be met with full transparency, a team of engineers wanting to research and scrutinize every detail, daily updates, and a full report explaining what went wrong. Instead, left without any rational explanation of why Backblaze failed catastrophically, or of what they plan to do to stop it from happening again, I find any of their reliability claims dubious at best.

Backblaze’s software is not designed to operate the way Windows expects, which causes overall computer performance to suffer, makes the user interface go unresponsive, and makes the entire service appear unprofessional. Additionally, restoring files is made challenging by the lack of search and browsing features, and the website blissfully refuses to acknowledge when it’s unable to restore multiple files, leading users to think their files are safely backed up when they are not.

Based on my experience with Backblaze, they made it precisely clear how much they value a customer and their data, and that is the full value of a refund, exactly 50 US dollars, if you ask for it.

Lastly, as Backblaze refused to explain how they lost most of my files, I think I might have figured that out on my own.

Backblaze, self-heal thyself

Backblaze self-healing ambulance

In support requests and throughout this article, Backblaze cites “self-healing” repeatedly as an explanation. From my perspective, any “self-healing” should only be necessary in the situation, which should be extremely rare, when a single bit is damaged and unrepairable on the hard drive. Indeed, five years ago in a Reddit AMA, Backblaze said:

We wrote a “self-healing” functionality that checksums every single file on your system before it is ever uploaded. Then, our system constantly checks every file in our entire storage farm and makes sure that the file we have is exactly the file you had on your system. If it ever doesn’t match, we automatically reach back out to your system and upload that piece again.

For this reason, several times a year I validate the integrity of my files and did so immediately after Backblaze lost my data – there were no errors found. In addition, my Backblaze support request in December involved files re-uploading to the service that I verified were identical to those already on the service and fully recoverable. Support claimed this was normal for “self-healing”.

This “normal” response sounded questionable until last month, when Backblaze posted the following in another reddit thread:

Way more likely than a cosmic ray, sometimes the Backblaze technicians decide it is time to retire a pod or vault in the Backblaze datacenter because the drives in that pod or vault are too small and no longer a good financial decision to use them. The techs start the process by FIRST having that pod or vault stop accepting uploads. Next, the technicians issue a special command that asks all the clients that are still phoning home to please retransmit all files they have stored on that one pod or vault to somewhere else in the Backblaze datacenter. But realistically only 90% of the clients will correctly respond and do this, so the final step to decommissioning a pod or vault is the system copies the remaining 10% of the data off to a special archive area on ANOTHER vault.

They go on to explain that this re-upload process is an optimization to lower costs, by using the customer’s own computer to do the work of moving the data to the new location.

Furthermore, Backblaze’s subreddit has plenty of examples of this re-uploading of files with no resolution. From 2015, we see multiple people experiencing re-uploading occurring, which was supposedly ‘fixed’; from June 2017, “files uploaded many times,” the support people claim it’s “caused by the distance from data server”; the latest from November 2017, “my pc is re-uploading about 500GB of video files that haven’t changed,” with several more people offering confirmation that the re-uploading is happening to them too.

I postulate that sometime at the end of December, the aforementioned pod retirement/re-upload process was triggered on the bulk of my files. Unfortunately, a human, hardware or software problem occurred and as a result, Backblaze lost most of my data and possibly the data of the “10%” of customers whose software did not re-upload.

This theory explains why Backblaze considers “self-healing” to be normal, insists it was the cause of losing my files, and why no further investigation was deemed necessary.

The idea of distributing the workload of maintenance to the users is not a bad one, but this design should be made perfectly clear when you sign up, and there should be an option to opt out of it. If this truly was the cause of the loss of over 4 TB of my data, I should have been informed immediately, certainly better compensation should have been offered, and no further work using this method ought to be done until both the cause is established, and a provable solution implemented.

If you are a current Backblaze customer or prospective customer

My experience leads me to consider trusting data on Backblaze to be a game of Russian roulette and as such, you will need another service or local backup as you cannot trust your data will still be available on Backblaze when you need it. I would recommend regular downloads of your files to verify your files are undamaged. This needs to be done manually, as Backblaze will not give you any notification that any part of the restore has failed. You should also expect to be re-uploading your files regularly whenever they do retire a “pod” of hard drives.
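One way to do that manual verification is to compare a downloaded restore against the local originals by checksum. The following is a minimal sketch in Python under the assumption that both copies are available as folders on disk; the function names and demo files are my own, and it treats the local originals as the source of truth.

```python
import hashlib
import os
import tempfile

def sha256_of(path, block_size=1024 * 1024):
    """Hash a file in blocks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def compare_trees(original_dir, restored_dir):
    """Return (missing, damaged) relative paths, using originals as truth."""
    missing, damaged = [], []
    for folder, _dirs, files in os.walk(original_dir):
        for name in files:
            original = os.path.join(folder, name)
            relative = os.path.relpath(original, original_dir)
            restored = os.path.join(restored_dir, relative)
            if not os.path.isfile(restored):
                missing.append(relative)
            elif sha256_of(original) != sha256_of(restored):
                damaged.append(relative)
    return sorted(missing), sorted(damaged)

# Small self-contained demonstration with made-up files.
with tempfile.TemporaryDirectory() as original, tempfile.TemporaryDirectory() as restored:
    for name, data in [("a.txt", b"intact"), ("b.txt", b"full"), ("c.txt", b"gone")]:
        with open(os.path.join(original, name), "wb") as handle:
            handle.write(data)
    with open(os.path.join(restored, "a.txt"), "wb") as handle:
        handle.write(b"intact")
    with open(os.path.join(restored, "b.txt"), "wb") as handle:
        handle.write(b"fu")  # truncated copy
    missing, damaged = compare_trees(original, restored)
    print("missing:", missing, "damaged:", damaged)
```

Pointing `compare_trees` at a real drive folder and the extracted restore archive would list exactly which files the restore silently left out or returned incomplete.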

Technical notes

  1. All data units are expressed in powers of 2, 1 TB = 1024 GB, 1 GB = 1024 MB.
  2. Backblaze has different scheduling options for when they upload files. The screen capture shows the manual or “Only when I click <Backup Now>” option. The default option is “Continuous” and does not show the unresponsive behaviour. However, I have experienced issues with the Continuous mode, multiple times support has suggested I turn it off, and their own tips (“Rescan Your Hard Drive to Check for Changes”) suggest holding down the Alt key and pressing the Restore Options… button, which triggers this window and unresponsive behaviour to occur.
  3. Here is a per-drive breakdown of the 4.2 TB that was lost by Backblaze:
    Backblaze per-drive failure breakdown

 


Posted on March 9, 2018, in Backblaze, Computers and Internet, Personal.

  1. I’m tamping fuming raging. This has just happened to me as well.

  2. Hello, can you give more services like this, you know that are fully working? Thanks

    • For the past few weeks I’ve been trying CrashPlan. It’s definitely not been perfect, but I haven’t had issues that I couldn’t find information/solutions or figure out a workaround. We will see how it goes over a longer period though.

  1. Pingback: Backblaze support transcripts | Jonathan Kay, MessengerGeek
