GitLab down after it deletes wrong directory and backups stumble

Folder wiped.

GitLab.com, a site similar to GitHub that hosts source code repositories for software developers, has come a cropper after an employee accidentally deleted a directory on the wrong server.

GitLab

GitLab.com is currently offline due to issues with our production database.

We're working hard to resolve the problem quickly. Follow GitLabStatus for the latest updates.

The Register describes what went wrong:

Behind the scenes, a tired sysadmin, working late at night in the Netherlands, had accidentally deleted a directory on the wrong server during a frustrating database replication process: he wiped a folder containing 300GB of live production data that was due to be replicated.

Just 4.5GB remained by the time he canceled the rm -rf command. The last potentially viable backup was taken six hours beforehand.

To be honest, that's the kind of mistake that anyone could make all too easily if the right safeguards aren't in place. But it's bad news for anyone who was relying on GitLab to look after their code and merge requests properly, and wasn't keeping their own backup.
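As an illustration of the sort of safeguard meant here, below is a minimal sketch in Python of a delete helper that refuses to run unless the machine it is on matches the host the operator names. It is not what GitLab uses; the function name `guarded_rmtree` and its parameters are hypothetical, invented for this example.

```python
import shutil
import socket
from pathlib import Path
from typing import Optional


def guarded_rmtree(target: Path, expected_host: str,
                   current_host: Optional[str] = None) -> bool:
    """Delete `target` only if we are on the host the operator named.

    `current_host` is a hypothetical override for testing; by default
    the real hostname is looked up with socket.gethostname().
    """
    host = current_host if current_host is not None else socket.gethostname()
    if host != expected_host:
        # Wrong machine: do nothing and tell the operator why.
        print(f"Refusing to delete {target}: on {host!r}, expected {expected_host!r}")
        return False
    shutil.rmtree(target)
    return True
```

Having to type the intended server name forces a moment's thought before anything destructive happens, which is exactly the moment a tired sysadmin needs.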

For, as the site explains, GitLab's own backups have just added to the headaches:

Out of 5 backup/replication techniques deployed none are working reliably or set up in the first place.

Although it would be easy to bash GitLab, it is trying to make the best out of a bad situation by showing remarkable transparency with its dev-head users.

Not only has the site set up a regularly updated Google Doc, containing details of the steps it is taking to try to recover data and resume regular site access, but it is even currently live streaming on YouTube so you can watch its employees slowly restore data where possible.

Remember this: don't feel too smug just because you remember to make backups. Yes, it's great that you make backups, but a backup can't be relied upon unless you have *tested* that it restores properly.
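To make the idea of a restore test concrete, here is a minimal sketch in Python, assuming the backup is a plain tar archive of a directory and that the source has not changed since the backup was taken. The helper names (`make_backup`, `tree_digests`, `backup_restores_cleanly`) are made up for this example. It unpacks the archive into a scratch directory and compares SHA-256 checksums file by file.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path
from typing import Dict


def make_backup(source: Path, archive: Path) -> None:
    """Write a gzip-compressed tar archive of everything under `source`."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=".")


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup_restores_cleanly(source: Path, archive: Path) -> bool:
    """Restore `archive` into a scratch directory and compare with `source`."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive) as tar:
            # Only extract archives you created yourself.
            tar.extractall(scratch)
        return tree_digests(Path(scratch)) == tree_digests(source)
```

On a live system the source will have moved on since the backup ran, so a real check would more likely compare the restored files against a checksum manifest recorded at backup time. For a database, it would mean loading the dump into a spare server and running queries against it.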

More details about the incident can be found on GitLab's blog.


3 Responses

  1. Bob

    February 2, 2017 at 9:53 am #

    First paragraph and last paragraph are remarkably similar. I'm guessing that it wasn't removed after editing.

    • Graham Cluley in reply to Bob.

      February 2, 2017 at 9:59 am #

      Thanks Bob. Now fixed. :) There's a story about how that error happened, which I might tell sometime soon… maybe I'll discuss it on the next Smashing Security podcast (we're recording one in a few hours)

  2. Elliot Alderson

    February 2, 2017 at 5:14 pm #

    when things like this happen here, we just re purpose that admin to watch youtube videos of Barney & Friends for 6 months. hasn't failed yet.
    ¯_(ツ)_/¯
