
Response and Retrospective on June Outage

By Brad Pliner on July 21, 2017

Recently, TherapyNotes suffered a virus outbreak and a resulting three-day outage. A TherapyNotes outage affects our tens of thousands of customers and, indirectly, their millions of patients. Just before this, we were looking forward to moving into our new office building, bringing on a number of new staff members, and expanding our services. Then, in an instant, our business was completely down. We persevered and made it through, but it was an extremely challenging process, and we do not underestimate the difficulties this event caused our customers. We deeply apologize for the setbacks caused by this outage. Rest assured that the honest concerns and frustrations you've expressed to us have not gone unheard.

What happened?

Early in the morning of June 26, 2017, our IT team detected a ransomware virus on our network. Files were being encrypted and were therefore inaccessible. We quickly shut down all servers to minimize the impact of the virus and to give ourselves time to assess the situation carefully.

Our forensic review determined that a server was most likely compromised through a vulnerability in Windows. Microsoft had released an update for this vulnerability only days earlier, but unfortunately it had not yet been installed on this particular server. Once that server was compromised, a variant of the Samsam ransomware virus started running on it and began encrypting files on all of our other servers, making them inaccessible to us.

For good measure, we hired a data recovery consulting agency with ransomware experience to review our circumstances. They confirmed our findings on the nature and impact of the virus and gave us confidence that we were taking the right steps to move forward.

We do not believe that TherapyNotes was specifically targeted. Hackers scan the Internet for any vulnerable server and hope to find a company that will pay their ransom demands. Many companies around the world were also hit by ransomware in the weeks surrounding our outage.

We did not pay the hackers anything to recover our data, and we were able to restore our live production data. We are confident that no data, such as medical records, credit cards, account information, or our own corporate files, was breached or lost. The only exception was practice logos, which practices can simply upload again.

What is ransomware?

Viruses used to be more about mischief and destruction, but now they are a money-making tool. Once a computer is infected with a ransomware virus, the virus begins encrypting its files. Unless you pay the hackers for the decryption key, those files are unusable. Typical antivirus software is not very effective against this type of attack. The hackers first gain access to a single computer on the network, and from there they deploy the virus. The virus is customized so that virus scanners usually can't detect it. (See Wikipedia article on ransomware.)

Due to a recent NSA leak, hackers have had access to many of the US government's powerful tools to break into computer networks. This is the reason for the major rash of ransomware viruses that have recently plagued the world. (See related NPR article.)

How did we respond?

As noted above, our first steps were to bring down the servers and then assess the damage. Our server infrastructure was significantly impacted, and a number of servers had to be rebuilt. However, we decided to rebuild every server to guarantee that the hackers had left behind no residual security holes, such as backdoors. We don't believe there were any, but we took every precaution. We also made major changes to our network architecture, isolating the various segments of our network more strictly and further locking down our user accounts. We rebuilt every workstation and server at our corporate office, and we reinstalled and reconfigured every back-end system from scratch, including our database servers, web servers, caching servers, and so forth.

This was no small feat, and it was amazing that we were able to get the TherapyNotes website back up in under three days (about 60 hours). Even after the main site was up, we still had a lot of work to do, especially on back-end systems. Our team worked almost nonstop for about two weeks, grabbing sleep where we could. Only very recently have our developers been able to resume work on TherapyNotes upgrades.

Because everything was rebuilt from scratch, we've had occasional outages or performance issues to work through, such as an overly restrictive network setting or various configurations that needed to be further tuned. Those have become less frequent and are hopefully behind us.

What could we have done better and what else are we going to do?

It would be foolish for any company (or any person) to go through an ordeal like this and not learn from it. We are determined not to be in a similar situation again. Security has always been our top priority, but in light of these events, here are some of the things we want to do better:

  • We are investing heavily in our backup and disaster recovery infrastructure, including purchasing new servers and backup solutions. Some changes have already been deployed, and more are being planned now. We are committed to being much better prepared should a similar situation arise in the future, so that we can resume operations more quickly.
  • We will continue to be very aggressive about locking down our network, granularly isolating servers and resources that don't need to talk to each other. This helps keep a security incident from spreading across the entire network.
  • Any security-related TherapyNotes software updates will be prioritized. This is unrelated to this specific incident, but we're leaving no stone unturned.
  • We need to improve our customer communication during outages. More on this below.
  • We already have third parties evaluate our systems for HIPAA and PCI compliance, but we will be engaging with a security firm to fully audit our network to give us another set of eyes.
  • We are going to further expand our IT team so we can make more progress, faster.

Communication

During the outage, TherapyNotes.com still loaded and showed an error message explaining that the site was down, along with links to Facebook, our blog, and our Help Center for more information. This was possible thanks to the Cloudflare service we discussed in a previous blog post. Before we adopted Cloudflare, visitors would have seen only a browser error when trying to reach our site, with no indication of what was going on.

Updates were posted on Facebook, Twitter, the TherapyNotes Help Center, and the blog, but not consistently across all of these channels. We are working on a plan to provide a single place to go for updates, and we acknowledge that updates should be more frequent.

Regrettably, we made a number of incorrect assessments along the way, expecting the servers to be up sooner than they were. As we worked on the servers, we kept hitting unexpected hurdles and underestimating the work in front of us. By being better prepared, we will be able to give more accurate estimates if a major disaster like this ever strikes again.

We are working on a clear status page that will provide a single place to find information about service disruptions, along with timely updates. During the outage, our support and marketing team members worked hard to respond to the massive number of social media messages and emails coming in. We have tens of thousands of users, so individual requests for updates were challenging to manage. In the event of a future outage, you will be directed to the status page.

What can users do?

In a Facebook message, we made some suggestions for how practices could be better prepared in case they are unable to reach TherapyNotes. (This is also a concern if your Internet connection goes down.) Our recommendations were in response to questions from our customers and were by no means intended to assign blame.

Suggestions included keeping a printout or download of your patient list with phone numbers, syncing your calendar to your phone so it's always available, and printing your calendar. We do not recommend downloading and backing up all of your notes locally, as that would make you more susceptible to data breaches. We recognize that you rely on TherapyNotes to secure and back up your data, and we are determined to maintain your trust.

We also encourage you to be secure. Use secure passwords, keep your computer updated, use antivirus software, set your computer to lock your screen after a period of inactivity, encrypt your hard drive, and be on the lookout for phishing attacks.

Moving forward

I could not be more grateful for my dedicated team, who endured sleepless nights, and for our amazing clients, who have been more supportive than I could ever have expected. During a multi-day service interruption, our clients were cheering us on, even sending us lunch and snacks. We even received pizza and supportive emails from competitors who recognized the challenges we had to endure. This outpouring of support caught me completely off guard and has affected me deeply.

As noted above, we are firmly committed to continuing to prioritize and invest in improved security, backups, and disaster recovery processes, both to minimize the likelihood of future outages or security incidents and to be better prepared when they occur.

After such a challenging ordeal, our new building is a welcome change of pace and a symbol of progress and moving forward. We are excited for what the future will bring for TherapyNotes.

Thank you all for your patience and understanding during the outage and for your continued support. We are extremely fortunate to have you as customers.


Thank you on behalf of the entire TherapyNotes Team,

Brad Pliner
President & CEO
TherapyNotes, LLC
