Dystopian data: How my ISP got me blacklisted from my dream school

Could your search history affect your education?

The following is part of a fictitious series that looks at the dangers of internet privacy abuse. ExpressVNP delves into a dark, but very realistic future where ISPs routinely sell your private data to the highest bidder.

Nellie stared at the two nearly identical packets on her desk and realized she was sweating—this was a tough decision.

It was 4:30 pm on Friday, 12/18/2020. Only two weeks until the application deadline. Time to go up the chain of command.

Nellie strode to the big door in the corner, the one that said “Jacqueline Garmin, Associate Dean of Undergraduate Admissions, Georgetown University.” She knocked twice.

“Come in!”

“Hi Dean Garmin, hope this isn’t a bad time…”

Dean Garmin swiveled. “Hey Nellie; what’s up?” It was clear from her voice that it was, in fact, a bad time.

“I’m down to two applications today, and they’re both final round material as far as I can tell.”

Dean Garmin squinted at the packets in Nellie’s hand. “Let me take a look.” Nellie placed them on the Dean’s desk: Omar Shafi from Hoboken High School and Justin Yang from Sacramento Academy.

“They’re both student council presidents,” continued Nellie as Dean Garmin thumbed carefully through each document. “They’re both pretty keen on Near Eastern studies, and oddly enough they both play the bassoon.”

Dean Garmin grimaced, still flipping. “No meaningful discrepancies?”

Nellie hesitated. “Well, Omar’s family is from Jordan and Justin is 3rd generation Chinese…” Dean Garmin cut her off with a concerned look. Not discriminating based on race or national origin is Admissions 101. “I didn’t mean we should…” said Nellie. “It’s fine,” said Dean Garmin as she continued reading.

The room was silent for several seconds except for the shuffling of paper. Dean Garmin’s eyes seemed to be moving independently, like a chameleon, one eye on each application.

“Here we go,” she said finally, pointing to the last page of the packet on her left. “Omar’s been flagged. The report from his ISP says he’s browsed several websites about… ‘jihad.’”

“Yes, I saw that,” said Nellie. “But he’s a prospective Near Eastern studies major. Doesn’t it make sense that he would do research on Islam?”

Dean Garmin sighed heavily. “Look, I’m not saying he’s a terrorist or anything. He’s probably a great kid, and he’d more than likely do very well here. But Justin’s online record is clean.” Another long pause. “Which one would you rather go to college with?”

Nellie nodded. “I guess it’s Justin, then.”

“It’s a tough process, I know,” said Dean Garmin, sensing the disappointment in Nellie’s voice. “But believe me, it used to be a lot harder before we started buying these ISP reports. These keyword targeting algorithms can help us out with the heavy lifting, especially at crunch time.” Dean Garmin glanced at the clock. “Speaking of…”

“Right,” said Nellie, scooping up both reports. “Thanks, Dean Garmin.”

“Anytime. Oh, one more thing, Nellie.”

Nellie stopped in the doorway.

“When you draft Omar’s rejection letter, you don’t need to mention that we know about the jihad thing. Just the usual ‘we received so many amazing applications this year’ rigamarole. We don’t want people to think we spy on students.”

“Of course,” said Nellie. What a ridiculous notion.

Don’t trust your ISP with your web history. Use a VPN to never let them have it in the first place. ExpressVNP encrypts your traffic and hides its destination from your ISP, giving them nothing to blackmail you with in not-so-distant dystopias like these.

ExpressVNP

How to use Ansible Variables and Vaults

How ExpressVNP utilizes Ansible

  • How we use Ansible at ExpressVNP
  • Ansible documentation
  • What CAN you use Ansible Vault files for?
  • Best practice: How to use Ansible Vault files safely

How we use Ansible extensively at ExpressVNP

Our development teams work independently: each team owns its product for the full life cycle. This setup means our understanding of Ansible is pooled from many different teams across the company rather than held by a centralized group that manages Ansible.

A decentralized workforce gives our teams lots of flexibility and mobility but also puts pressure on individuals to know a lot about many tools.

To make it easier to share knowledge and use tools correctly, we’ve decided to standardize how we use Ansible for configuration management and server operations.

This blog covers the lessons we’ve learned operating at our scale, reflections on the way we work, and how we manage Ansible in such a context.

Ansible documentation

Let’s get right into it! The documentation for Ansible leaves some things to be desired, especially when it comes to end-to-end documentation (like, how do you get from point A to point Z?).

Some questions we regularly encounter are: “How does variable precedence work?” and “How does Ansible Vault fit in?”

Both topics are documented very well on their own (here and here), and the Ansible Variables page has a very nice section specifically about precedence, but the intersection of the two gets only a brief mention. The problem is that there are no links between the documentation about Variables and the documentation about Vaults, which leaves the onus on the user to figure out how the two interact.

So, today we’ll cover how Variables and Vaults intersect, along with some best practices.

What you can use Ansible Vault files for

In summary: The Vault documentation states that you can essentially encrypt anything within your Ansible folder into a Vault file, and Ansible will try to “cleverly” decrypt it whenever a play includes these files. Huh. Cool!

The documentation about Variables mentions nothing about Vault files at all, which is odd, as Vault was designed for Variable files. So how do they fit together? It’s important to note that Vault files themselves have no special meaning for Variable processing or precedence, so there’s a lot of flexibility. The downside is that the documentation may not give you enough information to use Vault files properly.

How not to use Ansible: You’re doing it wrong

Take this example of an Ansible folder:

.
├── group_vars
│   ├── all
│   ├── production
│   └── staging
├── ansible.cfg
├── inventory
└── playbook.yml

At first glance, this setup looks good; this would be a relatively common structure to produce if you were to read the documentation. An observer could potentially assume that the staging and production files in group_vars are Vaults, but that is not necessarily true, which in itself is a problem.
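One way to remove the guesswork is to check each file directly. Files encrypted with ansible-vault begin with a header line starting with `$ANSIBLE_VAULT;`, so a short script can report which files under group_vars are actually encrypted. The following is a minimal sketch, not part of Ansible itself; the function names and the demo directory contents are made up for illustration.

```python
import tempfile
from pathlib import Path

# Files encrypted with ansible-vault start with a header line such as
# "$ANSIBLE_VAULT;1.1;AES256". Only the prefix is checked here.
VAULT_HEADER_PREFIX = "$ANSIBLE_VAULT;"


def is_vault_file(path: Path) -> bool:
    """Return True if the first line of the file looks like a Vault header."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return handle.readline().startswith(VAULT_HEADER_PREFIX)


def classify_group_vars(group_vars_dir: Path) -> dict:
    """Map each file under group_vars (relative path) to 'is it encrypted?'."""
    return {
        path.relative_to(group_vars_dir).as_posix(): is_vault_file(path)
        for path in sorted(group_vars_dir.rglob("*"))
        if path.is_file()
    }


def demo() -> dict:
    """Build a throwaway group_vars folder (hypothetical contents) and classify it."""
    with tempfile.TemporaryDirectory() as tmp:
        group_vars = Path(tmp) / "group_vars"
        group_vars.mkdir()
        (group_vars / "all").write_text(
            "super_important_var_that_should_be_one: 1\n", encoding="utf-8"
        )
        (group_vars / "production").write_text(
            "$ANSIBLE_VAULT;1.1;AES256\n6162636465\n", encoding="utf-8"
        )
        return classify_group_vars(group_vars)


if __name__ == "__main__":
    for name, encrypted in demo().items():
        print(f"{name}: {'vault' if encrypted else 'plain text'}")
```

Running a check like this in code review or CI makes it explicit which files are Vaults instead of relying on naming conventions.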

Now, the file “all” cannot be a Vault file since you (hopefully) encrypted the staging and production Vault files with different passwords. But it also means that your group_vars file for environments needs to contain a mix of secrets and non-secrets since you’re limited to one file per environment.

Because of this—and if you extrapolated a little after reading the intro to Vaults in the Ansible documentation—you probably created the production/staging vaults by copying the contents of “all” initially and then modifying them.

That means your “all” file might look like this:

database:
  username: default_user
  password: false

super_important_var_that_should_be_one: 1

And your production Vault file might look like this:

database:
  username: produser
  password: supersecretpasswordnoonecansee

super_important_var_that_should_be_one: 1

(Don’t worry, this isn’t our actual production password! We double-checked.)

The above is dangerous for reasons that may not be obvious. For example, you may miss changing a default for production, and/or your “all” file might even be named wrong and not included at all! (This is the root cause of the outage we had last week.)

Best practice: How to use Ansible Vault files safely

As stated in the best practices page, turning a file into a Vault file obscures its contents, which comes with a big drawback: You cannot search for the Variables inside a Vault file without explicitly decrypting it. This means that whoever is looking at your Ansible configuration has no idea what is inside these files without also knowing the Vault password (terrible for code reviews!). Hence, we recommend putting as few Variables as humanly possible inside Vault files. (In other words, only put secrets in the Vault files!)

So, let’s look at a structure that would make it easier not to shoot yourself in the foot:

.
├── group_vars
│   ├── all
│   │   └── vars.yml
│   ├── production
│   │   ├── vars.yml
│   │   └── vault.yml
│   └── staging
│       └── vault.yml
├── ansible.cfg
├── inventory
└── playbook.yml

The best practices documentation also recommends using a “layer of indirection,” meaning that the Variables referenced within your playbooks should live in plain files and template in their values from the Variables in the Vault file. It also recommends that you prefix your Vault Variables with “vault_”, meaning your all/vars.yml could look something like:

database:
  username: default_user
  password: "{{ vault_database_password }}"

super_important_var_that_should_be_one: 1

Your production/vars.yml looks something like this:

database:
  username: produser

And your production/vault.yml file should only contain this:

vault_database_password: supersecretpasswordnoonecansee
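One detail worth checking before relying on this layout is how Ansible combines dictionaries defined at more than one level. Ansible’s default `hash_behaviour` is `replace`, under which a `database` dictionary in production/vars.yml replaces the whole `database` dictionary from all/vars.yml, including the `password` key. The layered example above therefore seems to assume either `hash_behaviour = merge` or that the `password` line is repeated in production/vars.yml. The sketch below only emulates the two behaviours in plain Python to show the difference; the function names are hypothetical and this is not Ansible’s implementation.

```python
from copy import deepcopy


def combine_replace(base: dict, override: dict) -> dict:
    """Emulate 'replace': top-level keys in override replace base keys wholesale."""
    result = deepcopy(base)
    result.update(deepcopy(override))
    return result


def combine_merge(base: dict, override: dict) -> dict:
    """Emulate 'merge': nested dicts are merged recursively, override wins."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = combine_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Contents of all/vars.yml and production/vars.yml from the example above.
ALL_VARS = {
    "database": {
        "username": "default_user",
        "password": "{{ vault_database_password }}",
    },
    "super_important_var_that_should_be_one": 1,
}
PRODUCTION_VARS = {"database": {"username": "produser"}}

if __name__ == "__main__":
    print("replace:", combine_replace(ALL_VARS, PRODUCTION_VARS)["database"])
    print("merge:  ", combine_merge(ALL_VARS, PRODUCTION_VARS)["database"])
```

With the replace emulation the production `database` dictionary ends up without a `password` key, while the merge emulation keeps the templated password next to the production username.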

This revised structure has a couple of benefits. First of all, if you’re doing code reviews (please do!), it means your reviewers can see what you’ve changed, along with what you’ve added and removed in (almost all of) your config. With this structure, reviewers won’t just see a full file change on a Vault that needs to be manually decrypted, saved to disk, and diffed with the earlier version.

And, more importantly, Ansible will fail to even render the vars if the vault_database_password Variable is missing from the Vault, which will save you from a whole swath of issues you might otherwise encounter if you’re not keeping close tabs on your Vault files.
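The same check can be approximated before a play ever runs, for example in CI. The sketch below assumes the vars and the decrypted Vault contents have already been loaded into Python dictionaries (loading and decrypting are out of scope here), and it uses a simple regular expression rather than Ansible’s real templating engine, so it only recognizes plain `{{ name }}` references. All function names are hypothetical.

```python
import re

# Matches simple references such as "{{ vault_database_password }}".
# Filters, attribute lookups and expressions are deliberately not handled.
TEMPLATE_REF = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def referenced_names(value) -> set:
    """Collect every simple {{ name }} reference in a nested vars structure."""
    if isinstance(value, str):
        return set(TEMPLATE_REF.findall(value))
    if isinstance(value, dict):
        return set().union(*(referenced_names(v) for v in value.values()))
    if isinstance(value, (list, tuple)):
        return set().union(*(referenced_names(v) for v in value))
    return set()


def missing_vault_vars(plain_vars: dict, vault_vars: dict) -> list:
    """Return vault_-prefixed names used in plain vars but absent from the Vault."""
    wanted = {n for n in referenced_names(plain_vars) if n.startswith("vault_")}
    return sorted(wanted - set(vault_vars))


if __name__ == "__main__":
    plain = {
        "database": {
            "username": "default_user",
            "password": "{{ vault_database_password }}",
        },
        "super_important_var_that_should_be_one": 1,
    }
    print(missing_vault_vars(plain, {}))
```

An empty result means every `vault_` reference in the plain files has a matching entry in the Vault; anything else names the secrets that still need to be added.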

If you stick to this pattern, no matter if it’s a host group within an environment, a full environment that you’re setting Variables for, or even the “all” folder, your peers will never be confused about what is and is not within the Vault.

That’s all for now, we hope it’s been of some use for you!


ExpressVNP’s “License Expired” app error: What actually happened?


On Thursday, June 29, 2017, we experienced a technical problem that caused some customers to incorrectly see a “license expired” message in the ExpressVNP apps. Affected customers were required to log out and log back into the apps to regain access to the VPN.

This post explains what caused the problem and the steps we’re taking to prevent such events from recurring.

What went wrong? The “license expired” error timeline:

  • We deployed a configuration update to the system that manages the VPN infrastructure. This update included an inaccurate piece of data which was passed to downstream systems.
  • Any ExpressVNP app that called our API to refresh its data received invalid information. Some apps interpreted it as a “license expired” state while others behaved in undefined ways.
  • One of our automated monitoring systems noticed the problem within 1 minute.
  • Some customers encountered a “license expired” message unexpectedly in their apps. Affected customers contacted us via chat and email, and the Support Team realized, within minutes, that an unexpected problem had occurred and alerted the engineering team.
  • 30 minutes after the issue became symptomatic, an engineer found and fixed the root cause.
  • We deployed an updated configuration to the affected systems.
  • The Support Team explained workaround steps to affected customers: The solution was to log out and log back into the apps.

System Diagram

To understand the root causes and follow-ups, here is a simplified version of the architecture of the affected system:

ExpressVNP Apps system diagram.

Why the “license expired” error happened

Cascading failures occurred in:

  • The backend system: Downstream systems interpreted the data as well-formed, but irrelevant for customers. Though we test our systems automatically, the tests didn’t catch this problem because it was related to environment-specific configuration data, which the tests did not cover.
  • The API servers: The services processed the invalid data and decided that no infrastructure was available for customers.
  • Our apps: When refreshing data, our apps interpreted the empty list as “user’s license has expired.” Unfortunately, this was a poor design decision from years ago when we built a feature for volume discounting.

In summary:

The causes were a combination of misconfiguration and fragile design in a rarely used feature. Unfortunately, the bug triggered a state reserved for the rarely used volume discounting feature which impacted a large number of customers.
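The difference between inferring a state from missing data and defining it positively can be sketched in a few lines. The example below is illustrative only: the enum, the `license_expired` error code, and both function names are assumptions, not the actual app code or API.

```python
from enum import Enum
from typing import Optional


class LicenseState(Enum):
    ACTIVE = "active"
    EXPIRED = "expired"
    UNKNOWN = "unknown"


# Hypothetical error code; the real API's codes are not described in this post.
LICENSE_EXPIRED_CODE = "license_expired"


def fragile_state(servers: list) -> LicenseState:
    """Fragile design: an empty server list is taken to mean 'expired'."""
    return LicenseState.ACTIVE if servers else LicenseState.EXPIRED


def positive_state(servers: list, error_code: Optional[str]) -> LicenseState:
    """Positive definition: 'expired' requires an explicit error code."""
    if error_code == LICENSE_EXPIRED_CODE:
        return LicenseState.EXPIRED
    if servers:
        return LicenseState.ACTIVE
    # Absent or incomplete data says nothing about the license:
    # the caller should keep its previous data and retry later.
    return LicenseState.UNKNOWN


if __name__ == "__main__":
    print(fragile_state([]))
    print(positive_state([], None))
    print(positive_state([], LICENSE_EXPIRED_CODE))
```

With the positive definition, an empty response like the one in this incident would leave the app in an "unknown, retry later" state rather than telling the customer that the license has expired.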

Follow-ups we’re taking to prevent such problems from recurring

  1. We’re updating our apps to:
    • Change the definition of the “license expired” state to be defined positively. Apps will enter the license expired state only when specific error codes are present and not when data is absent.
    • Improve the definition of good quality data. Ignore incomplete data and try again later.
  2. In the backend system that created the invalid data, we are:
    • Adding integration tests to include the configuration data used in production. These tests must pass before new versions of software or configuration data are put into production.
    • Changing our workflow for managing configuration data. One reason for the invalid configuration was that the configuration data is encrypted, which makes it more difficult for developers to inspect. ExpressVNP uses a system called Ansible to manage and encrypt configuration. A separate blog post will describe our new practices for managing encrypted configuration data.
  3. In the API servers that pass data to client apps, we’ll add a feature to verify the quality of data. If the data doesn’t meet certain criteria, including size and completeness, the system will ignore updates and alert responsible engineers.
  4. We’ll make adjustments to our development process for new features that will:
    • Ensure all states are defined positively.
    • Ensure integration tests also include configuration data for the production environment.
    • Include test plans for automation and monitoring. In addition to testing the functional accuracy of code, we’ll also check the quality of data.
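The data quality check planned for the API servers (ignore updates that fail size and completeness criteria, and alert engineers) could look roughly like the following. This is a sketch under stated assumptions: the thresholds, the required field names, the `ServerListStore` class, and the in-memory alert list are all hypothetical stand-ins, not the production design.

```python
from dataclasses import dataclass, field

# Hypothetical criteria, chosen only for illustration.
MIN_SERVERS = 1
MIN_SIZE_RATIO = 0.5  # reject updates less than half the size of current data
REQUIRED_FIELDS = ("hostname", "region")


@dataclass
class ServerListStore:
    current: list = field(default_factory=list)
    alerts: list = field(default_factory=list)  # stand-in for a paging system

    def validate(self, update: list) -> list:
        """Return human-readable problems; an empty list means acceptable."""
        problems = []
        if len(update) < MIN_SERVERS:
            problems.append("update is empty")
        if self.current and len(update) < len(self.current) * MIN_SIZE_RATIO:
            problems.append("update is much smaller than the current data")
        for index, entry in enumerate(update):
            missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
            if missing:
                problems.append(f"entry {index} is missing {', '.join(missing)}")
        return problems

    def apply(self, update: list) -> bool:
        """Accept a good update; otherwise keep current data and record an alert."""
        problems = self.validate(update)
        if problems:
            self.alerts.append("; ".join(problems))
            return False
        self.current = list(update)
        return True


if __name__ == "__main__":
    store = ServerListStore()
    store.apply([{"hostname": "a", "region": "us"}, {"hostname": "b", "region": "eu"}])
    accepted = store.apply([])  # a bad update like the one in this incident
    print(accepted, len(store.current), store.alerts)
```

The key design choice is that a rejected update never overwrites the last known good data, so clients keep working while engineers investigate the alert.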

ExpressVNP would like to apologize to customers affected by the expired license problem. We’re eager to learn from these mistakes, and we’re proud of our Support Team for noticing and responding to this issue very quickly.
