Allow me to be controversial for a moment: arbitrary password restrictions at banks, such as short maximum lengths and disallowed characters, don't matter. Also, allow me to argue with myself for a moment: banks shouldn't have these restrictions in place anyway.
I want to put forward cases for both arguments here because seeing both sides is important. I want to help shed some light on why this practice happens and argue pragmatically both for and against. But firstly, let's just establish what's happening:
People are Upset About Arbitrary Restrictions
This is actually one of those long-in-draft blog posts I finally decided to finish after seeing this tweet earlier on in the week:
My bank tells me that their exactly-5-digit password policy is secure since it has 1.5bn permutations and the account gets blocked after 3 attempts. This just feels wrong but I can’t come up with a strong argument against it. Any thoughts? @troyhunt@SmashinSecurity ?
It feels wrong because 5 digits presents an extremely limited set of possible combinations for the password. (There's something a little off with the maths here though - 5 digits would only provide 100k permutations, whereas 5 characters would provide more in the order of 1.5B.)
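The back-of-the-envelope numbers are easy to check. A quick sketch (the alphabet sizes are my assumptions; the tweet's 1.5bn figure presumably assumes something between mixed-case alphanumerics and the full printable set):

```python
# Combinations for a fixed-length secret: alphabet_size ** length

digits = 10 ** 5       # 5 digits: 100,000 combinations
alnum = 62 ** 5        # 5 mixed-case alphanumerics: ~916M
printable = 95 ** 5    # 5 printable ASCII characters: ~7.7B

print(f"{digits:,} vs {alnum:,} vs {printable:,}")
```

Either way, the gulf between "5 digits" and "5 characters" is several orders of magnitude.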
That said, Westpac down in Australia certainly appears to be 6 characters:
Finally thought @Westpac had upped their password game, moving from the long pointless on-screen keyboard (OSK) with a character count limit, to 'normal' password entry. But nope... 6 characters... MAX... for my *online banking*. @troyhuntpic.twitter.com/9FMSdvVRiL
@TSB Please remove the 15 character maximum password length restriction and allow any characters without having to include any specific ones. This is also the advice of the @NCSChttps://t.co/WTmWEldLBO
So on the surface of it, the whole thing looks like a bit of a mess. But it's not necessarily that bad, and here's why:
Password Limits on Banks Don't Matter
That very first tweet touched on the first reason why it doesn't matter: banks aggressively lock out accounts being brute forced. They have to because there's money at stake and once you have a financial motivator, the value of an account takeover goes up and consequently, so does the incentive to have a red hot go at it. Yes, a 5-digit PIN only gives you 100k attempts, but you're only allowed two mistakes. Arguably you could whittle that 100k "possibilities" down to a much smaller number of "likely" passwords either by recognising common patterns or finding previously used passwords by the intended victim, but as an attacker you're going to get very few bites at that cherry:
Good morning, Keep in mind with ING a 4 digit access code is the maximum we offer. However, after 3 attempts of entering an Access Code your account will be blocked. ^Alissa
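To put that lockout in perspective, here's a sketch of the attacker's odds against ING's 4-digit access code with 3 attempts before the block (assuming the code is chosen uniformly at random, which real users admittedly don't always do):

```python
keyspace = 10 ** 4      # 4-digit access code: 10,000 combinations
attempts = 3            # guesses allowed before the account is blocked

p = attempts / keyspace
print(f"{p:.2%} chance of takeover per account")
```

A 0.03% chance per account is a very different proposition to an offline brute force where every combination can be tried.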
Next up is the need to know the target's username. Banks typically use customer registration numbers as opposed to user-chosen usernames or email addresses so there goes the value in credential stuffing lists. That's not to say there aren't ways of discovering someone's banking username, but it's a significantly higher barrier to entry than the typical "spray and pray" account takeover attempts.
Then there's the authentication process itself and it reminds me of a discussion I had with a bank's CISO during a recent workshop. I'd just spent two days with his dev team hacking themselves first and I raised the bollocking they were getting on social media due to a new password policy along the lines of those in the tweets you see above. He turned to me and said, "Do you really think the only thing the bank does to log people on is to check the username and password?" Banks are way more sophisticated than this and it goes well beyond merely string-matching credentials; there's all sorts of other environmental, behavioural and heuristic patterns used to establish legitimacy. You won't ever see a bank telling you how they do it, but those "hidden security features" make a significant contribution to the bank's security posture:
Hi. I understand your concerns. Our Online Card Services login is our first line of security, but we do have many other hidden security features in place that help us to protect your account and details. ^LauraP
Then there's the increasing propensity for banks to implement additional verification processes at key stages of managing your money. For example, one of the banks I regularly use sends me a challenge via SMS whenever setting up a new payee. Obviously, SMS has its own challenges, but what we're talking about now is not just needing to successfully authenticate to the bank, but also to prove control of a phone number at a key stage and that will always be more secure than authentication alone.
And if all of this fails? Banks like ING will give you your money back:
Hi Owen, your online banking is safe and secure with ING. We take security seriously, and use industry-leading technology to protect your accounts. Plus, we have an Online Security Guarantee in place. In the unlikely event that an unauthorised...
How much sophistication do you think is behind those username and password fields in that vBulletin forum? Exactly, it's basic string-matching and this is really the point: judging banks by the same measures we judge basic authentication schemes is an apples and oranges comparison.
However, I disagree with banks taking this approach so let me now go and argue from the other side of the fence.
Banks Shouldn't Impose Password Limits
There are very few independent means by which we can assess a website's security posture in a non-invasive fashion. We can look for the padlock and the presence of HTTPS (which is increasingly ubiquitous anyway) and we look at the way in which they allow you to create and use passwords. There are few remaining measures of substance we can observe without starting to poke away at things.
So what opinion do you think people will form when they see arbitrary complexity rules or short limits? Not a very positive one and there are the inevitable conclusions drawn:
Hey [bank], does that 16 character limit mean you've got a varchar(16) column somewhere and you're storing passwords as plain text?
As much as I don't believe that's the case in any modern bank of significance, it's definitely not a good look. Inevitably the root cause in situations like this is "legacy" - there's some great hulking back-end banking solution the modern front-end needs to play nice with and the decisions of yesteryear are bubbling up to the surface. It's a reason, granted, but it's not a very good one for any organisation willing to make an investment to evolve things.
But beyond just the image problem, there's also a functional problem with arbitrarily low password limits:
I've been through this myself in the past and I vividly recall creating a new PayPal password with 1Password only to find the one in my password manager had been truncated on the PayPal side and I was now locked out of my account. This is just unnecessary friction.
So wrapping it all up in reverse order, arbitrarily low limits on length and character composition are bad. They look bad, they lead to negative speculation about security posture and they break tools like password managers.
But would I stop using a bank (as I've seen suggested in the past) solely due to their password policy? No, because authentication in this sector (and the other security controls that often accompany it) go far beyond just string-matching credentials.
Let's keep pushing banks to do better, but not lose our minds about it in the process.
Turns out it's actually a sunny day in Oslo today, although it's the last one I'll see here for quite some time before heading off to Denmark then other European things for the remainder of this trip. I'm talking a little about those events (all listed on my events page), this week's changes to EV, more data breaches and a somewhat semantic argument about the definition of "theft".
From the emerging spring to the impending autumn, I'm back in Oslo at the beginning of another series of European events that'll take me across Norway, Denmark, Hungary and Switzerland. This week's update comes from under the glow of a warm outdoor heater at ridiculous o'clock as my sleep cycle keeps me making early starts. But it's all transient and by this time next month I'll be back to a very warm, very familiar Aussie landscape. For now, here's what's new on my side:
Back on business as usual, there's the SIM hijacking issue with Jack Dorsey's Twitter account, more data breaches and joyously, the HIBP API being back in full swing with the 500 subscription limit issue on Azure's APIM now being overcome. Next week's update will be from Oslo so a rather different scene, followed by some other cool places across Europe in the ensuing weeks.
Australia! Sunshine, good coffee and back in the water on the tail end of "winter". I'm pretty late doing this week's video as the time has disappeared rather quickly and I'm making the most of it before the next round of events. Be that as it may, there's a bunch of new stuff this week not least of which is the unexpected limit I hit with the Azure API Management consumption tier. I explain the problem in this video along with a bunch of other infosec related bits. I'll do another one from Aus later this week (if I can stick to schedule) and will try and find another nice little spot. Until then, enjoy:
I made it out of Vegas! That was a rather intense 8 days and if I'm honest, returning to the relative tranquillity of Oslo has been lovely (not to mention the massive uptick in coffee quality). But just as the US to Europe jet lag passes, it's time to head back to Aus for a bit and go through the whole cycle again. And just on that, I've found that diet makes a hell of a difference in coping with this sort of thing:
The number one most effective way I’ve found for coping with jet lag, stress, crazy work loads and general health is to focus on diet. It’s hard to control a lot of other environmental factors, but food is definitely one I can easily take charge on. pic.twitter.com/sUdXDbzbbw
This week it's almost all about commercial CAs and their increasingly bizarre behaviour. It's disappointing to see disinformation and privacy violations from any organisations, but when it's from the ones literally controlling trust on the web it's especially concerning. Maybe once they're no longer able to promote EV in the way they have been that will change, but I have a feeling we've got a bunch more crap to endure yet. See what you think about all that in this week's update:
Almost one year ago now, I declared extended validation certificates dead. The entity name had just been removed from Safari on iOS, it was about to be removed from Safari on Mojave and there were indications that Chrome would remove it from the desktop in the future (they already weren't displaying it on mobile clients). The only proponents of EV seemed to be those selling it or those who didn't understand how reliance on the absence of a positive visual indicator was simply never a good idea in the first place.
The writing might have been on the wall a year ago, but the death warrant is now well and truly inked with both Chrome and Firefox killing it stone cold dead. Here's the Google announcement:
On HTTPS websites using EV certificates, Chrome currently displays an EV badge to the left of the URL bar. Starting in Version 77, Chrome will move this UI to Page Info, which is accessed by clicking the lock icon.
In desktop Firefox 70, we intend to remove Extended Validation (EV) indicators from the identity block (the left hand side of the URL bar which is used to display security / privacy information).
Chrome 77 is currently scheduled to ship on September 10 and Firefox 70 on October 22. With both browsers auto-updating for most people, we're about 10 weeks out from no more EV and the vast majority of web users no longer seeing something they didn't even know was there to begin with! Oh sure, you can still drill down into the certificate and see the entity name, but who's really going to do that? You and I, perhaps, but we're not exactly in the meat of the browser demographics.
I will admit to some amusement in watching all this play out, partly because the ludicrous claims about EV efficacy really come crashing down when it's no longer visible to the end user. But also partly because of comments along the lines of "Google is pushing the EV changes into the spec". Google wasn't pushing anything into a spec, no more so than Apple was last year and Mozilla is now, they were all simply adapting their own UIs to better service their customers and they've all arrived at the same conclusion: remove the EV entity name. But it's the reasons why they're doing this that I find particularly interesting, for example in the Chrome announcement:
Through our own research as well as a survey of prior academic work, the Chrome Security UX team has determined that the EV UI does not protect users as intended. Users do not appear to make secure choices (such as not entering password or credit card information) when the UI is altered or removed, as would be necessary for EV UI to provide meaningful protection.
That absolutely nails it - users aren't going to change their behaviour when they see a DV padlock rather than an EV entity name. This is precisely what Mozilla called out in their announcement:
The effectiveness of EV has been called into question numerous times over the last few years, there are serious doubts whether users notice the absence of positive security indicators and proof of concepts have been pitting EV against domains for phishing.
In fact, Mozilla went even further and referenced the great work that Ian Carroll did when he registered a colliding entity name and got an EV cert for it:
More recently, it has been shown that EV certificates with colliding entity names can be generated by choosing a different jurisdiction. 18 months have passed since then and no changes that address this problem have been identified.
All Ian had to do was spend $100 registering "Stripe Inc" in a different US state to the payment processor you'd normally associate the name with, then another $77 on the EV cert, and less than an hour later he had this newsworthy result:
I'm assuming the bit about brand refers to the entity name in EV as it doesn't appear against OV or DV on that page. Oh - and just for reference, DigiCert refused to issue Ian a certificate for Stripe due to "risk factors". What risk factors? Well...
There were risk factors for the EV business model.
It's time for re-sellers to clean up their act too, for example The SSL Store:
I chose to leave the entire browser window in this screen grab to highlight the irony of "The SSL Store" having an EV cert issued to "Rapid Web Services". Remember one of Apple's complaints - "Org name is not tied to users intended destination" - yeah...
Actually, The SSL Store provides many great opportunities for reflection on the EV craziness that was (it's pretty safe to use the past tense now). Their piece on how EV provides "tremendous value" is clearly now on the nose and is full of great zingers such as how important it is to be able to differentiate PayPal.com from FakePayPal.com. Why a great zinger? Because PayPal themselves decided that didn't matter back in September last year. And since that entire piece was in response to me writing about just how useless EV was even back then, let's pick it apart even further, for example:
The value of an EV certificate is clear. It is the ability to know more than your browser can assert through connecting to a hostname, parsing a certificate file, and verifying an encryption key.
Ouch - that didn't age well!
EV is now really, really dead. The claims that were made about it have been thoroughly debunked and the entire premise on which it was sold is about to disappear. So what does it mean for people who paid good money for EV certs that now won't look any different to DV? I know precisely what I'd do if I was sold something that didn't perform as advertised and became indistinguishable from free alternatives...
Well that's Vegas done. 8 days of absolutely non-stop events that's now pretty much robbed me of my voice but hey, I got a flying cow! Scott and I both spent BSides, Black Hat and DEF CON doing "hallway con" or in other words, wandering around just meeting people. The personal engagement you get from these ad hoc meetups really can't be beat and I appreciate everyone who took the time to come over and say hi. Just a sample of our week is below:
Vegas! I'm a bit late with this week's update but I thought I'd catch up with Scott Helme and do the video together. We're talking about the events in Vegas, the ongoing Project Svalbard process, some very screwy messaging about certificates from Sectigo and the Irish government coming on board HIBP. Next week we'll do another one from Vegas and talk about what the events of the week here were like.
Over the last year and a bit I've been working to make more data in HIBP freely available to governments around the world that want to monitor their own exposure in data breaches. Like the rest of us, governments regularly rely on services that fall victim to attacks resulting in data being disclosed and just like the commercial organisations monitoring domains on HIBP, understanding that exposure is important. To date, the UK, Australian, Spanish and Austrian governments have come on board HIBP for nation-wide government domain monitoring and today, I'm happy to welcome the Irish government as well. They now have access to all .gov.ie domains and a handful of other government ones on different TLDs.
A big welcome to the Irish National Cyber Security Centre!
I've been in San Fran meeting with a whole bunch of potential purchasers for HIBP and it's been... intense. Daunting. Exciting. It's actually an amazing feeling to see my "little" project come to this where I'm sitting in a room with some of the most awesome tech companies whilst flanked by bankers in suits. I try and give a bit of insight into that in this week's video, keeping in mind of course that I'm a bit limited by how much detail I can go into right now. As the process unfolds I'll share more, but hopefully this will give you a little taste of what I'm going through at present.
It's the last one from Norway before heading off to the US and diving into the deep end of the Project Svalbard pool followed by Black Hat and DEF CON in Vegas. That's off the back of the last week being focused on pushing out Pwned Passwords V5, loading several hundred million new records worth of new data breaches and finally launching something I've been very excited about for a long time now: auth on the HIBP API. I spend most of this week's update talking about that because it's such an important feature and I especially wanted to make it clear why there's now literally a financial price to pay for entry. All that and more in this week's update.
The very first feature I added to Have I Been Pwned after I launched it back in December 2013 was the public API. My thinking at the time was that it would make the data more easily accessible to more people to go and do awesome things; build mobile clients, integrate into security tools and surface more information to more people to enable them to do positive and constructive things with the data. I highlighted 3 really important attributes at the time of launch:
There is no authentication.
There is no rate limiting.
There is no cost.
One of those changed nearly 3 years ago now - I had to add a rate limit. The other 2 are changing today and I want to clearly explain why.
Identifying Abusive API Usage
Let me start with a graph:
This is executions of the V2 API that enables you to search an individual email address. There's 1.06M requests in that 24 hour period with 491k of them in the last 4 hours. Even with the rate limit of 1 request every 1,500ms per IP address enforced, that graph shows a very clear influx of requests peaking at 14k per minute. How? Well let's pull the logs from Cloudflare and see:
This is the output of a little log analyser I wrote that breaks requests down by ASN (and other metrics) over the past hour. There were 15,573 requests from AS23969 across 82 unique IP addresses. Have a look at where those IP addresses came from:
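For illustration, here's a minimal sketch of that kind of analyser (the exact field names are my assumptions; Cloudflare log exports are JSON lines carrying per-request metadata including the client IP and ASN):

```python
import json
from collections import defaultdict

def requests_by_asn(log_lines):
    """Group request counts and unique client IPs by ASN, busiest first."""
    stats = defaultdict(lambda: {"requests": 0, "ips": set()})
    for line in log_lines:
        entry = json.loads(line)
        asn = entry["ClientASN"]              # field name is an assumption
        stats[asn]["requests"] += 1
        stats[asn]["ips"].add(entry["ClientIP"])
    return sorted(stats.items(), key=lambda kv: kv[1]["requests"], reverse=True)

sample = [
    '{"ClientASN": 23969, "ClientIP": "1.2.3.4"}',
    '{"ClientASN": 23969, "ClientIP": "1.2.3.5"}',
    '{"ClientASN": 13335, "ClientIP": "9.8.7.6"}',
]
for asn, s in requests_by_asn(sample):
    print(asn, s["requests"], "requests from", len(s["ips"]), "IPs")
```

It's the combination of high request volume with a large spread of unique IPs within one ASN that flags the abusive pattern.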
Late last year after seeing a similar pattern with a well-known hosting provider, I reached out to them to try and better understand what was going on. I provided a bunch of IP addresses which they promptly investigated and reported back to me on:
1- All those servers were compromised. They were either running standalone VPSs or cpanel installations.
2- Most of them were running WordPress or Drupal (I think only 2 were not running any of the two).
3- They all had a malicious cron.php running
This helped me understand the source of the problem, but it didn't get me any closer to actually blocking the abusive behaviour. For the sake of transparency, let me talk about how I tried to tackle this because that will help everyone understand why I've arrived at a very different model to what I started with.
Combating Abuse with Firewall Rules
Firewall rules on Cloudflare are amazingly awesome. It takes just a few seconds to have a rule like this in place:
Make more than 40 requests in a minute and you're in the naughty corner for a day. Only thing is, that's IP-based and per the earlier section on abusive patterns, actors with large numbers of IP addresses can largely circumvent this approach. It's still a fantastic turn-key solution that seriously raises the bar for anyone wanting to get around it, but someone determined enough will find a way.
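The reason an IP-based rule falls short is easy to demonstrate. Here's a sketch of a fixed-window counter like the one above (40 requests per minute per IP) being sidestepped by an actor with a fleet of addresses, similar to the 82-IP example earlier:

```python
from collections import Counter

LIMIT = 40  # requests per minute per IP

def blocked_ips(requests_this_minute):
    """Return the IPs a per-IP rule would catch within one window."""
    counts = Counter(requests_this_minute)
    return {ip for ip, n in counts.items() if n > LIMIT}

# One actor spreading load across 82 IPs at 39 requests each:
# thousands of requests in the window, yet every IP stays under the limit.
fleet = [f"10.0.0.{i}" for i in range(82)]
window = [ip for ip in fleet for _ in range(39)]
print(len(window), "requests,", len(blocked_ips(window)), "IPs blocked")
```

Per-IP limits raise the bar but can't cap the aggregate rate of a distributed actor, which is exactly what a per-key limit later solves.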
No problems, I'll just take abusive ASNs like the Thai one above and give them the boot. I scripted a lot of them based on patterns in the log files and created a firewall rule like this:
That works pretty quickly and is very effective, except for the fact that there's an awful lot of ASNs out there being abused. Plus, it has side-effects I'll come back to shortly too.
So how about looking at user agent strings instead? I mean, I could always just block the ones bad actors are using, except that was never going to work particularly well for obvious reasons (you can always define whatever one you like). That said, there were a heap of browser UAs which clearly were (almost) never legitimate for a client making API calls. So I blocked these as well:
That shouldn't have come as a surprise to anyone as the API docs were actually quite clear about this:
The user agent should accurately describe the nature of the API consumer such that it can be clearly identified in the request. Not doing so may result in the request being blocked.
Problem is, people don't read docs and I ended up with a heap of default user agents (such as curl's) which were summarily blocked. And, of course, the user agent requirement was easily circumvented as I expected it would be and I simply started seeing randomised strings in the UA.
Another approach I toyed with (very transiently) was blocking entire countries from accessing the API. I was always really hesitant to do this, but when 90% of the API traffic was suddenly coming from a country in West Africa, for example, that was a pretty quick win.
I'm only writing about this here now because as the new model comes into place, all of this will be redundant. Plus, I wanted to shed some light on the API behaviour some people may have previously seen which they couldn't quite work out, and that brings me to the next section.
The Impact on Legitimate Usage
The attempts described above to block abuse of the API also blocked a lot of good requests. I feel bad about that because it made something I'd always intended to be easily accessible difficult for some people to use. I hope that by explaining the background here, people will understand why the approaches above were taken and indeed, why the changes I'm going to talk about soon were necessary.
I got way too many emails from people about API requests being blocked to respond to. Often this was due to simply not meeting the API requirements, for example providing a descriptive UA string. Other times it was because they were on the same network as abusive users. There were also those who simply smashed through the rate limit too quickly and got themselves banned for a day. Other times, there were genuine API users in that West African country who found themselves unable to use the service. I was constantly balancing the desire to make the API easily accessible whilst simultaneously trying to ensure it wasn't taken advantage of. In the end, the path forward was clear - the API would need to be authenticated.
The New Model: Authenticated Requests
I held back on this for a long time because adding auth to the API adds a barrier to entry. It also adds coding effort on my end as well as management overhead. However, by earlier this year it became clear that this was the only way forward: requests would have to be auth'd. Doing this solves a heap of problems in one fell swoop:
The rate limit could be applied to an API key thus solving the problem of abusive actors with multiple IP addresses
Abuse associated to an IP, ASN, user agent string or country no longer has to impact other requests matching the same pattern
The rate limit can be just that - a limit rather than also dishing out punishment via the 24 hour block
Making an authenticated call is a piece of cake, you just add an hibp-api-key header as follows:
hibp-api-key: [your key]
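From Python, a call might look like this sketch (standard library only; the endpoint path matches the documented V3 breached-account search, while the key and UA values are placeholders):

```python
import urllib.request

API_KEY = "your-key-here"  # provisioned via the HIBP API key page
account = "test@example.com"

req = urllib.request.Request(
    f"https://haveibeenpwned.com/api/v3/breachedaccount/{account}",
    headers={
        "hibp-api-key": API_KEY,
        "user-agent": "my-breach-checker",  # still describe your client
    },
)
# urllib.request.urlopen(req) would perform the call;
# without the hibp-api-key header, V3 responds with HTTP 401.
```

Note the descriptive user agent is still good practice even now that the key does the heavy lifting.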
However, this wasn't going to completely solve the problem, rather it moved the challenge to the way in which API keys were provisioned. It's no good putting controls around the key itself if a bad actor could just come along and register a heap of them. Anti-automation on the form where a key can be requested is one thing, stopping someone from manually registering, say, 20 of them with different email addresses and massively amplifying their request rate is quite another. I had to raise the bar just high enough to dissuade people from doing this, which brings me to the financial side of things.
There's a US$3.50 per Month Fee to Use the API
Clearly not everyone will be happy with this so let me spend a bit of time here explaining the rationale. This fee is first and foremost to stop abuse of the API. The actors I've seen taking advantage of it are highly unlikely to front up with a credit card and provide what amounts to personally identifiable data (i.e. make a credit card payment) in order to mass enumerate the API.
In choosing the $3.50 figure, I wanted to ensure it was a number that was inconsequential to a legitimate user of the service. That's about what a latte costs at my local coffee shop so spending a few bucks a month to search through billions of records seems like a pretty damn good deal, especially when that rate limit enables 57.6k requests per day.
One thing I want to be crystal clear about here is that the $3.50 fee is in no way an attempt to monetise something I always wanted to provide for free. I hope the explanation above helps people understand that, and also the fact the API has run the last 5 and a half years without any auth whatsoever clearly demonstrates that financial gain has never been the intention. Plus, the service I'm using to implement auth and rate limits comes with a direct cost to me:
This is from the Azure API Management pricing page which is the service I'm using to provision keys and control rate limits (I'll write a more detailed post on this later on - it's kinda awesome). I chose the $3.50 figure because it represents someone making one million calls. Some people will make much less, some much more - that rate limit represents a possible 1.785 million calls per month. Plus, there's still the costs of function executions, storage queries and egress bandwidth to consider, not to mention the slice of the $3.50 that Stripe takes for processing the payment (all charges are routed through them). The point is that the $3.50 number is pretty much bang on the mark for the cost of providing the service.
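The arithmetic behind those figures checks out. Assuming the rate limit of one request every 1,500ms:

```python
seconds_per_day = 24 * 60 * 60          # 86,400
per_day = seconds_per_day / 1.5         # 57,600 requests per day
per_month = per_day * 31                # 1,785,600 in a 31-day month

print(f"{per_day:,.0f} per day, {per_month:,.0f} per month")
```

So a key driven flat-out sits right around the one-million-call tier that the $3.50 covers, with headroom up to ~1.785M in a long month.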
What this change does is simultaneously give me a much higher degree of confidence the API will be used in an ethical fashion whilst also ensuring that those who use it have a much more predictable experience, without me dipping deeper and deeper into my own pocket.
The API is Revving to Version 3 (and Has Some Breaking Changes)
With this change, I'm revising the API up to version 3. All documentation on the API page now reflects that and also reflects a few breaking changes, the first of which is obviously the requirement for auth. When using V3, any unauthenticated requests will result in an HTTP 401.
The second breaking change relates to how the versioning is done. Back in 2014, I wrote about how your API versioning is wrong and headlined it with this graphic:
I outlined 3 different possible ways of expressing the desired version in API calls, each with their own technical and philosophical pros and cons:
Via the URL
Via a custom request header
Via the accept header
After 4 and a bit years, by far and away the most popular method with an uptake of more than 90% is versioning via the URL. So that's all V3 supports. I don't care about the philosophical arguments to the contrary, I care about working software and in this case, the people have well and truly spoken. I don't want to have to maintain code and provide support for something people barely use when there's a perfectly viable alternative.
Next, I'm inverting the condition expressed in the "truncateResponse" query string. Previously, a call such as this would return all meta data for a breach:
You'd end up with not just the name of the breach, but also how many records were in it, all the impacted data classes, a big long description and a whole bunch of other largely redundant information. I say "redundant" because if you're hitting the API over and over again, you're pulling back the same info for each account that appears in the same breach. Using the "truncateResponse" parameter reduced the response size by 98% but because it wasn't the default, it wasn't used that much. I want to drive the adoption of small responses because not only are they faster for the consumer, they also reduce my bandwidth bill which is one of the most expensive components of HIBP. You can still pull back all the data for each breach if you'd like, you just need to pass "truncateResponse=false" as true is now the default. (Just a note on that: you're far better off making a single call to get all breached sites in the system and then referencing that collection by breach name after querying an individual email address.)
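Concretely, here's a sketch of the two request forms and the shape of the default truncated response (the breach names shown are illustrative):

```python
base = "https://haveibeenpwned.com/api/v3/breachedaccount/test@example.com"

truncated_url = base                            # default: breach names only
full_url = base + "?truncateResponse=false"     # opt back in to full metadata

# A truncated V3 response is just a list of breach names:
truncated_response = [{"Name": "Adobe"}, {"Name": "LinkedIn"}]
names = [b["Name"] for b in truncated_response]
print(names)
```

Those names then key into the single all-breaches call mentioned above, which carries the full metadata once rather than per account.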
I'm also inverting the "includeUnverified" parameter. The original logic for this was that when I launched the concept of unverified breaches, I didn't want existing consumers of the API to suddenly start getting results for breaches which may not be real. However, with the passage of time I've come across a couple of issues with this and the first is that a heap of people consumed the API with the default params (which wouldn't include unverified breaches) and then contacted me asking "why does the API return different results to the front page of HIBP?" The other issue is that I simply haven't flagged very many breaches as unverified and I've also added other classes of breach which deviate from the classic model of loading a single incident clearly attributable to a single site such as the original Adobe breach. There are now spam lists, for example, as well as credential stuffing lists and returning all data by default is much more consistent with the ethos of considering all breached data to be in scope.
The other major thing related to breaking stuff is this:
Versions 1 and 2 of the API for searching breaches and pastes by email address will be disabled in 4 weeks from today on August 18.
I have to do this on an aggressive time frame. Until I do, all the problems mentioned above with abuse of the API continue. When we hit that August due date, the APIs will begin returning HTTP 400 "Bad Request" and that will be the end of them.
One important distinction: this doesn't apply to the APIs that don't pull back information about an email address; the API listing all breaches in the system, for example, is not impacted by any of the changes outlined here. It can be requested with version 3 in the path, but also with previous versions of the API. Because it returns generic, non-personal data it doesn't need to be protected in the same fashion (plus it's really aggressively cached at Cloudflare). Same too for Pwned Passwords - there's absolutely zero impact on that service.
During the next 4 weeks I'll also be getting more aggressive with locking down firewall rules on the previous versions at the first sign of misuse until they're discontinued entirely. It's an easy fix if you're blocked on V2 - get an API key and roll over to V3. Now, about that key...
Protecting the API Key (and How My Problem Becomes Your Problem)
Now that API keys are a thing, let me touch briefly on some of the implications of this as it relates to those of you who've built apps on top of HIBP. And just for context, have a look at the API consumers page to get a sense of the breadth we're talking about; I'll draw some examples out of there.
For code bases such as Brad Dial's Pwny Corral, it's just a matter of adding the hibp-api-key header and a configuration for the key. Users of the script will need to go through the enrolment process to get their own key then they're good to go.
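For a consumer like that, the change amounts to attaching the header and coping with the occasional rate limit. A minimal sketch (assuming Node 18+ with a global fetch; the helper names, the retry count and the 2-second fallback are mine, and it assumes a 429 response carries a retry-after header):

```javascript
// Hypothetical helper: how long to back off after a 429, preferring the
// retry-after header if the response includes one (value in seconds).
function retryDelayMs(retryAfterHeader, fallbackSeconds = 2) {
  const s = Number(retryAfterHeader);
  return (Number.isFinite(s) && s > 0 ? s : fallbackSeconds) * 1000;
}

// Query the V3 API with the key attached, retrying a couple of times on
// rate limiting rather than surfacing a hard failure.
async function queryBreaches(account, apiKey, fetchImpl = fetch) {
  const url = `https://haveibeenpwned.com/api/v3/breachedaccount/${encodeURIComponent(account)}`;
  for (let attempt = 0; attempt < 3; attempt++) {
    const res = await fetchImpl(url, { headers: { 'hibp-api-key': apiKey } });
    if (res.status === 429) {
      await new Promise(r => setTimeout(r, retryDelayMs(res.headers.get('retry-after'))));
      continue;
    }
    if (res.status === 404) return []; // address not in any breach
    if (!res.ok) throw new Error(`HIBP returned ${res.status}`);
    return res.json();
  }
  throw new Error('still rate limited; try again shortly');
}
```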
In a case like What's My IP Address' Data Breach Check, we're talking about a website with a search feature that hits their endpoint and then they call HIBP on the server side. The HIBP API key will sit privately on their end and the only thing they'll really need to do is stop people from hammering their service so it doesn't exceed the HIBP rate limit for that key. This is where it becomes their (your) problem rather than mine and that's particularly apparent in the next scenario...
Rich client apps designed for consumer usage such as Outer Corner's Secrets app will need to proxy API hits through their own service. You don't want to push the HIBP API key out with the installer, plus you also need to be able to control the rate limit across all your customers (i.e. so that one user of Secrets smashing through the rate limit doesn't make the service unavailable for everyone else).
One last thing on the rate limit: because it's no longer locking you out for a day if exceeded, making too many requests results in a very temporary lack of service (usually single digit seconds). If you're consuming the new auth'd API, handle HTTP 429 responses from HIBP gracefully and ask the user to try again momentarily. Now, with that said, let me give you the code to make it dead easy to both proxy those requests and control the rate at which your subscribers hit the service; here's how to do it with Cloudflare workers and rate limits:
Proxying With a Cloudflare Worker (and Setting Rate Limits)
The fastest way to get up and running with proxying requests to V3 of the HIBP API is with a Cloudflare Worker. This is "serverless code on the edge" or in other words, script that runs on Cloudflare's 180 edge nodes around the world such that when someone makes a request for a particular route, the script kicks in and executes. It's easiest just to have a read of the code below:
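A minimal sketch of such a worker follows; the function names, the proxy route and the HIBP_API_KEY secret name are my illustrative choices rather than anything prescribed by HIBP:

```javascript
const HIBP_ORIGIN = 'https://haveibeenpwned.com';

// Map the incoming request path onto the V3 API, preserving any query string.
function upstreamUrl(requestUrl) {
  const u = new URL(requestUrl);
  return `${HIBP_ORIGIN}/api/v3${u.pathname}${u.search}`;
}

// Forward the request with the API key attached server-side, so the key
// never ships to clients. HIBP_API_KEY would be configured as a secret.
async function handleRequest(request, apiKey) {
  return fetch(upstreamUrl(request.url), {
    headers: {
      'hibp-api-key': apiKey,
      'user-agent': 'example-hibp-proxy' // V3 also requires a user agent
    }
  });
}

// Inside the worker itself you'd register the handler, e.g.:
// addEventListener('fetch', e => e.respondWith(handleRequest(e.request, HIBP_API_KEY)));
```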
Stand up a domain on Cloudflare's free tier (if you're not on there already) then it's $5 per month to send 10M queries through your worker which is obviously way more than you can send to the HIBP API anyway. And while you're there, go and use the firewall rules to lock down a rate limit so your own API isn't hammered too much (keeping in mind some of the challenges I faced when doing this).
The point is that if you need to protect the API key and proxy requests, it's dead simple to do.
"But what if you just..."
I'll get a gazillion suggestions of how I could do this differently. Every single time I talk about the mechanics of how I've built something I always do! The model described in this blog post is the best balance of a whole bunch of different factors: the sustainability of the service, the desire to limit abuse, leveraging the areas my skills lie in, the limited availability of my time and so on and so forth. There are many other factors that also aren't obvious so as much as suggestions for improvements are very welcome, please keep in mind that they may not work in the broader sense of what's required to run this project.
There are a couple of caveats to all this, largely due to me trying to make sure I get this feature out as early as possible and continue to run things on a shoestring cost-wise. Firstly, there's no guarantee of support. We do the same thing with entry-level Report URI pricing and it's simply because it's enormously hard to do with the time constraints of a single person running this. That said, if anything is buggy or broken I definitely want to know about it. Secondly, there's no way to retrieve or rotate the API key. If you extend the one-off subscription you'll get the same key back or if you cancel an existing subscription and take a new one you'll also get the same key. I'll build out better functionality around this in the future.
Edit: You can now regenerate API keys by going to the API key page and clicking the "regenerate key" button. This will invalidate the old key and issue a new one.
I'm sure there'll be others that pop up and I'll expand on the items above if I've missed any here.
The changes I've outlined here strike a balance between making the API available for good purposes, making it harder to use for bad purposes, ensuring stability for all those in the former category and crucially, making it sustainable for me to operate. That last point in particular is critical for me, both in terms of reducing abuse and in reducing the overhead on me of trying to achieve that objective whilst supporting those who ran into the previously mentioned blocks.
I expect there'll be many requests to change or evolve this model; other payment types, no payment at all for certain individuals or organisations, higher rate limits and so on and so forth. At this stage, my focus is on keeping the service sustainable as Project Svalbard marches forward and once that comes to fruition, I'll be in a much better position to revisit suggestions (also, there's a UserVoice for that). For now, I hope that this change leads to a much more sustainable service for everyone.
So "Plan A" was to publish Pwned Passwords V5 on Tuesday but a last-minute check showed control characters had snuck in due to the quality (or lack thereof) of the source data. Scratch that and go to "Plan B" which was to push them out today but a last-minute check showed that my "improved" export script had screwed up the encoding and every single hash was wrong. "Plan C" is now to push them out on the weekend with everything working correctly. Hopefully. If I don't screw anything up again...
The constant challenge I've faced over the last few years is the massive amount of multi-tasking required to do all the things I'm presently doing. I touched on this in my Project Svalbard blog post and it goes a long way to explaining why HIBP needs to grow up into a larger organisation. I quite literally need people to remove the horizontal tabs and get the encoding right; it's such a simple thing but it's so easy to screw up when you're stretched too thin.
Enough about that, this week I'm also talking about Scott's upcoming public Glasgow workshop, more data breaches, Namecheap's faux pas and EVE Online's great security work they've very generously shared publicly.
Almost 2 years ago to the day, I wrote about Passwords Evolved: Authentication Guidance for the Modern Era. This wasn't so much an original work on my behalf as it was a consolidation of advice from the likes of NIST, the NCSC and Microsoft about how we should be doing authentication today. I love that piece because so much of it flies in the face of traditional thinking about passwords, for example:
Don't mandate password rotation (enforced changing of it every few months)
Never implement password hints
And of most relevance to the discussion here today, don't allow people to use passwords that have already been exposed in a data breach. Shortly after that blog post I launched Pwned Passwords with 306M passwords from previous breach corpuses. I made the data downloadable and also made it searchable via an API, except there are obvious issues with enabling someone to send passwords to me even if they're hashed as they were in that first instance. Fast forward to Feb last year and with Cloudflare's help, I launched Pwned Passwords version 2 with a k-anonymity model. The data was all still downloadable if you wanted to run the whole thing offline, but k-anonymity also gave people the ability to hit the API without disclosing the original password. Subsequent updates to the corpus of breached passwords saw versions 3 and 4 arrive as more passwords flowed in from new breaches whilst the system also continued to grow and grow:
Today, after another 6 months of collecting passwords, I'm releasing version 5 of the service. During this time I collected 65M passwords from breaches where they were made available in plain text (I don't crack passwords for this service). Because Pwned Passwords already had 551M records as of V4, new corpuses of passwords are increasingly adding very few new ones, so V5 contributes an additional... 3,768,890 passwords. That may not seem like a lot in comparison, but by virtue of an entire half year passing I wanted to get the existing public set updated to the current numbers. It doesn't just add new ones though; those 65M occurrences all contribute to the existing prevalence counts for passwords that have been seen before.
New passwords include such strings as "Mynoob" (seen 1,208 times), "Find_pass" (303 times) and "guns and robots" (134 times). There are often biases in password distribution due to the sources they're obtained from, for example the prevalence of the service's name or other attributes or relationships to the breached site.
The entire 555,278,657 passwords are now available for download if you're running the service offline. If you're using the k-anonymity API then there's nothing more to do - I've already flushed cache at Cloudflare so you're now getting the latest and greatest set of bad passwords. If you want to be sure you're getting the latest data via the API, check that the "last-modified" response header has a July date rather than a January one.
And just while I'm here talking about updates to the corpus of Pwned Passwords, I'm really conscious that releases are happening on a half-yearly cadence which means a bunch of new passwords sit on my side for months before anyone can start black-listing them. This is one of the things that's high on my post-Project Svalbard list; I'd love to see a constant firehose of new passwords being integrated into this service. Not six-monthly, not monthly and frankly, not even weekly - I want to see passwords in there as soon as I get them. The shorter the period between a breached password entering circulation and it appearing in Pwned Passwords, the more impact the service can have on the scourge of credential stuffing. Stay tuned!
As time has passed and more organisations have implemented the service, there's been some really fantastic implementations come out of the community. I wrote about a bunch of them last year in my post on Pwned Passwords in Practice, but it's the work they've done at EVE Online that really stands out:
Obviously these are all some of my favourite things (HIBP, 1Password and Report URI), but it's the improvements made to the user selection of passwords that makes me particularly happy:
When we first implemented the check, about 19% of logins were greeted with the message that their password was not safe enough. Today, this has dropped down to around 11-12% and hopefully will continue to go down.
That's a massive drop that has a profoundly positive impact not just on the individuals using EVE Online, but to the company itself too. Account takeover attacks are a massive problem on the web today and if you reduce the proportion of customers using known bad passwords by up to 42%, you make a direct impact on the cost the organisation has to bear when dealing with the problem.
The NTLM hashes have been really well-received too as they've allowed organisations to quickly check the proportion of their Active Directory users with known bad passwords. Consistently, I'm hearing the results of this exercise are... alarming:
Well, I finally got the NTLM hashes downloaded, and for 1800+ accounts the number using pwned password is a whopping 25% pic.twitter.com/4b2YQWSLE5
What's great about this work is that not only can it stop people from making bad password choices in the first place, you'll see there's a reference towards the bottom that'll allow you to run it against your entire set of AD users on demand. And just like Pwned Passwords itself, it's 100% free and you can go and grab it all right now.
So that's Pwned Passwords V5 now live. Implement the k-anonymity API with a few lines of code or if you want to run it all offline, download the data directly. Either way, take it and do awesome things with it!
After a very non-stop Cyber Week in Israel, I'm back in Oslo working through the endless emails and other logistics related to Project Svalbard. In my haste this week, I put out a really poorly worded tweet which I've tried to clarify in this week's video. On more positive news, the Austrian government came on board HIBP and my MVP status got renewed for the 9th time. I also wanted to talk this week about some of the stats from HIBP I've been preparing as part of the acquisition. There's a bunch of really interesting numbers in there (for me at least) and rather than just keeping them locked away in an information memorandum, I thought I'd share them with everyone in this week's update.