Django-bcrypt (github.com/dwaiter)
92 points by stevelosh on Feb 3, 2011 | 61 comments


Step 1: Install

Step 2: Add to INSTALLED_APPS

Step 3: Enjoy more secure password hashing.

...

Step 4: Set BCRYPT_ROUNDS to something higher than 12 when computers get faster.
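Steps 2 and 4 come down to a few lines in `settings.py`. This is a minimal sketch, not copied from the README. It assumes the app's importable module name is `django_bcrypt` and that the other apps listed are just whatever your project already uses:

```python
# settings.py (excerpt): a sketch. The app label 'django_bcrypt' is an
# assumption based on the package name; check the README for the exact value.
INSTALLED_APPS = (
    "django.contrib.auth",
    "django.contrib.contenttypes",
    # ... your other apps ...
    "django_bcrypt",
)

# bcrypt work factor: cost grows as 2 ** BCRYPT_ROUNDS, so each +1 doubles it.
BCRYPT_ROUNDS = 12
```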


You sir, are a machine. Steve started working on this 3 feet behind me a little less than 2 hours ago.


It took him two hours to monkey patch a few methods?


Actually the code was already written -- see the bottom of the README.

It took about 30-40 minutes to:

* Add the BCRYPT_ROUNDS setting.

* Remember how to allow for an optional setting (`if foo in settings` and `settings.get(foo, default)` don't work).

* Add the setup.py file.

* Write a README.

* Test with a real Django site to make sure nothing was broken.

* Create and push to repos on BitBucket and GitHub.


> Remember how to allow for an optional setting (`if foo in settings` and `settings.get(foo, default)` don't work).

Instead of:

    settings.get(foo, default)

Try:

    getattr(settings, foo, default)

This looks useful. Thanks.
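Django's `settings` object exposes configuration values as attributes, not as dict keys. That is why `in` and `.get()` don't behave as hoped, while `getattr` with a default does. The sketch below runs without Django by using `types.SimpleNamespace` as a stand-in for `django.conf.settings`. The helper name `get_bcrypt_rounds` is hypothetical:

```python
from types import SimpleNamespace

# Stand-in for django.conf.settings so this runs without Django installed.
settings = SimpleNamespace(DEBUG=False)

def get_bcrypt_rounds(default=12):
    """Return settings.BCRYPT_ROUNDS if defined, else the default (hypothetical helper)."""
    # getattr's third argument is returned when the attribute is missing.
    return getattr(settings, "BCRYPT_ROUNDS", default)

print(get_bcrypt_rounds())   # 12: setting not defined
settings.BCRYPT_ROUNDS = 14
print(get_bcrypt_rounds())   # 14: setting overrides the default
```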


Good idea, pushed!


I left a few minutes after he started. He very well might've been banging his head against other (client-related) issues within those two hours.


How handy. Thanks for putting this together and instantly shortening my to-do list.


Is it just coincidence that I wrote almost-identical code about 7 hours before this? (https://github.com/playfire/django-bcrypt/)


Bcrypt (or scrypt) should actually be used as the default for password hashing in application servers. Using something like crypt/md5/sha-1 hashing by default is just completely irresponsible.


You know, I know that bcrypt is the cool new thing, but I don't think it's really that irresponsible, and certainly not completely irresponsible, to use a hashing scheme with a huge salt. I understand that bcrypt is super slow and therefore super awesome since the bad guy will have to sit around longer to crack your whole database, but maybe we should be a bit more careful with things that are termed "completely irresponsible". Something may be adequate and not be the best option, and something may be inadequate but not be "completely irresponsible".

For an example of complete and wanton irresponsibility, see Plenty of Fish.


Big salt, little salt, smoked salt, epsom salt. Nothing you can do with your salt protects you from the attack that bcrypt protects you from, which is the same attack that harvested passwords from the Gawker hashes.


Can you provide more detail? That doesn't mesh with my understanding. The Gawker hacks are different in a couple of ways.

First, they had access to the complete Gawker database, could have downloaded it and run their attacks on their side. They still would have needed to use blowfish/bcrypt (afaik), but your high rounds are useless if an attacker gets the actual data out of your infrastructure, just like they are with normal hashes.

Second, the Gawker guys had no salting, and they used DES, which, as you know, was deprecated before many HN users were born. This made cracking the passwords really easy ... not that it would be that much harder with a conventional salting mechanism or even with bcrypt.

I was not contending that bcrypt is not like really great, just that it's not "completely irresponsible" to not use bcrypt or scrypt. bcrypt is still vulnerable to dictionary attacks, the only difference is the amount of time it will take to get everyone's password out, which, again, is useless in a case like gawker where the dumps are distributed all over the internet for anyone with an interest to peruse.

Please correct me if I'm wrong.


You're wrong. DES crypt(3) hashes have had salts since the 1970s. Capture of complete password databases is exactly why you hash passwords instead of storing them plaintext. I don't think you have the requisite background to hold a strong opinion about this subject; you should just take our word for it.


I think there's a communication barrier here, but I'll just let it go for now. Gawker's passwords were unsalted according to the first few hits on Google and that's all the research I did on it, not that it really makes a difference since a salt isn't the point here anyway; the point is that you don't have to be using bcrypt to be a responsible datakeeper and that bcrypt isn't the end-all it's made out to be.

You're right that I don't have a very deep cryptographic background, but from everything I've seen and read, dictionary attacks still work on bcrypt'd passwords, just more slowly than otherwise. This is the part I want to know if I'm wrong about: is there some reason a dictionary attack doesn't work on bcrypt? I understand that it will take longer to get 100% password recovery out of a bcrypt'd database than a database hashed with SHA-256 or SHA-512. Let's say that's not relevant because our theoretical attacker A) has a computer from the future which is fast enough to blow through bcrypt just as quickly as SHA-* or MD5; or B) is only interested in one user's password, so the expense is bearable. In these situations, does bcrypt provide extra protection that isn't routinely implemented in a standard hash?

My understanding is that bcrypt does not provide much besides the slowdown. I'm not trivializing the slowdown, I understand completely why it is beneficial to keep passwords away from bad guys as long as possible after a data leak. I am merely saying that if I'm correct and the only thing standing between bcrypt and SHA-* for a captured password database is X iterations of silicon, I don't really see using SHA-* hashes as "completely irresponsible".


Dictionary attacks can be made infeasible against bcrypt. DES crypt has a 12-bit salt. Using SHA hashes is insecure. It isn't the end of the world. Knowingly choosing to use a naked salted SHA hash instead of a "stretched" SHA hash or bcrypt or scrypt or PBKDF is, in fact, irresponsible.
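Here is what the 12-bit figure means in numbers. A 12-bit salt has only 2**12 = 4096 possible values, so precomputing a table for every possible salt costs only 4096 times as much as one table, and in a large database many accounts share each salt. bcrypt uses a 128-bit salt. The back-of-the-envelope calculation below treats the one-million-user database as an illustrative assumption:

```python
def users_per_salt(salt_bits: int, n_users: int) -> float:
    """Average number of accounts sharing each possible salt value."""
    return n_users / 2 ** salt_bits

# DES crypt(3): 12-bit salt
print(2 ** 12)                                 # 4096 possible salts
print(users_per_salt(12, 1_000_000))           # 244.140625 accounts per salt, on average
# bcrypt: 128-bit salt, so shared salts are effectively impossible
print(users_per_salt(128, 1_000_000) < 1e-30)  # True
```

Salt size only rules out precomputation and shared work. It does nothing to slow down guessing against a single hash, which is what the cost factor is for.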

I would have ignored your original comment and avoided pedantry except for your original assertion that your salting did something to mitigate the risk of not using bcrypt. It does no such thing. There's nothing else for us to argue about.


Why should bcrypt be used instead of SHA-1? (I'm genuinely asking; I don't know anything about security.)


Essentially, because it's slow.

SHA is designed to be fast, so dictionary attacks can find huge numbers of matches nigh-instantly, and rainbow tables (or GPU EC2 instances!) make it practical to brute-force passwords of up to 8-12 characters, more than enough to crack most people's passwords. You can salt the hash (basically: prepend a unique string to the password) to defeat rainbow tables, but SHA doesn't do this by default. You can SHA something more than once to slow things down, but SHA doesn't do this by default either (and without a salt, an attacker can still precompute rainbow tables for the repeated hash).
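Here is the salting step in code. With a fresh random salt per password, two users who pick the same password get different hashes, so one precomputed table can't cover everyone. This minimal sketch uses only `hashlib` and `os.urandom`. The 16-byte salt length is an arbitrary choice. Note that it is still one fast SHA-1 call per guess, which is exactly the weakness described above:

```python
import hashlib
import os

def salted_sha1(password: bytes, salt=None):
    """Return (salt, hex digest) for SHA-1(salt + password). Fast, so still weak."""
    if salt is None:
        salt = os.urandom(16)  # new random salt for every stored password
    return salt, hashlib.sha1(salt + password).hexdigest()

salt_a, hash_a = salted_sha1(b"hunter2")
salt_b, hash_b = salted_sha1(b"hunter2")
print(hash_a != hash_b)                              # True: same password, different hashes
print(salted_sha1(b"hunter2", salt_a)[1] == hash_a)  # True: recomputable with the stored salt
```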

BCrypt is slow by design (i.e., thousands of times slower). It's a much harder calculation, it automatically salts each hash uniquely, and you can make it more secure incrementally by simply running it through more steps (it's designed to do this). The slowness isn't noticeable to a user logging in, because checking a single password is still extremely quick, but would-be brute-forcers run up against a brick wall: their attempts are now rainbow-table-proof and slower by orders of magnitude. Best of all, it does all this by default, and it stores all the necessary information in the result, so comparing values against it is foolproof. You can't misuse it.
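Storing "all the necessary information in the result" is what makes bcrypt hard to misuse. A bcrypt hash string such as `$2a$12$...` carries its own cost factor and salt. The standard library has no bcrypt, so this sketch swaps in `hashlib.pbkdf2_hmac` to show the same idea. The `pbkdf2_sha256$iterations$salt$hash` layout is illustrative only and is not bcrypt's format:

```python
import base64
import hashlib
import hmac
import os

def make_hash(password: bytes, iterations: int = 100_000) -> str:
    """Encode algorithm, work factor, salt and digest into one string."""
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    return "pbkdf2_sha256${}${}${}".format(
        iterations,
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(dk).decode("ascii"),
    )

def check_hash(password: bytes, stored: str) -> bool:
    """Re-derive using only the parameters read back out of the stored string."""
    algorithm, iterations, salt_b64, dk_b64 = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        raise ValueError("unknown algorithm: " + algorithm)
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(dk_b64)
    dk = hashlib.pbkdf2_hmac("sha256", password, salt, int(iterations))
    return hmac.compare_digest(dk, expected)
```

The caller never has to remember the salt or the iteration count separately. `check_hash(b"pw", make_hash(b"pw"))` just works, and raising the default iteration count later doesn't break old hashes.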

SCrypt takes all the advantages of BCrypt and goes a step further: it guarantees that a certain (large) amount of memory is required to perform the hashing function. So while SHA and BCrypt can still be attacked much faster by, say, performing a thousand operations at a time on custom hardware, SCrypt can demand so much memory that doing so becomes infeasible, so you're stuck testing one. password. at. a. time. It's basically the ultimate death to brute-forcing.
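Recent Pythons (3.6+) expose scrypt as `hashlib.scrypt` when built against OpenSSL 1.1 or later. The memory cost is roughly 128 * r * n bytes. With the illustrative parameters below (n=2**14, r=8, p=1), that works out to about 16 MiB per derivation. These values are common examples, not a recommendation:

```python
import hashlib
import os

def scrypt_hash(password: bytes, salt: bytes, n: int = 2 ** 14, r: int = 8, p: int = 1) -> bytes:
    # n: CPU/memory cost (a power of 2); r: block size; p: parallelization.
    # maxmem is set explicitly to leave headroom above the ~16 MiB working set.
    return hashlib.scrypt(password, salt=salt, n=n, r=r, p=p,
                          maxmem=64 * 1024 * 1024, dklen=32)

salt = os.urandom(16)
digest = scrypt_hash(b"correct horse battery staple", salt)
print(len(digest))  # 32
```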




I haven't done a diff, is the bitbucket or github repo "official"?


Technically BitBucket (the GitHub one has "mirror of django-bcrypt" as a description), but I try to always push to both at the same time.

I'll take patches/pull-requests from either.


Unfortunately, for a lot of us working in locked down environments, this code makes use of the py-bcrypt module which uses the bcrypt C implementation. If you're running on GAE, you're out of luck. I keep telling myself to port to pure Python, but haven't had time. Anyone interested?


I see that Google allows you to use the crypt module, and I would wager that their Python is linked against glibc, in which case, you might be able to use sha-crypt. It's worth checking. e.g.:

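  import os
  from base64 import b64encode
  from crypt import crypt  # Unix-only; removed from the stdlib in Python 3.13

  def encode_password(password):
    """Return crypt() encoded password"""
    # Uses glibc SHA-crypt extension via $6$; salt chars must be [a-zA-Z0-9./]
    salt = b64encode(os.urandom(8)).decode('ascii').rstrip('=').replace('+', '.')
    return crypt(password, '$6$' + salt)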


There's a pure-Python implementation of PBKDF2. I know it's not as good as bcrypt or scrypt, but at least it's better than MD5 or one of the SHAs.
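PBKDF2 is small enough to write on top of the standard `hmac` module, which is why pure-Python versions exist for restricted platforms like App Engine. The sketch below follows the PBKDF2 construction from RFC 2898 with HMAC-SHA1 as the PRF. It is an illustration, not a vetted library, and it is checked against `hashlib.pbkdf2_hmac` (available in the standard library since Python 3.4):

```python
import hashlib
import hmac
import struct

def pbkdf2_sha1(password: bytes, salt: bytes, iterations: int, dklen: int = 20) -> bytes:
    """PBKDF2 with HMAC-SHA1 as the PRF, written out block by block."""
    prf = hmac.new(password, digestmod=hashlib.sha1)  # keyed once, copied per use
    hlen = prf.digest_size
    blocks = []
    for i in range(1, -(-dklen // hlen) + 1):  # ceil(dklen / hlen) blocks
        # U1 = PRF(P, S || INT(i)); block T_i = U1 ^ U2 ^ ... ^ Uc
        h = prf.copy()
        h.update(salt + struct.pack(">I", i))
        u = h.digest()
        t = int.from_bytes(u, "big")
        for _ in range(iterations - 1):
            h = prf.copy()
            h.update(u)
            u = h.digest()
            t ^= int.from_bytes(u, "big")
        blocks.append(t.to_bytes(hlen, "big"))
    return b"".join(blocks)[:dklen]
```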


I saw a Twitter discussion with Colin a couple of weeks ago where he pointed out that Python no longer guarantees constant time for all of the basic binary operations like addition (since fixed-width integers can spill over into arbitrary-precision math if you overflow them, or breathe on them wrong).

Be careful.


A password hashing function, almost by definition, does not handle any data that must be protected from the user, so side-channel leaks don't make much sense. Secret information only comes into play in the final comparison of the hash of the user-supplied password with your stored hash, and I don't see any way of exploiting a possible timing leak in that comparison that doesn't require breaking the hash in the first place.
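Python's standard library covers the one step that does touch secret data, the final comparison, with `hmac.compare_digest` (Python 3.3+). It is designed to avoid the early exit that `==` takes at the first differing byte. In this minimal sketch, `stored_digest` stands in for a value that would normally be loaded from your database:

```python
import hashlib
import hmac

def digests_match(candidate: bytes, stored_digest: bytes) -> bool:
    # Unlike ==, compare_digest avoids content-based short-circuiting.
    return hmac.compare_digest(candidate, stored_digest)

stored_digest = hashlib.sha256(b"salt" + b"password").digest()
print(digests_match(hashlib.sha256(b"salt" + b"password").digest(), stored_digest))  # True
```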


Side channels don't matter for bcrypt.


Can you explain why this is? (Not doubting you; just curious.)


The online operations exposed to an attacker in a bcrypt system are done almost entirely on attacker-known data.


Oh boy. That could get messy.


What were your timings like for bcrypt using 12 rounds? Also, there have been numerous benchmarks for the SHA-x algorithms; what about the pycrypt module?


Timings are going to be completely dependent on the server. Anything I tell you will be wrong unless you're running on my server.

And I didn't test any SHA-x algorithms. Any algorithm that has a time you can't easily increase by tweaking a number (BCRYPT_ROUNDS) will eventually become insecure as computers get faster.
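Timings depend on the machine, so in practice you measure on your own server. This small sketch uses `time.perf_counter`. bcrypt isn't in the standard library, so it times `hashlib.pbkdf2_hmac` as a stand-in. With py-bcrypt installed, you would instead time `bcrypt.hashpw` using salts from `bcrypt.gensalt()` at different cost values. The iteration counts here are arbitrary:

```python
import hashlib
import time

def time_pbkdf2(iterations: int, repeats: int = 3) -> float:
    """Best-of-N wall-clock seconds for one PBKDF2-HMAC-SHA256 derivation."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"benchmark password", b"0123456789abcdef", iterations)
        best = min(best, time.perf_counter() - start)
    return best

for iterations in (10_000, 20_000, 40_000):
    # Doubling the work factor should roughly double the time.
    print(iterations, "iterations: {:.4f} s".format(time_pbkdf2(iterations)))
```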


It's not straight SHA-x, I'd wager; glibc includes a version of phk's MD5-based crypt() based on SHA2 (with some bonus insanity thrown in; it's a spawn of MD5 crypt and Drepper, after all.)

Like md5crypt, this new crypt() is not based on established cryptographic principles.


Does anyone know how to use bcrypt on App Engine? py-bcrypt is not pure python and hence can't be used.


I don't know of any pure-Python bcrypt implementation, but App Engine will let you calculate SHA-1 hashes with the built-in hashlib module. To make this slow enough, you'll need to iterate it a bunch of times (at least 1000 is the usual recommendation). Something like this:

    import hashlib
    
    def slow_hash(password, salt, iterations=2000):
        # password and salt must be bytes (e.g. password.encode('utf-8'))
        h = hashlib.sha1(salt + password).digest()
        for _ in range(iterations):
            # Hash the previous output, mixing the password back in each round
            h = hashlib.sha1(h + password).digest()
        return h

I haven't tested this code, but hopefully it illustrates the general idea. Repeatedly run the hash function on the output of the previous iteration. You may need to bump up the number of iterations later as computers get faster. It's not as good as bcrypt, but it also doesn't suck, and it should run just fine on App Engine.
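Bumping the iteration count later is painless if each hash is stored alongside the count it was made with. Old hashes keep verifying, and each one can be re-hashed at the new count the next time its user logs in, while the plaintext is briefly available. This is a sketch under those assumptions. The record layout, the `CURRENT_ITERATIONS` constant and the helper names are all hypothetical:

```python
import hashlib
import hmac
import os

CURRENT_ITERATIONS = 4000  # hypothetical: raise this as hardware gets faster

def _stretch(password, salt, iterations):
    # Iterated SHA-1, same idea as slow_hash in the comment above (bytes in, bytes out).
    digest = hashlib.sha1(salt + password).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + password).digest()
    return digest

def make_record(password, iterations=None):
    if iterations is None:
        iterations = CURRENT_ITERATIONS  # read at call time, not definition time
    salt = os.urandom(16)
    return {"iterations": iterations, "salt": salt,
            "hash": _stretch(password, salt, iterations)}

def login(password, record):
    """Return (ok, record); the record is re-hashed if its iteration count is stale."""
    candidate = _stretch(password, record["salt"], record["iterations"])
    if not hmac.compare_digest(candidate, record["hash"]):
        return False, record
    if record["iterations"] < CURRENT_ITERATIONS:
        record = make_record(password)  # upgrade while we have the plaintext
    return True, record
```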


I thought I remembered a post about this a while back, but:

What is the cost (time) of using bcrypt instead of the default?


Pretty much anything you want it to be. The higher the work factor (BCRYPT_ROUNDS in this app), the more time it takes. That's the beauty of bcrypt.


That said, you might want to scale it back a little. 2^12 rounds takes 1-2 CPU seconds (on a different implementation, but still.)
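The work factor is logarithmic. bcrypt's cost parameter is a base-2 exponent, so `BCRYPT_ROUNDS = 12` means 2**12 = 4096 iterations of its expensive key setup, and each +1 doubles the time. The hypothetical planning helper below scales a timing you measured yourself. The 0.25 s input is a made-up example, not a benchmark:

```python
def estimated_seconds(measured_seconds: float, measured_rounds: int, new_rounds: int) -> float:
    """Scale a measured bcrypt time to another cost; time is proportional to 2 ** rounds."""
    return measured_seconds * 2 ** (new_rounds - measured_rounds)

print(2 ** 12)                          # 4096 key-setup iterations at rounds=12
print(estimated_seconds(0.25, 12, 14))  # 1.0: two more rounds, four times the work
print(estimated_seconds(0.25, 12, 10))  # 0.0625
```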


Next step is to make this the default. Unfortunately, the django overlords think it's not important enough to be the default.


A few things:

First, there's no such thing as "the django overlords". We're an open source community of thousands. Hundreds contribute. Dozens have commit access.

Very few among these contributors, and none (that I know of) on the commit team, think that bcrypt support "is not important." If you'd like to prove me wrong, please cite your source.

Now, this has been proposed a few times (http://code.djangoproject.com/ticket/5787, http://code.djangoproject.com/ticket/5600). Each time, substantive problems with the approach have been found. A few choice quotes from the discussion should illustrate the issues:

Me, on #5600: "[T]here's a problem with supporting any hash schemes not in Python 2.3 (our lowest supported Python version): it means databases created with a different version of Python break when used under a lower one." (Now it's Python 2.4, and 2.5 soon, but the problem still holds.)

Malcolm, on #5787: "As soon as you start generating passwords that are only computable based on an optional model, the database can only be used with Django installations that have that model available. This removes the ability to move the database around easily. Django operates on a "batteries included" philosophy for exactly that reason: runs anywhere without lots of extra dependencies."
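Malcolm's portability point shows up in how a stored hash is checked. Django's stored password strings take the form `algorithm$salt$hash` (for example `sha1$...$...`), and checking dispatches on that prefix. The sketch below, with hypothetical names, shows why rows hashed with an algorithm the current install can't compute become unverifiable there, even though the data itself is intact:

```python
import hashlib
import hmac

def _check_sha1(raw_password, salt, hexdigest):
    # Salted SHA-1 as sha1(salt + password), hex-encoded.
    computed = hashlib.sha1((salt + raw_password).encode("utf-8")).hexdigest()
    return hmac.compare_digest(computed, hexdigest)

# Hypothetical registry: the algorithms *this particular install* can verify.
# An install without a compiled bcrypt module would have no "bcrypt" entry.
CHECKERS = {"sha1": _check_sha1}

def check_password(raw_password, stored):
    algorithm, rest = stored.split("$", 1)
    checker = CHECKERS.get(algorithm)
    if checker is None:
        # The row is fine; this installation simply can't verify it.
        raise ValueError("no hasher available for %r on this installation" % algorithm)
    salt, hexdigest = rest.split("$", 1)
    return checker(raw_password, salt, hexdigest)
```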

Worse, this third-party module (i.e. Python bcrypt module), last I checked, failed to build on Windows. As much as I hate supporting Windows, I recognize why we have to.

What I'm trying to say is that the issue is technical, not personal or political as you seem to think. If the technical issues can be overcome, I don't see why bcrypt support couldn't be the default.

Finally, I'll close with the obligatory note that open source projects get driven forward by people scratching their own itch. If this bothers you, fix it. If your fix is rejected, try again. If you think we're a bunch of asshats, fork the project. Do any of those things, and I'll respect you. Complain and sling personal attacks and I won't.


This is why I chose to make this a separate app instead of embarking on a painful journey to get it into core.

Those of us that run OSes that don't suck and don't use versions of Python that are older than the average hamster's lifespan can add two lines to their project and be more secure TODAY.


And the others of us in the same situation greatly appreciate it.


If you can't hash passwords properly, you shouldn't be dealing with passwords at all. It's just damned irresponsible.

"the django overlords" "Dozens have commit access."

I suspect that's what he meant.


> It's just damned irresponsible.

Since you feel that strongly I can expect to see a patch from you fixing the technical issues I mentioned, right? You write it, I'll commit it. Go.


Look, I love Django, but to be fair this is kind of a bullshit response.

There are things I hate about Git, and I could fix them myself, but I don't submit patches to Git. I just use Mercurial instead.

"Send patches" isn't a be-all, end-all response to any criticism of an open source project.


You're right. I get kinda pissed when people call me "damned irresponsible" because I can't find the time to solve some technical problems. It implies a sense of entitlement to my time that rubs me the wrong way.

You did exactly the right thing: figured out how to solve the problem regardless. That's something that'll motivate me; calling me stupid or irresponsible won't.

So yeah, you're right, it is bullshit, and so is the attitude I responded to. Garbage in, garbage out, I suppose.


Having several open source projects of my own, and contributing to a few more, I definitely know how you feel.

Here's what I want to know about this in a nutshell:

Does Django (the project as a whole) want to provide the best possible security, within reason, for its contrib.auth module?

If not, why not, and why isn't it stated prominently in the documentation?

Is bcrypt not the best possible security, or reasonably close to it?

If not, why not? I'm not even remotely close to a cryptography expert, so although bcrypt's support for arbitrary work factors seems to provide very good security to me I know I could very well be horribly wrong in this thinking.

Is providing bcrypt hashing for passwords in contrib.auth not within the realm of reasonable effort? This could mean rewriting bcrypt in pure Python and including it in contrib, to support Windows users.

If not, why not? Perhaps rewriting bcrypt in Pure Python is not easy -- I haven't tried it myself.

If bcrypt hashing is secure and reasonable to implement, and Django wants to provide the best security possible (within reason), why is this not a blocking issue for Django 1.3?

I genuinely don't know the answers to any of these questions, so I'd really love to know.


Having several open source projects of my own, and contributing to a few more, I'd like to say that I don't think you're on the right track if you're trying to get a real response with your questions here; you're basically doing the "so, have you stopped beating your wife yet" routine.

Don't believe me? Your words:

Does Django (the project as a whole) want to provide the best possible security, within reason, for its contrib.auth module? If not, why not, and why isn't it stated prominently in the documentation?

Of course we want to provide the best possible security, within reason. But reasonable people can and do disagree on what's "within reason", and Jacob's outlined some technical hurdles regarding bcrypt which -- so far as I'm aware -- no-one in this thread has bothered to offer solutions for.

If you're genuinely interested in seeing bcrypt in Django, and have constructive suggestions on how to overcome these technical hurdles, then I'm all ears. If, on the other hand, you're just going to post passive-aggressive stuff framed to make us look like we don't really care about security, well, don't expect me or anyone else to fall all over themselves trying to help you out.


Mmm, that's not how I read Steve's questions -- I took them as honest questions from someone who doesn't really follow the project and isn't sure where our priorities lie. We have to keep in mind that at this point the bulk of our users don't keep close track of the development process and priorities. Heck, even I have trouble keeping up sometimes.


At best I can say it's incredibly poorly phrased if it was trying to raise constructive points. The implication of "if you really cared about security, you'd..." just rubs me the wrong way.


I'm sorry. I'm a programmer and think in terms of `if X elif Y else Z` statements.

I admitted I might be wrong at pretty much any stage, and Jacob's response convinced me that my "rewrite bcrypt in Python" option is probably not reasonable at this point.

How could I have phrased that differently and still asked the same questions?


> Is bcrypt not the best possible security, or reasonably close to it?

Reasonably close to it, yes, but scrypt is better. scrypt makes the KDF expensive not just in time, but in memory as well.

http://www.tarsnap.com/scrypt.html

"We estimate that on modern (2009) hardware, if 5 seconds are spent computing a derived key, the cost of a hardware brute-force attack against scrypt is roughly 4000 times greater than the cost of a similar attack against bcrypt (to find the same password), and 20000 times greater than a similar attack against PBKDF2."


Well, yes and no. scrypt is a very sensible design based on the battle-tested PBKDF2, but it's still a lot newer than bcrypt. That said, either algorithm should be totally fine.


> I genuinely don't know the answers to any of these questions, so I'd really love to know.

Well, I don't speak for the project as a whole, so I'm going to just answer personally. I'll try to channel the rest of the core team as best I can, but please don't take any of the below as any sort of "official" thing. I may very well have a different point of view or be in the minority -- I often am, actually.

> Does Django (the project as a whole) want to provide the best possible security, within reason, for its contrib.auth module?

I certainly do, and I'm sure the rest of the team feels similarly. We take security issues very seriously and I'm disappointed we've not been able to demonstrate that through our past actions (i.e. built-in XSS and CSRF protection, our security releases, etc.) This indicates to me that we haven't done a good job being clear about our goals with regard to security. So that's something to work on.

I think, though, that reasonable people can -- and do -- disagree about what "within reason" means. I mean, are we building Django to protect against script kiddies? Malicious employees? Corporate espionage? Government agencies?

Me, I suspect I'd choose to fold in the face of a lawsuit or subpoena, so I don't particularly care if my passwords are safe against the NSA or something. But that's because I'm a spoiled comfortable middle class yutz.

> Is bcrypt not the best possible security, or reasonably close to it?

Personally I have no idea. I'm not a security expert, nor am I even a well-informed amateur. I've read (here and on Reddit, mostly) that bcrypt is the best there is. I've read that bcrypt is for lamers and scrypt is better. I've also been told that salts & sha1 is fine. I've also been told that sha1 will eat my children and burn my house down. I'm honestly not qualified to judge.

Given what I know, my feeling is that bcrypt/scrypt is certainly an improvement over sha1, and probably an improvement over any sha version. I'm not convinced that it's an improvement over, say, multiple rounds of a sha algorithm.

More importantly, I'm not clear on exactly how big a deal this is. There's a spectrum: at one end, we activate our security policy, halt everything, and release new versions, damn the backwards-compatibility concerns. At the other end of the spectrum we do nothing. I really don't know where on this spectrum the issue falls. I suspect that it's somewhere a bit more serious than the potential timing attacks we just fixed in trunk, but maybe a bit less serious than the DOS attack our last security release fixed.

> Is providing bcrypt hashing for passwords in contrib.auth not within the realm of reasonable effort? This could mean rewriting bcrypt in pure Python and including it in contrib, to support Windows users.

Of course it's possible, but the devil, as they say, is in the details. I think it should be possible to support bcrypt if it's installed, but the concerns about data portability need to be addressed in some way. At the very least there should be some "I don't care about data portability" flag you can set to turn on bcrypt support.

A pure-Python bcrypt implementation would certainly help. But I certainly am not going to rewrite bcrypt -- I know enough about crypto to know that I shouldn't be allowed within a thousand miles of writing an algorithm by hand. And frankly there aren't any active committers I'd trust to write such an implementation. It would have to come from a pretty unimpeachable source, wouldn't you agree?

> If bcrypt hashing is secure and reasonable to implement, and Django wants to provide the best security possible (within reason), why is this not a blocking issue for Django 1.3?

Because nobody proposed it and we're well past feature freeze for 1.3 and very close to cutting a release candidate. Also because there's a great third-party app that provides this feature in a very easy-to-use way :)

But if a majority of the community wanted to make bcrypt (or whatever) a blocking feature for 1.3 then I'd go along with it. I'd argue against it, but again I'm just one voice. A loud one, maybe, but I'd like to think I can take being wrong graciously. I was against template auto-escaping originally, for example, so clearly I've already got a good track record of being wrong about security.

I hope that helps; it's late and I've had a long day. Please ask if I'm not being clear.


> I think, though, that reasonable people can -- and do -- disagree about what "within reason" means.

Absolutely, which is where my next questions come from.

> I'm not convinced that it's an improvement over, say, multiple rounds of a sha algorithm.

Sure, for this conversation feel free to replace "bcrypt" with "configurable rounds of SHA1".

> More importantly, I'm not clear on exactly how big a deal this is.

I agree here.

Yes, bcrypt is better.

Is it "better enough" to warrant backwards-incompatible changes? Maybe not.

I'm not clear on Django's database-compatibility policy though. Are databases created with Django 1.X guaranteed to work with Django 1.Y (where Y < X)? If not, then there are no problems. If so, then you're right, a backwards incompatible change like this is not trivial.

Maybe I completely missed this in the docs.

> At the very least there should be some "I don't care about data portability" flag you can set to turn on bcrypt support.

This goes back to my question about databases working with older versions of Django. Did I miss an important part of the docs?

> A pure-Python bcrypt implementation would certainly help. But I certainly am not going to rewrite bcrypt -- I know enough about crypto to know that I shouldn't be allowed within a thousand miles of writing an algorithm by hand. And frankly there aren't any active committers I'd trust to write such an implementation. It would have to come from a pretty unimpeachable source, wouldn't you agree?

This is the first argument that really convinces me. If you need to support Windows and don't have anyone you trust to write a real crypt implementation in pure Python, that kind of kills the idea in its tracks.

A third-party app usable by non-Windows-users seems like the best option.

> I hope that helps; it's late and I've had a long day. Please ask if I'm not being clear.

Definitely. Thanks for taking the time to answer.


I don't submit patches to Django because I 1) am not a Django dev, 2) don't use it, and 3) believe it is the job of the people who do both of those to do it.

I don't normally have this sort of attitude, but when it comes to security, "if you can't [be bothered to] do it right, DON'T." MD5/SHA/etc are designed to be fast. That is the absolute last thing you want in a hashing algorithm that you're using for passwords.


I enjoyed the part where you addressed any of the issues that have been brought up over the past few years of having this be in core, instead of an "optional" app.


Looking at the py-bcrypt site, it looks like it has worked on Windows in the past. It should be possible to fix it to work on Windows (I see a patch from a few months ago to fix a recent build problem on XP).

py-bcrypt works with 2.4, too.


> This removes the ability to move the database around easily.

Aren't warnings in the documentation ("This setting requires this and that, so if you choose it, don't expect your project to be easily portable. You have been warned.") enough?


The issue is one of priorities, and it is clear to me that supporting Python 2.3, or remaining batteries-included, or backward compatible, or whatever other excuse you can come up with is more important than having modern password security. Perhaps if it seemed remotely likely that a patch that met most, but perhaps not all, of your requirements would be accepted, someone would do it. As it stands, it seems futile given the unrealistic requirements.



