Hacker News

No, not really.

In fact, you have it totally backwards: you're not supposed to sanitize all user input before storing it. Instead you're supposed to sanitize any user input before you output it back to your webpage.

Even more so: it's the output that dictates what sanitization you should perform, not the input. You don't sanitize for HTML (XSS etc.) when you store your data in your DB; at that step you guard against SQL injection. The same goes for any other output: if you take user input and run a shell command, you sanitize for shell safety, not for HTML.
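To make that concrete, here is a minimal Python sketch of one input being handled three ways depending on where it is headed. The sample string and the `comments` table are made up for illustration; the calls are standard-library ones (`html.escape`, `shlex.quote`, `sqlite3` parameter binding).

```python
import html
import shlex
import sqlite3

# Hypothetical hostile input, kept exactly as submitted.
user_input = "<b>Robert'); DROP TABLE students;--</b> $(reboot)"

# HTML context: escape the characters that are significant in markup.
as_html = html.escape(user_input)

# Shell context: quote so the shell sees one literal argument.
as_shell_arg = shlex.quote(user_input)

# SQL context: bind the value as a parameter instead of escaping by hand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")
conn.execute("INSERT INTO comments (body) VALUES (?)", (user_input,))
stored = conn.execute("SELECT body FROM comments").fetchone()[0]
```

Note that the stored value is still the original string; only the HTML and shell copies were transformed, each for its own destination.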



Sanitization does not refer to escaping HTML entities before rendering to avoid cross site scripting. That is just escaping of entities done when rendering templates.

I believe that by sanitization most people mean processing content which will be rendered without being escaped; a good example is content from WYSIWYG editors. This is where sanitization libraries come into play.

You would sanitize HTML fragments before storing them in the database because you don't escape them during rendering. Text content is not sanitized before saving to the database, as you can just escape it when rendering.
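As a rough illustration of what "sanitizing an HTML fragment" means here, the following is a toy allowlist filter built on Python's standard `html.parser`. The tag allowlist and the names `AllowlistSanitizer` and `sanitize_fragment` are assumptions for this sketch; a real application would more likely rely on a maintained sanitization library than on something this small.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li"}  # assumed allowlist
DROP_CONTENT = {"script", "style"}  # tags whose text content is discarded too


class AllowlistSanitizer(HTMLParser):
    """Keeps allowlisted tags (minus attributes) and escaped text; drops the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # every attribute is discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # text is re-escaped on the way out


def sanitize_fragment(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


clean = sanitize_fragment('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>')
```

The result keeps the formatting tags but loses the `onclick` attribute and the whole `<script>` element, which is the kind of output you could then render without escaping.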


You've said nothing that contradicts my post.

As long as the data is sanitized before it can affect the storage/transport mechanism for its content type, you're good.


> As long as the data is sanitized before it can affect the storage/transport mechanism for its content type, you're good.

No, not really. Storing the user's data as-is is almost always of paramount importance. The fact that it may be output as HTML/XML/Markdown/whatever means that output time is really when you must sanitize/escape/quote.

That's why the moral of the Bobby Tables story isn't: "Oh, just remove all semicolons". It's "use prepared queries".
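For anyone who hasn't seen it spelled out, a small sketch of the "prepared queries" approach using Python's built-in `sqlite3` module. The `students` table is assumed for the example; the point is that the name is bound as a parameter rather than spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

name = "Robert'); DROP TABLE students;--"

# The value is bound to the `?` placeholder, so the quote and semicolon
# inside `name` are treated as data and never parsed as SQL.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

rows = conn.execute("SELECT name FROM students").fetchall()
```

The table survives and the name comes back unchanged, semicolons and all.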


I don't disagree with sanitizing data at output time when it's clear that A) the input won't affect anything else and B) output is going to happen. But realize not all input winds up in a SQL database, not all input will be considered valid in all contexts, and not all input eventually becomes output.

Sometimes, data really does need to be sanitized at the point of submission. If you disagree, that's more of a point about application design than appsec.


> But realize not all input winds up in a SQL database, not all input will be considered valid in all contexts, and not all input eventually becomes output.

That was the point I was trying to make: Sanitizing input is fail-from-the-start. There's no way to know ahead of time what outputs you're going to be producing 5 years from now. Conclusion: Store all input exactly as received. (We can do that these days with form/URL encodings and whatnot.)

Ok, so now you have the data stored accurately.

Next step: You need to output to, let's say, HTML. Ok, so you just escape/quote everything appropriately and nobody gets hurt. If you do the escaping/quoting properly, there are no XSS attacks. It's really just that simple.
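A minimal sketch of that output step in Python, assuming the comment was stored verbatim. `render_comment` is a hypothetical helper; `html.escape` is the standard-library call, and it only covers element text and quoted attribute values, so other spots such as URLs or inline scripts need their own handling.

```python
import html

# Pretend this came straight out of storage, exactly as submitted.
stored_comment = '<script>alert("xss")</script> I <3 "quotes" & ampersands'


def render_comment(body: str) -> str:
    """Escape at output time, for the HTML context specifically."""
    return f"<p>{html.escape(body)}</p>"


page = render_comment(stored_comment)
```

The stored string is untouched, and the browser displays the text of the script tag instead of running it.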

However, it is NOT about sanitizing at the "input" point. Do you get what I'm saying now?

(I realize that that sounds aggressive, but I really just want to force this point home. Please tell me if you disagree or find some detail in my explanation confusing. This is important for the security of the web and either I'm wrong or you're wrong or I didn't understand what you said. Let's figure out which is the case.)

[1] There are caveats here.


You're all saying the same thing.

I didn't specify whether the sanitize occurred on receiving user input or displaying it.

I only said, sanitize all user input.


> I didn't specify whether the sanitize occurred on receiving user input or displaying it.

I'm sorry, but you basically did. You said:

> You must sanitize ALL user input even if you don't think you're going to render it on a web page

Which implies that sanitizing input at display time, when you know you're rendering it to a web page, is too late. That's why people are jumping on you.

Keeping a clean database is the absolute most important thing you can do. The database isn't contextual. The data it stores can find its way into HTML pages, REST responses, SQL queries, PDF reports, XML/JSON data exports and a ton of other formats. Each of these output formats requires a different form of sanitizing. Sanitizing before the data hits disk creates a nightmare for anyone displaying the data in a context other than the one the sanitization was written for.

So what you said originally is precisely incorrect. Only sanitize input for HTML when you know it's going to be rendered to a webpage. Otherwise, leave it alone.
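Here is a small Python sketch of that nightmare, using only standard-library encoders. The sample value and the `to_*` helper names are invented for the example. The same raw value is encoded per format, and then compared with what happens when it was HTML-escaped before storage.

```python
import csv
import html
import io
import json
from xml.sax.saxutils import escape as xml_escape

raw = 'AT&T says "hi" <now>'


def to_html(value: str) -> str:
    return f"<td>{html.escape(value)}</td>"


def to_json(value: str) -> str:
    return json.dumps({"name": value})


def to_xml(value: str) -> str:
    return f"<name>{xml_escape(value)}</name>"


def to_csv(value: str) -> str:
    buf = io.StringIO()
    csv.writer(buf).writerow([value])
    return buf.getvalue()


# If the value had been HTML-escaped before it hit disk, every
# non-HTML consumer would receive HTML entities as if they were data,
# and the HTML view would double-encode them.
pre_escaped = html.escape(raw)
json_from_pre_escaped = json.loads(to_json(pre_escaped))["name"]
html_from_pre_escaped = to_html(pre_escaped)
```

With the raw value, each format round-trips cleanly. With the pre-escaped one, the JSON consumer sees `AT&amp;T` and the HTML page shows `&amp;amp;` in its source.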

Now, you should be using view-layer frameworks to make that sanitization easy, automatic and the default action. When rendering to HTML, the templating language should sanitize by default and give template authors a way to opt out when they know the data did not come from user input. Likewise, in the SQL context, prepared statements make it easy for the developer to do the right thing. But at no point are you speculatively sanitizing all user input. You're getting user input to disk in as pristine a format as possible and sanitizing contextually depending on how the data is output.
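As a toy illustration of "escape by default, opt out explicitly", here is a sketch in plain Python. `TrustedHTML` and `render` are hypothetical names, not any particular framework's API; real template engines implement the same idea with much more care.

```python
import html


class TrustedHTML(str):
    """Marker type: the template author vouches that this is safe markup."""


def render(template: str, **values) -> str:
    """Fill `{name}` placeholders, escaping every value unless it is TrustedHTML."""
    prepared = {
        key: value if isinstance(value, TrustedHTML) else html.escape(str(value))
        for key, value in values.items()
    }
    return template.format(**prepared)


page = render(
    "<h1>{title}</h1><div>{body}</div>",
    title=TrustedHTML("<em>Release notes</em>"),  # author-written, opted out
    body="<img src=x onerror=alert(1)>",          # user input, escaped by default
)
```

The developer has to do something deliberate to skip escaping, so forgetting about it fails safe.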


Also, really it should be less about sanitising and more about escaping. If possible, the input should be stored essentially as-is.



