Cross-Site Scripting (XSS) attacks are a type of injection, in which malicious scripts are injected into websites. XSS attacks occur when an attacker uses a web application to send malicious code generally in the form of a browser side script, to a different end-user.
Early on, two primary types of XSS were identified, Stored XSS and Reflected XSS. In 2005, Amit Klein defined a third type of XSS, which Amit coined DOM Based XSS. These 3 types of XSS are defined as follows:
Stored XSS occurs when user input is stored on the target server, such as in a database, in a message forum, visitor log, comment field, etc. And then a victim is able to retrieve the stored data from the web application without that data being made safe to render in the browser. With the advent of HTML5, and other browser technologies, we can envision the attack payload being permanently stored in the victim’s browser, such as an HTML5 database, and never being sent to the server at all.
Reflected XSS occurs when user input is immediately returned by a web application in an error message, search result, or any other response that includes some or all of the input provided by the user as part of the request, without that data being made safe to render in the browser, and without permanently storing the user provided data. In some cases, the user provided data may never even leave the browser (see DOM Based XSS next).
As defined by Amit Klein, who published the first article about this issue , DOM Based XSS is a form of XSS where the entire tainted data flow from source to sink takes place in the browser, i.e., the source of the data is in the DOM, the sink is also in the DOM, and the data flow never leaves the browser. For example, the source (where malicious data is read) could be the URL of the page (e.g., document.location.href), or it could be an element of the HTML, and the sink is a sensitive method call that causes the execution of the malicious data (e.g., document.write).”
For the majority of PHP applications, htmlspecialchars() will be your best friend. htmlspecialchars() supplied with no arguments will convert special characters to HTML entities, below shows the conversions performed:
'&' (ampersand) becomes '&' '"' (double quote) becomes '"' '<' (less than) becomes '<' '>' (greater than) becomes '>'
Eagle eyed readers may notice this does not include single quotes. For this reason we recommend that htmlspecialchars() is always used with the ‘ENT_QUOTES’ to ensure single quotes will be encoded. Below shows the singe quote entity conversion:
"'" (single quote) becomes ''' (or ')
Another function exists which is almost identical to htmlspecialchars(). htmlenities() performs the same functional sanitization on dangerous characters, however, encodes all character entities when one is available. This may lead to excessive encoding and cause some content to display incorrectly if character sets change.
http://www.example.com/view.php?name=te"st [...] <script> var = "te\"st "; // addslashes() displayname(var); </script>
<script> var = "test1</script><script>alert(document.cookie);</script>"; displayname(var); </script>
We talked before about considerations for the location of data, and will go over some examples where entity encoding with htmlspecialchars() is not enough. One of the most common examples of this is when data is inserted within the actual tag or attribute of an element.
This is just one of many somewhat rare situations where extremely strict filtering is required. For an in-depth look at many injections, take a look at the OWASP XSS Prevention Cheat Sheet.
Virtue Security makes no recommendation or provides any warranty for third party products or software; however, we are aware that several third party PHP libraries are commonly used to assist in XSS prevention. Below are projects that may assist developers building suitable whitelists:
HTML Purifier – http://htmlpurifier.org/
PHP Anti-XSS – https://code.google.com/p/php-antixss/
htmLawed – http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/