How to detect XSS in Java using OWASP

I am dealing with XSS at the moment.
I need to detect whether a string contains XSS. To solve this I used that link, and this is the code I'm using:
public static boolean containsXSS(String value) {
    if (StringUtils.isEmpty(value)) {
        return false;
    }
    String stripXss = stripXSS(value);
    return !value.equals(stripXss);
}

public static String stripXSS(String value) {
    if (StringUtils.isBlank(value))
        return value;
    // Use the ESAPI library to avoid encoded attacks.
    Encoder encoder = ESAPI.encoder();
    value = encoder.canonicalize(value);
    // Avoid null characters
    value = value.replaceAll("\0", "");
    // Clean out HTML
    Document.OutputSettings outputSettings = new Document.OutputSettings();
    outputSettings.escapeMode(Entities.EscapeMode.xhtml);
    outputSettings.prettyPrint(false);
    value = Jsoup.clean(value, "", Whitelist.none(), outputSettings);
    return value;
}
With the code above I can catch things like: <script>alert('xss')</script>
My problem is that the following string is flagged as containing XSS although it does not:
{"item" :5}
That is because Jsoup.clean turns it into {&quot;item&quot; :5}
I have tried to solve this without success.
It makes me wonder whether my algorithm is completely wrong (and if so, where can I find an algorithm to detect XSS?). Perhaps I should not compare against the original String?
I would very much appreciate your help.
Thanks

You cannot detect whether a string contains XSS. XSS is an output issue, not an input issue. Data that is benign in one context can cause malicious behaviour in another.
Validate data using whitelists to ensure it is valid in your domain (numbers are numbers, names do not contain unwanted characters, etc.). This will stop some, but definitely not all, attacks.
Contextually encode user-provided output as explained in the OWASP XSS prevention cheat sheet.
Don't mix client-side and server-side templates.
Be careful when using unsanitized data in JavaScript (see DOM-based XSS).
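To illustrate the point about encoding at output time, here is a minimal sketch of an encoder for one context only (HTML element content and quoted attribute values). The class and method names are hypothetical and the helper is hand-rolled purely for illustration; in a real application a vetted encoder library would be the safer choice, and other contexts (JavaScript, URLs, CSS) need their own encoding rules.

```java
class ContextualEncodingSketch {

    // Hypothetical helper: escapes the characters that are significant
    // in HTML element content and in quoted attribute values.
    static String encodeForHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The stored input is left untouched; it is encoded only when written into HTML.
        System.out.println(encodeForHtml("{\"item\" :5}"));
        System.out.println(encodeForHtml("<script>alert('xss')</script>"));
    }
}
```

With this approach the JSON string from the question is accepted as input unchanged, and both it and the script payload become inert text once they are written into an HTML page.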

Regex pattern to split colon char with a condition

I have a string like this:
http://schemas/identity/claims/usertype:External
My goal is to split that string into two parts on the colon delimiter. The regex must not split on the colon in "http://", so the string should be split into:
http://schemas/identity/claims/usertype
External
I have tried a regex like this:
(http:\/\/+schemas\/identity\/claims\/usertype)
which gives:
http://schemas/identity/claims/usertype
:External
After that I would replace the remaining colon with an empty string.
But I don't think this is best practice, as I rarely use regex.
Do you have any suggestions to simplify the regex?
Thanks in advance
This is an X/Y problem. Fortunately, you asked the question in a great way, by explaining the underlying problem you are trying to solve (namely: Pull some string out of a URL), and then describing the direction you've chosen to solve your problem (which is bad, see below), and then asking about a problem you have with this solution (which is irrelevant, as the entire solution is bad).
URLs aren't parsable like this. You shouldn't treat them as a string you can lop into pieces like this. For example, the server part can contain colons too: for the port number. In front of the server part, there can be authentication info, which can also contain a colon. It's rarely used, of course.
Try this one, which shows the problem with your approach:
https://joe:joe@google.com:443/
That link just works. Port 443 is the default anyway, and Google ignores the authentication header this ends up sending, but the point is, a URL may contain this stuff.
But rzwitserloot, it.. won't! I know!
That's a bad programming mindset. That mindset leads to security issues. Why go for a solution that burdens your codebase with unstated assumptions (assumption: the places that provide a URL to this code are under my control and will never send port or auth headers)? If the 'server' part is configurable in a config file, will you mention in said config file that you cannot add a port? Will you remember 4 years from now?
The solution that does it right isn't going to burden your code with all these unstated (or very unwieldy if stated) assumptions.
Okay, so what is the right way?
First, toss that string into the constructor of java.net.URI. Then, use the methods there to get what you actually want, which is the path part. That is a string you can pull apart:
URI uri = new URI("http://schemas/identity/claims/usertype:External");
String path = uri.getPath();
String newPath = path.replaceAll(":.*", "");
String type = path.replaceAll(".*?:", "");
URI newUri = uri.resolve(newPath);
System.out.println(newUri);
System.out.println(type);
prints:
http://schemas/identity/claims/usertype
External
NB: Toss some ports or auth stuff in there, or make it a relative URL - do whatever you like, this code is far more robust in the face of changing the base URL than any attempt to count colons is going to be.
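As a sketch of that robustness claim, the same idea can be wrapped in a small helper and fed a URL that carries user info and a port. The class name, the helper name, and the example.com URL are made up for illustration; the helper uses lastIndexOf on the path instead of the two regex replacements, which is a swapped-in variation and not the code from the answer.

```java
import java.net.URI;

class UriSplitSketch {

    // Hypothetical helper: returns {urlWithoutSuffix, suffix}, looking for
    // the colon in the path component only, never in the scheme or authority.
    static String[] splitTypeSuffix(String url) {
        URI uri = URI.create(url);
        String path = uri.getPath();
        int colon = (path == null) ? -1 : path.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("No ':' in the path of " + url);
        }
        String basePath = path.substring(0, colon);
        String type = path.substring(colon + 1);
        // Resolving the shortened path keeps scheme, user info, host and port.
        return new String[] { uri.resolve(basePath).toString(), type };
    }

    public static void main(String[] args) {
        String[] parts = splitTypeSuffix(
                "https://joe:joe@example.com:8443/identity/claims/usertype:External");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```

The colons in "joe:joe" and ":8443" are never considered, because only the path is inspected.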
Use Negative Lookbehind and split
Regex:
"(?<!(http|https)):"
Regex in context:
public static void main(String[] args) {
    String input = "http://schemas/identity/claims/usertype:External";
    validateURI(input);
    List<String> result = Arrays.asList(input.split("(?<!(http|https)):"));
    result.forEach(System.out::println);
}

private static void validateURI(String input) {
    try {
        new URI(input);
    } catch (URISyntaxException e) {
        System.out.println("Invalid URI!!!");
        e.printStackTrace();
    }
}
Output:
http://schemas/identity/claims/usertype
External
I think this might help you:
public class Separator {
    public static void main(String[] args) {
        String input = "http://schemas/identity/claims/usertype:External";
        String[] splitted = input.split("\\:");
        System.out.println(splitted[splitted.length - 1]);
    }
}
Output
External

How to read DICOM string value with backslash (VR=LO, Value="0.4323\0.2325")?

Our C++ software uses ITK to write DICOM files. In them we have a private tag with LO (Long String) as VR and two decimal values as its value, like 0.3234\0.34223.
The LO choice is inherent to ITK.
In another Java application, I use dcm4che3 to read/write them. Since it respects the DICOM protocol, where backslashes are forbidden, dcm4che interprets the value as "0.3234" and never reaches the second value.
All DICOM viewer applications I use can display this value.
So my question is: Is there a trick in dcm4che to read this complete value as a string "0.3234\0.34223" despite the presence of a backslash?
Below, the code I use:
public DicomInfo uploadFile(MultipartFile file) throws IOException, ParseException {
    DicomInfo infos = new DicomInfo();
    Attributes attrs = readDicomAttributes(file);
    infos.setTags(toAttributesObject(attrs).toString());
    return infos;
}

private static JsonObject toAttributesObject(Attributes targetSeriesAttrs) {
    StringWriter strWriter = new StringWriter();
    JsonGenerator gen = Json.createGenerator(strWriter);
    JSONWriter writer = new JSONWriter(gen);
    writer.write(targetSeriesAttrs);
    gen.flush();
    gen.close();
    return Json.createReader(new StringReader(strWriter.toString())).readObject();
}

public Attributes readDicomAttributes(MultipartFile file) throws IOException {
    DicomInputStream dis = new DicomInputStream(file.getInputStream());
    Attributes dataSet = dis.readDataset(-1, Tag.PixelData);
    Attributes fmi = dis.readFileMetaInformation();
    dis.close();
    fmi.addAll(dataSet);
    return fmi;
}
In the JSON I get for this tag:
\"00110013\":{\"vr\":\"LO\",\"Value\":[\"0.4323\"]},
As you can see, the VR is LO and the second part is already lost.
The method I use to get the specific attribute:
attr.getStrings(0x00110013)
returns an array with only one value, 0.4323.
The problem happens during the readDataset call.
When I open the tags with software such as a free DICOM viewer, I see the complete data, so the data is there.
OK, I found the source of the problem: it is the addAll call, fmi.addAll(dataSet);
In dataSet, getStrings works perfectly. In fmi, after addAll, the attribute has lost the second value.
So my problem is to solve this addAll issue now: dcm4che3 java lib: Attributes.addAll method seems to lost multiple LO values
See answer from Paolo, and please believe us that the backslash is not a violation of the VR. Like he said, the attribute is 2-dimensional, i.e. it has two values of VR LO which are separated by the backslash.
I know a bit about the dcm4che project and the people behind it, and it is nearly unthinkable to me that it is generally incapable of handling this.
I strongly suspect that your problem is related to the fact that your attribute is private. That is, without any additional information to the tag and its value, dcm4che (and any other product) can never know that the attribute's value is encoded as VR LO (Long String).
The default transfer syntax in DICOM is Implicit VR Little Endian. This means that the dataset does not convey explicit information about the VR of its attributes. That information is implicitly given by the tag of the attribute, and the data dictionary (DICOM Part 6) must be used to look up the tag and obtain the corresponding VR. Obviously this only works for well-known DICOM tags defined in the standard and fails for private ones.
So one option is to try encoding the dataset in Explicit VR Little Endian, which means that the VR is part of the attribute's encoding in the dataset. I would expect this to be sufficient.
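The difference between the two encodings can be sketched at the byte level. This is an illustration in plain Java, not dcm4che code: the class and method names are hypothetical, and only the short element header used by VRs such as LO is shown (a few VRs, such as OB, OW, SQ and UN, use a longer explicit header).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class ElementEncodingSketch {

    // Values must have even length; string VRs such as LO are padded with a space.
    static byte[] paddedValue(String value) {
        String padded = (value.length() % 2 == 0) ? value : value + " ";
        return padded.getBytes(StandardCharsets.US_ASCII);
    }

    // Explicit VR: tag (4 bytes), VR (2 ASCII characters), 16-bit length, value.
    static byte[] explicitVr(int group, int element, String vr, String value) {
        byte[] v = paddedValue(value);
        ByteBuffer buf = ByteBuffer.allocate(8 + v.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) group).putShort((short) element);
        buf.put(vr.getBytes(StandardCharsets.US_ASCII));
        buf.putShort((short) v.length);
        buf.put(v);
        return buf.array();
    }

    // Implicit VR: tag (4 bytes), 32-bit length, value. The VR is not on the wire,
    // so a reader has to look it up in a dictionary keyed by the tag.
    static byte[] implicitVr(int group, int element, String value) {
        byte[] v = paddedValue(value);
        ByteBuffer buf = ByteBuffer.allocate(8 + v.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) group).putShort((short) element);
        buf.putInt(v.length);
        buf.put(v);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] explicit = explicitVr(0x0011, 0x0013, "LO", "0.4323\\0.2325");
        // Bytes 4 and 5 of the explicit encoding carry the VR itself.
        System.out.println(new String(explicit, 4, 2, StandardCharsets.US_ASCII));
    }
}
```

For the private tag (0011,0013) the implicit form gives a reader no way to know the value is an LO unless the tag is in its dictionary, which is the situation described above.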
Mature toolkits (like dcm4che) allow extending the data dictionary by configuration; that is, you can extend the "official" data dictionary used by the toolkit with your custom tag definitions, including the VR. The tag can then be looked up when the VR is not explicitly given in the dataset itself.
Again, I am not a dcm4che expert, but a quick Google search for "dcm4che private dictionary" yields this promising link.
I am more than confident that you can solve the problem in dcm4che and that you do not have to migrate to a different toolkit.
The solution to this problem is to write
dataSet.addAll(fmi);
return dataSet;
instead of
fmi.addAll(dataSet);
return fmi;
since addAll, called on fmi, loses the additional values of the private LO attribute.
LO can have multiple values separated by a backslash.
The DICOM standard says that for the VR "LO" the backslash cannot be used inside a value because it is the delimiter between multiple values.
In VRs that don't allow multiple values, the backslash can be used in values.
So dcm4che is wrong here.
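To make the multi-value reading concrete, here is a minimal sketch in plain Java of how a raw LO string could be split into its individual values. It does not use dcm4che; the class and method names are hypothetical, and the trimming assumes that leading and trailing padding spaces are not significant for LO.

```java
class MultiValueSketch {

    // Splits a raw multi-valued string on the backslash delimiter
    // and trims padding spaces from each value.
    static String[] splitValues(String raw) {
        // The regex \\ matches one literal backslash; -1 keeps empty trailing values.
        String[] parts = raw.split("\\\\", -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        // Raw value as stored, including the even-length padding space.
        for (String value : splitValues("0.4323\\0.2325 ")) {
            System.out.println(value);
        }
    }
}
```

Read this way, "0.4323\0.2325" is two LO values, which is what getStrings is expected to return.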

Perfect way to configure esapi to mitigate XSS SQLI {GET/POST data}

We have a JAX-RS service. Unfortunately raw queries are executed, with no prepared statements. We used ESAPI to mitigate XSS and SQLi, something like below:
private String mitigateSQLI(String value) {
    Encoder instance = ESAPI.encoder();
    Codec c = new MySQLCodec(MySQLCodec.Mode.ANSI);
    return instance.encodeForSQL(c, value);
}

private String mitigateXSS(String value) {
    if (value == null)
        return null;
    // Use the ESAPI library to avoid encoded attacks.
    value = ESAPI.encoder().canonicalize(value);
    // Avoid null characters
    value = value.replaceAll("\0", "");
    // Clean out HTML
    Document.OutputSettings outputSettings = new Document.OutputSettings();
    outputSettings.escapeMode(EscapeMode.xhtml);
    outputSettings.prettyPrint(false);
    value = Jsoup.clean(value, "", Whitelist.none(), outputSettings);
    return value;
}
This is used along with the ESAPI.properties file in its default configuration.
We still face SQLi in certain scenarios, since the queries are built by concatenation.
I would like to know the best way or configuration to mitigate these issues, whether through ESAPI.properties or the available ESAPI methods.
Don't run away to the alternatives. As listed in the OWASP SQL Injection Prevention Cheat Sheet:
Primary Defenses:
Option 1: Use of Prepared Statements (with Parameterized Queries)
Option 2: Use of Stored Procedures
Option 3: Whitelist Input Validation
Option 4: Escaping All User Supplied Input
Additional Defenses:
Also: Enforcing Least Privilege
Also: Performing Whitelist Input Validation as a Secondary Defense
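Configuring ESAPI.properties according to the application's requirements is important. When not using a PreparedStatement, you must escape inputs on the server side. For Java, StringEscapeUtils from Apache Commons Lang 2 offered escapeSql for this, but it only doubles single quotes and was removed in Commons Lang 3, so it should not be relied on as a defense.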
It's important to note that the design intent for ESAPI's SQLi codecs was to provide emergency support in a post-compromise scenario to buy time for you to rewrite the vulnerable queries with a PreparedStatement or some equivalent. OWASP makes no assurance because the peculiar language quirks present in the various RDBMS implementations aren't typically common knowledge. It's not that ESAPI won't protect you at all--but it will never be as good as with a PreparedStatement. I would expect only a partial mitigation.
I would also recommend against using Jsoup in lieu of Validator.getValidSafeHTML(). Jsoup will attempt to "correct" invalid HTML, and that gets complicated extremely quickly in the contexts of input validation and safe HTML.
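The first primary defense in the list above, a parameterized query, might look like the following sketch. The table name, column names, class name and method name are hypothetical, and obtaining the Connection is left out because it depends on the service's data source.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ParameterizedQuerySketch {

    // Hypothetical table and column names, for illustration only.
    static final String FIND_BY_NAME = "SELECT id FROM users WHERE name = ?";

    // The user-supplied value is bound as a parameter, so the driver never
    // parses it as SQL text and no escaping codec is needed.
    static List<Long> findIdsByName(Connection conn, String name) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_NAME)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // The SQL text is fixed at compile time; user input never becomes part of it.
        System.out.println(FIND_BY_NAME);
    }
}
```

Because the query text never changes, a value such as ' OR '1'='1 is compared against the name column as an ordinary string.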

How can I declare a Large String(<Html> codes) in Java to avoid "Constant string too long" error?

In Java I am using a large String of HTML code that is meant to form a complete, designed email body.
e.g.
String msg=
"<html>"+
<BODY CONTENT>
"</html>";
The problem is that I get the error "constant string too long".
I would appreciate ideas from anyone who has faced or solved this issue.
There is a limit on the length of a String constant, but you can have a much larger String in memory as long as it is created at run time, for example by reading it from a resource.
You can try this solution.
private static String msg = null;

public static String getMyString() throws IOException {
    if (null == msg) {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(MyClass.class.getResourceAsStream("msg.txt")))) {
            msg = br.lines().collect(Collectors.joining("\n"));
        }
    }
    return msg;
}
You can call it and store the result in another string:
String str = getMyString();
System.out.println("str = " + str);
Or you can build your string with a StringBuilder:
StringBuilder sb = new StringBuilder(100);
sb.append("<html>")
.append("<body>").append(bodycontent)
.append("</body>")
.append("</html>");
String result = sb.toString();
Hope this is helpful. cheers
There are not many things we can do when facing the "constant string too long" error.
A String constant is limited to 65,535 bytes (in its modified UTF-8 encoding) per class-file entry, but as a workaround you can split the oversized constant into several constants below that limit and join them at run time.
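A minimal sketch of that workaround follows. The part contents and all names are hypothetical stand-ins; in the real case each part would be a chunk just under the limit. One detail matters: joining the parts with the + operator on constants would be folded by the compiler back into a single constant, so the join has to happen through a method call.

```java
class SplitConstantSketch {

    // Each part stands in for a chunk below the 65,535-byte limit.
    static final String PART_1 = "<html><body>";
    static final String PART_2 = "<p>Hello</p>";
    static final String PART_3 = "</body></html>";

    // String.join is an ordinary method call, so the concatenation happens
    // at run time and each part stays a separate class-file constant.
    static String message() {
        return String.join("", PART_1, PART_2, PART_3);
    }

    public static void main(String[] args) {
        System.out.println(message());
    }
}
```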
In terms of software design, on the other hand, working with complete email bodies as plain Strings is not ideal. Developers usually use template engines for this purpose and externalize email bodies to separate files instead of String constants. See the Chunk template engine as an example that fits well into an Android app, but there are many others, such as Velocity or FreeMarker. A template engine lets you clearly separate static email content from dynamically populated pieces, separate your application data model from its HTML representation, and keep the architecture of your software sound.
I don't know the exact reason that prevents you from using a file instead of a constant, but there are many best practices for typical file-related issues, from embedding files into your jar as resources to encrypting them to avoid unwanted leakage. Just ask another question on SO here.
Try to use StringBuilder, e.g.:
StringBuilder msg = new StringBuilder(here_put_size); //change string size, default value is 16 characters
msg.append("<html>");
msg.append("<BODY CONTENT>");
msg.append("</html>");
return msg.toString();

How to verify that a URL parameter has not been tampered with?

Imagine I have a web page implemented in Java, which is available at http://mycompany.com/page1.xhtml?trafficSource=someTrafficSourceIdentifier&checkSum=....
I want to keep track of how many page visits are generated by different traffic sources (such as different advertising campaigns). For this purpose, I have the trafficSource parameter. The set of traffic source IDs is limited to 5 possible values (e. g. Google organic, Google AdWords, YouTube, Facebook, Twitter).
Imagine, I want to make sure that trafficSource has not been tampered with (nobody has placed there a different value). For this purpose, I introduce the checkSum parameter.
What is the easiest way to create a check sum for a string X (checkSum = f(X)), such that different input strings are converted to different check sums?
The purpose of this measure is a basic check of my parameters.
I tried to use the following, but it produces a byte array with non-alphanumeric characters, which I can't put into the URL.
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class CheckSumGenerator {
    public String calculateCheckSum(final String aInput) throws NoSuchAlgorithmException {
        final MessageDigest md = MessageDigest.getInstance("SHA");
        final byte[] checkSumBytes = md.digest(aInput.getBytes());
        final String result = new String(checkSumBytes);
        return result;
    }
}
You can get the referrer on the server side from the HTTP request header, which is not visible to the user. This could identify visitors coming from YouTube or Facebook.
To solve your problem of converting the hash to an alphanumeric string, you could convert the resulting bytes to hex (note that javax.xml.bind.DatatypeConverter was bundled with the JDK up to Java 8 and removed in Java 11):
final String result = DatatypeConverter.printHexBinary(checkSumBytes);
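One caveat with a plain hash as the checksum: anyone who changes trafficSource can recompute the hash as well, so it does not show that the value is untouched. A keyed hash (HMAC) with a secret that stays on the server closes that gap. The following is a sketch under assumptions: the class name, method names and the demo key are hypothetical, and key storage is out of scope. It uses only JDK classes and hex-encodes the result so it is safe to put in a URL.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SignedParameterSketch {

    // Computes a hex-encoded HMAC-SHA256 of the value under a server-side secret.
    static String sign(byte[] secretKey, String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
            byte[] tag = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(tag.length * 2);
            for (byte b : tag) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    // Recomputes the checksum and compares it without short-circuiting.
    static boolean verify(byte[] secretKey, String value, String checkSum) {
        byte[] expected = sign(secretKey, value).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = checkSum.getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        byte[] key = "demo-secret-change-me".getBytes(StandardCharsets.UTF_8);
        String checkSum = sign(key, "facebook");
        System.out.println(verify(key, "facebook", checkSum));
        System.out.println(verify(key, "twitter", checkSum));
    }
}
```

Since the set of traffic sources is limited to five values, checking the parameter against that fixed list on the server is a useful additional validation.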
