I am seeing really odd behaviour from String.matches:
requestString.matches(".*")
(boolean) false
while requestString is something like
"HTTP/1.1 200 OK - OK
[...]
Content-Type: text/xml; Charset=iso-8859-1
Content-Length: 1545" + more...
Of course, I want to test against "HTTP/\\d\\.\\d",
but obviously this fails as well:
requestString.matches("HTTP/\\d\\.\\d")
The String in requestString comes in via a Socket connection and is sent in iso-8859-1 encoding. Here is the code:
StringBuilder result = new StringBuilder();
int ch;
while ( ! timeoutExceeded() && (ch = reader.read()) != -1) {
result.append((char)ch);
}
String requestString = result.toString();
The code runs on the Android SDK.
What am I missing? Is the encoding the problem?
Solution:
thanks to the hints I tried the DotAll flag (again!) and it works:
requestString.matches("(?s).*HTTP/\\d\\.\\d.*")
By default, the dot does not match line terminators. As your input is multiline, this means the regex cannot match the whole input.
You have to use a Pattern and compile with Pattern.DOTALL:
final Pattern p = Pattern.compile(".*", Pattern.DOTALL);
p.matcher(anything).matches(); // always returns true
Illustration:
public static void main(final String... args)
{
final String input = "a\nb";
System.out.println(input.matches(".*"));
System.out.println(Pattern.compile(".*", Pattern.DOTALL)
.matcher(input).matches());
}
Result:
false
true
matches must match the entire string. Since you are matching against a multi-line string and the dot does not match line terminators by default, your pattern cannot match the complete string.
E.g.:
System.out.println("HTTP/1.1 200 OK - OK".matches(".*")); //true
System.out.println("HTTP/1.1 200 OK - OK\nContent-Type: text/xml".matches(".*")); // false
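As a minimal sketch of the two ways around this (the class and method names below are made up for illustration): either keep matches() and enable DOTALL with the inline (?s) flag, or switch to Matcher.find(), which only needs the pattern to occur somewhere in the input.

```java
import java.util.regex.Pattern;

final class HttpVersionCheck {
    // Pattern for the version token only, e.g. "HTTP/1.1".
    private static final Pattern HTTP_VERSION = Pattern.compile("HTTP/\\d\\.\\d");

    // find() succeeds if the pattern occurs anywhere, so newlines do not matter.
    static boolean containsHttpVersion(String response) {
        return HTTP_VERSION.matcher(response).find();
    }

    // matches() must cover the whole input; (?s) lets '.' cross line terminators.
    static boolean matchesWithDotAll(String response) {
        return response.matches("(?s).*HTTP/\\d\\.\\d.*");
    }

    public static void main(String[] args) {
        String response = "HTTP/1.1 200 OK - OK\nContent-Type: text/xml";
        System.out.println(containsHttpVersion(response));
        System.out.println(matchesWithDotAll(response));
    }
}
```

If only the presence of the version token matters, find() is probably the simpler choice, since it avoids the leading and trailing .* entirely.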
I am trying to open new email from my Java app:
String str=String.valueOf(email);
String body="This is body";
String subject="Hello worlds";
String newStr="mailto:"+str.trim()+"?subject="+URLEncoder.encode(subject,"UTF-8")+"&body="+URLEncoder.encode(body, "UTF-8")+"";
Desktop.getDesktop().mail(new URI(newStr));
This is my URL encoding. As I cannot use the body or subject string in a URL without encoding them, my output has "+" instead of whitespace. That is normal, I understand that. Is there a way to show the subject and body normally in my message? I tried .replace("+"," ") but it does not work, as it gives an error.
I think there might be different character set but I am not sure.
That's the way URLEncoder works.
One possible approach would be to replace all + with %20 after URLEncoder.encode(...)
Or you could rely on URI constructor to encode your parameters correctly:
String scheme = "mailto";
String recipient = "recipient@snakeoil.com";
String subject = "The Meaning of Life";
String content = "..., the universe and all the rest is 42.\n Rly? Just kidding. Special characters: äöü";
String path = "";
String query = "subject=" + subject + "&body=" + content;
Desktop.getDesktop().mail(new URI(scheme, recipient, path, query, null));
Both solutions have issues:
In the first approach, you might replace actual + signs, with the second, you'll have issues with & character.
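A minimal sketch of the first approach, with hypothetical class and method names. One detail worth knowing: URLEncoder itself writes a literal plus sign as %2B, so in its output every remaining + stands for an encoded space, and replacing those with %20 does not touch plus signs that were in the original text. The recipient is left unencoded here, which is an assumption that it is already a plain address.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class MailtoEncoding {
    // Form-encode the value, then turn the '+' that stands for a space into %20.
    static String encodeComponent(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Assemble the mailto URI string; the recipient is assumed to be a plain address.
    static String mailtoUri(String recipient, String subject, String body) {
        return "mailto:" + recipient.trim()
                + "?subject=" + encodeComponent(subject)
                + "&body=" + encodeComponent(body);
    }

    public static void main(String[] args) {
        System.out.println(mailtoUri("recipient@snakeoil.com", "Hello worlds", "This is body"));
    }
}
```

The resulting string can then be passed to new URI(String), as in the question.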
Payload:
2016-07-18 16:51:47 GMT 10.65.242.97 WinNT://CSLG1\mbr04105 CONNECT https stats.g.doubleclick.net 443 / - 1925 5148 0 173.194.206.156 c:infr default allow 12.3.33.9
Current Regex to parse an IP (it grabs the first IP right now)
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
To grab the first and second addresses in one expression, anchor to the start of the line and place two IP address patterns separated by non-greedy fillers:
^.*?((?:\d{1,3}\.){3}\d{1,3}).*?((?:\d{1,3}\.){3}\d{1,3})
Demo: https://ideone.com/FlREEd
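In Java, the same expression could be used roughly as follows. This is only a sketch; the class and method names are invented for illustration, and the pattern does not validate that each octet is at most 255.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TwoIpExtractor {
    // One capturing group for a dotted quad.
    private static final String IP = "((?:\\d{1,3}\\.){3}\\d{1,3})";
    // Line start, lazy filler, first IP, lazy filler, second IP.
    private static final Pattern TWO_IPS = Pattern.compile("^.*?" + IP + ".*?" + IP);

    // Returns the first two addresses, or an empty array if fewer than two are found.
    static String[] firstTwoIps(String line) {
        Matcher m = TWO_IPS.matcher(line);
        if (m.find()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return new String[0];
    }

    public static void main(String[] args) {
        String payload = "2016-07-18 16:51:47 GMT 10.65.242.97 WinNT://CSLG1\\mbr04105 CONNECT https "
                + "stats.g.doubleclick.net 443 / - 1925 5148 0 173.194.206.156 c:infr default allow 12.3.33.9";
        for (String ip : firstTwoIps(payload)) {
            System.out.println(ip);
        }
    }
}
```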
Use the Pattern and Matcher classes.
Code:
String yourString = ...
Pattern regular = Pattern.compile("your regex");
Matcher match = regular.matcher(yourString);
match.find(); // advance to the first match
String firstIP = yourString.substring(match.start(),match.end());
String newString = yourString.substring(match.end(),yourString.length());
regular = Pattern.compile("your new regex for the second IP");
//if it is the same regex you can skip this.
match = regular.matcher(newString);
match.find();
String secondIP = newString.substring(match.start(),match.end());
After a recent FindBugs (FB) run, it complains about a "Security - HTTP Response splitting vulnerability". The following code triggers it:
String referrer = req.getParameter("referrer");
if (referrer != null) {
launchURL += "&referrer="+(referrer);
}
resp.sendRedirect(launchURL);
Basically, the 'referrer' HTTP parameter contains a URL to which the browser returns when the user clicks a back button in our application. It is appended to the URL as a parameter. After a bit of research I know that I need to sanitize the referrer URL. After a bit more research I found the ESAPI project, which seems to offer this kind of functionality:
//1st canonicalize
import org.owasp.esapi.Encoder;
import org.owasp.esapi.Validator;
import org.owasp.esapi.reference.DefaultEncoder;
import org.owasp.esapi.reference.DefaultValidator;
[...]
Encoder encoder = new DefaultEncoder(new ArrayList<String>());
String cReferrer = encoder.canonicalize(referrer);
However I didn't figure out how to detect e.g. jscript code or other stuff which doesn't belong to a referrer url. So how can I achieve that with esapi?
I tried:
Validator validator = new DefaultValidator(encoder);
validator.isValidInput("Redirect URL",referrer,"HTTPParameterValue",512,false);
however this doesn't work. What I need is a function which results in:
http://www.google.com (ok)
http://www.google.com/login?dest=http://google.com/%0D%0ALocation: javascript:%0D%0A%0D%0Aalert(document.cookie) (not ok)
Or is it enough to call the following statement?
encoder.encodeForHTMLAttribute(referrer);
Any help appreciated.
Here's my final solution if anyone is interested. First I canonicalize and then URL-decode the string. If a CR or LF exists (\r or \n), I just cut off the rest of that potential 'attack' string, starting at the \r or \n.
String sanitize(String url) throws EncodingException{
Encoder encoder = new DefaultEncoder(new ArrayList<String>());
//first canonicalize
String clean = encoder.canonicalize(url).trim();
//then url decode
clean = encoder.decodeFromURL(clean);
//detect and remove any existent \r\n == %0D%0A == CRLF to prevent HTTP Response Splitting
int idxR = clean.indexOf('\r');
int idxN = clean.indexOf('\n');
if(idxN >= 0 || idxR>=0){
if(idxN<idxR){
//just cut off the part after the LF
clean = clean.substring(0,idxN);
}
else{
//just cut off the part after the CR
clean = clean.substring(0,idxR);
}
}
//re-encode again
return encoder.encodeForURL(clean);
}
Theoretically I could have later verified the value against the 'HTTPParameterValue' regex which is defined in ESAPI.properties; however, it didn't like the colon in http:// and I didn't investigate further.
And one more remark after testing it: most modern browsers nowadays (Firefox > 3.6, Chrome, IE10 etc.) detect this kind of vulnerability and do not execute the code...
I think you have the right idea, but are using an inappropriate encoder. The Referer [sic] header value is really a URL, not an HTML attribute, so you really want to use:
encoder.encodeForURL(referrer);
-kevin
I would suggest a white-listing approach, wherein you check that the referrer string contains only permissible characters. A regex would be a good option.
EDIT:
The class org.owasp.esapi.reference.DefaultEncoder that you are using is not really encoding anything. Look at the source code of the method encodeForHTMLAttribute(referrer) at grepcode. A typical URL encoding (encoding carriage return and line feed) won't help either.
So the way forward would be to devise some validation logic which checks for a valid set of characters.
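A white-list check along these lines could be sketched as below. The class name, the exact character set (the characters permitted in URIs plus %), the extra rejection of percent-encoded CR/LF, and the 512-character limit (borrowed from the isValidInput call in the question) are all assumptions for illustration, not ESAPI behaviour.

```java
import java.util.regex.Pattern;

final class ReferrerWhitelist {
    // Hypothetical white list: only characters that may appear in a URI, at most
    // 512 of them, and no percent-encoded CR (%0D) or LF (%0A) anywhere.
    private static final Pattern SAFE_URL = Pattern.compile(
            "(?!.*%0[aAdD])[A-Za-z0-9\\-._~:/?#\\[\\]@!$&'()*+,;=%]{1,512}");

    // Raw CR, LF and spaces are rejected because they are not in the character class.
    static boolean isSafeReferrer(String referrer) {
        return referrer != null && SAFE_URL.matcher(referrer).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSafeReferrer("http://www.google.com"));
        System.out.println(isSafeReferrer(
                "http://www.google.com/login?dest=http://google.com/%0D%0ALocation: javascript:%0D%0A%0D%0Aalert(document.cookie)"));
    }
}
```

Rejecting a suspicious value outright, rather than trying to repair it, keeps the check easy to reason about; whether that is acceptable depends on the application.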
The accepted answer does not handle every combination of "\r" and "\n". In particular, when only one of the two characters is present, the index of the missing one is -1 and substring(0, -1) throws a StringIndexOutOfBoundsException.
Example:
The string "This is str\n\rstr" should be cut down to "This is str".
A rectified version of the accepted answer, which cuts at whichever of CR or LF comes first:
String sanitizeCarriageReturns(String value) {
    int idxR = value.indexOf('\r');
    int idxN = value.indexOf('\n');
    // Cut at the earliest CR or LF; an index of -1 means "not present".
    int cut;
    if (idxR < 0) {
        cut = idxN;
    } else if (idxN < 0) {
        cut = idxR;
    } else {
        cut = Math.min(idxR, idxN);
    }
    return cut >= 0 ? value.substring(0, cut) : value;
}
I need to get a file name from file's absolute path (I am aware of the file.getName() method, but I cannot use it here).
EDIT: I cannot use file.getName() because I don't need the file name only; I need part of the file's path as well (but again, not the entire absolute path). I need the part of the file's path AFTER a certain given path.
Let's say the file is located in the folder:
C:\Users\someUser
On windows machine, if I make a pattern string as follows:
String patternStr = "C:\\Users\\someUser\\(.*+)";
I get an exception: java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence for backslash.
If I use Pattern.quote(File.pathSeparator):
String patternStr = "C:" + Pattern.quote(File.separator) + "Users" + Pattern.quote(File.separator) + "someUser" + Pattern.quote(File.separator) + "(.*+)";
the resulting pattern string is: C:\Q;\EUsers\Q;\EsomeUser\Q;\E(.*+) which of course has no match with the actual fileName "C:\Users\someUser\myFile.txt".
What am I missing here? What is the proper way to parse file name?
What is the proper way to parse file name?
The proper way to parse a file name is to use File(String). Using a regex for this is going to hard-wire platform dependencies into your code. That's a bad idea.
I know you said you can't use File.getName() ... but that is the proper solution. If you would care to say why you can't use File.getName() perhaps I could suggest an alternative solution.
If you indeed want to use a regular expressions, you should use
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
                         ^^       ^^          ^^
instead.
Why? Your string literal
"C:\\Users\\someUser\\(.*+)"
is compiled to
C:\Users\someUser\(.*+)
Since \ is used for escaping in regular expressions too, you'll have to escape them "twice".
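As a small sketch of this (class and method names are invented for illustration), the first method below uses the four-backslash literal, and the second lets Pattern.quote escape an arbitrary prefix so that no manual doubling is needed:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WindowsPathRegex {
    // Four backslashes in the Java literal -> two in the regex -> one literal backslash.
    static String tailWithEscapedLiteral(String path) {
        Matcher m = Pattern.compile("C:\\\\Users\\\\someUser\\\\(.*+)").matcher(path);
        return m.matches() ? m.group(1) : null;
    }

    // Pattern.quote wraps the prefix in \Q...\E, so its backslashes are taken literally.
    static String tailWithQuote(String prefix, String path) {
        Matcher m = Pattern.compile(Pattern.quote(prefix) + "(.*)").matcher(path);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(tailWithEscapedLiteral("C:\\Users\\someUser\\myFile.txt"));
        System.out.println(tailWithQuote("C:\\Users\\someUser\\", "C:\\Users\\someUser\\dir\\myFile.txt"));
    }
}
```

Both methods return null when the path does not start with the expected prefix.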
Regarding your edit:
You probably want to have a look at URI.relativize(). Example:
File base = new File("C:/Users/someUser");
File file = new File("C:/Users/someUser/someDir/someFile.txt");
String relativePath = base.toURI().relativize(file.toURI()).getPath();
System.out.println(relativePath); // prints "someDir/someFile.txt"
(Note that / works as file-separator on Windows machines too.)
Btw, I don't know what you have as File.separator on your system, but if it's set to \, then
"C:" + Pattern.quote(File.separator) + "Users" + Pattern.quote(File.separator) +
"someUser" + Pattern.quote(File.separator) + "(.*+)";
should yield
C:\Q\\EUsers\Q\\EsomeUser\Q\\E(.*+)
String patternStr = "C:\\Users\\someUser\\(.*+)";
Backslashes (\) are escape characters in the Java Language. Your string contains the following after compilation:
C:\Users\someUser\(.*+)
This string is then parsed as a regex, which uses backslashes as an escape character as well. The regex parser tries to understand the escaped \U, \s and \(. One of them is incorrect regarding the regex syntax (hence your exception), and none of them are what you are trying to achieve.
Try
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
If you want to solve it by pattern you need to escape your Pattern properly
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
Try putting double-double-backslashes in your pattern. You need a second backslash to escape one in the pattern, plus you'll need to double each one to escape them in the string. Hence you'll end up with something like:
String patternStr = "C:\\\\Users\\\\someUser\\\\(.*+)";
Move from the end of the string back to the first occurrence of a file path separator, or to the beginning of the string.
File paths separator can be / or \.
public static final char ALTERNATIVE_DIRECTORY_SEPARATOR_CHAR = '/';
public static final char DIRECTORY_SEPARATOR_CHAR = '\\';
public static final char VOLUME_SEPARATOR_CHAR = ':';
public static String getFileName(String path) {
if(path == null || path.isEmpty()) {
return path;
}
int length = path.length();
int index = length;
while(--index >= 0) {
char c = path.charAt(index);
if(c == ALTERNATIVE_DIRECTORY_SEPARATOR_CHAR || c == DIRECTORY_SEPARATOR_CHAR || c == VOLUME_SEPARATOR_CHAR) {
return path.substring(index + 1, length);
}
}
return path;
}
Try to keep it simple ;-).
Try this :
String ResultString = null;
try {
Pattern regex = Pattern.compile("([^\\\\/:*?\"<>|\r\n]+$)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
ResultString = regexMatcher.group(1);
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
Output :
myFile.txt
Also for input : C:/Users/someUser/myFile.txt
Output : myFile.txt
What am I missing here? What is the proper way to parse file name?
The proper way to parse a file name is to use the APIs that are already provided for the purpose. You've stated that you can't use File.getName(), without explanation. You are almost certainly mistaken about that.
I cannot use file.getName() because I don't need the file name only; I need part of the file's path as well (but again, not the entire absolute path).
OK. So what you want is something like this.
// Canonicalize paths to deal with ".", "..", symlinks,
// relative files and case sensitivity issues.
String directory = new File(someDirectory).getCanonicalPath();
String test = new File(somePathname).getCanonicalPath();
if (!directory.endsWith(File.separator)) {
    directory += File.separator;
}
if (test.startsWith(directory)) {
    String pathInDirectory = test.substring(directory.length());
    ...
}
Advantages:
No regexes needed.
Doesn't break if the path separator is something other than \.
Doesn't break if there are symbolic links on the path.
Doesn't break due to case sensitivity issues.
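Wrapped into a self-contained helper, the approach above might look like the following sketch. The class name, the method name, the Optional return type and the conversion of IOException into UncheckedIOException are choices made for this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Optional;

final class PathInDirectory {
    // Returns the part of pathname below directory, or empty if it lies elsewhere.
    static Optional<String> relativeTo(String directory, String pathname) {
        try {
            // Canonical paths resolve ".", "..", symlinks and relative names.
            String dir = new File(directory).getCanonicalPath();
            String test = new File(pathname).getCanonicalPath();
            if (!dir.endsWith(File.separator)) {
                dir += File.separator;
            }
            return test.startsWith(dir)
                    ? Optional.of(test.substring(dir.length()))
                    : Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The files do not have to exist for getCanonicalPath() to work.
        System.out.println(relativeTo("base", "base/sub/file.txt").orElse("(outside)"));
    }
}
```

The returned path uses the platform's own separator, so the caller does not need to know whether it is running on Windows or Unix.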
Suppose the file name has special characters, especially when supporting Mac, where special characters are allowed in file names: server-side Path.GetFileName(fileName) fails and throws an error because of illegal characters in the path. The following code, which uses a regex, comes to the rescue.
The following regex takes care of two things:
In IE, when a file is uploaded, the file path contains the folders as well (i.e. c:\samplefolder\subfolder\sample.xls). The expression below replaces all folders with an empty string and retains the file name.
On a Mac, the file name is the only thing supplied, as the browser is Safari, and it allows special characters in the file name.
var regExpDir = @"(^[\w]:\\)([\w].+\w\\)";
fileName = Regex.Replace(fileName, regExpDir, string.Empty);
I am trying to replace the URLs contained in a String with browser-compatible hyperlinks.
My initial String looks like this :
"hello, i'm some text with an url like http://www.the-url.com/ and I need to have an hypertext link !"
What I want to get is a String looking like:
"hello, i'm some text with an url like <a href="http://www.the-url.com/">http://www.the-url.com/</a> and I need to have an hypertext link !"
I can catch URL with this code line :
String withUrlString = myString.replaceAll(".*://[^<>[:space:]]+[[:alnum:]/]", "HereWasAnURL");
Maybe the regexp needs some correction, but it's working fine so far; I need to test it further.
So the question is how to keep the expression caught by the regexp and just add what's needed to create the link: <a href="caught string">caught string</a>
Thanks in advance for your interest and responses !
Try to use:
myString.replaceAll("(.*://[^<>[:space:]]+[[:alnum:]/])", "<a href=\"$1\">HereWasAnURL</a>");
I didn't check your regex.
By using () you can create groups. The $1 indicates the group index.
$1 will replace the url.
I asked a similar question: my question
Some examples: Capturing Text in a Group in a regular expression
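To make the group reference concrete, here is a minimal sketch. The class name and the deliberately simple URL pattern (http or https followed by non-whitespace) are assumptions for illustration, and the input is not HTML-escaped.

```java
final class Linkifier {
    // Group 1 captures the URL; $1 re-inserts it as both the href and the link text.
    static String linkify(String text) {
        return text.replaceAll("(https?://\\S+)", "<a href=\"$1\">$1</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify(
                "hello, i'm some text with an url like http://www.the-url.com/ and I need to have an hypertext link !"));
    }
}
```

Because \S+ runs to the next whitespace, trailing punctuation directly after a URL would end up inside the link; a stricter pattern is needed if that matters.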
public static String textToHtmlConvertingURLsToLinks(String text) {
if (text == null) {
return text;
}
String escapedText = HtmlUtils.htmlEscape(text);
return escapedText.replaceAll("(\\A|\\s)((http|https|ftp|mailto):\\S+)(\\s|\\z)",
"$1<a href=\"$2\">$2</a>$4");
}
There may be better REGEXs out there, but this does the trick as long as there is white space after the end of the URL or the URL is at the end of the text. This particular implementation also uses org.springframework.web.util.HtmlUtils to escape any other HTML that may have been entered.
For anybody who is searching a more robust solution I can suggest the Twitter Text Libraries.
Replacing the URLs with this library works like this:
new Autolink().autolink(plainText)
The code below replaces links starting with "http" or "https", then links starting with just "www.", and finally email links.
Pattern httpLinkPattern = Pattern.compile("(http[s]?)://(www\\.)?([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
Pattern wwwLinkPattern = Pattern.compile("(?<!http[s]?://)(www\\.+)([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
Pattern mailAddressPattern = Pattern.compile("[\\S&&[^@]]+@([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
String textWithHttpLinksEnabled =
"ajdhkas www.dasda.pl/asdsad?asd=sd www.absda.pl maiandrze@asdsa.pl klajdld http://dsds.pl httpsda http://www.onet.pl https://www.onsdas.plad/dasda";
if (Objects.nonNull(textWithHttpLinksEnabled)) {
    Matcher httpLinksMatcher = httpLinkPattern.matcher(textWithHttpLinksEnabled);
    textWithHttpLinksEnabled = httpLinksMatcher.replaceAll("<a href=\"$0\">$0</a>");
    final Matcher wwwLinksMatcher = wwwLinkPattern.matcher(textWithHttpLinksEnabled);
    textWithHttpLinksEnabled = wwwLinksMatcher.replaceAll("<a href=\"http://$0\">$0</a>");
    final Matcher mailLinksMatcher = mailAddressPattern.matcher(textWithHttpLinksEnabled);
    textWithHttpLinksEnabled = mailLinksMatcher.replaceAll("<a href=\"mailto:$0\">$0</a>");
    System.out.println(textWithHttpLinksEnabled);
}
Prints:
ajdhkas <a href="http://www.dasda.pl/asdsad?asd=sd">www.dasda.pl/asdsad?asd=sd</a> <a href="http://www.absda.pl">www.absda.pl</a> <a href="mailto:maiandrze@asdsa.pl">maiandrze@asdsa.pl</a> klajdld <a href="http://dsds.pl">http://dsds.pl</a> httpsda <a href="http://www.onet.pl">http://www.onet.pl</a> <a href="https://www.onsdas.plad/dasda">https://www.onsdas.plad/dasda</a>
Assuming your regex works to capture the correct info, you can use backreferences in your substitution. See the Java regexp tutorial.
In Java, a backreference in the replacement string is written as $1, so in that case you'd do
myString.replaceAll(....., "$1")
In case of multiline text you can use this:
text.replaceAll("(\\s|\\^|\\A)((http|https|ftp|mailto):\\S+)(\\s|\\$|\\z)",
"$1<a href='$2'>$2</a>$4");
And here is a full example of my code, where I need to show users' posts with URLs in them:
private static final Pattern urlPattern = Pattern.compile(
"(\\s|\\^|\\A)((http|https|ftp|mailto):\\S+)(\\s|\\$|\\z)");
String userText = ""; // user content from db
String replacedValue = HtmlUtils.htmlEscape(userText);
replacedValue = urlPattern.matcher(replacedValue).replaceAll("$1<a href='$2'>$2</a>$4");
replacedValue = StringUtils.replace(replacedValue, "\n", "<br>");
System.out.println(replacedValue);