Remove backslash before forward slash - java

Context: GoogleBooks API returning unexpected thumbnail url
OK, so I found the reason for the problem I had in that question.
The URL returned by the Google Books API looked something like this:
http:/\/books.google.com\/books\/content?id=0DwKEBD5ZBUC&printsec=frontcover&img=1&zoom=5&source=gbs_api
Going to that URL returns an error, but if I replace each "\/" with "/" it becomes the proper URL.
Is there something like a Java/Kotlin regex that would change http:/\/books.google.com\/ to http://books.google.com/?
(I know a bit of regex in Python, but I'm clueless in Java/Kotlin.)
Thank you.

You can use triple-quoted string literals (that act as raw string literals where backslashes are treated as literal chars and not part of string escape sequences) + kotlin.text.replace:
val text = """http:/\/books.google.com\/books\/content?id=0DwKEBD5ZBUC&printsec=frontcover&img=1&zoom=5&source=gbs_api"""
print(text.replace("""\/""", "/"))
Output:
http://books.google.com/books/content?id=0DwKEBD5ZBUC&printsec=frontcover&img=1&zoom=5&source=gbs_api
NOTE: you will need to double the backslashes in the regular string literal:
print(text.replace("\\/", "/"))
If you need to use this "backslash + slash" pattern in a regex you will need 2 backslashes in the triple-quoted string literal and 4 backslashes in a regular string literal:
print(text.replace("""\\/""".toRegex(), "/"))
print(text.replace("\\\\/".toRegex(), "/"))
NOTE: There is no need to escape / forward slash in a Kotlin regex declaration as it is not a special regex metacharacter and Kotlin regexps are defined with string literals, not regex literals, and thus do not need regex delimiters (/ is often used as a regex delimiter char in environments that support this notation).
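Since the question also asks about Java, here is a minimal sketch of the same two options there. Java has no raw string literals, so the backslashes are always doubled in the source. The class and method names (UnescapeSlashes, literal, withRegex) are made up for illustration, and the sample URL is shortened.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from any library.
class UnescapeSlashes {

    // Regex variant: the regex itself needs an escaped backslash (\\/),
    // and each of those backslashes is doubled again in the Java literal.
    private static final Pattern ESCAPED_SLASH = Pattern.compile("\\\\/");

    // Literal replacement: String.replace does not interpret its arguments
    // as regex, so only the Java string escape is needed.
    static String literal(String url) {
        return url.replace("\\/", "/");
    }

    static String withRegex(String url) {
        return ESCAPED_SLASH.matcher(url).replaceAll("/");
    }

    public static void main(String[] args) {
        String raw = "http:/\\/books.google.com\\/books\\/content?id=0DwKEBD5ZBUC";
        System.out.println(literal(raw));
        System.out.println(withRegex(raw));
    }
}
```

Both calls should produce the same cleaned URL; the literal variant is usually enough when the text to replace is fixed.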

You could first check that the string looks like a URL whose protocol slashes may be escaped, and then replace each backslash followed by a forward slash with a forward slash only:
https?:\\?/\\?/\S+
Pattern in Java
String regex = "https?:\\\\?/\\\\?/\\S+";
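For example in Java:
String regex = "https?:\\\\?/\\\\?/\\S+";
String string = "http:/\\/books.google.com\\/books\\/content?id=0DwKEBD5ZBUC&printsec=frontcover&img=1&zoom=5&source=gbs_api";
// Only rewrite the string when the whole value looks like a URL
if (string.matches(regex)) {
    // replace() treats both arguments literally, so "\\/" is backslash + slash
    System.out.println(string.replace("\\/", "/"));
}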
Output
http://books.google.com/books/content?id=0DwKEBD5ZBUC&printsec=frontcover&img=1&zoom=5&source=gbs_api

I had the same problem. My URL was:
String url="https:\\/\\/www.dailymotion.com\\/cdn\\/H264-320x240\\/video\\/x83iqpl.mp4?sec=zaJEh8Q2ahOorzbKJTOI7b5FX3QT8OXSbnjpCAnNyUWNHl1kqXq0D9F8iLMFJ0ocg120B-dMbEE5kDQJN4hYIA";
I solved it with this code:
url = url.replace("\\/", "/");

Related

Unable to strip invalid unicode characters java

I have my data which needs to be cleaned up before further processing in various other applications. In this process one of the downstream applications only allows a certain range of Unicode characters. The following is the regex I'm using to strip out the invalid Unicode characters.
/[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]/
However, I'm still having issues getting the regex to work in Java. Is there a special way to treat the above regex, since it contains a range of Unicode characters?
UPDATE:
This is how I tested it; the approach suggested by @Andreas did not seem to work:
public void testStripUnicode() {
    String doc = "{\"fields\":{\"field1\":\"unicode char '\\u000b'\",\"field2\":[\"unicode char '\\u0003'\"]}}";
    String stripped = DocumentCleaner.clean(doc);
    System.out.println(doc);
    System.out.println(stripped);
}
doc
{"fields":{"field1":"unicode char '\u000b'","field2":["unicode char '\u0003'"]}}
stripped-doc
{"fields":{"field1":"unicode char '\u000b'","field2":["unicode char '\u0003'"]}}
Should be fine, just drop the slashes / and double the backslashes \:
String regex = "[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]";
String stripped = value.replaceAll(regex, "");
Or if you do it repeatedly, you can parse the regular expression once, up front:
// Prepare regular expression
Pattern p = Pattern.compile("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]");
// Use regular expression
String stripped = p.matcher(value).replaceAll("");
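A small sketch of the precompiled variant, with one observation about the UPDATE test above: the string in that test appears to contain the six-character text \u000b (a JSON-style escape written out as plain characters), not the control character itself, so there may simply be nothing for the pattern to remove. The class name UnicodeCleaner is hypothetical and stands in for the question's DocumentCleaner, whose implementation is not shown.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the question's DocumentCleaner.
class UnicodeCleaner {

    // Matches every character outside the ranges the downstream application accepts.
    private static final Pattern INVALID =
            Pattern.compile("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]");

    static String clean(String value) {
        return INVALID.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        // A real vertical-tab character (code point 0x0B) is removed.
        String withControlChar = "tab" + (char) 0x0B + "bed";
        System.out.println(clean(withControlChar)); // tabbed

        // The JSON-style escape is plain text (backslash, 'u', four digits),
        // all of which are allowed characters, so it is left untouched.
        String withEscapeText = "unicode char '\\u000b'";
        System.out.println(clean(withEscapeText).equals(withEscapeText)); // true
    }
}
```

If the escapes need to be stripped too, the JSON would presumably have to be decoded first so that the escape becomes the actual character; that step is outside this sketch.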

how to replace \$ with $ in regex java

I have the following String
String valueExpression = "value1\\$\\$bla";
I would like it to be parsed to:
value1$$bla
When I try:
valueExpression.replaceAll("\\\\$", "\\$");
the string stays the same, and when I try:
valueExpression.replaceAll("\\$", "$");
I get an IndexOutOfBounds error.
How can I replace it in regex?
The string is dynamic so I can't change the content of the string valueExpression to something static.
Thanks
valueExpression.replaceAll("\\\\[$]", "\\$"); should achieve what you are looking for.
The simplest approach seems to be
valueExpression.replace("\\$", "$")
which is similar to
valueExpression.replaceAll(Pattern.quote("\\$"), Matcher.quoteReplacement("$"))
which means that it automatically escapes all regex metacharacters in both parts (target and replacement), letting you use simple literals.
BTW, let's not forget that String is immutable, so methods like replace can't change its state (the characters it stores); they create a new String with the replaced characters.
So you want to use
valueExpression = valueExpression.replace("\\$", "$");
Example:
String valueExpression = "value1\\$\\$bla";
System.out.println(valueExpression.replace("\\$", "$"));
output: value1$$bla
You want a string containing \\\$ (double backslash to get a literal backslash, and a backslash to escape the $). To write that quoted as a string in Java you should escape each backslash with another backslash. So you would write that as "\\\\\\$".
I.e.:
valueExpression.replaceAll("\\\\\\$", "\\$");
Either use direct string replacement:
valueExpression.replace("\\$", "$");
or, with replaceAll, escape the backslash and the $ in the pattern, and escape the $ in the replacement string so it is not read as a group reference:
valueExpression.replaceAll("\\\\\\$", "\\$");
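To compare the approaches from the answers above side by side, here is a minimal runnable sketch. The class name DollarUnescape and its method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; the names are not from the original question.
class DollarUnescape {

    // Plain literal replacement: no regex or replacement-pattern rules apply.
    static String literal(String s) {
        return s.replace("\\$", "$");
    }

    // Regex replacement with hand-written escapes: the pattern is a literal
    // backslash followed by a literal dollar; the replacement is an escaped dollar.
    static String regexEscaped(String s) {
        return s.replaceAll("\\\\\\$", "\\$");
    }

    // Regex replacement where the library does the quoting for both parts.
    static String regexQuoted(String s) {
        return s.replaceAll(Pattern.quote("\\$"), Matcher.quoteReplacement("$"));
    }

    public static void main(String[] args) {
        String valueExpression = "value1\\$\\$bla";
        System.out.println(literal(valueExpression));      // value1$$bla
        System.out.println(regexEscaped(valueExpression)); // value1$$bla
        System.out.println(regexQuoted(valueExpression));  // value1$$bla
    }
}
```

Each method returns a new String, so the result has to be assigned or used; the original valueExpression is never modified.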

Regex java from javascript

Hello, I have this regex in JavaScript:
var urlregex = new RegExp("((www.)(([a-zA-Z0-9-]){2,}\.){1,4}([a-zA-Z]){2,6}(\/([a-zA-Z-_\/\.0-9#:?=&;,]*)?)?)");
When I try to put it in a Java String, I get this error:
Description Resource Path Location Type
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ ) CreateActivity.java /SupLink/src/com/supinfo/suplink/activities line 43 Java Problem
So I just want to know what I have to change to make it work in Java, in order to do this (this function runs fine):
private boolean IsMatch(String s, String pattern)
{
    try {
        Pattern patt = Pattern.compile(pattern);
        Matcher matcher = patt.matcher(s);
        return matcher.matches();
    } catch (PatternSyntaxException e) {
        return false;
    }
}
EDIT:
Thank you for your help. Now I have this:
private String regex = "((www.)(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)";
But it doesn't match what I really want (regexes are horrible ^^). I would like to match these types of URLs:
www.something.com
something.com
something.com/anotherthing/anything
Can you help me again?
Thanks a lot.
When you create the Java string, you need to escape the backslashes a second time so that Java understands that they are literal backslashes. You can replace all existing backslashes with \\. You also need to escape any Java characters that normally need to be escaped.
Currently your regex requires www at the start. If you want to make it optional, add ? after (www.). You probably also want to escape the . after www. Your regex should probably look like this:
"((www\\.)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)"
You should escape each \ with a second backslash, something like this:
"((www.)(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)"
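As a quick check, the version with the optional, escaped www. prefix can be tried against the three sample URLs from the question. This is only a sketch: the class name UrlCheck is made up, and isMatch mirrors the question's IsMatch helper with the pattern compiled once.

```java
import java.util.regex.Pattern;

// Hypothetical test harness for the regex with the optional www prefix.
class UrlCheck {

    // Same pattern as in the question, except that the www prefix
    // is optional and its dot is escaped.
    private static final Pattern URL = Pattern.compile(
            "((www\\.)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}"
            + "(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)");

    // matches() requires the whole input to match the pattern.
    static boolean isMatch(String s) {
        return URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMatch("www.something.com"));                   // true
        System.out.println(isMatch("something.com"));                       // true
        System.out.println(isMatch("something.com/anotherthing/anything")); // true
        System.out.println(isMatch("not a url"));                           // false
    }
}
```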

How replaceAll "/" in java String to format URL?

String url = "d://test////hello\\\\hello";
String separator = File.separator;
url = url.replaceAll("\\*", separator);
url = url.replaceAll("/+", separator);
I want to format this URL, but an error occurs when I use replaceAll("/+", separator). I also tried escaping "/" as "\\/", but it still doesn't work.
This is the Exception from console:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(String.java:686)
at java.util.regex.Matcher.appendReplacement(Matcher.java:703)
at java.util.regex.Matcher.replaceAll(Matcher.java:813)
at java.lang.String.replaceAll(String.java:2189)
Now it works:
String separator = null;
if (File.separator.equals("/")) {
    separator = "/";
    url = url.replaceAll("/+", separator);
    url = url.replaceAll("\\\\+", separator);
} else {
    // quoteReplacement keeps a backslash separator from being read as an escape
    separator = Matcher.quoteReplacement(File.separator);
    url = url.replaceAll("/+", separator);
    url = url.replaceAll("\\\\+", separator);
}
:) it works in javascript
var i = "d:\\ad////df";
alert(i.replace(/\/+/g, '\\'));
Your platform is Windows, right? So File.separator will be a backslash, right?
The explanation is that the 2nd argument of String.replaceAll is not a simple String. Rather it is a replacement pattern ...
The javadoc says:
"Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired. "
So your replacement String, which consists of a single backslash, is not a valid replacement pattern. You need to quote the separator String, as the javadoc says.
(It is a little surprising that you get that particular exception. I can imagine how it could happen, but I'd have thought that they'd deal with bad escapes more elegantly. Mind you, if this was reported as a "bug", Oracle would probably not fix it. A fix would break backwards compatibility.)
Try:
url = url.replaceAll("\\\\+", separator);
You need 4 backslashes: escape once for the Java string and once for the regex metacharacter. That is, the regex needs two backslashes (\\), and in the string literal each of them must be escaped with another one.
Also, the quantifier * means zero or more, you need to use +.
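Putting both answers together (four backslashes in the pattern, a quoted replacement), the normalization can be sketched as one method. The class PathNormalizer and the choice to pass the separator as a parameter are assumptions made here so the result does not depend on the platform; real code would presumably pass File.separator.

```java
import java.util.regex.Matcher;

// Illustrative helper, not part of the original question.
class PathNormalizer {

    // Collapses every run of forward slashes and/or backslashes into one separator.
    // The literal "[/\\\\]+" is the regex character class of '/' and '\', repeated.
    static String normalize(String path, String separator) {
        // quoteReplacement makes a backslash separator safe to use in replaceAll.
        return path.replaceAll("[/\\\\]+", Matcher.quoteReplacement(separator));
    }

    public static void main(String[] args) {
        String url = "d://test////hello\\\\hello";
        System.out.println(normalize(url, "\\")); // d:\test\hello\hello
        System.out.println(normalize(url, "/"));  // d:/test/hello/hello
    }
}
```

Using a single character class avoids the two separate replaceAll calls, at the cost of also collapsing mixed runs such as a slash followed by a backslash.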

How Java regex match for non-subsequence string?

Example:
String url = "http://www.google.com/abcd.jpg";
String url2 = "http://www.google.com/abcd.jpg_xxx.jpg";
I want to match "http://www.google.com/abcd" for both url and url2.
I wrote a regex:
(http://.*\.com/[^(.jpg)]*).jpg
But [^(.jpg)]* doesn't look correct. What should the regex be?
Forward slashes need escaping only where / is the regex delimiter; in a Java string the escape is optional. Use this regex:
^(http:\/\/.+?\.com\/[^.]+)\.jpg
The reluctant quantifier .*? matches only up to the first ".jpg":
(http:\/\/.*\.com\/.*?)\.jpg.*
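In Java, the first pattern above could be applied roughly like this. The sketch drops the slash escapes, which Java does not require, and uses find() with a capturing group; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the capture-group approach.
class ImagePrefix {

    // Group 1 captures the URL up to (not including) the first ".jpg":
    // [^.]+ cannot cross a dot, so it stops right before the extension.
    private static final Pattern PREFIX =
            Pattern.compile("^(http://.+?\\.com/[^.]+)\\.jpg");

    // Returns the captured prefix, or null when the input does not match.
    static String prefixOf(String url) {
        Matcher m = PREFIX.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(prefixOf("http://www.google.com/abcd.jpg"));
        System.out.println(prefixOf("http://www.google.com/abcd.jpg_xxx.jpg"));
        // Both lines print http://www.google.com/abcd
    }
}
```

find() is used instead of matches() because the second URL has extra text after the first ".jpg".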
