Java regex for wrapping everything but some given character - java

I have a String such as
somet3x70rnumb3r5.3.1*#:ch4r5*
I need to wrap everything that isn't *, star character, with a Pattern Quote \Q...\E and replace the * with .*. It should give this:
\Qsomet3x70rnumb3r5.3.1\E.*\Q#:ch4r5\E.*
I can do this with string traversal, splitting on * (or any character I specify), and building a string step by step, but I'd like to use regexes and Pattern class utilities if possible.
Another example with specified character ? which would be replaced by .:
123?4?
should give
\Q123\E.\Q4\E.
I was thinking of using groups, but I need groups around every zone because each has to be either wrapped or replaced by another character.
My goal is to create a Pattern String from a given String but only consider the areas matching the specified character and ignoring the rest (even if it contains regex patterns).

Something like this?
String s = "abc*efg?123";
s = s.replaceAll("([^\\*\\?]+)", "\\\\Q$1\\\\E");
s = s.replaceAll("\\*", ".*");
s = s.replaceAll("\\?", ".");
Results in \Qabc\E.*\Qefg\E.\Q123\E

It'll be simpler if you don't worry about building a one-liner. A one-liner is probably possible, but it will be a pain. Instead, I suggest you do something like this:
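// A backslash in a replacement string must itself be escaped, hence "\\\\E" and "\\\\Q".
str = str.replaceAll("\\*", "\\\\E.*\\\\Q")
.replaceAll("\\?", "\\\\E.\\\\Q");
// A wildcard at either end leaves an empty \Q\E pair, which is harmless.
str = "\\Q" + str + "\\E";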
Simpler to write, and much easier to understand.
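If hand-writing \Q and \E feels fragile (for instance when the input itself already contains \E), one possible variation is to let Pattern.quote wrap each literal run. This is only a sketch under assumptions: the class name WildcardToRegex and the method toRegex are made up for illustration, and it handles a single wildcard character mapped to a fixed regex fragment.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class WildcardToRegex {

    // Quotes every literal run with Pattern.quote and replaces each
    // occurrence of the wildcard character with the given regex fragment.
    static String toRegex(String input, char wildcard, String fragment) {
        StringBuilder sb = new StringBuilder();
        int start = 0; // start index of the current literal run
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == wildcard) {
                if (i > start) {
                    sb.append(Pattern.quote(input.substring(start, i)));
                }
                sb.append(fragment);
                start = i + 1;
            }
        }
        if (start < input.length()) {
            sb.append(Pattern.quote(input.substring(start)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRegex("somet3x70rnumb3r5.3.1*#:ch4r5*", '*', ".*"));
        System.out.println(toRegex("123?4?", '?', "."));
    }
}
```

For the two inputs from the question this prints \Qsomet3x70rnumb3r5.3.1\E.*\Q#:ch4r5\E.* and \Q123\E.\Q4\E. without producing empty \Q\E pairs.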

Related

Regex function to find specific depth in recursive

I have the following scenario where I am supposed to use regex (Java/PCRE) on a line of code to strip off certain function calls and keep only the value passed to them, as in the example below:
Input
ArrayNew(1) = adjustalpha(shadowcolor, CInt(Math.Truncate (ObjectToNumber (Me.bezierviewshadow.getTag))))
Output (after the regex replace)
ArrayNew(1) = adjustalpha(shadowcolor, Me.bezierviewshadow.getTag)
Here CInt, Math.Truncate, and ObjectToNumber are removed and only their innermost argument is retained in the output, as shown above.
The functions CInt and Math.Truncate keep changing to CStr, Math.Random, etc., so the regex cannot be hardcoded.
I tried a lot of options on stackoverflow but most did not work.
Also, it would be nice if the query were customizable, so that, for example, CInt returns everything the CInt call wraps (find a name, then take everything between its first ( and the matching ), skipping over balanced parenthesis pairs in between).
I know it's not pretty, but it's your fault to use raw regex for this :)
@Test
void unwrapCIntCall() {
String input = "ArrayNew(1) = adjustalpha(shadowcolor, CInt(Math.Truncate (ObjectToNumber (Me.bezierviewshadow.getTag))))";
String expectedOutput = "ArrayNew(1) = adjustalpha(shadowcolor, Me.bezierviewshadow.getTag)";
String output = input.replaceAll("CInt\\s*\\(\\s*Math\\.Truncate\\s*\\(\\s*ObjectToNumber\\s*\\(\\s*(.*)\\s*\\)\\s*\\)\\s*\\)", "$1");
assertEquals(expectedOutput, output);
}
Now some explanation: the \\s* parts allow any amount of whitespace at those positions. In the middle of the pattern I used (.*), which matches anything, but that's fine*. I used (.*) instead of .* so that this section is captured as group $1 (because $0 is always the whole match). Once the interesting part is captured, I can refer to it in the replacement string.
*As long as you don't have several such assignments within one string. Otherwise, you should break up the string into parts that each contain only one such assignment and apply the replacement to each part. Or try (.*?) instead of (.*); it compiles for me, and AFAIK it makes the .* match as few characters as possible.
If the methods actually being called vary, then replace their names in the regex with the variations you expect: replace CInt with (?:CInt|CStr), Math\\.Truncate with Math\\.(?:Truncate|Random), etc. (Using (?: instead of ( makes the group non-capturing, so it won't take up the $1, $2, etc. slots.)
If that gets too complicated, then you should really consider whether you want to do it with regex at all, or whether it'd be easier to write a somewhat longer function with plain string methods like indexOf and substring :)
Bonus: if absolutely everything varies except the call depth, then you might try this one:
String output = input.replaceAll("[\\w\\d.]+\\s*\\(\\s*[\\w\\d.]+\\s*\\(\\s*[\\w\\d.]+\\s*\\(\\s*(.*)\\s*\\)\\s*\\)\\s*\\)", "$1");
Yes, it's definitely a nightmare to read, but as far as I understand, you are after this monster :)
You can use ([^()]*) instead of (.*) to keep the innermost group from matching more deeply nested expressions. Note that fine control of nesting depth is a real weakness of everyday regular expressions.
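For the "customizable" part of the question, the plain-string route mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: BalancedArgs and argumentOf are made-up names, the first occurrence of the function name is used, and parentheses inside string literals are not treated specially.

```java
// Hypothetical helper for illustration; not part of any library.
class BalancedArgs {

    // Returns the text between the parenthesis that follows the first
    // occurrence of name and its matching closing parenthesis,
    // or null if the name or a balanced pair cannot be found.
    static String argumentOf(String input, String name) {
        int nameAt = input.indexOf(name);
        if (nameAt < 0) {
            return null;
        }
        int open = input.indexOf('(', nameAt + name.length());
        if (open < 0) {
            return null;
        }
        int depth = 0;
        for (int i = open; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return input.substring(open + 1, i);
            }
        }
        return null; // parentheses are not balanced
    }

    public static void main(String[] args) {
        String input = "ArrayNew(1) = adjustalpha(shadowcolor, "
                + "CInt(Math.Truncate (ObjectToNumber (Me.bezierviewshadow.getTag))))";
        System.out.println(argumentOf(input, "CInt"));
        System.out.println(argumentOf(input, "ObjectToNumber"));
    }
}
```

Counting depth by hand sidesteps the nesting limitation of regexes: the first call prints Math.Truncate (ObjectToNumber (Me.bezierviewshadow.getTag)) and the second prints Me.bezierviewshadow.getTag.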

java String.replaceAll char between two numbers

I would like to replace every '-' that sits between two numbers, or between a number and '.', with '&'. For example:
String input= "2.1(-7-11.3)-12.1*-2.3-.11"
String output= "2.1(-7&11.3)-12.1*-2.3&.11"
I have something like this, but I'd like to do it more simply.
public void preperString(String input) {
input=input.replaceAll(" ","");
input=input.replaceAll(",",".");
input=input.replaceAll("-","&");
input=input.replaceAll("\\(&","\\(-");
input=input.replaceAll("\\[&","\\[-");
input=input.replaceAll("\\+&","\\+-");
input=input.replaceAll("\\*&","\\*-");
input=input.replaceAll("/&","/-");
input=input.replaceAll("\\^&","\\^-");
input=input.replaceAll("&&","&-");
input=input.replaceFirst("^&","-");
for (String s :input.split("[^.\\-\\d]")) {
if (!s.equals(""))
numbers.add(Double.parseDouble(s));
}
}
You can do it in one shot with regex capturing groups:
String input = "2.1(-7-11.3)-12.1*-2.3-.11";
input = input.replaceAll("([\\d.])-([\\d.])", "$1&$2");
Output
2.1(-7&11.3)-12.1*-2.3&.11
([\\d.])-([\\d.])
// -      : the hyphen (-) to replace, when it sits between
// [\\d.] : two digits (\d), or between a digit (\d) and a dot (.)
Let me guess. You don't really have a use for & here; you're just trying to replace certain minus signs with & so that they won't interfere with the split that you're trying to use to find all the numbers (so that the split doesn't return "-7-11" as one of the array elements, in your original example). Is that correct?
If my guess is right, then the correct answer is: don't use split. It is the wrong tool for the job. The purpose of split is to split up a string by looking for delimiter patterns (such as a sequence of whitespace or a comma); but where the format of the elements between the delimiters doesn't much matter. In your case, though, you are looking for elements of a particular numeric format (it might start with -, and otherwise will have at least one digit and at most one period; I don't know what your exact requirements are). In this case, instead of split, the right way to do this is to create a regular expression for the pattern you want your numbers to have, and then use m.find in a loop (where m is a Matcher) to get all your numbers.
If you need to treat some - characters differently (e.g. in -7-11, where you want the second - to be an operator and not part of -11), then you can make special checks for that in your loop, and skip over the - signs that you know you want to treat as operators.
It's simpler, readers will understand what you're trying to do, and it's less error-prone because all you have to do is make sure your pattern for expressing numbers accurately reflects what you're looking for.
It's common for newer Java programmers to think regexes and split are magic tools that can solve everything. But often the result ends up being too complex (code uses overly complicated regexes, or relies on trickery like having to replace characters with & temporarily). I cannot look at your original code and convince myself that it works right. It's not worth it.
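The find-in-a-loop approach described above might be sketched as follows. The number format is an assumption, since the exact requirements are not known: an optional sign, then digits with an optional fraction, or a bare fraction such as .11. Mirroring the question's replacement rule, a '-' is treated as an operator only when a digit or dot comes right before it. The names NumberScanner and numbers are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class NumberScanner {

    // Assumed number format: optional sign, then "12", "12.", "12.5" or ".5".
    // The lookbehind rejects a '-' that directly follows a digit or dot,
    // because that one is taken to be a binary operator.
    private static final Pattern NUMBER =
            Pattern.compile("(?:(?<![\\d.])-)?(?:\\d+(?:\\.\\d*)?|\\.\\d+)");

    static List<Double> numbers(String input) {
        List<Double> result = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            result.add(Double.parseDouble(m.group()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [2.1, -7.0, 11.3, -12.1, -2.3, 0.11]
        System.out.println(numbers("2.1(-7-11.3)-12.1*-2.3-.11"));
    }
}
```

No temporary & marker is needed: the pattern itself decides which minus signs belong to a number.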
You can use lookahead and lookbehind to match digit or dot:
input.replaceAll("(?<=[\\d\\.])-(?=[\\d\\.])","&")
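The lookaround form and the capturing-group form give the same result on the question's input, but they can differ when hyphens share a neighbour, because capturing groups consume the characters on both sides while lookarounds only inspect them. A small comparison sketch (HyphenReplace and its method names are made up for illustration):

```java
// Hypothetical comparison class for illustration.
class HyphenReplace {

    // Capturing groups consume both neighbours, so the digit between
    // two hyphens cannot serve as the left neighbour of the second one.
    static String withGroups(String s) {
        return s.replaceAll("([\\d.])-([\\d.])", "$1&$2");
    }

    // Lookarounds are zero-width, so every hyphen is tested on its own.
    static String withLookarounds(String s) {
        return s.replaceAll("(?<=[\\d.])-(?=[\\d.])", "&");
    }

    public static void main(String[] args) {
        System.out.println(withGroups("1-2-3"));      // prints 1&2-3
        System.out.println(withLookarounds("1-2-3")); // prints 1&2&3
    }
}
```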

String replace all special characters in the end

I'm very poor at regex, so please bear with me.
I have strings LQiW0/QIDAQAB/ and LQiW0/QIDAQAdfB/.
I'm trying to remove the last forward slash.
I tried str = str.replaceAll("\\/", "");
but that removes every forward slash, and I only want to remove the one in the last position.
Try following code:
str = str.replaceAll("\\/$", "");
$ matches at the end of the input (by default also just before a final line terminator), so only a trailing slash is removed.
Do you really need regex? A simple substring will do the job:
str = str.substring(0, str.lastIndexOf("/"));
But, if you want to replace the forward slash only if it is the end of the string, then replaceAll would be good there.
But you can also use this (This might not be more readable compared to replaceAll):
str = str.endsWith("/") ? str.substring(0, str.length() - 1) : str;
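To make the difference between these variants concrete, here is a small side-by-side sketch. The class and method names are made up for illustration; the bodies are the expressions discussed above.

```java
// Hypothetical comparison class for illustration.
class TrailingSlash {

    // Removes a slash only when it is the last character.
    static String viaRegex(String s) {
        return s.replaceAll("/$", "");
    }

    // Same result without a regex.
    static String viaEndsWith(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    // Cuts at the last slash wherever it is, dropping anything after it;
    // throws StringIndexOutOfBoundsException if there is no slash at all.
    static String viaLastIndexOf(String s) {
        return s.substring(0, s.lastIndexOf("/"));
    }

    public static void main(String[] args) {
        System.out.println(viaRegex("LQiW0/QIDAQAB/"));            // LQiW0/QIDAQAB
        System.out.println(viaEndsWith("LQiW0/QIDAQAB/"));         // LQiW0/QIDAQAB
        System.out.println(viaLastIndexOf("LQiW0/QIDAQAB/Sdf4s")); // LQiW0/QIDAQAB
    }
}
```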
It's better not to use regex replacements for these trivial operations. People tend to use regular expressions all the time even when they are not needed. Also, regular expressions can be very straightforward but get ugly pretty fast when you need to cover some side cases. See https://softwareengineering.stackexchange.com/questions/113237/when-you-should-not-use-regular-expressions
In your case there's a good tool for the job.
You can use org.apache.commons.lang.StringUtils
StringUtils.stripEnd("LQiW0/QIDAQAdfB/", "/") = "LQiW0/QIDAQAdfB"
StringUtils.stripEnd("LQiW0/QIDAQAdfB///", "/") = "LQiW0/QIDAQAdfB"
StringUtils.stripStart("///LQiW0/QIDAQAdfB/", "/") = "LQiW0/QIDAQAdfB/"
StringUtils.stripStart("///LQiW0/QIDAQAdfB///", "/") = "LQiW0/QIDAQAdfB///"
str = str.replaceAll("/(?=\\n)", "");
This matches a forward slash that is followed by a newline.
If you are going to have strings like LQiW0/QIDAQAB/Sdf4s and you want to remove the last / to obtain LQiW0/QIDAQABSdf4s, then this will work.
str = str.substring(0,str.lastIndexOf('/'))+str.substring(str.lastIndexOf('/')+1);
It will also work for cases with the last character /.

Replace a word that is not on a string

I'm trying to replace a word in a file whenever it appears except when it is contained in a string:
So I should replace it in:
The test in this line consists in ...
But it should not match inside the quotes in:
The test "in this line" consist in ...
This is what I'm trying:
line.replaceAll( "\\s+this\\s+", " that ")
But it fails in this scenario, so I tried:
line.replaceAll( "[^\"]\\s+this\\s+", " that ")
But that doesn't work either.
Any help would be appreciated.
This seems to work (in so far as I understand your requirements from the examples provided):
(?!.*\s+this\s+.*\")\s+this\s+
http://rubular.com/r/jZvR4XEbRf
You may need to adjust the escaping for java.
This is a bit better actually:
(?!\".*\s+this\s+)(?!\s+this\s+.*\")\s+this\s+
The only reliable way to do this is to search for EITHER a complete, quoted sequence OR the search term. You do this with one regex, and after each match you determine which one you matched. If it's the search term, you replace it; otherwise you leave it alone.
That means you can't use replaceAll(). Instead you have to use the appendReplacement() and appendTail() methods like replaceAll() itself does. Here's an example:
String s = "Replace this example. Don't replace \"this example.\" Replace this example.";
System.out.println(s);
Pattern p = Pattern.compile("\"[^\"]*\"|(\\bexample\\b)");
Matcher m = p.matcher(s);
StringBuffer sb = new StringBuffer();
while (m.find())
{
if (m.start(1) != -1)
{
m.appendReplacement(sb, "REPLACE");
}
}
m.appendTail(sb);
System.out.println(sb.toString());
output:
Replace this example. Don't replace "this example." Replace this example.
Replace this REPLACE. Don't replace "this example." Replace this REPLACE.
I'm assuming every quotation mark is significant and they can't be escaped--in other words, that you're working with prose, not source code. Escaped quotes can be dealt with, but it greatly complicates the regex.
If you really must use replaceAll(), there is a trick where you use a lookahead to assert that the match is followed by an even number of quotes. But it's really ugly, and for large texts you might find it prohibitively expensive, performance-wise.
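For completeness, the even-number-of-quotes lookahead could be sketched like this. It assumes, as above, that every quotation mark is significant and cannot be escaped; the names OutsideQuotes and replaceOutsideQuotes are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class OutsideQuotes {

    // Replaces word only where an even number of double quotes follows it,
    // i.e. where the match is not inside a quoted sequence.
    static String replaceOutsideQuotes(String line, String word, String replacement) {
        String regex = "\\b" + Pattern.quote(word) + "\\b"
                + "(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";
        return line.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String s = "Replace this example. Don't replace \"this example.\" Replace this example.";
        System.out.println(replaceOutsideQuotes(s, "example", "REPLACE"));
    }
}
```

The lookahead rescans the rest of the text for every candidate match, which is where the performance cost on large inputs comes from.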

Parsing quoted text in java

Is there an easy way to parse quoted text into strings in Java? I have lines like these to parse:
author="Tolkien, J.R.R." title="The Lord of the Rings"
publisher="George Allen & Unwin" year=1954
and all I want is Tolkien, J.R.R.,The Lord of the Rings,George Allen & Unwin, 1954 as strings.
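You could either use a regex like
"([^"]*)"
It matches a pair of quotes and captures everything between them; using [^"]* rather than .+ makes each match stop at the next quote, so several quoted values on one line are matched separately. In Java that would be:
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher("author=\"Tolkien, J.R.R.\"");
while (m.find()) {
System.out.println(m.group(1));
}
Note that group(1) is used: it is the first capturing group, i.e. the text between the quotes, while group(0) is the whole match including the quotes.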
Of course you could also use substring to select everything except the first and last char:
String quoted = "\"Tolkien, J.R.R.\"";
String unquoted;
// The length check guards against a string that consists of a single quote.
if (quoted.length() >= 2 && quoted.indexOf("\"") == 0 && quoted.lastIndexOf("\"") == quoted.length() - 1) {
unquoted = quoted.substring(1, quoted.length() - 1);
}else{
unquoted = quoted;
}
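Since the sample lines in the question mix quoted values with an unquoted one (year=1954), a regex with two alternatives may be worth considering. The sketch below is an assumption-laden illustration: keys are taken to be word characters, unquoted values run to the next whitespace, and the names KeyValueParser and parse are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class KeyValueParser {

    // key="quoted value" (group 2) or key=bareValue (group 3)
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|(\\S+))");

    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "author=\"Tolkien, J.R.R.\" title=\"The Lord of the Rings\"\n"
                + "publisher=\"George Allen & Unwin\" year=1954";
        System.out.println(parse(text).values());
    }
}
```

For the lines from the question, the values come out in order as Tolkien, J.R.R. / The Lord of the Rings / George Allen & Unwin / 1954.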
There are some fancy pattern regex nonsense things that fancy people and fancy programmers like to use.
I like to use String.split(). It's a simple function and does what you need it to do.
So if I have a String word: "hello" and I want to take out "hello", I can simply do this:
myStr = string.split("\"")[1];
This will cut the string into bits based on the quote marks.
If I want to be more specific, I can do
myStr = string.split("word: \"")[1].split("\"")[0];
That way I cut it with word: " and "
Of course, you run into problems if word: " is repeated twice, which is what patterns are for. I don't think you'll have to deal with that problem for your specific question.
Also, be cautious around characters like . and other regex metacharacters. Split uses regex, so those characters will trigger funny behavior. A backslash (written "\\" in a Java string literal) in front of such a character makes it literal, and Pattern.quote does the same for a whole string.
Best of luck!
Can you presume your document is well-formed and does not contain syntax errors? If so, you are simply interested in every other token after using String.split().
If you need something more robust, you may need to use the Scanner class (or a StringBuffer and a for loop ;-)) to pick out the valid tokens, taking into account additional criteria beyond "I saw a quotation mark somewhere".
For example, some reasons you might need a more robust solution than splitting the string blindly on quotation marks: perhaps it's only a valid token if the quotation mark starting it comes immediately after an equals sign. Or perhaps you need to handle values that are not quoted as well as quoted ones? Will \" need to be handled as an escaped quotation mark, or does that count as the end of the string? Can it have either single or double quotes (e.g. HTML), or will it always be correctly formatted with double quotes?
One robust way would be to think like a compiler and use a Java based Lexer (such as JFlex), but that might be overkill for what you need.
If you prefer a low-level approach, you could iterate through your input stream character by character using a while loop, and when you see an =" start copying the characters into a StringBuffer until you find another non-escaped ", either concatenating to the various wanted parsed values or adding them to a List of some sort (depending on what you plan to do with your data). Then continue reading until you encounter your start token (eg: =") again, and repeat.
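The character-by-character loop described above might look roughly like this. It is a sketch under assumptions: the start token is =", a backslash escapes the following character, values are collected into a List, and the names QuotedValueScanner and values are made up. A StringBuilder is used in place of StringBuffer since no synchronization is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
class QuotedValueScanner {

    // Collects every value that starts after =" and runs to the next
    // non-escaped quote. An unterminated value is returned as far as it goes.
    static List<String> values(String line) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while ((i = line.indexOf("=\"", i)) >= 0) {
            i += 2; // skip the start token
            StringBuilder value = new StringBuilder();
            while (i < line.length() && line.charAt(i) != '"') {
                char c = line.charAt(i);
                if (c == '\\' && i + 1 < line.length()) {
                    c = line.charAt(++i); // assumed escape rule: take next char literally
                }
                value.append(c);
                i++;
            }
            out.add(value.toString());
            i++; // skip the closing quote
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(values("author=\"Tolkien, J.R.R.\" title=\"The Lord of the Rings\""));
    }
}
```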
