Using Java to find and replace without markup

I'm fairly certain this has been asked and answered, but I can't find that answer, so I'll ask:
I want to use Java's regex to find and replace. There is no markup involved (no "${ImMarkup!}" in the source string), and the value I wish to replace is contextual (that is, I can't write a simple "replace A with B").
Examples make everything easier, so here's some sample input. This is the source string:
! locator's position P1(p1x,p1y),P2(p2x,p2y)
R,1,0.001,0.001,0.001,0.001, , ,
RMORE, , , ,
RMORE
RMORE, ,
ET,MPTEMP,,,,EX, x1=38000
x2 = 2345
MPTEMP,,,,,,,,
MPTEMP,1,0
MPDATA,EX,1,,38000*6894.75
My regex is:
+(?<variableName>\w+) *= *-?(?<variableValue>\d+\.?\d*)
(note the space before the first plus)
I'm looking to replace "x1=38000" with something like "x1=100", and "x2 = 2345" with "x2 = 200",
giving the output:
...
RMORE, ,
ET,MPTEMP,,,,EX, x1=100
x2 = 200
MPTEMP,,,,,,,,
...
I've created a gist containing some semi-runnable code (it uses some things from our commons code base, but it's followable): https://gist.github.com/Groostav/acf5b584078813e7cbe6
The code I've got is roughly:
String regex = " +(?<variableName>\\w+) *= *-?(?<variableValue>\\d+\\.?\\d*)"; // note the leading space
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(sourceText);
StringBuffer output = new StringBuffer();
while(matcher.find()){
String variableName = matcher.group("variableName");
String existingValue = matcher.group("variableValue");
int newValue;
switch(variableName){
case "x1": newValue = 100; break;
case "x2": newValue = 200; break;
default: throw new IllegalStateException();
}
// the problem: this replaces the entire match (" x1=38000"), not just the value group
matcher.appendReplacement(output, "" + newValue);
}
matcher.appendTail(output);
The regex itself works: it captures the values I need, and I can access them through matcher.group("variableName") and matcher.group("variableValue"). The issue I'm having is writing a new value back to 'variableValue' with the Java regex API.
In the above example, matcher.group("variableValue") doesn't persist any state, so I can't seem to tell the appendReplacement() method that I don't want to replace the whole match, but rather only the second capture group.
Worth mentioning: x1 and x2 are not hard-coded names, so I can't simply cheese it and write separate find-and-replace strings for x1 and x2. I need the runtime \w+ to find the variable name.
I could run another regex against the result of the first, this time capturing only the value, but that's not pretty, and it would probably require me to fudge index values around in our StringBuilder/StringBuffer, rather than relying on that nice index-free call to matcher.appendTail.
PS: the language you see above is called the "ANSYS Parametric Design Language (APDL)", and I can't find a grammar for it. If any of you know where one is, I'd hugely appreciate it.
Thanks for reading.

You can use this regex:
(\w+\s*)=(\s*\d+)
In the substitution, reuse the capturing groups by index (e.g. $1 for the name part) so that only the value is replaced. You can use the same approach to replace any content you want.
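A minimal sketch of the capture-group idea applied to the question's own setup, assuming Java 7+ (named groups and ${name} references in replacement strings) and Java 9+ (Map.of). It adapts the question's regex in three ways: everything in front of the number is captured in a hypothetical group named prefix, the optional minus sign moves into variableValue so a negative value is replaced whole, and, like the regex above, no leading spaces are required, so x2 at the start of a line also matches. The class name, replaceValues and the map of new values are illustrative only.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceValueOnly {
    // "prefix" holds everything in front of the number (name, spaces, '='),
    // so the replacement can write it back unchanged and swap only the value.
    private static final Pattern ASSIGNMENT = Pattern.compile(
            "(?<prefix>\\b(?<variableName>\\w+) *= *)(?<variableValue>-?\\d+(?:\\.\\d*)?)");

    static String replaceValues(String source, Map<String, String> newValues) {
        Matcher matcher = ASSIGNMENT.matcher(source);
        StringBuffer output = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group("variableName");
            // Variables without a new value keep their existing value.
            String value = newValues.getOrDefault(name, matcher.group("variableValue"));
            // ${prefix} re-inserts the captured prefix; quoteReplacement makes the value literal.
            matcher.appendReplacement(output, "${prefix}" + Matcher.quoteReplacement(value));
        }
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        String source = "RMORE, ,\nET,MPTEMP,,,,EX, x1=38000\nx2 = 2345\nMPTEMP,,,,,,,,";
        System.out.println(replaceValues(source, Map.of("x1", "100", "x2", "200")));
    }
}
```

Because the new value is chosen per match inside the loop, the variable names are only known at runtime, and appendTail still copies the rest of the text without any index arithmetic.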

My hacky solution, which seems to work, is to manually walk our parse tree down the right-hand side and replace the value there. This is annoying since it requires me to refactor my regex and do that manual work, but it does the job, and I believe it's reliable:
// semi-formally, APDL seems to define:
// AssignmentExpression -> QualifiedIdentifier = Expression
// QualifiedIdentifier -> SPACE+ Identifier SPACE*
// Expression -> SPACE* Value //Value is captured as "value"
// Identifier -> [A-Za-z0-9_]+ //Identifier is captured as "identifier"
// Value -> [0-9]+ (DOT [0-9]*)?
private static final String rValue = "\\d+(\\.\\d*)?";
private static final String rIdentifier = "(?<identifier>\\w+)";
private static final String rQualifiedIdentifier = " +" + rIdentifier + " *";
private static final String rExpression = " *-?(?<value>" + rValue + ")";
private static final String rAssignmentExpression = rQualifiedIdentifier + "=" + rExpression;
@Test
public void when_scanning_using_our_regex(){
Pattern assignmentPattern = Pattern.compile(rAssignmentExpression);
Pattern rhsPattern = Pattern.compile("=" + rExpression);
Pattern valuePattern = Pattern.compile(rValue);
Matcher assignmentMatcher = assignmentPattern.matcher(sourceText);
StringBuffer output = new StringBuffer();
int newValue = 20;
while(assignmentMatcher.find()){
String assignment = assignmentMatcher.group();
Matcher rhsMatcher = rhsPattern.matcher(assignment);
// call find() outside the assert: with assertions disabled (the JVM default) it would never run
boolean rhsFound = rhsMatcher.find();
assert rhsFound : "couldn't find an RHS in the assignment: '" + assignment + "'?";
String oldRhs = rhsMatcher.group();
Matcher valueMatcher = valuePattern.matcher(oldRhs);
boolean valueFound = valueMatcher.find();
assert valueFound : "couldn't find a value in an RHS: '" + oldRhs + "'?";
String oldValue = valueMatcher.group();
String newRhs = oldRhs.replace(oldValue, "" + newValue);
String newAssignment = assignment.replace(oldRhs, newRhs);
// quoteReplacement so any '$' or '\' in the text is copied literally
assignmentMatcher.appendReplacement(output, Matcher.quoteReplacement(newAssignment));
}
assignmentMatcher.appendTail(output);
System.out.println(output.toString());
}


How to parse a string containing nested functions and their parameters, delimited by "(" and ")"

My string is:
String s="(decode(W_Employee_D_3.Fst_Name,NULL,"
+ "decode(W_Employee_D_3.Last_Name,NULL,"
+ "decode(W_Employee_D_3.Mid_Name,NULL,'emptyString','midnamevailable'),'lastnameavil'),"
+ "concat(CONCAT ( concat(W_Employee_D_3.last_Name, ' ,'),"
+ "W_Employee_D_3.Fst_Name ),W_Employee_D_3.Mid_Name)))";
I need to write some generalized logic that gives fun1=decode, fun2=decode, fun3=decode, fun4=concat, fun5=Concat, fun6=Concat and their respective parameters (para1, para2, para3, ...) in any type of collection in Java.
Parameters are the values passed to a function,
for example in
concat(W_Employee_D_3.last_Name, ' ,')
concat is the function and the parameters are W_Employee_D_3.last_Name and ' ,'.
The string can contain any number of functions and parameters, and it can contain other functions as well.
This looks like a nasty kind of task. Maybe you need an ANTLR grammar?
I'll just offer an approach to the less tedious part of the work.
For this kind of nesting, one would either
parse from left to right using a stack, or
reduce expressions starting with the innermost redexes found, building up some result.
I use the latter, with a map to hold the resulting structures, each replaced in the string by a "variable."
We could introduce variables for
a string literal, as that could contain commas, parentheses and other text that must not be parsed,
a function call.
I'll give a solution for a simplified case:
String s = "a(b(c),d(e),f,g(h(),i))";
// Variables are like "§0013" (4 digits)
Map<String, String> variables = new HashMap<>();
int maxVar = 0;
String expr = "(\\w+|§\\d{4})"; // Either simple term or var name.
Pattern callRedex = Pattern.compile("\\w+\\((" + expr + "(," + expr + ")*)?\\)");
boolean reduced;
do {
reduced = false;
Matcher m = callRedex.matcher(s);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String value = m.group();
String var = String.format("§%04d", maxVar++);
variables.put(var, value);
m.appendReplacement(sb, var);
reduced = true;
}
m.appendTail(sb);
s = sb.toString();
} while (reduced);
Now the function calls have been replaced by variables. Their values contain variable names, which in turn have to be looked up (expanded) to rebuild the nesting.
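Going one step further toward what the question asks for (each function name with its parameters in a collection), here is a sketch that records every reduced call as a name plus an argument list instead of raw text. It keeps the assumptions of the simplified case above: no string literals, so splitting arguments on commas is safe once inner calls have been replaced by placeholders. RedexCalls, Call and reduce are hypothetical names, and the § marker is written as the escape \u00a7 so the source stays ASCII.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RedexCalls {
    // One reduced function call: its name and the raw text of each argument.
    static final class Call {
        final String name;
        final List<String> args;

        Call(String name, List<String> args) {
            this.name = name;
            this.args = args;
        }

        @Override
        public String toString() {
            return name + args;
        }
    }

    // Placeholder marker U+00A7 (the section sign), escaped to keep the source ASCII.
    private static final String MARK = "\u00a7";

    // Replaces innermost calls by placeholders until nothing is left to reduce.
    // The LinkedHashMap keeps the calls in reduction order (innermost first).
    static Map<String, Call> reduce(String s) {
        Map<String, Call> calls = new LinkedHashMap<>();
        String expr = "(?:\\w+|" + MARK + "\\d{4})"; // simple term or placeholder
        Pattern callRedex = Pattern.compile("(\\w+)\\((" + expr + "(?:," + expr + ")*)?\\)");
        int maxVar = 0;
        boolean reduced;
        do {
            reduced = false;
            Matcher m = callRedex.matcher(s);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String argText = m.group(2); // null for a call without arguments, e.g. h()
                List<String> args = new ArrayList<>();
                if (argText != null) {
                    args.addAll(Arrays.asList(argText.split(",")));
                }
                String placeholder = String.format(MARK + "%04d", maxVar++);
                calls.put(placeholder, new Call(m.group(1), args));
                m.appendReplacement(sb, placeholder);
                reduced = true;
            }
            m.appendTail(sb);
            s = sb.toString();
        } while (reduced);
        return calls;
    }

    public static void main(String[] args) {
        reduce("a(b(c),d(e),f,g(h(),i))")
                .forEach((id, call) -> System.out.println(id + " = " + call));
    }
}
```

An argument that is itself a placeholder (for example §0002 in g's argument list) points at the entry for the nested call, so the tree can be rebuilt by looking placeholders up in the map; the last entry is the outermost call.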

Java String: how to get part of a package name in Android?

It's basically about getting the string value between two characters. SO has many questions related to this, like:
How to get a part of a string in java?
How to get a string between two characters?
Extract string between two strings in java
and more.
But I found it quite confusing when dealing with multiple dots in the string and getting the value between two particular dots.
I have the package name:
au.com.newline.myact
I need to get the value between "com." and the next dot (.), in this case "newline". I tried
Pattern pattern = Pattern.compile("com.(.*).");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
int ct = matcher.group();
I also tried using substring and indexOf, but couldn't get the intended result. Because Android package names vary in their number of dots and characters, I cannot use a fixed index. Any ideas?
As you probably know (based on the .* part of your regex), the dot . is a special character in regular expressions representing any character (except line separators). So to make a dot match only a literal dot, you need to escape it: place \ before it, or put it inside the character class [.].
Also, to get only the part in parentheses (.*), you need to select it with the proper group index, which in your case is 1.
So try with
String beforeTask = "au.com.newline.myact";
Pattern pattern = Pattern.compile("com[.](.*)[.]");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
String ct = matcher.group(1);//remember that regex finds Strings, not int
System.out.println(ct);
}
Output: newline
If more dots may follow and you want only the element before the next ., change the greedy behaviour of the * quantifier in .* to reluctant by adding ? after it:
Pattern pattern = Pattern.compile("com[.](.*?)[.]"); // note the ? after .*
Another approach is to accept only non-dot characters instead of .*; they can be represented by the negated character class [^.]*:
Pattern pattern = Pattern.compile("com[.]([^.]*)[.]");
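To make the greedy, reluctant and negated-class difference concrete, here is a small runnable comparison on a package name with extra segments after "newline" (ComSegment and firstGroup are illustrative names only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComSegment {
    // Returns capturing group 1 of the first match, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String pkg = "au.com.newline.myact.modelact";
        System.out.println(firstGroup("com[.](.*)[.]", pkg));    // greedy: newline.myact
        System.out.println(firstGroup("com[.](.*?)[.]", pkg));   // reluctant: newline
        System.out.println(firstGroup("com[.]([^.]*)[.]", pkg)); // negated class: newline
    }
}
```

The greedy .* runs to the end of the string and backs off only to the last dot, which is why it swallows "myact" as well.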
If you don't want to use regex, you can simply use the indexOf method to locate the position of com. and the next . after it, then take the substring you want.
String beforeTask = "au.com.newline.myact.modelact";
int start = beforeTask.indexOf("com.") + 4; // +4 since we also want to skip the 'com.' part
int end = beforeTask.indexOf(".", start); // find the next `.` after the start index
String result = beforeTask.substring(start, end);
System.out.println(result);
You can use reflection to get the name of any class. For example:
if I have a class Runner in com.some.package, I can run
Runner.class.getName() // string is "com.some.package.Runner" (toString() would prepend "class ")
to get the fully qualified name of the class, which includes the package name.
To get the part after 'com', you can use Runner.class.getName().split("\\.") and then iterate over the returned array with a boolean flag.
All you have to do is split the string on "." and then iterate through the parts until you find one that equals "com". The next string in the array will be what you want.
So your code would look something like:
String[] parts = packageName.split("\\.");
int i = 0;
for (String part : parts) {
if (part.equals("com")) {
break;
}
++i;
}
String result = parts[i + 1]; // assumes "com" is present and is not the last part
private String getStringAfterComDot(String packageName) {
String strArr[] = packageName.split("\\.");
for(int i=0; i<strArr.length - 1; i++){ // stop one early so strArr[i+1] always exists
if(strArr[i].equals("com"))
return strArr[i+1];
}
return "";
}
I have done heaps of website-scraping projects before, and I usually just create my own functions/utils to get the job done. Regex can be overkill if you just want to extract a substring from a given string like yours. Below is the function I normally use for this kind of task.
private String GetValueFromText(String sText, String sBefore, String sAfter)
{
String sRetValue = "";
int nPos = sText.indexOf(sBefore);
if ( nPos > -1 )
{
int nLast = sText.indexOf(sAfter, nPos + sBefore.length()); // search starting right after sBefore
if ( nLast > -1)
{
sRetValue = sText.substring(nPos+sBefore.length(),nLast);
}
}
return sRetValue;
}
To use it just do the following:
String sValue = GetValueFromText("au.com.newline.myact", ".com.", ".");

Java: how can I remove everything between two substrings in a string?

I want to remove everything between 'galery' and 'jssdk));' in a string, for every occurrence.
For instance, consider the following string:
Galery something something.... jssdk));
I need an algorithm that removes 'something something....' and returns 'Galery jssdk));'
This is what I've done, but it does not work.
newsValues[1].replaceAll("Galery.*?jssdK));", "");
This could probably be improved; I wrote it quickly:
public static String replaceMatching(String input, String lowerBound, String upperBound) {
// Pattern.quote makes the bounds literal, so the ')' in "jssdK));" needs no escaping
Pattern p = Pattern.compile(".*?" + Pattern.quote(lowerBound) + "(.*?)" + Pattern.quote(upperBound) + ".*?");
Matcher m = p.matcher(input);
String textToRemove = "";
while(m.find()){
textToRemove = m.group(1);
}
return input.replace(textToRemove, "");
}
UPDATE: Thanks for accepting the answer; here is a smaller, revised version:
public static String replaceMatching2(String input, String lowerBound, String upperBound){
String result = input.replaceAll("(.*?" + Pattern.quote(lowerBound) + ")" + "(.*?)" + "(" + Pattern.quote(upperBound) + ".*)", "$1$3");
return result;
}
The idea is actually pretty simple: split the string into 3 groups and replace them with the first and third, dropping the second.
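A sketch of the same three-group idea as a reusable helper, assuming the bounds should be matched literally: Pattern.quote escapes them (so "jssdk));" needs no hand-written backslashes), the reluctant .*? keeps each removal inside one lowerBound...upperBound pair, and DOTALL lets the removed part span line breaks. Unlike replaceMatching2, whose trailing .* consumes the rest of the input, it handles every occurrence rather than only the first. RemoveBetween and removeBetween are hypothetical names.

```java
import java.util.regex.Pattern;

public class RemoveBetween {
    // Keeps every lowerBound ... upperBound pair but drops the text between them.
    static String removeBetween(String input, String lowerBound, String upperBound) {
        Pattern p = Pattern.compile(
                "(" + Pattern.quote(lowerBound) + ")(.*?)(" + Pattern.quote(upperBound) + ")",
                Pattern.DOTALL); // DOTALL: the removed part may span line breaks
        // $1 and $3 put the bounds back; group 2 (the middle) is dropped.
        return p.matcher(input).replaceAll("$1$3");
    }

    public static void main(String[] args) {
        System.out.println(removeBetween("Galery something something.... jssdk));", "Galery", "jssdk));"));
    }
}
```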
You are almost there, but that will remove the entire string. If you want to remove everything between Galery and jssdK));, you will have to do something like this:
String newStr = newsValues[1].replaceAll("(Galery)(.*?)(jssdK\\)\\);)","$1$3");
This puts the parts into groups and then rebuilds the match from the first and third groups only. Note that in regex syntax ) is a special character, so it needs to be escaped.
String str = "GaleryABCDEFGjssdK));";
String newStr = str.replaceAll("(Galery)(.*?)(jssdK\\)\\);)","$1$3");
System.out.println(newStr);
This yields: GaleryjssdK));
I know that the solution presented by @amit is simpler; however, I thought it would be a good idea to show you a useful way to use the replaceAll method.
The simplest solution is to replace the match with just the "edges", effectively "removing"(1) everything between them.
newsValues[1] = newsValues[1].replaceAll("Galery.*?jssdK\\)\\);", "GaleryjssdK));");
(1) I used quotes here because it is not exactly removing: remember that strings are immutable, so replaceAll creates a new object without the "removed" part.
newsValues[1] = newsValues[1].substring(0, 6) + newsValues[1].substring(newsValues[1].length() - 8);
This basically concatenates "Galery" (the first 6 characters) and "jssdk));" (the last 8 characters), ignoring everything else. More importantly, if you already know the result, you can simply assign newsValues[1] = "Galeryjssdk));"

Recursive replace with Java regular expression?

I can replace ABC(10,5) with (10)%(5) using:
replaceAll("ABC\\(([^,]*)\\,([^,]*)\\)", "($1)%($2)")
but I'm unable to figure out how to do it for ABC(ABC(20,2),5) or ABC(ABC(30,2),3+2).
If I'm able to convert to ((20)%(2))%5, how can I convert back to ABC(ABC(20,2),5)?
Thanks,
j
I am going to answer the first question. I was not able to do the task with a single replaceAll, and I don't think it is even achievable. However, with a loop, this should do the work for you:
String termString = "([0-9+\\-*/()%]*)";
String pattern = "ABC\\(" + termString + "\\," + termString + "\\)";
String [] strings = {"ABC(10,5)", "ABC(ABC(20,2),5)", "ABC(ABC(30,2),3+2)"};
for (String str : strings) {
while (true) {
String replaced = str.replaceAll(pattern, "($1)%($2)");
if (replaced.equals(str)) {
break;
}
str = replaced;
}
System.out.println(str);
}
I am assuming you are writing a parser for numeric expressions, hence the definition of the term termString = "([0-9+\\-*/()%]*)". It outputs:
(10)%(5)
((20)%(2))%(5)
((30)%(2))%(3+2)
EDIT: As per the OP's request, I've added the code for decoding the strings. It is a bit more hacky than the forward direction:
String [] encoded = {"(10)%(5)", "((20)%(2))%(5)", "((30)%(2))%(3+2)"};
String decodeTerm = "([0-9+\\-*ABC\\[\\],]*)";
String decodePattern = "\\(" + decodeTerm + "\\)%\\(" + decodeTerm + "\\)";
for (String str : encoded) {
while (true) {
String replaced = str.replaceAll(decodePattern, "ABC[$1,$2]");
if (replaced.equals(str)) {
break;
}
str = replaced;
}
str = str.replaceAll("\\[", "(");
str = str.replaceAll("\\]", ")");
System.out.println(str);
}
And the output is:
ABC(10,5)
ABC(ABC(20,2),5)
ABC(ABC(30,2),3+2)
You can start by evaluating the innermost reducible expressions first, until no more redexes exist. However, you have to take care of any other commas and parentheses. The solution of @BorisStrandjev is better and more bulletproof.
String infix(String expr) {
// << and >> are placeholders for '(' and ')' that have already been produced,
// so the [^,()] classes below only ever see parentheses still waiting to be reduced.
for (;;) {
// innermost plain grouping parentheses, e.g. (1+2), that do not belong to an ABC call
String expr2 = expr.replaceAll("(?<!ABC)\\(([^,()]*)\\)", "<<$1>>");
// innermost ABC(x,y) whose arguments contain no unreduced parentheses
expr2 = expr2.replaceAll("ABC\\(([^,()]*),([^,()]*)\\)", "<<$1>>%<<$2>>");
if (expr2.equals(expr)) // compare contents, not references
break;
expr = expr2;
}
expr = expr.replace("<<", "(");
expr = expr.replace(">>", ")");
return expr;
}
You could use this regular expressions library, https://github.com/florianingerl/com.florianingerl.util.regex, which also supports recursive regular expressions.
Converting ABC(ABC(20,2),5) to ((20)%(2))%(5) looks like this:
Pattern pattern = Pattern.compile("(?<abc>ABC\\((?<arg1>(?:(?'abc')|[^,])+)\\,(?<arg2>(?:(?'abc')|[^)])+)\\))");
Matcher matcher = pattern.matcher("ABC(ABC(20,2),5)");
String replacement = matcher.replaceAll(new DefaultCaptureReplacer() {
@Override
public String replace(CaptureTreeNode node) {
if ("abc".equals(node.getGroupName())) {
return "(" + replace(node.getChildren().get(0)) + ")%(" + replace(node.getChildren().get(1)) + ")";
} else
return super.replace(node);
}
});
System.out.println(replacement);
assertEquals("((20)%(2))%(5)", replacement);
Converting back again, i.e. from ((20)%(2))%(5) to ABC(ABC(20,2),5) looks like this:
Pattern pattern = Pattern.compile("(?<fraction>(?<arg>\\(((?:(?'fraction')|[^)])+)\\))%(?'arg'))");
Matcher matcher = pattern.matcher("((20)%(2))%(5)");
String replacement = matcher.replaceAll(new DefaultCaptureReplacer() {
@Override
public String replace(CaptureTreeNode node) {
if ("fraction".equals(node.getGroupName())) {
return "ABC(" + replace(node.getChildren().get(0)) + "," + replace(node.getChildren().get(1)) + ")";
} else if ("arg".equals(node.getGroupName())) {
return replace(node.getChildren().get(0));
} else
return super.replace(node);
}
});
System.out.println(replacement);
assertEquals("ABC(ABC(20,2),5)", replacement);
You can try to rewrite the string using Polish notation and then replace any % X Y with ABC(X,Y).
See the Wikipedia article on Polish notation for background.
The problem is that you need to find out which rewrite of ABC(X,Y) occurred first when you recursively replaced them in your string. The Polish notation is useful for "deciphering" the order that these rewrites occur and is widely used in expression evaluation.
You can do this by using a stack and recording which replacement occurred first: find the innermost set of parentheses, push only that expression onto the stack, then remove it from your string. When you want to reconstruct the original expression, just start at the top of the stack and apply the reverse transformation (X)%(Y) -> ABC(X,Y).
This is somewhat a form of the Polish notation, with the only difference being that you don't store the entire expression as a string, but rather store it in a stack for easier processing.
In short, when replacing, start with the inner-most terms (the ones that have no parentheses in them) and apply the reverse replace.
It may be helpful to use (X)%(Y) -> ABC{X,Y} as an intermediary rewrite rule, then rewrite the curly brackets as round brackets. This way it will be easier to determine which is the inner-most term, as the new terms won't use round brackets. Also it is easier to implement, but not as elegant.
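For the reverse direction, here is a sketch of the "innermost first" idea that lets Java's call stack do the bookkeeping (a small recursive-descent parser rather than regex, swapped in for illustration). It assumes the exact encoding produced by the forward loop: both operands of every % are wrapped in parentheses, and the original operands contain no % of their own. DecodeAbc, decode and unwrap are hypothetical names.

```java
public class DecodeAbc {
    // Turns "(X)%(Y)" back into "ABC(X,Y)", decoding X and Y recursively,
    // so the innermost pairs are rebuilt first as the recursion unwinds.
    static String decode(String s) {
        int depth = 0;
        int split = -1;
        for (int i = 0; i < s.length() && split < 0; i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == '%' && depth == 0) {
                split = i; // the % that joins the two outermost operands
            }
        }
        if (split < 0) {
            return s; // a plain operand such as "20" or "3+2"
        }
        return "ABC(" + decode(unwrap(s.substring(0, split))) + ","
                + decode(unwrap(s.substring(split + 1))) + ")";
    }

    // Strips one pair of outer parentheses, e.g. "(20)" -> "20".
    private static String unwrap(String operand) {
        if (operand.length() < 2 || operand.charAt(0) != '('
                || operand.charAt(operand.length() - 1) != ')') {
            throw new IllegalArgumentException("expected a parenthesized operand: " + operand);
        }
        return operand.substring(1, operand.length() - 1);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"(10)%(5)", "((20)%(2))%(5)", "((30)%(2))%(3+2)"}) {
            System.out.println(decode(s));
        }
    }
}
```

Each recursive call plays the role of one stack frame from the description above: the outer % is only turned into ABC(...) after both of its operands have been decoded.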

Java Regex Matcher Problem with dynamic Strings

I have some problems with regex in Java and dynamic input; the regex itself is fine ;)
private static Pattern START_SUITE = Pattern.compile("Test Suite '(\\S+)'.*started at\\s+(.*)");
String line = "Test Suite '/a/long/path/to/some/file.octest(Tests)' started at 2011-07-09 08:01:34 +0000";
Matcher m = START_SUITE.matcher(line);
if (m.matches()) {
//do something
}
This works fine in my test Java application with the string above.
But when the string comes from another source, the Matcher doesn't match it.
processHandler.addProcessListener(new ProcessAdapter() {
@Override
public void onTextAvailable(final ProcessEvent event, final Key outputType) {
try {
outputParser.myMatchStringFunction(event.getText());
}
...
}
public void myMatchStringFunction(String line) {
Matcher m = START_SUITE.matcher(line);
if (m.matches()) {
...
I checked the string by printing it, and it looks OK.
Any ideas what could happen?
Whether the string came from a string literal or dynamically from input won't affect anything at all. So it's either something wrong with your regular expression, or something in your input that you weren't expecting and need to trim off.
You say you've printed the string - but it's easy to miss non-printable characters, or newlines etc.
I suggest you print a sample failing string out in full, including the Unicode character values, e.g.
for (int i = 0; i < text.length(); i++)
{
char c = text.charAt(i);
System.out.println("Position: " + i + " Character: " + c
+ " Unicode: " + (int) c);
}
Then you'll be able to put exactly that string into your code if you need to, and you'll probably be able to spot what's wrong just by inspecting it in that form.
Thanks for that hint.
Adding DOTALL and (.*) at the end of every pattern solved the problem
private static Pattern START_SUITE = Pattern.compile("Test Suite '(\\S+)'.*started at\\s+(.*)", Pattern.DOTALL);
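One plausible cause (an assumption; the thread never shows the raw text) is a trailing line break at the end of each chunk delivered to onTextAvailable. Without DOTALL, . cannot match '\n', so matches(), which must consume the whole input, fails. This sketch reproduces that and compares DOTALL with simply trimming the line first (TrailingNewlineDemo, PLAIN and WITH_DOTALL are illustrative names):

```java
import java.util.regex.Pattern;

public class TrailingNewlineDemo {
    static final String REGEX = "Test Suite '(\\S+)'.*started at\\s+(.*)";
    static final Pattern PLAIN = Pattern.compile(REGEX);
    static final Pattern WITH_DOTALL = Pattern.compile(REGEX, Pattern.DOTALL);

    public static void main(String[] args) {
        String line = "Test Suite '/a/long/path/to/some/file.octest(Tests)' started at 2011-07-09 08:01:34 +0000";
        String chunk = line + "\n"; // assumed shape of the text coming from the process listener
        System.out.println(PLAIN.matcher(line).matches());         // true
        System.out.println(PLAIN.matcher(chunk).matches());        // false: '.' does not match '\n'
        System.out.println(WITH_DOTALL.matcher(chunk).matches());  // true, but group(2) ends with '\n'
        System.out.println(PLAIN.matcher(chunk.trim()).matches()); // true: line break stripped first
    }
}
```

Trimming keeps the captured timestamp free of the line break, whereas with DOTALL the trailing (.*) also captures it.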
