Automated conversion from string concatenation to formatted arguments - java

Our code is littered with things like,
Log.d("Hello there " + x + ", I see your are " + y + " years old!");
I want to be able to script the conversion to something like this,
Log.d("Hello there %s, I see your are %d years old!", x, y);
(Note: I'm not worried about getting the right argument type now. I could pre-process the file to determine the types, or convert to always use strings. Not my concern right now.)
I am wondering if anyone has tackled this. I came up with these regexes for pulling out the static and variable parts of the strings,
static final Pattern P1 = Pattern.compile("\\s*(\".*?\")\\s*");
static final Pattern P2 = Pattern.compile("\\s*\\+?\\s*([^\\+]+)\\s*\\+?\\s*");
By looping on find() for each I can pull out the parts,
"Hello there "
", I see your are "
"years old!"
and,
x
y
But I can't come up with a good way to piece these back together, considering all the possibilities of how they might be concatenated together.
Maybe this is the wrong approach. Should I be trying to pull out, then replace the variable part with the format argument?

If you are willing to replace everything with %s, you could do this:
(PS: This assumes well-formatted code in terms of whitespace.)
Keep resolving from RIGHT to LEFT, as the position of the parameters matters.
1.) Run this regex to resolve everything of the form Log.d({something} + var) to Log.d({something}, var)
(Log\.d\(.*?)\"\s*\+\s*([^\s]+)(\+)?(\))
with replacement
$1%s", $2$4
(https://regex101.com/r/hY2iK6/8)
2.) Next, you need to handle every variable occurring between strings:
Keep running this regex until no more replacements occur:
(Log\.d\(.*)(\"\s*\+\s*([^\s]+)\s*\+\s*\")(.*?\"),([^\"]+);
with replacement
$1%s$4,$3,$5;
After run 1: https://regex101.com/r/hY2iK6/10
After run 2: https://regex101.com/r/hY2iK6/11
3.) Finally, you need to resolve strings that start with a variable, which is no problem:
(Log\.d\()([^\"]+)\s+\+\s*\"(.*?),([^"]+;)
with replacement
$1"%s$3,$2,$4
https://regex101.com/r/hY2iK6/9
There might be cases not covered, but it should give you an idea.
I included Log.d in the match groups as well, since it is part of the replacement, so you could just as well use Log\.(?:d|f|e) if you like.
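Step 2 says to keep re-running the regex until nothing changes, and that loop is easy to automate. The sketch below is illustrative: the class and method names are made up, it assumes the whole Log.d call sits on one line, and the sample input is hypothetical, showing a call after step 1 has already moved the trailing variable out (the %s", z part).

```java
import java.util.regex.Pattern;

final class RepeatUntilStable {
    // Step 2 regex from the answer; in a Java literal the \" escapes become plain quotes in the regex.
    static final Pattern STEP2 = Pattern.compile(
            "(Log\\.d\\(.*)(\"\\s*\\+\\s*([^\\s]+)\\s*\\+\\s*\")(.*?\"),([^\"]+);");
    static final String STEP2_REPLACEMENT = "$1%s$4,$3,$5;";

    /** Applies the replacement again and again until a pass leaves the text unchanged. */
    static String replaceUntilStable(String text, Pattern pattern, String replacement) {
        String previous;
        String current = text;
        do {
            previous = current;
            current = pattern.matcher(previous).replaceAll(replacement);
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        String in = "Log.d(\"a \" + x + \" b \" + y + \" c %s\", z);";
        System.out.println(replaceUntilStable(in, STEP2, STEP2_REPLACEMENT));
    }
}
```

On that input, the greedy first group makes the first pass rewrite the right-most variable (y) and the second pass rewrite x, which is the RIGHT to LEFT order described above; the third pass changes nothing, so the loop stops with Log.d("a %s b %s c %s",x,y, z);.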

You can use the following regex to capture all the arguments and strings together in one go. That way, the pairings tell you exactly where each argument is meant to fit into the overall string.
(?:(\w+)\s*\+\s*)?"((?:[^"\\]|\\.)*+)"(?:\s*\+\s*(\w+))?
Regex demo here. (Thanks to nhahtdh for the improved version.)
It will find all the concatenations as part of Log.d in the format:
[<variable> +] <string> [+ <variable>]
Where [] denotes an optional part.
With that, you can build the appropriate replacement. Take the following example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.lang.StringBuilder;
import java.util.List;
import java.util.ArrayList;
class Main {
public static void main(String[] args) {
String log = "Log.d(\"Hello there \" + x + \", I see your are \" + y + \" years old!\");";
System.out.println("Input: " + log);
Pattern p = Pattern.compile("(?:(\\w+)\\s*\\+\\s*)?\"((?:[^\"\\\\]|\\\\.)*+)\"(?:\\s*\\+\\s*(\\w+))?");
Matcher m = p.matcher(log);
StringBuilder output = new StringBuilder(25);
List<String> arguments = new ArrayList<String>(5);
output.append("Log.d(\"");
while (m.find()) {
if (m.group(1) != null) {
output.append("%s");
arguments.add(m.group(1));
}
output.append(m.group(2));
if (m.group(3) != null) {
output.append("%s");
arguments.add(m.group(3));
}
}
output.append("\"");
for (String arg : arguments) {
output.append(", ").append(arg);
}
output.append(");");
System.out.println("Output: " + output);
}
}
Input: Log.d("Hello there " + x + ", I see your are " + y + " years old!");
Output: Log.d("Hello there %s, I see your are %s years old!", x, y);
Java demo here.
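The example above hard-codes the Log.d( prefix and the closing ");". A slightly more reusable variant captures the call prefix (and its indentation) too. This is a sketch under the assumption that each call is a single-line statement ending in ");" whose argument uses only the string-and-variable shape above; ConcatToFormat, CALL and convert are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConcatToFormat {
    // Indentation plus call prefix such as "Log.d(" or "Log.e(", then the argument, then ");".
    private static final Pattern CALL = Pattern.compile("^(\\s*[\\w.]+\\()(.*)\\);\\s*$");
    // The pairing regex from the answer: [<variable> +] <string> [+ <variable>]
    private static final Pattern PAIR = Pattern.compile(
            "(?:(\\w+)\\s*\\+\\s*)?\"((?:[^\"\\\\]|\\\\.)*+)\"(?:\\s*\\+\\s*(\\w+))?");

    /** Rewrites one call statement; lines that don't look like a call are returned unchanged. */
    static String convert(String line) {
        Matcher call = CALL.matcher(line);
        if (!call.matches()) return line;
        Matcher m = PAIR.matcher(call.group(2));
        StringBuilder format = new StringBuilder();
        List<String> args = new ArrayList<>();
        while (m.find()) {
            if (m.group(1) != null) { format.append("%s"); args.add(m.group(1)); } // leading variable
            format.append(m.group(2));                                              // literal text
            if (m.group(3) != null) { format.append("%s"); args.add(m.group(3)); } // trailing variable
        }
        StringBuilder out = new StringBuilder(call.group(1)).append('"').append(format).append('"');
        for (String arg : args) out.append(", ").append(arg);
        return out.append(");").toString();
    }
}
```

For example, convert("Log.e(name + \" failed\");") gives Log.e("%s failed", name);. A literal % inside the original strings would still need escaping as %% before the result is used as a format string.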

Related

Programmatically fetch multiple code from string in Java

I have a string like the one below, whose lines are joined by line breaks. In this string, the first 2 lines and the last 2 lines are fixed: "public class MyClass {\n public void code() {\n"
String doc =
"public class MyClass {
public void code() {
try (...) {
...
}
}
}"
I only want to extract the multi-line code inside the method code, that is, everything except the first 2 lines and the last 2 lines. This is what I did in my project:
String[] lines = doc.split("\\r?\\n");
String[] codes = Arrays.copyOfRange(lines, 2, lines.length - 2);
String result = String.join("\n", codes);
Is there a better way to fetch the string in the middle?
The only real answer: use an existing parser framework, such as javaparser.
Seriously, that simple.
Anything else means: you are spending time and energy to solve a solved problem. The result will be deficient, compared to any mature product, and it will be a constant liability in the future. You can get your tool to work with code you have in front of you right now, but the second your tool gets used to "parse" slightly different code, it will most likely break.
In case you are asking for educational purposes, then learn how compilers work, what it takes to tokenize Java source code, and how to turn it into an abstract syntax tree (AST) representation.
Assuming the task is meant for basic educational purposes or a quick hack (otherwise GhostCat's answer takes precedence):
Even method detection, taken seriously, is not so easy. Basically you have to start implementing your own syntax parser for a fraction of the Java language: chop everything into single words, skip the class declaration, wait for "static", "public", "protected", "private", "synchronized" (hopefully I didn't forget one), skip over them, any type parameters ("<T>") and the return type ("void", "String"...), then you are at the name, then comes "(", then optional method parameters, etc.
Perhaps there are restrictions to the task, that make it less complicated. You should ask for clarification.
The problem in any case will be finding the closing braces and skipping them. If you can afford to ignore things like braces in strings (String s = "ab{{c";) or comments ("/* {{{ */"), it is enough to count up for each { occurring after e.g. "public void code() {" and count down for each }. When the brace count is 0 and you see another }, that one closes the method: skip it, along with everything up to the next method declaration.
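A minimal sketch of that brace-counting idea, with the same limitation: braces inside string literals, character literals and comments are not recognized. It counts the method's own opening brace as well, which is equivalent to starting at 0 after it. The name bodyAfter and passing the method header as a plain search string are assumptions for illustration.

```java
final class MethodBodyExtractor {
    /**
     * Returns the text between the first '{' after header and its matching '}',
     * or null if the header is missing or the braces never balance.
     * Does NOT skip braces inside string literals, char literals or comments.
     */
    static String bodyAfter(String source, String header) {
        int headerAt = source.indexOf(header);
        if (headerAt < 0) return null;
        int open = source.indexOf('{', headerAt + header.length());
        if (open < 0) return null;
        int depth = 0;
        for (int i = open; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                depth++;                    // count up for every opening brace
            } else if (c == '}') {
                depth--;                    // count down for every closing brace
                if (depth == 0) {
                    return source.substring(open + 1, i); // the method's own closing brace
                }
            }
        }
        return null; // unbalanced braces
    }
}
```

Calling bodyAfter(doc, "public void code()").trim() on the string from the question returns just the try block, regardless of how many lines it spans.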
If that's not precise enough, or your requirements are of a more serious nature, you'd have to get into parsing, e.g. using antlr or Javaparser. Here's a project that seems to do a similar task.
Learning JavaParser takes some time. It isn't difficult, and its Javadoc documentation is available online. Unfortunately, though, there isn't a lot of explanatory text in the documentation pages themselves. This class prints out the method bodies from a source-code file that is stored as a String.
Every method in the class is printed...
import com.github.javaparser.ast.*;
import com.github.javaparser.ast.stmt.BlockStmt;
import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.visitor.VoidVisitor;
import com.github.javaparser.ast.visitor.VoidVisitorAdapter;
import com.github.javaparser.*;
import java.io.IOException;
import java.util.Optional;
public class MethodBody
{
static final String src =
"public class MyClass {" + '\n' +
" public void code() {" + '\n' +
" try {" + '\n' +
" /* do stuff */ " + '\n' +
" }" + '\n' +
" catch (Exception e) { }" + '\n' +
" }" + '\n' +
"}";
public static void main(String[] argv) throws IOException
{
CompilationUnit cu = StaticJavaParser.parse(src);
VoidVisitor<?> visitor = new VoidVisitorAdapter<Void>()
{
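@Override
public void visit(MethodDeclaration md, Void arg)
{
System.out.println("Method Name: " + md.getName());
Optional<BlockStmt> optBody = md.getBody();
// Abstract and interface methods have no body, so only call get() when one is present.
if (optBody.isPresent())
System.out.println("Method Body:\n" + optBody.get().toString() + "\n\n");
else
System.out.println("No Method Body Definition\n");
// Continue into child nodes (e.g. methods of local or anonymous classes).
super.visit(md, arg);
}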
};
visitor.visit(cu, null);
}
}
The above code will print this to terminal:
Method Name: code
Method Body:
{
try {
/* do stuff */
} catch (Exception e) {
}
}

Match a string from a list and extract values

What would be the most efficient (low CPU time) way of achieving the following in Java?
Let us say we have a list of strings as follows:
1.T.methodA(p1).methodB(p2,p3).methodC(p4)
2.T.methodX.methodY(p5,p6).methodZ()
3 ...
At runtime we get strings as follows that may match one of the strings in our list:
a.T.methodA(p1Value).methodB(p2Value,p3Value).methodC(p4Value) // Matches 1
b.T.methodM().methodL(p10) // No Match
c.T.methodX.methodY(p5Value,p6Value).methodZ() // Matches 2
I would like to match (a) to (1) and extract the values of p1,p2,p3 and p4
where:
p1Value = p1, p2Value = p2, p3Value = p3 and so on.
Similarly for the other matches like c to 2 for example.
The first method I have in mind is of course a regular expression.
But that could be complicated to update in the future or to handle edge cases.
Instead, you can try using the Nashorn engine, which allows you to execute JavaScript code in the JVM (note that Nashorn was deprecated in JDK 11 and removed in JDK 15).
So you just need to create a special JavaScript object that handles all your methods:
private static final String jsLib = "var T = {" +
"results: new java.util.HashMap()," +
"methodA: function (p1) {" +
" this.results.put('p1', p1);" +
" return this;" +
"}," +
"methodB: function (p2, p3) {" +
" this.results.put('p2', p2);" +
" this.results.put('p3', p3);" +
" return this;" +
"}," +
"methodC: function (p4) {" +
" this.results.put('p4', p4);" +
" return this.results;" +
"}}";
This is a string for simplicity, and it handles only your first case.
You can write the code in a .js file and load that instead.
The JavaScript object has a special attribute that is a Java HashMap, so you get it back as the result of the evaluation, with all the values stored by name.
So you just eval the input:
ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
final String inputSctipt = "T.methodA('p1Value').methodB('p2Value','p3Value').methodC('p4Value')";
try {
engine.eval(jsLib);
Map<String, Object> result = (Map<String, Object>)engine.eval(inputSctipt);
System.out.println("Script result:\n" + result.get("p1"));
} catch (ScriptException e) {
e.printStackTrace();
}
And you got:
Script result:
p1Value
In the same way you can get all the other values.
You need to ignore script errors, as they correspond to paths that are not implemented.
Just remember to reset the script context before each evaluation, to avoid mixing in values from the previous one.
The advantage of this solution compared to regular expressions is that it is easy to understand and easy to update when needed.
The only disadvantages I can see are, of course, the need to know JavaScript, and performance.
You didn't mention performance as an issue, so you can try this approach if it fits your needs.
If you need better performance, then you should look at regular expressions.
UPDATE
To have a more complete answer, here is the same example with regular expressions:
Pattern p = Pattern.compile("^T\\.methodA\\(['\"]?(.+?)['\"]?\\)\\.methodB\\(['\"]?([^,]+?)['\"]?,['\"]?(.+?)['\"]?\\)\\.methodC\\(['\"]?(.+?)['\"]?\\)$");
Matcher m = p.matcher(inputSctipt);
if (m.find()) {
System.out.println("With regexp:\n" + m.group(1));
}
Please be aware that this expression doesn't handle edge cases, and you're going to need a separate regex for each string you want to parse and grab the attribute values from.
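To avoid hand-writing one regex per template, the regex can be generated from each template string. This is a sketch under assumptions that are not in the answer: parameters are plain identifiers in argument position, runtime values contain no commas, parentheses or single quotes, and values may optionally be wrapped in single quotes as in the Nashorn example. TemplateMatcher is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateMatcher {
    // A parameter placeholder: an identifier right after "(" or ",", followed by "," or ")".
    private static final Pattern PARAM = Pattern.compile("(?<=[(,])\\s*(\\w+)\\s*(?=[,)])");
    // A runtime value: no , ( ) or ', optionally wrapped in single quotes.
    private static final String VALUE = "\\s*'?([^,()']*?)'?\\s*";

    private final List<String> names = new ArrayList<>();
    private final Pattern compiled;

    TemplateMatcher(String template) {
        StringBuilder regex = new StringBuilder();
        Matcher m = PARAM.matcher(template);
        int last = 0;
        while (m.find()) {
            // Literal text between placeholders is quoted, so dots and parentheses match literally.
            regex.append(Pattern.quote(template.substring(last, m.start()))).append(VALUE);
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        compiled = Pattern.compile(regex.toString()); // compiled once, reused for every input
    }

    /** Returns parameter name -> value if the whole input matches this template. */
    Optional<Map<String, String>> match(String input) {
        Matcher m = compiled.matcher(input);
        if (!m.matches()) return Optional.empty();
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            values.put(names.get(i), m.group(i + 1));
        }
        return Optional.of(values);
    }
}
```

Matching a runtime string then means trying each TemplateMatcher in turn. Since the patterns are compiled up front, each attempt costs one Matcher.matches() call.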

java parsing array input control

Thanks for checking out my question.
Starting off, the program has the following goal: the user enters currency formatted as "xD xC xP xH", the program checks that the input is correct and then prints back the 'long' version: "x Dollars, x Cents, x Penny's, x half penny's"
Here I have some code that takes input from the user as String currencyIn, splits the string into an array of tokens, then replaces the D's with Dollars etc. and prints the output.
import java.util.Scanner;
public class parseArray
{
public parseArray()
{
System.out.print('\u000c');
String CurrencyFormat = "xD xC xP xH";
System.out.println("Please enter currency in the following format: \""+CurrencyFormat+"\" where x is any integer");
System.out.println("\nPlease take care to use the correct spacing enter the exact integer plus type of coin\n\n");
Scanner input = new Scanner(System.in);
String currencyIn = input.nextLine();
currencyIn = currencyIn.toUpperCase(); // Strings are immutable, so keep the returned value
System.out.println("This is the currency you entered: "+currencyIn);
String[] tokens = currencyIn.split(" ");
for (String t : tokens)
{
System.out.println(t);
}
String dollars = tokens[0].replaceAll("D", " Dollars ");
String cents = tokens[1].replaceAll("C", " cents");
String penny = tokens[2].replaceAll("P", " Penny's");
String hPenny = tokens[3].replaceAll("H", " Half penny's");
System.out.println(" "+dollars+ " " +cents+ " " +penny+ " " +hPenny);
input.close();
}
}
Question 1: At the moment the program prints out pretty much anything you put in. How do I establish some input control? I've seen this done in textbooks with a switch statement and a series of if statements, but they were too complicated for me. Would I parse the characters using charAt() for each element of the array?
Question 2: Is there a 'better' way to print the output? My friend suggested converting my 4 strings (dollars, cents, penny's, hpenny's) into elements 0, 1, 2, 3 of a new array (called newArray) and printing it like this:
System.out.println(Arrays.toString(newArray));
Many thanks in advance.
There is a neat solution involving regular expressions, streams and some lambdas. The core concept is that we define the input format through a regular expression: we need a sequence of digits, followed by a 'D' or a 'd', followed by a space, followed by a sequence of digits, followed by a 'C' or a 'c', and so on. I will skip the derivation of this pattern; it is explained in the regular expression tutorial I linked above. We will find that
final String regex = "([0-9]+)[Dd] ([0-9]+)[Cc] ([0-9]+)[Pp] ([0-9]+)[Hh]";
satisfies our needs. (Note that a character class like [D|d] would also accept a literal '|', so [Dd] is the correct form.) With this regular expression we can now check whether our input String has the right format (input.matches(regex)), as well as extract the bits of information we are actually interested in (input.replaceAll(regex, "$1 $2 $3 $4")). Sadly, replaceAll yields another String, but it contains the four digit sequences we are interested in, separated by single spaces. We will use some stream magic to transform this String into a long[] (where the first cell holds the D-value, the second holds the C-value, and so on). The final program looks like this:
import java.util.Arrays;
public class Test {
public static void main(String... args) {
final String input = args[0];
final String regex =
"([0-9]+)[Dd] ([0-9]+)[Cc] ([0-9]+)[Pp] ([0-9]+)[Hh]";
if (!input.matches(regex)) {
throw new IllegalArgumentException("Input is malformed.");
}
long[] values = Arrays.stream(input.replaceAll(regex, "$1 $2 $3 $4").split(" "))
.mapToLong(Long::parseLong)
.toArray();
System.out.println(Arrays.toString(values));
}
}
If you want to have a List<Long> instead of a long[] (or a List<Integer> instead of an int[]), you would use
List<Long> values = Arrays.stream(input.replaceAll(regex, "$1 $2 $3 $4").split(" "))
.map(Long::parseLong)
.collect(Collectors.toList());
It is necessary to change mapToLong to map to receive a Stream<Long> instead of a LongStream. I am sure that one could somehow write a custom Collector for LongStream to transform it into a List<Long>, but I found this solution more readable and reliable (after all, the Collector used comes from Oracle, I trust they test their code extensively).
Here are some example calls:
$> java Test "10D 9c 8p 7H"
[10, 9, 8, 7]
$> java Test "10E 9C 8P 7H"
Exception in thread "main" java.lang.IllegalArgumentException: Input is malformed.
at Test.main(Test.java:10)
$> java Test "10D 9C 8P 7H 10D 9C 8P 7H"
Exception in thread "main" java.lang.IllegalArgumentException: Input is malformed.
at Test.main(Test.java:10)
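For reference, here is a self-contained version of both routes to a List<Long>, including the imports the snippet above needs: map(Long::parseLong) as described, and mapToLong(...).boxed(), which turns a LongStream into a Stream<Long> without a custom Collector. The class and method names are illustrative, and neither method validates its input, so check input.matches(REGEX) first as the program above does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class CurrencyValues {
    static final String REGEX = "([0-9]+)[Dd] ([0-9]+)[Cc] ([0-9]+)[Pp] ([0-9]+)[Hh]";

    // Variant 1: map(Long::parseLong) yields a Stream<Long> directly.
    static List<Long> viaMap(String input) {
        return Arrays.stream(input.replaceAll(REGEX, "$1 $2 $3 $4").split(" "))
                .map(Long::parseLong)
                .collect(Collectors.toList());
    }

    // Variant 2: keep mapToLong and convert the LongStream with boxed().
    static List<Long> viaBoxed(String input) {
        return Arrays.stream(input.replaceAll(REGEX, "$1 $2 $3 $4").split(" "))
                .mapToLong(Long::parseLong)
                .boxed()
                .collect(Collectors.toList());
    }
}
```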
Question1
You can actually check if the input is what it's supposed to be with simple checks. For example, you can check the first element like this:
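// charAt returns a primitive char, so compare it with == (char has no equals method).
// Checking the last character also works for multi-digit amounts such as "10D".
return tokens[0].charAt(tokens[0].length() - 1) == 'D';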
Another way to check whether the input is correct is to use regular expressions. That is the better way, but since I assume you are a beginner it may be too much trouble for now, so I'll leave it to you to look into later.
Question2
You can actually follow your friend's advice. You can write it as follows:
for (int i = 0; i < 4; i++) {
System.out.print(" " + tokens[i]);
}
System.out.println();
Or you can use
System.out.println(Arrays.toString(newArray));
(which needs import java.util.Arrays;) after filling newArray like this:
newArray[0] = " " + tokens[0];
You could use the .equals() method to see whether what the user has typed in matches what you have:
if (currencyIn.equals(CurrencyFormat))
{
...
}
This is probably the simplest way I can think of!
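Putting the regex check from the first answer together with the printing question, the 'long' output can be built straight from the capturing groups, so neither charAt() checks nor replaceAll() calls are needed. In this sketch, CASE_INSENSITIVE replaces the [Dd]-style character classes, the labels (Dollars, Cents, Pennies, Half pennies) are my own choice, and CurrencyParser and describe are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CurrencyParser {
    // Digits followed by D, C, P and H (either case), separated by single spaces.
    private static final Pattern FORMAT =
            Pattern.compile("(\\d+)D (\\d+)C (\\d+)P (\\d+)H", Pattern.CASE_INSENSITIVE);
    // Labels for the four capturing groups, in order.
    private static final String[] LABELS = {"Dollars", "Cents", "Pennies", "Half pennies"};

    /** Validates input like "10D 9C 8P 7H" and returns "10 Dollars, 9 Cents, 8 Pennies, 7 Half pennies". */
    static String describe(String input) {
        Matcher m = FORMAT.matcher(input.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected format \"xD xC xP xH\", got: " + input);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < LABELS.length; i++) {
            if (i > 0) out.append(", ");
            // Long.parseLong also normalizes leading zeros, e.g. "07" becomes 7.
            out.append(Long.parseLong(m.group(i + 1))).append(' ').append(LABELS[i]);
        }
        return out.toString();
    }
}
```

In the question's program, System.out.println(CurrencyParser.describe(currencyIn)); could then replace the four replaceAll calls and the final println.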

Java: Issue when replacing Strings on loop

I'm building a small app which automatically translates boolean queries in Java.
This is the code to find if the query string contains a certain word and if so, it replaces it with the translated value.
int howmanytimes = originalValues.size();
for (int y = 0; y < howmanytimes; y++) {
String originalWord = originalValues.get(y);
System.out.println("original Word = " + originalWord);
if (toReplace.contains(" " + originalWord.toLowerCase() + " ")
|| toCheck.contains('"' + originalWord.toLowerCase() + '"')) {
toReplace = toReplace.replace(originalWord, translatedValues.get(y).toLowerCase());
System.out.println("replaced " + originalWord + " with " + translatedValues.get(y).toLowerCase());
}
System.out.println("to Replace inside loop " + toReplace);
}
The problem is when a query contains, for example, '(mykeyword OR "blue mykeyword")' and the translated values are different, e.g. mykeyword translates to elpalavra and "blue mykeyword" translates to "elpalavra azul". In this case the resulting string will be '(elpalavra OR "blue elpalavra")' when it should be '(elpalavra OR "elpalavra azul")'. I understand that the first iteration replaces every occurrence of the keyword, so by the next iteration the string no longer contains the original phrase that should be translated.
How can I fix this?
Thank you
You can sort originalValues by length in descending order and then loop through them.
This way you replace "blue mykeyword" first, and only afterwards "mykeyword".
It isn't explained what the "toCheck" variable is for, and in any case the way it is used looks odd (to me, at least).
Keeping that aside, one way to answer your request could be this (based only on the requirements you specified):
Sort originalValues so that the entries with more words come first; entries with the same number of words should be ordered from longest to shortest.
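A sketch of that ordering. One detail worth handling: if originalValues is sorted on its own, translatedValues no longer lines up with it, so this version sorts indices instead. For brevity it drops the original contains() checks; QueryTranslator, translate and wordCount are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class QueryTranslator {
    static String translate(String query, List<String> originals, List<String> translations) {
        // Sort indices, not the lists, so each original stays paired with its translation.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < originals.size(); i++) order.add(i);
        order.sort(Comparator
                .comparingInt((Integer i) -> wordCount(originals.get(i))).reversed()        // more words first
                .thenComparing(Comparator.comparingInt((Integer i) -> originals.get(i).length()).reversed())); // then longer first
        String result = query;
        for (int i : order) {
            result = result.replace(originals.get(i), translations.get(i).toLowerCase());
        }
        return result;
    }

    private static int wordCount(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}
```

With the example from the question, "blue mykeyword" is replaced first, giving (elpalavra OR "elpalavra azul"). It still assumes that no translation contains an original keyword that is processed later.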

Weird Java String comparison

I'm having a minor issue with Java String comparisons.
I've written a class which takes in a String and parses it into a custom tree type. I've written a toString method which then converts this tree back into a String. As part of my unit tests I'm checking that the String generated by the toString method is the same as the String that was parsed in the first place.
Here is my simple test with a few printouts so that we can see what's going on.
final String exp1 = "(a|b)";
final String exp2 = "((a|b)|c)";
final Node tree1 = Reader.parseExpression2(exp1);
final Node tree2 = Reader.parseExpression2(exp2);
final String t1 = tree1.toString();
final String t2 = tree2.toString();
System.out.println(":" + exp1 + ":" + t1 + ":");
System.out.println(":" + exp2 + ":" + t2 + ":");
System.out.println(exp1.compareToIgnoreCase(t1));
System.out.println(exp2.compareToIgnoreCase(t2));
System.out.println(exp1.equals(t1));
System.out.println(exp2.equals(t2));
This has the following output (NB: the ":" characters are used as delimiters so I can make sure there's no extra whitespace):
:(a|b):(a|b):
:((a|b)|c):((a|b)|c):
-1
-1
false
false
Based on manually comparing the strings exp1 and exp2 to t1 and t2 respectively, they are exactly the same. But for some reason Java is insisting they are different.
This isn't the obvious mistake of using == instead of .equals() but I'm stumped as to why two seemingly identical strings are different. Any help would be much appreciated :)
Does one of your strings have a null character within it? These might not be visible when you use System.out.println(...).
For example, consider this class:
public class StringComparison {
public static void main(String[] args) {
String s = "a|b";
String t = "a|b\0";
System.out.println(":" + s + ":" + t + ":");
System.out.println(s.equals(t));
}
}
When I ran this on Linux it gave me the following output:
:a|b:a|b:
false
(I also ran it on Windows, but the null character showed up as a space.)
Well, it certainly looks okay. What I would do is iterate over both strings using charAt and compare every single character with its counterpart in the other string. At a minimum, this should tell you which character is the offending one.
Also output everything else you can find out about both strings, such as the length.
It could be that one of the characters, while looking the same, may be some other Unicode doppelganger :-)
You may also want to capture that output and do a detailed binary dump on it, such as loading it up into gvim and using the hex conversion tool, or executing od -xcb (if available) on the captured output. There may be an obvious difference when you get down to the binary examination level.
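The character-by-character comparison and the 'print the integer value of each character' idea can be combined in a small helper. This is an illustrative sketch (StringDiff, firstMismatch and codeUnits are made-up names); it compares UTF-16 code units, which is enough to expose a trailing \0 or a look-alike such as U+01C0 in place of |.

```java
final class StringDiff {
    /** Index of the first differing char; the shorter length if one string is a prefix of the other; -1 if equal. */
    static int firstMismatch(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) return i;
        }
        return a.length() == b.length() ? -1 : n;
    }

    /** Renders each char as U+XXXX so invisible or look-alike characters become visible. */
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }
}
```

For the strings in the question, printing StringDiff.firstMismatch(exp1, t1) and StringDiff.codeUnits(t1) would show where they diverge and which code unit is there.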
I have some suggestions:
Copy each output, paste it into Notepad (or any similar editor), then
copy both again and do something like this:
System.out.println("(a|b)".compareToIgnoreCase("(a|b)"));
Print out the integer representation of each character. If one of them is an unusual Unicode character, its int value will be different.
Also what version of JDK are you using?
