Match a string from a list and extract values - java

What would be the most efficient (low CPU time) way of achieving the following in Java?
Let us say we have a list of strings as follows:
1.T.methodA(p1).methodB(p2,p3).methodC(p4)
2.T.methodX.methodY(p5,p6).methodZ()
3 ...
At runtime we get strings like the following, which may match one of the strings in our list:
a.T.methodA(p1Value).methodB(p2Value,p3Value).methodC(p4Value) // Matches 1
b.T.methodM().methodL(p10) // No Match
c.T.methodX.methodY(p5Value,p6Value).methodZ() // Matches 2
I would like to match (a) to (1) and extract the values of p1,p2,p3 and p4
where:
p1Value = p1, p2Value = p2, p3Value = p3 and so on.
Similarly for the other matches like c to 2 for example.

The first method that comes to mind is, of course, a regular expression.
But that could be complicated to update in the future or to extend to handle edge cases.
Instead, you can try the Nashorn engine, which allows you to execute JavaScript code in a JVM.
You just need to create a special JavaScript object that handles all your methods:
private static final String jsLib = "var T = {" +
"results: new java.util.HashMap()," +
"methodA: function (p1) {" +
" this.results.put('p1', p1);" +
" return this;" +
"}," +
"methodB: function (p2, p3) {" +
" this.results.put('p2', p2);" +
" this.results.put('p3', p3);" +
" return this;" +
"}," +
"methodC: function (p4) {" +
" this.results.put('p4', p4);" +
" return this.results;" +
"}}";
This is a string for simplicity, and it handles your first case.
You can also write the code in a .js file and load that easily.
The JavaScript object has a special attribute, a Java HashMap, which you get back as the result of the evaluation, with all the values stored by name.
So you just eval the input:
ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
final String inputSctipt = "T.methodA('p1Value').methodB('p2Value','p3Value').methodC('p4Value')";
try {
engine.eval(jsLib);
Map<String, Object> result = (Map<String, Object>)engine.eval(inputSctipt);
System.out.println("Script result:\n" + result.get("p1"));
} catch (ScriptException e) {
e.printStackTrace();
}
And you get:
Script result:
p1Value
In the same way you can get all the other values.
You need to ignore script errors, as they should correspond to paths that are not implemented.
Just remember to reset the script context before each evaluation, to avoid mixing in values from previous evaluations.
The advantage of this solution over regular expressions is that it is easy to understand and easy to update when needed.
The only disadvantages I can see are that it requires knowledge of JavaScript, of course, and the performance.
You didn't mention performance as an issue, so you can try this approach if it is fine for your needs.
If you need better performance, then you should look at regular expressions.
UPDATE
To have a more complete answer, here is the same example with regular expressions:
Pattern p = Pattern.compile("^T\\.methodA\\(['\"]?(.+?)['\"]?\\)\\.methodB\\(['\"]?([^,]+?)['\"]?,['\"]?(.+?)['\"]?\\)\\.methodC\\(['\"]?(.+?)['\"]?\\)$");
Matcher m = p.matcher(inputSctipt);
if (m.find()) {
System.out.println("With regexp:\n" + m.group(1));
}
Please be aware that this expression doesn't handle edge cases, and that you will need one regular expression for each string you want to parse to grab the attribute values.
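If you go the regular-expression route, you do not have to write each expression by hand. The following is only a sketch, under the assumption that parameter values never contain commas or parentheses: it derives one precompiled Pattern per template string and returns the captured values by parameter name. The class name TemplateMatcherSketch and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class TemplateMatcherSketch {
    // A parameter name is an identifier that directly follows '(' or ','.
    private static final Pattern PARAM = Pattern.compile("(?<=[(,])\\w+");

    private final List<Pattern> patterns = new ArrayList<>();
    private final List<List<String>> paramNames = new ArrayList<>();

    // Turns a template such as "T.methodA(p1).methodB(p2,p3)" into a regex:
    // literal parts are quoted, each parameter becomes a capturing group.
    void addTemplate(String template) {
        StringBuilder regex = new StringBuilder();
        List<String> names = new ArrayList<>();
        Matcher m = PARAM.matcher(template);
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(template.substring(last, m.start())));
            regex.append("([^,()]*)"); // assumption: values contain no ',' '(' or ')'
            names.add(m.group());
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        patterns.add(Pattern.compile(regex.toString()));
        paramNames.add(names);
    }

    // Returns parameter name -> value for the first matching template,
    // or null when no template matches.
    Map<String, String> match(String input) {
        for (int i = 0; i < patterns.size(); i++) {
            Matcher m = patterns.get(i).matcher(input);
            if (m.matches()) {
                Map<String, String> values = new LinkedHashMap<>();
                List<String> names = paramNames.get(i);
                for (int g = 0; g < names.size(); g++) {
                    values.put(names.get(g), m.group(g + 1));
                }
                return values;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TemplateMatcherSketch matcher = new TemplateMatcherSketch();
        matcher.addTemplate("T.methodA(p1).methodB(p2,p3).methodC(p4)");
        matcher.addTemplate("T.methodX.methodY(p5,p6).methodZ()");
        System.out.println(matcher.match(
                "T.methodA(p1Value).methodB(p2Value,p3Value).methodC(p4Value)"));
        System.out.println(matcher.match("T.methodM().methodL(p10)"));
    }
}
```

The patterns are compiled once, when the templates are registered, so the per-input cost is only the matching itself. With a long template list, trying every pattern in turn may become the bottleneck, and grouping templates by a cheap key first (for example the first method name) would be worth measuring.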


Java: Issue when replacing Strings on loop

I'm building a small app which auto translates boolean queries in Java.
This is the code to find if the query string contains a certain word and if so, it replaces it with the translated value.
int howmanytimes = originalValues.size();
for (int y = 0; y < howmanytimes; y++) {
String originalWord = originalValues.get(y);
System.out.println("original Word = " + originalWord);
if (toReplace.contains(" " + originalWord.toLowerCase() + " ")
|| toCheck.contains('"' + originalWord.toLowerCase() + '"')) {
toReplace = toReplace.replace(originalWord, translatedValues.get(y).toLowerCase());
System.out.println("replaced " + originalWord + " with " + translatedValues.get(y).toLowerCase());
}
System.out.println("to Replace inside loop " + toReplace);
}
The problem is when a query has, for example, '(mykeyword OR "blue mykeyword")' and the translated values are different, for example, mykeyword translates to elpalavra and "blue mykeyword" translates to "elpalavra azul". What happens in this case is that the result string will be '(elpalavra OR "blue elpalavra")' when it should be '(elpalavra OR "elpalavra azul")' . I understand that in the first loop it replaces all keywords and in the second it no longer contains the original value it should for translation.
How can I fix this?
Thank you
You can sort originalValues by length in descending order and then loop through them.
This way you replace "blue mykeyword" first, and only afterwards replace "mykeyword".
It is not explained what the "toCheck" variable is for, and in any case the way it is used looks odd (to me at least).
Setting that aside, one way to do what you ask could be this (based only on the requirements you specified):
Sort your originalValues so that the ones with more words come first. Values with the same number of words should be ordered from longest to shortest.
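A minimal sketch of that ordering, assuming the two lists are index-aligned as in the question. The class and method names are invented for the example, and the toCheck logic from the question is left out because its purpose is unclear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only.
class LongestFirstReplaceSketch {
    static String translate(String query, List<String> originals, List<String> translations) {
        // Sort indexes rather than the values, so both lists stay aligned.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < originals.size(); i++) {
            order.add(i);
        }
        // More words first; for the same word count, longer strings first.
        order.sort(Comparator
                .comparingInt((Integer i) -> originals.get(i).split("\\s+").length)
                .thenComparingInt(i -> originals.get(i).length())
                .reversed());

        String result = query;
        for (int i : order) {
            result = result.replace(originals.get(i), translations.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        String out = translate("(mykeyword OR \"blue mykeyword\")",
                Arrays.asList("mykeyword", "blue mykeyword"),
                Arrays.asList("elpalavra", "elpalavra azul"));
        System.out.println(out);
    }
}
```

Note that this still replaces in several passes, so a translated value that happens to contain another original word would be replaced again; the sketch does not guard against that.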

Retrieval of synonyms of an instance from whole ontology

Individual ind = model.createIndividual("http://www.semanticweb.org/ontologies/Word#Human", isSynonymOf);
System.out.println( "Synonyms of given instance are:" );
StmtIterator it =ind.listProperties(isSynonymOf);
while( it.hasNext() ) {
Statement stmt = ((StmtIterator) it).nextStatement();
System.out.println( " * "+stmt.getObject());
}
Output
Synonyms of given instance are:
http://www.semanticweb.org/ontologies/Word#Human
http://www.semanticweb.org//ontologies/Word#Mortal
http://www.semanticweb.org/ontologies/Word#Person
Problem 1: My output shows whole URI but I need output as under
Synonyms of given instance are:
Human
Mortal
Person
Problem 2: I have 26 instances and every time I have to mention its URI to show its synonyms. How will I show synonyms of any instance from whole ontology model instead of mentioning URIs again and again. I am using eclipse Mars 2.0 and Jena API
You can use a regex or simple Java string operations to extract the substring after #. Note that best practice is to provide human-readable representations of resources separately rather than encoding them in the URI. For instance, rdfs:label is a common property for doing that.
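For the plain string approach, something like the following should be enough. It is a sketch that assumes the name you want is whatever follows the last # in the URI; the helper name localName is made up here. (Jena's Resource also has a getLocalName() method that may do the job without any string handling.)

```java
// Hypothetical helper for illustration only.
class LocalNameSketch {
    // Returns the part after the last '#', or the whole string if there is none.
    static String localName(String uri) {
        int hash = uri.lastIndexOf('#');
        return hash >= 0 ? uri.substring(hash + 1) : uri;
    }

    public static void main(String[] args) {
        System.out.println(localName("http://www.semanticweb.org/ontologies/Word#Human"));
    }
}
```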
To show the synonyms of every instance, simply iterate over all individuals of the ontology, which are returned by
model.listIndividuals()
Some comments:
You're not using the method createIndividual as intended. The second argument denotes a class, and you're giving it a property. Please consult the Javadoc in the future.
I don't understand why you're casting it to StmtIterator; it already is one, so the cast does nothing.
Using listPropertyValues is more convenient since you're only interested in the values.
Use Java 8 to make the code more compact:
model.listIndividuals().forEachRemaining(ind -> {
System.out.println("Synonyms of instance " + ind + " are:");
ind.listPropertyValues(isSynonymOf).forEachRemaining(val -> {
System.out.println(" * " + val);
});
});
Java 6 compatible version:
ExtendedIterator<Individual> indIter = model.listIndividuals();
while(indIter.hasNext()) {
Individual ind = indIter.next();
System.out.println("Synonyms of instance " + ind + " are:");
NodeIterator valueIter = ind.listPropertyValues(isSynonymOf);
while(valueIter.hasNext()) {
RDFNode val = valueIter.next();
System.out.println(" * " + val);
}
}

Automated conversion from string concatenation to formatted arguments

Our code is littered with things like,
Log.d("Hello there " + x + ", I see your are " + y + " years old!");
I want to be able to script the conversion to something like this,
Log.d("Hello there %s, I see your are %d years old!", x, y);
(Note: I'm not worried about getting the right argument type now. I could pre-process the file to determine the types, or convert to always use strings. Not my concern right now.)
I am wondering if anyone has tackled this. I came up with these regexs for pulling out the static and variable parts of the strings,
static final Pattern P1 = Pattern.compile("\\s*(\".*?\")\\s*");
static final Pattern P2 = Pattern.compile("\\s*\\+?\\s*([^\\+]+)\\s*\\+?\\s*");
By looping on find() for each I can pull out the parts,
"Hello there "
", I see your are "
"years old!"
and,
x
y
But I can't come up with a good way to piece these back together, considering all the possibilities of how they might be concatenated together.
Maybe this is the wrong approach. Should I be trying to pull out, then replace the variable part with the format argument?
If you are happy to replace everything with %s, you could do this:
(P.S.: this assumes well-formatted code in terms of whitespace.)
Keep resolving from RIGHT to LEFT, as parameter position is important.
1.) Run this regex to resolve everything of the form Log.d({something} + var) to Log.d({something}, var)
(Log\.d\(.*?)\"\s*\+\s*([^\s]+)(\+)?(\))
with replacement
$1%s", $2$4
(https://regex101.com/r/hY2iK6/8)
2.) Now you need to take care of every variable occurring between strings:
Keep running this regex, until no replacements appear:
(Log\.d\(.*)(\"\s*\+\s*([^\s]+)\s*\+\s*\")(.*?\"),([^\"]+);
with replacement
$1%s$4,$3,$5;
After run 1: https://regex101.com/r/hY2iK6/10
After run 2: https://regex101.com/r/hY2iK6/11
3.) Finally, you need to resolve the Strings containing a leading variable - which is no problem:
(Log\.d\()([^\"]+)\s+\+\s*\"(.*?),([^"]+;)
with replacement
$1"%s$3,$2,$4
https://regex101.com/r/hY2iK6/9
There might be cases not covered, but it should give you an idea.
I added Log.d to the match groups as well, since it's part of the replacement, so you could just as well use Log\.(?:d|f|e) if you like.
You can use the following regex to capture all the arguments and strings together in one go. Therefore you can figure out exactly where the arguments are meant to fit into the overall string using the pairings.
(?:(\w+)\s*\+\s*)?"((?:[^"\\]|\\.)*+)"(?:\s*\+\s*(\w+))?
Regex demo here. (Thanks to nhahtdh for the improved version.)
It will find all the concatenations as part of Log.d in the format:
[<variable> +] <string> [+ <variable>]
Where [] denotes an optional part.
With that you can form the appropriate replacements, take the following example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.lang.StringBuilder;
import java.util.List;
import java.util.ArrayList;
class Main {
public static void main(String[] args) {
String log = "Log.d(\"Hello there \" + x + \", I see your are \" + y + \" years old!\");";
System.out.println("Input: " + log);
Pattern p = Pattern.compile("(?:(\\w+)\\s*\\+\\s*)?\"((?:[^\"\\\\]|\\\\.)*+)\"(?:\\s*\\+\\s*(\\w+))?");
Matcher m = p.matcher(log);
StringBuilder output = new StringBuilder(25);
List<String> arguments = new ArrayList<String>(5);
output.append("Log.d(\"");
while (m.find()) {
if (m.group(1) != null) {
output.append("%s");
arguments.add(m.group(1));
}
output.append(m.group(2));
if (m.group(3) != null) {
output.append("%s");
arguments.add(m.group(3));
}
}
output.append("\"");
for (String arg : arguments) {
output.append(", ").append(arg);
}
output.append(");");
System.out.println("Output: " + output);
}
}
Input: Log.d("Hello there " + x + ", I see your are " + y + " years old!");
Output: Log.d("Hello there %s, I see your are %s years old!", x, y);
Java demo here.

Correct way to use StringBuilder in SQL

I just found some sql query build like this in my project:
return (new StringBuilder("select id1, " + " id2 " + " from " + " table")).toString();
Does this StringBuilder achieve its aim, i.e reducing memory usage?
I doubt that, because in the constructor the '+' (String concat operator) is used. Will that take the same amount of memory as using String like the code below? As I understand it, it differs when using StringBuilder.append().
return "select id1, " + " id2 " + " from " + " table";
Are both statements equal in memory usage or not? Please clarify.
Edit:
BTW, it is not my code. Found it in an old project. Also, the query is not so small as the one in my example. :)
The aim of using StringBuilder, i.e reducing memory. Is it achieved?
No, not at all. That code is not using StringBuilder correctly. (I think you've misquoted it, though; surely there aren't quotes around id2 and table?)
Note that the aim (usually) is to reduce memory churn rather than total memory used, to make life a bit easier on the garbage collector.
Will that take memory equal to using String like below?
No, it'll cause more memory churn than just the straight concat you quoted. (Until/unless the JVM optimizer sees that the explicit StringBuilder in the code is unnecessary and optimizes it out, if it can.)
If the author of that code wants to use StringBuilder (there are arguments for, but also against; see note at the end of this answer), better to do it properly (here I'm assuming there aren't actually quotes around id2 and table):
StringBuilder sb = new StringBuilder(some_appropriate_size);
sb.append("select id1, ");
sb.append(id2);
sb.append(" from ");
sb.append(table);
return sb.toString();
Note that I've listed some_appropriate_size in the StringBuilder constructor, so that it starts out with enough capacity for the full content we're going to append. The default size used if you don't specify one is 16 characters, which is usually too small and results in the StringBuilder having to do reallocations to make itself bigger (IIRC, in the Sun/Oracle JDK, it doubles itself [or more, if it knows it needs more to satisfy a specific append] each time it runs out of room).
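To make the capacity point concrete, here is a small sketch. The class and method names are invented for the example, and id2 and table are passed in as parameters on the assumption that they are variables in the real code.

```java
// Hypothetical helper for illustration only.
class BuilderCapacitySketch {
    // The no-argument constructor starts with room for 16 characters.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Pre-sizes the builder so that no reallocation is needed while appending.
    static String buildQuery(String id2, String table) {
        String select = "select id1, ";
        String from = " from ";
        int size = select.length() + id2.length() + from.length() + table.length();
        StringBuilder sb = new StringBuilder(size);
        sb.append(select);
        sb.append(id2);
        sb.append(from);
        sb.append(table);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());
        System.out.println(buildQuery("id2", "table"));
    }
}
```

Computing the exact size is only worth it when the pieces are cheap to measure; a rough over-estimate works just as well.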
You may have heard that string concatenation will use a StringBuilder under the covers if compiled with the Sun/Oracle compiler. This is true, it will use one StringBuilder for the overall expression. But it will use the default constructor, which means in the majority of cases, it will have to do a reallocation. It's easier to read, though. Note that this is not true of a series of concatenations. So for instance, this uses one StringBuilder:
return "prefix " + variable1 + " middle " + variable2 + " end";
It roughly translates to:
StringBuilder tmp = new StringBuilder(); // Using default 16 character size
tmp.append("prefix ");
tmp.append(variable1);
tmp.append(" middle ");
tmp.append(variable2);
tmp.append(" end");
return tmp.toString();
So that's okay, although the default constructor and subsequent reallocation(s) isn't ideal, the odds are it's good enough — and the concatenation is a lot more readable.
But that's only for a single expression. Multiple StringBuilders are used for this:
String s;
s = "prefix ";
s += variable1;
s += " middle ";
s += variable2;
s += " end";
return s;
That ends up becoming something like this:
String s;
StringBuilder tmp;
s = "prefix ";
tmp = new StringBuilder();
tmp.append(s);
tmp.append(variable1);
s = tmp.toString();
tmp = new StringBuilder();
tmp.append(s);
tmp.append(" middle ");
s = tmp.toString();
tmp = new StringBuilder();
tmp.append(s);
tmp.append(variable2);
s = tmp.toString();
tmp = new StringBuilder();
tmp.append(s);
tmp.append(" end");
s = tmp.toString();
return s;
...which is pretty ugly.
It's important to remember, though, that in all but a very few cases it doesn't matter and going with readability (which enhances maintainability) is preferred barring a specific performance issue.
When you already have all the "pieces" you wish to append, there is no point in using StringBuilder at all. Using StringBuilder and string concatenation in the same call as per your sample code is even worse.
This would be better:
return "select id1, " + " id2 " + " from " + " table";
In this case, the string concatenation is actually happening at compile-time anyway, so it's equivalent to the even-simpler:
return "select id1, id2 from table";
Using new StringBuilder().append("select id1, ").append(" id2 ")....toString() will actually hinder performance in this case, because it forces the concatenation to be performed at execution time, instead of at compile time. Oops.
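One way to see the compile-time behaviour for yourself is to compare references, relying on the fact that compile-time constant strings are interned. This is purely a demonstration sketch (never compare strings with == in real code), and the class name is made up.

```java
// Demonstration only: reference comparison is used on purpose here.
class ConstantFoldingSketch {
    // All operands are literals, so the compiler folds them into one constant,
    // which is the same interned object as the equivalent literal.
    static boolean foldedAtCompileTime() {
        String concatenated = "select id1, " + "id2" + " from " + "table";
        String literal = "select id1, id2 from table";
        return concatenated == literal;
    }

    // The builder does its work at execution time and toString() creates a
    // new String object, so the reference differs from the literal.
    static boolean builderResultIsSameObject() {
        String built = new StringBuilder()
                .append("select id1, ").append("id2")
                .append(" from ").append("table")
                .toString();
        return built == "select id1, id2 from table";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime());
        System.out.println(builderResultIsSameObject());
    }
}
```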
If the real code is building a SQL query by including values in the query, then that's another separate issue, which is that you should be using parameterized queries, specifying the values in the parameters rather than in the SQL.
I have an article on String / StringBuffer which I wrote a while ago - before StringBuilder came along. The principles apply to StringBuilder in the same way though.
[[ There are some good answers here but I find that they still are lacking a bit of information. ]]
return (new StringBuilder("select id1, " + " id2 " + " from " + " table"))
.toString();
So as you point out, the example you give is simplistic, but let's analyze it anyway. What happens here is that the compiler actually does the + work, because "select id1, " + " id2 " + " from " + " table" are all constants. So this turns into:
return new StringBuilder("select id1, id2 from table").toString();
In this case, obviously, there is no point in using StringBuilder. You might as well do:
// the compiler combines these constant strings
return "select id1, " + " id2 " + " from " + " table";
However, even if you were appending any fields or other non-constants then the compiler would use an internal StringBuilder -- there's no need for you to define one:
// an internal StringBuilder is used here
return "select id1, " + fieldName + " from " + tableName;
Under the covers, this turns into code that is approximately equivalent to:
StringBuilder sb = new StringBuilder("select id1, ");
sb.append(fieldName).append(" from ").append(tableName);
return sb.toString();
Really the only time you need to use StringBuilder directly is when you have conditional code. For example, code that looks like the following is desperate for a StringBuilder:
// 1 StringBuilder used in this line
String query = "select id1, " + fieldName + " from " + tableName;
if (where != null) {
// another StringBuilder used here
query += ' ' + where;
}
The + in the first line uses one StringBuilder instance. Then the += uses another StringBuilder instance. It is more efficient to do:
// choose a good starting size to lower chances of reallocation
StringBuilder sb = new StringBuilder(64);
sb.append("select id1, ").append(fieldName).append(" from ").append(tableName);
// conditional code
if (where != null) {
sb.append(' ').append(where);
}
return sb.toString();
Another time that I use a StringBuilder is when I'm building a string from a number of method calls. Then I can create methods that take a StringBuilder argument:
private void addWhere(StringBuilder sb) {
if (where != null) {
sb.append(' ').append(where);
}
}
When you are using a StringBuilder, you should watch for any usage of + at the same time:
sb.append("select " + fieldName);
That + will cause another internal StringBuilder to be created. This should of course be:
sb.append("select ").append(fieldName);
Lastly, as @T.J.Crowder points out, you should always make a guess at the size of the StringBuilder. This will save on the number of char[] objects created when growing the size of the internal buffer.
You are correct in guessing that the aim of using string builder is not achieved, at least not to its full extent.
However, when the compiler sees the expression "select id1, " + " id2 " + " from " + " table" it emits code which actually creates a StringBuilder behind the scenes and appends to it, so the end result is not that bad after all.
But of course anyone looking at that code is bound to think that it is rather pointless.
In the code you have posted there would be no advantages, as you are misusing the StringBuilder. You build the same String in both cases. Using StringBuilder you can avoid the + operation on Strings using the append method.
You should use it this way:
return new StringBuilder("select id1, ").append(" id2 ").append(" from ").append(" table").toString();
In Java, the String type is an immutable sequence of characters, so when you add two Strings the VM creates a new String value with both operands concatenated.
StringBuilder provides a mutable sequence of characters, which you can use to concatenate different values or variables without creating new String objects, so it can sometimes be more efficient than working with Strings.
This provides some useful features, such as changing the content of a char sequence passed as a parameter inside another method, which you can't do with Strings.
private void addWhereClause(StringBuilder sql, String column, String value) {
//WARNING: only as an example, never append directly a value to a SQL String, or you'll be exposed to SQL Injection
sql.append(" where ").append(column).append(" = ").append(value);
}
More info at http://docs.oracle.com/javase/tutorial/java/data/buffers.html
You could also use MessageFormat.
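A rough sketch of what that could look like, assuming the column and table names are the variable parts. The class and method names are invented for the example. Keep in mind that MessageFormat treats single quotes in the pattern as escape characters and formats numbers according to the locale, and that values (as opposed to identifiers) belong in query parameters, not in the SQL text.

```java
import java.text.MessageFormat;

// Hypothetical helper for illustration only.
class MessageFormatSketch {
    // {0} and {1} are replaced by the arguments, in order.
    static String buildQuery(String column, String table) {
        return MessageFormat.format("select id1, {0} from {1}", column, table);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("id2", "table"));
    }
}
```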

What is the effective method to handle word contractions using Java?

I have a list of words in a file. They might contain words like who's, didn't etc. So when reading from it I need to make them proper like "who is" and "did not". This has to be done in Java. I need to do this without losing much time.
This is actually for handling such queries during a search that uses solr.
Below is a sample code I tried using a hash map
Map<String, String> con = new HashMap<String, String>();
con.put("'s", " is");
con.put("'d", " would");
con.put("'re", " are");
con.put("'ll", " will");
con.put("n't", " not");
con.put("'nt", " not");
String temp = null;
String str = "where'd you're you'll would'nt hello";
String[] words = str.split(" ");
int index = -1 ;
for(int i = 0;i<words.length && (index =words[i].lastIndexOf('\''))>-1;i++){
temp = words[i].substring(index);
if(con.containsKey(temp)){
temp = con.get(temp);
}
words[i] = words[i].substring(0, index)+temp;
System.out.println(words[i]);
}
If you are worried about queries containing for eg "who's" finding documents containing for eg "who is" then you should look at using a Stemmer, which is designed exactly for this purpose.
You can easily add a stemmer by configuring it as a filter in your Solr config. See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Edit:
A SnowballPorterFilterFactory will probably do the job for you.
Following on from @James Jithin's last remark:
the "'s" -> " is" transform is incorrect if the word is a possessive form.
the "'d" -> " would" transform is incorrect in archaic forms, where the "'d" can be a contraction of "ed".
the "'nt" -> " not" transform is not correct because this is really just a mis-spelling of the "n't" contraction. (I mean "wo'nt" is just plain wrong ... isn't it.)
So, to my mind, the best way to implement this would be to enumerate the small number of contractions that are common and valid, and leave the rest alone. This also has the advantage that you can implement it with a simple string match rather than a suffix match.
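As a sketch of that idea, the map below is keyed by whole words instead of suffixes. The entries are just examples (a real list would need to be chosen with care), words are assumed to be separated by single spaces, and the class name is made up.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only.
class ContractionExpanderSketch {
    // Whole-word contractions that are safe to expand; everything else is left alone.
    private static final Map<String, String> CONTRACTIONS = new HashMap<>();
    static {
        CONTRACTIONS.put("who's", "who is");
        CONTRACTIONS.put("didn't", "did not");
        CONTRACTIONS.put("you're", "you are");
        CONTRACTIONS.put("you'll", "you will");
        CONTRACTIONS.put("won't", "will not");
    }

    static String expand(String text) {
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            // Exact lookup of the lower-cased word; unknown words are kept as-is.
            words[i] = CONTRACTIONS.getOrDefault(words[i].toLowerCase(Locale.ROOT), words[i]);
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(expand("who's there you're script's"));
    }
}
```

Because the lookup is an exact match, a possessive such as script's is never touched, and each word costs a single hash lookup.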
The code can be written as
Map<String, String> con = new HashMap<String, String>();
con.put("'s", " is");
con.put("'d", " would");
con.put("'re", " are");
con.put("'ll", " will");
con.put("n't", " not");
con.put("'nt", " not");
String str = "where'd you're you'll would'nt hello";
for(String key : con.keySet()) {
str = str.replaceAll(key + "\\b" , con.get(key));
}
using the logic you already have. But suppose the input contains script's, a word that shows possession: changing it to script is alters the meaning.
