What is an effective way to handle word contractions in Java?

I have a list of words in a file. They might contain contractions like who's, didn't, etc. When reading them, I need to expand them to their full forms, like "who is" and "did not". This has to be done in Java, and it needs to be fast.
This is actually for handling such queries in a search that uses Solr.
Below is some sample code I tried, using a HashMap:
import java.util.HashMap;
import java.util.Map;
Map<String, String> con = new HashMap<String, String>();
con.put("'s", " is");
con.put("'d", " would");
con.put("'re", " are");
con.put("'ll", " will");
con.put("n't", " not");
con.put("'nt", " not");
String str = "where'd you're you'll would'nt hello";
String[] words = str.split(" ");
for (int i = 0; i < words.length; i++) {
    int index = words[i].lastIndexOf('\'');
    if (index < 0) {
        continue; // no apostrophe: keep checking the remaining words instead of stopping the loop
    }
    // "n't" begins one character before the apostrophe (didn't -> did + " not")
    if (index > 0 && words[i].startsWith("n't", index - 1)) {
        index--;
    }
    String suffix = words[i].substring(index);
    words[i] = words[i].substring(0, index) + con.getOrDefault(suffix, suffix);
    System.out.println(words[i]);
}

If you are worried about queries containing, for example, "who's" finding documents containing, for example, "who is", then you should look at using a stemmer, which is designed exactly for this purpose.
You can easily add a stemmer by configuring it as a filter in your Solr config. See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Edit:
A SnowballPorterFilterFactory will probably do the job for you.

Following on from @James Jithin's last remark:
The "'s" -> " is" transform is incorrect if the word is a possessive form.
The "'d" -> " would" transform is incorrect in archaic forms, where the "'d" can be a contraction of "ed".
The "'nt" -> " not" transform is not correct because this is really just a misspelling of the "n't" contraction. (I mean, "wo'nt" is just plain wrong ... isn't it?)
So, to my mind, the best way to implement this would be to enumerate the small number of contractions that are common and valid, and leave the rest alone. This also has the advantage that you can implement it with a simple whole-word string match rather than a suffix match.
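A minimal sketch of that whitelist idea, assuming words are separated by single spaces and looked up case-insensitively. The class name ContractionExpander and the handful of entries are illustrative only, not a complete list; anything not in the map (such as the possessive "script's") passes through untouched:

```java
import java.util.HashMap;
import java.util.Map;

public class ContractionExpander {
    // Illustrative whitelist of common, unambiguous contractions (not exhaustive).
    private static final Map<String, String> CONTRACTIONS = new HashMap<>();
    static {
        CONTRACTIONS.put("who's", "who is");
        CONTRACTIONS.put("didn't", "did not");
        CONTRACTIONS.put("won't", "will not");
        CONTRACTIONS.put("can't", "cannot");
        CONTRACTIONS.put("isn't", "is not");
        CONTRACTIONS.put("you're", "you are");
        CONTRACTIONS.put("you'll", "you will");
    }

    // Whole-word lookup: a word is expanded only if it is in the whitelist.
    public static String expand(String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(CONTRACTIONS.getOrDefault(word.toLowerCase(), word));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: who is there? I did not see the script's author
        System.out.println(expand("who's there? I didn't see the script's author"));
    }
}
```

Because the match is on the whole word, there is no suffix rule to misfire on possessives or misspellings; the cost is that every contraction you care about has to be listed explicitly.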

The code can be written as follows, using the same logic you have:
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;
Map<String, String> con = new HashMap<String, String>();
con.put("'s", " is");
con.put("'d", " would");
con.put("'re", " are");
con.put("'ll", " will");
con.put("n't", " not");
con.put("'nt", " not");
String str = "where'd you're you'll would'nt hello";
for (Map.Entry<String, String> entry : con.entrySet()) {
    // "\\b" requires the suffix to end a word; Pattern.quote treats the key literally
    str = str.replaceAll(Pattern.quote(entry.getKey()) + "\\b", entry.getValue());
}
But suppose the word is "script's", which shows possession: changing it to "script is" alters the meaning.

Related

Java: Issue when replacing Strings in a loop

I'm building a small app which auto translates boolean queries in Java.
This is the code to find if the query string contains a certain word and if so, it replaces it with the translated value.
int howmanytimes = originalValues.size();
for (int y = 0; y < howmanytimes; y++) {
String originalWord = originalValues.get(y);
System.out.println("original Word = " + originalWord);
if (toReplace.contains(" " + originalWord.toLowerCase() + " ")
|| toCheck.contains('"' + originalWord.toLowerCase() + '"')) {
toReplace = toReplace.replace(originalWord, translatedValues.get(y).toLowerCase());
System.out.println("replaced " + originalWord + " with " + translatedValues.get(y).toLowerCase());
}
System.out.println("to Replace inside loop " + toReplace);
}
The problem is when a query has, for example, '(mykeyword OR "blue mykeyword")' and the translated values differ: for example, mykeyword translates to elpalavra and "blue mykeyword" translates to "elpalavra azul". In this case the result string will be '(elpalavra OR "blue elpalavra")' when it should be '(elpalavra OR "elpalavra azul")'. I understand that the first iteration replaces every occurrence of the keyword, so by the second iteration the string no longer contains the original phrase that should be translated.
How can I fix this?
Thank you
You can sort originalValues by length in descending order (keeping each entry paired with its translation in translatedValues) and then loop through them.
This way you first replace "blue mykeyword" and only afterwards replace "mykeyword".
It is not explained what the "toCheck" variable is for, and in any case the way it is used looks odd (to me at least).
Setting that aside, one way to meet your requirement (based only on what you specified) could be this:
Sort your originalValues so that the entries with more words come first. Entries with the same number of words should be ordered from longest to shortest.
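A minimal sketch of that ordering, assuming originalValues and translatedValues are parallel lists as in the question. It sorts a list of positions rather than the lists themselves, so each phrase stays paired with its translation. The class and method names are hypothetical, and the toCheck condition from the question is left out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LongestFirstTranslator {
    // Replaces phrases with more words first (then longer phrases first), so
    // "blue mykeyword" is translated before its sub-phrase "mykeyword" can break it apart.
    public static String translate(String query, List<String> originals, List<String> translations) {
        // Sort positions instead of the lists, so each original keeps its translation.
        List<Integer> order = new ArrayList<>();
        for (int k = 0; k < originals.size(); k++) {
            order.add(k);
        }
        order.sort(Comparator
                .comparingInt((Integer i) -> originals.get(i).split(" ").length)
                .reversed()
                .thenComparing(Comparator.comparingInt((Integer i) -> originals.get(i).length()).reversed()));
        String result = query;
        for (int i : order) {
            result = result.replace(originals.get(i), translations.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> originals = Arrays.asList("mykeyword", "blue mykeyword");
        List<String> translations = Arrays.asList("elpalavra", "elpalavra azul");
        // prints: (elpalavra OR "elpalavra azul")
        System.out.println(translate("(mykeyword OR \"blue mykeyword\")", originals, translations));
    }
}
```

Because String.replace matches plain substrings, "mykeyword" would also be replaced inside a longer word such as "mykeywords"; word-boundary handling is outside this sketch.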

Reverse lookup of ArrayList of Array values

In my quest to continue my Java education, I'm trying to figure out whether there is a built-in Java method that quickly and efficiently looks up a string value in an ArrayList of arrays.
Here is my code that shows what I'm trying to do:
import java.util.ArrayList;
import java.util.List;
public void exampleArrayListofArray() {
    List<String[]> al = new ArrayList<>();
    al.add(new String[] {"AB", "YZ"});
    al.add(new String[] {"CD", "WX"});
    al.add(new String[] {"EF", "UV"});
    al.add(new String[] {"GH", "ST"});
    al.add(new String[] {"IJ", "QR"});
    al.add(new String[] {"KL", "OP"});
    displayArrayListofArray(al);
}
public void displayArrayListofArray(List<String[]> al) {
    // an index-based loop avoids al.indexOf(row), which rescans the list on every call
    for (int rowIndex = 0; rowIndex < al.size(); rowIndex++) {
        String[] row = al.get(rowIndex);
        for (int column = 0; column < row.length; column++) {
            System.out.println("Value at Index Row " + rowIndex + " Column " + column + " is " + row[column]);
        }
    }
    lookUpMethod(al, "YZ");
    lookUpMethod(al, "ST");
    lookUpMethod(al, "IJ");
    lookUpMethod(al, "AA");
}
public void lookUpMethod(List<String[]> al, String lookUpString) {
    boolean isStringFound = false;
    for (int rowIndex = 0; rowIndex < al.size(); rowIndex++) {
        String[] row = al.get(rowIndex);
        for (int column = 0; column < row.length; column++) {
            // equals() compares string contents; == only compares object references
            if (row[column].equals(lookUpString)) {
                System.out.println("Index of '" + lookUpString + "': row " + rowIndex + ", column " + column);
                isStringFound = true;
            }
        }
    }
    if (!isStringFound) {
        System.out.println("Search string '" + lookUpString + "' does not exist.");
    }
}
Is this the most efficient way of searching my ArrayList for a given string?
Is there anything that I should be doing to make my code more efficient (besides not using an ArrayList)?
I know there may be more efficient ways to do this than an ArrayList, such as a HashMap, but with my currently very limited Java knowledge I'm making progress with ArrayList and would have to start from scratch with a HashMap. The end goal of my code is to do the following:
Read an asset text file to load the ArrayList
Search the ArrayList for a user entered value
Do some calcs with the neighbouring values in the searched row
Allow the user to update the neighbouring values at the searched row
Allow the user to add a new row if the searched string is not found
Save any changes back to the asset text file in alphabetical order
Airfix
My answer is: don't worry.
I think you are looking at this from the wrong angle: if you find that the users of your application have a "performance" issue; and if you then do profiling, and then profiling shows that your current "search" code is the "culprit" (the single hot-spot that kills "end user perceived performance"); then you will have to bite the bullet and learn about using different data structures than ArrayLists.
(side note there: in reality, Set/HashSet isn't much "different"; learning how to use them ... isn't as big of a deal as it might sound).
But: if you answered any of the above "questions" with "no" (like: you do not have users that complain about bad performance) ... then there is no point in worrying about performance.
Long story short: either performance is really an issue - then you have to solve it. Otherwise: don't try to fix something that is not broken.
(as said: from a learning perspective, I would still encourage you to save your code; and start a new version that uses sets. There are plenty of tutorials out there that explain all the things you need to know).
But just to give you some direction: your main "performance" killer is (as you suspected yourself) the inappropriate choice of data structure. There is no advantage in using an ArrayList to store arrays of strings that you want to search. That adds two layers, each of which your code has to iterate sequentially. If you used a single Set (like HashSet) instead and added all your search strings to it, your whole lookup would boil down to asking that set: "do you contain this value?"
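Because the end goal also needs the neighbouring values in the matching row, a HashMap (which the question already mentions) may fit better than a plain Set: it answers "is this value present?" and hands back the row in the same lookup. A minimal sketch, assuming every cell value is unique; the class name RowIndex is hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowIndex {
    // Maps every cell value to the row (String[]) that contains it, built once up front.
    // Assumes values are unique; a duplicate value would overwrite the earlier row.
    public static Map<String, String[]> index(List<String[]> rows) {
        Map<String, String[]> byValue = new HashMap<>();
        for (String[] row : rows) {
            for (String value : row) {
                byValue.put(value, row);
            }
        }
        return byValue;
    }

    public static void main(String[] args) {
        List<String[]> al = new ArrayList<>();
        al.add(new String[] {"AB", "YZ"});
        al.add(new String[] {"CD", "WX"});
        al.add(new String[] {"GH", "ST"});
        Map<String, String[]> byValue = index(al);
        // One hash lookup replaces the nested loops, and the whole row comes back with it.
        String[] row = byValue.get("ST");
        System.out.println(row == null ? "not found" : String.join(",", row)); // GH,ST
        System.out.println(byValue.containsKey("AA")); // false
    }
}
```

The returned array is the same object stored in the list, so updating a neighbouring value through it also updates the list you later save back to the file.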

Splitting a String according to multiple Strings in Java

I am just beginning to learn Java, so please bear with me.
I have this string:
String test="John Software_Engineer Kartika QA Xing Project_Manager Mark CEO Celina Assistant_Developer";
I want to split it based on the positions in Company={"Software_Engineer", "QA","Project_Manager","CEO ","Assistant_Developer"};
EDIT:
If the above is difficult, is it possible to split based on {AND, OR}?
String value="NA_USA >= 15 AND NA_USA=< 30 OR NA_USA!=80"
String value1="EUROPE_SPAIN >= 5 OR EUROPE_SPAIN < = 30 "
How can I split these, put the parts in a Hashtable in Java, and finally access them at the end? That part is not necessary; my main concern is how to split.
Next EDIT:
I came up with the solution below. Is it the best idea or not?
String to="USA AND JAPAN OR SPAIN AND CHINA";
String [] ind= new String[]{"AND", "OR"};
for (int hj = 0; hj < ind.length; hj++){
to=to.replaceAll(ind[hj].toString(), "*");
}
System.out.println(" (=to=) "+to);
String[] partsparts = to.split("\\*");
for (int hj1 = 0; hj1 < partsparts.length; hj1++){
System.out.println(" (=partsparts=) "+partsparts[hj1].toString());
}
and
List<String> test1=split(to, '*', 1);
System.out.println("-str333->"+test1);
New EDIT:
If I have this type of String, how can I split it?
final String PLAYER = "IF John END IF Football(soccer) END IF Abdul-Jabbar tennis player END IF Karim -1996 * 1974 END IF";
How can I get a result like this: String [] data=[John , Football(soccer) ,Abdul-Jabbar tennis player, Karim -1996 * 1974 ]
Any ideas?
This will split your string and store the parts in a String array (maximum size 50).
private static String[] split = new String[50];
public static void main(String[] args) {
    String test = "John -Software_Engineer Kartika -QA Xing -Project_Manager Mark -CEO Celina -Assistant_Developer";
    int i = 0; // declared outside the loop, so each part goes into the next slot
    for (String retval : test.split("-")) {
        split[i] = retval;
        System.out.println(split[i]);
        i++;
    }
}
You can build the string as Name:post pairs separated by spaces; then it is easy to get the desired values.
String test="John:Software_Engineer Kartika:QA Xing:Project_Manager";
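A minimal sketch of reading that Name:post format back into a map, which also covers the Hashtable part of the question. It uses a LinkedHashMap (an assumption, chosen to keep the original order) and assumes names and posts contain no spaces; NamePostParser is a hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NamePostParser {
    // Parses "Name:Post Name:Post ..." into an insertion-ordered map of name -> post.
    public static Map<String, String> parse(String text) {
        Map<String, String> posts = new LinkedHashMap<>();
        for (String pair : text.trim().split("\\s+")) {
            String[] parts = pair.split(":", 2); // limit 2: only the first colon separates name and post
            if (parts.length == 2) {
                posts.put(parts[0], parts[1]);
            }
        }
        return posts;
    }

    public static void main(String[] args) {
        Map<String, String> posts = parse("John:Software_Engineer Kartika:QA Xing:Project_Manager");
        System.out.println(posts); // {John=Software_Engineer, Kartika=QA, Xing=Project_Manager}
        System.out.println(posts.get("Kartika")); // QA
    }
}
```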
I am unable to comment because my reputation is too low, so I am writing here.
Your first question can be generalized as positional word splitting. If it is guaranteed that you need every word at an even position (the 2nd, 4th, and so on), you could first split the string on spaces and then take every even-positioned word.
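A minimal sketch of that positional split, assuming the string strictly alternates name and position with whitespace between them; EvenWords is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

public class EvenWords {
    // Returns the 2nd, 4th, 6th ... words (1-based), i.e. the titles in "Name Title Name Title ...".
    public static List<String> evenPositionWords(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 1; i < words.length; i += 2) { // index 1 is the 2nd word
            result.add(words[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String test = "John Software_Engineer Kartika QA Xing Project_Manager Mark CEO Celina Assistant_Developer";
        // prints: [Software_Engineer, QA, Project_Manager, CEO, Assistant_Developer]
        System.out.println(evenPositionWords(test));
    }
}
```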
On your second question (splitting on AND and OR), you could replace every " AND " and " OR " with a single space " " and then split the output string on that single space.
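Splitting on a single space afterwards also breaks operands such as "NA_USA >= 15" into separate tokens. A variant that keeps each condition intact splits directly on the operators with a regular expression; this is a swapped-in technique rather than the answer's exact method, and OperatorSplit is a hypothetical name. It assumes AND/OR always appear in upper case with whitespace on both sides:

```java
import java.util.Arrays;

public class OperatorSplit {
    // Splits on AND / OR surrounded by whitespace; spaces inside each condition are kept.
    public static String[] splitOnOperators(String text) {
        return text.trim().split("\\s+(?:AND|OR)\\s+");
    }

    public static void main(String[] args) {
        // prints: [NA_USA >= 15, NA_USA=< 30, NA_USA!=80]
        System.out.println(Arrays.toString(splitOnOperators("NA_USA >= 15 AND NA_USA=< 30 OR NA_USA!=80")));
        // prints: [USA, JAPAN, SPAIN, CHINA]
        System.out.println(Arrays.toString(splitOnOperators("USA AND JAPAN OR SPAIN AND CHINA")));
    }
}
```

Requiring whitespace around the operator also avoids the problem of a plain replaceAll("OR", ...) hitting "OR" inside a longer word.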
On your third question, replace "IF " and " END" with a single space " ". I am not sure whether the last IF actually occurs in your string; if it does, you could replace it with an empty string "" too, and then split the string on a single space " ".
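Splitting on single spaces would also break "Abdul-Jabbar tennis player" into separate words, while the desired output keeps it whole. A sketch that instead treats each "END IF" as the delimiter, assuming the string always starts with "IF" and every value is followed by "END IF" (IfEndSplit is a hypothetical name):

```java
import java.util.Arrays;

public class IfEndSplit {
    // Drops the leading "IF", then splits on each "END IF" delimiter.
    // String.split discards the trailing empty string left by the final "END IF".
    public static String[] splitBlocks(String text) {
        String body = text.trim().replaceFirst("^IF\\s+", "");
        return body.split("\\s+END\\s+IF\\s*");
    }

    public static void main(String[] args) {
        String player = "IF John END IF Football(soccer) END IF Abdul-Jabbar tennis player END IF Karim -1996 * 1974 END IF";
        // prints: [John, Football(soccer), Abdul-Jabbar tennis player, Karim -1996 * 1974]
        System.out.println(Arrays.toString(splitBlocks(player)));
    }
}
```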
First classify your input strings by pattern, and please devise an algorithm before you start writing the Java.
I would suggest using StringBuilder or StringBuffer instead of building the result by repeated String concatenation, because each concatenation creates a new String, which costs more than appending to a builder.
Try this:
String[] a = test.replaceAll("\\w+ (\\w+)", "$1").split(" ");
Here we first replace each word pair with its second word, then split by space.
You can use a Set that holds all the positions, like this:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
Set<String> positions = new HashSet<String>();
positions.add("Software_Engineer");
positions.add("QA");
String test = "John Software_Engineer Kartika QA Xing Project_Manager Mark CEO Celina Assistant_Developer";
List<String> positionsInString = new ArrayList<String>();
for (String position : positions) {
    // no break: collect every position that occurs, not only the first one found
    if (test.contains(position)) {
        positionsInString.add(position);
    }
}

Issue in combining split Strings

I have extracted text from the "Web 2.0" Wikipedia article and split it into sentences. After that, I want to create Strings that each contain 5 sentences.
When extracted, the text looks like below, in the EditText.
Below is my code:
finalText = textField.getText().toString();
String[] textArrayWithFullStop = finalText.split("\\. ");
String colelctionOfFiveSentences = "";
List<String>textCollection = new ArrayList<String>();
for(int i=0;i<textArrayWithFullStop.length;i++)
{
colelctionOfFiveSentences = colelctionOfFiveSentences + textArrayWithFullStop[i];
if( (i%5==0) )
{
textCollection.add(colelctionOfFiveSentences);
colelctionOfFiveSentences = "";
}
}
But when I use a Toast to display the text, here is what it gives:
Toast.makeText(Talk.this, textCollection.get(0), Toast.LENGTH_LONG).show();
As you can see, this is only one sentence! But I expected it to have 5 sentences!
The other thing is that the second string starts somewhere unexpected. Here is how I displayed it in a Toast:
Toast.makeText(Talk.this, textCollection.get(1), Toast.LENGTH_LONG).show();
This makes no sense to me! How can I properly split the text into sentences and create Strings containing 5 sentences each?
The problem is that for the first sentence, 0 % 5 = 0, so it is being added to the array list immediately. You should use another counter instead of mod.
finalText = textField.getText().toString();
String[] textArrayWithFullStop = finalText.split("\\. ");
String colelctionOfFiveSentences = "";
int sentenceAdded = 0;
List<String>textCollection = new ArrayList<String>();
for(int i=0;i<textArrayWithFullStop.length;i++)
{
colelctionOfFiveSentences += textArrayWithFullStop[i] + ". ";
sentenceAdded++;
if(sentenceAdded == 5)
{
textCollection.add(colelctionOfFiveSentences);
colelctionOfFiveSentences = "";
sentenceAdded = 0;
}
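}
// keep the last group even if it has fewer than five sentences
if (sentenceAdded > 0) {
textCollection.add(colelctionOfFiveSentences);
}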
Add ". " back to textArrayWithFullStop[i], because split() removes it:
colelctionOfFiveSentences = colelctionOfFiveSentences + textArrayWithFullStop[i]+". ";
I believe that if you modify the mod line to this:
if(i%5==4)
you will have what you need.
You probably realize this, but someone might use ". " in places where it doesn't actually end a sentence, for instance:
I spoke to John and he said... "I went to the store.
Then I went to the Tennis courts.",
and I don't believe he was telling the truth because
1. Why would someone go to play tennis after going to the store and
2. John has no legs!
I had to ask, am I going to let him get away with these lies?
That's two sentences that don't end with a period and would mislead your code into thinking it's 5 sentences broken up at entirely the wrong places, so this approach is really fraught with problems. However, as an exercise in splitting strings, I guess it's as good as any other.
As a solution to the side problem (splitting sentences), I would suggest starting with this regexp, which also swallows citation markers such as "[12]" after the period:
string.split("\\.(\\[[0-9\\[\\]]+\\])? ")
And for the main problem, maybe you could use Arrays.copyOfRange().
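A minimal sketch combining that regexp with Arrays.copyOfRange, assuming the text ends with a period (a space is appended so the final period is consumed like the others) and that sentence boundaries are only ". " or ".[n] ". FiveSentenceChunks is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FiveSentenceChunks {
    // Splits on "." plus an optional citation marker such as "[12]" and a space,
    // then copies consecutive slices of `size` sentences with Arrays.copyOfRange.
    public static List<String> chunks(String text, int size) {
        // Appending a space lets the final "." be consumed as a separator too.
        String[] sentences = (text.trim() + " ").split("\\.(\\[[0-9\\[\\]]+\\])? ");
        List<String> result = new ArrayList<>();
        for (int from = 0; from < sentences.length; from += size) {
            int to = Math.min(from + size, sentences.length); // the last chunk may be shorter
            String[] slice = Arrays.copyOfRange(sentences, from, to);
            result.add(String.join(". ", slice) + ".");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> chunks = chunks("One. Two.[1] Three. Four. Five. Six. Seven.", 5);
        System.out.println(chunks.get(0)); // One. Two. Three. Four. Five.
        System.out.println(chunks.get(1)); // Six. Seven.
    }
}
```

Iterating with a step of `size` instead of testing `i % 5` avoids the off-by-one in the question, and the leftover sentences still form a final, shorter chunk.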

Java - How to split a String on the + sign so it can be read as a Variable/Int?

This question is rather difficult to convey, so to keep it simple:
I am loading some Strings via XML (XStream).
for example, Your total count is +variable+ .
The outcome would be
"Your total count is +variable+ ."
when it ideally should be
"Your total count is" + variable + "." aka "Your total count is 1."
The issue: (if you can't see it) it reads the variable as if it were a String.
I know I would need to split that String from where the plus sign starts and ends and then connect it to the String, for it to read as a variable, like the above. But how? I need this to be done so that the String before the variable and after it is split.
so:
"Your total count is 50, would you like a cookie?"
aka
"Your total count is " + variable + " , would you like a cookie?"
Thank you a lot!
Okay, I agree it's very confusing. I've edited this post (read below).
I am loading some Strings via XML; the same would apply if I were loading them from a .txt or config file.
On the XML file, I lay it out like so:
<list>
<dialogue>
<line>
<string> Your total count is + Somewhere.totalCount +, Would you like a cookie?</string>
</line>
</dialogue>
</list>
As you can see, the XML file can't locate where the variable is (in a class), nor can it recognise whether it is a variable or a string.
I know that I would need to alter the way it reads it, so if there is a plus sign (+) anywhere on the String, it would simply "split" it away from the original String so I can reconnect it.
E.g. "Your phone number is + PhoneBook.phoneNumber + should I call you?" as it would be read from an XML file.
I want to "split" the String from front to back like so:
"Your phone number is " + PhoneBook.phoneNumber + " should I call you?"
At the same time, I'm not assigning a variable, because it's already declared in the XML file; I want it to be recognised as an int.
First, Java cannot know that the +variable+ part of your string should be replaced with the value of the corresponding variable, and it does not provide an "eval"-like function as PHP and other scripting languages do, which might help you with that.
If you want to exactly replace this specific '+variable+' part of the string, it can be done like this:
int variable = 1;
String text = "Your total count is +variable+.";
String textWithVariableValue = text.replaceAll("\\+variable\\+", Integer.toString(variable));
But if you want to replace variables with arbitrary names, you will have to put them into a Map first, then find all occurrences of +somename+ in the string and replace each with the corresponding value stored in the map. Something like this:
import java.util.HashMap;
import java.util.Map;
Map<String, Object> variables = new HashMap<String, Object>();
variables.put("var1", 1);
variables.put("foo", 5);
String text = "var1 = +var1+, foo = +foo+";
String textWithVariableValues = text;
for (Map.Entry<String, Object> entry : variables.entrySet()) {
    // String.replace is literal, so names and values need no regex escaping
    textWithVariableValues = textWithVariableValues.replace("+" + entry.getKey() + "+", entry.getValue().toString());
}
Sounds like what you need is the String.format() method:
int total = calculateTotal();
String s = String.format("Your total is %d.", total);
Not split, but find and replace.
Simplistically,
int variable = 1;
String src = "Your total count is +variable+.";
String result = src.replaceAll("\\+variable\\+", Integer.toString(variable));
System.out.println(result);
Should print "Your total count is 1."
EDIT (after your comment): if you need to replace multiple variables in one go, the following works for me:
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Replace the following with the actual map of variables and values
Map<String, String> vars = Collections.singletonMap(
        "variable", Integer.toString(123));
String src = "Your total count is +variable+.";
Pattern p = Pattern.compile("\\+(\\w+)\\+");
StringBuffer sb = new StringBuffer();
Matcher m = p.matcher(src);
while (m.find()) {
    String varName = m.group(1);
    if (vars.containsKey(varName)) {
        // quoteReplacement keeps '$' or '\' in a value from being treated specially
        m.appendReplacement(sb, Matcher.quoteReplacement(vars.get(varName)));
    }
}
m.appendTail(sb);
System.out.println(sb.toString());
Prints "Your total count is 123."
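For the XML format in the question, where the placeholder has spaces and a dotted name ("+ Somewhere.totalCount +"), the same appendReplacement loop can be used with a slightly wider pattern. Java will not resolve Somewhere.totalCount from the text by itself, so your own code has to fill the map, e.g. with values.put("Somewhere.totalCount", Somewhere.totalCount). A sketch under those assumptions; TemplateFiller and its pattern are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateFiller {
    // Matches "+ Name +" or "+Name+", where Name may contain dots (e.g. "Somewhere.totalCount").
    private static final Pattern PLACEHOLDER = Pattern.compile("\\+\\s*([\\w.]+)\\s*\\+");

    public static String fill(String template, Map<String, ?> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            Object value = values.get(m.group(1));
            // Unknown names are written back unchanged; quoteReplacement protects '$' and '\'.
            String replacement = value == null ? m.group() : value.toString();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("Somewhere.totalCount", 50);
        String line = "Your total count is + Somewhere.totalCount +, Would you like a cookie?";
        // prints: Your total count is 50, Would you like a cookie?
        System.out.println(fill(line, values));
    }
}
```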
