I am using the POI API for my development. Let me explain my process.
1. Compare two sentences.
Example:
A1 : Arun is well.
A2 : Is aruni well.
Here I need to find the newly added word in A2 and the newly added letter in arun*i*, and highlight them with some colours.
How is this possible using Java?
Thanks!
This is not specifically a Java question; it is more of an algorithm question. Once you understand the algorithm, it will be trivial to implement a solution in Java for your case.
See this SO question: How to Check for Deleted Words Between 2 Sentences in Java
and read about the longest common subsequence (LCS).
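The following is a rough illustration of the LCS idea. It is a sketch, not the code from the linked question. Words of A2 that are not part of a longest common subsequence with A1 count as newly added. The same routine, applied to the letters of "arun" and "aruni", finds the added letter. The class and method names (LcsDiff, addedWords, addedLetters) are hypothetical. Lowercasing and stripping punctuation are assumptions, and the POI highlighting step is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LcsDiff {

    /** Returns the indices of tokens in b that are not part of an LCS of a and b. */
    static List<Integer> addedIndices(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // len[i][j] = length of the LCS of the suffixes a[i..] and b[j..]
        int[][] len = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                len[i][j] = a.get(i).equals(b.get(j))
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);
            }
        }
        // Walk the table from the front; every token of b we skip over is "added".
        List<Integer> added = new ArrayList<>();
        int i = 0, j = 0;
        while (j < m) {
            if (i < n && a.get(i).equals(b.get(j))) {
                i++;
                j++;                 // common token
            } else if (i < n && len[i + 1][j] >= len[i][j + 1]) {
                i++;                 // token a[i] was removed
            } else {
                added.add(j);        // token b[j] was added
                j++;
            }
        }
        return added;
    }

    /** Words of sentence b that are new compared to sentence a (case and punctuation ignored). */
    static List<String> addedWords(String a, String b) {
        List<String> wa = words(a), wb = words(b);
        List<String> result = new ArrayList<>();
        for (int idx : addedIndices(wa, wb)) {
            result.add(wb.get(idx));
        }
        return result;
    }

    /** Positions of letters in word b that are new compared to word a. */
    static List<Integer> addedLetters(String a, String b) {
        return addedIndices(chars(a), chars(b));
    }

    private static List<String> words(String s) {
        String cleaned = s.toLowerCase().replaceAll("[^\\p{L}\\p{N}\\s]", "").trim();
        if (cleaned.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(cleaned.split("\\s+"));
    }

    private static List<String> chars(String s) {
        List<String> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            out.add(String.valueOf(c));
        }
        return out;
    }
}
```

For the example sentences, addedWords("Arun is well.", "Is aruni well.") returns [aruni]. addedLetters("arun", "aruni") returns [4], the index of the new "i", which you could then hand to whatever POI styling you use.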
I've been searching around and haven't quite found my answer.
My group and I have created a few classes resembling a bank, with Customer, Account and so on.
Lately I've been trying to improve and secure our code by making our variable called "name" accept only certain inputs.
Specifically, I want the person to be able to enter a name only if it meets these rules:
At least 2 words (for the word part, I've seen code that counts the whitespace between words, but I don't know yet what to do about the last word, since there won't be a whitespace after it)
At most 4 words (same issue here)
No special characters such as ,!%¤"#()=%/'¨. (for this, I've read something about "Matcher and Pattern")
I'm quite new to Java, and I'm not asking for someone to write the code for me; I'm asking for someone to point me in the right direction. A lot of what I've seen, like Matcher and Pattern, seems to be things you import by downloading utils and such, but I reckon that's not needed and there should be a simpler, more basic way. I'm not trying to get ahead of myself by copying code just to get it done.
The String "name" is used a lot in our main class "Banklogic", where almost every method that adds something uses "name", so it's quite important that I get this done.
I hope I was clear enough, and any help would be appreciated! I'm going to set my alarm for 3 hours before school to see what you have come up with, so I can try to complete the code before our meeting. Thanks a lot in advance :)
Since you asked for hints: you can use regular expressions (regex) to add such rules.
For numbers only:
if (string.matches("[0-9\\s]+")) {
    // allow insertion of data, else reject it
}
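As for the word-count rules:
string.trim().split("\\s+") creates an array of the words by splitting on runs of whitespace. Because the split happens at the spaces between words, the last word is counted too. You can count the number of elements in this array and allow or disallow the input based on that.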
As for allowing only letters (no special characters):
if (string.matches("[a-zA-Z\\s]+")) {
    // allow input, else reject it
}
You can use a DocumentFilter to enforce these checks in a Swing text component. A DocumentFilter lets text be entered only if you allow it.
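Below is a minimal sketch of such a filter. It assumes a Swing text component and the letters-and-spaces rule from this answer; the class name LettersOnlyFilter is hypothetical. It overrides the two DocumentFilter methods that add text and silently drops any edit that contains another character.

```java
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DocumentFilter;

/** Hypothetical filter: rejects any edit that would add a character other than a letter or a space. */
class LettersOnlyFilter extends DocumentFilter {

    private static boolean allowed(String text) {
        // null text in replace() means "just delete", which is always fine
        return text == null || text.matches("[a-zA-Z ]*");
    }

    @Override
    public void insertString(FilterBypass fb, int offset, String text, AttributeSet attr)
            throws BadLocationException {
        if (allowed(text)) {
            super.insertString(fb, offset, text, attr);
        }
        // otherwise the edit is dropped and the document stays unchanged
    }

    @Override
    public void replace(FilterBypass fb, int offset, int length, String text, AttributeSet attrs)
            throws BadLocationException {
        if (allowed(text)) {
            super.replace(fb, offset, length, text, attrs);
        }
    }
}
```

You would attach it with ((AbstractDocument) textField.getDocument()).setDocumentFilter(new LettersOnlyFilter()). The word-count rules are easier to check once, when the name is submitted, because a name being typed is incomplete by definition.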
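I hope this helps as a hint.
Also, note that \\s matches whitespace, while \\W matches any non-word character, which includes punctuation such as ! or %, not just spaces. If you don't want to allow whitespace, remove the whitespace part from the character class.
This is a simple and effective way of doing the task.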
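Putting the hints together, here is a minimal sketch of a check for the three rules in the question (2 to 4 words, letters only). The class NameValidator is hypothetical. It assumes that letters from any alphabet are acceptable (\\p{L} also accepts å, ä and ö); swap in [a-zA-Z] if you only want ASCII. String.matches, Pattern and Matcher are all part of the standard library (java.util.regex), so nothing extra has to be downloaded.

```java
/** Hypothetical helper: a valid name is 2 to 4 words made of letters only. */
class NameValidator {

    static boolean isValidName(String name) {
        if (name == null) {
            return false;
        }
        String trimmed = name.trim();
        // Words of letters separated by whitespace; digits and signs like ,!%#()=/' are rejected
        if (!trimmed.matches("\\p{L}+(\\s+\\p{L}+)*")) {
            return false;
        }
        // Splitting on runs of whitespace also counts the last word, which has no space after it
        int words = trimmed.split("\\s+").length;
        return words >= 2 && words <= 4;
    }
}
```

With this sketch, NameValidator.isValidName("Anna Maria Svensson") returns true. "Anna" (one word) and "Anna K!" (special character) both return false.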
EDIT:
This is a class I wrote a while ago for tasks like this, in case you are interested...
I'm making a chat responder for a game, and I want to know if there is a way to compare two strings and see if they are approximately equal. For example,
if someone typed:
"Strength level?"
it would call a function.
Then if someone else typed:
"Str level?"
it would call the same function. I also want it to automatically detect what someone is trying to type when they make a typo or something like that, for example:
"Strength tlevel?"
would also make the function get called.
Is what I'm asking here something simple, or will it require me to write a big, irritating function to check the strings?
If you've been baffled by my explanation (not really one of my strong points), then this is basically what I'm asking:
How can I check if two strings are similar to each other?
See this question and answer: Getting the closest string match
Using some heuristics and the Levenshtein distance algorithm, you can compute the similarity of two strings and take a guess at whether they're equal.
Your only option other than that would be a dictionary of accepted words similar to the one you're looking for.
You can use Levenshtein distance.
I believe you should use one of the edit distance algorithms to solve your problem. Here, for example, is a Levenshtein distance algorithm implementation in Java. You could use it to compare the words in the sentences and, if the sum of their edit distances is less than, say, 10% of the sentence length, consider them equal.
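Here is a minimal sketch of that suggestion (not the linked implementation). It uses a standard two-row dynamic-programming Levenshtein distance, plus a hypothetical similar() helper that compares the sentences word by word and applies the 10% threshold. Treating sentences with a different number of words as different is an assumption.

```java
class EditDistance {

    /** Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just computed becomes "previous"
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    /**
     * Hypothetical helper: sentences with the same number of words are "similar" when the
     * summed word-by-word edit distance is below maxRatio times the longer sentence's length.
     */
    static boolean similar(String a, String b, double maxRatio) {
        String[] wa = a.toLowerCase().trim().split("\\s+");
        String[] wb = b.toLowerCase().trim().split("\\s+");
        if (wa.length != wb.length) {
            return false; // assumption: a different word count means a different command
        }
        int total = 0;
        for (int i = 0; i < wa.length; i++) {
            total += levenshtein(wa[i], wb[i]);
        }
        return total < maxRatio * Math.max(a.length(), b.length());
    }
}
```

similar("Strength level?", "Strength tlevel?", 0.10) is true: one edit against a 16-character sentence. "Str level?" fails this threshold, which is where a dictionary of accepted spellings helps.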
Perhaps what you need is a large dictionary of similar words and common spelling mistakes, which you would use to "translate" each word to one single entry or key.
This would be useful for custom words, so you could add "str" under the same key as "strength".
However, you could also write a few automated methods. When a word isn't found in the dictionary, loop over every variant with a 1-letter difference (a missing or replaced letter), and recurse into deeper levels if needed, i.e. 2 missing letters, etc.
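Here is a sketch of both ideas together, using a hypothetical CommandDictionary class. An alias map handles custom words (for example, "str" maps to "strength"). As a fallback, it tries every variant that is one deletion, insertion or substitution away, limited to the letters a to z (an assumption). Calling oneEditVariants again on its own results would give the 2-letter level.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class CommandDictionary {

    private final Map<String, String> aliases = new HashMap<>();

    /** Registers a spelling (e.g. "str") as an alias of a canonical key (e.g. "strength"). */
    void add(String alias, String key) {
        aliases.put(alias.toLowerCase(), key);
    }

    /** Exact lookup first; otherwise the first 1-edit variant found in the dictionary, or null. */
    String lookup(String word) {
        String w = word.toLowerCase();
        if (aliases.containsKey(w)) {
            return aliases.get(w);
        }
        for (String candidate : oneEditVariants(w)) {
            if (aliases.containsKey(candidate)) {
                return aliases.get(candidate);
            }
        }
        return null; // unknown word
    }

    /** All strings one deletion, insertion or substitution (a-z) away from w. */
    static Set<String> oneEditVariants(String w) {
        Set<String> out = new LinkedHashSet<>(); // insertion order keeps lookups deterministic
        for (int i = 0; i <= w.length(); i++) {
            String head = w.substring(0, i);
            String tail = w.substring(i);
            if (!tail.isEmpty()) {
                out.add(head + tail.substring(1));            // delete w[i] (extra letter typed)
            }
            for (char c = 'a'; c <= 'z'; c++) {
                out.add(head + c + tail);                     // insert c (missing letter)
                if (!tail.isEmpty()) {
                    out.add(head + c + tail.substring(1));    // replace w[i] with c
                }
            }
        }
        return out;
    }
}
```

Suppose "str" and "strength" both map to "strength", and "level" maps to "level". Then lookup("Str") and lookup("tlevel") both resolve. Once each word is looked up, "Strength tlevel?" and "Str level?" can trigger the same function.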
I found a few projects that do text-to-phoneme translation; I don't know which one is best:
http://mary.dfki.de/
http://www2.eng.cam.ac.uk/~tpl/asp/source/Phoneme.java
http://java.dzone.com/announcements/announcing-phonemic-10
If you want to find similar word beginnings, you can use a stemmer. Stemmers reduce words to a common stem. The best-known algorithm is the Porter Stemmer (http://tartarus.org/~martin/PorterStemmer).
Levenshtein, as pointed out above, is great, but computationally heavy for distances greater than one or two.
Now this is a tricky problem for which I'm not able to figure out a good solution. Suppose we have a String in Java: "He ate 3 apples today." The digit 3 can easily be identified in Java using an isNumeric function or regular expressions. But what if I have a String like "He ate three apples today."? How can I identify that three is actually a number? I used OpenNLP and its POS tagger, but it takes far too much time. Can anyone suggest a better solution? Also, among the ".bin" files of OpenNLP there is one file, "num.bin", but I don't know how to use it, and the OpenNLP documentation says nothing about it. Can anyone tell me if this is exactly what I've been looking for, and if so, how to use it?
I'm actually short of time, so I've settled on a temporary solution: make a file/dictionary and load all the entries into a hashtable. Then I'll tokenize my sentence and check it word by word for numbers, similar to what you suggested. I'll keep updating the file as and when required. Thanks for your valuable suggestions, and if you have something better than this I'd be really glad to hear it. OpenNLP implements this very well; the only problem is the time it takes, and I want to do this in the minimum time possible.
Create a dictionary of numbers. Search for elements from that dictionary in the text.
Check the asymptotic complexity; it may be cheaper to sort the text first.
You have to keep all those words in arrays and then use them. Here is an example of how to convert a number to a string; it may help you. I think you have to split your text into words and check whether each word is a number (three). If it is, check the next word, because it could be, say, "million", then check the next word, and so on. It's not easy, and it amounts to a small library; I think you'll spend a lot of time writing it. Alternatively, search Google for a library like this. Maybe someone has already had this problem, written a library and shared it for free. Good luck.
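Here is a minimal sketch of that word-by-word approach. It assumes English cardinal words only; the class NumberWords and its word lists are illustrative, not a library. Small numbers add to the current group, "hundred" multiplies it, and a following "thousand", "million" or "billion" closes the group. It does not handle "and", ordinals such as "third", or two separate numbers written next to each other ("three four" becomes 7).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NumberWords {

    private static final Map<String, Long> SMALL = new HashMap<>();
    private static final Map<String, Long> SCALES = new HashMap<>();

    static {
        String[] units = {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
                "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
                "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < units.length; i++) {
            SMALL.put(units[i], (long) i);
        }
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) {
            SMALL.put(tens[i], 20L + 10L * i);
        }
        SCALES.put("thousand", 1_000L);
        SCALES.put("million", 1_000_000L);
        SCALES.put("billion", 1_000_000_000L);
    }

    /** Returns the value of every run of number words in the text, in order of appearance. */
    static List<Long> findNumbers(String text) {
        List<Long> found = new ArrayList<>();
        long total = 0, current = 0;
        boolean inNumber = false;
        // Splitting on non-letters also breaks "twenty-one" into "twenty" and "one"
        for (String word : text.toLowerCase().split("[^a-z]+")) {
            Long small = SMALL.get(word);
            Long scale = SCALES.get(word);
            if (small != null) {
                current += small;
                inNumber = true;
            } else if (inNumber && word.equals("hundred")) {
                current *= 100;
            } else if (inNumber && scale != null) {
                total += current * scale; // close this group, e.g. "two million"
                current = 0;
            } else if (inNumber) {
                found.add(total + current); // the next word is not part of the number
                total = 0;
                current = 0;
                inNumber = false;
            }
        }
        if (inNumber) {
            found.add(total + current);
        }
        return found;
    }
}
```

findNumbers("He ate three apples today.") returns [3], and "two million three hundred thousand" is read as 2300000.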
The following list contains one correct word, "disastrous", and other incorrect words that sound like the correct word:
A. disastrus
B. disasstrous
C. desastrous
D. desastrus
E. disastrous
F. disasstrous
Is it possible to automate the generation of wrong choices, given a correct word, through some kind of Java dictionary API?
No, there is nothing like that in the Java API, but you can write a simple algorithm that will do the job.
Just make up some rules about letter permutations and doubling, and add the generated words to a Set until you have enough words.
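Here is a minimal sketch of that rule-based idea. The class Misspellings and its four rules are hypothetical choices, not a standard. The rules are: drop a vowel, swap a vowel, un-double a doubled letter, or double a single consonant. Results are collected in a LinkedHashSet so duplicates disappear and the order is stable.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class Misspellings {

    private static final String VOWELS = "aeiou";

    /** Builds up to `count` wrong spellings of a lowercase word using a few simple rules. */
    static List<String> generate(String word, int count) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            String head = word.substring(0, i);
            String tail = word.substring(i + 1);
            boolean doubled = (i + 1 < word.length() && word.charAt(i + 1) == c)
                    || (i > 0 && word.charAt(i - 1) == c);
            if (VOWELS.indexOf(c) >= 0) {
                out.add(head + tail);                  // rule 1: drop a vowel
                for (char v : VOWELS.toCharArray()) {
                    if (v != c) {
                        out.add(head + v + tail);      // rule 2: swap a vowel
                    }
                }
            } else if (doubled) {
                out.add(head + tail);                  // rule 3: un-double a doubled letter
            } else {
                out.add(head + c + c + tail);          // rule 4: double a single consonant
            }
        }
        out.remove(word); // never offer the correct spelling as a wrong choice
        List<String> result = new ArrayList<>(out);
        return result.subList(0, Math.min(count, result.size()));
    }
}
```

For "disastrous", this produces "disastrus", "disasstrous" and "desastrous" from the question's list, among others. For other inputs some rules may accidentally produce real words, so you may want to filter the results against a word list.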
There are a number of algorithms for matching words by sound. Soundex is the one that springs to mind, but I remember uncovering a few more when I did some research on this a couple of years ago. I expect the problem you would find is this: they take a word and return a value that represents how the word sounds, so you can check whether two spellings sound similar (the words in the question should generate similar values). Doing the reverse, i.e. taking the value and generating similar-sounding spellings, would be quite hard.
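For reference, the forward direction is short. Below is a sketch of American Soundex (the class name is hypothetical). It keeps the first letter, maps the remaining consonants to digits and skips vowels. It also collapses adjacent equal codes (h and w do not separate them) and pads the result to four characters.

```java
class Soundex {

    // Digit for each letter A..Z; '0' marks letters that get no code (vowels, y, h, w)
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        String s = word.toUpperCase().replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char last = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char ch = s.charAt(i);
            char code = CODES.charAt(ch - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            // Vowels separate equal codes; h and w do not
            if (ch != 'H' && ch != 'W') {
                last = code;
            }
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }
}
```

"disastrous", "desastrus" and "disasstrous" all encode to D223, which shows why the spellings in the question "sound alike".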
Is there a dictionary I can download for Java?
I want a program that takes a few random letters and checks whether they can be rearranged into a real word by checking them against the dictionary.
"Is there a dictionary I can download for Java?"
Others have already answered this... Maybe you weren't simply talking about a dictionary file but about a spellchecker?
"I want to have a program that takes a few random letters and sees if they can be rearranged into a real word by checking them against the dictionary"
That is different. How fast do you want this to be? How many words in the dictionary and how many words, up to which length, do you want to check?
In case you want a spellchecker (which is not entirely clear from your question), Jazzy is a spellchecker for Java that has links to a lot of dictionaries. It's not bad, but the various implementations are horribly inefficient (it's OK for small dictionaries, but it's an amazing waste when you have several hundred thousand words).
Now if you just want to solve the specific problem you describe, you can:
parse the dictionary file and create a map: (letters in sorted order, set of matching words)
then for any number of random letters: sort them and see if you have an entry in the map (if you do, the entry's value contains all the words that you can make with these letters).
abracadabra : (aaaaabbcdrr, (abracadabra))
carthorse : (acehorrst, (carthorse) )
orchestra : (acehorrst, (carthorse,orchestra) )
etc...
Now you take, say, nine random letters and get "hsotrerca"; you sort them to get "acehorrst", and using that as a key you get all the (valid) anagrams...
This works because what you described is a special (easy) case: all you need to do is sort your letters and then do an O(1) map lookup.
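Here is a minimal sketch of that map in Java. AnagramIndex is a hypothetical name, and lowercasing is an assumption. The key is simply the word's letters in sorted order, so every anagram shares one entry.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AnagramIndex {

    private final Map<String, Set<String>> bySortedLetters = new HashMap<>();

    /** Key for a word: its letters in sorted order, e.g. "orchestra" gives "acehorrst". */
    static String key(String letters) {
        char[] chars = letters.toLowerCase().toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    /** Call once per dictionary word while parsing the dictionary file. */
    void add(String word) {
        bySortedLetters.computeIfAbsent(key(word), k -> new TreeSet<>()).add(word.toLowerCase());
    }

    /** All dictionary words that use exactly these letters (empty if none). */
    Set<String> anagrams(String letters) {
        return bySortedLetters.getOrDefault(key(letters), Collections.emptySet());
    }
}
```

After adding "carthorse", "orchestra" and "abracadabra", anagrams("hsotrerca") returns [carthorse, orchestra].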
For more complicated spell checking, where there may be errors, you first need something to come up with "candidates" (words that may be correct but misspelled) [for example, using the Soundex, Metaphone or Double Metaphone algorithms]. Then you use something like the Levenshtein edit-distance algorithm to check the candidates against known good words (or the much more complicated tree made of Levenshtein edit distances that Google uses for its "find as you type"):
http://en.wikipedia.org/wiki/Levenshtein_distance
As a funny side note, an optimized dictionary representation can store hundreds of thousands, even millions, of words in less than 10 bits per word (yup, you've read correctly: less than 10 bits per word) and still allow very fast lookup.
Dictionaries are usually programming-language agnostic. If you google without the keyword "java", you may get better results. For example, searching for free dictionary download gives, among others, dicts.info.
OpenOffice dictionaries are easy to parse line-by-line.
You can read it into memory (keep in mind that this takes a lot of memory):
List<String> words = IOUtils.readLines(new FileInputStream("dicfile.txt"), StandardCharsets.UTF_8); // IOUtils is from commons-io
This gives you a List of all the words. Alternatively, you can use a LineIterator (IOUtils.lineIterator) if you run into memory problems.
If you are on a Unix-like OS, look in /usr/share/dict.
Here's one:
http://java.sun.com/docs/books/tutorial/collections/interfaces/examples/dictionary.txt
You can use the standard Java file handling to read the word on each line:
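Here is a minimal sketch that uses only the standard library (java.nio). It assumes a plain word list with one word per line, such as /usr/share/dict/words or the tutorial's dictionary.txt; the class WordList is hypothetical. Lines are streamed through a BufferedReader, so only the resulting set is held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

class WordList {

    /** Reads one word per line into a lowercase set; blank lines are skipped. */
    static Set<String> load(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String w = line.trim().toLowerCase();
                if (!w.isEmpty()) {
                    words.add(w);
                }
            }
        }
        return words;
    }
}
```

Call it as WordList.load(Paths.get("/usr/share/dict/words")); the HashSet then gives constant-time contains() checks. Hunspell-style .dic files (as used by OpenOffice) start with a word count and may append /flags to each word, so those would need stripping first.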
http://www.java-tips.org/java-se-tips/java.io/how-to-read-file-in-java.html
Check out http://sourceforge.net/projects/test-dictionary/; it might give you some clues.
I am not sure whether any such libraries are available for download, but you can definitely dig through sourceforge.net to see if there are any, or how people have used dictionaries: http://sourceforge.net/search/?type_of_search=soft&words=java+dictionary