How to compare Strings to find the closest in similarity in Java [duplicate]

This question already has answers here:
Spelling correction for data normalization in Java
(5 answers)
Closed 5 years ago.
Is there a library in Java that can do spell checking?
I have an ArrayList of categories, which is a list of words {Fox, Lion, Wolf, Snake}. This list can be very big.
The program will ask the user to input an "Animal". The user may make a spelling mistake, e.g. enter "Fix" or "Loin".
Is there a way to compare the input to the elements of the list, find the closest match, and use that element instead of the misspelled input for the rest of the program?

You are probably looking for the Levenshtein Distance between two strings. The distance grows the more dissimilar the strings are.
Apache Commons has an implementation: https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/similarity/LevenshteinDistance.html
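If you would rather avoid a dependency, here is a minimal sketch of the idea. It computes the Levenshtein distance with the standard dynamic-programming table (keeping only two rows) and picks the list entry with the smallest distance to the user's input. The class and method names are made up for illustration. Matching is case-insensitive, and ties go to the earlier list entry; both of these choices are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class ClosestMatch {

    // Levenshtein distance via the standard dynamic-programming recurrence,
    // keeping only two rows of the table in memory.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(
                        curr[j - 1] + 1,      // insertion
                        prev[j] + 1),         // deletion
                        prev[j - 1] + cost);  // substitution (or match)
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the candidate with the smallest distance to the input
    // (case-insensitive); on a tie, the earlier candidate wins.
    static String closest(String input, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        String needle = input.toLowerCase(Locale.ROOT);
        for (String candidate : candidates) {
            int d = levenshtein(needle, candidate.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> animals = Arrays.asList("Fox", "Lion", "Wolf", "Snake");
        System.out.println(closest("Fix", animals));  // prints Fox
        System.out.println(closest("Loin", animals)); // prints Lion
    }
}
```

In practice you would probably also reject matches whose distance exceeds some threshold, so that input resembling nothing in the list is not silently mapped to an animal. The Apache Commons Text class linked above should be able to replace the hand-written levenshtein method through its apply(CharSequence, CharSequence) method.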

If it is just a list of animals, you could write your own code that checks whether the word is in the list, compares its length, counts the number of matching characters, etc.

Related

Create a substring breaking only on whole words [duplicate]

This question already has answers here:
split a string in java into equal length substrings while maintaining word boundaries
(2 answers)
Closed 6 years ago.
In Java or Groovy, is there a library or a simple implementation
that creates a substring of a text up to some length without breaking a word in the middle?
An example call: substring("My very long text", 9 /*substring length*/, true /*break on whole words only*/)
Without keeping whole words, the output would be My very l. Since I want to break on whole words only, it should be My very.
If there are no spaces, it should cut the string at the index:
substring("MyVeryLongText", 9 /*substring length*/, true /*break on whole words only*/) --> MyVeryLon
I believe there isn’t anything built into Java for this. We wrote our own method for a similar task. There could easily be some free library out there, but I don’t think I’d introduce a new dependency for this relatively simple problem. You will need to decide what should happen if there is no space at which to break (substring("Beginning with a long word", 4, true)); the library you find may not do what you want in this case. If you write your own, you need to handle the cases where the original string is too short (substring("Cat", 4, true)) and where the space comes right after the 4th character (substring("Long text", 4, true)).
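Here is a minimal sketch of such a method covering the edge cases listed in the answer. It assumes that "breaking on whole words" means cutting at the last ' ' inside the limit, with a hard cut as the fallback when there is no space (as the question asks for "MyVeryLongText"). The class name is hypothetical, and the signature simply mirrors the call in the question.

```java
final class WordSafeSubstring {

    // Hypothetical helper matching the call in the question:
    // substring(text, maxLength, wholeWordsOnly).
    static String substring(String text, int maxLength, boolean wholeWordsOnly) {
        if (text.length() <= maxLength) {
            return text; // already short enough, e.g. ("Cat", 4)
        }
        if (!wholeWordsOnly || text.charAt(maxLength) == ' ') {
            // Plain cut, or the cut already lands right before a space, e.g. ("Long text", 4)
            return text.substring(0, maxLength);
        }
        // Look for the last space inside the allowed prefix.
        int lastSpace = text.lastIndexOf(' ', maxLength - 1);
        if (lastSpace <= 0) {
            // No usable space: fall back to a hard cut at maxLength.
            return text.substring(0, maxLength);
        }
        return text.substring(0, lastSpace);
    }

    public static void main(String[] args) {
        System.out.println(substring("My very long text", 9, true));          // My very
        System.out.println(substring("MyVeryLongText", 9, true));             // MyVeryLon
        System.out.println(substring("Beginning with a long word", 4, true)); // Begi
    }
}
```

Whether "Begi" is acceptable for a long first word is exactly the design decision the answer mentions. Returning an empty string or the whole first word are equally valid choices.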

How to sort a String array by one of its components in Java? [duplicate]

This question already has answers here:
Sort Java String array by multiple numbers
(2 answers)
Closed 8 years ago.
Date,Lat,Lon,Depth,Mag
20000101,34.6920,-116.3550,12.30,1.21
20000101,34.4420,-116.2280,7.32,1.01
20000101,37.4172,-121.7667,5.88,1.14
20000101,-41.1300,174.7600,27.00,1.90
20000101,37.6392,-119.0482,2.40,1.03
20000101,32.1790,-115.0730,6.00,2.44
20000101,59.7753,-152.2192,86.34,1.48
20000101,34.5230,-116.2410,11.63,1.61
20000101,59.5369,-153.1360,100.15,1.62
20000101,44.7357,-110.7932,4.96,2.20
20000101,34.6320,-116.2950,9.00,1.73
I need to sort this data by each column.
I tried Double.parseDouble(array[0].split(",")[1]) inside a selection sort, but it takes too much time.
Is there a way to sort without using parseDouble?
You have strings that you want to sort by a certain substring interpreted as a Double, so there is no way around parsing them. But if you parse each line once and save those results to sort on, that will be faster than calling parseDouble within each comparison.
I suppose you could create a class that represents the data on each line. Parse the data into instances of that class and then sort on the objects.
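Here is a minimal sketch of that suggestion. Each line is parsed exactly once into an object, and the list is then sorted with a Comparator per column. List.sort is O(n log n), while the selection sort in the question is O(n²), so this also removes the second source of slowness. The class names Quake and QuakeSort are made up, and the code assumes the first line is the header row shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// One parsed line of the CSV; field names follow the header row.
final class Quake {
    final String date; // yyyyMMdd, so string order equals chronological order
    final double lat, lon, depth, mag;

    Quake(String date, double lat, double lon, double depth, double mag) {
        this.date = date;
        this.lat = lat;
        this.lon = lon;
        this.depth = depth;
        this.mag = mag;
    }

    // parseDouble runs once per field here, not once per comparison.
    static Quake parse(String line) {
        String[] f = line.split(",");
        return new Quake(f[0], Double.parseDouble(f[1]), Double.parseDouble(f[2]),
                Double.parseDouble(f[3]), Double.parseDouble(f[4]));
    }

    static final Comparator<Quake> BY_DATE = Comparator.comparing(q -> q.date);
    static final Comparator<Quake> BY_LAT = Comparator.comparingDouble(q -> q.lat);
    static final Comparator<Quake> BY_LON = Comparator.comparingDouble(q -> q.lon);
    static final Comparator<Quake> BY_DEPTH = Comparator.comparingDouble(q -> q.depth);
    static final Comparator<Quake> BY_MAG = Comparator.comparingDouble(q -> q.mag);
}

final class QuakeSort {

    // Skips the header row ("Date,Lat,Lon,Depth,Mag") and parses the rest.
    static List<Quake> parseAll(List<String> lines) {
        List<Quake> result = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            result.add(Quake.parse(lines.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Date,Lat,Lon,Depth,Mag",
                "20000101,34.6920,-116.3550,12.30,1.21",
                "20000101,-41.1300,174.7600,27.00,1.90",
                "20000101,59.7753,-152.2192,86.34,1.48");
        List<Quake> quakes = parseAll(lines);
        quakes.sort(Quake.BY_LAT);             // ascending latitude
        quakes.sort(Quake.BY_MAG.reversed());  // largest magnitude first
        for (Quake q : quakes) {
            System.out.println(q.date + " " + q.lat + " " + q.mag);
        }
    }
}
```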

Java library to calculate the relative difference between two Strings? [duplicate]

This question already has answers here:
Fuzzy string search library in Java [closed]
(8 answers)
Closed 9 years ago.
I'm looking for a way to programmatically detect the delta ratio between two strings. I can use string length, but this doesn't give much useful information for inputs of similar size but different content. There is a Java diff tool on Google Code, Java Diff Utils, but it hasn't been updated since 2011, and I don't need to actually modify the Strings themselves.
I'm attempting to do change detection with threshold values, for instance: Updated string is 42% different than existing string, are you sure you want to proceed?
Does anyone know of a library that could be used for this, or is java-diff-utils my only option? I couldn't find much in Apache Commons, and googling returns irrelevant information.
You could use the Levenshtein distance to calculate how different two strings are from each other. The math behind it is fairly involved, but the actual code is rather short; you can easily port the code from that wiki article to Java.
The difference is measured as an integer: the number of steps needed to turn one string into the other, where a step is a character insertion, deletion, or substitution. It tells you how many steps it takes, but not which steps or in which order. Since you only want to measure the total difference, that should be enough information for your needs.
Edit: one of the commenters (kaos) provided a link to an implementation of the Levenshtein distance in Apache Commons.
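To get a percentage like the "42% different" in the question, one option is to divide the edit count by the length of the longer string. This normalization is an assumption rather than a standard definition, but it is bounded: the Levenshtein distance never exceeds the longer string's length, so the result stays between 0 and 100. The class name and the 40% threshold below are hypothetical.

```java
import java.util.Locale;

final class StringDifference {

    // Standard two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Edits divided by the longer length, as a percentage (assumed normalization).
    static double percentDifferent(String oldText, String newText) {
        int longest = Math.max(oldText.length(), newText.length());
        if (longest == 0) {
            return 0.0; // two empty strings are identical
        }
        return 100.0 * levenshtein(oldText, newText) / longest;
    }

    public static void main(String[] args) {
        String existing = "kitten";
        String updated = "sitting";
        double pct = percentDifferent(existing, updated); // 3 edits / 7 characters
        if (pct > 40.0) { // hypothetical threshold
            System.out.printf(Locale.ROOT,
                    "Updated string is %.0f%% different than existing string, are you sure you want to proceed?%n",
                    pct);
        }
    }
}
```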

Is there a method in Java that lets you find the index of an element in an int array? [duplicate]

This question already has answers here:
Where is Java's Array indexOf?
(13 answers)
Closed 9 years ago.
I need to find the last int in an array that meets a certain condition. My program can work out what that int is, but it also needs to know its position in the array. I searched on Stack Exchange and someone posted this:
Arrays.asList(array).indexOf(indexPos);
as a possible solution, but I am not sure I am doing it right, because I get the error
"cannot find symbol". I also tried:
int test = Arrays.asList(array).indexOf(indexPos);
And then tried to print test, but I could not even get to that point. Thanks.
You may need to import java.util.Arrays to get the symbol.
There is no guaranteed way of finding the position of an element in an array other than looping over it; that is basically what your asList snippets do.
This will work as long as your arrays don't contain duplicate values (indexOf returns the first match). If you need to handle duplicates, you may need to rethink your data structures.
Someone pointed me to a similar question that someone else had asked, and this seems to have worked for me.
The code is:
java.util.Arrays.asList(seq).indexOf(indexPos);
and the question: Where is Java's Array indexOf?
Yes, the method is defined in the List interface, so you need to call asList() followed by indexOf().
If the array is not sorted you can use java.util.Arrays.asList(theArray).indexOf(o)
If the array is sorted, you can use a binary search instead (better performance): java.util.Arrays.binarySearch(theArray, o)
As for the error, make sure you have imported java.util.Arrays, and that you have declared the array seq and the int indexPos, so your code becomes int test = Arrays.asList(seq).indexOf(indexPos);.
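One thing to check before relying on the asList approach is the array's element type. For a primitive int[], Arrays.asList(array) produces a List<int[]> whose single element is the array itself, so indexOf(someInt) returns -1. It only behaves as expected for object arrays such as Integer[]. The sketch below uses plain loops for int[] instead. The method names indexOf and lastIndexWhere are made up for illustration, and lastIndexWhere covers the original goal of finding the last element that meets a condition.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

final class IntArraySearch {

    // Position of the first element equal to target, or -1 if absent.
    static int indexOf(int[] array, int target) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] == target) {
                return i;
            }
        }
        return -1;
    }

    // Position of the last element that satisfies the condition, or -1.
    // Scanning backwards means the first hit is the last match.
    static int lastIndexWhere(int[] array, IntPredicate condition) {
        for (int i = array.length - 1; i >= 0; i--) {
            if (condition.test(array[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] nums = {5, 12, 3, 20, 7};
        // Pitfall: asList(int[]) is a List<int[]> with one element, so this prints -1.
        System.out.println(Arrays.asList(nums).indexOf(12));
        System.out.println(indexOf(nums, 12));                 // 1
        System.out.println(lastIndexWhere(nums, n -> n > 10)); // 3
    }
}
```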

Is it possible to automate generation of wrong choices from a correct word?

The following list contains one correct word, "disastrous", and other incorrect words that sound like it:
A. disastrus
B. disasstrous
C. desastrous
D. desastrus
E. disastrous
F. disasstrous
Is it possible to automate generation of wrong choices given a correct word, through some kind of java dictionary API?
No, there is nothing like that in the Java API, but you can write a simple algorithm to do the job.
Just make up some rules about letter permutations and doubling, and add the generated words to a Set until you have enough.
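Here is a minimal sketch of that rule-based approach. The four rules (i/e confusion, dropping one vowel of a vowel pair, doubling a consonant, collapsing a doubled letter) are hypothetical examples chosen to reproduce some of the choices in the question, not a linguistic standard. The code assumes a lower-case input word, and a LinkedHashSet keeps results unique and in generation order.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class Distractors {

    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    // Applies a few hypothetical "sounds alike" rules at every position of
    // the lower-case word and collects the misspellings they produce.
    static Set<String> generate(String word) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            String before = word.substring(0, i);
            String after = word.substring(i + 1);
            boolean prevVowel = i > 0 && isVowel(word.charAt(i - 1));
            boolean nextVowel = i + 1 < word.length() && isVowel(word.charAt(i + 1));

            // Rule 1: confuse 'i' and 'e' (disastrous -> desastrous)
            if (c == 'i') out.add(before + 'e' + after);
            if (c == 'e') out.add(before + 'i' + after);

            // Rule 2: drop one vowel of a vowel pair (disastrous -> disastrus)
            if (isVowel(c) && (prevVowel || nextVowel)) out.add(before + after);

            // Rule 3: double a single consonant other than the first letter
            // (disastrous -> disasstrous)
            boolean alreadyDoubled = (i > 0 && word.charAt(i - 1) == c)
                    || (i + 1 < word.length() && word.charAt(i + 1) == c);
            if (i > 0 && Character.isLetter(c) && !isVowel(c) && !alreadyDoubled) {
                out.add(before + c + c + after);
            }

            // Rule 4: collapse a doubled letter (necessary -> necesary)
            if (i > 0 && word.charAt(i - 1) == c) out.add(before + after);
        }
        out.remove(word); // never offer the correct spelling as a wrong choice
        return out;
    }

    public static void main(String[] args) {
        System.out.println(generate("disastrous"));
    }
}
```

To get combined misspellings such as "desastrus", you could run generate again on some of the results, then pick as many distinct entries as you need.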
There are a number of algorithms for matching words by sound - 'soundex' is the one that springs to mind, but I remember uncovering a few when I did some research on this a couple of years ago. I expect the problem you would find is that they take a word and return a value that represents how the word sounds so you can see if two spellings sound similar (so the words in the question should generate similar values); but I expect doing the reverse, i.e. taking the value and generating similar sounding spellings, would be quite hard.
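To illustrate the "word to sound value" direction this answer describes, here is a sketch of Soundex following the American Soundex rules as they are commonly described. It keeps the first letter and maps later consonants to digits. Repeated codes are collapsed, vowels separate repeats, and H/W do not. The result is padded to four characters. Apache Commons Codec also provides a ready-made Soundex class. The code assumes ASCII letters and ignores everything else. As the answer expects, the misspellings from the question share the correct word's code.

```java
import java.util.Locale;

final class Soundex {

    // Digit for each letter A..Z; '0' means "no code" (vowels, Y, H, W).
    private static final String CODES = "01230120022455012623010202";

    private static char code(char upper) {
        return CODES.charAt(upper - 'A');
    }

    static String encode(String word) {
        StringBuilder letters = new StringBuilder();
        for (char ch : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (ch >= 'A' && ch <= 'Z') {
                letters.append(ch); // ignore anything that is not a letter
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(letters.charAt(0));
        char prev = code(letters.charAt(0));
        for (int i = 1; i < letters.length() && out.length() < 4; i++) {
            char ch = letters.charAt(i);
            char c = code(ch);
            if (c != '0') {
                if (c != prev) {
                    out.append(c); // same code as the previous consonant is collapsed
                }
                prev = c;
            } else if (ch != 'H' && ch != 'W') {
                prev = '0'; // a vowel (or Y) separates consonants with the same code
            }
            // H and W are skipped without resetting prev.
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("disastrous")); // D223
        System.out.println(encode("desastrus"));  // D223
        System.out.println(encode("Ashcraft"));   // A261
    }
}
```

The mapping is many-to-one, so a code such as D223 does not identify a spelling. That is why going from the code back to plausible spellings would, as the answer says, be much harder.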
