I have to respect operator precedence; all the numbers, including the answer, are integers (seems silly to me, but whatever); and I have to parse the equation from a String and, as far as I'm aware, push each number and each operator onto two separate stacks before I compare them.
I don't know how to approach this problem, and right now my main concern is dealing with parentheses. I want to use a recursive method that finds each parenthesized part, evaluates it and replaces it with its result, but I'm not sure how to do that. I could use substring() and indexOf(), but I'd prefer something more elegant.
Other than that, I'm not sure how to evaluate the expression once the numbers and operators are on their stacks. I think I should compare the top two operators so that when I combine two numbers, it happens in the right order of operations, but I don't want to be clumsy with that part either.
My recommendation would be that you study the Shunting-yard algorithm and come back when you have specific questions about how it works or how to implement certain parts of it.
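For reference, here is a minimal sketch of the two-stack approach described in the question, in the spirit of Shunting-yard: instead of building an RPN queue, it applies operators as soon as precedence allows. Parentheses are handled with the operator stack, so no recursion, substring() or indexOf() is needed. It assumes non-negative integer literals, binary + - * / only, no unary minus, and well-formed input; the class name IntExpression is made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical two-stack (Shunting-yard style) evaluator for integer expressions.
// Assumes: non-negative integer literals, binary + - * /, parentheses, no unary minus.
class IntExpression {

    // Higher value binds tighter; '(' gets 0 so it is never popped by precedence.
    private static int precedence(char op) {
        switch (op) {
            case '+':
            case '-':
                return 1;
            case '*':
            case '/':
                return 2;
            default:
                return 0;
        }
    }

    // Pops one operator and two operands, pushes the integer result.
    private static void applyTop(Deque<Integer> values, Deque<Character> ops) {
        char op = ops.pop();
        int right = values.pop();
        int left = values.pop();
        switch (op) {
            case '+': values.push(left + right); break;
            case '-': values.push(left - right); break;
            case '*': values.push(left * right); break;
            case '/': values.push(left / right); break; // integer division truncates toward zero
            default: throw new IllegalStateException("Unexpected operator: " + op);
        }
    }

    static int evaluate(String expr) {
        Deque<Integer> values = new ArrayDeque<>();
        Deque<Character> ops = new ArrayDeque<>();
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < expr.length() && Character.isDigit(expr.charAt(i))) {
                    i++;
                }
                values.push(Integer.parseInt(expr.substring(start, i)));
            } else if (c == '(') {
                ops.push(c);
                i++;
            } else if (c == ')') {
                // Evaluate back to the matching '(' instead of recursing into a substring.
                while (ops.peek() != '(') {
                    applyTop(values, ops);
                }
                ops.pop(); // discard the '('
                i++;
            } else if ("+-*/".indexOf(c) >= 0) {
                // Apply waiting operators of equal or higher precedence first (left-associative).
                while (!ops.isEmpty() && precedence(ops.peek()) >= precedence(c)) {
                    applyTop(values, ops);
                }
                ops.push(c);
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        while (!ops.isEmpty()) {
            applyTop(values, ops);
        }
        return values.pop();
    }
}
```

When an operator arrives, anything on the stack with equal or higher precedence is applied first; that comparison is the "compare the top two operators" step from the question, and using >= makes operators of equal precedence left-associative (10 - 4 - 3 gives 3, not 9).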
I have the following:
private static List<Pattern> pats;
This list contains around 90 patterns and is populated before the iteration starts. The patterns are complex, for example:
System.out.println("pat: " + pats.get(0).toString());
// pat: \bsingle1\b|\bsingle2\b|(?=.*\bcombo1\b)(?=.*\bcombo2\b)|\bsingle3\b|\bwild.*card\b ...
Some of the patterns contain around 40-50 single words or combinations of words, as the regex above shows. The words can contain wildcards.
Now, I have a list of strings: sentences of around 30-60 characters each. For every string in the list, I iterate through the list of patterns and call pattern.matcher("This is one of the strings in my list").find() until I get a match, which I note down and save elsewhere; then I break out of the pattern loop and continue with the next string in the list.
This is a categorization job, so several strings can match on the same pattern.
This of course takes a lot of execution time, so I am looking for a more efficient way to solve the problem.
Any suggestions?
What solved about 90% of my problem was partially giving up on regex and using String.indexOf() wherever it made more sense from a performance perspective.
This post inspired me: Quickest way to return list of Strings by using wildcard from collection in Java
I wrote my own implementation since the one in the link handles only full words, while I'm dealing with sentences.
Performance-wise, it helped with wildcards ("*") and with alternations ("hel(l|lo)"), the former more than the latter.
I took this direction on the basis of several recommendations, and it cut the processing time for 200,000 sentences from 1.5 hours to 15 minutes.
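The poster's own implementation isn't shown, so the following is only a rough sketch of what replacing simple regexes with String.indexOf() might look like. It assumes each category is stored as a list of plain terms such as "single1" or "wild*card" (at most one '*' per term) instead of a compiled Pattern, approximates \b with Character.isLetterOrDigit, and matches case-sensitively. KeywordMatcher and its methods are hypothetical names.

```java
import java.util.List;

// Hypothetical helper: plain-word and single-'*' wildcard matching with String.indexOf,
// approximating \bword\b and \bhead.*tail\b without the regex engine.
class KeywordMatcher {

    // Outside the string, or any non-letter/digit, counts as a word boundary
    // (close to, but not identical to, regex \b).
    private static boolean isBoundary(String text, int i) {
        return i < 0 || i >= text.length() || !Character.isLetterOrDigit(text.charAt(i));
    }

    // Roughly \bword\b.
    static boolean containsWord(String text, String word) {
        if (word.isEmpty()) {
            return false;
        }
        for (int at = text.indexOf(word); at >= 0; at = text.indexOf(word, at + 1)) {
            if (isBoundary(text, at - 1) && isBoundary(text, at + word.length())) {
                return true;
            }
        }
        return false;
    }

    // Roughly \bhead.*tail\b for a term such as "wild*card".
    static boolean matchesWildcard(String text, String term) {
        int star = term.indexOf('*');
        if (star < 0) {
            return containsWord(text, term);
        }
        String head = term.substring(0, star);
        String tail = term.substring(star + 1);
        // The earliest head that starts a word leaves the most room for the tail.
        int headAt = text.indexOf(head);
        while (headAt >= 0 && !isBoundary(text, headAt - 1)) {
            headAt = text.indexOf(head, headAt + 1);
        }
        if (headAt < 0) {
            return false;
        }
        for (int at = text.indexOf(tail, headAt + head.length()); at >= 0;
                at = text.indexOf(tail, at + 1)) {
            if (isBoundary(text, at + tail.length())) {
                return true;
            }
        }
        return false;
    }

    // One category = a list of alternatives (the | branches of the original regex).
    static boolean matchesCategory(String text, List<String> terms) {
        for (String term : terms) {
            if (matchesWildcard(text, term)) {
                return true;
            }
        }
        return false;
    }
}
```

Lookahead combinations such as (?=.*\bcombo1\b)(?=.*\bcombo2\b) would become two containsWord() calls joined with &&.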
You could also offload the regular expressions to a dedicated service. I believe that could be faster (and perhaps safer) than partially giving up on regex.
If your app is intended to run on multiple servers, you may also gain performance by centralizing the computation cost.
Here is an example of such an implementation via a REST API: http://www.rex-daemon.com/tutorial/more-advanced-queries/
I'm working on implementing probabilistic matching for person record searching. As part of this, I plan to perform blocking before any scoring is done. There are currently a lot of good options for transforming strings so that they can be stored and then searched for, with similar strings matching each other (things like Soundex, Metaphone, etc.).
However, I've struggled to find something similar for purely numeric values. For example, it would be nice to be able to block on a social security number without numbers that are off by a digit or have transposed digits being dropped from the results: a search for 123456789 should also return 123456780 and 213456789 as blocking candidates.
Now, there are certainly ways to simply compare two numerical values to determine how similar they are, but what could I do when there are millions of numbers in the database? It's obviously impractical to compare them all (and that would certainly defeat the point of blocking).
Ideally, those three SSNs above could somehow be transformed into some other value that would be stored. Purely as an example, imagine those three numbers all ended up as AAABBCCC after this magical transformation, whereas something like 987654321 would become ZZZYYYYXX and 123547698 would become AAABCCBC or something like that.
So, my question is: is there a good transformation for numeric values like the ones that exist for alphabetical values? Or is there some other approach that might make sense (besides some highly complex or low-performing SQL or logic)?
The first thing to realize is that social security numbers are basically strings of digits. You really want to treat them like you would strings rather than numbers.
The second thing to realize is that your blocking function maps a record to a list of strings that identify comparison-worthy sets of items.
Here is some Python code to get you started. (I know you asked for Java, but I think the Python is clear and you aren't paying me enough to write it in Java :P ) The basic idea is to take your input record, simulate roughing it up in multiple ways (to get your blocking keys), and then group records by any match on those blocking keys.
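```python
import itertools

def transpositions(s):
    # Swap each pair of adjacent characters: '1234' -> '2134', '1324', '1243'
    for pos in range(len(s) - 1):
        yield s[:pos] + s[pos + 1] + s[pos] + s[pos + 2:]

def substitutions(s):
    # Replace each position with a wildcard: '1234' -> '*234', '1*34', ...
    for pos in range(len(s)):
        yield s[:pos] + '*' + s[pos + 1:]

def all_blocks(s):
    # Blocking keys: the value itself plus every "roughed up" variant of it
    return itertools.chain([s], transpositions(s), substitutions(s))

def are_blocked_candidates(s1, s2):
    # Two records land in the same block if they share at least one key
    return bool(set(all_blocks(s1)) & set(all_blocks(s2)))

assert not are_blocked_candidates('1234', '5555')
assert are_blocked_candidates('1234', '1239')      # one digit differs
assert are_blocked_candidates('1234', '2134')      # two adjacent digits swapped
assert not are_blocked_candidates('1234', '1255')  # two digits differ
```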
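To avoid comparing every pair (the concern about millions of records), the same keys can serve as an inverted index: each stored number is indexed under all of its keys once, and a query only looks up its own keys. Below is a minimal Java sketch of that idea, assuming SSNs are kept as 9-digit strings and the index fits in an in-memory HashMap; in a database, the equivalent would be a separate key table with an index on the key column. DigitBlocking is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory blocking index for digit strings such as SSNs.
class DigitBlocking {

    // Inverted index: blocking key -> stored numbers that produced that key.
    private final Map<String, List<String>> index = new HashMap<>();

    // Same keys as the Python answer: the value itself, every adjacent
    // transposition, and every single-position wildcard.
    static Set<String> blockingKeys(String s) {
        Set<String> keys = new LinkedHashSet<>();
        keys.add(s);
        for (int pos = 0; pos + 1 < s.length(); pos++) {
            char[] c = s.toCharArray();
            char tmp = c[pos];
            c[pos] = c[pos + 1];
            c[pos + 1] = tmp;
            keys.add(new String(c));
        }
        for (int pos = 0; pos < s.length(); pos++) {
            keys.add(s.substring(0, pos) + '*' + s.substring(pos + 1));
        }
        return keys;
    }

    // Done once per stored record, e.g. when it is inserted.
    void add(String number) {
        for (String key : blockingKeys(number)) {
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(number);
        }
    }

    // Union of the buckets for the query's keys: only these records need scoring.
    Set<String> candidates(String query) {
        Set<String> result = new LinkedHashSet<>();
        for (String key : blockingKeys(query)) {
            List<String> bucket = index.get(key);
            if (bucket != null) {
                result.addAll(bucket);
            }
        }
        return result;
    }
}
```

Each 9-digit number yields at most 18 keys (itself, 8 adjacent swaps and 9 wildcards), so the index grows linearly with the number of records, and a query costs at most 18 hash lookups plus the size of the matching buckets.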
For a project I'm currently working on, I am dealing with a list of lists of integers, something of the form:
{[1,2];[5];[3,6,7]}
The idea here is that I'm trying to resolve an n-dimensional array into a list of the local maxima that occur along whatever particular axis I happen to be looking at. My question is this: I would like to get a list of what would essentially be points in this n-dimensional space, containing every possible combination of entries from these lists (one entry from each inner list). For example, I would want the above to return:
{[1,5,3];[1,5,6];[1,5,7];[2,5,3];[2,5,6];[2,5,7]}
The ordering doesn't actually matter to me. My first idea for approaching this is to boil it down to a tree where each path represents a possible combination and then output every possible path, but I'm really not sure if this is the best way to go about it, and I am unfamiliar enough with Java's tree classes to be unsure whether this would be straightforward to implement. Ideas?
Ah, my mistake, totally a duplicate.
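The tree idea works without any tree classes: a recursive function can walk the tree implicitly, one level per inner list, and record each complete root-to-leaf path. A minimal sketch of that Cartesian product (Combinations and cartesianProduct are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: every way of picking one entry from each inner list.
class Combinations {

    static List<List<Integer>> cartesianProduct(List<List<Integer>> lists) {
        List<List<Integer>> result = new ArrayList<>();
        build(lists, 0, new ArrayList<>(), result);
        return result;
    }

    // Depth-first walk of the implicit tree: depth = axis, branch = entry on that axis.
    private static void build(List<List<Integer>> lists, int depth,
                              List<Integer> current, List<List<Integer>> result) {
        if (depth == lists.size()) {
            result.add(new ArrayList<>(current)); // copy, because `current` is reused
            return;
        }
        for (Integer value : lists.get(depth)) {
            current.add(value);
            build(lists, depth + 1, current, result);
            current.remove(current.size() - 1); // backtrack before trying the next entry
        }
    }
}
```

The result has as many entries as the product of the inner list sizes, so it grows quickly with the number of dimensions. An empty outer list yields a single empty combination, and any empty inner list yields no combinations at all.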
I'm making a chat responder for a game, and I want to know whether there is a way to compare two strings and see if they are approximately equal. For example:
If someone typed:
"Strength level?"
it would call a function.
Then if someone else typed:
"Str level?"
it would call that same function. I also want it to detect automatically what someone is trying to type when they make a typo or something like that, so that, for example:
"Strength tlevel?"
would also call the function.
Is what I'm asking here something simple, or will it require a big, irritating function to check the strings?
If you've been baffled by my explanation (not really one of my strong points), this is basically what I'm asking:
How can I check if two strings are similar to each other?
See this question and answer: Getting the closest string match
Using some heuristics and the Levenshtein distance algorithm, you can compute the similarity of two strings and take a guess at whether they're equal.
Your only option other than that would be a dictionary of accepted words similar to the one you're looking for.
You can use Levenshtein distance.
I believe you should use one of the edit distance algorithms to solve your problem. Here is, for example, a Levenshtein distance implementation in Java. You could use it to compare the words in the sentences, and if the sum of their edit distances is less than, say, 10% of the sentence length, consider the sentences equal.
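A minimal sketch of the standard dynamic-programming Levenshtein distance in Java, keeping only two rows of the table. For simplicity it compares whole strings character by character rather than summing per-word distances, and the 10% threshold is passed in as a ratio; EditDistance and roughlyEqual are hypothetical names.

```java
// Hypothetical helper around the classic Levenshtein dynamic program.
class EditDistance {

    // Minimum number of single-character insertions, deletions and substitutions.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Case-insensitive: "equal enough" if the distance is at most ratio * longer length.
    static boolean roughlyEqual(String a, String b, double ratio) {
        String x = a.toLowerCase();
        String y = b.toLowerCase();
        int maxLen = Math.max(x.length(), y.length());
        return levenshtein(x, y) <= ratio * maxLen;
    }
}
```

With a ratio of 0.1, "Strength tlevel?" is accepted as "Strength level?" (distance 1), but "Str level?" is not (distance 5), so abbreviations still need something like the alias dictionary suggested in the following answer.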
Perhaps what you need is a large dictionary of similar words and common spelling mistakes, which you would use to "translate" each word to a single entry or key.
This would be useful for custom words, so you could add "str" under the same key as "strength".
You could also add a few automated fallbacks: when a word isn't found in the dictionary, try every variant that differs by one letter (either missing or replaced), and recurse into deeper levels (two missing letters, etc.) if needed.
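A minimal sketch of that dictionary-plus-fallback idea, assuming lowercase ASCII words: every known spelling or abbreviation maps to one canonical key, and unknown words are retried with one letter inserted, replaced or removed. The removal case is included because the question's "tlevel" example has an extra letter. CommandDictionary, addAlias and lookup are hypothetical names, and only one level of difference is tried.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alias dictionary: many spellings/abbreviations -> one canonical key,
// with a one-letter-difference fallback for words that are not in it.
class CommandDictionary {

    private final Map<String, String> aliases = new HashMap<>();

    void addAlias(String word, String key) {
        aliases.put(word.toLowerCase(), key);
    }

    // Returns the canonical key, or null if neither the word nor any
    // one-letter variant of it is known.
    String lookup(String word) {
        String w = word.toLowerCase();
        if (aliases.containsKey(w)) {
            return aliases.get(w);
        }
        for (int i = 0; i <= w.length(); i++) {
            String before = w.substring(0, i);
            for (char c = 'a'; c <= 'z'; c++) {
                String inserted = before + c + w.substring(i); // a letter was missing
                if (aliases.containsKey(inserted)) {
                    return aliases.get(inserted);
                }
                if (i < w.length()) {
                    String replaced = before + c + w.substring(i + 1); // a letter was mistyped
                    if (aliases.containsKey(replaced)) {
                        return aliases.get(replaced);
                    }
                }
            }
            if (i < w.length()) {
                String removed = before + w.substring(i + 1); // an extra letter was typed
                if (aliases.containsKey(removed)) {
                    return aliases.get(removed);
                }
            }
        }
        return null;
    }
}
```

Going to two letters would mean generating variants of every one-letter variant, which multiplies the number of lookups, so it is probably worth trying only when the first level finds nothing.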
I found a few projects that do text-to-phoneme translation; I don't know which one is best:
http://mary.dfki.de/
http://www2.eng.cam.ac.uk/~tpl/asp/source/Phoneme.java
http://java.dzone.com/announcements/announcing-phonemic-10
If you want to find similar word beginnings, you can use a stemmer. Stemmers reduce words to a common stem. The best-known algorithm is the Porter stemmer (http://tartarus.org/~martin/PorterStemmer).
Levenshtein distance, as pointed out above, is great, but computationally heavy for distances greater than one or two.
None of the existing questions on this seem to answer the particular question I have.
My problem is this: I have a list of search terms, and for each term I use edit distance to find possible misspellings of it.
So for each space-separated word, I have a set of words it could be.
For example, searching for "green chilli" might give the "fuzzy" words "green", "greene" and "grain" for the first word and "chilli", "chill" and "chilly" for the second.
Now I want the RowFilter to search for: "green OR greene OR grain" AND "chilli OR chill OR chilly".
I can't seem to find a way to do this in Java. I've looked all over the place, but nothing talks about combining OR and AND filters in one RowFilter.
Would I have to roll my own solution based on the model? I suppose I can do that, but my method would most probably be naive and slow, at least at first.
Any pointers on how to roll my own solution, or, better yet, what's the right way to do this in Java?
RowFilter.orFilter() and RowFilter.andFilter() seem apropos; the documentation for each includes an example, and each accepts an Iterable of filters, so any number of them can be combined.
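A sketch of how those two factories can be nested for the "green chilli" example: one regexFilter per candidate spelling, an orFilter per search term, and an andFilter over the terms. It assumes a case-insensitive whole-word match is what's wanted; FuzzyRowFilters and allTermsAnySpelling are hypothetical names, while RowFilter.regexFilter, orFilter and andFilter are the standard javax.swing methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.swing.RowFilter;

// Hypothetical helper: (w11 OR w12 OR ...) AND (w21 OR w22 OR ...) AND ...
class FuzzyRowFilters {

    // One inner list of candidate spellings per search term. Only the given column
    // indices are checked; with no indices, regexFilter checks every column.
    static RowFilter<Object, Object> allTermsAnySpelling(List<List<String>> candidatesPerTerm,
                                                         int... columns) {
        List<RowFilter<Object, Object>> perTerm = new ArrayList<>();
        for (List<String> candidates : candidatesPerTerm) {
            List<RowFilter<Object, Object>> alternatives = new ArrayList<>();
            for (String word : candidates) {
                // Case-insensitive whole-word match; Pattern.quote escapes regex metacharacters.
                alternatives.add(RowFilter.regexFilter("(?i)\\b" + Pattern.quote(word) + "\\b", columns));
            }
            perTerm.add(RowFilter.orFilter(alternatives)); // any spelling of this term
        }
        return RowFilter.andFilter(perTerm); // every term must be present
    }
}
```

The result is installed like any other filter, e.g. with setRowFilter() on the TableRowSorter attached to the JTable.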