Best way to validate a String against many patterns - java

This is a question more about best practices/design patterns than regexps.
In short I have 3 values: from, to and the value I want to change. From has to match one of several patterns:
XX.X
>XX.X
>=XX.X
<XX.X
<=XX.X
XX.X-XX.X
Whereas To has to be a decimal number. Depending on what value is given in From, I have to check whether the value I want to change satisfies the From condition. For example, if the user inputs "From: >100.00 To: 150.00", every value greater than 100.00 should be changed.
The regexp itself isn't a problem. The thing is if I match the whole From against one regexp and it passes I still need to check which option was inputted - this will generate at least 5 IFs in my code and every time I want to add another option I will need to add another IF - not cool. Same thing if I were to create 5 Patterns.
Now I have a HashMap which holds a pattern as the key and a ValueMatcher as the value. When a user inputs a From value then I match it in a loop against every key in that map and if it matches then I use the corresponding ValueMatcher to actually check if the value that I want to change satisfies the "From" value.
This approach, on the other hand, requires me to have a HashMap with all the possibilities, a ValueMatcher interface and 5 implementations, each with only 1 short "matches" method. I think it sure is better than the IFs, but it still looks like an exaggerated solution.
Is there any other way to do it? Or is this how I actually should do it? I really regret that we can't hold methods in a HashMap/pass them as arguments, because then I'd only have 1 class with all the matching methods and store them in a HashMap.

How about a chain of responsibility?
Each ValueMatcher object holds exactly one From/To rule and a reference to the next ValueMatcher in the chain. Each ValueMatcher has a method which examines a candidate and either transforms it or passes it on to the next in the chain.
This way adding a new rule is a trivial extension and the controlling code just passes the candidate to the first member of the chain.
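A minimal sketch of such a chain, assuming Java 8 or later. The class name ChainedMatcher and the method applyTo are made up for illustration and do not come from the question; the From condition is represented as a plain predicate rather than a parsed pattern.

```java
import java.util.function.DoublePredicate;

// Hypothetical link in the chain: one From/To rule plus a reference to the next link.
final class ChainedMatcher {
    private final DoublePredicate fromCondition; // e.g. v -> v > 100.00
    private final double to;                     // replacement value for matching candidates
    private final ChainedMatcher next;           // null marks the end of the chain

    ChainedMatcher(DoublePredicate fromCondition, double to, ChainedMatcher next) {
        this.fromCondition = fromCondition;
        this.to = to;
        this.next = next;
    }

    // Transforms the candidate if this rule applies, otherwise passes it down the chain.
    double applyTo(double candidate) {
        if (fromCondition.test(candidate)) {
            return to;
        }
        // No rule matched by the end of the chain: the candidate is left unchanged.
        return next == null ? candidate : next.applyTo(candidate);
    }
}
```

The controlling code would only hold a reference to the first link and call applyTo on it; adding a rule means constructing one more link.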

"a ValueMatcher interface and 5 implementations each with only 1 short "matches" method. I think it sure is better than the IFs, but still looks like an exaggerated solution."
Well, for something as simple as evaluating a number against an operator and a limit value, couldn't you just write one slightly more generic ValueMatcher which has a limit value and an operator as its parameters? It would then be pretty easy to add 5 instances of this ValueMatcher with a few combinations of >, >=, etc.
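One possible shape for such a generic matcher is sketched below. It assumes Java 9 or later (for Map.of) and relies on lambdas, which let you keep the comparison "methods" in a map. The class name FromRule, the regular expressions, and the choice to treat the XX.X-XX.X range as inclusive on both ends are assumptions, not something stated in the question.

```java
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.DoublePredicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser: turns a From string into a predicate over candidate values.
final class FromRule {
    private static final String NUMBER = "(\\d+(?:\\.\\d+)?)";
    private static final Pattern COMPARISON = Pattern.compile("(>=|<=|>|<)?" + NUMBER);
    private static final Pattern RANGE = Pattern.compile(NUMBER + "-" + NUMBER);

    // One entry per operator; supporting a new operator means adding one line here
    // (and extending the COMPARISON pattern).
    private static final Map<String, BiPredicate<Double, Double>> OPERATORS = Map.of(
            "",   (value, limit) -> value.equals(limit),
            ">",  (value, limit) -> value > limit,
            ">=", (value, limit) -> value >= limit,
            "<",  (value, limit) -> value < limit,
            "<=", (value, limit) -> value <= limit);

    static DoublePredicate parse(String from) {
        Matcher range = RANGE.matcher(from);
        if (range.matches()) {
            double low = Double.parseDouble(range.group(1));
            double high = Double.parseDouble(range.group(2));
            return value -> value >= low && value <= high; // assumed inclusive bounds
        }
        Matcher comparison = COMPARISON.matcher(from);
        if (!comparison.matches()) {
            throw new IllegalArgumentException("Unsupported From value: " + from);
        }
        // group(1) is null when no operator was given, i.e. the plain "XX.X" form.
        String operator = comparison.group(1) == null ? "" : comparison.group(1);
        double limit = Double.parseDouble(comparison.group(2));
        BiPredicate<Double, Double> test = OPERATORS.get(operator);
        return value -> test.test(value, limit);
    }
}
```

With this, FromRule.parse(">100.00").test(120.0) answers whether 120.0 should be changed, and no per-operator class is needed.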
EDIT: Removed non Java stuff... sorry about that.

Related

Why using default trash value for string is wrong?

tl;dr;
Why is using
string myVariable = "someInitialValueDifferentThanUserValue99999999999";
as a default value wrong?
explanation of situation:
I had a discussion with a colleague at my workplace.
He proposed to use some trash value as default in order to differentiate it from user value.
A simple example would look like this:
string myVariable = "someInitialValueDifferentThanUserValue99999999999";
...
if(myVariable == "someInitialValueDifferentThanUserValue99999999999")
{
...
}
It is obvious and intuitive to me that this is wrong.
But I could not give a good argument for it, beyond these:
It is not professional.
There is a slight chance that someone would input the same value.
I once read that if you end up in such a situation, your architecture or programming habits are wrong.
edit:
Thank you for the answers. I found a solution that satisfied me, so I share with the others:
It is good to make a bool guard value that indicates if the initialization of a specific object has been accomplished.
And based on this private bool variable I can deduce if I play with a string that is default empty value "" from my mechanism (that is during initialization) or empty value from the user.
For me, this is a more elegant way.
Optional
Optional can be used.
Returns an empty Optional instance. No value is present for this Optional.
API Note:
Though it may be tempting to do so, avoid testing if an object is empty by comparing with == against instances returned by Optional.empty(). There is no guarantee that it is a singleton. Instead, use isPresent().
Ref: Optional
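As a rough illustration of that suggestion, the sketch below keeps "no value yet" and "the user entered an empty string" apart without any sentinel text. The class UserInputHolder and its methods are hypothetical names; only Optional itself comes from the JDK.

```java
import java.util.Optional;

// Hypothetical holder: an empty Optional means "not initialized by the user yet".
final class UserInputHolder {
    private Optional<String> value = Optional.empty();

    void set(String userValue) {
        // Optional.of rejects null, so an absent value can only be expressed as empty().
        value = Optional.of(userValue);
    }

    boolean hasUserValue() {
        return value.isPresent(); // never compare against Optional.empty() with ==
    }

    String describe() {
        return value.map(v -> "user entered \"" + v + "\"").orElse("nothing entered yet");
    }
}
```

An empty string typed by the user is then still a present value, which is the distinction the trash default was trying to encode.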
Custom escape sequence shared by server and client
Define default value
When the user enters the default value, escape the user value
Use a marker character
Always define the first character as the marker character
Take decision based on this character and strip this character for any actual comparison
Define clear boundaries for the check as propagating this character across multiple abstractions can lead to code maintenance issues.
Small elaboration on "It's not professional":
It's often a bad idea, because
it wastes memory when not a constant (at least in Java - of course, unless you're working with very limited space that's negligible).
Even as constant it may introduce ambiguity once you have more classes, packages or projects ("Was it NO_INPUT, INPUT_NOT_PROVIDED, INPUT_NONE?")
usually it's a sign that there will be no standardized scope-bound Defined_Marker_Character in the Project Documentation like suggested in the other answers
it introduces ambiguity for how to deal with deciding if an input has been provided or not
In the end you will either have a lot of varying NO_INPUT constants in different classes or end up with a self-made SthUtility class that defines one constant SthUtility.NO_INPUT and a static method boolean SthUtility.isInputEmpty(...) that compares a given input against that constant, which is basically reinventing Optional. And you will be copy-pasting that one class into every one of your projects.
There is really no need, as you can do the following as of Java 11, which was four releases ago (isEmpty() has been available since Java 6; isBlank() was added in Java 11).
String value = "";
// true only if length == 0
if (value.isEmpty()) {
    System.out.println("Value is empty");
}
String blankValue = " ";
// true if empty or contains only white space
if (blankValue.isBlank()) {
    System.out.println("Value is blank");
}
I also prefer to limit the use of such marker strings, because they can be found by searching the class file, which might possibly lead to exploitation of the code.

Custom Java sort by name

I want to sort something like this:
Given an ArrayList of objects with name Strings, I am trying to write the compareTo function such that Special T is always first, Special R is always second, Special C is always third, and then everything else is just alphabetical:
Special T
Special R
Special C
Aaron
Alan
Bob
Dave
Ron
Tom
Is there a standard way of writing this kind of compare function without needing to iterate over all possible combinations between the special cases and then invoking return getName().compareTo(otherObject.getName()); if it's a non-special case?
I would put the special cases in a HashMap<String, Integer> with the name as key and position as value. The advantages are:
Lookup is O(1) on average
The HashMap may be populated from an external source
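A minimal sketch of that idea, assuming Java 9 or later (for Map.of) and comparing plain name strings rather than the asker's objects. The class name SpecialFirstOrder and the field names are illustrative; for real objects you would compare on getName() instead.

```java
import java.util.Comparator;
import java.util.Map;

// Hypothetical comparator: special names sort by their rank, everything else alphabetically.
final class SpecialFirstOrder {
    // Name -> position; this map could just as well be loaded from an external source.
    private static final Map<String, Integer> SPECIAL_RANK = Map.of(
            "Special T", 0,
            "Special R", 1,
            "Special C", 2);

    static final Comparator<String> BY_NAME =
            // Non-special names all share the largest rank, so they sort after the special ones...
            Comparator.comparingInt((String name) -> SPECIAL_RANK.getOrDefault(name, Integer.MAX_VALUE))
                    // ...and ties (i.e. the non-special names) fall back to alphabetical order.
                    .thenComparing(Comparator.naturalOrder());
}
```

Sorting is then a single call such as names.sort(SpecialFirstOrder.BY_NAME), and adding another special case only touches the map.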

Transformation algorithms for numerical values similar to functionality of Soundex, Metaphone, etc

I'm working on implementing probabilistic matching for person record searching. As part of this, I plan to have blocking performed before any scoring is done. Currently, there are a lot of good options for transforming strings so that they can be stored and then searched for, with similar strings matching each other (things like Soundex, Metaphone, etc.).
However, I've struggled to find something similar for purely numeric values. For example, it would be nice to be able to block on a social security number and not have numbers that are off or have transposed digits be removed from the results. 123456789 should have blocking results for 123456780 or 213456789.
Now, there are certainly ways to simply compare two numerical values to determine how similar they are, but what could I do when there are millions of numbers in the database? It's obviously impractical to compare them all (and that would certainly defeat the point of blocking).
What would be nice would be something where those three SSNs above could somehow be transformed into some other value that would be stored. Purely for example, imagine those three numbers ended up as AAABBCCC after this magical transformation. However, something like 987654321 would be ZZZYYYYXX and 123547698 would be AAABCCBC or something like that.
So, my question is, is there a good transformation for numeric values like there exists for alphabetical values? Or, is there some other approach that might make sense (besides some highly complex or low performing SQL or logic)?
The first thing to realize is that social security numbers are basically strings of digits. You really want to treat them like you would strings rather than numbers.
The second thing to realize is that your blocking function maps from a record to a list of strings that identify comparison worthy sets of items.
Here is some Python code to get you started. (I know you asked for Java, but I think the Python is clear and you aren't paying me enough to write it in Java :P ). The basic idea is to take your input record, simulate roughing it up in multiple ways (to get your blocking keys), and then group by any match on those blocking keys.
import itertools

def transpositions(s):
    # Swap each pair of adjacent characters in turn.
    for pos in range(len(s) - 1):
        yield s[:pos] + s[pos + 1] + s[pos] + s[pos + 2:]

def substitutions(s):
    # Replace each character in turn with a wildcard.
    for pos in range(len(s)):
        yield s[:pos] + '*' + s[pos+1:]

def all_blocks(s):
    return itertools.chain([s], transpositions(s), substitutions(s))

def are_blocked_candidates(s1, s2):
    # Two values are candidates if they share at least one blocking key.
    return bool(set(all_blocks(s1)) & set(all_blocks(s2)))

assert not are_blocked_candidates('1234', '5555')
assert are_blocked_candidates('1234', '1239')
assert are_blocked_candidates('1234', '2134')
assert not are_blocked_candidates('1234', '1255')
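Since the question asked for Java, here is a rough port of the same idea. It is a sketch, not the answerer's code: the class name BlockingKeys and the method names are my own, and it generates the same three kinds of keys (the value itself, adjacent transpositions, single-character wildcards).

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical Java port of the blocking-key generation described above.
final class BlockingKeys {
    // Returns the value itself plus every adjacent transposition and every
    // single-position wildcard substitution.
    static Set<String> allBlocks(String s) {
        Set<String> keys = new LinkedHashSet<>();
        keys.add(s);
        for (int pos = 0; pos < s.length() - 1; pos++) {
            keys.add(s.substring(0, pos) + s.charAt(pos + 1) + s.charAt(pos) + s.substring(pos + 2));
        }
        for (int pos = 0; pos < s.length(); pos++) {
            keys.add(s.substring(0, pos) + '*' + s.substring(pos + 1));
        }
        return keys;
    }

    // Two values are blocking candidates when they share at least one key.
    static boolean areBlockedCandidates(String s1, String s2) {
        return !Collections.disjoint(allBlocks(s1), allBlocks(s2));
    }
}
```

In a database setting you would presumably store each record's keys in an indexed table and join on them, rather than comparing pairs in memory as this sketch does.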

How to check if two Strings are approximately equal?

I'm making a chat responder for a game and I want to know if there is a way to compare two strings and see if they are approximately equal to each other. For example:
if someone typed:
"Strength level?"
it would do a function..
then if someone else typed:
"Str level?"
it would do that same function. But I also want it so that if someone made a typo or something like that, it would automatically detect what they're trying to type. For example:
"Strength tlevel?"
would also make the function get called.
Is what I'm asking here something simple, or will it require me to write a big, irritating function to check the Strings?
If you've been baffled by my explanation (not really one of my strong points), then this is basically what I'm asking:
How can I check if two strings are similar to each other?
See this question and answer: Getting the closest string match
Using some heuristics and the Levenshtein distance algorithm, you can compute the similarity of two strings and take a guess at whether they're equal.
Your only option other than that would be a dictionary of accepted words similar to the one you're looking for.
You can use Levenshtein distance.
I believe you should use one of the edit distance algorithms to solve your problem. Here is, for example, a Levenshtein distance algorithm implementation in Java. You may use it to compare the words in the sentences, and if the sum of their edit distances is less than, say, 10% of the sentence length, consider them equal.
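For reference, a compact version of the standard dynamic-programming Levenshtein distance is sketched below. The class name EditDistance, the helper roughlyEqual, and the decision to compare whole strings case-insensitively against a ratio of the longer length are assumptions for illustration, not the linked implementation.

```java
// Hypothetical helper: Levenshtein distance using two rolling rows.
final class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Treats two strings as "equal enough" when the distance is at most
    // maxRatio times the length of the longer one (case-insensitive).
    static boolean roughlyEqual(String a, String b, double maxRatio) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 || levenshtein(a.toLowerCase(), b.toLowerCase()) <= maxRatio * longest;
    }
}
```

With a 10% threshold, "Strength tlevel?" is accepted as a match for "Strength level?" (distance 1), while the abbreviation "Str level?" is not (distance 5), so abbreviations would still need something like the dictionary approach mentioned in the other answers.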
Perhaps what you need is a large dictionary for similar words and common spelling mistakes, for which you would use for each word to "translate" to one single entry or key.
This would be useful for custom words, so you could add "str" in the same key as "strength".
However, you could also make a few automated methods, i.e. when your word isn't found in the dictionary, to loop recursively for 1 letter difference (either missing or replaced) and can recurse into deeper levels, i.e. 2 missing letters etc.
I found a few projects that do text to phonemes translations, don't know which one is best
http://mary.dfki.de/
http://www2.eng.cam.ac.uk/~tpl/asp/source/Phoneme.java
http://java.dzone.com/announcements/announcing-phonemic-10
If you want to find similar word beginnings, you can use a stemmer. Stemmers reduce words to a common beginning. The best-known algorithm is the Porter Stemmer (http://tartarus.org/~martin/PorterStemmer).
Levenshtein, as pointed out above, is great, but computationally heavy for distances greater than one or two.

Java - Recover the original order of a list after its elements had been randomized

The Title is self explanatory. This was an interview question. In java, List is an interface. So it should be initialized by some collection.
I feel that this is a trick question meant to confuse. Am I correct or not? How should I answer this question?
Assuming you don't have a copy of the original List, and the randomizing algorithm is truly random, then no, you cannot restore the original List.
The explanation is far more important on this type of question than the answer. To be able to explain it fully, you need to describe it using the mathematical definitions of Function and Map (not the Java class definitions).
A Function is a Map of Elements in one Domain to another Domain. In our example, the first domain is the "order" in the first list, and the second domain is the "order" in the second list. Any way that can get from the first domain to the second domain, where each element in the first domain only goes to one of the elements in the second domain is a Function.
What they want to know is whether there is an Inverse Function, that is, a corresponding function that can "back map" the elements from the second domain to the elements in the first domain. Some functions (squaring a number, or F(x) = x*x) cannot be reversed because one element in the second domain might map back to multiple (or no) elements in the first domain. In the squaring example:
F(x) = x * x
F(3) = 9 or ( 3 -> 9)
F(12) = 144 or ( 12 -> 144)
F(-11) = 121 or (-11 -> 121)
F(-3) = 9 or ( -3 -> 9)
attempting the inverse function, we need a function where
9 maps to 3
144 maps to 12
121 maps to -11
9 maps to -3
Since 9 must map to 3 and -3, and a Map must have only one destination for every origin, constructing an inverse function of x*x is not possible; that's why mathematicians fudge with the square root operator and say (plus or minus).
Going back to our randomized list. If you know that the map is truly random, then you know that the output value is truly independent of the input value. Thus if you attempted to create the inverse function, you would run into the same dilemma. Knowledge that the function is random tells you that the input cannot be calculated from the output, so even though you "know" the function, you cannot make any assumptions about the input even if you have the output.
Unless it is pseudo-random (it just appears to be random) and you can gather enough information to reverse the now-not-truly-random function.
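To make the pseudo-random case concrete, here is a sketch that assumes the list was shuffled with Collections.shuffle(list, new Random(seed)) and that the seed is known, which is a strong assumption an interviewer may or may not grant. The class name SeededShuffle and the method unshuffle are hypothetical. Replaying the same shuffle on a list of indices reveals where each element came from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: undoes Collections.shuffle when the seed is known.
final class SeededShuffle {
    static <T> List<T> unshuffle(List<T> shuffled, long seed) {
        // Replay the identical permutation on the positions 0..n-1.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < shuffled.size(); i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        // indices.get(k) is now the original position of the element sitting at k.
        List<T> original = new ArrayList<>(Collections.<T>nCopies(shuffled.size(), null));
        for (int k = 0; k < shuffled.size(); k++) {
            original.set(indices.get(k), shuffled.get(k));
        }
        return original;
    }
}
```

This works only because the same seed and list size produce the same sequence of swaps; with an unknown seed or a true random source, the order information is gone, as explained above.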
If you have not kept some external order information (this includes things like JVM trickery with ghost copies), and the items are not implicitly ordered, you cannot recover the original ordering.
When information is lost, it is lost. If the structure of the list is the only place recording the order you want, and you disturb that order, it's gone for good.
There's a user's view, and there's internals. There's the question as understood and the question as can be interpreted.
The user's view is that list items are blocks of memory, and that the pointer to the next item is a set of (4? 8? they keep changing the numbers :) bytes inside this memory. So when the list is randomized and the pointer to the next item is changed, that area of memory is overwritten and can't be recovered.
The question as understood is that you are given a list after it had been randomized.
Internals - I'm not a Java or an OS guy, but you should look into situations where the manner in which the process is executed differs from the naive view: Maybe Java randomizes lists by copying all the cells, so the old list is still kept in memory somewhere? Maybe it keeps backup values of pointers? Maybe the pointers are kept at an external table, separate from the list, and can be reconstructed? Maybe. Internals.
Understanding - Who says you haven't got access to the list before it was randomized? You could have just printed it out! Or maybe you have a trace of the execution? Or who said you're using Java's built-in list? Maybe you are using your own version-controlled list? Or maybe you're using your own reversible randomize method?
Edwin Buck's answer is great but it all depends what the interviewer was looking for.
