Regex to validate url with port, clarification needed - java

I am trying to match a URL using the following regex, in Java
^http(s*):\\/\\/.+:[1-65535]/v2/.+/component/.+$
Test fails using URL: https://box:1234/v2/something/component/a/b
I suspect it's the number range that's causing it. Can you help me understand what I am missing here?

See http://www.regular-expressions.info/numericranges.html. You can't just write [1-65535] to match a number between 1 and 65535. A character class matches a single character, so that says: any digit 1-6, or 5, or 3.
The expression you need is quite verbose in this case:
([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])
(Credit to http://utilitymill.com/utility/Regex_For_Range)
Another issue is your http(s*). That needs to be https? because in its current form it also allows httpsssssssss://. If your regex takes public input, this is a concern.
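Putting both fixes together, a minimal sketch could look like the following. The class name UrlPortCheck and the method isValid are made up for illustration; the port alternation is the one quoted above, and the rest of the pattern is the asker's original with https? substituted.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class UrlPortCheck {
    // Alternation for 1-65535, as quoted in the answer above.
    static final String PORT =
        "([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}"
        + "|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])";

    // The asker's pattern with "https?" and the port alternation.
    static final Pattern URL =
        Pattern.compile("^https?://.+:" + PORT + "/v2/.+/component/.+$");

    static boolean isValid(String url) {
        // matches() requires the whole input to match the pattern.
        return URL.matcher(url).matches();
    }
}
```

With this sketch, UrlPortCheck.isValid("https://box:1234/v2/something/component/a/b") should accept the URL from the question, while a port of 0 or 65536 or a scheme like httpsss should be rejected.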

^http(s*) is wrong, it would allow httpssssss://...
You need ^https?
This doesn't affect the given test though.

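The character class [1-65535] basically means: one character that is a digit from 1 to 6, or 5, or 5, or 3, or 5.
It would even compile, but it only matches a single digit, so you need a + (or *) after the class to match more than one.
To match the port more closely you could use [1-6][0-9]{0,4}?. That gets you close, but it also allows e.g. 69999, and it rejects ports that start with 7, 8 or 9, such as 8080. The {m,n} part specifies how often the preceding element may repeat (m to n times); the trailing ? only makes that quantifier lazy, which is harmless here but not needed.
Also take care of that (s*) thing the others pointed out!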
That would result in:
^https?:\\/\\/.+:[1-6][0-9]{0,4}?/v2/.+/component/.+$

Pact body matching - match number of nodes

I'm trying to achieve the following:
I want to verify that a specific node in body ("entry") has between a minimum and a maximum number of direct subnodes called "content" (1 to 10 nodes). I don't care about what is in these subnodes, which value they have or how many subnodes they have.
Since I'm new to pact and don't really know how to match something like this I hope someone can help me.
Thanks in advance.
Edit1:
I use a node matcher that one of my colleagues built, like the following:
return builder.given("given").uponReceiving("description").path(SERVICEPATH)
    .query(query).method(METHOD_GET).headers(ACCEPT, REQ_ACCEPT).willRespondWith().status(200)
    .matchHeader(HEADER_CONTENT_TYPE, RESP_HEADER_CONTENT_TYPE_REGEX).body(responseBody)
    .matchBody(new PactDslXmlNodeMatcher().node("entry").node("content").value(".*")).toPact();
Don't let that irritate you: .matchBody just converts that node to the typical $. + path + '[#text]' notation, and .value adds a regex matcher rule to the body matchers.
I also had a look at: https://github.com/DiUS/pact-jvm/tree/master/pact-jvm-consumer-junit where maxArrayLike / minArrayLike looks promising, but I don't know how I can apply this to my case.
Edit2:
Now I have a very cool PactDslWithProvider as follows:
return builder.given("Genereller Test").uponReceiving("Feed soll Eintraege enthalten").path(SERVICEPATH)
    .query(query).method(METHOD_GET).headers(ACCEPT, REQ_ACCEPT).willRespondWith().status(200)
    .matchHeader(HEADER_CONTENT_TYPE, RESP_HEADER_CONTENT_TYPE_REGEX)
    .body(responseBody)
    .matchBody(new PactDslXmlNodeMatcher().node("feed").minMaxType(minNumberNodesInFeed, maxNumberNodesInFeed))
    .toPact();
The method "minMaxType" adds a MinMaxTypeMatcher to the body category with the path of the node.
The actual behaviour of this:
It matches type, min and max of the innermost nodes of $.feed, like: $.feed.0.title, $.feed.1.link, ..., $.feed.6.entry.0.title
But what I actually want is to verify that $.feed has a min and a max number of subnodes. How can I achieve that?
It sounds like you're trying to use Pact to do a functional test, rather than a contract test. Generally speaking, you shouldn't be writing tests that care about how many items there are in an array. Have a read of these two links and let me know how you go.
https://docs.pact.io/best-practices/consumer#use-pact-for-contract-testing-not-functional-testing-of-the-provider
https://docs.pact.io/best-practices/consumer/contract-tests-vs-functional-tests
Also, join us on slack.pact.io if you haven't already.

Multiple arithmetic operator store

I have been thinking about how to solve a problem like 1+2*4-5, where the user enters the expression and the program evaluates it. I've read some questions on this site about storing arithmetic operators, and the solutions say to check them with a switch, which can't be applied here. I would be thankful if anybody could suggest an idea of how to do it.
I had a similar exercise not long ago, but there the question stated that the tokens are separated by spaces. So the user input would be 1 + 2 * 4 - 5, and I solved it that way. I will give you some tips but not paste the whole code.
- Read the input as a String.
- Use the String.split() method to divide the String into the pieces you need; they will be put in an array (in this case: strArray[0]="1", strArray[1]="+", etc.).
- You will need a for-loop to go through every String in the array:
- The numbers will need to be converted to integers with the Integer.parseInt() method.
- The + - * / will need to be handled in a switch statement.
(Be careful how you construct your loop; think about how many times you want to go through it and what you need in each iteration.)
I hope these tips help.
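The tips above can be sketched roughly as follows. This is one possible reading, not the answerer's actual code: the class name ArithmeticEval is invented, the input is assumed to be well-formed and space-separated, and the way precedence is handled (keeping a running term for * and /) is an addition the tips do not spell out.

```java
// Hypothetical sketch; assumes tokens separated by whitespace, integers only.
class ArithmeticEval {
    static int evaluate(String input) {
        String[] tokens = input.trim().split("\\s+");
        int total = 0;                           // sum of finished terms
        int term = Integer.parseInt(tokens[0]);  // current product/quotient

        // Tokens alternate: number, operator, number, operator, ...
        for (int i = 1; i + 1 < tokens.length; i += 2) {
            int operand = Integer.parseInt(tokens[i + 1]);
            switch (tokens[i]) {
                case "+": total += term; term = operand;  break;
                case "-": total += term; term = -operand; break;
                case "*": term *= operand; break;
                case "/": term /= operand; break;  // integer division
                default:
                    throw new IllegalArgumentException(
                        "Unknown operator: " + tokens[i]);
            }
        }
        return total + term;
    }
}
```

For the input 1 + 2 * 4 - 5 this evaluates the multiplication first and returns 4. The sketch does no validation of malformed input and does not support parentheses.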

Transformation algorithms for numerical values similar to functionality of Soundex, Metaphone, etc

I'm working on implementing probabilistic matching for person record searching. As part of this, I plan to have blocking performed before any scoring is done. Currently, there are a lot of good options for transforming strings so that they can be stored and then searched for, with similar strings matching each other (things like Soundex, Metaphone, etc.).
However, I've struggled to find something similar for purely numeric values. For example, it would be nice to be able to block on a social security number and not have numbers that are off or have transposed digits be removed from the results. 123456789 should have blocking results for 123456780 or 213456789.
Now, there are certainly ways to simply compare two numerical values to determine how similar they are, but what could I do when there are millions of numbers in the database? It's obviously impractical to compare them all (and that would certainly defeat the point of blocking).
What would be nice would be something where those three SSNs above could somehow be transformed into some other value that would be stored. Purely for example, imagine those three numbers ended up as AAABBCCC after this magical transformation. However, something like 987654321 would be ZZZYYYYXX and 123547698 would be AAABCCBC or something like that.
So, my question is, is there a good transformation for numeric values like there exists for alphabetical values? Or, is there some other approach that might make sense (besides some highly complex or low performing SQL or logic)?
The first thing to realize is that social security numbers are basically strings of digits. You really want to treat them like you would strings rather than numbers.
The second thing to realize is that your blocking function maps from a record to a list of strings that identify comparison worthy sets of items.
Here is some Python code to get you started. (I know you asked for Java, but I think the Python is clear and you aren't paying me enough to write it in Java :P ). The basic idea is to take your input record, simulate roughing it up in multiple ways (to get your blocking keys), and then group records by any match on those blocking keys.
import itertools

def transpositions(s):
    for pos in range(len(s) - 1):
        yield s[:pos] + s[pos + 1] + s[pos] + s[pos + 2:]

def substitutions(s):
    for pos in range(len(s)):
        yield s[:pos] + '*' + s[pos+1:]

def all_blocks(s):
    return itertools.chain([s], transpositions(s), substitutions(s))

def are_blocked_candidates(s1, s2):
    return bool(set(all_blocks(s1)) & set(all_blocks(s2)))

assert not are_blocked_candidates('1234', '5555')
assert are_blocked_candidates('1234', '1239')
assert are_blocked_candidates('1234', '2134')
assert not are_blocked_candidates('1234', '1255')
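Since the question asked for Java, here is a rough port of the same idea. It is only a sketch: the class and method names (BlockingKeys, allBlocks, areBlockedCandidates) are invented here, and in a real system the keys would presumably be stored in an index rather than recomputed per comparison.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Java port of the Python sketch above.
class BlockingKeys {
    // All blocking keys for s: s itself, every adjacent transposition,
    // and every single-position wildcard substitution.
    static Set<String> allBlocks(String s) {
        Set<String> keys = new HashSet<>();
        keys.add(s);
        for (int pos = 0; pos < s.length() - 1; pos++) {
            keys.add(s.substring(0, pos) + s.charAt(pos + 1)
                     + s.charAt(pos) + s.substring(pos + 2));
        }
        for (int pos = 0; pos < s.length(); pos++) {
            keys.add(s.substring(0, pos) + '*' + s.substring(pos + 1));
        }
        return keys;
    }

    // Two values are candidates if they share at least one blocking key.
    static boolean areBlockedCandidates(String a, String b) {
        Set<String> shared = allBlocks(a);
        shared.retainAll(allBlocks(b));
        return !shared.isEmpty();
    }
}
```

With the SSNs from the question, 123456789 shares the key 12345678* with 123456780 and the key 213456789 with 213456789, so both pairs would end up in the same block.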

Find out next character for a given sequence of characters (logic test style) (Java)

I recently took a logic quiz/test with questions like: What is the next character for the sequence: a,c,b,d,c? Although not complicated I only managed to complete like half of them in the given time limit.
So I would like for my next try to use: either a script built by me or a tool from the Internet.
Do you have any ideas how to approach this using java? Are there any classes that I could use or have to build from scratch? I found a tutorial on Java Regex Pattern & Matcher but I'm pretty sure it's not what I am looking for.
Note: It's always a-z chars & usually sets of 6 (+/-1)
What is the legal alphabet for the sequence? Is it always a-z? If so, then predicting the sequence isn't that difficult. You could map the letters to 1-26 for a reasonable 'guesstimator'.
In this example:
1, 3, 2, 4, 3...
+2, -1, +2, -1...
You really need to qualify the question to determine how much modeling is required to solve the problem.
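As a rough illustration of that 'guesstimator', the sketch below maps letters to numbers, takes the differences between neighbours (modulo 26), looks for the shortest repeating pattern in those differences, and continues it. The class name NextLetter and the repeating-differences assumption are mine; it will not handle quiz sequences built on other rules.

```java
// Hypothetical sketch; assumes lowercase a-z and a repeating step pattern.
class NextLetter {
    static char predictNext(String seq) {
        if (seq.length() < 2) {
            throw new IllegalArgumentException("need at least two letters");
        }
        int n = seq.length() - 1;
        int[] diffs = new int[n];
        for (int i = 0; i < n; i++) {
            // Step between neighbouring letters, wrapped into 0..25.
            diffs[i] = Math.floorMod(seq.charAt(i + 1) - seq.charAt(i), 26);
        }
        // Smallest period with which the steps repeat (n always works).
        int period = 1;
        while (period < n && !repeatsWithPeriod(diffs, period)) {
            period++;
        }
        int nextDiff = diffs[n - period];
        int last = seq.charAt(seq.length() - 1) - 'a';
        return (char) ('a' + Math.floorMod(last + nextDiff, 26));
    }

    static boolean repeatsWithPeriod(int[] diffs, int period) {
        for (int i = period; i < diffs.length; i++) {
            if (diffs[i] != diffs[i - period]) {
                return false;
            }
        }
        return true;
    }
}
```

For the sequence a,c,b,d,c (passed as "acbdc") the steps are +2, -1, +2, -1, so the sketch continues with +2 and predicts e.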
The Simple Problem
In your case, it appears you are picking the nth and (n+2)th letters in turn (modulo the alphabet length) to continually generate the next letters in the sequence. The sequence might be offset by some constant as well. In either case, the exact rule can be worked out by a human and implemented in any language.
However, other comments on your question point out that this problem hints at a full-blown, much more interesting problem which is not easily solved by a human, but rather requires heuristics. This prediction problem is relevant to bioinformaticians and artificial intelligence engineers, where we want to predict the next letter or word (e.g. from a text stream or amino acid sequence) in a string given the preceding word/letter sequence.
The Full-Blown Problem
This is a classic problem in artificial intelligence which requires machine learning.
The particular type of problem would take, as input:
the preceding sequence.
And output:
a single, next character in the sequence.
There is an amino acid predictor on GitHub, which we've designed to deal with this problem using machine learning and which runs in Clojure (see the jayunit100/Rudolf project), if you are interested in a full-blown approach to solving this problem over a 22 amino acid alphabet.

Best way to validate a String against many patterns

This is a question more about best practices/design patterns than regexps.
In short I have 3 values: from, to and the value I want to change. From has to match one of several patterns:
XX.X
>XX.X
>=XX.X
<XX.X
<=XX.X
XX.X-XX.X
Whereas To has to be a decimal number. Depending on what value is given in From I have to check whether a value I want to change satisfies the From condition. For example the user inputs "From: >100.00 To: 150.00" means that every value greater than 100.00 should be changed.
The regexp itself isn't a problem. The thing is if I match the whole From against one regexp and it passes I still need to check which option was inputted - this will generate at least 5 IFs in my code and every time I want to add another option I will need to add another IF - not cool. Same thing if I were to create 5 Patterns.
Now I have a HashMap which holds a pattern as the key and a ValueMatcher as the value. When a user inputs a From value then I match it in a loop against every key in that map and if it matches then I use the corresponding ValueMatcher to actually check if the value that I want to change satisfies the "From" value.
This approach, on the other hand, requires me to have a HashMap with all the possibilities, a ValueMatcher interface and 5 implementations, each with only 1 short "matches" method. I think it sure is better than the IFs, but it still looks like an exaggerated solution.
Is there any other way to do it? Or is this how I actually should do it? I really regret that we can't hold methods in a HashMap or pass them as arguments, because then I'd only have 1 class with all the matching methods and store them in a HashMap.
How about a chain of responsibility?
Each ValueMatcher object holds exactly one From/To rule and a reference to the next ValueMatcher in the chain. Each ValueMatcher has a method which examines a candidate and either transforms it or passes it on to the next in the chain.
This way adding a new rule is a trivial extension, and the controlling code just passes the candidate to the first member of the chain.
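A minimal sketch of such a chain, assuming Java 8 or later, might look like this. The class name ChainedRule and its fields are illustrative, and the From condition is represented as an already parsed DoublePredicate rather than as the raw input string.

```java
import java.util.function.DoublePredicate;

// Hypothetical chain link: one From/To rule plus a reference to the next.
class ChainedRule {
    private final DoublePredicate from;  // condition parsed from "From"
    private final double to;             // replacement value from "To"
    private final ChainedRule next;      // next link, or null at the end

    ChainedRule(DoublePredicate from, double to, ChainedRule next) {
        this.from = from;
        this.to = to;
        this.next = next;
    }

    // Transforms the candidate if this rule applies, otherwise passes it on.
    double apply(double candidate) {
        if (from.test(candidate)) {
            return to;
        }
        return next == null ? candidate : next.apply(candidate);
    }
}
```

A chain for "From: >100.00 To: 150.00" followed by some second rule could then be built as new ChainedRule(v -> v > 100.0, 150.0, new ChainedRule(v -> v < 10.0, 5.0, null)), and the controlling code only ever calls apply on the first link.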
"a ValueMatcher interface and 5 implementations, each with only 1 short 'matches' method. I think it sure is better than the IFs, but it still looks like an exaggerated solution."
Well, for something as simple as evaluating a number against an operator and a limit value, couldn't you just write one slightly more generic ValueMatcher which has a limit value and an operator as its parameters? It would then be pretty easy to add 5 instances of this ValueMatcher with a few combinations of >, >=, etc.
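One way to read that suggestion is a single class that parses the From string into an operator plus limit and then evaluates values against it. The sketch below is an assumption-laden illustration: the name LimitMatcher, the parsing regex, and the choice to treat XX.X-XX.X as an inclusive range are all mine, not taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical generic matcher: one class instead of five implementations.
class LimitMatcher {
    // Optional operator, a number, and an optional "-number" for ranges.
    private static final Pattern FROM = Pattern.compile(
        "(>=|<=|>|<)?(\\d+(?:\\.\\d+)?)(?:-(\\d+(?:\\.\\d+)?))?");

    private final String operator; // ">", ">=", "<", "<=", "=" or "-"
    private final double limit;
    private final double upper;    // only meaningful for ranges

    private LimitMatcher(String operator, double limit, double upper) {
        this.operator = operator;
        this.limit = limit;
        this.upper = upper;
    }

    static LimitMatcher parse(String from) {
        Matcher m = FROM.matcher(from.trim());
        // Reject unparseable input and "operator plus range" combinations.
        if (!m.matches() || (m.group(1) != null && m.group(3) != null)) {
            throw new IllegalArgumentException("Unsupported From: " + from);
        }
        double limit = Double.parseDouble(m.group(2));
        if (m.group(3) != null) {
            return new LimitMatcher("-", limit, Double.parseDouble(m.group(3)));
        }
        String op = m.group(1) == null ? "=" : m.group(1);
        return new LimitMatcher(op, limit, limit);
    }

    boolean matches(double value) {
        switch (operator) {
            case ">":  return value > limit;
            case ">=": return value >= limit;
            case "<":  return value < limit;
            case "<=": return value <= limit;
            case "-":  return value >= limit && value <= upper; // inclusive
            default:   return value == limit;                   // exact value
        }
    }
}
```

The switch is still there, but it lives in one place, and adding an operator means one more alternative in the regex and one more case rather than a new class.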
EDIT: Removed non Java stuff... sorry about that.
