I try to achieve the following:
I want to verify that a specific node in the body ("entry") has between a minimum and a maximum number of direct subnodes called "content" (1 to 10 nodes). I don't care what is in these subnodes, which values they have, or how many subnodes they themselves have.
Since I'm new to Pact and don't really know how to match something like this, I hope someone can help me.
Thanks in advance.
Edit1:
I use a node matcher one of my colleagues built, like the following:
return builder.given("given").uponReceiving("description").path(SERVICEPATH)
.query(query).method(METHOD_GET).headers(ACCEPT, REQ_ACCEPT).willRespondWith().status(200)
.matchHeader(HEADER_CONTENT_TYPE, RESP_HEADER_CONTENT_TYPE_REGEX).body(responseBody)
.matchBody(new PactDslXmlNodeMatcher().node("entry").node("content").value(".*")).toPact();
Don't let that confuse you: .matchBody just converts the node to the typical $. + path + '[#text]' notation, where .value adds a regex matcher rule to the body matchers.
I also had a look at https://github.com/DiUS/pact-jvm/tree/master/pact-jvm-consumer-junit, where maxArrayLike / minArrayLike look promising, but I don't know how to apply them to my case.
Edit2:
Now I have a very cool PactDslWithProvider, as follows:
return builder.given("Genereller Test").uponReceiving("Feed soll Eintraege enthalten").path(SERVICEPATH)
.query(query).method(METHOD_GET).headers(ACCEPT, REQ_ACCEPT).willRespondWith().status(200)
.matchHeader(HEADER_CONTENT_TYPE, RESP_HEADER_CONTENT_TYPE_REGEX)
.body(responseBody)
.matchBody(new PactDslXmlNodeMatcher().node("feed").minMaxType(minNumberNodesInFeed, maxNumberNodesInFeed))
.toPact();
The method "minMaxType" adds a MinMaxTypeMatcher to the body-category with the path of the node.
The actual behaviour of this:
It matches the type, min and max of the innermost nodes of $.feed, like $.feed.0.title, $.feed.1.link, ..., $.feed.6.entry.0.title.
But what I actually want is to verify that $.feed has a min and a max number of subnodes. How can I achieve that?
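For reference, here is a sketch (my assumption based on the v2 Pact specification's matching-rules format, not the output of your colleague's DSL) of what a single type matcher with min and max registered at the parent path itself, rather than at the leaf paths beneath it, would serialize to in the pact file:

```json
{
  "matchingRules": {
    "$.body.feed": {
      "match": "type",
      "min": 1,
      "max": 10
    }
  }
}
```

Whether a combined min/max rule at a non-leaf path is honoured depends on the pact-jvm version; the MinMaxTypeMatcher your colleague's method adds is a pact-jvm model class, so inspecting the generated pact file is the quickest way to see which path the rule actually lands on.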
It sounds like you're trying to use Pact to do a functional test, rather than a contract test. Generally speaking, you shouldn't be writing tests that care about how many items there are in an array. Have a read of these two links and let me know how you go.
https://docs.pact.io/best-practices/consumer#use-pact-for-contract-testing-not-functional-testing-of-the-provider
https://docs.pact.io/best-practices/consumer/contract-tests-vs-functional-tests
Also, join us on slack.pact.io if you haven't already.
I need to create a custom Flowable with backpressure implemented. I'm trying to achieve some sort of paging. That means when downstream requests 5 items I will "ask the data source" for items 0 - 5. Then when downstream needs another 5, I'll get items 5 - 10 and emit back.
The best thing I've found so far is the Flowable.generate method, but I really don't understand why there is no way (as far as I know) to get the number of items the downstream is requesting. I can use the state property of the generator to save the index of the last items requested, so I only need the number of newly requested items. The emitter instance I get in BiFunction.apply is a GeneratorSubscription, which extends AtomicLong, so casting the emitter to AtomicLong gets me the requested number. But I know this can't be the "recommended" way.
On the other hand, when you use Flowable.create you get a FlowableEmitter, which has a long requested() method. Using generate suits my use-case better, but now I'm also curious what the "correct" way to use Flowable.generate is.
Maybe I'm overthinking the whole thing so please point me in the right direction. :) Thank you.
This is what the actual code looks like (in Kotlin):
Flowable.generate(Callable { 0 }, BiFunction { start /* state */, emitter ->
    val requested = (emitter as AtomicLong).get().toInt() // this is bull*hit
    val end = start + requested
    // get items [start to end] -> items
    emitter.onNext(items)
    end /* return the new state */
})
Ok, I found out that the apply function of the BiFunction is called as many times as the requested amount (n), so there's no reason to have a getter for it. It's not what I hoped for, but that is apparently how generate works. :)
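That contract (the framework calls the generator once per requested item, and each call may emit at most one item) can be modeled in plain Python, purely as an illustration of the mechanics, not RxJava itself:

```python
def generate(initial_state, generator, n_requested):
    """Toy model of Flowable.generate's contract (not RxJava):
    the framework invokes the generator once per requested item;
    each invocation emits at most one item and returns the new state."""
    state = initial_state
    emitted = []
    for _ in range(n_requested):          # one call per requested item
        state = generator(state, emitted.append)
    return emitted, state

def paged(start, emit):
    # "fetch" exactly the one item at index `start`, then advance the state
    emit(start)
    return start + 1

items, state = generate(0, paged, 5)
assert items == [0, 1, 2, 3, 4]
assert state == 5
```

So a paging generator emits one page element per call and keeps the next index in the state; the framework's loop, not the generator, accounts for n.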
I'm working on implementing probabilistic matching for person record searching. As part of this, I plan to have blocking performed before any scoring is done. Currently, there are a lot of good options for transforming strings so that they can be stored and then searched for, with similar strings matching each other (things like Soundex, Metaphone, etc.).
However, I've struggled to find something similar for purely numeric values. For example, it would be nice to be able to block on a social security number and not have numbers that are off by one digit, or that have transposed digits, removed from the results. 123456789 should produce blocking results for 123456780 or 213456789.
Now, there are certainly ways to compare two numerical values to determine how similar they are, but what could I do when there are millions of numbers in the database? It's obviously impractical to compare them all (and that would certainly defeat the point of blocking).
What would be nice would be something where those three SSNs above could somehow be transformed into some other value that would be stored. Purely for example, imagine those three numbers ended up as AAABBCCC after this magical transformation. However, something like 987654321 would be ZZZYYYYXX and 123547698 would be AAABCCBC or something like that.
So, my question is, is there a good transformation for numeric values like there exists for alphabetical values? Or, is there some other approach that might make sense (besides some highly complex or low performing SQL or logic)?
The first thing to realize is that social security numbers are basically strings of digits. You really want to treat them like you would strings rather than numbers.
The second thing to realize is that your blocking function maps from a record to a list of strings that identify comparison worthy sets of items.
Here is some Python code to get you started. (I know you asked for Java, but I think the Python is clear and you aren't paying me enough to write it in Java :P ). The basic idea is to take your input record, simulate roughing it up in multiple ways (to get your blocking keys), and then group by any match on those blocking keys.
import itertools

def transpositions(s):
    for pos in range(len(s) - 1):
        yield s[:pos] + s[pos + 1] + s[pos] + s[pos + 2:]

def substitutions(s):
    for pos in range(len(s)):
        yield s[:pos] + '*' + s[pos + 1:]

def all_blocks(s):
    return itertools.chain([s], transpositions(s), substitutions(s))

def are_blocked_candidates(s1, s2):
    return bool(set(all_blocks(s1)) & set(all_blocks(s2)))

assert not are_blocked_candidates('1234', '5555')
assert are_blocked_candidates('1234', '1239')
assert are_blocked_candidates('1234', '2134')
assert not are_blocked_candidates('1234', '1255')
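To keep lookups practical over millions of stored numbers, the same keys can feed an inverted index, so each query touches only the records that share at least one key. A sketch of that grouping step (the key functions are repeated so the snippet stands alone):

```python
import itertools
from collections import defaultdict

def transpositions(s):
    for pos in range(len(s) - 1):
        yield s[:pos] + s[pos + 1] + s[pos] + s[pos + 2:]

def substitutions(s):
    for pos in range(len(s)):
        yield s[:pos] + '*' + s[pos + 1:]

def all_blocks(s):
    return itertools.chain([s], transpositions(s), substitutions(s))

def build_block_index(ssns):
    """Map every blocking key to the set of SSNs that produce it."""
    index = defaultdict(set)
    for ssn in ssns:
        for key in all_blocks(ssn):
            index[key].add(ssn)
    return index

def candidates_for(ssn, index):
    """All stored SSNs sharing at least one blocking key with `ssn`."""
    found = set()
    for key in all_blocks(ssn):
        found |= index.get(key, set())
    return found
```

In a SQL setting the index corresponds to a (blocking_key, record_id) table with an index on blocking_key, so the candidate lookup is a handful of equality queries rather than a full scan.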
Suppose I input some dataset to WEKA and set a normalization filter so the attribute values lie between 0 and 1, the normalization is done by dividing by the maximum value, and then the model is built. What happens if I deploy the model and, among the new instances to be classified, an instance has a feature value larger than the maximum in the training set? How is such a situation handled? Does it just take 1, does it take more than 1, or does it throw an exception?
The documentation doesn't specify this for filters in general, so it must depend on the filter. I looked at the source code of weka.filters.unsupervised.attribute.Normalize, which I assume you are using, and I don't see any bounds checking in it.
The actual scaling code is in the Normalize.convertInstance() method:
value = (vals[j] - m_MinArray[j]) / (m_MaxArray[j] - m_MinArray[j])
* m_Scale + m_Translation;
Barring any (unlikely) additional checks outside this method, I'd say that it will scale to a value greater than 1 in the situation you describe. To be 100% sure, your best bet is to write a test case, invoke the filter yourself, and find out. With libraries that haven't specified their behaviour in the Javadoc, you never know what the next release will do, so if you depend heavily on a particular behaviour, it's not a bad idea to write an automated test that regression-tests the behaviour of the library.
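The implication is easy to check from the formula alone. Here it is mirrored in plain Python (this is not Weka itself; scale = 1.0 and translation = 0.0 are what I assume the filter's defaults to be):

```python
def normalize(value, min_, max_, scale=1.0, translation=0.0):
    # Same arithmetic as the quoted Normalize.convertInstance() line,
    # with no bounds check on the result
    return (value - min_) / (max_ - min_) * scale + translation

assert normalize(5.0, 0.0, 10.0) == 0.5    # in-range training-like value
assert normalize(15.0, 0.0, 10.0) == 1.5   # unseen value above the training max
```

So an unseen value above the training maximum simply scales past 1 rather than being clipped or raising an exception.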
I had the same question. I did the following; maybe this method can help you:
I suppose you use weka.filters.unsupervised.attribute.Normalize to normalize your data.
As Erwin Bolwidt said, Weka uses
value = (vals[j] - m_MinArray[j]) / (m_MaxArray[j] - m_MinArray[j])
* m_Scale + m_Translation;
to normalize your attributes.
Don't forget that the Normalize class has these two methods:
public double[] getMinArray()
public double[] getMaxArray()
which return the calculated minimum/maximum values for the attributes in the data.
You can store those minimum/maximum values, and then use the formula to normalize your data yourself.
Remember that you can set the attribute values on the Instance class, and you can classify your result with Evaluation.evaluationForSingleInstance.
I'll give you the link later; maybe this helps you.
Thank you.
I have a question regarding Lucene/Solr.
I am trying to solve a general (company) name matching problem.
Let me present one oversimplified example:
We have two (possibly large) lists of names viz., list_A and list_B.
We want to find the intersection of the two lists, but the names in the two lists may not always exactly match. For each distinct name in list_A, we will want to report one or more best matches from list_B.
I have heard that Lucene/Solr can solve this problem. Can you tell me if this is true? If it is, please point me to some minimal working example(s).
Thanks and regards,
Dibyendu
You could solve this with Lucene, yes, but if you just need to solve this one problem, creating a Lucene index would be a bit of a roundabout way to do it.
I'd be more inclined to take a simpler approach: find a library for fuzzy comparison between strings, iterate through your lists, and report as matches only those pairs whose distance falls under a certain threshold.
org.apache.commons.lang3.StringUtils comes to mind, something like:
for (String a : alist) {
    for (String b : blist) {
        int dist = StringUtils.getLevenshteinDistance(a, b);
        if (dist < threshold) {
            // b is a good enough match for a, do something with it!
        }
    }
}
Depending on your intent, other algorithms might be more appropriate (Soundex or Metaphone, for instance).
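If no utility library is at hand, the distance itself is a short dynamic program. Here is the same threshold idea in stdlib-only Python (an illustrative sketch, not the commons-lang implementation; the function names are mine):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic program, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def close_matches(alist, blist, threshold):
    """All (a, b) pairs whose edit distance is under the threshold."""
    return [(a, b) for a in alist for b in blist
            if levenshtein(a, b) < threshold]

assert levenshtein("kitten", "sitting") == 3
```

Note this is still an O(len(alist) * len(blist)) pairwise loop, which is fine for modest lists but is exactly what blocking or an index (as in the Solr answer) would avoid at scale.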
Solr can solve your problem. Index list_B in Solr, then run a search for every item in list_A; you will get one or more likely matches from list_B.
You need to configure analyzers and filters for the field according to your data set and the kind of similar results you want.
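As a rough illustration (the field type name is made up, and the tokenizer and encoder should be tuned to your data), a phonetic field type in schema.xml might look like:

```xml
<fieldType name="name_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>
```

With inject="true" the original tokens are kept alongside the phonetic codes, so exact matches still score higher than merely similar-sounding ones.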
I am trying to do something similar, and I would like to point out to the other commenters that their proposed solutions (like Levenshtein distance or Soundex) may not be appropriate if the problem is matching accurately spelled but differently formed names, as opposed to misspelled names.
For example: I doubt either one is much use for matching
John S W Edward
with
J Samuel Woodhouse Edward
I suppose it is possible, but this is a different class of problem than what they were intended to accomplish.
I am trying to match a URL using the following regex, in Java
^http(s*):\\/\\/.+:[1-65535]/v2/.+/component/.+$
Test fails using URL: https://box:1234/v2/something/component/a/b
I suspect it's the number range that's causing it. Help me understand what I am missing here, please.
See http://www.regular-expressions.info/numericranges.html. You can't just write [1-65535] to match numbers from 1 to 65535: a character class matches a single character, so that class matches one digit from 1 to 6 (the extra 5, 5, 3, 5 are redundant).
The expression you need is quite verbose, in this case:
([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])
(Credit to http://utilitymill.com/utility/Regex_For_Range)
Another issue is your http(s*). That needs to be https? because in its current form it might allow httpsssssssss://. If your regex takes public input, this is a concern.
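A quick check of the combined expression (Python here just to exercise the pattern; the doubled backslashes from the Java string literal are dropped, and / needs no escaping in either flavor):

```python
import re

# Port range 1-65535 spelled out as the verbose alternation above
PORT = (r"([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}"
        r"|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])")
URL_RE = re.compile(r"^https?://.+:" + PORT + r"/v2/.+/component/.+$")

assert URL_RE.match("https://box:1234/v2/something/component/a/b")
assert URL_RE.match("http://host:65535/v2/x/component/y")
assert not URL_RE.match("https://box:65536/v2/x/component/y")
assert not URL_RE.match("httpss://box:1234/v2/x/component/y")
```

The original failing test URL now matches, out-of-range ports are rejected, and the https? fix blocks the runaway "s" repetition.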
^http(s*) is wrong, it would allow httpssssss://...
You need ^https?
This doesn't affect the given test though.
The class [1-65535] basically means a single digit from 1 to 6 (the 5, 5, 3 and 5 are redundant).
That would even match, but you need a + (or *) after the class to allow multi-digit ports.
To match the port more precisely you could use [1-6][0-9]{0,4}?. That gets you really close, but also allows e.g. 69999 - {m,n} specifies how often the preceding element may repeat (m to n times), and the trailing ? makes it non-greedy.
Also take care of that (s*) thing the others pointed out!
That would result in:
^https?:\\/\\/.+:[1-6][0-9]{0,4}?/v2/.+/component/.+$
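Checking this variant in Python (escaping adjusted from the Java string literal) also demonstrates the caveat mentioned above:

```python
import re

# Simpler port pattern: close to the 1-65535 range, but not exact
loose = re.compile(r"^https?://.+:[1-6][0-9]{0,4}?/v2/.+/component/.+$")

assert loose.match("https://box:1234/v2/something/component/a/b")
assert loose.match("https://box:69999/v2/x/component/y")   # out of range, but accepted
assert not loose.match("https://box:70000/v2/x/component/y")
```

So this version trades a small amount of precision (ports 65536-69999 slip through) for a much shorter expression.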