Find difference of two Strings using java-diff-utils, or other? - java

I am looking for an example of java-diff-utils finding the difference between two strings, e.g.:
String s1 = "{\"type\": \"message_typing\",\"requestId\": \"requestid\",\"clientMutationId\": \"mutationid\",\"chatRoomId\": \"roomid\",\"conversationId\": \"conversationid\"}";
String s2 = "{\"type\": \"type_2\",\"requestId\": \"1\",\"clientMutationId\": \"2\",\"chatRoomId\": \"dev/test\",\"conversationId\": \"aa2344ceDsea1\"}";
The first is the base message; the second is the one I would like to compare against the base message to get the differing values (e.g. type_2, 1, 2, dev/test, aa2344ceDsea1). However, I would also like to be able to reassemble the second string correctly when given only the base message and the diff values.
I can only find examples online that use two files, and none that use two strings; a code example would be very helpful. I have already done this using google-diff-match-patch, but the returned diffs are too large for what I need: I will be sending the diffs over MQTT and want to keep the payload size down. So I need something that can extract the values while still allowing the message to be reassembled on the other side from those values and the base message.
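One possible direction, sketched below, avoids a general-purpose diff library altogether: it does not use java-diff-utils. It assumes both messages are flat JSON objects with the same keys in the same order, that every value is a string, and that no value contains an escaped quote. The class name ValueDiff and its methods are hypothetical names chosen for illustration. Under those assumptions, only the values need to be sent, and the receiver substitutes them into the base message.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts string values from a flat JSON object and
// rebuilds a message by substituting values into a base message.
class ValueDiff {
    // group 1: "key": "   group 2: the value   group 3: closing quote
    // Assumption: values are strings without escaped quotes.
    private static final Pattern PAIR =
            Pattern.compile("(\"[^\"]*\"\\s*:\\s*\")([^\"]*)(\")");

    // Collects the values of the message in document order.
    static List<String> extractValues(String json) {
        List<String> values = new ArrayList<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    // Replaces the base message's values, in order, with the given values.
    // If fewer values are supplied, the remaining base values are kept.
    static String rebuild(String base, List<String> values) {
        Iterator<String> it = values.iterator();
        Matcher m = PAIR.matcher(base);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = it.hasNext() ? it.next() : m.group(2);
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + value + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String s1 = "{\"type\": \"message_typing\",\"requestId\": \"requestid\"}";
        String s2 = "{\"type\": \"type_2\",\"requestId\": \"1\"}";
        List<String> payload = extractValues(s2);   // what would be sent
        System.out.println(payload);
        System.out.println(rebuild(s1, payload).equals(s2));
    }
}
```

This sends every value, changed or not. If most values usually stay the same, the payload could instead carry index/value pairs for the changed entries only, at the cost of a slightly more involved format.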

Related

How to mask a specific value without knowing exact key, within a JSON string, in Java

I am receiving JSON in the form of a string and need to mask a piece of information. However, the JSON structure and key names are always different, while the value's pattern is recognizable. What is an efficient way to traverse the String/JSONObject to mask that piece of data?
I've tried turning the String into a JSONObject, traversing every embedded JSONObject/JSONArray, detecting the pattern, and replacing the original value with its masked version. But this seems very time-consuming when logging this information to the console.
For reference, the value's pattern is a 9-digit (Long) number.
The structure varies, from "{"key1":[{"innerKey1":123456789}]}" to "{"key1":"value1", "key2":{"innerKey1":123456789}}".
Sample result : "{"key1":[{"innerKey1":"XXXXXX789"}]}"
If the JSON is always provided as a compact single-line string, you could just find the value in the string and replace it, or go further and use a regular expression to find the innerKey1:12345 match and replace that.
If this is just for logging purposes, you can even implement this as a filter; depending on your logging framework, it might even be configurable rather than something you have to code.
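The regular-expression idea could look roughly like the sketch below. It assumes the sensitive value always appears as an unquoted run of exactly nine digits and that the last three digits may stay visible, as in the sample result. JsonMasker and mask are hypothetical names; the lookarounds keep longer digit runs and decimals from matching.

```java
import java.util.regex.Pattern;

// Hypothetical helper: masks any standalone 9-digit number in a JSON string
// without parsing the JSON, keeping only the last three digits.
class JsonMasker {
    // Exactly nine digits, not preceded or followed by a digit or a dot.
    private static final Pattern NINE_DIGITS =
            Pattern.compile("(?<![\\d.])\\d{6}(\\d{3})(?![\\d.])");

    static String mask(String json) {
        // The replacement is a quoted string: "XXXXXX" plus the last 3 digits.
        return NINE_DIGITS.matcher(json).replaceAll("\"XXXXXX$1\"");
    }

    public static void main(String[] args) {
        System.out.println(mask("{\"key1\":[{\"innerKey1\":123456789}]}"));
    }
}
```

Because this works on the raw text, it would also mask a 9-digit number that happens to appear inside a longer string value, which may or may not be acceptable for logging.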

Expected equal JSON strings are not equal when compared using assertEquals

I wrote a JUnit test for a web service client that submits a JSON document to the service.
I saved the "correct" JSON document to a file, and after the test executes I compare it with the actual result.
They do not match, although the lines look identical:
org.junit.ComparisonFailure:
Expected :{"Callback":null,"Data":
{"MarketCode":"ISEM",,............"Price":2.99}]}]}]}]}}
Actual :{"Callback":null,"Data":
{"MarketCode":"ISEM",,............"Price":2.99}]}]}]}]}}
The lines are very long, about 4K characters, so I cut much of them here, but their lengths are identical. I compared the string lengths in the debugger, and I also trim the strings before comparing them, to remove any invisible or whitespace characters at the end that a text editor might implicitly insert.
Also, the test passes when executed in isolation, but it fails when I run it as part of a bigger suite.
There are no global/static variables, so overwritten memory should not be an issue.
I'm mocking the web service client to extract the request string, like this:
StringBuilder pd = new StringBuilder();
doAnswer((invocation) -> {
    String postDocument = ((String) invocation.getArguments()[0]).trim();
    pd.append(postDocument);
    return null;
}).when(client).doPost(anyString(), anyObject());
client is a mocked class.
Then I compare trimmed versions of the strings, but it doesn't help:
String expectedSubmit = TestUtils.readXmlFromFile("strategyexecution\\ireland_bm_strategy_override_expected.json").trim();
assertEquals(expectedSubmit, pd.toString().trim());
I found the answer myself :-)
The problem is with the JSON specification itself.
JSON does not guarantee the order of the members inside an object; an object is basically an unordered set of name/value pairs (arrays, by contrast, are ordered).
So the content can be reordered between runs. Two generated JSON documents should not be compared as strings.
I deserialized the JSON to a Java object, and object comparison works!
Same old issue as we had with XML. For XML there is XMLUnit, which compares XML documents semantically. For JSON I'd try a similar tool, such as JsonUnit. JSONAssert looks promising too.
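The effect can be illustrated without any JSON library. The sketch below uses plain LinkedHashMap instances as a stand-in for two deserialized JSON objects (an assumption made for illustration; a real test would use a JSON parser or one of the tools named above). The same members inserted in a different order produce different text but compare equal as objects. JsonOrderDemo and record are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo: the same members in a different order give different
// strings, but equal maps.
class JsonOrderDemo {
    // Builds the same two members, in one of two insertion orders.
    static Map<String, Object> record(boolean priceFirst) {
        Map<String, Object> m = new LinkedHashMap<>();
        if (priceFirst) {
            m.put("Price", 2.99);
            m.put("MarketCode", "ISEM");
        } else {
            m.put("MarketCode", "ISEM");
            m.put("Price", 2.99);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Object> a = record(false);
        Map<String, Object> b = record(true);
        // Textual comparison depends on member order.
        System.out.println(a.toString().equals(b.toString())); // false
        // Map equality ignores order and compares entries only.
        System.out.println(a.equals(b));                       // true
    }
}
```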

How to generate all possible sentence from given tokens in Java

I am trying to generate all possible sentences from given tokens for a transliteration program. Each token has several possible transliterations, and I want to generate all possible sentences. For example, if the sentence is token1 token2 token3, and token1 can be represented in 3 ways after transliteration, token2 in 2 ways, and token3 in 4 ways, then there are 24 possible sentences in total. I built a general tree and then performed a depth-first traversal to generate all possible sentences. The problem is that when the sentence becomes long, the number of possibilities grows and I get a "java.lang.OutOfMemoryError: Java heap space" error.
Is there any other way to generate all possible sentences? In some cases I need to generate millions of sentences. Please help!
You can't generate them all at once like that.
Depending on what you need them for, you should either do whatever that is as each one is generated, or write them to a file.
Another thought, which still might not work, would be to not store every possible value but instead store a set of references/relationships. You can make this much more complex with n-grams and Markov chains, or simply keep a set of references, or even just a list of array indexes.
So besides using storage space as a memory buffer, you can turn the flow around: instead of foo calling gen for the full set, have gen call foo after each one is generated.
[EDIT: looking back on this (I was interested to see any other answers), I want to clarify that the function foo is whatever you're using them for and the function gen generates them (just in case it isn't clear, and especially for anyone whose first language isn't English)]
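A minimal sketch of "gen calls foo" is shown below. Here gen is the recursive generate method and foo is whatever Consumer the caller passes in; SentenceGenerator and its parameter names are hypothetical. Only the sentence currently being built is held in memory, so the heap use depends on the sentence length rather than on the number of combinations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical generator: hands each sentence to a callback as soon as it is
// complete instead of collecting all sentences in memory.
class SentenceGenerator {
    // options.get(i) holds the possible transliterations of token i.
    static void generate(List<List<String>> options, Consumer<String> sink) {
        generate(options, 0, new StringBuilder(), sink);
    }

    private static void generate(List<List<String>> options, int index,
                                 StringBuilder prefix, Consumer<String> sink) {
        if (index == options.size()) {
            sink.accept(prefix.toString());   // "gen calls foo"
            return;
        }
        int mark = prefix.length();
        for (String choice : options.get(index)) {
            if (mark > 0) {
                prefix.append(' ');
            }
            prefix.append(choice);
            generate(options, index + 1, prefix, sink);
            prefix.setLength(mark);           // backtrack to try the next choice
        }
    }

    public static void main(String[] args) {
        List<List<String>> options = Arrays.asList(
                Arrays.asList("a1", "a2", "a3"),
                Arrays.asList("b1", "b2"),
                Arrays.asList("c1", "c2", "c3", "c4"));
        long[] count = {0};
        generate(options, sentence -> count[0]++);   // count without storing
        System.out.println(count[0]);                // 3 * 2 * 4 = 24
    }
}
```

The callback could just as well write each sentence to a file or score it and keep only the best few.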

Optimized way of doing string.endsWith() work.

I need to look at all web requests received by the application server to check whether the URL has an extension such as .css, .gif, etc.
I looked at how Tomcat handles every request and picks the right configured servlet to serve it:
CharChunk, MessageBytes, Mapper
Here is my idea for the implementation:
Load all the extensions we want to compare against and get the byte representation of each.
Get a unique value for each extension by summing up the bytes in the byte array // eg: "css".getBytes()
Add the resulting value to a sorted list
Whenever we receive a request, get the byte representation of the URL // eg: "flipkart.com/eshopping/images/theme.css".getBytes()
Start summing the bytes from the byte array's last index and break when we encounter the "." (dot) byte value
Search the sorted list for the value thus summed // Use binary search here
Kindly give your feedback on the implementation and point out any issues.
-With thanks, Krishna
This sounds way more complicated than it needs to be.
Use String.lastIndexOf to find the last dot in the URL
Use String.substring to get the extension based on that
Have a HashSet<String> for a set of supported extensions, or a HashMap<String, Whatever> if you want to map the extension to something else
I would be absolutely shocked to discover that this simple approach turned out to be a performance bottleneck - and indeed I suspect it would be more efficient than the approach you suggested, given that it doesn't require the entire URL to be converted into a byte array... (It's not clear why your approach uses byte arrays anyway instead of forming the hash from char values.)
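The three steps above can be sketched as follows. The extension list is an example only, and ExtensionFilter and isStaticResource are hypothetical names. The sketch lower-cases the extension and does not handle query strings or fragments, which a real filter would probably need to strip first.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical filter: checks a URL's extension against a fixed set.
class ExtensionFilter {
    // Example extensions only; load the real list from configuration.
    private static final Set<String> STATIC_EXTENSIONS =
            new HashSet<>(Arrays.asList("css", "gif", "js", "png"));

    static boolean isStaticResource(String url) {
        int dot = url.lastIndexOf('.');
        if (dot < 0 || dot == url.length() - 1) {
            return false;                       // no dot, or nothing after it
        }
        String extension = url.substring(dot + 1).toLowerCase(Locale.ROOT);
        return STATIC_EXTENSIONS.contains(extension);
    }

    public static void main(String[] args) {
        System.out.println(isStaticResource("flipkart.com/eshopping/images/theme.css"));
        System.out.println(isStaticResource("flipkart.com/eshopping/cart"));
    }
}
```

Unlike summing bytes, the set lookup cannot confuse two extensions whose bytes happen to add up to the same value (for example, "abc" and "cba").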
Fundamentally, my preferred approach to performance is:
Do up-front design and testing around things which are hard to change later, architecturally
For everything else:
Determine the performance criteria first so you know when you can stop
Write the simplest code that works
Test it with realistic data
If it doesn't perform well enough, use profilers (etc.) to work out where the bottleneck is, and optimize that, making sure that you can prove the benefits using your existing tests

Is StringBuffer the same as Strings in Ruby and Symbols the same as regular Java strings?

I just started reading the book Eloquent Ruby, and I have reached the chapter about Symbols in Ruby.
Strings in Ruby are mutable, which means each string allocates its own memory, since the content can change, even when the content is equal. If I need a mutable string in Java, I would use StringBuffer. However, since regular Java Strings are immutable, one String object can be shared by multiple references. So if I had two regular Strings with the content "Hello World", both references would point to the same object.
So is the purpose of Symbols in Ruby actually the same as "normal" String objects in Java? Is it a feature given to the programmer to optimize memory?
Is any of what I have written here true? Or have I misunderstood the concept of Symbols?
Symbols are close to strings in Ruby, but they are not equivalent to regular Java strings, although the two do share some commonalities, such as immutability. But there is a slight difference: there is more than one way to obtain a reference to a Symbol (more on that later on).
In Ruby, it is entirely possible to convert the two back and forth. There is String#to_sym to convert a String into a Symbol and there is Symbol#to_s to convert a Symbol into a String. So what is the difference?
To quote the RDoc for Symbol:
The same Symbol object will be created for a given name or string for the duration of a program's execution, regardless of the context or meaning of that name.
Symbols are unique identifiers. If the Ruby interpreter stumbles over let's say :mysymbol for the first time, here is what happens: Internally, the symbol gets stored in a table if it doesn't exist yet (much like the "symbol table" used by parsers; this happens using the C function rb_intern in CRuby/MRI), otherwise Ruby will look up the existing value in the table and use that. After the symbol gets created and stored in the table, from then on wherever you refer to the Symbol :mysymbol, you will get the same object, the one that was stored in that table.
Consider this piece of code:
sym1 = :mysymbol
sym2 = "mysymbol".to_sym
puts sym1.equal?(sym2) # => true, one and the same object
str1 = "Test"
str2 = "Test"
puts str1.equal?(str2) # => false, not the same object
to notice the difference. It illustrates the major difference between Java Strings and Ruby Symbols. If you want object equality for Strings in Java you will only achieve it if you compare exactly the same reference of that String, whereas in Ruby it's possible to get the reference to a Symbol in multiple ways as you saw in the example above.
The uniqueness of Symbols makes them perfect keys in hashes: the lookup performance is improved compared to regular Strings since you don't have to hash your key explicitly as it would be required by a String, you can simply use the Symbol's unique identifier for the lookup directly. By writing :somesymbol you tell Ruby to "give me that one thing that you stored under the identifier 'somesymbol'". So symbols are your first choice when you need to uniquely identify things as in:
hash keys
naming or referring to variable, method and constant names (e.g. obj.send :method_name )
But, as Jim Weirich points out in the article below, Symbols are not Strings, not even in the duck-typing sense. You can't concatenate them or retrieve their size or get substrings from them (unless you convert them to Strings first, that is). So the question when to use Strings is easy - as Jim puts it:
Use Strings whenever you need … umm … string-like behavior.
Some articles on the topic:
Ruby Symbols.
Symbols are not immutable strings
13 Ways of looking at a Ruby Symbol
The difference is that Java Strings need not point to the same object if they contain the same text. When you declare constant strings in your code, they normally do, since the compiler puts them in the constant pool.
However, if you create a String dynamically at runtime in Java, two Strings can point to different objects and still contain the same text. You can, however, force them to share one object by interning the String objects (calling String.intern(); see the Java API).
A nice example can be found here.
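For reference, here is a small sketch of that behaviour. InternDemo and sameObject are hypothetical names; the sketch relies only on String literals, new String(...), and String.intern().

```java
// Hypothetical demo of reference identity for Java Strings.
class InternDemo {
    // True only if both references point to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal1 = "Hello World";
        String literal2 = "Hello World";
        String runtime = new String("Hello World"); // explicitly a new object

        System.out.println(sameObject(literal1, literal2));         // true: both come from the constant pool
        System.out.println(sameObject(literal1, runtime));          // false: different objects
        System.out.println(literal1.equals(runtime));               // true: same content
        System.out.println(sameObject(literal1, runtime.intern())); // true: intern() returns the pooled object
    }
}
```

This is the closest Java analogue to the Ruby snippet above: an interned String behaves like a Symbol with respect to identity, but interning is opt-in rather than automatic for Strings created at runtime.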
