What is the equivalent of the C# Convert.ToInt32 in Java

I am trying to translate some C# code into Java and would like to know what the Java equivalent of System.Convert.ToInt32(char) is:
Converts the value of the specified Unicode character to the
equivalent 32-bit signed integer.
Convert.ToInt32(letter);

"Convert.ToInt32(someChar)" does exactly what "(int)someChar" does.
Since "(int)someChar" is available in Java, why not use that?
When testing the various options, use '5' as a test - some options will convert this simply to the integer 5, but you will want the integer 53 to match the original C# behavior of Convert.ToInt32.
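A minimal sketch of that cast in Java, confirming the '5' -> 53 behavior described above (the class name is illustrative):
public class CharToInt {
    public static void main(String[] args) {
        char letter = '5';
        int value = (int) letter;       // same as C# Convert.ToInt32(letter)
        System.out.println(value);      // prints 53, the UTF-16 code unit value
        System.out.println(Character.getNumericValue(letter)); // prints 5, the digit value
    }
}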

Related

Comparing Strings with equivalent but different Unicode code points in Java/Kotlin

I ran into an issue while comparing two strings with different coders. My code is actually in Kotlin but it's running on the JVM and is effectively using Java's String implementation. Also, my question is of a more general nature and my actual code will not be of concern.
The problem is that I have two strings, let's say a and b, where
a = "something something äöü something"
b = "äöü"
you'd expect that a.contains(b) returns true, and that is the case if you retrieve your strings like shown above. But in my case, the strings come from different sources and happen to have different coders. String a has the coder 1, which is UTF16, and String b has the coder 0, which is LATIN1. In this case, a.contains(b) returns false. Now you might have noticed that I included special characters (ä, ö and ü), because that is where, according to my debugging, the comparison fails.
While I am at the stack frame where the a.contains(b) call happens, both strings appear correctly displayed in my debugger (IntelliJ IDEA Ultimate 2020.2). However, if I subsequently step into the comparison functions, I notice that in java.lang.StringLatin1.regionMatchesCI_UTF16(), where the byte arrays are converted back char by char, the special characters of b are no longer correct (ä -> a, ö -> o, ü -> u). And of course the comparison then fails.
Now as I said, both strings are displayed correctly in the debugger originally, so the information has to be somewhere. My question is: what do I have to do to let the a.contains(b) call return true, as expected?
EDIT:
I was certain that the problem originated from the strings having two different coders. However, even though the different coders hint at the fact that different encodings were at work, they are not the source of the problem. Generally speaking, different coders do not affect the result of .equals(), .contains() or similar calls. @OrangeDog pointed this out, while also suggesting that I actually ended up with two different representations of the same character, which really was the case. And still, my question remains the same: how do I compare these two strings that are "semantically" the same, but differ in the representation of certain characters?
Java 11 (11.0.2, openJDK 11)
Kotlin/JVM 1.4.0
IntelliJ IDEA Ultimate 2020.2
Ignore the internal details of String. As far as you are concerned it does not have an encoding, it just stores sequences of characters (or "code point units" as the Kotlin docs describe them).
I'm guessing one of your strings (that was Latin-1) uses the character U+00E4 (ä) and the other uses the sequence U+0061 U+0308 (ä). You can verify using toCharArray().
To be able to compare such strings sensibly, there is the class java.text.Normalizer:
Normalizer.normalize(a, Form.NFKD).contains(Normalizer.normalize(b, Form.NFKD))
Or, ensure that any Strings you are receiving are already in the recommended NFC form.
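A runnable sketch of this approach, assuming (as guessed above) that one string uses the precomposed U+00E4 and the other the decomposed pair U+0061 U+0308; the string literals are illustrative:
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class NormalizedContains {
    public static void main(String[] args) {
        String a = "something something a\u0308o\u0308u\u0308 something"; // decomposed umlauts
        String b = "\u00E4\u00F6\u00FC";                                  // precomposed ä ö ü

        System.out.println(a.contains(b)); // false: different code point sequences
        System.out.println(Normalizer.normalize(a, Form.NFKD)
                .contains(Normalizer.normalize(b, Form.NFKD))); // true after normalization
    }
}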

Default number system and character set in Java

This is a fundamental question about how Java works, so I don't have any code to support it.
I am new to Java development and want to know how the different number systems and character sets like UTF-8 and Unicode come together in Java.
Let's say a user creates a new String and an int with the same value.
int i = 100;
String s = "100";
The hardware of a computer understands only zeros and ones, so everything has to be converted to binary? (Correct me if I'm wrong.) This conversion should be done by the JVM (correct me if I'm wrong)? And to represent characters of different languages with the characters that can be typed on an (English) keyboard, UTF-8 and similar encodings are used (correction needed)?
Now how does this whole flow fit into the bigger picture of running a Java web application?
How does a String/int get converted to binary for the machine's hardware to understand?
How does it get converted to UTF-8 for a browser to understand?
And what are the default number format and character set in Java? If I'm reading the contents of a file, will they be read as binary or UTF-8?
All computers run in binary. The conversion is done by the JVM and the computer that you have; you shouldn't worry about converting the code into the corresponding 1's and 0's. The browser has its own conversion code to change the universal 1's and 0's (used by all programs and computer software) into however it decides to display the given information. All languages are just a translation guide for the user to "speak" with the computer, and vice versa. Hope this helps, though I don't think I really answered anything.
How Java represents any data type in memory is the choice of the actual JVM. In practice, the JVM will choose the format native to the processor (e.g. choosing between little/big endian for int), simply because it offers the best performance on that platform.
Basically, the JLS makes certain guarantees (like that a byte has 8 bits and the values range from -128 to 127) - the VM just maps that to the platform as it deems suitable (the JLS was specified to match common computing technology closely, so there is usually no magic needed to guess how primitive types map to the platform).
You should never care how the VM represents data in memory; Java does not offer any legal way to access the data in a manner where you would need to know (bypassing most of the VM's logic by using sun.misc.Unsafe is not considered legal).
If you care for educational purposes, learn which binary representations the underlying platform (e.g. x86) uses and take a look at the VM. It has little to do with Java really; it's all VM- and platform-specific.
For java.lang.String, it's the implementation of the class that defines how the String is stored internally - it went through quite some changes over major Java versions - but what a String exposes is quite narrowly defined (see the JDK javadoc for String.length(), String.charAt()).
As for how user input is translated to Java standard types, that's actually platform-specific. The JVM selects the default encoding (e.g. String.getBytes() can return quite different results for the same string, depending on the platform - that's why it's recommended to explicitly specify the desired encoding). The same goes for many other things (time zone, number format etc.).
CharSets and Formats are the building blocks a program wires up to translate data from the outside world (file, HTTP or user input) into Java's representation of data (or vice versa). For example, a web application will use the encoding from an HTTP header to determine which CharSet to use when interpreting the contents (the HTTP headers themselves are defined by the spec to be US-ASCII).
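A minimal sketch (Java 11+; the file name is hypothetical) of the advice above: name the charset explicitly instead of relying on the platform default, and note how the int's binary form differs from the String's encoded bytes:
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharset {
    public static void main(String[] args) throws Exception {
        int i = 100;
        String s = "100";

        // The int is two's-complement bits; the String is a sequence of characters
        System.out.println(Integer.toBinaryString(i));    // 1100100
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // 0x31 0x30 0x30 ('1','0','0')
        System.out.println(utf8.length);                  // 3 bytes

        // Read a file with an explicit charset instead of the platform default
        String text = Files.readString(Path.of("input.txt"), StandardCharsets.UTF_8);
        System.out.println(text.length());
    }
}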

How to use or implement arrays in XQuery?

Is there any built-in support for arrays in XQuery? For example, if we want to implement the simple Java program below in XQuery, how would we do it?
(I am not asking to translate the entire program into XQuery, but just asking how to implement the array on line 2 of the code below in XQuery. I am using MarkLogic / xdmp functions also.)
String test = "Hello XQuery";
char[] characters = test.toCharArray();
for (int i = 0; i < characters.length; i++) {
    if (characters[i] == (char) 13) {   // carriage return
        characters[i] = (char) 0x00;    // null
    }
}
Legend:
hex 0x00 dec 0 : null
hex 0x0d dec 13: carriage return
hex 0x0a dec 10: line feed
hex 0x22 dec 34: dquote
The problem with converting your sample code to XQuery is not the absence of support for arrays, but the fact that x00 is not a valid character in XML. If it weren't for this problem, you could express your query with the simple function call:
translate($input, '&#xD;', '&#x0;')
Now, you could argue that's cheating; it just so happens that there's a function that does exactly what you are trying to do by hand. But if this function didn't exist, you could program it in XQuery: there are sufficient primitives available for strings to allow you to manipulate them any way you want. If you need to (and it's rarely necessary), you can convert a string to a sequence of integers using the function string-to-codepoints(), and then take advantage of all the XQuery facilities for manipulating sequences.
The lesson is, when you use a declarative language like XQuery or XSLT, don't try to use the same low-level programming techniques you were forced to use in more primitive languages. There's usually a much more direct way of expressing the problem.
XQuery has built-in support for sequences. The function tokenize() (as suggested by @harish.ray) returns a sequence. You can also construct one yourself using parentheses and commas:
let $mysequence := (1, 2, 3, 4)
Sequences are ordered lists, so you can rely on that. That is slightly different from a node-set returned by an XPath expression, which is usually document-ordered.
On a side note: actually, everything in XQuery is either a node-set or a sequence. Even if a function is declared to return a single string or int, you can treat that returned value as a sequence of one item. No explicit casting is necessary (there are no constructs for that in XQuery anyhow). Functions like fn:exists() and fn:empty() always work.
HTH!
Just for fun, here's how I would do this in XQuery if fn:translate did not exist. I think Michael Kay's suggestion would end up looking similar.
let $test := "Hello XQuery"
return codepoints-to-string(
    for $c in string-to-codepoints($test)
    return if ($c eq 32) then 44 else $c)
Note that I changed the transformation because of the problem he pointed out: 0 is not a legal codepoint. So instead I translated spaces to commas.
With MarkLogic, another option is to use http://docs.marklogic.com/json:array and its associated functions. The json:set-item-at function would allow coding in a vaguely imperative style. Coding both variations might be a good learning exercise.
There are two ways to do this.
Firstly, you can create an XmlResults object using XmlManager.createResults(), and use XmlResults.add() to add your strings to it. You can then use the XmlResults object to set the value of a variable in XmlQueryContext, which can be used in your query.
Example:
XmlResults values = XmlManager.createResults();
values.add(new XmlValue("value1"));
values.add(new XmlValue("value2"));
XmlQueryContext.setVariableValue("files", values);
The alternative is to split the string in XQuery. You can do this using the tokenize() function, which works using a regular expression to match the string separator.
http://www.w3.org/TR/xpath-functions/#func-tokenize
Thanks.
Looking ahead: XQuery 3.1 will provide native support for arrays. See http://www.w3.org/TR/xquery-31/ for more details.
You can construct an array like this:
let $myArray := tokenize('a b c d e f g', '\s')
(: $myArray[3] returns "c" :)
Please note that the first index of this pseudo-array is 1, not 0!
Since the question "How to use or implement arrays in XQuery?" is being held generic (and thus shows up in search results on this topic), I would like to add a generic answer for future reference (making it a Community Wiki, so others may expand):
As Christian Grün has already hinted at, with XQuery 3.1 XQuery got a native array datatype, which is a subtype of the function datatype.
Since an array is an 'ordered list of values' and an XPath/XQuery sequence is as well, the first question that may arise is: "What's the difference?" The answer is simple: a sequence cannot contain another sequence; all sequences are automatically flattened. Not so an array, which can be an array of arrays. Just like sequences, arrays in XQuery can also contain any mix of other datatypes.
The native XQuery array datatype can be expressed in either of two ways: as [] or via array {}. The difference is that, when using the former constructor, a comma is considered a 'hard' comma, meaning that the following array consists of two members:
[ ("apples", "oranges"), "plums" ]
while the following will consist of three members:
array { ("apples", "oranges"), "plums" }
which means that the array expression within the curly braces is resolved to a flat sequence first, and then memberized into an array.
Since array is a subtype of function, an array can be thought of as an anonymous function that takes a single parameter, the numeric index. To get the third member of an array named $foo, we can thus write:
$foo(3)
If an array contains another array as a member, you can chain the function calls together, as in:
$foo(3)(5)
Along with the array datatype, special operators have been added, which make it easy to look up the values of an array. One such operator (also used by the new Map datatype) is the question mark followed by an integer (or an expression that evaluates to zero or more integers).
$foo?(3)
would, again, return the third member within the array, while
$foo?(3, 6)
would return the members 3 and 6.
The parentheses can be left out when working with literal integers. However, the parens are needed to form the lookup index from a dynamic expression, as in:
$foo?(3 to 6)
Here, the expression in the parens is evaluated to a sequence of integers, and thus the expression returns a sequence of all members from index position 3 to index position 6.
The asterisk * is used as a wildcard operator. The expression
$foo?*
will return a sequence of all items in the array. Again, chaining is possible:
$foo?3?5
matches the previous example $foo(3)(5).
More in-depth information can be found in the official spec: XML Path Language (XPath) 3.1 / 3.11.2 Arrays
Also, a new set of functions specific to arrays has been implemented. These functions reside in the namespace http://www.w3.org/2005/xpath-functions/array, which is conventionally bound to the prefix array, and can be found referenced here: XPath and XQuery Functions and Operators 3.1 / 17.3 Functions that Operate on Arrays

What kind of representation can '\r\x00\x00\x00' be (when I usually have hexadecimal code like '\x0\x00\x00\x03')?

I'm using a program (KLEE) that gives me tests of C code.
I need to use the results in my program.
It is not readable information, but some of the solutions are hexadecimal data in the following format:
'\x0e\x00\x00\x00'
I have already asked about how to convert it into an integer, and I found the solution.
I will have to feed this kind of result into structs too; I will know the size, but nothing about the fields or anything else.
I think I can solve this, but now the problem is that sometimes you can obtain things like:
'\n\x00\x00\x00' = 10
or
'\r\x00\x00\x00' = 13
And I haven't found out which kind of representation they use to convert it into readable information.
Apparently I could solve this in Python with:
import struct
selection = struct.unpack('<i', b'\r\x00\x00\x00')  # '<i' reads a little-endian 32-bit signed int -> (13,)
I don't have any idea of Python, and I would like to find a solution in Java or C.
Thanks very much
The sequence \r\n is used by Windows systems to indicate a newline: the \r moves the write pointer to the start of the line, and the \n moves to the next line. I'm thinking that you might have had some character data containing a newline where each character was converted into a 32-bit integer value in little-endian format.
Hope this helps!
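Since the question asks for Java or C, here is a hedged Java sketch of decoding such a 4-byte little-endian value (the byte values are taken from the question; the class name is illustrative):
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DecodeKleeValue {
    public static void main(String[] args) {
        // '\r\x00\x00\x00' as raw bytes: 0x0D followed by three zero bytes
        byte[] raw = { 0x0D, 0x00, 0x00, 0x00 };

        int value = ByteBuffer.wrap(raw)
                              .order(ByteOrder.LITTLE_ENDIAN)
                              .getInt();

        System.out.println(value); // prints 13, the code point of '\r'
    }
}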

Is there any difference between Java byte code and .NET byte code? If so, shall I take the hexadecimal of those values?

I would like to know if there is any difference between Java byte code and .NET byte code. If there is any difference, should I take the hexadecimal values of that Java byte code and .NET byte code? Because hexadecimal is independent of languages and is a universal representation.
Problem description
We are developing a mobile application in J2ME and Java. Here I am using an external fingerprint reader for reading/verifying fingerprints. We are using a Java API for reading/verifying the fingerprint.
I capture the finger template and raw image bytes. I convert the raw image bytes into hex form and store them in a separate text file.
Here we are using a conversion tool (developed in .NET) that converts the hex form into an image. With the help of that tool we are trying to get the image from that text file. But we cannot get the image correctly.
The .NET programmer says the Java byte and .NET byte differ. A Java byte ranges from -128 to 127, but a .NET byte ranges from 0 to 255. So there is a problem.
But my assumption here is: the hex is independent of Java & .NET; it is common to both. So, instead of storing the bytes in a text file, I plan to convert those bytes into hexadecimal format, so that our .NET conversion tool can automatically convert the hexadecimal into an image.
I don't know whether I am on the correct path or not?
Hexadecimal is just a way to represent numbers.
Java is compiled to bytecode and executed by a JVM.
.NET is compiled to bytecode and executed by the CLR.
The two formats are completely incompatible.
I capture the finger template and raw image bytes. I convert the raw image bytes into hex form and store them in a separate text file.
OK; note that storing as binary would have been easier (and more efficient), but that should work.
Here we are using a conversion tool (developed in .NET) that converts the hex form into an image. With the help of that tool we are trying to get the image from that text file. But we cannot get the image correctly.
Rather than worrying about the image, the first thing to do is check where the problem is; there are two obvious scenarios:
you aren't reading the data back into the same bytes
you have the right bytes, but you can't get them to load as an image
First; figure out which of those it is, simply by storing some known data and attempting to read it back at the other end.
The .NET programmer says the Java byte and .NET byte differ. A Java byte ranges from -128 to 127, but a .NET byte ranges from 0 to 255. So there is a problem.
That shouldn't be a problem for any well-written hex-encoder. I would expect a single Java byte to be correctly written as a single hex value between 00 and FF.
I don't know whether I am on the correct path or not?
Personally, I suspect you are misunderstanding the problem, which makes it likely that the solution is off the mark. If you want to make life easier, store as binary rather than text; but there is no inherent problem exchanging hex around. If I had to pack raw binary data into a text file, personally I'd probably go for base-64 rather than hex (it will be shorter), but either is fine.
As I mentioned above: first figure out whether the problem is in reading the bytes, vs processing the bytes into an image. I'm also making the assumption that the bytes here are an image format that both environments can process, and not (for example) a custom serialization format.
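A minimal sketch of such a well-written hex encoder in Java; masking with 0xFF sidesteps the signed-byte issue the question worries about (method and class names are illustrative):
public class HexCodec {
    // Encode every byte as exactly two hex digits, 00..FF, regardless of sign
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02X", b & 0xFF)); // -1 (Java byte) -> "FF"
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = { 0, 127, -128, -1 };
        System.out.println(toHex(sample)); // prints 007F80FF
    }
}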
Yes, Java byte code and .NET’s byte code are two different things that are not interchangeable. As to the second part of your question, I have no idea what you are talking about.
Yes, they are different, although there are tools that can migrate from one to the other.
Search Google for "java bytecode IL comparison" to find side-by-side comparisons.
