Convert XSLT function (string-to-codepoints) to Java

How can I "translate" this XSLT code to Java ?
<xsl:value-of select="number(string-to-codepoints(upper-case($char)) - string-to-codepoints('A'))+10"/>
I only know that: "The fn:string-to-codepoints function returns a sequence of xs:integer values representing the Unicode code points."
From the example that is given in (http://www.xsltfunctions.com/xsl/fn_string-to-codepoints.html) :
string-to-codepoints('a') = 97
I found this:
char ch = 'a';
System.out.println(String.format("\\u%04x", (int) ch));
But I get : \u0061

For a single char you can just cast it to int to get the decimal value:
System.out.println((int)ch);
For a String there's .toCharArray() to convert it to a char[], but that isn't quite the same as a "sequence of codepoints" if the String contains characters outside the BMP (i.e. above U+FFFF), which Java represents as a surrogate pair of two char values. To handle surrogates properly you would need a code-point-aware technique like the one described in this answer.
To answer the specific question you ask, you can do
number(string-to-codepoints(upper-case($char)) - string-to-codepoints('A'))+10
in Java as
char ch = // wherever you get $char from
int num = Character.toUpperCase(ch) - 'A' + 10;
since char is an integer type in Java and you can add or subtract char values like any other number.
But this will probably only give you a sensible answer when the initial ch is an ASCII letter.
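For whole strings, a closer analogue of string-to-codepoints is String.codePoints(), available since Java 8. The following is only a minimal sketch; the class and method names (CodepointSketch, stringToCodepoints, letterValue) are made up for illustration.

```java
import java.util.Arrays;

class CodepointSketch {
    // Hypothetical helper: rough Java analogue of XPath's string-to-codepoints().
    // codePoints() combines surrogate pairs, so each element is a full code point.
    static int[] stringToCodepoints(String s) {
        return s.codePoints().toArray();
    }

    // Hypothetical helper mirroring the XSLT expression for a single letter.
    // Only meaningful for ASCII letters, as noted above.
    static int letterValue(char ch) {
        return Character.toUpperCase(ch) - 'A' + 10;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stringToCodepoints("a"))); // [97]
        System.out.println(letterValue('c')); // 12
    }
}
```

Unlike toCharArray(), the array returned here has one element per code point, even for characters above U+FFFF.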

You print the value as unicode escape sequence; XSLT prints a decimal value.
This should work much better:
System.out.println("a".codePointAt(0));

Related

How to Output int from char in JAVA after storing it [duplicate]

I am watching a tutorial on Udemy, and the instructor says that we can store an integer value in the char data type. But when I try to print the value, nothing shows up.
I tried assigning the "char one" value to an int variable and printing that instead, and it works. But why can't I use the char itself to output the value?
public static void main(String[] args) {
char one = 10;
System.out.println(one);
}

If you look at the ASCII table you would see that the character 10 represents the newline character.
This can be proved by the code below:
public static void main(String[] args) {
char one = 10;
//no newline added by print, but println adds a newline implicitly
System.out.print("Test");
System.out.print(one);
System.out.print("Test");
}
The output is:
Test
Test
Although I used System.out.print a newline was still added in the output after the first Test. So you see something was actually printed.
Furthermore, when you pass a char to System.out.println(), the char is printed as the character it encodes (the same text String.valueOf(char) would give), because char is a primitive.
For Objects when you pass a reference in the System.out.println() the toString() method of the object would be called to get its String representation.
If you change the value to char one = 65 you would see the letter A printed.

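In Java, char is a 16-bit unsigned integral type, so char and int values can be converted to each other (going from int to char needs a cast unless the value is a constant that fits).
When you print an int, you get an integer number. When you print a char, you get the corresponding character. char ch = 10 is not a printable character.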
char ch = 'A';
System.out.println(ch); // print 'A'
int code = ch;
System.out.println(code); // print 65 - ASCII code of 'A'

Adding to the above answers, if you want to output the int value from the variable "one", a cast would work:
char one = 10;
System.out.println((int) one);

If you take a look at the ASCII Table, you can see the value of 10 is LF which is a new line. If you print this alone, it will appear to be doing nothing because it is just a new line.
However if you modify the code a bit to print some actual characters on both side of the LF char:
char c1 = 70;
System.out.print(c1);
char one = 10;
System.out.print(one);
char c2 = 71;
System.out.print(c2);
This will output:
F
G
On separate lines due to the newline in between, without it they would have printed on the same line.
Additionally you can see on that table 70 corresponds with F, and 71 with G.
Note: Java does not technically use ASCII. A char is a UTF-16 code unit, and how it appears when printed depends on your environment's output encoding; however, the first 128 values are the same as in the ASCII table (Unicode is a superset of ASCII). For example char c1 = 202 will print Ê for me, which is not an ASCII value.

You are misinterpreting your output and drawing the wrong conclusion.
A char is a UTF-16 code unit. UTF-16 is a character encoding for the Unicode character set. UTF-16 encodes a Unicode codepoint with one or two UTF-16 code units. Typically, if it might be two code units, you'd use String or char[] instead of char. But if your codepoint is known to take only one UTF-16 code unit, you could use char.
The codepoint you are using is U+000A 'LINE FEED (LF)'. It does take one UTF-16 code unit \u000a, which is convertible from the integer value 0xa or 10. If you inspect your output carefully, you'll "see". Perhaps adding output before and after would help.
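One way to "see" the character is to bracket it with visible output and to print its code and category instead of the raw character. This is a small sketch; the class name LineFeedSketch and the helper describe are hypothetical.

```java
class LineFeedSketch {
    // Hypothetical helper: describes a char without printing it raw.
    static String describe(char c) {
        return String.format("U+%04X control=%b", (int) c, Character.isISOControl(c));
    }

    public static void main(String[] args) {
        char one = 10;
        // The closing bracket lands on the next line, which shows the line feed was printed.
        System.out.println("[" + one + "]");
        System.out.println(describe(one)); // U+000A control=true
    }
}
```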

How can I zero-pad a hexadecimal digit string to eight digits?

I have a logic requirement, where I need to ensure that a hexadecimal digit string is presented in 8-digit format, even if the leading digits are zero. For example, the string corresponding to 0x3132 should be formatted as "0x00003132".
I tried this:
String key_ip = txt_key.getText();
int addhex = 0;
char [] ch = key_ip.toCharArray ();
StringBuilder builder = new StringBuilder();
for (char c : ch) {
int z = (int) c;
builder.append(Integer.toHexString(z).toUpperCase());
}
System.out.println("\ n (key) is:" + key_ip);
System.out.println("\ nkey in Hex:" + addhex + builder.toString());
, but it gave me an error. Can anyone explain how to fix or rewrite my code for this?
and I want to ask one more thing, if use code
Long.toHexString(blabla);
is it true to change the value "0x00" to "\0030" so that the output of 0 is 30

Evidently, you are receiving a String, converting its chars to their Unicode code values, and forming a String containing the hexadecimal representations of those code values. The problem you want to solve is to left-pad the result with '0' characters so that the total length is not less than eight. In effect, the only parts of the example code that are directly related to the problem itself are
int addhex = 0;
and
System.out.println("\ nkey in Hex:" + addhex + builder.toString());
. Everything else is just setup.
It should be clear, however, that that particular attempt cannot work, because all other considerations aside, you need something that adapts to the un-padded length of the digit string. That computation has no dependency on the length of the digit string at all.
Since you're already accumulating the digit string in a StringBuilder, it seems sensible to apply the needed changes to it, before reading out the result. There are several ways you could approach that, but a pretty simple one would be to just insert() zeroes one at a time until you reach the wanted length:
while (builder.length() < 8) {
builder.insert(0, '0'); // Inserts char '0' at position 0
}
I do suspect, however, that you may have interpreted the problem wrongly. The result you obtain from doing what you ask is ambiguous: in most cases where such padding is necessary, there are several input strings that could produce the same output. I am therefore inclined to guess that what is actually wanted is to pad the digits corresponding to each input character on a per-character basis, so that an input of "12" would yield the result "00310032". This would be motivated by the fact that Java char values are 16 bits wide, and it would produce a transformation that is reliably reversible. If that's what you really want, then you should be able to adapt the approach I've presented to achieve it (though in that case there are easier ways).
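To make the two interpretations concrete, here is a rough sketch of both. The class and method names (HexPadSketch, padWhole, perCharHex) are invented for this example, and String.format("%04X", ...) is just one of the easier ways to pad each character to four hex digits.

```java
class HexPadSketch {
    // Whole-string padding: concatenate unpadded hex digits, then left-pad to 8.
    static String padWhole(String input) {
        StringBuilder builder = new StringBuilder();
        for (char c : input.toCharArray()) {
            builder.append(Integer.toHexString(c).toUpperCase());
        }
        while (builder.length() < 8) {
            builder.insert(0, '0');
        }
        return builder.toString();
    }

    // Per-character padding: every char becomes exactly four hex digits.
    static String perCharHex(String input) {
        StringBuilder builder = new StringBuilder();
        for (char c : input.toCharArray()) {
            builder.append(String.format("%04X", (int) c));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(padWhole("12"));     // 00003132
        System.out.println(padWhole("\u3132")); // 00003132 as well: the ambiguity
        System.out.println(perCharHex("12"));   // 00310032
    }
}
```

The second padWhole call shows the ambiguity: the two-character string "12" and the single character U+3132 produce the same output, whereas perCharHex keeps them distinct.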
if use code
Long.toHexString(blabla);
is it true to change the value "0x00" to "\0030" so that the output of
0 is 30
The Unicode code value for the character '0', expressed in hexadecimal, is 30. Your method of conversion would produce that for the input string "0". Your method does not lend any special significance to the character '\' in its input.

Char to int and back again

I want to find an integer representation of a character, then later on find a character representation of an int.
My current solution is this, but it doesn't work:
String s = "A";
Integer b = Character.getNumericValue(s.toCharArray()[0]); // 10
char c = Character.toChars(b)[0]; // (blank)
How should i do this?

There's a misunderstanding in how Character.getNumericValue(char) works.
Returns the int value that the specified Unicode character represents. For example, the character '\u216C' (the roman numeral fifty) will return an int with a value of 50.
The letters A-Z in their uppercase ('\u0041' through '\u005A'), lowercase ('\u0061' through '\u007A'), and full width variant ('\uFF21' through '\uFF3A' and '\uFF41' through '\uFF5A') forms have numeric values from 10 through 35. This is independent of the Unicode specification, which does not assign numeric values to these char values.
from the java docs.
In other words:
The method getNumericValue does not produce the UTF-16 value of the specified character, but attempts to produce a numeric value (in the sense that '0' will produce 0) from the given character. 'A' yields 10 because that is the value of the digit A in hexadecimal.
But 10 is not the UTF-16 value corresponding to 'A'.
A correct way of converting a char to its corresponding UTF-16 value would be to use one of the codePoint methods from Character. Or, if you're absolutely positive that all values will be in the BMP, you can use
char c = 'A';
int i = (int) c;
In this case Character.toChars(int) will also work.
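As a sketch of the code-point-based round trip (the class name CharRoundTripSketch and both helper names are hypothetical), using String.codePointAt and Character.toChars:

```java
class CharRoundTripSketch {
    // Hypothetical helper: code point of the first character of s.
    static int toCodePoint(String s) {
        return s.codePointAt(0);
    }

    // Hypothetical helper: string for a code point (one or two chars).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(Character.getNumericValue('A')); // 10, the digit value
        System.out.println(toCodePoint("A"));               // 65, the code point
        System.out.println(fromCodePoint(65));              // A
    }
}
```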

int b = s.charAt(0); // 65
char c = (char) b; // 'A'

Why a character Array accepts integer values in Java?

Of course I'm a beginner for Java, previously I learned C. Please take a look on the following code segments.
char Character;
int Number = 27;
Character = Number;
System.out.println(Character);
The above code cannot be compiled as an error stated as “Loss of Information”
Whereas the following code...
char Character = 'F';
int Number;
Number = Character;
System.out.println(Number);
The above code can be compiled but the output is “70”... not as “F”
Also take a look on the following code...
char [] arrayCh = new char [3];
arrayCh [0] = 27;
System.out.println(arrayCh[0]);
The above code can be compiled however it also gives an unfamiliar symbol...
I know the issues regarding the ASCII Values and the memory taking as 'char' takes 16 Bits, 'int' takes 32 Bits. Therefore an integer value couldn’t be assigned in to a character variable whereas a character value can be assigned in to an integer variable as "ASCII" value.
My question is... why a 'char' array accepts 'int' values..? Could anyone explain?

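A char is a 2-byte, unsigned integer type. 27 is an integer constant that is within the range of a char, so the compiler lets you assign it to a char without a cast. In your first snippet, Number is an int variable rather than a constant, so that assignment needs an explicit cast.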
'F' is a character literal that represents the character F, which has the decimal value 70 in the unicode standard. So, assigning 'F' to an integer is the same thing as assigning 70.
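The different cases can be put side by side in a short sketch. The class name CharAssignSketch and the helper fromInt are made up for illustration; the rule itself is that an int constant that fits in a char may be assigned without a cast, while an int variable may not.

```java
class CharAssignSketch {
    // Hypothetical helper: an int variable always needs an explicit narrowing cast.
    static char fromInt(int n) {
        return (char) n;
    }

    public static void main(String[] args) {
        char a = 27;        // OK: literal constant that fits in a char
        final int k = 27;
        char b = k;         // OK: k is a compile-time constant
        int n = 27;
        char c = (char) n;  // cast required: n is an ordinary variable
        int code = 'F';     // widening char to int never needs a cast
        System.out.println((int) a + " " + (int) b + " " + (int) c + " " + code); // 27 27 27 70
        System.out.println(fromInt(70)); // F
    }
}
```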

Converting a int to char and then back to int - doesn't give same result always

I am trying to get a char from an int value > 0xFFFF. But instead, I always get back the same char value, that when cast to an int, prints the value 65535 (0xFFFF).
I couldn't understand why it is generating symbols for unicode > 0xFFFF.
int hex = 0x10FFFF;
char c = (char)hex;
System.out.println((int)c);
I expected the output to be 0x10FFFF. Instead, the output comes back as 65535.

This is because, while an int is 4 bytes, a char is only 2 bytes. Thus, you can't represent all values in a char that you can in an int. Using a standard unsigned integer representation, you can only represent the range of values from 0 to 2^16 - 1 == 65535 in a 2-byte value, so if you convert any number outside that range to a 2-byte value and back, you'll lose data.
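A quick sketch of what the cast does to a few values (the class name NarrowingSketch and the helper roundTrip are hypothetical): the narrowing conversion keeps only the low 16 bits, so the result is not always 65535.

```java
class NarrowingSketch {
    // Hypothetical helper: int -> char -> int, keeping only the low 16 bits.
    static int roundTrip(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(0x10FFFF)); // 65535, low 16 bits are 0xFFFF
        System.out.println(roundTrip(0x10041));  // 65, low 16 bits are 0x0041 ('A')
        System.out.println(roundTrip(70000));    // 4464, i.e. 70000 - 65536
        System.out.println(roundTrip(65));       // 65, values in range survive
    }
}
```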

int is 4 bytes. char is 2 bytes.
Your number was well within the range an int can hold, but not the range a char can hold.
So when you converted that number to a char, it lost data: only the low 16 bits were kept, and for 0x10FFFF those are 0xFFFF, which is also the maximum a char can hold. That is what it printed, i.e. 65535.

Your number was too big to be a char, which is 2 bytes, but small enough to fit in an int, which is 4 bytes. The cast keeps only the lowest 16 bits, and for 0x10FFFF those are 0xFFFF, i.e. 65535, which is also the biggest value that fits in a char. Also, if a char were big enough to fit your number, converting it back to an int would have returned the decimal value for 0x10FFFF, which is 1114111.

Unfortunately, I think you were expecting a Java char to be the same thing as a Unicode code point. They are not the same thing.
The Java char, as already expressed by other answers, can only support code points that can be represented in 16 bits, whereas Unicode needs 21 bits to support all code points.
In other words, a Java char on its own, only supports Basic Multilingual Plane characters (code points <= 0xFFFF). In Java, if you want to represent a Unicode code point that is in one of the extended planes (code points > 0xFFFF), then you need surrogate characters, or a pair of characters to do that. This is how UTF-16 works. And, internally, this is how Java strings work as well. Just for fun, run the following snippet to see how a single Unicode code point is actually represented by 2 characters if the code point is > 0xFFFF:
// Printing string length for a string with
// a single unicode code point: 0x22BED.
System.out.println("𢯭".length()); // prints 2, because it uses a surrogate pair.
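To look at the two code units themselves, Character.highSurrogate and Character.lowSurrogate can be used. This is a sketch only; the class name SurrogateSketch and the helper surrogates are hypothetical, and the helper assumes its argument is a supplementary code point (above 0xFFFF).

```java
class SurrogateSketch {
    // Hypothetical helper: hex form of the surrogate pair for a supplementary code point.
    static String surrogates(int codePoint) {
        char high = Character.highSurrogate(codePoint);
        char low = Character.lowSurrogate(codePoint);
        return String.format("%04X %04X", (int) high, (int) low);
    }

    public static void main(String[] args) {
        System.out.println(surrogates(0x10FFFF)); // DBFF DFFF
        System.out.println(Character.charCount(0x10FFFF)); // 2
    }
}
```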
If you want to safely convert an int value that represents a Unicode code point to a char (or chars to be more exact), and then convert it back to an int code point, you will have to use code like this:
public static void main(String[] args) {
int hex = 0x10FFFF;
System.out.println(Character.isSupplementaryCodePoint(hex)); // prints true because hex > 0xFFFF
char[] surrogateChars = Character.toChars(hex);
int codePointConvertedBack = Character.codePointAt(surrogateChars, 0);
System.out.println(codePointConvertedBack); // prints 1114111
}
Alternatively, instead of manipulating char arrays, you can use a String, like this:
public static void main(String[] args) {
int hex = 0x10FFFF;
System.out.println(Character.isSupplementaryCodePoint(hex)); // prints true because hex > 0xFFFF
String s = new String(new int[] {hex}, 0, 1);
int codePointConvertedBack = s.codePointAt(0);
System.out.println(codePointConvertedBack); // prints 1114111
}
For further reading: Java Character Class
