Java String comparison issue with blank spaces - java

I invoke using java some web service which returns some values to me.
Those values are some attribute names which I successful receive. I put the returned values in database.
For example the values I retrieved are:
DSLAM port
Interfejs na ISP
So when I look in the database those values are stored in DB as DSLAM port and Interfejs na ISP.
That is how I received them and that is how they are stored in DB (so only with one blank space between the words).
So I'm receiving those values from a web service but when I try to do a comparison additional in the class:
if ( attribute.trim().equalsIgnoreCase("Interfejs na ISP") ) {
System.out.println("attr2");
}
or
if ( attribute.trim().equalsIgnoreCase("DSLAM port") ) {
System.out.println("attr3");
}
I am not having the System Print lines to my console, if is always false.
What can be the problem and how can I solve it?
This is a really strange behavior for me. Attributes are stored correctly and only when I try to compare it I get strange behavior. The if clause is never true. Can there be some issue with the language format?
Additionally if I try with single word:
if (attribute.trim().equalsIgnoreCase("Telefon"))
{ System.out.println("attr5");
}
Then it writes in System Out.
So with sinlge word it seems it does not have problems

Output a serialization of the byte array behind the values of attribute to see exactly what characters it contains. This will help you catch stuff like nbsp for whitespace etc. When this is done, you can change your string matching to what you actually get back from the database, or tune your persistence to produce straight forward values.
It is not enough to just println the variable, instead, you have to iterate over all the buckets in the array and output them in hex or dec - your target output is something like "[44, 53, 4C, .. ]", which would correspond to "DSL..". To convert the byte array to a hex representation, you can use this snippet, or as an exercise, try it on your own:
public static String convertToHexString(byte[] data) {
StringBuffer buf = new StringBuffer();
for (int i = 0; i < data.length; i++) {
int nibble = (data[i] >>> 4) & 0x0F;
int two_nibbles = 0;
do {
if ((0 <= nibble) && (nibble <= 9))
buf.append((char) ('0' + nibble));
else
buf.append((char) ('a' + (nibble - 10)));
nibble = data[i] & 0x0F;
} while (two_nibbles++ < 1);
}
return buf.toString();
}
Now when you have that output, take an ascii table to look up which values are contained in the string and change your ifs depending on what is actually contained in the Strings. Possibly by using a matching regex.
Chances are the whitespaces are some non-trivial blank characters like (Hex: A0), but it's also possible that you are having an encoding problem. Feel free to post the hex values if the character tables don't help.

Related

How to compare Chinese characters in Java using 'equals()'

I want to compare a string portion (i.e. character) against a Chinese character. I assume due to the Unicode encoding it counts as two characters, so I'm looping through the string with increments of two. Now I ran into a roadblock where I'm trying to detect the '兒' character, but equals() doesn't match it, so what am I missing ? This is the code snippet:
for (int CharIndex = 0; CharIndex < tmpChar.length(); CharIndex=CharIndex+2) {
// Account for 'r' like in dianr/huir
if (tmpChar.substring(CharIndex,CharIndex+2).equals("兒")) {
Also, feel free to suggest a more elegant way to parse this ...
[UPDATE] Some pics from the debugger, showing that it doesn't match, even though it should. I pasted the Chinese character from the spreadsheet I use as input, so I don't think it's a copy and paste issue (unless the unicode gets lost along the way)
oh, dang, apparently it does not work simply copy and pasting:
Use CharSequence.codePoints(), which returns a stream of the codepoints, rather than having to deal with chars:
tmpChar.codePoints().forEach(c -> {
if (c == '兒') {
// ...
}
});
(Of course, you could have used tmpChar.codePoints().filter(c -> c == '兒').forEach(c -> { /* ... */ })).
Either characters, accepting 兒 as substring.
String s = ...;
if (s.contains("兒")) { ... }
int position = s.indexOf("兒");
if (position != -1) {
int position2 = position + "兒".length();
s = s.substring(0, position) + "*" + s.substring(position2);
}
if (s.startsWith("兒", i)) {
// At position i there is a 兒.
}
Or code points where it would be one code point. As that is not really easier, variable substring seem fine.
if (tmpChar.substring(CharIndex,CharIndex+2).equals("兒")) {
Is your problem. 兒 is only one UTF-16 character. Many Chinese characters can be represented in UTF-16 in one code unit; Java uses UTF-16. However, other characters are two code units.
There are a variety of APIs on the String class for coping.
As offered in another answer, obtaining the IntStream from codepoints allows you to get a 32-bit code point for each character. You can compare that to the code point value for the character you are looking for.
Or, you can use the ICU4J library with a richer set of facilities for all of this.

Create string with emoji unicode flag countries

i need to create a String with a country flag unicode emoji..I did this:
StringBuffer sb = new StringBuffer();
sb.append(StringEscapeUtils.unescapeJava("\\u1F1EB"));
sb.append(StringEscapeUtils.unescapeJava("\\u1F1F7"));
Expecting one country flag but i havent..How can i get a unicode country flag emoji in String with the unicodes characters?
The problem is, that the "\uXXXX" notation is for 4 hexadecimal digits, forming a 16 bit char.
You have Unicode code points above the 16 bit range, both U+F1EB and U+1F1F7. This will be represented with two chars, a so called surrogate pair.
You can either use the codepoints to create a string:
int[] codepoints = {0x1F1EB, 0x1F1F7};
String s = new String(codepoints, 0, codepoints.length);
Or use the surrogate pairs, derivable like this:
System.out.print("\"");
for (char ch : s.toCharArray()) {
System.out.printf("\\u%04X", (int)ch);
}
System.out.println("\"");
Giving
"\uD83C\uDDEB\uD83C\uDDF7"
Response to the comment: How to Decode
"\uD83C\uDDEB" are two surrogate 16 bit chars representing U+1F1EB and "\uD83C\uDDF7" is the surrogate pair for U+1F1F7.
private static final int CP_REGIONAL_INDICATOR = 0x1F1E7; // A-Z flag codes.
/**
* Get the flag codes of two (or one) regional indicator symbols.
* #param s string starting with 1 or 2 regional indicator symbols.
* #return one or two ASCII letters for the flag, or null.
*/
public static String regionalIndicator(String s) {
int cp0 = regionalIndicatorCodePoint(s);
if (cp0 == -1) {
return null;
}
StringBuilder sb = new StringBuilder();
sb.append((char)(cp0 - CP_REGIONAL_INDICATOR + 'A'));
int n0 = Character.charCount(cp0);
int cp1 = regionalIndicatorCodePoint(s.substring(n0));
if (cp1 != -1) {
sb.append((char)(cp1 - CP_REGIONAL_INDICATOR + 'A'));
}
return sb.toString();
}
private static int regionalIndicatorCodePoint(String s) {
if (s.isEmpty()) {
return -1;
}
int cp0 = s.codePointAt(0);
return CP_REGIONAL_INDICATOR > cp0 || cp0 >= CP_REGIONAL_INDICATOR + 26 ? -1 : cp0;
}
System.out.println("Flag: " + regionalIndicator("\uD83C\uDDEB\uD83C\uDDF7"));
Flag: EQ
You should be able to do that simply using toChars from java.lang.Character.
This works for me:
StringBuffer sb = new StringBuffer();
sb.append(Character.toChars(127467));
sb.append(Character.toChars(127479));
System.out.println(sb);
prints 🇫🇷, which the client can chose to display like a french flag, or in other ways.
If you want to use emojis often, it could be good to use a library that would handle that unicode stuff for you: emoji-java
You would just add the maven dependency:
<dependency>
<groupId>com.vdurmont</groupId>
<artifactId>emoji-java</artifactId>
<version>1.0.0</version>
</dependency>
And call the EmojiManager:
Emoji emoji = EmojiManager.getForAlias("fr");
System.out.println("HEY: " + emoji.getUnicode());
The entire list of supported emojis is here.
I suppose you want to achieve something like this
Let me give you 2 example of unicodes for country flags:
for ROMANIA ---> \uD83C\uDDF7\uD83C\uDDF4
for AMERICA ---> \uD83C\uDDFA\uD83C\uDDF8
You can get this and other country flags unicodes from this site Emoji Unicodes
Once you enter the site, you will see a table with a lot of emoji. Select the tab with FLAGS from that table (is easy to find it) then will appear all the country flags. You need to select one flag from the list, any flag you want... but only ONE. After that will appear a text code in the message box...that is not important. Important is that you have to look in the right of the site where will appear flag and country name of your selected flag. CLICK on that, and on the page that will open you need to find the TABLE named Emoji Character Encoding Data. Scroll until the last part of table where sais: C/C++/Java Src .. there you will find the correct unicode flag. Attention, always select the unicode that is long like that, some times if you are not carefull you can select a simple unicode, not long like that. So, keep that in mind.
Indications image 1
Indication image 2
In the end i will post a sample code from an Android app of mine that will work on java the same way.
ArrayList<String> listLanguages = new ArrayList<>();
listLanguages.add("\uD83C\uDDFA\uD83C\uDDF8 " + getString(R.string.English));
listLanguages.add("\uD83C\uDDF7\uD83C\uDDF4 " + getString(R.string.Romanian));
Another simple custom example:
String flagCountryName = "\uD83C\uDDEF\uD83C\uDDF2 Jamaica";
You can use this variable where you need it. This will show you the flag of Jamaica in front of the text.
This is all, if you did not understand something just ask.
Look at Creating Unicode character from its number
Could not get my machine to print the Unicode you have there, but for other values it works.

decode character from unicode using java

I was unable to insert a chinese character to mysql. So I though of doing this. I have a excel sheet where I have chinese characters. Like 秀昭 and so on.
I got them converted to unicode representations like \uxxx using below code which I got from SO, and then I stored in MySQL.
private static String escapeNonAscii(String str) {
List<String> arr = new ArrayList<String>();
StringBuilder retStr = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
int cp = Character.codePointAt(str, i);
System.out.println("cp="+cp);
int charCount = Character.charCount(cp);
if (charCount > 1) {
i += charCount - 1; // 2.
if (i >= str.length()) {
throw new IllegalArgumentException("truncated unexpectedly");
}
}
if (cp < 128) {
retStr.appendCodePoint(cp);
} else {
retStr.append(String.format("\\u%x", cp));
arr.add(String.format("\\\\u%x", cp));
}
}
return retStr.toString();
}
The values have been stored properly. So now I need to display them back. When I tried
System.out.println("\u8BF7\u5728\u6B64\u5904");
It gives me proper output like,
`请在此`
But when I read from DB and did like
System.out.println(rs.getString(1).trim().toString() + " from DB");
It printed
`\u8BF7\u5728\u6B64\u5904`
What might be the problem? Have I missed anything? please help.
Escaped characters will only be processed prior to compiling. To store and retrieve the data from a database, you only have to consider two things: Make sure the data you read had the correct encoding. And when printing the data the correct encoding is set.
If you read data on a windows machine, it is posible you have to use the cp* encodings. Just use a InputStreamReader and set the charset. Now you have the data in the JVM. The internal encoding is some utf-16. Now that you use a type 4 jdbc, you do not have to worry about encoding, except that your database needs a encoding capable to store the data. UTF-8 or Unicode will to the trick. Consult your jdbc documentation for properties to set. Sometimes you have set an encoding explicitly (jdbc:mysql://localhost:3306/?useUnicode=yes&characterEncoding=UTF-8).
When outputting the data, sometimes the output must have a specific encoding. Normally, your JVM runs with the default system char set but you need another one, for example when rendering a HTML file.

Comparing String Integers Issue

I have a scanner that reads a 7 character alphanumeric code (inputted by the user). the String variable is called "code".
The last character of the code (7th character, 6th index) MUST BE NUMERIC, while the rest may be either numeric or alphabetical.
So, I sought ought to make a catch, which would stop the rest of the method from executing if the last character in the code was anything but a number (from 0 - 9).
However, my code does not work as expected, seeing as even if my code ends in an integer between 0 and 9, the if statement will be met, and print out "last character in code is non-numerical).
example code: 45m4av7
CharacterAtEnd prints out as the string character 7, as it should.
however my program still tells me my code ends non-numerically.
I'm aware that my number values are string characters, but it shouldnt matter, should it?
also I apparently cannot compare actual integer values with an "|", which is mainly why im using String.valueOf, and taking the string characters of 0-9.
String characterAtEnd = String.valueOf(code.charAt(code.length()-1));
System.out.println(characterAtEnd);
if(!characterAtEnd.equals(String.valueOf(0|1|2|3|4|5|6|7|8|9))){
System.out.println("INVALID CRC CODE: last character in code in non-numerical.");
System.exit(0);
I cannot for the life of me, figure out why my program is telling me my code (that has a 7 at the end) ends non-numerically. It should skip the if statement and continue on. right?
The String contains method will work here:
String digits = "0123456789";
digits.contains(characterAtEnd); // true if ends with digit, false otherwise
String.valueOf(0|1|2|3|4|5|6|7|8|9) is actually "15", which of course can never be equal to the last character. This should make sense, because 0|1|2|3|4|5|6|7|8|9 evaluates to 15 using integer math, which then gets converted to a String.
Alternatively, try this:
String code = "45m4av7";
char characterAtEnd = code.charAt(code.length() - 1);
System.out.println(characterAtEnd);
if(characterAtEnd < '0' || characterAtEnd > '9'){
System.out.println("INVALID CRC CODE: last character in code in non-numerical.");
System.exit(0);
}
You are doing bitwise operations here: if(!characterAtEnd.equals(String.valueOf(0|1|2|3|4|5|6|7|8|9)))
Check out the difference between | and ||
This bit of code should accomplish your task using regular expressions:
String code = "45m4av7";
if (!code.matches("^.+?\\d$")){
System.out.println("INVALID CRC CODE");
}
Also, for reference, this method sometimes comes in handy in similar situations:
/* returns true if someString actually ends with the specified suffix */
someString.endsWith(suffix);
As .endswith(suffix) does not take regular expressions, if you wanted to go through all possible lower-case alphabet values, you'd need to do something like this:
/* ASCII approach */
String s = "hello";
boolean endsInLetter = false;
for (int i = 97; i <= 122; i++) {
if (s.endsWith(String.valueOf(Character.toChars(i)))) {
endsInLetter = true;
}
}
System.out.println(endsInLetter);
/* String approach */
String alphabet = "abcdefghijklmnopqrstuvwxyz";
boolean endsInLetter2 = false;
for (int i = 0; i < alphabet.length(); i++) {
if (s.endsWith(String.valueOf(alphabet.charAt(i)))) {
endsInLetter2 = true;
}
}
System.out.println(endsInLetter2);
Note that neither of the aforementioned approaches are a good idea - they are clunky and rather inefficient.
Going off of the ASCII approach, you could even do something like this:
ASCII reference : http://www.asciitable.com/
int i = (int)code.charAt(code.length() - 1);
/* Corresponding ASCII values to digits */
if(i <= 57 && i >= 48){
System.out.println("Last char is a digit!");
}
If you want a one-liner, stick to regular expressions, for example:
System.out.println((!code.matches("^.+?\\d$")? "Invalid CRC Code" : "Valid CRC Code"));
I hope this helps!

How to detect end of string in byte array to string conversion?

I receive from socket a string in a byte array which look like :
[128,5,6,3,45,0,0,0,0,0]
The size given by the network protocol is the total lenght of the string (including zeros) so , in my exemple 10.
If i simply do :
String myString = new String(myBuffer);
I have at the end of the string 5 non correct caracter. The conversion don't seems to detect the end of string caracter (0).
To get the correct size and the correct string i do this :
int sizeLabelTmp = 0;
//Iterate over the 10 bit to get the real size of the string
for(int j = 0; j<(sizeLabel); j++) {
byte charac = datasRec[j];
if(charac == 0)
break;
sizeLabelTmp ++;
}
// Create a temp byte array to make a correct conversion
byte[] label = new byte[sizeLabelTmp];
for(int j = 0; j<(sizeLabelTmp); j++) {
label[j] = datasRec[j];
}
String myString = new String(label);
Is there a better way to handle the problem ?
Thanks
May be its too late, But it may help others. The simplest thing you can do is new String(myBuffer).trim() that gives you exactly what you want.
0 isn't an "end of string character". It's just a byte. Whether or not it only comes at the end of the string depends on what encoding you're using (and what the text can be). For example, if you used UTF-16, every other byte would be 0 for ASCII characters.
If you're sure that the first 0 indicates the end of the string, you can use something like the code you've given, but I'd rewrite it as:
int size = 0;
while (size < data.length)
{
if (data[size] == 0)
{
break;
}
size++;
}
// Specify the appropriate encoding as the last argument
String myString = new String(data, 0, size, "UTF-8");
I strongly recommend that you don't just use the platform default encoding - it's not portable, and may well not allow for all Unicode characters. However, you can't just decide arbitrarily - you need to make sure that everything producing and consuming this data agrees on the encoding.
If you're in control of the protocol, it would be much better if you could introduce a length prefix before the string, to indicate how many bytes are in the encoded form. That way you'd be able to read exactly the right amount of data (without "over-reading") and you'd be able to tell if the data was truncated for some reason.
You can always start at the end of the byte array and go backwards until you hit the first non-zero. Then just copy that into a new byte and then String it. Hope this helps:
byte[] foo = {28,6,3,45,0,0,0,0};
int i = foo.length - 1;
while (foo[i] == 0)
{
i--;
}
byte[] bar = Arrays.copyOf(foo, i+1);
String myString = new String(bar, "UTF-8");
System.out.println(myString.length());
Will give you a result of 4.
Strings in Java aren't ended with a 0, like in some other languages. 0 will get turned into the so-called null character, which is allowed to appear in a String. I suggest you use some trimming scheme that either detects the first index of the array that's a 0 and uses a sub-array to construct the String (assuming all the rest will be 0 after that), or just construct the String and call trim(). That'll remove leading and trailing whitespace, which is any character with ASCII code 32 or lower.
The latter won't work if you have leading whitespace you must preserve. Using a StringBuilder and deleting characters at the end as long as they're the null character would work better in that case.
It appears to me that you are ignoring the read-count returned by the read() method. The trailing null bytes probably weren't sent, they are probably still left over from the initial state of the buffer.
int count = in.read(buffer);
if (count < 0)
; // EOS: close the socket etc
else
String s = new String(buffer, 0, count);
Not to dive into the protocol considerations that the original OP mentioned, how about this for trimming the trailing zeroes ?
public static String bytesToString(byte[] data) {
String dataOut = "";
for (int i = 0; i < data.length; i++) {
if (data[i] != 0x00)
dataOut += (char)data[i];
}
return dataOut;
}

Categories