How to distinguish Unicode characters and ASCII characters - Java

I want to distinguish Unicode characters from ASCII characters in the string below:
abc\u263A\uD83D\uDE0A\uD83D\uDE22123
How can I tell them apart? I have tried some code, but it crashes in some cases. What is wrong with my code?
The first three characters are abc, and the last three characters are 123. Everything in between is non-ASCII Unicode. I want to build a string array like this:
str[0] = 'a';
str[1] = 'b';
str[2] = 'c';
str[3] = '\u263A\uD83D';
str[4] = '\uDE0A\uD83D';
str[5] = '\uDE22';
str[6] = '1';
str[7] = '2';
str[8] = '3';
Code:
private String[] getCharArray(String unicodeStr) {
ArrayList<String> list = new ArrayList<>();
for (int i = 0; i < unicodeStr.length(); i++) {
if (unicodeStr.charAt(i) == '\\') {
list.add(unicodeStr.substring(i, i + 11));
i = i + 11;
} else {
list.add(String.valueOf(unicodeStr.charAt(i)));
}
}
return list.toArray(new String[list.size()]);
}

ASCII characters exist in Unicode: they are the Unicode code points U+0000 - U+007F, inclusive.
Java strings are represented in UTF-16, an encoding of Unicode built from 16-bit code units. Each Java char is one UTF-16 code unit. Unicode code points U+0000 - U+FFFF use 1 UTF-16 code unit and thus fit in a single char, whereas Unicode code points U+10000 and higher require a UTF-16 surrogate pair and thus need two chars.
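To make the code unit / code point distinction concrete, here is a minimal sketch that counts both for the string from the question, assuming the escapes are real char values in the string. The class and method names (CodeUnitDemo, codeUnits, codePoints) are made up for illustration.

```java
final class CodeUnitDemo {
    // Number of UTF-16 code units (Java chars) in s.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in s; a valid surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "abc\u263A\uD83D\uDE0A\uD83D\uDE22123";
        System.out.println(codeUnits(s));                  // 11
        System.out.println(codePoints(s));                 // 9
        System.out.println(Character.charCount(0x263A));   // 1 (fits in one char)
        System.out.println(Character.charCount(0x1F60A));  // 2 (needs a surrogate pair)
    }
}
```

The two emoji account for the difference: each occupies two chars but is a single code point.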
If the string has UTF-16 code units represented as actual char values, then you can use Java's string methods that work with codepoints, eg:
private String[] getCharArray(String unicodeStr) {
ArrayList<String> list = new ArrayList<>();
int i = 0, j;
while (i < unicodeStr.length()) {
j = unicodeStr.offsetByCodePoints(i, 1);
list.add(unicodeStr.substring(i, j));
i = j;
}
return list.toArray(new String[list.size()]);
}
On the other hand, if the string has UTF-16 code units represented in an encoded "\uXXXX" format (ie, as 6 distinct characters - '\', 'u', ...), then things get a little more complicated as you have to parse the encoded sequences manually.
If you want to preserve the "\uXXXX" strings in your array, you could do something like this:
private boolean isUnicodeEncoded(String s, int index)
{
return (
(s.charAt(index) == '\\') &&
((index+5) < s.length()) &&
(s.charAt(index+1) == 'u')
);
}
private String[] getCharArray(String unicodeStr) {
ArrayList<String> list = new ArrayList<>();
int i = 0, j, start;
char ch;
while (i < unicodeStr.length()) {
start = i;
if (isUnicodeEncoded(unicodeStr, i)) {
ch = (char) Integer.parseInt(unicodeStr.substring(i+2, i+6), 16);
j = 6;
}
else {
ch = unicodeStr.charAt(i);
j = 1;
}
i += j;
if (Character.isHighSurrogate(ch) && (i < unicodeStr.length())) {
if (isUnicodeEncoded(unicodeStr, i)) {
ch = (char) Integer.parseInt(unicodeStr.substring(i+2, i+6), 16);
j = 6;
}
else {
ch = unicodeStr.charAt(i);
j = 1;
}
if (Character.isLowSurrogate(ch)) {
i += j;
}
}
list.add(unicodeStr.substring(start, i));
}
return list.toArray(new String[list.size()]);
}
If you want to decode the "\uXXXX" strings into actual chars in your array, you could do something like this instead:
private boolean isUnicodeEncoded(String s, int index)
{
return (
(s.charAt(index) == '\\') &&
((index+5) < s.length()) &&
(s.charAt(index+1) == 'u')
);
}
private String[] getCharArray(String unicodeStr) {
ArrayList<String> list = new ArrayList<>();
int i = 0, j;
char ch1, ch2;
while (i < unicodeStr.length()) {
if (isUnicodeEncoded(unicodeStr, i)) {
ch1 = (char) Integer.parseInt(unicodeStr.substring(i+2, i+6), 16);
j = 6;
}
else {
ch1 = unicodeStr.charAt(i);
j = 1;
}
i += j;
if (Character.isHighSurrogate(ch1) && (i < unicodeStr.length())) {
if (isUnicodeEncoded(unicodeStr, i)) {
ch2 = (char) Integer.parseInt(unicodeStr.substring(i+2, i+6), 16);
j = 6;
}
else {
ch2 = unicodeStr.charAt(i);
j = 1;
}
if (Character.isLowSurrogate(ch2)) {
list.add(String.valueOf(new char[]{ch1, ch2}));
i += j;
continue;
}
}
list.add(String.valueOf(ch1));
}
return list.toArray(new String[list.size()]);
}
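Or, something like this (per https://stackoverflow.com/a/24046962/65863), which lets java.util.Properties decode the escapes. Properties.load() declares IOException, so the method has to declare or handle it:
private String[] getCharArray(String unicodeStr) throws IOException {
Properties p = new Properties();
// Properties.load() decodes the backslash-u escapes in the value
p.load(new StringReader("key="+unicodeStr));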
unicodeStr = p.getProperty("key");
ArrayList<String> list = new ArrayList<>();
int i = 0;
while (i < unicodeStr.length()) {
if (Character.isHighSurrogate(unicodeStr.charAt(i)) &&
((i+1) < unicodeStr.length()) &&
Character.isLowSurrogate(unicodeStr.charAt(i+1)))
{
list.add(unicodeStr.substring(i, i+2));
i += 2;
}
else {
list.add(unicodeStr.substring(i, i+1));
++i;
}
}
return list.toArray(new String[list.size()]);
}

It's not entirely clear what you're asking for, but if you want to tell whether a specific character is ASCII, you can use Guava's CharMatcher.ascii().
if ( CharMatcher.ascii().matches('a') ) {
System.out.println("'a' is ascii");
}
// a char literal holds exactly one UTF-16 code unit, so test one char at a time
if ( CharMatcher.ascii().matches('\u263A') ) {
// this shouldn't be printed
System.out.println("'\u263A' is ascii");
}
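If pulling in Guava is not an option, a plain-Java check works too. This is only a sketch, assuming "ASCII" means code points U+0000 to U+007F and that the string holds real char values rather than escaped text. The names AsciiSplitDemo, isAscii and nonAscii are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class AsciiSplitDemo {
    // A code point is ASCII when it lies in U+0000..U+007F.
    static boolean isAscii(int codePoint) {
        return codePoint >= 0 && codePoint <= 0x7F;
    }

    // Returns the non-ASCII code points of s, each as its own String.
    static List<String> nonAscii(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // reads a surrogate pair as one value
            if (!isAscii(cp)) {
                out.add(new String(Character.toChars(cp)));
            }
            i += Character.charCount(cp);       // advance by 1 or 2 chars
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "abc\u263A\uD83D\uDE0A\uD83D\uDE22123";
        System.out.println(nonAscii(s).size()); // 3
    }
}
```

Iterating by code point rather than by char keeps each emoji in one piece instead of splitting it into two surrogates.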

Related

Find the correct algorithm to generate all possible binary combinations

I'm trying to write a non-recursive Java method called showStar, which takes a string and generates all possible combinations of that string with each mask character “*” replaced by 0 or 1.
Given the input "1011*00*10",
the method `showStar` should display:
1011000010
1011000110
1011100010
1011100110
I tried it this way, but as soon as the number of possible cases exceeds the string length, the output is no longer correct.
Here is my code:
public static void showStar(String s){
String save;
int count = 0;
int poss;
save = s.replace('*','0');
StringBuilder myString = new StringBuilder(save);
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) == '*' && myString.charAt(i) == '0') {
myString.setCharAt(i, '1');
System.out.println(myString);
}
}
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) == '*' && myString.charAt(i) == '1') {
myString.setCharAt(i, '0');
System.out.println(myString);
}
}
}
Say there are k *s. Then there are 2^k solutions. You can generate these by copying the bits of the integers 0 to 2^k - 1, in order, into the * positions (padding each number with leading zeroes to k bits).
E.g. 1**1:
0 = 00 => 1001
1 = 01 => 1011
2 = 10 => 1101
3 = 11 => 1111
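The counting idea above can be sketched without recursion by using an int as the counter and shifting out one bit per asterisk. This is only an illustration: the class name StarExpander and the method expand are made up, and the limit of 30 asterisks is an assumption so that 2^k still fits in an int.

```java
import java.util.ArrayList;
import java.util.List;

final class StarExpander {
    // Returns every string obtained by replacing each '*' in mask with '0' or '1'.
    static List<String> expand(String mask) {
        List<Integer> stars = new ArrayList<>();
        for (int i = 0; i < mask.length(); i++) {
            if (mask.charAt(i) == '*') {
                stars.add(i);
            }
        }
        int k = stars.size();
        if (k > 30) {
            throw new IllegalArgumentException("too many '*' for an int counter");
        }
        List<String> out = new ArrayList<>();
        char[] buf = mask.toCharArray();
        for (int n = 0; n < (1 << k); n++) {
            for (int j = 0; j < k; j++) {
                // the leftmost '*' receives the most significant of the k bits
                int bit = (n >> (k - 1 - j)) & 1;
                buf[stars.get(j)] = (char) ('0' + bit);
            }
            out.add(new String(buf));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : expand("1011*00*10")) {
            System.out.println(s);
        }
    }
}
```

For "1**1" this yields 1001, 1011, 1101, 1111, in the same order as the table above.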
Here a recursive algorithm works perfectly:
Check whether the input string contains an asterisk '*' using x = str.indexOf('*');
If no asterisk is present (x == -1), just print the string and return.
Otherwise, replace the asterisk at that position with '0' and with '1', and call showStar() recursively for both replacements.
public static void showStar(String str) {
int x = str.indexOf('*');
if(x == -1) {
System.out.println(str);
return;
}
String prefix = str.substring(0, x);
String suffix = str.substring(x + 1);
for (char i = '0'; i <= '1'; i++) {
showStar(prefix + i + suffix);
}
}
Update
In a non-recursive implementation we need to collect the asterisk positions, then build the binary representation of each number and write its digits into those positions:
public static void showStar(String str) {
int[] xs = IntStream.range(0, str.length())
.filter(i -> str.charAt(i) == '*')
.toArray();
int num = (int) Math.pow(2, xs.length); // 2^n variants for n asterisks
String format = xs.length > 0 ? "%" + xs.length + "s" : "%s"; // fix if no '*'
for (int i = 0; i < num; i++) {
String bin = String.format(format, Integer.toBinaryString(i))
.replace(' ', '0'); // pad leading zeros
StringBuilder sb = new StringBuilder(str);
// set 0 or 1 in place of asterisk(s)
for (int j = 0; j < xs.length; j++) {
sb.setCharAt(xs[j], bin.charAt(j));
}
System.out.println(sb);
}
}

Sorting a string with capital letters, small letters and numbers

Sort a string containing small letters, capital letters and digits in Java, e.g.
aAbcB1C23
Expected answer: ABCabc123
I tried sorting the array both ascending and descending, but neither worked: either way, ABC ends up in the middle. Any ideas?
I'd like to solve it with O(1) auxiliary space and perhaps O(n log n) time.
public class SortTheGivenStringAlphabetically {
public static void performAction() {
String input = "aAbcB1C23";
char[] inputCharArray = input.toCharArray();
sort(inputCharArray, 0, (inputCharArray.length) - 1);
for (int i = 0; i < inputCharArray.length; i++) {
System.out.println(inputCharArray[i]);
}
}
public static void sort(char[] array, int low, int high) {
if (low < high) {
int pi = partition(array, low, high);
sort(array, low, pi - 1);
sort(array, pi + 1, high);
}
}
private static int partition(char[] array, int low, int high) {
int pivot = array[high];
int i = low - 1;
for (int j = low; j < high; j++) {
if (array[j] <= pivot) {
i++;
char temp = array[i];
array[i] = array[j];
array[j] = temp;
}
}
char temp = array[i + 1];
array[i + 1] = array[high];
array[high] = temp;
return i + 1;
}
public static void main(String[] args) {
performAction();
}
}
Create 3 ArrayLists.
Separate all characters from the input and add them to the specific ArrayList.
Then sort them using Collections.sort().
Finally combine all the characters in the order you want.
String input = "aAbcB1C23";
ArrayList<Character> capital = new ArrayList<>(),
simple = new ArrayList<>(),
numbers = new ArrayList<>();
for (Character c : input.toCharArray()) {
if (Character.isLetter(c)) {
if (Character.isUpperCase(c)) {
capital.add(c);
} else {
simple.add(c);
}
} else {
numbers.add(c);
}
}
Collections.sort(simple);
Collections.sort(capital);
Collections.sort(numbers);
StringBuilder output = new StringBuilder();
for (Character c : capital) {
output.append(c);
}
for (Character c : simple) {
output.append(c);
}
for (Character c : numbers) {
output.append(c);
}
System.out.println(output.toString());
Output:
ABCabc123
The catch is that the natural order is '1' (49) < 'A' (65) < 'a' (97), so a plain sort puts the digits first.
String input = "aAbcB1C23"; // Sorted: ABCabc123
char[] array = input.toCharArray();
sort(array, 0, (array.length) - 1);
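So either you reorder the sorted result by moving the leading digits to the end (output being the sorted array converted back to a String):
output = output.replaceFirst("^([0-9]*)([A-Za-z]*)$", "$2$1");
or you map every char concerned to a value that reflects the desired order, most easily with a function: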
int value(char ch) {
if ('A' <= ch && ch <= 'Z') {
return 100 + (ch - 'A');
} else if ('a' <= ch && ch <= 'z') {
return 200 + (ch - 'a');
} else if ('0' <= ch && ch <= '9') {
return 300 + (ch - '0');
} else {
return 400 + (int) ch;
}
}
Now compare value(array[j]) with value(array[high]) in the partition step instead of comparing the raw chars.
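As a self-contained illustration of sorting by such a rank, here is a sketch that reuses value() from above and hands the comparison to Arrays.sort. It boxes the chars, so it does not meet the O(1) auxiliary space wish from the question. The names RankSortDemo and sortByRank are hypothetical.

```java
import java.util.Arrays;

final class RankSortDemo {
    // Rank: capitals first, then small letters, then digits, then everything else.
    static int value(char ch) {
        if ('A' <= ch && ch <= 'Z') {
            return 100 + (ch - 'A');
        } else if ('a' <= ch && ch <= 'z') {
            return 200 + (ch - 'a');
        } else if ('0' <= ch && ch <= '9') {
            return 300 + (ch - '0');
        } else {
            return 400 + (int) ch;
        }
    }

    static String sortByRank(String input) {
        Character[] boxed = new Character[input.length()];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = input.charAt(i);
        }
        // compare the ranks, not the raw char codes
        Arrays.sort(boxed, (a, b) -> Integer.compare(value(a), value(b)));
        StringBuilder sb = new StringBuilder(boxed.length);
        for (Character c : boxed) {
            sb.append(c.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sortByRank("aAbcB1C23")); // ABCabc123
    }
}
```

To stay within O(1) extra space, the same value() comparison would go into the in-place quicksort from the question instead.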
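The first loop runs once per character in the string, and the last two loops run a constant number of times (bounded by the 78-element count array), so the running time is linear in the input length. Note that it assumes every character lies between '0' (48) and '}' (125); anything else indexes outside the array.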
public static String SortSstring(String input) {
int []intArr=new int[78];
for(int i=0;i<input.length();i++)
intArr[input.charAt(i)-48] ++;
String OutputString="";
for(int i=17;i<78;i++){
OutputString+=String.valueOf(new char[intArr[i]]).replace('\0', (char)(i+48));
}
for(int i=0;i<10;i++){
OutputString+=String.valueOf(new char[intArr[i]]).replace('\0', (char)(i+48));
}
return OutputString;
}
One option is to use integer value of a character when doing comparison. To make numbers appear at the end we can add some fixed value to it (e.g. 100).
public static void main(String[] args) {
String input = "aAbcB1C23";
char[] charArray = input.toCharArray();
Character[] charObjectArray = ArrayUtils.toObject(charArray);
Arrays.sort(charObjectArray, new Comparator<Character>() {
@Override
public int compare(Character o1, Character o2) {
Integer i1 = convert(Integer.valueOf(o1));
Integer i2 = convert(Integer.valueOf(o2));
return i1.compareTo(i2);
}
private Integer convert(Integer original) {
if (original < 58) {
return original + 100;
}
return original;
}
});
System.out.println(new String(ArrayUtils.toPrimitive(charObjectArray)));
}

How do I replace every n-th occurrence of a character x in a String

My question is: how can I replace every 3rd ';' in a String with a ','?
For example:
String s = "RED;34;34;BLUE;44;44;GREEN;8;8;BLUE;53;53";
so that the String looks like:
RED;34;34,BLUE;44;44,GREEN;8;8,BLUE;53;53
I tried to solve it like this, but I can't assign to charAt(i) to replace it with another char.
int counter =0;
for (int i=0;i<s.length();i++){
if(s.charAt(i) == ';'){
counter++;
}
if(counter ==3){
s.charAt(i)=',';
counter =0;
}
}
Normally a question is expected to show some effort of its own, but regex is hard.
s = s.replaceAll("([^;]*;[^;]*;[^;]*);", "$1,");
The pattern matches a run of zero or more non-semicolon characters followed by a semicolon, and so on, three times over.
[^ ...characters... ] matches any single char that is not listed.
...* matches zero or more of the immediately preceding element.
The match of the 1st group (...) is available as $1, so in effect only the last of the three semicolons is replaced by a comma.
You can use the modulo % operator to know the 3rd time something occurs. And a simple conversion between string and char array to do the rest:
class Main {
public static void main(String[] args) {
String s1 = "RED;34;34;BLUE;44;44;GREEN;8;8;BLUE;53;53";
char [] s = s1.toCharArray();
int j=0;
for(int i=0;i<s.length;i++){
if (s[i]==';') {
j++;
if(j % 3 == 0) {
s[i] = ',';
}
}
}
System.out.println(s);
}
}
There are many ways to do it, as I suggested in a comment. Here are implementations of the ones I suggested, but there are of course more ways than this.
The first is the simplest, from a code point of view, if you know regex. See answer by Joop Eggen for an explanation.
The second is likely the fastest, especially if you eliminate the % modulo operator by resetting j to 0 instead.
private static String usingRegex(String s) {
return s.replaceAll("([^;]*;[^;]*;[^;]*);", "$1,");
}
private static String usingCharArray(String s) {
char[] arr = s.toCharArray();
for (int i = 0, j = 0; i < arr.length; i++)
if (arr[i] == ';' && ++j % 3 == 0)
arr[i] = ',';
return new String(arr);
}
private static String usingStringBuilder(String s) {
StringBuilder sb = new StringBuilder(s);
for (int i = 0, j = 0; i < sb.length(); i++)
if (sb.charAt(i) == ';' && ++j % 3 == 0)
sb.setCharAt(i, ',');
return sb.toString();
}
private static String usingSubstring(String s) {
int i = -1, j = 0;
while ((i = s.indexOf(';', i + 1)) != -1)
if (++j % 3 == 0)
s = s.substring(0, i) + ',' + s.substring(i + 1);
return s;
}
Test
String s = "RED;34;34;BLUE;44;44;GREEN;8;8;BLUE;53;53";
System.out.println(usingRegex(s));
System.out.println(usingCharArray(s));
System.out.println(usingStringBuilder(s));
System.out.println(usingSubstring(s));
Output
RED;34;34,BLUE;44;44,GREEN;8;8,BLUE;53;53
RED;34;34,BLUE;44;44,GREEN;8;8,BLUE;53;53
RED;34;34,BLUE;44;44,GREEN;8;8,BLUE;53;53
RED;34;34,BLUE;44;44,GREEN;8;8,BLUE;53;53
Not as elegant as @Joop's answer, but probably simpler to understand:
String s = "RED;34;34;BLUE;44;44;GREEN;8;8;BLUE;53;53";
char[] chars = s.toCharArray();
int counter = 1;
for (int i = 0; i < chars.length; i++){
if (chars[i] == ';'){
if (counter == 3){
chars[i] = ','; // replace ';' with ','
counter = 1; // set counter to 1
}else {
counter++;
}
}
}
String output = String.valueOf(chars);
System.out.println(output); // RED;34;34,BLUE;44;44,GREEN;8;8,BLUE;53;53
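The answers above all hard-code n = 3 and the ';' to ',' replacement. Since the title asks about every n-th occurrence of a character x, here is a sketch of the char-array approach with those values turned into parameters. The class name NthReplacer and the method replaceEveryNth are made up for illustration.

```java
final class NthReplacer {
    // Replaces every n-th occurrence of target in s with replacement.
    static String replaceEveryNth(String s, char target, char replacement, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        char[] chars = s.toCharArray();
        int seen = 0; // occurrences of target since the last replacement
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == target && ++seen == n) {
                chars[i] = replacement;
                seen = 0;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String s = "RED;34;34;BLUE;44;44;GREEN;8;8;BLUE;53;53";
        System.out.println(replaceEveryNth(s, ';', ',', 3));
    }
}
```

Resetting the counter instead of using the % operator follows the suggestion made in one of the answers above.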

Don't understand how to do this [duplicate]

I hope this isn't too stupid a question; I have looked through 5 pages of Google results but haven't been able to find anything on this.
What I need to do is convert a string consisting entirely of hex characters into ASCII, for example:
String fileName =
75546f7272656e745c436f6d706c657465645c6e667375635f6f73745f62795f6d757374616e675c50656e64756c756d2d392c303030204d696c65732e6d7033006d7033006d7033004472756d202620426173730050656e64756c756d00496e2053696c69636f00496e2053696c69636f2a3b2a0050656e64756c756d0050656e64756c756d496e2053696c69636f303038004472756d2026204261737350656e64756c756d496e2053696c69636f30303800392c303030204d696c6573203c4d757374616e673e50656e64756c756d496e2053696c69636f3030380050656e64756c756d50656e64756c756d496e2053696c69636f303038004d50330000
Every approach I have seen makes it seem like you have to put it into an array first. Is there no way to loop through each pair of characters and convert them?
Just use a for loop to go through each couple of characters in the string, convert them to a character and then whack the character on the end of a string builder:
String hex = "75546f7272656e745c436f6d706c657465645c6e667375635f6f73745f62795f6d757374616e675c50656e64756c756d2d392c303030204d696c65732e6d7033006d7033006d7033004472756d202620426173730050656e64756c756d00496e2053696c69636f00496e2053696c69636f2a3b2a0050656e64756c756d0050656e64756c756d496e2053696c69636f303038004472756d2026204261737350656e64756c756d496e2053696c69636f30303800392c303030204d696c6573203c4d757374616e673e50656e64756c756d496e2053696c69636f3030380050656e64756c756d50656e64756c756d496e2053696c69636f303038004d50330000";
StringBuilder output = new StringBuilder();
for (int i = 0; i < hex.length(); i+=2) {
String str = hex.substring(i, i+2);
output.append((char)Integer.parseInt(str, 16));
}
System.out.println(output);
Or (Java 8+) if you're feeling particularly uncouth, use the infamous "fixed width string split" hack to enable you to do a one-liner with streams instead:
System.out.println(Arrays
.stream(hex.split("(?<=\\G..)")) //https://stackoverflow.com/questions/2297347/splitting-a-string-at-every-n-th-character
.map(s -> Character.toString((char)Integer.parseInt(s, 16)))
.collect(Collectors.joining()));
Either way, this gives a few lines starting with the following:
uTorrent\Completed\nfsuc_ost_by_mustang\Pendulum-9,000 Miles.mp3
Hmmm... :-)
The easiest way is javax.xml.bind.DatatypeConverter (bundled with the JDK up to Java 10; from Java 11 on it needs the separate JAXB API dependency):
String hex = "75546f7272656e745c436f6d706c657465645c6e667375635f6f73745f62795f6d757374616e675c50656e64756c756d2d392c303030204d696c65732e6d7033006d7033006d7033004472756d202620426173730050656e64756c756d00496e2053696c69636f00496e2053696c69636f2a3b2a0050656e64756c756d0050656e64756c756d496e2053696c69636f303038004472756d2026204261737350656e64756c756d496e2053696c69636f30303800392c303030204d696c6573203c4d757374616e673e50656e64756c756d496e2053696c69636f3030380050656e64756c756d50656e64756c756d496e2053696c69636f303038004d50330000";
byte[] s = DatatypeConverter.parseHexBinary(hex);
System.out.println(new String(s));
String hexToAscii(String s) {
int n = s.length();
StringBuilder sb = new StringBuilder(n / 2);
for (int i = 0; i < n; i += 2) {
char a = s.charAt(i);
char b = s.charAt(i + 1);
sb.append((char) ((hexToInt(a) << 4) | hexToInt(b)));
}
return sb.toString();
}
private static int hexToInt(char ch) {
if ('a' <= ch && ch <= 'f') { return ch - 'a' + 10; }
if ('A' <= ch && ch <= 'F') { return ch - 'A' + 10; }
if ('0' <= ch && ch <= '9') { return ch - '0'; }
throw new IllegalArgumentException(String.valueOf(ch));
}
Check out "Convert a string representation of a hex dump to a byte array using Java?"
Disregarding encoding etc., you can then do new String(hexStringToByteArray("75546..."));
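The hexStringToByteArray helper from the linked question is not reproduced here, so the following is only a guess at what such a helper could look like, not the linked answer's code. It also assumes ISO-8859-1 as the charset so that every byte maps to exactly one char; HexBytesDemo and hexToText are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class HexBytesDemo {
    // Decodes pairs of hex digits into bytes; rejects odd lengths and non-hex chars.
    static byte[] hexStringToByteArray(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < hex.length(); i += 2) {
            int hi = Character.digit(hex.charAt(i), 16);
            int lo = Character.digit(hex.charAt(i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character near index " + i);
            }
            out[i / 2] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Assumption: the bytes are single-byte text, so ISO-8859-1 maps them 1:1 to chars.
    static String hexToText(String hex) {
        return new String(hexStringToByteArray(hex), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(hexToText("75546f7272656e74")); // uTorrent
    }
}
```

Naming the charset explicitly avoids depending on the platform default that new String(byte[]) would otherwise use.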
So as I understand it, you need to pull out successive pairs of hex digits, then decode that 2-digit hex number and take the corresponding char:
String s = "...";
StringBuilder sb = new StringBuilder(s.length() / 2);
for (int i = 0; i < s.length(); i+=2) {
String hex = "" + s.charAt(i) + s.charAt(i+1);
int ival = Integer.parseInt(hex, 16);
sb.append((char) ival);
}
String string = sb.toString();
//%%%%%%%%%%%%%%%%%%%%%% HEX to ASCII %%%%%%%%%%%%%%%%%%%%%%
public String convertHexToString(String hex){
String ascii="";
String str;
// Convert hex string to "even" length
int rmd,length;
length=hex.length();
rmd =length % 2;
if(rmd==1)
hex = "0"+hex;
// split into two characters
for( int i=0; i<hex.length()-1; i+=2 ){
//split the hex into pairs
String pair = hex.substring(i, (i + 2));
//convert hex to decimal
int dec = Integer.parseInt(pair, 16);
str=CheckCode(dec);
ascii=ascii+" "+str;
}
return ascii;
}
public String CheckCode(int dec){
String str;
//convert the decimal to character
str = Character.toString((char) dec);
if(dec<32 || dec>126 && dec<161)
str="n/a";
return str;
}
In this case, I have the hexadecimal data in an int array and want to convert it to a String.
int[] encodeHex = new int[] { 0x48, 0x65, 0x6c, 0x6c, 0x6f }; // Hello encode
for (int i = 0; i < encodeHex.length; i++) {
System.out.print((char) (encodeHex[i]));
}

How to use ASCII in array

I want to write a program that takes a string, counts the occurrences of every letter of the English alphabet, stores the counts in an array, and prints the result like this:
java test abaacc
a:***
b:*
c:**
* - as many times as the letter appears.
public static void main (String[] args) {
String input = args[0];
char [] letters = input.toCharArray();
System.out.println((char)97);
String a = "a:";
for (int i=0; i<letters.length; i++) {
int temp = letters[i];
i = i+97;
if (temp == (char)i) {
temp = temp + "*";
}
i = i - 97;
}
System.out.println(temp);
}
Writing (char)97 makes the code less readable. Use 'a'.
As 3kings said in a comment, you need an array of 26 counters, one for each letter of the English alphabet.
Your code should also handle both uppercase and lowercase letters.
private static void printLetterCounts(String text) {
int[] letterCount = new int[26];
for (char c : text.toCharArray())
if (c >= 'a' && c <= 'z')
letterCount[c - 'a']++;
else if (c >= 'A' && c <= 'Z')
letterCount[c - 'A']++;
for (int i = 0; i < 26; i++)
if (letterCount[i] > 0) {
char[] stars = new char[letterCount[i]];
Arrays.fill(stars, '*');
System.out.println((char)('a' + i) + ":" + new String(stars));
}
}
Test
printLetterCounts("abaacc");
System.out.println();
printLetterCounts("This is a test of the letter counting logic");
Output
a:***
b:*
c:**
a:*
c:**
e:****
f:*
g:**
h:**
i:****
l:**
n:**
o:***
r:*
s:***
t:*******
u:*
