String to Unicode in Java - java

I have a large string and need to convert all the non-alphanumeric characters to their Unicode code values.
For example
Input string : abc12/dad-das/das_sdj
Output String : abc12:002Fdad:002Ddas:002Fdas:005Fsdj
Currently I am using this function
for (char c : str.toCharArray()) {
System.out.printf(":%04X \n", (int) c);
}
Is there a better way to do it ?

Here are two ways to do it:
// Looping over string characters
private static String convert(String input) {
StringBuilder buf = new StringBuilder(input.length() + 16);
for (int i = 0; i < input.length(); i++) {
char c = input.charAt(i);
if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
buf.append(c);
else
buf.append(String.format(":%04X", (int) c));
}
return buf.toString();
}
// Using regular expression
private static String convert(String input) {
StringBuffer buf = new StringBuffer(input.length() + 16);
Matcher m = Pattern.compile("[^a-zA-Z0-9]").matcher(input);
while (m.find())
m.appendReplacement(buf, String.format(":%04X", (int) m.group().charAt(0)));
return m.appendTail(buf).toString();
}
Test
System.out.println(convert("abc12/dad-das/das_sdj"));
Output
abc12:002Fdad:002Ddas:002Fdas:005Fsdj
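On Java 9 or later, the regular-expression variant can probably be shortened with Matcher.replaceAll, which accepts a function from each match to its replacement. The following is only a sketch of that idea: the class name EscapeNonAlnumSketch is made up for illustration, and it assumes the input contains only characters from the Basic Multilingual Plane (one char each), as the versions above implicitly do.

```java
import java.util.regex.Pattern;

class EscapeNonAlnumSketch {
    // Compiled once and reused; matches any single non-alphanumeric ASCII-range miss.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Replaces every character that is not a-z, A-Z or 0-9 with ":XXXX" (4 hex digits).
    static String convert(String input) {
        return NON_ALNUM.matcher(input)
                .replaceAll(m -> String.format(":%04X", (int) m.group().charAt(0)));
    }

    public static void main(String[] args) {
        System.out.println(convert("abc12/dad-das/das_sdj"));
    }
}
```

The replacement text never contains '$' or '\', so it does not need Matcher.quoteReplacement here.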

Related

Java. I want to repeat the words in the string, but something goes wrong in my code

I want to repeat only the words in my input string. But the order of output is not right. Here's my code:
public static String repeatWords(String s, int num){
StringBuilder sb = new StringBuilder();
StringBuilder sbtemporary = new StringBuilder();
int leng = s.length();
for (int i=0; i < leng; i++){
char c = s.charAt(i);
int b = (int) c;
if (b >= 65 && b <= 90 || b >= 97 && b <= 122){
sbtemporary.append(c);
} else if (b == 32){
sbtemporary.append(" ");
}
if (b == 32){
for (int j = 1; j<= num-1; j++){
sb = sb.append(" " + sbtemporary);
sbtemporary.delete(0,sbtemporary.length());
}
}
sb.append(c);
}
String str = sb.toString();
return str;
}
s is the input string, and num is the number of times each word should appear. The result I want looks like this:
When the input is : "How are you? I am fine."
The output should be like: "How How are are you you?"
But the result of my code is:
"How How are are you? you I I am am fine."
I don't really know what goes wrong. Could someone please help me with this?
System.out.println((int) "?".charAt(0)); //63
System.out.println((int) ".".charAt(0)); //46
"?" and "." are not letters, so they are never appended to sbtemporary.
Corrected Code:-
public static String repeatWords(String s, int num){
StringBuilder sb = new StringBuilder();
StringBuilder sbtemporary = new StringBuilder();
int leng = s.length();
for (int i=0; i < leng; i++){
char c = s.charAt(i);
int b = (int) c;
if (b >= 65 && b <= 90 || b >= 97 && b <= 122){
sbtemporary.append(c);
}
if (b == 32 && sbtemporary.length() != 0){
for (int j = 1; j<= num-1; j++){
sb.append(" " + sbtemporary);
}
sbtemporary.delete(0,sbtemporary.length());
}
sb.append(c);
}
String str = sb.toString();
return str;
}
Changes:-
Removed the else if (b == 32) branch that appended a space to sbtemporary.
Moved the deletion of sbtemporary's content outside the j loop; otherwise it works only for num == 2.
Added a second condition to the b == 32 check so that it runs only when sbtemporary is not empty.
Add this check, so that the word held in sbtemporary is repeated when a '?' (code 63) is reached.
System.out.println(repeatWords(str, 1));
if ( b == 63) {
sb = sb.append(" " + sbtemporary);
sb.append(c);
break;
}
if (b == 63) {
for (int j = 1; j<= num; j++){
sb.append(" " + sbtemporary);
}
sb.append(c);
break;
}
public static String repeatWords(String s, int num) {
StringBuilder sb = new StringBuilder();
StringBuilder sbtemporary = new StringBuilder();
int leng = s.length();
for (int i = 0; i < leng; i++) {
char c = s.charAt(i);
int b = (int) c;
if (b == 63) {
for (int j = 1; j<= num; j++){
sb.append(" " + sbtemporary);
}
sb.append(c);
break;
}
if (b >= 65 && b <= 90 || b >= 97 && b <= 122) {
sbtemporary.append(c);
}
if (b == 32 && sbtemporary.length() != 0){
for (int j = 1; j<= num; j++){
sb.append(" " + sbtemporary);
}
sbtemporary.delete(0,sbtemporary.length());
}
sb.append(c);
}
String str = sb.toString();
return str;
}
You can simply do this instead,
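int n = 2; // example value: the no. of times to repeat each word
String content = "How are you? I am fine.";
String[] words = content.split("\\s"); // `\\s` splits on a single whitespace char, so runs of 2 or more spaces yield empty strings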
StringBuilder builder = new StringBuilder();
for(String word : words) {
for(int i = 0; i < n; i++) { // n is the no. of times to repeat the word
builder.append(word).append(" ");
}
}
System.out.printf("Repeated String : %s", builder.toString().trim());
A tidier way with Java 8: Collections.nCopies(n, word) returns a list containing the word n times, String.join turns that list into a single space-separated string, and Collectors.joining(" ") then combines all of the repeated words into the final space-separated result.
String content = "How are you? I am fine.";
String[] words = content.split("\\s");
String result = Arrays.stream(words)
.map(word -> String.join(" ", Collections.nCopies(n, word)))
.collect(Collectors.joining(" ")); //n is the no. of times to repeat the word
System.out.printf("Repeated String : %s", result);
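Note that splitting on whitespace repeats punctuation along with the word ("you? you?"), whereas the question asks for "you you?". One possible way to get that behaviour is to match only the runs of letters and copy everything else through unchanged. This is a sketch under that assumption; the class name RepeatWordsSketch and the choice of "[A-Za-z]+" as the definition of a word are illustrative, not taken from the answers above.

```java
import java.util.Collections;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatWordsSketch {
    // A "word" is assumed to be a run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Repeats every word num times (space-separated); other characters are copied as-is.
    static String repeatWords(String s, int num) {
        Matcher m = WORD.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0; // index just past the previous match
        while (m.find()) {
            out.append(s, last, m.start()); // text between words (spaces, punctuation)
            out.append(String.join(" ", Collections.nCopies(num, m.group())));
            last = m.end();
        }
        out.append(s, last, s.length()); // trailing text after the last word
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(repeatWords("How are you?", 2));
    }
}
```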

how to distinguish Unicode characters and ASCII characters

I want to distinguish Unicode characters and ASCII characters from the below string:
abc\u263A\uD83D\uDE0A\uD83D\uDE22123
How can I distinguish characters? Can anyone help me with this issue? I have tried some code, but it crashes in some cases. What is wrong with my code?
The first three characters are abc, and the last three characters are 123. The rest of the string is Unicode characters. I want to make a string array like this:
str[0] = 'a';
str[1] = 'b';
str[2] = 'c';
str[3] = '\u263A\uD83D';
str[4] = '\uDE0A\uD83D';
str[5] = '\uDE22';
str[6] = '1';
str[7] = '2';
str[8] = '3';
Code:
private String[] getCharArray(String unicodeStr) {
ArrayList<String> list = new ArrayList<>();
for (int i = 0; i < unicodeStr.length(); i++) {
if (unicodeStr.charAt(i) == '\\') {
list.add(unicodeStr.substring(i, i + 11));
i = i + 11;
} else {
list.add(String.valueOf(unicodeStr.charAt(i)));
}
}
return list.toArray(new String[list.size()]);
}
ASCII characters are part of Unicode: they are the Unicode codepoints U+0000 - U+007F, inclusive.
Java strings are represented in UTF-16, an encoding of Unicode that uses 16-bit code units. Each Java char is one UTF-16 code unit. Unicode codepoints U+0000 - U+FFFF use 1 UTF-16 code unit and thus fit in a single char, whereas Unicode codepoints U+10000 and higher require a UTF-16 surrogate pair and thus need two chars.
If the string has UTF-16 code units represented as actual char values, then you can use Java's string methods that work with codepoints, eg:
private String[] getCharArray(String unicodeStr) {
ArrayList<String> list = new ArrayList<>();
int i = 0, j;
while (i < unicodeStr.length()) {
j = unicodeStr.offsetByCodePoints(i, 1);
list.add(unicodeStr.substring(i, j));
i = j;
}
return list.toArray(new String[list.size()]);
}
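To make the distinction between ASCII and other codepoints concrete, here is a small sketch that walks a string by codepoint and reports, for each one, whether it falls in U+0000 - U+007F and how many chars it occupies. The class name CodePointSketch and the output format are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointSketch {
    // Returns one description per codepoint: its value, ASCII or not, and its char count.
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> {
            String kind = (cp <= 0x7F) ? "ASCII" : "non-ASCII";
            out.add(String.format("U+%04X %s chars=%d", cp, kind, Character.charCount(cp)));
        });
        return out;
    }

    public static void main(String[] args) {
        // The string from the question, written with real char values.
        describe("abc\u263A\uD83D\uDE0A\uD83D\uDE22123").forEach(System.out::println);
    }
}
```

Character.charCount returns 2 for codepoints that need a surrogate pair, which is the case for the two emoji in the sample string.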
On the other hand, if the string has UTF-16 code units represented in an encoded "\uXXXX" format (ie, as 6 distinct characters - '\', 'u', ...), then things get a little more complicated as you have to parse the encoded sequences manually.
If you want to preserve the "\uXXXX" strings in your array, you could do something like this:
private boolean isUnicodeEncoded(String s, int index)
{
return (
(s.charAt(index) == '\\') &&
((index+5) < s.length()) &&
(s.charAt(index+1) == 'u')
);
}
private String[] getCharArray(String unicodeStr) {
ArrayList<String> list = new ArrayList<>();
int i = 0, j, start;
char ch;
while (i < unicodeStr.length()) {
start = i;
if (isUnicodeEncoded(unicodeStr, i)) {
ch = (char) Integer.parseInt(unicodeStr.substring(i+2, i+6), 16);
j = 6;
}
else {
ch = unicodeStr.charAt(i);
j = 1;
}
i += j;
if (Character.isHighSurrogate(ch) && (i < unicodeStr.length())) {
if (isUnicodeEncoded(unicodeStr, i)) {
ch = (char) Integer.parseInt(unicodeStr.substring(i+2, i+6), 16);
j = 6;
}
else {
ch = unicodeStr.charAt(i);
j = 1;
}
if (Character.isLowSurrogate(ch)) {
i += j;
}
}
list.add(unicodeStr.substring(start, i));
}
return list.toArray(new String[list.size()]);
}
If you want to decode the "\uXXXX" strings into actual chars in your array, you could do something like this instead:
private boolean isUnicodeEncoded(String s, int index)
{
return (
(s.charAt(index) == '\\') &&
((index+5) < s.length()) &&
(s.charAt(index+1) == 'u')
);
}
private String[] getCharArray(String unicodeStr) {
ArrayList<String> list = new ArrayList<>();
int i = 0, j;
char ch1, ch2;
while (i < unicodeStr.length()) {
if (isUnicodeEncoded(unicodeStr, i)) {
ch1 = (char) Integer.parseInt(unicodeStr.substring(i+2, i+6), 16);
j = 6;
}
else {
ch1 = unicodeStr.charAt(i);
j = 1;
}
i += j;
if (Character.isHighSurrogate(ch1) && (i < unicodeStr.length())) {
if (isUnicodeEncoded(unicodeStr, i)) {
ch2 = (char) Integer.parseInt(unicodeStr.substring(i+2, i+6), 16);
j = 6;
}
else {
ch2 = unicodeStr.charAt(i);
j = 1;
}
if (Character.isLowSurrogate(ch2)) {
list.add(String.valueOf(new char[]{ch1, ch2}));
i += j;
continue;
}
}
list.add(String.valueOf(ch1));
}
return list.toArray(new String[list.size()]);
}
Or, something like this (per https://stackoverflow.com/a/24046962/65863):
private String[] getCharArray(String unicodeStr) throws IOException { // Properties.load can throw IOException
Properties p = new Properties();
p.load(new StringReader("key="+unicodeStr));
unicodeStr = p.getProperty("key");
ArrayList<String> list = new ArrayList<>();
int i = 0;
while (i < unicodeStr.length()) {
if (Character.isHighSurrogate(unicodeStr.charAt(i)) &&
((i+1) < unicodeStr.length()) &&
Character.isLowSurrogate(unicodeStr.charAt(i+1)))
{
list.add(unicodeStr.substring(i, i+2));
i += 2;
}
else {
list.add(unicodeStr.substring(i, i+1));
++i;
}
}
return list.toArray(new String[list.size()]);
}
It's not entirely clear what you're asking for, but if you want to tell if a specific character is ASCII, you can use Guava's CharMatcher.ascii().
if ( CharMatcher.ascii().matches('a') ) {
System.out.println("'a' is ascii");
}
if ( CharMatcher.ascii().matches('\u263A') ) {
// this shouldn't be printed
System.out.println("'\u263A' is ascii");
}

Don't understand how to do this [duplicate]

I hope this isn't too much of a stupid question, I have looked on 5 different pages of Google results but haven't been able to find anything on this.
What I need to do is convert a string consisting entirely of hex characters into ASCII, for example:
String fileName =
75546f7272656e745c436f6d706c657465645c6e667375635f6f73745f62795f6d757374616e675c50656e64756c756d2d392c303030204d696c65732e6d7033006d7033006d7033004472756d202620426173730050656e64756c756d00496e2053696c69636f00496e2053696c69636f2a3b2a0050656e64756c756d0050656e64756c756d496e2053696c69636f303038004472756d2026204261737350656e64756c756d496e2053696c69636f30303800392c303030204d696c6573203c4d757374616e673e50656e64756c756d496e2053696c69636f3030380050656e64756c756d50656e64756c756d496e2053696c69636f303038004d50330000
Every way I have seen makes it seem like you have to put it into an array first. Is there no way to loop through each pair of characters and convert them?
Just use a for loop to go through each couple of characters in the string, convert them to a character and then whack the character on the end of a string builder:
String hex = "75546f7272656e745c436f6d706c657465645c6e667375635f6f73745f62795f6d757374616e675c50656e64756c756d2d392c303030204d696c65732e6d7033006d7033006d7033004472756d202620426173730050656e64756c756d00496e2053696c69636f00496e2053696c69636f2a3b2a0050656e64756c756d0050656e64756c756d496e2053696c69636f303038004472756d2026204261737350656e64756c756d496e2053696c69636f30303800392c303030204d696c6573203c4d757374616e673e50656e64756c756d496e2053696c69636f3030380050656e64756c756d50656e64756c756d496e2053696c69636f303038004d50330000";
StringBuilder output = new StringBuilder();
for (int i = 0; i < hex.length(); i+=2) {
String str = hex.substring(i, i+2);
output.append((char)Integer.parseInt(str, 16));
}
System.out.println(output);
Or (Java 8+) if you're feeling particularly uncouth, use the infamous "fixed width string split" hack to enable you to do a one-liner with streams instead:
System.out.println(Arrays
.stream(hex.split("(?<=\\G..)")) //https://stackoverflow.com/questions/2297347/splitting-a-string-at-every-n-th-character
.map(s -> Character.toString((char)Integer.parseInt(s, 16)))
.collect(Collectors.joining()));
Either way, this gives a few lines starting with the following:
uTorrent\Completed\nfsuc_ost_by_mustang\Pendulum-9,000 Miles.mp3
Hmmm... :-)
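The easiest way to do it is with javax.xml.bind.DatatypeConverter (bundled with the JDK through Java 8; the javax.xml.bind package was removed from the JDK in Java 11, where it needs a separate JAXB dependency):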
String hex = "75546f7272656e745c436f6d706c657465645c6e667375635f6f73745f62795f6d757374616e675c50656e64756c756d2d392c303030204d696c65732e6d7033006d7033006d7033004472756d202620426173730050656e64756c756d00496e2053696c69636f00496e2053696c69636f2a3b2a0050656e64756c756d0050656e64756c756d496e2053696c69636f303038004472756d2026204261737350656e64756c756d496e2053696c69636f30303800392c303030204d696c6573203c4d757374616e673e50656e64756c756d496e2053696c69636f3030380050656e64756c756d50656e64756c756d496e2053696c69636f303038004d50330000";
byte[] s = DatatypeConverter.parseHexBinary(hex);
System.out.println(new String(s));
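On Java 17 or later, java.util.HexFormat offers a similar conversion without JAXB. The sketch below assumes that runtime; the class name HexFormatSketch is made up, and ISO_8859_1 is chosen here only because it maps every byte to the char with the same value, which mirrors the (char) cast used in the loop-based answers.

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

class HexFormatSketch {
    // Parses pairs of hex digits into bytes, then maps each byte to one char.
    // parseHex throws IllegalArgumentException for odd-length or non-hex input.
    static String hexToAscii(String hex) {
        byte[] bytes = HexFormat.of().parseHex(hex);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(hexToAscii("48656c6c6f"));
    }
}
```

Passing an explicit charset also avoids depending on the platform default, which new String(byte[]) would use.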
String hexToAscii(String s) {
int n = s.length();
StringBuilder sb = new StringBuilder(n / 2);
for (int i = 0; i < n; i += 2) {
char a = s.charAt(i);
char b = s.charAt(i + 1);
sb.append((char) ((hexToInt(a) << 4) | hexToInt(b)));
}
return sb.toString();
}
private static int hexToInt(char ch) {
if ('a' <= ch && ch <= 'f') { return ch - 'a' + 10; }
if ('A' <= ch && ch <= 'F') { return ch - 'A' + 10; }
if ('0' <= ch && ch <= '9') { return ch - '0'; }
throw new IllegalArgumentException(String.valueOf(ch));
}
Check out Convert a string representation of a hex dump to a byte array using Java?
Disregarding encoding, etc. you can do new String (hexStringToByteArray("75546..."));
So as I understand it, you need to pull out successive pairs of hex digits, then decode that 2-digit hex number and take the corresponding char:
String s = "...";
StringBuilder sb = new StringBuilder(s.length() / 2);
for (int i = 0; i < s.length(); i+=2) {
String hex = "" + s.charAt(i) + s.charAt(i+1);
int ival = Integer.parseInt(hex, 16);
sb.append((char) ival);
}
String string = sb.toString();
//%%%%%%%%%%%%%%%%%%%%%% HEX to ASCII %%%%%%%%%%%%%%%%%%%%%%
public String convertHexToString(String hex){
String ascii="";
String str;
// Convert hex string to "even" length
int rmd,length;
length=hex.length();
rmd =length % 2;
if(rmd==1)
hex = "0"+hex;
// split into two characters
for( int i=0; i<hex.length()-1; i+=2 ){
//split the hex into pairs
String pair = hex.substring(i, (i + 2));
//convert hex to decimal
int dec = Integer.parseInt(pair, 16);
str=CheckCode(dec);
ascii=ascii+" "+str;
}
return ascii;
}
public String CheckCode(int dec){
String str;
//convert the decimal to character
str = Character.toString((char) dec);
if(dec<32 || dec>126 && dec<161)
str="n/a";
return str;
}
In my case, I have hexadecimal data in an int array and want to convert it to a String.
int[] encodeHex = new int[] { 0x48, 0x65, 0x6c, 0x6c, 0x6f }; // Hello encode
for (int i = 0; i < encodeHex.length; i++) {
System.out.print((char) (encodeHex[i]));
}

How can I uppercase and lowercase a Char

GWT is not allowing me to use Character.toUpperCase(char) and Character.toLowerCase(char). How can I rewrite the method below so that it does not use the Character class or any external library?
public static String toDisplayCase(String s) {
final String ACTIONABLE_DELIMITERS = " '-/"; // these cause the character following
// to be capitalized
StringBuilder sb = new StringBuilder();
boolean capNext = true;
for (char c : s.toCharArray()) {
c = (capNext)
? Character.toUpperCase(c)
: Character.toLowerCase(c);
sb.append(c);
capNext = (ACTIONABLE_DELIMITERS.indexOf((int) c) >= 0); // explicit cast not needed
}
return sb.toString();
}
If for some reason you are not allowed to use the Character class (though that sounds quite crazy), you can add or subtract the ASCII offset between the two cases.
eg:
for (char c : s.toCharArray()) {
c = (capNext)
? ( (c>='a'&&c<='z') ? (char)(c-32) : c) //to Upper Case: 'a' (97) - 32 = 'A' (65)
: ( (c>='A'&&c<='Z') ? (char)(c+32) : c); //to Lower Case: 'A' (65) + 32 = 'a' (97)
sb.append(c);
capNext = (ACTIONABLE_DELIMITERS.indexOf((int) c) >= 0); // explicit cast not needed
}
Just use basic operators (apply one conversion or the other, not both in sequence):
if (c >= 'a' && c <= 'z')
c = (char) (c - 'a' + 'A'); // lower to upper
if (c >= 'A' && c <= 'Z')
c = (char) (c - 'A' + 'a'); // upper to lower
Here are toLower and toUpper using ASCII values. Hope it helps.
static char toUpperCase(char c) {
if (97 <= c && c <= 122) {
c = (char) ((c - 32));
}
return c;
}
static char toLowerCase(char c) {
if (65 <= c && c <= 90) {
c = (char) ((c + 32));
}
return c;
}
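Putting the pieces together, the method from the question could be rewritten roughly as follows. This is a sketch that assumes only ASCII letters need case conversion (anything else passes through unchanged); the class name DisplayCaseSketch is invented for the example.

```java
class DisplayCaseSketch {
    // ASCII-only replacement for Character.toUpperCase.
    static char toUpperCase(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - 32) : c;
    }

    // ASCII-only replacement for Character.toLowerCase.
    static char toLowerCase(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + 32) : c;
    }

    static String toDisplayCase(String s) {
        final String ACTIONABLE_DELIMITERS = " '-/"; // the char after one of these is capitalized
        StringBuilder sb = new StringBuilder(s.length());
        boolean capNext = true; // the first character is always capitalized
        for (char c : s.toCharArray()) {
            c = capNext ? toUpperCase(c) : toLowerCase(c);
            sb.append(c);
            capNext = ACTIONABLE_DELIMITERS.indexOf(c) >= 0;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDisplayCase("hello wORLD-foo"));
    }
}
```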

How do I shift letters down in a loop

I'm trying to create a loop which only returns letters. In my code, I get symbols that I don't want. How do I fix my loop so that when my integer is +3, it only gives me letters?
public static String caesarDecrypt(String encoded, int shift){
String decrypted = "";
for (int i = 0; i < encoded.length(); i++) {
char t = encoded.charAt(i);
if ((t <= 'a') && (t >= 'z')) {
t -= shift;
}
if (t > 'z') {
t += 26;
} else if ((t >= 'A') && (t <= 'Z')) {
t -= shift;
if (t > 'Z')
t += 26;
} else {
}
decrypted = decrypted + t;
}
}
You are subtracting the shift value from the letters. Therefore, the new letter can never be > 'z'. You should check if it is < 'a' (or 'A', respectively).
StringBuilder decrypted = new StringBuilder(encoded.length());
for (int i = 0; i < encoded.length(); i++)
{
char t = encoded.charAt(i);
if ((t >= 'a') && (t <= 'z'))
{
t -= shift;
while (t < 'a')
{
t += 26;
}
}
else if ((t >= 'A') && (t <= 'Z'))
{
t -= shift;
while (t < 'A')
{
t += 26;
}
}
decrypted.append(t);
}
return decrypted.toString();
Also, you shouldn't be using String concatenation to generate the result. Learn about StringBuilder instead.
EDIT: To make sure the new letter is in the range 'a' .. 'z' for an arbitrary (positive) shift, you should use while instead of if.
I am not giving you exact code. But I can help you in logic:
Check whether you are reaching end points (a, A, z, Z) due to the shift.
If you exceed the end points either way, then compute the distance between end points and shifted t. Add/subtract/modulus (based on the end point) this distance to the other endpoint to get the exact letter.
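The logic described above can be sketched with modular arithmetic: convert the letter to an offset from its end point ('a' or 'A'), subtract the shift, and wrap the result back into 0..25. The class name CaesarSketch is hypothetical, and Math.floorMod is used here as one way to keep the offset non-negative for any shift, positive or negative.

```java
class CaesarSketch {
    // Shifts each ASCII letter back by 'shift' positions, wrapping within its own case.
    static String caesarDecrypt(String encoded, int shift) {
        StringBuilder decrypted = new StringBuilder(encoded.length());
        for (int i = 0; i < encoded.length(); i++) {
            char t = encoded.charAt(i);
            if (t >= 'a' && t <= 'z') {
                // offset from 'a', shifted and wrapped into 0..25
                t = (char) ('a' + Math.floorMod(t - 'a' - shift, 26));
            } else if (t >= 'A' && t <= 'Z') {
                t = (char) ('A' + Math.floorMod(t - 'A' - shift, 26));
            }
            decrypted.append(t); // non-letters are appended unchanged
        }
        return decrypted.toString();
    }

    public static void main(String[] args) {
        System.out.println(caesarDecrypt("Khoor, Zruog!", 3));
    }
}
```

Unlike the % operator, Math.floorMod never returns a negative result for a positive divisor, so no loop or extra "+ 26" is needed.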
Something like this? (Warning, untested)
public static String caesarDecrypt(String encoded, int shift) {
String decrypted = "";
for (int i = 0; i < encoded.length(); i++) {
char t = Character.toUpperCase(encoded.charAt(i));
decrypted = decrypted + decode(t, shift);
}
return decrypted;
}
// call with uppercase ASCII letters, and a positive shift
private static char decode(char n, int shift)
{
if ((n < 'A') || (n > 'Z')) return '-'; // anything that is not a letter becomes '-'
String str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
// shift backwards to decrypt; adding 26 keeps the index non-negative
return str.charAt(((n - 'A') - (shift % 26) + 26) % 26);
}
As you are naming your method caesarDecrypt (I assume you mean encrypt), I think you want a shift in the alphabet including wrapping around.
This code will do that for you:
public class Snippet {
public static void main(String[] args) {
System.out.println(caesarShift("This is a Fizzy test.", 5));
System.out.println(caesarShift("Ymnx nx f Kneed yjxy.", -5));
}
public static String caesarShift(String input, int shift) {
// making sure that shift is positive so that modulo works correctly
while (shift < 0)
shift += 26;
int l = input.length();
StringBuffer output = new StringBuffer();
for (int i = 0; i < l; i++) {
char c = input.charAt(i);
char newLetter = c;
if (c >= 'a' && c <= 'z') { // lowercase
newLetter = (char) ((c - 'a' + shift) % 26 + 'a'); // shift, wrap it and convert it back to char
} else if (c >= 'A' && c <= 'Z') { // uppercase
newLetter = (char) ((c - 'A' + shift) % 26 + 'A'); // shift, wrap it and convert it back to char
}
output.append(newLetter);
}
return output.toString();
}
}
This will handle lowercase and uppercase letters. Everything else will be left as it is (like spaces, punctuations, etc).
Please take some time to look at this code to understand how it works. I have put some comments to make it clearer. From your code I think you were a bit confused, so it is important that you understand this code very well. If you have questions, feel free to ask them.
This code
String start = "abcdefghijklmnopqrstuvwxyz";
String encrypted = caesarShift(start, 3);
String decrypted = caesarShift(encrypted, -3);
System.out.println("Start : " + start);
System.out.println("Encrypted : " + encrypted);
System.out.println("Decrypted : " + decrypted);
will give this result
Start : abcdefghijklmnopqrstuvwxyz
Encrypted : defghijklmnopqrstuvwxyzabc
Decrypted : abcdefghijklmnopqrstuvwxyz
