Insert in between the letters of String - java

I have to split it and put %20 between each of the digits:
var code="247834"
to
2%204%207%208%203%204
It looks simple, but I am not able to convert it.
Any answer in Scala or Java is appreciated.

In Scala,
code.mkString("%20")

In whatever language, your general technique should be to split the string into an array and then join it with the string '%20'. In PHP this would be
$array = str_split( $code);
$result = join( '%20', $array);
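In Javascript (split('') with an empty-string separator is needed; split() with no argument returns the whole string as a single element):
var code="247834";
var myArray = code.split('');
var result = myArray.join('%20');
In one statement:
var result = code.split('').join('%20');
// or "247834".split('').join('%20');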
In Java, with Apache Commons Lang:
String[] myArray = "247834".split(""); // one element per character (Java 8+)
String result = StringUtils.join(myArray, "%20");
(the latter may need minor changes, e.g. on Android a join method is in TextUtils and swaps the parameter order)
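If you would rather avoid a third-party library, the same split-and-join idea can be written with the JDK alone (Java 8+). This is a minimal sketch; the class and method names are illustrative, not from the original answers:

```java
import java.util.stream.Collectors;

public class InsertBetweenChars {

    // Join every character of the input with the given delimiter,
    // e.g. "247834" with "%20" becomes "2%204%207%208%203%204".
    static String separate(String input, String delim) {
        return input.chars()                              // IntStream of UTF-16 chars
                .mapToObj(c -> String.valueOf((char) c))  // each char as a one-letter String
                .collect(Collectors.joining(delim));      // glue them with the delimiter
    }

    public static void main(String[] args) {
        System.out.println(separate("247834", "%20"));
        // Equivalent one-liner: since Java 8, split("") yields one element per char
        System.out.println(String.join("%20", "247834".split("")));
    }
}
```

Both lines print 2%204%207%208%203%204. For text containing emoji, iterating with codePoints() instead of chars() would keep each surrogate pair together.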

In Javascript
var code = "247834";
var output = '';
for (var i = 0; i < code.length; ++i)
{
output = output + code.charAt(i) + ((i < code.length - 1) ? '%20' : '');
}
alert(output);
http://jsfiddle.net/sNrU7/1/

Java solution
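Starting from the end, you can insert the delimiter using a StringBuffer (or StringBuilder); working backwards means each insertion leaves the positions still to be processed unchanged. This is the easiest solution, though it may not be the fastest.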
Output
2%204%207%208%203%204
2-4-7-8-3-4
Code
public class Test {
    public static void main(String[] args) {
        String str = "247834";
        System.out.println(separate(str, "%20"));
        System.out.println(separate(str, "-"));
    }

    public static String separate(String str, String delim) {
        StringBuilder buff = new StringBuilder(str);
        // Walk backwards so each insertion leaves the earlier indices unchanged;
        // start at length - 1 so no delimiter is added after the last character.
        for (int i = str.length() - 1; i > 0; i--) {
            buff.insert(i, delim);
        }
        return buff.toString();
    }
}

Related

(Java) Append elements to a String variable from String array up to specific index?

public class sample {
public static void main(String[] args) {
String[] test = new String[1024];
int count = 0;
test[count] = "33";
count++;
test[count] = "34";
String s = new String();
This is just a simplified version, but I would like to append the elements of test to a String variable s, up to index count, without using StringBuilder. Is there a way to do it? Thank you.
Edit: without using a loop as well. Is there a String manipulation function I can use?
One way to do that is using String.join and Arrays.copyOf:
String s = String.join("", Arrays.copyOf(test, count + 1));
Which, with your test data, produces 3334
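I don't quite understand what you want, but I guess you could use a char array?
// Copy the characters of test[0..count] into one char array (needs java.util.Arrays)
char[] c = new char[Arrays.stream(test, 0, count + 1).mapToInt(String::length).sum()];
int pos = 0;
for (int i = 0; i <= count; i++) {
    test[i].getChars(0, test[i].length(), c, pos);
    pos += test[i].length();
}
String s = String.valueOf(c);
Hope this helps :)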
It's hard to say what you're asking for...
Concatenating consecutive numbers could be easily done with a stream:
String s = IntStream.rangeClosed(0, 1024)
.mapToObj(Integer::toString)
.collect(Collectors.joining());
We can use the join function of String, assuming test is the string array (limit(count + 1) keeps indices 0 through count, matching the other answers):
String joinedString = String.join("", Arrays.stream(test).limit(count + 1).collect(Collectors.toList()));
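A slightly simpler variant of the stream approach, as a sketch that assumes (like the other answers) the wanted elements are indices 0 through count inclusive: Arrays.stream(array, from, to) streams just that range, and Collectors.joining() concatenates it without an intermediate list. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class JoinUpToIndex {

    // Concatenate test[0..count] (inclusive) without an explicit loop or StringBuilder.
    static String joinUpTo(String[] test, int count) {
        return Arrays.stream(test, 0, count + 1)  // only the filled slots
                .collect(Collectors.joining());   // empty separator
    }

    public static void main(String[] args) {
        String[] test = new String[1024];
        int count = 0;
        test[count] = "33";
        count++;
        test[count] = "34";
        System.out.println(joinUpTo(test, count)); // 3334
    }
}
```

Because the range stops at count + 1, the unused null slots of the 1024-element array are never touched.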
You can use String.join()
import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        String[] test = new String[1024];
        int count = 0;
        test[count] = "33";
        count++;
        test[count] = "34";
        // Join elements 0..count (inclusive); the empty delimiter simply concatenates them
        String s = String.join("", Arrays.copyOf(test, count + 1));
        System.out.println(s); // 3334
    }
}

Java: Display unicode chars as chars when printing string [duplicate]

I have a string with escaped Unicode characters, \uXXXX, and I want to convert it to regular Unicode letters. For example:
"\u0048\u0065\u006C\u006C\u006F World"
should become
"Hello World"
I know that when I print the first string it already shows Hello World. My problem is that I read file names from a file and then search for them. The file names in the file are escaped with Unicode escapes, and when I search for the files I can't find them, since the search looks for a file with \uXXXX in its name.
The Apache Commons Lang StringEscapeUtils.unescapeJava() can decode it properly.
import org.apache.commons.lang.StringEscapeUtils;
@Test
public void testUnescapeJava() {
String sJava="\\u0048\\u0065\\u006C\\u006C\\u006F";
System.out.println("StringEscapeUtils.unescapeJava(sJava):\n" + StringEscapeUtils.unescapeJava(sJava));
}
output:
StringEscapeUtils.unescapeJava(sJava):
Hello
Technically doing:
String myString = "\u0048\u0065\u006C\u006C\u006F World";
automatically converts it to "Hello World", so I assume you are reading the string from a file. To convert it to "Hello", you'll have to parse the text into the separate Unicode escapes (take each \uXXXX and keep just the XXXX), then call Integer.parseInt(XXXX, 16) to turn the hex digits into an int, and then cast that to char to get the actual character.
Edit: Some code to accomplish this:
String str = myString.split(" ")[0];
str = str.replace("\\","");
String[] arr = str.split("u");
String text = "";
for(int i = 1; i < arr.length; i++){
int hexVal = Integer.parseInt(arr[i], 16);
text += (char)hexVal;
}
// Text will now have Hello
You can use StringEscapeUtils from Apache Commons Lang, e.g.:
String title = StringEscapeUtils.unescapeJava("\\u0048\\u0065\\u006C\\u006C\\u006F");
This simple method will work for most cases, but it trips up over something like "\u005Cu0048", which should decode to the literal string "\u0048" but actually decodes to "H": the first pass produces "\u0048" as the working string, which the while loop then processes again.
static final String decode(final String in)
{
String working = in;
int index;
index = working.indexOf("\\u");
while(index > -1)
{
int length = working.length();
if(index > (length-6))break;
int numStart = index + 2;
int numFinish = numStart + 4;
String substring = working.substring(numStart, numFinish);
int number = Integer.parseInt(substring,16);
String stringStart = working.substring(0, index);
String stringEnd = working.substring(numFinish);
working = stringStart + ((char)number) + stringEnd;
index = working.indexOf("\\u");
}
return working;
}
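The double-decoding problem described above goes away if the input is scanned once, left to right, and decoded characters are written to a separate builder so they are never re-examined. A minimal sketch (the class and method names are hypothetical); it decodes only well-formed escapes with exactly four hex digits and copies everything else through unchanged, so the input \u005Cu0048 stays as the literal text \u0048:

```java
public class SinglePassDecoder {

    // Decode backslash-u escapes in one left-to-right pass. Output goes to a
    // separate StringBuilder, so a decoded backslash is never re-scanned.
    static String decodeOnce(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '\\' && i + 6 <= in.length() && in.charAt(i + 1) == 'u'
                    && isHex(in, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(in.substring(i + 2, i + 6), 16));
                i += 6; // skip the whole escape
            } else {
                out.append(c); // ordinary character or malformed escape: copy as-is
                i++;
            }
        }
        return out.toString();
    }

    // True if every char in in[from, to) is a hex digit.
    private static boolean isHex(String in, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(in.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(decodeOnce("\\u0048\\u0065\\u006C\\u006C\\u006F World")); // Hello World
        System.out.println(decodeOnce("\\u005Cu0048")); // a backslash followed by u0048, not H
    }
}
```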
Shorter version:
public static String unescapeJava(String escaped) {
if(escaped.indexOf("\\u")==-1)
return escaped;
String processed="";
int position=escaped.indexOf("\\u");
while(position!=-1) {
if(position!=0)
processed+=escaped.substring(0,position);
String token=escaped.substring(position+2,position+6);
escaped=escaped.substring(position+6);
processed+=(char)Integer.parseInt(token,16);
position=escaped.indexOf("\\u");
}
processed+=escaped;
return processed;
}
StringEscapeUtils from the org.apache.commons.lang3 library is deprecated as of 3.6, so you can use the new commons-text library instead:
implementation 'org.apache.commons:commons-text:1.9'
OR
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
<version>1.9</version>
</dependency>
Example code:
org.apache.commons.text.StringEscapeUtils.unescapeJava(escapedString);
With Kotlin you can write your own extension function for String
fun String.unescapeUnicode() = replace("\\\\u([0-9A-Fa-f]{4})".toRegex()) {
String(Character.toChars(it.groupValues[1].toInt(radix = 16)))
}
and then
fun main() {
val originalString = "\\u0048\\u0065\\u006C\\u006C\\u006F World"
println(originalString.unescapeUnicode())
}
It's not totally clear from your question, but I'm assuming you're saying that you have a file where each line is a file name, and each file name is something like this:
\u0048\u0065\u006C\u006C\u006F
In other words, the characters in the file of file names are \, u, 0, 0, 4, 8 and so on.
If so, what you're seeing is expected. Java translates \uXXXX sequences only in source code (for example in string literals) and when loading stored Properties objects. When you read the contents of your file, you get a string consisting of the characters \, u, 0, 0, 4, 8 and so on, not the string Hello.
So you need to parse that string to extract the 0048, 0065, etc. pieces, convert them to chars, build a string from those chars, and then pass that string to the routine that opens the file.
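Put together, those steps might look like the following sketch. It assumes the list file is UTF-8 with one escaped name per line; the list file name (names.txt) and the class and method names are hypothetical. It uses the Java 9+ Matcher.replaceAll(Function) overload and wraps each decoded character in Matcher.quoteReplacement, because a decoded backslash or dollar sign would otherwise be read as replacement syntax.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedFileNames {

    // Matches a literal backslash, 'u', then exactly four hex digits (captured).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Turn each escape into the character it stands for; other text is untouched.
    static String unescape(String line) {
        return ESCAPE.matcher(line).replaceAll(m ->
                Matcher.quoteReplacement(String.valueOf((char) Integer.parseInt(m.group(1), 16))));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical list file: one escaped file name per line, UTF-8 encoded
        Path list = Paths.get(args.length > 0 ? args[0] : "names.txt");
        if (!Files.exists(list)) {
            System.out.println("No list file found: " + list);
            return;
        }
        List<String> lines = Files.readAllLines(list, StandardCharsets.UTF_8);
        for (String line : lines) {
            Path file = Paths.get(unescape(line.trim())); // decoded name, e.g. Hello
            System.out.println(file + " exists: " + Files.exists(file));
        }
    }
}
```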
For Java 9+, you can use the new replaceAll(Function) method of the Matcher class.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern UNICODE_PATTERN = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

public static String unescapeUnicode(String escaped) {
    // quoteReplacement stops a decoded backslash or dollar sign from being treated as replacement syntax
    return UNICODE_PATTERN.matcher(escaped)
            .replaceAll(r -> Matcher.quoteReplacement(String.valueOf((char) Integer.parseInt(r.group(1), 16))));
}

public static void main(String[] args) {
    String originalMessage = "\\u0048\\u0065\\u006C\\u006C\\u006F World";
    String unescapedMessage = unescapeUnicode(originalMessage);
    System.out.println(unescapedMessage);
}
I believe the main advantage of this approach over unescapeJava by StringEscapeUtils (besides not using an extra library) is that you can convert only the unicode characters (if you wish), since the latter converts all escaped Java characters (like \n or \t). If you prefer to convert all escaped characters the library is really the best option.
Update regarding the answers suggesting Apache Commons Lang's StringEscapeUtils.unescapeJava(): it was deprecated. Its Javadoc says:
Deprecated. As of 3.6, use commons-text StringEscapeUtils instead.
The replacement is Apache Commons Text's StringEscapeUtils.unescapeJava().
Just wanted to contribute my version, using regex:
private static final String UNICODE_REGEX = "\\\\u([0-9A-Fa-f]{4})";
private static final Pattern UNICODE_PATTERN = Pattern.compile(UNICODE_REGEX);
...
// Double backslashes: the string must contain the literal escapes, as if read from a file
String message = "\\u0048\\u0065\\u006C\\u006C\\u006F World";
Matcher matcher = UNICODE_PATTERN.matcher(message);
StringBuffer decodedMessage = new StringBuffer();
while (matcher.find()) {
    matcher.appendReplacement(decodedMessage,
            Matcher.quoteReplacement(String.valueOf((char) Integer.parseInt(matcher.group(1), 16))));
}
matcher.appendTail(decodedMessage);
System.out.println(decodedMessage.toString());
I wrote a fast, error-tolerant solution:
public static final String decode(final String in) {
int p1 = in.indexOf("\\u");
if (p1 < 0)
return in;
StringBuilder sb = new StringBuilder();
while (true) {
int p2 = p1 + 6;
if (p2 > in.length()) {
sb.append(in.subSequence(p1, in.length()));
break;
}
try {
int c = Integer.parseInt(in.substring(p1 + 2, p1 + 6), 16);
sb.append((char) c);
p1 += 6;
} catch (NumberFormatException e) {
// Not four hex digits: copy the backslash and 'u' literally and continue
sb.append(in.subSequence(p1, p1 + 2));
p1 += 2;
}
int p0 = in.indexOf("\\u", p1);
if (p0 < 0) {
sb.append(in.subSequence(p1, in.length()));
break;
} else {
sb.append(in.subSequence(p1, p0));
p1 = p0;
}
}
return sb.toString();
}
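One easy way I know is to let a JSON parser (org.json JSONObject) decode the escapes:
try {
    // Wrap the escaped text in a JSON object literal; the parser decodes the backslash-u escapes.
    // Assumes myString contains no unescaped double quotes or other characters invalid in a JSON string.
    JSONObject json = new JSONObject("{\"string\":\"" + myString + "\"}");
    String converted = json.getString("string");
} catch (JSONException e) {
    e.printStackTrace();
}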
A fast Kotlin version (the original dropped the final character because the else branch was attached to the inner if):
fun unicodeDecode(unicode: String): String {
    val sb = StringBuilder()
    var i = 0
    while (i < unicode.length) {
        // A complete escape needs a backslash, 'u' and four more chars
        if (i + 5 < unicode.length && unicode[i] == '\\' && unicode[i + 1] == 'u') {
            sb.append(unicode.substring(i + 2, i + 6).toInt(16).toChar())
            i += 6
        } else {
            sb.append(unicode[i]) // ordinary character, including the last one
            i++
        }
    }
    return sb.toString()
}
UnicodeUnescaper from Apache Commons Text does exactly what you want, and ignores any other escape sequences.
String input = "\\u0048\\u0065\\u006C\\u006C\\u006F World";
String output = new UnicodeUnescaper().translate(input);
assert("Hello World".equals(output));
assert("\u0048\u0065\u006C\u006C\u006F World".equals(output));
Where input would be the string you are reading from a file.
Try:
private static final Charset UTF_8 = Charset.forName("UTF-8");
private String forceUtf8Coding(String input) { return new String(input.getBytes(UTF_8), UTF_8); }
Actually, I wrote an open-source library that contains some utilities. One of them converts a Unicode sequence to a String and vice versa. I found it very useful. Here is a quote about the Unicode converter from the article describing the library:
Class StringUnicodeEncoderDecoder has methods that can convert a
String (in any language) into a sequence of Unicode characters and
vice versa. For example a String "Hello World" will be converted into
"\u0048\u0065\u006c\u006c\u006f\u0020 \u0057\u006f\u0072\u006c\u0064"
and may be restored back.
Here is the link to the entire article, which explains what utilities the library has and how to get it. It is available as a Maven artifact or as source from GitHub, and it is very easy to use: Open Source Java library with stack trace filtering, Silent String parsing Unicode converter and Version comparison
Here is my solution...
String decodedName = JwtJson.substring(startOfName, endOfName);
StringBuilder builtName = new StringBuilder();
int i = 0;
while ( i < decodedName.length() )
{
// startsWith with an offset avoids creating a new substring on every iteration
if ( decodedName.startsWith("\\u", i))
{
i=i+2;
builtName.append(Character.toChars(Integer.parseInt(decodedName.substring(i,i+4), 16)));
i=i+4;
}
else
{
builtName.append(decodedName.charAt(i));
i = i+1;
}
}
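I found that many of the answers did not address supplementary characters (code points above U+FFFF). Here is a way to support them in pure Java, without third-party libraries. Note that toUnicode writes its own variable-length format (e.g. \u1f60a) rather than Java's surrogate-pair escapes (\uD83D\uDE0A), and fromUnicode reads that format back.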
http://www.oracle.com/us/technologies/java/supplementary-142654.html
public static String fromUnicode(String unicode) {
String str = unicode.replace("\\", "");
String[] arr = str.split("u");
StringBuffer text = new StringBuffer();
for (int i = 1; i < arr.length; i++) {
int hexVal = Integer.parseInt(arr[i], 16);
text.append(Character.toChars(hexVal));
}
return text.toString();
}
public static String toUnicode(String text) {
StringBuffer sb = new StringBuffer();
for (int i = 0; i < text.length(); i++) {
int codePoint = text.codePointAt(i);
// Skip over the second char in a surrogate pair
if (codePoint > 0xffff) {
i++;
}
String hex = Integer.toHexString(codePoint);
sb.append("\\u");
for (int j = 0; j < 4 - hex.length(); j++) {
sb.append("0");
}
sb.append(hex);
}
return sb.toString();
}
@Test
public void toUnicode() {
System.out.println(toUnicode("😊"));
System.out.println(toUnicode("🥰"));
System.out.println(toUnicode("Hello World"));
}
// output:
// \u1f60a
// \u1f970
// \u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064
@Test
public void fromUnicode() {
System.out.println(fromUnicode("\\u1f60a"));
System.out.println(fromUnicode("\\u1f970"));
System.out.println(fromUnicode("\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u0057\\u006f\\u0072\\u006c\\u0064"));
}
// output:
// 😊
// 🥰
// Hello World
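If the file uses Java's standard escape format instead, a supplementary character appears as two escapes (a surrogate pair): 😊 (U+1F60A) is written \uD83D\uDE0A. Because a Java String is UTF-16, decoding each escape to one char and appending the chars in order rebuilds the pair, so no special handling is needed. A minimal sketch, assuming well-formed input (an escape with non-hex digits would throw NumberFormatException); the names are illustrative:

```java
public class SurrogateEscapes {

    // Decode every backslash-u escape (exactly 4 hex digits) to one UTF-16 char.
    // Consecutive high/low surrogate escapes end up adjacent, forming one code point.
    static String decode(String in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length()) {
            if (in.startsWith("\\u", i) && i + 6 <= in.length()) {
                out.append((char) Integer.parseInt(in.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(in.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String smile = decode("\\uD83D\\uDE0A");
        System.out.println(smile.length());                            // 2 chars
        System.out.println(smile.codePointCount(0, smile.length()));   // 1 code point
        System.out.println(Integer.toHexString(smile.codePointAt(0))); // 1f60a
    }
}
```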
@NominSim
There may be other characters after an escape, so I detect them by length.
private String forceUtf8Coding(String str) {
str = str.replace("\\","");
String[] arr = str.split("u");
StringBuilder text = new StringBuilder();
for(int i = 1; i < arr.length; i++){
String a = arr[i];
String b = "";
if (arr[i].length() > 4){
a = arr[i].substring(0, 4);
b = arr[i].substring(4);
}
int hexVal = Integer.parseInt(a, 16);
text.append((char) hexVal).append(b);
}
return text.toString();
}
Another way to accomplish this is to use chars() (added to CharSequence in Java 8), which iterates over the characters; any char that maps to a surrogate code point is passed through uninterpreted. For example:
String myString = "\u0048\u0065\u006C\u006C\u006F World";
myString.chars().forEach(a -> System.out.print((char)a));
// would print "Hello World"
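If the text may contain supplementary characters, codePoints() (available on CharSequence since Java 8) is usually the better choice, because it yields each code point once instead of its two surrogate halves. A small sketch comparing the two (the class and method names are illustrative):

```java
public class CharsVsCodePoints {

    // Number of UTF-16 chars: a supplementary character counts twice.
    static long charCount(String s) {
        return s.chars().count();
    }

    // Number of code points: a supplementary character counts once.
    static long codePointCount(String s) {
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        String s = "Hi \uD83D\uDE0A"; // "Hi " followed by U+1F60A
        System.out.println(charCount(s));      // 5
        System.out.println(codePointCount(s)); // 4
        // Printing whole code points keeps the emoji intact
        s.codePoints().forEach(cp -> System.out.print(new String(Character.toChars(cp))));
        System.out.println();
    }
}
```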
Solution for Kotlin:
val sourceContent = File("test.txt").readText(Charset.forName("windows-1251"))
val result = String(sourceContent.toByteArray())
Kotlin uses UTF-8 as the default encoding everywhere: toByteArray() has a default argument, Charsets.UTF_8.

Extract digits from a string in Java

I have a Java String object. I need to extract only digits from it. I'll give an example:
"123-456-789" I want "123456789"
Is there a library function that extracts only digits?
Thanks for the answers. Before I try these, do I need to install any additional libraries?
You can use regex and delete non-digits.
str = str.replaceAll("\\D+","");
Here's a more verbose solution. Less elegant, but probably faster:
public static String stripNonDigits(
final CharSequence input /* inspired by seh's comment */){
final StringBuilder sb = new StringBuilder(
input.length() /* also inspired by seh's comment */);
for(int i = 0; i < input.length(); i++){
final char c = input.charAt(i);
if(c >= '0' && c <= '9'){ // ASCII digits only, same as c > 47 && c < 58
sb.append(c);
}
}
return sb.toString();
}
Test Code:
public static void main(final String[] args){
final String input = "0-123-abc-456-xyz-789";
final String result = stripNonDigits(input);
System.out.println(result);
}
Output:
0123456789
BTW: I did not use Character.isDigit(ch) because it also accepts many characters other than 0-9, namely the decimal digits of other scripts.
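To see the difference: Character.isDigit returns true for every Unicode decimal digit, for example the Arabic-Indic digits U+0660 to U+0669, while an explicit range check keeps only ASCII 0-9. A small sketch (the class name is illustrative):

```java
public class DigitKinds {

    // ASCII-only check, equivalent to c > 47 && c < 58.
    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        char arabicThree = '\u0663'; // ARABIC-INDIC DIGIT THREE
        System.out.println(Character.isDigit(arabicThree));         // true
        System.out.println(isAsciiDigit(arabicThree));              // false
        System.out.println(Character.getNumericValue(arabicThree)); // 3
    }
}
```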
public String extractDigits(String src) {
StringBuilder builder = new StringBuilder();
for (int i = 0; i < src.length(); i++) {
char c = src.charAt(i);
if (Character.isDigit(c)) {
builder.append(c);
}
}
return builder.toString();
}
Using Google Guava:
CharMatcher.inRange('0','9').retainFrom("123-456-789")
UPDATE:
Using Precomputed CharMatcher can further improve performance
CharMatcher ASCII_DIGITS=CharMatcher.inRange('0','9').precomputed();
ASCII_DIGITS.retainFrom("123-456-789");
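input.replaceAll("[^0-9.]", "")
This keeps the decimal points (inside a character class, ? and ! are literal characters, so the original [^0-9?!\\.] also kept question and exclamation marks).
E.g. if your input is 445.3kg, the output will be 445.3.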
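A character class only filters characters, so stray dots survive (for example, "v1.2kg." becomes "1.2."). If the goal is the number itself, matching it is more robust. A sketch, assuming a simple unsigned format of digits with an optional fractional part; the class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstNumber {

    // Digits, optionally followed by a dot and more digits, e.g. 445 or 445.3
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:\\.\\d+)?");

    // Returns the first number found in the input, or null if there is none.
    static String firstNumber(String input) {
        Matcher m = NUMBER.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("445.3kg")); // 445.3
        System.out.println(firstNumber("v1.2kg.")); // 1.2
    }
}
```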
Using Google Guava:
CharMatcher.DIGIT.retainFrom("123-456-789");
CharMatcher is pluggable and quite interesting to use; for instance, you can do the following:
String input = "My phone number is 123-456-789!";
String output = CharMatcher.is('-').or(CharMatcher.DIGIT).retainFrom(input);
// output is "123-456-789"
public class FindDigitFromString
{
public static void main(String[] args)
{
String s=" Hi How Are You 11 ";
String s1=s.replaceAll("[^0-9]+", "");
//*replacing all the value of string except digit by using "[^0-9]+" regex.*
System.out.println(s1);
}
}
Output: 11
Use a regular expression to match your requirement:
String str = "123-456-789";
String regex = "(\\d+)";
Matcher matcher = Pattern.compile(regex).matcher(str);
while (matcher.find()) {
    // Each match is one run of digits; together they print 123456789
    String num = matcher.group();
    System.out.print(num);
}
Inspired by Sean Patrick Floyd's code, I rewrote it slightly for the maximum performance I could get:
public static String stripNonDigitsV2( CharSequence input ) {
if (input == null)
return null;
if ( input.length() == 0 )
return "";
char[] result = new char[input.length()];
int cursor = 0;
CharBuffer buffer = CharBuffer.wrap( input );
while ( buffer.hasRemaining() ) {
char chr = buffer.get();
if ( chr > 47 && chr < 58 )
result[cursor++] = chr;
}
return new String( result, 0, cursor );
}
I ran a performance test on a very long string with few digits, and the results were:
Original code is 25.5% slower
Guava approach is 2.5-3 times slower
Regular expression with \D+ is 3-3.5 times slower
Regular expression with only \D is 25+ times slower
By the way, this depends on the length of the string. With a string that contains only 6 digits, Guava is 50% slower and the regex 1 times slower.
Using Kotlin and Lambda expressions you can do it like this:
val digitStr = str.filter { it.isDigit() }
You can use str.replaceAll("[^0-9]", "");
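I have adapted the code for phone numbers such as +9 (987) 124124: besides digits it keeps the characters ( ) * + , - . / (the range 40-57).
An escaped Unicode character has four hex digits after the 'u', so the code skips the 'u' and those four digits.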
public static String stripNonDigitsV2( CharSequence input ) {
if (input == null)
return null;
if ( input.length() == 0 )
return "";
char[] result = new char[input.length()];
int cursor = 0;
CharBuffer buffer = CharBuffer.wrap( input );
int i=0;
while ( i< buffer.length() ) { //buffer.hasRemaining()
char chr = buffer.get(i);
if (chr=='u' && i + 5 < buffer.length()){ // bounds check avoids an IndexOutOfBoundsException near the end
i=i+5;
chr=buffer.get(i);
}
if ( chr > 39 && chr < 58 )
result[cursor++] = chr;
i=i+1;
}
return new String( result, 0, cursor );
}
Code:
public class saasa {
    public static void main(String[] args) {
        String t = "123-456-789";
        // Removes only dashes; any other non-digit characters would remain
        t = t.replaceAll("-", "");
        System.out.println(t);
    }
}
import java.util.*;
public class FindDigits{
public static void main(String []args){
FindDigits h=new FindDigits();
h.checkStringIsNumerical();
}
void checkStringIsNumerical(){
String h="hello 123 for the rest of the 98475wt355";
for(int i=0;i<h.length();i++) {
if(h.charAt(i)!=' '){
System.out.println("Is '"+h.charAt(i)+"' a digit? "+Character.isDigit(h.charAt(i)));
}
}
}
void checkStringIsNumerical2(){
String h="hello 123 for 2the rest of the 98475wt355";
for(int i=0;i<h.length();i++) {
char chr=h.charAt(i);
if(chr!=' '){
if(Character.isDigit(chr)){
System.out.print(chr) ;
}
}
}
}
}

How to reverse a string

I need to reverse the string of a user's input.
I need it done in the simplest of ways. I was trying to do reverseOrder(UserInput) but it wasn't working.
For example, if the user inputs abc, I just take the string and print out cba.
new StringBuilder(str).reverse().toString()
java.util.Collections.reverseOrder is for sorting in reverse of normal order.
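One detail worth knowing: StringBuilder.reverse treats surrogate pairs as single characters, so emoji and other supplementary characters survive, whereas a hand-written char-by-char reversal swaps the two halves of each pair and produces an invalid string. A small sketch comparing the two (the class and method names are illustrative):

```java
public class ReverseDemo {

    // Naive reversal: reverses UTF-16 chars, splitting surrogate pairs.
    static String reverseChars(String s) {
        char[] in = s.toCharArray();
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[in.length - 1 - i];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE0Ab"; // "a", U+1F60A, "b"
        String good = new StringBuilder(s).reverse().toString();
        String bad = reverseChars(s);
        System.out.println(good.codePointCount(0, good.length()));   // 3
        System.out.println(good.codePointAt(1) == 0x1F60A);          // true
        System.out.println(Character.isLowSurrogate(bad.charAt(1))); // true: the pair is now backwards
    }
}
```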
I prefer using Apache's commons-lang for this kind of thing. There are all kinds of goodies, including:
StringUtils.reverse("Hello World!");
yields: !dlroW olleH
StringUtils.reverseDelimited("Hello World!", ' ');
yields: World! Hello
If you are new to programming, which I guess you are, my suggestion is: why use the simple stuff?
Understand the internals and play around!
public static void main(String[] args) {
String str = "abcasz";
char[] orgArr = str.toCharArray();
char[] revArr = new char[orgArr.length];
for (int i = 0; i < orgArr.length;i++) {
revArr[i] = orgArr[orgArr.length - 1 - i];
}
String revStr = new String(revArr);
System.out.println(revStr);
}
There is another interesting way to do it:
String input = "abc";
//Here, input is String to reverse
int b = input.length();
String reverse = ""; // Declaring reverse String variable
while(b!=0){
//Loop for switching between the characters of the String input
reverse += (input.charAt(b-1));
b--;
}
System.out.println(reverse);
public String reverseString(final String input_String)
{
char temp;
char[] chars = input_String.toCharArray();
int N = chars.length;
for (int i = 0 ; i < (N / 2) ; i++)
{
temp = chars[i];
chars[i] = chars[N - 1 - i];
chars[N - 1 - i] = temp;
}
return new String(chars);
}
Run :
Pandora
arodnaP
Without converting to a char array, the easiest way is:
public String reverse(String post) {
String backward = "";
for(int i = post.length() - 1; i >= 0; i--) {
backward = backward + post.substring(i, i + 1);
}
return backward;
}
