I have a problem converting hexadecimal to a character when the hexadecimal has 3 digits
I have 2 methods which escape and unescape characters over decimal value 127
test\\b8 is produced when test¸ is escaped
The unescape does the following:
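for (int i = 0, n = node.length; i < n; i++) {
    char c = node[i];
    if (c == '\\' && i + 2 < n) {
        char c2 = node[i + 1];
        char c3 = node[i + 2];
        String str = "" + c2 + c3; // the two hex digits after the backslash
        int code = Integer.parseInt(str, 16);
        char decoded = (char) code;
        System.out.println("Char is:=" + decoded);
    }
}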
output - test¸
As you can see, I take the first two characters after the slash and convert them into a char. This all works fine. However, there are sometimes characters that have 3 hexadecimal digits (for example test\\2d8, which should unescape as test˘). When such a string enters my unescape method, it won't use all 3 characters, only the first 2, and therefore produces wrong results.
Is there a way to determine when to convert 2 or 3 characters?
Here's what I would do:
String raw = new String(node); // might be a better way to get a string from the chars
int slashPos = raw.indexOf('\\');
if(slashPos >= 0) {
String hex = raw.substring(slashPos + 1);
int value = Integer.parseInt(hex,16);
}
In this manner, we're not special casing anything for 2, 3, 4, or 100 digits (although I'm sure 100 digits would throw an exception :-) ). Instead, we're using the protocol as a 'milestone' in the string, and then just accepting that everything after the slash is the hex string.
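To go from the parsed value back to a character, a minimal sketch follows. It keeps the same assumption as above (everything after the first backslash is a single hex number); the class and method names are hypothetical and not from the question.

```java
class UnescapeTail {
    // Hypothetical helper: assumes at most one escape, located at the end of the string
    static String unescapeTail(String raw) {
        int slashPos = raw.indexOf('\\');
        if (slashPos < 0 || slashPos == raw.length() - 1) {
            return raw; // no backslash, or nothing after it
        }
        // Throws NumberFormatException if the tail contains non-hex characters
        int value = Integer.parseInt(raw.substring(slashPos + 1), 16);
        // Character.toChars also copes with code points above U+FFFF
        return raw.substring(0, slashPos) + new String(Character.toChars(value));
    }

    public static void main(String[] args) {
        System.out.println(unescapeTail("test\\b8"));  // two hex digits
        System.out.println(unescapeTail("test\\2d8")); // three hex digits
    }
}
```

Because the tail is parsed as a whole, two and three digit escapes go through the same code path.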
class HexParse {
private static class HexResult {
final boolean exists;
final int value;
HexResult(boolean e, int v) { exists = e; value = v; }
}
private final String raw;
private final HexResult result;
public HexParse(String raw) {
this.raw = raw;
int slashPos = raw.indexOf('\\');
boolean noSlash = slashPos < 0;
boolean noTextAfterSlash = slashPos > raw.length() - 2;
if(noSlash || noTextAfterSlash) {
result = new HexResult(false,0);
} else {
// throws exception if second part of string contains non-hex chars
result = new HexResult(true,Integer.parseInt(raw.substring(slashPos + 1),16));
}
}
public String toString() {
StringBuilder sb = new StringBuilder();
sb.append(raw).append(" ");
if(result.exists) {
sb.append("has hex of decimal value ").append(result.value);
} else {
sb.append("has no hex");
}
return sb.toString();
}
public static void main(String...args) {
System.out.println(new HexParse("test`hello")); // none
System.out.println(new HexParse("haha\\abcdef")); // hex
System.out.println(new HexParse("good\\f00d")); // hex
System.out.println(new HexParse("\\84b")); // hex
System.out.println(new HexParse("\\")); // none
System.out.println(new HexParse("abcd\\efgh")); //exception
}
}
c:\files\j>javac HexParse.java
c:\files\j>java HexParse
test`hello has no hex
haha\abcdef has hex of decimal value 11259375
good\f00d has hex of decimal value 61453
\84b has hex of decimal value 2123
\ has no hex
Exception in thread "main" java.lang.NumberFormatException: For input string: "e
fgh"
at java.lang.NumberFormatException.forInputString(NumberFormatException.
java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at HexParse.<init>(HexParse.java:21)
at HexParse.main(HexParse.java:
I want to take ASCII values (saved as a string) and convert them to characters to reveal a message. I tried this, and it keeps throwing an index out of bounds exception at the declaration of the int b. It also shows that str and b do not have a value.
String value = "104 101 108 108 111";
char[] ch = new char[value.length()];
for (int i = 0; i < value.length(); i++) {
ch[i] = value.charAt(i);
}
System.out.println(ch.length);
String ans = "";
int i = 0;
while (i+2 < ch.length) {
int b= ch[i]+ch[i++]+ch[i+2];
String str = new Character((char) b).toString();
System.out.println(str);
System.out.println(b);
ans = ans+str;
i=i+3;
}
Using string split function
String value = "104 101 108 108 111";
String[] arrOfStr = value.split(" ");
String ans = "";
for(String str : arrOfStr) {
String str1 = Character.toString((char)Integer.parseInt(str));
ans += str1;
}
System.out.println(ans); // output: hello
We can switch the Imperative code to Declarative code using Java 8 Streams.
Key points to observe:
Declarative style is more readable and easy to write.
String Joiner is faster than simple String Concatenation.
No need to write an iterator.
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
String value = "104 101 108 108 111";
Arrays.stream(value.split(" ")) // Starting a stream of String[]
.mapToInt(Integer::parseInt) // mapping String to int
.mapToObj(Character::toChars) // finding ASCII char from int
.forEach(System.out::print); // printing each character
}
}
If you wish to store the result and then print it, this is how it is done.
import java.util.Arrays;
import java.util.stream.Collectors;
public class Main {
public static void main(String[] args) {
String value = "104 101 108 108 111";
String result = Arrays.stream(value.split(" ")) // Starting a stream of String[]
.mapToInt(Integer::parseInt) // mapping String to int
.mapToObj(Character::toChars) // finding ASCII char from int
.map(String::new) // convert char to String
.collect(Collectors.joining()); // combining individual result using String Joiner
System.out.println(result);
}
}
Comments to code:
There is a built-in method for getting a char[] with the characters of a string, so the following two blocks of code are the same:
// Code from question
char[] ch = new char[value.length()];
for (int i = 0; i < value.length(); i++) {
ch[i] = value.charAt(i);
}
// Using built-in method
char[] ch = value.toCharArray();
It is better to use a for loop when incrementing a value while looping. The following two ways of writing the loop behave the same, but the for loop keeps the loop logic together:
// Code from question
int i = 0;
while (i+2 < ch.length) {
// some code here
i=i+3;
}
// Using for loop
for (int i = 0; i + 2 < ch.length; i=i+3) {
// some code here
}
The following line of code is entirely wrong:
int b= ch[i]+ch[i++]+ch[i+2];
i++ increments the value of i, but it is the value before the increment that is used in the expression, which means that if i = 0 before the line, the result is the same as this code:
int b = ch[0] + ch[0] + ch[2];
i = i + 1;
You need to replace i++ with i + 1, and realize that those are not the same.
Since you no longer increment the value of i by 1 in that statement, the loop must be changed from i=i+3 to i = i + 4, to correctly skip the spaces in the input.
The value of ch[i] is a char value, which is widened to an int value by the use of the + operator. The int value of a char is the Unicode Code Point value, which for your text is also the same as the ASCII code for the character.
This means that if i = 0, the expression would (after fixing issue #1) evaluate as:
int b = ch[0] + ch[1] + ch[2];
int b = '1' + '0' + '4';
int b = 49 + 48 + 52;
int b = 149;
That matches the output from running the code in the question, where the second printed number is 149 (after fixing issue #1).
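The difference between adding char values and parsing the digits can be checked with a small sketch. The class and method names here are made up for illustration.

```java
class CharSumVsParse {
    // Adds the three chars; each char is widened to its int code first
    static int sumOfChars(String s, int i) {
        return s.charAt(i) + s.charAt(i + 1) + s.charAt(i + 2);
    }

    // Reads the same three chars as a decimal number instead
    static int parsedNumber(String s, int i) {
        return Integer.parseInt(s.substring(i, i + 3));
    }

    public static void main(String[] args) {
        String value = "104 101 108 108 111";
        System.out.println(sumOfChars(value, 0));          // 149
        System.out.println(parsedNumber(value, 0));        // 104
        System.out.println((char) parsedNumber(value, 0)); // h
    }
}
```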
What you really wanted was to get the substring "104" and convert that to a number, then cast that ASCII code value to a char, like this:
String numberStr = value.substring(i, i + 3); // E.g. "104"
int number = Integer.parseInt(numberStr); // E.g. 104
String str = String.valueOf((char) number); // E.g. "h"
With that, you no longer need the char[], so the final code would be:
String value = "104 101 108 108 111";
String ans = "";
for (int i = 0; i + 2 < value.length(); i += 4) {
String numberStr = value.substring(i, i + 3);
int number = Integer.parseInt(numberStr);
String str = String.valueOf((char) number);
ans = ans + str;
}
System.out.println(ans);
Output
hello
I would like to know how to mask all characters of a string except the last 4.
I want to mask the characters using "X".
For example
Number:"S1234567B"
Result
Number:"XXXXX567B"
Thank you guys
Solution 1
You can do it with a regular expression.
This is the shortest solution.
static String mask(String input) {
return input.replaceAll(".(?=.{4})", "X");
}
The regex matches any single character (.) that is followed (zero-width positive lookahead) by at least 4 characters ((?=.{4})). Replace each such single character with an X.
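A quick check of how the lookahead behaves on short inputs is sketched below; the wrapper class name is only for illustration. Strings of 4 characters or fewer contain no character that is followed by 4 more, so they come back unchanged.

```java
class RegexMaskDemo {
    static String mask(String input) {
        // Replace every character that still has at least 4 characters after it
        return input.replaceAll(".(?=.{4})", "X");
    }

    public static void main(String[] args) {
        System.out.println(mask("S1234567B")); // XXXXX567B
        System.out.println(mask("12345"));     // X2345
        System.out.println(mask("1234"));      // 1234
    }
}
```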
Solution 2
You can do it by getting a char[] (1), updating it, and building a new string.
This is the fastest solution, and uses the least amount of memory.
static String mask(String input) {
if (input.length() <= 4)
return input; // Nothing to mask
char[] buf = input.toCharArray();
Arrays.fill(buf, 0, buf.length - 4, 'X');
return new String(buf);
}
1) Better than using a StringBuilder.
Solution 3
You can do it using the repeat(int count) method that was added to String in Java 11.
This is likely the easiest solution to understand.
static String mask(String input) {
int maskLen = input.length() - 4;
if (maskLen <= 0)
return input; // Nothing to mask
return "X".repeat(maskLen) + input.substring(maskLen);
}
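If the number of visible characters or the mask character needs to vary, the char[] approach from Solution 2 generalizes directly. This is a sketch; the method name maskAllButLast and its parameters are assumptions, not part of the answer.

```java
import java.util.Arrays;

class MaskUtil {
    // Hypothetical generalization: keep the last `visible` chars, mask the rest
    static String maskAllButLast(String input, int visible, char maskChar) {
        if (visible < 0) {
            throw new IllegalArgumentException("visible must not be negative");
        }
        if (input.length() <= visible) {
            return input; // nothing to mask
        }
        char[] buf = input.toCharArray();
        Arrays.fill(buf, 0, buf.length - visible, maskChar);
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(maskAllButLast("S1234567B", 4, 'X'));      // XXXXX567B
        System.out.println(maskAllButLast("12345678912345", 4, '*')); // **********2345
    }
}
```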
Here is a Kotlin extension that lets you set both the number of stars and the number of digits to show. For example, if the string to be masked is "12345678912345" and you want ****2345, you will have:
fun String.maskStringWithStars(numberOfStars: Int, numberOfDigitsToBeShown: Int): String {
var stars = ""
for (i in 1..numberOfStars) {
stars += "*"
}
return if (this.length > numberOfDigitsToBeShown) {
val lastDigits = this.takeLast(numberOfDigitsToBeShown)
"$stars$lastDigits"
} else {
stars
}
}
Usage:
companion object{
const val DEFAULT_NUMBER_OF_STARS = 4
const val DEFAULT_NUMBER_OF_DIGITS_TO_BE_SHOWN = 4
}
yourString.maskStringWithStars(DEFAULT_NUMBER_OF_STARS,DEFAULT_NUMBER_OF_DIGITS_TO_BE_SHOWN)
You can do it with the help of StringBuilder in java as follows,
String value = "S1234567B";
String formattedString = new StringBuilder(value)
.replace(0, value.length() - 4, new String(new char[value.length() - 4]).replace("\0", "X")).toString();
System.out.println(formattedString);
You can use a StringBuilder.
StringBuilder sb = new StringBuilder("S1234567B");
for (int i = 0 ; i < sb.length() - 4 ; i++) { // note the upper limit of the for loop
// sets every character to X until the fourth to last character
sb.setCharAt(i, 'X');
}
String result = sb.toString();
My class to mask simple String
class MaskFormatter(private val pattern: String, private val splitter: Char? = null) {
fun format(text: String): String {
val patternArr = pattern.toCharArray()
val textArr = text.toCharArray()
var textI = 0
for (patternI in patternArr.indices) {
if (patternArr[patternI] == splitter) {
continue
}
if (patternArr[patternI] == 'A' && textI < textArr.size) {
patternArr[patternI] = textArr[textI]
}
textI++
}
return String(patternArr)
}
}
Example use
MaskFormatter("XXXXXAAAA").format("S1234567B") // XXXXX567B
MaskFormatter("XX.XXX.AAAA", '.').format("S1234567B") // XX.XXX.567B
MaskFormatter("**.***.AAAA", '.').format("S1234567B") // **.***.567B
MaskFormatter("AA-AAA-AAAA",'-').format("123456789") // 12-345-6789
I have a string which I believe contains some ISO-8859-1 hex character codes:
String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
And I want to change it into this,
Áo thun bé gái cột dây xanh biển
I have tried this method but no luck
byte[] isoBytes = doc.getBytes("ISO-8859-1");
System.out.println(new String(isoBytes, "UTF-8"));
What is the proper way to convert it? Many thanks for your help!
On the assumption that the #nnnn; sequences are plain old Unicode character representation, I suggest the following approach.
class Cvt {
static String convert(String in) {
String str = in;
int curPos = 0;
while (curPos < str.length()) {
int j = str.indexOf("#x", curPos);
if (j < 0) // no more #x
curPos = str.length();
else {
int k = str.indexOf(';', j + 2); // search after this "#x", not after curPos
if (k < 0) // unterminated #x
curPos = str.length();
else { // convert #xNNNN;
int n = Integer.parseInt(str.substring(j+2, k), 16);
char[] ch = { (char)n };
str = str.substring(0, j) + new String(ch) + str.substring(k+1);
curPos = j + 1; // after ch
}
}
}
return str;
}
static public void main(String... args) {
String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
System.out.println(convert(doc));
}
}
This is very similar to the approach of the previous answer, except for the assumption that the character is a Unicode codepoint and not an 8859-1 codepoint.
And the output is
Áo thun bé gái cột dây xanh biển
There is no hex literal syntax for strings in Java. If you need to support that String format, I would make a helper function which parses that format and builds up a byte array and then parse that as ISO-8859-1.
import java.io.ByteArrayOutputStream;
public class translate {
private static byte[] parseBytesWithHexLiterals(String s) throws Exception {
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
while (!s.isEmpty()) {
if (s.startsWith("#x")) {
s = s.substring(2);
while (s.charAt(0) != ';') {
int i = Integer.parseInt(s.substring(0, 2), 16);
baos.write(i);
s = s.substring(2);
}
} else {
baos.write(s.substring(0, 1).getBytes("US-ASCII")[0]);
}
s = s.substring(1);
}
return baos.toByteArray();
}
public static void main(String[] args) throws Exception {
String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
byte[] parsedAsISO88591 = parseBytesWithHexLiterals(doc);
doc = new String(parsedAsISO88591, "ISO-8859-1");
System.out.println(doc); // Print out the string, which is in Unicode internally.
byte[] asUTF8 = doc.getBytes("UTF-8"); // Get a UTF-8 version of the string.
}
}
This is a case where the code can really obscure the requirements. The requirements are a bit uncertain but seem to be to decode a specialized Unicode character entity reference similar to HTML and XML, as documented in the comments.
It is also a somewhat rare case where the advantage of the regular expression engine outweighs any studying needed to understand the pattern language.
String input = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
// Hex digits between "#x" and ";" are a Unicode codepoint value
String text = java.util.regex.Pattern.compile("(#x([0-9A-Fa-f]+);)")
.matcher(input)
// group 2 is the matched input between the 2nd ( in the pattern and its paired )
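// quoteReplacement keeps a decoded '$' or '\' from being read as a group reference
.replaceAll(x -> java.util.regex.Matcher.quoteReplacement(new String(Character.toChars(Integer.parseInt(x.group(2), 16)))));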
System.out.println(text);
The matcher function finds candidate strings to replace that match the pattern. The replaceAll function replaces them with the calculated Unicode codepoint. Since a Unicode codepoint might be encoded as two char (UTF-16) values, the replacement string must be constructed from a char[].
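To see why Character.toChars matters, the sketch below wraps the same idea in a hypothetical decode method and feeds it a code point above U+FFFF. It assumes Java 9 or later, where Matcher.replaceAll accepts a function, and it quotes the replacement so that a decoded dollar sign or backslash is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    private static final Pattern HEX_REF = Pattern.compile("#x([0-9A-Fa-f]+);");

    // Hypothetical wrapper: turns each #xNNNN; reference into its code point
    static String decode(String input) {
        return HEX_REF.matcher(input).replaceAll(m -> {
            int codePoint = Integer.parseInt(m.group(1), 16);
            // One or two chars, depending on whether the code point is above U+FFFF
            String decoded = new String(Character.toChars(codePoint));
            return Matcher.quoteReplacement(decoded);
        });
    }

    public static void main(String[] args) {
        String smiley = decode("#x1F600;");
        System.out.println(smiley.length());                           // 2 (a surrogate pair)
        System.out.println(smiley.codePointCount(0, smiley.length())); // 1
        System.out.println(decode("c#x1ED9;t"));
    }
}
```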
I have a Java string that contains supplementary characters (characters in the Unicode standard whose code points are above U+FFFF). These characters could for example be emojis. I want to remove those characters from the string, i.e. replace them with the empty string "".
How do I remove supplementary characters from a string?
How do I remove characters from an arbitrary code point range? (For example all characters in the range 1F000–1FFFF)?
There are a couple of approaches. As regex replace is expensive, maybe do:
String basic(String s) {
StringBuilder sb = new StringBuilder();
for (char ch : s.toCharArray()) {
if (!Character.isLowSurrogate(ch) && !Character.isHighSurrogate(ch)) {
sb.append(ch);
}
}
return sb.length() == s.length() ? s : sb.toString();
}
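For comparison, the regex approach mentioned above could look like the following sketch. It relies on java.util.regex matching by code point, so a character class from U+10000 to U+10FFFF covers the supplementary characters. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class SupplementaryRegex {
    // Compiled once; \x{...} is the regex syntax for a code point given in hex
    private static final Pattern SUPPLEMENTARY = Pattern.compile("[\\x{10000}-\\x{10FFFF}]");

    static String removeSupplementary(String s) {
        return SUPPLEMENTARY.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String withEmoji = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(removeSupplementary(withEmoji)); // ab
    }
}
```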
You can get a character's unicode value by simply converting it to an int.
Therefore, you'll want to do the following:
Convert your String to a char[], or loop over each character in the String using String.charAt().
Check if the unicode value is one you want to remove.
If so, replace the character with "".
This is just to start you off; if you're still struggling, I can try to type out a whole example.
Good luck!
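As a sketch of those steps, the version below walks the string by code point rather than by char, so a surrogate pair is tested as one character. It uses String.codePoints() (Java 8+) instead of charAt; the names and the range parameters are assumptions for illustration.

```java
class CodePointFilter {
    // Drops every code point in [lower, upper] and keeps the rest in order
    static String removeRange(String s, int lower, int upper) {
        return s.codePoints()
                .filter(cp -> cp < lower || cp > upper)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String text = "ok \uD83D\uDE00 done"; // contains U+1F600
        System.out.println(removeRange(text, 0x1F000, 0x1FFFF));
        System.out.println(removeRange(text, 0x10000, 0x10FFFF)); // all supplementary characters
    }
}
```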
Here is a code snippet that collects characters between code point 60 and 100:
public class Test {
public static void main(String[] args) {
new Test().go();
}
private void go() {
String s = "ABC12三○";
String ret = "";
for (int i = 0; i < s.length(); i++) {
System.out.println("code point: " + s.codePointAt(i));
if ((s.codePointAt(i) > 60) && (s.codePointAt(i) < 100)) {
ret += s.substring(i, i+1);
}
}
System.out.println("result: " + ret);
}
}
the result:
code point: 65
code point: 66
code point: 67
code point: 49
code point: 50
code point: 19977
code point: 65518
result: ABC
Hope this helps.
Java strings are UTF-16 encoded. The String type has a codePointAt() method for retrieving a decoded codepoint at a given char (codeunit) index.
So, you can do something like this, for instance:
String removeSupplementaryChars(String s)
{
int len = s.length();
if (len == 0)
return "";
StringBuilder sb = new StringBuilder(len);
int i = 0;
do
{
if (s.codePointAt(i) <= 0xFFFF)
sb.append(s.charAt(i));
i = s.offsetByCodePoints(i, 1);
}
while (i < len);
return sb.toString();
}
Or this:
String removeCodepointsinRange(String s, int lower, int upper)
{
int len = s.length();
if (len == 0)
return "";
StringBuilder sb = new StringBuilder(len);
int i = 0;
do
{
int cp = s.codePointAt(i);
if ((cp < lower) || (cp > upper))
sb.appendCodePoint(cp);
i = s.offsetByCodePoints(i, 1);
}
while (i < len);
return sb.toString();
}
Not sure why my code isn't working. If I input qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT, I get qw9w5e2ry5y4qE2ET3T when I should be getting q9w5e2rt5y4qw2Er3T.
Run-length encoding (RLE) is a simple "compression algorithm" (an algorithm which takes a block of data and reduces its size, producing a block that contains the same information in less space). It works by replacing repetitive sequences of identical data items with short "tokens" that represent entire sequences. Applying RLE to a string involves finding sequences in the string where the same character repeats. Each such sequence should be replaced by a "token" consisting of:
the number of characters in the sequence
the repeating character
If a character does not repeat, it should be left alone.
For example, consider the following string:
qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT
After applying the RLE algorithm, this string is converted into:
q9w5e2rt5y4qw2Er3T
In the compressed string, "9w" represents a sequence of 9 consecutive lowercase "w" characters. "5e" represents 5 consecutive lowercase "e" characters, etc.
Write a program that takes a string as input, compresses it using RLE, and outputs the compressed string. Case matters - uppercase and lowercase characters should be considered distinct. You may assume that there are no digit characters in the input string. There are no other restrictions on the input - it may contain spaces or punctuation. There is no need to treat non-letter characters any differently from letters.
public class Compress{
public static void main(String[] args){
System.out.println("Enter a string: ");
String str = IO.readString();
int count = 0;
String result = "";
for (int i=1; i<=str.length(); i++) {
char a = str.charAt(i-1);
count = 1;
if (i-2 >= 0) {
while (i<=str.length() && str.charAt(i-1) == str.charAt(i-2)) {
count++;
i++;
}
}
if (count==1) {
result = result.concat(Character.toString(a));
}
else {
result = result.concat(Integer.toString(count).concat(Character.toString(a)));
}
}
IO.outputStringAnswer(result);
}
}
I would start at zero, and look forward:
public static void main(String[] args){
System.out.println("Enter a string: ");
String str = IO.readString();
int count = 0;
String result = "";
for (int i=0; i < str.length(); i++) {
char a = str.charAt(i);
count = 1;
while (i + 1 < str.length() && str.charAt(i) == str.charAt(i+1)) {
count++;
i++;
}
if (count == 1) {
result = result.concat(Character.toString(a));
} else {
result = result.concat(Integer.toString(count).concat(Character.toString(a)));
}
}
IO.outputStringAnswer(result);
}
Some outputs:
qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT => q9w5e2rt5y4qw2Er3T
qqwwwwwwwweeeeerrtyyyyyqqqqwEErTTT => 2q8w5e2rt5y4qw2Er3T
qqwwwwwwwweeeeerrtyyyyyqqqqwEErTXZ => 2q8w5e2rt5y4qw2ErTXZ
aaa => 3a
abc => abc
a => a
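The IO helper used above is not shown in the post, so the snippet does not compile on its own. A self-contained sketch of the same look-ahead loop, with a StringBuilder in place of repeated concat calls, might look like this; the class and method names are illustrative.

```java
class RunLength {
    static String compress(String str) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char a = str.charAt(i);
            int count = 1;
            // Advance while the next character repeats the current one
            while (i + 1 < str.length() && str.charAt(i + 1) == a) {
                count++;
                i++;
            }
            if (count > 1) {
                result.append(count); // a single character gets no count prefix
            }
            result.append(a);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(compress("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT")); // q9w5e2rt5y4qw2Er3T
        System.out.println(compress("abc"));                                // abc
    }
}
```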