How to convert a Java String to an ASCII byte array? - java

How to convert a Java String to an ASCII byte array?

Using the getBytes method, giving it the appropriate Charset (or Charset name).
Example:
String s = "Hello, there.";
byte[] b = s.getBytes(StandardCharsets.US_ASCII);
If more control is required (such as throwing an exception when a character outside 7-bit US-ASCII is encountered), a CharsetEncoder can be used:
private static byte[] strictStringToBytes(String s, Charset charset) throws CharacterCodingException {
    // Report (rather than replace) malformed input and characters the charset cannot represent.
    ByteBuffer x = charset.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .encode(CharBuffer.wrap(s));
    byte[] b = new byte[x.remaining()];
    x.get(b);
    return b;
}
Before Java 7 it is possible to use: byte[] b = s.getBytes("US-ASCII");. The StandardCharsets class was introduced in Java 7; the getBytes(Charset) overload has been available since Java 6 and the charset encoders since Java 1.4.
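The difference between the lenient and the strict approach can be seen in a small sketch. The class and method names (AsciiEncodingDemo, lenient, strict) are made up for illustration; only the JDK calls are taken from the answer above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiEncodingDemo {
    // Lenient: getBytes(Charset) silently replaces unmappable characters with '?' (byte 63).
    static byte[] lenient(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Strict: the encoder reports the first character it cannot represent.
    static byte[] strict(String s) throws CharacterCodingException {
        ByteBuffer buf = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // ends with e-acute, which is outside US-ASCII
        System.out.println(Arrays.toString(lenient(text)));
        try {
            strict(text);
        } catch (CharacterCodingException e) {
            System.out.println("strict encoding rejected the input: " + e);
        }
    }
}
```

With the lenient call the accented character comes back as byte 63 ('?'); the strict call throws instead, which is usually what you want when silent data loss is unacceptable.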

If you are a Guava user, there is a handy Charsets class:
String s = "Hello, world!";
byte[] b = s.getBytes(Charsets.US_ASCII);
Apart from not hard-coding an arbitrary charset name in your source code, this has a much bigger advantage: Charsets.US_ASCII is of type Charset (not String), so you avoid the checked UnsupportedEncodingException, which is thrown only by String.getBytes(String) and not by String.getBytes(Charset).
Java 7 provides the equivalent StandardCharsets class.

There is only one character wrong in the code you tried:
Charset characterSet = Charset.forName("US-ASCII");
String string = "Wazzup";
byte[] bytes = String.getBytes(characterSet);
^
Notice the upper case "String". This tries to invoke a static method on the string class, which does not exist. Instead you need to invoke the method on your string instance:
byte[] bytes = string.getBytes(characterSet);

The problem with other proposed solutions is that they will either drop characters that cannot be directly mapped to ASCII, or replace them with a marker character like ?.
You might desire to have for example accented characters converted to that same character without the accent. There are a couple of tricks to do this (including building a static mapping table yourself or leveraging existing 'normalization' defined for unicode), but those methods are far from complete.
Your best bet is using the junidecode library, which cannot be complete either but incorporates a lot of experience in the most sane way of transliterating Unicode to ASCII.
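As a rough illustration of the normalization trick mentioned above (not of junidecode, whose API is not shown here), the sketch below decomposes accented letters with java.text.Normalizer and drops the combining marks. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class AccentStripper {
    // Decompose accented letters (NFD), then drop the combining marks.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Strip accents first, then encode; anything still non-ASCII becomes '?'.
    static byte[] toAscii(String s) {
        return stripAccents(s).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("caf\u00e9 cr\u00e8me"));
        // O-with-stroke has no canonical decomposition, so it passes through unchanged.
        System.out.println(stripAccents("\u00d8"));
    }
}
```

The second call shows why this approach is incomplete: letters such as Ø or ß are not built from a base letter plus a combining mark, so normalization leaves them alone and they still end up as '?' in the ASCII bytes.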

String s = "ASCII Text";
byte[] bytes = s.getBytes("US-ASCII");

If you happen to need this in Android and want to make it work with anything older than FroYo, you can also use EncodingUtils.getAsciiBytes():
byte[] bytes = EncodingUtils.getAsciiBytes("ASCII Text");

In my string I have Thai characters (TIS620 encoded) and German umlauts. The answer from agiles put me on the right path. Instead of .getBytes() I now use:
int len = mString.length(); // Length of the string
byte[] dataset = new byte[len];
for (int i = 0; i < len; ++i) {
char c = mString.charAt(i);
dataset[i]= (byte) c;
}

Convert string to ascii values.
String test = "ABCD";
for ( int i = 0; i < test.length(); ++i ) {
char c = test.charAt( i );
int j = (int) c;
System.out.println(j);
}

I found the solution. Actually Base64 class is not available in Android. Link is given below for more information.
byte[] byteArray;
byteArray= json.getBytes(StandardCharsets.US_ASCII);
String encoded=Base64.encodeBytes(byteArray);
userLogin(encoded);
Here is the link for Base64 class: http://androidcodemonkey.blogspot.com/2010/03/how-to-base64-encode-decode-android.html
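On a plain JVM running Java 8 or later, the JDK's own java.util.Base64 can do the same job without an extra class. This is a minimal sketch under that assumption; the class name and the sample JSON are invented, and on Android its availability depends on the API level, so check before relying on it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64AsciiDemo {
    // Encode the ASCII bytes of the text as a Base64 string.
    static String encode(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Base64.getEncoder().encodeToString(ascii);
    }

    // Reverse the step above: Base64 text back to the original ASCII string.
    static String decode(String base64) {
        byte[] ascii = Base64.getDecoder().decode(base64);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String json = "{\"user\":\"demo\"}"; // sample payload, assumed for illustration
        String encoded = encode(json);
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```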

To convert String to ASCII byte array:
String s1 = "Hello World!";
byte[] byteArray = s1.getBytes(StandardCharsets.US_ASCII);
// Now byteArray is [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
To convert ASCII byte array to String:
String s2 = new String(byteArray, StandardCharsets.US_ASCII);

Try this:
/**
 * @(#)demo1.java
 *
 * @author
 * @version 1.00 2012/8/30
 */
import java.util.*;

public class demo1 {
    Scanner s = new Scanner(System.in);
    String str;
    int key;

    // Read the string and the numeric key to shift each character by.
    void getdata() {
        System.out.println("please enter a string");
        str = s.next();
        System.out.println("please enter a key");
        key = s.nextInt();
    }

    // Print each character of the string shifted by the key.
    void display() {
        for (int i = 0; i < str.length(); ++i) {
            char c = str.charAt(i);
            int j = (int) c + key;
            char a = (char) j;
            System.out.print(a);
        }
    }

    public static void main(String[] args) {
        demo1 obj = new demo1();
        obj.getdata();
        obj.display();
    }
}

Related

Need help in converting EBCDIC to Hexadecimal

I am writing a Hive UDF to convert EBCDIC characters to hexadecimal.
The EBCDIC characters are stored in a Hive table. Currently I am able to convert them, but a few characters are ignored during conversion.
Example:
This is the EBCDIC value stored in table:
AGNSAñA¦ûÃÃÂõÂjÂq  à ()
Converted hexadecimal:
c1c7d5e2000a5cd4f6ef99187d07067203a0200258dd9736009f000000800017112400000000001000084008403c000000000000000080
What I want as output:
c1c7d5e200010a5cd4f6ef99187d0706720103a0200258dd9736009f000000800017112400000000001000084008403c000000000000000080
The following EBCDIC characters are not being converted:
01 - start of heading
10 - an escape character
15 - new line
Below is the code I have tried so far:
public class EbcdicToHex extends UDF {
public String evaluate(String edata) throws UnsupportedEncodingException {
byte[] ebcdiResult = getEBCDICRawData(edata);
String hexResult = getHexData(ebcdiResult);
return hexResult;
}
public byte[] getEBCDICRawData (String edata) throws UnsupportedEncodingException {
byte[] result = null;
String ebcdic_encoding = "IBM-037";
result = edata.getBytes(ebcdic_encoding);
return result;
}
public String getHexData(byte[] result){
String output = asHex(result);
return output;
}
public static String asHex(byte[] buf) {
char[] HEX_CHARS = "0123456789abcdef".toCharArray();
char[] chars = new char[2 * buf.length];
for (int i = 0; i < buf.length; ++i) {
chars[2 * i] = HEX_CHARS[(buf[i] & 0xF0) >>> 4];
chars[2 * i + 1] = HEX_CHARS[buf[i] & 0x0F];
}
return new String(chars);
}
}
While converting, it ignores a few EBCDIC characters. How can I make these get converted to hexadecimal as well?
I think the problem lies elsewhere. I created a small test case that builds a String from the 3 bytes you say are ignored, and in my output they are converted correctly:
private void run(String[] args) throws Exception {
byte[] bytes = new byte[] {0x01, 0x10, 0x15};
String str = new String(bytes, "IBM-037");
byte[] result = getEBCDICRawData(str);
for(byte b : result) {
System.out.print(Integer.toString(( b & 0xff ) + 0x100, 16).substring(1) + " ");
}
System.out.println();
System.out.println(evaluate(str));
}
Output:
01 10 15
011015
Based on this, both your getEBCDICRawData and evaluate methods seem to be working correctly, which makes me believe your String value may already be incorrect to start with. Could it be that the String is already missing those characters? Or, perhaps a long shot, maybe the charset is incorrect? There are different EBCDIC charsets, so the String may have been composed using a different one, although I doubt this would make much difference for the 01, 10 and 15 bytes.
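One way a String can already be damaged before evaluate() sees it is if the raw bytes were decoded with a charset that cannot represent all of them. The sketch below illustrates that general effect only; it is not a diagnosis of the Hive table, and the class name, sample bytes and choice of charsets are assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RawByteRoundTrip {
    // Decode bytes into a String and encode them back with the same charset.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        return new String(raw, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC1, 0x01, 0x10, 0x15, (byte) 0xFF};
        // ISO-8859-1 maps every byte value to exactly one char, so nothing is lost.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1)));
        // UTF-8 treats 0xC1 and 0xFF as malformed and substitutes a replacement character.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8)));
    }
}
```

The first line prints true and the second false: once bytes have been pushed through a decoder that rejects or rewrites them, no later getBytes() call can bring them back.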
As a final remark, but probably unrelated to your problem, I usually prefer to use the encode/decode functions on the charset object to do such conversions:
String charset = "IBM-037";
Charset cs = Charset.forName(charset);
ByteBuffer bb = cs.encode(str);
CharBuffer cb = cs.decode(bb);

How to convert Java String into byte[]?

Is there any way to convert Java String to a byte[] (not the boxed Byte[])?
I'm trying this:
System.out.println(response.split("\r\n\r\n")[1]);
System.out.println("******");
System.out.println(response.split("\r\n\r\n")[1].getBytes().toString());
and I'm getting two different outputs. I can't show the first output here, as it is a gzip string.
<A Gzip String>
******
[B@38ee9f13
The second looks like an address. Is there anything I'm doing wrong? I need the result in a byte[] to feed it to a gzip decompressor, which is as follows.
String decompressGZIP(byte[] gzip) throws IOException {
java.util.zip.Inflater inf = new java.util.zip.Inflater();
java.io.ByteArrayInputStream bytein = new java.io.ByteArrayInputStream(gzip);
java.util.zip.GZIPInputStream gzin = new java.util.zip.GZIPInputStream(bytein);
java.io.ByteArrayOutputStream byteout = new java.io.ByteArrayOutputStream();
int res = 0;
byte buf[] = new byte[1024];
while (res >= 0) {
res = gzin.read(buf, 0, buf.length);
if (res > 0) {
byteout.write(buf, 0, res);
}
}
byte uncompressed[] = byteout.toByteArray();
return (uncompressed.toString());
}
The object your method decompressGZIP() needs is a byte[].
So the basic, technical answer to the question you have asked is one of:
byte[] b = string.getBytes(); // uses the platform default charset
byte[] b = string.getBytes(Charset.forName("UTF-8"));
byte[] b = string.getBytes(StandardCharsets.UTF_8); // Java 7+ only
However, the problem you appear to be wrestling with is that this doesn't display very well. Calling toString() on an array just gives you the default Object.toString(), which is the class name followed by the object's hash code in hexadecimal. In your result [B@38ee9f13, the [B means byte[] and 38ee9f13 is the hash code (often loosely described as a memory address), separated by an @.
For display purposes you can use:
Arrays.toString(bytes);
But this will just display as a sequence of comma-separated integers, which may or may not be what you want.
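A small sketch makes the difference between the two views concrete. The class and method names are hypothetical; the JDK calls are the ones named above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteArrayDisplayDemo {
    // Default Object.toString(): the type tag "[B", an '@', then a hex hash code.
    static String defaultView(byte[] bytes) {
        return bytes.toString();
    }

    // The element values as comma-separated decimal numbers.
    static String elementView(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    public static void main(String[] args) {
        byte[] bytes = "Hi".getBytes(StandardCharsets.UTF_8);
        System.out.println(defaultView(bytes));
        System.out.println(elementView(bytes));
    }
}
```

The first line varies from run to run; the second is [72, 105] for this input.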
To get a readable String back from a byte[], use:
String string = new String(bytes, charset); // bytes is a byte[], charset is a Charset
The reason the Charset version is favoured, is that all String objects in Java are stored internally as UTF-16. When converting to a byte[] you will get a different breakdown of bytes for the given glyphs of that String, depending upon the chosen charset.
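To see how the byte breakdown depends on the charset, here is a minimal sketch that encodes the same one-character string three ways. The class and method names are assumptions made for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetComparison {
    // Number of bytes the string occupies in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // a single character: e-acute
        Charset[] charsets = {
            StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            System.out.println(cs + " -> " + Arrays.toString(s.getBytes(cs)));
        }
    }
}
```

The same character takes one byte in ISO-8859-1 and two bytes in both UTF-8 and UTF-16BE, with different values in each case, which is why the charset used to encode must also be the one used to decode.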
String example = "Convert Java String";
byte[] bytes = example.getBytes();
Simply:
String abc="abcdefghight";
byte[] b = abc.getBytes();
Try using String.getBytes(). It returns a byte[] representing string data.
Example:
String data = "sample data";
byte[] byteData = data.getBytes();
You can use String.getBytes() which returns the byte[] array.
You might want to try return new String(byteout.toByteArray(), Charset.forName("UTF-8")); instead.
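Putting that suggestion together, a decompression helper could look roughly like the sketch below. It assumes the compressed payload is UTF-8 text and that it is kept as a byte[] the whole way (binary data that passes through a String can be corrupted); the class and method names are invented, and compress() exists only so the example can be run end to end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipTextDemo {
    // Helper for the demo: gzip the UTF-8 bytes of a string.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Gunzip the bytes and decode the result as UTF-8 text.
    static String decompress(byte[] gzip) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzip))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decompress(compress("Hello, gzip")));
    }
}
```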
I know I'm a little late to the party, but this works pretty neatly (our professor gave it to us). It parses a string of hexadecimal digits, two digits per byte:
public static byte[] asBytes (String s) {
String tmp;
byte[] b = new byte[s.length() / 2];
int i;
for (i = 0; i < s.length() / 2; i++) {
tmp = s.substring(i * 2, i * 2 + 2);
b[i] = (byte)(Integer.parseInt(tmp, 16) & 0xff);
}
return b; //return bytes
}
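I had to convert an int into its three decimal digits, one per byte (129 becomes 1, 2, 9):
byte[] data = new byte[3];
int i1 = 129;
int i3 = i1 / 100;   // hundreds digit
i1 = i1 - i3 * 100;
int i2 = i1 / 10;    // tens digit
i1 = i1 - i2 * 10;   // i1 now holds the ones digit
data[0] = (byte) i3;
data[1] = (byte) i2;
data[2] = (byte) i1;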
It is not necessary to change the Java side; keep passing a String parameter. Instead, change the C code to receive a String rather than a pointer, and in its body:
bool DmgrGetVersion(String szVersion);
char NewszVersion[200];
strcpy(NewszVersion, szVersion.t_str());
.t_str() applies to C++ Builder 2010.

String to bytes to string

The question is in comments in the code, I thought that'd be an easier way to ask...
Easy question, but I can't seem to find an answer. I want to convert a String to its byte[] (easy, String.getBytes()). Then I want to convert a String of bytes (101011010101001 for example) to a byte[] and get the String value of that (that's easy too: new String(byte[])).
Here's what I've got so far:
Scanner scan = new Scanner(System.in);
String string = scan.nextLine();
String byteString = "";
for (byte b : string.getBytes()) {
byteString += b;
}
System.out.println(byteString);
//This isn't exactly how it works, these two parts in separate methods, but you get the idea...
String byteString = scan.nextLine();
byte[] bytes = byteString.literalToBytes() //<== or something like that...
//The line above is pretty much all I need...
String string = new String(bytes);
System.out.println(string);
This won't work. The problem is that when you convert your bytes to a string you are going to get a string like
2532611134
So analyzing this string, is the first byte 2, or 25, or 253?
The only way to make this work would be to use a DecimalFormat and make sure every byte is 3 characters long in your string
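The fixed-width idea can be sketched as follows. This is one possible reading of the suggestion, not the answerer's code: the class and method names are invented, and bytes are treated as unsigned (0 to 255) so that each one always formats as exactly three digits.

```java
import java.nio.charset.StandardCharsets;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class FixedWidthByteString {
    // Write every byte as exactly three decimal digits, e.g. 72 becomes "072".
    static String encode(byte[] bytes) {
        DecimalFormat threeDigits =
                new DecimalFormat("000", DecimalFormatSymbols.getInstance(Locale.ROOT));
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (byte b : bytes) {
            sb.append(threeDigits.format(b & 0xff)); // mask to the unsigned value
        }
        return sb.toString();
    }

    // Read the digits back in groups of three.
    static byte[] decode(String digits) {
        if (digits.length() % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        byte[] out = new byte[digits.length() / 3];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(digits.substring(i * 3, i * 3 + 3));
        }
        return out;
    }

    public static void main(String[] args) {
        String encoded = encode("Hi".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded);
        System.out.println(new String(decode(encoded), StandardCharsets.UTF_8));
    }
}
```

Because every byte occupies the same number of characters, the question of whether the first byte is 2, 25 or 253 no longer arises.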
EDIT
Please see this answer for a solution.
With this you can:
String string = scan.nextLine();
String convertByte = convertByte(string.getBytes());
System.out.println(convertByte);
String byteString = scan.nextLine();
System.out.println(new String(convertStr(byteString)));
Alright, because the commenter who pointed me to this question (which led me to this answer) isn't going to answer, I'll just post the solution here:
Scanner scan = new Scanner(System.in);
String pass = scan.nextLine();
StringBuilder byteString = new StringBuilder();
for (byte b : pass.getBytes()) {
    byteString.append(b).append(","); // appending that comma is what does the trick
}
System.out.println(byteString);

String[] split = byteString.toString().split(","); // splitting by that comma is the other half of the trick
byte[] bytes = new byte[split.length];
for (int i = 0; i < split.length; i++) {
    bytes[i] = Byte.parseByte(split[i]);
}
System.out.println(new String(bytes));
I guess what you want is this:
// to get the string back from the byte array
StringBuilder text = new StringBuilder();
for (byte b : string.getBytes()) {
    text.append((char) b);
}
System.out.println(text.toString());
// to get the binary representation of the string
StringBuilder bits = new StringBuilder();
for (byte b : string.getBytes()) {
    bits.append(Integer.toBinaryString(b & 0xff));
}
System.out.println(bits.toString());
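Integer.toBinaryString() drops leading zeros, so a concatenated binary string is ambiguous in the same way as the unseparated decimal digits discussed earlier. A sketch that pads every byte to eight bits, which makes the string reversible, might look like this (class and method names are assumptions):

```java
import java.nio.charset.StandardCharsets;

class BinaryStringCodec {
    // Write every byte as exactly eight binary digits.
    static String toBits(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 8);
        for (byte b : bytes) {
            String bits = Integer.toBinaryString(b & 0xff);
            sb.append("00000000".substring(bits.length())).append(bits);
        }
        return sb.toString();
    }

    // Read the digits back in groups of eight.
    static byte[] fromBits(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        byte[] out = new byte[bits.length() / 8];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(bits.substring(i * 8, i * 8 + 8), 2);
        }
        return out;
    }

    public static void main(String[] args) {
        String bits = toBits("Hi".getBytes(StandardCharsets.UTF_8));
        System.out.println(bits);
        System.out.println(new String(fromBits(bits), StandardCharsets.UTF_8));
    }
}
```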

How to parse UTF-8 representation to String in Java?

Given the following code:
String tmp = new String("\\u0068\\u0065\\u006c\\u006c\\u006f\\u000a");
String result = convertToEffectiveString(tmp); // result contain now "hello\n"
Does the JDK already provide classes for doing this?
Is there a library that does this (preferably available through Maven)?
I have tried with ByteArrayOutputStream with no success.
This works, but only with ASCII. If you use Unicode characters outside of the ASCII range, you will have problems, because each parsed value is stuffed into a single byte, whereas UTF-8 needs several bytes for such characters. The cast below is safe only because the input is guaranteed to be basically ASCII (as you mention in your comments), so no value overflows one byte.
package sample;
import java.io.UnsupportedEncodingException;
public class UnicodeSample {
public static final int HEXADECIMAL = 16;
public static void main(String[] args) {
try {
String str = "\\u0068\\u0065\\u006c\\u006c\\u006f\\u000a";
String arr[] = str.replaceAll("\\\\u"," ").trim().split(" ");
byte[] utf8 = new byte[arr.length];
int index=0;
for (String ch : arr) {
utf8[index++] = (byte)Integer.parseInt(ch,HEXADECIMAL);
}
String newStr = new String(utf8, "UTF-8");
System.out.println(newStr);
}
catch (UnsupportedEncodingException e) {
// handle the UTF-8 conversion exception
}
}
}
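Here is another solution that tries to fix the issue of only working with ASCII characters. It is intended to handle any Unicode character rather than only those in the 7-bit ASCII range. Be aware that it copies the non-zero bytes of each parsed value straight into the array, which is not the same as UTF-8 encoding them, so check the output for non-ASCII input. Thanks to deceze for the questions. You made me think more about the problem and solution.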
package sample;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
public class UnicodeSample {
public static final int HEXADECIMAL = 16;
public static void main(String[] args) {
try {
String str = "\\u0068\\u0065\\u006c\\u006c\\u006f\\u000a\\u3fff\\uf34c";
ArrayList<Byte> arrList = new ArrayList<Byte>();
String codes[] = str.replaceAll("\\\\u"," ").trim().split(" ");
for (String c : codes) {
int code = Integer.parseInt(c,HEXADECIMAL);
byte[] bytes = intToByteArray(code);
for (byte b : bytes) {
if (b != 0) arrList.add(b);
}
}
byte[] utf8 = new byte[arrList.size()];
for (int i=0; i<arrList.size(); i++) utf8[i] = arrList.get(i);
str = new String(utf8, "UTF-8");
System.out.println(str);
}
catch (UnsupportedEncodingException e) {
// handle the exception when
}
}
// Takes a 4-byte integer and extracts each byte
public static final byte[] intToByteArray(int value) {
return new byte[] {
(byte) (value >>> 24),
(byte) (value >>> 16),
(byte) (value >>> 8),
(byte) (value)
};
}
}
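Each \uXXXX escape names a UTF-16 code unit rather than a UTF-8 byte, so an alternative is to skip the byte array entirely and append each parsed value as a char. The following is a minimal sketch of that idea, assuming the input contains only well-formed four-digit escapes plus ordinary text; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescaper {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replace every escape with the char it names; other text is copied unchanged.
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against the char being a backslash or dollar sign
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(unescape("\\u0068\\u0065\\u006c\\u006c\\u006f\\u000a"));
    }
}
```

Because the values are appended as chars, code units above 0x7F (and surrogate pairs written as two consecutive escapes) come through unchanged instead of being reinterpreted as bytes.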
Firstly, are you just trying to parse a string literal, or is tmp going to be some user-entered data?
If this is going to be a string literal (i.e. hard-coded string), it can be encoded using Unicode escapes. In your case, this just means using single backslashes instead of double backslashes:
String result = "\u0068\u0065\u006c\u006c\u006f\u000a";
If, however, you need to use Java's string parsing rules to parse user input, a good starting point might be Apache Commons Lang's StringEscapeUtils.unescapeJava() method.
I'm sure there must be a better way, but using just the JDK:
public static String handleEscapes(final String s)
{
final java.util.Properties props = new java.util.Properties();
props.setProperty("foo", s);
final java.io.ByteArrayOutputStream baos = new java.io.ByteArrayOutputStream();
try
{
props.store(baos, null);
final String tmp = baos.toString().replace("\\\\", "\\");
props.load(new java.io.StringReader(tmp));
}
catch(final java.io.IOException ioe) // shouldn't happen
{ throw new RuntimeException(ioe); }
return props.getProperty("foo");
}
This uses java.util.Properties.load(java.io.Reader) to process the backslash-escapes (after first using java.util.Properties.store(java.io.OutputStream, java.lang.String) to backslash-escape anything that would cause problems in a properties-file, and then using replace("\\\\", "\\") to reverse the backslash-escaping of the original backslashes).
(Disclaimer: even though I tested all the cases I could think of, there are still probably some that I didn't think of.)

How to get encoded version of string (e.g. \u0421\u043b\u0443\u0436\u0435\u0431\u043d\u0430\u044f)

How to get encoded version of string (e.g. \u0421\u043b\u0443\u0436\u0435\u0431\u043d\u0430\u044f) using Java?
EDIT:
I guess the question is not very clear... Basically what I want is this:
Given string s="blalbla" I want to get string "\uXXXX\uYYYY"
You will need to extract each code point/unit from the String and encode it yourself. The following works for all Strings even if the individual linguistic characters within the String are composed of digraphs or ligatures.
public String getUnicodeEscapes(String aString)
{
if (aString != null && aString.length() > 0)
{
int length = aString.length();
StringBuilder buffer = new StringBuilder(length);
for (int ctr = 0; ctr < length; ctr++)
{
char codeUnit = aString.charAt(ctr);
String hexString = Integer.toHexString(codeUnit);
String padAmount = "0000".substring(hexString.length());
buffer.append("\\u");
buffer.append(padAmount);
buffer.append(hexString);
}
return buffer.toString();
}
else
{
return null;
}
}
The above produces output as dictated by the Java Language Specification on Unicode escapes, i.e. it produces output of the form \uxxxx for each UTF-16 code unit. It addresses supplementary characters by producing a pair of code units represented as \uxxxx\uyyyy.
The originally posted code has been modified to produce Unicode codepoints in the format U+FFFFF:
public String getUnicodeCodepoints(String aString)
{
if (aString != null && aString.length() > 0)
{
int length = aString.length();
StringBuilder buffer = new StringBuilder(length);
for (int ctr = 0; ctr < length; ctr++)
{
char ch = aString.charAt(ctr);
if (Character.isLowSurrogate(ch))
{
continue;
}
else
{
int codePoint = aString.codePointAt(ctr);
// %04x pads to at least four hex digits and, unlike a fixed padding string, cannot fail on six-digit code points
buffer.append(String.format(" U+%04x", codePoint));
}
}
return buffer.toString();
}
else
{
return null;
}
}
The gruntwork is done by the String.codePointAt() method, which returns the Unicode code point at a particular index. For a String composed of combining characters, String.length() is not the number of visible characters but the number of UTF-16 code units. For example, क and ् combine to form क् in Devanagari, and the above function will rightfully return U+0915 U+094d without any fuss, as String.length() returns 2 for the combined character. A supplementary character occupies two code units (a surrogate pair) but is reported as a single code point: 𝒥𝒶𝓋𝒶𝓈𝒸𝓇𝒾𝓅𝓉 (the page may not display this String literal correctly, but you can copy it just fine; it is "Javascript" written using the supplementary Mathematical Alphanumeric Symbols) will return U+1d4a5 U+1d4b6 U+1d4cb U+1d4b6 U+1d4c8 U+1d4b8 U+1d4c7 U+1d4be U+1d4c5 U+1d4c9.
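On Java 8 or later, the same listing can be sketched more compactly with String.codePoints(), which walks the string by code point and handles surrogate pairs itself. This is an alternative under that assumption, not the code from the answer; the names are invented and the hex digits are printed in upper case.

```java
import java.util.stream.Collectors;

class CodePointLister {
    // List every code point of the string in U+XXXX notation, separated by spaces.
    static String list(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String combined = "\u0915\u094d";      // Devanagari letter plus combining sign
        String script = "\uD835\uDCA5";        // one supplementary character as a surrogate pair
        System.out.println(list(combined));
        System.out.println(list(script));
        // Two UTF-16 code units, but a single code point.
        System.out.println(script.length() + " vs " + script.codePointCount(0, script.length()));
    }
}
```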
public static void main(String[] args) {
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();
CharsetEncoder encoder = charset.newEncoder();
try {
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap("\u0421\u043b\u0443\u0436\u0435\u0431\u043d\u0430\u044f"));
CharBuffer cbuf = decoder.decode(bbuf);
String s = cbuf.toString();
System.out.println(s);
} catch (CharacterCodingException e) {
e.printStackTrace();
}
}
I'm not aware of a built-in solution, so:
StringBuilder builder = new StringBuilder();
for(int i=0; i<yourString.length(); i++) {
builder.append(String.format("\\u%04x", yourString.charAt(i)));
}
String encoded = builder.toString();
Edit: sorry, I thought you wanted to get the String encoded to \uXXXX expressions ...
You didn't say which encoding you are after, but based on the tag I'm assuming you want the UTF-8 encoding. Here's how:
byte[] utf8 =
"\u0421\u043b\u0443\u0436\u0435\u0431\u043d\u0430\u044f".getBytes("UTF-8");
You can then write a simple loop to output the bytes in utf8 in hexadecimal or decimal ... or do something else with them.
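The simple loop mentioned above could look like this sketch, which prints each UTF-8 byte as two hex digits. The class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class Utf8HexDump {
    // Encode the string as UTF-8 and render each byte as two lowercase hex digits.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff)); // mask so negative bytes print as 80..ff
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u0421\u043b"));
    }
}
```

Each Cyrillic letter in the example takes two bytes in UTF-8, so the two-letter input prints four hex pairs.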
System.out.println ("\u0421\u043b\u0443\u0436\u0435\u0431\u043d\u0430\u044f");
works like a charm for me:
Служебная
