How to convert a string to a stream of bits in Java

How can I convert a string to a stream of bits (zeroes and ones)?
I took a string, converted it to an array of char, and then used the method
called forDigit(char,int), but it does not give me the character as a stream of 0s and 1s.
Could you help, please?
Also, how can I do the reverse, from bits back to a char? Please show me a sample.

It's easiest if you take two steps. String supports converting to and from byte[], and BigInteger can convert a byte[] into binary text and back.
String text = "Hello World!";
System.out.println("Text: "+text);
String binary = new BigInteger(text.getBytes()).toString(2);
System.out.println("As binary: "+binary);
String text2 = new String(new BigInteger(binary, 2).toByteArray());
System.out.println("As text: "+text2);
Prints
Text: Hello World!
As binary: 10010000110010101101100011011000110111100100000010101110110111101110010011011000110010000100001
As text: Hello World!
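Note that the BigInteger output above starts with 1001000, which is only seven digits for 'H': leading zero bits are dropped, and because the bytes are read as one signed number, leading 0x00 or 0xFF bytes can be lost on the way back. A minimal sketch of an alternative that always writes 8 digits per byte, assuming UTF-8 as the encoding; the class name BitStrings and its method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

final class BitStrings {
    // Encode the text as UTF-8 and write every byte as exactly 8 binary digits.
    static String toBits(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder bits = new StringBuilder(bytes.length * 8);
        for (byte b : bytes) {
            // (b & 0xFF) reads the byte as unsigned; adding 0x100 forces a ninth
            // digit, so substring(1) keeps 8 digits including leading zeros.
            bits.append(Integer.toBinaryString((b & 0xFF) + 0x100).substring(1));
        }
        return bits.toString();
    }

    // Reverse step: cut the digits into groups of 8 and decode the bytes as UTF-8.
    static String fromBits(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("bit count must be a multiple of 8");
        }
        byte[] bytes = new byte[bits.length() / 8];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(bits.substring(i * 8, i * 8 + 8), 2);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String bits = toBits("Hi");
        System.out.println(bits);           // 0100100001101001
        System.out.println(fromBits(bits)); // Hi
    }
}
```

Because every byte has a fixed width, the reverse direction needs no delimiter.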

I tried this one:
public String toBinaryString(String s) {
    StringBuilder sb = new StringBuilder();
    for (char c : s.toCharArray()) {
        // Pad every char to 16 digits; without padding the groups cannot be split apart again.
        sb.append(Integer.toBinaryString(0x10000 | c).substring(1));
    }
    return sb.toString();
}

String strToConvert = "abc";
byte[] bytes = strToConvert.getBytes();
StringBuilder bits = new StringBuilder(bytes.length * 8);
System.err.println(strToConvert + " contains " + bytes.length + " bytes");
for (byte b : bytes) {
    // Mask to an unsigned value and pad to 8 digits so every byte has the same width.
    bits.append(Integer.toBinaryString((b & 0xFF) + 0x100).substring(1));
}
System.err.println(bits);
char[] chars = new char[bits.length()];
// The destination offset must be 0; passing chars.length throws IndexOutOfBoundsException.
bits.getChars(0, bits.length(), chars, 0);

Related

Byte Array, when converted to string, then concatenated, returns equal String but unequal byte array

I have a byte array. I need to prepend a string and a delimiter to it, and then get the original byte array back. After all this logic, the output byte array is not equal to the input. In Java,
this fails at the last line:
@Test
void test1() {
byte[] initialBytes = RandomUtils.nextBytes(64);
String initialString = new String(initialBytes, StandardCharsets.UTF_8);
String concatenatedString = String.join("\t", "Pre", initialString);
byte[] concatenatedStringToBytes = concatenatedString.getBytes(StandardCharsets.UTF_8);
String concatenatedBytesBackToString = new String(concatenatedStringToBytes, StandardCharsets.UTF_8);
int indexOfDelimeter = concatenatedBytesBackToString.indexOf("\t");
String finalString = concatenatedBytesBackToString.substring(indexOfDelimeter + 1);
byte[] finalBytes = finalString.getBytes(StandardCharsets.UTF_8);
assertEquals(initialString, finalString);
assertTrue(Arrays.equals(initialBytes, finalBytes));
}
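In Java, String values expose UTF-16 code units, but that is not where the information is lost: UTF-8 and UTF-16 can both represent every Unicode character.
The loss happens when the random bytes are decoded. Most random byte sequences are not valid UTF-8, and new String(bytes, StandardCharsets.UTF_8) replaces every malformed sequence with the replacement character U+FFFD. When you encode back to UTF-8, each U+FFFD becomes the three bytes EF BF BD, so you do not get the same byte array.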
public static void tryCharsetEncodingForRandomBytes() {
byte[] initialBytes = getRandomBytes(64);
String initialString = new String(initialBytes, StandardCharsets.UTF_8);
byte[] finalBytes = initialString.getBytes(StandardCharsets.UTF_8);
String finalString = new String(finalBytes, StandardCharsets.UTF_8);
System.out.println(finalString.equals(initialString));
System.out.println(initialBytes.length);
System.out.println(finalBytes.length);
System.out.println(Arrays.equals(initialBytes, finalBytes));
}
Output :
true
64
103
false
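A small sketch of what happens to a single byte that is not valid UTF-8, which may help explain why the length grows from 64 to 103 in the output above. The String constructor substitutes the replacement character U+FFFD for malformed input, and U+FFFD itself takes three bytes in UTF-8. The class and method names are illustrative only:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class InvalidUtf8Demo {
    // Decode the bytes as UTF-8 and encode the resulting String again.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8.
        byte[] invalid = {(byte) 0xFF};
        byte[] result = roundTrip(invalid);
        // The single byte came back as EF BF BD, the UTF-8 form of U+FFFD.
        System.out.println(Arrays.toString(result)); // [-17, -65, -67]
    }
}
```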
You will not encounter this loss of information when the bytes are already valid UTF-8, for example the encoding of plain alphanumeric text.
public static void tryCharsetEncodingForAlphanumeric() {
String alphaNumeric = "abcd1234";
byte[] initialBytes = alphaNumeric.getBytes(StandardCharsets.UTF_8);
String initialString = new String(initialBytes, StandardCharsets.UTF_8);
byte[] finalBytes = initialString.getBytes(StandardCharsets.UTF_8);
String finalString = new String(finalBytes, StandardCharsets.UTF_8);
System.out.println(finalString.equals(initialString));
System.out.println(initialBytes.length);
System.out.println(finalBytes.length);
System.out.println(Arrays.equals(initialBytes, finalBytes));
}
Output:
true
8
8
true
This means that your test will pass as long as the input bytes are valid UTF-8, as they are for plain alphanumeric text.
public static void yourTestScenarioWithAlphaNumeric() {
String alphaNumeric = "abcdefghijklmop1234567890";
byte[] initialBytes = alphaNumeric.getBytes(StandardCharsets.UTF_8);
String initialString = new String(initialBytes, StandardCharsets.UTF_8);
String concatenatedString = String.join("\t", "Pre", initialString);
byte[] concatenatedStringToBytes = concatenatedString.getBytes(StandardCharsets.UTF_8);
String concatenatedBytesBackToString = new String(concatenatedStringToBytes, StandardCharsets.UTF_8);
int indexOfDelimiter = concatenatedBytesBackToString.indexOf("\t");
String finalString = concatenatedBytesBackToString.substring(indexOfDelimiter + 1);
byte[] finalBytes = finalString.getBytes(StandardCharsets.UTF_8);
System.out.println(finalString.equals(initialString));
System.out.println(Arrays.equals(initialBytes, finalBytes));
}
Output:
true
true
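If the bytes really are arbitrary, one option is to turn them into text with a binary-safe encoding before joining, rather than decoding them as UTF-8. The following is only a sketch under the assumption that both sides can agree on Base64; BinarySafeJoin and its methods are hypothetical names, not part of the original test:

```java
import java.util.Arrays;
import java.util.Base64;

final class BinarySafeJoin {
    // Join a text prefix and an arbitrary payload with a tab delimiter.
    // Base64 output never contains a tab, so the delimiter stays unambiguous.
    static String join(String prefix, byte[] payload) {
        return prefix + "\t" + Base64.getEncoder().encodeToString(payload);
    }

    // Take everything after the first tab and decode it back to the raw bytes.
    static byte[] extractPayload(String joined) {
        String encoded = joined.substring(joined.indexOf('\t') + 1);
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] initialBytes = {(byte) 0xFF, 0x00, 0x09, (byte) 0x80, 0x41};
        byte[] finalBytes = extractPayload(join("Pre", initialBytes));
        System.out.println(Arrays.equals(initialBytes, finalBytes)); // true
    }
}
```

The payload may even contain the tab byte itself (0x09 above), because it is encoded before the delimiter is added.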

Need help in converting EBCDIC to Hexadecimal

I am writing a Hive UDF to convert EBCDIC characters to hexadecimal.
The EBCDIC characters are stored in a Hive table. Currently I am able to convert them, but a few characters are ignored during the conversion.
Example:
This is the EBCDIC value stored in table:
AGNSAñA¦ûÃÃÂõÂjÂq  à ()
Converted hexadecimal:
c1c7d5e2000a5cd4f6ef99187d07067203a0200258dd9736009f000000800017112400000000001000084008403c000000000000000080
What I want as output:
c1c7d5e200010a5cd4f6ef99187d0706720103a0200258dd9736009f000000800017112400000000001000084008403c000000000000000080
The following EBCDIC characters are not converted:
01 - start of heading
10 - escape
15 - new line
Below is the code I have tried so far:
public class EbcdicToHex extends UDF {
public String evaluate(String edata) throws UnsupportedEncodingException {
byte[] ebcdiResult = getEBCDICRawData(edata);
String hexResult = getHexData(ebcdiResult);
return hexResult;
}
public byte[] getEBCDICRawData (String edata) throws UnsupportedEncodingException {
byte[] result = null;
String ebcdic_encoding = "IBM-037";
result = edata.getBytes(ebcdic_encoding);
return result;
}
public String getHexData(byte[] result){
String output = asHex(result);
return output;
}
public static String asHex(byte[] buf) {
char[] HEX_CHARS = "0123456789abcdef".toCharArray();
char[] chars = new char[2 * buf.length];
for (int i = 0; i < buf.length; ++i) {
chars[2 * i] = HEX_CHARS[(buf[i] & 0xF0) >>> 4];
chars[2 * i + 1] = HEX_CHARS[buf[i] & 0x0F];
}
return new String(chars);
}
}
While converting, it ignores a few EBCDIC characters. How can I make sure they are also converted to hexadecimal?
I think the problem lies elsewhere. I created a small test case that builds a String from the 3 bytes you say are ignored, and in my output they are converted correctly:
private void run(String[] args) throws Exception {
byte[] bytes = new byte[] {0x01, 0x10, 0x15};
String str = new String(bytes, "IBM-037");
byte[] result = getEBCDICRawData(str);
for(byte b : result) {
System.out.print(Integer.toString(( b & 0xff ) + 0x100, 16).substring(1) + " ");
}
System.out.println();
System.out.println(evaluate(str));
}
Output:
01 10 15
011015
Based on this, both your getEBCDICRawData and evaluate methods seem to be working correctly, which makes me believe your String value may already be incorrect to start with. Could it be that the String is already missing those characters? Or, perhaps a long shot, maybe the charset is incorrect? There are different EBCDIC charsets, so maybe the String was composed using a different one, although I doubt this would make much difference for the 01, 10 and 15 bytes.
As a final remark, but probably unrelated to your problem, I usually prefer to use the encode/decode functions on the charset object to do such conversions:
String charset = "IBM-037";
Charset cs = Charset.forName(charset);
ByteBuffer bb = cs.encode(str);
CharBuffer cb = cs.decode(bb);
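To check whether the String that reaches evaluate is already missing the control characters, it may help to dump its UTF-16 code units before any EBCDIC encoding takes place. This is a diagnostic sketch only; CodePointDump is a hypothetical helper and not part of the UDF above:

```java
final class CodePointDump {
    // Render every UTF-16 code unit of the string as four hex digits,
    // so that invisible control characters show up in the output.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A control character between two letters stays visible in the dump.
        System.out.println(dump("A\u0001B")); // 0041 0001 0042
    }
}
```

If the control characters are absent from this dump, they were lost before the UDF was called, for example when the value was loaded into the table.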

Convert inputstream byte to character

I have a text file with the contents
12        13        14
There are 8 spaces between the values 12 and 13, and between 13 and 14.
My Java method receives the text as an InputStream argument, stores the contents in a byte array, and then converts each byte to a character:
public class FileUpload implements RequestStreamHandler{
String fileObjKeyName = "sample1.txt";
String bucketName="";
/**
* @param args
*/
@Override
public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
LambdaLogger logger = context.getLogger();
byte[] bytes = IOUtils.toByteArray(inputStream);
StringBuilder sb = new StringBuilder();
StringBuilder sb1 = new StringBuilder();
sb.append("[ ");
sb1.append("[ ");
for (byte b : bytes) {
sb.append(b);
char ch = (char) b;
sb1.append(ch);
}
sb.append("]");
sb1.append("] ");
logger.log(sb.toString());
logger.log(sb1.toString());
}
}
The decimal representation of each byte is printed correctly, as shown below:
[ 4950323232323232323249513232323232323232324952]
However, when the bytes are converted to characters, only one space (decimal value 32) appears between the values; the remaining spaces in between are skipped.
[ 12 13 14]
Can anyone suggest the reason for this?
How do you convert the bytes to a string? The result should be the same. See the code below:
public static void main(String[] args) {
byte[] bytes = "12 13 14".getBytes();
System.out.println(Arrays.toString(bytes));
String str = new String(bytes,StandardCharsets.UTF_8);
System.out.println(str);
}
Your example shows that you're using AWS, for which you will often check the results and the produced logs online, with a tool that supports HTML.
And in HTML, when you write several consecutive spaces, they are displayed as only one.
Your String object, within Java, does contain the 8 spaces. But when you give it to a logger to be eventually displayed in a webpage, the spaces are collapsed and displayed as only one.
This is easy to prove: just add the following code at the end of your method:
String s = sb1.toString();
logger.log("s length: " + s.length());
for(int i = 0; i < s.length(); i++) {
logger.log("s[" + i + "]: " + s.charAt(i));
}
It demonstrates the length and exact content of the String. If you're not seeing that exact content when displaying the String, it is the fault of the tool that displays it.
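If the log viewer collapses whitespace, one workaround is to swap each space for a visible character before logging. A minimal sketch; the helper name VisibleWhitespace and the choice of the middle dot (U+00B7) as the marker are arbitrary assumptions:

```java
final class VisibleWhitespace {
    // Replace every space with a middle dot so that runs of spaces
    // remain countable even when a viewer collapses whitespace.
    static String show(String s) {
        return s.replace(' ', '\u00B7');
    }

    public static void main(String[] args) {
        // Prints 12, eight middle dots, then 13.
        System.out.println(show("12        13"));
    }
}
```

In the handler above this would be used as logger.log(VisibleWhitespace.show(sb1.toString())).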

Encode/decode hex to utf-8 string

I am working on a web application that accepts all UTF-8 characters, including Greek characters. I want to convert such strings to hex.
The following strings, in different languages, do not work with my current code:
ЫЙБПАРО Εγκυκλοπαίδεια éaös Größe Größe
These are the hex conversions produced by the JavaScript function shown below:
42b41941141f41042041e 3953b33ba3c53ba3bb3bf3c03b13af3b43b53b93b1 e961f673 4772c3192c2b6c3192c217865 4772f6df65
JavaScript function to convert the above strings to hex:
function encode(string) {
var str= "";
var length = string.length;
for (var i = 0; i < length; i++){
str+= string.charCodeAt(i).toString(16);
}
return str;
}
The conversion itself gives no error, but on the Java side I am unable to parse such a string. I used the following Java code to convert the hex:
public String HexToString(String hex){
StringBuilder finalString = new StringBuilder();
StringBuilder tempString = new StringBuilder();
for( int i=0; i<hex.length()-1; i+=2 ){
String output = hex.substring(i, (i + 2));
int decimal = Integer.parseInt(output, 16);
finalString.append((char)decimal);
tempString.append(decimal);
}
return finalString.toString();
}
It throws a parse exception while parsing the hex strings above.
Please suggest a solution.
JavaScript strings consist of 16-bit UTF-16 code units, so charCodeAt may return any number between 0 and 65535. When you encode that number as hex you get a string of 1 to 4 characters, and if you simply concatenate these, the other party has no way of finding out where one character ends and the next begins.
You can work around this by adding delimiters to your encoded string:
function encode(string) {
return string.split("").map(function(c) {
return c.charCodeAt(0).toString(16);
}).join('-');
}
alert(encode('größe Εγκυκλοπαίδεια 维'))
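On the Java side, the delimited string can then be split and parsed piece by piece. A sketch of a matching decoder, assuming the '-' delimiter from the encode function above; the class name HexCodeUnits is made up. Since charCodeAt returns UTF-16 code units and a Java char is also a UTF-16 code unit, casting each parsed value to char restores the original string, including surrogate pairs:

```java
final class HexCodeUnits {
    // Split on the delimiter and turn each hex group back into one UTF-16 code unit.
    static String decode(String encoded) {
        if (encoded.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (String part : encoded.split("-")) {
            sb.append((char) Integer.parseInt(part, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 47 72 f6 df 65 are the five code units of the last sample word in the question.
        System.out.println(decode("47-72-f6-df-65"));
    }
}
```

Integer.parseInt throws NumberFormatException for a malformed group, so a caller may want to catch it when the input is untrusted.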

String to bytes to string

The question is in comments in the code, I thought that'd be an easier way to ask...
Easy question, but I can't seem to find an answer. I want to convert a String to its byte[] (easy, String.getBytes()). Then I want to convert a String of bytes (101011010101001 for example) to a byte[] and get the String value of that (that's easy too: new String(byte[])).
Here's what I've got so far:
Scanner scan = new Scanner(System.in);
String string = scan.nextLine();
String byteString = "";
for (byte b : string.getBytes()) {
byteString += b;
}
System.out.println(byteString);
//This isn't exactly how it works, these two parts in separate methods, but you get the idea...
String byteString = scan.nextLine();
byte[] bytes = byteString.literalToBytes() //<== or something like that...
//The line above is pretty much all I need...
String string = new String(bytes);
System.out.println(string);
This won't work. The problem is that when you convert your bytes to a string you are going to get a string like
2532611134
So analyzing this string, is the first byte 2, or 25, or 253?
The only way to make this work would be to use a DecimalFormat and make sure every byte is 3 characters long in your string
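The fixed-width idea could look roughly like the sketch below. It uses String.format rather than DecimalFormat, writes each byte as an unsigned value from 000 to 255, and assumes UTF-8 for the example text; FixedWidthBytes and its method names are illustrative only:

```java
import java.nio.charset.StandardCharsets;

final class FixedWidthBytes {
    // Write every byte as exactly three decimal digits (000 to 255).
    static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (byte b : bytes) {
            sb.append(String.format("%03d", b & 0xFF));
        }
        return sb.toString();
    }

    // Read the digits back in groups of three and rebuild the byte array.
    static byte[] decode(String digits) {
        if (digits.length() % 3 != 0) {
            throw new IllegalArgumentException("digit count must be a multiple of 3");
        }
        byte[] bytes = new byte[digits.length() / 3];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(digits.substring(i * 3, i * 3 + 3));
        }
        return bytes;
    }

    public static void main(String[] args) {
        String digits = encode("Hi".getBytes(StandardCharsets.UTF_8));
        System.out.println(digits); // 072105
        System.out.println(new String(decode(digits), StandardCharsets.UTF_8)); // Hi
    }
}
```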
EDIT
Please see this answer for a solution.
With this you can:
String string = scan.nextLine();
String convertByte = convertByte(string.getBytes());
System.out.println(convertByte);
String byteString = scan.nextLine();
System.out.println(new String(convertStr(byteString)));
Alright, because the commenter who pointed me to this question (which led me to this answer) isn't going to answer, I'll just post the solution here:
Scanner scan = new Scanner(System.in);
String pass = scan.nextLine();
StringBuilder byteString = new StringBuilder();
for (byte b : pass.getBytes()) {
byteString.append(b).append(","); //appending that comma is what does the trick.
}
System.out.println(byteString);
//
String[] split = byteString.toString().split(","); //splitting by that comma is what does the trick... too...
byte[] bytes = new byte[split.length];
for (int i = 0; i < split.length; i++) {
bytes[i] = Byte.parseByte(split[i]);
}
System.out.println(new String(bytes));
I guess what you want is this:
// to get back the string from the byte array
StringBuilder charString = new StringBuilder();
for (byte b : string.getBytes()) {
    charString.append((char) b);
}
System.out.println(charString.toString());
// to get the binary representation of the string
StringBuilder bitString = new StringBuilder();
for (byte b : string.getBytes()) {
    // Mask with 0xFF so that negative bytes do not expand to 32 digits.
    bitString.append(Integer.toBinaryString(b & 0xFF));
}
System.out.println(bitString.toString());
