Please have a look at the following machine code
0111001101110100011100100110010101110011011100110110010101100100
This means something, and I need to convert it to a string. When I call Integer.parseInt() with the above as the string and 2 as the radix (to convert it to bytes), it throws a NumberFormatException.
I believe I have to separate this into groups of 8 bits (like 01110011, 01110100, etc.). Am I correct?
Please help me to convert this correctly to string.
Thanks
final String s =
"0111001101110100011100100110010101110011011100110110010101100100";
final StringBuilder b = new StringBuilder();
for (int i = 0; i < s.length(); i+=8)
b.append((char)Integer.parseInt(s.substring(i,i+8),2));
System.out.println(b);
prints "stressed"
A shorter way of reading large integers is to use BigInteger
final String s = "0111001101110100011100100110010101110011011100110110010101100100";
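// String(byte[], int) is deprecated, so name the charset explicitly (java.nio.charset.StandardCharsets)
System.out.println(new String(new BigInteger(s, 2).toByteArray(), StandardCharsets.US_ASCII));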
prints
stressed
It depends on the encoding of the String.
An ASCII-encoded string uses 1 byte per character, while a Unicode string encoded as UTF-16 uses 2 bytes for most characters (4 for characters outside the Basic Multilingual Plane) and UTF-8 uses 1 to 4 bytes. There are many other encodings, and the binary layout differs for each.
So you need to find the encoding that was used to write this string to binary format
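To make the point about encodings concrete, here is a minimal sketch that counts how many bytes the same text occupies under different charsets. The class and method names (EncodingSizes, byteCount) are made up for illustration; only the standard getBytes call is relied on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Hypothetical helper: number of bytes the text occupies in the given charset.
    public static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "stressed";
        System.out.println(byteCount(text, StandardCharsets.US_ASCII));  // 8
        System.out.println(byteCount(text, StandardCharsets.UTF_8));     // 8
        // UTF_16BE is used here because plain UTF_16 would also emit a byte order mark
        System.out.println(byteCount(text, StandardCharsets.UTF_16BE));  // 16
    }
}
```

The 64 bits in the question decode to 8 characters only if they are read as a one-byte-per-character encoding such as ASCII; read as UTF-16 they would be 4 different characters.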
Related
I have a function for hashing passwords, that returns a byte[] with entries using the full range of the byte datatype from -128 to 127. I have tried to convert the byte[] to a String using new String(byte_array, StandardCharsets.UTF_8);. This does return a String - however it can not properly encode negative numbers - hence it encodes them to a "�" character. When comparing two of those characters using: new String(new byte[]{-1}, StandardCharsets.UTF_8).equals(new String(new byte[]{-2}, StandardCharsets.UTF_8)) it turns out the String representation for all negative numbers is equal as the expression above returns true. While this doesn't fully ruin my hashing functionality as the hash of the same expression will still always yield the same result, this is obviously not what I want as it increases the chance of two different inputs yielding the same output drastically.
Is there an easy fix for this, or an alternative way to convert the byte[] to a String? For context, I want to write the String to a file and later read it back to compare it with other hashes.
Edit: After a bit of trying around with the tips from the comments my solution is to convert the byte[] to a char[] and add 128 to every value. The char array can then easily be converted to a String or be written to a file directly (byteHash is the byte[]):
char[] charHash = new char[byteHash.length];
for(int i = 0; i < byteHash.length; i++){
charHash[i] = (char) (byteHash[i]+128);
}
return new String(charHash);
I do not really like the solution but it works.
The appropriate solution to this is to use an encoding like hexadecimal (https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HexFormat.html) or Base64 (https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html) to convert an arbitrary byte sequence to a string reversibly.
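As a rough sketch of the Base64 route (the hex route via HexFormat needs Java 17, so it is left out here), the following round-trips an arbitrary byte[] through a String. The class name HashToText and the sample bytes are assumptions for illustration, not part of the original question.

```java
import java.util.Arrays;
import java.util.Base64;

public class HashToText {
    // Encode arbitrary bytes (including negative values) as printable text.
    public static String encode(byte[] hash) {
        return Base64.getEncoder().encodeToString(hash);
    }

    // Recover the exact original bytes from the text form.
    public static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] hash = {-1, -2, 0, 127, -128}; // made-up sample, not a real hash
        String text = encode(hash);
        System.out.println(text);
        System.out.println(Arrays.equals(hash, decode(text))); // true
    }
}
```

Unlike decoding the bytes as UTF-8, two different byte arrays always produce two different strings here, so no collisions are introduced.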
I'm developing a JPEG decoder (I'm in the Huffman phase) and I want to write binary strings to a file.
For example, let's say we have this:
String huff = "00010010100010101000100100";
I've tried to convert it to integers by splitting it into groups of 8 and saving each group's integer representation, as I can't write individual bits:
for (String str : huff.split("(?<=\\G.{8})")) {
    int val = Integer.parseInt(str, 2);
    out.write(val); // writes to a FileOutputStream
}
The problem is that, in my example, if I try to save "00010010" it converts it to 18 (10010), and I need the 0's.
And finally, when I read :
int enter;
String code = "";
while((enter =in.read())!=-1) {
code+=Integer.toBinaryString(enter);
}
I got :
Code = 10010
instead of:
Code = 00010010
I've also tried converting it to a BitSet and then to a byte[], but I have the same problem.
Your example is that you have the string "10010" and you want the string "00010010". That is, you need to left-pad this string with zeroes. Note that since you're joining the results of many calls to Integer.toBinaryString in a loop, you need to left-pad these strings inside the loop, before concatenating them.
while((enter = in.read()) != -1) {
String binary = Integer.toBinaryString(enter);
// left-pad to length 8
binary = ("00000000" + binary).substring(binary.length());
code += binary;
}
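Putting the write side and the padded read side together, a round trip might look like the sketch below. It works in memory rather than on a file, and it makes one assumption the question does not state: since 26 bits do not fill whole bytes, the final group is right-padded with zeroes and the original bit length is passed back in when reading. The names BitStringRoundTrip, toBytes and toBits are hypothetical.

```java
import java.io.ByteArrayOutputStream;

public class BitStringRoundTrip {
    // Pack a string of '0'/'1' characters into bytes, 8 bits per byte.
    public static byte[] toBytes(String bits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < bits.length(); i += 8) {
            String group = bits.substring(i, Math.min(i + 8, bits.length()));
            // Assumption: right-pad a short final group so its bits stay in the high positions
            while (group.length() < 8) {
                group += "0";
            }
            out.write(Integer.parseInt(group, 2));
        }
        return out.toByteArray();
    }

    // Unpack bytes back into a bit string, left-padding each byte to 8 digits.
    public static String toBits(byte[] bytes, int bitLength) {
        StringBuilder code = new StringBuilder();
        for (byte b : bytes) {
            String binary = Integer.toBinaryString(b & 0xFF); // mask to treat the byte as unsigned
            code.append(("00000000" + binary).substring(binary.length()));
        }
        return code.substring(0, bitLength); // drop the padding bits added by toBytes
    }

    public static void main(String[] args) {
        String huff = "00010010100010101000100100";
        byte[] packed = toBytes(huff);
        System.out.println(toBits(packed, huff.length()).equals(huff)); // true
    }
}
```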
You might want to look at the UTF-8 algorithm, since it does exactly what you want. It stores massive amounts of data while discarding zeros, keeping relevant data, and encoding it to take up less disk space.
Works with: Java version 7+
import java.nio.charset.StandardCharsets;
import java.util.Formatter;
public class UTF8EncodeDecode {
public static byte[] utf8encode(int codepoint) {
return new String(new int[]{codepoint}, 0, 1).getBytes(StandardCharsets.UTF_8);
}
public static int utf8decode(byte[] bytes) {
return new String(bytes, StandardCharsets.UTF_8).codePointAt(0);
}
public static void main(String[] args) {
System.out.printf("%-7s %-43s %7s\t%s\t%7s%n",
"Char", "Name", "Unicode", "UTF-8 encoded", "Decoded");
for (int codepoint : new int[]{0x0041, 0x00F6, 0x0416, 0x20AC, 0x1D11E}) {
byte[] encoded = utf8encode(codepoint);
Formatter formatter = new Formatter();
for (byte b : encoded) {
formatter.format("%02X ", b);
}
String encodedHex = formatter.toString();
int decoded = utf8decode(encoded);
System.out.printf("%-7c %-43s U+%04X\t%-12s\tU+%04X%n",
codepoint, Character.getName(codepoint), codepoint, encodedHex, decoded);
}
}
}
https://rosettacode.org/wiki/UTF-8_encode_and_decode#Java
UTF-8 is a variable width character encoding capable of encoding all 1,112,064[nb 1] valid code points in Unicode using one to four 8-bit bytes.[nb 2] The encoding is defined by the Unicode Standard, and was originally designed by Ken Thompson and Rob Pike.[1][2] The name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.[3]
It was designed for backward compatibility with ASCII. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well. Since ASCII bytes do not occur when encoding non-ASCII code points into UTF-8, UTF-8 is safe to use within most programming and document languages that interpret certain ASCII characters in a special way, such as "/" (slash) in filenames, "\" (backslash) in escape sequences, and "%" in printf.
https://en.wikipedia.org/wiki/UTF-8
Binary 11110000 10010000 10001101 10001000 becomes F0 90 8D 88 in UTF-8. Since you are storing it as text, you go from having to store 32 characters to storing 8. And because it's a well known and well designed encoding, you can reverse it easily. All the math is done for you.
Your example of 00010010100010101000100100 (or rather 00000001 0010100 0101010 00100100) converts to *$ (two unprintable characters on my machine). That's the UTF-8 encoding of the binary. I had mistakenly used a different site that was using the data I put in as decimal instead of binary.
https://onlineutf8tools.com/convert-binary-to-utf8
For a really good explanation of UTF-8 and how it can apply to the answer:
https://hackaday.com/2013/09/27/utf-8-the-most-elegant-hack/
Edit:
I took this question as a way to reduce the amount of characters needed to store values, which is a type of encoding. UTF-8 is a type of encoding. Used in a "non-standard" way, the OP can use UTF-8 to encode their strings of 0's & 1's in a much shorter format. That's how this answer is relevant.
If you concatenate the characters, you can go from 4x 8 bits (32 bits) to 8x 8 bits (64 bits) easily and encode a value as large as 9,223,372,036,854,775,807.
I'm writing a web application in Google App Engine. It basically allows people to edit HTML code that gets stored as an .html file in the blobstore.
I'm using fetchData to return a byte[] of all the characters in the file. I'm trying to print to an html in order for the user to edit the html code. Everything works great!
Here's my only problem now:
The byte array is having some issues when converted back to a string. Smart quotes and a couple of other characters come out looking funky (?'s, Japanese symbols, etc.). Specifically, the problem is caused by several bytes that have negative values.
The smart quotes are coming back as -108 and -109 in the byte array. Why is this and how can I decode the negative bytes to show the correct character encoding?
The byte array contains characters in a special encoding (that you should know). The way to convert it to a String is:
String decoded = new String(bytes, "UTF-8"); // example for one encoding type
By the way, the raw bytes may appear as negative decimals simply because the Java byte type is signed; it covers the range from -128 to 127.
-109 = 0x93: Control Code "Set Transmit State"
As a code point, U+0093 is a non-printable control character in Unicode, and a lone 0x93 byte is not valid UTF-8 at all. So UTF-8 is not the correct encoding for that byte stream.
0x93 in "Windows-1252" is the "smart quote" that you're looking for; the Java name of that encoding is "Cp1252". The following line tests this:
System.out.println(new String(new byte[]{-109}, "Cp1252"));
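A slightly fuller sketch of the same idea, covering both bytes from the question (-109 and -108), is shown below. The class and helper names are invented for illustration; "windows-1252" is the canonical charset name for what the answer calls "Cp1252".

```java
import java.nio.charset.Charset;

public class SmartQuoteDecode {
    // View a signed Java byte as its unsigned value (0..255).
    public static int unsigned(byte b) {
        return b & 0xFF;
    }

    // Decode bytes using a charset looked up by name.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] quotes = {-109, -108};
        System.out.println(Integer.toHexString(unsigned(quotes[0]))); // 93
        System.out.println(Integer.toHexString(unsigned(quotes[1]))); // 94
        // In windows-1252 these are the left and right double quotation marks (U+201C, U+201D)
        String decoded = decode(quotes, "windows-1252");
        System.out.println(decoded.equals("\u201C\u201D")); // true
    }
}
```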
Java 7 and above
You can also pass your desired encoding to the String constructor as a Charset constant from StandardCharsets. This may be safer than passing the encoding as a String, as suggested in the other answers.
For example, for UTF-8 encoding
String bytesAsString = new String(bytes, StandardCharsets.UTF_8);
You can try this.
String s = new String(bytearray);
public class Main {
/**
* Example method for converting a byte to a String.
*/
public void convertByteToString() {
byte b = 65;
//Using the static toString method of the Byte class
System.out.println(Byte.toString(b));
//Using simple concatenation with an empty String
System.out.println(b + "");
//Creating a byte array and passing it to the String constructor
System.out.println(new String(new byte[] {b}));
}
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
new Main().convertByteToString();
}
}
Output
65
65
A
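public static String readFile(String fn) throws IOException
{
    // Needs java.nio.file.Files, java.nio.file.Paths and java.nio.charset.StandardCharsets.
    // Files.readAllBytes reads the whole file and closes it even on error;
    // a single InputStream.read(buffer) call may return before the buffer is full.
    byte[] buffer = Files.readAllBytes(Paths.get(fn));
    return new String(buffer, StandardCharsets.UTF_8); // use desired encoding
}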
I suggest Arrays.toString(byte_array);
It depends on your purpose. For example, I wanted to save a byte array in exactly the format you see while debugging, something like [1, 2, 3]. If you want to save the numeric values without converting the bytes to characters, Arrays.toString(byte_array) does this. But if you want to save characters instead of bytes, you should use String s = new String(byte_array). In this case, s contains the characters whose codes are 1, 2 and 3 (decoded with the platform default charset), not the text "[1, 2, 3]".
The previous answer from Andreas_D is good. I'm just going to add that wherever you are displaying the output there will be a font and a character encoding and it may not support some characters.
To work out whether it is Java or your display that is a problem, do this:
for(int i=0;i<str.length();i++) {
char ch = str.charAt(i);
System.out.println(i+" : "+ch+" "+Integer.toHexString(ch)+((ch=='\ufffd') ? " Unknown character" : ""));
}
Java will have mapped any characters it cannot understand to 0xfffd the official character for unknown characters. If you see a '?' in the output, but it is not mapped to 0xfffd, it is your display font or encoding that is the problem, not Java.
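To see that mapping in isolation, here is a small sketch that decodes one byte that is malformed as UTF-8 and checks for the replacement character. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCheck {
    // True if decoding produced at least one U+FFFD replacement character.
    public static boolean hasReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // 0x93 (-109) on its own is not a valid UTF-8 sequence
        String bad = new String(new byte[]{-109}, StandardCharsets.UTF_8);
        // 65 is plain ASCII 'A', which is valid UTF-8
        String good = new String(new byte[]{65}, StandardCharsets.UTF_8);
        System.out.println(hasReplacementChar(bad));  // true
        System.out.println(hasReplacementChar(good)); // false
    }
}
```

If this check returns false for your string but the output still shows '?', the loss is happening at display time rather than during decoding.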
I am creating an encryption algorithm that needs to XOR two strings. I know how to XOR the two strings; the problem is the length. I have two byte arrays: one for the plain text, which is of variable size, and one for the key, which is, let's say, 56 bytes. What is the correct way to XOR the two? Should I concatenate them into one binary string and XOR the two values? Or XOR each byte array position with a concatenated binary value of the key? Any help is greatly appreciated.
Regards,
Milinda
To encode just move through the array of bytes from the plain text, repeating the key as necessary with the mod % operator. Be sure to use the same character set at both ends.
Conceptually we're repeating the key like this, ignoring encoding.
hello world, there are sheep
secretsecretsecretsecretsecr
Encrypt
String plainText = "hello world, there are sheep";
Charset charSet = Charset.forName("UTF-8");
byte[] plainBytes = plainText.getBytes(charSet);
String key = "secret";
byte[] keyBytes = key.getBytes(charSet);
byte[] cipherBytes = new byte[plainBytes.length];
for (int i = 0; i < plainBytes.length; i++) {
cipherBytes[i] = (byte) (plainBytes[i] ^ keyBytes[i
% keyBytes.length]);
}
String cipherText = new String(cipherBytes, charSet);
System.out.println(cipherText);
To decrypt just reverse the process.
// decode
for (int i = 0; i < cipherBytes.length; i++) {
plainBytes[i] = (byte) (cipherBytes[i] ^ keyBytes[i
% keyBytes.length]);
}
plainText = new String(plainBytes, charSet); // <= make sure same charset both ends
System.out.println(plainText);
(As noted in comments, you shouldn't use this for anything real. Proper cryptography is incredibly hard to do properly from scratch - don't do it yourself, use existing implementations.)
There's no such concept as "XOR" when it comes to strings, really. XOR specifies the result given two bits, and text isn't made up of bits - it's made up of characters.
Now you could just take the Unicode representation of each character (an integer) and XOR those integers together - but the result may well be a sequence of integers which is not a valid Unicode representation of any valid string.
It's not clear that you're even thinking in the right way to start with - you talk about having strings, but also having 56 bytes. You may have an encoded representation of a string (e.g. the result of converting a string to UTF-8) but that's not the same thing.
If you've got two byte arrays, you can easily XOR those together - and perhaps cycle back to the start of one of them if it's shorter than the other, so that the result is always the same length as the longer array. However, even if both inputs are (say) UTF-8 encoded text, the result often won't be valid UTF-8 encoded text. If you must have the result in text form, I'd suggest using Base64 at that point - there's a public domain base64 encoder which has a simple API.
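A minimal sketch of that suggestion, assuming java.util.Base64 (Java 8+) rather than the third-party encoder the answer mentions: XOR the byte arrays, cycling through the key, and only turn the result into text via Base64. Here the result has the length of the data array, and the class and method names are made up. As noted elsewhere in this thread, this is not real cryptography.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class XorBytes {
    // XOR each data byte with the key, cycling back to the start of the key as needed.
    public static byte[] xor(byte[] data, byte[] key) {
        byte[] result = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] plain = "hello world, there are sheep".getBytes(StandardCharsets.UTF_8);
        byte[] key = "secret".getBytes(StandardCharsets.UTF_8);

        // The XOR result is arbitrary binary data, so Base64 is used for the text form
        String cipherText = Base64.getEncoder().encodeToString(xor(plain, key));
        System.out.println(cipherText);

        // XOR with the same key again restores the original bytes
        byte[] back = xor(Base64.getDecoder().decode(cipherText), key);
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```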
I have a byte array in java. That array contains '%' symbol somewhere in it. I want to find the position of that symbol in that array. Is there any way to find this?
Thanks in Advance!
[EDIT]
I tried below code and it worked fine.
byte[] b = {55,37,66};
String s = new String(b);
System.out.println(s.indexOf("%"));
I have a doubt: does every character take exactly one byte in Java?
A correct and more direct Guava solution:
Bytes.indexOf(byteArray, (byte) '%');
using Google Guava:
com.google.common.primitives.Bytes.asList(byteArray).indexOf(Byte.valueOf((byte) '%'))
I come from the future with some streaming and lambda stuff.
If it's just a matter of finding a byte in a byte[]:
Input:
byte[] bytes = {55,37,66};
byte findByte = '%';
With streaming and lambda stuff:
OptionalInt firstMatch = IntStream.range(0, bytes.length).filter(i -> bytes[i] == findByte).findFirst();
int index = firstMatch.isPresent() ? firstMatch.getAsInt() : -1;
Which is pretty much the same as:
Actually, I think I still just prefer this. (e.g. and put it in some utility class).
int index = -1;
for (int i = 0 ; i < bytes.length ; i++)
if (bytes[i] == findByte)
{
index = i;
break;
}
EDIT
Your question is actually more about finding a character rather than finding a byte.
What could be improved in your solution:
String s = new String(bytes); // will not always give the same result
// there is an invisible 2nd argument : i.e. charset
String s = new String(bytes, charset); // default charset depends on your system.
So, your program may act different on different platforms.
Some charsets use 1 byte per character, others use 2, 3, ... or are irregular.
So, the size of your string may vary from platform to platform.
Secondly, some byte sequences cannot be represented as strings at all. i.e. if the charset does not have a character for the matching value.
So, how could you improve it:
If you just know that your byte array will always contain plain old ascii values, you could use this:
byte[] b = {55,37,66};
String s = new String(b, StandardCharsets.US_ASCII);
System.out.println(s.indexOf("%"));
On the other hand, if you know that your content contains UTF-8 characters, use :
byte[] b = {55,37,66};
String s = new String(b, StandardCharsets.UTF_8);
System.out.println(s.indexOf("%"));
etc ...
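One more thing to watch for: once the content is not pure ASCII, the index returned by String.indexOf is a character position, not a position in the byte array. The sketch below illustrates this with a made-up two-character input; the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ByteVsCharIndex {
    // Plain loop over the bytes; returns -1 if the target byte is absent.
    public static int byteIndexOf(byte[] bytes, byte target) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "\u00E9" (e with acute accent) takes two bytes in UTF-8, '%' takes one
        byte[] bytes = "\u00E9%".getBytes(StandardCharsets.UTF_8);
        System.out.println(byteIndexOf(bytes, (byte) '%'));                        // 2
        System.out.println(new String(bytes, StandardCharsets.UTF_8).indexOf('%')); // 1
    }
}
```

Searching the raw bytes for an ASCII value such as '%' is safe in UTF-8, because ASCII byte values never occur inside a multi-byte sequence.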