How to unpack a binary file in Java?

Can somebody help me do in Java what I do in Ruby with the code below?
The Ruby code below uses unpack('H*')[0] to store the complete content of a binary file in the variable "var" as a hex string (ASCII text).
IO.foreach(ARGV[0]){ |l|
var = l.unpack('H*')[0]
} if File.exists?(ARGV[0])
Update:
Hi Aru. I've tested your suggestion as shown below:
byte[] bytes = Files.readAllBytes(testFile.toPath());
str = new String(bytes,StandardCharsets.UTF_8);
System.out.println(str);
But when I print the contents of the variable "str", the printout shows only little squares, as if the content is not being decoded. I'd like to store the content of the binary file in "str" as ASCII text.
Update #2:
Hello Aru, I'm trying to store all of the binary file's content in an array of bytes, but I don't know how to do it. It worked
with "FileUtils.readFileToByteArray(myFile);", but that is an external library. Is there a built-in option to do it?
File myFile = new File("./Binaryfile");
byte[] binary = FileUtils.readFileToByteArray(myFile); // I have trouble here storing all the binary content in an array of bytes
String hexString = DatatypeConverter.printHexBinary(binary);
System.out.println(hexString);
Update #3:
Hello ursa and Aru, thanks for your help. I've tried both of your solutions and they work fine, but the Files.readAllBytes() documentation
says it is not intended for very large files, and the binary file I want to analyse is more than 2 GB :(. I see an option with your solutions: reading
chunk by chunk. The chunks inside the binary are separated by the sequence FF65, so is there a way to tweak your code to process only one chunk at a
time, based on that chunk separator? If not, maybe with some external library.
Update #4:
Hello, I'm trying to modify your code since I'd like to read variable-size chunks based on the
value of "Var".
How can I set an offset to read the next chunk in your code?
I mean,
- in the first iteration, read the first 1024 bytes,
- in this step Var = 500,
- in the 2nd iteration, read the next 1024 bytes, beginning from 1024 - Var = 1024 - 500 = 524,
- in this step Var = 712,
- in the 3rd iteration, read the next 1024 bytes, beginning from 1548 - Var = 1548 - 712 = 836,
- and so on.
Is there a method something like read(number of bytes, offset)?

You can use the commons-codec Hex class + the commons-io FileUtils class:
byte[] binary = FileUtils.readFileToByteArray(new File("/Users/user/file.bin"));
String hexEncoded = Hex.encodeHexString(binary);
But if you just want to read the content of a TEXT file, you can use:
String content = FileUtils.readFileToString(new File("/Users/user/file.txt"), "ISO-8859-1");
With JRE 7 you can use standard classes:
public static void main(String[] args) throws Exception {
    Path path = Paths.get("path/to/file");
    byte[] data = Files.readAllBytes(path);

    // Convert each byte to two hex characters
    char[] hexArray = "0123456789ABCDEF".toCharArray();
    char[] hexChars = new char[data.length * 2];
    for (int j = 0; j < data.length; j++) {
        int v = data[j] & 0xFF;
        hexChars[j * 2] = hexArray[v >>> 4];      // high nibble
        hexChars[j * 2 + 1] = hexArray[v & 0x0F]; // low nibble
    }
    System.out.println(new String(hexChars));
}

This should do what you want:
try {
    File inputFile = new File("someFile");
    byte[] inputBytes = Files.readAllBytes(inputFile.toPath());
    String hexCode = DatatypeConverter.printHexBinary(inputBytes);
    System.out.println(hexCode);
} catch (IOException e) {
    System.err.println("Couldn't read file: " + e);
}
If you don't want to read the entire file at once, that is possible as well. You'll need an InputStream of some sort.
File inputFile = new File("C:\\Windows\\explorer.exe");
try (InputStream input = new FileInputStream(inputFile)) {
    byte[] inputBytes = new byte[1024];
    int readBytes;
    // Read until all bytes were read
    while ((readBytes = input.read(inputBytes)) != -1) {
        System.out.printf("%4d bytes were read.\n", readBytes);
        // Only print the bytes that were actually filled in this pass (Arrays.copyOf trims the buffer)
        System.out.println(DatatypeConverter.printHexBinary(Arrays.copyOf(inputBytes, readBytes)));
    }
} catch (FileNotFoundException ex) {
    System.err.println("Couldn't read file: " + ex);
} catch (IOException ex) {
    System.err.println("Error while reading file: " + ex);
}
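Regarding Update #3 and Update #4 (the 2 GB file and reading the next chunk from a computed offset): java.io.RandomAccessFile lets you seek to an arbitrary position before each read, so the whole file never has to be in memory. Below is a minimal sketch under the assumptions from the question - a 1024-byte chunk, a placeholder file name, and a placeholder value for "Var" that you would really derive from the chunk's content (for example by locating the FF65 separator):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;
import javax.xml.bind.DatatypeConverter;

public class ChunkReader {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("./Binaryfile", "r")) { // placeholder path
            byte[] chunk = new byte[1024];
            long offset = 0;
            while (offset < raf.length()) {
                raf.seek(offset);               // jump to the computed offset
                int read = raf.read(chunk);     // reads up to 1024 bytes from that position
                if (read == -1) {
                    break;
                }
                // hex-dump only the bytes that were actually read
                System.out.println(DatatypeConverter.printHexBinary(Arrays.copyOf(chunk, read)));
                long var = 500;                 // placeholder: really derived from the chunk (e.g. FF65 position)
                offset += chunk.length - var;   // advance as described in Update #4
            }
        }
    }
}
Alternatively, FileChannel.read(ByteBuffer, long position) from java.nio offers the same kind of positional read without moving a shared file pointer.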

Related

How to Inflate the same data in Python

I have code in Java which works fine, and I need to inflate the same data in Python.
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.StringUtils;

public static byte[] Inflate(byte[] compressedContent) throws IOException {
    ByteArrayOutputStream s = new ByteArrayOutputStream();
    InflaterInputStream iis = new InflaterInputStream(new ByteArrayInputStream(compressedContent), new Inflater(true));
    byte[] buffer = new byte[4096];
    int len;
    while ((len = iis.read(buffer)) != -1) {
        s.write(buffer, 0, len);
    }
    iis.close();
    s.flush();
    s.close();
    return s.toByteArray();
}
Using
StringUtils.newStringUtf8(inflate(Base64.decodeBase64("PZLHrptQAET_xevHE8VgnB1gyqVjig0bRLkUg-k9yr_HiZTsZo5mZjU_T1GSwHEMp7aCzenH6fR1-ivDae_gx7MwGuDwoWX6PwN3uYjFpDRK2XZRfnJQQXA5MIK3N_s7oEDFb9qruFmVNtmCtuuOX6qcTEVP5k-Hv7t-mVnfo-XgDa4LBkIt9lMmtKBz4kful_eDNBUONYQ95CXHBRY3dSlEYcC063oXC8hMkKLXRof6Re3vS8U1w-A0oRQt0spqnGifob-1orDhK-bMYflYVOR8KQC_YxVjjekaHuUxvQOZXBgdI4ubvl6z-p0BF-AjY2qNca48qY6j80Wa6Wxjvl8c31AG5V6vto8FG3vZ2c1jvt28MuvIdyjTx1otQPLMC71iOHjqtpFihNLmQVhPdSzbuM8rJ_eocJ4z12DzvFDZGwyeC109TGV2xjsQ32kv5VGB2NH1XFiGVd8xkE9PRI1oDHFwRck_25y3KlxMWKmlDrw7Br75nrunSsrNJbZwzq5rTRivAuhmBZz12RRacuxyeSz5ZIcMqFk8Il8U7nYEsLHHqLRP92oEGfvQZgfqLuuNWf-qlXqc56TiLpdjlfvAU-LwGG599wrdKST41sHeiKCbCZckNLW-aT8V0_tC7FzPh1pZWO6uykgGHtpOp0J9KzxKlPdXvwy9FTV0geUAmjERfR_mgwDciiqlr0qahOlKSMrW524DzAY4Fv8-18x1_XWCW1d-aFh-CE2dUfTXbw")))
The Java code works well, but I cannot get my Python conversion below to work:
def Base64UrlDecode(data):
    """Decode base64, padding being optional.

    :param data: Base64 data as an ASCII byte string
    :returns: The decoded byte string.
    """
    if isinstance(data, unicode):
        data = data.encode('utf-8')
    missing_padding = len(data) % 4
    if missing_padding != 0:
        data += b'=' * (4 - missing_padding)
    return base64.decodestring(data)

url_decode = Base64UrlDecode(token)  # The token is the same string as the above one.

# https://docs.python.org/2/library/zlib.html#zlib.compressobj
for i in range(-15, 32):  # try all possible ones, but none works.
    try:
        decode = zlib.decompress(url_decode, i)
    except:
        pass
The true in Inflater(true) in Java means inflation of raw deflate data with no header or trailer. To get the same operation in Python, the second argument to zlib.decompress() must be -15, so you don't need to try different values there.
The next thing to check is your Base64 decoding. Its result must be different in the two cases, so look at where the decoded bytes differ to find your bug.

Java - My Huffman decompression doesn't work in rare cases. Can't figure out why

I just finished coding a Huffman compression/decompression program. The compression part of it seems to work fine but I am having a little bit of a problem with the decompression. I am quite new to programming and this is my first time doing any sort of byte manipulation/file handling so I am aware that my solution is probably awful :D.
For the most part my decompression method works as intended but sometimes it drops data after decompression (aka my decompressed file is smaller than my original file).
Also, whenever I try to decompress a file that isn't a plain text file (for example a .jpg), the decompression returns a completely empty file (0 bytes), although the compression handles these other types of files just fine.
Decompression method:
public static void decompress(File file){
    try {
        BitFileReader bfr = new BitFileReader(file);
        int[] charFreqs = new int[256];
        TreeMap<String, Integer> decodeMap = new TreeMap<String, Integer>();

        File nF = new File(file.getName() + "_decomp");
        nF.createNewFile();
        BitFileWriter bfw = new BitFileWriter(nF);

        DataInputStream data = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
        int uniqueBytes;
        int counter = 0;
        int byteCount = 0;
        uniqueBytes = data.readUnsignedByte();

        // Read frequency table
        while (counter < uniqueBytes){
            int index = data.readUnsignedByte();
            int freq = data.readInt();
            charFreqs[index] = freq;
            counter++;
        }

        // build tree
        Tree tree = buildTree(charFreqs);

        // build TreeMap
        fillDecodeMap(tree, new StringBuffer(), decodeMap);

        // Skip BitFileReader position to actual compressed code
        bfr.skip(uniqueBytes*5);

        // Get total number of compressed bytes
        for(int i=0; i<charFreqs.length; i++){
            if(charFreqs[i] > 0){
                byteCount += charFreqs[i];
            }
        }

        // Decompress data and write
        counter = 0;
        StringBuffer code = new StringBuffer();
        while(bfr.hasNextBit() && counter < byteCount){
            code.append(""+bfr.nextBit());
            if(decodeMap.containsKey(code.toString())){
                bfw.writeByte(decodeMap.get(code.toString()));
                code.setLength(0);
                counter++;
            }
        }

        bfw.close();
        bfr.close();
        data.close();

        System.out.println("Decompression successful!");
    }
    catch (IOException e) {
        e.printStackTrace();
    }
}

public static void main(String[] args) {
    File f = new File("test");
    compress(f);
    f = new File("test_comp");
    decompress(f);
}
}
When I compress the file I save the "character" (byte) values and the frequencies of each unique "character", plus the compressed bytes, in the same file (all in binary form). I then use this saved info to fill my charFreqs array in my decompress() method and then use that array to build my tree. The formatting of the saved structure looks like this:
<n><value 1><frequency>...<value n><frequency>[the compressed bytes]
(without the <> of course) where n is the number of unique bytes/characters I have in my original text (AKA my leaf values).
I have tested my code a bit and the bytes seem to get dropped somewhere in the while() loop at the bottom of my decompress method (charFreqs[] and the tree seem to retain all the original byte values).
EDIT: Upon request I have now shortened my post a bit in an attempt to make it less cluttered and more "straight to the point".
EDIT 2: I fixed it (but not fully)! The fault was in my BitFileWriter and not in my decompress method. My decompression still does not function properly, though. Whenever I try to decompress something that isn't a plain text file (for example a .jpg), it returns an empty "decompressed" file (0 bytes in size). I have no idea what is causing this...

Stream decoding of Base64 data

I have some large base64 encoded data (stored in snappy files in the hadoop filesystem).
This data was originally gzipped text data.
I need to be able to read chunks of this encoded data, decode it, and then flush it to a GZIPOutputStream.
Any ideas on how I could do this instead of loading the whole Base64 data into an array and calling Base64.decodeBase64(byte[])?
Am I right if I read the characters till the '\r\n' delimiter and decode it line by line?
e.g. :
for (int i = 0; i < byteData.length; i++) {
    if (byteData[i] == CARRIAGE_RETURN || byteData[i] == NEWLINE) {
        if (i < byteData.length - 1 && byteData[i + 1] == NEWLINE)
            i += 2;
        else
            i += 1;
        byteBuffer.put(Base64.decodeBase64(record));
        byteCounter = 0;
        record = new byte[8192];
    } else {
        record[byteCounter++] = byteData[i];
    }
}
Sadly, this approach doesn't give any human readable output.
Ideally, I would like to stream read, decode, and stream out the data.
Right now, I'm trying to put it into an InputStream and then copy it to a gzip output stream:
byteBuffer.get(bufferBytes);
InputStream inputStream = new ByteArrayInputStream(bufferBytes);
inputStream = new GZIPInputStream(inputStream);
IOUtils.copy(inputStream, gzipOutputStream);
And it gives me a
java.io.IOException: Corrupt GZIP trailer
Let's go step by step:
You need a GZIPInputStream to read zipped data (that, and not a GZIPOutputStream; the output stream is used to compress data). With this stream you will be able to read the uncompressed, original binary data. It requires an InputStream in the constructor.
You need an input stream capable of reading the Base64 encoded data. I suggest the handy Base64InputStream from apache-commons-codec. With the constructor you can set the line length, the line separator and set doEncode=false to decode data. This in turn requires another input stream - the raw, Base64 encoded data.
This stream depends on how you get your data; ideally the data should be available as an InputStream - problem solved. If not, you may have to use a ByteArrayInputStream (if binary), StringBufferInputStream (if string), etc.
Roughly this logic is:
InputStream fromHadoop = ...; // 3rd paragraph
Base64InputStream b64is = // 2nd paragraph
new Base64InputStream(fromHadoop, false, 80, "\n".getBytes("UTF-8"));
GZIPInputStream zis = new GZIPInputStream(b64is); // 1st paragraph
Please pay attention to the arguments of Base64InputStream (line length and end-of-line byte array), you may need to tweak them.
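Putting the three pieces together, a self-contained sketch might look like the following; the file names are placeholders, and commons-io's IOUtils.copy() is just one convenient way to drain the stream:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import org.apache.commons.codec.binary.Base64InputStream;
import org.apache.commons.io.IOUtils;

public class Base64GzipStreamDemo {
    public static void main(String[] args) throws Exception {
        try (InputStream raw = new FileInputStream("data.b64");                                // Base64 text from the source
             InputStream b64 = new Base64InputStream(raw, false, 76, new byte[] {'\r', '\n'}); // decode on the fly
             InputStream gunzip = new GZIPInputStream(b64);                                    // un-gzip on the fly
             OutputStream out = new FileOutputStream("data.txt")) {                            // wherever the plain data goes
            IOUtils.copy(gunzip, out); // copies chunk by chunk, never the whole payload in memory
        }
    }
}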
Thanks to Nikos for pointing me in the right direction.
Specifically this is what I did:
private static final byte NEWLINE = (byte) '\n';
private static final byte CARRIAGE_RETURN = (byte) '\r';
byte[] lineSeparators = new byte[] {CARRIAGE_RETURN, NEWLINE};
Base64InputStream b64is = new Base64InputStream(inputStream, false, 76, lineSeparators);
GZIPInputStream zis = new GZIPInputStream(b64is);
Isn't 76 the length of the Base64 line? I didn't try with 80, though.

Parsing byte array that contains different data types and able to get each value correctly using Java

I am new to Java, working on byte arrays.
I have a Blob, created in the database, which contains a double and a float value. Now I have to read it into a byte array and should be able to get the float and the double separately.
I read the blob information into the byte array like so:
FileInputStream fin = new FileInputStream(file);
byte[] fileContent = new byte[(int)file.length()];
fin.read(fileContent);
and I read the byte array like this:
for (int i = 0; i < fileContent.length; i++) {
    System.out.println("bit " + i + "= " + fileContent[i]);
}
This gives the following bytes:
bit 0= -57
bit 1= -16
bit 2= -90
bit 3= -109
bit 4= 66
bit 5= -90
bit 6= 116
bit 7= -25
bit 8= -100
You may use Double.longBitsToDouble and Float.intBitsToFloat to convert a long (64 bits) or an int (32 bits). However, you must take care that the floats have a binary layout matching that of Java, and you have to take care to assemble the bytes read from the blob in the proper order (using the << operator).
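To illustrate that manual assembly, here is a small sketch; it assumes the blob stores the values big-endian (most significant byte first), with the double in the first 8 bytes and the float right after it:
// Assemble 8 big-endian bytes into a double
public static double toDouble(byte[] b, int offset) {
    long bits = 0;
    for (int i = 0; i < 8; i++) {
        bits = (bits << 8) | (b[offset + i] & 0xFF); // mask to avoid sign extension
    }
    return Double.longBitsToDouble(bits);
}

// Assemble 4 big-endian bytes into a float
public static float toFloat(byte[] b, int offset) {
    int bits = 0;
    for (int i = 0; i < 4; i++) {
        bits = (bits << 8) | (b[offset + i] & 0xFF);
    }
    return Float.intBitsToFloat(bits);
}

// Usage: double d = toDouble(fileContent, 0); float f = toFloat(fileContent, 8);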
I think the easiest and least error-prone way to do this is by using a ByteBuffer. Here's an example containing two test cases: the first creates a binary file and the second one reads it. Note that you can set your byte order to little endian or big endian.
import org.junit.Test;

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteStreamWriteRead {

    @Test
    public void write() throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(32);
        buffer.order(ByteOrder.BIG_ENDIAN);

        System.out.println("Putting: " + Math.PI + ", " + (float) Math.PI);
        buffer.putDouble(Math.PI);
        buffer.putFloat((float) Math.PI);

        File file = new File("C:/tmp/file.bin");
        file.createNewFile();
        try (FileOutputStream fos = new FileOutputStream(file)) {
            fos.write(buffer.array(), 0, buffer.position());
        }
    }

    @Test
    public void read() throws IOException {
        File file = new File("C:/tmp/file.bin");
        byte[] a = new byte[32];
        if (file.exists()) {
            try (FileInputStream fis = new FileInputStream(file)) {
                fis.read(a);
            }
            ByteBuffer buffer = ByteBuffer.wrap(a);
            buffer.order(ByteOrder.BIG_ENDIAN);
            System.out.println(buffer.getDouble());
            System.out.println(buffer.getFloat());
        } else {
            System.out.println("File doesn't exist");
        }
    }
}
Just a note: the examples above do NOT show the most efficient way to read or write a file. You should use buffered I/O and reuse the ByteBuffer to read a chunk of bytes at a time; the details are application specific. The example above only shows the logic of using the ByteBuffer with the right byte order (see the sketch below).
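As an illustration of that point, a sketch of chunked reading with a single reused ByteBuffer could look like this; the path and the 4096-byte chunk size are arbitrary choices, and values that straddle a chunk boundary would need extra handling (for example with ByteBuffer.compact()):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ChunkedRead {
    public static void main(String[] args) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(4096); // reused for every chunk
        try (FileChannel channel = FileChannel.open(Paths.get("C:/tmp/file.bin"),
                                                    StandardOpenOption.READ)) {
            int bytesRead;
            while ((bytesRead = channel.read(buffer)) != -1) {
                buffer.flip();                         // switch the buffer to reading mode
                System.out.println("Read " + bytesRead + " bytes");
                // ... pull typed values out of the buffer here, e.g. buffer.getDouble() ...
                buffer.clear();                        // make the whole buffer writable again
            }
        }
    }
}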

Zlib compression is too big in size

I am completely new to Java; I have decided to learn it by doing a small project in it. I need to compress some string using zlib and write it to a file. However, the file turns out to be too big. Here is a code example:
String input = "yasar\0yasar"; // test input. Input will have null character in it.
byte[] compressed = new byte[100]; // hold compressed content
Deflater compresser = new Deflater();
compresser.setInput(input.getBytes());
compresser.finish();
compresser.deflate(compressed);

File test_file = new File(System.getProperty("user.dir"), "test_file");
try {
    if (!test_file.exists()) {
        test_file.createNewFile();
    }
    try (FileOutputStream fos = new FileOutputStream(test_file)) {
        fos.write(compressed);
    }
} catch (IOException e) {
    e.printStackTrace();
}
This writes a 1-kilobyte file, while the file should be at most 11 bytes (because the content is 11 bytes here). I think the problem is in the way I initialize the byte array compressed as 100 bytes, but I don't know in advance how big the compressed data will be. What am I doing wrong here? How can I fix it?
If you don't want to write the whole array and instead write just the part of it that was filled by the Deflater, use OutputStream#write(byte[] array, int offset, int length).
Roughly like this:
String input = "yasar\0yasar"; // test input. Input will have null character in it.
byte[] compressed = new byte[100]; // hold compressed content
Deflater compresser = new Deflater();
compresser.setInput(input.getBytes());
compresser.finish();
int length = compresser.deflate(compressed);

File test_file = new File(System.getProperty("user.dir"), "test_file");
try {
    if (!test_file.exists()) {
        test_file.createNewFile();
    }
    try (FileOutputStream fos = new FileOutputStream(test_file)) {
        fos.write(compressed, 0, length); // write only bytes 0 .. length-1
    }
} catch (IOException e) {
    e.printStackTrace();
}
You will probably still see 1 kB or so in Windows Explorer because the size shown there seems to be either rounded (you wrote 100 bytes before) or it refers to the size on the filesystem, which is at least one block large (should be 4 kB IIRC). Right-click the file and check the size in its properties; that should show the actual size.
If you don't know the size in advance, don't use Deflater directly; use a DeflaterOutputStream, which writes data of any length compressed.
try (OutputStream out = new DeflaterOutputStream(new FileOutputStream(test_file))) {
    out.write("hello!".getBytes());
}
The above example uses the default deflater settings, but you can pass a configured Deflater in the constructor of DeflaterOutputStream to change the behavior.
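To read such a file back, the matching stream is InflaterInputStream; a minimal round-trip sketch (reusing the test_file name from the question) could be:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

public class ReadBack {
    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new InflaterInputStream(new FileInputStream("test_file"))) {
            int b;
            while ((b = in.read()) != -1) { // read the decompressed bytes one by one
                sb.append((char) b);
            }
        }
        System.out.println(sb); // prints the original string
    }
}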
You write all 100 bytes of the compressed array to the file, but you have to write only the bytes actually produced by the deflater:
int compressedsize = compresser.deflate(compressed);
fos.write(compressed, 0, compressedsize);
