Read file into binary byte array - Java

Is there any way to read a file into a byte array so that it holds binary digits only? This should work for both binary files and "regular" files (.txt etc.), in Java.
I found a way to read a file into a byte array, but if the file is a binary file the byte array contains negative numbers, and I don't know how to handle those as binary numbers. I need my array to only contain 0s and 1s.

Even though there are negative values, the bits are still correct. The easiest thing you could do is wrap a BitSet around the byte[] so you can easily test individual bits:
BitSet bitSet = BitSet.valueOf(myByteArray);
boolean isBit20Set = bitSet.get(20);
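If you really do need an array that holds only 0s and 1s, a minimal sketch (assuming myByteArray has already been read, e.g. with Files.readAllBytes) could build one on top of that BitSet:
// Requires java.util.BitSet.
// Note: BitSet.valueOf uses little-endian bit order within each byte.
BitSet bits = BitSet.valueOf(myByteArray);
int[] zeroesAndOnes = new int[myByteArray.length * 8];
for (int i = 0; i < zeroesAndOnes.length; i++) {
    zeroesAndOnes[i] = bits.get(i) ? 1 : 0;
}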

Related

Open file in its bit representation and manipulate bits in Java

I know that using a hexadecimal editor one can edit binary files and change 4 bits with each hexadecimal value, but I am thinking of a project that requires modifying a single bit rather than 4 bits.
So is there a way to read something (e.g. an ASCII-coded plain text file) in bits and manipulate single bits in, for example, Java?
As a noob, I can think of loading each byte and generating a string containing its 8-bit representation, but that is quite a convoluted way and would waste a lot of space. Also, this approach would require me to keep a lookup list with the 8-bit representation of every possible byte.
Others have already hinted that your question is broad and can probably be better answered by other sites, resources or a search.
However, I'd like to show you a small snippet with which you can start.
import java.nio.file.Files;
import java.nio.file.Paths;

// Read a file into a byte array; we are not interested
// in interpreting encodings, just plain bytes
final byte[] fileContent = Files.readAllBytes(Paths.get("pathToMyFile"));
// Iterate the content and display one byte per line
for (final byte data : fileContent) {
    // Convert to a regular 8-bit representation
    System.out.println(Integer.toBinaryString(data & 255 | 256).substring(1));
}
You can also easily manipulate the bytes and also the bits by simply accessing the array contents and using simple bit operators like &, |, ^.
Here is a snippet showing you how to set and unset a bit at a given position:
byte data = ...;
// Set it (to 1); the cast is needed because | promotes to int
data = (byte) (data | (1 << pos));
// Unset it (to 0)
data = (byte) (data & ~(1 << pos));
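As a small illustration of combining array access with those operators (the index and position values are just assumptions), here is how you could flip a single bit inside the fileContent array read above using XOR:
int index = 3;   // hypothetical byte position in the file
int pos = 5;     // hypothetical bit position inside that byte
// Toggle the bit: 0 becomes 1, 1 becomes 0
fileContent[index] = (byte) (fileContent[index] ^ (1 << pos));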
The conversion gets explained here: how to get the binary values of the bytes stored in byte array
The bit manipulation here: Set specific bit in byte
Here are some relevant Java-Docs: Files#readAllBytes and Integer#toBinaryString
Note that, from an efficiency point of view, the smallest unit you can address in Java is a byte; there is no bit data type. In practice, however, you will probably not see any difference: the CPU loads the whole run of neighboring bits and bytes into the cache regardless of whether you want to use them or not. So you can just use the byte data type and use it to manipulate single bits.

How to write to a file in Java after Huffman Coding is done

I have implemented a class for Huffman coding. The class parses an input file, builds a Huffman tree from it and creates a map that has each distinct character appearing in the file as a key and the character's Huffman code as its value.
For example, let the string "aravind_is_a_good_boy" be the only line in the file. When you build the Huffman tree and generate the Huffman code for each character, you can see that for the character 'a' the Huffman code is '101', for the character 'r' it is '0101', and so on.
My intention is to compress the file, so I cannot write a string created by replacing each character with its Huffman code directly to the file: each character would be replaced by at least 3 characters (each '1' and '0' would still be written to the file as a character, not a bit). So I thought I would write it to the file as bytes, since there is no way to write individual bits to a file. But then 'a' and 'r' are both written as '5' into the file, which causes problems when trying to decompress it.
This is how I am converting a series of bits to bytes:
public byte[] compressString(String s, CharCodeHashMap map) {
    String byteString = "";
    byte[] byteArr = new byte[s.length()];
    int size = 0;
    for (int i = 0; i < s.length(); i++) {
        byteString += addPaddingZeros(map.getCompressedChar(s.charAt(i)));
        byteArr[size++] = new BigInteger(byteString, 2).toByteArray()[0];
        byteString = "";
    }
    return byteArr;
}
I tried prefixing '1' to each of the codes to fix the problem. But when you build a Huffman tree by reading a real file, some characters end up with more than 8 bits, and then new BigInteger(byteString, 2).toByteArray() has more than one element in the array. (For example, if 'v' has the code '11010001', new BigInteger(byteString, 2).toByteArray() returns an array with the elements [0, -47].)
Can someone please suggest a way to write to the file so that the file is compressed and these problems are also taken care of?
The problem is that files in modern operating systems are modeled as indexable sequences of bytes [1].
So what you need is a way to encode the fact that your file is representing a number of bits that may not be a multiple of 8. That means the bit stream size is not necessarily the file size (in bytes) multiplied by 8.
There are a variety of solutions:
Reserve N bytes at the start of the file for the file size in bits. For example, reserving 4 bytes allows you to represent file sizes up to 2^32 bits. (A sketch of this approach follows below.)
Reserve 3 bits at the start of the file to hold the number of bits modulo 8. You can use this to decide how many bits in the last byte of the file to ignore.
Use some kind of encoding to represent the end of stream; e.g. represent it as a character in the text stream that you are encoding.
Is there a way to deal with this without using some bits? AFAIK, No.
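As an illustration of the first option, here is a minimal sketch (the class, file handling and 4-byte header layout are my assumptions, not part of the answer above) that writes the bit count as a 4-byte header followed by the packed bytes, and uses it when reading back:
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class BitLengthHeader {
    // Write the number of meaningful bits, then the packed bytes.
    static void write(File f, byte[] packed, int bitCount) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            out.writeInt(bitCount);   // 4-byte header: how many bits are meaningful
            out.write(packed);
        }
    }

    // Read the header, then the packed bytes; the caller uses the
    // bit count to know how many bits of the last byte to ignore.
    static byte[] read(File f) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            int bitCount = in.readInt();
            byte[] packed = new byte[(bitCount + 7) / 8];
            in.readFully(packed);
            System.out.println(bitCount + " meaningful bits");
            return packed;
        }
    }
}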
[1] And at a lower level, files are represented as sequences of disk blocks consisting of multiple bytes. So, from a physical storage perspective, compressing files that are already small (e.g. smaller than a disk block) doesn't achieve anything. Similarly, saving or not saving (say) 3 bits when the representation is modeled as a byte sequence is at the border of being pointless ... if that was what was concerning you.
Yes, you can write bits to a file. In fact you are always writing bits to a file. The only thing is that you are writing eight bits at a time.
What you need is a bit buffer, say a 32-bit unsigned variable, into which you accumulate bits. Have another integer that tracks how many bits are in the bit buffer. Use the shift left and or (or plus) operators to put more bits in the bit buffer, and the and and shift right operators to remove them. Whenever you have eight or more bits in the bit buffer, you write those eight bits to the file as a byte. At the end, write the remaining bits (if any) to the file as the last byte.
So, to add the low bits bits of value to the buffer:
bitBuffer |= value << bitCount;
bitCount += bits;
to write and remove available bytes:
while (bitCount >= 8) {
    writeByte(bitBuffer & 0xff);
    bitBuffer >>>= 8;
    bitCount -= 8;
}
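Putting the two fragments together, a minimal sketch of such a bit writer (the class and method names are my own, writeByte is replaced by an OutputStream, and codes are assumed to be at most 24 bits long) could look like this:
import java.io.IOException;
import java.io.OutputStream;

public class BitWriter {
    private final OutputStream out;
    private int bitBuffer = 0;   // accumulates pending bits
    private int bitCount = 0;    // number of valid bits in bitBuffer

    public BitWriter(OutputStream out) {
        this.out = out;
    }

    // Append the low 'bits' bits of 'value' to the stream (assumes bits <= 24).
    public void writeBits(int value, int bits) throws IOException {
        bitBuffer |= (value & ((1 << bits) - 1)) << bitCount;
        bitCount += bits;
        while (bitCount >= 8) {
            out.write(bitBuffer & 0xff);
            bitBuffer >>>= 8;
            bitCount -= 8;
        }
    }

    // Write any remaining bits, padded with zeros, as the last byte.
    public void flush() throws IOException {
        if (bitCount > 0) {
            out.write(bitBuffer & 0xff);
            bitBuffer = 0;
            bitCount = 0;
        }
        out.flush();
    }
}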
You need to make sure that when decoding, you don't mistake the filler bits in the last byte as another code. You can either send the actual number of bits in the message preceding the message (or the number of bits in the last byte), or you can add a symbol to your alphabet for end-of-stream that gets its own Huffman code, and end the message with that.
The other problem you have is that you will also need to transmit the Huffman code itself to the decoder before the coded symbols in order for the decoder to know how to decode. Look up "canonical Huffman codes" for how to approach that efficiently.

Java. How to get constant read times from text file?

I have a text file which contains over a million integer numbers. I want to read the n-th number in constant time. I'm not allowed to put all the integers in an array. I heard that there is a technique which operates with bytes, so I could just write a method getNthInteger(int nth, int elementLengthInBytes) or something like that. Please give me a reference to this technique; any help is appreciated!
You convert each integer to an array of bytes of some length L, then write the bytes to the file. L must be exactly the same for each integer. Then to read integer N, read L bytes starting from byte N*L.
For example:
You can write an integer to a file as 4 bytes with java.io.RandomAccessFile.writeInt(int).
You can read the Nth integer with:
java.io.RandomAccessFile.seek(n*4);
int i = java.io.RandomAccessFile.readInt();
Replace java.io.RandomAccessFile with an actual object of type java.io.RandomAccessFile.
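For instance, a small sketch of that idea (the file name and values are assumptions) with an actual RandomAccessFile object:
import java.io.IOException;
import java.io.RandomAccessFile;

public class FixedWidthInts {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("numbers.bin", "rw")) {
            // Write some integers as fixed-width 4-byte records.
            for (int i = 0; i < 1000; i++) {
                raf.writeInt(i * i);
            }
            // Read the 42nd integer (0-based) in constant time.
            raf.seek(42L * 4);
            int nth = raf.readInt();
            System.out.println(nth);   // prints 1764
        }
    }
}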

how to find the type of integer from a file using java

I have a file that contains integer values of different sizes (4 bytes and 2 bytes), but I don't know the layout of these values in the file (i.e. whether a given value is a 4-byte or a 2-byte integer). For example, a file may have two 4-byte integers followed by five 2-byte integers, and another file may have three 2-byte integers first and then four 4-byte integers. Is there a way to read such values?
I want to write code that takes such a file and reads each value irrespective of its byte size. Right now I am using DataInputStream and, knowing the layout of the values in advance from some viewer, reading the values accordingly. But this way everything is hard-coded and my code is not generic.
You're going to have to "parse" or "read" or "do something" with the viewer data and use the refactored viewer info as a file format definition during the reads.
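As a rough illustration of that idea (the layout array and file name are assumptions, not from the answer), the format definition could be as simple as a list of field widths that drives a DataInputStream:
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class LayoutDrivenReader {
    public static void main(String[] args) throws IOException {
        // Hypothetical layout taken from the viewer: two 4-byte ints, then five 2-byte ints.
        int[] widths = {4, 4, 2, 2, 2, 2, 2};
        try (DataInputStream in = new DataInputStream(new FileInputStream("values.bin"))) {
            for (int width : widths) {
                int value = (width == 4) ? in.readInt() : in.readShort();
                System.out.println(width + "-byte value: " + value);
            }
        }
    }
}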

XOR Encryption in Java: losing data after decryption

I'm currently writing a very small Java program to implement a one-time-pad, where the pad (or key) itself is generated as a series of bytes using a SecureRandom object, which is seeded using a simple string with the SHA-512 algorithm.
Generating the one-time pad hasn't caused any problems, and if I supply the same seed string each time I get, as expected, the same sequence of pseudo-random numbers, making decryption possible as long as the person decrypting has the seed string used to encrypt.
When I try to encrypt a file, the program reads in the data 64 chars at a time (except at the end of the file, where there is usually a smaller amount left) and generates 64 bytes (or the matching amount) of pseudo-random bytes. XOR is performed between the elements of both arrays, the resulting char array containing the cipher characters is written to file, and the process repeats until all text in the file has been read.
Now, because Java treats all primitives as signed numbers (the byte data type ranges from -128 to 127, not 0 to 255), the XOR operation can (and does) result in some negative values (-128 to -1). It seems that Java does not recognise these values as valid ASCII and simply writes a ? (question mark) to the file for any negative value. When it comes to reading from the file to decrypt the cipher text, the negative value that resulted in the ? being written is lost, replaced with 63, the valid ASCII code for a question mark.
This means that XORing this value is useless; without the original value there is no way to produce the plaintext. Incidentally, if I encrypt some data and then decrypt it immediately afterwards, in the same program run, printing status along the way, there are no problems. Only when the data is written to file is the information lost.
I should also mention that I did try adding 128 to each encryption XOR result and subtracting it again before performing the decryption XOR (to put each value in a valid ASCII range), but the ? problem still showed up, because there are 31 codes from 128 to 159 that I'm unable to read and which appear as ?.
I've been banging my head off the wall on this for a while now, any help is appreciated.
Cheers.
This is very confused. If you are processing a char array, the elements are 16 bits wide, they are unsigned, and not all values are valid. So (a) you can't possibly be having a problem with signs or bytes, and (b) you shouldn't be doing that at all. You should be reading the file into a byte array, XOR-ing it, and writing the byte array directly to the output file. No Readers or Writers, no chars, no Strings.
I guess the problem is in the way you write the file. Write the byte array directly to a FileOutputStream and do not try to convert it to a string first. For reading, do the same thing: read it back into a byte array.
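A minimal sketch of that approach (the method and file names are my own, and the pad is assumed to be at least as long as the file): read the bytes, XOR them, and write the raw bytes back with no character conversion:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class XorFile {
    // XOR the file content with a pad of at least equal length and write the raw bytes back out.
    static void xorToFile(String inPath, String outPath, byte[] pad) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(inPath));
        byte[] result = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            // Negative byte values are fine; they are never interpreted as characters.
            result[i] = (byte) (data[i] ^ pad[i]);
        }
        Files.write(Paths.get(outPath), result);
    }
}
Calling xorToFile again on the ciphertext with the same pad restores the original bytes exactly.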
