Read an integer from a binary file [duplicate] - java

I need to read an integer from a non-text file in Java. Here is the equivalent code in C
unsigned int MagicNumber;
int fd = open("MyFile", O_RDONLY);
int rc = read(fd, &MagicNumber, 4);
What is the equivalent in Java?
This does NOT work for me (it produces different results than the C code):
FileInputStream fin = new FileInputStream("MyFile");
DataInputStream din = new DataInputStream(fin);
int MagicNumber = din.readInt();

Java uses a machine-independent representation of an integer as a byte sequence. If you write it with DataOutputStream and then read it with DataInputStream on a different machine, even one with a different endianness, it will still be read correctly.
Java was designed for exactly that: to be machine independent.
Of course, you cannot assume the byte sequence will be the same in a file you generated in C.
DataOutput and DataInput always use big-endian byte order (see the DataOutput specs), whereas the C code uses the byte order of the machine it runs on.
If you run the C code on a big-endian machine, both programs return the same value; on a little-endian machine (like x86) the results differ because the bytes are interpreted in the opposite order.
To read such a file in Java you can use LittleEndianDataInputStream (from Guava), or read the four bytes yourself and decode them with a ByteBuffer set to ByteOrder.LITTLE_ENDIAN.
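A minimal sketch of the do-it-yourself route, using only the JDK. The class and method names are made up for illustration, and the four bytes in main stand in for the start of a file written on x86; with a real file you would pass a FileInputStream to readIntLE instead.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LittleEndianRead {
    // Decodes the first four bytes as a little-endian int (least significant byte first).
    static int decodeIntLE(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Reads exactly four bytes from the stream and decodes them as little-endian.
    static int readIntLE(InputStream in) throws IOException {
        byte[] buf = new byte[4];
        new DataInputStream(in).readFully(buf); // EOFException if fewer than 4 bytes remain
        return decodeIntLE(buf);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file contents: 0x12345678 as an x86 program would store it.
        byte[] fileBytes = {0x78, 0x56, 0x34, 0x12};
        int magic = readIntLE(new ByteArrayInputStream(fileBytes));
        // C's unsigned int has no Java counterpart; widen to long to see the unsigned value.
        System.out.printf("0x%08X %d%n", magic, Integer.toUnsignedLong(magic));
    }
}
```

If you would rather keep DataInputStream.readInt(), wrapping the result in Integer.reverseBytes(...) should give the same value.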


Implement fread (readInt) in java

I am attempting to convert a program that reads a binary file in C++ to Java. The file is little-endian.
fread(n, sizeof (unsigned), 1, inputFile);
The C++ snippet above reads 1 integer into the integer variable 'n'.
I am currently using this method to accomplish the same thing:
public static int readInt(RandomAccessFile inputStream) throws IOException {
int retVal;
byte[] buffer = new byte[4];
inputStream.readFully(buffer);
ByteBuffer wrapped = ByteBuffer.wrap(buffer);
wrapped.order(ByteOrder.LITTLE_ENDIAN);
retVal = wrapped.getInt();
return retVal;
}
but the result of this method sometimes differs from that of the C++ example. I haven't been able to determine which parts of the file cause this method to fail, but I know it does. For example, when reading one part of the file my readInt method returns 543974774 but the C++ version returns 1.
Is there a better way to read little-endian values in Java? Or is there some obvious flaw in my implementation? Any help understanding where I could be going wrong, or how I could read these values in a more effective way, would be very appreciated.
Update:
I am using RandomAccessFile because I frequently require fseek functionality, which RandomAccessFile provides in Java.
543974774 is, in hex, 206C6576.
There is no endianness on the planet that turns 206C6576 into '1'. The problem is therefore that you aren't reading what you think you're reading: if the C code reads 4 bytes (or even a variable, unknown number of bytes) and turns them into '1', then your Java code wasn't reading the same bytes. Your C code and Java code are out of sync: at some point your C code read, for example, 2 bytes where your Java code read 4 bytes, or vice versa.
The problem isn't in your readInt method; that does the job properly every time.
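One way to see what the Java side actually read is to turn the suspicious value back into its four bytes and print them as text. This is only a diagnostic sketch (the class and method names are invented here); it shows that the bytes 76 65 6C 20 are the ASCII characters "vel " followed by nothing numeric, which suggests the read landed inside a text field rather than on an integer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class SuspectValue {
    // Re-encodes an int into the four little-endian bytes it was decoded from
    // and interprets those bytes as ASCII text.
    static String asAscii(int value) {
        byte[] bytes = ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Brackets make the trailing space visible.
        System.out.println("[" + asAscii(543974774) + "]"); // prints "[vel ]"
    }
}
```

Logging RandomAccessFile.getFilePointer() before each read, and the equivalent ftell() in the C++ program, is a straightforward way to find where the two programs first disagree about the position.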

Java write(str.getBytes()) vs writeBytes(str) [duplicate]

When using DataOutputStream to push Strings, I normally do the following:
DataOutputStream dout;
String str;
dout.write(str.getBytes());
I just came across the writeBytes() method of DataOutputStream, and my question is whether the above is equivalent to:
dout.writeBytes(str);
If not, what is the difference and when should it be used?
No, it is not equivalent.
The Javadocs for writeBytes say
Writes out the string to the underlying output stream as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits.
So this will not work well except for ASCII Strings.
You should be doing
dout.write(str.getBytes(characterSet));
// remember to specify the character set, otherwise the encoding becomes
// platform-dependent and the result is non-portable
or
dout.writeChars(str);
or
dout.writeUTF(str);
Note that only the last method also writes the length of the String, so for the others, you probably need to know exactly what you are doing if you intend to later read it back.
The bigger question is why you need to use such a low-level protocol as DataOutputStream directly.
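To make the difference concrete, here is a small sketch that writes the same one-character string with each of the four calls and prints the resulting bytes in hex. The helper names (capture, hex, Step) are made up for this example; the string is the euro sign, U+20AC, chosen because its high eight bits are non-zero.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StringWriteDemo {
    // One write operation against a DataOutputStream.
    interface Step { void run(DataOutputStream out) throws IOException; }

    // Runs the step against an in-memory stream and returns the bytes it produced.
    static byte[] capture(Step step) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            step.run(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X", b & 0xFF));
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u20AC"; // euro sign
        System.out.println(hex(capture(out -> out.writeBytes(s))));                                // AC (high byte 0x20 discarded)
        System.out.println(hex(capture(out -> out.write(s.getBytes(StandardCharsets.UTF_8)))));   // E282AC
        System.out.println(hex(capture(out -> out.writeChars(s))));                                // 20AC
        System.out.println(hex(capture(out -> out.writeUTF(s))));                                  // 0003E282AC (2-byte length, then data)
    }
}
```

Only the writeUTF output can be read back without knowing the length in advance, via DataInputStream.readUTF().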

When should we use BufferedInputStream,FileInputStream or DataInputStream? [duplicate]

I'm confused about the above-mentioned classes. When should I use which? From my perspective, everything that comes in is in the form of a stream in Java, right? So which one should be used in which case to make input more efficient? Also, can I use DataInputStream or BufferedInputStream to read content from files?
FileInputStream
Is used for reading from files.
See the JavaDoc:
A FileInputStream obtains input bytes from a file in a file system. What files are available depends on the host environment. [...]
DataInputStream
Is used for reading in primitive Java types (that you might have written using a DataOutputStream) and provides convenience methods for that purpose, e.g. readInt().
See the JavaDoc:
A data input stream lets an application read primitive Java data types
from an underlying input stream in a machine-independent way. [...]
BufferedInputStream
Is used to do buffered block reads from an InputStream (instead of single bytes) and increases performance if reading small chunks of data. Most of the time you want to use it for text processing.
See the JavaDoc:
A BufferedInputStream adds functionality to another input stream-namely, the ability to buffer the input[...].
Of course you can combine them, as they follow the Decorator Pattern.
Example of writing primitive Java types to a file:
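FileOutputStream write = new FileOutputStream("data.bin"); // "data.bin" is a placeholder file name
DataOutputStream out = new DataOutputStream(write);
out.writeInt(10);
out.close(); // closing the outer stream flushes it and closes the underlying file stream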
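As a sketch of how the three classes stack on the reading side as well, the following round trip writes an int through a buffered data stream and reads it back the same way. The class name and the temporary file are illustrative only; IOException is wrapped in an unchecked exception purely to keep the example short.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class DecoratorRoundTrip {
    // Writes one int to a temporary file and reads it back.
    static int roundTrip(int value) {
        try {
            Path file = Files.createTempFile("demo", ".bin");
            try {
                // File stream -> buffering -> primitive-type encoding
                try (DataOutputStream out = new DataOutputStream(
                        new BufferedOutputStream(new FileOutputStream(file.toFile())))) {
                    out.writeInt(value);
                } // closing the outermost stream flushes the buffer and closes the file
                try (DataInputStream in = new DataInputStream(
                        new BufferedInputStream(new FileInputStream(file.toFile())))) {
                    return in.readInt();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(10)); // prints 10
    }
}
```

Each wrapper adds one capability: FileInputStream supplies the bytes, BufferedInputStream reduces the number of underlying reads, and DataInputStream decodes the primitives.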

Java bit processing of file.txt [closed]

I want to process a file.txt at the binary level by removing every 5th bit if it is equal to 1. Save the new processed binary file and repeat the process until it can no longer find any more 5th bits equal to 1, then save the final file.
Usually you operate on bytes, not bits. If you want to access individual bits, you can use BitSet (assuming the file fits in memory). For example, to set the bit at index 17 to 1:
final Path path = Paths.get("file.bin");
final BitSet bitSet = BitSet.valueOf(Files.readAllBytes(path));
bitSet.set(17, true);
Files.write(path, bitSet.toByteArray());
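Two details of BitSet matter when it is used on file contents, and the sketch below illustrates both (the class and method names are invented for the example). First, BitSet.valueOf(byte[]) numbers bits starting from the least significant bit of the first byte, so bit index 17 is bit 1 of the third byte. Second, toByteArray() leaves out trailing zero bytes, so the result may be shorter than the input and is padded back here to avoid truncating the file.

```java
import java.util.Arrays;
import java.util.BitSet;

class BitSetLayout {
    // Sets one bit in a copy of the data while preserving the original array length.
    static byte[] setBit(byte[] data, int bitIndex) {
        BitSet bits = BitSet.valueOf(data);
        bits.set(bitIndex);
        byte[] out = bits.toByteArray();
        // toByteArray() drops trailing zero bytes; restore the original length if needed.
        return Arrays.copyOf(out, Math.max(out.length, data.length));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(setBit(new byte[4], 17))); // prints [0, 0, 2, 0]
    }
}
```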
All files are already stored as binary. You can get the binary bytes from any file in Java using the Files api. As an example:
// try-with-resources closes the stream even if an exception is thrown
try (InputStream is = Files.newInputStream(Paths.get("myFile.pdf"), StandardOpenOption.READ)) {
    byte[] buffer = new byte[1024];
    int bytesRead;
    // read() returns -1 once the end of the file is reached
    while ((bytesRead = is.read(buffer)) != -1) {
        doSomethingWithBytes(buffer, bytesRead);
    }
}
*plus usual disclaimers about adding error handling & other checks as appropriate for your situation
Note that you will be reading your file in "chunks" of bytes no bigger than your buffer. If you know that your files will be small enough to fit comfortably in memory and your situation demands it, you can build an array that contains all the bytes from the file yourself.
If you wanted to do something with bytes of the file after reading it, you can do something similar using Files.newOutputStream(Path path, OpenOption... options).
To manipulate file bytes (both reading and writing), you could use RandomAccessFile or a ByteBuffer. An example using RandomAccessFile:
public void writeAndRead(byte[] bytes) throws IOException {
    try (RandomAccessFile file = new RandomAccessFile("myFile.bin", "rw")) {
        // Write some bytes to the file.
        file.write(bytes);
        // Seek to the beginning of the file.
        file.seek(0);
        // Read back the bytes; readFully keeps reading until the buffer is full.
        byte[] buffer = new byte[bytes.length];
        file.readFully(buffer);
    }
}
My take on this would be something like this:
After reading a byte from your file, you can check the value of its 5th bit using bitwise operations.
byte myByte;
int bit;
...
boolean bitValue = (myByte & (1 << bit)) != 0;
After reading one byte, check its 5th bit. If the bit is equal to 1, remove it by shifting the first 3 bits of the byte to the left. Now the first bit of your byte is undefined (it can be either 0 or 1), so read the next byte, take its last bit, and insert it into the previous byte's first bit.
Do the same shifting for the next byte until no bytes are left. Afterwards, repeat the process of checking the bits.
You can set a specific bit of a byte by doing this:
myByte |= 1 << bit;
Looking at other questions in stack overflow, maybe you could make use of bit-io.
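The test and set expressions above can be collected into small helpers, together with the matching clear operation. This is a sketch with invented names, and it assumes bits are numbered from 0 at the least significant end, so the "5th bit" is index 4; the original question does not say which end it counts from.

```java
class BitOps {
    // True if the bit at the given index (0 = least significant) is 1.
    static boolean isSet(byte b, int bit) {
        return (b & (1 << bit)) != 0;
    }

    // Returns b with the given bit forced to 1.
    static byte set(byte b, int bit) {
        return (byte) (b | (1 << bit));
    }

    // Returns b with the given bit forced to 0.
    static byte clear(byte b, int bit) {
        return (byte) (b & ~(1 << bit));
    }

    public static void main(String[] args) {
        byte value = (byte) 0xFF;
        System.out.println(isSet(value, 4));                      // true
        System.out.printf("%02X%n", clear(value, 4) & 0xFF);      // EF
        System.out.printf("%02X%n", set((byte) 0, 7) & 0xFF);     // 80
    }
}
```

The casts back to byte are needed because Java promotes byte operands to int before applying bitwise operators.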

Base64 and binary streams between Java and C# [closed]

I feel like the answer is obvious but suppose I have the following in C#
using (MemoryStream ms = new MemoryStream())
{
using (BinaryWriter bw = new BinaryWriter(ms))
{
// Write some floats, bytes, and uints
// Convert.ToBase64String this stuff from ms.ToArray
}
}
and the following in Java (ok it's Scala but using Java libraries):
val byteStream = new ByteArrayOutputStream()
val outStream = new DataOutputStream(byteStream)
// Write some floats, bytes, and longs where the uints were using
// writeFloat, writeByte, and writeLong. .NET has an overloaded
// function that takes whatever.
// Base64.getEncoder.encodeToString byteStream.toByteArray
I get completely different Base64 strings. What are they doing differently here? I need the Java output to match the .NET output. I assume it's some sort of byte ordering issue, but I haven't had any luck using ByteBuffer to correct this.
Java:
PczMzT3MzM0/gAAAPczMzQAAAAAAAAAAAAAAAD3MzM0/gAAAAQAAAABRn8XzAAAAAAAAAAEAAAAAAAAAAQ==
C# (with unknown = signs as we chop them off for reasons) :
zczMPc3MzD0AAIA/zczMPQAAAAAAAAAAAAAAAM3MzD0AAIA/AfPFn1EBAAAAAQAAAA
I really feel as though it is byte ordering which is why I tried using ByteBuffer in the Java code, order method, to change the ordering but I did not have success.
For further clarity, the Java code is running on x86_64 CentOS with Java 7 and the .NET code is on x86_64 Windows Server 2008 with .NET 4. These values are coming from Protobuf objects, so they should be pretty cross-platform, I would think. Numerically the data is identical and consistent regardless of what I put in, at least when I write these three data types. The only significant difference is the lack of an unsigned type in Java, and perhaps there is a binary representation difference, which is what I was initially trying to resolve, but I do not seem to be able to figure it out.
As I have said, using another format is not an option. I need the binary data written from Java and then Base64-encoded to give the same result as .NET. Serialization choices are not an option. This has to be it. I need a resource that will aid in bringing this together, whether that means binary manipulation of byte data or not. I need some explanation of the data types, and as I have searched extensively without finding a resource that explains how to do this or what the real differences are, I decided to ask here.
The main problem is that C#'s BinaryWriter writes each value's low bytes first (little-endian), whereas Java's DataOutputStream writes the high bytes first (big-endian).
Also, when you write a .NET unsigned integer, that writes 4 bytes. But when you write a Java long, it writes 8 bytes. So that's another difference right there.
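The byte-order difference is visible in the question's own data: decoded from Base64, the Java string starts with the bytes 3D CC CC CD and the C# string with CD CC CC 3D, which are the two encodings of the float 0.1. A short sketch that reproduces both orders (the class and method names are invented here):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Encodes a float in the requested byte order and returns the bytes as hex.
    static String hex(float value, ByteOrder order) {
        byte[] bytes = ByteBuffer.allocate(4).order(order).putFloat(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X", b & 0xFF));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(0.1f, ByteOrder.BIG_ENDIAN));    // 3DCCCCCD, what DataOutputStream.writeFloat emits
        System.out.println(hex(0.1f, ByteOrder.LITTLE_ENDIAN)); // CDCCCC3D, what BinaryWriter.Write(float) emits
    }
}
```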
But fixing them to match is actually not that hard once you understand the differences. Here are 2 code snippets, one in C#, and the other in Java that encode the same information and output the same Base64-encoded string. In my case, I chose to override how Java writes the floats and longs.
.NET code sample
static void Main(string[] args)
{
using (MemoryStream ms = new MemoryStream())
{
using (BinaryWriter bw = new BinaryWriter(ms))
{
// floats
bw.Write(-456.678f);
bw.Write(0f);
bw.Write(float.MaxValue);
// bytes
bw.Write((byte)0);
bw.Write((byte)120);
bw.Write((byte)255);
// uints
bw.Write(0U);
bw.Write(65000U);
bw.Write(4294967295U);
}
var base64String = Convert.ToBase64String(ms.ToArray());
Console.WriteLine(base64String);
}
}
Java code sample
public static void main(String[] args) throws Exception {
try (ByteArrayOutputStream byteStream = new ByteArrayOutputStream()) {
try (DataOutputStream outStream = new DataOutputStream(byteStream)) {
// floats
writeFloat(-456.678f, outStream);
writeFloat(0f, outStream);
writeFloat(Float.MAX_VALUE, outStream);
// bytes
outStream.writeByte(0);
outStream.writeByte(120);
outStream.writeByte(255);
// longs (uints)
writeUint(0L, outStream);
writeUint(65000L, outStream);
writeUint(4294967295L, outStream);
}
String base64String = Base64.getEncoder().encodeToString(byteStream.toByteArray());
System.out.println(base64String);
}
}
private static void writeFloat(float f, DataOutputStream stream) throws Exception {
int val = Float.floatToIntBits(f);
stream.writeByte(val & 0xFF);
stream.writeByte((val >>> 8) & 0xFF);
stream.writeByte((val >>> 16) & 0xFF);
stream.writeByte((val >>> 24) & 0xFF);
}
private static void writeUint(long val, DataOutputStream stream) throws Exception {
stream.writeByte((int) (val & 0xFF));
stream.writeByte((int) ((val >>> 8) & 0xFF));
stream.writeByte((int) ((val >>> 16) & 0xFF));
stream.writeByte((int) ((val >>> 24) & 0xFF));
}
Output for both samples
yVbkwwAAAAD//39/AHj/AAAAAOj9AAD/////
Make sure you test edge cases with the float type and make adjustments where necessary. If it matters to you, I expect funny values like NaN to cause differences, but maybe you don't care about that. Otherwise, I expect that it will work fine.
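As an alternative to shifting bytes by hand, the same values can be written through a ByteBuffer set to little-endian order. This is a sketch rather than a drop-in replacement: the class name is invented, the buffer size of 27 is simply the total for these nine values (3 floats, 3 bytes, 3 uints), and the uints are passed as long and narrowed to int so that only the low 4 bytes are written.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

class LittleEndianEncoder {
    static String encode() {
        // 3 floats (12 bytes) + 3 bytes + 3 uints (12 bytes) = 27 bytes
        ByteBuffer buf = ByteBuffer.allocate(27).order(ByteOrder.LITTLE_ENDIAN);
        // floats
        buf.putFloat(-456.678f).putFloat(0f).putFloat(Float.MAX_VALUE);
        // bytes
        buf.put((byte) 0).put((byte) 120).put((byte) 255);
        // uints: narrowing a long to int keeps the low 32 bits
        buf.putInt((int) 0L).putInt((int) 65000L).putInt((int) 4294967295L);
        return Base64.getEncoder().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        System.out.println(encode()); // yVbkwwAAAAD//39/AHj/AAAAAOj9AAD/////
    }
}
```

Because 27 is a multiple of 3, this particular string has no trailing = padding; other payload sizes will, which matters if the .NET side strips the padding as in the question.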
How to implement cross-platform binary communication:
define exact byte format
implement in each platform
Frequently you can simplify both steps by using an off-the-shelf protocol that is close to your needs (like https://en.wikipedia.org/wiki/BSON) and supported on one or all of the platforms you are interested in.
Note that the basic binary serialization types in a given language or framework generally target strictly that language or framework (and often a particular version), as this frequently gives a speed or size benefit, and there is no widely accepted standard for "binary object representation".
An alternative approach is to use well-defined text formats like JSON/XML, as suggested in another answer.
Some possible technical differences between binary formats:
serialization of integer types can differ by byte order/possible alternative representation (like compressed int in .Net)
size of boolean and enumeration types could be different
arrays/strings can use different types to represent length
padding may be added by some binary representations
strings can be UTF-8, UTF-16, or any other specified/unspecified encoding, with or without a trailing 0.
Different platforms have different binary representations. If you want the Base64 strings to match, you should use JSON or XML serialization, since both are cross-platform.
Edited: Don't misunderstand me: Base64 is a standard encoding algorithm. It gives the same output for the same data. What I mean is that the byte arrays might be different.
