I'm reading in binary files normally using:
//What I use to read in the file normally
int hexIn;
for(int i = 0; (hexIn = in.read()) != -1; i++){
    // process hexIn here
}
What I need to do is read the file in backwards. I have tried something along the lines of the following, but it does not work! I have looked at loads of help pages but can't find anything, so I hope you can help me please.
//How I'm trying to read in the file backwards
//(this still reads forwards -- in.read() always returns the next byte)
for(long i = 0, j = length - 1; i < length; i++, j--){
    int hexIn = in.read();
}
Just to complicate things, I'm reading the binary in and converting it to hex using
//This makes sure it does not miss a leading 0
String s = Integer.toHexString(hexIn);
if(s.length() < 2){
    s = "0" + s;
}
Say the hex being read in normally is
10 11 12 13 14 15
If the whole stream were simply reversed character by character, it would come out as
51 41 31 21 11 01
What I need is to read it byte by byte in reverse:
15 14 13 12 11 10
Does anyone have an idea? Because I'm all out of them, not even my trusty Java books know!
You don't want to "read" the file at all. What you want to do is use a FileChannel and a MappedByteBuffer overlaid on top of the file, then simply access the byte buffer in reverse.
This lets the host OS manage the actual reading of blocks from disk for you, while you simply scan the buffer backwards in a loop.
Look at this page for some details.
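For illustration, a minimal sketch of that approach (the file name is a placeholder, and note that a single map() call is limited to 2 GB):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ReverseMap {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get("file.bin"), StandardOpenOption.READ)) {
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // scan the mapped buffer backwards; the OS pages blocks in for us
            for (int i = buf.limit() - 1; i >= 0; i--) {
                int hexIn = buf.get(i) & 0xFF; // mask to get the unsigned byte value
                String s = Integer.toHexString(hexIn);
                if (s.length() < 2)
                    s = "0" + s;
                System.out.print(s + " ");
            }
        }
    }
}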
You can use RandomAccessFile class:
RandomAccessFile file = new RandomAccessFile(new File(fileName), "r");
long index, length;
length = file.length() - 1;
for (index = length; index >= 0; index--) {
file.seek(index);
int s = file.read();
//..
}
file.close();
This should work, but it will be much slower than an InputStream, since you can't benefit from block reading.
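If the per-byte seeks turn out to be too slow, a compromise sketch (the 8 KB block size is an arbitrary choice, not from the answer above) is to seek backwards one block at a time and scan each block in reverse:

RandomAccessFile file = new RandomAccessFile(new File(fileName), "r");
byte[] block = new byte[8192];
long pos = file.length();
while (pos > 0) {
    int toRead = (int) Math.min(block.length, pos);
    pos -= toRead;
    file.seek(pos);                    // one seek per block instead of per byte
    file.readFully(block, 0, toRead);
    for (int i = toRead - 1; i >= 0; i--) {
        int s = block[i] & 0xFF;       // the next byte, scanning backwards
        //..
    }
}
file.close();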
You would need to use a RandomAccessFile. Then you can specify the exact byte to read.
It won't be very efficient but it allows you to read a file of any size.
Depends on your exact requirement which solution you use.
How about trying the following? NOTE: this is definitely not that efficient, but I believe it will work.
First, read the whole input stream into a byte array:
http://www.java-tips.org/java-se-tips/java.io/reading-a-file-into-a-byte-array.html
Use the following code.
byte[] theBytesFromTheFile = <bytes_read_in_from_file>
// java.util.ArrayDeque used as a stack; Java has no Array<Byte> with push/pop
Deque<Byte> bytes = new ArrayDeque<>();
for(byte b : theBytesFromTheFile) {
    bytes.push(b);
}
Now you can pop the stack and you will get each byte in the correct order, backwards from the file. (NOTE: You will still have to split each byte into its two individual hex chars.)
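For completeness, a short sketch of the popping step (the hex formatting mirrors the question's code):

// Popping returns the bytes last-in-first-out, i.e. in reverse file order.
while (!bytes.isEmpty()) {
    int hexIn = bytes.pop() & 0xFF; // mask to get an unsigned value
    String s = Integer.toHexString(hexIn);
    if (s.length() < 2)
        s = "0" + s;
    System.out.print(s + " ");
}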
If you don't want to do it this way, you can also look at this site. That code reads the file's data backwards.
http://mattfleming.com/node/11
In case of a small binary file, consider reading it into a byte array. Then you can perform the necessary operations backwards or in any other order. Here is the code using Java 7:
private byte[] readEntireBinaryFile(String filename) throws IOException {
Path path = Paths.get(filename);
return Files.readAllBytes(path);
}
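For example, you could then dump the file as hex from the last byte to the first (the path here is a placeholder):

// Hypothetical usage: print the bytes backwards as two-digit hex values.
byte[] data = readEntireBinaryFile("/path/to/file.bin");
for (int i = data.length - 1; i >= 0; i--) {
    System.out.printf("%02x ", data[i] & 0xFF);
}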
Related
I'm solving some questions in Java and I came across these lines in the problem statements: "The total size of the input doesn't exceed 300 KB", "The total size of the input doesn't exceed 256 KB".
My doubt is how I can make sure that my input is less than that value.
I actually tried using
CountingInputStream (CountingInputStream input = new CountingInputStream(System.in);)
to validate it. This is an external jar by Google.
But when I submit my solution to the online compilers, CountingInputStream is not accepted by the compiler. So how do I do it without using this, in a general way?
CountingInputStream input = new CountingInputStream(System.in);
System.out.println("Enter Values: ");
while (scanner.hasNext() && input.getCount() < (256 * 1024))
This is how I'm doing it now... but is there a way I can limit my input without using CountingInputStream? Kindly help.
Write your own class that decorates an InputStream, overriding the read method to count bytes and then throw an exception when the number of bytes exceeds some threshold. Your driver could look like this:
InputStream in = new ByteLimiterInputStream(new FileInputStream("file.bin"));
while(...)
in.read();
This will throw an exception when you've read too much data. It's up to you to write the ByteLimiterInputStream class. This is an academic exercise after all: exercise your own brain and don't ask others for the answers.
Use an InputStream, call the read() method, and increment a counter.
read() will return a single byte, or -1 at end of stream.
e.g.
int MAX = 256 * 1024;
int count = 0;
while (true) {
int b = is.read();
if (b == -1) break;
if (++count >= MAX) {
// maximum limit reached
} else {
// store the byte somewhere, do something with it...
}
}
I have a file of around 4-5 GB (nearly a billion lines). From every line of the file, I have to parse an array of integers and some additional integer info and update my custom data structure. My class to hold such information looks like
class Holder {
private int[][] arr = new int[1000000000][5]; // assuming that max array size is 5
private int[] meta = new int[1000000000];
}
A sample line from the file looks like
(1_23_4_55) 99
Every index in the arr & meta corresponds to the line number in the file. From the above line, I extract the array of integers first and then the meta information. In that case,
--pseudo_code--
arr[line_num] = new int[]{1, 23, 4, 55}
meta[line_num]=99
Right now, I am using a BufferedReader and its readLine method to read each line, with character-level operations to parse the integer array and meta information from each line and populate the Holder instance. But it takes almost half an hour to complete this entire operation.
I tried both Java Serialization and Externalizable (writing the meta and arr) to serialize and deserialize this HUGE Holder instance. With both of them, the time to serialize is almost half an hour, and to deserialize is also almost half an hour.
I would appreciate your suggestions on dealing with this kind of problem and would definitely love to hear your side of the story, if any.
P.S. Main memory is not a problem; I have almost 50 GB of RAM in my machine. I have also increased the BufferedReader size to 40 MB (of course, I could increase this up to 100 MB considering that disk access runs at approx. 100 MB/sec). Even cores and CPU are not a problem.
EDIT I
The code that I am using to do this task is provided below (after anonymizing a few details):
public class BigFileParser {
// Accumulates the negative of the value and applies the sign at the end,
// which avoids overflow for Integer.MIN_VALUE.
private int parsePositiveInt(final String s) {
int num = 0;
int sign = -1;
final int len = s.length();
final char ch = s.charAt(0);
if (ch == '-')
sign = 1;
else
num = '0' - ch;
int i = 1;
while (i < len)
num = num * 10 + '0' - s.charAt(i++);
return sign * num;
}
private void loadBigFile() {
long startTime = System.nanoTime();
Holder holder = new Holder();
String line;
try {
Reader fReader = new FileReader("/path/to/BIG/file");
// 40 MB buffer size
BufferedReader bufferedReader = new BufferedReader(fReader, 40 * 1024 * 1024);
String tempTerm;
int i, meta, ascii, len;
boolean consumeNextInteger;
// GNU Trove primitive int array list
TIntArrayList arr;
char c;
while ((line = bufferedReader.readLine()) != null) {
consumeNextInteger = true;
tempTerm = "";
arr = new TIntArrayList(5);
for (i = 0, len = line.length(); i < len; i++) {
c = line.charAt(i);
ascii = c; // the char's numeric code point
// 95 is the ascii value of _ char
if (consumeNextInteger && ascii == 95) {
arr.add(parsePositiveInt(tempTerm));
tempTerm = "";
} else if (ascii >= 48 && ascii <= 57) { // '0' - '9'
tempTerm += c;
} else if (ascii == 9) { // '\t'
arr.add(parsePositiveInt(tempTerm));
consumeNextInteger = false;
tempTerm = "";
}
}
meta = parsePositiveInt(tempTerm);
holder.update(arr, meta);
}
bufferedReader.close();
long endTime = System.nanoTime();
System.out.println("#time -> " + (endTime - startTime) * 1.0
/ 1000000000 + " seconds");
} catch (IOException exp) {
exp.printStackTrace();
}
}
}
public class Holder {
private static final int SIZE = 500000000;
private TIntArrayList[] arrs;
private TIntArrayList metas;
private int idx;
public Holder() {
arrs = new TIntArrayList[SIZE];
metas = new TIntArrayList(SIZE);
idx = 0;
}
public void update(TIntArrayList arr, int meta) {
arrs[idx] = arr;
metas.add(meta);
idx++;
}
}
It sounds like the time taken for file I/O is the main limiting factor, given that serialization (binary format) and your own custom format take about the same time.
Therefore, the best thing you can do is to reduce the size of the file. If your numbers are generally small, then you could get a huge boost from using Google protocol buffers, which will encode small integers generally in one or two bytes.
Or, if you know that all your numbers are in the 0-255 range, you could use a byte[] rather than an int[] and cut the size (and hence load time) to a quarter of what it is now (assuming you go back to serialization, or just write to a ByteChannel).
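For instance, a rough sketch of the byte[] idea (assuming every value fits in 0..255; the class and method names are illustrative):

import java.io.*;

class ByteFormat {
    // Writes one row (5 values) plus its meta value as 6 bytes instead of 24.
    static void writeRow(DataOutputStream out, byte[] row, int meta) throws IOException {
        out.write(row);      // 5 bytes
        out.writeByte(meta); // 1 byte
    }
}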
It simply can't take that long. You're working with some 6e9 ints, which means 24 GB. Writing 24 GB to disk takes some time, but nothing like half an hour.
I'd put all the data in a single one-dimensional array and access it via methods like int getArr(int row, int col) which transform row and col into a single index. Depending on how the array gets accessed (usually row-wise or usually column-wise), this index would be computed as N * row + col or N * col + row to maximize locality. I'd also store meta in the same array.
Writing a single huge int[] should be pretty fast, surely nothing like half an hour.
Because of the data amount, the above doesn't work directly, as you can't have an array with 6e9 entries. But you can use a couple of big arrays instead and all of the above applies (compute a long index from row and col and split it into two ints for accessing the 2D array).
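A minimal sketch of that chunked flat array (the names and the chunk size are illustrative; row-major layout for row-wise access):

// One logical flat array of rows * N ints, backed by 2^30-int chunks.
class FlatHolder {
    private static final int N = 5;           // row width
    private static final int CHUNK = 1 << 30; // ints per backing array
    private final int[][] chunks;

    FlatHolder(long rows) {
        long cells = rows * N;
        int nChunks = (int) ((cells + CHUNK - 1) / CHUNK);
        chunks = new int[nChunks][];
        for (int i = 0; i < nChunks; i++) {
            long remaining = cells - (long) i * CHUNK;
            chunks[i] = new int[(int) Math.min(CHUNK, remaining)];
        }
    }

    int getArr(long row, int col) {
        long idx = row * N + col; // N * row + col for locality
        return chunks[(int) (idx >>> 30)][(int) (idx & (CHUNK - 1))];
    }

    void setArr(long row, int col, int value) {
        long idx = row * N + col;
        chunks[(int) (idx >>> 30)][(int) (idx & (CHUNK - 1))] = value;
    }
}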
Make sure you aren't swapping. Swapping is the most probable reason for the slow speed I can think of.
There are several alternative Java file I/O libraries. This article is a little old, but it gives an overview that's still generally valid. He's reading about 300 MB per second with a six-year-old Mac. So for 4 GB you have under 15 seconds of read time. Of course my experience is that Mac I/O channels are very good. YMMV if you have a cheap PC.
Note there is no advantage above a buffer size of 4K or so. In fact you're more likely to cause thrashing with a big buffer, so don't do that.
The implication is that parsing characters into the data you need is the bottleneck.
I have found in other apps that reading into a block of bytes and writing C-like code to extract what I need goes faster than the built-in Java mechanisms like split and regular expressions.
If that still isn't fast enough, you'd have to fall back to a native C extension.
If you randomly pause it you will probably see that the bulk of the time goes into parsing the integers, and/or all the new-ing, as in new int[]{1, 23, 4, 55}. You should be able to just allocate the memory once and stick numbers into it at better than I/O speed if you code it carefully.
But there's another way - why is the file in ASCII?
If it were in binary, you could just slurp it up.
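For instance, a sketch of slurping a fixed-width binary version of the file (the 5-int-plus-meta record layout is an assumption for illustration):

import java.io.*;

class BinarySlurp {
    // Reads fixed-width binary records: 5 ints plus 1 meta int = 24 bytes per row.
    static void load(String path, int[][] arr, int[] meta) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path), 1 << 16))) {
            for (int row = 0; row < arr.length; row++) {
                for (int col = 0; col < 5; col++)
                    arr[row][col] = in.readInt();
                meta[row] = in.readInt();
            }
        }
    }
}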
I'm making a rhythm game and I need a quick way to get the length of an Ogg file. The only way I could think of would be to stream the file really fast without playing it, but if I have hundreds of songs this would obviously not be practical. Another way would be to store the length of each file in some sort of properties file, but I would like to avoid that. I know there must be some way to do this, as most music players can tell you the length of a song.
The quickest way to do it is to seek to the end of the file, then back up to the last Ogg page header you find and read its granulePosition (which is the total number of samples per channel in the file). That's not foolproof (you might be looking at a chained file, in which case you're only getting the last stream's length), but should work for the vast majority of Ogg files out there.
If you need help with reading the Ogg page header, you can read the Jorbis source code... The short version is to look for "OggS", read a byte (should be 0), read a byte (only bit 3 should be set), then read a 64-bit little endian value.
I implemented the solution described by ioctlLR and it seems to work:
double calculateDuration(final File oggFile) throws IOException {
    int rate = -1;
    long length = -1; // the granule position is a 64-bit value, so use a long
    int size = (int) oggFile.length();
    byte[] t = new byte[size];
    DataInputStream stream = new DataInputStream(new FileInputStream(oggFile));
    stream.readFully(t); // a bare read() may return fewer bytes than requested
    for (int i = size-1-8-2-4; i>=0 && length<0; i--) { //4 bytes for "OggS", 2 version/type bytes, 8 bytes for length
        // Looking for length (value after last "OggS")
        if (
            t[i]==(byte)'O'
            && t[i+1]==(byte)'g'
            && t[i+2]==(byte)'g'
            && t[i+3]==(byte)'S'
        ) {
            byte[] byteArray = new byte[]{t[i+6],t[i+7],t[i+8],t[i+9],t[i+10],t[i+11],t[i+12],t[i+13]};
            ByteBuffer bb = ByteBuffer.wrap(byteArray);
            bb.order(ByteOrder.LITTLE_ENDIAN);
            length = bb.getLong(0); // 64-bit, so getLong rather than getInt
        }
    }
    for (int i = 0; i<size-8-2-4 && rate<0; i++) {
        // Looking for rate (first value after "vorbis")
        if (
            t[i]==(byte)'v'
            && t[i+1]==(byte)'o'
            && t[i+2]==(byte)'r'
            && t[i+3]==(byte)'b'
            && t[i+4]==(byte)'i'
            && t[i+5]==(byte)'s'
        ) {
            byte[] byteArray = new byte[]{t[i+11],t[i+12],t[i+13],t[i+14]};
            ByteBuffer bb = ByteBuffer.wrap(byteArray);
            bb.order(ByteOrder.LITTLE_ENDIAN);
            rate = bb.getInt(0);
        }
    }
    stream.close();
    double duration = (double) (length*1000) / (double) rate; // duration in milliseconds
    return duration;
}
Beware: finding the rate this way will only work for Vorbis data in the Ogg container!
Feel free to edit my answer, it may not be perfect.
For example,
Channels.newChannel(url.openStream());
This one line opens the stream and gets data from it, but there is no way I can find the progress for it.
Is there a way to do so?
This one line opens the stream and gets data from it
No, it won't until you .read() from it. Therefore...
Is there a way to do so?
Yes there is. You .read(buf) where buf is a ByteBuffer.
Just grab that buffer's .position().
(and note that Channels.newChannel(InputStream) will not return a FileChannel but a ReadableByteChannel)
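For instance, a rough sketch of tracking progress while copying from a URL (assuming the server sends a Content-Length header, otherwise total is -1; the names and URL are placeholders):

import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

class ProgressCopy {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://example.com/file.bin");
        long total = url.openConnection().getContentLengthLong();
        try (ReadableByteChannel in = Channels.newChannel(url.openStream());
             FileChannel out = new FileOutputStream("file.bin").getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(8192);
            long done = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                done += n; // same value as buf.position() right after the read
                buf.flip();
                while (buf.hasRemaining())
                    out.write(buf);
                buf.clear();
                if (total > 0)
                    System.out.printf("%.1f%%%n", 100.0 * done / total);
            }
        }
    }
}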
When using the transferFrom() or transferTo() method, you have to pass the starting point within the file, as well as the number of bytes to transfer.
What you could do is break the file up into, say, 100 "parts". This is done very simply:
File f = new File(url);
long size = f.length(); // length() returns a long
Say our file has a size of 100 KB, so size would now be 102400.
Just divide this by the number of "parts" (100 in this example) and we have 1024 bytes per "part". Now all we have to do is show the percentage of "parts" we have already transferred.
As an example:
for(int x = 0 ; x < 100 /*amount of parts*/ ; x++) {
    source.transferTo(1024 * x /*start of this part*/, 1024 /*part size*/, destination);
    progress.setValue(x + 1); // with 100 parts, the part count is already a percentage
}
My free webhost appends analytics javascript to all PHP and HTML files. Which is fine, except that I want to send XML to my Android app, and it's invalidating my files.
Since XML is parsed in its entirety (and blows up) before passed along to my SAX ContentHandler, I can't just catch the exception and continue merrily along with a fleshed out object. (Which I tried, and then felt sheepish about.)
Any suggestions on a reasonably efficient strategy?
I'm about to create a class that will take my InputStream, read through it until I find the junk, break, then take what I just wrote to, convert it back into an InputStream and pass it along like nothing happened. But I'm worried that it'll be grossly inefficient, have bugs I shouldn't have to deal with (e.g. breaking on binary values such as embedded images) and hopefully unnecessary.
FWIW, this is part of an Android project, so I'm using the android.util.Xml class (see source code). When I traced the exception, it took me to a native appendChars function that is itself being called from a network of private methods anyway, so subclassing anything seems to be unreasonably useless.
Here's the salient bit from my stacktrace:
E/AndroidRuntime( 678): Caused by: org.apache.harmony.xml.ExpatParser$ParseException: At line 3, column 0: junk after document element
E/AndroidRuntime( 678): at org.apache.harmony.xml.ExpatParser.parseFragment(ExpatParser.java:523)
E/AndroidRuntime( 678): at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:482)
E/AndroidRuntime( 678): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:320)
E/AndroidRuntime( 678): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:277)
I guess in the end I'm asking for opinions on whether the InputStream -> manually parse to OutputStream -> recreate InputStream -> pass along solution is as horrible as I think it is.
I'm about to create a class that will take my InputStream, read
through it until I find the junk, break, then take what I just wrote
to, convert it back into an InputStream and pass it along like nothing
happened. But I'm worried that it'll be grossly inefficient, have bugs
I shouldn't have to deal with (e.g. breaking on binary values such as
embedded images) and hopefully unnecessary.
You could use a FilterInputStream for that; no need for a buffer.
The best thing to do is add a delimiter to the end of the XML, like --theXML ends HERE--, or a char not found in XML, like a group of 16 \u0004 chars (you then only need to check every 16th byte), and read until you find it.
Implementation assuming a \u0004 delimiter:
class WebStream extends FilterInputStream {
    byte[] buff = new byte[1024];
    int offset = 0, length = 0;

    public WebStream(InputStream i) {
        super(i);
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public int read() throws IOException {
        if (offset == length)
            readNextChunk();
        if (length == -1)
            return -1;// eof
        return buff[offset++] & 0xFF; // mask so bytes >= 0x80 aren't returned negative
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (offset == length)
            readNextChunk();
        if (length == -1)
            return -1;// eof
        int cop = length - offset;
        if (len < cop)
            cop = len;
        System.arraycopy(buff, offset, b, off, cop);
        offset += cop;
        return cop;
    }

    private void readNextChunk() throws IOException {
        // compact: move any unread bytes to the front of the buffer
        if (offset <= length) {
            System.arraycopy(buff, offset, buff, 0, length - offset);
            length -= offset;
            offset = 0;
        }
        int read = in.read(buff, length, buff.length - length);
        if (read < 0) {
            if (length <= 0) {
                length = -1;// eof
                offset = 0;
            }
            return; // serve any remaining buffered bytes first
        }
        int end = length + read;
        // note that this is assuming an ascii-compatible encoding;
        // anything like utf16 or utf32 will break here
        for (int i = length; i < end; i += 16) {
            if (buff[i] == 0x04) {
                while (i > 0 && buff[i - 1] == 0x04)
                    i--;// find beginning of delim block
                end = i; // cut the stream off at the delimiter
                break;
            }
        }
        length = end; // make the newly read bytes (up to the delimiter) available
    }
}
Note: this is still missing some error handling and needs proper debugging.
"I'm about to create a class that will take my InputStream, read through it until I find the junk, break, then take what I just wrote to, convert it back into an InputStream and pass it along like nothing happened. But I'm worried that it'll be grossly inefficient, have bugs I shouldn't have to deal with (e.g. breaking on binary values such as embedded images) and hopefully unnecessary."
That'll work. You can read into a StringBuffer and then use a ByteArrayInputStream or something similar (like StringReader, if that's applicable).
http://developer.android.com/reference/java/io/ByteArrayInputStream.html
The downside is that you're reading the entire XML file into memory; for large files, that can be inefficient memory-wise.
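As a rough sketch of that buffer-then-reparse idea (the "</root>" tag name and the class and method names are placeholders, not from the original post):

import java.io.*;
import java.nio.charset.StandardCharsets;

class TrimJunk {
    // Reads the whole stream, cuts everything after the document element,
    // and hands back a clean stream for the SAX parser.
    static InputStream trimmed(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        for (int n; (n = in.read(chunk)) != -1; )
            bos.write(chunk, 0, n);
        String text = new String(bos.toByteArray(), StandardCharsets.UTF_8);
        int cut = text.lastIndexOf("</root>");
        String clean = cut >= 0 ? text.substring(0, cut + "</root>".length()) : text;
        return new ByteArrayInputStream(clean.getBytes(StandardCharsets.UTF_8));
    }
}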
Alternatively, you can subclass InputStream and do the filtering in the stream itself. You'd probably just need to override the three read() methods, calling super.read(), flagging when you've reached the garbage at the end, and returning EOF as needed.
Free webhosts have this issue. I have yet to find a free alternative that doesn't.