My code below only parses the data file once, but I'm trying to get it to parse the whole file: every time it finds a marker, parse the data and append it to the output file. Currently it successfully parses one record and then stops, and I can't figure out how to keep it looping until EOF. The data is 4-byte aligned and comes from an input binary file.
private static void startParse(File inFile) throws IOException {
    boolean markerFound = false;
    for (int offset = 0; !markerFound && offset < 4; offset++) {
        DataInputStream dis = new DataInputStream(new FileInputStream(inFile));
        for (int i = 0; i < offset; i++) {
            dis.read();
        }
        try {
            int integer;
            long l;
            while ((l = (integer = dis.readInt())) != MARKER) {
                // Don't do anything
            }
            markerFound = true;
            for (int i = 0; i < 11; i++) {
                dis.read();
            }
            // ********************** data **********************
            byte[] data = new byte[1016];
            for (int i = 0; i < 1016; i++) {
                data[i] = (byte) dis.read();
            }
            for (int i = 0; i < 4; i++) {
                dis.read();
            }
            // ***************** output data ********************
            if (checksumCheck(checksum) && fecfCheck(fecf)) {
                FileOutputStream output = new FileOutputStream("ParsedData", true);
                try {
                    output.write(data);
                } finally {
                    output.close();
                }
            }
        } catch (EOFException eof) {
        }
        dis.close();
    }
}
markerFound = true;
This line is not inside a conditional, so it will be executed on every iteration of the loop. That of course shuts down your loop, because of its condition:
for (int offset = 0; !markerFound && offset < 4; offset++)
First thing
You are opening the file inside your for loop, so reading always starts at the beginning of the file. Open it before the first for.
Second
Because of the test !markerFound && offset < 4, your loop will run at most 4 times.
Third
This code does not make sense to me:
for (int i = 0; i < offset; i++){
dis.read();
}
Because offset is 0 in the first iteration, 1 in the next, and so on. That loop is also unnecessary: you already use another loop to read bytes until you reach the MARKER.
Fourth
If your file has records of fixed length and the markers occur at predictable positions, use the DataInputStream skipBytes method to jump forward to the next marker.
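For instance (the marker value and the 16-byte record layout below are invented for illustration), skipBytes lets you jump over each fixed-size body instead of reading it byte by byte:

```java
import java.io.*;

public class SkipBytesDemo {
    // Hypothetical fixed layout: each record is a 4-byte marker + 12-byte body.
    static final int MARKER = 0xDEADBEEF;

    static int countMarkers(byte[] raw) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(raw));
        int markers = 0;
        while (dis.available() >= 16) {       // one whole record left?
            if (dis.readInt() == MARKER) {    // read the marker...
                markers++;
            }
            dis.skipBytes(12);                // ...then jump over the body
        }
        return markers;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (int r = 0; r < 3; r++) {
            out.writeInt(MARKER);             // marker
            out.write(new byte[12]);          // body
        }
        System.out.println(countMarkers(bos.toByteArray())); // prints 3
    }
}
```

Note that skipBytes may skip fewer bytes than requested, so on a real file stream you should check its return value.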
As I'd posted in an earlier answer to your question "Java, need a while loop to reach eof, i.e. while !eof, keep parsing", I'd like to state again that DataInputStream.read() (unlike the other readXxX() methods) does not throw EOFException.
From the JavaDocs: (DataInputStream inherits read() from FilterInputStream)
If no byte is available because the end of the stream has been reached, the value -1 is returned.
So, to correctly check for EOF, usually read(byte[]) is used in a while loop as follows:
int read = 0;
byte[] b = new byte[1024];
while ((read = dis.read(b)) != -1) { // returns numOfBytesRead or -1 at EOF
// fos = FileOutputStream
fos.write(b, 0, read); // (byte[], offset, numOfBytesToWrite)
}
Answer
Now, getting back to your current question: since you haven't shared your binary file format, it's difficult to suggest a better way to parse it. From my limited understanding of how your nested loops currently parse the file, you need another while loop (as reasoned above) to read, parse, and copy your "data" until you reach EOF once you've found the marker.
markerFound = true;
for (int i = 0; i < 11; i++){ // move this loop inside while IF
dis.read(); // these 11 bytes need to be skipped every time
}
// Open the file just ONCE (outside the loop)
FileOutputStream output = new FileOutputStream("ParsedData", true);
// ********************** data **********************
int read = 0;
byte[] data = new byte[1016]; // set byte buffer size
while ((read = dis.read(data)) != -1) { // read and check for EOF
// ***************** output data ********************
if (checksumCheck(checksum) && fecfCheck(fecf)) { // if checksum is valid
output.write(data, 0, read); // write the number of bytes read before
}
// SKIP four bytes
for (int i = 0; i < 4; i++) { // or, dis.skipBytes(4); instead of the loop
dis.read();
}
}
// Close the file AFTER input stream reaches EOF
output.close(); // i.e. all the data has been written
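Putting it together, here is a self-contained sketch of that overall structure. The MARKER value and the small record sizes are placeholders (your layout uses an 11-byte header, a 1016-byte payload, and a 4-byte trailer), and the checksum/FECF validation is omitted since that code wasn't shown:

```java
import java.io.*;

public class MarkerParser {
    // Placeholder marker; substitute your real constant.
    static final int MARKER = 0xCAFEBABE;

    static void parse(InputStream in, OutputStream out) throws IOException {
        DataInputStream dis = new DataInputStream(new BufferedInputStream(in));
        byte[] data = new byte[8];          // payload size (1016 in the question)
        try {
            while (true) {
                // readInt() throws EOFException at end of stream,
                // which terminates the loop via the catch below.
                while (dis.readInt() != MARKER) {
                    // keep scanning for the next marker
                }
                dis.skipBytes(3);           // header bytes (11 in the question)
                dis.readFully(data);        // fixed-size payload
                dis.skipBytes(2);           // trailer bytes (4 in the question)
                out.write(data);            // checksum/FECF checks would go here
            }
        } catch (EOFException eof) {
            // normal termination: reached the end of the file
        }
    }
}
```

The key difference from your version is that the marker search sits inside an endless loop, and EOF (signaled by readInt throwing EOFException) is what ends it, so every record in the file gets parsed, not just the first.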
Related
The Story
I've been having a problem lately...
I have to read a file in reverse character by character without running out of memory.
I can't read it line by line and reverse it with a StringBuilder, because it's a one-line file that can take up to a gigabyte (GB) of space.
Hence it would take up too much of the JVM's (and the system's) memory.
I've decided to just read it character by character from end to start (back to front), so that I can process as much as I can without consuming too much memory.
What I've Tried
I know how to read a file in one go:
(MappedByteBuffer+FileChannel+Charset which gave me OutOfMemoryExceptions)
and read a file character-by-character with UTF-8 character support
(FileInputStream+InputStreamReader).
The problem is that FileInputStream's #read() only calls #read0() which is a native method!
Because of that I have no idea about the underlying code...
Which is why I'm here today (or at least until this is done)!
This will do it (but as written it is not very efficient): skip to one before the last location read, then read and print the character. Then reset the location to the mark, adjust the size, and continue.
File f = new File("Some File name");
int size = (int) f.length();
int bsize = 1;
byte[] buf = new byte[bsize];
try (BufferedInputStream b =
new BufferedInputStream(new FileInputStream(f))) {
while (size > 0) {
b.mark(size);
b.skip(size - bsize);
int k = b.read(buf);
System.out.print((char) buf[0]);
size -= k;
b.reset();
}
} catch (IOException ioe) {
ioe.printStackTrace();
}
This could be improved by increasing the buffer size and making equivalent adjustments in the mark and skip arguments.
Updated Version
I wasn't fully satisfied with my answer, so I made it more general. Some variables could have served double duty, but using meaningful names helps clarify how they are used.
mark must be set so that reset can be used, and it only needs to be set once, before the main loop. Note that mark's argument is a read limit (how many bytes may be read before the mark becomes invalid), not a position, so it must be at least the file size for reset to keep working.
skipCnt - initially set to fileLength, this is the number of bytes to skip before reading. If the number of bytes remaining is greater than the buffer size, the skip count becomes skipCnt - bsize; otherwise it becomes 0.
remainingBytes - a running total of how many bytes are still to be read. It is updated by subtracting the current readCnt.
readCnt - how many bytes to read. If remainingBytes is greater than bsize, it is set to bsize; otherwise it is set to remainingBytes.
The while loop repeatedly reads the file, starting near the end, and prints the just-read information in reverse order. All variables are updated and the process repeats until remainingBytes reaches 0.
File f = new File("some file");
int bsize = 16;
int fileSize = (int)f.length();
int remainingBytes = fileSize;
int skipCnt = fileSize;
byte[] buf = new byte[bsize];
try (BufferedInputStream b =
new BufferedInputStream(new FileInputStream(f))) {
b.mark(fileSize); // the argument is a read limit, not a position
while(remainingBytes > 0) {
skipCnt = skipCnt > bsize ? skipCnt - bsize : 0;
b.skip(skipCnt);
int readCnt = remainingBytes > bsize ? bsize : remainingBytes;
b.read(buf,0,readCnt);
for (int i = readCnt-1; i >= 0; i--) {
System.out.print((char) buf[i]);
}
remainingBytes -= readCnt;
b.reset();
}
} catch (IOException ioe) {
ioe.printStackTrace();
}
This doesn't support multi-byte UTF-8 characters, though.
Using a RandomAccessFile you can easily read a file in chunks from the end to the beginning, and reverse each of the chunks.
Here's a simple example:
import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.stream.IntStream;
class Test {
private static final int BUF_SIZE = 10;
private static final int FILE_LINE_COUNT = 105;
public static void main(String[] args) throws Exception {
// create a large file
try (FileWriter fw = new FileWriter("largeFile.txt")) {
IntStream.range(1, FILE_LINE_COUNT).mapToObj(Integer::toString).forEach(s -> {
try {
fw.write(s + "\n");
} catch (IOException e) {
throw new RuntimeException(e);
}
});
}
// reverse the file
try (RandomAccessFile raf = new RandomAccessFile("largeFile.txt", "r")) {
long size = raf.length();
byte[] buf = new byte[BUF_SIZE];
for (long i = size - BUF_SIZE; i > -BUF_SIZE; i -= BUF_SIZE) {
long offset = Math.max(0, i);
long readSize = Math.min(i + BUF_SIZE, BUF_SIZE);
raf.seek(offset);
raf.read(buf, 0, (int) readSize);
for (int j = (int) readSize - 1; j >= 0; j--) {
System.out.print((char) buf[j]);
}
}
}
}
}
This uses a very small file and very small chunks so that you can test it easily. Increase those constants to see it work on a larger scale.
The input file contains newlines to make it easy to read the output, but the reversal doesn't depend on the file "having lines".
I am trying to build a manual HTTP client (using sockets) along with a cache, and I can't seem to figure out why the files are not saving to disk properly. It works pretty well for HTML files, but doesn't seem to work for file types that are not text-based, like .gif. Could anyone tell me why? I am quite new to the HTTP protocol and socket programming in general.
The loop to grab the response.
InputStream inputStream = socket.getInputStream();
PrintWriter outputStream = new PrintWriter(socket.getOutputStream());
ArrayList<Byte> dataIn = new ArrayList<Byte>();
ArrayList<String> stringData = new ArrayList<String>();
//Indices to show the location of certain lines in arrayList
int blankIndex = 8;
int lastModIndex = 0;
int byteBlankIndex = 0;
try
{
//Get last modified date
long lastMod = getLastModified(url);
Date d = new Date(lastMod);
//Construct the get request
outputStream.print("GET "+ "/" + pathName + " HTTP/1.1\r\n");
outputStream.print("If-Modified-Since: " + ft.format(d)+ "\r\n");
outputStream.print("Host: " + hostString+"\r\n");
outputStream.print("\r\n");
outputStream.flush();
//Booleans to prevent duplicates, only need first occurrences of key strings
boolean blankDetected = false;
boolean lastModDetected = false;
//Keep track of current index
int count = 0;
int byteCount = 0;
//While loop to read response
String buff = "";
byte t;
while ( (t = (byte) inputStream.read()) != -1)
{
dataIn.add(t);
//Check for key lines
char x = (char) t;
buff = buff + x;
//For the first blank line (signaling the end of the header)
if(x == '\n')
{
stringData.add(buff);
if(buff.equals("\r\n") && !blankDetected)
{
blankDetected = true;
blankIndex = count;
byteBlankIndex = byteCount + 2;
}
//For the last modified line
if(buff.contains("Last-Modified:") && !lastModDetected)
{
lastModDetected = true;
lastModIndex = count;
}
buff = "";
count++;
}
//Increment count
byteCount++;
}
}
The code to parse through the response and write the file to disk.
String catalogKey = hostString+ "/" + pathName;
//Get the directory sequence to make
String directoryPath = catalogKey.substring(0, catalogKey.lastIndexOf("/") + 1);
//Make the directory sequence if possible, ignore the boolean value that results
boolean ignoreThisBooleanVal = new File(directoryPath).mkdirs();
//Setup output file, and then write the contents of dataIn (excluding header) to the file
PrintWriter output = new PrintWriter(new FileWriter(new File(catalogKey)),true);
for(int i = byteBlankIndex + 1 ; i < dataIn.size(); i++)
{
output.print(new String(new byte[]{ (byte)dataIn.get(i)}, StandardCharsets.UTF_8));
}
output.close();
byte t;
while ( (t = (byte) inputStream.read()) != -1)
The problem is here. It should read:
int t;
while ( (t = inputStream.read()) != -1)
{
byte b = (byte)t;
// use b from now on in the loop.
The issue is that a byte of 0xFF in the input will be returned as the int 0xFF (255), but as a byte it is -1, so you are unable to distinguish it from end of stream.
And you should use a FileOutputStream, not a FileWriter, and you should not accumulate potentially binary data into a String or StringBuffer or anything to do with char. As soon as you've got to the end of the header you should open a FileOutputStream and just start copying bytes. Use buffered streams to make all this more efficient.
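A minimal sketch of that copy loop (the class and method names are mine), assuming the header bytes have already been consumed from the socket's input stream:

```java
import java.io.*;

public class BodyCopy {
    // Copy the remaining (possibly binary) body bytes straight to disk.
    static void copyBody(InputStream in, File dest) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(dest))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {  // -1 here really is end of stream
                out.write(buf, 0, n);           // write exactly what was read
            }
        }
    }
}
```

Because the bytes never pass through a String or char conversion, a 0xFF byte in a .gif survives intact.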
Not much point in any of these given that HttpURLConnection already exists.
Is it possible to find instances of // in a line read from a file into a byte array and then "snip" from // to the end of the line out? I'm trying
FileInputStream fis = new FileInputStream(file);
byte[] buffer = new byte[8 * 1024];
int read;
while ((read = fis.read(buffer)) != -1)
{
for (int i = 0; i < read; i++)
{
if (buffer[i] == '//')
{
buffer = buffer[0:i];
}
}
}
but I'm getting Invalid character constant at if (buffer[i] == '//') on the '//' part. Am I doing something wrong, or is this just not possible?
Old-school solution
for (int i = 0; i < read - 1; i++)
{
    if (buffer[i] == '/' && buffer[i + 1] == '/')
    {
        buffer = Arrays.copyOfRange(buffer, 0, i); // needs java.util.Arrays
        break;
    }
}
' and ' denote a single character. Since // is two characters, this does not work; one has to differentiate between a character and a string. Thus you have to individually check both positions in the byte array to confirm there are two successive /s.
Scenario:
1. Create fromX.txt and toY.txt files (content has to be appended and will come from other logic).
2. Check fromX.txt every second for new additions; if there are any, write them to toY.txt.
How do I get just the new content from the fromX.txt file?
I have tried implementing it by counting the number of lines and looking for any change in that count.
public static int countLines(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean empty = true;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
}
return (count == 0 && !empty) ? 1 : count;
} finally {
is.close();
}
}
You implement it like this:
Open the file using RandomAccessFile
Seek to where the end-of-file was last time. (If this is the first time, seek to the start of the file.)
Read until you reach the new end-of-file.
Record where the end-of-file is.
Close the RandomAccessFile
Record the position as a byte offset from the start of the file, and use the same value for seeking.
You can modify the above to reuse the RandomAccessFile object rather than opening / closing it each time.
UPDATE - The javadocs for RandomAccessFile are here. Look for the seek and getFilePointer methods.
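The steps above can be sketched like this (the class name and wiring are mine; schedule poll() from a once-a-second timer however you like):

```java
import java.io.*;

// Tail a growing file: each call to poll() returns whatever was appended
// since the previous call.
public class Tail {
    private long lastPos = 0;   // byte offset of the last known end-of-file

    public byte[] poll(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            long len = raf.length();
            if (len <= lastPos) {
                return new byte[0];          // nothing new (or file truncated)
            }
            raf.seek(lastPos);               // jump to where we stopped last time
            byte[] chunk = new byte[(int) (len - lastPos)];
            raf.readFully(chunk);
            lastPos = raf.getFilePointer();  // record the new end-of-file
            return chunk;
        }
    }
}
```

Whatever poll() returns can then be appended to toY.txt.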
Are there any ways to store a large binary file, say 50 MB, as ten files of 5 MB each?
Thanks.
Are there any special classes for doing this?
Use a FileInputStream to read the file and a FileOutputStream to write it.
Here's a simple (incomplete) example (missing error handling; reads in 1 KB chunks):
public static int split(File file, String name, int size) throws IOException {
FileInputStream input = new FileInputStream(file);
FileOutputStream output = null;
byte[] buffer = new byte[1024];
int count = 0;
boolean done = false;
while (!done) {
output = new FileOutputStream(String.format(name, count));
count += 1;
for (int written = 0; written < size; ) {
int len = input.read(buffer);
if (len == -1) {
done = true;
break;
}
output.write(buffer, 0, len);
written += len;
}
output.close();
}
input.close();
return count;
}
and it is called like this:
File input = new File("C:/data/in.gz");
String name = "C:/data/in.gz.part%02d"; // %02d will be replaced by segment number
split(input, name, 5000 * 1024);
Yes, there are. Basically, just count the bytes you write to a file; when the count hits a certain limit, stop writing, reset the counter, and continue writing to another file, using a filename pattern so that you can correlate the files with each other. You can do that in a loop. You can learn here how to write to files in Java, and for the rest just apply primary-school maths.
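A sketch of that counting approach (the class name and part-file pattern are mine); it rolls over to a new part file whenever the byte count reaches the limit:

```java
import java.io.*;

public class Splitter {
    // Split the stream into files of at most `limit` bytes, named by `pattern`
    // (e.g. "part%02d"); returns the number of part files created.
    static int split(InputStream in, String pattern, long limit) throws IOException {
        int part = 0;
        long written = limit;                 // forces opening the first file
        OutputStream out = null;
        byte[] buf = new byte[1024];
        int n;
        try {
            while ((n = in.read(buf)) != -1) {
                for (int off = 0; off < n; ) {
                    if (written >= limit) {   // hit the limit: start a new part
                        if (out != null) out.close();
                        out = new FileOutputStream(String.format(pattern, part++));
                        written = 0;
                    }
                    int chunk = (int) Math.min(n - off, limit - written);
                    out.write(buf, off, chunk);
                    off += chunk;
                    written += chunk;
                }
            }
        } finally {
            if (out != null) out.close();
        }
        return part;
    }
}
```

For a 50 MB input, split(in, "part%02d", 5 * 1024 * 1024) would produce ten 5 MB parts.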