Java FileInputStream reading without .read()

I ran into a weird problem, and I was wondering if anyone has an idea what could be the cause. I'm reading in a file (a small exe of 472 KB) with FileInputStream. I plan to send the file through an RMI connection, and I had the idea of showing the upload percentage based on how much I have already sent compared to the overall length of the file.
First I tried it out locally, and I couldn't get it to work. Here is an example of what I was doing:
FileInputStream fileData = new FileInputStream(file);
reads = new ArrayList<Integer>();
buffers = new ArrayList<byte[]>();
int i = 0;
while ((read = fileData.read(buffer)) > 0) {
    System.out.println("Run : " + (i + 1));
    outstreamA.write(buffer, 0, read);
    reads.add(read);
    buffers.add(buffer);
    outstreamB.write(this.buffers.get(i), 0, this.reads.get(i));
    i = i + 1;
}
These two FileOutputStreams create two files (the same one, just with different names), and that works fine. However, when I'm not using fileData.read() but any other for/while loop, it just doesn't work. It creates the exact same file (the length is exactly the same), but Windows cannot run the exe; I get an error message:
"The version of this file is not compatible with the version of Windows you're running...".
This is how I tried it:
//for (int i = 0; i < buffers.size(); ++i) {
i = 0;
//while ((read = fileData2.read(buffer)) > 0) {
while (i < size) {
    System.out.println("Run#2 : " + (i + 1));
    outstreamC.write(this.buffers.get(i), 0, this.reads.get(i));
    i = i + 1;
}
fileData2 is the same as fileData. If I work with fileData2.read(buffer), outstreamC creates a working file as well.
It doesn't matter whether I run the for loop up to the list's size or up to "size", which equals the number of times I entered the first while loop. Something is missing, and I cannot figure out what.
The weird thing is that outstreamB creates a working file, yet outstreamC does not, even though they work with the exact same items.
Originally I was planning to pass the "read" and "buffer" through the RMI connection each time I entered the first while loop, and put everything together on the other side after all the parts arrived, but now my plan is kind of dead. Does anyone have an idea how I could solve this, or achieve something similar to be able to send files through RMI?
Best regards,
Mihaly

Your code can never work. You are reading into the same buffer repeatedly and adding that same buffer to the list, so the list ends up holding several references to one array containing the last data you read. You would need to allocate a new buffer every time around the loop, as sketched below.
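A minimal sketch of that fix against the question's first loop (the buffer size here is an arbitrary choice): store the buffer you just filled, then allocate a fresh one before the next read, so every list entry keeps its own data.

byte[] buffer = new byte[4096];
int read;
int i = 0;
while ((read = fileData.read(buffer)) > 0) {
    outstreamA.write(buffer, 0, read);
    reads.add(read);
    buffers.add(buffer); // store the buffer we just filled...
    buffer = new byte[4096]; // ...and allocate a fresh one for the next read
    outstreamB.write(buffers.get(i), 0, reads.get(i));
    i = i + 1;
}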

Related

Java "Quicksave" Execution

The problem I seem to have hit is one relating to loading times; I'm not running on a particularly fast machine by any means, but I still want to dabble in neural networks. In short, I have to load 35,280,000 integers into one large array (I'm using the MNIST database; each image is 28x28, which amounts to 784 pixels per image, times 45,000 images). It works fine, and surprisingly I don't run out of RAM, but... it takes 4 and a half hours just to get the data into an array.
I can supply the rest of the code if you want me to, but here's the function that runs through the file.
public static short[][] readfile(String fileName) throws FileNotFoundException, IOException {
    short[][] array = new short[10000][784];
    BufferedReader br = new BufferedReader(new FileReader(System.getProperty("user.dir") + "/MNIST/" + fileName + ".csv"));
    br.readLine();
    try {
        for (short i = 1; i < 45000; i++) {
            String line = br.readLine();
            for (short j = 0; j < 784; j++) {
                array[i][j] = Short.parseShort(line.split(",")[j]);
            }
        }
        br.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return array;
}
What I want to know is, is there some way to "quicksave" the execution of the program so that I don't have to rebuild the array for every small tweak?
Note: I haven't touched Java in a while, and my code is mostly chunked together from a lot of different sources. I wouldn't be surprised if there were some serious errors (or just Java "no-nos"), it would actually help me a lot if you could fix them if you answer.
Edit: Bad question, I'm just blind... sorry for wasting time
Edit 2: I've decided after a while that instead of loading all of the images, and then training with them one by one, I could simply train one by one and load the next. Thank you all for your ideas!
array[i][j] = Short.parseShort(line.split(",")[j]);
You are calling String#split() for every single integer.
Call it once per line, outside the inner loop, and copy the values into your 2D array, as sketched below.
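Here is a sketch of that change (it also switches the loop indices to int, since a short counter would overflow before reaching 45,000, and sizes the array to match the loop bound):

public static short[][] readfile(String fileName) throws IOException {
    short[][] array = new short[45000][784];
    BufferedReader br = new BufferedReader(new FileReader(
            System.getProperty("user.dir") + "/MNIST/" + fileName + ".csv"));
    br.readLine(); // skip the header line
    for (int i = 0; i < 45000; i++) {
        String[] values = br.readLine().split(","); // split each line once
        for (int j = 0; j < 784; j++) {
            array[i][j] = Short.parseShort(values[j]);
        }
    }
    br.close();
    return array;
}

That turns roughly 35 million split() calls into 45,000, which is likely where most of the 4.5 hours was going.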

Java data structure for providing random <String><Float> pair based on a large data set at run-time

Is there a smart way to create a 'JSON-like' structure of String-Float pairs? A 'key' is not needed, as data will be grabbed randomly, although an incremented key from 0 to n might aid random retrieval of associated data. Due to the size of the data set (10k pairs of values), I need this to be saved out to an external file.
The reason is how my data will be compiled. To save someone entering data into an array manually, the items will be Excel-based, saved out to CSV, and parsed using a temporary Java program into a file format (for example JSON) which can be added to my project's resources folder. I can then retrieve data from this set without my application having to load a huge array into memory on startup. I can quite easily parse the CSV to 'fill up' an array (or similar) at run-time, but I fear that on a mobile device the memory overhead would be significant.
I have reviewed the answers to: Suitable Java data structure for parsing large data file and Data structure options for efficiently storing sets of integer pairs on disk? and have not been able to draw a definitive conclusion.
I have tried saving to a .JSON file; however, I am not sure whether I can request a random entry, plus this seems quite cumbersome for holding such a simple structure. Is a TreeMap or Hashtable where I should be focusing my search?
To provide some context for my query: my application will run on Android and needs to reference a definition (an approx. 500-character String) and a conversion factor (a Float). I need to retrieve a random data entry. The user may only make 2 or 3 requests during a session, so I see no point in loading a 10k-element array into memory. QUERY: potentially, modern-day technology on Android phones will easily munch through this type of query, and it's perhaps only an issue if I am parsing millions of entries at run-time?
I am open to using SQLite to hold my data if this will provide the functionality required. Please note that the data set must be derived from a file format easily exportable from Excel (CSV, TXT, etc.).
Any advice you can give me would be much appreciated.
Here's one possible design that requires a minimal memory footprint while providing fast access:
Start with a data file of comma-separated or tab-separated values so you have line breaks between your data pairs.
Keep an array of long values corresponding to the indexes of the lines in the data file. When you know where the lines are, you can use InputStream.skip() to advance to the desired line. This leverages the fact that skip() is typically quite a bit faster than read for InputStreams.
You would have some setup code that would run at initialization time to index the lines.
An enhancement would be to only index every nth line so that the array is smaller. So if n is 100 and you're accessing line 1003, you take the 10th index to skip to line 1000, then read past two more lines to get to line 1003. This allows you to tune the size of the array to use less memory.
I thought this was an interesting problem, so I put together some code to test my idea. It uses a sample 4MB CSV file that I downloaded from some big data website that has about 36K lines of data. Most of the lines are longer than 100 chars.
Here's code snippet for the setup phase:
long start = SystemClock.elapsedRealtime();
int lineCount = 0;
try (InputStream in = getResources().openRawResource(R.raw.fl_insurance_sample)) {
    int index = 0;
    int charCount = 0;
    int cIn;
    while ((cIn = in.read()) != -1) {
        charCount++;
        char ch = (char) cIn; // this was for debugging
        if (ch == '\n' || ch == '\r') {
            lineCount++;
            if (lineCount % MULTIPLE == 0) {
                index = lineCount / MULTIPLE;
                if (index == mLines.length) {
                    mLines = Arrays.copyOf(mLines, mLines.length + 100);
                }
                mLines[index] = charCount;
            }
        }
    }
    mLines = Arrays.copyOf(mLines, index + 1);
} catch (IOException e) {
    Log.e(TAG, "error reading raw resource", e);
}
long elapsed = SystemClock.elapsedRealtime() - start;
I discovered my data file was actually separated by carriage returns rather than line feeds. It must have been created on an Apple computer. Hence the test for '\r' as well as '\n'.
Here's a snippet from the code to access the line:
long start = SystemClock.elapsedRealtime();
int ch;
int line = Integer.parseInt(editText.getText().toString().trim());
if (line < 1 || line >= mLines.length) {
    mTextView.setText("invalid line: " + line);
    return;
}
line--;
int index = (line / MULTIPLE);
in.skip(mLines[index]);
int rem = line % MULTIPLE;
while (rem > 0) {
    ch = in.read();
    if (ch == -1) {
        return; // readLine would fail
    } else if (ch == '\n' || ch == '\r') {
        rem--;
    }
}
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String text = reader.readLine();
long elapsed = SystemClock.elapsedRealtime() - start;
My test program used an EditText so that I could input the line number.
So to give you some idea of performance, the first phase averaged around 1600ms to read through the entire file. I used a MULTIPLE value of 10. Accessing the last record in the file averaged about 30ms.
To get down to 30ms access with only a 29312-byte memory footprint is pretty good, I think.
You can see the sample project on GitHub.

H5 file reading very slow with Java

I have a Java program using the H5 libraries that tries to read a dataset in an H5 file with the following properties:
The file's size is 769 MB.
The code that reads the dataset is the following (very simple):
// Open file using the default properties.
fileId = H5.H5Fopen(filepath, HDF5Constants.H5F_ACC_RDONLY, HDF5Constants.H5P_DEFAULT);
// Open dataset using the default properties.
if (fileId >= 0) {
    datasetId = H5.H5Dopen(fileId, "/data/0_u0/20050103", HDF5Constants.H5P_DEFAULT);
}
if (datasetId >= 0) {
    dataSpaceId = H5.H5Dget_space(datasetId);
}
// Get the dimensions of the dataset
int ndims = -1;
if (dataSpaceId >= 0)
    ndims = H5.H5Sget_simple_extent_ndims(dataSpaceId);
if (ndims > 0) {
    long[] dims = new long[ndims];
    H5.H5Sget_simple_extent_dims(dataSpaceId, dims, null);
    H5.H5Sclose(dataSpaceId);
    int dimX = (int) dims[0];
    int dimY = (int) dims[1];
    Double[][] dsetData = new Double[dimX][dimY];
    H5.H5Dread(datasetId, HDF5Constants.H5T_NATIVE_DOUBLE,
            HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
            HDF5Constants.H5P_DEFAULT, dsetData);
}
And it takes forever (more than 15 minutes; I stopped it after that).
What I don't understand is that I have roughly the same code in Python, and there it takes a few seconds.
When I debug the Java program and pause it mid-execution, it's in the byteToDouble() function of the H5 library. It's a lot of doubles, but that shouldn't take so much time, right?
Thanks for your help!
I think the issue is that you're reading the data into a 2D Double[][] array. When you do this, the HDF5 implementation is very slow (the bottleneck is probably in HDFArray.arrayify). Try reading the data into a 1D double[] instead, as in the sketch below.
Also, you are using boxed Double; it would be better to use primitive double.
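As a sketch of that suggestion (assuming the wrapper's H5Dread accepts a primitive double[] buffer, as the generic Object overload used in the question does), read into a flat array and index it manually:

// Read the whole dataset into a flat primitive array in one call.
double[] dsetData = new double[dimX * dimY];
H5.H5Dread(datasetId, HDF5Constants.H5T_NATIVE_DOUBLE,
        HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
        HDF5Constants.H5P_DEFAULT, dsetData);
// Element (x, y) of the original 2D dataset is dsetData[x * dimY + y] (row-major order).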

In Java, how do I set a progress bar for FileChannel?

For example,
Channels.newChannel(url.openStream());
This one line opens the stream and gets data from it, but there is no way I can find the progress for it.
Is there a way to do so?
This one line opens the stream and gets data from it
No, it won't until you .read() from it. Therefore...
Is there a way to do so?
Yes there is. You .read(buf) where buf is a ByteBuffer.
Just grab that buffer's .position().
(and note that Channels.newChannel(InputStream) will not return a FileChannel but a ReadableByteChannel)
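A rough sketch of that approach (the variable names here are illustrative; getContentLengthLong() returns -1 if the server does not report a length):

URLConnection conn = url.openConnection();
long contentLength = conn.getContentLengthLong(); // -1 if unknown
ReadableByteChannel channel = Channels.newChannel(conn.getInputStream());
ByteBuffer buf = ByteBuffer.allocate(8192);
long totalRead = 0;
int read;
while ((read = channel.read(buf)) != -1) {
    totalRead += read; // equals buf.position() before the buffer is cleared
    // ... consume the buffer's contents here ...
    buf.clear();
    if (contentLength > 0) {
        System.out.println("progress: " + (100 * totalRead / contentLength) + "%");
    }
}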
When using the transferFrom() or transferTo() method, you have to pass the starting point within the file, as well as the number of bytes to transfer.
What you could do is break the file up into, say, 100 "parts". This is done very simply:
File f = new File(url);
long size = f.length();
Say our file has a size of 100 KB, so size would now be 102400.
Just divide this by the number of "parts" (100 in this example) and we have 1024 bytes per "part". Now all we have to do is show the percentage of "parts" we have already transferred.
As an example:
for (int x = 0; x < 100 /* number of parts */; x++) {
    source.transferTo(1024L * x /* offset of this part */, 1024 /* part size */, destination);
    progress.setValue(x + 1); // with 100 parts, each part is 1%
}

Reading Objects until End of File in java

I'm trying to write a program where the user can: 1) add a person to the contacts (name, phone, email), 2) remove a person from the contacts, 3) read all contacts.
The way I'm doing this is by asking the user for their choice and doing whatever was selected. For writing, I simply write an object to the file. For removing, I think I'll ask the user for the "last name", which is used as the KEY (since I'm using a TreeMap), and remove the value (object) at that key.
So I'm having a problem with reading. I'm trying to read the objects like so:
public void readContact()
{
    TreeMap<String, Contact> contactMap = new TreeMap<String, Contact>();
    try
    {
        ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(
                new FileInputStream(file)));
        while( in.available() > 0 ) // This line does NOT read
        {
            Contact c = (Contact) in.readObject();
            contactMap.put(c.getLastName(), c);
        }
        for( Map.Entry contact : contactMap.entrySet() )
        {
            Contact con = contactMap.get( contact.getKey() );
            System.out.println( con.getLastName() + ", " + con.getFirstName() + ": " + con.getPhoneNumber() + "\t" + con.getEmail());
        }
    }
    catch(Exception e)
    {
        System.out.println("Exception caught");
    }
}
Please do not suggest doing something like while(true) until I get the EOFException, because:
that isn't what exception handling is for, I believe
I still have more things to do after this, so I can't have the program terminating
Please do not suggest doing something like while(true) until I get the EOFException
That is exactly what I suggest. When you are searching for answers it is counter-productive to circumscribe the solution space according to arbitrary criteria like this.
because:
that isn't what exception handling is for I believe
When an API that you are calling throws an exception, as this one does, you don't have any choice but to catch it. Whatever you may think about 'what exception handling is for', you are subject to what the designers of the API thought when they designed the API.
I still have more things to do after this so I can't have the program terminating'
So don't terminate it. Catch EOFException, close the input, and break out of the loop.
I have seen more costly programming time wasted over 'what exception handling is for' than I can really credit.
I know that you are looking for an answer that is not using exception handling, but I believe in this case using EOFException to determine when all input has been read is the right way.
The JavaDoc of EOFException states that
This exception is mainly used by data input streams to signal end of stream. Note that many other input operations return a special value on end of stream rather than throwing an exception.
So, there are input streams that use other means to signal an end of file, but ObjectInputStream#readObject uses ObjectInputStream$BlockDataInputStream#peekByte to determine if there is more data to read, and peekByte throws an EOFException when the end of the stream has been reached.
So it is feasible to use this exception as an indicator that the end of the file has been reached.
To handle the exceptions without interrupting the program flow, some of the possible exceptions should be passed up in the hierarchy. They can be handled by a try - catch block in the code that calls readContact().
The EOFException can simply be used as an indicator that we are done reading the objects.
public TreeMap<String, Contact> readContact() throws FileNotFoundException,
        IOException, ClassNotFoundException {
    TreeMap<String, Contact> contactMap = new TreeMap<String, Contact>();
    // The following call can throw a FileNotFoundException or an IOException.
    // Since these are probably better dealt with in the calling function,
    // readContact is declared to throw them instead.
    ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(
            new FileInputStream(file)));
    try {
        while (true) {
            // Read the next object from the stream. If there is none, an
            // EOFException is thrown. This call might also throw a
            // ClassNotFoundException, which can be passed up or handled here.
            Contact c = (Contact) in.readObject();
            contactMap.put(c.getLastName(), c);
        }
    } catch (EOFException e) {
        // There are no more objects to read; fall through and return what we have.
    } finally {
        // Close the stream exactly once, whether we hit end of file or
        // another exception was thrown partway through.
        in.close();
    }
    for (Map.Entry<String, Contact> contact : contactMap.entrySet()) {
        Contact con = contact.getValue();
        System.out.println(con.getLastName() + ", "
                + con.getFirstName() + ": " + con.getPhoneNumber()
                + "\t" + con.getEmail());
    }
    return contactMap;
}
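As a usage sketch (hypothetical caller), the exceptions that readContact() now declares can be handled at the call site:

try {
    TreeMap<String, Contact> contacts = readContact();
    System.out.println(contacts.size() + " contacts loaded.");
} catch (FileNotFoundException e) {
    System.out.println("No contact file yet; starting with an empty contact list.");
} catch (IOException | ClassNotFoundException e) {
    e.printStackTrace();
}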
- Exceptions are not only used to raise an alarm when something goes wrong while calling a method; they are also used in threads and IO, among various other uses.
- You can use an exception to indicate the end of the file.
- Use the try-catch combo along with the above to keep the flow of the program smooth.
Why go to so much trouble reading objects from the file one by one? Just save a whole HashMap into the file and read it back in a single call, then perform whatever operations you need on the map.
Also, I would suggest using an object-oriented database such as db4o to do this quickly; then you never have to worry about an end-of-file exception.
'ObjectInputStream.available returns 0' is a known problem (see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4954570), and since we cannot use it, I think EOFException would be a reasonable approach in your situation. Catching EOFException will not terminate your program. Alternatively:
You could write the number of objects to your file with ObjectOutputStream.writeInt, and then read this number back with ObjectInputStream.readInt so you know how many objects to read (see the sketch after this list).
You could use null as an EOF marker.
You could save your objects as an array, a List, or even a Map, and then read them all back with one readObject call.
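Here is a minimal sketch of the first option (the out/in stream variables are hypothetical), recording the count before the objects:

// Writing: record the count first, then the objects.
out.writeInt(contactMap.size());
for (Contact c : contactMap.values()) {
    out.writeObject(c);
}

// Reading: the count tells us exactly how many readObject() calls to make.
int count = in.readInt();
for (int i = 0; i < count; i++) {
    Contact c = (Contact) in.readObject();
    contactMap.put(c.getLastName(), c);
}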
What you have discovered
You found out about FileInputStream.available() returning 0 even though there are unread bytes in the file! Why is this? It can happen (rendering FileInputStream.available() unreliable) for a couple of reasons:
According to this documentation, FileInputStream.available() approximates the number of bytes that can be read without blocking
The hard disk drive itself might change its operation mid-read (go from spinning to not spinning)
The file you are trying to access is either a file on a remote system or a device file : May the FileInputStream.available foolish me?
The FileInputStream might be blocked
An alternative way
As an alternative to relying on EOFException to close the file, you could use a (very small!) binary file that keeps track of the number of Objects in your file. (Judging from your code, it looks like you are simply writing Objects to the file.) The way I have used this is just to
store the number of bytes the number itself will consume
using that number of bytes, store the number itself
For example, the first time the serialization file is created, I could make the binary file store 1 1 (which specifies that the number of Objects in the serialization file takes up 1 byte, and that that number is 1). This way, after 255 Objects (remember, an unsigned byte can only store up to 2^8 - 1 == 255), if I write another Object (Object number 256, up to 256^2 - 1 == 65535), the binary file will have, as contents, 2 1 0, which specifies that the number takes up 2 bytes and is 1*256^1 + 0 == 256. Provided that the serialization is reliable (good luck on ensuring that: http://www.ibm.com/developerworks/library/j-serialtest/index.html), this method will let you store (and detect) up to 256^255 - 1 Objects (which pretty much means that this method works indefinitely).
The code itself
Something like that could be implemented as follows:
ObjectOutputStream yourOutputStream = new ObjectOutputStream(
        new FileOutputStream(workingDirectory + File.separatorChar + yourFileName)); // the output stream
File binaryFile = new File(workingDirectory + File.separatorChar + nameOfFile); // the binary file
int numOfObjects = 0, numOfBytes; // the number of Objects in the file
// reading the number of Objects from the file (if the file exists)
try
{
    FileInputStream byteReader = new FileInputStream(binaryFile);
    numOfBytes = byteReader.read();
    // Read the rest of the bytes (the number itself), most significant byte first.
    for (int exponent = numOfBytes - 1; exponent >= 0; exponent--)
    {
        numOfObjects += byteReader.read() * Math.pow(256, exponent);
    }
    byteReader.close();
}
catch (IOException exception)
{
    // if an exception was thrown due to the file not existing
    if (exception.getClass() == FileNotFoundException.class)
    {
        // we simply create the file (as mentioned earlier in this answer)
        try
        {
            FileOutputStream fileCreator = new FileOutputStream(binaryFile);
            // we write the integers '1','1' to the file
            for (int x = 0; x < 2; x++) fileCreator.write(1);
            // attempt to close the file
            fileCreator.close();
        }
        catch (IOException innerException)
        {
            // here, we need to handle this; something went wrong
            innerException.printStackTrace();
            System.exit(-1);
        }
    }
    else
    {
        exception.printStackTrace();
        System.exit(-2);
    }
}
Now, we have the number of Objects in the file. (I leave it to you to figure out how to update the bytes to indicate that one more Object has been written each time yourOutputStream calls writeObject(yourObject); I have to go clock in.)
Edit: yourOutputStream is either going to write over all the data in the binaryFile or append data to it. I just found out that RandomAccessFile would be a way to insert data anywhere in the file. Again, I leave the details to you; however you want to do it.
You can write the last object as null, and then iterate on the reading side until you get a null.
For example:
while ((object = inputStream.readObject()) != null) {
    // ...
}
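The writing side of that convention (stream and collection names are hypothetical) just appends a null after the real objects:

for (Contact c : contactMap.values()) {
    outputStream.writeObject(c);
}
outputStream.writeObject(null); // end-of-data marker for the reading loop above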
