The problem I seem to have hit is one relating to loading times; I'm not running on a particularly fast machine by any means, but I still want to dabble in neural networks. In short, I have to load 35,280,000 integers into one large array (I'm using the MNIST database; each image is 28x28, which amounts to 784 pixels per image, times 45,000 images). It works fine, and surprisingly I don't run out of RAM, but... it takes four and a half hours just to get the data into an array.
I can supply the rest of the code if you want me to, but here's the function that runs through the file.
public static short[][] readfile(String fileName) throws FileNotFoundException, IOException {
short[][] array = new short[10000][784];
BufferedReader br = new BufferedReader(new FileReader(System.getProperty("user.dir") + "/MNIST/" + fileName + ".csv"));
br.readLine();
try {
for (short i = 1; i < 45000; i++) {
String line = br.readLine();
for (short j = 0; j < 784; j++) {
array[i][j] = Short.parseShort(line.split(",")[j]);
}
}
br.close();
} catch (IOException e) {
e.printStackTrace();
}
return array;
}
What I want to know is, is there some way to "quicksave" the execution of the program so that I don't have to rebuild the array for every small tweak?
Note: I haven't touched Java in a while, and my code is mostly chunked together from a lot of different sources. I wouldn't be surprised if there were some serious errors (or just Java "no-nos"), it would actually help me a lot if you could fix them if you answer.
Edit: Bad question, I'm just blind... sorry for wasting time
Edit 2: I've decided after a while that instead of loading all of the images, and then training with them one by one, I could simply train one by one and load the next. Thank you all for your ideas!
array[i][j] = Short.parseShort(line.split(",")[j]);
You are calling String#split() for every single integer.
Call it once per line, outside the inner loop, and copy the values into your 2D array.
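As a rough sketch of that suggestion (not the asker's actual fix): the class name CsvRows and the method readRows are made up for illustration, and it assumes every data line has at least `cols` comma-separated values that fit in a short. It takes a Reader so it can be tried without the MNIST file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CsvRows {
    // Hypothetical helper: reads up to `rows` lines of `cols` values each,
    // skipping one header line, and splits every line exactly once.
    static short[][] readRows(Reader source, int rows, int cols) throws IOException {
        short[][] array = new short[rows][cols];
        try (BufferedReader br = new BufferedReader(source)) {
            br.readLine(); // skip the header line, as the original code does
            for (int i = 0; i < rows; i++) {
                String line = br.readLine();
                if (line == null) {
                    break; // fewer lines than expected; remaining rows stay zero
                }
                String[] parts = line.split(","); // one split per line
                for (int j = 0; j < cols; j++) {
                    array[i][j] = Short.parseShort(parts[j]);
                }
            }
        }
        return array;
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,b,c\n1,2,3\n4,5,6\n";
        short[][] data = readRows(new StringReader(csv), 2, 3);
        System.out.println(data[1][2]); // prints 6
    }
}
```

Sizing the array from the same `rows` value that bounds the loop also avoids having the array dimension and the loop limit drift apart.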
I'm fairly new to coding and am struggling with an assignment for my class. The program takes a user input for the size of an array and prompts the user to enter each value one at a time. The array size starts at 3; if the array fills up and needs to be bigger, a new array twice the size is created and all the data is copied into it. I was able to figure out this part, but I just can't see what I'm doing wrong in the downsizing part. After the data is copied I have to remove the trailing zeroes. I think I have the downsize method right, but I don't know if I'm calling it right.
import java.util.Scanner;
public class Lab6 {
public static void main(String args[]) {
int[] myarray = new int[3];
int count = 0;
int limit, limitcount = 1;
Scanner kbd = new Scanner(System.in);
System.out.print("How many values would you like to enter? ");
limit = kbd.nextInt();
while (limitcount <= limit) {
System.out.println("Enter an integer value ");
int input = kbd.nextInt();
limitcount++;
if (count < myarray.length) {
myarray[count] = input;
}
else {
myarray = upsize(myarray);
myarray[count] = input;
}
count++;
}
myarray = downsize(myarray, count)
printArray(myarray);
System.out.println("The amount of values in the arrays that we care about is: " + count);
}
static int[] upsize(int[] array) {
int[] bigger = new int[array.length * 2];
for (int i =0;i<array.length; i++) {
bigger[i] = array[i];
}
return bigger;
}
static void printArray( int[] array ) {
for ( int number : array ) {
System.out.print( number + " ");
}
System.out.println();
}
static int[] downsize(int[] array,int count) {
int[] smaller = new int[count];
for (int i =0; i<count; i++) {
smaller[i] = array[i];
}
return array;
}
}
Giving you a full response rather than a comment since you're new here and I don't want to discourage you with brevity which could be misunderstood.
1. Not sure what happened to your code when you pasted it in here; you've provided everything but the format is weird (the 'code' bit is missing out a few lines at the top and bottom). Might be one to double-check before posting. After posting, I see that someone else has already edited your code to fix this one.
2. You're missing a semicolon. I'm not a fan of handing out answers, so I'll leave you to find it :) If you're running your code in an IDE, it should already be flagging that one up for you. If you're not, why on earth not??? IntelliJ is free, easy to get going with, and incredibly helpful. There are others out there as well which different folk prefer :) An IDE will help you spot all sorts of useful things quickly.
3. I have now run your code, and you do have a problem! It's in your final method, downsize(). Look very, very carefully at the return statement ;) Your question suggests you aren't actually sure whether or not this method is right, which makes me wonder: have you actually run this code with different inputs to see what results you get? Please do that.
4. Style-wise: blank lines between methods would make the code easier to look at, by providing a visual gap between components. Please be consistent with putting your opening { on the same line as the method signature, and with having spaces between items, e.g. for (int i = 0; i < count; i++) rather than for (int i =0; i<count; i++). The compiler couldn't care less, but it is easier for humans to look at and just makes it look like you did care. Always a good thing!
5. I think it is awesome that you are separating some of the work into smaller methods. Seriously. For extra brownie points, think about how you could move that while() block into its own method, e.g. private int[] getUserData(int numberOfItems, Scanner scanner). Your code is great without this, but the more you learn to write tiny units, the more favours you will be doing your future self.
6. Has your class looked at unit testing yet? Trust me, if not, when you get to this you will realise just how important point 5 can be. Unit tests will also help a lot with issues such as the one in point 3 above.
Overall, it looks pretty good to me. Keep going!!!
Simple mistake in your downsize method. If you had an IDE like Eclipse, IntelliJ, etc., you would have seen it flagged right away.
return array; // should return smaller
I have a few suggestions since you mentioned being new to coding.
The "limitcount" variable can be removed and substituted with "count" at every instance. I'll leave it to you to figure that out.
Try using more descriptive and understandable variable names. Other people will read your code (like now) and appreciate it.
Try to use consistent spacing/indentation throughout your code.
Your upsize method can be simplified using a System.arraycopy() call which generally performs better and avoids the need for writing out a for loop. You can rewrite downsize in a similar manner.
static int[] upsize(int[] array) {
int[] bigger = new int[array.length * 2];
System.arraycopy(array, 0, bigger, 0, array.length);
return bigger;
}
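For completeness, here is one way downsize could look with the same call. This is only a sketch: the wrapper class ArrayResize is a made-up name, and it assumes count is never larger than array.length.

```java
import java.util.Arrays;

class ArrayResize {
    // Returns a new array holding only the first `count` elements.
    static int[] downsize(int[] array, int count) {
        int[] smaller = new int[count];
        System.arraycopy(array, 0, smaller, 0, count);
        return smaller; // return the new array, not the original
    }

    public static void main(String[] args) {
        int[] padded = {4, 8, 15, 0, 0, 0};
        System.out.println(Arrays.toString(downsize(padded, 3))); // prints [4, 8, 15]
    }
}
```

Arrays.copyOf(array, count) does the same thing in a single call, if your assignment allows library helpers.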
Edit: All good points by sunrise above, especially that you've done well given your experience. You should set up an IDE when you have the time; they're simple to use and invaluable. When you do, you should learn to step through your program with a debugger to explore its state over time. In this case you would have noticed that myarray still had its old length after the downsize() call, quickly leading you to a solution (if you had missed the warning about an unused "smaller" array).
So, I've searched around Stack Overflow for a bit, but I can't seem to find an answer to this issue.
My current homework for my CS class involves reading from a file of 5000 random numbers and doing various things with the data, like putting it into an array, seeing how many times a number occurs, and finding what the longest increasing sequence is. I've got all that done just fine.
In addition to this, I am (for myself) adding in a method that will allow me to overwrite the file and create 5000 new random numbers to make sure my code works with multiple different test cases.
The method works for the most part, however after I call it it doesn't seem to "activate" until after the rest of the program finishes. If I run it and tell it to change the numbers, I have to run it again to actually see the changed values in the program. Is there a way to fix this?
Current output showing the delay between changing the data:
Not trying to change the data here- control case.
elkshadow5$ ./CompileAndRun.sh
Create a new set of numbers? Y for yes. n
What number are you looking for? 66
66 was found 1 times.
The longest sequence is [606, 3170, 4469, 4801, 5400, 8014]
It is 6 numbers long.
The numbers should change here but they don't.
elkshadow5$ ./CompileAndRun.sh
Create a new set of numbers? Y for yes. y
What number are you looking for? 66
66 was found 1 times.
The longest sequence is [606, 3170, 4469, 4801, 5400, 8014]
It is 6 numbers long.
Now the data shows that it's changed, the run after the data should have been changed.
elkshadow5$ ./CompileAndRun.sh
Create a new set of numbers? Y for yes. n
What number are you looking for? 1
1 was found 3 times.
The longest sequence is [1155, 1501, 4121, 5383, 6000]
It is 5 numbers long.
My code:
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;
public class jeftsdHW2 {
static Scanner input = new Scanner(System.in);
public static void main(String args[]) throws Exception {
jeftsdHW2 random = new jeftsdHW2();
int[] data;
data = new int[5000];
random.readDataFromFile(data);
random.overwriteRandNums();
}
public int countingOccurrences(int find, int[] array) {
int count = 0;
for (int i : array) {
if (i == find) {
count++;
}
}
return count;
}
public int[] longestSequence(int[] array) {
int[] sequence;
return sequence;
}
public void overwriteRandNums() throws Exception {
System.out.print("Create a new set of numbers? Y for yes.\t");
String answer = input.next();
char yesOrNo = answer.charAt(0);
if (yesOrNo == 'Y' || yesOrNo == 'y') {
writeDataToFile();
}
}
public void readDataFromFile(int[] data) throws Exception {
try {
java.io.File infile = new java.io.File("5000RandomNumbers.txt");
Scanner readFile = new Scanner(infile);
for (int i = 0; i < data.length; i++) {
data[i] = readFile.nextInt();
}
readFile.close();
} catch (FileNotFoundException e) {
System.out.println("Please make sure the file \"5000RandomNumbers.txt\" is in the correct directory before trying to run this.");
System.out.println("Thank you.");
System.exit(1);
}
}
public void writeDataToFile() throws Exception {
int j;
StringBuilder theNumbers = new StringBuilder();
try {
PrintWriter writer = new PrintWriter("5000RandomNumbers.txt", "UTF-8");
for (int i = 0; i < 5000; i++) {
if (i > 1 && i % 10 == 0) {
theNumbers.append("\n");
}
j = (int) (9999 * Math.random());
if (j < 1000) {
theNumbers.append(j + "\t\t");
} else {
theNumbers.append(j + "\t");
}
}
writer.print(theNumbers);
writer.flush();
writer.close();
} catch (IOException e) {
System.out.println("error");
}
}
}
It is possible that the file has not been physically written to the disk; using flush is not enough for this. From the Java documentation:
If the intended destination of this stream is an abstraction provided by the underlying operating system, for example a file, then flushing the stream guarantees only that bytes previously written to the stream are passed to the operating system for writing; it does not guarantee that they are actually written to a physical device such as a disk drive.
Because of the HDD's read and write speed, it is advisable to depend as little as possible on HDD access.
Perhaps storing the random number strings to a list when re-running and using that would be a solution. You could even write the list to disk, but this way the implementation does not depend on the time the file is being written.
EDIT
After the OP posted more of their code, it became apparent that my original answer is not related to the problem. Nonetheless it is sound.
The code the OP posted is not enough to see when the file is read after writing. It seems the file is written after it is read, which of course is what is perceived as an error. Reading after writing should produce a program that does what you want.
That is, this:
random.readDataFromFile(data);
random.overwriteRandNums();
Will not be reflected until the next execution. This:
random.overwriteRandNums();
random.readDataFromFile(data);
Will use the updated file in the current execution.
During my app development one performance question came to my mind:
I have a lot of lines of data that can look like this:
!ANG:-0.03,0.14,55.31
!ANG:-0.03,-0.14,305.31
!ANG:-234.03,-0.14,55.31
in general: !ANG:float,float,float
Between those lines there are also "damaged" lines - they don't start with ! or are too short/have extra signs and so on.
To detect lines that are damaged at the beginning I simply use
if(myString.charAt(0) != '!')//wrong string
What can I do to detect lines that are damaged at the end? It is very important to mention that I need not only to check whether the line is correct but also to get those 3 float numbers to use later.
I've found three options for this:
use regexp
split twice (first ":" and second ",") and count elements
use Scanner class
I am not sure which of these methods (or maybe there are others) will be the best from a performance point of view. Can you please give me some advice?
EDIT:
After some comments I see that it is worth showing how damaged lines can look:
NG:-0.03,0.14,55.31
.14,55.31
!ANG:-0.03,0.14,
!A,-0.02,-0.14,554,-0.12,55
It is quite difficult to talk about number of lines because I am getting them from readings from other device so I get packets of around 20 lines at a time with a frequency of 50Hz.
What I've found out so far is the big drawback of using Scanner: for each line I need to create a new object, and after some time my device starts to run short on resources.
Benchmark them, then you will know.
The likely fastest way is to write your own tiny state machine to match your format and find the float boundaries. Theoretically a regex will have the same performance, but it's likely to have additional overhead.
As an intermediate solution I'd do something like this:
private static class LineObject {
private float f1, f2, f3;
}
private LineObject parseLine(String line) {
LineObject obj = null;
if (line.startsWith("!ANG:")) {
int i = line.indexOf(',', 5);
if (i != -1) {
int j = line.indexOf(',', i+1);
if (j != -1) {
try {
obj = new LineObject();
obj.f1 = Float.parseFloat(line.substring(5, i));
obj.f2 = Float.parseFloat(line.substring(i+1, j));
obj.f3 = Float.parseFloat(line.substring(++j));
} catch (NumberFormatException e) {
return null;
}
}
}
}
return obj;
}
Afterwards you can copy only the useful parts of the JDK code for startsWith, indexOf and parseFloat into your own state machine...
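If you want something to benchmark the code above against, the regex option from the question might look roughly like this. It is only a sketch: the class name AngLineRegex is made up, and the pattern assumes plain decimal numbers (optional minus sign, optional fraction, no exponents), which is a guess based on the sample lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AngLineRegex {
    // Compiled once and reused for every line.
    private static final Pattern ANG = Pattern.compile(
            "!ANG:(-?\\d+(?:\\.\\d+)?),(-?\\d+(?:\\.\\d+)?),(-?\\d+(?:\\.\\d+)?)");

    // Returns the three values, or null if the line is damaged.
    static float[] parse(String line) {
        Matcher m = ANG.matcher(line);
        if (!m.matches()) { // matches() requires the whole line to fit the pattern
            return null;
        }
        return new float[] {
            Float.parseFloat(m.group(1)),
            Float.parseFloat(m.group(2)),
            Float.parseFloat(m.group(3))
        };
    }

    public static void main(String[] args) {
        System.out.println(parse("!ANG:-0.03,0.14,55.31") != null); // prints true
        System.out.println(parse("!ANG:-0.03,0.14,") == null);      // prints true
    }
}
```

The Pattern is created once, but each call still allocates a Matcher and a float[], so on a resource-constrained device it is worth measuring this against the indexOf version rather than assuming either one wins.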
Edit:
It was helpful to load the images only once in the default constructor, everything works much faster now. The problem, however, has changed. I can't open the jar file anymore, and if I launch it from the console using java -jar BounceTheSphinx.jar I get this
Exception in thread "main" java.lang.IllegalArgumentException: input == null!
at javax.imageio.ImageIO.read(Unknown Source)
at BounceBack.PanneauJeu.<init>(PanneauJeu.java:55)
at BounceBack.FenetreJeu.<init>(FenetreJeu.java:21)
at BounceBack.MainBounceBack.main(MainBounceBack.java:11)
Line 55 of PanneauJeu.java is fondArray[j] = ImageIO.read(this.getClass().getResource(imageArray[j])); I looked at other posts, but I can't solve my problem with the solutions proposed. The thing is, I use exactly the same technique to load and display both sets of images, the images exist, and everything works in Eclipse, yet it is always fondArray that causes the problem, not fondPerdu.
I edited the code for you to see
So I wrote "WORKS" and "DOESN'T WORK" in the comments so you can see where my problem is.
public class PanneauJeu extends JPanel
{
private int i = 0;//color counter
private int j = 0;//imageArray counter
private int k = 0;//imagePerdu counter
private String[] imageArray = {"/resources/Sphinx.png", "/resources/Sphinx2.png ", "/resources/Sphinx3.png", "/resources/Sphinx4.png", "/resources/Sphinx5.png", "/resources/Sphinx6.png", "/resources/Sphinx7.png", "/resources/Sphinx8.png"};//score
private String[] imagePerdu = {"/resources/Lose5.png", "/resources/Lose6.png", "/resources/Lose7.png", "/resources/Lose8.png", "/resources/Lose9.png", "/resources/Lose10.png", "/resources/Lose11.png", "/resources/Lose12.png", "/resources/Lose13.png"};//, "Lose10.png", "Loose11.png", "Loose12.png"};
private Image fond;
private Image fondArray[] = new Image[imageArray.length];
private Image fondPerdu[] = new Image[imagePerdu.length];
public PanneauJeu()//default constructor
{
for(int j = 0; j < imageArray.length; j++)
{
//DOESN'T WORK
try
{
fondArray[j] = ImageIO.read(this.getClass().getResource(imageArray[j]));
}catch(IOException e){e.printStackTrace();}
}
for(int k = 0; k < imagePerdu.length; k++)
{
//WORKS
try
{
fondPerdu[k] = ImageIO.read(this.getClass().getResource(imagePerdu[k]));
}catch(IOException e){e.printStackTrace();}
}
}
Can anyone tell me what could possibly be wrong? Remember, everything works just fine in Eclipse.
Thank you everyone for your help
It's not entirely clear what the issue is, but there's one likely candidate:
You're loading an image every time that you want to display it.
In the case of an animation, that means trying to constantly reload lots of images. This is a burden both in terms of I/O and CPU time. What you want to do is load your images once, and then keep them around (instead of just the file names) to display when you need them. This way, your program doesn't have to be constantly loading and reloading the same data from the filesystem.
There's a reasonable chance that you have another issue, but doing this should make it easier to find.
Once you've moved your loading to happen once, if the problem persists, try launching your JAR from the command line: run java -jar <PATH TO JARFILE>, and see if it prints out any errors. There's a good chance that there's an error happening then that you can't see if you try to launch the JAR from a GUI.
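A minimal sketch of loading everything once up front, with a check that names the path that could not be found. ImageLoader and loadAll are made-up names, not part of the code in the question; the only library calls used are Class.getResource and ImageIO.read(URL).

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.net.URL;
import javax.imageio.ImageIO;

class ImageLoader {
    // Hypothetical helper: loads each classpath resource once and fails with the
    // offending path instead of ImageIO's generic "input == null!" message.
    static BufferedImage[] loadAll(Class<?> owner, String[] paths) throws IOException {
        BufferedImage[] images = new BufferedImage[paths.length];
        for (int i = 0; i < paths.length; i++) {
            URL url = owner.getResource(paths[i]); // null if the resource is missing
            if (url == null) {
                throw new IOException("Resource not found on classpath: \"" + paths[i] + "\"");
            }
            images[i] = ImageIO.read(url);
        }
        return images;
    }
}
```

You would call it once, for example from a constructor, and keep the returned array in a field. Quoting the path in the message makes small differences visible, such as letter case or stray whitespace inside a file name, which can behave differently inside a JAR than on the file system.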
I ran into a weird problem, and I was wondering if anyone has an idea what could be the cause. I'm reading in a file (a small exe of 472 KB) with FileInputStream. I plan to send the file through an RMI connection, and I had the idea of showing the upload percentage based on how much I have already sent compared to the overall length of the file.
First I tried it out locally and I couldn't get it to work. Here is an example of what I was doing.
FileInputStream fileData = new FileInputStream(file);
reads = new ArrayList<Integer>();
buffers = new ArrayList<byte[]>();
int i = 0;
while ( (read = fileData.read(buffer)) > 0) {
System.out.println("Run : " + (i + 1));
outstreamA.write(buffer, 0, read);
reads.add(read);
buffers.add(buffer);
outstreamB.write(this.buffers.get(i), 0, this.reads.get(i));
i = i + 1;
}
These two FileOutputStreams create two files (the same file under different names), and that works fine. However, when I'm not using fileData.read() but some other for/while loop, it just doesn't work. It creates a file of exactly the same length, but Windows cannot run the exe; I get this error message:
"The version of this file is not compatible with the version of Windows you're running...".
This is how i tried:
//for (int i = 0; i < buffers.size(); ++i) {
i = 0;
//while ( (read = fileData2.read(buffer)) > 0) {
while ( i < size) {
System.out.println("Run#2 : " + (i + 1));
outstreamC.write(this.buffers.get(i), 0, this.reads.get(i));
i = i + 1;
}
fileData2 is the same as fileData. If I work with fileData2.read(buffer), outstreamC creates a working file as well.
It doesn't matter whether I loop with for up to the list's size or up to "size", which equals the number of times I entered the first while loop. There is something missing, and I cannot figure it out.
The weird thing is that outstreamB creates a working file, yet outstreamC cannot, even though they are working with the exact same items.
Originally I was planning to pass "read" and "buffer" through the RMI connection each time I entered the first while loop, and put everything together on the other side after all the parts had arrived, but now my plan is kind of dead. Does anyone have an idea how I could solve this, or achieve something similar, to be able to send files through RMI?
Best regards,
Mihaly
Your code can never work. You are reading into the same buffer repeatedly and adding the same buffer to a list. So the list contains several copies of the final data you read. You would need to allocate a new buffer every time around the loop.
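A sketch of what that could look like. ChunkReader and readChunks are made-up names; instead of allocating a new read buffer on every pass, this variant copies the bytes actually read into a fresh array, which has the same effect and also makes the separate list of read counts unnecessary.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkReader {
    // Reads the stream into a list of chunks; each chunk is its own array,
    // trimmed to the number of bytes actually read.
    static List<byte[]> readChunks(InputStream in, int chunkSize) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        int read;
        while ((read = in.read(buffer)) > 0) {
            chunks.add(Arrays.copyOf(buffer, read)); // copy, so later reads cannot overwrite it
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5};
        List<byte[]> chunks = readChunks(new ByteArrayInputStream(data), 2);
        System.out.println(chunks.size());                  // prints 3
        System.out.println(Arrays.toString(chunks.get(2))); // prints [5]
    }
}
```

This also explains why outstreamB appeared to work in the question: it wrote each list entry immediately after the read, before the shared buffer was overwritten by the next one.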