I have some Java code that compares two files. When it finds matching numbers on a particular line, it prints that line to a new file. This works for a good while, until what I believe is the last line, which only ever gets partially printed. I think it might be because of a 'break' later in the code, but I looked around on Google and wasn't sure whether any of the answers were relevant.
Here is some code that I think is related:
Read in files
while the line isn't null...
Parse files
Write a header
Some comparisons
while (!input.startsWith("#") && !input.startsWith("P")) {
    prse = input.split("\t");//split the file by tabs
    pos = prse[7];
    poson = prse[8];
    pos = Integer.parseInt(poson);
    if (cnt < num.size()) { //if we haven't exceeded an array
        if (num.get(cnt).equals(pos)) { //if the first number is the same
            if (cnt2 < posstart.size()) { //if we haven't exceeded another array
                end = Integer.parseInt(posend.get(cnt2)); //change to int
                start = Integer.parseInt(posstart.get(cnt2));//change to int
                if (pos < start) { //if it is less then the starting pos then it can't fall within
                    break; //so break
                }
                if (pos < end && pos > start) {//I am trying to see if a number falls within the range of numbers from a separate file
                    out1.write(input + "\n"); //If it does: This is where I am writing out the line
                    break; //if I remove this break the program hangs here
                } else {
                    cnt2++; //if it wasn't the same, add
                }
            }
        } else {
            cnt++; //if it was the same move to the next one
            cnt2 = 0; //reset this number
            break; //go back to beginning
        }
    } else {
        break;
    }
So the code works perfectly for about 6500 lines but then it abruptly cuts off the last line:
Blah B 6 5 8 C 5 X 6
Blah A 0 1 4 C 2 X 7
Blah B 3 5 9 C 5 X 6
Blah B 0 9 4
Does anyone know what I can add to stop the final line from being cut off? I know that in Bash you can tell a script to wait, but I was confused by the Java equivalents and was hoping someone could suggest one and explain it a bit better.
For the sake of having an answer (until Carl puts his up) I am going to go ahead and answer
did you close the outputstream? maybe you need to call the flush method. – Carl
He was right. I hadn't. Silly me.
Actually, I make that mistake a lot. I think it is because, with the garbage collector, we don't really think about memory management, so it is easy to forget to close an I/O stream we opened or to flush its buffer to disk. That's a terrible thing to do, though.
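One way to make the close (and the flush that comes with it) hard to forget is try-with-resources. The following is only a minimal sketch: the class name CloseWriterSketch, the helper writeLines, and the temporary output file are assumptions made for illustration, not part of the original program.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class CloseWriterSketch {

    // Hypothetical helper: writes every line to the target file.
    // try-with-resources closes the writer, and closing flushes its buffer,
    // even if an exception is thrown inside the loop.
    static void writeLines(Path target, List<String> lines) throws IOException {
        try (BufferedWriter out1 = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out1.write(line);
                out1.newLine();
            }
        } // out1.close() runs here, pushing the buffered tail to disk
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the real output file (assumption).
        Path target = Files.createTempFile("matches", ".txt");
        writeLines(target, List.of("Blah B 6 5 8 C 5 X 6", "Blah A 0 1 4 C 2 X 7"));
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

Without the close, whatever is still sitting in the writer's buffer when the program exits never reaches the file, which matches the symptom of a last line that is cut off partway.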
I have a problem which is puzzling me. I'm indexing a corpus (17 000 files) of text files, and while doing this, I'm also storing all the k-grams (k-long parts of words) for each word in a HashMap to be used later:
public void insert( String token ) {
//For example, car should result in "^c", "ca", "ar" and "r$" for a 2-gram index
// Check if token has already been seen. if it has, all the
// k-grams for it have already been added.
if (term2id.get(token) != null) {
return;
}
id2term.put(++lastTermID, token);
term2id.put(token, lastTermID);
// is word long enough? for example, "a" can be bigrammed and trigrammed but not four-grammed.
// K must be <= token.length + 2. "ab". K must be <= 4
List<KGramPostingsEntry> postings = null;
if(K > token.length() + 2) {
return;
}else if(K == token.length() + 2) {
// insert the one K-gram "^<String token>$" into index
String kgram = "^"+token+"$";
postings = index.get(kgram);
SortedSet<String> kgrams = new TreeSet<String>();
kgrams.add(kgram);
term2KGrams.put(token, kgrams);
if (postings == null) {
KGramPostingsEntry newEntry = new KGramPostingsEntry(lastTermID);
ArrayList<KGramPostingsEntry> newList = new ArrayList<KGramPostingsEntry>();
newList.add(newEntry);
index.put("^"+token+"$", newList);
}
// No need to do anything if the posting already exists, so no else clause. There is only one possible term in this case
// Return since we are done
return;
}else {
// We get here if there is more than one k-gram in our term
// insert all k-grams in token into index
int start = 0;
int end = start+K;
//add ^ and $ to token.
String wrappedToken = "^"+token+"$";
int noOfKGrams = wrappedToken.length() - end + 1;
// get K-Grams
String kGram;
int startCurr, endCurr;
SortedSet<String> kgrams = new TreeSet<String>();
for (int i=0; i<noOfKGrams; i++) {
startCurr = start + i;
endCurr = end + i;
kGram = wrappedToken.substring(startCurr, endCurr);
kgrams.add(kGram);
postings = index.get(kGram);
KGramPostingsEntry newEntry = new KGramPostingsEntry(lastTermID);
// if this k-gram has been seen before
if (postings != null) {
// Add this token to the existing postingsList.
// We can be sure that the list doesn't contain the token
// already, else we would previously have terminated the
// execution of this function.
int lastTermInPostings = postings.get(postings.size()-1).tokenID;
if (lastTermID == lastTermInPostings) {
continue;
}
postings.add(newEntry);
index.put(kGram, postings);
}
// if this k-gram has not been seen before
else {
ArrayList<KGramPostingsEntry> newList = new ArrayList<KGramPostingsEntry>();
newList.add(newEntry);
index.put(kGram, newList);
}
}
Clock c = Clock.systemDefaultZone();
long timestart = c.millis();
System.out.println(token);
term2KGrams.put(token, kgrams);
long timestop = c.millis();
System.out.printf("time taken to put: %d\n", timestop-timestart);
System.out.print("put ");
System.out.println(kgrams);
System.out.println();
}
}
The insertion into the HashMap happens on the rows term2KGrams.put(token, kgrams); (There are 2 of them in the code snippet). When indexing, everything works fine until things suddenly, at 15 000 indexed files, go bad. Everything slows down immensely, and the program doesn't finish in a reasonable time, if at all.
To try to understand this problem, I've added some prints at the end of the function. This is the output they generate:
http://soccer.org
time taken to put: 0
put [.or, //s, /so, ://, ^ht, cce, cer, er., htt, occ, org, p:/, r.o, rg$, soc, tp:, ttp]
aysos
time taken to put: 0
put [^ay, ays, os$, sos, yso]
http://www.davisayso.org/contacts.htm
time taken to put: 0
put [.da, .ht, .or, //w, /co, /ww, ://, ^ht, act, avi, ays, con, cts, dav, g/c, htm, htt, isa, nta, o.o, ont, org, p:/, rg/, s.h, say, so., tac, tm$, tp:, ts., ttp, vis, w.d, ww., www, yso]
playsoccer
time taken to put: 0
put [^pl, ays, cce, cer, er$, lay, occ, pla, soc, yso]
This looks fine to me: the put doesn't seem to take long, and the k-grams (in this case trigrams) are correct.
But there is strange behaviour in the pace at which my computer prints this information. In the beginning, everything prints at a very high speed. But at around 15 000 files the pace drops, and my computer starts printing only a few lines at a time, which of course means that indexing the remaining 2000 files of the corpus will take an eternity.
Another interesting thing I observed was when doing a keyboard interrupt (ctrl+c) after it had been printing erratically and slowly as described for a while. It gave me this message:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.lang.StringLatin1.newString(StringLatin1.java:549)
sahandzarrinkoub#Sahands-MBP:~/Documents/Programming/Information Retrieval/lab3 2$ sh compile_all.sh
Note: ir/PersistentHashedIndex.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Does this mean I'm out of memory? Is that the issue? If so, that's surprising, because I've been storing quite a lot of things in memory before, such as a HashMap containing the document ID's of every single word in the corpus, a HashMap containing every single word where every single k-gram appears, etc.
Please let me know what you think and what I can do to fix this problem.
To understand this, you must first understand that Java does not let its heap grow indefinitely. By default, the JVM is configured with an initial heap size and a maximum heap size. When an allocation would exceed the maximum heap size, you get an OutOfMemoryError.
You can change the initial and maximum heap size for your execution with the JVM options -Xms and -Xmx respectively. An example for an execution that starts with 2 GB and may grow to at most 4 GB would be
java -Xms2g -Xmx4g ...
You can find more options on the man page for java.
Before changing the heap memory, however, take a close look at your system resources, especially whether your system starts swapping. If your system swaps, a larger heap size may let the program run longer, but with equally bad performance. The only thing possible then would be to optimize your program in order to use less memory or to upgrade the RAM of your machine.
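To see which limits a given run actually has, the JVM can be asked directly through java.lang.Runtime. This is a small sketch only; the class name HeapInfo and the helper toMiB are made up for illustration.

```java
class HeapInfo {

    // Hypothetical helper: converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();          // upper limit the heap may grow to (set by -Xmx)
        long committed = rt.totalMemory();  // heap currently reserved by the JVM
        long used = committed - rt.freeMemory();
        System.out.printf("max=%d MiB, committed=%d MiB, used=%d MiB%n",
                toMiB(max), toMiB(committed), toMiB(used));
    }
}
```

Running it once with and once without -Xmx4g shows whether the option took effect. Printing the used value periodically during indexing would also show how close the program is to the limit before it slows down.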
Say I want to detect the 'of' in text typed by the user.
I am making an app, so it won't end up in this dumb console output (this is only an example).
import java.util.*;
Scanner input = new Scanner(System.in);
String text = input.nextLine();
int check = 0;
for(int i = 0; i < text.length(); i++){
if(text.substring(i, i + 2) .equals("of")){
check = 0;
}
}
If the user enters abofd, it correctly recognises the of at positions 2-4.
But when i is 4, it checks positions 4-6, and since position 6 does not exist, it throws an error.
I know you are thinking I should set i < text.length() - 1 at line 5, but my original code needs to run until the end!
It is always a good idea to validate input coming from a user, but you are very much overcomplicating things! Even worse, you wrote down outright wrong code:
String text = input.nextLine();
int check = 0;
for(int i = 0; i < text.length(); i++){
if(text.substring(i, i + 2) .equals("of")){
The above can't work! You see, i iterates from 0 up to text.length() - 1, but then you use i + 2 as the end index of a substring of text. As soon as i + 2 is larger than the length, substring throws (so: say hello to your first StringIndexOutOfBoundsException).
Instead, you can do things like:
if (text.contains("of")) {
or
if (text.indexOf("of") >= 0) {
to find out if your string contains "of".
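If the loop really has to visit every index up to the end, String.startsWith(prefix, offset) is one way to test each position without risking an exception, because it simply returns false when the prefix no longer fits. A minimal sketch, with the class name FindOf and both helper names chosen only for illustration:

```java
class FindOf {

    // Index of the first "of" in text, or -1 if there is none.
    static int firstOf(String text) {
        return text.indexOf("of");
    }

    // Counts occurrences of "of" while still looping over every index.
    // startsWith(prefix, offset) returns false near the end of the string
    // instead of throwing, so no special loop bound is needed.
    static int countOf(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.startsWith("of", i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(firstOf("abofd")); // prints 2
        System.out.println(countOf("abofd")); // prints 1
    }
}
```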
And to answer your question in the title: absolutely yes. Programming is about being a good craftsman to a very large degree. A good craftsman keeps all his tools and materials in order. He doesn't allow for mess, waste, ...
So, long story short: from day one, to all eternity: when writing code, you strive for a zero tolerance policy: no compiler errors, no warnings, nothing in your code that doesn't belong there!
I'm relatively new to Java programming and I've just started learning recursion, but I can't seem to figure out how this method works in my head.
private static int mystery(int w) {
{
if (w < 0) return 0;
int x = mystery (w-2);
return w - x;
}
}
Whenever a variable like 100 is put in, it outputs 50. When 200 is input, it outputs 100. When 2 is input, it outputs 2. When 25 is input, 13 is output. I'm not sure how this method works, and I'm trying to wrap my head around it.
The way I currently view it, if you put in 100, it'll bypass the first return statement since it is greater than 0.
when it gets to the second line, it'll do 100-2, which brings in 98, then goes to the third line and does 100 - 98 = 2. Which is then returned to the original call.
I know I'm messing up on the second line of the method where the mystery (w-2) is. I assume it would bring back the result of w-2 to the beginning of the method again, and it would continue to do the method over and over again until w is smaller than 0, which should output 0 again regardless of the answer. But that's not what happens, and I don't know why.
Can anyone explain what is going on here?
What you are missing is that on the second line it doesn't just do w - 2, but calls itself with w - 2. It doesn't go further until the call returns. And the second call calls itself if w isn't < 0 and so on until you reach value lower than 0 and then return. The execution will go like this, if you visualize it:
mystery(10)
> skip first line
> x = mystery(8)
  > skip first line
  > x = mystery(6)
    > skip first line
    > x = mystery(4)
      > skip first line
      > x = mystery(2)
        > skip first line
        > x = mystery(0)
          > skip first line
          > x = mystery(-2)
            > return 0
          > return 0 - 0 (0)
        > return 2 - 0 (2)
      > return 4 - 2 (2)
    > return 6 - 2 (4)
  > return 8 - 4 (4)
> return 10 - 4 (6)
With example of w = 10. I hope you understand it better now.
private static int mystery(int w) {
{
if (w < 0) return 0;
int x = mystery (w-2);
return w - x;
}
}
Let's imagine that we call mystery(3). What happens? w < 0 is false, so we don't return 0. In the next line, we call some function called mystery using the value 3-2=1 as its argument.
Despite the fact that this function we've called happens to be the same one we've just called, it's still an ordinary function call, and it returns a value. It does this by calling the function called mystery, this time using the value -1 as the argument. And this time w<0 is true, so we just return 0. Now we're back in the second call to mystery, and we've set x = 0. So that call returns w - 0 = 1. That puts us back in the first call, and now x = 1, so we return w-x = 3-1 = 2.
You might want to take a few minutes and work through this using w=4 and see what you get - this will help you understand how the recursive calls work.
After you've done this, I suggest you add a print statement or two in the function to tell you where you are and what's happening, and that'll also help - but do it on paper first.
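For the print-statement step, one possible instrumented version is sketched below. The extra depth parameter and the class name MysteryTrace are additions for illustration only; they indent the output so that each nested call is visibly inside its caller.

```java
class MysteryTrace {

    // Same logic as mystery(w), plus a hypothetical depth argument that is
    // used only to indent the trace output.
    static int mystery(int w, int depth) {
        String indent = "  ".repeat(depth); // String.repeat needs Java 11 or later
        System.out.println(indent + "mystery(" + w + ")");
        if (w < 0) {
            System.out.println(indent + "return 0");
            return 0;
        }
        int x = mystery(w - 2, depth + 1); // waits here until the inner call returns
        System.out.println(indent + "return " + w + " - " + x + " = " + (w - x));
        return w - x;
    }

    public static void main(String[] args) {
        int result = mystery(4, 0);
        System.out.println("result: " + result);
    }
}
```

Comparing this output with the hand-worked w=4 trace is a quick way to check the paper version.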
The two given answers are excellent. Both focus on how to get a grasp of what recursion is. The problem with recursion is that it feels so unnatural to anyone who does not yet know what recursion is, or does not know someone who does. It's like a snake eating itself again and again.
The best way to understand recursion is to write down the calls to a recursive method, noting the current state when each call is made, and writing the result back after the call returns. You stack up the calls, and that is also the way to avoid using recursion at all.
So do not try too hard to understand recursion at first; focus on the program flow instead. Once you have seen enough recursive code, it will come to you.
I'm studying for my computer science final and am going back over some of the things that I never quite grasped when we went over them in class. The main thing being recursion. I think I've got the hang of the simple recursion example but am trying to work through one that was on a previous exam and am having trouble figuring out how it should be done.
Here is the question:
Texas numbers (Tx(n)) are defined as follows for non-negative numbers (assume true):
Tx(n) = 10 if n is 0
Tx(n) = 5 if n is 1
Tx(n) = 2*(Tx(n-1) + Tx(n-2)) if n >= 2
We are then to write the recursive function for Texas numbers. After making some corrections following the test, here's what I've come up with. I think it's right, but I'm not 100% sure.
public int Tx(int n) {
if(n == 0)
return 10;
else if (n == 1)
return 5;
else
return 2*(Tx(n-1) + Tx(n-2));
}
Then we are asked to compute the value of Tx(5). This is where I'm stuck. If the return statement for the else was simply n-1, I think I'd be able to figure it out, but the n-1 + n-2 is completely throwing me off.
Can anyone explain how this would work, or share some links that have similar examples. I have tried looking this up online and in my textbook but the examples I've found are either so advanced that I have no clue what's going on, or they only deal with something like return n-1, which I already know how to do.
Let's start with Tx(2). n > 1, so we have 2*(Tx(n-1) + Tx(n-2)) which is 2*(Tx(1) + Tx(0)).
But we already know Tx(1) and Tx(0)! So just substitute them in and you get 2*(5 + 10) -> 30. Great, so now we know Tx(2).
What about Tx(3)? 2*(Tx(2) + Tx(1)). Nice, we already know these too :) Again, just fill them in to get 2*(30 + 5) -> 70.
You can work forwards to get to Tx(5).
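Working forwards can also be written as a loop, which makes it easy to check the hand calculation. This is a sketch under the definition from the question; the class name TexasForward and the variable names are illustrative only.

```java
class TexasForward {

    // Builds Tx(n) bottom-up from the two base cases Tx(0) = 10 and Tx(1) = 5.
    static int tx(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("Tx is only defined for n >= 0");
        }
        if (n == 0) return 10;
        if (n == 1) return 5;
        int prev2 = 10; // Tx(i - 2)
        int prev1 = 5;  // Tx(i - 1)
        for (int i = 2; i <= n; i++) {
            int current = 2 * (prev1 + prev2);
            prev2 = prev1;
            prev1 = current;
        }
        return prev1;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 5; n++) {
            System.out.println("Tx(" + n + ") = " + tx(n));
        }
    }
}
```

The loop produces 10, 5, 30, 70, 200, 540 for n = 0 to 5, so Tx(4) = 2*(70 + 30) = 200 and Tx(5) = 2*(200 + 70) = 540.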
Your code is logically correct, you should just be using == to test equality, a single = is for assignment.
When you run your method, it will work backwards and solve smaller and smaller subproblems until it gets to a point where the answer is known, these are your base cases.
Tx(3)
= 2 * (Tx(2) + Tx(1))
= 2 * (2 * (Tx(1) + Tx(0)) + 5)
= 2 * (2 * (5 + 10) + 5)
= 70
In order for recursion to work, whatever you are doing each time to break the problem down into smaller problems needs to make some progress towards the base case. If it doesn't, you will just infinitely recurse until your computer runs out of space to store all of the repeated calls to the same function.
public int Tx(int n) {
if(n == 0)
return 10;
else
return Tx(n+1); // n will never reach 0!
}
Tx(1) becomes Tx(2) -> Tx(3) -> Tx(4) -> Tx(5) etc.
Your implementation is good, only one minor mistake - in the conditions you should replace = with == - it's not an assignment - it's a comparison.
By the way, what would you expect your method to return for Tx(-1) ?
You have implemented it right just change = with ==.
If you want to further reduce the time complexity, you can store the results in an array shared across the calls so that your function does not compute the result for the same number again and again. This will only save you noticeable time for large computations.
You can use something like this.
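public int tx(int n, int[] arr) {
    // arr must have a length of at least n + 1 and start out filled with zeros;
    // 0 marks "not computed yet", which is safe because no Texas number is 0.
    if (arr[n] == 0) {
        if (n == 0) {
            arr[n] = 10; // base case Tx(0)
        } else if (n == 1) {
            arr[n] = 5; // base case Tx(1)
        } else {
            arr[n] = 2 * (tx(n - 1, arr) + tx(n - 2, arr));
        }
    }
    return arr[n];
}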
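A self-contained variant of the same caching idea is sketched below, using a map instead of a caller-supplied array and the base cases from the question's definition (Tx(0) = 10, Tx(1) = 5). The class name TexasMemo and the CACHE field are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class TexasMemo {

    // Remembers every Tx(n) computed so far so that it is calculated only once.
    private static final Map<Integer, Long> CACHE = new HashMap<>();

    static long tx(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("Tx is only defined for n >= 0");
        }
        if (n == 0) return 10;
        if (n == 1) return 5;
        Long cached = CACHE.get(n);
        if (cached != null) {
            return cached; // already computed earlier
        }
        long value = 2 * (tx(n - 1) + tx(n - 2));
        CACHE.put(n, value);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(tx(5));  // prints 540
        System.out.println(tx(30)); // quick thanks to the cache
    }
}
```

The plain recursive version recomputes the same subproblems many times, so its number of calls grows exponentially with n; with the cache, each value is computed once.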
Whenever you ask the computer for the value of Tx(5), it calls the recursive function, and the program executes the else part because n = 5.
In the else part, 2*(Tx(n-1)+Tx(n-2)) is evaluated, which is 2*(Tx(4)+Tx(3)).
Expanding one more level gives 2*((2*(Tx(3)+Tx(2)))+(2*(Tx(2)+Tx(1)))). The expansion continues until n becomes 0 or 1.
I am making a lottery program that asks the user whether they would like a quick pick ticket. The numbers on the ticket would of course be random, since it is a quick pick: the first four numbers range from 0-9, while the fifth number only ranges from 0-4. I am trying to ask them to enter either "1" for no or "2" for yes; if they don't want one, the program should skip this step. But I am doing the boolean part incorrectly. Could someone help me out?
Here is an example
System.out.println("Do you want Quick pick, 1 for no or 2 for yes? The first four numbers is from a separate set of 0 to 9 and the fifth number is from a set of 0 to 4.");
QuickPick=keyboard.nextInt();
if((QuickPick==1)){
return false;
}
if((QuickPick==2)){
return true;
int n = (int)(Math.random()*9+0);
System.out.println("Your QuickPick numbers are: " + kickerNumbers + kickerPowerball);
}
I still haven't gotten around to making the line of code for the final number of 0-4, just the first four numbers, so I haven't forgotten that.
Your code for case 2 immediately does a return true; which ends the method (I assume this is in a method) right then and there. Your other lines don't get executed at all.
Consider using a switch() statement here, it'll make it easier to read:
switch(QuickPick)
{
    case 1:
        return false;
    case 2:
        int n = (int)(Math.random()*9+0); // Why is n here? You don't do anything with it?
        System.out.println("Your QuickPick numbers are: " + kickerNumbers + kickerPowerball);
        return true;
    default:
        // Uh oh - someone did something bad maybe just return false?
        return false;
}
Also, your code for case 2 is definitely wrong: you need to generate a total of five numbers, using bounds 0-9 for the first four and 0-4 for the last one. You'll want to use Java's Random class to do this (not Math.random), something like:
Random rand = new Random();
int somethingRandom = rand.nextInt(10);
// Will give you an integer value where 0 <= val < 10
// You can call rand.nextInt as many times as you want
To avoid doing your homework for you -- I'll follow the typical CS textbook line and say "Implementation left as an exercise."
The code after return true will not be executed - you need to put that prior to the return statement
Like Marvo said, you dropped a brace in your if.
But you also have faulty logic. I'm not quite sure what the purpose of the method you're in is (that returns a boolean value). But your last few lines will never be reached unless the user types in something like 3 or 42.
Assuming the method is supposed to a) Ask if the user wants a Quick Pick b) Calculate the Quick Pick, if desired c) Return true/false depending on whether the Quick Pick happened or not, you should have:
public boolean doQuickPick()
{
    System.out.println("Do you want Quick pick, 1 for no or 2 for yes? The first four numbers is from a separate set of 0 to 9 and the fifth number is from a set of 0 to 4.");
    QuickPick=keyboard.nextInt();
    if((QuickPick==1)){
        return false;
    }
    if((QuickPick==2)){
        int n = (int)(Math.random()*9+0);
        System.out.println("Your QuickPick numbers are: " + kickerNumbers + kickerPowerball);
        return true;
    }
    // Any other input: no Quick Pick happened. Without this return the method does not compile.
    return false;
}
As a separate issue, it'd be much better style to break that into several methods. boolean yesNoPrompt(String message), generateQuickPick(), etc.
Your question is kind of unclear, so I'm afraid I can't be much more help than that. Do post any clarifications / further questions if you have them.
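The split into smaller methods suggested above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class name QuickPickSketch, the helper wantsQuickPick, the int[] return type of generateQuickPick, and treating any answer other than 2 as "no".

```java
import java.util.Arrays;
import java.util.Random;
import java.util.Scanner;

class QuickPickSketch {

    // Maps the menu answer to a boolean: only 2 means "yes" (assumption).
    static boolean wantsQuickPick(int choice) {
        return choice == 2;
    }

    // Draws four numbers in 0..9 followed by one number in 0..4.
    static int[] generateQuickPick(Random rand) {
        int[] ticket = new int[5];
        for (int i = 0; i < 4; i++) {
            ticket[i] = rand.nextInt(10); // 0 <= value < 10
        }
        ticket[4] = rand.nextInt(5);      // 0 <= value < 5
        return ticket;
    }

    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);
        System.out.println("Do you want Quick pick, 1 for no or 2 for yes?");
        // Treat missing or non-numeric input as "no" instead of crashing.
        int choice = keyboard.hasNextInt() ? keyboard.nextInt() : 1;
        if (wantsQuickPick(choice)) {
            int[] ticket = generateQuickPick(new Random());
            System.out.println("Your QuickPick numbers are: " + Arrays.toString(ticket));
        }
    }
}
```

Keeping the prompt, the decision, and the number generation apart means the number generation can be tested with a seeded Random without any keyboard input.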
if((QuickPick==2)){
return true;
int n = (int)(Math.random()*9+0);
System.out.println("Your QuickPick numbers are: " + kickerNumbers + kickerPowerball);
}
In the above copied code from your question, I see that you will be getting compilation errors in your IDE. Your IDE will complain about "Unreachable Code" for the line that is just below the return statement. So, you need to put the return statement at the end of the if block.