Using a clock while loop to read files in a while loop

Currently taking my Data Structures class and we're to employ a queue in our next program.
We are given an input file such as this one:
10 324 Boots 32.33
11 365 Gloves 33.33
12 384 Sweater 36.33
13 414 Blouse 35.33
I am to read the first int (which is a unit of time) and use that as a reference for my clock that keeps running in the background.
I did something along these lines:
Scanner infp = new Scanner(new File(FILE));
while (busy) {
clock = 0;
clock += clockCount++;
while (infp.hasNext()) {
timeEntered = infp.nextInt();
infp.nextLine();
System.out.println(timeEntered);
busy = true;
if (timeEntered == clock) {
itemNum = infp.nextInt();
type = infp.nextLine();
itemPrice = infp.nextDouble();
}
}
}
The problem is that when I run it I get an InputMismatchException. I'm aware that you need to skip the carriage return before reading a String, which is what I believe I am doing.
I have no clue where to go from here.

So given these columns:
10 324 Boots 32.33
11 365 Gloves 33.33
12 384 Sweater 36.33
13 414 Blouse 35.33
For each line, you read the first column into timeEntered.
And then you call infp.nextLine(), and that's the mistake.
When you call nextLine, the scanner consumes everything still unread on the current line, up to and including the line break.
That means you can no longer read the other column values.
But you need them. So don't call nextLine while you still want to process values on a line. Call it afterwards.
You have the same kind of problem again later, when you read type and itemPrice: type = infp.nextLine() swallows the rest of the line, price included, so read the single-word type with next() instead.
Replace the while (infp.hasNext()) loop with this:
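while (infp.hasNextLine()) {
    int timeEntered = infp.nextInt();
    System.out.println(timeEntered);
    busy = true;
    if (timeEntered == clock) {
        itemNum = infp.nextInt();
        type = infp.next();            // one whitespace-delimited word, e.g. "Boots"
        itemPrice = infp.nextDouble();
    }
    // Discard whatever is left on this line. The guard matters on the last
    // line, which may not end with a line break.
    if (infp.hasNextLine()) {
        infp.nextLine();
    }
}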
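An alternative that sidesteps the nextLine bookkeeping altogether is to read each row as a whole line and split it into columns. The following is only a minimal sketch, not the asker's program: the class name InventoryLineDemo, the Item holder and the parse method are hypothetical names introduced for illustration, and it assumes every non-blank row has exactly the four whitespace-separated columns shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class InventoryLineDemo {

    // Hypothetical holder for one row: time, item number, type, price.
    static final class Item {
        final int timeEntered;
        final int itemNum;
        final String type;
        final double price;

        Item(int timeEntered, int itemNum, String type, double price) {
            this.timeEntered = timeEntered;
            this.itemNum = itemNum;
            this.type = type;
            this.price = price;
        }
    }

    // Reads the text line by line and splits each line on whitespace.
    // Assumes four columns per non-blank line; blank lines are skipped.
    static List<Item> parse(String text) {
        List<Item> items = new ArrayList<>();
        try (Scanner lines = new Scanner(text)) {
            while (lines.hasNextLine()) {
                String line = lines.nextLine().trim();
                if (line.isEmpty()) {
                    continue;
                }
                String[] cols = line.split("\\s+");
                items.add(new Item(
                        Integer.parseInt(cols[0]),
                        Integer.parseInt(cols[1]),
                        cols[2],
                        Double.parseDouble(cols[3])));
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String sample = "10 324 Boots 32.33\n11 365 Gloves 33.33\n";
        for (Item item : parse(sample)) {
            System.out.println(item.timeEntered + " " + item.itemNum
                    + " " + item.type + " " + item.price);
        }
    }
}
```

Because every row is consumed with a single nextLine call, there is no leftover input to skip, and a missing line break at the end of the file needs no special handling. To read from the file instead of a String, the same loop should work with new Scanner(new File(FILE)).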

Related

Java program slows down abruptly when indexing corpus for k-grams

I have a problem which is puzzling me. I'm indexing a corpus (17 000 files) of text files, and while doing this, I'm also storing all the k-grams (k-long parts of words) for each word in a HashMap to be used later:
public void insert( String token ) {
//For example, car should result in "^c", "ca", "ar" and "r$" for a 2-gram index
// Check if token has already been seen. if it has, all the
// k-grams for it have already been added.
if (term2id.get(token) != null) {
return;
}
id2term.put(++lastTermID, token);
term2id.put(token, lastTermID);
// is word long enough? for example, "a" can be bigrammed and trigrammed but not four-grammed.
// K must be <= token.length + 2. "ab". K must be <= 4
List<KGramPostingsEntry> postings = null;
if(K > token.length() + 2) {
return;
}else if(K == token.length() + 2) {
// insert the one K-gram "^<String token>$" into index
String kgram = "^"+token+"$";
postings = index.get(kgram);
SortedSet<String> kgrams = new TreeSet<String>();
kgrams.add(kgram);
term2KGrams.put(token, kgrams);
if (postings == null) {
KGramPostingsEntry newEntry = new KGramPostingsEntry(lastTermID);
ArrayList<KGramPostingsEntry> newList = new ArrayList<KGramPostingsEntry>();
newList.add(newEntry);
index.put("^"+token+"$", newList);
}
// No need to do anything if the posting already exists, so no else clause. There is only one possible term in this case
// Return since we are done
return;
}else {
// We get here if there is more than one k-gram in our term
// insert all k-grams in token into index
int start = 0;
int end = start+K;
//add ^ and $ to token.
String wrappedToken = "^"+token+"$";
int noOfKGrams = wrappedToken.length() - end + 1;
// get K-Grams
String kGram;
int startCurr, endCurr;
SortedSet<String> kgrams = new TreeSet<String>();
for (int i=0; i<noOfKGrams; i++) {
startCurr = start + i;
endCurr = end + i;
kGram = wrappedToken.substring(startCurr, endCurr);
kgrams.add(kGram);
postings = index.get(kGram);
KGramPostingsEntry newEntry = new KGramPostingsEntry(lastTermID);
// if this k-gram has been seen before
if (postings != null) {
// Add this token to the existing postingsList.
// We can be sure that the list doesn't contain the token
// already, else we would previously have terminated the
// execution of this function.
int lastTermInPostings = postings.get(postings.size()-1).tokenID;
if (lastTermID == lastTermInPostings) {
continue;
}
postings.add(newEntry);
index.put(kGram, postings);
}
// if this k-gram has not been seen before
else {
ArrayList<KGramPostingsEntry> newList = new ArrayList<KGramPostingsEntry>();
newList.add(newEntry);
index.put(kGram, newList);
}
}
Clock c = Clock.systemDefaultZone();
long timestart = c.millis();
System.out.println(token);
term2KGrams.put(token, kgrams);
long timestop = c.millis();
System.out.printf("time taken to put: %d\n", timestop-timestart);
System.out.print("put ");
System.out.println(kgrams);
System.out.println();
}
}
The insertion into the HashMap happens on the lines term2KGrams.put(token, kgrams); (there are two of them in the code snippet). When indexing, everything works fine until, at about 15 000 indexed files, things suddenly go bad. Everything slows down immensely, and the program doesn't finish in a reasonable time, if at all.
To try to understand this problem, I've added some prints at the end of the function. This is the output they generate:
http://soccer.org
time taken to put: 0
put [.or, //s, /so, ://, ^ht, cce, cer, er., htt, occ, org, p:/, r.o, rg$, soc, tp:, ttp]
aysos
time taken to put: 0
put [^ay, ays, os$, sos, yso]
http://www.davisayso.org/contacts.htm
time taken to put: 0
put [.da, .ht, .or, //w, /co, /ww, ://, ^ht, act, avi, ays, con, cts, dav, g/c, htm, htt, isa, nta, o.o, ont, org, p:/, rg/, s.h, say, so., tac, tm$, tp:, ts., ttp, vis, w.d, ww., www, yso]
playsoccer
time taken to put: 0
put [^pl, ays, cce, cer, er$, lay, occ, pla, soc, yso]
This looks fine to me: the put doesn't seem to take long, and the k-grams (in this case trigrams) are correct.
But one can see strange behaviour in the pace at which my computer prints this information. In the beginning, everything prints at very high speed. But at 15 000 files that stops, and my computer starts printing only a few lines at a time, which of course means that indexing the remaining 2000 files of the corpus will take an eternity.
Another interesting thing I observed was what happened when I did a keyboard interrupt (ctrl+c) after it had been printing slowly and erratically for a while. It gave me this message:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.lang.StringLatin1.newString(StringLatin1.java:549)
sahandzarrinkoub@Sahands-MBP:~/Documents/Programming/Information Retrieval/lab3 2$ sh compile_all.sh
Note: ir/PersistentHashedIndex.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Does this mean I'm out of memory? Is that the issue? If so, that's surprising, because I've been storing quite a lot of things in memory before, such as a HashMap containing the document IDs of every single word in the corpus, a HashMap containing every single word where every single k-gram appears, etc.
Please let me know what you think and what I can do to fix this problem.

To understand this, you must first understand that Java does not grow its heap indefinitely. By default the JVM is configured with an initial heap size and a maximum heap size. When an allocation would exceed the maximum heap size and garbage collection cannot free enough space, you get an OutOfMemoryError. The abrupt slowdown before the error is typical of a nearly full heap: the garbage collector runs more and more often while reclaiming less and less.
You can change the initial (minimum) and maximum heap size for your execution with the JVM options -Xms and -Xmx respectively. An example for an execution with at least 2 GB but at most 4 GB would be
java -Xms2g -Xmx4g ...
You can find more options on the man page for java.
Before changing the heap memory, however, take a close look at your system resources, especially whether your system starts swapping. If your system swaps, a larger heap size may let the program run longer, but with equally bad performance. The only thing possible then would be to optimize your program in order to use less memory or to upgrade the RAM of your machine.
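To check which limits the JVM is actually running with, the standard Runtime API can report them from inside the program. This is a small sketch only; the class name HeapInfoDemo and its two helper methods are made up for illustration, and the numbers depend entirely on the machine and on the -Xms/-Xmx options in effect.

```java
public class HeapInfoDemo {

    private static final long MIB = 1024L * 1024L;

    // Upper bound the JVM will try to use for the heap (reflects -Xmx if set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently occupied: memory reserved so far minus the free part of it.
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("used heap: " + usedHeapMiB() + " MiB");
    }
}
```

Printing these two values every few thousand indexed files would show whether the used heap is creeping towards the maximum just before the slowdown starts.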

Convert Bigdecimal to Timeformat for a string

Basically, I'm using app42 to store scores for a game. However, my game uses times.
For example, if I submit a score of 0:3:85 (0 hours, 3 seconds and 85 milliseconds),
it is stored as the BigDecimal 385.
When I retrieve my score I get 385 back, and I have no idea how to convert it back to my time format.
Example:
At the moment I'm using a function to count how many digits the number has.
int getNumberOfDecimalPlaces(BigDecimal bigDecimal) {
String string = bigDecimal.toPlainString();
int index = string.length();
return index;
}
Then, to actually work it out, I'm at a loss.
I'm thinking of something along the lines of
private String ConvertScore(BigDecimal Score){
int Len = getNumberOfDecimalPlaces(Score);
String Convert = Score.toString();
String Finished;
if( Len == 1){
}
else if(Len == 2){
}
else if(Len == 3)
{
Finished
}
return Finished;
}
but honestly I can't even think how I'd do it.
Thanks

Correct me if I'm wrong, but the API you're using returns the score of the game, not the time, and you cannot convert that number back correctly, because different times can produce the same score.
For example, how would you convert 10345?
Is it 10 hours, 3 seconds and 45 milliseconds,
or
is it 1 hour, 0 seconds and 345 milliseconds, and so on?
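One way around the ambiguity is to submit a value that can be decoded in exactly one way, for instance the total number of milliseconds. The sketch below assumes the game is free to choose what number it submits as the score; the class ScoreTimeDemo and its encode and decode methods are hypothetical and are not part of the app42 API.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.util.Locale;

public class ScoreTimeDemo {

    // Encodes a time as its total number of milliseconds.
    static BigDecimal encode(long hours, long minutes, long seconds, long millis) {
        Duration d = Duration.ofHours(hours)
                .plusMinutes(minutes)
                .plusSeconds(seconds)
                .plusMillis(millis);
        return BigDecimal.valueOf(d.toMillis());
    }

    // Decodes a score produced by encode() into h:mm:ss.SSS.
    static String decode(BigDecimal score) {
        long total = score.longValueExact(); // throws if the score is not a whole number
        long hours = total / 3_600_000;
        long minutes = (total / 60_000) % 60;
        long seconds = (total / 1_000) % 60;
        long millis = total % 1_000;
        return String.format(Locale.ROOT, "%d:%02d:%02d.%03d", hours, minutes, seconds, millis);
    }

    public static void main(String[] args) {
        BigDecimal score = encode(0, 0, 3, 85);
        System.out.println(score);         // 3085
        System.out.println(decode(score)); // 0:00:03.085
    }
}
```

Since every field has a fixed weight, 10 hours 3 seconds 45 milliseconds and 1 hour 0 seconds 345 milliseconds now map to different numbers, and a smaller score still means a shorter time.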

How to assign indexes to values in CSV using different lengths. (Java)

I'm trying to align values in two different CSV files to like indexes. One CSV has a unix time stamp every second and the other every minute. Additionally, I need to mark the indexes on various lengths of the data.
So for example I would need to mark minutes 1 - 20 (and seconds 1 - 1200) with the index of 0, then minutes 21 - 25 (1260 - 1500) with index 1, minutes 26 - 42 (1560 - 2520) with index 2, etc. etc.
The length will vary depending on the data that I am using (possibly an array that holds the different values for the lengths), so I would like to do this programmatically. I'm not sure how to attack this problem. Any help would be greatly appreciated.
Edit (hopefully this helps clarify some)
I have data in one CSV that has information about audio and another that has information about bio-metric data. They will both have a column that has a unix time stamp. The data in the audio CSV is for every second and the bio-metric data is for every minute.
I want to add an additional row to the CSVs that has the same index (or key) value, so that the data has a common link between the two files. If I call index 1 I want to get the data from both files that have an index of 1.
The length of an audio file will determine where the keys start and stop. So if I had an audio file that was 2 minutes long, 2 rows in the bio-metric CSV would have a key of 0 and 120 rows of the audio CSV would have a key of 0.
If that audio file had been 3 minutes long, the first 3 rows of the bio-metric CSV would have a key of 0 and the first 180 rows of the audio CSV would have a key of 0.

You can open both files, read the first line to see what the timestamp is, then read lines from the file that has the earlier timestamp until it no longer has the earlier timestamp, or ends.
If this is by-minute.csv:
1394589660,minute 1
1394589720,minute 2
and this is by-second.csv:
1394589659,second -1
1394589660,second 0
1394589661,second 1
1394589662,second 2
1394589663,second 3
1394589664,second 4
…
1394589718,second 58
1394589719,second 59
1394589720,second 60
1394589721,second 61
then this Java code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Foo {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes both files, even if an exception is thrown
        try (BufferedReader byMinute = new BufferedReader(new FileReader("by-minute.csv"));
             BufferedReader bySecond = new BufferedReader(new FileReader("by-second.csv"))) {
            String byMinuteLine = byMinute.readLine();
            String bySecondLine = bySecond.readLine();
            while (byMinuteLine != null || bySecondLine != null) {
                /* If either file is done, print lines from the other file */
                if (byMinuteLine == null) {
                    System.out.println(indicize(bySecondLine));
                    bySecondLine = bySecond.readLine();
                } else if (bySecondLine == null) {
                    System.out.println(indicize(byMinuteLine));
                    byMinuteLine = byMinute.readLine();
                } else {
                    /* Otherwise print the earlier entry */
                    long minuteTime = getTimeStamp(byMinuteLine);
                    long secondTime = getTimeStamp(bySecondLine);
                    if (secondTime < minuteTime) {
                        System.out.println(indicize(bySecondLine));
                        bySecondLine = bySecond.readLine();
                    } else {
                        System.out.println(indicize(byMinuteLine));
                        byMinuteLine = byMinute.readLine();
                    }
                }
            }
        }
    }

    /* The timestamp is the first comma-separated column */
    static long getTimeStamp(String line) {
        return Long.parseLong(line.split(",")[0]);
    }

    /* Replace the timestamp with an index: one index per 20 seconds,
       counted from the hard-coded start time of this sample data */
    static String indicize(String line) {
        return ((getTimeStamp(line) - 1394589660) / 20)
                + line.substring(line.indexOf(","));
    }
}
will print out:
0,second -1
0,minute 1
0,second 0
0,second 1
0,second 2
0,second 3
0,second 4
0,second 5
0,second 6
0,second 7
0,second 8
0,second 9
…
2,second 55
2,second 56
2,second 57
2,second 58
2,second 59
3,minute 2
3,second 60
3,second 61
Now I know that’s not exactly what you’re looking for—but I hope that’s enough to get you started!
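The code above gives every index a fixed span of 20 seconds, whereas the question asks for spans of varying length. A possible way to handle that is sketched here, under the assumption that the lengths are known up front as an array of seconds; the class SegmentIndexDemo and the method indexFor are hypothetical names, and the offset is the row's timestamp minus the timestamp of the first row.

```java
public class SegmentIndexDemo {

    // Returns the index of the segment that contains the given offset
    // (in seconds from the start), or -1 if the offset lies outside all segments.
    static int indexFor(long offsetSeconds, long[] segmentLengthsSeconds) {
        if (offsetSeconds < 0) {
            return -1;
        }
        long end = 0; // exclusive end of the current segment
        for (int i = 0; i < segmentLengthsSeconds.length; i++) {
            end += segmentLengthsSeconds[i];
            if (offsetSeconds < end) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Lengths from the question: 20 minutes, 5 minutes and 17 minutes.
        long[] lengths = {20 * 60, 5 * 60, 17 * 60};
        System.out.println(indexFor(0, lengths));    // 0
        System.out.println(indexFor(1199, lengths)); // 0
        System.out.println(indexFor(1200, lengths)); // 1
        System.out.println(indexFor(1500, lengths)); // 2
        System.out.println(indexFor(2520, lengths)); // -1
    }
}
```

Because the lookup works on seconds, the same method can serve both files: the per-minute rows simply have offsets that are multiples of 60.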

Array and Java string error: [Ljava.lang.String;@19c42c4b

I've created a program that allows a user to enter in Journal entries (up to 7 days) and then allows a person to call up one of those days after they enter in an entry. Unfortunately, this has left me with some weird string error that I'm not familiar with.
Code as follows:
public class eDiary{
public static void main (String args[]){
int[] days = new int[7];//get our days
days[0] = 1;//start with 1 and not 0
days[1] = 2;
days[2] = 3;
days[3] = 4;
days[4] = 5;
days[5] = 6;
days[6] = 7;
String [] events = new String[7];//events for the days
int i = 0;
//asks for input and counts
for(i=0; i<7; i++){
String event = Console.readString("Tell me the major event of day " + days[i] + "\n");
events[i] = event;
}
int journal_entry = Console.readInt("Enter what day you want to hear or Enter 0 to stop \n");
while (journal_entry != 0) {
System.out.println(events);
journal_entry = Console.readInt("Enter what day you want to hear or Enter 0 to stop \n");
//get r dun!
The input and output:
Tell me the major event of day 1
one
Tell me the major event of day 2
two
Tell me the major event of day 3
thre
Tell me the major event of day 4
four
Tell me the major event of day 5
five
Tell me the major event of day 6
six
Tell me the major event of day 7
seven
Enter what day you want to hear or Enter 0 to stop
1
[Ljava.lang.String;@10181f5b
Enter what day you want to hear or Enter 0 to stop
0
Howdy y'all!
Thanks a lot for the quick responses. One thing I'm seeing now is that after replacing
System.out.println(events);
with
System.out.println(events[journal_entry]);
the program gives me output such as this:
Tell me the major event of day 1
first day
Tell me the major event of day 2
second day
Tell me the major event of day 3
third day
Tell me the major event of day 4
fourth day
Tell me the major event of day 5
fifth day
Tell me the major event of day 6
sixth day
Tell me the major event of day 7
seventh day
Enter what day you want to hear or Enter 0 to stop
1//the day im asking for
second day//spitting out the next day's entry instead of the first day's entry
Enter what day you want to hear or Enter 0 to stop
0//this is me stopping it

It's not an error.
System.out.println(events);
In this line you are trying to print the array, but that statement doesn't print the array contents. It only prints the object's class name followed by its hash code.
To print the array contents you have to use
System.out.println(Arrays.toString(events));
Or, if you prefer, loop through the array and print its values.

The [Ljava.lang.String;@10181f5b text is what you get when you explicitly or implicitly call Object.toString() and the target object's class doesn't override toString(). In this case, the issue is that Java array types do not override toString().
If you want to output an array, use java.util.Arrays.toString(...) to convert it to a String, then output that.
But in this case, you actually need to output a specific entry, not the entire array. The fix is to change
System.out.println(events);
to
System.out.println(events[journal_entry - 1]);
The - 1 is needed because the days are numbered from 1 while array indexes start at 0.
For the record, the text above consists of the class's internal name ("[Ljava.lang.String;"), an @ separator, and the object's identity hash code (in hexadecimal).
This is not a "weird error string".
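The three ways of printing discussed in these answers can be compared side by side. This is an illustrative sketch rather than the asker's program: ArrayPrintDemo and the entryForDay helper are hypothetical names, and the sample entries are made up.

```java
import java.util.Arrays;

public class ArrayPrintDemo {

    // Returns the entry for a 1-based day number,
    // or a short message when the day is out of range.
    static String entryForDay(String[] events, int day) {
        if (day < 1 || day > events.length) {
            return "No entry for day " + day;
        }
        return events[day - 1]; // day 1 lives at index 0
    }

    public static void main(String[] args) {
        String[] events = {"first day", "second day", "third day"};
        System.out.println(events);                  // class name, '@', hash code
        System.out.println(Arrays.toString(events)); // [first day, second day, third day]
        System.out.println(entryForDay(events, 1));  // first day
        System.out.println(entryForDay(events, 9));  // No entry for day 9
    }
}
```

The range check in entryForDay avoids an ArrayIndexOutOfBoundsException when the user types a day that does not exist.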

The output you are getting is because:
In Java, every object has a toString() method. The default implementation returns the class name, followed by @ and then the hash code in hexadecimal.
You should use Arrays#toString(), which is implemented this way (shown here for the int[] overload):
public static String toString(int[] a) {
    if (a == null)
        return "null";
    int iMax = a.length - 1;
    if (iMax == -1)
        return "[]";

    StringBuilder b = new StringBuilder();
    b.append('[');
    for (int i = 0; ; i++) {
        b.append(a[i]);
        if (i == iMax)
            return b.append(']').toString();
        b.append(", ");
    }
}
This will help you to better understand arrays.
Of course you can manually loop on the array and print it:
for(String event: events) {
System.out.println(event);
}

There is nothing wrong and that's not an error message.
Instead, it's the string representation of an array. Consider this line:
System.out.println(events);
You are printing the whole array, so you get that representation, which admittedly is a bit ugly. You want to print only one element, the one corresponding to the selected day. Since the days are numbered from 1 and array indexes start at 0, use:
System.out.println(events[journal_entry - 1]);
And check first that the day is within bounds.

This is not an error. You are printing the value of the variable events. [Ljava.lang.String;@10181f5b means that events is an array of java.lang.String, and 10181f5b is the hash code of that array object. What you want to print is events[i - 1], where i is the number of the day.

In Java, arrays are objects. You are printing the events array object itself, which is not what you want.
You need to print the entry for the chosen day. Replace
System.out.println(events);
with
System.out.println(events[journal_entry - 1]);

It won't print the answer correctly because you passed events itself to System.out.println(), and events is a reference to the whole array, not to one of its elements. Replace that line with
System.out.println(events[journal_entry - 1]);
for it to make sense. Run it again and see whether it behaves properly.

Thanks for all the responses! I was able to resolve the issue. Here's the code if anyone is curious:
// Console is the input helper class used in the question (not java.io.Console)
public static void main(String[] args) {
    int[] days = new int[7]; // get our days
    days[0] = 1; // start with 1 and not 0
    days[1] = 2;
    days[2] = 3;
    days[3] = 4;
    days[4] = 5;
    days[5] = 6;
    days[6] = 7;
    String[] events = new String[7]; // events for the days
    // asks for input and counts
    for (int i = 0; i < 7; i++) {
        String event = Console.readString("Tell me the major event of day " + days[i] + "\n");
        events[i] = event;
    }
    int journal_entry = Console.readInt("Enter what day you want to hear or Enter 0 to stop \n");
    while (journal_entry != 0) {
        // days are numbered from 1, array indexes from 0
        System.out.println("On day " + days[journal_entry - 1] + " " + events[journal_entry - 1]);
        journal_entry = Console.readInt("Enter what day you want to hear or Enter 0 to stop \n");
    }
}

Last line only partially prints in Java

I have some java code that compares two files. When it finds similar numbers on a particular line it prints that line to a new file. This seems to work for a good amount of time... Until what I believe is the last line. That line only ever gets partially printed. I THINK that it might be because of a 'break' that is later in the code, but I looked around on google and wasn't really sure if any of the answers were really relevant.
Here is some code that I think is related:
Read in files
while the line isn't null...
Parse files
Write a header
Some comparisons
while (!input.startsWith("#") && !input.startsWith("P")) {
prse = input.split("\t");//split the file by tabs
pos = prse[7];
poson = prse[8];
pos = Integer.parseInt(poson);
if (cnt < num.size()) { //if we haven't exceeded an array
if (num.get(cnt).equals(pos)) { //if the first number is the same
if (cnt2 < posstart.size()) { //if we haven't exceeded another array
end = Integer.parseInt(posend.get(cnt2)); //change to int
start = Integer.parseInt(posstart.get(cnt2));//change to int
if (pos < start) { //if it is less then the starting pos then it can't fall within
break; //so break
}
if (pos < end && pos > start) {//I am trying to see if a number falls within the range of numbers from a separate file
out1.write(input + "\n"); //If it does: This is where I am writing out the line
break; //if I remove this break the program hangs here
} else {
cnt2++; //if it wasn't the same, add
}
}
} else {
cnt++; //if it was the same move to the next one
cnt2 = 0; //reset this number
break; //go back to beginning
}
} else {
break;
}
So the code works perfectly for about 6500 lines but then it abruptly cuts off the last line:
Blah B 6 5 8 C 5 X 6
Blah A 0 1 4 C 2 X 7
Blah B 3 5 9 C 5 X 6
Blah B 0 9 4
Does anyone know what I can add to stop the final line from cutting off so suddenly? I know in BASH you could specify for it to wait... But I was confused by the java equivalents and was hoping someone could suggest one for me and help to explain it a bit better.

For the sake of having an answer (until Carl puts his up), I am going to go ahead and answer with his comment:
Did you close the output stream? Maybe you need to call the flush method. – Carl
He was right. I hadn't. Silly me.

Actually, I make that mistake a lot. I think it is because, with the garbage collector, we don't really think about memory management, so it is easy to forget to close a stream we opened or to flush buffered data to disk. That's a terrible thing to do, though.
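A common way to make sure the writer is always closed, and therefore flushed, is try-with-resources. The sketch below assumes the output goes through a buffered writer like out1 in the question; the class FlushOnCloseDemo, the method writeAndReadBack and the temporary file are made up for the demonstration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FlushOnCloseDemo {

    // Writes the lines to a temporary file and reads them back,
    // to show that nothing is lost once the writer has been closed.
    static List<String> writeAndReadBack(List<String> lines) {
        try {
            Path out = Files.createTempFile("matches", ".txt");
            try (BufferedWriter writer = Files.newBufferedWriter(out)) {
                for (String line : lines) {
                    writer.write(line);
                    writer.newLine();
                }
            } // close() runs here and flushes whatever is still in the buffer
            List<String> result = Files.readAllLines(out);
            Files.delete(out);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> written = writeAndReadBack(
                List.of("Blah B 6 5 8 C 5 X 6", "Blah B 0 9 4 C 1 X 2"));
        System.out.println(written.size() + " complete lines read back");
    }
}
```

Without the close (or an explicit flush), the tail of the output can stay in the writer's in-memory buffer when the program exits, which matches the truncated last line described in the question.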
