I am fetching ~20'000 rows from a database with JDBC in Java (Eclipse IDE).
I walk through the result set row by row (ordered by id). Every row stores the "identifier" (a unique string) of its previous and succeeding row. So for every row I check whether this chain is broken for whatever reason.
My approach below works, but it seems to work better for the first 10'000 rows than for the rest: the error count is 335 for 19'999 entries. I checked by hand whether the reported errors match reality, and at least one of them definitely does not.
Am I missing something important? Why does this happen? It almost looks like the result of parallelization or multithreading.
int i = 0;
String actualprevious = "", previous = "", next = "";
boolean first = true; // we can't check the first row, because there is nothing before it
int errors = 0;
while (rs.next())
{
    if (i % 10000 == 0) { System.out.println("Checked " + (i / 10000) + " myriads."); } // inform
    String current = rs.getString("identifier");
    if (!first && !current.equals(next)) // skip this check on the very first row
    {
        System.out.println("Current is: " + current);
        System.out.println("Expected " + next + " to be next, but " + current + " is.");
        errors++;
    } // inform
    // ignore: Document doc = Jsoup.parse(rs.getString("source"));
    next = rs.getString("next");
    if (next == null) { System.out.println("There is no next listed in row " + current + "."); errors++; } // inform
    previous = rs.getString("previous");
    if (!first && !actualprevious.equals(previous))
    {
        System.out.println("Expected " + actualprevious + " to be listed as previous, but " + previous + " was in document " + current + ".");
        errors++;
    } // inform
    actualprevious = current;
    i++;
    first = false;
}
I have a problem which is puzzling me. I'm indexing a corpus of 17 000 text files, and while doing this I'm also storing all the k-grams (substrings of length k) of each word in a HashMap for later use:
public void insert(String token) {
    // For example, "car" should result in "^c", "ca", "ar" and "r$" for a 2-gram index.
    // Check if the token has already been seen. If it has, all the
    // k-grams for it have already been added.
    if (term2id.get(token) != null) {
        return;
    }
    id2term.put(++lastTermID, token);
    term2id.put(token, lastTermID);

    // Is the word long enough? For example, "a" can be bigrammed and trigrammed but not four-grammed.
    // K must be <= token.length + 2. For "ab", K must be <= 4.
    List<KGramPostingsEntry> postings = null;
    if (K > token.length() + 2) {
        return;
    } else if (K == token.length() + 2) {
        // insert the single K-gram "^<token>$" into the index
        String kgram = "^" + token + "$";
        postings = index.get(kgram);
        SortedSet<String> kgrams = new TreeSet<String>();
        kgrams.add(kgram);
        term2KGrams.put(token, kgrams);
        if (postings == null) {
            KGramPostingsEntry newEntry = new KGramPostingsEntry(lastTermID);
            ArrayList<KGramPostingsEntry> newList = new ArrayList<KGramPostingsEntry>();
            newList.add(newEntry);
            index.put(kgram, newList);
        }
        // No need to do anything if the posting already exists, so no else clause.
        // There is only one possible term in this case.
        // Return since we are done.
        return;
    } else {
        // We get here if there is more than one k-gram in our term.
        // Insert all k-grams of the token into the index.
        int start = 0;
        int end = start + K;
        // add ^ and $ to the token
        String wrappedToken = "^" + token + "$";
        int noOfKGrams = wrappedToken.length() - end + 1;
        // get the K-grams
        String kGram;
        int startCurr, endCurr;
        SortedSet<String> kgrams = new TreeSet<String>();
        for (int i = 0; i < noOfKGrams; i++) {
            startCurr = start + i;
            endCurr = end + i;
            kGram = wrappedToken.substring(startCurr, endCurr);
            kgrams.add(kGram);
            postings = index.get(kGram);
            KGramPostingsEntry newEntry = new KGramPostingsEntry(lastTermID);
            // if this k-gram has been seen before
            if (postings != null) {
                // Add this token to the existing postings list.
                // We can be sure that the list doesn't contain the token
                // already, else we would previously have terminated the
                // execution of this function.
                int lastTermInPostings = postings.get(postings.size() - 1).tokenID;
                if (lastTermID == lastTermInPostings) {
                    continue;
                }
                postings.add(newEntry);
                index.put(kGram, postings);
            }
            // if this k-gram has not been seen before
            else {
                ArrayList<KGramPostingsEntry> newList = new ArrayList<KGramPostingsEntry>();
                newList.add(newEntry);
                index.put(kGram, newList);
            }
        }
        Clock c = Clock.systemDefaultZone();
        long timestart = c.millis();
        System.out.println(token);
        term2KGrams.put(token, kgrams);
        long timestop = c.millis();
        System.out.printf("time taken to put: %d\n", timestop - timestart);
        System.out.print("put ");
        System.out.println(kgrams);
        System.out.println();
    }
}
The insertion into the HashMap happens on the lines term2KGrams.put(token, kgrams); (there are two of them in the code snippet). When indexing, everything works fine until, at around 15 000 indexed files, things suddenly go bad. Everything slows down immensely, and the program doesn't finish in a reasonable time, if at all.
To try to understand this problem, I've added some prints at the end of the function. This is the output they generate:
http://soccer.org
time taken to put: 0
put [.or, //s, /so, ://, ^ht, cce, cer, er., htt, occ, org, p:/, r.o, rg$, soc, tp:, ttp]
aysos
time taken to put: 0
put [^ay, ays, os$, sos, yso]
http://www.davisayso.org/contacts.htm
time taken to put: 0
put [.da, .ht, .or, //w, /co, /ww, ://, ^ht, act, avi, ays, con, cts, dav, g/c, htm, htt, isa, nta, o.o, ont, org, p:/, rg/, s.h, say, so., tac, tm$, tp:, ts., ttp, vis, w.d, ww., www, yso]
playsoccer
time taken to put: 0
put [^pl, ays, cce, cer, er$, lay, occ, pla, soc, yso]
This looks fine to me: the put doesn't seem to take long, and the k-grams (in this case trigrams) are correct.
But there is strange behaviour in the pace at which my computer prints this information. In the beginning, everything prints at very high speed. At 15 000 files, that stops, and instead my computer prints only a few lines at a time, which of course means that indexing the remaining 2 000 files of the corpus will take an eternity.
Another interesting observation: when I sent a keyboard interrupt (Ctrl+C) after it had been printing erratically and slowly for a while, it gave me this message:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.lang.StringLatin1.newString(StringLatin1.java:549)
sahandzarrinkoub@Sahands-MBP:~/Documents/Programming/Information Retrieval/lab3 2$ sh compile_all.sh
Note: ir/PersistentHashedIndex.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Does this mean I'm out of memory? Is that the issue? If so, that's surprising, because I've stored quite a lot of things in memory before, such as a HashMap containing the document IDs of every single word in the corpus and a HashMap containing every single word in which every single k-gram appears, etc.
Please let me know what you think and what I can do to fix this problem.
To understand this, you must first understand that Java does not allocate memory dynamically (or, at least, not indefinitely). The JVM is by default configured to start with a minimum heap size and a maximum heap size. When an allocation would push the heap past the maximum size, you get an OutOfMemoryError.
You can change the minimum and maximum heap size for your execution with the JVM parameters -Xms and -Xmx respectively. An example for an execution with at least 2 but at most 4 GB would be
java -Xms2g -Xmx4g ...
You can find more options on the man page for java.
Before changing the heap size, however, take a close look at your system resources, especially whether your system starts swapping. If your system swaps, a larger heap size may let the program run longer, but with equally bad performance. The only options then would be to optimize your program to use less memory or to upgrade the RAM of your machine.
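If you want to confirm that heap exhaustion really is the problem before the OutOfMemoryError hits, a minimal sketch like this (using only the standard Runtime API; how often you call it is up to you) shows how close you are to the limit:

// Heap-usage probe using the standard Runtime API.
// Call it periodically (e.g. every few thousand insertions) to watch
// how close the JVM is to its -Xmx limit.
public static void printHeapUsage() {
    Runtime rt = Runtime.getRuntime();
    long used = rt.totalMemory() - rt.freeMemory(); // bytes currently held by objects
    long max = rt.maxMemory();                      // the -Xmx limit
    System.out.printf("heap: %d / %d MB (%.1f%% used)%n",
            used / (1024 * 1024), max / (1024 * 1024), 100.0 * used / max);
}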
This is not a rare question on the net, and I did some optimization work on the MySQL server to solve this problem, but got no results. I use the Maven package mysql:mysql-connector-java:6.0.6.
I am simply trying to run this code:
Connection mysqlConnection = null;
PreparedStatement valuesStatement = null;
ResultSet cursor = null;
try {
    mysqlConnection = DriverManager.getConnection(DatabaseUtils.mysqlUrl, DatabaseUtils.mysqlUser, DatabaseUtils.mysqlPassword);
    valuesStatement = mysqlConnection.prepareStatement("SELECT * FROM `test` ORDER BY `id`");
    cursor = valuesStatement.executeQuery();
    double value = 0;
    if (cursor.next())
        value = cursor.getDouble("value");
} catch (SQLException sqlEx) {
    sqlEx.printStackTrace();
} finally {
    if (cursor != null) cursor.close();
    if (valuesStatement != null) valuesStatement.close();
}
I have a lot of records in the table: about a million, and roughly a thousand new records are added every day. So I was very surprised when this simple example took 30 seconds to execute. I googled my problem and found only "use a connection pool", "tune the MySQL server", "try EXPLAIN SELECT". But I noticed that the execution time is related to the row count. So I looked into the driver's code and found that:
TextResultsetReader::read():
while (true) {
    if (row == null) {
        rows = new ResultsetRowsStatic(rowList, cdef);
        break;
    }
    if (maxRows == -1 || rowList.size() < maxRows) {
        rowList.add(row);
    }
    row = (ResultsetRow) this.protocol.read(ResultsetRow.class, trf);
}
This means that even if I want to fetch only one row, the driver fetches all queried rows and hands me the first of them. The manuals suggest using setFetchSize to fetch only n records, but it doesn't work: the driver fetches all the data anyway. Then I found that there are two row-data implementations: ResultsetRowsStatic and ResultsetRowsStreaming. The second one seems to fetch data only when I actually ask for it.
How do you use ResultsetRowsStreaming? I found the answer only in the code: the fetchSize parameter must equal -2147483648 (Integer.MIN_VALUE). I tried it and it worked! The execution time of executeQuery() is now about 0.0007 sec. That's fast enough for me.
But wait... my script still takes 30 seconds. Why? I debugged the code after the query execution. There are only two close() methods after it. What could go wrong there? And indeed, cursor.close() takes all the remaining time. I looked into the library code again and reached ResultsetRowsStreaming::close():
boolean hadMore = false;
int howMuchMore = 0;
synchronized (mutex) {
    while (this.next() != null) {
        hadMore = true;
        ++howMuchMore;
        if (howMuchMore % 100 == 0) {
            Thread.yield();
        }
    }
    if (conn != null) {
        if (!((Boolean) this.protocol.getPropertySet().getBooleanReadableProperty("clobberStreamingResults").getValue()).booleanValue()
                && ((Integer) this.protocol.getPropertySet().getIntegerReadableProperty("netTimeoutForStreamingResults").getValue()).intValue() > 0) {
            int oldValue = this.protocol.getServerSession().getServerVariable("net_write_timeout", 60);
            this.protocol.clearInputStream();
            try {
                this.protocol.sqlQueryDirect((StatementImpl) null, "SET net_write_timeout=" + oldValue,
                        (String) this.protocol.getPropertySet().getStringReadableProperty("characterEncoding").getValue(),
                        (PacketPayload) null, -1, false, (String) null, (ColumnDefinition) null,
                        (GetProfilerEventHandlerInstanceFunction) null, this.resultSetFactory);
            } catch (Exception var9) {
                throw ExceptionFactory.createException(var9.getMessage(), var9, this.exceptionInterceptor);
            }
        }
        if (((Boolean) this.protocol.getPropertySet().getBooleanReadableProperty("useUsageAdvisor").getValue()).booleanValue() && hadMore) {
            ProfilerEventHandler eventSink = ProfilerEventHandlerFactory.getInstance(conn.getSession());
            eventSink.consumeEvent(new ProfilerEventImpl(0, "", this.owner.getCurrentCatalog(), this.owner.getConnectionId(),
                    this.owner.getOwningStatementId(), -1, System.currentTimeMillis(), 0L, Constants.MILLIS_I18N,
                    (String) null, (String) null, Messages.getString("RowDataDynamic.2") + howMuchMore
                            + Messages.getString("RowDataDynamic.3") + Messages.getString("RowDataDynamic.4")
                            + Messages.getString("RowDataDynamic.5") + Messages.getString("RowDataDynamic.6")
                            + this.owner.getPointOfOrigin()));
        }
    }
}
This code unconditionally fetches all the remaining data, only to log how many records I did not fetch. Really weird. It would be justified if a logger were attached, but in my case this code spends 30 seconds counting unfetched rows and then... does nothing with the count. And I cannot fix this problem, because there is no parameter that tells the code not to count rows.
Now I don't know what to do next. The query time is far too slow for me. For comparison, the MySQL driver for PHP executes this query in 0.0004-0.001 seconds.
So, people who use mysql-connector for Java: have you had these problems? If not, could you post examples of what I should do to bypass them? Maybe you use other connectors. Please tell me what to do.
Your SQL query says
SELECT * FROM test ORDER BY id
You are, with that query, instructing your MySQL server to serialize every column of every row of your test table and send it to your Java program. So, MySQL obeys. You have a large table. So your instruction to MySQL takes time. And yes, the more rows in your table the longer it takes. This is not a problem with JDBC or the driver; it's a problem with the SQL you're using.
It seems from your sample code that you want one column -- named value -- from one row -- the first one -- in your table. You could accomplish that using this SQL statement:
SELECT value FROM test ORDER BY id LIMIT 1
If your id column is your table's primary key, this will be fast.
The whole point of SQL is to allow your tables to contain so many rows that it's unreasonable to fetch them all into your Java (or other) program in a short amount of time. That's why SQL has WHERE and LIMIT clauses.
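A minimal sketch of the Java side (reusing the DatabaseUtils constants from the question) could look like this:

// Fetch only the first row's "value" column; LIMIT 1 keeps the server
// from serializing the whole table to the client.
String sql = "SELECT `value` FROM `test` ORDER BY `id` LIMIT 1";
try (Connection conn = DriverManager.getConnection(
             DatabaseUtils.mysqlUrl, DatabaseUtils.mysqlUser, DatabaseUtils.mysqlPassword);
     PreparedStatement stmt = conn.prepareStatement(sql);
     ResultSet cursor = stmt.executeQuery()) {
    double value = 0;
    if (cursor.next()) {
        value = cursor.getDouble("value");
    }
    System.out.println("value = " + value);
} catch (SQLException sqlEx) {
    sqlEx.printStackTrace();
}

And if you genuinely do need to iterate over all rows some day, Connector/J can stream them instead of buffering the whole result: calling stmt.setFetchSize(Integer.MIN_VALUE) on a forward-only, read-only statement enables exactly the streaming result set the question discovered.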
I need to create a method which checks each element in my array to see if it passes or fails a test. Each element holds several values, such as mass, formula, and area, for one compound, and in total there are 30 compounds (so the array has 30 elements). I need an algorithm that asks whether mass < 50 and area > 5 is true.
My properties class looks like:
public void addProperty(TheProperties pro)
{
    if (listSize >= listlength)
    {
        // the array is full, so double its capacity before adding
        listlength = 2 * listlength;
        TheProperties[] newList = new TheProperties[listlength];
        System.arraycopy(proList, 0, newList, 0, proList.length);
        proList = newList;
    }
    // add the new property object in the next free position
    proList[listSize] = pro;
    listSize++;
}

public int getSize()
{
    return listSize;
}

// returns the properties at a particular position in the list, numbered from 0
public TheProperties getProperties(int pos)
{
    return proList[pos];
}
and after using my getters/setters from TheProperties, I put all the information into the array using the following:
TheProperties tp = new TheProperties();
String i = tp.getMass();
String y = tp.getArea();
// etc.
theList.addProperty(tp);
I then used the following to save the output to a file:
StringBuilder builder = new StringBuilder();
for (int i = 0; i < theList.getSize(); i++)
{
    if (theList.getProperties(i).getFormatted() != null)
    {
        builder.append(theList.getProperties(i).getFormatted());
        builder.append("\n");
    }
}
SaveFile sf = new SaveFile(this, builder.toString());
I just can't work out how to interrogate each compound individually to see whether it meets the requirements or not. Reading a file in, getting a value for each compound, and saving the result has worked, and I can write an if statement for the requirements to check against, but how do I actually check that the elements of each compound match the requirements? I am trying to word this as best I can; I am still working on my fairly poor Java skills.
I'm not entirely sure what you are after (I found your description quite hard to understand), but if you want to see whether the mass is less than 50 and the area is greater than 5, a simple if statement like the one below will do.
if (tp.getMass() < 50 && tp.getArea() > 5) {}
Although, again, you will have to instantiate tp and ensure it has been given its attributes through some sort of constructor.
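To check all 30 compounds rather than a single one, the same condition can go inside a loop over the list from the question. (This assumes getMass() and getArea() return numeric values; in the question they appear to be stored as Strings, in which case they would need parsing first.)

// Loop over the custom list and test each compound individually.
for (int i = 0; i < theList.getSize(); i++)
{
    TheProperties tp = theList.getProperties(i);
    if (tp.getMass() < 50 && tp.getArea() > 5)
    {
        System.out.println("Compound " + i + " meets the requirements");
    }
}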
Lots of ways to do this, which makes it hard to answer.
You could check at creation time, and just not even add the invalid ones to the list. That would mean you only have to loop once.
If you just want to save the output to the file, and not do anything else, I suggest you combine the reading and writing into one function.
Open up the read and the write file
while(read from file){
check value is ok
write to file
}
close both files
The advantages of doing it this way (a Java sketch follows this list) are:
You only loop through once, not three times, so it is faster
You never have to store the whole list in memory, so you can handle really large files, with thousands of elements.
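A minimal sketch of that combined read-check-write loop (uses java.io; the file names and the parseCompound helper are assumptions for illustration, and the getters are again assumed numeric):

// Read compounds line by line, keep only those that pass the check,
// and write them straight back out; the whole list never sits in memory.
try (BufferedReader in = new BufferedReader(new FileReader("compounds.txt"));
     PrintWriter out = new PrintWriter(new FileWriter("filtered.txt"))) {
    String line;
    while ((line = in.readLine()) != null) {
        TheProperties tp = parseCompound(line); // hypothetical: builds one compound from one line
        if (tp.getMass() < 50 && tp.getArea() > 5) {
            out.println(tp.getFormatted());
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}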
In case the requirements change, you can write a method that takes a Predicate<T>, a functional interface designed for such cases (functional interfaces were introduced in Java 8):
// check each element of the list against a custom condition (predicate)
public static void checkProperties(TheList list, Predicate<TheProperties> criteria) {
    for (int i = 0; i < list.getSize(); i++) {
        TheProperties tp = list.getProperties(i);
        if (!criteria.test(tp)) {
            throw new IllegalArgumentException(
                    "TheProperty at index " + i + " does not meet the specified criteria");
        }
    }
}
If you want to check if mass < 50 and area > 5, you would write:
checkProperties(theList, new Predicate<TheProperties>() {
    @Override
    public boolean test(TheProperties tp) {
        return tp.getMass() < 50 && tp.getArea() > 5;
    }
});
This can be shortened by using a lambda expression:
checkProperties(theList, (TheProperties tp) -> {
return tp.getMass() < 50 && tp.getArea() > 5;
});
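As a side note, if you would rather collect the compounds that pass than throw on the first failure, the same predicate idea works with the Java 8 stream API (java.util.stream; IntStream.range walks the custom list by index):

// Collect the compounds that meet the requirements into a new list.
List<TheProperties> passing = IntStream.range(0, theList.getSize())
        .mapToObj(theList::getProperties)
        .filter(tp -> tp.getMass() < 50 && tp.getArea() > 5)
        .collect(Collectors.toList());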
The following code reads the spreadsheet cell values with the JXL library, compares these values with the values on the page, and chooses the matching value from the combo box.
The code I have works, but the value must match exactly.
I want to improve this code so that it searches the combo box faster and selects the closest value even when it is not identical. Currently it runs through all the values slowly.
String valDesejado = tipopromocao;
String valorComboBox = "";
Select verificaOpt = new Select(driver.findElement(By.name("tipoDePromocaoPromocao")));
int tamanhoBox = verificaOpt.getOptions().size();
int variavelVerificadora1 = 0;
System.out.println("Tamanho: " + tamanhoBox);
for (int i = 0; i < tamanhoBox; i++)
{
    verificaOpt.selectByIndex(i);
    valorComboBox = verificaOpt.getFirstSelectedOption().getText().toString();
    if (valDesejado.equalsIgnoreCase(valorComboBox))
    {
        i = tamanhoBox;
        variavelVerificadora1 = 1;
    }
}
if (variavelVerificadora1 == 0)
{
    System.out.println("ALERTA: The Option " + valDesejado + " was not found in comboBox \"tipoDePromocaoPromocao\".");
}
I put some comments in the code that explain what I'm doing and correct a few things:
Instead of using an int and setting it to 0/1, use a boolean and set it to true/false.
This loop should be faster because I'm not selecting each option as I loop. You can examine the text of each option without selecting it... then once you find a match, select the match.
Use break to exit a loop instead of setting the counter to max value.
Give this code a try.
String valDesejado = tipopromocao;
boolean variavelVerificadora1 = false; // use a boolean instead of an int set to 0/1
Select verificaOpt = new Select(driver.findElement(By.name("tipoDePromocaoPromocao")));
System.out.println("Tamanho: " + verificaOpt.getOptions().size());
// as this loops, the variable 'option' contains the current loop's OPTION element
// you don't need to select an option to get its text, so this loop should be much faster
// it selects the OPTION only once the correct one is found
for (WebElement option : verificaOpt.getOptions())
{
    if (valDesejado.equalsIgnoreCase(option.getText()))
    {
        verificaOpt.selectByVisibleText(option.getText()); // select the matching OPTION
        variavelVerificadora1 = true; // set the boolean to true to indicate we found a match
        break; // exits the for loop
    }
}
if (!variavelVerificadora1) // equivalent to variavelVerificadora1 == false, i.e. "if no match was found"
{
    System.out.println("ALERTA: The Option " + valDesejado + " was not found in comboBox \"tipoDePromocaoPromocao\".");
}
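The question also asked about selecting the closest value when nothing matches exactly. As a rough sketch (simple substring containment here; a proper fuzzy match would need a string-distance measure such as Levenshtein), a fallback pass could look like this:

// Fallback: if no exact (case-insensitive) match was found, select the
// first option whose text contains the desired value.
if (!variavelVerificadora1)
{
    for (WebElement option : verificaOpt.getOptions())
    {
        if (option.getText().toLowerCase().contains(valDesejado.toLowerCase()))
        {
            verificaOpt.selectByVisibleText(option.getText());
            variavelVerificadora1 = true;
            break;
        }
    }
}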
I am running Android Studio and setting breakpoints, but the code in both branches of my if...else is apparently not being executed. That seems impossible to me: either the if or the else should be taken... right?
The code is this:
if (lastReading.isItTimeYet(Calendar.getInstance()))
{
    lastReadingReturn = lastReading.SensorReadingChanges(z_value, chkOrient, inclination, rotation);
    if (lastReadingReturn.isEmpty())
    {
        String EMPTY = "TRUE";
        // DO NOTHING
    }
    else
    {
        int stopHERE = 0;
    }
}
lastReadingReturn is a string. It is getting a value from SensorReadingChanges just fine. I don't see any errors being thrown.
I put breakpoints on both the String EMPTY = "TRUE"; line and the int stopHERE = 0; line, but neither is hit. I can stop on the line before the if, but when I try to step into or step over the next line, the debugger jumps to the first line of code outside the if clause. In other words, it just skips it.
I have run it with both conditions (i.e. the string being checked is empty and the string being checked has a value) but it doesn't matter. Neither is hit.
Here is a screenshot of my Android Studio running in the debugger:
Both your if-block and your else-block contain code that is very likely to be removed by the compiler during optimization, since they don't actually do anything.
Try replacing them with something that actually does something, like logging a message, or move the declarations of EMPTY and stopHERE outside of their respective blocks:
String EMPTY;
int stopHERE;
if (lastReading.isItTimeYet(Calendar.getInstance()))
{
    lastReadingReturn = lastReading.SensorReadingChanges(z_value, chkOrient, inclination, rotation);
    if (lastReadingReturn.isEmpty())
    {
        EMPTY = "TRUE";
        // DO NOTHING
        System.out.println("Doing nothing");
    }
    else
    {
        stopHERE = 0;
        System.out.println("stopHERE set to zero");
    }
}
Edit: Since I'm not being believed, I simulated the OP's issue in Android Studio:
final Random random = new Random();
if (random.nextInt(1) != 0) {
    String EMPTY = "EMPTY";
} else {
    int stopHERE = 0;
}
So, Android Studio is actually warning us that there is no executable code inside our else-block where the breakpoint is.
When run, this will only pause execution once, on line 18 (since random.nextInt(1) will always be 0).
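On Android specifically, a log call is a convenient side effect to use instead of System.out, since a breakpoint on such a line always has executable code to attach to. A small sketch reusing the variables from the question:

import android.util.Log;

if (lastReadingReturn.isEmpty()) {
    Log.d("SensorCheck", "reading unchanged");                       // breakpoint here will bind
} else {
    Log.d("SensorCheck", "reading changed: " + lastReadingReturn);   // and here
}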