For a Java simulation project I have a log file that keeps track of everything that happens during a single run. I do this by adding objects of a custom TimeLog class to a List. When the simulation is complete, I loop through this list, build an ArrayList of strings for each entry, and write it to a CSV file using the following "CSVHelper" class, which I found online (credits to Keke Chen).
Creating ArrayList of strings:
public static void saveData(List<TimeLog> data, String fileName, Simulator simulator,
        ArrayList<String> header) throws Exception {
    File csvFile = new File(simulator.getOutputFolder() + fileName);
    FileOutputStream fos = new FileOutputStream(csvFile);
    Writer fw = new OutputStreamWriter(fos, "UTF-8");
    CSVHelper.writeLine(fw, header);
    for (TimeLog entry : data) {
        ArrayList<String> row = new ArrayList<>();
        row.add(String.valueOf(entry.getTime()));        // time of entry, double
        row.add(String.valueOf(entry.getMessageType())); // type of message (SETUP, DEBUG, EVENT, ERROR)
        if (entry.getEvent() == null) {
            row.add("");
        } else {
            row.add(entry.getEvent().getTypeName());     // event type, if an event
        }
        row.add(entry.getLog());                         // message
        CSVHelper.writeLine(fw, row);
    }
    fw.flush();
    fw.close();
}
CSVHelper.writeLine:
/** Class for processing csv file.
 * created by Keke Chen (keke.chen@wright.edu)
 * For Cloud Computing Labs
 * Feb. 2014
 */
public static void writeLine(Writer w, ArrayList<String> values) throws Exception {
    boolean firstVal = true;
    for (String val : values) {
        if (!firstVal) {
            w.write(",");
        }
        w.write("\"");
        for (int i = 0; i < val.length(); i++) {
            char ch = val.charAt(i);
            if (ch == '\"') {
                w.write("\""); // extra quote to escape embedded quotes
            }
            w.write(ch);
        }
        w.write("\"");
        firstVal = false;
    }
    w.write("\n");
}
For debugging purposes, I catch exceptions thrown by the simulator, add information about them to my logging list, write the log file, and then rethrow the exception. This way, when a problem occurs I can check the log file to see what happened.
Until recently this worked just fine, but now I get a NullPointerException during the 2nd for-loop in the writing process. For some reason the line
for (int i=0; i<val.length(); i++)
throws an exception halfway through a string. If I open the log file that is created, the last entry has a half-finished message or timestamp as well (for example: "tool X was res" instead of "tool X was reserved for item Y", or "212" instead of "212312.16").
Is there any limitation to writing CSV files, keeping lists in Java, looping over the characters of strings, or anything else that I am not aware of? This is giving me quite the headache.
Ciao,
Robin
EDIT: As requested, an example of the list entries:
// initiating:
private List<TimeLog> logging = new ArrayList<>();

// message:
public void message(MessageType mt, String s, Event e) {
    if (saveLog) {
        logging.add(new TimeLog(now(), mt, s, e)); // now() gets the current time of the simulator
    }
}

// example:
Event e = new ReservationEvent(simulator.now(), X, Y);
simulator.message(MessageType.EVENT, "Tool " + X.getId() + " was reserved for item " + Y.getId(), e);
This is the stack trace:
java.lang.NullPointerException
at model.simulator.transport.input.CSVHelper.writeLine(CSVHelper.java:27)
at model.simulator.TimeLog.saveData(TimeLog.java:73)
at model.simulator.Simulator.endSimulation(Simulator.java:98)
at model.simulator.Simulator.runSimulation(Simulator.java:64)
at test.simulator.compare.TestFullFab.testRunning(TestFullFab.java:147)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:539)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:761)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:461)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:207)
Something else I noticed thanks to Emax's comment. I added
System.out.println("0:" + row.get(0) + ", 1:" + row.get(1) + ", 2:" + row.get(2) + ", 3:" + row.get(3));
in the saveData method, just before CSVHelper.writeLine(fw, row), to see each list of strings before it gets written. This printed the full list of logs to the console, including the entries that were not (or only partly) written to the CSV file. Somehow, the for-loop over the TimeLog entries continues even while writeLine is throwing exceptions.
EDIT2: structure of TimeLog:
public class TimeLog {
    private double time;
    private MessageType mt;
    private String log;
    private Event e;

    public TimeLog(double time, MessageType mt, String log, Event e) {
        this.time = time;
        this.mt = mt;
        this.log = log;
        this.e = e;
    }

    // see full method above
    public static void saveData(List<TimeLog> data, String fileName, Simulator simulator,
            ArrayList<String> header) throws Exception { ... }

    // furthermore, there are some getters
}
An example of the output in the console as a result of the System.out.println() line:
0:27131.490112666303, 1:EVENT, 2:TOOLEVENT, 3:Tool input event, item 3998 has gone into tool QSD301
Keep in mind that the 0:, 1:, 2:, and 3: prefixes are not in the original list but were added in the println() call.
UPDATE: Using Emax's idea of printing the values where they are null revealed the problem. The value of val does indeed become null, and it happens at the NullPointerException (which has no message), so e.getMessage() returns null. Strangely enough, the Writer simply does not write all the values to the CSV file right away, but seems to buffer them (maybe it first collects a certain number of characters before actually writing them to the file). Because of this, I thought the NullPointerException happened during a loop through val, while in fact it had already finished. Thanks for the help!
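For anyone running into the same truncation: since the Writer buffers output, a partially written file does not mean the write loop stopped there. A minimal sketch of how saveData could guard against null values and still flush everything when writeLine throws, assuming Java 7+ (try-with-resources closes, and therefore flushes, the writer):

public static void saveData(List<TimeLog> data, String fileName, Simulator simulator,
        ArrayList<String> header) throws Exception {
    File csvFile = new File(simulator.getOutputFolder() + fileName);
    // try-with-resources closes (and flushes) the writer even if writeLine throws
    try (Writer fw = new OutputStreamWriter(new FileOutputStream(csvFile), "UTF-8")) {
        CSVHelper.writeLine(fw, header);
        for (TimeLog entry : data) {
            ArrayList<String> row = new ArrayList<>();
            row.add(String.valueOf(entry.getTime()));
            row.add(String.valueOf(entry.getMessageType()));
            row.add(entry.getEvent() == null ? "" : entry.getEvent().getTypeName());
            row.add(entry.getLog() == null ? "" : entry.getLog()); // guard against null log messages
            CSVHelper.writeLine(fw, row);
        }
    }
}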
The only possible cause of a NullPointerException on that line is val being null. Since val is taken from a list, the problem bubbles up to:
row.add(String.valueOf(entry.getTime()));
row.add(String.valueOf(entry.getMessageType()));
row.add(entry.getEvent().getTypeName());
row.add(entry.getLog());
One of these add calls is adding a null to the list; you should check each of them. Note that String.valueOf never returns null (it turns a null argument into the string "null"), so the likely suspects are the last two.
In the meantime, please post the full stack trace, including (if present) the "Caused by" part.
UPDATE
Try to modify writeLine like this:
[...]
if (!firstVal) {
    w.write(",");
}
w.write("\"");
// EDIT HERE
if (val == null) {
    System.out.println("This val is null: " + values);
}
// .....
for (int i = 0; i < val.length(); i++) {
[...]
Then post the output for the case that caused the NullPointerException.
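Once you know where the null comes from, a defensive fix is to make writeLine treat null values as empty fields. A minimal sketch of that variant (the null check is my addition, not part of the original CSVHelper):

public static void writeLine(Writer w, ArrayList<String> values) throws Exception {
    boolean firstVal = true;
    for (String val : values) {
        if (!firstVal) {
            w.write(",");
        }
        w.write("\"");
        if (val != null) { // a null value becomes an empty quoted field
            for (int i = 0; i < val.length(); i++) {
                char ch = val.charAt(i);
                if (ch == '\"') {
                    w.write("\""); // double embedded quotes
                }
                w.write(ch);
            }
        }
        w.write("\"");
        firstVal = false;
    }
    w.write("\n");
}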
Related
The below code is throwing ‘stale element reference: element is not attached to the page document’
How do I handle the error message and show the result as a pass?
Action: the code below adds one image (there can be one or more) which sits inside another image.
public void click_on_the_Add_to_collection_button_displaying_down_below_the_assert() {
    List<WebElement> all_colection = driver.findElements(By.xpath("//li//button[@class='icon-button ' and @title='Add To Collection']"));
    int collection_size = all_colection.size();
    System.out.println("the collection size is " + collection_size);
    Random ran = new Random();
    all_colection.get(collection_size - 1).click();
    crate_new_colection.sendKeys("newcollection#1");
}
Wrap it with try..catch as shown below:
public void click_on_the_Add_to_collection_button_displaying_down_below_the_assert() {
    try {
        List<WebElement> all_colection = driver.findElements(By.xpath("//li//button[@class='icon-button ' and @title='Add To Collection']"));
        int collection_size = all_colection.size();
        System.out.println("the collection size is " + collection_size);
        Random ran = new Random();
        all_colection.get(collection_size - 1).click();
        crate_new_colection.sendKeys("newcollection#1");
    } catch (StaleElementReferenceException e) {
        System.err.println("Skipping the exception. Do not let it get down the stack.");
    }
}
P.S. - I kept your snake case unchanged but please do not use Python-like code style in Java.
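Swallowing the exception can hide real failures, though. An alternative (a sketch of mine, not the original poster's code) is to re-find the elements and retry the click when the reference goes stale:

import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.StaleElementReferenceException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class RetryClickHelper {
    // Re-locates the elements and retries the click if the DOM changed
    // between findElements() and click(). Assumes at least one match,
    // like the original snippet.
    static void clickLastWithRetry(WebDriver driver, By locator, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                List<WebElement> elements = driver.findElements(locator);
                elements.get(elements.size() - 1).click();
                return; // success
            } catch (StaleElementReferenceException e) {
                if (attempt == maxAttempts) {
                    throw e; // give up and surface the real failure
                }
            }
        }
    }
}

Usage: clickLastWithRetry(driver, By.xpath("//li//button[@class='icon-button ' and @title='Add To Collection']"), 3);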
I compiled a MATLAB function using Library Compiler in MATLAB 2015b. I suspect the pca function is the source of the exception, because I made a simple addition function and it executed without any problems. How can I execute the pca function?
MATLAB function:
function [COEFF, SCORE, latent] = ACP(path)
    Data = fileread(path);
    Data = strrep(Data, ',', '.');
    FID = fopen('comma2pointData.txt', 'w');
    fwrite(FID, Data, 'char');
    fclose(FID);
    Data = importdata('comma2pointData.txt', '\t');
    [COEFF, SCORE, latent] = pca(Data);
end
Java code:
String path = "/Desktop/datamicro.txt";
Object[] result = null;
acpClass acp = null;
try {
acp = new acpClass();
result=acp.ACP(3, path);
} catch (MWException ex) {
Logger.getLogger(CalculAcpFrame.class.getName()).log(Level.SEVERE, null, ex);
} finally {
MWArray.disposeArray(result);
acp.dispose();
}
datamicro.txt
0,25 0,16 0,95 0,53 0,22 1,17 549,00
0,20 0,06 0,39 0,62 0,18 1,09 293,25
0,16 0,05 0,31 0,39 0,14 0,78 935,00
0,19 0,06 0,40 0,62 0,23 1,14 380,00
The exception:
Caught "std::exception" Exception message is:
Timed out waiting for Thread to Process
avr. 06, 2017 11:59:57 PM microarchi_proj.Microarchi_proj main
GRAVE: null
... Matlab M-code Stack Trace ...
com.mathworks.toolbox.javabuilder.MWException: Timed out waiting for Thread to Process
at com.mathworks.toolbox.javabuilder.internal.MWMCR.mclFeval(Native Method)
at com.mathworks.toolbox.javabuilder.internal.MWMCR.access$600(MWMCR.java:31)
at com.mathworks.toolbox.javabuilder.internal.MWMCR$6.mclFeval(MWMCR.java:861)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.mathworks.toolbox.javabuilder.internal.MWMCR$5.invoke(MWMCR.java:759)
at com.sun.proxy.$Proxy0.mclFeval(Unknown Source)
at com.mathworks.toolbox.javabuilder.internal.MWMCR.invoke(MWMCR.java:427)
at ACPFunction.acpClass.ACP(acpClass.java:210)
at microarchi_proj.Microarchi_proj.main(Microarchi_proj.java:1145)
Edited:
After data has been provided, the code has been edited to accommodate data formatting.
The edited code is:
function [COEFF, SCORE, latent] = ACP(path)
    %% Reading file as a string
    Data = fileread(path);
    %% Converting comma decimals to point decimals
    Data = strrep(Data, ',', '.');
    %% Writing point decimal values to a new file
    FID = fopen('comma2pointData.txt', 'w');
    fwrite(FID, Data, 'char');
    fclose(FID);
    % Delete unwanted variables
    clear Data FID
    %% Reading the new file
    nData = importdata('comma2pointData.txt', '\t');
    % Determine rows and columns
    rows = length(nData);
    [~, columns] = size(strsplit(nData{1}, ' '));
    b = zeros(rows, columns);
    % Parse each line into a numeric row
    for row = 1:rows
        line = nData{row};
        a = strsplit(line, ' ');
        b(row, :) = cellfun(@str2num, a);
    end
    % Delete unwanted variables
    clear nData a line row rows columns
    %% Calling the pca function
    [COEFF, SCORE, latent] = pca(b);
    % Delete unwanted variables
    clear b
    % End of function
end
I hope it will solve your problem.
I have the following Android code:
public final List<MyObj> getList() {
    Cursor cursor = null;
    try {
        final String queryStr = GET_LIST_STATEMENT;
        cursor = db.rawQuery(queryStr, new String[] {});
        List<MyObj> list = null;
        // here I get the data from the cursor.
        return list;
    } catch (SQLiteFullException ex) {
        // do something to treat the exception.
    } finally {
        if (cursor != null) {
            cursor.close();
        }
    }
    return null; // fallback so the method compiles when the query fails
}
When I run PMD analysis over this code, I get the following issue: Found 'DD'-anomaly for variable 'cursor' (lines '182'-'185').
The line 182 is: Cursor cursor = null;.
The line 185 is: cursor = db.rawQuery(queryStr, new String[] {});
So, I understand that the problem is that I'm doing a premature initialization on line 182 (I never read the variable between lines 182 and 185), but if I don't do that, I can't close the cursor in the finally block.
What should I do in this case? Just ignore this PMD issue? Can I configure PMD not to raise this specific kind of DD-anomaly (rather than all DD-anomalies)? Should PMD be smart enough not to raise the issue here?
Another example of a DD-anomaly that I think is not a real problem:
Date distributeDate;
try {
    distributeDate = mDf.parse(someStringDate);
} catch (ParseException e) {
    Log.e("Problem", "Problem parsing the date of the education. Apply default date.");
    distributeDate = Calendar.getInstance().getTime();
}
In this case, the anomaly occurs with the distributeDate variable.
The documentation is pretty easy to understand:
Either you use annotations to suppress warnings:
// This will suppress UnusedLocalVariable warnings in this class
@SuppressWarnings("PMD.UnusedLocalVariable")
public class Bar {
    void bar() {
        int foo;
    }
}
or you use a comment:
public class Bar {
    // 'bar' is accessed by a native method, so we want to suppress warnings for it
    private int bar; //NOPMD
}
When it comes to your specific code, I'd say that the easiest way to handle it is to not use a finally block even though this would look like the perfect place for it.
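Applied to your cursor example, a NOPMD marker keeps the finally block intact while silencing the warning. A minimal sketch, assuming PMD reports the DD-anomaly on the declaration line (the comment is the only change):

public final List<MyObj> getList() {
    Cursor cursor = null; // NOPMD - initialized here so the finally block can close it
    try {
        final String queryStr = GET_LIST_STATEMENT;
        cursor = db.rawQuery(queryStr, new String[] {});
        List<MyObj> list = null;
        // read the data from the cursor.
        return list;
    } catch (SQLiteFullException ex) {
        // handle the exception.
    } finally {
        if (cursor != null) {
            cursor.close();
        }
    }
    return null;
}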
I have a plain text file with possibly millions of lines, which needs custom parsing, and I want to load it into an HBase table as fast as possible (using Hadoop or the HBase Java client).
My current solution is based on a MapReduce job without the Reduce part. I use FileInputFormat to read the text file so that each line is passed to the map method of my Mapper class. At this point the line is parsed to form a Put object, which is written to the context. Then, TableOutputFormat takes the Put object and inserts it into the table.
This solution yields an average insertion rate of 1,000 rows per second, which is less than what I expected. My HBase setup is in pseudo-distributed mode on a single server.
One interesting thing is that during insertion of 1,000,000 rows, 25 Mappers (tasks) are spawned but they run serially (one after another); is this normal?
Here is the code for my current solution:
public static class CustomMap extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    protected void map(LongWritable key, Text value, Context context) throws IOException {
        Map<String, String> parsedLine = parseLine(value.toString());
        Put row = new Put(Bytes.toBytes(parsedLine.get(keys[1])));
        for (String currentKey : parsedLine.keySet()) {
            row.add(Bytes.toBytes(currentKey), Bytes.toBytes(currentKey), Bytes.toBytes(parsedLine.get(currentKey)));
        }
        try {
            context.write(new ImmutableBytesWritable(Bytes.toBytes(parsedLine.get(keys[1]))), row);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
public int run(String[] args) throws Exception {
    if (args.length != 2) {
        return -1;
    }
    conf.set("hbase.mapred.outputtable", args[1]);

    // I got these conf parameters from a presentation about Bulk Load
    conf.set("hbase.hstore.blockingStoreFiles", "25");
    conf.set("hbase.hregion.memstore.block.multiplier", "8");
    conf.set("hbase.regionserver.handler.count", "30");
    conf.set("hbase.regions.percheckin", "30");
    conf.set("hbase.regionserver.globalMemcache.upperLimit", "0.3");
    conf.set("hbase.regionserver.globalMemcache.lowerLimit", "0.15");

    Job job = new Job(conf);
    job.setJarByClass(BulkLoadMapReduce.class);
    job.setJobName(NAME);
    TextInputFormat.setInputPaths(job, new Path(args[0]));
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(CustomMap.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Put.class);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(TableOutputFormat.class);
    job.waitForCompletion(true);
    return 0;
}
public static void main(String[] args) throws Exception {
    Long startTime = Calendar.getInstance().getTimeInMillis();
    System.out.println("Start time : " + startTime);

    int errCode = ToolRunner.run(HBaseConfiguration.create(), new BulkLoadMapReduce(), args);

    Long endTime = Calendar.getInstance().getTimeInMillis();
    System.out.println("End time : " + endTime);
    System.out.println("Duration milliseconds: " + (endTime - startTime));
    System.exit(errCode);
}
I've gone through a process that is probably very similar to yours, attempting to find an efficient way to load data from an MR job into HBase. What I found to work is using HFileOutputFormat as the OutputFormatClass of the MR job.
Below is the basis of the code I use to generate the job, and the Mapper map function which writes out the data. This was fast. We don't use it anymore, so I don't have numbers on hand, but it was around 2.5 million records in under a minute.
Here is the (stripped down) function I wrote to generate the job for my MapReduce process to put data into HBase:
private Job createCubeJob(...) {
    // Build and configure the job
    Job job = new Job(conf);
    job.setJobName(jobName);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    job.setMapperClass(HiveToHBaseMapper.class); // custom Mapper
    job.setJarByClass(CubeBuilderDriver.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(HFileOutputFormat.class);
    TextInputFormat.setInputPaths(job, hiveOutputDir);
    HFileOutputFormat.setOutputPath(job, cubeOutputPath);

    Configuration hConf = HBaseConfiguration.create(conf);
    hConf.set("hbase.zookeeper.quorum", hbaseZookeeperQuorum);
    hConf.set("hbase.zookeeper.property.clientPort", hbaseZookeeperClientPort);

    HTable hTable = new HTable(hConf, tableName);
    HFileOutputFormat.configureIncrementalLoad(job, hTable);
    return job;
}
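One detail worth spelling out: HFileOutputFormat only writes HFiles to the output path; they still have to be moved into the table afterwards. A minimal sketch of that step with the HBase client API of that era (LoadIncrementalHFiles, also available from the shell as the completebulkload tool), reusing the names from the function above:

// After job.waitForCompletion(true) succeeds, load the generated
// HFiles into the regions of the target table.
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(hConf);
loader.doBulkLoad(cubeOutputPath, hTable);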
This is my map function from the HiveToHBaseMapper class (slightly edited).
public void map(WritableComparable key, Writable val, Context context)
        throws IOException, InterruptedException {
    try {
        Configuration config = context.getConfiguration();
        String[] strs = val.toString().split(Constants.HIVE_RECORD_COLUMN_SEPARATOR);
        String family = config.get(Constants.CUBEBUILDER_CONFIGURATION_FAMILY);
        String column = strs[COLUMN_INDEX];
        // parse the numeric value so it can be compared and serialized below
        double value = Double.parseDouble(strs[VALUE_INDEX]);
        String sKey = generateKey(strs, config);
        byte[] bKey = Bytes.toBytes(sKey);
        Put put = new Put(bKey);
        put.add(Bytes.toBytes(family), Bytes.toBytes(column), (value <= 0)
            ? Bytes.toBytes(Double.MIN_VALUE)
            : Bytes.toBytes(value));
        ImmutableBytesWritable ibKey = new ImmutableBytesWritable(bKey);
        context.write(ibKey, put);
        context.getCounter(CubeBuilderContextCounters.CompletedMapExecutions).increment(1);
    } catch (Exception e) {
        context.getCounter(CubeBuilderContextCounters.FailedMapExecutions).increment(1);
    }
}
I'm pretty sure this isn't going to be a copy-and-paste solution for you. Obviously the data I was working with here didn't need any custom processing (that was done in an MR job before this one). The main thing I want to highlight is the HFileOutputFormat; the rest is just an example of how I used it. :)
I hope it gets you onto a solid path to a good solution. :)
One interesting thing is that during insertion of 1,000,000 rows, 25 Mappers (tasks) are spawned but they run serially (one after another); is this normal?
The mapreduce.tasktracker.map.tasks.maximum parameter, which defaults to 2, determines the maximum number of map tasks that can run in parallel on a node. Unless it is changed, you should see at most 2 map tasks running simultaneously on each node.
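Note that this is a per-TaskTracker setting, so it belongs in mapred-site.xml on each node rather than in the job configuration. A sketch (the value 4 is an arbitrary example; size it to your cores):

<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>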
I am using the Stanford Natural Language Processing toolkit. I've been trying to find spelling errors with the Lexicon's isKnown method, but it produces quite a few false positives. So I thought I'd load a second lexicon and check that too. However, that causes a problem.
private static LexicalizedParser lp = new LexicalizedParser(Constants.stdLexFile);
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);

static {
    lp.setOptionFlags(Constants.lexOptionFlags);
    wsjLexParse.setOptionFlags(Constants.lexOptionFlags);
}

public ParseTree(String input) throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
    initialInput = input;
    DocumentPreprocessor process = new DocumentPreprocessor();
    sentences = process.getSentencesFromText(new StringReader(input));
    for (List<? extends HasWord> sent : sentences) {
        if (lp.parse(sent)) { // line 65
            forest.add(lp.getBestParse()); // non-determinism?
        }
    }
    partsOfSpeech = pos();
    runAnalysis();
}
The following failure trace is produced:
java.lang.ArrayIndexOutOfBoundsException: 45547
at edu.stanford.nlp.parser.lexparser.BaseLexicon.initRulesWithWord(BaseLexicon.java:300)
at edu.stanford.nlp.parser.lexparser.BaseLexicon.isKnown(BaseLexicon.java:160)
at edu.stanford.nlp.parser.lexparser.BaseLexicon.ruleIteratorByWord(BaseLexicon.java:212)
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.initializeChart(ExhaustivePCFGParser.java:1299)
at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.parse(ExhaustivePCFGParser.java:388)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:234)
at nth.compling.ParseTree.<init>(ParseTree.java:65)
at nth.compling.ParseTreeTest.constructor(ParseTreeTest.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.junit.internal.runners.BeforeAndAfterRunner.invokeMethod(BeforeAndAfterRunner.java:74)
at org.junit.internal.runners.BeforeAndAfterRunner.runBefores(BeforeAndAfterRunner.java:50)
at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:33)
at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
If I comment out this line (and the other references to wsjLexParse):
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);
then everything works fine. What am I doing wrong here?
Looks like a bug in the Stanford library. You should report it to them.
Does the second lexicon work when you load only it (and not the other one)?
Does the same error occur when you load the two lexica in a different order?