JavaPlot timestamps not working - java

I have a file like this:
1429520881 15.0
1429520882 3.0
1429520883 340.0
and I try to use it in JavaPlot:
JavaPlot plot = new JavaPlot();
GenericDataSet dataset = new GenericDataSet();
// ... fill dataset with data ...
plot.set("xdata", "time");
plot.set("timefmt", "'%s'");
plot.set("format x", "'%H:%M:%S'");
plot.plot();
As a result, gnuplot's window doesn't appear. But if I use this file directly in gnuplot with the same data and options, it shows the time on the x axis; and if I delete the last settings (xdata, timefmt, format) in JavaPlot, it works, but shows only numbers.
I also tried to create the dataset manually in the program, but got the same result.
I also implemented a new DataSet with the date as a String, but the xdata time option still doesn't seem to work.

It took forever to figure this one out. I found that if you have a DataSetPlot object you can set the 'using' option:
DataSetPlot dataSet = new DataSetPlot( values );
dataSet.set( "using", "1:2" );
This will then make use of the 'using' option for the plot command, e.g.:
plot '-' using 1:2 title 'Success' with lines linetype rgb 'green'
You have to have the 'using' option when making use of time for the x axis, otherwise you will see this error:
Need full using spec for x time data
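
Putting the question's time settings together with this fix, a minimal sketch (untested; it assumes JavaPlot's standard addPlot API, with values holding the epoch/value rows from the question; see the next answer for a caveat about the order of the generated settings):
double[][] values = {
    {1429520881, 15.0},
    {1429520882, 3.0},
    {1429520883, 340.0}
};
JavaPlot plot = new JavaPlot();
plot.set("xdata", "time");          // x values are timestamps
plot.set("timefmt", "'%s'");        // input format: epoch seconds
plot.set("format x", "'%H:%M:%S'"); // display format on the axis
DataSetPlot dataSet = new DataSetPlot(values);
dataSet.set("using", "1:2");        // required for time data on the x axis
plot.addPlot(dataSet);
plot.plot();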

JavaPlot generates the temporary script file (with the data inline) with the settings in an arbitrary order, because ParametersHolder inherits from HashMap, and the 'using' keyword has to come after the '-'. To fix this, I wrote a LinkedParams class extending GNUPlotParameters, with an inner LinkedMap, and overrode its methods to use that ordered structure. For example, the script should be:
set ... ...(xrange,yrange etc)
set xdata time
set timefmt '%s'
set format x '%H:%M:%S'
plot '-' using 1:2 title 'ololo' with linespoints linetype 2 linewidth 3
1429520881 15.0
1429520882 3.0
1429520883 340.0
e
quit
but instead it was:
set xdata time
set ... ...(xrange,yrange etc)
set format x '%H:%M:%S'
set timefmt '%s'
plot '-' title 'ololo' with linespoints linetype 2 linewidth 3
1429520881 15.0
1429520882 3.0
1429520883 340.0
e
quit
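
The root cause is easy to demonstrate in isolation: HashMap iterates in an arbitrary order, while LinkedHashMap preserves insertion order. A minimal self-contained demo (plain JDK, not JavaPlot-specific):
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class OrderDemo {
    public static void main(String[] args) {
        Map<String, String> hashed = new HashMap<>();
        Map<String, String> linked = new LinkedHashMap<>();
        String[][] settings = {
            {"xdata", "time"}, {"timefmt", "'%s'"}, {"format x", "'%H:%M:%S'"}
        };
        for (String[] s : settings) {
            hashed.put(s[0], s[1]);
            linked.put(s[0], s[1]);
        }
        // The HashMap scrambles the insertion order, which is what scrambles
        // the generated gnuplot script; the LinkedHashMap keeps it.
        System.out.println("HashMap order:       " + hashed.keySet());
        System.out.println("LinkedHashMap order: " + linked.keySet());
    }
}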

Related

Set WatermarkStrategy for Event Timestamps

I am trying to do a windowed aggregation query on a data stream that contains over 40 attributes in Flink. The stream's schema contains an epoch timestamp which I want to use for the WatermarkStrategy so I can actually define tumbling windows over it.
I know from the docs that you can define a timestamp using the SQL API in a CREATE TABLE query, by first applying TO_TIMESTAMP_LTZ to the epochs to convert them to a proper timestamp, which can then be used in the following WATERMARK FOR statement. Since I have a really huge schema, though, I want to deserialise and provide the schema NOT by writing out the complete CREATE TABLE statement containing all columns, BUT by using a custom class derived from the proto file that contains the schema. As far as I know, this is only possible by providing a deserializer for the KafkaSourceBuilder and calling the stream's returns function with the class that protoc generated from the proto file. This means I have to define the table using the Stream API.
Inspired by the answer to this question, I do it like this:
WatermarkStrategy watermarkStrategy = WatermarkStrategy
        .<Row>forBoundedOutOfOrderness(Duration.ofSeconds(10))
        .withTimestampAssigner((event, ts) -> (Long) event.getField("ts"));
tableEnv.createTemporaryView(
        "bidevents",
        stream
                .returns(BiddingEvent.BidEvent.class)
                .map(e -> Row.of(
                        e.getTracking().getCampaign().getId(),
                        e.getTracking().getAuction().getId(),
                        Timestamp.from(Instant.ofEpochSecond(e.getTimestamp().getMilliseconds() / 1000))
                ))
                .returns(Types.ROW_NAMED(new String[] {"campaign_id", "auction_id", "ts"}, Types.STRING, Types.STRING, Types.SQL_TIMESTAMP))
                .assignTimestampsAndWatermarks(watermarkStrategy)
);
tableEnv.executeSql("DESCRIBE bidevents").print();
Table resultTable = tableEnv.sqlQuery("" +
"SELECT " +
" TUMBLE_START(ts, INTERVAL '1' DAY) AS window_start, " +
" TUMBLE_END(ts, INTERVAL '1' DAY) AS window_end, " +
" campaign_id, " +
" count(distinct auction_id) auctions " +
"FROM bidevents " +
"GROUP BY TUMBLE(ts, INTERVAL '1' DAY), campaign_id");
DataStream<Row> resultStream = tableEnv.toDataStream(resultTable);
resultStream.print();
env.execute();
I get this error:
Caused by: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Window aggregate can only be defined over a time attribute column, but TIMESTAMP(9) encountered.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372) ~[flink-dist-1.15.1.jar:1.15.1]
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222) ~[flink-dist-1.15.1.jar:1.15.1]
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist-1.15.1.jar:1.15.1]
at org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:291) ~[flink-dist-1.15.1.jar:1.15.1]
This seems kind of logical, since in the timestamp assigner I cast a java.sql.Timestamp to a Long, which it is not (although the stack trace does not indicate that an error occurred during the cast). But when I do not convert the epoch (a Long) to a Timestamp during the map statement, I get this exception:
"Cannot apply '$TUMBLE' to arguments of type '$TUMBLE(<BIGINT>, <INTERVAL DAY>)'"
How can I assign the watermark AFTER the map-statement and use the column in the later SQL Query to create a tumbling window?
===== UPDATE =====
Thanks to a comment from David, I understand that I need the column to be of type TIMESTAMP(p) with precision p <= 3. To my understanding this means that my timestamp may not be more precise than full milliseconds. So I tried different ways to create Java timestamps (java.sql.Timestamp and java.time.LocalDateTime) that correspond to the Flink timestamps.
Some examples are:
1. Trying to convert the epochs into a LocalDateTime, setting the nanoseconds (the 2nd parameter of ofEpochSecond) to 0:
LocalDateTime.ofEpochSecond(e.getTimestamp().getMilliseconds() / 1000, 0, ZoneOffset.UTC )
2. After reading Svend's answer to this question, which uses LocalDateTime.parse on timestamps that look like "2021-11-16T08:19:30.123", I tried this:
LocalDateTime.parse(
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss").format(
        LocalDateTime.ofInstant(
            Instant.ofEpochSecond(e.getTimestamp().getMilliseconds() / 1000),
            ZoneId.systemDefault()
        )
    )
)
As you can see, these timestamps only have seconds granularity (which I checked by looking at the printed output of the stream I created), which I assume should mean they have a precision of 0. But when I use this stream to define a table/view, it once again has the type TIMESTAMP(9).
3. I also tried it with SQL timestamps:
new Timestamp(e.getTimestamp().getMilliseconds())
This also did not change anything. I somehow always end up with a precision of 9.
Can somebody please help me fix this?
OK, I found the solution to the problem. If you have a stream containing a timestamp that you want to define as the event-time column for watermarks, you can do this:
Table inputTable = tableEnv.fromDataStream(
    stream,
    Schema.newBuilder()
        .column("campaign_id", "STRING")
        .column("auction_id", "STRING")
        .column("ts", "TIMESTAMP(3)")
        .watermark("ts", "SOURCE_WATERMARK()")
        .build()
);
The important part is that you can "cast" the timestamp ts from TIMESTAMP(9) "down" to TIMESTAMP(3) (or any other precision below 4) and set that column to carry the watermark.
Another point that seems important to me: only timestamps of type java.time.LocalDateTime actually worked for later use as watermarks for tumbling windows.
Any other attempt to influence the precision of the timestamps by creating java.sql.Timestamp or java.time.LocalDateTime differently failed. This seemed to be the only viable way.
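
For completeness, a sketch of how the table built this way feeds the original query (reusing the names from the question; as noted above, the mapped stream must produce java.time.LocalDateTime values for ts):
tableEnv.createTemporaryView("bidevents", inputTable);
Table resultTable = tableEnv.sqlQuery(
    "SELECT " +
    "  TUMBLE_START(ts, INTERVAL '1' DAY) AS window_start, " +
    "  TUMBLE_END(ts, INTERVAL '1' DAY) AS window_end, " +
    "  campaign_id, " +
    "  COUNT(DISTINCT auction_id) AS auctions " +
    "FROM bidevents " +
    "GROUP BY TUMBLE(ts, INTERVAL '1' DAY), campaign_id");
tableEnv.toDataStream(resultTable).print();
env.execute();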

Calculating TDIST using Apache Commons library

I'm trying to calculate a two-tailed Student's t-distribution using commons-math. I'm using Excel to compare values and validate whether my results are correct.
So, using Excel to calculate TDIST(x, df, tails) with x = 5.968191467, df = 8, tails = 2:
=TDIST(ABS(5.968191467),8,2)
I get the result: 0.000335084
Using Commons Math like so:
TDistribution tDistribution = new TDistribution(8);
System.out.println(BigDecimal.valueOf(tDistribution.density(5.968191467)));
I get the result: 0.00018738010608336254
What should I be using to get the result exactly like the TDIST value?
To replicate your Excel formula you can use the CDF:
2*(1.0 - tDistribution.cumulativeProbability(5.968191467))
The right formula for a general x is:
2*(1.0 - tDistribution.cumulativeProbability(Math.abs(x)))
(thanks to ryuichiro). Do not forget to take the absolute value, because the two-tailed TDIST in Excel is a symmetric function, that is,
TDIST(-x,df,2) = TDIST(x,df,2)
so ryuichiro's version alone would not work for negative x's. Check also the docs or this.
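
Putting it together, a minimal self-contained sketch (commons-math3 assumed; the class name is illustrative):
import org.apache.commons.math3.distribution.TDistribution;

public class TwoTailedTDist {
    public static void main(String[] args) {
        double x = 5.968191467;
        TDistribution tDistribution = new TDistribution(8); // df = 8
        // density(x) is the PDF at x, which is why it does not match TDIST;
        // the two-tailed p-value needs the CDF:
        double p = 2 * (1.0 - tDistribution.cumulativeProbability(Math.abs(x)));
        System.out.println(p); // ~0.000335, matching Excel's TDIST(ABS(x), 8, 2)
    }
}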

increase width of AcroFields (iTextSharp)

I'm using iTextSharp to populate data into PDF templates created in OpenOffice. The population works fine and I get a proper PDF, but in some places I want to increase the width of an AcroField.
I tried the code below. It increases the width, but the text is not displayed.
AcroFields.Item fldItem = fields.getFieldItem(fieldName);
for (int i = 0; i < fldItem.size(); ++i) {
    PdfDictionary widgetDict = fldItem.getWidget(i); // note: getWidget(0) here would only ever touch the first widget
    PdfArray rectArr = widgetDict.getAsArray(PdfName.RECT);
    float origX = rectArr.getAsNumber(0).floatValue();
    rectArr.set(2, new PdfNumber(origX + 12 + 60)); // move the upper-right x coordinate
}
See the highlighted field in the image below; the actual string is 10000 SUPERIOROPTICAL 123 4567 89.
Please help. Thanks.
I can't reproduce the problem. I've made this POC: ChangeFieldSize
In this example, I take a form with three fields, among others a "Name" and a "Company" field. I first change the size of the "Name" field, the same way you change the field. Then I fill out the "Name" field and the "Company" field. Note that the order in which I perform these operations is important. Maybe you're doing it the other way round.
The result looks like this:
As you can see, the text isn't truncated the way it is in your screenshot.
So there are two things you can try:
change the order in which you change the field rectangle and fill out the field.
upgrade to the most recent version of iTextSharp.
If that doesn't help, post an SSCCE.
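
A minimal sketch of the resize-then-fill order from the POC (shown with the iText 5 Java API, matching the question's snippet; file names and the "Name" field are placeholders, imports from com.itextpdf.text.pdf and exception handling omitted):
PdfReader reader = new PdfReader("template.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("out.pdf"));
AcroFields fields = stamper.getAcroFields();
// 1. Widen the widget rectangle first...
AcroFields.Item fldItem = fields.getFieldItem("Name");
for (int i = 0; i < fldItem.size(); ++i) {
    PdfDictionary widgetDict = fldItem.getWidget(i);
    PdfArray rectArr = widgetDict.getAsArray(PdfName.RECT);
    float origX = rectArr.getAsNumber(0).floatValue();
    rectArr.set(2, new PdfNumber(origX + 72)); // right edge = left edge + 72 pt
}
// 2. ...then fill the field, so its appearance is generated for the new size.
fields.setField("Name", "10000 SUPERIOROPTICAL 123 4567 89");
stamper.close();
reader.close();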

ELKI DBSCAN : How to set dbc.parser?

I am doing DBSCAN clustering, and apart from latitude and longitude I have one more column that I want to see with the cluster results. For example, the data looks like this:
28.6029445 77.3443552 1
28.6029511 77.3443573 2
28.6029436 77.3443458 3
28.6029011 77.3443032 4
28.6028967 77.3443042 5
28.6029087 77.3442829 6
28.6029132 77.3442797 7
Now, in the MiniGUI, if I set parser.labelIndices to 2 and run the task, the output looks like this:
# Cluster: Cluster 0
ID=63222 28.6031295 77.3407848 441
ID=63225 28.603134 77.3407744 444
ID=63220 28.6031566667 77.3407816667 439
ID=63226 28.6030819 77.3407605 445
ID=63221 28.6032 77.3407616667 440
ID=63228 28.603085 77.34071 447
ID=63215 28.60318 77.3408583333 434
ID=63229 28.6030751 77.3407096 448
So it is still connected to the 3rd column, which I passed as a label. I have checked the clustering result by passing just latitude and longitude, and it is exactly the same. So, in a way, by passing a column as a 'label' I can retrieve that column along with the lat/long in the cluster results.
Now I want to use this in my Java code:
// Setup parameters:
ListParameterization params = new ListParameterization();
params.addParameter(
    FileBasedDatabaseConnection.Parameterizer.INPUT_ID,
    fileLocation);
params.addParameter(
    NumberVectorLabelParser.Parameterizer.LABEL_INDICES_ID,
    2);
params.addParameter(
    AbstractDatabase.Parameterizer.INDEX_ID,
    RStarTreeFactory.class);
But this is giving a NullPointerException. In the MiniGUI, dbc.parser is NumberVectorLabelParser by default, so this should work fine. What am I missing?
I will have a look into the NPE; it should produce a more helpful error message instead.
Most likely, the problem is that this parameter is of type List<Integer>, i.e. you would need to pass a list. Alternatively, you can pass a String, which will be parsed. The following should work just fine:
params.addParameter(
    NumberVectorLabelParser.Parameterizer.LABEL_INDICES_ID,
    "2");
Note that the text writer might (I have not checked this) print labels as-is, so you cannot take the output as an indication that it considered your data set to be 3-dimensional.
The debugging handler -resulthandler LogResultStructureResultHandler -verbose should give you the type output:
java -jar elki.jar KDDCLIApplication -dbc.in dbpedia.gz \
-algorithm NullAlgorithm \
-resulthandler LogResultStructureResultHandler -verbose
should yield an output like this:
de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection.load: 1941 ms
de.lmu.ifi.dbs.elki.algorithm.NullAlgorithm.runtime: 0 ms
BasicResult: Algorithm Step (main)
StaticArrayDatabase: Database (database)
DBIDView: Database IDs (DBID)
MaterializedRelation: DoubleVector,dim=2 (relation)
MaterializedRelation: LabelList (relation)
SettingsResult: Settings (settings)
In this case, my data set consists of coordinates from Wikipedia, each with a name. I have a 2-dimensional DoubleVector relation, and a LabelList relation storing the object names.

Batch Inserts in PostgreSQL using JDBC

I want to insert a file into PostgreSQL using JDBC.
I know the command below, but the downloaded file is not delimited with "|" or ",".
FileReader fr = new FileReader("mesowest.out.txt");
cm.copyIn("COPY tablename FROM STDIN WITH DELIMITER '|'", fr);
My file looks like this:
ABAUT 20131011/0300 8.00 37.84 -109.46 3453.00 21.47 8.33 241.90
ALTU1 20131011/0300 8.00 37.44 -112.48 2146.00 -9999.00 -9999.00 -9999.00
BDGER 20131011/0300 8.00 39.34 -108.94 1529.00 43.40 0.34 271.30
BULLF 20131011/0300 8.00 37.52 -110.73 1128.00 56.43 8.07 197.50
CAIUT 20131011/0300 8.00 38.35 -110.95 1381.00 54.88 8.24 250.00
CCD 20131011/0300 8.00 40.69 -111.59 2743.00 27.94 8.68 285.40
So my question is: is it necessary to add delimiters to this file to push it into the database using JDBC?
From the Postgres documentation:
delimiter : The single character that separates columns within each row (line) of the file. The default is a tab character in text mode, a comma in CSV mode.
It looks like your data is tab-delimited, so using the default should work.
FileReader fr = new FileReader("mesowest.out.txt");
cm.copyIn("COPY tablename FROM STDIN", fr);
You need to transform your file in some way, yes.
It looks like it is currently either delimited by a variable number of spaces, or it has fixed-width fields. The difference: what would happen if 2146.00 were changed to 312146.00? Would it run into the previous field, as in "-112.48312146.00", the way fixed width would, or would you add a space anyway, even though that would break the column alignment?
I don't believe either of those is directly supported by COPY, so some transformation is necessary. Also, -9999.00 looks like a magic value that should probably be converted to NULL.
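
A sketch of such a transformation (connection URL, credentials, and table name are placeholders; it uses the pgJDBC CopyManager as in the question and maps the -9999.00 sentinel to NULL):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyTransformed {
    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader("mesowest.out.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Collapse runs of spaces into tab-separated fields, and turn
                // -9999.00 into an empty string so that COPY ... WITH NULL ''
                // loads it as NULL.
                String[] cols = line.trim().split("\\s+");
                for (int i = 0; i < cols.length; i++) {
                    if (cols[i].equals("-9999.00")) cols[i] = "";
                }
                sb.append(String.join("\t", cols)).append('\n');
            }
        }
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass")) {
            CopyManager cm = ((PGConnection) conn).getCopyAPI();
            cm.copyIn("COPY tablename FROM STDIN WITH NULL ''",
                    new StringReader(sb.toString()));
        }
    }
}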
