Apache Beam create timeseries from event stream - java

I am trying to create a timeseries of the count of events that happened over a given time.
The events are encoded as
PCollection<KV<String, Long>> events;
where the String is the ID of the event source and the Long is the timestamp of the event.
What I want as output is a PCollection<Timeseries>, where each Timeseries has the form
class Timeseries {
String id;
List<TimeseriesWindow> windows;
}
class TimeseriesWindow {
long timestamp;
long count;
}
Toy example with a fixed window size (is this the correct term?) of 10 seconds and a total timeseries duration of 60 seconds:
Input:
[("one", 1), ("one", 13), ("one", 2), ("one", 43), ("two", 3)]
Output:
[
{
id: "one",
windows: [
{
timestamp: 0,
count: 2
},
{
timestamp: 10,
count: 1
},
{
timestamp: 20,
count: 0
},
{
timestamp: 30,
count: 0
},
{
timestamp: 40,
count: 1
},
{
timestamp: 50,
count: 0
}
]
},
{
id: "two",
windows: [
{
timestamp: 0,
count: 1
},
{
timestamp: 10,
count: 0
},
{
timestamp: 20,
count: 0
},
{
timestamp: 30,
count: 0
},
{
timestamp: 40,
count: 0
},
{
timestamp: 50,
count: 0
}
]
}
]
I hope this makes sense :)

You can do a GroupByKey to transform your input into
[
("one", [1, 13, 2, 43]),
("two", [3]),
]
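at which point you can apply a DoFn to convert each list of timestamps into a Timeseries object (e.g. by creating the list of TimeseriesWindow objects at the appropriate times, then iterating over the values and incrementing the counts).
You may also look into Beam's built-in windowing (e.g. fixed windows) to see whether it meets your needs. Note that windowed aggregations only produce output for windows that contain at least one element, so the zero-count windows in the example would have to be filled in separately.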
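A minimal sketch of the per-key bucketing the DoFn would do, written as plain Java so it can be tested without a pipeline. It assumes timestamps in seconds, a series starting at 0, and that events outside the series are dropped; the class name TimeseriesBuilder and the build method are hypothetical, while Timeseries and TimeseriesWindow mirror the question's classes with constructors added.

```java
import java.util.ArrayList;
import java.util.List;

// Same shape as the classes in the question, with constructors added for convenience.
class TimeseriesWindow {
    long timestamp;
    long count;

    TimeseriesWindow(long timestamp, long count) {
        this.timestamp = timestamp;
        this.count = count;
    }
}

class Timeseries {
    String id;
    List<TimeseriesWindow> windows;

    Timeseries(String id, List<TimeseriesWindow> windows) {
        this.id = id;
        this.windows = windows;
    }
}

class TimeseriesBuilder {
    /**
     * Buckets the grouped timestamps of one event source into fixed-size windows.
     * start, windowSize and duration use the same unit as the timestamps.
     */
    static Timeseries build(String id, Iterable<Long> timestamps,
                            long start, long windowSize, long duration) {
        int numWindows = (int) (duration / windowSize);
        long[] counts = new long[numWindows]; // all windows start at 0, so empty windows are kept
        for (long ts : timestamps) {
            long offset = ts - start;
            if (offset < 0 || offset >= numWindows * windowSize) {
                continue; // outside the series: ignored in this sketch
            }
            counts[(int) (offset / windowSize)]++;
        }
        List<TimeseriesWindow> windows = new ArrayList<>(numWindows);
        for (int i = 0; i < numWindows; i++) {
            windows.add(new TimeseriesWindow(start + i * windowSize, counts[i]));
        }
        return new Timeseries(id, windows);
    }
}
```

Inside a DoFn<KV<String, Iterable<Long>>, Timeseries> applied after the GroupByKey, the @ProcessElement method would call TimeseriesBuilder.build(element.getKey(), element.getValue(), 0, 10, 60) and output the result. Timeseries would also need a Coder; one option is to make it Serializable and use SerializableCoder.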


Spring Boot - MongoDB queries taking too long

I am using Spring Boot with MongoDB in our project.
I have two collections. For simplicity, let's call them foo and bar.
Foo documents contain a DBRef to bar.
I have a query on Foo that filters on bar.$id:
{ "bar.$id" : ObjectId("61405393108f544d3b2082ed")}
I have indexed the Foo collection on bar.$id.
There are currently 14K documents in the Foo collection and 5 bar documents.
However, when I call this query through a Mongo repository, it takes more than 1 minute to execute.
When I analyzed the issue using explain(), I saw that on the MongoDB side the query executes in under 100 ms (executionTimeMillis: 6). Result of explain:
{ queryPlanner:
{ plannerVersion: 1,
namespace: 'development.foo',
indexFilterSet: false,
parsedQuery: { 'bar.$id': { '$eq': ObjectId("61405393108f544d3b2082ed") } },
winningPlan:
{ stage: 'FETCH',
inputStage:
{ stage: 'IXSCAN',
keyPattern: { 'bar.$id': 1 },
indexName: 'index1',
isMultiKey: false,
multiKeyPaths: { 'bar.$id': [] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { 'bar.$id': [ '[ObjectId(\'61405393108f544d3b2082ed\'), ObjectId(\'61405393108f544d3b2082ed\')]' ] } } },
rejectedPlans: [] },
executionStats:
{ executionSuccess: true,
nReturned: 7011,
executionTimeMillis: 6,
totalKeysExamined: 7011,
totalDocsExamined: 7011,
executionStages:
{ stage: 'FETCH',
nReturned: 7011,
executionTimeMillisEstimate: 1,
works: 7012,
advanced: 7011,
needTime: 0,
needYield: 0,
saveState: 7,
restoreState: 7,
isEOF: 1,
docsExamined: 7011,
alreadyHasObj: 0,
inputStage:
{ stage: 'IXSCAN',
nReturned: 7011,
executionTimeMillisEstimate: 0,
works: 7012,
advanced: 7011,
needTime: 0,
needYield: 0,
saveState: 7,
restoreState: 7,
isEOF: 1,
keyPattern: { 'bar.$id': 1 },
indexName: 'index1',
isMultiKey: false,
multiKeyPaths: { 'bar.$id': [] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { 'bar.$id': [ '[ObjectId(\'61405393108f544d3b2082ed\'), ObjectId(\'61405393108f544d3b2082ed\')]' ] },
keysExamined: 7011,
seeks: 1,
dupsTested: 0,
dupsDropped: 0 } } },
serverInfo:
{ host: 'XXXX.0.mongodb.net',
port: 27017,
version: '4.4.9',
gitVersion: 'XXXX' },
ok: 1,
'$clusterTime':
{ clusterTime: Timestamp({ t: 1632909970, i: 19 }),
signature:
{ hash: Binary(Buffer.from("XXXX", "hex"), 0),
keyId: XXXX } },
operationTime: Timestamp({ t: 1632909970, i: 19 }) }
Spring Boot code:
@Query(value = "{ 'bar.$id' : ObjectId(?0) }")
List<Foo> findByBarId(String barId);
Questions:
Why is Spring Boot taking so long to process the query results?
Is there any way to optimize this?
For anyone facing a similar issue:
I resolved it by turning on lazy loading for the DBRefs.
@DBRef(lazy = true)
private Bar bar;
The query was returning thousands of documents (7,011 according to explain()), each with one DBRef. Without lazy = true, every one of those references was resolved eagerly, each with an additional query to the bar collection, which caused the delay.
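To illustrate why the lazy = true fix helps, here is a plain-Java simulation, not Spring Data's actual implementation: a hypothetical BarCollection counts lookups, and each Foo holds its bar behind a Supplier. Eager loading performs one lookup per document up front; lazy loading defers the lookup until the reference is used. In Spring Data the lazy reference is a proxy that resolves on first method access, so the saving only holds if the code does not touch bar for every Foo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the bar collection; counts how many lookups reach it.
class BarCollection {
    private final Map<String, String> docs = new HashMap<>();
    int lookups = 0;

    void insert(String id, String doc) {
        docs.put(id, doc);
    }

    String findById(String id) {
        lookups++; // each call models one extra round trip to MongoDB
        return docs.get(id);
    }
}

// A Foo document whose bar reference is only reachable through a Supplier.
class FooDocument {
    private final Supplier<String> bar;

    FooDocument(Supplier<String> bar) {
        this.bar = bar;
    }

    String getBar() {
        return bar.get();
    }
}

class DbRefSimulation {
    static List<FooDocument> load(BarCollection bars, List<String> barIds, boolean lazy) {
        List<FooDocument> result = new ArrayList<>();
        for (String barId : barIds) {
            if (lazy) {
                // lazy: the lookup runs only when getBar() is called
                result.add(new FooDocument(() -> bars.findById(barId)));
            } else {
                // eager: one lookup per document while the result list is built
                String bar = bars.findById(barId);
                result.add(new FooDocument(() -> bar));
            }
        }
        return result;
    }
}
```

With 7,011 Foo documents all pointing at the same bar, the eager path performs 7,011 lookups before the list is returned, while the lazy path performs none until getBar() is called.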

How to change cell background color in Google Sheets with Java

I am trying to change the background color of cells with Java, following the official tutorial:
https://developers.google.cn/sheets/api/guides/conditional-format?hl=ko
But after executing the code, I got this error:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Invalid requests[0].addConditionalFormatRule: Invalid ConditionValue.userEnteredValue: =LT($D2,median($D$2:$D$11))",
"reason" : "badRequest"
} ],
"message" : "Invalid requests[0].addConditionalFormatRule: Invalid ConditionValue.userEnteredValue: =LT($D2,median($D$2:$D$11))",
"status" : "INVALID_ARGUMENT"
}
I do not understand what "=LT($D2,median($D$2:$D$11))" means, or where the range is set. In the screenshot the range is A2:D5, so how should this range be set, and what is wrong with this example?
Regarding where the range is set: it is set in AddConditionalFormatRuleRequest -> ConditionalFormatRule -> ranges.
Regarding what the formula =LT($D2,median($D$2:$D$11)) means:
Because this is a custom formula, the relative row in $D2 shifts as the rule is applied down the range (D2, D3, ... up to D11), while the absolute range $D$2:$D$11 stays fixed. For each row, it checks whether the value in column D is less than the median of D2:D11. See LT() and MEDIAN().
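To make the row-by-row evaluation concrete, the sketch below computes in Java which rows the rule would highlight. It is only a model of the formula's logic, not the Sheets API; the class and method names are hypothetical, and it assumes D2:D11 holds numbers only (the median uses the usual definition, averaging the two middle values for an even count).

```java
import java.util.Arrays;

class CustomFormulaDemo {
    // Median of the values: middle value, or mean of the two middle values for an even count.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // columnD holds the values of D2..D11; result[i] tells whether sheet row i + 2 is highlighted.
    static boolean[] highlighted(double[] columnD) {
        double median = median(columnD); // $D$2:$D$11 is absolute: the same range for every row
        boolean[] result = new boolean[columnD.length];
        for (int i = 0; i < columnD.length; i++) {
            result[i] = columnD[i] < median; // $D2 is row-relative: D2, D3, ... as the rule moves down
        }
        return result;
    }
}
```

For example, with the values 1 to 10 in D2:D11 the median is 5.5, so rows 2 to 6 (values 1 to 5) would get the background color.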
Sample Request via API Explorer:
{
"requests": [
{
"addConditionalFormatRule": {
"index": 0,
"rule": {
"ranges": [
{
"sheetId": 116889903,
"startRowIndex": 1,
"startColumnIndex": 0,
"endColumnIndex": 4,
"endRowIndex": 11
}
],
"booleanRule": {
"condition": {
"type": "CUSTOM_FORMULA",
"values": [
{
"userEnteredValue": "=LT($D2,median($D$2:$D$11))"
}
]
},
"format": {
"backgroundColor": {
"red": 1,
"blue": 0,
"green": 0
}
}
}
}
}
}
]
}
In this example, the range starts at startRowIndex 1, which is sheet row 2 (GridRange indices are zero-based). startColumnIndex is 0 because we want the range to start at the first sheet column (A).
Notice that endRowIndex is 11 (sheet row 12) and endColumnIndex is 4 (sheet column 5, E). This is because in a GridRange, endRowIndex and endColumnIndex are exclusive, i.e. not included in the range, so the range covers A2:D11.
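The index arithmetic above can be captured in a small helper that converts an A1-style range into GridRange indices. This is a hypothetical helper, not part of the Sheets client library; it assumes an uppercase range with both corners given (such as A2:D11) and no sheet name.

```java
class A1ToGridRange {
    // Converts column letters ("A", "D", "AA") to a zero-based column index.
    static int columnIndex(String letters) {
        int index = 0;
        for (char c : letters.toCharArray()) {
            index = index * 26 + (c - 'A' + 1);
        }
        return index - 1;
    }

    // Parses a single cell such as "D11" into {rowIndex, columnIndex}, both zero-based.
    static int[] cell(String ref) {
        int i = 0;
        while (i < ref.length() && Character.isLetter(ref.charAt(i))) {
            i++;
        }
        return new int[] { Integer.parseInt(ref.substring(i)) - 1, columnIndex(ref.substring(0, i)) };
    }

    // Returns {startRowIndex, endRowIndex, startColumnIndex, endColumnIndex} for a range like "A2:D11".
    static int[] toGridRange(String a1) {
        String[] parts = a1.split(":");
        int[] start = cell(parts[0]);
        int[] end = cell(parts[1]);
        // end indices are exclusive, so the last row/column index is incremented by 1
        return new int[] { start[0], end[0] + 1, start[1], end[1] + 1 };
    }
}
```

toGridRange("A2:D11") gives {1, 11, 0, 4}, matching the startRowIndex, endRowIndex, startColumnIndex and endColumnIndex in the request above.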
For Java sample code, please refer to the example in the official documentation.
But if you just want to change the cell background color without conditional formatting, you can use UpdateCellsRequest or RepeatCellRequest.
The difference between the two is that in UpdateCellsRequest you provide the backgroundColor for each cell in the range, while in RepeatCellRequest you set the backgroundColor only once and it is applied to the whole range.
Sample Update Cells Request via API Explorer:
{
"requests": [
{
"updateCells": {
"range": {
"sheetId": 116889903,
"startRowIndex": 0,
"startColumnIndex": 0,
"endRowIndex": 1,
"endColumnIndex": 4
},
"rows": [
{
"values": [
{
"userEnteredFormat": {
"backgroundColor": {
"red": 0,
"blue": 1,
"green": 0
}
}
},
{
"userEnteredFormat": {
"backgroundColor": {
"red": 0,
"blue": 0,
"green": 1
}
}
}
]
}
],
"fields": "userEnteredFormat"
}
}
]
}
In this example, even though the range is set to A1:D1, only 2 CellData values are provided (blue and green), so only the first 2 cells, A1 and B1, were colored.
Sample Repeat Cell Request via API Explorer:
{
"requests": [
{
"repeatCell": {
"range": {
"sheetId": 116889903,
"startRowIndex": 0,
"endRowIndex": 1,
"startColumnIndex": 0,
"endColumnIndex": 4
},
"cell": {
"userEnteredFormat": {
"backgroundColor": {
"red": 0,
"blue": 0,
"green": 1
}
}
},
"fields": "userEnteredFormat"
}
}
]
}
Here only 1 CellData is provided, and it is applied to every cell within the given range.

How to read this json with gson?

I have this JSON and I don't know how to read it when the attribute names are dynamic.
Note that in the example below, the field "atletas" is an object keyed by IDs. How can I get this information if I don't know in advance which IDs I will get?
For example:
{
"rodada": 2,
"atletas": {
"100651": {
"apelido": "Rodrygo",
"pontuacao": 1.3,
"scout": {
"FC": 3,
"FF": 1,
"FS": 1,
"RB": 1
},
"foto": "https://s.glbimg.com/es/sde/f/2017/11/01/b8128d3d5db5325dc238cadc67c28342_FORMATO.png",
"posicao_id": 5,
"clube_id": 277
},
"101957": {
"apelido": "Thiago Larghi",
"pontuacao": 0,
"scout": {
},
"foto": "https://s.glbimg.com/es/sde/f/2018/03/19/b1b3b42713071408fb76d25cfb76d927_FORMATO.jpeg",
"posicao_id": 6,
"clube_id": 282
},
"36773": {
"apelido": "Julio Cesar",
"pontuacao": 8,
"scout": {
"DD": 1,
"SG": 1
},
"foto": "https://s.glbimg.com/es/sde/f/2018/04/17/326a0f6b72076eb12b4e6b335ae6a1da_FORMATO.png",
"posicao_id": 1,
"clube_id": 262
}
......
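One way to handle the unknown IDs is to declare "atletas" as a Map<String, Atleta>: Gson then turns each ID into a map key, so no field per ID is needed. The sketch below assumes Gson is on the classpath; the class names Rodada, Atleta and RodadaParser are hypothetical, with field names chosen to match the JSON keys.

```java
import com.google.gson.Gson;
import java.util.Map;

// One entry inside "atletas"; field names match the JSON keys.
class Atleta {
    String apelido;
    double pontuacao;
    Map<String, Integer> scout; // scout keys ("FC", "FF", ...) are dynamic too
    String foto;
    int posicao_id;
    int clube_id;
}

class Rodada {
    int rodada;
    Map<String, Atleta> atletas; // the unknown IDs ("100651", "36773", ...) become map keys
}

class RodadaParser {
    static Rodada parse(String json) {
        return new Gson().fromJson(json, Rodada.class);
    }
}
```

After parsing, iterating over rodada.atletas.entrySet() yields each ID (the key) together with its data (the value), without knowing the IDs beforehand.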

Highcharts not displaying chart correctly/not reading array properly

I am reading some information from a database and putting it into an ArrayList for Highcharts to read. Each element of the ArrayList has the form [String, int] and represents a date and a number of users. It looks like this:
[[2014-06-25, 35], [2014-06-26, 48], [2014-06-27, 60], [2014-06-28, 14], [2014-06-29, 8], [2014-06-30, 26], [2014-07-01, 21], [2014-07-02, 32], [2014-07-03, 33], [2014-07-04, 17], [2014-07-05, 18], [2014-07-06, 14], [2014-07-07, 26], [2014-07-08, 18], [2014-07-09, 26], [2014-07-10, 21], [2014-07-11, 1]]
I then feed that into my Highcharts chart, which looks like this:
$(function () {
$('#container').highcharts({
title: {
text: 'Monthly Average Users',
x: -20 //center
},
subtitle: {
text: 'subtitle',
x: -20
},
xAxis: {
type: 'category'
},
yAxis: {
title: {
text: 'Number of users'
},
plotLines: [{
value: 0,
width: 1,
color: '#808080'
}]
},
tooltip: {
valueSuffix: '°C'
},
legend: {
layout: 'vertical',
align: 'right',
verticalAlign: 'middle',
borderWidth: 0
},
series: [{
name: 'Users',
data: '<%=combined%>'
}]
});
});
But the chart does not come out correctly.
I'm doing this in a JSP file. Can Highcharts read Java ArrayLists?
Thanks
Change
xAxis: {
type: 'category',
to
xAxis: {
type: 'datetime',
One more thing: each point should be in the form [Date.UTC(2014, 5, 25), 35]. Note that JavaScript months are zero-based, so 5 means June. Run it as-is to get an idea:
http://jsfiddle.net/8jwHV/4/
$(function () {
$('#container').highcharts({
title: {
text: 'Monthly Average Users',
x: -20 //center
},
subtitle: {
text: 'subtitle',
x: -20
},
xAxis: {
type: 'datetime'
},
yAxis: {
title: {
text: 'Number of users'
},
plotLines: [{
value: 0,
width: 1,
color: '#808080'
}]
},
tooltip: {
valueSuffix: '°C'
},
legend: {
layout: 'vertical',
align: 'right',
verticalAlign: 'middle',
borderWidth: 0
},
series: [{
name: 'Users',
data: [
[Date.UTC(2014, 5, 25), 35], // months are zero-based: 5 = June
[Date.UTC(2014, 5, 26), 48],
[Date.UTC(2014, 5, 27), 60],
[Date.UTC(2014, 5, 28), 14]
]
}]
});
});
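To control the spacing between ticks, add tickInterval, which is in milliseconds on a datetime axis, e.g. one tick per day:
xAxis: {
type: 'datetime',
tickInterval: 24 * 3600 * 1000 // one day
}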
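On the JSP side, Highcharts runs in the browser and cannot read a Java ArrayList directly: printing the list inserts its toString() output (dates without quotes, which is not valid JavaScript), and the quotes in '<%=combined%>' turn it into a plain string. A minimal sketch, assuming each row of the list is an Object[] of {date string, count}, that renders a JavaScript array literal with the same epoch-millisecond values Date.UTC produces; the class name HighchartsData is hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.StringJoiner;

class HighchartsData {
    // Each row is {"2014-06-25", 35}: a date string and a user count.
    static String toJsArray(List<Object[]> rows) {
        StringJoiner out = new StringJoiner(",", "[", "]");
        for (Object[] row : rows) {
            long millis = LocalDate.parse((String) row[0])
                    .atStartOfDay(ZoneOffset.UTC)
                    .toInstant()
                    .toEpochMilli(); // same value as Date.UTC(year, month - 1, day) in JavaScript
            out.add("[" + millis + "," + row[1] + "]");
        }
        return out.toString();
    }
}
```

In the JSP, the result would then be printed without quotes, e.g. data: <%= HighchartsData.toJsArray(combined) %>, assuming combined is a list of such rows.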

How to create a java class for a JSON string where keys are variable?

The response
array (
0 =>
array (
'time_start' => 1252652400,
'time_stop' => 1252911600,
'stats' =>
array (
6002306163363 =>
array (
'id' => 6002306163363,
'impressions' => '6713',
'clicks' => '7',
'spent' => '593',
'actions' => '1',
),
),
),
)
This data comes from the Facebook REST API method rest/ads.getAdGroupStats.
I am not able to map the stats part to a Java class, because the key 6002306163363 is variable and there could be many more such entries. Below is the full result for three ads: 123456, 23456 and 34567.
[
{
"time_start": 0,
"time_stop": 1285224928,
"stats": {
"123456": {
"id": 123456,
"impressions": 40,
"clicks": 0,
"spent": 0,
"social_impressions": 0,
"social_clicks": 0,
"social_spent": 0
},
"23456": {
"id": 23456,
"impressions": 3,
"clicks": 0,
"spent": 0,
"social_impressions": 0,
"social_clicks": 0,
"social_spent": 0
},
"34567": {
"id": 34567,
"impressions": 211457,
"clicks": 84,
"spent": 6898,
"social_impressions": 124,
"social_clicks": 0,
"social_spent": 0
}
}
}
]
I need to write a Java class that maps to the above JSON, but I have not been able to. Can anyone help?
Update: I am getting this data from Facebook, and the API we are using requires a class into which the returned JSON is mapped. I only control the class definition; the API does the mapping internally. I need to know what the Java class should look like.
You need a hashmap or something similar to deal with those numerical keys.
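// GroupStats.java
import java.util.Map;

public class GroupStats {
long time_start;
long time_stop;
// keys are the ad IDs ("123456", "23456", ...), values are the per-ad stats
Map<String, GroupAccount> stats;
}

// GroupAccount.java
public class GroupAccount {
long id; // long, since IDs such as 6002306163363 exceed the int range
int impressions;
int clicks;
int spent;
int social_impressions;
int social_clicks;
int social_spent;
}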
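With the stats held in a map, code that consumes the result never needs to know the ad IDs in advance; it simply iterates over the entries. A minimal sketch using a cut-down, hypothetical AdStats class populated with values from the sample JSON above:

```java
import java.util.Map;

// Cut-down per-ad stats (hypothetical name); only the fields used below.
class AdStats {
    long id;
    int impressions;
    int clicks;

    AdStats(long id, int impressions, int clicks) {
        this.id = id;
        this.impressions = impressions;
        this.clicks = clicks;
    }
}

class StatsReport {
    // Works for any set of ad IDs, because they are map keys rather than field names.
    static long totalImpressions(Map<String, AdStats> stats) {
        long total = 0;
        for (Map.Entry<String, AdStats> entry : stats.entrySet()) {
            total += entry.getValue().impressions;
        }
        return total;
    }
}
```

For the three ads in the sample (123456, 23456 and 34567), the total is 40 + 3 + 211457 = 211500 impressions.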
I'm Mark Allen (RestFB maintainer). Version 1.6 should address this with the new option to map to the built-in JsonObject type - check out http://restfb.com for an example.
