Create a real-time report using ZoomData (visualization problems) - Java

I'm trying to create a simple real-time report using ZoomData.
I created a data source (Upload API) in the ZoomData admin interface and added a visualization to it (vertical bars).
I also disabled all other visualizations for this data source.
My data source has 2 fields:
timestamp - ATTRIBUTE
count - INTEGER AVG
In the visualization:
group by: timestamp
group by sort: count
y axis: count avg
colors: count avg
Every second I send a POST request to the ZoomData server to add data to the data source.
I do it from Java (I also tried sending it from Postman).
My problem is: the data arrives via POST and is added to the data source, but the visualization properties reset to defaults:
group by sort: volume
y axis: volume
colors: volume
but group by stays timestamp.
I can't understand why the visualization properties always change after data arrives via a POST request.
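For reference, the per-second sending loop described above can be sketched like this. It's a minimal sketch: the JSON payload shape and field names are assumptions based on the two DS fields, and the actual HTTP POST to the Upload API is abstracted behind a Consumer, since the question does not include the endpoint or authentication details.

```java
import java.util.concurrent.*;
import java.util.function.Consumer;

public class UploadLoop {
    // Build one record for the data source. Field names match the
    // question's DS (timestamp, count); the exact JSON shape expected
    // by the Upload API is an assumption - check the ZoomData docs.
    static String buildPayload(long timestamp, int count) {
        return "[{\"timestamp\":\"" + timestamp + "\",\"count\":" + count + "}]";
    }

    // Send one payload every periodMillis via the provided sender.
    // In the real application the sender would POST to the Upload API.
    static ScheduledExecutorService start(Consumer<String> sender, long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(
                () -> sender.accept(buildPayload(System.currentTimeMillis(), 1)),
                0, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```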


Calculate average request duration over time using Java + Apache Commons Math

I have a dataset of pairs that represent HTTP request samples
(request time, request duration)
By using Apache Commons Math's EmpiricalDistribution, I can calculate the average request count over time like this:
double[] input = ...; // request times from the pairs
EmpiricalDistribution dist = new EmpiricalDistribution(numberOfBuckets);
dist.load(input);
dist.getBinStats()
    .stream()
    .filter(stat -> stat.getN() > 0)
    .forEach(stat -> store(stat)); // logic to store the data in proper format
This way I can store the data in a chart-like way and plot it later on. Now what I can't do is calculate the average request duration over time.
In systems like Prometheus, this is done by using queries like
rate(http_server_requests_seconds_sum[5m])
/
rate(http_server_requests_seconds_count[5m])
I want to achieve the same thing (if possible) in my Java code.
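The Prometheus ratio above is just "sum of durations per time bucket divided by the number of requests in that bucket". A minimal plain-Java sketch of that idea (no commons-math; fixed-width buckets over the observed time range, and the {time, duration} pair layout is an assumption):

```java
public class AvgDurationOverTime {
    // samples: each entry is {requestTime, requestDuration}.
    // Returns the average duration per time bucket, i.e. the analogue of
    // rate(..._sum) / rate(..._count) over fixed-width buckets.
    public static double[] averageDurationPerBucket(double[][] samples, int numBuckets) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double[] s : samples) {
            min = Math.min(min, s[0]);
            max = Math.max(max, s[0]);
        }
        double width = (max - min) / numBuckets;
        double[] sum = new double[numBuckets];
        long[] count = new long[numBuckets];
        for (double[] s : samples) {
            int bin = (int) Math.min((s[0] - min) / width, numBuckets - 1);
            sum[bin] += s[1];   // accumulate durations (the "sum" series)
            count[bin]++;       // accumulate counts (the "count" series)
        }
        double[] avg = new double[numBuckets];
        for (int i = 0; i < numBuckets; i++) {
            avg[i] = count[i] == 0 ? Double.NaN : sum[i] / count[i];
        }
        return avg;
    }
}
```

If you want to keep using EmpiricalDistribution for the time axis, the same idea applies: load the durations into per-bin accumulators keyed by the bin index of each request time, then divide sum by count per bin.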

Cassandra, Java and MANY async requests: is this good?

I'm developing a Java application with Cassandra. Here is my table:
id      | registration | name
1       | 1            | xxx
1       | 2            | xxx
1       | 3            | xxx
2       | 1            | xxx
2       | 2            | xxx
...     | ...          | ...
100,000 | 34           | xxx
My table has a very large number of rows (more than 50,000,000). I have a list myListIds of String ids to iterate over. I could use:
SELECT * FROM table WHERE id IN (1,7,18, 34,...,)
// imagine more than 10,000,000 ids in the 'IN' clause
But this is a bad pattern, so instead I'm sending async requests this way:
Map<String, ResultSetFuture> mapFutures = new HashMap<>();
// key = id, value = future of the data from Cassandra
for (String id : myListIds) {
    ResultSetFuture resultSetFuture = session.executeAsync(statement.bind(id));
    mapFutures.put(id, resultSetFuture);
}
Then I will process the results with the getUninterruptibly() method.
Here is my problem: I'm making maybe more than 10,000,000 Cassandra requests (one request for each id), and I'm putting all these results inside a Map.
Can this cause a heap memory error? What's the best way to deal with that?
Thank you
Note: your question is really "is this a good design pattern?"
If you have to perform 10,000,000 Cassandra requests, then you have structured your data incorrectly. Ultimately you should design your database from the ground up so that you only ever have to perform 1-2 fetches.
Now, granted, if you have 5,000 Cassandra nodes this might not be a huge problem (it probably still is), but it still reeks of bad database design. I think the solution is to take a look at your schema.
I see the following problems with your code:
An overloaded Cassandra cluster: it won't be able to process so many async requests, and your requests will fail with NoHostAvailableException.
An overloaded Cassandra driver: your client app will fail with IO exceptions, because the system will not be able to process so many async requests (see the details about connection pooling: https://docs.datastax.com/en/developer/java-driver/3.1/manual/pooling/).
And yes, memory issues are possible. It depends on the data size.
A possible solution is to limit the number of concurrent async requests and process the data in chunks (e.g. see this answer).
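The "limit the number of async requests" idea can be sketched with a Semaphore that bounds the in-flight futures. This is a generic sketch, not DataStax driver code: fetchAsync stands in for session.executeAsync(statement.bind(id)), and CompletableFuture stands in for ResultSetFuture.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

public class ChunkedAsyncFetch {
    // Fetch all ids, but never allow more than maxInFlight requests
    // to be outstanding at once. fetchAsync is a stand-in for the
    // driver's async call returning one row per id.
    public static Map<String, String> fetchAll(List<String> ids,
            Function<String, CompletableFuture<String>> fetchAsync,
            int maxInFlight) throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        Map<String, String> results = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(ids.size());
        for (String id : ids) {
            permits.acquire(); // blocks once maxInFlight requests are pending
            fetchAsync.apply(id).whenComplete((row, err) -> {
                if (err == null) results.put(id, row);
                permits.release();
                done.countDown();
            });
        }
        done.await();
        return results;
    }
}
```

With this shape you can also drain and persist each chunk instead of accumulating everything in one Map, which addresses the heap concern.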

How to get Dynamic Xpath of a webtable?

List<WebElement> table = driver.findElements(By.xpath("//*[@id=\"prodDetails\"]/div[2]/div[1]/div/div[2]/div/div/table/tbody/tr"));
JavascriptExecutor jse = (JavascriptExecutor) driver;
// jse.executeScript("arguments[0].scrollIntoView();", table);
jse.executeScript("arguments[0].style.border='3px solid red'", table);
int row = table.size();
I am unable to get the required number of rows and columns. The XPath I provided does not find the table on the site.
Link : Click here
I have to fetch the specification of the mobile.
Instead of this XPath:
//*[@id=\"prodDetails\"]/div[2]/div[1]/div/div[2]/div/div/table/tbody/tr
use this:
//*[@id="prodDetails"]/div[2]/div[1]/div/div[2]/div/div/table/tbody/tr
Though I would not suggest you use an absolute XPath. You can go for a relative XPath, which is more readable and easier to maintain.
Relative XPath:
//div[@id='prodDetails']/descendant::div[@class='pdTab'][1]/descendant::tbody/tr
In code, something like:
List<WebElement> table = driver.findElements(By.xpath("//div[@id='prodDetails']/descendant::div[@class='pdTab'][1]/descendant::tbody/tr"));
Instead of the absolute XPath:
//*[@id=\"prodDetails\"]/div[2]/div[1]/div/div[2]/div/div/table/tbody/tr
I would suggest using a simple relative XPath:
//*[@id='prodDetails']//table/tbody/tr
This XPath will work if there are no other tables on the page. Otherwise, make sure the tables can be differentiated by some attribute.
You can get the total number of rows using the XPath below.
In the above link, there are multiple sections which share the same class, and the two tables also have similar locators. So you need to locate the element based on the table name, as below.
Note: you can achieve this without using JavascriptExecutor.
WebDriverWait wait = new WebDriverWait(driver, 20);
wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//div[@class='section techD']//span[text()='Technical Details']/ancestor::div[@class='section techD']//table//tr")));
List<WebElement> rowElementList = driver.findElements(By.xpath("//div[@class='section techD']//span[text()='Technical Details']/ancestor::div[@class='section techD']//table//tr"));
int row = rowElementList.size();
System.out.println(row);//16
output:
16
Suppose you want to get the Additional Information table row details; you can use the above XPath, replacing the section name with Additional Information:
List<WebElement> additionInfoList = driver.findElements(By.xpath("//div[@class='section techD']//span[text()='Additional Information']/ancestor::div[@class='section techD']//table//tr"));
System.out.println(additionInfoList.size()); // Output: 5
Output: 5
Finally, you can iterate over the above list and extract the table content details.
XPath can be pretty hard to read, especially when you need to use it a lot.
You could try the univocity-html-parser:
HtmlElement e = HtmlParser.parseTree(new UrlReaderProvider("your_url"));
List<HtmlElement> rows = e.query()
.match("div").precededBy("div").withExactText("Technical Details")
.match("tr").getElements();
for(HtmlElement row : rows){
System.out.println(row.text());
}
The above code will print out:
OS Android
RAM 2 GB
Item Weight 150 g
Product Dimensions 7.2 x 14.2 x 0.9 cm
Batteries: 1 AA batteries required. (included)
Item model number G-550FY
Wireless communication technologies Bluetooth, WiFi Hotspot
Connectivity technologies GSM, (850/900/1800/1900 MHz), 4G LTE, (2300/2100/1900/1800/850/900 MHz)
Special features Dual SIM, GPS, Music Player, Video Player, FM Radio, Accelerometer, Proximity sensor, E-mail
Other camera features 8MP primary & 5MP front
Form factor Touchscreen Phone
Weight 150 Grams
Colour Gold
Battery Power Rating 2600
Whats in the box Handset, Travel Adaptor, USB Cable and User Guide
Alternatively, the following code is a bit more usable, as I believe you probably want more data from that page too, and rows of data are usually what you want to end up with:
HtmlEntityList entityList = new HtmlEntityList();
HtmlEntitySettings product = entityList.configureEntity("product");
PartialPath technicalDetailRows = product.newPath()
.match("div").precededBy("div").withExactText("Technical Details")
.match("tr");
technicalDetailRows.addField("technical_detail_field").matchFirst("td").classes("label").getText();
technicalDetailRows.addField("technical_detail_value").matchLast("td").classes("value").getText();
HtmlParserResult results = new HtmlParser(entityList).parse(new UrlReaderProvider("your_url")).get("product");
System.out.println("-- " + Arrays.toString(results.getHeaders()) + " --");
for(String[] row : results.getRows()){
System.out.println(Arrays.toString(row));
}
Now this produces:
OS = Android
RAM = 2 GB
Item Weight = 150 g
Product Dimensions = 7.2 x 14.2 x 0.9 cm
Batteries: = 1 AA batteries required. (included)
Item model number = G-550FY
Wireless communication technologies = Bluetooth, WiFi Hotspot
Connectivity technologies = GSM, (850/900/1800/1900 MHz), 4G LTE, (2300/2100/1900/1800/850/900 MHz)
Special features = Dual SIM, GPS, Music Player, Video Player, FM Radio, Accelerometer, Proximity sensor, E-mail
Other camera features = 8MP primary & 5MP front
Form factor = Touchscreen Phone
Weight = 150 Grams
Colour = Gold
Battery Power Rating = 2600
Whats in the box = Handset, Travel Adaptor, USB Cable and User Guide
Disclosure: I'm the author of this library. It's commercial closed source but it can save you a lot of development time.

How to interpret K-Means clusters

I have written code in Java using Apache Spark for K-Means clustering.
I want to analyze network data. I created a K-Means model using some training data, with k=5 and iterations=50.
Now I want to detect anomalous records using the distance of a record from the center of its cluster: if it is far from the center, then it is an anomaly.
Also, I want to find out what type of data each cluster stores. For example, in movie clustering, this would be detecting a cluster whose movies share a common genre or theme.
I am having trouble interpreting the clusters. I am using one bad record and one good record for prediction, but at times both good and bad records fall in the same cluster.
A bad record means the URI field contains a value like HelloWorld/../../WEB-INF/web.xml.
I get an array of all cluster centers from the K-Means model, but there is no API to get the cluster center of a particular cluster. I am calculating the distance of an input vector/record from all cluster centers, but I am not able to get the center of the specific cluster the record belongs to.
Here is my code,
KMeansModel model = KMeans.train(trainingData.rdd(), numClusters, numIterations);
In a separate file:
model.save(sparkContext, KM_MODEL_PATH);
Vector[] clusterCenters = model.clusterCenters();
// Input for prediction is a Vector testData
// Predict data
System.out.println("Test Data cluster ----- " + model.predict(vector) + " k ->> " + model.k());
// Calculate distance of an input record from each cluster center
for (Vector clusterCenter : clusterCenters) {
    System.out.println(" Distance " + computeDistance(clusterCenter.toArray(), vector.toArray()));
}
// Function for computing the distance between an input record and the center of a cluster
public double computeDistance(double[] clusterCenter, double[] vector) {
    org.apache.spark.mllib.linalg.DenseVector dV1 = new org.apache.spark.mllib.linalg.DenseVector(clusterCenter);
    org.apache.spark.mllib.linalg.DenseVector dV2 = new org.apache.spark.mllib.linalg.DenseVector(vector);
    return org.apache.spark.mllib.linalg.Vectors.sqdist(dV1, dV2);
}

Dojo EnhancedGrid pagination in Java

I have a RESTful web service using the portal framework which gets hundreds of rows from the database. I want to display them in a Dojo EnhancedGrid with pagination, each time showing 10 rows, using page numbers 10|20|30. I am able to do pagination with this example, but my REST URL is loading all the records from the database, which leads to performance issues. There should be some event: every time I click on a page number, it should call the REST URL and get 10 records from the database. How can I achieve this?
Dojo EnhancedGrid with pagination makes a call to the backend REST service each time it is necessary (clicking next page/last page/previous page/a specific page/x results per page, and so on). It passes the Range parameter in the header of the request, indicating which items it requests for the current query (i.e. Range: items=0-9 will return the first 10 items, and so on). So this is done automatically by the pagination support.
What you have to do is read this parameter in the backend REST service and return the specified rows from the database. But be careful: the pagination expects an array of objects from the database.
@GET
@Path("getSearchResults")
@Produces(MediaType.APPLICATION_JSON)
public Response getSearchResults(@HeaderParam("Range") String range) {
    // parse range String
    // perform search
    return Response.ok(responseList.toArray())
            .header("Content-Range", "items " + startItem + "-" + endItem + "/" + totalItems)
            .build();
}
Also, the response should contain the number of items returned and the total item count, so that the pagination knows how many pages to display in the grid; it also shows a total in the lower-left corner of the grid. This is returned in the header of the response as well, in the following form: Content-Range: items 0-9/120.
For no results use Content-Range: */0.
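The "// parse range String" step in the handler above can be sketched like this. It's a minimal sketch: it assumes the header always has the well-formed items=start-end shape that the Dojo pagination sends, with no validation or error handling.

```java
// Parse a Dojo pagination Range header like "items=0-9" into start/end.
public class RangeHeader {
    public final int start;
    public final int end;

    public RangeHeader(String header) {
        String span = header.replace("items=", "").trim();  // e.g. "0-9"
        String[] parts = span.split("-");
        this.start = Integer.parseInt(parts[0]);
        this.end = Integer.parseInt(parts[1]);
    }

    // Number of rows to fetch from the database for this page.
    public int pageSize() {
        return end - start + 1;
    }
}
```

The start/end values then map directly onto the database query's offset and limit.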
on the Dojo side:
store = new JsonRest({
    handleAs: 'json',
    target: '{pathToServices}/rest/services/getSearchResults'
});
grid = new EnhancedGrid({
    id: "gridId",
    store: new ObjectStore({ objectStore: store }),
    structure: getGridStructure(),
    plugins: {
        pagination: {
            pageSizes: ["25", "50", "100"],
            description: true,
            sizeSwitch: true,
            pageStepper: true,
            gotoButton: true,
            maxPageStep: 4,
            position: "bottom"
        }
    }
});
That's all you have to do, Enhanced Grid Pagination takes care of everything else.
