Filtering Data in Java/Processing


I am working on a map-based data visualization project that scrapes data from an XML file. Locations are placed on a map based on their geolocation, and clicking a location displays information about it. I need to start filtering the results based on information about each location. For example, let's say I want to display information about trees, and I know their location and their type. I would want to filter walnut, cherry and oak in and out using check boxes.
I am trying to plan how to attack this problem from a design standpoint. Currently all the information is pulled directly from the XML file, with very little going into new arrays or lists. Any recommendations as I try to tackle this task? If you need me to elaborate or want more information, please let me know.
EDIT:
I'm sorry if this is vague; I'm not entirely sure how to ask the question. Right now I am taking 311 data and putting it into arrays based on the information I want to display. Say I want to get an address. (At this point a map has been populated with all of the individual locations from the 311 data, let's say 200 spots.) I click one location, and that location is tied to an index in an array that holds all of the addresses. So at any time I can use an index to get information from an array. There are multiple arrays holding information like address, report type, time, etc. I want the locations on the map to be sorted by report type. I hope this makes more sense.

If I understand correctly, these look like the requirements of a regular data management system. Such systems are hard to cover in a few words, but in a nutshell they are divided into layers:
a data layer, usually some database; try installing and using a database like MySQL
a data access layer; since you're using Java, consider Hibernate, which lets you describe and use your database through objects rather than RDBMS tables. This is also where your SQL/HQL queries would live
a business layer that holds the logic on top of these plain data objects, or perhaps connects to some external service
a layer that serves this data to your client, whether that is a Java client or a web browser
Check out the Spring Framework (http://projects.spring.io/spring-framework/) to see how it's done in practice.
Then, if going back and forth to the server for more data turns out to cost too much performance, you may decide to cache some of the information on the client side.
Last, always remember Donald Knuth's saying:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

Are you using the processing tag for Processing.org? If I understand you, wouldn't it make sense to create an object that groups all the data for one location, and retrieve the info when needed through a getter or even dot notation? I thought of something like:
class Local {
  String name;
  String address;
  //whatever else...
  float mapPosX, mapPosY;
  boolean isPressedOver() {
    //return true if the mouse is over this location
    return false; //placeholder
  }
}
Create the Locals from the XML data, store them in an array of Local, and when the mouse is pressed look up the one under it:
if (localsArray[i].isPressedOver()) { display(localsArray[i].address); }
Here is a very simple example of the idea, leaving out the XML parsing:
Place[] places = new Place[4];

void setup() {
  size(600, 400);
  noStroke();
  places[0] = new Place ("one", "That street, 12 - BR", 0.32044, 0.230098, 200, 98);
  places[1] = new Place ("two", "This street, 35 - UG", 0.22222, 0.084723, 394,176);
  places[2] = new Place ("three", "Other street, 132 - TY", 0.32321, 0.36388, 157, 283);
  places[3] = new Place ("four", "Principal street, 672 - OP", 0.909044, 0.7828939, 276, 312);
}

void draw() {
  background(75, 16, 160);
  for(Place p:places){
    p.display();
  }
}

class Place {
  String name;
  String address;
  float latitude;
  float longitude;
  float xPos;
  float yPos;
  float sz = 40;

  Place(String n, String a, float lat, float lng, float x, float y) {
    name = n;
    address = a;
    latitude = lat;
    longitude = lng;
    xPos = x;
    yPos = y;
  }

  void display() {
    fill(200, 210, 100);
    rect(xPos, yPos, sz, sz);
    if (isOver()) {
      String quick = name + " - " + address + " - " + latitude + " - " + longitude ;
      fill(0);
      text(quick, xPos - textWidth(quick)/2, yPos - 10);
    }
  }

  boolean isOver() {
    return (mouseX > xPos && mouseX < xPos + sz && mouseY > yPos && mouseY < yPos + sz);
  }
}
At the following link I pasted some code I'm working on. It does not display anything yet, but it does fetch XML data and build objects from it; for now the output goes to the console. I don't know if it will help much. Most variable names are in Portuguese :P and it is not commented, but it works and you can run it. It gets the XML from a web API. There are two classes; don't bother with Query, which is needed to get the XML but is not related to your question. The Prop class is the data holder. It takes the XML as a parameter and parses its fields into member variables. For now there is only one method, toString(), used to display the data in the console.
http://pastebin.com/8gGDsFAv
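To connect this back to the checkbox filtering in the question: once every location is an object, filtering by type can come down to keeping a set of enabled types and skipping the objects whose type is not in it. The following is a minimal sketch in plain Java, not the asker's code; the names `TypeFilter`, `Spot` and the `type` field are hypothetical and only there for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: TypeFilter, Spot and the "type" field are hypothetical names.
class TypeFilter {
    // One object per map location, as suggested above.
    static class Spot {
        final String name;
        final String type; // e.g. "walnut", "cherry", "oak"

        Spot(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Types that are currently switched on; one checkbox per type.
    private final Set<String> enabled = new HashSet<>();

    // Call this from the checkbox handler.
    void setEnabled(String type, boolean on) {
        if (on) {
            enabled.add(type);
        } else {
            enabled.remove(type);
        }
    }

    // Returns only the spots whose type is currently enabled.
    List<Spot> visible(List<Spot> all) {
        List<Spot> result = new ArrayList<>();
        for (Spot s : all) {
            if (enabled.contains(s.type)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Spot> spots = List.of(
            new Spot("tree 1", "oak"),
            new Spot("tree 2", "cherry"),
            new Spot("tree 3", "oak"));
        TypeFilter filter = new TypeFilter();
        filter.setEnabled("oak", true); // the "oak" box is ticked
        for (Spot s : filter.visible(spots)) {
            System.out.println(s.name + " (" + s.type + ")");
        }
    }
}
```

In a Processing sketch, draw() would then loop over the filtered list instead of the full array, so locations of unticked types are neither drawn nor clickable.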

Related

What input should I use for my Neural Network?

I recently implemented a simple Deep Q-Learning agent in Processing for a game called Frozen Lake (game from OpenAI Gym). The agent basically has to find the shortest path between the starting and the ending points, avoiding obstacles (holes in the ice) and without going out of the map.
This is the code that generates the state passed to the Neural Network:
//Return an array of double containing all 0s except for the cell the Agent is on that is 1.
private double[] getState()
{
    double[] state = new double[cellNum];
    for(Cell cell : lake.cells)
    {
        if((x - cellDim/2) == cell.x && (y - cellDim/2) == cell.y)
        {
            state[lake.cells.indexOf(cell)] = 1;
        }
        else
        {
            state[lake.cells.indexOf(cell)] = 0;
        }
    }
    return state;
}
where lake is the environment object, cells is an ArrayList attribute of lake containing all the squares of the map, x and y are the agent's coordinates on the map.
And all of this works well, but the agent only learns the best path for a single game map and if the map changes the agent must be trained all over again.
I wanted the agent to learn how to play the game and not how to play a single map.
So, instead of setting all the map squares to 0 except the one the agent is on, which is set to 1, I tried associating an arbitrary number with each kind of square (Goal:1, Ice:8, Hole:0, Goal:3, Agent:7) and setting the input like that, but it didn't work at all.
So I tried to convert all the colors of the squares into a grayscale value (from 0 to 255), so that now the different squares were mapped as (roughly): Goal:45, Ice:243.37, Hole:34.57, Goal:70.8, Agent:150.
But this didn't work either, so I mapped all the grayscale values to values between 0 and 1.
But no result with this either.
By the way, this is the code for the Neural Network to calculate the output:
public Layer[] estimateOutput(double[] input)
{
    Layer[] neurons = new Layer[2]; //Hidden neurons [0] and Output neurons [1].
    neurons[0] = new Layer(input); //To be transformed into Hidden neurons.
    //Hidden neurons values calculation.
    neurons[0] = neurons[0].dotProduct(weightsHiddenNeurons).addition(biasesHiddenNeurons).sigmoid();
    //Output neurons values calculation.
    neurons[1] = neurons[0].dotProduct(weightsOutputNeurons).addition(biasesOutputNeurons);
    if(gameData.trainingGames == gameData.gamesThreshold)
    {
        //this.render(new Layer(input), neurons[0], neurons[1].sigmoid()); //Draw Agent's Neural Network.
    }
    return neurons;
}
and to learn:
public void learn(Layer inputNeurons, Layer[] neurons, Layer desiredOutput)
{
    Layer hiddenNeurons = neurons[0];
    Layer outputNeurons = neurons[1];
    Layer dBiasO = (outputNeurons.subtraction(desiredOutput)).valueMultiplication(2);
    Layer dBiasH = (dBiasO.dotProduct(weightsOutputNeurons.transpose())).layerMultiplication((inputNeurons.dotProduct(weightsHiddenNeurons).addition(biasesHiddenNeurons)).sigmoidDerivative());
    Layer dWeightO = (hiddenNeurons.transpose()).dotProduct(dBiasO);
    Layer dWeightH = (inputNeurons.transpose()).dotProduct(dBiasH);
    //Set new values for Weights and Biases
    weightsHiddenNeurons = weightsHiddenNeurons.subtraction(dWeightH.valueMultiplication(learningRate));
    biasesHiddenNeurons = biasesHiddenNeurons.subtraction(dBiasH.valueMultiplication(learningRate));
    weightsOutputNeurons = weightsOutputNeurons.subtraction(dWeightO.valueMultiplication(learningRate));
    biasesOutputNeurons = biasesOutputNeurons.subtraction(dBiasO.valueMultiplication(learningRate));
}
Anyway, the whole project is available on GitHub, where the code is better commented: https://github.com/Nyphet/Frozen-Lake-DQL
What am I doing wrong on setting the input? How can I achieve "learning the game" instead of "learning the map"?
Thanks in advance.
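One thing the attempts above have in common is that the kinds of square are encoded as magnitudes (8, 243.37 and so on), which suggests an ordering between them that the game does not have. A commonly used alternative for categorical inputs is to one-hot encode the type of every cell, so the input describes the layout of the current map and not only the agent's index. The sketch below shows that encoding only. The `StateEncoder` class, the `CellType` enum and the `encode` method are hypothetical names, not part of the linked project, and whether this is enough for the agent to generalize across maps is not something established here.

```java
import java.util.Arrays;

// Sketch only: StateEncoder, CellType and encode are hypothetical names.
class StateEncoder {
    // Assumed cell categories for a Frozen Lake style grid.
    enum CellType { ICE, HOLE, START, GOAL }

    // One-hot encodes the type of every cell, then appends a one-hot agent position.
    // Resulting length = cells * (number of types) + cells.
    static double[] encode(CellType[] cells, int agentIndex) {
        int types = CellType.values().length;
        double[] state = new double[cells.length * types + cells.length];
        for (int i = 0; i < cells.length; i++) {
            state[i * types + cells[i].ordinal()] = 1.0;
        }
        state[cells.length * types + agentIndex] = 1.0;
        return state;
    }

    public static void main(String[] args) {
        CellType[] map = { CellType.START, CellType.ICE, CellType.HOLE, CellType.GOAL };
        System.out.println(Arrays.toString(encode(map, 0)));
    }
}
```

With this layout the network's input size grows with the number of cell types, so the hidden-layer weight matrix would have to be resized to match.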

How To Iterate Through Every Latitude/Longitude in Java

I'm accessing this API that gives me global weather:
https://callforcode.weather.com/doc/v3-global-weather-notification-headlines/
However, it takes lat/lng as an input parameter, and I need the data for the entire world.
I figure I can step through latitude and longitude in 2-degree increments, giving me a grid point roughly every 120 miles or so, which should cover the whole world in 16,200 API calls ((360/2) * (180/2)).
How can I do this effectively in Java?
I'd conceived something like this; but is there a better way of doing this?
for (int lat = -90; lat < 90; lat += 2) {
    for (int lng = -180; lng < 180; lng += 2) {
        //call api with lat and lng
    }
}

It's somewhat of a paradigm shift, but I would NOT use a nested for-loop for this problem. In many situations where you are looking at iterating over an entire result set, it is often possible to trim the coverage dramatically without losing much or any effectiveness. Caching, trimming, prioritizing... these are the things you need: not a for-loop.
Cut sections entirely - maybe you can ignore the oceans, maybe you can ignore Antarctica and the North Pole (since people there have better ways of checking the weather anyway)
Change your search frequency based on population density. Maybe northern Canada doesn't need to be checked as thoroughly as Los Angeles or Chicago.
Rely on caching in low-usage areas - presumably you can track what areas are actually being used and can then more frequently refresh those sections.
So what you end up with is some sort of weighted caching system that takes into account population density, usage patterns, and other priorities to determine what latitude/longitude coordinates to check and how frequently.
High-level code might look something like this:
void executeUpdateSweep(List<CoordinateCacheItem> cacheItems)
{
    for(CoordinateCacheItem item : cacheItems)
    {
        if(shouldRefreshCache(item))
        {
            //call api with lat = item.y , lng = item.x
        }
    }
}
boolean shouldRefreshCache(CoordinateCacheItem item)
{
    long ageWeight = calculateAgeWeight(item);//how long since last update?
    long basePopulationWeight = item.getBasePopulationWeight();//how many people (users and non-users) live here?
    long usageWeight = calculateUsageWeight(item);//how much is this item requested?
    return ageWeight + basePopulationWeight + usageWeight > someArbitraryThreshold;
}
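To make the high-level code above concrete, here is a small self-contained version that runs on its own. Everything specific in it is an assumption made for illustration: the fields of `CoordinateCacheItem`, the way the three weights are computed (one point per minute of age, one point per request) and the threshold of 100. The real API call is replaced by a comment, and the sweep simply returns the items it would have refreshed.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: field names, weights and the threshold are illustrative assumptions.
class CacheSweep {
    static class CoordinateCacheItem {
        final double lat;
        final double lng;
        final long basePopulationWeight; // assumed to be precomputed per grid point
        long requestsSinceRefresh;       // bumped whenever a user asks for this point
        long lastRefreshMillis;

        CoordinateCacheItem(double lat, double lng, long basePopulationWeight, long lastRefreshMillis) {
            this.lat = lat;
            this.lng = lng;
            this.basePopulationWeight = basePopulationWeight;
            this.lastRefreshMillis = lastRefreshMillis;
        }
    }

    static final long THRESHOLD = 100; // arbitrary; would need tuning

    static boolean shouldRefreshCache(CoordinateCacheItem item, long nowMillis) {
        long ageWeight = (nowMillis - item.lastRefreshMillis) / 60_000; // one point per minute
        long usageWeight = item.requestsSinceRefresh;
        return ageWeight + item.basePopulationWeight + usageWeight > THRESHOLD;
    }

    // Returns the items that were refreshed during this sweep.
    static List<CoordinateCacheItem> executeUpdateSweep(List<CoordinateCacheItem> items, long nowMillis) {
        List<CoordinateCacheItem> refreshed = new ArrayList<>();
        for (CoordinateCacheItem item : items) {
            if (shouldRefreshCache(item, nowMillis)) {
                // the real API call with item.lat / item.lng would go here
                item.lastRefreshMillis = nowMillis;
                item.requestsSinceRefresh = 0;
                refreshed.add(item);
            }
        }
        return refreshed;
    }

    public static void main(String[] args) {
        long now = 20 * 60_000L; // pretend 20 minutes have passed
        List<CoordinateCacheItem> items = List.of(
            new CoordinateCacheItem(34.0, -118.0, 90, 0),  // dense area
            new CoordinateCacheItem(70.0, -100.0, 5, 0));  // sparse area
        System.out.println(executeUpdateSweep(items, now).size() + " item(s) refreshed");
    }
}
```

Passing the current time in as a parameter, instead of reading the clock inside the method, keeps the refresh decision easy to test.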

Arraylist of lat/long points of interest?

I've made an app that implements augmented reality based on POIs, and I have all the functionality working for one POI, but I would now like to be able to put in multiple points. Can anyone give me advice on how to do this? Can I create an array of POIs? I've posted my relevant code below but don't really know where to go from here.
private void setAugmentedRealityPoint() {
    homePoi = new AugmentedPOI(
        "Home",
        "Latitude, longitude",
        28.306802, -81.601358
    );
}
This is how it's currently set, and I then go on to use it in other areas, as shown below:
public double calculateAngle() {
    double dX = homePoi.getPoiLatitude() - myLatitude;
    double dY = homePoi.getPoiLongitude() - myLongitude;
}
and here:
private boolean isWithinDistance(double myLatitude, double myLongitude){
    Location my1 = new Location("One");
    my1.setLatitude(myLatitude);
    my1.setLongitude(myLongitude);
    Location target = new Location("Two");
    target.setLatitude(homePoi.getPoiLatitude());
    target.setLongitude(homePoi.getPoiLongitude());
    double range = my1.distanceTo(target);
    double zone = 20;
    if (range < zone) {
        return true;
    }
    else {
        return false;
    }
}
Any help would be appreciated.

Using a List would be a smart idea. You could add all entries into it in code, or you could pull them in from a JSON file. When you're rendering them, you could check if they are in range.
If you have a lot of these POIs, you should divide them into smaller and smaller regions, and only load what you need. For example, structure them like this:
- CountryA
+ County 1
* POI
* POI
- CountryB
+ County 1
* POI
* POI
+ County 2
* POI
Get the country and county of the user, and only load what you really need. I assume this is a multiplayer game, so I'll share some of my code.
On the server side, I have 3 objects: Country, County and POI.
First I discover all countries on the disk, and make an object for it. Inside my country object I have a list for all counties, and inside my County object I have a list of POIs. When a player joins, they send a packet with their Country and County, and I can select the appropriate POIs for them. Storing them in smaller regions is essential, or your server will have a hard time if you go through all of the POIs for every player.
Here is my method for discovering data: Server.java#L311-L385
Code for selecting POIs for a player: Server.java#L139-L181
And how you can render it: PlayScreen.java#L209-L268
You need to port it to your own app, and I'm probably horrible at explaining, but I hope you got something out of it.
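As a rough illustration of the "use a List and check which entries are in range" idea, here is a minimal sketch in plain Java. It is not taken from the linked server code. The `Poi` class is a hypothetical stand-in for the question's `AugmentedPOI`, and the haversine distance is used only so the example runs outside Android; inside an Android app, `Location.distanceTo` from the question would do that job.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: PoiList and Poi are hypothetical names.
class PoiList {
    // Stand-in for the question's AugmentedPOI.
    static class Poi {
        final String name;
        final double lat;
        final double lon;

        Poi(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius in metres

    // Great-circle distance in metres (haversine formula).
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // Returns every POI closer than zoneMeters to the given position.
    static List<Poi> withinDistance(List<Poi> pois, double myLat, double myLon, double zoneMeters) {
        List<Poi> result = new ArrayList<>();
        for (Poi p : pois) {
            if (distanceMeters(myLat, myLon, p.lat, p.lon) < zoneMeters) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Poi> pois = new ArrayList<>();
        pois.add(new Poi("Home", 28.306802, -81.601358));
        pois.add(new Poi("Far away", 28.4, -81.6));
        for (Poi p : withinDistance(pois, 28.30685, -81.60136, 20)) {
            System.out.println(p.name + " is within range");
        }
    }
}
```

The single-POI methods from the question (such as isWithinDistance) would then take a POI as a parameter, or loop over the list, instead of reading the homePoi field.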

compare two hashtable values and put results into an arraylist

I'm a little stuck with the way I want to compare and collect data from hash tables that I am creating. The code is a little messy as I am now getting a little confused on where to go next, I apologise.
I have two Hashtables. Each table holds keys which are co-ordinates. One table holds the longitudes, the other the latitudes. The values of each side hold a location. What I want to do is compare the two tables values, so if the Strings are the same, then the Key and Value can be put into a separate ArrayLists. A coordinates list, maybe one for lat and one for lon to make the gps feature easier, and then a location list.
Here is the code :
public static void trails(String[] args) {
Hashtable<Double, String> trailLat = new Hashtable<Double, String>();
trailLat.put(51.7181283, "Bike Park Wales");
...
Hashtable<Double, String> trailLon = new Hashtable<Double, String>();
trailLon.put(-3.3633637, "Bike Park Wales");
...
if ( trailLat.keys() >= userLatL && trailLat.keys() <= userLatH ) {
trailLat.values().retainAll(trailLon);
ArrayList<Item> items = new ArrayList<Item>(trailLat.values());
}
....
I've only included the latitude part of the code as I would think it's just repeated.
'userLatL' and 'userLatH' are the lower and upper latitude bounds of a 20-mile radius around the user's location. The idea is to return the keys and values that fall within those bounds.
Cheers in advance! Any help would be really appreciated!

I would use this approach
class Trail{
    String name;
    double lat,lon;
}
double LAT_MIN, LAT_MAX, LON_MIN, LON_MAX;
List<Trail> trails = new ArrayList<Trail>();
//populate trails from db or whatever
List<Trail> goodTrails = new ArrayList<Trail>();
for(Trail trail : trails){
    if(trail.lat > LAT_MIN && trail.lat < LAT_MAX && trail.lon > LON_MIN && trail.lon < LON_MAX ){
        goodTrails.add(trail);
    }
}

Even though you are only looking at a 20-mile radius, you would have to take the spherical shape of the Earth into account if that radius ever grows bigger.
You would also need a better structure than two hash maps keyed by lon/lat values. A simple table that stores all the points (in a database or as a list of objects) would do.
Rather than repeat someone else's work, I'll point to where everything is explained:
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
Good Luck.
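For readers who want a starting point before working through the linked article, here is a small sketch of a bounding box around a point on a spherical Earth. It is a simplification written for this thread, not code from the article: the class and method names are hypothetical, the Earth radius is the mean value of about 3958.8 miles, and the special cases where the circle touches a pole or crosses the 180th meridian are deliberately not handled.

```java
// Sketch only: BoundingBox and around are hypothetical names.
class BoundingBox {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean Earth radius

    // Returns {latMin, latMax, lonMin, lonMax} in degrees.
    // Assumes the circle does not touch a pole or cross the 180th meridian.
    static double[] around(double latDeg, double lonDeg, double radiusMiles) {
        double r = radiusMiles / EARTH_RADIUS_MILES; // angular radius in radians
        double lat = Math.toRadians(latDeg);
        // Longitude lines converge towards the poles, so the longitude span
        // has to be widened by 1 / cos(latitude).
        double dLon = Math.asin(Math.sin(r) / Math.cos(lat));
        return new double[] {
            latDeg - Math.toDegrees(r), latDeg + Math.toDegrees(r),
            lonDeg - Math.toDegrees(dLon), lonDeg + Math.toDegrees(dLon)
        };
    }

    public static void main(String[] args) {
        double[] box = around(51.7181283, -3.3633637, 20);
        System.out.println("lat " + box[0] + " .. " + box[1]);
        System.out.println("lon " + box[2] + " .. " + box[3]);
    }
}
```

The four values play the role of the min/max latitude and longitude bounds in a filtering loop. Points inside the box can still be slightly farther away than the radius (the box corners), so an exact distance check on the remaining candidates is a sensible second step.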

Drools performance for decision tables

I have a potential performance/memory bottleneck when I try to calculate insurance premium using Drools engine.
I use Drools in my project to separate business logic from java code and I decided to use it for premium calculation too.
Am I using Drools the wrong way?
How to meet the requirements in more performant way?
Details below:
Calculations
I have to calculate insurance premium for given contract.
Contract is configured with
productCode (code from dictionary)
contractCode (code from dictionary)
client’s personal data (e.g. age, address)
insurance sum (SI)
etc.
At the moment, premium is calculated using this formula:
premium := SI * px * (1 + py) / pz
where:
px is factor parameterized in Excel file and depends on 2 properties (client’s age and sex)
py is factor parameterized in excel file and depends on 4 contract’s properties
pz - similarly
Requirements
R1 – java code doesn’t know the formula,
R2 - java code knows nothing about formula dependencies, in other words that premium depends on: px, py, pz,
R3 - java code knows nothing about parameters’ dependencies, I mean that px depends on client’s age and sex, and so on.
With R1, R2 and R3 implemented I have java code in separation from business logic, and any business analyst (BA) may modify formula and add new dependencies without redeploys.
My solution, so far
I have contract domain model, which consists of classes Contract, Product, Client, Policy and so on. Contract class is defined as:
public class Contract {
String code; // contractCode
double sumInsured; // SI
String clientSex; // M, F
int professionCode; // code from dictionary
int policyYear; // 1..5
int clientAge; //
... // etc.
In addition I introduced Var class that is container for any parameterized variable:
public class Var {
public final String name;
public final ContractPremiumRequest request;
private double value; // calculated value
private boolean ready; // true if value is calculated
public Var(String name, ContractPremiumRequest request) {
this.name = name;
this.request = request;
}
...
public void setReady(boolean ready) {
this.ready = ready;
request.check();
}
...
// getters, setters
}
and finally - request class:
public class ContractPremiumRequest {
public static enum State {
INIT,
IN_PROGRESS,
READY
}
public final Contract contract;
private State state = State.INIT;
// all dependencies (parameterized factors, e.g. px, py, ...)
private Map<String, Var> varMap = new TreeMap<>();
// calculated response - premium value
private BigDecimal value;
public ContractPremiumRequest(Contract contract) {
this.contract = contract;
}
// true if *all* vars are ready
private boolean _isReady() {
for (Var var : varMap.values()) {
if (!var.isReady()) {
return false;
}
}
return true;
}
// check if should modify state
public void check() {
if (_isReady()) {
setState(State.READY);
}
}
// read number from var with given [name]
public double getVar(String name) {
return varMap.get(name).getValue();
}
// adding uncalculated factor to this request – makes request IN_PROGRESS
public Var addVar(String name) {
Var var = new Var(name, this);
varMap.put(name, var);
setState(State.IN_PROGRESS);
return var;
}
...
// getters, setters
}
Now I can use these classes with such flow:
request = new ContractPremiumRequest(contract)
creates request with state == INIT
px = request.addVar( "px" )
creates Var("px") with ready == false
moves request to state == IN_PROGRESS
py = request.addVar( "py" )
px.setValue( factor ), px.setReady( true )
set calculated value on px
makes it ready == true
request.check() makes state == READY if ALL vars are ready
now we can use formula, as request has all dependencies calculated
I have created 2 DRL rules and prepared 3 decision tables (px.xls, py.xls, ...) with factors provided by BA.
Rule1 - contract_premium_prepare.drl:
rule "contract premium request - prepare dependencies"
when
$req : ContractPremiumRequest (state == ContractPremiumRequest.State.INIT)
then
insert( $req.addVar("px") );
insert( $req.addVar("py") );
insert( $req.addVar("pz") );
$req.setState(ContractPremiumRequest.State.IN_PROGRESS);
end
Rule2 - contract_premium_calculate.drl:
rule "contract premium request - calculate premium"
when
$req : ContractPremiumRequest (state == ContractPremiumRequest.State.READY)
then
double px = $req.getVar("px");
double py = $req.getVar("py");
double pz = $req.getVar("pz");
double si = $req.contract.getSumInsured();
// use formula to calculate premium
double premium = si * px * (1 + py) / pz;
// round to 2 digits
$req.setValue(premium);
end
Decision table px.xls:
Decision table py.xls:
KieContainer is constructed once on startup:
dtconf = KnowledgeBuilderFactory.newDecisionTableConfiguration();
dtconf.setInputType(DecisionTableInputType.XLS);
KieServices ks = KieServices.Factory.get();
KieContainer kc = ks.getKieClasspathContainer();
Now to calculate premium for given contract we write:
ContractPremiumRequest request = new ContractPremiumRequest(contract); // state == INIT
kc.newStatelessKieSession("session-rules").execute(request);
BigDecimal premium = request.getValue();
This is what happens:
Rule1 fires for ContractPremiumRequest[INIT]
this rule creates and adds px, py and pz dependencies (Var objects)
proper excel row fires for each px, py, pz object and makes it ready
Rule2 fires for ContractPremiumRequest[READY] and use formula
Volumes
PX decision table has ~100 rows,
PY decision table has ~8000 rows,
PZ decision table has ~50 rows.
My results
The first calculation, which loads and initializes the decision tables, takes ~45 seconds – this might become problematic.
Each calculation (after some warmup) takes ~0.8 ms – which is acceptable for our team.
Heap consumption is ~150 MB – which is problematic, as we expect much bigger tables to be used.
Question
Am I using Drools the wrong way?
How to meet the requirements in more performant way?
How to optimize memory usage?
========== EDIT (after 2 years) ==========
This is a short summary after 2 years.
Our system has grown a lot, as we expected. We have ended up with more than 500 tables (or matrices) holding insurance pricing, actuarial factors, coverage configs etc.
Some tables are more than 1 million rows in size. We used Drools, but we couldn't solve the performance problems.
In the end we switched to the Hyperon engine (http://hyperon.io).
This system is a beast - it allows us to run hundreds of rule matches in approx. 10 ms total time.
We were even able to trigger a full policy recalculation on every KeyType event on UI fields.
As we have learnt, Hyperon uses fast in-memory indexes for each rule table, and these indexes are somehow compacted so that they have almost no memory footprint.
We have one more benefit now - all pricing, factor and config tables can be modified online (both values and structure), and this is fully transparent to the Java code.
The application just continues to work with the new logic; no development or restart is needed.
However, we needed some time and effort to get to know Hyperon well enough :)
I have found some comparison made by our team a year ago - it shows engine initialization (drools/hyperon) and 100k simple calculations from jvisualVM perspective:

The problem is that you have created a huge amount of code (all the rules resulting from the tables) for what is a relatively small amount of data. I have seen similar cases, and they all benefited from inserting the tables as data. PxRow, PyRow and PzRow should be defined like this:
class PxRow {
private String gender;
private int age;
private double px;
// Constructor (3 params) and getters
}
Data can still be in (simpler) spreadsheets or anything else you fancy for data entry by the BA boffins. You insert all rows as facts PxRow, PyRow, PzRow. Then you need one or two rules:
rule calculate
when
    $c: Contract( $cs: clientSex, $ca: clientAge,
                  $pc: professionCode, $year: policyYear,...
                  ...
                  $si: sumInsured )
    PxRow( gender == $cs, age == $ca, $px: px )
    PyRow( profCode == $pc, polYear == $year,... $py: py )
    PzRow( ... $pz: pz )
then
    double premium = $si * $px * (1 + $py) / $pz;
    // round to 2 digits
    modify( $c ){ setPremium( premium ) }
end
Forget the flow and all the other decorations. But you may need another rule just in case your Contract doesn't match Px or Py or Pz:
rule "no match"
salience -100
when
    $c: Contract( premium == null ) // or 0.00
then
    // diagnostic
end

After reading the question more carefully, I would offer a few recommendations:
I'd prefer a relational database to Excel spreadsheets.
These are trivially simple calculations. I think the model is overkill. A rules engine seems like far too big a hammer for a problem of this size.
I would code it more simply.
Make the calculation interface-based so you can modify it by injecting a new class implementation.
Learn how to write JUnit tests.
My first choice would be a simple decision table calculation, without a rules engine, maintaining the factors in a relational database.
A Rete rules engine is a big hammer for if/else or switch statements. I think it's overkill unless you're leveraging induction features.
I would not put anything in session. I'm envisioning an idempotent REST service that takes in a request and returns a response with premium and whatever else has to come back.
It sounds to me like you are grossly overcomplicating the solution prematurely. Do the simplest thing that can possibly work; measure the performance; refactor as needed based on the data you get back and requirements.
How experienced a developer are you? Are you alone or part of a team? Is this a new system that's never been done by you before?
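To give an idea of what "a simple decision table calculation, without a rules engine" could look like, here is a minimal sketch that applies the formula from the question, premium := SI * px * (1 + py) / pz, using plain map lookups. Everything beyond the formula is an assumption made for illustration: the class name, the string key layout and the factor values are made up, and in a real system the maps would be loaded from the relational database instead of being filled in code.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

// Sketch only: the key layout and the factor values are made up for illustration.
class PremiumCalculator {
    // Each table maps a key built from the properties a factor depends on to the factor.
    final Map<String, Double> px = new HashMap<>();
    final Map<String, Double> py = new HashMap<>();
    final Map<String, Double> pz = new HashMap<>();

    // Joins the lookup properties into one key, e.g. key("M", 30) gives "M|30".
    static String key(Object... parts) {
        StringBuilder sb = new StringBuilder();
        for (Object p : parts) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(p);
        }
        return sb.toString();
    }

    private static double lookup(Map<String, Double> table, String name, String key) {
        Double value = table.get(key);
        if (value == null) {
            throw new IllegalArgumentException("No " + name + " factor for " + key);
        }
        return value;
    }

    // premium := SI * px * (1 + py) / pz, rounded to 2 digits
    BigDecimal premium(double sumInsured, String pxKey, String pyKey, String pzKey) {
        double x = lookup(px, "px", pxKey);
        double y = lookup(py, "py", pyKey);
        double z = lookup(pz, "pz", pzKey);
        double raw = sumInsured * x * (1 + y) / z;
        return BigDecimal.valueOf(raw).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        PremiumCalculator calc = new PremiumCalculator();
        calc.px.put(key("M", 30), 0.5);  // sex, age
        calc.py.put(key(12, 1), 0.25);   // e.g. profession code, policy year
        calc.pz.put(key("A"), 2.0);
        System.out.println(calc.premium(1000, key("M", 30), key(12, 1), key("A")));
    }
}
```

A hash lookup per factor keeps memory proportional to the number of rows. The trade-off is that the formula and its dependencies now live in Java code, which conflicts with requirements R1 to R3 in the question unless the formula itself is also made configurable.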
