Loading mathematical formulae dynamically and evaluating values using ScriptEngine - Java

I need to implement a feature for a program that calculates metrics based on predefined measures with the requirement that the addition of a new metric (and/or measure) should require minimal code level changes. This is how I have done it:
class Measure { // class to hold the measures
Integer id;
String name;
Long value;
// necessary getters and setters
}
These measures are retrieved from a MySQL DB.
// the measures stored in an ArrayList of objects of the above Measure class
[
{
"id": 1,
"name": "Displacement",
"value": 200
},
{
"id": 2,
"name":"Time",
"value": 120
},
{
"id":3,
"name":"Mass",
"value": 233
},
{
"id":4,
"name": "Acceleration",
"value": 9.81
},
{
"id": 5,
"name":"Speed of Light",
"value": 300000000
}
]
I need to get metrics such as the following:
Velocity (Displacement/Time), Force (Mass * Acceleration) and Energy (Mass * Speed of Light^2)
I implemented the following JSON in which I have defined the above formulae:
[
{
"title": "Velocity",
"unit": "m/s",
"formula": "( displacement / time )"
},
{
"title": "Force",
"unit": "N",
"formula": "( mass * acceleration )"
},
{
"title": "Energy",
"unit": "J",
"formula": "( mass * speed_of_light * speed_of_light )"
}
]
I then calculate metrics in the following way:
class Evaluator {
ScriptEngine engine = new ScriptEngineManager().getEngineByExtension("js");
public Evaluator(List<Measure> measures) {
measures.forEach(measure -> {
String fieldName = measure.getName().replace(" ", "_").toLowerCase(); // convert Speed of Light -> speed_of_light etc
engine.put(fieldName, Double.parseDouble(measure.getValue().toString())); // getValue() is not a String, so convert it before parsing
});
}
public void formula(String formula) throws Exception {
engine.eval("function calculateMetric() { return " + formula +" }");
}
public Object evaluate() throws ScriptException {
return engine.eval("calculateMetric()");
}
}
The above class is used to load each formula into the Script Engine and then calculate the metric value based on the formula provided.
// load JSON file into JsonArray
JsonArray formulae = parseIntoJsonArray(FileUtils.getContent("formulae.json"));
Evaluator evaluator = new Evaluator(measures);
for (Object formula : formulae) {
JsonObject jsonObj = (JsonObject) formula;
evaluator.formula(jsonObj.get("formula").getAsString());
Double metricVal = Double.parseDouble(evaluator.evaluate().toString());
// do stuff with value
}
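(For the sample measures and formulae above, this loop would produce, for example, Velocity = 200 / 120 ≈ 1.67 m/s, Force = 233 * 9.81 ≈ 2285.73 N, and Energy = 233 * (3e8)^2 ≈ 2.1e19 J.)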
The code works as expected. I want to know if I am doing anything wrong that could affect the program in the long run, if anything is not best practice, or if there is a better/easier way to accomplish the same thing.
This question is closely related to a question I posted yesterday. Sort of a second part.

Related

Google DLP - Can I use a delimiter to instruct DLP infotype detectors to search only inside that for sensitive text?

I have an issue while trying to de-identify some data with DLP. I use an object mapper to parse the object into a string, send it to DLP for de-identification, get back the de-identified string, and use the object mapper to parse the string back into the initial object. Sometimes DLP returns a string that cannot be parsed back into the initial object (it breaks the JSON format the object mapper expects).
I use an objectMapper to parse an Address object to string like this:
data class Address(
val postal_code: String,
val street: String,
val city: String,
val provence: String
)
and my object mapper will transform this object into a string, e.g. "{\"postal_code\":\"123ABC\",\"street\":\"Street Name\",\"city\":\"My City\",\"provence\":\"My Provence\"}", which is sent to DLP and de-identified (using LOCATION or STREET_ADDRESS detectors).
The issue is that my object mapper expects to take back the de-identified string and parse it back to my Address object using the same JSON format, e.g.:
"{\"postal_code\":\"LOCATION_TOKEN(10):asdf\",\"street\":\"LOCATION_TOKEN(10):asdf\",\"city\":\"LOCATION_TOKEN(10):asdf\",\"provence\":\"LOCATION_TOKEN(10):asdf\"}"
But there are a lot of times that DLP will return something like
"{"LOCATION_TOKEN(25):asdfasdfasdf)\",\"provence\":\"LOCATION_TOKEN(10):asdf\"}" - basically breaking the json format and i am unable to parse back the string from DLP to my initial object
Is there a way to instruct DLP infotype detectors to keep the json format, or to look for sensitive text only inside \" * \"?
Thanks
There are some options here using a custom regex and a detection ruleset in order to define a boundary on matches.
The general idea is that you require that findings must match both an infoType (e.g. STREET_ADDRESS, LOCATION, PERSON_NAME, etc.) and your custom infoType before reporting as a finding or for redaction. By requiring that both match, you can set bounds on where the infoType can detect.
Here is an example.
{
"item": {
"value": "{\"postal_code\":\"123ABC\",\"street\":\"Street Name\",\"city\":\"My City\",\"provence\":\"My Provence\"}"
},
"inspectConfig": {
"customInfoTypes": [
{
"infoType": {
"name": "CUSTOM_BLOCK"
},
"regex": {
"pattern": "(:\")([^,]*)(\")",
"groupIndexes": [
2
]
},
"exclusionType": "EXCLUSION_TYPE_EXCLUDE"
}
],
"infoTypes": [
{
"name": "EMAIL_ADDRESS"
},
{
"name": "LOCATION"
},
{
"name": "PERSON_NAME"
}
],
"ruleSet": [
{
"infoTypes": [
{
"name": "LOCATION"
}
],
"rules": [
{
"exclusionRule": {
"excludeInfoTypes": {
"infoTypes": [
{
"name": "CUSTOM_BLOCK"
}
]
},
"matchingType": "MATCHING_TYPE_INVERSE_MATCH"
}
}
]
}
]
},
"deidentifyConfig": {
"infoTypeTransformations": {
"transformations": [
{
"primitiveTransformation": {
"replaceWithInfoTypeConfig": {}
}
}
]
}
}
}
Example output:
"item": {
"value": "{\"postal_code\":\"123ABC\",\"street\":\"Street Name\",\"city\":\"My City\",\"provence\":\"My [LOCATION]\"}"
},
By setting "groupIndexes" to 2 we are indicating that we only want the custom infoType to match the middle (or second) regex group and not allow the :" or " to be part of the match. Also, in this example we mark the custom infoType as EXCLUSION_TYPE_EXCLUDE so that it does not report itself:
"exclusionType": "EXCLUSION_TYPE_EXCLUDE"
If you remove this line, anything matching your infoType could also get redacted. This can be useful for testing though - example output:
"item": {
"value": "{\"postal_code\":\"[CUSTOM_BLOCK]\",\"street\":\"[CUSTOM_BLOCK]\",\"city\":\"[CUSTOM_BLOCK]\",\"provence\":\"[CUSTOM_BLOCK][LOCATION]\"}"
},
...
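To see concretely what the second group of the pattern (:\")([^,]*)(\") captures on the sample value, here is a small standalone illustration using plain java.util.regex (this only visualizes the grouping; it is not part of the DLP API):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGroupDemo {
    public static void main(String[] args) {
        String json = "{\"postal_code\":\"123ABC\",\"street\":\"Street Name\",\"city\":\"My City\",\"provence\":\"My Provence\"}";
        Pattern pattern = Pattern.compile("(:\")([^,]*)(\")");
        Matcher matcher = pattern.matcher(json);
        while (matcher.find()) {
            // group 2 is only the value between the quotes; the leading :" and trailing " stay in groups 1 and 3
            System.out.println(matcher.group(2)); // 123ABC, Street Name, My City, My Provence
        }
    }
}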
Hope this helps.

Json Array Flattening with Jackson / Parsing Performance

I have a JSON like this below:
[{
"id": "1",
"teams": [{
"name": "barca"
},
{
"name": "real"
}
]
},
{
"id": "2"
},
{
"id": "3",
"teams": [{
"name": "atletico"
},
{
"name": "cz"
}
]
}
]
My aimed POJO is
class Team {
    int id;
    String name;
}
Meaning, for each "team" I want to create a new object, like:
new Team(1,barca)
new Team(1,real)
new Team(2,null)
new Team(3,atletico)
...
Which I believe I did with custom deserializer like below:
JsonNode rootArray = jsonParser.readValueAsTree();
for (JsonNode root : rootArray) {
String id = root.get("id").toString();
JsonNode teamsNodeArray = root.get("teams");
if (teamsNodeArray != null) {
for (JsonNode teamNode: teamsNodeArray ) {
String nameString = teamNode.get("name").toString();
teamList.add(new Team(id, nameString));
}
} else {
teamList.add(new Team(id, null));
}
}
Considering I am getting 750k records, having two nested for loops is, I believe, making the code way slower than it should be. It takes ~7 min.
My question is, could you please enlighten me if there is any better way to do this?
PS: I have checked many stackoverflow threads for this, could not find anything that fits so far.
Thank you in advance.
Do not parse the data yourself, use automatic de/serialization whenever possible.
Using jackson it could be as simple as:
MyData myData = new ObjectMapper().readValue(rawData, MyData.class);
For your specific example, we generate a really big instance (10M rows):
$ head big.json
[{"id": 1593, "group": "6141", "teams": [{"id": 10502, "name": "10680"}, {"id": 16435, "name": "18351"}]}
,{"id": 28478, "group": "3142", "teams": [{"id": 30951, "name": "3839"}, {"id": 25310, "name": "19839"}]}
,{"id": 29810, "group": "8889", "teams": [{"id": 5586, "name": "8825"}, {"id": 27202, "name": "7335"}]}
...
$ wc -l big.json
10000000 big.json
Then, define classes matching your data model (e.g.):
public static class Team {
public int id;
public String name;
}
public static class Group {
public int id;
public String group;
public List<Team> teams;
}
Now you can read directly the data by simply:
List<Group> xs = new ObjectMapper()
.readValue(
new File(".../big.json"),
new TypeReference<List<Group>>() {});
A complete code could be:
public static void main(String... args) throws IOException {
long t0 = System.currentTimeMillis();
List<Group> xs = new ObjectMapper().readValue(new File("/home/josejuan/tmp/1/big.json"), new TypeReference<List<Group>>() {});
long t1 = System.currentTimeMillis();
// test: add all group id
long groupIds = xs.stream().mapToLong(x -> x.id).sum();
long t2 = System.currentTimeMillis();
System.out.printf("Group id sum := %d, Read time := %d mS, Sum time = %d mS%n", groupIds, t1 - t0, t2 - t1);
}
With output:
Group id sum := 163827035542, Read time := 10710 mS, Sum time = 74 mS
Only 11 seconds to parse 10M rows.
To check data and compare performance, we can read directly from disk:
$ perl -n -e 'print "$1\n" if /"id": ([0-9]+), "group/' big.json | time awk '{s+=$1}END{print s}'
163827035542
4.96user
This takes about 5 seconds (so the Java code is only about twice as slow as a raw text scan).
Processing the data afterwards (which is not the performance problem) can be done in many ways depending on how you want to use the information. For example, collecting all the teams can be done like this:
List<Team> teams = xs.stream()
.flatMap(x -> x.teams.stream())
.collect(toList());
Map<Integer, Team> uniqTeams = xs.stream()
.flatMap(x -> x.teams.stream())
.collect(toMap(
x -> x.id,
x -> x,
(a, b) -> a));
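If the goal is the flattened (group id, team name) pairs from the question, a minimal sketch on top of the parsed list could look like this (GroupTeam is a hypothetical result type, a group without a "teams" array becomes a single entry with a null name, and java.util.stream.Stream is assumed to be imported):
record GroupTeam(int groupId, String teamName) {} // hypothetical pair type (Java 16+)

List<GroupTeam> flattened = xs.stream()
    .flatMap(g -> g.teams == null || g.teams.isEmpty()
        ? Stream.of(new GroupTeam(g.id, null))                    // groups without teams
        : g.teams.stream().map(t -> new GroupTeam(g.id, t.name))) // one row per team, keyed by group id
    .collect(toList());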

How to remove an object from an array in a JSON if the object contains a key with a certain value?

I have a nested JSON from which I need to remove some objects in an array using a filter, in a dynamic way; the JSON structure is not the same every time. For example:
{
"A": "HI",
"B": 1,
"C": [
{
"TIME": "TODAY",
"LOCATION": "USA",
"BALANCE": 100,
"STATE": "TX",
"NAME": "JHON"
},
{
"TIME": "YESTERDAY",
"LOCATION": "USA",
"BALANCE": 100,
"STATE": "TX",
"NAME": "MICHAEL"
},
{
"TIME": "YESTERDAY",
"LOCATION": "USA",
"BALANCE": 100,
"STATE": "TX",
"NAME": "REBECCA"
}
]
}
And now, from this kind of nested JSON I want to remove the object that contains the key "NAME" with the value "Michael", so the result has to be this one:
{
"A": "HI",
"B": 1,
"C": [
{
"TIME": "TODAY",
"LOCATION": "USA",
"BALANCE": 100,
"STATE": "TX",
"NAME": "JHON"
},
{
"TIME": "YESTERDAY",
"LOCATION": "USA",
"BALANCE": 100,
"STATE": "TX",
"NAME": "REBECCA"
}
]
}
This JSON changes every time depending on the response from an API. I just have to match a KEY - VALUE pair to remove the object I need to filter out, without modifying the JSON structure; in this case I receive KEY = "NAME" and VALUE = "Michael" to filter this object.
In this case "C" is a variable key, and I could have more keys with arrays in the same JSON that need to be filtered. I need a dynamic way to filter an array of objects based just on a key-value pair.
Could you help me find a way to perform this functionality?
Here is a streaming solution that can deal with huge responses without any significant impact on your system. It also does not require any class mappings using the built-in JSON node representation (therefore saving time and probably memory on type bindings).
public static void filterAbcBySpecializedStreaming(final JsonReader input, final JsonWriter output)
throws IOException {
input.beginObject();
output.beginObject();
// iterating over each entry of the outer object
while ( input.hasNext() ) {
final String name = input.nextName();
output.name(name);
switch ( name ) {
// assuming "A" is known to be a string always
case "A":
output.value(input.nextString());
break;
// assuming "B" is known to be a number always
case "B":
// note: JsonReader does not allow reading a number of arbitrary length as an instance of `java.lang.Number`
output.value(new LazilyParsedNumber(input.nextString()));
break;
// assuming "C" is known to be an array of objects always
case "C":
input.beginArray();
output.beginArray();
// iterating over each element of the array
while ( input.hasNext() ) {
// assuming the array elements are not very big, so it is safe to buffer each one in memory
final JsonObject jsonObject = Streams.parse(input)
.getAsJsonObject();
// if the current element JSON object has a property named "NAME" and its value is set to "MICHAEL", then skip it
// of course, this can also be externalized using the Strategy design pattern (e.g. using java.util.function.Predicate)
// but this entire method is not that generic so probably it's fine
if ( jsonObject.get("NAME").getAsString().equals("MICHAEL") ) {
continue;
}
Streams.write(jsonObject, output);
}
input.endArray();
output.endArray();
break;
default:
throw new IllegalStateException("Unknown: " + name);
}
}
input.endObject();
output.endObject();
}
The test:
final JsonElement expected = Streams.parse(expectedJsonReader);
final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
final JsonWriter output = new JsonWriter(new OutputStreamWriter(buffer));
Filter.filterAbcBySpecializedStreaming(input, output);
output.flush();
final JsonElement actual = JsonParser.parseReader(new InputStreamReader(new ByteArrayInputStream(buffer.toByteArray())));
Assertions.assertEquals(expected, actual);
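For completeness, the input reader and output writer used in the test might be constructed along these lines (a minimal sketch; the file names are placeholders, and the classes are com.google.gson.stream.JsonReader/JsonWriter plus java.nio.file.Files):
try ( JsonReader input = new JsonReader(Files.newBufferedReader(Paths.get("input.json")));
      JsonWriter output = new JsonWriter(Files.newBufferedWriter(Paths.get("filtered.json"))) ) {
    Filter.filterAbcBySpecializedStreaming(input, output);
}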
Of course, it's not that easy, but it may result in the best performance. Making it generic and "dynamic" is an option, and it can be done according to your needs (a sketch of such a generalization follows the tree example below). If you find it too complex, and the input JSON document is known not to be very big (and therefore not a cause of OutOfMemoryErrors), you can also filter it as a tree, but again without any type bindings:
public static void filterAbcBySpecializedTree(final JsonElement input, final JsonElement output) {
final JsonObject inputJsonObject = input.getAsJsonObject();
final JsonObject outputJsonObject = output.getAsJsonObject();
for ( final Map.Entry<String, JsonElement> e : inputJsonObject.entrySet() ) {
final String name = e.getKey();
final JsonElement value = e.getValue();
switch ( name ) {
case "A":
case "B":
outputJsonObject.add(name, value.deepCopy());
break;
case "C":
final JsonArray valueJsonArray = value.getAsJsonArray()
.deepCopy();
for ( final Iterator<JsonElement> it = valueJsonArray.iterator(); it.hasNext(); ) {
final JsonObject elementJsonObject = it.next().getAsJsonObject();
if ( elementJsonObject.get("NAME").getAsString().equals("MICHAEL") ) {
it.remove();
}
}
outputJsonObject.add(name, valueJsonArray);
break;
default:
throw new IllegalStateException("Unknown: " + name);
}
}
}
Test:
final JsonElement input = Streams.parse(inputJsonReader);
final JsonElement expected = Streams.parse(expectedJsonReader);
final JsonElement actual = new JsonObject();
Filter.filterAbcBySpecializedTree(input, actual);
Assertions.assertEquals(expected, actual);
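As mentioned above, the tree approach can also be generalized into a dynamic key/value filter. A rough sketch (not from the original answer) that walks any JsonObject/JsonArray recursively and drops array elements whose given key holds the given value could look like this:
public static JsonElement filterByKeyValue(final JsonElement input, final String key, final String value) {
    if ( input.isJsonObject() ) {
        final JsonObject outputJsonObject = new JsonObject();
        for ( final Map.Entry<String, JsonElement> e : input.getAsJsonObject().entrySet() ) {
            outputJsonObject.add(e.getKey(), filterByKeyValue(e.getValue(), key, value));
        }
        return outputJsonObject;
    }
    if ( input.isJsonArray() ) {
        final JsonArray outputJsonArray = new JsonArray();
        for ( final JsonElement element : input.getAsJsonArray() ) {
            // drop array elements that are objects carrying key = value
            if ( element.isJsonObject()
                    && element.getAsJsonObject().has(key)
                    && element.getAsJsonObject().get(key).isJsonPrimitive()
                    && element.getAsJsonObject().get(key).getAsString().equals(value) ) {
                continue;
            }
            outputJsonArray.add(filterByKeyValue(element, key, value));
        }
        return outputJsonArray;
    }
    // primitives and nulls are copied as-is
    return input.deepCopy();
}
For the example above it could be invoked as filterByKeyValue(input, "NAME", "MICHAEL").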

0.0 rounded to 0 in wrapper - Apache CXF REST

My application uses Apache CXF for exposing REST APIs and uses Jackson for marshalling and unmarshalling. In one of the endpoints, we return a wrapper which is a Map. Displaying decimal places is crucial for this application.
@XmlRootElement(name = "Output")
class Wrapper {
private Map<String, CountVO> data;
//constructor
//getter, setter
}
class CountVO {
private BigDecimal value;
//getter, setter, constructor
//updateMethod
public void updateValue(String userDecimalFormat){
switch(userDecimalFormat){
case "1":
this.value = BigDecimal.ZERO.setScale(1);
break;
case "2":
this.value = BigDecimal.ZERO.setScale(2);
break;
case "3":
this.value= BigDecimal.ZERO.setScale(3);
break;
}
}
}
Here, the default value is set to 0 with decimal places based on a certain configuration.
I have added toString and loggers at relevant places in the code. What I can see is that when userDecimalFormat is 1, 2 or 3, I get the desired output, i.e. the count is set to 0.0, 0.00 or 0.000 respectively, and the same is printed in the logs.
But the moment it is converted to JSON format, I get the desired output only when userDecimalFormat is 2 or 3. When it is 1, I get 0 and not 0.0 in the resultant JSON. Here is the snippet.
{
"Output": {
"data": {
"entry": [
{
"key": "1",
"value": {
"count": 0
}
},
{
"key": "2",
"value": {
"count": 0
}
}
]
}
}
}
In other cases, it is as follows.
{
"Output": {
"data": {
"entry": [
{
"key": "1",
"value": {
"count": "0.00"
}
},
{
"key": "2",
"value": {
"count": "0.00"
}
}
]
}
}
}
How do you suggest I resolve this issue?

How to deserialize JSON with dynamic numeric field names, using GSON?

I have a JSON file in this format:
title 1
{
"0" : 2,
"1" : 5,
"2" : 8
}
title 2
{
"1" : 44,
"2" : 15,
"3" : 73,
"4" : 41
}
As you can see the indexes are dynamic - in title 1 they were: "0","1","2" and for title 2 they are: "1","2","3","4"
I don't know how to read this using GSON.
I need to somehow convert it into a java object so I can go on and process the data.
Any help is most welcome.
First thing: the JSON as represented in the question is not valid JSON, so my recommendation is based on the following JSON. Just giving you full disclosure.
{
"title 1":{
"0":2,
"1":5,
"2":8
},
"title 2":{
"1":44,
"2":15,
"3":73,
"4":41
}
}
OPTION 1 (not how I would solve this issue)
This would deserialize it as a general object that you could loop through to process.
new Gson().fromJson(yourJsonString,Object.class);
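For example, a rough sketch of that loop (assuming the corrected JSON above is in a String called jsonString; when the target type is Object, Gson returns nested Maps and parses numbers as Double):
Map<?, ?> root = (Map<?, ?>) new Gson().fromJson(jsonString, Object.class);
for (Map.Entry<?, ?> titleEntry : root.entrySet()) {
    String title = (String) titleEntry.getKey();          // e.g. "title 1"
    Map<?, ?> values = (Map<?, ?>) titleEntry.getValue(); // the dynamic "0", "1", ... keys
    for (Map.Entry<?, ?> e : values.entrySet()) {
        System.out.println(title + " -> " + e.getKey() + " = " + e.getValue());
    }
}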
OPTION 2 (best bet in my opinion)
If you had control of how the object came in, I would do something like this:
{
"listOfTitles":[
{
"title":[
{
"key":"0",
"value":1234
},
{
"key":"1",
"value":12341234
},
{
"key":"2",
"value":123412341234
}
],
"titleName":"title 1"
},
{
"title":[
{
"key":"0",
"value":12341
},
{
"key":"1",
"value":123412
},
{
"key":"2",
"value":12
},
{
"key":"3",
"value":12341
}
],
"titleName":"title 2"
}
]
}
This would allow you to build objects like...
public class YouObjectName{
    private ArrayList<TitleEntry> listOfTitles;
    //constructor
    //getters and setters
}
public class TitleEntry{
    private ArrayList<Title> title;
    private String titleName;
    //constructor
    //getters and setters
}
public class Title{
    private String key;
    private Integer value;
    //constructor
    //getters and setters
}
I would consume this with GSON like
new Gson().fromJson(jsonString,YouObjectName.class);
Hope that helps a little.
