Declaring configuration of custom configurable application in java? - java

So for a hobby project of mine, I would like to create an application that translates HTTP requests and responses between two services.
The application does that based on a configuration that can be set by the user. The idea is that the application listens for an incoming API call, translates the call, and then forwards it.
Then the application waits for a response, translates the response, and sends it back to the caller.
A translation can be as simple as renaming a field in a body object or moving a header field into the body.
I think a translation should begin with mapping the correct URL, so here is an example of what I think a configuration could look like:
//request mapping
incoming URL = outgoing URL(
//Rename header value
header.someobject.renameto = "somevalue"
//Replace body object to header
body.someobject.replaceto.header
)
I was thinking that the configuration should be placed in a .txt file and read by the application.
My question is, are there other similar systems that use a configuration file for a configuration like this? And are there other/better ways to declare a configuration?

I have done something sort-of-similar in a different context (generate code from an input specification), so I will provide an outline of what I did to provide some food for thought. I used Config4* (disclosure: I developed that). If the approach I describe below is of interest to you, then I suggest you read Chapters 2 and 3 of the Config4* Getting Started Guide to get an overview of the Config4* syntax and API. Alternatively, express the concepts below in a different configuration syntax, such as XML.
Config4* is a configuration syntax, and the subset of syntax relevant to this discussion is as follows:
# this is a comment
name1 = "simple value";
name2 = ["a", "list of", "values"];
# a list can be laid out in columns to simulate a table of information
name3 = [
# item colour
#------------------
"car", "red",
"jeans", "blue",
"roses", "red",
];
In a code generator application, I used a table to provide rules to specify how to generate code for assigning values to fields of messages. If no rule was specified for a particular field, then some built-in rules provided default behaviour. The table looked something like the following:
field_rules = [
# wildcarded message.field instruction
#----------------------------------------------------------------
"Msg1.username", "#config:username",
"Msg1.password", "#config:password",
"Msg3.price", "#order:price",
"*.account", "#string:foobar",
"*.secondary_account", "#ignore",
"*.heartbeat_interval", "#expr:_heartbeatInterval * 1000",
"*.send_timestamp", "#now",
];
When my code generator wanted to generate code to assign a value to a field, the code generator constructed a string of the form "<message-name>.<field-name>", for example, Msg3.price. Then it examined the field_rules table line-by-line (starting from the top) to find a line in which the first column matched "<message-name>.<field-name>". The matching logic permitted * as a wildcard character that could match zero or more characters. (Conveniently, Config4* provides a patternMatch() utility operation that provides this functionality.)
If a match was found, then the value in the instruction column told the code generator what sort of code to generate. (If no match was found, then built-in rules were used, and if none of those applied, then no code was generated for the field.)
Each instruction was a string of the form "#<keyword>:optional,arguments". That was tokenized to provide the keyword and the optional arguments. The keyword was converted to an enum, and that drove a switch statement for generating code. For example:
The #config:username instruction specified that code should be generated to assign the value of the username variable in a runtime configuration file to the field.
The #order:price instruction specified that code should be generated to assign the value returned from calling orderObj->getPrice() to the field.
The #string:foobar instruction specified the string literal foobar should be assigned to the field.
The #expr:_heartbeatInterval * 1000 instruction specified that code should be generated to assign the value of the expression _heartbeatInterval * 1000 to the field.
The #ignore instruction specified that no code should be generated to assign a value to the field.
The #now instruction specified that code should be generated to assign the current clock time to the field.
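The lookup and tokenizing steps described above can be sketched in plain Java. This is only an illustration under assumptions: it does not use the Config4* API, the table is hard-coded as a two-column array rather than read from a configuration file, wildcardMatch() is a hand-rolled stand-in for a wildcard matcher, and the class and method names (FieldRules, lookup, parse, describe) are hypothetical. The describe() method returns placeholder text where a real generator would emit code.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

final class FieldRules {
    enum Keyword { CONFIG, ORDER, STRING, EXPR, IGNORE, NOW }

    /** A tokenized instruction: the keyword plus its raw (possibly empty) argument text. */
    static final class Instruction {
        final Keyword keyword;
        final String args;
        Instruction(Keyword keyword, String args) { this.keyword = keyword; this.args = args; }
    }

    // Two-column table mirroring field_rules: wildcarded message.field, instruction.
    static final String[][] RULES = {
        {"Msg1.username",        "#config:username"},
        {"Msg1.password",        "#config:password"},
        {"Msg3.price",           "#order:price"},
        {"*.account",            "#string:foobar"},
        {"*.secondary_account",  "#ignore"},
        {"*.heartbeat_interval", "#expr:_heartbeatInterval * 1000"},
        {"*.send_timestamp",     "#now"},
    };

    /** Hand-rolled wildcard match ('*' = zero or more characters); an assumption, not Config4*. */
    static boolean wildcardMatch(String pattern, String text) {
        String[] literals = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) regex.append(".*");           // each '*' becomes ".*"
            regex.append(Pattern.quote(literals[i])); // everything else is matched literally
        }
        return Pattern.matches(regex.toString(), text);
    }

    /** Splits "#keyword:args" at the first ':' and maps the keyword onto the enum. */
    static Instruction parse(String text) {
        if (!text.startsWith("#")) throw new IllegalArgumentException("not an instruction: " + text);
        int colon = text.indexOf(':');
        String keyword = colon < 0 ? text.substring(1) : text.substring(1, colon);
        String args = colon < 0 ? "" : text.substring(colon + 1);
        return new Instruction(Keyword.valueOf(keyword.toUpperCase(Locale.ROOT)), args);
    }

    /** Scans the table top to bottom and returns the first matching row's instruction. */
    static Optional<Instruction> lookup(String[][] rules, String messageField) {
        for (String[] row : rules) {
            if (wildcardMatch(row[0], messageField)) return Optional.of(parse(row[1]));
        }
        return Optional.empty(); // the caller would fall back to built-in rules
    }

    /** The enum drives a switch; a real generator would emit code here instead of text. */
    static String describe(Instruction in) {
        switch (in.keyword) {
            case CONFIG: return "assign config variable '" + in.args + "'";
            case ORDER:  return "assign order attribute '" + in.args + "'";
            case STRING: return "assign string literal \"" + in.args + "\"";
            case EXPR:   return "assign expression " + in.args;
            case NOW:    return "assign current clock time";
            case IGNORE: return "generate nothing";
            default:     throw new AssertionError(in.keyword);
        }
    }

    public static void main(String[] args) {
        for (String field : new String[] {"Msg3.price", "Msg7.account", "Msg7.unknown"}) {
            String action = lookup(RULES, field).map(FieldRules::describe).orElse("built-in rules");
            System.out.println(field + " -> " + action);
        }
    }
}
```

Because the table is scanned from the top, more specific rows should be listed before wildcarded ones when both could match the same field.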
I have used the above technique in several projects, and each time I have invented instructions specific to the needs of the particular project. If you decide to use this technique, then obviously you will need to invent instructions to specify runtime translations rather than instructions to generate code. Also, don't feel you have to shoehorn all of your translation-based configuration into a single table. For example, you might use one table to provide a source URL -> destination URL mapping, and a different table to provide instructions for translating fields within messages.
If this technique works as well for you as it has worked for me on my projects, then you will end up with your translation application being an "engine" whose behaviour is driven entirely by a configuration file that, in effect, is a DSL (domain-specific language). That DSL file is likely to be quite compact (less than 100 lines), and will be the part of the application that is visible to users. Because of this, it is worthwhile investing effort to make the DSL as intuitive and easy-to-read/modify as possible, because doing that will make the translation application: (1) user friendly, and (2) easy to document in a user manual.
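For the translation use case in the question, the same idea might look like the following sketch, with one table for URL mapping and another for field instructions. Everything here is an assumption made for illustration: the instruction names (#rename, #move_to_header, #move_to_body), the URLs, the header names, and the flat Message model are all invented, and the tables are hard-coded where a real application would load them from the configuration file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class MessageTranslator {
    /** Minimal message model: flat header and body maps (real bodies may be nested). */
    static final class Message {
        final Map<String, String> headers = new LinkedHashMap<>();
        final Map<String, String> body = new LinkedHashMap<>();
    }

    // Table 1: incoming URL -> outgoing URL (both values are made up).
    static final String[][] URL_RULES = {
        {"/v1/orders", "http://backend.example/api/orders"},
    };

    // Table 2: location.field -> instruction (hypothetical instruction names).
    static final String[][] FIELD_RULES = {
        {"header.X-User", "#rename:X-Account"},
        {"body.token",    "#move_to_header:Authorization"},
    };

    /** Returns the outgoing URL for an incoming one, if the table has a row for it. */
    static Optional<String> mapUrl(String incoming) {
        for (String[] row : URL_RULES) {
            if (row[0].equals(incoming)) return Optional.of(row[1]);
        }
        return Optional.empty();
    }

    /** Applies every field rule whose source field is present in the message. */
    static void translate(Message msg) {
        for (String[] row : FIELD_RULES) {
            int dot = row[0].indexOf('.');
            int colon = row[1].indexOf(':');
            if (dot < 0 || colon < 0 || !row[1].startsWith("#")) {
                throw new IllegalArgumentException("malformed rule: " + row[0] + " = " + row[1]);
            }
            String location = row[0].substring(0, dot);
            String field = row[0].substring(dot + 1);
            String keyword = row[1].substring(1, colon);
            String arg = row[1].substring(colon + 1);

            Map<String, String> source = location.equals("header") ? msg.headers : msg.body;
            if (!source.containsKey(field)) continue; // nothing to translate
            String value = source.remove(field);
            switch (keyword) {
                case "rename":         source.put(arg, value);      break;
                case "move_to_header": msg.headers.put(arg, value); break;
                case "move_to_body":   msg.body.put(arg, value);    break;
                default: throw new IllegalArgumentException("unknown instruction: " + row[1]);
            }
        }
    }

    public static void main(String[] args) {
        Message msg = new Message();
        msg.headers.put("X-User", "alice");
        msg.body.put("token", "abc123");
        translate(msg);
        System.out.println(mapUrl("/v1/orders").orElse("no mapping") + " " + msg.headers + " " + msg.body);
    }
}
```

Keeping the engine this small is what makes the configuration file the visible part of the application: adding a new kind of translation means adding one keyword and one switch case.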

Related

How to get result set by checking a specific element in an aggregated array using JOOQ?

I want to filter results by a specific value in an aggregated array in the query.
Here is a short description of the problem.
A section belongs to a garden, a garden belongs to a district, and a district belongs to a province.
Users have multiple sections. Those sections belong to gardens, which belong to districts, which in turn belong to provinces.
I want to get the ids of users that have the value 2 in the district array.
I tried to use the any operator, but it doesn't work properly (syntax error).
Any help would be appreciated.
PS: This is possible using plain SQL.
rs = dslContext.select(
field("user_id"),
field("gardens_array"),
field("province_array"),
field("district_array"))
.from(table(select(
arrayAggDistinct(field("garden")).as("gardens_array"),
arrayAggDistinct(field("province")).as("province_array"),
arrayAggDistinct(field("distict")).as("district_array"))
.from(table("lst.user"))
.leftJoin(table(select(
field("section.user_id").as("user_id"),
field("garden.garden").as("garden"),
field("garden.province").as("province"),
field("garden.distict").as("distict"))
.from(table("lst.section"))
.leftJoin("lst.garden")
.on(field("section.garden").eq(field("garden.garden")))
.leftJoin("lst.district")
.on(field("district.district").eq(field("garden.district")))).as("lo"))
.on(field("user.user_id").eq(field("lo.user_id")))
.groupBy(field("user.user_id"))).as("joined_table"))
.where(val(2).equal(DSL.any("district_array")))
.fetch()
.intoResultSet();
Your code is calling DSL.any(T...), which corresponds to the expression any(?) in PostgreSQL, where the bind value is a String[] in your case. But you don't want "district_array" to be a bind value, you want it to be a column reference. So, either, you assign your arrayAggDistinct() expression to a local variable and reuse that, or you re-use your field("district_array") expression or replicate it:
val(2).equal(DSL.any(field("district_array", Integer[].class)))
Notice that it's usually a good idea to be explicit about data types (e.g. Integer[].class) when working with the plain SQL templating API, or even better, use the code generator.

Elastic search index on all attributes?

I am new to Elasticsearch (ES) and have gone through basic tutorials such as this mkyong tutorial.
I have a question about the create part of any document:
CREATE Operation Example
To insert a new Document with /mkyong/posts/1001 and the following Request Data:
{
"title": "Java 8 Optional In Depth",
"category":"Java",
"published_date":"23-FEB-2017",
"author":"Rambabu Posa"
}
Question 1 :- Will ES create an inverted index on all attributes of the above document (title, category, published_date, author) by default and provide full-text search, or do I need to specify that explicitly?
Question 2 :- In the above example we already have a unique id, i.e. 1001. That's fine if I am already storing the document in a DB that generates the ID. What if I need the ES engine to generate the ID and do not have any DB?
Update :-
Got the answer for question 1 from Specify which fields are indexed in ElasticSearch
Question 1 :- Yes, by default ES will index a string field twice, as two separate types: once as "text" and once as "keyword", in a sub-field such as "title.keyword". The "text" type runs through an analyzer to support the standard search case (the default standard analyzer tokenizes and lowercases; language analyzers can also remove stop words, stem words, etc.). The "keyword" type makes no changes and indexes the data exactly as it is, to support exact match and aggregations. You can explicitly give ES a mapping for any field, but if you don't, this is the default behavior.
Here is some information on the text vs keyword behavior:
https://www.elastic.co/blog/strings-are-dead-long-live-strings
Question 2 :- ES will automatically create its own internal ID for every document you index, in a field called "_id". You can technically replace this with your own ID, but typically you don't want to do that because it can hurt performance by making the hashing algorithm that ES uses to spread out the data perform poorly. It is usually better to add any IDs you would like as new fields in the document and let ES index them for you, ideally as the keyword type.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html

Drools for large volume of data

We have a requirement where we need to process about 5MM messages in a day and based on certain business rules, generate a unique identifier for messages received asynchronously.
Use case:-
System received message A, message B, message C and message D (standard xml format for all message types).
Business Rule :- If message A contains tag <tag1> and the value of that tag matches the value of any of <tag2>, <tag3>, <tag4> in message B, C or D, assign the identifier already assigned to the first match. If none matches, generate a new identifier and assign it to message A.
Similar rules apply to messages B, C and D.
We thought of using a Drools Engine implementation to support the above use case, but we are not sure whether it will work for such a large amount of data processed in near real time.
Has anyone used the Drools Engine to process large amounts of data and, if so, can you please share the issues or any statistical data around it?
For simple rules that just check the 4 conditions you describe, Drools will perform more than fast enough. Just make sure you compile the rules only once and not on every rule execution. You should likely see performance on the order of a few hundred thousand rule invocations per minute in a hot state for simple rules like the ones you describe above.
Take a look at these benchmarks to get a better idea:
https://github.com/winklerm/phreak-examples/tree/master/benchmark

making a condition from a pattern

I have a table which contains following columns
dependentColumn : values table1.column2, table1.column3, table3.column4....
condition : values ([table1.column2.LAST3][=ABC][OR][=DEF]),
([table1.column2.ALL][=ABC]),
(([table1.column2][=ABC][OR][table1.column2][!="DEF"])[AND]
([table1.column2][!="DEF"]))
...
values: abc, [table1.column1.LAST3]
Now I need to parse the values contained in the condition column, write code containing those conditions, and put the values into the dependentColumns.
My concern is creating Java conditions from the conditions in the 'condition' column. The conditions are stored as a pattern, and there can be multiple conditions combined with ANDs and ORs. How do I tackle the problem? I know it's possible, but I am a bit confused. Can I use the Stack class, though I have not used it before?
If there is a simple way to solve this, please tell me.
It's not totally clear from your question what you're trying to do, but here's my understanding. It looks like you're trying to write some values into the db objects named in the "dependentColumn" column of a database table, where the values are determined by evaluating a domain-specific language (DSL) encoded in the "condition" column.
One critical aspect is how complex this DSL is. A simple language could be parsed with regular expressions and evaluated using a stack, as you mentioned, but from your example it looks like you could have grouped boolean expressions, which might require the use of an actual parser generator (e.g. ANTLR).
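Between regular expressions and a parser generator there is a middle ground: a small hand-written recursive-descent parser, which handles parentheses through recursion instead of an explicit stack. The sketch below is not ANTLR and rests on guesses about the DSL that the question does not confirm: [AND] binds tighter than [OR], .LASTn means the last n characters of the column value, .ALL means the whole value, and a comparison such as [=DEF] without a preceding column reuses the most recent column. The class name ConditionEvaluator and the row map are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConditionEvaluator {
    private final List<String> tokens = new ArrayList<>();
    private final Map<String, String> row; // column name -> current value
    private int pos = 0;
    private String operand;                // value of the most recent [table.column] token

    private ConditionEvaluator(String condition, Map<String, String> row) {
        // Tokens are "(", ")" and bracketed items; anything else (e.g. whitespace) is skipped.
        Matcher m = Pattern.compile("\\(|\\)|\\[[^\\]]*\\]").matcher(condition);
        while (m.find()) tokens.add(m.group());
        this.row = row;
    }

    static boolean evaluate(String condition, Map<String, String> row) {
        ConditionEvaluator e = new ConditionEvaluator(condition, row);
        boolean result = e.parseOr();
        if (e.pos != e.tokens.size()) throw new IllegalArgumentException("unexpected " + e.tokens.get(e.pos));
        return result;
    }

    // or := and ( [OR] and )*
    private boolean parseOr() {
        boolean value = parseAnd();
        while (accept("[OR]")) { boolean rhs = parseAnd(); value = value || rhs; }
        return value;
    }

    // and := primary ( [AND] primary )*
    private boolean parseAnd() {
        boolean value = parsePrimary();
        while (accept("[AND]")) { boolean rhs = parsePrimary(); value = value && rhs; }
        return value;
    }

    // primary := "(" or ")"  |  [column]? [=value]  |  [column]? [!=value]
    private boolean parsePrimary() {
        if (accept("(")) {
            boolean value = parseOr();
            if (!accept(")")) throw new IllegalArgumentException("expected )");
            return value;
        }
        String inner = nextBracketed();
        if (!inner.startsWith("=") && !inner.startsWith("!=")) {
            operand = resolve(inner);      // a column reference; the comparison follows
            inner = nextBracketed();
        }
        if (operand == null) throw new IllegalArgumentException("comparison without a column");
        boolean negate = inner.startsWith("!=");
        if (!negate && !inner.startsWith("=")) throw new IllegalArgumentException("expected comparison: " + inner);
        String expected = inner.substring(negate ? 2 : 1).replaceAll("^\"|\"$", "");
        return negate != operand.equals(expected);
    }

    /** Looks up a column value, applying an optional .ALL or .LASTn suffix. */
    private String resolve(String ref) {
        String column = ref;
        int lastN = -1;
        Matcher m = Pattern.compile("(.*)\\.(ALL|LAST(\\d+))").matcher(ref);
        if (m.matches()) {
            column = m.group(1);
            if (m.group(3) != null) lastN = Integer.parseInt(m.group(3));
        }
        String value = row.get(column);
        if (value == null) throw new IllegalArgumentException("unknown column: " + column);
        return lastN < 0 || lastN >= value.length() ? value : value.substring(value.length() - lastN);
    }

    private boolean accept(String token) {
        if (pos < tokens.size() && tokens.get(pos).equals(token)) { pos++; return true; }
        return false;
    }

    /** Consumes the next token, which must be of the form [...], and returns its content. */
    private String nextBracketed() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of condition");
        String token = tokens.get(pos++);
        if (!token.startsWith("[")) throw new IllegalArgumentException("unexpected " + token);
        return token.substring(1, token.length() - 1);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("table1.column2", "xyzDEF");
        System.out.println(evaluate("([table1.column2.LAST3][=ABC][OR][=DEF])", row));
    }
}
```

If the real grammar grows beyond comparisons, AND, OR and parentheses, moving to a parser generator as suggested above is likely the better investment.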

JCR 170 Data modeling: Node names

The situation:
Let's say we are implementing a blog engine based on JCR with support for localization.
The content structure looks something like this /blogname/content/[node name]
The problem:
What is the best way to name the content nodes (/blogname/content/[nodename]) to satisfy the following requirements:
1. The node name must be usable in HTML to support REST like URLs i.e.: blogname.com/content/nodename should point to a single content item.
2. The above requirement must not produce ugly URLs i.e.: /content/node_name is good, /content/node%20name is bad.
3. Programmatic retrieval should be easy given the node name i.e.: //content[#node_name=some-name]
4. The naming scheme must guarantee node name uniqueness.
PS: The JCR implementation used is JackRabbit
For 1. to 3. the answer is simple: use only the characters you want to see in the node name, i.e. escape whatever input string you have (e.g. the blog post title) against a restricted character set such as the one for URIs.
For example, do not allow spaces (which are allowed in JCR node names, but would produce the ugly %20 in URLs) or other characters that must be encoded in URLs. You can remove those characters or simply replace them with an underscore, because that looks good in most cases.
Regarding unique names (4.), you can either include the current time, including milliseconds, in the name or explicitly check for collisions. The first might look a bit ugly, but should probably never fail in a blog scenario. The latter can be done by reacting to the exception thrown if a node with that name already exists, adding e.g. an incrementing counter, and trying again (e.g. my_great_post1, my_great_post2, etc.). You can also lock the parent node so that only one session can add a node at a time, which avoids a trial loop but comes at the cost of blocking.
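As a rough illustration of the restricted character set plus incrementing counter approach, here is a minimal sketch in plain Java. It makes several assumptions: the allowed characters are limited to lowercase ASCII letters, digits, '-' and '_' (so non-ASCII letters are replaced too), a Set of existing names stands in for the repository, and the names NodeNames, slug and uniqueName are hypothetical. In a real JCR application the collision check would go against the parent node instead, for example by catching the exception mentioned above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class NodeNames {
    /** Keeps a-z, 0-9, '-' and '_'; every other run of characters becomes a single '_'. */
    static String slug(String title) {
        String s = title.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_-]+", "_");
        s = s.replaceAll("^_+|_+$", "");   // drop leading and trailing underscores
        return s.isEmpty() ? "post" : s;   // fallback name for titles with no usable characters
    }

    /** Appends 1, 2, 3, ... to the slug until it no longer collides with an existing name. */
    static String uniqueName(String title, Set<String> existing) {
        String base = slug(title);
        String candidate = base;
        for (int i = 1; existing.contains(candidate); i++) {
            candidate = base + i;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>();
        for (int i = 0; i < 3; i++) {
            String name = uniqueName("My great post!", existing);
            existing.add(name);
            System.out.println(name);
        }
    }
}
```

The check and the subsequent add are two separate steps here, so concurrent sessions could still pick the same name; that is the case the retry-on-exception or parent-lock strategies above are meant to cover.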
Note: //content[#node_name=some-name] is not a valid JCR Xpath query. You probably want to use /jcr:root/content//some-name for that.
Regarding item 3. I recently learned that xpath queries do not allow items to start with a number. If your node name starts with a number it can still be queried by escaping the first byte of the name, but your queries will be more straightforward if you start all node names with a letter.
(I'm not sure about property names. Haven't ever seen one that didn't start with a letter.)
Unique names: To quickly generate a unique name from the first characters of a title plus a random number (to resolve conflicts), you could use the following algorithm:
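String title = "JCR 170 Data modeling: Node names";
String name = title.substring(0, Math.min(title.length(), 10)).trim().replace(' ', '_');
// Assumption: 'parent' is the javax.jcr.Node under which the new node will be added.
if (parent.hasNode(name)) {
    name += "_";
    Random r = new Random();
    // Append random digits until the name no longer collides.
    while (parent.hasNode(name)) {
        name += Integer.toString(r.nextInt(10));
    }
}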
The advantage of using a random number is that even if you have many similar names, this will resolve conflicts very quickly.
