Convert External format to Internal format

I need a solution for this problem:
I'm a service provider working with several service clients.
Each client sends me requests in its own format, for instance:
service client 1 fields are --> f1 , f2 , f3
service client 2 fields are --> f2 , f3 , f4
service client 3 fields are --> f3 , f7 , f8
Clients may add or remove fields or change their current format. For example, "service client 1" combines:
f1+f2 ==> f12 and adds f5
or client 3:
decomposes f7 ---> f1,f2
I need an internal format of my own, for instance:
f1,f2,f3,f4,f5,f6,f7,f8,f9
The mapping to this format should be configurable through an XML configuration file, so that when something changes on the client side I can handle it by editing the XML without changing source code.
How can I do that?

Briefly, you need an API to which you hand over your external message plus a “how to” file, and which delivers the internal message back to you. Let’s focus on the main duty of the API, which is message conversion. As you mentioned, it should be configurable by an XML config file. We need an element, which I call “field”, with at least one attribute, “name”. A collection of these field elements is wrapped in a parent element. Each field element designates a field in the target internal message. Within each field element I add a “function” element, which gathers the required external fields and applies a function to them. Here’s a sample XML config:
<fields>
<field name="aLong">
<function name="add">
<arg>
<function name="readExternalField">
<arg>
f1
</arg>
</function>
</arg>
<arg>
<function name="readExternalField">
<arg>
f2
</arg>
</function>
</arg>
</function>
</field>
<field name="aStr">
<function name="getFromArray" index="0">
<arg>
<function name="splitStr" character=" ">
<arg>
<function name="readExternalField">
<arg>
f3
</arg>
</function>
</arg>
</function>
</arg>
</function>
</field>
</fields>
Imagine an internal object with at least two fields, “aLong” and “aStr”, and an external object with at least three fields: “f1”, “f2” and “f3”. The important point is that each function's return type must be assignable to its target field. The function “add” sums the values of “f1” and “f2”, and the result is assigned to “aLong”. The function “splitStr” splits “f3” and returns an array, from which “getFromArray” takes the first item as the result.
I prefer the JAXB API for unmarshalling and parsing the XML file, so we need an XSD document, which can be generated from the XML file with online tools.
I suggest using map-based objects to avoid reflection. If you develop REST services, the received JSON message can be converted to a map object. Your API then has a method that receives a map and returns a map, so every field is a key in the map rather than a field in a class. The functions, however, can have parameters of specific types, so the main body of the API must cast the objects read from the external map before passing them to the functions, and then put each returned value into the internal message under the field name specified in the XML file.
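As a rough illustration of the map-based idea, here is a minimal sketch of the conversion core. The names Expr, FieldMapper and FieldMapperDemo are hypothetical, and the expression tree is built by hand here; in a real implementation it would be built from the unmarshalled XML config, which this sketch leaves out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** One node of the expression tree (hypothetical name); mirrors a <function> element. */
interface Expr {
    Object eval(Map<String, Object> external);
}

/** Hypothetical helper holding the functions used in the sample config. */
final class FieldMapper {
    private FieldMapper() {}

    // Leaf node: reads one value from the external message.
    static Expr readExternalField(String name) {
        return external -> external.get(name);
    }

    // "add": casts both arguments to Number and returns their sum as a Long.
    static Expr add(Expr left, Expr right) {
        return external -> ((Number) left.eval(external)).longValue()
                + ((Number) right.eval(external)).longValue();
    }

    // "splitStr": splits a String argument on a literal separator.
    static Expr splitStr(Expr arg, String separator) {
        return external -> ((String) arg.eval(external)).split(Pattern.quote(separator));
    }

    // "getFromArray": picks one element of an array argument.
    static Expr getFromArray(Expr arg, int index) {
        return external -> ((Object[]) arg.eval(external))[index];
    }

    // Evaluates every configured field and collects the internal message.
    static Map<String, Object> convert(Map<String, Expr> fields, Map<String, Object> external) {
        Map<String, Object> internal = new LinkedHashMap<>();
        fields.forEach((name, expr) -> internal.put(name, expr.eval(external)));
        return internal;
    }
}

final class FieldMapperDemo {
    public static void main(String[] args) {
        // Hand-built equivalent of the sample XML config.
        Map<String, Expr> config = new LinkedHashMap<>();
        config.put("aLong", FieldMapper.add(
                FieldMapper.readExternalField("f1"),
                FieldMapper.readExternalField("f2")));
        config.put("aStr", FieldMapper.getFromArray(
                FieldMapper.splitStr(FieldMapper.readExternalField("f3"), " "), 0));

        Map<String, Object> external = new HashMap<>();
        external.put("f1", 40L);
        external.put("f2", 2);
        external.put("f3", "hello world");

        System.out.println(FieldMapper.convert(config, external)); // {aLong=42, aStr=hello}
    }
}
```

The casts inside each function are where a mismatch between the config and the incoming data would surface as a ClassCastException, so a production version would probably want to validate types and report the offending field name.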
I hope this brief answer points the way to a satisfactory solution. Remember that writing an efficient API you can proudly share with your colleagues is a skill that only comes with practice.

Related

Select the right architecture for simple java bean application

I need to write a simple Java application and am currently working out its architecture. I only need advice on how to structure it (which classes I need, which methods to include); I will write the code myself. If it is not too much trouble, please share your opinion on the best way to build it. Thanks!
My technical task is below:
Given: a TEST table in any database (in-memory databases are not recommended) containing one integer column (FIELD).
You must write a console application in Java, using the standard library of JDK7 (preferably) or JDK8, that implements the following functionality:
The main application class must follow the JavaBean conventions, that is, it is initialized through setters. Initialization parameters: the database connection data and the number N.
Upon launch, the application inserts N records with values 1..N into TEST. If the TEST table already contains records, they are removed before inserting.
The application then requests the data from TEST.FIELD and generates a well-formed XML document of the form
<entries>
<entry>
<field>value of field 'field'</field>
</entry>
...
<entry>
<field>value of field 'field'</field>
</entry>
</entries>
(with N nested elements). The document is saved in the file system as "1.xml".
By means of XSLT, the application converts the contents of "1.xml" to the following form:
<entries> <entry field="value of field 'field'"/> ...
<entry field="value of field 'field'"/> </entries>
(with N nested elements). The new document is saved in the file system as "2.xml".
The application parses "2.xml" and prints the arithmetic sum of the values of all field attributes to the console.
For large N (~1,000,000) the application should run for no more than five minutes.

+ sign being dropped from xml when validation occurs

This follows up on a previous question I asked here, "WebResponse posting a null string".
While the answer there solved that question, a new problem has appeared when parsing the XML below:
<?xml version="1.0" encoding="UTF-8"?>
<hml xmlns="http://schemas.nmdp.org/spec/hml/1.0.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://schemas.nmdp.org/spec/hml/1.0.1 http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd"
version="1.0.1" >
<!--
MIRING Element 1.1 requires the inclusion of an hmlid.
hmlid can be reported in the form of an ISO Object Identifier (OID)
"root" represents a unique publically registered organization
"extension" is a unique document id managed by the reporting organization.
-->
<hmlid root="2.34.48.32" extension="HML.3245662"/>
<!--
MIRING Element 1.2 requires the inclusion of a reporting-center.
reporting-center identifies the organization sending the HML message.
"reporting-center-id" is a unique identifier of the sender.
"reporting-center-context" reports the context/naming authority of the identifier.
-->
<reporting-center reporting-center-id="567"/>
<sample id="4555-6677-8">
<typing gene-family="HLA" date="2015-01-13">
<!--
MIRING Element 3 requires the inclusion of Genotyping information.
The Genotype should include all pertinent Loci, as well as a Genotype in a standard format.
GLStrings can be included either as plain text, or as a reference to a publicly
available service, such as GL Service (gl.nmdp.org)
-->
<allele-assignment date="2015-07-28" allele-db="IMGT/HLA" allele-version="3.17.0">
<haploid locus="HLA-A" method="DNA" type="02:20:01"/>
<glstring>
HLA-A*02:20:01
</glstring>
</allele-assignment>
<typing-method>
<!--
MIRING Element 6 requires platform documentation. This could be a peer-reviewed publication,
or an identifier of a procedure on a publicly available resource, such as NCBI GTR
-->
<sbt-ngs locus="HLA-A"
test-id="HLA-A.Test.1234"
test-id-source="AcmeGenLabs">
<raw-reads uri="rawreads/read1.fastq.gz"
availability="public"
format="fastq"
paired="1"
pooled="1"
adapter-trimmed="1"
quality-trimmed="0"/>
</sbt-ngs>
</typing-method>
<consensus-sequence date="2015-01-13">
<!--
MIRING Element 2 requires the inclusion of Reference Context.
The location and identifiers of the reference sequence should be specified.
start and end attributes are 0-based, and refer to positions on the reference sequence.
-->
<reference-database availability="public" curated="true">
<reference-sequence
name="HLA-A reference"
id="Ref111"
start="945000"
end="946000"
accession="GL000123.4"
uri="http://AcmeGenReference/RefDB/GL000123.4"/>
</reference-database>
<!--
MIRING Element 4 requires the inclusion of a consensus sequence.
The start and end positions are 0-based, and refer to positions on the reference sequence (reference-sequence-id)
Multiple consensus-sequence-block elements can be included sequentially.
-->
<consensus-sequence-block reference-sequence-id="Ref111"
start="945532"
end="945832"
strand="+"
phase-set="1"
expected-copy-number="1"
continuity="true"
description="HLA-A Consensus Sequence 4.5.67">
<!--
A sequence can be reported as plain text, or as a pointer to an external reference,
or as variants from a reference sequence.
-->
<sequence>
CCCAGTTCTCACTCCCATTGGGTGTCGGGTTTCCAGAGAAGCCAATCAGTGTCGTCGCGGTCGCTGTTCTAAAGCCCGCACGCACCCACCGGGACTCAGATTCTCCCCAGACGCCGAGGATGGCCGTCATGGCGCCCCGAACCCTCCTCCTGCTACTCTCGGGGGCCCTGGCCCTGACCCAGACCTGGGCGGGTGAGTGCGGGGTCGGGAGGGAAACCGCCTCTGCGGGGAGAAGCAAGGGGCCCTCCTGGCGGGGGCGCAGGACCGGGGGAGCCGCGCCGGGACGAGGGTCGGGCAGGT
</sequence>
<!--
MIRING Element 5 requires the inclusion of any relevant sequence polymorphisms.
These represent variants from the reference sequence.
start and end attributes are 0-based, and refer to positions on the reference sequence.
You can see this variant at positions 10 - 15 on the sequence. (945542 - 945532 = 10)
-->
<variant id="0"
reference-bases="GTCATG"
alternate-bases="ACTCCC"
start="945542"
end="945548"
filter="pass"
quality-score="95">
<!--
The functional effects of variants can be reported using variant-effect.
They should use Sequence Ontology (SO) variant effect terms.
-->
<variant-effect term="missense_variant"/>
</variant>
</consensus-sequence-block>
</consensus-sequence>
</typing>
</sample>
<!--
Multiple samples can be included in a single message.
Each sample should have it's own reference-database(s) even if they are identical to other samples' references.
-->
<sample id="4555-6677-9">
<typing gene-family="HLA" date="2015-01-13">
<allele-assignment date="2015-07-28" allele-db="IMGT/HLA" allele-version="3.17.0">
<haploid locus="HLA-A" method="DNA" type="02:20:01"/>
<glstring>
HLA-A*02:01:01:01
</glstring>
</allele-assignment>
<typing-method>
<sbt-ngs locus="HLA-A"
test-id="HLA-A.Test.1234"
test-id-source="AcmeGenLabs">
<raw-reads uri="rawreads/read2.fastq.gz"
availability="public"
format="fastq"
paired="1"
pooled="1"
adapter-trimmed="1"
quality-trimmed="0"/>
</sbt-ngs>
</typing-method>
<consensus-sequence date="2015-01-13">
<reference-database availability="public" curated="true">
<reference-sequence
name="HLA-A reference"
id="Ref112"
start="945000"
end="946000"
accession="GL000123.4"
uri="http://AcmeGenReference/RefDB/GL000123.4"/>
</reference-database>
<consensus-sequence-block
reference-sequence-id="Ref112"
start="945532"
end="945832"
strand="+"
phase-set="1"
expected-copy-number="1"
continuity="true"
description="HLA-A Consensus Sequence 4.5.89">
<sequence>
CCCAGTTCTCGTCATGATTGGGTGTCGGGTTTCCAGAGAAGCCAATCAGTGTCGTCGCGGTCGCTGTTCTAAAGCCCGCACGCACCCACCGGGACTCAGATTCTCCCCAGACGCCGAGGATGGCCGTCATGGCGCCCCGAACCCTCCTCCTGCTACTCTCGGGGGCCCTGGCCCTGACCCAGACCTGGGCGGGTGAGTGCGGGGTCGGGAGGGAAACCGCCTCTGCGGGGAGAAGCAAGGGGCCCTCCTGGCGGGGGCGCAGGACCGGGGGAGCCGCGCCGGGACGAGGGTCGGGCAGGT
</sequence>
</consensus-sequence-block>
</consensus-sequence>
</typing>
</sample>
</hml>
This is the sample provided for the validator, so I know it is valid. However, when I pass it through my RESTful POST code:
@POST
@Path("/Validate")
@Produces("application/xml")
public String validate(@FormParam("xml") String xml)
{
System.out.println(xml);
try {
Client client = Client.create();
WebResource webResource = client.resource("http://miring.b12x.org/validator/ValidateMiring/");
// POST method
ClientResponse response = webResource.accept("application/xml").post(ClientResponse.class,"xml="+xml);
// check response status code
if (response.getStatus() != 200) {
throw new RuntimeException("Failed : HTTP error code : " + response.getStatus());
}
// display response
String output = response.getEntity(String.class);
System.out.println("Output from Server .... ");
System.out.println(output + "\n");
return output;
} catch (Exception e) {
e.printStackTrace();
}
return "Oops";
}
Everything passes through fine except for strand="+", which for some reason loses the + and produces the error message "The value '' of attribute 'strand' on element 'consensus-sequence-block' is not valid with respect to its...".
I tried all of the strand enumeration values (+, -, -1, 1) and all of them work except +.
Using the web UI (miring.b12x.org) it works perfectly.
Is there something about parsing with SAX that could cause a + to be dropped, or any other reason a particular enumeration value would be dropped?
Thank you
EDIT: Here is the output received:
Output from Server ....
<?xml version="1.0" encoding="UTF-8"?>
<miring-report xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
timestamp="07/19/2016 15:07:31"
xsi:noNamespaceSchemaLocation="http://schemas.nmdp.org/spec/miringreport/1.0/miringreport.xsd">
<hml-compliant>reject</hml-compliant>
<miring-compliant>reject</miring-compliant>
<hmlid extension="HML.3245662" root="2.34.48.32"/>
<samples compliant-sample-count="4"
noncompliant-sample-count="0"
sample-count="2">
<sample hml-compliant="true" id="4555-6677-8" miring-compliant="true"/>
<sample hml-compliant="true" id="4555-6677-9" miring-compliant="true"/>
</samples>
<fatal-validation-errors>
<miring-result miring-rule-id="reject" severity="fatal">
<description>[cvc-attribute.3:, The, value, ', ', of, attribute, 'strand', on, element, 'consensus-sequence-block', is, not, valid, with, respect, to, its, type,, 'null'.]</description>
<solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
</miring-result>
<miring-result miring-rule-id="reject" severity="fatal">
<description>[cvc-attribute.3:, The, value, ', ', of, attribute, 'strand', on, element, 'consensus-sequence-block', is, not, valid, with, respect, to, its, type,, 'null'.]</description>
<solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
</miring-result>
<miring-result miring-rule-id="reject" severity="fatal">
<description>[cvc-enumeration-valid:, Value, ', ', is, not, facet-valid, with, respect, to, enumeration, '[-1,, 1,, +,, -]'., It, must, be, a, value, from, the, enumeration.]</description>
<solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
</miring-result>
<miring-result miring-rule-id="reject" severity="fatal">
<description>[cvc-enumeration-valid:, Value, ', ', is, not, facet-valid, with, respect, to, enumeration, '[-1,, 1,, +,, -]'., It, must, be, a, value, from, the, enumeration.]</description>
<solution>Verify that your HML file is well formed, and conforms to http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd</solution>
</miring-result>
</fatal-validation-errors>
<validation-warnings>
<miring-result miring-rule-id="1.2.b" severity="warning">
<description>The node reporting-center is missing a reporting-center-context attribute.</description>
<solution>Please add a reporting-center-context attribute to the reporting-center node. You can use reporting-center-context to specify the naming authority of the reporting center identifier. Reporting-center-context is not explicitly required.</solution>
<xpath>/hml[1]/reporting-center[1]</xpath>
</miring-result>
</validation-warnings>
</miring-report>

You don’t set the type of your WebResource, and I don’t know what the default Content-Type of the request is, but I suspect it is application/x-www-form-urlencoded, which means + is being treated as a space. If that is the case, changing "xml="+xml to "xml=" + URLEncoder.encode(xml, "UTF-8") may address the problem.
The application/x-www-form-urlencoded format is the default format for HTML form submissions, as described in the HTML 4.01 specification. The documentation for the URLEncoder class also describes this format.
In that format, a + character represents a space, so the strand attribute contains a single space. Except, the Attribute-Value Normalization section of the XML 1.0 specification states:
If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters …
So, that single space is then normalized into the empty string (when all leading and trailing space is removed). The empty string, strand='', does not conform to the XML schema you are referencing, http://schemas.nmdp.org/spec/hml/1.0.1/hml-1.0.1.xsd .
URLEncoder.encode escapes all “reserved” characters, including +, as percent-escapes, and then escapes spaces as +. The server expects this format (almost certainly because a Content-Type: application/x-www-form-urlencoded header is present in the HTTP request), and decodes the + and percent-escapes back to the original XML.
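To make the effect concrete, here is a small sketch using only java.net.URLEncoder and java.net.URLDecoder. The class name FormEncodingDemo and the shortened XML fragment are made up for illustration, and URLDecoder stands in for whatever form decoding the server actually performs, which remains an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class FormEncodingDemo {
    private FormEncodingDemo() {}

    /** Builds the form body with the XML value properly form-encoded. */
    static String formBody(String xml) {
        try {
            return "xml=" + URLEncoder.encode(xml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Decodes the "xml" parameter value the way a form-decoding server would. */
    static String decodeValue(String body) {
        try {
            return URLDecoder.decode(body.substring("xml=".length()), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<b strand=\"+\"/>";

        // Unencoded body: the literal + is read back as a space.
        System.out.println(decodeValue("xml=" + xml));   // <b strand=" "/>

        // Encoded body: + travels as %2B and survives the round trip.
        System.out.println(formBody(xml));               // xml=%3Cb+strand%3D%22%2B%22%2F%3E
        System.out.println(decodeValue(formBody(xml)));  // <b strand="+"/>
    }
}
```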

java: convert HashMap with dynamic keys to Bean

I'm trying to convert a large Map> to a JavaBean. Each map key corresponds to a property of the JavaBean, and the value is decoded in some way into the property value. I'd like to use a utility for this, but I don't know which one will work. These are my requirements for the utility (or framework):
1. All configuration must be in separate files.
2. It should be possible to map a dynamic number of keys. Given a map:
key | value
quan | n
key_1| value_1
key_2| value_2
........ | .............
key_n| value_n
where n is any number, and a JavaBean that has a List of some beans, each with a property: value_1, value_2, ... must be mapped to this property, and in the end the list must contain as many beans as there are such keys and values in the map.
3. It should be possible to set up a custom decoder for property mapping, because in most cases the value in the map is a List with one value, so I need to get the first item of the list (if it's not empty).
4. It should be possible to run a script to perform unusual mappings, for example: given the map described in point 2, the JavaBean has a property of type HashMap, where value_1 is mapped to Bean1 and some analogous value from the input map is mapped to Bean2.
I've tried Smooks, but when I started, these requirements were not yet clear and Smooks was new to me. So the Smooks config doesn't contain the whole business logic (because of the second requirement) and looks ugly. Here is the ugliest fragment, for point 2:
<jb:bean beanId="javaBean" class="com.example.JavaBean" createOnElement="map">
<jb:wiring property="someBeans" beanIdRef="someBeanItems"/>
</jb:bean>
<jb:bean beanId="someBeanItems" class="java.util.ArrayList" createOnElement="map/entry">
<jb:wiring beanIdRef="someBeanItem"/>
</jb:bean>
<jb:bean beanId="someBeanItem" class="com.example.someBeanItem" createOnElement="map/entry">
<condition>map.quan[0]>0</condition>
<jb:expression property="property1">
index = map.quan[0]-1;
value = additionalProperties.property1_List[index];
map.quan[0] = map.quan[0] - 1;
return value;
</jb:expression>
</jb:bean>
Here "property1_List" is built before executing Smooks.
Now I'm looking for something nicer and need your help: do you know how to do this better with Smooks? Or what other mapping frameworks can you recommend for my case?

Parsing multi-line, multi-column text response with jersey/apache REST client

I'm trying to hit a REST end-point that returns a multi-line, multi-column response, such as:
A1 B1 C1
A2 B2 C2
A3 B3 C3
...
...
I'm currently using jersey-client to hit this endpoint and trying to look for the neatest way to parse this response. Here, each line would represent a bean, say MyBean and each column on that would represent a property in that bean. The order of values in the response is always fixed.
I can get the response back as a long String, split it at line-feeds and tabs to get individual values.
However, I would like to know if there is a way where I can get the results as a List<String>, where each element in the List would represent a line of the response. I can then split it on \t to get individual values.
Here's what I've tried:
WebResource resource = client.resource(NETSPEAK_URL)
.type(MediaType.TEXT_PLAIN)
.get(new GenericType<List<String>>(){});
But this leads to the following exception:
A message body reader for Java class java.util.List,
and Java type java.util.List<java.lang.String>,
and MIME media type text/plain; charset=UTF-8 was not found
If I may be even greedier, I would like to know whether I can get the individual column values mapped to the properties of my bean, MyBean. I've considered creating a wrapper around MyBean that holds a list of MyBeans, but then how would I annotate it to assist parsing? That would make sense for an XML/JSON response, but this is plain text.
Is it possible to somehow tell jersey-client about the parsing of this text/plain response? If this is achievable through Apache HTTP client, I'm ready to move.
Thanks

You may want to implement a class representing your list of beans, say class BeanList extends ArrayList<Bean> (List itself is an interface, so it cannot be extended by a class), and implement a MessageBodyReader<BeanList> (see http://jsr311.java.net/nonav/releases/1.1/javax/ws/rs/ext/MessageBodyReader.html) to teach Jersey how to read a string as a BeanList.
Then you can use BeanList.class instead of List<String> as argument to the get call.
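The parsing such a reader would do can be sketched with the standard library alone. This is only the body that a readFrom implementation might delegate to, not a complete MessageBodyReader. The class BeanListParser, the three property names on MyBean and the fixed column count are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bean with one property per column of the response. */
final class MyBean {
    final String a;
    final String b;
    final String c;

    MyBean(String a, String b, String c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }
}

/** Hypothetical helper that a MessageBodyReader's readFrom could call. */
final class BeanListParser {
    private BeanListParser() {}

    // Reads tab-separated lines and turns each one into a MyBean.
    // The stream is not closed here; the caller is assumed to own it.
    static List<MyBean> parse(InputStream entityStream) {
        List<MyBean> beans = new ArrayList<>();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(entityStream, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] cols = line.split("\t");
                if (cols.length != 3) {
                    throw new IllegalArgumentException(
                            "Expected 3 columns but got " + cols.length + ": " + line);
                }
                beans.add(new MyBean(cols[0], cols[1], cols[2]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return beans;
    }
}
```

Inside a real reader the charset would more likely be taken from the response's media type than hard-coded to UTF-8.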

How to create a composite Key Field in Apache Solr?

I have an Apache Solr 3.5 setup that has a SchemaXml like this:
<field name="appid" type="string" indexed="true" stored="true" required="true"/>
<field name="docid" type="string" indexed="true" stored="true" required="true"/>
What I would need is a field that concatenates them together and uses that as <uniqueKey>. There seems nothing built-in, short of creating a multi-valued id field and using <copyField>, but it seems uniqueKey requires a single-valued field.
The only reason I need this is to allow clients to blindly fire <add> calls and have Solr figure out whether it's an addition or an update. Therefore, I don't care too much what the ID looks like.
I assume I would have to write my own Analyzer or Tokenizer? I'm just starting out learning Solr, so I'm not 100% sure what I'd actually need and would appreciate any hints towards what I need to implement.

I would personally leave that burden to the users, since it's pretty easy for them to add a field to each document.
Otherwise, you would have to write a few lines of code. You could write your own UpdateRequestProcessorFactory which automatically adds the new field to every input document based on the values of other existing fields. You can use a separator and keep it single-valued.
In your UpdateRequestProcessor you should override the processAdd method like this:
@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
SolrInputDocument doc = cmd.getSolrInputDocument();
String appid = (String)doc.getFieldValue( "appid" );
String docid = (String)doc.getFieldValue( "docid" );
doc.addField("uniqueid", appid + "-" + docid);
// pass it up the chain
super.processAdd(cmd);
}
Then you should add your UpdateProcessor to your customized updateRequestProcessorChain as the first processor in the chain (solrconfig.xml):
<updateRequestProcessorChain name="mychain" >
<processor class="my.package.MyUpdateRequestProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
<processor class="solr.LogUpdateProcessorFactory" />
</updateRequestProcessorChain>
Hope it works, I haven't tried it. I already did something like this but not with uniqueKey or required fields, that's the only problem you could find. But I guess if you put the updateProcessor at the beginning of the chain, it should work.
