Forming an XML by extracting values from Excel in Java

My requirement is to search for a value (Message ID) in Excel, take the other column values (Source and Target) from that row, and form an XML.
Say my Excel sheet looks like this:
Message ID | Output   | Source | Target
#A74104I   | #A74104O | IPT    | CRD
#A74101    | #A74101  | IAP    | CRD
#A74101    | #A74101  | IAP    | CRD
#A74104I   | #A74104O | IAP    | CRD
For example, for message ID A74104I, extract Source and Target and form an XML as below. This message ID repeats, so there are two Source/Target pairs, and both are appended to the same XML.
<ApplicationParameters>
  <Parms name="default" type="default">
    <SGHeader>
      <ServiceName>
        <TargetApplication>
          <IAP>CRD</IAP>
          <IPT>CRD</IPT>
        </TargetApplication>
      </ServiceName>
    </SGHeader>
  </Parms>
</ApplicationParameters>
Create a separate XML for each message ID.
If the Source repeats for a particular message ID (e.g. in the sheet above, A74101 has Source IAP twice), put that message ID in an exception file that looks like this:
<MessageID>
  <A74101/>
</MessageID>

If you want to do it in Java, look here for code on how to parse Excel sheets.
Once that is done, what remains is extracting the MessageId from the input file. You can do that in Java using regular expressions. Look here or here for code on how to do it.
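As a minimal sketch of the regular-expression step, assuming message IDs in the input file look like the ones in the sheet ('#', one uppercase letter, digits, an optional trailing letter). The pattern and the class name MessageIdExtractor are illustrative assumptions, not taken from the linked posts; adjust the pattern to the real input format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MessageIdExtractor {
    // Assumed ID format, based on the sample sheet: #A74104I, #A74101.
    // Group 1 captures the ID without the leading '#'.
    private static final Pattern MESSAGE_ID = Pattern.compile("#([A-Z]\\d+[A-Z]?)\\b");

    // Returns every message ID found in the text, in order of appearance.
    static List<String> extract(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = MESSAGE_ID.matcher(text);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String input = "Processing #A74104I and #A74101 today";
        System.out.println(extract(input)); // [A74104I, A74101]
    }
}
```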
If you want to do it using PowerShell, look at this post. The poster has almost the same requirement as you (except that he reads input from the console).
You can search through rows/columns as shown in that post. Once a match is found, extract the relevant information and write out the XML message to an external file.
Once you are able to do that, you can worry about extracting the MessageId from the first input file, coding it as shown in this post.
Does the procedure look like a good-fit for your need?
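For a pure-Java version of the same procedure, here is a minimal sketch that starts after the sheet has been read (for example with Apache POI), so the rows are hard-coded from the sample table. It groups rows by message ID, sends any ID whose Source repeats to the exception list, and builds one ApplicationParameters XML per remaining ID. The class and method names are hypothetical; the leading '#' is assumed to be stripped already (it is not allowed in XML element names), and Source values are assumed to be valid element names, since nothing is escaped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MessageXmlBuilder {
    // One sheet row, reduced to the columns the XML needs.
    // Assumption: the leading '#' has already been stripped from the message ID.
    record Row(String messageId, String source, String target) {}

    // Groups rows by message ID, keeping the order in which IDs first appear.
    static Map<String, List<Row>> groupById(List<Row> rows) {
        Map<String, List<Row>> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.messageId(), k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    // True if some Source value occurs more than once for one message ID.
    static boolean hasDuplicateSource(List<Row> rows) {
        Set<String> seen = new HashSet<>();
        for (Row r : rows) {
            if (!seen.add(r.source())) {
                return true;
            }
        }
        return false;
    }

    // Builds the ApplicationParameters document with one <Source>Target</Source> per row.
    // No escaping is done: Source must be a valid element name, Target plain text.
    static String buildXml(List<Row> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<ApplicationParameters>\n")
          .append("  <Parms name=\"default\" type=\"default\">\n")
          .append("    <SGHeader>\n")
          .append("      <ServiceName>\n")
          .append("        <TargetApplication>\n");
        for (Row r : rows) {
            sb.append("          <").append(r.source()).append('>')
              .append(r.target())
              .append("</").append(r.source()).append(">\n");
        }
        sb.append("        </TargetApplication>\n")
          .append("      </ServiceName>\n")
          .append("    </SGHeader>\n")
          .append("  </Parms>\n")
          .append("</ApplicationParameters>\n");
        return sb.toString();
    }

    // Builds the exception document listing IDs whose Source repeats.
    static String buildExceptionXml(List<String> ids) {
        StringBuilder sb = new StringBuilder("<MessageID>\n");
        for (String id : ids) {
            sb.append("  <").append(id).append("/>\n");
        }
        return sb.append("</MessageID>\n").toString();
    }

    public static void main(String[] args) {
        // Rows copied from the sample sheet in the question.
        List<Row> rows = List.of(
            new Row("A74104I", "IPT", "CRD"),
            new Row("A74101", "IAP", "CRD"),
            new Row("A74101", "IAP", "CRD"),
            new Row("A74104I", "IAP", "CRD"));

        List<String> exceptions = new ArrayList<>();
        for (Map.Entry<String, List<Row>> e : groupById(rows).entrySet()) {
            if (hasDuplicateSource(e.getValue())) {
                exceptions.add(e.getKey());
            } else {
                // In practice, write this to its own file (e.g. a hypothetical A74104I.xml).
                System.out.println("--- " + e.getKey() + " ---");
                System.out.print(buildXml(e.getValue()));
            }
        }
        System.out.print(buildExceptionXml(exceptions));
    }
}
```

Elements follow sheet order (IPT before IAP for A74104I), so sort the rows by Source first if the order in the question's example matters. StringBuilder keeps the sketch short; for anything beyond this fixed template, a DOM or StAX writer would take care of escaping.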

Related

How to fetch the conversation id from a .msg file using MAPIMessage

I'm trying to parse a .msg file using org.apache.poi.hsmf.MAPIMessage. How do I get the conversation ID?
The structure of the MSG file format is described in the MSDN library, see [MS-OXMSG]: Outlook Item (.msg) File Format.
It is stored in the 0x0F030102 property. You can check whether it is set in a particular MSG file with OutlookSpy (I am its author; click the OpenIMsgOnIStg button).
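To see where such a value lives inside the file: per [MS-OXMSG], a MAPI property tag is 32 bits, with the property ID in the high 16 bits and the type in the low 16 bits (0x0102 is PT_BINARY), and a variable-length property is stored in a stream named "__substg1.0_" followed by the tag as eight hex digits. The sketch below only does that arithmetic; it does not open the .msg file, the class name is hypothetical, and it takes the tag quoted above as given.

```java
public class MsgPropertyStream {
    // High 16 bits of a MAPI property tag: the property ID.
    static int propertyId(int tag) {
        return tag >>> 16;
    }

    // Low 16 bits of a MAPI property tag: the property type (0x0102 = PT_BINARY).
    static int propertyType(int tag) {
        return tag & 0xFFFF;
    }

    // Name of the storage stream that holds a variable-length property in a .msg file.
    static String streamName(int tag) {
        return String.format("__substg1.0_%08X", tag);
    }

    public static void main(String[] args) {
        int tag = 0x0F030102; // the tag quoted in the answer
        System.out.printf("id=0x%04X type=0x%04X%n", propertyId(tag), propertyType(tag));
        System.out.println(streamName(tag)); // __substg1.0_0F030102
    }
}
```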

How to check for missing Key in JSON using Pig?

I have a JSON file with varying schema.
{"asin":"xxxxxx", "title":"xxxsomething"}
{"asin":"yyyyy"}
{"asin":"zzzzzz", "title":"zzzsomething"}
For this I have written a Pig script that uses Twitter's elephant-bird library to load the JSON data and convert it into a tab-separated file.
However, if a line in the input JSON file is missing the "title" key (line 2 in the example above), the TSV file has nothing in its place either, like:
xxxxxx xxxsomething
yyyyyy
zzzzzz zzzsomething
I would like to give a custom default value when a particular key is missing. How can I do this using Pig Latin?
Expected output:
xxxxxx xxxsomething
yyyyyy default_string
zzzzzz zzzsomething
Here's my script:
REGISTER elephant-bird-elephant-bird-4.13/pig/target/elephant-bird-pig-4.13.jar;
REGISTER elephant-bird-elephant-bird-4.13/hadoop-compat/target/elephant-bird-hadoop-compat-4.13.jar;
REGISTER elephant-bird-elephant-bird-4.13/core/target/elephant-bird-core-4.13-thrift9.jar;
reviews = load '../data/Amazon/meta_Amazon_Instant_Video.json'
using com.twitter.elephantbird.pig.load.JsonLoader();
tabs = FOREACH reviews generate (chararray)$0#'asin' as asin_new, (chararray)$0#'title';
A = ORDER tabs BY asin_new;
DESCRIBE A;
STORE A INTO 'hdfs://localhost:9000/meta_Amazon_Instant_Video.tsv';
You can write a UDF for that, with the condition that if the value is missing or empty it returns the default string instead.
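In a real Pig UDF, this check would sit inside exec(Tuple) of a class extending org.apache.pig.EvalFunc<String>, reading the field with input.get(0), and the jar would be REGISTERed like the elephant-bird jars. The sketch below keeps only the null/empty check so it runs without Pig on the classpath; the class name DefaultIfMissing is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class DefaultIfMissing {
    // Value used when the key is missing; matches "default_string" in the expected output.
    static final String DEFAULT = "default_string";

    // Core of the UDF: a missing (null) or empty value is replaced by the fallback.
    static String orDefault(Object value, String fallback) {
        if (value == null) {
            return fallback;
        }
        String s = value.toString();
        return s.isEmpty() ? fallback : s;
    }

    public static void main(String[] args) {
        // The three sample "title" values; line 2 has no "title" key, so it arrives as null.
        List<Object> titles = Arrays.asList("xxxsomething", null, "zzzsomething");
        for (Object t : titles) {
            System.out.println(orDefault(t, DEFAULT));
        }
    }
}
```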

Talend iterate on tTikaExtractor

I'm trying to use tTikaExtractor component to extract the content of several files in a folder.
It works with a single file, but when I add a tFileList component I don't understand how to get the content of the two different files.
I think it is related to flows/iterations, but I cannot manage to make it work.
For example, I have this simple job:
tFileList -(iterate)-> tTikaExtractor -(onComponentOk)-> tJava -(row1)-> tFileOutputJSON
In my tJava component I only have this:
String content = (String) globalMap.get("tTikaExtractor_1_CONTENT");
row1.content=content;
But my JSON output contains only the content of the last file, not of all the files!
Can you help me with this?
That is because you are not appending records to the output: it writes the records one by one, so in the end only the last record is left in the file.
Instead, you can write all the rows to a delimited file first, then use tFileInputDelimited -(main)-> tFileOutputJSON to transfer all the rows.
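The diagnosis can be reproduced in plain Java, outside Talend: if every iteration opens the output file in overwrite mode, only the last record survives, while appending keeps all of them. This minimal sketch uses no Talend API; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class AppendVsOverwrite {
    // Writes each record to a fresh temp file, one write per "iteration", then reads it back.
    // append=false: each write truncates the file (CREATE, TRUNCATE_EXISTING, WRITE defaults).
    // append=true: each write adds to the end of the file.
    static List<String> simulate(List<String> records, boolean append) {
        try {
            Path out = Files.createTempFile("rows", ".txt");
            for (String r : records) {
                String line = r + System.lineSeparator();
                if (append) {
                    Files.writeString(out, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                } else {
                    Files.writeString(out, line);
                }
            }
            List<String> result = Files.readAllLines(out);
            Files.delete(out);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> contents = List.of("content of file 1", "content of file 2");
        System.out.println("overwrite: " + simulate(contents, false));
        System.out.println("append:    " + simulate(contents, true));
    }
}
```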

Capture generated output file path and name using CSSDK

We are in the process of converting over to using the XSLT compiler for page generation. I have a Xalan Java extension that uses the CSSDK to capture some metadata we have stored in the Extended Attributes (EAs) for output to the page. I have no problems getting the EAs rendered to the output file.
The problem is that I don't know how to dynamically capture the file path and name of the output file.
So, just as a proof of concept, I have hard-coded the CSVPath of the output file in my Java extension. Here's a code sample:
CSSimpleFile sourceFile = (CSSimpleFile)client.getFile(new CSVPath("/some-path-to-the-output.jsp"));
Can someone point me in the CSSDK to where I could capture the output file?
I found the answer.
First, get or create your CSClient. You can use the examples provided in cssdk/samples; I tweaked one so that it captures the CSClient in the method getClientForCurrentUser(). Watch out for SOAP vs. Java connections: in development I was using a SOAP connection, but for the make_toolkit build the Java connection was required for our purposes.
Check the following snippet. The request CSClient is captured in the static variable client.
CSSimpleFile sourceFile = (CSSimpleFile)client.getFile(new CSVPath(XSLTExtensionContext.getContext().getOutputDirectory().toString() + "/" + XSLTExtensionContext.getContext().getOutputFileName()));

SOAP: Reading SOAP response's embedded file

In my SOAP client application I am using the javax.xml.soap API. I am getting the SOAP response; part of it is shown below.
<ns5:XXX type="Full" format="HTML">
<ns5:EmbeddedFile MIMEType="text/html"
fileExtension="html"
fileName="ZZZ.html">
<ns5:Document>...</ns5:Document>
</ns5:EmbeddedFile>
</ns5:XXX>
The value between the Document tags is Base64-encoded.
I need to know two things. As you can see in the snippet above, the fileName is ZZZ.html.
First, where is this ZZZ.html file stored? I searched my local machine but could not find it.
Second, the Document tags contain a long Base64-encoded text. Is this the content of ZZZ.html? If so, how do I read it?
Thanks
This appears to be a custom way of embedding file content in a SOAP message, used by the service you are calling; the standard way of doing this would have been SOAP attachments.
In this specific case, the file content does seem to be embedded as Base64 data between the Document tags, and the file's metadata is carried in the attributes of the EmbeddedFile tag. You basically have to decode the Base64-encoded content (see here and here for how to) and write the result to a file with the name given in that metadata.
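A minimal sketch of those two steps with the JDK's own DOM parser and java.util.Base64. The namespace URI in the sample is a made-up placeholder (use whatever ns5 is bound to in the real response), and the class and record names are hypothetical. Because a javax.xml.soap SOAPElement is also an org.w3c.dom.Element, the same getElementsByTagNameNS lookups should work directly on the SOAPBody you already have.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

public class EmbeddedFileExtractor {
    // File name from the EmbeddedFile attribute plus the decoded Document content.
    record EmbeddedFile(String fileName, byte[] content) {}

    // Finds the first EmbeddedFile (any namespace) and decodes its Document text.
    static EmbeddedFile parse(String soapXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // match by local name, whatever prefix/URI is used
            Document doc = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(soapXml.getBytes(StandardCharsets.UTF_8)));
            Element embedded = (Element) doc.getElementsByTagNameNS("*", "EmbeddedFile").item(0);
            if (embedded == null) {
                throw new IllegalArgumentException("No EmbeddedFile element found");
            }
            Element content = (Element) embedded.getElementsByTagNameNS("*", "Document").item(0);
            // The MIME decoder ignores line breaks and other non-Base64 characters.
            byte[] bytes = Base64.getMimeDecoder().decode(content.getTextContent());
            return new EmbeddedFile(embedded.getAttribute("fileName"), bytes);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse SOAP response", e);
        }
    }

    // Writes the decoded bytes to dir/fileName. Real code should sanitize fileName first.
    static Path save(EmbeddedFile file, Path dir) throws IOException {
        Path target = dir.resolve(file.fileName());
        Files.write(target, file.content());
        return target;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>Hello</body></html>";
        String encoded = Base64.getEncoder().encodeToString(html.getBytes(StandardCharsets.UTF_8));
        // Placeholder namespace URI, not the service's real one.
        String soap = "<ns5:XXX xmlns:ns5=\"urn:example:ns5\" type=\"Full\" format=\"HTML\">"
                + "<ns5:EmbeddedFile MIMEType=\"text/html\" fileExtension=\"html\" fileName=\"ZZZ.html\">"
                + "<ns5:Document>" + encoded + "</ns5:Document>"
                + "</ns5:EmbeddedFile></ns5:XXX>";
        Path written = save(parse(soap), Files.createTempDirectory("soap"));
        System.out.println(written + " -> " + Files.readString(written));
    }
}
```

This also answers the first question indirectly: ZZZ.html does not exist anywhere until your client writes it; the name is only metadata travelling with the Base64 payload.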
