I'm trying to use the tTikaExtractor component to extract the content of several files in a folder.
It works with a single file, but when I add a tFileList component, I don't understand how to get the content of the two different files.
I think it is something related to flows/iterations, but I cannot manage to make it work.
For example, I have this simple job:
tFileList -(iterate)-> tTikaExtractor -(onComponentOk)-> tJava -(row1)-> tFileOutputJSON
In my Java component I only have this:
String content = (String) globalMap.get("tTikaExtractor_1_CONTENT");
row1.content=content;
But my JSON output only contains the content of the last file, not of all files!
Can you help me with this?
That's because you are not appending records to the output; it writes records one by one, so in the end only the last record is available in the file.
Perhaps you can write all the rows to a delimited file first, then use tFileInputDelimited -(main)-> tFileOutputJSON to transfer all the rows.
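Alternatively, you can accumulate the extracted contents yourself inside the iterate loop. A minimal sketch for the tJava component, assuming a hypothetical globalMap key "ALL_CONTENT" (only tTikaExtractor_1_CONTENT comes from your original job):

// Append each iteration's extracted content to a list kept in globalMap;
// "ALL_CONTENT" is a made-up key, pick any name not used elsewhere in the job.
java.util.List<String> all = (java.util.List<String>) globalMap.get("ALL_CONTENT");
if (all == null) {
    all = new java.util.ArrayList<String>();
    globalMap.put("ALL_CONTENT", all);
}
all.add((String) globalMap.get("tTikaExtractor_1_CONTENT"));

After the iteration finishes (e.g. via an onSubjobOk link from tFileList), a second subjob can read "ALL_CONTENT" back from globalMap and emit one row per file to tFileOutputJSON.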
We need a Java code which automatically converts csv files into pbix files, so they can be opened and further worked on in the PowerBI Desktop. Now, I know PowerBI offers this super cool feature, which converts csv files and many other formats into pbix manually. However, we need a function which automatically converts our reports directly into pbix, so that no intermediate files need to be created and stored somewhere.
We have already been able to develop a function with three parameters: the first one corresponds to the selected report from our database; the second corresponds to the directory in which the converted report should be generated; and the third one is the converted output file itself. The first two parameters work well, and the code is able to generate a copy of any report we select into any directory we select. However, it is able to generate CSV files only. Any other format ends up the same size as the CSV and cannot be opened.
This is what we've tried so far for the conversion part of the code:
Util.writeFile("C:\\" + "test.csv", byteString);
The above piece of code works just fine; however, CSV is not what we wanted, since the original reports are already in CSV format anyway.
Util.writeFile("C:\\" + "test.pbix", byteString);
Util.writeFile("C:\\" + "test.pdf", byteString);
Util.writeFile("C:\\" + "test.xlsx", byteString);
Each of the three lines above generates one file in the indicated format; however, each generated file is just as large as its corresponding CSV (but should be much larger) and therefore cannot be opened.
File file = new File("C:\\" + "test1.csv");
File file2 = new File("C:\\" + "test1.pbix");
file.renameTo(file2);
The above piece of code does not generate any file at all, but I thought it was worth mentioning, as it doesn't throw any exception either.
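One likely reason the rename is silent: File.renameTo reports failure through its boolean return value rather than by throwing. A minimal sketch using java.nio.file.Files.move instead, which does throw and therefore makes the failure visible (the paths are the ones from the snippet above):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RenameCheck {
    public static void main(String[] args) throws IOException {
        Path source = Paths.get("C:\\test1.csv");
        Path target = Paths.get("C:\\test1.pbix");
        // Throws e.g. NoSuchFileException if the source is missing, instead of failing silently
        Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}

Note that even a successful rename would not yield a usable report: .pbix is a zip-based container format, so a CSV byte stream with a .pbix extension will not open in Power BI Desktop.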
P.S. We would also be interested in Java code which converts CSV for any other BI reporting software besides PowerBI, like Tableau, BIRT, Knowage, etc.
P.S.2 The first piece of code uses objects of a class (sailpoint.tools.Util) which is apparently only available to those who have access to SailPoint.
I have a long JSON file and I want to copy a specific element from it (I know its name) to an Excel file.
E.g.: suppose I want to make an Excel file containing "Product" values (Baleno, i20, Ford Figo, etc.) imported from a JSON file. How do I do it, using GET/POST or without AJAX?
So, obviously there are ways to write this yourself. What I recommend, however, is using a library (or two; I'd recommend JSON Simple and/or Apache POI). Software engineering is about efficiency, and that includes the engineer's own. Using libraries is not shameful. I'd recommend trying them first, along the lines of the sketch below.
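A minimal sketch with those two libraries; the file names and the JSON shape (a top-level array of objects with a "Product" key) are assumptions:

import java.io.FileOutputStream;
import java.io.FileReader;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

public class JsonToExcel {
    public static void main(String[] args) throws Exception {
        // Parse the JSON file, assumed to look like [{"Product":"Baleno",...}, {"Product":"i20",...}]
        JSONArray records = (JSONArray) new JSONParser().parse(new FileReader("products.json"));

        try (XSSFWorkbook workbook = new XSSFWorkbook()) {
            Sheet sheet = workbook.createSheet("Products");
            sheet.createRow(0).createCell(0).setCellValue("Product"); // header row

            int rowIndex = 1;
            for (Object o : records) {
                JSONObject record = (JSONObject) o;
                sheet.createRow(rowIndex++).createCell(0)
                     .setCellValue((String) record.get("Product"));
            }
            try (FileOutputStream out = new FileOutputStream("products.xlsx")) {
                workbook.write(out);
            }
        }
    }
}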
-Batista
One simple method I have used, when you only require the content you have in the JSON and the output needs no formatting:
Create/construct/return a CSV file containing the content, e.g.:
Product,Q1Sales,Q2Sales,Q3Sales,Q4Sales
"Baleno",6000,5000,7000,5500
Return it with the filename "BalenoSales.xls" and an Excel MIME type.
Make the suffix of the servlet URL ".xls" as well, so Excel/IE likes it.
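A minimal servlet sketch of that trick; the class name and mapping are made up, and application/vnd.ms-excel is the MIME type Excel associates with legacy .xls:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetically mapped to a URL ending in .xls, e.g. /sales/BalenoSales.xls
public class SalesExportServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("application/vnd.ms-excel");
        resp.setHeader("Content-Disposition", "attachment; filename=\"BalenoSales.xls\"");
        PrintWriter out = resp.getWriter();
        out.println("Product,Q1Sales,Q2Sales,Q3Sales,Q4Sales");
        out.println("\"Baleno\",6000,5000,7000,5500");
    }
}

Recent Excel versions may warn that the content does not match the extension, but will open the CSV after confirmation.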
My requirement is to search for a value (Message ID) in Excel, take the other column values (Source and Target) of that row, and form an XML.
Say my Excel looks like this:
Message ID   Output     Source   Target
#A74104I     #A74104O   IPT      CRD
#A74101      #A74101    IAP      CRD
#A74101      #A74101    IAP      CRD
#A74104I     #A74104O   IAP      CRD
For example, for message ID A74104I, extract Source and Target and form an XML as below. This message ID repeats, so there are two Source/Target pairs, which are appended in the same XML.
<ApplicationParameters>
  <Parms name="default" type="default">
    <SGHeader>
      <ServiceName>
        <TargetApplication>
          <IAP>CRD</IAP>
          <IPT>CRD</IPT>
        </TargetApplication>
      </ServiceName>
    </SGHeader>
  </Parms>
</ApplicationParameters>
For each message ID, create a different XML.
If, for a particular message ID, the Source repeats (e.g. in the Excel above, Source IAP appears twice for A74101), then put this message ID in an exception file, which looks like:
<MessageID>
  <A74101/>
</MessageID>
If you want to do it in Java, look here for code on how to parse Excel sheets.
Once that is done, what remains is extracting the MessageId from the input file. You can do that in Java using regular expressions; look here or here for code on how to do it.
If you want to do it using PowerShell, look at this post. He has almost the same requirement as you do (other than that he reads input from the console).
You can search through rows/columns as shown in that post. Once a match is found, you extract the relevant information and write out the XML message to an external file.
Once you are able to do that, you can worry about extracting the MessageId from the first input file by coding it as shown in this post.
Does the procedure look like a good fit for your need?
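For the Java route, a minimal sketch with Apache POI (my choice for the Excel parsing, not necessarily the linked post's; the file name and the column layout from your sample are assumptions):

import java.io.FileInputStream;
import java.io.FileWriter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class MessageIdToXml {
    public static void main(String[] args) throws Exception {
        String messageId = "#A74104I"; // the Message ID to search for

        try (XSSFWorkbook workbook = new XSSFWorkbook(new FileInputStream("messages.xlsx"))) {
            Sheet sheet = workbook.getSheetAt(0);
            StringBuilder targets = new StringBuilder();

            // Row 0 is the header; columns: 0 = Message ID, 2 = Source, 3 = Target
            for (int r = 1; r <= sheet.getLastRowNum(); r++) {
                Row row = sheet.getRow(r);
                if (row != null && messageId.equals(row.getCell(0).getStringCellValue())) {
                    String source = row.getCell(2).getStringCellValue();
                    String target = row.getCell(3).getStringCellValue();
                    targets.append("          <").append(source).append(">")
                           .append(target).append("</").append(source).append(">\n");
                }
            }

            try (FileWriter out = new FileWriter(messageId.replace("#", "") + ".xml")) {
                out.write("<ApplicationParameters>\n"
                        + "  <Parms name=\"default\" type=\"default\">\n"
                        + "    <SGHeader>\n"
                        + "      <ServiceName>\n"
                        + "        <TargetApplication>\n"
                        + targets
                        + "        </TargetApplication>\n"
                        + "      </ServiceName>\n"
                        + "    </SGHeader>\n"
                        + "  </Parms>\n"
                        + "</ApplicationParameters>\n");
            }
        }
    }
}

Detecting a repeated Source for the same message ID (the exception-file case) would just be a matter of collecting the sources in a java.util.Set and checking add()'s return value before appending.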
How do you merge two .odt files? Doing it by hand, opening each file and copying the content, would work, but is unfeasible.
I have tried the ODF Toolkit Simple API (simple-odf-0.8.1-incubating) to achieve that task, creating an empty TextDocument and merging everything into it:
private File masterFile = new File(...);
...
TextDocument t = TextDocument.newTextDocument();
t.save(masterFile);
...
for (File f : filesToMerge) {
    joinOdt(f);
}
...
void joinOdt(File joinee) {
    TextDocument master = (TextDocument) TextDocument.loadDocument(masterFile);
    TextDocument slave = (TextDocument) TextDocument.loadDocument(joinee);
    // Append the joinee's content after the master's last paragraph
    master.insertContentFromDocumentAfter(slave, master.getParagraphByReverseIndex(0, false), true);
    master.save(masterFile);
}
And that works reasonably well; however, it loses information about fonts: the original files are a combination of Arial Narrow and Wingdings (for check boxes), while the output masterFile is all in Times New Roman. At first I suspected the last parameter of insertContentFromDocumentAfter, but changing it to false breaks (almost) all formatting. Am I doing something wrong? Is there any other way?
I think this is "works as designed".
I tried this once with a global document, which imports documents and displays them as-is... as long as the paragraph styles have different names!
Same-named styles get overwritten with the values from the "master" document.
So I ended up cloning the standard styles with unique (per-document) names.
HTH
My case was a rather simple one: the files I wanted to merge were generated the same way and used the same basic formatting. Therefore, starting from one of my files instead of an empty document fixed my problem.
However, this question will remain open until someone comes up with a more general solution to formatting retention (possibly based on ngulam's answer and comments?).
I read this answer about loading a file into a Java application.
I need to write a program that loads a .txt file containing a list of records. After I parse it, I need to match the records (against conditions that I will check) and save the result to an XML file.
I am stuck on this, and I would be happy to get answers to the following questions:
How do I load the .txt file in Java?
After I load the file, how can I access the information in it? For example, how can I check whether the first line of one of the records equals "1"?
How do I export the result to an XML file?
One: you need sample code for reading a file line by line (see the sketch after this list).
Two: the split method of String might be helpful, for instance for getting the number out of the first element when the information is separated by spaces:
String myLine = reader.readLine(); // one line from step one
String[] components = myLine.split(" ");
if (components.length >= 1) {
    int num = Integer.parseInt(components[0]);
    ....
}
Three: you can just write it like any text file, or use any XML writer you want.
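Putting the three together, a minimal sketch; records.txt and the one-record-per-line layout are assumptions:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

public class RecordsToXml {
    public static void main(String[] args) throws Exception {
        try (BufferedReader reader = new BufferedReader(new FileReader("records.txt"));
             PrintWriter out = new PrintWriter("records.xml")) {

            out.println("<records>");
            String myLine;
            while ((myLine = reader.readLine()) != null) {                 // one: read line by line
                String[] components = myLine.split(" ");                   // two: split on spaces
                if (components.length >= 1 && components[0].equals("1")) { // example matching condition
                    out.println("  <record id=\"" + components[0] + "\"/>");
                }
            }
            out.println("</records>");                                     // three: plain text output
        }
    }
}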
Basic I/O
Integer.parseInt(firstLine)
There is a plethora of choices for the XML:
Create POJOs to represent the records and write them using XMLEncoder
SAX
DOM...
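A minimal sketch of the XMLEncoder option; the RecordBean class is made up, and XMLEncoder works on JavaBeans, so it needs the public no-arg constructor and getters/setters:

import java.beans.XMLEncoder;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;

public class RecordBean {
    private int id;
    private String name;

    public RecordBean() {}
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public static void main(String[] args) throws Exception {
        RecordBean r = new RecordBean();
        r.setId(1);
        r.setName("example");

        // Serializes the bean's properties to XML (record.xml)
        try (XMLEncoder encoder = new XMLEncoder(
                new BufferedOutputStream(new FileOutputStream("record.xml")))) {
            encoder.writeObject(r);
        }
    }
}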