How to print pretty JSON into a Word document using docx4j? - java

I want to print a simple pretty-printed JSON string (containing multiple line breaks, i.e. many \n) into a Word document. I tried the following, but docx4j just prints all the content inline on one single line (without the \n breaks). Ideally it should print the multiline pretty JSON as-is, recognising the "\n" characters the JSON string contains:
1)
wordMLPackage.getMainDocumentPart().addParagraphOfText({multiline pretty json String})
2)
ObjectFactory factory = Context.getWmlObjectFactory();
P p = factory.createP();
Text t = factory.createText();
t.setValue(text);
R run = factory.createR();
run.getContent().add(t);
p.getContent().add(run);
PPr ppr = factory.createPPr();
p.setPPr(ppr);
ParaRPr paraRpr = factory.createParaRPr();
ppr.setRPr(paraRpr);
wordMLPackage.getMainDocumentPart().addObject(p);
Looking for help. Thanks.

The docx file format doesn't treat \n as a newline.
So you'll need to split your string on \n, and either create a new P, or use w:br, like so:
Br br = wmlObjectFactory.createBr();
run.getContent().add( br);
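For reference, here is a minimal sketch of the w:br approach, adapting the factory code from the question. The variable name prettyJson and the use of setSpace("preserve") to keep the pretty-print indentation are assumptions, not part of the original code:
ObjectFactory factory = Context.getWmlObjectFactory();
P p = factory.createP();
R run = factory.createR();
String[] lines = prettyJson.split("\n"); // prettyJson is the pretty-printed JSON string
for (int i = 0; i < lines.length; i++) {
    Text t = factory.createText();
    t.setValue(lines[i]);
    t.setSpace("preserve"); // keep the leading spaces of the pretty print
    run.getContent().add(t);
    if (i < lines.length - 1) {
        run.getContent().add(factory.createBr()); // w:br = soft line break within the paragraph
    }
}
p.getContent().add(run);
wordMLPackage.getMainDocumentPart().addObject(p);
Alternatively, calling addParagraphOfText(line) once per line creates a new P for each line, which gives the same visual result with paragraph spacing instead of soft line breaks.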

Related

ZWNBSP appears when parsing CSV

I have a CSV and I want to check if it has all the data it should have. But it looks like a ZWNBSP appears at the beginning of the first column name in the first row.
My simplified code is:
@Test
void parseCsvTest() throws Exception {
    Configuration.holdBrowserOpen = true;
    ClassLoader classLoader = getClass().getClassLoader();
    try (
        InputStream inputStream = classLoader.getResourceAsStream("files/csv_example.csv");
        CSVReader reader = new CSVReader(new InputStreamReader(inputStream))
    ) {
        List<String[]> content = reader.readAll();
        var csvStrings0line = content.get(0);
        var csv1stElement = csvStrings0line[0];
        var csv1stElementShouldBe = "Timestamp";
        assertEquals(csv1stElementShouldBe, csv1stElement);
    }
}
My CSV contains
"Timestamp","Source","EventName","CountryId","Platform","AppVersion","DeviceType","OsVersion"
"2022-05-02T14:56:59.536987Z","courierapp","order_delivered_sent","643","ios","3.11.0","iPhone 11","15.4.1"
"2022-05-02T14:57:35.849328Z","courierapp","order_delivered_sent","643","ios","3.11.0","iPhone 8","15.3.1"
My test fails with
expected: <Timestamp> but was: <Timestamp>
Expected :Timestamp
Actual :Timestamp
<Click to see difference>
Clicking on the see difference shows that there is a ZWNBSP at the beginning of the Actual text.
Copy-pasting my text into the online tool for displaying non-printable Unicode characters (https://www.soscisurvey.de/tools/view-chars.php) shows only CR LF at the ends of the lines, no ZWNBSPs.
But where does it come from?
It's a BOM character. You may remove it yourself or use one of several other solutions (see https://stackoverflow.com/a/4897993/1420794 for instance).
That is the Unicode zero-width no-break space character. When used at the beginning of Unicode-encoded text files, it serves as a 'byte order mark'. You read it to determine the encoding of the text file, then you can safely discard it if you want. The best thing you can do is spread awareness.
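As a sketch of the "remove it yourself" option, assuming the file is UTF-8 with a BOM, you can wrap the stream in a reader that skips a leading \uFEFF before it reaches the CSV reader. The helper name withoutBom is illustrative; Apache Commons IO's BOMInputStream is a ready-made alternative:
import java.io.*;
import java.nio.charset.StandardCharsets;

// Illustrative helper: returns a Reader positioned after an optional UTF-8 BOM.
static Reader withoutBom(InputStream in) throws IOException {
    PushbackReader reader = new PushbackReader(new InputStreamReader(in, StandardCharsets.UTF_8), 1);
    int first = reader.read();
    if (first != -1 && first != '\uFEFF') {
        reader.unread(first); // not a BOM, so put the character back
    }
    return reader;
}

// Usage in the test:
// CSVReader reader = new CSVReader(withoutBom(inputStream));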

Univocity CSV parser glues the whole line if it begins with quote "

I'm using univocity 2.7.5 to parse a CSV file. Until now it worked fine and parsed each row of the CSV file as a String array with n elements, where n = the number of columns in the row. But now I have a file where the rows start with a quote ", and the parser cannot handle it: it returns each row as a String array with only one element, which contains the whole row's data. I tried removing that quote from the CSV file and it worked fine, but there are about 500,000 rows. What should I do to make it work?
Here is a sample line from my file (it has the quotes in the source file too):
"100926653937,Kasym Amina,620414400630,Marzhan Erbolova,""Kazakhstan, Almaty, 66, 3"",87029845662"
And here's my code:
CsvParserSettings settings = new CsvParserSettings();
settings.setDelimiterDetectionEnabled(true);
CsvParser parser = new CsvParser(settings);
List<String[]> rows = parser.parseAll(csvFile);
Author of the library here. The input you have there is a well-formed CSV, with a single value consisting of:
100926653937,Kasym Amina,620414400630,Marzhan Erbolova,"Kazakhstan, Almaty, 66, 3",87029845662
If that row appeared in the middle of your input, I suppose your input has unescaped quotes (somewhere before you got to that line). Try playing with the unescaped quote handling setting:
For example, this might work:
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
If nothing works, and all your lines look like the one you posted, then you can parse the input twice (which is shitty and slow but will work):
CsvParser parser = new CsvParser(settings);
parser.beginParsing(csvFile);
List<String[]> out = new ArrayList<>();
String[] row;
while ((row = parser.parseNext()) != null) {
    // got a row with unexpected length?
    if (row.length == 1) {
        // break it down again.
        row = parser.parseLine(row[0]);
    }
    out.add(row);
}
Hope this helps.

CSVParser processes LF as CRLF

I am trying to parse a CSV file as below
String NEW_LINE_SEPARATOR = "\r\n";
CSVFormat csvFileFormat = CSVFormat.DEFAULT.withRecordSeparator(NEW_LINE_SEPARATOR);
FileReader fr = new FileReader("201404051539.csv");
CSVParser csvParser = csvFileFormat.withHeader().parse(fr);
List<CSVRecord> recordsList = csvParser.getRecords();
The file has normal lines ending with CRLF characters; however, for a few lines there is an additional LF character appearing in the middle.
i.e.
a,b,c,dCRLF --line1
e,fLF,g,h,iCRLF --line2
Because of this, the parse operation creates three records, whereas there are actually only two.
Is there a way to have the LF character in the middle of the second line not treated as a line break, so that parsing produces only two records?
Thanks
I think uniVocity-parsers is the only parser you will find that will work with line endings as you expect.
The equivalent code using univocity-parsers will be:
CsvParserSettings settings = new CsvParserSettings(); //many options here, check the tutorial
settings.getFormat().setLineSeparator("\r\n");
settings.getFormat().setNormalizedNewline('\u0001'); //uses a special character to represent a new record instead of \n.
settings.setNormalizeLineEndingsWithinQuotes(false); //does not replace \r\n by the normalized new line when reading quoted values.
settings.setHeaderExtractionEnabled(true); //extract headers from file
settings.trimValues(false); //does not remove whitespaces around values
CsvParser parser = new CsvParser(settings);
List<Record> recordsList = parser.parseAllRecords(new File("201404051539.csv"));
If you define a line separator to be \r\n then this is the ONLY sequence of characters that should identify a new record (when outside quotes). All values can have either \r or \n without being enclosed in quotes because that's NOT the line separator sequence.
When parsing the input sample you gave:
String input = "a,b,c,d\r\ne,f\n,g,h,i\r\n";
parser.parseAll(new StringReader(input));
The result will be:
LINE1 = [a, b, c, d]
LINE2 = [e, f
, g, h, i]
Disclosure: I'm the author of this library. It's open-source and free (Apache 2.0 license)

Qt project code to XML file transformation

I want to convert some Qt project files to XML with C++ or Java.
For example, the code should do this transformation:
TextInput {
id: textInput2
x: 247
y: 161
width: 80
height: 20
}
to get:
<TextInput>
    <id>textInput2</id>
    <x>247</x>
    <y>161</y>
    <width>80</width>
    <height>20</height>
</TextInput>
Do you have any ideas how to do this? What technology should I use to transform the Qt code to XML?
Edit: I tried a SAX XML parser, but it doesn't know how to read the Qt code.
Thanks.
There is probably already a library that makes this possible, but I don't know of one, so if you want to do it in code you can read the file as plain text and do the translation manually using a BufferedReader and a few loops.
Try this:
BufferedReader qtIn = new BufferedReader(new FileReader("example.qt")); // I don't know if you can read it as plain text straight away.
String tag;
String metaTag = "";
String lineIn;
String lineOut;
BufferedWriter writer = new BufferedWriter(new FileWriter("example.xml"));
// here you should use writer to write down the heading of the XML file.
while ((lineIn = qtIn.readLine()) != null) { // while loop begins here. lineIn is the string where the reader stores the current line.
    if (lineIn.trim().isEmpty()) { // skip blank lines so charAt below doesn't fail
        continue;
    }
    if (lineIn.charAt(lineIn.length() - 1) == '{') { // if the line's last character is an opening brace ({)
        metaTag = lineIn.substring(0, lineIn.length() - 1).trim(); // we store the name in metaTag
        lineOut = "<" + metaTag + ">\n"; // and write metaTag as an opening XML tag
        writer.write(lineOut, 0, lineOut.length());
    } else if (lineIn.trim().equals("}")) { // else, if it's a closing brace (})
        lineOut = "</" + metaTag + ">\n"; // we write metaTag as a closing XML tag
        writer.write(lineOut, 0, lineOut.length());
    } else { // if it's not an opening or closing brace
        String[] element = lineIn.split(":"); // we split the line into element name and element data using the colon as the splitter; don't forget to trim both parts.
        tag = element[0].trim(); // this is optional, you could use element[0].trim() directly in the next line; I added it just to make it clearer
        lineOut = "<" + tag + ">" + element[1].trim() + "</" + tag + ">\n"; // here we take the two parts and write them as an XML tag and its text content.
        writer.write(lineOut, 0, lineOut.length());
    }
} // end while
// here you should write the footer of the file, if there's any.
writer.close(); // don't forget to close the writer
qtIn.close(); // and the reader
I think I'm not missing anything. Don't forget to close your files. Also, I'm assuming that the structure of the Qt and XML files may vary and that there are no complex nested child nodes. If you know the structure of the Qt file beforehand it's a lot easier, and you can use the DOM API to write the XML, as sketched below.
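For the DOM route, here is a minimal sketch that builds the document in memory with the standard javax.xml APIs and serializes it with a Transformer. It assumes a single flat top-level block like the TextInput example above, and the method name and file handling are illustrative:
import java.io.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative helper: converts one flat QML-style block to an XML file.
static void qtToXml(File qtFile, File xmlFile) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element current = null;
    try (BufferedReader qtIn = new BufferedReader(new FileReader(qtFile))) {
        String line;
        while ((line = qtIn.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.endsWith("{")) {                     // e.g. "TextInput {"
                current = doc.createElement(line.substring(0, line.length() - 1).trim());
                doc.appendChild(current);                 // assumes a single top-level block
            } else if (line.equals("}")) {
                current = null;
            } else {                                      // e.g. "x: 247"
                String[] parts = line.split(":", 2);
                Element child = doc.createElement(parts[0].trim());
                child.setTextContent(parts[1].trim());
                current.appendChild(child);
            }
        }
    }
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.INDENT, "yes");
    t.transform(new DOMSource(doc), new StreamResult(xmlFile));
}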

StandardAnalyzer - Apache Lucene

I'm developing a system where you input some text files to a StandardAnalyzer, and the contents of each file are then replaced by the output of the StandardAnalyzer (which tokenizes them and removes all the stop words). The code I've developed so far is:
File f = new File(path);
TokenStream stream = analyzer.tokenStream("contents",
        new StringReader(readFileToString(f)));
CharTermAttribute charTermAttribute = stream.getAttribute(CharTermAttribute.class);
while (stream.incrementToken()) {
    String term = charTermAttribute.toString();
    System.out.print(term);
}
// Following is the readFileToString(File f) function
private String readFileToString(File f) throws FileNotFoundException {
    StringBuilder textBuilder = new StringBuilder();
    String ls = System.getProperty("line.separator");
    Scanner scanner = new Scanner(new FileInputStream(f));
    while (scanner.hasNextLine()) {
        textBuilder.append(scanner.nextLine() + ls);
    }
    scanner.close();
    return textBuilder.toString();
}
readFileToString(f) is a simple function which converts the file contents to a string representation.
The output I'm getting is the words with the spaces and newlines between them removed. Is there a way to preserve the original spaces and newline characters in the analyzer output, so that I can replace the original file contents with the filtered contents of the StandardAnalyzer and present them in a readable form?
Tokenizers save the term offsets, so in theory you could look at the offsets to determine how many characters there were between each pair of tokens, but they don't save the data which was between the tokens. So you could get back spaces, but not newlines.
If you're comfortable with JFlex you could modify the tokenizer to treat newlines as a token. That's probably harder than any gain you'd get from it though.
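To illustrate the offset idea from the first paragraph, here is a minimal sketch that rebuilds the filtered text and pads the gaps between tokens with spaces. It assumes a Lucene version where the TokenStream must be reset before use; padding with plain spaces is an assumption, since the original whitespace is not stored:
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

String original = readFileToString(f);
StringBuilder filtered = new StringBuilder();
try (TokenStream stream = analyzer.tokenStream("contents", new StringReader(original))) {
    CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
    OffsetAttribute offset = stream.addAttribute(OffsetAttribute.class);
    stream.reset();
    int lastEnd = 0;
    while (stream.incrementToken()) {
        for (int i = lastEnd; i < offset.startOffset(); i++) {
            filtered.append(' '); // one space per character that sat between the previous token and this one
        }
        filtered.append(term.toString());
        lastEnd = offset.endOffset();
    }
    stream.end();
}
System.out.println(filtered);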
