I am trying to parse an XML file and remove namespaces and prefixes using the Woodstox parser (the XML contains nested elements, and every element declares a namespace at each level).
Below is the code I use to parse. I get the same output as the input I pass in. Please help me resolve the issue.
byte[] byteArray = null;
try {
File file = new File(xmlFileName);
byteArray = new byte[(int) file.length()];
byteArray = FileUtils.readFileToByteArray(file);
} catch (Exception e) {
e.printStackTrace();
}
InputStream articleStream = new ByteArrayInputStream(byteArray);
WstxInputFactory xmlInputFactory = (WstxInputFactory) XMLInputFactory.newInstance();
xmlInputFactory.configureForSpeed();
// xmlInputFactory.configureForXmlConformance();
XMLStreamReader2 xmlStreamReader = (XMLStreamReader2) xmlInputFactory.createXMLStreamReader(articleStream,
StandardCharsets.UTF_8.name());
xmlStreamReader.setProperty(XMLInputFactory.IS_COALESCING, true);
WstxOutputFactory xmloutFactory = (WstxOutputFactory) XMLOutputFactory2.newInstance();
StringWriter sw = new StringWriter();
XMLEventWriter xw = null;
XMLStreamWriter2 xmlwriter = (XMLStreamWriter2) xmloutFactory.createXMLStreamWriter(sw,
StandardCharsets.UTF_8.name());
xmlwriter.setNamespaceContext(new NamespaceContext() {
@Override
public String getNamespaceURI(String prefix) {
return "";
}
@Override
public String getPrefix(String namespaceURI) {
return "";
}
@Override
public Iterator getPrefixes(String namespaceURI) {
return null;
}
});
while (xmlStreamReader.hasNext()) {
xmlStreamReader.next();
xmlwriter.copyEventFromReader(xmlStreamReader, false);
}
System.out.println("str" + xmlwriter.getNamespaceContext().getPrefix(""));
xmlwriter.closeCompletely();
xmlwriter.flush();
xmlStreamReader.closeCompletely();
xmlStreamReader.close();
If you want to remove all namespace prefixes and bindings, you should NOT use the copy methods -- they will literally copy those things. Instead, read the element and attribute names, but write them out using only their "local names", leaving the namespace URI and prefix null (or use the writer methods that take only a local name).
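A minimal sketch of that approach (my own illustration, not the asker's code), using the plain StAX cursor API that Woodstox implements; element and attribute names are written with their local names only, so no prefixes or namespace declarations end up in the output:

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class StripNamespaces {
    public static void main(String[] args) throws Exception {
        String xml = "<a xmlns=\"uri:ns1\"><b xmlns:x=\"uri:ns2\" x:attr=\"1\">text</b></a>";
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    writer.writeStartElement(reader.getLocalName());       // local name only, no prefix/URI
                    for (int i = 0; i < reader.getAttributeCount(); i++) { // same for attributes
                        writer.writeAttribute(reader.getAttributeLocalName(i),
                                reader.getAttributeValue(i));
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    writer.writeCharacters(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    writer.writeEndElement();
                    break;
                default:
                    break; // comments, PIs, CDATA etc. would need the same kind of handling
            }
        }
        writer.flush();
        System.out.println(out); // <a><b attr="1">text</b></a>
    }
}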
I'm trying to create a Factur-X document using the Mustang library in a web service. This web service accepts an XML string and a Base64-encoded PDF.
My issue is that I have no "knowledge" of the PDF format that is sent to me. In my service layer class, I build my Factur-X using ZUGFeRDExporterFromA1.
@Override
public FacturxDto createFacturX(FacturxDto facturxDto) {
context.setContext(facturxDto);
if (facturxDto.getVersion() == null) {
facturxDto.setVersion(2);
}
if(facturxDto.getPdfDocument() == null) {
throw new AppServiceException("Pdf is required in the payload");
}
if(facturxDto.getXml() == null) {
throw new AppServiceException("Xml is required in the payload");
}
if ((facturxDto.getVersion() < 1) || (facturxDto.getVersion() > 2)) {
throw new AppServiceException("invalid version");
}
try {
Utils.facturxValidator(facturxDto);
} catch (SAXException | IOException e) {
throw new AppServiceException(e.getMessage());
}
ByteArrayOutputStream output = new ByteArrayOutputStream();
log.debug("Converting to PDF/A-3u");
PDFAConformanceLevel pdfaConformanceLevel = Utils.setPdfaConformanceLevel(facturxDto);
// System.out.println(Arrays.toString(facturxDto.getPdfDocument().getBytes(StandardCharsets.UTF_8)));
byte[] xmlData = facturxDto.getXml().getBytes(StandardCharsets.UTF_8);
byte[] pdfData = Base64.getDecoder().decode(facturxDto.getPdfDocument().getBytes(StandardCharsets.UTF_8));
try {
ZUGFeRDExporterFromA1 ze = new ZUGFeRDExporterFromA1()
.setProducer("Mustang LIB")
.setCreator("ME")
.setProfile(facturxDto.getFxLevel())
.setZUGFeRDVersion(facturxDto.getVersion())
.setConformanceLevel(pdfaConformanceLevel)
.ignorePDFAErrors()
.load(pdfData);
ze.attachFile("factur-x.xml", xmlData, "text/xml", "Data");
ze.setXML(xmlData);
log.debug("Attaching ZUGFeRD-Data");
ze.disableAutoClose(true);
ze.export(output);
byte[] bytes = output.toByteArray();
InputStream inputStream = new ByteArrayInputStream(bytes);
byte[] pdfBytes = IOUtils.toByteArray(inputStream);
String encoded = Base64.getEncoder().encodeToString(pdfBytes);
// persist data in db and generate id
ModelMapper modelMapper = new ModelMapper();
modelMapper.getConfiguration().setMatchingStrategy(MatchingStrategies.STRICT);
FacturxEntity facturxEntity = modelMapper.map(facturxDto, FacturxEntity.class);
facturxEntity.setStatus(RequestOperationStatus.SUCCESS.name());
facturxEntity.setCreatedAt(new Date());
facturxEntity.setFacturxId(Utils.generateId());
FacturxEntity storedFacturx = facturxRepository.save(facturxEntity);
FacturxDto returnValue = modelMapper.map(storedFacturx, FacturxDto.class);
returnValue.setPdfDocument(encoded);
return returnValue;
} catch (IOException e) {
e.printStackTrace();
throw new AppServiceException(e.getMessage());
}
}
My issue is here:
ZUGFeRDExporterFromA1 ze = new ZUGFeRDExporterFromA1()
.setProducer("Mustang LIB")
.setCreator("ME")
.setProfile(facturxDto.getFxLevel())
.setZUGFeRDVersion(facturxDto.getVersion())
.setConformanceLevel(pdfaConformanceLevel)
.ignorePDFAErrors()
.load(pdfData);
If I don't use ignorePDFAErrors(), an exception is thrown.
If I do use it, my resulting PDF is not PDF/A compliant, and that's a problem.
Is there a way to convert an invalid PDF/A to a valid one on the fly? Thanks
You can, for example, use Mustang's REST API, Mustang Server (https://www.mustangproject.org/server/), to correct and export any input PDF file as PDF/A; this also covers files that are already PDF/A-1.
kind regards
Jochen
Here is what I want to do.
This is my spend.csv file:
"Date","Description","Detail","Amount"
"5/03/21","Cinema","Batman","7.90"
"15/02/20","Groceries","Potatoes","23.00"
"9/12/21","DIY","Wood Plates","33.99"
"9/07/22","Fuel","Shell","$56.00"
"23/08/19","Lamborghini","Aventador","800,000.00"
And here is what I want as my output file, named spend.xml:
<?xml version="1.0" encoding="UTF-8"?>
<SPEND>
<RECORD DATE="5/03/21">
<DESC>Cinema</DESC>
<DETAIL>Batman</DETAIL>
<AMOUNT>7.90</AMOUNT>
</RECORD>
<RECORD DATE="15/02/20">
<DESC>Groceries</DESC>
<DETAIL>Potatoes</DETAIL>
<AMOUNT>23.00</AMOUNT>
</RECORD>
<RECORD DATE="9/12/21">
<DESC>DIY</DESC>
<DETAIL>Wood Plates</DETAIL>
<AMOUNT>33.99</AMOUNT>
</RECORD>
<RECORD DATE="9/07/22">
<DESC>Fuel</DESC>
<DETAIL>Shell</DETAIL>
<AMOUNT>$56.00</AMOUNT>
</RECORD>
<RECORD DATE="23/08/19">
<DESC>Lamborghini</DESC>
<DETAIL>Aventador</DETAIL>
<AMOUNT>800,000.00</AMOUNT>
</RECORD>
</SPEND>
In order to do that, I found some stuff here and there and managed to get this:
public class Main {
public static void main(String[] args) throws FileNotFoundException {
List<String> headers = new ArrayList<String>(5);
File file = new File("spend.csv");
BufferedReader reader = null;
try {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder domBuilder = domFactory.newDocumentBuilder();
Document newDoc = domBuilder.newDocument();
// Root element
Element rootElement = newDoc.createElement("XMLCreators");
newDoc.appendChild(rootElement);
reader = new BufferedReader(new FileReader(file));
int line = 0;
String text = null;
while ((text = reader.readLine()) != null) {
StringTokenizer st = new StringTokenizer(text, "", false);
int index = 0;
String[] rowValues = text.split(",");
if (line == 0) { // Header row
for (String col : rowValues) {
headers.add(col);
}
} else { // Data row
Element rowElement = newDoc.createElement("RECORDS");
rootElement.appendChild(rowElement);
for (int col = 0; col < headers.size(); col++) {
String header = headers.get(col);
String value = null;
if (col < rowValues.length) {
value = rowValues[col];
} else {
value = "";
}
Element curElement = newDoc.createElement(header);
curElement.appendChild(newDoc.createTextNode(value));
rowElement.appendChild(curElement);
}
}
line++;
}
ByteArrayOutputStream baos = null;
OutputStreamWriter osw = null;
try {
baos = new ByteArrayOutputStream();
osw = new OutputStreamWriter(baos);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
aTransformer.setOutputProperty(OutputKeys.INDENT, "yes");
aTransformer.setOutputProperty(OutputKeys.METHOD, "xml");
aTransformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
Source src = new DOMSource(newDoc);
Result result = new StreamResult(osw);
aTransformer.transform(src, result);
osw.flush();
System.out.println(new String(baos.toByteArray()));
} catch (Exception exp) {
exp.printStackTrace();
} finally {
try {
osw.close();
} catch (Exception e) {
}
try {
baos.close();
} catch (Exception e) {
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
At this point the program should print the XML to the terminal, but sadly, because of the double quotes around each value in my CSV file, I'm getting this error:
org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified
I think I'm missing something around these lines:
StringTokenizer st = new StringTokenizer(text, "", false);
int index = 0;
String[] rowValues = text.split(",");
I would like to keep the double quotes in my CSV; if anyone has an idea, please feel free to tell me!
Before you run your conversion, do a
String.replaceAll("\"", "####")
Then run the conversion and when it is complete, reverse it and replace all the "####" in the string with double quotes
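As a rough, hypothetical sketch of how that masking could be wired into the DOM-building loop from the question; note that neither a double quote nor '#' is a legal XML name character, so the marker is stripped from the header names and only reversed back to a quote inside the text values:

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class QuoteMasking {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        // Mask the quotes up front, as suggested above.
        String headerLine = "\"Date\",\"Description\",\"Detail\",\"Amount\"".replaceAll("\"", "####");
        String dataLine = "\"5/03/21\",\"Cinema\",\"Batman\",\"7.90\"".replaceAll("\"", "####");
        String[] headers = headerLine.split(",");
        String[] values = dataLine.split(",");

        Element record = doc.createElement("RECORD");
        doc.appendChild(record);
        for (int col = 0; col < headers.length; col++) {
            // The marker cannot stay in an element name, so strip it there...
            Element cell = doc.createElement(headers[col].replaceAll("####", ""));
            // ...and reverse it back to a double quote only in the text content.
            cell.appendChild(doc.createTextNode(values[col].replaceAll("####", "\"")));
            record.appendChild(cell);
        }
        // No INVALID_CHARACTER_ERR from createElement(); whether you restore the quotes
        // in the text values or drop them entirely is a matter of taste.
    }
}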
Another possible approach using OpenCsv and Jackson:
public class FileProcessor {
public static void main(String[] args) throws IOException {
List<DataStructure> importList = new CsvToBeanBuilder<DataStructure>(
new FileReader("pathIn"))
.withIgnoreEmptyLine(true)
.withType(DataStructure.class)
.build()
.parse();
ListLoader exportList = new ListLoader(importList);
XmlMapper xmlMapper = new XmlMapper();
xmlMapper.configure(ToXmlGenerator.Feature.WRITE_XML_DECLARATION, true)
.enable(SerializationFeature.INDENT_OUTPUT)
.writeValue(new File("pathOut"), exportList);
}
}
Class to serialize each element:
@Data
public class DataStructure {

    @CsvBindByName
    @JacksonXmlProperty(isAttribute = true, localName = "DATE")
    private String date;

    @CsvBindByName
    @JacksonXmlProperty(localName = "DESC")
    private String description;

    @CsvBindByName
    @JacksonXmlProperty(localName = "DETAIL")
    private String detail;

    @CsvBindByName
    @JacksonXmlProperty(localName = "AMOUNT")
    private String amount;
}
Class to serialize full list:
@JacksonXmlRootElement(localName = "SPEND")
public class ListLoader {

    @JacksonXmlElementWrapper(useWrapping = false)
    @JacksonXmlProperty(localName = "RECORD")
    private List<DataStructure> list;

    public ListLoader(List<DataStructure> list) {
        this.list = list;
    }
}
I want to convert an XML document to JSON and, after some processing, turn it back into valid XML that conforms to the DTD schema.
I have this method that returns a JSONObject:
public JSONObject xml2JSON(InputStream xml) throws IOException, JDOMException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
int len;
while ((len = xml.read(buffer)) > -1 ) {
baos.write(buffer, 0, len);
}
baos.flush();
InputStream is1 = new ByteArrayInputStream(baos.toByteArray());
InputStream is2 = new ByteArrayInputStream(baos.toByteArray());
String s = input2String(is1);
if(validationDTD(is2)) {
return XML.toJSONObject(s);
}
return null;
}
public Boolean validationDTD(InputStream xml) throws JDOMException, IOException {
try {
SAXBuilder builder = new SAXBuilder(XMLReaders.DTDVALIDATING);
Document validDocument = builder.build(xml);
validDocument.getDocType();
return true;
} catch (JDOMException e) {
return false;
} catch (IOException e) {
return false;
}
}
public String input2String(InputStream inputStream) throws IOException {
return IOUtils.toString(inputStream, Charset.defaultCharset());
}
And this method that returns the proper xml:
public String JSONtoXML(JSONObject jsonObject) {
String finalString = DOCTYPE.concat(XML.toString(jsonObject));
return finalString;
}
with a variable for adding the DTD:
private static final String DOCTYPE = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
"<!DOCTYPE ep-request SYSTEM \"myDtd.dtd\">";
I have these tests:
@Test
public void xml2JSONShouldReturnString() throws IOException, JDOMException {
InputStream xmlInputString = this.getClass().getClassLoader().getResourceAsStream("myXmlDtd.xml");
service.xml2JSON(xmlInputString);
}
@Test
public void validateDTDShouldReturnDocument() throws IOException, JDOMException {
InputStream xmlInputString = this.getClass().getClassLoader().getResourceAsStream("myXmlDtd.xml");
Assert.assertEquals(true, service.validationDTD(xmlInputString));
}
@Test
public void JSON2toxmlShouldReturnValidXML() throws IOException, JDOMException {
InputStream xmlInputString = this.getClass().getClassLoader().getResourceAsStream("myXmlDtd.xml");
JSONObject jsonObject = service.xml2JSON(xmlInputString);
String xmlOut = eblService.JSONtoXML(jsonObject);
Assert.assertEquals(true, service.validationDTD(new ByteArrayInputStream(xmlOut.getBytes())));
}
But the last one fails because the XML isn't in the format required by my DTD.
How can I produce a valid XML document that matches the DTD?
EDIT:
Now I'm parsing the XML to POJOs (generated with xjc -dtd mydtd.dtd) and the POJOs to JSON, and vice versa.
But I'm having trouble with the POJO-to-XML serialization, because my POJO contains:
@XmlElements({
    @XmlElement(name = "file-reference-id", required = true, type = FileReferenceId.class),
    @XmlElement(name = "request-petition", required = true, type = RequestPetition.class)
})
protected List<Object> fileRefenceIdOrRequestPetition;
The problem appears when my POJO contains a List of LinkedHashMap: JAXB complains that LinkedHashMap isn't known to the JAXBContext, but if I change the type in my class to LinkedHashMap.class, it loses the context for FileReferenceId.class (or whatever class is contained inside the LinkedHashMap).
There are many different libraries for converting JSON to XML and they all produce different answers, with different strengths and weaknesses. (For example, they all have different solutions to the problem of handling JSON keys that aren't valid XML names.) Generally they don't give you much control over the format of the XML, which means that you typically have to transform the generated XML to the format you actually want, e.g. with an XSLT stylesheet. That will certainly be the case if you have a specific DTD that the XML has to conform to.
Note: The json-to-xml() function in XSLT 3.0 produces XML that directly reflects the JSON grammar, with constructs like
<map>
<string key="first">John</string>
<string key="last">Smith</string>
</map>
The idea here is that users will always want to transform this into their desired target format, and since you're in XSLT already, this transformation poses no problems.
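For illustration, here is a hedged sketch of driving such an XSLT 3.0 transformation from Java through the standard JAXP API. It assumes Saxon-HE is on the classpath (for XSLT 3.0 and json-to-xml()) and that json-to-target.xsl is a hypothetical stylesheet you would write: it takes the JSON text as a parameter, calls json-to-xml($json), and rewrites the generic <map>/<string> output into the element structure your DTD expects:

import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class JsonToDtdXml {
    public static void main(String[] args) throws Exception {
        // Ask explicitly for Saxon so we get an XSLT 3.0 processor (assumes Saxon-HE on the classpath).
        TransformerFactory factory = TransformerFactory.newInstance(
                "net.sf.saxon.TransformerFactoryImpl", null);

        // json-to-target.xsl is hypothetical: it declares <xsl:param name="json"/>,
        // calls json-to-xml($json), and maps the result onto the DTD's vocabulary.
        Transformer transformer = factory.newTransformer(
                new StreamSource(new File("json-to-target.xsl")));
        transformer.setParameter("json", "{\"first\":\"John\",\"last\":\"Smith\"}");

        StringWriter out = new StringWriter();
        // The principal input is a placeholder; the stylesheet works from the $json parameter.
        transformer.transform(new StreamSource(new StringReader("<dummy/>")), new StreamResult(out));
        System.out.println(out);
    }
}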
I'm new to Tika. I'm trying to convert Microsoft Word documents to HTML using Tika, via the TikaOnDotNet wrapper so that I can use Tika on the .NET Framework. My conversion code looks like the following:
byte[] file = Files.toByteArray(new File(@"myPath\document.doc"));
AutoDetectParser tikaParser = new AutoDetectParser();
ByteArrayOutputStream output = new ByteArrayOutputStream();
SAXTransformerFactory factory = (SAXTransformerFactory)TransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
handler.setResult(new StreamResult(output));
ExpandedTitleContentHandler handler1 = new ExpandedTitleContentHandler(handler);
tikaParser.parse(new ByteArrayInputStream(file), handler1, new Metadata());
File ofile = new File(@"C:\toHtml\text.html");
ofile.createNewFile();
DataOutputStream stream = new DataOutputStream(new FileOutputStream(ofile));
output.writeTo(stream);
Everything works well except for the embedded images. The generated HTML contains image tags like:
<img src="embedded:image2.wmf" alt="image2.wmf"/>
but the image source does not exist. Please advise me.
Credit goes to @Gagravarr.
Please note that this is a simple implementation; the original code is available in the comments on the question.
This implementation is based on the TikaOnDotNet wrapper.
public class DocToHtml
{
private TikaConfig config = TikaConfig.getDefaultConfig();
public void Convert()
{
byte[] file = Files.toByteArray(new File(@"filename.doc"));
AutoDetectParser tikaParser = new AutoDetectParser();
ByteArrayOutputStream output = new ByteArrayOutputStream();
SAXTransformerFactory factory = (SAXTransformerFactory)TransformerFactory.newInstance();
var inputStream = new ByteArrayInputStream(file);
// ToHTMLContentHandler handler = new ToHTMLContentHandler();
var metaData = new Metadata();
EncodingDetector encodingDetector = new UniversalEncodingDetector();
var encode = encodingDetector.detect(inputStream, metaData) ?? new UTF_32();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, encode.toString());
handler.setResult(new StreamResult(output));
ContentHandler imageRewriting = new ImageRewritingContentHandler(handler);
// ExpandedTitleContentHandler handler1 = new ExpandedTitleContentHandler(handler);
ParseContext context = new ParseContext();
context.set(typeof(EmbeddedDocumentExtractor), new FileEmbeddedDocumentEtractor());
tikaParser.parse(inputStream, imageRewriting, new Metadata(), context);
byte[] array = output.toByteArray();
System.IO.File.WriteAllBytes(@"C:\toHtml\text.html", array);
}
private class ImageRewritingContentHandler : ContentHandlerDecorator
{
public ImageRewritingContentHandler(ContentHandler handler) : base(handler)
{
}
public override void startElement(string uri, string localName, string name, Attributes origAttrs)
{
if ("img".Equals(localName))
{
AttributesImpl attrs;
if (origAttrs is AttributesImpl)
attrs = (AttributesImpl)origAttrs;
else
attrs = new AttributesImpl(origAttrs);
for (int i = 0; i < attrs.getLength(); i++)
{
if ("src".Equals(attrs.getLocalName(i)))
{
String src = attrs.getValue(i);
if (src.StartsWith("embedded:"))
{
var newSrc = src.Replace("embedded:", @"images\");
attrs.setValue(i, newSrc);
}
}
}
attrs.addAttribute(null, "width", "width","width", "100px");
base.startElement(uri, localName, name, attrs);
}
else
base.startElement(uri, localName, name, origAttrs);
}
}
private class FileEmbeddedDocumentEtractor : EmbeddedDocumentExtractor
{
private int count = 0;
public bool shouldParseEmbedded(Metadata m)
{
return true;
}
public void parseEmbedded(InputStream inputStream, ContentHandler contentHandler, Metadata metadata, bool outputHtml)
{
Detector detector = new DefaultDetector();
string name = metadata.get("resourceName");
MediaType contentType = detector.detect(inputStream, metadata);
if (contentType.getType() != "image") return;
var embeddedFile = name;
File outputFile = new File(@"C:\toHtml\images", embeddedFile);
try
{
using (FileOutputStream os = new FileOutputStream(outputFile))
{
var tin = inputStream as TikaInputStream;
if (tin != null)
{
if (tin.getOpenContainer() != null && tin.getOpenContainer() is DirectoryEntry)
{
POIFSFileSystem fs = new POIFSFileSystem();
fs.writeFilesystem(os);
}
else
{
IOUtils.copy(inputStream, os);
}
}
}
}
catch (Exception ex)
{
throw;
}
}
}
}
Is there a simple Java way of "moving" all XML namespace declarations of an XML document to the root element? Due to a bug in the parser implementation of an unnamed huge company, I need to programmatically rewrite our well-formed and valid RPC requests so that the root element declares all used namespaces.
Not OK:
<document-element xmlns="uri:ns1">
<foo>
<bar xmlns="uri:ns2" xmlns:ns3="uri:ns3">
<ns3:foobar/>
<ns1:sigh xmlns:ns1="uri:ns1"/>
</bar>
</foo>
</document-element>
OK:
<document-element xmlns="uri:ns1" xmlns:ns1="uri:ns1" xmlns:ns2="uri:ns2" xmlns:ns3="uri:ns3">
<foo>
<ns2:bar>
<ns3:foobar/>
<ns1:sigh/>
</ns2:bar>
</foo>
</document-element>
Generic names for missing prefixes are acceptable. Default namespace may stay or be replaced/added as long as it is defined on the root element. I don't really mind which specific XML technology is used to achieve this (I would prefer to avoid DOM though).
To clarify, this answer refers to what I'd like to achieve as redeclaring, on the root element, all namespace declarations that are in scope within the root element (i.e. the entire document). Essentially, the related question asks why oh why anyone would implement what I now need to work around.
I wrote a two-pass StAX reader/writer, which is simple enough.
import java.io.*;
import java.util.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
public class NamespacesToRoot {
private static final String GENERATED_PREFIX = "pfx";
private final XMLInputFactory inputFact;
private final XMLOutputFactory outputFact;
private final XMLEventFactory eventFactory;
private NamespacesToRoot() {
inputFact = XMLInputFactory.newInstance();
outputFact = XMLOutputFactory.newInstance();
eventFactory = XMLEventFactory.newInstance();
}
public String transform(String xmlString) throws XMLStreamException {
Map<String, String> pfxToNs = new HashMap<String, String>();
XMLEventReader reader = null;
// first pass - analyze
try {
if (xmlString == null || xmlString.isEmpty()) {
throw new IllegalArgumentException("xmlString is null or empty");
}
StringReader stringReader = new StringReader(xmlString);
XMLStreamReader streamReader = inputFact.createXMLStreamReader(stringReader);
reader = inputFact.createXMLEventReader(streamReader);
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isStartElement()) {
buildNamespaces(event, pfxToNs);
}
}
System.out.println(pfxToNs);
} finally {
try {
if (reader != null) {
reader.close();
}
} catch (XMLStreamException ex) {
}
}
// reverse mapping, also gets rid of duplicates
Map<String, String> nsToPfx = new HashMap<String, String>();
for (Map.Entry<String, String> entry : pfxToNs.entrySet()) {
nsToPfx.put(entry.getValue(), entry.getKey());
}
List<Namespace> namespaces = new ArrayList<Namespace>(nsToPfx.size());
for (Map.Entry<String, String> entry : nsToPfx.entrySet()) {
namespaces.add(eventFactory.createNamespace(entry.getValue(), entry.getKey()));
}
// second pass - rewrite
XMLEventWriter writer = null;
try {
StringWriter stringWriter = new StringWriter();
writer = outputFact.createXMLEventWriter(stringWriter);
StringReader stringReader = new StringReader(xmlString);
XMLStreamReader streamReader = inputFact.createXMLStreamReader(stringReader);
reader = inputFact.createXMLEventReader(streamReader);
boolean rootElement = true;
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isStartElement()) {
StartElement origStartElement = event.asStartElement();
String prefix = nsToPfx.get(origStartElement.getName().getNamespaceURI());
String namespace = origStartElement.getName().getNamespaceURI();
String localName = origStartElement.getName().getLocalPart();
Iterator attributes = origStartElement.getAttributes();
Iterator namespaces_;
if (rootElement) {
namespaces_ = namespaces.iterator();
rootElement = false;
} else {
namespaces_ = null;
}
writer.add(eventFactory.createStartElement(
prefix, namespace, localName, attributes, namespaces_));
} else {
writer.add(event);
}
}
writer.flush();
return stringWriter.toString();
} finally {
try {
if (reader != null) {
reader.close();
}
} catch (XMLStreamException ex) {
}
try {
if (writer != null) {
writer.close();
}
} catch (XMLStreamException ex) {
}
}
}
private void buildNamespaces(XMLEvent event, Map<String, String> pfxToNs) {
System.out.println("el: " + event);
StartElement startElement = event.asStartElement();
Iterator nsIternator = startElement.getNamespaces();
while (nsIternator.hasNext()) {
Namespace nsAttr = (Namespace) nsIternator.next();
if (nsAttr.isDefaultNamespaceDeclaration()) {
System.out.println("need to generate a prefix for " + nsAttr.getNamespaceURI());
generatePrefix(nsAttr.getNamespaceURI(), pfxToNs);
} else {
System.out.println("add prefix binding for " + nsAttr.getPrefix() + " --> " + nsAttr.getNamespaceURI());
addPrefix(nsAttr.getPrefix(), nsAttr.getNamespaceURI(), pfxToNs);
}
}
}
private void generatePrefix(String namespace, Map<String, String> pfxToNs) {
int i = 1;
String prefix = GENERATED_PREFIX + i;
while (pfxToNs.keySet().contains(prefix)) {
i++;
prefix = GENERATED_PREFIX + i;
}
pfxToNs.put(prefix, namespace);
}
private void addPrefix(String prefix, String namespace, Map<String, String> pfxToNs) {
String existingNs = pfxToNs.get(prefix);
if (existingNs != null) {
if (existingNs.equals(namespace)) {
// nothing to do
} else {
// prefix clash, need to rename this prefix or reuse an existing
// one
if (pfxToNs.values().contains(namespace)) {
// reuse matching prefix
} else {
// rename
generatePrefix(namespace, pfxToNs);
}
}
} else {
// need to add this prefix
pfxToNs.put(prefix, namespace);
}
}
public static void main(String[] args) throws XMLStreamException {
String xmlString = "" +
"<document-element xmlns=\"uri:ns1\" attr=\"1\">\n" +
" <foo>\n" +
" <bar xmlns=\"uri:ns2\" xmlns:ns3=\"uri:ns3\">\n" +
" <ns3:foobar ns3:attr1=\"meh\" />\n" +
" <ns1:sigh xmlns:ns1=\"uri:ns1\"/>\n" +
" </bar>\n" +
" </foo>\n" +
"</document-element>";
System.out.println(xmlString);
NamespacesToRoot transformer = new NamespacesToRoot();
System.out.println(transformer.transform(xmlString));
}
}
Note that this is just fast example code which could use some tweaks, but is also a good start for anyone with a similar problem.
Below is a simple app that does the namespace re-declaration, based on XPath and VTD-XML.
import com.ximpleware.*;
import java.io.*;
public class moveNSDeclaration {
public static void main(String[] args) throws IOException, VTDException{
// TODO Auto-generated method stub
VTDGen vg = new VTDGen();
String xml="<document-element xmlns=\"uri:ns1\">\n"+
"<foo>\n"+
"<bar xmlns=\"uri:ns2\" xmlns:ns3=\"uri:ns3\">\n"+
"<ns3:foobar/>\n"+
"<ns1:sigh xmlns:ns1=\"uri:ns1\"/>\n"+
"</bar>\n"+
"</foo>\n"+
"</document-element>\n";
vg.setDoc(xml.getBytes());
vg.parse(false); // namespace unaware to all name space nodes addressable using xpath #*
VTDNav vn = vg.getNav();
XMLModifier xm = new XMLModifier(vn);
FastIntBuffer fib = new FastIntBuffer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// get the index value of xmlns declaration of root element
AutoPilot ap =new AutoPilot (vn);
ap.selectXPath("//#*");
int i=0;
//remove all ns nodes under the root element
//save those nodes to be re-inserted into the root element upon verification of uniqueness
while((i=ap.evalXPath())!=-1){
if (vn.getTokenType(i)==VTDNav.TOKEN_ATTR_NS){
xm.remove(); //remove all ns node
fib.append(i);
}
}
//remove redundant ns nodes
for (int j=0;j<fib.size();j++){
if (fib.intAt(j)!=-1){
for (i=j+1;i<fib.size();i++){
if (fib.intAt(i)!=-1)
if (vn.compareTokens(fib.intAt(j), vn, fib.intAt(i))==0){
fib.modifyEntry(i, -1);
}
}
}
}
// compose a string to insert back into the root element containing all subordinate ns nodes
for (int j=0;j<fib.size();j++){
if (fib.intAt(j)!=-1){
int os = vn.getTokenOffset(fib.intAt(j));
int len = vn.getTokenOffset(fib.intAt(j)+1)+vn.getTokenLength(fib.intAt(j)+1)+1-os;
//System.out.println(" os len "+ os + " "+len);
//System.out.println(vn.toString(os,len));
baos.write(" ".getBytes());
baos.write(vn.getXML().getBytes(),os,len);
}
}
byte[] attrBytes = baos.toByteArray();
vn.toElement(VTDNav.ROOT);
xm.insertAttribute(attrBytes);
//System.out.println(baos.toString());
baos.reset();
xm.output(baos);
System.out.println(baos.toString());
}
}
The output looks like this:
<document-element xmlns="uri:ns2" xmlns:ns3="uri:ns3" xmlns:ns1="uri:ns1" >
<foo>
<bar >
<ns3:foobar/>
<ns1:sigh />
</bar>
</foo>
</document-element>