I have two different XML files containing the same products, but in different languages.
The problem is that I want to add the description from the French XML to the description in the Dutch one, but the French XML contains a lot more products.
This is what I tried, but it doesn't work because the French XML is bigger. (Also, I believe I made a mistake, since the products aren't in the same position either; they just share the same product code, named code in the Dutch XML and artikelnummer in the French XML.)
What should I do here?
doc = (Document) builder.build(xmlFile);
docfrans = (Document) builder.build(xmlFilefrans);
root = doc.getRootElement();
root.setName("productlist");
List<Element> elementje = root.getChildren();
rootfrans = docfrans.getRootElement();
List<Element> elementjefrans = rootfrans.getChildren();
for (int i = 0; i < elementjefrans.size(); i++) {
Element verwijderdelementfrans = elementjefrans.get(i);
Element verwijderdelement = elementje.get(i);
List<Element> lijstjefrans = verwijderdelementfrans.getChildren();
List<Element> lijstje = verwijderdelement.getChildren();
for (int j = 0; j < lijstjefrans.size(); j++) {
if ( verwijderdelementfrans.getChild("artikelnummer").getText().equals(verwijderdelement.getChild("code").getText()) ){
System.out.println("test");
verwijderdelement.getChild("description").setText(verwijderdelement.getChild("description").getText()+verwijderdelementfrans.getChild("omschrijving").getText());
}
}
}
I figured it out myself after a lot of searching. For those interested:
SAXBuilder builder = new SAXBuilder();
doc = (Document) builder.build(xmlFile);
docfrans = (Document) builder.build(xmlFilefrans);
root = doc.getRootElement();
root.setName("productlist");
List<Element> elementje = root.getChildren();
rootfrans = docfrans.getRootElement();
List<Element> elementjefrans = rootfrans.getChildren();
// Compare every French product with every Dutch product, because the two files
// are not in the same order and only share the product code.
for (int i = 0; i < elementjefrans.size(); i++) {
    Element verwijderdelementfrans = elementjefrans.get(i);
    for (int k = 0; k < elementje.size(); k++) {
        Element verwijderdelement = elementje.get(k);
        if (verwijderdelementfrans.getChild("artikelnummer").getText().equals(verwijderdelement.getChild("code").getText())) {
            // Append the French description after a blank line
            verwijderdelement.getChild("description").setText(verwijderdelement.getChild("description").getText() + "\n\n" + verwijderdelementfrans.getChild("omschrijving").getText());
        }
    }
}
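The nested loop above compares every French product with every Dutch product, so the work grows with the product of the two file sizes. A common alternative is to index the French descriptions in a HashMap keyed by artikelnummer first, then walk the Dutch products once and look each code up. The sketch below uses the JDK's built-in DOM API (org.w3c.dom) instead of JDOM so it runs without extra libraries; with JDOM the same idea works via getChildren() and getChildText(). It assumes each product is a direct child of the root (as in the loops above) and that codes are unique in the French file; the class name ProductMerger and the root/product element names in the usage test are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ProductMerger {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // First direct child element with the given name, or null if there is none.
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Appends each French omschrijving to the Dutch description with the same code. */
    static void merge(Document dutch, Document french) {
        // 1. One pass over the (bigger) French file: artikelnummer -> omschrijving.
        Map<String, String> frenchByCode = new HashMap<>();
        for (Node n = french.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
            Element code = firstChild((Element) n, "artikelnummer");
            Element text = firstChild((Element) n, "omschrijving");
            if (code != null && text != null) {
                frenchByCode.put(code.getTextContent().trim(), text.getTextContent());
            }
        }
        // 2. One pass over the Dutch file, looking each code up in the map.
        for (Node n = dutch.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            Element code = firstChild((Element) n, "code");
            Element description = firstChild((Element) n, "description");
            if (code == null || description == null) continue;
            String frenchText = frenchByCode.get(code.getTextContent().trim());
            if (frenchText != null) {
                description.setTextContent(description.getTextContent() + "\n\n" + frenchText);
            }
        }
    }
}
```

French products without a Dutch counterpart are simply never looked up, so the extra products in the French file do no harm.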
I was asked to write an app that downloads only the main table (the one with class report_table) from the URL https://www.ote-cr.cz/en/statistics/electricity-imbalances-1 and stores it in a separate HTML file.
I have managed to download the table's content, but I cannot get it styled the way I am asked to. Here is my code:
Document doc = Jsoup.connect(url).get();
System.out.println(doc);
Element tableElement = doc.select("table.table.report_table").first();
Elements tableHeaderElements = tableElement.select("thead tr th");
System.out.println("headers");
for (int i = 0; i < tableHeaderElements.size(); i++) {
System.out.println(tableHeaderElements.get(i).text());
writer.append(tableHeaderElements.get(i).text());
if (i != tableHeaderElements.size() - 1) {
writer.append(',');
}
}
writer.append('\n');
System.out.println();
Elements tableRowElements = tableElement.select(":not(thead) tr");
for (int i = 0; i < tableRowElements.size(); i++) {
Element row = tableRowElements.get(i);
System.out.println("row");
Elements rowItems = row.select("td");
for (int j = 0; j < rowItems.size(); j++) {
System.out.println(rowItems.get(j).text());
writer.append(rowItems.get(j).text());
if (j != rowItems.size() - 1) {
writer.append(' ');
}
}
writer.append('\n');
}
writer.close();
}
What shall I add to my code, in order to get a correctly styled table in a separate HTML?
This extracts the HTML table (without CSS) and saves it to a file:
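import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Parser {
    public void parseAndWrite() {
        try {
            // No stray leading space in the URL
            Document doc = Jsoup.connect("https://www.ote-cr.cz/en/statistics/electricity-imbalances-1").get();
            Element tableElement = doc.select("div.bigtable").first();
            if (tableElement == null) {
                return; // the page layout changed; nothing to write
            }
            // try-with-resources closes the writer even if writing fails
            try (PrintWriter writer = new PrintWriter(new File("out.html"))) {
                writer.write(tableElement.outerHtml());
            }
        } catch (IOException e) {
            e.printStackTrace(); // at least report the failure instead of swallowing it
        }
    }
}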
Hope this helps
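The snippet above writes only the table markup, so the saved file has none of the site's CSS. One possible approach (an assumption, not something the answer shows) is to also collect the page's stylesheet URLs, for example with jsoup's `doc.select("link[rel=stylesheet]")` and `attr("abs:href")` for absolute URLs, and wrap the table in a complete page that links them. The plain-Java helper below only does the wrapping; the names StandalonePage and wrap are hypothetical. Whether the site's rules still apply depends on their selectors matching the extracted markup: rules scoped to ancestors that were not copied will not match.

```java
import java.util.List;

class StandalonePage {
    // Minimal escaping for a value placed inside a double-quoted HTML attribute.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    /** Wraps an extracted table fragment in a full HTML page that links the given stylesheets. */
    static String wrap(String tableHtml, List<String> stylesheetUrls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n");
        for (String url : stylesheetUrls) {
            sb.append("<link rel=\"stylesheet\" href=\"").append(escapeAttr(url)).append("\">\n");
        }
        sb.append("</head>\n<body>\n").append(tableHtml).append("\n</body>\n</html>\n");
        return sb.toString();
    }
}
```

The returned string can then be written with the same PrintWriter instead of `tableElement.outerHtml()`.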
I can parse the XML when each inner tag is unique, but the problem comes when a parent tag contains two tags with the same name. How can I get both values? I receive the response as an XML string.
Here is my code:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(responseXML));
if (is != null) {
Document doc = db.parse(is);
String errorCode = "";
NodeList errorDetails = doc.getElementsByTagName("ERROR-LIST");
if (errorDetails != null) {
int length = errorDetails.getLength();
if (length > 0) {
for (int i = 0; i < length; i++) {
if (errorDetails.item(i).getNodeType() == Node.ELEMENT_NODE) {
Element el = (Element) errorDetails.item(i);
if (el.getNodeName().contains("ERROR-LIST")) {
NodeList errorCodes = el.getElementsByTagName("ERROR-CODE");
for (int j = 0; j < errorCodes.getLength(); j++) {
Node errorCode1 = errorCodes.item(j);
logger.info(errorCode1.getNodeValue());
}
}
}
}
} else {
isValidResponse = true;
}
}
}
The response I get from the server is:
<DATA><HEADER><RESPONSE-TYPE CODE = "0" DESCRIPTION = "Response Error" />
</HEADER><BODY><ERROR-LIST>
<ERROR-CODE>9000</ERROR-CODE>
<ERROR-CODE>1076</ERROR-CODE>
</ERROR-LIST></BODY></DATA>
I only get the 9000 error code. How can I get all the error codes under ERROR-LIST?
Any ideas would be greatly appreciated.
You are explicitly requesting the first element of the error list:
el.getElementsByTagName("ERROR-CODE").item(0).getTextContent();
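Loop over all the nodes that getElementsByTagName returns, and read each one with getTextContent(); getNodeValue() returns null for element nodes, so it never gives you the code itself.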
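NodeList errorCodes = el.getElementsByTagName("ERROR-CODE");
for (int j = 0; j < errorCodes.getLength(); j++) {
    // named "code" so it does not clash with the errorCode variable declared earlier
    String code = errorCodes.item(j).getTextContent();
    logger.info(code); // 9000, then 1076
}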
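For reference, here is a self-contained sketch that parses the response string from the question and collects every ERROR-CODE value in document order. It uses only the JDK's DOM API; the class name ErrorCodeReader is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ErrorCodeReader {
    /** Returns the text of every ERROR-CODE element in the response, in document order. */
    static List<String> readErrorCodes(String responseXml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(responseXml)));
        // Searching from the document finds ERROR-CODE elements at any depth.
        NodeList codes = doc.getElementsByTagName("ERROR-CODE");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < codes.getLength(); i++) {
            // getTextContent() returns the element's text; getNodeValue() would return null here.
            result.add(codes.item(i).getTextContent().trim());
        }
        return result;
    }
}
```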
Hi, I want to parse XML and display a list based on the user's selection.
My XML looks like this:
Below is my code:
try {
XMLParser parser = new XMLParser();
Document doc = parser.getDomElement(xml); // getting DOM element
NodeList n1 = doc.getElementsByTagName("company");
// looping through all item nodes <item>
for (int i = 0; i < n1.getLength(); i++) {
// creating new HashMap
Element e = (Element) n1.item(i);
System.out.println("name node "+parser.getValue(e, "name"));
}
This way I get the list of companies:
Company ABC
Company XYZ
But if I write code like this:
NodeList n1 = doc.getElementsByTagName("province");
// looping through all item nodes <item>
for (int i = 0; i < n1.getLength(); i++) {
// creating new HashMap
Element e = (Element) n1.item(i);
System.out.println("name node "+parser.getValue(e, "name"));
}
I get the list of all province names:
Alberta
Ontario
New York
Florida
But it should work like this: when I select Company ABC, only its two provinces should be displayed:
Alberta
Ontario
and not all of them. Can anybody help me rewrite my code?
This should do it:
XMLParser parser = new XMLParser();
Document doc = parser.getDomElement(xml); // getting DOM element
NodeList n1 = doc.getElementsByTagName("company");
// looping through all item nodes <item>
for (int i = 0; i < n1.getLength(); i++) {
Element e = (Element) n1.item(i);
System.out.println("name node " +parser.getValue(e, "name"));
NodeList children = e.getChildNodes();
for (int j = 0; j < children.getLength(); j++) {
Node child = children.item(j);
if (child.getNodeName().equalsIgnoreCase("province")) {
System.out.println("name node " + parser.getValue((Element)child, "name"));
}
}
}
Call Node.getChildNodes() on each "company" node, then pick out the province children by comparing node names. Example:
XMLParser parser = new XMLParser();
Document doc = parser.getDomElement(xml); // getting DOM element
NodeList n1 = doc.getElementsByTagName("company");
// looping through all item nodes <item>
for (int i = 0; i < n1.getLength(); i++) {
Node companyNode = n1.item(i);
NodeList childNodes = companyNode.getChildNodes();
// Here we're getting the child nodes of the company node.
// Only direct children are returned (name and province, plus whitespace text nodes if the XML is indented)
for (int j = 0; j < childNodes.getLength(); j++) {
Node childNode = childNodes.item(j);
if ("province".equalsIgnoreCase(childNode.getNodeName())) {
//Do something with province
}
}
}
Try the following code:
public class MainActivity extends Activity {
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
/** Create a new layout to display the view */
LinearLayout layout = new LinearLayout(this);
layout.setOrientation(LinearLayout.VERTICAL);
/** Create a new textview array to display the results */
TextView name[];
TextView website[];
TextView category[];
try {
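// Note: for apps targeting Android 3.0 (API 11) or later, opening a network stream on the
// main thread throws NetworkOnMainThreadException; do the download on a background thread.
URL url = new URL(
"http://xyz.com/aa.xml");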
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList = doc.getElementsByTagName("item");
/** Size the TextView arrays by the number of item nodes */
name = new TextView[nodeList.getLength()];
website = new TextView[nodeList.getLength()];
category = new TextView[nodeList.getLength()];
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
name[i] = new TextView(this);
website[i] = new TextView(this);
category[i] = new TextView(this);
Element fstElmnt = (Element) node;
NodeList nameList = fstElmnt.getElementsByTagName("name");
Element nameElement = (Element) nameList.item(0);
nameList = nameElement.getChildNodes();
name[i].setText("Name = "
+ ((Node) nameList.item(0)).getNodeValue());
NodeList websiteList = fstElmnt.getElementsByTagName("website");
Element websiteElement = (Element) websiteList.item(0);
websiteList = websiteElement.getChildNodes();
website[i].setText("Website = "
+ ((Node) websiteList.item(0)).getNodeValue());
category[i].setText("Website Category = "
+ websiteElement.getAttribute("category"));
layout.addView(name[i]);
layout.addView(website[i]);
layout.addView(category[i]);
}
} catch (Exception e) {
System.out.println("XML Parsing Exception = " + e);
}
/** Set the layout view to display */
setContentView(layout);
}
}
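Calling getElementsByTagName on the document object always returns all the nodes with the given tag name in the whole document. Instead, pick out the single company element you are interested in, and then call getElementsByTagName on it. Note that Document.getElementById only finds attributes declared as type ID (in a DTD or schema); otherwise it returns null, and you will have to loop over the company elements and compare their names instead. E.g.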
Element companyEl = doc.getElementById(desiredCompanyId);
if (companyEl != null) { // always good to check
NodeList n1 = companyEl.getElementsByTagName("province");
// your code here
}
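Since the question's XML did not survive, the sketch below assumes a structure consistent with the answers above: each `<company>` has a direct `<name>` child and several `<province>` children, each with its own `<name>`. It finds the selected company by comparing names (so no ID attributes are needed) and only then searches for provinces inside that company element. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ProvinceLookup {
    // Text of the first direct <name> child of an element, or null if it has none.
    static String nameOf(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals("name")) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    /** Returns the province names listed under the company whose <name> equals companyName. */
    static List<String> provincesOf(String xml, String companyName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        NodeList companies = doc.getElementsByTagName("company");
        for (int i = 0; i < companies.getLength(); i++) {
            Element company = (Element) companies.item(i);
            if (!companyName.equals(nameOf(company))) continue; // not the selected company
            // Searching from the company element, not the document, limits the results to its provinces.
            NodeList provinces = company.getElementsByTagName("province");
            for (int j = 0; j < provinces.getLength(); j++) {
                String province = nameOf((Element) provinces.item(j));
                if (province != null) result.add(province);
            }
        }
        return result;
    }
}
```

Because nameOf only looks at direct children, a company's own name is never confused with the names nested inside its provinces.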
<taxmann>
<docdetails>
<info id="104010000000007617" date="19780225">
<physicalpath>\\192.168.1.102\CMS\DATA</physicalpath>
<filepath isxml="N">
\NOTIFICATIONS\DIRECTTAXLAWS\HTMLFILES\150025021978.htm
</filepath>
<summary></summary>
<description></description>
<heading>
2187 [S.O.1500] | Section 35(1)(ii) of the Income-tax Act, 1961 - Scientific research expenditure - Approved scientific research associations/institutions
</heading>
<correspondingcitation/>
<hasfile>YES</hasfile>
<sortby>20120328160152743</sortby>
<parentid></parentid>
<parentchapterid></parentchapterid>
</info>
</docdetails>
</taxmann>
Code:
ArrayList<HashMap<String, String>> menuItems = new ArrayList<HashMap<String, String>>();
XMLParser parser = new XMLParser();
String xml = parser.getXmlFromUrl(URL); // getting XML
Document doc = parser.getDomElement(xml); // getting DOM element
NodeList nl = doc.getElementsByTagName(KEY_DOCDETAILS);
// looping through all item nodes <item>
for (int i = 0; i < nl.getLength(); i++) {
// creating new HashMap
HashMap<String, String> map = new HashMap<String, String>();
Element e = (Element) nl.item(i);
map.put(KEY_HEADING, parser.getValue(e, KEY_HEADING));
// adding HashList to ArrayList
menuItems.add(map);
}
This is my XML format. I want to parse it and display id, date and heading. I am able to display the heading, but not the date and id. Can you please tell me how to implement this? Above is my code that prints the heading; please modify it to print heading, id and date.
1. Declaration
String URL = "http://www.google.co.in/ig/api?news&hl=en";
String KEY_ITEM = "news";
String KEY_ID = "news_entry";
String KEY_NAME = "title";
String KEY_COST = "url";
String KEY_DESC = "snippet";
2. Parsing
XMLParser parser = new XMLParser();
String xml = parser.getXmlFromUrl(URL); // getting XML
Document doc = parser.getDomElement(xml); // getting DOM element
NodeList nl = doc.getElementsByTagName(KEY_ITEM);
for (int i = 0; i < nl.getLength(); i++) {
    Element e = (Element) nl.item(i);
    // Print the attributes of the <news> element itself
    printAttributes(e);
    NodeList entries = e.getElementsByTagName(KEY_ID);
    for (int j = 0; j < entries.getLength(); j++) {
        Element entry = (Element) entries.item(j);
        // For each news_entry, print the attributes of its title, url and snippet elements
        for (String tag : new String[] {KEY_NAME, KEY_COST, KEY_DESC}) {
            NodeList fields = entry.getElementsByTagName(tag);
            for (int k = 0; k < fields.getLength(); k++) {
                printAttributes((Element) fields.item(k));
            }
        }
    }
}

// Helper method (declare it in the same class): prints name=value for each attribute
static void printAttributes(Element element) {
    NamedNodeMap attributes = element.getAttributes();
    for (int a = 0; a < attributes.getLength(); a++) {
        Node attribute = attributes.item(a);
        System.out.println(attribute.getNodeName() + "=" + attribute.getNodeValue());
    }
}
NodeList nl = doc.getElementsByTagName("info");
for (int i = 0; i < nl.getLength(); i++) {
Element e = (Element) nl.item(i);
NamedNodeMap attributes = e.getAttributes();
for (int a = 0; a < attributes.getLength(); a++)
{
Node theAttribute = attributes.item(a);
System.out.println(theAttribute.getNodeName() + "=" + theAttribute.getNodeValue());
}
}
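In the taxmann XML, id and date are attributes of `<info>` rather than child elements, so they are read with Element.getAttribute, while heading is a child element whose text is read with getTextContent. The sketch below combines both into one map per `<info>`, mirroring the HashMap-per-item pattern of the question's code; it uses the JDK's DOM API directly, and the class name InfoReader and the literal map keys are hypothetical stand-ins for the question's KEY_ constants.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InfoReader {
    /** One map per <info> element with the keys "id", "date" and "heading". */
    static List<Map<String, String>> readInfo(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList infos = doc.getElementsByTagName("info");
        List<Map<String, String>> items = new ArrayList<>();
        for (int i = 0; i < infos.getLength(); i++) {
            Element info = (Element) infos.item(i);
            Map<String, String> map = new HashMap<>();
            // id and date are attributes; getAttribute returns "" if one is missing
            map.put("id", info.getAttribute("id"));
            map.put("date", info.getAttribute("date"));
            // heading is a child element; trim the line breaks around its text
            NodeList headings = info.getElementsByTagName("heading");
            map.put("heading", headings.getLength() > 0 ? headings.item(0).getTextContent().trim() : "");
            items.add(map);
        }
        return items;
    }
}
```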
I am trying to parse dblp.xml in Java to get the author names, title, year, etc., but since the file is huge (860MB), I cannot use DOM/SAX on the complete file.
So I split the file into multiple small files of around 100MB each.
Now each file contains thousands of nodes like this:
<dblp>
<inproceedings mdate="2011-06-23" key="conf/aime/BianchiD95">
<author>Nadia Bianchi</author>
<author>Claudia Diamantini</author>
<title>Integration of Neural Networks and Rule Based Systems in the Interpretation of Liver Biopsy Images.</title>
<pages>367-378</pages>
<year>1995</year>
<crossref>conf/aime/1995</crossref>
<booktitle>AIME</booktitle>
<url>db/conf/aime/aime1995.html#BianchiD95</url>
<ee>http://dx.doi.org/10.1007/3-540-60025-6_152</ee>
</inproceedings>
</dblp>
I assumed 100MB should be readable with DOM, but the code stops after roughly 45k lines. Here is the Java code I am using:
@SuppressWarnings({"unchecked", "null"})
public List<dblpModel> readConfigDOM(String configFile) {
List<dblpModel> items = new ArrayList<dblpModel>();
List<String> strList = null;
dblpModel item = null;
try {
File fXmlFile = new File(configFile);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("incollection");
for (int temp = 0; temp < nList.getLength(); temp++) {
item = new dblpModel();
strList = new ArrayList<String>();
Node nNode = nList.item(temp);
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
strList = getTagValueString("title", eElement);
System.out.println(strList.get(0).toString());
strList = getTagValueString("author", eElement);
System.out.println("Author : " + strList.size());
for(String s: strList) {
System.out.println(s);
}
}
items.add(item);
}
} catch (Exception e) {
e.printStackTrace();
}
return items;
}
private static String getTagValueString(String sTag, Element eElement) {
String temp = "";
StringBuffer concatTestSb = new StringBuffer();
List<String> strList = new ArrayList<String>();
int len = eElement.getElementsByTagName(sTag).getLength();
try {
for (int i = 0; i < len; i++) {
NodeList nl = eElement.getElementsByTagName(sTag).item(i).getChildNodes();
if (nl.getLength() > 1) {
for (int j = 0; j < nl.getLength(); j++) {
concatTestSb.append(nl.item(j).getTextContent());
}
} else {
temp = nl.item(0).getNodeValue();
concatTestSb.append(temp);
if (len > 1) {
concatTestSb.append("*");
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
return concatTestSb.toString();
}
Any help? I have also tried using the StAX API for parsing large documents, but that also
If your goal is just to get the details out, then simply use a BufferedReader to read the file as a text file. If you want, throw in some regex.
If using MySQL is an option, you may be able to let it do the heavy lifting through its XML functions.
Hope this helps.
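A minimal sketch of the BufferedReader-plus-regex idea, assuming (as in the sample from the question) that each `<author>`, `<title>` and `<year>` element starts and ends on one line. It reads one line at a time and hands each match to a callback, so it never holds the whole file in memory. Entity references such as `&uuml;` are left undecoded and inline markup inside a title stays as raw text; DblpLineScanner is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.BiConsumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DblpLineScanner {
    // One <author>, <title> or <year> element that opens and closes on the same line.
    private static final Pattern FIELD = Pattern.compile("<(author|title|year)>(.*?)</\\1>");

    /** Reads line by line and reports each (tag, value) pair to onField. */
    static void scan(Reader source, BiConsumer<String, String> onField) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = FIELD.matcher(line);
                while (m.find()) {
                    onField.accept(m.group(1), m.group(2));
                }
            }
        }
    }
}
```

The callback could write straight to a CSV file, which fits the export-to-CSV suggestion below.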
Don't fuss too much about the XML format. It is not terribly useful anyway. Just read it as a text file and parse the lines as strings. You can then export the data to a CSV and use it the way you want from that point.
Unfortunately, XML is not very efficient for large documents. I did something similar here for a research project:
http://qualityofdata.com/2011/03/27/dblp-for-sql-server/
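As an alternative to the text-based suggestions above, a streaming parse with the JDK's StAX API (javax.xml.stream) reads one event at a time and never builds the whole tree, so in principle the file would not need to be split. The sketch below collects author, title and year per record and hands each finished record to a callback. The list of record element names is an assumption worth checking against dblp.dtd, and the class names are hypothetical. Two caveats, hedged: the real dblp.xml uses character entities declared in dblp.dtd, so the parser must be able to locate that DTD (this sketch does not set that up), and the JDK's entity-expansion limit (system property jdk.xml.entityExpansionLimit) may need raising for the full file.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class DblpStaxReader {
    // Top-level dblp record elements (assumed list; check against dblp.dtd).
    static final Set<String> RECORD_TYPES = new HashSet<>(Arrays.asList(
            "article", "inproceedings", "proceedings", "book", "incollection",
            "phdthesis", "mastersthesis", "www"));

    static final class Record {
        final String type;
        final List<String> authors = new ArrayList<>();
        String title = "";
        String year = "";
        Record(String type) { this.type = type; }
    }

    /** Streams the input and passes each completed record to onRecord; no tree is kept. */
    static void read(Reader source, Consumer<Record> onRecord) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(source);
        try {
            Record current = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (RECORD_TYPES.contains(name)) {
                        current = new Record(name);
                    } else if (current != null && name.equals("author")) {
                        current.authors.add(collectText(r));
                    } else if (current != null && name.equals("title")) {
                        current.title = collectText(r);
                    } else if (current != null && name.equals("year")) {
                        current.year = collectText(r);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && current != null && current.type.equals(r.getLocalName())) {
                    onRecord.accept(current);
                    current = null;
                }
            }
        } finally {
            r.close();
        }
    }

    // Concatenates all text inside the current element, skipping nested tags such as <i>.
    // On return the reader is positioned on that element's END_ELEMENT event.
    private static String collectText(XMLStreamReader r) throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                sb.append(r.getText());
            }
        }
        return sb.toString();
    }
}
```

collectText is used instead of XMLStreamReader.getElementText() because the latter throws when an element contains child elements, and dblp titles can contain inline markup.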