How to get element from ArrayList with xml data? - java

I would like to get the searchedProduct name from the ArrayList in another class, for example a new Product class. How do I do it? Everything else works correctly; I have just forgotten how to access searchedProduct from another class :(
public class XMLoader {
private final String XML_PATH = "src\\main\\java\\products.xml";
private List<SearchData> data = new ArrayList<SearchData>();
public XMLoader() throws ParserConfigurationException, IOException, SAXException {
File inputFile = new File(XML_PATH);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = builderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(inputFile);
NodeList nodeList = document.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
String id = node.getAttributes().getNamedItem("ID").getNodeValue();
String searchedProduct = element.getElementsByTagName("Category").item(0).getChildNodes().item(0).getNodeValue();
data.add(new SearchData(id, searchedProduct));
}
}
}
public class SearchData {
private String id;
private String searchedProduct;
public SearchData( String id, String searchedProduct) {
this.searchedProduct = searchedProduct;
this.id = id;
}
public String getSearchedProduct() {
return searchedProduct;
}
public String getId() {
return id;
}
@Override
public String toString() {
return "SearchData{" +
"id='" + id + '\'' +
", searchedProduct='" + searchedProduct + '\'' +
'}';
}
}

There are two ways I can immediately see (I leave it up to you to re-write it to modern Java):
public String getIdFromSearchData(String searchedProduct) {
// iterate over your list
for ( SearchData element : data ) {
// compare searchedProduct to the parameter
if ( searchedProduct.equals(element.getSearchedProduct()) ) {
return element.getId();
}
}
// in case nothing found
return null;
}
or, you override the equals method in SearchData to compare the searchedProduct (only)
public boolean equals(Object o) {
// let's assume all checks have been done
SearchData data = (SearchData)o;
return data.getSearchedProduct().equals(searchedProduct);
}
at which point you can immediately find the element from the list:
public String getIdFromSearchData(String searchedProduct) {
SearchData d = new SearchData(null, searchedProduct);
if ( data.contains(d) ) {
d = data.get(data.indexOf(d));
return d.getId();
}
return null;
}
A few remarks about this second option:
1. I would strongly recommend against it. One day you might have to compare on the id as well, and at that point your code will no longer work.
2. If you decide to do something like this second way anyway, don't forget to implement the hashCode() method as well.
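For the "modern Java" rewrite of the first option, a minimal sketch using streams might look like the following. The class name SearchDataLookup, the method name findId and the sample values are made up for illustration; SearchData is a cut-down stand-in for the class in the question. Returning an Optional instead of null is a design choice here, not something the original code requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class SearchDataLookup {

    // Minimal stand-in for the SearchData class from the question.
    static class SearchData {
        private final String id;
        private final String searchedProduct;

        SearchData(String id, String searchedProduct) {
            this.id = id;
            this.searchedProduct = searchedProduct;
        }

        String getId() { return id; }
        String getSearchedProduct() { return searchedProduct; }
    }

    // Returns the id of the first entry whose searchedProduct matches, if any.
    static Optional<String> findId(List<SearchData> data, String searchedProduct) {
        return data.stream()
                .filter(d -> searchedProduct.equals(d.getSearchedProduct()))
                .map(SearchData::getId)
                .findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from products.xml.
        List<SearchData> data = new ArrayList<>();
        data.add(new SearchData("1", "Laptop"));
        data.add(new SearchData("2", "Phone"));

        System.out.println(findId(data, "Phone").orElse("not found"));
        System.out.println(findId(data, "Tablet").orElse("not found"));
    }
}
```

To use this from another class, XMLoader would need to expose its list (for example through a getter) or offer a lookup method like this one itself.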


I need Freemarker documentation for data model when using #recurse with trees other than XML

I am trying to process a tree structure in Freemarker and would like to use the #recurse and #visit directives, but I can't find any good documentation on how to set up the data model. The only examples I can see are those that create a data model for an XML structure. I don't need it to be so detailed. My tree is very simple.
In trying to test the functionality I need, I built a unit test but when I run it I get
FreeMarker template error:
For "." left-hand operand: Expected a hash, but this has evaluated to a node
Here is the source code:
public class FreemarkerXmlTests {
static class Element implements TemplateNodeModel {
private final String name;
private final String text;
private Element parent;
private final List<Element> elements = new ArrayList<>();
public Element(String name) {
this(name, null);
}
public Element(String name, String text) {
this.name = name;
this.text = text;
}
public void add(Element element) {
element.parent = this;
this.elements.add(element);
}
public List<Element> getElements() {
return this.elements;
}
public String getName() {
return this.name;
}
public String getText() {
return this.text;
}
public String getTitle() {
return this.name;
}
public TemplateModel get(String key) {
return null;
}
@Override
public TemplateNodeModel getParentNode() throws TemplateModelException {
return this.parent;
}
@Override
public TemplateSequenceModel getChildNodes() throws TemplateModelException {
// TODO Auto-generated method stub
return new SimpleSequence(this.elements, cfg.getObjectWrapper());
}
@Override
public String getNodeName() throws TemplateModelException {
return this.name;
}
@Override
public String getNodeType() throws TemplateModelException {
return this.name;
}
@Override
public String getNodeNamespace() throws TemplateModelException {
return null;
}
}
private static Configuration cfg;
private static final String myTestTemplate = "<#recurse doc>\r\n" +
"\r\n" +
"<#macro book>\r\n" +
" Book element with title ${.node.title} \r\n" +
" <#recurse>\r\n" +
" End book\r\n" +
"</#macro>\r\n" +
"\r\n" +
"<#macro title>\r\n" +
" Title element\r\n" +
"</#macro>\r\n" +
"\r\n" +
"<#macro chapter>\r\n" +
" Chapter element with title: ${.node.title}\r\n" +
"</#macro>";
@BeforeClass
public static void classInit() throws IOException {
StringTemplateLoader stringTemplateLoader = new StringTemplateLoader();
stringTemplateLoader.putTemplate("myTestTemplate", myTestTemplate);
cfg = new Configuration(Configuration.VERSION_2_3_29);
cfg.setTemplateLoader(stringTemplateLoader);
cfg.setDefaultEncoding("UTF-8");
cfg.setTemplateExceptionHandler(TemplateExceptionHandler.RETHROW_HANDLER);
cfg.setLogTemplateExceptions(false);
cfg.setWrapUncheckedExceptions(true);
cfg.setFallbackOnNullLoopVariable(false);
}
@Test
public void basicXmlTest() throws TemplateException, IOException {
Element doc = new Element("doc");
Element book = new Element("book");
book.add(new Element("title", "Test Book"));
doc.add(book);
Element chapter1 = new Element("chapter");
chapter1.add(new Element("title", "Ch1"));
chapter1.add(new Element("para", "p1.1"));
chapter1.add(new Element("para", "p1.2"));
chapter1.add(new Element("para", "p1.3"));
book.add(chapter1);
Element chapter2 = new Element("chapter");
chapter2.add(new Element("title", "Ch2"));
chapter2.add(new Element("para", "p2.1"));
chapter2.add(new Element("para", "p2.2"));
chapter2.add(new Element("para", "p2.3"));
book.add(chapter2);
Map<String, Object> root = new HashMap<>();
// Put the document tree into the root
root.put("doc", doc);
Template temp = cfg.getTemplate("myTestTemplate");
Writer out = new OutputStreamWriter(System.out);
temp.process(root, out);
}
Any ideas?
Take a look at the freemarker.template.TemplateNodeModel interface. Your objects have to implement that, or they have to be wrapped (via the ObjectWrapper) into a TemplateModel that implements it. Then #recurse/#visit/?parent/?children/etc. will work with them.
Here's an example of implementing TemplateNodeModel for traversing JSON: https://github.com/freemarker/fmpp/blob/master/src/main/java/fmpp/models/JSONNode.java
Some templates where the above is used:
https://github.com/freemarker/fmpp/tree/master/src/test/resources/tests/dl_json/src
As for the . operator, you need to implement TemplateHashModel (or one of its sub-interfaces, like TemplateHashModelEx2).
With the help of the examples posted by ddekany, I got it working. I added implements TemplateHashModel to the Element class:
static class Element implements TemplateNodeModel, TemplateHashModel {
I also added this method to the Element class, and the unit test worked:
@Override
public TemplateModel get(String key) throws TemplateModelException {
switch (key) {
case "title":
case "name":
return cfg.getObjectWrapper().wrap(this.name);
default:
throw new TemplateModelException("unknown hash get: " + key);
}
}

want to parse the xml into java object

In my case it is not a full XML document; I only want to parse the part inside one XML tag.
<FILTERABLE>
<FILTER_ELEMENT ALIAS_NAME="roomnumber" JOINER="AND" LPAREN="false" OPERATOR="BEGINS" RPAREN="false" SEQNUM="1" VALUE="1001"/>
</FILTERABLE>
Please help me convert this into a Java object.
ByteArrayInputStream bis = new ByteArrayInputStream(filterStrValue.getBytes("UTF-8"));
Document document = EntityCollectionXMLUtil.DomfromXML(new InputSource(bis), false);
Element rootElement = document.getDocumentElement();
rootElement.getElementsByTagName("FILTERABLE")
I need one Java object, such as a hash map, containing the pairs below:
FILTER_ELEMENT ALIAS_NAME = "roomnumber"
JOINER="AND"
LPAREN="false"
OPERATOR="BEGINS"
RPAREN="false"
SEQNUM="1"
VALUE="1001"
The code below uses the standard JAXP DOM API (javax.xml.parsers and org.w3c.dom), which ships with the JDK, so no extra library is required. dom4j, an open-source Java library for parsing XML, would be an alternative, but it is not needed here.
A class for FILTER_ELEMENT:
public class Filter_Element {
private String ALIAS_NAME;
private String JOINER;
private Boolean LPAREN;
private String OPERATOR;
private Boolean RPAREN;
private int SEQNUM;
private int VALUE;
public String getALIAS_NAME() {
return ALIAS_NAME;
}
public void setALIAS_NAME(String aLIAS_NAME) {
ALIAS_NAME = aLIAS_NAME;
}
public String getJOINER() {
return JOINER;
}
public void setJOINER(String jOINER) {
JOINER = jOINER;
}
public Boolean getLPAREN() {
return LPAREN;
}
public void setLPAREN(Boolean lPAREN) {
LPAREN = lPAREN;
}
public String getOPERATOR() {
return OPERATOR;
}
public void setOPERATOR(String oPERATOR) {
OPERATOR = oPERATOR;
}
public Boolean getRPAREN() {
return RPAREN;
}
public void setRPAREN(Boolean rPAREN) {
RPAREN = rPAREN;
}
public int getSEQNUM() {
return SEQNUM;
}
public void setSEQNUM(int sEQNUM) {
SEQNUM = sEQNUM;
}
public int getVALUE() {
return VALUE;
}
public void setVALUE(int vALUE) {
VALUE = vALUE;
}
}
The attribute values of the XML element (FILTER_ELEMENT) are set on the filterElement object:
try {
File fXmlFile = new File("your_xml_file.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
NodeList nodeList= doc.getElementsByTagName("FILTER_ELEMENT");
Filter_Element filterElement;
for(int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
filterElement = new Filter_Element();
filterElement.setALIAS_NAME(node.getAttributes().getNamedItem("ALIAS_NAME").getNodeValue());
filterElement.setJOINER(node.getAttributes().getNamedItem("JOINER").getNodeValue());
filterElement.setLPAREN(Boolean.valueOf(node.getAttributes().getNamedItem("LPAREN").getNodeValue()));
filterElement.setOPERATOR(node.getAttributes().getNamedItem("OPERATOR").getNodeValue());
filterElement.setRPAREN(Boolean.valueOf(node.getAttributes().getNamedItem("RPAREN").getNodeValue()));
filterElement.setSEQNUM(Integer.valueOf(node.getAttributes().getNamedItem("SEQNUM").getNodeValue()));
filterElement.setVALUE(Integer.valueOf(node.getAttributes().getNamedItem("VALUE").getNodeValue()));
}
} catch (Exception e) {
e.printStackTrace();
}
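Since the question asks for the attributes as hash-map pairs, here is a minimal sketch that copies every attribute of the first FILTER_ELEMENT into a Map, parsing from a string rather than a file. The class name FilterElementToMap and the method name firstFilterAttributes are hypothetical; only JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class FilterElementToMap {

    // Parses the XML fragment and returns the attributes of the first
    // FILTER_ELEMENT as name -> value pairs (empty map if there is none).
    static Map<String, String> firstFilterAttributes(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> attributes = new LinkedHashMap<>();
        NodeList elements = doc.getElementsByTagName("FILTER_ELEMENT");
        if (elements.getLength() > 0) {
            NamedNodeMap attrs = elements.item(0).getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                attributes.put(attr.getNodeName(), attr.getNodeValue());
            }
        }
        return attributes;
    }

    public static void main(String[] args) throws Exception {
        // The fragment from the question, inlined as a string.
        String xml = "<FILTERABLE>"
                + "<FILTER_ELEMENT ALIAS_NAME=\"roomnumber\" JOINER=\"AND\" LPAREN=\"false\" "
                + "OPERATOR=\"BEGINS\" RPAREN=\"false\" SEQNUM=\"1\" VALUE=\"1001\"/>"
                + "</FILTERABLE>";
        System.out.println(firstFilterAttributes(xml));
    }
}
```

All values stay strings in this version; converting LPAREN, SEQNUM and the like to typed fields is what the Filter_Element class above does instead.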

jsoup to get div elements with classes

I am new to Jsoup parsing and I want to get the list of all the companies on this page: https://angel.co/companies?company_types[]=Startup
Now, a way to do this is actually to inspect the page with the div tags relevant to what I need.
However, when I call the method :
Document doc = Jsoup.connect("https://angel.co/companies?company_types[]=Startup").get();
System.out.println(doc.html());
Firstly, I cannot even find those DIV tags in my console HTML output (the ones which are supposed to give a list of the companies).
Secondly, even if I did find it, how can I find a certain Div element with class name :
div class=" dc59 frw44 _a _jm"
Pardon the jargon, I have no idea how to go through this.
The data are not embedded in the page but they are retrieved using subsequent API calls :
a POST https://angel.co/company_filters/search_data to get an ids array & a token named hexdigest
a GET https://angel.co/companies/startups to retrieve company data using the output from the previous request
The above is repeated for each page (thus a new token & a list of ids are needed for each page). This process can be seen using Chrome dev console in Network tabs.
The first POST request gives JSON output but the second request (GET) gives HTML data in a property of a JSON object.
The following extracts the company filter :
private static CompanyFilter getCompanyFilter(final String filter, final int page) throws IOException {
String response = Jsoup.connect("https://angel.co/company_filters/search_data")
.header("Content-Type", "application/x-www-form-urlencoded;charset=UTF-8")
.header("X-Requested-With", "XMLHttpRequest")
.data("filter_data[company_types][]=", filter)
.data("sort", "signal")
.data("page", String.valueOf(page))
.userAgent("Mozilla")
.ignoreContentType(true)
.post().body().text();
GsonBuilder gsonBuilder = new GsonBuilder();
Gson gson = gsonBuilder.create();
return gson.fromJson(response, CompanyFilter.class);
}
Then the following extracts companies :
private static List<Company> getCompanies(final CompanyFilter companyFilter) throws IOException {
List<Company> companies = new ArrayList<>();
URLConnection urlConn = new URL("https://angel.co/companies/startups?" + companyFilter.buildRequest()).openConnection();
urlConn.setRequestProperty("User-Agent", "Mozilla");
urlConn.connect();
BufferedReader reader = new BufferedReader(new InputStreamReader(urlConn.getInputStream(), "UTF-8"));
HtmlContainer htmlObj = new Gson().fromJson(reader, HtmlContainer.class);
Element doc = Jsoup.parse(htmlObj.getHtml());
Elements data = doc.select("div[data-_tn]");
if (data.size() > 0) {
for (int i = 2; i < data.size(); i++) {
companies.add(new Company(data.get(i).select("a").first().attr("title"),
data.get(i).select("a").first().attr("href"),
data.get(i).select("div.pitch").first().text()));
}
} else {
System.out.println("no data");
}
return companies;
}
The main function :
public static void main(String[] args) throws IOException {
int pageCount = 1;
List<Company> companies = new ArrayList<>();
for (int i = 0; i < 10; i++) {
System.out.println("get page n°" + pageCount);
CompanyFilter companyFilter = getCompanyFilter("Startup", pageCount);
pageCount++;
System.out.println("digest : " + companyFilter.getDigest());
System.out.println("count : " + companyFilter.getTotalCount());
System.out.println("array size : " + companyFilter.getIds().size());
System.out.println("page : " + companyFilter.getpage());
companies.addAll(getCompanies(companyFilter));
if (companies.size() == 0) {
break;
} else {
System.out.println("size : " + companies.size());
}
}
}
Company, CompanyFilter & HtmlContainer are model classes :
class CompanyFilter {
@SerializedName("ids")
private List<Integer> mIds;
@SerializedName("hexdigest")
private String mDigest;
@SerializedName("total")
private String mTotalCount;
@SerializedName("page")
private int mPage;
@SerializedName("sort")
private String mSort;
@SerializedName("new")
private boolean mNew;
public List<Integer> getIds() {
return mIds;
}
public String getDigest() {
return mDigest;
}
public String getTotalCount() {
return mTotalCount;
}
public int getpage() {
return mPage;
}
private String buildRequest() {
String out = "total=" + mTotalCount + "&";
out += "sort=" + mSort + "&";
out += "page=" + mPage + "&";
out += "new=" + mNew + "&";
for (int i = 0; i < mIds.size(); i++) {
out += "ids[]=" + mIds.get(i) + "&";
}
out += "hexdigest=" + mDigest + "&";
return out;
}
}
private static class Company {
private String mLink;
private String mName;
private String mDescription;
public Company(String name, String link, String description) {
mLink = link;
mName = name;
mDescription = description;
}
public String getLink() {
return mLink;
}
public String getName() {
return mName;
}
public String getDescription() {
return mDescription;
}
}
private static class HtmlContainer {
@SerializedName("html")
private String mHtml;
public String getHtml() {
return mHtml;
}
}

How to remove particular bean object from ArrayList in Java? [duplicate]

I want to remove a particular bean object from an ArrayList.
I am using the remove and removeAll methods to delete the element from the ArrayList, but the element is not removed.
For example, assume the code below:
ArrayList<SystemDetailData> systemDetails = new ArrayList<SystemDetailData>();
SystemDetailData data = new SystemDetailData();
data.setId("1");
data.setName("abc");
data.setHost("192.168.1.2");
systemDetails.add(data);
data = new SystemDetailData();
data.setId("2");
data.setName("asd");
data.setHost("192.168.1.45");
systemDetails.add(data);
System.out.println("Before remove : " + systemDetails);
ArrayList<SystemDetailData> systemDetail = new ArrayList<SystemDetailData>();
SystemDetailData data = new SystemDetailData();
data.setId("1");
data.setName("abc");
data.setHost("192.168.1.2");
systemDetail.add(data);
System.out.println("Old data :" + systemDetail);
//Remove object from arraylist - method1
systemDetails.removeAll(systemDetail);
//Remove object from arraylist - method2
systemDetails.removeAll(systemDetail.getId());
systemDetails.removeAll(systemDetail.getName());
systemDetails.removeAll(systemDetail.getHost());
System.out.println("After remove : "+systemDetails);
Bean Class :
public class SystemDetailData extends BusinessData {
/**
*
*/
private static final long serialVersionUID = 1L;
private static final String DOMAIN_NAME = "domainName";
private static final String HOST_NAME = "hostName";
private static final String USER_NAME = "userName";
private static final String PASSWORD = "password";
private static final String INDEX = "index";
private BigInteger index;
private String domainName;
private String hostName;
private String userName;
private String password;
public BigInteger getIndex() {
return (BigInteger) get (INDEX);
}
public void setIndex(BigInteger index) {
set (INDEX, index);
this.index = index;
}
public String getDomainName() {
return (String) get(DOMAIN_NAME).toString();
}
public void setDomainName(String domainName) {
set (DOMAIN_NAME, domainName);
this.domainName = domainName;
}
public String getHostName() {
return (String) get (HOST_NAME);
}
public void setHostName(String hostName) {
set (HOST_NAME, hostName);
this.hostName = hostName;
}
public String getUserName() {
return (String) get (USER_NAME);
}
public void setUserName(String userName) {
set (USER_NAME, userName);
this.userName = userName;
}
public String getPassword() {
return (String) get (PASSWORD);
}
public void setPassword(String password) {
set (PASSWORD, password);
this.password = password;
}
@Override
public String toString() {
return "SystemDetailData [index=" + index + ", domainName="
+ domainName + ", hostName=" + hostName + ", userName="
+ userName + ", password=" + password + "]";
}
@Override
public String getKeyValue() {
String value = "";
if (index != null) {
value = value + "INDEX =" + index + ";";
}
if (domainName != null) {
value = value + "DOMAIN_NAME =" + domainName + ";";
}
if (userName != null) {
value = value + "USER_NAME =" + userName + ";";
}
if (hostName != null) {
value = value + "HOST_NAME =" + hostName + ";";
}
if (password != null) {
value = value + "PASSWORD =" + password + ";";
}
return value;
}
}
I got below output :
Before remove : [SystemDetailData [index=1, Name=abc, host=192.168.1.2], SystemDetailData [index=2, Name=asd, host=192.168.1.45]]
Old data : [SystemDetailData [index=1, Name=abc, host=192.168.1.2]]
After remove : [SystemDetailData [index=1, Name=abc, host=192.168.1.2], SystemDetailData [index=2, Name=asd, host=192.168.1.45]]
I want below output :
After remove : [SystemDetailData [index=2, Name=asd, host=192.168.1.45]]
Your SystemDetailData class has to implement the equals and hashCode methods. To expand on that: when you remove an object from a collection, Java checks whether the collection contains an object that is equal to the one you want to remove, and it uses the equals method to do so. So you have to tell Java what "the same object" means for you: it can be the same name, the same id, or another property. This is why you have to implement equals (and hashCode).
Your SystemDetailData class needs to implement an equals method. When you call remove on the ArrayList, the code is doing something like:
ArrayList<SystemDetailData> items;
void remove(SystemDetailData itemToRemove) {
for ( int i = 0; i < items.size(); ++i ) {
if ( items.get(i).equals(itemToRemove) ) {
items.remove(i);
break;
}
}
}
So unless the equals method returns true for the item you are passing into the remove method and an item in your collection, nothing will be removed.
You need to decide exactly what the equals method should look like but if, for example, two items are the same if the ids are the same then you could just add a method to SystemDetailData like:
public boolean equals(Object other) {
SystemDetailData otherData = (SystemDetailData)other;
// ids are Strings here, so compare with equals rather than ==
return otherData.getId().equals(this.getId());
}
Obviously you'll need to add checks for null, the type of other etc. but that should give you an idea of what the method needs to look like.
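A fuller sketch with those checks in place, plus the matching hashCode, might look like this. SystemDetail is a simplified, hypothetical stand-in for SystemDetailData with only an id and a name, and equality is assumed to depend on the id alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class RemoveByEquality {

    // Simplified, hypothetical stand-in for SystemDetailData.
    static class SystemDetail {
        private final String id;
        private final String name;

        SystemDetail(String id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof SystemDetail)) return false; // also rejects null
            return Objects.equals(id, ((SystemDetail) other).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id); // must be consistent with equals
        }

        @Override
        public String toString() {
            return "SystemDetail[id=" + id + ", name=" + name + "]";
        }
    }

    public static void main(String[] args) {
        List<SystemDetail> systemDetails = new ArrayList<>();
        systemDetails.add(new SystemDetail("1", "abc"));
        systemDetails.add(new SystemDetail("2", "asd"));

        // A different instance with the same id as the first element.
        List<SystemDetail> toRemove = new ArrayList<>();
        toRemove.add(new SystemDetail("1", "abc"));

        systemDetails.removeAll(toRemove);
        System.out.println("After remove : " + systemDetails);
    }
}
```

With equals defined this way, removeAll matches the freshly created instance against the stored one by id, so only the element with id 2 remains.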

How to get only elements with values using StAX

I'm trying to get only the elements that have text. Example XML:
<root>
<Item>
<ItemID>4504216603</ItemID>
<ListingDetails>
<StartTime>10:00:10.000Z</StartTime>
<EndTime>10:00:30.000Z</EndTime>
<ViewItemURL>http://url</ViewItemURL>
....
</item>
It should print
Element Local Name:ItemID
Text:4504216603
Element Local Name:StartTime
Text:10:00:10.000Z
Element Local Name:EndTime
Text:10:00:30.000Z
Element Local Name:ViewItemURL
Text:http://url
This code also prints root, item etc. Is this even possible? It must be; I just can't find it on Google.
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream input = new FileInputStream(new File("src/main/resources/file.xml"));
XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);
while (xmlStreamReader.hasNext()) {
int event = xmlStreamReader.next();
if (event == XMLStreamConstants.START_ELEMENT) {
System.out.println("Element Local Name:" + xmlStreamReader.getLocalName());
}
if (event == XMLStreamConstants.CHARACTERS) {
if(!xmlStreamReader.getText().trim().equals("")){
System.out.println("Text:"+xmlStreamReader.getText().trim());
}
}
}
Edit: the incorrect behaviour is:
Element Local Name:root
Element Local Name:item
Element Local Name:ItemID
Text:4504216603
Element Local Name:ListingDetails
Element Local Name:StartTime
Text:10:00:10.000Z
Element Local Name:EndTime
Text:10:00:30.000Z
Element Local Name:ViewItemURL
Text:http://url
I don't want root and the other nodes that have no text to be printed, just the output I wrote above. Thank you.
Try this:
while (xmlStreamReader.hasNext()) {
int event = xmlStreamReader.next();
if (event == XMLStreamConstants.START_ELEMENT) {
try {
String text = xmlStreamReader.getElementText();
System.out.println("Element Local Name:" + xmlStreamReader.getLocalName());
System.out.println("Text:" + text);
} catch (XMLStreamException e) {
}
}
}
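One caveat with getElementText(): when an element contains child elements, the call throws only after the reader has already moved onto the first child's start tag, and the next loop iteration then steps past it, so that child can be missed. An alternative sketch that avoids this is to remember the most recently opened element, collect its character data, and report it at the end tag only if the text is non-blank. The class name TextElementsOnly and the method name textElements are hypothetical, and the sample XML is the fragment from the question with the closing tags filled in as an assumption, since the original is truncated.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TextElementsOnly {

    // Returns "name=text" for every element that directly contains non-blank text
    // and no child elements (leaf elements).
    static List<String> textElements(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        String currentName = null;                 // most recently opened element
        StringBuilder text = new StringBuilder();  // text collected since that start tag
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                currentName = reader.getLocalName();
                text.setLength(0);
            } else if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA) {
                text.append(reader.getText());     // text may arrive in several chunks
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                String trimmed = text.toString().trim();
                if (currentName != null && !trimmed.isEmpty()) {
                    result.add(currentName + "=" + trimmed);
                }
                currentName = null;                // container end tags report nothing
                text.setLength(0);
            }
        }
        reader.close();
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root>\n<Item>\n<ItemID>4504216603</ItemID>\n<ListingDetails>\n"
                + "<StartTime>10:00:10.000Z</StartTime>\n<EndTime>10:00:30.000Z</EndTime>\n"
                + "<ViewItemURL>http://url</ViewItemURL>\n</ListingDetails>\n</Item>\n</root>";
        for (String entry : textElements(xml)) {
            System.out.println(entry);
        }
    }
}
```

Text that follows a child element inside the same parent (mixed content) is ignored in this sketch, which seems acceptable for the flat data shown in the question.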
SAX based solution (works):
public class Test extends DefaultHandler {
public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException, XPathExpressionException, XMLStreamException {
SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
parser.parse(new File("src/file.xml"), new Test());
}
private String currentName;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
currentName = qName;
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
String string = new String(ch, start, length);
if (hasText(string)) {
System.out.println(currentName);
System.out.println(string);
}
}
private boolean hasText(String string) {
string = string.trim();
return string.length() > 0;
}
}
Stax solution :
Parse document
public void parseXML(InputStream xml) {
try {
DOMResult result = new DOMResult();
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLEventReader reader = xmlInputFactory.createXMLEventReader(new StreamSource(xml));
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
transformer.transform(new StAXSource(reader), result);
Document document = (Document) result.getNode();
NodeList startlist = document.getChildNodes();
processNodeList(startlist);
} catch (Exception e) {
System.err.println("Something went wrong, this might help :\n" + e.getMessage());
}
}
Now all nodes from the document are in a NodeList so do this next :
private void processNodeList(NodeList nodelist) {
for (int i = 0; i < nodelist.getLength(); i++) {
if (nodelist.item(i).getNodeType() == Node.ELEMENT_NODE && (hasValidAttributes(nodelist.item(i)) || hasValidText(nodelist.item(i)))) {
getNodeNamesAndValues(nodelist.item(i));
}
processNodeList(nodelist.item(i).getChildNodes());
}
}
Then for each element node with valid text get name and value
public void getNodeNamesAndValues(Node n) {
String nodeValue = null;
String nodeName = null;
if (hasValidText(n)) {
while (n != null && isWhiteSpace(n.getTextContent()) == true && StringUtils.isWhitespace(n.getTextContent()) && n.getNodeType() != Node.ELEMENT_NODE) {
n = n.getFirstChild();
}
nodeValue = StringUtils.strip(n.getTextContent());
nodeName = n.getLocalName();
System.out.println(nodeName + " " + nodeValue);
}
}
Bunch of useful methods to check nodes :
private static boolean hasValidAttributes(Node node) {
return (node.getAttributes().getLength() > 0);
}
private boolean hasValidText(Node node) {
String textValue = node.getTextContent();
return (textValue != null && !textValue.isEmpty() && isWhiteSpace(textValue) == false && !StringUtils.isWhitespace(textValue) && node.hasChildNodes());
}
private boolean isWhiteSpace(String nodeText) {
if (nodeText.startsWith("\r") || nodeText.startsWith("\t") || nodeText.startsWith("\n") || nodeText.startsWith(" "))
return true;
else
return false;
}
I also used StringUtils, you can get that by including this in your pom.xml if you're using maven :
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.5</version>
</dependency>
This is inefficient if you're reading huge files, since the whole document is loaded into a DOM tree, but less so if you split the files first. This is what I came up with (with the help of Google). There are surely better solutions; this one is mine, and I'm an amateur (for now).
