SAXParseException error in XML Parsing - java

Using a web service call, I got the following response from the server. Now I need to parse this response, extract all the field values, and store them in String variables.
<?xml version="1.0" encoding="utf-8"?>
<ecomexpress-objects version="1.0"><object pk="1" model="awb">
<field type="BigIntegerField" name="awb_number">102019265</field>
<field type="CharField" name="orderid">8008444</field>
<field type="FloatField" name="actual_weight">2</field>
<field type="CharField" name="origin">DELHI-DSW</field>
<field type="CharField" name="destination">Mumbai - BOW</field>
<field type="CharField" name="current_location_name">Mumbai - BOW</field>
<field type="CharField" name="current_location_code">BOW</field>
<field type="CharField" name="customer">Ecom Express Private Limited - 32012</field>
<field type="CharField" name="consignee">BEECHAND VERMA</field>
<field type="CharField" name="pickupdate">22-Jan-2014</field>
<field type="CharField" name="status">Undelivered</field>
<field type="CharField" name="tracking_status">Undelivered</field>
<field type="CharField" name="reason_code">221 - Consignee Refused To Accept</field>
<field type="CharField" name="reason_code_description">Consignee Refused To Accept</field>
<field type="CharField" name="reason_code_number">221 </field>
<field type="CharField" name="receiver"></field>
<field type="CharField" name="expected_date" >15-Feb-2014</field>
<field type="CharField" name="last_update_date" ></field>
<field type="CharField" name="delivery_date" ></field>
<field type="CharField" name="ref_awb" >703063993</field>
<field type="CharField" name="rts_shipment" >0</field>
<field type="CharField" name="system_delivery_update" ></field>
<field type="CharField" name="rts_system_delivery_status" Undelivered</field>
<field type="CharField" name="rts_reason_code_number">777</field>
<field type="CharField" name="rts_last_update">22 Jan, 2014, 12:44 </field>
<field type="CharField" name="pincode" >400037</field>
<field type="CharField" name="city" >MUMBAI</field>
<field type="CharField" name="state" >Maharashtra</field>
<field name="scans">
</ecomexpress-objects>
If I try the following code to parse it:
String xml=result.toString();
try{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is;
is = new InputSource(new StringReader(xml));
Document doc = db.parse(is);
NodeList nodelist = doc.getChildNodes();
}
catch (SAXException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
I get the following error:
org.xml.sax.SAXParseException: expected: /field read: ecomexpress-objects (position:END_TAG </ecomexpress-objects>@1:1917 in java.io.StringReader@39978dff)
I need to store all the field values in their respective String variables.

There are several errors in your XML:
The <object> element has no closing tag.
"Undelivered" has ended up inside the start tag, because the ">" after the attribute is missing: <field type="CharField" name="rts_system_delivery_status" Undelivered</field>
<field name="scans"> is never closed; it should be written as <field name="scans"/>.
As long as the XML is invalid, you will keep getting exceptions.
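Once the response is well-formed, collecting the field values is straightforward. Below is a minimal sketch, assuming a shortened and corrected version of the response. The class name FieldExtractor and the choice of a Map keyed by the name attribute are illustrative, not something from the question.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FieldExtractor {

    // Parses the XML and returns field name -> trimmed text, in document order.
    static Map<String, String> extractFields(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            NodeList fields = doc.getElementsByTagName("field");
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                values.put(field.getAttribute("name"), field.getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            // Malformed XML ends up here, rethrown as an unchecked exception.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, corrected response (assumption): every element is closed.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + "<ecomexpress-objects version=\"1.0\"><object pk=\"1\" model=\"awb\">"
            + "<field type=\"BigIntegerField\" name=\"awb_number\">102019265</field>"
            + "<field type=\"CharField\" name=\"rts_system_delivery_status\">Undelivered</field>"
            + "<field name=\"scans\"/>"
            + "</object></ecomexpress-objects>";
        Map<String, String> values = extractFields(xml);
        String awbNumber = values.get("awb_number");
        String rtsStatus = values.get("rts_system_delivery_status");
        System.out.println(awbNumber + " / " + rtsStatus);
    }
}
```

A map avoids declaring one variable per field; individual values can still be copied into String variables as shown in main.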

Your XML is not valid.
The tag <field name="scans"> is not closed.

Related

Solr return only parent document on child query match

I have a set of documents indexed which has a pseudo parent-child relationship. Each child document had a reference to the parent document. Due to some availability complexity, these documents are not being indexed to support block-join, i.e. instead of a nested structure, they are all flat. Here's an example:
<doc>
<field name="id">1</field>
<field name="title">Parent title</field>
<field name="doc_id">123</field>
</doc>
<doc>
<field name="id">2</field>
<field name="title">Child title1</field>
<field name="parent_doc_id">123</field>
</doc>
<doc>
<field name="id">3</field>
<field name="title">Child title2</field>
<field name="parent_doc_id">123</field>
</doc>
<doc>
<field name="id">4</field>
<field name="title">Misc title2</field>
</doc>
What I'm looking for: if I search for "title2", the result should bring back the following two docs, one returned as the parent of a matching child and one based on a regular match.
<doc>
<field name="id">1</field>
<field name="title">Parent title</field>
<field name="doc_id">123</field>
</doc>
<doc>
<field name="id">4</field>
<field name="title">Misc title2</field>
</doc>
With block-join support, I could have used Block Join Parent Query Parser,
q={!parent which="content_type:parentDocument"}title:title2
Transforming the result documents is an alternative, but ChildDocTransformerFactory only supports the reverse direction.
I'm just wondering whether there's a different way to approach this query. Any pointers will be appreciated.
If you use Solr 6, you might be able to expand the results to include the parent records using the Graph query parser.

Extract specific redundant tag from xml in java

I would like to extract the content of the "body" field from an XML file:
<row>
<field name="id">28479</field>
<field name="commit_id">53162</field>
<field name="user_id">16</field>
<field name="body">test test test</field>
<field name="line" xsi:nil="true" />
<field name="position" xsi:nil="true" />
<field name="comment_id">390328</field>
<field name="ext_ref_id">524dd257bd3543ae270027f6</field>
<field name="created_at">2011-05-19 01:37:02</field>
</row>
<row>
....
</row>
...
The output that I'm looking for is
test test test
How could I do that using Java code?
You should consider using the XPath API which will allow you to query the XML content, for example...
Based on...
<?xml version="1.0" encoding="UTF-8"?>
<row>
<field name="id">28479</field>
<field name="commit_id">53162</field>
<field name="user_id">16</field>
<field name="body">test test test</field>
<field name="line" xsi:nil="true" />
<field name="position" xsi:nil="true" />
<field name="comment_id">390328</field>
<field name="ext_ref_id">524dd257bd3543ae270027f6</field>
<field name="created_at">2011-05-19 01:37:02</field>
</row>
And using...
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.parse(new File("Test.xml"));
XPath xpath = XPathFactory.newInstance().newXPath();
// Find the <field> element whose name attribute is "body"...
XPathExpression thingExpr = xpath.compile("/row/field[@name='body']");
Node body = (Node) thingExpr.evaluate(dom, XPathConstants.NODE);
if (body != null) {
System.out.println(body.getTextContent());
}
} catch (Exception exp) {
exp.printStackTrace();
}
Outputs
test test test
For more details, take a look at:
Java API for XML Processing (JAXP)
How XPath Works
XPath Tutorial
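The question shows several <row> elements, so here is a sketch of collecting every body value with the same XPath approach. It assumes the rows sit under a single root element, called <rows> here purely for illustration since the question does not show the root. The class name BodyExtractor is also made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BodyExtractor {

    // Returns the text of every <field name="body"> that is a child of a <row>.
    static List<String> extractBodies(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // NODESET asks for all matches instead of only the first one.
            NodeList nodes = (NodeList) xpath.evaluate(
                "//row/field[@name='body']", dom, XPathConstants.NODESET);
            List<String> bodies = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                bodies.add(nodes.item(i).getTextContent());
            }
            return bodies;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: <rows> is an assumed wrapper element.
        String xml = "<rows>"
            + "<row><field name=\"id\">28479</field><field name=\"body\">test test test</field></row>"
            + "<row><field name=\"id\">28480</field><field name=\"body\">another comment</field></row>"
            + "</rows>";
        for (String body : extractBodies(xml)) {
            System.out.println(body);
        }
    }
}
```

The only changes from the single-node version are the `//row` path, which matches rows at any depth, and XPathConstants.NODESET in place of XPathConstants.NODE.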

Apache Solr query syntax explanation

I'm confused by the syntax of the q parameter:
If I write q=*:*, I see 2 results.
If I omit q, I don't see anything.
If I write q=price:*, I see 2 results.
If I write q=price, I get 0 results.
Update:
q=price:0 gives 1 result.
Can you explain the differences between these queries?
In particular, I want to understand what the 4th variant means.
Indexed documents:
<add><doc>
<field name="id">3007WFP</field>
<field name="name">Dell Widescreen UltraSharp 3007WFP</field>
<field name="manu">Dell, Inc.</field>
<!-- Join -->
<field name="manu_id_s">dell</field>
<field name="cat">electronics</field>
<field name="cat">monitor</field>
<field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast</field>
<field name="includes">USB cable</field>
<field name="weight">401.6</field>
<field name="price">2199</field>
<field name="popularity">6</field>
<field name="inStock">true</field>
<!-- Buffalo store -->
<field name="store">43.17614,-90.57341</field>
<field name="cat">XXX</field>
</doc></add>
<add>
<doc>
<field name="id">SOLR1000</field>
<field name="name">Solr, the Enterprise Search Server</field>
<field name="manu">Apache Software Foundation</field>
<field name="cat">software</field>
<field name="cat">search</field>
<field name="cat">XXX</field>
<field name="features">Advanced Full-Text Search Capabilities using Lucene</field>
<field name="features">Optimized for High Volume Web Traffic</field>
<field name="features">Standards Based Open Interfaces - XML and HTTP</field>
<field name="features">Comprehensive HTML Administration Interfaces</field>
<field name="features">Scalability - Efficient Replication to other Solr Search Servers</field>
<field name="features">Flexible and Adaptable with XML configuration and Schema</field>
<field name="features">Good unicode support: héllo (hello with an accent over the e)</field>
<field name="price">0</field>
<field name="popularity">10</field>
<field name="inStock">true</field>
<field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
</doc>
</add>
If you do not specify a field name, Solr searches the default search field. So your fourth query,
q=price, searches the default search field for the term "price".
That's why you are getting 0 results: presumably no document contains the term "price" in that field.

xml to java object using castor

What can I do to ignore the <envelope> and <body> tags during unmarshalling with Castor?
XML example:
<?xml version="1.0" encoding="UTF-8"?>
<envelope>
<header>
<message>consultaTelefonosVigentesSocios</message>
</header>
<body>
<datosTelefonosVigentesSocios>
<listaTelefonosVigentesSocios>
<nroInterlocutor>2000393451672</nroInterlocutor>
<nroContrato>S6125345450573001</nroContrato>
<nroTelefono>011-4454451-8293</nroTelefono>
<tipoTelefono>T</tipoTelefono>
<claseDireccion>Z001</claseDireccion>
<descClaseDireccion>Correspondencia</descClaseDireccion>
<marcaEstandar>X</marcaEstandar>
<nroInterlocutorAsociadoDomicilio>200053945351672</nroInterlocutorAsociadoDomicilio>
</listaTelefonosVigentesSocios>
<listaTelefonosVigentesSocios>
<nroInterlocutor>200053435391672</nroInterlocutor>
<nroContrato>S612535430573001</nroContrato>
<nroTelefono>011-44453551-8299</nroTelefono>
<tipoTelefono>T</tipoTelefono>
<claseDireccion>Z001</claseDireccion>
<descClaseDireccion>Correspondencia</descClaseDireccion>
<marcaEstandar/>
<nroInterlocutorAsociadoDomicilio>20005543391672</nroInterlocutorAsociadoDomicilio>
</listaTelefonosVigentesSocios>
</datosTelefonosVigentesSocios>
</body>
<fault>
<faultactor>servicios.page:consultaTelefonosVigentesSocios</faultactor>
</fault>
</envelope>
castor mapping file:
<?xml version="1.0"?>
<mapping>
<class
name="ar.com.telefonosSocioByNroContratoService.backend.service.TelefonosVigentesSocios">
<map-to xml="datosTelefonosVigentesSocios" />
<field name="listaTelefonosVigentesSocios"
type="ar.com.telefonosSocioByNroContratoService.backend.service.TelefonoVigenteSocio"
collection="arraylist">
<bind-xml name="listaTelefonosVigentesSocios" />
</field>
</class>
<class
name="ar.com.telefonosSocioByNroContratoService.backend.service.TelefonoVigenteSocio">
<map-to xml="listaTelefonosVigentesSocios" />
<field name="nroInterlocutor" type="java.lang.String">
<bind-xml name="nroInterlocutor" node="element" />
</field>
<field name="nroContrato" type="java.lang.String">
<bind-xml name="nroContrato" node="element" />
</field>
<field name="nroTelefono" type="java.lang.String">
<bind-xml name="nroTelefono" node="element" />
</field>
<field name="tipoTelefono" type="java.lang.String">
<bind-xml name="tipoTelefono" node="element" />
</field>
<field name="marcaEstandar" type="java.lang.String">
<bind-xml name="marcaEstandar" node="element" />
</field>
<field name="descClaseDireccion" type="java.lang.String">
<bind-xml name="descClaseDireccion" node="element" />
</field>
<field name="nroInterlocutorAsociadoDomicilio" type="java.lang.String">
<bind-xml name="nroInterlocutorAsociadoDomicilio" node="element" />
</field>
</class>
</mapping>
Test Class:
public class TelefonosSocioByNroContratoServiceTest {
@Test
public void testUsuarioIntranetListfromXML() throws Exception{
Mapping mapping= new Mapping();
ClassPathResource mappingResource =
new ClassPathResource("/ar/com/telefonosSocioByNroContratoService/backend/service/telefonosVigenteSocios.map.xml");
mapping.loadMapping(mappingResource.getURL());
ClassPathResource inputExample= new ClassPathResource("ar/com/test/castor/consultaTelefonosVigentesSocios.xml");
Reader reader = new FileReader(inputExample.getFile());
Unmarshaller unmarshaller = new Unmarshaller(TelefonosVigentesSocios.class);
unmarshaller.setMapping(mapping);
TelefonosVigentesSocios telefonosVigentesSocios = (TelefonosVigentesSocios) unmarshaller.unmarshal(reader);
reader.close();
Assert.assertNotNull(telefonosVigentesSocios);
Assert.assertNotNull(telefonosVigentesSocios.getListaTelefonosVigentesSocios());
Assert.assertTrue("se esperaba not empty telefonos",!telefonosVigentesSocios.getListaTelefonosVigentesSocios().isEmpty());
}
}
Instead of using an input stream, you could use an XMLStreamReader (StAX) as your input. Then advance the XMLStreamReader to the start element event for the content you mapped to. Then have Castor unmarshal from the XMLStreamReader.
If Castor does not support StAX then I can show you how to do it with JAXB. I lead the EclipseLink JAXB implementation (MOXy).
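The first suggestion can be sketched with the JDK's StAX API alone. The class and method names below are made up for illustration, and the hand-off to a binding framework is only indicated by a comment, since whether Castor accepts an XMLStreamReader is not confirmed here. The sketch just shows that, once the reader is positioned on <datosTelefonosVigentesSocios>, the <envelope> and <body> wrappers are already behind it.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EnvelopeSkipper {

    // Moves forward until the reader sits on the START_ELEMENT event of localName.
    // Returns false if the document ends before such an element is found.
    static boolean advanceTo(XMLStreamReader reader, String localName) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && localName.equals(reader.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    // Demo helper: skips to startElement, then returns the text of the first
    // childElement found after it, or null if either element is missing.
    static String firstChildText(String xml, String startElement, String childElement) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                if (!advanceTo(reader, startElement)) {
                    return null;
                }
                // At this point the reader could be handed to an unmarshaller
                // that supports StAX input (assumption, not verified for Castor).
                if (!advanceTo(reader, childElement)) {
                    return null;
                }
                return reader.getElementText();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed-down version of the envelope from the question.
        String xml = "<envelope><header><message>consultaTelefonosVigentesSocios</message></header>"
            + "<body><datosTelefonosVigentesSocios><listaTelefonosVigentesSocios>"
            + "<nroTelefono>011-4454451-8293</nroTelefono>"
            + "</listaTelefonosVigentesSocios></datosTelefonosVigentesSocios></body></envelope>";
        System.out.println(firstChildText(xml, "datosTelefonosVigentesSocios", "nroTelefono"));
    }
}
```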

Why can't I query SolrJ for a URL?

I have a Solr schema that has a "url" field:
<fieldType name="url" class="solr.TextField"
positionIncrementGap="100">
</fieldType>
<fields>
<field name="id" type="string" stored="true" indexed="true"/>
<field name="url" type="url" stored="true" indexed="false"/>
<field name="chunkNum" type="long" stored="true" indexed="false"/>
<field name="origScore" type="float" stored="true" indexed="true"/>
<field name="concept" type="string" stored="true" indexed="true"/>
<field name="text" type="text" stored="true" indexed="true"
required="true"/>
<field name="title" type="text" stored="true" indexed="true"/>
<field name="origDoctype" type="string" stored="true" indexed="true"/>
<field name="keywords" type="string" stored="true" indexed="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
I can add SolrInputDocuments with all the fields and query them back using the text field and/or with a filter query on "concept". But when I try to query a specific url, I don't get any results. My code looks like:
SolrQuery query = new SolrQuery();
query.setQuery("url:" + ClientUtils.escapeQueryChars(url));
//query.setQuery("*:*");
//query.addFilterQuery("url:" + ClientUtils.escapeQueryChars(url));
List<Chunk> retCode = null;
try
{
QueryResponse resp = solrServer.query(query);
SolrDocumentList docs = resp.getResults();
retCode = new ArrayList<Chunk>(docs.size());
for (SolrDocument doc : docs)
{
LOG.debug("got doc " + doc);
Chunk chunk = new Chunk(doc);
retCode.add(chunk);
}
}
catch (SolrServerException e)
{
LOG.error("caught a server exception", e);
}
return retCode;
I've tried with and without the ClientUtils.escapeQueryChars and I've tried using a query of "url:" or a filter query on url. I never get anything back. Any hints?
What's the actual type of "url"? In your schema.xml you should have a set of "fieldType" elements which list the actual Solr backing classes and filters that make up a data type.
For your "fieldType" for the "url" you are interested in the "class" attribute. E.g. the most basic free-text type has a class="solr.TextField". You might be using a type that has some wacky filters on it and Lucene/Solr ends up indexing your data differently from what you would expect.
Download Luke and look at your index visually:
http://www.getopt.org/luke/
It will help you "look" at your data. Like I said, maybe it's stored differently from what you expect.
Dammit, another stupid one on my part: Thanks to Cody's suggestion of using Luke, I discovered this inconvenient part of the schema:
<field name="url" type="url" stored="true" indexed="false"/>
Changing that to indexed="true" fixed the problem.
