MarkLogic Search return document collections - java

Is there a way to return the collections of a document when using the Search API?
I could not find an option in the Query Options Reference for that use case.
Right now I would have to build my own wrapper around the Search API and look up the collections of the search results myself:
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy";
for $result in search:search("query")/search:result
return xdmp:node-collections(doc($result/@uri))
Edit: This should also be available with the MarkLogic Java Client API.

If you are using the MarkLogic REST API, you can use the category parameter on /v1/search to retrieve metadata instead of content. If you would like to blend it into the search results, your best option is a REST transform on /v1/search, specified with the transform parameter. See also:
https://docs.marklogic.com/REST/GET/v1/search
HTH!

To get only document metadata such as collections and not the document content, write and install a server-side transform that calls xdmp:node-collections() on the document and constructs a replacement document. See:
http://docs.marklogic.com/guide/java/transforms
Then call the QueryDefinition.setResponseTransform() method to specify the server-side transform:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/query/QueryDefinition.html#setResponseTransform-com.marklogic.client.document.ServerTransform-
before passing the query definition to the DocumentManager.search() method:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/document/DocumentManager.html#search-com.marklogic.client.query.QueryDefinition-long-
As a footnote, the DocumentManager.search() method can retrieve both the document metadata and content in a single request without a server-side transform by calling DocumentManager.setMetadataCategories() before searching. See:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/document/DocumentManager.html#setMetadataCategories-java.util.Set-
Hoping that helps,


MarkLogic Java API PlanBuilderBase.ExportablePlanBase

I use PlanBuilder.ModifyPlan to retrieve the contents, and the results are in a StringHandle.
I see PlanBuilderBase.ExportablePlanBase, but there is no reference on how to use its exportAs method.
The call should be something like:
ExportablePlan ep = plan.exportAs(String);
Typically, an application wouldn't call exportAs().
Instead, an application would pass the plan to methods of the RowManager class. Internally, the implementation of such methods exports the plan for sending to the server.
In particular, the following RowManager methods take a plan and get its result rows or an explanation of the query preparation:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/expression/class-use/PlanBuilder.Plan.html#com.marklogic.client.row
Here is an example of getting result rows:
http://docs.marklogic.com/guide/java/OpticJava#id_93678
RowManager also provides methods for binding parameters of the plan to literal values before sending the plan to the server:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/expression/class-use/PlanBuilder.Plan.html#com.marklogic.client.expression
Examples of edge cases where an application might want to export a plan include:
logging
inserting into a JSON document so that an e-node script could import the plan without receiving it from the client
The exported plan is a JSON document (represented as a String, if the exportAs() method is used). After exporting the plan, the application could process the JSON document in the same way as any other JSON document. For instance, the application could use JSONDocumentManager to write the plan as a document in the content database.
Hoping that helps,

How to write an xpath for a dynamic URL

Below is the XPath I have written:
//form[@data-validate-url='/se/register/validate']
But the value of data-validate-url changes from time to time, e.g.:
data-validate-url= /gb/register/validate
data-validate-url= /de/register/validate
So how do I write an XPath that handles the dynamic content? Please help.
You can write this XPath using the contains() function, as below:
//form[contains(@data-validate-url,'/register/validate')]
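As a rough check of the contains() expression above outside a browser, here is a minimal sketch using the XPath API that ships with the JDK. The sample markup and the class and method names (DynamicUrlXPath, countMatches) are made up for illustration; only the expression itself comes from the answer.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DynamicUrlXPath {

    // Match on the constant part of the attribute value only.
    static final String EXPRESSION =
            "//form[contains(@data-validate-url,'/register/validate')]";

    // Counts the nodes in the given XML text that match EXPRESSION.
    static int countMatches(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    EXPRESSION,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical markup: two locale variants and one unrelated form.
        String page = "<html><body>"
                + "<form data-validate-url='/gb/register/validate'/>"
                + "<form data-validate-url='/de/register/validate'/>"
                + "<form data-validate-url='/gb/login/validate'/>"
                + "</body></html>";
        System.out.println(countMatches(page)); // prints 2
    }
}
```

Note that contains() matches the substring anywhere in the value, so it would also match a URL that merely embeds /register/validate in the middle.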
In your URL the dynamic part seems to be the /gb or /de prefix, and the rest is constant. Use a CSS locator in the following way to handle this:
driver.findElement(By.cssSelector("form[data-validate-url$='register/validate']"));
Also look into these
css=form[data-validate-url^='prefix_']
css=form[data-validate-url$='_suffix']
css=form[data-validate-url*='_pattern_']

Recursively scan documents for indexing in a folder in SolrJ

I understand that in SimplePostTool (post.jar), there is this command to automatically detect content types in a folder, and recursively scan it for documents for indexing into a collection:
bin/post -c gettingstarted afolder/
This has been useful for mass indexing all the files in the folder. Now that I'm moving to production, I plan to use SolrJ for the indexing, as it allows more things such as robustness checks and retries for index requests that fail.
However, I can't seem to find a way to do the same in SolrJ. Is it possible to do this in SolrJ? I'm using Solr 5.3.0.
Thank you.
Regards,
Edwin
If you're looking to submit content to an extracting request handler (for indexing PDFs and similar rich documents), you can use the ContentStreamUpdateRequest class as shown at Uploading data with SolrJ:
SolrClient server = new HttpSolrClient("http://localhost:8983/solr/my_collection");
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
// In SolrJ 5.x, addFile takes the file and its content type
req.addFile(new File("my-file.pdf"), "application/pdf");
server.request(req);
To iterate through a directory structure recursively in Java, see Best way to iterate through a directory in Java.
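For the directory traversal itself, a minimal sketch using only the JDK might look like the following. It assumes Java 8 or later (for Files.walk); the class and method names are made up for illustration, and the point where each file would be handed to SolrJ is left as a placeholder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RecursiveFileLister {

    // Returns every regular file under root, descending into subfolders.
    static List<Path> listFiles(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "afolder" mirrors the folder name used in the question.
        Path root = Paths.get(args.length > 0 ? args[0] : "afolder");
        for (Path file : listFiles(root)) {
            // Placeholder: this is where each file would be submitted via SolrJ.
            System.out.println("would index: " + file);
        }
    }
}
```

Collecting the paths first keeps the traversal separate from the indexing, which makes it easier to retry individual files that fail.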
If you're planning to index plain content (and not use the request handler), you can do that by creating the documents in SolrJ itself and then submitting the documents to the server - there's no need to write them to a temporary file in between.

OpenSearch Compatible Response From Java

Here is an example OpenSearch description file:
http://webcat.hud.ac.uk/OpenSearch.xml
When I send a query like this:
http://webcat.hud.ac.uk/perl/opensearch.pl?keyword=new&startpage=1&itemsperpage=20
I get a response that is compatible with OpenSearch. How can I implement the OpenSearch specification in Java? Is there a library for it, or an XSD from which I can generate Java code?
According to the OpenSearch website's section on "Reading OpenSearch", there is a Java library which can do this, called Apache Abdera. I have not used it myself, so I cannot comment on its quality, but it should be worth looking into - apparently it can both interpret AND create OpenSearch responses, so this may be exactly what you're looking for.
Alternatively, there are quite a few very good XML parsers for Java (see this question for some suggestions), so writing your own parser for a simple OpenSearch XML file shouldn't be too difficult, since the full specification is available online.
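To give an idea of how small such a hand-written reader can be, here is a sketch that pulls the paging values out of a response using the JDK's DOM parser. It assumes the response uses the OpenSearch 1.1 namespace (http://a9.com/-/spec/opensearch/1.1/); the sample feed and the class and method names are invented for illustration and are not taken from the service in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OpenSearchResponseReader {

    // Namespace of the OpenSearch 1.1 response elements (assumed version).
    static final String OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/";

    // Returns the integer value of the first OpenSearch element with the
    // given local name, or -1 when the element is absent.
    static int readInt(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagNameNS(OPENSEARCH_NS, localName);
            if (nodes.getLength() == 0) {
                return -1;
            }
            return Integer.parseInt(nodes.item(0).getTextContent().trim());
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable OpenSearch response", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical Atom feed carrying the OpenSearch paging elements.
        String feed = "<feed xmlns='http://www.w3.org/2005/Atom'"
                + " xmlns:opensearch='http://a9.com/-/spec/opensearch/1.1/'>"
                + "<title>Results for new</title>"
                + "<opensearch:totalResults>1024</opensearch:totalResults>"
                + "<opensearch:startIndex>1</opensearch:startIndex>"
                + "<opensearch:itemsPerPage>20</opensearch:itemsPerPage>"
                + "</feed>";
        System.out.println(readInt(feed, "totalResults")); // prints 1024
        System.out.println(readInt(feed, "itemsPerPage")); // prints 20
    }
}
```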
As for an XSD, I can't find an "official" one; however, there are XSDs for OpenSearch in various open source projects which have been tested and which you can use, such as this one, which is part of a project called "OpenSearch Validator."
Another potential choice for writing OpenSearch results is the very mature and widely-used Apache Lucene library, which is in the list of software "writing OpenSearch results" in the previously linked OpenSearch website.
ROME also supports OpenSearch with its ROME Module A9 OpenSearch.
Sample usage:
SyndFeed feed = new SyndFeedImpl();
feed.setFeedType(feedType);
// Add the opensearch module, you would get information like totalResults from the
// return results of your search
List mods = feed.getModules();
OpenSearchModule osm = new OpenSearchModuleImpl();
osm.setStartIndex(1);
osm.setTotalResults(1024);
osm.setItemsPerPage(50);
OSQuery query = new OSQuery();
query.setRole("superset");
query.setSearchTerms("Java Syndication");
query.setStartPage(1);
osm.addQuery(query);
Link link = new Link();
link.setHref("http://www.bargainstriker.com/opensearch-description.xml");
link.setType("application/opensearchdescription+xml");
osm.setLink(link);
mods.add(osm);
feed.setModules(mods);
// end add module

Validating an XML NCName in Java

I'm getting some values from Java annotations in an annotation processor to generate metadata. Some of these values are supposed to indicate XML element or attribute names. I'd like to validate the input to find out if the provided values are actually legal NCNames according to the XML specification. Only the local name is important in this case, the namespace URI doesn't play a part here.
Is there some simple way of finding out if a string is a legal XML element or attribute name? Preferably I'd use some XML API that is readily available in Java SE. One of the reasons I'm doing this stuff in the first place is to cut back on dependencies. I'm using JDK 7 so I have access to the most up-to-date classes/methods.
So far, browsing through content handler classes and SAX/DOM stuff hasn't yielded any result.
If you're prepared to have Saxon on your class path you can do
new Name10Checker().isValidNCName(s);
I can't see anything simpler in the public JDK interface.
I didn't find anything straightforward in any of the JDK 6 APIs (I don't know about JDK 7). A quick but possibly "hackish" way to check would be to wrap the name in an XML document and see if it parses:
String name = ...;
if(name.contains(">")) {
return false;
}
String xmlDoc = "<" + name + "/>";
DocumentBuilder db = ...;
db.parse(new InputSource(new StringReader(xmlDoc)));
I ran into the same problem and found lots of implementations in FOSS libraries, and even an old implementation in a Java class library that was removed ages ago. So here are a few options to choose from:
Java Class Library: XMLUtils.isValidNCName(String ncName) (note: removed in 2004)
Apache Axis: NCName.isValid(String stValue)
Saxonica: NameChecker.isValidNCName(CharSequence ncName)
OWL API: XMLUtils.isNCName(java.lang.CharSequence s)
Validator.nu HTML Parser: NCName.isNCName(java.lang.String str)
So, if you're using one of these libraries anyway, you're fine.
As I am not, I'll go with a copy of the XMLUtils from the OWL API, which has no external dependencies, is available under non-restrictive licenses (LGPL and Apache 2.0) and consists of nice and clean code.
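If copying a library class is not an option either, a dependency-free check can be sketched directly from the Name productions of the XML 1.0 specification (Fifth Edition), with the colon excluded as required for an NCName. This is a hand-written illustration, not the code of any of the libraries listed above, and the class name NCNameChecker is made up. Checkers based on earlier editions of XML 1.0 may accept a slightly different set of characters.

```java
class NCNameChecker {

    // NameStartChar from XML 1.0 (Fifth Edition), without ':'.
    static boolean isNameStartChar(int c) {
        return (c >= 'A' && c <= 'Z') || c == '_' || (c >= 'a' && c <= 'z')
                || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
                || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
                || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
                || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
                || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
                || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar: everything allowed at the start, plus digits and a few extras.
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.'
                || (c >= '0' && c <= '9') || c == 0xB7
                || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    // True if s is non-empty, starts with a NameStartChar and continues
    // with NameChars only. Iterates by code point to handle surrogate pairs.
    static boolean isValidNCName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNameChar(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidNCName("item-1"));     // true
        System.out.println(isValidNCName("1item"));      // false: starts with a digit
        System.out.println(isValidNCName("xs:element")); // false: contains a colon
    }
}
```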
