Programmatically load data into Solr using SolrJ and Java

How can I load data from an XML file into Solr using the SolrJ API?

Thanks Pascal. I misworded my question; I'm actually using Groovy. Your approach does work, but this was my solution:
CommonsHttpSolrServer server = SolrServerSingleton.getInstance().getServer();
def dataDir = System.getProperty("user.dir");
File xmlFile = new File(dataDir+"/book.xml");
def xml = xmlFile.getText();
DirectXmlRequest xmlreq = new DirectXmlRequest( "/update", xml);
server.request(xmlreq);
server.commit();
The first argument to DirectXmlRequest is a URL path, which must be "/update". The variable xml is a string containing the XML. For example:
<add>
<doc>
<field name="title">blah</field>
</doc>
</add>

With Java 6, you can use XPath to fetch what you need from your XML file. Then populate a SolrInputDocument with what you extracted from the XML. When that document contains everything you need, submit it to Solr using the add method of SolrServer.
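As a rough sketch of the XPath step described in this answer, the following uses only the JDK's javax.xml.xpath API to pull field names and values out of an add/doc XML string. The class name XPathFieldExtractor and the method extractFields are illustrative and not part of SolrJ. With SolrJ on the classpath, each entry would presumably be passed to SolrInputDocument.addField(name, value) before calling add; that part is left out so the example runs without a Solr instance.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of SolrJ.
public class XPathFieldExtractor {

    // Returns field name -> text content for every <field> under /add/doc.
    // A repeated field name overwrites the earlier value in this simple sketch.
    public static Map<String, String> extractFields(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "/add/doc/field",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element field = (Element) nodes.item(i);
            fields.put(field.getAttribute("name"), field.getTextContent());
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<add><doc><field name=\"title\">blah</field></doc></add>";
        // With SolrJ available, each entry would go into SolrInputDocument.addField(...)
        extractFields(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Keeping the extraction separate from the Solr call makes it possible to check the XPath expression against a sample file before anything is indexed.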

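SolrClient client = new HttpSolrClient("http://localhost:8983/solr/jiva/");
String dataDir = System.getProperty("user.dir");
File xmlFile = new File(dataDir + "/Alovera-Juice.xml");
if (xmlFile.exists()) {
    // try-with-resources closes the stream once the file has been read
    try (InputStream is = new FileInputStream(xmlFile)) {
        // assumes the file is UTF-8 encoded
        String str = IOUtils.toString(is, StandardCharsets.UTF_8);
        // post the raw XML to the /update handler, then commit
        DirectXmlRequest dxr = new DirectXmlRequest("/update", str);
        client.request(dxr);
        client.commit();
    }
}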

Related

Get value from application-lcl.properties in an xml configuration Spring

In a Spring application, I have this line in application-lcl.properties:
key1=value1
I want to use the value of key1 in another XML file like this:
<appender name="ELASTIC" class="com.internetitem.logback.elasticsearch.ElasticsearchAppender">
<url>${key1}</url>
${key1} doesn't work. Do you know how to do it? (The XML file already exists.)
Thanks

It's a two-step process:
Load the properties file into a java.util.Properties object.
Use the Properties.storeToXML() method to write the content as XML.
String inPropertiesFile = "application.properties";
String outXmlFile = "applicationProperties.xml";
Properties props = new Properties();
// try-with-resources closes both streams even if loading or writing fails
try (InputStream is = new FileInputStream(inPropertiesFile); // input file
     OutputStream os = new FileOutputStream(outXmlFile)) { // output file
    props.load(is);
    props.storeToXML(os, "application.properties", "UTF-8");
}

In the XML file, put
<springProperty name="value1" source="key1"/>
and then use it with
<url>${value1}</url>

IBM Integration Bus Java Compute Node: output a w3c.dom.Document or String

I have been working on a Java module to transform XMLs for the last few months. It is supposed to take a soap request and fill the soap:header element with additional elements from a metadata repository, for example. The module should be universally implementable into any middleware (my native system is SAP PI).
Now I am tasked with implementing this module as a jar in a JavaCompute Node in IBM Integration Bus. The problem is that to export the resulting XML I need to get the data into the outMessage of the JavaCompute Node. However, I did not find a way to convert an org.w3c.dom.Document to an MbElement or to insert the Document or its content into the MbElement.
Actually I did not see a way to put anything in there at all (not even an XML String) without using the IBM API as intended, so I would have to write code that reads my already finished Document and builds an MbElement from it.
This looks like the following:
public void evaluate(MbMessageAssembly inAssembly) throws MbException {
MbOutputTerminal out = getOutputTerminal("out");
MbOutputTerminal alt = getOutputTerminal("alternate");
MbMessage inMessage = inAssembly.getMessage();
// create new empty message
MbMessage outMessage = new MbMessage();
MbMessageAssembly outAssembly = new MbMessageAssembly(inAssembly,
outMessage);
try {
// optionally copy message headers
// copyMessageHeaders(inMessage, outMessage);
// ----------------------------------------------------------
// Add user code below
//Create an example output Document
String outputContent = "<element><subelement>Value</subelement></element>";
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(outputContent));
Document outDocument = db.parse(is);
//Get the Document or its content into the outRoot or outMessage somehow.
MbElement outRoot = outMessage.getRootElement();
//Start to iterate over the Document and use Methods like this to build up the MbElement?
MbElement outBody = outRoot.createElementAsLastChild("request");
// End of user code
} catch (MbException e) { ...

You can convert your org.w3c.dom.Document to a byte array. Then you can use the following code:
MbMessage outMessage = new MbMessage();
//copy message headers if required
MbElement oRoot = outMessage.getRootElement();
MbElement oBody = oRoot.createElementAsLastChild(MbBLOB.PARSER_NAME);
oBody.createElementAsLastChild(MbElement.TYPE_NAME_VALUE, "BLOB", yourXmlAsByteArray);
MbMessageAssembly outAssembly = new MbMessageAssembly(inAssembly, inAssembly.getLocalEnvironment(), inAssembly.getExceptionList(), outMessage);
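The conversion of the Document to a byte array mentioned in this answer is not shown, so here is one possible sketch using the JDK's identity Transformer. The class name DocumentBytes and the method toByteArray are illustrative names, and UTF-8 is an assumed encoding; the resulting array would take the place of yourXmlAsByteArray.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the IBM Integration Bus API.
public class DocumentBytes {

    // Serializes a DOM Document to bytes with an identity transform (UTF-8 assumed).
    public static byte[] toByteArray(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Same example content as in the question
        String content = "<element><subelement>Value</subelement></element>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(content)));
        byte[] yourXmlAsByteArray = toByteArray(doc);
        System.out.println(new String(yourXmlAsByteArray, StandardCharsets.UTF_8));
    }
}
```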

How do I get a cleaned html file from HtmlCleaner?

My application downloads a certain website as an HTML file the first time it is started. The HTML file is very messy, of course, so I want to clean it with HtmlCleaner so that I can then parse it with Jsoup. But how do I get a new, cleaned HTML file after cleaning?
I did some research and this is all I could find:
HtmlCleaner htmlCleaner = new HtmlCleaner();
TagNode root = htmlCleaner.clean(url);
HtmlCleaner.getInnerHtml(root);
String html = "<" + root.getName() + ">" + htmlCleaner.getInnerHtml(root) + "</" + root.getName() + ">";
But I can't see where this code writes to a new file. If it doesn't, how do I implement it so that the old file is deleted and the new cleaned HTML file is created?

You can do something like the following:
HtmlCleaner cleaner = new HtmlCleaner();
final String siteUrl = "http://www.themoscowtimes.com/";
TagNode node = cleaner.clean(new URL(siteUrl));
// serialize to an XML file
new PrettyXmlSerializer(cleaner.getProperties()).writeToFile(
    node, "cleaned.xml", "utf-8"
);
or
// serialize to an HTML file
SimpleHtmlSerializer serializer = new SimpleHtmlSerializer(cleaner.getProperties());
serializer.writeToFile(node, "c:/temp/cleaned.html");
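The question also asks how to delete the old file and create the cleaned one. One possible approach, sketched here with java.nio.file only, is to write the cleaned markup to a temporary file in the same directory and then move it over the original. The class name CleanedFileWriter and the method replaceWithCleaned are hypothetical, and the cleaned HTML is assumed to be available as a String already (for example the html variable built in the question).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of HtmlCleaner.
public class CleanedFileWriter {

    // Writes cleanedHtml to a temporary sibling file, then moves it over the original,
    // so the old content is only discarded once the new file is fully written.
    public static void replaceWithCleaned(Path original, String cleanedHtml) throws IOException {
        Path dir = original.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "cleaned", ".html");
        Files.write(temp, cleanedHtml.getBytes(StandardCharsets.UTF_8));
        Files.move(temp, original, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the downloaded page
        Path page = Files.createTempFile("page", ".html");
        Files.write(page, "<p>messy".getBytes(StandardCharsets.UTF_8));
        replaceWithCleaned(page, "<html><body><p>messy</p></body></html>");
        System.out.println(new String(Files.readAllBytes(page), StandardCharsets.UTF_8));
    }
}
```

Writing to a temporary file first means a failure halfway through does not leave a truncated page behind.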

ColdFusion & Java (docx4j library)

I need to do docx manipulation (find/replace on placeholders and checking/unchecking checkboxes). Since ColdFusion 10 integrates well with Java, I decided to try and use the Java library docx4j, which basically mimics the OpenXML SDK (.net platform).
I have the docx4j JAR inside a custom folder, which I have set up in my Application.cfc via JavaSettings (new in CF10; I tried it with other JARs and it works):
<cfcomponent output="false">
<cfset this.javaSettings =
{LoadPaths = ["/myJava/lib"], loadColdFusionClassPath = true, reloadOnChange= true,
watchInterval = 100, watchExtensions = "jar,class,xml"} />
</cfcomponent>
Now, I'm trying to use this sample: https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/samples/VariableReplace.java
But trying to load WordprocessingMLPackage fails, with the CreateObject() function saying that the class doesn't exist:
<cfset docObj = createObject("java","org.docx4j.openpackaging.packages.WordprocessingMLPackage") />
Any ideas? I'm not really a Java guy, but there are not many options out there for docx manipulation.

Alright. It seems like I got everything working. I just have to figure out how to do a find/replace, and everything else I want to do in a docx document. Here's my code so far, to show you that it looks like it is working (make sure that your Application.cfc looks like the one in the original post if you are on CF10):
<cfscript>
docPackageObj = createObject("java","org.docx4j.openpackaging.packages.WordprocessingMLPackage").init();
docObj = createObject("java","org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart").init();
xmlUtilObj = createObject("java","org.docx4j.XmlUtils").init();
wmlDocObj = createObject("java","org.docx4j.wml.Document").init();
saveToZipFile = createObject("java","org.docx4j.openpackaging.io.SaveToZipFile").init(docPackageObj);
strFilePath = getDirectoryFromPath(getCurrentTemplatePath()) & "testDoc.docx";
wordMLPackage =
docPackageObj.load(createObject("java","java.io.File").init(javaCast("string",strFilePath)));
documentPart = wordMLPackage.getMainDocumentPart();
// unmarshallFromTemplate requires string input
strXml = xmlUtilObj.marshaltoString(documentPart.getJaxbElement(),true);
writeDump(var="#strXml#");
</cfscript>
Now, does anybody know how to cast structures in ColdFusion into hashmaps (or collections in general)? I think structures in CF are actually util.Vector, whereas hashmaps are util.HashMap. All of the docx4j examples I see that demonstrate find/replace on placeholders use this:
HashMap<String, String> mappings = new HashMap<String, String>();
mappings.put("colour", "green");
mappings.put("icecream", "chocolate");

Have you tried setting loadColdFusionClassPath = false instead of true? Perhaps there is a conflict with some of the JARs that ship w/ CF.

(Not really a new answer, but it is too much code for comments ..)
Here is the full code for the docx4j VariableReplace.java example:
<cfscript>
saveToDisk = true;
inputFilePath = ExpandPath("./docx4j/sample-docs/word/unmarshallFromTemplateExample.docx");
outputFilePath = ExpandPath("./OUT_VariableReplace.docx");
inputFile = createObject("java", "java.io.File").init(inputFilePath);
wordMLPackage = createObject("java","org.docx4j.openpackaging.packages.WordprocessingMLPackage").load(inputFile);
documentPart = wordMLPackage.getMainDocumentPart();
XmlUtils = createObject("java","org.docx4j.XmlUtils");
xmlString = XmlUtils.marshaltoString(documentPart.getJaxbElement(),true);
mappings = createObject("java", "java.util.HashMap").init();
mappings["colour"] = "green";
mappings["icecream"] = "chocolate";
obj = XmlUtils.unmarshallFromTemplate(xmlString , mappings);
documentPart.setJaxbElement(obj);
if (saveToDisk) {
saveToZipFile = createObject("java","org.docx4j.openpackaging.io.SaveToZipFile").init(wordMLPackage);
SaveToZipFile.save( outputFilePath );
}
else {
WriteDump(XmlUtils.marshaltoString(documentPart.getJaxbElement(), true, true));
}
</cfscript>

Fail to upload an image file into Google Docs via Java API

Below is my code:
DocsService client = new DocsService("testappv1");
client.setUserCredentials(username, password);
client.setProtocolVersion(DocsService.Versions.V2);
File file = new File("C:/test.jpg");
DocumentEntry newDocument = new DocumentEntry();
newDocument.setTitle(new PlainTextConstruct("test"));
String mimeType = DocumentListEntry.MediaType.fromFileName(file.getName()).getMimeType();
newDocument.setMediaSource(new MediaFileSource(file, mimeType));
newDocument = client.insert(destFolderUrl, newDocument);
The document was created successfully, but it did not contain anything.

Try the following:
client.insert(new URL("https://docs.google.com/feeds/documents/private/full/?convert=false"), newDocument);
I think the ?convert=false part is important; I'm not sure how you would do that without the URL.
client.insert(new URL(destFolderUrl + "?convert=false"), newDocument);
would hopefully work in your case.
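Plain concatenation of "?convert=false" only works when the folder URL has no query string yet. As a small sketch of a more defensive variant, the helper below picks "?" or "&" depending on whether a query is already present. The class name ConvertFlag and the method withConvertFalse are hypothetical and not part of the Google Data API; the result would be passed to client.insert in place of the concatenated URL.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration; not part of the Google Data API.
public class ConvertFlag {

    // Appends convert=false, using '&' when the URL already carries a query string.
    // URLs with a fragment (#...) are not handled in this sketch.
    public static URL withConvertFalse(URL base) throws MalformedURLException {
        String separator = (base.getQuery() == null) ? "?" : "&";
        return new URL(base.toString() + separator + "convert=false");
    }

    public static void main(String[] args) throws MalformedURLException {
        URL feed = new URL("https://docs.google.com/feeds/documents/private/full/");
        System.out.println(withConvertFalse(feed));
    }
}
```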
