Pushing AEM content into Solr 6


I am trying to push AEM page content to a remote Solr server. Is there a way to do this from AEM directly, or do we have to write a service for it? If a service is needed, which API should I use? I was able to create the Solr schema using a solrindex node under oak:index.
Thanks
Abhishek

I had a similar requirement. When I synced AEM with a remote Solr server, a separate document was created for each AEM node. So I ended up writing a custom service to bulk-load all content pages into Solr. I used AEM's query API to extract the page content and get the id, title, description and path. For the description field, I traversed the page's node tree to extract property values and joined them into a space-delimited text field. I then used SolrJ to add the documents to Solr.
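To make the tree-traversal step concrete, here is a minimal JDK-only sketch. ContentNode and PageDocumentBuilder are hypothetical names invented for illustration (in AEM the node would be a JCR Node or Sling Resource), and the field names simply mirror the id, title, description and path mentioned above. It is not the original service, only one way the flattening could look.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a content node (a JCR Node or Sling Resource in AEM). */
class ContentNode {
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<ContentNode> children = new ArrayList<>();

    ContentNode prop(String name, String value) {
        properties.put(name, value);
        return this;
    }

    ContentNode child(ContentNode node) {
        children.add(node);
        return this;
    }
}

/** Builds one flat, Solr-ready field map per page instead of one document per node. */
class PageDocumentBuilder {

    /** Depth-first traversal that joins all non-empty property values with single spaces. */
    static String describe(ContentNode node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private static void collect(ContentNode node, StringBuilder sb) {
        for (String value : node.properties.values()) {
            if (value == null || value.trim().isEmpty()) {
                continue; // skip empty properties
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(value.trim());
        }
        for (ContentNode child : node.children) {
            collect(child, sb);
        }
    }

    /** Assumed field layout: the page path doubles as the unique id. */
    static Map<String, String> toDocument(String path, String title, ContentNode content) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", path);
        doc.put("title", title);
        doc.put("description", describe(content));
        doc.put("path", path);
        return doc;
    }
}
```

With SolrJ on the classpath, each map entry could then be copied into a SolrInputDocument via addField, and the documents sent with SolrClient.add followed by commit.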

Adding Reference links to what Opkar has shared:-
Link:- http://www.aemsolrsearch.com/#/
Git:- https://github.com/headwirecom/aem-solr-search
Video/Demo :- http://www.aemsolrsearch.com/#/demo
AEM 6.2 Documentation :- https://docs.adobe.com/docs/en/aem/6-2/deploy/platform/queries-and-indexing.html#Configuring AEM with an embedded SOLR server
Adobe AEM Community post:- http://help-forums.adobe.com/content/adobeforums/en/experience-manager-forum/adobe-experience-manager.topic.html/forum__ir8q-is_there_a_detailed.html
I hope this is helpful.
Thanks and Regards
Kautuk Sahni

Related

Generate HTML file using JSP template

How can I generate an HTML file using a Servlet/JSP?
I'm using Spring MVC to create a service that gets some data from the database. I'd then like it to read a JSP template from somewhere other than WEB-INF and pass in the database data as attributes. It should return a string containing the rendered JSP source, with the JSP variables replaced by the data. Finally, this string should be written to an .html file.
I can't find a tutorial about this. Instead, I have noticed that the JSP file is mostly dispatched as the response and displayed in the browser.
Please help. Thanks.

You can use Spring with Thymeleaf.
Thymeleaf is a Java template engine library with good support for serving XHTML/HTML5 in web applications.
A tutorial is available from Thymeleaf.

Create Site in Alfresco using the Apache Chemistry

Greetings to the community! I am using Alfresco Community Edition 6.0.0 with the Apache Chemistry API. So far I have successfully managed to create and fetch content (folders and documents) in the Alfresco repository through it.
Now I would like to use the Apache Chemistry API to create an Alfresco site (as I would with the alfresco/api/-default-/public/alfresco/versions/1/sites POST method in the Alfresco REST API).
Is that feasible? Following the way I already create folders in the repository, this is what I have done:
Folder folder = retrieveSitesFolder(); // this returns the folder object using the node id of the "Sites" node
Map<String, Object> props = new HashMap<String, Object>();
props.put(PropertyIds.OBJECT_TYPE_ID, "F:st:site"); //this is recognized fine
props.put("st:siteVisibility", "PUBLIC");
props.put("st:sitePreset", "something");
props.put("cmis:name", "something");
Folder subFolder = folder.createFolder(props);
For the properties I add, I am following the site model from here:
https://svn.alfresco.com/repos/alfresco-open-mirror/alfresco/COMMUNITYTAGS/V4.2a/root/projects/repository/config/alfresco/model/siteModel.xml
Unfortunately, when I run this code I get the following error:
Exception in thread "main" org.apache.chemistry.opencmis.commons.exceptions.CmisRuntimeException: 10290059 Site something does not exist.
This seems very strange to me, as I expect my code to create that site, not to search for it in any way.
What makes this stranger is that when I created a site named "something" via the REST API and re-ran the code, it ran successfully, but no extra site appeared in the alfresco/api/-default-/public/alfresco/versions/1/sites endpoint of the REST API.
Could anyone shed some light on this, please? Any help would be greatly appreciated!

As Gagravarr says, the API didn't support creating functional sites until, as Billerby pointed out, the REST API made some improvements.
Apache Chemistry has no idea what a site is, but, as you've discovered, an st:site is just a child type of cm:folder.
Although this is most likely not going to work via CMIS, I wanted to point out that you are using "something" as the site preset. That is not going to work unless you've defined a new site preset called "something".
By default, there is a single out-of-the-box site preset called "site-dashboard" which is the ID for the "Collaboration Site" preset.
You might change your st:sitePreset to "site-dashboard" and see if you get any further.
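For comparison, the REST route mentioned in the question could look roughly like the sketch below. It only uses the JDK's java.net.http classes (Java 11+). The SiteRequests class, the base URL and the credentials are placeholders I made up, basic authentication is an assumption about your setup, and the JSON body uses the id, title and visibility fields of the public sites endpoint. Treat it as a starting point rather than a verified client.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that prepares a "create site" call for the Alfresco public REST API. */
class SiteRequests {

    // Endpoint path taken from the question.
    static final String SITES_PATH = "/alfresco/api/-default-/public/alfresco/versions/1/sites";

    /** Minimal JSON escaping for the handful of string values used here. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Request body with the site id, title and visibility. */
    static String siteJson(String id, String title, String visibility) {
        return "{\"id\":\"" + escape(id) + "\","
             + "\"title\":\"" + escape(title) + "\","
             + "\"visibility\":\"" + escape(visibility) + "\"}";
    }

    /** Builds (but does not send) the POST request; basic auth is an assumption. */
    static HttpRequest createSite(String baseUrl, String user, String password,
                                  String id, String title, String visibility) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + SITES_PATH))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.ofString(siteJson(id, title, visibility)))
                .build();
    }
}
```

The request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); a 2xx status would indicate that the site was created.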

How to save fetched html content to database in apache nutch?

I'm using Apache Nutch 1.8. I want to save the crawled HTML content to a PostgreSQL database. To do this, I modified the FetcherThread.java class as shown below.
case ProtocolStatus.SUCCESS: // got a page
pstatus = output(fit.url, fit.datum, content, status,
CrawlDatum.STATUS_FETCH_SUCCESS, fit.outlinkDepth);
updateStatus(content.getContent().length);
/*Added My code Here*/
But I want to use the plugin system instead of modifying the FetcherThread class directly. Which extension points do I need to use for this?

You could write a custom plugin that implements org.apache.nutch.indexer.IndexWriter to send the documents to Postgres as part of the indexing step. You'll need to index the raw content, which requires NUTCH-2032. This is in Nutch 1.11, so you will need to upgrade your version of Nutch.
Alternatively, you could write a custom MapReduce job that takes a segment as input, reads the content and sends it to your DB in the reduce step.

Grab XML from WSO2 governance

I'm in the final stages of a project to allow users to use the Management Console to upload services to WSO2 governance. Once a month, we want to grab all the info from WSO2, transform it into HTML and post it to a wiki.
At the moment, I haven't found a convenient way to export all the XML from WSO2 governance through Java. The only solution I found was to grab all services with a certain tag. This is not going to work well if there are other users who do not know to add this tag.
Does anyone have any ideas on how I can export the XML for all services on my WSO2 instance?
Thanks!

You simply have to call the Registry.getResource() method to retrieve resources. Check the Governance Registry API for the available methods and use whichever suits your need.

Found the answer in code here: http://ajithvblogs.blogspot.com/2013/02/how-to-invoke-custom-artifacts-using.html#!/2013/02/how-to-invoke-custom-artifacts-using.html
I needed to get the info from the artifact, not the individual resources.
GenericArtifactManager artifactManager = new GenericArtifactManager(registry, "applications");

How to index data in single solr app using tomcat server

I am working on a single Solr app. I downloaded the Solr example code from the net, and it works fine when running on the Jetty server. The data to be indexed is in C:\apache-solr-1.4.0\example\exampledocs and the indexes are stored in C:\apache-solr-1.4.0\example\solr\data. With the Jetty server, the indexes are created using the command java -jar post.jar *.xml. Now I want to know how I can achieve this using Tomcat. Do I need to change the configuration to set the paths for index storage and for the XML files? How will the data be indexed so that I am able to search it?

If I understand your question correctly, you'll want to use the -Durl flag when running post.jar, e.g.:
java -jar -Durl=http://localhost:8080/solr/update post.jar solr.xml monitor.xml
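If it helps to see what is going on underneath: post.jar essentially sends each XML file as an HTTP POST to the update URL and then issues a commit, so the only Tomcat-specific part is the URL. Below is a rough JDK-only sketch of those two requests (Java 11+). SolrXmlPost is a made-up class name, and the content type is my assumption of what the update handler accepts; this is not the actual post.jar source.

```java
import java.io.FileNotFoundException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.file.Path;

/** Hypothetical helper that mimics the two kinds of requests a post.jar run makes. */
class SolrXmlPost {

    // Assumed content type for XML update messages.
    static final String XML_TYPE = "text/xml; charset=utf-8";

    /** POST one XML document file (e.g. from exampledocs) to the update handler. */
    static HttpRequest postFile(String updateUrl, Path xmlFile) throws FileNotFoundException {
        return HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", XML_TYPE)
                .POST(HttpRequest.BodyPublishers.ofFile(xmlFile))
                .build();
    }

    /** POST a commit message so the newly added documents become searchable. */
    static HttpRequest commit(String updateUrl) {
        return HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", XML_TYPE)
                .POST(HttpRequest.BodyPublishers.ofString("<commit/>"))
                .build();
    }
}
```

Each request would be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), using http://localhost:8080/solr/update as the update URL for the Tomcat setup above.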

In solrconfig.xml you can specify the directory that holds the index:
<dataDir>${solr.data.dir:}</dataDir>

I think you just have to read more of the Solr documentation and look through what you have in the package.
There is a Tomcat deployment doc on the Solr wiki:
http://wiki.apache.org/solr/SolrTomcat
The war file is in the dist folder of the package you downloaded.
How to search it? There is no simple answer. I suggest you read more on the Solr wiki. Find out what a handler is, what the difference is between the dismax handler and the standard handler, and how schema.xml defines the structure of the indexed data.

