Should I map all my indexes with my analyzer on ElasticSearch - java

I'm "almost" new on Elastic Search. I've been using it for a while but never used Analyzers before.
I can make a full text search on my project but the problem is, when I try to find a name like "Alex", I should completely type down the name correcly. It doesn't work with "Al" or "Ale". It says something like "no match found".
I found some source codes from different sites, but it makes me confused.
What should I do is:
1) Creating a nGram tokenizer
2) Then mapping it with my all indexes?
I have lots of indexes already created and I got errors while creating a mapping on them.
Should I create my analyzer settings and mapping very in the beggining just before indexing my records ?
I'm working on a Java project, so answers on JAVA API will be very appreciated.
Thanks a lot!

Mappings should always be created first, and then the data should be indexed. If possible, delete your old indices and recreate them with the new mapping. If you are concerned about losing your data, just create a new type for an existing index; the new type can use the new mapping.
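For the partial-match behaviour itself (matching "Al" or "Ale" against "Alex"), an edge_ngram analyzer is the usual approach. Below is a minimal sketch of creating a new index with such an analyzer and applying it to a name field via the same Java API; the index name, type name, field name and analyzer names are assumptions for illustration, not something taken from your project, and client is your existing Client instance.
String settingsJson = "{"
        + "\"analysis\": {"
        + "  \"filter\": { \"autocomplete_filter\": { \"type\": \"edge_ngram\", \"min_gram\": 1, \"max_gram\": 20 } },"
        + "  \"analyzer\": { \"autocomplete\": { \"type\": \"custom\", \"tokenizer\": \"standard\","
        + "                   \"filter\": [ \"lowercase\", \"autocomplete_filter\" ] } }"
        + "} }";
String mappingJson = "{ \"typeName\": { \"properties\": {"
        + "  \"name\": { \"type\": \"string\", \"index_analyzer\": \"autocomplete\", \"search_analyzer\": \"standard\" }"
        + "} } }";

client.admin().indices().prepareCreate("names_index")
        .setSettings(settingsJson)            // analyzer definition
        .addMapping("typeName", mappingJson)  // field mapping that uses the analyzer
        .execute().actionGet();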
For example, here is a piece of code that uses the Java API to create a custom mapping:
public class MappingCreator {

    static Logger log = Logger.getLogger(MappingCreator.class.getName());

    final static String indexName = "indexName";
    final static String typeName = "typeName";
    final static String mappingFileName = "pathToMapping.jsonFile";
    final static String clusterName = "elasticsearch"; // or the name of your cluster
    final static String hostName = "localhost";

    public static void main(String args[]) throws IOException {
        MappingCreator mapCreator = new MappingCreator();
        Client myESclient = getClient();

        IndicesExistsResponse res = myESclient.admin().indices().prepareExists(indexName).execute().actionGet();
        if (res.isExists()) {
            log.warn("Index " + indexName + " already exists. Will be deleted");
            final DeleteIndexRequestBuilder deleteIndexBuilder = myESclient.admin().indices().prepareDelete(indexName);
            deleteIndexBuilder.execute().actionGet();
        }

        final CreateIndexRequestBuilder createIndexBuilder = myESclient.admin().indices().prepareCreate(indexName)
                .addMapping(typeName, mapCreator.getIndexFieldMapping());
        CreateIndexResponse createIndexResponse = createIndexBuilder.execute().actionGet();
        log.debug("Created mapping " + createIndexResponse.toString());

        myESclient.close();
    }

    private String getIndexFieldMapping() throws IOException {
        return IOUtils.toString(getClass().getClassLoader().getResourceAsStream(mappingFileName));
    }

    private static Client getClient() {
        TransportClient transportClient = null;
        try {
            Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", clusterName).build();
            transportClient = new TransportClient(settings);
            transportClient = transportClient.addTransportAddress(new InetSocketTransportAddress(hostName, 9300));
            /* Be very careful about the port number here. By default it is 9300. Note that this is the TCP port
               used by the Java API, unlike the HTTP port, which is 9200. */
        } catch (Exception e) {
            log.error("Error in MappingCreator creating Elastic Search Client\n"
                    + "Message " + e.getMessage() + "\n"
                    + "StackTrace " + e.getStackTrace());
        }
        return transportClient;
    }
}
I hope this helps. By the way, it's really cool that you are making your own nGram tokenizer. I would love to see the code for that and how it is done :)

Related

alfresco buildonly indexer for searching the properties created on the fly

I am using the latest version of Alfresco, 5.1.
One of my requirements is to create properties (key/value pairs) where the user enters both the key and the value.
So I have done it like this:
Map<QName, Serializable> props = new HashMap<QName, Serializable>();
props.put(QName.createQName("customProp1"), "prop1");
props.put(QName.createQName("customProp2"), "prop2");
ChildAssociationRef associationRef = nodeService.createNode(nodeService.getRootNode(storeRef), ContentModel.ASSOC_CHILDREN, QName.createQName(GUID.generate()), ContentModel.TYPE_CMOBJECT, props);
Now what I want to do is search for the nodes with these newly created properties. I was able to search for a newly created property like this:
public List<NodeRef> findNodes() throws Exception {
    authenticate("admin", "admin");
    StoreRef storeRef = new StoreRef(StoreRef.PROTOCOL_WORKSPACE, "SpacesStore");
    List<NodeRef> nodeList = null;

    Map<QName, Serializable> props = new HashMap<QName, Serializable>();
    props.put(QName.createQName("customProp1"), "prop1");
    props.put(QName.createQName("customProp2"), "prop2");
    ChildAssociationRef associationRef = nodeService.createNode(nodeService.getRootNode(storeRef),
            ContentModel.ASSOC_CHILDREN, QName.createQName(GUID.generate()), ContentModel.TYPE_CMOBJECT, props);
    NodeRef nodeRef = associationRef.getChildRef();

    String query = "@cm\\:customProp1:\"prop1\"";
    SearchParameters sp = new SearchParameters();
    sp.addStore(storeRef);
    sp.setLanguage(SearchService.LANGUAGE_LUCENE);
    sp.setQuery(query);

    try {
        ResultSet results = serviceRegistry.getSearchService().query(sp);
        nodeList = new ArrayList<NodeRef>();
        for (ResultSetRow row : results) {
            nodeList.add(row.getNodeRef());
            System.out.println(row.getNodeRef());
        }
        System.out.println(nodeList.size());
    } catch (Exception e) {
        e.printStackTrace();
    }
    return nodeList;
}
The alfresco-global.properties indexer configuration is
index.subsystem.name=buildonly
index.recovery.mode=AUTO
dir.keystore=${dir.root}/keystore
Now my questions are:
Is it possible to achieve the same using the solr4 indexer?
Or is there any way to use the buildonly indexer for a particular query?
In your query
String query = "@cm\\:customProp1:\"prop1\"";
remove cm, because you are building the QName on the fly, so it does not fall under the cm (ContentModel) properties. Your query then becomes
String query = "@\\:customProp1:\"prop1\"";
Hope this works for you.
First, double check if you're simply experiencing eventual consistency, as described below. If you are, and if this presents a problem for you, consider switching to CMIS queries while staying on SOLR.
http://docs.alfresco.com/5.1/concepts/solr-event-consistency.html
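If you do go the CMIS route while staying on SOLR, a minimal sketch of running a CMIS query through the same SearchService might look like this; the query itself is only an assumed example, and note that ad-hoc properties created outside a content model are generally not queryable this way:
SearchParameters sp = new SearchParameters();
sp.addStore(storeRef);
// Use the Alfresco CMIS dialect instead of the Lucene one
sp.setLanguage(SearchService.LANGUAGE_CMIS_ALFRESCO);
// Assumed example query; properties must be defined in a content model to appear here
sp.setQuery("SELECT * FROM cmis:document WHERE cmis:name LIKE 'prop%'");
ResultSet results = serviceRegistry.getSearchService().query(sp);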
Other than this, check if the node has been indexed at all. If it has, take a closer look at how you build your query.
See also: How to find List of unindexed file in alfresco

Develop a web application based on triplestore database

I've recently developed a "classic" 3-tier web application using Java EE.
I used GlassFish as the application server, MS SQL Server as the DBMS, and XHTML pages with PrimeFaces components for the front end.
Now, for educational purposes, I want to replace the relational DB with a pure triplestore database, but I'm not sure about the procedure to follow.
I've searched a lot on Google and on this site, but I didn't find what I was looking for: every answer I found was more theoretical than practical.
If possible, I need a sort of tutorial or some practical tips.
I've read the documentation about Apache Jena, but I'm not able to find a solid starting point.
In particular:
- To use MS SQL Server with GlassFish, I used a JDBC driver and created a datasource and a connection pool. Is there an equivalent procedure for setting up a triplestore database?
- To handle user authentication, I used a Realm. What should I do now?
For the moment, I've created an RDF schema "by hand" and translated it into a Java class using Jena schemagen. What should I do next?
After several attempts and further research on the net, I finally achieved my goal.
I decided to develop a hybrid solution in which I manage user logins and their navigation permissions via MS SQL Server and a JDBCRealm, while I use Jena TDB to store all the other data.
Starting from an RDF schema, I created a Java class that contains the resources and properties, so I can easily create my statements in code. Here's an example:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://www.stackoverflow.com/example#"
    xml:base="http://www.stackoverflow.com/example">
  <rdfs:Class rdf:ID="User"></rdfs:Class>
  <rdfs:Class rdf:ID="Project"></rdfs:Class>
  <rdf:Property rdf:ID="email"></rdf:Property>
  <rdf:Property rdf:ID="name"></rdf:Property>
  <rdf:Property rdf:ID="surname"></rdf:Property>
  <rdf:Property rdf:ID="description"></rdf:Property>
  <rdf:Property rdf:ID="customer"></rdf:Property>
  <rdf:Property rdf:ID="insertProject">
    <rdfs:domain rdf:resource="http://www.stackoverflow.com/example#User"/>
    <rdfs:range rdf:resource="http://www.stackoverflow.com/example#Project"/>
  </rdf:Property>
</rdf:RDF>
And this is the Java class:
public class MY_ONTOLOGY {

    private static final OntModel M = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);
    private static final String NS = "http://www.stackoverflow.com/example#";
    private static final String BASE_URI = "http://www.stackoverflow.com/example/";

    public static final OntClass USER = M.createClass(NS + "User");
    public static final OntClass PROJECT = M.createClass(NS + "Project");

    public static final OntProperty EMAIL = M.createOntProperty(NS + "hasEmail");
    public static final OntProperty NAME = M.createOntProperty(NS + "hasName");
    public static final OntProperty SURNAME = M.createOntProperty(NS + "hasSurname");
    public static final OntProperty DESCRIPTION = M.createOntProperty(NS + "hasDescription");
    public static final OntProperty CUSTOMER = M.createOntProperty(NS + "hasCustomer");
    public static final OntProperty INSERTS_PROJECT = M.createOntProperty(NS + "insertsProject");

    public static String getBaseURI() {
        return BASE_URI;
    }
}
Then I've created a directory on my PC where I want to store the data, like C:\MyTDBdataset.
To store data inside it, I use the following code:
String directory = "C:\\MyTDBdataset";
Dataset dataset = TDBFactory.createDataset(directory);
dataset.begin(ReadWrite.WRITE);
try {
    Model m = dataset.getDefaultModel();

    Resource user = m.createResource(MY_ONTOLOGY.getBaseURI() + "Ronnie", MY_ONTOLOGY.USER);
    user.addProperty(MY_ONTOLOGY.NAME, "Ronald");
    user.addProperty(MY_ONTOLOGY.SURNAME, "Red");
    user.addProperty(MY_ONTOLOGY.EMAIL, "ronnie@myemail.com");

    Resource project = m.createResource(MY_ONTOLOGY.getBaseURI() + "MyProject", MY_ONTOLOGY.PROJECT);
    project.addProperty(MY_ONTOLOGY.DESCRIPTION, "This project is fantastic");
    project.addProperty(MY_ONTOLOGY.CUSTOMER, "Customer & Co");

    m.add(user, MY_ONTOLOGY.INSERTS_PROJECT, project);
    dataset.commit();
} finally {
    dataset.end();
}
If I want to read statements in my TDB, I can use something like this:
dataset.begin(ReadWrite.READ);
try {
    Model m = dataset.getDefaultModel();
    StmtIterator iter = m.listStatements();
    while (iter.hasNext()) {
        Statement stmt = iter.nextStatement();
        Resource subject = stmt.getSubject();
        Property predicate = stmt.getPredicate();
        RDFNode object = stmt.getObject();
        System.out.println(subject);
        System.out.println("\t" + predicate);
        System.out.println("\t\t" + object);
        System.out.println("");
    }
    m.write(System.out, "RDF/XML"); // if you want to dump your data to the console as RDF/XML
} finally {
    dataset.end();
}
If you want to navigate your model in different ways, look at this tutorial provided by Apache.
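For instance, a minimal sketch of querying the same dataset with SPARQL through Jena ARQ; the SELECT query here is only an assumed example against the ontology above:
dataset.begin(ReadWrite.READ);
try {
    String sparql = "SELECT ?s ?name WHERE { ?s <http://www.stackoverflow.com/example#hasName> ?name }";
    Query query = QueryFactory.create(sparql);
    QueryExecution qexec = QueryExecutionFactory.create(query, dataset);
    try {
        ResultSet results = qexec.execSelect();
        while (results.hasNext()) {
            QuerySolution solution = results.nextSolution();
            System.out.println(solution.get("s") + " -> " + solution.get("name"));
        }
    } finally {
        qexec.close();
    }
} finally {
    dataset.end();
}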
If you want to remove specific statements in your model, you can write something like this:
dataset.begin(ReadWrite.WRITE);
try {
    Model m = dataset.getDefaultModel();
    m.remove(m.createResource("http://www.stackoverflow.com/example/Ronnie"), MY_ONTOLOGY.NAME, m.createLiteral("Ronald"));
    dataset.commit();
} finally {
    dataset.end();
}
That's all! Bye!

RestSharp, spring boot and JSON.org encoding issue

I have a C# desktop client that reads a local DB and uploads the values to the web application.
On the C# side I am using RestSharp and Json.NET.
private static void DBUpdater()
{
    var client = new RestClient();
    client.BaseUrl = BASE_URL;

    string json = JsonConvert.SerializeObject(modelo.getComunidades(), Formatting.Indented);
    Console.WriteLine(json);

    var request = new RestRequest("/nuevascomunidades", Method.POST);
    request.RequestFormat = DataFormat.Json;
    request.AddParameter("comunidades", json);

    Console.WriteLine(client.Execute(request).ResponseStatus);
    Console.ReadKey();
}
When I print the generated JSON string to the console, all the characters are represented correctly.
However, when I read the values back from Spring Boot/Spring Data, every special character comes out completely wrong.
On the server side I am deserializing like this, using json.org:
@RequestMapping(value = "/nuevascomunidades", method = RequestMethod.POST)
public void nuevasComunidades(@RequestParam(value = "comunidades") String comunidades) {
    logger.debug("######Entra en /nuevascomunidades");
    JSONArray entrada = new JSONArray(comunidades);
    JSONObject aux;
    Comunidad comunidad;
    int top = entrada.length();
    for (int i = 0; i < top; i++) {
        aux = entrada.getJSONObject(i);
        comunidad = new Comunidad(Integer.valueOf(aux.getString("Numero")),
                aux.getString("Nif"),
                aux.getString("Nombre"),
                aux.getString("Direccion"),
                aux.getString("Cod_postal"),
                aux.getString("Poblacion"),
                aux.getString("Provincia"),
                aux.getString("Pais"),
                aux.getBoolean("Baja"));
        comunidadRepositorio.save(comunidad);
        logger.debug("######Comunidad añadida: " + comunidad.toString());
    }
}
Any idea about how to fix the encoding?
Thanks in advance.
RIGHT REPRESENTATION: "Pais": "ESPAÑA # ºª ¡¿?!"
WRONG REPRESENTATION: pais='ESPA├?A # ┬║┬¬ ┬í┬┐?!'
EDIT:
I just added these settings to application.properties, without any success:
# HTTP encoding (HttpEncodingProperties)
spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
spring.http.encoding.force=true
I finally found the problem.
Basically it was related to the Tomcat 8 settings.
You can read more here:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q3
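The linked FAQ does not map one-to-one onto Spring Boot's embedded Tomcat, so as a hedged illustration only: in a Spring Boot 1.x application, one way to force UTF-8 on the embedded Tomcat connector looks roughly like this (for a standalone Tomcat 8 the equivalent is URIEncoding="UTF-8" on the Connector in server.xml):
// Hedged sketch (Spring Boot 1.x with embedded Tomcat): force UTF-8 on the connector.
@Bean
public TomcatEmbeddedServletContainerFactory servletContainer() {
    TomcatEmbeddedServletContainerFactory factory = new TomcatEmbeddedServletContainerFactory();
    factory.addConnectorCustomizers(new TomcatConnectorCustomizer() {
        @Override
        public void customize(Connector connector) {
            connector.setURIEncoding("UTF-8"); // affects how URL/query parameters are decoded
        }
    });
    return factory;
}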

Eclipse AST variable binding on standalone java application

I'm trying to use the Eclipse ASTParser to analyse and, if possible, add some code to some classes. Some of the information I need requires bindings, but because this is a standalone project (the final goal is a command-line tool, independent of Eclipse) I can't get them (resolveBinding() returns null).
After reading a lot of posts, the furthest I can get is using these examples with FileASTRequestor, but that doesn't seem to be the way to go, since it looks like we have to provide the binding KEY before generating the AST tree.
I've read somewhere that we can use the ASTParser.setEnvironment method to get bindings in a standalone Java application, but I don't think I'm doing it correctly. What's wrong with the code below?
private static final String rootDir = "D:\\workspace\\stateless\\";
private static final String[] classpath = java.lang.System.getProperty("java.class.path").split(";");
private static final String source =
        "package de.siemens.tools.stateless.test.examples; " +
        "public class ClassWithFinalMemberVariables {" +
        "private final int _memberIntVariable = 0;" +
        "public void method() {" +
        "int localVariable = 0;" +
        "System.out.println(_memberIntVariable + localVariable);" +
        "}" +
        "}";

public static void main(String[] args) throws CoreException {
    Document document = new Document(source);
    ASTParser parser = ASTParser.newParser(AST.JLS4);
    parser.setKind(ASTParser.K_COMPILATION_UNIT);
    parser.setEnvironment(classpath, new String[] { rootDir },
            new String[] { "UTF8" }, true);
    parser.setSource(document.get().toCharArray());
    parser.setResolveBindings(true);
    parser.setBindingsRecovery(true);

    CompilationUnit unit = (CompilationUnit) parser.createAST(null);
    unit.recordModifications();
    unit.accept(new ASTVisitor() {
        @Override
        public void endVisit(VariableDeclarationFragment node) {
            IVariableBinding bind = node.resolveBinding();
            if (bind == null)
                System.out.println("ERROR: bind is null");
            super.endVisit(node);
        }
    });
}
Output is always "ERROR: bind is null".
I've already solved it, the code is here:
http://pasteit.com/19433
Even though I prefer the ASTVisitor model, this one gives me every binding available.
And here is the discussion about the problem, for those of you who are curious: https://bugs.eclipse.org/bugs/show_bug.cgi?id=206391
EDIT: I have no idea whether this is the best solution or not; if you have any suggestions, please let me know.
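One detail that is easy to miss when bindings come back null in standalone use is the compilation unit name. A minimal sketch of the parser setup follows; the unit name here is an assumption based on the class declared in the source string:
ASTParser parser = ASTParser.newParser(AST.JLS4);
parser.setKind(ASTParser.K_COMPILATION_UNIT);
parser.setEnvironment(classpath, new String[] { rootDir }, new String[] { "UTF8" }, true);
// Without a unit name, resolveBinding() usually returns null for char[] sources
parser.setUnitName("ClassWithFinalMemberVariables.java"); // assumed name matching the class in the source
parser.setSource(source.toCharArray());
parser.setResolveBindings(true);
parser.setBindingsRecovery(true);
CompilationUnit unit = (CompilationUnit) parser.createAST(null);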

Last modified by detail for a file using SVNkit

Hi, how do I get the "last modified by" value for a file using SVNkit?
Scenario: the file has been updated from SVN and is available in the local repo (working copy).
You could use svn keywords (http://svnbook.red-bean.com/en/1.4/svn.advanced.props.special.keywords.html); 'modified by' corresponds to the author.
You have to ensure that the file containing the keywords is changed before every check-in. This could be done with an Ant script.
The keyword could be used in a constant with a second constant extracting the interesting part:
private static final String SVN_AUTHOR_BASE = "$Author: 113665 $";

/** Is filled in automatically on check-in. */
public static final String SVN_AUTHOR = SVN_AUTHOR_BASE
        .substring(9, SVN_AUTHOR_BASE.indexOf('$', 9) - 1);
public static String getLastModifiedBy(File localPath) throws SVNException {
    final SVNStatus status = SVNClientManager.newInstance().getStatusClient().doStatus(localPath, false);
    return status != null ? status.getAuthor() : null;
}
SVNProperties props = new SVNProperties();
repository.getFile(filePath, new Long(-1), props, null);
String author = props.getSVNPropertyValue("svn:entry:last-author").toString();
This is working fine.
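The snippet above assumes an already-initialized SVNRepository named repository; a minimal sketch of creating one, with the URL and credentials as placeholders:
DAVRepositoryFactory.setup(); // needed once for http:// and https:// URLs
SVNURL url = SVNURL.parseURIEncoded("https://svn.example.com/repo/trunk"); // placeholder URL
SVNRepository repository = SVNRepositoryFactory.create(url);
ISVNAuthenticationManager authManager = SVNWCUtil.createDefaultAuthenticationManager("user", "password"); // placeholder credentials
repository.setAuthenticationManager(authManager);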
