Neo4j - Java heap space. Wrong query or settings?

I have a problem with Neo4j.
I don't know whether the problem is my query or something else.
Intro
I have to build an application that stores bus/train routes.
This is my schema:
Nodes:
Organization: a company that has routes, buses, etc.
Route: a bus route, e.g. Paris - Berlin.
Vehicle (a bus in this case): a physical bus with a unique license plate.
Stop: a point on a map with latitude and longitude.
Important Relationships:
NEXT: This is a really important relationship.
NEXT relationships contain these properties:
startHour
startMinutes
endHour
endMinutes
dayOfWeek (from 0 to 6 - Sun, Mon etc..)
vehicleId
Problem
My query is:
MATCH (s1:Stop {id: {departureStopId}}), (s2:Stop {id: {arrivalStopId}})
OPTIONAL MATCH (s1)-[nexts:NEXT*]->(s2)
WHERE ALL(i in nexts WHERE toInt(i.dayOfWeek) = {dayOfWeek} AND toInt(i.startHour) >= {hour})
RETURN nexts
LIMIT 10
For example: I want to find all NEXT relationships where dayOfWeek is Sunday (0) and the startHour property is > 11.
After that I usually parse and validate the final object on my Node.js backend.
This worked at the start, with 1k relationships.
Now I have 10k relationships and my queries either time out or take 30 s to complete, which is far too long.
I have no idea how to solve this.
I run Neo4j with Docker and I tried to read the settings docs, but I have no idea how Java works.
Can you help me?
UPDATE
Thank you all!
For now I solved it with "allShortestPaths", but I think I will rename all relationships (as Michael Hunger suggested).

Have you tried:
MATCH p=allShortestPaths((s1:Stop {id: {departureStopId}})-[:NEXT*]-> (s2:Stop {id: {arrivalStopId}}) )
WHERE ALL(i in RELS(p) WHERE toInt(i.dayOfWeek) = {dayOfWeek} AND toInt(i.startHour) >= {hour})
RETURN rels(p) as nexts
LIMIT 10
This should use the fast shortest path algorithm because:
Planning shortest paths in Cypher can lead to different query plans depending on the predicates that need to be evaluated. Internally, Neo4j will use a fast bidirectional breadth-first search algorithm if the predicates can be evaluated whilst searching for the path.
See https://neo4j.com/docs/developer-manual/current/cypher/execution-plans/shortestpath-planning/#_shortest_path_with_fast_algorithm for more details.

Can you share the PROFILE output of your query?
I presume you have a constraint on :Stop(id).
I would use shortest path or Dijkstra with costs instead of OPTIONAL MATCH.
OPTIONAL MATCH will try to find ALL such paths, of which there are hundreds of millions, and filter them as it goes.
It might also make sense to group your NEXT relationships by day of week, e.g. :NEXT_MO, :NEXT_THU, so you only look at 1/7th of the data.
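If you go the per-day relationship route, the type name has to be built into the query text on the client side, because in the Cypher version used here a relationship type cannot be passed as a query parameter. A minimal Java sketch, assuming type names like NEXT_SUN ... NEXT_SAT (the class name, method names, and suffixes are made up for illustration; the question's dayOfWeek convention of 0 = Sunday is kept):

```java
final class NextRelTypes {
    // Assumed naming scheme; any consistent set of suffixes would do.
    private static final String[] SUFFIX = {"SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"};

    /** Maps dayOfWeek (0 = Sunday .. 6 = Saturday) to a relationship type name. */
    static String forDay(int dayOfWeek) {
        if (dayOfWeek < 0 || dayOfWeek > 6) {
            throw new IllegalArgumentException("dayOfWeek must be 0..6, got " + dayOfWeek);
        }
        return "NEXT_" + SUFFIX[dayOfWeek];
    }

    /**
     * Builds the shortest-path query for one day. Only the relationship type is
     * concatenated (it comes from the fixed array above, never from user input);
     * stop ids and hour stay as query parameters.
     */
    static String shortestPathQuery(int dayOfWeek) {
        return "MATCH p=allShortestPaths((s1:Stop {id: {departureStopId}})-[:"
                + forDay(dayOfWeek)
                + "*]->(s2:Stop {id: {arrivalStopId}})) "
                + "WHERE ALL(i IN rels(p) WHERE toInt(i.startHour) >= {hour}) "
                + "RETURN rels(p) AS nexts LIMIT 10";
    }
}
```

The dayOfWeek predicate disappears from the WHERE clause because the relationship type already encodes it.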

It's not settings; it's the fact that your query must visit each and every node in the graph in order to satisfy the query.
The problem would show itself in a relational database when a TABLE SCAN had to be used instead of an index.
I think the solution is to add buckets for hours, like you already have for days. If you need minutes, make 96 fifteen-minute buckets to cover a day. That will give the query optimizer its best chance.
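To make the bucket idea concrete, here is a small Java sketch of how a start time could be mapped to one of the 96 fifteen-minute buckets before it is stored on the relationship. The class and method names are hypothetical, and storing the result in a property (or relationship type) is an assumption about how you would use it:

```java
final class TimeBuckets {
    static final int BUCKETS_PER_DAY = 96; // 24 hours * 4 quarter-hours

    /** Returns the bucket index 0..95 for a time of day. */
    static int bucketOf(int hour, int minute) {
        if (hour < 0 || hour > 23 || minute < 0 || minute > 59) {
            throw new IllegalArgumentException("Invalid time: " + hour + ":" + minute);
        }
        // Four buckets per hour; integer division picks the quarter within the hour.
        return hour * 4 + minute / 15;
    }
}
```

A query for "departures at or after 11:30" then becomes a comparison against a single integer, bucketOf(11, 30), instead of a combination of hour and minute checks.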

Related

How to implement database engine UPDATE command

I am developing a simple database engine in Java (using text files as tables) and I have to implement code for CRUD operations. I have successfully written code for CREATE and INSERT commands already. Now I want to continue with UPDATE which should look like this:
UPDATE table-name SET attribute-name=literal {,attribute-name=literal} WHERE condition
But I have an issue here, I am stuck with "condition". How can I approach the implementation of a condition? (WHERE attr1 = something AND attr2 >= something OR . . .) I will very much appreciate your feedback.
Best regards.
The WHERE part is always the most important component of any database system. To find all the records satisfying the conditions in the WHERE part, you should build proper indexes for every column included in the condition.
For example, given WHERE attr1 = something AND attr2 >= something OR ..., the columns attr1 and attr2 should be indexed; otherwise the query will take a terribly long time to run.
Index techniques include hash indexes (for key-value lookups), B+ tree indexes, and their derived implementations.
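As for evaluating the condition itself, one possible approach (a minimal sketch, not the only design) is to turn each comparison into a predicate over a row and combine the predicates with AND/OR. The class name Where, the compare helper, and the choice to model a row as a Map of integer attributes are all assumptions made for illustration:

```java
import java.util.Map;
import java.util.function.Predicate;

final class Where {
    /**
     * Builds a predicate for a single comparison such as attr2 >= 10.
     * A row is modelled as attribute name -> integer value (an assumption;
     * a real engine would support more types).
     */
    static Predicate<Map<String, Integer>> compare(String attr, String op, int literal) {
        return row -> {
            Integer value = row.get(attr);
            if (value == null) {
                return false; // a missing attribute never matches
            }
            switch (op) {
                case "=":  return value == literal;
                case "<>": return value != literal;
                case "<":  return value < literal;
                case "<=": return value <= literal;
                case ">":  return value > literal;
                case ">=": return value >= literal;
                default:
                    throw new IllegalArgumentException("Unknown operator: " + op);
            }
        };
    }
}
```

A parser for the WHERE text would build these predicates and combine them with Predicate.and and Predicate.or, applying AND before OR as SQL does, for example Where.compare("attr1", "=", 5).and(Where.compare("attr2", ">=", 10)). UPDATE then loops over the rows (or over the rows an index returns) and rewrites those for which the predicate is true.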

Exact Match in SOLR 5.1

I have set up Solr 5.1.0 with data import from a MySQL database. It is working well.
But I want exact-match results, or only results relevant to the exact match.
For example:
Dancers in Mumbai
It returns all results that contain "dancers + mumbai", as well as those with only "dancers" or only "mumbai". I want results that contain both "dancers" and "mumbai", not the others.
This is not a complete answer, but it's the direction I'm trying to take with a similar problem. Comments are very welcome.
Step 1:
Implement multiple Solr cores, core 1 is "jobs" (dancers/lawyers/etc), and core 2 is "cities" (mumbai/chennai/etc).
Step 2:
Query each core for exact matches, so implement the KeywordTokenizerFactory on the relevant field to find exact matches only. This will give you all the matches across cores (e.g. jobs: dancers and cities:mumbai).
Step 3:
Perform your general query using EDisMax for a user-friendly search (e.g. searching for "dancers in mumbai" across many fields), and use the boost field to boost the jobs/cities found in the earlier query.
I would love to know if there is a better way of doing something this elaborate, but I have not found it yet. Hope it helps.
Using required terms like: +dancers +mumbai
Or a phrase query: "dancers in mumbai"
Would work.
You can also set the default operator for your query to be "AND", using the q.op parameter.
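If the query string is assembled in application code, the two forms above can be produced with a couple of small helpers. This is only a sketch: the class and method names are hypothetical, and it assumes the user input consists of plain words (it does not escape Lucene special characters such as + - : or parentheses).

```java
final class SolrQueries {
    /** Turns "dancers mumbai" into "+dancers +mumbai" so every term is required. */
    static String requireAll(String userInput) {
        StringBuilder query = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // happens only when the input is blank
            }
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append('+').append(term);
        }
        return query.toString();
    }

    /** Wraps the input in double quotes to form a phrase query. */
    static String phrase(String userInput) {
        // Embedded quotes are dropped so the phrase cannot be closed early.
        return "\"" + userInput.trim().replace("\"", "") + "\"";
    }
}
```

Note that requireAll also makes words like "in" mandatory; whether that matters depends on the stop-word handling of the field being searched.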

How to force hibernate to search for values with different patterns?

First of all, this question is not about ignoring white space at the beginning or end of strings, so it is not a duplicate.
I have a mobile field in the database whose values are in different formats such as xxx xxx xxx, xxxxxxxxx, x xxx xxx xx, etc. How can I make a Hibernate criteria query ignore the formatting of the strings?
For example, let's say the number in the database is 344 555 666:
searching for 344555666 fails
searching for 344 555 666 fails
searching for 344 succeeds (the first three digits, which are not separated by a space in the database!)
However, the numbers are all present in the database, and every one of the aforementioned values should return 344 555 666 as its result.
Another example:
Let's say a user searches for all phone numbers that include 12345; the DB then returns 12345678, 12345987 and 12345768, and I need to format these three numbers before showing them to the user.
Code
...
private String mobile;
....
Hibernate
.add(Restrictions.ilike("user.mobile", number));
PVR's answer is useful, but what if in the future I need to add a new format like XXX-XXX-XXX or X-XXXX-XXXX-XXXX? Please also note that there is only one field in which the user enters the search value.
Try using the following:
criteria.add(Restrictions.ilike(
"user.mobile", number, MatchMode.ANYWHERE));
Edit :
I meant that if the number in the database can only be in one of the formats XXX XXX XXXX / XXXXXXXXXX, then we need to write specific logic that checks the database for both formats.
number1 : in the format XXX XXX XXXX
number2 : in the format XXXXXXXXXX
criteria.add(Restrictions.or(
Restrictions.ilike("user.mobile", number1, MatchMode.ANYWHERE),
Restrictions.ilike("user.mobile", number2, MatchMode.ANYWHERE)));
When facing such a problem, I usually reverse it. Currently, a single column of your database (mobile) holds values in different formats (xxx xxx xxx, xxxxxxxxx, x xxx xxx xx, etc.), which makes the column hard to search.
You should still allow mobile numbers to be entered in all those formats, but carefully rewrite them into one single format, say 123456789, before writing them to the database. This way the reformatting occurs only when inserting new records or on updates, and I assume you will have many more reads than writes.
If you already have records in your database, you should consider using a batch job to transform them in one single operation.
Once you have only one format, you can put an index on the column, which could speed up select queries by orders of magnitude as soon as you have thousands of records.
When you want to do a search, allow any format for the user's input and apply the same transformation that you apply on insert. For example, if a user enters 123 456 789 or 123-456-789 or any other accepted format, your search code transforms it into 123456789 and queries with that value (using the index).
From the user's point of view, they can still enter input however they want, and the responses may simply come faster. The only drawback is that you will display a standardized version of the value rather than the one they entered.
From your point of view (as the programmer), you get something simpler to write and maintain, with less stress on the database.
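The transformation itself can be very small. A minimal Java sketch, assuming the canonical format is "digits only" (the class and method names are hypothetical, and a leading + for international numbers is dropped along with every other non-digit):

```java
final class PhoneNumbers {
    /**
     * Reduces any accepted input format to digits only, e.g. for use both
     * before INSERT/UPDATE and before building the search criterion.
     */
    static String normalize(String raw) {
        if (raw == null) {
            return "";
        }
        // \D matches every character that is not a digit (spaces, dashes, dots, ...).
        return raw.replaceAll("\\D", "");
    }
}
```

Because the same method is applied on write and on search, adding a new input format such as XXX-XXX-XXX later does not require any change to the query.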
Did you try Projections.sqlProjection?
You can use REPLACE(mobile, ' ', '') inside it.
I know this is an answer that could eat up your DB resources; you can
test it and check whether it meets your needs.
I've done phone number formatting before, and the solution you are looking for could be difficult. If you have to search using a regex, I would construct the regex in code and search with it in the DB. (Oracle has a REGEXP_LIKE function; you may want to use that instead of Hibernate's ilike.)
eg phone number from client +333 555 9999, phone number in db: +3 33 555 9999
Construct the following regex based on what client sends:
/\+[\s.-]*3[\s.-]*3[\s.-]*3[\s.-]*5[\s.-]*5[\s.-]*5[\s.-]*9[\s.-]*9[\s.-]*9[\s.-]*9[\s.\-\w]*/
This says that between the digits there may be any number of dots (.), spaces (\s) and hyphens (-), and that the number may end with a trailing run of those characters plus digits and letters (e.g. x234 or ext2342).
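Constructing that kind of regex from the client's input can be sketched in Java as follows. The class and method names are hypothetical; the sketch makes the leading + optional and leaves out the trailing extension part, both of which are simplifications on my side:

```java
import java.util.regex.Pattern;

final class PhoneRegex {
    // Any run of spaces, dots, or hyphens (possibly empty) between digits.
    private static final String SEPARATORS = "[\\s.-]*";

    /** Builds a regex that matches the input's digits with arbitrary separators. */
    static String toPattern(String clientInput) {
        String digits = clientInput.replaceAll("\\D", "");
        StringBuilder regex = new StringBuilder("\\+?").append(SEPARATORS);
        for (char digit : digits.toCharArray()) {
            regex.append(digit).append(SEPARATORS);
        }
        return regex.toString();
    }

    /** Checks in Java whether a stored value matches the client's number. */
    static boolean matches(String clientInput, String storedValue) {
        return Pattern.matches(toPattern(clientInput), storedValue);
    }
}
```

The string returned by toPattern is what you would hand to the database's regex function; the matches helper only shows the same check done in Java.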
From your conversation with PVR, it seems the format of the phone number can be anything.
The Hibernate framework is based on patterns. It cannot handle arbitrary formats on its own.
It's advisable not to include the phone-number criterion in Hibernate. Execute your criteria query without the phone number, and then filter the results with Java logic.
However, the best solution is to make your design more solid. Adding a constraint on the format of the phone number is the best practice. You can consider adding validation of the phone format.
You can write your own Criterion by implementing the Criterion interface.
In your toSqlString method, just use the replace function of your database. AFAIK replace(str, needle, replacement) is a SQL99 standard function, so it should work in today's DBMSs.

Find most common words in sql

I have a new problem. I have a database with a column that contains a wide variety of text. Is there any way I can get SQL to tell me which are the 10 most common words used in these fields? As an example:
1 I am coming home a bit late today.
2 Train is running late.
3 What is the train schedule like today?
4 Snow is really bad right now.
And output optimally would be:
is: 3
late : 2
train: 2
today: 2
If it is not possible to do it with SQL, what else would you suggest I look into to get this information?
This might technically be doable in SQL, but it will be painful and very slow when you have more rows in your database.
The problem you are describing is a perfect use case for an indexing engine, though, such as Lucene (I used this one as an example since your question first contained the tag 'java' before being edited).
One option is to use a table-valued split function that returns each word as a row, count the words, and sort them by count in descending order.
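If pulling the rows into application code is acceptable, the same split / count / sort idea can be sketched in plain Java. The class and method names are hypothetical, and the tokenization rule (lower-case, split on anything that is not a letter, digit, or apostrophe) is an assumption:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

final class WordCounts {
    /** Returns the most frequent words, highest count first, ties in alphabetical order. */
    static List<Map.Entry<String, Long>> topWords(List<String> rows, int limit) {
        return rows.stream()
                // Split every row into lower-cased words.
                .flatMap(row -> Arrays.stream(
                        row.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+")))
                .filter(word -> !word.isEmpty())
                // Count occurrences of each word.
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()))
                .entrySet().stream()
                // Sort by count descending, then by word for a stable order.
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

For the four example rows in the question, the first four entries are is (3), late (2), today (2) and train (2). This holds all distinct words in memory, so for very large tables an indexing engine is still the better fit.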

Matching inexact company names in Java

I have a database of companies. My application receives data that references a company by name, but the name may not exactly match the value in the database. I need to match the incoming data to the company it refers to.
For instance, my database might contain a company with name "A. B. Widgets & Co Ltd." while my incoming data might reference "AB Widgets Limited", "A.B. Widgets and Co", or "A B Widgets".
Some words in the company name (A B Widgets) are more important for matching than others (Co, Ltd, Inc, etc). It's important to avoid false matches.
The number of companies is small enough that I can maintain a map of their names in memory, ie. I have the option of using Java rather than SQL to find the right name.
How would you do this in Java?
You could standardize the formats as much as possible in your DB/map & input (i.e. convert to upper/lowercase), then use the Levenshtein (edit) distance metric from dynamic programming to score the input against all your known names.
You could then have the user confirm the match & if they don't like it, give them the option to enter that value into your list of known names (on second thought--that might be too much power to give a user...)
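For reference, the edit distance mentioned above is short to write by hand. A minimal Java sketch using the standard dynamic-programming formulation with two rolling rows (the class name, the normalize helper, and its "lower-case, keep only letters and digits" rule are assumptions for illustration):

```java
import java.util.Locale;

final class EditDistance {
    /** Standardizes a name before scoring: lower-case, letters and digits only. */
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    /** Levenshtein distance: minimum number of single-character edits from a to b. */
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + substitution);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }
}
```

To match an incoming name, normalize it, compute the distance to every normalized name in the in-memory map, and accept the closest one only if its distance is below a threshold you choose; a strict threshold helps avoid false matches.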
Although this thread is a bit old, I recently did an investigation on the efficiency of string distance metrics for name matching and came across this library:
https://code.google.com/p/java-similarities/
If you don't want to spend ages implementing string distance algorithms, I recommend giving it a try as a first step. About 20 different algorithms are already implemented (incl. Levenshtein, Jaro-Winkler, Monge-Elkan, etc.), and its code is structured well enough that you don't have to understand the whole logic in depth; you can start using it in minutes.
(BTW, I'm not the author of the library, so kudos to its creators.)
You can use an LCS algorithm to score them.
I do this in my photo album to make it easy to email in photos and get them to fall into security categories properly.
LCS code
Example usage (guessing a category based on what people entered)
I'd do LCS ignoring spaces, punctuation, case, and variations on "co", "llc", "ltd", and so forth.
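A sketch of that idea in Java, using longest common subsequence length on normalized names. The class name, the exact noise-word list, and the score formula (twice the LCS length divided by the combined length) are my assumptions for illustration, not the code linked above:

```java
import java.util.Locale;

final class NameLcs {
    /** Lower-cases, drops punctuation, noise words such as "co"/"ltd", and spaces. */
    static String normalize(String name) {
        String s = name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9 ]", " ");
        // Assumed noise-word list; extend it to fit your data.
        s = s.replaceAll("\\b(co|llc|ltd|limited|inc|and)\\b", " ");
        return s.replaceAll("\\s+", "");
    }

    /** Length of the longest common subsequence of a and b. */
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.length()][b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means the normalized names are identical. */
    static double score(String x, String y) {
        String a = normalize(x);
        String b = normalize(y);
        int total = a.length() + b.length();
        return total == 0 ? 0.0 : 2.0 * lcsLength(a, b) / total;
    }
}
```

With this normalization, "A. B. Widgets & Co Ltd." and "AB Widgets Limited" both reduce to "abwidgets" and score 1.0.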
Have a look at Lucene. It's an open source full text search Java library with 'near match' capabilities.
Your database may support the use of regular expressions (regex). See below for some tutorials in Java; here's the link to the MySQL documentation (as an example):
http://dev.mysql.com/doc/refman/5.0/en/regexp.html#operator_regexp
You would probably want to store in the database a fairly complex regular expression for each company that encompasses the variations in spelling that you might anticipate, or the sub-elements of the company name that you would like to weight as significant.
You can also use the regex library in Java
JDK 1.4.2
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
JDK 1.5.0
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Matcher.html
Using Regular Expressions in Java
http://www.regular-expressions.info/java.html
The Java Regex API Explained
http://www.sitepoint.com/article/java-regex-api-explained/
You might also want to see if your database supports Soundex capabilities (for example, see the following link to MySQL)
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex
To be more precise: rather than Longest Common Subsequence, Longest Common Substring should give better results, as the order of characters is important and a substring additionally requires the matching characters to be contiguous.
You could use Lucene to index your database, then query the Lucene index. There are a number of search engines built on top of Lucene, including Solr.
