How to find a set of words in a string? - java

I am working on a Java and MySQL based application, and my task is to find a set of words in a string. The position of the words in the string does not matter, but all of them must be present.
Consider an example:
The string is "sector 10 , Delhi",
but I am trying to search by "Delhi sector 10",
or by "sector-10 Delhi" or "sector 10 , Delhi".
Please help me find this type of pattern in a string, using Java or a MySQL query.

You should use the full-text search feature of MySQL:
http://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
For example:
SELECT address FROM area WHERE MATCH(address) AGAINST ('+delhi -bombi' IN BOOLEAN MODE)

Use something like this:
Pattern p = Pattern.compile("\\b(Delhi|sector|10)\\b");
Matcher m = p.matcher("sector 10 , Delhi");
// find() returns one match at a time, so loop to see every word that occurs
while (m.find()) {
    System.out.println(m.group());
}
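The alternation above reports which of the words occur, but it does not by itself confirm that all of them are present. A minimal sketch of one way to check that in plain Java, assuming that words are separated by anything other than letters and digits and that case should be ignored (the class and method names are hypothetical):

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class WordSetSearch {

    // Split on anything that is not a letter or digit, so "sector-10" and
    // "sector 10 ," both yield the tokens "sector" and "10".
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // True when every word of the query occurs in the text,
    // regardless of order, case, or punctuation.
    static boolean containsAllWords(String text, String query) {
        return tokens(text).containsAll(tokens(query));
    }

    public static void main(String[] args) {
        String address = "sector 10 , Delhi";
        System.out.println(containsAllWords(address, "Delhi sector 10")); // true
        System.out.println(containsAllWords(address, "sector-10 Delhi")); // true
        System.out.println(containsAllWords(address, "sector 11 Delhi")); // false
    }
}
```

Note that an empty query matches every string in this sketch, and that this runs in the application rather than in the database, so it suits filtering rows that have already been fetched.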

Related

JPA Select query not returning results with one letter word

I have a query that returns no results when the search term starts with a one-letter word followed by a space and then another word (e.g. "T Distribution"). Searching for "Distribution" alone returns results, including those for "T Distribution". The behavior is the same for every search term that begins with a one-letter word followed by a space and another word.
The problem appears when the search term has this pattern:
"[one-letter][space][letter/word]", for example "o ring".
Why would the LIKE operator not work correctly in this case?
Here is my query:
@Cacheable(value = "filteredConcept")
@Query("SELECT NEW sina.backend.data.model.ConceptSummaryVer04(s.id, s.arabicGloss, s.englishGloss, s.example, s.dataSourceId,
s.synsetFrequnecy, s.arabicWordsCache, s.englishWordsCache, s.superId, s.categoryId, s.dataSourceCacheAr, s.dataSourceCacheEn,
s.superTypeCasheAr, s.superTypeCasheEn, s.area, s.era, s.rank, s.undiacritizedArabicWordsCache, s.normalizedEnglishWordsCache,
s.isTranslation, s.isGloss, s.arabicSynonymsCount, s.englishSynonymsCount) FROM Concept s
where s.undiacritizedArabicWordsCache LIKE %:searchTerm% AND data_source_id != 200 AND data_source_id != 31")
List<ConceptSummaryVer04> findByArabicWordsCacheAndNotConcept(@Param("searchTerm") String searchTerm, Sort sort);
the result of the query on the database itself:
link to screenshot
results on the database are returned no matter the letters case:
link to screenshot
I solved this problem.
It was caused by the minimum word length of the full-text index on the MySQL database, which was set to 2 (ft_min_word_len = 2), so one-letter words were not indexed.
I changed that setting and rebuilt the index. After that, the query returned one-letter words.
See: 12.9.6 Fine-Tuning MySQL Full-Text Search
Use quotes:
LIKE '%:searchTerm%';
Set searchTerm = "%your_word%" and use it in the query like this:
... s.undiacritizedArabicWordsCache LIKE :searchTerm ...
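If the wildcards are added on the Java side as suggested, the term can be wrapped before it is bound to :searchTerm. The following is only a sketch: the class and method names are hypothetical, and it assumes backslash is the LIKE escape character (MySQL's default), which may need an explicit ESCAPE clause with other databases or query languages.

```java
class LikePatterns {

    // Builds a "contains" pattern for LIKE. The characters % and _ are
    // wildcards in LIKE, so they are escaped to be matched literally.
    // Assumption: backslash is the escape character.
    static String contains(String term) {
        String escaped = term
                .replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_");
        return "%" + escaped + "%";
    }

    public static void main(String[] args) {
        System.out.println(contains("o ring"));  // %o ring%
        System.out.println(contains("50%_off")); // %50\%\_off%
    }
}
```

The returned value would then be passed as the searchTerm argument of the repository method, with the query using LIKE :searchTerm.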

Trying to find apt regex expression for my query

Pattern p1 = Pattern.compile("(?:^|)'([^']*?)'(?:$|)");
Matcher m = p1.matcher(input);
//Matcher m = p2.matcher(testcases);
while (m.find()) {
output += (m.group().replace("\'", "").trim() + "/");
}
Input
/content/folder[#name='folder query 2']/folder[#name='Share file
Zone']/folder[#name='steve']/folder[#name="steve's Personal
Folder"]/folder[#name='Backup']/folder[#name='20150317']/folder[#name='.Archive']
output should be -
folder query 2,
Share file zone,
steve,
steve's Personal folder,
Backup,
20150317,
.Archive
For some reason my regex seems to match only text in single quotes, so it handles neither the apostrophe in steve's nor the double-quoted name. I am trying to format the query, so all I need is the folder names, whether they are in single or double quotes, without being tripped up by the apostrophe.
Use the following regex: (['"])(.*?)\1
It matches the opening quote (single or double), capturing that character as capture #1, captures the text as capture #2, and ends with the same kind of quote used at the beginning, by matching the capture #1.
Remember to escape " and \ when writing it as a Java string literal.
Test
String input = "/content/folder[#name='folder query 2']/folder[#name='Share file Zone']/folder[#name='steve']/folder[#name=\"steve's Personal Folder\"]/folder[#name='Backup']/folder[#name='20150317']/folder[#name='.Archive']";
for (Matcher m = Pattern.compile("(['\"])(.*?)\\1").matcher(input); m.find(); )
System.out.println(m.group(2));
Output
folder query 2
Share file Zone
steve
steve's Personal Folder
Backup
20150317
.Archive
You can capture everything between [#name=['\"] and ['\"]]. Written as a Java string literal, the regex looks like this: \\[#name=['\"](.*?)['\"]]
Pattern p1 = Pattern.compile("\\[#name=['\"](.*?)['\"]]");
Matcher m = p1.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Output
folder query 2
Share file Zone
steve
steve's Personal Folder
Backup
20150317
.Archive

extracting a particular field from url

I want to extract particular fields from the URL of a Facebook page. I am not able to extract them because the link format is not fixed. For example, given the link below as input, it should produce the desired output:
1) https://www.facebook.com/pages/Ice-cream/109301862430120?rf=102173023157556
Output: 109301862430120
What about this type of link
Can anyone help me?
So in short, you want to get the name after the last / and (if there is one) before the ? mark.
You can do it using the URI and File classes, like this:
String data = "https://www.facebook.com/pages/Anti-Christian-sentiment/149675731889496?ref=br_tf";
// new URI(...) throws the checked URISyntaxException, so catch or declare it
System.out.println(new File(new URI(data).getRawPath()).getName());
Output: 149675731889496
If you need to use a regex, then you can use
([^/?]+)(\\?|$)
and just read the content of group 1 (the one in the first pair of parentheses).
If you don't want to use groups, and want the regex to match only the digit part (without including ? in the match), then you can use a look-around mechanism such as look-ahead (?=...). The regex would look like
[^/?]+(?=\\?|$)
Code example:
String data = "https://www.facebook.com/pages/Anti-Christian-sentiment/149675731889496?ref=br_tf";
Pattern p = Pattern.compile("([^/?]+)(\\?|$)");
Matcher m = p.matcher(data);
if (m.find()){
System.out.println(m.group(1));
}
Output:
149675731889496
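For comparison, here is a sketch of the look-ahead variant wrapped in a small helper. The class and method names are hypothetical, and it assumes the URL has no trailing slash and no #fragment after the ID:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastPathSegment {

    // A run of characters other than '/' and '?' that is followed by
    // either '?' or the end of the input. The look-ahead keeps the '?'
    // out of the match, so no capturing group is needed.
    private static final Pattern LAST_SEGMENT = Pattern.compile("[^/?]+(?=\\?|$)");

    // Returns the last path segment, or null when nothing matches.
    static String extract(String url) {
        Matcher m = LAST_SEGMENT.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String url = "https://www.facebook.com/pages/Ice-cream/109301862430120?rf=102173023157556";
        System.out.println(extract(url)); // 109301862430120
    }
}
```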

How to get all names and dates of birth from a specific file using Java

Hi, below is my text file:
welcome to java training
program
Name rtrti*&*
John
address india say^%$7
Date of Birth
11/12/1989
I have 100 files like the one above. The text above was extracted from image files, so it is not in order. From this I need to get the names and dates of birth. Can you please suggest how to do this? I am new to this task.
Required output
John
11/12/1989
I have tried
Pattern p = Pattern.compile("Name");
Matcher matcher = p.matcher(content);
matcher.find();
But I have no idea how to get the line after the matched pattern. I cannot read the file line by line, because I need to store the entire text in a single string.
I'll give a few hints that will get you on track. Without more details regarding the expected input, it will be difficult to give you a solid solution. First, I trust that you are already familiar with the Pattern and Matcher javadocs. You will need to understand the Groups and capturing section. Finally, you can utilize DOTALL mode which will allow the . character to match newlines.
To get you started, the following should work to find the name:
Pattern p = Pattern.compile(
"(?s)" + // DOTALL
".*" + // Match anything (to consume everything before 'Name')
"Name" + // Match the literal 'Name'
".*?" + // Reluctantly grab everything until...
"\n" + // Newline is reached
"\\s*" + // Consume leading whitespace
"(\\S+)" // Capture at least one non-whitespace character
);
Matcher m = p.matcher(content);
if(m.find()) {
String name = m.group(1); // The first capturing group contains "John"
}
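Building on the same idea, the sketch below pulls both fields out of a single string. It rests on assumptions the question does not confirm: the name is the first token on the line after a line beginning with "Name", and the date follows "Date of Birth" in a digits/digits/four-digit-year form. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OcrFields {

    // A line starting with "Name", then the first token of the next line.
    // (?m) lets ^ match at the start of each line; \R is any line break.
    private static final Pattern NAME =
            Pattern.compile("(?m)^Name\\b.*\\R\\s*(\\S+)");

    // "Date of Birth", optional whitespace or line breaks, then a date.
    private static final Pattern DATE_OF_BIRTH =
            Pattern.compile("Date of Birth\\s*(\\d{1,2}/\\d{1,2}/\\d{4})");

    // Returns the first capturing group of the first match, or null.
    static String firstGroup(Pattern pattern, String content) {
        Matcher m = pattern.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    static String name(String content) {
        return firstGroup(NAME, content);
    }

    static String dateOfBirth(String content) {
        return firstGroup(DATE_OF_BIRTH, content);
    }

    public static void main(String[] args) {
        String content = "welcome to java training\nprogram\nName rtrti*&*\nJohn\n"
                + "address india say^%$7\nDate of Birth\n11/12/1989\n";
        System.out.println(name(content));        // John
        System.out.println(dateOfBirth(content)); // 11/12/1989
    }
}
```

Because the text comes from image extraction, real files may break these assumptions (for example, the label and the value on the same line), so the patterns would need adjusting against actual samples.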

How to check whether a string contains a year starting with 20.. or 19..

I have a string, and I want to check whether it contains a substring like the ones below, and get the year number (e.g. 2012) if it exists. (I am trying to get the years from a string.)
"from 20##" , "From 20##" , "from 19##" or "From 19##"
and
"to 20##" , "To 20##" , "to 19##" or "To 19##", where # can be any single digit.
Please tell me how to do this.
Use a regular expression to search for the pattern.
You can match any digit with \d in the regular expression.
So, if you are searching for 20##, your search pattern will be 20\d\d (written as "20\\d\\d" in a Java string literal).
Have a look at this tutorial if you still have doubts.
All you really need is
int year = Integer.parseInt(str.substring(str.length() - 4));
and you are free to check that year >= 1900 && year < 2100.
If you need to assert that the string matches your pattern, you can test for
str.matches("([Ff]rom|[Tt]o)\\s+\\d{4}")
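Since matches() only succeeds when the whole string fits the pattern, a longer sentence needs find() instead. A minimal sketch that collects every year preceded by "from" or "to", assuming only years beginning with 19 or 20 are wanted (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YearFinder {

    // "from"/"From"/"to"/"To", whitespace, then a four-digit year that
    // starts with 19 or 20. The \b anchors reject longer words and numbers.
    private static final Pattern YEAR =
            Pattern.compile("\\b(?:[Ff]rom|[Tt]o)\\s+((?:19|20)\\d{2})\\b");

    // Returns the years in the order they appear in the text.
    static List<Integer> findYears(String text) {
        List<Integer> years = new ArrayList<>();
        Matcher m = YEAR.matcher(text);
        while (m.find()) {
            years.add(Integer.parseInt(m.group(1)));
        }
        return years;
    }

    public static void main(String[] args) {
        System.out.println(findYears("Worked there from 2009 to 2012")); // [2009, 2012]
        System.out.println(findYears("from 1850"));                      // []
    }
}
```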
