Regular expression to match start of an address - java

I would like to match the start of an address in java. I have tried with this website (http://www.regexplanet.com/advanced/java/index.html) and it did match the address but the very minute i tried it in netbean it did not.
any idea why?
Pattern p = Pattern.compile("\bcloud.*");
Matcher m = p.matcher("cloud (cloud.yahoo.com:225) - v0.00014 ( jan 10 1999 / 24:12:56 )");
System.out.println(m.matches());

\ should be escaped. Otherwise \b is interpreted as BACKSPACE instead of word boundary.
Pattern p = Pattern.compile("\\bcloud.*");
// ^^^
See http://ideone.com/1rdLg6

Related

Regex to extract hashtags with two dot-separated parts

I'm trying to create a regular expression in order to extract some text from strings. I want to extract text from urls or normal text messages e.g.:
endpoint/?userId=#someuser.id
OR
Hi #someuser.name, how are you?
And from both I want to extract exactly #someuser.name from message and #someuser.id from url. There might be be many of those string to extract from the url and messages.
My regular expression currently looks like this:
(#[^\.]+?\.)([^\W]\w+\b)
It works fine, except one for one case and I don't know how to do it - e.g.:
Those strings SHOULD NOT be matched: # .id, #.id. There must be at least one character between # and .. One or more spaces between those characters should not be matched.
How can I do it using my current regex?
You may use
String regex = "#[^.#]*[^.#\\s][^#.]*\\.\\w+";
See the regex demo and its graph:
Details
# - a # symbol
[^.#]* - zero or more chars other than . and #
[^.#\\s] - any char but ., # and whitespace
[^#.]* - - zero or more chars other than . and #
\. - a dot
\w+ - 1+ word chars (letters, digits or _).
Java demo:
String s = "# #.id\nendpoint/?userId=#someuser.id\nHi #someuser.name, how are you?";
String regex = "#[^.#]*[^.#\\s][^#.]*\\.\\w+";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(0));
}
Output:
#someuser.id
#someuser.name
You can try the following regex:
#(\w+)\.(\w+)
demo
Notes:
remove the parenthesis if you do not want to capture any group.
in your java regex string you need to escape every \
this gives #(\\w+)\\.(\\w+)
if the id is only made of numbers you can change the second \w by [0-9]
if the username include other characters than alphabet, numbers and underscore you have to change \w into a character class with all the authorised characters defined explicitly.
Code sample:
String input = "endpoint/?userId=#someuser.id Hi #someuser.name, how are you? # .id, #.id.";
Matcher m = Pattern.compile("#(\\w+)\\.(\\w+)").matcher(input);
while (m.find()) {
System.out.println(m.group());
}
output:
#someuser.id
#someuser.name
The redefined requirements are:
We search for pattern #A.B
A can be anything, except for only whitespaces, nor may it contain # or .
B can only be regular ASCII letters or digits
Converting those requirements into a (possible) regex:
#[^.#]+((?<!#\\s+)\\.)[A-Za-z0-9]+
Explanation:
#[^.#]+((?<!#\\s+)\\.)[A-Za-z0-9]+ # The entire capture for the Java-Matcher:
# # A literal '#' character
[^.#]+ # Followed by 1 or more characters which are NOT '.' nor '#'
( \\.) # Followed by a '.' character
(?<! ) # Which is NOT preceded by (negative lookbehind):
# # A literal '#'
\\s+ # With 1 or more whitespaces
[A-Za-z0-9]+ # Followed by 1 or more alphanumeric characters
# (PS: \\w+ could be used here if '_' is allowed as well)
Test code:
String input = "endpoint/?userId=#someuser.id Hi #someuser.name, how are you? # .id #.id %^*##*(.H(#EH Ok, # some spaces here .but none here #$p€©ï#l.$p€©ï#l that should do it..";
System.out.println("Input: \""+ input + '"');
System.out.println("Outputs: ");
java.util.regex.Matcher matcher = java.util.regex.Pattern.compile("#[^.#]+((?<!#\\s+)\\.)[A-Za-z0-9]+")
.matcher(input);
while(matcher.find())
System.out.println('"'+matcher.group()+'"');
Try it online.
Which outputs:
Input: "endpoint/?userId=#someuser.id Hi #someuser.name, how are you? # .id #.id %^*##*(.H(#EH Ok, # some spaces here .but none here #$p€©ï#l.$p€©ï#l that should do it.."
Outputs:
"#someuser.id"
"#someuser.name"
"##*(.H"
"# some spaces here .but"
#(\w+)[.](\w+)
results two groups, e.g
endpoint/?userId=#someuser.id -> group[0]=someuser and group[1]=id

How to capture multiple groups in regex?

I am trying to capture following word, number:
stxt:usa,city:14
I can capture usa and 14 using:
stxt:(.*?),city:(\d.*)$
However, when text is;
stxt:usa
The regex did not work. I tried to apply or condition using | but it did not work.
stxt:(.*?),|city:(\d.*)$
You may use
(stxt|city):([^,]+)
See the regex demo (note the \n added only for the sake of the demo, you do not need it in real life).
Pattern details:
(stxt|city) - either a stxt or city substrings (you may add \b before the ( to only match a whole word) (Group 1)
: - a colon
([^,]+) - 1 or more characters other than a comma (Group 2).
Java demo:
String s = "stxt:usa,city:14";
Pattern pattern = Pattern.compile("(stxt|city):([^,]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Looking at your string, you could also find the word/digits after the colon.
:(\w+)

Extraction of subsequences that end with point and space by regular expression

hy
I want to extract sub sentences of this sentence by regular expression:
it learn od fg network layout. kdsjhuu ddkm networ.12kfdf. learndfefe layout. learn sdffsfsfs. sddsd learn fefe.
I couldn't write a correct regular expression for Pattern.compile.
This is my expression:([^(\\.\\s)]*)([^.]*\\.)
Actually, i need a way for writing "read everthing except \\.\\s
sub sentences:
it learn od fg network layout.
kdsjhuu ddkm networ.12kfdf.
learndfefe layout.
learn sdffsfsfs.
sddsd learn fefe.
Just split your string with regex "\\. "
String[] arr= str.split("\\. ");
You can use this pattern with the find method:
Pattern p = Pattern.compile("[^\\s.][^.]*(?:\\.(?!\\s|\\z)[^.]*)*\\.?");
Matcher m = p.matcher(yourText);
while(m.find()) {
System.out.println(m.group(0));
}
Pattern details:
[^\\s.] # all that is not a whitespace (to trim) or a dot
[^.]* # all that is not a dot (zero or more times)
(?: # open a non-capturing group
\\. (?!\\s|\\z) # dot not followed by a whitespace or the end of the string
[^.]* #
)* # close and repeat the group as needed
\\.? # an optional dot (allow to match a sentence at the end
# of the string even if there is no dot)

Regex match for string literal including escape sequence

This works just fine for normal string literal ("hello").
"([^"]*)"
But I also want my regex to match literal such as "hell\"o".
This what i have been able to come up with but it doesn't work.
("(?=(\\")*)[^"]*")
here I have tried to look ahead for <\">.
How about
Pattern.compile("\"((\\\\\"|[^\"])*)\"")//
^^ - to match " literal
^^^^ - to match \ literal
^^^^^^ - will match \" literal
or
Pattern.compile("\"((?:\\\\\"|[^\"])*)\"")//
if you don't want to add more capturing groups.
This regex accept \" or any non " between quotation marks.
Demo:
String input = "ab \"cd\" ef \"gh \\\"ij\"";
Matcher m = Pattern.compile("\"((?:\\\\\"|[^\"])*)\"").matcher(input);
while (m.find())
System.out.println(m.group(1));
Output:
cd
gh \"ij
Use this method:
"((?:[^"\\\\]*|\\\\.)*)"
[^"\\\\]* now will not match \ anymore either. But on the other alternation, you get to match any escaped character.
Try with this one:
Pattern pattern = Pattern.compile("((?:\\\"|[^\"])*)");
\\\" to match \" or,
[^\"] to match anything by "

how to get all names and date of births from a specific file using java

Hi below is my text file
welcome to java training
program
Name rtrti*&*
John
address india say^%$7
Date of Birth
11/12/1989
I have 100 files like above.The above text is the extracted text from the image files so it is not in order, from this i need to get the names and date of births can you please suggest me how to do this, I am new to this task.
Required output
John
11/12/1989
I have tried
Pattern p = Pattern.compile("Name");
Matcher matcher = p.matcher(content);
matcher.find();
But I have know idea how to get the next line of matched pattern, I cant not read this file line by line because my need is to store entire text in a single string.
I'll give a few hints that will get you on track. Without more details regarding the expected input, it will be difficult to give you a solid solution. First, I trust that you are already familiar with the Pattern and Matcher javadocs. You will need to understand the Groups and capturing section. Finally, you can utilize DOTALL mode which will allow the . character to match newlines.
To get you started, the following should work to find the name:
Pattern p = Pattern.compile(
"(?s)" + // DOTALL
".*" + // Match anything (to consume everything before 'Name')
"Name" + // Match the literal 'Name'
".*?" + // Reluctantly grab everything until...
"\n" + // Newline is reached
"\\s*" + // Consume leading whitespace
"(\\S+)" // Capture at least one non-whitespace character
);
Matcher m = p.matcher(content);
if(m.find()) {
String name = m.group(1); // The first capturing group contains "John"
}

Categories