Here is something that I don't really understand.
I would like to get the date part from the following string:
<th>Elkezdodott</th>
<td>2016. december 20., 19:29</td>
So I use the following code:
System.out.println(html);
Pattern p = Pattern.compile("\\p{Punct}th\\p{Punct}Elkezdodott\\p{Punct}{2}th\\p{Punct}\\p{Space}*" +
"\\p{Punct}td\\p{Punct}" +
"(\\d{4}\\p{Punct}\\p{Space}*[a-zA-Z]*\\p{Space}*\\d*\\p{Punct}{2}" +
"\\p{Space}*\\d{2}\\p{Punct}\\d{2})\\p{Punct}{2}td\\p{Punct}");
Matcher m = p.matcher(html);
if(m.matches()){
System.out.println("matches");
System.out.println(m.group());
}
This regex seems correct according to the Check RegExp option of the Android Studio:
The result of the System.out.println(html) is exactly the same as you can see on the image:
06-03 11:49:15.779 4581-5229/hu.lyra.moly_kihivasok I/System.out: <th>Elkezdodott</th>
06-03 11:49:15.779 4581-5229/hu.lyra.moly_kihivasok I/System.out: <td>2016. december 20., 19:29</td>
What I really don't understand is why m.matches() returns false. I also tried m.find(), but I got the same result. Did I miss something?
Thanks for any advice.
I've executed your exact example and it matches the string. The only thing to change is the call to group(): without an argument it returns the entire match. To get just the date you need the first capturing group, so use group(1).
By the way, why are you using such a complicated pattern to match your string? I would not use \p{} that often, because it makes the pattern unreadable. Just use this:
"<th>Elkezdodott</th>\\n<td>(\\d{4}\\.\\s*[a-zA-Z]+\\s*\\d{1,2}\\.,\\s*\\d{2}:\\d{2})</td>"
One more thing: you shouldn't use regex to parse HTML. Use an HTML parser instead; there are plenty around. If you try to parse HTML with regex you will soon run into major problems (nesting, malformed HTML such as missing end tags, and so on).
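For reference, here is a minimal sketch of the group(1) approach with the simplified pattern. The class and method names (DateExtractor, extractDate) are made up for illustration, and \\s* is used between the two tags instead of \\n so that any whitespace between them is tolerated, which is an assumption about the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExtractor {
    // Simplified pattern; the date is the only capturing group (group 1).
    private static final Pattern STARTED = Pattern.compile(
        "<th>Elkezdodott</th>\\s*"
        + "<td>(\\d{4}\\.\\s*[a-zA-Z]+\\s*\\d{1,2}\\.,\\s*\\d{2}:\\d{2})</td>");

    // Returns the date text, or null when the markup is not found.
    static String extractDate(String html) {
        Matcher m = STARTED.matcher(html);
        // find() searches anywhere in the input; matches() would require
        // the pattern to cover the whole string.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<th>Elkezdodott</th>\n<td>2016. december 20., 19:29</td>";
        System.out.println(extractDate(html)); // 2016. december 20., 19:29
    }
}
```

Using find() rather than matches() means the same code keeps working if the two lines are embedded in a larger page.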
Related
I have a requirement where I need to do a few things only if the given string ends in "Y" or "Years" or "YEARS".
I tried doing it using regex like this.
String text = "1.5Y";
if(Pattern.matches("Y$",text) || Pattern.matches("YEARS$",text) || Pattern.matches("Years",text))
{
//do
}
However, this check fails.
Can someone point out where I have gone wrong, or suggest another feasible method?
EDIT:
Thanks. That helps.
Finally I have used "(?i)^.*Y(ears)?$| (?i)^.*M(onths)?$".
But I want to make more changes to make it perfect.
Let's say I have many strings.
Ideally, only strings like 1.5Y or 0.5-3.5Y or 2.5/2.5-4.5Y should pass the if check.
It can be a number of years (e.g. 2.5y), a period of years (e.g. 2.5-3.5y), or a number of years followed by a period of years (e.g. 2.5/3.5-4.5Y), and nothing more.
More Examples:
--------------
Y -should fail;
MY - should fail;
1.5CY - should fail;
1.5Y-2.5Y should fail;
1.5-2.5Y should pass;
1.5Y/2.5-3.5Y should fail;
1.5/2.5-3.5Y should pass;
You don't need a regex here:
if(text.endsWith("Y") || ...)
The matches method attempts to match the full input, so use:
^.*Y$
for your first pattern.
btw you can use a single regex for all 3 cases:
if (text.matches( "(?i)^.*Y(ears)?$" ) ) {...}
(?i) makes the match case-insensitive.
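To make the difference concrete, here is a small sketch comparing a full match with a search. The class and helper names (MatchesVsFind, fullMatch, partialMatch) are only for illustration.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // Full-match check: the pattern must cover the whole input.
    static boolean fullMatch(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    // Search check: the pattern may match any part of the input.
    static boolean partialMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "1.5Y";
        System.out.println(fullMatch("Y$", text));                    // false
        System.out.println(partialMatch("Y$", text));                 // true
        System.out.println(fullMatch("(?i)^.*Y(ears)?$", text));      // true
        System.out.println(fullMatch("(?i)^.*Y(ears)?$", "2 years")); // true
    }
}
```

"Y$" fails with matches because "1.5" is left unmatched; with find() the same pattern succeeds.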
.*(?:Y|YEARS|Years)$
You can use this directly. matches has to match the whole input from the beginning, which is why yours is failing.
You can simply use the regex pattern:
if (Pattern.matches(".*(Y|YEARS|Years)$",text)) {/*do something*/}
/((?!0)\d+|0)(\.\d+)?(?:years|year|y)/gi
https://regex101.com/r/gJ6xD2/2
var text = "1.6y 1.5years 1year 1.5h";
text.match(/((?!0)\d+|0)(\.\d+)?(?:years|year|y)/gi);
Result: ["1.6y", "1.5years", "1year"]
(?=^(0\.\d+|[1-9](?:\d+)?(?:\.\d+)?)(?:(\s+)?[\/-](\s+)?(?:0\.\d+|[1-9](?:\d+)?(?:\.\d+)?))*(?:\s+)?(?:y(?:(ea)?rs|ears?)?|m(?:onths?)?)$).*
https://regex101.com/r/kL7rQ1/3
The only thing I wasn't sure about is whether a format like "2.3 - 4 / 6.2 y" is acceptable, so I've included it.
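For the stricter pass/fail list in the question's edit, a Java check could look roughly like the sketch below. It assumes there is no whitespace inside the string, that only the three shapes N, N-N and N/N-N are allowed, and that the suffix is Y or Years in any case. The names YearPeriodCheck and isYearPeriod are hypothetical.

```java
import java.util.regex.Pattern;

class YearPeriodCheck {
    // A number such as 2 or 2.5.
    private static final String NUM = "\\d+(?:\\.\\d+)?";

    // N, N-N or N/N-N, followed by Y or Years (case-insensitive).
    private static final Pattern YEARS = Pattern.compile(
        "(?i)" + NUM + "(?:(?:/" + NUM + ")?-" + NUM + ")?Y(?:EARS)?");

    static boolean isYearPeriod(String text) {
        // matches() requires the whole input to fit the pattern.
        return YEARS.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Y", "MY", "1.5CY", "1.5Y-2.5Y", "1.5-2.5Y",
                            "1.5Y/2.5-3.5Y", "1.5/2.5-3.5Y"};
        for (String s : samples) {
            System.out.println(s + " -> " + isYearPeriod(s));
        }
    }
}
```

Because the unit appears only once at the end of the pattern, inputs like 1.5Y-2.5Y are rejected without any extra logic.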
I need to write regex in java to match domain and subdomain(.domain.com).
Regex should return true for
domain.com
m.domain.com
abc.domain.com
www.domain.com
but returns false for
abcdomain.com
1domain.com
I am trying to match domain.com, and if a preceding character is present then it must be a dot (.).
I tried various options, but each one fails for one test case or another.
(^|.*?\.)domain\.com
Try this. See demo.
http://regex101.com/r/lB2sH2/1
Try this:
(\.|^)domain\.com$
The first part means that domain.com must be preceded by a literal . or by the start of the string,
and the $ means "ends with".
You can try:
(^|\.)domain\.com$
but in Java, String.matches() and Matcher.matches() must match the entire input, so you can also use:
(.+\.)?domain\.com
or you can use the endsWith() method in Java code:
if (domain.equals("domain.com") || domain.endsWith(".domain.com")) {
// do something...
}
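Both variants can be tried against the hosts from the question with a short sketch like this one (DomainCheck and its method names are made up for the example):

```java
class DomainCheck {
    // Regex variant: an optional "something." prefix, then domain.com.
    // String.matches() requires the whole input to match.
    static boolean matchesByRegex(String host) {
        return host.matches("(.+\\.)?domain\\.com");
    }

    // Plain string variant, no regex involved.
    static boolean matchesByString(String host) {
        return host.equals("domain.com") || host.endsWith(".domain.com");
    }

    public static void main(String[] args) {
        String[] hosts = {"domain.com", "m.domain.com", "abc.domain.com",
                          "www.domain.com", "abcdomain.com", "1domain.com"};
        for (String h : hosts) {
            System.out.println(h + " -> " + matchesByRegex(h)
                + " / " + matchesByString(h));
        }
    }
}
```

The two methods agree on the six examples from the question: the first four are accepted and the last two are rejected.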
I think you want something like this,
(?:\\w+\\.)?domain\\.com
try this regex
\bdomain\.com$
http://rubular.com/r/QG0FtVWtm6
If you don't know in advance what "domain.com" is going to be, the regex below should give you just the subdomain of whatever domain you are looking at. It matches your specifications, including domains that look like abc.net.
([a-z]+)(?=\.[a-z]+\.)
Why does this code not work?
public static void main(String[] args) {
String s = "You need the new version for this. Please update app ...";
System.out.println(s.replaceAll(". ", ".\\\\n").replaceAll(" ...", "..."));
}
This is my wanted output:
You need the new version for this.\nPlease update app...
Thanks for the information
The String.replaceAll method takes a regex as its first argument.
So you need to escape the dot (.), because in a regex it has a special meaning: it matches any character.
System.out.println(s.replaceAll("\\. ", ".\\\\n").replaceAll(" \\.\\.\\.", "..."));
However, for your given input you can simply use the String.replace method, which treats its arguments as literal text, so nothing needs to be escaped.
. is a special regex character and matches any character. You need to escape it like this: \\.
So to match three dots you must use the following regex: "\\.\\.\\."
What you want is:
s.replaceAll("\\. ", ".\n").replaceAll(" \\.\\.\\.", "...")
You shouldn't be using replaceAll here; use replace instead. replaceAll interprets its first argument as a regular expression, which is not needed in this case (and is unnecessarily inefficient).
String s = "You need the new version for this. Please update app ...";
System.out.println(s.replace(". ", ".\\n").replace(" ...", "..."));
(Also note that I've replaced ".\\\\n" with ".\\n" here, which produces the desired output.)
Try this:
System.out.println(s.replace(". ", ".\n").replace(" ...", "..."));
This gives:
You need the new version for this.
Please update app...
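As a side-by-side sketch, assuming the wanted output contains a literal backslash followed by n (as written in the question), the following shows replace next to a replaceAll call whose arguments are quoted so they are treated as plain text. ReplaceDemo and its method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    // replace() treats both arguments as literal text.
    static String withReplace(String s) {
        return s.replace(". ", ".\\n").replace(" ...", "...");
    }

    // replaceAll() takes a regex and a replacement template, so both are
    // quoted here to strip the special meaning of '.' and '\'.
    static String withQuotedReplaceAll(String s) {
        return s.replaceAll(Pattern.quote(". "), Matcher.quoteReplacement(".\\n"))
                .replaceAll(Pattern.quote(" ..."), "...");
    }

    public static void main(String[] args) {
        String s = "You need the new version for this. Please update app ...";
        System.out.println(withReplace(s));
        // You need the new version for this.\nPlease update app...
        System.out.println(withQuotedReplaceAll(s));
        // You need the new version for this.\nPlease update app...
    }
}
```

If a real line break is wanted instead, ".\n" can be used as the replacement in either method.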
I have the following REGEX that I'm serving up to java via an xml file.
[a-zA-Z -\(\) \-]+
This regex is used to validate server side and client side (via javascript) and works pretty well at allowing only alphabetic content and a few other characters...
My problem is that it also lets zero-length (empty) strings through.
Does anyone have a simple and yet elegant solution to this?
I already tried...
[a-zA-Z -\(\) \-]{1,}+
but that didn't seem to work.
Cheers!
UPDATE FOLLOWING INVESTIGATION
It appears the code I provided does in fact work...
String inputStr = " ";
String pattern = "[a-zA-Z -\\(\\) \\-]+";
boolean patternMatched = java.util.regex.Pattern.matches(pattern, inputStr);
if ( patternMatched ){
out.println("Pattern MATCHED");
}else{
out.println("NOT MATCHED");
}
After looking at this more closely, I think the problem may well be in the logic of some of my Java bean code. It appears the regex check is skipped at the point where the string should be parsed, which lets empty strings (and any other string) be submitted. Eejit that I am...
Cheers for the help in peer reviewing my initial stupid thought!
Have you tried this:
[a-zA-Z -\(\) \-]+
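A quick way to confirm that + already rejects empty input is a small check like the one below. Note that it uses a simplified character class (letters, space, parentheses and hyphen), which is an assumption about the intended set and not the exact class from the question, and it adds a trim() test because a string of spaces would otherwise pass. NonEmptyValidator and isValid are hypothetical names.

```java
import java.util.regex.Pattern;

class NonEmptyValidator {
    // Simplified class (assumption): letters, space, parentheses, hyphen.
    // The + quantifier requires at least one character.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z ()\\-]+");

    static boolean isValid(String input) {
        // Reject null and whitespace-only input before applying the regex.
        return input != null
            && !input.trim().isEmpty()
            && ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(""));                 // false
        System.out.println(isValid(" "));                // false
        System.out.println(isValid("Smith-Jones (Jr)")); // true
        System.out.println(isValid("abc123"));           // false
    }
}
```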
I have following code in Java:
Pattern fieldsPattern = Pattern.compile("(\"([^\"]+)\")|"
+"("+this.field_tag+"([0-9a-zA-Z_]+))");
Matcher fieldsMatcher = fieldsPattern.matcher(field);
while(fieldsMatcher.find())
{
//...
}
This code should capture expressions like "expression" and :expression (field_tag is just ":"). The problem occurs when I try to capture an expression like "10.1" or "10,1". It doesn't work.
But the expressions
"10-1",
"10+1"
work as expected.
I also tried this regexp on regexpal.com, a site that uses the JavaScript implementation of RegExp. On that site, expressions like "10.1" and "10,1" work fine.
Is there any difference in java vs javascript in capturing dots? What am I doing wrong?
This works for me
Pattern fieldsPattern = Pattern.compile("(\"[^\"]+\")");
String field =" aa \"10\" \"10.1\" and \"10,1\"";
Matcher fieldsMatcher = fieldsPattern.matcher(field);
while(fieldsMatcher.find()) {
System.out.println(fieldsMatcher.group());
}
prints
"10"
"10.1"
"10,1"
The second set of brackets in the regex appear to be redundant, but are harmless.
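For completeness, here is a sketch of the full two-alternative pattern from the question, returning the captured text without quotes or tag. The field tag is passed in as a parameter and quoted with Pattern.quote, which is an addition of mine and not something the question's code does. FieldCapture and capture are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldCapture {
    // Collects the contents of "quoted" expressions and tagged names.
    static List<String> capture(String field, String fieldTag) {
        Pattern p = Pattern.compile(
            "\"([^\"]+)\"|" + Pattern.quote(fieldTag) + "([0-9a-zA-Z_]+)");
        Matcher m = p.matcher(field);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // Group 1 is set for quoted text, group 2 for a tagged name.
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(capture("\"10.1\" and \"10,1\" plus :total_1", ":"));
        // [10.1, 10,1, total_1]
    }
}
```

Inside the quotes, [^\"]+ accepts any character except a quote, so dots and commas are captured like any other character.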