Java get pattern of a String - java

is it possible to detect the pattern of a String and store it in a variable? so, if I have a String test1234 and highlight 1234 I expect something like \d{4}.

It would require that you find a regular expression that both your highlighted substring and desired replacement match and that is in no way unique. For example, "1234" could match .{4} or \d{4} or even .+ , which is not of a unique length. So, even if you could generate a regular expression from a string, it could happen that it would be the string itself or something you didn't want. Maybe you should rethink the general desired outcome of your program and try to come up with a different way of solving the issue at hand.
Hope that helped. Good luck!

Related

Why the below pattern matching is giving True for the String "ababx300000055773908"

Firstly sorry for the primitive question, I am wondering how the below method is returning true
Pattern.compile("([0-9]{15})").asPredicate().test("ababx300000055773908")
Please let me know, if i am missing something here.
You have to define the start and the end of the String to match.
your pattern is matching the 15 times numeric in the whole string without considering the location of the pattern.
use regex ^[0-9]{15}$
Pattern.compile("(^[0-9]{15}$)").asPredicate().test("ababx300000055773908");

Regex to match a fixed sub string in a String

I am trying to write a regular expression to verify the presence of a specific number in a fixed position in a String.
String: 109300300330066611111111100000000017000656052086116020170111Name 1
Number to find: 111111111 (Staring from position 17)
I have written the following regular expression:
^.{16}(?<Ones>111111111)(.*)
My understanding is:
Let first 16 characters be whatever they are
Use the Named Capturing Group to grab the specific word
Let the rest of the characters be whatever they are
I am new to regex, is there any issue with the above approach?
Can it be done in other/better way?
I am using Java 8.
Without more details of why you're doing what you're doing, there's just one possible improvement I can see. You repeated any character 16 times at the beginning of the string rather than writing out 16 .s, which is nice and readable, but then, it would be nice to do the same for the repeated 1s:
^.{16}(?<Ones>1{9})(.*)
Otherwise, the string of 1s is hard to understand without the coder manually counting how many there are in the regex.
If you want to hard-code the ones and you know the starting position and you just wnat to know if it is there, using a regex seems unnecessary. you can use this:
String s = "109300300330066611111111100000000017000656052086116020170111Name 1";
if (s.indexOf("111111111").equals(16) doSomething();
Another possible solution without regex:
if(s.substring(16,25).equals("111111111") doSomething();
Otherwise your regex looks good.

Regex to match if string *only* contains *all* characters from a character set, plus an optional one

I ran into a wee problem with Java regex. (I must say in advance, I'm not very experienced in either Java or regex.)
I have a string, and a set of three characters. I want to find out if the string is built from only these characters. Additionally (just to make it even more complicated), two of the characters must be in the string, while the third one is **optional*.
I do have a solution, my question is rather if anyone can offer anything better/nicer/more elegant, because this makes me cry blood when I look at it...
The set-up
There mandatory characters are: | (pipe) and - (dash).
The string in question should be built from a combination of these. They can be in any order, but both have to be in it.
The optional character is: : (colon).
The string can contain colons, but it does not have to. This is the only other character allowed, apart from the above two.
Any other characters are forbidden.
Expected results
Following strings should work/not work:
"------" = false
"||||" = false
"---|---" = true
"|||-|||" = true
"--|-|--|---|||-" = true
...and...
"----:|--|:::|---::|" = true
":::------:::---:---" = false
"|||:|:::::|" = false
"--:::---|:|---G---n" = false
...etc.
The "ugly" solution
Now, I have a solution that seems to work, based on this stackoverflow answer. The reason I'd like a better one will become obvious when you've recovered from seeing this:
if (string.matches("^[(?\\:)?\\|\\-]*(([\\|\\-][(?:\\:)?])|([(?:\\:)?][\\|\\-]))[(?\\:)?\\|\\-]*$") || string.matches("^[(?\\|)?\\-]*(([\\-][(?:\\|)?])|([(?:\\|)?][\\-]))[(?\\|)?\\-]*$")) {
//do funny stuff with a meaningless string
} else {
//don't do funny stuff with a meaningless string
}
Breaking it down
The first regex
"^[(?\\:)?\\|\\-]*(([\\|\\-][(?:\\:)?])|([(?:\\:)?][\\|\\-]))[(?\\:)?\\|\\-]*$"
checks for all three characters
The next one
"^[(?\\|)?\\-]*(([\\-][(?:\\|)?])|([(?:\\|)?][\\-]))[(?\\|)?\\-]*$"
check for the two mandatory ones only.
...Yea, I know...
But believe me I tried. Nothing else gave the desired result, but allowed through strings without the mandatory characters, etc.
The question is...
Does anyone know how to do it a simpler / more elegant way?
Bonus question: There is one thing I don't quite get in the regexes above (more than one, but this one bugs me the most):
As far as I understand(?) regular expressions, (?\\|)? should mean that the character | is either contained or not (unless I'm very much mistaken), still in the above setup it seems to enforce that character. This of course suits my purpose, but I cannot understand why it works that way.
So if anyone can explain, what I'm missing there, that'd be real great, besides, this I suspect holds the key to a simpler solution (checking for both mandatory and optional characters in one regex would be ideal.
Thank you all for reading (and suffering ) through my question, and even bigger thanks for those who reply. :)
PS
I did try stuff like ^[\\|\\-(?:\\:)?)]$, but that would not enforce all mandatory characters.
Use a lookahead based regex.
^(?=.*\\|)(?=.*-)[-:|]+$
or
^(?=.*\\|)[-:|]*-[-:|]*$
or
^[-:|]*(?:-:*\\||\\|:*-)[-:|]*$
DEMO 1DEMO 2
(?=.*\\|) expects atleast one pipe.
(?=.*-) expects atleast one hyphen.
[-:|]+ any char from the list one or more times.
$ End of the line.
Here is a simple answer:
(?=.*\|.*-|.*-.*\|)^([-|:]+)$
This says that the string needs to have a '-' followed by '|', or a '|' followed by a '-', via the look-ahead. Then the string only matches the allowed characters.
Demo: http://fiddle.re/1hnu96
Here is one without lookbefore and -hind.
^[-:|]*\\|[-:|]*-[-:|]*|[-:|]*-[-:|]*\\|[-:|]*$
This doesn't scale, so Avinash's solution is to be preferred - if your regex system has the lookbe*.

Simple Java regular expression matching fails

Before y'all jump on me for posting something similar to previous questions asked, yes, there seem to be a number of regex related questions but nothing which seems to help me, or at least that I can see.
I am trying to parse strings in JAVA using PATTERN and MATCHER and am really having no joy. My regular expression seems to match my input string when I use a few of the online regular expression testing websites but Java simply does not match my expression.
My input string is:
"Big apple" title="Little Apple" type="Container" url="http://malcolm.com/testing"
The regular expression I am using to match is ".*" title="(.*)" type="Container" url="(.*)"
Essentially I want to pull out the text within the second and the fourth set of quotes. There will always be 4 sets of quotes with text within and around.
I am coding as follows:
Variable XMLSubstring contains the string above (including the quotes) and is as stated, even when I print it out.
Pattern p = Pattern.compile(".* title=\"(.*)\" type=\"Container\" url=\"(.*)\"");
m = p.matcher(XMLSubstring);
It doesn't appear to be rocket science I'm attempting but I'm pulling my hair out trying to debug the bloody thing.
Is there something wrong with my regex pattern?
Is there something wrong with the code I am using?
Am I simply a moron and should stop coding with immediate effect?
EDIT & UPDATE: I have found the problem. My string had a space at the end of it which was breaking the parser! How silly, and I think based on that, I need to accept the third suggestion of mine and give up programming. Thanks all for your assistance.
Try this,
String str="\"Big apple\" title=\"Little Apple\" type=\"Container\" url=\"http://malcolm.com/testing\"";
Pattern p=Pattern.compile(".* title=\\\".*\\\" type=\\\"Container\\\" url=\\\".*\\\"");
Matcher m=p.matcher(str);

Java string: classes or packages with advanced functions?

I am doing string manipulations and I need more advanced functions than the original ones provided in Java.
For example, I'd like to return a substring between the (n-1)th and nth occurrence of a character in a string.
My question is, are there classes already written by users which perform this function, and many others for string manipulations? Or should I dig on stackoverflow for each particular function I need?
Check out the Apache Commons class StringUtils, it has plenty of interesting ways to work with Strings.
http://commons.apache.org/lang/api-2.3/index.html?org/apache/commons/lang/StringUtils.html
Have you looked at the regular expression API? That's usually your best bet for doing complex things with strings:
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
Along the lines of what you're looking to do, you can traverse the string against a pattern (in your case a single character) and match everything in the string up to but not including the next instance of the character as what is called a capture group.
It's been a while since I've written a regex, but if you were looking for the character A for instance, then I think you could use the regex A([^A]*) and keep matching that string. The stuff in the parenthesis is a capturing group, which I reference below. To match it, you'd use the matcher method on pattern:
http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#matcher%28java.lang.CharSequence%29
On the Matcher instance, you'd make sure that matches is true, and then keep calling find() and group(1) as needed, where group(1) would get you what is in between the parentheses. You could use a counter in your looping to make sure you get the n-1 instance of the letter.
Lastly, Pattern provides flags you can pass in to indicate things like case insensitivity, which you may need.
If I've made some mistakes here, then someone please correct me. Like I said, I don't write regexes every day, so I'm sure I'm a little bit off.

Categories