Match any unicode Letters with java regex [closed] - java

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I need to match any letter with a regex (like the ^$ special character in the MS Office Word Find feature).
I've tried [a-zA-Z], but it doesn't match Unicode letters such as accented letters or ä, ö, ü, ß.
I've also tried [a-zA-ZäöüßÄÖÜ], but there are too many letters to list.
Is there a regex that matches all these letters?

This \\p{L} regex would match any kind of letter from any language.

To match any unicode letter in Java use:
\\p{L}

You can use \\p{L} to match any letter, Unicode included.
For fine-tuned matching, you can consult the documentation on filefront, and combine it with the Unicode features documented in Java Pattern here.
Quick example
String input = "ZäöüßÄÖÜß您好";
System.out.println(input.matches(String.format("\\p{L}{%d}", input.length())));
Output
true

It seems you don't want to match just any letter (e.g. Arabic characters), but only Latin characters:
\p{IsLatin}+
Using your chars:
System.out.println("ZäöüßÄÖÜ".matches("\\p{IsLatin}+")); // true
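To make the difference between the three character classes in these answers concrete, here is a minimal sketch. The class and method names are made up for illustration, and the sample strings use \u escapes so the result does not depend on the source file's encoding.

```java
import java.util.regex.Pattern;

class LetterClassDemo {
    // ASCII letters only, as in the question's first attempt.
    private static final Pattern ASCII = Pattern.compile("[a-zA-Z]+");
    // Any letter in any script (Unicode general category L).
    private static final Pattern ANY_LETTER = Pattern.compile("\\p{L}+");
    // Characters belonging to the Latin script.
    private static final Pattern LATIN = Pattern.compile("\\p{IsLatin}+");

    static boolean isAscii(String s)     { return ASCII.matcher(s).matches(); }
    static boolean isAnyLetter(String s) { return ANY_LETTER.matcher(s).matches(); }
    static boolean isLatin(String s)     { return LATIN.matcher(s).matches(); }

    public static void main(String[] args) {
        String german = "Z\u00e4\u00f6\u00fc\u00df"; // Z plus a-, o-, u-umlaut and sharp s
        String mixed = german + "\u60a8\u597d";      // the same plus two Han characters
        System.out.println(isAscii(german));     // false
        System.out.println(isAnyLetter(german)); // true
        System.out.println(isLatin(german));     // true
        System.out.println(isAnyLetter(mixed));  // true
        System.out.println(isLatin(mixed));      // false
    }
}
```

Compiling each pattern once and reusing it avoids recompiling the regex on every call, which String.matches would do.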

Related

Check if word alternates consonant and vowel [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 months ago.
So I need to check if a word follows a pattern of alternating vowel and consonant (or consonant and vowel) in Java.
I want to make it a regex, but I only came up with this incomplete regular expression:
[aeiouAEIOU][^aeiouAEIOU]
Any ideas?
Thanks :)
Update: The solution is not restricted to regex, so other approaches are welcome if anyone has ideas.
One way is to use a negative lookahead to check that neither two vowels nor two consonants appear next to each other.
(?i)^(?!.*?(?:[aeiou]{2}|[^aeiou]{2}))[a-z]+$
See this demo at regex101 (the i flag is used for caseless matching; the \n in the demo keeps a match from running across lines).
Update: Thank you for the comment, @Thefourthbird. To match at least two characters, change the last quantifier: use [a-z]{2,} (two or more) instead of [a-z]+ (one or more). To match only an even number of characters (2, 4, 6, 8, ...), change this part to (?:[a-z]{2})+
FYI: If you use this with matches you can drop the ^ start and $ end anchor (see this Java demo).
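As a rough sketch of how this could look in Java, the following wraps the lookahead regex in a method and, since the question allows non-regex solutions, adds a plain loop for comparison. The class and method names are hypothetical, and both versions assume that only ASCII letters count as valid and that y is treated as a consonant.

```java
import java.util.regex.Pattern;

class AlternationCheck {
    // The negative lookahead rejects two adjacent vowels or two adjacent non-vowels;
    // [a-z]+ then restricts the word to letters. (?i) makes both parts caseless.
    // No ^ and $ are needed because matches() already tests the whole input.
    private static final Pattern ALTERNATING =
        Pattern.compile("(?i)(?!.*?(?:[aeiou]{2}|[^aeiou]{2}))[a-z]+");

    static boolean alternatesRegex(String word) {
        return ALTERNATING.matcher(word).matches();
    }

    // Loop-based version of the same check (ASCII letters only).
    static boolean alternatesLoop(String word) {
        if (word.isEmpty()) {
            return false;
        }
        boolean previousWasVowel = false;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            boolean isLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!isLetter) {
                return false;
            }
            boolean isVowel = "aeiouAEIOU".indexOf(c) >= 0;
            if (i > 0 && isVowel == previousWasVowel) {
                return false; // two vowels or two consonants in a row
            }
            previousWasVowel = isVowel;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(alternatesRegex("banana")); // true
        System.out.println(alternatesRegex("tree"));   // false
        System.out.println(alternatesLoop("banana"));  // true
        System.out.println(alternatesLoop("tree"));    // false
    }
}
```

Both versions accept a single letter; swap in [a-z]{2,} in the regex, or add a length check in the loop, if at least two characters are required.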

Finding a regex pattern using JAVA [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I need a Regex expression for the following:
9845d530c7594ab45e8b905bbff
It should always start with 984, then have a UUID, and the maximum length should be 27. Here the UUID is 5d530c7594ab45e8b905bbff. I know the pattern for the UUID, but I am not sure how to combine it all into one expression.
For starting with 984 it should be
^984
And for a specific length it should be
\d{27}
But I am not sure about UUID(which over here should be case insensitive).
^984[0-9a-fA-F]{24}$
A quick explanation:
^984 must begin with "984"
[...]{24}$ 24 characters that match the given character set, then end
[0-9a-fA-F] a character set that includes any number 0-9, character a-f or character A-F
You can also use the predefined character class \d for the numeric portion (it matches a single digit), but I like to be explicit, because otherwise my brain hurts. Predefined classes are useful if you're running against an unknown character set that might have multiple representations for a number. For instance, depending on the regex implementation, \d might match the Arabic-Indic digits (٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩) or the Devanagari digits (० १ २ ३ ४ ५ ६ ७ ८ ९). In Java, \d matches only 0-9 unless the Pattern.UNICODE_CHARACTER_CLASS flag is set.
There are also regex "modifiers" that allow case insensitivity without having to specify the upper and lower case in the character set. It does depend on which regex implementation you use.
Java's built-in regular expressions library has a modifier flag:
Pattern.compile("^984[0-9a-f]{24}$", Pattern.CASE_INSENSITIVE);
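A minimal sketch of the whole check, using the sample value from the question. The class and method names are invented for illustration; the pattern is the "984" prefix followed by exactly 24 hexadecimal characters, matched case-insensitively.

```java
import java.util.regex.Pattern;

class PrefixedIdCheck {
    // "984", then exactly 24 hex digits (27 characters in total).
    // CASE_INSENSITIVE lets a-f also match A-F.
    private static final Pattern ID =
        Pattern.compile("^984[0-9a-f]{24}$", Pattern.CASE_INSENSITIVE);

    static boolean isValidId(String s) {
        // matches() requires the entire input to match the pattern.
        return ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidId("9845d530c7594ab45e8b905bbff")); // true
        System.out.println(isValidId("9845D530C7594AB45E8B905BBFF")); // true
        System.out.println(isValidId("9845d530c7594ab45e8b905bbf"));  // false, only 26 characters
        System.out.println(isValidId("1235d530c7594ab45e8b905bbff")); // false, wrong prefix
    }
}
```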

Explain this regular expression in Java [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Regular expression in java:
"String".replaceAll("([aeioucgjkqsxyzbfpvwdtmn1234567890])\\1+", "$1")
Can someone explain what the different characters do?
Explanation:
[aeioucgjkqsxyzbfpvwdtmn1234567890] Matches a single character in the list.
([aeioucgjkqsxyzbfpvwdtmn1234567890]) Capturing group around the char class would capture that single character.
\1+ \1 is a backreference to whatever group 1 captured. In our case, a single character is captured, so it refers to that single character. \1+ means one or more further occurrences of that captured character.
For Example:
aaaa
The above regex captures the first character and checks whether one or more of the following characters are the same as the captured one. If so, the whole run of duplicated characters is replaced by a single character (the one in group 1); that is, aaaa is replaced by a single a.
The character class in brackets matches one of the listed characters, and the parentheses capture it as group 1. \1 is a backreference to that captured character (not a literal backslash followed by a one), and the plus sign (+) means 1 or more.
So a listed character followed by one or more repetitions of itself is replaced with $1, the single captured character.
With an empty replacement, each such run of repeated characters is removed entirely (note that the backreference must be written as \\1 in a Java string literal):
System.out.println(Str.replaceAll("([aeioucgjkqsxyzbfpvwdtmn1234567890])\\1+", ""));
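To see the backreference behaviour on a few inputs, here is a small sketch that wraps the expression from the question in a helper method. The class and method names are made up for illustration.

```java
class CollapseRepeats {
    // Group 1 captures one listed character; \\1+ matches one or more further
    // copies of that same character; "$1" puts a single copy back.
    static String collapse(String s) {
        return s.replaceAll("([aeioucgjkqsxyzbfpvwdtmn1234567890])\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("aaaa"));       // a
        System.out.println(collapse("bookkeeper")); // bokeper
        System.out.println(collapse("1100"));       // 10
        System.out.println(collapse("hello"));      // hello (l is not in the class)
        System.out.println(collapse("AAAA"));       // AAAA (uppercase is not in the class)
    }
}
```

Characters that are not listed in the class, such as h, l, r and all uppercase letters, are left untouched even when they repeat.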

Whole word regex match [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
I have a list of strings, like c++, c, java, c#, .net.
I have to find the occurrences of these strings in some text.
I tried,
String pattern = "(?i)\\b"+Pattern.quote(str)+"\\b";
But it doesn't match with c++.
Then, I removed \b and it started matching every c in the text.
How do I match the whole word?
Sample String:
C, c#, C++ college cat cow
\bc\+\+\b cannot match c++ because + is not considered a word character. After a non-word character like +, \b matches only if a word character follows, so it fails before a space, punctuation, or the end of the text.
You can probably use this regex:
\bc\+\+(?=\W|$)
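One possible way to generalise this to the whole list of terms is to replace both \b anchors with lookarounds built around Pattern.quote. This is only a sketch under stated assumptions: the class and method names are hypothetical, and the lookahead also excludes + and # (an assumption made here so that a search for c does not hit the c inside c++ or c#).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeTermFinder {
    // (?<!\w)   : the term must not be preceded by a word character
    // (?![\w+#]): the term must not be followed by a word character, '+' or '#'
    //             (the '+' and '#' exclusion is an assumption for this term list)
    static Pattern wholeTerm(String term) {
        return Pattern.compile(
            "(?<!\\w)" + Pattern.quote(term) + "(?![\\w+#])",
            Pattern.CASE_INSENSITIVE);
    }

    // Counts the non-overlapping whole-term occurrences in the text.
    static int count(String term, String text) {
        Matcher m = wholeTerm(term).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "C, c#, C++ college cat cow";
        System.out.println(count("c", text));   // 1
        System.out.println(count("c#", text));  // 1
        System.out.println(count("c++", text)); // 1
        System.out.println(count(".net", "I like .NET and ASP.NET")); // 1
    }
}
```

Unlike \b, the lookarounds do not care whether the term itself starts or ends with a word character, so the same construction works for c, c++, c# and .net.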

Regex to match some number of upper case characters at the beginning of a String [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Can anyone recommend a Regex that would match on the following rules:
Upper case or a space
My strings that I want to match look like this
LONDON 10 Downing St, London
or this
NEW YORK 2859 Broadway, New York, NY 10025
I want to be able to match the words LONDON and NEW YORK when I pass in each line.
P.S. I am doing this in Java
Beginning of the string: ^
Uppercase letter: \p{Lu}
Space: (a literal space character)
Combining the two: [\p{Lu} ]
Any number of the preceding token: *
Assertion that the match ends at the end of a word (requires Java 7 to work reliably): \b
Your regex, therefore, is
^[\p{Lu} ]*\b
Don't forget to double the backslashes to comply with Java's string escaping rules:
In Java 7:
Pattern regex = Pattern.compile("^[\\p{Lu} ]*\\b", Pattern.UNICODE_CHARACTER_CLASS);
In Java 6 and below:
Pattern regex = Pattern.compile("^[\\p{Lu} ]*(?<=\\p{Lu})");
You can use this pattern:
^[A-Z ]+
This will match one or more upper case Latin letters or spaces from the beginning of the string.
You can easily modify this to avoid capturing trailing spaces:
^[A-Z ]*[A-Z]
Use this:
^\p{Lu}+( \p{Lu}+)*
It matches a run of uppercase characters, optionally followed by groups of (single space, more uppercase characters), so the match never ends with a space. In Java the uppercase class is written \p{Lu}; \u on its own is not a valid Java regex escape.
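As a sketch of how the pieces from these answers could be combined in Java, the pattern below uses space-separated groups of uppercase letters followed by the \b assertion, so that a capital letter starting a mixed-case word is not swallowed. The class and method names are hypothetical, and returning null when there is no uppercase prefix is an arbitrary choice for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UpperCasePrefix {
    // One or more uppercase words separated by single spaces, anchored at the
    // start and ending on a word boundary. UNICODE_CHARACTER_CLASS (Java 7+)
    // makes \b treat non-ASCII letters as word characters too.
    private static final Pattern PREFIX = Pattern.compile(
        "^\\p{Lu}+(?: \\p{Lu}+)*\\b", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the uppercase prefix, or null if the line does not start with one.
    static String extract(String line) {
        Matcher m = PREFIX.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("LONDON 10 Downing St, London"));               // LONDON
        System.out.println(extract("NEW YORK 2859 Broadway, New York, NY 10025")); // NEW YORK
        System.out.println(extract("LONDON Downing St"));                          // LONDON
        System.out.println(extract("london calling"));                             // null
    }
}
```

The third call shows why the trailing \b helps: without it, the match would extend to "LONDON D".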
