How can I look for two specific characters in a string? - java

String abc = "||:::|:|::";
It should return true if there's two | and three : appearances.
I'm not sure how to use "regex" or if it's the right method to use. There's no specific pattern in the abc String.

Using a regex would be a bad idea, especially if there's no specific order to them. Make a function that counts the number of times a character sppears in a string, and use that:
public int count(String base, char toFind)
{
int count = 0;
char[] haystack = base.toCharArray();
for (int i = 0; i < haystack.length; i++)
if (haystack[i] == toFind)
count++;
return count;
}
String abc = "||:::|:|::";
if (count(abc,"|") >= 2 && count(abc,":") >= 3)
{
//Do some code here
}

My favorite method for searching for the number of characters in a string is int num = s.length() - s.replaceAll("|","").length(); you can do that for both and test those ints.

If you want to test all conditions in one regex you can use look-ahead (?=condition).
Your regex can look like
String regex =
"(?=(.*[|]){2})"//contains two |
+ "(?=(.*:){3})"//contains three :
+ "[|:]+";//is build only from : and | characters
Now you can use it with matches like
String abc = "||:::|:|::";
System.out.println(abc.matches(regex));//true
abc = "|::::::";
System.out.println(abc.matches(regex));//false
Anyway I you can avoid regex and write your own method which will calculate number of | and : in your string and check if this numbers are greater or equal to 2 and 3. You can use StringUtils.countMatches from apache-commons so your test code could look like
public static boolean testString(String s){
int pipes = StringUtils.countMatches(s, "|");
int colons = StringUtils.countMatches(s, ":");
return pipes>=2 && colons>=3;
}
or
public static boolean testString(String s){
return StringUtils.countMatches(s, "|")>=2
&& StringUtils.countMatches(s, ":")>=3;
}

This is assuming you are looking for two '|' to be one after the other and the same for the three ':'
and one follows the other .Do it using the following single regular expressions.
".*||.*:::.*"
If you are looking to just check the presence of characters and their irrespective of their order then use String.matches method using the two regular expressions with a logical AND
".*|.*|.*"
".*:.*:.*:.*"
Here is a cheat sheet for regular expressions. Its fairly simple to learn. Look at groups and quantifiers in the document to understand the above expression.

Haven't tested it, but this should work
Pattern.compile("^(?=.*[|]{2,})(?=.*[:]{3,})$");
The entire string is read by ?=.* and checked wether the allowed characters (|) occurs at least twice. The same is then done for :, only that this has to match at least three times.

Related

trouble with writing regex java

String always consists of two distinct alternating characters. For example, if string 's two distinct characters are x and y, then t could be xyxyx or yxyxy but not xxyy or xyyx.
But a.matches() always returns false and output becomes 0. Help me understand what's wrong here.
public static int check(String a) {
char on = a.charAt(0);
char to = a.charAt(1);
if(on != to) {
if(a.matches("["+on+"("+to+""+on+")*]|["+to+"("+on+""+to+")*]")) {
return a.length();
}
}
return 0;
}
Use regex (.)(.)(?:\1\2)*\1?.
(.) Match any character, and capture it as group 1
(.) Match any character, and capture it as group 2
\1 Match the same characters as was captured in group 1
\2 Match the same characters as was captured in group 2
(?:\1\2)* Match 0 or more pairs of group 1+2
\1? Optionally match a dangling group 1
Input must be at least two characters long. Empty string and one-character string will not match.
As java code, that would be:
if (a.matches("(.)(.)(?:\\1\\2)*\\1?")) {
See regex101.com for working examples1.
1) Note that regex101 requires use of ^ and $, which are implied by the matches() method. It also requires use of flags g and m to showcase multiple examples at the same time.
UPDATE
As pointed out by Austin Anderson:
fails on yyyyyyyyy or xxxxxx
To prevent that, we can add a zero-width negative lookahead, to ensure input doesn't start with two of the same character:
(?!(.)\1)(.)(.)(?:\2\3)*\2?
See regex101.com.
Or you can use Austin Anderson's simpler version:
(.)(?!\1)(.)(?:\1\2)*\1?
Actually your regex is almost correct but problem is that you have enclosed your regex in 2 character classes and you need to match an optional 2nd character in the end.
You just need to use this regex:
public static int check(String a) {
if (a.length() < 2)
return 0;
char on = a.charAt(0);
char to = a.charAt(1);
if(on != to) {
String re = on+"("+to+on+")*"+to+"?|"+to+"("+on+to+")*"+on+"?";
System.out.println("re: " + re);
if(a.matches(re)) {
return a.length();
}
}
return 0;
}
Code Demo

Counting comma and any text in java String

I'm trying to write a function to count specific Strings.
The Strings to count look like the following:
first any character except comma at least once -
the comma -
any chracter but at least once
example string:
test, test, test,
should count to 3
I've tried do that by doing the following:
int countSubstrings = 0;
final Pattern pattern = Pattern.compile("[^,]*,.+");
final Matcher matcher = pattern.matcher(commaString);
while (matcher.find()) {
countSubstrings++;
}
Though my solution doesn't work. It always ends up counting to one and no further.
Try this pattern instead: [^,]+
As you can see in the API, find() will give you the next subsequence that matches the pattern. So this will find your sequences of "non-commas" one after the other.
Your regex, especially the .+ part will match any char sequence of at least length 1. You want the match to be reluctant/lazy so add a ?: [^,]*,.+?
Note that .+? will still match a comma that directly follows a comma so you might want to replace .+? with [^,]+ instead (since commas can't match with this lazyness is not needed).
Besides that an easier solution might be to split the string and get the length of the array (or loop and check the elements if you don't want to allow for empty strings):
countSubstrings = commaString.split(",").length;
Edit:
Since you added an example that clarifies your expectations, you need to adjust your regex. You seem to want to count the number of strings followed by a comma so your regex can be simplified to [^,]+,. This matches any char sequence consisting of non-comma chars which is followed by a comma.
Note that this wouldn't match multiple commas or text at the end of the input, e.g. test,,test would result in a count of 1. If you have that requirement you need to adjust your regex.
So, quite good answers are already given. Very readable. Something like this should work, beware, it's not clean and probably not the fastest way to do this. But is is quite readable. :)
public int countComma(String lots_of_words) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == ',') {
count++;
}
}
return count;
}
Or even better:
public int countChar(String lots_of_words, char the_chosen_char) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == the_chosen_char) {
count++;
}
}
return count;
}

Java String- How to get a part of package name in android?

Its basically about getting string value between two characters. SO has many questions related to this. Like:
How to get a part of a string in java?
How to get a string between two characters?
Extract string between two strings in java
and more.
But I felt it quiet confusing while dealing with multiple dots in the string and getting the value between certain two dots.
I have got the package name as :
au.com.newline.myact
I need to get the value between "com." and the next "dot(.)". In this case "newline". I tried
Pattern pattern = Pattern.compile("com.(.*).");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
int ct = matcher.group();
I tried using substrings and IndexOf also. But couldn't get the intended answer. Because the package name in android varies by different number of dots and characters, I cannot use fixed index. Please suggest any idea.
As you probably know (based on .* part in your regex) dot . is special character in regular expressions representing any character (except line separators). So to actually make dot represent only dot you need to escape it. To do so you can place \ before it, or place it inside character class [.].
Also to get only part from parenthesis (.*) you need to select it with proper group index which in your case is 1.
So try with
String beforeTask = "au.com.newline.myact";
Pattern pattern = Pattern.compile("com[.](.*)[.]");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
String ct = matcher.group(1);//remember that regex finds Strings, not int
System.out.println(ct);
}
Output: newline
If you want to get only one element before next . then you need to change greedy behaviour of * quantifier in .* to reluctant by adding ? after it like
Pattern pattern = Pattern.compile("com[.](.*?)[.]");
// ^
Another approach is instead of .* accepting only non-dot characters. They can be represented by negated character class: [^.]*
Pattern pattern = Pattern.compile("com[.]([^.]*)[.]");
If you don't want to use regex you can simply use indexOf method to locate positions of com. and next . after it. Then you can simply substring what you want.
String beforeTask = "au.com.newline.myact.modelact";
int start = beforeTask.indexOf("com.") + 4; // +4 since we also want to skip 'com.' part
int end = beforeTask.indexOf(".", start); //find next `.` after start index
String resutl = beforeTask.substring(start, end);
System.out.println(resutl);
You can use reflections to get the name of any class. For example:
If I have a class Runner in com.some.package and I can run
Runner.class.toString() // string is "com.some.package.Runner"
to get the full name of the class which happens to have a package name inside.
TO get something after 'com' you can use Runner.class.toString().split(".") and then iterate over the returned array with boolean flag
All you have to do is split the strings by "." and then iterate through them until you find one that equals "com". The next string in the array will be what you want.
So your code would look something like:
String[] parts = packageName.split("\\.");
int i = 0;
for(String part : parts) {
if(part.equals("com")
break;
}
++i;
}
String result = parts[i+1];
private String getStringAfterComDot(String packageName) {
String strArr[] = packageName.split("\\.");
for(int i=0; i<strArr.length; i++){
if(strArr[i].equals("com"))
return strArr[i+1];
}
return "";
}
I have done heaps of projects before dealing with websites scraping and I
just have to create my own function/utils to get the job done. Regex might
be an overkill sometimes if you just want to extract a substring from
a given string like the one you have. Below is the function I normally
use to do this kind of task.
private String GetValueFromText(String sText, String sBefore, String sAfter)
{
String sRetValue = "";
int nPos = sText.indexOf(sBefore);
if ( nPos > -1 )
{
int nLast = sText.indexOf(sAfter,nPos+sBefore.length()+1);
if ( nLast > -1)
{
sRetValue = sText.substring(nPos+sBefore.length(),nLast);
}
}
return sRetValue;
}
To use it just do the following:
String sValue = GetValueFromText("au.com.newline.myact", ".com.", ".");

Android get nearby comma, space, or period in a string

I have a string and I want to get the first comma, space, or period in it.
int word = title.indexOf(" ", idx);
This will get the first space, how Can I make it to get the first thing from space, comma, or period?
I tried using || but didn't work.
ex.
int word = title.indexOf(" " || "," || ".", idx);
Gets the index of the first occurence of space, comma or dot or -1 if none of them could be found:
Pattern pattern = Pattern.compile("[ ,\\.]");
Matcher matcher = pattern.matcher(title);
int index = matcher.find() ? matcher.start() : -1;
Note that you can pre-compile the pattern and reuse it as often as you like.
See also http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Note also that if you want to break a text into single words, you can/should use a BreakIterator instead!
What you're doing isn't valid Java syntax. Use the indexOf() method with a space, comma and period, then determine the smallest of these 3 values.
int a = title.indexOf(" ", idx);
int b = title.indexof(",", idx);
int c = title.indexOf(".", idx);
Then just determine which is the smallest.
A faster way would be to write your own method. Behind the scenes, indexOf will just loop over all the characters. You can do that yourself manually
public static int findFirstOccurrence(String s) {
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) == ',' || // period/space) {
return i;
}
}
return -1;
}
Unfortunately, you can't use array of characters for indexOf, instead you need to call indexOf three times, or you can match a regex, the code you provided is invalid java syntax. this symbol || is a conditional OR operator that you can use to perform boolean operations like
if(x || y )

How to check if a string contains only digits in Java

In Java for String class there is a method called matches, how to use this method to check if my string is having only digits using regular expression. I tried with below examples, but both of them returned me false as result.
String regex = "[0-9]";
String data = "23343453";
System.out.println(data.matches(regex));
String regex = "^[0-9]";
String data = "23343453";
System.out.println(data.matches(regex));
Try
String regex = "[0-9]+";
or
String regex = "\\d+";
As per Java regular expressions, the + means "one or more times" and \d means "a digit".
Note: the "double backslash" is an escape sequence to get a single backslash - therefore, \\d in a java String gives you the actual result: \d
References:
Java Regular Expressions
Java Character Escape Sequences
Edit: due to some confusion in other answers, I am writing a test case and will explain some more things in detail.
Firstly, if you are in doubt about the correctness of this solution (or others), please run this test case:
String regex = "\\d+";
// positive test cases, should all be "true"
System.out.println("1".matches(regex));
System.out.println("12345".matches(regex));
System.out.println("123456789".matches(regex));
// negative test cases, should all be "false"
System.out.println("".matches(regex));
System.out.println("foo".matches(regex));
System.out.println("aa123bb".matches(regex));
Question 1:
Isn't it necessary to add ^ and $ to the regex, so it won't match "aa123bb" ?
No. In java, the matches method (which was specified in the question) matches a complete string, not fragments. In other words, it is not necessary to use ^\\d+$ (even though it is also correct). Please see the last negative test case.
Please note that if you use an online "regex checker" then this may behave differently. To match fragments of a string in Java, you can use the find method instead, described in detail here:
Difference between matches() and find() in Java Regex
Question 2:
Won't this regex also match the empty string, "" ?*
No. A regex \\d* would match the empty string, but \\d+ does not. The star * means zero or more, whereas the plus + means one or more. Please see the first negative test case.
Question 3
Isn't it faster to compile a regex Pattern?
Yes. It is indeed faster to compile a regex Pattern once, rather than on every invocation of matches, and so if performance implications are important then a Pattern can be compiled and used like this:
Pattern pattern = Pattern.compile(regex);
System.out.println(pattern.matcher("1").matches());
System.out.println(pattern.matcher("12345").matches());
System.out.println(pattern.matcher("123456789").matches());
You can also use NumberUtil.isNumber(String str) from Apache Commons
Using regular expressions is costly in terms of performance. Trying to parse string as a long value is inefficient and unreliable, and may be not what you need.
What I suggest is to simply check if each character is a digit, what can be efficiently done using Java 8 lambda expressions:
boolean isNumeric = someString.chars().allMatch(x -> Character.isDigit(x));
One more solution, that hasn't been posted, yet:
String regex = "\\p{Digit}+"; // uses POSIX character class
You must allow for more than a digit (the + sign) as in:
String regex = "[0-9]+";
String data = "23343453";
System.out.println(data.matches(regex));
Long.parseLong(data)
and catch exception, it handles minus sign.
Although the number of digits is limited this actually creates a variable of the data which can be used, which is, I would imagine, the most common use-case.
We can use either Pattern.compile("[0-9]+.[0-9]+") or Pattern.compile("\\d+.\\d+"). They have the same meaning.
the pattern [0-9] means digit. The same as '\d'.
'+' means it appears more times.
'.' for integer or float.
Try following code:
import java.util.regex.Pattern;
public class PatternSample {
public boolean containNumbersOnly(String source){
boolean result = false;
Pattern pattern = Pattern.compile("[0-9]+.[0-9]+"); //correct pattern for both float and integer.
pattern = Pattern.compile("\\d+.\\d+"); //correct pattern for both float and integer.
result = pattern.matcher(source).matches();
if(result){
System.out.println("\"" + source + "\"" + " is a number");
}else
System.out.println("\"" + source + "\"" + " is a String");
return result;
}
public static void main(String[] args){
PatternSample obj = new PatternSample();
obj.containNumbersOnly("123456.a");
obj.containNumbersOnly("123456 ");
obj.containNumbersOnly("123456");
obj.containNumbersOnly("0123456.0");
obj.containNumbersOnly("0123456a.0");
}
}
Output:
"123456.a" is a String
"123456 " is a String
"123456" is a number
"0123456.0" is a number
"0123456a.0" is a String
According to Oracle's Java Documentation:
private static final Pattern NUMBER_PATTERN = Pattern.compile(
"[\\x00-\\x20]*[+-]?(NaN|Infinity|((((\\p{Digit}+)(\\.)?((\\p{Digit}+)?)" +
"([eE][+-]?(\\p{Digit}+))?)|(\\.((\\p{Digit}+))([eE][+-]?(\\p{Digit}+))?)|" +
"(((0[xX](\\p{XDigit}+)(\\.)?)|(0[xX](\\p{XDigit}+)?(\\.)(\\p{XDigit}+)))" +
"[pP][+-]?(\\p{Digit}+)))[fFdD]?))[\\x00-\\x20]*");
boolean isNumber(String s){
return NUMBER_PATTERN.matcher(s).matches()
}
Refer to org.apache.commons.lang3.StringUtils
public static boolean isNumeric(CharSequence cs) {
if (cs == null || cs.length() == 0) {
return false;
} else {
int sz = cs.length();
for(int i = 0; i < sz; ++i) {
if (!Character.isDigit(cs.charAt(i))) {
return false;
}
}
return true;
}
}
In Java for String class, there is a method called matches(). With help of this method you can validate the regex expression along with your string.
String regex = "^[\\d]{4}$";
String value = "1234";
System.out.println(data.matches(value));
The Explanation for the above regex expression is:-
^ - Indicates the start of the regex expression.
[] - Inside this you have to describe your own conditions.
\\\d - Only allows digits. You can use '\\d'or 0-9 inside the bracket both are same.
{4} - This condition allows exactly 4 digits. You can change the number according to your need.
$ - Indicates the end of the regex expression.
Note: You can remove the {4} and specify + which means one or more times, or * which means zero or more times, or ? which means once or none.
For more reference please go through this website: https://www.rexegg.com/regex-quickstart.html
Offical regex way
I would use this regex for integers:
^[-1-9]\d*$
This will also work in other programming languages because it's more specific and doesn't make any assumptions about how different programming languages may interpret or handle regex.
Also works in Java
\\d+
Questions regarding ^ and $
As #vikingsteve has pointed out in java, the matches method matches a complete string, not parts of a string. In other words, it is unnecessary to use ^\d+$ (even though it is the official way of regex).
Online regex checkers are more strict and therefore they will behave differently than how Java handles regex.
Try this part of code:
void containsOnlyNumbers(String str)
{
try {
Integer num = Integer.valueOf(str);
System.out.println("is a number");
} catch (NumberFormatException e) {
// TODO: handle exception
System.out.println("is not a number");
}
}

Categories