Java Regular expression to find out the number of matching words - java

I am learning regular expressions. Suppose I have two strings, abcd and bcdd. To make them equal, I have to remove the a from the first string and the final d from the second. Is it possible to count the characters that match, e.g. bcd => 3?
Currently, I am doing this
Pattern p= Pattern.compile("["+abcd+"]{2}");
Matcher m= p.matcher("abcd bcdd");
My current solution doesn't give me the correct result. So, my questions are:
1) Is this possible?
2) If so, how can I achieve it?
I hope you can help me improve my knowledge of regular expressions.

Not sure why you would use regex at all, if all you need is the number of "bcd"s. I've put both a non-regex and regex version here for comparison.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
<P>{@code java BcdRegexXmpl}</P>
**/
public class BcdRegexXmpl {
public static final void main(String[] igno_red) {
String sSentence = "abcd bcdd";
int iBcds = 0;
int iIdx = 0;
while(true) {
int iBcdIdx = sSentence.indexOf("bcd", iIdx);
if(iBcdIdx == -1) {
break;
}
iIdx = iBcdIdx + "bcd".length();
iBcds++;
}
System.out.println("Number of 'bcd's (no regex): " + iBcds);
//Alternatively
iBcds = 0;
//Same regex as @la-comadreja, with word-boundaries
//(for multiple "bcd"-s in a single word, remove the "\\b"-s)
Matcher m = Pattern.compile("\\b\\w*bcd\\w*\\b").matcher(sSentence);
while(m.find()) {
System.out.println("Found at index " + m.start());
iBcds++;
}
System.out.println("Number of 'bcd's (with regex): " + iBcds);
}
}
Output:
[R:\jeffy\programming\sandbox\xbnjava]java BcdRegexXmpl
Number of 'bcd's (no regex): 2
Found at index 0
Found at index 5
Number of 'bcd's (with regex): 2
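On Java 9 or later, a find() loop like the one above can be condensed with Matcher.results(). The following is only a sketch: it assumes the search term should be treated as a literal (hence Pattern.quote), so unlike the word-boundary regex it counts occurrences rather than words. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class LiteralCounter {
    // Counts non-overlapping occurrences of a literal string (requires Java 9+).
    static long count(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal))
                      .matcher(text)
                      .results()   // stream of MatchResult, one per match
                      .count();
    }

    public static void main(String[] args) {
        System.out.println(count("abcd bcdd", "bcd"));
    }
}
```

Because the matcher resumes after the end of each match, overlapping occurrences are not counted separately.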

Your pattern should be:
(a?)(bcd)(d?)
Another possibility is to write it as
\w*bcd\w*
If you want to count the number of "bcd"s in the string:
int bcds = 0;
for (int i = 0; i < str.length() - 2; i++) {
if (str.charAt(i) == 'b' && str.charAt(i+1) == 'c' && str.charAt(i+2) == 'd')
bcds++;
}

A maximally generalizable, concise and readable (and reasonably efficient) non-Regex answer:
int countMatches(String s, String searchStr) {
//Here, s is "abcd bcdd" and searchStr is "bcd"
int matches = 0;
for (int i = 0; i < s.length() - searchStr.length() + 1; i++) {
for (int j = 0; j < searchStr.length(); j++) {
if (s.charAt(i + j) != searchStr.charAt(j)) break;
if (j == searchStr.length() - 1) matches++;
}
}
return matches;
}
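The answers above count occurrences of a known substring. If the goal in the question is instead to find how many consecutive characters two strings share (bcd => 3 for abcd and bcdd), that is the longest-common-substring problem, which a regex is not well suited to. A minimal dynamic-programming sketch, assuming "matched" means one contiguous run; the class and method names are hypothetical:

```java
final class CommonSubstring {
    // Length of the longest contiguous run of characters present in both strings.
    static int longestCommonSubstringLength(String a, String b) {
        int[] prev = new int[b.length() + 1]; // run lengths ending at a[i-2]
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Extend the run that ended at the previous pair of characters.
                    curr[j] = prev[j - 1] + 1;
                    best = Math.max(best, curr[j]);
                }
            }
            prev = curr;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstringLength("abcd", "bcdd"));
    }
}
```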

Related

Compress the string in java

Please help with the java code below.
When I give input, for example, aabbcccd,
the output is 99100102d, but it should be a2b2c3d.
Can anyone tell what's my mistake in this code? (This code tries to capture input and output how often a specific char has been typed)
import java.util.*;
public class Main {
public static void main(String args[]) {
try {
Scanner scn = new Scanner(System.in);
String s = scn.nextLine(); // taking input
StringBuilder str = new StringBuilder(s);
StringBuilder str_new = new StringBuilder();
int i = 0 ;
while (i < str.length()) {
int count = 1;
while (i < str.length()-1 && str.charAt(i) == str.charAt(i+1)){
count += 1;
i++;
}
if (count == 1)
str_new.append(str.charAt(i));
else
str_new.append(str.charAt(i) + (char)count);
i++;
}
System.out.println(str_new);
} catch (Exception e) {
return;
}
}
}
The problem comes from str.charAt(i) + (char)count: both operands are chars, so they are promoted to int and added, and append() receives the numeric sum (for a2 that is 97 + 2 = 99).
Solve that by using consecutive append() calls:
str_new.append(str.charAt(i)).append(count);
You can shorten the code with an outer for loop and a ternary operator in the append, incrementing only i in the inner while after saving its starting value:
int count;
for (int i = 0; i < str.length(); i++) {
count = i;
while (i < str.length() - 1 && str.charAt(i) == str.charAt(i + 1)) {
i++;
}
str_new.append(str.charAt(i)).append((i - count) == 0 ? "" : (i - count + 1));
}
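The char arithmetic behind the original bug can be seen in isolation. This is a small illustrative sketch (the class and method names are invented here), contrasting the summed form with chained append() calls:

```java
final class CharAppendDemo {
    // char + char is promoted to int, so StringBuilder.append(int) is chosen
    // and the decimal value of the sum is appended.
    static String summed(char c, int count) {
        return new StringBuilder().append(c + (char) count).toString();
    }

    // Chained calls append the character first and the count second.
    static String chained(char c, int count) {
        return new StringBuilder().append(c).append(count).toString();
    }

    public static void main(String[] args) {
        System.out.println(summed('a', 2));
        System.out.println(chained('a', 2));
    }
}
```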
Your primary issue was the use of the StringBuilder and how the values were appended, which I show in this example. In this case, though, I am using regular expressions.
(.) is a capture block that matches any single character.
\\1* is a back-reference to the first capture block, repeated 0 or more times, so together they match a whole run of the same character.
The following code constructs the Matcher for the entered text and then continues to find subsequent matches. They could be printed out as found or placed in a StringBuilder as I chose to do.
Scanner scn = new Scanner(System.in);
String text = scn.nextLine();
Matcher m = Pattern.compile("(.)\\1*").matcher(text);
StringBuilder sb = new StringBuilder();
while (m.find()) {
String s = m.group();
int count = s.length();
sb.append(s.charAt(0)).append(count > 1 ? count : "");
}
System.out.println(sb.toString());
For the input aaabbbbcadb, this prints
a3b4cadb

Java - extract content inside square brackets (ignore nested square brackets)? [duplicate]

This question already has an answer here:
Match contents within square brackets, including nested square brackets
(1 answer)
Closed 3 years ago.
I want to extract the string content inside square brackets (if inside one square brackets contains nested square brackets, it should be ignored).
Example:
c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5
Should return:
match1 = "ts[0],99:99,99:99";
match2 = "ts[1],99:99,99:99, ts[2]";
The code I have so far works only with non-nested square brackets
String in = "c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5";
Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher(in);
while(m.find()) {
System.out.println(m.group(1));
}
// print: ts[0, ts[1, 2
I made a function to do it (not with regex, but it works)
for (int i = 0; i < in.length(); i++){
char c = in.charAt(i);
String part = String.valueOf(c);
int numberOfOpenBrackets = 0;
if (c == '[') {
part = "";
numberOfOpenBrackets++;
for (int j = i + 1; j < in.length(); j++) {
char d = in.charAt(j);
if (d == '[') {
numberOfOpenBrackets++;
}
if (d == ']') {
numberOfOpenBrackets--;
i = j;
if (numberOfOpenBrackets == 0) {
break;
}
}
part += d;
}
System.out.println(part);
part = "[" + part + "]";
}
result += part;
}
// print: ts[0],99:99,99:99
// ts[1],99:99,99:99, ts[2]
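If the nesting is just one level deep, you can search for a sequence between the brackets, where the sequence consists of:
either a character that is neither [ nor ]
or a [ followed by the shortest run up to the next ]
So
Pattern p = Pattern.compile("\\[((?:[^\\[\\]]|\\[[^\\[\\]]*\\])*)\\]");
// \\[ ... \\]        the outer pair; group 1 is its content
// [^\\[\\]]          a character that is neither [ nor ]
// \\[[^\\[\\]]*\\]   or an inner [ ... ] pair without further nesting
// (?: ... )*         repeatedly
The first alternative has to exclude ] as well as [; otherwise the greedy repetition runs past the first closing bracket and everything from the first [ to the last ] becomes a single match. Brackets must also be correctly paired: no syntax errors allowed.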
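A runnable sketch of the one-level idea, assuming both alternatives exclude ] so that each outer pair is matched separately, and that nesting never goes deeper than one level. The class name, method name and capture-group layout are choices made for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OneLevelBrackets {
    // Outer [ ... ] whose content is made of non-bracket characters
    // or inner [ ... ] pairs that contain no brackets themselves.
    private static final Pattern OUTER =
        Pattern.compile("\\[((?:[^\\[\\]]|\\[[^\\[\\]]*\\])*)\\]");

    static List<String> extract(String in) {
        List<String> out = new ArrayList<>();
        Matcher m = OUTER.matcher(in);
        while (m.find()) {
            out.add(m.group(1)); // content of the outer pair, inner pairs included
        }
        return out;
    }

    public static void main(String[] args) {
        String in = "c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5";
        extract(in).forEach(System.out::println);
    }
}
```

For deeper nesting a bracket counter, as in the non-regex answers, is the more dependable route.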
Without regex; just straight java:
import java.util.ArrayList;
import java.util.List;
public class BracketParser {
public static List<String> parse(String target) throws Exception {
List<String> results = new ArrayList<>();
for (int idx = 0; idx < target.length(); idx++) {
if (target.charAt(idx) == '[') {
String result = readResult(target, idx + 1);
if (result == null) throw new Exception();
results.add(result);
idx += result.length() + 1;
}
}
return results;
}
private static String readResult(String target, int startIdx) {
int openBrackets = 0;
for (int idx = startIdx; idx < target.length(); idx++) {
char c = target.charAt(idx);
if (openBrackets == 0 && c == ']')
return target.substring(startIdx, idx);
if (c == '[') openBrackets++;
if (c == ']') openBrackets--;
}
return null;
}
public static void main(String[] args) throws Exception {
System.out.println(parse("c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5"));
}
}
You might want to anchor the expression at ts on the left and at a boundary such as "] +" on the right, and capture everything in between. Something similar to this expression might work:
(ts.*?)(\]\s+\+)
If other characters can follow the closing bracket in place of \s+\+, you can add them as alternatives (or in a character class) and it would still work.
If this wasn't your desired expression, you can modify and test your expressions on regex101.com.
You can also visualize your expressions on jex.im.

searching a Char letter by letter

I am trying to search for patterns of letters in a file. The pattern is entered by the user as a String. So far I've got it to find the first letter, but I am unsure how to test whether the next letter also matches the pattern.
This is the loop I currently have. Any help would be appreciated.
public void exactSearch(){
if (pattern==null){UI.println("No pattern");return;}
UI.println("===================\nExact searching for "+patternString);
int j = 0 ;
for(int i=0; i<data.size(); i++){
if(patternString.charAt(i) == data.get(i) )
j++;
UI.println( "found at " + j) ;
}
}
You need to iterate over the first string until you find the first character of the other string. From there, you can create an inner loop and iterate on both simultaneously, like you did.
Hint: be sure to watch the boundaries, as the strings might not be of the same size.
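The approach described above can be sketched as follows. This is an illustration only: it works on two Strings rather than on the question's data list, and the class and method names are assumptions. The outer loop bound takes care of the boundary hint.

```java
import java.util.ArrayList;
import java.util.List;

final class NaiveSearch {
    // Returns every index at which pattern starts in text (overlaps included).
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) {
            return hits; // nothing sensible to report for an empty pattern
        }
        // Stop once fewer characters remain than the pattern needs.
        for (int i = 0; i + pattern.length() <= text.length(); i++) {
            int j = 0;
            // Inner loop: advance through both strings while they agree.
            while (j < pattern.length() && text.charAt(i + j) == pattern.charAt(j)) {
                j++;
            }
            if (j == pattern.length()) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("AAACABCCCABC", "ABC"));
    }
}
```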
You can try this :-
String a1 = "foo-bar-baz-bar-";
String pattern = "bar";
int foundIndex = 0;
while(foundIndex != -1) {
foundIndex = a1.indexOf(pattern,foundIndex);
if(foundIndex != -1)
{
System.out.println(foundIndex);
foundIndex += 1;
}
}
indexOf - first parameter is the pattern string,
second parameter is starting index from where we have to search.
If pattern is found, it will return the starting index from where the pattern matched.
If pattern is not found, indexOf will return -1.
String data = "foo-bar-baz-bar-";
String pattern = "bar";
int foundIndex = data.indexOf(pattern);
while (foundIndex > -1) {
System.out.println("Match found at: " + foundIndex);
foundIndex = data.indexOf(pattern, foundIndex + pattern.length());
}
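The two indexOf loops above differ in how far they advance after a hit: stepping by 1 also reports overlapping matches, while stepping by the pattern length skips them. A small sketch that makes the choice explicit (the class, method and the allowOverlap flag are invented for this example):

```java
final class IndexOfCount {
    // Counts occurrences of pattern in text using String.indexOf(String, int).
    static int count(String text, String pattern, boolean allowOverlap) {
        if (pattern.isEmpty()) {
            return 0; // avoid an endless loop on the empty pattern
        }
        int step = allowOverlap ? 1 : pattern.length();
        int count = 0;
        for (int at = text.indexOf(pattern); at != -1; at = text.indexOf(pattern, at + step)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count("foo-bar-baz-bar-", "bar", false));
    }
}
```

For a pattern like "bar" both settings give the same answer; they only differ when the pattern can overlap itself, as "aa" does in "aaaa".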
Based on your request, you can use this algorithm to search for your positions:
1) To avoid a StringIndexOutOfBoundsException at the end of the string, we stop as soon as the remaining part of the input is shorter than the pattern.
2) At each iteration we take the substring of the pattern's length starting at the current position and compare it with the pattern.
List<Integer> positionList = new LinkedList<>();
String inputString = "AAACABCCCABC";
String pattern = "ABC";
for (int i = 0 ; i < inputString.length(); i++) {
if (inputString.length() - i < pattern.length()){
break;
}
String currentSubString = inputString.substring(i, i + pattern.length());
if (currentSubString.equals(pattern)){
positionList.add(i);
}
}
for (Integer pos : positionList) {
System.out.println(pos); // Positions : 4 and 9
}
EDIT :
Maybe it can be optimized by not using a Collection for this simple task, but I used a LinkedList because it was quicker to write.

How to count white spaces in a given argument?

I find it strange that spaceCount doesn't add up when the expression is "12 + 1". I get an output of 0 for spaceCount even though it should be 2. Any insight would be appreciated!
public int countSpaces(String expr) {
String tok = expr;
int spaceCount = 0;
String delimiters = "+-*/#! ";
StringTokenizer st = new StringTokenizer(expr, delimiters, true);
while (st.hasMoreTokens()) {
if ((tok = st.nextToken()).equals(" ")) {
spaceCount++;
}
}
return spaceCount; // the expression is: 12 + 1, so this should return 2, but it returns 0;
}
Your code seems to be ok, but if you want to count spaces you can use this :
int count = str.length() - str.replace(" ", "").length();
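To check the claim that the tokenizer code is fine, here is a sketch that puts the question's loop next to the replace() one-liner. It assumes the same delimiter string as the question and passes true for returnDelims, so each delimiter comes back as its own token. The class and method names are hypothetical.

```java
import java.util.StringTokenizer;

final class SpaceCount {
    // The question's approach: delimiters are returned as one-character tokens.
    static int viaTokenizer(String expr) {
        StringTokenizer st = new StringTokenizer(expr, "+-*/#! ", true);
        int spaces = 0;
        while (st.hasMoreTokens()) {
            if (st.nextToken().equals(" ")) {
                spaces++;
            }
        }
        return spaces;
    }

    // The one-liner: the length difference after removing every space.
    static int viaReplace(String expr) {
        return expr.length() - expr.replace(" ", "").length();
    }

    public static void main(String[] args) {
        System.out.println(viaTokenizer("12 + 1"));
        System.out.println(viaReplace("12 + 1"));
    }
}
```

If the method in the question really returns 0, the input it receives probably contains no spaces by the time it is called, which would be worth checking at the call site.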
A tokenizer is overkill (and doesn't really help you) for this problem. Just loop through all the characters and count the spaces:
public int countSpaces( String expr )
{
int count = 0;
for( int i = 0; i < expr.length(); ++i )
{
if( expr.charAt(i) == ' ' )
++count;
}
return count;
}
Another one line solution could be the following which also performs a NULL check to the string.
int spacesCount = str == null ? 0 : str.length() - str.replace(" ", "").length();
Can also use (with a limit of -1 so that trailing spaces are not dropped):
String[] strArr = str.split(" ", -1);
int countSpaces = strArr.length - 1;
This will find whitespace characters, including special ones such as tabs and line breaks.
You can keep the compiled pattern so you don't need to compile it every time. If you just need to search for " ", a simple loop will do instead.
Matcher spaces = Pattern.compile("\\s").matcher(argumentString);
int count = 0;
while (spaces.find()) {
count++;
}
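Keeping the compiled pattern around, as suggested above, could look like this. It is a sketch with invented names: the Pattern is compiled once into a constant and only a new Matcher is created per call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WhitespaceCounter {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    static int count(String s) {
        Matcher m = WHITESPACE.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count("12 + 1"));
    }
}
```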

Find count of digits in string variable

I have a string which sometimes gives character value and sometimes gives integer value. I want to get the count of number of digits in that string.
For example, if string contains "2485083572085748" then total number of digits is 16.
Please help me with this.
A cleaner solution using Regular Expressions:
// matches all non-digits, replaces it with "" and returns the length.
s.replaceAll("\\D", "").length()
String s = "2485083572085748";
int count = 0;
for (int i = 0, len = s.length(); i < len; i++) {
if (Character.isDigit(s.charAt(i))) {
count++;
}
}
Just to refresh this thread with stream option of counting digits in a string:
"2485083572085748".chars()
.filter(Character::isDigit)
.count();
If your string gets too big and is full of things other than digits, you should try to do it with regular expressions. The code below would do that for you:
String str = "asdasd 01829898 dasds ds8898";
Pattern p = Pattern.compile("\\d"); // "\\d" matches a digit; the backslash must be doubled in a Java string literal
Matcher m = p.matcher(str);
int count = 0;
while(m.find()){
count++;
}
check out java regex lessons for more.
cheers!
Loop each character and count it.
String s = "2485083572085748";
int counter = 0;
for(char c : s.toCharArray()) {
if( c >= '0' && c<= '9') {
++counter;
}
}
System.out.println(counter);
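The answers here do not all count exactly the same thing: Character.isDigit accepts any Unicode decimal digit, while the range check c >= '0' && c <= '9' (like \d without the UNICODE_CHARACTER_CLASS flag) only accepts ASCII digits. A short sketch of both, with hypothetical names, in case the input may contain non-ASCII digits:

```java
final class DigitCount {
    // Counts every character that Character.isDigit treats as a digit.
    static long unicodeDigits(String s) {
        return s.chars().filter(Character::isDigit).count();
    }

    // Counts only the ASCII digits '0' through '9'.
    static long asciiDigits(String s) {
        return s.chars().filter(c -> c >= '0' && c <= '9').count();
    }

    public static void main(String[] args) {
        System.out.println(unicodeDigits("2485083572085748"));
        System.out.println(asciiDigits("2485083572085748"));
    }
}
```

For the string in the question both methods agree; they differ only for input such as "\u0663" (ARABIC-INDIC DIGIT THREE).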
public static int getCount(String number) {
int flag = 0;
for (int i = 0; i < number.length(); i++) {
if (Character.isDigit(number.charAt(i))) {
flag++;
}
}
return flag;
}
in JavaScript:
str = "2485083572085748"; //using the string in the question
let nondigits = /\D/g; //regex for all non-digits
let digitCount = str.replaceAll(nondigits, "").length;
//counts the digits after removing all non-digits
console.log(digitCount); //see in console
Thanks --> https://stackoverflow.com/users/1396264/vedant for the Java version above. It helped me too.
int count = 0;
for(char c: str.toCharArray()) {
if(Character.isDigit(c)) {
count++;
}
}
Also see
Javadoc
Something like:
using System.Text.RegularExpressions;
Regex r = new Regex( "[0-9]" );
Console.WriteLine( "Matches " + r.Matches("if string contains 2485083572085748 then" ).Count );
