Regular expression to remove everything but words. java

This code doesn't seem to do the right job. It removes the spaces between the words!
input = scan.nextLine().replaceAll("[^A-Za-z0-9]", "");
I want to remove all extra spaces and all numbers or abbreviations from a string, except words and this character: '.
For Example:
input: 34 4fF$##D one 233 r # o'clock 329riewio23
returns: one o'clock

public static String filter(String input) {
return input.replaceAll("[^A-Za-z0-9' ]", "").replaceAll(" +", " ");
}
The first replace removes every character except letters, digits, the single quote, and spaces. The second replace collapses each run of one or more spaces into a single space.

Your solution doesn't work because you don't replace numbers and you also replace the ' character.
Check out this solution:
Pattern pattern = Pattern.compile("[^| ][A-Za-z']{2,} ");
String input = scan.nextLine();
Matcher matcher = pattern.matcher(input);
StringBuilder result = new StringBuilder();
while (matcher.find()) {
result.append(matcher.group());
}
System.out.println(result.toString());
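Note that [^| ] is a negated character class: it matches any single character other than | or a space, not "the beginning of the string or a space" (that would be (^| )). The pattern then takes the following characters ([A-Za-z']), but only if there are 2 or more of them ({2,}), and there has to be a trailing space. It finds one and o'clock in the example, but a token such as 4fF followed by a space would also match, and a word at the very end of the input without a trailing space would not.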
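The expected output in the question keeps only whole tokens made of letters and apostrophes, so another way to read the requirement is to split on whitespace and keep the tokens that qualify. A minimal sketch, assuming a "word" means two or more characters from [A-Za-z'] (the names WordFilter and keepWords are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

final class WordFilter {
    // Keeps only tokens that consist entirely of letters and apostrophes
    // and are at least two characters long (an assumption about "word").
    static String keepWords(String input) {
        List<String> kept = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (token.matches("[A-Za-z']{2,}")) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String input = "34 4fF$##D one 233 r # o'clock 329riewio23";
        System.out.println(keepWords(input)); // one o'clock
    }
}
```

Because the check is applied to each whole token, mixed tokens such as 4fF$##D are dropped entirely instead of being trimmed down to their letters.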

If you want to just extract that time information use this regex group match:
input = scan.nextLine();
Pattern p = Pattern.compile("([a-zA-Z]{3,})\\s.*?(o'clock)");
Matcher m = p.matcher(input);
if (m.find()) {
input = m.group(1) + " " + m.group(2);
}
The regex is quite naive though, and will only work if the input is always of a similar format.

Related

I want to replace a string with repeating char characters

Given regex I want to replace that part of string with multiple "." character based on its size.
I tried something like this:
s = s.replaceAll(matcher.group(1),"." * matcher.group(1).length() );
but "." * length gives an error. Is there any way I can fix that?
You might have to use a formal pattern matcher here:
String input = "Peas porridge hot, peas porridge cold";
Pattern pattern = Pattern.compile("(?i)\\bpeas\\b");
Matcher m = pattern.matcher(input);
StringBuffer buffer = new StringBuffer();
while(m.find()) {
m.appendReplacement(buffer, m.group().replaceAll(".", "."));
}
m.appendTail(buffer);
System.out.println(buffer.toString());
// .... porridge hot, .... porridge cold
The above logic is to match each occurrence of peas (case insensitive). For each match, we pause and splice on a replacement which is the match (peas), with every character being replaced by dot.
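On newer Java versions the same idea can be written more directly. The sketch below assumes Java 11 or later, since it relies on String.repeat (Java 11) and on the Matcher.replaceAll overload that takes a function (Java 9); the class name MaskMatches and the method mask are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MaskMatches {
    // Replaces every match of regex in input with a run of dots of the same length.
    static String mask(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.replaceAll(match -> ".".repeat(match.group().length()));
    }

    public static void main(String[] args) {
        String input = "Peas porridge hot, peas porridge cold";
        System.out.println(mask(input, "(?i)\\bpeas\\b"));
        // .... porridge hot, .... porridge cold
    }
}
```

Here ".".repeat(n) plays the role that "." * n was meant to play in the question.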

Java Regex Matcher skipping the matches

Below is my Java code to delete all pairs of adjacent letters that match, but I am getting some problems with the Java Matcher class.
My Approach
I am trying to find all successive repeated characters in the input e.g.
aaa, bb, ccc, ddd
Next replace the odd length match with the last matched pattern and even length match with "" i.e.
aaa -> a
bb -> ""
ccc -> c
ddd -> d
s has single occurrence, so it's not matched by the regex pattern and excluded from the substitution
I am calling Matcher.appendReplacement to do conditional replacement of the patterns matched in input, based on the group length (even or odd).
Code:
public static void main(String[] args) {
String s = "aaabbcccddds";
int i=0;
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("([a-z])\\1+");
Matcher m = repeatedChars.matcher(s);
while(m.find()) {
if(m.group(i).length()%2==0)
m.appendReplacement(output, "");
else
m.appendReplacement(output, "$1");
i++;
}
m.appendTail(output);
System.out.println(output);
}
Input : aaabbcccddds
Actual Output : aaabbcccds (only replacing ddd with d but skipping aaa, bb and ccc)
Expected Output : acds
This can be done in a single replaceAll call like this:
String repl = str.replaceAll( "(?:(.)\\1)+", "" );
The regex (?:(.)\\1)+ matches every run of adjacent identical pairs and replaces it with the empty string, thus leaving us with one character from each run of odd length.
Code using Pattern and Matcher:
final Pattern p = Pattern.compile( "(?:(.)\\1)+" );
Matcher m = p.matcher( "aaabbcccddds" );
String repl = m.replaceAll( "" );
//=> acds
You can try it like this:
public static void main(String[] args) {
String s = "aaabbcccddds";
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("(\\w)(\\1+)");
Matcher m = repeatedChars.matcher(s);
while(m.find()) {
if(m.group(2).length()%2!=0)
m.appendReplacement(output, "");
else
m.appendReplacement(output, "$1");
}
m.appendTail(output);
System.out.println(output);
}
It is similar to yours, but when you take just the first group you get only the first character, so its length is always 1. That's why I introduce a second group, which holds the remaining adjacent characters. Since its length is one less than the length of the whole run, I reverse the odd/even logic and voila -
acds
is printed.
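For comparison, the same odd/even rule can be applied without a regex by scanning runs directly. This is only a sketch of the rule stated in the question (keep one character from each odd-length run, drop even-length runs), in a single pass; the names RunParity and reduceRuns are hypothetical:

```java
final class RunParity {
    // Keeps one character from every odd-length run of identical characters
    // and drops even-length runs entirely. Single pass, no re-scanning.
    static String reduceRuns(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int j = i;
            // Advance j to the end of the current run.
            while (j < s.length() && s.charAt(j) == s.charAt(i)) {
                j++;
            }
            if ((j - i) % 2 == 1) {
                out.append(s.charAt(i));
            }
            i = j;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reduceRuns("aaabbcccddds")); // acds
    }
}
```

Like the regex versions, this does not re-scan the result, so an input such as abba yields aa.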
You don't need multiple if statements. Try:
(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)
Replace with $1
Java code:
str.replaceAll("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)", "$1");
Regex breakdown:
(?: Start of non-capturing group
(\\w) Capture a word character
(?:\\1\\1)+ Match an even number of same character
| Or
(\\w) Capture a word character
\\2+ Match any number of same character
) End of non-capturing group
(?!\\1|\\2) Not followed by previous captured characters
Using Pattern and Matcher with StringBuffer:
StringBuffer output = new StringBuffer();
Pattern repeatedChars = Pattern.compile("(?:(\\w)(?:\\1\\1)+|(\\w)\\2+)(?!\\1|\\2)");
Matcher m = repeatedChars.matcher(s);
while(m.find()) m.appendReplacement(output, "$1");
m.appendTail(output);
System.out.println(output);

JAVA : Ignoring the space at the beginning of a string read from file (splitting that first space)

In Java, I want to read a string from a file, but I need to ignore any space at the beginning of the string.
For example:
//1 space// what //more than 1 space// 893
The output must be:
what 893
with only one space between them and no space at the beginning.
I tried the String.split() method, but it didn't split on more than one space.
You could try a pattern such as "\\s*(\\S+)\\s+(\\d+)". That is any optional whitespace, followed by a group of non-whitespace characters, then any (consecutive) whitespace, and finally a group of digits. Like this,
String in = " what 893";
Pattern p = Pattern.compile("\\s*(\\S+)\\s+(\\d+)");
Matcher m = p.matcher(in);
if (m.matches()) {
System.out.printf("%s %s%n", m.group(1), m.group(2));
}
Output is
what 893
Try:
s=" what 893";
s = s.replaceAll("^\\s+", "");
s = s.replaceAll("\\s+", " ");
You can use String.trim() to remove the leading space.
You first can replace all multiple spaces with single space and then do the required operation.
String newString = originalString.trim().replaceAll("\\s+", " ");
// perform required operation on newString
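The trim-then-collapse approach can be wrapped in a small helper, together with a split that tolerates any amount of whitespace between fields. A minimal sketch; the class and method names are made up for illustration:

```java
final class WhitespaceFields {
    // Trims both ends and collapses every internal whitespace run to one space.
    static String normalize(String line) {
        return line.trim().replaceAll("\\s+", " ");
    }

    // Splits a line into its non-blank fields; an all-blank line gives no fields.
    static String[] fields(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String line = " what      893";
        System.out.println(normalize(line));      // what 893
        System.out.println(fields(line).length);  // 2
    }
}
```

Splitting on "\\s+" rather than on a single space is what makes split() cope with more than one space between the fields.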

Find words in string surrounded by "[" and "]":

I need help with a simple task in java. I have the following sentence:
Where Are You [Employee Name]?
your have a [Shift] shift..
I need to extract the strings that are surrounded by [ and ] signs.
I was thinking of using the split method with " " parameter and then find the single words, but I have a problem using that if the phrase I'm looking for contains: " ". using indexOf might be an option as well, only I don't know what is the indication that I have reached the end of the String.
What is the best way to perform this task?
Any help would be appreciated.
Try with regex \[(.*?)\] to match the words.
\[: escaped [ for literal match as it is a meta char.
(.*?) : match everything in a non-greedy way.
Sample code:
Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher("Where Are You [Employee Name]? your have a [Shift] shift.");
while(m.find()) {
System.out.println(m.group(1)); // group(1) is the text inside the brackets; group() would include [ and ]
}
Here is a Java regular expression that extracts the text between two brackets, including white space:
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="[ Employee Name ]";
String re1=".*?";
String re2="( )";
String re3="((?:[a-z][a-z]+))"; // Word 1
String re4="( )";
String re5="((?:[a-z][a-z]+))"; // Word 2
String re6="( )";
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String ws1=m.group(1);
String word1=m.group(2);
String ws2=m.group(3);
String word2=m.group(4);
String ws3=m.group(5);
System.out.print("("+ws1.toString()+")"+"("+word1.toString()+")"+"("+ws2.toString()+")"+"("+word2.toString()+")"+"("+ws3.toString()+")"+"\n");
}
}
}
If you want to ignore white space, remove the "( )" groups.
This is a Scanner-based solution:
Scanner sc = new Scanner("Where Are You [Employee Name]? your have a [Shift] shift..");
for (String s; (s = sc.findWithinHorizon("(?<=\\[).*?(?=\\])", 0)) != null;) {
System.out.println(s);
}
output
Employee Name
Shift
Use a StringBuilder (I assume you don't need synchronization).
As you suggested, indexOf() using your square bracket delimiters will give you a starting index and an ending index. Use substring(startIndex + 1, endIndex) to get exactly the string you want; the end index of substring is exclusive, so it already stops before the closing bracket.
I'm not sure what you meant by the end of the String, but indexOf("[") is the start and indexOf("]") is the end.
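The indexOf approach can be turned into a loop that collects every bracketed phrase. This is a sketch under the assumption that brackets are not nested; the names BracketScan and extract are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

final class BracketScan {
    // Collects the text between each '[' and the next ']' (no nesting support).
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        int open = text.indexOf('[');
        while (open >= 0) {
            int close = text.indexOf(']', open + 1);
            if (close < 0) {
                break; // unmatched '[': nothing more to extract
            }
            // substring's end index is exclusive, so ']' is left out.
            found.add(text.substring(open + 1, close));
            open = text.indexOf('[', close + 1);
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Where Are You [Employee Name]? your have a [Shift] shift..";
        System.out.println(extract(text)); // [Employee Name, Shift]
    }
}
```

The "end of the String" case is covered by indexOf returning -1, which ends the loop.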
That's pretty much the use case for a regular expression.
Try "(\\[[\\w ]*\\])" as your expression.
Pattern p = Pattern.compile("(\\[[\\w ]*\\])");
Matcher m = p.matcher("Where Are You [Employee Name]? your have a [Shift] shift..");
if (m.find()) {
String found = m.group();
}
What does this expression do?
First it defines a group (...)
Then it defines the starting point for that group. \[ matches a literal [. Since [ itself is a metacharacter in regular expressions, it has to be escaped with \, and since \ is reserved in Java string literals, it has to be escaped by another \.
Then it defines the body of the group, [\w ]*. Here the character class [] contains \w (any letter, digit, or underscore) and a blank. The * means zero or more of the preceding character class.
Then it defines the endpoint of the group \]
and closes the group )

Java and regular expression, substring

I am totally lost when it comes to regular expressions.
I get generated strings like:
Your number is (123,456,789)
How can I filter out 123,456,789?
You can use this regex for extracting the number including the commas
\(([\d,]*)\)
The first captured group will have your match. Code will look like this
String subjectString = "Your number is (123,456,789)";
Pattern regex = Pattern.compile("\\(([\\d,]*)\\)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
String resultString = regexMatcher.group(1);
System.out.println(resultString);
}
Explanation of the regex
"\\(" + // Match the character “(” literally
"(" + // Match the regular expression below and capture its match into backreference number 1
"[\\d,]" + // Match a single character present in the list below
// A single digit 0..9
// The character “,”
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
")" +
"\\)" // Match the character “)” literally
This will get you started http://www.regular-expressions.info/reference.html
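If the individual comma-separated parts are needed rather than the whole string, the captured group can be split afterwards. A small sketch, assuming the parts should stay as strings; the names ParenNumbers and parts are made up for illustration:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParenNumbers {
    private static final Pattern IN_PARENS = Pattern.compile("\\(([\\d,]*)\\)");

    // Returns the comma-separated pieces inside the first (...) group,
    // or an empty list when there is no such group or it is empty.
    static List<String> parts(String text) {
        Matcher m = IN_PARENS.matcher(text);
        if (!m.find() || m.group(1).isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(m.group(1).split(","));
    }

    public static void main(String[] args) {
        System.out.println(parts("Your number is (123,456,789)")); // [123, 456, 789]
    }
}
```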
String str="Your number is (123,456,789)";
str = str.replaceAll(".*\\((.*)\\).*","$1");
or you can make the replacement a bit faster by doing:
str = str.replaceAll(".*\\(([\\d,]*)\\).*","$1");
try
"\\(([^)]+)\\)"
or
int start = text.indexOf('(')+1;
int end = text.indexOf(')', start);
String num = text.substring(start, end);
private void showHowToUseRegex()
{
final Pattern MY_PATTERN = Pattern.compile("Your number is \\((\\d+),(\\d+),(\\d+)\\)");
final Matcher m = MY_PATTERN.matcher("Your number is (123,456,789)");
if (m.matches()) {
Log.d("xxx", "0:" + m.group(0));
Log.d("xxx", "1:" + m.group(1));
Log.d("xxx", "2:" + m.group(2));
Log.d("xxx", "3:" + m.group(3));
}
}
You'll see that group 0 is the whole string, and the next 3 groups are your numbers.
String str = "Your number is (123,456,789)";
str = str.substring(16, str.length() - 1);
