Remove begining punctuation from a word - java

I have seen a couple of threads here that kindof matches what I am asking here. But none are concrete. If I have a string like "New Delhi", I want my code to extract New Delhi. So here the quotes are stripped off. I want to strip off any punctuation, in general at start and end.
So far, this helps to strip the punctuations at the end: String replacedString = replaceable_string.replaceAll("\\p{Punct}*([a-z]+)\\p{Punct}*", "$1");
What am I doing wrong here? My output is "New Delhi with the beginning quote still there.

The following will remove a punctuation character from both the beginning and end of a String object if present:
String s = "\"New, Delhi\"";
// Output: New, Delhi
System.out.println(s.replaceAll("^\\p{Punct}|\\p{Punct}$", ""));
The ^ part of the Regex represents the beginning of the text, and $ represents the end of the text. So, ^\p{Punct} will match a punctuation that is a first character and \p{Punct}$ will match a punctuation that is a last character. I used | (OR) to match either the first expression or the second one, resulting in ^\p{Punct}|\p{Punct}$.
In case you want to remove all punctuation characters from the beginning and the end of the String object, you can use the following:
String s = "\"[{New, Delhi}]\"";
// Output: New, Delhi
System.out.println(s.replaceAll("^\\p{Punct}+|\\p{Punct}+$", ""));
I simply added the + sign after each \p{Punct}. The + sign means "One or more", so it will match many punctuations if they are present at the beginning or end of the text.
Hope this is what you were looking for :)

class SO {
public static void main(String[] args) {
String input = "\"New Delhi\"";
String output = "";
try {
output = input.replaceAll("(^\\p{P}+)(.+)(\\p{P}+$)", "($1)($2)($3)");
} catch (IndexOutOfBoundsException e) {
}
System.out.println("Input: " + input);
System.out.println("Output: " + output);
}
}
Result:
Input: "New Delhi"
Output: (")(New Delhi)(")

String replacedString = replacable_string.replaceAll("^\"|\"$", "");
or
String replacedString = replace_string.replace("\"", "");
should work also.

Try using this :
String data = "\"New Delhi\"";
Pattern pattern = Pattern.compile("[^\\w\\s]*([\\w\\s]+)[^\\w\\s]*");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
// Indicates match is found. Do further processing
System.out.println(matcher.group(1));
}

try
String s = "\"New Deli\"".replaceAll("\\p{Punct}*(\\P{Punct}+)\\p{Punct}*", "$1");

your [a-z] is only going to capture lower case letters and no spaces. Try ([a-zA-Z ])

Related

java regex replaceAll with negated groups

I'm trying to use the String.replaceAll() method with regex to only keep letter characters and ['-_]. I'm trying to do this by replacing every character that is neither a letter nor one of the characters above by an empty string.
So far I have tried something like this (in different variations) which correctly keeps letters but replaces the special characters I want to keep:
current = current.replaceAll("(?=\\P{L})(?=[^\\'-_])", "");
Make it simplier :
current = current.replaceAll("[^a-zA-Z'_-]", "");
Explanation :
Match any char not in a to z, A to Z, ', _, - and replaceAll() method will replace any matched char with nothing.
Tested input : "a_zE'R-z4r#m"
Output : a_zE'R-zrm
You don't need lookahead, just use negated regex:
current = current.replaceAll("[^\\p{L}'_-]+", "");
[^\\p{L}'_-] will match anything that is not a letter (unicode) or single quote or underscore or hyphen.
Your regex is too complicated. Just specify the characters you want to keep, and use ^ to negate, so [^a-z'_-] means "anything but these".
public class Replacer {
public static void main(String[] args) {
System.out.println("with 1234 &*()) -/.,>>?chars".replaceAll("[^\\w'_-]", ""));
}
}
You can try this:
String str = "Se#rbi323a`and_Eur$ope#-t42he-[A%merica]";
str = str.replaceAll("[\\d+\\p{Punct}&&[^-'_\\[\\]]]+", "");
System.out.println("str = " + str);
And it is the result:
str = Serbia'and_Europe-the-[America]

Regular expression to remove everything but words. java

This code doesn't seem doing the right job. It removes the spaces between the words!
input = scan.nextLine().replaceAll("[^A-Za-z0-9]", "");
I want to remove all extra spaces and all numbers or abbreviations from a string, except words and this character: '.
For Example:
input: 34 4fF$##D one 233 r # o'clock 329riewio23
returns: one o'clock
public static String filter(String input) {
return input.replaceAll("[^A-Za-z0-9' ]", "").replaceAll(" +", " ");
}
The first replace replaces all characters except alphabetic characters, the single-quote, and spaces. The second replace replaces all instances of one or more spaces, with a single space.
Your solution doesn't work because you don't replace numbers and you also replace the ' character.
Check out this solution:
Pattern pattern = Pattern.compile("[^| ][A-Za-z']{2,} ");
String input = scan.nextLine();
Matcher matcher = pattern.matcher(input);
StringBuilder result = new StringBuilder();
while (matcher.find()) {
result.append(matcher.group());
}
System.out.println(result.toString());
It looks for the beginning of the string or a space ([^| ]) and then takes all the following characters ([A-Za-z']). However, it only takes the word if there are 2 or more charactes ({2,}) and there has to be a trailing space.
If you want to just extract that time information use this regex group match:
input = scan.nextLine();
Pattern p = Pattern.compile("([a-zA-Z]{3,})\\s.*?(o'clock)");
Matcher m = p.matcher(input);
if (m.find()) {
input = m.group(1) + " " + m.group(2);
}
The regex is quite naive though, and will only work if the input is always of a similar format.

How to replace last letter to another letter in java using regular expression

i have seen to replace "," to "." by using ".$"|",$", but this logic is not working with alphabets.
i need to replace last letter of a word to another letter for all word in string containing EXAMPLE_TEST using java
this is my code
Pattern replace = Pattern.compile("n$");//here got the real problem
matcher2 = replace.matcher(EXAMPLE_TEST);
EXAMPLE_TEST=matcher2.replaceAll("k");
i also tried "//n$" ,"\n$" etc
Please help me to get the solution
input text=>njan ayman
output text=> njak aymak
Instead of the end of string $ anchor, use a word boundary \b
String s = "njan ayman";
s = s.replaceAll("n\\b", "k");
System.out.println(s); //=> "njak aymak"
You can use lookahead and group matching:
String EXAMPLE_TEST = "njan ayman";
s = EXAMPLE_TEST.replaceAll("(n)(?=\\s|$)", "k");
System.out.println("s = " + s); // prints: s = njak aymak
Explanation:
(n) - the matched word character
(?=\\s|$) - which is followed by a space or at the end of the line (lookahead)
The above is only an example! if you want to switch every comma with a period the middle line should be changed to:
s = s.replaceAll("(,)(?=\\s|$)", "\\.");
Here's how I would set it up:
(?=.\b)\w
Which in Java would need to be escaped as following:
(?=.\\b)\\w
It translates to something like "a character (\w) after (?=) any single character (.) at the end of a word (\b)".
String s = "njan ayman aowkdwo wdonwan. wadawd,.. wadwdawd;";
s = s.replaceAll("(?=.\\b)\\w", "");
System.out.println(s); //nja ayma aowkdw wdonwa. wadaw,.. wadwdaw;
This removes the last character of all words, but leaves following non-alphanumeric characters. You can specify only specific characters to remove/replace by changing the . to something else.
However, the other answers are perfectly good and might achieve exactly what you are looking for.
if (word.endsWith("char oldletter")) {
name = name.substring(0, name.length() - 1 "char newletter");
}

Regex for special characters in java

public static final String specialChars1= "\\W\\S";
String str2 = str1.replaceAll(specialChars1, "").replace(" ", "+");
public static final String specialChars2 = "`~!##$%^&*()_+[]\\;\',./{}|:\"<>?";
String str2 = str1.replaceAll(specialChars2, "").replace(" ", "+");
Whatever str1 is I want all the characters other than letters and numbers to be removed, and spaces to be replaced by a plus sign (+).
My problem is if I use specialChar1, it does not remove some characters like ;, ', ", and if I am use specialChar2 it gives me an error :
java.util.regex.PatternSyntaxException: Syntax error U_REGEX_MISSING_CLOSE_BRACKET near index 32:
How can this be to achieved?. I have searched but could not find a perfect solution.
This worked for me:
String result = str.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");
For this input string:
/-+!##$%^&())";:[]{}\ |wetyk 678dfgh
It yielded this result:
+wetyk+678dfgh
replaceAll expects a regex:
public static final String specialChars2 = "[`~!##$%^&*()_+[\\]\\\\;\',./{}|:\"<>?]";
The problem with your first regex, is that "\W\S" means find a sequence of two characters, the first of which is not a letter or a number followed by a character which is not whitespace.
What you mean is "[^\w\s]". Which means: find a single character which is neither a letter nor a number nor whitespace. (we can't use "[\W\S]" as this means find a character which is not a letter or a number OR is not whitespace -- which is essentially all printable character).
The second regex is a problem because you are trying to use reserved characters without escaping them. You can enclose them in [] where most characters (not all) do not have special meanings, but the whole thing would look very messy and you have to check that you haven't missed out any punctuation.
Example:
String sequence = "qwe 123 :#~ ";
String withoutSpecialChars = sequence.replaceAll("[^\\w\\s]", "");
String spacesAsPluses = withoutSpecialChars.replaceAll("\\s", "+");
System.out.println("without special chars: '"+withoutSpecialChars+ '\'');
System.out.println("spaces as pluses: '"+spacesAsPluses+'\'');
This outputs:
without special chars: 'qwe 123 '
spaces as pluses: 'qwe+123++'
If you want to group multiple spaces into one + then use "\s+" as your regex instead (remember to escape the slash).
I had a similar problem to solve and I used following method:
text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
Code with time bench marking
public static String cleanPunctuations(String text) {
return text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
}
public static void test(String in){
long t1 = System.currentTimeMillis();
String out = cleanPunctuations(in);
long t2 = System.currentTimeMillis();
System.out.println("In=" + in + "\nOut="+ out + "\nTime=" + (t2 - t1)+ "ms");
}
public static void main(String[] args) {
String s1 = "My text with 212354 digits spaces and \n newline \t tab " +
"[`~!##$%^&*()_+[\\\\]\\\\\\\\;\\',./{}|:\\\"<>?] special chars";
test(s1);
String s2 = "\"Sample Text=\" with - minimal \t punctuation's";
test(s2);
}
Sample Output
In=My text with 212354 digits spaces and
newline tab [`~!##$%^&*()_+[\\]\\\\;\',./{}|:\"<>?] special chars
Out=My+text+with+212354+digits+spaces+and+newline+tab+special+chars
Time=4ms
In="Sample Text=" with - minimal punctuation's
Out=Sample+Text+with+minimal+punctuations
Time=0ms
you can use a regex like this:
[<#![CDATA[¢<(+|!$*);¬/¦,%_>?:#="~{#}\]]]#>]`
remove "#" at first and at end from expression
regards
#npinti
using "\w" is the same as "\dA-Za-z"
This worked for me:
String result = str.replaceAll("[^\\w ]", "").replaceAll("\\s+", "+");

Match only first and last character of a string

I had a look at other stackoverflow questions and couldn't find one that asked the same question, so here it is:
How do you match the first and last characters of a string (can be multi-line or empty).
So for example:
String = "this is a simple sentence"
Note that the string includes the beginning and ending quotation marks.
How do I get match the first and last characters where the string begins and ends with a quotation mark (").
I tried:
^"|$" and \A"\Z"
but these do not produce the desired result.
Thanks for your help in advance :)
Is this what you are looking for?
String input = "\"this is a simple sentence\"";
String result = input.replaceFirst("(?s)^\"(.*)\"$", " $1 ");
This will replace the first and last character of the input string with spaces if it starts and ends with ". It will also work across multiple lines since the DOTALL flag is specified by (?s).
The regex that matches the whole input ".*". In java, it looks like this:
String regex = "\".*\"";
System.out.println("\"this is a simple sentence\"".matches(regex)); // true
System.out.println("this is a simple sentence".matches(regex)); // false
System.out.println("this is a simple sentence\"".matches(regex)); // false
If you want to remove the quotes, use this:
String input = "\"this is a simple sentence\"";
input = input.replaceAll("(^\"|\"$)", "")); // this is a simple sentence (without any quotes)
If you want this to work over multiple lines, use this:
String input = "\"this is a simple sentence\"\n\"and another sentence\"";
System.out.println(input + "\n");
input = input.replaceAll("(?m)(^\"|\"$)", "");
System.out.println(input);
which produces output:
"this is a simple sentence"
"and another sentence"
this is a simple sentence
and another sentence
Explanation of regex (?m)(^"|"$):
(?m) means "Caret and dollar match after and before newlines for the remainder of the regular expression"
(^"|"$) means ^" OR "$, which means "start of line then a double quote" OR "double quote then end of line"
Why not use the simple logic of getting the first and last characters based on charAt method of String? Place a few checks for empty/incomplete strings and you should be done.
String regexp = "(?s)\".*\"";
String data = "\"This is some\n\ndata\"";
Matcher m = Pattern.compile(regexp).matcher(data);
if (m.find()) {
System.out.println("Match starts at " + m.start() + " and ends at " + m.end());
}

Categories