Deal with apostrophe in Java regex in replaceAll - java

I am trying to replace only the EXACT & WHOLE OCCURRENCES of a pattern using the following code. Apparently the you in you'll is being replaced too, giving ###'ll. But what I want is for only the standalone you to be replaced.
Please suggest.
import java.util.*;
import java.io.*;
public class Fielreadingtest{
public static void main(String[] args) throws IOException {
String MyText = "I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed. ";
String newLine = System.getProperty("line.separator");
System.out.println("Before:" + newLine + MyText);
String pattern = "\\byou\\b";
MyText = MyText.replaceAll(pattern, "###");
System.out.println("After:" + newLine +MyText);
}
}
/*
Before:
I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed.
After:
I knew about ### long before I met ###. I also know that ###’re an awesome person. By the way ###’ll be missed.
*/
This being said I have an input file which contains a list of words that I want to skip which looks like this:
Now as per @Anubhav I have to use (^|\\s)you([\\s.]|$) to replace exactly you but nothing else. Is my best bet to use a tool like Notepad++ to prefix and suffix all my input words as above, or to change something in the code itself? The code I'm using is this:
for (String pattern : patternsToSkip) {
line = line.replaceAll(pattern, "");
}
source: https://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_wordcount2_source.html?scroll=topic_7_1

You can instead use this regex:
String pattern = "(^|\\s)you([\\s.,;:-]|$)";
This will match "you" only when it is:
at the start of the input or preceded by whitespace
at the end of the input or followed by whitespace or one of the listed punctuation characters
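One thing to watch with this pattern: the whitespace or punctuation on either side is part of the match, so a plain replacement would swallow it. Below is a minimal sketch, assuming the delimiters should be kept and written back with $1 and $2. The class and method names are made up for illustration.

```java
class DelimitedYouReplace {
    // Same regex as above; group 1 is the leading delimiter, group 2 the trailing one.
    static final String PATTERN = "(^|\\s)you([\\s.,;:-]|$)";

    static String mask(String text) {
        // $1 and $2 put the captured delimiters back around the replacement.
        return text.replaceAll(PATTERN, "$1###$2");
    }

    public static void main(String[] args) {
        // \u2019 is the typographic apostrophe used in the question's sample text.
        System.out.println(mask("I met you. I know that you\u2019re kind, you see."));
    }
}
```

Because the trailing delimiter is consumed by the match, two occurrences separated by a single space ("you you") would only have the first one replaced. A lookahead-based pattern does not consume the delimiter and so avoids this.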

You can use a negative lookahead:
\b(you)(?!['’])
Escaped for a Java string:
"\\b(you)(?!['’])"
Your demo input contains a different apostrophe than on my keyboard. I've put both in the negative lookahead.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
/**
<P>{@code java ReplaceYouWholeWordWithAtAtAt}</P>
**/
public class ReplaceYouWholeWordWithAtAtAt {
public static final void main(String[] ignored) {
String sRegex = "\\byou(?!['’])";
String sToSearch = "I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed.";
String sRplcWith = "###";
Matcher m = Pattern.compile(sRegex).matcher(sToSearch);
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, sRplcWith);
}
m.appendTail(sb);
System.out.println(sb);
}
}
Output:
[C:\java_code\]java ReplaceYouWholeWordWithAtAtAt
I knew about ### long before I met ###. I also know that youÆre an awesome person. By the way youÆll be missed.
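For the list of skip words from the question, the wrapping could be done in code rather than by editing the input file. The following is only a sketch under assumptions: it treats each entry as a literal word (via Pattern.quote), swaps the negative lookahead above for a lookbehind/lookahead pair, and counts both apostrophe forms as part of a word. SkipWords, wholeWord and removeAll are hypothetical names, not part of the tutorial code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SkipWords {
    // Characters that count as "still inside a word": word chars plus ' and the typographic apostrophe.
    private static final String WORD_CHARS = "[\\w'\u2019]";

    // Builds a pattern that matches the literal word only when it stands alone.
    static Pattern wholeWord(String word) {
        return Pattern.compile("(?<!" + WORD_CHARS + ")" + Pattern.quote(word) + "(?!" + WORD_CHARS + ")");
    }

    // Removes every standalone occurrence of each skip word from the line.
    static String removeAll(String line, List<String> wordsToSkip) {
        for (String word : wordsToSkip) {
            line = wholeWord(word).matcher(line).replaceAll("");
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(removeAll("you said you\u2019ll call", Arrays.asList("you")));
    }
}
```

In a loop that runs once per input line, it would probably be worth compiling the patterns once up front instead of on every call.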

Related

Decipher this regex

I came back to a project I was working on several months ago, and one problem I figured out back then was how to extract a certain part of a String. The String used both parentheses and quotation marks, so I couldn't split it like normal text.
Example of how the String might look:
Word_Object("id"): preword:subword
Now say I wanted to only grab what's after the ("id"):, that is
'preword:subword'
I found that regex helped me out, and it took quite some time to find an EXAMPLE that was applicable for what I wanted. I had to settle for an example, because I tried to find sources on how to learn about this incredibly complex system but I failed hard at that. The regex that solved it looks like this: "Word_Object(\\(\"" + "id" + "\")\\): "
I was content then that it seemed to work, but when I got back to the project and tried it now, I was trying to extract a word that used an underscore _ and the underscore with the following word(s) was left out.
Example: splitting the text Word_Object("id"): preword:subword_underscoreword using the regex (using the complete line now) idSplit = subTemp.split("Word_Object(\\(\"" + "id" + "\")\\): "); would simply return preword:subword instead of the wanted preword:subword_underscoreword.
Did I somehow in this regex instruct it to ignore anything after the 2nd special-character (since it does accept :, but apparently _ breaks everything)?
public static void main(String[] args) {
final String[] split = "Word_Object(\"id\"): preword:subword_underscoreword".split("Word_Object(\\(\"" + "id" + "\")\\): ");
System.out.println("split = " + split[1]);
}
Leads to
split = preword:subword_underscoreword
Since you might need to keep the id dynamic, here is a replaceAll solution:
String s = "Word_Object(\"id\"): preword:subword_underscoreword";
System.out.println(s.replaceAll("Word_Object(\\(\"" + "id" + "\")\\):\\s*",""));
See IDEONE demo
Output: preword:subword_underscoreword
You should match instead of replacing or splitting:
private static final Pattern PRE_SUB_WORD_EXTRACT = Pattern.compile("Word_Object\\(\"\\w+\"\\): (\\w+):(\\w+)");
public static void main(String[] args) {
String test = "Word_Object(\"id\"): preword:subword_underscorewordusing";
Matcher testMatcher = PRE_SUB_WORD_EXTRACT.matcher(test);
if (!testMatcher.matches()) {
System.out.println("Bollocks");
System.exit(1);
}
System.out.printf("%s : %s%n", testMatcher.group(1), testMatcher.group(2));
}
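If the id has to stay dynamic while still matching rather than splitting, one possible variant is sketched below. It assumes the id should be treated as literal text (hence Pattern.quote) and that everything after the colon is wanted. WordObjectValue and valueFor are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordObjectValue {
    // Returns the text after Word_Object("<id>"): or null when the line has a different shape.
    static String valueFor(String id, String line) {
        Pattern p = Pattern.compile("Word_Object\\(\"" + Pattern.quote(id) + "\"\\):\\s*(.*)");
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(valueFor("id", "Word_Object(\"id\"): preword:subword_underscoreword"));
    }
}
```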
As mentioned in the comments, there's no need to use .split(): it gives you an array of Strings rather than the exact one you want. Just use .replace() with an empty string and you will get the result you need:
String str = "Word_Object(\"id\"): preword:subword_underscoreword";
String str2 = str.replace("Word_Object(\"id\"): ", "");
This is a DEMO that will give you preword:subword_underscoreword in output.

Regex to detect end of line(\n) that has double slash(//)

I need a regex for this example:
//This is a comment and I need this \n position
String notwanted ="//I do not need this end of line position";
Try this regex:
(?<!")\/\/[^\n]+(\n)
You can use the Matcher method matcher.start(1) to get the index of the \n character, but it will not match a string where // is preceded by ". Example in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args){
String example = "//This is a comment and I need this \\n position\n" +
"String notwanted =\"//I do not need this end of line position\";";
Pattern regex = Pattern.compile("(?<!\")//[^\\n]+(\\n)");
Matcher matcher = regex.matcher(example);
while (matcher.find()) {
System.out.println(matcher.start(1));
}
}
}
however it would be enough to use:
(?<!")\/\/[^\n]+
and just use matcher.end(), to get start position of new line.
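A small sketch of that shorter variant, collecting matcher.end() for every comment found. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentLineEnds {
    // Matches // (not directly preceded by a double quote) plus the rest of the line.
    private static final Pattern COMMENT = Pattern.compile("(?<!\")//[^\\n]+");

    // Returns, for each comment, the index just past its last character,
    // which is where the line's \n sits (or the end of the input).
    static List<Integer> commentEnds(String source) {
        List<Integer> ends = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            ends.add(m.end());
        }
        return ends;
    }

    public static void main(String[] args) {
        System.out.println(commentEnds("//abc\nString s =\"//no\";"));
    }
}
```

Note that [^\n]+ needs at least one character after the slashes, so an empty comment (just // followed by a newline) is not reported.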
Another case, if you would like to split a string using this position, you can also use this one:
example.split("(?<=^//[^\n]{0,1000})\n");
The (?<=^//[^\n]{0,1000}) means:
?<= - lookbehind,
^// - beginning of a line, followed by the // comment sign
[^\n]{0,1000} - multiple characters other than newlines; the tricky thing here is that a lookbehind needs a bounded length, so you cannot use quantifiers like * or +. This is why you need an interval, in this case from 0 to 1000 characters. Be aware that if a comment is longer than 1000 characters (unlikely, but possible) it will not work, so choose this number (1000 in this example) carefully
\n - the new line you are looking for
But if you would like to split the whole string in multiple places, you will need to add the modifier (?m) (multiline match) at the beginning of the regex:
(?m)(?<=^//[^\n]{0,1000})\n
but I'm not entirely sure
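To check the multiline variant, a short sketch along these lines can be run against a sample input. The input string and the names are made up for the test.

```java
import java.util.Arrays;

class SplitAfterCommentLines {
    static String[] split(String source) {
        // (?m) lets ^ match at every line start; the bounded {0,1000} keeps the lookbehind finite.
        return source.split("(?m)(?<=^//[^\n]{0,1000})\n");
    }

    public static void main(String[] args) {
        // Only the newlines that end a line starting with // should act as split points.
        System.out.println(Arrays.toString(split("//c1\ncode\n//c2\nend")));
    }
}
```

With this input the newline after code is not a split point, so code and //c2 stay together in one element.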
>>EDIT<< response to questions from comments
Try this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args){
String example =
"//This is a comment and I need this \\n position\n" +
"String notwanted =\"//I do not need this end of line position\";\n" +
"String a = aaa; //comment here";
Pattern regex = Pattern.compile("(?m)(?<=(^|;\\s{0,1000})//[^\n]{0,1000})(\n|$)");
Matcher matcher = regex.matcher(example);
while(matcher.find()){
System.out.println(matcher.start());
}
System.out.println(example.replaceAll("(?<=(^|;\\s{0,1000})//[^\n]{0,1000})(\n|$)", " (X)\n"));
}
}
Maybe this regex will fulfill your expectations. If not, please redefine the problem and ask another question with more details such as: input, expected output, your current code, your goal.
This should work for you. It's really really awful. Couldn't really think of a much better, versatile solution. I'm assuming you also wanted comments like this:
String myStr = "asasdasd"; //some comment here
^[^"\n]*?(?:[^"\n]*?"(?>\\"|[^"\n])*?"[^"\n]*?)*?[^"\n]*?\/\/.*?(\n)
Regex101

Java- I want to change a particular string with another one

import java.util.*;
import java.io.*;
public class OptimusPrime{
public static void main(String[] args){
System.out.println("Please enter the sentence");
Scanner scan= new Scanner(System.in);
String bucky=scan.nextLine();
int pOs=bucky.indexOf("is");
System.out.println(pOs);
if(pOs==-1){
System.out.println("the statement is invalid for the question");
}
else{
String nay=bucky.replace("is", "was");
System.out.println(nay);
}
}
}
Now I know the "replace" method is wrong, as I want to change only the standalone string "is" and not the "is" that is part of other words. I also tried using a SetChar method, but I guess the "string is immutable" concept applies here.
How to go about it?
Using String.replaceAll() instead enables you to use a regex. You can use the predefined character class \W in order to catch a non-word character :
System.out.println("This is not difficult".replaceAll("\\Wis", ""));
Output :
This not difficult
The verb is has disappeared, but not the is in This.
Note 1 : It also removes the non-word character. If you want to keep it, you can capture it with parentheses in the regex and then reintroduce it with $1:
System.out.println("This [is not difficult".replaceAll("(\\W)is", "$1"));
Output :
This [ not difficult
Note 2 : If you want to handle a string which begins with is, this line will not be enough but it is quite easy to handle with another regex.
System.out.println("is not difficult".replaceAll("^is", ""));
Output :
not difficult
If you use replaceAll instead, then you can use \b to use the word boundary to perform a "whole words only" search.
See this example:
public static void main(final String... args) {
System.out.println(replace("this is great", "is", "was"));
System.out.println(replace("crysis", "is", "was"));
System.out.println(replace("island", "is", "was"));
System.out.println(replace("is it great?", "is", "was"));
}
private static String replace(final String source, final String replace, final String with) {
return source.replaceAll("\\b" + replace + "\\b", with);
}
The output is:
this was great
crysis
island
was it great?
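The helper above concatenates its arguments straight into the regex, so a search word containing characters such as . or ( or a replacement containing $ would be interpreted rather than taken literally. A hedged variant, assuming both arguments are meant as plain text (the names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordReplacer {
    static String replaceWholeWord(String source, String word, String with) {
        // Pattern.quote makes the word literal; quoteReplacement does the same for $ and \ in the replacement.
        String regex = "\\b" + Pattern.quote(word) + "\\b";
        return source.replaceAll(regex, Matcher.quoteReplacement(with));
    }

    public static void main(String[] args) {
        System.out.println(replaceWholeWord("this is great", "is", "was"));
        System.out.println(replaceWholeWord("it is free", "is", "costs $0"));
    }
}
```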
Simpler way:
String nay = bucky.replaceAll(" is ", " was ");
Match word boundary:
String nay = bucky.replaceAll("\\bis\\b", "was");
To replace one string with another you can use this.
If your string variable contains:
bucky ="Android is my friend";
then you can do:
bucky =bucky.replace("is","are");
and bucky will then hold Android are my friend
Hope this helps you.

How To do this in Regex - code base alterations

I have a complete Java based code base, where members are named:
String m_sFoo;
Array m_arrKeepThings;
Variable/object names include both an m_ prefix to indicate a member and a Hungarian notation type indicator.
I'm looking for a way to perform a one-time code replacement to (for example, for the above two cases):
Array keepThings;
String foo;
Of course there are many other alternatives, but I hope that based on two examples, I'll be able to perform the full change.
Performance is not an issue as it's a one-time fix.
To clarify, if I had to explain this in lines, it would be:
Match words starting with m_[a-zA-Z].
After m_, drop whatever is there before the first Capital letter.
Change the first capital letter to lower case.
Check out this post: Regex to change to sentence case
Generally, I am afraid that you cannot change the case of letters using regular expressions alone.
I'd recommend implementing a simple utility (in any language you want; Java works fine). Just walk your file tree, search for a pattern like m_[sidc]([A-Z]), take the captured letter, call toLowerCase() on it and perform the replacement.
Another solution is to search and replace m_sA, then m_sB, ... m_sZ using Eclipse: 26 passes in total. It is a little bit stupid, but probably still faster than implementing and debugging your own code.
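The core of such a utility might look like the sketch below. It is only an illustration: it handles a single string rather than a file tree, it generalises the type prefix to any run of lowercase letters (an assumption, since the answer only lists s, i, d and c), and it does nothing to detect name clashes. MemberRenamer is a hypothetical name.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MemberRenamer {
    // m_ + lowercase type prefix + first capital (group 1) + rest of the identifier (group 2).
    private static final Pattern MEMBER = Pattern.compile("\\bm_[a-z]*([A-Z])(\\w*)");

    static String rename(String source) {
        Matcher m = MEMBER.matcher(source);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Lowercase only the first capital, keep the remainder as it is.
            String renamed = m.group(1).toLowerCase(Locale.ROOT) + m.group(2);
            m.appendReplacement(sb, Matcher.quoteReplacement(renamed));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rename("String m_sFoo;\nArray m_arrKeepThings;"));
    }
}
```

Names without a capital letter after the prefix, such as m_foo, do not match the pattern and are left unchanged.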
If you are really, really sure that the proposed change won't result in clashes (variables that only differ in their prefix) I would do it with a line of perl:
perl -pi.bak -e "s/\bm_[a-z_]+([A-Z]\w*)\b/this.\l$1/g;" *.java
This will perform an in-place edit of your Java sources while keeping a backup with the extension .bak. It replaces your pattern between word boundaries (\b), lowercases the first letter of the captured name (\l), and does so multiple times per line (/g).
You can then perform a diff between the backup files and the result files to see if all went well.
Here is some Java code that works. It is not pure regex, but based on:
Usage:
String str = "String m_sFoo;\n"
+ "Array m_arrKeepThings;\n"
+ "List<? extends Reader> m_lstReaders; // A silly comment\n"
+ "String.format(\"Hello World!\"); /* No m_named vars here */";
// Read the file you want to handle instead
NameMatcher nm = new NameMatcher(str);
System.out.println(nm.performReplacements());
NameMatcher.java
package so_6806699;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
*
* @author martijn
*/
public class NameMatcher
{
private String input;
public static final String REGEX = "m_[a-z]+([A-Z0-9_\\$\\µ\\£]*)";
public static final Pattern PATTERN = Pattern.compile(REGEX);
public NameMatcher(String input)
{
this.input = input;
}
public String performReplacements()
{
Matcher m = PATTERN.matcher(input);
StringBuilder sb = new StringBuilder();
int oldEnd = 0;
while (m.find())
{
int start = m.start();
int end = m.end();
String match = input.substring(start, end);
String matchGroup1 = match.replaceAll(REGEX, "$1");
if (!matchGroup1.isEmpty())
{
char[] match_array = matchGroup1.toCharArray();
match_array[0] = Character.toLowerCase(match_array[0]);
match = new String(match_array);
}
sb.append(input.substring(oldEnd, start));
oldEnd = end;
sb.append(match);
}
sb.append(input.substring(oldEnd));
return sb.toString();
}
}
Demo Output:
String foo;
Array keepThings;
List<? extends Reader> readers; // A silly comment
String.format("Hello World!"); /* No m_named vars here */
Edit 0:
Since dollar signs ($), micro (µ) and pound (£) are valid characters for Java name variables, I edited the regex.
Edit 1: It seems that there are a lot of non-latin characters that are valid (éùàçè, etc). Hopefully you don't have to handle them.
Edit 2: I'm only a human being! So be aware of errors there might be in the code! Make a BACKUP first!
Edit 3: Code improved. A NPE was thrown when the code contains this: m_foo. These will be unhandled.

How to check a string starts with numeric number?

I have a string which contains alphanumeric characters.
I need to check whether the string starts with a number.
Thanks,
See the isDigit(char ch) method:
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html
and pass it the first character of the String, obtained with the String.charAt() method.
Character.isDigit(myString.charAt(0));
Sorry I didn't see your Java tag, was reading question only. I'll leave my other answers here anyway since I've typed them out.
Java
String myString = "9Hello World!";
if ( Character.isDigit(myString.charAt(0)) )
{
System.out.println("String begins with a digit");
}
C++:
string myString = "2Hello World!";
if (isdigit( myString[0]) )
{
printf("String begins with a digit");
}
Regular expression:
\b[0-9]
Some proof my regex works: Unless my test data is wrong?
I think you ought to use a regex:
import java.util.regex.*;
public class Test {
public static void main(String[] args) {
String neg = "-123abc";
String pos = "123abc";
String non = "abc123";
/* I'm not sure if this regex is too verbose, but it should be
* clear. It checks that the string starts with either a series
* of one or more digits... OR a negative sign followed by 1 or
* more digits. Anything can follow the digits. Update as you need
* for things that should not follow the digits or for floating
* point numbers.
*/
Pattern pattern = Pattern.compile("^(\\d+.*|-\\d+.*)");
Matcher matcher = pattern.matcher(neg);
if(matcher.matches()) {
System.out.println("matches negative number");
}
matcher = pattern.matcher(pos);
if (matcher.matches()) {
System.out.println("positive matches");
}
matcher = pattern.matcher(non);
if (!matcher.matches()) {
System.out.println("letters don't match :-)!!!");
}
}
}
You may want to adjust this to accept floating point numbers, but this will work for negatives. Other answers won't work for negatives because they only check the first character! Be more specific about your needs and I can help you adjust this approach.
This should work:
String s = "123foo";
Character.isDigit(s.charAt(0));
System.out.println(Character.isDigit(s.charAt(0)));
EDIT: I searched for java docs, looked at methods on string class which can get me 1st character & looked at methods on Character class to see if it has any method to check such a thing.
I think, you could do the same before asking it.
EDIT 2: What I mean is, try to do things, read/find & if you can't find anything - ask.
I made a mistake when posting it for the first time. isDigit is a static method on Character class.
Use a regex like ^\d
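A short sketch of that regex next to the Character.isDigit approach from the other answers. The names are invented for the example, and the empty-string guard is an addition, since charAt(0) throws on an empty string.

```java
import java.util.regex.Pattern;

class StartsWithDigit {
    private static final Pattern LEADING_DIGIT = Pattern.compile("^\\d");

    // find() is used because matches() would require the whole string to be a single digit.
    static boolean viaRegex(String s) {
        return LEADING_DIGIT.matcher(s).find();
    }

    // Guard against the empty string before looking at the first character.
    static boolean viaCharacter(String s) {
        return !s.isEmpty() && Character.isDigit(s.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(viaRegex("9Hello World!"));
        System.out.println(viaCharacter("abc123"));
    }
}
```

By default \d in java.util.regex matches only the ASCII digits 0-9, while Character.isDigit also accepts digits from other scripts, so the two checks can differ on non-ASCII input.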
