Best way to replace repeated character in String - java

I am given the String "('allan', 'bob's', 'charles', 'dom')". Now I require this string but in the form "('allan', 'bob''s', 'charles', 'dom')".
Notice I have replaced bob's with bob''s and that is all. My initial solution went along the lines of
String str = "('allan', 'bob's', 'charles', 'dom')";
String[] elements = str.substring(1, str.length()-1).split(", ");
String res = "(";
for (int j = 0; j < elements.length; j++) {
res += "'"+ elements[j].substring(1, elements[j].length()-1).replace("'", "''") + "'" + ((j == elements.length - 1) ? ")" : ", ");
}
Where res is the final solution. However I am wondering if there is a shorter, more elegant solution to this?

replaceAll will work with "'s" as a regular expression.
public static void main(String[] args) {
String str = "('allan', 'bob's', 'charles', 'dom')";
str = str.replaceAll("'s", "''s");
System.out.println(str);
}
Output :
('allan', 'bob''s', 'charles', 'dom')

It sounds like you just want to replace a single quote that is between two letters with two single quotes. String.replaceAll() using the regex pattern "(\\w)(['])(\\w)" and replacement string "$1''$3" should get this done for you.
Pattern breakdown:
(\\w) - Capture a word character (letter, digit or underscore) into group 1
([']) - Capture the single quote into group 2
(\\w) - Capture a word character into group 3
Replacement string breakdown:
$1 - Capture group 1
'' - Two single quotes
$3 - Capture group 3
Code sample:
public static void main(String[] args) throws Exception {
String str = "('allan', 'bob's', 'charles', 'dom')";
str = str.replaceAll("(\\w)(['])(\\w)", "$1''$3");
System.out.println(str);
}
Results:
('allan', 'bob''s', 'charles', 'dom')
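One caveat with the capture-group pattern above: because it consumes the characters on both sides of the apostrophe, two apostrophes separated by a single character (as in a'b'c) are not both replaced. A minimal sketch of a lookaround variant that avoids this, assuming the same "apostrophe between two word characters" rule; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class ApostropheDoubler {
    // An apostrophe with a word character on each side. The lookarounds
    // consume nothing, so neighbouring apostrophes can all be matched.
    private static final Pattern INNER_QUOTE = Pattern.compile("(?<=\\w)'(?=\\w)");

    static String doubleInnerQuotes(String input) {
        return INNER_QUOTE.matcher(input).replaceAll("''");
    }

    public static void main(String[] args) {
        System.out.println(doubleInnerQuotes("('allan', 'bob's', 'charles', 'dom')"));
        System.out.println(doubleInnerQuotes("('a'b'c')"));
    }
}
```

Like the original pattern, this leaves an apostrophe at the very start or end of a value untouched, so it only suits data where inner apostrophes always sit between word characters.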

str = str.replace("'", "''").replaceAll("''(.*?)''(, *|\\)$)", "'$1'$2");
Group 1, (.*?), is the quoted value; group 2, (, *|\\)$), is the trailer.
Doubling the apostrophes cannot be done on a substring replace (without function-replace), hence must be done first.
Then one can pick out the quoted values, and correct them.
The trailer is either a comma+space or a final closing parenthesis.
.*? takes the shortest sequence.
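The two steps described above can be sketched as a small runnable program. This is only an illustration with hypothetical class and method names; it assumes the lazy group is written as (.*?) and that no value contains an apostrophe directly followed by a comma.

```java
class QuoteListFixer {
    static String fix(String input) {
        // Step 1: double every apostrophe, including the delimiting ones.
        String doubled = input.replace("'", "''");
        // Step 2: turn the doubled delimiters back into single ones. The lazy
        // group ends at the first "''" that is followed by a comma (plus
        // optional spaces) or by the closing parenthesis at the end of input.
        return doubled.replaceAll("''(.*?)''(, *|\\)$)", "'$1'$2");
    }

    public static void main(String[] args) {
        System.out.println(fix("('allan', 'bob's', 'charles', 'dom')"));
    }
}
```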

I would recommend fixing whatever produced that string, and either:
use escape sequences, or
use a library for reading/writing the data; JSON or YAML might be good candidates.
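If the code that produces the string has access to the raw values, escaping can happen before the list is assembled, which avoids parsing the result afterwards. A minimal sketch under that assumption; the class and method names are hypothetical and not part of the question:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class QuotedListWriter {
    // Doubles the apostrophes inside each value, wraps the value in single
    // quotes, and joins everything as ('a', 'b', ...).
    static String toQuotedList(List<String> values) {
        return values.stream()
                .map(v -> "'" + v.replace("'", "''") + "'")
                .collect(Collectors.joining(", ", "(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(toQuotedList(Arrays.asList("allan", "bob's", "charles", "dom")));
    }
}
```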

Related

Use regex to un camelCase Java String

This code seems to work perfectly, but I'd love to clean it up with regex.
public static void main(String args[]) {
String s = "IAmASentenceInCamelCaseWithNumbers500And1And37";
System.out.println(unCamelCase(s));
}
public static String unCamelCase(String string) {
StringBuilder newString = new StringBuilder(string.length() * 2);
newString.append(string.charAt(0));
for (int i = 1; i < string.length(); i++) {
if (Character.isUpperCase(string.charAt(i)) && string.charAt(i - 1) != ' '
|| Character.isDigit(string.charAt(i)) && !Character.isDigit(string.charAt(i - 1))) {
newString.append(' ');
}
newString.append(string.charAt(i));
}
return newString.toString();
}
Input:
IAmASentenceInCamelCaseWithNumbers500And1And37
Output:
I Am A Sentence In Camel Case With Numbers 500 And 1 And 37
I'm not a fan of using that ugly if statement, and I'm hoping there's a way to use a single line of code that utilizes regex. I tried for a bit but it would fail on words with 1 or 2 letters.
Failing code:
return string.replaceAll("(.)([A-Z0-9]\\w)", "$1 $2");
The right regex and code to do your job is this.
String s = "IAmASentenceInCamelCaseWithNumbers500And1And37";
System.out.println("Output: " + s.replaceAll("[A-Z]|\\d+", " $0").trim());
This outputs,
Output: I Am A Sentence In Camel Case With Numbers 500 And 1 And 37
Edit, in response to a question from the OP in the comments:
If the input string is
ThisIsAnABBRFor1Abbreviation
the regex needs a small modification to handle abbreviations and becomes [A-Z]+(?![a-z])|[A-Z]|\\d+.
This code,
String s = "ThisIsAnABBRFor1Abbreviation";
System.out.println("Output: " + s.replaceAll("[A-Z]+(?![a-z])|[A-Z]|\\d+", " $0").trim());
gives the output the OP (ZeekAran) asked for in the comment:
Output: This Is An ABBR For 1 Abbreviation
You may use this lookaround based regex solution:
final String result = string.replaceAll(
"(?<=\\S)(?=[A-Z])|(?<=[^\\s\\d])(?=\\d)", " ");
//=> I Am A Sentence In Camel Case With Numbers 500 And 1 And 37
RegEx Details:
Regex matches either of 2 conditions and replaces it with a space. It will ignore already present spaces in input.
(?<=\\S)(?=[A-Z]): Previous char is non-space and next char is an uppercase letter
|: OR
(?<=[^\\s\\d])(?=\\d): previous char is non-digit & non-space and next one is a digit
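For reference, here is the lookaround pattern wrapped in a self-contained method. It is a sketch only (the class name is hypothetical), run on the question's input plus one string that already contains a space:

```java
class UnCamelCase {
    static String unCamelCase(String s) {
        // Insert a space at every zero-width position that is either
        //  - after a non-space character and before an uppercase letter, or
        //  - after a character that is neither whitespace nor a digit, and before a digit.
        return s.replaceAll("(?<=\\S)(?=[A-Z])|(?<=[^\\s\\d])(?=\\d)", " ");
    }

    public static void main(String[] args) {
        System.out.println(unCamelCase("IAmASentenceInCamelCaseWithNumbers500And1And37"));
        System.out.println(unCamelCase("Already Spaced"));
    }
}
```

Because both alternatives are zero-width, nothing is removed from the input and no trim() is needed. Note that this pattern also separates every letter of an all-caps abbreviation such as ABBR.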
I think you can try this
let str = "IAmASentenceInCamelCaseWithNumbers500And1And37";
function unCamelCase(str){
return str.replace(/(?:[A-Z]|[0-9]+)/g, (m)=>' '+m.toUpperCase()).trim();
}
console.log(unCamelCase(str));
Explanation
(?:[A-Z]|[0-9]+)
?: - Non capturing group.
[A-Z] - Matches any one capital character.
'|' - Alternation (This works same as Logical OR).
[0-9]+ - Matches any digit from 0-9 one or more time.
P.S. Sorry for the example in JavaScript, but the same logic can be achieved in Java pretty easily.

Regex pattern to convert comma separated String

Changing string with comma separated values to numbered new-line values
For example:
Input: a,b,c
Output:
1.a
2.b
3.c
Finding it hard to change it using regex pattern, instead of converting string to string array and looping through.
I'm not really sure that it's possible to achieve this with only a regex, without any kind of loop. As for me, splitting the string into an array and iterating over it is the most straightforward solution:
String value = "a,b,c";
String[] values = value.split(",");
String result = "";
for (int i=1; i<=values.length; i++) {
result += i + "." + values[i-1] + "\n";
}
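The same split-and-number idea can also be written without an explicit loop by using streams (Java 8 or later). This is a sketch with hypothetical names; unlike the loop above, it does not leave a trailing newline after the last item.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class NumberedLines {
    static String number(String csv) {
        String[] values = csv.split(",");
        // Pair each value with its 1-based position and join with newlines.
        return IntStream.range(0, values.length)
                .mapToObj(i -> (i + 1) + "." + values[i])
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(number("a,b,c"));
    }
}
```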
It is also possible to do this without splitting into an array, though the solution is somewhat more awkward:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String value = "a,b,c";
// MULTILINE makes ^ match at the start of every line
Pattern pattern = Pattern.compile("^\\w+", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(value.replace(",", "\n"));
StringBuffer s = new StringBuffer();
int i = 0;
while (matcher.find()) {
matcher.appendReplacement(s, ++i + "." + matcher.group());
}
matcher.appendTail(s); // keep any text after the last match
System.out.println(s.toString());
Here each , is replaced with a \n newline, and then we look for a group of word characters at the start of every line (^\\w+ in multiline mode). For every group found, we prepend a line number to it. But even here we have to use a loop to set the line number, and this logic is not as clear as the first version.

Splitting a string between a char

I want to split a String on a delimiter.
Example String:
String str="ABCD/12346567899887455422DEFG/15479897445698742322141PQRS/141455798951";
Now I want the strings ABCD/12346567899887455422, DEFG/15479897445698742322141, and so on. That is, I want:
only 4 chars before /
after /, any number of characters (digits and letters).
Update:
The only time I need the previous 4 characters is after a delimiter is shown, as the string may contain letters or numbers...
My code attempt:
public class StringReq {
public static void main(String[] args) {
String str = "BONL/1234567890123456789CORT/123456789012345678901234567890HOLD/123456789012345678901234567890INTC/123456789012345678901234567890OTHR/123456789012345678901234567890PHOB/123456789012345678901234567890PHON/123456789012345678901234567890REPA/123456789012345678901234567890SDVA/123456789012345678901234567890TELI/123456789012345678901234567890";
testSplitStrings(str);
}
public static void testSplitStrings(String path) {
System.out.println("splitting of sprint starts \n");
String[] codeDesc = path.split("/");
String[] codeVal = new String[codeDesc.length];
for (int i = 0; i < codeDesc.length; i++) {
codeVal[i] = codeDesc[i].substring(codeDesc[i].length() - 4,
codeDesc[i].length());
System.out.println("line" + i + "==> " + codeDesc[i] + "\n");
}
for (int i = 0; i < codeVal.length - 1; i++) {
System.out.println(codeVal[i]);
}
System.out.println("splitting of sprint ends");
}
}
You claim that digits and letters can appear after /, but in your example I don't see any letters that should be included in the result after /.
Based on that assumption, you can simply split at places that have a digit before them and an A-Z character after them.
To do so, you can split with a regex that uses the look-around mechanism, like str.split("(?<=[0-9])(?=[A-Z])")
Demo:
String str = "BONL/1234567890123456789CORT/123456789012345678901234567890HOLD/123456789012345678901234567890INTC/123456789012345678901234567890OTHR/123456789012345678901234567890PHOB/123456789012345678901234567890PHON/123456789012345678901234567890REPA/123456789012345678901234567890SDVA/123456789012345678901234567890TELI/123456789012345678901234567890";
for (String s : str.split("(?<=[0-9])(?=[A-Z])"))
System.out.println(s);
Output:
BONL/1234567890123456789
CORT/123456789012345678901234567890
HOLD/123456789012345678901234567890
INTC/123456789012345678901234567890
OTHR/123456789012345678901234567890
PHOB/123456789012345678901234567890
PHON/123456789012345678901234567890
REPA/123456789012345678901234567890
SDVA/123456789012345678901234567890
TELI/123456789012345678901234567890
If letters can actually appear in the second part (after /), then you can use a split that finds places followed by four alphabetic characters and a /, like split("(?=[A-Z]{4}/)") (assuming that you are using at least Java 8; if not, you will need to manually exclude a split at the start of the string, for instance by adding (?!^) or (?<=.) at the start of your regex).
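A small sketch of the lookahead split described above, applied to a made-up input whose values contain lowercase letters as well as digits. The input string and the class name are hypothetical; the (?!^) guard is included so the result does not depend on the Java version.

```java
import java.util.Arrays;

class CodeSplitter {
    static String[] splitCodes(String s) {
        // Split at every position (except the very start) that is followed
        // by four uppercase letters and a slash. The match is zero-width,
        // so no characters are removed.
        return s.split("(?!^)(?=[A-Z]{4}/)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitCodes("ABCD/12ab34DEFG/x9yPQRS/141")));
    }
}
```

This relies on the values never containing four uppercase letters immediately followed by a slash; if they can, the input is ambiguous and no regex can split it reliably.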
You can use a regex:
Pattern pattern = Pattern.compile("[A-Z]{4}/[0-9]*");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
Instead of:
String[] codeDesc = path.split("/");
Just use this regex (4 characters before / and any characters after):
String[] codeDesc = path.split("(?=.{4}/)(?<=.)");
Even simpler using \d:
path.split("(?=[A-Za-z])(?<=\\d)");
EDIT:
Added a condition requiring four letters (of either case).
path.split("(?=[A-Za-z]{4})(?<=\\d)");
output:
BONL/1234567890123456789
CORT/123456789012345678901234567890
HOLD/123456789012345678901234567890
INTC/123456789012345678901234567890
OTHR/123456789012345678901234567890
PHOB/123456789012345678901234567890
PHON/123456789012345678901234567890
REPA/123456789012345678901234567890
SDVA/123456789012345678901234567890
TELI/123456789012345678901234567890
It is still unclear whether this is the author's expected result.

Getting next two words from a given word in string with words containing non alphanumeric characters as well

I have a String as below:
String str = "This is something Total Toys (RED) 300,000.00 (49,999.00) This is something";
Input from user would be a keyword String viz. Total Toys (RED)
I can get the index of the keyword using str.indexOf(keyword);
I can also get the start of the next word by adding length of keyword String to above index.
However, how can I get the next two tokens after the keyword in given String which are the values I want?
if(str.contains(keyWord)){
String Value1 = // what should come here such that value1 is 300,000.00 which is first token after keyword string?
String Value2 = // what should come here such that value2 is (49,999.00) which is second token after keyword string?
}
Context : Read a PDF using PDFBox. The keyword above is the header in first column of a table in the PDF and the next two tokens I want to read are the values in the next two columns on the same row in this table.
You can use regular expressions to do this. The pattern below matches every occurrence of the keyword that is followed by two tokens; if fewer than two tokens follow the keyword, it won't match. This is easy to adapt, so please say if you also want to match when 0 or 1 tokens follow the keyword.
String regex = "(?i)%s\\s+([\\S]+)\\s+([\\S]+)";
Matcher m = Pattern.compile(String.format(regex, Pattern.quote(keyword))).matcher(str);
while (m.find())
{
System.out.println(m.group(1));
System.out.println(m.group(2));
}
In your example, %s in the regex would be replaced by "Total Toys", giving:
300,000.00 49,999.00
(?i) means case-insensitive
\\s means whitespace
\\S means non-whitespace
[...] is a character class
+ means 1 or more
(...) is a capturing group
EDIT: If the keyword contains characters that are special in regular expressions, you need to use Pattern.quote(). For example, ( and ) are special characters in a regex, so a keyword containing them would produce an incorrect pattern. Pattern.quote() wraps the keyword in \\Q...\\E so that every character in it is matched literally.
If you want three groups, use this:
String regex = "%s\\s+([\\S]+)\\s+([\\S]+)(?:\\s+([\\S]+))?";
NB: If only two groups follow, group(3) will be null.
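To illustrate the effect of Pattern.quote() with the string and keyword from the question, here is a minimal self-contained sketch. The class and method names are hypothetical, and the pattern is built by concatenation rather than String.format purely for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokensAfterKeyword {
    // Returns the two whitespace-delimited tokens that follow the first
    // occurrence of keyword, or an empty array if there is no such match.
    static String[] nextTwoTokens(String text, String keyword) {
        Pattern p = Pattern.compile("(?i)" + Pattern.quote(keyword) + "\\s+(\\S+)\\s+(\\S+)");
        Matcher m = p.matcher(text);
        return m.find() ? new String[] { m.group(1), m.group(2) } : new String[0];
    }

    public static void main(String[] args) {
        String str = "This is something Total Toys (RED) 300,000.00 (49,999.00) This is something";
        for (String token : nextTwoTokens(str, "Total Toys (RED)")) {
            System.out.println(token);
        }
    }
}
```

Without the quoting, the parentheses in the keyword would be read as a capturing group and the literal "(RED)" in the text would not match.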
Something like this:
String remainingPart= str.substring(str.indexOf(keyWord)+keyWord.length());
StringTokenizer st=new StringTokenizer(remainingPart);
if(st.hasMoreTokens()){
Value1=st.nextToken();
}
if(st.hasMoreTokens()){
Value2=st.nextToken();
}
Try this,
String str = "This is something Total Toys 300,000.00 49,999.00 This is something";
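if(str.contains(keyWord)) {
// split() takes a regex, so quote the keyword to match it literally
String splitLine = str.split(Pattern.quote(keyWord))[1];
// splitLine starts with a space, so tokens[0] is the empty string
String tokens[] = splitLine.split(" ");
String Value1 = tokens[1];
String Value2 = tokens[2];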
}
Here is something that works given what you have provided:
public static void main(String[] args)
{
String search = "Total Toys";
String str = "This is something Total Toys 300,000.00 49,999.00 This is something";
int index = str.indexOf(search);
index += search.length();
String[] tokens = str.substring(index, str.length()).trim().split(" ");
String val1 = tokens[0];
String val2 = tokens[1];
System.out.println("Val1: " + val1 + ", Val2: " + val2);
}
Output:
Val1: 300,000.00, Val2: 49,999.00
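The indexOf-based answers above assume that the keyword is present and that at least two tokens follow it. A slightly more defensive sketch of the same idea, with hypothetical names, returns fewer tokens instead of throwing when the input is shorter than expected:

```java
import java.util.Arrays;

class KeywordTokens {
    // Returns up to count whitespace-delimited tokens that follow the first
    // occurrence of keyword; an empty array if the keyword is missing or
    // nothing follows it.
    static String[] tokensAfter(String text, String keyword, int count) {
        int index = text.indexOf(keyword);
        if (index < 0) {
            return new String[0];
        }
        String rest = text.substring(index + keyword.length()).trim();
        if (rest.isEmpty()) {
            return new String[0];
        }
        String[] tokens = rest.split("\\s+");
        return Arrays.copyOf(tokens, Math.min(count, tokens.length));
    }

    public static void main(String[] args) {
        String str = "This is something Total Toys (RED) 300,000.00 (49,999.00) This is something";
        System.out.println(Arrays.toString(tokensAfter(str, "Total Toys (RED)", 2)));
    }
}
```

Since indexOf works on plain strings rather than regular expressions, the parentheses in the keyword need no escaping here.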

Regex for special characters in java

public static final String specialChars1= "\\W\\S";
String str2 = str1.replaceAll(specialChars1, "").replace(" ", "+");
public static final String specialChars2 = "`~!@#$%^&*()_+[]\\;\',./{}|:\"<>?";
String str2 = str1.replaceAll(specialChars2, "").replace(" ", "+");
Whatever str1 is I want all the characters other than letters and numbers to be removed, and spaces to be replaced by a plus sign (+).
My problem is that if I use specialChars1, it does not remove some characters like ;, ', ", and if I use specialChars2 it gives me an error:
java.util.regex.PatternSyntaxException: Syntax error U_REGEX_MISSING_CLOSE_BRACKET near index 32:
How can this be achieved? I have searched but could not find a perfect solution.
This worked for me:
String result = str.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");
For this input string:
/-+!##$%^&())";:[]{}\ |wetyk 678dfgh
It yielded this result:
+wetyk+678dfgh
replaceAll expects a regex:
public static final String specialChars2 = "[`~!@#$%^&*()_+\\[\\]\\\\;\',./{}|:\"<>?]";
The problem with your first regex is that "\W\S" means: find a sequence of two characters, the first of which is not a letter, digit or underscore, followed by a character which is not whitespace.
What you mean is "[^\w\s]", which means: find a single character which is neither a word character (letter, digit or underscore) nor whitespace. (We can't use "[\W\S]", as this means find a character which is not a word character OR is not whitespace, which matches every character.)
The second regex is a problem because you are trying to use reserved characters without escaping them. You can enclose them in [] where most characters (not all) do not have special meanings, but the whole thing would look very messy and you have to check that you haven't missed out any punctuation.
Example:
String sequence = "qwe 123 :#~ ";
String withoutSpecialChars = sequence.replaceAll("[^\\w\\s]", "");
String spacesAsPluses = withoutSpecialChars.replaceAll("\\s", "+");
System.out.println("without special chars: '"+withoutSpecialChars+ '\'');
System.out.println("spaces as pluses: '"+spacesAsPluses+'\'');
This outputs:
without special chars: 'qwe 123 '
spaces as pluses: 'qwe+123++'
If you want to collapse multiple spaces into a single +, use "\s+" as your regex instead (remember to escape the backslash in the Java string literal).
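The difference between "\\s" and "\\s+" can be seen side by side on the same sample string. This is a sketch with hypothetical method names:

```java
class SpaceToPlus {
    // Removes everything that is neither a word character nor whitespace.
    static String stripSpecial(String s) {
        return s.replaceAll("[^\\w\\s]", "");
    }

    // One + per whitespace character.
    static String plusPerSpace(String s) {
        return stripSpecial(s).replaceAll("\\s", "+");
    }

    // One + per run of whitespace characters.
    static String plusPerRun(String s) {
        return stripSpecial(s).replaceAll("\\s+", "+");
    }

    public static void main(String[] args) {
        String sequence = "qwe 123 :#~ ";
        System.out.println(plusPerSpace(sequence)); // qwe+123++
        System.out.println(plusPerRun(sequence));   // qwe+123+
    }
}
```

If a leading or trailing + is unwanted, calling trim() before the whitespace replacement removes it.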
I had a similar problem to solve and I used following method:
text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
Code with timing benchmark
public static String cleanPunctuations(String text) {
return text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
}
public static void test(String in){
long t1 = System.currentTimeMillis();
String out = cleanPunctuations(in);
long t2 = System.currentTimeMillis();
System.out.println("In=" + in + "\nOut="+ out + "\nTime=" + (t2 - t1)+ "ms");
}
public static void main(String[] args) {
String s1 = "My text with 212354 digits spaces and \n newline \t tab " +
"[`~!##$%^&*()_+[\\\\]\\\\\\\\;\\',./{}|:\\\"<>?] special chars";
test(s1);
String s2 = "\"Sample Text=\" with - minimal \t punctuation's";
test(s2);
}
Sample Output
In=My text with 212354 digits spaces and
newline tab [`~!##$%^&*()_+[\\]\\\\;\',./{}|:\"<>?] special chars
Out=My+text+with+212354+digits+spaces+and+newline+tab+special+chars
Time=4ms
In="Sample Text=" with - minimal punctuation's
Out=Sample+Text+with+minimal+punctuations
Time=0ms
you can use a regex like this:
[<#![CDATA[¢<(+|!$*);¬/¦,%_>?:#="~{#}\]]]#>]`
remove "#" at first and at end from expression
regards
@npinti: using "\w" is almost the same as "\dA-Za-z"; the difference is that "\w" also matches the underscore.
This worked for me:
String result = str.replaceAll("[^\\w ]", "").replaceAll("\\s+", "+");
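One difference between the two character classes is worth checking before choosing: \w also matches the underscore, so "[^\\w ]" keeps underscores while "[^\\dA-Za-z ]" removes them. A short sketch with hypothetical names and a made-up input:

```java
class WordClassDemo {
    // Keeps letters, digits, underscores and spaces.
    static String keepWordChars(String s) {
        return s.replaceAll("[^\\w ]", "");
    }

    // Keeps only ASCII letters, digits and spaces.
    static String keepAlphanumerics(String s) {
        return s.replaceAll("[^\\dA-Za-z ]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepWordChars("snake_case 42!"));     // snake_case 42
        System.out.println(keepAlphanumerics("snake_case 42!")); // snakecase 42
    }
}
```

Since the question asks for everything other than letters and numbers to be removed, the explicit class is the closer match when underscores may occur.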
