How to add a space after certain characters using regex Java - java

I have a string consisting of 18 characters, e.g. 'abcdefghijklmnopqr'. I need to add a blank space after the 5th character, then after the 9th character and after the 15th character, making it look like 'abcde fghi jklmno pqr'. Can I achieve this using a regular expression?
Regular expressions are not my cup of tea, so I need help from the regex gurus out here. Any help is appreciated.
Thanks in advance

Regex finds a match in a string and can't perform a replacement by itself. You could however use regex to find a certain matching substring and replace that, but you would still need a separate method for the replacement (making it a two-step algorithm).
Since you're not looking for a pattern in your string, but rather just the n-th char, regex wouldn't be of much use; it would make the solution unnecessarily complex.
Here are some ideas on how you could implement a solution:
Use an array of characters to avoid creating redundant strings: create a character array and copy the characters of the string before the given position, put the new character at the position, copy the rest of the characters from the String, and so on until you reach the end of the string. After that, construct the final string from that array.
Use the substring() method: concatenate the substring of the string before the position, the new character, the substring of the string after the position and before the next position, and so on, until reaching the end of the original string.
Use a StringBuilder and its insert() method.
Note that:
The first idea listed might not be a suitable solution for very large strings. It needs an auxiliary array, using additional space.
The second idea creates redundant strings. Strings are immutable in Java, so every concatenation allocates a new String object. Creating temporary strings should be avoided.
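The third idea (a StringBuilder and its insert() method) can be sketched as follows. This is only a minimal illustration: the class name SpaceInserter and the method insertSpaces are made up for this example, and the positions are assumed to be sorted in ascending order and within the bounds of the string.

```java
public class SpaceInserter {

    // Inserts a space before each given index of the original string.
    // Assumption: positions are sorted ascending and lie within the string.
    public static String insertSpaces(String input, int... positions) {
        StringBuilder sb = new StringBuilder(input);
        // Work from the last position to the first, so that an insertion
        // never shifts the offsets that still have to be processed.
        for (int i = positions.length - 1; i >= 0; i--) {
            sb.insert(positions[i], ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "abcde fghi jklmno pqr"
        System.out.println(insertSpaces("abcdefghijklmnopqr", 5, 9, 15));
    }
}
```

Inserting from right to left is the design choice that keeps the original indices usable; going left to right would require adding one to every later position after each insertion.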

Yes you can use regex groups to achieve that. Something like that:
final Pattern pattern = Pattern.compile("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})");
final Matcher matcher = pattern.matcher("abcdefghijklmnopqr");
if (matcher.matches()) {
    // group(0) is the whole match; the capturing groups are numbered from 1
    String first = matcher.group(1);
    String second = matcher.group(2);
    String third = matcher.group(3);
    String fourth = matcher.group(4);
    return first + " " + second + " " + third + " " + fourth;
} else {
    throw new SomeException();
}
Note that pattern should be a constant; I used a local variable here to make it easier to read.
Compared to substrings, which would also achieve the desired result, a regex also allows you to validate the format of your input data. In the provided example you check that it's an 18-character string composed of only lowercase letters.
If you had a more interesting example, with for instance a mix of letters and digits, you could check with the regex that each group contains the correct type of data.
You can also do a simpler version where you just replace with:
"abcdefghijklmnopqr".replaceAll("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})", "$1 $2 $3 $4")
But you lose the benefit of checking: if the string doesn't match the format, it is simply returned unchanged. This is also less efficient than substrings.
Here is an example solution without regex, appending the characters to a StringBuilder, which would be more efficient if you don't care about checking:
final Set<Integer> breaks = Set.of(5, 9, 15);
final String str = "abcdefghijklmnopqr";
final StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
if (breaks.contains(i)) {
stringBuilder.append(' ');
}
stringBuilder.append(str.charAt(i));
}
return stringBuilder.toString();

Related

How can I remove whitespaces around the first occurrence of specific char?

How can I remove the whitespaces before and after a specific char? I want also to remove the whitespaces only around the first occurrence of the specific char. In the examples below, I want to remove the whitespaces before and after the first occurrence of =.
For example for those strings:
something = is equal to = something
something = is equal to = something
something =is equal to = something
I need to have this result:
something=is equal to = something
Is there any regular expression that I can use or should I check for the index of the first occurrence of the char =?
private String removeLeadingAndTrailingWhitespaceOfFirstEqualsSign(String s1) {
return s1.replaceFirst("\\s*=\\s*", "=");
}
Notice this matches all whitespace including tabs and new lines, not just space.
You can use the regular expression \w*\s*=\s* to get all matches. From there call trim on the first index in the array of matches.
Regex demo.
Yes - you can create a regex that matches optional whitespace followed by your pattern followed by optional whitespace, and then replace the first instance.
public static String replaceFirst(final String toMatch, final String forIP) {
    // quote the string so it is matched literally, with optional whitespace before and after
    final String quoted = Pattern.quote(toMatch);
    final Pattern patt = Pattern.compile("\\s*" + quoted + "\\s*");
    final Matcher match = patt.matcher(forIP);
    // quoteReplacement keeps any $ or \ in toMatch from being read as a group reference
    return match.replaceFirst(Matcher.quoteReplacement(toMatch));
}
For your inputs this gives the expected result, assuming toMatch is =. It also works with arbitrary longer strings, e.g. imagine giving "is equal to" instead, which gives
something =is equal to= something
For the simple case you can skip the quoting; for an arbitrary case it helps (although, as many contributors have pointed out before, Pattern.quote isn't good for every case).
The simple case thus becomes
return forIP.replaceFirst("\\s*" + toMatch + "\\s*", toMatch);
OR
return forIP.replaceFirst("\\s*=\\s*", "=");
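The question also asks whether checking the index of the first occurrence would work. A rough sketch of that non-regex route is shown below; the class name FirstOccurrenceTrimmer and the method trimAroundFirst are hypothetical names chosen for this example. Like \s in the regex versions, it treats tabs and line breaks as whitespace, here via Character.isWhitespace.

```java
public class FirstOccurrenceTrimmer {

    // Removes the whitespace directly before and after the first occurrence of c.
    // Returns the input unchanged when c does not occur at all.
    public static String trimAroundFirst(String s, char c) {
        int idx = s.indexOf(c);
        if (idx < 0) {
            return s;
        }
        // Move left past any whitespace that precedes the character.
        int left = idx;
        while (left > 0 && Character.isWhitespace(s.charAt(left - 1))) {
            left--;
        }
        // Move right past any whitespace that follows the character.
        int right = idx + 1;
        while (right < s.length() && Character.isWhitespace(s.charAt(right))) {
            right++;
        }
        return s.substring(0, left) + c + s.substring(right);
    }

    public static void main(String[] args) {
        // prints "something=is equal to = something"
        System.out.println(trimAroundFirst("something = is equal to = something", '='));
    }
}
```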

Split a string based on pattern and merge it back

I need to split a string based on a pattern and then merge part of it back together.
For example, below are the actual and expected strings.
String actualstr="abc.def.ghi.jkl.mno";
String expectedstr="abc.mno";
When I use the code below, I can store the parts in an array and iterate over it to build the result. Is there a simpler and more efficient way to do this?
String[] splited = actualstr.split("[\\.\\.\\.\\.\\.\\s]+");
Though I can access the parts by index, is there any other way to do this easily? Please advise.
You do not understand how regexes work.
Here is your regex without the escapes: [\.\.\.\.\.\s]+
You have a character class ([]). Which means there is no reason to have more than one . in it. You also don't need to escape .s in a char class.
Here is an equivalent regex to your regex: [.\s]+. As a Java String that's: "[.\\s]+".
You can do .split("regex") on your string to get an array. It's very simple to get a solution from that point.
I would use a replaceAll in this case
String actualstr="abc.def.ghi.jkl.mno";
String str = actualstr.replaceAll("\\..*\\.", ".");
This will replace everything from the first . through the last . with a single .
You could also use split
String[] parts = actualstr.split("\\.");
String str = parts[0] + "." + parts[parts.length - 1]; // first and last word
public static String merge(String string, String delimiter, int... partnumbers)
{
String[] parts = string.split(delimiter);
String result = "";
for ( int x = 0 ; x < partnumbers.length ; x ++ )
{
result += result.length() > 0 ? delimiter.replaceAll("\\\\","") : "";
result += parts[partnumbers[x]];
}
return result;
}
and then use it like:
merge("abc.def.ghi.jkl.mno", "\\.", 0, 4);
I would do it this way
Pattern pattern = Pattern.compile("(\\w*\\.).*\\.(\\w*)");
Matcher matcher = pattern.matcher("abc.def.ghi.jkl.mno");
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
If you can cache the result of
Pattern.compile("(\\w*\\.).*\\.(\\w*)")
and reuse "pattern" over and over, this code will be very efficient, as pattern compilation is the most expensive step. The java.lang.String.split() method that other answers suggest calls the same Pattern.compile() internally unless the separator qualifies for its fast path (in OpenJDK, a single character that is not a regex metacharacter, or a backslash followed by one character that is not an ASCII letter or digit), so for any more complex regex it repeats this expensive compilation on each invocation of the method. See java.util.regex - importance of Pattern.compile()?. So it is much better to compile the Pattern once, cache it, and reuse it.
matcher.group(1) refers to the first group of () which is "(\w*\.)"
matcher.group(2) refers to the second one which is "(\w*)"
Even though we don't use it here, note that group(0) is the match for the whole regex.
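The caching advice could look roughly like this: the compiled Pattern lives in a static final field, so it is built once when the class is loaded and shared by every call. The class name FirstAndLast and the method firstAndLast are illustrative only, and returning the input unchanged when nothing matches is an assumption of this sketch, not something the answer specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstAndLast {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern FIRST_AND_LAST = Pattern.compile("(\\w*\\.).*\\.(\\w*)");

    // Keeps the first and the last dot-separated part of the input.
    // Assumption of this sketch: inputs that do not match are returned as-is.
    public static String firstAndLast(String input) {
        Matcher matcher = FIRST_AND_LAST.matcher(input);
        if (matcher.matches()) {
            return matcher.group(1) + matcher.group(2);
        }
        return input;
    }

    public static void main(String[] args) {
        // prints "abc.mno"
        System.out.println(firstAndLast("abc.def.ghi.jkl.mno"));
    }
}
```

A Matcher, unlike a Pattern, holds per-match state, so a new one is created for each call.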

How to use Substring when String length is not fixed everytime

I have string something like :
SKU: XP321654
Quantity: 1
Order date: 01/08/2016
The SKU length is not fixed , so my function sometime returns me the first or two characters of Quantity also which I do not want to get. I want to get only SKU value.
My Code :
int index = Content.indexOf("SKU:");
String SKU = Content.substring(index, index+15);
If the SKU has one or two more digits, I cannot get all of it, because I have specified a limit of 15. If I use index + 16 to get a long SKU, then for a short SKU it also returns some characters of Quantity.
How can I solve this? Is there any way to avoid using a fixed character length as the limit?
The last character of my SKU will always be a number, so is there anything else I can use to get only the SKU up to its last digit?
Using .substring is simply not the way to process such things. What you need is a regex (or regular expression):
Pattern pat = Pattern.compile("SKU\\s*:\\s*(\\S+)");
String sku = null;
Matcher matcher = pat.matcher(Content);
if (matcher.find()) { // we've found a match
    sku = matcher.group(1);
}
// do something with sku
Unescaped the regex is something like:
SKU\s*:\s*(\S+)
You are thus looking for a pattern that starts with SKU, followed by zero or more \s (spacing characters like space and tab), followed by a colon (:), then potentially zero or more spacing characters (\s), and finally the part in which you are interested: one or more (that's the meaning of +) non-spacing characters (\S). Putting these in parentheses makes them a capturing group. If the regex succeeds in finding the pattern (matcher.find()), you can extract the content of the capturing group with matcher.group(1) and store it in a string.
You can potentially improve the regex further if you know more about what a SKU looks like. For instance, if it consists only of uppercase letters and digits, you can replace \S by [0-9A-Z], so that the pattern becomes:
Pattern pat = Pattern.compile("SKU\\s*:\\s*([0-9A-Z]+)");
EDIT: for the quantity data, you could use:
Pattern pat2 = Pattern.compile("Quantity\\s*:\\s*(\\d+)");
int qt = -1;
Matcher matcher = pat2.matcher(Content);
if(matcher.find()) { //we've found a match
qt = Integer.parseInt(matcher.group(1));
}
or see this jdoodle.
You know you can just refer to the length of the string right ?
String s = "SKU: XP321654";
String sku = s.substring(4, s.length()).trim();
I think using a regex is clearly overkill in this case; it is way simpler than this. You can even split the expression, although it's a bit less efficient than the solution above, but please don't use a regex for this!
String sku = "SKU: XP321654".split(":")[1].trim();
1: you have to split your input by lines (or split by \n)
2: once you have your line, you search for : and then take the remainder of the line (using the String length, as mentioned in Dici's answer).
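These two steps might be sketched as below. The class name KeyValueLines and the method parse are invented for this illustration, and the sketch assumes every relevant line has the form "name: value", splitting at the first colon only. It uses \R (available since Java 8) so that \n and \r\n line endings are both handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueLines {

    // Step 1: split the content into lines.
    // Step 2: split each line at its first colon and keep the trimmed remainder.
    public static Map<String, String> parse(String content) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : content.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "name: value" line, skip it
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.put(name, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String content = "SKU: XP321654\r\nQuantity: 1\r\nOrder date: 01/08/2016";
        // prints "XP321654"
        System.out.println(parse(content).get("SKU"));
    }
}
```

Because the value is simply "everything after the colon", the length of the SKU no longer matters.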
Depending on how exactly the string contains new lines, you could do this:
public static void main(String[] args) {
String s = "SKU: XP321654\r\n" +
"Quantity: 1\r\n" +
"Order date: 01/08/2016";
System.out.println(s.substring(s.indexOf(": ") + 2, s.indexOf("\r\n")));
}
Just note that this 1-liner has several restrictions:
The SKU property has to be first. If not, then modify the start index appropriately to search for "SKU: ".
The line separators might differ from \r\n; \R is a regex that matches any line-break sequence (available since Java 8).

How to split a string in Java using "%*%" as separator, including the separator in the result list of strings?

I'm looking for the simplest way of tokenizing strings such as
INPUT OUTPUT
"hello %my% world" -> "hello ", "%my%", " world"
in Java. Is it possible to accomplish this with regex? I am basically looking for a String.split() that takes as separator something of the form "%*%" but that won't ignore it, as it seems to generally do.
Thanks
No, you can't do this the way you explained it. The reason is--it's ambiguous!
You give the example:
"hello %my% world" -> "hello ", "%my%", " world"
Should the % be attached to the string before it or after it?
Should the output be
"hello ", "%my", "% world"
Or, perhaps the output should be
"hello %", "my%", " world"
In your example you don't follow either of these rules. You come up with %my% which attaches the delimiter first to the string after it appears and then to the string before it appears.
Do you see the ambiguity?
So, you first need to come up with a clear set of rules about where you want the delimiter to be attached. Once you do this, one simple (although not particularly efficient, since Strings are immutable) way of achieving what you want is to:
Use String.split() to split the strings in the normal way
Follow your rule set to re-add the delimiter to where it should be in the string.
A simpler solution would be to just split the string by %s. That way, every other subsequence would have been between %s. All you have to do afterwards is iterate over the results, toggling a flag to know if the result is a regular string or one between %s.
Special attention has to be paid to how the split implementation handles empty subsequences. Some implementations discard empty subsequences at the beginning/end of the input, others discard all empty subsequences, and others discard none of them.
This would not result in the exact output that you want, since the %s would be gone. However you can easily add those back if there is an actual need for them (and I presume there isn't).
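A possible sketch of this split-and-toggle idea follows. The names PercentTokenizer and tokenize are hypothetical, and the sketch assumes the % signs are balanced (an even number of them). Passing -1 as the limit to String.split keeps trailing empty strings, which is the Java-specific detail the paragraph about empty subsequences warns about. The % signs are added back here only to reproduce the output from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class PercentTokenizer {

    // Splits on % and re-wraps every other part in % signs.
    // Assumption: the input contains an even number of % characters.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // limit -1: keep empty strings, including trailing ones
        String[] parts = input.split("%", -1);
        boolean insideDelimiters = false;
        for (String part : parts) {
            if (insideDelimiters) {
                tokens.add("%" + part + "%");
            } else if (!part.isEmpty()) {
                tokens.add(part); // plain text; empty stretches are dropped
            }
            insideDelimiters = !insideDelimiters;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [hello , %my%,  world]
        System.out.println(tokenize("hello %my% world"));
    }
}
```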
Why not split on the spaces between your words? In that case you will get "hello", "%my%", "world".
If possible, use a simpler delimiter. And I'm okay with jury-rigging "%" as your delimiter, just so you can get String.split() instead of regexps. But if that's not possible...
Regexps! You can parse this using a Matcher. If you know there's one delimiter per line, you specify a pattern that eats the whole line:
String singleDelimRegexp = "(.*)(%[^%]*%)(.*)";
Pattern singleDelimPattern = Pattern.compile(singleDelimRegexp);
Matcher singleDelimMatcher = singleDelimPattern.matcher(input);
if (singleDelimMatcher.matches()) {
String before = singleDelimMatcher.group(1);
String delim = singleDelimMatcher.group(2);
String after = singleDelimMatcher.group(3);
System.out.println(before + "//" + delim + "//" + after);
}
If the input is long and you need a chain of results, you use Matcher in a loop:
String multiDelimRegexp = "%[^%]*%";
Pattern multiDelimPattern = Pattern.compile(multiDelimRegexp);
Matcher multiDelimMatcher = multiDelimPattern.matcher(input);
int lastEnd = 0;
while (multiDelimMatcher.find()) {
String data = input.substring(lastEnd, multiDelimMatcher.start());
String delim = multiDelimMatcher.group();
lastEnd = multiDelimMatcher.end();
System.out.println(data);
System.out.println(delim);
}
String lastData = input.substring(lastEnd);
System.out.println(lastData);
Add those to a data structure as you go, and you'll build the whole parsed input.
Running on input: http://ideone.com/s8FzeW

Regular Expression - Java

For the string value "ABCD_12" (including quotes), I would like to extract only the content and exclude out the double quotes i.e. ABCD_12 . My code is:
private static void checkRegex()
{
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9])+\"");
Matcher findMatches = stringPattern.matcher("\"ABC_12\"");
if (findMatches.matches())
System.out.println("Match found" + findMatches.group(0));
}
Now I have tried doing findMatches.group(1);, but that only returns the last character in the string (I did not understand why !).
How can I extract only the content leaving out the double quotes?
Try this regex:
Pattern.compile("\"([a-zA-Z_0-9]+)\"");
OR
Pattern.compile("\"([^\"]+)\"");
The problem in your code is the + placed outside the closing parenthesis. The capturing group then matches a single character per repetition, and a repeated group keeps only its last iteration, which is why you eventually get only the last character.
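The difference between the two placements of + can be checked with a small experiment. The class RepeatedGroupDemo and its helper firstGroup are names made up for this sketch; the two regexes are the one from the question and the corrected one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {

    // Returns the content of capturing group 1, or null if the input does not match.
    public static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"ABC_12\"";
        // + outside the group: the group is repeated and keeps its last iteration
        System.out.println(firstGroup("\"([a-zA-Z_0-9])+\"", input)); // prints 2
        // + inside the group: the group captures the whole run of characters
        System.out.println(firstGroup("\"([a-zA-Z_0-9]+)\"", input)); // prints ABC_12
    }
}
```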
A nice simple (read: non-regex) way to do this is:
String myString = "\"ABC_12\"";
String myFilteredString = myString.replaceAll("\"", "");
System.out.println(myFilteredString);
gets you
ABC_12
You should change your pattern to this:
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9]+)\"");
Note that the + sign was moved inside the group, since you want the character repetition to be part of the group. In the code you posted, what you were actually searching for was a repetition of the group, which consisted of a single occurrence of a single character in [a-zA-Z_0-9].
If your pattern is strictly any text in between double quotes, then you may be better off using substring:
String str = "\"ABC_12\"";
System.out.println(str.substring(1, str.lastIndexOf('\"')));
Assuming it is a bit more complex (double quotes in between a larger string), you can use the split() function in the Pattern class and use \" as your regex - this will split the string around the \" so you can easily extract the content you want
Pattern p = Pattern.compile("\"");
// Split input with the pattern
String[] result = p.split(str);
for (int i = 0; i < result.length; i++) {
    System.out.println(result[i]);
}
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html#split%28java.lang.CharSequence%29
