How to split a string in between square brackets like below - java

I want to split the string
[{"starDate":"","endDate":"","relativeDays":,"cronExpression":""},{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}]
to
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}

Maybe this one-liner will work for you, since it's not clear whether you want two strings or one string with explicit line breaks:
String str = "[{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"},{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"}]";
// strip the outer brackets, split between the objects, then rejoin with a line break
str = String.join("}\n", str.replaceAll("^\\[|]$", "").split("\\},"));

You should ideally do this the proper way using JSON parsing if it applies.
Meanwhile, for this very specific case, you can use a Regular Expression to extract the parts you need (Example in Java) :
String input = "[{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"},{\"starDate\":\"\",\"endDate\":\"\",\"relativeDays\":,\"cronExpression\":\"\"}]";
Matcher matcher = Pattern.compile("(\\{[^\\}]*\\})").matcher(input);
while( matcher.find() )
System.out.println(matcher.group(1));
This will result in :
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}
{"starDate":"","endDate":"","relativeDays":,"cronExpression":""}
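
If you want the parts as separate strings rather than printed output, the same pattern can collect them into a list. This is only a sketch: the class and method names (BraceSplitter, splitObjects) are made up for illustration, and it assumes the objects are never nested and that no value contains a } character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceSplitter {
    // Matches one {...} group. Assumption: no nested braces and no '}' inside values.
    private static final Pattern OBJECT = Pattern.compile("\\{[^}]*\\}");

    // Hypothetical helper: returns every {...} group found in the input, in order.
    static List<String> splitObjects(String input) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = OBJECT.matcher(input);
        while (matcher.find()) {
            parts.add(matcher.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "[{\"a\":\"\"},{\"b\":\"\"}]";
        for (String part : splitObjects(input)) {
            System.out.println(part);
        }
    }
}
```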

Related

Split a string based on pattern and merge it back

I need to split a string based on a pattern and then merge part of it back together.
For example, below are the actual and expected strings.
String actualstr="abc.def.ghi.jkl.mno";
String expectedstr="abc.mno";
When I use the code below, I can store the parts in an array and iterate over it to rebuild the string. Is there a simpler and more efficient way to do this?
String[] splited = actualstr.split("[\\.\\.\\.\\.\\.\\s]+");
Though I can access the parts by index, is there any easier way to do this? Please advise.
You do not understand how regexes work.
Here is your regex without the escapes: [\.\.\.\.\.\s]+
You have a character class ([]). Which means there is no reason to have more than one . in it. You also don't need to escape .s in a char class.
Here is an equivalent regex to your regex: [.\s]+. As a Java String that's: "[.\\s]+".
You can do .split("regex") on your string to get an array. It's very simple to get a solution from that point.
I would use a replaceAll in this case
String actualstr="abc.def.ghi.jkl.mno";
String str = actualstr.replaceAll("\\..*\\.", ".");
This replaces everything from the first . through the last . with a single .
You could also use split
String[] parts = actualstr.split("\\.");
String str = parts[0] + "." + parts[parts.length - 1]; // first and last word
public static String merge(String string, String delimiter, int... partnumbers)
{
String[] parts = string.split(delimiter);
String result = "";
for ( int x = 0 ; x < partnumbers.length ; x ++ )
{
result += result.length() > 0 ? delimiter.replaceAll("\\\\","") : "";
result += parts[partnumbers[x]];
}
return result;
}
and then use it like:
merge("abc.def.ghi.jkl.mno", "\\.", 0, 4);
I would do it this way
Pattern pattern = Pattern.compile("(\\w*\\.).*\\.(\\w*)");
Matcher matcher = pattern.matcher("abc.def.ghi.jkl.mno");
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
If you can cache the result of
Pattern.compile("(\\w*\\.).*\\.(\\w*)")
and reuse "pattern" over and over, this code will be very efficient, as pattern compilation is the most expensive step. The java.lang.String.split() method that other answers suggest calls the same Pattern.compile() internally unless the delimiter is simple enough for its single-character fast path, meaning that it may repeat this expensive compilation on each invocation of the method. See java.util.regex - importance of Pattern.compile()?. So it is much better to compile the Pattern once, cache it and reuse it.
matcher.group(1) refers to the first group of () which is "(\w*\.)"
matcher.group(2) refers to the second one which is "(\w*)"
Although we don't use it here, note that group(0) is the match for the whole regex.
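
A minimal sketch of the caching idea, assuming the pattern is kept in a static final field. The class and method names (FirstLastSegments, shorten) are made up for illustration, and returning the input unchanged when it does not match is my own choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstLastSegments {
    // Compiled once and reused by every call; Pattern instances are immutable and thread-safe.
    private static final Pattern FIRST_LAST = Pattern.compile("(\\w*\\.).*\\.(\\w*)");

    // Hypothetical helper: keeps the first and last dot-separated segments.
    static String shorten(String s) {
        Matcher m = FIRST_LAST.matcher(s);
        if (!m.matches()) {
            return s; // fewer than two dots: nothing to drop
        }
        // group(1) is the first segment with its dot, group(2) is the last segment
        return m.group(1) + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(shorten("abc.def.ghi.jkl.mno")); // prints abc.mno
    }
}
```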

Change tags in symbol Pattern/Matcher

This code works fine :
final String result = myString.replaceAll("<tag1>", "{").replaceAll("<tag2>", "}");
but I have to parse big files, so I'm wondering whether I can call Pattern.compile("REGEX"); before the while loop:
Pattern p = Pattern.compile("REGEX");
while(scan.hasNextLine()){
final String myWorkLine = scan.nextLine();
p.matcher(myWorkLine).replaceAll("$1"); // or other value
..;
}
I expect a faster result because the regex compilation is done once and only once.
EDIT
I want to put (if it is possible) the replaceAll(..).replaceAll(..) model in a Pattern, and have tag1==>{, and tag2==>}.
Question: is compiling the Pattern outside the loop faster than calling replaceAll(..).replaceAll(..) inside the loop?
To answer your original question: yes, you could do that, and indeed it would be faster than your original code, if you apply the same regular expression(s) multiple times in a loop. Your loop should be rewritten like this:
Pattern p1 = Pattern.compile("REGEX1");
Pattern p2 = Pattern.compile("REGEX2");
while (scan.hasNextLine()) {
String myWorkLine = scan.nextLine();
myWorkLine = p1.matcher(myWorkLine).replaceAll("replacement1");
myWorkLine = p2.matcher(myWorkLine).replaceAll("replacement2");
...;
}
But if you're not using regular expressions, as your first example suggests ("<tag1>"), then don't use String.replaceAll(String regex, String replacement), as it is slower because of the regular expression. Instead use String.replace(CharSequence target, CharSequence replacement), which treats the target as literal text rather than as a regular expression and is typically much faster.
Example:
"ABAP is fun! ABAP ABAP ABAP".replace("ABAP", "Java");
See: Java Docs for String.replace
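
A small sketch of the difference between the two methods. The class and method names (LiteralReplace, convertTags) are made up for illustration; the tags are the ones from the question.

```java
class LiteralReplace {
    // String.replace treats both arguments as literal text, so no regex is involved.
    static String convertTags(String line) {
        return line.replace("<tag1>", "{").replace("<tag2>", "}");
    }

    public static void main(String[] args) {
        System.out.println(convertTags("a<tag1>b<tag2>c")); // prints a{b}c

        // The difference shows up as soon as the target contains a regex metacharacter:
        System.out.println("a.b".replace(".", "-"));    // prints a-b (literal dot)
        System.out.println("a.b".replaceAll(".", "-")); // prints --- (dot matches any character)
    }
}
```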
It's not nice changing your question that radically, but ok, here again an answer for your regular expression:
String s1
= "You can <bold>have nice weather</bold>, but <bold>not</bold> always!";
//EDIT: the regex was 'overengineered', and .?? should have been .*?
//String s2 = s1.replaceAll("(.*?)<bold>(.*?)</bold>(.??)", "$1{$2}$3");
String s2 = s1.replaceAll("<bold>(.*?)</bold>", "{$1}");
System.out.println(s2);
Output: You can {have nice weather}, but {not} always!
Here is the loop with this new regex, and yes, this would be faster than the original loop:
//EDIT: the regex was 'overengineered'
Pattern p = Pattern.compile("<bold>(.*?)</bold>");
while (scan.hasNextLine()) {
String myWorkLine = scan.nextLine();
myWorkLine = p.matcher(myWorkLine).replaceAll("{$1}");
...;
}
EDIT:
Here is the description of the Java RegEx syntax constructs
replaceAll uses regex Patterns. From the java.lang.String source code:
public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
Edit1: Please stop changing what you're asking. Pick a question and stick with it.
Edit2:
If you're really sure you want to do it this way, compiling a regex outside of the loop, in the simplest case you'd need two different patterns:
Pattern tag1Pattern = Pattern.compile("<tag1>");
Pattern tag2Pattern = Pattern.compile("<tag2>");
while( scan.hasNextLine() ) {
String line = scan.nextLine();
String modifiedLine = tag1Pattern.matcher(line).replaceAll("{");
modifiedLine = tag2Pattern.matcher(modifiedLine).replaceAll("}"); // continue from the first result, not the raw line
...
}
You're still applying a pattern matcher twice per line, so if there is any performance hit, that's why.
Without knowing what your data looks like, it's hard to give you a more precise answer or better regex. Unless you've edited your question (again) while I was writing this.
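
If the two passes per line turn out to matter, one possible alternative is a single pattern with alternation that picks the replacement per match. This is only a sketch under the assumption that the tags are exactly the literal strings <tag1> and <tag2>; the class and method names (TagReplacer, replaceTags) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagReplacer {
    // One pattern that matches either tag; compiled once and reused for every line.
    private static final Pattern TAGS = Pattern.compile("<tag1>|<tag2>");

    // Hypothetical helper: replaces <tag1> with { and <tag2> with } in a single pass.
    static String replaceTags(String line) {
        Matcher m = TAGS.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group().equals("<tag1>") ? "{" : "}";
            // quoteReplacement guards against $ or \ being interpreted in the replacement
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceTags("a<tag1>b<tag2>c")); // prints a{b}c
    }
}
```

Whether this is actually faster than two literal replace calls depends on the data, so it is worth measuring on a real file.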

Optionally using String.split(), split a string at the last occurrence of a delimiter

I have a string that matches this regular expression: ^.+:[0-9]+(\.[0-9]+)*/[0-9]+$ which can easily be visualized as (Text):(Double)/(Int). I need to split this string into the three parts. Normally this would be easy, except that the (Text) may contain colons, so I cannot split on any colon - but rather the last colon.
The .+ is greedy, so it already does a pretty neat job of this, but it won't work as a regular expression passed to String.split(), because it will eat my (Text) as part of the delimiter. Ideally I'd like to have something that would return a String[] with three strings. I'm 100% fine with not using String.split() for this.
I don't like regex (just kidding I do but I'm not very good at it).
String s = "asdf:1.0/1";
String text = s.substring(0, s.lastIndexOf(":"));
String doub = s.substring(s.lastIndexOf(":") + 1, s.lastIndexOf("/"));
String inte = s.substring(s.lastIndexOf("/") + 1);
Why don't you just use a straight up regular expression?
Pattern p = Pattern.compile("^(.*):([\\d\\.]+)/(\\d+)$");
Matcher m = p.matcher( someString );
if (m.find()) {
m.group(1); // returns the text before the colon
m.group(2); // returns the double between the colon and the slash
m.group(3); // returns the integer after the slash
}
Or similar. The pattern ^(.*):([\d\.]+)/(\d+)$ assumes that you actually have values in all three positions, and will allow just a period/fullstop in the double position, so you may want to tweak it to your specifications.
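
One possible way to tighten the double part is to reuse the digit groups from the question's own regex. This is only a sketch: the class and method names (LastColonSplitter, split) are made up for illustration, and returning null for non-matching input is my own choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastColonSplitter {
    // Mirrors the question's format (Text):(Double)/(Int).
    // The greedy (.+) backtracks to the last colon that still lets the rest match.
    private static final Pattern FORMAT =
            Pattern.compile("^(.+):(\\d+(?:\\.\\d+)*)/(\\d+)$");

    // Hypothetical helper: returns {text, double, int}, or null if the input does not match.
    static String[] split(String s) {
        Matcher m = FORMAT.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = split("host:port:1.5/3");
        System.out.println(String.join(" | ", parts)); // prints host:port | 1.5 | 3
    }
}
```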
String.split() is typically used in simpler scenarios where the delimiter and formatting are more consistent and when you don't know how many elements you are going to be splitting.
Your use case calls for a plain old regular expression. You know the formatting of the string, and you know you want to collect three values. Try something like the following.
Pattern p = Pattern.compile("(.+):([0-9\\.]+)/([0-9]+)$");
Matcher m = p.matcher(myString);
if (m.find()) {
String myText = m.group(1);
String myFloat = m.group(2);
String myInteger = m.group(3);
}

Regex Pattern to avoid : and , in the strings

I have a string which comes from the DB.
The string looks something like this:
ABC:def,ghi:jkl,hfh:fhgh,ahf:jasg
In short, it is String:String pairs, repeated for a large number of values.
I need to parse this string to get only the words, without any : or , and store each word in an ArrayList.
I can do it using the split function (twice), but I figured that with a regex I can do it in one go and get the ArrayList.
String strLine="category:hello,good:bye,wel:come";
Pattern titlePattern = Pattern.compile("[a-z]");
Matcher titleMatcher = titlePattern.matcher(strLine);
int i=0;
while(titleMatcher.find())
{
i=titleMatcher.start();
System.out.println(strLine.charAt(i));
}
However, it is not giving me proper results. It ends up giving me the index of each match, and then I need to append the characters myself, which is neither logical nor efficient.
Is there any way around this?
String strLine="category:hello,good:bye,wel:come";
String a[] = strLine.split("[,:]");
for(String s :a)
System.out.println(s);
Use java StringTokenizer
Sample:
StringTokenizer st = new StringTokenizer(in, ":,");
while(st.hasMoreTokens())
System.out.println(st.nextToken());
Even if you can use a regular expression to parse the entire string at once, I think it would be less readable than splitting it with multiple steps.
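
For comparison, the Matcher loop from the question can be made to work by matching whole runs of characters that are neither : nor , and reading each run with group() instead of charAt(). A sketch, with made-up class and method names (PairTokenizer, words):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairTokenizer {
    // One or more characters that are neither ':' nor ','.
    private static final Pattern WORD = Pattern.compile("[^:,]+");

    // Hypothetical helper: collects every word between the delimiters into a list.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            result.add(m.group()); // the whole matched word, not a single character
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("category:hello,good:bye,wel:come"));
        // prints [category, hello, good, bye, wel, come]
    }
}
```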

How to find and replace a substring?

For example, I have a string in which I must find and replace multiple substrings, all of which start with #, contain 6 symbols, end with ' and should not contain ). What do you think would be the best way of achieving that?
Thanks!
Edit:
Just one more thing I forgot: to make the replacement, I need that substring, i.e. it gets replaced by a string generated from the substring being replaced.
yourNewText=yourOldText.replaceAll("#[^)]{6}'", "");
Or programmatically:
Matcher matcher = Pattern.compile("#[^)]{6}'").matcher(yourOldText);
StringBuffer sb = new StringBuffer();
while(matcher.find()){
matcher.appendReplacement(sb,
// implement your custom logic here, matcher.group() is the found String
someReplacement(matcher.group()));
}
matcher.appendTail(sb);
String yourNewString = sb.toString();
Assuming you just know the substrings are formatted like you explained above, but not exactly which 6 characters, try the following:
String result = input.replaceAll("#[^\\)]{6}'", "replacement"); //pattern to replace is #+6 characters not being ) + '
You must use replaceAll with the right regular expression:
myString.replaceAll("#[^)]{6}'", "something")
If you need to replace with an extract of the matched string, use a capture group, like this:
myString.replaceAll("#([^)]{6})'", "blah $1 blah")
The $1 in the second String refers to the first parenthesized expression in the first String.
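
When the replacement has to be computed from the matched substring rather than just copied with $1, a Matcher loop with appendReplacement is one way to do it. This is only a sketch: the class name and the transform method are hypothetical stand-ins for whatever generation logic is actually needed.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GeneratedReplacement {
    // '#', then six characters that are not ')', then a single quote.
    private static final Pattern TOKEN = Pattern.compile("#([^)]{6})'");

    // Hypothetical transformation: builds the replacement from the six captured characters.
    static String transform(String inner) {
        return "[" + inner.toUpperCase(Locale.ROOT) + "]";
    }

    static String replaceTokens(String text) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps any $ or \ in the generated text from being interpreted
            m.appendReplacement(sb, Matcher.quoteReplacement(transform(m.group(1))));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceTokens("x #abcdef' y")); // prints x [ABCDEF] y
    }
}
```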
this might not be the best way to do it but...
youstring = youstring.replace("#something'", "new stringx");
youstring = youstring.replace("#something2'", "new stringy");
youstring = youstring.replace("#something3'", "new stringz");
//edited after reading comments, thanks
