Best way to parse this string in java? - java

I have a string that is the form of:
{'var1':var2}
I was able to parse this string so that var1 and var2 are both string variables. However it takes multiple string tokenizer calls, first to split from the ":" and then to extract the data.
So what would be the best way (fewest lines of code) to do this?

If you just want an array containing the two values, then you can do it in two lines by extracting a substring and then splitting on "':". It would end up looking something like this:
s = s.substring(2, s.length()-1);
String[] sarr = s.split("':");
If you really wanted a single line of code, you could combine them into:
String[] sarr = s.substring(2, s.length()-1).split("':");

This is unsolvable in the general case.
Consider for example:
case a)
var1=
:':':
var2=
':'
The full original string would be
{':':':':':'}
case b)
var1=
:
var2=
':':':'
The full original string would again be
{':':':':':'}
So, we need "more information". Depending on your requirements / use case you have to live with the ambiguity, put limitations on the strings, or escape/encode the strings.
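The ambiguity can be checked directly. The following is a minimal sketch, assuming the format is built by plain concatenation; the class name AmbiguityDemo and the helper encode are hypothetical names introduced only for this illustration. It shows that the two different (var1, var2) pairs from cases a) and b) produce the same string, so no parser can tell them apart without extra rules.

```java
final class AmbiguityDemo {
    // Hypothetical helper: builds a string in the question's {'var1':var2} form
    // by plain concatenation, with no escaping.
    static String encode(String var1, String var2) {
        return "{'" + var1 + "':" + var2 + "}";
    }

    public static void main(String[] args) {
        String a = encode(":':':", "':'");   // case a)
        String b = encode(":", "':':':'");   // case b)
        System.out.println(a);
        System.out.println(b);
        System.out.println(a.equals(b));     // prints true: both pairs encode identically
    }
}
```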

This should work:
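// needs java.util.regex.Matcher and java.util.regex.Pattern
String yourstring = "{'var1':var2}";
String regex = "\\{'(.+)':(.+)}";
Matcher m = Pattern.compile(regex).matcher(yourstring);
if (m.matches()) { // group() throws IllegalStateException unless a match has succeeded first
    String var1 = m.group(1);
    String var2 = m.group(2);
}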
EDIT: for the commentators:
String:
{'this is':somestring':more stuff:for you}
Output:
var1 = this is':somestring
var2 = more stuff:for you
PS: tested with Perl, don't have Java at hand right now, sorry.
EDIT: looks like Java regex engine does not like { unescaped as user unknown points out. Escaped it.
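For reuse, the regex above could be wrapped in a small helper. This is only a sketch: the class name KeyValueParser and the method parse are hypothetical, and returning null for non-matching input is an assumption about what the caller wants. Because both groups are greedy, the split happens at the last "':" in the input, which reproduces the output shown in the EDIT.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValueParser {
    // Compiled once; both braces are escaped so they are treated as literals.
    private static final Pattern PAIR = Pattern.compile("\\{'(.+)':(.+)\\}");

    // Returns {var1, var2}, or null when the input is not in the expected form.
    static String[] parse(String input) {
        Matcher m = PAIR.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] pair = parse("{'var1':var2}");
        System.out.println(pair[0] + " / " + pair[1]); // prints: var1 / var2
    }
}
```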

Something like (fragile - see comment(s)):
// 3 lines..
String[] parts = "{'var1':var2} ".trim().split("':");
String var1 = parts[0].substring(2); // drop the leading {'
String var2 = parts[1].substring(0, parts[1].length() - 1); // drop the trailing }

You can use regex:
String re = "\\{'(.*)':(.*)}";
String var1 = s.replaceAll (re, "$1");
String var2 = s.replaceAll (re, "$2");
You need to escape the opening {, otherwise you get a java.util.regex.PatternSyntaxException: Illegal repetition

Related

when I split a string into multiple strings, how do I get a certain part of that string?

I am trying to find a way to have access to a specific string after a bigger string is broken down into smaller strings. Below is an example:
So now that there are two strings, how do I get the first or second? Since the output has brackets, I assumed it was an array of strings, so I thought all I had to do was something like System.out.println(parts[0]); but that doesn't work.
String string = "hello ::= good morning";
String parts = Arrays.toString(string.split("::="));
System.out.println(parts);
the output should be [hello, good morning]
You need to put it an array like so:
String s = "I Like Apples.";
String[] parts = s.split(" ");
for (String a : parts)
    System.out.println(a);
It works fine if you just remove Arrays.toString(...):
String string = "hello ::= good morning";
String parts[] = string.split("::=");
System.out.println(parts[0]);
Update:
To print the whole array, you can do this:-
System.out.println(Arrays.toString(parts));
Also, to trim the spaces, you can change the split line to:-
String parts[] = string.trim().split("\\s*::=\\s*");
This looks like a better use case for pattern grouping with a regular expression to me. I would group the left-hand side and right-hand side of ::= preceded and followed by optional white-space. For example,
String string = "hello ::= good morning";
Pattern p = Pattern.compile("(.+?)\\s*::=\\s*(.+)"); // reluctant first group, so group(1) has no trailing space
Matcher m = p.matcher(string);
if (m.find()) {
    System.out.println(m.group(2));
}
Which outputs
good morning
If you also wanted "hello" that would be m.group(1)
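If both sides are needed at once, the split-with-trimming idea from the earlier answer can be packaged as below. This is a minimal sketch; the class RuleSplitter and the method splitRule are hypothetical names, and the limit of 2 is an assumption that any later "::=" belongs to the right-hand side.

```java
final class RuleSplitter {
    // Splits "lhs ::= rhs" into at most two parts with surrounding whitespace removed.
    static String[] splitRule(String rule) {
        // The limit of 2 keeps any later "::=" inside the right-hand side.
        return rule.trim().split("\\s*::=\\s*", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitRule("hello ::= good morning");
        System.out.println(parts[0]); // prints: hello
        System.out.println(parts[1]); // prints: good morning
    }
}
```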

Split a string based on pattern and merge it back

I need to split a string based on a pattern and then merge part of it back together.
For example, below are the actual and expected strings:
String actualstr="abc.def.ghi.jkl.mno";
String expectedstr="abc.mno";
When I use the code below, I can store the parts in an array and iterate over it to build the result. Is there a simpler and more efficient way than this?
String[] splited = actualstr.split("[\\.\\.\\.\\.\\.\\s]+");
Though I can access the parts by index, is there any easier way to do this? Please advise.
You do not understand how regexes work.
Here is your regex without the escapes: [\.\.\.\.\.\s]+
You have a character class ([]). Which means there is no reason to have more than one . in it. You also don't need to escape .s in a char class.
Here is an equivalent regex to your regex: [.\s]+. As a Java String that's: "[.\\s]+".
You can do .split("regex") on your string to get an array. It's very simple to get a solution from that point.
I would use a replaceAll in this case
String actualstr="abc.def.ghi.jkl.mno";
String str = actualstr.replaceAll("\\..*\\.", ".");
This replaces everything from the first . through the last . with a single .
You could also use split
String[] parts = actualstr.split("\\.");
String str = parts[0] + "." + parts[parts.length - 1]; // first and last word
public static String merge(String string, String delimiter, int... partnumbers)
{
String[] parts = string.split(delimiter);
String result = "";
for ( int x = 0 ; x < partnumbers.length ; x ++ )
{
result += result.length() > 0 ? delimiter.replaceAll("\\\\","") : "";
result += parts[partnumbers[x]];
}
return result;
}
and then use it like:
merge("abc.def.ghi.jkl.mno", "\\.", 0, 4);
I would do it this way
Pattern pattern = Pattern.compile("(\\w*\\.).*\\.(\\w*)");
Matcher matcher = pattern.matcher("abc.def.ghi.jkl.mno");
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
If you can cache the result of
Pattern.compile("(\\w*\\.).*\\.(\\w*)")
and reuse "pattern" across calls, this code will be very efficient, because pattern compilation is the most expensive step. The java.lang.String.split() method that other answers suggest calls Pattern.compile() internally unless the regex qualifies for its fast path (in OpenJDK, roughly a single literal character or a single escaped character), so for anything more complex it repeats this expensive compilation on every invocation. See "java.util.regex - importance of Pattern.compile()?". So it is much better to compile the Pattern once, cache it, and reuse it.
matcher.group(1) refers to the first group of () which is "(\w*\.)"
matcher.group(2) refers to the second one which is "(\w*)"
Even though we don't use it here, note that group(0) is the match for the whole regex.
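One way to cache the compiled pattern is a static final field. The sketch below assumes that inputs which do not match (for example, strings with fewer than two dots) should be returned unchanged; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstAndLast {
    // Compiled once when the class is loaded and reused for every call.
    private static final Pattern FIRST_LAST = Pattern.compile("(\\w*\\.).*\\.(\\w*)");

    // Keeps the first and last dot-separated words; returns the input unchanged
    // when it does not match (an assumption made for this sketch).
    static String firstAndLast(String input) {
        Matcher m = FIRST_LAST.matcher(input);
        return m.matches() ? m.group(1) + m.group(2) : input;
    }

    public static void main(String[] args) {
        System.out.println(firstAndLast("abc.def.ghi.jkl.mno")); // prints: abc.mno
    }
}
```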

Split string and match end of string with enum values

I have a string which I need to divide into two parts using regex:
String string = "2pbhk";
I need to divide this string into 2p and bhk.
Moreover, the second part should always be bhk or rk, as the strings can be any of 1bhk, 5pbhk, etc.
I have tried
String pattern = "([^-])([\\D]*)";
You can use the following regex "(?=bhk|rk)" with split.
str.split("(?=bhk|rk)");
This will split it if there is one of bhk or rk.
This should do the trick:
(.*)(bhk|rk)
First capture holds the "number" part, and the second bhk OR rk.
Regards
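Since the question mentions enum values, the capture-group regex above could feed an enum lookup. This is only a sketch under assumptions: the enum Suffix, the class UnitTypeParser, and its methods are hypothetical names, and returning null for input that does not end in bhk or rk is an arbitrary choice.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnitTypeParser {
    // Hypothetical enum standing in for the question's "enum values".
    enum Suffix { BHK, RK }

    private static final Pattern UNIT = Pattern.compile("(.*)(bhk|rk)");

    // Returns {prefix, suffix}, or null if the input does not end with bhk or rk.
    static String[] split(String input) {
        Matcher m = UNIT.matcher(input);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    // Maps the trailing part to the enum, or null when there is no valid suffix.
    static Suffix suffixOf(String input) {
        String[] parts = split(input);
        return parts == null ? null : Suffix.valueOf(parts[1].toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] parts = split("2pbhk");
        System.out.println(parts[0] + " " + parts[1]); // prints: 2p bhk
        System.out.println(suffixOf("1rk"));           // prints: RK
    }
}
```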
String string = "2pbhk";
String first_part = null, second_part = null; // initialize both so they can be read after the if/else
if(string.contains("bhk")){
first_part = string.substring(0, string.indexOf("bhk"));
second_part = "bhk";
}
else if(string.contains("rk")){
first_part = string.substring(0, string.indexOf("rk"));
second_part = "rk";
}
Try the above once, not using regex but should work.
If the rk or bhk suffix must end a word, but that word is not necessarily at the end of the whole string, use a regex with a word boundary \\b:
String[] arr = "5ddddddpbhk".split("(?=(?:rk|bhk)\\b)");
System.out.println(Arrays.toString(arr));
If you want to allow splitting inside a longer string, remove the \\b.
If you only split individual words, use $ instead of \\b (i.e. end of string):
(?=(?:rk|bhk)$)

java how to split string from end

I have a string like "test.test.test"...".test" and I need to access the last "test" word in this string. Note that the number of "test" words in the string is unlimited. If Java had a method like PHP's explode function, this would be easy. I think splitting from the end of the string could solve my problem.
Is there any way to specify direction for split method?
I know one solution for this problem can be like this:
String parts[] = fileName.split("\\."); // split() takes a regex, so the dot must be escaped
//for all parts, while a parts contain "." character, split a part...
but I think this is a bad solution.
Try substring with lastIndexOf method of String:
String str = "almas.test.tst";
System.out.println(str.substring(str.lastIndexOf(".") + 1));
Output:
tst
I think you can use lastIndexOf(String str) method for this purpose.
String str = "test.test.test....test";
int pos = str.lastIndexOf("test");
String result = str.substring(pos);
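The lastIndexOf approach also covers the edge cases without extra checks, because lastIndexOf returns -1 when the separator is absent. A minimal sketch, with a hypothetical class and method name:

```java
final class LastSegment {
    // Returns the text after the last dot. When there is no dot, lastIndexOf
    // returns -1, so substring(0) yields the whole string.
    static String lastSegment(String s) {
        return s.substring(s.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("test.test.tst")); // prints: tst
        System.out.println(lastSegment("nodots"));        // prints: nodots
    }
}
```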

regular expression to split the string in java

I want to split a string such as [AO_12345678, Real Estate] into AO_12345678 and Real Estate.
How can I do this in Java using regex?
The main issue I'm facing is avoiding "[" and "]".
Please help.
Does it really have to be regex?
if not:
String s = "[AO_12345678, Real Estate]";
String[] split = s.substring(1, s.length()-1).split(", ");
I'd go the pragmatic way:
String org = "[AO_12345678, Real Estate]";
String plain = org;
if (plain.startsWith("[")) {
    plain = plain.substring(1); // drop the leading bracket
}
if (plain.endsWith("]")) {
    plain = plain.substring(0, plain.length() - 1); // drop the trailing bracket
}
String[] result = plain.split(",\\s*");
If the string is always surrounded with '[]' you can just substring it without checking.
One easy way, assuming the format of all your inputs is consistent, is to ignore regex altogether and just split it. Something like the following would work:
String[] parts = input.split(","); // parts is ["[AO_12345678", " Real Estate]"]
String firstWithoutBrace = parts[0].substring(1);
String secondWithoutBrace = parts[1].substring(0, parts[1].length() - 1);
String first = firstWithoutBrace.trim();
String second = secondWithoutBrace.trim();
Of course you can tailor this as you wish - you might want to check whether the braces are present before removing them, for example. Or you might want to keep any spaces before the comma as part of the first string. This should give you a basis to modify to your specific requirements however.
And in a simple case like this I'd much prefer code like the above to a regex that extracted the two strings - I consider the former much clearer!
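The suggested check for the braces could look like the following sketch. It assumes the input holds exactly one comma-separated pair and that missing brackets should simply be tolerated; the class BracketSplit and the method splitPair are hypothetical names.

```java
final class BracketSplit {
    // Strips optional surrounding brackets, then splits on the first comma
    // and trims both parts.
    static String[] splitPair(String input) {
        String s = input.trim();
        if (s.startsWith("[")) {
            s = s.substring(1);
        }
        if (s.endsWith("]")) {
            s = s.substring(0, s.length() - 1);
        }
        String[] parts = s.split(",", 2); // limit 2: later commas stay in the second part
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = splitPair("[AO_12345678, Real Estate]");
        System.out.println(parts[0]); // prints: AO_12345678
        System.out.println(parts[1]); // prints: Real Estate
    }
}
```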
you can also use StringTokenizer. Here is the code:
String str = "[AO_12345678, Real Estate]";
StringTokenizer st = new StringTokenizer(str, "[],", false);
String s1 = st.nextToken().trim();
String s2 = st.nextToken().trim(); // trim() removes the space left after the comma
Result:
s1=AO_12345678
s2=Real Estate
Refer to javadocs for reading about StringTokenizer
http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
Another option using regular expressions (RE) capturing groups:
private static void extract(String text) {
Pattern pattern = Pattern.compile("\\[(.*),\\s*(.*)\\]");
Matcher matcher = pattern.matcher(text);
if (matcher.find()) { // or .matches for matching the whole text
String id = matcher.group(1);
String name = matcher.group(2);
// do something with id and name
System.out.printf("ID: %s%nName: %s%n", id, name);
}
}
If speed/memory is a concern, the RE can be optimized to (using Possessive quantifiers instead of Greedy ones)
"\\[([^,]*+),\\s*+([^\\]]*+)\\]"
