Use regex in Java to extract specific parts of a String - java

In the following string I want to extract the ids that come after the {\"company_id\": part. The first in this case is 4100, and there are two more farther along, 4045 and 2979. All of these ids will be 4 digits. Sorry for including such a long string. The reason I want to use regex and not some sort of JSON parser is that the JSON string is malformed.
String company = "[{\"company_id\":4100,\"data\":{\"drm_user_id\":572901936637129135,\"direct_status_id\":0,\"direct_optin_date\":0,\"direct_first_optin_date\":0,\"direct_last_optin_date\":0,\"direct_optout_date\":0,\"direct_last_form_date\":0,\"direct_last_form_id\":0,\"direct_last_promo_id\":0,\"anon_status_id\":600,\"anon_optin_date\":1446132360498,\"anon_first_optin_date\":1446132360498,\"anon_last_optin_date\":1446132360498,\"anon_optout_date\":0,\"anon_last_form_date\":1446132360498,\"anon_last_form_id\":101,\"anon_last_promo_id\":1002003,\"last_registration_date\":1446132360498,\"mp_status_id\":600,\"mp_control_state\":-1,\"mp_match_date\":0,\"mp_vs_version\":0,\"mp_initial_value_segment\":0,\"mp_id\":0,\"conversion_last_form_date\":0,\"conversion_last_form_id\":0,\"conversion_last_promo_id\":-1,\"last_message_date\":1446132368928,\"cg_version\":0,\"cg_version_date\":0,\"num_anon_messages_global\":0,\"num_anon_messages_global_date\":0,\"reg_creator_id\":576,\"reg_form_id\":101,\"reg_method_id\":1,\"reg_creator_type_id\":1},\"personal_data\":{\"version\":0,\"personal_data\":\"{}\",\"mdc_data\":{\"version\":0},\"custom_data\":\"{}\"},\"category_data\":{},\"campaignImpressions\":{},\"journeyStartDate\":0},{\"company_id\":4045,\"data\":{\"drm_user_id\":572901936637129135,\"direct_status_id\":0,\"direct_optin_date\":0,\"direct_first_optin_date\":0,\"direct_last_optin_date\":0,\"direct_optout_date\":0,\"direct_last_form_date\":0,\"direct_last_form_id\":0,\"direct_last_promo_id\":0,\"anon_status_id\":600,\"anon_optin_date\":1446132360498,\"anon_first_optin_date\":1446132360498,\"anon_last_optin_date\":1446132360498,\"anon_optout_date\":0,\"anon_last_form_date\":1446132360498,\"anon_last_form_id\":101,\"anon_last_promo_id\":1002003,\"last_registration_date\":1446132360498,\"mp_status_id\":600,\"mp_control_state\":-1,\"mp_match_date\":0,\"mp_vs_version\":0,\"mp_initial_value_segment\":0,\"mp_id\":0,\"conversion_last_form_date\":0,\"conversion_last_form_id\":0,\"conver
sion_last_promo_id\":-1,\"last_message_date\":1446132368928,\"cg_version\":0,\"cg_version_date\":0,\"num_anon_messages_global\":0,\"num_anon_messages_global_date\":0,\"reg_creator_id\":576,\"reg_form_id\":101,\"reg_method_id\":1,\"reg_creator_type_id\":1},\"personal_data\":{\"version\":0,\"personal_data\":\"{}\",\"mdc_data\":{\"version\":0},\"custom_data\":\"{}\"},\"category_data\":{},\"campaignImpressions\":{},\"journeyStartDate\":0},{\"company_id\":2979,\"data\":{\"drm_user_id\":572901936637129135,\"direct_status_id\":0,\"direct_optin_date\":0,\"direct_first_optin_date\":0,\"direct_last_optin_date\":0,\"direct_optout_date\":0,\"direct_last_form_date\":0,\"direct_last_form_id\":0,\"direct_last_promo_id\":0,\"anon_status_id\":600,\"anon_optin_date\":1446132360498,\"anon_first_optin_date\":1446132360498,\"anon_last_optin_date\":1446132360498,\"anon_optout_date\":0,\"anon_last_form_date\":1446132360498,\"anon_last_form_id\":101,\"anon_last_promo_id\":1002003,\"last_registration_date\":1446132360498,\"mp_status_id\":600,\"mp_control_state\":-1,\"mp_match_date\":0,\"mp_vs_version\":0,\"mp_initial_value_segment\":0,\"mp_id\":0,\"conversion_last_form_date\":0,\"conversion_last_form_id\":0,\"conversion_last_promo_id\":-1,\"last_message_date\":1446132368928,\"cg_version\":0,\"cg_version_date\":0,\"num_anon_messages_global\":0,\"num_anon_messages_global_date\":0,\"reg_creator_id\":576,\"reg_form_id\":101,\"reg_method_id\":1,\"reg_creator_type_id\":1},\"personal_data\":{\"version\":0,\"personal_data\":\"{}\",\"mdc_data\":{\"version\":0},\"custom_data\":\"{}\"},\"category_data\":{},\"campaignImpressions\":{},\"journeyStartDate\":0}]";
This is what I have so far:
Pattern pattern = Pattern.compile("company_id\\\\\":(\\d{4})");
Matcher matcher = pattern.matcher(company);
while(matcher.find()){
System.out.println(matcher.group(1)+"\n");
}
However, this does not work, and I am not sure how to actually check that the number comes after this specific {\"company_id\": part.

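A single backslash is enough: in a Java string literal, \" is just a double quote, and the string being searched contains no actual backslashes. Your pattern looks for a literal backslash before the quote, so it never matches.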
Pattern pattern = Pattern.compile("\"company_id\":(\\d{4})");
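To see the corrected pattern at work, here is a minimal sketch on a shortened version of the input. The class name CompanyIdExtractor and the method extractIds are made up for illustration, and the sample string keeps only the company_id fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompanyIdExtractor {
    // In Java source, \" is one double quote and \\d is the regex token \d.
    private static final Pattern COMPANY_ID =
            Pattern.compile("\"company_id\":(\\d{4})");

    // Hypothetical helper: collects every 4-digit id that follows "company_id":
    static List<String> extractIds(String json) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = COMPANY_ID.matcher(json);
        while (matcher.find()) {
            ids.add(matcher.group(1)); // group 1 is the digits only
        }
        return ids;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the long string in the question.
        String company = "[{\"company_id\":4100,\"data\":{}},"
                + "{\"company_id\":4045,\"data\":{}},"
                + "{\"company_id\":2979,\"data\":{}}]";
        System.out.println(extractIds(company)); // prints [4100, 4045, 2979]
    }
}
```

Because the pattern includes the opening quote and the colon, other fields that merely end in _id are not picked up.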

Related

How to split following String to number-letter units

What pattern would I use to split the following types of strings.:
"NumStringNumString..."
For example "3X12Y5Z" into a String array of "3X","12Y", and "5Z"
Note: if necessary, assume that the string is only one character, as the original problem stated. I would still prefer the more general solution, though.
I thought that the pattern "^(\d+\w+)" would work, but it doesn't cut it.
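^ anchors the match to the beginning of the string, whereas you want to find every occurrence of the pattern. Also, \w matches digits as well as letters, so \d+\w+ would swallow the whole string in one match.
Taking up the note that the string part can be assumed to be a single character, and also assuming uppercase characters only: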
Pattern p = Pattern.compile("[0-9]+[A-Z]");
Matcher m = p.matcher("3X12Y5Z");
while (m.find()) {
    System.out.println(m.group());
}
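For the more general case the question asks about, where the letter part may be longer than one character, the same find() loop can be used with a slightly wider pattern. This is only a sketch: the class and method names are invented, and it assumes the letter part consists of ASCII letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnitSplitter {
    // One or more digits followed by one or more ASCII letters (assumption).
    private static final Pattern UNIT = Pattern.compile("\\d+[A-Za-z]+");

    // Hypothetical helper: returns each number-letter unit in order.
    static List<String> splitUnits(String input) {
        List<String> units = new ArrayList<>();
        Matcher m = UNIT.matcher(input);
        while (m.find()) {
            units.add(m.group());
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(splitUnits("3X12Y5Z")); // prints [3X, 12Y, 5Z]
        System.out.println(splitUnits("10ab2Cd")); // prints [10ab, 2Cd]
    }
}
```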

Extract substring after a certain pattern

I have the following string:
http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true
How can I extract the part after 30/? In this case, it's 32531a5d-b0b1-4a8b-9029-b48f0eb40a34. I have other strings that share the same part up to 30/, and after that each string has a different id, up to the next /, which is what I want.
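You can do it like this, stopping at the next / so that only the id is printed:
String s = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
int start = s.indexOf("30/") + 3;   // first character after "30/"
int end = s.indexOf('/', start);    // next slash after the id
System.out.println(s.substring(start, end));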
The split function of the String class won't help you in this case, because it discards the delimiter, and that's not what we want here. You need a pattern that looks behind. The look-behind syntax is:
(?<=X)Y
which identifies any Y that is preceded by an X.
So in your case you need this pattern:
(?<=30/)[^/]+
Here [^/]+ stops at the next slash; (?<=30/).* would instead match everything up to the end of the string. Compile the pattern, match it against your input, find the match, and print it:
String input = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
Matcher matcher = Pattern.compile("(?<=30/)[^/]+").matcher(input);
if (matcher.find()) {
    System.out.println(matcher.group());
}
Just for this one, or do you want a generic way to do it?
String[] out = mystring.split("/");
return out[out.length - 2];
I think the / is definitely the delimiter you are searching for.
I can't see the problem you are talking about, Alex.
EDIT: OK, Python got me with the indexes.
A regular expression is the answer, I think. However, how the expression is written depends on the format of the data (URL) you want to process. For example:
Pattern pat = Pattern.compile("/Content/SiteFiles/30/([a-z0-9\\-]+)/.*");
Matcher m = pat.matcher("http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true");
if (m.find()) {
System.out.println(m.group(1));
}
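If the marker in front of the id varies from URL to URL, the capturing-group approach can be wrapped in a small helper. The following is a sketch under that assumption; UrlSegment and segmentAfter are illustrative names, and Pattern.quote is used so that the marker is treated as literal text rather than as a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlSegment {
    // Hypothetical helper: returns the path segment that directly follows
    // the given marker, or null if the marker does not occur in the URL.
    static String segmentAfter(String url, String marker) {
        // Pattern.quote escapes regex metacharacters in the marker;
        // ([^/]+) captures everything up to the next slash.
        Pattern p = Pattern.compile(Pattern.quote(marker) + "([^/]+)");
        Matcher m = p.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
        // prints 32531a5d-b0b1-4a8b-9029-b48f0eb40a34
        System.out.println(segmentAfter(url, "/SiteFiles/30/"));
    }
}
```

Using the longer marker /SiteFiles/30/ rather than just 30/ reduces the chance of matching a stray "30/" elsewhere in the URL.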

Split number string on java using regex

I want to use a regex in Java to split a string of digits.
I tested the regex with an online regex tester and it is correct.
But in Java it gives the wrong result.
Pattern pattern = Pattern.compile("[\\\\d]{1,4}");
String[] results = pattern.split("123456");
// I expect 2 results: ["1234","56"]
// The actual result is ["123456"]
Am I missing anything?
I know this question is boring, but I want to solve this problem.
The suggested answer
Pattern pattern = Pattern.compile("[\\d]{1,4}");
String[] results = pattern.split("123456");
// Results length is 0
System.out.println(results.length);
does not work. I have tried it; it returns nothing in results.
Please try it before answering.
Sincere thanks to the people who helped me.
Solution:
Pattern pattern = Pattern.compile("([\\d]{1,4})");
Matcher matcher = pattern.matcher("123456");
List<String> results = new ArrayList<String>();
while (matcher.find()) {
results.add(matcher.group(1));
}
This outputs 2 results: ["1234","56"]
Pattern pattern = Pattern.compile("[\\\\d]{1,4}")
Too many backslashes; try [\\d]{1,4}. You only have to escape them once, so the backslash in front of the d becomes \\. The pattern you wrote is actually the regex [\\d]{1,4} (a literal backslash or a literal d, one to four times).
When Java decided to add regular expressions to the standard library, they should have also added a regular expression literal syntax instead of shoe-horning it over Strings (with the unreadable extra escaping and no compile-time syntax checking).
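The difference between the two escaping levels is easy to check by printing what the regex engine actually receives. A small sketch (the class and field names are made up for illustration):

```java
import java.util.regex.Pattern;

public class BackslashLevels {
    // Four backslashes in source: the regex is [\\d]{1,4},
    // a character class containing a literal backslash and the letter d.
    static final Pattern TOO_MANY = Pattern.compile("[\\\\d]{1,4}");

    // Two backslashes in source: the regex is [\d]{1,4}, i.e. digits.
    static final Pattern CORRECT = Pattern.compile("[\\d]{1,4}");

    public static void main(String[] args) {
        System.out.println(TOO_MANY.pattern());                  // prints [\\d]{1,4}
        System.out.println(CORRECT.pattern());                   // prints [\d]{1,4}
        System.out.println(TOO_MANY.matcher("1234").matches());  // prints false
        System.out.println(TOO_MANY.matcher("d\\d").matches());  // prints true
        System.out.println(CORRECT.matcher("1234").matches());   // prints true
    }
}
```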
You can't do it in one method call, because you can't specify a capturing group for the split, which would be needed to break up into four char chunks.
It's not "elegant", but you must first insert a character to split on, then split:
String[] results = "123456".replaceAll("....", "$0,").split(",");
Here's the output:
System.out.println(Arrays.toString(results)); // prints [1234, 56]
Note that you don't need to use Pattern etc because String has a split-by-regex method, leading to a one-line solution.
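One caveat with the insert-then-split trick is that it assumes the input never contains the separator character itself. If that cannot be guaranteed, a Matcher loop avoids the issue. The sketch below is one possible way to write it; Chunker and chunk are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Chunker {
    // Hypothetical helper: cuts the input into pieces of at most `size`
    // characters, left to right. `size` is assumed to be at least 1.
    static List<String> chunk(String input, int size) {
        // DOTALL lets '.' match line terminators too, so no character is skipped.
        Pattern p = Pattern.compile(".{1," + size + "}", Pattern.DOTALL);
        Matcher m = p.matcher(input);
        List<String> chunks = new ArrayList<>();
        while (m.find()) {
            chunks.add(m.group());
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("123456", 4));  // prints [1234, 56]
        System.out.println(chunk("12,3456", 4)); // prints [12,3, 456]
    }
}
```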

how to encode String into Pattern and retrieve the String

Question closed because I misunderstood the situation. To show my stupidity though, I'll not remove what I wrote.
I'd like to encode a piece of string into Pattern, and get the string back.
I tried:
String s = buff.readLine();
Pattern p = new Pattern(s);
and use the following to retrieve my string
System.out.println(p.toString());
But it didn't work; the output is just "package name#(some random things)...". I also tried Pattern p = Pattern.compile(s);
but I got an error from the compiler.
Well I just tried this:
Pattern p = Pattern.compile("Hello");
System.out.println( p.toString() );
And it worked, printing out 'Hello'.
Are you importing the java.util.regex.Pattern class?
The javadoc for Pattern#toString() seems to indicate that the source of the complete regex is only returned since Java 1.5. However, Pattern#pattern() does not have a since tag, so it is presumably available since the class was introduced (Java 1.4). Try System.out.println(p.pattern());
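A short sketch of the round trip, for reference. Note that java.util.regex.Pattern has no public constructor, so instances are obtained through Pattern.compile; the class name PatternSource and the helper sourceOf are invented for this example.

```java
import java.util.regex.Pattern;

public class PatternSource {
    // Hypothetical helper: compiles a regex and hands back its source text.
    static String sourceOf(String regex) {
        Pattern p = Pattern.compile(regex); // throws PatternSyntaxException if invalid
        return p.pattern();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("Hello");
        System.out.println(p.pattern());       // prints Hello
        System.out.println(p.toString());      // prints Hello
        System.out.println(sourceOf("\\d+x"));  // prints \d+x
    }
}
```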
You're using a regex Pattern object to store and retrieve a String. This makes no sense. A Pattern is not used for storing Strings. A Pattern is used for searching other strings. It's a regular expression engine. Let me give you an example of the use of a Pattern.
We really have 2 objects when using Regular Expressions in Java. Pattern, and Matcher.
Pattern = A Regular Expression.
Matcher = All of the Matches found when we apply the Pattern to a String.
Let me give you an example of Pattern and Matcher: we'll search for four digits separated by a colon, as in a time, e.g. 12:42.
long timeL;
Pattern pattern = Pattern.compile(".*([1234567890]{2}:[1234567890]{2}).*");
Matcher matcher = pattern.matcher("Match me! 12:42 Match me!");
if (matcher.matches()) {
String timeStr = matcher.group(1);
System.out.println("Just the time: "+timeStr);
System.out.println("The entire String: "+matcher.group(0));
String[] timeParts = timeStr.split("[:]");
int hours = Integer.parseInt(timeParts[0]);
int minutes = Integer.parseInt(timeParts[1]);
timeL = (hours*60*60*1000) + (minutes*60*1000);
System.out.println(timeL);
}
After we've applied the Pattern to the String and gotten a Matcher, we ask whether the Matcher actually has a match. Note that matches() succeeds only if the pattern covers the entire input, which is why the expression is wrapped in .* on both sides. We then request group 1, which is the part matched by the parentheses in: .*([1234567890]{2}:[1234567890]{2}).*
Group 0 is the entire match, which here is the whole String given.
So, I hope you understand why it's extremely weird to be using a Pattern to store a String.

How to get several regex groups from Matcher in Java?

I have a Java program that does some String matching. I'm looking for anything that matches \d+x\d+ in a String. This works, using the Pattern and Matcher classes. However, to parse the String parts I have found, I have to manually parse the String I get from the Matcher.find() and Matcher.group(). How can I tell the Pattern I'm looking for something in the form of (\d+)x(\d+) and get the Matcher to return those groups separately?
So instead of the string "1x23" I want to get two strings, "1" and "23".
Use Matcher.group(int), not Matcher.group().
With the given regex and input, group(1) should be "1" and group(2) should be "23".
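A minimal sketch of this, assuming the input may contain other text around the match. The class name Dimensions and the method parse are made up for the example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Dimensions {
    // Each pair of parentheses is a capturing group, numbered from 1.
    private static final Pattern DIMENSIONS = Pattern.compile("(\\d+)x(\\d+)");

    // Hypothetical helper: returns the two numbers as strings,
    // or null if the text contains no match.
    static String[] parse(String text) {
        Matcher m = DIMENSIONS.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("1x23")));          // prints [1, 23]
        System.out.println(Arrays.toString(parse("size 640x480."))); // prints [640, 480]
    }
}
```

Calling group() with no argument is the same as group(0) and returns the whole match, here "1x23".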
