extract substring attributes and values from main string in JAVA

I have the following String:
\018attribute1=value1\028attribute2=value2\033attribute3=value3
Note: the "\0xx" prefix is literal text, not an escape sequence or special character, and the "xx" part varies. I get this string from an API, and it really does contain the "\" character.
I want to extract the attribute names and the attribute values.
How can I do this?

You need to escape the \ character twice. Once for Java, and once for regex. This turns it into \\\\. Then you can use Pattern and Matcher to build a map of your values:
Pattern p = Pattern.compile("\\\\0..([^=]+)=([^\\\\]*)");
Matcher matcher = p.matcher("\\018attribute1=value1\\028attribute2=value2\\033attribute3=value3");
Map<String, String> attributes = new HashMap<String, String>();
while (matcher.find()) {
    attributes.put(matcher.group(1), matcher.group(2));
}

Something like so worked for me:
String str = "\\018attribute1=value1\\028attribute2=value2\\033attribute3=value3";
Pattern p = Pattern.compile("0\\d{2}(.*?)=(.*?)(\\\\|$)");
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println(m.group(1));
    System.out.println(m.group(2));
    System.out.println("-------");
}
It yielded:
attribute1
value1
-------
attribute2
value2
-------
attribute3
value3
-------
The regex assumes that each entry starts with a 0 followed by two digits (the 0xx part of \0xx); the backslash itself is not matched at the start of an entry but acts as the terminator of the previous value. The regex lazily captures the attribute name up to the first equals sign. After the equals sign, it keeps matching until it hits either another backslash or the end of the string.
If the two characters after the 0 are not always digits, you can substitute something along the lines of 0\\w{2}, which matches a zero followed by any two characters that are letters, digits, or underscores.
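Putting the ideas from both answers together, here is a minimal, self-contained sketch. The class name AttributeParser and the method parse are illustrative only, not part of the original answers. It assumes every pair is introduced by a backslash, a 0, and two word characters, and that neither names nor values contain a backslash:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeParser {
    // Regex: \0 + two word characters, then name=value.
    // The name stops at '=', the value stops at the next backslash or the end.
    private static final Pattern PAIR =
            Pattern.compile("\\\\0\\w{2}([^=\\\\]+)=([^\\\\]*)");

    // Hypothetical helper: returns the pairs in the order they appear.
    static Map<String, String> parse(String input) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            attributes.put(m.group(1), m.group(2));
        }
        return attributes;
    }

    public static void main(String[] args) {
        // In Java source each literal backslash is written as \\
        String input = "\\018attribute1=value1\\028attribute2=value2\\033attribute3=value3";
        System.out.println(parse(input));
        // prints {attribute1=value1, attribute2=value2, attribute3=value3}
    }
}
```

A LinkedHashMap is used here only so that the printed order follows the input; a plain HashMap works as well if order does not matter.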

Related

Replace regex pattern to lowercase in java

I'm trying to replace a url string to lowercase but wanted to keep the certain pattern string as it is.
eg: for input like:
http://BLABLABLA?qUERY=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}
The expected output would be lowercased url but the multiple macros are original:
http://blablabla?query=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}
I was trying to capture the strings using regex but didn't figure out a proper way to do the replacement. Also it seemed using replaceAll() doesn't do the job. Any hint please?

It looks like you want to change every uppercase character that is not inside ${...} to its lowercase form.
With a construct like
Matcher matcher = ...
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    String matchedPart = ...
    ...
    matcher.appendReplacement(buffer, replacement);
}
matcher.appendTail(buffer);
String result = buffer.toString();
or, since Java 9, with Matcher#replaceAll(Function<MatchResult,String> replacer), which lets us rewrite it as
String replaced = matcher.replaceAll(m -> {
    String matchedPart = m.group();
    ...
    return replacement;
});
you can dynamically build the replacement based on matchedPart.
So you can let your regex first try to match ${...} and only then (when ${...} cannot be matched at the current position) let it match [A-Z]. While iterating over the matches you can decide, based on the match (for example its length, or whether it starts with $), whether to use its lowercase form or its original form as the replacement.
BTW, the regex engine lets us place $x (where x is a group number) or ${name} (where name is a named group) in the replacement so that we can reuse those parts of the match. But if we want to place ${...} as a literal in the replacement, we need to escape the $ as \$. To avoid doing that manually we can use Matcher.quoteReplacement.
Demo:
String yourUrlString = "http://BLABLABLA?qUERY=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}";
Pattern p = Pattern.compile("\\$\\{[^}]+\\}|[A-Z]");
Matcher m = p.matcher(yourUrlString);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    String match = m.group();
    if (match.length() == 1) {
        m.appendReplacement(sb, match.toLowerCase());
    } else {
        m.appendReplacement(sb, Matcher.quoteReplacement(match));
    }
}
m.appendTail(sb);
String replaced = sb.toString();
System.out.println(replaced);
or, since Java 9:
String replaced = Pattern.compile("\\$\\{[^}]+\\}|[A-Z]")
        .matcher(yourUrlString)
        .replaceAll(m -> {
            String match = m.group();
            if (match.length() == 1)
                return match.toLowerCase();
            else
                return Matcher.quoteReplacement(match);
        });
System.out.println(replaced);
Output: http://blablabla?query=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}

This regex will match all the characters before the first &macro, and put everything between http:// and the first &macro in its own group so you can modify it.
http://(.*?)&macro
UPDATE: If you don't want to use groups, this regex will match only the characters between http:// and the first &macro
(?<=http://)(.*?)(?=&macro)
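As a rough sketch of how the lookaround regex could be used for the replacement itself, the example below lowercases only the part between http:// and the first &macro. The class name UrlPrefixLowercaser and the method lowercasePrefix are made up for illustration. It relies on Matcher.replaceFirst with a function argument, which requires Java 9 or later, and it assumes everything from the first &macro onward should stay untouched:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlPrefixLowercaser {
    // Matches only the text between "http://" and the first "&macro".
    private static final Pattern PREFIX =
            Pattern.compile("(?<=http://)(.*?)(?=&macro)");

    // Hypothetical helper: lowercases the matched prefix, leaves the rest as-is.
    static String lowercasePrefix(String url) {
        return PREFIX.matcher(url).replaceFirst(
                // quoteReplacement guards against '$' or '\' in the matched text
                m -> Matcher.quoteReplacement(m.group().toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        String url = "http://BLABLABLA?qUERY=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}";
        System.out.println(lowercasePrefix(url));
        // prints http://blablabla?query=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}
    }
}
```

Unlike the alternation-based answer, this approach does not lowercase anything after the first &macro, so it only fits URLs where the remaining parameter names are already lowercase.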

Regular expression java to extract the balance from a string

I have a String which contains " Dear user BAL= 1,234/ ".
I want to extract 1,234 from the String using the regular expression. It can be 1,23, 1,2345, 5,213 or 500
final Pattern p = Pattern.compile("((BAL)=*(\\s{1}\\w+))");
final Matcher m = p.matcher(text);
if (m.find())
    return m.group(3);
else
    return "";
This returns 3.
What regular expression should I make? I am new to regular expressions.

You search in your regex for word characters \w+ but you should search for digits with \d+.
Additionally there is the comma, so you need to match that as well.
I'd use
/.*BAL=\s([\d,]+(?=/)).*/
as pattern and get only the number in the resulting group.
Explanation:
.* matches anything before
BAL= matches the string "BAL="
\s matches a whitespace character
( starts the capturing group
[\d,]+ matches any digit or comma, one or more times
(?=/) matches the former only if followed by a slash
) ends the capturing group
.* matches anything thereafter
This is untested, but it should work like this:
final Pattern p = Pattern.compile(".*BAL=\\s([\\d,]+(?=/)).*");
final Matcher m = p.matcher(text);
if (m.find())
    return m.group(1);
else
    return "";
According to an online tester, the pattern above matches the text:
BAL= 1,234/
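To check the idea against the sample values from the question, here is a small self-contained sketch. The class name BalanceExtractor and the method extract are illustrative only. It uses a slightly trimmed variant of the pattern (no leading or trailing .*, since find() scans the string anyway) and assumes the balance is always followed directly by a slash:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BalanceExtractor {
    // "BAL=", optional whitespace, then digits and commas that are followed by '/'.
    private static final Pattern BALANCE =
            Pattern.compile("BAL=\\s*([\\d,]+)(?=/)");

    // Hypothetical helper: returns the balance text, or "" if none is found.
    static String extract(String text) {
        Matcher m = BALANCE.matcher(text);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String[] samples = {"1,234", "1,23", "1,2345", "5,213", "500"};
        for (String sample : samples) {
            // Each line prints the sample value that was embedded in the text.
            System.out.println(extract(" Dear user BAL= " + sample + "/ "));
        }
    }
}
```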

If it didn't have to be extracted by the regular expression you could simply do:
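// trim first so that leading whitespace does not produce an empty first element,
// then split on any whitespace into a 4-element array
String[] foo = text.trim().split("\\s+");
// foo[3] is "1,234/", so drop the trailing slash
return foo[3].replace("/", "");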

Remove occurrences of a given character sequence at the beginning of a string using Java Regex

I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can be of any combinations, for ex. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample sequence of string : Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a java regular expression that will identify and strip off all occurrences of Re:, but only the ones at the beginning of the string and not the ones occurring within the string.
So the output should look like This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
while (m.find()) {
    m.appendReplacement(sb, REPLACE);
}
m.appendTail(sb);
I am using \p{Z} to match whitespace (I found this somewhere on this forum, as Java regex apparently does not recognize \s).
The problem I am facing with this code is that the search stops at the first match and exits the while loop.

Try something like this replace statement:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
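For reference, the regex explained above can be wrapped in a small runnable sketch. The class name SubjectCleaner and the method stripLeadingRe are made up for illustration. Precompiling the pattern is an optional choice that avoids recompiling it on every call:

```java
import java.util.regex.Pattern;

class SubjectCleaner {
    // Case-insensitive run of "re:" prefixes (with optional whitespace)
    // anchored at the start of the string.
    private static final Pattern LEADING_RE =
            Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");

    // Hypothetical helper: removes the leading run, keeps any later "Re:".
    static String stripLeadingRe(String title) {
        return LEADING_RE.matcher(title).replaceFirst("");
    }

    public static void main(String[] args) {
        String title = "Re: Re : Re : re : RE: This is a Re: sample string.";
        System.out.println(stripLeadingRe(title));
        // prints This is a Re: sample string.
    }
}
```

Because the pattern is anchored with ^ and the group is repeated with +, a single replacement removes the whole leading run, so no loop is needed.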

In your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)"
e* allows any number of "e" characters and :? makes the colon optional, so it matches strings like:
\p{Z}Reee\p{Z}: or
R\p{Z}}}
which make no sense for what you are trying to do.
You'd be better off with a regex like the following:
yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Or, to make @Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
    yourString = m.replaceAll("");
(which, as the documentation says, is exactly the same thing as yourString.replaceAll())
(I had the same regex as @Doorknob, but thanks to @jlordo for the replaceAll and @Doorknob for thinking about the (?i) case insensitivity part ;-) )

Find characters that match a regex's set

I have a regex w_p[a-z]
It would match input like w_pa, w_pb ... w_pz. I like to find which character exactly was matched i.e. a,b or z for the above input. Is this possible with java regex?

Yes, you need a capturing group:
final Pattern pattern = Pattern.compile("w_p([a-z])");
final Matcher m = pattern.matcher(input);
if (m.find()) {
    // the matched character is in m.group(1)
    final String matched = m.group(1);
}

Sure, use regex groups. w_p([a-z]) defines a group for the character you are looking for.
Pattern p = Pattern.compile("w_p([a-z])");
Matcher matcher = p.matcher(input);
if (matcher.find()) {
    String character = matcher.group(1);
}
matcher.group(0) contains all that was matched (w_pa or w_pb etc.)
matcher.group(1) contains what was found in the first () pair.
See the documentation for more information.
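If the input can contain several occurrences, the same capturing group can be read in a loop. The following is only a sketch; the class name CapturedLetters and the method find are illustrative and not taken from the answers:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturedLetters {
    // The parentheses capture whichever letter [a-z] matched.
    private static final Pattern P = Pattern.compile("w_p([a-z])");

    // Hypothetical helper: collects the captured letter of every match.
    static List<String> find(String input) {
        List<String> letters = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            // group(0) would be the whole match, e.g. "w_pa"; group(1) is just "a"
            letters.add(m.group(1));
        }
        return letters;
    }

    public static void main(String[] args) {
        System.out.println(find("w_pa, w_pb and w_pz"));
        // prints [a, b, z]
    }
}
```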

The REGEX will be something like this:
w_p([a-z])
This creates a group from which you can get the value.

RegEX: how to match string which is not surrounded

I have a String "REC/LESS FEES/CODE/AU013423".
What could be the regEx expression to match "REC" and "AU013423" (anything that is not surrounded by slashes /)
I am using /^>*/, which works and matches the string within the slashes, i.e. with it I am able to find "/LESS FEES/CODE/", but I want to negate this to find the reverse, i.e. REC and AU013423.
Need help on this. Thanks

If you know that you're only looking for alphanumeric data you can use the regex ([A-Z0-9]+)/.*/([A-Z0-9]+). If this matches, you will have two groups which contain the first and the final text strings.
This code prints RECAU013423
final String s = "REC/LESS FEES/CODE/AU013423";
final Pattern regex = Pattern.compile("([A-Z0-9]+)/.*/([A-Z0-9]+)", Pattern.CASE_INSENSITIVE);
final Matcher matcher = regex.matcher(s);
if (matcher.matches()) {
    System.out.println(matcher.group(1) + matcher.group(2));
}
You can tweak the regex groups as necessary to cover the valid characters.

Here's another option:
String s = "REC/LESS FEES/CODE/AU013423";
String[] results = s.split("/.*/");
System.out.println(Arrays.toString(results));
// [REC, AU013423]

^[^/]+|[^/]+$
matches anything that occurs before the first or after the last slash in the string (or the entire string if there is no slash present).
To iterate over all matches in a string in Java:
Pattern regex = Pattern.compile("^[^/]+|[^/]+$");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // matched text: regexMatcher.group()
    // match start: regexMatcher.start()
    // match end: regexMatcher.end()
}
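As a concrete illustration of the loop above, the following sketch collects the matches into a list. The class name OuterSegments and the method find are hypothetical names chosen for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OuterSegments {
    // Text before the first slash, or text after the last slash.
    private static final Pattern OUTER = Pattern.compile("^[^/]+|[^/]+$");

    // Hypothetical helper: returns every match of the pattern in order.
    static List<String> find(String subject) {
        List<String> segments = new ArrayList<>();
        Matcher m = OUTER.matcher(subject);
        while (m.find()) {
            segments.add(m.group());
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(find("REC/LESS FEES/CODE/AU013423"));
        // prints [REC, AU013423]
    }
}
```

A string without any slash yields a single match containing the whole string, as described above.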
