I was using this working pattern (logback.groovy):
{'((?:password(=|:|>))|(?:secret(=|:))|(?:salt(=|:)))','\$1*******\$3'}
to mask sensitive data. Then I needed it to also handle keys and values surrounded by double quotes, for example:
before: password=smth
after: "password"="smth"
So I changed the regex to this (I just added \"? before and after the keywords; I also tried \\" instead):
{'(\"?(?:password\"?(=|:|>))|(?:secret\"?(=|:))|(?:salt\"?(=|:)))','\$1*******\$3'}
But I get this error on app startup:
Failed to parse pattern
Unexpected character ('?' (code 63)): was expecting comma to separate Object entries
Can someone please explain what I am doing wrong?
In case anyone is wondering, here is the correct version:
{'(\\\"?(?:password\\\"?(=|:|>))|(?:secret\\\"?(=|:))|(?:salt\\\"?(=|:)))','\$1*******\$3'}
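Why the extra backslashes seem to be needed, as far as I can tell: the message "was expecting comma to separate Object entries" looks like a Jackson JSON parser error, which suggests the pattern ends up inside a JSON string (for example a JSON encoder's pattern). Assuming that, the text is unescaped twice before the regex engine sees it: the Groovy single-quoted string turns \\ into \ and \" into ", and the JSON string then turns \" into ". With a single \", Groovy hands JSON a bare ", which ends the JSON string early (hence the complaint about the following ?). With \\\", Groovy produces \", which JSON turns into the plain " the regex needs. The small check below uses that final, fully unescaped regex; in the Java literal, \" is only Java's own escape for ". MaskRegexCheck and firstKey are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaskRegexCheck {
    // The regex as the engine should receive it once the Groovy and JSON
    // escaping layers are gone: ("?(?:password"?(=|:|>))|(?:secret"?(=|:))|(?:salt"?(=|:)))
    static final Pattern KEY = Pattern.compile(
            "(\"?(?:password\"?(=|:|>))|(?:secret\"?(=|:))|(?:salt\"?(=|:)))");

    // Returns the first keyword-plus-separator match (group 1), or null if none.
    static String firstKey(String text) {
        Matcher m = KEY.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstKey("password=smth"));         // password=
        System.out.println(firstKey("\"password\"=\"smth\"")); // "password"=
    }
}
```

Both the unquoted and the quoted form are matched, so the quotes themselves are not the regex problem; only the escaping layers around it are.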
Sorry if this is a very basic question, but I am using a Dropwizard application in which the log format is given as "%-6level [%d{HH:mm:ss.SSS}] [%t] %logger{5} - %X{code} %msg %n".
The application logs contain lines like this:
INFO [2017-06-07 13:54:43,828] com.foo.Bar: In Get Method
I understand the meaning of %-6level, the date, %t, %msg and %n in the log format, but I couldn't work out the meaning of %X{code} and %logger{5}, and I don't see them printed in my logs.
Can someone point me to proper documentation where each of these log format parameters is explained in detail?
Here is some good documentation about layouts in logback: Layouts
For the %logger{length} part:
Outputs the name of the logger at the origin of the logging event.
This conversion word takes an integer as its first and only option.
The converter's abbreviation algorithm will shorten the logger name,
usually without significant loss of meaning. Setting the value of
length option to zero constitutes an exception. It will cause the
conversion word to return the sub-string right to the rightmost dot
character in the logger name. The next table provides examples of the
abbreviation algorithm in action.
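The table mentioned in the quote is not reproduced here. As a rough illustration, this is a simplified Java re-implementation of the abbreviation rule as I understand it (it is not logback's own abbreviator class, which handles more edge cases): package segments are cut down to their first letter, left to right, until the name fits the requested length, and the last segment (the class name) is never shortened. Under that reading, %logger{5} would render com.foo.Bar as c.f.Bar. LoggerAbbreviation and abbreviate are hypothetical names.

```java
// Simplified sketch of the %logger{length} abbreviation rule; not logback's own class.
class LoggerAbbreviation {
    static String abbreviate(String name, int targetLength) {
        // Length 0 is the documented special case: keep only what follows the last dot.
        if (targetLength == 0) {
            return name.substring(name.lastIndexOf('.') + 1);
        }
        if (name.length() <= targetLength) {
            return name;
        }
        String[] segments = name.split("\\.");
        int toTrim = name.length() - targetLength; // characters we would like to remove
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.length - 1; i++) {
            String segment = segments[i];
            if (toTrim > 0 && segment.length() > 1) {
                // Still too long: shorten this package segment to its first letter.
                out.append(segment.charAt(0));
                toTrim -= segment.length() - 1;
            } else {
                out.append(segment);
            }
            out.append('.');
        }
        // The last segment (the simple class name) is always kept in full.
        out.append(segments[segments.length - 1]);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("com.foo.Bar", 5));                 // c.f.Bar
        System.out.println(abbreviate("mainPackage.sub.sample.Bar", 15)); // m.s.sample.Bar
        System.out.println(abbreviate("mainPackage.sub.sample.Bar", 0));  // Bar
    }
}
```

Note that the result can still be longer than the requested length, because the class name is never cut.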
For the %X{key:-defaultVal} part:
Outputs the MDC (mapped diagnostic context) associated with the thread
that generated the logging event.
More information about MDC can be found here: Mapped Diagnostic Context
With your configuration, you would set the value like this, for example:
MDC.put("code", "whateverCode");
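To see why %X{code} may print nothing, it can help to think of the MDC as a per-thread map that the layout reads from. The toy model below is hypothetical (it is not org.slf4j.MDC; MdcModel and render are made-up names) and only mimics, as I understand logback's behavior, how %X{key:-defaultVal} is rendered: the current thread's value if one was put, else the default, else an empty string. So presumably nothing calls MDC.put("code", ...) on the threads that write your log lines.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT org.slf4j.MDC, just an illustration of a per-thread context map.
class MdcModel {
    // One key/value map per thread.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static void remove(String key) {
        CONTEXT.get().remove(key);
    }

    // Mimics %X{key} / %X{key:-defaultVal}: this thread's value, else the default, else "".
    static String render(String key, String defaultVal) {
        String value = CONTEXT.get().get(key);
        if (value != null) {
            return value;
        }
        return defaultVal != null ? defaultVal : "";
    }

    public static void main(String[] args) {
        System.out.println("[" + render("code", null) + "]");     // [] : nothing put yet
        put("code", "whateverCode");
        try {
            System.out.println("[" + render("code", null) + "]"); // [whateverCode]
        } finally {
            remove("code"); // clean up so the value does not leak into later work on this thread
        }
        System.out.println("[" + render("code", "none") + "]");   // [none]
    }
}
```

In real code, the corresponding calls would be MDC.put("code", ...) before logging and MDC.remove("code") in a finally block.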
I have to deal with a problem caused by a poor design. I get a list of strings and want to parse attributes out of them. Unfortunately, I can't change the source where these strings are created.
Example:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
Now I want to extract the attributes type, languageCode, url, ref, info and deactivated.
The problem here is the field info, whose text is not delimited by quotation marks. Commas may also occur in this field, so I can't use a comma to find out where it ends.
Additionally, these strings don't always contain all attributes: type, info and deactivated are always present, the rest is optional.
Any suggestions how I can solve this problem?
One possible solution is to search for = characters in the input and take the single word immediately before each one as the field name; it seems that all your field names are single words (no whitespace). If that's the case, you can then take everything after the = up to the next field name (minus the separating ,) as the value.
This assumes that the value cannot contain =.
Edit:
As a possible way to handle an embedded =, you can check whether the word in front of it is one of your known field names; if not, you can treat the = as part of the value rather than as an operator. This, however, assumes that you have a fixed set of known fields (some of which may not always appear). This assumption may be eased if you know that the field names are case-sensitive.
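A minimal Java sketch of this approach, including the edit's idea of only accepting an = that follows a known field name. The extra rule that a field name must start the string or follow ", " is my own assumption, added so that an = inside the info text is less likely to be taken for a field; KnownKeyScanner and parse are hypothetical names, and Set.of needs Java 9+.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class KnownKeyScanner {
    static Map<String, String> parse(String s, Set<String> knownKeys) {
        List<int[]> fields = new ArrayList<>(); // each entry: {nameStart, equalsIndex}
        for (int eq = s.indexOf('='); eq >= 0; eq = s.indexOf('=', eq + 1)) {
            // Walk back over the single word immediately before this '='.
            int start = eq;
            while (start > 0 && Character.isLetterOrDigit(s.charAt(start - 1))) {
                start--;
            }
            String name = s.substring(start, eq);
            // Assumption: a real field starts the string or follows ", ".
            boolean atFieldBoundary = start == 0 || s.startsWith(", ", start - 2);
            // Any other '=' is treated as part of a value.
            if (knownKeys.contains(name) && atFieldBoundary) {
                fields.add(new int[] {start, eq});
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            int[] f = fields.get(i);
            // A value runs up to the ", " before the next field name, or to the end.
            int end = (i + 1 < fields.size()) ? fields.get(i + 1)[0] - 2 : s.length();
            result.put(s.substring(f[0], f[1]), s.substring(f[1] + 1, end));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, "
                + "info=Text, that may contain all kind of chars., deactivated=false";
        Set<String> keys = Set.of("type", "languageCode", "url", "ref", "info", "deactivated");
        parse(s, keys).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```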
Assuming that the order of elements is fixed, you could write a solution using a regex like this one:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
String regex = //type, info and deactivated are always present
"type=(?<type>.*?)"
+ "(?:, languageCode=(?<languageCode>.*?))?"//optional group
+ "(?:, url=(?<url>.*?))?"//optional group
+ "(?:, ref=(?<ref>.*?))?"//optional group
+ ", info=(?<info>.*?)"
+ ", deactivated=(?<deactivated>.*?)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
if(m.matches()){
System.out.println("type -> "+m.group("type"));
System.out.println("languageCode -> "+m.group("languageCode"));
System.out.println("url -> "+m.group("url"));
System.out.println("ref -> "+m.group("ref"));
System.out.println("info -> "+m.group("info"));
System.out.println("deactivated -> "+m.group("deactivated"));
}
Output:
type -> INFO
languageCode -> EN-GB
url -> http://www.stackoverflow.com
ref -> 1
info -> Text, that may contain all kind of chars.
deactivated -> false
EDIT: Version 2: a regex that searches for oneOfPossibleKeys=value, where value ends right before either:
, oneOfPossibleKeys=
or the end of the string (represented by $).
Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
String[] possibleKeys = {"type","languageCode","url","ref","info","deactivated"};
String keysStrRegex = String.join("|", possibleKeys);
//above will contain type|languageCode|url|ref|info|deactivated
String regex = "(?<key>\\b(?:"+keysStrRegex+")\\b)=(?<value>.*?(?=, (?:"+keysStrRegex+")=|$))";
// (?<key>\b(?:type|languageCode|url|ref|info|deactivated)\b)
// =
// (?<value>.*?(?=, (?:type|languageCode|url|ref|info|deactivated)=|$))
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(m.group("key")+" -> "+m.group("value"));
}
Output:
type -> INFO
languageCode -> EN-GB
url -> http://www.stackoverflow.com
ref -> 1
info -> Text, that may contain all kind of chars.
deactivated -> false
You could use a regular expression, capturing all the "fixed" groups and using whatever remains for info. This should work even if the info part contains , or = characters. Here's a quick example (in Python, but that should not be a problem):
>>> import re
>>> p = r"(type=[A-Z]+), (languageCode=[-A-Z]+), (url=[^,]+), (ref=\d), (info=.+?), (deactivated=(?:true|false))"
>>> s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars, even deactivated=true., deactivated=false"
>>> re.search(p, s).groups()
('type=INFO',
'languageCode=EN-GB',
'url=http://www.stackoverflow.com',
'ref=1',
'info=Text, that may contain all kind of chars, even deactivated=true.',
'deactivated=false')
If any of those elements are optional, you can put a ? after those groups and make the comma optional. If the order can vary, it gets more complicated. In that case, instead of using one regex to capture everything at once, use several regexes to capture the individual attributes, removing each one from the string (replacing it with '') before matching the next. Finally, match info.
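The remove-then-match idea, sketched in Java since that is the question's language. The value patterns in FIXED (e.g. [A-Z]+ for type) are assumptions modelled on the Python example, as is the rule that an attribute must start the string or follow ", " and be followed by ", " or the end; RemoveThenMatch and parse are hypothetical names. It shares the ambiguity of every approach here: an info value containing something like ", ref=2" would be misread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RemoveThenMatch {
    // Hypothetical value shapes for the well-behaved fields; adjust to the real data.
    private static final String[][] FIXED = {
        {"type", "[A-Z]+"},
        {"languageCode", "[-A-Za-z]+"},
        {"url", "[^,\\s]+"},
        {"ref", "\\d+"},
        {"deactivated", "true|false"},
    };

    static Map<String, String> parse(String s) {
        Map<String, String> result = new LinkedHashMap<>();
        String rest = s;
        for (String[] field : FIXED) {
            // Field must start the string or follow ", ", and be followed by ", " or the end.
            Pattern p = Pattern.compile("(?:^|, )" + field[0] + "=(" + field[1] + ")(?=, |$)");
            Matcher m = p.matcher(rest);
            if (m.find()) {
                result.put(field[0], m.group(1));
                // Cut the matched attribute (with its leading ", ") out of the string.
                rest = rest.substring(0, m.start()) + rest.substring(m.end());
            }
        }
        // Whatever is left should be the free-text info attribute.
        Matcher info = Pattern.compile("(?:^|, )info=(.*)$").matcher(rest);
        if (info.find()) {
            result.put("info", info.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, "
                + "info=Text, that may contain all kind of chars, even deactivated=true., deactivated=false";
        parse(s).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Because the attributes are found by name rather than by position, the order in the input does not matter.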
On further consideration, given that those attributes could appear in any order, it may be more promising to capture everything from one keyword to the next, regardless of its content, much like Pshemo's solution:
keys = "type|languageCode|url|ref|info|deactivated"
p = r"({0})=(.+?)(?=\, (?:{0})=|$)".format(keys)
matches = re.findall(p, s)
But this, too, might fail in some very obscure cases, e.g. if the info attribute contains something like ', ref=foo', including the comma. However, there seems to be no way around those ambiguities. If you had a string like info=in this string, ref=1, and in another, ref=2, ref=1, does it contain one ref attribute, or three, or none at all?
I have strings like the following:
"parameter: param0=true, param1=401230 param2=asset client: desktop"
"parameter: param0=false, param1=15230 user: user213 client: desktop"
"parameter: param0=false, param1=51235 param2=asset result: ERROR"
The structure is parameter:, then the params, and after the params client:, user: and/or result:.
I want to match everything between parameter: and the first occurrence of client:, user: or result:.
So for the 2nd String it should match param0=false, param1=15230.
My regex is:
parameter:\s+(.*)\s+(result|client|user):
But if I match the 2nd string, it captures param0=false, param1=15230 user: user213 (it looks like the regex is matching greedily).
How do I fix this? Changing it to parameter:\s+(.*)\s+(result|client|user)+?: doesn't help.
In the regex tester I'm using, I can add the U modifier to make the regex lazy by default. Is this possible in Java too?
Try putting the ? character inside the first captured group (the subpattern you intend to extract):
parameter:\\s+(.*?)\\s+(result|client|user):
No, there is no ungreedy modifier in Java. You have to put ? after each quantifier to make it lazy.
This means you should mark every quantifier with a ?, as in the following pattern:
"parameter:\\s+?(.*?)\\s+?(result|client|user):"
See the Pattern documentation: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
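A small Java check of the lazy fix against the question's strings. Note that Java does have an embedded (?U) flag, but it means UNICODE_CHARACTER_CLASS, not "ungreedy", so it would not help here. LazyParams and params are made-up names for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyParams {
    // Greedy: (.*) runs to the end of the line and backtracks to the LAST keyword.
    static final Pattern GREEDY =
            Pattern.compile("parameter:\\s+(.*)\\s+(result|client|user):");
    // Lazy: (.*?) grows one character at a time and stops at the FIRST keyword.
    static final Pattern LAZY =
            Pattern.compile("parameter:\\s+(.*?)\\s+(result|client|user):");

    // Returns the captured params, or null if the line does not match.
    static String params(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "parameter: param0=false, param1=15230 user: user213 client: desktop";
        System.out.println(params(GREEDY, line)); // param0=false, param1=15230 user: user213
        System.out.println(params(LAZY, line));   // param0=false, param1=15230
    }
}
```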
I have a regular expression that I'm using to rewrite incoming REST URLs, and I'm stuck on one case: when one section of the URL is omitted.
Here is the regex I'm currently using:
^(/[^/]+/(?:books))/([^/]+?)(?:/(?:(?!page).+?))?(?:/page/(\\d+))?$
As an example, I'm using "$1 - $2 - $3" as the replacement to build the new URL.
Here are the examples that work correctly:
"/mySite/books/topic1/page/2" results in "/mySite/books - topic1 - 2"
"/mySite/books/topic1/subtopic1/page/2" results in "/mySite/books - topic1 - 2"
All of the above work as intended. The problem is when the URL omits the "topic1" part; then the results are not what I need. Example:
"/mySite/books/page/2" results in "/mySite/books - page - "
I need $2 to be blank, because there is no topic, and the page number to still be $3. The output I need is:
"/mySite/books/page/2" results in "/mySite/books - - 2"
What can I change in my regex to satisfy that scenario without disrupting the existing ones that work correctly? This is being done in Java.
You might try this regex pattern:
^(/[^/]+/books)/(?:(?!page/)([^/]+)/)?page/(\\d+)$
It should suffice to make your second group ungreedy. Then the engine will first try to find a match without using it (trying only /page/\\d+ instead), and only if that fails will it try to include the second group:
^(/[^/]+/(?:books))/([^/]+?)(?:/(?:(?!page).+?))??(?:/page/(\\d+))?$
Following any kind of quantifier (+, *, ? or {..}) with ? makes it ungreedy.
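A small Java harness for checking candidate patterns with String.replaceAll and the question's "$1 - $2 - $3" replacement (Java substitutes an empty string for a group that did not participate). Tracing the two answers' patterns by hand, the first one does not match the subtopic URL at all, so replaceAll returns it unchanged, and the second one still yields "/mySite/books - page - " for the URL without a topic, because ([^/]+?) has to consume at least one character and can only settle on page. VARIANT is a hypothetical alternative, not taken from either answer: an optional topic segment, any number of further segments that are not page/, then an optional /page/N. UrlRewriteCheck and its members are made-up names.

```java
class UrlRewriteCheck {
    // Pattern from the first answer.
    static final String FIRST_ANSWER = "^(/[^/]+/books)/(?:(?!page/)([^/]+)/)?page/(\\d+)$";
    // Pattern from the second answer (lazy ?? on the optional sub-path group).
    static final String SECOND_ANSWER =
            "^(/[^/]+/(?:books))/([^/]+?)(?:/(?:(?!page).+?))??(?:/page/(\\d+))?$";
    // Hypothetical variant: optional topic ($2), extra non-"page/" segments, optional /page/N ($3).
    static final String VARIANT =
            "^(/[^/]+/books)(?:/(?!page/)([^/]+))?(?:/(?!page/)[^/]+)*(?:/page/(\\d+))?$";

    static String rewrite(String regex, String url) {
        // Unmatched groups are replaced with "", and a non-matching URL is returned unchanged.
        return url.replaceAll(regex, "$1 - $2 - $3");
    }

    public static void main(String[] args) {
        String[] urls = {
            "/mySite/books/topic1/page/2",
            "/mySite/books/topic1/subtopic1/page/2",
            "/mySite/books/page/2",
        };
        for (String regex : new String[] {FIRST_ANSWER, SECOND_ANSWER, VARIANT}) {
            for (String url : urls) {
                System.out.println(url + " => [" + rewrite(regex, url) + "]");
            }
            System.out.println();
        }
    }
}
```

With VARIANT, the URL without a topic becomes "/mySite/books -  - 2" (two spaces, since $2 is empty), while both topic URLs become "/mySite/books - topic1 - 2".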