Java regex lookbehind - java

I want to match a string that contains "json" at least twice, with no "from" between two of the "json" occurrences.
For example (whether I want the string to match or not):
select json,json from XXX -> Yes
select json from json XXXX -> No
select json,XXXX,json from json XXX -> Yes
The third one should match because two "json" strings occur with no "from" between them.
After learning about regex lookbehind, I wrote this regex:
select.*json.*?(?<!from)json.*from.*
I'm using the lookbehind to exclude the string "from".
But after testing, I found that this regex also matches the string "select get_json_object from get_json_object".
What is wrong with my regex? Any suggestion is appreciated.

You need to use a tempered greedy token to achieve this. Use this regex:
\bjson\b(?:(?!\bfrom\b).)+\bjson\b
The expression (?:(?!\bfrom\b).)+ matches one character at a time, and only while the whole word from does not start at that position, so the text it consumes cannot contain from as a whole word.
For matching the whole line, you can use,
^.*\bjson\b(?:(?!\bfrom\b).)+\bjson\b.*$
As you asked in your post, this regex matches the line as long as it contains two occurrences of json with no from between them.
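As a minimal sketch of how the unanchored pattern behaves with Matcher.find(), the class and method names below are made up for illustration. It also shows that \bjson\b does not fire inside get_json_object, because _ counts as a word character.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer above.
class TemperedTokenSketch {
    // Two whole-word "json" tokens with no whole-word "from" between them.
    static final Pattern TWO_JSON_NO_FROM =
            Pattern.compile("\\bjson\\b(?:(?!\\bfrom\\b).)+\\bjson\\b");

    // find() searches anywhere in the input, so no leading/trailing .* is needed.
    static boolean hasJsonPairWithoutFrom(String sql) {
        return TWO_JSON_NO_FROM.matcher(sql).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "select json,json from XXX",
            "select json from json XXXX",
            "select get_json_object from get_json_object"
        };
        for (String s : samples) {
            System.out.println(s + " --> " + hasJsonPairWithoutFrom(s));
        }
    }
}
```

String.matches() requires the whole input to match, which is why the full-line variant wraps the pattern in .* on both sides.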
Edit:
Why OP's regex select.*json.*?(?<!from)json.*from.* didn't work as expected
Your regex starts by matching select; then .* matches as much as possible while still leaving a json ahead, .*? matches some optional characters, and the regex expects a second json. After that, .* matches more characters, the regex expects a from, and a final .* matches zero or more remaining characters. Note that (?<!from) only checks the four characters immediately before the second json; it does not check everything between the two jsons.
Let's take an example string that should match.
select json from json json XXXX
It has two json strings with no from between them, so it should match, but it doesn't: your regex fixes the order of json and from as json, then json, then from, which is not the case in this string.
Here is a Java code demo
List<String> list = Arrays.asList("select json,json from XXX","select json from json XXXX","select json,json from json XXX","select json from json json XXXX");
list.forEach(x -> {
    System.out.println(x + " --> " + x.matches(".*\\bjson\\b(?:(?!\\bfrom\\b).)+\\bjson\\b.*"));
});
Prints,
select json,json from XXX --> true
select json from json XXXX --> false
select json,json from json XXX --> true
select json from json json XXXX --> true

Related

Any suggestions how to create Regex for this in java for String.replaceAll()?

My String is like this.
{\\\"692950841314120\\\":[{\\\"type\\\":\\\"ads_management\\\",\\\"call_count\\\":3,\\\"total_cputime\\\":1,\\\"total_time\\\":5,\\\"estimated_time_to_regain_access\\\":0}]}
Since the key here is a variable value, I am trying to replace 692950841314120 (or whatever value I get from the server) with a constant like ID. My main goal is to parse this into a POJO. I have tried using:
string.replaceAll("^[0-9]{15}$","ID")
but I think the slashes keep me from getting the desired value. Is there a better way to do this? I know I can use the code below, but I don't want to end up with something like ID123 if the value gains extra digits, or to distort any other info in the JSON.
string.replaceAll("[0-9]{15}","ID")
Strictly speaking, if you have a valid JSON string, you should parse it using something like GSON, rather than using regex. That being said, if you must use regex, you could try removing the starting and ending anchors:
string.replaceAll("[0-9]{15}", "ID")
Or maybe use double quotes instead:
string.replaceAll("\"[0-9]{15}\"", "\"ID\"")
It is safer to assume the value is inside \" and \":.
You can then use
.replaceAll("(\\\\\")[0-9]{15}(\\\\\":)", "$1ID$2")
The regex is (\\")[0-9]{15}(\\":) and it means:
(\\") - match and capture \" substring into Group 1
[0-9]{15} - fifteen digits
(\\":) - Group 2: a \": substring.
The $1 and $2 are placeholders holding the Group 1 and 2 values.
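A minimal sketch of that capture-group replacement, assuming the input really contains literal backslashes before each quote; the class and method names are hypothetical:

```java
// Hypothetical demo class wrapping the replaceAll call from the answer above.
class KeyReplaceSketch {
    // Regex seen by the engine: (\\")[0-9]{15}(\\":)
    // Group 1 keeps the leading \" and group 2 keeps the trailing \":
    static String replaceKey(String escapedJson) {
        return escapedJson.replaceAll("(\\\\\")[0-9]{15}(\\\\\":)", "$1ID$2");
    }

    public static void main(String[] args) {
        // Shortened sample; every quote is preceded by a literal backslash.
        String input = "{\\\"692950841314120\\\":[{\\\"call_count\\\":3}]}";
        System.out.println(replaceKey(input));
    }
}
```

Because Group 2 requires the colon, a 15-digit number that appears as a value rather than as a key is left untouched.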
You can use a word boundary, \b, so that only a run of exactly fifteen digits is replaced.
Try this:
public static void main(String[] args) {
    String input = "{\\\"692950841314120\\\":"
        + "[{\\\"type\\\":\\\"12345678901234567890\\\","
        + "\\\"call_count\\\":3,"
        + "\\\"total_cputime\\\":1,"
        + "\\\"total_time\\\":5,"
        + "\\\"estimated_time_to_regain_access\\\":0}]}";
    System.out.println(input.replaceAll("\\b[0-9]{15}\\b", "ID"));
}
output:
{\"ID\":[{\"type\":\"12345678901234567890\",\"call_count\":3,\"total_cputime\":1,\"total_time\":5,\"estimated_time_to_regain_access\":0}]}

how to check regex starts and ends with regex

I have a regex for capturing strings that are between double quotes and do not start or end with /.
But it is not yet the solution I want.
The regex should follow these conditions:
Condition 1. Capture text between two double or single quotes.
Condition 2. But it shouldn't capture if the text starts with [ and ends with ].
Condition 3. But it shouldn't capture if it starts with /" and ends with /", or starts with /' and ends with /'.
Example:
REGEX: \"(\/?.)*?\"
Input: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])
output:
captured output:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
5. "test"
6. "in"
Expected result:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
Condition 1 explanation:
Capture the text between double or single quotes.
example:
input : "test","m2m:cin.as"
output: "test","m2m:cin.as"
Condition 2 explanation:
If the text is between [ and ], it should not be captured even if it is wrapped in double or single quotes.
example:
input: ["test"]
output: it should not capture
Condition 3 explanation:
In the expected result above, for the input "[/"Dimming Value/"]" there are two pairs of double quotes, but only the outer pair is captured; the inner /" does not start or end a capture. So the output is [/"Dimming Value/"]. I want the same behavior for /' (a single quote preceded by /).
Note:
For the input "[/"Dimming Value/"]" or '[/'Dimming Value/']', although the text has [ and ] inside the double or single quotes, the string should not be ignored. The output should be [/"Dimming Value/"].
As I understand it, you want to capture text between double quotes, except:
if the opening double quote is prefixed by [ or the closing double quote is suffixed by ]
a double quote prefixed by / should not begin or end the matched text
I don't know whether you also want to capture text between single quotes, because your text is not completely clear.
To reject a match based on the characters before it, you need a negative lookbehind, with the syntax (?<!prefix that you don't want). Java's java.util.regex supports this for bounded-length prefixes, as do PHP and Python; JavaScript engines support it only from ES2018 onward.
The best regex I could build that returns what you want for your example (you can check it on regex101.com or a similar site) is:
(?<![\[/])\"(?!\])(\/?.)*?\"(?![\]/])
I added the restriction that the opening double quote must not be followed by ] to prevent matching "][" in the text ["test"]["in"].
If your engine does not support lookbehind (for example, an older JavaScript engine), this regex will not work there.
In that case, do you have any way to process the results and exclude the bad matches?
If so, you can include the bad prefix and bad suffix in the match and then exclude those matches from the results:
[\[]?\"(\/?.)*?\"[\]]?
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
["test"]
["in"]
Full JavaScript code, including post-processing:
'Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])'
.match(/[\[]?\"(\/?.)*?\"[\]]?/g).filter(s => !s.startsWith('[') && !s.endsWith(']'))
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
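EDIT:
Equivalent Java code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

CharSequence yourStringHere = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";
// Collects the matches that survive the prefix/suffix filter below.
List<String> allMatches = new ArrayList<>();
Matcher m = Pattern.compile("[\\[]?\\\"(\\/?.)*?\\\"[\\]]?")
    .matcher(yourStringHere);
while (m.find()) {
    String s = m.group();
    // Drop matches that picked up a leading [ or a trailing ].
    if (!s.startsWith("[") && !s.endsWith("]")) {
        allMatches.add(s);
    }
}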
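For comparison, java.util.regex.Pattern does accept lookbehind, so the lookaround pattern from earlier in this answer can also be tried directly in Java without the post-filtering step. This is a minimal sketch: the class and method names are made up, and the pattern is the same one with redundant escapes dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the lookaround variant.
class QuotedTextLookaroundSketch {
    // Regex seen by the engine: (?<![\[/])"(?!\])(/?.)*?"(?![\]/])
    static final Pattern QUOTED =
            Pattern.compile("(?<![\\[/])\"(?!\\])(/?.)*?\"(?![\\]/])");

    // Returns every quoted chunk that is not wrapped in [ ] or introduced by /.
    static List<String> quoted(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath("
                + "Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),"
                + "\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";
        quoted(input).forEach(System.out::println);
    }
}
```

The post-filtering approach remains useful when the pattern has to run on an engine without lookbehind.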

How to Parse a Java String containing HTML element as JsonObject?

Hi, I have a Java String with the following value, received from an HTTP request:
{SubRefNumber:"3243 ",QBType:"-----",Question:"<p><img title="format.jpg" src="..."></img></p>"};
Because the String contains HTML elements, I get an error when I try to parse it as a JSONObject as below (quesRow is the variable holding the String above):
JSONObject jsonObject = new JSONObject(quesRow);
I get this parse error:
org.codehaus.jettison.json.JSONException: Expected a ',' or '}' at character 103 of {SubRefNumber:"3243.....
I need to parse the HTML elements within the Question key as separate data from this JSON string. Is there any way to handle this scenario? Please guide. TIA
Valid JSON cannot contain an unescaped quotation mark (") inside a string (see RFC 7159, Section 7: https://www.rfc-editor.org/rfc/rfc7159#page-9).
There are different ways to escape the quotation mark in your source string when you put it into the JSON string:
escape with a backslash - \"
escape as a Unicode sequence - \u0022
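A minimal sketch of the backslash option, assuming you control the code that builds the string before it is sent. The helper name is hypothetical, and it handles only backslashes and double quotes, not control characters or the other escapes a JSON library would take care of.

```java
// Hypothetical helper; a real JSON library should do this when one is available.
class JsonQuoteEscapeSketch {
    // Escapes backslashes first, then double quotes, so the result can sit
    // between the quotes of a JSON string value.
    static String escapeForJsonString(String raw) {
        return raw.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String html = "<p><img title=\"format.jpg\"></img></p>";
        String json = "{\"Question\":\"" + escapeForJsonString(html) + "\"}";
        System.out.println(json);
    }
}
```

Once the embedded quotes are escaped, the parser can read the Question value as a single string, and the HTML can then be handled separately.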

How to split this string in java need regex?

I need to split this string:
COMITATO: TRIESTE Indirizzo legale: VIA REVOLTELLA 39 34139
Trieste (Trieste) Mob.: 3484503368 Fax: 040310096 Sito web: www.csentrieste.it/
The wanted result is an array like:
{COMITATO:,TRIESTE,Indirizzo legale:,VIA REVOLTELLA 39 34139
Trieste (Trieste) ,Mob.:,3484503368,Fax:,Sito web:,www.csentrieste.it/}
The problem is also that some attributes of the string can be missing, so I can't split using attribute headers like "COMITATO:" or "Indirizzo legale:".
Example: if "Indirizzo legale:" is missing, the string will look like:
COMITATO: TRIESTE Mob.: 3484503368 Fax: 040310096 Sito web: www.csentrieste.it/
Well, this regex will parse your given inputs:
(?<firstname>.*?):\s*(?<lastname>\w+)(?:(?<occupation>[^:]+):\s*(?<address>.+\n.+))?\sMob.:\s*(?<mobile>\d+)\s*Fax:\s*(?<fax>\d+)\s*Sito web:\s*(?<website>.*)
We can salvage some readability and easy access of the results by using named groups. Nothing too clever about the regex, we just crawl through the string, using what static structure we can to anchor the pattern: the colons, the "Mob", "Fax", and "Sito web". Obviously the "maybe missing" address part is optional.
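A minimal sketch of reading those named groups in Java, assuming the two lines of the record are separated by a single \n. The class, method, and array names are made up; the group names are the ones used in the regex above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class around the regex from the answer above.
class ComitatoParseSketch {
    static final Pattern RECORD = Pattern.compile(
            "(?<firstname>.*?):\\s*(?<lastname>\\w+)"
            + "(?:(?<occupation>[^:]+):\\s*(?<address>.+\\n.+))?"
            + "\\sMob.:\\s*(?<mobile>\\d+)\\s*Fax:\\s*(?<fax>\\d+)"
            + "\\s*Sito web:\\s*(?<website>.*)");

    static final String[] GROUPS = {
        "firstname", "lastname", "occupation", "address", "mobile", "fax", "website"
    };

    // Returns the groups that took part in the match, trimmed; an empty map means no match.
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = RECORD.matcher(text);
        if (m.find()) {
            for (String name : GROUPS) {
                String value = m.group(name); // null when the optional part is absent
                if (value != null) {
                    fields.put(name, value.trim());
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String full = "COMITATO: TRIESTE Indirizzo legale: VIA REVOLTELLA 39 34139\n"
                + "Trieste (Trieste) Mob.: 3484503368 Fax: 040310096 Sito web: www.csentrieste.it/";
        String noAddress = "COMITATO: TRIESTE Mob.: 3484503368 Fax: 040310096 Sito web: www.csentrieste.it/";
        System.out.println(parse(full));
        System.out.println(parse(noAddress));
    }
}
```

When the optional address part is missing, m.group("address") returns null, so the sketch simply leaves that key out of the map.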

match a string of characters between tags:

I have the following strings:
<PAUL SAINT-KARL 1997-05-07>
<BOB DEAN 2001-05-07>
<GUY JEDDY 2007-05-07>
I want a Java regex that matches this "name and date" pattern and then extracts the name and the date separately.
I am able to match them separately with the following Java regexes:
1) (\d{4}-\d{2}-\d{2})>
2) <([ A-Z&#;0-9-]*+)
What I'm looking for is one regex that would identify the full text pattern as provided, and then extract the subsections, such as the actual name, and the date.
I'm looking to use Matcher.group() to retrieve the complete match from the target string.
Thanks
Try this:
"<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>"
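I changed the *+ to *? to make the * match lazily. The possessive *+ consumes every character in the class, including the space, digits, and hyphens of the date, and never gives them back, so the date part of a combined pattern could not match after it.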
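A minimal sketch of using that pattern with Matcher.group(); the class and method names are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class around the regex from the answer above.
class TagMatchSketch {
    static final Pattern TAG =
            Pattern.compile("<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>");

    // Returns {full match, name, date}, or null when the input has no such tag.
    static String[] extract(String input) {
        Matcher m = TAG.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        for (String s : new String[] {
                "<PAUL SAINT-KARL 1997-05-07>", "<BOB DEAN 2001-05-07>", "<GUY JEDDY 2007-05-07>" }) {
            String[] parts = extract(s);
            System.out.println(parts[1] + " | " + parts[2]);
        }
    }
}
```

Here group() (or group(0)) is the complete match, group(1) is the name, and group(2) is the date.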
