How to split this string in java need regex? - java

I need to split this string:
COMITATO: TRIESTE Indirizzo legale: VIA REVOLTELLA 39 34139
Trieste (Trieste) Mob.: 3484503368 Fax: 040310096 Sito web: www.csentrieste.it/
The wanted result is an array like:
{COMITATO:,TRIESTE,Indirizzo legale:,VIA REVOLTELLA 39 34139
Trieste (Trieste) ,Mob.:,3484503368,Fax:,Sito web:,www.csentrieste.it/}
The problem is that some attributes of the string can be missing, so I can't split on the attribute headers such as "COMITATO:" or "Indirizzo legale:".
For example, if "Indirizzo legale:" is missing, the string will look like:
COMITATO: TRIESTE Mob.: 3484503368 Fax: 040310096 Sito web: www.csentrieste.it/

Well, this regex will parse your given inputs:
(?<firstname>.*?):\s*(?<lastname>\w+)(?:(?<occupation>[^:]+):\s*(?<address>.+\n.+))?\sMob\.:\s*(?<mobile>\d+)\s*Fax:\s*(?<fax>\d+)\s*Sito web:\s*(?<website>.*)
Named groups keep the pattern readable and make the results easy to access. Nothing too clever about the regex: we just crawl through the string, using what static structure we can to anchor the pattern, namely the colons and the "Mob.", "Fax", and "Sito web" labels. The address part, which may be missing, sits in an optional non-capturing group.
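A minimal Java sketch of how the named groups might be read back. The class name ComitatoParser and the helper parse are made up for illustration, the dot in "Mob." is escaped, and the sketch assumes the address spans a line break exactly as in the question's sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original answer.
class ComitatoParser {
    // The answer's regex as a Java string literal, with the dot in "Mob." escaped.
    static final Pattern RECORD = Pattern.compile(
        "(?<firstname>.*?):\\s*(?<lastname>\\w+)"
        + "(?:(?<occupation>[^:]+):\\s*(?<address>.+\\n.+))?"
        + "\\sMob\\.:\\s*(?<mobile>\\d+)"
        + "\\s*Fax:\\s*(?<fax>\\d+)"
        + "\\s*Sito web:\\s*(?<website>.*)");

    static final String[] GROUPS = {
        "firstname", "lastname", "occupation", "address", "mobile", "fax", "website"};

    // Returns the groups that took part in the match, in pattern order.
    // Groups from the optional address part are simply absent when it is missing.
    static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = RECORD.matcher(input);
        if (m.find()) {
            for (String name : GROUPS) {
                String value = m.group(name);
                if (value != null) {
                    fields.put(name, value.trim());
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String full = "COMITATO: TRIESTE Indirizzo legale: VIA REVOLTELLA 39 34139\n"
            + "Trieste (Trieste) Mob.: 3484503368 Fax: 040310096 Sito web: www.csentrieste.it/";
        String noAddress =
            "COMITATO: TRIESTE Mob.: 3484503368 Fax: 040310096 Sito web: www.csentrieste.it/";
        System.out.println(parse(full));
        System.out.println(parse(noAddress));
    }
}
```

Returning a map rather than an array keeps each value attached to its label, which seems more robust when attributes can be missing.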

Related

Java regex lookbehind

I want to match a string that contains "json" at least twice, with no "from" between two of the "json" occurrences.
For example (whether I want the string to match or not):
select json,json from XXX -> Yes
select json from json XXXX -> No
select json,XXXX,json from json XXX -> Yes
The third one matches because I only need two "json" occurrences with no "from" between them.
After learning about regex lookbehind, I wrote the regex like this:
select.*json.*?(?<!from)json.*from.*
I'm using lookbehind to exclude the string "from".
But after testing, I found that this regex also matches the string "select get_json_object from get_json_object".
What is wrong with my regex? Any suggestion is appreciated.
You need to use a tempered greedy token to achieve this. Use this regex:
\bjson\b(?:(?!\bfrom\b).)+\bjson\b
The expression (?:(?!\bfrom\b).)+ matches one or more characters, checking before each one that the whole word from does not start at that position, so the matched text cannot contain from as a whole word.
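To see which span the tempered greedy token actually selects, here is a small Java sketch that uses Matcher.find() instead of a whole-line match. The class name TemperedTokenDemo and the helper firstSpan are illustrative names, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the tempered greedy token.
class TemperedTokenDemo {
    static final Pattern TWO_JSON =
        Pattern.compile("\\bjson\\b(?:(?!\\bfrom\\b).)+\\bjson\\b");

    // Returns the first "json ... json" span with no whole-word "from" inside,
    // or null when the input has no such span.
    static String firstSpan(String sql) {
        Matcher m = TWO_JSON.matcher(sql);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstSpan("select json,XXXX,json from json XXX")); // json,XXXX,json
        System.out.println(firstSpan("select json from json XXXX"));          // null
        // \b keeps "json" inside get_json_object from counting as a word.
        System.out.println(firstSpan("select get_json_object from get_json_object")); // null
    }
}
```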
For matching the whole line, you can use,
^.*\bjson\b(?:(?!\bfrom\b).)+\bjson\b.*$
As you wanted in your post, this regex matches the line as long as it contains two json words with no from between them.
Edit:
Why OP's regex select.*json.*?(?<!from)json.*from.* didn't work as expected
Your regex starts matching at select, then .* matches as much as possible while still leaving a json ahead. After that it lazily matches some optional characters, expects a second json that is not immediately preceded by from, matches more characters with .*, expects a from, and finally matches zero or more trailing characters with .*.
Let's take an example string that should match.
select json from json json XXXX
It has two json strings with no from in between, so it should match, but it doesn't. In your regex the order and presence of json and from are fixed (json, then json again, then from), which is not the case in this string.
Here is a Java code demo
// needs: import java.util.Arrays; import java.util.List;
List<String> list = Arrays.asList("select json,json from XXX", "select json from json XXXX", "select json,json from json XXX", "select json from json json XXXX");
list.forEach(x -> {
    // matches() tests the whole string, hence the leading and trailing .*
    System.out.println(x + " --> " + x.matches(".*\\bjson\\b(?:(?!\\bfrom\\b).)+\\bjson\\b.*"));
});
Prints,
select json,json from XXX --> true
select json from json XXXX --> false
select json,json from json XXX --> true
select json from json json XXXX --> true

Set RegEx in Java to be non-greedy by default

I have Strings like the following:
"parameter: param0=true, param1=401230 param2=asset client: desktop"
"parameter: param0=false, param1=15230 user: user213 client: desktop"
"parameter: param0=false, param1=51235 param2=asset result: ERROR"
The pattern is parameter:, then the params, and after the params client: and/or user: and/or result:.
I want to match the stuff between parameter: and the first occurrence of either client:, user: or result:
So for the 2nd String it should match param0=false, param1=15230.
My regex is:
parameter:\s+(.*)\s+(result|client|user):
But now if I match the 2nd String, it captures param0=false, param1=15230 user: user213 (it looks like the regex is matching greedily).
How can I fix this? parameter:\s+(.*)\s+(result|client|user)+?: doesn't fix it.
With this regex tester I can add the modifier U to the regex to make regex lazy by default, is this possible in Java too?
Try putting the ? character inside the first capturing group (the subpattern you intend to extract):
parameter:\\s+(.*?)\\s+(result|client|user):
No. There is no ungreedy modifier in Java. You have to put a ? after each quantifier to make it lazy (reluctant).
This means you should follow every quantifier with a ?, as in this pattern:
"parameter:\\s+?(.*?)\\s+?(result|client|user):"
Specified by: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
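As a rough illustration of the difference, the sketch below runs the greedy and the lazy pattern over the second sample string from the question. The class name LazyParamsDemo and the helper params are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing greedy and reluctant quantifiers.
class LazyParamsDemo {
    static final Pattern GREEDY =
        Pattern.compile("parameter:\\s+(.*)\\s+(result|client|user):");
    static final Pattern LAZY =
        Pattern.compile("parameter:\\s+(.*?)\\s+(result|client|user):");

    // Returns capture group 1 (the params), or null when the line does not match.
    static String params(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "parameter: param0=false, param1=15230 user: user213 client: desktop";
        // Greedy .* runs to the last keyword; lazy .*? stops at the first one.
        System.out.println(params(GREEDY, line)); // param0=false, param1=15230 user: user213
        System.out.println(params(LAZY, line));   // param0=false, param1=15230
    }
}
```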

java regexp for reluctant matching

I need to find an expression for the following problem:
String given = "{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\",\"answer\" :\"answer 4\"},{ \"questionID\" :\"5\", \"question\" :\"What was the name of the first company you worked at?\",\"answer\" :\"answer 5\"}";
What I want to get: "{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\",\"answer\" :\"*******\"},{ \"questionID\" :\"5\", \"question\" :\"What was the name of the first company you worked at?\",\"answer\" :\"******\"}";
What I am trying:
String regex = "(.*answer\"\\s:\"){1}(.*)(\"[\\s}]?)";
String rep = "$1*****$3";
System.out.println(given.replaceAll(regex, rep));
What I am getting:
"{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\",\"answer\" :\"answer 4\"},{ \"questionID\" :\"5\", \"question\" :\"What was the name of the first company you worked at?\",\"answer\" :\"******\"}";
Because of the greedy behaviour, the first group catches both "answer" parts, whereas I want it to stop after finding enough, perform replacement, and then keep looking further.
The pattern
("answer"\s*:\s*")(.*?)(")
seems to do what you want. Here's the escaped version for Java:
(\"answer\"\\s*:\\s*\")(.*?)(\")
The key here is to use (.*?) to match the answer and not (.*). The latter matches as many characters as possible, the former will stop as soon as possible.
The above pattern won't work if there are escaped double quotes (\") in the answer. Here's a more complex version that will allow them:
("answer"\s*:\s*")((.*?)[^\\])?(")
You'll have to use $4 instead of $3 in the replacement pattern.
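The simple three-group pattern from this answer can be tried with String.replaceAll as in the sketch below. The class name AnswerMasker and the method mask are made-up names, and the sketch assumes answers contain no double quotes.

```java
// Hypothetical helper class; uses the simple pattern, so it assumes
// that answer values contain no double quotes.
class AnswerMasker {
    // Group 1 is the "answer" key up to the opening quote, group 2 the value
    // (matched reluctantly), group 3 the closing quote.
    static String mask(String json) {
        return json.replaceAll("(\"answer\"\\s*:\\s*\")(.*?)(\")", "$1*****$3");
    }

    public static void main(String[] args) {
        String given = "{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\","
            + "\"answer\" :\"answer 4\"},{ \"questionID\" :\"5\", \"question\" :"
            + "\"What was the name of the first company you worked at?\",\"answer\" :\"answer 5\"}";
        System.out.println(mask(given));
    }
}
```

Because .*? stops at the first closing quote, replaceAll finds each "answer" entry separately instead of swallowing everything up to the last one.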
The following regex works for me:
String regex = "(?<=answer\"\\s:\")(answer.*?)(?=\"})";
String rep = "*****";
String result = given.replaceAll(regex, rep);
The \ and " might be incorrectly escaped since I tested it outside Java.
http://regexr.com?303mm

Extracting data from a text file - repeated values

79 0009!017009!0479%0009!0479 0009!0469%0009!0469
0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449
0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419
0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009
0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339
0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032
In this data, I'm supposed to extract the numbers 47, 46, 45, 44 and so on, and ignore the rest. The numbers always follow this pattern: 9!0, then the number, then 9%,
for example: 9!0 42 9%
Which language should I use to solve this, and which function might help me?
Is there any function that can locate a special character and copy the next two or three characters?
Ex: 9!0 42 9% and ' 009
Look for ! and then copy 42 from there, and look for ', which marks another value (009). It looks like two different regexes are needed.
You can use whatever language you want, or even a Unix command-line utility like sed, awk, or grep. You want to match 9!0 followed by digits followed by 9%. Use this regex: 9!0(\d+)9% (or, if the numbers are all two digits, 9!0(\d{2})9%).
The other answers are fine, my regex solution is simply "9!.(\d\d)"
And here's a full solution in PowerShell, which can easily be adapted to other .NET languages:
$t="79 0009!017009!0479%0009!0479 0009!0469%0009!0469 0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449 0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419 0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009 0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339 0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032"
$p="9!.(\d\d)"
$ms=[regex]::match($t,$p)
while ($ms.Success) {write-host $ms.groups[1].value;$ms=$ms.NextMatch()}
This is perl:
@result = $subject =~ m/(?<=9!0)\d+(?=9%)/g;
It will give you an array of all your numbers. You didn't provide a language so I don't know if this is suitable for you or not.
// needs java.util.regex.Pattern and java.util.regex.Matcher;
// subjectString holds the input text
Pattern regex = Pattern.compile("(?<=9!0)\\d+(?=9%)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // matched text: regexMatcher.group()
    // match start: regexMatcher.start()
    // match end: regexMatcher.end()
}
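A self-contained variant of the lookaround approach might collect the matches into a list, as sketched here. The class name NumberExtractor and the method extract are illustrative only. Note that only numbers enclosed by both 9!0 and 9% are returned, so repeated values in the data come back repeated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class built around the lookbehind/lookahead pattern.
class NumberExtractor {
    // Digits that are preceded by "9!0" and followed by "9%".
    static final Pattern BETWEEN = Pattern.compile("(?<=9!0)\\d+(?=9%)");

    static List<String> extract(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = BETWEEN.matcher(text);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        // First line of the sample data from the question.
        String line = "79 0009!017009!0479%0009!0479 0009!0469%0009!0469";
        System.out.println(extract(line)); // [47, 46]
    }
}
```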

match a string of characters between tags:

I have the following strings:
<PAUL SAINT-KARL 1997-05-07>
<BOB DEAN 2001-05-07>
<GUY JEDDY 2007-05-07>
I want a java regex that would match this type of pattern "name and date" and then extract the name and date separately.
I am able to match them separately with the following Java regexes:
1) (\d{4}-\d{2}-\d{2})>
2) <([ A-Z&#;0-9-]*+)
What I'm looking for is one regex that would identify the full text pattern as provided, and then extract the subsections, such as the actual name, and the date.
I'm looking to use Matcher.group() to retrieve the complete match from the target string.
Thanks
Try this:
"<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>"
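I changed the possessive *+ to the reluctant *? so that the name group stops before the date. The character class also accepts spaces, digits and hyphens, so a possessive quantifier swallows the date as well and never gives it back.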
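A short Java sketch of the combined pattern, with the possessive variant kept alongside for comparison. The class name TagParser and the method parse are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for the <NAME DATE> pattern.
class TagParser {
    static final Pattern RELUCTANT =
        Pattern.compile("<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>");
    // Same pattern with the possessive quantifier, kept only for comparison.
    static final Pattern POSSESSIVE =
        Pattern.compile("<([ A-Z&#;0-9-]*+) (\\d{4}-\\d{2}-\\d{2})>");

    // Returns {name, date}, or null when the tag does not match.
    static String[] parse(Pattern pattern, String tag) {
        Matcher m = pattern.matcher(tag);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String tag = "<PAUL SAINT-KARL 1997-05-07>";
        String[] parts = parse(RELUCTANT, tag);
        System.out.println(parts[0]); // PAUL SAINT-KARL
        System.out.println(parts[1]); // 1997-05-07
        System.out.println(parse(POSSESSIVE, tag) == null); // true
    }
}
```

Matcher.group() (or group(0)) still returns the complete match, while group(1) and group(2) hold the name and the date.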
