Regex to find delimiter and qualifier in csv string (java)

I'm trying to come up with a regex in Java that extracts the delimiter and qualifier characters used in a given CSV string. My idea was that instead of matching the whole string, I'll just look for the last field, so in pseudocode my regex would look like this:
(match as much of the beginning as possible) followed by
Option 1: (delimiter)(qualifier)(any character)*?(qualifier)(end of string|any linebreak character)
Option 2: (delimiter)((?!reference to delimiter capturing group)[any character])*?(qualifier)(end of string|any linebreak character)
And the regex I came up with:
([\s\S])*((\W)(?!\3)(\W)[\s\S]*?\4($|\R))|((\W)((?!\7)[\s\S])*?($|\R))
Where group 3 is the delimiter, group 4 the qualifier, and group 7 the delimiter of option 2.
regex 101 link with nonworking example
Is my concept already wrong or only my regex?
Edit: As pointed out in a comment, there can be ambiguous lines, but the regex doesn't have to find the delimiter/qualifier with 100% certainty on a single try. I'm fine with a regex that scans multiple lines to get the result. Also, this is to be used in a program where the user supplies a simple definition of the data they want to import (which doesn't include the delimiter/qualifier), specifically the number of fields. That number can be used to test which of the found delimiters is the right one if there isn't a clear answer even after multiple lines.
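The field-count idea from the edit can be sketched without a regex. This is only a rough illustration, not the asker's code: the class name DelimiterGuess, the candidate delimiter and qualifier lists, and the simple "toggle on every qualifier" rule are all assumptions made for the example.

```java
import java.util.List;

class DelimiterGuess {
    // Hypothetical candidate sets; adjust to the data you expect.
    private static final char[] DELIMITERS = {',', ';', '\t', '|'};
    private static final char[] QUALIFIERS = {'"', '\''};

    // Counts fields in one line, ignoring delimiters that sit between two qualifiers.
    static int countFields(String line, char delimiter, char qualifier) {
        int fields = 1;
        boolean insideQualified = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == qualifier) {
                insideQualified = !insideQualified;
            } else if (c == delimiter && !insideQualified) {
                fields++;
            }
        }
        return fields;
    }

    // Returns {delimiter, qualifier} for the first candidate pair that yields
    // the expected field count on every line, or null if no pair fits.
    static char[] guess(List<String> lines, int expectedFields) {
        for (char d : DELIMITERS) {
            for (char q : QUALIFIERS) {
                boolean consistent = true;
                for (String line : lines) {
                    if (countFields(line, d, q) != expectedFields) {
                        consistent = false;
                        break;
                    }
                }
                if (consistent) {
                    return new char[] {d, q};
                }
            }
        }
        return null;
    }
}
```

Note that when no qualifier appears in the sample lines, the first qualifier candidate is returned by default, so more lines may be needed to settle that part.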

Related

Regex for multiple occurrences of specific words

Hello
I'm trying to create a validation rule that checks the regular expression to accept only specific phrases. Regex is based on Java.
Here are examples of correct inputs:
1OR2
2
1 OR 2 OR 15
( 2OR3) AND 1
(12AND13 AND1)OR(4 AND5)
((2AND3 AND 1)OR(4AND5))AND6
but I would be happy even if the regex also accepted something like:
())34AND(4
I have no idea how to create a regex that checks whether the brackets open and close correctly (they can be nested). I assumed this might be impossible with a regex, so I have already implemented proper bracket validation in code (a stack implementation). That code is the second validation step for the phrase.
All I need the regex to do is to check if there are these specific things inside the phrase:
numbers, round brackets, words AND and OR with multiple occurrences and whitespaces are allowed.
It should NOT accept letters or other characters.
All I managed to create so far is this:
^[0-9 \\(][0-9 \\(\\)]*
also tried adding something like:
\\b(AND|OR)\\b
inside the second pair of brackets but with no luck.
I cannot figure out how to correct it to add OR and AND words.

I used the following and matched all the inputs you gave:
^[^\)][0-9 \( (AND|OR)]*$
I assumed you didn't want to start with ), which is why I included ^[^\)].
In case you weren't aware, I use https://www.regexpal.com to check my regular expressions for code.

Since you have an arbitrary number of nested elements it's arguably not possible with regex.
For demonstration purposes only, this matches zero or more conjunctions and one set of parentheses:
^\d+(\s*(?:AND|OR)\s*(\d+|\(?\s*\d+(\s*(?:AND|OR)\s*\d+)\s*\)))*$|^(\d+|\(?\s*\d+(\s*(?:AND|OR)\s*\d+)\s*\))\s*(\s*(?:AND|OR)\s*\d+)*$
That's it. Adding more sets and levels of nested parentheses leads to exponentially increasing complexity, until it breaks altogether.
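The two-step approach described in the question (a regex that only checks which tokens appear, plus a separate bracket check in code) could look roughly like this. It is a sketch under assumptions: the class and method names are made up, and a depth counter stands in for the asker's stack, which is enough when there is only one bracket type.

```java
import java.util.regex.Pattern;

class PhraseValidator {
    // Allows only digits, whitespace, round brackets and the words AND / OR.
    // No \b around the words, because inputs like "1OR2" must pass.
    private static final Pattern ALLOWED =
            Pattern.compile("(?:[0-9()\\s]|AND|OR)+");

    // Step 1: token check only; bracket order is not verified here.
    static boolean hasOnlyAllowedTokens(String phrase) {
        return ALLOWED.matcher(phrase).matches();
    }

    // Step 2: brackets must never close before they open and must all be closed.
    static boolean bracketsBalanced(String phrase) {
        int depth = 0;
        for (char c : phrase.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth < 0) {
                return false;
            }
        }
        return depth == 0;
    }
}
```

With this split, "())34AND(4" passes the token check and is then rejected by the bracket check, which matches the behaviour the question says would be acceptable.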

Regular expression to return results that do not match selection

I work on a product that provides a Java API to extend it.
The API provides a function which
takes a Perl regular expression and
returns a list of matching files.
I want to filter the list to remove all files that end in .xml, .xsl and .cfg; basically the opposite of .*(\.xml|\.xsl|\.cfg).
I have been searching but I haven't been able to get anything to work yet.
I tried .*(?!\.cfg) and ^((?!cfg).)*$ and \.(?!cfg$|?!xml$|?!xsl$).
I don't know if I am on the right track or not.
Note
I know the regex systems are similar, but I can't get a Java regex working either.

You may use
^(?!.*\.(x[ms]l|cfg)$).+
See the regex demo
Details:
^ - start of a string
(?!.*\.(x[ms]l|cfg)$) - a negative lookahead that fails the match if any 0+ chars other than line break chars (.*) are followed by a dot and then xml, xsl or cfg (\.(x[ms]l|cfg)) at the end of the string ($)
.+ - any 1 or more chars other than linebreak chars. Might be omitted if the entire string match is not required (in some tools it is required though).
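In plain Java the same pattern can be used to filter a list of names. This is a minimal sketch; the class name ExtensionFilter and the method keep are made up for the example, and the product's own API is not shown.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExtensionFilter {
    // Same pattern as above, with backslashes doubled for a Java string literal.
    private static final Pattern KEEP =
            Pattern.compile("^(?!.*\\.(x[ms]l|cfg)$).+");

    // Keeps every name that does not end in .xml, .xsl or .cfg.
    static List<String> keep(List<String> names) {
        return names.stream()
                .filter(name -> KEEP.matcher(name).matches())
                .collect(Collectors.toList());
    }
}
```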

You need something like this, which matches only if the end of the string isn't preceded by a dot and one of the three unwanted types
/(?<!\.(?:xml|xsl|cfg))\z/
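The surrounding slashes are Perl delimiters and are dropped in Java. A small sketch of how this lookbehind variant might be used (class and method names are hypothetical); since the pattern only matches the empty position at the end of the input, it needs find() rather than matches():

```java
import java.util.regex.Pattern;

class SuffixCheck {
    // Matches the end of input only if it is not preceded by .xml, .xsl or .cfg.
    private static final Pattern NOT_EXCLUDED =
            Pattern.compile("(?<!\\.(?:xml|xsl|cfg))\\z");

    static boolean isWanted(String name) {
        return NOT_EXCLUDED.matcher(name).find();
    }
}
```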

Regex to match first/last names with optional titles

I created the following regex (Java):
(Lord |Lady |Ser )?(Agatha|John)?([ ]??Cain)?
It's working fine except in one situation (and maybe others I didn't take into account during my tests):
As you can see, when you only have the family name, the regex is also taking the whitespace behind the word. I totally understand why, but I don't know how to fix it.
This regex is used to find people in a big text file which represents the content of a book. And, of course, it must be compatible with my current working environment (Java).

You can use regex lookbehind to accomplish your goal.
\b(?<!\S)(?:(Lord|Lady|Ser)\s+)?(Agatha|John)?(?:\s*(?<=\b)(Cain))?(?<=\S)\b # regex101
It has these qualities which seem to match (possibly exceed) your criteria:
The regex match is forced to start with a non-whitespace character.
The first capture will be the title (or empty).
The second capture will be the first name (or empty).
The third capture will be the last name (or empty).
All matches have no leading or trailing whitespace.
Additionally, it will even match through line wraps (shown in additional text in the linked regex test sample).
Title, first, and last names are in singleton groups making additions to the match sets as simple as adding an additional alternation to their respective groups.
A trailing lookbehind insisting on the match ending with a non-whitespace was also added to avoid matching just "Lord " of an otherwise non-matching "Lord X".
A regex101 fiddle with example data is linked to the regex.
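For reference, here is how the pattern might be applied in Java to collect every match in a text. This is a sketch only; the class name PersonFinder and the sample sentences in the usage note are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PersonFinder {
    // The pattern from the answer, with backslashes doubled for Java.
    private static final Pattern PERSON = Pattern.compile(
            "\\b(?<!\\S)(?:(Lord|Lady|Ser)\\s+)?(Agatha|John)?"
            + "(?:\\s*(?<=\\b)(Cain))?(?<=\\S)\\b");

    // Returns the full text of every match, in order of appearance.
    static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PERSON.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

For example, PersonFinder.find("John met Cain") should yield "John" and "Cain" with no surrounding whitespace. In Java, m.group(1) to m.group(3) return null for the parts that did not take part in a match.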

combine multiple regex to extract sub string from : separated string

I have been stuck for some time developing a single regex to extract a path from any of the following strings:
1. "life:living:fast"
2. "life"
3. ":life"
4. ":life:"
I have these regex expressions to use :
(.{3,}):", ":(.{3,}):", ":(.{3,})", "(.{3,})
The first match is all I need, i.e. the desired result for each should be the string located where the word life is. Consider life to be a variable.
But for some reason combining these individual regexes is a pain: if I execute them sequentially I get the word 'life' extracted. However, I am unable to combine them into one.
I appreciate your effort.

If you want the first life with the colons, you can use this:
^:?(?:.{3,}?)(?::|$)
See demo
If you prefer the first life without the colons, switch to this:
((?<=^:)|^)([^:]{3,}?)(?=:|$)
See demo
How it Works #1: ^:?(?:.{3,}?)(?::|$)
With ^:?, at the beginning of the string, we match an optional colon
(?:.{3,}?) lazily matches three or more chars up to...
(?::|$) a colon or the end of the string
How it Works #2: ((?<=^:)|^)([^:]{3,}?)(?=:|$)
((?<=^:)|^) ensures that we are either positioned at the beginning of the string, or after a colon immediately after the beginning of the string
([^:]{3,}?) lazily matches chars that are not colons...
up to a point where the lookahead (?=:|$) can assert that what follows is a colon or the end of the string.
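A short Java sketch of the second pattern applied to the four sample strings from the question. The class name FirstSegment and the method extract are made up for the example; the wanted text is in capture group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstSegment {
    private static final Pattern FIRST =
            Pattern.compile("((?<=^:)|^)([^:]{3,}?)(?=:|$)");

    // Returns the first segment without colons, or null if nothing matches.
    static String extract(String input) {
        Matcher m = FIRST.matcher(input);
        return m.find() ? m.group(2) : null;
    }
}
```

Called with "life:living:fast", "life", ":life" or ":life:", this should return "life" each time.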

You can use this pattern, since you are looking for the first word:
(?<=^:?)[^:]{3,}
Note that this pattern doesn't check the whole string.
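In Java this works because the lookbehind has a bounded length. A minimal sketch, with hypothetical class and method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWord {
    // The lookbehind anchors the match to the start, optionally after one colon.
    private static final Pattern FIRST = Pattern.compile("(?<=^:?)[^:]{3,}");

    // Returns the first word, or null if the string does not start with one.
    static String first(String input) {
        Matcher m = FIRST.matcher(input);
        return m.find() ? m.group() : null;
    }
}
```

Whatever follows the first word is ignored, so "life:x" still gives "life" even though "x" is shorter than three characters.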

Regular Expression - Return all matches as a single match

I'm working with a piece of code that applies a regex to a string and returns the first match. I don't have access to modify the code to return all matches, nor do I have the ability to implement alternative code.
I have the following example target string:
usera,userb,,userc,,userd,usere,userf,
This is a list of comma delimited usernames joined from multiple sources, some of which were blank resulting in two commas in some places. I'm trying to write a regex that will return all of the comma delimited usernames except for specific values.
For example, consider the following expression:
[^,]\w{1,},(?<!(userb|userc|userd),)
This results in three matches:
usera,
usere,
userf,
Is there any way to get these results as a single match, instead of a match collection, e.g. a single match having the text 'usera,usere,userf,' ?
If I could write code in any language this would be trivial, but I'm limited to input of only the target string and the pattern, and I need a single match that has all items except for the ones I'm omitting. I'm not sure if this is even possible; everything I've ever done with regexes involves processing multiple items in a match collection.
Here is an example in Regex Coach. This image shows that there are the three matches I want, but my requirement is to have the text in a single match, not three separate matches.
EDIT1:
To clarify, this question is specifically about solving the use case using only regular expression syntax. Solving this problem in code is trivial, but solving it using only a regex was the requirement, given that the executing code is part of a 3rd party product that I didn't want to reverse engineer, wrap, or replace.

Is there any way to get these results as a single match, instead of a match collection, e.g. a single match having the text 'usera,usere,userf,'?
No. Regex matches are consecutive.
A regular expression matches a (sub)string from start to finish. You cannot drop the middle part, this is not how regex engines work. But you can apply the expression again to find another matching substring (incremental search - that's what Regex Coach does). This would result in a match collection.
That being said, you could also just match everything you don't want to keep and remove it, e.g.
,(?=[\s,]+)|(userb|userc|userd)[\s,]*
http://rubular.com/r/LOKOg6IeBa
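If a replace step is available (which the question's third-party product may not offer), the removal pattern could be applied like this in Java. The class name UserListCleaner is made up for the sketch.

```java
class UserListCleaner {
    // Removes the unwanted usernames together with the commas and whitespace
    // that follow them, plus any comma that is directly followed by another
    // comma or whitespace.
    static String clean(String users) {
        return users.replaceAll(
                ",(?=[\\s,]+)|(userb|userc|userd)[\\s,]*", "");
    }
}
```

Applied to the sample string usera,userb,,userc,,userd,usere,userf, this should leave usera,usere,userf, as one string.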
