I have a dataset of resumes and I want to extract data from each resume.
Here is a sample of what I need:
String test= "Worked in Innovision Information System Private Limited as Project Trainee-Content Writing from Date to Date.";
I want to extract the company name, the role (designation), and the date range (from-to).
I'm new to regex, so please correct me if I'm wrong.
First I tried to extract each one of them separately:
String regexStr5="Worked in:? \\w+" ;
String regexStr6 ="as:? ([a-zA-Z ]+)";
and for the date: Date : (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, \d{4}
How can I put them all together in the same regex
and print the company name + role + date?
A literal string match would be just fine for the above test string.
Regex: Worked in (.*) as (.*) from (.*) to (.*).
Replacement to do: Company Name: \1\nRole (designation): \2\nDate: \3 to \4
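In Java, the same idea might look like the sketch below. It assumes every sentence follows the exact "Worked in ... as ... from ... to ..." shape of the sample. The class name ResumeLineExtractor and the method extract are made up for illustration. The sketch departs slightly from the regex above: it uses lazy (.+?) groups and escapes the final dot. It reads the groups with Matcher.group(n); if you do a replacement instead, Java writes group references as $1 rather than \1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls company, role and date range out of one sentence.
final class ResumeLineExtractor {
    private static final Pattern LINE =
            Pattern.compile("Worked in (.+?) as (.+?) from (.+?) to (.+?)\\.");

    // Returns a formatted summary, or null when the sentence does not fit the pattern.
    static String extract(String sentence) {
        Matcher m = LINE.matcher(sentence);
        if (!m.find()) {
            return null;
        }
        return "Company Name: " + m.group(1)
                + "\nRole (designation): " + m.group(2)
                + "\nDate: " + m.group(3) + " to " + m.group(4);
    }

    public static void main(String[] args) {
        String test = "Worked in Innovision Information System Private Limited "
                + "as Project Trainee-Content Writing from Date to Date.";
        System.out.println(extract(test));
    }
}
```

Resumes that phrase the sentence differently, or a company name that itself contains " as ", would need a more careful pattern than this one.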
This is my string
2007-01-12Jakistxt2008-01-31xxx2008-02-292008-15-102008-19-452009-05-0120999-11-11pppp2001-00-0109-01-012001-01-002009-01-1112009-02-291998-11-11
I am trying to find dates in the format YYYY-MM-DD. I know that it is not possible to do this directly.
I managed to print this result:
2007-01-12
2008-01-31
2008-02-292
2008-19-452
0999-11-11
2001-00-010
2001-01-002
2009-02-291
String regex4 = "\\d{4}-\\d{2}-\\d{2,3}";
Pattern wzor4 = Pattern.compile(regex4);
Matcher efekt4 = wzor4.matcher(wyrazenie); // wyrazenie holds the input string shown above
List<String> list422 = new ArrayList<>();
while (efekt4.find()) {
    list422.add(efekt4.group());
}
for (int i = 0; i < list422.size(); i++) System.out.println(list422.get(i));
Try this pattern: (?(?<=^)|(?<=\D))\d{4}-\d{2}-\d{2}(?(?=$)|(?=\D)).
It uses \d{4}-\d{2}-\d{2} to match your string format.
Also, the date cannot be preceded or followed by any digit:
(?(?<=^)|(?<=\D)) - a conditional look-behind: if we are at the beginning of the string, start matching; if not, make sure that the preceding character is not a digit (\D)
(?(?=$)|(?=\D)) - a look-ahead analogous to the look-behind.
Alternatively, you could use just \d{4}-\d{2}-\d{2}, which would also match dates that are directly adjacent to each other.
I put it in as String regex4="(?(?<=^)|(?<=\D))\d{4}-\d{2}-\d{2}(?(?=$)|(?=\D))";
and Eclipse said:
Unknown inline modifier near index 2
(?(?<=^)|(?<=\D))\d{4}-\d{2}-\d{2}(?(?=$)|(?=\D))
  ^
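The error arises because java.util.regex does not support the (?(condition)yes|no) conditional syntax, so Pattern.compile rejects the pattern. A negative look-behind and a negative look-ahead can express the same "no digit on either side" rule. The following is a minimal sketch under that assumption; the names DateFinder and findDates are invented for illustration. It only checks the shape of the text, not whether the month or day is valid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateFinder {
    // (?<!\d) : not preceded by a digit; (?!\d) : not followed by a digit.
    private static final Pattern DATE =
            Pattern.compile("(?<!\\d)\\d{4}-\\d{2}-\\d{2}(?!\\d)");

    // Collects every YYYY-MM-DD shaped substring that has no digit on either side.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened sample input taken from the start of the question's string.
        System.out.println(findDates("2007-01-12Jakistxt2008-01-31xxx2008-02-292008-15-10"));
    }
}
```

If calendar validity matters, each match could be checked afterwards, for example by parsing it with java.time.LocalDate.parse and discarding values that throw.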
I am trying to parse a log file:
'Del username /PART="deneme" /ROLE="VR_ANALYST" /TYPE="C" /CAPABILITY="S" /ADD' (S)
'Del Batu /PART="_PROJECT" /ROLE="VR_AUTHOR" /TYPE="R" /CAPABILITY="S" /ADD' (S)
RULE => 'Del input ........ /ROLE="input2" .......
input and input2 will be given by the user.
In this sentence :
username: should be a parameter (input from user)
VR_ANALYST: should be a parameter (input from user)
'Del: must be in the regex (the first four letters must be 'Del)
/ROLE="": must be in the regex
1) The regex should start with 'Del
2) continue with the first input
3) some other strings
4) /ROLE="
5) the second input
6) "
7) continue with other strings
I have almost no knowledge of regex, but I tried this:
'Del parameter \"*"\ (/ROLE=") parameter2 (") \*
Could you please give some advice on how I can create a regex for this sentence and use my parameters in it?
1. For regex, I'd suggest trying an online regex checker such as (http://www.regexplanet.com/advanced/java/index.html) and learning a little; it's pretty simple ;)
2. If you want to add your parameters to the regex, you can use
replace(oldChar, newChar) on the string that contains the regex pattern.
Try this regex:
^'Del\s*parameter.*/ROLE="parameter2".*
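One way to plug the two user inputs into this regex is to build the pattern string by concatenation. The sketch below assumes the inputs should be matched literally, so it wraps them in Pattern.quote to stop characters such as . or * in a user name from being read as regex syntax. The names DelRuleMatcher, build and matches are hypothetical. It also requires whitespace after the first input, which is slightly stricter than the regex above.

```java
import java.util.regex.Pattern;

final class DelRuleMatcher {
    // Builds the rule regex; Pattern.quote keeps regex metacharacters in the inputs literal.
    static Pattern build(String user, String role) {
        return Pattern.compile(
                "^'Del\\s+" + Pattern.quote(user) + "\\s.*/ROLE=\"" + Pattern.quote(role) + "\"");
    }

    // True when the log line starts with 'Del <user> and later contains /ROLE="<role>".
    static boolean matches(String line, String user, String role) {
        return build(user, role).matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "'Del username /PART=\"deneme\" /ROLE=\"VR_ANALYST\" "
                + "/TYPE=\"C\" /CAPABILITY=\"S\" /ADD' (S)";
        System.out.println(matches(line, "username", "VR_ANALYST"));
    }
}
```

Requiring whitespace after the first input keeps a shorter input such as "user" from matching a line that belongs to "username".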
I got the following string to extract some information from:
String: String: String Number;
Right now I'm using the following regex to get the arguments:
(.*?):(.*?):(.*?);$
This way I would get with a Matcher the following output:
group(1) = String
group(2) = String
group(3) = String Number
If I want the number, I need to run another regex on the output of the third group, like the following:
([a-zA-Z]* ?([0-9])?$)
Used on the string String Number, this would give me an output like:
group(1) = String
group(2) = Number
I thought about combining both steps and using a regex like (.*?):(.*?):([a-zA-Z]* ?([0-9])?);$ on the String: String: String Number; string. But this does not work and I don't see the reason.
Here you go. I added some extra whitespace matching, and this seems to work; you were missing the whitespace between the second : and the following string.
^(.*?):\s*(.*?):\s*([a-zA-Z]*\s+([0-9])?);$
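As a quick check of this combined pattern, here is a small Java sketch. The sample input "Foo: Bar: Baz 7;" and the names FieldSplitter and split are assumptions made for the example, with "7" standing in for the Number part.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FieldSplitter {
    private static final Pattern FIELDS =
            Pattern.compile("^(.*?):\\s*(.*?):\\s*([a-zA-Z]*\\s+([0-9])?);$");

    // Returns {group 1, group 2, group 3, group 4}, or null when the input does not match.
    // Group 4 (the digit) is null when the optional digit is absent.
    static String[] split(String input) {
        Matcher m = FIELDS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = split("Foo: Bar: Baz 7;");
        System.out.println(String.join(" | ", parts));
    }
}
```

Note that ([0-9])? captures a single digit only; a multi-digit number would need something like ([0-9]+)? instead.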
I need to find an expression for the following problem:
String given = "{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\",\"answer\" :\"answer 4\"},{ \"questionID\" :\"5\", \"question\" :\"What was the name of the first company you worked at?\",\"answer\" :\"answer 5\"}";
What I want to get: "{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\",\"answer\" :\"*******\"},{ \"questionID\" :\"5\", \"question\" :\"What was the name of the first company you worked at?\",\"answer\" :\"******\"}";
What I am trying:
String regex = "(.*answer\"\\s:\"){1}(.*)(\"[\\s}]?)";
String rep = "$1*****$3";
System.out.println(given.replaceAll(regex, rep));
What I am getting:
"{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\",\"answer\" :\"answer 4\"},{ \"questionID\" :\"5\", \"question\" :\"What was the name of the first company you worked at?\",\"answer\" :\"******\"}";
Because of the greedy behaviour, the first group catches both "answer" parts, whereas I want it to stop after finding enough, perform replacement, and then keep looking further.
The pattern
("answer"\s*:\s*")(.*?)(")
seems to do what you want. Here's the escaped version for Java:
(\"answer\"\\s*:\\s*\")(.*?)(\")
The key here is to use (.*?) to match the answer and not (.*). The latter matches as many characters as possible, the former will stop as soon as possible.
The above pattern won't work if there are double quotes in the answer. Here's a more complex version that will allow them:
("answer"\s*:\s*")((.*?)[^\\])?(")
You'll have to use $4 instead of $3 in the replacement pattern.
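To see the lazy version in action, the simpler escaped pattern can be dropped into String.replaceAll as sketched below. The class name AnswerMasker, the method mask and the shortened input are made up for the example, and the sketch assumes the answers contain no double quotes.

```java
final class AnswerMasker {
    // Replaces the value of every "answer" field with *****.
    // $1 and $3 re-insert the text captured before and after the answer value.
    static String mask(String json) {
        return json.replaceAll("(\"answer\"\\s*:\\s*\")(.*?)(\")", "$1*****$3");
    }

    public static void main(String[] args) {
        String given = "{ \"questionID\" :\"4\",\"answer\" :\"answer 4\"},"
                + "{ \"questionID\" :\"5\",\"answer\" :\"answer 5\"}";
        System.out.println(mask(given));
    }
}
```

Because (.*?) stops at the first closing quote, each "answer" field is matched and replaced on its own rather than being swallowed by one greedy match.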
The following regex works for me :
regex = "(?<=answer\"\\s:\")(answer.*?)(?=\"})";
rep = "*****";
replaceAll(regex, rep);
The \ and " might be incorrectly escaped, since I tested without Java.
http://regexr.com?303mm
I have the following strings:
<PAUL SAINT-KARL 1997-05-07>
<BOB DEAN 2001-05-07>
<GUY JEDDY 2007-05-07>
I want a java regex that would match this type of pattern "name and date" and then extract the name and date separately.
I am able to match them separately with the following Java regexes:
1) (\d{4}-\d{2}-\d{2})>
2) <([ A-Z&#;0-9-]*+)
What I'm looking for is one regex that would identify the full text pattern as provided, and then extract the subsections, such as the actual name, and the date.
I'm looking to use Matcher.group() to retrieve the complete match from the target string.
Thanks
Try this:
"<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>"
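I changed the *+ to *? to make the * match lazily. The possessive *+ would swallow the date as well, because the character class also contains spaces, digits and hyphens, and it never gives characters back.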
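A short sketch of how the groups could then be read with Matcher follows. The names NameDateParser and parse are hypothetical. group() with no argument returns the complete match, group(1) the name and group(2) the date.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameDateParser {
    private static final Pattern ENTRY =
            Pattern.compile("<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>");

    // Returns {full match, name, date} for the first entry found, or null if none.
    static String[] parse(String text) {
        Matcher m = ENTRY.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("<PAUL SAINT-KARL 1997-05-07>");
        System.out.println("Name: " + parts[1] + ", date: " + parts[2]);
    }
}
```

To process several entries in one string, the same Matcher can be driven with a while (m.find()) loop instead of a single find() call.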