Java - read lines of files and extract only the helpful info - java

I have files file1, file2, ... whose contents look like this:
[2017-02-01 10:00:00 start running [error:yes] [doing:no] [finish:] [remind:] [alarmno:123456789] [logno:123456789] [ref:-1:2:-1:-1:-1] [type:2:big issues happen] [flag:0:]]<--- this line1
Line 2:
The same as line 1 except for the date, type, logno and alarmno, which sometimes contain + or - signs.
... The other lines
I have already read all those lines into a list of strings, myLines.
Note: the first element of myLines holds the contents of file1 with its lines separated by commas, the second element holds the contents of file2 in the same way, and so on.
For example, this is the first element of the myLines list:
[2017-02-01 10:00:00 start running [error:yes] [doing:no] [finish:] [remind:] [alarmno:123456789] [logno:123456789] [ref:-1:2:-1:-1:-1] [type:2:big issues happen] [flag:0:],
2017-02-01 10:00:00 start running [error:yes] [doing:no] [finish:] [remind:] [alarmno:123456789] [logno:123456789] [ref:-1:2:-1:-1:-1] [type:2:big issues happen] [flag:0:]]
<--- this is the first element of the myLines list, i.e. the contents of file1
If a file contains only one line, the corresponding element of myLines contains just that line, with no comma separator.
I want only:
The date at the start of each line
The alarmno (only the digits, not the word; for example, 123456789 in the line above)
The logno (123456789 in the line above)
The type text (for example, "big issues happen" in the line above)
This is what I tried:
String regex = "\\d{2}:\\d{2}:\\d{2}\\s+\\w*\\s+\\w*\\s+\\[\\w*:\\w*]\\s+\\[\\w*:\\]\\s+\\[\\w*:\\]\\s+\\[\\w*:\\]";
String s=null;
for(int i=0; i<myLines.size(); i++)
{
s = myLines.get(i).replaceAll(regex," ");
}
But that still leaves the date, the alarmno:12345... and the rest of the line contents.
I even tried repeating that expression, but it did not help.
Is there any way to implement that in Java?

You may use
^\[?(\d[\d-]+).*?\[alarmno:(\w*)].*?\[logno:(\w*)].*?\[type:\w*:([^\]]*)]
See the regex demo
Details:
^ - start of string
\[? - an optional [
(\d[\d-]+) - Group 1: a digit followed by 1 or more digits or - chars
.*? - any 0+ chars other than line break chars as few as possible
\[alarmno: - a [alarmno: substring
(\w*) - Group 2: 0+ word chars
] - a literal ]
.*? - any 0+ chars other than line break chars as few as possible
\[logno: - a literal [logno: substring
(\w*) - Group 3: 0+ word chars
] - a ]
.*? - any 0+ chars other than line break chars as few as possible
\[type: - a [type: substring
\w* - 0+ word chars
: - a colon
([^\]]*) - Group 4: 0+ chars other than ]
] - a ]
Java demo (run against a sample line with different field names; here the date group also captures the time):
String s = "[2017-08-17 08:00:00 Comming in [Contact:NO] [REF:] [REF2:] [REF3:] [Name:+AA] [Fam:aa] [TEMP:-2:0:-2:0:-2] [Resident:9:free] [end:0:]";
Pattern pat = Pattern.compile("^\\[*(\\d[\\d: -]+\\d).*?\\[Name:([^]]*)].*?\\[Fam:(\\w*)].*?\\[Resident:\\w*:([^]]*)]");
Matcher matcher = pat.matcher(s);
if (matcher.find()){
System.out.println("Date: " + matcher.group(1));
System.out.println("Name: " + matcher.group(2));
System.out.println("Fam: " + matcher.group(3));
System.out.println("Resident: " + matcher.group(4));
}
Output:
Date: 2017-08-17 08:00:00
Name: +AA
Fam: aa
Resident: free
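Applied to the field names from the question, the same idea might look like the sketch below. The class name LogFieldExtractor and the method extract are hypothetical. Two changes to the pattern are assumptions, not part of the answer: the leading ^\[? is dropped so that find() can run repeatedly over one myLines element that holds several comma-joined lines, and [^\]]* replaces \w* so that alarmno and logno values containing + or - are still captured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogFieldExtractor {
    // Unanchored variant of the pattern above (an assumption, see the lead-in).
    private static final Pattern FIELDS = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})"  // group 1: date and time
            + ".*?\\[alarmno:([^\\]]*)]"                     // group 2: alarmno value
            + ".*?\\[logno:([^\\]]*)]"                       // group 3: logno value
            + ".*?\\[type:\\w*:([^\\]]*)]");                 // group 4: type text

    /** Returns one {date, alarmno, logno, type} array per log line found in the element. */
    static List<String[]> extract(String element) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = FIELDS.matcher(element);
        while (m.find()) {
            rows.add(new String[] {m.group(1), m.group(2), m.group(3), m.group(4)});
        }
        return rows;
    }

    public static void main(String[] args) {
        String element = "[2017-02-01 10:00:00 start running [error:yes] [alarmno:123456789] "
                + "[logno:123456789] [ref:-1:2:-1:-1:-1] [type:2:big issues happen] [flag:0:]]";
        for (String[] row : extract(element)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

One caveat with this sketch: if a line has no alarmno, logno or type field, the lazy .*? can run on into the next comma-joined line, so it relies on every line carrying all three fields.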

Related

Java - How to validate regex expression with multiple parentheses then extract the components from the string?

I have an input string like:
abc(123:456),def(135.666:3434.777),ghi("2015-06-07T09:01:05":"2015-07-08")
Basically, it is (naive idea with regex):
[a-zA-Z0-9]+(((number)|(quoted datetime)):((number)|(quoted datetime)))(,[a-zA-Z0-9]+(((number)|(quoted datetime)):((number)|(quoted datetime))))+?
How can I make a regex pattern in Java to validate that the input string follows this pattern and then I can extract the values [a-zA-Z0-9]+ and ((number)|(quoted datetime)):((number)|(quoted datetime)) from them?
You can use
(\w+)\((\d+(?:\.\d+)?|\"[^\"]*\"):(\d+(?:\.\d+)?|\"[^\"]*\")\)
See the regex demo.
In Java, it can be declared as:
String regex = "(\\w+)\\((\\d+(?:\\.\\d+)?|\"[^\"]*\"):(\\d+(?:\\.\\d+)?|\"[^\"]*\")\\)";
Details:
(\w+) - Group 1: one or more word chars
\( - a ( char
(\d+(?:\.\d+)?|\"[^\"]*\") - Group 2: one or more digits optionally followed with . and one or more digits, or ", zero or more chars other than " and then a " char
: - a colon
(\d+(?:\.\d+)?|\"[^\"]*\") - Group 3: one or more digits optionally followed with . and one or more digits, or ", zero or more chars other than " and then a " char
\) - a ) char
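The pattern above extracts one item at a time; it does not by itself validate the whole comma-separated string. A minimal sketch of doing both follows, assuming that "valid" means one or more such items joined by single commas with nothing else around them. The class name RangeListParser and the method parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeListParser {
    // A number (optionally with a fractional part) or a double-quoted string.
    private static final String VALUE = "(?:\\d+(?:\\.\\d+)?|\"[^\"]*\")";
    // name(value:value), same shape as the regex in the answer.
    private static final String ITEM = "(\\w+)\\((" + VALUE + "):(" + VALUE + ")\\)";
    private static final Pattern ITEM_PATTERN = Pattern.compile(ITEM);
    // Whole input: item, then zero or more ",item".
    private static final Pattern LIST_PATTERN = Pattern.compile(ITEM + "(?:," + ITEM + ")*");

    /** Validates the whole input, then returns one {name, left, right} array per item. */
    static List<String[]> parse(String input) {
        if (!LIST_PATTERN.matcher(input).matches()) {
            throw new IllegalArgumentException("Input does not follow name(value:value),... format");
        }
        List<String[]> items = new ArrayList<>();
        Matcher m = ITEM_PATTERN.matcher(input);
        while (m.find()) {
            items.add(new String[] {m.group(1), m.group(2), m.group(3)});
        }
        return items;
    }
}
```

Quoted values are returned with their surrounding quotes, exactly as Groups 2 and 3 capture them.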

In a string I want to replace all words inside square brackets with the string from the 3rd square-bracket block

I have a string like " case 1 is good [phy][hu][get] my dog is [hy][iu][put] [phy][hu][gotcha]"
I want the result string as " case 1 is good get my dog is [hy][iu][put] gotcha "
Basically, I want all the substrings of the format [phy][.*][.*] to be replaced with the content of the last (third) square bracket.
I tried using this regex pattern "\[phy\]\.[^\]]*]\.\[(.*?(?=\]))]" , but I am unable to think of a way that will solve my problem without having to iterate through each matching substring.
You may use
\[phy\]\[[^\]\[]*\]\[([^\]\[]*)\]
and replace with $1. See the regex demo.
Details
\[phy\] - [phy] substring
\[ - [ char
[^\]\[]* - 0 or more chars other than [ and ]
\] - a ] char
\[ - [ char
([^\]\[]*) - Capturing group 1 ($1 is its value in the replacement pattern) that matches zero or more chars other than [ and ]
\] - a ] char
Java usage demo
String input = "case 1 is good [phy][hu][get] my dog is [hy][iu][put] [phy][hu][gotcha]";
String result = input.replaceAll("\\[phy]\\[[^\\]\\[]*]\\[([^\\]\\[]*)]", "$1");
System.out.println(result);
// => case 1 is good get my dog is [hy][iu][put] gotcha

Java Pattern matcher and RegEx

I need RegEX help please... basically a pattern that matches the following strings
G1:k6YxekrAP71LqRv[P:3]
G1:k6YxekrAP71LqRv[S:2,3,4|P:3]
G1:k6YxekrAP71LqRv[P:3|R:2,3,4,5]
G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]
"G1:k6YxekrAP71LqRv" and "P:3" are the main thing to match
I've done the below to match the first string but got lost with the rest.
G1:k6YxekrAP71LqRv(\[|\|)P:3(\||\])
If I am not mistaken, the strings all begin with G1:k6YxekrAP71LqRv.
After that, there is [P:3] by itself, or with either left S:2,3,4|, right |R:2,3,4,5 or with both left and right. The values 2,3,4 and 2,3,4,5 could be repetitive digits divided by a comma.
To match the full pattern you could use:
(G1:k6YxekrAP71LqRv)\[(?:S:(?:\d,)+\d\|)?(P:3)(?:\|R:(?:\d,)+\d)?\]
Explanation
(G1:k6YxekrAP71LqRv) # Match literally in group 1
\[ # Match [
(?: # Non capturing group
S: # Match literally
(?:\d,)+ # Match repeatedly a digit and comma one or more times
\d\| # Followed by a digit and |
)? # Close group and make it optional
(P:3) # Capture P:3 in group 2
(?: # Non capturing group
\|R: # match |R:
(?:\d,)+ # Match repeatedly a digit and comma one or more times
\d # Followed by a digit
)? # Close group and make it optional
\] # Match ]
Instead of (?:\d,)+\d you could also use 2,3,4 and 2,3,4,5 if you want to match those literally.
To match a whole string that starts with G1:k6YxekrAP71LqRv and contains P:3, you could use a positive lookahead (?=.*P:3):
\AG1:k6YxekrAP71LqRv(?=.*P:3).*\z
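Both patterns from this answer can be tried in Java roughly as follows. This is only a sketch: the class name G1Matcher and its method names are hypothetical, and the regexes are the ones given above with backslashes doubled for a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class G1Matcher {
    // Strict pattern: optional S:... part, mandatory P:3, optional R:... part.
    private static final Pattern STRICT = Pattern.compile(
            "(G1:k6YxekrAP71LqRv)\\[(?:S:(?:\\d,)+\\d\\|)?(P:3)(?:\\|R:(?:\\d,)+\\d)?\\]");
    // Loose pattern: the prefix, then P:3 anywhere later in the string.
    private static final Pattern LOOSE = Pattern.compile(
            "\\AG1:k6YxekrAP71LqRv(?=.*P:3).*\\z");

    static boolean matchesStrict(String s) {
        return STRICT.matcher(s).matches();
    }

    static boolean matchesLoose(String s) {
        return LOOSE.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "G1:k6YxekrAP71LqRv[P:3]",
            "G1:k6YxekrAP71LqRv[S:2,3,4|P:3]",
            "G1:k6YxekrAP71LqRv[P:3|R:2,3,4,5]",
            "G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]",
            "G1:k6YxekrAP71LqRv[P:33]"
        };
        for (String s : samples) {
            System.out.println(s + " strict=" + matchesStrict(s) + " loose=" + matchesLoose(s));
        }
    }
}
```

Note that the lookahead version only checks that the text P:3 occurs somewhere after the prefix, so it also accepts a string such as G1:k6YxekrAP71LqRv[P:33], which the strict pattern rejects.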
Solution:
"(G1:k6YxekrAP71LqRv)\\[.*(?<=\\||\\[)(P:3)(?=\\]|\\,|\\|)[^\\]]*\\]"
Explanation:
\\ - in a Java string literal this is a single backslash, which escapes characters that have special meaning in regex
(G1:k6YxekrAP71LqRv) - these characters are matched literally (capturing group #1, in parentheses)
\\[.* - a [ character followed by any characters, zero or more times
(?<=\\||\\[)P:3 - positive lookbehind: P:3 must be preceded by | OR [
AND
P:3(?=\\]|\\,|\\|) - positive lookahead: P:3 must be followed by ] OR , OR | (if you don't want to match e.g. P:3,4, simply delete the |\\, part from the regex)
(P:3) - capturing group #2
[^\\]]* - there can appear zero or more characters other than ]
\\] - ] character at the end of match
Code to check pattern:
String s1 = "G1:k6YxekrAP71LqRv[P:3]";
String s2 = "G1:k6YxekrAP71LqRv[S:2,3,4|P:3]";
String s3 = "G1:k6YxekrAP71LqRv[P:3|R:2,3,4,5]";
String s4 = "G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]";
String withCommaAfter = "G1:k6YxekrAP71LqRv[S:2,3,4|P:3,4]";
String notMatch1 ="G1:k6YxekrAP71LqRv[P:33]";
String notMatch2 = "G1:k6YxekrAP71LqRv[S:2,3,4|P:33]";
String[] sampleStrings = new String[] {s1, s2, s3, s4, withCommaAfter, notMatch1, notMatch2}; // to store source strings and to print results in a loop
Pattern p = Pattern.compile("(G1:k6YxekrAP71LqRv)\\[.*(?<=\\||\\[)(P:3)(?=\\]|\\,|\\|)[^\\]]*\\]");
for(String s : sampleStrings) {
System.out.println("Checked String: \"" + s + "\"");
Matcher m = p.matcher(s);
while(m.find()) { // if match is found print the following line to the console
System.out.println("\t whole String : " + m.group());
System.out.println("\t G1...qRv part : " + m.group(1));
System.out.println("\t P:3 part : " + m.group(2) + "\n");
}
}
Output that you get if you want String withCommaAfter to be matched too (if you don't want it to be matched, delete |\\, from the regex):
Checked String: "G1:k6YxekrAP71LqRv[P:3]"
whole String : G1:k6YxekrAP71LqRv[P:3]
G1...qRv part : G1:k6YxekrAP71LqRv
P:3 part : P:3
Checked String: "G1:k6YxekrAP71LqRv[S:2,3,4|P:3]"
whole String : G1:k6YxekrAP71LqRv[S:2,3,4|P:3]
G1...qRv part : G1:k6YxekrAP71LqRv
P:3 part : P:3
Checked String: "G1:k6YxekrAP71LqRv[P:3|R:2,3,4,5]"
whole String : G1:k6YxekrAP71LqRv[P:3|R:2,3,4,5]
G1...qRv part : G1:k6YxekrAP71LqRv
P:3 part : P:3
Checked String: "G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]"
whole String : G1:k6YxekrAP71LqRv[S:2,3,4|P:3|R:2,3,4,5]
G1...qRv part : G1:k6YxekrAP71LqRv
P:3 part : P:3
Checked String: "G1:k6YxekrAP71LqRv[S:2,3,4|P:3,4]"
whole String : G1:k6YxekrAP71LqRv[S:2,3,4|P:3,4]
G1...qRv part : G1:k6YxekrAP71LqRv
P:3 part : P:3
Checked String: "G1:k6YxekrAP71LqRv[P:33]"
Checked String: "G1:k6YxekrAP71LqRv[S:2,3,4|P:33]"

What is the best regex which can split list of http headers?

My header list is a string of the form:
"headerName1:value1,headerName2:value2,headerName3:value3,..."
So since a comma can be present in headers, splitting using it might be a problem.
So what would be the characters that might not be present within the headers which I can use for splitting?
This is my code:
public List<Header> getHeaders(String headers) {
    List<Header> headersList = new ArrayList<>();
    if (!"".equals(headers)) {
        String[] spam = headers.split(",");
        for (String aSpam : spam) {
            String[] header = aSpam.split(":", 2);
            if (header.length > 1) {
                headersList.add(new Header(header[0], header[1]));
            } else {
                throw new HTTPSinkAdaptorRuntimeException("Invalid format");
            }
        }
    }
    return headersList;
}
My desired output is an array, {"headerName1:value1", "headerName2:value2", "headerName3:value3", ...}
The problem is an input like "From: Donna Doe, chief bottle washer ,TO: John Doe, chief bottle washer ":
in a scenario like this, splitting on commas does not work well.
I believe you want to extract any 1+ word chars before : as key and then any number of any chars before the end of string or the first sequence of 1+ word chars followed with :.
You may consider using
(\w+):([^,]*(?:,(?!\s*\w+:)[^,]*)*)
which is an unrolled variant of (\w+):(.*?)(?=\s*\w+:|$) regex. See the regex demo.
Details:
(\w+) - Group 1 (key)
: - a colon
([^,]*(?:,(?!\s*\w+:)[^,]*)*) - Group 2 (value):
[^,]* - zero or more chars other than ,
(?:,(?!\s*\w+:)[^,]*)* - zero or more sequences of:
,(?!\s*\w+:) - comma not followed with 0+ whitespaces and then 1+ word chars + :
[^,]* - zero or more chars other than ,
The (.*?)(?=\s*\w+:|$) is more readable, but less efficient. It captures into Group 2 any 0+ chars other than line break chars (with (.*?)), but as few as possible (due to *?) up to the first occurrence of end of string ($) or 0+ whitespaces + 1 or more word chars + : (with the (?=\s*\w+:|$) positive lookahead).
See the Java demo:
Map<String,String> hash = new HashMap<>();
String s = "headerName1:va,lu,e1, headerName2:v,a,lue2,headerName3:valu,,e3,hn:dddd, ddd:val";
Pattern pattern = Pattern.compile("(\\w+):([^,]*(?:,(?!\\s*\\w+:)[^,]*)*)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
hash.put(matcher.group(1), matcher.group(2));
}
System.out.println(hash);
// => {headerName1=va,lu,e1, ddd=val, headerName2=v,a,lue2, hn=dddd, headerName3=valu,,e3}
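The HashMap in the demo does not preserve the order of the headers. If order matters, as the "array" in the question suggests, a LinkedHashMap can be used instead. The sketch below wraps the same regex in a hypothetical HeaderSplitter.split method and trims the values; the trimming is an assumption about the desired output, not something the answer specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderSplitter {
    // Same regex as in the demo above: a name, a colon, then a value that may
    // contain commas as long as the comma is not followed by another "name:".
    private static final Pattern HEADER =
            Pattern.compile("(\\w+):([^,]*(?:,(?!\\s*\\w+:)[^,]*)*)");

    /** Splits "name1:value1,name2:value2,..." into an insertion-ordered map. */
    static Map<String, String> split(String headers) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = HEADER.matcher(headers);
        while (m.find()) {
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "From: Donna Doe, chief bottle washer ,TO: John Doe, chief bottle washer ";
        split(s).forEach((name, value) -> System.out.println(name + ":" + value));
    }
}
```

Keep in mind that \w does not match -, so a header name such as Content-Type would be cut down to Type; handling such names would need something like [\w-]+ in both places where \w+ appears, which is a variation not covered by the answer.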

Regex that will match 6 characters that only allows digits, leading, and trailing spaces

The regex that I'm trying to implement should match the following data:
123456
12345 
 23456
     5
1     
      
  2   
 2345 
It should not match the following:
12 456
1234 6
 1 6
1 6
It should be 6 characters in total including the digits, leading, and trailing spaces. It could also be 6 characters of just spaces. If digits are used, there should be no space between them.
I have tried the following expressions to no avail:
^\s*[0-9]{6}$
\s*[0-9]\s*
You can just use a *\d* * pattern with a restrictive (?=.{6}$) lookahead:
^(?=.{6}$) *\d* *$
See the regex demo
Explanation:
^ - start of string
(?=.{6}$) - the string must consist of exactly 6 characters other than line break chars
 * (a space followed by *) - 0+ regular spaces (NOTE: to match any horizontal whitespace, use [^\S\r\n])
\d* - 0+ digits
 * (a space followed by *) - 0+ regular spaces
$ - end of string.
Java demo (last 4 are the test cases that should fail):
List<String> strs = Arrays.asList("123456", "12345 ", " 23456", "     5", // good
"1     ", "      ", "  2   ", " 2345 ", // good
"12 456", "1234 6", " 1 6", "1 6"); // bad
for (String str : strs)
System.out.println(str.matches("(?=.{6}$) *\\d* *"));
Note that when used in String#matches(), you do not need the initial ^ and final $ anchors, as the method requires a full string match.
You can also do:
^(?!.*?\d +\d)[ \d]{6}$
The zero width negative lookahead (?!.*?\d +\d) ensures that the lines having space(s) in between digits are not selected
[ \d]{6} matches the desired lines that have six characters having just space and/or digits.
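To compare the two patterns side by side, a small sketch such as the following could be used. The class name SixCharCheck and its method names are hypothetical; the anchors are omitted because Matcher.matches() already requires the whole string to match.

```java
import java.util.regex.Pattern;

class SixCharCheck {
    // First answer: exactly 6 chars in total, then spaces, digits, spaces.
    private static final Pattern LOOKAHEAD = Pattern.compile("(?=.{6}$) *\\d* *");
    // Second answer: no "digit, space(s), digit" anywhere, and 6 chars of space or digit.
    private static final Pattern NO_INNER_SPACE = Pattern.compile("(?!.*?\\d +\\d)[ \\d]{6}");

    static boolean viaLookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    static boolean viaNegativeLookahead(String s) {
        return NO_INNER_SPACE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"123456", "12345 ", "     5", "      ", "  2   ", "12 456", "1234 6"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + viaLookahead(s) + " / " + viaNegativeLookahead(s));
        }
    }
}
```

Both methods should agree on every input made up of digits and regular spaces; they express the same rule in two different ways.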
