Parse a log file using java.util.regex.Matcher


I am learning Java programming. I have a Cisco log:
String logLine="Jul 15 21:12:41 router_provider_pe2 57: *Jul 15 21:12:26.223: %LDP-5-NBRCHG: LDP Neighbor 10.1.1.34:0 (3) is UP";
I am trying this regular expression:
String logPattern = "([\\w]+\\s[\\d]+\\s[\\d:]+) (\\d+:) ([*\\w]+\\s[\\d]+\\s[\\d:]+:) (\\w.+)";
But it does not match. Could you help me?

Your string:
"Jul 15 21:12:41 router_provider_pe2 57: *Jul 15 21:12:26.223: %LDP-5-NBRCHG: LDP Neighbor 10.1.1.34:0 (3) is UP"
Your pattern:
"([\w]+\s[\d]+\s[\d:]+) (\d+:) ([*\w]+\s[\d]+\s[\d:]+:) (\w.+)"
The part of the pattern in the first set of parentheses matches Jul 15 21:12:41. The pattern then expects a space followed by at least one digit (the (\d+:) group). But at this point the string contains a space followed by the letter r (the start of the host name router_provider_pe2), which is not a digit. Therefore, there is no match.
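A pattern that does match has to account for the host name between the two timestamps. The sketch below is one possibility, assuming every line has the same layout as the single sample (syslog timestamp, host name, sequence number, optional * plus device timestamp, %FACILITY-SEVERITY-MNEMONIC code, message text). The class name CiscoLogParser and the parse helper are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CiscoLogParser {
    // Assumed layout, inferred from one sample line only:
    // <syslog time> <host> <seq>: *<device time>: %<FACILITY-SEVERITY-MNEMONIC>: <message>
    public static final Pattern LOG_PATTERN = Pattern.compile(
            "(\\w+\\s+\\d+\\s+[\\d:]+)"                // 1: syslog timestamp, e.g. Jul 15 21:12:41
            + "\\s+(\\S+)"                             // 2: host name
            + "\\s+(\\d+):"                            // 3: sequence number
            + "\\s+\\*?(\\w+\\s+\\d+\\s+[\\d:.]+):"    // 4: device timestamp, optional leading *
            + "\\s+%([\\w-]+):"                        // 5: message code, e.g. LDP-5-NBRCHG
            + "\\s+(.*)");                             // 6: free-text message

    /** Hypothetical helper: groups 1..6 (index 0 unused), or null if the line does not match. */
    public static String[] parse(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String logLine = "Jul 15 21:12:41 router_provider_pe2 57: *Jul 15 21:12:26.223: "
                + "%LDP-5-NBRCHG: LDP Neighbor 10.1.1.34:0 (3) is UP";
        String[] g = parse(logLine);
        if (g == null) {
            System.out.println("no match");
            return;
        }
        for (int i = 1; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
    }
}
```

For the sample line, group 2 is router_provider_pe2, group 4 is Jul 15 21:12:26.223 and group 5 is LDP-5-NBRCHG. Using \s+ instead of a literal single space also tolerates extra padding between fields; lines with a different layout would need a different pattern.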

Related

How to get integer values from a mixed string

Say I have a string:
String words= "28282 Hello, my name is potato, my age is 245 112 274 141 bla bla bla etc";
Is there any way in Java to get only the numbers after "age" (245 112 274 141) without using substring()? The strings I'm working with vary in length, so I cannot rely on substring() with fixed positions.

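Using Streams (Java 8+), it's quite simple:
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Drop the leading number, split on whitespace, keep only all-digit tokens.
String ages = Stream.of("28282 Hello, my name is potato, my age is 245 112 274 141 bla bla bla etc"
        .replaceAll("^(\\d*\\s*)?(.*)", "$2").split("\\s+"))
        .filter(part -> part.matches("^\\d+$"))
        .collect(Collectors.joining(" ")); // "245 112 274 141"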

You can use a regex: words.replaceAll(".*age\\D*([\\d\\s]+).*", "$1").split("\\s+")
The replaceAll call reduces the string to the run of digits after "age", and split turns that into an array of strings, which you can then convert into numbers if needed.
.*age\\D* : .* matches anything, then age, then zero or more non-digit characters
([\\d\\s]+).* : captures one or more digits or whitespace characters as group 1 (referenced as $1), then .* matches anything up to the end of the line
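Putting this regex together with the conversion step gives roughly the following. It is a sketch that assumes the word age appears once, directly before the numbers, and that the input is a single line; AgeExtractor and extractAges are hypothetical names.

```java
import java.util.Arrays;

public class AgeExtractor {
    // Regex from the answer: skip everything up to "age", then any non-digits,
    // capture the run of digits and whitespace (group 1), and drop the rest.
    private static final String AGE_REGEX = ".*age\\D*([\\d\\s]+).*";

    /** Hypothetical helper: returns the numbers after "age", or an empty array. */
    public static int[] extractAges(String words) {
        if (!words.matches(AGE_REGEX)) {
            return new int[0]; // replaceAll would return the input unchanged
        }
        String digits = words.replaceAll(AGE_REGEX, "$1").trim();
        if (digits.isEmpty()) {
            return new int[0]; // only whitespace was captured
        }
        return Arrays.stream(digits.split("\\s+"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        String words = "28282 Hello, my name is potato, my age is 245 112 274 141 bla bla bla etc";
        System.out.println(Arrays.toString(extractAges(words)));
    }
}
```

main prints [245, 112, 274, 141]. The matches guard matters because replaceAll returns the string unchanged when nothing matches, and Integer.parseInt would then fail on words like Hello.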

Regex to merge multiple numbers with spaces in one line

I need a Regex to merge multiple numbers in a line without merging them all together.
Example line:
Hello World9.99 123 456.00 7 890 123.45 0.97
My desired output is:
Hello World9.99 123456.00 7890123.45 0.97
I know basic regex but am not experienced with lookaheads/lookbehinds.
So far I have written this method:
final String regex = "(?<!\\.\\d{1,3})\\s+(?=\\d{1,3}\\.?\\d{2}?)";
public String mergeNumbers(String s){
return s.replaceAll(regex, "");
}
This works fine if the number attached to the word contains a dot.
But I can't figure out how to handle this line, where that number has no dot:
Hello World99 123 456.00 7 890 123.45 0.97
This returns:
Hello World99123456.00 7890123.45 0.97
but I want:
Hello World99 123456.00 7890123.45 0.97
So my question is:
How can I modify my regex to handle both cases?

I suggest using
.replaceAll("\\b(?<!\\.)(\\d+)\\s+(?=\\d)", "$1")
Details:
\b - a word boundary
(?<!\.) - there can be no . immediately before the current location
(\d+) - Group 1 (referred to with $1 backreference from the string replacement pattern): one or more digits
\s+ - one or more whitespace characters
(?=\d) - there must be a digit immediately to the right of the current location.
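A small harness to try the suggested regex on both example lines; the class name NumberMerger is illustrative, and mergeNumbers mirrors the method from the question.

```java
public class NumberMerger {
    // Regex from the answer:
    //   \b        word boundary, so digits glued to a word (World99) never start a match
    //   (?<!\.)   the digits must not come right after a dot (protects 9.99, 456.00)
    //   (\d+)     group 1: the digits to keep
    //   \s+       the whitespace to delete
    //   (?=\d)    only if another number follows
    private static final String MERGE_REGEX = "\\b(?<!\\.)(\\d+)\\s+(?=\\d)";

    public static String mergeNumbers(String s) {
        return s.replaceAll(MERGE_REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(mergeNumbers("Hello World9.99 123 456.00 7 890 123.45 0.97"));
        System.out.println(mergeNumbers("Hello World99 123 456.00 7 890 123.45 0.97"));
    }
}
```

Both calls print the desired outputs from the question. In World99 there is no word boundary between d and 9 (both are word characters), so those digits are skipped; in World9.99 the digits after the dot are rejected by the lookbehind instead.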

How to extract substrings from a block of text?

I extracted text from a PDF document, and I want to extract some particular fields from it using Java.
The portion of text:
US00RE44697E (i9) United States (12) Reissued Patent (10)
Patent Number: RE44,697 E Jones et al. (45) Date of
ReissuedPatent: Jan. 7, 2014 (54) ENCRYPTIONPROCESSORWITH SHARED
MEMORY INTERCONNECT (75) Inventors: David E.Jones, Ottawa
(CA); Cormac M.O'Connell, Carp (CA) (73) Assignee: Mosaid
Technologies Incorporated, Ottawa, Ontario (CA) (21)
Appl.No.: 13/603,137 (22) Filed: Sep. 4, 2012 Related U.S.
Patent Documents Reissue of: (64) Patent No.: Issued:
Appl. No.: Filed: 6,088,800 Jul. 11, 2000
09/032,029 Feb. 27, 1998 (51) Int.CI. G06F 21/00
(2013.01) (52) U.S. CI. USPC .............713/189; 713/190;
713/193; 380/28; 380/33; 380/52 (58) Field of Classification
Search None
My goal is to extract fields from it into strings, that is:
the text (10) Patent Number: RE44,697 E should be extracted as String pat_no = " RE44,697 E";
the text (54) ENCRYPTIONPROCESSORWITH SHARED
MEMORY INTERCONNECT should be extracted as String title = "ENCRYPTIONPROCESSORWITH SHARED
MEMORY INTERCONNECT";
and the extremely irregular text block
(64) Patent No.: Issued: Appl. No.: Filed:
6,088,800 Jul. 11, 2000 09/032,029 Feb. 27, 1998
has to be extracted as
String pat_no_org = "6,088,800";
String issued = "jul.11,2000";
String filed = "feb 27 ,1998";
...and so on.
What I have tried
First I used String.split, String.substring, String.indexOf and even Apache Commons StringUtils, but none of them helped, because the text is scattered. I also tried regular expressions, but since I am very weak at them I couldn't write a working pattern.
How can I achieve this in Java?

With regex, I would split it into 3 parts:
1.) For (10) Patent Number, the regex could look like this:
\(10\)\s*Patent Number:\s*([\w,]+)
as a java string:
"\\(10\\)\\s*Patent Number:\\s*([\\w,]+)"
The match for the first parenthesized group will be in group 1 (matcher.group(1) in Java).
\s is shorthand for [ \t\n\x0B\f\r], any kind of whitespace.
\w is shorthand for the word characters [a-zA-Z_0-9]; here it is combined with , in a character class.
Some characters have special meanings in regex. They have to be escaped with a backslash.
2.) (54) ENCRYPT...
A pattern could look like:
(?s)\(54\)\s*(.*?)\s*(?=\(\d|$)
as a java string:
"(?s)\\(54\\)\\s*(.*?)\\s*(?=\\(\\d|$)"
(?s) turns on the s flag (the inline form of Pattern.DOTALL), so the dot also matches newlines.
(?=\(\d|$) is a lookahead: the lazy (.*?) matches as few characters as possible, until the next position is either a ( followed by a digit, or the end of the string ($).
3.) For the remaining three fields, I would try to mirror the layout of the input in the pattern. This only works if every document is laid out the same way. A pattern could look like this:
(?s)\(64\).*?Filed:\s*([\d,]+)\s*(\w+\.\s*\d+,\s*\d+)\s*\n[\d+][^\n]+\n\s*(\w+\.\s*\d+,\s*\d+)
as a java string:
"(?s)\\(64\\).*?Filed:\\s*([\\d,]+)\\s*(\\w+\\.\\s*\\d+,\\s*\\d+)\\s*\\n[\\d+][^\\n]+\\n\\s*(\\w+\\.\\s*\\d+,\\s*\\d+)"
\n matches a newline.
Groups 1, 2 and 3 will hold e.g. 6,088,800, Jul. 11, 2000 and Feb. 27, 1998.
For someone just getting started with regex, this is a lot of information at once :)
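The first two patterns can be tried on the opening lines of the extracted text with a sketch like the one below. The class and helper names are illustrative, the sample is copied from the question, the end-of-text alternative of the title lookahead is written as plain $, and pattern 3 is left out because it depends on the exact line breaks of the extracted text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatentFieldExtractor {
    // Pattern 1: "(10) Patent Number: <value>"
    public static final Pattern PATENT_NUMBER =
            Pattern.compile("\\(10\\)\\s*Patent Number:\\s*([\\w,]+)");
    // Pattern 2: "(54) <title>" up to the next "(<digit>" or the end of the text.
    // (?s) is the inline form of Pattern.DOTALL, so the title may span lines.
    public static final Pattern TITLE =
            Pattern.compile("(?s)\\(54\\)\\s*(.*?)\\s*(?=\\(\\d|$)");

    // First lines of the extracted PDF text from the question.
    public static final String SAMPLE =
            "US00RE44697E (i9) United States (12) Reissued Patent (10)\n"
            + "Patent Number: RE44,697 E Jones et al. (45) Date of\n"
            + "ReissuedPatent: Jan. 7, 2014 (54) ENCRYPTIONPROCESSORWITH SHARED\n"
            + "MEMORY INTERCONNECT (75) Inventors: David E.Jones, Ottawa\n";

    /** Hypothetical helper: group 1 of the first match, or null if there is none. */
    public static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String patNo = firstGroup(PATENT_NUMBER, SAMPLE);
        String rawTitle = firstGroup(TITLE, SAMPLE);
        // Extra step (not in the answer): collapse the line break inside the title.
        String title = rawTitle == null ? null : rawTitle.replaceAll("\\s+", " ");
        System.out.println("pat_no = " + patNo);
        System.out.println("title  = " + title);
    }
}
```

This prints pat_no = RE44,697 and title = ENCRYPTIONPROCESSORWITH SHARED MEMORY INTERCONNECT. Note that [\w,]+ stops at the space, so the trailing E of RE44,697 E is not captured by pattern 1 as written.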

Regex: strip all tags except those containing keyword "univ"

[introduction][position]Lead Researcher and Research Manager[/position] in the [affiliation]Web Search and Mining Group, Microsoft Research[/affiliation]</b>.
I am a [position]lead researcher[/position] at [affiliation]Microsoft Research[/affiliation]. I am also [position]adjunct professor[/position] of [affiliation]Peking University[/affiliation], [affiliation]Xian Jiaotong University[/affiliation] and [affiliation]Nankai University[/affiliation].
I joined [affiliation]Microsoft Research[/affiliation] in June 2001. Prior to that, I worked at the Research Laboratories of NEC Corporation.
I obtained a [bsdegree]B.S.[/bsdegree] in [bsmajor]Electrical Engineering[/bsmajor] from [bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate] and a [msdegree]M.S.[/msdegree] in [msmajor]Computer Science[/msmajor] from [msuniv]Kyoto University[/msuniv] in [msdate]1990[/msdate]. I earned my [phddegree]Ph.D.[/phddegree] in [phdmajor]Computer Science[/phdmajor] from the [phduniv]University of Tokyo[/phduniv] in [phddate]1998[/phddate].
I am interested in [interests]statistical learning[/interests], [interests]natural language processing[/interests], [interests]data mining, and information retrieval[/interests].[/introduction]
I'm able to strip all tags from the paragraph above with:
String stripped = html.replaceAll("\\[.*?\\]", "");
But I'd like to keep three pairs of tags in the paragraph: [bsuniv][/bsuniv], [msuniv][/msuniv] and [phduniv][/phduniv]. In other words, I don't want to strip the tags that contain the keyword "univ". I can't find a convenient way to rewrite the regular expression. Can anyone help me?

You can use a negative lookahead assertion here:
str = str.replaceAll("\\[(.(?!univ))*?\\]", "");
or:
str = str.replaceAll("\\[((?!univ).)*?\\]", "");
Both of them will give you the desired output. There is only one difference:
The first one matches a character and then does a negative lookahead after it: if that character is not followed by univ, it moves on to the next character.
The second one does the negative lookahead at the empty position before each character: only if univ does not start there does it go ahead and match that character.
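The sketch below runs both patterns on one sentence from the question and on a made-up tag [univ] to show where they differ: the first pattern only checks for univ after it has consumed a character, so it never looks at the position right after the opening [. TagStripper and strip are illustrative names.

```java
public class TagStripper {
    // Variant 1: consume a character, then check that "univ" does not follow it.
    public static final String CHAR_THEN_CHECK = "\\[(.(?!univ))*?\\]";
    // Variant 2: check that "univ" does not start here, then consume a character.
    public static final String CHECK_THEN_CHAR = "\\[((?!univ).)*?\\]";

    public static String strip(String text, String regex) {
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String sample = "I obtained a [bsdegree]B.S.[/bsdegree] in [bsmajor]Electrical Engineering[/bsmajor]"
                + " from [bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate].";
        // Both variants keep only the *univ tags here.
        System.out.println(strip(sample, CHAR_THEN_CHECK));
        System.out.println(strip(sample, CHECK_THEN_CHAR));

        // They differ only when "univ" starts right after "[":
        System.out.println(strip("[univ]x", CHAR_THEN_CHECK)); // x
        System.out.println(strip("[univ]x", CHECK_THEN_CHAR)); // [univ]x
    }
}
```

For the sample sentence both variants print I obtained a B.S. in Electrical Engineering from [bsuniv]Kyoto University[/bsuniv] in 1988. With the tags in the question, univ never starts right after [, so either pattern works.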

Extracting data from a text file - repeated values

79 0009!017009!0479%0009!0479 0009!0469%0009!0469
0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449
0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419
0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009
0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339
0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032
In this data, I need to extract the numbers 47, 46, 45, 44 and so on, and ignore the rest. The numbers always follow this pattern: 9!0 <number> 9%
for example: 9!0 42 9%
Which language should I use to solve this, and which function might help me?
Is there any function that can locate a special character and copy the next two or three characters?
Ex: 9!0 42 9% and ' 009
That is: look for ! and copy 42 from there, and look for ', which refers to another value (009). It seems like two different regexes would be needed.

You can use whatever language you want, or even a Unix command-line utility like sed, awk, or grep. You want to match 9!0, followed by digits, followed by 9%. Use this regex: 9!0(\d+)9% (or, if the numbers are all two digits, 9!0(\d{2})9%).

The other answers are fine; my regex solution is simply "9!.(\d\d)".
And here's a full solution in PowerShell, which translates easily to other .NET languages:
$t="79 0009!017009!0479%0009!0479 0009!0469%0009!0469 0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449 0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419 0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009 0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339 0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032"
$p="9!.(\d\d)"
$ms=[regex]::match($t,$p)
while ($ms.Success) {write-host $ms.groups[1].value;$ms=$ms.NextMatch()}

This is Perl:
@result = $subject =~ m/(?<=9!0)\d+(?=9%)/g;
It will give you an array of all your numbers. You didn't specify a language, so I don't know whether this is suitable for you. The equivalent in Java:
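import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern regex = Pattern.compile("(?<=9!0)\\d+(?=9%)");
Matcher regexMatcher = regex.matcher(subjectString); // subjectString holds the data from the question
while (regexMatcher.find()) {
    // matched text: regexMatcher.group(), e.g. "47"
    // match start: regexMatcher.start()
    // match end: regexMatcher.end()
    System.out.println(regexMatcher.group());
}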
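For completeness, here is a self-contained version of the Java snippet that collects the numbers into a list. The class name, the extract helper and the SAMPLE constant (the data from the question, with its line breaks kept) are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerNumberExtractor {
    // Digits must be preceded by "9!0" and followed by "9%";
    // the lookarounds keep both markers out of the match itself.
    private static final Pattern NUMBER = Pattern.compile("(?<=9!0)\\d+(?=9%)");

    public static final String SAMPLE =
            "79 0009!017009!0479%0009!0479 0009!0469%0009!0469\n"
            + "0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449\n"
            + "0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419\n"
            + "0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009\n"
            + "0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339\n"
            + "0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032";

    /** Hypothetical helper: every number found between the two markers, in order. */
    public static List<Integer> extract(String subject) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(subject);
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE));
    }
}
```

It prints [47, 46, 45, 45, 44, 43, 42, 42, 41, 38, 37, 34, 33, 33, 32]. Because \d+ is greedy, it first grabs 479 in 9!0479%, then gives back one digit so that the lookahead (?=9%) can succeed, which is why 47 rather than 479 is returned. Occurrences followed by a space, a line break or ' (for example 9!0459'009) are skipped; extracting the '009 values would need a separate pattern.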
