Unable to find pattern in Java

I have been trying to use a Pattern and Matcher to find a specific pattern. I created the regex through this website, and it shows that the pattern is found in the text file I want to read.
Extra info: the code works like this: start reading the text file; when it meets >D10, enter another loop and get the information until the next >D10 is found. Repeat this process until EOF.
My sample text file:
D14*
Y7620D03*
X247390Y66680D03*
X251540Y160150D03*
G01Y136780*
G03X-374970Y133680I3100J0*
D17*
Y7620D03*
X247390Y66680D03*
X251540Y160150D03*
G01Y136780*
G03X-374970Y133680I3100J0*
My pattern code in java:
private final Pattern PinNamePattern = compile("(D[1-9][0-9])\\*");
private final Pattern LocationXYPattern = compile("^(G0[1-3])?(X|Y)(-?[\\d]+)(D0[1-3])?\\*",Pattern.MULTILINE);
private final Pattern LocationXYIJPattern = compile("^(G0[1-3])?X(-?[\\d]+)?Y(-?[\\d]+)?I?(-?[\\d]+)?J?(-?[\\d]+)?(D0[1-3])?\\*",Pattern.MULTILINE);
My code in java:
while ((line = br.readLine()) != null) {
Matcher pinNameMatcher = PinNamePattern.matcher(line);
//If found Aperture Name
if (pinNameMatcher.find()) {
currentApperture = pinNameMatcher.group(1);
System.out.println(currentApperture);
pinNameMatcher.reset();
//Start matching Location X Y I J
//Will keep looping as long as next aperture name not found
//Second While loop
while (!(pinNameMatcher.find()) ) {
line = br.readLine();
Matcher locXYMatcher = LocationXYPattern.matcher(line);
Matcher locXYIJMatcher = LocationXYIJPattern.matcher(line);
LineNumber++;
if (locXYMatcher.find()) {
System.out.println("XY FOUND");
if (locXYIJMatcher.find()) {
System.out.println("XYIJ FOUND");
}
}
However, when I'm using java to read, the pattern just simply cannot be found. Is there anything I missed out or am I doing it wrong? I have tried removing the "^" and MULTILINE flag but the pattern is still not found.

Your regex looks and works fine; it's possible you aren't searching with it properly.
String s = "G03X-374970Y133680I3100J0*";
Pattern pattern = Pattern.compile("^(G0[1-3])?X(-?[\\d]+)?Y(-?[\\d]+)?I?(-?[\\d]+)?J?(-?[\\d]+)?(D0[1-3])?\\*");
Matcher m = pattern.matcher(s);
while (m.find()) {
String match = m.group(0); // a new name: s is already declared above
System.out.println(match); // prints G03X-374970Y133680I3100J0*
}
In your updated code, you are looking for the second and third pattern only when the first pattern matches, which is probably not what you want. Try using this as a foundation and improving upon it:
while ((line = br.readLine()) != null) {
Matcher pinNameMatcher = PinNamePattern.matcher(line);
if (pinNameMatcher.find()) {
currentApperture = pinNameMatcher.group(0);
System.out.println(currentApperture);
}
Matcher locXYMatcher = LocationXYPattern.matcher(line);
if (locXYMatcher.find()) {
System.out.println(locXYMatcher.group(0));
}
Matcher locXYIJMatcher = LocationXYIJPattern.matcher(line);
if (locXYIJMatcher.find()) {
System.out.println(locXYIJMatcher.group(0));
}
}
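If the goal is still to group the coordinate lines under the aperture line that precedes them, one way is to remember the last aperture seen instead of nesting a second read loop. The following is only a sketch under that assumption: the class name ApertureScan, the method scan and the "aperture line" output format are made up for illustration, while the three expressions are the ones from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; only the three regular expressions come from the question.
class ApertureScan {
    private static final Pattern PIN_NAME = Pattern.compile("(D[1-9][0-9])\\*");
    private static final Pattern LOCATION_XY =
            Pattern.compile("^(G0[1-3])?(X|Y)(-?[\\d]+)(D0[1-3])?\\*");
    private static final Pattern LOCATION_XYIJ = Pattern.compile(
            "^(G0[1-3])?X(-?[\\d]+)?Y(-?[\\d]+)?I?(-?[\\d]+)?J?(-?[\\d]+)?(D0[1-3])?\\*");

    // Returns "<aperture> <line>" for each coordinate line that follows an aperture line.
    static List<String> scan(List<String> lines) {
        List<String> found = new ArrayList<>();
        String currentAperture = null; // last aperture name seen so far
        for (String line : lines) {
            Matcher pin = PIN_NAME.matcher(line);
            if (pin.find()) {
                currentAperture = pin.group(1); // a new section starts here
                continue;
            }
            if (currentAperture == null) {
                continue; // coordinate lines before the first aperture are skipped
            }
            if (LOCATION_XY.matcher(line).find() || LOCATION_XYIJ.matcher(line).find()) {
                found.add(currentAperture + " " + line);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "D14*", "Y7620D03*", "G03X-374970Y133680I3100J0*",
                "D17*", "X251540Y160150D03*");
        for (String entry : scan(sample)) {
            System.out.println(entry);
        }
    }
}
```

With a BufferedReader, the same loop body can run once per readLine() call; every line is read exactly once, so no aperture line gets swallowed by an inner loop.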

Related

Finding the host name from a log file which is not separated by symbol using Java Regex

I have a log file and my task is to find the hostname of the log file which status is ERROR. Here are my log file details.
2017-02-09T02:37:44 [ERROR] Consumer iwjef99 could not be contacted
2017-02-09T02:37:46 [INFO] Message received from Producer w89fj93
2017-02-09T02:37:51 [ERROR] Consumer 7sjeuf returned 504
2017-02-09T02:37:53 [INFO] Message received from Producer a99jef9
2017-02-09T02:37:59 [INFO] Message sent to Consumer a99jef9
2017-02-09T02:38:55 [ERROR] Consumer a99jef9 disconnected unexpectedly
For the first log line, the status is ERROR and the hostname is iwjef99.
I have already tried this approach to find the hostname:
List<String> list = new ArrayList<String>();
File file = new File("C:\\Users\\Arif\\Desktop\\test.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
Pattern e = Pattern.compile(".*\\bERROR\\b.*");
Pattern h1 = Pattern.compile("([a-zA-Z]+?[0-9]+|[0-9]+?[a-zA-Z]+)");
String st;
while ((st = br.readLine()) != null) {
Matcher m = e.matcher(st);
if (m.find()) {
Matcher h = h1.matcher(st);
if (h.find()) {
list.add(h.group());
}
}
}
for (int i = 0; i < list.size(); i++) {
System.out.println(list.get(i));
}
It is catching the string after the "-" in the timestamp and showing output like this:
09T
09T
09T
09T
But my desired output should look like this:
iwjef99
7sjeuf
a99jef9
How can I do that?
Pattern e = Pattern.compile(".*\\bERROR\\b.*");
I have modified this regex; it should now be
Pattern e = Pattern.compile("\\[ERROR] [A-Za-z]+ ([\\w]+)");
I have removed your second regex, so your code now looks like this:
List<String> list = new ArrayList<>();
File file = new File("logfile");
BufferedReader br = new BufferedReader(new FileReader(file));
Pattern e = Pattern.compile(".*\\[ERROR\\] [A-Za-z]+ ([A-Za-z0-9]+)");
String st;
while ((st = br.readLine()) != null) {
Matcher m = e.matcher(st);
if (m.find()) {
list.add(m.group(1));
}
}
for (String aList : list) {
System.out.println(aList);
}
String[] tokens = st.split(" ");
if("[ERROR]".equals(tokens[1])){
list.add(tokens[3]);
}
You should split on spaces instead of using a regex; you'll save yourself some trouble.
The issue lies in your regex: it finds 09T because that substring matches it. I've come up with the following regex, which works and is based on this answer:
([0-9]+[a-z]+|[a-z]+[0-9]+)[0-9a-z]*
working example
I've removed the A-Z part from that answer since the string you want to match appears to be lowercase only. The regex will match (from source):
One or more numeric characters, followed by one or more alphabetic characters, followed by 0 or more alphanumeric characters
or
One or more alphabetic characters, followed by one or more numeric characters, followed by 0 or more alphanumeric characters
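To see why 09T comes back, it can help to print the first substring each expression finds on one of the sample lines. This is just an illustrative sketch; FirstMatchDemo and firstMatch are made-up names, and the two expressions are the asker's original and the lowercase one above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for comparing what each regex finds first.
class FirstMatchDemo {
    // Returns the first substring of line matched by regex, or null if there is none.
    static String firstMatch(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "2017-02-09T02:37:44 [ERROR] Consumer iwjef99 could not be contacted";
        // Original expression: digits followed by letters also fits the timestamp.
        System.out.println(firstMatch("([a-zA-Z]+?[0-9]+|[0-9]+?[a-zA-Z]+)", line)); // 09T
        // Lowercase-only expression: the uppercase T no longer qualifies.
        System.out.println(firstMatch("([0-9]+[a-z]+|[a-z]+[0-9]+)[0-9a-z]*", line)); // iwjef99
    }
}
```

Note that the lowercase version skips the timestamp only because the T is uppercase, so it relies on the log format staying exactly as shown.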
You only need to match against one pattern. If the error log row always contains the word Consumer, you can use
Pattern p = Pattern.compile("\\[ERROR\\] Consumer ([\\w]*)");
and the server name can be found in group 1
Matcher m = p.matcher(st);
if (m.find()) {
String server = m.group(1);
}
If the first word after [ERROR] might vary
Pattern p = Pattern.compile("\\[ERROR\\] ([\\w]*) ([\\w]*)");
then the second group contains the server
Matcher m = p.matcher(st);
if (m.find()) {
String server = m.group(2);
}
Instead of using a regex, you can achieve your goal with String.split(), since the values appear in the same position when the line is split on the space character.
if (st.split(" ")[1].equals("[ERROR]")) list.add(st.split(" ")[3]);
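For reference, the single-pattern approach can be checked against the six sample lines without touching the file system. A minimal sketch, assuming the host is always the second word after [ERROR]; ErrorHosts and hosts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; assumes the host name is the second word after [ERROR].
class ErrorHosts {
    private static final Pattern ERROR_HOST = Pattern.compile("\\[ERROR\\] \\w+ (\\w+)");

    // Collects the captured host name from every line that contains an [ERROR] entry.
    static List<String> hosts(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ERROR_HOST.matcher(line);
            if (m.find()) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "2017-02-09T02:37:44 [ERROR] Consumer iwjef99 could not be contacted",
                "2017-02-09T02:37:46 [INFO] Message received from Producer w89fj93",
                "2017-02-09T02:37:51 [ERROR] Consumer 7sjeuf returned 504",
                "2017-02-09T02:37:53 [INFO] Message received from Producer a99jef9",
                "2017-02-09T02:37:59 [INFO] Message sent to Consumer a99jef9",
                "2017-02-09T02:38:55 [ERROR] Consumer a99jef9 disconnected unexpectedly");
        System.out.println(hosts(log)); // [iwjef99, 7sjeuf, a99jef9]
    }
}
```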

Two separate patterns and matchers (java)

I'm working on a simple bot for discord and the first pattern reading works fine and I get the results I'm looking for, but the second one doesn't seem to work and I can't figure out why.
Any help would be appreciated
public void onMessageReceived(MessageReceivedEvent event) {
if (event.getMessage().getContent().startsWith("!")) {
String output, newUrl;
String word, strippedWord;
String url = "http://jisho.org/api/v1/search/words?keyword=";
Pattern reading;
Matcher matcher;
word = event.getMessage().getContent();
strippedWord = word.replace("!", "");
newUrl = url + strippedWord;
//Output contains the raw text from jisho
output = getUrlContents(newUrl);
//Searching through the raw text to pull out the first "reading: "
reading = Pattern.compile("\"reading\":\"(.*?)\"");
matcher = reading.matcher(output);
//Searching through the raw text to pull out the first "english_definitions: "
Pattern def = Pattern.compile("\"english_definitions\":[\"(.*?)]");
Matcher matcher2 = def.matcher(output);
event.getTextChannel().sendMessage(matcher2.toString());
if (matcher.find() && matcher2.find()) {
event.getTextChannel().sendMessage("Reading: "+matcher.group(1)).queue();
event.getTextChannel().sendMessage("Definition: "+matcher2.group(1)).queue();
}
else {
event.getTextChannel().sendMessage("Word not found").queue();
}
}
}
You have to escape the [ character as \\[ (one backslash for the Java string literal and one for the regex). You also forgot the closing \".
The correct pattern looks like this:
Pattern def = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");
In the output, you might want to re-add the \" at the start and end.
event.getTextChannel().sendMessage("Definition: \""+matcher2.group(1) + "\"").queue();
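The corrected pattern can be tried outside the bot on a small string. The JSON fragment below is invented for illustration and is not real jisho.org output; JishoFields and firstGroup are hypothetical names as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; the sample JSON in main is made up, not an actual API response.
class JishoFields {
    static final Pattern READING = Pattern.compile("\"reading\":\"(.*?)\"");
    // The [ is escaped so it is a literal bracket rather than the start of a character class.
    static final Pattern DEFINITION =
            Pattern.compile("\"english_definitions\":\\[\"(.*?)\"\\]");

    // Returns group 1 of the first match, or null when the pattern is not found.
    static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String json = "{\"japanese\":[{\"reading\":\"neko\"}],"
                + "\"senses\":[{\"english_definitions\":[\"cat\"]}]}";
        System.out.println("Reading: " + firstGroup(READING, json));
        System.out.println("Definition: " + firstGroup(DEFINITION, json));
    }
}
```

Returning null instead of calling group(1) unconditionally keeps the "Word not found" branch reachable when either field is missing.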

Removing link from Text in Java?

I need to change something like this -> Hello, go here http://www.google.com for your ...
grab the link, change it in a method I made, and put it back into the string like this
-> Hello, go here http://www.yahoo.com for your...
Here is what i have so far:
if(Text.toLowerCase().contains("http://"))
{
// Do stuff
}
else if(Text.toLowerCase().contains("https://"))
{
// Do stuff
}
All I need to do is change the URL in the String to something different. The URL in the String will not always be http://www.google.com, so I cannot just say replace("http://www.google.com","")
Use regex:
String oldUrl = text.replaceAll(".*(https?://)www((\\.\\w+)+).*", "www$2");
text = text.replaceAll("(https?://)www(\\.\\w+)+", "$1" + traslateUrl(oldUrl));
Note: code changed to meet extra requirements in comments below.
You can grab the link from the string using the code below. I assumed the string will contain only .com domains.
String input = "Hello, go here http://www.google.com";
Pattern pattern = Pattern.compile("https?://www\\.[a-z-]*\\.com"); // dots escaped so they match a literal "."
Matcher m = pattern.matcher(input);
while (m.find()) {
String str = m.group();
}
Have you tried something like:
s = s.replaceFirst("http\\S+", newLink);
This will find the first word beginning with http, up to the first whitespace, and replace it with whatever you want (newLink here stands for your replacement string).
If you want to keep the old link, you can do:
String oldURL = null;
if (s.contains("http")) {
String[] words = s.split(" ");
for (String word: words) {
if (word.contains("http")) {
oldURL = word;
break;
}
}
//then replace the url or whatever
}
You can try this
private String removeUrl(String commentstr)
{
String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:##%/;$()~_?\\+-=\\\\\\.&]*)";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
// Strip every URL match in one pass. Calling m.group(i) with a growing i
// would read capture groups of one match, not successive matches.
return m.replaceAll("").trim();
}
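Since the question asks to pass each link through the asker's own method and put the result back, a find loop with Matcher.appendReplacement may be closer to what is needed than removing the URL. A sketch under assumptions: translateUrl below is a stand-in for that method (its real behaviour is unknown), and the URL pattern simply takes everything up to the next whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; translateUrl is a placeholder for the asker's own method.
class UrlRewriter {
    // Assumption: a URL runs from http:// or https:// up to the next whitespace.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    static String translateUrl(String url) {
        return url.replace("google", "yahoo"); // placeholder logic only
    }

    // Replaces every URL in text with the result of translateUrl.
    static String rewrite(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps any $ or \ in the new URL from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(translateUrl(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Hello, go here http://www.google.com for your ..."));
    }
}
```

The \S+ part will also swallow punctuation that directly follows a link, such as a trailing comma, so a stricter pattern may be needed for real text.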

Regex to parse phone numbers in text document with java

I'm trying to use regex to find phone numbers in the form (xxx) xxx-xxxx that are all inside a text document with messy html.
The text file has lines like:
<div style="font-weight:bold;">
<div>
<strong>Main Phone:
<span style="font-weight:normal;">(713) 555-9539
<strong>Main Fax:
<span style="font-weight:normal;">(713) 555-9541
<strong>Toll Free:
<span style="font-weight:normal;">(888) 555-9539
and my code contains:
Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");
Matcher m = p.matcher(line); //from buffered reader, reading 1 line at a time
if (m.matches()) {
stringArray.add(line);
}
The problem is when I put even simple things into the pattern to compile, it still returns nothing. And if it doesn't even recognize something like \d, how am I going to get a telephone number? For example:
Pattern p = Pattern.compile("\\d+"); //Returns nothing
Pattern p = Pattern.compile("\\d"); //Returns nothing
Pattern p = Pattern.compile("\\s+"); //Returns lines
Pattern p = Pattern.compile("\\D"); //Returns lines
This is really confusing to me, and any help would be appreciated.
Use Matcher#find() instead of matches() which would try to match the complete line as a phone number. find() would search and return true for sub-string matches as well.
Matcher m = p.matcher(line);
Also, the line above suggests you're creating the same Pattern and Matcher again in your loop. That's not efficient. Move the Pattern outside your loop and reset and reuse the same Matcher over different lines.
Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");
Matcher m = null;
String line = reader.readLine();
if (line != null && (m = p.matcher(line)).find()) {
stringArray.add(line);
}
while ((line = reader.readLine()) != null) {
m.reset(line);
if (m.find()) {
stringArray.add(line);
}
}
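The difference between the two calls is easy to check on one of the lines from the question. A small sketch; FindVsMatches and extract are hypothetical names, and it collects only the number itself (m.group()) rather than the whole line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class contrasting matches() (whole input) with find() (any substring).
class FindVsMatches {
    static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // Returns every phone number found anywhere in the given lines.
    static List<String> extract(List<String> lines) {
        List<String> numbers = new ArrayList<>();
        Matcher m = PHONE.matcher("");
        for (String line : lines) {
            m.reset(line); // reuse one Matcher across lines
            while (m.find()) {
                numbers.add(m.group());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        String line = "<span style=\"font-weight:normal;\">(713) 555-9539";
        System.out.println(PHONE.matcher(line).matches()); // false: the line is more than a number
        System.out.println(PHONE.matcher(line).find());    // true: a number occurs inside the line
        System.out.println(extract(Arrays.asList("<div>", line)));
    }
}
```

Creating the Matcher on an empty string first avoids the null check on the first line.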
Or, instead of a regex, you can use Google's libphonenumber library, as follows:
Set<String> phones = new HashSet<>();
PhoneNumberUtil util = PhoneNumberUtil.getInstance();
// Numbers written without a leading + need a default region to be parsed; "US" is an assumption for this sample.
Iterator<PhoneNumberMatch> iterator = util.findNumbers(source, "US").iterator();
while (iterator.hasNext()) {
phones.add(iterator.next().rawString());
}

Why does this regex fail?

I have a PCL file and open it with Notepad ++ to view the source code (with PCL Viewer I see the final results but I need to view the source also).
Please see Lab Number and the rest of the characters. I am able to extract Lab Number and its code with this regex:
private static String PATTERN_LABNUMBER = "Lab Number[\\W\\D]*(\\d*)";
and it gives me:
0092616281
I now want to extract Date Reported and I use this regex (after a lot of other tries):
private static String PATTERN_DATE_REPORTED =
"Date Reported[\\W\\D]*(\\d\\d/\\d\\d/\\d\\d\\d\\d \\d\\d:\\d\\d)";
but it does NOT find it in the PCL file.
I've also tried with:
private static String PATTERN_DATE_REPORTED =
"Date Reported[\\W\\D]*([0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2})";
but the same not found result...
Do you see where I am missing something in this last regex?
Thanks a lot!
UPDATE:
I use this java code to extract Lab number and Date Reported:
public String extractWithRegEx(String regextype, String input) {
String matchedString = null;
if (regextype != null && input != null) {
Matcher matcher = Pattern.compile(regextype).matcher(input);
if (matcher.find()) {
System.out.println("Matcher found for regextype "+regextype);
matchedString = matcher.group(0);
if (matcher.groupCount() > 0) {
matchedString = matcher.group(1);
}
}
}
return matchedString;
}
Here is code that does what you want:
Pattern pattern = Pattern.compile("Date Reported.*(\\d{2}/\\d{4} \\d{2}:\\d{2})$", Pattern.MULTILINE);
String st = "date dfdsfsd fgfd gdfgfdgdf gdfgdfg gdfgdf 3232/22/2010 23:34\n"+
"dsadsadasDate Reported gdfgfd gdfgfdgdf gdfgdfg gdfgdf 3232/22/2010 23:34";
Matcher matcher = pattern.matcher(st);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
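One possible reason for the original expression failing: [\\W\\D] accepts any character that is a non-word character or a non-digit, which works out to "anything except a digit", so [\\W\\D]* stops at the first digit after the label. If the PCL source has any digit between Date Reported and the date (for example inside an escape sequence), the date part can never line up. The input below is invented to show that effect and is not taken from the asker's file; DateReportedDemo and extract are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; the sample input is made up and only imitates a PCL escape sequence.
class DateReportedDemo {
    // Returns group 1 of the first match, or null. DOTALL lets . cross line breaks.
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed layout: label, an escape sequence containing digits, then the date.
        String input = "Date Reported\u001b*p1200X 10/05/2012 14:30";
        String original = "Date Reported[\\W\\D]*(\\d\\d/\\d\\d/\\d\\d\\d\\d \\d\\d:\\d\\d)";
        String reluctant = "Date Reported.*?(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2})";
        System.out.println(extract(original, input));  // no match: stuck at the digits in 1200
        System.out.println(extract(reluctant, input)); // skips ahead to the first full date
    }
}
```

The reluctant .*? takes as little as possible, so it yields the first complete date after the label and captures all of it, unlike a greedy .* in front of a shorter group.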
