Java regex - get line number from matching text

This follows on from my previous question.
In my case I want to get the line numbers of the lines that match a regex pattern. For example:
name : andy
birth : jakarta, 1 jan 1990
number id : 01011990 01
age : 26
study : Informatics engineering
I want to get the line numbers of the lines that contain a number ([0-9]+). I would like output like this:
line 2
line 3
line 4

This will do it for you. I modified the regular expression to ".*[0-9].*", because Pattern.matches only succeeds when the whole line matches the pattern.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;
import java.util.regex.Pattern;
import java.util.concurrent.atomic.AtomicInteger;
class RegExLine
{
public static void main(String[] args)
{
new RegExLine().run();
}
public void run()
{
String fileName = "C:\\Path\\to\\input\\file.txt";
AtomicInteger atomicInteger = new AtomicInteger(0);
try (Stream<String> stream = Files.lines(Paths.get(fileName)))
{
stream.forEach(s ->
{
atomicInteger.getAndIncrement();
if(Pattern.matches(".*[0-9].*", s))
{
System.out.println("line "+ atomicInteger);
}
});
}
catch (IOException e)
{
e.printStackTrace();
}
}
}

Use a Scanner to iterate over all lines of your input, and a Matcher object to check each line against the regex pattern.
String s = "name : andy\n" +
"birth : jakarta, 1 jan 1990\n" +
"number id : 01011990 01\n" +
"age : 26\n" +
"study : Informatics engineering";
Scanner sc = new Scanner(s);
int lineNr = 1;
while (sc.hasNextLine()) {
String line = sc.nextLine();
Matcher m = Pattern.compile(".*[0-9].*").matcher(line);
if(m.matches()){
System.out.println("line " + lineNr);
}
lineNr++;
}

You could simply have the following:
public static void main(String[] args) throws IOException {
int i = 1;
Pattern pattern = Pattern.compile(".*[0-9]+.*");
try (BufferedReader br = new BufferedReader(new FileReader("..."))) {
String line;
while ((line = br.readLine()) != null) {
if (pattern.matcher(line).matches()) {
System.out.println("line " + i);
}
i++;
}
}
}
This code simply opens a BufferedReader to a given file path and iterates over each line in it (until readLine() returns null, indicating the end of the file). If the line matches the pattern ".*[0-9]+.*", meaning the line contains at least a digit, the line number is printed.
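As a variation on the loops above, here is a minimal sketch that keeps the original pattern [0-9]+ unchanged by calling Matcher.find(), which looks for a match anywhere in the line, instead of matches(). The class and method names are made up for illustration, and the text is assumed to be in memory already.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class MatchingLineNumbers {

    // Returns the 1-based numbers of all lines in which the pattern occurs somewhere.
    static List<Integer> find(String text, Pattern pattern) {
        List<Integer> result = new ArrayList<>();
        String[] lines = text.split("\\R"); // \R matches any line break sequence (Java 8+)
        for (int i = 0; i < lines.length; i++) {
            // find() succeeds if any part of the line matches, so no ".*" wrapper is needed
            if (pattern.matcher(lines[i]).find()) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "name : andy\n"
                + "birth : jakarta, 1 jan 1990\n"
                + "number id : 01011990 01\n"
                + "age : 26\n"
                + "study : Informatics engineering";
        for (int n : find(text, Pattern.compile("[0-9]+"))) {
            System.out.println("line " + n);
        }
    }
}
```

For the sample text this prints line 2, line 3 and line 4.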

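Use a Matcher object to check for the regex pattern. Since . does not match line breaks, each match stays within one line, so the line number is one more than the number of line breaks before the match.
public static void main( String[] args )
{
String s = "name : andy\n" + "birth : jakarta, 1 jan 1990\n" + "number id : 01011990 01\n" + "age : 26\n"
+ "study : Informatics engineering";
try
{
Pattern pattern = Pattern.compile( ".*[0-9].*" );
Matcher matcher = pattern.matcher( s );
int line = 1;
int pos = 0;
while ( matcher.find() )
{
// advance the line counter past every line break before this match
for ( ; pos < matcher.start(); pos++ )
{
if ( s.charAt( pos ) == '\n' )
{
line++;
}
}
System.out.println( "line " + line );
}
}
catch ( Exception e )
{
e.printStackTrace();
}
}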

Related

How can I scope three different conditions using the same loop in Java?

I would like to count countSign, countX and countI using the same loop instead of creating three different loops. Is there an easy way to approach that?
public class Absence {
private static File file = new File("/Users/naplo.txt");
private static File file_out = new File("/Users/naplo_out.txt");
private static BufferedReader br = null;
private static BufferedWriter bw = null;
public static void main(String[] args) throws IOException {
int countSign = 0;
int countX = 0;
int countI = 0;
String sign = "#";
String absenceX = "X";
String absenceI = "I";
try {
br = new BufferedReader(new FileReader(file));
bw = new BufferedWriter(new FileWriter(file_out));
String st;
while ((st = br.readLine()) != null) {
for (String element : st.split(" ")) {
if (element.matches(sign)) {
countSign++;
continue;
}
if (element.matches(absenceX)) {
countX++;
continue;
}
if (element.matches(absenceI)) {
countI++;
}
}
}
System.out.println("2. exerc.: There are " + countSign + " rows in the file with that sign.");
System.out.println("3. exerc.: There are " + countX + " with sick note, and " + countI + " without sick note!");
} catch (FileNotFoundException ex) {
Logger.getLogger(Absence.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
text file example:
# 03 26
Jujuba Ibolya IXXXXXX
Maracuja Kolos XXXXXXX

I think you meant using fewer than 3 if statements. You can actually do it with no ifs.
In your for loop, write this:
countSign += (element.matches(sign)) ? 1 : 0;
countX += (element.matches(absenceX)) ? 1 : 0;
countI += (element.matches(absenceI)) ? 1 : 0;

Both answers check if the word (element) matches all regular expressions while this can (and should, if you ask me) be avoided since a word can match only one regex. I am referring to the continue part your original code has, which is good since you do not have to do any further checks.
So here is one way to do it with Java 8 Streams as a "one-liner".
But first, let's assume the following regular expressions:
String absenceX = "X*";
String absenceI = "I.*";
and one more (for the sake of the example):
String onlyNumbers = "[0-9]*";
In order to have some matches on them.
The text is as you gave it.
public class Test {
public static void main(String[] args) throws IOException {
File desktop = new File(System.getProperty("user.home"), "Desktop");
File txtFile = new File(desktop, "test.txt");
String sign = "#";
String absenceX = "X*";
String absenceI = "I.*";
String onlyNumbers = "[0-9]*";
List<String> regexes = Arrays.asList(sign, absenceX, absenceI, onlyNumbers);
List<String> lines = Files.readAllLines(txtFile.toPath());
//@formatter:off
Map<String, Long> result = lines.stream()
.flatMap(line-> Stream.of(line.split(" "))) //map these lines to words
.map(word -> regexes.stream().filter(word::matches).findFirst()) //find the first regex this word matches
.filter(Optional::isPresent) //If it matches no regex, it will be ignored
.collect(Collectors.groupingBy(Optional::get, Collectors.counting())); //collect
System.out.println(result);
}
}
The result:
{X*=1, #=1, I.*=2, [0-9]*=2}
X*=1 came from word: XXXXXXX
#=1 came from word: #
I.*=2 came from words: IXXXXXX and Ibolya
[0-9]*=2 came from words: 03 and 26
Ignore the fact I load all lines in memory.

So I got it to work with the following lines. It had escaped my attention that every character needs to be separated from the others. Your ternary operator suggestion is also nice, so I will use it.
String myString;
while ((myString = br.readLine()) != null) {
// insert a space between all characters so that split(" ") separates the individual characters
String newString = myString.replaceAll("", " ").trim();
for (String element : newString.split(" ")) {
countSign += (element.matches(sign)) ? 1 : 0;
countX += (element.matches(absenceX)) ? 1 : 0;
countI += (element.matches(absenceI)) ? 1 : 0;
}
}
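Because each symbol here is a single character, the counting can also be done without regular expressions or splitting. The following is only a sketch under that assumption; the class and method names are hypothetical, and the three sample lines are taken from the text file example in the question. Like the loop above, it counts every occurrence, including the I in "Ibolya".

```java
class AbsenceCounter {

    // Counts how often the given character occurs in the line.
    static int count(String line, char symbol) {
        int n = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == symbol) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] lines = {"# 03 26", "Jujuba Ibolya IXXXXXX", "Maracuja Kolos XXXXXXX"};
        int countSign = 0;
        int countX = 0;
        int countI = 0;
        // one pass over the lines updates all three counters
        for (String line : lines) {
            countSign += count(line, '#');
            countX += count(line, 'X');
            countI += count(line, 'I');
        }
        System.out.println(countSign + " " + countX + " " + countI);
    }
}
```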

How to delete extra line breaks in a string

I have text like this in my String s (which I have already read from a .txt file):
trump;Donald Trump;trump#yahoo.eu
obama;Barack Obama;obama#google.com
bush;George Bush;bush#inbox.com
clinton,Bill Clinton;clinton#mail.com
Then I'm trying to cut off everything except the e-mail address and print it to the console:
String f1[] = null;
f1=s.split("(.*?);");
for (int i=0;i<f1.length;i++) {
System.out.print(f1[i]);
}
and I have output like this:
trump#yahoo.eu
obama#google.com
bush#inbox.com
clinton#mail.com
How can I avoid such output? I mean, how can I get the output text without line breaks?

Try using below approach. I have read your file with Scanner as well as BufferedReader and in both cases, I don't get any line break. file.txt is the file that contains text and the logic of splitting remains the same as you did
public class CC {
public static void main(String[] args) throws IOException {
Scanner scan = new Scanner(new File("file.txt"));
while (scan.hasNext()) {
String f1[] = null;
f1 = scan.nextLine().split("(.*?);");
for (int i = 0; i < f1.length; i++) {
System.out.print(f1[i]);
}
}
scan.close();
BufferedReader br = new BufferedReader(new FileReader(new File("file.txt")));
String str = null;
while ((str = br.readLine()) != null) {
String f1[] = null;
f1 = str.split("(.*?);");
for (int i = 0; i < f1.length; i++) {
System.out.print(f1[i]);
}
}
br.close();
}
}

You may just remove all line breaks, as shown in the code below:
String f1[] = null;
f1=s.split("(.*?);");
for (int i=0;i<f1.length;i++) {
System.out.print(f1[i].replaceAll("\r", "").replaceAll("\n", ""));
}
This replaces each of them with an empty string.
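The line breaks survive the split because . does not match a line break by default, so the delimiter (.*?); never consumes the break at the end of the previous line. As an alternative sketch, assuming the address is always the last ;-separated field of a line, the text can be cut into lines first and the last field taken from each. The class and method names are made up for illustration.

```java
class LastField {

    // Joins the last ';'-separated field of every non-blank line, without line breaks.
    static String joinLastFields(String text) {
        StringBuilder sb = new StringBuilder();
        for (String line : text.split("\\R")) { // \R matches \n, \r\n and other line breaks (Java 8+)
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // lastIndexOf returns -1 when there is no ';', so the whole line is kept in that case
            sb.append(trimmed.substring(trimmed.lastIndexOf(';') + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "trump;Donald Trump;trump#yahoo.eu\n"
                + "obama;Barack Obama;obama#google.com\n"
                + "bush;George Bush;bush#inbox.com\n"
                + "clinton,Bill Clinton;clinton#mail.com";
        System.out.println(joinLastFields(s));
    }
}
```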

Instead of using split, you can match an email-like format directly: one or more characters that are neither a semicolon nor whitespace, using the negated character class [^\\s;]+, followed by a # and again one or more characters that are neither a semicolon nor whitespace.
final String regex = "[^\\s;]+#[^\\s;]+";
final String string = "trump;Donald Trump;trump#yahoo.eu \n"
+ " obama;Barack Obama;obama#google.com \n"
+ " bush;George Bush;bush#inbox.com \n"
+ " clinton,Bill Clinton;clinton#mail.com";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final List<String> matches = new ArrayList<String>();
while (matcher.find()) {
matches.add(matcher.group());
}
System.out.println(String.join("", matches));

package com.test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String s = "trump;Donald Trump;trump#yahoo.eu "
+ "obama;Barack Obama;obama#google.com "
+ "bush;George Bush;bush#inbox.com "
+ "clinton;Bill Clinton;clinton#mail.com";
String spaceStrings[] = s.split("[\\s,;]+");
String output="";
for(String word:spaceStrings){
if(validate(word)){
output+=word;
}
}
System.out.println(output);
}
public static final Pattern VALID_EMAIL_ADDRESS_REGEX = Pattern.compile(
"^[A-Z0-9._%+-]+#[A-Z0-9.-]+\\.[A-Z]{2,6}$",
Pattern.CASE_INSENSITIVE);
public static boolean validate(String emailStr) {
Matcher matcher = VALID_EMAIL_ADDRESS_REGEX.matcher(emailStr);
return matcher.find();
}
}

Just remove the '\n' characters that may appear at the start and end.
Write it this way:
String f1[] = null;
f1=s.split("(.*?);");
for (int i=0;i<f1.length;i++) {
f1[i] = f1[i].replace("\n", "");
System.out.print(f1[i]);
}

Regex pattern in Java matching single letter instead of complete word.

I am new to Java and have been trying to write some code with the following requirement: regex patterns are saved in a file, the content of the file is read into an ArrayList, and the patterns are then compared against a String variable to find a match. But when I try this, it matches a single letter instead of the whole word. Below is the code.
import java.io.*;
import java.util.Scanner;
import java.util.ArrayList;
import java.util.regex.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public void findfile( String path ){
File f = new File(path);
if(f.exists() && !f.isDirectory()) {
System.out.println("file found.....!!!!");
if(f.length() == 0 ){
System.out.println("file is empty......!!!!");
}}
else {
System.out.println("file missing");
}
}
public void readfilecontent(String path, String sql){
try{Scanner s = new Scanner(new File(path));
ArrayList<String> list = new ArrayList<String>();
while (s.hasNextLine()){
list.add(s.nextLine());
}
s.close();
System.out.println(list);
Pattern p = Pattern.compile(list.toString(),Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(sql);
if (m.find()){
System.out.println("match found");
System.out.println(m.group());
}
else {System.out.println("match not found"); }
}
catch (FileNotFoundException ex){}
}
public static void main( String args[] ) {
String path = "/code/sql.pattern";
String sql = "select * from schema.test";
RegexMatches regex = new RegexMatches();
regex.findfile(path);
regex.readfilecontent(path,sql);
}
}
The sql.pattern file contains:
\\buser\\b
\\border\\b
I am expecting that it shouldn't match anything and should print "match not found". Instead it prints "match found", and m.group() prints the letter s as output. Could anyone please help?
Thanks in advance.

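The problem here seems to be the double backslash.
I would not recommend passing list.toString() to Pattern.compile, because it also inserts the '[', ',' and ']' characters, which can mess up your regex: the square brackets turn the whole string into a character class, which matches just a single character. Instead, compile each line of the file as its own pattern, as in the code below: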
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public void findfile(String path) {
File f = new File(path);
if (f.exists() && !f.isDirectory()) {
System.out.println("file found.....!!!!");
if (f.length() == 0) {
System.out.println("file is empty......!!!!");
}
} else {
System.out.println("file missing");
}
}
public void readfilecontent(String path, String sql) {
try {
Scanner s = new Scanner(new File(path));
ArrayList<String> list = new ArrayList<String>();
while (s.hasNextLine()) {
list.add(s.nextLine());
}
s.close();
System.out.println(list);
list.stream().forEach(regex -> {
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(sql);
if (m.find()) {
System.out.println("match found for regex " + regex );
System.out.println("matched substring: "+ m.group());
} else {
System.out.println("match not found for regex " + regex);
}
});
} catch (FileNotFoundException ex) {
ex.printStackTrace();
}
}
public static void main(String args[]) {
String path = "/code/sql.pattern";
String sql = "select * from schema.test";
RegexMatches regex = new RegexMatches();
regex.findfile(path);
regex.readfilecontent(path, sql);
}
}
while keeping /code/sql.pattern as below:
\buser\b
\border\b
\bfrom\b
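To see why a single letter was matched, here is a small sketch of the character-class effect. It uses the simplified patterns user and order, without word boundaries, purely as an assumption for illustration; the class and method names are hypothetical. If one combined pattern is wanted instead of a loop, joining the entries with | is one option.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassPitfall {

    // Returns the first substring of input that matches regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("user", "order"); // simplified stand-ins for the file's lines
        String sql = "select * from schema.test";

        // list.toString() is "[user, order]": a character class that matches ONE of
        // the characters u, s, e, r, comma, space, o, d
        System.out.println(firstMatch(list.toString(), sql));      // s

        // joining with | builds an alternation of whole patterns instead
        System.out.println(firstMatch(String.join("|", list), sql)); // null
    }
}
```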

Java reading from text file and remove dash in string

I have a text file:
John Smith 2009-11-04
Jenny Doe 2009-12-29
Alice Jones 2009-01-03
Bob Candice 2009-01-04
Carol Heart 2009-01-07
Carlos Diaz 2009-01-10
Charlie Brown 2009-01-14
I'm trying to remove the dashes and store the values as separate fields (first, last, year, month, day) and then add them to a SortedSet/HashMap. But for some reason it's not working right.
Here is my code:
public class Test {
File file;
private Scanner sc;
//HashMap<Name, Date> hashmap = new HashMap<>();
/**
* @param filename
*/
public Test(String filename) {
file = new File(filename);
}
public void openFile(String filename) {
// open the file for scanning
System.out.println("Test file " + filename + "\n");
try {
sc = new Scanner(new File("birthdays.dat"));
}
catch(Exception e) {
System.out.println("Birthdays: Unable to open data file");
}
}
public void readFile() {
System.out.println("Name Birthday");
System.out.println("---- --------");
System.out.println("---- --------");
while (sc.hasNext()) {
String line = sc.nextLine();
String[] split = line.split("[ ]?-[ ]?");
String first = split[0];
String last = split[1];
//int year = Integer.parseInt(split[2]);
//int month = Integer.parseInt(split[3]);
//int day = Integer.parseInt(split[4]);
Resource name = new Name(first, last);
System.out.println(first + " " + last + " " + split[2] );
//hashmap.add(name);
}
}
public void closeFile() {
sc.close();
}
public static void main(String[] args) throws FileNotFoundException,
ArrayIndexOutOfBoundsException {
try {
Scanner sc = new Scanner( new File(args[0]) );
for( int i = 0; i < args.length; i++ ) {
//System.out.println( args[i] );
if( args.length == 0 ) {
}
else if( args.length >= 1 ) {
}
// System.out.printf( "Name %-20s Birthday", name.toString(), date.toString() );
}
} catch (ArrayIndexOutOfBoundsException e) {
System.err.println("Usage: Birthdays dataFile");
// Terminate the program here somehow, or see below.
System.exit(-1);
} catch (FileNotFoundException e) {
System.err.println("Birthdays: Unable to open data file");
// Terminate the program here somehow, or see below.
System.exit(-1);
}
Test r = new Test(args[0]);
r.openFile(args[0]);
r.readFile();
r.closeFile();
}
}

You're splitting on dashes, but your program is built around a split on spaces.
Try just splitting on spaces:
String[] split = line.split("\\s");
So "John Smith 2009-11-04".split("[ ]?-[ ]?") results in ["John Smith 2009", "11", "04"], when what you want is a split on spaces: ["John", "Smith", "2009-11-04"].
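A minimal sketch of the two-step split, first on whitespace and then on the dashes of the date field, might look like this. The class and method names are hypothetical, and every line is assumed to have exactly the form "first last yyyy-MM-dd".

```java
import java.util.Arrays;

class SplitComparison {

    // Splits "first last yyyy-MM-dd" into {first, last, year, month, day}.
    static String[] splitFields(String line) {
        String[] parts = line.trim().split("\\s+"); // first, last, date
        String[] date = parts[2].split("-");        // year, month, day
        return new String[] {parts[0], parts[1], date[0], date[1], date[2]};
    }

    public static void main(String[] args) {
        String line = "John Smith 2009-11-04";
        // splitting on the dash only leaves the names glued to the year
        System.out.println(Arrays.toString(line.split("[ ]?-[ ]?")));
        // splitting on whitespace first gives the three columns
        System.out.println(Arrays.toString(line.split("\\s+")));
        System.out.println(Arrays.toString(splitFields(line)));
    }
}
```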

I would do this differently, first create a domain object:
public class Person {
private String firstName;
private String lastName;
private LocalDate date;
//getters & setters
//equals & hashCode
//toString
}
Now create a method that parses a single String of the format you have to a Person:
//instance variable
private final DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
public Person parsePerson(final String input) {
final String[] data = input.split("\\s+");
final Person person = new Person();
person.setFirstName(data[0]);
person.setLastName(data[1]);
person.setDate(LocalDate.parse(data[2], dateTimeFormatter));
return person;
}
Note that the DateTimeFormatter is an instance variable; this is for speed. You should also set the Locale on the formatter (withLocale) if you need to parse locale-dependent dates, such as month names, that are not in your default locale.
Now, you can read your file into a List<Person> very easily:
public List<Person> readFromFile(final Path path) throws IOException {
try (final Stream<String> lines = Files.lines(path)) {
return lines
.map(this::parsePerson)
.collect(toList());
}
}
And now that you have a List<Person>, you can sort or process them however you want.
You can even do this while creating the List:
public List<Person> readFromFile(final Path path) throws IOException {
try (final Stream<String> lines = Files.lines(path)) {
return lines
.map(this::parsePerson)
.sorted(comparing(Person::getLastName).thenComparing(Person::getFirstName))
.collect(toList());
}
}
Or have your Person implements Comparable<Person> and simply use natural order.
TL;DR: Use Objects for your objects and life becomes much simpler.
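Put together as one self-contained program, the idea could be sketched as follows. This is not the answer's exact code: the nested Person class is reduced to three fields, the lines are passed in as a list instead of being read from a file, and sorting by birthday is an assumption about what the question needs. LocalDate.parse without a formatter is used because the dates are already in ISO yyyy-MM-dd form.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class BirthdayList {

    // Simplified domain object for this sketch.
    static final class Person {
        final String firstName;
        final String lastName;
        final LocalDate birthday;

        Person(String firstName, String lastName, LocalDate birthday) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.birthday = birthday;
        }

        @Override
        public String toString() {
            return firstName + " " + lastName + " " + birthday;
        }
    }

    // Parses one line of the form "first last yyyy-MM-dd".
    static Person parsePerson(String line) {
        String[] data = line.trim().split("\\s+");
        return new Person(data[0], data[1], LocalDate.parse(data[2]));
    }

    // Parses all non-blank lines and sorts the result by birthday, earliest first.
    static List<Person> parseSorted(List<String> lines) {
        List<Person> people = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                people.add(parsePerson(line));
            }
        }
        people.sort(Comparator.comparing((Person p) -> p.birthday));
        return people;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "John Smith 2009-11-04",
                "Jenny Doe 2009-12-29",
                "Alice Jones 2009-01-03");
        for (Person p : parseSorted(lines)) {
            System.out.println(p);
        }
    }
}
```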

I would use a regex:
private static Pattern LINE_PATTERN
= Pattern.compile("(.+) (.+) ([0-9]{4})-([0-9]{2})-([0-9]{2})");
...
while (sc.hasNext()) {
String line = sc.nextLine();
Matcher matcher = LINE_PATTERN.matcher(line);
if (!matcher.matches()) {
// malformed line
} else {
String first = matcher.group(1);
String last = matcher.group(2);
int year = Integer.parseInt(matcher.group(3));
int month = Integer.parseInt(matcher.group(4));
int day = Integer.parseInt(matcher.group(5));
// do something with it
}
}

You are splitting on a hyphen with optional spaces around it, but the name and the date are separated by spaces only. Split on spaces first:
String[] split = line.split(" ");
String first = split[0];
String last = split[1];
line = split[2];
//now split the date
String[] splitz = line.split("-");
or something like this might work:
String delims = "[ -]+";
String[] tokens = line.split(delims);

If I understood your question correctly, here is an answer:
List<String> listGet = new ArrayList<String>();
String getVal = "John Smith 2009-11-04";
String[] splited = getVal.split("[\\-:\\s]");
for(int j=0;j<splited.length;j++)
{
listGet.add(splited[j]);
}
System.out.println("first name :"+listGet.get(0));
System.out.println("Last name :"+listGet.get(1));
System.out.println("year is :"+listGet.get(2));
System.out.println("month is :"+listGet.get(3));
System.out.println("day is :"+listGet.get(4));
Output:
first name :John
Last name :Smith
year is :2009
month is :11
day is :04

Regex in Java with matches stored into an ArrayList

I have the following code, written to store and display all words that begin with the letter a and end with z. First, I am getting an error from my regex pattern, and second, the content (String) stored in the ArrayList is not displayed.
import java.io.*;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.*;
public class RegexSimple2{
public static void main(String[] args) {
try{
Scanner myfis = new Scanner("D:\\myfis2.txt");
ArrayList <String> foundaz = new ArrayList<String>();
while(myfis.hasNext()){
String line = myfis.nextLine();
String delim = " ";
String [] words = line.split(delim);
for ( String s: words){
if(!s.isEmpty()&& s!=null){
Pattern pi = Pattern.compile("[a|A][a-z]*[z]");
Matcher ma = pi.matcher(s);
boolean search = false;
while (ma.find()){
search = true;
foundaz.add(s);
}
if(!search){
System.out.println("Words that start with a and end with z have not been found");
}
}
}
}
if(!foundaz.isEmpty()){
for(String s: foundaz){
System.out.println("The word that start with a and ends with z is:" + s + " ");
}
}
}
catch(Exception ex){
System.out.println(ex);
}
}
}

You need to change how you are reading the file in: new Scanner("D:\\myfis2.txt") scans the path string itself rather than the file's contents. In addition, change the regex to [aA].*z. The .* matches zero or more of any character. See the minor changes I made below:
import java.io.*;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.*;
public class Test {
public static void main(String[] args) {
try {
BufferedReader myfis = new BufferedReader(new FileReader("D:\\myfis2.txt"));
ArrayList<String> foundaz = new ArrayList<String>();
String line;
while ((line = myfis.readLine()) != null) {
String delim = " ";
String[] words = line.split(delim);
for (String s : words) {
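if (s != null && !s.isEmpty()) {
Pattern pi = Pattern.compile("[aA].*z");
Matcher ma = pi.matcher(s);
// matches() requires the whole word to fit the pattern; find() would also accept a word such as "pizzaz"
if (ma.matches()) {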
foundaz.add(s);
}
}
}
}
if (!foundaz.isEmpty()) {
System.out.println("The words that start with a and ends with z are:");
for (String s : foundaz) {
System.out.println(s);
}
}
} catch (Exception ex) {
System.out.println(ex);
}
}
}
Input was:
apple
applez
Applez
banana
Output was:
The words that start with a and ends with z are:
applez
Applez
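One detail worth keeping in mind with this pattern: Matcher.find() succeeds if any part of the word matches, while Matcher.matches() requires the whole word to match. The sketch below, with made-up class and method names, shows the difference for [aA].*z.

```java
import java.util.regex.Pattern;

class StartsWithAEndsWithZ {

    private static final Pattern A_TO_Z = Pattern.compile("[aA].*z");

    // true only if the entire word starts with a/A and ends with z
    static boolean wholeWord(String word) {
        return A_TO_Z.matcher(word).matches();
    }

    // true if some substring starts with a/A and ends with z
    static boolean anywhere(String word) {
        return A_TO_Z.matcher(word).find();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"apple", "applez", "Applez", "banana", "pizzaz"}) {
            System.out.println(w + ": matches=" + wholeWord(w) + ", find=" + anywhere(w));
        }
    }
}
```

For "pizzaz", find() reports a match (the substring "az") although the word does not start with a, whereas matches() does not.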

import java.io.*;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.*;
public class RegexSimple2
{
public static void main(String[] args) {
try
{
Scanner myfis = new Scanner(new File("D:\\myfis2.txt"));
ArrayList <String> foundaz = new ArrayList<String>();
while(myfis.hasNext())
{
String line = myfis.nextLine();
String delim = " ";
String [] words = line.split(delim);
for (String s : words) {
if (s != null && !s.isEmpty())
{
Pattern pi = Pattern.compile("[aA].*z");
Matcher ma = pi.matcher(s);
// matches() checks the whole word, so the word must start with a/A and end with z
if (ma.matches()) {
foundaz.add(s);
}
}
}
}
if(foundaz.isEmpty())
{
System.out.println("No matching words have been found!");
}
if(!foundaz.isEmpty())
{
System.out.print("The words that start with a and ends with z are:\n");
for(String s: foundaz)
{
System.out.println(s);
}
}
}
catch(Exception ex)
{
System.out.println(ex);
}
}
}
