Regex in Java is not matching anything from text file - java

I have a text file with several lines
Category: Type of problem you're having
Description: Overview of the problem
How To Fix: Directions to fix your problem (has carriage
returns, sometimes)
Related Links: Additional Resources
**There are no numbers in my list; it was the only way I could think of to make it neater...*
I've been trying to get my code to recognize all of the information between "How To Fix:" and "Related Links" when it spans more than one line. I know from my research that I have to use either (?s) or Pattern.DOTALL; however, neither of them seems to be working. I'm fairly new to regex, so I expect it is something elementary. Here is my code:
String fileName = System.getProperty("user.home") + "/Desktop/Test.txt";
try {
FileReader fr = new FileReader(fileName);
BufferedReader br = new BufferedReader(fr);
sc1 = new Scanner(br);
String findingRegex = "(Description:.*)";
String recommRegex = "(?<=How To Fix:)(.*)(?=Related Links)";//regex I'm trying to use
Pattern pFinding = Pattern.compile(findingRegex);
Pattern pRecomm = Pattern.compile(recommRegex, Pattern.DOTALL);
while (sc1.hasNextLine()) {
String clean = sc1.nextLine().trim();
String clean2 = clean.replaceAll("\\\\x\\p{XDigit}{2}", "");
Matcher mFinding = pFinding.matcher(clean2);
Matcher mRecomm = pRecomm.matcher(clean2);
while (mFinding.find()) {
System.out.println(mFinding);
}
while (mRecomm.find()){
System.out.println(mRecomm); //nothing prints?
}
}
br.close();
fr.close();
System.out.println("The following data was imported: ");
try {
tbl.displayAll();
} catch (NullPointerException npe) {
System.out.println("You have no data.");
}
} catch (FileNotFoundException fnfe) {
System.out.println("File named Test.txt was not located on your desktop. Program Terminated.");
System.exit(0);
} catch (IOException ioe) {
System.out.println("The import operation failed. Program Terminated");
System.exit(0);
} finally {
sc1.close();
}
Lastly, I tested my Regex here and it worked as expected?
MY SOLUTION:
String findingRegex = "(?<=Description:)(.*)(?=How To Fix)";
String recommRegex = "(?<=How To Fix:)(.*)(?=Related Links)";
Pattern pFinding = Pattern.compile(findingRegex, Pattern.DOTALL);
Pattern pRecomm = Pattern.compile(recommRegex, Pattern.DOTALL);
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null){
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
String newFile = sb.toString();
Matcher mFinding = pFinding.matcher(newFile);
Matcher mRecomm = pRecomm.matcher(newFile);
while (mFinding.find()) {
System.out.println(mFinding.group(1).trim()); // print the captured text, not the Matcher object
}
while (mRecomm.find()){
System.out.println(mRecomm.group(1).trim());
}

Here:
String clean = sc1.nextLine().trim();
You are breaking your input up by line. But then you're trying to match multiple lines. There aren't multiple lines to match, because you only kept the one.
You could read the entire file into memory first, and then match against it. Or you could do something like
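StringBuilder sb = new StringBuilder();
boolean inSection = false; // true while we are between the two headers
while (sc1.hasNextLine()) {
    String line = sc1.nextLine();
    if (line.contains("Related Links:")) {
        inSection = false; // stop before the closing header is appended
    }
    if (line.contains("How To Fix:")) {
        inSection = true; // the header line itself is kept
    }
    if (inSection) {
        sb.append(line).append(System.lineSeparator()); // preserve line breaks
    }
}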
(You'll need to modify this if you need to match more than once per file.)
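The first option, reading the whole file into memory and matching once, might look like the sketch below. The class and method names (SectionExtractor, between) are made up for illustration, and the sample text is the one from the question. The reluctant .*? is a choice made here so that the match stops at the first closing label rather than the last.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class SectionExtractor {

    // Returns the trimmed text between startLabel and endLabel, if both occur in that order.
    static Optional<String> between(String text, String startLabel, String endLabel) {
        Pattern p = Pattern.compile(
                Pattern.quote(startLabel) + "(.*?)" + Pattern.quote(endLabel),
                Pattern.DOTALL); // DOTALL lets '.' cross line breaks
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the file contents; the whole text is matched at once.
        String text = String.join("\n",
                "Category: Type of problem you're having",
                "Description: Overview of the problem",
                "How To Fix: Directions to fix your problem (has carriage",
                "returns, sometimes)",
                "Related Links: Additional Resources");
        System.out.println(between(text, "How To Fix:", "Related Links").orElse("(no match)"));
    }
}
```

In real use the text would come from the file, for example via new String(Files.readAllBytes(path), StandardCharsets.UTF_8).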

Related

Java: Searching for specific word in a text file

I've currently got a large text file with lots of the most popular names. I get the user to input a specific name and I'm trying to print the line that has that name. My problem is that if the user enters a name like Alex, every name that contains Alex, like Alexander, Alexis or Alexia, gets printed when I only want Alex to be printed. What can I change in "if(line.contains(name)){" to fix this?
Each line contains info like the name, its popularity ranking and the number of people with that name.
try {
line = reader.readLine();
while (line != null) {
if(line.contains(name)){
text += line;
line = reader.readLine();
}
line = reader.readLine();
}
}catch(Exception e){
System.out.println("Error");
}
System.out.println(text);
A shorthand would be to use Java 8 Streams. Here is an example:
public class Test2 {
public static void main(String[] args) {
String fileName = "c://lines.txt";
String name = "nametosearch";
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
stream.filter(line -> line.contains(" " + name + " ")).forEach(System.out::println);
} catch (IOException e) {
e.printStackTrace();
}
}
}
You can use regex with a word boundary for this task:
// Pattern.quote keeps any regex metacharacters in the name literal
final String regex = "\\b" + Pattern.quote(name) + "\\b";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(line);
if (matcher.find()) { // group() throws if find() did not succeed, so test find() itself
    text += line;
}
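As a self-contained illustration of the word-boundary idea, here is a minimal sketch. The class name NameFilter, the method containsWord and the three sample lines are invented for the example; the real file layout is not shown in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class NameFilter {

    // True when the line contains name as a whole word, not as part of a longer word.
    static boolean containsWord(String line, String name) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(name) + "\\b");
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        // Invented sample lines in a "name rank count" layout.
        String[] lines = { "Alex 12 40213", "Alexander 25 31002", "Alexis 80 9120" };
        for (String line : lines) {
            if (containsWord(line, "Alex")) {
                System.out.println(line); // only the "Alex ..." line passes the filter
            }
        }
    }
}
```

If the same name is searched across many lines, compiling the pattern once outside the loop avoids repeated work.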
Replace
line.contains(name)
with
line.equals(name)
(this only works if the line consists of the name and nothing else).

how to split one text into multiple text files

I have the following Text:
1
(some text)
/
2
(some text)
/
.
.
/
8519
(some text)
and I want to split this text into several text-files where each file has the name of the number before the text i.e. (1.txt, 2.txt) and so on, and the content of this file will be the text.
I tried this code
BufferedReader br = new BufferedReader(new FileReader("(Path)\\doc.txt"));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
// sb.append(System.lineSeparator());
line = br.readLine();
}
String str = sb.toString();
String[] arrOfStr = str.split("/");
for (int i = 0; i < arrOfStr.length; i++) {
PrintWriter writer = new PrintWriter("(Path)" + arrOfStr[i].charAt(0) + ".txt", "UTF-8");
writer.println(arrOfStr[i].substring(1));
writer.close();
}
System.out.println("Done");
} finally {
br.close();
}
This code works for files 1-9. However, things go wrong for files 10-8519, since I only take the first character of each string (arrOfStr[i].charAt(0)). I know my solution is insufficient. Any suggestions?
In addition to my comment, considering there isn't a space between the leading integer and the first word, the substring at the first space doesn't work.
This question/answer has a few options that should help, the one using regex (\d+) being the simplest one imo, and copied below.
Matcher matcher = Pattern.compile("\\d+").matcher(arrOfStr[i]);
matcher.find();
int yourNumber = Integer.valueOf(matcher.group());
Given a string find the first embedded occurrence of an integer
As you mentioned, the problem is that you only take the first digit. You could walk through the leading characters until you find a non-digit character (arrOfStr[i].charAt(j) < '0' || arrOfStr[i].charAt(j) > '9'), but it should be easier to use a Scanner and an appropriate regexp.
int index = new Scanner(arrOfStr[i]).useDelimiter("\\D+").nextInt();
The delimiter is precisely any group of non-digit character
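Putting the leading-number idea together with the question's split on "/", a rough sketch could look like this. ChunkHeader and its split method are hypothetical names, and the input string mimics what the question's code builds when it joins all lines without separators. One caveat of that joining: if a text body itself starts with a digit, it merges into the number, so keeping line separators while reading is safer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answers.
final class ChunkHeader {

    // Leading digits form the file number; everything after them is the body.
    private static final Pattern LEADING_NUMBER =
            Pattern.compile("^\\s*(\\d+)(.*)$", Pattern.DOTALL);

    // Returns {number, body}, or null when the chunk does not start with digits.
    static String[] split(String chunk) {
        Matcher m = LEADING_NUMBER.matcher(chunk);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String str = "1(some text)/2(some text)/8519(some text)";
        for (String chunk : str.split("/")) {
            String[] parts = split(chunk);
            if (parts != null) {
                // A real program would write parts[1] to the file parts[0] + ".txt".
                System.out.println(parts[0] + ".txt <- " + parts[1]);
            }
        }
    }
}
```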
Here is a quick solution for the given problem. You can test and do proper exception handling.
package practice;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;
public class FileNioTest {
public static void main(String[] args) {
Path path = Paths.get("C:/Temp/readme.txt");
try {
List<String> contents = Files.readAllLines(path);
StringBuffer sb = new StringBuffer();
String folderName = "C:/Temp/";
String fileName = null;
String previousFileName = null;
// Read from the stream
for (String content : contents) {// for each line of content in contents
if (content.matches("-?\\d+")) { // check if it is a number (based on your requirement)
fileName = folderName + content + ".txt"; // create a file name with path
if (sb != null && sb.length() > 0) { // this means if content present to write in the file
writeToFile(previousFileName, sb); // write to file
sb.setLength(0); // clearing buffer
}
createFile(fileName); // create a new file if number found in the line
previousFileName = fileName; // store the name to write content in previous opened file.
} else {
sb.append(content); // keep storing the content to write in the file.
}
System.out.println(content);// print the line
}
if (sb != null && sb.length() > 0) {
writeToFile(fileName, sb);
sb.setLength(0);
}
} catch (IOException ex) {
ex.printStackTrace();// handle exception here
}
}
private static void createFile (String fileName) {
Path newFilePath = Paths.get(fileName);
if (!Files.exists(newFilePath)) {
try {
Files.createFile(newFilePath);
} catch (IOException e) {
System.err.println(e);
}
}
}
private static void writeToFile (String fileName, StringBuffer sb) {
try {
Files.write(Paths.get(fileName), sb.toString().getBytes(), StandardOpenOption.APPEND);
}catch (IOException e) {
System.err.println(e);
}
}
}

Buffer Reader code to read input file

I have a text file named "message.txt" which is read using Buffer Reader. Each line of the text file contains both "word" and "meaning" as given in this example:
"PS:Primary school"
where PS - word, Primary school - meaning
When the file is being read, each line is tokenized to "word" and "meaning" from ":".
If the "meaning" is equal to the given input string called "f_msg3", "f_msg3" is displayed on the text view called "txtView". Otherwise, it displays "f_msg" on the text view.
But the "if condition" is not working properly in this code. For example if "f_msg3" is equal to "Primary school", the output on the text view must be "Primary school". But it gives the output as "f_msg" but not "f_msg3". ("f_msg3" does not contain any unnecessary strings.)
Can someone explain where I have gone wrong?
try {
BufferedReader file = new BufferedReader(new InputStreamReader(getAssets().open("message.txt")));
String line = "";
while ((line = file.readLine()) != null) {
try {
/*separate the line into two strings at the ":" */
StringTokenizer tokens = new StringTokenizer(line, ":");
String word = tokens.nextToken();
String meaning = tokens.nextToken();
/*compare the given input with the meaning of the read line */
if(meaning.equalsIgnoreCase(f_msg3)) {
txtView.setText(f_msg3);
} else {
txtView.setText(f_msg);
}
} catch (Exception e) {
txtView.setText("Cannot break");
}
}
} catch (IOException e) {
txtView.setText("File not found");
}
Try this
............
meaning = meaning.replaceAll("\\s+", " ");
/*compare the given input with the meaning of the read line */
if(meaning.equalsIgnoreCase(f_msg3)) {
txtView.setText(f_msg3);
} else {
txtView.setText(f_msg);
}
............
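Otherwise, comment out the else branch. As written, the text view is set again for every line of the file, so any non-matching line that comes after the match overwrites it with f_msg, and only the last line decides what is shown.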
I don't see any obvious error in your code; maybe it is just a matter of cleaning the string (i.e. removing leading and trailing spaces, newlines and so on) before comparing it.
Try trimming meaning, e.g. like this:
...
String meaning = tokens.nextToken();
if(meaning != null) {
meaning = meaning.trim();
}
if(f_msg3.equalsIgnoreCase(meaning)) {
txtView.setText(f_msg3);
} else {
txtView.setText(f_msg);
}
...
StringTokenizer is a legacy class whose own Javadoc discourages its use in new code, and it does not strip whitespace around the tokens, so String.split is the simpler tool here.
String[] pair = line.split("\\s*\\:\\s*", 2);
if (pair.length == 2) {
String word = pair[0];
String meaning = pair[1];
...
}
This splits the line into at most 2 parts (the optional second parameter) using a regular expression. \s* matches zero or more whitespace characters, such as tabs and spaces, so whitespace around the colon is dropped.
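A small runnable sketch of the split-based comparison follows. MeaningLookup and hasMeaning are hypothetical names, and the second sample line is invented to show the whitespace handling. Returning on the first hit is a choice made here so that later lines cannot undo a match.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class MeaningLookup {

    // True if any "word:meaning" line has a meaning equal to wanted,
    // ignoring case and surrounding whitespace.
    static boolean hasMeaning(List<String> lines, String wanted) {
        String target = wanted.trim();
        for (String line : lines) {
            // Trim the line first so trailing spaces do not end up in the meaning.
            String[] pair = line.trim().split("\\s*:\\s*", 2);
            if (pair.length == 2 && pair[1].equalsIgnoreCase(target)) {
                return true; // stop at the first match
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("PS:Primary school", "HS : High school ");
        System.out.println(hasMeaning(lines, "Primary school"));
        System.out.println(hasMeaning(lines, "College"));
    }
}
```

With a helper like this, the text view would be set once after the loop, depending on the returned flag, instead of once per line.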
You could also load all in a Properties. In a properties file the format key=value is convention, but also key:value is allowed. However then some escaping might be needed.
ArrayList<String> vals = new ArrayList<>(); // typed list, so the for-each over String below compiles
String jmeno = "Adam";
vals.add("Honza");
vals.add("Petr");
vals.add("Jan");
if(!(vals.contains(jmeno))){
vals.add(jmeno);
}else{
System.out.println("Adam is already in the list");
}
for (String jmena : vals){
System.out.println(jmena);
}
try (BufferedReader br = new BufferedReader(new FileReader("dokument.txt")))
{
String aktualni = br.readLine();
int pocetPruchodu = 0;
while (aktualni != null)
{
String[] znak = aktualni.split(";");
System.out.println(znak[pocetPruchodu] + " " +znak[pocetPruchodu + 1]);
aktualni = br.readLine();
}
br.close();
}
catch (IOException e)
{
System.out.println("The operation failed");
}
try (BufferedWriter bw = new BufferedWriter(new FileWriter("dokument2.txt")))
{
int pocetpr = 0;
while (pocetpr < vals.size())
{
bw.write(vals.get(pocetpr));
bw.append(" ");
pocetpr++;
}
bw.close();
}
catch (IOException e)
{
System.out.println("The operation failed");
}

Using trim() in Java to remove parts of an output

I have some code I wrote that outputs a batch file output to a jTextArea. Currently the batch file outputs an active directory query for the computer name, but there is a bunch of stuff that outputs as well that I want to be removed from the output from the variable String trimmedLine. Currently it's still outputting everything else and I can't figure out how to get only the computer name to appear.
Output: "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET"
I want the output to instead just show only this:
FDCD111304
Can anyone show me how to fix my code to only output the computer name and nothing else?
Look at console output (Ignore top line in console output)
btnPingComputer.addActionListener(new ActionListener() {
public void actionPerformed(ActionEvent arg0) {
String line;
BufferedWriter bw = null;
BufferedWriter writer =null;
try {
writer = new BufferedWriter(new FileWriter(tempFile));
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
String lineToRemove = "OU=Workstations";
String s = null;
Process p = null;
try {
p = Runtime.getRuntime().exec("c:\\computerQuery.bat");
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
StringBuffer sbuffer = new StringBuffer(); // new trial
BufferedReader in = new BufferedReader(new InputStreamReader(p
.getInputStream()));
try {
while ((line = in.readLine()) != null) {
System.out.println(line);
textArea.append(line);
textArea.append(String.format(" %s%n", line));
sbuffer.append(line + "\n");
s = sbuffer.toString();
String trimmedLine = line.trim();
if(trimmedLine.equals(lineToRemove)) continue;
writer.write(line + System.getProperty("line.separator"));
}
fw.write("commandResult is " + s);
String input = "CN=FDCD511304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("(.*?)\\=(.*?)\\,");
Matcher m = pattern.matcher(input);
while(m.find()) {
String currentVar = m.group().substring(3, m.group().length() - 1);
System.out.println(currentVar); //store or do whatever you want
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} finally
{
try {
fw.close();
}
catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
try {
in.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
});
You could also use javax.naming.ldap.LdapName when dealing with distinguished names. It also handles escaping which is tricky with regex alone (i.e. cn=foo\,bar,dc=fl,dc=net is a perfectly valid DN)
String dn = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
LdapName ldapName = new LdapName(dn);
String commonName = (String) ldapName.getRdn(ldapName.size() - 1).getValue();
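The constructor of LdapName throws the checked InvalidNameException, so the three lines above need a try/catch or a throws clause. A slightly more defensive sketch is shown below; DnReader and firstValue are hypothetical names, and looking the RDN up by its type instead of by position is a choice made here, not something the answer prescribes.

```java
import java.util.Optional;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

// Hypothetical helper, not part of the original answer.
final class DnReader {

    // Value of the leftmost RDN with the given type (e.g. "CN"),
    // or empty if the DN is malformed or has no such RDN.
    static Optional<String> firstValue(String dn, String type) {
        try {
            LdapName name = new LdapName(dn);
            // LdapName numbers RDNs from the right, so count down to read left to right.
            for (int i = name.size() - 1; i >= 0; i--) {
                Rdn rdn = name.getRdn(i);
                if (rdn.getType().equalsIgnoreCase(type)) {
                    return Optional.of(String.valueOf(rdn.getValue()));
                }
            }
            return Optional.empty();
        } catch (InvalidNameException e) {
            return Optional.empty(); // malformed DN
        }
    }

    public static void main(String[] args) {
        String dn = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
        System.out.println(firstValue(dn, "CN").orElse("No value for CN found"));
    }
}
```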
Well I would personally use the split() function to first get the parts split up and then parse out again. So my (probably unprofessional and buggy code) would be
String args[] = line.split(",");
String args2[] = args[0].split("=");
String computerName = args2[1];
And that would be where this is:
while ((line = in.readLine()) != null) {
System.out.println(line);
String trimmedLine = line.trim();
if (trimmedLine.equals(lineToRemove))
continue;
writer.write(line
+ System.getProperty("line.separator"));
textArea.append(trimmedLine);
textArea.append(String.format(" %s%n", line));
}
You can use a different regular expression and Matcher.matches() to find only the value you're looking for:
String str = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("(?:.*,)?CN=([^,]+).*");
Matcher matcher = pattern.matcher(str);
if(matcher.matches()) {
System.out.println(matcher.group(1));
} else {
System.out.println("No value for CN found");
}
FDCD111304
That regular expression will find the value for CN regardless of where in the string it is. The first group is to discard anything in front of CN= (we use a group starting with ?: here to indicate that the contents of the group should not be kept), then we match CN=, then the value, which may not contain a comma and then the rest of the string (which we don't care about).
You can also use a different regex and Matcher.find() to get both the keys and values and choose which keys to act on:
String str = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("([^=]+)=([^,]+),?");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
String key = matcher.group(1);
String value = matcher.group(2);
if("CN".equals(key) || "DC".equals(key)) {
System.out.printf("%s: %s%n", key, value);
}
}
CN: FDCD111304
DC: FL
DC: NET
Try using substring to chop off the parts you don't need, creating a new string.
There are a few options. The simplest and crudest:
str.substring(str.indexOf("=") + 1, str.indexOf(","))
A second, more flexible approach would be to build a HashMap, which would also help later if you need to read the other values.
Edit: Second method
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.HashMap;
public class HelloWorld{
public static void main(String []args){
String input = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("(.*?)\\=(.*?)\\,");
Matcher m = pattern.matcher(input);
while(m.find()) {
String currentVar = m.group().substring(0, m.group().length() - 1); // drop only the trailing comma
System.out.println(currentVar); //store or do whatever you want
}
}
}
This one will print each pair, like CN=FDCD111304; you can split it on '=' and store it in a key/value container such as a HashMap, or just in a list. Note that the pattern requires a trailing comma, so the last pair (DC=NET) is not matched.

java string matching from a large text file issue

I would like to implement a task of string matching from a large text file.
1. replace all the non-alphanumeric characters
2. Count the occurrences of a specific term in the text file, for example the term "tom". The matching is not case sensitive, so "Tom" should be counted. However, the term "tomorrow" should not be counted.
code template one:
try {
in = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile)));
} catch (FileNotFoundException e1) {
System.out.println("Not found the text file: "+inputFile);
}
Scanner scanner = null;
try {
while (( line = in.readLine())!=null){
String newline=line.replaceAll("[^a-zA-Z0-9\\s]", " ").toLowerCase();
scanner = new Scanner(newline);
while (scanner.hasNext()){
String term = scanner.next();
if (term.equalsIgnoreCase(args[1]))
countstr++;
}
}
} catch (IOException e) {
e.printStackTrace();
}
code template two:
try {
in = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile)));
} catch (FileNotFoundException e1) {
System.out.println("Not found the text file: "+inputFile);
}
Scanner scanner = null;
try {
while (( line = in.readLine())!=null){
String newline=line.replaceAll("[^a-zA-Z0-9\\s]", " ").toLowerCase();
String[] strArray=newline.split(" ");//split by blank space
for (int i = 0; i < strArray.length; i++) {
if (strArray[i].equalsIgnoreCase(args[1]))
countstr++;
}
}
} catch (IOException e) {
e.printStackTrace();
}
Running the two versions gives different results; the Scanner version looks like the right one. But for a large text file, the Scanner version runs much slower than the other. Can anyone tell me the reason and suggest a more efficient solution?
In your first approach, you don't need a second Scanner; creating a new Scanner for every line is not a good choice for a large file.
Your line is already converted to lower case, so you just need to lower-case the key once, outside the loop, and use equals inside the loop.
Or match each line against a word-boundary pattern:
String key = "\\b" + Pattern.quote("Tom".toLowerCase()) + "\\b";
Pattern p = Pattern.compile(key); // compile once, outside the loop
line = line.toLowerCase().replaceAll("[^a-zA-Z0-9\\s]", " ");
Matcher m = p.matcher(line);
while (m.find()) { // one increment per occurrence, not one per line
    countstr++;
}
Personally, I would choose the BufferedReader approach for a large file.
String key = "\\b" + Pattern.quote(args[0].toLowerCase()) + "\\b";
Pattern p = Pattern.compile(key); // compile once, outside the loop
// inputFile must be a java.nio.file.Path here
try (final BufferedReader br = Files.newBufferedReader(inputFile,
        StandardCharsets.UTF_8)) {
    for (String line; (line = br.readLine()) != null;) {
        // Lower-case the line and turn punctuation into spaces.
        line = line.toLowerCase().replaceAll("[^a-zA-Z0-9\\s]", " ");
        Matcher m = p.matcher(line);
        while (m.find()) { // count every occurrence, not just one per line
            countstr++;
        }
    }
}
This sample targets Java 7; adapt it as needed.
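On the difference between the two templates in the question: split(" ") breaks only on single space characters, while Scanner breaks on any run of whitespace (tabs included), which is one plausible reason the counts disagree. The sketch below avoids both the per-line Scanner and the replaceAll pass by using lookarounds that treat anything other than a letter or digit as a separator. TermCounter and its methods are hypothetical names, and the term is assumed to be purely alphanumeric.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class TermCounter {

    // Matches term only when it is not glued to another letter or digit on either side.
    static Pattern termPattern(String term) {
        return Pattern.compile(
                "(?<![A-Za-z0-9])" + Pattern.quote(term) + "(?![A-Za-z0-9])",
                Pattern.CASE_INSENSITIVE);
    }

    // Number of occurrences of the term in one line.
    static int count(String line, Pattern term) {
        Matcher m = term.matcher(line);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Pattern tom = termPattern("tom"); // compiled once, reused for every line
        int total = 0;
        // StringReader stands in for the large file so the sketch is self-contained.
        try (BufferedReader br = new BufferedReader(
                new StringReader("Tom met tom; tomorrow\nTom's cat saw an atom"))) {
            for (String line; (line = br.readLine()) != null;) {
                total += count(line, tom);
            }
        }
        System.out.println(total);
    }
}
```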
