Regex in Java with matches stored into an ArrayList - java

I wrote the following code to store and display all words that begin with the letter a and end with z. First, I am getting an error from my regex pattern; second, the contents (Strings) stored in the ArrayList are not displayed.
import java.io.*;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.*;
public class RegexSimple2{
public static void main(String[] args) {
try{
Scanner myfis = new Scanner("D:\\myfis2.txt");
ArrayList <String> foundaz = new ArrayList<String>();
while(myfis.hasNext()){
String line = myfis.nextLine();
String delim = " ";
String [] words = line.split(delim);
for ( String s: words){
if(!s.isEmpty()&& s!=null){
Pattern pi = Pattern.compile("[a|A][a-z]*[z]");
Matcher ma = pi.matcher(s);
boolean search = false;
while (ma.find()){
search = true;
foundaz.add(s);
}
if(!search){
System.out.println("Words that start with a and end with z have not been found");
}
}
}
}
if(!foundaz.isEmpty()){
for(String s: foundaz){
System.out.println("The word that start with a and ends with z is:" + s + " ");
}
}
}
catch(Exception ex){
System.out.println(ex);
}
}
}

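You need to change how you read the file: new Scanner("D:\\myfis2.txt") scans the path string itself, not the file it names. In addition, change the regex to [aA].*z: the character class [a|A] also matches a literal |, and .* matches zero or more of any character. See the minor changes I made below: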
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        // Compile the pattern once instead of once per word
        Pattern pi = Pattern.compile("[aA].*z");
        ArrayList<String> foundaz = new ArrayList<String>();
        // try-with-resources closes the reader automatically
        try (BufferedReader myfis = new BufferedReader(new FileReader("D:\\myfis2.txt"))) {
            String line;
            while ((line = myfis.readLine()) != null) {
                String[] words = line.split(" ");
                for (String s : words) {
                    // split() never returns null elements, so only empty tokens need skipping
                    if (!s.isEmpty()) {
                        Matcher ma = pi.matcher(s);
                        if (ma.find()) {
                            foundaz.add(s);
                        }
                    }
                }
            }
        } catch (Exception ex) {
            System.out.println(ex);
        }
        if (!foundaz.isEmpty()) {
            System.out.println("The words that start with a and ends with z are:");
            for (String s : foundaz) {
                System.out.println(s);
            }
        }
    }
}
Input was:
apple
applez
Applez
banana
Output was:
The words that start with a and ends with z are:
applez
Applez
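Why the question's version found nothing: `Scanner(String)` treats its argument as the text to scan, not as a file name, so the only "line" it ever sees is the path itself. A tiny demonstration (no file is opened; the path here is just plain text):

```java
import java.util.Scanner;

public class ScannerSourceDemo {
    public static void main(String[] args) {
        // Scanner(String) reads from the string itself, so the "line" is the path text
        Scanner fromString = new Scanner("D:\\myfis2.txt");
        System.out.println(fromString.nextLine()); // D:\myfis2.txt
        fromString.close();
    }
}
```

Passing a `File` (or using a `BufferedReader` over a `FileReader`, as above) makes the scanner read the file's contents instead.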

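Another working version keeps the Scanner but passes it a File, so that it reads the file's contents instead of the path string:
import java.io.*;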
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.*;
public class RegexSimple2
{
public static void main(String[] args) {
try
{
Scanner myfis = new Scanner(new File("D:\\myfis2.txt"));
ArrayList <String> foundaz = new ArrayList<String>();
while(myfis.hasNext())
{
String line = myfis.nextLine();
String delim = " ";
String [] words = line.split(delim);
for (String s : words) {
if (!s.isEmpty()) // split() never returns null elements
{
Pattern pi = Pattern.compile("[aA].*z");
Matcher ma = pi.matcher(s);
if (ma.find()) {
foundaz.add(s);
}
}
}
}
if(foundaz.isEmpty())
{
System.out.println("No matching words have been found!");
}
if(!foundaz.isEmpty())
{
System.out.print("The words that start with a and ends with z are:\n");
for(String s: foundaz)
{
System.out.println(s);
}
}
}
catch(Exception ex)
{
System.out.println(ex);
}
}
}
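One caveat for both versions: `find()` succeeds if the pattern occurs anywhere inside the word, so a word such as "lazy" would also be collected, because "az" appears inside it. A minimal sketch that uses `matches()` instead, so the whole word must start with a/A and end with z. It assumes the words come from a single string rather than from the file, splits on any run of whitespace, and `findAzWords` is a hypothetical helper name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class AzWords {
    // Compiled once: a word that starts with a/A and ends with z
    private static final Pattern AZ = Pattern.compile("[aA].*z");

    // Returns the whitespace-separated words in text that match AZ in full
    public static List<String> findAzWords(String text) {
        List<String> result = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            // matches() requires the whole word to fit the pattern,
            // unlike find(), which accepts a match anywhere inside it
            if (!word.isEmpty() && AZ.matcher(word).matches()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "apple applez Applez banana lazy";
        System.out.println(findAzWords(text)); // [applez, Applez]
        // For comparison, find() accepts "lazy" because "az" occurs inside it
        System.out.println(AZ.matcher("lazy").find()); // true
    }
}
```

An equivalent option is to keep `find()` and anchor the pattern as `^[aA].*z$`.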

Related

Counting the number of occurrences of each word in a PDF file in Java

I am making a Java program using PDFBox that reads any PDF file and counts how many times each word appears in it, but for some reason nothing is printed when I run the program. I expect it to print each word with its number of occurrences next to it. Thanks in advance.
Here is my code:
package lab8;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.Scanner;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
public class Extractor {
public static void main(String[] args) throws FileNotFoundException {
Map<String, Integer> frequencies = new TreeMap<String, Integer>();
PDDocument pd;
File input = new File("C:\\Users\\Ammar\\Desktop\\Application.pdf");
Scanner in = new Scanner(input);
try {
pd = PDDocument.load(input);
PDFTextStripper stripper = new PDFTextStripper();
stripper.setEndPage(20);
String text = stripper.getText(pd);
while (in.hasNext()) {
String word = clean(in.next());
if (word != "") {
Integer count = frequencies.get(word);
if (count == null) {
count = 1;
} else {
count = count + 1;
}
frequencies.put(word, count);
}
}
for (String key : frequencies.keySet()) {
System.out.println(key + ": " + frequencies.get(key));
}
if (pd != null) {
pd.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
private static String clean(String s) {
String r = "";
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (Character.isLetter(c)) {
r = r + c;
}
}
return r.toLowerCase();
}
}
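I have reworked the logic. In your version the Scanner reads the raw PDF file (binary data) instead of the extracted text, so it does not see the document's words at all. The code below works on the string returned by stripper.getText(pd) and also counts characters.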
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
public class Extractor {
public static void main(String[] args) throws FileNotFoundException {
Map<String, Integer> wordFrequencies = new TreeMap<String, Integer>();
Map<Character, Integer> charFrequencies = new TreeMap<Character, Integer>();
PDDocument pd;
File input = new File("C:\\Users\\Ammar\\Desktop\\Application.pdf");
try {
pd = PDDocument.load(input);
PDFTextStripper stripper = new PDFTextStripper();
stripper.setEndPage(20);
String text = stripper.getText(pd);
for(int i=0; i<text.length(); i++)
{
char c = text.charAt(i);
int count = charFrequencies.get(c) != null ? (charFrequencies.get(c)) + 1 : 1;
charFrequencies.put(c, count);
}
// split on any run of whitespace (spaces, tabs, line breaks), not just single spaces
String[] texts = text.trim().split("\\s+");
for (String txt : texts) {
int count = wordFrequencies.get(txt) != null ? (wordFrequencies.get(txt)) + 1 : 1;
wordFrequencies.put(txt, count);
}
System.out.println("Printing the number of words");
for (String key : wordFrequencies.keySet()) {
System.out.println(key + ": " + wordFrequencies.get(key));
}
System.out.println("Printing the number of characters");
for (char charKey : charFrequencies.keySet()) {
System.out.println(charKey + ": " + charFrequencies.get(charKey));
}
if (pd != null) {
pd.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Try this code. If a problem remains that you cannot resolve, let me know and I will try to help.
In your code you can also use a StringTokenizer (import java.util.StringTokenizer) by passing it the extracted text, i.e.
StringTokenizer st = new StringTokenizer(stripper.getText(pd));
Then loop while st.hasMoreTokens() and get each word with String word = clean(st.nextToken()); this also works fine.
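The core of the fix is counting words in the extracted text rather than in the raw file. A minimal sketch of just that counting step, assuming the text is already available as a String (the sample string stands in for `stripper.getText(pd)`, and `countWords` is a hypothetical helper name). It splits on whitespace and keeps only letters, much like the question's `clean()`, and uses `Map.merge` to update the counts:

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCounter {
    // Splits on whitespace, keeps only letters, lower-cases the result,
    // and skips tokens that end up empty (similar to the question's clean())
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> frequencies = new TreeMap<>(); // keys come out alphabetically
        for (String token : text.split("\\s+")) {
            String word = token.replaceAll("[^\\p{L}]", "").toLowerCase();
            if (!word.isEmpty()) {
                // merge() stores 1 for a new word or adds 1 to its existing count
                frequencies.merge(word, 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // Sample text standing in for stripper.getText(pd)
        String text = "The cat saw the dog.\nThe dog ran!";
        for (Map.Entry<String, Integer> e : countWords(text).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

For the sample text it prints cat: 1, dog: 2, ran: 1, saw: 1 and the: 3, one per line. It also avoids the question's word != "" test, which compares references rather than contents; isEmpty() (or equals) is the reliable check.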

How to read a file into a HashMap, with the first element as the key and the rest in a Set?

I am reading a file with disease names and their remedies. I want to save each disease name as the key and its remedies in a Set as the value. How can I do that? There seem to be some problems in my code.
public static HashMap<String,Set<String>> disease = new HashMap <> ();
public static void main(String[] args) {
Scanner fin = null;
try {
fin = new Scanner (new File ("diseases.txt"));
while (fin.hasNextLine()) {
HashSet <String> remedies = null;
String [] parts = fin.nextLine().split(",");
int i = 1;
while (fin.hasNext()) {
remedies.add(parts[i].trim());
i++;
}
disease.put(parts[0],remedies);
}
fin.close();
}catch(Exception e) {
System.out.println("Error: " + e.getMessage());
}
finally {
try {fin.close();} catch(Exception e) {}
}
Set <String> result = disease.get("thrombosis");
display(result);
public static <T> void display (Set<T> items) {
if (items == null)
return;
int LEN = 80;
String line = "[";
for (T item:items) {
line+= item.toString() + ",";
if (line.length()> LEN) {
line = "";
}
}
System.out.println(line + "]");
}
The text file contains:
cancer,pain,swelling,bleeding,weight loss
gout,pain,swelling
hepatitis A,discoloration,malaise,tiredness
thrombosis,high heart rate
diabetes,frequent urination
In your code, you haven't initialized the remedies HashSet; that is why remedies.add(...) throws a NullPointerException.
The second issue is that i is incremented on every pass without being checked against the size of your parts array (it must stay below parts.length).
I edited your code (excerpt):
Scanner fin = null;
try {
fin = new Scanner(new File("diseases.txt"));
while (fin.hasNextLine()) {
HashSet<String> remedies = new HashSet<String>();
String[] parts = fin.nextLine().split(",");
int i = 1;
while (i < parts.length) { // fin.hasNext() is false on the last line, which would skip its remedies
remedies.add(parts[i].trim());
i++;
}
disease.put(parts[0], remedies);
}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Scanner;
import java.io.File;
import java.util.Set;
public class Solution {
public static HashMap<String, Set<String>> disease = new HashMap<>();
public static void main(String[] args) {
Scanner fin = null;
try {
fin = new Scanner (new File("diseases.txt"));
while (fin.hasNextLine()) {
HashSet <String> remedies = new HashSet<>();
String [] parts = fin.nextLine().split(",");
for (int i=1; i < parts.length; i++) {
remedies.add(parts[i].trim());
}
disease.put(parts[0],remedies);
}
fin.close();
}catch(Exception e) {
System.out.println("Error: " + e.getMessage());
}
finally {
try {fin.close();} catch(Exception e) {}
}
Set <String> result = disease.get("thrombosis");
display(result);
}
public static <T> void display(Set<T> items) {
if (items == null)
return;
int LEN = 80;
String line = "[";
for (T item : items) {
line += item.toString() + ",";
if (line.length() > LEN) {
line = "";
}
}
System.out.println(line + "]");
}
}
Here is the full working code. As @Pratik pointed out, you forgot to initialize the HashSet; that is why the NullPointerException occurred.
You have a few issues here:
- There is no need for the inner while loop (while (fin.hasNext()) {); instead, use a for loop over parts starting at index 1, as in the corrected code below.
- HashSet<String> remedies = null; means the set is never created, so nothing can be added to it. Change it to: HashSet<String> remedies = new HashSet<>();
- It is better practice to close() the file in the finally block.
- The display method discards the accumulated line (once it is longer than 80 characters) instead of printing it.
- It is better to use a StringBuilder when appending strings in a loop.
So the corrected code would be:
import java.io.File;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
public class TestSOCode {
public static HashMap<String,Set<String>> disease = new HashMap<>();
private static int LINE_LENGTH = 80;
public static void main(String[] args) {
Scanner fin = null;
try {
fin = new Scanner(new File("diseases.txt"));
while (fin.hasNextLine()) {
HashSet<String> remedies = new HashSet<>();
String[] parts = fin.nextLine().split(",");
disease.put(parts[0], remedies);
for (int i = 1; i < parts.length; i++) {
remedies.add(parts[i].trim());
}
}
} catch (Exception e) {
System.out.println("Error: " + e.getMessage());
} finally {
// Scanner.close() declares no checked exception; only guard against a null Scanner
if (fin != null) {
fin.close();
}
}
Set<String> result = disease.get("thrombosis");
display(result);
}
public static <T> void display(Set<T> items) {
if (items == null)
return;
StringBuilder line = new StringBuilder("[");
int currentLength = 1; // start from 1 because of the '[' char
boolean first = true;
for (T item : items) {
if (!first) {
line.append(",");
currentLength++;
// wrap before the next item once the current line is long enough
if (currentLength >= LINE_LENGTH) {
line.append("\n");
currentLength = 0;
}
}
String itemStr = item.toString();
line.append(itemStr);
currentLength += itemStr.length();
first = false;
}
line.append("]"); // closing bracket; an empty set prints as "[]"
System.out.println(line);
}
}
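For comparison, a more compact sketch of the parsing step, assuming one disease per line with comma-separated remedies. It takes a list of lines so it can be tried without the file (in practice `Files.readAllLines(Paths.get("diseases.txt"))` could supply them), and `parse` is a hypothetical helper name. A `LinkedHashSet` keeps the remedies in file order, and `String.join` builds the display string without a trailing comma:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DiseaseParser {
    // Builds disease -> remedies from lines like "gout,pain,swelling"
    public static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> diseases = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            String[] parts = line.split(",");
            Set<String> remedies = new LinkedHashSet<>(); // keeps file order
            for (int i = 1; i < parts.length; i++) {
                remedies.add(parts[i].trim());
            }
            diseases.put(parts[0].trim(), remedies);
        }
        return diseases;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "cancer,pain,swelling,bleeding,weight loss",
                "gout,pain,swelling",
                "thrombosis,high heart rate");
        Map<String, Set<String>> diseases = parse(lines);
        System.out.println("[" + String.join(",", diseases.get("gout")) + "]"); // [pain,swelling]
    }
}
```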

Regex pattern in Java matches a single letter instead of the complete word

I am new to Java and have been trying to write code where regex patterns are saved in a file: the program reads the file, stores its lines in an ArrayList, then compares them with a string variable to find a match. But when I run it, it matches a single letter instead of the whole word. Below is the code.
import java.io.*;
import java.util.Scanner;
import java.util.ArrayList;
import java.util.regex.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public void findfile( String path ){
File f = new File(path);
if(f.exists() && !f.isDirectory()) {
System.out.println("file found.....!!!!");
if(f.length() == 0 ){
System.out.println("file is empty......!!!!");
}}
else {
System.out.println("file missing");
}
}
public void readfilecontent(String path, String sql){
try{Scanner s = new Scanner(new File(path));
ArrayList<String> list = new ArrayList<String>();
while (s.hasNextLine()){
list.add(s.nextLine());
}
s.close();
System.out.println(list);
Pattern p = Pattern.compile(list.toString(),Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(sql);
if (m.find()){
System.out.println("match found");
System.out.println(m.group());
}
else {System.out.println("match not found"); }
}
catch (FileNotFoundException ex){}
}
public static void main( String args[] ) {
String path = "/code/sql.pattern";
String sql = "select * from schema.test";
RegexMatches regex = new RegexMatches();
regex.findfile(path);
regex.readfilecontent(path,sql);
}
The sql.pattern file contains:
\\buser\\b
\\border\\b
I expect it not to match anything and to print "match not found"; instead it says "match found", and m.group() prints the letter s. Could anyone please help?
Thanks in advance.
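Part of the problem is the doubled backslash. Writing \\b is only needed inside a Java string literal; in a file read at runtime, write \buser\b, otherwise the regex looks for a literal backslash.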
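I would not recommend passing list.toString() to Pattern.compile: it produces "[\\buser\\b, \\border\\b]", and the surrounding [ and ] turn the whole string into a single character class that matches any one of the listed characters. That is why only the letter s of "select" is matched. Instead, compile each line separately, as in the code below: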
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public void findfile(String path) {
File f = new File(path);
if (f.exists() && !f.isDirectory()) {
System.out.println("file found.....!!!!");
if (f.length() == 0) {
System.out.println("file is empty......!!!!");
}
} else {
System.out.println("file missing");
}
}
public void readfilecontent(String path, String sql) {
try {
Scanner s = new Scanner(new File(path));
ArrayList<String> list = new ArrayList<String>();
while (s.hasNextLine()) {
list.add(s.nextLine());
}
s.close();
System.out.println(list);
list.stream().forEach(regex -> {
Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(sql);
if (m.find()) {
System.out.println("match found for regex " + regex );
System.out.println("matched substring: "+ m.group());
} else {
System.out.println("match not found for regex " + regex);
}
});
} catch (FileNotFoundException ex) {
ex.printStackTrace();
}
}
public static void main(String args[]) {
String path = "/code/sql.pattern";
String sql = "select * from schema.test";
RegexMatches regex = new RegexMatches();
regex.findfile(path);
regex.readfilecontent(path, sql);
}
}
with /code/sql.pattern containing single backslashes, as below:
\buser\b
\border\b
\bfrom\b
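If a single pattern is preferred, the lines can be joined with | (alternation) instead of passing list.toString(). A minimal sketch with the patterns inlined rather than read from the file (`combine` is a hypothetical helper); the second part reproduces the question's symptom:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedPatterns {
    // Joins the patterns as alternatives, (?:p1|p2|...), matched case-insensitively
    public static Pattern combine(List<String> regexes) {
        return Pattern.compile("(?:" + String.join("|", regexes) + ")", Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        String sql = "select * from schema.test";
        // Each Java literal "\\b" is the two characters \b, as the file should contain
        List<String> regexes = Arrays.asList("\\buser\\b", "\\border\\b", "\\bfrom\\b");
        Matcher m = combine(regexes).matcher(sql);
        System.out.println(m.find() ? "match found: " + m.group() : "match not found"); // match found: from

        // The question's file held doubled backslashes; list.toString() then
        // yields one big character class such as [\\buser\\b, \\border\\b]
        List<String> asInFile = Arrays.asList("\\\\buser\\\\b", "\\\\border\\\\b");
        Matcher wrong = Pattern.compile(asInFile.toString()).matcher(sql);
        if (wrong.find()) {
            System.out.println("toString() pattern matched: " + wrong.group()); // s
        }
    }
}
```

The combined pattern finds "from", while the toString() version matches the single character s, as reported in the question.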

Searching for strings in a text file

I was programming in Python, but now I want to write the same code in Java. Can you help me, please? This is the code I was working on:
import random
import re
a = "y"
while a == "y":
    i = input('Search: ')
    b = i.lower()
    word2 = ""
    for letter in b:
        lista = []
        with open('d:\lista.txt', 'r') as inF:
            for item in inF:
                if item.startswith(letter):
                    lista.append(item)
        word = random.choice(lista)
        word2 = word2 + word
    print(word2)
    a = input("Again? ")
Now I want to do the same in Java, but I'm not really sure how to do it; it's not that easy, and I'm just a beginner. So far I found code that searches a text file, but I'm stuck.
This is the Java code. It finds the position of the word. I've been trying to modify it, without getting the results I'm looking for.
import java.io.*;
import java.util.Scanner;
class test {
public static void main(String[] args){
Scanner input = new Scanner(System.in);
System.out.println("Search: ");
String searchText = input.nextLine();
String fileName = "lista.txt";
StringBuilder sb = new StringBuilder();
try {
BufferedReader reader = new BufferedReader(new FileReader(fileName));
while (reader.ready()) {
sb.append(reader.readLine());
}
}
catch(IOException ex) {
ex.printStackTrace();
}
String fileText = sb.toString();
System.out.println("Position in file : " + fileText.indexOf(searchText));
}
}
What I want is to search a text file containing a list of items, but only show the items that begin with the letters of the string I search for. For example, I have the string "urgent" and the text file contains:
baby
redman
love
urban
gentleman
game
elephant
night
todd
So the output would be "urban" + "redman" + "gentleman" + ..., one word per letter, until the end of the string is reached.
Let's assume that you've already tokenized the string so you've got a list of Strings, each containing a single word. It's what comes from the reader if you've got one word per line, which is how your Python code is written.
String[] haystack = {"baby", "redman", "love", "urban", "gentleman", "game",
"elephant", "night", "todd"};
Now, to search for a needle, you can simply compare the first character of each haystack word with every character of the needle:
String needle = "urgent";
for (String s : haystack) {
for (int i = 0; i < needle.length(); ++i) {
if (s.charAt(0) == needle.charAt(i)) {
System.out.println(s);
break;
}
}
}
This solution runs in O(|needle| * |haystack|).
To improve it slightly at the cost of a little extra memory, we can precompute a hash set of the needle's characters (the letters a word is allowed to start with):
String needle = "urgent";
Set<Character> lookup = new HashSet<Character>();
for (int i = 0; i < needle.length(); ++i) {
lookup.add(needle.charAt(i));
}
for (String s : haystack) {
if (lookup.contains(s.charAt(0))) {
System.out.println(s);
}
}
The second solution runs in O(|needle| + |haystack|).
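Both snippets print the matches in haystack order, whereas the Python code builds its result letter by letter from the search string. A sketch of that behaviour, assuming lower-case words and choosing the first word for each letter rather than a random one, so the output is repeatable (`wordsForLetters` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FirstLetterSearch {
    // For each letter of the needle, picks the first haystack word starting with it
    public static List<String> wordsForLetters(String needle, String[] haystack) {
        // Index the haystack by first character, keeping only the first word per letter
        Map<Character, String> firstWordByLetter = new HashMap<>();
        for (String word : haystack) {
            if (!word.isEmpty()) {
                firstWordByLetter.putIfAbsent(word.charAt(0), word);
            }
        }
        List<String> result = new ArrayList<>();
        for (char c : needle.toLowerCase().toCharArray()) {
            String word = firstWordByLetter.get(c);
            if (word != null) { // letters with no matching word are skipped
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] haystack = {"baby", "redman", "love", "urban", "gentleman", "game",
                "elephant", "night", "todd"};
        System.out.println(String.join(" + ", wordsForLetters("urgent", haystack)));
    }
}
```

For "urgent" this prints urban + redman + gentleman + elephant + night + todd. Storing a list of words per letter and picking one with Random.nextInt would restore the random choice from the Python version.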
This works if your list of words isn't too large. If it is large, you could adapt it to stream over the file multiple times, collecting the words you need.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
public class Test {
public static void main(String[] args) {
Map<Character, List<String>> map = new HashMap<Character, List<String>>();
File file = new File("./lista.txt");
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader(file));
String line = null;
while ((line = reader.readLine()) != null) {
// assumes words are space separated with no
// quotes or commas
String[] tokens = line.split(" ");
for(String word : tokens) {
if(word.length() == 0) continue;
// might as well avoid case issues
word = word.toLowerCase();
Character firstLetter = Character.valueOf(word.charAt(0));
List<String> wordsThatStartWith = map.get(firstLetter);
if(wordsThatStartWith == null) {
wordsThatStartWith = new ArrayList<String>();
map.put(firstLetter, wordsThatStartWith);
}
wordsThatStartWith.add(word);
}
}
Random rand = new Random();
String test = "urgent";
List<String> words = new ArrayList<String>();
for (int i = 0; i < test.length(); i++) {
Character key = Character.valueOf(test.charAt(i));
List<String> wordsThatStartWith = map.get(key);
if(wordsThatStartWith != null){
String randomWord = wordsThatStartWith.get(rand.nextInt(wordsThatStartWith.size()));
words.add(randomWord);
} else {
// text file didn't contain any words that start
// with this letter, need to handle
}
}
for(String w : words) {
System.out.println(w);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
if(reader != null) {
try {
reader.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
}
This assumes the content of lista.txt looks like
baby redman love urban gentleman game elephant night todd
And the output will look something like
urban
redman
gentleman
elephant
night
todd

Issue Reading from a file and using a 2D array to sort the data

I'm making a province sorter, and the requirement is that I must leave the main class as is and make a private class called Munge. I've been at this for hours and changed my code hundreds of times. Basically, it reads from a text file that looks like this:
Hamilton, Ontario
Toronto, Ontario
Edmonton, Alberta
Red Deer, Alberta
St John's, Newfoundland
and the output needs to look like this:
Alberta; Edmonton, Red Deer
Ontario; Hamilton, Toronto
Newfoundland; St John's
My main class cannot be changed and looks like this:
public class Lab5 {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
if(args.length < 2) {
System.err.println("Usage: java -jar lab5.jar infile outfile");
System.exit(99);
}
Munge dataSorter = new Munge(args[0], args[1]);
dataSorter.openFiles();
dataSorter.readRecords();
dataSorter.writeRecords();
dataSorter.closeFiles();
}
}
and the Munge class I've written looks like this:
package lab5;
import java.io.File;
import java.util.Scanner;
import java.util.Formatter;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
public class Munge
{
private String inFileName, outFileName;
private Scanner inFile;
private Formatter outFile;
private int line = 0;
private String[] data;
public Munge(String inFileName, String outFileName)
{
this.inFileName = inFileName;
this.outFileName = outFileName;
data = new String[100];
}
public void openFiles()
{
try
{
inFile = new Scanner(new File(inFileName));
File file = new File("input.txt");
SortedMap<String, List<String>> map = new TreeMap<String, List<String>>();
Scanner scanner = new Scanner(file).useDelimiter("\\n");
while (scanner.hasNext()) {
String newline = scanner.next();
if (newline.contains(",")) {
String[] parts = newline.split(",");
String city = parts[0].trim();
String province = parts[1].trim();
List<String> cities = map.get(province);
if (cities == null) {
cities = new ArrayList<String>();
map.put(province, cities);
}
if (!cities.contains(city)) {
cities.add(city);
}
}
}
for (String province : map.keySet()) {
StringBuilder sb = new StringBuilder();
sb.append(province).append(": ");
List<String> cities = map.get(province);
for (String city : cities) {
sb.append(city).append(", ");
}
sb.delete(sb.length() - 2, sb.length());
String output = sb.toString();
System.out.println(output);
}
}
catch(FileNotFoundException exception)
{
System.err.println("File not found.");
System.exit(1);
}
catch(SecurityException exception)
{
System.err.println("You do not have access to this file.");
System.exit(1);
}
try
{
outFile = new Formatter(outFileName);
}
catch(FileNotFoundException exception)
{
System.err.println("File not found.");
System.exit(1);
}
catch(SecurityException exception)
{
System.err.println("You do not have access to this file.");
System.exit(1);
}
}
public void readRecords()
{
while(inFile.hasNext())
{
data[line] = inFile.nextLine();
System.out.println(data[line]);
line++;
}
}
public void writeRecords()
{
for(int i = 0; i < line; i++)
{
String tokens[] = data[i].split(", ");
Arrays.sort(tokens);
for(int j = 0; j < tokens.length; j++)
outFile.format("%s\r\n", tokens[j]);
}
}
public void closeFiles()
{
if(inFile != null)
inFile.close();
if(outFile != null)
outFile.close();
}
}
You'll have to excuse my brackets; they're formatted correctly in NetBeans, but I had to move the bottom ones over to keep them in the code block.
As I think this is homework, I'll avoid giving you a full solution and instead give some hints about what to do.
When you have read a line, it consists of City, Province. So the first thing you need to do is split the string into two parts: the first part is the city and the second is the province. You need a collection for each province, and you store each city in the collection for its province.
Once you have that, sort the names of the provinces you found and iterate through them. For each province, sort its cities, then output the province name followed by each city name.
Useful classes include HashMap, TreeMap, List, and Collections (which has sort methods).
I hope that helps you get further; otherwise, try to be more specific about where you are stuck.
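The hints above can be sketched as a standalone method. This is not the required Munge class: it works on a list of lines, and `group` is a hypothetical helper. A TreeMap keeps the provinces sorted, and each city list is sorted before output:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ProvinceGrouping {
    // Groups "City, Province" lines into "Province; City1, City2" strings
    public static List<String> group(List<String> lines) {
        // TreeMap iterates its keys (province names) in sorted order
        Map<String, List<String>> citiesByProvince = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",", 2); // city, province
            if (parts.length < 2) continue;      // skip malformed lines
            String city = parts[0].trim();
            String province = parts[1].trim();
            citiesByProvince.computeIfAbsent(province, k -> new ArrayList<>()).add(city);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : citiesByProvince.entrySet()) {
            List<String> cities = entry.getValue();
            Collections.sort(cities);
            output.add(entry.getKey() + "; " + String.join(", ", cities));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hamilton, Ontario", "Toronto, Ontario",
                "Edmonton, Alberta", "Red Deer, Alberta", "St John's, Newfoundland");
        group(lines).forEach(System.out::println);
    }
}
```

Because a TreeMap sorts its keys, the provinces come out alphabetically (Alberta, Newfoundland, Ontario), which differs slightly from the order shown in the question. In the Munge class, the same lines could be written with outFile.format("%s%n", ...) instead of printed.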
