Reading letters and avoiding numbers and symbols - java

I am given a .txt file that contains a bunch of words. Here is a sample of what it looks like:
Legs
m%cks
animals
s3nt!m4nts
I need to write code that reads this .txt file and puts the words without numbers or symbols into an array. So basically I have to put Legs and animals into an array. The other two words I just have to print out.
public class Readwords {
    public static void main(String[] args) {
        String[] array = new String[10];
    }
}
How do I get the program to read letters only and ignore the numbers and symbols?

You can use a regex to find the numbers and symbols, and then replace them.
1) Read the whole .txt file into a string.
2) Use the replaceAll function to remove the unwanted characters.
String str = yourText;                // the file contents read in step 1
str = str.replaceAll(yourRegex, ""); // e.g. "[^a-zA-Z]" matches everything that is not a letter
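A minimal sketch of step 2, assuming the text has already been read into a String. The class name LetterStripper and the pattern [^a-zA-Z] are my own choices for illustration, not something the answer specifies. Note that this approach changes the words (it strips characters out of them) rather than skipping them, which may not be what the question wants.

```java
class LetterStripper {
    // Removes every character that is not an ASCII letter (hypothetical helper).
    static String lettersOnly(String text) {
        return text.replaceAll("[^a-zA-Z]", "");
    }

    public static void main(String[] args) {
        // Sample words taken from the question
        System.out.println(lettersOnly("Legs"));
        System.out.println(lettersOnly("m%cks"));
        System.out.println(lettersOnly("s3nt!m4nts"));
    }
}
```

With this pattern, s3nt!m4nts becomes sntmnts, so the word is altered instead of being rejected.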

You can try this:
// Fragment: belongs inside a method that returns String[]
try {
    BufferedReader file = new BufferedReader(new FileReader("yourfile.txt"));
    String line;
    ArrayList<String> array = new ArrayList<String>();
    while ((line = file.readLine()) != null) {
        if (line.matches("[a-zA-Z]+"))   // the whole line consists of letters only
            array.add(line);
        else
            System.out.println(line);    // contains a digit or symbol: just print it
    }
    String[] result = array.toArray(new String[array.size()]);
    file.close();
    return result;
}
catch (Exception e) {
    e.printStackTrace();
}
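For a self-contained variant of the same idea, here is a sketch that separates the clean words from the rest. The class WordFilter, the method cleanWords and the in-memory sample list are hypothetical; a real program would pass in the lines read from the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordFilter {
    // Returns the lines made up of ASCII letters only; prints every other line.
    static String[] cleanWords(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (line.matches("[a-zA-Z]+")) {
                kept.add(line);
            } else {
                System.out.println(line);
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for the contents of the .txt file
        List<String> sample = Arrays.asList("Legs", "m%cks", "animals", "s3nt!m4nts");
        String[] words = cleanWords(sample);
        System.out.println(Arrays.toString(words));
    }
}
```

String.matches tests the whole string against the pattern, so a single digit or symbol anywhere in the word is enough to reject it.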

Related

How to split an ArrayList of sentences into an ArrayList of words in Java without reading a text file more than once?

I need to read a text file only once and store the sentences into an ArrayList. Then, I need to split the ArrayList of sentences into another ArrayList of each individual word. Not sure how to go about doing this?
In my code, I've split all the words into an ArrayList, but I think it's reading from the file again, which I can't do.
My code so far:
public class Main {
public static void main(String[] args){
try{
FileReader fr = new FileReader("input.txt");
BufferedReader br = new BufferedReader(fr);
ArrayList<String> sentences = new ArrayList<String>();
ArrayList<String> words = new ArrayList<String>();
String line;
while((line=br.readLine()) != null){
String[] lines = line.toLowerCase().split("\\n|[.?!]\\s*");
for (String split_sentences : lines){
sentences.add(split_sentences);
}
/*Not sure if the code below reads the file again. If it
does, then it is useless.*/
String[] each_word = line.toLowerCase().split("\\n|[.?!]\\s*|\\s");
for(String split_words : each_word){
words.add(split_words);
}
}
fr.close();
br.close();
String[] sentenceArray = sentences.toArray(new String[sentences.size()]);
String[] wordArray = words.toArray(new String[words.size()]);
}
catch(IOException e) {
e.printStackTrace();
}
}
}
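Regarding your comment "Not sure if the code below reads the file again. If it does, then it is useless.":
It doesn't. The only call that reads from the file is br.readLine(); the second split() simply reparses the line that you have already read into memory.
You have already solved your problem.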
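To make that concrete, here is a small sketch in which each line is read exactly once and both lists are filled from the same String. The class SplitOnce and the method collect are hypothetical names, a StringReader stands in for the FileReader on input.txt, and I use \\s+ and skip empty tokens, which differs slightly from the question's regex.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SplitOnce {
    // Reads each line once, then splits that in-memory String two different ways.
    static void collect(BufferedReader br, List<String> sentences, List<String> words) {
        try {
            String line;
            while ((line = br.readLine()) != null) {   // the only place input is read
                String lower = line.toLowerCase();
                for (String s : lower.split("[.?!]\\s*")) {
                    if (!s.isEmpty()) sentences.add(s);
                }
                for (String w : lower.split("[.?!]\\s*|\\s+")) {
                    if (!w.isEmpty()) words.add(w);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> sentences = new ArrayList<>();
        List<String> words = new ArrayList<>();
        BufferedReader br = new BufferedReader(new StringReader("Hello world. How are you?"));
        collect(br, sentences, words);
        System.out.println(sentences);
        System.out.println(words);
    }
}
```

Calling split() on a String never touches the reader, so the number of readLine() calls is the same whether you split the line once or twice.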

Identifying each word in a file

I am importing a large list of words and need to create code that will recognize each word in the file. I am using a delimiter to recognize the separation between words, but I am getting a (suppressed) warning stating that the values of lineNumber and delimiter are not used. What do I need to do to get the program to read this file and to separate each word within that file?
public class ASCIIPrime {
public final static String LOC = "C:\\english1.txt";
@SuppressWarnings("null")
public static void main(String[] args) throws IOException {
//import list of words
@SuppressWarnings("resource")
BufferedReader File = new BufferedReader(new FileReader(LOC));
//Create a temporary ArrayList to store data
ArrayList<String> temp = new ArrayList<String>();
//Find number of lines in txt file
String line;
while ((line = File.readLine()) != null)
{
temp.add(line);
}
//Identify each word in file
int lineNumber = 0;
lineNumber++;
String delimiter = "\t";
//assess each character in the word to determine the ascii value
int total = 0;
for (int i=0; i < ((String) line).length(); i++)
{
char c = ((String) line).charAt(i);
total += c;
}
System.out.println ("The total value of " + line + " is " + total);
}
}
This smells like homework, but alright.
Importing a large list of words and I need to create code that will recognize each word in the file. What do I need to do to get the program to read this file and to separate each word within that file?
You need to:
1) Read the file
2) Separate the words from what you've read in
I don't know what you want to do with them after that, so I'll just dump them into a big list.
The contents of my main method would be...
BufferedReader File = new BufferedReader(new FileReader(LOC)); // LOC is defined as a class variable
// Create an ArrayList to store the words
List<String> words = new ArrayList<String>();
String line;
String delimiter = "\t";
while ((line = File.readLine()) != null) { // read the file
    // separate the words; split() treats the delimiter as a regex, so watch out for that
    String[] wordsInLine = line.split(delimiter);
    // length is a field on arrays, not a method
    for (int i = 0, isize = wordsInLine.length; i < isize; i++) {
        words.add(wordsInLine[i]); // put them in a list
    }
}
File.close();
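Once the words are separated, the per-word total from the question could be computed along these lines. This is only a sketch: the class AsciiTotals and both method names are made up for illustration, and the sample line is hard-coded instead of being read from english1.txt.

```java
import java.util.ArrayList;
import java.util.List;

class AsciiTotals {
    // Splits one line on tabs; the delimiter is passed to split() as a regex.
    static List<String> wordsIn(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.split("\t")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    // Sums the char values of a word; for ASCII text these are the ASCII codes.
    static int asciiTotal(String word) {
        int total = 0;
        for (int i = 0; i < word.length(); i++) {
            total += word.charAt(i);
        }
        return total;
    }

    public static void main(String[] args) {
        for (String word : wordsIn("abc\tcat")) {
            System.out.println("The total value of " + word + " is " + asciiTotal(word));
        }
    }
}
```

The key difference from the code in the question is that the total is computed per word inside the loop, while the word is still available, rather than on the line variable after the read loop has finished (where it is null).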
You can use the split method of the String class:
String[] split(String regex)
This returns an array of strings that you can handle directly or transform into any other collection you might need.
I also suggest removing the @SuppressWarnings annotations unless you are sure what you are doing. In most cases it is better to remove the cause of a warning than to suppress it.
I used this great tutorial from thenewboston when I started off reading files: https://www.youtube.com/watch?v=3RNYUKxAgmw
This video seems perfect for you. It covers how to read the words from a file; you then just add each string to the ArrayList. Here's what your code should look like:
import java.io.*;
import java.util.*;

public class ReadFile {
    static Scanner x;
    static ArrayList<String> temp = new ArrayList<String>();

    public static void main(String args[]) {
        openFile();
        readFile();
        closeFile();
    }

    public static void openFile() {
        try {
            x = new Scanner(new File("yourtextfile.txt"));
        } catch (Exception e) {
            System.out.println(e);
        }
    }

    public static void readFile() {
        // next() returns one whitespace-delimited token (word) at a time
        while (x.hasNext()) {
            temp.add(x.next());
        }
    }

    // must be static to be callable from the static main method
    public static void closeFile() {
        x.close();
    }
}
One nice thing about java.util.Scanner is that it automatically skips the whitespace between words, which makes it easy to identify individual words.

Unable to output ArrayList contents

I'm reading a txt file which contains one word on each line, stripping the non-alphanumeric characters from each word, and storing the results in an ArrayList.
public class LeeArchivo
{
public static void main(String args[]) throws IOException
{
try
{
BufferedReader lector = null;
List<String> matrix = new ArrayList<String>();
lector = new BufferedReader(new FileReader("spanish.txt"));
String line = null;
while((line = lector.readLine())!=null)
{
//matrix.add(line);
matrix.add((line.split("[^a-zA-Z0-9']").toString()));
}
System.out.println(matrix);
System.out.println(matrix.size());
} catch (IOException e)
{
e.printStackTrace();
}
}
}
When I try to print the contents of the ArrayList, all I get is each String object's memory address. The funny thing is, if I don't split the line, i.e. I just call matrix.add(line), I get the Strings OK.
I've tried StringBuilders, Iterators, .toString but nothing works.
Can somebody help me to understand what's going on here?
Thanks.
The line.split("[^a-zA-Z0-9']") call returns a String array (String[]), not a String.
So you are not adding the word to your list; you are adding the result of calling toString() on the array object. Arrays do not override Object.toString(), so you get the type name and hash code (something like [Ljava.lang.String;@1b6d3586), which looks like a memory address.
If you need to get the whole string after splitting, you should concatenate all elements of the array, for example:
while((line = lector.readLine())!=null) {
String[] arr = line.split("[^a-zA-Z0-9']");
String res = "";
for(String s : arr) {
res += s;
}
matrix.add(res);
}
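The difference between the array's own toString() and its contents can be seen in a short sketch like the one below. The class SplitToString and the helper clean are hypothetical names; String.join is used here as a shorter stand-in for the concatenation loop above.

```java
import java.util.Arrays;

class SplitToString {
    // Equivalent to split-and-concatenate: drops every character outside the class.
    static String clean(String line) {
        return String.join("", line.split("[^a-zA-Z0-9']"));
    }

    public static void main(String[] args) {
        String[] parts = "hola, mundo!".split("[^a-zA-Z0-9']");
        System.out.println(parts.toString());        // type name and hash code, not the contents
        System.out.println(Arrays.toString(parts));  // readable form of the array elements
        System.out.println(clean("hola, mundo!"));   // pieces glued back together
    }
}
```

One thing to be aware of with spanish.txt: the class [^a-zA-Z0-9'] treats accented letters such as ñ or á as non-alphanumeric and removes them. A Unicode-aware class such as [^\\p{L}0-9'] is one possible alternative if those letters should be kept.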

reading from text file to string array

So I can search for a string in my text file; however, I want to sort the data in this ArrayList and implement an algorithm. Is it possible to read from a text file and store the values (Strings) in a String[] array?
Also, is it possible to separate the Strings? So instead of my array having:
[Alice was beginning to get very tired of sitting by her sister on the, bank, and of having nothing to do:]
is it possible to get an array like:
["Alice", "was", "beginning", "to", "get", ...]
public static void main(String[]args) throws IOException
{
Scanner scan = new Scanner(System.in);
String stringSearch = scan.nextLine();
BufferedReader reader = new BufferedReader(new FileReader("File1.txt"));
List<String> words = new ArrayList<String>();
String line;
while ((line = reader.readLine()) != null) {
words.add(line);
}
for(String sLine : words)
{
if (sLine.contains(stringSearch))
{
int index = words.indexOf(sLine);
System.out.println("Got a match at line " + index);
}
}
//Collections.sort(words);
//for (String str: words)
// System.out.println(str);
int size = words.size();
System.out.println("There are " + size + " Lines of text in this text file.");
reader.close();
System.out.println(words);
}
To split a line into an array of words, use this:
String[] words = sentence.split("[^\\w']+");
The regex [^\w']+ means "one or more characters that are neither a word character nor an apostrophe".
This keeps words with embedded apostrophes like "can't" and skips over all other punctuation.
Edit:
A comment has raised the edge case of parsing a quoted word such as 'this' as this.
Here's the solution for that - you have to first remove wrapping quotes:
String[] words = input.replaceAll("(^|\\s)'([\\w']+)'(\\s|$)", "$1$2$3").split("[^\\w']+");
Here's some test code with edge and corner cases:
public static void main(String[] args) throws Exception {
String input = "'I', ie \"me\", can't extract 'can't' or 'can't'";
String[] words = input.replaceAll("(^|[^\\w'])'([\\w']+)'([^\\w']|$)", "$1$2$3").split("[^\\w']+");
System.out.println(Arrays.toString(words));
}
Output:
[I, ie, me, can't, extract, can't, or, can't]
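Putting this together with the question, a sketch of turning the stored lines into one sorted String[] of words might look like this. The class WordArray and the method toWords are hypothetical, the two sample lines stand in for the contents of File1.txt, and the case-insensitive sort order is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordArray {
    // Splits every line into words and returns them all in one array.
    static String[] toWords(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String w : line.split("[^\\w']+")) {
                // split() yields a leading "" when the line starts with a separator
                if (!w.isEmpty()) words.add(w);
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Alice was beginning to get very tired of sitting by her sister on the",
                "bank, and of having nothing to do:");
        String[] words = toWords(lines);
        Arrays.sort(words, String.CASE_INSENSITIVE_ORDER);
        System.out.println(Arrays.toString(words));
    }
}
```

The empty-string check matters for lines that begin with punctuation or a quote, since split() only discards trailing empty strings, not leading ones.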
Also is it possible to separate the Strings?
Yes. You can split a string on whitespace and common punctuation like this:
String[] strSplit;
String str = "This is test for split";
strSplit = str.split("[\\s,;!?\"]+");
See String API
Moreover you can also read a text file word by word.
Scanner scan = null;
try {
scan = new Scanner(new BufferedReader(new FileReader("Your File Path")));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
while(scan.hasNext()){
System.out.println( scan.next() );
}
See Scanner API

Read each line in a text file for strings, doubles, and ints and place them in different arrays

This is my first question here, so I hope I'm doing this right. I have a programming project that needs to read each line of a tab-delimited text file and extract a string, double values, and int values. I'm trying to place these into separate arrays so that I can use them as parameters. This is what I have so far (aside from my methods):
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;
public class LoanDriver {
public static void main(String[] args)
{
String[] stringData = new String[9];
Scanner strings = null;
try
{
FileReader read = new FileReader("amounts.txt");//Read text file.
strings = new Scanner(read);
String skip = strings.nextLine();//Skip the first line by storing it in an uncalled variable
strings.useDelimiter("\t *");//Tab delimited
}
catch (FileNotFoundException error)
{}
while (strings.hasNext())
{
String readLine = strings.next();
stringData = readLine.split("\t");
}
}}
If I try to get the [0] value, it skips all the way to the bottom of the file and returns that value, so it works to some extent, but not from the top like it should. Also, I can't incorporate arrays into it because I always get an error that String[] and String is a type mismatch.
Instead of using a delimiter, try reading the file line by line using Scanner.nextLine and splitting each line you read using String.split (with "\t" as the argument).
Scanner strings = null;
try {
    FileReader read = new FileReader("amounts.txt"); // Read text file.
    strings = new Scanner(read);
    String skip = strings.nextLine(); // Skip the first line by storing it in an unused variable
}
catch (FileNotFoundException error) { }
// nextLine() throws NoSuchElementException at the end of the input instead of
// returning null, so check hasNextLine() before each read
while (strings != null && strings.hasNextLine()) {
    String line = strings.nextLine();
    String[] parts = line.split("\t");
    //...
}
You are getting the last value in the file when you grab stringData[0] because you overwrite stringData each time you go through the while loop. So the last value is the only one present in the array at the end. Try this instead:
List<String> values = new ArrayList<String>();
while (strings.hasNext()) {
values.add(strings.next());
}
stringData = values.toArray(new String[values.size()]);
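To get the strings, doubles and ints into separate collections, each line can be split on tabs and each column parsed with the matching method. The sketch below assumes a made-up column layout (name, then a double amount, then an int number of years), because the question does not show what amounts.txt contains. The class TabColumns and its field names are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.List;

class TabColumns {
    final List<String> names = new ArrayList<>();
    final List<Double> amounts = new ArrayList<>();
    final List<Integer> years = new ArrayList<>();

    // Parses one data line of the assumed form: name <TAB> amount <TAB> years.
    void addLine(String line) {
        String[] parts = line.split("\t");
        names.add(parts[0].trim());
        amounts.add(Double.parseDouble(parts[1].trim()));
        years.add(Integer.parseInt(parts[2].trim()));
    }

    public static void main(String[] args) {
        TabColumns cols = new TabColumns();
        // Sample lines standing in for the data rows of the file
        cols.addLine("Car loan\t15000.50\t5");
        cols.addLine("Mortgage\t250000\t30");
        System.out.println(cols.names + " " + cols.amounts + " " + cols.years);
    }
}
```

Lists are used instead of fixed-size arrays so the number of rows does not have to be known up front. They can be converted with toArray afterwards if arrays are required.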
