How do I access every single word in a file java?


I am trying to store every single word in this file in an array so I can apply my own language implementation to it. I have already applied split, but when I put the result into the variable parts, parts[0] displays the whole file instead of a single word, while parts[1] gives an error:
java.lang.ArrayIndexOutOfBoundsException: 2
How do I access every single word in this file?
String[] parts = line.split("\\s+");
System.out.print(parts[0] + '\n');
file test.snol contains
SNOL
INTO num IS 5
INTO res IS MULT num num
INTO res IS MULT res res
INTO res IS MOD res num
PRINT num
PRINT res
LONS

If you are using Java 8, you can do so in a single statement:
String[] words = Files.lines(Paths.get(PATH))
.flatMap(line -> Arrays.stream(line.split(" ")))
.toArray(String[]::new);
Alternatively, if you want each line as a String[] inside a list, you can use:
List<String[]> lines = Files.lines(Paths.get(PATH))
.map(e -> e.split(" "))
.collect(Collectors.toList());

The regular expression token for whitespace is \s. Your code uses a forward slash (/), which has no special meaning, instead of a backslash (\), so it is trying to match two forward slashes followed by one or more s characters.
In Java, regular expressions are passed as strings, so each backslash needs to be escaped by a second backslash (unlike a forward slash, which needs no special handling). Your regular expression should read "\\s+", which will match one or more whitespace characters.
Your call to split should then return an array with each word from the line as a different element.
If you are reading your file line by line, you can access every word with code like
BufferedReader reader = new BufferedReader(new FileReader("D:\\test.snol"));
String line;
while ((line = reader.readLine()) != null) {
String[] words = line.split("\\s+");
for (String word : words) {
System.out.println(word);
}
}
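As a quick check of the difference between the two patterns, here is a minimal sketch. The class name WhitespaceSplitDemo and the helper words are made up for illustration, and the sample line is taken from test.snol with one extra space added:

```java
import java.util.Arrays;

public class WhitespaceSplitDemo {
    // Splits one line into words on runs of whitespace.
    static String[] words(String line) {
        // trim() is an added precaution: it avoids a leading empty string
        // when the line starts with whitespace.
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String line = "INTO res IS  MULT num num";
        System.out.println(Arrays.toString(words(line))); // [INTO, res, IS, MULT, num, num]

        // "//s+" looks for two literal slashes followed by one or more 's',
        // so nothing matches and the whole line comes back as one element.
        System.out.println(line.split("//s+").length); // 1
    }
}
```

With the single-element result, parts[1] does not exist, which is consistent with the ArrayIndexOutOfBoundsException in the question.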

Related

Take string input, parse each word to all lowercase and print each word on a line, non-alphabetic characters are treated as a break between words

I'm trying to take a string input, parse each word to all lowercase and print each word on a line (in sorted order), ignoring non-alphabetic characters (single letter words count as well). So,
Sample input:
Adventures in Disneyland
Two blondes were going to Disneyland when they came to a fork in the
road. The sign read: "Disneyland Left."
So they went home.
Output:
a
adventures
blondes
came
disneyland
fork
going
home
in
left
read
road
sign
so
the
they
to
two
went
were
when
My program:
Scanner reader = new Scanner(file);
ArrayList<String> words = new ArrayList<String>();
while (reader.hasNext()) {
String word = reader.next();
if (word != "") {
word = word.toLowerCase();
word = word.replaceAll("[^A-Za-z ]", "");
if (!words.contains(word)) {
words.add(word);
}
}
}
Collections.sort(words);
for (int i = 0; i < words.size(); i++) {
System.out.println(words.get(i));
}
This works for the input above, but prints the wrong output for an input like this:
a t\|his# is$ a)( -- test's-&*%$#-`case!#|?
The expected output should be
a
case
his
is
s
t
test
The output I get is
*a blank line is printed first*
a
is
testscase
this
So, my program obviously doesn't work since scanner.next() takes in characters until it hits a whitespace and considers that a string, whereas anything that is not a letter should be treated as a break between words. I'm not sure how I might be able to manipulate Scanner methods so that breaks are considered non-alphabetic characters as opposed to whitespace, so that's where I'm stuck right now.

The other answer has already mentioned some issues with your code.
I suggest another approach to address your requirements. Such transformations are a good use case for Java Streams, which often yield clean code:
List<String> strs = Arrays.stream(input.split("[^A-Za-z]+"))
.map(t -> t.toLowerCase())
.distinct()
.sorted()
.collect(Collectors.toList());
Here are the steps:
Split the string on runs of one or more non-alphabetic characters:
input.split("[^A-Za-z]+")
This yields tokens consisting solely of alphabetic characters.
Stream over the resulting array using Arrays.stream();
Map each element to its lowercase equivalent:
.map(t -> t.toLowerCase())
The default locale is used. Use toLowerCase(Locale) to explicitly set the locale.
Discard duplicates using Stream.distinct().
Sort the elements within the stream by simply calling sorted();
Collect the elements into a List with collect().
If you need to read it from a file, you could use this:
Files.lines(filepath)
.flatMap(line -> Arrays.stream(line.split("[^A-Za-z]+")))
.map(... // Et cetera
But if you need to use a Scanner, then you could be using something like this:
Scanner s = new Scanner(input)
.useDelimiter("[^A-Za-z]+");
List<String> parts = new ArrayList<>();
while (s.hasNext()) {
parts.add(s.next());
}
And then
List<String> strs = parts.stream()
.map(... // Et cetera
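Putting the Scanner variant together, a self-contained sketch might look like the following. The class and method names (SortedWords, sortedWords) are hypothetical, and Locale.ROOT is my own choice for the lowercase step rather than something taken from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;
import java.util.stream.Collectors;

public class SortedWords {
    // Returns the distinct lowercase words of the input in sorted order,
    // treating every non-alphabetic character as a word break.
    static List<String> sortedWords(String input) {
        List<String> parts = new ArrayList<>();
        try (Scanner s = new Scanner(input).useDelimiter("[^A-Za-z]+")) {
            while (s.hasNext()) {
                parts.add(s.next());
            }
        }
        return parts.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The problematic input from the question (backslash escaped for Java).
        String input = "a t\\|his# is$ a)( -- test's-&*%$#-`case!#|?";
        System.out.println(sortedWords(input)); // [a, case, his, is, s, t, test]
    }
}
```

To read from a file instead, new Scanner(file) with the same useDelimiter call should behave the same way.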

Don't use == or != for comparing String(s). Also, perform your transform before you check for empty. This,
if (word != "") {
word = word.toLowerCase();
word = word.replaceAll("[^A-Za-z ]", "");
if (!words.contains(word)) {
words.add(word);
}
}
should look something like
word = word.toLowerCase().replaceAll("[^a-z ]", "").trim();
if (!word.isEmpty() && !words.contains(word)) {
words.add(word);
}
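To see why == and != are unreliable for this check, consider the small sketch below (StringCompareDemo and isEmptyToken are illustrative names only). Two strings with the same characters can still be different objects:

```java
public class StringCompareDemo {
    // Content-based check for an empty token.
    static boolean isEmptyToken(String token) {
        return token.isEmpty();
    }

    public static void main(String[] args) {
        // A distinct String object that happens to contain no characters.
        String token = new String("");

        System.out.println(token != "");         // true: compares references, not contents
        System.out.println(token.equals(""));    // true: compares contents
        System.out.println(isEmptyToken(token)); // true
    }
}
```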

How can I split String array with following delimiters in java

I have a line in an input file.
It is arranged as follows (example):
(space)MOV(space)A,(space)(space)#20
When the program reads this line, I plan to split() the string and add the parts to an array. I use the following code for this:
while((nline = bufreader.readLine()) != null)
{
String[] array = nline.split("[ ,]");
In other words, the string is split on the delimiters (space) and (comma). So I expect my array to have a length of 3, but in practice I get 6.
So, as I understand it, the computer creates the array {"(space)", "MOV", "(space)", "A", "(space)", "(space)", "#20"}. However, I need this array: {"MOV", "A", "#20"}
How can I get this? Or how can I split the string according to the above-mentioned delimiters? (I suppose that nline.split("[ ,]") is not correct.)

I put the explanations in comments on the relevant lines; have a look at this:
String nline;
BufferedReader bufreader = new BufferedReader(new FileReader(new File("nameOfYourFile")));
while((nline = bufreader.readLine()) != null) {
String trimmed = nline.trim(); // removing leading and trailing spaces
// System.out.println(trimmed); Output from this line: >>MOV A, #20<< (">>" and "<<" just to show where it begins and ends)
String[] splitted = trimmed.split("[ ,]+"); // split on runs of ' ' or ',' (so ", " counts as one delimiter); inside [...] a '|' would be a literal pipe, not "or"
System.out.println(Arrays.toString(splitted)); // Output: [MOV, A, #20]
}
bufreader.close();
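The trim-then-split idea can be checked on the sample line without a file. This is only a sketch: OperandSplit and tokens are hypothetical names, and it uses the character class [ ,]+ to split on runs of spaces and commas:

```java
import java.util.Arrays;

public class OperandSplit {
    // Trims the line, then splits on runs of spaces and/or commas.
    static String[] tokens(String line) {
        return line.trim().split("[ ,]+");
    }

    public static void main(String[] args) {
        String line = " MOV A,  #20"; // (space)MOV(space)A,(space)(space)#20

        System.out.println(Arrays.toString(tokens(line))); // [MOV, A, #20]

        // Without trim(), the leading space produces an empty first element.
        System.out.println(line.split("[ ,]+").length); // 4
    }
}
```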

Transferring each element in a text file into an array

I have made this method to take in a file.txt and transfer its elements into an array list.
My problem is that I don't want to transfer a whole line into one string. I want to take each element on the line as a string.
public ArrayList<String> readData() throws IOException {
FileReader pp=new FileReader(filename);
BufferedReader nn=new BufferedReader(pp);
ArrayList<String> data=new ArrayList<String>();
String line;
while((line=nn.readLine()) != null){
data.add(line);
}
xoxo.close();
return data;
}
Is it possible?

What about reading the lines, but splitting each line into the single words?
while ((line = nn.readLine()) != null) {
for (String word : line.split(" ")) {
data.add(word);
}
}
The method split(" ") in this example splits the line at each space character " " and puts the individual words into an array.
In case the words in the file are separated by another character (like a comma for example) you can use that too in split():
line.split(",");
If I may, here is a somewhat easier way to read a text file:
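Scanner scanner = new Scanner(filename); // filename must be a File or Path here; for a String name use new Scanner(new File(filename)), otherwise the name itself is scanned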
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
for (String word : line.split(" ")) {
data.add(word);
}
}
Well, not easier, but shorter :)
And one last piece of advice: if you give your variables more readable names, like bufferedReader instead of nn, pp and xoxo, you might have fewer problems when the code grows more complex later on.
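For completeness, here is a hedged sketch that collects every word into a list using Scanner.next(), which returns one whitespace-delimited token at a time. The names WordReader and readWords are invented for this example, and the temporary file only stands in for file.txt from the question:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WordReader {
    // Collects every whitespace-separated token from the source into a list.
    static List<String> readWords(Readable source) {
        List<String> data = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                data.add(scanner.next());
            }
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Throwaway file so the example is runnable anywhere.
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, "one two\nthree  four".getBytes(StandardCharsets.UTF_8));
        try (BufferedReader reader = Files.newBufferedReader(tmp)) {
            System.out.println(readWords(reader)); // [one, two, three, four]
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Because the default Scanner delimiter is any run of whitespace, repeated spaces and line breaks do not produce empty entries.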

Use split function for String.
String line = "This is line";
String [] a = line.split("\\s");// \\s is regular expression for space
a[0] = This
a[1] = is
a[2] = line

If by 'element' you mean each word, note that changing
line = nn.readLine()
to
line = nn.read()
will not fix your problem: BufferedReader.read() returns a single character as an int, not the characters up to the next space. To get one word at a time, split each line as shown in the other answers, or use Scanner.next(), which returns the next whitespace-delimited token. If by element you mean each character, read each word and break it up with one of the methods Java provides, such as String.toCharArray().

regular expression for extracting some data from a text file

I have a text with sentences in this format:
sentence 1 This is a sentence.
t-extraction 1 This is a sentence
s-extraction 1 This_DT is_V a_DT sentence_N
sentence 2 ...
As you can see, the lines are separated by newlines. The words sentence, t-extraction and s-extraction repeat. The numbers are the sentence numbers (1, 2, ...). The phrases are separated by tabs, for example in the first line: sentence(Tab)1(Tab)This is a sentence.
or in the second line: t-extraction(Tab)1(Tab)This(Tab)is(Tab)a sentence.
I need to map some of this information into a SQL table, so I need to extract it.
I need the first and second lines (without the word sentence in the first lines, and without t-extraction and the numbers in the second lines). Each tab-separated part will be mapped to a field in SQL (for example 1 in one column, This is a sentence in one column, This (in the second lines) in one column, and likewise is and a sentence).
What is your suggestion? Thanks in advance.

You could use String.split().
The regex you could use is [^A-Za-z_]+ or [ \t]+

Using the split method on String is probably the key to this. The split command breaks a string into parts where the regex matches, returning an array of Strings of the parts between the matches.
You want to match on tab (written \t). You also want to process three lines as a unit; the code below shows one way of doing this (it does depend on the file being well formed).
The reader here is created from your file rather than from a string.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Arrays;

public class Test {
public static void main(String[] args) throws Exception {
BufferedReader reader = new BufferedReader(new FileReader("/my/file.data"));
String line = null;
for(int i = 0; (line = reader.readLine()) != null; i++){
if(i % 3 == 0){
String[] parts = line.split("\t");
System.out.printf("sentence ==> %s\n", Arrays.toString(parts));
} else if(i % 3 == 1){
String[] parts = line.split("\t");
System.out.printf("t-sentence ==> %s\n", Arrays.toString(parts));
} else {
String[] parts = line.split("\t");
System.out.printf("s-sentence ==> %s\n", Arrays.toString(parts));
}
}
}
}
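For the SQL mapping, each tab-separated part after the leading label would become one column value. The sketch below shows that step on two lines in the format from the question; TabFields and columns are hypothetical names, and the actual database insert is left out:

```java
import java.util.Arrays;

public class TabFields {
    // Returns the tab-separated parts of a line with the leading label dropped.
    static String[] columns(String line) {
        String[] parts = line.split("\t");
        return Arrays.copyOfRange(parts, 1, parts.length);
    }

    public static void main(String[] args) {
        String first = "sentence\t1\tThis is a sentence.";
        String second = "t-extraction\t1\tThis\tis\ta sentence";

        System.out.println(Arrays.toString(columns(first)));  // [1, This is a sentence.]
        System.out.println(Arrays.toString(columns(second))); // [1, This, is, a sentence]
    }
}
```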

Java regex, delete content to the left of comma

I got a string with a bunch of numbers separated by "," in the following form :
1.2223232323232323,74.00
I want them in a String[], but I only need the number to the right of the comma (74.00). The list has about 10,000 different lines like the one above. Right now I'm using String.split(",") which gives me:
System.out.println(String[1]) =
1.2223232323232323
74.00
Why does it not split into two different indexes? I thought it should be like this on split:
System.out.println(String[1]) = 1.2223232323232323
System.out.println(String[2]) = 74.00
But String[] array = string.split(",") produces one index with both values separated by a newline.
And I only need 74.00. I assume I need to use a regex, which is kind of Greek to me. Could someone help me out? :)

If it's in a file:
Scanner sc = new Scanner(new File("..."));
sc.useDelimiter("(\r?\n)?.*?,");
while (sc.hasNext())
System.out.println(sc.next());
If it's all one giant string, separated by new-lines:
String oneGiantString = "1.22,74.00\n1.22,74.00\n1.22,74.00";
Scanner sc = new Scanner(oneGiantString);
sc.useDelimiter("(\r?\n)?.*?,");
while (sc.hasNext())
System.out.println(sc.next());
If it's just a single string for each:
String line = "1.2223232323232323,74.00";
System.out.println(line.replaceFirst(".*?,", ""));
Regex explanation:
(\r?\n)? means an optional new-line character.
. means a wildcard.
.*? means 0 or more wildcards (*? as opposed to just * means non-greedy matching, but this probably doesn't mean much to you).
, means, well, ..., a comma.
Reference.
split for file or single string:
String line = "1.2223232323232323,74.00";
String value = line.split(",")[1];
split for one giant string (also needs regex) (but I'd prefer Scanner, it doesn't need all that memory):
String line = "1.22,74.00\n1.22,74.00\n1.22,74.00";
String[] array = line.split("(\r?\n)?.*?,");
for (int i = 1; i < array.length; i++) // the first element is empty
System.out.println(array[i]);

Just try with:
String[] parts = "1.2223232323232323,74.00".split(",");
String value = parts[1]; // your 74.00

String[] strings = "1.2223232323232323,74.00".split(",");
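Since the question mentions about 10,000 such lines, here is a minimal sketch that applies the same split to every line of a multi-line string. RightOfComma and rightValues are names I made up; note that Java arrays start at index 0, so the two halves are parts[0] and parts[1]:

```java
import java.util.ArrayList;
import java.util.List;

public class RightOfComma {
    // Returns the value to the right of the comma for every "left,right" line.
    static List<String> rightValues(String text) {
        List<String> values = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            String[] parts = line.split(",");
            if (parts.length == 2) { // skip blank or malformed lines
                values.add(parts[1]);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "1.2223232323232323,74.00\n1.5,80.25";
        System.out.println(rightValues(text)); // [74.00, 80.25]
    }
}
```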
