Searching a txt file for specific characters (Java)

I have a big .txt file (a dictionary) which contains about 100k+ words, ordered like this:
tree trees asderi 12
car cars asdfei 123
mouse mouses dasrkfi 333
plate plates asdegvi 333
......
(PS: there are no empty rows in between.)
What I want to do is check the third column (asderi in the first row) and, if that word contains the letters "i" and "e", copy the first word of the row (tree in this case) to a new .txt file. I don't need a whole solution, just an example of how to read the third word, check its letters and, if the check is true, print the first word of that line.

With big data files you want to process the file line by line rather than reading all of it into memory. You may want to start with this:
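import java.io.BufferedReader;
import java.io.FileReader;

// inside a method that declares "throws IOException":
try (BufferedReader br = new BufferedReader(new FileReader("C:/sample/sample.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // process the line
    }
} // try-with-resources closes the reader, even if an exception is thrown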
Once you have the line, you can use the common String methods such as indexOf(...), substring(...) and split(...) to extract the data you want, especially since the source file seems to contain well-structured data.
Assuming your "columns" are always separated by a single space, no column value ever contains a space, and no column is ever missing, you can get the columns with split like this:
// this will be the current line of the file
String s = "tree trees asderi 12";
String[] fragments = s.split(" ");
String firstColumn = fragments[0];
String thirdColumn = fragments[2];
// the question asks for words containing both "i" and "e"
boolean hasIAndE = thirdColumn.contains("i") && thirdColumn.contains("e");
System.out.println("Fragment: " + thirdColumn + " contains i and e: " + hasIAndE
        + ", that's why I want the first fragment: " + firstColumn);
In the end, you will have to experiment a bit with the String methods to put it all together, especially for all the special cases this file will probably bring up. ;)
You can update your question with the code you manage to write using these hints, and ask again if you get stuck.
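Putting those hints together, here is a minimal sketch of the whole task: read the dictionary line by line, check the third column for both "i" and "e", and write the first column to a new file. The class name, the method name matchFirstWord and the default file names dictionary.txt and matches.txt are hypothetical; splitting on "\\s+" after trim() is an assumption that makes it tolerate repeated spaces, and lines with fewer than three columns are simply skipped.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FilterByThirdColumn {

    // Returns the first column if the third column contains both 'i' and 'e', otherwise null.
    static String matchFirstWord(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length < 3) {
            return null; // malformed or empty line: skip it
        }
        String third = cols[2];
        boolean hasI = third.indexOf('i') >= 0;
        boolean hasE = third.indexOf('e') >= 0;
        return (hasI && hasE) ? cols[0] : null;
    }

    public static void main(String[] args) throws IOException {
        // hypothetical default file names; pass real paths as arguments
        Path in = Paths.get(args.length > 0 ? args[0] : "dictionary.txt");
        Path out = Paths.get(args.length > 1 ? args[1] : "matches.txt");
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) { // one line at a time, never the whole file
                String word = matchFirstWord(line);
                if (word != null) {
                    writer.write(word);
                    writer.newLine();
                }
            }
        }
    }
}
```

With the four sample rows from the question, tree, car and plate would be written; mouse would not, because dasrkfi has no "e".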

Related

Separating elements in a string by white space into two dimensional array

I am trying to store the following strings from a file in a two-dimensional array. The code I have written works, except that when an element contains a space, it gets split into an additional element. Here is my file:
Student1 New York
Student 2 Miami
Student3 Chicago
So I would want my output to look like this:
[Student1] [New York]
[Student 2] [Miami]
[Student3] [Chicago]
This is my actual output:
[Student1] [New] [York]
[Student] [2] [Miami]
[Student3] [Chicago]
Here is what I've written so far:
String file= "file.txt";
BufferedReader br = new BufferedReader(new FileReader(file));
while ((file = br.readLine()) != null) {
if (!file.isEmpty()) {
String strSingleSpace = file.trim().replaceAll("\\s+", " ");
String[] obj = strSingleSpace.trim().split("\\s+");
int i=0;
String[][] newString = new String[obj.length][];
for(String temp : obj){
newString[i++]=temp.trim().split("\\s+");
}
List<String[]> yourList = Arrays.asList(newString);
System.out.println(yourList.get(0)[0] + " " + yourList.get(1)[0]);
}
}
Just some food for thought: your code treats all lines the same way, as if they all looked exactly alike, although you already made it very clear that some lines have a different format.
In other words: there is no point in blindly splitting on spaces if spaces sometimes belong to the first or the second column.
Instead:
Determine the index of the last digit in a line; everything up to and including that index makes up the first column.
The rest of the line (after that last digit) goes into the second column; just call trim() on that remaining string to get rid of the leading space.
You could also do all of that with a single regular expression, but as this is probably some kind of homework, I leave that as an exercise for the reader.
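Here is a minimal sketch of the "split at the last digit" idea, assuming (as in the sample file) that the first column always ends with a digit and the second column never contains one. The class and method names are hypothetical, and the lines are hard-coded stand-ins for what you would read from file.txt (for example with Files.readAllLines).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitAtLastDigit {

    // Splits a line into two columns right after its last digit.
    // Assumes the first column ends with a digit and the second column has none.
    static String[] splitRow(String line) {
        int last = -1;
        for (int i = 0; i < line.length(); i++) {
            if (Character.isDigit(line.charAt(i))) {
                last = i; // remember the position of the most recent digit
            }
        }
        if (last < 0) {
            return new String[] { line.trim() }; // no digit: keep the whole line as one column
        }
        String first = line.substring(0, last + 1).trim();
        String second = line.substring(last + 1).trim(); // trim() removes the leading space
        return new String[] { first, second };
    }

    // Builds the two-dimensional array row by row, skipping blank lines.
    static String[][] toTable(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                rows.add(splitRow(line));
            }
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Student1 New York", "Student 2 Miami", "Student3 Chicago");
        for (String[] row : toTable(lines)) {
            System.out.println("[" + String.join("] [", row) + "]");
        }
    }
}
```

For the three sample lines this prints [Student1] [New York], [Student 2] [Miami] and [Student3] [Chicago], which is the desired output from the question.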
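For your specific test case, you could try limiting the split to at most two parts. Change this line:
String[] obj = strSingleSpace.trim().split("\\s+");
to this:
String[] obj = strSingleSpace.trim().split("\\s+", 2);
(A limit of 1 would return the whole line unsplit, and a line like "Student 2 Miami" is still split in the wrong place, so this only covers lines whose first column has no space.)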

Setting two different text files as separate string arrays and finding matches from the two arrays in Java

So basically I'm trying to take two text files (one with many jumbled words and one with many dictionary words). I am supposed to take these two text files and convert them into two separate arrays.
Following that, I need to compare the jumbled strings from the first array and match each one to the dictionary word in the second array that is its unjumbled counterpart (e.g. aannab in the first array matches banana in the second array).
I know how to set one array from a string, however I don't know how to do two from two separate text files.
Use a HashMap for matching, where the data from the first text file are the keys of the Map and the data from the second text file are the values. Then you can get the matching value by its key.
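A minimal sketch of the HashMap suggestion, with one twist that is an assumption of this sketch rather than part of the answer: a jumbled word can't be looked up directly, so the map is keyed by each word's letters in sorted order (a common anagram-grouping technique), which makes "aannab" and "banana" share the key "aaabnn". The class and method names are hypothetical, and the arrays stand in for the two text files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnagramIndex {

    // Key = the word's letters in sorted order, e.g. "banana" -> "aaabnn".
    static String signature(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Reads the dictionary once: signature -> every dictionary word with those letters.
    static Map<String, List<String>> buildIndex(String[] dict) {
        Map<String, List<String>> index = new HashMap<>();
        for (String word : dict) {
            index.computeIfAbsent(signature(word), k -> new ArrayList<>()).add(word);
        }
        return index;
    }

    // One map lookup per jumble instead of a scan over the whole dictionary.
    static List<String> lookup(Map<String, List<String>> index, String jumble) {
        return index.getOrDefault(signature(jumble), new ArrayList<>());
    }

    public static void main(String[] args) {
        String[] dict = {"banana", "cat", "act", "dog"}; // stand-in for dict.txt
        String[] jumbles = {"aannab", "tca", "xyz"};     // stand-in for jumbles.txt
        Map<String, List<String>> index = buildIndex(dict);
        for (String jumble : jumbles) {
            System.out.println(jumble + " -> " + lookup(index, jumble));
        }
    }
}
```

Keying by the sorted letters also handles jumbles that match several dictionary words (here "tca" matches both cat and act).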
You can read each file into an array like this:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

String[] readFile(String filename) throws IOException {
    List<String> stringList = new ArrayList<>();
    // try-with-resources: br is in scope for the whole block and is always closed
    try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
        String line;
        while ((line = br.readLine()) != null) {
            stringList.add(line);
        }
    }
    return stringList.toArray(new String[0]);
}
Next, try to do the matching:
String[] jumbles = readFile("jumbles.txt");
String[] dict = readFile("dict.txt");
for (String jumble : jumbles) {
    for (String word : dict) {
        // can only be a match if the same length
        if (jumble.length() == word.length()) {
            // next, loop through each letter of jumble and see if it
            // appears in word
        }
    }
}
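One way to fill in the empty loop body is sketched below. Checking only whether each letter "appears" in the word would wrongly accept pairs like "aab" and "abb", so this sketch (an assumption, not the answer's own code) counts how often each letter occurs in the jumble and uses those counts up while walking through the word. The class name JumbleMatch and the method isAnagram are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class JumbleMatch {

    // True if word uses exactly the same letters as jumble, each the same number of times.
    static boolean isAnagram(String jumble, String word) {
        if (jumble.length() != word.length()) {
            return false; // can only be a match if the same length
        }
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : jumble.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // count each letter of the jumble
        }
        for (char c : word.toCharArray()) {
            Integer n = counts.get(c);
            if (n == null || n == 0) {
                return false; // word has a letter the jumble lacks, or has it too often
            }
            counts.put(c, n - 1);
        }
        return true; // equal lengths + no count went negative => all counts are zero
    }

    public static void main(String[] args) {
        String[] jumbles = {"aannab", "tac"};
        String[] dict = {"banana", "cat", "act", "dog"};
        for (String jumble : jumbles) {
            for (String word : dict) {
                if (isAnagram(jumble, word)) {
                    System.out.println(jumble + " -> " + word);
                }
            }
        }
    }
}
```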
I know how to set one array from a string, however I don't know how to do two from two separate text files
I would encourage you to divide your problem into what you don't know and what you know.
Search the internet for the parts you don't know; you will find lots of ways to do them.
Then search for the parts you do know, to explore whether they can be done in a better way.
To help you here:
What you don't know:
Reading a file in Java.
Processing the content of the file you read.
What you know:
String to array representation (search whether there are better ways for your use case).
Combine both :-)

Start reading the file after a specific word

I have a text file with some information in it, which looks something like the one displayed below.
I'm supposed to read the file after a specific phrase occurs (Complete Population) and store the vertically aligned values of each line in an array (an ArrayList would work too).
What the file looks like:
Tue May 14 08:27:25 EST 2013
mating_pool=80
mutation_dist=3
algo_name=ARMO
Complete Population
8.78792396E8 7.45689508E8 8.37899916E8 9.52778502E8 8.47061622E8
8.80017166E8 7.50224432E8 8.23658404E8 9.51664198E8 8.49145008E8
8.85724416E8 7.48191542E8 7.61295532E8 1.00892758E9 8.52389824E8
8.96069156E8 7.11234404E8 7.68007126E8 9.7238065E8 8.5759227E8
8.96193522E8 7.11177522E8 7.67777526E8 9.72449466E8 8.5763106E8
8.95546766E8 7.1112849E8 7.68311754E8 9.71998374E8 8.57960886E8
8.95480802E8 7.11023308E8 7.68223532E8 9.72097758E8 8.5803376E8
8.9549393E8 7.11015392E8 7.68194136E8 9.72079838E8 8.5804897E8
8.95467666E8 7.11364074E8 7.68318732E8 9.7189094E8 8.58053462E8
8.95574386E8 7.11095656E8 7.68187948E8 9.71985272E8 8.58095624E8
8.95390774E8 7.11052654E8 7.684207E8 9.72098718E8 8.58105648E8
What I have tried
I'm only able to read one line of the numbers, and I'm not sure how to add the numbers vertically.
Any help is appreciated.
Well, there actually is no issue here. You just need to code it.
There are some nice pieces of code in this thread.
Do something like this:
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
        if (line.contains("Complete Population")) {
            break; // marker found: stop skipping header lines
        }
    }
    // we reached the section with numbers
    while ((line = br.readLine()) != null) {
        // use String.split to split the line, then convert
        // the values to double and process them
    }
} // try-with-resources closes br
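If "add numbers vertically" means summing each column (an interpretation, not something the question confirms), the skeleton above could be completed roughly like this. The class and method names are hypothetical; Double.parseDouble accepts the scientific notation in the file (e.g. 8.78792396E8), and rows of different lengths are tolerated by growing the list of sums as needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ColumnSums {

    // Skips everything up to and including the marker line, then sums each column.
    static double[] sumColumnsAfterMarker(BufferedReader br, String marker) throws IOException {
        String line;
        while ((line = br.readLine()) != null && !line.contains(marker)) {
            // header line: ignore it
        }
        List<Double> sums = new ArrayList<>();
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] parts = line.split("\\s+");
            for (int col = 0; col < parts.length; col++) {
                double value = Double.parseDouble(parts[col]); // handles "8.78792396E8"
                if (col < sums.size()) {
                    sums.set(col, sums.get(col) + value);
                } else {
                    sums.add(value); // first value seen in this column
                }
            }
        }
        double[] result = new double[sums.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = sums.get(i);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // A shortened stand-in for the file; use new FileReader(path) for the real one.
        String sample = "algo_name=ARMO\nComplete Population\n"
                + "8.78792396E8 7.45689508E8\n8.80017166E8 7.50224432E8\n";
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            double[] sums = sumColumnsAfterMarker(br, "Complete Population");
            for (int i = 0; i < sums.length; i++) {
                System.out.println("column " + i + ": " + sums[i]);
            }
        }
    }
}
```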
Use a BufferedReader to wrap a FileReader on the file, and then use readLine() to read each line.
Create a Pattern object with the regex ".*Complete Population.*", and use a Matcher on that Pattern to check each line (looping as long as the BufferedReader's readLine() doesn't return null, since null indicates that the end of the file has been reached).
When a line matches, begin processing subsequent lines to form arrays.
I'm not sure what you mean by "the vertically aligned values", but if you mean the space-separated values on each line as an array, call split("\\s+") on each line to split it on whitespace; it returns an array of Strings.
If by vertical arrays, you mean the first elements on each of the lines, then the second elements on each of the lines, and so on:
You can store the String arrays that split("\\s+") returns for each line together as a 2-D array, by placing each one into a main array that holds them all (an array of per-line String arrays). When the whole file has been read and the end of file is reached, go back to this 2-D array and access element [0] of each line to get the first items, element [1] of each line to get the second items, and so on. If you want, you can store these effectively vertical lists of items in another set of arrays.
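The approach just described (a Pattern to find the marker line, split("\\s+") per line, then reading element [c] of every row) can be sketched as follows. The class and method names are hypothetical, and toColumns assumes every row has the same number of values, as in the sample file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class VerticalColumns {

    private static final Pattern MARKER = Pattern.compile(".*Complete Population.*");

    // Collects every non-blank line after the marker, split on whitespace (one String[] per line).
    static List<String[]> readRows(BufferedReader br) throws IOException {
        List<String[]> rows = new ArrayList<>();
        boolean inSection = false;
        String line;
        while ((line = br.readLine()) != null) { // null means end of file
            if (!inSection) {
                inSection = MARKER.matcher(line).matches(); // start collecting after this line
            } else if (!line.trim().isEmpty()) {
                rows.add(line.trim().split("\\s+"));
            }
        }
        return rows;
    }

    // columns[c] holds element [c] of every row; assumes all rows have the same length.
    static String[][] toColumns(List<String[]> rows) {
        if (rows.isEmpty()) {
            return new String[0][];
        }
        int width = rows.get(0).length;
        String[][] columns = new String[width][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            for (int c = 0; c < width; c++) {
                columns[c][r] = rows.get(r)[c];
            }
        }
        return columns;
    }

    public static void main(String[] args) throws IOException {
        String sample = "algo_name=ARMO\nComplete Population\n1 2 3\n4 5 6\n"; // stand-in data
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            for (String[] column : toColumns(readRows(br))) {
                System.out.println(String.join(" ", column));
            }
        }
    }
}
```

For the stand-in data this prints one column per line: 1 4, then 2 5, then 3 6. Each column can then be converted with Double.parseDouble if numeric values are needed.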

Text searching with line number complication

EDIT:
Thanks dawww, the problem was the encoding. I changed it to UTF-8, and now the program works perfectly well, just a tad slow.
I am in desperate need of help.
THE PROBLEM:
I have a TreeSet of words I took out of a text; they're all lower case and follow this regex: "[^a-zA-Z]". What I need is to compare each word of the TreeSet with the text I took them from, get the line numbers where each word appears, store them in an ArrayList and return it.
I have the following Code:
public ArrayList<Integer> search(String word, String book) throws FileNotFoundException, IOException{
FileReader path = new FileReader(book);
LineNumberReader read = new LineNumberReader(path);
ArrayList<Integer> lines = new ArrayList<>();
String line;
for(line = read.readLine(); line != null; line = read.readLine()){
if(line.toLowerCase().contains(word)){
lines.add(read.getLineNumber());
}
}
return lines;
}
The idea is to use the search method's return value as the value in a Map<String, ArrayList<Integer>> (each word and its lines),
like this:
for(String s : words){
map.put(s, search(s , book));
}
words is the TreeSet with the strings I took from the text (Alice in Wonderland by Lewis Carroll).
The code doesn't work, and I don't know why. It compiles and runs, but the map is empty.
To check whether a line contains a word case-insensitively, you can use the Apache Commons Lang library, specifically this method: StringUtils.containsIgnoreCase(CharSequence str, CharSequence searchStr).
This library also has other helpful utility methods; for example, strip and trim are useful for cleaning Strings before operating on them.
Another problem could be the encoding of the file. The FileReader(String) constructor always uses the platform default encoding. Try using new InputStreamReader(new FileInputStream(filePath), <encoding>) to read from the file.
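Applying that encoding advice to the question's search method could look like the sketch below. UTF-8 is an assumption here (it is what the asker eventually switched to); the class name LineSearch is hypothetical, and lower-casing both sides is an extra safety step rather than part of this answer. try-with-resources also closes the reader, which the original method never did.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineSearch {

    // Returns the 1-based numbers of every line containing word (case-insensitive).
    static List<Integer> search(String word, String book) throws IOException {
        List<Integer> lines = new ArrayList<>();
        String needle = word.toLowerCase();
        // explicit charset instead of FileReader's platform default
        try (LineNumberReader reader = new LineNumberReader(
                new InputStreamReader(new FileInputStream(book), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.toLowerCase().contains(needle)) {
                    // after readLine(), getLineNumber() is the number of the line just read
                    lines.add(reader.getLineNumber());
                }
            }
        }
        return lines;
    }
}
```

Note that contains() matches substrings, so searching for "cat" would also report lines containing "category".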
Remember that the contains method is case-sensitive,
and you are converting the line to lower case with line.toLowerCase().
It may not be matching because of that.
Please add a System.out.print statement for line.toLowerCase() and word to check:
System.out.print(line.toLowerCase()+" "+word);
If that is the case, the solution is to lower-case the word in the if condition as well:
if(line.toLowerCase().contains(word.toLowerCase())){
lines.add(read.getLineNumber());
}

Comparing data from 2 files by tokenization

I'm reading 2 files: one named myFile, and the other named dictionary.
In dictionary, there are two values for each word.
So I read the sentences in myFile and tokenize them to look up the values for each word.
My code is as follows:
while ((text = file.readLine()) != null){//read myFile content line by line
ArrayList<String> content = new ArrayList<String>();
StringTokenizer str = new StringTokenizer(text);//split line content
while (str.hasMoreTokens()) {
String token = str.nextToken();
content.add(token);
}//create an array to store the content of line
//define subjective of each line
boolean subjective = false;
//compare from file content with SentiWordNet
for (int i=0; i<content.size(); i++){
String cont = content.get(i);
while((line = csv.readLine()) != null)
{
//read line from SentiWordNet
String[] data = line.split("\t");
//read data SentiWordnet
String sentiWord = data[4];
if (sentiWord.contains(cont)){
if (!data[2].equals("0") || !data[3].equals("0")) // compare Strings with equals(), not !=
subjective = true;
}
}
}
System.out.println(subjective);
}
file is myFile with the sentences, and csv is the dictionary.
The problem is that only the first token in myFile gets compared; the others do not.
Any idea how to solve?
It looks like you are not closing (and reopening) the dictionary. This line of code:
while((line = csv.readLine()) != null)
will start returning null once you reach the end of the dictionary the first time (i.e. for the first word in myFile). For every subsequent word, the loop ends immediately, because you haven't closed and reopened the file.
EDIT:
Looking at your code, you are trying to determine whether a word is subjective by reading each sentence in myFile, looping over each word in the sentence and, for each such word, reading dictionary. If myFile contains many sentences and words, you will be reading the dictionary (which is likely large) many times, which is inefficient.
For example, if there are s sentences, each with w words, you will be opening and reading the entire dictionary s*w times.
Alternatively, you could read the entire myFile into an array of sentences of length s, or even into an array of words of length s*w. This takes memory on the order of s*w (your current algorithm takes only w memory, since you create a single array to store the words of one sentence and reuse it for each sentence). Then read the dictionary once and, for each word in the dictionary, check whether it is in the array of words/sentences.
A better approach, which costs more memory (assuming your dictionary is bigger than myFile), might be to read the entire dictionary into memory and sort it. Then read myFile and locate each word in your in-memory dictionary using an efficient search such as binary search. This should be much faster, assuming your files are large.
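A minimal sketch of that last approach: read the dictionary once, keep the sorted words whose scores are non-zero, and look each token up with Arrays.binarySearch. The column layout (scores in columns 2 and 3, the word in column 4, tab-separated) is taken from the question's code; treating column 4 as a single word and using exact matches instead of the question's contains() are simplifying assumptions, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InMemoryDictionary {

    // Reads the tab-separated dictionary ONCE and returns the sorted subjective words.
    static String[] loadSubjectiveWords(BufferedReader csv) throws IOException {
        List<String> words = new ArrayList<>();
        String line;
        while ((line = csv.readLine()) != null) {
            String[] data = line.split("\t");
            if (data.length < 5) {
                continue; // skip malformed lines
            }
            if (!data[2].equals("0") || !data[3].equals("0")) { // either score non-zero
                words.add(data[4]);
            }
        }
        String[] sorted = words.toArray(new String[0]);
        Arrays.sort(sorted); // binarySearch requires a sorted array
        return sorted;
    }

    // O(log n) lookup per token instead of re-reading the dictionary file.
    static boolean isSubjective(String[] sortedWords, String sentence) {
        for (String token : sentence.trim().split("\\s+")) {
            if (Arrays.binarySearch(sortedWords, token) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        String dictionary = "a\t1\t0\t0\tdog\nb\t2\t0.5\t0\tgood\n"; // stand-in rows
        try (BufferedReader csv = new BufferedReader(new StringReader(dictionary))) {
            String[] words = loadSubjectiveWords(csv);
            System.out.println(isSubjective(words, "a good day"));
            System.out.println(isSubjective(words, "a dog"));
        }
    }
}
```

A HashSet<String> would give the same one-pass behaviour with constant-time lookups; the sorted array is used here only because it is the variant the answer describes.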
