I have a text file with thousands and thousands of lines of gibberish. Hidden somewhere inside is a string of words in English.
What would be the most efficient way to search through the text without having to read it line by line?
Is there a script I could write to read through the file?
I can post the file if anyone's interested.
Edit: If someone would be willing to show me how to check for words with a BufferedReader in Java, that would be really cool!
If you know nothing more than that there is one streak of valid English words somewhere in the file, you will have to read in the file and check each word against a set of valid words (a dictionary). On the first hit, you continue reading until the first non-valid word occurs.
This assumes there are no accidentally valid words within the gibberish. If there are, you would have to find all streaks of valid words, and then probably have a human (you) decide which is the right one.
Edit: another thing you can do is define a minimum streak length n, if you know that the string of words you are looking for consists of at least n valid words. This would at least spare you from dealing with all the false-positive one-word streaks of single accidentally valid words within the gibberish.
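Since the question asks about BufferedReader, here is a minimal sketch of the streak search described above. The class name StreakFinder, the method findStreaks, and the tiny in-memory dictionary are made up for illustration; a real run would load a proper word list and wrap a FileReader instead of a StringReader. It assumes words are separated by whitespace, that the dictionary is stored in lower case, and it does not strip punctuation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class StreakFinder {
    // Hypothetical helper: returns every run of at least minStreak
    // consecutive dictionary words. A run may continue across line breaks.
    static List<String> findStreaks(Reader source, Set<String> dictionary, int minStreak) {
        List<String> streaks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.split("\\s+")) {
                    if (token.isEmpty()) {
                        continue; // produced by leading whitespace
                    }
                    if (dictionary.contains(token.toLowerCase())) {
                        current.add(token);
                    } else {
                        flush(current, streaks, minStreak);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        flush(current, streaks, minStreak); // a streak may end at end of file
        return streaks;
    }

    // Keeps the current run only if it is long enough, then resets it.
    private static void flush(List<String> current, List<String> streaks, int minStreak) {
        if (current.size() >= minStreak) {
            streaks.add(String.join(" ", current));
        }
        current.clear();
    }

    public static void main(String[] args) {
        // Assumed toy dictionary and input, for demonstration only.
        Set<String> dictionary = Set.of("the", "cat", "sat", "on", "mat");
        String text = "xqz the wvb\nplk the cat sat\non the mat zzrt";
        System.out.println(findStreaks(new StringReader(text), dictionary, 3));
    }
}
```

With a minimum streak length of 3, the lone "the" on the first line is discarded as a false positive, while the longer run spanning the second and third lines is reported.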
Related
I need to remove comments from code, but in this case I'll have to do it without using
System.out.println(sourceCode.replaceAll("//.*|/\\*((.|\\n)(?!=*/))+\\*/", ""));
The program needs to check the code character by character to look for "/" and then proceed to check if the next character is "/" or "*".
I'm looking for a good way to read through the code and check characters letter by letter
This is a classic exercise for people learning Java. I would suggest a simple approach, since the point is to practice your coding skills.
Read the Java source file into your program character by character.
Look for the start of a comment. There are two markers to watch for: /* and //.
Open a string buffer and write the characters you read into it.
If you find /*, don't write it to the buffer. Keep moving to the next character until you find */.
Repeat until the end of the file is reached.
If single-line comments need to be removed as well, follow the same algorithm, skipping characters until you reach a newline character.
If you need help reading a file character by character, refer to the Java documentation.
When the end of the file is reached, write the string buffer back to the file.
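The steps above can be sketched roughly as follows. This is only an illustration: the class CommentStripper and the method stripComments are invented names, the source is taken as a String rather than read from a file, and comment markers inside string or character literals are not treated specially, so a line like String s = "//"; would be damaged.

```java
class CommentStripper {
    // Hypothetical helper: copies the source into a buffer character by
    // character, skipping // line comments and /* ... */ block comments.
    // Assumption: string and character literals are NOT handled.
    static String stripComments(String source) {
        StringBuilder out = new StringBuilder();
        int n = source.length();
        int i = 0;
        while (i < n) {
            char c = source.charAt(i);
            boolean hasNext = i + 1 < n;
            if (c == '/' && hasNext && source.charAt(i + 1) == '/') {
                // Line comment: skip up to, but not including, the newline.
                i += 2;
                while (i < n && source.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && hasNext && source.charAt(i + 1) == '*') {
                // Block comment: skip until the closing */ is found.
                i += 2;
                while (i + 1 < n
                        && !(source.charAt(i) == '*' && source.charAt(i + 1) == '/')) {
                    i++;
                }
                // Step past the closing */ (or to the end if unterminated).
                i = Math.min(i + 2, n);
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String code = "int a = 1; // one\nint b /* two */ = 2;\n";
        System.out.print(stripComments(code));
    }
}
```

Reading from a file instead only changes where the characters come from; the decision made for each character stays the same.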
I am building a sorting program for a class, and this whole week I've been stuck on how to read in the text file. The text file will be specified as a command-line argument, and it will consist of hospital records. Each line holds 4 pieces of data separated by commas: someone's last name, first name, room number, and age. I have to read in this data somehow, line by line. The number of records isn't specified. I know how to sort them, I just haven't been able to figure out how to read in the data.
This is an example of what it looks like:
Costanza,George,122,53
Poppins,Mary,123,72
You could read in the file line by line, splitting the line at the commas, storing the split string in an array, and setting fields accordingly. Do you have a class for the patient?
Here is an example that uses similar methods that may be applicable to your situation.
while (in.hasNextLine()) {
    line = in.nextLine();
    studentTraits = line.split(" \\| ");
    ...
}
// studentTraits is an array with 5 elements, and
// each line of the file has 5 sections separated by the pipe character
Hope this helps. Next time, it would be much more helpful if you asked a more specific question. Here you did not exactly ask a question; you pretty much asked for someone to write some code for you, and it doesn't look like you put any effort into solving the problem yourself. Please show what you know, and ask about what you are stuck on.
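For the comma-separated hospital records in the question, the same read-and-split idea might look like the sketch below. The Patient class, its field names, and the PatientReader class are assumptions made for illustration (the question does not show a patient class), and the code assumes every non-blank line has exactly four fields with numeric room and age values.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class PatientReader {
    // Hypothetical record type; field names are assumptions.
    static class Patient {
        final String lastName;
        final String firstName;
        final int room;
        final int age;

        Patient(String lastName, String firstName, int room, int age) {
            this.lastName = lastName;
            this.firstName = firstName;
            this.room = room;
            this.age = age;
        }
    }

    // Splits one line such as "Costanza,George,122,53" into a Patient.
    static Patient parseLine(String line) {
        String[] fields = line.split(",");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        return new Patient(fields[0].trim(), fields[1].trim(),
                Integer.parseInt(fields[2].trim()),
                Integer.parseInt(fields[3].trim()));
    }

    // Parses every non-blank line; the number of records is not fixed.
    static List<Patient> parseAll(List<String> lines) {
        List<Patient> patients = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                patients.add(parseLine(line));
            }
        }
        return patients;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PatientReader <file>");
            return;
        }
        // The file name comes from the command line, as in the question.
        for (Patient p : parseAll(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(p.lastName + ", " + p.firstName
                    + " room " + p.room + " age " + p.age);
        }
    }
}
```

Because the records end up in a List, the number of lines does not need to be known in advance, and the list can be handed straight to whatever sort is required.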
I am trying to create a "word completion" tree program in Java from a dictionary that is a text file, but I am not sure where to go from here. The word completion program will match any words that start with the string entered. I am new to Java and to programming. I have designed the tree as a multi-way tree, with each node storing a character as a letter and a boolean variable to indicate whether it is the end of a word (amongst other things).
I am at the point where I am trying to see if reading the file into the tree is working correctly. However, when I try to print my tree, it does not work correctly. It does not display the first letter correctly in any word after the first word. Instead of reading from the file, for testing purposes I am simply adding only 4 words to the tree (Base, Basement, Ma, Matthew).
So my question is can anyone tell me why it is not printing correctly AND what I need to do next in order to finish the word completion?
Thank you so much in advance to everyone for taking the time to help me with my problem
It's this part:
while (t != null) {
    if (t.down != null && t.right != null) {
        //System.out.println(t.letter + " children");
        //System.out.print(t.letter);
        print(t.down);
    }
    t = t.right;
}
The issue is that t.down is the next letter (from the point of view of the current node) of some other word, so when you reach another word that should be printed, you start printing it from t.down and lose the letters above it. You can, for example, keep all the letters on the path down to the current node on a stack, print them, and then continue with the remaining letters from the tree.
Try adding more words with common starting substrings and the problem becomes easy to see.
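One way to carry the letters above the current node along while walking the tree is sketched below. The node fields letter, down, and right follow the question's snippet; everything else (the class WordTree, the endOfWord flag name, and the methods insert and complete) is invented for illustration and is not the asker's actual code. A StringBuilder plays the role of the stack of letters.

```java
import java.util.ArrayList;
import java.util.List;

class WordTree {
    static class Node {
        final char letter;
        boolean endOfWord; // assumed name for the question's boolean flag
        Node down;         // first letter that can follow this one
        Node right;        // alternative letter at the same position

        Node(char letter) {
            this.letter = letter;
        }
    }

    private Node root; // head of the list of possible first letters

    void insert(String word) {
        Node parent = null;
        for (char c : word.toCharArray()) {
            parent = findOrAdd(parent, c);
        }
        if (parent != null) {
            parent.endOfWord = true;
        }
    }

    // Finds c among the children of parent, appending a new node if absent.
    private Node findOrAdd(Node parent, char c) {
        Node last = null;
        for (Node n = (parent == null) ? root : parent.down; n != null; n = n.right) {
            if (n.letter == c) {
                return n;
            }
            last = n;
        }
        Node created = new Node(c);
        if (last != null) {
            last.right = created;
        } else if (parent == null) {
            root = created;
        } else {
            parent.down = created;
        }
        return created;
    }

    // Returns all stored words starting with prefix ("" returns every word).
    List<String> complete(String prefix) {
        List<String> result = new ArrayList<>();
        Node level = root;
        Node node = null;
        for (char c : prefix.toCharArray()) {
            node = null;
            for (Node n = level; n != null && node == null; n = n.right) {
                if (n.letter == c) {
                    node = n;
                }
            }
            if (node == null) {
                return result; // no word starts with this prefix
            }
            level = node.down;
        }
        if (node != null && node.endOfWord) {
            result.add(prefix);
        }
        collect(level, new StringBuilder(prefix), result);
        return result;
    }

    // path holds the letters above t, so each word is emitted in full.
    private static void collect(Node t, StringBuilder path, List<String> out) {
        while (t != null) {
            path.append(t.letter);
            if (t.endOfWord) {
                out.add(path.toString());
            }
            collect(t.down, path, out);
            path.setLength(path.length() - 1); // pop before moving right
            t = t.right;
        }
    }

    public static void main(String[] args) {
        WordTree tree = new WordTree();
        for (String w : new String[] {"Base", "Basement", "Ma", "Matthew"}) {
            tree.insert(w);
        }
        System.out.println(tree.complete(""));
        System.out.println(tree.complete("Ba"));
    }
}
```

The same traversal covers both halves of the question: printing the whole tree is completion with an empty prefix, and word completion starts the walk at the node where the typed prefix ends. Matching here is case-sensitive.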
I have a text file in which I am writing 3 things for each word,
e.g. <int,int,char>.
Now I am reading the file in blocks of 3: I always treat the 1st item as an integer, the 2nd as an integer too, and the 3rd as a character. There is no problem when each integer is in the range 0-9, but when it exceeds that (10, 100, and so on) my program doesn't work, for obvious reasons.
For example, there is no problem when I have to read this:
11a here <1=int,1=int,a=char>
But when something like this comes up, I run into a problem:
152a here <15=int,2=int,a=char>
I have put the whole text file into a string. Now, how do I read the characters so that I no longer face the problem mentioned above?
Some more info: My text file contains characters like this
11a22d33f1234f
Given your current description of the problem, there is no way to determine if an entry such as
152a
corresponds to (15,2,a) or (1,52,a).
Why don't you write to the file with some delimiter between elements, and then split() around the delimiter when reading back in from the file?
Your text file has an improper format, then.
How would you distinguish "1 11 a" from "11 1 a", for example?
Can't you use CSV or something like that?
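As a rough illustration of the delimiter suggestion, the sketch below writes each <int,int,char> entry as one comma-separated record and splits it when reading back. The class TripleCodec, its nested Triple type, and the one-record-per-line layout are assumptions for this example, not something given in the question.

```java
class TripleCodec {
    // Hypothetical value type for one <int,int,char> entry.
    static class Triple {
        final int first;
        final int second;
        final char symbol;

        Triple(int first, int second, char symbol) {
            this.first = first;
            this.second = second;
            this.symbol = symbol;
        }
    }

    // Writes the entry with commas between the elements, e.g. "15,2,a".
    static String encode(int first, int second, char symbol) {
        return first + "," + second + "," + symbol;
    }

    // Reads one record back. The limit of 3 keeps the last field intact
    // even if the character itself is a comma.
    static Triple decode(String record) {
        String[] parts = record.split(",", 3);
        if (parts.length != 3 || parts[2].length() != 1) {
            throw new IllegalArgumentException("Malformed record: " + record);
        }
        return new Triple(Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                parts[2].charAt(0));
    }

    public static void main(String[] args) {
        // The two readings of "152a" are now written differently.
        System.out.println(encode(15, 2, 'a'));
        System.out.println(encode(1, 52, 'a'));
        Triple t = decode("15,2,a");
        System.out.println(t.first + " " + t.second + " " + t.symbol);
    }
}
```

With the separators in place, the two possible readings of 152a become two different records, so the reader never has to guess where one integer ends.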
I need to write a parser for text files (at least 20 KB each), and I need to determine whether words from a given set (about 400 words and numbers) appear in the text file. So I am looking for the most efficient way to do this (if a match is found, I need to do some further processing of that line and the line before it).
What I currently do is exclude the lines that definitely contain no information (metadata lines of a sort) and then compare word by word, but I don't think that comparing word by word alone is the most efficient option.
Can anyone please provide some tips/hints/ideas/...
Thank you very much
It depends on what you mean by "efficient".
If you want a very straightforward way to code it, keep in mind that the String class in Java has the method contains(CharSequence).
You could then put the file content into a String and iterate over the keywords you want to check, using contains() to see whether any of them appear in the String.
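A minimal sketch of that straightforward approach follows; the class ContainsSearch and the method keywordsIn are made-up names. Note that contains() matches substrings, so a keyword such as "cat" would also be found inside "concatenate".

```java
import java.util.ArrayList;
import java.util.List;

class ContainsSearch {
    // Hypothetical helper: returns the keywords that occur anywhere in
    // the content, in the order the keywords were given.
    static List<String> keywordsIn(String content, List<String> keywords) {
        List<String> found = new ArrayList<>();
        for (String keyword : keywords) {
            if (content.contains(keyword)) { // substring match, not word match
                found.add(keyword);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String content = "error at line 42";
        System.out.println(keywordsIn(content, List.of("error", "warning", "42")));
    }
}
```

This scans the whole content once per keyword, which is simple to write but does more work than a set lookup per word when there are hundreds of keywords.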
How about the following:
Put all your keywords in a HashSet (Set<String> keywords;)
Read the file one line at a time
For each line in file:
    Tokenize to words
    For each word in line:
        If word is contained in keywords (keywords.contains(word))
            Process actual line
            If previous line is available
                Process previous line
    Keep track of previous line (prevLine = line;)
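The outline above could be turned into Java along these lines. This is only a sketch: KeywordScanner and scan are invented names, "processing" is reduced to recording the matched line together with its previous line, tokens are split on whitespace without stripping punctuation, and the inner loop stops at the first matching word so that each line is processed once (the outline does not say either way).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class KeywordScanner {
    // Hypothetical helper: returns one entry per matching line, written as
    // "previous | current", or just "current" when there is no previous line.
    static List<String> scan(Reader source, Set<String> keywords) {
        List<String> hits = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String prevLine = null;
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split("\\s+")) {
                    if (keywords.contains(word)) { // constant-time set lookup
                        hits.add(prevLine == null ? line : prevLine + " | " + line);
                        break; // assumption: process each line only once
                    }
                }
                prevLine = line; // remember this line for the next iteration
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> keywords = new HashSet<>(List.of("alpha", "header"));
        String text = "header\nfoo bar\nbaz alpha\nqux";
        System.out.println(scan(new StringReader(text), keywords));
    }
}
```

Each word costs one hash lookup regardless of how many keywords there are, so the work grows with the size of the file rather than with the size of the file times the 400 keywords.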