I am facing a weird problem reading a file. When I read the file, all the data ends up on one line. To fix this, I append a line separator after each line while reading the file, and that works fine. See the following code:
line = br.readLine();
while (line != null) {
String[] parts = line.split(" ");
word_count += parts.length;
line_count++;
fileRead+=line;
fileRead+=System.getProperty("line.separator","\n");
line = br.readLine();
}
Now the problem: when I read the data from the fileRead string and count the length of each word, I don't get the correct length for some strings.
Let's say the file contains:
Hello, today is Sunday
Thanks
It gives me the correct length for hello (5), today (5) and is (2), but not for Sunday (13): the next line gets appended to it, like Sunday/n/rThanks. I don't know how to get the lengths of the two individual strings.
Code for getting the lengths:
public void stringLenth(String[] parts) {
for(int i=0;i<parts.length;i++){
System.out.println("hello"+parts[i]+"lenth"+parts[i].trim().length());
parts[i] = parts[i].replaceAll("\\r|\\n", "");
if(parts[i].length() < minWordCount ){
minWordCount = parts[i].trim().length();
}
}
}
Any idea?
Use \\s (or \\s+ to collapse runs of whitespace such as \r\n) instead of a single space character to split your line.
Instead of splitting, try using a regex with a Matcher, with \\w+ as the regex to find all words.
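Both suggestions can be sketched as follows. This is a minimal example rather than the asker's code: the class name WordLengths and its two helper methods are made up for illustration, and the sample text assumes Windows-style \r\n separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class WordLengths {
    // Split on any run of whitespace, so "\r\n" between lines is a separator too.
    static String[] splitOnWhitespace(String text) {
        return text.trim().split("\\s+");
    }

    // Alternative: collect runs of word characters, which also drops punctuation such as ','.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String fileRead = "Hello, today is Sunday\r\nThanks\r\n";
        for (String word : findWords(fileRead)) {
            System.out.println(word + " -> " + word.length());
        }
    }
}
```

With the whitespace split, "Hello," keeps its comma; the Matcher variant returns only the word characters, which is usually what you want when measuring word lengths.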
I am trying to read an input file that contains the following:
input.txt
Hello world. Welcome,
to the java.
I have to wrap each sentence with a prefix (BEGIN) and a suffix (END), and the output should look like the following:
output expected:
BEGIN_Hello world_END.BEGIN_ Welcome,
to the java_END.
Following is my input file reading function. I am reading an entire file and storing it in array list:
InputDetails.java
private List<String> readInput = new ArrayList<>();
public void readFile() throws IOException {
while((inputLine = input.readLine()) != null ) {
readInput.add(inputLine);
}
}
//Getter to return input file content
public List<String> getReadInput() {
return readInput;
}
And following is my code for appending the string with BEGIN and END:
public void process() {
InputDetails inputD = new InputDetails();
for(int i=0;i<inputD.getReadInput().size();i++) {
String sentence = inputD.getReadInput().get(i);
String splitSentence[] = sentence.split("\\.");
for(int j=0;j<splitSentence.length;j++) {
System.out.println(splitSentence[j]);
splitSentence[j] = "BEGIN_"+splitSentence[j]+"__END";
}
sentence = String.join(".",splitSentence);
inputD.writeToFile(sentence);
}
}
Output I am getting:
BEGIN_SENTENCE__Hello world__END_SENTENCE.BEGIN_SENTENCE__Welcome
to the java.
Note: Sentences are separated by a "." (period) character. Each output sentence should be prefixed with BEGIN_ and suffixed with __END. The period is not considered part of the sentence. Words in the input file are delimited by one or more spaces. A sentence is complete when it reaches a period, even if that means it ends on a later line (as in the input above). The positions of all special characters should be retained in the output. There can also be a space between a word and a period or comma, e.g. "java ." or "Welcome ,".
Can anyone help me fix this? Thanks
First, you'll need to join your string list input into a single string. Then, you can use the String.split() method to break up your input into parts delimited by the . character. You can then choose to either run a loop on that array or use the stream method (as shown below) to iterate over your sentences. On each part, simply append the required BEGIN_ and _END blocks to the sentence. You can use manual string concatenation using the + operator or use a string template with String.format() (as shown below). Finally, reintroduce the . delimiter used to break the input by joining the parts back into a single string.
String fullString = String.join("", getReadInput());
// Split on literal periods, wrap each part, then put the periods back
String result = Arrays.asList(fullString.split("\\.")).stream()
    .map(s -> String.format("BEGIN_%s_END", s))
    .collect(Collectors.joining("."));
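A self-contained version of this approach might look like the sketch below. It makes two assumptions that are not in the answer: the lines are joined with "\n" rather than "" so that the line break from the input survives, and the input is assumed to end with a period, which is re-added as a suffix because split() drops the trailing empty part. The class name SentenceWrapper is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class illustrating the join / split / map / join steps.
class SentenceWrapper {
    static String wrap(List<String> lines) {
        // Assumption: keep line breaks by joining with "\n" instead of "".
        String fullString = String.join("\n", lines);
        return Arrays.stream(fullString.split("\\."))
                .map(s -> String.format("BEGIN_%s_END", s))
                // Assumption: the input ends with '.', so restore it as a suffix.
                .collect(Collectors.joining(".", "", "."));
    }

    public static void main(String[] args) {
        System.out.println(wrap(Arrays.asList("Hello world. Welcome,", "to the java.")));
    }
}
```

Processing the whole text at once, instead of line by line, is what lets a sentence that spans two lines be wrapped as a single unit.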
I have a CSV file formatted like this:
<F,Bird,20,10/> < A,Fish,5,11,2/>
I was wondering how to read in those values separately.
Would I have to get the whole line to an array?
I thought of doing line.split("/>"), but then the first record would still contain <, which I don't want.
If, on the other hand, I just separate it using line.split(",") and assign each value accordingly, the values in the middle merge, so that does not work either.
Is there a way to separate the string first, without the <, > and / symbols?
You can use several delimiters in split regexp, like this:
String line = "<F,Bird,20,10/> < A,Fish,5,11,2/>";
String[] lines = line.split("<|/> <|/>");
for (String item: lines) {
System.out.println(item);
}
Output (your spaces are preserved; split also returns an empty first element here, because the line starts with the < delimiter):
F,Bird,20,10
A,Fish,5,11,2
Try splitting your input string using the lookbehind (?<=/>):
String input = "<F,Bird,20,10/> < A,Fish,5,11,2/>";
input = input.replaceAll("\\s+", "");
String[] parts = input.split("(?<=/>)");
for (String part : parts) {
System.out.println(part.replaceAll("[<>/]", ""));
}
Note that I removed all spaces from your string to make splitting cleaner. We could still try to split with arbitrary whitespace present, but it would be more work. From this point, you can easily access the CSV data contained within each tag.
Output:
F,Bird,20,10
A,Fish,5,11,2
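To get at the individual values, each cleaned record can then be split on commas. The sketch below combines that with the lookbehind split; the class name TagCsv and the parse method are illustrative names, not from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns "<a,b/> <c,d/>" into a list of field arrays.
class TagCsv {
    static List<String[]> parse(String input) {
        List<String[]> records = new ArrayList<>();
        // Remove whitespace, then split after every "/>" using a lookbehind.
        for (String part : input.replaceAll("\\s+", "").split("(?<=/>)")) {
            String body = part.replaceAll("[<>/]", "");
            if (!body.isEmpty()) {
                records.add(body.split(","));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        for (String[] fields : parse("<F,Bird,20,10/> < A,Fish,5,11,2/>")) {
            System.out.println(Arrays.toString(fields));
        }
    }
}
```

Note that stripping all whitespace up front would also remove spaces inside a value, so this only suits data whose fields contain no spaces.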
I'm building an Android/Java program which reads from a text file and stores each sentence of the file in an array list. It then checks each sentence for a particular word and prints out the sentence that contains the word.
This is the code that I have so far:
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.text4);
text = (TextView)findViewById(R.id.info2);
BufferedReader reader = null;
try {
reader = new BufferedReader(
new InputStreamReader(getAssets().open("input3.txt")));
String line;
List<String> sentences = new ArrayList<String>();
}
}
}
As you can see from the above code, the program looks for the word "Despite".
My text file consists of three sentences. The program works perfectly, outputting only the sentence containing the word "Despite", if the text file is arranged with a line break after each sentence.
However, if the text file is arranged with no line break after each sentence, the program outputs all three sentences.
I don't want to add a line break after each of my sentences in the text file for this program to work. How do I alter my code so it works for any type of text file regardless of its structure?
Your split() doesn't work, at all. First, your expression will only match this exact substring:
.?!\r\n\t
Extra tabs at the end are also included in the match.
You probably meant to use a character class, e.g. [0-9], but you forgot the brackets.
Since line is exactly one line of text from the file, why are you splitting on \r and \n? Also, why is a tab (\t) considered a sentence separator?
Next part that's wrong with the split(), is the fact that you're only ever taking the first value ([0]). If the split had worked, that would discard the second and third sentences.
Also, when looking for a word, make sure you don't match a longer word, e.g. if looking for is, don't match this, so you need to include word-boundary checks (\b).
To ensure that the matched token, e.g. the period, stays part of the sentence, you need to use a zero-width positive lookbehind ((?<=X)).
Word matching should also be case-insensitive.
And finally, the code structure is wrong. It won't compile since you're missing an end-brace (}). This is made extra confusing because of the bad indentations.
Here is updated code:
try (BufferedReader reader = new BufferedReader(
new InputStreamReader(getAssets().open("input3.txt")))) {
List<String> sentences = new ArrayList<>();
for (String line; (line = reader.readLine()) != null; ) {
for (String sentence : line.split("(?<=[.?!\t])")) {
sentence = sentence.trim();
if (! sentence.isEmpty()) {
sentences.add(sentence);
}
}
}
Pattern word = Pattern.compile("\\bDESPITE\\b", Pattern.CASE_INSENSITIVE);
for (String sentence : sentences) {
if (word.matcher(sentence).find()) {
text.setText(sentence);
break; // No need to continue searching
}
}
} catch (IOException e) {
Toast.makeText(getApplicationContext(), "Error reading file!", Toast.LENGTH_LONG).show();
e.printStackTrace();
}
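The splitting and matching logic can be tried outside Android with a plain Java sketch like the one below. The class and method names are made up for illustration, and the character class here omits the tab, which is a deliberate simplification of the pattern used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone version of the sentence search, without Android APIs.
class SentenceSearch {
    // Split after '.', '?' or '!' so the punctuation stays with its sentence.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        for (String sentence : text.split("(?<=[.?!])")) {
            sentence = sentence.trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    // Return the first sentence containing the whole word, ignoring case, or null.
    static String firstContaining(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        for (String sentence : sentences(text)) {
            if (p.matcher(sentence).find()) {
                return sentence;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "It rained. Despite the rain, we went out. Is this fine?";
        System.out.println(firstContaining(text, "despite"));
    }
}
```

Because of the \b checks, searching for "is" skips "This" and only matches the stand-alone word.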
I have written this method to read file.txt and put its elements into an ArrayList.
My problem is that I don't want to store a whole line as one string; I want each element on the line as its own string.
public ArrayList<String> readData() throws IOException {
FileReader pp=new FileReader(filename);
BufferedReader nn=new BufferedReader(pp);
ArrayList<String> data=new ArrayList<String>();
String line;
while((line=nn.readLine()) != null){
data.add(line);
}
xoxo.close();
return data;
}
Is it possible?
What about reading the lines, but splitting each line into the single words?
while ((line = nn.readLine()) != null) {
for (String word : line.split(" ")) {
data.add(word); // add the single word, not the whole line
}
}
The method split(" ") in this example will split the line on each whitespace " " and put the single words into an array.
In case the words in the file are separated by another character (like a comma for example) you can use that too in split():
line.split(",");
If I may, here is a somewhat easier way to read a text file:
Scanner scanner = new Scanner(new File(filename)); // pass a File; a bare String would itself be scanned as the text
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
for (String word : line.split(" ")) {
data.add(word);
}
}
Well, not easier, but shorter :)
One last piece of advice: if you give your variables more readable names, like bufferedReader instead of nn, pp and xoxo, you will have fewer problems when the code grows more complex later on.
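As a variation on the Scanner approach, Scanner.next() returns one whitespace-delimited token at a time, so no explicit split is needed. The sketch below reads from any Reader so it can be tried without a file; the class name WordReader and the method readWords are hypothetical.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper: collects every whitespace-delimited token from a Reader.
class WordReader {
    static List<String> readWords(Reader source) {
        List<String> data = new ArrayList<>();
        // try-with-resources closes the Scanner and the underlying Reader.
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                data.add(scanner.next());
            }
        }
        return data;
    }

    public static void main(String[] args) {
        // For a real file, pass e.g. new java.io.FileReader(filename) instead.
        System.out.println(readWords(new StringReader("one two\nthree   four")));
    }
}
```

Unlike split(" "), this treats runs of spaces, tabs and line breaks alike, so it never produces empty strings.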
Use split function for String.
String line = "This is line";
String[] a = line.split("\\s"); // \\s is the regular expression for a whitespace character
a[0] = This
a[1] = is
a[2] = line
If by 'element' you mean each word, note that simply changing
line = nn.readLine()
to
line = nn.read()
will not work: BufferedReader.read() returns a single character as an int, not the characters up to the next space. Instead, read each line and split it into words, or use Scanner.next(), which returns one whitespace-delimited token at a time. If by element you mean each character, read each word and break it up further, for example with String.toCharArray().
I have a text file with sentences in this format:
sentence 1 This is a sentence.
t-extraction 1 This is a sentence
s-extraction 1 This_DT is_V a_DT sentence_N
sentence 2 ...
As you can see, the lines are separated by line breaks, and the words sentence, t-extraction and s-extraction repeat for every sentence. The numbers are the sentence numbers (1, 2, ...). The parts of each line are separated by tabs, for example in the first line: sentence(Tab)1(Tab)This is a sentence.
or in the second line: t-extraction(Tab)1(Tab)This(Tab)is(Tab)a sentence.
I need to map some of this information to a SQL table, so I have to extract it.
I need the first and second lines of each group (without the word sentence in the first lines, and without t-extraction and the numbers in the second lines). Each tab-separated part will be mapped to a field in SQL (for example 1 in one column, This is a sentence in one column, and from the second lines This in one column, and likewise is and a sentence).
What do you suggest? Thanks in advance.
You could use String.split().
The regex you could use is [^A-Za-z_]+ or [ \t]+
Using the split method on String is probably the key to this. The split command breaks a string into parts where the regex matches, returning an array of Strings of the parts between the matches.
You want to split on the tab character (written \t in the regex). You also want to process three lines as a unit; the code below shows one way of doing this (it does depend on the file being well formed).
Of course, you want to use a reader created from your file, not from a string.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Arrays;

public class Test {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader("/my/file.data"))) {
            String line;
            for (int i = 0; (line = reader.readLine()) != null; i++) {
                // Every line is tab-separated; its position within each group of three gives its kind
                String[] parts = line.split("\t");
                if (i % 3 == 0) {
                    System.out.printf("sentence ==> %s\n", Arrays.toString(parts));
                } else if (i % 3 == 1) {
                    System.out.printf("t-sentence ==> %s\n", Arrays.toString(parts));
                } else {
                    System.out.printf("s-sentence ==> %s\n", Arrays.toString(parts));
                }
            }
        }
    }
}
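To go from the split arrays to values that can be bound to SQL columns, each line can be parsed into a small value object. This is only a sketch under the assumption that every line has the form label(Tab)number(Tab)fields...; the class ExtractionLine and its members are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical value object for one tab-separated line of the input.
final class ExtractionLine {
    final String kind;     // e.g. "sentence" or "t-extraction"
    final int number;      // the sentence number
    final String[] fields; // the remaining tab-separated parts

    private ExtractionLine(String kind, int number, String[] fields) {
        this.kind = kind;
        this.number = number;
        this.fields = fields;
    }

    static ExtractionLine parse(String line) {
        String[] parts = line.split("\t");
        if (parts.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 tab-separated parts: " + line);
        }
        return new ExtractionLine(parts[0], Integer.parseInt(parts[1].trim()),
                Arrays.copyOfRange(parts, 2, parts.length));
    }

    public static void main(String[] args) {
        ExtractionLine row = parse("t-extraction\t1\tThis\tis\ta sentence");
        System.out.println(row.kind + " #" + row.number + " " + Arrays.toString(row.fields));
    }
}
```

The number and each entry of fields can then be passed as parameters to a prepared INSERT statement, one column per value.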