Search for string in file and write to multiple files - java

If the String is found within the file in more than one place, I want to write it to file2. If that String is found in the original file OR file2, write to file3. If String was matched in any of the previous three files, write to file4.
I have tried using several BufferedWriters, but that does not work. Can anyone help? What should I use in place of "fileIAmSearching"?
import java.io.*;
import java.util.*;
public class SortGeneSym {
public static void main (String [] args) {
try {
BufferedReader br = new BufferedReader(new FileReader( "FormattedHumanRNA" ));
String line;
String genesym;
while ((line= br.readLine() ) != null)
{ String arr[] = line.split( "\t");
genesym = arr[0];
//variable genesym is the first String in line
if(fileIAmsearching.contains(genesym)) {
BufferedWriter bw1 = new BufferedWriter(new FileWriter( "1Occurance" ));
bw1.write (line);
// a match!
break;
}
else if (fileIAmSearching.contains(genesym)) {
BufferedWriter bw2 = new BufferedWriter(new FileWriter( "2Occurance"));
bw2.write (line);
break;
}
else if (fileIamSearching.contains(genesym)) {
BufferedWriter bw3 = new BufferedWriter(new FileWriter( "3Occurance"));
bw3.write (line);
break;
}
else (fileIamSearching.contains(genesym) = null ) {
BufferedWriter bw0 = new BufferedWriter(new FileWriter( "0Occurance"));
bw0.writer (line);
break;
}
}
}
catch (IOException e) {
System.out.println ("file probs dne");
}
}
}

See the API for the String class; it contains several methods that let you search the character values of the object. For instance, you can read each line from the file using a BufferedReader, and then search each line (a String) using the contains method:
while ( ( line = br.readLine() ) != null ){
if ( line.contains(whatImSearchingFor) ){
//do something
}
}
Some other comments:
If you wish to search in a case-insensitive manner, you can convert both the 'needle' and 'haystack' variables to upper or lower case and then search.
If you wish to search for phrases which may span line breaks, you can use a StringBuilder to append each line of the file (with or without the line breaks, depending on the context), then search that StringBuilder (or convert it to a String).
No reason to break out of the if/else conditionals
You should close the BufferedWriters (and BufferedReaders). This is often done using a try/catch/finally, where the finally will always be called to close the Writer - you might wish to create these outside the while loop, write to them as needed, then close after the loop is complete.
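A minimal sketch of these suggestions, assuming Java 7 or later: the reader and writer are created once, outside the loop, and try-with-resources closes both. The class name ContainsSearch and the method copyMatches are hypothetical, and the case-insensitive comparison is only one of the options listed above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Locale;

// Hypothetical helper, not part of the original question or answer.
final class ContainsSearch {

    // Copies every line that contains needle (ignoring case) from in to out
    // and returns the number of matching lines.
    static int copyMatches(Reader in, Writer out, String needle) throws IOException {
        String lowerNeedle = needle.toLowerCase(Locale.ROOT);
        int matches = 0;
        // Both resources are opened once and closed automatically, even on error.
        try (BufferedReader br = new BufferedReader(in);
             BufferedWriter bw = new BufferedWriter(out)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.toLowerCase(Locale.ROOT).contains(lowerNeedle)) {
                    bw.write(line);
                    bw.newLine();
                    matches++;
                }
            }
        }
        return matches;
    }
}
```

With files, the two arguments could be new FileReader("FormattedHumanRNA") and new FileWriter("1Occurance"); with several output files, one writer per file would be opened before the loop in the same way.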

Related

How to split single text file into multiple with character as delimiter

I have a text document that has multiple separate entries all compiled into one .log file.
The format of the file looks something like this.
$#UserID#$
Date
User
UserInfo
SteamFriendID
=========================
<p>Message</p>
$#UserID#$
Date
User
UserInfo
SteamFriendID
========================
<p>Message</p>
$#UserID#$
Date
User
UserInfo
SteamFriendID
========================
<p>Message</p>
I'm trying to take everything in between the instances of "$#UserID#$" and write each entry to a separate text file.
So far, based on what I've found, I tried implementing it using a StringBuilder, something like this:
FileReader fr = new FileReader("Path to raw file.");
int idCount = 1;
FileWriter fw = new FileWriter("Path to parsed files" + idCount);
BufferedReader br = new BufferedReader(fr);
//String line, date, user, userInfo, steamID;
StringBuilder sb = new StringBuilder();
//br.readLine();
while ((line = br.readLine()) != null) {
if(line.substring(0,1).contains("$#")) {
if (sb.length() != 0) {
File file = new File("Path to parsed logs" + idCount);
PrintWriter pw = new PrintWriter(file, "UTF-8");
pw.println(sb.toString());
pw.close();
//System.out.println(sb.toString());
sb.delete(0, sb.length());
idCount++;
}
continue;
}
sb.append(line + "\r\n");
}
But this only gives me the first 2 of the entries in separate parsed files, leaving the 3rd one out for some reason.
The other way I was thinking about doing it was reading in all the lines using .readAllLines(), store the list as an array, loop through the lines to find "$#", get that line's index & then recursively write the lines starting at the index given.
Does anyone know of a better way to do this, or would be willing to explain to me why I'm only getting two of the three entries parsed?
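The third entry is left out because a block is only written when the next "$#" line is read, and no such line follows the last block. The quick fix is to write the contents of the StringBuilder once more after your while loop, like this: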
public static void main(String[] args) {
try {
int idCount = 1;
FileReader fr = new FileReader("<path to desired file>");
BufferedReader br = new BufferedReader(fr);
//String line, date, user, userInfo, steamID;
StringBuilder sb = new StringBuilder();
//br.readLine();
String line = "";
while ((line = br.readLine()) != null) {
if(line.startsWith("$#")) {
if (sb.length() != 0) {
writeFile(sb.toString(), idCount);
System.out.println(sb);
sb.setLength(0);
idCount++;
}
continue;
}
sb.append(line + "\r\n");
}
if (sb.length() != 0) {
writeFile(sb.toString(), idCount);
System.out.println(sb);
idCount++;
}
} catch (IOException e) {
e.printStackTrace();
}
}
private static void writeFile(String content, int id) throws IOException
{
    File file = new File("<path to desired dir>\\ID_" + id + ".txt");
    // PrintWriter creates the file if it does not exist;
    // try-with-resources closes it even if writing fails
    try (PrintWriter pw = new PrintWriter(file, "UTF-8")) {
        pw.println(content);
    }
}
I've changed two additional things:
The condition "line.substring(0,1).contains("$#")" did not work properly: the substring call returns only one character, but it is compared against two characters, so it is never true. I changed that to use the 'startsWith' method.
After the content of the StringBuilder is written to file, you did not reset or empty it, resulting in the second and third files containing every previous block as well (the third file would then equal the input). That is handled by "sb.setLength(0);".
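The alternative mentioned in the question, reading every line first and splitting afterwards, could look roughly like this. It is only a sketch: LogSplitter and splitBlocks are made-up names, and the lines are assumed to come from something like Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the "read all lines, then split" idea.
final class LogSplitter {

    // Groups the lines between marker lines into blocks; marker lines are dropped.
    static List<String> splitBlocks(List<String> lines, String marker) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith(marker)) {
                // A new entry starts: flush the previous one, if any.
                if (current.length() > 0) {
                    blocks.add(current.toString());
                    current.setLength(0);
                }
                continue;
            }
            current.append(line).append(System.lineSeparator());
        }
        // The last block has no marker after it, so flush it here.
        if (current.length() > 0) {
            blocks.add(current.toString());
        }
        return blocks;
    }
}
```

Each returned block can then be handed to a method such as writeFile together with its index.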

Retrieve data from txt file and replace the data into a new txt file using java

I am trying to read in a text file and then manipulate a little and update the records into a new text file.
Here is what I have so far:
ArrayList<String> linesList = new ArrayList<>();
BufferedReader br;
String empid, email;
String[] data;
try {
String line;
br = new BufferedReader(new FileReader("file.txt"));
while ((line = br.readLine()) !=null) {
linesList.add(line);
}
br.close();
}
catch (IOException e) { e.printStackTrace(); }
for (int i = 0; i < linesList.size(); i++) {
data = linesList.get(i).split(",");
empid = data[0];
ccode = data[3];
}
File tempFile = new File("File2.txt");
BufferedWriter bw = new BufferedWriter(new FileWriter(tempFile));
for (int i = 0; i < linesList.size(); i++) {
if(i==0){
bw.write(linesList.get(i));
bw.newLine();
}
else{
data = linesList.get(i).split(",");
String empid1 = data[0];
if(data[13].equals("IND")) {
String replace = data[3].replaceAll("IND", "IN");
ccode1 = replace;
System.out.println(ccode1);
}
else if(data[13].equals("USA")) {
String replace = data[3].replaceAll("USA", "US");
ccode1 = replace;
}
else {
ccode1 = replace; //This does not work as replace is not defined here, but how can I get it to work here.
}
String newData=empid1+","+ccode1;
bw.write(newData);
bw.newLine();
}
}
Here is what is inside the text file:
EID,First,Last,Country
1,John,Smith,USA
2,Jane,Smith,IND
3,John,Adams,USA
So, what I need help with is editing the three letter country code and replacing it with a 2 letter country code. For example: USA would become US, and IND would become IN. I am able to read in the country code, but am having trouble in changing the value and then replacing the changed value back into a different text file. Any help is appreciated. Thanks in advance.
Open the file in a text editor and use Search and Replace: ,USA with ,US, then ,IND with ,IN, and so on.
To automate that, do the replacement inside the same while loop that reads each line. Strings are immutable, so assign the result back:
//while(read){ line = line.replaceAll(",USA", ",US"); }
That will be the easiest way to complete your objective.
To save, open a BufferedWriter bw just like you opened the reader, and use bw.write(). You would probably prefer to open both at the same time: the reader on your source file, and the writer on a new file with an _out suffix. That way you don't need to keep the file data in memory; you can read and write as you loop.
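A rough sketch of the read-and-write-as-you-loop idea, assuming the simple unquoted file from the question where the country is always the last field. CountryCodes, shorten and convert are hypothetical names, and the lookup table only contains the two codes mentioned in the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the mapping is limited to the codes from the question.
final class CountryCodes {
    private static final Map<String, String> CODES = new HashMap<>();
    static {
        CODES.put("USA", "US");
        CODES.put("IND", "IN");
    }

    // Replaces a known three-letter code in the last field; other lines are returned unchanged.
    static String shorten(String line) {
        int lastComma = line.lastIndexOf(',');
        if (lastComma < 0) {
            return line;
        }
        String shortCode = CODES.get(line.substring(lastComma + 1));
        return shortCode == null ? line : line.substring(0, lastComma + 1) + shortCode;
    }

    // Streams source to target line by line, so the file is never held in memory.
    static void convert(Path source, Path target) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter bw = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(shorten(line));
                bw.newLine();
            }
        }
    }
}
```

Because unknown values fall through unchanged, the header line and any unlisted country are copied as they are.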
For harder ways, read the csv specs: https://www.rfc-editor.org/rfc/rfc4180#section-2
Notice that you have to account for the possibility of fields being enclosed in quotes, like: "1","John","Smith","USA", which means you also have to replace ,\"USA with ,\"US.
The delimiter may or may not be a comma; you have to make sure your input will always use the same delimiter, or that you can detect it and switch at runtime.
You have to account for the case where a delimiter may be part of a field, or where quotes are part of a field.
Once you know how to handle these issues, you can, instead of using replace, parse the lines character by character using while( (/*int*/ c = br.read()) != -1), and do the replacement manually with an if gate.
/*while(read)*/
if( c == delimiter ){
// if not inside a quoted field, start the next field; else add c to the field value
} else if( c == quote ){
// if the field value is empty, ignore the quote and expect a closing quote;
// else if no quote escape is marked, mark it; else add the quote to the field value
}
// (...)
else if( c == 13 || c == 10 ){
// finished the line: check the last field of the row just read and replace the data
}
To make it better/harder, define a parsing state machine, put the states into an Enum, and write the if gates with them in mind (this will make your code be more like a compiler parser).
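As an illustration of the state-machine idea, here is a small sketch that splits one CSV line into fields. The class CsvLineParser and the state names are my own choices, not taken from the linked article; it handles quoted fields and doubled quotes on a single line, is lenient about malformed quotes, and does not handle fields that span several lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-line CSV splitter driven by an enum of parser states.
final class CsvLineParser {

    private enum State { UNQUOTED, QUOTED, QUOTE_IN_QUOTED }

    static List<String> parse(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        State state = State.UNQUOTED;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            switch (state) {
                case UNQUOTED:
                    if (c == delimiter) {
                        fields.add(field.toString());
                        field.setLength(0);
                    } else if (c == '"' && field.length() == 0) {
                        state = State.QUOTED;      // opening quote of a field
                    } else {
                        field.append(c);
                    }
                    break;
                case QUOTED:
                    if (c == '"') {
                        state = State.QUOTE_IN_QUOTED; // closing quote or first half of ""
                    } else {
                        field.append(c);           // delimiters inside quotes are data
                    }
                    break;
                case QUOTE_IN_QUOTED:
                    if (c == '"') {
                        field.append('"');         // "" inside quotes is an escaped quote
                        state = State.QUOTED;
                    } else if (c == delimiter) {
                        fields.add(field.toString());
                        field.setLength(0);
                        state = State.UNQUOTED;
                    } else {
                        field.append(c);           // lenient: text after a closing quote
                        state = State.UNQUOTED;
                    }
                    break;
            }
        }
        fields.add(field.toString());              // last field has no trailing delimiter
        return fields;
    }
}
```

Once a line is split this way, the last element of the list can be checked and replaced before the row is written out again.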
You can find parsing code at different stages here: https://www.mkyong.com/java/how-to-read-and-parse-csv-file-in-java/
You need to change your approach a little. If you want to edit a file, create a new file, write the content to the new file, delete the old file, and rename the new file to the old name.
ArrayList<String> linesList = new ArrayList<>();
BufferedReader br;
String[] data;
File original=new File("D:\\abc\\file.txt");
try {
String line;
br = new BufferedReader(new FileReader(original));
while ((line = br.readLine()) !=null) {
linesList.add(line);
}
br.close();
}
catch (IOException e) { e.printStackTrace(); }
File tempFile = new File("D:\\abc\\tempfile.txt");
BufferedWriter bw = new BufferedWriter(new FileWriter(tempFile));
for (int i = 0; i < linesList.size(); i++) {
if(i==0){
bw.write(linesList.get(i));
bw.newLine();
}
else{
data = linesList.get(i).split(",");
String empid = data[0];
String name=data[1];
String lname=data[2];
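// substring(0, 2) works for USA -> US and IND -> IN, but not for every country (for example CHN -> CN)
String ccode = data[3].substring(0, 2);
// no trailing "\n" here: bw.newLine() below already ends the line
String newData=empid+","+name+","+lname+","+ccode;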
bw.write(newData);
bw.newLine();
}
}
bw.close();
if (!original.delete()) {
System.out.println("Could not delete file");
return;
}
// Rename the new file to the filename the original file had.
if (!tempFile.renameTo(original))
System.out.println("Could not rename file");

How to remove a particular string in a text file using java?

My input file has numerous records and for sample, let us say it has (here line numbers are just for your reference)
1. end
2. endline
3. endofstory
I expect my output as:
1.
2. endline
3. endofstory
But when I use this code:
import java.io.*;
public class DeleteTest {
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
File file = new File("D:/mypath/file.txt");
File temp = File.createTempFile("file1", ".txt", file.getParentFile());
String charset = "UTF-8";
String delete = "end";
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), charset));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp), charset));
for (String line; (line = reader.readLine()) != null;) {
line = line.replace(delete, "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
}
I get my output as:
1.
2. line
3. ofstory
Can you guys help me out with what I expect as output?
First, you'll need to replace the line with the new string "List item", not an empty string. You can do that using line = line.replace(delete, "List item"); but since you want to replace end only when it is the only string on a line, you'll have to use something like this:
line = line.replaceAll("^"+delete+"$", "List item");
Based on your edits, it seems that you do indeed want to replace the line that contains only end with an empty string. You can do that using something like this:
line = line.replaceAll("^"+delete+"$", "");
Here, the first parameter of replaceAll is a regular expression, ^ means the start of the string and $ the end. This will replace end only if it is the only thing on that line.
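One caveat with building the pattern by concatenation is that the string to delete is itself treated as a regular expression. A small sketch that guards against this with Pattern.quote follows; WholeLineDelete and blankIfExactly are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper: blanks a line only when it is exactly the given text.
final class WholeLineDelete {

    static String blankIfExactly(String line, String delete) {
        // Pattern.quote makes characters such as '.' or '*' in delete match literally.
        return line.replaceAll("^" + Pattern.quote(delete) + "$", "");
    }
}
```

For plain words like end the result is the same as the unquoted version; the difference only shows up when the text contains regex metacharacters.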
You can also check if the current line is the line you want to delete and just write an empty line to the file.
Eg:
if(line.equals(delete)){
writer.println();
}else{
writer.println(line);
}
And to do this process for multiple strings you can use something like this:
Set<String> toDelete = new HashSet<>();
toDelete.add("end");
toDelete.add("something");
toDelete.add("another thing");
if(toDelete.contains(line)){
writer.println();
}else{
writer.println(line);
}
Here I'm using a set of strings I want to delete and then check if the current line is one of those strings.

Java BufferedWriter isn't working

I'm having a problem with a BufferedWriter. I am reading in a 50,000-word wordlist, running a stemming algorithm, and creating a new wordlist that contains just the word stems. Instead of containing any stems, however, the new file literally just contains:
-
Here is my code:
public static void main(String[] args) {
BufferedReader reader=null;
BufferedWriter writer=null;
try {
writer = new BufferedWriter(new FileWriter(new File("src/newwordlist.txt")));
HashSet<String> db = new HashSet<String>();
reader = new BufferedReader(new InputStreamReader(new FileInputStream("src/wordlist"),"UTF-8"));
String word;
int i=0;
while ((word=reader.readLine())!=null) {
i++;
Stemmer s= new Stemmer();
s.addword(word);
s.stem();
String stem =s.toString();
if(!db.contains(stem)){
db.add(stem);
writer.write(stem);
//System.out.println(stem);
}
}
System.out.println("Reduced file from " + i + " words to " + db.size());
reader.close();
writer.close();
} catch (IOException e1) {
e1.printStackTrace();
}
}
The output I get on the console is:
Reduced file from 58110 words to 28201
So I know it's working. I've also tried changing writer.write(stem); to writer.write("hi"); and I still get the same output in newwordlist.txt.
I know it's no fault of the Stemmer class: I've tried printing the stem string (where I commented out the code) and that produced the correct output on the console, so the fault must be with the writer, but I don't understand what it is.
Edit 1
I simplified the code to:
BufferedReader reader=null;
BufferedWriter writer=null;
try {
writer = new BufferedWriter(new FileWriter(new File("src/newwordlist.txt")));
HashSet<String> db = new HashSet<String>();
reader = new BufferedReader(new InputStreamReader(new FileInputStream("src/wordlist.txt"),"UTF-8"));
String word;
int i=0;
while ((word=reader.readLine())!=null) {
i++;
if(!db.contains(word)){
db.add(word);
writer.write("hi");
}
}
System.out.println("Reduced file from " + i + " words to " + db.size());
reader.close();
writer.close();
} catch (IOException e1) {
e1.printStackTrace();
}
Now I get this console output:
Reduced file from 58110 words to 58109
But the output file is still blank.
I would expect the code as given in the Question to produce a file that consists of one line, consisting of all of the "stems" concatenated. (Or in the "hi" version, one line consisting of "hihihi...." repeated a large number of times.)
It is conceivable that whatever you are using to view the file cannot cope with an input file that consists of many thousands of characters ... and no end-of-line.
Change
writer.write(stem);
to
writer.write(stem);
writer.write(EOL);
where EOL is the platform specific end-of-line sequence.
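One way to avoid defining EOL yourself is BufferedWriter.newLine(), which writes the platform line separator. A small sketch, using the hypothetical names UniqueLines and writeUnique and leaving the stemming step out:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: writes each distinct word on its own line.
final class UniqueLines {

    // Returns the number of distinct words written to out.
    static int writeUnique(Iterable<String> words, Writer out) throws IOException {
        Set<String> seen = new HashSet<>();
        try (BufferedWriter writer = new BufferedWriter(out)) {
            for (String word : words) {
                // Set.add returns false for duplicates, so no separate contains() check is needed.
                if (seen.add(word)) {
                    writer.write(word);
                    writer.newLine(); // platform-specific end-of-line sequence
                }
            }
        }
        return seen.size();
    }
}
```

Passing new FileWriter("src/newwordlist.txt") as the second argument would give one stem per line in the output file.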
Assuming you are using Java 7 or later, it would be better to use try-with-resources to make sure that the output stream is always closed / flushed, even if there is an error:
public static void main(String[] args) {
try (BufferedReader reader = new BufferedReader(
new InputStreamReader(new FileInputStream("src/wordlist"), "UTF-8"));
BufferedWriter writer = new BufferedWriter(new FileWriter(
new File("src/newwordlist.txt")))) {
HashSet<String> db = new HashSet<>();
String EOL = System.getProperty("line.separator");
String word;
int i = 0;
while ((word = reader.readLine()) != null) {
i++;
Stemmer s = new Stemmer();
s.addword(word);
s.stem();
String stem = s.toString();
if (db.add(stem)) {
writer.write(stem);
writer.write(EOL);
}
}
System.out.println("Reduced file from " + i + " words to " + db.size());
} catch (IOException e1) {
e1.printStackTrace();
}
}
(I tidied up a couple of other things too ...)
The reason you get Reduced file from 58110 words to 58109 console output is that you only have one System.out.println statement after the loop.
The writer should write words only to the output file src/newwordlist.txt and not to the console. If you want your program to output words to the console add additional System.out.println(word) after writer.write("hi");
Hope this helps...
Works for me. Is this your exact class, did you edit it before pasting in?
wordlist;
the
cat
sat
on
the
mat
newwordlist.txt;
thecatsatonmat
My Stemmer just returns the word you gave it.
public class Stemmer {
private String word;
public void addword(String word) {
this.word = word;
}
public void stem() {
// TODO Auto-generated method stub
}
@Override
public String toString() {
return word;
}
}
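According to the Java documentation, BufferedWriter also has a write overload that takes an offset and a length:
write(string,offset,length);
so you could try the following (the single-argument write(String) inherited from Writer is valid too, so this should behave the same as the original call):
writer.write(stem,0,stem.length());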
When I run your edited code I get one line with
hihihihihihihihihihihihihi ............
As expected.
Perhaps you intended to add newline characters, like this:
if(!db.contains(word)){
db.add(word);
writer.write(word);
writer.write("\n");
}

How do I make my java code search only for a to z and 0 to 9

1. My Java code takes almost 10-15 minutes to run (the input file is a 7200+ line list of queries). How do I make it run in a shorter time and get the same results?
2. How do I make my code search only for a-z, A-Z and 0-9?
3. If I don't do #2, some characters in my output are shown as "?". How do I solve this issue?
// no parameters are used in the main method
public static void main(String[] args) {
// assumes a text file named test.txt in a folder under the C:\file\test.txt
Scanner s = null;
BufferedWriter out = null;
try {
// create a scanner to read from the text file test.txt
FileInputStream fstream = new FileInputStream("C:\\user\\query.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
// Write to the file
out = new BufferedWriter(new FileWriter("C:\\user\\outputquery.txt"));
// keep getting the next String from the text, separated by white space
// and print each token in a line in the output file
//while (s.hasNext()) {
// String token = s.next();
// System.out.println(token);
// out.write(token + "\r\n");
//}
String strLine="";
String str="";
while ((strLine = br.readLine()) != null) {
str+=strLine;
}
String st=str.replaceAll(" ", "");
char[]third =st.toCharArray();
System.out.println("Character Total");
for(int counter =0;counter<third.length;counter++){
//String ch= "a";
char ch= third[counter];
int count=0;
for ( int i=0; i<third.length; i++){
// if (ch=="a")
if (ch==third[i])
count++;
}
boolean flag=false;
for(int j=counter-1;j>=0;j--){
//if(ch=="b")
if(ch==third[j])
flag=true;
}
if(!flag){
System.out.println(ch+" "+count);
out.write(ch+" "+count);
}
}
// close the output file
out.close();
} catch (IOException e) {
// print any error messages
System.out.println(e.getMessage());
}
// optional to close the scanner here, the close can occur at the end of the code
finally {
if (s != null) {
// close the input file
s.close();
}
}
}
For something like this I would NOT recommend Java. It is entirely possible, but it is much easier with GAWK or something similar. GAWK also has Java-like syntax, so it's easy to pick up. You should check it out.
SO isn't really the place to ask such a broad how-do-I-do-this question, but I will refer you to the following page on regular expressions and text matching in Java. Also, check out the Javadocs for regexes.
If you follow that link you should get what you want, else you could post a more specific question back on SO.
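For what it is worth, the counting itself can be done in a single pass in plain Java. The sketch below is an illustration rather than a tuned solution: AlnumCounter and count are hypothetical names, it keeps only ASCII letters and digits, and it replaces the nested loops from the question, which compare every character with every other character, with one map update per character.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts ASCII letters and digits in one pass.
final class AlnumCounter {

    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum) {
                // merge adds 1 to the existing count, or stores 1 for a new character
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The method can be called once per line read from the file, with the per-line maps added together, so the whole input never has to be concatenated into one String.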
