Saving huge file in a string JAVA - java

I'm trying to read a FASTA file into a string in Java so that I can work with that string.
My code works fine with small files, but when I choose a real FASTA file, which contains 5 million characters, the program gets stuck: I see no output and the window goes black.
public static String ReadFastaFile(File file) throws IOException{
String seq="";
try(Scanner scanner = new Scanner(new File(file.getPath()))) {
while ( scanner.hasNextLine() ) {
String line = scanner.nextLine();
seq+=line;
// process line here.
}
}
return seq;
}

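Try using a StringBuilder to process large amounts of text. Repeated += on a String copies the whole string on every iteration, so the loop gets slower and slower as the file grows: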
public static String ReadFastaFile( File file ) throws IOException {
    StringBuilder seq = new StringBuilder();
    try( Scanner scanner = new Scanner( file ) ) {
        while ( scanner.hasNextLine() ) {
            String line = scanner.nextLine();
            seq.append( line );
            // process line here.
        }
    }
    return seq.toString();
}

I would use a BufferedReader to read the file, and also concatenate with a StringBuilder as davidbuzatto said. Something like this:
public static String readFastaFile(File file) throws IOException {
    StringBuilder seq = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = br.readLine()) != null) {
            // process line here.
            seq.append(line);
        }
    }
    return seq.toString();
}
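Putting the two suggestions together, a minimal sketch might look like the following. It assumes the usual FASTA layout, in which lines starting with '>' are headers rather than sequence data; the class name FastaReader and the decision to skip headers are illustrative and not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class FastaReader {

    // Reads all sequence lines from the reader into one string.
    // Assumption: lines beginning with '>' are FASTA headers and are skipped.
    static String readSequence(Reader source) throws IOException {
        StringBuilder seq = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith(">")) {
                    continue; // header line, not sequence data
                }
                seq.append(line); // appends in place, no full copy per line
            }
        }
        return seq.toString();
    }

    public static void main(String[] args) throws IOException {
        String sample = ">seq1 example header\nACGT\nTTGA\n";
        System.out.println(readSequence(new StringReader(sample))); // prints ACGTTTGA
    }
}
```

To read an actual file, pass new FileReader(file) instead of the StringReader.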


How to feed multiple files to the command line in Java without using a shell

I am quite new to Java, so this might be a stupid question, but I need to solve it for my data structures class project.
I am trying to feed my program 2 different input files. I know we can use Scanner and InputStreamReader to do this with 1 file, but I don't know how to do it with 2 files.
In some answers to questions similar to mine, someone mentioned the shell, which could probably solve this problem. However, I don't know anything about the shell, so I am wondering whether this can be solved without writing a shell script, and what the command-line syntax would be for passing multiple files.
What I execute in command line(with 1 input file):
java UserInterfaceOrNot < input.txt > output.txt
I will post more code if needed.
Code:
public class UserInterfaceOrNot
{
public static EventManager em;
public static Scanner scn = new Scanner(new InputStreamReader(System.in));
public static void main (String [] args)
{
UserInterfaceOrNot ui = new UserInterfaceOrNot();
while (scn.hasNext()){ui.runData();}
scn = new Scanner(new InputStreamReader(System.in));
while (scn.hasNext() && !scn.next().equals("x")){ui.runCommand();}
}
}
Pass the file names as command-line arguments instead of redirecting standard input:
java UserInterfaceOrNot input1.txt input2.txt output.txt
When you call your program like this, you are actually passing 3 arguments to your public static void main(String[] args) method.
You can find these arguments, in order, in that String array (String[] args).
To read the arguments:
String myFirstFile = args[0]; // this will be "input1.txt"
String mySecondFile = args[1]; // this will be "input2.txt"
String myOutputFile = args[2]; // this will be "output.txt"
You can read each file (input1 and input2) by creating another method like this:
public String readFileAsString(String inputFile) throws IOException {
    BufferedReader br = new BufferedReader(new FileReader(inputFile));
    try {
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();
        while (line != null) {
            sb.append(line);
            sb.append(System.lineSeparator());
            line = br.readLine();
        }
        return sb.toString();
    } finally {
        // Close the reader even if reading fails.
        br.close();
    }
}
Then in your main method you can call it like this:
public static void main(String[] args) throws Exception {
    UserInterfaceOrNot ui = new UserInterfaceOrNot();
    String inputFile1 = args[0];
    String inputFile2 = args[1];
    String input1AsString = ui.readFileAsString(inputFile1);
    String input2AsString = ui.readFileAsString(inputFile2);
    //continue with your logic
}
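The snippets above read the two inputs but never use the third argument. A rough sketch of the whole round trip, including an argument check and writing to the output file, could look like this. The class name TwoInputsOneOutput is made up, UTF-8 is an assumed encoding, and plain concatenation is only a placeholder for the real processing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TwoInputsOneOutput {

    // Reads both input files and writes their contents, one after the other,
    // to the output file. Concatenation stands in for the real processing.
    static void run(String[] args) throws IOException {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "usage: java TwoInputsOneOutput <input1> <input2> <output>");
        }
        Path input1 = Paths.get(args[0]);
        Path input2 = Paths.get(args[1]);
        Path output = Paths.get(args[2]);

        // Assumption: both files are UTF-8 text and small enough for memory.
        String first = new String(Files.readAllBytes(input1), StandardCharsets.UTF_8);
        String second = new String(Files.readAllBytes(input2), StandardCharsets.UTF_8);

        Files.write(output, (first + second).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        run(args);
    }
}
```

It would be started as java TwoInputsOneOutput input1.txt input2.txt output.txt, with no < or > redirection involved.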

Java Reading in text file and outputting it to new file with removed duplicates

I have a text file with an integer on each line, ordered from least to greatest, and I want to put them in a new text file with any duplicate numbers removed.
I've managed to read in the text file and print the numbers on the screen, but I'm unsure on how to actually write them in a new file, with duplicates removed?
public static void main(String[] args)
{
try
{
FileReader fr = new FileReader("sample.txt");
BufferedReader br = new BufferedReader(fr);
String str;
while ((str = br.readLine()) != null) {
out.println(str + "\n");
}
br.close();
}
catch (IOException e) {
out.println("File not found");
}
}
When reading the file, you could add the numbers to a Set, which is a data structure that doesn't allow duplicate values (just Google for "java collections" for more details)
Then you iterate through this Set, writing the numbers to a FileOutputStream (google for "java io" for more details)
Instead of printing each of the numbers, add them to an Array. After you've added all the integers, you can cycle through the array to remove duplicates (sample code for this can be found fairly easily).
Once you have an array, use BufferedWriter to write to an output file. Example code for how to do this can be found here: https://www.mkyong.com/java/how-to-write-to-file-in-java-bufferedwriter-example/
Alternatively, use a Set, and BufferedWriter should still work in the same way.
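As a rough illustration of the Set plus BufferedWriter approach described above, the sketch below collects the lines in a LinkedHashSet, which drops duplicates while keeping the order in which values were first seen. The class and method names are invented for this example, and blank lines are skipped as an assumption about the input.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashSet;
import java.util.Set;

class RemoveDuplicates {

    // Copies lines from in to out, keeping only the first occurrence of each.
    static void copyUnique(Reader in, Writer out) throws IOException {
        // LinkedHashSet rejects duplicates and remembers insertion order,
        // so the sorted order of the input is preserved.
        Set<String> seen = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                String value = line.trim();
                if (!value.isEmpty()) {
                    seen.add(value);
                }
            }
        }
        try (BufferedWriter bw = new BufferedWriter(out)) {
            for (String number : seen) {
                bw.write(number);
                bw.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        copyUnique(new StringReader("1\n2\n2\n3\n3\n3\n"), result);
        System.out.print(result); // prints 1, 2 and 3 on separate lines
    }
}
```

For real files, pass new FileReader("sample.txt") and a FileWriter for the output file instead of the in-memory reader and writer.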
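Assuming the input file is already ordered:
import java.io.BufferedInputStream;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
public class Question42475459 {
    public static void main(final String[] args) throws IOException {
        final String inFile = "sample.txt";
        try (final Scanner scanner = new Scanner(new BufferedInputStream(new FileInputStream(inFile)), "UTF-8");
             BufferedWriter writer = new BufferedWriter(new FileWriter(inFile + ".out", false))) {
            // The input is sorted, so duplicates are adjacent:
            // comparing with the previous value is enough.
            String lastLine = null;
            while (scanner.hasNext()) {
                final String line = scanner.next();
                if (!line.equals(lastLine)) {
                    writer.write(line);
                    writer.newLine();
                    lastLine = line;
                }
            }
        }
    }
}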

BufferedReader messed up by different line separators

I have a BufferedReader streaming a file. There are two cases right now:
It is streaming a file generated on one PC, let's call it File1.
It is streaming a file generated on another computer, let's call it File2.
I'm assuming my problem is caused by the EOLs.
BufferedReader does read both files, but for the File2, it reads an extra empty line for every new line.
Also, when I compare the line using line.equalsIgnoreCase("abc"), given that the line is "abc" it does not return true.
Use this code together with the two files provided in the two links to replicate the problem:
public class JavaApplication {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException {
File file = new File("C:/Users/User/Downloads/html (2).htm");
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
String line = "";
while ((line = in.readLine()) != null) {
System.out.println(line);
}
}
}
File1,
File2
Note how the second file prints an empty line after each line...
I've been searching and trying and searching and trying, and couldn't come up with a solution.
Any ideas how to fix that? (Especially the compare thing?)
Works for me.
public class CRTest
{
static StringReader test = new StringReader( "Line 1\rLine 2\rLine 3\r" );
public static void main(String[] args) throws IOException {
BufferedReader buf = new BufferedReader( test );
for( String line = null; (line = buf.readLine()) != null; )
System.out.println( line );
}
}
Prints:
run:
Line 1
Line 2
Line 3
BUILD SUCCESSFUL (total time: 1 second)
As Joop said, I think you've mixed up which file isn't working. Please use the above skeleton to create an MCVE and show us exactly what file input isn't working for you.
Since you appear to have a file with reversed \r\n lines, here's my first attempt at a fix. Please test it, I haven't tried it yet. You need to wrap your InputStreamReader with this class, then wrap the BufferedReader on the outside like normal.
class CRFix extends Reader
{
    private final Reader reader;
    // True when the previous character delivered was '\n'.
    private boolean readNL = false;
    public CRFix( Reader reader ) {
        this.reader = reader;
    }
    @Override
    public int read( char[] cbuf, int off, int len )
        throws IOException
    {
        for( int i = off; i < off+len; i++ ) {
            int c = reader.read();
            // Skip a '\r' that directly follows a '\n' (reversed line ending).
            if( c == '\r' && readNL )
                c = reader.read();
            if( c == -1 ) {
                // End of stream: report what was copied so far, or -1 if nothing.
                if( i == off ) return -1;
                else return i-off;
            }
            readNL = ( c == '\n' );
            cbuf[i] = (char)c;
        }
        return len;
    }
    @Override
    public void close()
        throws IOException
    {
        reader.close();
    }
}
Joop was right. After some more research it turns out that, even though both files specify a UTF-16 encoding in their header, one was actually encoded in UTF-16 and the other (File1) in UTF-8. This led to the "double line effect".
Thanks for the effort that was put in answering this question.
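One plausible way such a mismatch produces both symptoms can be reproduced in memory. The sketch below assumes the problem file was UTF-16LE with Windows line endings and no byte order mark, which the thread does not confirm; the class and method names are invented. Decoded as UTF-8, every second byte becomes a NUL character, so '\r' and '\n' are no longer adjacent and each one ends a line of its own, and "abc" arrives with NULs between the letters.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

class EncodingMismatchDemo {

    // Decodes the bytes with the given charset and splits them into lines,
    // using the same line rules as BufferedReader.readLine().
    static List<String> readLines(byte[] data, Charset charset) {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(new ByteArrayInputStream(data), charset));
        return in.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two lines with Windows line endings, stored as UTF-16LE without a BOM.
        byte[] data = "abc\r\ndef\r\n".getBytes(StandardCharsets.UTF_16LE);

        List<String> wrong = readLines(data, StandardCharsets.UTF_8);
        List<String> right = readLines(data, StandardCharsets.UTF_16LE);

        System.out.println(wrong.size()); // 5: extra lines that hold only a NUL
        System.out.println(right.size()); // 2
        System.out.println(wrong.get(0).equalsIgnoreCase("abc")); // false
        System.out.println(right.get(0).equalsIgnoreCase("abc")); // true
    }
}
```

The extra lines look empty on a console because they contain only a NUL character.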

Java : Cannot Read CSV file separated by ';' and line terminated by '\n'

public static void main(String[] args) throws IOException {
String dataRow;
BufferedReader CSVFile =
new BufferedReader(new FileReader("test.csv"));
while ((dataRow = CSVFile.readLine()) != null) {
System.out.println(dataRow.split(";")[0]);
}
// Close the file once all data has been read.
CSVFile.close();
// End the printout with a blank line.
System.out.println("Done");
}
The CSV file I am trying to read, in plain-text view:
ID;Numbers
12;234
343;233
All I am able to print is blank lines with no strings in them.
Output:
1. ��I
2.
3.
4.
Done
The file encoding is simply "Unicode".
How do I read a Unicode file in Java?
Do I have to pass a parameter that sets the file's encoding to the FileReader constructor?
Kindly advise.
public static void main(String[] args) throws IOException {
String dataRow;
BufferedReader CSVFile = new BufferedReader(new FileReader("F:\\csv.csv"));
while ((dataRow = CSVFile.readLine()) != null) {
String []data = dataRow.split(";");
for (String d : data) {
System.out.print(d + " ");
}
System.out.println();
}
CSVFile.close();
System.out.println("Done");
}
The result is shown below:
ID Numbers
12 234
343 233
Done
To find out the file's encoding, open the CSV file in Notepad and click File -> Save As; the dialog shows the current encoding.
If your file's encoding is Unicode, you can pass the encoding as an argument when you open the file for reading.
The code is as follows:
public static void main(String[] args) throws IOException {
String dataRow;
BufferedReader CSVFile = new BufferedReader(new InputStreamReader(new FileInputStream("F:\\csv.csv"),"unicode"));
while ((dataRow = CSVFile.readLine()) != null) {
String []data = dataRow.split(";");
for (String d : data) {
System.out.print(d + " ");
}
System.out.println();
}
CSVFile.close();
System.out.println("Done");
}
If you do not pass the encoding argument, the file content is decoded with the default charset, which depends on your OS and Java version.
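The same idea can be written with the StandardCharsets constants, which avoids spelling the charset name as a string. This is only a sketch: it assumes that "Unicode" here means UTF-16 with a byte order mark, and it simulates the file with an in-memory byte array so that it runs on its own. The class and method names are invented.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf16CsvReader {

    // Reads semicolon-separated rows from a UTF-16 stream.
    // StandardCharsets.UTF_16 uses the byte order mark, if present,
    // to pick the byte order and does not return it as data.
    static List<String[]> readRows(InputStream in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_16))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(";"));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Simulated file: little-endian BOM (FF FE) followed by UTF-16LE text.
        byte[] text = "ID;Numbers\n12;234\n343;233\n".getBytes(StandardCharsets.UTF_16LE);
        byte[] file = new byte[text.length + 2];
        file[0] = (byte) 0xFF;
        file[1] = (byte) 0xFE;
        System.arraycopy(text, 0, file, 2, text.length);

        for (String[] row : readRows(new ByteArrayInputStream(file))) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

For a file on disk, pass new FileInputStream("test.csv") to readRows instead of the byte array stream.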

Read a particular line in a txt file in Java

The file ListeMot.txt contains 336529 lines.
How can I read one particular line?
This is my code:
int getNombre()
{
nbre = (int)(Math.random()*336529);
return nbre ;
}
public String FindWord () throws IOException{
String word = null;
int nbr= getNombre();
InputStreamReader reader = null;
LineNumberReader lnr = null;
reader = new InputStreamReader(new FileInputStream("../image/ListeMot.txt"));
lnr = new LineNumberReader(reader);
word = lnr.readLine(nbr);
}
Why can't I call word = lnr.readLine(nbr);?
Thanks
P.S. I am new to Java!
To get the Nth line you have to read all the lines before it.
If you do this more than once, the most efficient thing to do may be to load all the lines into memory first.
private final List<String> words = new ArrayList<String>();
private final Random random = new Random();
public String randomWord() throws IOException {
    if (words.isEmpty()) {
        BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("../image/ListeMot.txt")));
        String line;
        while ((line = br.readLine()) != null)
            words.add(line);
        br.close();
    }
    return words.get(random.nextInt(words.size()));
}
BTW: Is the parameter theWord meant to be used?
There is no method like readLine(int lineNumber) in the Java API. To reach a specific line number you have to read all the lines before it. I have modified your 2nd method, take a look at it:
public void FindWord () throws IOException
{
String word = "";
int nbr = getNombre();
InputStreamReader reader = null;
LineNumberReader lnr = null;
reader = new InputStreamReader( new FileInputStream( "src/a.txt" ) );
lnr = new LineNumberReader( reader );
while(lnr.getLineNumber() != nbr)
word = lnr.readLine();
System.out.println( word );
}
The above code is not error-free: it assumes you know how many lines the text file has. If the random number is greater than the actual number of lines, the code goes into an infinite loop, so be careful.
Another issue: line numbers start from 1, so I suggest changing your random line number generator like this:
int getNombre()
{
nbre = (int)(Math.random()*336529) + 1;
return nbre ;
}
The LineNumberReader only keeps track of the number of lines read, it does not give random access to lines in the stream.
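A minimal sketch of reading line N with a plain BufferedReader, which also stops at the end of the input instead of looping forever, could look like this. The class and method names are made up for the example, and returning null for a line number past the end is just one possible choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class LineAt {

    // Returns line number n (1-based), or null if the input has fewer lines.
    static String readLineAt(Reader source, int n) throws IOException {
        if (n < 1) {
            throw new IllegalArgumentException("line numbers start at 1");
        }
        try (BufferedReader br = new BufferedReader(source)) {
            String line = null;
            // Every line before n has to be read and discarded.
            for (int current = 1; current <= n; current++) {
                line = br.readLine();
                if (line == null) {
                    return null; // reached the end of the input before line n
                }
            }
            return line;
        }
    }

    public static void main(String[] args) throws IOException {
        String words = "alpha\nbeta\ngamma\n";
        System.out.println(readLineAt(new StringReader(words), 2)); // beta
        System.out.println(readLineAt(new StringReader(words), 9)); // null
    }
}
```

For the word list, the reader would be a FileReader on ListeMot.txt and n a random number between 1 and the number of lines.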
