I'm a newbie to Java, and I'm reading in a ~25 MB file, and it takes forever just to load... Are there any alternatives that would make this faster? Is it the Scanner that can't handle large files?
String text = "";
Scanner sc = new Scanner(new File("text.txt"));
while(sc.hasNext()) {
text += sc.next();
}
You are concatenating to text on every iteration, and Strings are immutable in Java. This means a new String object is created in memory every time text is "modified," resulting in long load times for large files. You should always prefer a StringBuilder when you are repeatedly altering a String.
You could do:
StringBuilder text = new StringBuilder();
Scanner sc = new Scanner(new File("text.txt"));
while(sc.hasNext()) {
text.append(sc.next());
}
When you want to access the contents of text, you can call text.toString().
The culprit is the String +=, which creates a new, ever-growing String object on every iteration.
In fact, for files smaller than 25 MB one could, among other things, do:
StringBuilder sb = new StringBuilder();
BufferedReader in = new BufferedReader(new InputStreamReader(
new FileInputStream(new File("text.txt")), "UTF-8"));
for (;;) {
String line = in.readLine();
if (line == null)
break;
sb.append(line).append("\n");
}
in.close();
String text = sb.toString();
readLine yields the line up to the newline character(s), not including them.
In Java 7 one could do:
Path path = Paths.get("text.txt");
String text = new String(Files.readAllBytes(path), "UTF-8");
The encoding is given explicitly as UTF-8; "Windows-1252" would be for Windows Latin-1, and so on.
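For example, the same call can take a Charset object instead of a charset name (the Windows-1252 encoding here is only an illustration; pick whatever encoding the file actually uses):
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path path = Paths.get("text.txt");
// Windows Latin-1 variant of the same whole-file read
String text = new String(Files.readAllBytes(path), Charset.forName("windows-1252"));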
Try to use buffered streams, e.g. BufferedInputStream or BufferedReader; they will speed things up. For more information about buffered streams, take a look here:
http://docs.oracle.com/javase/tutorial/essential/io/buffers.html
And instead of String use a StringBuilder, since Strings are immutable in Java; concatenation creates a new String on each iteration of the while loop.
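A minimal sketch of both suggestions combined (the file name is the one from the question; this uses the platform default encoding):
import java.io.BufferedReader;
import java.io.FileReader;

StringBuilder text = new StringBuilder();
BufferedReader reader = new BufferedReader(new FileReader("text.txt"));
try {
    String line;
    while ((line = reader.readLine()) != null) {
        text.append(line).append('\n');   // readLine() strips the line terminator, so add one back
    }
} finally {
    reader.close();
}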
I believe I am not using StringTokenizer correctly. Here is my code:
buffer = new byte[(int) (end - begin)];
fin.seek(begin);
fin.read(buffer, 0, (int) (end - begin));
StringTokenizer strk = new StringTokenizer(new String(buffer),
DELIMS,true);
As you can see, I am reading a chunk of lines from a file (end and begin are line numbers) and transferring the data to a StringTokenizer. My delimiters are:
DELIMS = "\r\n ";
because I want to separate words that have a space between them, or are on the next line.
However, this code sometimes also splits whole words apart. What could be the explanation? Is my DELIMS string wrong?
Also, I am passing "true" as an argument to the tokenizer because I want the delimiters to be treated as tokens as well (I want this so I can count the line I am currently on).
Could you please help me? Thanks a lot.
To start with, your method for converting bytes into a String is a bit suspect (new String(buffer) uses the platform default charset), and this overall approach will be less than efficient, especially for a larger file.
Are you required to use StringTokenizer? If not, I'd strongly recommend using Scanner instead. I'd provide you with an example, but will ask that you just refer to the Javadocs instead, which are quite comprehensive and already contain good examples. That said, it accepts delimiters as well - but as Regular Expressions, so just be aware.
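For a rough idea, it could look something like this (just a sketch; the pattern simply mirrors your DELIMS, and note that Scanner does not hand the delimiters back as tokens the way your true flag does):
import java.util.Scanner;

Scanner s = new Scanner(new String(buffer, "UTF-8"));   // the charset is an assumption
s.useDelimiter("[\r\n ]+");                             // a regular expression, not a character list
while (s.hasNext()) {
    String word = s.next();
    // do whatever with word
}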
You could always wrap your input stream in a LineNumberReader. That will keep track of the line number for you. LineNumberReader extends BufferedReader, which has a readLine() method. With that, you could use a regular StringTokenizer to get your words as tokens. You could use regular expressions or Scanner, but for this case, StringTokenizer is simpler for beginners to understand and quicker.
You must have a RandomAccessFile. You didn't specify that, but I'm guessing based on the methods you used. Try something like:
byte [] buffer = ...; // you know how to get this.
ByteArrayInputStream stream = new ByteArrayInputStream(buffer);
// if you have java.util.Scanner
{
int lineNumber = 0;
Scanner s = new Scanner(stream);
while (s.hasNextLine()) {
lineNumber++;
String line = s.nextLine();
System.out.format("I am on line %d%n", lineNumber);
Scanner lineScanner = new Scanner(line);
while (lineScanner.hasNext()) {
String word = lineScanner.next();
// do whatever with word
}
}
}
// if you don't have java.util.Scanner, or want to use StringTokenizer
{
LineNumberReader reader = new LineNumberReader(
new InputStreamReader(stream));
String line = null;
while ((line = reader.readLine()) != null) {
System.out.println("I am on line " + reader.getLineNumber());
StringTokenizer tok = new StringTokenizer(line);
while (tok.hasMoreTokens()) {
String word = tok.nextToken();
// do whatever with word
}
}
}
I am using this code to read a txt file, line by line.
// Open the file that is the first command line parameter
FileInputStream fstream = new FileInputStream("/Users/dimitramicha/Desktop/SweetHome3D1.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
// Read File Line By Line
int i = 0;
while ((strLine = br.readLine()) != null) {
str[i] = strLine;
i++;
}
// Close the input stream
in.close();
and I save it in an array.
Afterwards, I would like to make an if statement about the Strings that I saved in the array. But when I do that it doesn't work, because (as I suspect) it also saves the spaces. Do you have any idea how I can save the data in the array, but without the spaces?
I would do:
strLineWithoutSpaces = strLine.replace(" ", "");
str[i] = strLineWithoutSpaces;
You can also do more replaces if you find other characters that you don't want.
Have a look at the replace method in String and call it on strLine before putting it in the array.
You can use a Scanner which by default uses white space to separate tokens. Have a look at this tutorial.
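A rough sketch of that idea, reusing the path from your code (each call to next() returns one whitespace-free word):
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

List<String> words = new ArrayList<String>();
Scanner sc = new Scanner(new File("/Users/dimitramicha/Desktop/SweetHome3D1.txt"));
while (sc.hasNext()) {
    words.add(sc.next());   // next() skips the whitespace between tokens
}
sc.close();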
I am using Java + Selenium 1 to test a web application.
I have to read through a text file line by line using BufferedReader.readLine and compare the data that was found to another String.
Is there a way to assign each line to a unique string? I think it would be something like this:
FileInputStream fstream = new FileInputStream("C:\\write.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
String[] strArray = null;
int p=0;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
strArray[p] = strLine;
assertTrue(strArray[p].equals(someString));
p=p+1;
}
The problem with this is that you don't know how many lines there are, so you can't size your array correctly. Use a List<String> instead.
In order of decreasing importance,
You don't need to store the Strings in an array at all, as pointed out by Perception.
You don't know how many lines there are, so as pointed out by Qwerky, if you do need to store them you should use a resizeable collection like ArrayList.
DataInputStream is not needed: you can just wrap your FileInputStream directly in an InputStreamReader.
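Putting the last two points together, a rough sketch (the path is the one from the question; whether you keep the lines at all is up to you):
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

List<String> lines = new ArrayList<String>();
BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("C:\\write.txt")));
String strLine;
while ((strLine = br.readLine()) != null) {
    // compare strLine to someString right here, e.g. with assertTrue
    lines.add(strLine);   // only needed if you really must keep the lines
}
br.close();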
You may want to try something like:
public final static String someString = "someString";
public boolean isMyFileOk(String filename) throws FileNotFoundException {
Scanner sc = new Scanner(new File(filename));
boolean fileOk = true;
while(sc.hasNextLine() && fileOk){
String line = sc.nextLine();
fileOk = isMyLineOk(line);
}
sc.close();
return fileOk;
}
public boolean isMyLineOk(String line){
return line.equals(someString);
}
The Scanner class is usually a great class to read files :)
And as suggested, you may check one line at a time instead of loading them all into memory before processing them. This may not be an issue if your file is relatively small, but you might as well keep your code scalable, especially since it does exactly the same thing :)
Currently I am trying something very simple. I am looking through an XML document for a certain phrase, which I then try to replace. The problem I am having is that when I read the lines I store each line in a StringBuffer, but when I write it back to a document everything ends up on a single line.
Here is my code:
File xmlFile = new File("abc.xml");
BufferedReader br = new BufferedReader(new FileReader(xmlFile));
String line = null;
while((line = br.readLine())!= null)
{
if(line.indexOf("abc") != -1)
{
line = line.replaceAll("abc","xyz");
}
sb.append(line);
}
br.close();
BufferedWriter bw = new BufferedWriter(new FileWriter(xmlFile));
bw.write(sb.toString());
bw.close();
I am assuming I need a newline character when I do sb.append, but unfortunately I don't know which character to use, as "\n" does not work.
Thanks in advance!
P.S. I figured there must be a way to use Xalan to format the XML file after I write to it or something. Not sure how to do that though.
readLine reads everything between the newline characters, so when you write back out, the newline characters are obviously missing. These characters depend on the OS: Windows uses two characters for a newline, Unix uses one, for example. To be OS-agnostic, retrieve the system property "line.separator":
String newline = System.getProperty("line.separator");
and append it to your StringBuffer:
sb.append(line).append(newline);
Modified as suggested by Brel, your text-substituting approach should work, and it will work well enough for simple applications.
If things start to get a little hairier, and you end up wanting to select elements based on their position in the XML structure, and if you need to be sure to change element text but not tag text (think <abc>abc</abc>), then you'll want to call in the cavalry and process the XML with an XML parser.
Essentially you read in a Document using a DocumentBuilder, you hop around the document's nodes doing whatever you need to, and then ask the Document to write itself back to file. Or do you ask the parser? Anyway, most XML parsers have a handful of options that let you format the XML output: you can specify indentation (or not) and maybe newlines for every opening tag, that kind of thing, to make your XML look pretty.
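A bare-bones sketch of that round trip (the file name and the abc/xyz replacement come from the earlier snippet; replacing element text only and asking for indented output are illustrative choices, not the only way to do it):
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

static void replaceInXml(File xmlFile) throws Exception {
    // Parse the file into a DOM tree
    Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(xmlFile);

    // Visit every element and replace "abc" with "xyz" in its text content only,
    // leaving tag names untouched
    NodeList elements = doc.getElementsByTagName("*");
    for (int i = 0; i < elements.getLength(); i++) {
        Node child = elements.item(i).getFirstChild();
        if (child != null && child.getNodeType() == Node.TEXT_NODE) {
            child.setNodeValue(child.getNodeValue().replace("abc", "xyz"));
        }
    }

    // Ask the transformer to write the document back out, indented
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.INDENT, "yes");
    t.transform(new DOMSource(doc), new StreamResult(xmlFile));
}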
sb would be the StringBuffer object, which has not been instantiated in this example. It can be added before the while loop:
StringBuffer sb = new StringBuffer();
Scanner scan = new Scanner(System.in);
String filePath = scan.next();
String oldString = "old_string";
String newString = "new_string";
String oldContent = "";
BufferedReader br = null;
FileWriter writer = null;
File xmlFile = new File(filePath);
try {
br = new BufferedReader(new FileReader(xmlFile));
String line = br.readLine();
while (line != null) {
oldContent = oldContent + line + System.lineSeparator();
line = br.readLine();
}
String newContent = oldContent.replaceAll(oldString, newString);
writer = new FileWriter(xmlFile);
writer.write(newContent);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
scan.close();
if (br != null) br.close();
if (writer != null) writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
I'm coming from a C++ background, so be kind on my n00bish queries...
I'd like to read data from an input file and store it in a stringstream. I can accomplish this in an easy way in C++ using stringstreams. I'm a bit lost trying to do the same in Java.
Following is a crude approach I've developed where I store the data, read line by line, in a string array. I need to capture my data in a string stream instead (rather than using a string array). Any help?
char dataCharArray[] = new char[2];
int marker=0;
String inputLine;
String temp_to_write_data[] = new String[100];
// Now, read from output_x into stringstream
FileInputStream fstream = new FileInputStream("output_" + dataCharArray[0]);
// Convert our input stream to a BufferedReader
BufferedReader in = new BufferedReader (new InputStreamReader(fstream));
// Continue to read lines while there are still some left to read
while ((inputLine = in.readLine()) != null )
{
// Print file line to screen
// System.out.println (inputLine);
temp_to_write_data[marker] = inputLine;
marker++;
}
EDIT:
I think what I really wanted was a StringBuffer.
I need to read data from a file (into a StringBuffer, probably) and write/transfer all the data back to another file.
In Java, first preference should always be given to reusing code from well-established libraries:
http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html
http://commons.apache.org/io/api-1.4/org/apache/commons/io/FileUtils.html
In short, what you need is this:
FileUtils.readFileToString(File file)
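A minimal sketch of the read-then-write round trip with commons-io (the file names are placeholders, the jar must be on the classpath, and both calls use the platform default encoding):
import java.io.File;
import org.apache.commons.io.FileUtils;

// Read the whole file into a String, then write it out to another file
String data = FileUtils.readFileToString(new File("in.txt"));
FileUtils.writeStringToFile(new File("out.txt"), data);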
StringBuffer is one answer, but if you're only writing it to another file, you can simply open an OutputStream and write it directly out to the other file. Holding a whole file in memory is probably not a good idea.
If you simply want to read a file and write another one:
BufferedInputStream in = new BufferedInputStream( new FileInputStream( "in.txt" ) );
BufferedOutputStream out = new BufferedOutputStream( new FileOutputStream( "out.txt" ) );
int b;
while ( (b = in.read()) != -1 ) {
out.write( b );
}
If you want to read a file into a string:
StringWriter out = new StringWriter();
BufferedReader in = new BufferedReader( new FileReader( "in.txt" ) );
int c;
while ( (c = in.read()) != -1 ) {
out.write( c );
}
StringBuffer buf = out.getBuffer();
This can be made more efficient if you read using byte arrays. But I recommend that you use the excellent Apache commons-io; IOUtils (http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html) will do the loop for you.
Also, you should remember to close the streams.
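For example, a sketch that lets IOUtils do the copy loop and closes both streams in a finally block (same file names as above):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.commons.io.IOUtils;

InputStream in = new FileInputStream("in.txt");
OutputStream out = new FileOutputStream("out.txt");
try {
    IOUtils.copy(in, out);   // reads and writes using an internal buffer
} finally {
    in.close();
    out.close();
}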
I also come from C++, and I was looking for a class similar to C++'s stringstream, but I couldn't find one. In my case (which I think was very simple), I was trying to read a file line by line and then read a String and an Integer from each of those lines. My final solution was to use two objects of the class java.util.Scanner: one to read each line of the file into a String, and a second one to re-read the content of that line (now in the String) into the variables (a new String and a positive 'int'). Here's my code:
try {
//"path" is a String containing the path of the file we want to read
Scanner sc = new Scanner(new BufferedReader(new FileReader(new File(path))));
while (sc.hasNextLine()) { //while the file isn't over
Scanner scLine = new Scanner(sc.nextLine());
//sc.nextLine() returns the next line of the file into a String
//scLine will now proceed to scan (i.e. analyze) the content of the string
//and identify the string and the positive 'int' (what in C++ would be an 'unsigned int')
String s = scLine.next(); //this returns the string wanted
int x;
if (!scLine.hasNextInt() || (x = scLine.nextInt()) < 0) return false;
//scLine.hasNextInt() analyzes if the following pattern can be interpreted as an int
//scLine.nextInt() reads the int, and then we check if it is positive or not
//AT THIS POINT, WE ALREADY HAVE THE VARIABLES WANTED AND WE CAN DO
//WHATEVER WE WANT WITH THEM
//in my case, I put them into a HashMap called 'hm'
hm.put(s, x);
}
sc.close();
//we finally close the scanner to point out that we won't need it again 'till the next time
} catch (Exception e) {
return false;
}
return true;
Hope that helped.