How can I convert an entire input file into a string? [duplicate] - java

In Java, how would I convert an entire input file into one String?
In other words, if I have an input file "test.in":
c++
java
python
test
then I want to create a String containing "c++javapythontest".
I thought of something along the lines of
Scanner input = new Scanner(new File("test.in"));
while(input.hasNext()){
String test = test + input.nextLine();
}
but that doesn't seem to work.
Is there an efficient way to do this?

To read file contents, discarding newline chars:
String contents = Files.lines(Paths.get("test.in")).collect(Collectors.joining());
I think you needed to test for hasNextLine() rather than hasNext(), and to declare test before the loop. Your code's performance also suffers from the many String objects created when you concatenate in a loop like that; if you changed your code to use a StringBuilder, it would run much faster.
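A minimal sketch of both suggestions above, assuming the file is UTF-8 text. The class and method names (WholeFileJoin, joinWithStream, joinWithScanner) are made up for illustration. Files.lines is wrapped in try-with-resources here so that the underlying file handle gets closed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original answers.
public class WholeFileJoin {

    // Joins all lines of the file, dropping the line terminators.
    public static String joinWithStream(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.collect(Collectors.joining());
        }
    }

    // Same result with a Scanner: test hasNextLine() and append to one StringBuilder.
    public static String joinWithScanner(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Scanner input = new Scanner(file, "UTF-8")) {
            while (input.hasNextLine()) {
                sb.append(input.nextLine());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for test.in (assumed UTF-8).
        Path file = Files.createTempFile("test", ".in");
        Files.write(file, "c++\njava\npython\ntest\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(joinWithStream(file));  // prints c++javapythontest
        System.out.println(joinWithScanner(file)); // prints c++javapythontest
        Files.delete(file);
    }
}
```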

There are several ways to do it. One is to use the NIO classes: the readAllBytes method of java.nio.file.Files returns the file contents as a byte array, and new String(bytes) turns that array into a String. Note that new String(bytes) decodes with the default charset, so pass a charset explicitly if you know the file's encoding.
Read the Javadoc of this method.
The following is a sample:
byte[] bytes= Files.readAllBytes(Paths.get(filePath));
String fileContent = new String(bytes);

Declare the String test out of the loop, then iterate filling it.

This code is a small modification to your original logic.
StringBuilder holds a mutable sequence of characters, which means we just append the content to the same StringBuilder object instead of creating a new String every time.
In your code, String test = test + input.nextLine(); was inside the while loop.
Thus a fresh variable test was declared on every iteration of the loop, and therefore the previous values were not kept.
String path = "test.txt";
Scanner input = new Scanner(new File(path));
StringBuilder sb = new StringBuilder();
while (input.hasNextLine()) { // pairs with nextLine() below
sb.append(input.nextLine()).append("\n"); // omit the "\n" to get "c++javapythontest"
}
System.out.println(sb.toString());

You can try this instead.
It's a simple one-liner:
String str = new String(Files.readAllBytes(Paths.get("/path/to/file")));
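This reads the entire file and keeps it as a String.
If you want to remove the newline characters, assign the result, because replace returns a new String and leaves str unchanged:
str = str.replace("\r", "").replace("\n", "");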
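A sketch of the same readAllBytes approach with the charset made explicit and every kind of line break removed. UTF-8 is an assumption about the input file, and the names ReadAllBytesExample, readWhole and stripLineBreaks are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original answers.
public class ReadAllBytesExample {

    // Reads the whole file as UTF-8 text (the charset is an assumption).
    public static String readWhole(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Strings are immutable, so the result of replaceAll has to be returned or assigned.
    // \R matches any line break sequence (\n, \r\n, \r, ...) and needs Java 8 or later.
    public static String stripLineBreaks(String text) {
        return text.replaceAll("\\R", "");
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("test", ".in");
        Files.write(file, "c++\r\njava\npython\ntest\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(stripLineBreaks(readWhole(file))); // prints c++javapythontest
        Files.delete(file);
    }
}
```

Reading all bytes at once is convenient for small files; for very large files a streaming approach is usually the safer choice.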

String.join was added in Java 8 and uses a StringBuilder internally:
String content = String.join("", Files.readAllLines(Paths.get("test.in")));

Related

Iterate through a dictionary array

I have a String array containing a poem which has deliberate spelling mistakes. I am trying to iterate through the String array to identify the spelling mistakes by comparing the String array to a String array containing a dictionary. If possible I would like a suggestion that allows me to continue using nested for loops
for (int i = 0; i < poem2.length; i++) {
boolean found = false;
for (int j = 0; j < dictionary3.length; j++) {
if (poem2[i].equals(dictionary3[j])) {
found = true;
break;
}
}
if (found==false) {
System.out.println(poem2[i]);
}
}
The output is printing out the correctly spelt words as well as the incorrectly spelt ones and I am aiming to only print out the incorrectly spelt ones. Here is how I populate the 'dictionary3' and 'poem2' arrays:
char[] buffer = null;
try {
BufferedReader br1 = new BufferedReader(new
java.io.FileReader(poem));
int bufferLength = (int) (new File(poem).length());
buffer = new char[bufferLength];
br1.read(buffer, 0, bufferLength);
br1.close();
} catch (IOException e) {
System.out.println(e.toString());
}
String text = new String(buffer);
String[] poem2 = text.split("\\s+");
char[] buffer2 = null;
try {
BufferedReader br2 = new BufferedReader(new java.io.FileReader(dictionary));
int bufferLength = (int) (new File(dictionary).length());
buffer2 = new char[bufferLength];
br2.read(buffer2, 0, bufferLength);
br2.close();
} catch (IOException e) {
System.out.println(e.toString());
}
String dictionary2 = new String(buffer);
String[] dictionary3 = dictionary2.split("\n");
Your basic problem is in line
String dictionary2 = new String(buffer);
where you were trying to convert the characters representing the dictionary, which are stored in buffer2, but you used buffer (without the 2 suffix). This style of variable naming suggests that you need either a loop or, in this case, a separate method that returns the array of words held in a given file (you can also add the delimiter on which the string should be split as a method parameter).
So your dictionary2 held characters from buffer which represented poem, not dictionary data.
Another problem is
String[] dictionary3 = dictionary2.split("\n");
because you are splitting only on \n, but some operating systems, such as Windows, use \r\n as the line separator sequence. So your dictionary array may contain words like foo\r instead of foo, which will cause poem2[i].equals(dictionary3[j]) to always fail.
To avoid this problem you can split on \\R (available since Java 8) or \r?\n|\r.
There are other problems in your code, like closing the resource inside the try block. If an exception is thrown before that point, close() is never invoked and the resource is left open. To solve this, close resources in a finally block (which is always executed after try, regardless of whether an exception is thrown), or better, use try-with-resources.
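The separate method suggested above could look roughly like this. It is a sketch under assumptions: the names TokenReader and readTokens are invented, and the files are taken to be UTF-8. The reader is opened in try-with-resources, so it is closed even if reading throws.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original answer.
public class TokenReader {

    // Reads the whole file and splits it on the given regular expression.
    public static String[] readTokens(Path file, String delimiterRegex) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            char[] chunk = new char[4096];
            int read;
            // read() may return fewer chars than requested, so loop until end of stream.
            while ((read = reader.read(chunk)) != -1) {
                text.append(chunk, 0, read);
            }
        }
        return text.toString().split(delimiterRegex);
    }

    public static void main(String[] args) throws IOException {
        Path dictionary = Files.createTempFile("dictionary", ".txt");
        Files.write(dictionary, "foo\r\nbar\r\n".getBytes(StandardCharsets.UTF_8));
        // "\\R" handles \n, \r\n and \r alike, so no word keeps a trailing \r.
        for (String word : readTokens(dictionary, "\\R")) {
            System.out.println("[" + word + "]");
        }
        Files.delete(dictionary);
    }
}
```

With such a helper the poem would be read with "\\s+" and the dictionary with "\\R", and the buffer/buffer2 mix-up cannot happen because each call has its own local buffer.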
BTW you can simplify/clarify your code responsible for reading words from files
List<String> poem2 = new ArrayList<>();
Scanner scanner = new Scanner(new File(yourFileLocation));
while(scanner.hasNext()){//has more words
poem2.add(scanner.next());
}
For dictionary instead of List you should use Set/HashSet to avoid duplicates (usually sets also have better performance when checking if they contain some elements or not). Such collections already provide methods like contains(element) so you wouldn't need that inner loop.
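As a rough illustration of the Set-based lookup, the sketch below replaces the inner loop with contains. The class name SpellCheck, the method findMisspelled and the sample words are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original answer.
public class SpellCheck {

    // Returns the poem words that are not in the dictionary, in their original order.
    public static List<String> findMisspelled(List<String> poemWords, Set<String> dictionary) {
        List<String> misspelled = new ArrayList<>();
        for (String word : poemWords) {
            if (!dictionary.contains(word)) { // hash lookup instead of an inner loop
                misspelled.add(word);
            }
        }
        return misspelled;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("roses", "are", "red"));
        List<String> poem = Arrays.asList("roses", "are", "redd");
        System.out.println(findMisspelled(poem, dictionary)); // prints [redd]
    }
}
```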
I copied your code and ran it, and I noticed two issues. Good news is, both are very quick fixes.
#1
When I printed out everything in dictionary3 after it is read in, it is the exact same as everything in poem2. This line in your code for reading in the dictionary is the problem:
String dictionary2 = new String(buffer);
You're using buffer, which was the variable you used to read in the poem. Therefore, buffer contains the poem and your poem and dictionary end up the same. I think you want to use buffer2 instead, which is what you used to read in the dictionary:
String dictionary2 = new String(buffer2);
When I changed that, the dictionary and poem appear to have the proper entries.
#2
The other problem, as Pshemo pointed out in their answer (which is completely correct, and a very good answer!) is that you are splitting on \n for the dictionary. The only thing I would say differently from Pshemo here is that you should probably split on \\s+ just like you did for the poem, to stay consistent. In fact, when I debugged, I noticed that the dictionary words all have "\r" appended to the end, probably because you were splitting on \n. To fix this, change this line:
String[] dictionary3 = dictionary2.split("\n");
To this:
String[] dictionary3 = dictionary2.split("\\s+");
Try changing those two lines, and let us know if that resolves your issue. Best of luck!
Convert your dictionary to an ArrayList and use contains instead.
Something like this should work (with dictionary3 now being the list):
if(dictionary3.contains(poem2[i]))
found = true;
else
found = false;
With this method you can also get rid of that nested loop, as the contains method handles that for you.
You can convert your Dictionary to an ArrayList with the following method:
new ArrayList<>(Arrays.asList(array))

Why string concatenation takes so long time? [duplicate]

I am concatenating a String in a loop but it takes ages, why is that?
for (String object : jsonData) {
counter++;
finalJsonDataStr += object;
}
Variable object is a piece of JSON, up to 70 chars and the loop goes approx 50k times.
I understand some people advise StringBuffer or StringBuilder, but this link says it gives no performance improvement: StringBuilder vs String concatenation in toString() in Java
Use a StringBuilder to append to strings.
When you concatenate, Java actually creates a new String holding the result of the concatenation.
Do it many times and you create a huge number of strings for nothing.
Try:
StringBuilder sb = new StringBuilder();
for (String object : jsonData) {
counter++;
sb.append(object.toString()); //this does the concatenation internally
//but is very efficient
}
finalJsonDataStr = sb.toString(); //this gives you back the whole string
Remark:
When you do stuff like
myString = "hello " + someStringVariable + " World!" + " My name is " + name;
The compiler is smart enough to replace all that with a single StringBuilder, like:
myString = new StringBuilder("hello ")
.append(someStringVariable)
.append(" World!")
.append(" My name is ")
.append(name).toString();
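But it only does this within a single expression. When the concatenation happens inside a loop, each iteration still gets its own builder and produces a new String, copying everything accumulated so far.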
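To make the loop case concrete, here is a small sketch comparing the two styles. The names LoopConcat, withPlus and withBuilder are hypothetical. Both methods return the same text; the difference is only in how much copying happens along the way.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original answers.
public class LoopConcat {

    // Each += produces a brand-new String and copies everything accumulated so far,
    // so the total work grows roughly with the square of the final length.
    public static String withPlus(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // One builder for the whole loop: an append copies only the new piece
    // (plus an occasional buffer growth), and toString() is called once.
    public static String withBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> jsonData = Arrays.asList("{\"a\":1}", "{\"b\":2}", "{\"c\":3}");
        System.out.println(withPlus(jsonData).equals(withBuilder(jsonData))); // prints true
    }
}
```

For a single expression such as the "hello " example, both styles cost about the same, which is consistent with the linked question about toString().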
You should use a StringBuffer or a StringBuilder.
When you add Strings with plus, a StringBuilder is created, the strings are appended, and a new String is returned by the StringBuilder's toString() method. So imagine this object creation and string copying 50k times. It's much better if you instantiate only one StringBuilder yourself and just append strings...
This answer could be of use to you: concatenation operator (+) vs concat()
Before going to the actual problem, see how internal concatenation works.
String testString ="str"+"ingcon"+"catenation";
If we print the String declared above to the console, the result is stringconcatenation, which is correct, so + works fine. Here is our actual question: how does the + symbol do the magic? It is not a normal mathematical addition of Strings. The snippet below shows what code with + is conceptually converted to (for literals like these the compiler can fold the result at compile time; the builder form applies when the operands are variables).
StringBuilder compilerGeneratedBuilder = new StringBuilder();
compilerGeneratedBuilder.append("str");
compilerGeneratedBuilder.append("ingcon");
compilerGeneratedBuilder.append("catenation");
String finalString = compilerGeneratedBuilder.toString();
More .....
A loop of 50K iterations is a decent performance blocker to consider.
In such cases use a StringBuilder with the append method, because concatenation with + creates a new StringBuilder object every time. That leads to 50k object creations.
With a single StringBuilder and the append method, you save the time spent on object creation as well as the memory.

Storing multiple values in Java without using arrays

Take user input 5 times, store the values in a variable, and display all 5 values at the end. How can I do this in Java without using arrays, collections, or a database? Only single variables like String and int are allowed.
Output should look like this
https://drive.google.com/file/d/0B1OL94dWwAF4cDVyWG91SVZjRk0/view?pli=1
This seems like a needless exercise in futility, but I digress...
If you want to store them in a single string, you can do it like so:
Scanner in = new Scanner(System.in);
String storageString = "";
while(in.hasNext()){
storageString += in.next() + ";";
}
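If you then input foo bar baz, storageString will contain foo;bar;baz;. (in.next() reads the input up to the next whitespace, and in.hasNext() returns false once the input stream ends, for example at end-of-file; on an interactive console it waits for more input rather than stopping at the end of the line.)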
As more strings are input, they are appended to the storageString variable. To retrieve the strings, you can use String.split(String regex). Using this is done like so:
String[] strings = storageString.split(";");
the strings array which is retrieved here from the storageString variable above should have the value ["foo", "bar", "baz"].
I hope this helps. Using a string as storage is not optimal because JVM creates a new object every time a string is appended onto it. To get around this, use StringBuilder.
*EDIT: I originally had said the value of the strings array would be ["foo", "bar", "baz", ""]. This is wrong. The javadoc states 'Trailing empty strings are therefore not included in the resulting array'.
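A sketch of the delimiter idea with a fixed count of five inputs. The names FiveInputs and collect are hypothetical, and a Scanner over a String stands in for System.in so the example runs unattended. The values are displayed by swapping the delimiter for a line break, which avoids arrays altogether.

```java
import java.util.Scanner;

// Hypothetical demo class, not part of the original answers.
public class FiveInputs {

    // Reads up to count lines and stores them in one delimited string.
    public static String collect(Scanner in, int count) {
        StringBuilder storage = new StringBuilder();
        for (int i = 0; i < count && in.hasNextLine(); i++) {
            storage.append(in.nextLine()).append(';');
        }
        return storage.toString();
    }

    public static void main(String[] args) {
        // Use new Scanner(System.in) instead to read from the console.
        String stored = collect(new Scanner("foo\nbar\nbaz\nqux\nquux\n"), 5);
        System.out.println(stored);                   // prints foo;bar;baz;qux;quux;
        System.out.print(stored.replace(';', '\n'));  // one value per line, no array needed
    }
}
```

This assumes the inputs themselves never contain the ';' character; otherwise a different delimiter would be needed.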
public static void main(String[] args) {
String s = "";
Scanner in = new Scanner(System.in);
for(int i=0;i<5;i++){
s += in.nextLine();
}
System.out.println(s);
}
Why don't you use StringBuilder or StringBuffer and keep appending a delimiter followed by the input text?
Use simple String object and concatenate it with new value provided by user.
String myString = "";
// while reading from input
myString += providedValue;

String replace and output in Java [duplicate]

I got a query, please see code below:
public void readFile(String path,String pathName,int num){
try{
PrintWriter out2=new PrintWriter(new PrintWriter(path));
File a=new File(pathName);
Scanner b=new Scanner(a);
while(b.hasNextLine()){
String message=b.nextLine();
Scanner h=new Scanner(message);
while(h.hasNext()){
String f=h.next();
if (f.equals("are")){
f.replace("are","ARE");
}
}
out2.printf("%s",message);
out2.println();
.......
The file content for scanner read is
who are you?
how are you?
what is up!
However, when I run the above codes and the output to the new file are the same with the input file, it means the "are" not replaced by "ARE", I have no idea which part is wrong, please advise, thanks guys!
This line just outputs the message unchanged to the new file.
out2.printf("%s",message);
Also the loop is strange too: why do you read it word by word, and then use String.replace()? You could do it line by line, using String.replaceAll():
while(b.hasNextLine()){
String message=b.nextLine();
out2.printf("%s",message.replaceAll("(^|\\W)are(\\W|$)"," ARE "));
out2.println();
}
The (^|\\W)are(\\W|$) string is a regular expression. It matches the word are when it is preceded by either the start of the string (^) or a non-word character (\\W), and followed by either a non-word character or the end of the line ($)...
As Scanner uses whitespace as its default delimiter, it might be even better to use (^|\\s)are(\\s|$); however, both of these replace the character before and after "ARE" with a single space...
Also, keep in mind that String.replace does not mutate the input String... You have to assign the result, or use it some other way, such as passing it to a function...
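The sketch below shows the immutability point and, as a swapped-in alternative that the answer itself does not use, the \b word-boundary anchor, which matches the whole word without consuming the characters around it. The names ReplaceWord and upperCaseAre are hypothetical.

```java
// Hypothetical demo class, not part of the original answers.
public class ReplaceWord {

    // \b is a zero-width word boundary, so surrounding spaces and punctuation are kept.
    public static String upperCaseAre(String line) {
        return line.replaceAll("\\bare\\b", "ARE");
    }

    public static void main(String[] args) {
        String f = "are";
        f.replace("are", "ARE");      // result discarded: f is unchanged
        System.out.println(f);        // prints are
        f = f.replace("are", "ARE");  // assigning the result keeps the new String
        System.out.println(f);        // prints ARE

        System.out.println(upperCaseAre("who are you?")); // prints who ARE you?
    }
}
```

Words that merely contain the letters, such as "rare", are left alone because there is no word boundary in front of the "a".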
String is final and immutable, so its methods never change the object they are called on.
So the result of f.replace("are","ARE"); must be assigned, either to a new variable or back to the same one:
f = f.replace("are","ARE");
I do not understand why you are doing that. Here is an alternative approach:
Get a BufferedReader to read the file.
While there is data in the file, read the lines.
If line.contains("are") then line = line.replace("are","ARE")
println(line)
As to why your code did not work:
In this line, f.replace("are","ARE"); you forgot to use the return value.
Since message is what gets written to the file, assign the result there: message = message.replace("are","ARE");
Another option is to use StringBuffer or StringBuilder
Strings are immutable. Therefore, you can not run the replace method on object f and expect its value to be changed since the replace method of a string object will simply return a new String object.
Either use a StringBuilder instead, or use:
f = f.replace
On the other hand, StringBuilder objects are mutable. Therefore, if you choose that route, you can run the StringBuilder version of the replace method, which takes a start index, an end index and the replacement text, directly on the object.

Using StringBuilder to process csv files to save heap space

I am reading a csv file that has about 50,000 lines and 1.1MiB in size (and can grow larger).
In Code1, I use String to process the csv, while in Code2 I use StringBuilder (only one thread executes the code, so no concurrency issues)
Using StringBuilder makes the code a little bit harder to read than using the normal String class.
Am I prematurely optimizing things with StringBuilder in Code2 to save a bit of heap space and memory?
Code1
fr = new FileReader(file);
BufferedReader reader = new BufferedReader(fr);
String line = reader.readLine();
while ( line != null )
{
int separator = line.indexOf(',');
String symbol = line.substring(0, separator);
int begin = separator;
separator = line.indexOf(',', begin+1);
String price = line.substring(begin+1, separator);
// Publish this update
publisher.publishQuote(symbol, price);
// Read the next line of fake update data
line = reader.readLine();
}
Code2
fr = new FileReader(file);
StringBuilder stringBuilder = new StringBuilder(reader.readLine());
while( stringBuilder.toString() != null ) {
int separator = stringBuilder.toString().indexOf(',');
String symbol = stringBuilder.toString().substring(0, separator);
int begin = separator;
separator = stringBuilder.toString().indexOf(',', begin+1);
String price = stringBuilder.toString().substring(begin+1, separator);
publisher.publishQuote(symbol, price);
stringBuilder.replace(0, stringBuilder.length(), reader.readLine());
}
Edit
I eliminated the toString() call, so fewer String objects will be produced.
Code3
while( stringBuilder.length() > 0 ) {
int separator = stringBuilder.indexOf(",");
String symbol = stringBuilder.substring(0, separator);
int begin = separator;
separator = stringBuilder.indexOf(",", begin+1);
String price = stringBuilder.substring(begin+1, separator);
publisher.publishQuote(symbol, price);
Thread.sleep(10);
stringBuilder.replace(0, stringBuilder.length(), reader.readLine());
}
Also, the original code is downloaded from http://www.devx.com/Java/Article/35246/0/page/1
Will the optimized code increase performance of the app? - my question
The second code sample will not save you any memory nor any computation time. I am afraid you might have misunderstood the purpose of StringBuilder, which is really meant for building strings - not reading them.
Within the loop of your second code sample, every single line contains the expression stringBuilder.toString(), essentially turning the buffered string into a String object over and over again. Your actual string operations are done against these objects. Not only is the first code sample easier to read, but it is almost certainly the more performant of the two.
Am I prematurely optimizing things with StringBuilder? - your question
Unless you have profiled your application and have come to the conclusion that these very lines cause a notable slowdown in execution speed, yes. Unless you are really sure that something will be slow (e.g. if you recognize high computational complexity), you definitely want to do some profiling before you start making optimizations that hurt the readability of your code.
What kind of optimizations could be done to this code? - my question
If you have profiled the application and decided this is the right place for an optimization, you should consider looking into the features offered by the Scanner class. This might give you both better performance (profiling will tell you whether this is true) and simpler code.
Am I prematurely optimizing things with StringBuilder in Code2 to save a bit of heap space and memory?
Most probably: yes. But, only one way to find out: profile your code.
Also, I'd use a proper CSV parser instead of what you're doing now: http://ostermiller.org/utils/CSV.html
Code2 is actually less efficient than Code1 because every time you call stringBuilder.toString() you're creating a new java.lang.String instance (in addition to the existing StringBuilder object). This is less efficient in terms of space and time due to the object creation overhead.
Assigning the contents of readLine() directly to a String and then splitting that String will typically be performant enough. You could also consider using the Scanner class.
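A minimal sketch of the plain-String approach from Code1, written so that it runs on its own. The names QuoteParser, parseLine and parseAll are hypothetical, the publisher is left out, and a StringReader stands in for the CSV file. Unlike Code1, it tolerates lines without a second comma.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answers.
public class QuoteParser {

    // Returns {symbol, price} from "symbol,price[,...]", or null if there is no comma.
    public static String[] parseLine(String line) {
        int first = line.indexOf(',');
        if (first < 0) {
            return null;
        }
        int second = line.indexOf(',', first + 1);
        if (second < 0) {
            second = line.length(); // price is the last field on the line
        }
        return new String[] { line.substring(0, first), line.substring(first + 1, second) };
    }

    // Reads line by line; readLine() returns null at end of input.
    public static List<String[]> parseAll(BufferedReader reader) throws IOException {
        List<String[]> quotes = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] quote = parseLine(line);
            if (quote != null) {
                quotes.add(quote);
            }
        }
        return quotes;
    }

    public static void main(String[] args) throws IOException {
        String csv = "IBM,101.5,2008-01-01\nMSFT,30\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            for (String[] quote : parseAll(reader)) {
                System.out.println(quote[0] + " -> " + quote[1]);
            }
        }
    }
}
```

This is not a general CSV parser: quoted fields and embedded commas are not handled, which is where a dedicated library helps.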
Memory Saving Tip
If you encounter multiple repeating tokens in your input consider using String.intern() to ensure that each identical token references the same String object; e.g.
String[] tokens = parseTokens(line);
for (String token : tokens) {
// Construct business object referencing interned version of token.
BusinessObject bo = new BusinessObject(token.intern());
// Add business object to collection, etc.
}
StringBuilder is usually used like this:
StringBuilder sb = new StringBuilder();
sb.append("You").append(" can chain ")
.append(" your ").append(" strings ")
.append("for better readability.");
String myString = sb.toString(); // only call once when you are done
System.out.println(sb); // also calls sb.toString(); print myString instead
StringBuilder has several good things
StringBuffer's operations are synchronized but StringBuilder is not, so using StringBuilder will improve performance in single threaded scenarios
Once the buffer has been expanded, it can be reused by invoking setLength(0) on the object. Interestingly, if you step into the debugger and examine the contents of the StringBuilder, you will see that the old contents still exist even after invoking setLength(0). The call simply resets the length to zero and keeps the backing array; the next time you append chars, they overwrite it from the start.
If you are not sure about the length of the string, it is better to use StringBuilder, because once the buffer has been expanded you can reuse the same buffer for strings of smaller or equal size.
StringBuffer and StringBuilder are almost the same in all operations, except that StringBuffer is synchronized and StringBuilder is not.
If you don't have multithreading, it is better to use StringBuilder.
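The buffer reuse described above can be sketched as follows. The names BuilderReuse and formatQuotes, the initial capacity of 64 and the sample data are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original answers.
public class BuilderReuse {

    // Formats each {symbol, price} pair as "symbol=price" using a single builder.
    public static List<String> formatQuotes(List<String[]> quotes) {
        List<String> formatted = new ArrayList<>();
        StringBuilder sb = new StringBuilder(64);
        for (String[] quote : quotes) {
            sb.setLength(0); // logically empties the builder; its capacity is kept
            sb.append(quote[0]).append('=').append(quote[1]);
            formatted.add(sb.toString()); // toString() copies the current contents
        }
        return formatted;
    }

    public static void main(String[] args) {
        List<String[]> quotes = Arrays.asList(
                new String[] { "IBM", "101.5" },
                new String[] { "MSFT", "30" });
        System.out.println(formatQuotes(quotes)); // prints [IBM=101.5, MSFT=30]
    }
}
```

Whether this reuse pays off in practice depends on the workload, so it is worth profiling before adopting it for readability-sensitive code.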
