Save huge amount of Strings in file during program execution in Java - java

In my program, the main logic is to construct Strings in different methods and then save them to a file in a specific order. However, memory consumption is very high, so I wonder how to keep fewer Strings in memory. I will simplify the program here for ease of reading. My small String-generating methods look like generateTag (listed after the main loop below). My main logic is something like:
BufferedWriter bw = new BufferedWriter(fw);
for(int i=1;i<1_000_000_000;i++){
bw.write(processMethod(i));
}
bw.close();
where processMethod is a method that calls methods like generateTag m times, builds a String using StringBuilder, and returns it to be written to the BufferedWriter. generateTag looks like this:
public static String generateTag(final String tagName, final String name, final String value) {
StringBuilder attribute = new StringBuilder();
attribute.append("<tag:");
attribute.append(tagName);
attribute.append(" name=\"");
attribute.append(name);
attribute.append("\">");
attribute.append(value);
attribute.append("</tag:");
attribute.append(tagName);
attribute.append(">");
return attribute.toString();
}
So when I start, processMethod executes 1_000_000_000 times, and each call invokes generateTag-like methods m times, so 1_000_000_000 * m strings are created in memory. How can I easily avoid creating them? I am thinking of something like:
public static String generateTag(final String tagName, final String name, final String value, final BufferedWriter bf) {
....
bf.write(someBuilder.toString());
..
}
But passing the BufferedWriter around is not a good idea, I think.
Can you suggest a design that creates fewer Strings?

If indeed your program merely calls methods one after the other and those methods generate strings and those strings are written to the file in the order they are generated, then it's simpler to write directly to the file using the main BufferedWriter:
try (BufferedWriter bw = new BufferedWriter(fw);) {
for(int i=1;i<1_000_000_000;i++){
processMethod(i, bw);
}
}
(Note that I used try-with-resources to automatically close the buffered writer).
And then when processMethod calls generateTag, it passes the buffered writer to it:
public void processMethod(int i, BufferedWriter bw) throws IOException {
...
generateTag(...,bw);
...
}
And generateTag is going to be:
public static void generateTag(final String tagName, final String name, final String value, final BufferedWriter bw) throws IOException {
bw.write("<tag:");
bw.write(tagName);
bw.write(" name=\"");
bw.write(name);
bw.write("\">");
bw.write(value);
bw.write("</tag:");
bw.write(tagName);
bw.write(">");
}
Since BufferedWriter is buffered, the disk is not accessed on every call to write, but only when the buffer fills up. So this won't cost you in disk access speed. But it will save you all that memory.
Of course, if you don't write the results serially, or the result of one method depends on the result of the other, then you need to modify this, but still, you should write as soon as you have the next piece of String ready.

One thing you may consider is not keeping them in memory, and instead writing them out as soon as possible...
Many XML APIs like JDOM, DOM and dom4j encourage the harmful habit of building a tree model in memory, whereas in reality it is far more efficient to dump the bytes into the output buffer as soon as possible...
You could rewrite your method to take an output stream parameter and write all the bytes out through that stream.
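A minimal sketch of that idea, assuming the output is UTF-8 text and that each record consists of a fixed number of tags. The class name StreamingTagDemo and the helpers writeTag, writeRecord and writeAll are hypothetical names chosen for illustration; they are not part of the question's code.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class StreamingTagDemo {

    // Writes one tag straight to the writer; no StringBuilder or result String is built.
    static void writeTag(Writer out, String tagName, String name, String value)
            throws IOException {
        out.write("<tag:");
        out.write(tagName);
        out.write(" name=\"");
        out.write(name);
        out.write("\">");
        out.write(value);
        out.write("</tag:");
        out.write(tagName);
        out.write(">");
    }

    // Hypothetical stand-in for processMethod: emits tagsPerRecord tags for record i.
    static void writeRecord(Writer out, int i, int tagsPerRecord) throws IOException {
        for (int t = 0; t < tagsPerRecord; t++) {
            writeTag(out, "item", "n" + t, Integer.toString(i));
        }
        out.write('\n');
    }

    // Wraps the raw stream once; the buffer decides when bytes actually reach the target.
    static void writeAll(OutputStream target, int records, int tagsPerRecord)
            throws IOException {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(target, StandardCharsets.UTF_8))) {
            for (int i = 1; i <= records; i++) {
                writeRecord(out, i, tagsPerRecord);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // An in-memory sink keeps the demo self-contained; a FileOutputStream would work the same way.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeAll(sink, 2, 2);
        System.out.print(sink.toString("UTF-8"));
    }
}
```

Only the small argument strings exist at any time; the assembled record is never held in memory as a whole.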

Use the Appendable abstraction to ensure that generateTag (and likewise processMethod, if you wish) has no dependency on BufferedWriter. In fact, we can even pass a StringBuilder to do the same as before, a feature we can use to provide an appropriate overload:
public static CharSequence generateTag(
final String tagName, final String name, final String value) {
StringBuilder attribute=new StringBuilder(80);
try { generateTag(tagName, name, value, attribute); }
catch (IOException ex) { throw new AssertionError(ex); }
return attribute;
}
public static void generateTag(
final String tagName, final String name, final String value, Appendable attribute)
throws IOException {
attribute.append("<tag:");
attribute.append(tagName);
attribute.append(" name=\"");
attribute.append(name);
attribute.append("\">");
attribute.append(value);
attribute.append("</tag:");
attribute.append(tagName);
attribute.append(">");
}
Note that the overload returns CharSequence rather than String, so that the final StringBuilder.toString() copying step can be omitted. Unfortunately, calling append(CharSequence) on the BufferedWriter with it would not pay off, given the current implementation. Still, there are APIs that can use a CharSequence directly.
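As a rough illustration of why Appendable helps, the same method can target a StringBuilder or any Writer without changes. The class name AppendableTagDemo is hypothetical, and the sample tag values are made up for the demo.

```java
import java.io.IOException;
import java.io.StringWriter;

final class AppendableTagDemo {

    // Depends only on Appendable, so StringBuilder, Writer, PrintStream etc. all work.
    static void generateTag(String tagName, String name, String value, Appendable out)
            throws IOException {
        out.append("<tag:").append(tagName)
           .append(" name=\"").append(name).append("\">")
           .append(value)
           .append("</tag:").append(tagName).append(">");
    }

    public static void main(String[] args) throws IOException {
        // Target 1: an in-memory StringBuilder (never actually throws IOException).
        StringBuilder sb = new StringBuilder();
        generateTag("a", "id", "42", sb);

        // Target 2: a Writer; a BufferedWriter over a file would be passed the same way.
        StringWriter writer = new StringWriter();
        generateTag("a", "id", "42", writer);

        System.out.println(sb);
        System.out.println(writer);
    }
}
```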

Related

designing classes for other developers to use in java

class CSVReader {
private List<String> output;
private InputStream input;
public CSVReader(InputStream input) {
this.input = input;
}
public void read() throws Exception{
//do something with the inputstream
// create output list.
}
public List<String> getOutput() {
return Collections.unmodifiableList(output);
}
}
I am trying to create a simple class which will be part of a library. I would like to create code that satisfies the following conditions:
- handles all potential errors or wraps them into library errors and throws them.
- creates meaningful and complete object states (no incomplete object structures).
- easy to utilize by developers using the library
Now, when I evaluated the code above, against the goals, I realized that I failed badly. A developer using this code would have to write something like this -
CSVReader reader = new CSVReader(new FileInputStream("test.csv"));
reader.read();
reader.getOutput();
I see the following issues straight away -
- developer has to call read first before getOutput. There is no way for him to know this intuitively and this is probably bad design.
So, I decided to fix the code and write something like this
public List<String> getOutput() throws IOException{
if(output==null)
read();
return Collections.unmodifiableList(output);
}
OR this
public List<String> getOutput() {
if(output==null)
throw new IncompleteStateException("invoke read before getoutput()");
return Collections.unmodifiableList(output);
}
OR this
public CSVReader(InputStream input) {
read(); //throw runtime exception
}
OR this
public List<String> read() throws IOException {
//read and create output list.
// return list
}
What is a good way to achieve my goals? Should the object state be always well defined? - there is never a state where "output" is not defined, so I should create the output as part of constructor? Or should the class ensure that a created instance is always valid, by calling "read" whenever it finds that "output" is not defined and just throw a runtime exception? What is a good approach/ best practice here?
I would make read() private and have getOutput() call it as an implementation detail. If the point of exposing read() is to lazy-load the file, you can do that by exposing only getOutput:
public List<String> getOutput() {
if (output == null) {
try {
output = read();
} catch (IOException e) {
// here you either wrap e into your own exception (declaring it in the signature of getOutput if it is checked), or you drop the try/catch and declare getOutput as `throws IOException`
}
}
return Collections.unmodifiableList(output);
}
The advantage of this is that the interface of your class is very trivial: you give me an input (via constructor) I give you an output (via getOutput), no magic order of calls while preserving lazy-loading which is nice if the file is big.
Another advantage of removing read from the public API is that you can go from lazy-loading to eager-loading and vice versa without affecting your clients. If you expose read, you have to account for it being called in all possible states of your object (before it's loaded, while it's already running, after it has already loaded). In short, always expose as little as possible.
So to address your specific questions:
- Yes, the object state should always be well-defined. Having to know that the client must call read first is indeed a design smell.
- Yes, you could call read in the constructor and eagerly load everything up front. Whether to lazy-load is an implementation detail that depends on your context; it should not matter to a client of your class.
- Throwing an exception if read has not been called again puts the burden of calling things in the right, implicit order on the client. That is unnecessary: as you note, output is never really undefined, so the implementation itself can safely decide when to call read.
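A self-contained sketch of the lazy-loading variant, under a few assumptions: the class is renamed LazyCsvReader (a hypothetical name), each line of input is treated as one entry instead of doing real CSV parsing, and the IOException is wrapped in the JDK's UncheckedIOException, which is only one of the wrapping options mentioned above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LazyCsvReader {
    private final InputStream input;
    private List<String> output; // stays null until the first getOutput() call

    LazyCsvReader(InputStream input) {
        this.input = input;
    }

    // The only entry point: loads on first use, then serves the cached result.
    List<String> getOutput() {
        if (output == null) {
            try {
                output = read();
            } catch (IOException e) {
                // Assumption: callers prefer an unchecked wrapper over a checked exception.
                throw new UncheckedIOException(e);
            }
        }
        return Collections.unmodifiableList(output);
    }

    // Private, so clients cannot call it at the wrong time or more than once.
    private List<String> read() throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(input, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "a,b\nc,d\n".getBytes(StandardCharsets.UTF_8));
        LazyCsvReader reader = new LazyCsvReader(in);
        System.out.println(reader.getOutput()); // first call triggers the read
        System.out.println(reader.getOutput()); // second call reuses the cached list
    }
}
```

This version is not thread-safe; a concurrent environment would need synchronization around the lazy initialization.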
I would suggest you make your class as small as possible, dropping the getOutput() method altogether.
The idea is to have a class that reads a CSV file and returns a list, representing the result. To achieve this, you can expose a single read() method, that will return a List<String>.
Something like:
public class CSVReader {
private final InputStream input;
public CSVReader(String filename) throws FileNotFoundException {
this.input = new FileInputStream(filename);
}
public List<String> read() throws IOException {
// perform the actual reading here
}
}
You have a well defined class, a small interface to maintain and the instances of CSVReader are immutable.
Have getOutput check if it is null (or out of date) and load it in automatically if it is. This allows for a user of your class to not have to care about internal state of the class's file management.
However, you may also want to expose a read function so that the user can choose to load the file when it is convenient. If you make the class for a concurrent environment, I would recommend doing so.
The first approach takes away some flexibility from the API: before the change the user could call read() in a context where an exception is expected, and then call getOutput() exception-free as many times as he pleases. Your change forces the user to catch a checked exception in contexts where it wasn't necessary before.
The second approach is how it should have been done in the first place: since calling read() is a prerequisite of calling getOutput(), it is a responsibility of your class to "catch" your users when they "forget" to make a call to read().
The third approach hides IOException, which may be a legitimate exception to catch. There is no way to let the user know if the exception is going to be thrown or not, which is a bad practice when designing runtime exceptions.
The root cause of your problem is that the class has two orthogonal responsibilities:
Reading a CSV, and
Storing the result of a read for later use.
If you separate these two responsibilities from each other, you would end up with a cleaner design, in which the users would have no confusion over what they must call, and in what order:
interface CSVData {
List<String> getOutput();
}
class CSVReader {
public static CSVData read(InputStream input) throws IOException {
...
}
}
You could combine the two into a single class with a factory method:
class CSVData {
private CSVData() { // No user instantiation
}
// Getting data is exception-free
public List<String> getOutput() {
...
}
// Creating instances requires a factory call
public static CSVData read(InputStream input) throws IOException {
...
}
}
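One way the factory-method variant could be filled in, as a sketch only: the class is spelled CsvData here to mark it as hypothetical, each input line is stored as one entry rather than parsed as real CSV, and the caller is assumed to own (and close) the stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CsvData {
    private final List<String> lines;

    // Private constructor: instances only exist once reading has succeeded.
    private CsvData(List<String> lines) {
        this.lines = Collections.unmodifiableList(lines);
    }

    // Getting data is exception-free because all I/O happened in read().
    List<String> getOutput() {
        return lines;
    }

    // All I/O failures surface here, at creation time.
    static CsvData read(InputStream input) throws IOException {
        // The reader is deliberately not closed: the caller owns the stream.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(input, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return new CsvData(lines);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new ByteArrayInputStream(
                "x,1\ny,2\n".getBytes(StandardCharsets.UTF_8))) {
            CsvData data = CsvData.read(in);
            System.out.println(data.getOutput());
        }
    }
}
```

With this shape there is no object in a half-initialized state: either read() throws, or the caller holds a complete, immutable CsvData.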

JAVA api validation/Exception handling

How to handle error conditions when writing a Java API/Utility
This is my Implementation for my API interface
public void bin2zip(InputStream[] is, OutputStream os, String[] names)
{
//if number of streams and number of names do not match do something
}
What I am trying to do is handle the case where the length of is != the length of names.
How do I handle this? I don't want my API to do some work before an ArrayIndexOutOfBoundsException is thrown. I want to catch this early.
One solution is something like this:
If the lengths do not match, I throw:
if(is.length!=names.length)
throw new Exception("ParameterValidationException: The inputstream array and name array length should match");
if(containsInvalidFileName(names))
throw new Exception("ParameterValidationException: The names array should contain valid filenames");
Also, can this be done compile time using DataDependency (I can make ValidationClass for the API and make sure the developer get hold of this object to pass on to this conversion API) or the runtime exception is the best way?
I believe a ValidationClass will make the API complicated to use.
I did go through some materials (in case anyone is interested), but I need some direction.
http://lcsd05.cs.tamu.edu/slides/keynote.pdf
Java: checked vs unchecked exception explanation
http://docs.oracle.com/javase/tutorial/collections/interoperability/api-design.html
Wherever possible, don't let end users screw it up.
public final class Bin2Zipper {
private final List<InputStream> inputStreams = ...;
private final List<String> names = ...;
public Bin2Zipper() {
}
public void add(final InputStream is, final String name) {
this.inputStreams.add(is);
this.names.add(name);
}
public void bin2zip(final OutputStream os) {
// ...
}
}
A fluent interface might even be better. Then your code would look like:
Bin2Zipper.with(is1, name1).add(is2, name2).add(is3, name3).zip(os);
public final class Bin2Zipper {
private final List<InputStream> inputStreams = ...;
private final List<String> names = ...;
private Bin2Zipper(final InputStream is, final String name) {
this.inputStreams.add(is);
this.names.add(name);
}
// Static entry point. It cannot also be named add: Java rejects a static and an instance method with the same signature in one class.
public static Bin2Zipper with(final InputStream is, final String name) {
return new Bin2Zipper(is, name);
}
public Bin2Zipper add(final InputStream is, final String name) {
this.inputStreams.add(is);
this.names.add(name);
return this;
}
public void zip(final OutputStream os) {
...
}
}
Where these fall down is when the client starts off with the two arrays. In that case, it can be annoying for them to have to loop over all the entries themselves. I think it's still worth it. If you don't, then you'll have to compare the sizes of the inputs right away. You almost certainly want to throw an unchecked exception, probably an IllegalArgumentException like Vince said.
I think your solution of comparing the array lengths is perfectly appropriate. I think in this case you should throw an IllegalArgumentException; this exception is defined in the standard and used by most standard functions doing this kind of checking.
Many standard libraries use this kind of interface, and it is easily understood.
That said, I think you should prefer an interface that simply doesn't allow such misuse, such as the one suggested by @Eric. The library everybody likes to use is the one that works first time, every time, because it's too simple to mess up.
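If the array-based signature is kept, the up-front check could look roughly like the sketch below. The class name Bin2ZipSketch is hypothetical, and the zip-writing body is just one plausible implementation using java.util.zip, added so the example runs end to end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Objects;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class Bin2ZipSketch {

    static void bin2zip(InputStream[] is, OutputStream os, String[] names)
            throws IOException {
        // Validate everything before doing any work, so failures happen early.
        Objects.requireNonNull(is, "is");
        Objects.requireNonNull(os, "os");
        Objects.requireNonNull(names, "names");
        if (is.length != names.length) {
            throw new IllegalArgumentException(
                    "Expected one name per input stream, got " + is.length
                    + " streams and " + names.length + " names");
        }

        ZipOutputStream zip = new ZipOutputStream(os);
        byte[] buffer = new byte[8192];
        for (int i = 0; i < is.length; i++) {
            zip.putNextEntry(new ZipEntry(names[i]));
            int n;
            while ((n = is[i].read(buffer)) != -1) {
                zip.write(buffer, 0, n);
            }
            zip.closeEntry();
        }
        zip.finish(); // completes the archive without closing the caller's stream
    }

    public static void main(String[] args) throws IOException {
        InputStream[] streams = { new ByteArrayInputStream(new byte[] {1, 2, 3}) };
        try {
            // Two names for one stream: rejected before any bytes are written.
            bin2zip(streams, new ByteArrayOutputStream(), new String[] {"a.bin", "b.bin"});
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```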

Comparing strings from a written file

I'm stuck on this program I'm making for school. Here's my code:
public static void experiencePointFileWriter() throws IOException{
File writeFileResults = new File("User Highscore.txt");
BufferedWriter bw;
bw = new BufferedWriter(new FileWriter(writeFileResults, true));
bw.append(userName + ": " + experiencePoints);
bw.newLine();
bw.flush();
bw.close();
FileReader fileReader = new FileReader(writeFileResults);
char[] a = new char[50];
fileReader.read(a); // reads the content to the array
for (char c : a)
System.out.print(c); // prints the characters one by one
fileReader.close();
}
The dilemma I'm facing is how can I sort new scores with the scores in writeFileResults by the numerical value of int experiencePoints? If you're wondering about the variables userName is assigned by a textfield.getText method, and an event happens when you press one of 36 buttons which launches a math.Random statement with one of 24 possible outcomes. They all add different integer numbers to experiencePoints.
Well, I don't want to do your homework, and this does seem introductory so I'd like to give you some hints.
First, there's a few things missing:
We don't have some of the variables you've given us, so there is no type associated with oldScores
There is no reference to userName or experiencePoints outside this method call
If you can add this information, it would make this process easier. I could infer things, but then I might be wrong, or worse yet, have you learn nothing because I did your assignment for you. ;)
EDIT:
So, based on the extra information, your data file is holding an "array" of usernames and experience values. Thus, the best way (read: best design, not shortest) would be to load these into custom objects and then write a comparator (read: implement the Comparator interface).
Thus, in pseudo-Java, you'd have:
Declare your data type:
private static class UserScore {
private final String name;
private final double experience;
// ... fill in the rest, it's just a data struct
}
In your reader, when you read the values, split each line to get the values, and create a new List<UserScore> object which contains all of the values read from the file (I'll let you figure this part out)
After you have your list, you can use Collections#sort to sort the list to be the correct order, here would be an example of this:
// assuming we have our list, userList
Collections.sort(userList, new Comparator<UserScore>() {
@Override
public int compare(UserScore left, UserScore right) {
return Double.compare(left.getExperience(), right.getExperience()); // ascending by experience; check the docs to see why this makes sense for the compare function
}
});
// userList is now sorted based on the experience points
Re-write your file, as you see fit. You now have a sorted list.
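For reference, the parse-and-sort steps could be sketched as below. It assumes each line has the form "userName: experiencePoints" (the format the question's writer produces) and that experience is an int, as in the question. ScoreSorter and parseAndSort are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ScoreSorter {

    // Simple immutable data holder for one line of the highscore file.
    static final class UserScore {
        final String name;
        final int experience;

        UserScore(String name, int experience) {
            this.name = name;
            this.experience = experience;
        }
    }

    // Turns lines like "alice: 10" into objects and sorts them by experience, ascending.
    static List<UserScore> parseAndSort(List<String> lines) {
        List<UserScore> scores = new ArrayList<>();
        for (String line : lines) {
            int sep = line.lastIndexOf(':');
            if (sep < 0) {
                continue; // skip blank or malformed lines
            }
            String name = line.substring(0, sep).trim();
            int experience = Integer.parseInt(line.substring(sep + 1).trim());
            scores.add(new UserScore(name, experience));
        }
        scores.sort(Comparator.comparingInt(s -> s.experience));
        return scores;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("bob: 30", "alice: 10", "carol: 20");
        for (UserScore s : parseAndSort(lines)) {
            System.out.println(s.name + ": " + s.experience);
        }
    }
}
```

For a highest-first table, the sorted list can be reversed with Collections.reverse before it is written back.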

Picking up from where I left off when reading a file in Java

I am trying to read info from a file and create objects out of that information. Every 6 or so lines of the file is a different unit, meaning that the first set of lines are relevant to object A, the next set to object B, and so on.
I can read from the file and create my object just fine--for the first set. My problem is that I don't know how to get the reader to pick up from the spot it left off at when creating the next object...
(Note: the read() method which creates the file is part of the new object being created, not in a main() or anything like that). Here are the relevant bits of code:
The driver:
public class CSD{
public static void main (String[] argv){
Vector V=new Vector(10);
CoS jon=new CoS();
jon.display();
}//end main
}
which calls CoS, whose constructor is:
public CoS(){
try{
String fileName=getFileName();
FileReader freader=new FileReader(fileName);
BufferedReader inputFile=new BufferedReader(freader);
this.read(inputFile);
setDegree(major);
setStatus(credits);
} catch(FileNotFoundException ex){
}//end catch
}
Which calls both read() and getFileName():
public void read(BufferedReader inputFile){
try{
int n;
super.read(inputFile);
String str=inputFile.readLine();
if (str!=null){
n=Integer.parseInt(str);
setCredits(n);
str=inputFile.readLine();
setMajor(str);
}//end if
}catch(IOException ex){}
}//end method
public String getFileName() {
Scanner scan = new Scanner(System.in);
String filename;
System.out.print("Enter the file name and path ==> ");
filename = scan.nextLine();
System.out.println("");
return filename;
}
Thanks in advance, guys!
Why not use ObjectInputStream and ObjectOutputStream? Or any kind of real serialization?
javadoc: http://docs.oracle.com/javase/6/docs/api/java/io/ObjectOutputStream.html
example code: http://www.javadb.com/writing-objects-to-file-with-objectoutputstream
Basically, since you write your objects to a file and want to take care of the lines where they are located, I'll suggest a few other serialization alternatives.
One is the Object*Stream approach: you create an ObjectOutputStream on a file and just write objects through it. Later, you read the objects back with an ObjectInputStream in the same order you wrote them, and they come back just as you wrote them.
Another is to implement Serializable. Remember the transient keyword? Use it on fields you do not want to save to the file.
And then there's the raw "by hand" approach, where you save only the things you want to save and reconstruct the objects later by passing these initialization values to their constructor. Kind of like what people suggested: make the file line an argument to the constructor :)
EDIT:
guess writing with Object*Streams requires you to implement Serializable or Externalizable.
but if the example code isn't clear enough, ask :)
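A small round-trip sketch of the Object*Stream approach. The Student class with major and credits fields is a hypothetical stand-in for the question's CoS class, and a byte array replaces the file so the example is self-contained; writing the count first is just one way to let the reader know how many objects follow.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SerializationSketch {

    // Hypothetical record type; it must implement Serializable to be written.
    static final class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        final String major;
        final int credits;

        Student(String major, int credits) {
            this.major = major;
            this.credits = credits;
        }
    }

    // Writes the number of objects first, then each object in list order.
    static byte[] save(List<Student> students) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(students.size());
            for (Student s : students) {
                out.writeObject(s);
            }
        }
        return bytes.toByteArray();
    }

    // Reads the objects back; the stream keeps its position between calls to
    // readObject, so each call picks up where the previous one left off.
    static List<Student> load(byte[] data) throws IOException, ClassNotFoundException {
        List<Student> students = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                students.add((Student) in.readObject());
            }
        }
        return students;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = save(Arrays.asList(new Student("Math", 30), new Student("Physics", 45)));
        for (Student s : load(data)) {
            System.out.println(s.major + " " + s.credits);
        }
    }
}
```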

replace a string segment from input stream

I am trying to receive a huge text file as an InputStream and want to replace a string segment with another string. I am quite confused about how to do it. It works if I convert the whole InputStream to a String, but I don't want to do that because some of the contents are lost. Can anyone please help?
e.g.
Say I have a file with the contents "This is the test string which needs to be modified". I want to accept this string as an input stream and modify the contents to "This is the test string which is modified" (by replacing 'needs to be' with 'is').
public static void main(String[] args) {
String string = "This is the test string which needs to be modified";
InputStream inpstr = new ByteArrayInputStream(string.getBytes());
//Code to do
}
In this I want the output as: This is the test string which is modified
Thanking you in advance.
If the text to be changed will always fit in one logical line, as I stated in a comment, I'd go with simple line reading (if applicable) using something like:
public class InputReader {
public static void main(String[] args) throws IOException
{
String string = "This is the test string which needs to be modified";
InputStream inpstr = new ByteArrayInputStream(string.getBytes());
BufferedReader rdr = new BufferedReader(new InputStreamReader(inpstr));
String buf = null;
while ((buf = rdr.readLine()) != null) {
// Apply regex on buf
// build output
}
}
}
However, I've always liked to use inheritance, so I'd define this somewhere:
class MyReader extends BufferedReader {
public MyReader(Reader in)
{
super(in);
}
@Override
public String readLine() throws IOException {
String lBuf = super.readLine();
// Perform matching & subst on read string
return lBuf;
}
}
And use MyReader in place of standard BufferedReader keeping the substitution hidden inside the readLine method.
Pros: substitution logic is in a specified Reader, code is pretty standard.
Cons: it hides the substitution logic from the caller (sometimes this is also a pro; it depends on the use case)
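A minimal sketch of the line-by-line variant, assuming the text to replace never spans a line break and that a plain literal replacement (String.replace) is enough instead of a regex. LineReplaceDemo and replaceLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class LineReplaceDemo {

    // Reads one line at a time, so only the current line is held in memory.
    static void replaceLines(InputStream in, Writer out, String target, String replacement)
            throws IOException {
        BufferedReader rdr = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        boolean first = true;
        while ((line = rdr.readLine()) != null) {
            if (!first) {
                out.write('\n'); // readLine() strips terminators, so re-insert one between lines
            }
            out.write(line.replace(target, replacement));
            first = false;
        }
    }

    public static void main(String[] args) throws IOException {
        String string = "This is the test string which needs to be modified";
        InputStream inpstr = new ByteArrayInputStream(string.getBytes(StandardCharsets.UTF_8));
        StringWriter result = new StringWriter();
        replaceLines(inpstr, result, "needs to be", "is");
        System.out.println(result);
    }
}
```

Note that the original line terminators (and a trailing newline, if any) are not preserved exactly by this sketch.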
HTH
Maybe I have misunderstood you, but I think you should build a stack machine. I mean you can use a small string stack to collect text and check it against the replacement condition.
If the collected stack no longer matches your condition, flush the stack to the output and start collecting again.
If the stack partially matches your condition, carry on collecting.
If the stack fully matches your condition, make the modification and flush the modified stack to the output.
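One way to realize this idea is a sliding window that never holds more characters than the search string, so the match may span line breaks and the input is never loaded as a whole. This is a sketch under the assumption of a literal, non-empty search string; StreamReplaceDemo and replace are hypothetical names, and the window check is a simple (not the fastest) matching strategy.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

final class StreamReplaceDemo {

    // Copies in to out, replacing every non-overlapping occurrence of target.
    static void replace(Reader in, Writer out, String target, String replacement)
            throws IOException {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        BufferedReader reader = new BufferedReader(in);
        StringBuilder window = new StringBuilder(target.length());
        int c;
        while ((c = reader.read()) != -1) {
            window.append((char) c);
            if (window.length() == target.length()) {
                if (target.contentEquals(window)) {
                    out.write(replacement); // full match: emit the replacement instead
                    window.setLength(0);
                } else {
                    out.write(window.charAt(0)); // oldest char can no longer start a match
                    window.deleteCharAt(0);
                }
            }
        }
        out.write(window.toString()); // flush the unmatched tail at end of input
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        replace(new StringReader("This is the test string which needs to be modified"),
                result, "needs to be", "is");
        System.out.println(result);
    }
}
```

For a real file, the Reader would be an InputStreamReader over the input stream with an explicit charset, and the Writer a BufferedWriter over the output.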
