Reading a file and editing it in Java

What I am doing is reading in an HTML file and looking for a specific location in the HTML where I want to insert some text.
So I am using a BufferedReader to read in the HTML file and split it by the </HEAD> tag. I want to insert some text just before this tag, but I am not sure how to do it. The HTML would then be along the lines of ...(newText)</HEAD>.
Would I need a PrintWriter to the same file, and if so, how would I tell it to write at the correct location?
I am not sure which way would be the most efficient for something like this.
Please help.
Thanks in advance.
Here is part of my Java code:
File f = new File("newFile.html");
try {
    FileOutputStream fos = new FileOutputStream(f);
    PrintWriter pw = new PrintWriter(fos);
    BufferedReader read = new BufferedReader(new FileReader("file.html"));
    String str;
    int i = 0;
    boolean found = false;
    while ((str = read.readLine()) != null) {
        String[] data = str.split("</HEAD>");
        if (found == false) {
            pw.write(data[0]);
            System.out.println(data[0]);
            pw.write("</script>");
            found = true;
        }
        if (i < 1) {
            pw.write(data[1]);
            System.out.println(data[1]);
            i++;
        }
        pw.write(str);
        System.out.println(str);
    }
} catch (Exception e) {
    e.printStackTrace();
}
When I do this it gets to a point in the file and I get these errors:
FATAL ERROR: MERLIN: Unable to connect to EDG API,
Cannot find .edg_properties file.,
java.lang.OutOfMemoryError: unable to create new native thread,
Cannot truncate table,
EXCEPTION:Cannot open connection to server: SQLExceptio,
Caught IOException: java.io.IOException: JZ0C0: Connection is already closed, ...
I'm not sure why I get these or what they all mean.
Please help.

Should be pretty easy:
Read file into a String
Split into before/after chunks
Open a temp file for writing
Write before chunk, your text, after chunk
Close up, and move temp file to original
Sounds like you are wondering about the last couple steps in particular. Here is the essential code:
File htmlFile = ...;
...
File tempFile = File.createTempFile("foo", ".html");
FileWriter writer = new FileWriter(tempFile);
writer.write(before);
writer.write(yourText);
writer.write(after);
writer.close();
tempFile.renameTo(htmlFile);
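A minimal sketch of the elided read-and-split steps, assuming the closing head tag appears exactly once and is written as </HEAD> (a robust version should match case-insensitively):
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
// Read the whole file, then split it around the closing head tag.
String html = new String(Files.readAllBytes(Paths.get("file.html")), StandardCharsets.UTF_8);
int idx = html.indexOf("</HEAD>");       // assumes the tag is present exactly once
String before = html.substring(0, idx);  // everything before the tag
String after = html.substring(idx);      // the tag itself and the rest of the document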

Most people suggest writing to a temporary file and then copying the temporary file over the original on successful completion.

The forum thread has some ideas of how to do it.
GL.

For reading and writing you can use FileReader/FileWriter or the corresponding IO stream classes.
For the editing, I'd suggest using an HTML parser to handle the document. It can read the HTML document into an internal data structure, which simplifies searching for content and applying modifications. (Most?) parsers can serialize the document back to HTML again.
At the very least, you can be sure not to corrupt the HTML document structure.
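This answer doesn't name a particular parser; as an illustration, here is a sketch using jsoup, one widely used HTML parser (assumed to be on the classpath; class and file names are illustrative):
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class HeadEditor {
    public static void main(String[] args) throws IOException {
        // Parse the document, append a <script> element inside <head>, and write it back out.
        Document doc = Jsoup.parse(new File("file.html"), "UTF-8");
        doc.head().appendElement("script").text("// new content goes here");
        Files.write(Paths.get("newFile.html"), doc.outerHtml().getBytes(StandardCharsets.UTF_8));
    }
}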

Following up on the list of errors in your edit, a lot of that possibly stems from the OutOfMemoryError. That means you simply ran out of memory in the JVM, so Java was unable to allocate objects. This may be caused by a memory leak in your application, or it could simply be that the work you're trying to do transiently needs more memory than you have allocated.
You can increase the amount of memory that the JVM starts up with by providing the -Xmx argument to the java executable, e.g.:
-Xmx1024m
would set the maximum heap size to 1024 megabytes.
The other issues might well be caused by this; when objects can't reliably be created or modified, lots of weird things tend to happen. That said, there are a few things you can take action on. In particular, whatever MERLIN is, it looks like it can't do its work because it needs a properties file for EDG, which it is unable to find in the location where it's looking. You'll probably need to either put a config file there or tell it to look in another location.
The other IOExceptions are fairly self-explanatory. Your program could not establish a connection to the server because of a SQLException (the underlying exception itself will probably be found in the logs), and some other part of the program tried to communicate with a remote machine using a connection that was already closed.
I'd look at fixing the properties file (if it's not a benign error) and the memory issues first, and then see if any of the remaining problems still manifest.

Related

Create a text file if it doesn't exist and append to it if it does using Java BufferedWriter

This is probably ridiculously simple for gun Java programmers, yet the fact that I (a relative newbie to Java) couldn't find a simple, straightforward example of how to do it means that I'm going to use the self-answer option to hopefully prevent others going through similar frustration.
I needed to output error information to a simple text file. These actions would be infrequent and small (and sometimes not needed at all) so there is no point keeping a stream open for the file; the file is opened, written to and closed in the one action.
Unlike other "append" questions that I've come across, this one requires that the file be created on the first call to the method in that run of the Java application. The file will not exist before that.
The original code was:
Path pathOfLog = Paths.get(gsOutputPathUsed + gsOutputFileName);
Charset charSetOfLog = Charset.forName("US-ASCII");
bwOfLog = Files.newBufferedWriter(pathOfLog, charSetOfLog);
bwOfLog.append(stringToWrite, 0, stringToWrite.length());
iReturn = stringToWrite.length();
bwOfLog.newLine();
bwOfLog.close();
The variables starting with gs are pre-populated string variables showing the output location, and stringToWrite is an argument which is passed in.
So the .append method should be enough to show that I wanted to append content, right?
But it isn't; each time the procedure was called the file was left containing only the string of the most recent call.
The answer is that you also need to specify open options when calling the newBufferedWriter method. What trips you up are the default options, as specified in the documentation:
If no options are present then this method works as if the CREATE,
TRUNCATE_EXISTING, and WRITE options are present.
Specifically, it's TRUNCATE_EXISTING that causes the problem:
If the file already exists and it is opened for WRITE access, then its
length is truncated to 0.
The solution, then, is to change
bwOfLog = Files.newBufferedWriter(pathOfLog, charSetOfLog);
to
bwOfLog = Files.newBufferedWriter(pathOfLog, charSetOfLog, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
Probably obvious to long time Java coders, less so to new ones. Hopefully this will help someone avoid a bit of head banging.
You can also try this:
Path path = Paths.get("C:\\Users", "textfile.txt");
String text = "\nHello how are you ?";
try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8, StandardOpenOption.APPEND, StandardOpenOption.CREATE)) {
    writer.write(text);
} catch (IOException e) {
    e.printStackTrace();
}

Delete file after starting connection using FileInputStream

I have a temporary file which I want to send to the client from a controller in the Play Framework. Can I delete the file after opening a connection using FileInputStream? For example, can I do something like this:
File file = getFile();
InputStream is = new FileInputStream(file);
file.delete();
renderBinary(is, "name.txt");
What if the file is large? If I delete the file, will subsequent read() calls on the InputStream give an error? I have tried with files of around 1 MB and I don't get an error.
Sorry if this is a very naive question, but I could not find anything related to this and I am pretty new to Java.
I just encountered this exact same scenario in some code I was asked to work on. The programmer was creating a temp file, getting an input stream on it, deleting the temp file and then calling renderBinary. It seems to work fine even for very large files, even into the gigabytes.
I was surprised by this and am still looking for some documentation that indicates why this works.
UPDATE: We did finally encounter a file that caused this thing to bomb. I think it was over 3 GB. At that point, it became necessary to NOT delete the file while the rendering was in progress. I actually ended up using the Amazon Queue service to queue up messages for these files. The messages are then retrieved by a scheduled deletion job. Works out nicely, even with clustered servers behind a load balancer.
It seems counter-intuitive that the FileInputStream can still read after the file is removed.
DiskLruCache, a popular library in the Android world originating from the libcore of the Android platform, even relies on this "feature", as follows:
// Open all streams eagerly to guarantee that we see a single published
// snapshot. If we opened streams lazily then the streams could come
// from different edits.
InputStream[] ins = new InputStream[valueCount];
try {
    for (int i = 0; i < valueCount; i++) {
        ins[i] = new FileInputStream(entry.getCleanFile(i));
    }
} catch (FileNotFoundException e) {
    ....
As @EJP pointed out in his comment on a similar question, "That's how Unix and Linux behave. Deleting a file is really deleting its name from the directory: the inode and the data persist while any processes have it open."
But I don't think it is a good idea to rely on it.
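To make the behaviour concrete, here is a small self-contained sketch (file names are made up; the outcome is OS-dependent: on Unix-like systems the delete succeeds and the stream stays readable, while on Windows the delete itself typically fails while the stream is open):
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
public class DeleteWhileReading {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("demo", ".txt");
        try (FileWriter out = new FileWriter(file)) {
            out.write("hello");
        }
        FileInputStream in = new FileInputStream(file);
        System.out.println("deleted: " + file.delete()); // true on Unix-like systems
        int b;
        while ((b = in.read()) != -1) {                  // the open stream can still read the data
            System.out.print((char) b);
        }
        in.close();
    }
}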

Creating a .txt file from scratch

I'm working on a microcontroller and I'm trying to write some data from some sensors into a .txt file on the SD card, so that I can later place the SD card in a card reader and read the data on a PC.
Does anyone know how to write a .txt file from scratch for a FAT32 file system? I don't have any predefined code/methods/functions to call; I'll need to create the code from nothing.
It's not a question about a specific programming language, which is why I tagged more than one. I can later convert the code from C or Java to my programming language of choice. But I can't seem to find such low-level methods/functions in any language :)
Any ideas?
FatFs is quite good, and highly portable. It has support for FAT12, FAT16 and FAT32, long filenames, seeking, reading and writing (most of these things can be switched on and off to change the memory footprint).
If you're really tight on memory there's also Petit FatFs, but it doesn't have write support by default and adding it would take some work.
After mounting the drive you'd simply open a file to create it. For example:
FATFS fatFs;
FIL newFile;
// The drive number may differ
if (f_mount(0, &fatFs) != FR_OK) {
    // Something went wrong
}
if (f_open(&newFile, "/test.txt", FA_WRITE | FA_OPEN_ALWAYS) != FR_OK) {
    // Something went wrong
}
If you really need to create the file using only your own code you'll have to traverse the FAT, looking for empty space and then creating new LFN entries (where you store the filename) and DIRENTs (which specify the clusters on the disk that will hold the file data). I can't see any reason for doing this except if this is some kind of homework / lab exercise. In any case you should do some reading about the FAT structure first and return with some more specific questions once you've got started.
In Java you can do it like this:
Writer output = null;
String text = "This is test message";
File file = new File("write.txt");
output = new BufferedWriter(new FileWriter(file));
output.write(text);
output.close();
System.out.println("Your file has been written");

Fastest way to access given lines of text file with and without using GZip and the Jar File (GZip in memory?)

I have a given number (5-7) of large UTF-8 text files (7 MB each). Decoded in memory (as UTF-16 Java strings) their size is about 15 MB each.
I need to load given parts of a given file. The files are known and do not change. I would like to access and load lines at a given place as fast as possible. I load these lines, add HTML tags, and display them in a JEditorPane. I know the bottleneck will be the JEditorPane rendering the generated HTML, but for now I would like to concentrate on file access performance.
Moreover, the user can search for a given word in all the files.
For now the code I use is:
private static void loadFile(String filename, int startLine, int stopLine) {
    // sb is assumed to be a StringBuilder defined elsewhere in the class
    try {
        FileInputStream fis = new FileInputStream(filename);
        InputStreamReader isr = new InputStreamReader(fis, "UTF8");
        BufferedReader reader = new BufferedReader(isr);
        // skip the lines before the requested part
        for (int j = 0; j < startLine; j++) {
            reader.readLine();
        }
        for (int j = startLine; j <= stopLine; j++) {
            //here I add HTML tags
            //or do string comparison in case of search by the user
            sb.append(reader.readLine());
        }
        reader.close();
    } catch (FileNotFoundException e) {
        System.out.println(e);
    } catch (IOException e) {
        System.out.println(e);
    }
}
Now my questions:
As the number of parts of each file is known, 67 in my case (for each file), I could create 67 smaller files. It will be "faster" to load a given part but slower when I do a search, as I must open each of the 67 files.
I have not done benchmarking, but my feeling is that opening 67 files in case of a search takes much longer than performing empty reader.readLine() calls when loading a part of the file.
So in my case it is better to have a single larger file. Do you agree with that?
If I put each large file in the resources, I mean in the Jar file, will the performance be worse, and if yes, is it significantly worse?
And the related question is: what if I zip each file to save space? As far as I understand, a Jar file is simply a zip file.
I don't think I know how unzipping works. If I zip a file, will the file be decompressed in memory, or will my program be able to access the given lines I need directly on disk?
Same for the Jar file: will it be decompressed in memory?
If unzipping is not done in memory, can someone edit my code to use a zip file?
Final question, and the most important for me: I could increase all the performance if everything were performed in memory, but due to Unicode and the quite large files this could easily result in a heap of more than 100 MB. Is there a possibility of having the zip file loaded in memory and working on that? This would be fast and use only a little memory.
Summary of the questions
In my case, is 1 large file better than many small ones?
If files are zipped, is the unzip process (GZIPInputStream) performed in memory? Is the whole file unzipped into memory and then accessed, or is it possible to access it directly on disk?
If yes to question 2, can someone edit my code to be able to do it?
MOST IMPORTANT: is it possible to have the zip file loaded in memory, and how?
I hope my questions are clear enough. ;-)
UPDATE: Thanks to Mike for the getResourceAsStream hint, I got it working.
Note that benchmarking shows that loading the GZip file works, but in my case it is too slow:
~200 ms for the gzip file
~125 ms for the standard file, so 1.6 times faster.
Assuming that the resource folder is called resources:
private static void loadFile(String filename, int startLine, int stopLine) {
    // sb is assumed to be a StringBuilder defined elsewhere in the class;
    // MyClass stands for the enclosing class ("this.class" is not valid in a static method)
    try {
        GZIPInputStream zip = new GZIPInputStream(MyClass.class.getResourceAsStream("resources/" + filename));
        InputStreamReader isr = new InputStreamReader(zip, "UTF8");
        BufferedReader reader = new BufferedReader(isr);
        // skip the lines before the requested part
        for (int j = 0; j < startLine; j++) {
            reader.readLine();
        }
        for (int j = startLine; j <= stopLine; j++) {
            //here I add HTML tags
            //or do string comparison in case of search by the user
            sb.append(reader.readLine());
        }
        reader.close();
    } catch (FileNotFoundException e) {
        System.out.println(e);
    } catch (IOException e) {
        System.out.println(e);
    }
}
If the files really aren't changing very often I would suggest using some other data structures. Creating a hash table of all the words and the locations where they show up would make searching much faster, and creating an index of all the line start positions would make loading a given part much faster.
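As an illustrative sketch of the line-start index idea (class and method names here are made up, not from the original answer), the byte offsets can be collected once and then used to seek straight to a requested line:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
public class LineIndex {
    // Build an index of the byte offset at which each line starts.
    public static List<Long> buildIndex(String filename) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(filename, "r")) {
            offsets.add(0L);                           // line 0 starts at offset 0
            int b;
            while ((b = raf.read()) != -1) {
                if (b == '\n') {
                    offsets.add(raf.getFilePointer()); // next line starts right after the newline
                }
            }
        }
        return offsets;
    }
    // Jump directly to a given line using the index.
    public static String readLineAt(String filename, List<Long> offsets, int line) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(filename, "r")) {
            raf.seek(offsets.get(line));
            return raf.readLine();                     // note: readLine() here does not decode UTF-8
        }
    }
}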
But, to answer your questions more directly:
Yes, one large file is probably still better than many small files; I doubt that reading a line and decoding from UTF-8 will be noticeable compared to opening many files, or decompressing many files.
Yes, the unzipping process is performed in memory, and on the fly. It happens as you request data, but it acts as a buffered stream: it will decompress entire blocks at a time, so it is actually very efficient.
I can't fix your code directly, but I can suggest looking up getResourceAsStream:
http://docs.oracle.com/javase/6/docs/api/java/lang/Class.html#getResourceAsStream%28java.lang.String%29
This function will open a file that is in a zip / jar file and give you access to it as a stream, automatically decompressing it in memory as you use it.
If you treat it as a resource, Java will do it all for you. You will have to read up on some of the specifics of handling resources, but Java should handle it fairly intelligently.
I think it would be quicker for you to load the file(s) into memory. You can then zip around to whatever part of the file you need.
Take a look at RandomAccessFile for this.
The GZIPInputStream reads the files into memory as a buffered stream.
That's another question entirely :)
Again, the zip file will be decompressed in memory, depending on which class you use to open it.

Reduce the number of open files in Java code

Hi, I have some code that uses this block:
RandomAccessFile file = new RandomAccessFile("some file", "rw");
FileChannel channel = file.getChannel();
// some code
String line = "some data";
ByteBuffer buf = ByteBuffer.wrap(line.getBytes());
channel.write(buf);
channel.close();
file.close();
but the specific requirement of the application is that I have to generate a large number of temporary files, more than 4000 on average (used for Hive inserts into a partitioned table).
The problem is that sometimes I catch the exception
Failed with exception Too many open files
while the app is running.
I wonder if there is any way to tell the OS that a file is already closed and not used anymore. Why does
channel.close();
file.close();
not reduce the number of open files? Is there any way to do this in Java code?
I have already increased the maximum number of open files in
#/etc/sysctl.conf:
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.ipc.somaxconn=8096
Update:
I tried to narrow down the problem, so I split the code up to investigate each part of it (create files, upload to Hive, delete files).
Using the class 'File' or 'RandomAccessFile' fails with the exception "Too many open files".
Finally I used this code:
FileOutputStream s = null;
FileChannel c = null;
try {
    s = new FileOutputStream(filePath);
    c = s.getChannel();
    // do writes; FileChannel.write takes a ByteBuffer, not a String
    c.write(ByteBuffer.wrap("some data".getBytes()));
    c.force(true);
    s.getFD().sync();
} catch (IOException e) {
    // handle exception
} finally {
    if (c != null)
        c.close();
    if (s != null)
        s.close();
}
And this works with a large number of files (tested on 20K files of 5 KB each). The code itself does not throw the exception, unlike the previous two classes.
But the production code (with Hive) still had the exception. And it appears that the Hive connection through JDBC is the reason for it.
I will investigate further.
The number of open file handles that can be used by the OS is not the same thing as the number of file handles that can be opened by a process. Most Unix systems restrict the number of file handles per process. Most likely it is something like 1024 file handles for your JVM.
a) You need to set the ulimit in the shell that launches the JVM to some higher value. (Something like 'ulimit -n 4000')
b) You should verify that you don't have any resource leaks that are preventing your files from being 'finalized'.
Make sure to use a finally {} block. If there is an exception for some reason, the close will never happen in the code as written.
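For instance, here is a sketch of the same write using try-with-resources (Java 7+), which closes the channel and the file even when an exception is thrown (the class name is illustrative):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
public class WriteWithTryWithResources {
    public static void main(String[] args) {
        // Both resources are closed automatically, in reverse order, when the try block exits.
        try (RandomAccessFile file = new RandomAccessFile("some file", "rw");
             FileChannel channel = file.getChannel()) {
            ByteBuffer buf = ByteBuffer.wrap("some data".getBytes());
            channel.write(buf);
        } catch (IOException e) {
            // handle or log the failure
        }
    }
}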
Is this the exact code? I can think of one scenario where you open all the files in a loop and only close all of them at the end, which would cause this problem. Please post the full code.
