how to write UTF8 data to xml file using RandomAccessFile? - java

When trying to write some UTF-8 data to a file, I end up with some garbage in the file. The code is as follows:
public static boolean saveToFile(StringBuffer buffer,
String fileName,
ArrayList exceptionList,
String className)
{
log.debug("In saveToFile for file [" + fileName + "]");
RandomAccessFile raf = null;
File file = new File(fileName);
File backupFile = new File(fileName+"_bck");
try
{
if (file.exists())
{
if (backupFile.exists())
{
backupFile.delete();
}
file.renameTo(backupFile);
}
raf = new RandomAccessFile(file, "rw");
raf.writeBytes(buffer.toString());
raf.close();
The output of buffer.toString() is:
<?xml version="1.0" encoding="UTF-8"?>
<ivr>
<version>1.1</version>
<templateName>αβγδεζη
However, the data in the file is:
<?xml version="1.0" encoding="UTF-8"?>
<ivr>
<version>1.1</version>
<templateName>▒▒▒▒▒▒▒</templateName>
How can I make sure that the data in the file itself is UTF-8?

I'm not surprised you get garbage:
raf.writeBytes(buffer.toString())
The documentation for RandomAccessFile.writeBytes(String) says:
Writes the string to the file as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits.
In a few circumstances (for example, when every character is plain ASCII), that operation will result in a correctly encoded file. But in most it won't. That writeBytes() method is a foolish design by the Java developers. You need to correctly encode your text as bytes in UTF-8, and then write those bytes.
Do you really need to operate on the file as a random access file? If not, just write it through a Writer (an OutputStreamWriter created with the UTF-8 charset) wrapping an OutputStream.
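A minimal sketch of the Writer approach, assuming the XML is already in a StringBuffer as in the question. saveUtf8 is a hypothetical helper name, and the Greek letters are written as \u escapes so the example does not depend on the encoding the source file is compiled with.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf8WriterSketch {
    // Hypothetical helper: every char passes through an OutputStreamWriter that encodes as UTF-8.
    static void saveUtf8(CharSequence text, File file) throws IOException {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            out.append(text); // a StringBuffer is a CharSequence, so it can be passed directly
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-demo", ".xml");
        file.deleteOnExit();
        StringBuffer buffer = new StringBuffer("<templateName>\u03b1\u03b2\u03b3</templateName>");
        saveUtf8(buffer, file);
        // 29 ASCII bytes for the two tags plus 2 bytes for each of the 3 Greek letters
        System.out.println(Files.readAllBytes(file.toPath()).length); // 35
    }
}
```

Because the charset is passed explicitly, the result does not depend on the platform's default encoding.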
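You could use Charset.encode(String) to produce a ByteBuffer holding the encoded bytes, then write those bytes to the file. Only the bytes between the buffer's position and limit are encoded data, so copy exactly those instead of calling array(), whose backing array can be longer than the encoded text:
ByteBuffer encoded = StandardCharsets.UTF_8.encode(buffer.toString());
byte[] bytes = new byte[encoded.remaining()];
encoded.get(bytes);
raf.write(bytes);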
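A self-contained sketch comparing writeBytes() with writing the UTF-8 encoded bytes through the same RandomAccessFile, using a temporary file. writeUtf8 and readAll are hypothetical helper names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class RafUtf8Sketch {
    // Encodes the text as UTF-8 and writes exactly the encoded bytes at the file pointer.
    static void writeUtf8(RandomAccessFile raf, CharSequence text) throws IOException {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(text.toString());
        byte[] bytes = new byte[encoded.remaining()]; // only position..limit holds encoded data
        encoded.get(bytes);
        raf.write(bytes);
    }

    static byte[] readAll(File file) throws IOException {
        return Files.readAllBytes(file.toPath());
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("raf-demo", ".xml");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);
            raf.writeBytes("\u03b1"); // keeps only the low 8 bits of U+03B1, i.e. 0xB1
        }
        System.out.println(readAll(file).length); // 1 (a lone 0xB1 is not valid UTF-8)
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);
            writeUtf8(raf, "\u03b1"); // UTF-8 encoding of U+03B1 is 0xCE 0xB1
        }
        System.out.println(readAll(file).length); // 2
    }
}
```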

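The Javadoc for RandomAccessFile states, for writeBytes():
Writes the string to the file as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits. The write starts at the current position of the file pointer.
Assuming that discarding parts of your String isn't what you want, you should be using writeUTF():
Writes a string to the file using modified UTF-8 encoding in a machine-independent manner.
Note that writeUTF() first writes two bytes holding the length of the encoded string, and modified UTF-8 encodes U+0000 and supplementary characters differently from standard UTF-8. If the file must be a plain UTF-8 XML document, write the bytes returned by String.getBytes(StandardCharsets.UTF_8) instead.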
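A small sketch of what writeUTF() actually puts in a fresh temporary file: the encoded text is preceded by a two-byte big-endian length, which an XML parser would see as data before the XML declaration. writeUtfBytes is a hypothetical helper name.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.util.Arrays;

public class WriteUtfSketch {
    // Writes the string with RandomAccessFile.writeUTF and returns the raw file bytes.
    static byte[] writeUtfBytes(String text) throws IOException {
        File file = File.createTempFile("writeutf-demo", ".bin");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.writeUTF(text);
        }
        return Files.readAllBytes(file.toPath());
    }

    public static void main(String[] args) throws IOException {
        // U+03B1 needs 2 bytes, so writeUTF emits the length 0x0002 first
        System.out.println(Arrays.toString(writeUtfBytes("\u03b1"))); // [0, 2, -50, -79]
    }
}
```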

String txt = buffer.toString();
raf.write(txt.getBytes(StandardCharsets.UTF_8));

Related

what is the variable "data" storing in this java program?

My code is working; I just need to know the role of a specific variable in the code.
I tried to print the value of the variable "data", but it gives me some numbers I can't understand.
public static void main(String[] args) throws IOException {
FileInputStream fileinputstream = new FileInputStream ("c:\\Users\\USER\\Desktop\\read.TXT");
FileOutputStream fileoutputstream = new FileOutputStream("c:\\Users\\USER\\Desktop\\write.TXT");
while (fileinputstream.available() > 0) {
int data = fileinputstream.read();
fileoutputstream.write(data);
}
fileinputstream.close();
fileoutputstream.close();
}
You can look at the docs for FileInputStream.read, which says:
Reads a byte of data from this input stream. This method blocks if no input is yet available.
Returns:
the next byte of data, or -1 if the end of the file is reached.
So the integer you got (i.e. the number stored in data) is the byte read from the file. Since your file is a text file, it is the ASCII value of the characters in that file (assuming your file is encoded in ASCII).
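A self-contained sketch showing what data holds. A ByteArrayInputStream stands in for read.TXT (an assumption, so the example runs without a file), and the loop stops when read() returns -1, the end-of-file value from the Javadoc. readAllValues is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadByteSketch {
    // Collects each value read() returns: 0..255 for a byte, -1 once the stream is exhausted.
    static List<Integer> readAllValues(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int data;
        while ((data = in.read()) != -1) {
            values.add(data);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("Hi".getBytes(StandardCharsets.US_ASCII));
        for (int data : readAllValues(in)) {
            System.out.println(data + " -> " + (char) data); // "72 -> H", then "105 -> i"
        }
    }
}
```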
FileInputStream#read() reads a single byte of information from the underlying file.
Since these files are text files (according to their extensions), you probably should not be using a FileInputStream, but a FileReader, to properly handle characters, and not the bytes that make them up.
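A sketch of the difference between reading bytes and reading characters, assuming UTF-8 text. It uses an InputStreamReader with an explicit charset in place of FileReader, because FileReader's constructors without a charset argument use the platform default. countBytes and countChars are hypothetical helper names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BytesVsCharsSketch {
    // One InputStream.read() value per byte.
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) n++;
        return n;
    }

    // One Reader.read() value per decoded char.
    static int countChars(Reader reader) throws IOException {
        int n = 0;
        while (reader.read() != -1) n++;
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // the e-acute takes 2 bytes
        System.out.println(countBytes(new ByteArrayInputStream(utf8))); // 5
        System.out.println(countChars(new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8))); // 4
    }
}
```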
fileinputstream.read() returns "the next byte of data, or -1 if the end of the file is reached."
You can read more here

Character digit not true when read from UTF-8 file

I'm using a Scanner to read a file. What I don't understand is this: if the file is a UTF-8 file and the current line being read starts with a digit, Character.isDigit(line.charAt(0)) returns false. However, if the file is not a UTF-8 file, the method returns true.
Here's some code:
File theFile = new File(pathToFile);
Scanner fileContent = new Scanner(new FileInputStream(theFile), "UTF-8");
while(fileContent.hasNextLine())
{
String line = fileContent.nextLine();
if(Character.isDigit(line.charAt(0)))
{
//When the file being read from is NOT a UTF-8 file, we get down here
}
}
When using the debugger and looking at the line String, I can see that in both cases (UTF-8 file or not) the string seems to hold the same thing, a digit. Why is this happening?
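As we finally found out by exchanging comments, your file starts with a BOM (byte order mark). A BOM is generally not recommended for UTF-8 files because Java's UTF-8 decoder does not strip it but treats it as data: the bytes EF BB BF are decoded as the invisible character U+FEFF, so line.charAt(0) on the first line returns that character instead of the digit, and Character.isDigit() returns false.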
So you have two options:
If you are in control of the file, recreate it without the BOM.
If not, check the file for a BOM and remove it before doing any other operations.
Here is some code to start with. It skips the BOM rather than removing it from the file; feel free to modify it as you like. It comes from a test utility I wrote some years ago:
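// Skips a leading UTF-8 BOM (EF BB BF) if present; otherwise the stream is returned unchanged.
// Needs java.io.BufferedInputStream, InputStream, IOException and PushbackInputStream.
private static InputStream filterBOMifExists(InputStream inputStream) throws IOException {
    PushbackInputStream pushbackInputStream = new PushbackInputStream(new BufferedInputStream(inputStream), 3);
    byte[] bom = new byte[3];
    int n = pushbackInputStream.read(bom); // may read fewer than 3 bytes for a very short file
    boolean isBom = n == 3 && bom[0] == (byte) 0xEF && bom[1] == (byte) 0xBB && bom[2] == (byte) 0xBF;
    if (!isBom && n > 0) {
        pushbackInputStream.unread(bom, 0, n); // push back only the bytes that were actually read
    }
    return pushbackInputStream;
}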
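As an alternative sketch (a swapped-in technique, not the stream filter above), the BOM can also be dropped after decoding, because it arrives as the single character U+FEFF at the start of the first line. The byte array stands in for the file, and firstLine is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.util.Scanner;

public class BomSketch {
    static final char BOM = '\uFEFF'; // what the UTF-8 bytes EF BB BF decode to

    // Hypothetical helper: reads the first line as UTF-8, optionally dropping a leading BOM char.
    static String firstLine(byte[] fileBytes, boolean stripBom) {
        try (Scanner scanner = new Scanner(new ByteArrayInputStream(fileBytes), "UTF-8")) {
            String line = scanner.nextLine();
            if (stripBom && !line.isEmpty() && line.charAt(0) == BOM) {
                line = line.substring(1);
            }
            return line;
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '4', '2', '\n'};
        System.out.println(Character.isDigit(firstLine(withBom, false).charAt(0))); // false
        System.out.println(Character.isDigit(firstLine(withBom, true).charAt(0)));  // true
    }
}
```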

Memory problems loading a file, plus converting into hex

I'm trying to make a file hexadecimal converter (input file -> output hex string of the file)
The code I came up with is:
static String open2(String path) throws FileNotFoundException, IOException,OutOfMemoryError {
System.out.println("BEGIN LOADING FILE");
StringBuilder sb = new StringBuilder();
//sb.ensureCapacity(2147483648);
int size = 262144;
FileInputStream f = new FileInputStream(path);
FileChannel ch = f.getChannel( );
byte[] barray = new byte[size];
ByteBuffer bb = ByteBuffer.wrap( barray );
while (ch.read(bb) != -1)
{
//System.out.println(sb.capacity());
sb.append(bytesToHex(barray));
bb.clear();
}
System.out.println("FILE LOADED; BRING IT BACK");
return sb.toString();
}
I am sure that "path" is a valid filename.
The problem is that with big files (>= 500 MB), the JVM throws an OutOfMemoryError: Java heap space in StringBuilder.append.
To create this code I followed some tips from http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly, but I ran into a problem when I tried to force a space allocation for the StringBuilder sb: "2147483648 is too big for an int" (it exceeds Integer.MAX_VALUE, 2147483647).
If I want to use this code even with very big files (let's say up to 2 GB, if I really have to stop somewhere), what's the best way, in terms of speed, to output a hexadecimal string conversion of the file?
I'm now working on copying the converted string into a file. However, I'm having a problem with "writing the empty buffer to the file" after the EOF of the original one.
static String open3(String path) throws FileNotFoundException, IOException {
System.out.println("BEGIN LOADING FILE (Hope this is the last change)");
FileWriter fos = new FileWriter("HEXTMP");
int size = 262144;
FileInputStream f = new FileInputStream(path);
FileChannel ch = f.getChannel( );
byte[] barray = new byte[size];
ByteBuffer bb = ByteBuffer.wrap( barray );
while (ch.read(bb) != -1)
{
fos.write(bytesToHex(barray));
bb.clear();
}
System.out.println("FILE LOADED; BRING IT BACK");
return "HEXTMP";
}
Obviously, the HEXTMP file that gets created has a size that is a multiple of 256k: if the input file is 257k, the output will be a 512k file with a LOT of "000000" at the end.
I know I just have to create a last byte array with the cut length.
(I used a FileWriter because I wanted to write the hex string; otherwise it would have just copied the file as-is.)
Why are you loading the complete file?
You can load a few bytes from the input file into a buffer, process the bytes in the buffer, then write the processed bytes to the output file. Continue until all bytes from the input file have been processed.
FileInputStream fis = new FileInputStream("in file");
FileOutputStream fos = new FileOutputStream("out");
byte buffer [] = new byte[8192];
while(true){
int count = fis.read(buffer);
if(count == -1)
break;
byte[] processed = processBytesToConvert(buffer, count);
fos.write(processed);
}
fis.close();
fos.close();
So just read a few bytes into the buffer, convert them to a hex string, get the bytes of the converted hex string, write those bytes to the file, and continue with the next few input bytes.
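A runnable sketch of this streaming approach. toHexBytes is a hypothetical stand-in for the answer's processBytesToConvert; it converts only the first count bytes that read() reported, so stale bytes from an earlier read never reach the output (the trailing "000000" problem from the question).

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class HexStreamSketch {
    private static final byte[] HEX = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    // Hex-encodes buffer[0..count) as ASCII bytes, two output bytes per input byte.
    static byte[] toHexBytes(byte[] buffer, int count) {
        byte[] out = new byte[count * 2];
        for (int i = 0; i < count; i++) {
            int v = buffer[i] & 0xFF;        // treat the byte as unsigned 0..255
            out[2 * i] = HEX[v >>> 4];       // high nibble
            out[2 * i + 1] = HEX[v & 0x0F];  // low nibble
        }
        return out;
    }

    // Memory use stays at one fixed-size buffer regardless of the input size.
    static void convert(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(toHexBytes(buffer, count));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java HexStreamSketch <input file> <output file>");
            return;
        }
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(args[1]))) {
            convert(in, out);
        }
    }
}
```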
The problem here is that you try to read the whole file and store it in memory.
You should use stream, read some lines of your input file, convert them and write them in the output file. That way your program can scale, whatever the size of the input file is.
The key would be to read the file in chunks instead of reading all of it in one go. Depending on its use, you could vary the size of the chunk. For example, if you are trying to make a hex viewer/editor, determine how much content is being shown in the viewport and read only that much data from the file. Or, if you are simply converting and dumping hex to another file, use any chunk size that is small enough to fit in memory but big enough for performance. This should be tunable over some runs. Perhaps use filesystem NIO in Java 7 so that you can do all three tasks (reading, processing and writing) concurrently. The link included in the question gives a good primer on reading files.

Java Apache FileUtils readFileToString and writeStringToFile problems

I need to read a Java File (actually a .pdf) into a String and then write it back to a file. Between those steps I'll apply some patches to the string, but that is not important in this case.
I've developed the following JUnit test case:
String f1String=FileUtils.readFileToString(f1);
File temp=File.createTempFile("deleteme", "deleteme");
FileUtils.writeStringToFile(temp, f1String);
assertTrue(FileUtils.contentEquals(f1, temp));
This test converts a file to a string and writes it back. However, the test is failing.
I think it may be because of the encodings, but FileUtils doesn't document much about this.
Can anyone help?
Thanks!
Added for further understanding:
Why do I need this?
I have very large PDFs on one machine that are replicated on another one. The first one is in charge of creating those PDFs. Because of the low connectivity of the second machine and the big size of the PDFs, I don't want to sync the whole PDFs, only the changes.
To create and apply patches, I'm using Google's DiffMatchPatch library. This library creates patches between two strings, so I need to load a PDF into a string, apply a generated patch, and write it back to a file.
A PDF is not a text file. Decoding (into Java characters) and re-encoding of binary files that are not encoded text is asymmetrical. For example, if the input bytestream is invalid for the current encoding, you can be assured that it won't re-encode correctly. In short - don't do that. Use readFileToByteArray and writeByteArrayToFile instead.
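A small sketch of that asymmetry using the JDK's own String/charset conversions rather than FileUtils: the two bytes below are not valid UTF-8, so decoding substitutes U+FFFD and re-encoding cannot restore the original bytes. roundTripUtf8 is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryRoundTripSketch {
    // Decodes bytes as UTF-8 text, then encodes that text back to UTF-8 bytes.
    static byte[] roundTripUtf8(byte[] original) {
        String decoded = new String(original, StandardCharsets.UTF_8); // invalid input becomes U+FFFD
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xC3 announces a two-byte sequence, but 0x28 ('(') is not a continuation byte
        byte[] binary = {(byte) 0xC3, 0x28};
        byte[] back = roundTripUtf8(binary);
        System.out.println(Arrays.toString(binary));     // [-61, 40]
        System.out.println(Arrays.toString(back));       // [-17, -65, -67, 40]
        System.out.println(Arrays.equals(binary, back)); // false
    }
}
```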
Just a few thoughts:
There might actually be some BOM (byte order mark) bytes in one of the files that either get stripped when reading or added when writing. Is there a difference in the file size (if it is the BOM, the difference should be 2 or 3 bytes)?
The line breaks might not match, depending which system the files are created on, i.e. one might have CR LF while the other only has LF or CR. (1 byte difference per line break)
According to the JavaDoc both methods should use the default encoding of the JVM, which should be the same for both operations. However, try and test with an explicitly set encoding (JVM's default encoding would be queried using System.getProperty("file.encoding")).
Ed Staub's answer points out why my solution is not working, and he suggested using bytes instead of Strings. In my case I need a String, so the final working solution I've found is the following:
@Test
public void testFileRWAsArray() throws IOException {
    byte[] bytes = FileUtils.readFileToByteArray(f1);
    // Map each byte to one char with a plain cast (StringBuilder avoids quadratic string concatenation)
    StringBuilder sb = new StringBuilder(bytes.length);
    for (byte b : bytes) {
        sb.append((char) b);
    }
    String f1String = sb.toString();
    File temp = File.createTempFile("deleteme", "deleteme");
    // Cast each char back to a byte, undoing the byte-to-char cast above
    byte[] newBytes = new byte[f1String.length()];
    for (int i = 0; i < f1String.length(); ++i) {
        newBytes[i] = (byte) f1String.charAt(i);
    }
    FileUtils.writeByteArrayToFile(temp, newBytes);
    assertTrue(FileUtils.contentEquals(f1, temp));
}
By casting between byte and char, I get a symmetric conversion.
Thank you all!
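The same symmetry can be sketched with the ISO-8859-1 charset (a swapped-in technique, not the code above), which maps each byte value 0..255 to the char with the same code, U+0000..U+00FF, and back. Unlike the (char) cast, which maps negative bytes to U+FF80..U+FFFF, this keeps every char in the Latin-1 range. One assumption to keep in mind: chars above U+00FF, which a patch could in principle introduce, are replaced with '?' when encoding back. The helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTripSketch {
    // One char per byte: byte value v becomes the char with code v.
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static byte[] stringToBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1); // chars above U+00FF become '?'
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        System.out.println(Arrays.equals(all, stringToBytes(bytesToString(all)))); // true
    }
}
```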
Try this code...
public static String fetchBase64binaryEncodedString(String path) {
File inboundDoc = new File(path);
byte[] pdfData;
try {
pdfData = FileUtils.readFileToByteArray(inboundDoc);
} catch (IOException e) {
throw new RuntimeException(e);
}
byte[] encodedPdfData = Base64.encodeBase64(pdfData);
String attachment = new String(encodedPdfData);
return attachment;
}
//How to decode it
public void testConversionPDFtoBase64() throws IOException
{
String path = "C:/Documents and Settings/kantab/Desktop/GTR_SDR/MSDOC.pdf";
File origFile = new File(path);
String encodedString = CreditOneMLParserUtil.fetchBase64binaryEncodedString(path);
//now decode it
byte[] decodeData = Base64.decodeBase64(encodedString.getBytes());
String decodedString = new String(decodeData);
//or actually give the path to pdf file.
File decodedfile = File.createTempFile("DECODED", ".pdf");
FileUtils.writeByteArrayToFile(decodedfile,decodeData);
Assert.assertTrue(FileUtils.contentEquals(origFile, decodedfile));
// Frame frame = new Frame("PDF Viewer");
// frame.setLayout(new BorderLayout());
}
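For comparison, a sketch of the same round trip with java.util.Base64 (part of the JDK since Java 8) instead of Commons Codec's Base64 class. The byte array is an assumption standing in for the PDF contents, and the helper names are hypothetical. Keep in mind that Base64 text is about a third larger than the original bytes.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Sketch {
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // A few bytes standing in for the PDF file contents (not a real PDF)
        byte[] pdfBytes = {'%', 'P', 'D', 'F', '-', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        String text = encode(pdfBytes);
        System.out.println(text);
        System.out.println(Arrays.equals(pdfBytes, decode(text))); // true
    }
}
```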

Inserting text into an existing file via Java

I would like to create a simple program (in Java) which edits text files - particularly one which performs inserting arbitrary pieces of text at random positions in a text file. This feature is part of a larger program I am currently writing.
Reading the description of java.util.RandomAccessFile, it appears that any write operations performed in the middle of a file would actually overwrite the existing content. This is a side effect which I would like to avoid (if possible).
Is there a simple way to achieve this?
Thanks in advance.
Okay, this question is pretty old, but FileChannels have existed since Java 1.4, and I don't know why they aren't mentioned anywhere when dealing with the problem of replacing or inserting content in files. FileChannels are fast; use them.
Here's an example (ignoring exceptions and some other stuff):
public void insert(String filename, long offset, byte[] content) throws IOException {
RandomAccessFile r = new RandomAccessFile(new File(filename), "rw");
RandomAccessFile rtemp = new RandomAccessFile(new File(filename + "~"), "rw");
long fileSize = r.length();
FileChannel sourceChannel = r.getChannel();
FileChannel targetChannel = rtemp.getChannel();
sourceChannel.transferTo(offset, (fileSize - offset), targetChannel);
sourceChannel.truncate(offset);
r.seek(offset);
r.write(content);
long newOffset = r.getFilePointer();
targetChannel.position(0L);
sourceChannel.transferFrom(targetChannel, newOffset, (fileSize - offset));
sourceChannel.close();
targetChannel.close();
}
Well, no, I don't believe there is a way to avoid overwriting existing content with a single, standard Java IO API call.
If the files are not too large, just read the entire file into an ArrayList (an entry per line) and either rewrite entries or insert new entries for new lines.
Then overwrite the existing file with new content, or move the existing file to a backup and write a new file.
Depending on how sophisticated the edits need to be, your data structure may need to change.
Another method would be to read characters from the existing file while writing to the edited file and edit the stream as it is read.
Java can memory-map files (FileChannel.map()), so what you can do is extend the file to its new length, map the file, memmove all the bytes after the insertion point down to the end to make a hole, and write the new data into the hole.
This works in C. Never tried it in Java.
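A Java sketch of this memory-mapped approach, assuming the grown file fits in a single mapping (one FileChannel.map() call is limited to Integer.MAX_VALUE bytes). The loop plays the role of memmove, copying from the end backwards so no byte is overwritten before it has been moved; byte-by-byte copying keeps the sketch short but is not tuned for speed. MappedInsertSketch.insert is a hypothetical name, and on some platforms a mapping can keep the file locked until it is garbage-collected.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedInsertSketch {
    static void insert(Path path, long offset, byte[] content) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            long oldLen = raf.length();
            if (offset < 0 || offset > oldLen) {
                throw new IllegalArgumentException("offset outside the file: " + offset);
            }
            long newLen = oldLen + content.length;
            raf.setLength(newLen); // 1. extend the file to its new length
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, newLen); // 2. map it
            // 3. move the tail toward the end, last byte first, to open a hole at offset
            for (long i = oldLen - 1; i >= offset; i--) {
                buf.put((int) (i + content.length), buf.get((int) i));
            }
            // 4. write the new data into the hole and flush the mapping to the file
            buf.position((int) offset);
            buf.put(content);
            buf.force();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-insert", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "HelloWorld".getBytes(StandardCharsets.US_ASCII));
        insert(file, 5, ", ".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII)); // Hello, World
    }
}
```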
Here's another way I just thought of to do the same, but with random file access:
Seek to the end - 1 MB
Read 1 MB
Write that to original position + gap size.
Repeat for each previous 1 MB working toward the beginning of the file.
Stop when you reach the desired gap position.
Use a larger buffer size for faster performance.
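A sketch of these steps with RandomAccessFile, assuming the gap size is the length of the data being inserted; the buffer size is a parameter so it can be tuned as suggested. ChunkedInsertSketch.insert is a hypothetical name.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedInsertSketch {
    static void insert(Path path, long offset, byte[] content, int bufferSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw")) {
            long oldLen = raf.length();
            if (offset < 0 || offset > oldLen) {
                throw new IllegalArgumentException("offset outside the file: " + offset);
            }
            int gap = content.length;
            raf.setLength(oldLen + gap);
            byte[] buf = new byte[bufferSize];
            long end = oldLen; // everything from the old position `end` onward has already been moved
            while (end > offset) {
                int n = (int) Math.min(bufferSize, end - offset);
                long start = end - n;
                raf.seek(start);          // seek to the previous chunk
                raf.readFully(buf, 0, n); // read it
                raf.seek(start + gap);    // write it to its original position + gap size
                raf.write(buf, 0, n);
                end = start;              // work toward the beginning of the file
            }
            raf.seek(offset);             // stop at the gap position and fill the gap
            raf.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunked-insert", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "HelloWorld".getBytes(StandardCharsets.US_ASCII));
        insert(file, 5, ", ".getBytes(StandardCharsets.US_ASCII), 1 << 20); // 1 MB buffer
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII)); // Hello, World
    }
}
```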
You can use the following code:
List<String> list = new ArrayList<>();
try {
    // Read every line of the existing file into memory; the reader is closed before writing
    try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
        String tmp;
        while ((tmp = reader.readLine()) != null)
            list.add(tmp);
    }
    list.add(0, "Start Text"); // new first line
    list.add("End Text");      // new last line
    // Overwrite the original file with the edited lines
    try (BufferedWriter writer = new BufferedWriter(new FileWriter(fileName))) {
        for (String line : list)
            writer.write(line + "\r\n");
    }
} catch (IOException e) {
    e.printStackTrace();
}
I don't know of a handy way to do it directly, other than to:
read the beginning of the file and write it to the target,
write your new text to the target,
read the rest of the file and write it to the target.
About the target : You can construct the new contents of the file in memory and then overwrite the old content of the file if the files handled aren't so big. Or you can write the result to a temporary file.
The thing would probably be easiest to do with streams, RandomAccessFile doesn't seem to be meant for inserting in the middle (afaik). Check the tutorial if you need.
I believe the only way to insert text into an existing text file is to read the original file and write the content in a temporary file with the new text inserted. Then erase the original file and rename the temporary file to the original name.
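A sketch of this copy-to-a-temporary-file approach with streams, assuming the insertion point is given as a byte offset. The temporary file is created next to the original so the final move stays on the same file system. TempFileInsertSketch.insert is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempFileInsertSketch {
    static void insert(Path path, long offset, byte[] content) throws IOException {
        Path temp = Files.createTempFile(path.toAbsolutePath().getParent(), "insert", ".tmp");
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(temp))) {
            // 1. copy the beginning of the file
            for (long i = 0; i < offset; i++) {
                int b = in.read();
                if (b == -1) {
                    throw new EOFException("offset " + offset + " is past the end of the file");
                }
                out.write(b);
            }
            // 2. write the new text
            out.write(content);
            // 3. copy the rest of the file
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            Files.deleteIfExists(temp); // don't leave a half-written temporary file behind
            throw e;
        }
        // 4. replace the original file with the temporary one
        Files.move(temp, path, StandardCopyOption.REPLACE_EXISTING);
    }
}
```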
This example focuses on inserting a single line into an existing file, but it may still be of use to you.
If it is a text file, read the existing file into a StringBuffer and append the new content to the same StringBuffer. Then you can write the StringBuffer to the file, so the file contains both the existing and the new text.
Since the edit queue for @xor_eq's answer is full, here is a more documented and slightly improved version of it as a new answer:
public static void insert(String filename, long offset, byte[] content) throws IOException {
File temp = Files.createTempFile("insertTempFile", ".temp").toFile(); // Create a temporary file to save content to
try (RandomAccessFile r = new RandomAccessFile(new File(filename), "rw"); // Open file for read & write
RandomAccessFile rtemp = new RandomAccessFile(temp, "rw"); // Open temporary file for read & write
FileChannel sourceChannel = r.getChannel(); // Channel of file
FileChannel targetChannel = rtemp.getChannel()) { // Channel of temporary file
long fileSize = r.length();
sourceChannel.transferTo(offset, (fileSize - offset), targetChannel); // Copy content after insert index to
// temporary file
sourceChannel.truncate(offset); // Remove content past insert index from file
r.seek(offset); // Goto back of file (now insert index)
r.write(content); // Write new content
long newOffset = r.getFilePointer(); // The current offset
targetChannel.position(0L); // Goto start of temporary file
sourceChannel.transferFrom(targetChannel, newOffset, (fileSize - offset)); // Copy all content of temporary
// to end of file
}
Files.delete(temp.toPath()); // Delete the temporary file as not needed anymore
}
