How can convert a certain file into in a String? And then from that String to get the original file? This needs to realize it using Java.
It would be greate an example of code or a link to a tutorial.
Have to converted an entire File (not thecontent) in the String and then the String to the File.
Thanks.
NB: In both cases below you will have to digitally sign your applet, so that it is granted permissions to read/write from/to the file system!
If you can use external libraries, use Apache commons-io (you will have to sign these jars as well)
FileUtils.readFileToString(..)
FileUtils.writeStringToFile(..)
If you can't, you can see this tutorial.
This is for text files - but if you want to read binary files (.png for instance), you should not read it as string. Instead read it as byte[] and use commons-codec -
encode: String base64String = new String(Base64.encodeBase64(byteArray));
decode: byte[] originalBytes = Base64.decodeBase64(string.getByte());
Base64 is a string format for storing binary data.
public static String getFileContent(File file) throws IOException
{
FileReader in=new FileReader(file);
StringWriter w= new StringWriter();
char buffer[]=new char[2048];
int n=0;
while((n=in.read(buffer))!=-1)
{
w.write(buffer, 0, n);
}
w.flush();
in.close();
return w.toString();
}
public static void toFileContent(String s,File file) throws IOException
{
FileWriter f=new FileWriter(file);
f.write(s);
f.flush();
f.close();
}
Related
I have a file in ISO-8859-1 containing german umlauts and I need to unmarshall it using JAXB. But before I need the content in UTF-8.
#Override
public List<Usage> convert(InputStream input) {
try {
InputStream inputWithNamespace = addNamespaceIfMissing(input);
inputWithNamespace = convertFileToUtf(inputWithNamespace);
ORDR order = xmlUnmarshaller.unmarshall(inputWithNamespace, ORDR.class);
...
I get the "file" as an InputStream. My idea was to read the file's content in UTF-8 and make another InputStream to use. This is what I've tried:
private InputStream convertFileToUtf(InputStream inputStream) throws IOException {
byte[] bytesInIso = ByteStreams.toByteArray(inputStream);
String stringIso = new String(bytesInIso);
byte[] bytesInUtf = new String(bytesInIso, ISO_8859_1).getBytes(UTF_8);
String stringUtf = new String(bytesInUtf);
return new ByteArrayInputStream(bytesInUtf);
}
I have those 2 Strings to check the contents, but even just reading the ISO file, it gives question marks where umlauts are (?) and converting that to UTF_8 gives strange characters like 1/2 and so on.
UPDATE
byte[] bytesInIso = ByteStreams.toByteArray(inputWithNamespace);
String contentInIso = new String(bytesInIso);
byte[] bytesInUtf = new String(bytesInIso, ISO_8859_1).getBytes(UTF_8);
String contentInUtf = new String(bytesInUtf);
Verifying contentInIso prints question marks instead of the umlauts and by checking contentInIso instead of umlauts, it has characters like "�".
#Override
public List<Usage> convert(InputStream input) {
try {
InputStream inputWithNamespace = addNamespaceIfMissing(input);
byte[] bytesInIso = ByteStreams.toByteArray(inputWithNamespace);
String contentInIso = new String(bytesInIso);
byte[] bytesInUtf = new String(bytesInIso, ISO_8859_1).getBytes(UTF_8);
String contentInUtf = new String(bytesInUtf);
ORDR order = xmlUnmarshaller.unmarshall(inputWithNamespace, ORDR.class);
This method convert it's called by another one called processUsageFile:
private void processUsageFile(File usageFile) {
try (FileInputStream fileInputStream = new FileInputStream(usageFile)) {
usageImporterService.importUsages(usageFile.getName(), fileInputStream, getUsageTypeValidated(usageFile.getName()));
log.info("Usage file {} imported successfully. Moving to archive directory", usageFile.getName());
If i take the code I have written under the UPDATE statement and put it immediately after the try, the first contentInIso has question marks but the contentInUtf has the umlauts. Then, by going into the convert, jabx throws an exception that the file has a premature end of line.
Regarding the behaviour you are getting,
String stringIso = new String(bytesInIso);
In this step, you construct a new String by decoding the specified array of bytes using the platform's default charset.
Since this is probably not ISO_8859_1, I think the String you are looking at becomes garbled here.
I am able to create a pdf file but when I try to open the output pdf file I am getting error : "the file is damaged"
Here is my code please help me.
String encodedBytes= "QmFzZTY0IGVuY29kaW5nIHNjaGVtZXMgYXJlIHVzZWQgd2hlbiBiaW5hcnkgZGF0YSBuZWVkcyB0byBiZSBzdG9yZWQgb3IgdHJhbnNmZXJyZWQgYXMgdGV4dHVhbCBkYXRh";
BASE64Decoder decoder = new BASE64Decoder();
byte[] decodedBytes = decoder.decodeBuffer(encodedBytes);
File file = new File("C:/Users/istest/Documents/test.pdf");
FileOutputStream fos = new FileOutputStream(file);
fos.write(decodedBytes);
Your string is not a valid PDF file.
A pdf file should start its proper Magic number (please refer to the Format indicators section of this link)
PDF files start with "%PDF" (hex 25 50 44 46).
or in Base64 : JVBERi
if you try your code with a valid PDF encoded string like this one, it might work.
But because you did not provided the code of your BASE64Decoder class, it is hard to be sure that it will work.
For that reason, here is a simple implementation of the java.util.Base64 package (Warning do not copy/past this example and do not try it before changing the given base64 string here with the correct one as supplied in the previous link...as noted in the bellow comment, in order to be short the correct string was replaced by a corrupted one)
import java.io.File;
import java.io.FileOutputStream;
import java.util.Base64;
class Base64DecodePdf {
public static void main(String[] args) {
File file = new File("./test.pdf");
try ( FileOutputStream fos = new FileOutputStream(file); ) {
// To be short I use a corrupted PDF string, so make sure to use a valid one if you want to preview the PDF file
String b64 = "JVBERi0xLjUKJYCBgoMKMSAwIG9iago8PC9GaWx0ZXIvRmxhdGVEZWNvZGUvRmlyc3QgMTQxL04gMjAvTGVuZ3==";
byte[] decoder = Base64.getDecoder().decode(b64);
fos.write(decoder);
System.out.println("PDF File Saved");
} catch (Exception e) {
e.printStackTrace();
}
}
}
Credit : source.
I know that there are some similar questions in the site, but they could not provide me a helpful answer. What is the best/most efficient way to read a .bin file in Java line by line? Which classes and methods should someone use to open it and get the data? Could Bufferedreader do the job or is it only for text files;
Binary file don't have lines, but you must know the format of the file to know what structure exists (headers, structs,etc) and write a parser.
You can use BufferedInputStream, see the following:
http://www.javapractices.com/topic/TopicAction.do?Id=245
Read structured data from binary file -?
This should do it.
public byte[] readFromStream(InputStream inputStream) throws Exception
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
byte[] data = new byte[4096];
int count = inputStream.read(data);
while(count != -1)
{
dos.write(data, 0, count);
count = inputStream.read(data);
}
return baos.toByteArray();
}
I need to parse a java file (actually a .pdf) to an String and go back to a file. Between those process I'll apply some patches to the given string, but this is not important in this case.
I've developed the following JUnit test case:
String f1String=FileUtils.readFileToString(f1);
File temp=File.createTempFile("deleteme", "deleteme");
FileUtils.writeStringToFile(temp, f1String);
assertTrue(FileUtils.contentEquals(f1, temp));
This test converts a file to a string and writtes it back. However the test is failing.
I think it may be because of the encodings, but in FileUtils there is no much detailed info about this.
Anyone can help?
Thanks!
Added for further undestanding:
Why I need this?
I have very large pdfs in one machine, that are replicated in another one. The first one is in charge of creating those pdfs. Due to the low connectivity of the second machine and the big size of pdfs, I don't want to synch the whole pdfs, but only the changes done.
To create patches/apply them, I'm using the google library DiffMatchPatch. This library creates patches between two string. So I need to load a pdf to an string, apply a generated patch, and put it back to a file.
A PDF is not a text file. Decoding (into Java characters) and re-encoding of binary files that are not encoded text is asymmetrical. For example, if the input bytestream is invalid for the current encoding, you can be assured that it won't re-encode correctly. In short - don't do that. Use readFileToByteArray and writeByteArrayToFile instead.
Just a few thoughts:
There might actually some BOM (byte order mark) bytes in one of the files that either gets stripped when reading or added during writing. Is there a difference in the file size (if it is the BOM the difference should be 2 or 3 bytes)?
The line breaks might not match, depending which system the files are created on, i.e. one might have CR LF while the other only has LF or CR. (1 byte difference per line break)
According to the JavaDoc both methods should use the default encoding of the JVM, which should be the same for both operations. However, try and test with an explicitly set encoding (JVM's default encoding would be queried using System.getProperty("file.encoding")).
Ed Staub awnser points why my solution is not working and he suggested using bytes instead of Strings. In my case I need an String, so the final working solution I've found is the following:
#Test
public void testFileRWAsArray() throws IOException{
String f1String="";
byte[] bytes=FileUtils.readFileToByteArray(f1);
for(byte b:bytes){
f1String=f1String+((char)b);
}
File temp=File.createTempFile("deleteme", "deleteme");
byte[] newBytes=new byte[f1String.length()];
for(int i=0; i<f1String.length(); ++i){
char c=f1String.charAt(i);
newBytes[i]= (byte)c;
}
FileUtils.writeByteArrayToFile(temp, newBytes);
assertTrue(FileUtils.contentEquals(f1, temp));
}
By using a cast between byte-char, I have the symmetry on conversion.
Thank you all!
Try this code...
public static String fetchBase64binaryEncodedString(String path) {
File inboundDoc = new File(path);
byte[] pdfData;
try {
pdfData = FileUtils.readFileToByteArray(inboundDoc);
} catch (IOException e) {
throw new RuntimeException(e);
}
byte[] encodedPdfData = Base64.encodeBase64(pdfData);
String attachment = new String(encodedPdfData);
return attachment;
}
//How to decode it
public void testConversionPDFtoBase64() throws IOException
{
String path = "C:/Documents and Settings/kantab/Desktop/GTR_SDR/MSDOC.pdf";
File origFile = new File(path);
String encodedString = CreditOneMLParserUtil.fetchBase64binaryEncodedString(path);
//now decode it
byte[] decodeData = Base64.decodeBase64(encodedString.getBytes());
String decodedString = new String(decodeData);
//or actually give the path to pdf file.
File decodedfile = File.createTempFile("DECODED", ".pdf");
FileUtils.writeByteArrayToFile(decodedfile,decodeData);
Assert.assertTrue(FileUtils.contentEquals(origFile, decodedfile));
// Frame frame = new Frame("PDF Viewer");
// frame.setLayout(new BorderLayout());
}
I use the following code to save Chinese characters into a .txt file, but when I opened it with Wordpad, I couldn't read it.
StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
boolean Append = true;
FileOutputStream fos;
fos = new FileOutputStream(FileName, Append);
for (int i = 0;i < Shanghai_StrBuf.length(); i++) {
fos.write(Shanghai_StrBuf.charAt(i));
}
fos.close();
What can I do ? I know if I cut and paste Chinese characters into Wordpad, I can save it into a .txt file. How do I do that in Java ?
There are several factors at work here:
Text files have no intrinsic metadata for describing their encoding (for all the talk of angle-bracket taxes, there are reasons XML is popular)
The default encoding for Windows is still an 8bit (or doublebyte) "ANSI" character set with a limited range of values - text files written in this format are not portable
To tell a Unicode file from an ANSI file, Windows apps rely on the presence of a byte order mark at the start of the file (not strictly true - Raymond Chen explains). In theory, the BOM is there to tell you the endianess (byte order) of the data. For UTF-8, even though there is only one byte order, Windows apps rely on the marker bytes to automatically figure out that it is Unicode (though you'll note that Notepad has an encoding option on its open/save dialogs).
It is wrong to say that Java is broken because it does not write a UTF-8 BOM automatically. On Unix systems, it would be an error to write a BOM to a script file, for example, and many Unix systems use UTF-8 as their default encoding. There are times when you don't want it on Windows, either, like when you're appending data to an existing file: fos = new FileOutputStream(FileName,Append);
Here is a method of reliably appending UTF-8 data to a file:
private static void writeUtf8ToFile(File file, boolean append, String data)
throws IOException {
boolean skipBOM = append && file.isFile() && (file.length() > 0);
Closer res = new Closer();
try {
OutputStream out = res.using(new FileOutputStream(file, append));
Writer writer = res.using(new OutputStreamWriter(out, Charset
.forName("UTF-8")));
if (!skipBOM) {
writer.write('\uFEFF');
}
writer.write(data);
} finally {
res.close();
}
}
Usage:
public static void main(String[] args) throws IOException {
String chinese = "\u4E0A\u6D77";
boolean append = true;
writeUtf8ToFile(new File("chinese.txt"), append, chinese);
}
Note: if the file already existed and you chose to append and existing data wasn't UTF-8 encoded, the only thing that code will create is a mess.
Here is the Closer type used in this code:
public class Closer implements Closeable {
private Closeable closeable;
public <T extends Closeable> T using(T t) {
closeable = t;
return t;
}
#Override public void close() throws IOException {
if (closeable != null) {
closeable.close();
}
}
}
This code makes a Windows-style best guess about how to read the file based on byte order marks:
private static final Charset[] UTF_ENCODINGS = { Charset.forName("UTF-8"),
Charset.forName("UTF-16LE"), Charset.forName("UTF-16BE") };
private static Charset getEncoding(InputStream in) throws IOException {
charsetLoop: for (Charset encodings : UTF_ENCODINGS) {
byte[] bom = "\uFEFF".getBytes(encodings);
in.mark(bom.length);
for (byte b : bom) {
if ((0xFF & b) != in.read()) {
in.reset();
continue charsetLoop;
}
}
return encodings;
}
return Charset.defaultCharset();
}
private static String readText(File file) throws IOException {
Closer res = new Closer();
try {
InputStream in = res.using(new FileInputStream(file));
InputStream bin = res.using(new BufferedInputStream(in));
Reader reader = res.using(new InputStreamReader(bin, getEncoding(bin)));
StringBuilder out = new StringBuilder();
for (int ch = reader.read(); ch != -1; ch = reader.read())
out.append((char) ch);
return out.toString();
} finally {
res.close();
}
}
Usage:
public static void main(String[] args) throws IOException {
System.out.println(readText(new File("chinese.txt")));
}
(System.out uses the default encoding, so whether it prints anything sensible depends on your platform and configuration.)
If you can rely that the default character encoding is UTF-8 (or some other Unicode encoding), you may use the following:
Writer w = new FileWriter("test.txt");
w.append("上海");
w.close();
The safest way is to always explicitly specify the encoding:
Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
w.append("上海");
w.close();
P.S. You may use any Unicode characters in Java source code, even as method and variable names, if the -encoding parameter for javac is configured right. That makes the source code more readable than the escaped \uXXXX form.
Be very careful with the approaches proposed. Even specifying the encoding for the file as follows:
Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
will not work if you're running under an operating system like Windows. Even setting the system property for file.encoding to UTF-8 does not fix the issue. This is because Java fails to write a byte order mark (BOM) for the file. Even if you specify the encoding when writing out to a file, opening the same file in an application like Wordpad will display the text as garbage because it doesn't detect the BOM. I tried running the examples here in Windows (with a platform/container encoding of CP1252).
The following bug exists to describe the issue in Java:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
The solution for the time being is to write the byte order mark yourself to ensure the file opens correctly in other applications. See this for more details on the BOM:
http://mindprod.com/jgloss/bom.html
and for a more correct solution see the following link:
http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html
Here's one way among many. Basically, we're just specifying that the conversion be done to UTF-8 before outputting bytes to the FileOutputStream:
String FileName = "output.txt";
StringBuffer Shanghai_StrBuf=new StringBuffer("\u4E0A\u6D77");
boolean Append=true;
Writer writer = new OutputStreamWriter(new FileOutputStream(FileName,Append), "UTF-8");
writer.write(Shanghai_StrBuf.toString(), 0, Shanghai_StrBuf.length());
writer.close();
I manually verified this against the images at http://www.fileformat.info/info/unicode/char/ . In the future, please follow Java coding standards, including lower-case variable names. It improves readability.
Try this,
StringBuffer Shanghai_StrBuf=new StringBuffer("\u4E0A\u6D77");
boolean Append=true;
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(FileName,Append), "UTF8"));
for (int i=0;i<Shanghai_StrBuf.length();i++) out.write(Shanghai_StrBuf.charAt(i));
out.close();