I need to parse a java file (actually a .pdf) to an String and go back to a file. Between those process I'll apply some patches to the given string, but this is not important in this case.
I've developed the following JUnit test case:
String f1String=FileUtils.readFileToString(f1);
File temp=File.createTempFile("deleteme", "deleteme");
FileUtils.writeStringToFile(temp, f1String);
assertTrue(FileUtils.contentEquals(f1, temp));
This test converts a file to a string and writtes it back. However the test is failing.
I think it may be because of the encodings, but in FileUtils there is no much detailed info about this.
Anyone can help?
Thanks!
Added for further undestanding:
Why I need this?
I have very large pdfs in one machine, that are replicated in another one. The first one is in charge of creating those pdfs. Due to the low connectivity of the second machine and the big size of pdfs, I don't want to synch the whole pdfs, but only the changes done.
To create patches/apply them, I'm using the google library DiffMatchPatch. This library creates patches between two string. So I need to load a pdf to an string, apply a generated patch, and put it back to a file.
A PDF is not a text file. Decoding (into Java characters) and re-encoding of binary files that are not encoded text is asymmetrical. For example, if the input bytestream is invalid for the current encoding, you can be assured that it won't re-encode correctly. In short - don't do that. Use readFileToByteArray and writeByteArrayToFile instead.
Just a few thoughts:
There might actually some BOM (byte order mark) bytes in one of the files that either gets stripped when reading or added during writing. Is there a difference in the file size (if it is the BOM the difference should be 2 or 3 bytes)?
The line breaks might not match, depending which system the files are created on, i.e. one might have CR LF while the other only has LF or CR. (1 byte difference per line break)
According to the JavaDoc both methods should use the default encoding of the JVM, which should be the same for both operations. However, try and test with an explicitly set encoding (JVM's default encoding would be queried using System.getProperty("file.encoding")).
Ed Staub awnser points why my solution is not working and he suggested using bytes instead of Strings. In my case I need an String, so the final working solution I've found is the following:
#Test
public void testFileRWAsArray() throws IOException{
String f1String="";
byte[] bytes=FileUtils.readFileToByteArray(f1);
for(byte b:bytes){
f1String=f1String+((char)b);
}
File temp=File.createTempFile("deleteme", "deleteme");
byte[] newBytes=new byte[f1String.length()];
for(int i=0; i<f1String.length(); ++i){
char c=f1String.charAt(i);
newBytes[i]= (byte)c;
}
FileUtils.writeByteArrayToFile(temp, newBytes);
assertTrue(FileUtils.contentEquals(f1, temp));
}
By using a cast between byte-char, I have the symmetry on conversion.
Thank you all!
Try this code...
public static String fetchBase64binaryEncodedString(String path) {
File inboundDoc = new File(path);
byte[] pdfData;
try {
pdfData = FileUtils.readFileToByteArray(inboundDoc);
} catch (IOException e) {
throw new RuntimeException(e);
}
byte[] encodedPdfData = Base64.encodeBase64(pdfData);
String attachment = new String(encodedPdfData);
return attachment;
}
//How to decode it
public void testConversionPDFtoBase64() throws IOException
{
String path = "C:/Documents and Settings/kantab/Desktop/GTR_SDR/MSDOC.pdf";
File origFile = new File(path);
String encodedString = CreditOneMLParserUtil.fetchBase64binaryEncodedString(path);
//now decode it
byte[] decodeData = Base64.decodeBase64(encodedString.getBytes());
String decodedString = new String(decodeData);
//or actually give the path to pdf file.
File decodedfile = File.createTempFile("DECODED", ".pdf");
FileUtils.writeByteArrayToFile(decodedfile,decodeData);
Assert.assertTrue(FileUtils.contentEquals(origFile, decodedfile));
// Frame frame = new Frame("PDF Viewer");
// frame.setLayout(new BorderLayout());
}
Related
I have the following file, which contains a binary representation of an .MSG file :
binaryMessage.txt
And I put it in my Eclipse workspace, in the following folder - src/main/resources/test :
I want to use the string which is within this text file , within the following JUnit code, so I tried the following way :
request.setContent("src/main/resources/test/binaryMessage");
mockMvc.perform(post(EmailController.PATH__METADATA_EXTRACTION_OPERATION)
.contentType(MediaType.APPLICATION_JSON)
.content(json(request)))
.andExpect(status().is2xxSuccessful());
}
But this doesn't work. Is there a way I can pass in the string the file directly without using IO code ?
You can't read a file without using IO code (or libraries that use IO code). That said, it's not that difficult to read the file into memory so you can send it.
To read a binary file into a byte[] you can use this method:
private byte[] readToByteArray(InputStream is) throws IOException {
try {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
int len;
while ((len = in.read(buffer)) != -1) {
baos.write(buffer, 0, len);
}
return baos.toByteArray();
} finally {
if (is != null) {
is.close();
}
}
}
Then you can do
request.setContent(readToByteArray(getClass().getResourceAsStream("test/binaryMessage")));
In addition to my comment on Samuel's answer, I just noticed that you depend on your concrete execution directory. I personally don't like that and normally use the class loader's functions to find resources.
Thus, to be independent of your working directory, you can use
getClass().getResource("/test/binaryMessage")
Convert this to URI and Path, then use Files.readAllBytes to fetch the contents:
Path resourcePath = Paths.get(getClass().getResource("/test/binaryMessage").toURI());
byte[] content = Files.readAllBytes(resourcePath);
... or even roll that into a single expression.
But to get back to your original question: no, this is I/O code, and you need it. But since the dawn of Java 7 (in 2011!) this does not need to be painful anymore.
I have some text strings that I need to process and inside the strings there are HTML special characters. For example:
10ðŸ˜ðŸ˜ðŸ˜‚😂😂😂😢😂10ðŸ˜ðŸ˜ðŸ˜‚😂😂😂😢😂😂
I would like to convert those characters to utf-8.
I used org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4 but didn't have any luck. Is there an easy way to deal with this problem?
Apache commons-text library has the StringEscapeUtils class that has the unescapeHtml4() utility method.
String utf8Str = StringEscapeUtils.unescapeHtml4(htmlStr);
You may also need unescapeXml()
#Bohemian 's code is correct, It works for me, your un-encoded string is 10ðŸ˜ðŸ˜ðŸ˜‚😂😂😂😢😂10ðŸ˜ðŸ˜ðŸ˜‚😂😂😂😢😂😂.
Now, I'm adding another answer instead of commenting on Bohemian's answer because there are two things that still need to be mentioned:
I copy-pasted your string into HTML code and the browser can't render your characters properly, because your String is incorrectly encoded, i. e. the string has encoded the high surrogate and the low one for two-bytes-chars separately, instead of encoding the whole codepoint (it seems the original string is a UTF-16 encoded string, maybe a Java String?).
You want the string to be re-encoded to UTF-8.
Once you have your String unencoded by StringEscapeUtils.unescapeHtml(htmlStr) (which un-encodes your string successfully despite being encoded incorrectly), it doesn't have much sense talking about "string encodings" as java strings are "unaware" about encodings. (they use UTF-16 internally though).
If you need a group of bytes containing a UTF-8 encoded "string", you need to get the "raw" bytes from a String encoded as UTF-8:
String javaStr = StringEscapeUtils.unescapeHtml(htmlStr);
byte[] rawUft8String = javaStr.getBytes("UTF-8");
And do with such byte array whatever you need.
Now if what you need is to write a UTF-8 encoded string to a File, instead of that byte array you need to specify the encoding when you create the proper java.io.Writer.
Try this code to un-encode your string (change the file path first) and then open the resulting file in any editor that supports UTF-8:
java.io.Writer approach (better):
public static void main(String[] args) throws IOException {
String str = "10ðŸ˜ðŸ˜ðŸ˜‚😂😂😂😢😂10ðŸ˜ðŸ˜ðŸ˜‚😂😂😂😢😂😂";
String javaString = StringEscapeUtils.unescapeHtml(str);
try(Writer output = new OutputStreamWriter(
new FileOutputStream("/path/to/testing.txt"), "UTF-8")) {
output.write(javaString);
}
}
java.io.OutputStream approach (if you already have a "raw string"):
public static void main(String[] args) throws IOException {
String str = "10ðŸ˜ðŸ˜ðŸ˜‚😂😂😂😢😂10ðŸ˜ðŸ˜ðŸ˜‚😂😂😂😢😂😂";
String javaString = StringEscapeUtils.unescapeHtml(str);
try(OutputStream output = new FileOutputStream("/path/to/testing.txt")) {
for (byte b : javaString.getBytes(Charset.forName("UTF-8"))) {
output.write(b);
}
}
}
Is there a possibility to get the encoding of a existing .txt file? for example: you know a customer needs a specific encoding and you want to automize the process of .sql-data delivery. then you read out the endcoding from a client config and compare it to the current encoding of the file to be delivered. if they differ you change the encoding. could not find a solution till now. any help would be appreciated.
There is no explicit declaration of text encoding in files, but you can guess the encoding by analyzing specific byte sequences that are characteristic of a certain encoding.
Chardet does exactly that and tries to guess. If it can't say for sure what the encoding is, it will give you a list with confidence values (e.g. "90% this is utf8"). The project includes both a Python module and a command line tool. For a Java version, see JChardet.
My 2cents: if you just need a quick way to detect, the command line chardet tool is the way to go.
juniversalchardet is one of the best available API for detecting the encoding type. Please checkout this link. You can go through the list of encoding types supported by it
Working Example from the site
import org.mozilla.universalchardet.UniversalDetector;
public class TestDetector {
public static void main(String[] args) throws java.io.IOException {
byte[] buf = new byte[4096];
String fileName = args[0];
java.io.FileInputStream fis = new java.io.FileInputStream(fileName);
// (1)
UniversalDetector detector = new UniversalDetector(null);
// (2)
int nread;
while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
detector.handleData(buf, 0, nread);
}
// (3)
detector.dataEnd();
// (4)
String encoding = detector.getDetectedCharset();
if (encoding != null) {
System.out.println("Detected encoding = " + encoding);
} else {
System.out.println("No encoding detected.");
}
// (5)
detector.reset();
}
}
Hope this helps!
I want to add a help screen to my Codename One App.
As the text is longer as other strings, I would like put it in a separate file and add it to the app-package.
How do I do this? Where do I put the text file, and how can I easily read it in one go into a string?
(I already know how to put the string into a text area inside a form)
In the Codename One Designer go to the data section and add a file.
You can just add the text there and fetch it using myResFile.getData("name");.
You can also store the file within the src directory and get it using Display.getInstance().getResourceAsStream("/filename.txt");
I prefer to have the text file in the filesystem instead of the resource editor, because I can just edit the text with the IDE. The method getResourceAsStream is the first part of the solution. The second part is to load the text in one go. There was no support for this in J2ME, you needed to read, handle buffers etc. yourself. Fortunately there is a utility method in codename one. So my working method now looks like this:
final String HelpTextFile = "/helptext.txt";
...
InputStream in = Display.getInstance().getResourceAsStream(
Form.class, HelpTextFile);
if (in != null){
try {
text = com.codename1.io.Util.readToString(in);
in.close();
} catch (IOException ex) {
System.out.println(ex);
text = "Read Error";
}
}
The following code worked for me.
//Gets a file system storage instance
FileSystemStorage inst = FileSystemStorage.getInstance();
//Gets CN1 home`
final String homePath = inst.getAppHomePath();
final char sep = inst.getFileSystemSeparator();
// Getting input stream of the file
InputStream is = inst.openInputStream(homePath + sep + "MyText.txt");
// CN1 Util class, readInputStream() returns byte array
byte[] b = Util.readInputStream(is);
String myString = new String(b);
I need to write something into a text file's beginning. I have a text file with content and i want write something before this content. Say i have;
Good afternoon sir,how are you today?
I'm fine,how are you?
Thanks for asking,I'm great
After modifying,I want it to be like this:
Page 1-Scene 59
25.05.2011
Good afternoon sir,how are you today?
I'm fine,how are you?
Thanks for asking,I'm great
Just made up the content :) How can i modify a text file like this way?
You can't really modify it that way - file systems don't generally let you insert data in arbitrary locations - but you can:
Create a new file
Write the prefix to it
Copy the data from the old file to the new file
Move the old file to a backup location
Move the new file to the old file's location
Optionally delete the old backup file
Just in case it will be useful for someone here is full source code of method to prepend lines to a file using Apache Commons IO library. The code does not read whole file into memory, so will work on files of any size.
public static void prependPrefix(File input, String prefix) throws IOException {
LineIterator li = FileUtils.lineIterator(input);
File tempFile = File.createTempFile("prependPrefix", ".tmp");
BufferedWriter w = new BufferedWriter(new FileWriter(tempFile));
try {
w.write(prefix);
while (li.hasNext()) {
w.write(li.next());
w.write("\n");
}
} finally {
IOUtils.closeQuietly(w);
LineIterator.closeQuietly(li);
}
FileUtils.deleteQuietly(input);
FileUtils.moveFile(tempFile, input);
}
I think what you want is random access. Check out the related java tutorial. However, I don't believe you can just insert data at an arbitrary point in the file; If I recall correctly, you'd only overwrite the data. If you wanted to insert, you'd have to have your code
copy a block,
overwrite with your new stuff,
copy the next block,
overwrite with the previously copied block,
return to 3 until no more blocks
As #atk suggested, java.nio.channels.SeekableByteChannel is a good interface. But it is available from 1.7 only.
Update : If you have no issue using FileUtils then use
String fileString = FileUtils.readFileToString(file);
This isn't a direct answer to the question, but often files are accessed via InputStreams. If this is your use case, then you can chain input streams via SequenceInputStream to achieve the same result. E.g.
InputStream inputStream = new SequenceInputStream(new ByteArrayInputStream("my line\n".getBytes()), new FileInputStream(new File("myfile.txt")));
I will leave it here just in case anyone need
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
try (FileInputStream fileInputStream1 = new FileInputStream(fileName1);
FileInputStream fileInputStream2 = new FileInputStream(fileName2)) {
while (fileInputStream2.available() > 0) {
byteArrayOutputStream.write(fileInputStream2.read());
}
while (fileInputStream1.available() > 0) {
byteArrayOutputStream.write(fileInputStream1.read());
}
}
try (FileOutputStream fileOutputStream = new FileOutputStream(fileName1)) {
byteArrayOutputStream.writeTo(fileOutputStream);
}