I have no idea how to insert a boolean (check-mark) sign into an RTF document from a Java program. I am thinking of √ or ✓ and –. I tried inserting these signs into an empty document, saving it as *.rtf, and then opening it in Notepad++, but there is a lot of markup (~160 lines) and I cannot understand it. Do you have any idea?
After a short search I found this:
Writing unicode to rtf file
So a final code version would be:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public void writeToFile() {
    String strJapanese = "日本語✓";
    try {
        FileOutputStream fos = new FileOutputStream("test.rtf");
        // Write the text as UTF-8 rather than the platform default encoding.
        Writer out = new OutputStreamWriter(fos, "UTF8");
        out.write(strJapanese);
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Please read about RTF
√ or ✓ and – are not available in every charset, so specify one. I advise you to output in UTF-8 (check here on how to do this). You might need to encode the sign as well; check Wikipedia.
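Note that most RTF readers do not interpret the file as UTF-8; RTF's own way to embed a Unicode character is the \uN escape, where N is the decimal code point and the character right after it is an ASCII fallback. A minimal sketch of this (the file name is an assumption; ✓ is U+2713 = 10003, – is U+2013 = 8211):

import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

public static void writeRtfCheckMark() throws IOException {
    // \u10003? embeds U+2713 (✓) and \u8211? embeds U+2013 (–);
    // the trailing '?' is the fallback shown by readers that cannot render them.
    String rtf = "{\\rtf1\\ansi\\uc1 Done: \\u10003? and \\u8211? dash}";
    try (Writer out = new FileWriter("check.rtf")) { // hypothetical file name
        out.write(rtf); // the document is pure ASCII, so the default encoding is safe here
    }
}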
I don't know why this is so much harder than expected.
I'm trying to create an HTML file with Java, and it is not working. The code creates a file, but the contents are not what I inputted.
My simplified code is as follows:
File file = new File("text.html");
PrintWriter out = null;
try {
    out = new PrintWriter(file);
    out.write("<b>Hello World!</b>");
} catch (Exception e) { }
out.close();
Instead of the contents "Hello World!", the HTML file contains the escaped form "&lt;b&gt;Hello World!&lt;/b&gt;". When I open the file with TextWrangler, I see that Java has automatically escaped all my angle brackets into &lt; and &gt;, which breaks all the formatting and is NOT what I want.
How do I avoid this?
I have a set of PDF files that contain Central European characters such as č, Ď, Š, and so on. I want to convert them to text, and I have tried pdftotext and PDFBox through Apache Tika, but some of the characters are always converted incorrectly.
The strange thing is that the same character in the same text is correctly converted in some places and incorrectly in others! An example is this pdf.
In the case of pdftotext I am using these options:
pdftotext -nopgbrk -eol dos -enc UTF-8 070612.pdf
My Tika code looks like this:
String newname = f.getCanonicalPath().replace(".pdf", ".txt");
OutputStreamWriter print = new OutputStreamWriter(new FileOutputStream(newname), Charset.forName("UTF-16"));
String fileString = "path\\to\\myfiles\\"; // backslashes must be escaped in Java string literals
InputStream is = null;
try {
    is = new FileInputStream(f);
    ContentHandler contenthandler = new BodyContentHandler(10 * 1024 * 1024);
    Metadata metadata = new Metadata();
    PDFParser pdfparser = new PDFParser();
    pdfparser.parse(is, contenthandler, metadata, new ParseContext());
    String outputString = contenthandler.toString();
    outputString = outputString.replace("\n", "\r\n");
    System.err.println("Writing now file " + newname);
    print.write(outputString);
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (is != null) is.close();
    print.close();
}
Edit: Forgot to mention that I am facing the same issue when converting to text from Acrobat Reader XI, as well.
Well, aside from anything else, this code will use the platform default encoding:
PrintWriter print = new PrintWriter(newname);
print.print(outputString);
print.close();
I suggest you use an OutputStreamWriter wrapping a FileOutputStream instead, and specify UTF-8 as the encoding (as it can encode all of Unicode, and is generally well supported).
You should also close the writer in a finally block, and I'd probably separate the "reading" part from the "writing" part. (I'd avoid catching Exception too, but going into the details of exception handling is a bit beyond the point of this answer.)
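A minimal sketch of those suggestions (method and file names are placeholders; try-with-resources stands in for the explicit finally block):

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Writes text as UTF-8; the try-with-resources closes the writer even on
// failure, which has the same effect as closing in a finally block.
static void writeUtf8(String fileName, String content) throws IOException {
    try (Writer writer = new OutputStreamWriter(
            new FileOutputStream(fileName), StandardCharsets.UTF_8)) {
        writer.write(content);
    }
}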
How do I check whether a PDF file is password protected or not in Java?
I know of several tools/libraries that can do this, but I want to know if it is possible with just a plain Java program.
Update
As per mkl's comment below this answer, it seems that there are two types of PDF structures permitted by the specs: (1) cross-reference tables and (2) cross-reference streams. The following solution only addresses the first type of structure; this answer needs to be updated to address the second type.
====
All of the answers provided above refer to third-party libraries, which the OP is already aware of. The OP is asking for a native Java approach. My answer is yes, you can do it, but it will require a lot of work.
It will require a two step process:
Step 1: Figure out if the PDF is encrypted
As per Adobe's PDF 1.7 specs (pages 97 and 115), if the trailer dictionary contains the key "/Encrypt", the PDF is encrypted (the encryption could be simple password protection, RC4, AES, or some custom encryption). Here's a sample code:
Boolean isEncrypted = Boolean.FALSE;
try {
    byte[] byteArray = Files.readAllBytes(Paths.get("Resources/1.pdf"));
    // Convert the binary bytes to a String. Caution: this can result in loss
    // of data, but we are only interested in the String portion of the binary
    // PDF data, so we should be fine.
    String pdfContent = new String(byteArray);
    int lastTrailerIndex = pdfContent.lastIndexOf("trailer");
    if (lastTrailerIndex >= 0 && lastTrailerIndex < pdfContent.length()) {
        String newString = pdfContent.substring(lastTrailerIndex, pdfContent.length());
        int firstEOFIndex = newString.indexOf("%%EOF");
        String trailer = newString.substring(0, firstEOFIndex);
        if (trailer.contains("/Encrypt"))
            isEncrypted = Boolean.TRUE;
    }
} catch (Exception e) {
    System.out.println(e);
    // Do nothing
}
Step 2: Figure out the encryption type
This step is more complex. I don't have a code sample yet, but here is the algorithm (a rough illustrative sketch follows the list):
1. Read the value of the key "/Encrypt" from the trailer as read in step 1 above, e.g. the value 288 0 R.
2. Look for the bytes "288 0 obj". This is the location of the "encryption dictionary" object in the document. The object boundary ends at the string "endobj".
3. Look for the key "/Filter" in this object. The "Filter" is the one that identifies the document's security handler. If the value of "/Filter" is "/Standard", the document uses the built-in password-based security handler.
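A rough sketch of that lookup, assuming pdfContent is the String built in step 1 and the encryption dictionary appears as plain (uncompressed) text, which is not guaranteed for every PDF:

// Hypothetical helper for step 2; names are illustrative only.
static String findSecurityHandler(String pdfContent, String encryptValue) {
    // encryptValue looks like "288 0 R"; the object header is then "288 0 obj".
    String[] parts = encryptValue.trim().split("\\s+");
    String header = parts[0] + " " + parts[1] + " obj";
    int start = pdfContent.indexOf(header);
    if (start < 0) return null;                    // not present as plain text
    int end = pdfContent.indexOf("endobj", start); // object boundary
    if (end < 0) end = pdfContent.length();
    String dict = pdfContent.substring(start, end);
    if (dict.contains("/Filter") && dict.contains("/Standard")) {
        return "/Standard"; // built-in password-based security handler
    }
    return "other";         // custom or more advanced security handler
}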
If you just want to know whether the PDF is encrypted, without worrying about whether the encryption is an owner/user password or some advanced algorithm, you don't need step 2 above.
Hope this helps.
You can use PDFBox:
http://pdfbox.apache.org/
Code example:
PDDocument document = null;
try {
    document = PDDocument.load(yourPDFfile);
    if (document.isEncrypted()) {
        // IT'S ENCRYPTED!
    }
} finally {
    if (document != null) document.close();
}
Using Maven?
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0</version>
</dependency>
Using the iText PDF API, we can identify a password-protected PDF.
Example:
try {
    new PdfReader("C:\\Password_protected.pdf");
} catch (BadPasswordException e) {
    System.out.println("PDF is password protected..");
} catch (Exception e) {
    e.printStackTrace();
}
You can validate a PDF, i.e. check that it is readable and writable, by using iText.
The following is the code snippet:
boolean isValidPdf = false;
try {
    InputStream tempStream = new FileInputStream(new File("path/to/pdffile.pdf"));
    PdfReader reader = new PdfReader(tempStream);
    isValidPdf = reader.isOpenedWithFullPermissions();
} catch (Exception e) {
    isValidPdf = false;
}
The correct answer for how to do it in Java is per @vhs.
However, in any application, by far the simplest approach is to use the very lightweight pdfinfo tool to filter for the encryption status. Here, using the Windows cmd shell, I can instantly get a report showing that two different copies of the same file are encrypted:
>forfiles /m *.pdf /C "cmd /c echo #file &pdfinfo #file|find /i \"Encrypted\""
"Certificate (9).pdf"
Encrypted: no
"ds872 source form.pdf"
Encrypted: AES 128-bit
"ds872 filled form.pdf"
Encrypted: AES 128-bit
"How to extract data from a particular area in a PDF file - Stack Overflow.pdf"
Encrypted: no
"Test.pdf"
Encrypted: no
>
The solution:
1) Install PDF Parser: http://www.pdfparser.org/
2) Edit Parser.php in this section:
if (isset($xref['trailer']['encrypt'])) {
    echo('Your Allert message');
    exit();
}
3) In your .php form handler (e.g. upload.php), insert this:
First, require '...yourdir.../vendor/autoload.php';
then write this function:
function pdftest_is_encrypted($form) {
    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile($form);
}
and then call the function:
pdftest_is_encrypted($_FILES["upfile"]["tmp_name"]);
That's all. If you try to load a password-protected PDF, the system returns the error "Your Allert message" (the string set in Parser.php above).
I need to read a file (actually a .pdf) into a String in Java and then go from the String back to a file. Between those steps I'll apply some patches to the given string, but this is not important in this case.
I've developed the following JUnit test case:
String f1String=FileUtils.readFileToString(f1);
File temp=File.createTempFile("deleteme", "deleteme");
FileUtils.writeStringToFile(temp, f1String);
assertTrue(FileUtils.contentEquals(f1, temp));
This test converts a file to a string and writes it back. However, the test is failing.
I think it may be because of the encodings, but FileUtils does not offer much detailed info about this.
Can anyone help?
Thanks!
Added for further understanding:
Why do I need this?
I have very large PDFs on one machine that are replicated on another one. The first machine is in charge of creating those PDFs. Due to the second machine's low connectivity and the large size of the PDFs, I don't want to sync the whole PDFs, only the changes made.
To create and apply patches, I'm using the Google library DiffMatchPatch. This library creates patches between two strings. So I need to load a PDF into a string, apply a generated patch, and write it back to a file.
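Roughly, the flow I have in mind looks like this (a sketch assuming the google-diff-match-patch Java port, whose class is conventionally named diff_match_patch; oldText and newText are placeholders):

import java.util.LinkedList;

// Sketch only: patch_make/patch_toText/patch_fromText/patch_apply are the
// Java port's method names.
diff_match_patch dmp = new diff_match_patch();

// Machine 1: compute a patch between the old and new text.
LinkedList<diff_match_patch.Patch> patches = dmp.patch_make(oldText, newText);
String patchText = dmp.patch_toText(patches); // serialize for transfer

// Machine 2: apply the transferred patch to its copy of the old text.
LinkedList<diff_match_patch.Patch> incoming =
        new LinkedList<diff_match_patch.Patch>(dmp.patch_fromText(patchText));
String patchedText = (String) dmp.patch_apply(incoming, oldText)[0];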
A PDF is not a text file. Decoding (into Java characters) and re-encoding of binary files that are not encoded text is asymmetrical. For example, if the input bytestream is invalid for the current encoding, you can be assured that it won't re-encode correctly. In short - don't do that. Use readFileToByteArray and writeByteArrayToFile instead.
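A minimal sketch of that byte-based round trip, reusing f1 and the temp-file pattern from the failing test:

import java.io.File;
import org.apache.commons.io.FileUtils;

// Bytes in, bytes out: no charset decoding happens, so the copy is identical.
byte[] f1Bytes = FileUtils.readFileToByteArray(f1);
File temp = File.createTempFile("deleteme", "deleteme");
FileUtils.writeByteArrayToFile(temp, f1Bytes);
assertTrue(FileUtils.contentEquals(f1, temp)); // now passes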
Just a few thoughts:
There might actually be some BOM (byte order mark) bytes in one of the files that either get stripped when reading or added during writing. Is there a difference in file size (if it is the BOM, the difference should be 2 or 3 bytes)?
The line breaks might not match, depending on which system the files are created on, i.e. one might have CR LF while the other only has LF or CR. (1 byte difference per line break)
According to the JavaDoc, both methods should use the default encoding of the JVM, which should be the same for both operations. However, try and test with an explicitly set encoding (the JVM's default encoding can be queried using System.getProperty("file.encoding")).
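A quick diagnostic along these lines (file paths are placeholders):

import java.io.File;

// A 2-3 byte size difference suggests a BOM; roughly one byte per line
// suggests CR LF vs. LF line endings.
File original = new File("original.pdf");
File copy = new File("copy.pdf");
System.out.println("size difference = " + (original.length() - copy.length()));
System.out.println("JVM default encoding = " + System.getProperty("file.encoding"));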
Ed Staub's answer points out why my solution is not working, and he suggested using bytes instead of Strings. In my case I need a String, so the final working solution I've found is the following:
@Test
public void testFileRWAsArray() throws IOException {
    // Read bytes and widen each byte to a char, one by one.
    String f1String = "";
    byte[] bytes = FileUtils.readFileToByteArray(f1);
    for (byte b : bytes) {
        f1String = f1String + ((char) b);
    }
    File temp = File.createTempFile("deleteme", "deleteme");
    // Narrow each char back to a byte; the cast round-trips exactly.
    byte[] newBytes = new byte[f1String.length()];
    for (int i = 0; i < f1String.length(); ++i) {
        char c = f1String.charAt(i);
        newBytes[i] = (byte) c;
    }
    FileUtils.writeByteArrayToFile(temp, newBytes);
    assertTrue(FileUtils.contentEquals(f1, temp));
}
By casting between byte and char, I get symmetry in the conversion.
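For what it's worth, a simpler equivalent is possible because ISO-8859-1 maps every byte value to exactly one char, so decoding and re-encoding with it round-trips losslessly. A sketch, assuming a Commons IO version with the Charset overloads:

import java.nio.charset.StandardCharsets;
import org.apache.commons.io.FileUtils;

// Lossless byte <-> String round trip without the manual casting loops.
String f1String = new String(FileUtils.readFileToByteArray(f1), StandardCharsets.ISO_8859_1);
FileUtils.writeStringToFile(temp, f1String, StandardCharsets.ISO_8859_1);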
Thank you all!
Try this code...
public static String fetchBase64binaryEncodedString(String path) {
    File inboundDoc = new File(path);
    byte[] pdfData;
    try {
        pdfData = FileUtils.readFileToByteArray(inboundDoc);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    byte[] encodedPdfData = Base64.encodeBase64(pdfData);
    String attachment = new String(encodedPdfData);
    return attachment;
}
// How to decode it
public void testConversionPDFtoBase64() throws IOException {
    String path = "C:/Documents and Settings/kantab/Desktop/GTR_SDR/MSDOC.pdf";
    File origFile = new File(path);
    String encodedString = CreditOneMLParserUtil.fetchBase64binaryEncodedString(path);
    // Now decode it.
    byte[] decodeData = Base64.decodeBase64(encodedString.getBytes());
    String decodedString = new String(decodeData);
    // Or actually give the path to the PDF file.
    File decodedfile = File.createTempFile("DECODED", ".pdf");
    FileUtils.writeByteArrayToFile(decodedfile, decodeData);
    Assert.assertTrue(FileUtils.contentEquals(origFile, decodedfile));
    // Frame frame = new Frame("PDF Viewer");
    // frame.setLayout(new BorderLayout());
}
I need to write something at the beginning of a text file. I have a text file with content, and I want to write something before that content. Say I have:
Good afternoon sir,how are you today?
I'm fine,how are you?
Thanks for asking,I'm great
After modifying, I want it to be like this:
Page 1-Scene 59
25.05.2011
Good afternoon sir,how are you today?
I'm fine,how are you?
Thanks for asking,I'm great
I just made up the content :) How can I modify a text file this way?
You can't really modify it that way - file systems don't generally let you insert data in arbitrary locations - but you can:
Create a new file
Write the prefix to it
Copy the data from the old file to the new file
Move the old file to a backup location
Move the new file to the old file's location
Optionally delete the old backup file
Just in case it is useful for someone, here is the full source code of a method to prepend lines to a file using the Apache Commons IO library. The code does not read the whole file into memory, so it will work on files of any size.
public static void prependPrefix(File input, String prefix) throws IOException {
    LineIterator li = FileUtils.lineIterator(input);
    File tempFile = File.createTempFile("prependPrefix", ".tmp");
    BufferedWriter w = new BufferedWriter(new FileWriter(tempFile));
    try {
        w.write(prefix);
        while (li.hasNext()) {
            w.write(li.next());
            w.write("\n");
        }
    } finally {
        IOUtils.closeQuietly(w);
        LineIterator.closeQuietly(li);
    }
    FileUtils.deleteQuietly(input);
    FileUtils.moveFile(tempFile, input);
}
I think what you want is random access. Check out the related Java tutorial. However, I don't believe you can just insert data at an arbitrary point in the file; if I recall correctly, you'd only overwrite the data. If you wanted to insert, you'd have to have your code do the following (a simplified sketch follows the list):
1. Copy a block,
2. overwrite it with your new stuff,
3. copy the next block,
4. overwrite it with the previously copied block,
5. return to 3 until there are no more blocks.
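A simplified sketch of the idea, buffering the whole tail in memory instead of copying block by block (the method name and offset handling are illustrative):

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Inserts `data` at offset `pos` by rewriting everything after that point.
static void insertBytes(File file, long pos, byte[] data) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
        byte[] tail = new byte[(int) (raf.length() - pos)]; // assumes the tail fits in memory
        raf.seek(pos);
        raf.readFully(tail); // save everything after the insertion point
        raf.seek(pos);
        raf.write(data);     // overwrite with the new bytes
        raf.write(tail);     // then restore the old tail after them
    }
}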
As @atk suggested, java.nio.channels.SeekableByteChannel is a good interface. But it is only available from Java 1.7 onwards.
Update: If you have no issue using FileUtils, then use
String fileString = FileUtils.readFileToString(file);
This isn't a direct answer to the question, but often files are accessed via InputStreams. If this is your use case, then you can chain input streams via SequenceInputStream to achieve the same result. E.g.
InputStream inputStream = new SequenceInputStream(
        new ByteArrayInputStream("my line\n".getBytes()),
        new FileInputStream(new File("myfile.txt")));
I will leave this here just in case anyone needs it.
// Read fileName2 first, then fileName1, so fileName2's content ends up in front.
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
try (FileInputStream fileInputStream1 = new FileInputStream(fileName1);
     FileInputStream fileInputStream2 = new FileInputStream(fileName2)) {
    while (fileInputStream2.available() > 0) {
        byteArrayOutputStream.write(fileInputStream2.read());
    }
    while (fileInputStream1.available() > 0) {
        byteArrayOutputStream.write(fileInputStream1.read());
    }
}
// Overwrite fileName1 with the combined content.
try (FileOutputStream fileOutputStream = new FileOutputStream(fileName1)) {
    byteArrayOutputStream.writeTo(fileOutputStream);
}