Convert Base64 to PDF in Java: output file is damaged

I am able to create a PDF file, but when I try to open the output PDF I get the error "the file is damaged".
Here is my code; please help me.
String encodedBytes= "QmFzZTY0IGVuY29kaW5nIHNjaGVtZXMgYXJlIHVzZWQgd2hlbiBiaW5hcnkgZGF0YSBuZWVkcyB0byBiZSBzdG9yZWQgb3IgdHJhbnNmZXJyZWQgYXMgdGV4dHVhbCBkYXRh";
BASE64Decoder decoder = new BASE64Decoder();
byte[] decodedBytes = decoder.decodeBuffer(encodedBytes);
File file = new File("C:/Users/istest/Documents/test.pdf");
FileOutputStream fos = new FileOutputStream(file);
fos.write(decodedBytes);
fos.close();

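Your string is not a valid PDF file: it decodes to the plain text "Base64 encoding schemes are used when binary data needs to be stored or transferred as textual data".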
A PDF file must start with its proper magic number (please refer to the Format indicators section of this link).
PDF files start with "%PDF" (hex 25 50 44 46),
so a Base64-encoded PDF starts with JVBERi (the encoding of the "%PDF-" header).
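A quick way to tell whether decoded data is a PDF at all is to compare its first bytes with that magic number before writing the file. This is a minimal sketch assuming java.util.Base64 (Java 8+); the class and method names are illustrative, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PdfMagicCheck {
    // "%PDF" as bytes: 0x25 0x50 0x44 0x46
    private static final byte[] PDF_MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the decoded bytes begin with the PDF magic number.
    static boolean looksLikePdf(byte[] data) {
        if (data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Prefix of the string from the question; it decodes to "Base64 encoding"
        byte[] fromQuestion = Base64.getDecoder().decode("QmFzZTY0IGVuY29kaW5n");
        // "%PDF-1.4\n" encoded, i.e. the typical start of a real PDF
        byte[] realPdfStart = Base64.getDecoder().decode("JVBERi0xLjQK");

        System.out.println("question string is a PDF: " + looksLikePdf(fromQuestion));
        System.out.println("JVBERi... is a PDF: " + looksLikePdf(realPdfStart));
    }
}
```

Checking the header like this turns a "file is damaged" error in the PDF viewer into an explicit failure at decode time.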
If you try your code with a valid Base64-encoded PDF string like this one, it might work.
But because you did not provide the code of your BASE64Decoder class, it is hard to be sure that it will work.
For that reason, here is a simple example using the java.util.Base64 class. (Warning: do not copy/paste and run this example before replacing the Base64 string in it with the correct one from the previous link. As noted in the comment below, the correct string was replaced by a truncated, corrupted one to keep the example short.)
import java.io.File;
import java.io.FileOutputStream;
import java.util.Base64;
class Base64DecodePdf {
public static void main(String[] args) {
File file = new File("./test.pdf");
try ( FileOutputStream fos = new FileOutputStream(file); ) {
// To be short I use a corrupted PDF string, so make sure to use a valid one if you want to preview the PDF file
String b64 = "JVBERi0xLjUKJYCBgoMKMSAwIG9iago8PC9GaWx0ZXIvRmxhdGVEZWNvZGUvRmlyc3QgMTQxL04gMjAvTGVuZ3==";
byte[] decoder = Base64.getDecoder().decode(b64);
fos.write(decoder);
System.out.println("PDF File Saved");
} catch (Exception e) {
e.printStackTrace();
}
}
}
Credit: source.

Related

Zip Archives get corrupted when uploading to Azure Blob Store using REST API

I have been banging my head against the wall with this one. Uploading text files is fine, but when I upload a zip archive to my blob store, it gets corrupted and cannot be opened once downloaded.
Doing a hex compare (image below) of the original file versus the file that has been through Azure shows that some subtle replacements have happened, but I cannot find the source of the change/corruption.
I have tried forcing UTF-8, ASCII and UTF-16 (UTF-8 is probably correct), but none of them resolved the issue.
I have also tried different HTTP libraries but got the same result.
The deployment environment forces the use of Unirest, so I cannot use the Microsoft API (which seems to work fine).
package blobQuickstart.blobAzureApp;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Base64;
import org.junit.Test;
import kong.unirest.HttpResponse;
import kong.unirest.Unirest;
public class StackOverflowExample {
@Test
public void uploadSmallZip() throws Exception {
File testFile = new File("src/test/resources/zip/simple.zip");
String blobStore = "secretstore";
UploadedFile testUploadedFile = new UploadedFile();
testUploadedFile.setName(testFile.getName());
testUploadedFile.setFile(testFile);
String contentType = "application/zip";
String body = readFileContent(testFile);
String url = "https://" + blobStore + ".blob.core.windows.net/naratest/" + testFile.getName() + "?sv=2020-02-10&ss=b&srt=o&sp=c&se=2021-09-07T20%3A10%3A50Z&st=2021-09-07T18%3A10%3A50Z&spr=https&sig=xvQTkCQcfMTwWSP5gXeTB5vHlCh2oZXvmvL3kaXRWQg%3D";
HttpResponse<String> response = Unirest.put(url)
.header("x-ms-blob-type", "BlockBlob").header("Content-Type", contentType)
.body(body).asString();
if (!response.isSuccess()) {
System.out.println(response.getBody());
throw new Exception("Failed to Upload File! Unexpected response code: " + response.getStatus());
}
}
private static String readFileContent(File file) throws Exception {
InputStream is = new FileInputStream(file);
ByteArrayOutputStream answer = new ByteArrayOutputStream();
byte[] byteBuffer = new byte[8192];
int nbByteRead;
while ((nbByteRead = is.read(byteBuffer)) != -1)
{
answer.write(byteBuffer, 0, nbByteRead);
}
is.close();
byte[] fileContents = answer.toByteArray();
String s = Base64.getEncoder().encodeToString(fileContents);
byte[] resultBytes = Base64.getDecoder().decode(s);
String encodedContents = new String(resultBytes);
return encodedContents;
}
}
Please help!
byte[] resultBytes = Base64.getDecoder().decode(s);
String encodedContents = new String(resultBytes);
You are creating a String from a byte array containing binary data. A String is meant for text: bytes that are not valid in the charset used for decoding get replaced, so the binary content is corrupted. You also perform pointless encoding/decoding round trips that only take more memory.
If the content is in ZIP format, it is binary, so just return the byte array. Alternatively, you can Base64-encode the content, but then you should return the encoded content. One weakness remains: you are doing it all in memory, which limits the size of the content you can handle.
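To make the difference concrete, here is a minimal JDK-only sketch (the class and method names are hypothetical). It reads the file as raw bytes, which is what the answer suggests returning, and contrasts that with a bytes -> String -> bytes round trip like the one readFileContent ends up doing. UTF-8 stands in for the platform default charset, which is an assumption; the exact damage depends on the default charset in effect.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ZipBodyBytes {
    // Byte-safe replacement for readFileContent: return the raw bytes
    // instead of squeezing them through a String.
    static byte[] readFileContent(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // What the original helper effectively does after its Base64 round trip:
    // bytes -> String -> bytes. UTF-8 is assumed as the default charset here.
    static byte[] throughString(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // ZIP local-file-header signature "PK\3\4" followed by a few
        // non-ASCII bytes, as typically found in compressed data.
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF, (byte) 0x9C, (byte) 0xE2};

        Path tmp = Files.createTempFile("simple", ".zip");
        Files.write(tmp, zipLike);

        byte[] viaBytes = readFileContent(tmp);
        byte[] viaString = throughString(viaBytes);
        System.out.println("bytes preserved:  " + Arrays.equals(zipLike, viaBytes));
        System.out.println("string preserved: " + Arrays.equals(zipLike, viaString));

        Files.delete(tmp);
    }
}
```

The resulting byte[] can then be handed to Unirest's body(byte[]) overload, as described in the linked Unirest issue, instead of a String.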
By default, Unirest's file handlers force a multipart body, which Azure does not support.
A Byte Array can be provided directly as per this: https://github.com/Kong/unirest-java/issues/248
Unirest.put("http://somewhere")
.body("abc".getBytes())
.asString();

How to properly open a png file

I am trying to attach a PNG file. Currently, when I send the email, the attachment is about twice as large as the file should be, and it is not a valid PNG file. Here is the code I currently have:
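import com.sendgrid.*;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.codec.binary.Base64;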
Attachments attachments = new Attachments();
String filePath = "/Users/david/Desktop/screenshot5.png";
String data = "";
try {
data = new String(Files.readAllBytes(Paths.get(filePath)));
} catch (IOException e) {
}
byte[] encoded = Base64.encodeBase64(data.getBytes());
String encodedString = new String(encoded);
attachments.setContent(encodedString);
Perhaps I am encoding the data incorrectly? What would be the correct way to 'get' the data to attach it?
With respect, this is why Python presents a problem to modern developers. It abstracts away important concepts that you can't fully understand in interpreted languages.
First, and this is a relatively basic concept, but you can't convert arbitrary byte sequences to a string and hope it works out. The following line is your first problem:
data = new String(Files.readAllBytes(Paths.get(filePath)));
EDIT: It looks like the library you are using expects the file to be base64 encoded. I have no idea why. Try changing your code to this:
Attachments attachments = new Attachments();
String filePath = "/Users/david/Desktop/screenshot5.png";
try {
byte[] encoded = Base64.encodeBase64(Files.readAllBytes(Paths.get(filePath)));
String encodedString = new String(encoded);
attachments.setContent(encodedString);
} catch (IOException e) {
e.printStackTrace(); // don't silently ignore a failed read
}
The only issue you were having is that you were trying to represent arbitrary bytes as a string.
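The inflated size follows from the same mistake. Here is a minimal sketch, assuming java.util.Base64 in place of the Commons Codec Base64 used in the question and a UTF-8 default charset: each byte that is not valid UTF-8 becomes a three-byte U+FFFD replacement character before Base64 encoding, so the encoded string grows and no longer decodes to the original file. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class AttachmentEncoding {
    // Correct: Base64-encode the raw file bytes directly.
    static String encodeCorrectly(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    // Broken: mirrors the question, which turns the bytes into a String first
    // (UTF-8 assumed as the default charset).
    static String encodeViaString(byte[] fileBytes) {
        String data = new String(fileBytes, StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(data.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The 8-byte PNG file signature; 0x89 is not valid UTF-8 on its own
        byte[] pngSignature = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

        String good = encodeCorrectly(pngSignature);
        String bad = encodeViaString(pngSignature);
        System.out.println("correct:    " + good);
        System.out.println("via String: " + bad);

        byte[] decoded = Base64.getDecoder().decode(good);
        System.out.println("round trip ok: " + Arrays.equals(pngSignature, decoded));
    }
}
```

Only the first variant decodes back to the original PNG bytes; the second is longer and carries replacement characters instead of the original data.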
Take a look at the Builder class in the repository here. Example:
FileInputStream fileContent = new FileInputStream(filePath);
Attachments.Builder builder = new Attachments.Builder(fileName, fileContent);
mail.addAttachments(builder.build());

Decode base64 PDF and write to file in JMeter

I am trying to decode a pdf from a response and write it to a file.
The file gets created and appears to be the correct file size, but when I go to open it, I get an error that says, "There was an error opening this document. The file is damaged and could not be repaired."
I am using the code from this post to decode and create the file.
I stored the Base64-encoded file returned from the API in the variable documentText, which I read with vars.get("documentText").
Here is how my BeanShell PostProcessor code looks:
import org.apache.commons.io.FileUtils;
import org.apache.commons.codec.binary.Base64;
String Createresponse= vars.get("documentText");
vars.put("response",new String(Base64.decodeBase64(Createresponse.getBytes("UTF-8"))));
Output = vars.get("response");
f = new FileOutputStream("C:\\Users\\user\\Desktop\\Test.pdf");
p = new PrintStream(f);
this.interpreter.setOut(p);
print(Output);
f.close();
Am I doing something incorrectly?
I have also done the following, but get the same result:
byte[] data = Base64.decodeBase64(vars.get("documentText"));
FileOutputStream out = new FileOutputStream("C:\\Users\\user\\Desktop\\Test.pdf");
out.write(data);
out.close();
EDIT:
The entire PDF in the response looks like the following (these are just the first 5 of approx. 7,548 lines, but they are all similar):
JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7/KQovQ3JlYXRvciAo/v8pCi9Qcm9kdWNlciAo
/v8AUQB0ACAANQAuADUALgAxKQovQ3JlYXRpb25EYXRlIChEOjIwMTcwMzI3MTgwNTEzKQo+Pgpl
bmRvYmoKMiAwIG9iago8PAovVHlwZSAvQ2F0YWxvZwovUGFnZXMgMyAwIFIKPj4KZW5kb2JqCjQg
MCBvYmoKPDwKL1R5cGUgL0V4dEdTdGF0ZQovU0EgdHJ1ZQovU00gMC4wMgovY2EgMS4wCi9DQSAx
LjAKL0FJUyBmYWxzZQovU01hc2sgL05vbmU+PgplbmRvYmoKNSAwIG9iagpbL1BhdHRlcm4gL0Rl
I'm assuming the &#xd; in the response is what is causing the issue? Is there a way to convert the response to a single string that can be decoded?
EDIT 2:
So the &#xd; in the response is definitely my problem. I looked up the hex code character and it translates to a carriage return. If I manually copy the response from within JMeter, paste it into Notepad++, remove the &#xd; and then decode it manually, the PDF opens as it should.
I tried modifying my BeanShell script to remove the carriage return and then decode the response, but it still isn't fully functional: the PDF now opens, but it shows only blank white pages. Here is my updated code:
String Createresponse= vars.get("documentText");
String b64 = Createresponse.replace("&#xd;","");
vars.put("response",new String(Base64.decodeBase64(b64)));
Output = vars.get("response");
f = new FileOutputStream("C:\\Users\\user\\Desktop\\Test.pdf");
p = new PrintStream(f);
this.interpreter.setOut(p);
print(Output);
f.close();
This works for me. Your input data is wrong.
package com.test;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;
import org.junit.Test;
public class TestBase64 {
String data =
"JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7/KQovQ3JlYXRvciAo/v8pCi9Qcm9kdWNlciAo/v8AUQB0ACAANQAuADUALgAxKQovQ3JlYXRpb25EYXRlIChEOjIwMTcwMzI3MTgwNTEzKQo+Pgpl";
@Test
public void decodeBase64()
{
byte[] localData = Base64.getDecoder().decode(data);
try (FileOutputStream out = new FileOutputStream("/testout64.dat"))
{
out.write(localData);
out.close();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
This results in
%PDF-1.4
1 0 obj
<<
/Title (þÿ)
/Creator (þÿ)
/Producer (þÿ Q t 5 . 5 . 1)
/CreationDate (D:20170327180513)
>>
e
and appears to be a valid PDF.
What is the &#xd; part? It seems to be some custom formatting characters.
I basically had the answer to my question: the Base64-encoded response I was trying to decode was multi-line and included the carriage-return hex code (&#xd;).
My solution was to remove the carriage-return hex code from the response, condense it into a single string of Base64-encoded text, and then write the file out.
import org.apache.commons.io.FileUtils;
import org.apache.commons.codec.binary.Base64;
String response = vars.get("documentText");
String encodedFile = response.replace("&#xd;","").replaceAll("[\n]+","");
// Decode the response
vars.put("decodedFile",new String(Base64.decodeBase64(encodedFile)));
// Write out the decoded file
Output = vars.get("decodedFile");
file = new FileOutputStream("C:\\Users\\user\\Desktop\\decodedFile.pdf");
p = new PrintStream(file);
this.interpreter.setOut(p);
print(Output);
p.flush();
file.close();
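For comparison, here is a byte-safe sketch in plain Java (class and method names are hypothetical; it assumes java.util.Base64, Java 8+, which a JSR223 or BeanShell script can also call). It strips the &#xd; references, lets the MIME decoder skip the remaining line breaks, and writes the decoded bytes directly, avoiding the new String(...)/print(...) step, which can alter binary data depending on the default charset.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

public class DecodeMultilinePdf {
    // Turns the multi-line API response into the raw PDF bytes.
    static byte[] decodeResponse(String response) {
        // "&#xd;" has to be removed explicitly: 'x' and 'd' are valid Base64
        // characters, so a decoder would treat them as data.
        String cleaned = response.replace("&#xd;", "");
        // The MIME decoder ignores characters outside the Base64 alphabet,
        // including the remaining '\r' and '\n' line separators.
        return Base64.getMimeDecoder().decode(cleaned);
    }

    // Writes the bytes as-is: no String conversion, no PrintStream.
    static void writePdf(String response, Path target) throws IOException {
        Files.write(target, decodeResponse(response));
    }

    public static void main(String[] args) throws IOException {
        // Two short Base64 lines in the same shape as the JMeter response
        String response = "JVBERi0xLjQK&#xd;\nMSAwIG9iago=";
        Path target = Paths.get("decodedFile.pdf");
        writePdf(response, target);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```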

Write Base64-encoded image to file

How to write a Base64-encoded image to file?
I have encoded an image to a string using Base64.
First, I read the file, then convert it to a byte array and then apply Base64 encoding to convert the image to a string.
Now my problem is how to decode it.
byte dearr[] = Base64.decodeBase64(crntImage);
File outF = new File("c:/decode/abc.bmp");
BufferedImage img02 = ImageIO.write(img02, "bmp", outF);
The variable crntImage contains the string representation of the image.
Assuming the image data is already in the format you want, you don't need ImageIO at all - you just need to write the data to the file:
// Note preferred way of declaring an array variable
byte[] data = Base64.decodeBase64(crntImage);
try (OutputStream stream = new FileOutputStream("c:/decode/abc.bmp")) {
stream.write(data);
}
(I'm assuming you're using Java 7 here - if not, you'll need to write a manual try/finally statement to close the stream.)
If the image data isn't in the format you want, you'll need to give more details.
With Java 8's Base64 API
byte[] decodedImg = Base64.getDecoder()
.decode(encodedImg.getBytes(StandardCharsets.UTF_8));
Path destinationFile = Paths.get("/path/to/imageDir", "myImage.jpg");
Files.write(destinationFile, decodedImg);
If your encoded image starts with something like data:image/png;base64,iVBORw0..., you'll have to remove the data:image/png;base64, prefix first. See this answer for an easy way to do that.
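One way to strip that prefix before decoding is sketched below. It assumes java.util.Base64 and that, for a string starting with "data:", everything up to the first comma is the prefix; the helper name is hypothetical.

```java
import java.util.Base64;

public class DataUriDecode {
    // Decodes a Base64 image, dropping an optional "data:<mime>;base64," prefix.
    static byte[] decodeImage(String encoded) {
        String payload = encoded;
        if (encoded.startsWith("data:")) {
            int comma = encoded.indexOf(',');
            if (comma < 0) {
                throw new IllegalArgumentException("data URI without a comma");
            }
            payload = encoded.substring(comma + 1);
        }
        return Base64.getDecoder().decode(payload);
    }

    public static void main(String[] args) {
        // The 8-byte PNG signature, with and without the data URI prefix
        byte[] withPrefix = decodeImage("data:image/png;base64,iVBORw0KGgo=");
        byte[] withoutPrefix = decodeImage("iVBORw0KGgo=");
        System.out.println(withPrefix.length + " " + withoutPrefix.length);
    }
}
```

The returned bytes can then be written with Files.write exactly as in the Java 8 snippet above.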
No need to use BufferedImage, as you already have the image file in a byte array
byte dearr[] = Base64.decodeBase64(crntImage);
FileOutputStream fos = new FileOutputStream(new File("c:/decode/abc.bmp"));
fos.write(dearr);
fos.close();
Just to make it clear, this answer uses the java.util.Base64 class, without any third-party libraries:
import java.util.Base64;
...
String crntImage = "<a valid base 64 string>"; // placeholder
byte[] data = Base64.getDecoder().decode(crntImage);
try( OutputStream stream = new FileOutputStream("d:/temp/abc.pdf") )
{
stream.write(data);
}
catch (Exception e)
{
System.err.println("Couldn't write to file...");
}
Other option using apache-commons:
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.io.FileUtils;
...
File file = new File( "path" );
byte[] bytes = Base64.decodeBase64( "base64" );
FileUtils.writeByteArrayToFile( file, bytes );
Try this:
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.net.URL;
import javax.imageio.ImageIO;
public class WriteImage
{
public static void main( String[] args )
{
BufferedImage image = null;
try {
URL url = new URL("URL_IMAGE");
image = ImageIO.read(url);
ImageIO.write(image, "jpg",new File("C:\\out.jpg"));
ImageIO.write(image, "gif",new File("C:\\out.gif"));
ImageIO.write(image, "png",new File("C:\\out.png"));
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("Done");
}
}

Cannot convert Gujarati PDF(unicode) to text

I am trying to read a PDF file of the Gujarat electoral roll (sample file). I need to extract all the information in a structured format. I am using PDFBox from Apache to extract text from the PDF file.
The problem that I am facing is that certain characters are getting lost in the conversion and there is a lot of noise in the converted text. Please find the converted file here.
The code
import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.util.*;
public class Main {
public static void main(String[] args){
PDDocument pd;
BufferedWriter wr;
try {
File input = new File("myPDF_manual.pdf");
File output = new File("newPaperTestFile.txt"); // The text file where you are going to store the extracted data
pd = PDDocument.load(input);
PDFTextStripper stripper = new PDFTextStripper();
wr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(output)));
stripper.writeText(pd, wr);
if (pd != null) {
pd.close();
wr.close();
System.out.println(" file processed.");
}
} catch (Exception e){
e.printStackTrace();
}
}
}
I also tried the code using getText() method of PDFTextStripper class but the result is same.
I also tried converting the PDF to XML using the pdftohtml command-line utility on Linux, but there too some of the information is lost. The XML file can be found here.
Please suggest me any solution to solve this problem. Solution doesn't need to be Java specific.
