How can I replace words in a PDF file in Java?


I tried code like this, but it is not replacing the words. Is this the correct way to replace words in a PDF file?
@SpringBootApplication
public class DocReadWriteApplication {
    public static final String SRC = "../Downloads/Debt LOI.pdf";
    public static final String DEST = "../Downloads/hello.pdf";
    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        manipulatePdf(SRC, DEST);
    }
    public static void manipulatePdf(String src, String dest) throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfDictionary dict = reader.getPageN(1);
        PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
        PdfArray refs = null;
        if (dict.get(PdfName.CONTENTS).isArray()) {
            refs = dict.getAsArray(PdfName.CONTENTS);
        } else if (dict.get(PdfName.CONTENTS).isIndirect()) {
            refs = new PdfArray(dict.get(PdfName.CONTENTS));
        }
        for (int i = 0; i < refs.getArrayList().size(); i++) {
            PRStream stream = (PRStream) refs.getDirectObject(i);
            byte[] data = PdfReader.getStreamBytes(stream);
            stream.setData(new String(data).replace("transaction", "Data").getBytes());
        }
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        stamper.close();
        reader.close();
    }
}
Has anybody done something like this?
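One detail in the loop above can be checked without iText: new String(data) and getBytes() both use the platform default charset, so bytes in the content stream that are not valid in that charset may be altered on the way back. Below is a minimal sketch, using only the JDK, of a byte-preserving round trip through ISO-8859-1. The class and method names (StreamBytesRoundTrip, replaceInBytes) are made up for illustration. It does not help when the word is not stored literally in the stream, for example when the PDF writes the text in fragments or as glyph codes, which may be the reason nothing gets replaced.

```java
import java.nio.charset.StandardCharsets;

public class StreamBytesRoundTrip {
    // Hypothetical helper: replaces a literal word in raw stream bytes.
    // ISO-8859-1 maps every byte value 0..255 to exactly one char and back,
    // so bytes that are not part of the replaced word survive unchanged.
    public static byte[] replaceInBytes(byte[] data, String from, String to) {
        String text = new String(data, StandardCharsets.ISO_8859_1);
        return text.replace(from, to).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // A made-up content stream fragment followed by two non-ASCII bytes.
        byte[] data = {'(', 't', 'r', 'a', 'n', 's', 'a', 'c', 't', 'i', 'o', 'n', ')',
                ' ', 'T', 'j', (byte) 0x93, (byte) 0xFF};
        byte[] out = replaceInBytes(data, "transaction", "Data");
        System.out.println("length before: " + data.length + ", after: " + out.length);
    }
}
```

In the question's loop this would correspond to passing the result of replaceInBytes(data, "transaction", "Data") to stream.setData(...), assuming the word really appears as plain text in the stream.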

Related

Word found unreadable content in .docx after replacing content through docx4j

I am getting the error "Word found unreadable content" in a .docx file after replacing content through docx4j.
Please find the code snippet below.
I am using the docx4j-6.1.2 jar.
public class Testt {
public static void main(String[] args) throws Exception {
final String TEMPLATE_NAME = "D://fileuploadtemp//123.docx";
InputStream templateInputStream = new FileInputStream(TEMPLATE_NAME);
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
String xpath = "//w:r[w:t[contains(text(),'TEST')]]";
List<Object> list = documentPart.getJAXBNodesViaXPath(xpath, true);
for (Object obj : list) {
org.docx4j.wml.ObjectFactory factory = new org.docx4j.wml.ObjectFactory();
org.docx4j.wml.Text t = factory.createText();
t.setValue("\r\n");
((R) obj).getContent().clear();
((R) obj).getContent().add(t);
}
OutputStream os = new FileOutputStream(new File("D://fileuploadtemp//1234.docx"));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
wordMLPackage.save(outputStream);
outputStream.writeTo(os);
os.close();
outputStream.close();
templateInputStream.close();
}
}

Adding UNICODE emoticon on pdf with itext

I have a problem adding a Unicode emoticon to a PDF created with iText. I tried the following with itextpdf core 5.5.13:
public class MathSymbols {
public static final String DEST = "EXAMPLE.pdf";
public static final String FONT = "/res/fonts/arialuni.ttf";
public static String TEXT ;
public static void main(String[] args) throws IOException, DocumentException {
File file = new File(DEST);
TEXT = "this "+Character.toChars(0x1F600)+" string \uD83D\uDE00 contains \ud83d\ude00 special \u2609 characters like this \u2208, \u2229, \u2211, \u222b, \u2206";
new MathSymbols().createPdf(DEST);
}
public void createPdf(String dest) throws IOException, DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();
BaseFont bf = BaseFont.createFont(FONT, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
Font f = new Font(bf,12);
Paragraph p = new Paragraph(TEXT, f);
document.add(p);
document.close();
}}
I have the same problem on itextpdf 7.x with this snippet:
public class Main {
public static final String DEST = "example.pdf";
public static void main(String args[]) throws IOException {
File file = new File(DEST);
new Main().createPdf(DEST);
}
public void createPdf(String dest) throws IOException {
PdfWriter writer = new PdfWriter(dest);
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
PdfFont f = PdfFontFactory.createFont("/resources/arialuni.ttf", PdfEncodings.IDENTITY_H,true);
Paragraph p = new Paragraph("H\u2082SO\u2074 1 \uD83D\uDE00 contains \ud83d\ude00 spe \u2702 cial \u2609 characters like this \u2208, \u2229, \u2211, \u222b, \u2206").setFont(f).setFontSize(10);
document.add(p);
document.close();
}}
I tried different fonts and different approaches in Java.
But I only get a white space, or a square, in the PDF and no emoticon.
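A side note on the iText 5 snippet, separate from the font question: the expression "this " + Character.toChars(0x1F600) concatenates a char[] with a String, so Java appends the array's default toString() text rather than the emoji. A minimal sketch of building the string from a code point instead is shown here; the class and method names are made up for illustration. Whether the emoji then renders still depends on the chosen font containing a glyph for U+1F600, which this sketch does not address.

```java
public class CodePointToString {
    // Hypothetical helper: turns a Unicode code point into a String.
    // Character.toChars returns one char for BMP code points and a
    // surrogate pair (two chars) for supplementary ones such as U+1F600.
    public static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String emoji = fromCodePoint(0x1F600);
        String text = "this " + emoji + " string";
        // The emoji is one code point but two Java chars.
        System.out.println("chars: " + text.length()
                + ", code points: " + text.codePointCount(0, text.length()));
        System.out.println(emoji.equals("\uD83D\uDE00"));
    }
}
```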

Extracting font color along with font type from PDF using PDFBox

I need to extract the font color as well as the font type (e.g. Black, Tahoma, Bold) from a PDF in Java using PDFBox. Below is the code I have written to extract the font type and embed it in the extracted text.
public class PDFParse {
public static void main(String args[]) {
PDFTextStripper pdfStripper = null;
PDDocument pdDoc = null;
COSDocument cosDoc = null;
File file = new File("Sample Bill.pdf");
try {
PDFParser parser = new PDFParser(new FileInputStream(file));
parser.parse();
cosDoc = parser.getDocument();
pdfStripper = new PDFTextStripper() {
String prevBaseFont = "";
protected void writeString(String text, List<TextPosition> textPositions) throws IOException
{
StringBuilder builder = new StringBuilder();
for (TextPosition position : textPositions)
{
String baseFont = position.getFont().getBaseFont();
if (baseFont != null && !baseFont.equals(prevBaseFont))
{
builder.append('[').append(baseFont).append(']');
prevBaseFont = baseFont;
}
builder.append(position.getCharacter());
}
writeString(builder.toString());
}
};
pdDoc = new PDDocument(cosDoc);
pdfStripper.setStartPage(1);
pdfStripper.setEndPage(5);
pdfStripper.setSortByPosition(true);
String parsedText = pdfStripper.getText(pdDoc);
PrintWriter out = new PrintWriter("sample.txt");
out.println(parsedText);
out.close();
System.out.println(parsedText);
} catch (IOException e) {
e.printStackTrace();
}
}
}
How can I extract the font color for each word and embed it in the same extracted text? Thanks :)

How to print tagged values from a file?

I had to write code to identify the language of tweets and print out the tweets of a certain language. I have written the language identification part, but I cannot work out how to print only the relevant lines.
Here is the code:
import java.io.*;
import java.util.*;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.SMO;
import weka.classifiers.trees.RandomForest;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
public class Lang_Detect
{
public static weka.classifiers.Classifier c;
public static HashMap<String,String> trigram=new HashMap<String,String>();
public static void initiate() throws Exception
{
c = loadModel("C:\\Users\\DIV\\ff\\Maithili\\nb.model"); // loads nb model
}
public static NaiveBayes loadModel(String path) throws Exception
{
NaiveBayes classifier;
FileInputStream fis = new FileInputStream(path);
ObjectInputStream ois = new ObjectInputStream(fis);
classifier = (NaiveBayes) ois.readObject();
ois.close();
return classifier;
}
public static void read_trigram()
{
try
{
FileInputStream fis = new FileInputStream("C:\\Users\\DIV\\ff\\Maithili\\Trigram.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fis,"UTF-8"));
String line;
while((line = br.readLine())!=null)
{
String words[]=line.split(":");
trigram.put(words[0].trim(), "");
}
fis.close();
}catch(IOException f){}
}
public static String feature_vector(String line)
{
String vector="";
String words[]=line.split(" ");
HashMap<String,String> local_word=new HashMap<String,String>();
for(int i=0;i<words.length;i++)
{
char ch[]=words[i].toCharArray();
for(int j=0;j<ch.length-2;j++)
{
local_word.put(ch[j]+""+ch[j+1]+""+ch[j+2], "");
}
}
for (Map.Entry<String, String> entry : trigram.entrySet())
{
if(local_word.containsKey(entry.getKey()))
{
vector+="1,";
}
else
{
vector+="0,";
}
}
return vector;
}
public static String lang_tag(String file) throws Exception
{
String tagged_sentence="";
int l=0,cntr=0;;
//String words[]=sentence.toLowerCase().split(" ");
StringBuffer str=new StringBuffer();
read_trigram();
// TODO Auto-generated method stub
int count=1;
str.append("@relation Language\n");
for (Map.Entry<String, String> entry : trigram.entrySet())
{
str.append("@attribute Trigram"+count+" numeric\n");
count++;
}
str.append("@attribute class {HN,NP,MT}\n");
str.append("@DATA\n");
try
{
FileInputStream fis = new FileInputStream(file);
BufferedReader br = new BufferedReader(new InputStreamReader(fis,"UTF-8"));
String line;
while((line = br.readLine())!=null)
{
str.append(feature_vector(line)+"?\n");
}
fis.close();
}catch(IOException f){}
Global.file_update("C:\\Users\\DIV\\ff\\Maithili\\HN_NP_MT_Unlabelled.arff", str.toString());
Instances unlabeled = new Instances(
new BufferedReader(
new FileReader("HN_NP_MT_Unlabelled.arff")));
// set class attribute
unlabeled.setClassIndex(unlabeled.numAttributes() - 1);
Instances labeled = new Instances(unlabeled);
// label instances
for (int i = 0; i < unlabeled.numInstances(); i++)
{
double clsLabel = c.classifyInstance(unlabeled.instance(i));
String tag="";
if(clsLabel==0.0)
tag="HN";
else if(clsLabel==1.0)
tag="NP";
else if(clsLabel==2.0)
{
tag="MT";
Global.file_append("C:\\Users\\DIV\\ff\\Maithili\\Detected_Maithili_Tweets.txt", tag);
}
System.out.println(tag);
}
return tagged_sentence.trim();
}
public static void main(String[] args) throws Exception
{
initiate();
lang_tag("C:\\Users\\DIV\\ff\\Maithili\\tweets.txt");
}
}
As you can see in lang_tag(), I want to print the lines that are tagged as MT, but I do not have those lines available in any variable.
Can someone help me?
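One possible approach, sketched with only the JDK: keep every input line in a list while building the ARFF data, so that instance i can be mapped back to line i after classification. The sketch assumes the instances are in the same order as the lines read from the file, which is how lang_tag() builds them. The classifier parameter is a hypothetical stand-in for c.classifyInstance(unlabeled.instance(i)), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntToDoubleFunction;

public class TaggedLines {
    // Returns the lines whose predicted class index equals wantedIndex.
    // classifier.applyAsDouble(i) stands in for classifying instance i.
    public static List<String> linesWithLabel(List<String> lines,
                                              IntToDoubleFunction classifier,
                                              double wantedIndex) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (classifier.applyAsDouble(i) == wantedIndex) {
                matches.add(lines.get(i));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up data: three lines and their pretend class indices.
        List<String> tweets = Arrays.asList("tweet a", "tweet b", "tweet c");
        double[] fakePredictions = {0.0, 2.0, 2.0};
        // 2.0 is the index of MT in the class attribute {HN,NP,MT}.
        for (String line : linesWithLabel(tweets, i -> fakePredictions[i], 2.0)) {
            System.out.println(line);
        }
    }
}
```

In lang_tag() this would mean adding each line to a list inside the while loop that reads the file, then writing that list's element i (instead of the tag) when clsLabel is 2.0.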

Read an excel sheet in, process it and output it

I am playing around with the jexcel library.
I have tried to write a small program which does the following:
Read an xls file
Make some computations in the sheet and write it to another place
public class DataProcessor {
    private String inputFile;
    private String outputFile;
    private Sheet sheet;
    private Workbook w;
    public void setInputFile(String inputFile) {
        this.inputFile = inputFile;
    }
    public void setOutputFile(String outputFile) {
        this.outputFile = outputFile;
    }
    public void read() throws IOException {
        File inputWorkbook = new File(inputFile);
        Workbook w;
        try {
            w = Workbook.getWorkbook(inputWorkbook);
            sheet = w.getSheet(0);
        } catch (BiffException e) {
            e.printStackTrace();
        }
    }
    @SuppressWarnings("deprecation")
    public void write() throws IOException, WriteException {
        File file = new File(inputFile);
        WorkbookSettings wbSettings = new WorkbookSettings();
        wbSettings.setLocale(new Locale("en", "EN"));
        WritableWorkbook workbook = Workbook.createWorkbook(file, wbSettings);
        workbook.createSheet("Lolonator", 0);
        workbook.createSheet("Lolonator123", 1);
        workbook.copy(w);
        workbook.write();
        workbook.close();
    }
    public static void main(String[] args) throws IOException, WriteException {
        ReadExcel test = new ReadExcel();
        test.setInputFile("C:/Users/Desktop/sheet1.xls");
        test.read();
        System.out.println("####################################################");
        System.out.println("File read!");
        // Write
        System.out.println("####################################################");
        System.out.println("Start to write the file!");
        WriteExcel out = new WriteExcel();
        out.setOutputFile("C:/Users/Desktop/sheet2.xls");
        out.write();
        System.out.println("Please check the result file!");
    }
}
However, this does not work. I do not get any output in my sheet, even though my program runs to the end without an exception. I would really appreciate your answer!

In your write function, you pass inputFile to the File constructor, but inputFile is never set on the out object after you create it (only setOutputFile is called on it).
So the following line in the write function
File file = new File(inputFile);
should be
File file = new File(outputFile);
Also, are you sure that you do not see any errors after running this code? It should be throwing a NullPointerException.
Hope this helps...
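To illustrate the point about the exception without needing the Excel library, here is a minimal sketch using only java.io.File. The class NullPathDemo and its method names are made up for illustration; it only mirrors the two String fields from the question. Constructing a File from a null path throws a NullPointerException, which is what the unset inputFile on the out object would lead to.

```java
import java.io.File;

public class NullPathDemo {
    private String inputFile;
    private String outputFile;

    public void setInputFile(String inputFile) { this.inputFile = inputFile; }
    public void setOutputFile(String outputFile) { this.outputFile = outputFile; }

    // Mirrors the line from the question: uses the field that was never set.
    public File wrongTarget() { return new File(inputFile); }

    // Mirrors the suggested fix: uses the field that was set.
    public File rightTarget() { return new File(outputFile); }

    public static void main(String[] args) {
        NullPathDemo out = new NullPathDemo();
        out.setOutputFile("sheet2.xls"); // inputFile stays null on this object
        System.out.println("target: " + out.rightTarget().getName());
        try {
            out.wrongTarget();
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: inputFile was never set on this object");
        }
    }
}
```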
