convert html to ppt using java aspose library - java

i am passing html code to a variable in java. using aspose library, the html code should be executed and rendered into ppt (i am also giving the reference to css in the html).
appreciated if the ppt is editable.

Please use the following java equivalent code on your end.
public static void main(String[] args) throws Exception {
// The path to the documents directory.
String dataDir ="C:\\html\\";
// Create Empty presentation instance
Presentation pres = new Presentation();
// Access the default first slide of presentation
ISlide slide = pres.getSlides().get_Item(0);
// Adding the AutoShape to accommodate the HTML content
IAutoShape ashape = slide.getShapes().addAutoShape(ShapeType.Rectangle, 10, 10, (float) pres.getSlideSize().getSize().getWidth(), (float) pres.getSlideSize().getSize().getHeight());
ashape.getFillFormat().setFillType(FillType.NoFill);
// Adding text frame to the shape
ashape.addTextFrame("");
// Clearing all paragraphs in added text frame
ashape.getTextFrame().getParagraphs().clear();
// Loading the HTML file using InputStream
InputStream inputStream = new FileInputStream(dataDir + "file.html");
Reader reader = new InputStreamReader(inputStream);
int data = reader.read();
String content = ReadFile(dataDir + "file.html");
// Adding text from HTML stream reader in text frame
ashape.getTextFrame().getParagraphs().addFromHtml(content);
// Saving Presentation
pres.save(dataDir + "output.pptx", SaveFormat.Pptx);
}
public static String ReadFile(String FileName) throws Exception {
File file = new File(FileName);
StringBuilder contents = new StringBuilder();
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader(file));
String text = null;
// repeat until all lines is read
while ((text = reader.readLine()) != null) {
contents.append(text).append(System.getProperty("line.separator"));
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (reader != null) {
reader.close();
}
} catch (IOException e) {
e.printStackTrace();
return null;
}
}
return contents.toString();
}

#Balchandar Reddy,
I have observed your comments and like to share that ImportingHTMLTextInParagraphs.class points to path of file. I have updated the code relate to this.
Secondly, you need to call import com.aspose.slides.IAutoShape on your end to resolve the issue.

I have observed your requirements and regret to share that Aspose.Slides which is an API for managing PowerPoint slides, does not support feature for converting HTML to PPT/PPTX. However, it supports importing HTML text inside slide text frames that you may use.
// Create Empty presentation instance// Create Empty presentation instance
using (Presentation pres = new Presentation())
{
// Acesss the default first slide of presentation
ISlide slide = pres.Slides[0];
// Adding the AutoShape to accomodate the HTML content
IAutoShape ashape = slide.Shapes.AddAutoShape(ShapeType.Rectangle, 10, 10, pres.SlideSize.Size.Width - 20, pres.SlideSize.Size.Height - 10);
ashape.FillFormat.FillType = FillType.NoFill;
// Adding text frame to the shape
ashape.AddTextFrame("");
// Clearing all paragraphs in added text frame
ashape.TextFrame.Paragraphs.Clear();
// Loading the HTML file using stream reader
TextReader tr = new StreamReader(dataDir + "file.html");
// Adding text from HTML stream reader in text frame
ashape.TextFrame.Paragraphs.AddFromHtml(tr.ReadToEnd());
// Saving Presentation
pres.Save("output_out.pptx", Aspose.Slides.Export.SaveFormat.Pptx);
}
I am working as Support developer/ Evangelist at Aspose.

Related

read docx document using java

I have a project steganography to hide docx document into jpeg image. Using apache POI, I can run it and read docx document but only letters can be read.
Even though there are pictures in it.
Here is the code
FileInputStream in = null;
try
{
in = new FileInputStream(directory);
XWPFDocument datax = new XWPFDocument(in);
XWPFWordExtractor extract = new XWPFWordExtractor(datax);
String DataFinal = extract.getText();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line = null;
this.isi_file = extract.getText();
}
catch (IOException x) {}
System.out.println("isi :" + this.isi_file);
How can I read all component in the docx document using java? Please help me and thank you for your helping.
Please check documentation for XWPFDocument class. It contains some useful methods, for example:
getAllPictures() returns list of all pictures in document;
getTables() returns list of all tables in document.
In your code snippet exists line XWPFDocument datax = new XWPFDocument(in);. So after that line your can write some code like:
// process all pictures in document
for (XWPFPictureData picture : datax.getAllPictures()) {
// get each picture as byte array
byte[] pictureData = picture.getData();
// process picture somehow
...
}

How to remove headers and footer from pdf file using pdfbox in java

I am using Pdf Parser to convert pdf to text.Below is my code to convert pdf to text file using java.
My PDF file contains Following Data:
Data Sheet(Header)
PHP Courses for PHP Professionals(Header)
Networking Academy
We live in an increasingly connected world, creating a global economy and a growing need for technical skills. Networking Academy delivers information technology skills to over 500,000 students a year in more than 165 countries worldwide. Networking Academy students have the opportunity to participate in a powerful and consistent learning experience that is supported by high quality, online curricula and assessments, instructor training, hands-on labs, and classroom interaction. This experience ensures the same level of qualifications and skills regardless of where in the world a student is located.
All copyrights reserved.(Footer).
Sample code:
public class PDF_TEST {
PDFParser parser;
String parsedText;
PDFTextStripper pdfStripper;
PDDocument pdDoc;
COSDocument cosDoc;
PDDocumentInformation pdDocInfo;
// PDFTextParser Constructor
public PDF_TEST() {
}
// Extract text from PDF Document
String pdftoText(String fileName) {
File f = new File(fileName);
if (!f.isFile()) {
return null;
}
try {
parser = new PDFParser(new FileInputStream(f));
} catch (Exception e) {
return null;
}
try {
parser.parse();
cosDoc = parser.getDocument();
pdfStripper = new PDFTextStripper();
pdDoc = new PDDocument(cosDoc);
parsedText = pdfStripper.getText(pdDoc);
} catch (Exception e) {
e.printStackTrace();
try {
if (cosDoc != null) cosDoc.close();
if (pdDoc != null) pdDoc.close();
} catch (Exception e1) {
e.printStackTrace();
}
return null;
}
return parsedText;
}
// Write the parsed text from PDF to a file
void writeTexttoFile(String pdfText, String fileName) {
try {
PrintWriter pw = new PrintWriter(fileName);
pw.print(pdfText);
pw.close();
} catch (Exception e) {
e.printStackTrace();
}
}
//Extracts text from a PDF Document and writes it to a text file
public static void test() {
String args[]={"C://Sample.pdf","C://Sample.txt"};
if (args.length != 2) {
System.exit(1);
}
PDFTextParser pdfTextParserObj = new PDFTextParser();
String pdfToText = pdfTextParserObj.pdftoText(args[0]);
if (pdfToText == null) {
}
else {
pdfTextParserObj.writeTexttoFile(pdfToText, args[1]);
}
}
public static void main(String args[]) throws IOException
{
test();
}
}
The above code works for extracting pdf to text.But my requirement is to ignore Header and Footer and extract only content from pdf file.
Required output:
Networking Academy
We live in an increasingly connected world, creating a global economy and a growing need for technical skills. Networking Academy delivers information technology skills to over 500,000 students a year in more than 165 countries worldwide. Networking Academy students have the opportunity to participate in a powerful and consistent learning experience that is supported by high quality, online curricula and assessments, instructor training, hands-on labs, and classroom interaction. This experience ensures the same level of qualifications and skills regardless of where in the world a student is located.
Please suggest me how to do this.
Thanks.
In general there is nothing special about header or footer texts in PDFs. It is possible to tag that material differently, but tagging is optional and the OP did not provide a sample PDF to check.
Thus, some manual work (or somewhat failure intensive image analysis) generally is necessary to find the regions on the pages for header, content, and footer material.
As soon as you have the coordinates for these regions, though, you can use the PDFTextStripperByAreawhich extends the PDFTextStripper to collect text by regions. Simply define a region for the page content using the largest rectangle including the content but excluding headers and footers, and after pdfStripper.getText(pdDoc) call getTextForRegion for the defined region.
You can use PDFTextStripperByArea to remove "Header" and "Footer" by pdf file.
Code in java using PDFBox.
public String fetchTextByRegion(String path, String filename, int pageNumber) throws IOException {
File file = new File(path + filename);
PDDocument document = PDDocument.load(file);
//Rectangle2D region = new Rectangle2D.Double(x,y,width,height);
Rectangle2D region = new Rectangle2D.Double(0, 100, 550, 700);
String regionName = "region";
PDFTextStripperByArea stripper;
PDPage page = document.getPage(pageNumber + 1);
stripper = new PDFTextStripperByArea();
stripper.addRegion(regionName, region);
stripper.extractRegions(page);
String text = stripper.getTextForRegion(regionName);
return text;
}

Displaying Arabic on Device J2ME

I am using some arabic text in my app. on simulator Arabic Text is diplaying fine.
BUT on device it is not displaying Properly.
On Simulator it is like مَرْحَبًا that.
But on device it is like مرحبا.
My need is this one مَرْحَبًا.
Create text resources for a MIDP application, and how to load them at run-time. This technique is unicode safe, and so is suitable for all languages. The run-time code is small, fast, and uses relatively little memory.
Creating the Text Source
اَللّٰهُمَّ اِنِّىْ اَسْئَلُكَ رِزْقًاوَّاسِعًاطَيِّبًامِنْ رِزْقِكَ
مَرْحَبًا
The process starts with creating a text file. When the file is loaded, each line becomes a separate String object, so you can create a file like:
This needs to be in UTF-8 format. On Windows, you can create UTF-8 files in Notepad. Make sure you use Save As..., and select UTF-8 encoding.
Make the name arb.utf8
This needs to be converted to a format that can be read easily by the MIDP application. MIDP does not provide convenient ways to read text files, like J2SE's BufferedReader. Unicode support can also be a problem when converting between bytes and characters. The easiest way to read text is to use DataInput.readUTF(). But to use this, we need to have written the text using DataOutput.writeUTF().
Below is a simple J2SE, command-line program that will read the .uft8 file you saved from notepad, and create a .res file to go in the JAR.
import java.io.*;
import java.util.*;
public class TextConverter {
public static void main(String[] args) {
if (args.length == 1) {
String language = args[0];
List<String> text = new Vector<String>();
try {
// read text from Notepad UTF-8 file
InputStream in = new FileInputStream(language + ".utf8");
try {
BufferedReader bufin = new BufferedReader(new InputStreamReader(in, "UTF-8"));
String s;
while ( (s = bufin.readLine()) != null ) {
// remove formatting character added by Notepad
s = s.replaceAll("\ufffe", "");
text.add(s);
}
} finally {
in.close();
}
// write it for easy reading in J2ME
OutputStream out = new FileOutputStream(language + ".res");
DataOutputStream dout = new DataOutputStream(out);
try {
// first item is the number of strings
dout.writeShort(text.size());
// then the string themselves
for (String s: text) {
dout.writeUTF(s);
}
} finally {
dout.close();
}
} catch (Exception e) {
System.err.println("TextConverter: " + e);
}
} else {
System.err.println("syntax: TextConverter <language-code>");
}
}
}
To convert arb.utf8 to arb.res, run the converter as:
java TextConverter arb
Using the Text at Runtime
Place the .res file in the JAR.
In the MIDP application, the text can be read with this method:
public String[] loadText(String resName) throws IOException {
String[] text;
InputStream in = getClass().getResourceAsStream(resName);
try {
DataInputStream din = new DataInputStream(in);
int size = din.readShort();
text = new String[size];
for (int i = 0; i < size; i++) {
text[i] = din.readUTF();
}
} finally {
in.close();
}
return text;
}
Load and use text like this:
String[] text = loadText("arb.res");
System.out.println("my arabic word from arb.res file ::"+text[0]+" second from arb.res file ::"+text[1]);
Hope this will help you. Thanks

Java: combine 2000-5000 PDFs into 1 using iText yield OutOfMemorryError

I have eyeballing this code for a long time, trying to reducing the amount of memory the code use and still it generated java.lang.OutOfMemoryError: Java heap space. As my last resort, I want to ask the community on how can I improve this code to avoid OutOfMemoryError
I have a driver/manifest file (.txt file) that contain information about the PDFs. I have about 2000-5000 pdf inside a zip file that I need to combine together. Before the combining, for each pdf, I need to add 2-3 more pdf pages to it. Manifest object holds information about a pdf.
try{
blankPdf = new PdfReader(new FileInputStream(config.getBlankPdf()));
mdxBacker = new PdfReader(new FileInputStream(config.getMdxBacker()));
theaBacker = new PdfReader(new FileInputStream(config.getTheaBacker()));
mdxAffidavit = new PdfReader(new FileInputStream(config.getMdxAffidavit()));
theaAffidavit = new PdfReader(new FileInputStream(config.getTheaAffidavit()));
ImmutableList<Manifest> manifestList = //Read manifest file and obtain List<Manifest>
File zipFile = new File(config.getInputDir() + File.separator + zipName);
//Extracting PDF into `process` folder
ZipUtil.extractAll(config.getExtractPdfDir(), zipFile);
outputPdfName = zipName.replace(".zip", ".pdf");
outputZipStream = new FileOutputStream(config.getOutputDir() +
File.separator + outputPdfName);
document = new Document(PageSize.LETTER, 0, 0, 0, 0);
writer = new PdfCopy(document , outputZipStream);
document.open(); //Open the document
//Start combining PDF files together
for(Manifest m : manifestList){
//Obtain full path to the current pdf
String pdfFilePath = config.getExtractPdfDir() + File.separator + m.getPdfName();
//Before combining PDF, add backer and affidavit to individual PDF
PdfReader pdfReader = PdfUtil.addBackerAndAffidavit(config, pdfType, m,
pdfFilePath, blankPdf, mdxBacker, theaBacker, mdxAffidavit,
theaAffidavit);
for(int pageNumber=1; pageNumber<=pdfReader.getNumberOfPages(); pageNumber++){
document.newPage();
PdfImportedPage page = writer.getImportedPage(pdfReader, pageNumber);
writer.addPage(page);
}
}
} catch (DocumentException e) {
} catch (IOException e) {
} finally{
if(document != null) document.close();
try{
if(outputZipStream != null) outputZipStream.close();
if(writer != null) writer.close();
}catch(IOException e){
}
}
Please, rest assure that I have look at this code for a long time, and try rewrite it many times to reduce the amount of memory it using. After the OutOfMemoryError, there are still lots of pdf files that have not been added 2-3 extra pages, so I think it is inside addBackerAndAffidavit, however, I try to close every resources I opened, but it still exception out. Please help.
You need to invoke PdfWriter#freeReader() by end of every loop to free the involved PdfReader. The PdfCopy#freeReader() has this method inherited from PdfWriter and does the same. See also the javadoc:
freeReader
public void freeReader(PdfReader reader)
throws IOException
Description copied from class: PdfWriter
Use this method to writes the reader to the document and free the memory used by it. The main use is when concatenating multiple documents to keep the memory usage restricted to the current appending document.
Overrides:
freeReader in class PdfWriter
Parameters:
reader - the PdfReader to free
Throws:
IOException - on error

How to load a file across the network and handle it as a String

I would like to display the contents of the url in a JTextArea. I have a url that points to an XML file, I just want to display the contents of the file in JTextArea. how can I do this?
better JComponent for Html contents would be JEditorPane/JTextPane, then majority of WebSites should be displayed correctly there, or you can create own Html contents, but today Java6 supporting Html <=Html 3.2, lots of examples on this forum or here
You can do that way:
final URL myUrl= new URL("http://www.example.com/file.xml");
final InputStream in= myUrl.openStream();
final StringBuilder out = new StringBuilder();
final byte[] buffer = new byte[BUFFER_SIZE_WHY_NOT_1024];
try {
for (int ctr; (ctr = in.read(buffer)) != -1;) {
out.append(new String(buffer, 0, ctr));
}
} catch (IOException e) {
// you may want to handle the Exception. Here this is just an example:
throw new RuntimeException("Cannot convert stream to string", e);
}
final String yourFileAsAString = out.toString();
Then the content of your file is stored in the String called yourFileAsAString.
You can insert it in your JTextArea using JTextArea.insert(yourFileAsAString, pos) or append it using JTextArea.append(yourFileAsAString).
In this last case, you can directly append the readed text to the JTextArea instead of using a StringBuilder. To do so, just remove the StringBuilder from the code above and modify the for() loop the following way:
for (int ctr; (ctr = in.read(buffer)) != -1;) {
youJTextArea.append(new String(buffer, 0, ctr));
}
Assuming its HTTP URL
Open the HTTPURLConnection and read out the content
Using java.net.URL open resource as stream (method openStream()).
Load entire as String
place to your text area

Categories