Java Print API scaling with HTML

I am stuck on this. I have read almost every popular question on SO about using the Java Print API to print HTML files (including those that use third-party libraries such as Flying Saucer, iText, and CSSBox), but I still can't get it to work.
Here are the links of my previous questions:
https://stackoverflow.com/questions/28106757/java-print-api-prints-html-with-huge-size
How to print HTML and not the code using Java Print API?
Basically, I am trying to print an HTML file that contains some CSS in a <style> tag. The CSS defines classes that are applied to the <table> and <p> tags, for example. I cannot change the CSS inside the HTML because the page must look exactly like this in a browser.
Below is my program:
import java.awt.print.PrinterException;
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import javax.print.PrintException;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;
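import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.HashPrintServiceAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.print.attribute.PrintServiceAttributeSet;
import javax.print.attribute.standard.Copies;
import javax.print.attribute.standard.MediaSizeName;
import javax.print.attribute.standard.OrientationRequested;
import javax.print.attribute.standard.PrinterName;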
import javax.swing.JEditorPane;
public class Print {
public static void main(String[] args) throws PrintException {
String printerName = "\\\\network-path\\myPrinter";
String fileName = "C:\\log\\myLog.html";
URL url = null;
try {
url = (new File(fileName)).toURI().toURL();
} catch (MalformedURLException e) {
e.printStackTrace();
}
JEditorPane editorPane = new JEditorPane();
editorPane.setEditable(false);
if (url != null) {
try {
editorPane.setPage(url);
} catch (IOException e) {
System.err.println("Attempted to read a bad URL: " + url);
}
} else {
System.err.println("Couldn't find file: " + fileName);
}
PrintServiceAttributeSet printServiceAttributeSet = new HashPrintServiceAttributeSet();
printServiceAttributeSet.add(new PrinterName(printerName, null));
PrintService[] printServices = PrintServiceLookup.lookupPrintServices(null, printServiceAttributeSet); // list of printers
PrintService printService = printServices[0];
PrintRequestAttributeSet pras = new HashPrintRequestAttributeSet();
Copies copies = new Copies(1);
pras.add(copies);
pras.add(OrientationRequested.PORTRAIT);
pras.add(MediaSizeName.ISO_A4);
try {
editorPane.print(null, null, false, printService, pras, false);
} catch (PrinterException e) {
throw new PrintException("Print error occurred:" + e.getMessage());
}
}
}
The code above works, and the printout has the correct CSS styling, but the output is scaled up. The HTML looks different when opened in IE than it does when printed by this code. I would like the printout to match what IE shows.
I also tried passing a SimpleDoc object to the printer. My printService supports the following formats:
image/gif [B
image/gif java.io.InputStream
image/gif java.net.URL
image/jpeg [B
image/jpeg java.io.InputStream
image/jpeg java.net.URL
image/png [B
image/png java.io.InputStream
image/png java.net.URL
application/x-java-jvm-local-objectref java.awt.print.Pageable
application/x-java-jvm-local-objectref java.awt.print.Printable
application/octet-stream [B
application/octet-stream java.net.URL
application/octet-stream java.io.InputStream
But nothing works with SimpleDoc. I then tried converting the HTML to a .png with CSSBox. That works, but for multi-page HTML the generated image is shrunk and is not legible when printed. With Flying Saucer and iText version 2.0.8 I get a NoSuchMethodError. Even when I get it to work (by compiling the source against that iText version), the output is broken.
Can someone please help? I would prefer to stick to the Java Print API rather than use a third-party library. Am I missing something in the SimpleDoc approach? What settings are needed to print the above HTML using a SimpleDoc object and the formats my printService supports?
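For reference, here is a minimal sketch of what the SimpleDoc route could look like with the Printable flavor, which is one of the formats in the list above. It assumes the JEditorPane has already finished loading the page; the class name SimpleDocSketch and its helper methods are made up for illustration. It only shows how the pieces fit together and does not by itself change the scaling, since the rendering still comes from JEditorPane rather than from a browser engine.

```java
import java.awt.print.Printable;
import javax.print.Doc;
import javax.print.DocFlavor;
import javax.print.DocPrintJob;
import javax.print.PrintException;
import javax.print.PrintService;
import javax.print.SimpleDoc;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.swing.JEditorPane;

// Hypothetical helper class, not part of the original program.
public class SimpleDocSketch {
    // application/x-java-jvm-local-objectref with java.awt.print.Printable,
    // one of the flavors the print service reports as supported.
    static final DocFlavor FLAVOR = DocFlavor.SERVICE_FORMATTED.PRINTABLE;

    // Wraps the pane's own Printable (no header, no footer) in a SimpleDoc.
    static Doc toDoc(JEditorPane editorPane) {
        Printable printable = editorPane.getPrintable(null, null);
        return new SimpleDoc(printable, FLAVOR, null);
    }

    // Sends the document to the given service with the given request attributes.
    static void print(JEditorPane editorPane, PrintService service,
                      PrintRequestAttributeSet attributes) throws PrintException {
        if (!service.isDocFlavorSupported(FLAVOR)) {
            throw new PrintException("Service does not accept Printable documents");
        }
        DocPrintJob job = service.createPrintJob();
        job.print(toDoc(editorPane), attributes);
    }
}
```

With the program above, the call would be SimpleDocSketch.print(editorPane, printService, pras) in place of editorPane.print(...).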

Related

Certain PDF files are not downloading correctly

I have very little experience with Java (this is my first real program) and have been looking for a solution for hours. I have hacked together a small program to download PDF files from a link. It works fine for most links, but some of them just don't work.
The content type for all the links that work shows up as application/pdf, but some links report text/html for some reason.
I keep rewriting the code using whatever I can find online, but I keep getting the same result.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.net.ConnectException;
import java.net.URL;
import java.net.URLConnection;
public class Main {
public static void main(String[] args) throws Exception {
String link = "https://www.menards.com/main/items/media/UNITE051/SDS/SpectracideVegetationKillerReadyToUse2-228-714-8845-SDS-Feb16.pdf";
String fileName = "File Name.pdf";
URL url1 = new URL(link);
try {
URLConnection urlConn = url1.openConnection();
byte[] buffer = new byte[1024];
double downloaded = 0.00;
int read = 0;
System.out.println(urlConn.getContentType()); // This shows as text/html but it should be PDF
FileOutputStream fos1 = new FileOutputStream(fileName);
BufferedInputStream is1 = new BufferedInputStream(urlConn.getInputStream());
BufferedOutputStream bout = new BufferedOutputStream(fos1, 1024);
try {
while ((read = is1.read(buffer, 0, 1024)) >= 0) {
bout.write(buffer, 0, read);
downloaded += read;
}
bout.close();
fos1.flush();
fos1.close();
is1.close();
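} catch (Exception e) {
e.printStackTrace(); // report read/write failures instead of silently swallowing them
}
} catch (Exception e) {
e.printStackTrace(); // report connection failures as well
}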
}
}
I need to be able to download the PDF from the link in the code.
This is what gets saved instead of the PDF, viewed as text:
<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=5074a744e2e3d891814e9a2dace20bd4,719d34d31c8e3a6e6fffd425f7e032f3">
</script>
<body>
</body></html>

The website runs a check to make sure the client is a browser. I copied the User-Agent string from Chrome, and that allowed me to download the PDF.
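A minimal sketch of that fix, assuming the only thing the site checks is the User-Agent header. The header value below is a placeholder rather than the exact string that was used, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not part of the original program.
public class UserAgentDownload {
    // Placeholder value: substitute the string your own browser sends.
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    // Creates a connection (not yet connected) that carries a User-Agent header.
    static URLConnection open(String link) {
        try {
            URLConnection connection = new URL(link).openConnection();
            connection.setRequestProperty("User-Agent", USER_AGENT);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streams the response body to a local file, replacing any existing file.
    static void download(String link, String fileName) throws IOException {
        URLConnection connection = open(link);
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, Paths.get(fileName), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

It is still worth printing getContentType() before saving, as the original program does, so that an HTML challenge page is not written to disk under a .pdf name.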

The URL that you are fetching doesn't point to a PDF file. It points to an HTML file that embeds the PDF file. You probably need to look closely at what the actual URL of the PDF file is. Your code seems all right.
Just run cURL on the URL and see. It will most probably return an HTML file.

Can't find a Print Service using java in Windows

I am trying to locate a print service that can handle a job, using the PrintService API in Java.
This is my code:
private PrintService[] services = null;
services = PrintServiceLookup.lookupPrintServices(DocFlavor.INPUT_STREAM.PDF, null);
System.out.println("We found : " + services.length + " service(s)");
The output was always:
We found : 0 service(s)
I don't know why it can't find a service, even though I have a printer installed on my computer. Note that:
The printer works fine.
The same code worked when I was on Linux. I am now using Windows.

No PrintService was found for the specified DocFlavor (PDF).
I confirmed this by checking which DocFlavors my printer supports:
PrintService[] prnSvc = PrintServiceLookup.lookupPrintServices(null, null);
DocFlavor[] docFlavors = prnSvc[0].getSupportedDocFlavors();
for (int i = 0; i < docFlavors.length; i++) {
System.out.println(docFlavors[i].getMimeType());
}
I got just:
image/gif
image/gif
image/gif
image/jpeg
image/jpeg
image/jpeg
image/png
image/png
image/png
application/x-java-jvm-local-objectref
application/x-java-jvm-local-objectref
application/octet-stream
application/octet-stream
application/octet-stream
Similar posts: Printer services Not found? and Java Print program with Specfications issues?
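The MIME types in the output above repeat because each one is offered with several representation classes (byte array, InputStream, URL). The following sketch prints both parts and checks flavor support directly. The class name FlavorCheck and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import javax.print.DocFlavor;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;

// Hypothetical helper class for inspecting what the installed printers accept.
public class FlavorCheck {
    // Formats a flavor as "<MIME type> <representation class>".
    static String describe(DocFlavor flavor) {
        return flavor.getMimeType() + " " + flavor.getRepresentationClassName();
    }

    // Returns the names of the given services that accept the given flavor.
    static List<String> namesSupporting(PrintService[] services, DocFlavor flavor) {
        List<String> names = new ArrayList<>();
        for (PrintService service : services) {
            if (service.isDocFlavorSupported(flavor)) {
                names.add(service.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        PrintService[] all = PrintServiceLookup.lookupPrintServices(null, null);
        for (PrintService service : all) {
            for (DocFlavor flavor : service.getSupportedDocFlavors()) {
                System.out.println(service.getName() + ": " + describe(flavor));
            }
        }
        System.out.println("PDF: " + namesSupporting(all, DocFlavor.INPUT_STREAM.PDF));
        System.out.println("PNG: " + namesSupporting(all, DocFlavor.INPUT_STREAM.PNG));
    }
}
```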

It seems like there is a problem with the PDF capability under Windows. I ran into the same problem and haven't found a solution yet.
Other people have found a workaround, but this seems to be illegal by now (see https://community.oracle.com/thread/2046162).
EDIT
I worked around this problem by converting the PDF to a PNG image.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import static java.awt.image.BufferedImage.TYPE_INT_RGB;
import static javax.imageio.ImageIO.write;
import static org.apache.pdfbox.pdmodel.PDDocument.load;
public class PdfToImageConverter {
public static final String GIF = "gif";
public static final String JPG = "jpg";
public static final String PNG = "png";
// Renders the first page of the PDF at 300 DPI and encodes it as imageType (GIF, JPG or PNG).
public static byte[] convertPdfTo(final String imageType, final byte[] pdfContent) throws IOException {
final PDDocument document = load(new ByteArrayInputStream(pdfContent));
try {
final List<PDPage> allPages = document.getDocumentCatalog().getAllPages();
final PDPage pdPage = allPages.get(0);
final BufferedImage image = pdPage.convertToImage(TYPE_INT_RGB, 300);
final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
write(image, imageType, outputStream); // use the requested format instead of a hard-coded "png"
return outputStream.toByteArray();
} finally {
document.close(); // release the resources held by the loaded document
}
}
}
I added MediaSizeName.ISO_A4 as a PrintRequestAttribute to the print job, and this solution works for me.
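The printing step itself is not shown above. A minimal sketch of how the converted bytes might be sent to the printer follows, assuming the service accepts image/png as a byte array (as in the flavor list earlier). The class name PngPrintSketch and its methods are made up for illustration.

```java
import javax.print.Doc;
import javax.print.DocFlavor;
import javax.print.PrintException;
import javax.print.PrintService;
import javax.print.SimpleDoc;
import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.print.attribute.standard.MediaSizeName;

// Hypothetical helper class, not part of the original answer.
public class PngPrintSketch {
    // Wraps raw PNG bytes in a Doc with the matching byte-array flavor.
    static Doc toDoc(byte[] pngBytes) {
        return new SimpleDoc(pngBytes, DocFlavor.BYTE_ARRAY.PNG, null);
    }

    // Request attributes asking for A4 paper, as described in the answer.
    static PrintRequestAttributeSet a4Attributes() {
        PrintRequestAttributeSet attributes = new HashPrintRequestAttributeSet();
        attributes.add(MediaSizeName.ISO_A4);
        return attributes;
    }

    // Prints the image on the given service.
    static void print(PrintService service, byte[] pngBytes) throws PrintException {
        service.createPrintJob().print(toDoc(pngBytes), a4Attributes());
    }
}
```

The bytes would come from PdfToImageConverter.convertPdfTo(PdfToImageConverter.PNG, pdfContent). Note that the converter renders only the first page, so a multi-page PDF would need one print call per page.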

Java Applet PDF printing

I am trying to build a Java applet that prints a PDF file and sends it to a label printer rather than the default printer. I explored Desktop.print but couldn't work out how to specify the printer.
This is the code I have. I've looked for solutions but have ended up stuck. I have signed the applet, and the only error it gives me is "application error 0".
import java.io.*;
import java.net.*;
import javax.swing.*;
import java.awt.print.*;
import javax.print.*;
import javax.print.attribute.*;
import javax.print.attribute.standard.*;
public class printPDF extends JApplet {
public void init(){
String uri = System.getProperty("user.home") + "\\jobbase\\print.pdf";
DocFlavor flavor = DocFlavor.INPUT_STREAM.PDF;
PrintRequestAttributeSet aset = new HashPrintRequestAttributeSet();
aset.add(new PrinterName("label", null));
aset.add(new Copies(1));
PrintService[] pservices =
PrintServiceLookup.lookupPrintServices(flavor, aset);
if (pservices.length > 0) {
DocPrintJob printJob = pservices[0].createPrintJob();
try{
FileInputStream fis = new FileInputStream(uri);
Doc doc = new SimpleDoc(fis, flavor, null);
try {
printJob.print(doc, aset);
} catch (PrintException e) {
System.err.println(e);
}
} catch(IOException ioe){
ioe.printStackTrace(System.out);
}
} else {
System.err.println("No suitable printers");
}
}
}

You can't just send the PDF to the printer unless you know the printer can understand it. Most of the time you need to rasterize it on the client. I wrote a blog article explaining the options at http://www.jpedal.org/PDFblog/2010/01/printing-pdf-files-from-java/

If you know the name of the printer you can achieve this. In one client I needed silent printing: if a printer named appprinter was present, I used it, if not I tried with the default. This worked out fine.
For printing I use ICEPDF.
Kate: thanks for the suggestion, honestly IcePDF is pretty straight forward, this example is included in the source code that you can download from the link above. In order to obtain the PrinterService (aka printer) needed you can delete all the user input requested by keyboard and just use the one with the name you want.
So, in version 5.0.5: [install-folder]/examples/printservices/PrintService.java
delete user selection of printservice: lines 106 to 155
add instead:
PrintService selectedService=null;
for (int j=0;j<services.length;j++) {
if ("myprintername".equalsIgnoreCase(services[j].getName())) {
selectedService=services[j];
}
}
Hope now it is more useful.
Best regards.
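As a standalone sketch using only the javax.print API (not ICEPDF), the name-matching loop combined with the fall-back to the default printer described at the start of this answer could look like the following. The class name PrinterByName and the method select are made up for illustration.

```java
import javax.print.PrintService;
import javax.print.PrintServiceLookup;

// Hypothetical helper class, not taken from the ICEPDF examples.
public class PrinterByName {
    // Returns the first service whose name matches (ignoring case), else the fallback.
    static PrintService select(PrintService[] services, String name, PrintService fallback) {
        for (PrintService service : services) {
            if (name.equalsIgnoreCase(service.getName())) {
                return service;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        PrintService[] all = PrintServiceLookup.lookupPrintServices(null, null);
        // The default service may be null when no printer is installed.
        PrintService fallback = PrintServiceLookup.lookupDefaultPrintService();
        PrintService chosen = select(all, "appprinter", fallback);
        System.out.println(chosen == null ? "No printer available" : chosen.getName());
    }
}
```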

Check HTTP Image path if valid

In Java, how can I check whether an image's HTTP path is valid, i.e. whether the image exists?
For example:
This image exists:
http://g0.gstatic.com/ig/images/promos/homepage_home.png
But this one does not:
http://sampledomain.com/images/fake.png
I would like to write logic such as:
If (image exists)
- do this
Else
- do something else
Thanks
UPDATE:
Tried it with this code that I got while googling:
import java.awt.Image;
import java.io.IOException;
import java.net.URL;
import javax.imageio.ImageIO;
public class TestImage {
public static void main(String[] arg){
Image image = null;
try {
URL url = new URL("http://g0.gstatic.com/ig/images/promos/homepage_home.png");
image = ImageIO.read(url);
} catch (IOException e) {
System.out.println("Error");
}
}
}
But I always get an error...I am not sure if this is possible.. Any other thoughts?

Make an HTTP HEAD request. If the path exists you'll get a successful response back; otherwise you'll get an error.
This does not check that it is a valid image, though, only that the path exists. If you want to check whether the image is valid, I think you have no choice other than to download it.
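A minimal sketch of such a HEAD request with HttpURLConnection. It treats only a 200 response as "exists" and any I/O failure as "does not exist"; both are assumptions you may want to relax. The class name ImageCheck and the 5-second timeouts are made up for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class for checking whether an HTTP path exists.
public class ImageCheck {
    // Sends a HEAD request and reports whether the server answered 200 OK.
    static boolean exists(String link) {
        try {
            HttpURLConnection connection = (HttpURLConnection) new URL(link).openConnection();
            connection.setRequestMethod("HEAD"); // headers only, no body download
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);
            try {
                return connection.getResponseCode() == HttpURLConnection.HTTP_OK;
            } finally {
                connection.disconnect();
            }
        } catch (IOException | ClassCastException e) {
            // Malformed URL, unreachable host, timeout, or a non-HTTP URL.
            return false;
        }
    }
}
```

The logic from the question then becomes if (ImageCheck.exists(path)) { ... } else { ... }.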

You can check whether the image exists using Selenium's JavascriptExecutor.
Hope this code helps you:
picPath is the URL you want to validate.
result = (Boolean) ((JavascriptExecutor) driver).executeScript(
"var http = new XMLHttpRequest(); http.open('HEAD',arguments[0], false); http.send();return http.status!=404;", picPath);

get links in a web site

How can I get the links in a web page without loading it? (Basically, what I want is this: a user enters a URL, and I want to load all the links available at that URL.) Can you please tell me a way to achieve this?

Here is example Java code, specifically:
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
public class Main {
public static void main(String args[]) throws Exception {
URL url = new URL(args[0]);
Reader reader = new InputStreamReader((InputStream) url.getContent());
System.out.println("<HTML><HEAD><TITLE>Links for " + args[0] + "</TITLE>");
System.out.println("<BASE HREF=\"" + args[0] + "\"></HEAD>");
System.out.println("<BODY>");
new ParserDelegator().parse(reader, new LinkPage(), false);
System.out.println("</BODY></HTML>");
}
}
class LinkPage extends HTMLEditorKit.ParserCallback {
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
if (t == HTML.Tag.A) {
System.out.println("<A HREF=\"" + a.getAttribute(HTML.Attribute.HREF) + "\">"
+ a.getAttribute(HTML.Attribute.HREF) + "</A><BR>");
}
}
}

You'll have to load the page on your server and then find the links, preferably by loading up the document in an HTML/XML parser and traversing that DOM. The server could then send the links back to the client.
You can't do it on the client because the browser won't let your Javascript code look at the contents of the page from a different domain.

If you want the content of a page, you'll have to load it. What you can do is load it in memory and parse it to get all the <a> tags and their content.
You'll be able to parse this XML with tools like JDOM or SAX if you're working with Java (as your tag says), or with simple DOM tools in JavaScript.
Resources :
Parse XML with javascript
On the same topic :
get all the href attributes of a web site (javascript)

Just open a URLConnection, get the page, and parse it.

public void extract_link(String site)
{
try {
List<String> links = extractLinks(site);
for (String link : links) {
System.out.println(link);
}
} catch (Exception e) {
System.out.println(e);
}
}
This is a simple function to print all the links in a page.
If you want to follow the links inside those pages as well, just call it recursively (but make sure you set a depth limit according to your needs).
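The extractLinks helper that this function calls is not shown. One possible version, sketched here with the same Swing HTML parser used in the earlier answer, is given below. The class name LinkExtractor, the split into two methods, and the assumption that the page is UTF-8 encoded are all made up for illustration (readAllBytes needs Java 9 or later).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical implementation of the extractLinks helper.
public class LinkExtractor {
    // Collects the href value of every <a> tag in the given HTML text.
    static List<String> extractLinksFromHtml(String html) {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (tag == HTML.Tag.A && href != null) {
                    links.add(href.toString());
                }
            }
        };
        try (Reader reader = new StringReader(html)) {
            // true: ignore charset declarations inside the document
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    // Downloads the page (assumed to be UTF-8) and extracts its links.
    static List<String> extractLinks(String site) throws IOException {
        try (InputStream in = new URL(site).openStream()) {
            return extractLinksFromHtml(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

Relative links are returned as written in the page, so a recursive crawl would need to resolve them against the page URL first.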
