How to get favicon.ico from a website using Java?


So I'm making an application to store shortcuts to all the user's favorite applications, acting kind of like a hub. I can have support for actual files and I have a .lnk parser for shortcuts. I thought it would be pretty good for the application to support Internet shortcuts, too. This is what I'm doing:
Suppose I'm trying to get Google's icon (http://www.google.com/favicon.ico).
I start out by getting rid of the extra pages (e.g. www.google.com/anotherpage would become www.google.com).
Then, I use ImageIO.read(java.net.URL) to get the Image.
The problem is that ImageIO never returns an Image when I call this method:
String trimmed = getBaseURL(page); //This removes the extra pages
Image icon = null;
try {
String fullURLString = trimmed + "/favicon.ico";
URL faviconURL = new URL(fullURLString);
icon = ImageIO.read(faviconURL);
} catch (IOException e) {
e.printStackTrace();
}
return icon;
Now I have two questions:
Does Java support the ICO format even though it is from Microsoft?
Why does ImageIO fail to read from the URL?
Thank you in advance!
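The base-URL step described in the question can be sketched with java.net.URI instead of manual string trimming. This is only an illustration: the class and method names are made up, and treating an address without a scheme as http is an assumption, not something the question specifies.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class FaviconUrls {
    private FaviconUrls() {}

    /** Returns scheme://authority/favicon.ico for the given page address. */
    static URI faviconFor(String page) {
        // Assumption: an address typed without a scheme is treated as http.
        String withScheme = page.contains("://") ? page : "http://" + page;
        URI uri = URI.create(withScheme);
        if (uri.getAuthority() == null) {
            throw new IllegalArgumentException("No host found in: " + page);
        }
        try {
            // Keep scheme and authority; drop path, query and fragment.
            return new URI(uri.getScheme(), uri.getAuthority(), "/favicon.ico", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

The result can be turned into a URL with toURL() before it is passed to whichever reader ends up being used.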

Try Image4J.
As this quick Scala REPL session shows (paste-able as Java code):
> net.sf.image4j.codec.ico.ICODecoder.read(new java.net.URL("http://www.google.com/favicon.ico").openStream())
res1: java.util.List[java.awt.image.BufferedImage] = [BufferedImage@65712a80: type = 2 DirectColorModel: rmask=ff0000 gmask=ff00 bmask=ff amask=ff000000 IntegerInterleavedRaster: width = 16 height = 16 #Bands = 4 xOff = 0 yOff = 0 dataOffset[0] 0]
UPDATE
To answer your questions: Does Java support ICO? Doesn't seem like it:
> javax.imageio.ImageIO.read(new java.net.URL("http://www.google.com/favicon.ico"))
java.lang.IllegalArgumentException: Empty region!
Why does ImageIO fail to read from the URL? Well, the URL itself seems to work for me, so you may have a proxy/firewall issue, or it could be the problem above.
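One way to check this locally is to ask ImageIO which readers are registered. ImageIO.read is documented to return null (rather than throw) when no registered reader claims the data, which would match the "never returns an Image" symptom in the question. The sketch below uses only the JDK; the class and method names are illustrative.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class ImageIoSupport {
    private ImageIoSupport() {}

    /** True if some registered ImageIO reader handles the given file suffix. */
    static boolean hasReaderForSuffix(String suffix) {
        return ImageIO.getImageReadersBySuffix(suffix).hasNext();
    }

    /** Decodes the bytes, or returns null when no registered reader recognizes them. */
    static BufferedImage readOrNull(byte[] data) {
        try {
            return ImageIO.read(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ImageIoSupport.hasReaderForSuffix("ico") on a stock JDK is expected to report false unless an ICO plugin is on the classpath.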

Old post, but for future reference:
I've written a plugin for ImageIO that adds support for .ICO (MS Windows Icon) and .CUR (MS Windows Cursor) formats.
You can get it from GitHub here: https://github.com/haraldk/TwelveMonkeys/
After you have installed the plugin, you should be able to read the icon, using the code in the original post without any modifications.

You don't need ImageIO for this. Just copy the bytes, same as for any other static resource.
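A minimal sketch of that idea: read the stream into a byte array without decoding it. The helper name is hypothetical, and the caller is assumed to supply the stream (for example from URL.openStream()).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class RawBytes {
    private RawBytes() {}

    /** Reads the whole stream into memory and closes it; no image decoding is involved. */
    static byte[] readAll(InputStream in) {
        try (InputStream source = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = source.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting bytes can be written to disk with Files.write, or handed to an ICO-capable decoder later if an actual image object is needed.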

There is Apache Commons Imaging for reading ico files and others: https://commons.apache.org/proper/commons-imaging/index.html
Reading an ico file works like this:
List<BufferedImage> images = org.apache.commons.imaging.Imaging.getAllBufferedImages(yourIcoFile);
In your case you have to download it first, I guess.

Related

pdfbox embedding subset font for annotations

I am trying to use Apache PDFBOX v2.0.21 to modify existing PDF documents, adding signatures and annotations. That means that I am actively using incremental save mode. I am also embedding the LiberationSans font to accommodate some Unicode characters. It makes sense for me to use the subsetting feature of PDF embedded fonts, as embedding LiberationSans in full makes the PDF file around 200+ KB larger in size.
After multiple trials and errors I finally managed to have something working - all but the font subsetting. The way I do this is to initialize the PDFont object once using
try (InputStream fs = PDFService.class.getResourceAsStream("/static/fonts/LiberationSans-Regular.ttf")) {
_font = PDType0Font.load(pddoc, fs, true);
}
And then to use custom Appearance Stream to show the text.
private void addAnnotation(String name, PDDocument doc, PDPage page, float x, float y, String text) throws IOException {
List<PDAnnotation> annotations = page.getAnnotations();
PDAnnotationRubberStamp t = new PDAnnotationRubberStamp();
t.setAnnotationName(name); // might play important role
t.setPrinted(true); // always visible
t.setReadOnly(true); // does not interact with user
t.setContents(text);
PDRectangle rect = ....;
t.setRectangle(rect);
PDAppearanceDictionary ap = new PDAppearanceDictionary();
ap.setNormalAppearance(createAppearanceStream(doc, t));
ap.getCOSObject().setNeedToBeUpdated(true);
t.setAppearance(ap);
annotations.add(t);
page.setAnnotations(annotations);
t.getCOSObject().setNeedToBeUpdated(true);
page.getResources().getCOSObject().setNeedToBeUpdated(true);
page.getCOSObject().setNeedToBeUpdated(true);
doc.getDocumentCatalog().getPages().getCOSObject().setNeedToBeUpdated(true);
doc.getDocumentCatalog().getCOSObject().setNeedToBeUpdated(true);
}
private PDAppearanceStream createAppearanceStream(final PDDocument document, PDAnnotation ann) throws IOException
{
PDAppearanceStream aps = new PDAppearanceStream(document);
PDRectangle rect = ann.getRectangle();
rect = new PDRectangle(0, 0, rect.getWidth(), rect.getHeight());
aps.setBBox(rect); // set bounding box to the dimensions of the annotation itself
// embed our unicode font (NB: yes, this needs to be done otherwise aps.getResources() == null which will cause NPE later during setFont)
PDResources res = new PDResources();
_fontName = res.add(_font).getName();
aps.setResources(res);
PDAppearanceContentStream apsContent = null;
try {
// draw directly on the XObject's content stream
apsContent = new PDAppearanceContentStream(aps);
apsContent.beginText();
apsContent.setFont(_font, _fontSize);
apsContent.showText(ann.getContents());
apsContent.endText();
}
finally {
if (apsContent != null) {
try { apsContent.close(); } catch (Exception ex) { log.error(ex.getMessage(), ex); }
}
}
aps.getResources().getCOSObject().setNeedToBeUpdated(true);
aps.getCOSObject().setNeedToBeUpdated(true);
return aps;
}
This code runs, but creates a PDF with dots instead of actual characters, which, I guess, means that the font subset has not been embedded. Moreover, I get the following warnings:
2021-04-17 12:33:31.326 WARN 20820 --- [ main]
o.a.p.pdmodel.PDAbstractContentStream : attempting to use subset
font LiberationSans without proper context
After looking through the source code, I guess that I am messing something up when creating the appearance stream: somehow it is not connected with the PDDocument, so the subsetting does not proceed normally. Note that the above code works well when the font is embedded fully (i.e. if I call PDType0Font.load with the last parameter set to false).
Can anyone think of some hint to give to me? Thank you!

I don't know - am I lucky? It is very often that luckiness in programming points to something completely wrong or misleading. In any case, if someone can still give a hint, my ears are more than open...
Again, after looking through the code, I saw the following in PDDocument.save():
// subset designated fonts
for (PDFont font : fontsToSubset)
{
font.subset();
}
This is not happening in PDDocument.saveIncremental() which I am using. Just to mess around with the code, I went and did the following just before calling saveIncremental() on my document:
_font.subset(); // you can see in the beginning of the question how _font is created
_font.getCOSObject().setNeedToBeUpdated(true);
pddoc.saveIncremental(baos);
Believe it or not, but the document was saved correctly - at least it appears correct in Acrobat Reader DC and Chrome & Firefox PDF viewers. Note that Unicode codepoints are added to the subset for the font during showText() on appearance content stream.
UPDATE 18/04/2021: as I mentioned in the comments, I got reports from users that started seeing messages like "Cannot extract the embedded font XXXXXX+LiberationSans-Regular from ...", when they opened the modified PDF files. Strangely enough, I didn't see these messages during my tests. It turns out that my copy of Acrobat Reader DC was newer than theirs, and specifically with the continuous release version 2021.001.20149 no errors were shown, while with the continuous release version 2020.012.20043 the above message was shown.
After investigations, it turns out that the problem was with the way I was embedding the font. I am not aware if any other way exists, and I am not that familiar with the PDF specification to know otherwise. What I was doing, as you can see from the above code, was to load the font ONCE for the document, and then to use it freely in the resource dictionary of the appearance stream of EVERY annotation. This had as a result all the resource dictionaries of the annotation content streams to reference an F1 font that was defined with the SAME /BaseFont name. The PDF Reference, 3rd ed. on p.323 specifically states that:
"... the PostScript name of the font - ... - begins with a tag
followed by a plus sign (+). The tag consists of exactly six uppercase
letters; the choice of letters is arbitrary, but different subsets in
the same PDF file must have different tags..."
Once I started calling PDType0Font.load for each of my annotations, and calling subset() (and of course setNeedToBeUpdated) after creating the appearance stream for each of them, the BaseFont names did indeed start to differ, and the older 2020 version of Acrobat Reader DC stopped complaining.
[edit 07/10/2021: even trying to use a single PDFont object per page (having multiple annotations with this font), and subsetting it once, after having called showText on appearances of all annotations, appears to not work - it appears that the subsetting uses the letters I passed to the first showText, and not the others, resulting in wrong rendering of the 2nd, 3rd etc. annotations that might have characters that didn't exist in the 1st annotation - so I reiterate that what worked was to use loadFont for each separate annotation and then (after modifying appearance with showText, which will mark the letters to be used during subsetting) to call subset() on each of these fonts (which will result in the change of the font name)]
Note that other than using iText RUPS for inspecting the PDF contents, one could use Foxit PDF viewer to at least ensure that the subset font names are different. Acrobat Reader DC and PDF-xChange in Properties -> Fonts just show the initial font name, like LiberationSans, without showing the 6-letter unique prefix.
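If the BaseFont names have been collected by some other means (for example copied out of RUPS), the rule quoted above can be checked mechanically. The sketch below is plain Java and does not touch PDFBox; the class name, method names and sample font names are all hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubsetTags {
    private SubsetTags() {}

    // Six uppercase letters, a plus sign, then the PostScript name; a leading '/' is tolerated.
    private static final Pattern TAGGED = Pattern.compile("^/?([A-Z]{6})\\+(.+)$");

    /** Returns the six-letter subset tag, or null if the name carries none. */
    static String tagOf(String baseFont) {
        Matcher m = TAGGED.matcher(baseFont);
        return m.matches() ? m.group(1) : null;
    }

    /** True when every name is subset-tagged and no tag occurs twice. */
    static boolean allTagsDistinct(List<String> baseFontNames) {
        Set<String> seen = new HashSet<>();
        for (String name : baseFontNames) {
            String tag = tagOf(name);
            if (tag == null || !seen.add(tag)) {
                return false;
            }
        }
        return true;
    }
}
```

Two annotations that share one font object would show up here as a repeated tag, which is the situation the quoted passage rules out.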
UPDATE 19/04/2021: I am still working on this issue, because I still get reports about the infamous "Cannot extract the embedded font" message. It is quite possible that the original cause of that message was not (or not only) the fact that the different subsets had the same BaseFont names. One thing that I am observing is that on some computers, the stamp annotations that I am using cause Acrobat Reader DC to automatically open the so-called "Comments pane" - there is an option to turn this automatic behavior off (Preferences -> Commenting -> Show comments pane when a PDF with comments is opened). When this pane opens, either manually or automatically, the error message appears (and I was at my wits' end trying to see why the same version of Acrobat Reader DC behaves differently on different machines). I think that Acrobat Reader tries to extract the full version of the font and fails, since it is only a subset. But, I guess, this has nothing to do with the semantic contents of the document - the document still passes "qpdf --check". I am currently trying to find out whether it is possible to restrict stamps to not allow comments - i.e. some way to disable the comments pane in Acrobat Reader DC, although I have little hope.
UPDATE 20/04/2021 opened a new question here

Get cover art from a music file using JAudioTagger in Java

I'm using JAudioTagger to fetch the metadata from music files. Getting the title, year etc. is working fine, but I am having a problem with getting the cover art. I have not been able to find any examples searching online, so any help would be great!
Here is my current code, in which the coverArt BufferedImage shows up as null when debugging. I have checked and the mp3 file has a cover image.
ID3v23Tag id3v23Tag = (ID3v23Tag)tag;
TagField coverArtField =
id3v23Tag.getFirstField(org.jaudiotagger.tag.id3.ID3v23FieldKey.COVER_ART.getFieldName());
FrameBodyAPIC body = (FrameBodyAPIC)((ID3v23Frame)coverArtField).getBody();
byte[] imageRawData = (byte[])body.getObjectValue(DataTypes.OBJ_PICTURE_DATA);
coverArt = ImageIO.read(ImageIO.createImageInputStream(new ByteArrayInputStream(imageRawData)));

In my application I use
MP3File mp3;
mp3.getTag().getFirstArtwork();
which returns the firstArtwork of the MP3 (which is in most cases the cover you are looking for). This can be cast to a BufferedImage if necessary.

Anyone still looking for an answer can do this..
AudioFile f = AudioFileIO.read(new File(path));
Tag tag = f.getTag();
if(tag.hasField("Cover Art")){
byte[] b = tag.getFirstArtwork().getBinaryData();
}
Now you have your image as binary data. You can easily use it with Glide or Picasso if you want:
Glide.with(this).load(b).into(ImageView);

Normally, the easiest way is simply:
List<Artwork> existingArtworkList = tag.getArtworkList();
You don't have to perform any casting or work at the frame body level. Is there a reason you are doing this?
Take a look at the imageRawData - is that being read correctly? Maybe the problem is at the imageio level. If it's a JPEG it should begin 0xFF, 0xD8 for example.
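That check can be done in a couple of lines before handing the bytes to ImageIO. The sketch below tests for the JPEG start-of-image marker mentioned above and, as an extra, the standard PNG signature; the class and method names are illustrative.

```java
final class ImageSignatures {
    private ImageSignatures() {}

    // PNG files start with these eight bytes: 89 50 4E 47 0D 0A 1A 0A.
    private static final int[] PNG_SIGNATURE = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    /** True if the data starts with the JPEG start-of-image marker FF D8. */
    static boolean looksLikeJpeg(byte[] data) {
        return data != null && data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }

    /** True if the data starts with the eight-byte PNG signature. */
    static boolean looksLikePng(byte[] data) {
        if (data == null || data.length < PNG_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < PNG_SIGNATURE.length; i++) {
            if ((data[i] & 0xFF) != PNG_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

If neither check passes for imageRawData, the problem is probably in how the frame body was read rather than in ImageIO.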

Pink/Reddish tint while resizing jpeg images using java thumbnailator or imgscalr

I am trying to convert an image (url below) using two libraries (thumbnailator and imgscalr). My code works on most of the images except a few which after conversion have a pink/reddish tint.
I am trying to understand the cause and would welcome any recommendation.
Note - the image type of this image is 5, i.e. BufferedImage.TYPE_3BYTE_BGR, and I am using Java 7.
Using Thumbnailator
Thumbnails.of(fromDir.listFiles())
.size(thumbnailWidth, thumbnailHeight)
.toFiles(Rename.SUFFIX_HYPHEN_THUMBNAIL);
Using imgscalr
BufferedImage bufferedImage = ImageIO.read(file);
final BufferedImage jpgImage;
LOG.debug("image type is =[{}] ", bufferedImage.getType());
BufferedImage scaledImg = Scalr.resize(bufferedImage, Method.ULTRA_QUALITY, thumbnailWidth, thumbnailHeight, Scalr.OP_ANTIALIAS);
File thumbnailFile = new File(fromDirPath + "/" + getFileName(file.getName()) +THUMBNAIL_KEYWORD + ".png");
ImageIO.write(scaledImg, getFileExtension(file.getName()), thumbnailFile);
bufferedImage.flush();
scaledImg.flush();

I get this question a lot (author of imgscalr) -- the problem is almost always that you are reading/writing out different file formats and the ALPHA channel is causing one of your color channels (R/G/B) to be culled from the resulting file.
For example, if you read in a file that was ARGB (4 channel) and wrote it out as a JPG (3 channel) - unless you purposefully manipulate the image types yourself and render the old image to the new one directly, you will get a file with "ARG" channels... or more specifically, just Red and Green - no Blue.
PNG supports an alpha channel and JPG does not, so be aware of that.
The way to fix this is to purposefully create an appropriate BufferedImage of the right type (RGB, ARGB, etc.) and use the destImage.getGraphics() call to render one image to the other before writing it out to disk and re-encoding it.
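A minimal sketch of that rendering step, using only java.awt: the source image is drawn onto a new TYPE_INT_RGB image so that no alpha channel reaches the JPEG encoder. The class name and the choice of a solid background color for transparent pixels are assumptions made for illustration.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class OpaqueCopy {
    private OpaqueCopy() {}

    /** Renders the source onto a 3-channel RGB image, filling transparent areas with the background. */
    static BufferedImage toRgb(BufferedImage source, Color background) {
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(background);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }
}
```

The returned image can then be passed to ImageIO.write(..., "jpg", file) in place of the scaled ARGB image.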
Sun and Oracle have NEVER made the ImageIO libraries smart enough to detect the unsupported channels when writing to differing file types, so this behavior happens all the time :(
Hope that helps!

The following piece of code resolved my issue:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Thumbnails.of(new ByteArrayInputStream(imageByteArray))
.outputFormat("jpg")
.size(200, 200)
.toOutputStream(baos);
return baos.toByteArray();
I am using Thumbnailator and the code was posted here: https://github.com/coobird/thumbnailator/issues/23

Trying to serve image stored in GAE Datastore originally created by canvas.toDataURL()

I have a canvas painted by the user.
In the JavaScript I do:
var data = canvas.toDataURL().substr(22);
// snipped code that sets up xhr POST to "d/"
var params = "img=" + encodeURIComponent(data);
xhr.send(params);
I substr(22) to get rid of "data:image/png;base64,"
Then in app engine I do:
doodle.setProperty("img", new Text(req.getParameter("img")));
So I am setting the img property of the doodle Entity to the canvas.toDataURL().substr(22)
Now, when I want to retrieve the image, I do:
if (debugimg) {
resp.setContentType("text/plain");
resp.getWriter().print(((Text)groove.getProperty("img")).getValue());
}
else {
resp.setContentType("image/png;base64");
resp.getWriter().print(((Text)groove.getProperty("img")).getValue());
}
But for the life of me, the image never comes up.
Here is an example. I drew this, and can save it and render it in JavaScript.
https://yougotadoodle.appspot.com/d.jsp?id=1483002
If I use debugimg, this is what is being saved:
http://yougotadoodle.appspot.com/d?id=1483002&img=true&debugimg=true
But when I try to serve it with setContentType("image/png;base64") or even just "image/png" you get a broken picture:
http://yougotadoodle.appspot.com/d?id=1483002&img=true
I have tried a few different things, including not substr(22)ing it. Any ideas?
I tried using a Blob(), so storing it like this:
doodle.setProperty("img", new Blob(req.getParameter("img").getBytes()));
and reading it like this:
resp.getWriter().print(((Blob)groove.getProperty("img")).getBytes());
But that seemed to spit out something like this:
[B@1f11e0f

You have to decode this string before serving it as image/png because it is the Base64 encoded version.
I tested it locally in Python and your Hello SO! worked perfectly after decoding the given string. I'm not sure how to do it in Java but it should be fairly easy.
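On the Java side the decoding step can be sketched with java.util.Base64 (Java 8+). The helper below is illustrative, not part of the question's code: it accepts the stored string with or without the "data:image/png;base64," prefix and returns the raw image bytes.

```java
import java.util.Base64;

final class DataUrlPayload {
    private DataUrlPayload() {}

    /** Decodes a Base64 string, dropping an optional "data:<type>;base64," prefix first. */
    static byte[] decode(String stored) {
        String payload = stored;
        int comma = stored.indexOf(',');
        if (stored.startsWith("data:") && comma >= 0) {
            payload = stored.substring(comma + 1);
        }
        return Base64.getDecoder().decode(payload);
    }
}
```

In the servlet the decoded bytes would then be sent as binary, with the content type set to "image/png" and the data written through resp.getOutputStream().write(bytes) rather than printed through getWriter().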

Here are three code snippets that have worked for me on the JS side (jpg) through the put to a blob property. May not be optimal, but it does work. HTH. -stevep
Create canvas render:
imgFinalData = canvas.toDataURL('image/jpg', 1.0);
Setup variable for POST to GAE:
f64 = imgFinalData.substr(imgFinalData.indexOf(',')+1).toString();
Post to GAE (fd is an array used to store multiple POST vars):
fd.push('bytes=' + escape(f64));
//Here is the call with fd that handles the post:
postXmlHttpRequest(url, fd.join('&'), handlePostFinal);
On the GAE side (Python):
Property that stores POST data (line from entity class):
bytes = db.BlobProperty(required=True, indexed=False)
How the post data is processed b/4 put:
data = urllib.unquote(self.request.get('bytes'))
data = data.replace(' ','+')
bytes = base64.b64decode(data + '=' * (4 - len(data) % 4))
Property line inside the entity statement for put:
bytes = db.Blob(bytes),

A good library for converting PDF to TIFF? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I need a Java library to convert PDFs to TIFF images. The PDFs are faxes, and I will be converting to TIFF so that I can then do barcode recognition on the image. Can anyone recommend a good free open source library for conversion from PDF to TIFF?

I can't recommend any code library, but it's easy to use GhostScript to convert PDF into bitmap formats. I've personally used the script below (which also uses the netpbm utilities) to convert the first page of a PDF into a JPEG thumbnail:
#!/bin/sh
/opt/local/bin/gs -q -dLastPage=1 -dNOPAUSE -dBATCH -dSAFER -r300 \
-sDEVICE=pnmraw -sOutputFile=- $* |
pnmcrop |
pnmscale -width 240 |
cjpeg
You can use -sDEVICE=tiff... to get direct TIFF output in various TIFF sub-formats from GhostScript.

Disclaimer: I work for Atalasoft
We have an SDK that can convert PDF to TIFF. The rendering is powered by Foxit software which makes a very powerful and efficient PDF renderer.

We also convert PDFs to G3 TIFFs here, at both high and low resolution. In my experience the best tool you can have is the Adobe PDF SDK; the only problem with it is its insane price, so we don't use it.
What works fine for us is Ghostscript: recent versions are quite robust and render the majority of PDFs correctly, and we have quite a few of them coming in during the day. In production the conversion is done using gsdll32.dll, but if you want to try it, use the following command line:
gswin32c -dNOPAUSE -dBATCH -dMaxStripSize=8192 -sDEVICE=tiffg3 -r204x196 -dDITHERPPI=200 -sOutputFile=test.tif prefix.ps test.pdf
This converts your PDF into a high-resolution G3 TIFF. The prefix.ps code is here:
<< currentpagedevice /InputAttributes get
0 1 2 index length 1 sub {1 index exch undef } for
/InputAttributes exch dup 0 <</PageSize [0 0 612 1728]>> put
/Policies << /PageSize 3 >> >> setpagedevice
Another thing about Ghostscript is that it's open source; you get both the C and PS (PostScript) source code for it. Also, if you go with another tool, check what kind of engine powers its PDF rendering; it could turn out to be using gs, as LeadTools does, for instance.
hope this helps, regards
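Since the question asks for Java, one option is to launch the same Ghostscript command from Java. The sketch below only builds the argument list, using a subset of the flags from the command line above (prefix.ps and the strip-size and dither options are left out for brevity); the class name, method name and executable name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class GhostscriptCommand {
    private GhostscriptCommand() {}

    /** Builds a Ghostscript argument list that renders a PDF to a G3 fax TIFF. */
    static List<String> tiffG3(String gsExecutable, String pdfPath, String tiffPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);              // e.g. "gswin32c" on Windows, "gs" elsewhere
        cmd.add("-dNOPAUSE");
        cmd.add("-dBATCH");
        cmd.add("-sDEVICE=tiffg3");
        cmd.add("-r204x196");               // fax resolution taken from the command above
        cmd.add("-sOutputFile=" + tiffPath);
        cmd.add(pdfPath);
        return cmd;
    }
}
```

The list can be run with new ProcessBuilder(cmd).inheritIO().start().waitFor(), checking for a zero exit code before reading the TIFF.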

You can use the icepdf library (Apache 2.0 License).
They even provide this exact use case as one of their source code examples:
http://wiki.icesoft.org/display/PDF/Multi-page+Tiff+Capture

Maybe it is not necessary to convert the PDF into TIFF. The fax will most likely be an embedded image in the PDF, so you could just extract these images again. That should be possible with the already mentioned iText library.
I don't know if this is easier than the other approach.

Take a look at Apache PDFBox - A Java PDF Library

No, iText cannot convert PDFs to TIFF.
However, there are commercial libraries that can do that. jPDFImages is a 100% java library that can convert PDF to images in TIFF, JPEG or PNG formats (and maybe JBIG? I am not sure). It can also do the reverse, create PDF from images. It starts at $300 for a server.

Here is a good article and wrapper classes for using GhostScript with C# .NET...ended up using this in production
http://www.codeproject.com/KB/cs/GhostScriptUseWithCSharp.aspx

I have had good experience with iText (I'm currently using version 5.0.6), and this is the code for converting TIFF into PDF:
private static String convertTiff2Pdf(String tiff) {
// target path PDF
String pdf = null;
try {
pdf = tiff.substring(0, tiff.lastIndexOf('.') + 1) + "pdf";
// New document with US Letter page size and no margins
Document document = new Document(PageSize.LETTER, 0, 0, 0, 0);
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(pdf));
int pages = 0;
document.open();
PdfContentByte cb = writer.getDirectContent();
RandomAccessFileOrArray ra = null;
int comps = 0;
ra = new RandomAccessFileOrArray(tiff);
comps = TiffImage.getNumberOfPages(ra);
// Conversion loop: one PDF page per TIFF page
for (int c = 0; c < comps; ++c) {
Image img = TiffImage.getTiffImage(ra, c + 1);
if (img != null) {
System.out.println("page " + (c + 1));
img.scalePercent(7200f / img.getDpiX(), 7200f / img.getDpiY());
document.setPageSize(new Rectangle(img.getScaledWidth(), img.getScaledHeight()));
img.setAbsolutePosition(0, 0);
cb.addImage(img);
document.newPage();
++pages;
}
}
ra.close();
document.close();
} catch (Exception e) {
logger.error("Convert fail");
logger.debug("", e);
pdf = null;
}
logger.debug("[" + tiff + "] -> [" + pdf + "] OK");
return pdf;
}
