Generate random emoji with plain Java

Is there an easy way to generate random emoji using Java?
I only found libraries that use a file as their data source.

There’s a full list of Unicode-supported emoji here. As you scroll down this list, you’ll see why these libraries choose to rely on a data file instead of a set of generation rules.
Many emoji are represented by sequences of several code points, which do not all come from the same Unicode block, and the sequences vary in length. Mapping a random number generator onto the full spectrum of emoji would therefore be a very difficult (impossible?) task without a complete list of emoji.
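If a small subset is enough, one contiguous range can still be sampled without a data file. The following is only a minimal sketch under that assumption: it draws from the Emoticons block (U+1F600 to U+1F64F), whose members are single-code-point emoji, and it deliberately ignores everything else (flags, skin-tone modifiers, ZWJ sequences). The class and method names are made up for illustration.

```java
import java.util.Random;

final class RandomEmoji {
    // Emoticons block; an assumption of this sketch is that this subset is sufficient.
    private static final int FIRST = 0x1F600;
    private static final int LAST = 0x1F64F;

    /** Returns one random emoji from the Emoticons block as a String (one surrogate pair). */
    static String randomEmoji(Random rng) {
        int codePoint = FIRST + rng.nextInt(LAST - FIRST + 1);
        // Code points above U+FFFF need two chars, so build the String via toChars.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            sb.append(randomEmoji(rng));
        }
        System.out.println(sb);
    }
}
```

Covering the whole emoji repertoire still requires a list, for the reasons given above.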

Related

Adding text to a pdf?

I have a template PDF for an agenda, and I want to know how to detect where the date should go.
Let's say the template contains the word “DATE:”.
I want to detect “DATE:”, add the corresponding date/text right after it so that the result looks something like “DATE: 13/02/2020”, and save it as a new PDF.
You tagged your question both java and python-3.x, which makes it very broad. My answer is therefore generic rather than specific. In general, you should decide which language you are asking about.
For your task you will need to do two things:
1) apply text extraction with coordinates to your PDF, search for the DATE marker in the text, and determine the coordinates right after that piece of text (some libraries offer a shortcut: routines that extract only the text matching a regular expression, together with its coordinates);
2) add text to your content at those coordinates.
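The coordinate arithmetic in the first step can be sketched independently of any particular library. In the sketch below, TextChunk is a hypothetical stand-in for whatever "text plus position" object the chosen PDF library returns, and the gap parameter is an assumed amount of spacing; neither comes from a real API. It also assumes the marker arrives in a single chunk, which real extractors do not guarantee.

```java
import java.util.List;

final class MarkerLocator {
    /** Hypothetical extracted chunk: x/y is the start of the baseline, width the horizontal advance. */
    static final class TextChunk {
        final String text;
        final float x;
        final float y;
        final float width;

        TextChunk(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /** Returns {x, y} just to the right of the first chunk containing the marker, or null if none does. */
    static float[] insertionPoint(List<TextChunk> chunks, String marker, float gap) {
        for (TextChunk chunk : chunks) {
            if (chunk.text.contains(marker)) {
                // Same baseline, shifted past the end of the chunk plus a small gap.
                return new float[] { chunk.x + chunk.width + gap, chunk.y };
            }
        }
        return null;
    }
}
```

The returned point is where the second step would start drawing the date, using the same coordinate system the extraction step reported.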
Neither Java nor Python has explicit PDF support in its core, so you will have to choose a PDF library for these tasks. (Theoretically you could implement your own PDF processing routines, but the PDF format is quite complex, so in general that would take very long.)
So you should first check which general-purpose PDF library for your chosen language appears most appropriate for those tasks and your other requirements (like licensing). There are many questions and answers on Stack Overflow concerning text extraction which may help you choose.
A word of warning, though: not all PDFs allow proper text extraction. Some PDF generators don't add the information required for text extraction to their PDFs; some even add misleading information. Thus, you might have to reject some templates. Alternatively, if the template is fixed, simply determine the correct coordinates for text insertion by measuring in a PDF viewer or by trial and error.
And if you still have influence on the requirements, propose using templates with PDF AcroForm form fields. Form fields give the template designer more control over the positioning and styling of the fill-ins, and filling them in is easier than the process outlined above. If you don't want form fields in the resulting PDFs, simply flatten the forms after fill-in.

PDF font subsetting and subset merging in Java

I have a part in my code where I programmatically fill out PDF forms using iText for Java based on user-entered data, and then concatenate a number of such PDFs into one, again using iText.
The PDF forms that are getting merged can be (and usually are) different.
The resulting PDF is way too large - looking at it, 98% of the space is taken by fonts.
The way I understand it, the individual PDF forms have different font subsets, so when I merge them I get a massive number of duplicate glyphs. But because the subsets are not identical, I can't get rid of the duplicates without merging the subsets.
The other problem is that the PDF forms themselves might not even contain subsets, but heavily packed fonts with 2000+ glyphs, so even if I manage to leave only one instance of that font in the PDF, it can still be many megabytes. Hence it seems that I need to be able to 1) create font subsets and 2) merge existing ones.
The quirk is that I control neither the PDF forms (that are being filled out), nor their number, nor the order in which they are concatenated, so it is not possible to solve this by controlling what kind of fonts are embedded in them.
Adobe Acrobat can of course solve such a problem (it can create and also merge font subsets), but I need a programmatic, server-side solution. According to Google hits, iText cannot do this. Is there another library that I could use (or anything else I can do)?

Rendering PDF on a website à la Google Documents

In a current project I need to display PDFs in a web page. Right now we are embedding them with the Adobe PDF Reader, but I would rather have something more elegant (the reader does not integrate well, it cannot be overlaid with transparent regions, ...).
I envision something close to Google Documents, which displays PDFs as images but also allows text to be selected and copied out of the PDF (a requirement we have).
Does anybody know how they do this? Or of any library we could use to obtain a comparable result?
I know we could split the PDFs into images on the server side, but this would not allow for the selection of text ...
Thanks in advance for any help.
PS: Java-based project, using Wicket.
I have some suggestions, but this will definitely be hard to implement. Good luck!
First approach:
First, use a library like pdf-renderer (https://pdf-renderer.dev.java.net/) to convert the PDF into images. Store these images on your server or use a caching technique. Converting a PDF into an image is not hard.
Then, use the Type Select JavaScript library (http://www.typeselect.org/) to overlay textual data on the image. This overlaid text is selectable, while the text the user sees is still part of the original image. To get the original text, see the next approach, or do it yourself (see the conclusion).
The original text then has to be positioned over the image so that it lines up, which is a pain.
Second approach:
The PDF specification allows textual information to be linked to a font. Most documents use a subset of Type-3 or Type-1 fonts, which (often) use a standard character set (I thought it was Unicode, but I'm not sure). If your PDF document does not use a standard character set (i.e. it defines its own), it is impossible to know which characters correspond to which glyphs (symbols), and so you cannot convert it to a textual representation.
Read the PDF document, read the graphics objects, and parse the instructions for rendering text (use the PDF specification for more insight into this process), converting them to HTML. The HTML conversion can select appropriate tags (like <H1> and <p>, but also <b> and <i>) based on the parameters of the fonts used (their names and attributes) and the instructions (letter spacing, line spacing, size, face) in the graphics objects.
You can use the pdf-renderer library for reading and parsing the PDF files and then code an HTML translator yourself. This is not easy, and it does not cover all cases of PDF documents.
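The tag-selection step could look roughly like the sketch below. It is not taken from any library: the rule that a font name containing "bold" or "italic"/"oblique" implies <b> or <i>, and the 18-point heading threshold, are assumptions made purely for illustration, and the font name and size are assumed to have already been extracted by the parser.

```java
import java.util.Locale;

final class FontToHtml {
    // Assumed threshold: text at or above this size is treated as a heading.
    private static final float HEADING_SIZE = 18f;

    /** Wraps a run of text in HTML tags guessed from its font name and size. */
    static String toHtml(String text, String fontName, float fontSize) {
        String name = fontName.toLowerCase(Locale.ROOT);
        String html = escape(text);
        if (name.contains("bold")) {
            html = "<b>" + html + "</b>";
        }
        if (name.contains("italic") || name.contains("oblique")) {
            html = "<i>" + html + "</i>";
        }
        String block = fontSize >= HEADING_SIZE ? "h1" : "p";
        return "<" + block + ">" + html + "</" + block + ">";
    }

    /** Escapes the characters that would otherwise be read as markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Real documents name their fonts inconsistently (subset prefixes, foundry-specific names), so such heuristics will misfire on some files.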
With this approach you lose the original look of the document. Also, some PDF generation libraries do not use the Adobe font techniques. That is a problem for the first approach too: you can see the text but you cannot select it (Adobe Reader behaves the same way, so you might say it is not a big deal).
Conclusion:
You can choose the first approach, the second approach or both.
I wouldn't go in the direction of Optical Character Recognition (OCR): it is really overkill for this problem, and it has several drawbacks. This is the approach Google uses; if there are characters which are not recognized, a human being does the processing.
If you are into the human-processing thing, you can use just the Type Select library and PDF-to-image conversion and do the OCR yourself, which is probably the easiest (human as a machine = intelligently cheap, lol) way to solve the problem.

Long text input from user and PDF generation

I have built a web application that can be seen as an overcomplicated application form. There are a bunch of text areas, each with a character limit. After the form is submitted various things happen, and one of them is PDF generation.
The text is queried from the DB and inserted in the PDF template created in iReports. This works fine but the major pain is overflowing text.
The maximum number of characters is set based on 'average' text. But sometimes people prefer to write in CAPS or add plenty of linefeeds to format their text. These then cause the user's text to overflow the space given in the PDF. Unfortunately the PDF document must look like a real application form, so I cannot allow unlimited space.
What kinds of approaches have you used to tackle this?
Clean/restrict user input?
Calculate the space requirement of the text based on font metrics?
Provide preview of the PDF? (too bad users are not allowed to change their input after submission...)
Ideally, calculate the requirement based on metrics. I don't know how iReports handles text, but iText lays everything out itself: you just present the data as a streaming document, so we don't worry about overflowing text.
However, iReport may not support that, or you may need the PDF layout to fit within certain bounds. I'd try to clean the input (i.e. if it's all caps, convert it to lowercase/sentence case/proper case) and strip extra whitespace. If cleaning the input can't be done reliably, or people are still getting past that, I'd also restrict it.
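A cleaning pass of that kind might be sketched as follows. The specific rules (collapse runs of spaces and tabs, allow at most one blank line in a row, convert text with no lowercase letters to sentence case) are assumptions chosen for illustration, and the sentence-case step is deliberately crude: it would, for example, mangle abbreviations and proper nouns.

```java
import java.util.Locale;

final class InputCleaner {
    /** Normalizes whitespace and tones down all-caps text. */
    static String clean(String input) {
        String s = input.replace("\r\n", "\n").replace('\r', '\n');
        s = s.replaceAll("[ \\t]+", " ");      // collapse runs of spaces and tabs
        s = s.replaceAll(" ?\\n ?", "\n");     // drop spaces around line breaks
        s = s.replaceAll("\\n{3,}", "\n\n");   // keep at most one blank line in a row
        s = s.trim();
        return isAllCaps(s) ? toSentenceCase(s) : s;
    }

    /** True if the text contains letters and none of them are lowercase. */
    static boolean isAllCaps(String s) {
        boolean hasUpper = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLowerCase(c)) return false;
            if (Character.isUpperCase(c)) hasUpper = true;
        }
        return hasUpper;
    }

    /** Lowercases everything, then capitalizes the first letter after '.', '!', '?' or a line break. */
    static String toSentenceCase(String s) {
        String lower = s.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(lower.length());
        boolean startOfSentence = true;
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (startOfSentence && Character.isLetter(c)) {
                out.append(Character.toUpperCase(c));
                startOfSentence = false;
            } else {
                out.append(c);
            }
            if (c == '.' || c == '!' || c == '?' || c == '\n') {
                startOfSentence = true;
            }
        }
        return out.toString();
    }
}
```

Running the same cleaning before the character limit is checked keeps the limit and the PDF output consistent with each other.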
As a last resort, I'd present the PDF for the user to authorize. Really, users shouldn't be given more work to do, and they're not going to do it anyway.
Your own suggested solutions to your problem are all good. Probably the most important question to have answered is what should your PDF look like when the data to be displayed in a field won't fit? Do you ever need the "full answer" for anything else? When you know the answer to these, you'll have your options reduced.
For example if a field must be limited to 1/2 a page, and users sometimes enter more than 1/2 a page of text you can either
1) limit the user input - on submission calculate the size (using font-metrics as you said) and reject the submission until corrected. This assumes you can legitimately force the user to reduce their data entry.
2) accept the user input and truncate in the display of this report. Some systems use "..." to indicate data has been truncated, and can provide a hyperlink (even within the PDF) to get more information.
Providing a preview would work really well, but only if the users are good at checking and correcting and your system can handle the extra load this will generate.
Do you have control of the font that is used when generating the PDF? If so, I would look for a font in the monospace family. This will give you a consistent length for a given number of characters, regardless of punctuation, capitalization, etc.
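With a monospace font, the space check reduces to counting lines. Here is a minimal sketch, assuming greedy word wrapping and that the number of columns per line (box width divided by the font's fixed glyph advance) and the number of lines in the box are already known; how iReport itself wraps text may differ.

```java
final class MonospaceFit {
    /** Lines needed when every linefeed starts a new line and words wrap greedily at `columns` characters. */
    static int linesNeeded(String text, int columns) {
        int lines = 0;
        // The -1 limit keeps trailing empty strings, so trailing linefeeds count too.
        for (String paragraph : text.split("\n", -1)) {
            lines += wrappedLines(paragraph, columns);
        }
        return lines;
    }

    private static int wrappedLines(String paragraph, int columns) {
        int lines = 1;   // even an empty paragraph occupies one line
        int used = 0;    // characters already placed on the current line
        for (String word : paragraph.trim().split(" +")) {
            if (word.isEmpty()) continue;
            int len = word.length();
            int needed = (used == 0) ? len : used + 1 + len;
            if (needed <= columns) {
                used = needed;
                continue;
            }
            if (used > 0) {          // word does not fit: move to a fresh line
                lines++;
            }
            while (len > columns) {  // a word longer than a line is hard-broken
                lines++;
                len -= columns;
            }
            used = len;
        }
        return lines;
    }

    /** True if the text fits a box of `columns` by `maxLines`. */
    static boolean fits(String text, int columns, int maxLines) {
        return linesNeeded(text, columns) <= maxLines;
    }
}
```

A check like fits(userText, 80, 12) could run at submission time, so that text full of linefeeds is rejected before it reaches the PDF; the 80 and 12 here are placeholder dimensions.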

BarCode Image Generator in Java

How can I create a barcode image in Java? I need something that will allow me to enter a number and produce the corresponding barcode image. Is there a free library available for this type of task?
iText is a great Java PDF library. They also have an API for creating barcodes. You don't need to be creating a PDF to use it.
This page has the details on creating barcodes. Here is an example from that site:
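BarcodeEAN codeEAN = new BarcodeEAN();
// EAN13 is a static constant, so reference it through the Barcode class
codeEAN.setCodeType(Barcode.EAN13);
codeEAN.setCode("9780201615883");
// cb is the PdfContentByte of the document being written (not shown in this excerpt)
Image imageEAN = codeEAN.createImageWithBarcode(cb, null, null);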
The biggest thing you will need to determine is what type of barcode you need. There are many different barcode formats and iText does support a lot of them. You will need to know what format you need before you can determine if this API will work for you.
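The format also determines what input is valid. For EAN-13, as used in the example above, the code is 13 digits and the last one is a check digit computed from the first twelve. The sketch below uses only the standard EAN-13 weighting (1 and 3, alternating from the left) and no barcode library; the class and method names are made up for illustration.

```java
final class Ean13 {
    /** Computes the check digit for the first 12 digits of an EAN-13 code. */
    static int checkDigit(String first12) {
        if (!first12.matches("\\d{12}")) {
            throw new IllegalArgumentException("expected exactly 12 digits");
        }
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            int digit = first12.charAt(i) - '0';
            // Positions are weighted 1, 3, 1, 3, ... starting from the left.
            sum += (i % 2 == 0) ? digit : 3 * digit;
        }
        return (10 - sum % 10) % 10;
    }

    /** True if the string is 13 digits and its last digit matches the computed check digit. */
    static boolean isValid(String code) {
        return code.matches("\\d{13}")
                && checkDigit(code.substring(0, 12)) == code.charAt(12) - '0';
    }
}
```

Validating the number this way before handing it to a barcode library makes it possible to give the user a clear error message instead of relying on how the library reacts to bad input.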
There is also Barbecue, a free API that you can use to create barcodes in Java.
There is a free library called barcode4j
ZXing is a free open source Java library to read and generate barcode images. You need to get the source code and build the jars yourself. Here's a simple tutorial that I wrote for building with ZXing jars and writing your first program with ZXing.
[http://www.vineetmanohar.com/2010/09/java-barcode-api/]
I use Barbecue; it's great, and supports a very wide range of different barcode formats. See if you like its API.
Sample API:
public static Barcode createCode128(java.lang.String data) throws BarcodeException
Creates a Code 128 barcode that dynamically switches between character sets to give the smallest possible encoding. This will encode all numeric characters, upper and lower case alpha characters and control characters from the standard ASCII character set. The size of the barcode created will be the smallest possible for the given data, and use of this "optimal" encoding will generally give smaller barcodes than any of the other 3 "vanilla" encodings.
