Basically, assuming I have a link to a Google Slides presentation (as provided by a user), for example:
https://docs.google.com/presentation/d/1c3TbLKMVwOqgP70l0ph2jSvIHAaZnZoSMnvW8cxs8Ik/edit?usp=sharing
Ultimately, I want to answer the following:
Is there a way to get the slides into an array of some sort of image file?
Then, is there a way to get the speaker notes into an array of Strings?
Since we have the link (assuming it is set to "Public - anyone can view"), we theoretically have access to all this information, since we can see it on the Google Slides page. But how do we extract it algorithmically? Is there a specific place on Google's servers where this information is stored, from which I can just retrieve it?
You can download the whole presentation in various formats:
https://docs.google.com/presentation/d/<PRESENTATION_ID>/export/<FORMAT>
Possible formats are: pdf, pptx, odp, txt (strings only)
e.g.
https://docs.google.com/presentation/d/1c3TbLKMVwOqgP70l0ph2jSvIHAaZnZoSMnvW8cxs8Ik/export/pptx
To download slides as images, you have to specify a particular slide (page):
https://docs.google.com/presentation/d/<PRESENTATION_ID>/export/<FORMAT>?pageid=<PAGE_ID>
Possible formats are: jpeg, svg, png
e.g.
https://docs.google.com/presentation/d/1c3TbLKMVwOqgP70l0ph2jSvIHAaZnZoSMnvW8cxs8Ik/export/jpeg?pageid=g1f5653c4cc_0_437
You can find the page id at the end of a URL (http://docs.google...slide=id.<PAGE_ID>).
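For example, here is a minimal Java sketch that downloads these export URLs to local files. It is only an illustration of the URL pattern above, not an official API: it works only while the presentation is shared as "anyone with the link can view", and the page id below is just the one from the example URL.

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class SlideExporter {

    static final String BASE =
            "https://docs.google.com/presentation/d/1c3TbLKMVwOqgP70l0ph2jSvIHAaZnZoSMnvW8cxs8Ik";

    public static void main(String[] args) throws IOException {
        // Whole deck in one file
        download(BASE + "/export/pptx", Paths.get("deck.pptx"));
        // A single slide as an image; the page id comes from the editor URL
        download(BASE + "/export/png?pageid=g1f5653c4cc_0_437", Paths.get("slide.png"));
    }

    static void download(String url, Path target) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}

To build the "array of images" from the question, loop over the page ids and collect the downloaded files.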
Related
I want to write a function in my Android application that will confirm that a picture was taken by a specific mobile device.
Today I can tell if the picture was taken by a specific model (like a Galaxy S3).
But I want more than that.
I would like to get a unique id from the image's EXIF data and compare it to the device's unique id.
Thanks for help.
Maybe instead of writing a lot of code yourself, you can use a ready-made library for that.
Metadata metadata = ImageMetadataReader.readMetadata(imageFile);
This is the example code used by the library: https://drewnoakes.com/code/exif/
It would give you the metadata for the image, which is the content you want: the geolocation, cellular information, resolution, focus, and all other data attached to the image.
A quotation from the link:
It understands several formats of metadata, all of which may be present in a single image. In addition it can decode values that are specific to particular camera manufacturers and models.
Then you can check those details for the image and show it, hide it, or do whatever else you want with it; a fuller sketch follows below.
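A slightly fuller sketch, assuming a recent metadata-extractor version (the file name is a placeholder): it dumps every tag the library finds and then reads the camera model from the EXIF IFD0 directory. Note that most phones do not write a unique per-device id into EXIF, so an exact device match may simply not be possible.

import java.io.File;
import com.drew.imaging.ImageMetadataReader;
import com.drew.metadata.Directory;
import com.drew.metadata.Metadata;
import com.drew.metadata.Tag;
import com.drew.metadata.exif.ExifIFD0Directory;

public class ExifDump {
    public static void main(String[] args) throws Exception {
        File imageFile = new File("photo.jpg"); // placeholder path
        Metadata metadata = ImageMetadataReader.readMetadata(imageFile);

        // Dump every tag the library found, grouped by directory (Exif, GPS, JPEG, ...)
        for (Directory directory : metadata.getDirectories()) {
            for (Tag tag : directory.getTags()) {
                System.out.println(tag);
            }
        }

        // The camera model (e.g. "GT-I9300" for a Galaxy S3) lives in IFD0
        ExifIFD0Directory ifd0 = metadata.getFirstDirectoryOfType(ExifIFD0Directory.class);
        if (ifd0 != null) {
            System.out.println("Model: " + ifd0.getString(ExifIFD0Directory.TAG_MODEL));
        }
    }
}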
You can get the code from Google Code too: https://code.google.com/p/metadata-extractor/
Good luck!
I have tagged a few keywords to an image in Photoshop and I'm attempting to read these keywords using Java. Right now I'm only able to read the image's format, with the help of other questions that have been asked here. I have gone through the ImageIO class documentation and I could not find any methods that extract keywords from an image. Is there any other way to extract keywords and other information, such as the date and time created?
Can anybody help me with reading data from a Google webpage? For example, I want to read the links, the author names below the links, and the PDF or HTML links on the right side into my database using Java.
Please find the link here:
http://scholar.google.com/scholar?hl=en&q=visualization&btnG=&as_sdt=1%2C4&as_sdtp=
What you're asking about is called data extraction. You need to load the HTML page and then logically select the pieces of information from the HTML.
Start by using an HTML parser to read the HTML page, then look for patterns in how Google lays out its Scholar results. You might find that things are listed in an unordered list, or maybe certain elements have an identifying tag or class that you can use to extract the data you want.
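Here is a minimal sketch of that idea using the jsoup HTML parser. The CSS classes (".gs_rt" for result titles, ".gs_a" for the author line) are an assumption based on Google Scholar's markup at the time of writing and may change.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScholarScraper {
    public static void main(String[] args) throws Exception {
        String url = "http://scholar.google.com/scholar?hl=en&q=visualization";

        // Without a browser-like user agent Google tends to block the request
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0").get();

        // Result titles and their target links
        for (Element title : doc.select(".gs_rt a")) {
            System.out.println(title.text() + " -> " + title.attr("href"));
        }
        // Author line under each title
        for (Element authors : doc.select(".gs_a")) {
            System.out.println(authors.text());
        }
    }
}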
I have created a program that should one day become a PDF editor.
Its purpose will be saving the GUI's textual content to the PDF and loading it back from it. The GUI resembles a text editor, but it only has certain fields (JTextAreas, actually).
It can look like this (this is only one page; it can have many more, and the upper and lower margins are cut out of the picture). It should actually resemble A4 in pixel size.
I have looked around a bit for PDF libraries and found that iText could suit my PDF-creation needs. However, if I understood correctly, it retrieves text from a whole page as a single string, which won't work for me, because I will need to detect different fields/paragraphs/something in order to load them back into the program.
Now, I'm a bit lazy, but I don't want to spend hours going through numerous PDF libraries just to find out that they won't work for me.
Instead, I'm asking someone with a bit more Java PDF handling experience to recommend one according to my needs.
Or maybe recommend how to add invisible parts to the PDF which will help my program determine where exactly it is situated inside the PDF file...
Just to be clear (I formed my question wrong before): the only thing I need to put in my PDF is text, and that's all I need to be able to get out later. My program should be able to read PDFs which it created itself...
Also, because of the designated use of files created with this program, they need to be in the PDF format.
Short Answer: Use an intermediate format like JSON or XML.
Long Answer: You're using PDFs in a manner they weren't designed for. PDFs were not designed to store data; they were designed to present and format data in a portable form. Furthermore, a PDF is a very "heavy" way to store data. I suggest storing your data in another manner, perhaps in a format like JSON or XML.
The advantage now is that you are not tied to a specific output format like PDF. This can come in handy later on if you decide that you want to export your data into another format (like a Word document, or an image), because you now have a common representation.
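As a minimal sketch of that common representation, assuming the Gson library and a made-up model class that mirrors your JTextAreas:

import java.util.ArrayList;
import java.util.List;
import com.google.gson.Gson;

public class PageData {
    // One entry per JTextArea on a page (hypothetical model)
    List<String> fields = new ArrayList<>();

    String toJson() {
        return new Gson().toJson(this);
    }

    static PageData fromJson(String json) {
        return new Gson().fromJson(json, PageData.class);
    }
}

From this representation you can render a PDF with iText, and later a Word document or an image, without ever having to parse the PDF back.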
I found this link and another link that provide examples showing how to store and read back metadata in your PDF. This might be what you're looking for, but again, I don't recommend it.
If you really insist on using PDF to store data, I suggest that you store the actual data in either XML or RDF and attach that to the PDF file when you generate it. Then you can read the XML back to get the data.
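A sketch of that route with iText 5 (the com.itextpdf package names are assumed; the XML payload is made up): the visible text goes into the document as usual, and the machine-readable source travels along as an embedded file.

import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfFileSpecification;
import com.itextpdf.text.pdf.PdfWriter;

public class PdfWithAttachedData {
    public static void main(String[] args) throws Exception {
        String xml = "<page><field name=\"title\">Hello</field></page>"; // hypothetical payload

        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("out.pdf"));
        document.open();
        document.add(new Paragraph("Hello")); // what the user sees
        // Embed the structured source next to the rendered text
        PdfFileSpecification fs = PdfFileSpecification.fileEmbedded(
                writer, null, "data.xml", xml.getBytes(StandardCharsets.UTF_8));
        writer.addFileAttachment("source data", fs);
        document.close();
    }
}

On load, extract the attachment and parse the XML instead of parsing the page content.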
Assuming that your application will only consume PDF files generated by the same application, there is one part of the PDF specification, called Marked Content, that was introduced precisely for this purpose. Using Marked Content you can specify the structure of the text in your document (chapter, paragraph, etc.).
Read Chapter 14 - Document Interchange of the PDF Reference Document for more details.
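A rough sketch of what emitting marked content could look like in iText 5 (the "Field"/"Name" keys are invented for this example; check the signatures against the iText version you use):

import java.io.FileOutputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfString;
import com.itextpdf.text.pdf.PdfWriter;

public class MarkedContentSketch {
    public static void main(String[] args) throws Exception {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("marked.pdf"));
        document.open();

        PdfContentByte cb = writer.getDirectContent();
        BaseFont font = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED);

        // Wrap the text operators in a BDC/EMC pair whose property dictionary
        // names the originating JTextArea, so the reader side can map the
        // text back to the right field
        PdfDictionary props = new PdfDictionary();
        props.put(new PdfName("Name"), new PdfString("titleField"));
        cb.beginMarkedContentSequence(new PdfName("Field"), props, true);
        cb.beginText();
        cb.setFontAndSize(font, 12);
        cb.setTextMatrix(36, 750); // x, y from the lower-left corner
        cb.showText("Text of the title JTextArea");
        cb.endText();
        cb.endMarkedContentSequence();

        document.close();
    }
}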
In a current project I need to display PDFs in a webpage. Right now we are embedding them with the Adobe PDF Reader, but I would rather have something more elegant (the reader does not integrate well; it cannot be overlaid with transparent regions, ...).
I envision something close to Google Docs, where they display PDFs as images but also allow text to be selected and copied out of the PDF (a requirement we have).
Does anybody know how they do this? Or of any library we could use to obtain a comparable result?
I know we could split the PDFs into images on the server side, but this would not allow for the selection of text...
Thanks in advance for any help
PS: Java based project, using wicket.
I have some suggestions, but it'll definitely be hard to implement this stuff. Good luck!
First approach:
First, use a library like pdf-renderer (https://pdf-renderer.dev.java.net/) to convert the PDF into an image. Store these images on your server or use a caching technique. Converting a PDF into an image is not hard; see the sketch below.
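A sketch of the rendering step with pdf-renderer, following its own example code (the file names are placeholders):

import java.awt.Image;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import javax.imageio.ImageIO;
import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

public class PdfToImage {
    public static void main(String[] args) throws Exception {
        // Map the PDF into memory, as the pdf-renderer examples do
        RandomAccessFile raf = new RandomAccessFile(new File("input.pdf"), "r");
        FileChannel channel = raf.getChannel();
        ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        PDFFile pdf = new PDFFile(buf);

        PDFPage page = pdf.getPage(1); // pages are numbered from 1
        Rectangle rect = new Rectangle(0, 0,
                (int) page.getBBox().getWidth(), (int) page.getBBox().getHeight());

        // Render synchronously (the final 'true' waits for drawing to finish)
        Image img = page.getImage(rect.width, rect.height, rect, null, true, true);
        BufferedImage out = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);
        out.getGraphics().drawImage(img, 0, 0, null);
        ImageIO.write(out, "png", new File("page-1.png"));
        raf.close();
    }
}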
Then, use the Type Select JavaScript library (http://www.typeselect.org/) to overlay the textual data on the image. That overlay text is selectable, while the real text is still in the original image. To get the original text, see the next approach, or do it yourself; see the conclusion.
The original text then must be overlaid on the image, which is a pain.
Second approach:
The PDF specification allows textual information to be linked to a font. Most documents use a subset of Type 3 or Type 1 fonts which (often) use a standard character set (I thought it was Unicode, but I'm not sure). If your PDF document does not use a standard character set (i.e. it has defined its own), it's impossible to know which characters map to which glyphs (symbols), and thus you are unable to convert them to a textual representation.
Read the PDF document, read its graphics objects, and parse the instructions for rendering text (use the PDF specification for more insight into this process), converting them to HTML. The HTML conversion can select appropriate tags (like <H1> and <p>, but also <b> and <i>) based on the parameters of the fonts used (their names and attributes) and the instructions in the graphics objects (letter spacing, line spacing, size, face).
You can use the pdf-renderer library for reading and parsing the PDF files and then code an HTML translator yourself. This is not easy, and it does not cover all cases of PDF documents.
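If coding the translator on top of pdf-renderer turns out to be too much, note that iText (a different library, so this is a swapped-in suggestion) ships a parser package that handles the raw text-extraction part for you; styling information still requires a custom render listener:

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class PdfTextDump {
    public static void main(String[] args) throws Exception {
        PdfReader reader = new PdfReader("input.pdf");
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            // Plain text only; fonts, sizes and positions need a custom
            // RenderListener instead of this simple extractor
            System.out.println(PdfTextExtractor.getTextFromPage(reader, i));
        }
        reader.close();
    }
}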
In this approach you will lose the original look of the document. There are some PDF generation libraries which do not use the Adobe font techniques. This is also a problem with the first approach: even though you can see the text, you cannot select it (but that matches the behavior of the official Adobe Reader, so not a big deal, you might say).
Conclusion:
You can choose the first approach, the second approach or both.
I wouldn't go in the direction of Optical Character Recognition (OCR), since it's really overkill for such a problem and it has several drawbacks. This is the approach Google is using: if there are characters which go unrecognized, a human being does the processing.
If you are into the human-processing thing, you can use just the Type Select library and the PDF-to-image conversion and do the OCR yourself, which is probably the easiest (human as a machine = intelligently cheap, lol) way to solve the problem.