I'm printing to a file. Is there a way to print text with a strikethrough? I have done some googling, but did not find any applicable answers.
You would have to save the file as a PDF or HTML file, or create some kind of word-processor document. Simple text (or, more correctly, plain text) has no formatting, in any language.
I'd recommend HTML. It is simple to create (PDF is a pain), gives you the option of other formatting (people always end up asking for a heading), lets you format data as tables (managers love tables), and opens anywhere (it could even be served from a web server, eliminating printing and tree-killing altogether).
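A minimal sketch of the HTML route in Java, assuming Java 11+ (for `Files.writeString`). It uses the standard `<s>` element, which browsers render as struck-through text by default (`<del>` looks the same but means "deleted"). The class name, output file name and page title are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StrikeHtml {
    private StrikeHtml() {}

    /** Escapes the characters that are special in HTML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Escapes the text and wraps it in <s>, which browsers draw struck through. */
    static String strike(String text) {
        return "<s>" + escape(text) + "</s>";
    }

    /** Builds a minimal standalone HTML page around the given body markup. */
    static String page(String bodyHtml) {
        return "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"><title>Report</title></head>\n"
                + "<body>" + bodyHtml + "</body></html>\n";
    }

    public static void main(String[] args) throws IOException {
        // "$20" is struck through, "$15" is not
        String body = "<p>Price: " + strike("$20") + " $15</p>";
        Files.writeString(Path.of("report.html"), page(body), StandardCharsets.UTF_8);
    }
}
```

Remember to escape any text you write into the page; otherwise a stray `<` in your data can break the markup.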
If you want to force it in plain text, you can use Unicode escapes for characters that already have a stroke through them. The escape syntax looks like this:
String pi = "\u03C0"; // π
http://unicode-table.com/de/0268/
This one, for example, is ɨ (U+0268, Latin small letter i with stroke).
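A related plain-text trick, different from picking pre-struck letters like ɨ: append U+0336 (COMBINING LONG STROKE OVERLAY) after every character. This is only a sketch; whether the result actually looks struck through depends on the font and the program used to view the file, and some software handles combining characters poorly. The class name is hypothetical.

```java
final class StrikeText {
    private StrikeText() {}

    /** U+0336 COMBINING LONG STROKE OVERLAY, drawn across the preceding character. */
    private static final String OVERLAY = "\u0336";

    /** Returns the input with U+0336 appended after every code point. */
    static String strike(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        // iterate by code point so characters outside the BMP are not split
        s.codePoints().forEach(cp -> sb.appendCodePoint(cp).append(OVERLAY));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strike("struck text"));
    }
}
```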
So I have a template PDF for an agenda, and what I want to know is how to detect where the date should go.
Let's say the template contains the word “DATE:”.
I want to detect “DATE:” and add the corresponding date/text right after it, so that after writing it looks something like “DATE: 13/02/2020”, and then save the result as a new PDF.
You tagged your question both java and python-3.x, which makes it very broad. My answer is therefore generic, not specific. In general, you should decide which language you are asking about.
For your task you will need to do two things:
1. First, apply text extraction with coordinates to your PDF, search the text for that DATE marker, and determine the coordinates right after that text piece. Some libraries offer a shortcut: routines that extract only the text matching a regular expression, together with its coordinates.
2. Then add your text to the page content at those coordinates.
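To make step 1 concrete, here is a library-neutral sketch of the "find DATE: and compute where to write" part. It assumes your PDF library hands you each extracted glyph with its x/y position and width; in Apache PDFBox, for instance, you would typically collect these in a `PDFTextStripper` subclass that overrides `writeString(String, List<TextPosition>)`. The `Glyph` and `InsertPoint` records are hypothetical stand-ins, and depending on the library the y value may be measured from the top of the page and need flipping before you draw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for one extracted glyph: its text, position and width. */
record Glyph(String text, float x, float y, float width) {}

/** Hypothetical result type: where the fill-in text could start. */
record InsertPoint(float x, float y) {}

final class MarkerLocator {
    private MarkerLocator() {}

    /**
     * Joins the glyph texts, finds the first occurrence of marker, and returns the
     * point just right of the marker's last glyph, on the same baseline.
     */
    static Optional<InsertPoint> pointAfter(List<Glyph> glyphs, String marker) {
        if (marker.isEmpty()) return Optional.empty();
        StringBuilder text = new StringBuilder();
        // owner.get(i) is the index of the glyph that produced character i of the joined text
        List<Integer> owner = new ArrayList<>();
        for (int g = 0; g < glyphs.size(); g++) {
            String t = glyphs.get(g).text();
            text.append(t);
            for (int k = 0; k < t.length(); k++) owner.add(g);
        }
        int start = text.indexOf(marker);
        if (start < 0) return Optional.empty();
        Glyph last = glyphs.get(owner.get(start + marker.length() - 1));
        return Optional.of(new InsertPoint(last.x() + last.width(), last.y()));
    }
}
```

Step 2 is then "draw the date string at that point" with whatever text-drawing call your library offers.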
Neither Java nor Python has PDF support built into its standard library, so you'll have to choose a PDF library for these tasks. (Theoretically you could implement your own PDF processing routines, but the PDF format is quite complex, so in general that would take a very long time.)
So you should first check which general-purpose PDF library for your chosen language best suits these tasks and your other requirements (like licensing). There are many questions and answers on Stack Overflow about text extraction that may help you choose.
A word of warning, though: not all PDFs allow proper text extraction. Some PDF generators don't add the information required for text extraction, and some even add misleading information. Thus, you might have to reject some templates. Alternatively, if the template is fixed, simply determine the correct coordinates for text insertion by measuring in a PDF viewer or by trial and error.
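If you go the measuring route: PDF drawing coordinates use a default unit of 1/72 inch (a point) with the origin at the bottom-left corner, whereas many viewers and rulers give millimetres measured from the top. Below is a small conversion sketch with hypothetical names, assuming an unrotated page whose box starts at (0, 0); an A4 page is about 595 by 842 points.

```java
final class PdfUnits {
    private PdfUnits() {}

    /** Default PDF user-space unit is 1/72 inch. */
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    /** Converts a length in millimetres to PDF points. */
    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /**
     * Converts a position measured in mm from the top-left corner of the page
     * to default PDF user space, whose origin is the bottom-left corner.
     * Assumes an unrotated page whose box starts at (0, 0).
     */
    static double[] topLeftMmToPdf(double xMm, double yFromTopMm, double pageHeightPt) {
        return new double[] { mmToPoints(xMm), pageHeightPt - mmToPoints(yFromTopMm) };
    }
}
```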
And if you still have influence on the requirements, propose using templates with PDF AcroForm form fields. Form fields give the template designer more control over the positioning and styling of the fill-ins, and filling them in is easier than the process outlined above. If you don't want form fields in the resulting PDFs, simply flatten the forms after filling them in.
When looking for a Java library which can produce (and read) SpreadsheetML 2003 files, I came across the Xelem library.
It almost perfectly fits all my needs; however, it does not seem to provide the possibility to format parts of strings in cells individually.
That is, I need to be able to make some words in a cell bold while others stay non-bold, which is why I cannot use styles (they format the whole cell uniformly):
"non-bold-text and bold-text"
When I create and save files containing such cells with Excel, Excel uses bold tags from the HTML namespace, like this:
...
<Cell><ss:Data ss:Type="String" xmlns="http://www.w3.org/TR/REC-html40">non-bold-text and <B>bold-text</B></ss:Data></Cell>
...
However, I cannot find any way to get Xelem to generate such code. It does not seem to offer a way to specify additional namespaces for a cell.
Am I missing something here, and if not, is there a (simple) workaround for this?
Hints for alternative libraries are also appreciated, but the file format needs to be SpreadsheetML 2003.
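One possible workaround, sketched under assumptions: generate the rich-text cell markup yourself and splice it into the XML that gets written (for example by post-processing the saved file). Whether that fits Xelem's API is not something I can confirm. The `Run` record and `RichCell` class are hypothetical; the output mirrors the Excel markup shown above and relies on the `ss` prefix being declared on an enclosing element such as `Workbook`.

```java
import java.util.List;

/** Hypothetical text run: a piece of cell text and whether it should be bold. */
record Run(String text, boolean bold) {}

final class RichCell {
    private RichCell() {}

    /** Escapes characters that are special in XML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a SpreadsheetML 2003 Cell whose text mixes plain and HTML <B> runs. */
    static String cellXml(List<Run> runs) {
        StringBuilder sb = new StringBuilder(
                "<Cell><ss:Data ss:Type=\"String\" xmlns=\"http://www.w3.org/TR/REC-html40\">");
        for (Run r : runs) {
            String t = escape(r.text());
            sb.append(r.bold() ? "<B>" + t + "</B>" : t);
        }
        return sb.append("</ss:Data></Cell>").toString();
    }
}
```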
I have a .csv file containing text. I'm supposed to parse the data and, based on specific keywords, replace those words with the HTML tags needed to link the keywords to a website.
So far, I have written a .csv parser and writer that gets all the required columns from the first file and writes them to a newly created .csv file (e.g. the text id in one cell, the text title in the next cell, and the actual text in the next cell).
Now I am still waiting for a list of keywords, as well as the website hierarchy and the links to use, but to be honest I have no idea how to continue. Somehow I'll have to parse down the website hierarchy to where the text title is, consider only the elements beneath it, and link them to keywords in my text. How can this be done? Is there special software, or are there extensions, libraries or packages for Java, that do something like this?
Any help would be appreciated; I'm running on a deadline here...
THX!
P.S.: I am coding all of it in Java.
I'm not sure, but it sounds like you want to create an href column in your output:
<a href="https://www.w3schools.com">Visit W3Schools</a>
You could do this most simply by concatenating the strings:
String makeHref(String title, String id, String link) {
return "<a href=" + ... etc. }
before you write out the second CSV. You'll need to escape the double quotes, of course.
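A runnable version of that idea, as a sketch: `makeHref` here drops the `id` parameter because the answer doesn't say how it is used, and `csvField` shows the CSV side of "escaping the quotes" (RFC 4180 style: wrap the field in double quotes and double any quotes inside it). If your CSV writer already quotes fields, don't quote twice. The class name and the helpers other than `makeHref` are made up.

```java
final class HrefCsv {
    private HrefCsv() {}

    /** Escapes characters that are special in HTML text and double-quoted attributes. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Builds an anchor; \" is the Java-literal escape for a double quote. */
    static String makeHref(String title, String link) {
        return "<a href=\"" + escapeHtml(link) + "\">" + escapeHtml(title) + "</a>";
    }

    /** Quotes a CSV field: wrap it in double quotes and double any embedded ones. */
    static String csvField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}
```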
It's also entirely possible that I didn't understand the question. You may want to try to be more specific if that's the case.
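For the keyword part of the question, one way to sketch it in plain Java is a single regular expression built from all keywords, so each occurrence is replaced exactly once. This assumes keywords start and end with letters or digits (so `\b` word boundaries work), matching is case-sensitive, and the URLs come from the list the asker is still waiting for; the example.com addresses in the test are placeholders. It does not cover the website-hierarchy part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordLinker {
    private KeywordLinker() {}

    /**
     * Wraps every whole-word occurrence of a keyword in an <a> tag pointing to its URL.
     * Longer keywords come first in the alternation, so "Java EE" wins over "Java".
     */
    static String link(String text, Map<String, String> urls) {
        if (urls.isEmpty()) return text;
        List<String> keys = new ArrayList<>(urls.keySet());
        keys.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder alternation = new StringBuilder();
        for (String k : keys) {
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(k)); // treat keywords literally
        }
        Matcher m = Pattern.compile("\\b(?:" + alternation + ")\\b").matcher(text);
        // replaceAll interprets $ and \ in the returned string, so quote it first
        return m.replaceAll(r -> Matcher.quoteReplacement(
                "<a href=\"" + urls.get(r.group()) + "\">" + r.group() + "</a>"));
    }
}
```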
I'm pretty sure the answer I'm going to get is: "why don't you just make all the text files the same, or have them follow some set format?" Unfortunately I don't have that option, but I was wondering if there is a way to take any text file and translate it into another text or XML file that always looks the same.
The text files pretty much have the same data just arranged differently.
The closest I can come up with is to have an XSLT sheet for each text file, but then I have to turn around and read the file that was just created, delete it, and repeat for each text file.
So, is there a way to grab the data from text files that essentially contain the same data stored differently, and store this data in an object that I could then reuse later on in some process?
If it were up to me, I would push for every text file to follow some predefined format, since they all contain pretty much the same data, but it's not up to me.
Odd question... You say they are text files, yet you mention XSLT as a possible solution. XSLT only works if the source is XML; if that is the case, please rephrase the question. When you say text files, I assume delimiter-separated (e.g. CSV), fixed-length, and so on.
There are some parsers out there (like Smooks) that let you parse multiple formats, but you will still have to do the "mapping" yourself, of course.
This is a typical problem in the integration world, so any integration tool should offer you a solution (e.g. WSO2, Fuse, ...).
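The "mapping yourself" part can be as simple as one small parser per layout, all producing the same object, so nothing has to be written to an intermediate file and read back. A sketch assuming Java 16+ (records, `Stream.toList`); the `Item` fields, the format names and the column positions are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical common shape that all the differently formatted files are mapped into. */
record Item(String name, int quantity) {}

final class Normalizer {
    private Normalizer() {}

    /** One line parser per known input layout; the keys are made-up format names. */
    static final Map<String, Function<String, Item>> PARSERS = Map.of(
            // delimiter-separated, e.g. "Widget,12"
            "csv", line -> {
                String[] f = line.split(",", -1);
                return new Item(f[0].trim(), Integer.parseInt(f[1].trim()));
            },
            // fixed-length: name in columns 0-9, quantity in columns 10-14
            "fixed", line -> new Item(line.substring(0, 10).trim(),
                    Integer.parseInt(line.substring(10, 15).trim())));

    /** Parses all lines of one file with the parser registered for its format. */
    static List<Item> parse(String format, List<String> lines) {
        Function<String, Item> parser = PARSERS.get(format);
        if (parser == null) throw new IllegalArgumentException("Unknown format: " + format);
        return lines.stream().map(parser).toList();
    }
}
```

Supporting a new layout then means adding one entry to the map rather than a new XSLT sheet plus a temporary file.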
I have a program that will be used to build a database of questions. I'm making it for a site that wants users to know that the content was downloaded from that site. That's why I want the output to be PDF: almost everyone can view it, and almost nobody can edit it (and remove e.g. a footer or watermark, unlike with some simpler file types). That explains why it HAS to be PDF.
This program will be used by many users who will create new databases or expand existing ones. That's why producing the output as multiple files would be an extremely sloppy and inefficient way of achieving what I want (it would complicate things for the user).
What I want to do is create PDF files that are still editable with my program once created.
I want to achieve this by embedding my own custom file format, readable by my program, in the output PDF.
I came up with three ways of doing that:
1. Attach the file to the PDF and then corrupt the part of the PDF that contains it, so that the PDF is no longer aware that it contains the file, making it impossible for the user to notice it (easily). When reading the document, I'd revert the corruption and extract the file using one of the many PDF libraries.
2. Hide the file inside an image added somewhere on the first or last page of the PDF, somehow hidden from the public eye (I still need to work that out). Knowing its location, it should be relatively easy to retrieve it using a PDF library.
3. I have learned that if you put a "%" sign as the first character of a line inside a PDF, the whole line is ignored by the PDF reader (similar to "//" in Java), at least by Adobe Reader. That makes it possible for me to add as many lines as I want to the PDF (if I know where, and I do) without the end user being aware of it. I could embed my whole custom file in the PDF that way. The problem here is that I actually have to read the PDF using one of Java's input readers, and I'm not sure which one. I understand that a PDF can't be read like a text file since it's a binary file (right?).
In the end, I decided to go with the method number 3.
That is, unless someone has a better idea. The conditions are:
1. One file only. And that file is PDF.
2. User must not be aware of the addition.
The problem is that I don't know how to read the PDF as a file (I'm not trying to read it as a PDF, which I would do using a PDF library).
So, does anyone have a better idea?
If not, how do I read a PDF as a FILE, so that the output is an array of characters (with newline detection), and then rewrite the whole file with my added content?
In Java, there is no real difference between text and binary files; you can read both as an InputStream. The difference is that for binary files you can't really create a Reader, because a Reader assumes there's a way to convert the byte stream to Unicode characters, and that won't work for PDF files.
So in your case, you'd need to read the file into byte buffers and loop over them, scanning for the bytes that represent '%' and the PDF end-of-line characters.
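A minimal sketch of that byte-level scan, assuming Java 11+ and a hypothetical `%MYAPP:` prefix for your own comment lines. Decoding with ISO-8859-1 maps each byte to exactly one char, so ordinary string methods can be used without corrupting anything. It is deliberately naive: it will also match the prefix if it happens to appear at the start of a line inside compressed stream data, and the returned text is Latin-1 decoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class PdfCommentScanner {
    private PdfCommentScanner() {}

    /**
     * Returns the text after prefix for every line of the raw bytes that starts with prefix.
     * Lines are split on the PDF end-of-line markers CR LF, CR and LF.
     */
    static List<String> findLines(byte[] pdf, String prefix) {
        // ISO-8859-1 maps each byte to exactly one char, so no bytes are lost or altered
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        List<String> hits = new ArrayList<>();
        for (String line : raw.split("\r\n|\r|\n")) {
            if (line.startsWith(prefix)) hits.add(line.substring(prefix.length()));
        }
        return hits;
    }

    /** Convenience overload that reads the whole file into memory first. */
    static List<String> findLines(Path file, String prefix) throws IOException {
        return findLines(Files.readAllBytes(file), prefix);
    }
}
```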
A better way is to use an existing mechanism for encoding data in a PDF: XMP tags. XMP allows any sort of complex key-value pairs to be encoded in XML and embedded in PDFs, JPEGs, etc. See http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf.
There's an open-source Java library that lets you manipulate that: http://pdfbox.apache.org/userguide/metadata.html. See also a related question from someone who got it working: custom schema to XMP metadata, or http://plindenbaum.blogspot.co.uk/2010/07/pdfbox-insertextract-metadata-frominto.html
It's all just 1's and 0's: just use RandomAccessFile and start reading. The PDF specification defines which end-of-line markers are valid (there are several: CR, LF, and CR followed by LF). Grab a hex editor, open a PDF, and you can at least start getting a feel for things. Be careful where you insert your lines, though: add them towards the end of the file, where they won't screw up the xref table offsets to the obj entries.
Here's a related question that may be of interest: PDF parsing file trailer
I would suggest putting your comment immediately before the startxref line. If you put it anywhere else, you could wind up shifting things around and breaking the xref table pointers.
So a simple algorithm for inserting your special comment is:
Go to the end of the file
Search backwards for startxref
Insert your special comment immediately before startxref - be sure to insert a newline character at the end of your special comment
Save the PDF
You can (and should) do this manually in a hex editor.
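Programmatically, the algorithm above can be sketched with `RandomAccessFile`, reading only the last few kilobytes. This is an illustration under assumptions: the 4096-byte tail size and the class name are arbitrary, the comment must be a single line of ASCII, and it inserts before the last `startxref` (the one that counts in an incrementally updated file). Everything before the insertion point keeps its byte offset, so the xref table stays valid; the caveat below about viewers re-saving the file still applies.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PdfCommentInserter {
    private PdfCommentInserter() {}

    /** How many bytes at the end of the file are searched for "startxref" (arbitrary). */
    private static final int TAIL = 4096;

    /**
     * Inserts "%" + comment + "\n" right before the last "startxref" keyword.
     * Bytes before the insertion point (objects, xref table, trailer) keep their offsets.
     */
    static void insertBeforeStartxref(String path, String comment) throws IOException {
        if (comment.indexOf('\n') >= 0 || comment.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("comment must be a single line");
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            long len = raf.length();
            long tailStart = Math.max(0, len - TAIL);
            byte[] tail = new byte[(int) (len - tailStart)];
            raf.seek(tailStart);
            raf.readFully(tail);

            // ISO-8859-1 maps bytes 1:1 to chars, so string indexes equal byte indexes
            String tailText = new String(tail, StandardCharsets.ISO_8859_1);
            int idx = tailText.lastIndexOf("startxref");
            if (idx < 0) throw new IOException("startxref not found near the end of the file");

            // make sure the comment starts on its own line
            char before = idx > 0 ? tailText.charAt(idx - 1) : '\n';
            String lead = (before == '\n' || before == '\r') ? "" : "\n";
            byte[] insert = (lead + "%" + comment + "\n").getBytes(StandardCharsets.ISO_8859_1);
            byte[] rest = Arrays.copyOfRange(tail, idx, tail.length);

            // rewrite from the insertion point: the comment line, then the original remainder
            raf.seek(tailStart + idx);
            raf.write(insert);
            raf.write(rest);
        }
    }
}
```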
Really important: are your users going to be saving changes to these files? I.e., if they fill in a form field, are they going to hit save? If they are, your comment lines may be removed during the save (and different versions of different PDF viewers could behave differently in this regard).
XMP tags are the correct way to do what you are trying to do - you can embed entire XML segments, and I think you'd be hard pressed to come up with a data structure that couldn't be expressed as XML.
I personally recommend using iText for this, but I'm biased (I'm one of the devs). The iText In Action book has an excellent chapter on embedding XMP data into PDFs. Here's some sample code from the book (which I definitely recommend): http://itextpdf.com/examples/iia.php?id=217