converting .prn file in to html page using java - java

how to convert .prn file in to html page using java.
I am treating it as a text file and reading it line by line but thats quite cumbersome as each line requires its own splitting logic. As prn file is nicely formatted can we directly extract the file and load it as an html?any suggessions?

Since a .prn file is byte stream that is sent to printer for printing, I think you are going to have to keep using your custom parser as it doesn't appear that the Java Print Service has any options for parsing.
If the tags are consistent with other file formats it may be worth while to check out other parsing libraries such as simple.json and modify them to your needs.

Related

read the contents from html and write in excel sheet in java

Please help me to read a html file and write it in an excel file by using java . I have searched the net , I could get only for copying tables , I need the data in it to be read and write in an excel file .
contents in html file
<title>**Deprecated Method Found**</title>
<h2>**Summary**</h2>
Deprecated API is error-prone and is a potential security threat and thus should not be used.
<h2>**Description**</h2>
Old API is sometimes marked deprecated because its implementation is designed in a way that can be error-prone. Deprecated API should be avoided where possible.
<h2>**Security Implications**</h2>
Blocks of code that use deprecated API are designed in a careless manner and thus are a potential security threat.
I need these separate headings in separate columns and the contents in rows .
Is it possible to parse this html to excel file
Try reading the HTML using jsoup, a Java HTML Parser. You can then save it as a CSV (Comma separated values) format which can be opened in excel.
jsoup to read text after element
Get all h2 elements using
Elements h2s = document.select("h2");
Then iterate over the elements reading what the heading says. If this heading is important, use the following code to get the text following that tag.
String text = h2.nextSibling().toString();
Sample CSV opened in Excel:

Save image as string in a File using Java

I am using a library which converts html to image and saves image as a file.
Here I don't want image to be saved as a file, rather want it to be in a string which I can process further.
Is there any way I can do that.
Sorry if this question is already asked.
The short answer is: NO!
I assume Image is a binary format of some sort ... and it will represent the parsed html output as a .jpg and thus has basically merged all layers of the DOM without you being able to reproduce the DOM from the binary representation.
You may want to save html code as text and even better as (.html or if need be as .xml) and put that into a file!
You want to work with Binary!
If you want Binary in Base64 use the Apache Comons Library for convertion:
http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html
This is the most commonly used Base64 library!
Also check out Apache Commons IOUtils!!
Example Usage here:
Base64.encode(FileUtils.readFileToByteArray(file));

Parsing XML file from the end of file

I want to use XML for storing some data. But I do not want read full file when I want to get the last data that was inserted there, as well as I do not want to rewrite full file when adding new data there. Is there a standard way in java to parse xml file not from the beginning but from the end. So that for example SAX or StaX parser will first encounter last closing root tag and than last tag. Or if I want to do this I should read and write everything like I am reading/writing regular text file?
Fundamentally, XML is a poor representation choice for this. The format is inherently "contained" like this, and I haven't seen any APIs which encourage you to fight against that.
Options:
Choose a different format entirely (e.g. use a database)
Create lots of small XML files instead - each one self-contained. When you want the whole of the data, read all the files
Just swallow the hit and read/write the whole file each time.
I found a good topic on this with example solutions for what I want.
This link: http://www.oreillynet.com/xml/blog/2007/03/parsing_xml_backwards.html
Seems that XML is not good file format to achieve what I want. There is no standard parser that can parse XML from the end instead of beginning.
Probably the best solution for will be storing all xml data in one file that contains composition of many xml files contents. On each line stored separate contents of XML. The file itself is not well formed XML but each line contains well formed xml that I will parse using standard xml parser(StaX).
This way I will be able to read just lines from the end of file and append new data to the end of file. When I need the whole data or only the part of it I will read all line or part of them. Probably I can also implement pagination from the end of file for that because the file can be big.
Why XML in each line? I think it is easy to use API for parsing it as well as it is human readable to store data in xml instead of just separating values in the line with some symbol.
Why not use sax/stax and simply process only your last entry? Yes, it will need to open and go through the whole file, but at least it's fairly efficient as opposed to loading the whole DOM tree.
Short of doing that, I don't think you can do what you're asking using XML as a source.
Another alternative, apart from the ones provided by Jon Skeet in his answer, would be to keep the same format but insert the latest entries first, and stop processing the files as soon as you've read your entry.

Print XML as PDF using PCL

I have an xml file with which i want to print as a PDF using PCL. I am new to PCL. Can i use PCL to get the xml printed in PDF format directly or should i have some intermediate process to create a PDF file and then use PCL to get it printed as PDF?
If you have a xml, there are two ways to recieve PDF file.
1. Create stylesheet for your xml, and use XEP
or
2. use just your xml and VisualXSL, which will help you create your pdf for print.
More additional: If you will create your xsl stylsheet, you can format by XEP many type of PDFs, for example PDF/1A, or another levels
Both XEP and VisualXSL are Renderx products(http://www.renderx.com/tools/index.html) and they have trial versions, that you can use:). I have used both products many times, and was satisfied.
You can also visit the forum where you can find answers about how to use and how usefull are products described above. http://cooltools.renderx.com
PCL is a printer control language. In other words command bytes you send to a (usually HP) printer which is then converted to ink on a page. This is normally not the way you will generate a PDF since too much information from the original will be lost.
You will normally want to convert your XML to something describing the actual print you want to have. A reasonable choice for this is the XSL-FO XML dialect which, however, is not very nice to do by hand. You can then choose to convert your XML into DocBook XML which in turn has very nice style sheets for converting further on to XSL-FO and other formats.
You can then use Apache FOP to convert XSL-FO into many formats, one being PDF. This allows you to - if FOP gets too small - to replace with one of several commercial XSL_FO rendering engines at a later date.

How to find if the file is a CSV file?

I have a scenario wherein the user uploads a file to the system. The only file that the system understands in a CSV, but the user can upload any type of file eg: jpeg, doc, html. I need to throw an exception if the user uploads anything other than CSV file.
Can anybody let me know how can I find if the uploaded file is a CSV file or not?
CSV files vary a lot, and they all could be called, legitimately, CSV files.
I guess your approach is not the best one, the correct approach would be to tell if the uploaded file is a text file the application can parse instead of it it's a CSV or not.
You would report errors whenever you can't parse the file, be it a JPG, MP3 or CSV in a format you cannot parse.
To do that, I would try to find a library to parse various CSV file formats, else you have a long road ahead writing code to parse many possible types of CSV files (or restricting the application's flexibility by supporting few CSV formats.)
One such library for Java is opencsv
If you're using some library CSV parser, all you would have to do is catch any errors it throws.
If the CSV parser you're using is remotely robust, it will throw some useful errors in the event that it doesn't understand the file format.
I can think of several methods.
One way is to try to decode the file using UTF-8. (This is built into Java and is probably built into .NET too.) If the file decodes properly, then you at least know that it's a text file of some kind.
Once you know it's a text file, parse out the individual fields from each line and check that you get the number of fields that you expect. If the number of fields per line is inconsistent then you might just have a file that contains text but is not organized into lines and fields.
Otherwise you have a CSV. Then you can validate the fields.
If it's a web application, you might want to check the content-type HTTP header the browser sends when uploading/posting a file through a form.
If there's a bind for the language you're using, you might also try using libmagic, is pretty good at recognizing file types. For example, the UNIX tool file uses it.
http://sourceforge.net/projects/libmagic/
I don't know if you can tell for 100% certain in any way, but I'd suggest that the first validations should be:
Is the file extension .csv
Count the number of commas in the file per line, there should normally be the same amount of commas on each line of the file for it to be a valid CSV file. (As Jkramer said, this only works if the files can't contain quoted commas).
try this one :
String type = Files.probeContentType(Paths.get(filepath));
I solved it like this: read the file with UTF-16 encoding, if no comma is found in the file, it means UTF-16 encoding didnt work. Which means that this csv file is of Excel format (NOT plain text).
if(fileA.endsWith(".csv") && fileB.endsWith(".csv")) {
second_list=readCSVFile(fileA);
new_list=readCSVFile(fileB);
if(!String.join("", second_list).contains(",") || !String.join("", new_list).contains(",")) {
//read these files with UTF-8 encoding
System.out.println("[WARN] csv files will be read like text files. (UTF-16 encoding couldnt find any comma in the file i.e., UTF-16 encoding didn't work)");
second_list=readFile(fileA);
new_list=readFile(fileB);
} else {
// keep the csv as UTF-16 encoded
}

Categories