I want to collect all the user comments from this site: http://www.consumercomplaints.in/?search=chevrolet
The problem is that the comments are only displayed partially; to see a complete comment I have to click on the title above it, and this has to be repeated for every comment.
The other problem is that there are many pages of comments.
So I want to store all the complete comments from the site above in an Excel sheet.
Is this possible?
I am thinking of using crawler4j and Jericho in Eclipse.
My code for the visit method:
@Override
public void visit(Page page) {
String url = page.getWebURL().getURL();
System.out.println("URL: " + url);
if (page.getParseData() instanceof HtmlParseData) {
HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
String html = htmlParseData.getHtml();
// Set<WebURL> links = htmlParseData.getOutgoingUrls();
// String text = htmlParseData.getText();
String crawlerOutputPath = "/DA Project/HTML Source/";
File outputfile = new File(crawlerOutputPath);
// FileWriter(file, true) appends and creates the file if it doesn't exist yet;
// closing the BufferedWriter also closes the wrapped FileWriter, so write only once
try (BufferedWriter bufferWriter = new BufferedWriter(new FileWriter(outputfile, true))) {
bufferWriter.write(html);
} catch (IOException e) {
System.out.println("IOException : " + e.getMessage());
e.printStackTrace();
}
System.out.println("Html length: " + html.length());
}
}
Thanks in advance. Any help would be appreciated.
Yes it is possible.
Start crawling at your search URL (http://www.consumercomplaints.in/?search=chevrolet).
Override shouldVisit() in your crawler4j WebCrawler subclass so that only the comment pages and the following result pages are followed, and do the processing in visit().
Take the HTML content from crawler4j and pass it to Jericho.
Filter out the content you want to store and write it to a .csv or .xls file (I would prefer .csv).
Hope this helps.
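For the last step, the extracted comments will often contain commas, double quotes and line breaks, so each field has to be quoted before it goes into a .csv file (which Excel can open directly). Below is a minimal, stdlib-only sketch of RFC 4180-style quoting; the class and method names (`CommentCsv`, `escape`, `toCsvLine`) are hypothetical, and it assumes the crawler4j/Jericho part has already produced the title and full comment text as plain strings.

```java
import java.util.Arrays;
import java.util.List;

public class CommentCsv {

    // Quote a field if it contains the separator, a double quote or a line break;
    // embedded double quotes are doubled, as RFC 4180 requires.
    static String escape(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the escaped fields of one record (e.g. title and full comment) into one CSV line.
    static String toCsvLine(List<String> fields, char separator) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(separator);
            }
            line.append(escape(fields.get(i), separator));
        }
        return line.append("\r\n").toString();
    }

    public static void main(String[] args) {
        // Illustrative record only; real values come from the parsed comment pages
        List<String> record = Arrays.asList("Example title", "Example comment, with \"quotes\"");
        System.out.print(toCsvLine(record, ','));
    }
}
```

Appending each line returned by toCsvLine() to one file builds the whole sheet; a record whose comment spans several lines stays a single CSV row because the field is quoted.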
My code below outputs a simple receipt that contains some details from the user, such as name, fare and stop number, and generates a PDF file containing those details. Whenever a new user enters data in the main form, this just overwrites the first user's data in the PDF file. How can I create a new PDF file for each user without appending to or overwriting the first user's data (like sample.pdf, sample2.pdf, sample3.pdf, and so on)?
public class PDFDisplay {
public static void generatePDF(PassengerBean passengerBean) {
Document document = new Document();
try {
final Chunk NEWLINE = new Chunk("\n");
PdfWriter.getInstance(document,
new FileOutputStream("C://sample.pdf"));
document.open();
Image img = Image.getInstance("C:\\Documents and Settings\\Pinky\\My Documents\\Angel's files\\ICS 113\\eclipse_ws\\MRTApplicationIteration2\\WebContent\\image\\mrt.jpg");
document.add(img);
String or = "Official Receipt";
String hr = "-----------------------------------------------------------";
String spacer = " ";
String name = "Passenger Name: " + passengerBean.lname + "," + " " + passengerBean.fname;
String dest = "Destination: " + passengerBean.dest + " STATION";
String stopno = passengerBean.stop;
double fare = passengerBean.fare;
String fare1 = "Fare: PHP" + " " + String.valueOf(fare);
String ccnum = "CREDIT CARD NUMBER: " + " " + "************" + passengerBean.ccnum.substring(Math.max(0, passengerBean.ccnum.length() - 4));
Paragraph para10 = new Paragraph(32);
para10.setSpacingBefore(10);
para10.setSpacingAfter(10);
para10.add(new Chunk(or));
document.add(para10);
Paragraph para9 = new Paragraph(32);
para9.setSpacingBefore(30);
para9.setSpacingAfter(10);
para9.add(new Chunk(hr));
document.add(para9);
// Setting paragraph line spacing to 32
Paragraph para1 = new Paragraph(32);
para1.setSpacingBefore(5);
para1.setSpacingAfter(10);
para1.add(new Chunk(name));
document.add(para1);
Paragraph para2 = new Paragraph();
para2.setSpacingAfter(10);
para2.add(new Chunk(dest));
document.add(para2);
Paragraph para3 = new Paragraph();
para3.setSpacingAfter(10);
para3.add(new Chunk(stopno));
document.add(para3);
Paragraph para4 = new Paragraph();
para4.setSpacingAfter(10);
para4.add(new Chunk(fare1));
document.add(para4);
Paragraph para5 = new Paragraph();
para5.setSpacingAfter(10);
para5.add(new Chunk(ccnum));
document.add(para5);
document.close();
} catch (DocumentException e) {
e.printStackTrace();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Almost all the methods you might need to achieve what you want can be found in the Java API documentation for the File class.
You want to create a unique file that starts with sample and ends with pdf. To achieve this, you can use the createTempFile() method. This question was already answered on Stack Overflow 6 years ago: What is the best way to generate a unique and short file name in Java?
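A minimal sketch of the createTempFile() approach, assuming receipts go into one directory chosen by the caller (the class and method names are hypothetical). Despite its name, createTempFile() does not delete the file later; it only guarantees that the name did not exist before and atomically creates an empty file, so the receipt names end up unique but random rather than sequential.

```java
import java.io.File;
import java.io.IOException;

public class UniquePdfName {

    // File.createTempFile(prefix, suffix, directory) creates a new, empty file whose
    // name starts with the prefix and ends with the suffix, with a JDK-chosen middle part.
    static File newReceiptFile(File directory) throws IOException {
        return File.createTempFile("sample", ".pdf", directory);
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        File first = newReceiptFile(dir);
        File second = newReceiptFile(dir);
        System.out.println(first.getName() + " / " + second.getName());
        // Hand the file to iText instead of the fixed "C://sample.pdf", e.g.
        // PdfWriter.getInstance(document, new FileOutputStream(first));
    }
}
```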
Suppose that you really want to have incremental numbers in your file name, e.g. sample0001.pdf, sample0002.pdf, sample0003.pdf and so on, then you can use the list() method. This returns an array of String values with the names of all files in a directory. I suggest that you use a FilenameFilter so that you only get the PDF files starting with sample. You could then sort these names to find the name with the highest number. See How to list latest files in a directory using FileNameFilter to find out how to create such a filter.
Once you have the file name with the highest number, it's only a matter of String manipulation to create a new filename. Use that filename (or that File instance) when you define the OutputStream.
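A sketch of the incremental-number approach, assuming names of the form sample0001.pdf (the class name and the four-digit format are illustrative choices). Instead of sorting the names as strings, it parses the number out of each matching name, which keeps working after sample9999.pdf, where plain string order would break.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NextPdfName {

    // Matches names such as sample0001.pdf and captures the number.
    private static final Pattern NUMBERED = Pattern.compile("sample(\\d+)\\.pdf");

    // Lists the existing sampleNNNN.pdf files, finds the highest number and returns
    // the File for the next one (sample0001.pdf if none exist yet).
    static File nextFile(File directory) {
        FilenameFilter filter = new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                return NUMBERED.matcher(name).matches();
            }
        };
        String[] names = directory.list(filter);
        int highest = 0;
        if (names != null) { // list() returns null if the directory does not exist
            for (String name : names) {
                Matcher m = NUMBERED.matcher(name);
                if (m.matches()) {
                    highest = Math.max(highest, Integer.parseInt(m.group(1)));
                }
            }
        }
        return new File(directory, String.format("sample%04d.pdf", highest + 1));
    }
}
```

Two requests running at the same moment could compute the same next name; if receipts can be generated concurrently, one option is to claim the name with File.createNewFile(), which returns false when the file already exists, and retry in that case.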
As you can see, this answer doesn't mention iText anywhere, and although the extension of the files we create or list is .pdf, it has nothing to do with PDF or PDF generation either. It's a pure Java question.
I've written a tool to check PDF files. Each PDF goes through a check method: if there is an error, it reports the page on which it occurs. The PDF files are listed in a JTable.
When I right-click a PDF file in the table, a text box appears with the error message.
I use textArea.append(text), but this shows me the errors of all PDF files. I just want to see the errors of the selected PDF.
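// pages are numbered from 1 here, so <= is needed to include the last page
// (assuming getPages() returns the page count)
for (int pageNo = 1; pageNo <= pdf.getPages(); pageNo++) {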
try {
PCProperty content = pdf.getContent(pageNo, ContentCollationOptions.NONE);
if (content == null) {
error = "Error";
}
} catch (PDFDocumentException exception) {
error = "Error";
textArea.append("failed to read content on page " + pageNo + "\n");
}
}
For each new document, you can clear the existing text by passing null or an empty string to setText(). Alternatively, append() a visual separator and the name of the document to retain a running history of the documents examined:
textArea.append("***** " + pdf.getName() + "\n");
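If the goal is to show only the selected PDF's errors rather than a running history, one option is to collect the messages per document while checking and replace the text area's content when a row is selected. A sketch under those assumptions; `PdfErrorLog`, `addError` and `errorsFor` are hypothetical names, and the document name is whatever key identifies a row in the JTable (for example pdf.getName()).

```java
import java.util.HashMap;
import java.util.Map;

public class PdfErrorLog {

    // One StringBuilder of messages per PDF, keyed by the document name.
    private final Map<String, StringBuilder> errorsByDocument = new HashMap<String, StringBuilder>();

    // Called from the check loop instead of textArea.append(...).
    void addError(String documentName, String message) {
        StringBuilder messages = errorsByDocument.get(documentName);
        if (messages == null) {
            messages = new StringBuilder();
            errorsByDocument.put(documentName, messages);
        }
        messages.append(message).append('\n');
    }

    // Text for one document only; an empty string if that PDF had no errors.
    String errorsFor(String documentName) {
        StringBuilder messages = errorsByDocument.get(documentName);
        return messages == null ? "" : messages.toString();
    }
}
```

In the check loop, call log.addError(pdf.getName(), "failed to read content on page " + pageNo); in the right-click handler, call textArea.setText(log.errorsFor(selectedName)), where selectedName is read from the clicked row, for example with table.getValueAt(row, column). Because setText() replaces the whole content, only the selected PDF's messages are shown.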
I'm writing a program that is supposed to take in a bunch of TIFFs and merge them. I got it to work for most of the image files I read in, but a large batch of them throw an error when I try to read them.
Here is a snippet of code I have:
int numPages = 0;
inStream = ImageIO.createImageInputStream(imageFile);
reader.setInput(inStream);
while(true){
bufferedImages.add(reader.readAll(numPages, reader.getDefaultReadParam()));
numPages++;
}
Yes, I catch the IndexOutOfBoundsException that readAll() throws once the index is past the last image, so we don't have to worry about that. My problem is that I get the following error:
javax.imageio.IIOException: I/O error reading image metadata!
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageReader.readMetadata(TIFFImageReader.java:340)
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageReader.seekToImage(TIFFImageReader.java:310)
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageReader.prepareRead(TIFFImageReader.java:971)
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageReader.read(TIFFImageReader.java:1153)
at javax.imageio.ImageReader.readAll(ImageReader.java:1067)
at sel.image.appender.ImageAppender.mergeImages(ImageAppender.java:59)
at sel.imagenow.processor.AetnaLTCProcessor.processBatch(AetnaLTCProcessor.java:287)
at sel.imagenow.processor.AetnaLTCProcessor.processImpl(AetnaLTCProcessor.java:81)
at sel.processor.AbstractImageNowProcessor.process(AbstractImageNowProcessor.java:49)
at sel.RunConverter.main(RunConverter.java:37)
Caused by: java.io.EOFException
at javax.imageio.stream.ImageInputStreamImpl.readShort(ImageInputStreamImpl.java:229)
at javax.imageio.stream.ImageInputStreamImpl.readUnsignedShort(ImageInputStreamImpl.java:242)
at com.sun.media.imageioimpl.plugins.tiff.TIFFIFD.initialize(TIFFIFD.java:194)
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageMetadata.initializeFromStream(TIFFImageMetadata.java:110)
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageReader.readMetadata(TIFFImageReader.java:336)
... 9 more
I did make sure to add the right JAI library, and my reader uses the "TIFF" format, so the reader (and writer) are correct, but for some reason the metadata is wrong. I can open and view all of these images normally in Windows, so they really aren't corrupted. Java just doesn't want to read them correctly. Since I only use the stream metadata to write them out later, I don't care much about the image metadata; I just need to read each file into the list so I can append it. I did find a replaceImageMetadata method on the writer, but the TIFF writer's implementation of ImageWriter doesn't support it. I'm stuck; does anyone have any ideas? Is there a way to read parts of the metadata to see what is wrong and fix it?
For anyone who would like to know, I ended up fixing my own issue. It seems the image metadata was somewhat corrupted. Since I was just doing a plain merge, and since I knew each image was one page, I was able to read each picture into a BufferedImage with ImageIO.read() and then wrap it in an IIOImage with null metadata. I used the stream metadata (which could be read without problems) to merge the images. Here is the complete method I use to merge a list of images:
public static File mergeImages(List<File> files, String argID, String fileType, String compressionType) throws Exception{
//find the temp location of the image
String location = ConfigManager.getInstance().getTempFileDirectory();
logger_.debug("image file type [" + fileType + "]");
ImageReader reader = ImageIO.getImageReadersByFormatName(fileType).next();
ImageWriter writer = ImageIO.getImageWritersByFormatName(fileType).next();
//set up the new image name
String filePath = location + "\\" + argID +"." + fileType;
//keeps track of the images we copied from
StringBuilder builder = new StringBuilder();
List<IIOImage> bufferedImages = new ArrayList<IIOImage>();
IIOMetadata metaData = null;
for (File imageFile:files) {
//get the name for logging later
builder.append(imageFile.getCanonicalPath()).append("\n");
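if (metaData == null){
//only the stream metadata is read from the first file; close the stream afterwards
ImageInputStream inStream = ImageIO.createImageInputStream(imageFile);
try{
reader.setInput(inStream);
metaData = reader.getStreamMetadata();
}
finally{
inStream.close();
}
}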
BufferedImage image = ImageIO.read(imageFile);
bufferedImages.add(new IIOImage(image, null, null));
}
ImageWriteParam params = writer.getDefaultWriteParam();
if (compressionType != null){
params.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
params.setCompressionType(compressionType);
}
ImageOutputStream outStream = null;
try{
outStream = ImageIO.createImageOutputStream(new File(filePath));
int numPages = 0;
writer.setOutput(outStream);
for(IIOImage image:bufferedImages){
if (numPages == 0){
writer.write(metaData, image, params);
}
else{
writer.writeInsert(numPages, image, params);
}
numPages++;
}
}
finally{
if (outStream != null){
outStream.close();
}
//release the resources held by the reader and writer
writer.dispose();
reader.dispose();
}
//set up the file for us to use later
File mergedFile = new File(filePath);
logger_.info("Merged image into [" + filePath + "]");
logger_.debug("Merged images [\n" + builder.toString() + "] into --> " + filePath);
return mergedFile;
}
I hope this helps someone else, because I couldn't find much on this issue.
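The same idea (decode only the pixels with ImageIO.read() and pass null image metadata) can also be written with the ImageWriter sequence API, prepareWriteSequence()/writeToSequence(), instead of write() followed by writeInsert(). This is a swapped-in variation, not the method above; it assumes Java 9 or later, whose javax.imageio includes a TIFF plugin (on older JDKs the JAI Image I/O plugin is needed, as in the question), the class name `TiffMerger` is hypothetical, and compression settings are left out for brevity.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

public class TiffMerger {

    // Decodes each single-page input with ImageIO.read (pixels only) and appends it as a page
    // of one multi-page TIFF. Passing null metadata drops whatever per-image metadata was broken.
    static void merge(List<File> inputs, File output) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("TIFF");
        if (!writers.hasNext()) {
            throw new IOException("No TIFF writer found (Java 9+ or a JAI Image I/O plugin is needed)");
        }
        ImageWriter writer = writers.next();
        // The file-backed output stream does not truncate an existing file, so remove it first
        if (output.exists() && !output.delete()) {
            throw new IOException("Cannot overwrite " + output);
        }
        ImageOutputStream out = ImageIO.createImageOutputStream(output);
        if (out == null) {
            throw new IOException("Cannot open " + output);
        }
        try {
            writer.setOutput(out);
            writer.prepareWriteSequence(null); // null: the writer uses default stream metadata
            for (File input : inputs) {
                BufferedImage page = ImageIO.read(input);
                if (page == null) {
                    throw new IOException("No registered reader could decode " + input);
                }
                writer.writeToSequence(new IIOImage(page, null, null), null);
            }
            writer.endWriteSequence();
        } finally {
            out.close();
            writer.dispose();
        }
    }
}
```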
I don't know how to offer a CSV file for download. The CSV will be generated at runtime. Do I need to save the file in the Tomcat WEB-INF directory first? I'm using JSF 1.2.
By the way, what's the favored JSF component for this kind of task?
Edit (05.05.2012 - 15:53)
I tried the solution BalusC gave in his first link, but when I click my commandButton, the content of the file is displayed on the web page instead of being downloaded. Maybe there's a problem with the MIME type?
xhtml-file:
<a4j:form>
<a4j:commandButton action="#{surveyEvaluationBean.doDataExport}" value="#{msg.srvExportButton}" />
</a4j:form>
main bean:
public String doDataExport() {
try {
export.downloadFile();
} catch (SurveyException e) {
hasErrors = true;
}
return "";
}
export-bean:
public void downloadFile() throws SurveyException {
try {
String filename = "analysis.csv";
FacesContext fc = FacesContext.getCurrentInstance();
HttpServletResponse response = (HttpServletResponse) fc.getExternalContext().getResponse();
response.reset();
response.setContentType("text/comma-separated-values");
response.setHeader("Content-Disposition", "attachment; filename=\"" + filename + "\"");
OutputStream output = response.getOutputStream();
// writing just sample data
List<String> strings = new ArrayList<String>();
strings.add("filename" + ";" + "description" + "\n");
strings.add(filename + ";" + "this is just a test" + "\n");
for (String s : strings) {
output.write(s.getBytes());
}
output.flush();
output.close();
fc.responseComplete();
} catch (IOException e) {
throw new SurveyException("an error occurred");
}
}
Edit (05.05.2012 - 16:27)
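I solved my problem: I have to use <h:commandButton> instead of <a4j:commandButton>, and now it works! (<a4j:commandButton> submits via Ajax, and the response of an Ajax request is handled by the page's JavaScript, so the browser never offers it as a file download.)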
Do I need to save the file in the tomcat WEB-INF directory first?
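No, just write it straight to the HTTP response body, which you can obtain via ExternalContext#getResponseOutputStream(), after setting the proper response headers that tell the browser what it's going to retrieve. (That method exists since JSF 2.0; in JSF 1.2, cast ExternalContext#getResponse() to HttpServletResponse and use its getOutputStream(), as your export bean already does.)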
Do the math based on the concrete examples found in the following answers:
How to provide a file download from a JSF backing bean?
JSP generating Excel spreadsheet (XLS) to download
Basically:
List<List<Object>> csv = createItSomehow();
writeCsv(csv, ';', ec.getResponseOutputStream());
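writeCsv() above is not a library method but a helper you write yourself. A hypothetical sketch with that signature, quoting any value that contains the separator, a double quote or a line break, and encoding the output as UTF-8 (an assumption; pick whatever charset your users' spreadsheet software expects):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class CsvWriter {

    // Writes the rows to the given stream, quoting values that contain the separator,
    // a double quote or a line break. The stream is flushed but not closed, because the
    // servlet container owns the response stream.
    public static void writeCsv(List<List<Object>> csv, char separator, OutputStream output)
            throws IOException {
        Writer writer = new OutputStreamWriter(output, StandardCharsets.UTF_8);
        for (List<Object> row : csv) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    writer.write(separator);
                }
                Object value = row.get(i);
                String text = value == null ? "" : value.toString();
                if (text.indexOf(separator) >= 0 || text.indexOf('"') >= 0
                        || text.indexOf('\n') >= 0 || text.indexOf('\r') >= 0) {
                    text = '"' + text.replace("\"", "\"\"") + '"';
                }
                writer.write(text);
            }
            writer.write("\r\n");
        }
        writer.flush();
    }
}
```

In JSF 2.0 and later, set the headers through the same ExternalContext before writing, e.g. ec.setResponseContentType("text/csv") and ec.setResponseHeader("Content-Disposition", "attachment; filename=\"report.csv\""), then call writeCsv(...) and finally FacesContext#responseComplete() so that JSF does not try to render a page afterwards.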
By the way, what's the favored JSF component for this kind of task?
This is subjective, but we're using PrimeFaces' <p:dataExporter> to our full satisfaction.
If you are using JSF 2, you can use PrimeFaces.
You can take a look at that link.
If not, you can do it like this:
List<String[]> report = new ArrayList<String[]>(); // fill with your data: one {column1, column2} array per row, or use your own bean
String filename = "report.csv";
File file = new File(filename);
Writer output = new BufferedWriter(new FileWriter(file));
try {
output.append("Column1");
output.append(",");
output.append("Column2");
output.append("\n");
// your data goes here; with a bean, read its fields instead of row[0] and row[1]
for (String[] row : report) {
output.append(row[0]);
output.append(",");
output.append(row[1]);
output.append("\n");
}
} finally {
output.close(); // close() flushes the buffered data first
}
This question has been asked a few times in forums, but I can't get my image to display with my code. I think I'm not using the right method:
webViewContact.loadData(db.getParametres().get(0).getInformationParam(), "text/html", "utf-8");
getInformationParam() retrieves the HTML code, for example:
<img src=\\"file:///android_asset/logoirdes_apropos.jpg\\"/> <b>Test</b>
My image file is in the drawable folder. How can I display it?
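There are restrictions on what HTML loaded with loadData() can do; in particular, it generally cannot load file:///android_asset/ resources. Note also that file:///android_asset/ points to the assets/ folder, not res/drawable, so the image has to be placed in assets/. I suggest using loadUrl, which expects a URL rather than raw HTML, so this works only if the HTML is saved as a file in assets/ and getInformationParam() returns its file name: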
webViewContact.loadUrl("file:///android_asset/" + db.getParametres().get(0).getInformationParam())
You can try the following code, which writes the HTML to the file referenced by htmlFile. You can do this on the UI thread for now, but in production you might consider moving it to an AsyncTask if the file is large.
File directory = Environment.getExternalStoragePublicDirectory("html_cache");
directory.mkdirs();
File htmlFile = new File(directory, "give_a_name.html");
String content = db.getParametres().get(0).getInformationParam();
// assumes the default encoding is OK!
Writer output = null;
try {
output = new BufferedWriter(new FileWriter(htmlFile));
output.write(content);
} catch (IOException e) {
e.printStackTrace();
} finally {
if (output != null) {
try {
output.close();
} catch (IOException ignored) {
// nothing useful to do if closing fails
}
}
}