Page replication error in Jsoup - java

I wrote some code that downloads an entire page as HTML with Jsoup. The download itself works as expected. My problem is that the page content is repeated more than once when I open the downloaded file in the browser, and I don't know what is going wrong. Here is the code:
public class httptest {
static File file;
String crawlingNode;
static BufferedWriter writer = null;
static httptest ht;
public httptest() throws IOException{
file = new File(//***SET HERE YOUR TEST PATH***);
}
private void GetLinks() throws IOException{
Document doc = Jsoup.connect("http://google.com/search?q=mamamia")
.userAgent("Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)")
.cookie("auth", "token")
.timeout(3000)
.get();
Elements links = doc.select("*");
String crawlingNode = links.html();
System.out.println(crawlingNode);
httptest.WriteOnFile(writer, crawlingNode);
}
private static void OpenWriter(File file){
try {
writer = new BufferedWriter(new FileWriter(file));
} catch (IOException e) {
JOptionPane.showMessageDialog(null, "Failed to open URL Writer");
e.printStackTrace();
}
}
private static void WriteOnFile(BufferedWriter writer, String crawlingNode){
try {
writer.write(crawlingNode);
} catch (IOException e) {
JOptionPane.showMessageDialog(null, "Failed to write URL Node");
e.printStackTrace();
}
}
private static void CloseWriter(BufferedWriter writer){
try {
writer.close();
} catch (IOException e) {
JOptionPane.showMessageDialog(null, "Unable to close URL Writer");
System.err.println(e);
}
}
public static void main (String[] args) throws IOException{
ht = new httptest();
httptest.OpenWriter(file);
ht.GetLinks();
httptest.CloseWriter(writer);
}
}
Some parts of the code might seem weird, but remember that this is the SSCCE version. Any ideas that might be helpful? Thanks in advance.

Instead of:
Elements links = doc.select("*");
String crawlingNode = links.html();
System.out.println(crawlingNode);
httptest.WriteOnFile(writer, crawlingNode);
use:
Element links = doc.select("*").first();
String crawlingNode = links.html();
System.out.println(crawlingNode);
httptest.WriteOnFile(writer, crawlingNode);
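The reason is that doc.select("*") matches every element in the document, and Elements.html() returns the inner HTML of all matched elements combined. Each nested element is therefore written once for itself and again inside each of its ancestors, which is why the page appears repeated. Taking only the first match limits the output to the top-level element; calling doc.html() or doc.outerHtml() directly should give the same result without a selector. I found this change by reading http://jsoup.org/cookbook/extracting-data/attributes-text-html
Anyway, this solution worked for me.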

Related

Need to throw error messages if path to an external app doesn't exist in Java

I have 4 JButtons, each of which launches a specific program (browsers, for the time being). I need a conditional so that, if the path to a program doesn't exist, a web page for that browser is opened instead.
This piece of code repeats itself and I need to optimise it. I tried moving the if statement into a separate method, but that only caused a lot of issues when each button ran.
Warning_MSG() is just another method that uses a JOptionPane to confirm whether the user wants to open an external application.
private void chromeMouseClicked(java.awt.event.MouseEvent evt) {
File file = new File("C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe");
if(!file.exists()){
try {
URI url = new URI("https://google.com/chrome");
java.awt.Desktop.getDesktop().browse(url);
System.err.println("File Error, Google Chrome is not found!\n");
} catch (IOException | URISyntaxException e) {
System.err.println("Unknown Error, Exception found\n" + e);
}
}else {
Warning_MSG(file);
}
}
private void firefoxMouseClicked(java.awt.event.MouseEvent evt) {
File file = new File("C:\\Program Files\\Mozilla Firefox\\firefox.exe");
if(!file.exists()){
try {
System.err.println("File Error, Mozilla Firefox is not found!\n");
URI url = new URI("https://www.mozilla.org/en-GB/firefox/download/thanks/");
java.awt.Desktop.getDesktop().browse(url);
} catch (IOException | URISyntaxException e) {
System.err.println("Unknown Error, Exception found\n" + e);
}
}else {
Warning_MSG(file);
}
}
private void ieMouseClicked(java.awt.event.MouseEvent evt) {
File file = new File("C:\\Program Files (x86)\\Internet Explorer\\iexplore.exe");
if(!file.exists()){
System.err.println("File Error, iExplorer is not found! \n");
}else {
Warning_MSG(file);
}
}
private void edgeMouseClicked(java.awt.event.MouseEvent evt) {
File file = new File("C:\\Windows\\SystemApps\\Microsoft.MicrosoftEdge_8wekyb3d8bbwe\\MicrosoftEdge.exe");
if(!file.exists()){
System.err.println("File Error, Microsoft Edge is not Found! \n");
}else {
Warning_MSG(file);
}
}
The question is: how can I optimise this code so that I don't have to repeat the same thing for every other program, and instead run certain things only when conditions are met?
You can create an enum with two parameters (exe path, download page):
enum Browser
{
Chrome("C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe", "https://google.com/chrome"),
Firefox("C:\\Program Files\\Mozilla Firefox\\firefox.exe", "https://www.mozilla.org/en-GB/firefox/download/thanks/"),
IExplorer("C:\\Program Files (x86)\\Internet Explorer\\iexplore.exe", "https://google.com/internet-explorer"),
Edge("C:\\Windows\\SystemApps\\Microsoft.MicrosoftEdge_8wekyb3d8bbwe\\MicrosoftEdge.exe", "https://google.com/download-edge");
private final String _filePath, _downloadPageURL;
Browser(String filePath, String downloadPage)
{
_filePath = filePath;
_downloadPageURL = downloadPage;
}
private String getDownloadPageURL()
{
return _downloadPageURL;
}
private String getFilePath()
{
return _filePath;
}
public boolean exists()
{
return new File(getFilePath()).exists();
}
public void openDownloadPage()
{
try
{
Desktop.getDesktop().browse(new URI(getDownloadPageURL()));
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
Then all you have to do is a simple check in your methods:
private void chromeMouseClicked(java.awt.event.MouseEvent evt)
{
final Browser chrome = Browser.Chrome;
if (!chrome.exists())
{
chrome.openDownloadPage();
System.err.println("File Error, Google Chrome is not found!\n");
return;
}
// The browser exists at this line. So create your code
// ...
}
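The four click handlers can also share one helper, so that the exists-or-download check is written only once. This is a minimal sketch rather than the answer's own code: `BrowserLauncher`, `launchOrDownload`, and the two callback parameters are hypothetical names introduced for illustration. In the real form, the callbacks would presumably be `Warning_MSG(file)` and a lambda wrapping `Desktop.getDesktop().browse(...)`.

```java
import java.io.File;
import java.util.function.Consumer;

// Hypothetical helper: the names here are illustrative, not from the original code.
class BrowserLauncher {

    // Same idea as the enum above; only two entries are shown for brevity.
    enum Browser {
        CHROME("C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
               "https://google.com/chrome"),
        FIREFOX("C:\\Program Files\\Mozilla Firefox\\firefox.exe",
                "https://www.mozilla.org/en-GB/firefox/download/thanks/");

        final String filePath;
        final String downloadPageUrl;

        Browser(String filePath, String downloadPageUrl) {
            this.filePath = filePath;
            this.downloadPageUrl = downloadPageUrl;
        }
    }

    // Runs onFound with the executable if it exists; otherwise logs an error
    // and runs onMissing with the download page URL. Returns true if found.
    static boolean launchOrDownload(File exe, String downloadPageUrl,
                                    Consumer<File> onFound,
                                    Consumer<String> onMissing) {
        if (exe.exists()) {
            onFound.accept(exe);
            return true;
        }
        System.err.println("File Error, " + exe.getName() + " is not found!");
        onMissing.accept(downloadPageUrl);
        return false;
    }

    // Convenience overload that takes the enum constant directly.
    static boolean launchOrDownload(Browser browser,
                                    Consumer<File> onFound,
                                    Consumer<String> onMissing) {
        return launchOrDownload(new File(browser.filePath),
                                browser.downloadPageUrl, onFound, onMissing);
    }
}
```

With a helper like this, each mouse-click handler shrinks to a single call that names the browser and passes the two actions, and adding a fifth program only means adding one enum constant.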

Synchronize rest application and saving to file

I've got a somewhat weird situation. I have these methods:
public void generateRecords(Request request) {
String pathToFile = request.getPathFile();
String recordOne = generateRecordOne(request);
String recordTwo = generateRecordTwo(request);
fileService.writeToFile(pathToFile, recordOne);
fileService.writeToFile(pathToFile, recordTwo);
}
public void writeToFile(String path, String content) {
try {
FileWriter writer = new FileWriter(path, true);
writer.append(content);
writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
generateRecords() is executed in a REST endpoint. I am getting something like this:
id:1:record1
id:2:record1
id:1:record2
id:2:record2
While I would like to get something like this:
id:1:record1
id:1:record2
id:2:record1
id:2:record2
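It only occurs sometimes, but it still corrupts my file. How can I avoid this?
Try using synchronized on the writeToFile method. Note that this only prevents two writes from running at the same time; to keep both records of one request together, the two writeToFile calls also have to run inside one synchronized section (for example, by synchronizing generateRecords as well).
Also, consider using the try-with-resources statement. In the code you have right now, an exception in your writer would leave the FileWriter unclosed.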
public synchronized void writeToFile(String path, String content) {
try (FileWriter writer = new FileWriter(path, true)) {
writer.append(content);
} catch (IOException e) {
e.printStackTrace();
}
}
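One way to keep both records of a request together is to write them in a single call while holding one lock. The sketch below assumes a single shared service instance in one JVM (for example, a singleton bean) and records that already end with a line terminator; `RecordFileService` and `writeRecords` are hypothetical names, not part of the original code.

```java
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical service: class and method names are illustrative only.
class RecordFileService {

    // One lock shared by all callers of this instance.
    private final Object fileLock = new Object();

    // Appends all records of one request while holding the lock, so records
    // from another request cannot be written in between them.
    public void writeRecords(String path, String... records) throws IOException {
        synchronized (fileLock) {
            try (FileWriter writer = new FileWriter(path, true)) {
                for (String record : records) {
                    writer.append(record);
                }
            }
        }
    }
}
```

In generateRecords this would replace the two separate writeToFile calls with one `writeRecords(pathToFile, recordOne, recordTwo)` call. A lock of this kind does not help if several application instances or processes append to the same file.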

Generating a doc with docx4j gives different results on Windows and on a Linux server

When I use docx4j to generate a doc from HTML and output it through a Java servlet, it works well on Windows: I can download and open the doc file normally.
When I deploy the project on a Linux server, I can still download the doc file, but when I open it I get an alert that the file is broken. I have to confirm and let it repair the file before it opens normally.
My core code is below.
How can I get the same result as on Windows?
Code in the JSP:
String vhtml = DownHtml2DocUtil.replaceSvgData2Base64(request);
response.reset();
response.setContentType("application/octet-stream"); // set as a byte stream
OutputStream output = null;
try {
output = response.getOutputStream();
response.addHeader("Content-Disposition", "attachment;filename=" + System.currentTimeMillis() + ".doc");
DownHtml2DocUtil.genDocFromHtml(vhtml, output);
} catch (Exception e) {
} finally {
try {
if (output != null) {
output.close();
}
} catch (Exception e) {
}
}
response.flushBuffer();
out.clear();
out = pageContext.pushBody();
The Java code looks like this:
public static void genDocFromHtml(String html, OutputStream out)
throws EMPException {
try {
WordprocessingMLPackage wordMLPackage;
wordMLPackage = WordprocessingMLPackage.createPackage();
XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(
wordMLPackage);
wordMLPackage.getMainDocumentPart().getContent()
.addAll(XHTMLImporter.convert(html, "utf-8"));
// wordMLPackage.save(out); -- I tried both methods
new Save(wordMLPackage).save(out);
} catch (InvalidFormatException e) {
throw new EMPException(e);
} catch (Docx4JException e) {
throw new EMPException(e);
} catch (Exception e) {
e.printStackTrace();
}
}
Any suggestions would be appreciated. Thanks.

Spring MVC Safe way to Upload, generate and download a file

I'm working on a web app with Spring MVC and Maven. I have the following process: first, the user uploads a file. Afterwards, the uploaded file is edited. Finally, I want to offer a download containing the edited file.
The first step, "Upload File", works well. I have a controller which contains the following POST method:
@RequestMapping(value = "/CircleUp", method = RequestMethod.POST)
public String circleUpPost(HttpServletRequest request, Model model, //
@ModelAttribute("circleUpForm") CircleUpForm circleUpForm) {
return this.doUpload(request, model, circleUpForm);
}
private String doUpload(HttpServletRequest request, Model model, //
CircleUpForm circleUpForm) {
File file = circleUpForm.getFile();
if (file != null) {
try {
//Some edit stuff
serialize(file, SerializationModeEnum.Standard);
} catch (Exception e) {
e.printStackTrace();
}
}
model.addAttribute("uploadedFiles", file);
return "uploadResult";
}
protected static String serialize(File file, SerializationModeEnum serializationMode) {
java.io.File test = null;
try {
test = java.io.File.createTempFile("Test", ".pdf");
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
try {
file.save(test, serializationMode);
} catch (Exception e) {
e.printStackTrace();
}
// test.deleteOnExit();
return test.getPath();
}
In the "serialize" method, my PDF Clown file is saved to a temp folder.
Afterwards, the "uploadResult" page appears, which contains the following code:
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%>
<html>
<head>
<meta charset="UTF-8">
<title>Download</title>
</head>
<body>
<h3>Download Files:</h3>
CircleUp
</body>
</html>
When the user clicks the link, another controller is called to handle the download. I don't know how to design that controller so that it can work with the edited file that I saved in my temp folder. I think it should look like this:
@RequestMapping(value = "/Download")
public void download(HttpServletRequest request, HttpServletResponse response) throws IOException {
final String temperotyFilePath = ???
String fileName = "Test.pdf";
response.setContentType("application/pdf");
response.setHeader("Content-disposition", "attachment; filename=" + fileName);
try {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
baos = convertPDFToByteArrayOutputStream(temperotyFilePath + "\\" + fileName);
OutputStream os = response.getOutputStream();
baos.writeTo(os);
os.flush();
} catch (Exception e1) {
e1.printStackTrace();
}
}
private ByteArrayOutputStream convertPDFToByteArrayOutputStream(String fileName) {
InputStream inputStream = null;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
inputStream = new FileInputStream(fileName);
byte [] buffer = new byte [1024];
baos = new ByteArrayOutputStream();
int bytesRead;
while ((bytesRead = inputStream.read(buffer)) != -1) {
baos.write(buffer, 0, bytesRead);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (inputStream != null) {
try {
inputStream.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return baos;
}
I have two questions now:
How can the download controller obtain the temp path to the file?
Is this process of uploading, generating, and downloading a file safe? Or is there a better way to handle it?
I'm new to Spring MVC and web app development, and I'm thankful for every suggestion :)
You can use the same approach you use in the upload:
test = java.io.File.createTempFile("Test", ".pdf");
All you need is to point to the same file and then read it.
If you need a custom directory for saving the files, you can either define a property (my.file.path=some path here) or use the system temp directory:
public class Main {
public static void main(String[] args) {
String property = "java.io.tmpdir";
String tempDir = System.getProperty(property);
System.out.println("OS current temporary directory is " + tempDir);
}
}
Got the code from the link
Actually, the approach is not safe. What happens if two different users upload files with the same name? What if one file is uploaded and another user tries to download it? What if the number of files runs into the millions? And so on.
It's better to use independent file storage, but for a test project it's fine.
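One way to let the download step find the right temp file, and to avoid clashes between users, is to hand out a random token for each generated file instead of relying on a fixed name. The following is a minimal sketch under stated assumptions: `TempFileRegistry` and its methods are hypothetical (they are not part of Spring or PDF Clown), a single shared instance is assumed, and the token would have to be carried from the upload result to the download request, for example in the HTTP session or as a request parameter.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: not part of Spring or PDF Clown.
class TempFileRegistry {

    // Maps a random token to the temp file created for one upload.
    private final Map<String, Path> filesByToken = new ConcurrentHashMap<>();

    // Creates a uniquely named temp file and returns the token that identifies it.
    public String createTempFile(String prefix, String suffix) throws IOException {
        Path file = Files.createTempFile(prefix, suffix);
        String token = UUID.randomUUID().toString();
        filesByToken.put(token, file);
        return token;
    }

    // Returns the path for a token, e.g. so the edited PDF can be saved to it.
    public Path resolve(String token) {
        Path file = filesByToken.get(token);
        if (file == null) {
            throw new IllegalArgumentException("Unknown download token");
        }
        return file;
    }

    // Streams the file to the target (e.g. the servlet response stream),
    // then deletes it and forgets the token.
    public void streamAndDelete(String token, OutputStream target) throws IOException {
        Path file = filesByToken.remove(token);
        if (file == null) {
            throw new IllegalArgumentException("Unknown download token");
        }
        try {
            Files.copy(file, target);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Files.copy streams the file directly, so the intermediate ByteArrayOutputStream from the question is not needed. Because the map lives in memory, tokens are lost on restart, which is one reason independent file storage is the better choice beyond a test project.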

Transferring data from input to output File, getting Exception

My data is not transferred to the output file; I always get an exception.
import java.io.*;
import java.util.*;
class TransferData {
public static void main(String[] args) {
String path1="E:\\IO\\Input.txt";
String path2="E:\\IO\\Output.txt";
int data;
System.out.println("Transfering started...");
try {
FileInputStream fis=new FileInputStream(path1);
FileOutputStream fos=new FileOutputStream(path2);
while((data=fis.read())!=-1) {
fos.write(data);
}
}
catch(Exception e) {
System.out.println("exception caught!");
}
System.out.println("Completed...");
}
}
How do I transfer the data to the output file?
I tested this code on my local machine and it works without exceptions.
Check that the file E:/IO/Input.txt exists.
Check that the directory E:/IO is writable for your user.
(If the file E:/IO/Output.txt already exists, check that it is writable and not open in another program.)
Regarding the code:
It is good practice to close the FileInputStream and FileOutputStream once the program has finished with them.
public class TransferData {
public static void main(String[] args) {
String path1 = "E:\\IO\\Input.txt";
String path2 = "E:\\IO\\Output.txt";
int data;
System.out.println("Transfering started...");
FileInputStream fis = null;
FileOutputStream fos = null;
try {
fis = new FileInputStream(path1);
fos = new FileOutputStream(path2);
while ((data = fis.read()) != -1) {
fos.write(data);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
if (fis != null) {
fis.close();
}
if (fos != null) {
fos.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
System.out.println("Completed...");
}
}
If you replace System.out.println("exception caught!"); with e.printStackTrace(); then you will get a much more useful error message.
If you then post that error message here, people will be able to help you much more easily.
It could be the case that the program cannot find the file you're trying to read.
I highly suggest using e.printStackTrace(), as the others suggested.
One possible problem might be filesystem permissions, or the file you are trying to read from not existing.
You might also want to use try-with-resources to simplify your code.
Your code is also missing a close statement for your streams.
Altogether, your code would look something like this:
import java.io.*;
import java.util.*;
class TransferData {
public static void main(String[] args) {
String path1="E:\\IO\\Input.txt";
String path2="E:\\IO\\Output.txt";
int data;
System.out.println("Transfering started...");
try (
FileInputStream fis=new FileInputStream(path1);
FileOutputStream fos=new FileOutputStream(path2)
) {
while((data=fis.read())!=-1) {
fos.write(data);
}
}
catch(Exception e) {
e.printStackTrace();
}
}
}
One last thing, if you post your code on StackOverflow, please do not mix different formatting styles (e.g. { in the same line as an if and sometimes in the next) and try to have the code well formatted from the beginning.
Add e.printStackTrace() to your catch block and post the output printed in your console here; people will then be able to help you better.
The most likely cause of the exception is that the system cannot find the file "E:\\IO\\Input.txt" or "E:\\IO\\Output.txt". Make sure that the files are there and that Output.txt is not set to read-only.
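As an alternative to the byte-by-byte loop, the copy can be delegated to java.nio.file.Files.copy. This is a sketch, not a replacement the answers above proposed: `TransferDataNio` and `transfer` are hypothetical names, and the paths are the ones from the question. A missing input file typically surfaces as a NoSuchFileException that names the path, which makes the printed stack trace easier to interpret.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical class name; the paths in main are taken from the question.
class TransferDataNio {

    // Copies source to target, replacing the target if it already exists.
    static void transfer(Path source, Path target) throws IOException {
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) {
        Path input = Paths.get("E:\\IO\\Input.txt");
        Path output = Paths.get("E:\\IO\\Output.txt");
        System.out.println("Transferring started...");
        try {
            transfer(input, output);
            System.out.println("Completed...");
        } catch (IOException e) {
            // Print the full trace so the failing path and cause are visible.
            e.printStackTrace();
        }
    }
}
```

Here "Completed..." is printed only when the copy actually succeeded, unlike the original program, which prints it even after an exception.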
