How do I fetch a different URL from a page in Java?

I am working on a program to download the first 100 comics from the XKCD website; however, the URL of each XKCD comic page differs from the URL of its image. To keep things simple, I was wondering if there is an easy way to get the image URL after visiting the XKCD page. Here is my code:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

public class XKCD {
public static void saveImage(String imageUrl, int i) throws IOException {
URL url = new URL(imageUrl);
String fileName = url.getFile();
String destName = i + fileName.substring(fileName.lastIndexOf("/"));
System.out.println(destName);
// try-with-resources closes both streams even if reading or writing fails
try (InputStream is = url.openStream();
OutputStream os = new FileOutputStream(destName)) {
byte[] b = new byte[2048];
int length;
while ((length = is.read(b)) != -1) {
os.write(b, 0, length);
}
}
}
public static void main(String[] args) throws IOException {
for (int i = 1; i <= 100; i++) {
saveImage("https://xkcd.com/" + i + "/", i);
}
}
}

XKCD has a JSON API: https://xkcd.com/about/
Is there an interface for automated systems to access comics and metadata?
Yes. You can get comics through the JSON interface, at URLs like http://xkcd.com/info.0.json (current comic) and http://xkcd.com/614/info.0.json (comic #614).
Here is a good Java JSON library: https://github.com/stleary/JSON-java
It's REALLY easy to use; I have used it a lot.
So if you have the text of xkcd.com/info.0.json in a String called txt, you can write:
import org.json.*;
JSONObject obj = new JSONObject(txt);
String url = obj.getString("img");
String titleText = obj.getString("alt");
// "num" is a JSON number, while "year", "month" and "day" are strings;
// getInt accepts both, whereas getString throws for a number
int num = obj.getInt("num");
int year = obj.getInt("year");
int month = obj.getInt("month");
int day = obj.getInt("day");
String title = obj.getString("title");
// placeholder: download the image, or do whatever you need with its URL
Image img = downloadImageOrWhateverYouDoWithTheImageURL(url);
This should work.
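Once obj.getString("img") has given you the image URL, the question's saveImage still needs a usable local file name: i + fileName.substring(fileName.lastIndexOf("/")) yields something like 1/barrel_cropped_(1).jpg, which FileOutputStream treats as a file inside a directory named 1 and fails if that directory does not exist. Below is a minimal stdlib-only sketch, assuming the "img" value has already been read as above; the class XkcdImageSaver and its methods are hypothetical names, not part of JSON-java or of the original answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class XkcdImageSaver {
    // JSON metadata URL for comic number n, following the pattern from https://xkcd.com/about/
    static String infoUrl(int n) {
        return "https://xkcd.com/" + n + "/info.0.json";
    }

    // Local file name such as "1_barrel_cropped_(1).jpg": the comic number plus the
    // last path segment of the image URL, joined with "_" so no directory is needed.
    static String destName(String imageUrl, int n) {
        return n + "_" + imageUrl.substring(imageUrl.lastIndexOf('/') + 1);
    }

    // Downloads the image into the current directory (requires network access).
    static Path saveImage(String imageUrl, int n) throws IOException {
        Path dest = Paths.get(destName(imageUrl, n));
        try (InputStream in = URI.create(imageUrl).toURL().openStream()) {
            Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
        }
        return dest;
    }
}
```

In the question's loop you would fetch infoUrl(i), parse the text with JSONObject, and pass obj.getString("img") to saveImage together with i.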

I suggest using jsoup for that; it can produce an absolute URL from a relative link.
You can add the library to your project with this Maven dependency:
<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.11.2</version>
</dependency>
Then you can get the absolute URL of the image with simple code like this:
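import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class XkcdImageUrl {
public static void main(String[] args) throws IOException {
Document document = Jsoup.connect("https://xkcd.com/").get();
Elements links = document.select("img");
// absUrl resolves each (possibly relative) src against the page URL
links.stream()
.map(link -> link.absUrl("src"))
.filter(str -> str.contains("/comics"))
.forEach(System.out::println);
}
}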
If you run this code you will see the image URL printed out on the console:
https://imgs.xkcd.com/comics/river_border.png
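absUrl("src") resolves the attribute against the URL of the page it was loaded from. As a rough stdlib illustration of that resolution step (not jsoup's own implementation), java.net.URI.resolve handles the same kinds of links; RelativeLinks is a hypothetical helper name, and the protocol-relative src in the usage below is an assumed example of what such an attribute can look like.

```java
import java.net.URI;

class RelativeLinks {
    // Resolves a possibly relative link (protocol-relative "//host/...", root-relative
    // "/...", or path-relative "name") against the URL of the page it came from.
    static String absolute(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }
}
```

For example, absolute("https://xkcd.com/1/", "//imgs.xkcd.com/comics/barrel_cropped_(1).jpg") gives https://imgs.xkcd.com/comics/barrel_cropped_(1).jpg, which is the kind of URL saveImage actually needs.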

The issue here is that you're calling the saveImage method with the page URL, not the image URL.
Fetch the page itself, then extract the image URL with a regex from a line like this example:
"Image URL (for hotlinking/embedding): https://imgs.xkcd.com/comics/barrel_cropped_(1).jpg"
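A minimal sketch of that regex approach, assuming you already have the page HTML as a String (for example by reading url.openStream() for the comic page). XkcdPageParser is a hypothetical name, and whether the live page wraps the URL in an <a> tag is an assumption, so the pattern accepts both the plain form shown above and an anchor-wrapped form.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XkcdPageParser {
    // Label text taken from the example line; an optional <a ...> opening tag
    // after the label is skipped before capturing the URL.
    private static final Pattern IMAGE_URL = Pattern.compile(
            "Image URL \\(for hotlinking/embedding\\):\\s*(?:<a[^>]*>)?\\s*(https?://[^\\s\"<]+)");

    // Returns the hotlinking URL found in the page text, if any.
    static Optional<String> imageUrl(String pageText) {
        Matcher m = IMAGE_URL.matcher(pageText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

The extracted URL can then be passed to saveImage in place of the page URL.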

Related

Best way to get Amazon page and product information

I want to get Amazon page and product information from their website so I can work on a future project. I have no experience with APIs, and I also saw that I would need to pay in order to use Amazon's. My current plan was to use a WebRequest class, which basically pulls down the page's raw text, and then parse through it to get what I need. It pulls down HTML from every website I have tried except Amazon. When I try to use it for Amazon, I get text like this...
??èv~-1?½d!Yä90û?¡òk6??ªó?l}L??A?{í??j?ì??ñF Oü?ª[D ú7W¢!?É?L?]â  v??ÇJ???t?ñ?j?^,Y£>O?|?I`OöN??Q?»bÇJPy1·¬Ç??RtâU??Q%vB??^íè|??ª?
Can someone explain to me why this happens? Or even better if you could point me towards a better way of doing this? Any help is appreciated.
This is the class I mentioned...
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;

public class WebRequest {
protected String url;
protected ArrayList<String> pageText;
public WebRequest() {
url = "";
pageText = new ArrayList<String>();
}
public WebRequest(String url) {
this.url = url;
pageText = new ArrayList<String>();
load();
}
public boolean load() {
boolean returnValue = true;
try {
URL thisURL = new URL(url);
// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(new InputStreamReader(thisURL.openStream()))) {
String line;
while ((line = reader.readLine()) != null) {
pageText.add(line);
}
}
}
catch (Exception e) {
returnValue = false;
System.out.println("Failed to load " + url + ": " + e);
}
return returnValue;
}
public boolean load(String url) {
this.url = url;
return load();
}
@Override
public String toString() {
// StringBuilder avoids creating a new String on every iteration
StringBuilder returnString = new StringBuilder();
for (String s : pageText) {
returnString.append(s).append("\n");
}
return returnString.toString();
}
}
It could be that the page is returned in a different character encoding than your platform default. If that's the case, you should specify the appropriate encoding, e.g.:
new InputStreamReader(thisURL.openStream(), "UTF-8")
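Rather than hard-coding "UTF-8", you can use the charset the server declares in its Content-Type header (URLConnection.getContentType() returns that header value). A minimal sketch; ContentTypeCharset is a hypothetical helper name, and the fallback charset is your choice:

```java
import java.nio.charset.Charset;
import java.util.Locale;

class ContentTypeCharset {
    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1"; returns the fallback when it is absent or unknown.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // covers IllegalCharsetNameException and UnsupportedCharsetException
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

With conn = thisURL.openConnection(), the reader would then be new InputStreamReader(conn.getInputStream(), ContentTypeCharset.charsetOf(conn.getContentType(), StandardCharsets.UTF_8)).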
But that data doesn't look like character data at all to me. It's too random. It looks like binary data. Are you sure you're not downloading an image by mistake?
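One common way for HTML to arrive looking like binary is a compressed body: if a response is sent with Content-Encoding: gzip, URLConnection hands you the compressed bytes as-is (URLConnection.getContentEncoding() reports that header). Whether that is what happened with Amazon here is only a guess. A minimal sketch that checks for the gzip magic bytes and decompresses; BodyDecoder is a hypothetical name, and it assumes the text is UTF-8:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class BodyDecoder {
    // True if the bytes start with the gzip magic number 0x1f 0x8b (RFC 1952).
    static boolean looksGzipped(byte[] body) {
        return body.length >= 2 && (body[0] & 0xff) == 0x1f && (body[1] & 0xff) == 0x8b;
    }

    // Returns the body as UTF-8 text, decompressing it first if it is gzip data.
    static String decode(byte[] body) throws IOException {
        if (!looksGzipped(body)) {
            return new String(body, StandardCharsets.UTF_8);
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```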
If you want to make more sophisticated HTTP requests, there are quite a few Java libraries, e.g. OkHttp and AsyncHttpClient.
But it's worth bearing in mind that Amazon probably doesn't like people scraping its site, and will have built-in detection of malicious or unwanted activity. It might be sending you gibberish on purpose to deter you from continuing. You should be careful, because some big sites may block your IP temporarily or permanently.
My advice would be to learn how to use the Amazon APIs. They're pretty powerful, and you won't get yourself banned.

Why doesn't the view count increase when Java opens the pages?

I have code that uses Tor to get a new IP address each time and then opens a blog page, but the blog's view counter does not increase. Why?
import java.io.InputStream;
import java.net.*;
public class test {
public static void main (String args [])throws Exception {
System.out.println (test.getData("http://checkip.amazonaws.com"));
System.out.println (test.getData("***BLOG URL***"));
}
public static String getData(String ur) throws Exception {
String TOR_IP="127.0.0.1", TOR_PORT="9050";
System.setProperty("java.net.preferIPv4Stack" , "true");
System.setProperty("socksProxyHost", TOR_IP);
System.setProperty("socksProxyPort", TOR_PORT);
URL url = new URL(ur);
String s = "";
URLConnection c = url.openConnection();
c.connect();
InputStream i = c.getInputStream();
int j ;
while ((j = i.read()) != -1) {
s+=(char)j;
}
return s;
}
}
I made this only to understand what a little automated script like this would have to get past.
This is an evolving field: blog sites try to detect and thwart cheating. WordPress in particular excludes (https://en.support.wordpress.com/stats/):
visits from browsers that do not execute javascript or load images
In other words just hitting the page doesn't count. You need to fetch all the resources and possibly execute the JavaScript as well.

How can I receive multiple files as InputStreams and process them accordingly?

I want to receive multiple files uploaded from my client side. I upload multiple files and send the request to my server side (Java), which uses JAX-RS (Jersey).
I have the following code:
@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
public void upload(@Context UriInfo uriInfo,
@FormDataParam("file") final InputStream is,
@FormDataParam("file") final FormDataContentDisposition detail) throws IOException {
// try-with-resources closes the output file when done
try (FileOutputStream os = new FileOutputStream("Path/to/save/" + appropriatefileName)) {
byte[] buffer = new byte[1024];
int length;
while ((length = is.read(buffer)) > 0) {
os.write(buffer, 0, length);
}
}
}
How can I write the files separately on the server side, just as they were uploaded on the client side?
For example, I uploaded files such as My_File.txt, My_File.PNG and My_File.doc.
I need to save them on the server side with the same names: My_File.txt, My_File.PNG, My_File.doc.
How can I achieve this?
You could try something like this:
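@POST
@Consumes(MediaType.MULTIPART_FORM_DATA)
public void upload(FormDataMultiPart formParams) throws IOException
{
Map<String, List<FormDataBodyPart>> fieldsByName = formParams.getFields();
// Usually each value in fieldsByName will be a list of length 1.
// Assuming each field in the form is a file, just loop through them.
for (List<FormDataBodyPart> fields : fieldsByName.values())
{
for (FormDataBodyPart field : fields)
{
// getName() is the form field name; the client's file name comes
// from the part's Content-Disposition header
String fileName = field.getFormDataContentDisposition().getFileName();
try (InputStream is = field.getEntityAs(InputStream.class)) {
// Save under the original name; needs imports of java.nio.file.Files,
// Paths and StandardCopyOption ("Path/to/save" is the directory from the question)
Files.copy(is, Paths.get("Path/to/save", fileName), StandardCopyOption.REPLACE_EXISTING);
}
// if you want the media type for validation, it's field.getMediaType()
}
}
}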
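Since the file names come from the client, it is worth keeping only the last path element before resolving them against the upload directory, so a name such as ../../something cannot escape it (some older browsers also sent the full client-side Windows path as the file name). A minimal sketch; UploadNames.safeTarget is a hypothetical helper, and you would pass it the upload directory and the name from getFormDataContentDisposition().getFileName():

```java
import java.nio.file.Path;

class UploadNames {
    // Resolves a client-supplied file name inside uploadDir, keeping only the last
    // path element so names like "../../etc/passwd" cannot leave the directory.
    static Path safeTarget(Path uploadDir, String clientFileName) {
        // Treat both "/" and "\" as separators, whatever OS the client used.
        String normalized = clientFileName.replace('\\', '/');
        String base = normalized.substring(normalized.lastIndexOf('/') + 1);
        if (base.isEmpty() || base.equals(".") || base.equals("..")) {
            throw new IllegalArgumentException("Unusable file name: " + clientFileName);
        }
        return uploadDir.resolve(base);
    }
}
```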
There is a blog post covering the scenario you are looking for. Hope this helps:
http://opensourzesupport.wordpress.com/2012/10/27/multiple-file-upload-along-with-form-data-in-jax-rs/

JSF application: Open file, not download it

Current situation: I'm trying to create a JSF app (portlet) that should contain links to Excel files (xls, xlt) stored on the public network drive G:, which is mapped for all users in our company. The main goal is to unify access to these files and save users the work of searching for the reports somewhere on the G: drive. I hope that's clear.
I'm using the following servlet to open a file. The problem is that the file is not simply opened: the browser downloads it first and only then opens it:
import java.io.*;
import java.net.URLDecoder;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(name="fileHandler", urlPatterns={"/fileHandler/*"})
public class FileServlet extends HttpServlet
{
private static final int DEFAULT_BUFFER_SIZE = 10240; // 10KB.
private String filePath;
public void init() throws ServletException {
this.filePath = "c:\\Export";
System.out.println("fileServlet initialized: " + this.filePath);
}
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException
{
String requestedFile = request.getPathInfo();
File file = new File(filePath, URLDecoder.decode(requestedFile, "UTF-8"));
String contentType = getServletContext().getMimeType(file.getName());
response.reset();
response.setBufferSize(DEFAULT_BUFFER_SIZE);
response.setContentType(contentType);
response.setHeader("Content-Length", String.valueOf(file.length()));
response.setHeader("Content-Disposition", "attachment; filename=\"" + file.getName() + "\"");
BufferedInputStream input = null;
BufferedOutputStream output = null;
try {
input = new BufferedInputStream(new FileInputStream(file), DEFAULT_BUFFER_SIZE);
output = new BufferedOutputStream(response.getOutputStream(), DEFAULT_BUFFER_SIZE);
byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
int length;
while ((length = input.read(buffer)) > 0) {
output.write(buffer, 0, length);
}
} finally {
close(output);
close(input);
}
}
// Closes a stream, ignoring any IOException thrown by close()
private static void close(Closeable resource) {
if (resource != null) {
try {
resource.close();
} catch (IOException ignored) {
// nothing useful to do if closing fails here
}
}
}
}
How can I just start the appropriate application (e.g. Excel, Word) by clicking a link (with an absolute file path) and open the file in its original location?
UPDATE: I'm trying to use an <a> tag:
File // various "/" "\" "\\" combinations
File
But it doesn't work:
type Status report
message /G:/file.xls
description The requested resource is not available.
File URLs are considered a security risk by most browsers, because they let a web page open files on the client's machine without the end user being aware of it. If you really want to do that, you'll have to configure the browser to allow it.
See the Wikipedia article for solutions.
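On the servlet side, the Content-Disposition: attachment header in the question's FileServlet explicitly asks the browser to save the file, while inline asks it to display the file itself where it can. Browsers typically cannot render .xls files in place, so this does not get around the limitation described above; it only affects file types the browser can show. A minimal sketch of building either header value (RFC 6266 style, with an RFC 5987 filename* parameter for non-ASCII names); ContentDispositionHeader is a hypothetical helper name:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ContentDispositionHeader {
    // Builds a Content-Disposition value: "inline" asks the browser to show the file
    // itself if it can; "attachment" (as in the question's servlet) asks it to save it.
    static String build(String fileName, boolean inline) {
        // ASCII-only fallback for old clients: non-ASCII replaced by '_', quotes escaped
        String ascii = fileName.replaceAll("[^\\x20-\\x7e]", "_")
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        // Percent-encoded UTF-8 for filename*; URLEncoder uses '+' for spaces and leaves '*' as is
        String encoded = URLEncoder.encode(fileName, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A");
        return (inline ? "inline" : "attachment")
                + "; filename=\"" + ascii + "\"; filename*=UTF-8''" + encoded;
    }
}
```

In the servlet this would replace the hand-built string, e.g. response.setHeader("Content-Disposition", ContentDispositionHeader.build(file.getName(), true)).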

Showing images outside my application using Tapestry5

I am developing my first project with Tapestry and I am almost finished, except for the images.
What do I want? I just need to display an image stored outside my application, for example: /home/app/images/image.jpg
What did I try? I have been googling and reading the Tapestry5 forums, and I found this: http://wiki.apache.org/tapestry/Tapestry5HowToStreamAnExistingBinaryFile
I followed the steps and created the classes, but I need to display the image embedded in another page (so I can't use ImagePage). I tried this:
In the page Java class:
public StreamResponse getImage() {
InputStream input = DetallesMultimedia.class
.getResourceAsStream("/home/santi/Escritorio/evolution-of-mario.jpg"); // In the real application, I will retrieve this path from the DB
return new JPEGInline(input,"hellow");
}
In the page template:
...
<img src="${image}" alt:image/>
...
or
...
${image}
...
Obviously, this didn't work, and I really don't know how to do it. I read about loading the image in an event (returning the OutputStream from that event, as described in the HowTo linked above), but my English is not very good (I'm sure you've already noticed) and I don't really understand how to do that.
Could you help me please?
Thank you all.
I've never seen the examples on that wiki page, but below is some code showing how to load an image from the classpath using a StreamResponse.
@Inject
private ComponentResources resources;
@OnEvent(value = "GET_IMAGE_STREAM_EVENT")
private Object getProfilePic() throws Exception {
byte[] imageBytes;
// try-with-resources closes the resource stream after reading it into memory
try (InputStream openStream = DetallesMultimedia.class.getResourceAsStream("/home/santi/Escritorio/evolution-of-mario.jpg")) {
imageBytes = IOUtils.toByteArray(openStream);
}
final ByteArrayInputStream output = new ByteArrayInputStream(imageBytes);
final StreamResponse response = new StreamResponse() {
public String getContentType() {
// use the type that matches the file: "image/jpeg", "image/png" or "image/gif"
return "image/jpeg";
}
public InputStream getStream() throws IOException {
return output;
}
public void prepareResponse(Response response) {
// add response headers if you need to here
}
};
return response;
}
public String getPicUrl() throws Exception {
// createEventLink returns a Link; toURI() turns it into the URL string for <img src>
return resources.createEventLink("GET_IMAGE_STREAM_EVENT").toURI();
}
In your template:
<img src="${picUrl}"/>
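Note that Class.getResourceAsStream looks the name up on the classpath, so an absolute filesystem path such as /home/santi/Escritorio/evolution-of-mario.jpg will normally not be found that way (it returns null). For an image stored outside the application, a minimal sketch is to open it with java.nio.file.Files and derive the content type from the extension; ExternalImage is a hypothetical helper, and only the three image types mentioned in the answer are mapped.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

class ExternalImage {
    // Picks an image MIME type from the file extension (jpg/jpeg, png and gif only).
    static String contentTypeFor(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        if (name.endsWith(".jpg") || name.endsWith(".jpeg")) return "image/jpeg";
        if (name.endsWith(".png")) return "image/png";
        if (name.endsWith(".gif")) return "image/gif";
        return "application/octet-stream";
    }

    // Opens a file from the filesystem, e.g. /home/app/images/image.jpg,
    // instead of searching the classpath as getResourceAsStream does.
    static InputStream open(Path file) throws IOException {
        return Files.newInputStream(file);
    }
}
```

In the StreamResponse above, getStream() could return ExternalImage.open(path) and getContentType() could return ExternalImage.contentTypeFor(path), where path is built from the value stored in your database.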
