Including REST-generated RSS feed - java

I have some problems understanding how RSS works in my particular situation.
I have a REST service (written in Java using Spring) which reads some information from a database and dynamically generates the RSS page. The pubDate element of each item is filled with the current date.
The service is reachable under a URL like "http://intern.system.com/rest/api/rss".
I took that URL and included it in a wiki page (the wiki is, in this scenario, the RSS reader).
The background of this workflow is the following: the database is filled with events or to-dos for the next few days. Each event has a title, a description and a date. Until now this information has been picked up by hand and transferred to a wiki page.
My goal is to automate this process. I want to generate an RSS feed of the events or to-dos for the current day (that is what my REST service does) and automatically show it on a wiki page.
Is this a good way to do it? Is the RSS feed shown the whole day (or only on the first call), and for all visitors? For example, one person opens the page at 8 o'clock, another at 9. Both should see the same information for the day. I think the REST service is called twice in that case. Is this a problem?

This is not a problem, but as developers we should chase best practices in what we build. Your case calls for caching your REST service. If your data is updated daily, you can use caching, which is mostly preferred for static resources. You can add the @Cacheable("rssCache") annotation to your component methods; after the first call, the result will be cached.
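A minimal sketch of what that could look like, assuming caching is enabled with @EnableCaching and scheduling with @EnableScheduling; the class, method names and cron-based eviction here are hypothetical details, not part of the question:

import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class RssFeedService {

    // First call builds the feed; later calls return the cached result.
    @Cacheable("rssCache")
    public String dailyFeed() {
        return buildFeedFromDatabase();
    }

    // Evict the cache at midnight so the next request regenerates the feed.
    @CacheEvict(value = "rssCache", allEntries = true)
    @Scheduled(cron = "0 0 0 * * *")
    public void resetDailyFeed() {
    }

    private String buildFeedFromDatabase() {
        // placeholder for the actual database query and RSS generation
        return "<rss version=\"2.0\">...</rss>";
    }
}

With this in place it does not matter how often the feed is requested: the visitor at 8 o'clock triggers the database read, and the visitor at 9 o'clock gets the same cached result until the cache is evicted at midnight.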

What sequence of steps does crawler4j follow to fetch data?

I'd like to learn how crawler4j works.
Does it fetch a web page, then download its content and extract it?
What about the .db and .csv files and their structure?
Generally, what sequence does it follow?
Please, I want a descriptive answer.
Thanks
General Crawler Process
The process for a typical multi-threaded crawler is as follows:
1. We have a queue data structure, which is called the frontier. Newly discovered URLs (or starting points, so-called seeds) are added to this data structure. In addition, every URL is assigned a unique ID in order to determine whether it was previously visited.
2. Crawler threads then obtain URLs from the frontier and schedule them for later processing.
3. The actual processing starts:
   - The robots.txt for the given URL is determined and parsed to honour exclusion criteria and be a polite web crawler (configurable).
   - Next, the thread checks for politeness, i.e. how long to wait before visiting the same host again.
   - The actual URL is visited by the crawler and the content is downloaded (this can be literally everything).
   - If we have HTML content, it is parsed and potential new URLs are extracted and added to the frontier (in crawler4j this can be controlled via shouldVisit(...), as sketched below).
4. The whole process is repeated until no new URLs are added to the frontier.
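A rough sketch of how those hooks appear in crawler4j; the method signatures below match the 4.x releases (older versions use shouldVisit(WebURL url) without the referring page), and the URL prefix is just a placeholder:

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    // Called for every extracted link: decide whether it is added to the frontier.
    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        return url.getURL().toLowerCase().startsWith("https://example.com/");
    }

    // Called once the page has been downloaded and parsed.
    @Override
    public void visit(Page page) {
        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData html = (HtmlParseData) page.getParseData();
            System.out.println("Visited: " + page.getWebURL().getURL()
                    + " (" + html.getOutgoingUrls().size() + " outgoing links)");
        }
    }
}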
General (Focused) Crawler Architecture
Besides the implementation details of crawler4j, a more or less general (focused) crawler architecture (on a single server/PC) looks like this:
[The original answer illustrated this with an architecture diagram, the author's own work.]

Reading a lot of data from the internet

I am currently working on a project for my portfolio. I'm having a little trouble finding the right solution to this problem, mainly because I have never tried anything like it before.
I am using a free API service that I found online. I have created the database to match all the information, and now I just need to download the information and parse it into my application.
I have parsed data from the API (JSON) into my database before. A couple of suggestions that I have found involve reading 10 records at a time, but I want to try reading everything at once and then updating accordingly (let us say every 24 hours).
The API I am using is a free Game of Thrones API, and below is a list showing how the URL is formed to access each part of the data as I move through it.
https://anapioficeandfire.com/api/characters/
https://anapioficeandfire.com/api/books/
https://anapioficeandfire.com/api/houses/
At the end of each of these URLs is a number that indicates the record I am trying to get. I have done this before while getting information from a single page, where the page contained multiple JSON objects. This time I need to move through multiple pages to get the single object on each page.
To give you an idea of the steps that I am looking for:
Go to the page
Download the information
Move on to the next page
Break when I have reached the end
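A minimal sketch of that loop in plain Java, assuming the API returns HTTP 404 once the record number runs past the last existing entry (verify this against the API's documentation before relying on it):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ApiDownloader {

    private static final String BASE = "https://anapioficeandfire.com/api/characters/";

    public static void main(String[] args) throws IOException {
        int id = 1;
        while (true) {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(BASE + id).openConnection();

            // Assumption: the API answers 404 once the record number
            // runs past the last existing entry.
            if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_FOUND) {
                break;
            }

            StringBuilder json = new StringBuilder();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    json.append(line);
                }
            }
            // parse json and insert/update the matching database record here
            System.out.println("Fetched record " + id);
            id++;
        }
    }
}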

Find direct link in an aspx form or read forms with Android

I visit this link daily to find my lectures at school. Every time, I have to scroll down the list to find my own class and then post the form so I can view the result. Is there any way I could make a direct link to the preferred content? I'm looking to create a simple WebView app in Android showing individual form categories.
EDIT: Really, any method for converting the aspx info into another format would do the trick. Preferably a direct link to each form item, but if I can convert every single item to a .xml file or anything else, I could work with that. However, it has to be automated.
You can capture the outgoing request and write a simple application to POST the data back to the page. The WebClient class is useful for this.
Looking at the request in Chrome's developer tools, I see that the form posts back to itself and then redirects to the result page. Presumably, you should POST the form data to the initial page, which will then cause it to perform the redirect.
The form contains a large amount of ViewState data which may or may not need to be included in the request to make it work.
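Since the goal is an Android app, here is a hedged Java sketch of that POST; the form field name classDropDown is hypothetical, and the real names (including the hidden __VIEWSTATE and __EVENTVALIDATION fields) must be copied from the request captured in the browser's developer tools:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormPoster {

    public static String postForm(String pageUrl, String viewState,
                                  String classValue) throws IOException {
        // Field names are hypothetical; copy the real ones from the
        // captured request.
        String body = "__VIEWSTATE=" + URLEncoder.encode(viewState, "UTF-8")
                + "&classDropDown=" + URLEncoder.encode(classValue, "UTF-8");

        HttpURLConnection conn =
                (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded");
        // The page redirects to the result; this may still need manual
        // handling of the Location header after the POST.
        conn.setInstanceFollowRedirects(true);

        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        StringBuilder result = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                result.append(line).append('\n');
            }
        }
        return result.toString();
    }
}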
A completely different approach would be to find a browser extension, such as a macro recorder, which emulates your actions. This plugin (I haven't tried it myself) appears to do exactly that.

Java HTML element reading

I am trying to create a Java program that can detect changes in HTML elements on a web page. For example: http://timer.onlineclock.net/
With each passing second, the HTML elements of the clock change the source of the image they display. Is there any way, using Java, that I can EFFICIENTLY open a connection to this page and see when these elements change?
I have used HtmlUnit, but I decided it takes too long to load a page to be considered efficient enough.
The only way I know how to do it with a URL is to use a BufferedReader to read the page and then use regular expressions to parse an HTML element within the source, but this would require me to "reload" the page every time I want to see the properties of an element. Can anybody give me any suggestions on how I can detect these changes in a matter of milliseconds, without using many network resources?
Your best bet is to learn and use JavaScript instead of server-side Java. A JavaScript program runs on the client side (i.e. the web browser), as opposed to the server side.
A typical HTML document consists of elements (e.g. text, paragraphs, list items etc.). With JavaScript you can create timers, react to the user's events and manipulate those elements.
http://www.w3schools.com/js/default.asp is probably a good introduction to JavaScript; I suggest you spend some time on it.
The page in question appears to be a... JavaScript digital clock.
If you want the current time, try new Date();.
If you want code to be called at a constant rate, try the Timer class. You can set a task to be called every second, which is the same frequency you would get by polling the page.
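For example, a small sketch using java.util.Timer; printing the time stands in for whatever work you would do each second:

import java.util.Timer;
import java.util.TimerTask;

public class ClockPoller {
    public static void main(String[] args) {
        Timer timer = new Timer();
        // Fire once per second, the same rate at which the page's clock updates.
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                System.out.println("Current time: " + new java.util.Date());
            }
        }, 0, 1000);
    }
}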
If you want to use the page as an external source of time, try the Network Time Protocol. http://en.wikipedia.org/wiki/Network_Time_Protocol It will provide much lower latency and is actually designed for this purpose.
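If you go that route, a sketch using the Apache Commons Net library (the NTPUDPClient class; pool.ntp.org is just a commonly used public server pool, not something the answer prescribes):

import java.net.InetAddress;
import org.apache.commons.net.ntp.NTPUDPClient;
import org.apache.commons.net.ntp.TimeInfo;

public class NtpExample {
    public static void main(String[] args) throws Exception {
        NTPUDPClient client = new NTPUDPClient();
        client.setDefaultTimeout(3000); // milliseconds
        client.open();
        TimeInfo info = client.getTime(InetAddress.getByName("pool.ntp.org"));
        client.close();
        // Server's transmit timestamp, converted to Java epoch milliseconds.
        long serverTime = info.getMessage().getTransmitTimeStamp().getTime();
        System.out.println("NTP time: " + new java.util.Date(serverTime));
    }
}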

How can I fetch only a part of a webpage from the net rather than downloading the whole page

At the moment I am using a third-party library called Android Query. I use it to download the web page, then I parse the HTML for the bits that I want. These bits are mostly in one small part of the page, so the rest is discarded.
Each page is about 100 KB and I am fetching 200-300 pages, which takes a while, especially on a slow connection.
Is there any method or library to allow me to fetch a certain div?
The pages I am fetching are from the Google Play market.
Example code I am using:
String url = "https://play.google.com/store/apps/details?id=com.touchtype.swiftkey";

aq.ajax(url, String.class, new AjaxCallback<String>() {
    @Override
    public void callback(String url, String html, AjaxStatus status) {
        parseHtml(html);
    }
});
Edit: if it is not possible, is there a lightweight version of the Google Play pages that I can access and download?
Looking at the other answer here:
Is there a way to download partial part of a webpage, rather than the whole HTML body, programmatically?
It looks like there are ways to do it, but you'd need to know the byte range. This would be very error-prone, as the offsets could easily change over time if Google changes different parts of the page.
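For illustration, a sketch of such a byte-range request in plain Java; it assumes the server honours the Range header (it replies 206 Partial Content if so), and the 16 KB cutoff is an arbitrary guess with no guarantee the div you want falls inside it:

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PartialFetch {
    public static void main(String[] args) throws IOException {
        URL url = new URL(
                "https://play.google.com/store/apps/details?id=com.touchtype.swiftkey");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Ask for the first 16 KB only.
        conn.setRequestProperty("Range", "bytes=0-16383");

        int total = 0;
        byte[] buffer = new byte[4096];
        try (InputStream in = conn.getInputStream()) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
        }
        System.out.println("HTTP " + conn.getResponseCode()
                + ", " + total + " bytes received");
    }
}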
You could set up your own web server to sit between your app and Google Play and return the data you query.
There is a Hacker News thread about this:
https://news.ycombinator.com/item?id=4634259
Apparently Apple has a JSON API, but Google does not.
Here are two server-side libraries you could use to make the task easier:
https://github.com/chadrem/market_bot (Ruby)
https://github.com/pastfuture/MarketBot (PHP)
You could return only the data you need from your web app, making it as slim as possible. You could also cache it on your server so that you don't have to hit Google for every request, and send several app IDs in one request to minimize round trips.
