Algorithm of crawling Top10 PR/Alexa sites [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm trying to write a script that crawls the current top 10 PR/Alexa sites. Since PR/Alexa rankings change frequently, my script should handle that: a site that is not in the top 10 today could be there tomorrow.
I don't know how to start. I know crawling concepts, but here I'm stuck. It could be the top 50 or even the top 500 sites, which I can configure, of course.
I read about the Google spider, but it's very complicated for this simple task. How do Google, Yahoo, and Bing crawl billions of sites around the web? I'm just curious. What is the starting point? I mean, how can Google identify a newly launched site?
OK, these are very deep details; I will read about them later. Right now I'm more concerned with my own problem: how could I crawl the top 10 PR sites?
Can you provide a sample program so that I can understand better?

It's rather simple to fetch the top 25 sites (if I understood correctly what you wanted to do).
Code:
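# Requires the beautifulsoup4 package.
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the Alexa top-sites page and parse it with the built-in HTML parser.
b = BeautifulSoup(urlopen("http://www.alexa.com/topsites").read(), "html.parser")
# Each listed site is a link inside a <p class="desc-paragraph"> element.
paragraphs = b.find_all('p', {'class': 'desc-paragraph'})
for p in paragraphs:
    print(p.a.text)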
Output:
Google.com
Facebook.com
Youtube.com
Yahoo.com
Baidu.com
Wikipedia.org
(...)
But keep in mind that the law in some countries may be stricter. Do it at your own risk.
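Since the question is about a ranking that changes from day to day, one way to handle that is to refetch the list on every run and compare it with the list from the previous run. The following is only a sketch of that comparison step in Java; the class name `TopSitesDiff`, its methods, and the sample lists are made up for illustration, and fetching the list itself is left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: compares two snapshots of a "top N sites" list. */
final class TopSitesDiff {

    /** Sites that are in the current list but were not in the previous one. */
    static Set<String> entered(List<String> previous, List<String> current) {
        Set<String> result = new LinkedHashSet<>(current); // keeps ranking order
        result.removeAll(previous);
        return result;
    }

    /** Sites that were in the previous list but are no longer in the current one. */
    static Set<String> dropped(List<String> previous, List<String> current) {
        return entered(current, previous);
    }

    public static void main(String[] args) {
        // Illustrative data only, not real rankings.
        List<String> yesterday = List.of("google.com", "facebook.com", "yahoo.com");
        List<String> today = List.of("google.com", "youtube.com", "facebook.com");

        System.out.println("Start crawling: " + entered(yesterday, today));
        System.out.println("Stop crawling:  " + dropped(yesterday, today));
    }
}
```

Making N configurable (top 10, top 50, top 500) then only changes how many entries are kept from the fetched list before the comparison.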

Alexa has a paid API you can use
**There is also a free API**
There is a free API (though I haven't been able to find any documentation for it anywhere).
http://data.alexa.com/data?cli=10&url=%YOUR_URL%
You can also query for more data the following way:
http://data.alexa.com/data?cli=10&dat=snbamz&url=%YOUR_URL%
The letters in dat determine which information you get. This dat string is the one I've found that seems to return the most options. Also, cli changes the output completely; this value makes it return an XML document with quite a lot of information.
EDIT: This API is the one used by the Alexa toolbar.
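As a rough sketch, the two query URLs above could be assembled in Java like this. The class and method names are invented for illustration; only the base URL and the `cli`, `dat` and `url` parameters come from the answer, and the sketch does not check whether the endpoint still responds.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the query URLs shown above. */
final class AlexaDataUrl {
    private static final String BASE = "http://data.alexa.com/data";

    /**
     * @param siteUrl the site to look up (the %YOUR_URL% placeholder above)
     * @param dat     optional string of option letters such as "snbamz"; may be null
     */
    static String build(String siteUrl, String dat) {
        StringBuilder sb = new StringBuilder(BASE).append("?cli=10");
        if (dat != null && !dat.isEmpty()) {
            sb.append("&dat=").append(URLEncoder.encode(dat, StandardCharsets.UTF_8));
        }
        sb.append("&url=").append(URLEncoder.encode(siteUrl, StandardCharsets.UTF_8));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("example.com", null));
        System.out.println(build("example.com", "snbamz"));
    }
}
```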
Fetching Alexa data


Usage of java.util.ArrayList in ColdFusion [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am working on some old ColdFusion code. It probably dates from late '90s. It was programmed using
queryParams = createObject("java", "java.util.ArrayList");
...
arrayAppend( queryParams, {...});
...
It looks like a normal array. I am wondering if someone just created a normal ColdFusion array the hard way.
To preface this... My comment was an educated guess. The only person who could give a truly objective answer for a question like this is the champion who originally wrote the code you're looking at.
But yes, it's entirely possible (probable?) that the way people handled arrays in ColdFusion 20 years ago would seem alien to us today. ArrayNew() simply did not exist.
Pro Tip to anyone who reads this in the future: Adobe's help documentation usually has a "history" section that shows when functions came to be, or when they stopped being supported.
https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-functions/functions-a-b/arraynew.html
ArrayNew
> History
Introduced in ColdFusion MX
EDIT
From the comments, I have been informed that Adobe's official page appears to be wrong. I see there are books that reference the ArrayNew function going back to at least ColdFusion 4 in 1999.
I suppose it's still possible that OP's code is old enough to pre-date that function since he didn't give us a version, but an interesting development nonetheless.

How to start learning Big Data? What are the modules I need to concentrate on as a developer [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm planning to learn Big Data. I have just gone through some tutorials, but I'm a little confused about which modules I need to concentrate on from a developer's perspective. At present I'm working in Java. I hope your response will help with the next step of my Big Data journey.
First, I'd suggest getting familiar with the term: Big Data is a rather fluffy and debated one, more a marketing catchphrase than a technical specification, and it covers a huge range of technology.
Starting from that, I'd try to determine which aspect (IoT, building/running datacenters, ETL/data integration/warehousing, analytics/statistics/machine learning...) or perhaps which field of application (retail, bioinformatics...) you're interested in, and which is reasonably accessible from an employment point of view. I'd also think about the tech stack you'd like to work on (Scala, Python...).
Reverse engineering job offers could actually be a way to get that information.
The Data Scientist profile (ETL + machine learning + visualization) has gained broad acceptance and encompasses certain skill sets; Big Data Analyst and Big Data Engineer can also be found, arguably with less well-defined profiles.
Nowadays one can get whole MScs in data science (here's a personal evaluation of it), but perhaps you can get your foot in the door by a less fancy route too. Trainings come in varying quality. I found Andrew Ng's machine learning and deep learning (big neural networks) MOOCs stunning, and everything coming from the EPFL Scala side (if you want to go down that road) is technically superior and OK in terms of presentation (I tried Big Data Analysis with Scala and Spark).

Should i use for each loop/cursor in SQL or a regular one in Java? What is more efficient? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Hello Stack Overflow, I have a question: which is the proper way to solve this?
I have this SQL statement:
SELECT *
FROM task
, subject
WHERE task.id_subject = task.id
AND task.id_tasktype = 1
AND subject.id_evaluation = {ALL the ids of table evaluation}
If I want to execute this statement for every evaluation, which is more efficient: a loop/cursor or similar in SQL (I have only basic knowledge of SQL), or a regular for-each in Java?
It depends on your situation. Basically, if your database server and application server are two different computers, then you might decide to run the loop on the server which can handle more load. You need to look at some statistics to be able to determine this.
Also, you can implement both solutions and measure the time needed at the database server plus the time needed at the application server. If one of your loops is consistently quicker than the other in your experiments, then it is in practice the more efficient one for the scenario you are running. Of course, the scenario might change over time.
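A minimal sketch of such a measurement in Java, assuming each approach is wrapped in a `Runnable`. The names `LoopTiming`, `averageMillis`, `loopInJava` and `loopInDatabase` are hypothetical, and the two tasks are empty placeholders where the real JDBC code would go.

```java
/** Hypothetical timing helper for comparing two implementations. */
final class LoopTiming {

    /** Runs the task the given number of times and returns the average wall-clock time per run in ms. */
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Placeholders: replace the bodies with the Java for-each version
        // and with a call that runs the loop/cursor inside the database.
        Runnable loopInJava = () -> { /* one query per evaluation id, issued from Java */ };
        Runnable loopInDatabase = () -> { /* single call that loops on the database side */ };

        int runs = 20;
        System.out.printf("Java loop:     %.3f ms%n", averageMillis(loopInJava, runs));
        System.out.printf("Database loop: %.3f ms%n", averageMillis(loopInDatabase, runs));
    }
}
```

Repeating the runs and averaging smooths out one-off effects such as connection setup, which matters when the difference between the two approaches is small.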
Generally speaking, people tend to run this loop on the application server (Java), since you might need to execute some things available only there in the future, but if you have a very good reason to run this on the database server, like the case when a trigger should trigger this functionality, then you might decide to run it there.
Basically, you are trying to optimize a loop where you do not necessarily have a problem. If you encounter performance issues, then you might decide to experiment with a few things, including, but not limited to the suggestions shown in your question.

Creating a dialogue in Java with libGDX [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I want to make a dialogue tree (conversation tree) in Java using libGDX. Should I use lots of conditionals (if, else, etc.) to move on to the next dialogue, or is there a better way, such as reading a file (XML, for example) that already contains the dialogues? Also, I want the solution that consumes the least memory possible, because I am going to write it for Android.
Example of the dialogue tree:
(Q: Question, A: Answer , C:Choice ,AC:Action)
Q: Hi, is there any way that I can help you?
A: You owe me 5 dollars!
C1: Ask politely to return them to you, C2: Threaten her, C3: Draw your gun
A1: No way, get out of here, A2: Call the security, A3: Call the cops
AC1-2: Exit the building //end of choices 1-2
C3.1: Draw your gun and shoot the cops, C3.2: Jump from the window
AC3.1: Arrested, AC3.2: Dead
If your game is going to have little dialogue, I would use Strings for it, but if the game is built around dialogue, I would store it in SQLite or a similar database. I don't know whether it is the most efficient way to do it, but this is what occurred to me while reading your question:
You could use e.g. column 1 for the question, and columns 2, 3, 4, 5 for the possible answers. You can get information about using SQLite in libGDX here
You could write a method, for example on an actor, that takes an id, looks up the question and the answers to that question in SQLite, assigns them to some variables, and then uses a switch statement if you don't want a long if/else-if chain.
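The id-based lookup described above can be sketched in plain Java as follows. This is only an illustration: the class `DialogueTree`, its nested `Node` type, the ids, and the shortened texts are all invented here, and an in-memory map stands in for whatever storage (SQLite, XML, JSON) actually holds the rows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory dialogue tree keyed by node id. */
final class DialogueTree {

    /** One line of dialogue plus the choices available from it. */
    static final class Node {
        final String text;
        // Choice label -> id of the node that choice leads to.
        final Map<String, String> choices = new LinkedHashMap<>();

        Node(String text) {
            this.text = text;
        }

        Node choice(String label, String nextId) {
            choices.put(label, nextId);
            return this;
        }

        boolean isEnd() {
            return choices.isEmpty();
        }
    }

    private final Map<String, Node> nodes = new LinkedHashMap<>();

    void put(String id, Node node) {
        nodes.put(id, node);
    }

    Node get(String id) {
        Node node = nodes.get(id);
        if (node == null) {
            throw new IllegalArgumentException("Unknown dialogue id: " + id);
        }
        return node;
    }

    /** Returns the id of the node reached by taking the given choice. */
    String next(String currentId, String choiceLabel) {
        String nextId = get(currentId).choices.get(choiceLabel);
        if (nextId == null) {
            throw new IllegalArgumentException("No such choice: " + choiceLabel);
        }
        return nextId;
    }

    /** Builds a shortened version of the example tree from the question. */
    static DialogueTree sample() {
        DialogueTree tree = new DialogueTree();
        tree.put("debt", new Node("You owe me 5 dollars!")
                .choice("Ask politely", "refuse")
                .choice("Threaten her", "security")
                .choice("Draw your gun", "cops"));
        tree.put("refuse", new Node("No way, get out of here."));
        tree.put("security", new Node("She calls security."));
        tree.put("cops", new Node("She calls the cops.")
                .choice("Shoot the cops", "arrested")
                .choice("Jump from the window", "dead"));
        tree.put("arrested", new Node("Arrested."));
        tree.put("dead", new Node("Dead."));
        return tree;
    }

    public static void main(String[] args) {
        DialogueTree tree = sample();
        String id = tree.next("debt", "Draw your gun");
        System.out.println(tree.get(id).text + " " + tree.get(id).choices.keySet());
    }
}
```

Because every transition is data rather than code, adding a branch means adding a row or node instead of another `if`, and only the nodes currently on screen need to be looked up.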
Note: I think that SQLite is mostly used when you want data to be saved and used in the future; if the data changes, for example, every 10 minutes, I think it would be better to use JSON, because opening connections to the database every 10 minutes may take some time. I think this is not the case here; in my opinion the purposes of JSON and SQLite are completely different. For example:
JSON = I would use it to send and/or receive data between server and client, for configuration files, etc.
SQLite = I would use it to store data.
This is only my opinion, and I am not saying that SQLite is better or worse than JSON.
PS 1: the photo is taken from the Internet
PS 2: I also believe that you should read https://stackoverflow.com/tour

Turn HTML into XML and parse it -- Android Apps [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have been learning how to build Android apps this summer. I am currently trying to work on XML parsing, which falls under Java in this case. I have a few questions that are mostly conceptual and one specific one.
First, most of the examples I have seen use pages that are already in XML. Can I take a page in regular HTML format, have the program turn it into XML, and then parse it? Or is that what is normally done anyway?
Secondly, I could use a little explanation of how the parser actually works and stores the data, so that I will know better how to use it (extract the data from whatever it is stored in) when the parsing is done.
For my specific example, I am trying to work with some weather data from the NWS. My program will take the data from this page and, after some user input, take you to a page like this, which will sometimes have various alerts. I want to select certain ones. This is what I could use help with. I haven't really coded anything for that yet because I don't know what I am doing.
If I need to clarify or rephrase anything in here, I am happy to; just let me know. I am trying to be a good contributor on here!
Yes, you can parse HTML, and there are many parsers available. There is a question about it here, Parse HTML in Android, and an answer about parsing HTML here: https://stackoverflow.com/a/7114346/826657
It is a bad idea, though: HTML tag names don't describe the data, so you will have to write a lot of code searching attributes for a specific piece of data. Prefer XML whenever you can; it saves a lot of code and time.
Here is a post from Coding Horror which says that, in general, parsing HTML is a bad idea.
http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
Here is an article that explains parsing an XML document using XmlPullParser: http://www.ibm.com/developerworks/library/x-android/
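To give a feel for how a pull parser works and where the data ends up, here is a small sketch that runs on a plain JVM. Note the substitution: it uses the JDK's StAX `XMLStreamReader` rather than Android's `XmlPullParser`, which follows a similar loop of asking for the next event and reacting to start tags. The `<alerts>`/`<alert>`/`<event>` structure is invented for illustration and is not the real NWS format.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical example: collects the text of every <event> element. */
final class AlertPullParser {

    static List<String> eventNames(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // The parser does not build a tree; the caller pulls one event at a time
            // and stores whatever it cares about in its own data structure.
            while (reader.hasNext()) {
                int eventType = reader.next();
                if (eventType == XMLStreamConstants.START_ELEMENT
                        && "event".equals(reader.getLocalName())) {
                    events.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<alerts>"
                + "<alert><event>Flood Warning</event><area>County A</area></alert>"
                + "<alert><event>Heat Advisory</event><area>County B</area></alert>"
                + "</alerts>";
        System.out.println(eventNames(xml));
    }
}
```

Selecting only certain alerts then becomes a matter of filtering the collected list, or of adding a condition before `events.add(...)`.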
