Java web crawler [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 12 years ago.
Hi, can anyone recommend a simple Java web crawler that crawls a website and returns a list of the links it contains? No, I do not need a parser. Thanks for your attention.

A web crawler is (almost by definition) never 'simple'.
Two names spring to mind however (although both have a learning curve):
Nutch
Heritrix
Both are open source and can accomplish what you want, although simply listing the links in a website is not what either is built for (Nutch is designed to build a search index and Heritrix is designed to archive websites). You will need to do some custom configurations to accomplish your task.
HTTrack is a much simpler tool, but is not implemented in Java.
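If the goal really is just listing the links on fetched pages, neither Nutch nor Heritrix is strictly necessary. A minimal, stdlib-only sketch of the link-extraction step is below; it is regex-based, so it is fragile on unusual markup, and a real crawler would use an HTML parser (such as jsoup) and repeat this for each discovered URL:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkLister {
    // Naive href extraction; good enough for well-formed pages,
    // but a real crawler should use an HTML parser instead of a regex.
    private static final Pattern HREF =
            Pattern.compile("<a\\s[^>]*href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body><a href=\"https://example.com/a\">A</a>"
                + "<a class=\"x\" href=\"/b\">B</a></body></html>";
        System.out.println(extractLinks(html));  // [https://example.com/a, /b]
    }
}
```

A full crawler would additionally resolve relative URLs against the page's base URL, deduplicate visited pages, and respect robots.txt, which is where tools like Nutch and Heritrix earn their learning curve.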

Related

Java learning initial stage [closed]

Closed 10 years ago.
I want to learn Java by myself, but I don't know the programming language. I am new to programming, but I would like to learn.
So could you please guide me in learning Java?
What software do I need to install?
Please attach links to learning websites for the basics.
Please also attach links to sample examples.
Thank you
I would recommend looking at Thinking in Java. It gives you an overview of OO as well, which is essential for programming well in Java. I would also look at the tutorials from Oracle.
As to what you might need to actually write Java, I'd start with a text editor and manually compile and run applications. This gives you a good understanding of the toolchain. After that I'd look for an IDE such as Eclipse.
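To make the text-editor route concrete, here is a first program and the commands to build and run it (this assumes a JDK is installed and `javac`/`java` are on your PATH):

```java
// Save as HelloWorld.java, then from the same directory:
//   javac HelloWorld.java   -> compiles to HelloWorld.class
//   java HelloWorld         -> runs it and prints the greeting
public class HelloWorld {
    // Kept as a separate method so the logic is easy to reuse and test.
    static String greeting() {
        return "Hello, Java!";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

Going through this compile/run cycle by hand a few times is exactly the "good understanding" mentioned above: you see what the compiler produces and what the JVM actually executes before an IDE hides those steps.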

Access website without browser [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am just learning Java. I want to make a simple application that accesses a website.
There is a website I want to log in to through Java,
and then interact with through my own interface: after logging in, I would write into some text boxes and submit the text.
I have tried many approaches and studied the HTTP protocol, but I still can't make it work.
Can someone help me out?
Accessing a web site, logging in, and interacting with forms on it is somewhat complex work, so it might not be the best choice for a first Java project.
But if you want to do it, you should probably use Apache HttpComponents/HttpClient.
There are useful examples at the above link as well, which may help you get started.
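As an alternative to Apache HttpClient, the JDK's own `java.net.http` package (Java 11+) can do the same job without an extra dependency. Below is a sketch of building the login POST; the URL and the `username`/`password` field names are hypothetical placeholders, and you would need to inspect the site's actual login form to find the real action URL and field names:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class FormLogin {
    // Build a POST request for a typical HTML login form.
    // "username" and "password" are placeholder field names;
    // the real names come from the site's <form> markup.
    static HttpRequest loginRequest(String url, String user, String pass) {
        String form = "username=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(pass, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = loginRequest("https://example.com/login", "me", "secret");
        // To actually send it (kept commented so this sketch stays offline):
        // var client = java.net.http.HttpClient.newBuilder()
        //         .cookieHandler(new java.net.CookieManager()) // keeps the session cookie
        //         .build();
        // var resp = client.send(req, java.net.http.HttpResponse.BodyHandlers.ofString());
        System.out.println(req.method() + " " + req.uri());
    }
}
```

The cookie handler is the part people usually miss: without it, the session cookie set by the login response is dropped, and every later request looks logged-out.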

Latest technologies in java [closed]

Closed 11 years ago.
I have 2+ years of experience in Java development. I have worked on core Java, TopLink (a database framework), and SQL.
I also have knowledge of servlets, JSP, and Struts.
I would like to move to another company. What are the latest emerging technologies in Java?
A master is a master not because of his knowledge of the additional elements in his field; but, because of his skill in handling the core fundamentals. All additional elements in a field stem from the fundamentals.
I am getting into Java as well. From what I have gleaned, you should be well versed in the Spring Framework and Hibernate or other comparable tools (Google for alternatives). They aren't the "latest", but people will expect you to know them well.

Web Crawler's Functionality [closed]

Closed 7 years ago.
Does a web crawler return only the extracted text from webpages? Say there are some PDF/DOC files stored on the web server as well. Can a web crawler crawl through them and return their content too? Also, what are the suggestions for a good open-source Java web crawler?
Thank you!
A web crawler doesn't extract the text. It simply returns the HTML, with some transformations applied (UTF-8 conversion, for example).
If you think of it that way, the file format doesn't matter to the crawler at the first hop. For multiple hops, of course, it would need to look inside these documents, and typical crawlers don't follow links inside PDF/DOC files and the like.
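The decision at that first hop usually comes down to the Content-Type header the server returns. A small sketch of the dispatch a crawler might do is below; the handling labels are hypothetical placeholders, and in practice a format-specific extractor (Apache Tika is the usual choice for PDF/DOC) would fill the "extract-text" role:

```java
public class FetchDispatch {
    // Decide how a crawler should treat a fetched document based on its
    // Content-Type header. Only HTML yields new links to follow directly;
    // PDF/DOC need a format-specific extractor before their text is usable.
    static String handlingFor(String contentType) {
        if (contentType == null) return "skip";
        // Strip parameters like "; charset=UTF-8" to get the bare MIME type.
        String mime = contentType.split(";")[0].trim().toLowerCase();
        switch (mime) {
            case "text/html":
            case "application/xhtml+xml":
                return "parse-html";   // extract links, continue crawling
            case "application/pdf":
            case "application/msword":
                return "extract-text"; // hand to an extractor; no further hops by default
            default:
                return "store-raw";    // keep the bytes, don't follow anything
        }
    }

    public static void main(String[] args) {
        System.out.println(handlingFor("text/html; charset=UTF-8"));  // parse-html
        System.out.println(handlingFor("application/pdf"));           // extract-text
    }
}
```

This is also why "crawler" and "parser/extractor" are separate concerns in tools like Nutch: the fetch loop is the same for every MIME type, and per-format plugins decide what, if anything, to do with the bytes.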

opensource websites backend in java [closed]

Closed 12 years ago.
Are there any open-source website backends in Java?
Just like Reddit in Python, and OpenStreetMap in Ruby.
So, you're looking for a Java open source CMS (Content Management System)?
You can find here an overview of the most of them: Open Source CMS in Java. Wikipedia also has an overview of some of them: List of CMS in Java.
The popular ones are Alfresco, Nuxeo and Liferay. You can compare the detailed features on the CMS Matrix site.
ThingLink is a Finnish service with a custom-built Java backend. The backend code is hosted on SourceForge.
Do you mean web frameworks? If so there are loads: http://java-source.net/open-source/web-frameworks
Edit: Tapestry 5 is great.