How to use an input field to permanently save to a webpage? - java

First of all, I'm not a programming expert. I'm fluent in VB, functional with HTML and PHP, and somewhat fluent in Java.
I have created a password-protected side of my business' website that basically has commonly needed reference material and a lot of organized links to other websites that we frequently use. Right now, if I want to add a new link, I have to go into the HTML and code the button. (Side note: bookmark synchronization via xMarks is what we have been using. While it's functional, I need something that can be more easily accessed on multiple computers, sometimes even public computers and computers owned by clients, so I don't want to be limited by xMarks. We basically store URLs in notes on our smartphones so we can type them in when we need them... archaic, I know.)
It seems that it would be possible to simply have a form: one field for the URL, one field for the title, and when I click submit it would be permanently added as a button on that page. But I can't even figure out where to start. I feel like this is probably a job for Java, but I just don't know what direction to go in.
You don't have to write the code for me (though by all means, if you have the desire, feel free); I just need to know what direction to go in!

This is a job for "any programming language" (that is, any language supported by your server, or which you are willing to add support for to your server).
Of your tags, you could use Java or PHP. My personal preference would probably be Perl or Python.
The basics would be:
An HTML form submitting to a server-side program that adds the data to a database. For a low-traffic system like this, that database could be SQLite.
Plus: a server-side program that generates the list of links from the database. It would query the database for all the links (possibly adding paging once the list reaches a certain size), then loop over the results and output the HTML for each one.
Using a template language inside your programming language would be wise. Make sure you look up how to defend yourself from SQL injection and XSS.
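Since the question is tagged Java, here is a minimal sketch of both pieces in one file, assuming the Xerial sqlite-jdbc driver is on the classpath and JDK 11+. It uses only the JDK's built-in HTTP server; the port, the /add and /links paths, and the file name are illustrative, and the same flow translates directly to PHP.

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpServer;
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.URLDecoder;
    import java.nio.charset.StandardCharsets;
    import java.sql.*;

    public class LinkServer {
        static final String DB = "jdbc:sqlite:links.db";

        public static void main(String[] args) throws Exception {
            // Create the table once at startup.
            try (Connection c = DriverManager.getConnection(DB);
                 Statement s = c.createStatement()) {
                s.execute("CREATE TABLE IF NOT EXISTS links (title TEXT, url TEXT)");
            }
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

            // The form posts "title=...&url=..." here; a prepared statement
            // keeps us safe from SQL injection.
            server.createContext("/add", ex -> {
                String body = new String(ex.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
                String title = "", url = "";
                for (String pair : body.split("&")) {
                    String[] kv = pair.split("=", 2);
                    String val = URLDecoder.decode(kv.length > 1 ? kv[1] : "", StandardCharsets.UTF_8);
                    if (kv[0].equals("title")) title = val;
                    if (kv[0].equals("url")) url = val;
                }
                try (Connection c = DriverManager.getConnection(DB);
                     PreparedStatement ps = c.prepareStatement("INSERT INTO links VALUES (?, ?)")) {
                    ps.setString(1, title);
                    ps.setString(2, url);
                    ps.executeUpdate();
                } catch (SQLException e) {
                    throw new IOException(e);
                }
                respond(ex, "Saved.");
            });

            // Renders every stored link as an anchor; escaping defends against XSS.
            server.createContext("/links", ex -> {
                StringBuilder html = new StringBuilder("<ul>");
                try (Connection c = DriverManager.getConnection(DB);
                     ResultSet rs = c.createStatement().executeQuery("SELECT title, url FROM links")) {
                    while (rs.next()) {
                        html.append("<li><a href=\"").append(escape(rs.getString("url")))
                            .append("\">").append(escape(rs.getString("title"))).append("</a></li>");
                    }
                } catch (SQLException e) {
                    throw new IOException(e);
                }
                respond(ex, html.append("</ul>").toString());
            });
            server.start();
        }

        static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;")
                    .replace(">", "&gt;").replace("\"", "&quot;");
        }

        static void respond(HttpExchange ex, String body) throws IOException {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            ex.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            ex.sendResponseHeaders(200, bytes.length);
            try (var os = ex.getResponseBody()) { os.write(bytes); }
        }
    }

Your form then simply POSTs title and url to /add, and the page of link buttons is whatever /links returns.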

This can be done easily using PHP.

Parsing an updating html using jsoup [closed]

We have a problem (we are a group).
We have to use jsoup in Java for a university project. We can parse HTML with it. But the problem is that we have to parse an HTML page that updates when you click on a button (https://www.bundestag.de/services/opendata).
(Screenshots: the first and second slide of the page's paginated download list.)
We want to access all the XML files from "Wahlperiode 20". But when you click the slide buttons, the HTML code updates while the URL stays the same, so you never have access to all the XML files at once; the HTML keeps being replaced via the slide buttons.
Another idea was to work out how the URLs of the XML files we want are built, so that we don't have to deal with the slide buttons and can access the XML URLs directly. But they are all built differently.
So we are at a loss as to how to go on. I hope y'all can help us :)
It's rather ironic that you are attempting to hack[1] out some data from an opendata website. There is surely an API!!
The problem is that websites aren't static resources; they have JavaScript, and that JavaScript can fetch more data in response to e.g. the user clicking a 'next page' button.
What you're doing is called 'scraping': using automated tools to attempt to query for data via a communication channel (namely: this website) which is definitely not meant for that. This website is not meant to be read with software; it's meant to be read with eyeballs. If someone decided to change the design of this page and you had a working scraper, it would then fail after the design update, for example.
You have, in broad strokes, 3 options:
Abort this plan, this is crazy
This data is surely open, and open data tends to come with APIs; things meant to be queried by software and not by eyeballs. Go look for it, and call the German government, I'm sure they'll help you out! If they've really embraced the REST principles of design, then send an Accept header that includes e.g. application/json and application/xml and does not include text/html, and see if the site just responds with the data in JSON or XML format.
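A quick way to run that experiment from Java (a sketch using the JDK 11+ java.net.http client; whether the site honors content negotiation is exactly what this probes):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class AcceptProbe {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://www.bundestag.de/services/opendata"))
                    .header("Accept", "application/json, application/xml")
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // If the server does content negotiation, this won't be text/html.
            System.out.println(response.headers().firstValue("Content-Type").orElse("(none)"));
        }
    }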
I strongly advise you to fully exhaust these options before moving on, as the next options are really bad: lots of work, and the code will be extremely fragile (any update to the site by the Bundestag website folks will break it).
Use your browser's network inspection tools
In just about every browser there are 'dev tools'. For example, in Vivaldi, it's under the "Tools" menu and is called "Developer tools". You can also usually right-click anywhere on a web page and there will be an option like 'Inspect', 'Inspector', or 'Development Tools'. Open that now, and find the 'network' tab. When you (re)load this page, you'll see all the resources it's loading (so: images, the HTML itself, CSS, the works). Look through it and find the interesting stuff. In this specific case, the loading of wahlperioden.json is of particular interest.
Let's try this out:
curl 'https://www.bundestag.de/static/appdata/filter/wahlperioden.json'
[{"value":"20","label":"WP 20: seit 2021"},{"value":"19","label":"WP 19: 2017 - 2021"},(rest omitted - there are a lot of these)]
That sounds useful, and as it's JSON you can just read this stuff with a JSON parser. No need to use JSoup (JSoup is great as a library, but it's a library to reach for only when all other options have failed; any code written with JSoup is fragile and complicated, simply because scraping sites is fragile and complicated).
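For example, a sketch of reading that file with Jackson (the com.fasterxml.jackson.databind dependency is my assumption; any JSON library will do):

    import java.net.URL;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class Wahlperioden {
        public static void main(String[] args) throws Exception {
            JsonNode periods = new ObjectMapper().readTree(
                    new URL("https://www.bundestag.de/static/appdata/filter/wahlperioden.json"));
            for (JsonNode p : periods) { // the file is a JSON array
                System.out.println(p.get("value").asText() + " -> " + p.get("label").asText());
            }
        }
    }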
Then, click on the buttons that load new data and check whether network traffic ensues. And so it does: when you do, you notice a call going out. I'm seeing this URL being loaded:
https://www.bundestag.de/ajax/filterlist/de/services/opendata/866354-866354?limit=10&noFilterSet=true&offset=10
The format is rather obvious. offset=10 means: start from the 10th element (as I just clicked 'next page'), and limit=10 means: return no more than 10 entries.
This HTML is also incredibly basic, which is great news, as that makes it easy to scrape. Just write a for loop that keeps calling this URL, modifying the offset= part (first loop: no offset; second: offset=10; third: offset=20; keep going until the HTML you get back is blank, then you've got it all).
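That loop could look like this with JSoup (a sketch; the a[href$=.xml] selector is my guess at what the returned fragment contains, so inspect it and adjust):

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class OpenDataLinks {
        public static void main(String[] args) throws Exception {
            String base = "https://www.bundestag.de/ajax/filterlist/de/services/opendata/"
                    + "866354-866354?limit=10&noFilterSet=true";
            for (int offset = 0; ; offset += 10) {
                Document page = Jsoup.connect(base + "&offset=" + offset).get();
                Elements links = page.select("a[href$=.xml]"); // anchors pointing at XML files
                if (links.isEmpty()) break;                    // blank page: we have everything
                for (Element link : links) {
                    System.out.println(link.absUrl("href"));
                }
            }
        }
    }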
For future reference: Browser emulation
JavaScript can also generate entire HTML on its own; that's not something JSoup can ever do for you: the only way to obtain such HTML is to actually let the JavaScript do its work, which means you need an entire browser. Tools like Selenium will start a real browser but let you use JSoup-like constructs to retrieve information from the page (instead of what browsers usually do, which is transmit the rendered data to your eyeballs). This tends to always work, but it is incredibly complicated and quite slow (you're running an entire browser and really rendering the site, even if you can't see it; that's all happening under the hood!).
Selenium isn't meant as a scraping tool; it's meant as a front-end testing tool. But you can use it to scrape stuff, and you will have to if the HTML is generated. Fortunately, you're lucky here.
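For completeness, a sketch of that approach (requires the selenium-java dependency and a matching chromedriver binary; note you would still have to script the slide-button clicks to reach every page):

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;

    public class BrowserScrape {
        public static void main(String[] args) {
            WebDriver driver = new ChromeDriver(); // starts a real Chrome instance
            try {
                driver.get("https://www.bundestag.de/services/opendata");
                // By now the page's JavaScript has run, so the generated HTML is queryable.
                for (WebElement link : driver.findElements(By.cssSelector("a[href$='.xml']"))) {
                    System.out.println(link.getAttribute("href"));
                }
            } finally {
                driver.quit();
            }
        }
    }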
Option 1 is vastly superior to option 2, and option 2 is vastly superior to option 3, at least for this case. Good luck!
[1] I'm using the definition of: Using a tool or site to accomplish something it was obviously not designed for. The sense of 'I bought half an ikea cupboard and half of an ikea bookshelf that are completely unrelated, and put them together anyway, look at how awesome this thingie is' - that sense of 'hack'. Not the sense of 'illegal'.

Is it possible to do this type of search in Java

I am stuck on a project at work that I do not think is really possible, and I am wondering if someone can confirm my belief that it isn't possible, or at least give me new options to look at.
We are doing a project for a client that involved a mass download of files from a server (easily done with ftp4j and a document name list), but now we need to sort through the data from the server. The client does work in contracts and wants us to pull out relevant information such as: licensor, licensee, product, agreement date, termination date, royalties, restrictions.
Since the documents are completely unstandardized, is that even possible to do? I can imagine loading in the files and searching them, but I have no idea how to pull information like the licensor and the restrictions on the agreement out of a paragraph. These are not structured records; they are just long contracts. Even if I were to search for 'Licensor', it will come up in the document multiple times. The documents aren't even in a consistent file format: some are PDF, some are text, some are HTML, and I've even seen some that were as bad as a scanned image inside a PDF.
My boss keeps pushing me to work on this project, but I feel as if I am out of options. I primarily do web and mobile, so big data is really not my strong area. Does this sound possible to do in a reasonable amount of time? (We're talking about 1000 documents at the very minimum.) I have been working on this in Java.
I'll do my best to give you some information, as this is not my area of expertise. I would seriously consider writing a script that identifies the type of file you are dealing with, and then calls the appropriate parsing method for what you are looking for.
Since you are dealing with big data, Python could be pretty useful; JavaScript would be my next choice.
If your overall code is written in Java, it should be very portable and flexible no matter which one you choose. Using a regex or a specific string search would be a good way to approach this.
If you are concerned only with 'Licensor' followed by a name, you could identify the format of that particular instance and search for something similar using the regex you create. This can be extrapolated to the other fields.
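As a hypothetical illustration (the "Licensor: Acme Corp." clause format below is invented; build the real pattern from what your contracts actually look like):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ClauseFinder {
        public static void main(String[] args) {
            String text = "This Agreement ... Licensor: Acme Corp. Licensee: Foo GmbH ...";
            // Capture whatever follows "Licensor:" up to the next period.
            Pattern p = Pattern.compile("Licensor\\s*[:\\-]\\s*([^.]+)");
            Matcher m = p.matcher(text);
            while (m.find()) {
                System.out.println("Candidate licensor: " + m.group(1).trim());
            }
        }
    }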
For getting text from an image, try using the APIs on this page:
How to read images using Java API?
Scanned Image to Readable Text
For text from a PDF:
https://www.idrsolutions.com/how-to-search-a-pdf-file-for-text/
Also, once the text has been extracted from the PDF it's just text, so you should be able to search through it using a regex. That would be my method of attack, or possibly using String.split() and a StringBuilder that you can append to.
For text from HTML doc:
Here is a cool HTML parser library: http://jericho.htmlparser.net/docs/index.html
A resource that teaches how to remove HTML tags and get the good stuff: http://www.rgagnon.com/javadetails/java-0424.html
If you need anything else, let me know. I'll do my best to find it!
Apache Tika can extract plain text from almost any commonly used file format.
But with the situation you describe, you would still need to analyze the text, as in "natural language recognition". That's a field where, despite some advances (made by dedicated research teams spending many person-years!), computers still fail pretty badly (heck, even humans fail at it sometimes).
With the number of documents you mention (1000s), hire a temp worker and have them sorted/tagged by human brain power. It will be cheaper and you will have fewer misclassifications.
You can use Tika for text extraction. If there is a fixed pattern, you can extract information using regex or XPath queries. Another solution is to use Solr, as shown in this video. You don't need Solr, but watch the video to get the idea.
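To illustrate the extraction step, a minimal Tika sketch (the file name is a placeholder; Tika auto-detects the format):

    import java.io.File;
    import org.apache.tika.Tika;

    public class ContractText {
        public static void main(String[] args) throws Exception {
            // Works for PDF, HTML, Word, plain text, ... but it won't OCR a
            // scanned image inside a PDF; that needs a separate OCR step.
            String text = new Tika().parseToString(new File("contract.pdf"));
            System.out.println(text);
        }
    }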

Allow entering language-specific characters from the keyboard

I have an application that provides a language selection option to the user.
I want to implement a facility that lets the user enter text from the keyboard in the selected language, e.g. if I select Hindi, my application takes input in Hindi.
I am using JSF (ICEfaces) and Hibernate.
Is it possible? How?
Use a language translation JavaScript function on the onkeyup event.
You need to include an external JS file for this, e.g. http://www.google.com/jsapi.
Please refer to this for reference:
http://www.labnol.org/internet/website-translation-with-google-language-api/4367/
Hope this helps :)
Everything is possible. The question is "how much does it cost?"
Go to translate.google.com and see that they are able to detect the writing language automatically. If you are able to do the same, send the text typed by the user to the server using AJAX and validate that the text is written in the chosen language.
But language detection is not such a simple task. It is simple if the language uses its own unique script. For example, Georgian (as far as I know) uses its own script, and no other language uses the same one. You cannot say the same about European languages: they all use Latin letters. In that case more sophisticated methods are required, and Google applies them. BTW, you can probably utilize the translate.google facility (if they have an API): send the typed text to Google using AJAX and see which language it detects. It is not 100% correct, but much better than anything any of us could implement ourselves.
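For the unique-script case, a sketch of a pure-JDK check (Devanagari, the script Hindi uses, is just the example language from the question):

    public class ScriptCheck {
        // True when every letter in the text belongs to the Devanagari script.
        // This only works for languages with a (near-)unique script.
        public static boolean isDevanagari(String text) {
            return text.codePoints()
                       .filter(Character::isLetter)
                       .allMatch(cp -> Character.UnicodeScript.of(cp)
                                       == Character.UnicodeScript.DEVANAGARI);
        }

        public static void main(String[] args) {
            System.out.println(isDevanagari("नमस्ते")); // true
            System.out.println(isDevanagari("hello"));  // false
        }
    }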

How do I send a query to a website and parse the results?

I want to do some development in Java. I'd like to be able to access a website, say for example
www.chipotle.com
On the top right, they have a place where you can enter your zip code and it will give you all of the nearest locations. The program will just have an empty box where the user inputs their zip code, and it will query the actual Chipotle server to retrieve the nearest locations. How do I do that, and also how is the data I receive stored?
This will probably be a followup question as to what methods I should use to parse the data.
Thanks!
First you need to know the parameters needed to execute the query and the URL these parameters should be submitted to (the action attribute of the form). With that, your application will have to make an HTTP request to that URL with your own parameters (possibly only the zip code). Finally, parse the answer.
This can be done with standard Java API classes, but it won't be very robust. A better solution would be HttpClient. Here are some examples.
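A sketch of that round trip with Apache HttpClient 4.x; the endpoint URL and the "zip" parameter name below are hypothetical, so read the real ones out of the form's action attribute and input names first:

    import java.util.List;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;

    public class StoreLocator {
        public static void main(String[] args) throws Exception {
            try (CloseableHttpClient client = HttpClients.createDefault()) {
                // Hypothetical URL; use the form's real action attribute.
                HttpPost post = new HttpPost("https://www.chipotle.com/locator/search");
                post.setEntity(new UrlEncodedFormEntity(
                        List.of(new BasicNameValuePair("zip", "90210"))));
                try (CloseableHttpResponse response = client.execute(post)) {
                    // HTML or JSON, depending on what the endpoint returns.
                    System.out.println(EntityUtils.toString(response.getEntity()));
                }
            }
        }
    }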
This will probably be a followup question as to what methods I should use to parse the data.
It very much depends on what the website actually returns.
If it returns static HTML, a regular (strict) or permissive HTML parser should be used.
If it returns dynamic HTML (i.e. HTML with embedded JavaScript), you may need to use something that evaluates the JavaScript as part of the content extraction process.
There may also be a web API designed for programs (like yours) to use. Such an API would typically return the results as XML or JSON so that you don't have to scrape the results out of an HTML document.
Before you go any further you should check the Terms of Service for the site. Do they say anything about what you are proposing to do?
A lot of sites DO NOT WANT people to scrape their content or provide wrappers for their services. For instance, if they get income from ads shown on their site, what you are proposing to do could divert visitors away from their site, with a resulting loss of potential or actual income.
If you don't respect a website's ToS, you could be on the receiving end of lawyers' letters... or worse. In addition, they may already be using technical means to make life difficult for people who scrape their service.

Writing a MySQL-backed Java back-end service for Dummies

So I want to do the simplest possible thing.
Assume I have a MySQL-enabled hosting service.
In it, I have a database storyland and a table story --> (id, title, text).
I only know how to write Java programs in Eclipse that run on my computer and do homework assignments well... :)
Now I want to
1) write a Java program, hosted on my server, that would compute and return (for example) the number of characters of text stored in the entire MySQL database.
I also only have experience writing PHP programs that talk directly to MySQL via forms etc., but now I want to
2) be able to display a page index.php that says
echo "Welcome to storyland, there are $textcount
characters of text in all stories here";
where $textcount is the number returned by the Java service.
I would really appreciate specific answers for this really "simple" specific example, to get me started. I'd also appreciate answers/resources that do not lean too heavily on external libraries/software, since I want to be able to understand how those libraries work so I can decide how to use them in the future.
Thanks!
A design thought: I'd be tempted to add one more column, size, precomputed for each blob of text, so you wouldn't have to calculate it on the fly (counting the characters of a blob is more expensive than reading a stored size). Then I'd just issue a SUM over that column and be done: SELECT SUM(size) FROM mytable;
That would make the DB work really simple: just a simple INSERT and SELECT system.
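To get you started on (1), a minimal sketch in plain JDBC (assumes MySQL Connector/J on the classpath; the credentials are placeholders, and size is the precomputed column suggested above):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TextCount {
        public static void main(String[] args) throws Exception {
            try (Connection c = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/storyland", "user", "password");
                 Statement s = c.createStatement();
                 // Without a size column, SUM(CHAR_LENGTH(text)) works too.
                 ResultSet rs = s.executeQuery("SELECT SUM(size) FROM story")) {
                rs.next();
                System.out.println(rs.getLong(1));
            }
        }
    }

index.php would then fetch that number from the Java service (e.g. over HTTP) and echo it.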
You need to have your own server, or a server that supports Java, otherwise this is not possible.
Even if you have a server that supports Java: why do this with Java when you can do it with PHP? The bottleneck will probably be the database anyhow.
