I've read several GWT developer blogs that mentioned some "rule of thumb" whereby if your resultant JavaScript will be 100 lines of code or greater, it is better to just write the JavaScript yourself (instead of generating it through GWT).
My question is: how can you tell how many lines of code GWT will produce? This seems like a difficult rule of thumb to follow, and one that would require writing two sets of code (first writing the GWT version, and then rewriting it in JavaScript)!
Have I misunderstood something here?
The point of GWT is to save us from having to write JavaScript. Using Vaadin (a Java web application framework built on top of GWT), I created a number of UI controls I wouldn't even know how to build in JavaScript.
The concern you do need to keep in mind is performance. I wrote a form that took over 20 seconds to render in IE8. It doesn't matter how many LOC it is to your users; no one will wait that long for a page to appear.
I'm part of a team creating a data store that passes information around in large XML documents (herein called messages). On the back end, the messages get shredded apart and stored in Accumulo in pieces. When a caller requests data, the pieces get reassembled into a message tailored for the caller.

The schemas are somewhat complicated, so we couldn't use JAXB out of the box. The team (this was a few years ago) assumed that DOM wasn't performant. We're now buried in layer after layer of half-broken parsing code that will take months to finish, will break the second someone changes the schema, and is making me want to jam a soldering iron into my eyeball.

As far as I can tell, if we switch to using DOM, a lot of this fart code can be cut and the code base will be more resilient to future changes. My team lead is telling me that there's a performance hit in using DOM, but I can't find any data that validates that assumption that isn't from 2006 or earlier.
Is parsing large XML documents via DOM still sufficiently slow to warrant all the pain that XMLBeans is causing us?
Edit 1: In response to some of your comments:
1) This is a government project so I can't get rid of the XML part (as much as I really want to).
2) The issue with JAXB, as I understand it, had to do with the substitution groups present in our schemas. Also, maybe I should restate the issue with JAXB as one of effort versus return in using it.
3) What I'm looking for is some kind of recent data supporting or disproving the contention that using XMLBeans is worth the pain we're going through writing a bazillion lines of brittle binding code because it gives us an edge in terms of performance. Something like Joox looks so much easier to deal with, and I'm pretty sure we can still validate the result after the server has reassembled a shredded message, before sending it back to the caller (rough sketch of that validation below).
So does anyone out there in SO land know of any data germane to this issue that's no more than five years old?
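For what it's worth, the validation step I mention in point 3 would just use the standard JAXP validation API; here's a rough sketch (the file names are placeholders, not our real schema or messages):

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;

    public class MessageValidator {
        public static void main(String[] args) throws Exception {
            // Placeholder paths; the real schema set is the complicated one described above.
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new File("message.xsd"));
            Validator validator = schema.newValidator();
            // Throws SAXException with a useful message if the reassembled message is invalid.
            validator.validate(new StreamSource(new File("reassembled-message.xml")));
            System.out.println("Message is valid against the schema.");
        }
    }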
Data binding solutions like XMLBeans can perform very well, but in my experience they can become quite unmanageable if the schema is complex or changes frequently.
If you're considering DOM, then don't use DOM, but one of the other tree-based XML models such as JDOM2 or XOM. They are much better designed.
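For example, a minimal JDOM2 sketch of the kind of tree navigation that replaces generated binding code might look like this (the element names are invented, since I don't know your schema):

    import java.io.File;
    import org.jdom2.Document;
    import org.jdom2.Element;
    import org.jdom2.input.SAXBuilder;

    public class MessageReader {
        public static void main(String[] args) throws Exception {
            // Element and attribute names here are made up; substitute the ones from
            // your schema, and pass Namespace arguments if your elements are namespaced.
            Document doc = new SAXBuilder().build(new File("message.xml"));
            Element root = doc.getRootElement();
            for (Element record : root.getChildren("record")) {
                String id = record.getAttributeValue("id");
                String value = record.getChildText("value");
                System.out.println(id + " -> " + value);
            }
        }
    }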
Better still (but it's probably too radical a step given where you are starting) don't process your XML data in Java at all, but use an XRX architecture where you use XML-based technologies end-to-end: XProc, XForms, XQuery, XSLT.
I think from your description that you need to focus on cleaning up your application architecture rather than on performance. Once you've cleaned it up, performance investigation and tuning will be vastly easier.
If you want the best technology for heavy duty XML processing, you might want to investigate this paper. The best technology will no doubt be clear after you read it...
The paper details:
Processing XML with Java – A Performance Benchmark
Bruno Oliveira (1), Vasco Santos (1) and Orlando Belo (2)
(1) CIICESI, School of Management and Technology, Polytechnic of Porto, Felgueiras, Portugal
(2) Algoritmi R&D Centre, University of Minho, 4710-057 Braga, Portugal
One of the projects I'm currently working on includes a Java Swing application for field users to input data about equipment parts scattered all over the company's facilities.
Most of the data collected is measured (field validation is required) or is some value the operator can choose from a predefined list. The software then does some calculation and displays instructions back to the operator.
So, for example, for part number 3435Af-B, engineering requires the operator to measure the diameter of some bolt and choose the part maker from a list. The application then compares the measured diameter to the stock diameter and tells the operator whether it should be replaced (this is obviously not a real-world example, but you get the idea).
The problem is there are over 200 known equipment parts, and these are pretty old and heterogeneous, so the engineering team has a limited idea of what they would like to measure on each part in the future. It doesn't seem reasonable to have the development team write and maintain over 200 different classes, but it also doesn't seem realistic to have the engineers use a complicated system of form builders and a BPM engine like Drools (I'm getting sweaty just thinking about it).
We're currently trying to make a poor man's solution that would allow the engineers to build forms with a simple GUI having limited features, outputting the forms to XML files. The few complicated cases not covered by the GUI solution would be custom-made by the developers (very few cases). For the calculation part, we would use the Java Expression Library (JEL); the expressions to evaluate would be generated by the same GUI.
Before we commit to this solution, I was wondering if there is something we could do differently to have a more robust system. I know that some people will consider this too much soft-coding and I agree that it is going to be difficult extracting and treating the data.
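To make the proposal more concrete, here is roughly the kind of data-driven form loading I have in mind; everything below (the XML layout, field names, class name) is invented purely for illustration:

    import java.io.StringReader;
    import javax.swing.*;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    public class FormLoader {
        // Invented example of what the GUI form builder would output.
        static final String FORM_XML =
            "<form part='3435Af-B'>" +
            "  <field name='boltDiameter' type='number' label='Bolt diameter (mm)'/>" +
            "  <field name='maker' type='choice' label='Part maker' options='Acme,Globex'/>" +
            "</form>";

        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(FORM_XML)));
            JPanel panel = new JPanel();
            NodeList fields = doc.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                panel.add(new JLabel(field.getAttribute("label")));
                if ("choice".equals(field.getAttribute("type"))) {
                    panel.add(new JComboBox<>(field.getAttribute("options").split(",")));
                } else {
                    panel.add(new JTextField(8)); // numeric validation would hook in here
                }
            }
            JOptionPane.showMessageDialog(null, panel); // quick way to display the generated form
        }
    }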
Is there a way using java or python that I can somehow gather a ton of information from a ton of different colleges on a website such as collegeboard?
I want to know how to do things like this but I've never really programmed outside of default libraries. I have no idea how to start my approach.
Example:
I input a large list of colleges, taken from a list that looks somewhat like this:
https://bigfuture.collegeboard.org/print-college-search-results
The code then finds the page for each college such as
https://bigfuture.collegeboard.org/college-university-search/alaska-bible-college?searchType=college&q=AlaskaBibleCollege
and then gathers information from the page, such as tuition, size, etc.,
and stores it in a class that I can use for analysis and stuff.
Is something like this even possible? I remember seeing a similar program in The Social Network. How would I go about this?
So, short answer, yes. It's perfectly possible, but you need to learn a bunch of stuff first:
1) The basics of the DOM (Document Object Model) for HTML, so you can parse the page
2) The general idea of how servers and databases work (and how to interface with them in Python, which is what I use, or Java)
3) Sort of a subsection of 2: learn how to retrieve HTML documents from a server so you can then parse them
Then, once you do that, this is the procedure a program would have to go through:
1) You need to come up with a list of pages that you want to search. If you want to search an entire website, you need to sort of narrow that down. You can easily limit your program to just search certain types of forums, which all have the same format on collegeboard. You'll also want to add a part of the program that builds a list of web pages that your program finds links to. For instance, if collegeboard has a page with a bunch of links to different pages with statistics, you'll want your program to scan that page to find the links to the pages with those statistics.
2) You need to find the ID, location, or some other identifying marker of the HTML tag that contains the information you want (a rough sketch of this step follows the list). If you want to get REALLY FANCY (and I mean REALLY fancy) you can try to use some algorithms to parse the text and extract information (maybe trying to pull admission statistics and such from the free text on the forums).
3) You then need to store that information in a database that you index and build a search interface for (if you want this whole thing to be online, I suggest the Python framework Django for making it a web application). For the database, it would make sense to use SQLite 3.
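Here's the rough sketch I mentioned in step 2. I've used JSoup (a Java HTML parser) just to keep it short, since you listed Java as an option; the CSS selector below is a guess, and you'd need to inspect the real pages to find the right one:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class CollegePageScraper {
        public static void main(String[] args) throws Exception {
            // URL taken from the question; the selector "div.stat" is hypothetical and
            // must be replaced after inspecting the page's actual HTML (step 2 above).
            Document doc = Jsoup.connect(
                "https://bigfuture.collegeboard.org/college-university-search/alaska-bible-college")
                .get();
            Elements stats = doc.select("div.stat");
            for (Element stat : stats) {
                System.out.println(stat.text());
            }
        }
    }

The same idea works in Python with something like Beautiful Soup.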
So yes, it's perfectly possible, but here's the bad news:
1) As someone already commented, you'll need to figure out step 2 for each individual web page format you deal with. (By web page format, I mean different pages with different layouts. The Stack Overflow homepage is different from this page, but all of the question pages follow the same format.)
2) Not only will you need to repeat step 2 for each new website, but if the website does a redesign, you'll have to redo it again as well.
3) By the time you finish the program you may have easily gathered the info on your own.
Alternative and Less Cool Option
Instead of going through all the trouble of searching the web page for specific information, you can just extract all of its text, then try to find keywords within that text relating to colleges.
BUT WAIT, THERE'S SOMETHING THAT DOES THIS ALREADY! It's called Google :). That's basically how Google works, so... yeah.
Of course there is "a way". But there is no easy way.
You need to write a bunch of code that extracts the stuff you are interested in from the HTML. Then you need to write code to turn that information into a form that matches your database schema ... and do the database updates.
There are tools that help with parts of the problem; e.g. web crawler frameworks for fetching the pages, JSoup for parsing HTML, JavaScript engines if the pages are "dynamic", etc. But I'm not aware of anything that does the whole job.
What you're asking about here is called scraping, and in general it's quite tricky to do right. You have to worry about a bunch of things:
1) The data is formatted for display, not programmatic consumption. It may be messy, inconsistent, or incomplete.
2) There may be dynamic content, which means you might have to run a JavaScript VM or something just to get the final state of the page.
3) The format could change, often.
So I'd say the first thing you should do is see if you can access the data some other way before you resort to scraping. If you poke around in the source for those pages, you might find a webservice feeding data to the display layer in XML or JSON. That would be a much better place to start.
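For example, if you do find a JSON or XML endpoint behind the page, hitting it directly looks something like this (the URL below is made up for illustration; the real one, if it exists, shows up in the browser's network tab while the page loads):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class EndpointProbe {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint; replace with whatever the page actually calls.
            URL url = new URL("https://bigfuture.collegeboard.org/some-json-endpoint?id=9");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // raw JSON/XML; feed it to a real parser from here
                }
            }
        }
    }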
OK everyone, thanks for the help. Here's how I ended up doing it. It took me a little while, but thankfully collegeboard uses very simple addresses.
Basically there are 3972 colleges and each has a unique, text-only page with an address that goes like:
https://bigfuture.collegeboard.org/print-college-profile?id=9
with id running from 1 to 3972.
Using a library called HtmlUnit I was able to access all of these pages, convert them into strings, and then gather the info using indexOf.
It still is going to take about 16 hours to process all of them but I've got a hundred or so saved.
Maybe I lucked out with the print page but I got what I needed and thanks for the help!
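For anyone curious, the core of it looked roughly like this (method names are from the HtmlUnit version I used, and "Tuition" is just an example label; adjust both for your case):

    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    public class ProfileFetcher {
        public static void main(String[] args) throws Exception {
            WebClient client = new WebClient();
            client.getOptions().setJavaScriptEnabled(false); // the print pages are static
            for (int id = 1; id <= 3972; id++) {
                HtmlPage page = client.getPage(
                    "https://bigfuture.collegeboard.org/print-college-profile?id=" + id);
                String text = page.asText();
                // crude extraction, as described: find a label and read what follows it
                int i = text.indexOf("Tuition");
                if (i >= 0) {
                    System.out.println(id + ": " + text.substring(i, Math.min(text.length(), i + 80)));
                }
            }
            // in real code, dispose of the WebClient and save results as you go
        }
    }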
I'm trying to develop a system whereby clients can input a series of plant related data which can then be queried against a database to find a suitable list of plants.
These plants then need to be displayed in a graphic output, putting tall plants at the back and small plants at the front of a flower bed. The algorithm to do this I have set in my mind already, but my question to you is what would be the best software to use that:
1) Allows a user to enter in data
2) Queries a database to return suitable results
3) Outputs the data into a systemised graphic (simple rectangle with dots representing plants)
and the final step is an "if possible" and something I've not yet completely considered:
4) Allow users to move these dots using their mouse to reposition if wanted
--
I know PHP can produce graphic outputs, and I assume you could probably mix this in with a bit of jQuery which would allow the user to move the dots. Would this work well or could other software (such as Java or __) produce a better result?
Thanks and apologies if this is in the wrong section of Stack!
Your question is a bit vague. To answer it directly, any general-purpose programming language these days is able to do what you want, with the right libraries - be it C/C++, Java, PHP+JavaScript, Python, Ruby, or millions of others.
With Java in particular, you'll probably want to use the Swing toolkit for the GUI.
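To give you an idea, here's a small hypothetical Swing sketch covering points 3 and 4: a rectangle for the bed, dots for plants, and mouse dragging to reposition them (the plant data is hard-coded here; in your system it would come from the database query):

    import java.awt.*;
    import java.awt.event.*;
    import java.util.ArrayList;
    import java.util.List;
    import javax.swing.*;

    public class BedPanel extends JPanel {
        private static class Plant {
            String name; int x, y, heightCm;
            Plant(String name, int x, int y, int heightCm) {
                this.name = name; this.x = x; this.y = y; this.heightCm = heightCm;
            }
        }

        private final List<Plant> plants = new ArrayList<>();
        private Plant dragged;

        BedPanel() {
            // Placeholder data; positions would come from your layout algorithm.
            plants.add(new Plant("Delphinium", 60, 40, 150)); // tall -> near the back
            plants.add(new Plant("Lavender", 120, 90, 60));
            plants.add(new Plant("Thyme", 200, 130, 20));     // short -> near the front

            MouseAdapter m = new MouseAdapter() {
                @Override public void mousePressed(MouseEvent e) {
                    for (Plant p : plants) {
                        if (e.getPoint().distance(p.x, p.y) < 8) { dragged = p; break; }
                    }
                }
                @Override public void mouseDragged(MouseEvent e) {
                    if (dragged != null) {
                        dragged.x = e.getX();
                        dragged.y = e.getY();
                        repaint();
                    }
                }
                @Override public void mouseReleased(MouseEvent e) { dragged = null; }
            };
            addMouseListener(m);
            addMouseMotionListener(m);
        }

        @Override protected void paintComponent(Graphics g) {
            super.paintComponent(g);
            g.drawRect(20, 20, 320, 160);              // the flower bed
            for (Plant p : plants) {
                g.fillOval(p.x - 4, p.y - 4, 8, 8);    // one dot per plant
                g.drawString(p.name, p.x + 6, p.y - 6);
            }
        }

        public static void main(String[] args) {
            JFrame f = new JFrame("Flower bed");
            BedPanel panel = new BedPanel();
            panel.setPreferredSize(new Dimension(360, 200));
            f.add(panel);
            f.pack();
            f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            f.setVisible(true);
        }
    }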
If you do know PHP+Javascript exclusively, it's probably best for your project to stick to what you know. If, however, you see this more as a learning opportunity than a project that needs be done NOW, you could take time to learn a new language in the process.
As to what language to learn, each person has a different opinion, obviously, but generally speaking, a higher-level language is faster to prototype in.
EDIT
If you need this for a website, however, you'll need to use something web-based - that is, you'll necessarily have two programs: one that runs server-side, the other in the client (browser). On the server side, you could very well use PHP, JSP (JavaServer Pages), Python or Ruby. On the client side, you'll be limited to JavaScript+DOM (maybe HTML5), a Java applet, or something Flash-based.
Can low-level metrics (such as word count), measured over web interface elements, cover web page and site usability?
The thing about usability is that no matter how much researchers and engineers try to quantify it, it can't be accurately measured as a whole. For example, let's say that Google, with its sub-500 words and one sprited image, is a very "usable" site. Now, let's do a page with one image (black-on-black writing) on a black screen... and let's add a JavaScript blink to it. The second page could have exactly the same number of elements and the same amount of JavaScript as your standard, but one is clearly better. By the same token, you could use word count as a measure, but what happens when you hit a site that's all Flash and has no forward-facing text to speak of? It might be a beautiful site (I use that loosely, because I'm not a fan of Flash), but by your test's measures it's a complete failure.
Then you get into concepts like location precedence, separating content in images vs content in text (not all text is actually text on a site), color palettes, expected vs actual functionality, accessibility, compatibility with various browsers and technologies, etc.
There's a reason that testers are paid to interact with enterprise-level sites, graphic designers are paid to make layouts, and UI engineers (like me) are paid to figure out how to make the interface work effectively for the user... it's because there isn't a way to replace us (yet).
Never mind the fact that the "experts" still haven't figured out exactly what to test for. For every Jakob Nielsen finding, there are several others that contradict it. Remember, while there's an accepted standard out there (W3C), the browser family with the biggest market share still doesn't entirely follow it, meaning that W3C conformance isn't necessarily a 100% valid singular testing standard (as much as that hurt to write...).
Of course, you could just try the HiPPO. I hear it has a very good API and is always right.