Automated method of verifying backlinks / URLs? - java

Does anyone have suggestions on automated ways to verify that backlinks are valid? I realize there are many criteria for determining this so I am open to all sorts of suggestions.
A few example criteria would be backlinks coming from specific domains, hosts, etc., but what about criteria that are not so easy to determine, such as the age of the link, the subject matter of the site where the link originates, etc.?
P.S., Although the above is a general question, I'm specifically looking for how to do this in Java.

I would suggest you ask the generic question on Programmers because determining what is a "good" backlink is not specific to any language.
If you have the criteria, try implementing them yourself first and then ask a specific question if you get stuck.
The java.net.URL class handles addresses well; you can easily get the host, path and other components from it.
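As a rough illustration of the "specific domains" criterion, here is a minimal sketch that extracts the host with java.net.URL and compares it against a set of allowed domains. The class and method names (BacklinkCheck, hostOf, isFromAllowedDomain) are made up for this example, and the matching rule (exact host or a subdomain of an allowed domain) is an assumption about what "coming from a specific domain" should mean.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: checks whether a backlink's host belongs to an allowed domain.
class BacklinkCheck {

    // Returns the lower-cased host of the link, or null if the link cannot be
    // parsed or is not an http/https URL.
    static String hostOf(String link) {
        try {
            URI uri = new URI(link.trim());
            String scheme = uri.getScheme();
            if (scheme == null
                    || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return null;
            }
            // toURL() yields a java.net.URL; getHost() returns its host part.
            String host = uri.toURL().getHost();
            return (host == null || host.isEmpty()) ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return null;
        }
    }

    // True if the host equals an allowed domain or is a subdomain of one.
    static boolean isFromAllowedDomain(String link, Set<String> allowedDomains) {
        String host = hostOf(link);
        if (host == null) {
            return false;
        }
        for (String domain : allowedDomains) {
            if (host.equals(domain) || host.endsWith("." + domain)) {
                return true;
            }
        }
        return false;
    }
}
```

Criteria such as link age or subject matter cannot be read off the URL itself and would need other data sources.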

Smack user validation and Nodeprep profile of stringprep

I'm working with Smack library and as I understand there is no function for verifying user jid, which is used in creating new Connection. (Please correct me if I'm wrong)
So I decided to write a new one and for this purpose I started to investigate RFC-6122 which contains ABNF block with validation rules.
Unfortunately I'm not very familiar with Unicode-specific or BNF-related matters, so I didn't understand how to write a correct regular expression from this ABNF block. I'm especially confused by the "Nodeprep profile of stringprep" mentioned in the ABNF block.
Could you please clarify this or give me some advice?
It's defined in RFC 6122, Appendix A, but that's unlikely to help you without also reading RFC 3454, and a bunch of other source material. It's quite an undertaking to implement, so I strongly suggest you use an existing Stringprep library, such as libidn.

OOP design - creation strategies/patterns

For OOP practice I am working on a hobby project, a quiz program which reads a table from txt file and asks questions about entries in the table. The idea is to have this facilitate learning of the material given for a course in our dept.
So far I wrote the I/O bit, put together a pretty modest GUI and the classes to represent the different types of entities in the datatable. I am not sure about how to proceed with the core of the program though, I mean question generation and validation.
My first idea was to have a class AbstractQuestion which pretty much defines what a question is and what fields it has (a string representation, an answer and a difficulty level). Then I thought I could write classes for different types of questions, for instance one class for simple value inquiries (like giving the name of an entity and asking for a particular property), another class for more complicated questions (for instance inquiring about interactions of entities etc).
I am not sure if this is the best way to go however. Can't really express why but I have a feeling that this is not the neatest way to go about it. Would it make sense to work on a Factory class? Essentially I need to:
- provide means for a question to be generated based on one, or more, entities randomly picked from the datatable
- different types of questions need to be created at runtime, based on input from the user (desired difficulty level)
- questions need to be validated and the user needs to be notified by the main Quiz class (so the questions need to be accessible).
I could start simple and implement only one type of question, get it to work and add new features in time but I think it's good practice to improve my understanding of OOP, and besides I'm afraid if it works and I start giving it out for people to test it out, I'll eventually end up working on something else. I'd like to be able to conceptualize my project better, and I think this could be a good opportunity to improve that.
PS: In case it wasn't obvious, I am not a programmer by educational background :)
You could use an Abstract Factory to create factories that know how to create questions based on specific parameters.
As for the notification, you could use the Observer pattern. Study both patterns and look at examples in the language of your preference.
Think in terms of two things:
What objects use Question objects? What do they need Questions to do? That is we talk about the Interface(s) of the Question.
How do Questions do those things? The Behaviour of the Question.
Initially, think only about the Interfaces. I'm not clear what we need the question to do. It seems to me that a question whose answer is free-form text, a question which offers "Pick one of A to D" and a question which asks "Pick one or more of A to D" might well look very different in a UI. So are you thinking in terms of "Question: please display yourself, get your answer and tell me the user's score" or "Question: what is your text? Question: what kind of answer do you take? Question: what are your four options? Question: the user entered 'a', what did they score?"
Once you've got the idea of the question's responsibilities clear, then you can consider the appropriate number of different Question interfaces and classes, and hence decide whether you need a creational pattern such as Factory. Factory works well when you have a number of different classes all implementing the same interface.
Factory: go make me a question. Question: go and ask the user.
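To make the "interface first, factory second" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the Question methods, the two question types, the row layout {entity, property, value} and the rule that difficulty 1 asks for a property while higher difficulties ask for the entity.

```java
import java.util.List;
import java.util.Random;

// What the quiz needs from any question, regardless of its concrete type.
interface Question {
    String text();
    int difficulty();
    boolean isCorrect(String answer);
}

// Easy: name the entity and ask for one of its properties.
class PropertyQuestion implements Question {
    private final String entity, property, value;

    PropertyQuestion(String entity, String property, String value) {
        this.entity = entity;
        this.property = property;
        this.value = value;
    }

    public String text() { return "What is the " + property + " of " + entity + "?"; }
    public int difficulty() { return 1; }
    public boolean isCorrect(String answer) {
        return answer != null && value.equalsIgnoreCase(answer.trim());
    }
}

// Harder: give the property value and ask which entity it belongs to.
class ReverseLookupQuestion implements Question {
    private final String entity, property, value;

    ReverseLookupQuestion(String entity, String property, String value) {
        this.entity = entity;
        this.property = property;
        this.value = value;
    }

    public String text() { return "Which entity has " + property + " " + value + "?"; }
    public int difficulty() { return 2; }
    public boolean isCorrect(String answer) {
        return answer != null && entity.equalsIgnoreCase(answer.trim());
    }
}

// The factory hides which class gets instantiated; callers only see Question.
class QuestionFactory {
    private final List<String[]> rows; // each row: {entity, property, value}
    private final Random random;

    QuestionFactory(List<String[]> rows, Random random) {
        this.rows = rows;
        this.random = random;
    }

    Question create(int requestedDifficulty) {
        String[] row = rows.get(random.nextInt(rows.size()));
        if (requestedDifficulty <= 1) {
            return new PropertyQuestion(row[0], row[1], row[2]);
        }
        return new ReverseLookupQuestion(row[0], row[1], row[2]);
    }
}
```

The main Quiz class would then only call create(...), text() and isCorrect(...), so adding a new question type does not require touching it.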
I've got a simple quiz application running in production =) There are different types of questions with different behaviours (they are asked, answered and given tips in different ways). Questions have different complexity, etc.
In my situation, the most appropriate solution was to create a Question superclass with some abstract methods (it could be an interface as well) and different implementations. There was also a QuestionGenerator that works as a factory: based on some input, it returns a different implementation.
Think about the interface (the common part) of your questions and use the factory pattern.
There could be more complicated scenarios where you would find some advantage in using the AbstractFactory or Builder pattern.
In my simple case, extracting an interface was enough.

I18n/L10n of an API targeting developers from maximum locales [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I am already aware about the best practices for internationalizing/localizing an application so that it is consumable by maximum number of users - from geography, language and locale perspective. My question is - what are the (additional ?) best practices I need to follow if I want to make it easy for developers from across the world to consume my API?
I realize this question is very broad - so I will attempt to reduce the scope: I am particularly interested in creating a REST API and a Java client library for the aforementioned REST API.
Some of the things that come to mind are:
Provide a way for the developer to localize Strings (obvious)
Provide a way for the developer to customize locale-specific artifacts (measurements, units like currency, distance, weights etc)
Internationalize the API documentation (?? - is this done often? Is it practical?)
Beyond these measures, there are other aspects that really confuse me.
Correctness versus Simplicity:
Should I really name my classes to reflect the technical concepts on which they are based? For example, many of the design patterns may make sense to people who are well versed in English, but for developers with a different medium of instruction, they might be difficult to grasp. So, should I, in the interest of simplicity, rename DelegationInterceptor to something using simpler language? I'm wondering whether this simplification might have any (legal ??) consequences ?
Being Culture-Neutral:
Many-a-times, the easiest way for a developer to understand things is to see an example (or even a framework name) that is similar to something they see every day - which is why Pizzeria or Token Rings would be cool as example usages of my API. On the other hand, these same concepts may not be heard of in a country where most of the developers who develop to your API come from. So should I make generic examples? But then, what good are dull, boring, "generic" examples?
It'd be great if anyone could point me to any API's out there which do a good job of catering to developers from varied locales and cultures - not necessarily in the REST or Java space - anything will do.
My 3 cents: i18n best practices are not restricted to "geography, language and locale perspective". I even think that the most interesting aspect of i18n is getting to know and understand different cultures with all their richness.
To answer your question shortly: there is a book on API design written by Krzysztof Cwalina (nice first name, isn't it?) and Brad Abrams called Framework Design Guidelines. There is also some presentation on slideshare.
Anyway, I read the book and I think it is great, eye opening at least.
The longer answer... What you are referring to is programming usability. I haven't actually seen the topic covered in detail (yet), but you can find many articles on the usability of programming languages (i.e. these slides). It seems this is a pretty new discipline and one that is pretty hard (it is a mixture of psychology, linguistics, grammar in two different senses, the theory behind compilers, algorithmics, ... , and more). The most important would of course be the human factor, especially the inherent ability to produce errors.
Very interesting topic :)
Going through details of your question:
There is no way to create L10n API, as Localization is a process of adapting the software to local market needs. What you want to create is i18n-related API.
I don't necessarily know why it has to be a REST API, but to be 100% honest I am afraid that you might want to create some super-fantastic API that actually goes against i18n best practices. First things first: if you want it to be consumed by many developers, it should be a regular API just like ICU. Maybe some parts of it could be exposed as a RESTful API, but I am not sure why you want to do this.
As I already mentioned, there is something called ICU, especially ICU4J. I know that this API is extremely complex and not very developer-friendly, but it has one very important advantage: it does exist. And it was created by i18n experts, so it really follows best practices. Some parts are inherently complex because of nature of things - they just have to be if you want to implement the cultural support correctly. Sorry.
By the way: I might be wrong, but you said REST API, which rings a bell in my head. I believe you think of i18n support on the client side, don't you? In such case, I must ask one question: what is wrong with Globalize and/or Dojo and why you think it is better option to do everything on the server side?
OK, with Dojo I can answer the question myself: size and responsiveness.
Going through your points: "Provide a way for the developer to localize Strings (obvious)". It is not so obvious. That is, it is not as easy as you might think. If you want to do the obvious, you must be sure to understand the terms: Resource Model, Resource Organization, Locale Fall-back, Message Formatting, Machine Translation and Translation Memory.
Trust me, it is really easy to make a mistake here. On one hand, I doubt that anyone could create an API that will stop regular programmers from being lazy and hardcoding strings. On the other hand, your friendly API (if you could achieve this) could easily allow reusing translations (which is an i18n defect unless it concerns common things like "OK", "Cancel", etc.). Also, you need to think of formatting capability so that it is (almost) impossible to introduce concatenations (a very common i18n defect, preventing correct translations) and at the same time it is easy to handle multiple plural forms (still think you know best practices...?).
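To show what "no concatenation, but still plural-aware" can look like with only the JDK, here is a small sketch using java.text.MessageFormat with a nested choice format. The class name and the pattern are made up for the example. Note that a choice format only covers simple English-like plurals; languages with more plural categories need something like ICU4J's PluralFormat.

```java
import java.text.MessageFormat;
import java.util.Locale;

class PluralMessages {
    // In a real application this pattern would live in a per-locale resource bundle,
    // so translators see the whole sentence instead of concatenated fragments.
    static final String FILES_PATTERN_EN =
        "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files} in {1}.";

    // Formats the complete sentence; the choice sub-pattern picks the plural form.
    static String filesMessage(int count, String folder) {
        MessageFormat format = new MessageFormat(FILES_PATTERN_EN, Locale.ENGLISH);
        return format.format(new Object[] { count, folder });
    }
}
```

Because the whole sentence is one pattern, a translator can reorder the count and the folder name freely, which is impossible once the code glues fragments together with "+".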
Proper organization and valid abstract model might help with implementation of TM and MT (that is reusing of old translations and minimizing the costs of new ones). But this is hard and very few people do it correctly (there are even some frameworks, like Play for example that implement serious misconceptions, i.e. single translation file only).
"Provide a way for the developer to customize locale-specific artifacts (measurements, units like currency, distance, weights etc)". Great idea. But please make sure that you include formats as well. I mean that number formats vary, units vary, unit names and symbols (even for the same units) vary, and unit placement may vary too.
Some of these artifacts are already in ICU and CLDR, but for others you would actually need to get valid translations of patterns and items themselves.
From my experience it might be hard to collect the translations in the first place, let alone valid ones...
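A quick way to see how much the formats themselves vary is to run the same value through java.text.NumberFormat for two locales. This is only a sketch with a made-up class name; the exact currency output (for example the kind of space before the symbol) depends on the locale data shipped with the JDK.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

class LocaleFormats {
    // Plain number: grouping and decimal separators depend on the locale.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Currency: symbol, its placement and the separators all depend on the locale.
    static String price(BigDecimal amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }
}
```

For example, number(1234.5, Locale.US) gives "1,234.5" while number(1234.5, Locale.GERMANY) gives "1.234,5", and the currency symbol moves from the front (US) to the end (Germany).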
"Internationalize the API documentation". Let me guess: you meant Localize, which in that case would probably be equal to Translate.
To be honest, I don't think that translating documentation of some API or Framework is all that important. Professional programmers have to have some command of English, at least be able to understand the technical documentation and write passable code (in terms of variable names and comments) - it is very unprofessional not to use English for such items.
"Correctness vs. Simplicity". I am not sure what kind of correctness you refer to. In terms of English language grammar, I would definitely favor simplicity over language correctness.
In terms of valid support for i18n, there are so many incorrect libraries already, please refrain from providing another one. As I wrote before, some things are inherently complex and they could be either done correctly (that will result in a complex API) or should not be done at all. Bringing simple, but only partially correct solution for cultural support will result in large number of defects (that I will curse you for) and the need to find even more complex workarounds. It is not worth the effort.
"Being culture neutral". Please read the book I recommended. It covers this briefly (there is no need to go deeper, actually).
I doubt you should actually strive so much for political correctness; just avoid anything you are sure might hurt somebody's feelings (don't do to others what you wouldn't want done to you). That's it.
EDIT: Just two more things.
It might be a good idea to actually perform Usability tests on your API (just like you would do for User Interface). If it feels natural and intuitive, you did a great job. By doing that, you will also learn how people might want to use your library, that is you will discover additional use cases.
It is probably much harder to create programming library than to actually create a program. In case of library/API you often need to break truths that are (or at least seem to be) carved into stone, that is create something that is against common OOP/OOD principles, but is easy to use. You would also need to provide more overloads (there are many different use cases, mind you). Something as simple as formatting Date/Time in Java could really give you a headache if you want to support java.util.Date, Calendar, java.sql.Date, java.sql.Time, java.sql.Timestamp, XMLGregorianCalendar, Joda Time and JSR-310.
As a side note, I am not sure if sending formatted date/times over a REST API is actually an i18n best practice...
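One common alternative to sending pre-formatted dates, sketched here under the assumption that the client knows the user's locale and time zone, is to send a locale-neutral ISO-8601 instant and format it only at display time. The class and method names are hypothetical; the java.time calls are standard JDK (JSR-310).

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Date;
import java.util.Locale;

class WireDates {
    // Server side: emit a locale-neutral ISO-8601 instant instead of a formatted string.
    static String toWire(Date date) {
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    // Client side: parse the wire value and format it for the user's locale and zone.
    static String forDisplay(String wireValue, Locale locale, ZoneId zone) {
        Instant instant = Instant.parse(wireValue);
        return DateTimeFormatter.ofLocalizedDateTime(FormatStyle.MEDIUM)
                .withLocale(locale)
                .withZone(zone)
                .format(instant);
    }
}
```

With this split the server never has to guess the caller's locale, and the other date types mentioned above only need a conversion to Instant at the boundary.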

Business rules Java app for User

The description may sound like just a bunch of words so here is a more detailed explanation. I have a User object which is mapped to database table.
I want users to be in different roles. There will be a bunch of those - and they technically will be the same users in same table but to them will apply different roles. Say user in role A will have to have two fields as required, and will have to have certain restrictions to the length and contents on his password, as well as the time expiration of his password, etc.
While I can hard-code those rules, I am very interested to find out if there is another way to define the rules and maybe store them in the database, so that they are easier to load and apply and, most importantly, can be changed and updated without redeploying the codebase.
Technically, the crudest and most straightforward solution is to implement a class, serialize it, store it in the db, then load it, deserialize it, and call methods on it that execute the rules. The problems are changes to the ruleset (read: the "interface" of the rule class) and that the solution generally sounds like a hack.
Anything else? Any frameworks? Other approaches?
UPDATE: probably was not clear. say, I have class User.java
I need to define different rules say:
1. do we need to verify length of password, and what should it be?
2. do we need to require some properties to be required?
3. do we need to track login attempts for this user?
4. if we do track, how many login attempts allowed?
5. do we expire password?
6. if we do, then in how many days? or months? or weeks?
7. ...
and so on and so on.
so questions ARE.
- how do I define those rules and operate on User object WITHOUT modifying and redeploying code base?
- how do I store those set of rules?
Drools, jBPM, etc. do not seem like a fit for that task. But any advice would help!
I have heard that JRuleEngine is good; some time back I planned to use it for a similar application.
There are many other rule engines though.
Well, there are some good rule engines out there, including JRules; Drools is popular too, I think. One thing to keep in mind is the relationship between a rule and the data it examines. After all, you can keep the rules in a Word document, but when they execute they need to examine data, and that is also a factor in choosing a rule engine or architecture. Generally a rule is "if (a > b) then do y", which means you need to examine a and b during rule execution. So the real issue is how to get the parameters into the rule and the engine.
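If the rules are mostly parameters, as in the numbered list in the question, one lightweight option short of a rule engine is to keep the parameters as data and have a small fixed class interpret them. The sketch below assumes the values arrive as java.util.Properties (they could equally be loaded from a database table); the class name, the key names and the convention that 0 means "rule disabled" are all made up for illustration.

```java
import java.util.Properties;

// Hypothetical per-role policy: every value comes from configuration, not code.
class RolePolicy {
    final int minPasswordLength;  // 0 means password length is not checked
    final int maxLoginAttempts;   // 0 means login attempts are not tracked
    final int passwordExpiryDays; // 0 means passwords never expire

    RolePolicy(int minPasswordLength, int maxLoginAttempts, int passwordExpiryDays) {
        this.minPasswordLength = minPasswordLength;
        this.maxLoginAttempts = maxLoginAttempts;
        this.passwordExpiryDays = passwordExpiryDays;
    }

    // Reads keys such as "admin.minPasswordLength"; missing keys disable the rule.
    static RolePolicy forRole(String role, Properties props) {
        return new RolePolicy(
                intProp(props, role + ".minPasswordLength"),
                intProp(props, role + ".maxLoginAttempts"),
                intProp(props, role + ".passwordExpiryDays"));
    }

    private static int intProp(Properties props, String key) {
        return Integer.parseInt(props.getProperty(key, "0").trim());
    }

    boolean passwordLongEnough(String password) {
        return password != null && password.length() >= minPasswordLength;
    }

    boolean isLockedOut(int failedAttempts) {
        return maxLoginAttempts > 0 && failedAttempts >= maxLoginAttempts;
    }

    boolean passwordExpired(long passwordAgeDays) {
        return passwordExpiryDays > 0 && passwordAgeDays > passwordExpiryDays;
    }
}
```

Changing a limit then means updating a row or a properties entry rather than redeploying. Adding a genuinely new kind of rule still needs code, which is the point where a rule engine starts to pay off.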

Domain name interpretation utility for java

I find myself with a need for a java utility for taking a fully-qualified hostname, and producing the domain name from that.
In the simple case, that means turning host.company.com into company.com, but this gets rapidly more complicated with host.subdomain.company.com, for example, or host.company.co.uk, where the meaning of "domain name" gets a bit fuzzy. Throw in complications with the definition of SLD and ccSLD, and it gets messy.
So my question is whether there's a 3rd-party library out there that understands these things and can give me sensible interpretations.
Mozilla regularly maintains the rules that it uses in its browser for cookie security in a format that can be parsed and used by others:
http://publicsuffix.org/
Searching Google, there are probably Java libraries that can parse the list, but I don't know the quality of any of them.
I don't think such a thing exists, since it's an administrative rather than technical issue, and a very multi-lateral one at that.
If you end up rolling your own, this page on the Mozilla wiki looks like a good starting point, with lots of references. Looks like a major headache though. Just look at the rules for Japan. Ouch.
Not sure if it's for the same purpose, I do something similar in my code. When I set cookies, I want to set the domain as close to top as possible so I need to find the domain one-level lower than a public suffix. For example, the highest domain you can set cookie for host.div.example.com is .example.com. For host.div.example.co.jp is .example.co.jp.
Unfortunately, the code is not in the public domain. It's very easy to do. I basically use following 2 classes from Apache HttpClient 4,
org.apache.http.impl.cookie.PublicSuffixFilter
org.apache.http.impl.cookie.PublicSuffixListParser
I forgot the exact reason but we had to make some very minor tweaks. You just walk the domain from top to bottom, first valid cookie domain is what you need.
You need to download the public suffix list from here and include it in your JAR,
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/src/effective_tld_names.dat?raw=1
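For readers who just want the core idea without HttpClient, here is a simplified sketch of the lookup: find the longest public suffix the host ends with and keep one more label. It is not the code described above. The suffix set is a tiny hand-picked sample standing in for the real list, the class name is made up, and the wildcard and exception rules of the real public suffix list are deliberately ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class RegistrableDomain {
    // Tiny sample standing in for the real public suffix list (assumption).
    static final Set<String> SAMPLE_SUFFIXES =
            new HashSet<>(Arrays.asList("com", "uk", "co.uk", "jp", "co.jp"));

    // Returns the longest matching public suffix plus one label, or null if the
    // host is itself a public suffix or ends with no known suffix.
    // Simplification: wildcard ("*.x") and exception ("!y.x") rules are not handled.
    static String of(String host, Set<String> publicSuffixes) {
        String normalized = host.toLowerCase(Locale.ROOT);
        if (publicSuffixes.contains(normalized)) {
            return null;
        }
        String[] labels = normalized.split("\\.");
        // Try the longest candidate suffix first: for a.b.co.uk check "b.co.uk",
        // then "co.uk", then "uk".
        for (int i = 1; i < labels.length; i++) {
            String candidate = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (publicSuffixes.contains(candidate)) {
                return labels[i - 1] + "." + candidate;
            }
        }
        return null;
    }
}
```

With the sample set, host.company.co.uk maps to company.co.uk and host.subdomain.company.com maps to company.com. A production version would load the full list and honour its wildcard and exception entries.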
