Using Solr for indexing multiple languages - java

We're setting up Solr to index documents whose title field can be in various languages. After googling, I found two options:
1) Define different schema fields for every language, i.e. title_en, title_fr, ..., applying different filters to each language, then query the title field that corresponds to the language.
2) Create different Solr cores to handle each language and make our app query the correct Solr core.
Which one is better? What are the ups and downs?
Thanks

There's also a third alternative, where you use a common set of fields for all languages but filter on a language field. For instance, if you have the fields text and language, you can put the text content for all languages into the text field and use e.g. fq=language:english to retrieve only English documents.
The downside of this approach is that you cannot use language-specific features such as lemmatisation, stemming, etc.
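For illustration, here is a minimal SolrJ sketch of this single-field approach; the core URL and the field names text and language are assumptions for this sketch, not anything mandated by Solr:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SingleFieldSearch {
    public static void main(String[] args) throws Exception {
        // Core name "docs" and field names are assumptions for this sketch.
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build();
        SolrQuery q = new SolrQuery("text:hello");
        q.addFilterQuery("language:english"); // restrict results to English documents
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults().getNumFound() + " matching English documents");
        solr.close();
    }
}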
Define different schema fields for every language, i.e. title_en, title_fr, ..., applying different filters to each language, then query the title field that corresponds to the language.
This approach gives good flexibility, but beware of high memory consumption and complexity when many languages are present. This can be mitigated by using multiple Solr servers.
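As a hedged sketch of the application side of this option (the title_en/title_fr field names come from the question; the helper method below is hypothetical):

import org.apache.solr.client.solrj.SolrQuery;

public class PerLanguageTitleQuery {
    // Picks the per-language field at query time; the analysis chain configured
    // for that field (stemming, stopwords, ...) is applied automatically.
    static SolrQuery titleQuery(String languageCode, String term) {
        return new SolrQuery("title_" + languageCode + ":" + term); // e.g. title_fr:bonjour
    }
}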
Create different Solr cores to handle each language and make our app query the correct Solr core.
Definitely a nice solution, but whether the separate administration and slight overhead are acceptable probably depends on the number of languages you wish to support.
Unless the first approach is applicable, I would probably lean towards the second one, provided the number of cores stays manageable. Either approach is fine, though; I think it basically comes down to preference.

It all depends on your requirements. I am assuming you don't need to query multiple languages in a single query. In that case, splitting them into multiple cores would be the better idea, since you can tweak one core without affecting the other cores and their indexes. With multiple languages there will always be some tweaking involved, due to stemming, spell check and other features (if you plan to use them).
There is also the option of running multiple Solr webapps within one servlet container, so that is something you could look at as well.
It also depends on how much downtime you can afford to take to fix any issues.

If you use multiple cores and you need sharding, one issue I can see is that you will need to shard each language (core) separately; you won't be able to shard the whole index at once.
If you use a single core, you may waste space on text columns that are "not full", though I'm not sure about that.

Related

Why would you prefer Java 8 Stream API instead of direct hibernate/sql queries when working with the DB

Recently I see a lot of code in a few projects using streams for filtering objects, like:
library.stream()
       .map(book -> book.getAuthor())
       .filter(author -> author.getAge() >= 50)
       .map(Author::getSurname)
       .map(String::toUpperCase)
       .distinct()
       .limit(15)
       .collect(toList());
Are there any advantages of using that instead of a direct HQL/SQL query to the database that returns the already-filtered results?
Isn't the second approach much faster?
If the data originally comes from a DB it is better to do the filtering in the DB rather than fetching everything and filtering locally.
First, database management systems are good at filtering: it is part of their main job, and they are therefore optimized for it. Filtering can also be sped up by using indexes.
Second, fetching and transmitting many records and unmarshalling the data into objects just to throw most of them away during local filtering is a waste of bandwidth and computing resources.
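As a minimal sketch of pushing that work down to the database (the Author entity and its surname/age fields are taken from the question's pipeline; the exact mapping is an assumption), a JPA/JPQL version could look like:

import java.util.List;
import javax.persistence.EntityManager;

public class AuthorQueries {
    // Filtering, projection, de-duplication and the limit all happen in the DB,
    // so at most 15 strings cross the wire instead of the whole library.
    static List<String> surnamesOfAuthorsFiftyPlus(EntityManager em) {
        return em.createQuery(
                "select distinct upper(a.surname) from Author a where a.age >= 50",
                String.class)
            .setMaxResults(15)
            .getResultList();
    }
}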
On first glance: streams can be made to run in parallel just by changing the code to use parallelStream(). (Disclaimer: of course it depends on the specific context whether simply changing the stream type yields correct results, but yes, it can be that easy.)
Then: streams "invite" the use of lambda expressions, and those in turn lead to the use of invokedynamic bytecode instructions, which can sometimes gain performance over the "old-school" way of writing such code. (And to clarify a common misunderstanding: invokedynamic is a property of lambdas, not of streams!)
These would be reasons to prefer "stream" solutions nowadays (from a general point of view).
Beyond that: it really depends... let's have a look at your example input. This looks like dealing with ordinary Java POJOs that already reside in memory, within some sort of collection. Processing such objects in memory directly would definitely be faster than going to some out-of-process database to do the work there!
But, of course: if the above calls, like book.getAuthor(), were doing a "deep dive" and actually talking to an underlying database, then chances are that doing the whole thing in a single query gives you better performance.
The first thing to realize is that you can't tell from this code alone what statement is issued against the database. It might very well be that all the filtering, limiting and mapping is collected, and upon the invocation of collect all that information is used to construct a matching SQL statement (or whatever query language is used) which is then sent to the database.
With this in mind, there are many reasons why stream-like APIs are used:
It is hip. Streams and lambdas are still rather new to most Java developers, so they feel cool when they use them.
If something like in the first paragraph is used, it actually creates a nice DSL to construct your query statements. Scala's Slick and .NET's LINQ were early examples I know about, although I assume somebody built something like it in LISP long before I was born.
The streams might be reactive streams and encapsulate a non-blocking API. These APIs are really nice because they don't force you to block resources like threads while you are waiting for results. Using them requires either tons of callbacks or a much nicer stream-based API to process the results.
They can be nicer to read than the imperative code. Maybe the processing done in the stream can't (easily, or by the author) be done with SQL. So the alternatives aren't SQL vs. Java (or whatever language you are using), but imperative Java vs. "functional" Java. The latter often reads nicer.
So there are good reasons to use such an API.
With all that said: It is, in almost all cases, a bad idea to do any sorting/filtering and the like in your application, when you can offload it to the database. The only exception I can currently think of is when you can skip the whole roundtrip to the database, because you already have the result locally (e.g. in a cache).
Well, your question should ideally be: is it better to do reduction/filtering operations in the DB, or to fetch all records and do it in Java using streams?
The answer isn't straightforward, and any statistics that give a "concrete" answer will not generalize to all cases.
The operations you are talking about are better done in the DB itself, because that is what DBs are designed for: very fast handling of data. Of course, in relational databases there is usually some bookkeeping and locking involved to ensure that independent transactions don't end up making the data inconsistent, but even with that, DBs do a pretty good job at filtering data, especially on large data sets.
One case where I would prefer filtering data in Java code rather than in the DB is when you need to derive different features from the same data. For example, right now you are getting only the author's surname. If you also wanted all books written by each author, the authors' ages, their children, places of birth, etc., then it makes sense to fetch a single "read-only" copy from the DB and use parallel streams to derive the different views from the same data set.
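A hedged sketch of that last case (Book, Author and the single-round-trip loader are hypothetical):

import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class AuthorReport {
    // One read-only snapshot from the DB, several views derived in memory.
    static void report(List<Book> books) {
        List<String> surnames = books.parallelStream()
                .map(b -> b.getAuthor().getSurname())
                .distinct()
                .collect(Collectors.toList());

        IntSummaryStatistics ageStats = books.parallelStream()
                .mapToInt(b -> b.getAuthor().getAge())
                .summaryStatistics();

        System.out.println(surnames + " / average author age " + ageStats.getAverage());
    }
}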
Unless measured and proven for a specific scenario, either could be good, or equally bad. The reason you usually want to push these kinds of queries to the database is (among other things):
The DB can handle much larger data sets than your Java process.
Queries in a database can use indexes (making them much faster).
On the other hand, if your data is small, using a Stream the way you did is effective. Writing such a Stream pipeline is very readable (once you speak Streams well enough).
Hibernate and other ORMs are usually far more useful for writing entities than for reading, because they let developers offload the ordering of specific writes to a framework that will almost never get it wrong.
For reading and reporting, on the other hand (and considering we are talking about a DB here), an SQL query is likely to be better: there is no framework in between, and you can tune query performance in terms of the database that will execute the query rather than in terms of your framework of choice, which gives you more flexibility in how that tuning can be done.

Implementing logic/business rules in application without compile/deploy process?

My current application has a number of rules in the form of if-else conditional statements that work on some parameters to either modify one set of variables or set/unset others. Due to the increasing number of clients we have, the application code is now getting cluttered with these if-else rules. Also, these rules are not static: they are tweaked quite frequently, some are made to expire, others are activated, etc. I have seen that Drools and other engines based on JSR 94 provide the functionality to separate logic/rules from application source code.
My requirement is not so complex: I don't need complex rule visualization software, an editing UI, etc., but the rules definitely need to be dynamic (modifiable/editable), and I also need low latency, as the application processes some hundreds of millions of requests a day.
Any ideas for a lightweight implementation, as I feel Drools is overkill?
My platform for development is Java.
Example rule:
if ( age > 15 && item belongs_to(footwear) || using(mobile_device) )
discount = 2%
I strongly suggest you use a proper rule engine: Drools, FlexRule, etc. If you don't like those solutions, you can always find an expression evaluation library that lets you parse and execute string-based expressions.
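As one possible lightweight direction, here is a minimal sketch using the JDK's javax.script API (the Nashorn JavaScript engine bundled with Java 8); the rule text and variable names are made up to mirror the question's example. The rule string could live in a database and be swapped at runtime without a redeploy:

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.SimpleBindings;

public class RuleSketch {
    public static void main(String[] args) throws Exception {
        ScriptEngine js = new ScriptEngineManager().getEngineByName("javascript");
        // Externalized rule, analogous to the discount example above.
        String rule = "(age > 15 && category == 'footwear') || mobileDevice ? 0.02 : 0.0";
        SimpleBindings facts = new SimpleBindings();
        facts.put("age", 20);
        facts.put("category", "footwear");
        facts.put("mobileDevice", false);
        System.out.println("discount = " + js.eval(rule, facts)); // prints 0.02
    }
}

Whether this meets the latency bar at hundreds of millions of requests a day would need measuring; pre-compiling rules via javax.script.Compilable should help.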

JSP internationalization RTL/LTR

I want to create a web site which can be viewed with two languages, one LTR and one RTL. This means that all content should be shown in either of the two languages.
My framework is Spring, and I'm using Tiles2, but I think this question is not framework specific.
The obvious solution to supporting two languages is to have everything doubled (all JSPs, fragments, etc.) and serve the part of the tree which fits the chosen language. But this causes problems when changing the web site (you might forget to update the other JSPs) and does not scale (try doing this for 5 or 10 languages).
I know I can use properties files to hold the strings for the different languages, but then my web site would be a huge collection of spring:message tags and a lot harder to maintain (what happens if I have a paragraph of 100 lines? Does it all go into a single properties line?)
Is there any kind of framework, plugin or other tool which solves this problem? Has anyone come across a clever solution to this problem?
I've never done a complete project this way, just some tests, but I think this problem is not as big as it seems if you follow some simple rules. Here is what I would try to do:
Specify direction with <body dir='ltr/rtl'>. This is preferred over the CSS direction attribute.
Avoid different left/right margins or paddings in all CSS. If you must break this rule, you'll probably need two different files (ltr.css and rtl.css) containing all the differing rules.
Sometimes you'll need to move some elements from left to right or vice versa. For example, in LTR you want a menu on the left, but in RTL you want it on the right. You can achieve this using CSS, but that can get complicated if you are not an expert, and you must test it in all browsers. Another option is to use an IF depending on the case. This last option fits very well if you use a grid-based CSS library, like Bootstrap.
Choose carefully which CSS and JS libraries you'll use. Obviously, pick the ones which offer RTL/LTR support.
Don't worry too much about the images. If you must change an image depending on the language, it is probably because it has some text in it, so you would need different images anyway. That is an i18n issue, not a text-direction issue.
Don't let your customer be too fussy about it. With these rules (and maybe some more) you can get a good result, but if your customer starts complaining about one pixel here and another one there, you'll need to complicate all this, and that is probably not necessary.
About your language properties files: yes, use them. Always. This is good practice even when you are only using one language: the HTML structure is separated from the content, it is very easy to correct or translate, and each word or sentence lives in just one file...
Usually, web frameworks are used to build web applications rather than web sites, so there are very few long static paragraphs. Most of the content is dynamic and comes from a database. But yes, the usual way of doing it is to externalize everything to resource bundles, usually in the form of properties files.
Putting a long paragraph in a properties file doesn't cause much of a problem, because you can break long paragraphs into multiple lines by ending each line with a backslash:
home.welcomeParagraph=This is a long \
paragraph split into several lines \
thanks to backslashes.
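Under the hood this is plain java.util.ResourceBundle behavior, which the spring:message tag builds on via Spring's MessageSource. A minimal sketch (the bundle base name messages is an assumption):

import java.util.Locale;
import java.util.ResourceBundle;

public class WelcomeText {
    public static void main(String[] args) {
        // Loads messages_ar.properties if present, otherwise falls back
        // through the default bundle resolution chain.
        ResourceBundle bundle = ResourceBundle.getBundle("messages", new Locale("ar"));
        System.out.println(bundle.getString("home.welcomeParagraph"));
    }
}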
RTL vs. LTR is one of the harder i18n problems.
Basically it's a problem in the view scope of the MVC model. It can also involve pictures and culturally sensitive details, such as the skin color of the people shown. In those cases you had better go beyond the solution that HTML+CSS gives you.
For example:
<style type="text/css">
*:lang(ar) { direction:rtl }
*:lang(de) { direction:ltr }
</style>
The best practice is to ask members of the target audience what effect the web pages have on them.
I agree with most of the solutions provided here. Your problem is more design-oriented (architecturally) than technical. You need to choose whether to keep the internationalization logic on the server (Java) side or in static files.
If you go for the Java side (the preferable solution), you keep one properties file per language and use JSTL tags. This minimizes your work when you want to add another language in the future, and it is a maintainable solution; I have seen applications supporting more than 15 languages and time zones this way, and the release process gets pretty easy.
If you instead keep multiple CSS and other static files per language, you will soon find things running out of your hands. I don't think that is a maintainable solution.
Having said all this, I will leave the choice to the architect of the application, who will be able to judge which way to go based on the nature of the application and the constraints given.
You don't want to use <spring:message> everywhere. That's a pity, because it is exactly the way you should do it. It is bad practice to keep hard-coded text in a JSP if you need internationalization.
Furthermore, most modern IDEs let you jump to a key's declaration with ctrl+left click (or show its value on hover), so having a lot of message keys in your code should not be a maintenance problem.
First, you must distinguish, for each text element, whether it is a user interface element (e.g. a button label) or editorial content.
User interface element labels will be stored in properties files that have to be translated for each supported language (with a default value as a fallback).
Editorial content will be stored in a content management system, organized so that you can easily find the localized version of each piece of content.

Is there a dbunit-like framework that doesn't suck for java/scala?

I was thinking of making a new, lightweight database population framework. I absolutely hate dbunit. Before I do, I want to know if someone has already done it.
Things I dislike about dbunit:
1) The simplest format to write and get started with is deprecated. They want you to use formats that are bloated. Some even require XML schemas. Yeah, whatever.
2) They populate rows not in the order you write them, but in the order the tables are defined in the XML file. This is really bad because you can't order your data so that foreign key constraints won't cause problems. It just forces you to go through the hassle of turning them off altogether.
This also wastes time and bloats your JUnit base classes with code to disable the foreign key constraints. You will probably have to test for the database type (hsqldb, etc.) and disable them in database-specific ways. That is really bad.
It would be better if dbunit helped disable foreign key constraints automatically as part of the framework, but it doesn't. It does keep track of dialects... so why not use them for this? Ultimately, all this does is force the programmer to waste time instead of getting up and testing quickly.
3) XML is a pain to write. I don't need to say more about this. They also offer so many ways to do it that I think it just complicates matters. Just offer one really solid way and be done with it.
4) When your data gets large, keeping track of the ids and their consistent/correct relationships is a royal pain.
Also, if you don't work on a project for a month, how are you to remember that user_id 1 was an admin, user_id 2 a business user, user_id 3 an engineer and user_id 4 something else? Going back to check this wastes more time. There should be a meaningful way to retrieve a row other than by an arbitrary number.
5) It's slow. I've found that unless hsqldb is used, it is painfully slow, and it doesn't have to be. There are also numerous ways to mess up its configuration, as it is not easy to get right "out of the box". There is a hump you must get over before it works right. All this does is encourage people not to use it, or to be pissed off when they do start using it.
6) Some values tend to repeat a lot, like dates. It'd be nice to specify defaults, or even have the framework fill in defaults automatically without being told to. That way you can create objects with just the values you want and leave the rest off. This sure beats specifying every nook and cranny of a column when it's not required.
7) Probably the most annoying thing is that the first entry must include ALL the values (even null placeholders) or future rows won't pick up the columns that you actually specified.
DBunit doesn't have a sensible default for translating [NULL] to a real null value either. You have to add it manually (see the sketch after this point). Tell me, who hasn't done this with dbunit? Everyone has. It shouldn't be like this!
What this means is that if you have a polymorphic object, you must declare all the foreign keys to the joining tables of each subclass in the first row, even though they are null. If you use a table-per-subclass pattern, you still have to specify all the fields in the first row. This is just awful.
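For reference, the [NULL] workaround mentioned above is the kind of boilerplate everyone ends up writing; a minimal sketch with DbUnit's ReplacementDataSet (the dataset file name is arbitrary):

import java.io.File;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.ReplacementDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;

public class NullPlaceholders {
    static IDataSet load(File xml) throws Exception {
        IDataSet raw = new FlatXmlDataSetBuilder().build(xml);
        // Translate the [NULL] placeholder into a real SQL NULL.
        ReplacementDataSet ds = new ReplacementDataSet(raw);
        ds.addReplacementObject("[NULL]", null);
        return ds;
    }
}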
Anything out there to satisfy me, or should I become the next framework developer of a much better database testing framework?
I'm not aware of any real alternative to DbUnit, and none of the tools mentioned by @Joe qualify in my eyes:
Incanto: not DB agnostic
SQLUnit: a regression and unit testing harness for testing database stored procedures (that's not what DbUnit is about)
Cactus: a tool for in-container testing (I fail to see where it helps with databases)
Liquibase: a database migration tool (doesn't load/verify data)
ORMUnit: can initialize a database but that's all
JMock: doesn't compete with DbUnit at all
That being said, I've personally used DbUnit successfully several times, on small and huge projects, and I find it pretty usable, especially when using Unitils and its DbUnit module. This doesn't mean it's perfect and can't be improved, but with decent tooling (either custom-made or something like Unitils), using it has been a decent experience.
So let me answer some of your points:
The simplest format to write and get started with is deprecated. They want you to use formats that are bloated. Some even require XML schemas. Yeah, whatever.
DbUnit supports flat or structured XML, XLS, CSV. What revolutionary format would you like to use? By the way, a DTD or schema is not mandatory when using XML, but it gives you nice things like validation and auto-completion; how is that bad? And Unitils can generate it easily for you, see Generate an XSD or DTD of the database structure.
It would be better if dbunit helped disable foreign key constraints automatically as part of the framework, but it doesn't. It does keep track of dialects... so why not use them for this? Ultimately, all this does is force the programmer to waste time instead of getting up and testing quickly.
They are waiting for your patch.
Meanwhile, Unitils provides support for handling constraints transparently; see Disabling constraints and updating sequences.
XML is a pain to write. I don't need to say more about this. They also offer so many ways to do it that I think it just complicates matters. Just offer one really solid way and be done with it.
I guess pain is subjective but I don't find it painful, especially when using a schema and autocompletion. What is the silver bullet you're suggesting?
When your data gets large, keeping track of the ids and their consistent/correct relationships is a royal pain.
Keep your data sets small; that's a known best practice. You're going against a known best practice and then complaining...
Also, if you don't work on a project for a month, how are you to remember that user_id 1 was an admin, user_id 2 a business user, user_id 3 an engineer and user_id 4 something else? Going back to check this wastes more time. There should be a meaningful way to retrieve a row other than by an arbitrary number.
Yes, task switching is counterproductive. But since you're working with low-level data, you have to know how it is represented; there is no magic solution, unless of course you use a higher-level API (but that's not the purpose of DbUnit).
It's slow. I've found that unless hsqldb is used, it is painfully slow, and it doesn't have to be. There are also numerous ways to mess up its configuration, as it is not easy to get right "out of the box". There is a hump you must get over before it works right. All this does is encourage people not to use it, or to be pissed off when they do start using it.
That's inherent to databases and JDBC, not to DbUnit. Use a fast database like H2 if you want things to be as fast as possible (if you have a better agnostic way to do things, I'd be glad to learn about it).
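A minimal sketch of that suggestion (this is H2's documented in-memory JDBC URL; DB_CLOSE_DELAY=-1 keeps the database alive between connections):

import java.sql.Connection;
import java.sql.DriverManager;

public class InMemoryDb {
    public static void main(String[] args) throws Exception {
        // In-memory H2 database, well suited to fast DbUnit-style test runs.
        try (Connection c = DriverManager.getConnection(
                "jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1", "sa", "")) {
            System.out.println("connected: " + !c.isClosed());
        }
    }
}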
Probably the most annoying thing is that the first entry must include ALL the values (even null placeholders) or future rows won't pick up the columns that you actually specified.
Not when using Unitils as mentioned in presentations like Unitils - Home - JavaPolis 2008 or Unit testing: unitils & dbmaintain.
Anything out there to satisfy me, or should I become the next framework developer of a much better database testing framework?
If you think you can make things better, maybe contribute to existing solutions. If that's not possible and you think you can create the killer database testing framework, what can I say: do it. But don't forget, ranting is easy; coming up with solutions is less so.
As a DbUnit developer I'm grateful for criticism and I must partially agree with you. We are currently starting the design of the next DbUnit major release and I wish to invite you to participate both in the discussion and development.
I'm not going to answer your points, as your question is not really about DbUnit but about DbUnit alternatives. Anyway, I just want to highlight that your point 7 is completely false: you no longer need to specify all the columns in the first row; the feature is called column sensing. I'm not going to tell you why it's not enabled by default, as you are surely smart enough to understand that by yourself.
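For readers hitting point 7, a minimal sketch of enabling the feature (the dataset file name is arbitrary):

import java.io.File;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;

public class ColumnSensing {
    static IDataSet load(File xml) throws Exception {
        // Column sensing lets later rows introduce columns the first row omitted.
        return new FlatXmlDataSetBuilder()
                .setColumnSensing(true)
                .build(xml);
    }
}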
I'll give scaladbtest a thorough examination in the hope that we can integrate its ideas.
Faced with similar concerns about DbUnit, I found this: http://dbsetup.ninja-squad.com/index.html, which may address them. For instance, instead of representing test data in separate files, all DB content is contained within the Java class itself.
If you use the Spring Framework (or don’t mind using it at least for testing), then Spring DBUnit is currently the best (maintained) alternative to plain DBUnit that I know and use. Quoting their website:
Spring DBUnit provides integration between the Spring testing framework and the popular DBUnit project. It allows you to setup and teardown database tables using simple annotations as well as checking expected table contents once a test completes.
Spring DBUnit appears to be the ‘somewhat official’ Spring solution for DB unit testing (with DBUnit); at least the author/maintainer of the library, Phil Webb, is working at SpringSource/Pivotal.
I use DBUnit, with a few wrappers to smooth over the rough edges. A nice tool that can either complement or overlap its functionality is Jailer. It can extract subsets of data from a reference database and store them either as DBUnit-compatible XML files or as "topologically sorted DML files", which respect the foreign key constraints.
I just released a library called JDBDT (Java Database Delta Testing) that you may use for database setup and validation in software tests.
Have a look at http://jdbdt.org
Best,
Eduardo
You're making excellent points.
I've been working for a lot of web portals over the last years, mostly with PHP, but also some Java now and then.
And like you, I don't get why, after all these years, framework and unit-testing developers still don't seem to realize how much storage handling has changed in the last decade.
It's not enough to just send create/insert/truncate statements to some database!
If you're operating at large scale, you end up employing all sorts of storage backends, organized in layers to push hot content out fast. On the database front there's also the issue of data partitioning. If you don't have a proper foreign key abstraction, you will certainly go nuts when your storage setup changes. And while we're at it: fixture ordering by foreign key precedence has many pitfalls, and I have yet to see a real solution for that in DBUnit.
Anyway, the point is that having just a basic database storage in place for unit testing is not enough for complex storage setups, since it often fails to reproduce problems in the live environment and is a pain in the ass to maintain.
Without wanting to sound like a fanboy: one place where things are okay is Ruby on Rails.
It has a persistent model concept that people seem to have actually put some thought into. If you're dealing in PHP, Symfony is the place to go. It is limited by the default inclusion of Doctrine, which is also quite DB-centric, but it has clean interfaces and great extensibility, and it copied the Rails fixture system completely. Professionally I need to stick to homebrew solutions for now, but they work okay.
Another vote for wrapping DBUnit with a modern library to improve usability and conciseness. My choice is database-rider, which makes DBUnit a breeze to use and even supports JUnit 5, as demonstrated in the following example:
@RunWith(JUnitPlatform.class)
@ExtendWith(DBUnitExtension.class)
@DBUnit(cacheConnection = true, cacheTableNames = true)
class TestInstrumentQueryService {

    private ConnectionHolder connHolder =
            () -> EntityManagerProvider.instance("my-jta-unit").connection();

    @DBRider
    @DataSet("datasets/instrumentIds.yml")
    void testFindInstrumentById() {
        InstrumentQueryService iqs = new InstrumentQueryService(EntityManagerProvider.em());
        Instrument instr = iqs.findInstrumentById(InstrumentIdType.TICKER_BBG, "AAPL");
        assertEquals(100, instr.getId());
    }
}
Notice how this allows leveraging concise YAML testing data sets seamlessly (YAML, not XML, though I'm led to believe that DBUnit actually supports those natively).
Here's a short list of a few tools in this vein (besides DBunit) that I particularly like, or find interesting. At the very least they may offer some inspiration:
Incanto
SQLunit
Cactus
Liquibase
ORMUnit
JMock
Note that none of these are really competitors to DBunit in terms of scope or feature sets. However, there are some interesting ideas there that might be worth taking a look at. Good luck!
We are writing Daleq as a wrapper around DbUnit to address some of the concerns mentioned here. It allows populating a DB directly within your unit test rather than relying on editing XML files.
I too had similar issues with DBUnit, especially when using it to populate local development data and when exporting data from a real database. I ran into several cases where it would export a dataset that it couldn't then import.
This inspired me to write a new library for it: https://github.com/jeffskj/phonydata
This uses a Groovy DSL to define the datasets, which makes for a very compact representation of the data and makes it possible to do cool things like generating random data, since it's just Groovy code.
The situation with DBUnit is indeed sometimes frustrating. Some of the problems are solved by Marc Philipp's dbunit-datasetbuilder, especially if you combine it with the validator, which is at a very early stage. You can see it in action at SZE.
Disclaimer: all referenced GitHub resources are maintained by me.
An alternative using Spring configuration and Specs2 testing can be found here.
I just released a Groovy-DSL-based framework called pedal-loader, available via GitHub; documentation here.
It allows you to work directly at the JPA entity level of abstraction. Since it is a Groovy script, you can use all of the Groovy constructs.
To insert rows into a table backed by a JPA entity called Student, with fields (not database columns, but mapped fields) called id, name and grade, you would do something like this:
allStudents = table(Student, ['id', 'name', 'grade']) {
    row 1, 'Joe', Grade.A
    rowOfInterest = row 2, 'John', Grade.B
}
Grade is an enum in the Student class that is mapped to the database column (perhaps using the JPA 2.1 @Convert annotation). allStudents is a list that will hold the rows, and rowOfInterest is a reference to a particular row. These properties (allStudents and rowOfInterest) become available to your unit test.

How easily customizable are SAP industry-specific solutions?

First of all, I have a very superficial knowledge of SAP. According to my understanding, they provide a number of industry-specific solutions. The concept seems very interesting, and I work on something similar for the banking industry. The biggest challenge we face is how to adapt our products for different clients. Many concepts are quite similar across enterprises, but there are always some client-specific requirements that have to be resolved through configuration and customization. Often this requires reimplementing and developing customer-specific features.
I wonder how efficient SAP products are in this sense. How much effort has to be spent to adapt the product so that it satisfies specific customer needs? What mechanisms are used (configuration, programming, etc.)? How would this compare to developing a custom solution from scratch? Are they capable of leveraging and promoting best practices?
Disclaimer: I'm talking about the ABAP-based part of SAP software only.
Disclaimer 2, regarding PATRY's response: HR is quite a bit different from the rest of the SAP/ABAP world. I feel rather competent as a general-purpose ABAP developer, but HR programming is so far outside my territory that I've never even tried to understand what they're doing there. %-|
According to my understanding, they provide a number of industry-specific solutions.
They do - but be careful when comparing your own programs to these solutions. For example, IS-H (SAP for Healthcare) started off as an extension of the SD (Sales & Distribution) system, but has become much more since then. While you could technically use all of the techniques they use for their industry solutions, you really should ask a competent technical consultant before you do - there are an awful lot of pitfalls to avoid.
The concept seems very interesting and I work on something similar for the banking industry.
Note that an SAP for Banking industry solution (IS) already exists. See here for the documentation.
The biggest challenge we face is how to adapt our products for different clients.
I'd rather rephrase this as "The biggest challenge is to know where the product is likely to be adapted, and to structurally prepare the product for adaptation." The adaptation techniques are well researched and easily employed once you know where the customer is likely to deviate from your idea of the perfect solution.
How much effort has to be spent in order to adapt the product so it satisfies specific customer needs?
That obviously depends on how far the customer's needs deviate from the standard path - but that observation alone won't help you. With an SAP-based system, you always have three choices. First, you can try to customize the system within its limits. Customizing basically means tweaking settings (think configuration tables, tens of thousands of them) and adding stuff (program fragments, forms, ...) in places that are intended for it. Technology: see below.
Sometimes customizing isn't enough; then you can develop things in addition. A very frequent requirement is some additional reporting tool. With the SAP system, you get the entire development environment delivered - the very same tools that all the standard applications were written with. Your programs can peacefully coexist with the standard programs and even use common routines and data. Of course you can really screw things up, but show me a real programming environment where you can't.
The third option is to modify the standard implementations. Modifications are like a really sharp two-edged kitchen knife: you might be able to cook really cool things in half the time required by others, but you might also hurt yourself badly if you don't know what you're doing. Even if you don't really intend to modify the standard programs, it's very comforting to know that you could, and that you have full access to the code.
(Note that this applies to the application programs only - you have no chance whatsoever of tweaking the kernel, but fortunately that's rarely necessary.)
What mechanisms are used (configuration, programming, etc.)?
Configuration is mostly about configuration tables with more or less sophisticated dialog applications. For the programming part of customizing, there's the extension framework - see http://help.sap.com/saphelp_nw70ehp1/helpdata/en/35/f9934257a5c86ae10000000a155106/frameset.htm for details. It's basically a controlled version of dependency injection. As a solution developer, you have to anticipate the extension points, define the interface that has to be implemented by the customer code, and then embed the call in your code. As a project developer, you have to create an implementation that adheres to the interface and activate it. The basic runtime system takes care of gluing the two programs together; you don't have to worry about that.
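As a loose Java analogy of that pattern - not SAP's actual API, just an illustration of anticipate-the-interface, implement, activate - with ServiceLoader playing the role of the glue:

import java.util.ServiceLoader;

public class PricingService {
    // The solution developer defines the anticipated extension point...
    public interface DiscountExtension {
        double adjustDiscount(double proposed);
    }

    // ...and embeds the call; the project developer ships an implementation that
    // the runtime discovers and activates (here via META-INF/services registration).
    public double finalDiscount(double proposed) {
        double discount = proposed;
        for (DiscountExtension ext : ServiceLoader.load(DiscountExtension.class)) {
            discount = ext.adjustDiscount(discount);
        }
        return discount;
    }
}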
How would this compare to developing a custom solution from scratch?
IMHO this depends on how much of the solution is the same for all customers and how much of it has to be adapted. It's really hard to be more specific without knowing more about what you want to do.
I can only speak for the Human Resources component, but that is a component where there are a lot of differences between customers, all based on a common need.
First, most of the time you set a value for a group, and then associate the object (person, location, ...) with a group depending on one or two values. This is akin to an indirection, and allows for great flexibility, as you can change the association for a given location without changing the others. In a few cases there is even a 3-level indirection...
Second, there is a lot of customization that is nearly programming. Payroll and administrative operations are first-class examples of this. In the latter case, you get a table with the operation (hiring, for example), the event (creation, modification, ...), a code for the action (I for test, F to call a function, O for a standard operation) and a text field describing the parameters of a function ("C P0001, begda, endda" to create a structure P0001 with default values).
Third, you can also use such a table to indicate a function or class (ABAP-OO) that will be called dynamically. You get a developer to create this function or class, and then reference it in the table. This is a way to replace one piece of functionality with another, or to extend it. It is used extensively in the ESS/MSS.
Last, there are also extension points or files that you can modify. This is nearly the same as the previous mechanism, except that you don't need to declare the change anywhere: the file is always used (ZXPADU01/02 for HR infotype modifications).
Hope this helps,
Guillaume PATRY
