fuzzy implementation for capturing specific strings

I am going to develop a web crawler in Java to capture hotel room prices from hotel websites.
I want to capture each room price together with its room type and meal type, so my algorithm needs to be smart enough to handle that.
For example:
Room type: Deluxe
Meal type: HalfBoard
price : $20.00
The main problem is that room prices can be presented in different ways on different hotel sites, so my algorithm should be independent of any particular site.
I plan to use the room types and meal types above as fuzzy sets and compare the words in a web page against those sets using a suitable membership function.
Is anyone experienced with this, or does anyone have an idea for my problem?
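To make the idea concrete, here is a minimal sketch of one possible membership function. It assumes normalized Levenshtein (edit distance) similarity, which is only one of many choices; the class name FuzzyLabelMatcher, the label list, and the 0.8 threshold are all hypothetical.

```java
import java.util.List;

// Hypothetical helper: scores how well a scraped word matches a known label.
class FuzzyLabelMatcher {

    // Classic dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = curr;
        }
        return prev[b.length()];
    }

    // Membership in [0, 1]; 1.0 means identical when case is ignored.
    static double membership(String word, String label) {
        String w = word.toLowerCase();
        String l = label.toLowerCase();
        int longest = Math.max(w.length(), l.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(w, l) / longest;
    }

    // Returns the best-scoring label whose membership reaches the threshold, else null.
    static String bestLabel(String word, List<String> labels, double threshold) {
        String best = null;
        double bestScore = threshold;
        for (String label : labels) {
            double score = membership(word, label);
            if (score >= bestScore) {
                best = label;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> mealTypes = List.of("HalfBoard", "FullBoard", "BedAndBreakfast");
        // A misspelled word scraped from a page still maps to the closest label.
        System.out.println(bestLabel("HalfBoad", mealTypes, 0.8));
    }
}
```

A real crawler would still need to decide which words on the page to score and how to pair a label with the nearest price, which this sketch does not attempt.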

There are two ways to approach this problem:
1. You can customize your crawler to understand the formats used by different websites; or
2. You can come up with a general ("fuzzy") solution.
(1) will, by far, be the easiest. Ideally you want to create some tools that make this easier so you can create a filter for any new site in minimal time. IMHO your time will be best spent with this approach.
(2) has lots of problems. Firstly it will be unreliable. You will come across formats you don't understand or (worse) get wrong. Second, it will require a substantial amount of development to get something working. This is the sort of thing you use when you're dealing with thousands or millions of sites.
With hundreds of sites you will get better and more predictable results with (1).
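As a rough illustration of approach (1), a per-site filter can be as small as one regular expression per host. This is only a sketch: the host names and page formats below are invented, and SiteFilters is a hypothetical class.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SiteFilters {
    // One pattern per site; the hosts and price formats are made up for illustration.
    static final Map<String, Pattern> PRICE_BY_HOST = Map.of(
        "hotel-a.example", Pattern.compile("Price:\\s*\\$(\\d+\\.\\d{2})"),
        "hotel-b.example", Pattern.compile("USD\\s+(\\d+\\.\\d{2})\\s+per night"));

    // Returns the first price found on the page, or empty if the site is unknown
    // or the page does not match that site's format.
    static Optional<String> extractPrice(String host, String pageText) {
        Pattern pattern = PRICE_BY_HOST.get(host);
        if (pattern == null) {
            return Optional.empty();
        }
        Matcher matcher = pattern.matcher(pageText);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractPrice("hotel-a.example", "Deluxe, HalfBoard. Price: $20.00"));
        System.out.println(extractPrice("unknown.example", "Price: $20.00"));
    }
}
```

Adding a new site then means adding one map entry, which is the kind of tooling that keeps the per-site cost low.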

As with all problems, good design lets you deliver value and adapt to situations you haven't considered much more quickly than a general solution.
Start by writing something that parses the data from one provider - the one with the simplest format to handle. Find a way to adapt that handler into your crawler. Be sure to encapsulate construction - you should always do this anyway...
public class RoomTypeExtractor
{
private RoomTypeExtractor() { }
public static RoomTypeExtractor GetInstance()
{
return new RoomTypeExtractor();
}
public String GetRoomType(String content)
{
// BEHAVIOR #1
}
}
The GetInstance() method lets you promote to a Strategy pattern practically for free.
Then add your second provider type. Say, for instance, that you have a slightly more complex data format which is a little more prevalent than the first format. Start by refactoring what was your concrete room type extractor class into an abstraction with a single variation behind it and have the GetInstance() method return an instance of the concrete type:
public abstract class RoomTypeExtractor
{
public static RoomTypeExtractor GetInstance()
{
return SimpleRoomTypeExtractor.GetInstance();
}
public abstract String GetRoomType(String content);
}
public final class SimpleRoomTypeExtractor extends RoomTypeExtractor
{
private SimpleRoomTypeExtractor() { }
public static SimpleRoomTypeExtractor GetInstance()
{
return new SimpleRoomTypeExtractor();
}
public String GetRoomType(String content)
{
// BEHAVIOR #1
}
}
Create another variation that implements the Null Object pattern...
public class NullRoomTypeExtractor extends RoomTypeExtractor
{
private NullRoomTypeExtractor() { }
public static NullRoomTypeExtractor GetInstance()
{
return new NullRoomTypeExtractor();
}
public String GetRoomType(String content)
{
// whatever "no content" behavior you want... I chose returning null
return null;
}
}
Add a base class that will make it easier to work with the Chain of Responsibility pattern that is in this problem:
public abstract class ChainLinkRoomTypeExtractor extends RoomTypeExtractor
{
private final RoomTypeExtractor next_;
protected ChainLinkRoomTypeExtractor(RoomTypeExtractor next)
{
next_ = next;
}
public final String GetRoomType(String content)
{
if (CanHandleContent(content))
{
return GetRoomTypeFromUnderstoodFormat(content);
}
else
{
return next_.GetRoomType(content);
}
}
protected abstract boolean CanHandleContent(String content);
protected abstract String GetRoomTypeFromUnderstoodFormat(String content);
}
Now, refactor the original implementation to have a base class that joins it into a Chain of Responsibility...
public final class SimpleRoomTypeExtractor extends ChainLinkRoomTypeExtractor
{
private SimpleRoomTypeExtractor(RoomTypeExtractor next)
{
super(next);
}
public static SimpleRoomTypeExtractor GetInstance(RoomTypeExtractor next)
{
return new SimpleRoomTypeExtractor(next);
}
protected boolean CanHandleContent(String content)
{
// return whether or not content contains the right format
}
protected String GetRoomTypeFromUnderstoodFormat(String content)
{
// BEHAVIOR #1
}
}
Be sure to update RoomTypeExtractor.GetInstance():
public static RoomTypeExtractor GetInstance()
{
RoomTypeExtractor extractor = NullRoomTypeExtractor.GetInstance();
extractor = SimpleRoomTypeExtractor.GetInstance(extractor);
return extractor;
}
Once that's done, create a new link for the Chain of Responsibility...
public final class MoreComplexRoomTypeExtractor extends ChainLinkRoomTypeExtractor
{
private MoreComplexRoomTypeExtractor(RoomTypeExtractor next)
{
super(next);
}
public static MoreComplexRoomTypeExtractor GetInstance(RoomTypeExtractor next)
{
return new MoreComplexRoomTypeExtractor(next);
}
protected boolean CanHandleContent(String content)
{
// Check for presence of format #2
}
protected String GetRoomTypeFromUnderstoodFormat(String content)
{
// BEHAVIOR #2
}
}
Finally, add the new link to the chain. If this is a more common format, you might want to give it higher priority by putting it higher in the chain (the real forces that govern the order of the chain will become apparent when you do this):
public static RoomTypeExtractor GetInstance()
{
RoomTypeExtractor extractor = NullRoomTypeExtractor.GetInstance();
extractor = SimpleRoomTypeExtractor.GetInstance(extractor);
extractor = MoreComplexRoomTypeExtractor.GetInstance(extractor);
return extractor;
}
As time passes, you may want to add ways to dynamically add new links to the Chain of Responsibility, as pointed out by Cletus, but the fundamental principle here is Emergent Design. Start with high quality. Keep quality high. Drive with tests. Do those three things and you will be able to use the fuzzy logic engine between your ears to overcome almost any problem...
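One way to add links dynamically is to build the chain from a list at startup. The sketch below is a simplified, self-contained variant rather than the classes above: Extractor, KeywordExtractor, and ChainBuilder are hypothetical names, and matching on a plain keyword is only a stand-in for real format detection.

```java
import java.util.List;

abstract class Extractor {
    abstract String getRoomType(String content);
}

// End of the chain: nothing matched, so report "no result" as null.
class NullExtractor extends Extractor {
    String getRoomType(String content) {
        return null;
    }
}

// Hypothetical link that recognizes a single keyword and otherwise delegates.
class KeywordExtractor extends Extractor {
    private final String keyword;
    private final Extractor next;

    KeywordExtractor(String keyword, Extractor next) {
        this.keyword = keyword;
        this.next = next;
    }

    String getRoomType(String content) {
        return content.contains(keyword) ? keyword : next.getRoomType(content);
    }
}

class ChainBuilder {
    // Builds the chain back to front so the first keyword ends up at the head.
    static Extractor build(List<String> keywordsByPriority) {
        Extractor chain = new NullExtractor();
        for (int i = keywordsByPriority.size() - 1; i >= 0; i--) {
            chain = new KeywordExtractor(keywordsByPriority.get(i), chain);
        }
        return chain;
    }

    public static void main(String[] args) {
        Extractor chain = build(List.of("Deluxe", "Standard"));
        System.out.println(chain.getRoomType("Standard double room"));
        System.out.println(chain.getRoomType("no room information here"));
    }
}
```

The list could just as well come from configuration, which lets new formats be added without touching the code that assembles the chain.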
EDIT
Translated to Java. Hope I did that right; I'm a little rusty.

Related

How eliminate switch in this specific example

I have a controller method that gets data from the request and, based on the subject variable in the request, decides which function to call. (Due to project requirements I cannot use a separate controller method for each subject value.)
For now I use a switch, but I think it breaks the Open/Closed Principle (every time a new type of subject is added I have to add a new case to the switch) and is not good design. How can I refactor this code?
Subject subject = ... //(type of enum)
JSONObject data = request.getData("data");
switch(subject) {
case SEND_VERIFY:
send_foo1(data.getString("foo1_1"), data.getString("foo1_2"));
break;
case do_foo2:
foo2(data.getInt("foo2_b"), data.getInt("foo2_cc"));
break;
case do_foo3:
do_foo3_for(data.getString("foo3"));
break;
// some more cases
}

While I am not sure which OO principle this snippet violates, there is indeed a more robust way to achieve the logic: tie the processing for each enum value to the enum class.
You will need to generalize the processing into an interface:
public interface SubjectProcessor
{
void process(JSONObject data);
}
and create concrete implementations for each enum value:
public class SendVerifySubjectProcessor implements SubjectProcessor
{
@Override
public void process(JSONObject data) {
String foo1 = data.getString("foo1_1");
String foo2 = data.getString("foo1_2");
...
}
}
Once you have that class hierarchy, you can associate each enum value with a concrete processor:
public enum Subject
{
SEND_VERIFY(new SendVerifySubjectProcessor()),
do_foo2(new Foo2SubjectProcessor()),
...
private final SubjectProcessor processor;
Subject(SubjectProcessor processor) {
this.processor = processor;
}
public void process(JSONObject data) {
this.processor.process(data);
}
}
This eliminates the need for the switch statement in the controller:
Subject subject = ... //(type of enum)
JSONObject data = request.getData("data");
subject.process(data);
EDIT:
Following the good comment: you can use the java.util.function.Consumer functional interface instead of the custom SubjectProcessor one. You can decide whether to write concrete classes or use lambda expressions.
public class SendVerifySubjectProcessor implements Consumer<JSONObject>
{
@Override
public void accept(JSONObject data) {
String foo1 = data.getString("foo1_1");
String foo2 = data.getString("foo1_2");
...
}
}
OR
public enum Subject
{
SEND_VERIFY(data -> {
String foo1 = data.getString("foo1_1");
String foo2 = data.getString("foo1_2");
...
}),
...
private final Consumer<JSONObject> processor;
Subject(Consumer<JSONObject> processor) {
this.processor = processor;
}
public void process(JSONObject data) {
this.processor.accept(data);
}
}

// SubjectsMapping.java
Map<Subject, Consumer<JSONObject>> tasks = new HashMap<>();
tasks.put(SEND_VERIFY,
data -> send_foo1(data.getString("foo1_1"), data.getString("foo1_2")));
tasks.put(do_foo2,
data -> foo2(data.getInt("foo2_b"), data.getInt("foo2_cc")));
tasks.put(do_foo3, data -> do_foo3_for(data.getString("foo3")));
// In your controller class where currently `switch` code written
if (tasks.containsKey(subject)) {
tasks.get(subject).accept(data);
} else {
throw new IllegalArgumentException("No suitable task");
}
You can maintain the Map<Subject, Consumer<JSONObject>> tasks configuration in a separate class rather than mixing it with the if (tasks.containsKey(subject)) code. When you need another feature, you just add one entry to this map.

The other answers look great. As an addition, I would suggest using EnumMap when the keys are enums, as it might be more efficient than a standard Map. It's also worth mentioning that the Strategy pattern is what is used here to call a specific action for each key of the Map without building long switch statements.
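A minimal sketch of the EnumMap variant follows. To keep it self-contained it makes two assumptions that differ from the code above: a plain Map<String, String> stands in for JSONObject, and the tasks are Function instances that return a String (instead of Consumer) purely so the demo has something to print. The class name EnumMapDispatch is hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

class EnumMapDispatch {
    enum Subject { SEND_VERIFY, DO_FOO2, DO_FOO3 }

    // EnumMap is backed by an array indexed by the enum ordinal.
    static final Map<Subject, Function<Map<String, String>, String>> TASKS =
        new EnumMap<>(Subject.class);

    static {
        TASKS.put(Subject.SEND_VERIFY, data -> "verify:" + data.get("foo1_1"));
        TASKS.put(Subject.DO_FOO3, data -> "foo3:" + data.get("foo3"));
        // DO_FOO2 is deliberately left unregistered to show the error path.
    }

    static String dispatch(Subject subject, Map<String, String> data) {
        Function<Map<String, String>, String> task = TASKS.get(subject);
        if (task == null) {
            throw new IllegalArgumentException("No suitable task for " + subject);
        }
        return task.apply(data);
    }

    public static void main(String[] args) {
        System.out.println(dispatch(Subject.SEND_VERIFY, Map.of("foo1_1", "a")));
    }
}
```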

Strategy Pattern too many if statements

A user enters a code and the type of that code is determined by regular expressions. There are many different types of codes, such as EAN, ISBN, ISSN and so on. After the type is detected, a custom query has to be created for the code. I thought it might be a good idea to create a strategy for each type, but over time it has started to feel wrong.
public interface SearchQueryStrategie {
SearchQuery createSearchQuery(String code);
}
-
public class IssnSearchQueryStrategie implements SearchQueryStrategie {
@Override
public SearchQuery createSearchQuery(final String code) {
// Create search query for issn number
}
}
-
public class IsbnSearchQueryStrategie implements SearchQueryStrategie {
@Override
public SearchQuery createSearchQuery(final String code) {
// Create search query for ISBN number
}
}
-
public class EanShortNumberSearchQueryStrategie implements SearchQueryStrategie {
@Override
public SearchQuery createSearchQuery(final String code) {
// Create search query for ean short number
}
}
-
public class TestApplication {
public static void main(final String... args) {
final String code = "1144875X";
SearchQueryStrategie searchQueryStrategie = null;
if (isIssn(code)) {
searchQueryStrategie = new IssnSearchQueryStrategie();
} else if (isIsbn(code)) {
searchQueryStrategie = new IsbnSearchQueryStrategie();
} else if (isEan(code)) {
searchQueryStrategie = new EanShortNumberSearchQueryStrategie();
}
if (searchQueryStrategie != null) {
performSearch(searchQueryStrategie.createSearchQuery(code));
}
}
private SearchResult performSearch(final SearchQuery searchQuery) {
// perform search
}
// ...
}
I have to say that there are many more strategies. How should I dispatch the code to the right strategy?
My second approach was to put a boolean method into every strategy to decide if the code is correct for that strategy.
public class TestApplication {
final SearchQueryStrategie[] searchQueryStrategies = {new IssnSearchQueryStrategie(), new IsbnSearchQueryStrategie(),
new EanShortNumberSearchQueryStrategie()};
public static void main(final String... args) {
final String code = "1144875X";
for (final SearchQueryStrategie searchQueryStrategie : searchQueryStrategies) {
if (searchQueryStrategie.isRightCode(code)) {
searchQueryStrategie.createSearchQuery(code);
break;
}
}
}
private SearchResult performSearch(final SearchQuery searchQuery) {
// perform search
}
// ...
}
How would you solve this problem? Is the strategy pattern the right one for my purposes?

If you are using Java 8 and you can profit from the functional features I think one Enum will be sufficient.
You can avoid using if/else statements by mapping each type of code with a Function that will return the query that needs to be executed:
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
public enum CodeType
{
EAN("1|2|3"),
ISBN("4|5|6"),
ISSN("7|8|9");
String regex;
Pattern pattern;
CodeType(String regex)
{
this.regex = regex;
this.pattern = Pattern.compile(regex);
}
private static Map<CodeType, Function<String, String>> QUERIES =
new HashMap<>();
static
{
QUERIES.put(EAN, (String code) -> String.format("Select %s from EAN", code));
QUERIES.put(ISBN, (String code) -> String.format("Select %s from ISBB", code));
QUERIES.put(ISSN, (String code) -> String.format("Select %s from ISSN", code));
}
private static CodeType evalType(String code)
{
for(CodeType codeType : CodeType.values())
{
if (codeType.pattern.matcher(code).matches())
return codeType;
}
// TODO DON'T FORGET ABOUT THIS NULL HERE
return null;
}
public static String getSelect(String code)
{
Function<String, String> function = QUERIES.get(evalType(code));
return function.apply(code);
}
}
And in the main you can test your query:
public class Main
{
public static void main(String... args)
{
System.out.println(CodeType.getSelect("1"));
// System.out: Select 1 from EAN
System.out.println(CodeType.getSelect("4"));
// System.out: Select 4 from ISBB
System.out.println(CodeType.getSelect("9"));
// System.out: Select 9 from ISSN
}
}
I usually tend to keep the code as compact as possible.
Some people dislike enums, so I believe you can use a normal class instead.
You can engineer further the way you obtain the QUERIES (selects), so instead of having String templates you can have a Runnable there.
If you don't want to use the functional aspects of Java 8 you can use Strategy objects that are associated with each type of code:
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
public enum CodeType2
{
EAN("1|2|3", new StrategyEAN()),
ISBN("4|5|6", new StrategyISBN()),
ISSN("7|8|9", new StrategyISSN());
String regex;
Pattern pattern;
Strategy strategy;
CodeType2(String regex, Strategy strategy)
{
this.regex = regex;
this.pattern = Pattern.compile(regex);
this.strategy = strategy;
}
private static CodeType2 evalType(String code)
{
for(CodeType2 codeType2 : CodeType2.values())
{
if (codeType2.pattern.matcher(code).matches())
return codeType2;
}
// TODO DON'T FORGET ABOUT THIS NULL HERE
return null;
}
public static void doQuery(String code)
{
evalType(code).strategy.doQuery(code);
}
}
interface Strategy { void doQuery(String code); }
class StrategyEAN implements Strategy {
@Override
public void doQuery(String code)
{
System.out.println("EAN-" + code);
}
}
class StrategyISBN implements Strategy
{
@Override
public void doQuery(String code)
{
System.out.println("ISBN-" + code);
}
}
class StrategyISSN implements Strategy
{
@Override
public void doQuery(String code)
{
System.out.println("ISSN-" + code);
}
}
And the main method will look like this:
public class Main
{
public static void main(String... args)
{
CodeType2.doQuery("1");
CodeType2.doQuery("4");
CodeType2.doQuery("9");
}
}

So, the strategy pattern is indeed the right choice here, but strategy by itself is not enough. You have several options:
Use a Factory with simple if/else or switch. It's ugly and error-prone to extend with new strategies, but it is simple and quick to implement.
Use a registry. During the application initialization phase you can register each SearchQueryStrategyFactory in a registry under the right code. For instance, if you use a simple Map you can just do:
strategyRegistry.put("isbn", new IsbnSearchStrategyFactory());
strategyRegistry.put("ean", new EanSearchStrategyFactory());
.... and so on
Then when you need the right strategy you just get() the strategy factory from the map using the code id. This approach is better if you have a lot of strategies, but it requires an additional initialization step during application startup.
Use a service locator. ServiceLocator is a pattern that enables the dynamic lookup of implementations. Java comes with an implementation of the ServiceLocator pattern -> the infamous ServiceLoader class. This is my favourite approach because it allows for complete decoupling of the consumer and implementation. Also using the service locator you can easily add new strategies without having to modify the existing code. I won't explain how to use the ServiceLoader - there is plenty of information online. I'll just mention that using the service locator you'll need to implement a "can process such codes ?" logic in each strategy factory. For instance if the factory cannot create a strategy for "isbn" then return null and try with the next factory.
Also note that in all cases you work with factories that produce the strategy implementations.
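For the service locator option, a minimal sketch with java.util.ServiceLoader could look like this. The interface SearchQueryFactory and its methods are hypothetical, and the sketch uses a boolean supports() check rather than returning null from the factory; both are just ways of asking each provider whether it can process the code.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical service interface; each provider decides whether it understands a code.
interface SearchQueryFactory {
    boolean supports(String code);
    String createQuery(String code);
}

class SearchQueryLocator {
    // Returns the query built by the first provider that supports the code.
    static Optional<String> createQuery(Iterable<SearchQueryFactory> factories, String code) {
        for (SearchQueryFactory factory : factories) {
            if (factory.supports(code)) {
                return Optional.of(factory.createQuery(code));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Providers are discovered through META-INF/services/<interface name> files
        // (or module declarations). With none registered, nothing is found and
        // the lookup returns an empty Optional.
        ServiceLoader<SearchQueryFactory> loader = ServiceLoader.load(SearchQueryFactory.class);
        System.out.println(createQuery(loader, "1144875X"));
    }
}
```

Because ServiceLoader is an Iterable, the lookup method takes an Iterable, which also makes it easy to pass a plain list of factories in tests.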
PS: It's strategy not strategie :)

Your approach is not the Strategy Pattern. The Strategy Pattern is all about customizing the behavior of an object (the Context, in terms of this pattern) by passing an alternative Strategy object to it. This way, we don't need to modify the source code of the Context class but can still customize the behavior of objects instantiated from it.
Your problem is somewhat related to the Chain of Responsibility (CoR) Pattern where you have a request (your code) and need to figure out which SearchQueryStrategie in a predefined list should handle the request.
The second approach -- using an array -- that you mentioned is fine. However, to make it usable in production code, you must have another object -- let's say Manager -- that manages the array and is responsible for finding the relevant element for each request. So your client code has to depend on two objects: the Manager and the resulting SearchQueryStrategie. As you can see, the source code of the Manager class tends to change frequently because new implementations of SearchQueryStrategie may be added. This might annoy your clients.
That's why the CoR Pattern uses a linked list mechanism instead of an array. Each SearchQueryStrategie object A holds a reference to a next SearchQueryStrategie B. If A cannot handle the request, it delegates to B (it can even decorate the request before delegating). Of course, something must still know all the kinds of strategies and create the linked list of SearchQueryStrategie objects, but your client will then depend only on a single SearchQueryStrategie object (the head of the list).
Here is the code example:
class SearchQueryConsumer {
public void consume(SearchQuery sq) {
// ...
}
}
abstract class SearchQueryHandler {
protected SearchQueryHandler next = null;
public void setNext(SearchQueryHandler next) { this.next = next; }
public abstract void handle(String code, SearchQueryConsumer consumer);
}
class IssnSearchQueryHandler extends SearchQueryHandler {
@Override
public void handle(String code, SearchQueryConsumer consumer) {
if (issn(code)) {
consumer.consume(/* create a SearchQuery */);
} else if (next != null) {
next.handle(code, consumer);
}
}
private boolean issn(String code) { ... }
}
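The wiring of such a chain is not shown above, so here is a compact, self-contained sketch of it. It is a simplified variant, not the classes above: a Consumer<String> stands in for SearchQueryConsumer, the query is a plain String, and the regular expressions are rough format checks chosen only for illustration.

```java
import java.util.function.Consumer;

abstract class CodeHandler {
    private CodeHandler next;

    // Returns the argument so calls can be chained: a.setNext(b).setNext(c).
    CodeHandler setNext(CodeHandler next) {
        this.next = next;
        return next;
    }

    // Handles the code here if it matches, otherwise passes it down the chain.
    final void handle(String code, Consumer<String> consumer) {
        if (matches(code)) {
            consumer.accept(queryFor(code));
        } else if (next != null) {
            next.handle(code, consumer);
        }
    }

    abstract boolean matches(String code);
    abstract String queryFor(String code);
}

class IssnHandler extends CodeHandler {
    boolean matches(String code) { return code.matches("\\d{7}[\\dX]"); }
    String queryFor(String code) { return "issn=" + code; }
}

class Isbn13Handler extends CodeHandler {
    boolean matches(String code) { return code.matches("97[89]\\d{10}"); }
    String queryFor(String code) { return "isbn=" + code; }
}

class ChainDemo {
    // The only place that knows every handler; clients see just the head.
    static CodeHandler buildChain() {
        CodeHandler head = new IssnHandler();
        head.setNext(new Isbn13Handler());
        return head;
    }

    public static void main(String[] args) {
        buildChain().handle("1144875X", System.out::println);
        buildChain().handle("9780131103627", System.out::println);
    }
}
```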

What I recommend is using the Factory pattern. It describes and handles your scenario better.
Factory Pattern

You can design it in the following way (using the Factory design pattern and polymorphism):
1. Code as an interface.
2. ISSNCode, ISBNCode and EANCode as concrete classes implementing the Code interface, each with a single-argument constructor taking the text as a String.
3. Code has a method getInstanceOfCodeType(String text) which returns an instance of a subclass of Code (decided by checking the type of the text passed to it). Let's call the returned value code.
4. A class SearchQueryStrategieFactory with a getSearchQueryStrategie(code) method. It consumes the value returned in step 3, creates an instance of the matching SearchQueryStrategie subclass using the new operator, and returns it.
So you need to call two methods, getInstanceOfCodeType(text) and getSearchQueryStrategie(code), from wherever you need a strategy.
Instead of implicitly implementing the factory inside main, keep the whole factory code separate to make it easy to maintain and extend.
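The four steps above can be sketched roughly as follows. Only two code types are shown, the regular expressions are simplified placeholders, and the strategy takes a Code and returns a plain String query; those details are assumptions made to keep the example short.

```java
interface Code {
    String text();

    // Step 3: pick the concrete type by inspecting the text (simplified checks).
    static Code getInstanceOfCodeType(String text) {
        if (text.matches("\\d{7}[\\dX]")) {
            return new ISSNCode(text);
        }
        if (text.matches("\\d{13}")) {
            return new EANCode(text);
        }
        throw new IllegalArgumentException("Unknown code type: " + text);
    }
}

final class ISSNCode implements Code {
    private final String text;
    ISSNCode(String text) { this.text = text; }
    public String text() { return text; }
}

final class EANCode implements Code {
    private final String text;
    EANCode(String text) { this.text = text; }
    public String text() { return text; }
}

interface SearchQueryStrategie {
    String createSearchQuery(Code code);
}

class SearchQueryStrategieFactory {
    // Step 4: map the code's type to the strategy that knows how to query it.
    static SearchQueryStrategie getSearchQueryStrategie(Code code) {
        if (code instanceof ISSNCode) {
            return c -> "issn:" + c.text();
        }
        if (code instanceof EANCode) {
            return c -> "ean:" + c.text();
        }
        throw new IllegalArgumentException("No strategy for " + code.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        Code code = Code.getInstanceOfCodeType("1144875X");
        System.out.println(getSearchQueryStrategie(code).createSearchQuery(code));
    }
}
```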

How to improve the code quality to see if a string matches either one of the regex's Java

In one of my projects I need to compare a URI against several regex patterns (15+). Currently I use an if ladder to see whether any of them matches, and then the logical part of the code is executed.
Glimpse of the code now:
if (uri.matches(Constants.GET_ALL_APIS_STORE_REGEX)) {
long lastUpdatedTime = InBoundETagManager.apisGet(null, null, tenantDomain, null);
String eTag = ETagGenerator.getETag(lastUpdatedTime);
if (eTag.equals(ifNoneMatch)) {
message.getExchange().put("ETag", eTag);
generate304NotModifiedResponse(message);
}
message.getExchange().put("ETag", eTag);
}
else if (uri.matches(Constants.GET_API_FOR_ID_REGEX)) { // /apis/{apiId}
apiId = UUIDList.get(0);
String requestedTenantDomain = RestApiUtil.getRequestedTenantDomain(tenantDomain);
long lastUpdatedTime = InBoundETagManager.apisApiIdGet(apiId, requestedTenantDomain, uri);
String eTag = ETagGenerator.getETag(lastUpdatedTime);
handleInterceptorResponse(message, ifNoneMatch, eTag);
}
else if (uri.matches(Constants.GET_SWAGGER_FOR_API_ID_REGEX)) { // /apis/{apiId}/swagger
apiId = UUIDList.get(0);
long lastUpdatedTime = InBoundETagManager.apisApiIdSwaggerGet(apiId, tenantDomain);
String eTag = ETagGenerator.getETag(lastUpdatedTime);
if (lastUpdatedTime == 0L) {
log.info("No last updated time available for the desired API swagger json file");
}
handleInterceptorResponse(message, ifNoneMatch, eTag);
}
Can someone please show me a neater and cleverer way of doing this regex matching?

One URL type (regex) = one handler = one class. This is much easier to read and maintain, especially if you have 15 regex checks.
interface URLHandler {
void handle();
boolean isSupported(String url);
}
class GetAllApisStoreHandler implements URLHandler{
private static final Pattern GET_ALL_API_STORE_PATTERN = Pattern.compile(GET_ALL_APIS_STORE_REGEX);
public boolean isSupported(String url) {
return GET_ALL_API_STORE_PATTERN.matcher(url).matches();
}
public void handle(...) {
...
}
}
class GetApiIdHandler implements URLHandler{
private static final Pattern GET_API_ID_PATTERN = Pattern.compile(GET_API_ID_REGEX);
public boolean isSupported(String url) {
return GET_API_ID_PATTERN.matcher(url).matches();
}
public void handle(...) {
...
}
}
class GetSwaggerForApiIdHandler implements URLHandler{
private static final Pattern GET_SWAGGER_FORAPI_ID_PATTERN = Pattern.compile(GET_SWAGGER_FOR_API_ID_REGEX);
public boolean isSupported(String url) {
return GET_SWAGGER_FORAPI_ID_PATTERN.matcher(url).matches();
}
public void handle(...) {
...
}
}
class Main {
private List<URLHandler> urlHandlers;
public void method(){
...
for (URLHandler handler : urlHandlers) {
if(handler.isSupported(url)) {
handler.handle(arg1, arg2, arg3, ...);
}
}
...
}
}

Using multiple classes as @KonstantinLabun proposed is probably the way to go(*), but it shouldn't lead to much code duplication. So use an abstract class instead of (or in addition to) an interface. Or (mis)use default methods.
abstract class URLHandler {
abstract void handle();
abstract Pattern urlPattern();
final boolean isSupported(String url) {
return urlPattern().matcher(url).matches();
}
}
class GetAllApisStoreHandler extends URLHandler{
private static final Pattern URL_PATTERN =
Pattern.compile(Constants.GET_ALL_APIS_STORE_REGEX);
Pattern urlPattern() {
return URL_PATTERN;
}
public void handle(...) {
...
}
}
There's no need to invent names for the PATTERN as its scope identifies it already. The static field exists only as an optimization, so that the Pattern doesn't get compiled for each match.
(*) There's nothing wrong with a single class, as long as it's concise (I like spaghetti except in code) and doesn't leak implementation details. There's nothing wrong with multiple classes (except maybe on Android as 50 kB per class might matter) as long as they don't lead to code bloat. An enum is sometimes a good solution, too.
Explanation of abstract class vs. interface
An interface forces you to implement its methods(**), which may quickly lead to duplication. Its advantages are multiple inheritance and some conceptual purity.
An abstract class allows you to gather the common parts. But there's no dilemma, you can do both, see e.g., interface List and abstract class AbstractList.
(**) Since Java 8, an interface can have default methods, so this is no longer true, assuming you want to use them for this purpose. An interface can't declare any state, but its default methods can access the state of the object through its other methods. For example, my above URLHandler could be such an interface. There are still disadvantages, e.g., methods must be public and mustn't be final.
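As a sketch of that last point, URLHandler written as an interface with a default method might look like this. The regex "/apis/?" is a placeholder and the handle(String) signature is an assumption, since the original elides the handler arguments.

```java
import java.util.regex.Pattern;

interface URLHandler {
    Pattern urlPattern();

    void handle(String url);

    // Shared logic lives in a default method; it reaches the implementor's
    // state only through urlPattern(), because interfaces hold no fields.
    default boolean isSupported(String url) {
        return urlPattern().matcher(url).matches();
    }
}

class GetAllApisStoreHandler implements URLHandler {
    // Placeholder pattern; compiled once per class rather than per match.
    private static final Pattern URL_PATTERN = Pattern.compile("/apis/?");

    public Pattern urlPattern() {
        return URL_PATTERN;
    }

    public void handle(String url) {
        System.out.println("handling " + url);
    }

    public static void main(String[] args) {
        URLHandler handler = new GetAllApisStoreHandler();
        if (handler.isSupported("/apis")) {
            handler.handle("/apis");
        }
    }
}
```

Note that implementors could override isSupported, which is exactly the "mustn't be final" drawback mentioned above.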

creating objects and polymorphism

I want to avoid using tagged classes and big if-else blocks or switch statement and use polymorphism with a class hierarchy instead, which I believe is better practice.
For example, something like the below, where the choice of executed method depends only on one field of an object of type Actor.
switch(actor.getTagField())
{
case 1: actor.act1(); break;
case 2: actor.act2(); break;
[...]
}
would become
actor.act();
and the act method would be overridden in subclasses of Actor.
However, the most obvious way to decide at runtime which subclass to instantiate looks awfully similar to the original:
Actor newActor(int type)
{
switch(type)
{
case 1: return new Actor1();
case 2: return new Actor2();
[...]
}
}
so it seems like nothing has really been gained; the logic has just been moved.
What is a better way to do this? The only way I can come up with involves implementing a factory class for each subclass of Actor, but this seems rather cumbersome for such a simple problem.
Am I overthinking this? It just seems like there's no point making the original change if I just do pretty much the same thing elsewhere.

The question is whether you need a factory at all. A factory is meant to manage the creation of instances and not so much the behavior of related instances.
Otherwise, you're just looking at basic inheritance. Something like..
class Actor{
public void act(){
System.out.println("I act..");
}
}
class StuntActor extends Actor {
public void act(){
System.out.println("I do fancy stunts..");
}
}
class VoiceActor extends Actor {
public void act(){
System.out.println("I make funny noises..");
}
}
To use it, you can just instantiate the type of actor you need directly.
Actor fred = new Actor();
Actor tom = new VoiceActor();
Actor sally = new StuntActor();
fred.act();
tom.act();
sally.act();
Output:
I act..
I make funny noises..
I do fancy stunts..
EDIT:
If you need to centralize the creation of the Actors, i.e. via a Factory, you will not be able to get away from some kind of switching logic. In that case I'll typically use an enumeration for readability:
public class Actor{
public enum Type{ REGULAR, VOICE, STUNT }
public static Actor Create(Actor.Type type){
switch(type) {
case VOICE:
return new VoiceActor();
case STUNT:
return new StuntActor();
case REGULAR:
default:
return new Actor();
}
}
public void act(){
System.out.println("I act..");
}
}
Usage:
Actor some_actor = Actor.Create(Actor.Type.VOICE);
some_actor.act();
Output:
I make funny noises..

Switch statements aren't pure evil. It's really duplication that you're looking to eliminate with better design. Often times you'll find the same switch statement show up in different (far away) places in your code - not necessarily doing the same thing, but switching on the same data. By introducing polymorphism, you pull those switches together as different methods of the same object.
This does two things: first, it reduces several switches to one switch inside a factory; second, it pulls together spread-out logic that probably depends on similar data. That data will turn into member variables in your objects.
It's also worth noting that you don't always end up with a switch statement under the hood of your factory. Maybe you could scan the classpath at startup and build a HashMap of types that implement an interface. For example, consider an implementation of a socket protocol like SMTP. You could have objects named HeloCommand, MailFromCommand, etc... and find the right object to handle the message by matching the socket command to the class name.
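A minimal sketch of a switch-free factory is shown below. Classpath scanning is beyond a short example, so explicit registration in a HashMap stands in for it here; CommandRegistry, the Command interface, and the reply strings are all hypothetical and not meant to reflect real SMTP responses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

interface Command {
    String execute(String argument);
}

class HeloCommand implements Command {
    public String execute(String argument) {
        return "handled HELO for " + argument;
    }
}

class MailFromCommand implements Command {
    public String execute(String argument) {
        return "handled MAIL FROM for " + argument;
    }
}

class CommandRegistry {
    private final Map<String, Supplier<Command>> suppliers = new HashMap<>();

    void register(String name, Supplier<Command> supplier) {
        suppliers.put(name, supplier);
    }

    // Looks up the constructor by name; no switch statement involved.
    Command create(String name) {
        Supplier<Command> supplier = suppliers.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown command: " + name);
        }
        return supplier.get();
    }

    // Explicit registration stands in for scanning the classpath at startup.
    static CommandRegistry smtp() {
        CommandRegistry registry = new CommandRegistry();
        registry.register("HELO", HeloCommand::new);
        registry.register("MAIL FROM", MailFromCommand::new);
        return registry;
    }

    public static void main(String[] args) {
        System.out.println(smtp().create("HELO").execute("client.example"));
    }
}
```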

I believe that you can do it with Abstract factory pattern...
This is an example:
abstract class Computer {
public abstract Parts getRAM();
public abstract Parts getProcessor();
public abstract Parts getMonitor();
}
class Parts {
public String specification;
public Parts(String specification) {
this.specification = specification;
}
public String getSpecification() {
return specification;
}
}
We have two classes that extend Computer:
class PC extends Computer {
public Parts getRAM() {
return new Parts("512 MB");
}
public Parts getProcessor() {
return new Parts("Celeron");
}
public Parts getMonitor() {
return new Parts("15 inches");
}
}
class Workstation extends Computer {
public Parts getRAM() {
return new Parts("1 GB");
}
public Parts getProcessor() {
return new Parts("Intel P 3");
}
public Parts getMonitor() {
return new Parts("19 inches");
}
}
And finally we have,
public class ComputerType {
private Computer comp;
public static void main(String[] args) {
ComputerType type = new ComputerType();
Computer computer = type.getComputer("Workstation");
System.out.println("Monitor: "+computer.getMonitor().getSpecification());
System.out.println("RAM: "+computer.getRAM().getSpecification());
System.out.println("Processor: "+computer.getProcessor().getSpecification());
}
public Computer getComputer(String computerType) {
if (computerType.equals("PC"))
comp = new PC();
else if(computerType.equals("Workstation"))
comp = new Workstation();
return comp;
}
}

Persistent data structures in Java

Does anyone know of a library, or at least some research, on creating and using persistent data structures in Java? I don't mean persistence as long-term storage but persistence in terms of immutability (see Wikipedia entry).
I'm currently exploring different ways to model an API for persistent structures. Using builders seems to be an interesting solution:
// create persistent instance
Person p = Builder.create(Person.class)
.withName("Joe")
.withAddress(Builder.create(Address.class)
.withCity("paris")
.build())
.build();
// change persistent instance, i.e. create a new one
Person p2 = Builder.update(p).withName("Jack").build();
Person p3 = Builder.update(p)
.withAddress(Builder.update(p.address())
.withCity("Berlin")
.build())
.build();
But this still feels somewhat boilerplated. Any ideas?

Builders will make your code too verbose to be usable. In practice, almost all immutable data structures I've seen pass in state through the constructor. For what it's worth, here is a nice series of posts describing immutable data structures in C# (which should convert readily into Java):
Part 1: Kinds of Immutability
Part 2: Simple Immutable Stack
Part 3: Covariant Immutable Stack
Part 4: Immutable Queue
Part 5: Lolz! (included for completeness)
Part 6: Simple Binary Tree
Part 7: More on Binary Trees
Part 8: Even More on Binary Trees
Part 9: AVL Tree Implementation
Part 10: Double-ended Queue
Part 11: Working Double-ended Queue Implementation
C# and Java are extremely verbose, so the code in these articles is quite scary. I recommend learning OCaml, F#, or Scala and familiarizing yourself with immutability in those languages. Once you master the technique, you'll be able to apply the same coding style to Java much more easily.
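To make the "pass state in through the constructor" idea concrete, here is a minimal sketch of a persistent stack in Java. It is not taken from the linked articles; the class name PStack and its methods are made up for illustration. The point it shows is that push and pop return new stacks while sharing the existing nodes instead of copying them.

```java
import java.util.NoSuchElementException;

/** A minimal persistent (immutable) stack. All names here are illustrative. */
final class PStack<T> {
    private static final PStack<Object> EMPTY = new PStack<Object>(null, null, 0);

    private final T head;          // top element (unused for the empty stack)
    private final PStack<T> tail;  // rest of the stack, shared rather than copied
    private final int size;

    // All state arrives through the constructor and never changes afterwards.
    private PStack(T head, PStack<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    @SuppressWarnings("unchecked") // the empty stack holds no elements, so the cast is safe
    static <T> PStack<T> empty() {
        return (PStack<T>) EMPTY;
    }

    /** Returns a new stack; the receiver is untouched and becomes the tail. */
    PStack<T> push(T value) {
        return new PStack<T>(value, this, size + 1);
    }

    /** Returns the stack without its top element; nothing is copied. */
    PStack<T> pop() {
        if (isEmpty()) throw new NoSuchElementException("pop on empty stack");
        return tail;
    }

    T peek() {
        if (isEmpty()) throw new NoSuchElementException("peek on empty stack");
        return head;
    }

    boolean isEmpty() { return size == 0; }

    int size() { return size; }
}

class PStackDemo {
    public static void main(String[] args) {
        PStack<String> s1 = PStack.<String>empty().push("a").push("b");
        PStack<String> s2 = s1.push("c");   // s1 is still a two-element stack
        System.out.println(s1.peek() + " " + s1.size()); // prints "b 2"
        System.out.println(s2.peek() + " " + s2.size()); // prints "c 3"
        System.out.println(s2.pop() == s1);              // prints "true": the tail is shared
    }
}
```

Because every node is immutable, handing out the same tail to several stacks is safe, which is why no defensive copying is needed.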

I guess the obvious choices are:
o Switch to a transient data structure (builder) for the update. This is quite normal: StringBuilder is used for String manipulation, for example. This is what your example does.
Person p3 =
Builder.update(p)
.withAddress(
Builder.update(p.address())
.withCity("Berlin")
.build()
)
.build();
o Always use persistent structures. Although there appears to be lots of copying, you should actually be sharing almost all state, so it is nowhere near as bad as it looks.
final Person p3 = p
.withAddress(
p.address().withCity("Berlin")
);
o Explode the data structure into lots of variables and recombine with one huge and confusing constructor.
final Person p3 = Person.of(
p.name(),
Address.of(
p.house(), p.street(), "Berlin", p.country()
),
p.x(),
p.y(),
p.z()
);
o Use callback interfaces to provide the new data. Even more boilerplate.
final Person p3 = Person.of(new PersonInfo() {
public String name () { return p.name(); }
public Address address() { return Address.of(new AddressInfo() {
private final Address a = p.address();
public String house () { return a.house() ; }
public String street () { return a.street() ; }
public String city () { return "Berlin" ; }
public String country() { return a.country(); }
}); }
public Xxx x() { return p.x(); }
public Yyy y() { return p.y(); }
public Zzz z() { return p.z(); }
});
o Use nasty hacks to make fields transiently available to code.
final Person p3 = new PersonExploder(p) {{
a = new AddressExploder(a) {{
city = "Berlin";
}}.get();
}}.get();
(Funnily enough, I have just put down a copy of Purely Functional Data Structures by Chris Okasaki.)
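The "always use persistent structures" option in the list above can be sketched as follows. This is only an illustration: the fields chosen for Person and Address and the withXxx method names are assumptions, not a fixed API. Each withXxx call builds one new object and reuses every field it does not change.

```java
/** Hypothetical immutable value classes; field and method names are assumptions. */
final class Address {
    private final String city;
    private final String country;

    Address(String city, String country) {
        this.city = city;
        this.country = country;
    }

    String city() { return city; }
    String country() { return country; }

    // Returns a copy that differs only in the city; the country String is reused.
    Address withCity(String newCity) {
        return new Address(newCity, country);
    }
}

final class Person {
    private final String name;
    private final Address address;

    Person(String name, Address address) {
        this.name = name;
        this.address = address;
    }

    String name() { return name; }
    Address address() { return address; }

    // The existing Address instance is shared, not copied.
    Person withName(String newName) {
        return new Person(newName, address);
    }

    Person withAddress(Address newAddress) {
        return new Person(name, newAddress);
    }
}

class WithDemo {
    public static void main(String[] args) {
        Person p = new Person("Joe", new Address("Paris", "France"));
        Person p2 = p.withName("Jack");
        Person p3 = p.withAddress(p.address().withCity("Berlin"));

        System.out.println(p.address().city());          // prints "Paris": p is unchanged
        System.out.println(p3.address().city());         // prints "Berlin"
        System.out.println(p2.address() == p.address()); // prints "true": state is shared
    }
}
```

The cost of an update is one small object per level of nesting that changes, which is the sharing the answer refers to.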

Have a look at Functional Java. Currently provided persistent data structures include:
Singly-linked list (fj.data.List)
Lazy singly-linked list (fj.data.Stream)
Nonempty list (fj.data.NonEmptyList)
Optional value (a container of length 0 or 1) (fj.data.Option)
Set (fj.data.Set)
Multi-way tree (a.k.a. rose tree) (fj.data.Tree)
Immutable map (fj.data.TreeMap)
Products (tuples) of arity 1-8 (fj.P1..P8)
Vectors of arity 2-8 (fj.data.vector.V2..V8)
Pointed list (fj.data.Zipper)
Pointed tree (fj.data.TreeZipper)
Type-safe, generic heterogeneous list (fj.data.hlist.HList)
Immutable arrays (fj.data.Array)
Disjoint union datatype (fj.data.Either)
A number of usage examples are provided with the binary distribution. The source is available under a BSD license from Google Code.

I implemented a few persistent data structures in Java. They are all open source (GPL) on Google Code for anyone who is interested:
http://code.google.com/p/mikeralib/source/browse/#svn/trunk/Mikera/src/mikera/persistent
The main ones I have so far are:
Persistent mutable test object
Persistent hash maps
Persistent vectors/lists
Persistent sets (including a specialised persistent set of ints)

Here is a very simple attempt using a dynamic proxy:
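import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

class ImmutableBuilder {
@SuppressWarnings("unchecked") // the proxy implements targetClass, so the cast is safe
static <T> T of(Immutable immutable) {
Class<?> targetClass = immutable.getTargetClass();
return (T) Proxy.newProxyInstance(targetClass.getClassLoader(),
new Class<?>[]{targetClass},
immutable);
}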
public static <T> T of(Class<T> aClass) {
return of(new Immutable(aClass, new HashMap<String, Object>()));
}
}
class Immutable implements InvocationHandler {
private final Class<?> targetClass;
private final Map<String, Object> fields;
public Immutable(Class<?> aTargetClass, Map<String, Object> immutableFields) {
targetClass = aTargetClass;
fields = immutableFields;
}
public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
if (method.getName().equals("toString")) {
// XXX: toString() result can be cached
return fields.toString();
}
if (method.getName().equals("hashCode")) {
// XXX: hashCode() result can be cached
return fields.hashCode();
}
// XXX: naming policy here
String fieldName = method.getName();
if (method.getReturnType().equals(targetClass)) {
Map<String, Object> newFields = new HashMap<String, Object>(fields);
newFields.put(fieldName, args[0]);
return ImmutableBuilder.of(new Immutable(targetClass, newFields));
} else {
return fields.get(fieldName);
}
}
public Class<?> getTargetClass() {
return targetClass;
}
}
usage:
interface Person {
String name();
Person name(String name);
int age();
Person age(int age);
}
public class Main {
public static void main(String[] args) {
Person mark = ImmutableBuilder.of(Person.class).name("mark").age(32);
Person john = mark.name("john").age(24);
System.out.println(mark);
System.out.println(john);
}
}
Possible directions for growth:
naming policy (getName, withName, name)
caching toString(), hashCode()
equals() implementations should be straightforward (although not implemented)
hope it helps :)
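As a sketch of the equals() idea mentioned in the list above, here is one way the proxy approach could compare two instances. This is a self-contained variant, not the code above: RecordHandler, create, and the Pet interface are hypothetical names, and the rule "two proxies are equal when they have the same interface and the same field map" is an assumption.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical handler: an immutable record backed by a map, with equals() support. */
final class RecordHandler implements InvocationHandler {
    private final Class<?> type;
    private final Map<String, Object> fields;

    private RecordHandler(Class<?> type, Map<String, Object> fields) {
        this.type = type;
        this.fields = fields;
    }

    static <T> T create(Class<T> type) {
        return type.cast(newProxy(type, new HashMap<String, Object>()));
    }

    private static Object newProxy(Class<?> type, Map<String, Object> fields) {
        return Proxy.newProxyInstance(type.getClassLoader(),
                new Class<?>[] { type }, new RecordHandler(type, fields));
    }

    public Object invoke(Object proxy, Method method, Object[] args) {
        String name = method.getName();
        int argCount = (args == null) ? 0 : args.length; // args is null for no-arg calls
        if (name.equals("equals") && argCount == 1) return isEqualTo(args[0]);
        if (name.equals("hashCode") && argCount == 0) return fields.hashCode();
        if (name.equals("toString") && argCount == 0) return fields.toString();
        if (method.getReturnType().equals(type) && argCount == 1) {
            // "Setter": copy the map, change one entry, wrap the copy in a new proxy.
            Map<String, Object> copy = new HashMap<String, Object>(fields);
            copy.put(name, args[0]);
            return newProxy(type, copy);
        }
        return fields.get(name); // "getter"
    }

    // Equal when the other object is a proxy of the same interface with equal fields.
    private boolean isEqualTo(Object other) {
        if (other == null || !Proxy.isProxyClass(other.getClass())) return false;
        InvocationHandler handler = Proxy.getInvocationHandler(other);
        if (!(handler instanceof RecordHandler)) return false;
        RecordHandler that = (RecordHandler) handler;
        return type.equals(that.type) && fields.equals(that.fields);
    }
}

interface Pet {
    String name();
    Pet name(String name);
}

class RecordDemo {
    public static void main(String[] args) {
        Pet a = RecordHandler.create(Pet.class).name("rex");
        Pet b = RecordHandler.create(Pet.class).name("rex");
        System.out.println(a.equals(b));             // prints "true"
        System.out.println(a.equals(a.name("max"))); // prints "false"
    }
}
```

Since hashCode() is derived from the same field map, equal proxies also produce equal hash codes, so they can be used as keys in hash-based collections.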

It is very difficult, if not impossible, to make things immutable that weren't designed to be.
If you can design from the ground up:
use only final fields
do not reference mutable objects
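A small sketch of those two rules, assuming a hypothetical Team class: every field is final, and the one field that could refer to a mutable object (a list) is copied and wrapped on the way in, so the caller's list can never change the instance afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable class: final fields, no references to mutable state. */
final class Team {
    private final String name;
    private final List<String> members;

    Team(String name, List<String> members) {
        this.name = name;
        // Copy the caller's list, then wrap the copy so it cannot be modified.
        this.members = Collections.unmodifiableList(new ArrayList<String>(members));
    }

    String name() { return name; }

    // Safe to hand out: the list is read-only and its elements (Strings) are immutable.
    List<String> members() { return members; }
}

class TeamDemo {
    public static void main(String[] args) {
        List<String> source = new ArrayList<String>();
        source.add("Ann");
        Team team = new Team("core", source);

        source.add("Bob"); // changes the caller's list only
        System.out.println(team.members().size()); // prints "1"

        try {
            team.members().add("Eve");
        } catch (UnsupportedOperationException e) {
            System.out.println("members list is read-only");
        }
    }
}
```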

Do you want immutability:
so external code cannot change the data?
so once set a value cannot be changed?
In both cases there are easier ways to accomplish the desired result.
Stopping external code from changing the data is easy with interfaces:
public interface Person {
String getName();
Address getAddress();
}
public interface PersonImplementor extends Person {
void setName(String name);
void setAddress(Address address);
}
public interface Address {
String getCity();
}
public interface AddressImplementor extends Address {
void setCity(String city);
}
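Stopping changes to a value once it has been set is also "easy" using java.util.concurrent.atomic.AtomicReference (although usage of Hibernate or some other persistence layer may need to be modified):
class PersonImpl implements PersonImplementor {
// the references must be initialised, otherwise setName would throw a NullPointerException
private final AtomicReference<String> name = new AtomicReference<String>();
private final AtomicReference<Address> address = new AtomicReference<Address>();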
public void setName(String name) {
if ( !this.name.compareAndSet(name, name)
&& !this.name.compareAndSet(null, name)) {
throw new IllegalStateException("name already set to "+this.name.get()+" cannot set to "+name);
}
}
// .. similar code follows....
}
But why do you need anything more than just interfaces to accomplish the task?
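The set-once idea can also be pulled out into a small reusable holder. This is a sketch under assumptions: SetOnce is a made-up name, and unlike the setName code above it rejects every second call, even one that passes the same value again.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical write-once holder built on AtomicReference.compareAndSet. */
final class SetOnce<T> {
    private final AtomicReference<T> ref = new AtomicReference<T>();

    /** Stores the value if nothing has been stored yet; otherwise throws. */
    void set(T value) {
        if (value == null) throw new NullPointerException("value must not be null");
        // Succeeds only while the reference is still null, so at most one set() wins.
        if (!ref.compareAndSet(null, value)) {
            throw new IllegalStateException("already set to " + ref.get());
        }
    }

    /** Returns the stored value, or null if set() has not been called yet. */
    T get() {
        return ref.get();
    }
}

class SetOnceDemo {
    public static void main(String[] args) {
        SetOnce<String> name = new SetOnce<String>();
        name.set("Joe");
        System.out.println(name.get()); // prints "Joe"
        try {
            name.set("Jack");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "already set to Joe"
        }
    }
}
```

Because compareAndSet is atomic, two threads racing to call set() cannot both succeed.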

Google Guava now hosts a variety of immutable/persistent data structures.

