Regarding two lines of java code - java

I am trying to learn a java-based program, but I am pretty new to java. I am quite confusing on the following two lines of java code. I think my confusion comes from the concepts including “class” and “cast”, but just do not know how to analyze.
For this one
XValidatingObjectCorpus<Classified<CharSequence>> corpus
= new XValidatingObjectCorpus<Classified<CharSequence>>(numFolds);
What is <Classified<CharSequence>> used for in terms of Java programming? How to understand its relationships with XValidatingObjectCorpusand corpus
For the second one
LogisticRegressionClassifier<CharSequence> classifier
= LogisticRegressionClassifier.<CharSequence>train(para1, para2, para3)
How to understand the right side of LogisticRegressionClassifier.<CharSequence>train? What is the difference between LogisticRegressionClassifier.<CharSequence>train and LogisticRegressionClassifier<CharSequence> classifier
?

These are called generics. They tell Java to make an instance of the outer class - either XValidatingObjectCorpus or LogisticRegressionClassifier - using the type of the inner object.
Normally, these are used for lists and arrays, such as ArrayList or HashMap.
What is the relationship between XValidatingObjectCorpus and corpus?
corpus is just a name given to the new XValidatingObjectCorpus object that you make with that statement (hence the = new... part).
What does LogisticRegressionClassifier.<CharSequence>train mean?
I have no idea, really. I suggest looking at the API for that (I think this is the right class).
What is the difference between LogisticRegressionClassifier.<CharSequence>train and LogisticRegressionClassifier<CharSequence> classifier?
You can't really compare these two. The one on the left of the = is the object identifier, and the one on the right is the allocator (probably the wrong word, but it is what it does, kind of).
Together, the two define an instance of LogisticRegressionClassifier, saying to create that type of object, call it classifier, and then give it the value returned by the train() method. Again, look at the API to understand it more.
By the way, these look like wretched examples to be learning Java with. Start with something simple, or at least an easier part of the code. It looks like someone had way too much fun with long names (the API has even longer names). Seriously though, I only just got to fully understanding this, and Java was my main language for quite a while (It gets really confusing when you try and do simple things). Anyways, good luck!

public class Sample<T> { // T implies Generic implementation, T can be substituted with any object.
static <T> Sample<T> train(int par1, int par2, int par3){
return new Sample<T>(); // you are calling the Generic method to return Sample object which works with a particular type of generic object, may it be an Integer or a CharSequence. --> see the main method.
}
public static void main(String ... a)
{
int par1 = 0, par2 = 0, par3 = 1;
// Here you are returning Sample object which works with a sequence of characters.
Sample<CharSequence> sample = Sample.<CharSequence>train(par1, par2, par3);
// Here you are returning Sample object which works with Integer values.
Sample<CharSequence> sample1 = Sample.<Integer>train(par1, par2, par3);
}
}

<Classified<CharSequence>> is a generic parameter.
LogisticRegressionClassifier<CharSequence> is a generic type.
LogisticRegresstionClassifier.<CharSequence>train is a generic method.
Java Generics Tutorial

Related

Implementing Guava's MurMurHash Properly

I'm a Junior Java Developer and I'm trying to start a small personal project in order to learn the proper ways to do things (in general). I started searching about hash() and while reading an article about the benefits of Guava, I stumbled upon MurMurHash and the example is very clear on it's website, but there is something missing that I didn't understand: Funnel.
The code goes like this:
HashFunction hf = Hashing.md5();
HashCode hc = hf.newHasher()
.putLong(id)
.putString(name, Charsets.UTF_8)
.putObject(person, personFunnel)
.hash();
but then I have to define a Funnel to decompose an object type into primitive field values, for which I have to
Funnel<Person> personFunnel = new Funnel<Person>() {
#Override
public void funnel(Person person, PrimitiveSink into) {
into
.putInt(person.id)
.putString(person.firstName, Charsets.UTF_8)
.putString(person.lastName, Charsets.UTF_8)
.putInt(birthYear);
}
};
Although I searched for more info about how to use this or info in general, there is no clear explanation about how Funnel works and/or how should I use it. Also I don't understand what PrimitiveSink is, so I don't know what kind of data should I send as a second parameter.
I would appreciate an explanation o guidance about this.
You don't actually have to use a Funnel for anything, but a Funnel is just an object that expresses how to convert a particular type to a sequence of primitives. There is no special magic.
Funnel<Person> personFunnel = new Funnel<Person>() {
#Override
public void funnel(Person person, PrimitiveSink into) {
into
.putInt(person.id)
.putString(person.firstName, Charsets.UTF_8)
.putString(person.lastName, Charsets.UTF_8)
.putInt(birthYear);
}
};
This is just an object that explains how to convert a Person to a sequence of primitives by putting them into a thing that knows how to receive primitives; the interface for things that know how to receive primitives is PrimitiveSink. Hasher is an example of a class that implements PrimitiveSink, and when you call hasher.putObject(object, funnelForObjectType), the API internally just does funnelForObjectType.funnel(object, hasher), and keeps going.
It's just a way of writing a converter from a particular object type to primitives, nothing more. You are never likely to call Funnel.funnel yourself; it's only there to be passed into putObject calls; you should never need to pass in your own PrimitiveSink.

Passing a parameter versus returning it from function

As it might be clear from the title which approach should we prefer?
Intention is to pass a few method parameters and get something as output. We can pass another parameter and method will update it and method need not to return anything now, method will just update output variable and it will be reflected to the caller.
I am just trying to frame the question through this example.
List<String> result = new ArrayList<String>();
for (int i = 0; i < SOME_NUMBER_N; i++) {
fun(SOME_COLLECTION.get(i), result);
}
// in some other class
public void fun(String s, List<String> result) {
// populates result
}
versus
List<String> result = new ArrayList<String>();
for (int i = 0; i < SOME_NUMBER_N; i++) {
List<String> subResult = fun(SOME_COLLECTION.get(i));
// merges subResult into result
mergeLists(result, subResult);
}
// in some other class
public List<String> fun(String s) {
List<String> res = new ArrayList<String>();
// some processing to populate res
return res;
}
I understand that one passes the reference and another doesn't.
Which one should we prefer (in different situations) and why?
Update: Consider it only for mutable objects.
Returning a value from the function is generally a cleaner way of writing code. Passing a value and modifying it is more C/C++ style due to the nature of creating and destroying pointers.
Developers generally don't expect that their values will be modified by passing it through a function, unless the function explicitly states it modifies the value (and we often skim documentation anyway).
There are exceptions though.
Consider the example of Collections.sort, which does actually do an in place sort of a list. Imagine a list of 1 million items and you are sorting that. Maybe you don't want to create a second list that has another 1 million entries (even though these entries are pointing back to the original).
It is also good practice to favor having immutable objects. Immutable objects cause far fewer problems in most aspects of development (such as threading). So by returning a new object, you are not forcing the parameter to be mutable.
The important part is to be clear about your intentions in the methods. My recommendation is to avoid modifying the parameter when possible since it not the most typical behavior in Java.
You should return it. The second example you provided is the way to go.
First of all, its more clear. When other people read your code, there's no gotcha that they might not notice that the parameter is being modified as output. You can try to name the variables, but when it comes to code readability, its preferable.
The BIG reason why you should return it rather than pass it, is with immutable objects.
Your example, the List, is mutable, so it works okay.
But if you were to try to use a String that way, it would not work.
As strings are immutable, if you pass a string in as a parameter, and then the function were to say:
public void fun(String result){
result = "new string";
}
The value of result that you passed in would not be altered. Instead, the local scope variable 'result' now points to a new string inside of fun, but the result in your calling method still points to the original string.
If you called:
String test = "test";
fun(test);
System.out.println(test);
It will print: "test", not "new string"!
So definitely, it is superior to return. :)
This is more about best practices and your own method to program. I would say if you know this is going to be a one value return type function like:
function IsThisNumberAPrimeNumber{ }
Then you know that this is only going to ever return a boolean. I usually use functions as helper programs and not as large sub procedures. I also apply naming conventions that help dictate what I expect the sub\function will return.
Examples:
GetUserDetailsRecords
GetUsersEmailAddress
IsEmailRegistered
If you look at those 3 names, you can tell the first is going to give you some list or class of multiple user detail records, the second will give you a string value of a email and the third will likely give you a boolean value. If you change the name, you change the meaning, so I would say consider this in addition.
The reason I don't think we understand is that those are two totally different types of actions. Passing a variable to a function is a means of giving a function data. Returning it from the function is a way of passing data out of a function.
If you mean the difference between these two actions:
public void doStuff(int change) {
change = change * 2;
}
and
public void doStuff() {
int change = changeStorage.acquireChange();
change = change * 2;
}
Then the second is generally cleaner, however there are several reasons (security, function visibilty, etc) that can prevent you from passing data this way.
It's also preferable because it makes reusing code easier, as well as making it more modular.
according to guys recommendation and java code convention and also syntax limitation this is a bad idea and makes code harder to understand
BUT you can do it by implementing a reference holder class
public class ReferenceHolder<T>{
public T value;
}
and pass an object of ReferenceHolder into method parameter to be filled or modified by method.
on the other side that method must assign its return into Reference value instead of returning it.
here is the code for getting result of an average method by a ReferenceHolder instead of function return.
public class ReferenceHolderTest {
public static void main(String[] args) {
ReferenceHolder<Double> out = new ReferenceHolder<>();
average(new int[]{1,2,3,4,5,6,7,8},out);
System.out.println(out.value);
}
public static void average(int[] x, ReferenceHolder<Double> out ) {
int sum=0;
for (int a : x) {
sum+=a;
}
out.value=sum/(double)x.length;
}
}
Returning it will keep your code cleaner and cause less coupling between methods/classes.
It is generally preferable to return it.
Specially from a unit testing standpoint. If you are unit testing it
is easier to assert a returned value from a method than verifying if
your object was modified or interacted correctly. (Using
ArgumentCaptor or ArgumentMatcher to assert interactions isn't as
straight forward as a simple return assertion).
Increased code readability. If I see a method that takes 5 object parameters I
might have no immediate way of knowing you plan on modifying one of
those references for future use downstream. Instead if you are returning an
object, I can easily see you ultimately care about the result of that
method's computation.

How does relying on type inference affect code maintainability?

Although the examples in this question are in Visual Basic.NET and Java, it's possible that the topic may also apply to other languages that implement type inference.
Up until now, I've only used type inference when using LINQ in VB.NET. However, I'm aware that type inference can be used in other parts of the language, as demonstrated below:
Dim i = 1 'i is inferred to be an Integer
Dim s = "Hoi" 's is inferred to be a string
Dim Temperatures(10) as Double
For T In Temperatures 'T is inferred to be a Double
'Do something
Next
I can see how type inference reduces the amount of typing I would need to write a piece of code and can also make it quicker to change the type of a base variable (such as Temperatures(10) above) as you wouldn't need to change the types of all other variables that access it (such as T). However, I was concerned that the lack of an explicit declaration of type might make the code harder to read, say during a code inspection, as it may not be immediately obvious what type a variable is. For example, by looking at just the For loop above, you might correclty assume that T is some floating point value, but wouldn't be able to tell if it's single- or double-precision without looking back at the declaration of Temperatures. Depending on how the code is written, Temperatures could be declared much earlier and would thus require the reader to go back and find the declaration and then resume reading. Admittedly, when using a good IDE, this is not so much of an issue, as a quick hover of the mouse cursor over the variable name reveals the type.
Also, I would imagine that using type inference in some situations could introduce some bugs when attempting to change the code. Consider the following (admittedly contrived) example, assuming that the classes A and B do completely different things.
Class A
Public Sub DoSomething()
End Sub
End Class
Class B
Public Sub DoSomething()
End Sub
End Class
Dim ObjectList(10) As A
For item In ObjectList
item.DoSomething()
End For
It is possible to change the type of ObjectList from A to B and the code would compile, but would function differently, which could manifest as a bug in other parts of the code.
With some examples of type inference, I would imagine that most people would agree that its use has no negative affect on readability or maintainability. For example, in Java 7, the following can be used:
ArrayList<String> a = new ArrayList<>(); // The constructor type is inferred from the declaration.
instead of
ArrayList<String> a = new ArrayList<String>();
I was curious to find out what people's thoughts are on this matter and if there are any recommendations on how extensively type inference should be used.
Amr
Personally I tend to use implicit type inference only in cases where the type is clear and where it makes the code easier to read
var dict = new Dictionary<string, List<TreeNode<int>>>();
is easier to read than
Dictionary<string, List<TreeNode<int>>> dict =
new Dictionary<string, List<TreeNode<int>>>();
yet the type is clear.
Here it is not really clear what you get and I would specify the type explicitly:
var something = GetSomeThing();
Here it makes no sense to use var since it does not simplify anything
int i = 0;
double x = 0.0;
string s = "hello";
I think in part it's a matter of coding style. I personally go with the the more verbose code given the choice. (Although when I move to a Java 7 project, I'll probably take advantage of the new syntax because everything is still there on one line.)
I also think that being explicit can avoid certain subtle bugs. Take your example:
For T In Temperatures 'T is inferred to be a Double
'Do something
Next
If Do something was a numerical method that relied on T being a Double, then changing Temperatures to single precision could cause an algorithm to fail to converge, or to converge to an incorrect value.
I imagine that I could argue the other way—that being explicit about variable types can in certain cases generate a lot of trivial maintenance work that could have been avoided. Again, I think it's all a matter of style.
The use of Generics is to ensure type safe checking at compile time instead of handling it on run time. If you are using type safe correctly it will save you from run time exceptions as well as writing casting codes at the time of data retrieval.
JAva Doesn't use this sentax prior to jdk 1.7
ArrayList<String> a = new ArrayList<>();
Its
ArrayList<String> a = new ArrayList();
But in java the use of
ArrayList<String> a = new ArrayList(); and
ArrayList<String> a = new ArrayList<String>();
Both consider same because the reference variable holds the complile time information of type safe which is string you can not add anything else instead of string in it i.e
a.add(new Integer(5));
Its still type safe and you will get error that you can only insert Strings.
So you can't do what you are expecting to do.

ImplementIon of eval() parser and 2d array in Java

I have really stuck on two Java related issues. One simple and one more difficult.
Regarding the creation of a 2D array, I initialize a table like this:
private String [][] table_of_classifiers = null;
and then, within a function, I fill its contents like this:
String [][] table_of_classifiers = {
{"x1","x","x","x","x"},
{"x2","x","x","x","x"},
{"x3","x","x","x","x"},
{"x4","x","x","x","x"},
{"x5","x","x","x","x"},
{"x6","x","x","x","x"},
};
But as you can guess the second table overwrites (locally) the first one, that is of course not what I want to do. What I do wrong? Note that the dimension of the table is not known from the beginning.
Regarding the creation of a 2D array, I initialize a table like this:
private String [][] table_of_classifiers = null;
Not really. This is the declaration and initialization of a variable that can point to a "2d array", "table" or more exact an "array of arrays" of Strings.
Unless you work with that fact that the variable can/will be null, initializing it to null is usually a bad idea, because you need to do extra work to check for null. Examples:
String[][] a;
// ...
String b = a[0][0];
This won't compile, unless a wasn't initialized in the mean time. This is a good thing, because you can avoid a potential bug.
String[][] a = null;
// ...
String b = a[0][0];
This will however will compile, and if you forgot to actually assign the variable a real array, the program will "crash" with a "null pointer exception" or you need to add additional code/work to check for null.
I fill its contents like this:
String [][] table_of_classifiers = {
{"x1","x","x","x","x"},
{"x2","x","x","x","x"},
{"x3","x","x","x","x"},
{"x4","x","x","x","x"},
{"x5","x","x","x","x"},
{"x6","x","x","x","x"},
};
You are not "filling" anything here. For something to be filled it must exist first, but you haven't created anything yet.
Here you are declaring a second variable of the same name, which is only possible if you are in a different scope that the first one, and in that case you are "hiding" ("shadowing") the original variable if it originally was accessible from this new scope.
But as you can guess the second table overwrites (locally) the first
one, that is of course not what I want to do. What I do wrong?
Which "first" table? There was no first table until now, only a first variable. The others have shown you what you need to do to assign the "table" to the original variable, by not using the "declaration" String[][] at the beginning of the line.
Otherwise it's impossible to say what you are "doing wrong" because you haven't really explained what you are attempting to do.
Note that the dimension of the table is not known from the beginning.
It's not? How/why are you using a array literal then? Literal arrays are for creating arrays of a fixed size with a fixed "prefilling".
What exactly do mean with "the beginning"? Isn't the size known when you are programming (during compile time) or when the program starts (at run time)?
If you get the size of the array during run time you can create a normal array with new:
int a = ...;
int b = ...; // Get the sizes from somewhere, e.g, user input
String[][] table_of_classifiers = new String[a][b];
// Now you have an "empty" table
If size "changes" during run time, then - depending on what you are actually attempting to do - then an array is the wrong tool and you should be using a List implementation such as ArrayList instead.
Regarding "eval", as the others say, Java is a compiled language making "eval" basically impossible. The is "reflection" or the use of Class types to achieve what you are hinting at, but you really need to explain much more extensively what you are trying to achieve, then it may be possible to help you here.
However reflection and CLass types are a complicated matter, and considering you are obviously struggling with the most basic Java concepts, you have a long way to go to until you will be able to do what you want to do.
Just do:
class Foo {
private String [][] table_of_classifiers = null;
void bar() {
table_of_classifiers = new String[][] {
{"x1","x","x","x","x"},
{"x2","x","x","x","x"},
{"x3","x","x","x","x"},
{"x4","x","x","x","x"},
{"x5","x","x","x","x"},
{"x6","x","x","x","x"},
};
}
}
Java doesn't have eval (because it's a compiled language), but it does have reflection. It's almost certainly not the best approach to whatever it is that you want to do, though.
Regarding your first problem: to assign to table_of_classifiers without redeclaring it, write:
table_of_classifiers = new String[][] {
{"x1","x","x","x","x"},
{"x2","x","x","x","x"},
{"x3","x","x","x","x"},
{"x4","x","x","x","x"},
{"x5","x","x","x","x"},
{"x6","x","x","x","x"},
};
Regarding eval . . . the problem is that the run-time doesn't have the names of scoped local variables, and although it can get the names of instance variables, it has to do that within the context of an object. It's possible to address these sorts of issues, but it's non-trivial, and will involve major compromises. I think you have to thoroughly understand how scoping works and how reflection works before you start figuring out what features eval will support, because otherwise you'll just be disappointed at all the requirements you give it that turn out to be impossible.

Should Java method arguments be used to return multiple values?

Since arguments sent to a method in Java point to the original data structures in the caller method, did its designers intend for them to used for returning multiple values, as is the norm in other languages like C ?
Or is this a hazardous misuse of Java's general property that variables are pointers ?
A long time ago I had a conversation with Ken Arnold (one time member of the Java team), this would have been at the first Java One conference probably, so 1996. He said that they were thinking of adding multiple return values so you could write something like:
x, y = foo();
The recommended way of doing it back then, and now, is to make a class that has multiple data members and return that instead.
Based on that, and other comments made by people who worked on Java, I would say the intent is/was that you return an instance of a class rather than modify the arguments that were passed in.
This is common practice (as is the desire by C programmers to modify the arguments... eventually they see the Java way of doing it usually. Just think of it as returning a struct. :-)
(Edit based on the following comment)
I am reading a file and generating two
arrays, of type String and int from
it, picking one element for both from
each line. I want to return both of
them to any function which calls it
which a file to split this way.
I think, if I am understanding you correctly, tht I would probably do soemthing like this:
// could go with the Pair idea from another post, but I personally don't like that way
class Line
{
// would use appropriate names
private final int intVal;
private final String stringVal;
public Line(final int iVal, final String sVal)
{
intVal = iVal;
stringVal = sVal;
}
public int getIntVal()
{
return (intVal);
}
public String getStringVal()
{
return (stringVal);
}
// equals/hashCode/etc... as appropriate
}
and then have your method like this:
public void foo(final File file, final List<Line> lines)
{
// add to the List.
}
and then call it like this:
{
final List<Line> lines;
lines = new ArrayList<Line>();
foo(file, lines);
}
In my opinion, if we're talking about a public method, you should create a separate class representing a return value. When you have a separate class:
it serves as an abstraction (i.e. a Point class instead of array of two longs)
each field has a name
can be made immutable
makes evolution of API much easier (i.e. what about returning 3 instead of 2 values, changing type of some field etc.)
I would always opt for returning a new instance, instead of actually modifying a value passed in. It seems much clearer to me and favors immutability.
On the other hand, if it is an internal method, I guess any of the following might be used:
an array (new Object[] { "str", longValue })
a list (Arrays.asList(...) returns immutable list)
pair/tuple class, such as this
static inner class, with public fields
Still, I would prefer the last option, equipped with a suitable constructor. That is especially true if you find yourself returning the same tuple from more than one place.
I do wish there was a Pair<E,F> class in JDK, mostly for this reason. There is Map<K,V>.Entry, but creating an instance was always a big pain.
Now I use com.google.common.collect.Maps.immutableEntry when I need a Pair
See this RFE launched back in 1999:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4222792
I don't think the intention was to ever allow it in the Java language, if you need to return multiple values you need to encapsulate them in an object.
Using languages like Scala however you can return tuples, see:
http://www.artima.com/scalazine/articles/steps.html
You can also use Generics in Java to return a pair of objects, but that's about it AFAIK.
EDIT: Tuples
Just to add some more on this. I've previously implemented a Pair in projects because of the lack within the JDK. Link to my implementation is here:
http://pbin.oogly.co.uk/listings/viewlistingdetail/5003504425055b47d857490ff73ab9
Note, there isn't a hashcode or equals on this, which should probably be added.
I also came across this whilst doing some research into this questions which provides tuple functionality:
http://javatuple.com/
It allows you to create Pair including other types of tuples.
You cannot truly return multiple values, but you can pass objects into a method and have the method mutate those values. That is perfectly legal. Note that you cannot pass an object in and have the object itself become a different object. That is:
private void myFunc(Object a) {
a = new Object();
}
will result in temporarily and locally changing the value of a, but this will not change the value of the caller, for example, from:
Object test = new Object();
myFunc(test);
After myFunc returns, you will have the old Object and not the new one.
Legal (and often discouraged) is something like this:
private void changeDate(final Date date) {
date.setTime(1234567890L);
}
I picked Date for a reason. This is a class that people widely agree should never have been mutable. The the method above will change the internal value of any Date object that you pass to it. This kind of code is legal when it is very clear that the method will mutate or configure or modify what is being passed in.
NOTE: Generally, it's said that a method should do one these things:
Return void and mutate its incoming objects (like Collections.sort()), or
Return some computation and don't mutate incoming objects at all (like Collections.min()), or
Return a "view" of the incoming object but do not modify the incoming object (like Collections.checkedList() or Collections.singleton())
Mutate one incoming object and return it (Collections doesn't have an example, but StringBuilder.append() is a good example).
Methods that mutate incoming objects and return a separate return value are often doing too many things.
There are certainly methods that modify an object passed in as a parameter (see java.io.Reader.read(byte[] buffer) as an example, but I have not seen parameters used as an alternative for a return value, especially with multiple parameters. It may technically work, but it is nonstandard.
It's not generally considered terribly good practice, but there are very occasional cases in the JDK where this is done. Look at the 'biasRet' parameter of View.getNextVisualPositionFrom() and related methods, for example: it's actually a one-dimensional array that gets filled with an "extra return value".
So why do this? Well, just to save you having to create an extra class definition for the "occasional extra return value". It's messy, inelegant, bad design, non-object-oriented, blah blah. And we've all done it from time to time...
Generally what Eddie said, but I'd add one more:
Mutate one of the incoming objects, and return a status code. This should generally only be used for arguments that are explicitly buffers, like Reader.read(char[] cbuf).
I had a Result object that cascades through a series of validating void methods as a method parameter. Each of these validating void methods would mutate the result parameter object to add the result of the validation.
But this is impossible to test because now I cannot stub the void method to return a stub value for the validation in the Result object.
So, from a testing perspective it appears that one should favor returning a object instead of mutating a method parameter.

Categories