Does Java create an object even if it's not initialized directly?

If I initialize a String array directly like this: String[] Distro = Distros.split(","); then it'll create an object because the variable Distro is holding the array.
But if I do it this way, will it also create an object?
String Distros = "CentOS,RHEL,Debian,Ubuntu";
for (String s : Distros.split(",")) {
    System.out.println(s);
}
My goal is to reduce object creation to minimize garbage.

Your reasoning “then it'll create an object because the variable Distro is holding the array” indicates that you are confusing object creation with variable assignment.
The object is created by the expression Distros.split(","), not the subsequent assignment. It should become obvious when you consider that the split method is an ordinary Java method creating and returning the array without any knowledge about what the caller will do with the result.
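To make that concrete, a minimal sketch: the array comes into existence when split executes, whether or not the result is assigned anywhere.
String Distros = "CentOS,RHEL,Debian,Ubuntu";
Distros.split(",");                  // an array is created here and immediately becomes unreachable
String[] kept = Distros.split(","); // the same creation; the assignment merely stores a reference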
When the operation happens in performance-critical code, you might use
int p = 0;
for (int e; (e = Distros.indexOf(',', p)) >= 0; p = e + 1)
    System.out.println(Distros.substring(p, e));
System.out.println(Distros.substring(p));
instead. It’s worth pointing out that this saves the array creation but still performs the creation of the substrings, which is the more expensive aspect of it. Without knowing what you are actually going to do with the substrings, it’s impossible to say whether there are alternatives which can save the substring creation¹.
But this loop still has an advantage over the split method. The split method creates all substrings and returns an array holding references to them, forcing them to exist at the same time, during the entire loop. The loop above calls substring when needed and doesn’t keep a reference when going to the next. Hence, the strings are not forced to exist all the time and the garbage collector is free to decide when to collect them, depending on the current memory utilization.
¹ I assume that printing is just an example. But to stay with the example, you could replace
System.out.println(Distros.substring(p, e));
with
System.out.append(Distros, p, e).println();
The problem is, this only hides the substring creation, at least in the reference implementation, which will eventually perform the substring creation behind the scenes.
An alternative is
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(FileDescriptor.out)));
try {
    int p = 0;
    for (int e; (e = Distros.indexOf(',', p)) >= 0; p = e + 1) {
        bw.write(Distros, p, e - p);
        bw.write(System.lineSeparator());
    }
    bw.write(Distros, p, Distros.length() - p);
    bw.write(System.lineSeparator());
    bw.flush();
} catch (IOException ex) {
    ex.printStackTrace();
}
which truly writes the strings without creating substrings. But it forces us to deal with potential exceptions, which PrintStream normally hides.

The split(delimiter) method returns a String array derived from the string, split on the delimiter. What you wrote creates the String array in the for-each header; its scope ends when the for-each completes, so the array becomes eligible for GC.
String Distros = "CentOS,RHEL,Debian,Ubuntu";
for (String s : Distros.split(",")) {
    System.out.println(s);
}
is equivalent to
String Distros = "CentOS,RHEL,Debian,Ubuntu";
System.out.println("start scope");
{
    String[] splitArray = Distros.split(",");
    for (String s : splitArray) {
        System.out.println(s);
    }
}
System.out.println("end scope");

Identify the record that is the culprit

Is method chaining good?
I am not against functional programming that uses method chaining a lot, but against a herd mentality where people mindlessly chase whatever is new.
For example: if I am processing a list of items using stream programming and need to find out the exact row that resulted in throwing a NullPointerException.
private void test() {
    List<User> aList = new ArrayList<>();
    // fill aList with some data
    aList.stream().forEach(x -> doSomethingMeaningFul(x.getAddress()));
}

private void doSomethingMeaningFul(Address x) {
    // Do something
}
So in the example above, if any object in the list is null, it will lead to a NullPointerException when calling x.getAddress(), and the stream exits without giving us a hook to identify the User record that has the problem.
I may be missing something that offers this feature in stream programming; any help is appreciated.
Edit 1:
NPE is just an example, but there are several other RuntimeExceptions that could occur. Writing a filter would essentially mean checking for every RTE condition based on the operation I am performing, and checking for every operation will become a pain.
To give a better idea of what I mean, the following is a snippet using the older style; I couldn't find any equivalent with streams / functional-programming methods.
List<User> aList = new ArrayList<>();
// Fill list with some data
int counter = 0;
User u = null;
try {
    for (; counter < aList.size(); counter++) {
        u = aList.get(counter);
        u.doSomething();
        int result = u.getX() / u.getY();
    }
} catch (Exception e) {
    System.out.println("Error processing at index:" + counter + " with User record:" + u);
    System.out.println("Exception:" + e);
}
This will be a boon during the maintenance phase (the longest phase), pointing to exact data-related issues which are otherwise difficult to reproduce.
**Benefits:**
- Find exact index causing issue, pointing to data
- Any RTE is recorded and analyzed against the user record
- Smaller stacktrace to look at
Is method chaining good?
As so often, the simple answer is: it depends.
When you
- know what you are doing,
- are very sure that elements will never be null, so the chance of an NPE in such a construct is (close to) zero,
- and the chaining of calls leads to improved readability,
then sure, chain calls.
If any of the above criteria isn't clearly fulfilled, then consider not doing that.
In any case, it might be helpful to distribute your method calls over separate lines. Tools like IntelliJ actually give you advanced type information for each line when you do that (well, not always; see my own question ;)
From a different perspective: to the compiler, it doesn't matter much whether you chain calls. That really only matters to humans, either for readability or during debugging.
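For example (a sketch; any User accessor beyond getAddress is assumed here), the same chain split across lines, which IDEs and debuggers can then annotate line by line:
List<String> names = users.stream()
        .filter(u -> u.getAddress() != null)   // guard against the NPE discussed above
        .map(User::getName)                    // getName is a hypothetical accessor
        .collect(Collectors.toList());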
There are a few aspects to this.
1) Nulls
It's best to avoid the problem of checking for nulls by never assigning null. This applies whether you're doing functional programming or not. Unfortunately a lot of library code does expose the possibility of a null return value, but try to limit exposure by handling it in one place.
Regardless of whether you're doing FP or not, you'll find you get a lot less frustrated if you never have to write null checks when calling your own methods, because your own methods can never return null.
An alternative to variables that might be null, is to use Java 8's Optional class.
Instead of:
public String myMethod(int i) {
    if (i > 0) {
        return "Hello";
    } else {
        return null;
    }
}
Do:
public Optional<String> myMethod(int i) {
    if (i > 0) {
        return Optional.of("Hello");
    } else {
        return Optional.empty();
    }
}
Look at the Optional Javadoc to see how this forces the caller to think about the possibility of an Optional.empty() response.
As a bridge between the worlds of "null represents absent" and "Optional.empty() represents absent", you can use Optional.ofNullable(val), which returns Optional.empty() when val == null. But do bear in mind that Optional.ofNullable(null) returns Optional.empty(), whereas Optional.of(null) throws a NullPointerException.
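A small sketch of that bridge (mightReturnNull is a stand-in for any null-returning API):
String fromLegacy = mightReturnNull();                   // may be null
Optional<String> safe = Optional.ofNullable(fromLegacy); // Optional.empty() when the value was null
safe.ifPresent(v -> System.out.println("got: " + v));    // runs only when a value is present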
2) Exceptions
It's true that throwing an exception in a stream handler doesn't work very well. Exceptions aren't a very FP-friendly mechanism. The FP-friendly alternative is Either -- which isn't a standard part of Java but is easy to write yourself or find in third party libraries: Is there an equivalent of Scala's Either in Java 8?
public Either<Exception, Result> meaningfulMethod(Value val) {
    try {
        return Either.right(methodThatMightThrow(val));
    } catch (Exception e) {
        return Either.left(e);
    }
}
... then:
List<Either<Exception, Result>> results = listOfValues.stream().map(this::meaningfulMethod).collect(Collectors.toList());
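From there you can split successes from failures in one pass; a sketch assuming the Either exposes an isRight() test:
Map<Boolean, List<Either<Exception, Result>>> byOutcome =
        results.stream().collect(Collectors.partitioningBy(Either::isRight));
List<Either<Exception, Result>> failures = byOutcome.get(false); // each entry still carries its Exception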
3) Indexes
You want to know the index of the stream element, when you're using a stream made from a List? See Is there a concise way to iterate over a stream with indices in Java 8?
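The usual workaround is to stream over the indices rather than the elements, so the index is in scope when something goes wrong; a sketch based on the question's example:
IntStream.range(0, aList.size())
         .forEach(i -> doSomethingMeaningFul(aList.get(i).getAddress())); // i identifies the culprit row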
In your test() function you are creating an empty list: List<User> aList = new ArrayList<>(); and then doing a for-each on it. First add some elements to aList.
If you want to handle null values, you can add .filter(x -> x != null) before the forEach; it will filter out all null values.
Below is the code:
private void test() {
    List<User> aList = new ArrayList<>();
    aList.stream().filter(x -> x != null).forEach(x -> doSomethingMeaningFul(x.getAddress()));
}

private void doSomethingMeaningFul(Address x) {
    // Do something
}
You can write a block of code in streams, and you can find out which list item might result in a NullPointerException. I hope this code helps:
private void test() {
    List<User> aList = new ArrayList<>();
    aList.stream().forEach(x -> {
        if (x.getAddress() != null)
            doSomethingMeaningFul(x.getAddress());
        else
            System.out.println(x + " doesn't have an address");
    });
}

private void doSomethingMeaningFul(Address x) {
    // Do something
}
If you want, you can throw a NullPointerException or a custom exception like AddressNotFoundException in the else part.

Will this work like a destructor?

I am working on a Processing program for Brownian motion tracking.
I have an ArrayList blobs and an ArrayList tomerge. The first one is a list of the particles I track, and the second one is a list of the particles I want to merge.
Every particle is a Blob class object. A Blob object contains an ArrayList of Vectors called lespoints and an int id in its data.
Since I need to merge a few particles into one, I need to destroy some Blob objects, but Java doesn't have destructors and I don't want to use finalize(). Will this work like merge + destruction?
public void delete(Blob a) {
    a = null;
}

void merge(ArrayList<Blob> tomerge) {
    int i = 0;
    while (i < tomerge.size()) {
        Blob k = tomerge.get(i);
        int j = 0;
        while (j < k.lespoints.size()) {
            Vector g = k.lespoints.get(j);
            lespoints.add(g);
            j++;
        }
        if (i > 0) {
            delete(tomerge.get(i));
        }
        i++;
    }
}
You don't need to manually do anything. Just make sure you don't have any references to the objects you want to go away, and Java will handle the rest.
For example:
String x = "test";
x = null;
At this point, the value "test" can be garbage collected because nothing points to it.
Compare that to this:
String x = "test";
ArrayList<String> list = new ArrayList<>();
list.add(x);
x = null;
At this point, the value "test" cannot be garbage collected, because the ArrayList still points to it.
But if you did this:
list.remove("test");
Then it could be garbage collected.
So basically, all you need to do is remove the element from your ArrayList and Java will take care of the rest. Note that you probably don't want to do this in your current loop, as removing elements while you iterate over a list can cause you to skip over elements.
Instead, you probably want to use an iterator or just loop backwards over your ArrayList.
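A sketch of the backwards variant (shouldRemove stands in for whatever condition you use): removing from the end means removals never shift the indices you still have to visit.
for (int i = blobs.size() - 1; i >= 0; i--) {
    if (shouldRemove(blobs.get(i))) { // hypothetical predicate
        blobs.remove(i);              // safe: only already-visited indices shift
    }
}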
Shameless self-promotion: here is a tutorial on ArrayLists, including removing elements from them.
There is an exact reason why your code example won't work.
public void delete(Blob a) {
    a = null;
}

Blob b = new Blob();
delete(b);
In this code example, the reference which is set to null is a, not b.
You are not deleting the Blob object, you are setting the reference to null.
When the delete() method is called, there exist two references to the Blob.
One reference is b, which is in the calling code.
The other reference is a, which is in the called code.
a is set to null, and then the method exits. But the b reference continues to exist throughout. Therefore the Blob will never be garbage-collected.
To achieve garbage collection, you must remove all references to an object; then it gets collected at the JVM's convenience.
The Java Collections API for removing an object during iteration works like this:
Iterator<Blob> itr = list.iterator();
while (itr.hasNext()) {
    Blob b = itr.next();
    if ( /* test condition */ ) {
        itr.remove(); // Safely removes object from List during iteration
    }
} // Object `b` goes out of scope, so this Blob is "lost" to the code and is going to be destroyed

Accumulating streams in Java

Recently I've been trying to reimplement my data parser using streams in Java, but I can't figure out how to do one specific thing:
Consider object A with a timeStamp.
Consider object B, which is made of various A objects.
Consider some metric which tells us the time range for object B.
What I have now is a stateful method which goes through a list of A objects; if one fits into the last B object it goes there, otherwise the method creates a new B instance and starts putting A objects there.
I would like to do this the streams way:
Take the whole list of A objects as a stream. Now I need to figure out a function which will create "chunks" and accumulate them into B objects. How do I do that?
Thanks
EDIT:
A and B are complex, but I will try to post a simplified version here.
class A {
    private final long time;

    private A(long time) {
        this.time = time;
    }

    long getTime() {
        return time;
    }
}

class B {
    // not important, built from a "full" TemporaryB instance
    // result of accumulation
}

class TemporaryB {
    private final long startingTime;
    private int counter;

    public TemporaryB(A a) {
        this.startingTime = a.getTime();
    }

    boolean fits(A a) {
        return a.getTime() - startingTime < THRESHOLD;
    }

    void add(A a) {
        counter++;
    }
}

class Accumulator {
    private List<B> accumulatedB;
    private TemporaryB temporaryB;

    public void addA(A a) {
        if (temporaryB.fits(a)) {
            temporaryB.add(a);
        } else {
            accumulatedB.add(new B(temporaryB));
            temporaryB = new TemporaryB(a);
        }
    }
}
OK, so this is a very simplified version of how I do it now. I don't like it; it's ugly.
In general such problem is badly suitable for Stream API as you may need non-local knowledge which makes parallel processing harder. Imagine that you have new A(1), new A(2), new A(3) and so on up to new A(1000) with Threshold set to 10. So you basically need to combine input into batches by 10 elements. Here we have the same problem as discussed in this answer: when we split the task into subtasks the suffix part may not know exactly how many elements are in the prefix part, so it cannot even start combining data into batches until the whole prefix is processed. Your problem is essentially serial.
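For comparison, a plain sequential loop expresses this directly; a sketch reusing the question's TemporaryB and B, with the final flush made explicit:
static List<B> combineSerial(List<A> input) {
    List<B> result = new ArrayList<>();
    TemporaryB current = null;
    for (A a : input) {
        if (current != null && current.fits(a)) {
            current.add(a);
        } else {
            if (current != null)
                result.add(new B(current)); // close the previous batch
            current = new TemporaryB(a);
        }
    }
    if (current != null)
        result.add(new B(current));         // flush the last batch
    return result;
}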
On the other hand, there's a solution provided by the new headTail method in my StreamEx library. This method parallelizes badly, but having it you can define almost any operation in just a few lines.
Here's how to solve your problem with headTail:
static StreamEx<TemporaryB> combine(StreamEx<A> input, TemporaryB tb) {
    return input.headTail((head, tail) ->
            tb == null ? combine(tail, new TemporaryB(head)) :
            tb.fits(head) ? combine(tail, tb.add(head)) :
            combine(tail, new TemporaryB(head)).prepend(tb),
        () -> StreamEx.ofNullable(tb));
}
Here I modified your TemporaryB method this way:
TemporaryB add(A a) {
    counter++;
    return this;
}
Sample (assuming THRESHOLD = 1000):
List<A> input = Arrays.asList(new A(1), new A(10), new A(1000), new A(1001),
        new A(1002), new A(1003), new A(2000), new A(2002), new A(2003), new A(2004));
Stream<B> streamOfB = combine(StreamEx.of(input), null).map(B::new);
streamOfB.forEach(System.out::println);
Output (I wrote a simple B.toString()):
B [counter=2, startingTime=1]
B [counter=3, startingTime=1001]
B [counter=2, startingTime=2002]
So here you actually have a lazy Stream of B.
Explanation:
StreamEx.headTail takes two lambdas. The first is called at most once, when the input stream is non-empty: it receives the first stream element (head) and a stream containing all the other elements (tail). The second is called at most once, when the input stream is empty, and receives no parameters. Both should produce an output stream which is used instead. So what we have here:
return input.headTail((head, tail) ->
tb == null? This is the starting case: create a new TemporaryB from the head and call self with the tail:
tb == null ? combine(tail, new TemporaryB(head)) :
tb.fits(head)? OK, just add the head into the existing tb and call self with the tail:
tb.fits(head) ? combine(tail, tb.add(head)) :
Otherwise again create new TemporaryB(head), but also prepend the output stream with the current tb (actually emitting a new element into target stream):
combine(tail, new TemporaryB(head)).prepend(tb),
Input stream is exhausted? Ok, return the last gathered tb if any:
() -> StreamEx.ofNullable(tb));
Note that the headTail implementation guarantees that this solution, while it looks recursive, does not consume more than a constant amount of stack and heap. You can check it on thousands of input elements if in doubt:
Stream<B> streamOfB = combine(LongStreamEx.range(100000).mapToObj(A::new), null).map(B::new);
streamOfB.forEach(System.out::println);

Scanner.findInLine() leaks memory massively

I'm running a simple scanner to parse a string; however, I've discovered that when it's called often enough I get OutOfMemory errors. This code is called as part of the constructor of an object that is built repeatedly for an array of strings:
Edit: here's the constructor for more info; not much more is happening outside the try-catch regarding the Scanner.
public Header(String headerText) {
    char[] charArr;
    charArr = headerText.toCharArray();
    // Check that all characters are printable characters
    if (charArr.length > 0 && !commonMethods.isPrint(charArr)) {
        throw new IllegalArgumentException(headerText);
    }
    // Check for header suffix
    Scanner sc = new Scanner(headerText);
    MatchResult res;
    try {
        sc.findInLine("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");
        res = sc.match();
    } finally {
        sc.close();
    }
    if (res.group(1) == null || res.group(1).isEmpty()) {
        throw new IllegalArgumentException("Missing header keyword found"); // Empty header to store
    } else {
        mnemonic = res.group(1).toLowerCase(); // Store header
    }
    if (res.group(2) == null || res.group(2).isEmpty()) {
        suffix = -1;
    } else {
        try {
            suffix = Integer.parseInt(res.group(2)); // Store suffix if it exists
        } catch (NumberFormatException e) {
            throw new NumberFormatException(headerText);
        }
    }
    if (res.group(3) == null || res.group(3).isEmpty()) {
        isQuery = false;
    } else {
        if (res.group(3).equals("?")) {
            isQuery = true;
        } else {
            throw new IllegalArgumentException(headerText);
        }
    }
    // If command was of the form *ABC, reject suffixes and prefixes
    if (mnemonic.contains("*") && suffix != -1) {
        throw new IllegalArgumentException(headerText);
    }
}
A profiler memory snapshot shows the read(char) method of Scanner.findInLine() being allocated massive amounts of memory during operation as I scan through a few hundred thousand strings; after a few seconds it has already allocated over 38 MB.
I would think that calling close() on the scanner after using it in the constructor would flag the old object to be cleared by the GC, but somehow it remains, and the read method accumulates gigabytes of data before filling the heap.
Can anybody point me in the right direction?
You haven't posted all your code, but given that you are scanning for the same regex repeatedly, it would be much more efficient to compile a static Pattern beforehand and use this for the scanner's find:
static Pattern p = Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");
and in the constructor:
sc.findInLine(p);
This may or may not be the source of the OOM issue, but it will definitely make your parsing a bit faster.
Related: java.util.regex - importance of Pattern.compile()?
Update: after you posted more of your code, I see some other issues. If you're calling this constructor repeatedly, it means you are probably tokenizing or breaking up the input beforehand. Why create a new Scanner to parse each line? They are expensive; you should be using the same Scanner to parse the entire file, if possible. Using one Scanner with a precompiled Pattern will be much faster than what you are doing now, which is creating a new Scanner and a new Pattern for each line you are parsing.
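A sketch of that restructuring (the method and pattern names here are illustrative, not from the original code):
private static final Pattern HEADER_PATTERN =
        Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");

static void parseAll(Readable source) {
    Scanner sc = new Scanner(source);              // one Scanner for the whole input
    try {
        while (sc.hasNextLine()) {
            if (sc.findInLine(HEADER_PATTERN) != null) {
                MatchResult res = sc.match();
                // ... build a Header from res ...
            }
            sc.nextLine();                         // consume the rest of the line
        }
    } finally {
        sc.close();
    }
}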
The strings that are filling up your memory were created in findInLine(). Therefore, the repeated Pattern creation is not the problem.
Without knowing what the rest of the code does, my guess would be that one of the groups you get out of the matcher is being kept in a field of your object. Then that string would have been allocated in findInLine(), as you see here, but the fact that it is being retained would be due to your code.
Edit:
Here's your problem:
mnemonic = res.group(1).toLowerCase();
What you might not realize is that toLowerCase() returns this if there are no uppercase letters in the string. Also, group(int) returns a substring(), which (in older JDKs) creates a new string backed by the same char[] as the full string. So mnemonic actually keeps the char[] for the entire line alive.
The fix would just be:
mnemonic = new String(res.group(1).toLowerCase());
I think your code snippet is not complete. I believe you are calling scanner.findInLine() in a loop. Anyway, try calling scanner.reset(); I hope this will solve your problem.
The JVM apparently does not have time to garbage-collect, possibly because it's using the same code (the constructor) repeatedly to create multiple instances of the same class. The JVM may not do anything about GC until something changes on the runtime stack, and in this case that's not happening. I've been warned in the past about doing "too much" in a constructor, as some memory-management behaviors are not quite the same while other methods are being called.
Your problem is that you are scanning through a couple hundred thousand strings while passing the pattern in as a string, so you create a new Pattern object for every single iteration of the loop. You can pull the pattern out of the loop, like so:
Pattern toMatch = Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");
Scanner sc = new Scanner(headerText);
MatchResult res;
try {
    sc.findInLine(toMatch);
    res = sc.match();
} finally {
    sc.close();
}
Then you will only be passing the object reference to toMatch instead of having the overhead of creating a new pattern object for every attempt at a match. This will fix your leak.
Well, I've found the source of the problem: it wasn't the Scanner exactly, but the list holding the objects doing the scanning in the constructor.
The problem had to do with the overrun of a list that was holding references to the objects containing the parsing: essentially, more strings were received per unit of time than could be processed, and the list grew and grew until there was no more RAM. Bounding this list to a maximum size now prevents the parser from overloading the memory; I'll be adding some synchronization between the parser and the data source to avoid this overrun in the future.
Thank you all for your suggestions; I've already made some performance changes regarding the scanner. And thank you to @RobI for pointing me to jvisualvm, which allowed me to trace back the exact culprits holding the references. The memory dump wasn't showing the reference links.

What is more efficient: StringBuffer new() or delete(0, sb.length())?

It is often argued that avoiding creating objects (especially in loops) is considered good practice.
Then, what is most efficient regarding StringBuffer?
StringBuffer sb = new StringBuffer();
ObjectInputStream ois = ...;
for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 10; j++) {
        sb.append(ois.readUTF());
    }
    ...
    // Which option is the most efficient?
    sb = new StringBuffer();     // new StringBuffer instance?
    sb.delete(0, sb.length());   // or deleting content?
}
I mean, one could argue that creating an object is faster than looping through an array.
First, StringBuffer is thread-safe, which hurts its performance compared to StringBuilder. StringBuilder is not thread-safe, but as a result is faster. Finally, I prefer just setting the length to 0 using setLength:
sb.setLength(0);
This is similar to .delete(...) except that you don't really care about the length. It's also probably a little faster, since it doesn't need to 'delete' anything. Creating a new StringBuilder (or StringBuffer) would be less efficient: any time you see new, Java is creating a new object and placing it on the heap.
Note: after looking at the implementations of .delete and .setLength: .delete sets length = 0, while .setLength sets everything to '\0', so you may get a little win with .delete.
Just to amplify the previous comments:
From looking at source, delete() always calls System.arraycopy(), but if the arguments are (0,count), it will call arraycopy() with a length of zero, which will presumably have no effect. IMHO, this should be optimized out since I bet it's the most common case, but no matter.
With setLength(), on the other hand, the call will increase the StringBuilder's capacity if necessary via a call to ensureCapacityInternal() (another very common case that should have been optimized out IMHO) and then truncates the length as delete() would have done.
Ultimately, both methods just wind up setting count to zero.
Neither call does any iterating in this case. Both make an unnecessary function call. However ensureCapacityInternal() is a very simple private method, which invites the compiler to optimize it nearly out of existence so it's likely that setLength() is slightly more efficient.
I'm extremely skeptical that creating a new instance of StringBuilder could ever be as efficient as simply setting count to zero, but I suppose that the compiler might recognize the pattern involved and convert the repeated instantiations into repeated calls to setLength(0). But at the very best, it would be a wash. And you're depending on the compiler to recognize the case.
Executive summary: setLength(0) is the most efficient. For maximum efficiency, pre-allocate the buffer space in StringBuilder when you create it.
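A minimal sketch of that advice (the initial capacity is an arbitrary example; size it to your data):
StringBuilder sb = new StringBuilder(8192); // pre-allocated once
for (int i = 0; i < 1000; i++) {
    sb.setLength(0);                        // reuse the same backing array; no new allocation
    // ... append into sb and consume the result ...
}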
The delete method is implemented this way:
public AbstractStringBuilder delete(int start, int end) {
    if (start < 0)
        throw new StringIndexOutOfBoundsException(start);
    if (end > count)
        end = count;
    if (start > end)
        throw new StringIndexOutOfBoundsException();
    int len = end - start;
    if (len > 0) {
        System.arraycopy(value, start + len, value, start, count - end);
        count -= len;
    }
    return this;
}
As you can see it doesn't iterate through the array.
