Is jsoup Document thread safe? - java

Is it safe to use jsoup someDocument.select(..) from multiple threads or is there some internal state for read operations?

You can safely call Document.select(String cssSelector) from multiple threads, even though the Document class is not thread-safe, provided no thread modifies the document at the same time. The underlying implementation of select(String cssSelector) passes along a reference to the element it was called on (the Document object in this case), but it does not call any method that changes the state of the caller.
When you call .select(String cssSelector) you actually end up calling the Collector.collect(Evaluator eval, Element root) method, where root is a reference to the Document object.
/**
 * Build a list of elements, by visiting root and every descendant of root, and testing it against the evaluator.
 * @param eval Evaluator to test elements against
 * @param root root of tree to descend
 * @return list of matches; empty if none
 */
public static Elements collect(Evaluator eval, Element root) {
    Elements elements = new Elements();
    new NodeTraversor(new Accumulator(root, elements, eval)).traverse(root);
    return elements;
}
In this method, only the newly created elements object gets updated.
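The same pattern can be sketched without jsoup. The SimpleNode class and the collect helper below are hypothetical stand-ins, not jsoup classes; they only mirror the shape of the method above: the result list is created per call, and the shared tree is read but never written. Under that assumption, several threads can run the query against one tree at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ReadOnlyTraversalSketch {
    // Hypothetical stand-in for a parsed element; not a jsoup class.
    static final class SimpleNode {
        final String tag;
        final List<SimpleNode> children = new ArrayList<>();
        SimpleNode(String tag) { this.tag = tag; }
        SimpleNode add(SimpleNode child) { children.add(child); return this; }
    }

    // The result list is local to each call; the tree itself is only read.
    static List<SimpleNode> collect(String tag, SimpleNode root) {
        List<SimpleNode> matches = new ArrayList<>();
        visit(tag, root, matches);
        return matches;
    }

    static void visit(String tag, SimpleNode node, List<SimpleNode> matches) {
        if (node.tag.equals(tag)) matches.add(node);
        for (SimpleNode child : node.children) visit(tag, child, matches);
    }

    // html > (div > p, p), p
    static SimpleNode sampleTree() {
        return new SimpleNode("html")
            .add(new SimpleNode("div").add(new SimpleNode("p")).add(new SimpleNode("p")))
            .add(new SimpleNode("p"));
    }

    // Runs the same read-only query from several threads against one shared tree.
    static List<Integer> countFromThreads(SimpleNode root, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                Callable<Integer> task = () -> collect("p", root).size();
                futures.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> f : futures) counts.add(f.get());
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countFromThreads(sampleTree(), 4)); // prints [3, 3, 3, 3]
    }
}
```

The guarantee disappears as soon as any thread adds or removes nodes while another is traversing; in that case external synchronization would be needed.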
Why is the Document class not thread-safe?
There are a few methods in the Document class that change the state of the object without any synchronization mechanism, e.g. Document.outputSettings(Document.OutputSettings outputSettings). Ideally the Document class would be final and immutable, so that sharing an instance between multiple threads wouldn't be a problem.

Related

How to get Node-Entry Reference from LinkedList in java

How to get the reference to the actual Node (entry) object from LinkedList - Not the value that it holds?
Something like this :
import java.util.LinkedList;
public class Main {
    public static void main(String[] args) {
        LinkedList<String> test = new LinkedList<String>();
        test.addLast("first");
        test.addLast("second");
        test.addLast("third");
        var thirdNodeReference = test.getLastNode(); // Does this even exist ?
        test.addLast("fourth");
        var secondNodeReference = thirdNodeReference.previous(); // To perform such operations.
    }
}
Does there exist a method like LinkedList.getLastNode() in java LinkedList so that one can perform previous() or next() operations on it?
I know LinkedList.listIterator() exists but that's not useful, because I'll be having references to each Node (entry), and I need to work with them - such as lastNodeReference in the code above.
If such functionality doesn't exist in the Java standard library, is there any third-party (external) library that I can use?
Reason:
I need to access the Node to perform the remove() operation in O(1).
The actual Java implementation performs this in O(N): it traverses the list to find the Node containing the given object, calling equals() on every node on its way. Also, check this comment.
Ideally this can be performed in O(1) if we have a direct reference to the Node object, because remove() only requires changing 2 pointers, on the previous and next Node.
There is a descendingIterator method on LinkedList, which is described as "Returns an iterator over the elements in this deque in reverse sequential order". It isn't (completely) clear what the OP wants, but the Iterator it returns has next() and remove() methods, and the ListIterator returned by listIterator() additionally has previous().
A singly linked list can be represented as a chain of nodes in which each node points only to the next one,
so no, you can't get the previous element from a node in such a list.
You might want to implement a doubly linked list of your own, though (example code can be found quite easily on Google).
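A minimal sketch of such a list, assuming the goal is O(1) removal through a node handle. NodeList, Node, and their methods are hypothetical names for illustration and are not part of java.util; the class is not thread-safe, and remove() assumes the node is still linked into this list.

```java
import java.util.ArrayList;
import java.util.List;

class NodeList<E> {
    // Node handle exposed to callers; hypothetical API, not part of java.util.
    static final class Node<E> {
        final E value;
        Node<E> prev, next;
        Node(E value) { this.value = value; }
        Node<E> previous() { return prev; }
        Node<E> next() { return next; }
    }

    private Node<E> head, tail;
    private int size;

    // Appends a value and returns a handle to the newly created node.
    Node<E> addLast(E value) {
        Node<E> node = new Node<>(value);
        if (tail == null) {
            head = tail = node;
        } else {
            node.prev = tail;
            tail.next = node;
            tail = node;
        }
        size++;
        return node;
    }

    // Unlinks the node in O(1) by rewiring its two neighbours; no search is needed.
    // Assumes the node currently belongs to this list.
    void remove(Node<E> node) {
        if (node.prev != null) node.prev.next = node.next; else head = node.next;
        if (node.next != null) node.next.prev = node.prev; else tail = node.prev;
        node.prev = node.next = null;
        size--;
    }

    int size() { return size; }

    // Copies the values into a java.util.List, front to back.
    List<E> toList() {
        List<E> out = new ArrayList<>();
        for (Node<E> n = head; n != null; n = n.next) out.add(n.value);
        return out;
    }

    public static void main(String[] args) {
        NodeList<String> list = new NodeList<>();
        list.addLast("first");
        list.addLast("second");
        Node<String> third = list.addLast("third");
        list.addLast("fourth");
        Node<String> second = third.previous();
        list.remove(second);
        System.out.println(list.toList()); // prints [first, third, fourth]
    }
}
```

Handing out node references trades encapsulation for speed: callers can keep a handle to a node that has already been removed, so a production version would probably want to guard against that.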

Java stream, remove and perform action from ConcurrentLinkedQueue

I am unsure how to do this: I'd like to iterate over the whole ConcurrentLinkedQueue, removing each item in turn and running some code on it.
This is what I used to do:
public static class Input {
    public static final ConcurrentLinkedQueue<TreeNode> treeNodes = new ConcurrentLinkedQueue<>();
}
public static class Current {
    public static final ConcurrentHashMap<Integer, TreeNode> treeNodes = new ConcurrentHashMap<>();
}
TreeNode is a simple class
TreeNode treeNode = Input.treeNodes.poll();
while (treeNode != null) {
    treeNode.init(gl3);
    Current.treeNodes.put(treeNode.getId(), treeNode);
    treeNode = Input.treeNodes.poll();
}
This is how I am trying to do it using a stream:
Input.treeNodes.stream()
    .forEach(treeNode -> {
        Input.treeNodes.remove(treeNode);
        treeNode.init(gl3);
        Current.treeNodes.put(treeNode.getId(), treeNode);
    });
I am afraid that removing the item inside the forEach action may be error prone.
So my question is:
Is this safe and/or are there any better ways to do it?
Just as you've assumed, you should generally not modify the backing collection while processing the stream, because with most java.util collections you might get a ConcurrentModificationException (just as with for (Object o : collection) {} loops).
On the other hand, it is not very clear which TreeNode you are trying to remove; in the current case you seemingly wish to remove all elements from the queue, perform some actions on them and put them in a Map.
You may safely achieve your current logic via:
Input.treeNodes.stream()
    .forEach(treeNode -> {
        treeNode.init(gl3);
        Current.treeNodes.put(treeNode.getId(), treeNode);
    });
// Note: clear() also discards anything another thread added in the meantime.
Input.treeNodes.clear();
This behavior is determined by the Spliterator used to construct the Stream. The documentation of ConcurrentLinkedQueue.spliterator() says:
Returns a Spliterator over the elements in this queue.
The returned spliterator is weakly consistent.
“weakly consistent” implies:
Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators and Spliterators provide weakly consistent rather than fast-fail traversal:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
This implies that removing the encountered elements should not have any impact.
On the other hand, when other threads add or remove elements, the outcome of your Stream operation regarding these elements is unpredictable.
However, you should consider that remove(Object) is not the intended use case for a queue.
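Since a queue is meant to be consumed from its head, a poll()-based drain is one way to avoid remove(Object) altogether. The sketch below is only an illustration: Item is a hypothetical stand-in for TreeNode, and its init() takes no argument because the gl3 parameter from the question is not modelled here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class QueueDrainSketch {
    // Hypothetical stand-in for the question's TreeNode.
    static final class Item {
        final int id;
        boolean initialized;
        Item(int id) { this.id = id; }
        void init() { initialized = true; }
    }

    // Polls until the queue reports empty; each element is removed exactly once,
    // initialized, and then stored under its id. Returns how many were moved.
    static int drain(ConcurrentLinkedQueue<Item> input, Map<Integer, Item> current) {
        int moved = 0;
        for (Item item; (item = input.poll()) != null; ) {
            item.init();
            current.put(item.id, item);
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<Item> input = new ConcurrentLinkedQueue<>();
        input.add(new Item(1));
        input.add(new Item(2));
        Map<Integer, Item> current = new ConcurrentHashMap<>();
        int moved = drain(input, current);
        System.out.println(moved + " moved, queue empty: " + input.isEmpty());
    }
}
```

Because poll() removes and returns the head atomically, two threads draining the same queue would never process the same element twice, which is harder to guarantee with a stream followed by clear().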

Java LinkedList -- Retrieving the index of an Object

I guess this question has already been asked here, but I searched and could not find anything similar. So here goes:
I have a custom data object, Method, defined as follows:
public class Method {
    List<String> inputParameters;
    String resultVariableName;
}
Now I have a LinkedList<Method> which acts as a repository of Method objects.
Given a Method object, is there a way to concretely determine the correct index of that object?
My question arises from the fact that the LinkedList class has an indexOf routine, but this routine returns only the first occurrence of the object, and there is no guarantee that two copies of a Method object cannot reside in the LinkedList (right?).
Would tagging every Method object as I add it to the LinkedList solve my problem, and if so, is there an ideal way to do it?
EDIT :
Explaining my use case a little further.
My code basically reads a Velocity template top-down and creates Method objects. One Method object is created for every Velocity routine encountered.
This explains why the same element can be stored at multiple indices in the LinkedList: there is no real restriction on how many times a Velocity routine is called, or on the inputs/results provided to it.
Now, I have a UI component with one JButton per Method object reference in the LinkedList<Method>, which the user can click to edit the Method object.
Thus I need to know exactly which Method object reference to edit when the same element resides two or more times in the LinkedList<Method>.
What do you mean by the "correct" index in the first place? If the linked list can contain the same element twice or more (and be careful here - the list will only contain a reference to a Method object, not the object itself) then which index would be "correct" in your view?
Of course you can just iterate over the linked list yourself and return all indexes at which a given Method reference occurs, but it's not clear what you're trying to do with it.
Note that indexes aren't often used with linked lists to start with, as obtaining the element at a given index is an O(n) operation.
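Iterating yourself could look like the sketch below. The helper name indexesOfReference and the empty Method stand-in are assumptions made for illustration; the comparison uses == so that only the exact same reference counts, regardless of how equals() is defined.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class IndexLookupSketch {
    // Hypothetical stand-in for the question's Method class.
    static final class Method {
        List<String> inputParameters = new ArrayList<>();
        String resultVariableName;
    }

    // Returns every index at which the exact same reference (==) occurs.
    // A single pass with a ListIterator avoids repeated O(n) get(index) calls.
    static <T> List<Integer> indexesOfReference(List<T> list, T target) {
        List<Integer> indexes = new ArrayList<>();
        ListIterator<T> it = list.listIterator();
        while (it.hasNext()) {
            int index = it.nextIndex();
            if (it.next() == target) indexes.add(index);
        }
        return indexes;
    }

    public static void main(String[] args) {
        Method a = new Method();
        Method b = new Method();
        LinkedList<Method> repository = new LinkedList<>();
        repository.add(a);
        repository.add(b);
        repository.add(a);
        System.out.println(indexesOfReference(repository, a)); // prints [0, 2]
    }
}
```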
Duplicates are allowed in LinkedLists.
A LinkedList does not prevent duplicates; it may contain the same element more than once.
You can add logic to avoid multiple instances: extend the LinkedList class and override the add method to check whether the Method object already exists.
OR
If you want to get all instances of the Method object, you can use a ListIterator to collect all of them and return this collection as a result.
"There is no given that 2 copies of Method object can not reside in the LinkedList": if this is the scenario, how will you identify which object to retrieve?
In this case, I would suggest using a LinkedHashMap, where you can use an identifier as the key to uniquely identify a Method object.
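One way the LinkedHashMap suggestion might look, assuming each occurrence gets its own generated integer key. MethodRegistrySketch, register, and the counter-based ids are hypothetical choices for illustration; a UI button would then hold the key rather than a list index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MethodRegistrySketch {
    // Hypothetical stand-in for the question's Method class.
    static final class Method {
        List<String> inputParameters = new ArrayList<>();
        String resultVariableName;
    }

    // LinkedHashMap keeps insertion order, so iteration still follows the template top-down.
    private final Map<Integer, Method> byId = new LinkedHashMap<>();
    private int nextId;

    // Stores one occurrence and returns the key that identifies exactly this occurrence.
    int register(Method method) {
        int id = nextId++;
        byId.put(id, method);
        return id;
    }

    Method get(int id) { return byId.get(id); }

    List<Integer> ids() { return new ArrayList<>(byId.keySet()); }

    public static void main(String[] args) {
        MethodRegistrySketch registry = new MethodRegistrySketch();
        Method shared = new Method();
        int first = registry.register(shared);
        int second = registry.register(shared); // same reference, separate occurrence
        System.out.println(first != second);    // prints true
        System.out.println(registry.ids());     // prints [0, 1]
    }
}
```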

How to retain the content of the variable on every call in java

Hi, I am developing a linked list in Java.
In my system there will be one master node, which distributes each incoming request to one of its slave nodes.
In order to make use of cache memory, I initialized a linked list for each node (all of the content is maintained in the master node).
I update the linked list of the respective node before the node processes any query (so that I can find which requests were recently processed by the respective nodes).
But the problem is that whenever I try to read the content of the linked list, it is empty. Even when I try to add new content, it creates the list anew and then adds the content. I think the list is created anew every time I access the Java file containing the linked list implementation.
Is it possible to retain the content of the linked list and update it on top of the previous content? Is there any built-in function in Java to retain the state of a variable, or where can I initialize the list in order to achieve what I expect?
My code is as follows:
import LinkedList.QueueImplement;
public class Node {
    protected LinkedList<String> list;
    public Node(String address, String serviceName) {
        this.list = new LinkedList<String>();
    }
    public void addlist(String data) {
        list.add(data);
    }
}
I suspect the problem is just that every Node constructs its own LinkedList, as is clearly shown in the little code we see. If the variable list were marked static, and constructed at its declaration rather than in the constructor, then all Nodes would share a single LinkedList, which is probably what you want.
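A sketch of that change, assuming a single list shared by all nodes really is what is wanted. The wrapper class SharedListSketch exists only to keep the example self-contained, and the code is not synchronized, so concurrent access from several threads would need extra care.

```java
import java.util.LinkedList;

class SharedListSketch {
    static class Node {
        // Constructed once, at declaration, and shared by every Node instance.
        protected static final LinkedList<String> list = new LinkedList<>();

        Node(String address, String serviceName) {
            // No list creation here, so constructing a Node no longer resets the content.
        }

        void addlist(String data) {
            list.add(data);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("10.0.0.1", "serviceA");
        a.addlist("request-1");
        Node b = new Node("10.0.0.2", "serviceB");
        b.addlist("request-2");
        System.out.println(Node.list); // prints [request-1, request-2]
    }
}
```

If each node needs its own history instead, the alternative is to keep the instance field and make sure the same Node object is reused between calls rather than constructed again.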

Jsoup: Performance of top-level select() vs. inner-level select()

My understanding is that once a document is loaded into Jsoup, using Jsoup.parse(), no parsing is required again as a neatly hierarchical tree is ready for programmer's use.
But what I am not sure about is whether a top-level select() is more costly than an inner-level select().
For example, if we have a <p> buried inside many nested <div>s, and that <p>'s parent is already available in the program, will there be any performance difference between:
document.select("p.pclass")
and
pImediateParent.select("p.pclass")
?
How does that work in Jsoup?
UPDATE: Based on the answer below, I understand that both document.select() and pImediateParent.select() use the same exact static method, just with a different root as the second parameter:
public Elements select(String query) {
    return Selector.select(query, this);
}
Which translates into:
/**
 * Find elements matching selector.
 *
 * @param query CSS selector
 * @param root root element to descend into
 * @return matching elements, empty if not
 */
public static Elements select(String query, Element root) {
    return new Selector(query, root).select();
}
I am not surprised, but the question now is how does that query work? Does it iterate to find the queried element? Is it a random access (as in hash table) query?
Yes, it will be faster if you use the intermediate parent. If you check the Jsoup source code, you'll see that Element#select() actually delegates to the Selector#select() method with the Element itself as 2nd argument. Now, the javadoc of that method says:
select
public static Elements select(String query, Element root)
Find elements matching selector.
Parameters:
query - CSS selector
root - root element to descend into
Returns:
matching elements, empty if not
Note the description of the root parameter. So yes, it does make a difference: the search only descends into the given root, so a smaller subtree means fewer elements to test. Not a shocking difference, but there is some.
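The effect of the root parameter can be illustrated with a toy tree. SimpleNode and select below are hypothetical stand-ins and not jsoup code; the sketch only assumes, as the javadoc suggests, that a query tests the root and every descendant of it, and it counts how many nodes get tested from each starting point.

```java
import java.util.ArrayList;
import java.util.List;

class SubtreeSelectSketch {
    // Hypothetical stand-in for a parsed element; not a jsoup class.
    static final class SimpleNode {
        final String tag;
        final List<SimpleNode> children = new ArrayList<>();
        SimpleNode(String tag) { this.tag = tag; }
        // Attaches the child and returns it, which makes building deep paths easy.
        SimpleNode add(SimpleNode child) { children.add(child); return child; }
    }

    // Counts matches in the subtree rooted at root; visited[0] records how many nodes were tested.
    static int select(String tag, SimpleNode root, int[] visited) {
        visited[0]++;
        int matches = root.tag.equals(tag) ? 1 : 0;
        for (SimpleNode child : root.children) matches += select(tag, child, visited);
        return matches;
    }

    public static void main(String[] args) {
        SimpleNode document = new SimpleNode("html");
        SimpleNode body = document.add(new SimpleNode("body"));
        for (int i = 0; i < 100; i++) body.add(new SimpleNode("div")); // unrelated siblings
        SimpleNode parent = body.add(new SimpleNode("div"));
        parent.add(new SimpleNode("p"));

        int[] fromDocument = {0};
        int[] fromParent = {0};
        select("p", document, fromDocument);
        select("p", parent, fromParent);
        System.out.println(fromDocument[0] + " nodes tested vs " + fromParent[0]);
    }
}
```

Both calls find the same single match; only the amount of traversal differs, and that difference grows with the size of the rest of the document.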
