Variable Declaration in multi thread usage, Java [Memory Leak Issue] - java

I'm building a crawler using Jsoup Library in Java.
The code structure is as follows:
public static BoneCP connectionPool = null;
public static Document doc = null;
public static Elements questions = null;
static
{
// Connection Pool Created here
}
In the MAIN method, I've called getSeed() method from 10 different threads.
The getSeed() method selects 1 random URL from the database and forwards it to processPage() method.
The processPage() method connects to the URL passed from getSeed() using the Jsoup library, extracts all the URLs from that page, and adds them to the database.
This process runs 24x7.
The problem is:
The processPage() method first connects to the URL sent from getSeed() using:
doc = Jsoup.connect(URL).get();
Then, for each URL found on that page, Jsoup makes a new connection:
questions = doc.select("a[href]");
for (Element link : questions)
{
    doc_child = Jsoup.connect(link.attr("abs:href")).get();
}
If I declare doc and questions as global (static) variables and set them to null once processPage() has finished, the memory leak goes away, but the other threads stop working because doc and questions get nulled while those threads are still using them. What should I do next?

Using static fields, particularly for that kind of state, screams "wrong design", and from your description the code is behaving in a very thread-unsafe way. I don't know why you think you have a memory leak, but whatever it is, it will be easier to diagnose once things are in order.
What I would say is, try getting something working based on something like this:
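class YieldLinks implements Callable<Set<URI>> {
    final URI seed;
    YieldLinks(URI seed) {
        this.seed = seed;
    }
    @Override
    public Set<URI> call() throws IOException {
        // Locals only: every call gets its own Document and Elements
        Document doc = Jsoup.connect(seed.toString()).get();
        Set<URI> links = new HashSet<>();
        for (Element link : doc.select("a[href]")) {
            links.add(URI.create(link.attr("abs:href"))); // throws on malformed hrefs
        }
        return links;
    }
}
public static void main(String[] args) throws IOException {
    Set<URI> links = new HashSet<>();
    for (URI uri : seeds) { // seeds: your collection of start URIs
        YieldLinks yieldLinks = new YieldLinks(uri);
        links.addAll(yieldLinks.call());
    }
}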
Once this single-threaded version works, you can look at adding threads.
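When you get to that step, one possible shape is sketched below. It is only a sketch under assumptions: the fetchLinks function is a hypothetical stand-in for the Jsoup call, and the "web" is an in-memory map so the example runs without network access. The point it illustrates is that each task keeps its state in its own fields and locals, and the results are merged on a single thread.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelCrawlSketch {

    /** One task per seed. All state is per-instance or local; nothing is static. */
    static final class YieldLinks implements Callable<Set<URI>> {
        private final URI seed;
        // Hypothetical stand-in for the Jsoup fetch-and-select step.
        private final Function<URI, Set<URI>> fetchLinks;

        YieldLinks(URI seed, Function<URI, Set<URI>> fetchLinks) {
            this.seed = seed;
            this.fetchLinks = fetchLinks;
        }

        @Override
        public Set<URI> call() {
            return fetchLinks.apply(seed);
        }
    }

    /** Runs one YieldLinks per seed on a fixed pool and merges results on the calling thread. */
    static Set<URI> crawl(Collection<URI> seeds, Function<URI, Set<URI>> fetchLinks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Set<URI>>> futures = new ArrayList<>();
            for (URI seed : seeds) {
                futures.add(pool.submit(new YieldLinks(seed, fetchLinks)));
            }
            Set<URI> links = new HashSet<>();
            for (Future<Set<URI>> future : futures) {
                links.addAll(future.get()); // only this thread touches 'links'
            }
            return links;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while crawling", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A crawl task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Demo on an in-memory "web" so the sketch runs offline. */
    static Set<URI> demo() {
        Map<URI, Set<URI>> fakeWeb = Map.of(
                URI.create("http://a.example/"),
                Set.of(URI.create("http://a.example/1"), URI.create("http://shared.example/")),
                URI.create("http://b.example/"),
                Set.of(URI.create("http://shared.example/")));
        return crawl(fakeWeb.keySet(), seed -> fakeWeb.getOrDefault(seed, Set.of()), 4);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because no thread ever nulls or overwrites another thread's Document, there is nothing to reset between pages; the objects become garbage as soon as each call() returns.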

Related

Get Hit Count of any url / resource using web crawler

I have made a web crawler in Java. It traverses the links present in each page recursively. Now I want to get the number of hits a particular page has received. Is that possible via a web crawler? Since we don't have access to the server code, we can't add a counter to count the hits. Please suggest a solution. Thanks.
The basic structure of the code is:
-> get the html source code of url.
-> find the reachable links from html code and put it in a list.
-> take the next link from list and continue the same till the list becomes empty.
I just want to show the hit count for each link.
One thing I can suggest is to wrap each link in a class that has a field called count to keep track of it. You then keep a list of Link objects. Example below:
public class Link{
private String url;
private int count = 0;
public Link(String url){
this.url = url; // initialise your link class with a url
}
public String getUrl(){
increment();
return url;
}
public void increment(){
count++;
}
public int getCount(){
return count;
}
}
Then count it like this:
List<Link> links.... // initialise your links
Document doc = Jsoup.connect(links.get(i).getUrl()).get();
This way, every time your crawler accesses the URL, the count is incremented, so it records the total number of hits made by the crawler itself.
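An alternative to the wrapper class, assuming that "hits" means how often the crawler itself encounters a URL, is to keep the counts in a map keyed by URL. The class and method names below are hypothetical; this is a sketch, not the only way to do it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LinkCounterSketch {
    // URL -> number of times the crawler has recorded it
    private final Map<String, Integer> counts = new HashMap<>();

    /** Call this each time the crawler fetches (or finds) the given URL. */
    void record(String url) {
        counts.merge(url, 1, Integer::sum); // inserts 1 or adds 1 to the existing count
    }

    /** Returns 0 for URLs that were never recorded. */
    int countOf(String url) {
        return counts.getOrDefault(url, 0);
    }

    public static void main(String[] args) {
        LinkCounterSketch counter = new LinkCounterSketch();
        for (String url : List.of("http://example.com/a", "http://example.com/b", "http://example.com/a")) {
            counter.record(url);
        }
        System.out.println(counter.countOf("http://example.com/a"));
    }
}
```

Unlike incrementing inside getUrl(), this keeps reading a URL free of side effects, so logging or comparing links does not inflate the count. The HashMap is not thread-safe; a multithreaded crawler would need a concurrent map instead.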

Threading a recursive function

I have this recursive function that finds hrefs on a URL and adds them all to a global list. This is done synchronously and takes a long time. I have tried to do this with threading but have failed to get all threads to write to the one list. Could someone please show me how to do this with threading?
private static void buildList (String BaseURL, String base){
try{
Document doc = Jsoup.connect(BaseURL).get();
org.jsoup.select.Elements links = doc.select("a");
for(Element e: links){
//only if this website has not been visited yet
if(!urls.contains(e.attr("abs:href"))){
//eliminates pictures and pdfs
if(!e.attr("abs:href").contains(".jpg")){
if(!e.attr("abs:href").contains("#")){
if(!e.attr("abs:href").contains(".pdf")){
//makes sure it doesn't leave the website
if(e.attr("abs:href").contains(base)){
urls.add(e.attr("abs:href"));
System.out.println(e.attr("abs:href"));
//recursive call
buildList(e.attr("abs:href"),base);
}
}
}
}
}
}
} catch(IOException ex) {
}
//to print out all urls.
/*
* for(int i=0;i<urls.size();i++){
* System.out.println(urls.get(i));
* }
*/
}
This is a great use case for ForkJoin. It'll provide excellent concurrency with very simple code.
For the set of URLs already parsed, use Collections.synchronizedSet(new HashSet<String>()).
You can also create a larger ForkJoinPool than the amount of cores you have, since there's network involved (the common usage expects that each thread will be performing work at ~100%).
Use any collection from the java.util.concurrent package to store the values you get from the different threads. ArrayBloac
You can use fork/join once you have broken your problem down into a divide-and-conquer algorithm.
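To make the fork/join suggestion concrete, here is a minimal sketch. It assumes a hypothetical linksOf function in place of the Jsoup fetch and runs against an in-memory site map, and it leaves out the question's filters (.jpg, .pdf, #, base URL). The key detail is that Set.add() both checks and inserts, so two threads cannot both claim the same URL the way a separate contains() followed by add() would allow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.function.Function;

class ForkJoinCrawlSketch {

    /** Crawls one URL and forks a child task for every link not seen before. */
    static final class CrawlTask extends RecursiveAction {
        private final String url;
        private final Set<String> visited;
        // Hypothetical stand-in for fetching the page and selecting its links.
        private final Function<String, List<String>> linksOf;

        CrawlTask(String url, Set<String> visited, Function<String, List<String>> linksOf) {
            this.url = url;
            this.visited = visited;
            this.linksOf = linksOf;
        }

        @Override
        protected void compute() {
            List<CrawlTask> children = new ArrayList<>();
            for (String link : linksOf.apply(url)) {
                // add() returns true only for the first thread that inserts the link
                if (visited.add(link)) {
                    children.add(new CrawlTask(link, visited, linksOf));
                }
            }
            invokeAll(children); // runs the children in parallel and waits for them
        }
    }

    static Set<String> crawl(String start, Function<String, List<String>> linksOf, int parallelism) {
        Set<String> visited = Collections.synchronizedSet(new HashSet<>());
        visited.add(start);
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            pool.invoke(new CrawlTask(start, visited, linksOf));
        } finally {
            pool.shutdown();
        }
        return new HashSet<>(visited); // plain copy; all tasks have finished by now
    }

    /** Demo on a small in-memory site that contains a cycle. */
    static Set<String> demo() {
        Map<String, List<String>> site = Map.of(
                "http://site.example/", List.of("http://site.example/a", "http://site.example/b"),
                "http://site.example/a", List.of("http://site.example/b", "http://site.example/"),
                "http://site.example/b", List.of("http://site.example/c"));
        return crawl("http://site.example/", u -> site.getOrDefault(u, List.of()), 8);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The parallelism of 8 is an arbitrary illustration of the point about sizing the pool above the core count for network-bound work.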

Activating and Deactivating Entities in Dynamics CRM from Java API

I am using the Microsoft Dynamics CRM, using the Java API generated as per their tutorial and SDK downloads.
I can create, delete, and update entities with no problems.
I am now at the stage where I need to set entities to active or inactive.
I had thought that the right way to do this was roughly
public void doIt(OrganisationServicesStub stub, OptionSetValue stateValue, OptionSetValue statusValue)
{
Guid g = new Guid();
g.setGuid("abc-def-ghijkl");
Entity updateMe = new Entity();
updateMe.setId(g);
updateMe.setLogicalName("ei_teacherdetails");
AttributeCollection updateCollection = new AttributeCollection();
updateCollection.addKeyValuePairOfstringanyType(pair("statecode", stateValue));
updateCollection.addKeyValuePairOfstringanyType(pair("statuscode", statusValue));
updateMe.setAttributes(updateCollection);
update.setEntity(updateMe);
stub.update(update);
}
public static KeyValuePairOfstringanyType pair(String key, Object value)
{
KeyValuePairOfstringanyType attr = new KeyValuePairOfstringanyType();
attr.setKey(key);
attr.setValue(value);
return attr;
}
The above code has been tested and works for updating any attributes except the state/status ones. When I use it to update the state/status, however, I get the following error (calling with state/status values of 1 and 2 respectively; I got those values by looking at existing Invalid entries in the CRM dumped through the same API, so I am almost certain that they are correct):
org.apache.axis2.AxisFault: 2 is not a valid status code for state code ei_teacherdetailsState.Active
I have noticed that in other languages, there is a SetState request, but I don't find a similar one in Java.
If anyone has been down this path before me, I'd greatly appreciate any assistance you could give.
It turns out that the correct answer is as follows, as best I can tell....
private void doIt(OrganizationServiceStub stub, OptionSetValue state, OptionSetValue status)
{
OrganizationRequest request = new OrganizationRequest();
request.setRequestName("SetState");
ParameterCollection collection = new ParameterCollection();
collection.addKeyValuePairOfstringanyType(pair("State", state));
collection.addKeyValuePairOfstringanyType(pair("Status", status));
request.setParameters(collection);
Guid g = new Guid();
g.setGuid("abc0def-ghi");
EntityReference ref = new EntityReference();
ref.setId(g);
ref.setLogicalName("ei_teacherdetails");
collection.addKeyValuePairOfstringanyType(pair("EntityMoniker", ref));
Execute exe = new Execute();
exe.setRequest(request);
stub.execute(exe);
}
Which is pretty obscure, I think. I especially like that there's a parameter called "EntityMoniker". Anyway, I'm leaving this answer here in case some other poor soul ends up having to deal with this MS CRM intricacy.

How to use R model in java to predict with multiple models?

I have this constructor:
public Revaluator(File model,PrintStream ps) {
modelFile=model;
rsession=Rsession.newInstanceTry(ps, null);
rsession.eval("library(e1071)");
rsession.load(modelFile);
}
I want to load a model and predict with it.
The problem is that Rsession.newInstanceTry(ps, null); always returns the same session, so if I load another model, like:
Revaluator re1=new Revaluator(new File("model1.RData"),System.out);
Revaluator re2=new Revaluator(new File("model2.RData"),System.out);
Both re1 and re2 end up using the same model: the variable name is model in both files, so only the last one loaded survives.
The evaluate function:
public REXP evaluate(Object[] arr){
String expression=String.format("predict(model, c(%s))", J2Rarray(arr));
REXP ans=rsession.eval(expression);
return ans;
}
//J2Rarray just creates a string from the array like "1,2,true,'hello',false"
I need to load about 250 predictors. Is there a way to make every instance of Rsession a new, separate R session?
You haven't pasted all of your code in your question, so before trying the (complicated) way below, please rule out the simple causes and make sure that your fields modelFile and rsession are not declared static :-)
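To illustrate that first check, here is a small sketch with hypothetical classes (it does not use Rsession at all) showing what happens when the field is static versus per-instance:

```java
class StaticFieldPitfall {

    /** Buggy variant: the field is static, so every instance shares one value. */
    static final class SharedRevaluator {
        private static String modelFile;

        SharedRevaluator(String modelFile) {
            SharedRevaluator.modelFile = modelFile; // overwrites the value for ALL instances
        }

        String modelFile() {
            return modelFile;
        }
    }

    /** Intended variant: each instance keeps its own value. */
    static final class OwnRevaluator {
        private final String modelFile;

        OwnRevaluator(String modelFile) {
            this.modelFile = modelFile;
        }

        String modelFile() {
            return modelFile;
        }
    }

    public static void main(String[] args) {
        SharedRevaluator s1 = new SharedRevaluator("model1.RData");
        SharedRevaluator s2 = new SharedRevaluator("model2.RData");
        System.out.println(s1.modelFile() + " " + s2.modelFile());

        OwnRevaluator o1 = new OwnRevaluator("model1.RData");
        OwnRevaluator o2 = new OwnRevaluator("model2.RData");
        System.out.println(o1.modelFile() + " " + o2.modelFile());
    }
}
```

The first line printed is model2.RData twice, the second is model1.RData followed by model2.RData. That is the same symptom you describe for re1 and re2, so it is worth confirming that your fields are not static.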
If they are not:
It seems that the way R sessions are created is OS dependent.
On Unix it relies on the multi-session ability of R itself. On Windows it starts with port 6311 and checks whether it is still free; if it is not, the port is incremented and checked again, and so on.
Maybe something goes wrong with checking free ports (which OS are you working on?).
You could try to configure the ports manually and explicitly start different local R servers like this:
Logger simpleLogger = new Logger() {
public void println(String string, Level level) {
if (level == Level.WARNING) {
p.print("! ");
} else if (level == Level.ERROR) {
p.print("!! ");
}
p.println(string);
}
public void close() {
p.close();
}
};
RserverConf serverConf = new RserverConf(null, staticPortCounter++, null, null, null);
Rdaemon server = new Rdaemon(serverConf, this);
server.start(null);
rsession = Rsession.newInstanceTry(serverConf);
If that does not work, please show more code of your Revaluator class and give details about which OS you are running on. Also, there should be several log outputs (at least if the log level is configured accordingly). Please paste the logged messages as well.
It could also help to get the Rsession source code from Google Code and set a debugger breakpoint in Rsession.begin() to figure out what goes wrong.

Java: ExceptionInInitializerError caused by NullPointerException when constructing a Locale object

I'm working on localization for a program I've written with a couple other guys. Most of the strings now load in the appropriate language from an ini file. I'm trying to do the same with the format of currency in the program. However, I'm getting a runtime exception as soon as I attempt to launch the application.
I'm using the Locale object as a parameter to a few NumberFormat.getCurrencyInstance()'s, like so:
private static final NumberFormat decf;
static
{
decf = NumberFormat.getCurrencyInstance(Lang.cLocale);
decf.setRoundingMode(RoundingMode.HALF_UP);
}
Lang is the class which contains all the localization stuff. The code the IDE complains about at attempted runtime is public static Locale cLocale = new Locale(GUI.DB_info[19],GUI.DB_info[20]);
GUI is the class the GUI is contained in, and where we decided to construct the DB_info array (which itself just contains information loaded from a remote database in another class). DB_info[19] is the language code (es right now) and DB_info[20] is the country code (US). The array elements are being properly filled-- or were, I can't get far enough into the program to tell right now; but nothing has changed with the code for filling DB_info.
The full exception is as follows:
Exception in thread "main" java.lang.ExceptionInInitializerError
at greetingCard.GUI.<clinit>(GUI.java:118)
Caused by: java.lang.NullPointerException
at java.util.Locale.<init>(Unknown Source)
at java.util.Locale.<init>(Unknown Source)
at greetingCard.Lang.<clinit>(Lang.java:13)
... 1 more
The line in GUI referenced is: static String welcome = Lang.L_WELCOME + ", " + empName;, and Lang.java basically looks like this:
// Set locale for currency display
public static Locale cLocale = new Locale(GUI.DB_info[19],GUI.DB_info[20]); // language, country
// Employee specific strings
public static String L_AMT_REMAIN = "";
public static String L_AMT_TEND = "";
public static String L_APPROVED = "";
public static String L_ARE_YOU_SURE = "";
[...]
public static void Main(String emp_lang)
{
String header = "";
if (emp_lang.equals("ENG"))
{
header = "ENG";
}
else if (emp_lang.equals("SPA"))
{
header = "SPA";
}
else if (emp_lang.equals("FRE"))
{
header = "FRE";
}
else if (emp_lang.equals("GER"))
{
header = "GER";
}
else
{
header = "ENG";
}
try
{
Ini ini = new Ini(new File("C:/lang.ini"));
L_AMT_REMAIN = ini.get(header, "L_AMT_REMAIN");
L_AMT_TEND = ini.get(header, "L_AMT_TEND");
L_APPROVED = ini.get(header, "L_APPROVED");
L_ARE_YOU_SURE = ini.get(header, "L_ARE_YOU_SURE");
[...]
L_WELCOME = ini.get(header, "L_WELCOME");
L_WELCOME2 = ini.get(header, "L_WELCOME2");
L_XACT_CHNG = ini.get(header, "L_XACT_CHNG");
L_YES = ini.get(header, "L_YES");
System.err.println("Employee Language: " + header);
}
catch (InvalidFileFormatException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
} // end public static void main
That's for the majority of the strings to be displayed in different languages. There is another method inside Lang that loads some other strings, independent of the first set. I don't believe it factors into this problem but I can post it if needed.
The order in which these classes/methods get launched is as follows:
GUI.Main calls the Login class, which calls a CreateLogin method. That method calls Clients.main, which gets the DB_info array from GUI passed to it. Clients fills the DB_info array. Lang.other is then called (to get language-specific strings for the login page), and the Login buttons and labels are created. Once a login is successful, the preferred language of the employee logging in (from a DB) is passed to Lang.main to load the other strings (hence the emp_lang being passed in the code above).
Up until I added the code for the Locale object, all of this worked fine. Now I get the ExceptionInInitializerError exception. Anyone know what's going on?
BTW, for loading from the ini file I'm using ini4j. Some forum posts I found while googling suggest this is a problem with that, but I don't see how it relates to the problem with Locale objects. The ini stuff works (worked) fine.
Sounds like you have a cycle in your static initializers, so something is not initialized yet.
GUI triggers Lang's static initializer before getting Lang.L_WELCOME. Lang in turn reaches into GUI's static state in line 2 of your snippet. Your exception trace shows exactly that: GUI's static initializer is still running when Lang's static initializer fails.
In all, cycles like this mean that someone is going to reference a statically initialized object and get null instead of what they expected to get. In this case, I suspect Lang.java, line 2, is passing two null pointers to the Locale constructor.
As Keith notes, you have a static initializer cycle. To help future readers...
To minimize these bugs, initialize simple constants (those with no or minimal constructors) before complex variables; here, that means the String fields before the Locale. This leaves less room for cycles to cause problems.
Debugging-wise, a NullPointerException on a static field plus two <clinit> frames in the stack trace, with the earlier class appearing in the failing line, are the clues that this is an uninitialized field caused by a static initializer cycle.
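For future readers, the mechanism can be reproduced in a few lines. The sketch below uses hypothetical Gui and Lang classes, not the asker's code, and reports the null instead of crashing, so the order of events is easier to see.

```java
class StaticCycleDemo {

    static class Gui {
        // Step 1: reading Lang.prefix triggers Lang's static initializer
        // while Gui's own initializer is still running.
        static final String WELCOME = Lang.prefix + "!";
        // Step 3: assigned only after WELCOME, i.e. after Lang has already looked at it.
        static final String[] DB_INFO = {"es", "US"};
    }

    static class Lang {
        // Step 2: Gui is mid-initialization, so DB_INFO still holds its default value, null.
        static final String SEEN_LANGUAGE = (Gui.DB_INFO == null) ? "null" : Gui.DB_INFO[0];
        // Deliberately not final, so it is not inlined as a compile-time constant.
        static String prefix = "Welcome";
    }

    /** Touches Gui first (as in the question) and returns what Lang saw. */
    static String languageSeenByLang() {
        String welcome = Gui.WELCOME; // starts Gui's static initialization
        return Lang.SEEN_LANGUAGE;
    }

    public static void main(String[] args) {
        System.out.println(languageSeenByLang());
    }
}
```

This prints null even though DB_INFO is declared with a value: Lang ran in the middle of Gui's initialization. Passing such a null straight to a constructor like Locale's is what turns the cycle into an ExceptionInInitializerError. One common way out is to avoid reading another class's static state from a static initializer at all, for example by building the Locale inside a method that is called after the data has been loaded.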
