Java Servlet with Multi-threading

I am trying to create multiple output text data files based on the data present in the servlet request. The constraints to my servlet are that:
My servlet waits for enough requests to hit a threshold (for example 20 names in a file) before producing a file
Otherwise it will timeout after a minute and produce a file
The code I have written is such that:
doGet is not synchronized
Within doGet I am creating a new thread pool (the reason being that the application calling my servlet will not send the next request until my servlet returns a response, so I validate the request and return an instant acknowledgement in order to receive new requests)
Pass all request data to the thread created in the new thread pool
Invoke a synchronized function to do thread counting and file printing
I am using wait(60000). The problem is that the code produces files with the correct threshold (of names) within a minute, but after the one-minute timeout, a few of the files produced exceed the capacity; that is, they contain more names than the capacity I have defined.
I think it has something to do with the threads causing an issue when they wake up?
My code is
if(!hashmap_dob.containsKey(key)){
request_count=0;
hashmap_count.put(key, Integer.toString(request_count));
sb1 = new StringBuilder();
sb2 = new StringBuilder();
sb3 = new StringBuilder();
hashmap_dob.put(key, sb1);
hashmap_firstname.put(key, sb2);
hashmap_surname.put(key, sb3);
}
if(hashmap_dob.containsKey(key)){
request_count = Integer.parseInt(hm_count.get(key));
request_count++;
hashmap_count.put(key, Integer.toString(request_count));
hashmap_filehasbeenprinted.put(key, Boolean.toString(fileHasBeenPrinted));
}
hashmap_dob.get(key).append(dateofbirth + "-");
hashmap_firstname.get(key).append(firstName + "-");
hashmap_surname.get(key).append(surname + "-");
if (hashmap_count.get(key).equals(capacity)){
request_count = 0;
dob = hashmap_dob.get(key).toString();
firstname = hashmap_firstname.get(key).toString();
surname = hashmap_surname.get(key).toString();
produceFile(required String parameters for file printing);
fileHasBeenPrinted = true;
sb1 = new StringBuilder();
sb2 = new StringBuilder();
sb3 = new StringBuilder();
hashmap_dob.put(key, sb1);
hashmap_firstname.put(key, sb2);
hashmap_surname.put(key, sb3);
hashmap_count.put(key, Integer.toString(request_count));
hashmap_filehasbeenprinted.put(key, Boolean.toString(fileHasBeenPrinted));
}
try{
wait(Long.parseLong(listenerWaitingTime));
}catch (InterruptedException ie){
System.out.println("Thread interrupted from wait");
}
if(hashmap_filehasbeenprinted.get(key).equals("false")){
dob = hashmap_dob.get(key).toString();
firstname = hashmap_firstname.get(key).toString();
surname = hm_surname.get(key).toString();
produceFile(required String parameters for file printing );
sb1 = new StringBuilder();
sb2 = new StringBuilder();
sb3 = new StringBuilder();
hashmap_dob.put(key, sb1);
hashmap_firstname.put(key, sb2);
hashmap_surname.put(key, sb3);
fileHasBeenPrinted= true;
request_count =0;
hashmap_filehasbeenprinted.put(key, Boolean.toString(fileHasBeenPrinted));
hashmap_count.put(key, Integer.toString(request_count));
}
If you have got this far, then thank you for reading my question, and thanks in advance if you have any thoughts on how to resolve it!

I didn't look at your code but I find your approach pretty complicated. Try this instead:
Create a BlockingQueue for the data to work on.
In the servlet, put the data into a queue and return.
Create a single worker thread at startup which pulls data from the queue with a timeout of 60 seconds and collects them in a list.
If the list has enough elements or when a timeout occurs, write a new file.
Create the thread and the queue in a ServletContextListener. Interrupt the thread to stop it. In the thread, flush the last remaining items to the file when you receive an InterruptedException while waiting on the queue.
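The steps above could be sketched roughly as follows. This is a minimal illustration under assumptions, not a drop-in implementation: BatchCollector, nextBatch and writeFile are hypothetical names, the capacity of 20 and the 60-second timeout are taken from the question, and writeFile only prints where real code would produce the file. In a servlet container the thread would be started and interrupted from a ServletContextListener rather than from main.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchCollector {
    // Collects up to `capacity` items, or fewer if `timeoutMillis` elapses first
    // or the calling thread is interrupted. The deadline covers the whole batch.
    static List<String> nextBatch(BlockingQueue<String> queue, int capacity, long timeoutMillis) {
        List<String> batch = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (batch.size() < capacity) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break;                              // timeout reached
                }
                String item = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (item == null) {
                    break;                              // timed out while waiting
                }
                batch.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();         // keep the interrupt visible to the caller
        }
        return batch;
    }

    // Hypothetical stand-in for the real file-producing code.
    static void writeFile(List<String> batch) {
        System.out.println("Writing file with " + batch.size() + " names: " + batch);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                List<String> batch = nextBatch(queue, 20, 60_000);
                if (!batch.isEmpty()) {
                    writeFile(batch);                   // full batch, timeout, or final flush
                }
            }
        }, "file-writer");
        worker.start();

        for (int i = 0; i < 45; i++) {
            queue.put("name" + i);                      // what doGet would do before returning
        }
        Thread.sleep(500);
        worker.interrupt();                             // in a container: from contextDestroyed()
        worker.join();
    }
}
```

Because only the single worker thread ever touches the batch, no synchronized block or shared counter is needed, which avoids the over-capacity files described in the question.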

As I understand it, you want to produce a new file in two situations:
The number of requests hits a predefined threshold.
The timeout expires.
I would suggest the following:
Use an application-scoped variable: requestMap, containing HttpServletRequest objects.
On every servlet hit, just add the received request to the map.
Now create a listener or filter, requestMonitor, whichever is suitable, to monitor the contents of requestMap.
requestMonitor should check whether requestMap has grown to the predefined threshold.
If it has not, then it should allow the servlet to add the request object.
If it has, then it should print the file, empty requestMap, and then allow the servlet to add the next request.
For the timeout, you can check when the last file was produced with a LAST_FILE_PRODUCED variable in application scope. This should be updated every time a file is produced.

I tried to read your code, but there is a lot of information missing, so if you could please give more details:
1) the indentation is messed up and I'm not sure if there were some mistakes introduced when you copied your code.
2) What is the code you are posting? Is it the code that is called on some other thread by doGet?
3) Maybe you could also add the variable declarations. Are those thread safe types (ConcurrentHashMap)?
4) I'm not sure we have all the information about fileHasBeenPrinted. Also it seems to be a Boolean, which is not thread safe.
5) you talk about "synchronized" functions, but you did not include those.
EDIT:
If the code you copied is a synchronized method, that means that if you have many requests, only one of them ever runs at a given time. It seems the 60-second wait is always invoked (it is not quite clear with the indentation, but I think there is always a 60-second wait, whether the file is written or not). So you lock the synchronized method for 60 seconds before another thread (request) can be processed. That could explain why you are not writing the file after 20 requests, since more than 20 requests can arrive within 60 seconds.

Related

Camunda - executing processes in specific order

Let's say that we have business process A. Process A might take more or less time (it's not known).
Normally you can have multiple A processes, but sometimes during some operations we need to make sure that one process execution is made after previous one.
How can we achieve it in Camunda? Tried to find something like process dependency (so process starts after previous one is complete), but couldn't find anything :(
I thought about adding some variable in process (like depending_process) and checking if specified process is done, but maybe there would be some better solution.
OK, after some research I found a solution.
At the beginning of every process I check for processes started by the current user:
final DateTime selfOrderDate = (DateTime) execution.getVariable(PROCESS_ORDER_DATE);
List<ProcessInstance> processInstanceList = execution
.getProcessEngineServices()
.getRuntimeService()
.createProcessInstanceQuery()
.processDefinitionId(execution.getProcessDefinitionId())
.variableValueEquals(CUSTOMER_ID, execution.getVariable(CUSTOMER_ID))
.active()
.list();
int processesOrderedBeforeCurrentCount = 0;
for (ProcessInstance processInstance : processInstanceList) {
ExecutionEntity entity = (ExecutionEntity) processInstance;
if (processInstance.getId().equals(execution.getId()))
continue;
DateTime orderDate = (DateTime) entity.getVariable(PROCESS_ORDER_DATE);
if (selfOrderDate.isAfter(orderDate)) {
processesOrderedBeforeCurrentCount += 1;
}
}
Then I save the number of previously started processes to Camunda, and in the next task I check whether it is equal to 0. If it is, I proceed; if not, I wait 1s (using Camunda's timer) and check again.

Is there any way to refer to, or convert a thread object as a string?

I am currently playing with an echo server and echo client, the basis of which was provided by my lecturer. Each client connects to a socket thread which is started by start() in the EchoServer class. Anyway, I put the line "System.out.println(this);" within the loop in the server class.
Thankfully this gives the output "Thread[Thread-0,5,main]","Thread[Thread-1,5,main]", depending on which thread it is, the first or second respectively. I want to be able to say:
if (this == "Thread[Thread-1,5,main]"){
do so and so
}else{
do so and so
}
However, "Thread[Thread-1,5,main]" is not a string, so is there a way I can refer to the current thread as a string, or how else can I refer to it or convert it?
Also, sorry for not sharing the code; I'm just not sure whether that is permitted, considering it belongs to my lecturer.
I think you may be looking for Thread#getName and Thread#setName.
However, as hexafraction pointed out, if you have references to the threads, just do the comparison directly:
Thread t1 = new Thread(/*...*/);
Thread t2 = new Thread(/*...*/);
// ...later...
if (Thread.currentThread() == t1) {
// It's t1
}
Thread.currentThread returns a reference to the currently-executing thread.
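If you would rather go by name, as the first suggestion hints, a small sketch could look like this. The class ThreadNameDemo and the names "client-1" and "client-2" are made up for illustration; the point is that getName() returns a real String, which you compare with equals() rather than ==.

```java
class ThreadNameDemo {
    // Returns a label based on the name of whichever thread calls this method.
    static String describeCurrentThread() {
        String name = Thread.currentThread().getName();
        if (name.equals("client-1")) {          // compare String contents with equals(), not ==
            return "first client";
        }
        return "other: " + name;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(describeCurrentThread());

        Thread t1 = new Thread(task, "client-1");   // name given in the constructor
        Thread t2 = new Thread(task);
        t2.setName("client-2");                     // or set before the thread starts

        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Giving the threads explicit names also avoids depending on the default "Thread-0", "Thread-1" numbering, which only reflects creation order.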

Java loop for reading multiple URL's is slowing down

I have written some code to comb through approximately 10000 web pages on a website to put together a profile of the user demographics on the website. The basis of the program is to read each line of the source code of the website, parse out the data wanted, then move onto the next page.
I am encountering an issue where around the 650th page or so, the program goes from reading around 3 pages per second to 1 page per 10-15 seconds. It always occurs at the same point of the program execution. I began wondering if this might be a memory issue with my program and began to check each aspect of it. Eventually I stripped the program down to its basics:
Step 1) Create an array of URL objects.
Step 2) Loop through the array and open/close a buffered reader to read each line.
Step 3) Read the entire page and move onto the next line.
Even this slowed down in the exact spot, so this isn't a problem with the data I am parsing or where I am storing it. It is a result of this loop somehow. I am wondering if there is a memory issue with what I have written that is causing issues? Otherwise my only guess is somehow I am making calls too quickly to the website servers and it is intentionally slowing me down.
Obviously this is not the best-written code, as I am new and prone to sloppy coding, but it does exactly what I want. The issue is that it slows to a crawl after about ten minutes, which won't work.
Here is the relevant code:
Array code
import java.io.IOException;
import java.net.URL;
public class UrlArrayBuild {
private int page_count; //number of pages
public URL[] urlArray; //array of webpage url's
public UrlArrayBuild(int page) { //object constructor
page_count = page; //initializes page_count
urlArray = new URL[page_count]; //allocates the array of URLs
}
protected void buildArray() throws IOException { // method assigns strings to UrlArray object
int count; //counter for iteration
for(int i = 0; i < page_count; i++) { //loops through
count = i * 60; //sets user number at end of page
URL website = new URL("http://...." + count);
urlArray[i] = website; //url address
//System.out.println(urlArray[i]); //debug
}
}
protected URL returnArrayValue(int index) { //method returns string value in array of given index
//System.out.println(urlArray[index]); //debug
return urlArray[index];
}
protected int returnArrayLength() { //method returns length of array
//System.out.println(urlArray.length); //debug
return urlArray.length;
}
}
Reader Code
import java.net.*;
import java.io.*;
public class DataReader {
public static void main(String[] args) throws IOException {
UrlArrayBuild PrimaryArray = new UrlArrayBuild(9642); //Creates array object
PrimaryArray.buildArray(); //Builds array
//Create and initialize variables to use in loop
URL website = null;
String inputLine = null;
//Loops through array and reads source code
for (int i = 0; i < PrimaryArray.returnArrayLength(); i++) {
try {
website = PrimaryArray.returnArrayValue(i); //acquires url
BufferedReader inputStream = new BufferedReader(new InputStreamReader(website.openStream())); //reads url source code
System.out.println(PrimaryArray.returnArrayValue(i)); //prints out website url. I use it as a check to monitor progress
while((inputLine = inputStream.readLine()) != null) {
if (inputLine.isEmpty()) { //checks for blank lines
continue;
} else {
//begin parsing code. This is currently commented so there is nothing that occurs here
}
}
inputStream.close();
} finally {
//extraneous code here currently commented out.
}
}
}
}
Some delays are caused by the websites themselves, especially if they are rich in content. This might be a reason.
Parsing can also be a factor in some delays. Therefore, I personally suggest using a library for parsing, which might be better optimized.
Good luck!
Multithread the application so requests can run concurrently. Or,
Rearchitect to use asynchronous IO / HTTP requests. Netty or MINA, or possibly just raw NIO.
Both of these related solutions are a lot of work, but a sophisticated solution is unfortunately required to deal with your problem. Basically, asynchronous frameworks exist to solve exactly this problem.
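As a rough sketch of the first option, a fixed-size thread pool from java.util.concurrent can run the page reads concurrently. Everything named here is hypothetical (ConcurrentFetcher, fetchAll, and the fetch function passed in); the demo in main uses String::length as a placeholder where real code would open and read the URL, so that the sketch runs without network access. If the slowdown is caused by the server throttling requests, more threads may not help.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ConcurrentFetcher {
    // Applies `fetch` to every address on a fixed-size pool.
    // Results are returned in the same order as the input addresses.
    static <R> List<R> fetchAll(List<String> addresses, Function<String, R> fetch, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String address : addresses) {
                Callable<R> task = () -> fetch.apply(address);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());              // waits for that task to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for pages", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch task failed", e.getCause());
        } finally {
            pool.shutdownNow();                         // release the worker threads
        }
    }

    public static void main(String[] args) {
        // Placeholder fetch function: real code would read the page behind each address.
        List<Integer> lengths =
                fetchAll(Arrays.asList("page0", "page60", "page120"), String::length, 3);
        System.out.println(lengths);
    }
}
```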
I think that when looping through the array, you can use multi-threading and asynchronous Java method invocation to improve performance.
There is nothing obviously wrong with your code that would explain this. Certainly not in the code that you have shown us. Your code is not saving anything that is being read so it can't be leaking memory that way. And it shouldn't leak resources ... because if there are any I/O exceptions, the application terminates immediately.
(However, if your code did attempt to continue after I/O exceptions, then you would need to move the close() call into the finally block to avoid the socket / file descriptor leakage.)
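One way to get that guarantee without writing the finally block by hand is try-with-resources. The sketch below is only an illustration: SafeLineReader and countNonBlankLines are hypothetical names, and a StringReader stands in for new InputStreamReader(website.openStream()) so it runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SafeLineReader {
    // Counts the non-blank lines of `source`.
    // try-with-resources closes the reader even if readLine() throws.
    static int countNonBlankLines(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            int count = 0;
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isEmpty()) {
                    count++;                    // the parsing code would go here
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in input; real code would wrap the URL's input stream instead.
        System.out.println(countNonBlankLines(new StringReader("a\n\nb\n")));  // prints 2
    }
}
```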
It is most likely a server-side or (possibly) a network issue:
Look to see if there is something unusual about the pages at around the 650 page mark. Are they bigger? Do they entail extra server-side processing (meaning they will be delivered more slowly)?
Look at the server-side load (while the application is running) and its log files.
Check to see if some kind of server request throttling has been implemented; e.g. as an anti-DoS measure.
Check to see if some kind of network traffic throttling has been implemented.
Also check on the client-side resource usage. I would expect CPU usage to either stay constant, or tail off at the 650 page mark. If CPU usage increases, that would cast suspicion back onto the application.

How to wait and notify between separate objects in Java?

General purpose of program
To read in a bash-pattern and specified location from command line, and find all files matching that pattern in the location but I have to make the program multi-threaded.
General structure of the program
Driver/Main Class which parses arguments and initiates other classes.
ProcessDirectories Class which adds all directory addresses found from the specified root directory to a string array for processing later
DirectoryData Class which holds the addresses found in the above class
ProcessMatches Class which examines each directory found, and adds any files inside that match the pattern to a string array for printing results later
Main/Driver once again takes over and prints the results :)
The Problem
I need to be processing matches even whilst the ProcessDirectories class is still working (for efficiency so I don't unnecessarily wait for the list to populate before doing work). To do this I try to: a) make ProcessMatches threads wait() if DirectoryData is empty b) make ProcessDirectories notifyAll() if added a new entry.
The Question :)
Every tutorial I look at is focused on the producer and consumer being in the same object, or dealing with just one data structure. How can I do this when I am using more than one data structure and more than one class for producing and consuming?
How about something like:
class Driver(String args)
{
ProcessDirectories pd = ...
BlockingQueue<DirectoryData> dirQueue = new LinkedBlockingQueue<DirectoryData>();
new Thread(new Runnable(){public void run(){pd.addDirs(dirQueue);}}).start();
ProcessMatches pm = ...
BlockingQueue<File> fileQueue = new LinkedBlockingQueue<File>();
new Thread(new Runnable()
{
public void run()
{
for (DirectoryData dir = dirQueue.take(); dir != DIR_POISON; dir = dirQueue.take())
{
for (File file : dir.getFiles())
{
if (pm.matches(file))
fileQueue.add(file);
}
}
fileQueue.add(FILE_POISON);
}
}).start();
for (File file = fileQueue.take(); file != FILE_POISON; file = fileQueue.take())
{
output(file);
}
}
This is just a rough idea of course. ProcessDirectories.addDirs() would just add DirectoryData objects to the queue. In production you'd want to name the threads. Perhaps use an executor to manage the threads. Perhaps use some mechanism other than a poison message to indicate the end of processing. Also, you might want to reduce the limit on the queue size.
Have one data structure that holds the data the two threads exchange with each other. This can be a queue that has "get data from queue, waiting if empty" and "put data on queue, waiting if full" functions. Those functions should internally call notify and wait on the queue itself, and they should be synchronized on that queue.
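A minimal sketch of such a queue, assuming a fixed capacity, might look like the following. SimpleBlockingQueue and the "<done>" end marker are hypothetical; in practice java.util.concurrent.LinkedBlockingQueue already provides the same behaviour. The producer and consumer can live in entirely separate classes, because both only need a reference to the shared queue object.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SimpleBlockingQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;

    SimpleBlockingQueue(int capacity) {
        this.capacity = capacity;
    }

    // Adds an item, waiting while the queue is full.
    synchronized void put(T item) throws InterruptedException {
        while (items.size() == capacity) {
            wait();                     // releases the queue's monitor while waiting
        }
        items.addLast(item);
        notifyAll();                    // wake consumers waiting in take()
    }

    // Removes the oldest item, waiting while the queue is empty.
    synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();
        }
        T item = items.removeFirst();
        notifyAll();                    // wake producers waiting in put()
        return item;
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleBlockingQueue<String> dirs = new SimpleBlockingQueue<>(2);
        final String done = "<done>";   // hypothetical end-of-input marker

        Thread producer = new Thread(() -> {
            try {
                for (String d : new String[] {"/tmp/a", "/tmp/b", "/tmp/c"}) {
                    dirs.put(d);
                }
                dirs.put(done);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "directory-producer");
        producer.start();

        // The consumer (here the main thread) starts working before the producer finishes.
        for (String d = dirs.take(); !d.equals(done); d = dirs.take()) {
            System.out.println("processing " + d);
        }
        producer.join();
    }
}
```

The wait() calls sit inside while loops rather than if statements so that a thread rechecks the condition after every wake-up.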

Java Selector NIO Reading problem

Relevant Code
-- Note: Instructions is merely a class with several methods which operate on the data. A new thread is created to operate on the data read.
READ THREAD:
while(true) {
System.out.println(".");
if(selector.select(500) == 0)
continue;
System.out.println("processing read");
for(SelectionKey sk : selector.keys()) {
Instructions ins = myHashTable.get(sk);
if(ins == null) {
myHashTable.put(sk, new Instructions(sk));
ins = myHashTable.get(sk);
}
ins.readChannel();
}
}
READCHANNEL
public void readChannel() {
BufferedReader reader = new BufferedReader(Channels.newReader((ReadableByteChannel) this.myKey.channel(), "UTF-8"));
Worker w = new Worker(this, reader.readLine());
(new Thread(w)).start();
}
The new thread then calls more Instructions methods.
When the ins function finishes it might write to a Writer:
Writer out = Channels.newWriter((WritableByteChannel) key.channel(), "UTF-8");
out.write(output);
out.flush();
I can confirm that my client (a flash movie), then receives and acts on the output.
Finally, w exits.
After the receipt of the first message from the client, and successful processing, however, no more messages are taken care of by the READ THREAD loop. I believe the key is registered with the selector and ready to read. I have checked by looping on all the keys to see if they are readable with isReadable & isRegistered on the channel and the result is true in all cases to date. When a second message is sent from the client, however, the only response I see in the read thread is that the '.' character is printed out not every half second, but continuously faster. I believe, then, that the data is there in the channel, but for some reason the Selector isn't selecting any key.
Can anyone help me?
I think you are missing a few points here.
Firstly, you should use selector.selectedKeys() in the for loop, as mentioned by Vijay.
Secondly, you should remove the key from selectedKeys after it has been processed. Otherwise the key is not removed automatically, and the selector might spin continuously even if only one key has its interest-ops bit set. (This might be the issue in your case.)
Finally, operate on a channel only if it is ready for that operation: read only if isReadable() returns true and write only if isWritable() is true. Don't forget to validate the key.
Shouldn't
for(SelectionKey sk : selector.keys())
be
for(SelectionKey sk : selector.selectedKeys())
Since you would like to process only those events that have occurred in the current select operation ?
Since you say that select(500) returns before its 500 ms timeout has elapsed, my guess is that you have registered a channel with the selector for the WRITE operation. A channel is ready for writing most of the time. Hence it is necessary to set the interest ops to WRITE only when data is available for writing.
Note that you have to remove the key from the set of selected keys. select() won't do that for you. It is better to use an iterator for this purpose:
Iterator<SelectionKey> key_iterator = selector.selectedKeys().iterator();
while (key_iterator.hasNext()) {
...
key_iterator.remove();
}
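Putting the suggestions from both answers together, one select pass could be sketched as below. This is an assumption-laden illustration rather than a fix for the original program: SelectedKeysDemo and readReady are hypothetical names, and a java.nio.channels.Pipe stands in for the client socket so that the example runs without a network. It iterates selectedKeys(), removes each key after taking it, and reads only from keys that are valid and readable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectedKeysDemo {
    // Runs one select() pass and returns the text read from all readable channels.
    static String readReady(Selector selector, long timeoutMillis) {
        StringBuilder received = new StringBuilder();
        try {
            if (selector.select(timeoutMillis) == 0) {
                return "";                              // nothing became ready in time
            }
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();                            // select() never clears this set itself
                if (key.isValid() && key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    int n = ((ReadableByteChannel) key.channel()).read(buf);
                    if (n < 0) {
                        key.cancel();                   // the other end closed the channel
                        continue;
                    }
                    buf.flip();
                    received.append(StandardCharsets.UTF_8.decode(buf).toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received.toString();
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();                        // stand-in for the client connection
        pipe.source().configureBlocking(false);
        try (Selector selector = Selector.open()) {
            pipe.source().register(selector, SelectionKey.OP_READ);

            pipe.sink().write(ByteBuffer.wrap("first".getBytes(StandardCharsets.UTF_8)));
            System.out.println(readReady(selector, 500));

            pipe.sink().write(ByteBuffer.wrap("second".getBytes(StandardCharsets.UTF_8)));
            System.out.println(readReady(selector, 500));
        }
    }
}
```

The read here goes straight into a ByteBuffer on the non-blocking channel; wrapping a non-blocking channel in a BufferedReader, as readChannel() in the question does, is a separate thing worth checking.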
