Insert data into Cassandra

I have data (network packets) to insert into a Cassandra database.
Unfortunately, my application needs about 1 minute to insert 10,000 packets.
Can anyone help me use Java multithreading to speed up the insertion? Here is my code:
PcapPacketHandler<String> jpacketHandler;
jpacketHandler = new PcapPacketHandler<String>() {
    GestionPacketDAO g1;
    int row = 0;

    public void nextPacket(PcapPacket packet, String user) {
        row++;
        String s = packet.toHexdump();
        try {
            g1 = new GestionPacketDAO();
            g1.Insert(s, row); // Insert is the function which inserts data into the database
        } catch (InvalidRequestException exg) {
            Logger.getLogger(AccueilInsertion.class.getName()).log(Level.SEVERE, null, exg);
        } catch (TException exg) {
            Logger.getLogger(AccueilInsertion.class.getName()).log(Level.SEVERE, null, exg);
        }
    }
};

A common pattern is:
Use a ThreadPoolExecutor with maybe 10 threads.
Use a client library that does connection pooling (e.g. Astyanax or the DataStax CQL3 java driver). Ensure there are at least as many connections as threads.
Back the ThreadPoolExecutor by a queue of fixed size (e.g. ArrayBlockingQueue)
The producer, in your case the nextPacket function, calls ThreadPoolExecutor.execute, which adds a Runnable to the queue. You need to handle when your queue is full appropriately by handling RejectedExecutionException. You can sleep and block reading your packets or drop the packet or some alternative.
An alternative is to have multiple threads running your packet handler if that is possible. Each one can have its own Cassandra connection and write directly. That will be more efficient if you can do it.
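The queue-backed pool described above could look roughly like the sketch below. It is not the answerer's code: BoundedInsertPool and PacketWriter are hypothetical names, and PacketWriter stands in for whatever performs the actual Cassandra write (for example the question's GestionPacketDAO.Insert, ideally with one DAO or connection reused per thread). When the queue is full the producer sleeps and retries, which is one of the options mentioned; dropping the packet would be another.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedInsertPool {
    /** Hypothetical stand-in for the real database write. */
    public interface PacketWriter {
        void insert(String hexdump, int row) throws Exception;
    }

    private final ThreadPoolExecutor pool;
    private final PacketWriter writer;
    private final AtomicInteger failures = new AtomicInteger();

    public BoundedInsertPool(int threads, int queueSize, PacketWriter writer) {
        this.writer = writer;
        // Fixed number of threads backed by a fixed-size queue. The default
        // rejection policy throws RejectedExecutionException when the queue is full.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize));
    }

    /** Called by the producer (e.g. from nextPacket). Sleeps and retries while the queue is full. */
    public void submit(final String hexdump, final int row) throws InterruptedException {
        Runnable task = () -> {
            try {
                writer.insert(hexdump, row);
            } catch (Exception e) {
                failures.incrementAndGet(); // a real application would log or retry here
            }
        };
        while (true) {
            try {
                pool.execute(task);
                return;
            } catch (RejectedExecutionException queueFull) {
                if (pool.isShutdown()) {
                    throw queueFull; // rejected because of shutdown, not because of a full queue
                }
                Thread.sleep(10); // back off, which also slows down packet reading
            }
        }
    }

    /** Stops accepting work and waits for queued inserts to finish. */
    public boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeout, unit);
    }

    public int failureCount() {
        return failures.get();
    }
}
```

From nextPacket you would then call something like pool.submit(packet.toHexdump(), row) instead of inserting directly.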

Related

Android - multithread TCP connection

I've been searching for an answer to my problem, but none of the solutions so far have helped me solve it. I'm working on an app that communicates with another device that works as a server. The app sends queries to the server and receives appropriate responses to dynamically create fragments.
In the first implementation the app sent the query and then waited to receive the answer in a single thread. But that solution wasn't satisfactory since the app did not receive any feedback from the server. The server admin said he was receiving the queries, however he hinted that the device was sending the answer back too fast and that the app probably wasn't already listening by the time the answer arrived.
So what I am trying to achieve is to create separate threads: one for listening and one for sending the query. The one that listens would start before we send anything to the server, to ensure the app does not miss the server response.
Implementing this hasn't been successful so far. I've tried writing and running separate Runnable classes and AsyncTasks, but the listener never received an answer, and at some points one of the threads didn't even execute. Here is the code for the AsyncTask listener:
@Override
protected String doInBackground(String... params) {
    int bufferLength = 28;
    String masterIP = "192.168.1.100";
    try {
        Log.i("TCPQuery", "Listening for ReActor answers ...");
        Socket tcpSocket = new Socket();
        SocketAddress socketAddress = new InetSocketAddress(masterIP, 50001);
        try {
            tcpSocket.connect(socketAddress);
            Log.i("TCPQuery", "Is socket connected: " + tcpSocket.isConnected());
        } catch (IOException e) {
            e.printStackTrace();
        }
        while (true) {
            Log.i("TCPQuery", "Listening ...");
            try {
                Log.i("TCPQuery", "Waiting for ReActor response ...");
                byte[] buffer = new byte[bufferLength];
                tcpSocket.getInputStream().read(buffer);
                Log.i("TCPQuery", "Received message " + Arrays.toString(buffer) + " from ReActor.");
            } catch (Exception e) {
                e.printStackTrace();
                Log.e("TCPQuery", "An error occured receiving the message.");
            }
        }
    } catch (Exception e) {
        Log.e("TCP", "Error", e);
    }
    return "";
}
And this is how the tasks are called:
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.HONEYCOMB) {
    listener.executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR, "");
    sender.executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR, "");
} else {
    listener.execute();
    sender.executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR);
}
How exactly would you approach this problem? If this code is not sufficient I would be glad to post more.

This is because, by default, Android's AsyncTask runs its tasks on a single background thread, no matter how many you create, so if you really want 2 threads running at the same time, I suggest you use the standard java.util.concurrent tools rather than AsyncTask. As explained in the documentation:
AsyncTask is designed to be a helper class around Thread and Handler
and does not constitute a generic threading framework. AsyncTasks
should ideally be used for short operations (a few seconds at the
most.) If you need to keep threads running for long periods of time,
it is highly recommended you use the various APIs provided by the
java.util.concurrent package such as Executor, ThreadPoolExecutor and
FutureTask.

This is a TCP connection, so you don't need to worry about data loss. It is a point-to-point connection, and read() only reports end of stream (-1) once the peer closes it. What you do have to take care of is the read itself, because you cannot tell in advance whether all of the data has arrived. A TCP read is a blocking call: it blocks until at least some data is available, and it may return fewer bytes than your buffer holds. And since you are on an Android device, the amount of data available at any moment can vary with the device's network. So you have 2 options:
1) Make your buffer size dynamic. First check how many bytes are available with is.available() and size your buffer accordingly. If nothing is available, sleep for a while and check again.
2) Set a read timeout on the socket. This works well, because the read returns whatever is available, and if no data arrives within the timeout period it throws a timeout exception.
Try changing your code accordingly.
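Option 2 could be sketched as follows, using Socket.setSoTimeout from the standard library. TimedRead and readWithTimeout are hypothetical names chosen for this illustration; the sketch uses the number of bytes that read() reports instead of assuming the whole buffer was filled.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.Arrays;

public class TimedRead {
    /**
     * Reads at most maxBytes from the socket.
     * Returns the bytes actually received, an empty array if nothing arrived
     * within timeoutMillis, or null if the peer closed the connection.
     */
    public static byte[] readWithTimeout(Socket socket, int maxBytes, int timeoutMillis)
            throws IOException {
        socket.setSoTimeout(timeoutMillis); // read() now throws SocketTimeoutException on timeout
        byte[] buffer = new byte[maxBytes];
        try {
            int n = socket.getInputStream().read(buffer); // may return fewer than maxBytes
            if (n < 0) {
                return null; // end of stream: the peer closed the connection
            }
            return Arrays.copyOf(buffer, n);
        } catch (SocketTimeoutException e) {
            return new byte[0]; // the socket is still usable after a timeout
        }
    }
}
```

A listener loop can call this repeatedly and treat an empty array as "nothing yet" rather than as an error.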

Java threaded socket connection timeouts

I have to make simultaneous tcp socket connections every x seconds to multiple machines, in order to get something like a status update packet.
I use a Callable thread class, which creates a future task that connects to each machine, sends a query packet, and receives a reply which is returned to the main thread that creates all the callable objects.
My socket connection class is :
public class ClientConnect implements Callable<String> {
    Connection con = null;
    Statement st = null;
    ResultSet rs = null;
    String hostipp, hostnamee;

    ClientConnect(String hostname, String hostip) {
        hostnamee = hostname;
        hostipp = hostip;
    }

    @Override
    public String call() throws Exception {
        return GetData();
    }

    private String GetData() {
        Socket so = new Socket();
        SocketAddress sa = null;
        PrintWriter out = null;
        BufferedReader in = null;
        try {
            sa = new InetSocketAddress(InetAddress.getByName(hostipp), 2223);
        } catch (UnknownHostException e1) {
            e1.printStackTrace();
        }
        try {
            so.connect(sa, 10000);
            out = new PrintWriter(so.getOutputStream(), true);
            out.println("\1IDC_UPDATE\1");
            in = new BufferedReader(new InputStreamReader(so.getInputStream()));
            String[] response = in.readLine().split("\1");
            out.close(); in.close(); so.close(); so = null;
            try {
                Integer.parseInt(response[2]);
            } catch (NumberFormatException e) {
                System.out.println("Number format exception");
                return hostnamee + "|-1";
            }
            return hostnamee + "|" + response[2];
        } catch (IOException e) {
            try {
                if (out != null) out.close();
                if (in != null) in.close();
                so.close(); so = null;
                return hostnamee + "|-1";
            } catch (IOException e1) {
                // TODO Auto-generated catch block
                return hostnamee + "|-1";
            }
        }
    }
}
And this is the way I create a pool of threads in my main class:
private void StartThreadPool()
{
    ExecutorService pool = Executors.newFixedThreadPool(30);
    List<Future<String>> list = new ArrayList<Future<String>>();
    for (Map.Entry<String, String> entry : pc_nameip.entrySet())
    {
        Callable<String> worker = new ClientConnect(entry.getKey(), entry.getValue());
        Future<String> submit = pool.submit(worker);
        list.add(submit);
    }
    for (Future<String> future : list) {
        try {
            String threadresult;
            threadresult = future.get();
            //........ PROCESS DATA HERE!..........//
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }
}
The pc_nameip map contains (hostname, hostip) pairs, and for every entry I create a ClientConnect callable.
My problem is that when my list of machines contains, let's say, 10 PCs (most of which are not alive), I get a lot of timeout exceptions (on the alive PCs) even though my timeout limit is set to 10 seconds.
If I force the list to contain a single working PC, I have no problem.
The timeouts are pretty random, and I have no clue what's causing them.
All machines are on a local network; the remote servers were also written by me (in C/C++) and have been working in another setup for more than 2 years without any problems.
Am I missing something, or could it be an OS network restriction problem?
I am testing this code on Windows XP SP3. Thanks in advance!
UPDATE:
After creating two new server machines, and keeping one that was getting a lot of timeouts, i have the following results :
For 100 thread runs over 20 minutes :
NEW_SERVER1 : 99 successful connections/ 1 timeouts
NEW_SERVER2 : 94 successful connections/ 6 timeouts
OLD_SERVER : 57 successful connections/ 43 timeouts
Other info :
- I experienced a JRE crash (EXCEPTION_ACCESS_VIOLATION (0xc0000005)) once and had to restart the application.
- I noticed that while the app was running my network connection was struggling as I was browsing the internet. I have no idea if this is expected, but I think having at most 15 threads is not that much.
So, first of all, my old servers had some kind of problem. No idea what that was, since my new servers were created from the same OS image.
Secondly, although the timeout percentage has dropped dramatically, I still think it is uncommon to get even one timeout in a small LAN like ours. But this could be a problem on the server application's side.
Finally, my point of view is that, apart from the old server's problem (I still cannot believe I lost so much time with that!), there must be either a server app bug or a JDK-related bug (since I experienced that JRE crash).
p.s. I use Eclipse as IDE and my JRE is the latest.
If any of the above ring any bells to you, please comment.
Thank you.
-----EDIT-----
Could it be that PrintWriter and/or BufferedReader are not actually thread safe?
----NEW EDIT 09 Sep 2013----
After re-reading all the comments, and thanks to @Gray and his comment:
When you run multiple servers does the first couple work and the rest of them timeout? Might be interesting to put a small sleep in your fork loop (like 10 or 100ms) to see if it works that way.
I rearranged the tree list of the hosts/IPs and got some really strange results.
It seems that if an alive host is placed at the top of the tree list, thus being the first to start a socket connection, it has no problem connecting and receiving packets without any delay or timeout.
On the contrary, if an alive host is placed at the bottom of the list, with several dead hosts before it, it just takes too long to connect, and with my previous timeout of 10 seconds it failed to connect. But after changing the timeout to 60 seconds (thanks to @EJP) I realised that no timeouts are occurring!
It just takes too long to connect (more than 20 seconds on some occasions).
Something is blocking new socket connections, and it isn't that the hosts or the network are too busy to respond.
I have some debug data here, if you would like to take a look :
http://pastebin.com/2m8jDwKL

You could simply check for availability before you connect to the socket. There is an answer that provides a somewhat hackish workaround: https://stackoverflow.com/a/10145643/1809463
Process p1 = java.lang.Runtime.getRuntime().exec("ping -c 1 " + ip);
int returnVal = p1.waitFor();
boolean reachable = (returnVal==0);
by jayunit100
It should work on Unix, since ping is a common program. On Windows, ping takes -n rather than -c for the packet count, so the flag would need to change there.

My problem is that when my list of machines contains, let's say, 10 PCs (most of which are not alive), I get a lot of timeout exceptions (on the alive PCs) even though my timeout limit is set to 10 seconds.
So as I understand the problem, if you have (for example) 10 PCs in your map and 1 is alive and the other 9 are not online, all 10 connections time out. If you just put the 1 alive PC in the map, it shows up as fine.
This points to some sort of concurrency problem but I can't see it. I would have thought that there was some sort of shared data that was not being locked or something. I see your test code is using Statement and ResultSet. Maybe there is a database connection that is being shared without locking or something? Can you try just returning the result string and printing it out?
Less likely is some sort of network or firewall configuration but the idea that one failed connection would cause another to fail is just strange. Maybe try running your program on one of the servers or from another computer?
If I try your test code, it seems to work fine. Here's the source code for my test class. It has no problems contacting a combination of online and offline hosts.
Lastly some quick comments about your code:
You should close the streams, readers, and sockets in a finally block. Check my test class for a better pattern there.
You should return a small Result class instead of passing back a String that then has to be parsed.
Hope this helps.
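The two code comments above (close resources reliably, return a Result object) could be combined along these lines. This is only a sketch, not the answerer's test class: StatusClient, Result and parse are hypothetical names, try-with-resources (Java 7+) is used in place of an explicit finally block, and the response format is assumed from the question's code (fields separated by \1, with the number in the third field).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetSocketAddress;
import java.net.Socket;

public class StatusClient {
    /** Small immutable result, so callers do not have to parse "host|value" strings. */
    public static final class Result {
        public final String hostname;
        public final boolean ok;
        public final int value; // only meaningful when ok is true

        private Result(String hostname, boolean ok, int value) {
            this.hostname = hostname;
            this.ok = ok;
            this.value = value;
        }

        static Result success(String hostname, int value) { return new Result(hostname, true, value); }
        static Result failure(String hostname) { return new Result(hostname, false, -1); }
    }

    public static Result query(String hostname, String hostIp, int port, int timeoutMillis) {
        // try-with-resources closes the socket and streams on every path, including exceptions.
        try (Socket so = new Socket()) {
            so.connect(new InetSocketAddress(hostIp, port), timeoutMillis);
            so.setSoTimeout(timeoutMillis); // also bound the wait for the reply
            try (PrintWriter out = new PrintWriter(so.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(so.getInputStream()))) {
                out.println("\1IDC_UPDATE\1");
                String line = in.readLine();
                return line == null ? Result.failure(hostname) : parse(hostname, line);
            }
        } catch (IOException e) {
            return Result.failure(hostname);
        }
    }

    /** Parses a reply whose third \1-separated field is expected to be an integer. */
    public static Result parse(String hostname, String line) {
        String[] parts = line.split("\1");
        if (parts.length < 3) {
            return Result.failure(hostname);
        }
        try {
            return Result.success(hostname, Integer.parseInt(parts[2]));
        } catch (NumberFormatException e) {
            return Result.failure(hostname);
        }
    }
}
```

The Callable would then be declared as Callable<StatusClient.Result>, and the main loop can check result.ok directly.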

After a lot of reading and experimentation I will have to answer my own question (if I am allowed to, of course).
Java just can't handle multiple concurrent socket connections without adding a big performance overhead. At least not on a Core2Duo / 4 GB RAM / Windows XP machine.
Creating multiple concurrent socket connections to remote hosts (using, of course, the code I posted) creates some kind of resource bottleneck, or blocking situation, which I still cannot explain.
If you try to connect to 20 hosts simultaneously, and a lot of them are disconnected, then you cannot guarantee a "fast" connection to the alive ones.
You will get connected, but it could be after 20-25 seconds, meaning that you'll have to set the socket timeout to something like 60 seconds (not acceptable for my application).
If an alive host is lucky enough to start its connection attempt first (bearing in mind that the concurrency is not absolute; the for loop is still sequential), then it will probably get connected very fast and get a response.
If it is unlucky, the socket.connect() method will block for some time, depending on how many hosts before it will eventually time out.
After adding a small sleep (100 ms) between the pool.submit(worker) calls I realised that it makes some difference. I get to connect faster to the "unlucky" hosts. But if the list of dead hosts grows, the results are still almost the same.
If I edit my host list and place a previously "unlucky" host at the top (before the dead hosts), all problems disappear...
So, for some reason the socket.connect() method creates a form of bottleneck when the hosts to connect to are many and not alive. Be it a JVM problem, an OS limitation or bad coding on my side, I have no clue...
I will try a different coding approach and hopefully tomorrow I will post some feedback.
p.s. This answer made me think of my problem :
https://stackoverflow.com/a/4351360/2025271

Java UDP Server, concurrent clients

Is the code below sufficient to accept concurrent UDP transmissions? More specifically, if 2 clients transmit concurrently, will DatagramSocket queue up the transmissions and deliver them one by one as I call receive(), or will only one make it through?
DatagramSocket socket = new DatagramSocket(port, address);
byte[] buffer = new byte[8192];
while (!disconnect) {
    DatagramPacket p = new DatagramPacket(buffer, buffer.length);
    socket.receive(p);
}

There is no queuing by default. The client may retry until a timeout or similar is reached.
UDP is quite fast, but under heavy load you may have clients whose packets do not get through.

If the packets make it to your network interface (imagine lost packets on a congested wireless channel), they will be passed up the stack and the blocking method socket.receive(p) will return. If packets collide on the channel because two clients transmit at the same time, you will not get either of the two packets. But this is most likely not possible, because the access technology of the network interface takes care of it; check
CSMA/CA or CSMA/CD
After calling socket.receive(p) you should create a new thread to process the packet itself. That will make sure that the next packet can be received on the socket.
EDIT:
Description of INTEL's TX and RX descriptors
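The hand-off suggested above (receive, then process on another thread so the next receive() can start right away) might be sketched like this. Two things differ from the wording of the answer and are assumptions of this sketch: a worker pool is used instead of a brand-new thread per packet, and the payload is copied before the hand-off because the question's loop reuses one buffer. UdpReceiver and the handler callback are hypothetical names.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

public class UdpReceiver implements Runnable {
    private final DatagramSocket socket;
    private final ExecutorService workers;
    private final Consumer<byte[]> handler; // hypothetical per-packet processing callback

    public UdpReceiver(DatagramSocket socket, ExecutorService workers, Consumer<byte[]> handler) {
        this.socket = socket;
        this.workers = workers;
        this.handler = handler;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[8192];
        while (!socket.isClosed()) {
            DatagramPacket p = new DatagramPacket(buffer, buffer.length);
            try {
                socket.receive(p); // blocks until one datagram has been delivered
            } catch (IOException e) {
                break; // socket closed (or failed): stop the receive loop
            }
            // Copy the payload: the shared buffer is overwritten by the next receive().
            final byte[] payload =
                    Arrays.copyOfRange(p.getData(), p.getOffset(), p.getOffset() + p.getLength());
            workers.execute(() -> handler.accept(payload));
        }
    }
}
```

Closing the socket from another thread makes the blocked receive() throw, which ends the loop.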

A basic solution would have one thread responsible for accepting a number of incoming requests (up to your desired limit) and then handing them off to other worker/request handler threads. This basic structure is very much the same in most servers: a main thread responsible for handing off requests to worker threads. When each of these worker threads is finished, it can update a shared/global counter to let the main thread know that it can accept a new request. This will require synchronization, but it's a neat and simple abstraction.
Here's the idea:
Server Thread:
// Receive Packet
while (true) {
    serverLock.acquire();
    try {
        if (numberOfRequests < MAX_REQUESTS) {
            packet = socket.receive();
            numberOfRequests++;
            requestThread(packet).start(); // start(), not run(), so the handler runs on its own thread
        } else {
            serverMonitor.wait(serverLock);
        }
    } finally {
        serverLock.release();
    }
}
Request Thread:
// Handle Packet
serverLock.acquire();
try {
    numberOfRequests--; // always give the slot back, not only when the server was at its limit
    serverMonitor.pulse();
} finally {
    serverLock.release();
}
This is just to give you an idea of what you can start out with. But when you get the hang of it, you'll be able to make optimizations and enhancements to make sure the synchronization is all correct.
One particular enhancement, which also lends itself to limited number of requests, is something called a ThreadPool.
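In Java, the counter-and-monitor idea above maps fairly directly onto a fixed thread pool guarded by a Semaphore. The following is a sketch under that assumption, not the answerer's implementation; BoundedDispatcher and its method names are hypothetical. The dispatching thread blocks in acquire() once the limit of in-flight requests is reached, which plays the role of serverMonitor.wait, and each worker's release() plays the role of the pulse.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class BoundedDispatcher {
    private final Semaphore permits;       // counts free request slots
    private final ExecutorService workers; // the thread pool doing the actual handling

    public BoundedDispatcher(int maxRequests) {
        this.permits = new Semaphore(maxRequests);
        this.workers = Executors.newFixedThreadPool(maxRequests);
    }

    /** Blocks the caller while maxRequests handlers are already in flight. */
    public void dispatch(final Runnable handler) throws InterruptedException {
        permits.acquire();
        try {
            workers.execute(() -> {
                try {
                    handler.run();
                } finally {
                    permits.release(); // free the slot even if the handler throws
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // the task was never scheduled, so return its slot
            throw e;
        }
    }

    /** Stops accepting work and waits for running handlers to finish. */
    public boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        workers.shutdown();
        return workers.awaitTermination(timeout, unit);
    }
}
```

The server loop then reduces to: receive a packet, copy it, and call dispatch with a handler for it.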

Queue method or individual db connection?

I have a socket connection that puts incoming data on a queue via databaseQueue.add(message);. A DatabaseProcessor class, started as a thread at startup, opens a single database connection and keeps taking messages via databaseQueue.take(); and processing them. The good part of this approach is that only one database connection is made. The problem arises when there is a sudden surge of data. The alternative is to open and close a connection for each piece of data received. Based on your experience with heavy loads, which is the better way to go here?
Here is a snippet of my code.
class ConnectionHandler implements Runnable {
    ConnectionHandler(Socket receivedSocketConn1) {
        this.receivedSocketConn1 = receivedSocketConn1;
    }

    // gets data from an inbound connection and queues it for database update
    public void run() {
        databaseQueue.add(message); // put to db queue
    }
}

class DatabaseProcessor implements Runnable {
    public void run()
    {
        // open database connection
        createConnection();
        while (true)
        {
            // keep taking messages added to the queue by ConnectionHandler;
            // here I will have a number of queries to run (selects, inserts and updates)
            message = databaseQueue.take();
        }
    }

    void createConnection()
    {
        System.out.println("Crerate Connection");
        connCreated = new Date();
        try
        {
            dbconn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test1?"+"user=user1&password=*******");
            dbconn.setAutoCommit(false);
        }
        catch(Throwable ex)
        {
            ex.printStackTrace(System.out);
        }
    }
}

public void main()
{
    new Thread(new DatabaseProcessor()).start(); // starts the DatabaseProcessor
    // initiate the socket
}

As far as I understand you are managing a client-server Socket connection in which you send and receive message through a queue. If I also got it right, you are creating a thread for each new message on the queue.
Considering that there will be plenty of messages being sent and read I recommend you to declare your method(s) in your threads Synchronized so that you won't need to open and close streaming each time a data is received (refer to your second approach here). Synchronized Methods are usually the best way to handle surge of common data which can be modified by threads at the same time.

You can use connection pooling to get the best of both worlds: you are not limited to a single thread, and you also do not need to open connections for each request. Have a look at Apache DBCP.

This approach is fine, except that you can create a DB connection pool using c3p0. Also use a ThreadPoolExecutor for maintaining your thread pool.

We need advice for a server software implementation with Java NIO

I'm trying to calculate the load on a server I have to build.
I need to create a server which has one million users registered in an SQL database. During a week each user will connect approximately 3-4 times. Each time, a user will upload and download 1-30 MB of data, and it will take maybe 1-2 minutes.
When an upload is complete it will be deleted within minutes.
(Update text removed error in calculations)
I know how to make and query an SQL database but what to consider in this situation?

What you want is exactly Netty. It's an API built on NIO that provides an event-driven model instead of the classic thread model.
It doesn't use a thread per request; instead it puts the requests in a queue. With this tool you can handle up to 250,000 requests per second.

I am using Netty for a similar scenario. It is just working!
Here is a starting point for using netty:
public class TCPListener {
    private static ServerBootstrap bootstrap;

    public static void run() {
        bootstrap = new ServerBootstrap(
                new NioServerSocketChannelFactory(
                        Executors.newCachedThreadPool(),
                        Executors.newCachedThreadPool()));
        bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
            public ChannelPipeline getPipeline() throws Exception {
                MyHandler handler = new MyHandler();
                ChannelPipeline pipeline = Channels.pipeline();
                pipeline.addLast("handler", handler);
                return pipeline;
            }
        });
        bootstrap.bind(new InetSocketAddress(9999)); // port number is 9999
    }

    public static void main(String[] args) throws Exception {
        run();
    }
}
and MyHandler class:
public class MyHandler extends SimpleChannelUpstreamHandler {
    @Override
    public void messageReceived(
            ChannelHandlerContext ctx, MessageEvent e) {
        try {
            String remoteAddress = e.getRemoteAddress().toString();
            ChannelBuffer buffer = (ChannelBuffer) e.getMessage();
            // Now the buffer contains the byte stream from the client.
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        byte[] output = new byte[0]; // suppose output is a filled byte array
        ChannelBuffer writebuffer = ChannelBuffers.buffer(output.length);
        for (int i = 0; i < output.length; i++) {
            writebuffer.writeByte(output[i]);
        }
        e.getChannel().write(writebuffer);
    }

    @Override
    public void exceptionCaught(
            ChannelHandlerContext ctx, ExceptionEvent e) {
        // Close the connection when an exception is raised.
        e.getChannel().close();
    }
}

At first I was thinking this many
users would require a non-blocking
solution but my calculations show that
I don't, [am I] right?
On modern operating systems and hardware, thread-per-connection is faster than non-blocking I/O, at least unless the number of connections reaches truly extreme levels. However, for writing the data to disk, NIO (channels and buffers) may help, because it can use DMA and avoid copy operations.
But overall, I also think network bandwidth and storage are your main concerns in this application.

The important thing to remember is that most users do not access a system evenly in every hour of every day of the week. Your system needs to perform correctly during the busiest hour of the week.
Say the busiest hour of the week, 1/50 of all uploads are made. In the busiest hour each upload could be 30 MB, a total of 1.8 TB. This means you need to have an Internet upload bandwidth to support this. 1.8 TB/hour * 8 bits/byte / 60 min/hour / 60 sec/min = 4 Gbit/s Internet connection.
If for example, you have only a 1 Gbit/s connection, this will limit access to your server.
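The peak-hour arithmetic above can be reproduced with a few lines of code, which also makes it easy to try other assumptions. In this sketch, the figure of 3,000,000 uploads per week is an assumption (1 million users times 3 connections, which is what the 1.8 TB total implies), decimal units are used (1 MB = 10^6 bytes), and PeakBandwidth and its method names are hypothetical.

```java
public class PeakBandwidth {
    /** Bytes uploaded during the busiest hour of the week. */
    public static double bytesInBusiestHour(long uploadsPerWeek, double busiestHourShare,
                                            double bytesPerUpload) {
        return uploadsPerWeek * busiestHourShare * bytesPerUpload;
    }

    /** Converts a volume per hour into a sustained rate in Gbit/s. */
    public static double gbitPerSecond(double bytesPerHour) {
        double bitsPerSecond = bytesPerHour * 8 / 3600.0; // 8 bits/byte, 3600 s/hour
        return bitsPerSecond / 1e9;
    }

    public static void main(String[] args) {
        long uploadsPerWeek = 3_000_000L;  // assumption: 1M users x 3 connections per week
        double share = 1.0 / 50;           // busiest hour carries 1/50 of all uploads
        double bytesPerUpload = 30e6;      // worst case: 30 MB per upload

        double peakBytes = bytesInBusiestHour(uploadsPerWeek, share, bytesPerUpload);
        System.out.printf("Busiest hour: %.2f TB, needing about %.2f Gbit/s%n",
                peakBytes / 1e12, gbitPerSecond(peakBytes));
    }
}
```

Changing the share of traffic in the busiest hour or the upload size shows quickly how sensitive the required bandwidth is to those guesses.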
The other thing to consider is your retention time for these uploads. If each upload is 15 MB on average, you will be getting 157 TB per week or 8.2 PB (8200 TB) per year. You may need a significant amount of storage to retain this.
Once you have spent a significant amount of money on Internet connectivity and disk, the cost of buying a couple of servers is minor. You could use Apache MINA; however, a single server with a 10 Gbit/s connection can support 1 GB easily using any software you care to choose.
A single PC/server/laptop can handle 1,000 I/O threads, so 300-600 is not a lot.
The problem will not be in the software but in the network/hardware you choose.
