Loading a big CSV file into Redis - Java

I'm trying to load a CSV file (27 million rows) into a Redis DB, but it throws this exception: 'redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out'
My code:
Jedis jedis = new Jedis();
try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
    String line = in.readLine();
    int i = 0;
    while (line != null) {
        String[] edgeDescription = line.split("\\s");
        // two fields are needed below; the original check (length >= 0) was always true
        if (edgeDescription.length >= 2) {
            jedis.hset(String.valueOf(i), edgeDescription[0], edgeDescription[1]);
        }
        i++;
        line = in.readLine();
    }
} catch (Exception e) {
    e.printStackTrace();
}

With the information given, it is difficult to answer this question, but here are some points to check:
Where do you specify which database to connect to? I don't see any host/port specification. Could it be that in this case the Jedis instance tries to connect to localhost:6379?
Is your code able to write any data to Redis at all?
If so, can you add a counter and count how many lines are processed and how many are written to the DB? After the connection timeouts, what happens to the Redis instance? Is it still running, and is everything OK?
Did you try the redis-cli -h host -p port monitor command?
A small remark: your code is generally not efficient. You should use pipelines and send the writes in chunks of 1000 or so; this will be much faster. Did you consider this?
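A minimal sketch of the chunking suggested above. To keep it self-contained and runnable without a Redis server, the Redis side is abstracted into a `flush` callback; the `ChunkedLoader` class, the `load` helper and the chunk size are illustrative assumptions, not part of Jedis. Row keys follow the question's `String.valueOf(i)` scheme.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedLoader {
    /**
     * Reads "field value" lines (split on whitespace, as in the question) and passes them
     * to flush in chunks of at most chunkSize rows. Each row is {key, field, value}, where
     * key is the zero-based line number. Lines with fewer than two fields are skipped.
     * Returns the number of rows flushed.
     */
    static long load(Reader source, int chunkSize, Consumer<List<String[]>> flush) {
        List<String[]> chunk = new ArrayList<>(chunkSize);
        long flushed = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            long lineNo = 0;
            for (String line; (line = in.readLine()) != null; lineNo++) {
                String[] parts = line.split("\\s");
                if (parts.length < 2) {
                    continue; // malformed line; the for-update still advances lineNo
                }
                chunk.add(new String[] {String.valueOf(lineNo), parts[0], parts[1]});
                if (chunk.size() == chunkSize) {
                    flush.accept(chunk);
                    flushed += chunk.size();
                    chunk = new ArrayList<>(chunkSize); // hand off the full chunk, start a new one
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!chunk.isEmpty()) { // the last, partially filled chunk
            flush.accept(chunk);
            flushed += chunk.size();
        }
        return flushed;
    }

    public static void main(String[] args) {
        String data = "a 1\nb 2\nbad\nc 3\nd 4\ne 5\n";
        List<Integer> sizes = new ArrayList<>();
        long rows = load(new StringReader(data), 2, chunk -> sizes.add(chunk.size()));
        System.out.println(rows + " rows in chunks of " + sizes); // 5 rows in chunks of [2, 2, 1]
    }
}
```

With Jedis, the flush callback could call `jedis.pipelined()`, queue one `hset(key, field, value)` per row on the returned `Pipeline`, and call `sync()` once per chunk. With chunks of 1000, 27 million rows then need roughly 27,000 pipeline flushes instead of 27 million individual request/response round trips.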

Related

Iterative data gathering from server

I'm building a web server and a client, which I will use on a Raspberry Pi. The server is written in Python, and every time a client connects to it, it returns data to the client. This works: when I access the server through my browser, I see the correct data, and when I refresh the page, I get new data.
The problem is that I'm now writing a client in Java to access the server and collect the data. I use Java here because I'm more experienced with it and find building a GUI in it more convenient; I use Python on the Raspberry Pi because it offers a simpler way of interacting with the hardware.
The Java client code I have is this:
while (true) {
    Socket socket = new Socket("192.168.0.37", 9315);
    PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
    BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
    try {
        System.out.println("Getting");
        out.write("GIMME DATA!");
        out.flush();
        System.out.println("receiving");
        String data;
        while ((data = in.readLine()) != null) {
            System.out.println("Line: " + data);
        }
        socket.close();
        Thread.sleep(5000);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
This all works well, but I want to move the first three objects out of the while loop (and obviously not close the connection). When I do that, I can only access the server once (it gives me the correct data); on the second run, in.readLine() returns null, which is weird. It seems that the input stream only has the data from the first run and no more bytes are written to it. To be clear: when the three objects are created inside the while loop, everything works well.
How can I take the objects out of the loop? Creating them every time seems wasteful, time-consuming and possibly memory-consuming...
Once the BufferedReader has reached the end of the file (or the end of the data), it cannot print the data again from the beginning. For example:
int i = 0;
BufferedReader br = new BufferedReader(new FileReader(new File("C:\\data.txt")));
while (true) {
    String data = "";
    while ((data = br.readLine()) != null) {
        System.out.println(" data :" + data);
    }
    Thread.sleep(1000);
    i++;
    if (i == 2) {
        break;
    }
}
br.close();
In the above program, br reaches the end of the file and cannot go back to the beginning. You have to create a new BufferedReader object every time you enter the while loop.
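The same end-of-stream rule explains the socket behaviour in the question: `in.readLine()` returns null once the server closes its side, and after that the stream is finished for good. Reusing one connection only works if both sides agree on a message boundary other than end-of-stream. A minimal sketch, assuming the Python server is changed to keep the connection open and to answer every request line with exactly one newline-terminated line; the in-process server below is a hypothetical stand-in for it, and `poll`, `serveOnce` and `demo` are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PersistentClient {
    /** Sends `rounds` requests over ONE connection, reading exactly one reply line per request. */
    static List<String> poll(String host, int port, int rounds) throws IOException {
        List<String> replies = new ArrayList<>();
        // Created once, outside the loop, and closed once by try-with-resources.
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            for (int i = 0; i < rounds; i++) {
                out.println("GIMME DATA!");   // the newline ends the request; autoflush sends it
                String reply = in.readLine(); // one line per request, NOT "read until null"
                if (reply == null) {
                    break;                    // the server closed the connection
                }
                replies.add(reply);
                // a real client would Thread.sleep(5000) here, as in the question
            }
        }
        return replies;
    }

    /** Hypothetical stand-in server: keeps the connection open, one reply line per request line. */
    static Thread serveOnce(ServerSocket server) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                int n = 0;
                while (in.readLine() != null) { // null only when the client closes its socket
                    out.println("reading " + (++n));
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        t.start();
        return t;
    }

    /** Runs the client against the stand-in server on a free local port. */
    static List<String> demo(int rounds) {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread t = serveOnce(server);
            List<String> replies = poll("127.0.0.1", server.getLocalPort(), rounds);
            t.join();
            return replies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(3)); // [reading 1, reading 2, reading 3]
    }
}
```

The question's client sends "GIMME DATA!" without a newline; with a line-based server like this one, the newline from `println` is what tells the server that a request is complete.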

NullPointerException and BufferedReader

My program keeps throwing a NullPointerException at me and I have no idea why. I thought maybe it's because of the BufferedReader, but I'm not sure.
String line = reader.readLine();
while (!line.isEmpty()) {
    line = repairLine(line);
    tree.add(line);
    line = reader.readLine();
}
Yes, it is wrapped in a try-catch block. It says the problem is on the while line. It didn't work with "if (line != null)" either. I really don't know what could cause that. Thank you for any help.
The problem seems to be with the following lines of your code:
String line = reader.readLine();
while (!line.isEmpty()) {
}
If there is nothing (left) to read from the file, line will be null, hence the exception in the while condition: you are calling a method on a null reference.
That being said, the traditional way of reading lines from a file is to assign the line read to a variable in the while condition itself (code not tested):
String line;
// the parentheses matter: assign first, then compare the result with null
while ((line = reader.readLine()) != null && !line.isEmpty()) {
    line = repairLine(line);
    tree.add(line);
}
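Since Java 9, the same loop can also be written with `BufferedReader.lines()`, whose stream simply ends at end of input instead of handing you a null, combined with `Stream.takeWhile` to stop at the first empty line. A minimal sketch; the `readInto` helper is hypothetical, and `String::trim` and the `TreeSet` only stand in for the asker's `repairLine` and `tree`.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Collection;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

public class ReadUntilBlank {
    /**
     * Adds repaired lines to target until the first empty line or the end of input.
     * lines() never yields null, so there is no null reference to call isEmpty() on.
     */
    static void readInto(BufferedReader reader, UnaryOperator<String> repairLine, Collection<String> target) {
        reader.lines()                           // ends cleanly at end of input
              .takeWhile(line -> !line.isEmpty()) // stop at the first blank line (Java 9+)
              .map(repairLine)
              .forEach(target::add);
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("pear\napple\n\nignored\n"));
        TreeSet<String> tree = new TreeSet<>();
        readInto(reader, String::trim, tree); // String::trim stands in for repairLine
        System.out.println(tree); // [apple, pear]
    }
}
```

I/O errors inside `lines()` surface as `UncheckedIOException`, so an existing try-catch around the loop may need to catch that as well.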
I have had this same error, and it meant that the socket was being closed by the connected client while the server was trying to read from it.
To add some context: I am running a server, and various clients connect to it via Java sockets. I was getting this error on the server side when a client closed its socket.
So to solve the issue, I decided to close both the server-side and the client-side sockets after every connection. The way I did this: before the client closes, it sends a "quit" message to the server and then closes its socket. When the server reads the "quit" message, it also closes its socket.
For example:
Client side
if (Quit) {
    if (mySocket != null && !mySocket.isClosed()) {
        output.println("quit");
        output.flush();
        try {
            mySocket.close();
        } catch (IOException e) {
            System.out.println("Close net variables exception - " + e.getMessage());
        }
    }
}
Server side
String line = input.readLine(); // the thread blocks here until it reads something
if (line == null || line.equals("quit")) {
    // null means the client closed the connection without sending "quit"
    try {
        mySocket.close();
    } catch (IOException e) {
        System.out.println("Close net variables exception - " + e.getMessage());
    }
} else if (line.equals("something")) {
    do_something();
} else {
    do_default();
}
Worked for me, hope it helps!
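A minimal sketch of the same "quit" handshake written as a server loop, assuming a line-based protocol. It ends the session on "quit", on `readLine()` returning null (the client closed without saying goodbye), and, as an assumption about the desired policy, on an `IOException` such as a connection reset. The `QuitProtocol` class and the `handle` function are hypothetical placeholders for `do_something()`/`do_default()`; the caller still closes the socket afterwards.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.UnaryOperator;

public class QuitProtocol {
    /** Why a session ended. */
    enum End { QUIT, CLIENT_GONE }

    /**
     * Replies to every line with handle(line) until the client sends "quit" or goes away.
     * The caller closes the socket in both cases.
     */
    static End serve(BufferedReader input, PrintWriter output, UnaryOperator<String> handle) {
        try {
            String line;
            while ((line = input.readLine()) != null) { // blocks until a line or end-of-stream
                if (line.equals("quit")) {
                    return End.QUIT;
                }
                output.println(handle.apply(line));
                output.flush();
            }
            return End.CLIENT_GONE;      // end-of-stream without "quit"
        } catch (IOException e) {
            return End.CLIENT_GONE;      // e.g. "Connection reset": treated like a disconnect
        }
    }

    public static void main(String[] args) {
        StringWriter sent = new StringWriter();
        End end = serve(new BufferedReader(new StringReader("something\nquit\nnever read\n")),
                new PrintWriter(sent), line -> "got " + line);
        System.out.println(end + " / " + sent.toString().trim()); // QUIT / got something
    }
}
```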

Java runtime exec getInputStream when process pauses

I am creating a wrapper for an executable that runs on the Windows command line. The executable takes a few commands and then attempts to connect to another device. It then outputs either ERROR! or Ready for "Device Name", but I do not get this message until the app exits. The problem is that this app is a tunnel that lets me run telnet on the external box, so I need to make sure the device is ready. This is my code:
public void startUDPTunnel() {
    //TODO Pull Amino serial number from webportal
    Properties prop = new Properties();
    InputStream inConfig = getClass().getClassLoader().getResourceAsStream("config.properties");
    try {
        prop.load(inConfig);
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
    String server = prop.getProperty("server");//config.GetProp("server");
    System.out.println(server);
    String port = prop.getProperty("port");//config.GetProp("port");
    System.out.println(port);
    String location = prop.getProperty("location");//config.GetProp("location");
    System.out.println(location);
    String url = prop.getProperty("URL");
    System.out.println(url);
    String input = "";
    try {
        input = getSerial(url);
        System.out.println(input);
        Process p = Runtime.getRuntime().exec(location+"udptunnel.exe -c 127.0.0.1 23 "+input+" "+server+" "+port+" 127.0.0.1 23");
        threadSleep();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
        String line = null;
        while ((line = in.readLine()) != null) {
            if (line.equals("ERROR!")) {
                System.out.println("There was an ERROR");
            }
            if (line.equals("Ready for \""+input+"\"")) {
                System.out.println("Load Telnet");
            }
        }
        p.destroy();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Sorry, there is a lot of debug code left in this function.
EDIT
OK, I am pretty sure I know what the issue is: BufferedReader.readLine() needs a \n or \r, otherwise it just hangs. Is there any way to watch the stream without the buffer?
You should use a ProcessBuilder and call redirectErrorStream(true). This merges the process's stderr into its stdout, so you only have to read from one InputStream to get both. Note that it does not change how the child process buffers its own output.
I have figured out my problem: the applications I am executing from Java do not write an EOL at the end of the line; in fact, they just wait on that line. For example, telnet waits for the username and then the password. I am not sure this is proper, but it works and is what I am going to use for now:
int i;
StringBuilder ch = new StringBuilder(); // cheaper than String += inside a loop
while ((i = br.read()) != -1) {
    ch.append((char) i);
}
This gets every char as it comes in; then I just check that the string contains what I am looking for.
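A minimal sketch of the same char-by-char approach that returns as soon as one of the expected messages has appeared, rather than reading until the process exits. The marker strings come from the question; the `PromptWatcher` class and `waitFor` helper are hypothetical, and the `ProcessBuilder` lines in the comment assume the same arguments as the question's `exec` call.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PromptWatcher {
    /**
     * Reads one char at a time and returns the first marker that appears in the output,
     * or null if the stream ends first. Unlike readLine(), this needs no line terminator,
     * so it also works for prompts that are left "hanging" on the line.
     * The reader is not closed: for a process it must stay open while the tunnel runs.
     */
    static String waitFor(Reader reader, String... markers) {
        StringBuilder seen = new StringBuilder();
        try {
            for (int c; (c = reader.read()) != -1; ) {
                seen.append((char) c);
                for (String marker : markers) {
                    // a new match must end at the char just appended, so search only the tail
                    int from = Math.max(0, seen.length() - marker.length());
                    if (seen.indexOf(marker, from) >= 0) {
                        return marker;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        // With the real tool (assumed arguments), something like:
        //   ProcessBuilder pb = new ProcessBuilder(location + "udptunnel.exe", "-c", "127.0.0.1", "23",
        //           input, server, port, "127.0.0.1", "23").redirectErrorStream(true);
        //   Reader r = new InputStreamReader(pb.start().getInputStream());
        Reader r = new StringReader("connecting...\nReady for \"ABC123\""); // no trailing newline
        System.out.println(waitFor(r, "ERROR!", "Ready for \"ABC123\"")); // Ready for "ABC123"
    }
}
```

Keep using the same `Reader` afterwards if you need the rest of the output: an `InputStreamReader` may already have buffered bytes beyond the marker.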

In this case, is it better to use a database or to read from a file to increase performance?

This is my class TagVertex. It contains one method that reads the tag value (a string) from a text file and returns it:
public class TagVertex extends Vertex<String> {
    @Override
    public String computeVertexValue() {
        String s = null;
        try {
            BufferedReader bf = new BufferedReader(new FileReader(MyDataSource.TagList1K));
            for (int i = 1; i < Integer.parseInt(this.getVertexId().substring(this.getVertexId().indexOf("g") + 1)); i++) {
                bf.readLine();
            }
            s = bf.readLine();
            bf.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        this.setVertexValue(s);
        return s;
    }
}
The method is called 1000 times, so the file is read 1000 times too.
Is it better to use a database instead of a text file?
Accessing the hard drive is always a very slow operation. Databases usually also access the hard drive, so they aren't necessarily faster. They can be even slower, because when the database doesn't run on the same system, the network latency is added (even when it runs on localhost, you have latency due to interprocess communication).
I would recommend reading the file once and caching the value. If you need to know immediately when the file changes, you could use the WatchService API (java.nio.file, since Java 7) to reload the file when it changes. If it isn't that important that changes at the file-system level are picked up immediately, you could also save the time at which the vertex information was read from the hard drive and only re-read the value when it is older than a few seconds.
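A minimal sketch of the "re-read only when older than a few seconds" option. The `StaleAfter` class, the injectable clock and the loader function are illustrative assumptions; for the question, the loader would read the whole tag file into a list once.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class StaleAfter<T> {
    private final Supplier<T> loader;  // e.g. reads the whole tag file into a List<String>
    private final long maxAgeMillis;
    private final LongSupplier clock;  // injectable so the behaviour can be tested
    private T value;
    private long loadedAt;
    private boolean loaded;

    StaleAfter(Supplier<T> loader, long maxAgeMillis, LongSupplier clock) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it first if it is missing or older than maxAgeMillis. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt > maxAgeMillis) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }

    public static void main(String[] args) {
        long[] now = {0};                   // fake clock in milliseconds
        AtomicInteger reads = new AtomicInteger();
        StaleAfter<List<String>> tags = new StaleAfter<>(
                () -> { reads.incrementAndGet(); return List.of("tag1", "tag2"); },
                5_000, () -> now[0]);
        tags.get();                         // first call: loads
        now[0] = 4_000;
        tags.get();                         // still fresh: no re-read
        now[0] = 9_001;
        tags.get();                         // older than 5 s: re-read
        System.out.println(reads.get() + " reads"); // 2 reads
    }
}
```

In real use the clock would simply be `System::currentTimeMillis`.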
You can create your own in-memory database like this:
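private static final List<String> lines = new ArrayList<>();

@Override
public String computeVertexValue() {
    synchronized (lines) { // load the file only once, even if called from several threads
        if (lines.isEmpty()) {
            try (BufferedReader br = new BufferedReader(new FileReader(MyDataSource.TagList1K))) {
                for (String line; (line = br.readLine()) != null; ) {
                    lines.add(line);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    // vertex "tagN" maps to line N of the file (1-based, as in the original loop), i.e. index N - 1
    int n = Integer.parseInt(this.getVertexId().substring(this.getVertexId().indexOf("g") + 1));
    String value = lines.get(n - 1);
    this.setVertexValue(value);
    return value;
}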

Java TCP server: too many connections

I wrote a simple TCP server to transfer some user data to it and save it in a simple MySQL table. If I now run more than 2000 clients one after another, it stops working. While running, I get some "IO error java.io.EOFException" messages; you may also spot the mistake I made there. But the most important thing is that I get this:
IO error java.net.SocketException: Connection reset
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at Server.main(Server.java:49)
There should be enough memory, but the threads keep running, and I don't see the mistake that prevents them from terminating. I got up to 3900 running threads.
So here is the relevant part of the server:
try {
    // create new socket
    ServerSocket sock = new ServerSocket(port);
    textArea.setText(textArea.getText() + "Server started\n");
    while (true) {
        // accept the connection
        Socket newsock = sock.accept();
        // handle the action
        Thread t = new ThreadHandler(newsock);
        newsock.setSoTimeout(2000); // adding client timeout
        t.start();
    }
} catch (Exception e) {
I guess that part is really simple. Here is how I handle the socket:
class ThreadHandler extends Thread {
    private Socket socket;
    private MySQLConnection sqlConnection;
    ThreadHandler(Socket s) {
        socket = s;
        sqlConnection = new MySQLConnection();
    }
    public void run() {
        try {
            DataOutputStream out = new DataOutputStream(
                    socket.getOutputStream());
            DataInputStream in = new DataInputStream(new BufferedInputStream(
                    socket.getInputStream()));
            Server.textArea.append((new Date()) + "\nClient connected IP: " + socket.getInetAddress().toString()+"\n");
            int firstLine = in.readInt(); // get first line for switch
            switch (firstLine) {
                case 0:
                    // getting the whole object for the database in own lines!
                    String name2 = in.readUTF();
                    int level2 = in.readInt();
                    int kp2 = in.readInt();
                    String skill = in.readUTF();
                    LeadboardElement element2 = new LeadboardElement();
                    element2.setName(name2);
                    element2.setLevel(level2);
                    element2.setKillPoints(kp2);
                    element2.setSkill(skill);
                    sqlConnection.saveChaToLeadboard(element2);
                    break;
                //case 1 return the top10
                // .... the rest of the cases is shortened here
            out.close();
            in.close();
            //close this socket
            socket.close();
            Server.textArea.append("Client disconnected IP: " + socket.getInetAddress().toString()+ "\n" + (new Date())
                    + "\n----------------------------------------------------\n");
            // autoscrolldown
            Server.textArea.setCaretPosition(Server.textArea.getDocument()
                    .getLength());
        } catch (Exception e) {
            System.out.println("IO error " + e);
            try {
                socket.close();
            } catch (IOException e1) {
                e1.printStackTrace();
            }
        } finally {
            try {
                socket.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
saveChaToLeadboard simply takes the name, level, kp and skill and uses a PreparedStatement to save them to my MySQL table.
I hope you can help me; I just don't see the mistake. I think I need to join the thread somewhere, but if I put a join at the end of it (after socket.close()), it still does the same.
Here is the method that saves to the database:
public void saveChaToLeadboard(LeadboardElement element) {
    try {
        // load driver
        Class.forName("com.mysql.jdbc.Driver");
        connection = DriverManager.getConnection(this.databaseURL
                + DATABASE_NAME, this.user, this.password);
        // insert values into the prep statement
        preparedStatement = connection
                .prepareStatement(PREP_INSERT_STATEMENT);
        preparedStatement.setString(1, element.getName());
        preparedStatement.setInt(2, element.getLevel());
        preparedStatement.setInt(3, element.getKillPoints());
        if (!element.getSkill().equalsIgnoreCase("")) {
            preparedStatement.setString(4, element.getSkill());
        } else {
            preparedStatement.setString(4, null);
        }
        // execute
        preparedStatement.executeUpdate();
        connection.close();
    } catch (Exception e) {
        Server.textArea.append(e.getMessage() + "\n");
        Server.textArea.setCaretPosition(Server.textArea.getDocument()
                .getLength());
        try {
            connection.close();
        } catch (SQLException e1) {
            e1.printStackTrace();
        }
    }
}
Thanks a lot!
Regards
Your run() method is mangled, but I suspect that part of the problem is that you are not always closing network sockets and streams. In particular, I suspect that you are not closing them if there is an exception while reading or processing the data you read. You should always close sockets and streams in a finally block (or, in Java 7 and later, with try-with-resources).
Another potential problem is that some of the connections may be stalling due to the other end not sending data. To deal with that, you would need to set a read timeout on the socket ... so that connections to slow / stuck clients can be closed.
Finally, it is probably unrealistic to even try to process 2000+ connections in parallel with a thread per connection. That's a LOT of resources [1]. I recommend you use a thread pool with a fixed upper limit in the low hundreds, and stop accepting new connections if all threads are in use.
[1] Each thread stack occupies at least 64 KB of memory on a HotSpot JVM, and possibly as much as 1 MB. Then there are the heap resources that the thread directly or indirectly refers to, and the OS resources needed to maintain the state of the threads and the sockets. For 2000 threads, that's probably multiple GB of memory.
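A minimal sketch of the suggestions above, under stated assumptions: a fixed pool of worker threads, a `Semaphore` so the accept loop stops taking new connections while every worker is busy, a read timeout, and try-with-resources so each socket is closed even when a read fails. The pool size, the 2-second timeout and the one-int-in/one-int-out exchange are illustrative placeholders, not the asker's protocol; `BoundedServer`, `serve`, `handle` and `demo` are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class BoundedServer {
    /**
     * Accepts maxConnections clients and hands each to a fixed pool of poolSize threads.
     * A permit is taken before accept(), so no new connection is accepted while every
     * worker is busy. (A real server would loop forever; the limit keeps this testable.)
     */
    static void serve(ServerSocket server, int poolSize, int maxConnections)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Semaphore slots = new Semaphore(poolSize);
        try {
            for (int i = 0; i < maxConnections; i++) {
                slots.acquire();                  // blocks while all workers are busy
                Socket socket = server.accept();
                pool.execute(() -> {
                    try {
                        handle(socket);
                    } finally {
                        slots.release();          // always give the slot back
                    }
                });
            }
        } finally {
            pool.shutdown();                      // running handlers are allowed to finish
        }
    }

    /** One exchange per connection; try-with-resources closes the socket on every path. */
    static void handle(Socket socket) {
        try (Socket s = socket;
             DataInputStream in = new DataInputStream(s.getInputStream());
             DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
            s.setSoTimeout(2000);                 // a silent client fails with SocketTimeoutException
            int request = in.readInt();
            out.writeInt(request * 2);            // placeholder for the real work, e.g. the DB insert
        } catch (IOException e) {
            System.out.println("IO error " + e);  // the socket is already closed at this point
        }
    }

    /** Runs `clients` sequential clients against the server on a free port; returns the sum of replies. */
    static int demo(int clients) {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread acceptor = new Thread(() -> {
                try {
                    serve(server, 4, clients);
                } catch (IOException | InterruptedException e) {
                    e.printStackTrace();
                }
            });
            acceptor.start();
            int sum = 0;
            for (int i = 1; i <= clients; i++) {
                try (Socket c = new Socket("127.0.0.1", server.getLocalPort());
                     DataOutputStream out = new DataOutputStream(c.getOutputStream());
                     DataInputStream in = new DataInputStream(c.getInputStream())) {
                    out.writeInt(i);
                    out.flush();
                    sum += in.readInt();
                }
            }
            acceptor.join();
            return sum;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(10)); // 110: client i gets 2 * i back
    }
}
```

Compared with a thread per connection, the semaphore caps the number of live handler threads at the pool size; a burst of clients waits in the listen backlog (and is refused if that fills up) instead of creating thousands of threads.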
IMHO 2000 threads is on the high side for a single process, and 2000 database connections definitely are.
Regardless of whether or not you're hitting limits with 2000 incoming connections, your approach simply will not scale.
To achieve scalability, you need to look at using resource pools. This means:
a pool of reader threads that read from the sockets and queue the data for processing;
a pool of worker threads that process the data queued by the reader threads;
a pool of database connections used by the worker threads. This pool could be sized so that each worker thread has its own connection, but the important thing is that you don't continually open and close database connections.
Look at the concurrency API (java.util.concurrent) for the thread pools and the NIO API for the IO.
This arrangement will allow you to tune your server to achieve the desired throughput.
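A minimal sketch of the "pool of database connections" point: worker threads from a fixed thread pool borrow a resource from a fixed-size pool and hand it back, instead of opening and closing a connection per request. The generic `ResourcePool` class and the counter standing in for `java.sql.Connection` are illustrative assumptions; in practice an existing JDBC connection-pool library is the usual choice.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

public class ResourcePool<T> {
    private final BlockingQueue<T> idle;

    /** Creates all `size` resources up front, e.g. `size` database connections. */
    ResourcePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Borrows a resource (waiting while all are in use), runs `work` with it, and always hands it back. */
    <R> R withResource(Function<T, R> work) throws InterruptedException {
        T resource = idle.take();
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource); // never full here: this thread took one out
        }
    }

    /** Runs `items` jobs on `workers` threads that share `connections` pooled resources. */
    static int[] demo(int items, int workers, int connections) {
        AtomicInteger created = new AtomicInteger();
        AtomicInteger processed = new AtomicInteger();
        // Hypothetical "connection": a counter standing in for java.sql.Connection.
        ResourcePool<AtomicInteger> pool = new ResourcePool<>(connections, () -> {
            created.incrementAndGet();
            return new AtomicInteger();
        });
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < items; i++) {
            executor.execute(() -> {
                try {
                    pool.withResource(conn -> conn.incrementAndGet()); // "use" the connection
                    processed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {created.get(), processed.get()};
    }

    public static void main(String[] args) {
        int[] r = demo(100, 8, 3);
        System.out.println(r[0] + " connections created, " + r[1] + " jobs processed"); // 3 connections created, 100 jobs processed
    }
}
```

Eight workers share three "connections" here, so at most three jobs touch the database at once, and no connection is ever opened or closed per job.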
