Reduce memory usage in Java

I have a Java web service that reads 13,000,000 date strings like '08-23-2016 12:54:44' from a database. My development environment is Java 8, MySQL 5.7 and Tomcat 8. I have declared a string array String[] data to store them, and I use Guice to inject the array's initial values as empty strings. However, the memory usage is still huge. This is my code:
String[] data; // size is 1,000,000

void generateDataWrapper(String params) {
    // read over 13,000,000 date strings
    ResultSet rs = mySQLCon.readData(params);
    clearData(data); // set to empty strings
    int index = 0;
    while (rs.next()) {
        data[index++] = rs.getString("date");
        if (index == (size - 1)) { // calculate every 1,000,000 rows, 13 times in total
            // calculate statistics
            ...
            // reset all to empty strings
            clearData(data);
            index = 0;
        }
    }
}
// mySQLCon.readData function
ResultSet readData(String params) {
    try {
        String query = generateQuery(params);
        Statement postStmt = connection.createStatement();
        ResultSet rs = postStmt.executeQuery(query);
        return rs;
    } catch (Exception e) {
    }
    return null;
}
If I call this function once, the memory reaches 12 GB. If I call it again, it goes to 20 GB, and on the third call it reaches 25 GB and throws a 'java.lang.OutOfMemoryError: GC overhead limit exceeded' in com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2174).
This is part of the error message:
java.lang.OutOfMemoryError: GC overhead limit exceeded
com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2174)
com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1964)
com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3316)
com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:463)
com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3040)
com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2288)
com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2681)
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2547)
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2505)
com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1370)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
java.lang.reflect.Method.invoke(Unknown Source)
I have changed the garbage collection algorithm to:
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
But it's not helping.
I have tried changing data to a static variable, but the problem remains.
Currently the JVM heap is 8 GB and the Tomcat memory is 24 GB; however, I don't think increasing the memory will solve the problem.
I don't understand why memory keeps increasing every time I call this function. Could someone give me some suggestions?

Resources like a ResultSet have to be closed to release the underlying system resources. This can be done automatically by declaring the resource in a try-with-resources block, e.g. try (ResultSet resultSet = ...).
You can also fetch only a limited number of rows from the database as they are requested from the ResultSet, instead of all of them at once.
Objects become eligible for garbage collection only when they are no longer referenced. Your array stays in memory at its full size as long as it is referenced; once it is no longer referenced and the VM is running low on memory, it can dispose of the array, possibly avoiding an OutOfMemoryError.
Unexpectedly high memory usage can be analyzed by creating a heap dump and exploring it with the JDK's jvisualvm tool.
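For illustration, here is a minimal sketch of how both suggestions could look, assuming a plain JDBC Connection is obtainable from mySQLCon and that a hypothetical process() method does the per-row work. With MySQL Connector/J, a forward-only, read-only statement with a fetch size of Integer.MIN_VALUE streams rows instead of buffering the whole result set:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: stream rows from MySQL and close resources automatically.
void processDates(Connection connection, String query) throws SQLException {
    try (Statement stmt = connection.createStatement(
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
        stmt.setFetchSize(Integer.MIN_VALUE); // MySQL-specific hint: stream instead of buffering all rows
        try (ResultSet rs = stmt.executeQuery(query)) {
            while (rs.next()) {
                process(rs.getString("date")); // hypothetical per-row processing
            }
        }
    } // Statement and ResultSet are closed here, even if an exception was thrown
}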

Additionally, you can change your string array to a long array, since strings consume a large amount of memory. In your case the character data of one date string alone is 38 bytes (19 chars * 2 bytes), plus per-object overhead, whereas a long only takes 8 bytes of memory.
long[] data; // size is 1,000,000

void generateDataWrapper(String params) {
    // read over 13,000,000 date strings
    ResultSet rs = mySQLCon.readData(params);
    clearData(data); // reset the array
    int index = 0;
    SimpleDateFormat formatter = new SimpleDateFormat("MM-dd-yyyy HH:mm:ss");
    while (rs.next()) {
        try {
            Date date = formatter.parse(rs.getString("date"));
            data[index++] = date.getTime();
        } catch (ParseException pe) {
            pe.printStackTrace();
        }
        if (index == (size - 1)) { // calculate every 1,000,000 rows, 13 times in total
            // calculate statistics
            ...
            // reset the array
            clearData(data);
            index = 0;
        }
    }
}
Wherever you need the string again, you can simply format it back as follows:
SimpleDateFormat formatter = new SimpleDateFormat("MM-dd-yyyy HH:mm:ss");
Date date = new Date(data[i]);
String dateString = formatter.format(date);
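Since the question mentions Java 8, roughly the same idea can also be written with java.time, which avoids SimpleDateFormat's thread-safety pitfalls. This is only a sketch and assumes the dates really are in the MM-dd-yyyy HH:mm:ss format shown in the question:

import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: store each date as epoch seconds in the long[] instead of a String
DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy HH:mm:ss");

long toEpochSeconds(String s) {
    return LocalDateTime.parse(s, fmt).toEpochSecond(ZoneOffset.UTC);
}

String fromEpochSeconds(long epochSeconds) {
    return LocalDateTime.ofEpochSecond(epochSeconds, 0, ZoneOffset.UTC).format(fmt);
}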

First, thanks for all your suggestions. I figured this out after reading mm759's answer and realized that I forgot to close the ResultSet after I was done reading. After adding rs.close(), every call takes the same amount of time to finish, although the memory still reaches the maximum I set.

Related

Cassandra Exception

For my current project I'm using Cassandra DB, and data is fetched frequently: at least 30 DB requests hit every second, and each request needs to fetch at least 40,000 rows from the DB. Following is my current code; this method returns a HashMap.
public Map<String, String> loadObject(ArrayList<Integer> tradigAccountList) {
    com.datastax.driver.core.Session session;
    Map<String, String> orderListMap = new HashMap<>();
    List<ResultSetFuture> futures = new ArrayList<>();
    List<ListenableFuture<ResultSet>> Future;
    try {
        session = jdbcUtils.getCassandraSession();
        PreparedStatement statement = jdbcUtils.getCassandraPS(CassandraPS.LOAD_ORDER_LIST);
        for (Integer tradingAccount : tradigAccountList) {
            futures.add(session.executeAsync(statement.bind(tradingAccount).setFetchSize(3000)));
        }
        Future = Futures.inCompletionOrder(futures);
        for (ListenableFuture<ResultSet> future : Future) {
            for (Row row : future.get()) {
                orderListMap.put(row.getString("cliordid"), row.getString("ordermsg"));
            }
        }
    } catch (Exception e) {
    } finally {
    }
    return orderListMap;
}
My data request query is something like this:
"SELECT cliordid,ordermsg FROM omsks_v1.ordersStringV1 WHERE tradacntid = ?".
My Cassandra cluster has 2 nodes, with 32 concurrent read and write threads each, and my DB schema is as follows:
CREATE TABLE omsks_v1.ordersstringv1_copy1 (
tradacntid int,
cliordid text,
ordermsg text,
PRIMARY KEY (tradacntid, cliordid)
) WITH bloom_filter_fp_chance = 0.01
AND comment = ''
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'
AND caching = {
'keys' : 'ALL',
'rows_per_partition' : 'NONE'
}
AND compression = {
'sstable_compression' : 'LZ4Compressor'
}
AND compaction = {
'class' : 'SizeTieredCompactionStrategy'
};
My problem is that I am getting a Cassandra timeout exception. How can I optimize my code to handle all these requests?
It would be better if you attached the snippet of that exception (read/write exception). I assume you are getting a read timeout, since you are trying to fetch a large data set in a single request:
For each request at least 40000 rows needed to fetch from Db
If the result set is too big, Cassandra throws an exception when the results cannot be returned within the time limit configured in cassandra.yaml:
read_request_timeout_in_ms
You can increase the timeout, but that is not a good option. It may make the exception go away, but it will still take more time to return the result.
Solution: for a big data set you can fetch the results using manual pagination (range queries) with a limit.
SELECT cliordid,ordermsg FROM omsks_v1.ordersStringV1
WHERE tradacntid >= ? and cliordid > ? limit ?;
Or use a range query:
SELECT cliordid,ordermsg FROM omsks_v1.ordersStringV1
WHERE tradacntid = ? and cliordid >= ? and cliordid <= ?;
This will be much faster than fetching the whole result set.
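A minimal sketch of that limit-based paging loop, assuming a prepared statement for a query like the one above (pageStmt is a hypothetical name), the session from the question, and a page size of 3000. Since cliordid is the clustering column, "cliordid greater than the last seen value" is a valid paging condition within one partition:

// Hypothetical sketch: page through one trading account with LIMIT-based pagination.
// pageStmt is assumed to be prepared for:
//   SELECT cliordid, ordermsg FROM omsks_v1.ordersStringV1
//   WHERE tradacntid = ? AND cliordid > ? LIMIT ?
final int PAGE_SIZE = 3000;
String lastSeen = ""; // smaller than any real cliordid value
while (true) {
    ResultSet page = session.execute(pageStmt.bind(tradingAccount, lastSeen, PAGE_SIZE));
    int rows = 0;
    for (Row row : page) {
        orderListMap.put(row.getString("cliordid"), row.getString("ordermsg"));
        lastSeen = row.getString("cliordid");
        rows++;
    }
    if (rows < PAGE_SIZE) {
        break; // last page reached
    }
}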
You can also try reducing the fetch size via public Statement setFetchSize(int fetchSize) and check whether the exception is still thrown. Note that setFetchSize controls the page size, but it doesn't control the maximum number of rows returned in a ResultSet; the whole result set is still returned, one page at a time.
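As a rough illustration, with the session and prepared statement from the question the driver can page transparently if you execute synchronously with a smaller fetch size; the value 1000 here is just an example:

// Sketch: the driver fetches 1000 rows per page; iterating the ResultSet
// triggers the next page fetch automatically, so only one page is held in memory at a time.
Statement bound = statement.bind(tradingAccount).setFetchSize(1000);
for (Row row : session.execute(bound)) {
    orderListMap.put(row.getString("cliordid"), row.getString("ordermsg"));
}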
Another point to be noted:
What's the size of tradigAccountList?
Too many requests at a time can also lead to a timeout. A large tradigAccountList means many read requests are issued at once (load balancing of requests is handled by Cassandra, and how many requests can be handled depends on the cluster size and other factors), which may cause this exception.
Some related Links:
Cassandra read timeout
NoHostAvailableException With Cassandra & DataStax Java Driver If Large ResultSet
Cassandra .setFetchSize() on statement is not honoured

ORA-12518, TNS:listener could not hand off client connection comes from a loop with heavy memory access

I have a loop with heavy memory access against Oracle.
int firstResult = 0;
int maxResult = 500;
int targetTotal = 8000; // more or less
int phase = 1;
for (int i = 0; i <= targetTotal; i += maxResult) {
    try {
        Session session = .... init hibernate session ...
        // Start Transaction
        List<Accounts> importableInvAcList = ...getting list using session and firstResult-maxResult...
        List<ContractData> dataList = new ArrayList<>();
        List<ErrorData> errorDataList = new ArrayList<>();
        for (Accounts account : importableInvAcList) {
            ... Converting 500 Accounts objects to ContractData objects ...
            ... along with 5 more database calls using the existing session ...
            ... While converting the objects we generate thousands of ErrorData ...
            dataList.add(.. converted account to ContractData ..);
            errorDataList.add(.. generated error data ..);
        }
        dataList.stream().forEach(session::save);      // 500 entries
        errorDataList.stream().forEach(session::save); // 5,000-10,000 entries
        ... Commit Transaction ...
        phase++;
    } catch (Exception e) {
        return;
    }
}
In the second phase (2nd loop iteration) the exception comes out. Sometimes the exception appears in the 3rd or 5th phase.
I also checked the runtime memory:
Runtime runtime = Runtime.getRuntime();
long total = runtime.totalMemory();
long free = runtime.freeMemory();
long used = total - free;
long max = runtime.maxMemory();
And in the second phase, for example, the status was:
Used: 1022 MB, Free: 313 MB, Total Allocated: 1335 MB
Stack Trace is here...
org.hibernate.exception.GenericJDBCException: Cannot open connection
at org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:140)
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:128)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:52)
at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:449)
at org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:167)
at org.hibernate.jdbc.JDBCContext.connection(JDBCContext.java:142)
at org.hibernate.transaction.JDBCTransaction.begin(JDBCTransaction.java:85)
at org.hibernate.impl.SessionImpl.beginTransaction(SessionImpl.java:1463)
at ibbl.remote.tx.TxSessionImpl.beginTx(TxSessionImpl.java:41)
at ibbl.remote.tx.TxController.initPersistence(TxController.java:70)
at com.ibbl.data.util.CDExporter2.run(CDExporter2.java:130)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: Listener refused the connection with the following error:
ORA-12518, TNS:listener could not hand off client connection
Note that this process runs in a thread, and there are 3 similar threads running at a time.
Why does this exception show up after the loop has been running for a while?
there are 3 similar Thread running at a time.
If your code creates a total of 3 Threads, then, optimally, you need only 3 Oracle Connections. Create all of them before any Thread is created. Create the Threads, assign each Thread a Connection, then start the Threads.
Chances are good, though, that your code might be way too aggressively consuming resources on whatever machine is hosting it. Even if you eliminate the ORA-12518, the RDBMS server may "go south". By "go south", I mean if your application is consuming too many resources the machine hosting it or the machine hosting the RDBMS server may "panic" or something equally dreadful.
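A minimal sketch of the first suggestion (one pre-created connection per worker thread), assuming a configured javax.sql.DataSource and a hypothetical runExport job; the names are illustrative, not taken from the original code:

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical sketch: open exactly 3 connections up front, then hand one to each thread.
void startWorkers(DataSource dataSource) throws SQLException {
    Connection[] connections = new Connection[3];
    for (int i = 0; i < connections.length; i++) {
        connections[i] = dataSource.getConnection();
    }
    for (int i = 0; i < connections.length; i++) {
        final Connection conn = connections[i];
        new Thread(() -> runExport(conn)).start(); // runExport is the per-thread export job
    }
}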

How to process large XML file (9 GB) with STAX api

I'm always getting heap memory problems while processing huge files. Here I'm processing a 9 GB XML file.
This is my code.
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream in = new FileInputStream(sourcePath);
XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
Map<String, Cmt> mapCmt = new ConcurrentHashMap<String, Cmt>();
while (eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent();
    if (event.isStartElement()) {
        // some processing and assigning values to the map
        Cmt cmt = new Cmt();
        // get attributes
        cmt.setDetails(attribute.getValue());
        mapCmt.put(someKey, cmt);
    }
}
I'm getting a heap memory problem during iteration after some time.
Please help me write optimized code.
Note: the server has 3 GB of heap space available. I can't increase it.
I'm executing with the following parameters: -Xms1024m -Xmx3g
My XML looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DatosAbonados xmlns="http://www.cnmc.es/DatosAbonados">
<DatosAbonado Operacion="1" FechaExtraccion="2015-10-08">
<Titular>
<PersonaJuridica DocIdentificacionJuridica="A84619488" RazonSocial="HERMANOS ROJAS" NombreComercial="PINTURAS ROJAS"/>
</Titular>
<Domicilio Escalera=" " Piso=" " Puerta=" " TipoVia="AVENIDA" NombreVia="MANOTERAS" NumeroCalle="10" Portal=" " CodigoPostal="28050" Poblacion="Madrid" Provincia="28"/>
<NumeracionAbonado>
<Rangos NumeroDesde="211188600" NumeroHasta="211188699" ConsentimientoGuias-Consulta="1" VentaDirecta-Publicidad="1" ModoPago="1">
<Operador RazonSocial="11888 SERVICIO CONSULTA TELEFONICA S.A." DocIdentificacionJuridica="A83519389"/>
</Rangos>
</NumeracionAbonado>
</DatosAbonado>
<DatosAbonado Operacion="1" FechaExtraccion="2015-10-08">
<Titular>
<PersonaJuridica DocIdentificacionJuridica="A84619489" RazonSocial="HERMANOS RUBIO" NombreComercial="RUBIO PELUQUERIAS"/>
</Titular>
<Domicilio Escalera=" " Piso=" " Puerta=" " TipoVia="AVENIDA" NombreVia="BURGOS" NumeroCalle="18" Portal=" " CodigoPostal="28036" Poblacion="Madrid" Provincia="28"/>
<NumeracionAbonado>
<Rangos NumeroDesde="211186000" NumeroHasta="211186099" ConsentimientoGuias-Consulta="1" VentaDirecta-Publicidad="1" ModoPago="1">
<Operador RazonSocial="11888 SERVICIO CONSULTA TELEFONICA S.A." DocIdentificacionJuridica="A83519389"/>
</Rangos>
</NumeracionAbonado>
</DatosAbonado>
</DatosAbonados>
My Cmt class is :
public class Cmt {
    private List<DetailInfo> details;

    public List<DetailInfo> getDetails() {
        return details;
    }

    public void setDetails(DetailInfo detail) {
        if (details == null) {
            details = new ArrayList<DetailInfo>();
        }
        this.details.add(detail);
    }
}
Actually there are very few Cmt objects, but I have a DetailInfo object for every element, so a huge number of DetailInfo objects are created.
My logic is this:
if (startElement.getName().getLocalPart().equals("DatosAbonado")) {
    detailInfo = new DetailInfo();
    Iterator<Attribute> attributes = startElement.getAttributes();
    while (attributes.hasNext()) {
        Attribute attribute = attributes.next();
        if (attribute.getName().toString().equals("Operacion")) {
            detailInfo.setOperacion(attribute.getValue());
        }
    }
}
if (event.isEndElement()) {
    EndElement endElement = event.asEndElement();
    if (endElement.getName().getLocalPart().equals("DatosAbonado")) {
        Cmt cmt = null;
        if (mapCmt.keySet().contains(identificador)) {
            cmt = mapCmt.get(identificador);
        } else {
            cmt = new Cmt();
        }
        cmt.setDetails(detailInfo);
        mapCmt.put(identificador, cmt);
    }
}
The root of your problems is most likely this:
mapCmt.put(someKey, cmt);
You are populating a hashmap with a number of large Cmt objects. You need to do one of the following:
Process the data immediately rather than saving it in a data structure.
Write the data out to a database for later querying.
Increase the heap size.
Figure out a less "memory hungry" representation for your data.
The last two approaches don't scale though. As you increase the size of the input file, you will need progressively more memory ... until you eventually exceed the memory capacity of your execution platform.
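For the first option, here is a minimal sketch of how the existing StAX loop could handle each DatosAbonado as soon as its end tag is seen, instead of accumulating Cmt objects in mapCmt; handleRecord is a hypothetical callback standing in for whatever per-record work is actually needed (writing to a database, updating counters, emitting output):

// Sketch only: process each record as soon as it is fully parsed, keep nothing in a global map.
if (event.isEndElement()
        && event.asEndElement().getName().getLocalPart().equals("DatosAbonado")) {
    handleRecord(detailInfo); // hypothetical per-record processing
    detailInfo = null;        // drop the reference so the object can be garbage collected
}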
DatosAbonado is the killer indeed. If you have plenty of them, they will cause your application to choke.
The approach is simply not scalable. As pointed out by Stephan C, you need to process each DatosAbonado as it arrives and not collect them in a container.
Since this is a typical scenario for which I developed the LDX+ code generator, I went through the steps of:
creating an XML Schema file from XML (because you had not provided it) using: https://devutilsonline.com/xsd-xml/generate-xsd-from-xml
generate code with LDX+
This code generator is actually using SAX, and the resulting code allows you to:
serialize the complexElements to Java objects
configure how to treat 1 to many relationships (like the one you have here) at runtime
I uploaded the code here: https://bitbucket.org/lolkedijkstra/ldx-samples
To see the code, navigate to the Source folder. There you'll find DatosAbonnados.
This approach really scales well (memory consumption is flat).

clearing batch preparedstatements

I have a Java application which reads files and writes to an Oracle DB row by row.
We have come across a strange error during batch insert which does not occur during sequential insert. The error is strange because it occurs only with the IBM JDK 7 on the AIX platform, and I get it on different rows every time. My code looks like this:
prpst = conn.prepareStatement(query);
while ((line = bf.readLine()) != null) {
    numLine++;
    batchInsert(prpst, line);
    //onebyoneInsert(prpst, line);
}

private static void batchInsert(PreparedStatement prpst, String line) throws IOException, SQLException {
    prpst.setString(1, "1");
    prpst.setInt(2, numLine);
    prpst.setString(3, line);
    prpst.setString(4, "1");
    prpst.setInt(5, 1);
    prpst.addBatch();
    if (++batchedLines == 200) {
        prpst.executeBatch();
        batchedLines = 0;
        prpst.clearBatch();
    }
}

private static void onebyoneInsert(PreparedStatement prpst, String line) throws Exception {
    int batchedLines = 0;
    prpst.setString(1, "1");
    prpst.setInt(2, numLine);
    prpst.setString(3, line);
    prpst.setString(4, "1");
    prpst.setInt(5, 1);
    prpst.executeUpdate();
}
I get this error in batch insert mode:
java.sql.BatchUpdateException: ORA-01461: can bind a LONG value only for insert into a LONG column
at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:10345)
I already know why this ORA error usually occurs, but that is not my case: I am nearly sure that I am not binding data too large for a column. Maybe I am hitting some bug in the IBM JDK 7, but I could not prove that.
My question is whether there is a way I can avoid this problem. One-by-one insert is not an option because we have big files and it takes too much time.
Try with
prpst.setInt(5,new Integer(1))
What is the type of variable "numLine"?
Can you share the types of the columns corresponding to the fields you set in the PreparedStatement?
Try processing once with "onebyoneInsert" and share the output for that case. It might help identify the root cause.
Also print the value of "numLine" to the console.

retry open files in directory

I am trying the following code to open files in a certain directory. The file names are assigned by date, but some dates are missing. I want to iterate through the dates to get the files, making the code go back one day every time it fails to find a file, until it finally finds one (currentdate is a global variable, and the strange XMLElement call is because I'm using Processing).
What I think the code should do is:
try to open the file with the given date.
on error, it goes to catch and gets a new date.
the process is repeated until a valid date is found.
when a valid date is found it goes to the line where break is and exits the loop.
But for some reason it does weird stuff. EDIT: sometimes it jumps too much, especially near the first month.
Is my logic not working for some reason?
Thanks
int counter = 0;
String strdate = getdatestring(counter);
while (true) {
    try {
        xmldata = new XMLElement(this, "dir/" + strdate + "_filename.xml");
        break;
    } catch (NullPointerException e) {
        counter += 1;
        strdate = getdatestring(counter);
    }
}

String getdatestring(int counter) {
    Date firstdate = new Date();
    int daystosum = 0;
    String strcurrentdate = "";
    if (keyPressed && key == '7') {
        daystosum = -7;
    }
    daystosum = daystosum - counter;
    Calendar c = Calendar.getInstance();
    SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
    try {
        firstdate = formatter.parse("2012-04-13"); // first day of the database
    } catch (ParseException e) {
        println(e);
    }
    c.setTime(currentdate);
    c.add(Calendar.DATE, daystosum);
    currentdate = c.getTime();
    if (currentdate.before(firstdate)) {
        currentdate = firstdate;
    }
    strcurrentdate = formatter.format(currentdate);
    return strcurrentdate;
}
I believe once you do this,
daystosum=daystosum-counter;
you need to reset the counter as
counter = 0;
otherwise next time it will subtract an even bigger number. For example, say daystosum starts at 0 and counter is 5; after daystosum=daystosum-counter;, daystosum becomes -5. You go through the while loop again, the file is not found, and counter increases to 6. In that case daystosum=daystosum-counter; would give -5 - 6 = -11, but you would want it to move to -6. Resetting the counter should fix your issue.
On another note, I think you can list the files in the parent directory using File.listFiles() and search the file names instead. That way you are not attempting to open files again and again.
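A rough sketch of that idea, assuming the files live in "dir", follow the yyyy-MM-dd_filename.xml naming from the question, and that Java 8 is available:

import java.io.File;
import java.util.Arrays;

// Hypothetical sketch: list the existing files once and pick the newest one whose
// date prefix is on or before the requested date (lexical order works for yyyy-MM-dd).
String findLatestFile(String targetDate) {
    File[] files = new File("dir").listFiles((d, name) -> name.endsWith("_filename.xml"));
    if (files == null) {
        return null;
    }
    return Arrays.stream(files)
            .map(File::getName)
            .filter(name -> name.substring(0, 10).compareTo(targetDate) <= 0)
            .max(String::compareTo)
            .orElse(null);
}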
