Unexpected output of InfluxDB batch write - java

I am using batch processing to write into InfluxDB and below is my code for doing that.
String dbName = "test";
influxDB.query(new Query("CREATE DATABASE " + dbName, dbName));
Stopwatch watch = Stopwatch.createStarted();
influxDB.enableBatch(2000, 100, TimeUnit.MILLISECONDS);
for (int j = 0; j < 100000; j++) {
    Point point = Point.measurement("cpu")
            .addField("idle", (double) j)
            .addField("system", 3.0 * j)
            .build();
    influxDB.write(dbName, "autogen", point);
}
influxDB.disableBatch();
System.out.println("Write for " + 100000 + " Points took:" + watch);
Here I am writing 100000 points, and the write completes in very reasonable time; however, only a few records end up in the DB instead of the expected 100000.
select count(idle) from cpu gives me only 89; I am expecting it to be 100000.
Meanwhile, select * from cpu gives me the following:
cpu
time                     idle  system
2016-10-06T23:57:41.184Z 8     24
2016-10-06T23:57:41.185Z 196   588
2016-10-06T23:57:41.186Z 436   1308
2016-10-06T23:57:41.187Z 660   1980
2016-10-06T23:57:41.188Z 916   2748
2016-10-06T23:57:41.189Z 1278  3834
2016-10-06T23:57:41.19Z  1405  4215
2016-10-06T23:57:41.191Z 1409  4227
2016-10-06T23:57:41.192Z 1802  5406
2016-10-06T23:57:41.193Z 1999  5997
2016-10-06T23:57:41.456Z 3757  11271
2016-10-06T23:57:41.457Z 3999  11997
2016-10-06T23:57:41.858Z 4826  14478
...and so on.
My question is: why are values of idle missing? For example, after 8 it should be 9, 10, 11, and so on, but those values were never persisted; the data jumps straight to 196, then skips again to 436. Any idea how to persist every value of the loop variable j in this situation?

This line
influxDB.enableBatch(2000, 100, TimeUnit.MILLISECONDS);
says that the client flushes its buffer whenever 2000 points have accumulated, or every 100 ms, whichever comes first. Since you are trying to push 100k points through it, most of them get flushed in large bursts.
Instead, write fewer points per batch. My recommendation would be to write 5000 points in a single batch, and run multiple batches until all your data is in the DB.
// Batch 1
influxDB.enableBatch(5000, 100, TimeUnit.MILLISECONDS);
for (int j = 0; j < 5000; j++) {
    Point point = Point.measurement("cpu")
            .addField("idle", (double) j)
            .addField("system", 3.0 * j)
            .build();
    influxDB.write(dbName, "autogen", point);
}
influxDB.disableBatch();
// Batch 2
// ...
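If you prefer explicit batching over enableBatch's background flushing, influxdb-java can also build the batch by hand via BatchPoints. A minimal sketch, assuming the same dbName and point construction as above:

BatchPoints batchPoints = BatchPoints.database(dbName)
        .retentionPolicy("autogen")
        .build();
for (int j = 0; j < 5000; j++) {
    batchPoints.point(Point.measurement("cpu")
            .addField("idle", (double) j)
            .addField("system", 3.0 * j)
            .build());
}
influxDB.write(batchPoints); // one write call sends the whole batch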

Related

Iterative GraphFrames AggregateMessages hitting memory limits

I'm using GraphFrame's aggregateMessages capability to build a custom clustering algorithm. I tested this algorithm on a small sample dataset (~100 items) and verified that it works. But when I run it on my real dataset of 50k items, I get OOM errors after ~10 iterations. Interestingly, the first few iterations are processed in a couple of minutes and memory stays in the normal range. It's after iteration 6 that memory usage creeps up to ~30GB and eventually bombs. I am running this on a 2-node cluster with 16 cores and 32GB.
Since this is an iterative algorithm, and memory only increases after each iteration, I wonder if I need to release memory somehow. I added the unpersist calls at the end of the loop, but that hasn't helped.
Are there any other efficiencies I could use? Are there best practices around using GraphFrames in an iterative setting?
Another thing I've noticed is that on the executors page of the Spark UI, the used "storage memory" is only ~300MB, but the Spark process is in fact taking ~30GB. Not sure if this is a memory leak!
while (true) {
    System.out.println("[" + new Date() + "] Running " + i);
    Dataset<Row> lastRoutesDs = groups;
    Dataset<Row> groupUnwind = groups.withColumn("id", explode(col("routeItems")));
    GraphFrame gf = new GraphFrame(groupUnwind, edgesDs);
    Dataset<Row> lvl1 = gf.aggregateMessages()
            .sendToSrc(when(
                    callUDF("contains_in_array_str",
                            AggregateMessages.dst().getField("routeItems"),
                            AggregateMessages.src().getField("id")).equalTo(false),
                    struct(AggregateMessages.dst().getField("routeItems").as("routeItems"),
                            AggregateMessages.dst().getField("routeScores").as("routeScores"),
                            AggregateMessages.dst().getField("grpId").as("grpId"),
                            AggregateMessages.dst().getField("grpScore").as("grpScore"),
                            AggregateMessages.edge().getField("score").as("edgeScore"))))
            .agg(collect_set(AggregateMessages.msg()).as("incomings"))
            .withColumn("inItem", explode(col("incomings")))
            .groupBy("id", "inItem.grpId")
            .agg(first("inItem.routeItems").as("routeItems"),
                    first("inItem.routeScores").as("routeScores"),
                    first("inItem.grpScore").as("grpScore"),
                    collect_list("inItem.edgeScore").as("inScores"))
            .groupBy("grpId")
            .agg(bestRouteAgg.apply(col("routeItems"), col("routeScores"), col("inScores"),
                    col("grpScore"), col("id"), col("grpScore")).as("best"))
            .withColumn("newScore", callUDF("calcRouteScores", expr("size(best.routeItems)+1"),
                    col("best.routeScores"), col("best.inScores")))
            .withColumn("edgeCount", expr("size(best.routeScores)"))
            .persist(StorageLevel.MEMORY_AND_DISK());

    lvl1.filter("newScore > " + groupMaxScore)
            .withColumn("itr", lit(i))
            .select("grpId", "best.routeItems", "best.routeScores", "best.grpScore", "edgeCount", "itr")
            .write()
            .mode(SaveMode.Append)
            .json(workspaceDir + "clusters-rank-collect");

    if (lvl1.count() == 0) {
        System.out.println("****** End reached " + i);
        break;
    }

    Dataset<Row> newGroups = lvl1.filter("newScore <= " + groupMaxScore)
            .withColumn("routeItems_new",
                    callUDF("merge2Array", col("best.routeItems"), array(col("best.newNode"))))
            .withColumn("routeScores_new",
                    callUDF("merge2ArrayDouble", col("best.routeScores"), col("best.inScores")))
            .select(col("grpId"), col("routeItems_new").as("routeItems"),
                    col("routeScores_new").as("routeScores"), col("newScore").as("grpScore"));

    if (i > 0 && (i % 2) == 0) {
        newGroups = newGroups.checkpoint();
    }
    newGroups = newGroups.persist(StorageLevel.DISK_ONLY());
    System.out.println(newGroups.count());

    groups.unpersist();
    lastRoutesDs.unpersist();
    groupUnwind.unpersist();
    lvl1.unpersist();
    groups = newGroups;
    i++;
}
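Stripped to its skeleton, the cache-management pattern the loop above is reaching for looks like the following. This is only an illustrative sketch: computeNextIteration is a hypothetical stand-in for the whole aggregateMessages pipeline, and the blocking unpersist(true) flag is an assumption worth trying, since the non-blocking default returns before blocks are actually freed.

Dataset<Row> groups = initialGroups; // assumed starting dataset
int i = 0;
while (true) {
    // one aggregateMessages pass, as in the full loop (hypothetical helper)
    Dataset<Row> newGroups = computeNextIteration(groups);
    if (i > 0 && i % 2 == 0) {
        newGroups = newGroups.checkpoint(); // truncates the lineage, not just the cache
    }
    newGroups = newGroups.persist(StorageLevel.DISK_ONLY());
    newGroups.count();      // materialize before releasing the previous iteration
    groups.unpersist(true); // blocking=true waits until the blocks are really gone
    groups = newGroups;
    i++;
}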

What is the fastest way of sending a byte array down the wire using Java? And how to measure it?

What is the fastest way of sending an array of 1000 bytes down a TCP socket using Java? And how do I measure it?
If my code sends the 1000-byte array 25_000 times to "localhost" on the same machine, would the measurement be fair?
How to achieve the fastest transfer rate?
ByteBuffer buffer = ByteBuffer.allocateDirect(1000);
SocketChannel clientSocketChannel = SocketChannel.open(new InetSocketAddress("localhost", 8888));
clientSocketChannel.socket().setTcpNoDelay(true);
// empty array of bytes (BUFFER_SIZE presumably equals 1000, the buffer capacity)
byte[] array = new byte[BUFFER_SIZE];
for (int i = 0; i < 25_000; i++) {
    buffer.put(array);
    buffer.flip();
    long start = System.nanoTime();
    // ==========================================
    do {
        int len = clientSocketChannel.write(buffer);
        if (len < 0)
            throw new EOFException();
    } while (buffer.hasRemaining());
    // ==========================================
    long elapsed = System.nanoTime() - start;
    System.out.println(i + ":\t" + elapsed);
    buffer.flip();
}
System.out.println("Client: finished");
I get these numbers during execution:
14669: 14543
14670: 62877
14671: 14971
14672: 13687
14673: 70576
14674: 20104
14675: 16254
14676: 60311
14677: 15826
14678: 15826
14679: 18820
14680: 13688
14681: 12832
14682: 12832
14683: 14115
14684: 12831
14685: 12832
14686: 14971
14687: 4705
14688: 4277
14689: 4278
14690: 4277
14691: 3849
14692: 4705
14693: 4706
14694: 4278
14695: 4705
14696: 4277
14697: 4277
14698: 4705
14699: 3849
14700: 4277
14701: 4277
so for about 100 iterations it runs faster, then it goes back to
15067: 14115
15068: 16254
15069: 19248
15070: 13687
15071: 13259
15072: 14115
15073: 14970
15074: 15827
15075: 14543
15076: 16682
15077: 14116
15078: 16254
15079: 14543
15080: 16253
The iperf utility shows me that loopback can transfer 200 MBytes per second. That is approximately 4 microseconds per 1000 bytes. During execution my program's transfer rate indeed approaches this rate, but drops from time to time to 14-15 microseconds per 1000-byte message. There is no JIT involved here, as JIT compilation happens during the first 10,000 iterations. No class loading, no garbage collection.
Microbenchmarking like this gives misleading results. If we measure not the time of each individual iteration but the time of 1000 iterations divided by 1000, we get far more accurate results.
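A sketch of that measurement, reusing the buffer, array, and clientSocketChannel from the question (the block size of 1000 is an arbitrary choice):

final int BLOCK = 1000;
long start = System.nanoTime();
for (int i = 0; i < BLOCK; i++) {
    buffer.put(array);
    buffer.flip();
    // drain the whole buffer, exactly as in the original inner loop
    while (buffer.hasRemaining()) {
        if (clientSocketChannel.write(buffer) < 0)
            throw new EOFException();
    }
    buffer.flip();
}
// average cost of one 1000-byte message over the whole block
long avgNanosPerMessage = (System.nanoTime() - start) / BLOCK;
System.out.println("avg ns/message: " + avgNanosPerMessage);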

MongoDB ACKNOWLEDGED write concern faster than UNACKNOWLEDGED?

I've got a very simple test program that performs faster with ACKNOWLEDGED bulk inserts than with UNACKNOWLEDGED. And it's not just a little faster - I'm seeing a factor of nearly 100!
My understanding of the difference between these two write concerns is solely that with ACKNOWLEDGED the client waits for confirmation from the server that the operation has been executed (but not necessarily made durable), while with UNACKNOWLEDGED the client only knows that the request made it out onto the wire. So it would seem preposterous that the former could actually perform at a higher speed, yet that's what I'm seeing.
I'm using the Java driver (v2.12.0) with Oracle's Java JDK v1.7.0_71, and mongo version 3.0.0 on 64-bit Windows 7. I'm running mongod, completely out-of-the-box (fresh install), no sharding or anything. And before each test I ensure that the collection is empty and has no non-default indexes.
I would appreciate any insight into why I'm consistently seeing the opposite of what I'd expect.
Thanks.
Here's my code:
package test;

import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteOperation;
import com.mongodb.BulkWriteResult;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import com.mongodb.WriteConcern;
import java.util.Arrays;

public class Test {
    private static final int BATCHES = 100;
    private static final int BATCH_SIZE = 1000;
    private static final int COUNT = BATCHES * BATCH_SIZE;

    public static void main(String[] argv) throws Exception {
        DBCollection coll = new MongoClient(new ServerAddress()).getDB("test").getCollection("test");
        for (String wcName : Arrays.asList("UNACKNOWLEDGED", "ACKNOWLEDGED")) {
            WriteConcern wc = (WriteConcern) WriteConcern.class.getField(wcName).get(null);
            coll.dropIndexes();
            coll.remove(new BasicDBObject());
            long start = System.currentTimeMillis();
            BulkWriteOperation bulkOp = coll.initializeUnorderedBulkOperation();
            // i runs 1..COUNT so that the final batch (i == COUNT) is executed too
            for (int i = 1; i <= COUNT; i++) {
                DBObject doc = new BasicDBObject().append("int", i).append("string", Integer.toString(i));
                bulkOp.insert(doc);
                if (i % BATCH_SIZE == 0) {
                    BulkWriteResult results = bulkOp.execute(wc);
                    if (wc == WriteConcern.ACKNOWLEDGED && results.getInsertedCount() != BATCH_SIZE) {
                        throw new RuntimeException("Bogus insert count: " + results.getInsertedCount());
                    }
                    bulkOp = coll.initializeUnorderedBulkOperation();
                }
            }
            long time = System.currentTimeMillis() - start;
            double rate = COUNT / (time / 1000.0);
            System.out.printf("%s[w=%s,j=%s]: Inserted %d documents in %s # %f/sec\n",
                    wcName, wc.getW(), wc.getJ(), COUNT, duration(time), rate);
        }
    }

    private static String duration(long msec) {
        return String.format("%d:%02d:%02d.%03d",
                msec / (60 * 60 * 1000),
                (msec % (60 * 60 * 1000)) / (60 * 1000),
                (msec % (60 * 1000)) / 1000,
                msec % 1000);
    }
}
And here's typical output:
UNACKNOWLEDGED[w=0,j=false]: Inserted 100000 documents in 0:01:27.025 # 1149.095088/sec
ACKNOWLEDGED[w=1,j=false]: Inserted 100000 documents in 0:00:00.927 # 107874.865156/sec
EDIT
Ran more extensive tests, per request from Markus W. Mahlberg. For these tests, I ran the code with four write concerns: UNACKNOWLEDGED, ACKNOWLEDGED, JOURNALED, and FSYNCED. (I would expect this order to show decreasing speed.) I ran 112 repetitions, each of which performed 100 batches of 1000 inserts under each of the four write concerns, each time into an empty collection with no indexes. Code was identical to original post but with two additional write concerns, and with output to CSV format for easy analysis.
Results summary:
UNACKNOWLEDGED: 1147.105004 docs/sec avg, std dev 27.88577035
ACKNOWLEDGED: 77539.27653 docs/sec avg, std dev 1567.520303
JOURNALED: 29574.45243 docs/sec avg, std dev 123.9927554
FSYNCED: 29567.02467 docs/sec avg, std dev 147.6150994
The huge inverted performance difference between UNACKNOWLEDGED and ACKNOWLEDGED is what's got me baffled.
Here's the raw data if anyone cares for it ("time" is elapsed msec for 100*1000 insertions; "rate" is docs/second):
"UNACK time","UNACK rate","ACK time","ACK rate","JRNL time","JRNL rate","FSYNC time","FSYNC rate"
92815,1077.4120562409094,1348,74183.9762611276,3380,29585.798816568047,3378,29603.31557134399
90209,1108.5368422219512,1303,76745.97083653108,3377,29612.081729345577,3375,29629.62962962963
91089,1097.8273995762386,1319,75815.01137225171,3382,29568.30277942046,3413,29299.73630237328
90159,1109.1516099335618,1320,75757.57575757576,3375,29629.62962962963,3377,29612.081729345577
89922,1112.0749093658949,1315,76045.62737642587,3380,29585.798816568047,3376,29620.853080568722
89997,1111.1481493827573,1306,76569.67840735069,3381,29577.048210588586,3379,29594.55460195324
90141,1109.373093264996,1319,75815.01137225171,3386,29533.372711163614,3378,29603.31557134399
89771,1113.9454835080371,1325,75471.69811320755,3387,29524.65308532625,3521,28401.022436807725
89716,1114.6283828971423,1325,75471.69811320755,3379,29594.55460195324,3379,29594.55460195324
90205,1108.5859985588381,1323,75585.78987150417,3377,29612.081729345577,3376,29620.853080568722
90092,1109.976468498868,1328,75301.2048192771,3382,29568.30277942046,3379,29594.55460195324
89822,1113.3129968159249,1322,75642.965204236,3385,29542.097488921714,3383,29559.562518474726
89821,1113.3253916122064,1310,76335.87786259541,3380,29585.798816568047,3383,29559.562518474726
89945,1111.7905386625162,1318,75872.53414264036,3379,29594.55460195324,3379,29594.55460195324
89917,1112.1367483345753,1352,73964.49704142011,3381,29577.048210588586,3377,29612.081729345577
90358,1106.7088691648773,1303,76745.97083653108,3377,29612.081729345577,3380,29585.798816568047
90187,1108.8072560346836,1348,74183.9762611276,3387,29524.65308532625,3395,29455.081001472754
90634,1103.3387029150208,1322,75642.965204236,3384,29550.827423167848,3381,29577.048210588586
90148,1109.2869503483162,1331,75131.48009015778,3389,29507.22927117144,3381,29577.048210588586
89767,1113.9951207013714,1321,75700.22710068131,3380,29585.798816568047,3382,29568.30277942046
89910,1112.2233344455567,1321,75700.22710068131,3381,29577.048210588586,3385,29542.097488921714
89852,1112.9412812180028,1316,75987.84194528875,3381,29577.048210588586,3401,29403.116730373422
89537,1116.8567184515898,1319,75815.01137225171,3380,29585.798816568047,3380,29585.798816568047
89763,1114.0447623185498,1331,75131.48009015778,3380,29585.798816568047,3382,29568.30277942046
90070,1110.2475852115022,1325,75471.69811320755,3383,29559.562518474726,3378,29603.31557134399
89771,1113.9454835080371,1302,76804.91551459293,3389,29507.22927117144,3378,29603.31557134399
90518,1104.7526458825869,1325,75471.69811320755,3383,29559.562518474726,3380,29585.798816568047
90314,1107.2480457071995,1322,75642.965204236,3380,29585.798816568047,3384,29550.827423167848
89874,1112.6688474976079,1329,75244.54477050414,3386,29533.372711163614,3379,29594.55460195324
89954,1111.6793027547415,1318,75872.53414264036,3381,29577.048210588586,3381,29577.048210588586
89903,1112.3099340400208,1325,75471.69811320755,3379,29594.55460195324,3388,29515.9386068477
89842,1113.0651588343983,1314,76103.500761035,3382,29568.30277942046,3377,29612.081729345577
89746,1114.2557885588217,1325,75471.69811320755,3378,29603.31557134399,3385,29542.097488921714
93249,1072.3975592231552,1327,75357.95026375283,3381,29577.048210588586,3377,29612.081729345577
93638,1067.9425019756936,1331,75131.48009015778,3377,29612.081729345577,3392,29481.132075471698
87775,1139.2765593847905,1340,74626.86567164179,3379,29594.55460195324,3378,29603.31557134399
86495,1156.136192843517,1271,78678.20613690009,3375,29629.62962962963,3376,29620.853080568722
85584,1168.442699570013,1276,78369.90595611285,3432,29137.529137529138,3376,29620.853080568722
86648,1154.094728095282,1278,78247.2613458529,3382,29568.30277942046,3411,29316.91586045148
85745,1166.2487608606916,1274,78492.93563579278,3380,29585.798816568047,3363,29735.355337496283
85813,1165.3246011676551,1279,78186.08287724786,3375,29629.62962962963,3376,29620.853080568722
85831,1165.0802157728558,1288,77639.75155279503,3376,29620.853080568722,3377,29612.081729345577
85807,1165.4060857505797,1259,79428.11755361399,3466,28851.702250432772,3375,29629.62962962963
85964,1163.2776511097668,1258,79491.2559618442,3378,29603.31557134399,3378,29603.31557134399
85854,1164.7680946723508,1257,79554.49482895785,3382,29568.30277942046,3375,29629.62962962963
85787,1165.6777833471272,1257,79554.49482895785,3377,29612.081729345577,3377,29612.081729345577
85537,1169.084723569917,1272,78616.35220125786,3377,29612.081729345577,3377,29612.081729345577
85408,1170.8505058074186,1271,78678.20613690009,3375,29629.62962962963,3425,29197.080291970804
85577,1168.5382754712132,1261,79302.14115781126,3378,29603.31557134399,3375,29629.62962962963
85663,1167.365140142185,1261,79302.14115781126,3377,29612.081729345577,3378,29603.31557134399
85812,1165.3381811401669,1273,78554.59544383347,3377,29612.081729345577,3378,29603.31557134399
85783,1165.7321380693145,1273,78554.59544383347,3377,29612.081729345577,3376,29620.853080568722
85682,1167.106276697556,1280,78125.0,3381,29577.048210588586,3376,29620.853080568722
85753,1166.1399601180133,1260,79365.07936507936,3379,29594.55460195324,3377,29612.081729345577
85573,1168.5928972923703,1332,75075.07507507507,3377,29612.081729345577,3377,29612.081729345577
86206,1160.0120641254668,1263,79176.56373713381,3376,29620.853080568722,3383,29559.562518474726
85593,1168.31983923919,1264,79113.92405063291,3380,29585.798816568047,3378,29603.31557134399
85903,1164.1036983574495,1261,79302.14115781126,3378,29603.31557134399,3377,29612.081729345577
85516,1169.3718134618082,1277,78308.53563038372,3375,29629.62962962963,3376,29620.853080568722
85553,1168.8660830128692,1291,77459.3338497289,3490,28653.295128939826,3377,29612.081729345577
85550,1168.907071887785,1293,77339.52049497294,3379,29594.55460195324,3379,29594.55460195324
85610,1168.0878402055835,1298,77041.60246533128,3384,29550.827423167848,3378,29603.31557134399
85522,1169.2897733916418,1267,78926.59826361484,3379,29594.55460195324,3379,29594.55460195324
85595,1168.2925404521293,1276,78369.90595611285,3379,29594.55460195324,3376,29620.853080568722
85451,1170.2613193526115,1286,77760.49766718507,3376,29620.853080568722,3391,29489.82601002654
85792,1165.609847071988,1252,79872.20447284346,3382,29568.30277942046,3376,29620.853080568722
86501,1156.0559993526085,1255,79681.2749003984,3379,29594.55460195324,3379,29594.55460195324
85718,1166.616113301757,1269,78802.20646178094,3382,29568.30277942046,3376,29620.853080568722
85605,1168.156065650371,1265,79051.38339920949,3378,29603.31557134399,3380,29585.798816568047
85398,1170.9876109510762,1274,78492.93563579278,3377,29612.081729345577,3395,29455.081001472754
86370,1157.809424568716,1273,78554.59544383347,3376,29620.853080568722,3376,29620.853080568722
85905,1164.0765962400326,1280,78125.0,3379,29594.55460195324,3379,29594.55460195324
86020,1162.5203441060219,1285,77821.01167315176,3375,29629.62962962963,3376,29620.853080568722
85726,1166.5072440099852,1272,78616.35220125786,3380,29585.798816568047,3380,29585.798816568047
85628,1167.8422945765403,1270,78740.15748031496,3379,29594.55460195324,3376,29620.853080568722
85989,1162.93944574306,1258,79491.2559618442,3376,29620.853080568722,3378,29603.31557134399
85981,1163.047650062223,1276,78369.90595611285,3376,29620.853080568722,3376,29620.853080568722
86558,1155.2947156819703,1269,78802.20646178094,3385,29542.097488921714,3378,29603.31557134399
85745,1166.2487608606916,1293,77339.52049497294,3378,29603.31557134399,3375,29629.62962962963
85544,1168.9890582624148,1266,78988.94154818325,3376,29620.853080568722,3377,29612.081729345577
85536,1169.0983913206135,1268,78864.35331230283,3380,29585.798816568047,3380,29585.798816568047
85477,1169.9053546568082,1278,78247.2613458529,3388,29515.9386068477,3377,29612.081729345577
85434,1170.4941826439124,1253,79808.45969672786,3378,29603.31557134399,3375,29629.62962962963
85609,1168.1014846569872,1276,78369.90595611285,3364,29726.516052318668,3376,29620.853080568722
85740,1166.316771635176,1258,79491.2559618442,3377,29612.081729345577,3377,29612.081729345577
85640,1167.6786548341897,1266,78988.94154818325,3378,29603.31557134399,3377,29612.081729345577
85648,1167.569587147394,1281,78064.012490242,3378,29603.31557134399,3376,29620.853080568722
85697,1166.9019919017,1287,77700.0777000777,3377,29612.081729345577,3378,29603.31557134399
85696,1166.9156086631815,1256,79617.83439490446,3379,29594.55460195324,3376,29620.853080568722
85782,1165.7457275419085,1258,79491.2559618442,3379,29594.55460195324,3379,29594.55460195324
85837,1164.9987767512844,1264,79113.92405063291,3379,29594.55460195324,3376,29620.853080568722
85632,1167.7877428998504,1278,78247.2613458529,3380,29585.798816568047,3459,28910.089621277824
85517,1169.3581393173288,1256,79617.83439490446,3379,29594.55460195324,3380,29585.798816568047
85990,1162.925921618793,1302,76804.91551459293,3380,29585.798816568047,3377,29612.081729345577
86690,1153.535586572846,1281,78064.012490242,3375,29629.62962962963,3381,29577.048210588586
86045,1162.1825788831425,1274,78492.93563579278,3380,29585.798816568047,3383,29559.562518474726
86146,1160.820003250296,1274,78492.93563579278,3382,29568.30277942046,3418,29256.87536571094
86027,1162.4257500552153,1280,78125.0,3382,29568.30277942046,3381,29577.048210588586
85992,1162.8988743138896,1281,78064.012490242,3376,29620.853080568722,3380,29585.798816568047
85857,1164.727395553071,1288,77639.75155279503,3382,29568.30277942046,3376,29620.853080568722
85853,1164.7816616775185,1284,77881.6199376947,3375,29629.62962962963,3374,29638.41138114997
86069,1161.8585088707896,1295,77220.07722007722,3378,29603.31557134399,3378,29603.31557134399
85842,1164.930919596468,1296,77160.49382716049,3378,29603.31557134399,3376,29620.853080568722
86195,1160.160102094089,1301,76863.95080707148,3376,29620.853080568722,3379,29594.55460195324
85523,1169.2761011657683,1305,76628.35249042146,3376,29620.853080568722,3378,29603.31557134399
85752,1166.1535591006625,1275,78431.37254901961,3374,29638.41138114997,3377,29612.081729345577
85441,1170.3982865369085,1286,77760.49766718507,3377,29612.081729345577,3380,29585.798816568047
85566,1168.6884977678048,1265,79051.38339920949,3377,29612.081729345577,3380,29585.798816568047
85523,1169.2761011657683,1267,78926.59826361484,3377,29612.081729345577,3376,29620.853080568722
86152,1160.7391586962578,1285,77821.01167315176,3374,29638.41138114997,3378,29603.31557134399
85684,1167.0790345922226,1272,78616.35220125786,3378,29603.31557134399,3384,29550.827423167848
86252,1159.3934053703103,1271,78678.20613690009,3376,29620.853080568722,3377,29612.081729345577

Titan + Tinkerpop extremely slow reads

My setup is: Java 1.7, Tinkerpop 2.6, Titan 0.5.3, Cassandra 2.1.2 and Elasticsearch 1.4.2.
My problem is that I have extremely slow reads. In my test code I'm inserting only one Vertex with one property. This takes 5 ms. Then I try to read this Vertex again. This takes 1500 ms. Why is the reading 300 times slower?
Any help is much appreciated.
long d1 = new Date().getTime();
String id = UUID.randomUUID().toString();
Vertex customer = g.addVertex();
customer.setProperty("somethingnew", id);
g.commit();
long d2 = new Date().getTime();
long d3 = 0;
Iterable<Vertex> its = g.query().has("somethingnew", id).vertices();
for (Vertex vert : its) {
    if (vert.getProperty("somethingnew").toString().equals(id)) {
        d3 = new Date().getTime();
    }
}
System.err.println("Insert took [ms]: " + (d2 - d1));
System.err.println("Read took [ms]: " + (d3 - d2));
You should probably read up on Titan indexes: without an index on "somethingnew", g.query().has(...) has to iterate over all vertices, which explains the slow read.
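For example, a composite index on that property could be built with Titan's management API before the property is first used. A sketch for Titan 0.5.x, assuming g is the TitanGraph instance; the index name is made up:

// Titan 0.5.x management API (com.thinkaurelius.titan.core.schema.*)
TitanManagement mgmt = g.getManagementSystem();
PropertyKey key = mgmt.makePropertyKey("somethingnew").dataType(String.class).make();
mgmt.buildIndex("bySomethingnew", Vertex.class).addKey(key).buildCompositeIndex();
mgmt.commit();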

Google App-Engine Java Batch Update

I need to upload a .csv file and save the records in bigtable.
My application successfully parses 200 records in the csv file and saves them to the table.
Here is my code to save the data.
for (int i = 0; i < lines.length - 1; i++) // lines holds all records in the csv file
{
    String line = lines[i];
    // The record has 3 columns: integer, integer, Text
    if (line.length() > 15)
    {
        int n = line.indexOf(",");
        if (n > 0)
        {
            int ID = Integer.parseInt(line.substring(0, n));
            int n1 = line.indexOf(",", n + 2);
            if (n1 > n)
            {
                int Col1 = Integer.parseInt(line.substring(n + 1, n1));
                String Col2 = line.substring(n1 + 1);
                myTable uu = new myTable();
                uu.setId(ID);
                uu.setCol1(Col1);
                Text t = new Text(Col2);
                uu.setCol2(t);
                PersistenceManager pm = PMF.get().getPersistenceManager();
                pm.makePersistent(uu);
                pm.close();
            }
        }
    }
}
But when the number of records grows, it gives a timeout error.
The csv file may have up to 800 records.
Is it possible to do that in App Engine?
(something like a batch update)
GAE limits your app's requests to 30 seconds, so you can't run long tasks.
The best approach is to split this CSV into smaller chunks and process them individually, one after another. If you can upload it only as one big file, you can store it as binary data and then process it (split and parse) using the Task Queue; note that tasks are also limited to 10 minutes per request, but you can always make a chain of tasks, as sketched below. Or you can use a backend to do the processing.
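A sketch of such a task chain using the App Engine Task Queue API. The /processChunk handler and its parameters are hypothetical; the idea is that each task parses one slice of the stored file and enqueues the next one:

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// kick off the chain: each task processes `size` lines starting at `offset`
// and, if lines remain, enqueues the next task with offset + size
Queue queue = QueueFactory.getDefaultQueue();
queue.add(TaskOptions.Builder
        .withUrl("/processChunk")                 // hypothetical servlet that parses one chunk
        .param("blobKey", blobKey.getKeyString()) // where the uploaded CSV was stored
        .param("offset", "0")
        .param("size", "100"));                   // e.g. 100 lines per task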
You could store your CSV file in the Blobstore (gzipped or not) and use a MapReduce job to read and persist each line in the Datastore.
