MongoInternalException while inserting into MongoDB - Java

I was inserting data into MongoDB when I suddenly ran into this error, and I don't know how to fix it. Is it because the maximum document size was exceeded? If not, why am I getting this error? Does anyone know how to fix it? Below is the error I encountered:
Exception in thread "main" com.mongodb.MongoInternalException: DBObject of size 163745644 is over Max BSON size 16777216
I know my dataset is large... but is there any other solution?

The document you are trying to insert exceeds the max BSON document size, i.e. 16 MB.
Here is the reference documentation: http://docs.mongodb.org/manual/reference/limits/
To store documents larger than the maximum size, MongoDB provides the GridFS API.
The mongofiles utility makes it possible to manipulate files stored in your MongoDB instance as GridFS objects from the command line. It is particularly useful because it provides an interface between objects stored in your file system and GridFS.
Ref: mongofiles

To insert a document larger than 16 MB you need to use GridFS. GridFS is an abstraction layer on top of MongoDB that splits data into chunks (255 KB by default). Since you are using Java, it is simple to use with the Java driver too. Here I am inserting an Elasticsearch archive (around 20 MB) into MongoDB. Sample code:
import java.io.File;

import com.mongodb.DB;
import com.mongodb.MongoClient;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSDBFile;
import com.mongodb.gridfs.GridFSInputFile;

MongoClient mongo = new MongoClient("localhost", 27017);
DB db = mongo.getDB("testDB");

String newFileName = "elasticsearch-Jar";
File archiveFile = new File("/home/impadmin/elasticsearch-1.4.2.tar.gz");
GridFS gfs = new GridFS(db);

// Insertion: the file is split into 255 KB chunks (fs.chunks) with one
// metadata document per file (fs.files)
GridFSInputFile inputFile = gfs.createFile(archiveFile);
inputFile.setFilename(newFileName);
inputFile.put("name", "devender");
inputFile.put("age", 23);
inputFile.save();

// Fetch back by filename
GridFSDBFile outputFile = gfs.findOne(newFileName);
Find out more here.
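The fetched handle can then be streamed back to disk, or looked up by the custom metadata set at insert time; a minimal sketch continuing the snippet above (writeTo and the query-based findOne are part of the same legacy GridFS API, and the output path is just an example):
// Stream the stored file back out to the file system
outputFile.writeTo(new File("/tmp/elasticsearch-1.4.2.tar.gz"));

// Look the file up by the metadata attached at insert time
// (BasicDBObject comes from the com.mongodb package)
GridFSDBFile byMetadata = gfs.findOne(new BasicDBObject("name", "devender"));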
If you don't want to go through the Java driver, you can use the mongofiles utility directly, as mentioned in the other answer.
Hope that helps. :)

Related

Storing large Mongo Document using GridFs

I have a large MongoDB document that I want to store using the GridFS library.
For small documents, we use MongoDbTemplate as:
DBObject dbObject = new BasicDBObject();
dbObject.put("user", "alex");
mongoDbTemplate.save(dbObject, "collectionName");
For large documents, we use GridFsTemplate as:
DBObject metaData = new BasicDBObject();
metaData.put("user", "alex");
InputStream inputStream = new FileInputStream("src/main/resources/test.png");
gridFsTemplate.store(inputStream, "test.png", "image/png", metaData).toString();
Here we don't define any collection name. Is there any way to store large documents within a given collection?
GridFS uses files and chunks collections. Their names are not configurable.
You can choose which database you store the data in.
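If choosing the database is enough, here is a minimal sketch of pointing a GridFsTemplate at a dedicated database (this assumes Spring Data MongoDB 3.x, where SimpleMongoClientDatabaseFactory and this GridFsTemplate constructor exist, and reuses your existing MappingMongoConverter bean as mongoConverter):
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.springframework.data.mongodb.core.SimpleMongoClientDatabaseFactory;
import org.springframework.data.mongodb.gridfs.GridFsTemplate;

// GridFS will create fs.files and fs.chunks inside this database
MongoClient client = MongoClients.create("mongodb://localhost:27017");
SimpleMongoClientDatabaseFactory dbFactory =
        new SimpleMongoClientDatabaseFactory(client, "largeDocsDb");
GridFsTemplate gridFsTemplate = new GridFsTemplate(dbFactory, mongoConverter);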

Connection closed when trying to read an oracle Clob returned from procedure

I have an Oracle procedure that takes an input CLOB and returns an output CLOB.
When I try to retrieve the value I can reach the object: calling toString on it gives me "oracle.sql.CLOB#625a8a83". But when I try to actually read the object, whichever way I try, I always get a connection closed exception.
In my code:
MapSqlParameterSource parametros = new MapSqlParameterSource();
// setting input parameter
parametros.addValue("PE_IN",
        new SqlLobValue("IN DATA CLOB", new DefaultLobHandler()),
        Types.CLOB);
// Executing call
Map<String, Object> out = jdbcCall.execute(parametros);
salida.setDatosRespuesta(out.get("PS_OUT").toString());
If I change the last line to this:
Clob clob = (Clob) out.get("PS_OUT");
long len = clob.length();
String rtnXml = clob.getSubString(1, (int) len);
I get the connection closed error. I have tried several approaches and cannot solve this problem. Any ideas?
I think you are using Spring's SimpleJdbcCall. If so, the database configuration uses the default settings of the Oracle driver, and you need to increase the read timeout for the connection. Check the DatabaseMetaData documentation, and also the OracleConnection property CONNECTION_PROPERTY_THIN_READ_TIMEOUT_DEFAULT. This happens because you are reading a large amount of data from the database; remember that a CLOB can hold up to 4 GB of data.
You also need to keep in mind that, if this process is very common in your application, you should consider the number of connections to the database, so that there are always connections available and your application stays responsive.
Regarding out.get("PS_OUT").toString(): this only prints the hash that represents the object, which is why that line appears to work fine.
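As an illustration of the timeout suggestion above, the thin driver's read timeout can be raised through the oracle.jdbc.ReadTimeout connection property; a minimal sketch with placeholder URL and credentials (the value is in milliseconds):
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("user", "app_user");          // placeholder credentials
props.setProperty("password", "app_password");
props.setProperty("oracle.jdbc.ReadTimeout", "60000"); // 60 s instead of the driver default

Connection conn = DriverManager.getConnection(
        "jdbc:oracle:thin:@//dbhost:1521/ORCL", props);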

Execute MongoTemplate.aggregate without row retrieval

I'm using the Spring Mongo driver to execute a large mongo aggregation statement that will run for a period of time. The output stage of this aggregation writes the output of the aggregation into a new collection. At no point do I need to retrieve the results of this aggregation in-memory.
When I run this in Spring boot, the JVM is running out of memory doing row retrieval, although I'm not using or storing any of the results.
Is there a way to skip row retrieval using MongoTemplate.aggregate?
Ex:
mongoTemplate.aggregate(Aggregation.newAggregation(
        Aggregation.sort(new Sort(new Sort.Order(Sort.Direction.DESC, "createdOn"))),
        Aggregation.group("accountId")
                .first("bal").as("bal")
                .first("timestamp").as("effectiveTimestamp"),
        Aggregation.project("_id", "effectiveTimestamp")
                .andExpression("trunc(bal * 10000 + 0.5) / 100").as("bal"),
        aggregationOperationContext -> new Document("$addFields", new Document("history", Arrays.asList(historyObj))),
        // Write results out to a new collection - do not store in memory
        Aggregation.out("newBalance")
).withOptions(Aggregation.newAggregationOptions().allowDiskUse(true).build()),
        "account", Object.class
);
Use the AggregationOptions builder's skipOutput(). This does not return a result when the aggregation pipeline contains an $out/$merge operation.
mongoTemplate.aggregate(aggregation.withOptions(newAggregationOptions().skipOutput().allowDiskUse(true).build()), "collectionName", EntityClass.class);
If you are using the MongoDB Java driver without a framework:
MongoClient client = MongoClients.create("mongodb://localhost:27017");
MongoDatabase database = client.getDatabase("my-database");
MongoCollection<Document> model = database.getCollection(collectionName);
AggregateIterable<Document> aggregateResult = model.aggregate(bsonListOfAggregationPipeline);
// instead of iterating over the results, call toCollection() to skip retrieving them
aggregateResult.toCollection();
References:
https://jira.mongodb.org/browse/JAVA-3700
https://developer.mongodb.com/community/forums/t/mongo-java-driver-out-fails-in-lastpipelinestage-when-aggregate/9784
I was able to resolve this by using
MongoTemplate.aggregateStream(...).withOptions(Aggregation.newAggregationOptions().cursorBatchSize(0).build())

Streaming big files from postgres database into file system using JDBC

I store files in my Postgres database in a column of type bytea, with sizes potentially exceeding the allocated Java heap space, so when trying to write those files to the file system I quickly run into out-of-memory issues.
I am using JDBC to perform the query and then extract the content as binary stream.
This is a simplified version of my code:
public File readContent(String contentId) {
    PreparedStatement statement = jdbcTemplate.getDataSource().getConnection()
            .prepareStatement("SELECT content from table.entry WHERE id=?");
    statement.setString(1, contentId);
    ResultSet resultSet = statement.executeQuery();
    resultSet.next();
    File file = writeToFileSystem(resultSet.getBinaryStream(1));
    resultSet.close();
    return file;
}

private File writeToFileSystem(InputStream inputStream) {
    File dir = createDirectories(Paths.get(properties.getTempFolder(), UUID.randomUUID().toString())).toFile();
    File file = new File(dir, "content.zip");
    FileUtils.copyInputStreamToFile(inputStream, file);
    return file;
}
My expectation was that this would let me stream the data from the database into the file without ever having to load it into memory entirely. This approach doesn't work, however, as I am still getting OutOfMemoryErrors as soon as the query is executed:
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.postgresql.core.PGStream.receiveTupleV3(PGStream.java:395)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2118)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:288)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:430)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:356)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:168)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:116)
at sun.reflect.GeneratedMethodAccessor201.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:114)
at com.sun.proxy.$Proxy149.executeQuery(Unknown Source)
at [...].ContentRepository.readContent(ContentRepository.java:111)
Is there any way I can stream the data from the database into a file without having to increase the Java VM's available memory?
As per this mail group discussion you should not be using bytea for this use case:
There are two methods to store binary data in pg and they have different access methods and performance characteristics. Bytea data is expected to be shorter and is returned in whole with a ResultSet by the server. For larger data you want to use large objects which return a pointer (oid) to the actual data which you can then stream from the server at will.
This page describes some of the differences between the two and demonstrates using a pg specific api to access large objects, but getBlob/setBlob will work just fine.
See Chapter 7. Storing Binary Data which shows example code and Chapter 35. Large Objects that goes into details:
PostgreSQL has a large object facility, which provides stream-style access to user data that is stored in a special large-object structure. Streaming access is useful when working with data values that are too large to manipulate conveniently as a whole.
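If the column is migrated to a large object (an oid column, assumed here to be called content_oid), the data can be streamed through the standard Blob API; a minimal sketch based on the question's code, keeping in mind that large-object access in PostgreSQL must happen inside a transaction:
import java.io.InputStream;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public File readContent(String contentId) throws Exception {
    try (Connection connection = jdbcTemplate.getDataSource().getConnection()) {
        connection.setAutoCommit(false); // large objects must be read inside a transaction
        try (PreparedStatement statement = connection.prepareStatement(
                "SELECT content_oid FROM table.entry WHERE id=?")) {
            statement.setString(1, contentId);
            try (ResultSet resultSet = statement.executeQuery()) {
                resultSet.next();
                Blob blob = resultSet.getBlob(1);          // a pointer (oid), not the whole payload
                try (InputStream inputStream = blob.getBinaryStream()) {
                    return writeToFileSystem(inputStream); // streams in chunks instead of one big array
                }
            }
        } finally {
            connection.commit();
        }
    }
}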

Writing data to remote mongodb database

I am reading a CSV file line by line and upserting the data into a MongoDB database. It takes approximately 2 minutes to read, process, and write the data from all files when the database and the files are on the same machine, whereas it takes around 5 minutes when the database is located on another machine in my network, and even longer on a remote machine. Can anyone please help me reduce this time? Thanks.
An approach to your problem that reduces the processing time:
To read the CSV file and load it into MongoDB, use an ETL tool such as Kettle.
http://wiki.pentaho.com/display/BAD/Write+Data+To+MongoDB
This will improve the time from reading the CSV to writing into MongoDB.
The simplest way to get the data onto the remote machine is to export the data from your local db and import it on the remote machine.
https://docs.mongodb.com/v2.6/core/import-export/
Hope it helps!
I saw that you are using Java to load your Mongo database.
Recent versions of the Java driver support bulk operations, so you can send a batch of inserts to Mongo instead of sending them one by one. This will speed up inserts into MongoDB a lot.
DBCollection collection = db.getCollection("my_collection");
List<DBObject> list = new ArrayList<>();
for (int i = 0; i < 100; i++) {
    // generate your data
    BasicDBObject obj = new BasicDBObject("key", "value");
    list.add(obj);
}
collection.insert(list); // bulk insert of 100 objects!
This is available since Mongo 2.6 : https://docs.mongodb.com/manual/reference/method/Bulk.insert/
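Since the question mentions upserting rather than plain inserts, the same bulk API can batch those as well; a minimal sketch using the legacy driver's unordered bulk operation (the "key" field is a placeholder for whatever uniquely identifies a CSV row):
BulkWriteOperation bulk = collection.initializeUnorderedBulkOperation();
for (int i = 0; i < 100; i++) {
    // each CSV line becomes one document; "key" stands in for your unique field
    BasicDBObject doc = new BasicDBObject("key", "value" + i);
    bulk.find(new BasicDBObject("key", doc.get("key")))
        .upsert()
        .replaceOne(doc);
}
bulk.execute(); // one round trip for the whole batch of upserts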
