Can I save a big JSON file directly to MongoDB? - java

I have a web page with multiple controls containing text, images, and attachments, and we allow customers to upload (drag and drop) attachments. I need to save this data to MongoDB. Until now, I have been saving the text data to one collection and the attachments separately using GridFS.
I want to save all the data (text/images), including the attachments as Base64-encoded data, in a single JSON record.
Can I save this entire data as a single record in MongoDB? The total size could be more than 20 MB if there are attachments. How can I achieve this?
Can I write the entire data to a JSON file and save it into MongoDB using GridFS?

Can I save this entire data as a single record in MongoDB? The total size could be more than 20 MB if there are attachments. How can I achieve this?
The maximum size of a single BSON document in current versions of MongoDB is 16 MB, so I'm afraid you cannot save it as a single document.
Can I write the entire data to a JSON file and save it into MongoDB using GridFS?
This, on the other hand, you can do, though I don't see why you would. Your initial scheme (one document plus files in GridFS) is the "normal" way of handling such cases; storing everything in GridFS doesn't give you any edge, since GridFS itself automatically splits files into multiple chunks.
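For reference, a minimal sketch of that document-plus-GridFS scheme using the MongoDB Java sync driver (4.x assumed); the database, bucket, collection, field names, and file name are placeholders, not anything from the question:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;
import org.bson.Document;
import org.bson.types.ObjectId;

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

public class SavePageWithAttachments {
    public static void main(String[] args) throws Exception {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("mydb");                 // placeholder name
            GridFSBucket attachments = GridFSBuckets.create(db, "attachments");

            // Each attachment goes into GridFS, which splits it into chunks automatically.
            ObjectId attachmentId;
            try (InputStream in = new FileInputStream("report.pdf")) {     // placeholder file
                attachmentId = attachments.uploadFromStream("report.pdf", in);
            }

            // The page text and the references to the GridFS files form one small document,
            // which stays well under the 16 MB BSON limit.
            Document page = new Document("title", "Customer upload")
                    .append("text", "free-form text from the form controls")
                    .append("attachmentIds", Arrays.asList(attachmentId));
            MongoCollection<Document> pages = db.getCollection("pages");
            pages.insertOne(page);
        }
    }
}
```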

Related

Can I store text files in a Java Derby database?

I want to store text files in a Java Derby database and then plot some graphs from that data. Can I do that? (The text files contain ASCII data.)
You can use one of these three methods to store the data:
1. Store the contents of the file in a database column as text. Depending on the data size, either CLOB or one of the VARCHAR variants can be used (see the JDBC sketch below).
2. Store the file itself in the database using a BINARY column; the file is read as binary and stored.
3. Instead of storing the file or its data in the database, just save the file in a folder and insert the file name with its path into the database. Then select the file name back from the database and read the contents directly from the file in that folder.
The graph can then be plotted according to your requirements and the type of graph, either directly in Java code or with a third-party library instead of reinventing the wheel.
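If it helps, here is a minimal JDBC sketch of the first option (the text file into a CLOB column of an embedded Derby database), assuming a recent Derby with the JDBC 4.0 streaming setters; the database name, table, column names, and file name are placeholders:

```java
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DerbyTextFileStore {
    public static void main(String[] args) throws Exception {
        // Embedded Derby database, created on first use; assumes the table does not exist yet.
        try (Connection con = DriverManager.getConnection("jdbc:derby:graphdb;create=true")) {
            try (PreparedStatement ddl = con.prepareStatement(
                    "CREATE TABLE data_files (name VARCHAR(255), content CLOB)")) {
                ddl.execute();
            }

            // Insert the whole text file into the CLOB column.
            try (Reader reader = Files.newBufferedReader(Paths.get("measurements.txt"));
                 PreparedStatement ins = con.prepareStatement(
                         "INSERT INTO data_files (name, content) VALUES (?, ?)")) {
                ins.setString(1, "measurements.txt");
                ins.setCharacterStream(2, reader);
                ins.executeUpdate();
            }

            // Read the content back, e.g. to parse the ASCII values for plotting.
            try (PreparedStatement sel = con.prepareStatement(
                    "SELECT content FROM data_files WHERE name = ?")) {
                sel.setString(1, "measurements.txt");
                try (ResultSet rs = sel.executeQuery()) {
                    if (rs.next()) {
                        String text = rs.getString("content");
                        System.out.println(text.length() + " characters read back");
                    }
                }
            }
        }
    }
}
```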

Java Hadoop: inserting and querying large data in JSON format

I have a requirement to build a system with Java and Hadoop to handle large-scale data processing (in JSON format). The system I'm going to create includes inserting data into file storage (whether HDFS or a database) and querying the processed data.
I have a big picture of using Hadoop MapReduce to query the data that the user wants.
But one thing that confuses me is how I should insert the data. Should I use HDFS and insert the file using Java with the Hadoop API? Or is it better to use other tools (e.g. HBase, a relational database, a NoSQL database) to insert the data, so that Hadoop MapReduce takes its input from those tools?
Please advise.
Thank you very much
I would suggest the HDFS/Hive/JsonSerDe approach.
The solution outline would look like this:
1. Store your JSON data on HDFS.
2. Create external tables in Hive and use a JSON SerDe to map the JSON data to the columns of your table.
3. Query your data using HiveQL.
In the above solution, since Hive is schema-on-read, your JSON data will be parsed every time you query the tables.
But if you want to parse the data only once and your data arrives in batches (weekly, monthly), it would be good to parse it once and create a staging table, which can then be used for frequent querying without repetitive parsing by the SerDe.
I have an example created at: Hadoopgig
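As a rough illustration of steps 2 and 3, here is a hedged Java sketch that issues the DDL and a HiveQL query over JDBC. It assumes HiveServer2 is reachable on localhost:10000, the Hive JDBC driver and the HCatalog JsonSerDe are available in your cluster, and the table name, columns, and HDFS path are placeholders that must match your JSON keys:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJsonQuery {
    public static void main(String[] args) throws Exception {
        // Assumes HiveServer2 on localhost:10000 and the Hive JDBC driver on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {

            // External table over JSON files already uploaded to HDFS; the SerDe maps
            // top-level JSON keys to the declared columns at read time (schema-on-read).
            stmt.execute(
                "CREATE EXTERNAL TABLE IF NOT EXISTS events (" +
                " id STRING, username STRING, amount DOUBLE" +
                ") ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'" +
                " STORED AS TEXTFILE LOCATION '/data/events'");

            // Query with HiveQL; the JSON is parsed on every query.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT username, SUM(amount) AS total FROM events GROUP BY username")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
                }
            }
        }
    }
}
```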

Loading a word2vec model into a MySQL database

I have a word2vec model stored in a text file as
also -0.036738 -0.062687 -0.104392 -0.178325 0.010501 0.049380....
one -0.089568 -0.191083 0.038558 0.156755 -0.037399 -0.013798....
The size of the text file is more than 8GB.
I want to read this file into a MySQL database, using the first word as a key (in one column) and the rest of the line as another column. Is it possible to do so without reading each line and splitting it?
I went through some related questions, but they didn't match what I want:
How to read a file and add its content to database?
read text file content and insert it into a mysql database
You can do it by:
making a simple for loop that iterates over the records in the model,
aggregating about 100 records in an array,
using MySQL's bulk insert feature to insert hundreds of records at once,
and using a fast language like Go if you can.
What you are trying to do is very possible; let me know if you need code for it.
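For what it's worth, here is the same loop-and-batch idea in Java with plain JDBC (the answer suggests Go, but the approach is identical); the table, column names, credentials, and the 1000-row batch size are assumptions:

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class Word2VecLoader {
    public static void main(String[] args) throws Exception {
        // rewriteBatchedStatements lets Connector/J turn the batch into multi-row INSERTs.
        String url = "jdbc:mysql://localhost:3306/vectors?rewriteBatchedStatements=true";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             BufferedReader in = Files.newBufferedReader(
                     Paths.get("model.txt"), StandardCharsets.UTF_8);
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO embeddings (word, vector) VALUES (?, ?)")) {

            con.setAutoCommit(false);
            int pending = 0;
            String line;
            while ((line = in.readLine()) != null) {
                // Split only on the first space: the word, then the vector text after it.
                int space = line.indexOf(' ');
                if (space <= 0) {
                    continue; // skip malformed lines
                }
                ps.setString(1, line.substring(0, space));
                ps.setString(2, line.substring(space + 1));
                ps.addBatch();

                if (++pending == 1000) { // flush in batches so memory stays flat
                    ps.executeBatch();
                    con.commit();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();
                con.commit();
            }
        }
    }
}
```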

Is it possible to modify the existing id of an Elasticsearch document?

I have to change the format of the Elasticsearch document id. I was wondering if it's possible without deleting and re-indexing all the documents.
You have to reindex. The simplest way to apply this change to your existing data is to create a new index with the new settings and copy all of your documents from the old index to the new index with the bulk API; see:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
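On newer Elasticsearch versions the same scroll-and-bulk copy can be done server-side with the _reindex API, which also lets a Painless script rewrite _id on the way. A minimal sketch with the low-level Java REST client (6.4+ assumed); the index names and the id transformation are placeholders for whatever new format you need:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ReindexWithNewIds {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {

            // Copy every document from the old index into a new one,
            // rewriting _id on the way (placeholder transformation).
            Request reindex = new Request("POST", "/_reindex");
            reindex.setJsonEntity(
                "{" +
                " \"source\": { \"index\": \"items_v1\" }," +
                " \"dest\": { \"index\": \"items_v2\" }," +
                " \"script\": { \"source\": \"ctx._id = 'new-' + ctx._id\" }" +
                "}");
            Response response = client.performRequest(reindex);
            System.out.println(response.getStatusLine());
        }
    }
}
```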
Yes, you can do it by fetching the data and re-indexing it, but if you have gigabytes of data you should run it as a long-running job.
Alternatively, you can fetch the old-format document ids from the indexed data and store a mapping from the new-format id to the old one in external storage such as Cassandra, MongoDB, or even SQL (whatever your application needs); then, when fetching or displaying the data, replace the old id with the mapped newer id.

With H2 Database can I perform a SQL query on CSV text read from a Java Reader e.g. StringReader?

Is there a way to perform SQL queries on CSV text held in memory and read in from a Java Reader, e.g. a StringReader?
org.h2.tools.Csv.read(Reader reader, String[] colNames) would allow me to retrieve a result set containing all the rows and columns. However, I actually want to perform a query on the CSV text read from the Reader.
The background: I receive a file containing multiple CSV sections for each entity (see Can H2 Database query a CSV file containing multiple sections of different record groups?), and while parsing the file I store each of the CSV sections I need in a String (one String per section). This shouldn't bog down memory, as I only keep the data in memory for a short time and each CSV section is relatively small. I need to perform queries on these CSV sections to build a document in a custom format.
I could write each CSV section to a file (as a set of files) and use CSVREAD, but I don't want to do that as I need my application to be as fast as possible and splitting and writing the sections to disk will thrash the hard drive to death.
You could write a user-defined function that returns a result set and use that to generate the required rows. Within your user-defined function, you can use the Csv tool from H2 (or actually any CSV tool).
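A minimal sketch of that approach: a table-function alias whose Java method feeds the CSV text to H2's Csv tool. It assumes a reasonably recent H2 where Csv.read is an instance method, that your setup allows Java method aliases, and that the CSV text is passed as a SQL literal; the alias, class, and column names are placeholders:

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.h2.tools.Csv;

public class CsvTableFunction {

    // Table-function alias target: H2 treats the returned ResultSet as rows.
    // Column names are taken from the CSV header line when colNames is null.
    public static ResultSet readCsvText(String csvText) throws Exception {
        return new Csv().read(new StringReader(csvText), null);
    }

    public static void main(String[] args) throws Exception {
        String section = "ID,NAME,AMOUNT\n1,alpha,10\n2,beta,20\n";
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:");
             Statement stmt = con.createStatement()) {
            stmt.execute("CREATE ALIAS CSV_TEXT FOR \"CsvTableFunction.readCsvText\"");
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT NAME, CAST(AMOUNT AS INT) AS AMOUNT" +
                    " FROM CSV_TEXT('" + section.replace("'", "''") + "')" +
                    " WHERE CAST(AMOUNT AS INT) > 15")) {
                while (rs.next()) {
                    System.out.println(rs.getString("NAME") + " -> " + rs.getInt("AMOUNT"));
                }
            }
        }
    }
}
```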
This is not possible directly, since a DBMS can usually only query its own optimized data storage. You have to import the text with the mentioned org.h2.tools.Csv.read into a table and perform the queries on that table. The table may be a temporary one, to prevent any writes to disk, assuming memory is sufficient.
