Can we treat binary files as documents? [closed] - java

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 12 days ago.
I want to be able to store multiple types of binary files per user. These could be PDFs, photos, or very small videos (~2 MB).
I have in mind 2 approaches:
1. Use MySQL with a BLOB column in a table, and store these different types of files in that column.
2. Use MySQL to store metadata about the binary files, but store the actual files in the filesystem.
I think (1) is simpler to implement but (2) allows for easier access of the files from everywhere e.g. even for download links.
What I am not sure about, though, is whether we can consider the binary files to be documents, in which case Cassandra or another NoSQL store might be a better choice. What are the downsides of treating binary files as "documents"?

The downside of this approach with Cassandra is that, depending on the table structure, your partitions could get too big. The prevailing wisdom is to keep partition sizes under 100 MB. If the table is partitioned on something unique like video_id, then each video is its own partition, and that shouldn't be a problem.
But if there's a category or playlist system where multiple videos are getting stored in the same partition, that could exceed that limit and read performance would degrade.
tl;dr
Regardless of database choice, option #2 is the best practice. Storing binary files in a database almost always leads to problems (corruption, slow reads, higher ops maintenance). Storing the metadata or file location data in the DB, and using that to reference the binary files is a much friendlier solution with fewer opportunities for failure.
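As a rough illustration of option #2, here is a minimal sketch in plain Java. Everything in it is an assumption made for the example: the class name FileStoreSketch, the FileMetadata fields, and the directory layout (one folder per user, one generated id per file). The actual INSERT of the metadata row into MySQL is left out; the returned object only stands in for that row.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical sketch: bytes go to the filesystem, metadata goes to the DB.
final class FileStoreSketch {

    // Stand-in for one row of a metadata table; the field names are assumptions.
    static final class FileMetadata {
        final String id;
        final long userId;
        final String originalName;
        final String contentType;
        final long sizeBytes;
        final String storedPath;

        FileMetadata(String id, long userId, String originalName,
                     String contentType, long sizeBytes, String storedPath) {
            this.id = id;
            this.userId = userId;
            this.originalName = originalName;
            this.contentType = contentType;
            this.sizeBytes = sizeBytes;
            this.storedPath = storedPath;
        }
    }

    private final Path root;

    FileStoreSketch(Path root) {
        this.root = root;
    }

    // Writes the bytes under root/<userId>/<generated id> and returns the
    // metadata that would be inserted into the database.
    FileMetadata save(long userId, String originalName, String contentType, byte[] data) {
        String id = UUID.randomUUID().toString();
        // The user-supplied name is kept only as metadata, never used as a path.
        Path target = root.resolve(Long.toString(userId)).resolve(id);
        try {
            Files.createDirectories(target.getParent());
            Files.write(target, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new FileMetadata(id, userId, originalName, contentType,
                data.length, target.toString());
    }

    // Reads the bytes back using only the stored location.
    byte[] load(FileMetadata meta) {
        try {
            return Files.readAllBytes(Paths.get(meta.storedPath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A download link would then look up the metadata row by id and stream the file from storedPath, so the database never has to hold the binary content.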

Related

How to fit large table in memory? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a Java Map<String, List<String>>.
Is there a way to improve this to make it use less memory without having too much impact on performance?
Three ideas:
An encoded byte array could provide a less memory-intensive representation than a string, especially if the string data actually uses an 8-bit (or smaller) character set.
A list of strings could be represented as a single string with a distinguished string separator character between the list components.
String data is often compressible.
Depending on the nature of your data, these could easily give a twofold reduction in space for the lists.
The downside is that you may need to fully or partially reconstruct the original List<String> objects, which would be a performance hit.
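The first two ideas can be combined. The following is only a sketch under stated assumptions: the class name PackedStrings is made up, and it assumes the NUL character never occurs in your strings so that it can serve as the separator. A Map<String, byte[]> holding the packed form would then replace the Map<String, List<String>>.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: stores a list of strings as one UTF-8 byte array.
final class PackedStrings {

    // Assumption: this character never appears inside the data itself.
    private static final String SEPARATOR = "\u0000";

    // Joins the list with the separator and encodes it as UTF-8 bytes.
    static byte[] pack(List<String> values) {
        return String.join(SEPARATOR, values).getBytes(StandardCharsets.UTF_8);
    }

    // Rebuilds the list. This is the reconstruction cost mentioned above.
    // Limitation of this sketch: an empty array is decoded as an empty list,
    // so a list holding exactly one empty string does not round-trip.
    static List<String> unpack(byte[] packed) {
        if (packed.length == 0) {
            return new ArrayList<>();
        }
        String joined = new String(packed, StandardCharsets.UTF_8);
        // Limit -1 keeps trailing empty strings instead of dropping them.
        return Arrays.asList(joined.split(SEPARATOR, -1));
    }
}
```

Whether this actually saves memory depends on the data and the JVM version, so it is worth measuring before committing to it.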
You should also consider using a non-memory resident representation; e.g. a conventional database, a NOSQL database or an "object cache" framework. JVMs with really large heaps tend to lead to performance problems if you need to do a "full" garbage collection, or if there is competition for physical memory with other applications.
One would really need to know a lot more about your specific application to definitively recommend a solution, but as a wild guess, if it is a really, really large table (e.g., hundreds of thousands or millions of records), I would suggest you consider using a database to store the data and access it via one of the data access layer abstractions, such as DataSet.
Databases are already optimized to efficiently store, search and access data over an amortized data and time range, so without further info on your application, I would go with this option.

Using SQLite or a File [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am new to Android development and I am trying to make a Trivia application.
I need to store the data relating to questions somewhere and I am not entirely sure where to store it.
I plan to have multiple people playing so I need each person to have the same questions.
Basically I planned to have a list of categories and within each category I had question objects.
The question objects contained information regarding the question such as the answers and question itself.
However, if I use a database, I believe none of this would be needed, because the questions would be stored in tables that represent the categories.
In terms of speed what would be better:
to store it in a database
or to read from a file every time the application is loaded and store the data within a data structure?
You almost certainly want a database. Databases are made for fast search and easy insertion/deletion. There's really no advantage to having a file and doing in-memory parsing each time.
Aside from performance benefits, here's a simple list of advantages of using SQLite rather than flat file:
You can query items as you wish -- you don't need to load all of them and select which ones you need.
Record deletion is a much less painful process. No rewriting of whole files into wherever.
Updating a record is as easy as removing or creating one.
Have you ever tried doing cross-referencing lookups on a flat file? Just.Not.Worth.It.
To summarize, it's every advantage a Database has over a text file.
Answer by josephus

Complete word-database for Java-App to check if a word is actually a legit word, is SQL appropriate in this case? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am going to write a game in which I often have to check whether a string of letters is actually a word. My question is how to do this as fast as possible with as little computational power as possible (for instance, on an old smartphone), and if possible without much start-up time, so that the app is quick and responsive.
I once did this look-up by first reading a word file containing almost all words into an appropriately sized hash map of around 650,000 words*. (* There might be more; I am not sure this is the exhaustive list yet.)
Would an SQL database be appropriate here? I am thinking of buying a book about it so I can learn and implement one. Also, I have no idea how you could create a hash map, save it for later, and then load it again. Is that too much of a hacker solution, or is that technique used more often? So would it make sense for me to learn SQL, or to save a hash map and restore it later?
An SQL database could be appropriate if you plan to query it every time you need to check a word, but this is not the fastest solution. Querying for every single word slows down the response time, although it should use less memory if the number of words is high (you must measure the memory consumed by the database against the memory consumed by the map). Checking whether a word is in a map is not computationally expensive: it only has to compute the hash and scan the entries that share that hash.
Personally I would choose a map if the memory requirements of keeping all the words in memory can be satisfied. You can store the dictionary as plain text file (one line -> one word) and read it in a background thread when the application starts.
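A minimal sketch of that idea in plain Java follows. The class name WordList and its methods are hypothetical, and CompletableFuture is just one way to get the background thread (on Android you would probably use whatever threading facility your target API level offers). The sketch lowercases every word, which is an assumption about how the game compares words.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper: loads a one-word-per-line dictionary into a HashSet.
final class WordList {

    // Reads every non-blank line, trimmed and lowercased, into a set.
    static Set<String> load(Reader source) {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word.toLowerCase(Locale.ROOT));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // Runs load() on a background thread so start-up is not blocked.
    static CompletableFuture<Set<String>> loadInBackground(Reader source) {
        return CompletableFuture.supplyAsync(() -> load(source));
    }
}
```

The game can start immediately and only wait on the future the first time it needs to check a word; after that each check is a single words.contains(candidate) call.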
If memory is an issue, this seems like a good use for a B-Tree. This allows for O(log n) search time while searching a large amount of records with minimal memory usage. For this sort of application it sounds like loading the entire thing into memory is not going to be a good idea.

Validating data in MS Excel in Java [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
How can I validate all the data in a Microsoft Excel file before I import it into the database using Java? Should I use loops? But wouldn't it take a long time to loop over all the data in the Excel file, for example if I have more than 3000 rows? What is the best way to catch bad user input in Excel, such as a bad date format or a bad employee ID format?
You gave very little detail. I have done something similar to this many times. So I can at least provide some advice.
Assuming you know that each column should contain data of a given format you can validate each input cell with RegEx, i.e. the cell either matches a given RegEx or it doesn't.
1. If it matches, import it.
2. If it does not match:
2a. If the format is bad, correct the format.
2b. If the data is invalid, do something.
2c. Reject the entire row?
2d. Prompt for user action?
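The matching step could look roughly like this in Java. It is only a sketch: the class name RowValidator and both formats (employee IDs like "E12345", dates like "2015-03-27") are invented for the example, and it assumes the cell values have already been read from the sheet as strings, for instance with a library such as Apache POI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical validator for one row; the expected formats are assumptions.
final class RowValidator {

    // Assumed format: the letter E followed by exactly five digits.
    private static final Pattern EMPLOYEE_ID = Pattern.compile("E\\d{5}");

    // Assumed format: yyyy-MM-dd. This checks the shape only, so an
    // impossible date such as 2015-13-45 still passes.
    private static final Pattern ISO_DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Returns one message per bad cell; an empty list means the row is fine.
    static List<String> validate(String employeeId, String date) {
        List<String> errors = new ArrayList<>();
        if (!matches(EMPLOYEE_ID, employeeId)) {
            errors.add("bad employee id: " + employeeId);
        }
        if (!matches(ISO_DATE, date)) {
            errors.add("bad date: " + date);
        }
        return errors;
    }

    // Null cells never match; surrounding whitespace is ignored.
    private static boolean matches(Pattern pattern, String cell) {
        return cell != null && pattern.matcher(cell.trim()).matches();
    }
}
```

Compiling the patterns once and reusing them keeps the per-cell cost small. What to do with a non-empty error list (fix, reject, or prompt) is the decision described in step 2.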
All of the Excel files I had to deal with were machine generated based on user input. So while I could have bad data, the format was always correct. If you are dealing with human-generated files, then you are going to have to assume that at least some of the data will be badly formatted.
As to your question on speed, 3000 rows is a drop in the bucket. For my project I was forced to use Access/VBA, which is damn slow. I was dealing with many files of 10,000+ rows with upwards of 50 columns. The entire run time of the process was around 5 minutes to have the program access the website, pull the files, and load them into the database.
Java is orders of magnitude faster than Access/VBA. The only way you will know if your run time is reasonable is to run some tests. I could likely have optimized the run time, but as the code was only run once a week there was no need.

Is it a good idea to store List<100000> Pojo objects in memory [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
For a test data scenario, I need to read a file containing 100000 rows, process each row against some condition, and then, based on that condition, output the data in text format.
For this I am planning to store each line of data in a POJO and then put the POJOs in a List.
My worry is having 100000 POJOs in memory. This is just for a test case.
I think using an InputStream to read the file would be better, since you fetch rows one by one anyway. You can read one line at a time, check your condition, and then write the output.
Storing too many objects in a List may cause an OutOfMemoryError.
In any case, it's bad design to store all 100000 rows as POJOs in memory. Some of the possible solutions are:
Read one row at a time and process it.
Rather than reading one record at a time from a file and processing it in Java, use some scripting language to populate a database table, and then process the records from that table in your Java code.
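The first option can be sketched as follows. The class name LineFilter and the Predicate parameter are assumptions made for the example; the predicate stands in for whatever condition your rows need to satisfy. Only one line is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Predicate;

// Hypothetical helper: streams rows from input to output without a List.
final class LineFilter {

    // Copies every line accepted by the condition and returns how many
    // lines were written. Both streams are closed when it finishes.
    static int process(Reader in, Writer out, Predicate<String> condition) {
        int written = 0;
        try (BufferedReader reader = new BufferedReader(in);
             BufferedWriter writer = new BufferedWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (condition.test(line)) {
                    writer.write(line);
                    writer.write('\n');
                    written++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```

For real files you would pass something like Files.newBufferedReader(inputPath) and Files.newBufferedWriter(outputPath) as the two streams.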
