I'm making a browser of defined list of files.
I want to compact empty folders (Like idea based IDE usally can do)
Originally I have a list of files (I get it from MediaStore):
folder1/folder2/folder3/file1.mp3
folder1/folder2/folder3/file2.mp3
folder1/file3.mp3
And I want my browser has this structure:
folder1
-folder2/folder3
-file1.mp3
-file2.mp3
-file3.mp3
How I did that:
When the first time I get files from MediaStore I create a table in a database:
id name parent_id has_songs
0 folder1 -1 1
1 folder2 0 0
2 folder3 1 1
When every time the browser displays folders it makes a request to the database.
And then I start checking folders inside(An additional request to the db for every check is needed): if a folder doesn't have songs and has only one child folder then compact them, then check the next and the next.
This way for the example above if I want to see 'insides' of folder1 it does 3 requests to the local db:
1. Get list of all folders (Make a request to the db here)
2. Check folder2 has one subfolder and doesn't have songs (Make a request to the db here)
3. Check folder3 has one subfolder and doesn't have songs (Make a request to the db here)
1. Is that the best way to implement this?
2. Is it performance critical
to make so many requests to the local db on user click?
It depends on your environment. :)
In general:
If the file list is relatively small it is worth to consider if you could keep all the information in memory. If you have enough free memory, this could be the best solution since it is way faster than reaching the database. I used to create an object that actually represents the directory and file structure, afterwards the view side opend and closed the blocks according to user demands. (Open all, close all etc.)
Is that the best way to implement this?
The environment is the key. As I mentioned it could be a good solution. If you are using a distant (or more advanced [SQLite is able to do it]) database server it is also a good thing to consider to create a view which contains the child count. (Or simply change the generated table introducing the "inside_dir_count" field.) This way you could fetch entire paths.
Is it performance critical to make so many requests to the local db on user click?
It is a relative small request to the database you represented here, and if it really depends on the click of the user it is a good solution. The mentioned local database should respond in little to no time. This action is not something that will happen 1000 times per second, so in my personal view it is good solution but there is a room for improvement like in everything.
Related
I am just trying to write huge data which is fetching from mysql db to CSV by using supercsv. How simply I can manage the performance issue. Does super csv write with some limits?
Since you included almost no detail in your question about how you are approaching the problem, it's hard to make concrete recommendations. So, here's a general one:
Unless you are writing your file to a really slow medium (some old USB stick or something), the slowest step in your process should be reading the data from the database.
There are two general ways how you can structure your program:
The bad way: Reading all the data from the database into your application's memory first and then, in a second step, writing it all in one shot to the csv file.
The right way: "Stream" the data from the db into the csv file, i.e. write the data to the csv file as it comes in to your application (record by record or batch by batch).
The idea is to set up something usually referred to as a "pipeline". Think of it like conveyor belt construction in a factory: You have multiple steps in your process of assembling some widget. What you don't want to do is have station 1 process all widgets and have stations 2 and 3 sit idle meanwhile, and then pass the whole container of widgets to station 2 to begin work, while stations 1 and 3 sit idle and so forth. Instead, station 1 needs to send small batches (1 at a time or 10 at a time or so) of widgets that are done to station 2 immediately so that they can start working on it as soon as possible. The goal is to keep all stations as busy as possible at all times.
In your example, station 1 is mysql retrieving the records, station 2 is your application that forwards (and processes?) them, and station 3 is supercsv. So, simply make sure that supercsv can start working as soon as possible, rather than having to wait for mysql to finish the entire request.
If you do this right, you should be able to generate the csv file as quickly as mysql can throw records at you*, and then, if it's still too slow, you need to rethink your database backend.
*I haven't used supercsv yet, so I don't know how well it performs, but given how trivial its job is and how popular it is, I would find it hard to believe that it would end up performing less well (as measured in processing time needed for one record) than mysql in this task. But this might be something that is worth verifying...
I am very new to lotus notes. Recently my team mates were facing a problem regarding the Duplicates in Lotus notes as shown below in the CASE A and CASE B.
So we bought a app named scanEZ (Link About scanEX). Using this tool we can remove the first occurrence or the second occurrence. As in the case A and Case B the second items are considered as redundant because they do not have child. So we can remove all the second item as given below and thus removing the duplicates.
But in the Case 3 the order gets changed, the child item comes first and the Parent items comes second so i am unable to use the scanEX app.
Is there any other better way or software or script to accomplish my task. As I am new to this field I have not idea. Kindly help me.
Thanks in advance.
Probably the easiest way to approach this would be to force the view to always display documents with children first. That way the tool you have purchased will behave consistently for you. You would do this by adding a hidden sorted column to the right of the column that that you have circled. The formula in this column would be #DocChildren, and the sort options for the column would be set to 'Descending'. (Note that if you are uncomfortable making changes in this view, you can make a copy of it, make your changes in the copy, and run ScanEZ against the copy as well. You can also do all of this in a local replica of the database, and only replicate it back to the server when you are satisified that you have the right results.)
The other way would be to write your own code in LotusScript or Java, using the Notes classes. There are many different ways that you could write that code,
I agree with Richard's answer. If you want more details on how to go thru the document collection you could isolate the documents into a view that shows only the duplicates. Then write an agent to look at the UNID of the document, date modified and other such data elements to insure that you are getting the last updated document. I would add a field to the document as in FLAG='keep'. Then delete documents that don't have your flag in the document with a second agent. If you take this approach you can often use the same agents in other databases.
Since you are new to Notes keep in mind that Notes is a document database. There are several different conflicts like save conflicts or replication conflicts. Also you need to look at database settings on how duplicates can be handled. I would read up on these topics just so you can explain it to your co-workers/project manager.
Eventually in your heavily travelled databases you might be able to automate this process after you work down the source of the duplicates.
These are clearly not duplicates.
The definition of duplicate is that they are identical and so it does not matter which one is kept and which one is removed. To you, the fact that one has children makes it more important, which means that they are not pure duplicates.
What you have not stated is what you want to do if multiple documents with similar dates/subjects have children (a case D if you will).
To me this appears as three separate problems.
The first problem is to sort out the cases where more than one
document in a set has children.
Then sort out the cases where only one document in a set has children.
Then sort out the cases where none of the documents in a set has children.
The approach in each case will be different. The article from Ytira only really covers the last of these cases.
here's the situation.
In a Java Web App i was assigned to mantain, i've been asked to improve the general response time for the stress tests during QA. This web app doesn't use a database, since it was supposed to be light and simple. (And i can't change that decision)
To persist configuration, i've found that everytime you make a change to it, a general object containing lists of config objects is serialized to a file.
Using Jmeter i've found that in the given test case, there are 2 requests taking up the most of the time. Both these requests add or change some configuration objects. Since the access to the file must be sinchronized, when many users are changing config, the file must be fully written several times in a few seconds, and requests are waiting for the file writing to happen.
I have thought that all these serializations are not necessary at all, since we are rewriting the most of the objects again and again, the changes in every request are to one single object, but the file is written as a whole every time.
So, is there a way to reduce the number of real file writes but still guarantee that all changes are eventually serialized?
Any suggestions appreciated
One option is to do changes in memory and keep one thread on the background, running at given intervals and flushing the changes to the disk. Keep in mind, that in the case of crash you'll lost data that wasn't flushed.
The background thread could be scheduled with a ScheduledExecutorService.
IMO, it would be better idea to use a DB. Can't you use an embedded DB like Java DB, H2 or HSQLDB? These databases support concurrent access and can also guarantee the consistency of data in case of crash.
If you absolutely cannot use a database, the obvious solution is to break your single file into multiple files, one file for each of config objects. It would speedup serialization and output process as well as reduce lock contention (requests that change different config objects may write their files simultaneously, though it may become IO-bound).
One way is to to do what Lucene does and not actually overwrite the old file at all, but to write a new file that only contains the "updates". This relies on your updates being associative but that is usually the case anyway.
The idea is that if your old file contains "8" and you have 3 updates you write "3" to the new file, and the new state is "11", next you write "-2" and you now have "9". Periodically you can aggregate the old and the updates. Any physical file you write is never updated, but may be deleted once it is no longer used.
To make this idea a bit more relevant consider if the numbers above are records of some kind. "3" could translate to "Add three new records" and "-2" to "Delete these two records".
Lucene is an example of a project that uses this style of additive update strategy very successfully.
I am making an app for Android, in my Activity I need to load an array of about 10000 strings. Loading it from database was slow, so I decided to put it directly into one .java file (as a private field). I have about 20 of these classes containing string arrays and my question is, are all the classes loaded into memory after my application is started? If so the Activity in which I need these strings would be loaded quickly, but the application as a whole would have a slow start...
Is there other way, how to very quickly load an 10000 string array from a file?
UPDATE:
Why I need these strings? My Android app allows you to find "journeys" in Prague's public transit - you choose departure stop, arrival stop and it finds your journey (have a look here). My app has a suggestions feature - you enter leter "c" as your departure stop and a suggestions ListView appears with stops starting with "c". For these suggestions I need the strings. Fetching the suggestions from database is slow (about 400ms on G1).
First, 400ms to perform a simple database query is really slow. So slow that I'd suspect that there is some problem in your database schema (e.g. indices) or your database connection configuration.
But if you a serious about not using a database, there are a couple of alternatives to what you are currently doing:
Arrange that the classes containing the arrays are lazily loaded as required, using Class.forName(...). If you implement it right, it should be possible for the garbage collector to reclaim the classes after they have been loaded and the strings have been added to your primary data structure.
Turn the 10000 Strings into a flat file, put the file into your app's JAR file. Then use Class.getResourceAsStream(...) to open the file and read it into the in-memory array.
As above, but using an indexed file and replacing the array with a data structure that allows you to read Strings from the file lazily. (This will be a bit complicated, but if you are worried by the memory consumed by the 10000 Strings, this will help address that.)
A class is loaded only when it is first referenced.
Though you need an array of 10000, you may not need all at once. Here is where the concept of paging comes in. This link indicates that Paging is often done in Android.Initialy have only a small amount of array in memory, and as you need it, keep loading it in to memory and unloading any previous data from memory if not wanted.
For e.g. in any table, at one shot, the user may see at best 50 records, then he will have to scroll(considering his screen is not size of an iMax movie theatre). When he scrolls, load the next chunk of data and unload any data that is now inivsible to the user.
When is a Type Loaded? This is a
surprisingly tricky question to
answer. This is due in large part to
the significant flexibility afforded,
by the JVM spec, to JVM
implementations. Loading must be
performed before linking and linking
must be performed before
initialization. The VM spec does
stipulate the timing of
initialization. It strictly requires
that a type be initialized on its
first active use (see Appendix A for a
list of what constitutes an "active
use"). This means that loading (and
linking) of a type MUST be performed
at or before that type's first active
use.
From http://www.developer.com/java/other/article.php/2248831/Java-Class-Loading-The-Basics.htm
I don't think that you will be happy with maintaining 10K Strings, hardcoded at Java files.
Rather check if you are using the right database for your problem and if your indices are set correctly. A wrong index can cause really poor performance.
Additionally you should limit the amount of results returned by the query, but make sure you don't fetch the entries one by one.
If nothing fits, you can still preload the Strings from the database at startup.
You could preload, let's say 10 entries, for each character. If a character is keyed in, you can preload the entries with that character following another and so on.
I challenge you :)
I have a process that someone already implemented. I will try to describe the requirements, and I was hoping I could get some input to the "best way" to do this.
It's for a financial institution.
I have a routing framework that will allow me to recieve files and send requests to other systems. I have a database I can use as I wish but it is only me and my software that has access to this database.
The facts
Via the routing framework I recieve a file.
Each line in this file follows a fixed length format with the identification of a person and an amount (+ lots of other stuff).
This file is 99% of the time im below 100MB ( around 800bytes per line, ie 2,2mb = 2600lines)
Once a year we have 1-3 gb of data instead.
Running on an "appserver"
I can fork subprocesses as I like. (within reason)
I can not ensure consistency when running for more than two days. subprocesses may die, connection to db/framework might be lost, files might move
I can NOT send reliable messages via the framework. The call is synchronus, so I must wait for the answer.
It's possible/likely that sending these getPerson request will crash my "process" when sending LOTS.
We're using java.
Requirements
I must return a file with all the data + I must add some more info for somelines. (about 25-50% of the lines : 25.000 at least)
This info I can only get by doing a getPerson request via the framework to another system. One per person. Takes between 200 and 400msec.
It must be able to complete within two days
Nice to have
Checkpointing. If im going to run for a long time I sure would like to be able to restart the process without starting from the top.
...
How would you design this?
I will later add the current "hack" and my brief idea
========== Current solution ================
It's running on BEA/Oracle Weblogic Integration, not by choice but by definition
When the file is received each line is read into a database with
id, line, status,batchfilename and status 'Needs processing'
When all lines is in the database the rows are seperated by mod 4 and a process is started per each quarter of the rows and each line that needs it is enriched by the getPerson call and status is set to 'Processed'. (38.0000 in the current batch).
When all 4 quaters of the rows has been Processed a writer process startes by select 100 rows from that database, writing them to file and updating their status to 'Written'.
When all is done the new file is handed back to the routing framework, and a "im done" email is sent to the operations crew.
The 4 processing processes can/will fail so its possible to restart them with a http get to a servlet on WLI.
Simplify as much as possible.
The batches (trying to process them as units, and their various sizes) appear to be discardable in terms of the simplest process. It sounds like the rows are atomic, not the batches.
Feed all the lines as separate atomic transactions through an asynchronous FIFO message queue, with a good mechanism for detecting (and appropriately logging and routing failures). Then you can deal with the problems strictly on an exception basis. (A queue table in your database can probably work.)
Maintain batch identity only with a column in the message record, and summarize batches by that means however you need, whenever you need.
When you receive the file, parse it and put the information in the database.
Make one table with a record per line that will need a getPerson request.
Have one or more threads get records from this table, perform the request and put the completed record back in the table.
Once all records are processed, generate the complete file and return it.
if the processing of the file takes 2 days, then I would start by implementing some sort of resume feature. Split the large file into smaller ones and process them one by one. If for some reason the whole processing should be interrupted, then you will not have to start all over again.
By splitting the larger file into smaller files then you could also use more servers to process the files.
You could also use some mass loader(Oracles SQL Loader for example) to take the large amount of data form the file into the table, again adding a column to mark if the line has been processed, so you can pick up where you left off if the process should crash.
The return value could be many small files which at the end would be combined into large single file. If the database approach is chosen you could also save the results in a table which could then be extracted to a csv file.