Just started out with my hobby project and now I am here to get help with making the correct database design/query. I have made a simple Java program that loops through the contents of a folder. I want to save this content to a MySQL database, so I added a connector to my database in Java and created a table with the columns "file", "path", "id" and "date" in MySQL.
So now to the important/fun thing: every time I want to add the filenames to MySQL from Java I do this (when the GUI button is pressed I call a method that does the following):
DELETE all entries with the same file path - this is to ensure that I will get new entries which are exactly the same as the content of the path.
Java loop: INSERT the file info into the columns id, path, filename and date (the date the file was added to the database).
In this way I can always ensure that the filenames in the database are up to date; it doesn't matter if I rename a file or remove it, it will be up to date since the table gets its entries deleted and the new info written. Old info -> DELETE old info -> INSERT new info -> up to date.
I know this is probably not the best solution, but it works. Now I am stuck on the next thing I want to do: I want to store the difference between the files in order to know which files have been added and deleted between two inserts. Here is my problem: since the entries are deleted before a new INSERT, I cannot compare. How would you change the design or the solution? All ideas are welcome, and since I am so fresh I would really appreciate it if you could show me what the query could look like.
Do not remove all rows first. Remove only the ones that are removed (or even better, just mark them "inactive" as I suggest below). Query your DB first to see what was there last time.
I would maintain an additional column in your table called "inactive". It will be FALSE by default, and TRUE for removed files. Please keep in mind that as your file is uniquely identified by file+path+id, renaming a file is in fact an operation of deleting the old one and creating a new one.
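To make that concrete, here is a rough sketch of what your button handler could run instead of the DELETE + INSERT. The table name files and the exact column types are assumptions based on your description; adjust them to your schema. As a side effect you get your diff for free: the rows inserted in step 2 are the files added since the last run, and the rows flipped to inactive in step 3 are the ones that were deleted.

    import java.sql.*;
    import java.util.*;

    public class FolderSync {

        // Marks files that disappeared as inactive and inserts newly found ones.
        // Assumes a table roughly like:
        //   CREATE TABLE files (id INT AUTO_INCREMENT PRIMARY KEY, path VARCHAR(255),
        //                       file VARCHAR(255), date DATETIME,
        //                       inactive BOOLEAN DEFAULT FALSE)
        public static void syncFolder(Connection con, String path, Set<String> filesOnDisk)
                throws SQLException {

            // 1. What did we know about this path last time?
            Set<String> filesInDb = new HashSet<>();
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT file FROM files WHERE path = ? AND inactive = FALSE")) {
                ps.setString(1, path);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        filesInDb.add(rs.getString("file"));
                    }
                }
            }

            // 2. Files on disk but not in the DB -> newly added -> INSERT them.
            try (PreparedStatement insert = con.prepareStatement(
                    "INSERT INTO files (path, file, date) VALUES (?, ?, NOW())")) {
                for (String name : filesOnDisk) {
                    if (!filesInDb.contains(name)) {
                        insert.setString(1, path);
                        insert.setString(2, name);
                        insert.executeUpdate();
                    }
                }
            }

            // 3. Files in the DB but no longer on disk -> mark them inactive instead of deleting.
            try (PreparedStatement deactivate = con.prepareStatement(
                    "UPDATE files SET inactive = TRUE WHERE path = ? AND file = ?")) {
                for (String name : filesInDb) {
                    if (!filesOnDisk.contains(name)) {
                        deactivate.setString(1, path);
                        deactivate.setString(2, name);
                        deactivate.executeUpdate();
                    }
                }
            }
        }
    }

A query like SELECT file FROM files WHERE path = ? AND inactive = TRUE then lists everything that has ever disappeared from that folder.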
Removing things from the DB is not a good idea, as you might always remove something by accident (a bug in the code) and would not be able to get the data back.
Another thing to do is to add a hash of the file contents to your table. This way you will be able to check whether the file has really changed. There is no need to re-add the file to the DB if it is not changed. See Getting a File's MD5 Checksum in Java for more info.
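Computing the checksum needs only the standard library; here is a minimal sketch (it reads the whole file into memory, which is fine for small files - wrap the stream in a DigestInputStream for large ones). Store the result in a new column, e.g. a CHAR(32).

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class Md5Util {

        // Returns the MD5 checksum of a file as a lowercase hex string.
        public static String md5Of(Path file) throws IOException, NoSuchAlgorithmException {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        }
    }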
One way to achieve this is to implement auditing of your table. A common approach is to create a copy of the table where you are storing the folder contents and name that table using a convention that indicates it stores audit information (e.g. a _AUD suffix). You then add additional columns to the AUD table, like "REV" (revision) and "REV_TYPE" (inserted, deleted, modified). Whenever you insert, update or delete any rows in your main table, you insert a row into the AUD table describing what you've done. You can then find the operations associated with each revision by looking them up in the AUD table. A Java framework that provides this feature is Hibernate Envers (http://hibernate.org/orm/envers/).
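Hand-rolled, the idea looks roughly like this; the files_AUD table and its columns are made-up names following the convention above, and Envers generates the equivalent for you if you already use Hibernate:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class FileAudit {

        // Revision types, mirroring what Envers uses: 0 = added, 1 = modified, 2 = deleted.
        public static final int ADD = 0, MOD = 1, DEL = 2;

        // Writes one row into a hypothetical files_AUD table describing what happened
        // to a row of the main table in a given revision.
        public static void audit(Connection con, long rev, int revType,
                                 String path, String file) throws SQLException {
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO files_AUD (rev, rev_type, path, file) VALUES (?, ?, ?, ?)")) {
                ps.setLong(1, rev);
                ps.setInt(2, revType);
                ps.setString(3, path);
                ps.setString(4, file);
                ps.executeUpdate();
            }
        }
    }

Comparing the rows with rev = N against those with rev = N + 1 then tells you exactly which files were added, modified or deleted between two runs.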
I am using a Cassandra database integrated into a Spring Boot application.
My question is about the schema actions. If I need to make structural changes to the DB, say add a column to a table, the database needs to be recreated; however, this means all the existing data gets deleted:
schema-action: CREATE_IF_NOT_EXISTS
The only way I have managed to solve this is by using the RECREATE schema action, but as mentioned earlier, this results in data loss.
What would be the best approach to handle this? How can I make structural changes, such as adding a column, without having to recreate the database and lose all existing data?
Thanks
Cassandra does allow you to modify the schema of an existing table without recreating it from scratch, using the ALTER TABLE statement via cqlsh. However, as explained in that link, there are some important limitations on the kinds of changes you can make: you cannot modify the primary key of the table at all, you can add or delete regular columns, but you can't change the type of a column to a non-compatible one.
The reason for most of these limitations is how Cassandra needs to deal with the old data that already exists in the table. For example, it doesn't make sense to say that a column A that until now contained strings - will now contain integers - how are we supposed to handle all the old values in column A which weren't integers?
As Aaron rightly said in a comment, it is unlikely you'll want to do these schema changes as part of your application. These are usually rare operations which are done manually, or via some management application - not your usual application.
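That said, if you do decide to issue such a change from code (say, as a one-off migration step rather than on every start-up), a minimal sketch with the DataStax Java driver could look like this; the keyspace, table and column names are made up:

    import com.datastax.oss.driver.api.core.CqlSession;

    public class AddColumnMigration {

        public static void main(String[] args) {
            // Connects using the driver defaults / application.conf; adjust contact points as needed.
            try (CqlSession session = CqlSession.builder().build()) {
                // Adding a regular column does not touch existing rows;
                // old rows simply return null for the new column.
                session.execute("ALTER TABLE my_keyspace.my_table ADD new_column text");
            }
        }
    }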
I'll be direct about my situation. I'm working on a project which performs a "base load" procedure based on an Excel (xlsx, xls) file. It has been developed in Java with JDBC drivers. Right now this project is working: it takes an Excel file and, based on a configuration, performs the inserts into different tables. The point is: it's taking too long to do the job, which makes it inefficient (it takes around 2 hours to insert 3000 records into the DB). In the future, this software will be inserting around 30k records and it will be painfully slow. So I need to improve its efficiency, and I was thinking: instead of inserting from Java via JDBC drivers, I will generate control files and data files to be loaded into the DB using SQLLDR.
The point I'm facing right now: I need to insert this data into several tables, and these tables are related to each other. That means, if I insert a person into "Person_table", I will need the primary key generated by a database sequence to insert the "Address, Phone, email, etc." into the other tables, and I do not know how to get the primary keys generated in the first insert via SQLLDR.
I'm not sure yet if SQLLDR is the best way to do this, but I guess it is, because the DBMS is Oracle.
Can you guys point me towards how I could do what I explained I need to do? Any suggestion is welcome and well received. It does not matter if your suggestions are not about how to do this with SQLLDR.
I'm kind of stuck at this point right now, so I really appreciate any help you could give me.
SQL*Loader can't read native Excel files (at least, as far as I know). Therefore, you'll have to save the result as a CSV file.
As you need to manipulate foreign key constraints, consider switching to the external tables feature - basically, the background is still SQL*Loader, but you can write (PL/)SQL against those files/tables (yes - a CSV file, stored on a hard disk, acts as if it were an Oracle table).
So, you'd "load" one table, populate its primary key values, then populate another (child) table - possibly into a "temporary" table (not necessarily a global temporary table) which doesn't have any constraints enabled - populate the foreign key values and move the data into the "real" target table whose constraints now won't fail.
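A rough sketch of that flow, driven from your existing Java code, might look like the following. Every name in it (person_ext, person_table, person_seq, address_table, address_seq, the connection string, and the assumption that the person's name is unique enough to join back on) is a placeholder for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class ExternalTableLoad {

        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "loader", "secret")) {
                con.setAutoCommit(false);
                try (Statement st = con.createStatement()) {

                    // 1. Parent rows: read straight from the external table (person_ext,
                    //    which points at the CSV file) and let a sequence generate the PKs.
                    st.executeUpdate(
                        "INSERT INTO person_table (person_id, name) " +
                        "SELECT person_seq.NEXTVAL, e.name FROM person_ext e");

                    // 2. Child rows: join the external table back to the freshly loaded
                    //    parents on a natural key (here the name) to pick up the person_id.
                    st.executeUpdate(
                        "INSERT INTO address_table (address_id, person_id, street) " +
                        "SELECT address_seq.NEXTVAL, p.person_id, e.street " +
                        "FROM person_ext e JOIN person_table p ON p.name = e.name");

                    con.commit();
                }
            }
        }
    }

Two set-based INSERT ... SELECT statements like these are typically far faster than thousands of single-row inserts issued one by one from JDBC.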
Possible drawback: the CSV files have to reside in a directory that is accessible to the database server, as you'll have to create a directory (an Oracle object) and grant the required privileges (usually read, write) to the user who will be using it. The directory is usually created on the server itself; if not, you'll have to use a UNC path while creating it.
Now you have something to read about/research; see if it makes sense to you.
My app connects to different databases. First of all, I generated the ORM code with Speedment for the first database. But when I try to connect to a new one, Speedment deletes the code generated for the previous one.
The Speedment Tool can't currently connect to more than one database, but there is a hack to get around this.
Speedment will generate code based on the speedment.json file. When you connect to a new database, your speedment.json file is overwritten and is therefore not used in the second pass. To get around this, save the original file as something else (like speedment2.json) and then connect to the second database. Instead of generating, simply press "Save". This will create a new speedment.json file without generating code. Then open the created file in a text editor and manually combine the files. Look for a value with the key "dbmses". It should be mapped to a list of objects; in the first file the object represents the first database and in the second file it represents the second database. If you combine these two lists, save the file and then reopen the UI, you should see both databases there. From here on you can use the tool to make changes and regenerate code as usual.
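If you'd rather script the merge than edit the JSON by hand, something along these lines (using Jackson, and assuming the file names from the steps above) combines the two "dbmses" lists:

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ArrayNode;
    import java.io.File;

    public class MergeSpeedmentJson {

        // Copies the "dbmses" entries from speedment2.json into speedment.json.
        // Both files must already exist (created as described above).
        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            JsonNode first = mapper.readTree(new File("speedment.json"));
            JsonNode second = mapper.readTree(new File("speedment2.json"));

            // findValue locates the first "dbmses" array wherever it is nested.
            ArrayNode firstDbmses = (ArrayNode) first.findValue("dbmses");
            ArrayNode secondDbmses = (ArrayNode) second.findValue("dbmses");
            firstDbmses.addAll(secondDbmses);

            mapper.writerWithDefaultPrettyPrinter()
                  .writeValue(new File("speedment.json"), first);
        }
    }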
I have two attributes in my attribute dictionary. One is SAMPLE_ATTRIBUTE and the other one is MODEL_ATTRIBUTE. If I delete SAMPLE_ATTRIBUTE and want to rename MODEL_ATTRIBUTE to SAMPLE_ATTRIBUTE, can I do it? Will the change be reflected right away? Or is there anything that needs to be “run” to purge that reference before I can rename another attribute to the same name?
You can delete the dictionary attribute as long as it is not referenced by other products; if it is referenced (assigned to other products), you cannot delete it manually from the CMC before you go and delete the attribute from the referencing products.
You can rename a dictionary attribute to another name as long as the identifier is unique, and it will save your changes instantly to the database.
If you use this dictionary attribute as a facetable attribute: what I encountered in a previous project is that deleting the dictionary attribute will leave a record in the SRCHATTR table, so I had to delete the record manually using SQL before I could mark it facetable again.
The front-end store (Aurora) uses Apache Solr for product browsing, product detail and search. Deleting or changing a facetable dictionary attribute will trigger full Solr indexing of your products; you need to make sure that you have the "UpdateSearchIndex" job scheduled at site level, otherwise Solr indexing will not occur and hence you will not see your changes reflected.
In FEP7+, triggering the "UpdateSearchIndex" job will also invalidate DynaCache records for that product. Not sure about FEP6, but this feature is not there before FEP6, so if you have caching enabled, you need to figure out a way to invalidate those product caches (normally by writing SQL triggers).
Hope that answers your question and gives you what you need.
Thanks
Abed
I have some large data in one table and a small amount of data in another table. Is there any way to run the initial load of GoldenGate so that data that is the same in both tables won't be changed and the rest of the data gets transferred from one table to the other?
Initial loads are typically for when you are setting up the replication environment; however, you can do this on single tables as well. Everything in the Oracle database is driven by System Change Numbers (SCN), which GoldenGate tracks as Commit Sequence Numbers (CSN).
By using the SCN/CSN, you can identify what the starting point in the table should be and start CDC from there. Anything prior to that SCN/CSN will not get captured and would require you to move that data manually in some fashion. That can be done by using Oracle Data Pump (Export/Import).
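For that manual move, a consistent Data Pump export/import taken as of the chosen SCN is one option; the schema, table, SCN and directory names below are placeholders:

    expdp system@sourcedb tables=HR.BIG_TABLE flashback_scn=1234567 \
          directory=DATA_PUMP_DIR dumpfile=big_table.dmp logfile=big_table_exp.log

    impdp system@targetdb tables=HR.BIG_TABLE \
          directory=DATA_PUMP_DIR dumpfile=big_table.dmp logfile=big_table_imp.log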
Oracle GoldenGate also provides a parameter called SQLPredicate that allows you to apply a "where"-style clause to a table. This is handy with initial-load extracts because you would do something like TABLE <schema>.<table>, SQLPREDICATE "as of SCN <scn>". Data up to that point would then be captured and moved to the target side for a replicat to apply into the table. You can reference that here:
https://www.dbasolved.com/2018/05/loading-tables-with-oracle-goldengate-and-rest-apis/
Official Oracle Doc: https://docs.oracle.com/en/middleware/goldengate/core/19.1/admin/loading-data-file-replicat-ma-19.1.html
On the replicat side, you would use HANDLECOLLISIONS to kick out any duplicates. Then, once the load is complete, remove it from the parameter file.
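To give a rough idea, the relevant lines of the two parameter files might look like this; the schema, table and SCN are placeholders, and the exact syntax for your GoldenGate version is covered in the links above:

    -- initial-load extract: capture the rows as they existed at the chosen SCN
    TABLE hr.big_table, SQLPREDICATE "AS OF SCN 1234567";

    -- initial-load replicat: absorb rows that already exist on the target;
    -- remove HANDLECOLLISIONS again once the load has finished
    HANDLECOLLISIONS
    MAP hr.big_table, TARGET hr.big_table;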
Lots of details, but I'm sure this is a good starting point for you.
That would require programming in Java.
1) First you would read your database.
2) Decide which data has to be added to which table on the basis of the data that was read.
3) Execute update/insert queries to submit the data to the tables (a rough sketch follows below).
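A minimal JDBC sketch of steps 1)-3); the table and column names are placeholders you would replace with your own:

    import java.sql.*;

    public class ManualCopy {

        // Reads rows from the source table and inserts the ones that are missing
        // into the target table.
        public static void copyMissingRows(Connection source, Connection target) throws SQLException {
            String select = "SELECT id, payload FROM big_table";
            String exists = "SELECT COUNT(*) FROM small_table WHERE id = ?";
            String insert = "INSERT INTO small_table (id, payload) VALUES (?, ?)";

            try (Statement st = source.createStatement();
                 ResultSet rs = st.executeQuery(select);                 // 1) read the source
                 PreparedStatement check = target.prepareStatement(exists);
                 PreparedStatement ins = target.prepareStatement(insert)) {

                while (rs.next()) {
                    long id = rs.getLong("id");
                    check.setLong(1, id);
                    try (ResultSet c = check.executeQuery()) {
                        c.next();
                        if (c.getLong(1) == 0) {                         // 2) decide: not present yet
                            ins.setLong(1, id);                          // 3) submit it to the target
                            ins.setString(2, rs.getString("payload"));
                            ins.executeUpdate();
                        }
                    }
                }
            }
        }
    }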
If you want to run Initial Load using GoldenGate:
Target tables should be empty. From the Oracle documentation: "Data: Make certain that the target tables are empty. Otherwise, there may be duplicate-row errors or conflicts between existing rows and rows that are being loaded."
If they are not empty, you have to handle conflicts. For instance, if the row you are inserting already exists in the target table (INSERTROWEXISTS) you should discard it, if that's what you want to do (see the Oracle documentation).
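Very roughly, discarding rows that already exist can be expressed with the conflict-resolution (CDR) options on the replicat's MAP statement. The table names are placeholders and the exact syntax depends on your GoldenGate version, so check the Oracle documentation:

    -- if an inserted row already exists on the target, write it to the discard file
    MAP hr.big_table, TARGET hr.big_table,
        RESOLVECONFLICT (INSERTROWEXISTS, (DEFAULT, DISCARD));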