I have a specific question regarding an Anylogic model that I am trying to build.
I have 3 tables:
connections with columns connecteddc and connectedcustomer
customer with columns custname and demand
dcdetails with columns dcname and dccapacity
I am trying to write Java code that connects each DC in the first table (connecteddc) to each customer assigned to it (connectedcustomer), iterating through this process multiple times to build an accurate network. I have tried several variations of the code, as shown below.
for (int i = 0; i < 3; i++) {
    dc.get(i).LinktoCustomers.connectTo(Locations.get(selectFirstValue(false, int.class, "SELECT connectedcustomer FROM connections WHERE connectedDC = " + i + ";")));
}
This code only connects one DC to one customer. The problem is occurring in the 'selectFirstValue' portion of the code.
Database Query
You have to use one of the possibilities to retrieve all of the relevant database entries, instead of just the first one as you do with selectFirstValue(). Here is one option to do so:
for (int i = 0; i < dc.size(); i++) {
    List<Tuple> rows = selectFrom(connection)
        .where(connection.connecteddc.eq(dc.get(i).dcName))
        .list();
    for (Tuple row : rows) {
        dc.get(i).connectTo(getCustomerByName(row.get(connection.connectedcustomer)));
    }
}
Tip: AnyLogic offers an assistant to create such queries, which you can find in the AnyLogic toolbar under "Insert Database Query". It looks like this:
AnyLogic Database Query Assistant
Other Stuff
I modified a couple of other things that caught my attention:
To add a connection you use dc.get(i).LinktoCustomers.connectTo(...). It is not necessary to use a special variable for the connections; it is enough to add it to the standard connections using: dc.get(i).connectTo(...)
You go through the list of DCs with a hardcoded maximum index. As soon as the number of entries in the DC table changes, the code will stop working. I recommend something like this: for (int i=0; i<dc.size(); i++){...}.
You gave the name "Locations" to your population of agent type "Customer". It is confusing to use a population name that doesn't reflect the underlying agent type at all. I recommend renaming it, for example to "Customers".
To access your DCs you store and use the index number of the DC as an integer in the tables. To be on the safe side, I recommend using unique String ids instead, which will keep working even if you change the order of your tables. For this to work you'll need to "parse" the id (stored in the tables) into a Customer object.
This could be done in a function getCustomerByName(String name) like this (although this obviously lacks error handling):
Customer getCustomerByName(String name) {
    // linear search over the population; fine for small populations
    for (Customer c : Customers) {
        if (c.custName.equals(name)) {
            return c;
        }
    }
    return null; // no customer with that name was found
}
I have written an application to scrape a huge set of reviews. For each review I store the review itself in Review_Table(User_Id, Trail_Id, Rating), the user in Username(Id, Username, UserLink), and the trail, which is built previously in the code (Id, ...60 other attributes).
for (Element card : reviewCards) {
    String userName = card.select("expression").text();
    String userLink = card.select("expression").attr("href");
    String userRatingString = card.select("expression").attr("aria-label");
    Double userRating;
    if (userRatingString.equals("NaN Stars")) {
        userRating = 0.0;
    } else {
        userRating = Double.parseDouble(userRatingString.replaceAll("[^0-9.]", ""));
    }
    User u;
    Rating r;
    //probably this is the bottleneck
    if (userService.getByUserLink(userLink) != null) {
        u = new User(userName, userLink, new HashSet<Rating>());
        r = Rating.builder()
                .user(u)
                .userRating(userRating)
                .trail(t)
                .build();
    } else {
        u = userService.getByUserLink(userLink);
        r = Rating.builder()
                .user(u)
                .userRating(userRating)
                .trail(t)
                .build();
    }
    i = i + 1;
    ratingSet.add(r);
    userSet.add(u);
}
saveToDb(userSet, t, link, ratingSet);
savedEntities = savedEntities + 1;
log.info(savedEntities + " Saved Entities");
}
The code works fine for small to medium-sized datasets, but I encounter a huge bottleneck on larger ones. Suppose I already have 13K user entities stored in the Postgres DB and another batch of 8,500 reviews comes in to be scraped: I have to check, for every review, whether the user of that review is already stored. This is taking forever.
I tried to define an index on the UserLink attribute in Postgres, but the speed didn't improve at all.
I tried to collect all the users stored in the DB into a set and use the contains method to check whether a particular user already exists (this way I thought I could bypass the database bottleneck of 8K reads and writes, albeit riskily: if the users table grew too large, I would run into a memory overflow). The speed, again, didn't improve.
At this point I don't have any other ideas to improve this.
Well, for one, you would certainly benefit from not querying for each user individually in a loop. What you can do is query for and cache only the UserLink or UserName, i.e. fetch and cache the complete set of just one of them, because that's what you seem to need to differentiate in the if-else.
You can actually query for individual fields with Spring Data JPA's @Query, either directly or with Spring Data JPA projections to fetch a subset of fields if needed, and then cache and use them for the lookup. If you think the users could run into millions or billions, you could consider a distributed cache like Apache Ignite, where your collection could scale easily.
By the way, the if-else branches seem to be inverted, are they not?
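A minimal sketch of the caching idea, assuming a hypothetical repository method findAllUserLinks() that returns only the userLink column (for example via a @Query or a projection):

import java.util.HashSet;
import java.util.Set;

// Load every known userLink once, before the scraping loop starts.
// findAllUserLinks() is a hypothetical repository method, e.g.
// @Query("select u.userLink from User u") List<String> findAllUserLinks();
Set<String> knownUserLinks = new HashSet<>(userRepository.findAllUserLinks());

for (Element card : reviewCards) {
    String userLink = card.select("expression").attr("href");
    if (knownUserLinks.contains(userLink)) {
        // existing user: load or reuse it instead of creating a duplicate
    } else {
        // new user: create it and remember the link so later reviews
        // in the same batch don't create it again
        knownUserLinks.add(userLink);
    }
}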
Next, you shouldn't store each review individually, which the above code appears to imply; you can write in batches. Also, since you are using Postgres, you can use the CopyManager provided by Postgres for bulk data transfer, wired up through a Spring Data custom repository. You could keep writing to a local text/CSV file on a set schedule (every x minutes), use it to copy the batched rows into the table (after those x minutes), and then remove the file. This would be really quick.
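A rough sketch of the CopyManager route, inside a Spring Data custom repository; the table and column names here are assumptions, and the CSV content must match the COPY column list:

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

// Bulk-load a CSV file into the users table via Postgres COPY;
// returns the number of rows copied.
public long bulkLoadUsers(Connection conn, String csvPath) throws Exception {
    CopyManager copy = new CopyManager(conn.unwrap(BaseConnection.class));
    try (Reader reader = new FileReader(csvPath)) {
        return copy.copyIn(
                "COPY users (username, user_link) FROM STDIN WITH (FORMAT csv)",
                reader);
    }
}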
The other option is to write a stored procedure that combines the above and invoke it, again in a custom repository.
Please let me know which one you would like elaborated.
UPDATE (Jan 12 2022):
One other item I missed: when you query for UserLink or UserName, you can use a very efficient form of select query that Postgres supports instead of an IN clause, like below.
@Query(value = "select * from user u where u.userLink = ANY('{:userLinks}'::varchar[])", nativeQuery = true)
List<Users> getUsersByLinks(@Param("userLinks") String[] userLinks);
I have a table PERSON with more than 5 million rows, and I need to update the field NICKNAME on each one of them, based on the field NAME in the same table.
ResultSet rs = statement.executeQuery("select NAME from PERSON");
while (rs.next()) {
    // some parsing function like:
    // Nickname = myparsingfunction(rs.getString("NAME"));
    rs.updateString("NICKNAME", Nickname);
    rs.updateRow();
}
But I got this error:
not implemented by SQLite JDBC driver
I'm using sqlite-jdbc-3.8.11.2.jar downloaded at https://bitbucket.org/xerial/sqlite-jdbc/downloads.
I know I could use the following SQL query:
statement.executeUpdate("update PERSONS set NICKNAME = Nickname where ID = Id");
But that would take forever, and I understand updating through the ResultSet would be faster. So what options do I have to update the table in the fastest way? Is any other driver available? Should I move away from Java?
UPDATE
I was able to find a fast solution using the syntax below. The block between CASE and END is a concatenated string that I built before executing the SQL query, so I could send all the updates at once.
update PERSON
set NICKNAME = case ID
    when 173567 then 'blabla'
    when 173568 then 'bleble'
    ...
    when 173569 then 'blublu'
end
where ID in (173567, 173568, 173569)
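For reference, a sketch of how such a statement could be assembled in Java; nicknamesById is a hypothetical map from person ID to the nickname produced by the parsing step:

import java.util.Map;

// Build one big "update ... case ID when ... end where ID in (...)" statement.
StringBuilder sql = new StringBuilder("update PERSON set NICKNAME = case ID ");
StringBuilder ids = new StringBuilder();
for (Map.Entry<Long, String> e : nicknamesById.entrySet()) {
    sql.append("when ").append(e.getKey())
       .append(" then '").append(e.getValue().replace("'", "''")).append("' ");
    if (ids.length() > 0) ids.append(", ");
    ids.append(e.getKey());
}
sql.append("end where ID in (").append(ids).append(")");
statement.executeUpdate(sql.toString());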
As you have encountered, the SQLite JDBC driver does not currently support the updateString operation. This can be seen in the source code for this driver.
I can think of three options:
1) As you stated in your question, you can select the name and ID of each person and then update the person by its ID. Those updates can be done in a batch (using PreparedStatement.addBatch()) to improve performance (tutorial); see the sketch after this list.
2) Implement the method myparsingfunction in pure SQL, so that the query could become UPDATE PERSONS SET NICKNAME = some_function(NAME).
3) Create a user-defined function (using org.sqlite.Function), implemented in Java, and call it inside the SQL. Example, taken from this answer:
Function.create(db.getConnection(), "getNickName", new Function() {
    protected void xFunc() throws SQLException {
        String name = value_text(0);
        String nickName = ...; // implement myparsingfunction here
        result(nickName);
    }
});
and use it like this: UPDATE PERSONS SET NICKNAME = getNickName(NAME);
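For completeness, a minimal sketch of option 1; it assumes myparsingfunction is the parsing routine from the question and flushes the batch every 1000 rows (with SQLite, wrapping everything in a single transaction also helps considerably):

connection.setAutoCommit(false); // one transaction instead of one per row
try (Statement select = connection.createStatement();
     ResultSet rs = select.executeQuery("select ID, NAME from PERSON");
     PreparedStatement update = connection.prepareStatement(
             "update PERSON set NICKNAME = ? where ID = ?")) {
    int pending = 0;
    while (rs.next()) {
        update.setString(1, myparsingfunction(rs.getString("NAME")));
        update.setLong(2, rs.getLong("ID"));
        update.addBatch();
        if (++pending % 1000 == 0) {
            update.executeBatch(); // flush every 1000 rows
        }
    }
    update.executeBatch(); // flush the remainder
    connection.commit();
}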
SQLite does not support stored procedures, so that option is off the table.
I'm not sure which of these options would provide the best performance (using pure SQL would certainly be fastest, but it may not be a viable solution). You should benchmark each one to find the solution that fits your needs.
I am working on a project in which I need to delete all the columns and their data, except for one column and its data, in Cassandra using the Astyanax client.
I have a dynamic column family like the one below, and we already have a couple of million records in that column family.
create column family USER_TEST
with key_validation_class = 'UTF8Type'
and comparator = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and gc_grace = 86400
and column_metadata = [ {column_name : 'lmd', validation_class : DateType}];
I have user_id as the rowKey, and the other columns I have are something like this -
a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,lmd
Now I need to delete all the columns and their data except for the a15 column. Meaning, I want to keep the a15 column and its data for every user_id (rowKey) and delete the rest of the columns and their data.
I already know how to delete data from Cassandra using the Astyanax client for a particular rowKey -
public void deleteRecord(final String rowKey) {
    try {
        MutationBatch m = AstyanaxConnection.getInstance().getKeyspace().prepareMutationBatch();
        m.withRow(AstyanaxConnection.getInstance().getEmp_cf(), rowKey).delete();
        m.execute();
    } catch (ConnectionException e) {
        // some code
    } catch (Exception e) {
        // some code
    }
}
Now how do I delete all the columns and their data, except for one column, for all the user ids (my rowKeys)?
Any thoughts on how this can be done efficiently using the Astyanax client?
It appears that Astyanax does not currently support the slice-delete functionality that is a fairly recent addition to both the storage engine and the Thrift API. Look at the Thrift API reference: http://wiki.apache.org/cassandra/API10
You'll see that the delete operation takes a SlicePredicate, which can take either a list of columns or a SliceRange. A SliceRange could specify all columns greater or less than the column you want to keep, so two slice-delete operations would let you delete all but one of the columns in a row.
Unfortunately, Astyanax can only delete an entire row or a defined list of columns, and doesn't wrap the full SlicePredicate functionality. So it looks like you have a few options:
1) See about sending a raw Thrift slice delete, bypassing the Astyanax wrapper, or
2) Do a column read, followed by a row delete, followed by a column write (sketched after this list). This is not ideally efficient, but if it isn't done too frequently it shouldn't be prohibitive.
or
3) Read the entire row and explicitly delete all of the columns other than the one you want to preserve.
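A hedged sketch of option 2 for a single row, reusing the keyspace and column family accessors from the question's own code (Astyanax imports, error handling, and iteration over all row keys omitted):

Keyspace ks = AstyanaxConnection.getInstance().getKeyspace();
ColumnFamily<String, String> cf = AstyanaxConnection.getInstance().getEmp_cf();

// 1) read the one column that should survive
Column<String> keep = ks.prepareQuery(cf)
        .getKey(rowKey)
        .getColumn("a15")
        .execute()
        .getResult();

// 2) delete the entire row
MutationBatch delete = ks.prepareMutationBatch();
delete.withRow(cf, rowKey).delete();
delete.execute();

// 3) write the preserved column back (null TTL = no expiry)
MutationBatch rewrite = ks.prepareMutationBatch();
rewrite.withRow(cf, rowKey).putColumn("a15", keep.getStringValue(), null);
rewrite.execute();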
I should note that while the storage engine and Thrift API both support slice deletes, this is also not yet explicitly supported by CQL.
I filed this ticket to address that last limitation:
https://issues.apache.org/jira/browse/CASSANDRA-6292
I have an application based on MySQL, connected through Hibernate, and I use DAO utility code to query the database. Now I need to optimize my database queries with indexes. My question is: how can I query data through the Hibernate DAO utility code and make sure indexes are used in the MySQL database when the queries are executed? Any hints or pointers to existing examples are appreciated!
Update: just to make the question a little more understandable, the following is the code I use to query the MySQL database through the Hibernate DAO utility code. I'm not directly using HQL here. Any suggestions for a best solution? If needed, I will rewrite the database query code and use HQL directly instead.
public static List<Measurements> getMeasurementsList(String physicalId, String startdate, String enddate) {
    List<Measurements> listOfMeasurements = new ArrayList<Measurements>();
    Timestamp queryStartDate = toTimestamp(startdate);
    Timestamp queryEndDate = toTimestamp(enddate);
    MeasurementsDAO measurementsDAO = new MeasurementsDAO();
    PhysicalLocationDAO physicalLocationDAO = new PhysicalLocationDAO();
    short id = Short.parseShort(physicalId);
    List physicalLocationList = physicalLocationDAO.findByProperty("physicalId", id);
    Iterator ite = physicalLocationList.iterator();
    while (ite.hasNext()) {
        PhysicalLocation physicalLocation = (PhysicalLocation) ite.next();
        List measurementsList = measurementsDAO.findByProperty("physicalLocation", physicalLocation);
        Iterator jte = measurementsList.iterator();
        while (jte.hasNext()) {
            Measurements measurements = (Measurements) jte.next();
            if (measurements.getMeasTstime().after(queryStartDate)
                    && measurements.getMeasTstime().before(queryEndDate)) {
                listOfMeasurements.add(measurements);
            }
        }
    }
    return listOfMeasurements;
}
Just like with SQL, you don't need to do anything special. Just execute your queries as usual, and the database will use the indices you've created to optimize them, if possible.
For example, let's say you have a HQL query that searches all the products that have a given name:
select p from Product p where p.name = :name
This query will be translated by Hibernate to SQL:
select p.id, p.name, p.price, p.code from product p where p.name = ?
If you don't have any index set on product.name, the database will have to scan the whole table of products to find those that have the given name.
If you have an index set on product.name, the database will determine that, given the query, it's useful to use this index, and will thus know thanks to the index which rows have the given name. It will thus be able to read only a small subset of the rows to return the queried data.
This is all transparent to you. You just need to know which queries are slow and frequent enough to justify the creation of an index to speed them up.
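Applied to the code in the question, a hedged sketch of what a single HQL query could look like, so the date filtering happens in the database (where an index on the timestamp column can be used) rather than in the Java loop; the entity and property names are taken from the question's code and may need adjusting:

// One HQL query replaces the nested loops and the in-memory date filter.
List<Measurements> listOfMeasurements = session.createQuery(
        "select m from Measurements m "
      + "where m.physicalLocation.physicalId = :pid "
      + "and m.measTstime between :start and :end")
    .setParameter("pid", id)
    .setParameter("start", queryStartDate)
    .setParameter("end", queryEndDate)
    .list();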
I am attempting to create a test database (based on my production DB) at runtime. Rather than having to maintain an exact duplicate test DB, I'd like to copy the entire data structure of my production DB at runtime and then, when I close the test database, drop the entire database.
I assume I will be using statements such as:
CREATE DATABASE test //to create the test db
CREATE TABLE test.sampleTable LIKE production.sampleTable //to create each table
And when I am finished with the test db, calling a close method will run something like:
DROP DATABASE test //delete the database and all its tables
But how do I go about automatically finding all the tables within the production database, without having to write them out manually? The idea is that I can manipulate my production DB without having to be concerned with maintaining the structure identically within the test DB.
Would a stored procedure be necessary in this case? Some sample code on how to achieve something like this would be appreciated.
If the database driver you are using supports it, you can use DatabaseMetaData#getTables to get the list of tables for a schema. You can get access to DatabaseMetaData from Connection#getMetaData.
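A minimal sketch of that approach; with MySQL the database name is passed as the catalog, and the "production" and "test" names below follow the question:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.Statement;

// List every table in the production database and clone its structure into test.
DatabaseMetaData meta = connection.getMetaData();
try (ResultSet tables = meta.getTables("production", null, "%", new String[] {"TABLE"});
     Statement stmt = connection.createStatement()) {
    while (tables.next()) {
        String tableName = tables.getString("TABLE_NAME");
        stmt.executeUpdate("CREATE TABLE test." + tableName + " LIKE production." + tableName);
    }
}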
In your scripting language, you call "SHOW TABLES" on the database you want to copy. Reading that result set a row at a time, your program puts the name of the table into a variable (let's call it $tablename) and can generate the sql: "CREATE TABLE test.$tablename LIKE production.$tablename". Iterate through the result set and you're done.
(You won't get foreign key constraints that way, but maybe you don't need those. If you do, you can run "SHOW CREATE TABLE $tablename" and parse the results to pick out the constraints.)
I don't have a code snippet for java, but here is one for perl that you could treat as pseudo-code:
$ref = $dbh->selectall_arrayref("SHOW TABLES");
unless (defined($ref)) {
    print "Nothing found\n";
} else {
    foreach my $row_ref (@{$ref}) {
        push(@tables, $row_ref->[0]);
    }
}
The foreach statement iterates over the result set in an array reference returned by the database interface library. The push statement puts the first element of the current row of the result set into the array variable @tables. You'd be using the database library appropriate for your language of choice.
I would use mysqldump: http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html
It will produce a file containing all the SQL commands needed to replicate the prod database.
The solution was as follows:
private static final String SQL_CREATE_TEST_DB = "CREATE DATABASE test";
private static final String SQL_PROD_TABLES = "SHOW TABLES IN production";

JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
jdbcTemplate.execute(SQL_CREATE_TEST_DB);
SqlRowSet result = jdbcTemplate.queryForRowSet(SQL_PROD_TABLES);
while (result.next()) {
    String tableName = result.getString(result.getMetaData().getColumnName(1)); // retrieves the table name from column 1
    jdbcTemplate.execute("CREATE TABLE test." + tableName + " LIKE production." + tableName); // create a new table in test based on the production structure
}
This uses Spring to simplify the database connection etc., but the real magic is in the SQL statements. As mentioned by D Mac, this will not copy foreign key constraints, but that can be achieved by running another SQL statement and parsing the results.