What's the methodology for testing @SequenceGenerator from Hibernate? I want to be sure that every sequence is mapped correctly, with no spelling mistakes, and that each one increments by 1. Is there any way to check this dynamically for all sequences?
Here's the sample of my sequence mapping:
@Column(name = "ADDRESS_ID", nullable = false, precision = 20)
@Id
@SequenceGenerator(name = "AddressSeq", sequenceName = "ADDRESS_SEQ", allocationSize = 1)
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "AddressSeq")
private Long addressId;
You can use this answer to obtain a list of your entity classes:
List<ClassLoader> classLoadersList = new LinkedList<ClassLoader>();
classLoadersList.add(ClasspathHelper.contextClassLoader());
classLoadersList.add(ClasspathHelper.staticClassLoader());
reflections = new Reflections(
new ConfigurationBuilder()
.setScanners(new SubTypesScanner(false),
new ResourcesScanner())
.setUrls(
ClasspathHelper.forClassLoader(classLoadersList
.toArray(new ClassLoader[0])))
.filterInputsBy(
new FilterBuilder().include(FilterBuilder
.prefix("me.business.model"))));
And get your DDL with:
ClassPathResource cpr = new ClassPathResource("db/schema.sql");
schemaContent = new String(FileCopyUtils.copyToByteArray(cpr
.getInputStream())).toLowerCase();
And get the sequence for each class with:
private String getSequenceName(Class<?> clazz) {
for (Field f : clazz.getDeclaredFields()) {
if (f.isAnnotationPresent(SequenceGenerator.class)) {
SequenceGenerator sg = f.getAnnotation(SequenceGenerator.class);
return sg.sequenceName();
}
}
return null;
}
The test is simple:
Set<Class<?>> entities = reflections.getSubTypesOf(Object.class);
for (Class<?> clazz : entities) {
String name = getSequenceName(clazz);
if (name == null)
continue;
if (!schemaContent.contains(name.toLowerCase())) {
fail("The class " + clazz.getSimpleName()
+ " has a sequence called: " + name
+ " and it doesn't exist");
}
}
You can see it here
If you want to see if it works, change the sequence name in one of your entities and run the test.
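The test above only checks that the sequence name appears somewhere in the DDL. A minimal sketch of a slightly stricter check follows: it looks for a "create sequence <name>" statement and also verifies that the mapping uses an allocation size of 1. To keep it compilable without JPA on the classpath, it uses a hypothetical stand-in annotation (SeqGen) and a hypothetical Address entity; in a real test you would read javax.persistence.SequenceGenerator instead. It also assumes unqualified sequence names in the DDL.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SequenceMappingCheck {

    // Hypothetical stand-in for javax.persistence.SequenceGenerator,
    // so that this sketch compiles without JPA.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SeqGen {
        String sequenceName();
        int allocationSize() default 50;
    }

    // Hypothetical entity used only for the demonstration.
    static class Address {
        @SeqGen(sequenceName = "ADDRESS_SEQ", allocationSize = 1)
        private Long addressId;
    }

    // Returns one message per problem found; an empty list means all mappings look fine.
    public static List<String> findProblems(String ddl, Class<?>... entities) {
        String schema = ddl.toLowerCase();
        List<String> problems = new ArrayList<>();
        for (Class<?> entity : entities) {
            for (Field field : entity.getDeclaredFields()) {
                SeqGen generator = field.getAnnotation(SeqGen.class);
                if (generator == null) {
                    continue;
                }
                String name = generator.sequenceName().toLowerCase();
                // Require a whole-word match so that address_seq does not match address_seq2.
                Pattern declared = Pattern.compile(
                        "create\\s+sequence\\s+" + Pattern.quote(name) + "\\b");
                if (!declared.matcher(schema).find()) {
                    problems.add(entity.getSimpleName() + ": sequence " + name + " is not in the DDL");
                }
                if (generator.allocationSize() != 1) {
                    problems.add(entity.getSimpleName() + ": allocationSize is "
                            + generator.allocationSize() + ", expected 1");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String ddl = "CREATE SEQUENCE ADDRESS_SEQ START WITH 1 INCREMENT BY 1;";
        System.out.println(findProblems(ddl, Address.class)); // prints []
    }
}
```

In a JUnit test you could then fail when the returned list is not empty, which reports every broken mapping at once instead of stopping at the first one.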
I have a dataframe named timeDF which has the schema below:
root
|-- Id: long (nullable = true)
|-- Model: timestamp (nullable = true)
|-- Prevision: timestamp (nullable = true)
I want to add a new row at the end of timeDF by transforming two Calendar objects, c1 and c2, to Timestamp. I know I can first convert them to Timestamp like so:
val t1 = new Timestamp(c1.getTimeInMillis)
val t2 = new Timestamp(c2.getTimeInMillis)
However, I can't figure out how to then write those variables to timeDF as a new row, or how to let Spark increment the Id column value.
Should I create a List object with t1 and t2, make a temporary dataframe from this list, and then union the two dataframes? If so, how do I manage the Id column? Isn't that too messy for such a simple operation?
Can someone please explain?
Thanks.
Here is a solution you can try, in a nutshell:
Ingest your file.
Create a new dataframe with your data and unionByName().
Correct the id.
Clean up.
Create the extra record
First, create the extra record from scratch. Since the record mixes several types, I used a POJO. Here is the code:
List<ModelPrevisionRecord> data = new ArrayList<>();
ModelPrevisionRecord b = new ModelPrevisionRecord(
-1L,
new Timestamp(System.currentTimeMillis()),
new Timestamp(System.currentTimeMillis()));
data.add(b);
Dataset<ModelPrevisionRecord> ds = spark.createDataset(data,
Encoders.bean(ModelPrevisionRecord.class));
timeDf = timeDf.unionByName(ds.toDF());
ModelPrevisionRecord is a very basic POJO:
package net.jgp.labs.spark.l999_scrapbook.l000;
import java.sql.Timestamp;
public class ModelPrevisionRecord {
public long getId() {
return id;
}
public void setId(long id) {
this.id = id;
}
public Timestamp getModel() {
return model;
}
public void setModel(Timestamp model) {
this.model = model;
}
public Timestamp getPrevision() {
return prevision;
}
public void setPrevision(Timestamp prevision) {
this.prevision = prevision;
}
private long id;
private Timestamp model;
private Timestamp prevision;
public ModelPrevisionRecord(long id, Timestamp model, Timestamp prevision) {
this.id = id;
this.model = model;
this.prevision = prevision;
}
}
Correct the Id
The id is -1, so the idea is to create a new column, id2, with the right id:
timeDf = timeDf.withColumn("id2",
when(
col("id").equalTo(-1), timeDf.agg(max("id")).head().getLong(0)+1)
.otherwise(col("id")));
Clean up the dataframe
Finally, clean up your dataframe:
timeDf = timeDf.drop("id").withColumnRenamed("id2", "id");
Important notes
This solution only works if you add one record at a time; otherwise, all the new records end up with the same id.
You can see the whole example here: https://github.com/jgperrin/net.jgp.labs.spark/tree/master/src/main/java/net/jgp/labs/spark/l999_scrapbook/l000, it might be easier to clone...
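If several records have to be added in one go, each of them needs its own offset from the current maximum. The following is a minimal sketch of that arithmetic in plain Java, without Spark; the class and method names are made up for illustration, and it assumes ids start at 0 when there is no existing data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BatchIdSketch {

    // Returns the ids for newRecords new rows, continuing after the largest existing id.
    public static List<Long> nextIds(List<Long> existingIds, int newRecords) {
        // Assumption: an empty dataset starts numbering at 0.
        long max = existingIds.isEmpty() ? -1L : Collections.max(existingIds);
        List<Long> ids = new ArrayList<>();
        for (int offset = 1; offset <= newRecords; offset++) {
            ids.add(max + offset);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(nextIds(Arrays.asList(0L, 1L, 2L), 3)); // prints [3, 4, 5]
    }
}
```

In the dataframe version, the maximum would come from the aggregation shown above, and each new record would receive one of these ids before the union.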
If your first dataframe can be sorted by ID and you need to add rows one by one, you can find the maximum ID in your dataframe:
long max = timeDF.agg(functions.max("Id")).head().getLong(0);
and then increment it and add the new row to your dataframe with union. The following example shows this, with age acting as the id. people.json is a file from the Spark examples.
Dataset<Row> df = spark.read().json("H:\\work\\HadoopWinUtils\\people.json");
df.show();
long max = df.agg(functions.max("age")).head().getLong(0);
List<Row> rows = Arrays.asList(RowFactory.create(max+1, "test"));
StructType schema = DataTypes.createStructType(Arrays.asList(
DataTypes.createStructField("age", DataTypes.LongType, false, Metadata.empty()),
DataTypes.createStructField("name", DataTypes.StringType, false, Metadata.empty())));
Dataset<Row> df2 = spark.createDataFrame(rows, schema);
df2.show();
Dataset<Row> df3 = df.union(df2);
df3.show();
I tried this, but I don't know why: when I print the saved table, it only keeps the last 2 rows, and all the others are deleted.
This is how I initialize the Delta table:
val schema = StructType(
StructField("Id", LongType, false) ::
StructField("Model", TimestampType, false) ::
StructField("Prevision", TimestampType, false) :: Nil
)
var timestampDF = spark.createDataFrame(sc.emptyRDD[Row], schema)
val write_format = "delta"
val partition_by = "Model"
val save_path = "/mnt/path/to/folder"
val table_name = "myTable"
spark.sql("DROP TABLE IF EXISTS " + table_name)
dbutils.fs.rm(save_path, true)
timestampDF.write.partitionBy(partition_by)
.format(write_format)
.save(save_path)
spark.sql("CREATE TABLE " + table_name + " USING DELTA LOCATION '" + save_path + "'")
And this is how I add a new item to it:
def addTimeToData(model: Calendar, target: Calendar): Unit = {
var timeDF = spark.read
.format("delta")
.load("/mnt/path/to/folder")
val modelTS = new Timestamp(model.getTimeInMillis)
val targetTS = new Timestamp(target.getTimeInMillis)
var id: Long = 0
if (!timeDF.head(1).isEmpty) {
id = timeDF.agg(max("Id")).head().getLong(0) + 1
}
val newTime = Arrays.asList(RowFactory.create(id, modelTS, targetTS))
val schema = StructType(
StructField("Id", LongType, false) ::
StructField("Model", TimestampType, false) ::
StructField("Prevision", TimestampType, false) :: Nil
)
var newTimeDF = spark.createDataFrame(newTime, schema)
val unionTimeDF = timeDF.union(newTimeDF)
timeDF = unionTimeDF
unionTimeDF.show
val save_path = "/mnt/datalake/Exploration/Provisionning/MeteoFrance/Timestamps/"
val table_name = "myTable"
spark.sql("DROP TABLE IF EXISTS " + table_name)
dbutils.fs.rm(save_path, true)
timeDF.write.partitionBy("Model")
.format("delta")
.save(save_path)
spark.sql("CREATE TABLE " + table_name + " USING DELTA LOCATION '" + save_path + "'")
}
I'm not very familiar with Delta tables, so I don't know if I can just use SQL to add values like so:
spark.sql("INSERT INTO 'myTable' VALUES (" + id + ", " + modelTS + ", " + previsionTS + ")");
And I don't know if just putting the timestamp variables in like that will work.
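Regarding the SQL variant: concatenating a Timestamp directly produces an unquoted value such as 2020-01-01 10:00:00.0, which is not a valid SQL expression, and a table name in single quotes is read as a string literal. Below is a minimal sketch of how the statement text could be built, in plain Java. It assumes that the SQL dialect accepts typed timestamp literals of the form TIMESTAMP '...' (to my knowledge Spark SQL does, but check it against your version); the class and method names are made up for illustration.

```java
import java.sql.Timestamp;

public class InsertStatementSketch {

    // Builds the INSERT text with an unquoted table name and quoted timestamp literals.
    // Assumption: the dialect understands TIMESTAMP 'yyyy-MM-dd HH:mm:ss.f' literals.
    public static String buildInsert(String table, long id, Timestamp model, Timestamp prevision) {
        return "INSERT INTO " + table + " VALUES ("
                + id + ", "
                + "TIMESTAMP '" + model + "', "
                + "TIMESTAMP '" + prevision + "')";
    }

    public static void main(String[] args) {
        Timestamp model = Timestamp.valueOf("2020-01-01 10:00:00");
        Timestamp prevision = Timestamp.valueOf("2020-01-01 16:00:00");
        System.out.println(buildInsert("myTable", 0L, model, prevision));
    }
}
```

The resulting string could then be passed to spark.sql(...). Building SQL by concatenation is only reasonable here because the values are numbers and timestamps created by the program itself, not user input.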
I have a problem selecting values with CrudRepository.
For example:
I have a Client class:
@Entity
public class Client {
@Id
private Long id;
@ManyToMany
Map<AttributeType, Attribute> map = new HashMap<>();
}
and an AttributeType class:
@Entity
public class AttributeType {
@Id
private Long id;
@Column(unique = true)
private String name;
}
"Attribute" is an abstract entity with two subtypes, AttributeString and AttributeBoolean. Both have two fields, ID and VAL (VAL is a String or a Boolean, depending on the subtype).
So, if I want to select the list of clients that have a given "AttributeType" in the map, I can use:
public interface ClientRepository extends CrudRepository<Client, Long> {
@Query("select cli from Client cli where KEY(cli.map) = :attributeType")
List<Client> selectByAttributeType(@Param("attributeType") AttributeType attributeType);
}
But I have a problem with selecting the clients whose AttributeType equals x and whose Attribute equals y.
I was trying to use:
@Query("select cli from Client cli \n" +
"where KEY(cli.map) = :attributeType \n" +
"and VALUE(cli.map).val = :val")
List<Client> selectByAttributeTypeAndParam(
@Param("attributeType") AttributeType attributeType, @Param("val") String val);
But it throws an exception: Caused by: java.lang.IllegalArgumentException: Parameter value [xxxxxxxx] did not match expected type [java.lang.Boolean (n/a)].
So, the question is:
Do you have any idea how I can select every Client where:
AttributeType = x
and the value from the map (for that AttributeType) = y?
//edit...
To make it clearer:
We have a situation like this:
AttributeType at1 =... //does not matter what it is
AttributeType at2 =... //does not matter what it is
AttributeType at3 =... //does not matter what it is
Client c1 = new Client();
Map<AttributeType, Attribute> map1 = c1.getMap();
map1.put(at1, new AttributeString("123456789"));
map1.put(at2, new AttributeBoolean(true));
Client c2 = new Client();
Map<AttributeType, Attribute> map2 = c2.getMap();
map2.put(at1, new AttributeString("111111111"));
map2.put(at3, new AttributeBoolean(true));
So, I want a CrudRepository function which takes 2 parameters:
1) AttributeType
2) SomeValue (can be String or Boolean)
If this function looks like:
@Query("some query - no idea how to write it")
List<Client> selectByAttributeTypeAndParam(
@Param("attributeType") AttributeType attributeType, @Param("val") String val);
and if I run:
List<Client> someList = selectByAttributeTypeAndParam(at1, "123456789");
I want someList to have 1 record, which is c1.
And if I have an equivalent function for a Boolean search and I run:
List<Client> someList = selectByAttributeTypeAndParam(at2, true);
then I want someList to have both c1 and c2 records.
I've been trying for two days now to store an array list with about six million entries in my Postgres database with Spring-Data-JPA.
The whole thing works, but it's very slow. I need about 27 minutes for everything.
I've already played around with the batch size, but that didn't bring much success. I also noticed that saving takes longer and longer the bigger the table gets. Is there a way to speed it up?
I've done the whole thing with SQLite before, there I only needed about 15 seconds for the same amount.
My Entity
@Data
@Entity
@Table(name = "commodity_prices")
public class CommodityPrice {
@Id
@Column( name = "id" )
@GeneratedValue( strategy = GenerationType.SEQUENCE )
private long id;
@Column(name = "station_id")
private int station_id;
@Column(name = "commodity_id")
private int commodity_id;
@Column(name = "supply")
private long supply;
@Column(name = "buy_price")
private int buy_price;
@Column(name = "sell_price")
private int sell_price;
@Column(name = "demand")
private long demand;
@Column(name = "collected_at")
private long collected_at;
public CommodityPrice( int station_id, int commodity_id, long supply, int buy_price, int sell_price, long demand,
long collected_at ) {
this.station_id = station_id;
this.commodity_id = commodity_id;
this.supply = supply;
this.buy_price = buy_price;
this.sell_price = sell_price;
this.demand = demand;
this.collected_at = collected_at;
}
}
My insert Class
@Slf4j
@Component
public class CommodityPriceHandler {
@Autowired
CommodityPriceRepository commodityPriceRepository;
@Autowired
private EntityManager entityManager;
public void inserIntoDB() {
int lineCount = 0;
List<CommodityPrice> commodityPrices = new ArrayList<>( );
StopWatch stopWatch = new StopWatch();
stopWatch.start();
try {
Reader reader = new FileReader( DOWNLOAD_SAVE_PATH + FILE_NAME_COMMODITY_PRICES );
Iterable<CSVRecord> records = CSVFormat.EXCEL.withFirstRecordAsHeader().parse( reader );
for( CSVRecord record : records ) {
int station_id = Integer.parseInt( record.get( "station_id" ) );
int commodity_id = Integer.parseInt( record.get( "commodity_id" ) );
long supply = Long.parseLong( record.get( "supply" ) );
int buy_price = Integer.parseInt( record.get( "buy_price" ) );
int sell_price = Integer.parseInt( record.get( "sell_price" ) );
long demand = Long.parseLong( record.get( "demand" ) );
long collected_at = Long.parseLong( record.get( "collected_at" ) );
CommodityPrice commodityPrice = new CommodityPrice(station_id, commodity_id, supply, buy_price, sell_price, demand, collected_at);
commodityPrices.add( commodityPrice );
if (commodityPrices.size() == 1000){
commodityPriceRepository.saveAll( commodityPrices );
commodityPriceRepository.flush();
entityManager.clear();
commodityPrices.clear();
System.out.println(lineCount);
}
lineCount ++;
}
}
catch( IOException e ) {
log.error( e.getLocalizedMessage() );
}
commodityPriceRepository.saveAll( commodityPrices );
stopWatch.stop();
log.info( "Successfully inserted " + lineCount + " lines in " + stopWatch.getTotalTimeSeconds() + " seconds." );
}
}
My application.properties
# HIBERNATE
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.PostgreSQLDialect
spring.jpa.properties.hibernate.jdbc.lob.non_contextual_creation=true
spring.jpa.hibernate.ddl-auto = update
spring.jpa.properties.hibernate.jdbc.batch_size=1000
spring.jpa.properties.hibernate.order_inserts=true
While you are doing your insert in batch, your sequence generation strategy still requires you to issue one statement for each record you insert. Thus, for a batch size of 1000 records you issue 1001 statements, which is clearly not what is expected.
My recommendations:
Enable SQL logging to see which statements are sent to your database. I personally use datasource-proxy, but use whatever you are happy with.
Modify your sequence generator. At a minimum, use:
@Id
@Column( name = "id" )
@GeneratedValue(generator = "com_pr_generator", strategy = GenerationType.SEQUENCE )
@SequenceGenerator(name="com_pr_generator", sequenceName = "book_seq", allocationSize=50)
private long id;
Read about the different generation strategies and fine-tune your sequence generator:
A beginner’s guide to Hibernate enhanced identifier generators
Hibernate pooled and pooled-lo identifier generators
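To make the effect of allocationSize concrete, here is a small simulation in plain Java. It is a simplified model of block allocation, not Hibernate's actual optimizer: a stand-in sequence (FakeSequence, a made-up class) counts how often it is called, and ids are handed out from a block of allocationSize values per call.

```java
public class PooledAllocationSketch {

    /** Stand-in for a database sequence: each nextVal() call models one round trip. */
    static final class FakeSequence {
        private final int incrementBy;
        private long lastValue;
        private int calls;

        FakeSequence(int incrementBy) {
            this.incrementBy = incrementBy;
            this.lastValue = 1 - incrementBy; // so that the first nextVal() returns 1
        }

        long nextVal() {
            calls++;
            lastValue += incrementBy;
            return lastValue;
        }

        int calls() {
            return calls;
        }
    }

    /** Counts the sequence round trips needed to assign ids to the given number of records. */
    public static int sequenceCalls(int records, int allocationSize) {
        FakeSequence sequence = new FakeSequence(allocationSize);
        long next = 0;     // next id to hand out
        long blockEnd = 0; // exclusive end of the current block of ids
        for (int i = 0; i < records; i++) {
            if (next >= blockEnd) {
                // Block exhausted: one call reserves allocationSize ids.
                next = sequence.nextVal();
                blockEnd = next + allocationSize;
            }
            next++; // the id is consumed by record i
        }
        return sequence.calls();
    }

    public static void main(String[] args) {
        System.out.println(sequenceCalls(1000, 1));  // prints 1000
        System.out.println(sequenceCalls(1000, 50)); // prints 20
    }
}
```

With an allocation size of 50, a batch of 1000 records needs 20 sequence calls instead of 1000. Note that the database sequence itself then has to increment by the same amount as the allocation size.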
I'm trying to add to my CRUD services the possibility to specify which nested relationships I need, so I don't have to read everything from the database.
For example, I have these entities:
Company.java
private List<Department> departments;
private SalaryCode salaryCode;
Department.java
private List<Employee> employees;
private Company company;
private SalaryCode salaryCode;
Employee.java
private Department department;
private SalaryCode salaryCode;
And my Criteria query for now is this:
Session session = sessionFactory.openSession();
CriteriaBuilder builder = session.getCriteriaBuilder();
CriteriaQuery<T> criteriaQuery = builder.createQuery(clazz);
Root<T> root = criteriaQuery.from(clazz);
//nestedRelationships is a varargs passed as parameters
for(String nestedRelationship : nestedRelationships) {
root.fetch(nestedRelationship, JoinType.LEFT);
}
List<T> result = session.createQuery(criteriaQuery.select(root)).list();
The thing is, if I specify "department" as a nestedRelationship and query for the Employee entity, it works well, but when I try to specify "department.salaryCode" it fails with "Unable to locate Attribute with the the given name".
Of course, I'm fetching "department" first and then "department.salaryCode".
Is this supported? If yes, how does it work, and if not, what can I do?
Yes, it is supported. You need to use joins.
Root<Company> root = criteriaQuery.from(Company.class);
Join<Company,Department> joinDepartment = root.join( Company_.departments );
Join<Department,SalaryCode> joinSalaryCode = joinDepartment.join( Department_.salaryCode );
To generate the metamodel classes (e.g. Department_), have a look here.
I found a solution by making an algorithm using the Root element
protected void fetch(Root<T> root, String... joins) {
//Sort the joins so they are like this :
//A
//A.F
//B.E
//B.E.D
//B.G
Arrays.sort(joins);
Map<String, Fetch> flattenFetches = new HashMap<>();
for (String join : joins) {
try {
if (join.contains(".")) {
String[] subrelations = join.split("\\.");
Fetch lastRelation = null;
int i;
for (i = subrelations.length - 1; i >= 0; i--) {
String subJoin = String.join(".", Arrays.copyOf(subrelations, i));
if (flattenFetches.containsKey(subJoin)) {
lastRelation = flattenFetches.get(subJoin);
break;
}
}
if (lastRelation == null) {
lastRelation = root.fetch(subrelations[0], JoinType.LEFT);
flattenFetches.put(subrelations[0], lastRelation);
i = 1;
}
for (; i < subrelations.length; i++) {
String relation = subrelations[i];
String path = String.join(".", Arrays.copyOf(subrelations, i + 1));
if (i == subrelations.length - 1) {
Fetch fetch = lastRelation.fetch(relation, JoinType.LEFT);
flattenFetches.put(path, fetch);
} else {
lastRelation = lastRelation.fetch(relation, JoinType.LEFT);
flattenFetches.put(path, lastRelation);
}
}
} else {
Fetch fetch = root.fetch(join, JoinType.LEFT);
flattenFetches.put(join, fetch);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
To use it, I just call, for example:
employeeController.getAll("punches", "currentSchedule.shifts", "defaultDepartment.currentSchedule.shifts",
"defaultDepartment.company.currentSchedule.shifts", "bankExtras")
I would like to comment the algorithm, but I do not have time, and it's pretty easy to understand.
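The core idea of the algorithm is that every dotted path is resolved prefix by prefix, so a parent relation is always fetched before its children and never fetched twice. The sketch below isolates that step in plain Java, without the JPA Fetch objects; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FetchPathSketch {

    // Expands dotted paths into the ordered, duplicate-free list of prefixes to fetch.
    public static List<String> expandPaths(String... paths) {
        String[] sorted = paths.clone();
        Arrays.sort(sorted); // parents sort before their children
        Set<String> ordered = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        for (String path : sorted) {
            StringBuilder prefix = new StringBuilder();
            for (String part : path.split("\\.")) {
                if (prefix.length() > 0) {
                    prefix.append('.');
                }
                prefix.append(part);
                ordered.add(prefix.toString());
            }
        }
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        // prints [department, department.company, department.salaryCode]
        System.out.println(expandPaths("department.salaryCode", "department.company"));
    }
}
```

In the real method, each prefix in this list corresponds to one fetch call: a single-segment prefix is fetched from the root, and a longer prefix is fetched from the Fetch object stored for its parent prefix.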
I've got an Entity class named Fee. After performing an initial query, if more than one Fee is returned, I'd like to add some (WHERE) conditions only on the rows returned by the first query. Since a code snippet is better than a thousand words, here it is:
// Results of initial query
List<Fee> fees = queryFindFees.getResultList();
if (fees == null || fees.size() <= 0)
return null;
if (fees.size() == 1) {
Fee f = fees.get(0);
jpa.refresh(f);
return f;
}
// More than one fee found
String sqlBase = "SELECT f FROM Fee f WHERE f IN :fees";
String sqlComplete = sqlBase/* + " AND f.isValid = ?1"*/;
queryFindFees = jpa.createQuery(sqlComplete);
queryFindFees.setParameter("fees", fees);
// Commented out for test
// queryFindFees.setParameter(1, "Y");
List<Fee> specFees = queryFindFees.getResultList();
Now, since I'm actually asking for the EntityManager to return all the Fees already returned in the previous query, I would expect the same resultset. Instead the list specFees is always empty.
What am I doing wrong?
Thanks in advance,
Luca
EDIT 1: Details of Entity class Fee
@Entity
public class Fee implements Serializable {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(unique = true, nullable = false)
private long id;
@Column(nullable = false, length = 1)
private String isValid;
...
@Override
public boolean equals(Object arg0) {
if (!(arg0 instanceof Fee))
return false;
Fee f0 = (Fee) arg0;
return f0.getId() == this.getId();
}
}
The generated SQL (sqlComplete variable) is as expected:
SELECT f FROM Fee f WHERE f IN :fees
EDIT 2: As suggested by Deepak, using the collection of IDs works:
String sqlBase = "SELECT f FROM Fee f WHERE f.id IN :feesIds";
String sqlComplete = sqlBase/* + " AND f.isValid = ?1"*/;
List<Long> feesIds = new ArrayList<Long>();
for (Fee f : fees) {
feesIds.add(f.getId());
}
queryFindFees = jpa.createQuery(sqlComplete);
queryFindFees.setParameter("feesIds", feesIds);
Using this code, the query works (it returns all the original Fees). However, if possible, I would like to avoid the for loop, because the number of Fee instances may be very large...
It looks like your overridden equals method is causing the problem. Make your id a Long object and use the overridden method below. Alternatively, you can use this query:
SELECT f FROM Fee f WHERE f.id IN :fees
This time, your fees parameter will contain the list of ids of the Fee objects.
@Entity
public class Fee implements Serializable {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(unique = true, nullable = false)
private Long id;
@Column(nullable = false, length = 1)
private String isValid;
...
@Override
public boolean equals(Object arg0) {
if (!(arg0 instanceof Fee))
return false;
Fee f0 = (Fee) arg0;
return f0.getId().equals(this.getId());
}
}
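Two details are worth adding as a sketch. First, a class that overrides equals should also override hashCode consistently, which the snippet above does not show. Second, the list of ids can be collected with a stream instead of an explicit for loop (it still iterates over the list once). The Fee class below is a stripped-down stand-in without JPA annotations, used only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class FeeIdsSketch {

    // Simplified stand-in for the Fee entity (no JPA annotations).
    static class Fee {
        private final Long id;

        Fee(Long id) {
            this.id = id;
        }

        Long getId() {
            return id;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Fee)) {
                return false;
            }
            // Two instances without an id are only equal if they are the same object.
            return id != null && id.equals(((Fee) other).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id); // consistent with equals
        }
    }

    // Collects the ids to bind to the ":feesIds" parameter.
    public static List<Long> idsOf(List<Fee> fees) {
        return fees.stream().map(Fee::getId).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(idsOf(Arrays.asList(new Fee(1L), new Fee(2L)))); // prints [1, 2]
    }
}
```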