Regex changes to a DDL (Java)

I have a process that gets a DDL from Impala and makes a few changes for it to work on SQL Server.
I get something like this from Impala
CREATE EXTERNAL TABLE xxx.yyy (
year INT,
day INT,
mmm_yyyy DATE,
2target_revenue_day DECIMAL(38,6),
2budget_day DECIMAL(38,6),
last_6_months STRING,
load_timestamp TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3a://xxx'
TBLPROPERTIES ('')
I managed to remove the "EXTERNAL TABLE" bit as I only need "TABLE",
changed "STRING" to "VARCHAR" and "TIMESTAMP" to "DATETIME2".
Also removed the bit at the bottom, i.e STORED AS PARQUET
LOCATION 's3a://xxx'
TBLPROPERTIES ('')
My problem is that some of the column names, such as year, day and 2target_revenue_day, need to be wrapped in quotes, otherwise the script won't work (they are reserved words, or the name starts with a digit).
I need a way to wrap either all column names in quotes, or just the ones that are reserved words or start with a digit.
Any idea how to go about it?
Thank you

You could key the pattern off of a word immediately preceding one of a set of known data types. Depending on when you perform that step, you'll need to customize that list to match either the Impala or the SQL Server types.
(\w+)\s+(?:BOOLEAN|CHAR|DATE|DECIMAL|DOUBLE|FLOAT|INT|REAL|STRING|TIMESTAMP|VARCHAR|etc)
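As a rough illustration of that idea in Java, the sketch below quotes any word that sits directly before a known type keyword. The class name `ColumnQuoter`, the method name and the exact type list are assumptions made for the example; the list is not complete and would need to match the types that actually occur in your DDL.

```java
import java.util.regex.Pattern;

final class ColumnQuoter {
    // Assumed, incomplete list of type keywords; extend it to match your DDL.
    // Group 1 is the column name, group 2 the whitespace plus the type keyword.
    private static final Pattern COLUMN_BEFORE_TYPE = Pattern.compile(
        "\\b(\\w+)([ \\t]+(?:BIGINT|BOOLEAN|CHAR|DATE|DECIMAL|DOUBLE|FLOAT|INT"
            + "|REAL|SMALLINT|STRING|TIMESTAMP|TINYINT|VARCHAR)\\b)",
        Pattern.CASE_INSENSITIVE);

    // Wraps every word that is immediately followed by a type keyword in double quotes.
    static String quoteColumns(String ddl) {
        return COLUMN_BEFORE_TYPE.matcher(ddl).replaceAll("\"$1\"$2");
    }

    public static void main(String[] args) {
        String ddl = "CREATE TABLE xxx.yyy (\n"
            + "year INT,\n"
            + "2target_revenue_day DECIMAL(38,6),\n"
            + "last_6_months STRING\n"
            + ")";
        System.out.println(quoteColumns(ddl));
    }
}
```

This quotes every column rather than only the problematic ones, which avoids having to maintain a list of reserved words. Running it before the STRING/TIMESTAMP replacements means the list should hold the Impala type names.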

With regard to columns that start with a digit,
this has worked for me:
variable.replaceAll("(\\d{1}[a-z]+[a-z0-9_]*)", "\"$0\"");
It finds anything with a number in the beginning of the column name and wraps it in quotes.
With regard to reserved words, I've had to manually look for words like year, month, day, date, etc. and replace them with a quoted name, e.g. "year", "month", etc.
variable.replace(" date ", " \"date\" ").replace(" year ", " \"year\" ").replace(" month ", " \"month\" ").replace(" day ", " \"day\" ");
I hope someone will find this useful.

Related

Generate an account number with sequence

I have a requirement where I need to generate an account number and insert it into a table column in the following format.
"TBA2222011300000001", where "TBA" is the value of another column or user-supplied data, "22220113" stands for the current date, and "00000001" is an eight-digit sequence that needs to be incremented and appended on every insert.
How can I append the sequence to the column? Should I do it in Java, or can it be done on the database side? I am currently using Postgres with Java and Spring Boot.

https://www.postgresql.org/docs/current/ddl-generated-columns.html
A generated column is a special column that is always computed from
other columns.
Several restrictions apply to the definition of
generated columns and tables involving generated columns:
The generation expression can only use immutable functions and cannot use subqueries or reference anything other than the current row
in any way.
now() is not an immutable function (it is STABLE), so a generated column cannot be used here.
I am not sure why a column default does not work for you:
https://www.postgresql.org/docs/current/ddl-default.html
That leaves a trigger as the remaining option.
CREATE TABLE account_info(
account_id INT GENERATED ALWAYS AS IDENTITY,
account_type text not null,
account_number text ) ;
So what you want is to automate:
UPDATE account_info set
account_number =
concat(
account_type,
to_char(CURRENT_DATE, 'yyyymmdd'),
to_char(account_id, 'FM00000000'));
create the function
create or replace function update_account_number() returns trigger as $$
BEGIN
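-- Restrict the update to the row that was just inserted;
-- without the WHERE clause every row in the table is rewritten.
UPDATE account_info set
account_number =
concat(
account_type,
to_char(CURRENT_DATE, 'yyyymmdd'),
to_char(account_id, 'FM00000000'))
WHERE account_id = NEW.account_id;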
RETURN NULL;
end;
$$ LANGUAGE plpgSQL;
create the trigger:
CREATE OR REPLACE TRIGGER udpate_accout_number
AFTER INSERT ON account_info
FOR EACH ROW
EXECUTE FUNCTION update_account_number();

Have an id column that is an identity column in Postgres, with the start and end values you require.
For reference, this shows how to create an identity column as desired:
https://www.postgresqltutorial.com/postgresql-identity-column/
Add one more column for createdDate.
The account number is then simply a derived value: TBA + formatted(date) + formatted(id).
This is not a trigger, just a function. There won't be any account number column in your table. It will simply be a function that takes the date and the identity as input and gives the account number as output. Since the account number depends only on the id and the date, there is no need to store it at all; whenever you need the account number, just call that function. It will always be calculated from the id and the date.
Refer to this in the article:
Method 1: Derived Value called "markup"
The first method we may want to add to this table is an accountNumber method, for calculating our accountNumber based on the current date and id. Since this value will always be based on two other stored values, there is no sense in storing it (other than possibly in a pre-calculated index). To do this, we:
CREATE FUNCTION accountNumber(id,date) RETURNS varchar AS
$$ SELECT TBA + format(id) + format(date)
$$ LANGUAGE SQL IMMUTABLE;
You need to supply the logic for format(id) and format(date) as per your requirement.
There is no point in storing a value that can easily be derived from two other columns. It would consume space unnecessarily, and maintaining data integrity and checks would be an overhead.
Creating function for derived values
https://ledgersmbdev.blogspot.com/2012/08/postgresql-or-modelling-part-2-intro-to.html
You can use the function in output as well as search.
Index would also be utilized as required.

I did the following to generate the desired account number.
Created a new sequence and appended zeros to it.
select to_char(nextval('finance_accounts_id_seq'), 'fm00000000')
Got the Current date in java using DateTimeFormatter
DateTimeFormatter dmf = DateTimeFormatter.ofPattern("yyyyMMdd");
String date = LocalDate.now().format(dmf);
Got the "TBA" from request param of the user.
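Putting those three pieces together could look roughly like the sketch below. The class and method names are made up for the example, and it assumes the sequence value has already been fetched from the database (for instance with nextval) and is passed in as a long; the zero padding is then done in Java instead of with to_char.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class AccountNumbers {
    private static final DateTimeFormatter DATE_PART = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Builds prefix + yyyyMMdd + sequence, with the sequence zero-padded to 8 digits.
    static String format(String prefix, LocalDate date, long sequence) {
        if (sequence < 0 || sequence > 99_999_999L) {
            throw new IllegalArgumentException("sequence does not fit in 8 digits: " + sequence);
        }
        return prefix + date.format(DATE_PART) + String.format("%08d", sequence);
    }

    public static void main(String[] args) {
        // In the real application the prefix comes from the request and the
        // sequence from the database; both are hard-coded here for illustration.
        System.out.println(format("TBA", LocalDate.now(), 1L));
    }
}
```

Taking the date as a parameter rather than calling LocalDate.now() inside the method keeps the formatting easy to test.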

PreparedStatement Update showing error ORA-00927 missing equal sign

I am trying to update the records in database according to data read from an Excel sheet. I have more than 50 columns in db whose column names are stored in an array columnNames[].
I use the following code to build the SQL query.
String sqlUpdate= "Update "+tableName+
" set "+columnNames[0]+"=?";
for (int i=1;i<columnCount;i++)
{
sqlUpdate= sqlUpdate+","+columnNames[i]+"=?";
}
sqlUpdate= sqlUpdate+
" where demand_id=?";
The resulting query, printed to the console, is:
Update fulfillment_plan set DEMAND_ID=?,SBU=?,PROJ_DOMAIN=?,JOBCODE=?,INDENT_STATUS=?,JC_CREATED_ON=?,PROJECT_NAME=?,CUSTOMER_NAME=?,GROUP_CUSTOMER=?,US_DEMANDS=?,SUITE_NAME=?,ROLE_NAME=?,LOCATION=?,COUNTRY=?,GEO=?,AREA=?,OPEN_POS=?,PRODUCT=?,DEMAND_TYPE=?,POSITIONS_TO_FULFILL_Q4=?,FULFILLMENT_PLAN_Q4=?,TA_STATUS_Q4=?,POSITIONS_TO_FULFILL_Q3=?,FULFILLMENT_PLAN_Q3=?,TA_STATUS_Q3=?,POSITIONS_TO_FULFILL_Q2=?,FULFILLMENT_PLAN_Q2=?,TA_STATUS_Q2=?,POSITIONS_TO_FULFILL_Q1=?,FULFILLMENT_PLAN_Q1=?,TA_STATUS_Q1=?,NET_ADD_TYPE=?,ESSENTIAL_SKILL=?,SUITE_SKILLS=?,ADDITIONAL_SKILLS=?,POSITIONS_WITH_PROPOSALS=?,POSITIONS_WITHOUT_PROPOSALS=?,DEM_ST_DATE=?,OVER_DUE_STATUS=?,OVERDUE_DAYS=?,LEAD_TIME_DAYS=?,LEAD TIME BUCKET=?,DEM_END_DATE=?,CREATED_ON=?,INDENT_CREATED_ON=?,EBD=?,OPPORTUNITYID=?,LOAD_DATE=?,PROJECT_NUMBER=?,CUSTOMER_NO=?,CUSTOMER_SUB_GEO=?,DEMAND_STATUS=?,ENGAGEMENT_TYPE=?,INVOICE_TYPE=?,INDENT_CLASSIFICATIONS=?,PROJ_STAT=?,EFD_SLA=?,RM_EMP_NAME=?,MONTH=?,QUARTER=?,YEAR=?,ACCOUNT_ID=?,ACCOUNT_TEXT=?,STATUS=? where demand_id=?
Then I set the values for the '?' placeholders, and on executing the above prepared statement I get the "missing equal sign" error. I have been looking into it for around 3 hours now and am not able to solve it. Kindly help.

I suspect this is due to the LEAD TIME BUCKET column name, which should either have underscores (like the other column names) or be escaped somehow - the spaces within the column name are causing the error. It would be better to have underscores in order to be consistent with your other columns, and to make the SQL simpler.
(I'd also suggest adding spaces within your SQL - e.g. one after every comma - so that the SQL can be reformatted in a text editor by line-breaking on spaces, making it easier to read. I'd have more whitespace in the Java code too, but that's clearly a matter of personal/team preference.)
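If renaming the column is not an option, one way to escape it is to wrap every column name in double quotes while building the statement. The sketch below is only an illustration: the class and method names are invented, and it assumes the column really is stored as LEAD TIME BUCKET in upper case, since quoted identifiers in Oracle are case-sensitive.

```java
import java.util.ArrayList;
import java.util.List;

final class UpdateSqlBuilder {
    // Wraps an identifier in double quotes so that spaces inside it are accepted.
    static String quote(String identifier) {
        if (identifier.contains("\"")) {
            throw new IllegalArgumentException("identifier must not contain a double quote");
        }
        return "\"" + identifier + "\"";
    }

    // Builds: UPDATE <table> SET "COL1" = ?, "COL2" = ? WHERE <keyColumn> = ?
    static String buildUpdate(String table, List<String> columns, String keyColumn) {
        List<String> assignments = new ArrayList<>();
        for (String column : columns) {
            assignments.add(quote(column) + " = ?");
        }
        return "UPDATE " + table + " SET " + String.join(", ", assignments)
            + " WHERE " + keyColumn + " = ?";
    }

    public static void main(String[] args) {
        System.out.println(buildUpdate("fulfillment_plan",
            List.of("DEMAND_ID", "LEAD TIME BUCKET", "STATUS"), "demand_id"));
    }
}
```

The parameter binding stays the same as before; only the generated SQL text changes.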

How to perform 'between' in Arabic (hijri) calendar and save it as 'date' in MySQL?

My question is actually my ultimate aim.
So far, I am having 2 issues.
How to save arabic date as a 'date' in mysql?
because, I have been converting Gregorian to Hijri and then, using preg_replace (php, for now, final is in Java) would change the numbers to arabic ascii hex... and then, save it in MySQL as varchar.
I know about collation cp1256_general_ci which allows us to store in arabic but, currently, for simplicity sake, I have put it aside. utf-8_general is doing fine too. So storing as varchar is not an issue, storing as 'date' is.
Performing queries on it.
I thought the requirements would end there but, now the task is to perform queries like date 'between' xyz and pqr... Also, the constraint is to 'store it in arabic only'.
Any inputs are much appreciated.

SQL dates
I'd think about it like this: the server actually stores a date as a reference to a given day. How it does that is no concern of yours. When storing data to or reading data from such a date column, the server represents that date using a specific calendar, which is gregorian by convention. What I'm trying to say is, I wouldn't consider the stored value to be gregorian, although it may well be. I would rather consider the transferred date to be gregorian.
So the best solution, in my opinion, is accepting that fact and converting between Gregorian and Hijri on the application side. That way, you could use normal between checks on that.
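Since the final application is said to be in Java, a minimal sketch of that application-side conversion could use java.time.chrono.HijrahDate. This assumes the Umm al-Qura variant that the JDK implements is acceptable, which may differ from locally observed dates; the class name, the column name myDate and the example dates are placeholders.

```java
import java.time.LocalDate;
import java.time.chrono.HijrahDate;

final class HijriDates {
    // Converts a Hijri year/month/day to the Gregorian date stored in the DATE column.
    static LocalDate toGregorian(int hijriYear, int month, int day) {
        return LocalDate.from(HijrahDate.of(hijriYear, month, day));
    }

    // Converts a Gregorian date read from the database back to a Hijri date for display.
    static HijrahDate toHijri(LocalDate gregorian) {
        return HijrahDate.from(gregorian);
    }

    public static void main(String[] args) {
        // Range bounds entered in Hijri, converted before they reach the query.
        LocalDate from = toGregorian(1434, 1, 1);
        LocalDate to = toGregorian(1434, 1, 29);
        System.out.println("WHERE myDate BETWEEN '" + from + "' AND '" + to + "'");
        System.out.println(toHijri(from));
    }
}
```

In real code the two bounds would be bound as parameters of a PreparedStatement rather than concatenated into the SQL text; the string here only shows which values end up in the BETWEEN check.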
Strings made up from numbers
If this is not possible, because the locale-dependent conversion is too complicated, or because the mapping between Hijri and Gregorian is not unique or not known in advance, then you will have to store the date in some other form. One possible form is a varchar containing strings of the form YYYY-MM-DD, with the letters signifying digits. This scheme ensures that strings compare like the dates they represent, so you could still use between on them. Turning these strings back into spelled-out dates would still be tricky, though.
One or more numeric columns
So I would actually suggest you use three columns, each containing a number for one part of the date. You could then use 10000*year + 100*month + day_of_month to obtain a single number for each day, which you could use for comparisons and between. On the other hand, you could use the function ELT in your queries to turn the number for the month back into a name. If performance is an issue, you might be better off storing just a single number, and splitting it into parts upon selection. In a Gregorian calendar, this would look like this:
CREATE TABLE tableName (myDate DECIMAL(8));
SELECT myDate DIV 10000 AS year,
ELT((myDate DIV 100) MOD 100, "Jan", "Feb", …) AS month,
myDate MOD 100 AS day_of_month
FROM tableName
WHERE myDate BETWEEN 20121021 AND 20121023;
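The packing arithmetic used above (10000*year + 100*month + day_of_month) can be sketched on the Java side as follows, for code that writes or reads the single numeric column. The class and method names are illustrative only, and no calendar validation beyond simple range checks is attempted.

```java
final class PackedDate {
    // Packs year, month and day into one number, e.g. 2012-10-21 becomes 20121021.
    static int encode(int year, int month, int day) {
        if (month < 1 || month > 12 || day < 1 || day > 31) {
            throw new IllegalArgumentException("month or day out of range");
        }
        return 10000 * year + 100 * month + day;
    }

    // The three accessors mirror the DIV/MOD expressions in the SQL query.
    static int year(int packed) {
        return packed / 10000;
    }

    static int month(int packed) {
        return (packed / 100) % 100;
    }

    static int day(int packed) {
        return packed % 100;
    }

    public static void main(String[] args) {
        int packed = encode(1434, 1, 15);
        System.out.println(packed + " -> " + year(packed) + "/" + month(packed) + "/" + day(packed));
    }
}
```

Because the year occupies the most significant digits, comparing two packed numbers gives the same ordering as comparing the dates they stand for, which is what makes BETWEEN work on the column.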
Compatibility and convenience
If you have to maintain read-only compatibility with code that expects a single textual date column, you could use a VIEW to provide that. For example for a German Gregorian DD. MMMM YYYY format, you could use code like this:
CREATE VIEW compatibleName AS
SELECT CONCAT(myDate MOD 100, ". ",
ELT((myDate DIV 100) MOD 100, "Januar", "Februar", …), " ",
myDate DIV 10000) as dateString,
tableName.* -- or explicitly name other columns needed for compatibility
FROM tableName
Decoding strings
If you need read-write access by another application using a string format, you'll have to parse those strings yourself. You can do that at the SQL level. Useful tools are SUBSTRING_INDEX to split the string into fields and FIELD to turn a month name into a number. You might want to add a trigger to the database which will ensure that your strings will always be in a valid format which you can decompose in this way. This question gives details on how to use triggers to enforce such checks.

You can store it as a date directly. I am using a normal date column. My MySQL functions are:
DELIMITER $$
DROP FUNCTION IF EXISTS `kdmtest`.`IntPart` $$
CREATE FUNCTION `kdmtest`.`IntPart` (FloatNum float) RETURNS INT
BEGIN
if (floatNum< -0.0000001) then
return ceil(floatNum-0.0000001);
else
return floor(floatNum+0.0000001);
end if;
END $$
DELIMITER ;
DELIMITER $$
DROP FUNCTION IF EXISTS `kdmtest`.`Hicri` $$
CREATE DEFINER=`root`@`localhost` FUNCTION `Hicri`(MiladiTarih date) RETURNS date
BEGIN
declare d,m,y,jd,l,n,j int;
set d=day(MiladiTarih);
set m=month(MiladiTarih);
set y=year(MiladiTarih);
if ((y>1582) or((y=1582) and (m>10)) or ((y=1582) and (m=10) and (d>14))) then
set jd=intpart((1461*(y+4800+intpart((m-14)/12)))/4)+intpart((367*(m-2-12*(intpart((m-14)/12))))/12)- intpart( (3* (intpart( (y+4900+ intpart( (m-14)/12) )/100) ) ) /4)+d-32075;
else
set jd = 367*y-intpart((7*(y+5001+intpart((m-9)/7)))/4)+intpart((275*m)/9)+d+1729777;
end if;
set l=jd-1948440+10632;
set n=intpart((l-1)/10631);
set l=l-10631*n+354;
set j=(intpart((10985-l)/5316))*(intpart((50*l)/17719))+(intpart(l/5670))*(intpart((43*l)/15238));
set l=l-(intpart((30-j)/15))*(intpart((17719*j)/50))-(intpart(j/16))*(intpart((15238*j)/43))+29;
set m=intpart((24*l)/709);
set d=l-intpart((709*m)/24);
set y=30*n+j-30;
return concat(y,'-',m,'-',d);
END $$
DELIMITER ;
DELIMITER $$
DROP FUNCTION IF EXISTS `kdmtest`.`Miladi` $$
CREATE FUNCTION `kdmtest`.`Miladi` (HicriTarih date) RETURNS date
BEGIN
declare d,m,y,jd,l,n,j,i,k int;
set d=day(HicriTarih);
set m=month(HicriTarih);
set y=year(HicriTarih);
set jd=intPart((11*y+3)/30)+354*y+30*m-intPart((m-1)/2)+d+1948440-385;
if (jd> 2299160 ) then
set l=jd+68569;
set n=intPart((4*l)/146097);
set l=l-intPart((146097*n+3)/4);
set i=intPart((4000*(l+1))/1461001);
set l=l-intPart((1461*i)/4)+31;
set j=intPart((80*l)/2447);
set d=l-intPart((2447*j)/80);
set l=intPart(j/11);
set m=j+2-12*l;
set y=100*(n-49)+i+l;
else
set j=jd+1402;
set k=intPart((j-1)/1461);
set l=j-1461*k;
set n=intPart((l-1)/365)-intPart(l/1461);
set i=l-365*n+30;
set j=intPart((80*i)/2447);
set d=i-intPart((2447*j)/80);
set i=intPart(j/11);
set m=j+2-12*i;
set y=4*k+n+i-4716;
end if;
return concat(y,'-',m,'-',d);
END $$
DELIMITER ;

Escape sequence when adding multiple records to DerbyDB

I'm converting (or trying to) an Ms AccessDB into derby.
When I extract the data from certain varchar / text / memo field from access they are filled with apostrophe, and mathematical symbols (percent, less than etc), and possible foreign characters
I need to keep these and I test for them so as I can use an 'escape sequence' to ensure they get put into the database.
However, for now I am unable to get the data into the DB without it failing on these fields. When the SQL fails I output the SQL string, and cut and paste it into ij. Then I modify just the first record, and it is always these characters that cause me grief.
I've tried to modify the strings by surrounding them with "double quote marks" but that just gives a different error (stating that it has 'encountered """ at line 1 column x', which is always the first occurrence of the double quote).
I haven't found a setting in Derby to alter the behaviour for strings yet. Is there one?
I have also tried to turn the SQL statement into a preparedStatement and then use the {call preparedStatement}; this fails as well. I can't use the {escape "escape char} in a normal statement, as Derby just reports incorrect syntax.
How do others manage to get user content with strange characters into a field in derby?
Do I need to change my field into a CLOB or something other than varchar / long Varchar?
Are my problems being caused by using the wrong character set (e.g. ISO rather than UTF-8)? How do I tell what it is, and how do I change it?
Below is a sample of the SQL insert that fails when I send it to derby (via my JAVA 'programme')
insert into S1.SORTIEDESSAI (OBS, DATEDUSORTIE, CONTREINDIC, FIN,
PDEVU, REFUS, INVDECISN, ADMIN, MOTIF_DE_LA_SORTIE, NOMVALIDEE,
DATEVALIDEE) values ('"0001/0001"' , '2007-07-15' , false , true ,
'"null"' , '"null"' , '"null"' , '"null"' , '"2. FIN DE L’ESSAI"' ,
'"DR SIMON"' , '2011-04-19' )
Note:
Actually I look at the above and notice that the order of columns names isn't good? It was OK yesterday, not sure why it would have changed? something to do with Access returning the column names in a random order from the resultSetMetaData, which would be a surprise.
For now I recommend that any further answers hold off whilst I sort this problem out. OK, solved that problem; do I need to post another question about this behaviour?
Back to the main thread...
Ok as you can see on my SQL statement I have wrapped any varchar fields in double quotes. This always fails (even directly through ij). help help help...

I'm not quite sure what your question is, but in general you can input these characters by using a PreparedStatement of the form: INSERT INTO tablename (columnname) values (?), and then using the PreparedStatement.setString() method to supply your character data for that column.
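A minimal sketch of that approach is shown below. The class and method names are invented for the example, and it binds every value with setString for brevity; boolean and date columns such as the ones in the question would use setBoolean and setDate instead. Note that the values are passed as they are, without any surrounding quote marks.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class SafeInsert {
    // Builds "INSERT INTO t (a, b) VALUES (?, ?)" with one placeholder per column.
    static String insertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
            + ") VALUES (" + placeholders + ")";
    }

    // Binds each value to its placeholder; the driver handles apostrophes,
    // percent signs and other special characters, so no manual escaping is needed.
    static int insertRow(Connection conn, String table,
                         List<String> columns, List<String> values) throws SQLException {
        if (columns.size() != values.size()) {
            throw new IllegalArgumentException("columns and values differ in length");
        }
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table, columns))) {
            for (int i = 0; i < values.size(); i++) {
                ps.setString(i + 1, values.get(i)); // JDBC parameters are 1-based
            }
            return ps.executeUpdate();
        }
    }
}
```

With this, a value such as 2. FIN DE L’ESSAI is handed to insertRow unchanged, and the column list can be taken from the metadata in a fixed order so that columns and values stay aligned.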

Service usage limiter implementation

I need to limit multiple service usages for multiple customers. For example, customer customer1 can send max 1000 SMS per month. My implementation is based on one MySQL table with 3 columns:
date TIMESTAMP
name VARCHAR(128)
value INTEGER
For every service usage (sending SMS) one row is inserted to the table. value holds usage count (eg. if SMS was split to 2 parts then value = 2). name holds limiter name (eg. customer1-sms).
To find out how many times the service was used this month (March 2011), a simple query is executed:
SELECT SUM(value) FROM service_usage WHERE name = 'customer1-sms' AND date > '2011-03-01';
The problem is that this query is slow (0.3 sec). We are using indexes on columns date and name.
Is there some better way how to implement service usage limitation? My requirement is that it must be flexibile (eg. if I need to know usage within last 10 minutes or another within current month). I am using Java.
Thanks in advance

You should have one index on both columns, not two indexes on each of the columns. This should make the query very fast.
If it still doesn't, then you could use a table with a month, a name and a value, and increment the value for the current month each time an SMS is sent. This would remove the sum from your query. It would still need an index on (month, name) to be as fast as possible, though.

I found one solution to my problem. Instead of inserting each usage increment, I will insert the running total (the last value plus the increment):
BEGIN;
-- select the last value
SELECT value FROM service_usage WHERE name = %name ORDER BY date DESC LIMIT 1;
-- insert the new running total into the database
INSERT INTO service_usage (date, name, value) VALUES (CURRENT_TIMESTAMP, %name, %value + %increment);
COMMIT;
To find out service usage since %date:
SELECT value AS value1 FROM test WHERE name = %name ORDER BY date DESC LIMIT 1;
SELECT value AS value2 FROM test WHERE name = %name AND date <= %date ORDER BY date DESC LIMIT 1;
The result will be value1 - value2
This way I'll need transactions. I'll probably implement it as stored procedure.
Any additional hints are still appreciated :-)

It's worth trying to replace your "=" with "like". Not sure why, but in the past I've seen this perform far more quickly than the "=" operator on varchar columns.
SELECT SUM(value) FROM service_usage WHERE name like 'customer1-sms' AND date > '2011-03-01';
Edited after comments:
Okay, now I can sorta re-create your issue - the first time I run the query, it takes around 0.03 seconds, subsequent runs of the query take 0.001 second. Inserting new records causes the query to revert to 0.03 seconds.
Suggested solution:
COUNT does not show the same slow-down. I would change the business logic so that every time the user sends an SMS you insert a record with value "1"; if the message is a multipart message, simply insert two rows.
Replace the "sum" with a "count".
I've applied this to my test data, and even after inserting a new record, the "count" query returns in 0.001 second.
