Flink SQL: Repeating grouping keys in result of GROUP BY query - java

I want to run a simple query in Flink SQL on one table, which includes a GROUP BY statement. But the results contain duplicate rows for the column specified in the GROUP BY statement. Is that because I use a streaming environment and it doesn't remember the state?
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
final StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
// configure Kafka consumer
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // Broker default host:port
props.setProperty("group.id", "flink-consumer"); // Consumer group ID
FlinkKafkaConsumer011<BlocksTransactions> flinkBlocksTransactionsConsumer = new FlinkKafkaConsumer011<>(args[0], new BlocksTransactionsSchema(), props);
flinkBlocksTransactionsConsumer.setStartFromEarliest();
DataStream<BlocksTransactions> blocksTransactions = env.addSource(flinkBlocksTransactionsConsumer);
tableEnv.registerDataStream("blocksTransactionsTable", blocksTransactions);
Table sqlResult
        = tableEnv.sqlQuery(
                "SELECT block_hash, count(tx_hash) " +
                "FROM blocksTransactionsTable " +
                "GROUP BY block_hash");
DataStream<Test> resultStream = tableEnv
        .toRetractStream(sqlResult, Row.class)
        .map(t -> {
            Row r = t.f1;
            String field2 = r.getField(0).toString();
            long count = Long.valueOf(r.getField(1).toString());
            return new Test(field2, count);
        })
        .returns(Test.class);
resultStream.print();
resultStream.addSink(new FlinkKafkaProducer011<>("localhost:9092", "TargetTopic", new TestSchema()));
env.execute();
I use the GROUP BY statement on the block_hash column, but the same block_hash appears several times. This is the output of print():
Test{field2='0x2c4a021d514e4f8f0beb8f0ce711652304928528487dc7811d06fa77c375b5e1', count=1}
Test{field2='0x2c4a021d514e4f8f0beb8f0ce711652304928528487dc7811d06fa77c375b5e1', count=1}
Test{field2='0x2c4a021d514e4f8f0beb8f0ce711652304928528487dc7811d06fa77c375b5e1', count=2}
Test{field2='0x780aadc08c294da46e174fa287172038bba7afacf2dff41fdf0f6def03906e60', count=1}
Test{field2='0x182d31bd491527e1e93c4e44686057207ee90c6a8428308a2bd7b6a4d2e10e53', count=1}
Test{field2='0x182d31bd491527e1e93c4e44686057207ee90c6a8428308a2bd7b6a4d2e10e53', count=1}
How can I fix this without using BatchEnvironment ?

A GROUP BY query that runs on a stream must produce updates. Consider the following example:
SELECT user, COUNT(*) FROM clicks GROUP BY user;
Every time the clicks table receives a new row, the count of the respective user needs to be incremented and updated.
When you convert a Table into a DataStream, these updates must be encoded in the stream. Flink uses add and retract messages to do that. By calling tEnv.toRetractStream(table, Row.class), you convert the Table table into a DataStream<Tuple2<Boolean, Row>>. The Boolean flag is important and indicates whether the Row is added to or retracted from the result table.
Given the example query above and the input table clicks as
user | ...
------------
Bob | ...
Liz | ...
Bob | ...
You will receive the following retraction stream
(+, (Bob, 1)) // add first result for Bob
(+, (Liz, 1)) // add first result for Liz
(-, (Bob, 1)) // remove outdated result for Bob
(+, (Bob, 2)) // add updated result for Bob
You need to actively maintain the result yourself and add and remove rows as instructed by the Boolean flag of the retraction stream.
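Concretely, materializing a retract stream just means applying each message to a keyed map: put on +, remove on -. A minimal self-contained sketch (the Msg record is a hypothetical stand-in for Flink's Tuple2<Boolean, Row>):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RetractStreamDemo {
    // One message of the retract stream: the add/retract flag plus the row values.
    record Msg(boolean add, String key, long count) {}

    // Applies add (+) and retract (-) messages to a materialized result table.
    static Map<String, Long> materialize(Iterable<Msg> stream) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Msg m : stream) {
            if (m.add()) {
                result.put(m.key(), m.count());     // add new or updated row
            } else {
                result.remove(m.key(), m.count());  // retract the outdated row
            }
        }
        return result;
    }

    public static void main(String[] args) {
        var result = materialize(List.of(
                new Msg(true,  "Bob", 1),  // (+, (Bob, 1))
                new Msg(true,  "Liz", 1),  // (+, (Liz, 1))
                new Msg(false, "Bob", 1),  // (-, (Bob, 1))
                new Msg(true,  "Bob", 2)   // (+, (Bob, 2))
        ));
        System.out.println(result); // {Liz=1, Bob=2}
    }
}
```

In the actual job this bookkeeping would live in (or in front of) the sink; alternatively, a sink that upserts by key (e.g. a database keyed on block_hash) makes the retractions unnecessary to handle by hand.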

Related

How to generate arbitrary subqueries/joins in a Jooq query

Situation: I am porting our application to jOOQ to eliminate several n+1 problems and ensure custom queries are type-safe (the DB server is PostgreSQL 13). In my example we have documents (ID, file name, file size). Each document can have several unique document attributes (document ID as FK, archive attribute ID - the type of the attribute - and the value). Example data:
Document:
acme=> select id, file_name, file_size from document;
id | file_name | file_size
--------------------------------------+-------------------------+-----------
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | My Really cool book.pdf | 13264
(1 row)
Document Attributes:
acme=> select * from document_attribute ;
document_id | archive_attribute_id | value
--------------------------------------+--------------------------------------+------------
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | b334e287-887f-4173-956d-c068edc881f8 | JustReleased
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | 2f86a675-4cb2-4609-8e77-c2063ab155f1 | Tax
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | 30bb9696-fc18-4c87-b6bd-5e01497ca431 | ShippingRequired
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | 2eb04674-1dcb-4fbc-93c3-73491deb7de2 | Bestseller
1ae56478-d27c-4b68-b6c0-a8bdf36dd341 | a8e2f902-bf04-42e8-8ac9-94cdbf4b6778 | Paperback
(5 rows)
One can search for these documents and their attributes via a custom-created JDBC prepared statement. A user was able to create this query for a document ID and two document attributes with matching values, which returned the book 'My Really cool book.pdf':
SELECT d.id FROM document d WHERE d.id = '1ae56478-d27c-4b68-b6c0-a8bdf36dd341'
AND d.id IN(SELECT da.document_id AS id0 FROM document_attribute da WHERE da.archive_attribute_id = '2eb04674-1dcb-4fbc-93c3-73491deb7de2' AND da.value = 'Bestseller')
AND d.id IN(SELECT da.document_id AS id1 FROM document_attribute da WHERE da.archive_attribute_id = 'a8e2f902-bf04-42e8-8ac9-94cdbf4b6778' AND da.value = 'Paperback');
(After that the application fetches all document attributes for the returned document IDs - thus the n + 1 problem we want to solve)
Please note that all document values and document attributes are optional. One can search for just the file name or file size of a document, but also for several document attributes.
Question/Problems:
I wanted to port this code to jOOQ and use a multiset, but I am struggling with how to apply the arbitrary subquery or join condition to the document attributes:
1.) How can I achieve this arbitrary adding of subqueries?
2.) Is an INNER JOIN more performant than a subquery?
Code:
import org.jooq.Condition;
import org.jooq.impl.DSL;
import org.junit.jupiter.api.Test;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import static org.jooq.impl.DSL.multiset;
import static org.jooq.impl.DSL.selectDistinct;
public class InSelectExample extends BaseTest {
private record CustomDocumentAttribute(
UUID documentId, // ID of the document the attribute belongs to
UUID archiveAttributeId, // There are predefined attribute types in our application. This ID references them
String value // Real value of this attribute for the document
) {
}
private record CustomDocument(
UUID documentId, // ID of the document
String fileName, // File name of the document
Integer fileSize, // File size in bytes of the document
List<CustomDocumentAttribute> attributes // Attributes the document has
) {
}
@Test
public void findPdfDocumentsWithParameters() {
// Should print the single book
List<CustomDocument> documents = searchDocuments(UUID.fromString("1ae56478-d27c-4b68-b6c0-a8bdf36dd341"), "My Really cool book.pdf", 13264, Map.of(
UUID.fromString("2eb04674-1dcb-4fbc-93c3-73491deb7de2"), "Bestseller",
UUID.fromString("a8e2f902-bf04-42e8-8ac9-94cdbf4b6778"), "Paperback"
));
System.out.println("Size: " + documents.size()); // Should return 1 document
// Should print no books because one of the document attribute values doesn't match (Booklet instead of Paperback)
documents = searchDocuments(UUID.fromString("1ae56478-d27c-4b68-b6c0-a8bdf36dd341"), "My Really cool book.pdf", 13264, Map.of(
UUID.fromString("2eb04674-1dcb-4fbc-93c3-73491deb7de2"), "Bestseller",
UUID.fromString("a8e2f902-bf04-42e8-8ac9-94cdbf4b6778"), "Booklet"
));
System.out.println("Size: " + documents.size()); // Should return 0 documents
}
private List<CustomDocument> searchDocuments(UUID documentId, String fileName, Integer fileSize, Map<UUID, String> attributes) {
// Get the transaction manager
TransactionManager transactionManager = getBean(TransactionManager.class);
// Get the initial condition
Condition condition = DSL.noCondition();
// Check for an optional document ID
if (documentId != null) {
condition = condition.and(DOCUMENT.ID.eq(documentId));
}
// Check for an optional file name
if (fileName != null) {
condition = condition.and(DOCUMENT.FILE_NAME.eq(fileName));
}
// Check for an optional file size
if (fileSize != null) {
condition = condition.and(DOCUMENT.FILE_SIZE.eq(fileSize));
}
// Create the query
var step1 = transactionManager.getDslContext().select(
DOCUMENT.ID,
DOCUMENT.FILE_NAME,
DOCUMENT.FILE_SIZE,
multiset(
selectDistinct(
DOCUMENT_ATTRIBUTE.DOCUMENT_ID,
DOCUMENT_ATTRIBUTE.ARCHIVE_ATTRIBUTE_ID,
DOCUMENT_ATTRIBUTE.VALUE
).from(DOCUMENT_ATTRIBUTE).where(DOCUMENT_ATTRIBUTE.DOCUMENT_ID.eq(DOCUMENT.ID))
).convertFrom(record -> record.map(record1 -> new CustomDocumentAttribute(record1.value1(), record1.value2(), record1.value3())))
).from(DOCUMENT
).where(condition);
// TODO: What to do here?
var step3 = ...? What type?
for (Map.Entry<UUID, String> attributeEntry : attributes.entrySet()) {
// ???
// Reference: AND d.id IN(SELECT da.document_id AS id0 FROM document_attribute da WHERE da.archive_attribute_id = ? AND da.value = ?)
var step2 = step1.and(...??????)
}
// Finally fetch and return
return step1.fetch(record -> new CustomDocument(record.value1(), record.value2(), record.value3(), record.value4()));
}
}
Regarding your questions
1.) How can I achieve this arbitrary adding of subqueries?
You already found a solution to that question in your own answer, though I'll suggest an alternative that I personally prefer. Your approach creates N subqueries hitting your table N times.
2.) Is an INNER JOIN more performant than a subquery?
There's no general rule here. It's all just relational algebra. If the optimiser can prove two expressions produce the same thing, they can be transformed into each other. However, an INNER JOIN is not exactly the same thing as a semi join, i.e. an IN predicate (although sometimes it is, in the presence of constraints). So the two operators aren't logically equivalent.
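The cardinality difference between the two operators can be seen without a database: an inner join emits one row per matching pair, while a semi join emits a document at most once. A plain-collections sketch (Doc and Attr are hypothetical stand-ins for the two tables):

```java
import java.util.List;

public class JoinVsSemiJoin {
    record Doc(String id) {}
    record Attr(String docId, String value) {}

    // INNER JOIN: one output row per matching (doc, attr) pair.
    static long innerJoinRowCount(List<Doc> docs, List<Attr> attrs) {
        return docs.stream()
                .flatMap(d -> attrs.stream().filter(a -> a.docId().equals(d.id())))
                .count();
    }

    // Semi join (IN predicate): each doc is counted at most once,
    // no matter how many attributes match.
    static long semiJoinRowCount(List<Doc> docs, List<Attr> attrs) {
        return docs.stream()
                .filter(d -> attrs.stream().anyMatch(a -> a.docId().equals(d.id())))
                .count();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("d1"));
        List<Attr> attrs = List.of(new Attr("d1", "Bestseller"), new Attr("d1", "Paperback"));
        System.out.println(innerJoinRowCount(docs, attrs)); // 2: d1 duplicated per attribute
        System.out.println(semiJoinRowCount(docs, attrs));  // 1: d1 appears once
    }
}
```

This is why joining DOCUMENT to DOCUMENT_ATTRIBUTE directly would require a DISTINCT that the IN predicate gets for free.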
An alternative approach
Your own approach maps the Map<UUID, String> to subqueries, hitting the DOCUMENT_ATTRIBUTE table N times. I'm guessing that the PG optimiser might not be able to see through this and factor out the common parts into a single subquery (though technically, it could). So I'd rather create a single subquery of the form:
WHERE document.id IN (
SELECT a.document_id
FROM document_attribute AS a
WHERE (a.archive_attribute_id, a.value) IN (
(?, ?),
(?, ?), ...
)
)
Or, dynamically, with jOOQ:
DOCUMENT.ID.in(
select(DOCUMENT_ATTRIBUTE.DOCUMENT_ID)
.from(DOCUMENT_ATTRIBUTE)
.where(row(DOCUMENT_ATTRIBUTE.ARCHIVE_ATTRIBUTE_ID, DOCUMENT_ATTRIBUTE.VALUE).in(
attributes.entrySet().stream().collect(Rows.toRowList(
Entry::getKey,
Entry::getValue
))
))
)
Using org.jooq.Rows::toRowList collectors.
Note: I don't think you have to further correlate the IN predicate's subquery by specifying a DOCUMENT_ATTRIBUTE.DOCUMENT_ID.eq(DOCUMENT.ID) predicate. That correlation is already implied by using IN itself.
After reading another question, jOOQ - join with nested subquery (without at first seeing the solution), and playing around with generating Java code via https://www.jooq.org/translate/, it clicked. In combination with reading https://www.jooq.org/doc/latest/manual/sql-building/column-expressions/scalar-subqueries/, one can simply add the subquery as an IN() condition before executing the query. To be honest, I am not sure if this is the most performant solution. The searchDocuments method then looks like this:
private List<CustomDocument> searchDocuments(UUID documentId, String fileName, Integer fileSize, Map<UUID, String> attributes) {
// Get the transaction manager
TransactionManager transactionManager = getBean(TransactionManager.class);
// Get the initial condition
Condition condition = DSL.noCondition();
// Check for an optional document ID
if (documentId != null) {
condition = condition.and(DOCUMENT.ID.eq(documentId));
}
// Check for an optional file name
if (fileName != null) {
condition = condition.and(DOCUMENT.FILE_NAME.eq(fileName));
}
// Check for an optional file size
if (fileSize != null) {
condition = condition.and(DOCUMENT.FILE_SIZE.eq(fileSize));
}
// Check for optional document attributes
if (attributes != null && !attributes.isEmpty()) {
for (Map.Entry<UUID, String> entry : attributes.entrySet()) {
condition = condition.and(DOCUMENT.ID.in(
        select(DOCUMENT_ATTRIBUTE.DOCUMENT_ID)
                .from(DOCUMENT_ATTRIBUTE)
                .where(DOCUMENT_ATTRIBUTE.DOCUMENT_ID.eq(DOCUMENT.ID)
                        .and(DOCUMENT_ATTRIBUTE.ARCHIVE_ATTRIBUTE_ID.eq(entry.getKey())
                                .and(DOCUMENT_ATTRIBUTE.VALUE.eq(entry.getValue()))))));
}
}
// Create the query
return transactionManager.getDslContext().select(
DOCUMENT.ID,
DOCUMENT.FILE_NAME,
DOCUMENT.FILE_SIZE,
multiset(
selectDistinct(
DOCUMENT_ATTRIBUTE.DOCUMENT_ID,
DOCUMENT_ATTRIBUTE.ARCHIVE_ATTRIBUTE_ID,
DOCUMENT_ATTRIBUTE.VALUE
).from(DOCUMENT_ATTRIBUTE).where(DOCUMENT_ATTRIBUTE.DOCUMENT_ID.eq(DOCUMENT.ID))
).convertFrom(record -> record.map(record1 -> new CustomDocumentAttribute(record1.value1(), record1.value2(), record1.value3())))
).from(DOCUMENT
).where(condition
).fetch(record -> new CustomDocument(record.value1(), record.value2(), record.value3(), record.value4()));
}

Script in Groovy to update and insert Initial_Range and Final_Range

I'm new to Groovy and need some help.
I have a table called RANGE with:
ID_RANGE - Integer
Year - Integer
six_Month - Integer
Initial_Range - String
Final_Range - String
Last_Update - TimeStamp
I have to write a script to update/insert Initial_Range and Final_Range.
I will receive the new data in a map called "df_map" with the following entries:
df_map.Update_date -> String
df_map.six_Month -> Integer
df_map.Initial_range -> String (Format "AA-123456678")
df_map.Final_range -> String (Format "AA-123456678")
The script has to validate some requirements; if they are not met, it must not continue:
the date has to be a valid timestamp (yyyy-MM-dd HH:mm:ss.SSS)
there must be 1000 values between Initial_Range and Final_Range
only a future date may be updated or inserted
Once all of this checks out, it has to look up whether the row to modify exists; if it does not exist, it has to create it.
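The three validations can be sketched independently of the SQL helpers; here in Java for illustration (Groovy can use the same java.time calls almost verbatim). The helper names and the inclusive reading of "1000 values between" are assumptions:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

public class RangeValidator {
    static final DateTimeFormatter TS = DateTimeFormatter
            .ofPattern("uuuu-MM-dd HH:mm:ss.SSS")
            .withResolverStyle(ResolverStyle.STRICT);

    // Requirement 1: the date must parse as yyyy-MM-dd HH:mm:ss.SSS.
    // Requirement 3: only a future date may be inserted or updated.
    static boolean isValidFutureDate(String s, LocalDateTime now) {
        try {
            return LocalDateTime.parse(s, TS).isAfter(now);
        } catch (Exception e) {
            return false; // not a valid timestamp in the required format
        }
    }

    // Requirement 2: exactly 1000 values between the ranges.
    // Assumption: "AA-123456678" is prefix + '-' + numeric part, and
    // "1000 values between" means an inclusive span of 1000 numbers.
    static boolean spans1000Values(String initial, String fin) {
        String[] a = initial.split("-", 2);
        String[] b = fin.split("-", 2);
        if (!a[0].equals(b[0])) return false; // prefixes must match
        long lo = Long.parseLong(a[1]);
        long hi = Long.parseLong(b[1]);
        return hi - lo + 1 == 1000;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2024, 1, 1, 0, 0);
        System.out.println(isValidFutureDate("2024-06-01 10:00:00.000", now)); // true
        System.out.println(isValidFutureDate("2023-06-01", now));              // false: wrong format
        System.out.println(spans1000Values("AA-100000000", "AA-100000999"));   // true
    }
}
```

Only after both checks pass would the script call sqlQuery to see whether the row exists and then sqlUpdate with the matching INSERT or UPDATE.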
The script has helpers for two actions:
sqlQuery -
Runs a query against the database and returns an array with the data. It receives the database, the query, and a parameter map.
EXAMPLE:
query = "SELECT * FROM RANGE WHERE ID_RANGE = :RANGE"
params = [:]
params.RANGE = 1
outcome = sqlQuery(bbdd,query,params)
sqlUpdate -
Runs an insert or an update against the database and returns an array with the result. It receives the database, the query, and a parameter map.
EXAMPLE:
query = "UPDATE RANGE SET Initial_Range = :Initial_Range WHERE ID_RANGE = :RANGE"
params = [:]
params.RANGE = 1
outcome = sqlUpdate(bbdd,query,params)
I have worked with SQL but never with Groovy before.
Thanks in advance ;)

My Customer data is being truncated when added to my List [duplicate]

I am running data.bat file with the following lines:
Rem This batch file will populate tables
cd\program files\Microsoft SQL Server\MSSQL
osql -U sa -P Password -d MyBusiness -i c:\data.sql
The contents of the data.sql file is:
insert Customers
(CustomerID, CompanyName, Phone)
Values('101','Southwinds','19126602729')
There are 8 more similar lines for adding records.
When I run this with start > run > cmd > c:\data.bat, I get this error message:
1>2>3>4>5>....<1 row affected>
Msg 8152, Level 16, State 4, Server SP1001, Line 1
string or binary data would be truncated.
<1 row affected>
<1 row affected>
<1 row affected>
<1 row affected>
<1 row affected>
<1 row affected>
Also, I am a newbie obviously, but what do Level # and State # mean, and how do I look up error messages such as the one above (8152)?
From @gmmastros's answer
Whenever you see the message....
string or binary data would be truncated
Think to yourself... The field is NOT big enough to hold my data.
Check the table structure for the customers table. I think you'll find that the length of one or more fields is NOT big enough to hold the data you are trying to insert. For example, if the Phone field is a varchar(8) field, and you try to put 11 characters in to it, you will get this error.
I had this issue although data length was shorter than the field length.
It turned out that the problem was having another log table (for audit trail), filled by a trigger on the main table, where the column size also had to be changed.
In one of the INSERT statements you are attempting to insert a too long string into a string (varchar or nvarchar) column.
If it's not obvious which INSERT is the offender by a mere look at the script, you could count the <1 row affected> lines that occur before the error message. The obtained number plus one gives you the statement number. In your case it seems to be the second INSERT that produces the error.
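That counting rule can be mechanized. A small sketch that locates the failing statement in captured osql output (the parsing is deliberately naive and only illustrates the rule):

```java
public class LocateFailingInsert {
    // Counts the "<1 row affected>" lines that appear before the error
    // message; that count plus one is the number of the failing statement.
    static int failingStatementNumber(String osqlOutput) {
        String beforeError = osqlOutput.split("Msg 8152", 2)[0];
        int affected = 0;
        int idx = 0;
        while ((idx = beforeError.indexOf("<1 row affected>", idx)) != -1) {
            affected++;
            idx += 1;
        }
        return affected + 1;
    }

    public static void main(String[] args) {
        String output = "1>2>3>4>5>...<1 row affected>\n"
                + "Msg 8152, Level 16, State 4, Server SP1001, Line 1\n"
                + "string or binary data would be truncated.\n"
                + "<1 row affected>\n<1 row affected>\n";
        System.out.println(failingStatementNumber(output)); // 2
    }
}
```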
Just to contribute additional information: I had the same issue, and it was because the field wasn't big enough for the incoming data; this thread helped me solve it (the top answer clarifies it all).
BUT it is very important to know what the possible causes are.
In my case I was creating the table with a field like this:
SELECT '' AS Period, * INTO #NewTable FROM Transactions
Therefore the field "Period" had a length of zero, causing the INSERT operations to fail. I changed it to 'XXXXXX', which is the length of the incoming data, and it now works properly (because the field now has a length of 6).
I hope this helps anyone with the same issue :)
Some of your data cannot fit into your database column (it is too small). It is not easy to find what is wrong. If you use C# and Linq2Sql, you can list the fields which would be truncated:
First create helper class:
public class SqlTruncationExceptionWithDetails : ArgumentOutOfRangeException
{
public SqlTruncationExceptionWithDetails(System.Data.SqlClient.SqlException inner, DataContext context)
: base(inner.Message + " " + GetSqlTruncationExceptionWithDetailsString(context))
{
}
/// <summary>
/// Part of the code is from the following link:
/// http://stackoverflow.com/questions/3666954/string-or-binary-data-would-be-truncated-linq-exception-cant-find-which-fiel
/// </summary>
/// <param name="context"></param>
/// <returns></returns>
static string GetSqlTruncationExceptionWithDetailsString(DataContext context)
{
StringBuilder sb = new StringBuilder();
foreach (object update in context.GetChangeSet().Updates)
{
FindLongStrings(update, sb);
}
foreach (object insert in context.GetChangeSet().Inserts)
{
FindLongStrings(insert, sb);
}
return sb.ToString();
}
public static void FindLongStrings(object testObject, StringBuilder sb)
{
foreach (var propInfo in testObject.GetType().GetProperties())
{
foreach (System.Data.Linq.Mapping.ColumnAttribute attribute in propInfo.GetCustomAttributes(typeof(System.Data.Linq.Mapping.ColumnAttribute), true))
{
if (attribute.DbType.ToLower().Contains("varchar"))
{
string dbType = attribute.DbType.ToLower();
int numberStartIndex = dbType.IndexOf("varchar(") + 8;
int numberEndIndex = dbType.IndexOf(")", numberStartIndex);
string lengthString = dbType.Substring(numberStartIndex, (numberEndIndex - numberStartIndex));
int maxLength = 0;
int.TryParse(lengthString, out maxLength);
string currentValue = (string)propInfo.GetValue(testObject, null);
if (!string.IsNullOrEmpty(currentValue) && maxLength != 0 && currentValue.Length > maxLength)
{
//string is too long
sb.AppendLine(testObject.GetType().Name + "." + propInfo.Name + " " + currentValue + " Max: " + maxLength);
}
}
}
}
}
}
Then prepare the wrapper for SubmitChanges:
public static class DataContextExtensions
{
public static void SubmitChangesWithDetailException(this DataContext dataContext)
{
//http://stackoverflow.com/questions/3666954/string-or-binary-data-would-be-truncated-linq-exception-cant-find-which-fiel
try
{
//this can failed on data truncation
dataContext.SubmitChanges();
}
catch (SqlException sqlException) //when (sqlException.Message == "String or binary data would be truncated.")
{
if (sqlException.Message == "String or binary data would be truncated.") //only for EN windows - if you are running different window language, invoke the sqlException.getMessage on thread with EN culture
throw new SqlTruncationExceptionWithDetails(sqlException, dataContext);
else
throw;
}
}
}
Prepare global exception handler and log truncation details:
protected void Application_Error(object sender, EventArgs e)
{
Exception ex = Server.GetLastError();
string message = ex.Message;
//TODO - log to file
}
Finally use the code:
Datamodel.SubmitChangesWithDetailException();
Another situation in which you can get this error is the following:
I had the same error, and the reason was that in an INSERT statement that received data from a UNION, the order of the columns was different from the original table. If you change the order in #table3 to a, b, c, you will fix the error.
select a, b, c into #table1
from #table0
insert into #table1
select a, b, c from #table2
union
select a, c, b from #table3
On SQL Server you can use SET ANSI_WARNINGS OFF like this:
using (SqlConnection conn = new SqlConnection("Data Source=XRAYGOAT\\SQLEXPRESS;Initial Catalog='Healthy Care';Integrated Security=True"))
{
conn.Open();
using (var trans = conn.BeginTransaction())
{
try
{
using (var cmd = new SqlCommand("", conn, trans))
{
cmd.CommandText = "SET ANSI_WARNINGS OFF";
cmd.ExecuteNonQuery();
cmd.CommandText = "YOUR INSERT HERE";
cmd.ExecuteNonQuery();
cmd.Parameters.Clear();
cmd.CommandText = "SET ANSI_WARNINGS ON";
cmd.ExecuteNonQuery();
trans.Commit();
}
}
catch (Exception)
{
trans.Rollback();
}
}
conn.Close();
}
I had the same issue. The length of my column was too short.
What you can do is either increase the length or shorten the text you want to put in the database.
I also had this problem occur in the web application layer.
I eventually found out that the same error message came from the SQL UPDATE statement on a specific table.
I finally figured out that the column definitions in the related history table(s) did not match the original table's nvarchar column lengths in some specific cases.
I had the same problem, even after increasing the size of the problematic columns in the table.
tl;dr: The length of the matching columns in corresponding Table Types may also need to be increased.
In my case, the error was coming from the Data Export service in Microsoft Dynamics CRM, which allows CRM data to be synced to an SQL Server DB or Azure SQL DB.
After a lengthy investigation, I concluded that the Data Export service must be using Table-Valued Parameters:
You can use table-valued parameters to send multiple rows of data to a Transact-SQL statement or a routine, such as a stored procedure or function, without creating a temporary table or many parameters.
As you can see in the documentation above, Table Types are used to create the data ingestion procedure:
CREATE TYPE LocationTableType AS TABLE (...);
CREATE PROCEDURE dbo.usp_InsertProductionLocation
@TVP LocationTableType READONLY
Unfortunately, there is no way to alter a Table Type, so it has to be dropped & recreated entirely. Since my table has over 300 fields (😱), I created a query to facilitate the creation of the corresponding Table Type based on the table's columns definition (just replace [table_name] with your table's name):
SELECT 'CREATE TYPE [table_name]Type AS TABLE (' + STRING_AGG(CAST(field AS VARCHAR(max)), ',' + CHAR(10)) + ');' AS create_type
FROM (
SELECT TOP 5000 COLUMN_NAME + ' ' + DATA_TYPE
+ IIF(CHARACTER_MAXIMUM_LENGTH IS NULL, '', CONCAT('(', IIF(CHARACTER_MAXIMUM_LENGTH = -1, 'max', CONCAT(CHARACTER_MAXIMUM_LENGTH,'')), ')'))
+ IIF(DATA_TYPE = 'decimal', CONCAT('(', NUMERIC_PRECISION, ',', NUMERIC_SCALE, ')'), '')
AS field
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = '[table_name]'
ORDER BY ORDINAL_POSITION) AS T;
After updating the Table Type, the Data Export service started functioning properly once again! :)
When I tried to execute my stored procedure, I had the same problem because the column I needed to insert data into was shorter than the data itself.
You can increase the size of the column's data type or reduce the length of your data.
A 2016/2017 update will show you the bad value and column.
A new trace flag swaps the old error for a new error 2628 that prints out the column and the offending value. Trace flag 460 is available in the latest cumulative update for 2016 and 2017:
https://support.microsoft.com/en-sg/help/4468101/optional-replacement-for-string-or-binary-data-would-be-truncated
Just make sure that after you've installed the CU you enable the trace flag, either globally/permanently on the server:
...or with DBCC TRACEON:
https://learn.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-traceon-trace-flags-transact-sql?view=sql-server-ver15
Another situation in which this error may occur is SQL Server Management Studio. If you have "text" or "ntext" fields in your table, then no matter what kind of field you are updating (for example bit or integer), the Studio seemingly does not load entire "ntext" fields and also updates ALL fields instead of just the modified one.
To solve the problem, exclude "text" or "ntext" fields from the query in Management Studio.
This error occurs only when one of your values is longer than the field length specified in the SQL Server table structure.
To overcome the issue, either reduce the length of the value or increase the length of the database table field.
If someone is encountering this error in a C# application, I have created a simple way of finding the offending fields by:
Getting the column widths of all the columns of the table we're trying to insert into/update. (I'm getting this info directly from the database.)
Comparing the column widths to the widths of the values we're trying to insert/update.
Assumptions/ Limitations:
The column names of the table in the database match the C# entity fields. For example, if you have a column like this in the database:
You need to have your Entity with the same column name:
public class SomeTable
{
// Other fields
public string SourceData { get; set; }
}
You're inserting/ updating 1 entity at a time. It'll be clearer in the demo code below. (If you're doing bulk inserts/ updates, you might want to either modify it or use some other solution.)
Step 1:
Get the column width of all the columns directly from the database:
// For this, I took help from Microsoft docs website:
// https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.getschema?view=netframework-4.7.2#System_Data_SqlClient_SqlConnection_GetSchema_System_String_System_String___
private static Dictionary<string, int> GetColumnSizesOfTableFromDatabase(string tableName, string connectionString)
{
var columnSizes = new Dictionary<string, int>();
using (var connection = new SqlConnection(connectionString))
{
// Connect to the database then retrieve the schema information.
connection.Open();
// You can specify the Catalog, Schema, Table Name, Column Name to get the specified column(s).
// You can use four restrictions for Column, so you should create a 4 members array.
String[] columnRestrictions = new String[4];
// For the array, 0-member represents Catalog; 1-member represents Schema;
// 2-member represents Table Name; 3-member represents Column Name.
// Now we specify the Table_Name and Column_Name of the columns what we want to get schema information.
columnRestrictions[2] = tableName;
DataTable allColumnsSchemaTable = connection.GetSchema("Columns", columnRestrictions);
foreach (DataRow row in allColumnsSchemaTable.Rows)
{
var columnName = row.Field<string>("COLUMN_NAME");
//var dataType = row.Field<string>("DATA_TYPE");
var characterMaxLength = row.Field<int?>("CHARACTER_MAXIMUM_LENGTH");
// I'm only capturing columns whose Datatype is "varchar" or "char", i.e. their CHARACTER_MAXIMUM_LENGTH won't be null.
if(characterMaxLength != null)
{
columnSizes.Add(columnName, characterMaxLength.Value);
}
}
connection.Close();
}
return columnSizes;
}
Step 2:
Compare the column widths with the width of the values we're trying to insert/ update:
public static Dictionary<string, string> FindLongBinaryOrStringFields<T>(T entity, string connectionString)
{
var tableName = typeof(T).Name;
Dictionary<string, string> longFields = new Dictionary<string, string>();
var objectProperties = GetProperties(entity);
//var fieldNames = objectProperties.Select(p => p.Name).ToList();
var actualDatabaseColumnSizes = GetColumnSizesOfTableFromDatabase(tableName, connectionString);
foreach (var dbColumn in actualDatabaseColumnSizes)
{
var maxLengthOfThisColumn = dbColumn.Value;
var currentValueOfThisField = objectProperties.Where(f => f.Name == dbColumn.Key).First()?.GetValue(entity, null)?.ToString();
if (!string.IsNullOrEmpty(currentValueOfThisField) && currentValueOfThisField.Length > maxLengthOfThisColumn)
{
longFields.Add(dbColumn.Key, $"'{dbColumn.Key}' column cannot take the value of '{currentValueOfThisField}' because the max length it can take is {maxLengthOfThisColumn}.");
}
}
return longFields;
}
public static List<PropertyInfo> GetProperties<T>(T entity)
{
//The DeclaredOnly flag makes sure you only get properties of the object, not from the classes it derives from.
var properties = entity.GetType()
.GetProperties(System.Reflection.BindingFlags.Public
| System.Reflection.BindingFlags.Instance
| System.Reflection.BindingFlags.DeclaredOnly)
.ToList();
return properties;
}
Demo:
Let's say we're trying to insert someTableEntity of SomeTable class that is modeled in our app like so:
public class SomeTable
{
[Key]
public long TicketID { get; set; }
public string SourceData { get; set; }
}
And it's inside our SomeDbContext like so:
public class SomeDbContext : DbContext
{
public DbSet<SomeTable> SomeTables { get; set; }
}
This table in Db has SourceData field as varchar(16) like so:
Now we'll try to insert value that is longer than 16 characters into this field and capture this information:
public void SaveSomeTableEntity()
{
var connectionString = "server=SERVER_NAME;database=DB_NAME;User ID=SOME_ID;Password=SOME_PASSWORD;Connection Timeout=200";
using (var context = new SomeDbContext(connectionString))
{
var someTableEntity = new SomeTable()
{
SourceData = "Blah-Blah-Blah-Blah-Blah-Blah"
};
context.SomeTables.Add(someTableEntity);
try
{
context.SaveChanges();
}
catch (Exception ex)
{
if (ex.GetBaseException().Message == "String or binary data would be truncated.\r\nThe statement has been terminated.")
{
var badFieldsReport = "";
List<string> badFields = new List<string>();
// YOU GOT YOUR FIELDS RIGHT HERE:
var longFields = FindLongBinaryOrStringFields(someTableEntity, connectionString);
foreach (var longField in longFields)
{
badFields.Add(longField.Key);
badFieldsReport += longField.Value + "\n";
}
}
else
throw;
}
}
}
The badFieldsReport will have this value:
'SourceData' column cannot take the value of
'Blah-Blah-Blah-Blah-Blah-Blah' because the max length it can take is
16.
Kevin Pope's comment under the accepted answer was what I needed.
The problem, in my case, was that I had triggers defined on my table that would insert update/insert transactions into an audit table, but the audit table had a data type mismatch: a column that was VARCHAR(MAX) in the original table was stored as VARCHAR(1) in the audit table. So my triggers failed whenever I inserted anything longer than VARCHAR(1) into the original table column, and I would get this error message.
I used a different tactic for fields that are allocated 8K in some places but where only about 50-100 characters are actually used.
declare #NVPN_list as table
nvpn varchar(50)
,nvpn_revision varchar(5)
,nvpn_iteration INT
,mpn_lifecycle varchar(30)
,mfr varchar(100)
,mpn varchar(50)
,mpn_revision varchar(5)
,mpn_iteration INT
-- ...
)
INSERT INTO @NVPN_list
SELECT left(nvpn ,50) as nvpn
,left(nvpn_revision ,5) as nvpn_revision
,nvpn_iteration
,left(mpn_lifecycle ,30)
,left(mfr ,100)
,left(mpn ,50)
,left(mpn_revision ,5)
,mpn_iteration
,left(mfr_order_num ,50)
FROM [DASHBOARD].[dbo].[mpnAttributes] (NOLOCK) mpna
I wanted speed, since I have 1M total records, and load 28K of them.
This error can occur when the target field's size is smaller than the data you are inserting.
For example, if the data type is nvarchar(7) and your value is 'aaaaddddf', then the error shown is:
string or binary data would be truncated
You simply can't beat SQL Server on this.
You can insert into a new table like this:
select foo, bar
into tmp_new_table_to_dispose_later
from my_table
and compare the table definition with the real table you want to insert the data into.
Sometimes it's helpful, sometimes it's not.
If you then insert into the final/real table from that temporary table, it may just work (because data conversion can behave differently there than in SSMS, for example).
Another alternative is to insert the data in chunks: instead of inserting everything at once, insert the TOP 1000 rows and repeat until you find the chunk that errors. At least you get better visibility into what doesn't fit into the table.

Querying a local secondary index of dynamoDB using low level java API

I am trying to query a local secondary index with the hash key.
My table: Table1
Username(hash_key), Id(range_key), timestamp....
My local secondary index (created because I want to query on username and have the results ordered by time):
My LSI: Table1_TimestampLSI
Username(hash_key), timestamp(range_key)
e.g
user1 | 123 | 12/12/2015
user1 | 456 | 11/01/2015
user2 | 789 | 12/01/2015
NOTE: I could not just make the timestamp the range key in the table itself because (Username+Timestamp) is not unique. So I had to create a ID field to ensure uniqueness.
Since I want the async client, I am using the low-level Java API (the model API, not the document API).
Function to query index
HashMap<String, Condition> queryFilter = new HashMap<String, Condition>();
Condition condition = new Condition()
.withComparisonOperator(ComparisonOperator.EQ.toString())
.withAttributeValueList(new AttributeValue().withS(username));
queryFilter.put("Username", condition);
QueryRequest queryRequest = new QueryRequest(tableName + "_TimestampLSI").withQueryFilter(queryFilter);
queryRequest.setScanIndexForward(false);
Future<QueryResult> fQueryResult = dynamoDB.queryAsync(queryRequest,
new AsyncHandler<QueryRequest,QueryResult>() {
public void onSuccess(QueryRequest request, QueryResult result) {
System.out.println("Table: " + result);
}
public void onError(Exception exception) {
System.out.println("Error describing table: " + exception.getMessage());
// Callers can also test if exception is an instance of
// AmazonServiceException or AmazonClientException and cast
// it to get additional information
}
});
System.out.println("Result: " + fQueryResult);
I get the following error
Error describing table: Either the KeyConditions or KeyConditionExpression parameter must be specified in the request. (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException;
Am I missing something? Thought I should be able to query an index just on the hash value.
You've misunderstood local secondary indices.
A Query always takes a hash key, so your QueryRequest needs the hash key supplied as a key condition (KeyConditions), not as a QueryFilter.
You can't query across hash keys, neither with a hash+range primary key nor with an LSI. You can do that with a GSI, but I'm not sure how that would help you here.
It is also unclear what exactly you are trying to achieve. If you want all usernames, then you need a Scan and not a Query, since you want all hash keys to be present.
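Concretely, a sketch of a corrected request (assuming the AWS SDK v1 low-level model classes used in the question): the hash key moves from the QueryFilter into KeyConditions, and the index is addressed via withIndexName on the base table rather than by passing the index name as the table name:

```java
import java.util.HashMap;
import java.util.Map;

import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ComparisonOperator;
import com.amazonaws.services.dynamodbv2.model.Condition;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;

public class LsiQuery {
    // Builds a query against the LSI: the hash key goes in KeyConditions,
    // and the index is selected with withIndexName -- the table name stays the base table.
    static QueryRequest buildRequest(String tableName, String username) {
        Map<String, Condition> keyConditions = new HashMap<>();
        keyConditions.put("Username", new Condition()
                .withComparisonOperator(ComparisonOperator.EQ)
                .withAttributeValueList(new AttributeValue().withS(username)));

        return new QueryRequest()
                .withTableName(tableName)                    // base table, not the index
                .withIndexName(tableName + "_TimestampLSI")  // LSI selected here
                .withKeyConditions(keyConditions)
                .withScanIndexForward(false);                // newest timestamps first
    }
}
```

The resulting request can be passed to `dynamoDB.queryAsync(...)` exactly as in the question's code.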

Ormlite: foreignAutoCreate insert if not exists

I noticed that foreignAutoCreate crashes when the related data already exists, throwing something like this:
E/SQLiteLog﹕ (2067) abort at 20 in [INSERT INTO `Group` (... etc,`id` ) VALUES (?,?,?)]:
UNIQUE constraint failed: Group.id
But I have a list, for example:
List<User> lstUsers = //values
I'm inserting the values in a for loop with createOrUpdate:
for(...) {
dao.createOrUpdate(user);
}
and User has a foreign Group field, for example:
@DatabaseField(canBeNull = true, foreign = true, foreignAutoCreate = true,
foreignAutoRefresh = true)
private Group group;
When I have a repeated Group id value, the operation fails:
lstUsers.get(0).getGroup().getId(); // group id = 1 <-- foreign insert
lstUsers.get(1).getGroup().getId(); // group id = 1 <-- crash
lstUsers.get(2).getGroup().getId(); // group id = 1 <-- crashed already
lstUsers.get(3).getGroup().getId(); // group id = 2 <-- crashed already
... etc.
I need foreignAutoCreate to insert each group only once (skipping groups that already exist) automatically, not manually:
lstUsers.get(0).getGroup().getId(); // group id = 1 <-- foreign insert
lstUsers.get(1).getGroup().getId(); // group id = 1 <-- foreign exists, skip
lstUsers.get(2).getGroup().getId(); // group id = 1 <-- foreign exists, skip
lstUsers.get(3).getGroup().getId(); // group id = 2 <-- foreign insert
Is there a way to do this?
UPDATE 1:
Please try this test:
public void poblatingUsersAndGroupsList(){
List<User> lstUsers = new ArrayList<>();
Group group1 = new Group();
// this group doesn't exist in database
group1.setId(1); // should be inserted by ForeignAutoCreate
lstUsers.add(new User("user1",group1));
lstUsers.add(new User("user2",group1));
lstUsers.add(new User("user3",group1));
Group group2 = new Group();
// this group doesn't exist in database
group2.setId(2); // should be inserted by ForeignAutoCreate
lstUsers.add(new User("user4",group1));
lstUsers.add(new User("user5",group2));
lstUsers.add(new User("user6",group2));
createUsers(lstUsers);
}
public void createUsers(List<User> lstUsers){
for(User user : lstUsers){
// here is the error
// group1 inserted the 1st time
// the 2nd, 3rd, ..., nth times throw the error
// same for group2
dao.createOrUpdate(user);
}
}
foreignAutoCreate should work like the following code, so we could avoid this block:
public void createUsers(List<User> lstUsers){
for(User user : lstUsers){
// (unnecessary) calling or instantiating the groupDao
// (unnecessary) checking whether it exists
groupDao.createIfNotExists(user.getGroup());
dao.createOrUpdate(user);
}
}
This is an old question and I assume you moved on. I was not able to reproduce this however. I've expanded the test cases with multiple inserts using dao.createOrUpdate(...). See the ForeignObjectTest unit test code.
One thing that I wonder about is that when you are creating a User with an associated Group, the Group must have been created already so that it has an id. Is that possibly the problem?
Group group = new Group();
// need to do this first to have group get an id
groupDao.create(group);
User user = new User();
user.setGroup(group);
for(...) {
dao.createOrUpdate(user);
}
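Until foreignAutoCreate handles duplicates, one workaround is to collect each distinct Group once up front and only then save the users. A minimal stdlib sketch of the deduplication step (the Group and User classes here are simplified stand-ins for the question's entities; the real code would call groupDao.createIfNotExists on each result before the createOrUpdate loop):

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the question's entities (no ORMLite annotations here).
class Group {
    final int id;
    Group(int id) { this.id = id; }
}

class User {
    final String name;
    final Group group;
    User(String name, Group group) { this.name = name; this.group = group; }
}

public class GroupDedup {
    // Collect each distinct group id once, preserving insertion order, so the
    // real code can call groupDao.createIfNotExists(...) exactly once per group.
    static Collection<Group> distinctGroups(List<User> users) {
        Map<Integer, Group> seen = new LinkedHashMap<>();
        for (User u : users) {
            seen.putIfAbsent(u.group.id, u.group);
        }
        return seen.values();
    }

    public static void main(String[] args) {
        Group g1 = new Group(1), g2 = new Group(2);
        List<User> users = List.of(
                new User("user1", g1), new User("user2", g1),
                new User("user3", g1), new User("user4", g2));
        System.out.println(distinctGroups(users).size()); // prints 2
    }
}
```

This keeps the per-user loop untouched and moves the "insert only once" responsibility into a single pass over the list.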
