I'm using the Room database library in Android Studio for some complicated queries.
My original SQLite query looks like this:
SELECT DISTINCT mainCol as main, secondCol as sum FROM ADB WHERE
(mainCol LIKE 'A1%' OR mainCol LIKE 'A2%' OR mainCol LIKE 'A3%') AND
(secondCol LIKE 'B1%' OR secondCol LIKE 'B2%' OR secondCol LIKE 'B3%' )
ORDER BY dbtime DESC
The Room DAO suggests using List<String>, so I did the following.
In the DAO:
@Query
("SELECT DISTINCT mainCol as main, secondCol as sum FROM ADB WHERE" +
" mainCol LIKE :rOne AND" +
" secondCol LIKE :rTwo" +
" ORDER BY dbtime DESC")
public Map<String,String> getCompare(List<String> rOne, List<String> rTwo);
but it produced an error like this:
SELECT DISTINCT mainCol as main, secondCol as sum FROM ADB WHERE mainCol LIKE ?,?,? AND secondCol LIKE ?,?,? ORDER BY dbtime DESC
which is of course not a valid statement.
Any help fixing my problem would be appreciated. Thank you.
I've tried using my original query through Room, but got no result.
I've tried a single value for each parameter, which gives a correct result, but only one result.
I want to query multiple values combined with OR, and I'm struggling here.
If you use a List then Room simply expands it into a comma-separated list of ?'s for the values to be bound by SQLite, hence the ?,?,? (good for use in an IN () clause). In your case you want the individual values placed throughout the SQL. So something like:-
#Query("SELECT DISTINCT mainCol as main, secondCol as sum FROM ADB WHERE
(mainCol LIKE :rOneA OR mainCol LIKE :rOneB OR mainCol LIKE :rOneC) AND
(secondCol LIKE :rTwoA OR secondCol LIKE :rTwoB OR secondCol LIKE :rTwoC )
ORDER BY dbtime DESC")
public Map<String,String> getCompare(String rOneA, String rOneB, String rOneC, String rTwoA, String rTwoB, String rTwoC);
Note the above is in-principle code; it has not been compiled or run and thus may contain some typing errors.
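Calling it would then look something like the line below (hypothetical values; note that the % wildcard has to be part of each bound argument, since LIKE :arg binds the whole pattern):
Map<String,String> result = dao.getCompare("A1%", "A2%", "A3%", "B1%", "B2%", "B3%");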
Additional
Just curious: if I want to query more values, say 10 rOne values compared against 10 rTwo values, is there any way to make the Room query statement dynamic?
Yes, there are three potential ways:
Using recursive CTEs (Common Table Expressions, which are basically temporary tables that exist just for the duration of the query). Not exactly a light-hearted matter, and the resultant SQL can get pretty long, especially as you would probably be splitting CSVs.
To a limited extent you can utilise CASE WHEN ... THEN ... END in the SQL (probably not suited to list handling, but with adaptation to utilise IN(the_list) there would be potential).
Using an @RawQuery, where you build the SQL yourself. However, the query cannot be, and therefore is not, checked at compile time; errors in the SQL will result in runtime errors.
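As an aside: if exact matches would do instead of LIKE patterns, the List binding from the question works as-is, because the expanded ?,?,? is exactly what IN expects. A minimal, uncompiled sketch (hypothetical method name, mirroring the signature used above):
@Query("SELECT DISTINCT mainCol as main, secondCol as sum FROM ADB WHERE mainCol IN (:rOne) AND secondCol IN (:rTwo) ORDER BY dbtime DESC")
public Map<String,String> getCompareExact(List<String> rOne, List<String> rTwo);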
Here's an example/demo of using @RawQuery
First the ADB class (my version with 4 columns to be searched dynamically):-
@Entity
class ADB {
@PrimaryKey
Long id=null;
String mainCol;
String secondCol;
String thirdCol;
String fourthCol;
ADB(){}
@Ignore
ADB(String mainCol, String secondCol, String thirdCol, String fourthCol){
this.mainCol = mainCol;
this.secondCol = secondCol;
this.thirdCol = thirdCol;
this.fourthCol = fourthCol;
}
}
TheDatabase i.e. the #Database annotated class
@Database(entities = {ADB.class}, exportSchema = false, version = 1)
abstract class TheDatabase extends RoomDatabase {
abstract TheDAOs getTheDAOs();
private static volatile TheDatabase instance;
public static TheDatabase getInstance(Context context) {
if (instance==null) {
instance = Room.databaseBuilder(context,TheDatabase.class,"the_database.db")
.allowMainThreadQueries()
.build();
}
return instance;
}
}
Note .allowMainThreadQueries used for brevity and convenience
Nothing special here.
TheDAOs (includes all the dynamic stuff, WARNING pretty long winded):-
@Dao
abstract class TheDAOs {
@Insert(onConflict = OnConflictStrategy.IGNORE)
abstract long insert(ADB adb);
@RawQuery
abstract List<ADB> rawQueryForADB(SimpleSQLiteQuery simpleSQLiteQuery);
List<ADB> dynamicQueryOfADB(String[] mainCol, String[] secondCol, String[] thirdCol, String[] fourthCol) {
boolean whereKeywordUsed = false;
StringBuilder sql = new StringBuilder().append("SELECT * FROM adb ");
if (mainCol != null && mainCol.length > 0) {
if (!whereKeywordUsed) sql.append("WHERE ");
sql.append(buildColumnWhereClause("maincol",mainCol,whereKeywordUsed));
whereKeywordUsed = true;
}
if (secondCol != null && secondCol.length > 0) {
if (!whereKeywordUsed) sql.append("WHERE ");
sql.append(buildColumnWhereClause("secondcol",secondCol,whereKeywordUsed));
whereKeywordUsed = true;
}
if (thirdCol != null && thirdCol.length > 0) {
if (!whereKeywordUsed) sql.append("WHERE ");
sql.append(buildColumnWhereClause("thirdcol",thirdCol,whereKeywordUsed));
whereKeywordUsed = true;
}
if (fourthCol != null && fourthCol.length > 0) {
if (!whereKeywordUsed) sql.append("WHERE ");
sql.append(buildColumnWhereClause("fourthcol",fourthCol,whereKeywordUsed));
whereKeywordUsed = true;
}
Log.d("SQLGEN","Generated SQL is\n\t" + sql.toString());
return rawQueryForADB(new SimpleSQLiteQuery(sql.toString()));
}
private String buildColumnWhereClause(String columnName, String[] columnValues, boolean whereKeywordUsed) {
StringBuilder colSQL = new StringBuilder();
if (whereKeywordUsed) colSQL.append(" AND ");
boolean afterFirst = false;
for (String s: columnValues) {
if (afterFirst) colSQL.append(" OR ");
colSQL.append(columnName).append(" LIKE '").append(s).append("' ");
afterFirst = true;
}
return colSQL.toString();
}
}
First, note the use of an abstract class as opposed to an interface to allow methods with bodies.
The method rawQueryForADB is a skeleton of sorts and is called indirectly, hence the use of an abstract class (otherwise an instance of TheDAOs would have to be passed to the method).
The method dynamicQueryOfADB actually builds the SQL, with the help of the buildColumnWhereClause method.
It needs to be passed a String[] (or null) for each of the 4 columns.
In this case it returns a List<ADB> (it was unclear whether dynamic was meant to cover extra columns). Of course it would be simple enough to get the 2 strings from this (so the List gives more flexibility).
The generated SQL is written to the Log.
It should be noted that the above has only undergone minimal testing. It's intended as in-principle code. As such it may need refinement.
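One refinement worth considering: buildColumnWhereClause concatenates the values directly into the SQL, so a value containing a single quote would break the statement (and opens the door to SQL injection). SimpleSQLiteQuery also accepts bind arguments, so a safer variant (an untested sketch with the same structure, trimmed to two columns for brevity) could use ? placeholders; it also parenthesises each OR group, which the RUN6 output below shows is needed for correct AND/OR precedence:
List<ADB> dynamicQueryOfADBBound(String[] mainCol, String[] secondCol) {
    StringBuilder sql = new StringBuilder().append("SELECT * FROM adb ");
    List<String> bindArgs = new ArrayList<>();
    boolean whereKeywordUsed = false;
    if (mainCol != null && mainCol.length > 0) {
        sql.append(whereKeywordUsed ? " AND " : "WHERE ");
        appendLikeGroup(sql, "maincol", mainCol, bindArgs);
        whereKeywordUsed = true;
    }
    if (secondCol != null && secondCol.length > 0) {
        sql.append(whereKeywordUsed ? " AND " : "WHERE ");
        appendLikeGroup(sql, "secondcol", secondCol, bindArgs);
        whereKeywordUsed = true;
    }
    // values are passed separately and bound by SQLite rather than embedded in the SQL
    return rawQueryForADB(new SimpleSQLiteQuery(sql.toString(), bindArgs.toArray()));
}
private void appendLikeGroup(StringBuilder sql, String columnName, String[] values, List<String> bindArgs) {
    sql.append("(");
    for (int i = 0; i < values.length; i++) {
        if (i > 0) sql.append(" OR ");
        sql.append(columnName).append(" LIKE ?");
        bindArgs.add(values[i]);
    }
    sql.append(") ");
}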
MainActivity, i.e. code to actually exercise the above by adding some data and then extracting data in various permutations to demonstrate the dynamicity:-
public class MainActivity extends AppCompatActivity {
TheDatabase dbi;
TheDAOs dao;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
dbi = TheDatabase.getInstance(this);
dao = dbi.getTheDAOs();
dao .insert(new ADB("A1aaaa","B2bbbb","C3cccc","D4dddd"));
dao .insert(new ADB("A2aaaa","B3bbbb","C4cccc","D5dddd"));
dao .insert(new ADB("A3aaaa","B4bbbb","C5cccc","D6dddd"));
dao .insert(new ADB("A4aaaa","B5bbbb","C6cccc","D7dddd"));
String[] mc1 = new String[]{"a1%","a2%"};
logQueryResults(dao.dynamicQueryOfADB(mc1,null,null,new String[]{}),"RUN1");
String[] sc1 = new String[]{"b4%","b5%"};
logQueryResults(dao.dynamicQueryOfADB(null,sc1,null,new String[]{}),"RUN2");
String[] tc1 = new String[]{"c3%","c6%"};
logQueryResults(dao.dynamicQueryOfADB(null,null,tc1,null),"RUN3");
String[] fc1 = new String[]{"D1%","d2%","D3%","D4%"};
logQueryResults(dao.dynamicQueryOfADB(null,null,null,fc1),"RUN4");
logQueryResults(dao.dynamicQueryOfADB(null,null,null,null),"RUN5");
logQueryResults(dao.dynamicQueryOfADB(mc1,sc1,tc1,fc1),"RUN6");
}
private void logQueryResults(List<ADB> result, String tagSuffix) {
for (ADB a: result) {
Log.d("DBINFO_" + tagSuffix,"MC is " + a.mainCol + " SC is " + a.secondCol + " TC is " + a.thirdCol + " FC is " + a.fourthCol );
}
}
}
When run (a first time only, as the code is just intended to be run once), the log contains the following (generated SQL followed by the result):-
2023-01-14 07:42:47.786 D/SQLGEN: Generated SQL is
SELECT * FROM adb WHERE maincol LIKE 'a1%' OR maincol LIKE 'a2%'
2023-01-14 07:42:47.791 D/DBINFO_RUN1: MC is A1aaaa SC is B2bbbb TC is C3cccc FC is D4dddd
2023-01-14 07:42:47.792 D/DBINFO_RUN1: MC is A2aaaa SC is B3bbbb TC is C4cccc FC is D5dddd
2023-01-14 07:42:47.792 D/SQLGEN: Generated SQL is
SELECT * FROM adb WHERE secondcol LIKE 'b4%' OR secondcol LIKE 'b5%'
2023-01-14 07:42:47.793 D/DBINFO_RUN2: MC is A3aaaa SC is B4bbbb TC is C5cccc FC is D6dddd
2023-01-14 07:42:47.793 D/DBINFO_RUN2: MC is A4aaaa SC is B5bbbb TC is C6cccc FC is D7dddd
2023-01-14 07:42:47.793 D/SQLGEN: Generated SQL is
SELECT * FROM adb WHERE thirdcol LIKE 'c3%' OR thirdcol LIKE 'c6%'
2023-01-14 07:42:47.794 D/DBINFO_RUN3: MC is A1aaaa SC is B2bbbb TC is C3cccc FC is D4dddd
2023-01-14 07:42:47.794 D/DBINFO_RUN3: MC is A4aaaa SC is B5bbbb TC is C6cccc FC is D7dddd
2023-01-14 07:42:47.794 D/SQLGEN: Generated SQL is
SELECT * FROM adb WHERE fourthcol LIKE 'D1%' OR fourthcol LIKE 'd2%' OR fourthcol LIKE 'D3%' OR fourthcol LIKE 'D4%'
2023-01-14 07:42:47.797 D/DBINFO_RUN4: MC is A1aaaa SC is B2bbbb TC is C3cccc FC is D4dddd
2023-01-14 07:42:47.797 D/SQLGEN: Generated SQL is
SELECT * FROM adb
2023-01-14 07:42:47.799 D/DBINFO_RUN5: MC is A1aaaa SC is B2bbbb TC is C3cccc FC is D4dddd
2023-01-14 07:42:47.799 D/DBINFO_RUN5: MC is A2aaaa SC is B3bbbb TC is C4cccc FC is D5dddd
2023-01-14 07:42:47.799 D/DBINFO_RUN5: MC is A3aaaa SC is B4bbbb TC is C5cccc FC is D6dddd
2023-01-14 07:42:47.799 D/DBINFO_RUN5: MC is A4aaaa SC is B5bbbb TC is C6cccc FC is D7dddd
2023-01-14 07:42:47.799 D/SQLGEN: Generated SQL is
SELECT * FROM adb WHERE maincol LIKE 'a1%' OR maincol LIKE 'a2%' AND secondcol LIKE 'b4%' OR secondcol LIKE 'b5%' AND thirdcol LIKE 'c3%' OR thirdcol LIKE 'c6%' AND fourthcol LIKE 'D1%' OR fourthcol LIKE 'd2%' OR fourthcol LIKE 'D3%' OR fourthcol LIKE 'D4%'
2023-01-14 07:42:47.801 D/DBINFO_RUN6: MC is A1aaaa SC is B2bbbb TC is C3cccc FC is D4dddd
Related
I want to collect DB connections in a list from various sources. Some sources may or may not provide the URL.
I want to keep the records which have the complete info (i.e. dbName, connUrl, isActiveYN), plus any DB that doesn't have a complete record with any of the sources.
Record.java,
String dbName;
String connUrl;
String isActiveYN;
MainApp.java,
List<Record> connRecords = new ArrayList<>();
connRecords = someFunc(sourceName); // someFunc takes a source as param and returns a connection List
connRecords.addAll(someFunc(sourceName));
connRecords.addAll(someFunc(sourceName));
//logic for getting all unique DBs
Eg:
List 1 from Source Alpha: [{A,Y},{B,Y},{C, Y},{D,Y}]
List 2 from Source Beta: [{A, abc-url, Y}, {B, xyz-url, Y}]
List 3 from Source Gamma: [{A,Y},{C,Y}]
After merging all the lists we get:
I/P List: [{A,Y}, {B,Y}, {C, Y}, {D,Y}, {A, abc-url, Y}, {B, xyz-url, Y}, {A,Y}, {C,Y}]
For dbName A, source Beta has the complete record info, so we'll take that.
For dbName B, source Beta has the complete record info, so we'll take that.
For dbName C, even though none of the sources have the complete info, we'll still keep it, since it's an available DB and some source might provide the connection URL at some point in the future.
For dbName D, likewise none of the sources have the complete info, but we'll still keep it for the same reason.
Required O/P List: [{A,abc-url,Y}, {B,xyz-url,Y}, {C,Y},{D,Y}]
Can anyone suggest a Java 8 way (using streams) of doing this?
The old-school way of doing this:
Map<String, Record> map = new HashMap<>();
for (Record record: recordList) {
String dbName = record.getDBName();
String isActiveYN = record.getIsActiveYN();
if (map.containsKey(dbName)) {
if(record.getConnURL() != null)
map.replace(dbName, record);
} else {
map.put(dbName, record);
}
}
List<Record> merged = new ArrayList<>(map.values());
It can be implemented with a couple of merge functions:
merge two Record instances into one
merge two String instances into one
override the toString method to provide the desired output
fix the setting of isActiveYN: if a merge is needed, set it to Y
// using Lombok annotations for brevity
@Data
@AllArgsConstructor
@EqualsAndHashCode
class Record {
String dbName;
String connUrl;
String isActiveYN;
Record merge(Record other) {
if (this.dbName.equals(other.dbName)) {
this.connUrl = merge(this.connUrl, other.connUrl);
this.isActiveYN = "Y"; // if merge called, there are at least two entries
}
return this;
}
static String merge(String thiss, String thats) {
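// XOR is true when exactly one operand is null: return the non-null one; otherwise keep thiss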
return thiss == null ^ thats == null ? (thiss == null ? thats : thiss) : thiss;
}
@Override
public String toString() {
StringBuilder sb = new StringBuilder();
String sep = "";
if (null != dbName) {
sb.append(dbName);
sep = ", ";
}
if (null != connUrl) {
sb.append(sep).append(connUrl);
sep = ", ";
}
if (null != isActiveYN) {
sb.append(sep).append(isActiveYN);
}
sb.insert(0, '{');
sb.append('}');
return sb.toString();
}
}
Then the records may be merged simply by using the Collectors.toMap collector, passing a reference to Record::merge as the third parameter:
// Source Alpha
List<Record> connRecords = new ArrayList<>(Arrays.asList(
new Record("A", null, "N"),
new Record("B", null, "Y"),
new Record("C", null, null),
new Record("D", null, "N")
));
// Source Beta
connRecords.addAll(Arrays.asList(
new Record("A", "abc-url", null),
new Record("B", "xyz-url", "N")
));
// Source Gamma
connRecords.addAll(Arrays.asList(
new Record("A", null, "N"),
new Record("C", null, "N")
));
List<Record> merged = new ArrayList<>(
connRecords.stream()
.collect(Collectors.toMap(Record::getDbName, rec -> rec, Record::merge))
.values()
);
System.out.println(merged);
Output:
[{A, abc-url, Y}, {B, xyz-url, Y}, {C, Y}, {D, N}]
I have a custom type defined in my database as
CREATE TYPE address AS (ip inet, port int);
And a table that uses this type in an array:
CREATE TABLE my_table (
addresses address[] NULL
)
I have a sample CSV file with the following contents
{(10.10.10.1,80),(10.10.10.2,443)}
{(10.10.10.3,8080),(10.10.10.4,4040)}
And I use the following code snippet to perform my COPY:
Class.forName("org.postgresql.Driver");
String input = loadCsvFromFile();
Reader reader = new StringReader(input);
Connection connection = DriverManager.getConnection(
"jdbc:postgresql://db_host:5432/db_name", "user",
"password");
CopyManager copyManager = connection.unwrap(PGConnection.class).getCopyAPI();
String copyCommand = "COPY my_table (addresses) " +
"FROM STDIN WITH (" +
"DELIMITER '\t', " +
"FORMAT csv, " +
"NULL '\\N', " +
"ESCAPE '\"', " +
"QUOTE '\"')";
copyManager.copyIn(copyCommand, reader);
Executing this program produces the following exception:
Exception in thread "main" org.postgresql.util.PSQLException: ERROR: malformed record literal: "(10.10.10.1"
Detail: Unexpected end of input.
Where: COPY only_address, line 1, column addresses: "{(10.10.10.1,80),(10.10.10.2,443)}"
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2422)
at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1114)
at org.postgresql.core.v3.QueryExecutorImpl.endCopy(QueryExecutorImpl.java:963)
at org.postgresql.core.v3.CopyInImpl.endCopy(CopyInImpl.java:43)
at org.postgresql.copy.CopyManager.copyIn(CopyManager.java:185)
at org.postgresql.copy.CopyManager.copyIn(CopyManager.java:160)
I have tried with different combinations of the parentheses in the input but cannot seem to get the COPY working. Any ideas where I might be going wrong?
See https://git.mikael.io/mikaelhg/pg-object-csv-copy-poc/ for a project with a JUnit test that does what you want.
Basically, you want to be able to use commas for two things: to separate array items and to separate type fields, but you DON'T want the CSV parsing to interpret the commas as field delimiters.
So
you want to tell the CSV parser to consider the whole row to be one string, one field, which you can do by enclosing it in single quotes and telling the CSV parser about this, and
you want the PG field parser to consider each array item (each type instance) to be enclosed in double quotes.
Code:
copyManager.copyIn("COPY my_table (addresses) FROM STDIN WITH CSV QUOTE ''''", reader);
DML example 1:
COPY my_table (addresses) FROM STDIN WITH CSV QUOTE ''''
CSV example 1:
'{"(10.0.0.1,1)","(10.0.0.2,2)"}'
'{"(10.10.10.1,80)","(10.10.10.2,443)"}'
'{"(10.10.10.3,8080)","(10.10.10.4,4040)"}'
DML example 2, escaping the double quotes:
COPY my_table (addresses) FROM STDIN WITH CSV
CSV example 2, escaping the double quotes:
"{""(10.0.0.1,1)"",""(10.0.0.2,2)""}"
"{""(10.10.10.1,80)"",""(10.10.10.2,443)""}"
"{""(10.10.10.3,8080)"",""(10.10.10.4,4040)""}"
Full JUnit test class:
package io.mikael.poc;
import com.google.common.io.CharStreams;
import org.junit.*;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;
import org.testcontainers.containers.PostgreSQLContainer;
import java.io.*;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import static java.nio.charset.StandardCharsets.UTF_8;
public class CopyTest {
private Reader reader;
private Connection connection;
private CopyManager copyManager;
private static final String CREATE_TYPE = "CREATE TYPE address AS (ip inet, port int)";
private static final String CREATE_TABLE = "CREATE TABLE my_table (addresses address[] NULL)";
private String loadCsvFromFile(final String fileName) throws IOException {
try (InputStream is = getClass().getResourceAsStream(fileName)) {
return CharStreams.toString(new InputStreamReader(is, UTF_8));
}
}
@ClassRule
public static PostgreSQLContainer db = new PostgreSQLContainer("postgres:10-alpine");
@BeforeClass
public static void beforeClass() throws Exception {
Class.forName("org.postgresql.Driver");
}
@Before
public void before() throws Exception {
String input = loadCsvFromFile("/data_01.csv");
reader = new StringReader(input);
connection = DriverManager.getConnection(db.getJdbcUrl(), db.getUsername(), db.getPassword());
copyManager = connection.unwrap(PGConnection.class).getCopyAPI();
connection.setAutoCommit(false);
connection.beginRequest();
connection.prepareCall(CREATE_TYPE).execute();
connection.prepareCall(CREATE_TABLE).execute();
}
@After
public void after() throws Exception {
connection.rollback();
}
@Test
public void copyTest01() throws Exception {
copyManager.copyIn("COPY my_table (addresses) FROM STDIN WITH CSV QUOTE ''''", reader);
final StringWriter writer = new StringWriter();
copyManager.copyOut("COPY my_table TO STDOUT WITH CSV", writer);
System.out.printf("roundtrip:%n%s%n", writer.toString());
final ResultSet rs = connection.prepareStatement(
"SELECT array_to_json(array_agg(t)) FROM (SELECT addresses FROM my_table) t")
.executeQuery();
rs.next();
System.out.printf("json:%n%s%n", rs.getString(1));
}
}
Test output:
roundtrip:
"{""(10.0.0.1,1)"",""(10.0.0.2,2)""}"
"{""(10.10.10.1,80)"",""(10.10.10.2,443)""}"
"{""(10.10.10.3,8080)"",""(10.10.10.4,4040)""}"
json:
[{"addresses":[{"ip":"10.0.0.1","port":1},{"ip":"10.0.0.2","port":2}]},{"addresses":[{"ip":"10.10.10.1","port":80},{"ip":"10.10.10.2","port":443}]},{"addresses":[{"ip":"10.10.10.3","port":8080},{"ip":"10.10.10.4","port":4040}]}]
In CSV format, when you specify a separator, you cannot use it as a character in your data unless you escape it!
Example of a CSV file using comma as a separator:
a correct record: data1, data2
parse results: [0] => data1 [1] => data2
an incorrect one: data,1, data2
parse results: [0] => data [1] => 1 [2] => data2
Finally, you do not need to load your file as a CSV, but as a simple file, so replace your method loadCsvFromFile() with:
public String loadRecordsFromFile(File file) throws IOException {
LineIterator it = FileUtils.lineIterator(file, "UTF-8");
StringBuilder sb = new StringBuilder();
try {
while (it.hasNext()) {
sb.append(it.nextLine()).append(System.lineSeparator());
}
}
finally {
LineIterator.closeQuietly(it);
}
return sb.toString();
}
Do not forget to add this dependency to your pom file:
<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
Or download the JAR from commons.apache.org.
1NF
First of all, I think your table design is wrong because it isn't 1NF compliant. Every field should only contain atomic attributes, but that's not the case. Why not a table like:
CREATE TABLE my_table (
id,
ip inet,
port int
)
where id is the line number in the source file, and ip/port is one of the addresses on that line?
Sample data:
id | ip | port
-----------------------
1 | 10.10.10.1 | 80
1 | 10.10.10.2 | 443
2 | 10.10.10.3 | 8080
2 | 10.10.10.4 | 4040
...
Hence, you will be able to query your database on a single address (find all the associated addresses, return true if two addresses are on the same line, whatever else you might want...).
Load the data
But let's assume you know what you are doing. The main issue here is that your input data file is in a special format. It might be a single-column CSV file, but it would be a very degenerate CSV file. Anyway, you have to transform the lines before you insert them into the database. You have two options:
you read each line of the input file and you make an INSERT (this may take a while);
you convert the input file into a text file with the expected format and use COPY.
Insert one by one
The first option seems easy: for the first row of the CSV file, {(10.10.10.1,80),(10.10.10.2,443)}, you have to run the query:
INSERT INTO my_table VALUES (ARRAY[('10.10.10.1',80),('10.10.10.2',443)]::address[])
To do so, you just have to create a new string:
String value = row.replaceAll("\\{", "ARRAY[")
.replaceAll("\\}", "]::address[]")
.replaceAll("\\(([0-9.]+),", "('$1',"); // quote the ip but keep the parenthesis and comma
String sql = String.format("INSERT INTO my_table VALUES (%s)", value);
And execute the query for every line of the input file (or, for better security, use a prepared statement).
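Putting that together, a minimal sketch of the line-by-line loop (untested; assumes an open java.sql.Connection named connection and a UTF-8 input file named input.txt):
try (BufferedReader br = Files.newBufferedReader(Paths.get("input.txt"), StandardCharsets.UTF_8);
     Statement stmt = connection.createStatement()) {
    String row;
    while ((row = br.readLine()) != null) {
        String value = row.replaceAll("\\{", "ARRAY[")
                .replaceAll("\\}", "]::address[]")
                .replaceAll("\\(([0-9.]+),", "('$1',");
        // values are still embedded in the SQL here; a prepared statement would be safer
        stmt.executeUpdate(String.format("INSERT INTO my_table VALUES (%s)", value));
    }
}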
Insert with COPY
I will elaborate on the second option. In your Java code, you have to use:
copyManager.copyIn(sql, from);
where sql is a COPY FROM STDIN statement and from is a Reader. The statement will be:
COPY my_table (addresses) FROM STDIN WITH (FORMAT text);
To feed the copy manager, you need data like (note the quotes):
{"(10.10.10.1,80)","(10.10.10.2,443)"}
{"(10.10.10.3,8080)","(10.10.10.4,4040)"}
With a temporary file
The simpler way to get the data into the right format is to create a temporary file. You read each line of the input file and replace ( with "( and ) with )". Write each processed line into the temporary file, then pass a reader on this file to the copy manager.
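A sketch of that approach (untested; file names are placeholders):
Path tmp = Files.createTempFile("copy", ".txt");
try (BufferedReader in = Files.newBufferedReader(Paths.get("input.txt"), StandardCharsets.UTF_8);
     BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
    String line;
    while ((line = in.readLine()) != null) {
        out.write(line.replace("(", "\"(").replace(")", ")\"")); // {(a,b)} -> {"(a,b)"}
        out.newLine();
    }
}
try (Reader from = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
    copyManager.copyIn("COPY my_table FROM STDIN WITH (FORMAT text)", from);
}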
On the fly
With two threads
You can use two threads:
thread 1 reads the input file, processes the lines one by one and writes them into a PipedWriter.
thread 2 passes a PipedReader connected to the previous PipedWriter to the copy manager.
The main difficulty is to synchronize the threads in such a way that thread 2 starts reading the PipedReader before thread 1 starts writing data into the PipedWriter. See this project of mine for an example.
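A rough sketch of the piped version (untested; exception handling trimmed, input file name is a placeholder). The PipedReader is connected before the producer thread starts, and the copy runs on the current thread while the producer writes:
PipedWriter writer = new PipedWriter();
PipedReader pipedReader = new PipedReader(writer); // connect before either side runs
Thread producer = new Thread(() -> {
    try (BufferedReader in = Files.newBufferedReader(Paths.get("input.txt"), StandardCharsets.UTF_8);
         PipedWriter out = writer) { // closing the writer signals EOF to the reader
        String line;
        while ((line = in.readLine()) != null) {
            out.write(line.replace("(", "\"(").replace(")", ")\""));
            out.write('\n');
        }
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
});
producer.start();
copyManager.copyIn("COPY my_table FROM STDIN WITH (FORMAT text)", pipedReader);
producer.join();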
With a custom reader
The from reader could be an instance of something like (naive version):
class DataReader extends Reader {
PushbackReader csvFileReader;
private boolean wasParenthese;
public DataReader(Reader csvFileReader) {
this.csvFileReader = new PushbackReader(csvFileReader, 1);
wasParenthese = false;
}
@Override
public void close() throws IOException {
this.csvFileReader.close();
}
@Override
public int read(char[] cbuf, int off, int len) throws IOException {
// rely on read()
for (int i = off; i < off + len; i++) {
int c = this.read();
if (c == -1) {
return i-off > 0 ? i-off : -1;
}
cbuf[i] = (char) c;
}
return len;
}
@Override
public int read() throws IOException {
final int c = this.csvFileReader.read();
if (c == '(' && !this.wasParenthese) {
this.wasParenthese = true;
this.csvFileReader.unread('(');
return '"'; // add " before (
} else {
this.wasParenthese = false;
if (c == ')') {
this.csvFileReader.unread('"');
return ')'; // add " after )
} else {
return c;
}
}
}
}
(This is a naive version because the right way to do it would be to override only public int read(char[] cbuf, int off, int len). But you should then process the cbuf to add the quotes and store the extra chars pushed to the right: this is a bit tedious).
Now, if r is the reader for the file:
{(10.10.10.1,80),(10.10.10.2,443)}
{(10.10.10.3,8080),(10.10.10.4,4040)}
Just use:
Class.forName("org.postgresql.Driver");
Connection connection = DriverManager
.getConnection("jdbc:postgresql://db_host:5432/db_base", "user", "passwd");
CopyManager copyManager = connection.unwrap(PGConnection.class).getCopyAPI();
copyManager.copyIn("COPY my_table FROM STDIN WITH (FORMAT text)", new DataReader(r));
On bulk loading
If you are loading a huge amount of data, don't forget the basic tips: disable autocommit, remove indexes and constraints, and use TRUNCATE and ANALYZE as follows:
TRUNCATE my_table;
COPY ...;
ANALYZE my_table;
This will speed up the loading.
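In JDBC terms, the same recipe looks roughly like this (a sketch; dropping and recreating indexes and constraints is omitted):
connection.setAutoCommit(false);
try (Statement stmt = connection.createStatement()) {
    stmt.execute("TRUNCATE my_table");
    copyManager.copyIn("COPY my_table FROM STDIN WITH (FORMAT text)", reader);
    stmt.execute("ANALYZE my_table"); // ANALYZE is allowed inside a transaction, unlike VACUUM
    connection.commit();
}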
I cannot figure out how to get MyBatis to generate a parenthesised OR expression:
WHERE x.token = ? AND (
(x.scene = 'A' OR x.scene = 'B'))
This is a surprisingly simple operation they've made very difficult. I can't find it in the JavaDoc: http://static.javadoc.io/org.mybatis/mybatis/3.4.5/org/apache/ibatis/jdbc/SQL.html
It can't be done in the current version of mybatis (3.4.6).
You can either use the whole subexpression like this:
WHERE("x.token = ?");
WHERE("(x.scene = 'A' OR x.scene = 'B')");
or create your own function if you have many (or a variable number of) operands to OR:
WHERE("x.token = ?");
WHERE(OR("x.scene = 'A'", "x.scene = 'B'"));
where OR is defined (using Guava's Joiner) as:
Joiner OR_JOINER = Joiner.on(" OR ");
private String OR(String ... operands) {
return String.format("(%s)", OR_JOINER.join(operands));
}
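In context, the helper can then be used with the builder's fluent style (a sketch; assumes the OR helper above is in scope of the calling method):
String sql = new SQL()
        .SELECT("*")
        .FROM("x")
        .WHERE("x.token = ?")
        .WHERE(OR("x.scene = 'A'", "x.scene = 'B'"))
        .toString();
// SELECT * FROM x WHERE (x.token = ? AND (x.scene = 'A' OR x.scene = 'B'))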
Here is the code:
public class databag extends EvalFunc<DataBag> {
TupleFactory mTupleFactory = TupleFactory.getInstance();
BagFactory mBagFactory = BagFactory.getInstance();
private DataBag result;
private String delimiterType = ": Src / dest :";
public DataBag exec(Tuple input) throws IOException {
try{
result = mBagFactory.newDefaultBag(); // change here
result.add(input);
getLogger().info("::::::: Entered try block ::::::::::::");
// create indexing for source and destination . ::: (Arraylist<Object[]>)
ConcurrentHashMap<Object, ArrayList<Integer>> srcIndexMap = new ConcurrentHashMap<Object, ArrayList<Integer>>();
ConcurrentHashMap<Object, ArrayList<Integer>> destIndexMap = new ConcurrentHashMap<Object, ArrayList<Integer>>();
// store the rows to Arraylist(Object[]) collection by converting .
ArrayList<Object[]> source = new ArrayList<Object[]>();
ArrayList<Object[]> destination = new ArrayList<Object[]>();
int srcCounter = 0;
int destCounter = 0;
ArrayList<Integer> Sourcearray = new ArrayList<Integer>();
ArrayList<Integer> Destinationarray = new ArrayList<Integer>();
for (Iterator<Tuple> iter = result.iterator(); iter.hasNext();) {
//some code here
}
I'm trying to iterate over the tuples in a databag using a for loop, but for every tuple all the collections are reinitialized; in other words, it's executing from the try block for each tuple.
output:
INFO PigUDFpck.databag - ::::::: Entered try block ::::::::::::
PigUDFpck.databag - srcIndexMap={}
PigUDFpck.databag - inside main if loop skey=4
PigUDFpck.databag - destIndexMap.contains(skey)=false
PigUDFpck.databag - into else loop of main method
PigUDFpck.databag - ::::::: Entered try block ::::::::::::
PigUDFpck.databag - srcIndexMap={}
PigUDFpck.databag - inside main if loop skey=4
PigUDFpck.databag - destIndexMap.contains(skey)=false
PigUDFpck.databag - into else loop of main method
UPDATED
Pig Script
REGISTER /usr/local/pig/UDF/UDFBAG.jar;
sourcenew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Source1.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
destnew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Destination1.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
cogroupnew = COGROUP sourcenew BY ID inner, destnew BY ID inner;
diff_data = FOREACH cogroupnew GENERATE DIFF(sourcenew,destnew);
ids = FOREACH diff_data GENERATE FLATTEN($0);
id1 = DISTINCT( FOREACH ids GENERATE $0);
src = FILTER sourcenew BY ID == id1.$0;
finalsrc = FOREACH src GENERATE *, 'Source' as Source:chararray;
dest = FILTER destnew BY ID == id1.$0;
finaldest = FOREACH dest GENERATE *, 'Destination' as Destination:chararray;
final = UNION finalsrc,finaldest ;
A = FOREACH final GENERATE PigUDFpck.databag(*);
DUMP A;
and the input to the UDF is as follows:
(4,JOHN Hansel,JOHN,Hansel,Banking ,4,M,20-01-1994,78.65,345000,ArkansasSrc1,Source)
(4,JOHN Hansel,JOHN,Hansel,Banking ,4,M,20-01-1994,78.65,345000,ArkansaSrc2,Source)
(4,JOHN Hansel,JOHN,Hansel,Banking ,4,M,20-01-1994,78.65,345000,Arkansasdest1,Destination)
(4,JOHN Hansel,JOHN,Hansel,Banking ,4,M,20-01-1994,78.65,345000,Arkanssdest2,Destination)
Help is much appreciated! Thanks in advance!
Please understand that Pig is a DAG generator, and based on the DAG it spawns Map Reduce jobs.
Higher-level Pig constructs like LOAD, FOREACH, and JOIN boil down to lower-level MR constructs:
> LOAD => Mapper in MR
> GENERATE => a function call in Mapper or Reducer
> JOIN => SHUFFLE (join in Map Reduce)
> FILTER => filter function in Map or Reduce
The databag function is being called not once but multiple times, as it is a function call executed in the Mapper or Reducer.
It will be executed for each input row (whether the databag UDF becomes part of the mapper or the reducer depends on the plan).
Please go through the EXPLAIN command in Pig, which shows how a Pig script translates to the chain of underlying MR jobs.
For a detailed understanding, please follow:
http://bytepadding.com/big-data/map-reduce/pig-to-map-and-reduce/
http://bytepadding.com/big-data/map-reduce/understanding-map-reduce-the-missing-guide/
OK, a bit too large for a comment:
...
src = FILTER sourcenew BY ID == id1.$0;
finalsrc = FOREACH src GENERATE *, 'Source' as Source:chararray;
dest = FILTER destnew BY ID == id1.$0;
finaldest = FOREACH dest GENERATE *, 'Destination' as Source:chararray;
final = UNION finalsrc,finaldest ;
A = FOREACH (group final by ID) {
src = filter final by Source == 'Source';
dest = filter final by Source == 'Destination';
GENERATE flatten(PigUDFpck.databag(src, dest));
}
In that case the UDF will receive a tuple containing 2 bags of tuples on which you can do your comparison. Also, I'm pretty sure this can be simplified (I mean, you can do this union and group-by right after the load - just generate a flag for each row telling you whether it's from source or destination).
This is a hypothetical example; well, not quite hypothetical, since I reproduced it in code:
public final class JooqPlayAround
{
public static void main(final String... args)
throws IOException, SQLException
{
final Path path = Paths.get("/tmp/meh");
if (Files.exists(path))
// see https://github.com/fge/java7-fs-more
MoreFiles.deleteRecursive(path, RecursionMode.KEEP_GOING);
Files.createDirectories(path);
final Path dbpath = path.resolve("foo");
final String db = "jdbc:h2:" + dbpath.toAbsolutePath();
final Connection connection = DriverManager.getConnection(db, "sa", "");
final DSLContext jooq = DSL.using(connection, SQLDialect.H2);
//jooq.dropTableIfExists("bad").execute();
final CreateTableColumnStep step = jooq.createTable("bad")
.column("f1", SQLDataType.INTEGER.nullable(false))
.column("f2", SQLDataType.INTEGER.nullable(false))
.column("fuckedup", SQLDataType.VARCHAR.length(20).nullable(false));
step.execute();
final Table<Record> table = DSL.tableByName("bad");
final Field<Object> f1 = DSL.fieldByName("f1");
final Field<Object> f2 = DSL.fieldByName("f2");
final Field<Object> f3 = DSL.fieldByName("fuckedup");
jooq.insertInto(table, f1, f2, f3)
.values(1, 2, "hello,world")
.execute();
// prints "1 2" as expected
jooq.select(f1, f2)
.from(table)
.forEach(System.out::println);
}
}
I have been looking at many a page and failed to achieve what I want, which is to clean up the "fuckedup" column; that is, I'd like to obtain this:
1 2 hello
1 2 world
I am not an SQL expert and am only used to working with properly normalized tables; this isn't one. How do I obtain the result above in SQL, given that I use:
jooq 3.5.1,
h2 as an SQL engine?
It looks to me like what is needed is a pivot of some sort, but I am unable to even split the string at the moment :/