SQL query using Spark and Java - java

I have two DataFrames in Spark.
The first, dataframe1, is:
+--------------+--------------+--------------+
|id_z |longitude |latitude |
+--------------+--------------+--------------+
|[12,20,30 ] |-7.0737816 | 33.82666 |
|13 |-7.5952683 | 33.5441916 |
+--------------+--------------+--------------+
The second dataframe2 is :
+--------------+--------------+---------------+
|id_z2 |longitude2 |latitude2 |
+--------------+--------------+---------------+
| 14 |-8.5952683 | 38.5441916 |
| 12 |-7.0737816 | 33.82666 |
+--------------+--------------+---------------+
I want to apply the logic of the following query:
String sql = "SELECT * FROM dataframe2 WHERE id_z2 IN ('" + id_z + "') "
    + "AND longitude2 = '" + longitude + "' AND latitude2 = '" + latitude + "'";
I would prefer not to use a join. Is that possible?
I really need your help, or just a starting point that will make things easier for me.
Thank you


Is there a way of checking for multiple string sequences with String.matches in Java?

I want to look for any of the following sequences: one of * - / + followed by one of * + /.
For example:
4+*2 is something I am looking for.
4+5/2 is not.
if (combinedArray.matches("[-*+/][*+/]"))
{
...code
}
I'd be happy to know what I did wrong.
I basically want it to have the same logic as this:
if (combinedArray.contains("*/") | combinedArray.contains("*+") | combinedArray.contains("**")
| combinedArray.contains("//") | combinedArray.contains("/+") | combinedArray.contains("/*")
| combinedArray.contains("+/") | combinedArray.contains("++") | combinedArray.contains("+*")
| combinedArray.contains("-/") | combinedArray.contains("-+") | combinedArray.contains("-*") )
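The problem is that String.matches anchors the pattern to the whole string, so "[-*+/][*+/]" only matches inputs that are exactly two characters long. Either wrap the pattern in ".*" on both sides, or use Matcher.find, which searches for the pattern anywhere in the string. A minimal sketch:

```java
import java.util.regex.Pattern;

public class OperatorPairCheck {

    private static final Pattern PAIR = Pattern.compile("[-*+/][*+/]");

    // True when the string contains one of - * + / immediately
    // followed by one of * + /.
    public static boolean hasOperatorPair(String s) {
        return PAIR.matcher(s).find();
        // equivalent: return s.matches(".*[-*+/][*+/].*");
    }

    public static void main(String[] args) {
        System.out.println(hasOperatorPair("4+*2"));  // true
        System.out.println(hasOperatorPair("4+5/2")); // false
    }
}
```

Matcher.find is also cheaper than matches with leading/trailing ".*" because it stops at the first occurrence instead of matching the whole input.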

Detected implicit cartesian product for INNER join between logical plans

I'm trying to join two datasets.
Ds1
+-------------+-----------------+-----------+
| countryName|countryPopulation|countrySize|
+-------------+-----------------+-----------+
| China| 1210004992| 9596960|
| India| 952107712| 3287590|
| UnitedStates| 266476272| 9372610|
| Indonesia| 206611600| 1919440|
| Brazil| 162661216| 8511965|
| Russia| 148178480| 17075200|
| Pakistan| 129275664| 803940|
| Japan| 125449704| 377835|
| Bangladesh| 123062800| 144000|
| Nigeria| 103912488| 923770|
| Mexico| 95772464| 1972550|
| Germany| 83536112| 356910|
| Philippines| 74480848| 300000|
| Vietnam| 73976976| 329560|
| Iran| 66094264| 1648000|
| Egypt| 63575108| 1001450|
| Turkey| 62484480| 780580|
| Thailand| 58851356| 514000|
|UnitedKingdom| 58489976| 244820|
| France| 58317448| 547030|
+-------------+-----------------+-----------+
Ds2:
+------------+-----------------+-----------+
| countryName|countryPopulation|countrySize|
+------------+-----------------+-----------+
| China| 1210004992| 9596960|
| India| 952107712| 3287590|
|UnitedStates| 266476272| 9372610|
| Indonesia| 206611600| 1919440|
| Brazil| 162661216| 8511965|
| Russia| 148178480| 17075200|
| Pakistan| 129275664| 803940|
| Japan| 125449704| 377835|
| Bangladesh| 123062800| 144000|
| Nigeria| 103912488| 923770|
| Germany| 83536112| 356910|
| Vietnam| 73976976| 329560|
| Iran| 66094264| 1648000|
| Thailand| 58851356| 514000|
| France| 58317448| 547030|
| Italy| 57460272| 301230|
| Ethiopia| 57171664| 1127127|
| Ukraine| 50864008| 603700|
| Zaire| 46498540| 2345410|
| Burma| 45975624| 678500|
+------------+-----------------+-----------+
When I perform the below operation, I get the output:
Dataset<Row> ds3 = ds2.filter(ds2.col("countryPopulation").cast("int").$greater(100000))
.join(ds1, ds1.col("countrySize")
.equalTo(ds2.col("countrySize")));
ds3.show();
But when I do the below operation, I get an error:
Dataset<Row> ds3 = ds2.filter(ds2.col("countryPopulation").cast("int").$greater(100000))
.join(ds1, ds1.col("countrySize").cast(DataTypes.IntegerType)
.equalTo(ds2.col("countrySize").cast(DataTypes.IntegerType)), "inner");
ds3.show();
Error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Detected implicit cartesian product for INNER join between logical plans
Project [country#6.name AS countryName#2, country#6.population AS countryPopulation#3, country#6.area AS countrySize#4]
+- Filter (isnotnull(country#6) && (Contains(country#6.name, a) && ((cast(country#6.population as int) > 100000) && (cast(country#6.area as int) = cast(country#6.area as int)))))
+- Generate explode(countries#0.country), [0], false, t, [country#6]
+- Relation[countries#0] json
May I know, please, how I should cast and join at the same time? And why am I getting this error?
What is the meaning of "Detected implicit cartesian product for INNER join between logical plans" in the error?
I have seen a cartesian join happen when the join condition contains a function call with parameters from both data frames, something like df1.join(df2, aFunction(df1.column, df2.column)). Here I don't see that exactly, but I suspect something like that is going on.
Try the below to apply the cast in a select rather than in the join condition:
Dataset<Row> ds1_1 = ds1.select(col("countrySize").cast(DataTypes.IntegerType).as("countrySize")); // add all other columns here
Dataset<Row> ds2_1 = ds2.select(col("countrySize").cast(DataTypes.IntegerType).as("countrySize"),
        ds2.col("countryPopulation").cast("int").as("countryPopulation")); // add all other columns here
Dataset<Row> ds3 = ds2_1.filter(ds2_1.col("countryPopulation").$greater(100000))
        .join(ds1_1, ds1_1.col("countrySize").equalTo(ds2_1.col("countrySize")), "inner");
ds3.show();
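For reference, the plan in the error shows both casts resolving to the same attribute (country#6.area), which makes the equality trivially true, so Catalyst treats the join as a cross product. A minimal sketch of another workaround, assuming ds1 and ds2 as above: pre-cast the key with withColumn so the join condition compares two distinct attributes. (If a genuine cross join were intended, setting spark.sql.crossJoin.enabled=true would also silence the error, but that only masks the symptom here.)

```java
// Pre-cast the join key on each side so the join condition references
// two distinct attributes instead of cast(x) = cast(x).
Dataset<Row> ds1Cast = ds1.withColumn("countrySizeInt",
        ds1.col("countrySize").cast(DataTypes.IntegerType));
Dataset<Row> ds2Cast = ds2.withColumn("countrySizeInt",
        ds2.col("countrySize").cast(DataTypes.IntegerType));

Dataset<Row> joined = ds2Cast
        .filter(ds2Cast.col("countryPopulation").cast("int").$greater(100000))
        .join(ds1Cast,
              ds1Cast.col("countrySizeInt").equalTo(ds2Cast.col("countrySizeInt")),
              "inner");
joined.show();
```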

How to convert double with scientific format to String Using spark Java API?

I'm new to the Spark Java API. I want to convert a double in scientific notation to a plain string, for example: 1.7E7 ----> 17000000,00.
MyDataSet is:
+---------+------------+
| account| amount |
+---------+------------+
| c1 | 1.7E7 |
| c2 | 1.5E8 |
| c3 | 142.0 |
+---------+------------+
I want to transform my dataset to something like this.
+---------+----------------------+
| account| amount |
+---------+----------------------+
| c1 | 17000000,00 |
| c2 | 1500000000,00 |
| c3 | 142,00 |
+---------+----------------------+
Can someone guide me to an expression in Spark Java that solves this?
Thanks in advance.
I think you can do it like this. Don't forget to import spark.sql.functions:
Dataset<Row> myDataset = ....
myDataset = myDataset.withColumn("newAmount", col("amount").cast(DataTypes.DoubleType))
.drop(col("amount"))
.withColumnRenamed("newAmount","amount")
.toDF();
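Note that casting alone only changes the column type; it won't produce the comma-decimal text shown above. A minimal plain-Java sketch of the formatting itself (which could then be registered as a Spark UDF), assuming two decimal places and a comma decimal separator as in Locale.GERMANY:

```java
import java.util.Locale;

public class AmountFormatter {

    // Format a double into plain decimal notation (no scientific form)
    // with a comma as the decimal separator, e.g. 1.7E7 -> "17000000,00".
    public static String format(double amount) {
        return String.format(Locale.GERMANY, "%.2f", amount);
    }

    public static void main(String[] args) {
        System.out.println(format(1.7E7));  // 17000000,00
        System.out.println(format(142.0)); // 142,00
    }
}
```

Registered as a UDF (spark.udf().register(...)) this could be applied to the amount column with withColumn; alternatively, Spark's built-in format_number(col, 2) gets close, but it uses a period as the decimal separator and inserts grouping commas.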

how to create a complex object from a Cucumber table?

I do not know how to read the table from the .feature file and correctly populate the nested fields
| payInstruction.custodian | and | payInstruction.acctnum |
into the inner class.
I have a table:
| confirmationId | totalNominal | payInstruction.custodian | payInstruction.acctnum |
| 1 | 100.1321 | yyy | yyy |
| 2 | 100.1351 | zzz | zzz |
and I have class template which has the next structure:
class Confirmation {
String confirmationId;
double totalNominal;
PayInstruction payInstruction;
}
class PayInstruction {
String custodian;
String acctnum;
}
Auto-converting the table to List<Confirmation> fails because it cannot recognize payInstruction.acctnum and payInstruction.custodian.
Any help?
I know the question is a bit old, but Google drove me here and may do so with others in the future.
.feature adaptations according to the question:
Given some confirmations:
| confirmationId | totalNominal |
| 1 | 100.1321 |
| 2 | 100.1351 |
And some pay instructions:
| confirmationId | custodian | acctnum |
| 1 | yyy | yyy |
| 2 | zzz | zzz |
Steps implementation:
Map<Integer, Confirmation> confirmations;

@Given("^some confirmations:$")
public void someConfirmations(List<Confirmation> confirmations) {
    this.confirmations = confirmations.stream()
            .collect(Collectors.toMap(Confirmation::getConfirmationId, Function.identity()));
}

@And("^some pay instructions:$")
public void somePayInstructions(List<PayInstructionTestObject> payInstructions) {
    payInstructions.forEach(pi ->
            this.confirmations.get(pi.getConfirmationId()).addPayInstruction(pi.toPayInstruction())
    );
}
The trick is to create a subclass of PayInstruction in the test folder which holds a confirmation id as a correlation identifier to retrieve the correct confirmation. The toPayInstruction method serves as a converter to get rid of the test object.
Hopefully the Java and feature code is close to compiling; I'm writing this without actually running it, so slight adaptations might be necessary.
The original business model is untouched by this solution; it is not broken or tweaked for testing.
My approach would be to supply the constructor for Confirmation with four primitives and then create the PayInstruction in the constructor of Confirmation.
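The test-only subclass described above might look like the following sketch (the field and method names here are assumptions chosen to match the step code; the business classes mirror the question):

```java
// Business class from the question (fields kept package-private as posted).
class PayInstruction {
    String custodian;
    String acctnum;
}

// Test-only subclass: carries a correlation id so each row of the
// "pay instructions" table can be attached to its Confirmation.
public class PayInstructionTestObject extends PayInstruction {

    int confirmationId;

    public int getConfirmationId() {
        return confirmationId;
    }

    // Drop the test-only field and return the plain business object.
    public PayInstruction toPayInstruction() {
        PayInstruction pi = new PayInstruction();
        pi.custodian = this.custodian;
        pi.acctnum = this.acctnum;
        return pi;
    }

    public static void main(String[] args) {
        PayInstructionTestObject row = new PayInstructionTestObject();
        row.confirmationId = 1;
        row.custodian = "yyy";
        row.acctnum = "yyy";
        PayInstruction pi = row.toPayInstruction();
        System.out.println(pi.custodian + " " + pi.acctnum);
    }
}
```

Because the subclass lives only in the test sources, the production model stays free of test-specific fields.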

Select data from specific year

I need a solution for my problem here.
I have two tables, assetdetail and assetcondition. Here is the structure of those tables:
assetdetail
-----------------------------------------------------------
| sequenceindex | assetcode | assetname | acquisitionyear |
-----------------------------------------------------------
| 1 | 110 | Car | 2012-06-30 |
| 2 | 111 | Bus | 2013-02-12 |
-----------------------------------------------------------
assetcondition
--------------------------------------------------------------------------
|sequenceindex | indexassetdetail | fiscalyear | assetamount | assetprice |
---------------------------------------------------------------------------
| 1 | 1 | 2012 | 1 | 20000000 |
| 2 | 1 | 2013 | 1 | 15000000 |
| 3 | 2 | 2013 | 1 | 25000000 |
---------------------------------------------------------------------------
And I want the result to be like this:
------------------------
assetname | assetprice |
------------------------
Car | 20000000 |
Bus | 25000000 |
------------------------
Note: using "SELECT WHERE fiscalyear = "
Without explaining how your tables are linked, one can only guess. Here's the query I came up with:
select assetdetail.assetname,
sum( assetcondition.assetprice )
from assetdetail
inner join assetcondition
on assetcondition.indexassetdetail = assetdetail.sequenceindex
where assetcondition.fiscalyear = 2013
group by assetdetail.assetname;
I don't fully understand your query from a logical point of view. Anyway, the operator you have to use is JOIN.
The SQL that follows may or may not be what you want:
Select assetname, assetprice
From assetdetail as ad join assetcondition as ac on (ac.indexassetdetail = ad.sequenceindex)
Where fiscalyear = '2013'
Not quite sure if it is what you're looking for, but I guess what you want is a JOIN. Matching each asset's price to the fiscal year of its acquisition reproduces your expected result:
SELECT
assetdetail.assetname, assetcondition.assetprice
FROM
assetdetail
JOIN
assetcondition
ON
assetdetail.sequenceindex = assetcondition.indexassetdetail
WHERE
assetcondition.fiscalyear = YEAR(assetdetail.acquisitionyear)
