Collecting DataFrame column names in Java

I am using spark-sql 2.4.1 with Java 8.
I have a scenario like the one below:
List data = List(
("20", "score", "school", "2018-03-31", 14 , 12 , 20),
("21", "score", "school", "2018-03-31", 13 , 13 , 21),
("22", "rate", "school", "2018-03-31", 11 , 14, 22),
("21", "rate", "school", "2018-03-31", 13 , 12, 23)
)
Dataset<Row> df = data.toDF("id", "code", "entity", "date", "column1", "column2", "column3")
Dataset<Row> resultDs = df
.withColumn("column_names",
array(Arrays.asList(df.columns()).stream().map(s -> new Column(s)).toArray(Column[]::new))
);
But this shows each row's column values instead of the column names.
So what is wrong here? How do I get "column_names" in Java?
I am trying to solve the use-case below:
Let's say I have 100 columns, column1 through column100, and each column's calculation is different, depending on the column name and data. Every time I run my Spark job I am told which columns need to be calculated, but my code contains the logic for all columns, so I need to skip the logic for the unspecified ones. The DataFrame contains all columns but I select only the specified ones, so the logic for the non-selected columns throws a "column not found" exception. I need to fix this.
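Based on the snippet above, the likely cause is that new Column(s) resolves to each row's value for that column, whereas wrapping each name with lit would store the name itself. A minimal sketch of that idea (the helper class is hypothetical, not from the original post):
import static org.apache.spark.sql.functions.array;
import static org.apache.spark.sql.functions.lit;
import java.util.Arrays;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
public final class ColumnNamesSketch {
    // Adds a "column_names" column holding the DataFrame's column names as string literals.
    // new Column(name) references each row's value; lit(name) embeds the name itself.
    public static Dataset<Row> withColumnNames(Dataset<Row> df) {
        Column[] nameLiterals = Arrays.stream(df.columns())
                .map(name -> lit(name))
                .toArray(Column[]::new);
        return df.withColumn("column_names", array(nameLiterals));
    }
}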

Related

How to use elemMatch in Spring Boot to query an element of an array, where the array has just one column and the column has no field name?

The collection I have is of the following form:
{
"element_1": 1,
"element_2": 1,
"elements":[
"ele_1", "ele_2", "ele_3", "ele_4"
]
},
{
"element_1":2,
"element_2":2,
"elements":[
"ele_5", "ele_6", "ele_7", "ele_8"
]
},
{
"element_1": 3,
"element_2": 3,
"elements": [
"ele_9", "ele_10", "ele_11", "ele_12"
]
}
Here I want to query the document that has the element ele_1 in the elements field, so that on using the Java command
Query query = new Query("Required Criteria");
the returned document should be
{
"element_1": 1,
"element_2": 1,
"elements":[
"ele_1", "ele_2", "ele_3", "ele_4"
]
}
I would like to mention again that the array in the field "elements" has no field name, hence providing a key parameter while building the Criteria object is not possible. How do I get the required result?
You can simply write:
Query query = new Query("{'elements' : 'ele_1'}");
You don't need $elemMatch.
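As a side note, in Spring Data MongoDB the JSON-string constructor belongs to BasicQuery, while Query itself takes a Criteria. A small sketch of both forms (the collection name yourCollection is a placeholder, not from the original post):
import org.bson.Document;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.BasicQuery;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
public class ElementsQueryExample {
    public Document findByElement(MongoTemplate mongoTemplate) {
        // MongoDB treats { elements: "ele_1" } as "the elements array contains ele_1",
        // so no $elemMatch is needed for a plain string array.
        Query byCriteria = new Query(Criteria.where("elements").is("ele_1"));
        // Equivalent raw JSON filter; the String constructor is on BasicQuery.
        Query byJson = new BasicQuery("{ 'elements' : 'ele_1' }");
        return mongoTemplate.findOne(byCriteria, Document.class, "yourCollection");
    }
}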

Cassandra Saving JSON data in Text Column

In a Cassandra database I have a column named custom_extensions which can contain a List<AppEncoded>, where AppEncoded is a UDT. The UDT has the following fields:
type -> TEXT
code -> TEXT
value -> TEXT
While saving data to the DB, the value field can receive an object as input.
CurrencyTO:
field -> amount
field -> Symbol
field -> formattedAmount
The implementation to save the column value in DB is as follows:
JacksonJsonCodec<CurrencyTO> jacksonJsonCodec = new JacksonJsonCodec<>(CurrencyTO.class);
appEncodedValue.setValue(jacksonJsonCodec.format(CurrencyTO.getValue()));
CurrencyTO extends TranserObject, which has some further attributes as well.
When I look in the DB I see the following result:
"value": "'{\"serviceResult\":{\"messagesResult\":[]},\"attributeNames\":[\"amount\",\"isoCode\",\"symbol\",\"decimalValue\",\"formattedAmount\"],\"metadata\":null,\"this\":null,\"amount\":\"45\",\"isoCode\":\"USD\",\"symbol\":\"$\",\"decimalValue\":2.0,\"formattedAmount\":null}'"
The stored value includes serviceResult, messagesResult and metadata, plus some \ escape characters as well.
The expected result in the DB should be similar to the following:
"value": {
"amount": 90,
"Symbol": "$",
"formattedAmount" : "90.00"
}
The reference I followed for the implementation is:
custom_codecs
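The extra fields and escaped quotes usually come from serializing the full inherited object graph and then having the resulting JSON string quoted again as a literal. One way to control what lands in the TEXT field is to serialize only the wanted fields with a plain Jackson ObjectMapper; a minimal sketch, assuming Jackson is on the classpath and using a trimmed, hypothetical CurrencyJson DTO (not from the original post):
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.databind.ObjectMapper;
public class CurrencyJsonExample {
    // Trimmed DTO holding only the fields that should end up in the TEXT column.
    @JsonInclude(JsonInclude.Include.NON_NULL)
    static class CurrencyJson {
        public String amount;
        public String symbol;
        public String formattedAmount;
    }
    public static String toJson(String amount, String symbol, String formattedAmount) throws Exception {
        CurrencyJson dto = new CurrencyJson();
        dto.amount = amount;
        dto.symbol = symbol;
        dto.formattedAmount = formattedAmount;
        // Produces e.g. {"amount":"90","symbol":"$","formattedAmount":"90.00"} with no extra
        // escaping, because the string is stored directly rather than re-serialized by another codec.
        return new ObjectMapper().writeValueAsString(dto);
    }
}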

How can I convert JSON to a database table dynamically?

I need to save JSON data to an Oracle database. The JSON looks like this (see below), but it doesn't stay in the same format: I might add some additional nodes or modify existing ones. So is it possible to create or modify Oracle tables dynamically to add more columns? I was going to do that with Java: create a Java class matching the JSON, convert the JSON to a Java object and persist it to the table. But how can I modify the Java class dynamically? Or would it be a better idea to do that with PL/SQL? The JSON comes from a mobile device to a REST web service.
{"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}}
I would suggest that you avoid creating new columns, and instead create a new table that will contain one entry for each of what would have been the new columns. I'm assuming here that the new columns would be menu items. So you would have a "menu" table with these columns:
id file
and you would have a "menuitem" table which would contain one entry for each of your menu items:
id value onclick
So instead of adding columns dynamically, you would be adding records.
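To make the rows-instead-of-columns idea concrete, here is a minimal JDBC sketch (the connection URL, table and column names are placeholders, not from the original answer):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
public class MenuItemRows {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:oracle:thin:@//host:1521/service", "user", "pass")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE menu (id VARCHAR2(50) PRIMARY KEY, value VARCHAR2(200))");
                st.execute("CREATE TABLE menuitem (menu_id VARCHAR2(50), value VARCHAR2(200), onclick VARCHAR2(200))");
            }
            // A new menu item becomes a new row in menuitem, so no ALTER TABLE is ever needed.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO menuitem (menu_id, value, onclick) VALUES (?, ?, ?)")) {
                ps.setString(1, "file");
                ps.setString(2, "New");
                ps.setString(3, "CreateNewDoc()");
                ps.executeUpdate();
            }
        }
    }
}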
I suggested in the comments to change your approach to a NoSQL database like MongoDB. However, if you still feel you need to use a relational database, maybe the EAV model can point you in the right direction.
In summary, you would have a "helper" table that stores which columns an entity has and their types.
You cannot modify a Java class at runtime, but you can model the entity as a Map and implement the logic to add the desired columns.
Magento, a PHP product, uses EAV in its database.
MongoDB may be your best choice, or you could have a large TEXT field and only extract the columns you are likely to search on.
However, you can CREATE TABLE for additional normalised data and ALTER TABLE to add a column; the latter can be particularly expensive.
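A minimal sketch of the EAV idea in Java (the entity_attribute table and the JDBC wiring are hypothetical): unknown JSON fields live in a Map and are persisted as one row per attribute rather than one column per attribute.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.LinkedHashMap;
import java.util.Map;
public class EavWriter {
    // Each dynamic JSON field becomes a row: (entity_id, attr_name, attr_value).
    public static void saveAttributes(Connection conn, long entityId, Map<String, String> attributes) throws Exception {
        String sql = "INSERT INTO entity_attribute (entity_id, attr_name, attr_value) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Map.Entry<String, String> attr : attributes.entrySet()) {
                ps.setLong(1, entityId);
                ps.setString(2, attr.getKey());
                ps.setString(3, attr.getValue());
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("id", "file");
        attrs.put("value", "File");
        // saveAttributes(connection, 1L, attrs); // wire up a real Connection before calling
    }
}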
Use https://github.com/zolekode/json-to-tables/.
Here you go:
import json
from core.extent_table import ExtentTable
from core.table_maker import TableMaker
menu = {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}
menu = json.dumps(menu)
menu = json.loads(menu)
extent_table = ExtentTable()
table_maker = TableMaker(extent_table)
table_maker.convert_json_object_to_table(menu, "menu")
table_maker.show_tables(8)
table_maker.save_tables("menu", export_as="sql", sql_connection="your_connection")
Output:
SHOWING TABLES :D
menu
ID id value popup
0 0 file File 0
1 1 None None None
____________________________________________________
popup
ID
0 0
1 1
____________________________________________________
popup_?_menuitem
ID PARENT_ID is_scalar scalar
0 0 0 False None
1 1 0 False None
2 2 0 False None
____________________________________________________
popup_?_menuitem_$_onclick
ID value onclick PARENT_ID
0 0 New CreateNewDoc() 0
1 1 Open OpenDoc() 1
2 2 Close CloseDoc() 2
3 3 None None None
____________________________________________________
This can also be done on the database side (the script below targets SQL Server):
This code takes a JSON input string and automatically generates
SQL Server CREATE TABLE statements to make it easier
to convert serialized data into a database schema.
It is not perfect, but should provide a decent starting point when starting
to work with new JSON files.
SET NOCOUNT ON;
DECLARE
@JsonData nvarchar(max) = '
{
"Id" : 1,
"IsActive":true,
"Ratio": 1.25,
"ActivityArray":[true,false,true],
"People" : ["Jim","Joan","John","Jeff"],
"Places" : [{"State":"Connecticut", "Capitol":"Hartford", "IsExpensive":true},{"State":"Ohio","Capitol":"Columbus","MajorCities":["Cleveland","Cincinnati"]}],
"Thing" : { "Type":"Foo", "Value" : "Bar" },
"Created_At":"2018-04-18T21:25:48Z"
}',
@RootTableName nvarchar(4000) = N'AppInstance',
@Schema nvarchar(128) = N'dbo',
@DefaultStringPadding smallint = 20;
DROP TABLE IF EXISTS ##parsedJson;
WITH jsonRoot AS (
SELECT
0 as parentLevel,
CONVERT(nvarchar(4000),NULL) COLLATE Latin1_General_BIN2 as parentTableName,
0 AS [level],
[type] ,
@RootTableName COLLATE Latin1_General_BIN2 AS TableName,
[key] COLLATE Latin1_General_BIN2 as ColumnName,
[value],
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColumnSequence
FROM
OPENJSON(@JsonData, '$')
UNION ALL
SELECT
jsonRoot.[level] as parentLevel,
CONVERT(nvarchar(4000),jsonRoot.TableName) COLLATE Latin1_General_BIN2,
jsonRoot.[level]+1,
d.[type],
CASE WHEN jsonRoot.[type] IN (4,5) THEN CONVERT(nvarchar(4000),jsonRoot.ColumnName) ELSE jsonRoot.TableName END COLLATE Latin1_General_BIN2,
CASE WHEN jsonRoot.[type] IN (4) THEN jsonRoot.ColumnName ELSE d.[key] END,
d.[value],
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColumnSequence
FROM
jsonRoot
CROSS APPLY OPENJSON(jsonRoot.[value], '$') d
WHERE
jsonRoot.[type] IN (4,5)
), IdRows AS (
SELECT
-2 as parentLevel,
null as parentTableName,
-1 as [level],
null as [type],
TableName as Tablename,
TableName+'Id' as columnName,
null as [value],
0 as columnsequence
FROM
(SELECT DISTINCT tablename FROM jsonRoot) j
), FKRows AS (
SELECT
DISTINCT -1 as parentLevel,
null as parentTableName,
-1 as [level],
null as [type],
TableName as Tablename,
parentTableName+'Id' as columnName,
null as [value],
0 as columnsequence
FROM
(SELECT DISTINCT tableName,parentTableName FROM jsonRoot) j
WHERE
parentTableName is not null
)
SELECT
*,
CASE [type]
WHEN 1 THEN
CASE WHEN TRY_CONVERT(datetime2, [value], 127) IS NULL THEN 'nvarchar' ELSE 'datetime2' END
WHEN 2 THEN
CASE WHEN TRY_CONVERT(int, [value]) IS NULL THEN 'float' ELSE 'int' END
WHEN 3 THEN
'bit'
END COLLATE Latin1_General_BIN2 AS DataType,
CASE [type]
WHEN 1 THEN
CASE WHEN TRY_CONVERT(datetime2, [value], 127) IS NULL THEN MAX(LEN([value])) OVER (PARTITION BY TableName, ColumnName) + @DefaultStringPadding ELSE NULL END
WHEN 2 THEN
NULL
WHEN 3 THEN
NULL
END AS DataTypePrecision
INTO ##parsedJson
FROM jsonRoot
WHERE
[type] in (1,2,3)
UNION ALL SELECT IdRows.parentLevel, IdRows.parentTableName, IdRows.[level], IdRows.[type], IdRows.TableName, IdRows.ColumnName, IdRows.[value], -10 AS ColumnSequence, 'int IDENTITY(1,1) PRIMARY KEY' as datatype, null as datatypeprecision FROM IdRows
UNION ALL SELECT FKRows.parentLevel, FKRows.parentTableName, FKRows.[level], FKRows.[type], FKRows.TableName, FKRows.ColumnName, FKRows.[value], -9 AS ColumnSequence, 'int' as datatype, null as datatypeprecision FROM FKRows
-- For debugging:
-- SELECT * FROM ##parsedJson ORDER BY ParentLevel, level, tablename, columnsequence
DECLARE @CreateStatements nvarchar(max);
SELECT
@CreateStatements = COALESCE(@CreateStatements + CHAR(13) + CHAR(13), '') +
'CREATE TABLE ' + @Schema + '.' + TableName + CHAR(13) + '(' + CHAR(13) +
STRING_AGG( ColumnName + ' ' + DataType + ISNULL('('+CAST(DataTypePrecision AS nvarchar(20))+')','') + CASE WHEN DataType like '%PRIMARY KEY%' THEN '' ELSE ' NULL' END, ','+CHAR(13)) WITHIN GROUP (ORDER BY ColumnSequence)
+ CHAR(13)+')'
FROM
(SELECT DISTINCT
j.TableName,
j.ColumnName,
MAX(j.ColumnSequence) AS ColumnSequence,
j.DataType,
j.DataTypePrecision,
j.[level]
FROM
##parsedJson j
CROSS APPLY (SELECT TOP 1 ParentTableName + 'Id' AS ColumnName FROM ##parsedJson p WHERE j.TableName = p.TableName ) p
GROUP BY
j.TableName, j.ColumnName,p.ColumnName, j.DataType, j.DataTypePrecision, j.[level]
) j
GROUP BY
TableName
PRINT @CreateStatements;
You can find the solution on https://bertwagner.com/posts/converting-json-to-sql-server-create-table-statements/
Also, JSON can be converted to a POJO class in Java:
package com.cooltrickshome;
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import org.jsonschema2pojo.DefaultGenerationConfig;
import org.jsonschema2pojo.GenerationConfig;
import org.jsonschema2pojo.Jackson2Annotator;
import org.jsonschema2pojo.SchemaGenerator;
import org.jsonschema2pojo.SchemaMapper;
import org.jsonschema2pojo.SchemaStore;
import org.jsonschema2pojo.SourceType;
import org.jsonschema2pojo.rules.RuleFactory;
import com.sun.codemodel.JCodeModel;
public class JsonToPojo {
    /**
     * @param args
     */
    public static void main(String[] args) {
        String packageName = "com.cooltrickshome";
        File inputJson = new File("." + File.separator + "input.json");
        File outputPojoDirectory = new File("." + File.separator + "convertedPojo");
        outputPojoDirectory.mkdirs();
        try {
            new JsonToPojo().convert2JSON(inputJson.toURI().toURL(), outputPojoDirectory, packageName,
                    inputJson.getName().replace(".json", ""));
        } catch (IOException e) {
            // TODO Auto-generated catch block
            System.out.println("Encountered issue while converting to pojo: " + e.getMessage());
            e.printStackTrace();
        }
    }
    public void convert2JSON(URL inputJson, File outputPojoDirectory, String packageName, String className) throws IOException {
        JCodeModel codeModel = new JCodeModel();
        URL source = inputJson;
        GenerationConfig config = new DefaultGenerationConfig() {
            @Override
            public boolean isGenerateBuilders() { // set config option by overriding method
                return true;
            }
            public SourceType getSourceType() {
                return SourceType.JSON;
            }
        };
        SchemaMapper mapper = new SchemaMapper(new RuleFactory(config, new Jackson2Annotator(config), new SchemaStore()), new SchemaGenerator());
        mapper.generate(codeModel, className, packageName, source);
        codeModel.build(outputPojoDirectory);
    }
}

Mongodb updating and setting a field in embedded document

I have a collection with embedded documents.
System
{
System_Info: "automated",
system_type:
{
system_id:1,
Tenant: [
{
Tenant_Id: 1,
Tenant_Info: "check",
Prop_Info: ...
},
{
Tenant_Id: 2,
Tenant_Info: "sucess",
Prop_Info: ...
} ]
}}
I need to update and set the field Tenant_Info to "failed" in Tenant_Id: 2
I need to do this using the MongoDB Java driver. I know how to insert another tenant document into the Tenant array, but here I need to set a field using Java code.
Could anyone help me to do this?
How about something like this (untested):
db.coll.update(
{
"System.system_type.Tenant.Tenant_Id" : 2
},
{
$set : {
"System.system_type.Tenant.$.Tenant_Info" : "failed"
}
},
false,
true
);
It should update the first matching nested document (the one with a Tenant_Id of 2) in each matching top-level document, since multi is set to true. If you need to target a specific top-level document, add a condition for it as a field in the first object argument (the query) of the update call.
And the equivalent in Java:
BasicDBObject find = new BasicDBObject(
"System.system_type.Tenant.Tenant_Id", 2
);
BasicDBObject set = new BasicDBObject(
"$set",
new BasicDBObject("System.system_type.Tenant.$.Tenant_Info", "failed")
);
coll.update(find, set);
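For the newer MongoCollection API, the same update with the positional $ operator could look like the sketch below (connection string, database and collection names are placeholders; this is not part of the original answer):
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;
public class TenantUpdateExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll = client.getDatabase("test").getCollection("coll");
            // The positional $ operator updates the array element matched by the filter.
            coll.updateMany(
                    Filters.eq("System.system_type.Tenant.Tenant_Id", 2),
                    Updates.set("System.system_type.Tenant.$.Tenant_Info", "failed"));
        }
    }
}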

CSV-file with many tables

The CSV file contains more than one table; it might look like this:
"Vertical Table 1"
,
"id","visits","downloads"
1, 4324, 23
2, 664, 42
3, 73, 44
4, 914, 8
"Vertical Table 2"
,
"id_of_2nd_tab","visits_of_2nd_tab","downloads_of_2nd_tab"
1, 524, 3
2, 564, 52
3, 63, 84
4, 814, 8
To read one table I use "HeaderColumnNameTranslateMappingStrategy" from opencsv
which allows me to map the csv-table entries into a List of TableDataBean objects, as seen below:
HeaderColumnNameTranslateMappingStrategy<TableDataBean> strat = new HeaderColumnNameTranslateMappingStrategy<TableDataBean>();
CSVReader reader = new CSVReader(new FileReader(path), ',');
strat.setType(TableDataBean.class);
Map<String, String> map = new HashMap<String, String>();
map.put("Number of visits", "visits");
map.put("id", "id");
map.put("Number of downloads", "downloads");
strat.setColumnMapping(map);
CsvToBean<TableDataBean> csv = new CsvToBean<TableDataBean>();
List<TableDataBean> list = csv.parse(strat, reader);
This works fine for the first table, but when it comes to the second, its values and headers are mapped to the same attributes of the first table. The output of
for(TableDataBean bean : list){System.out.println(bean.getVisits());}
would look like this:
4324
664
73
914
null
null
null
visits_of_2nd_tab
524
564
63
814
I don't want to split the file into many files, each containing one table.
So what do you suggest? Is there another library that supports this format?
I've got it! I thought the reader had to be a CSVReader. It turns out that I can feed the parse method any object inheriting from the Reader class.
Now I can read the entire CSV file into a String, split it, wrap each of the resulting Strings in a StringReader and then pass it to the parse method.
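A small sketch of that approach (it reuses the TableDataBean from the question, assumes an opencsv version where CsvToBean.parse(strategy, reader) accepts any Reader, and assumes each table is introduced by a quoted title line such as "Vertical Table 1" followed by a lone comma line, as in the sample above):
import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.HeaderColumnNameTranslateMappingStrategy;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class MultiTableCsv {
    public static void main(String[] args) throws Exception {
        String content = new String(Files.readAllBytes(Paths.get("tables.csv")));
        // Split on the quoted table titles plus the lone comma line that follows them,
        // so each chunk starts with its own header row.
        String[] chunks = content.split("\"Vertical Table \\d+\"\\R,\\R");
        // Column mapping for the first table's headers; in practice the second table
        // would need its own mapping, since its headers differ.
        Map<String, String> map = new HashMap<>();
        map.put("id", "id");
        map.put("visits", "visits");
        map.put("downloads", "downloads");
        List<TableDataBean> all = new ArrayList<>();
        for (String chunk : chunks) {
            if (chunk.trim().isEmpty()) {
                continue;
            }
            HeaderColumnNameTranslateMappingStrategy<TableDataBean> strat =
                    new HeaderColumnNameTranslateMappingStrategy<>();
            strat.setType(TableDataBean.class);
            strat.setColumnMapping(map);
            CsvToBean<TableDataBean> csv = new CsvToBean<>();
            // parse accepts any Reader, so a StringReader over one chunk works.
            all.addAll(csv.parse(strat, new StringReader(chunk)));
        }
        all.forEach(bean -> System.out.println(bean.getVisits()));
    }
}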
