R: read.arff error - java

I am starting to work with Weka in R and got stuck at the first step. I converted my CSV file into an ARFF file using an online converter, but when I tried to read it into R I got the following error message:
require(RWeka)
A <- read.arff("Environmental variables all overviewxlsx.arff")
Error in .jnew("weka/core/Instances", .jcast(reader, "java/io/Reader")) :
java.io.IOException: no valid attribute type or invalid enumeration, read Token[[°C]], line 6
Does anyone have an idea that could help me?
Thanks!
P.S. The required package (RWeka) is already installed.

Because read.arff() returns a data frame, you could skip the conversion step entirely and use read.csv() on your original CSV file.
train_arff<-read.arff(file.choose())
str(train_arff)
'data.frame': 14 obs. of 5 variables:
$ outlook : Factor w/ 3 levels "sunny","overcast",..: 1 1 2 3 3 3 2 1 1 3 ...
$ temperature: Factor w/ 3 levels "hot","mild","cool": 1 1 1 2 3 3 3 2 3 2 ...
$ humidity : Factor w/ 2 levels "high","normal": 1 1 1 1 2 2 2 1 2 2 ...
$ windy : logi FALSE TRUE FALSE FALSE FALSE TRUE ...
$ play : Factor w/ 2 levels "yes","no": 2 2 1 1 1 2 1 2 1 1 ...
train_csv<-read.csv(file.choose())
str(train_csv)
'data.frame': 14 obs. of 5 variables:
$ outlook : Factor w/ 3 levels "overcast","rainy",..: 3 3 1 2 2 2 1 3 3 2 ...
$ temperature: Factor w/ 3 levels "cool","hot","mild": 2 2 2 3 1 1 1 3 1 3 ...
$ humidity : Factor w/ 2 levels "high","normal": 1 1 1 1 2 2 2 1 2 2 ...
$ windy : logi FALSE TRUE FALSE FALSE FALSE TRUE ...
$ play : Factor w/ 2 levels "no","yes": 1 1 2 2 2 1 2 1 2 2 ...
Otherwise, your .arff file should have this format.
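For reference, this is the standard Weka weather dataset (the same data as in the str() output above) in ARFF form. Every attribute must be declared with a valid type or enumeration before the @data section, which is why a stray unit token such as [°C] on an @attribute line (line 6 in your file, per the error) trips the parser:

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes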

Related

Flattening JSON data into individual rows

I am interested in flattening JSON with multiple layers of nested arrays of objects. I would ideally like to do this in Java, but it seems like the pandas library in Python might be good for this.
Does anyone know a good Java library for this?
I found this article (Create a Pandas DataFrame from deeply nested JSON) using pandas and jq, and my solution almost works, but the output I am receiving is not quite as expected. Here is my code sample:
import json
import pandas as pd
from sh import jq  # the jq binary invoked via the sh package, as in the linked article

json_data = '''{ "id": 1,
  "things": [
    {
      "tId": 1,
      "objs": [{"this": 99},{"this": 100}]
    },
    {
      "tId": 2,
      "objs": [{"this": 222},{"this": 22222}]
    }
  ]
}'''
rule = """[{id: .id,
            tid: .things[].tId,
            this: .things[].objs[].this}]"""
out = jq(rule, _in=json_data).stdout
res = pd.DataFrame(json.loads(out))
The problem is that the output I am receiving is this:
id this tid
0 1 99 1
1 1 100 1
2 1 222 1
3 1 22222 1
4 1 99 2
5 1 100 2
6 1 222 2
7 1 22222 2
I am expecting to see
id this tid
0 1 99 1
1 1 100 1
3 1 222 2
4 1 22222 2
Any tips on how to make this work, different solutions, or a Java option would be great!
Thanks in advance!
Craig
The problem is that your "rule" creates a Cartesian product, whereas in effect you want nested iteration.
With your input, the following jq expression, which makes the nested iteration reasonably clear, produces the output as shown:
.id as $id
| .things[] as $thing
| $thing.objs[]
| [$id, .this, $thing.tId]
| @tsv
Output
1 99 1
1 100 1
1 222 2
1 22222 2
Rule
So presumably your rule should look something like this:
[{id} + (.things[] | {tid: .tId} + (.objs[] | {this}))]
or if you want to make the nested iteration clearer:
[ .id as $id
| .things[] as $thing
| $thing.objs[]
| {id: $id, this, tid: $thing.tId} ]
Running jq in Java
Besides ProcessBuilder, you might like to take a look at these wrappers:
https://github.com/eiiches/jackson-jq
https://github.com/arakelian/java-jq
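For illustration, a minimal sketch with jackson-jq (assuming its 1.x API; the rule string is the corrected one from above):

import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import net.thisptr.jackson.jq.BuiltinFunctionLoader;
import net.thisptr.jackson.jq.JsonQuery;
import net.thisptr.jackson.jq.Scope;
import net.thisptr.jackson.jq.Versions;

public class FlattenWithJq {
    public static void main(String[] args) throws Exception {
        // Root scope with jq's built-in functions loaded (jackson-jq 1.x style).
        Scope scope = Scope.newEmptyScope();
        BuiltinFunctionLoader.getInstance().loadFunctions(Versions.JQ_1_6, scope);

        JsonNode input = new ObjectMapper().readTree(
            "{\"id\":1,\"things\":[{\"tId\":1,\"objs\":[{\"this\":99},{\"this\":100}]},"
          + "{\"tId\":2,\"objs\":[{\"this\":222},{\"this\":22222}]}]}");

        // The corrected rule: nested iteration instead of a Cartesian product.
        JsonQuery query = JsonQuery.compile(
            "[{id} + (.things[] | {tid: .tId} + (.objs[] | {this}))]",
            Versions.JQ_1_6);

        List<JsonNode> results = new ArrayList<>();
        query.apply(scope, input, results::add); // collects each emitted value
        results.forEach(System.out::println);
    }
}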

Java - Hibernate : Generate sequence column based on another field

I need to add a field to the database which will record a sequence number related to that (foreign) id.
Example table data (current):
ID ACCOUNT some_other_stuff
1 1 ...
2 1 ...
3 1 ...
4 2 ...
5 2 ...
6 1 ...
I need to add a sequenceid column which increments separately for each account, achieving:
ID ACCOUNT SEQ some_other_stuff
1 1 1 ...
2 1 2 ...
3 1 3 ...
4 2 1 ...
5 2 2 ...
6 1 4 ...
Note that the sequence is related to account.
Unfortunately this cannot be done with JPA and Hibernate. The only solution is to do it manually in the service. You can use @Generated on a column, but that relies on the database to provide the value. And you cannot create a custom sequence implementation and use @GeneratedValue, because that works only for the ID column.
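A minimal sketch of that manual approach (assuming a Spring-managed service; the entity name AccountRow and its account/seq fields are hypothetical, not from the question):

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class AccountRowService {

    @PersistenceContext
    private EntityManager em;

    // Assigns the next sequence number for the row's account, then persists it.
    @Transactional
    public void save(AccountRow row) {
        Integer max = em.createQuery(
                "select max(r.seq) from AccountRow r where r.account = :acc",
                Integer.class)
            .setParameter("acc", row.getAccount())
            .getSingleResult(); // null when the account has no rows yet
        row.setSeq(max == null ? 1 : max + 1);
        em.persist(row);
    }
}

Two concurrent inserts for the same account can still read the same max here, so a unique constraint on (account, seq) plus a retry, or a pessimistic lock, is needed to make this safe under load.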

Filter and Group multiple DataSets in spark java

I am very new to Spark. Below is the requirement I have been given.
1st RDD
empno first-name last-name
0 fname lname
1 fname1 lname1
2nd rdd
empno dept-no dept-code
0 1 a
0 1 b
1 1 a
1 2 a
3rd rdd
empno history-no address
0 1 xyz
0 2 abc
1 1 123
1 2 456
1 3 a12
I have to generate a file combining all the RDDs for each employee; the average employee count is 200k.
Desired output:
seg-start emp-0
seg-emp 0-fname-lname
seg-dept 0-1-a
seg-dept 0-1-b
seg-his 0-1-xyz
seg-his 0-2-abc
seg-end emp-0
seg-start emp-1
......
seg-end emp-1
How can I achieve this by combining RDDs? Please note that the data is not written as straightforwardly as shown here; we convert it to a business-specific format (e.g. e0xx5fname5lname represents 0-fname-lname). The current batch program runs for hours to write this data, so I am hoping Spark can process it more efficiently, and would appreciate help from the experts here.
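One way to sketch this with Spark's Java API (purely illustrative: the toy data, output path, and pre-formatted strings are assumptions, and the business-format conversion is elided) is to key each dataset by empno and cogroup them, so each employee's segment is emitted in a single pass:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class EmployeeSegments {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("emp-segments").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Toy versions of the three datasets, keyed by empno and already
            // rendered in the business format.
            JavaPairRDD<Integer, String> emp = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>(0, "0-fname-lname"), new Tuple2<>(1, "1-fname1-lname1")));
            JavaPairRDD<Integer, String> dept = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>(0, "0-1-a"), new Tuple2<>(0, "0-1-b"),
                new Tuple2<>(1, "1-1-a"), new Tuple2<>(1, "1-2-a")));
            JavaPairRDD<Integer, String> hist = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>(0, "0-1-xyz"), new Tuple2<>(0, "0-2-abc"),
                new Tuple2<>(1, "1-1-123"), new Tuple2<>(1, "1-2-456"),
                new Tuple2<>(1, "1-3-a12")));

            // cogroup collects, per empno, the records from all three datasets.
            emp.cogroup(dept, hist).map(t -> {
                StringBuilder sb = new StringBuilder("seg-start emp-" + t._1() + "\n");
                t._2()._1().forEach(e -> sb.append("seg-emp ").append(e).append("\n"));
                t._2()._2().forEach(d -> sb.append("seg-dept ").append(d).append("\n"));
                t._2()._3().forEach(h -> sb.append("seg-his ").append(h).append("\n"));
                return sb.append("seg-end emp-" + t._1()).toString();
            }).saveAsTextFile("emp-segments-out");
        }
    }
}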

index out of bound exception in hibernate3

In my project I store and retrieve timesheets week by week.
I have a table like this:
Id projectId activityId Date Spenttime
1 1 1 2014-11-10 8
2 1 1 2014-11-11 8
3 1 1 2014-11-12 8
4 1 1 2014-11-13 8
5 1 1 2014-11-14 8
6 1 1 2014-11-15 8
7 1 1 2014-11-16 8
8 1 2 2014-11-10 8
9 1 2 2014-11-11 8
10 1 2 2014-11-12 8
11 1 2 2014-11-13 8
12 1 2 2014-11-14 8
13 1 2 2014-11-15 8
14 1 2 2014-11-16 8
15 2 1 2014-11-15 8
16 2 1 2014-11-16 8
I want the result for the above table to look like this:
projectId activityId 2014-11-10 2014-11-11 2014-11-12 2014-11-13 2014-11-14 2014-11-15 2014-11-16
1 1 8 8 8 8 8 8 8
1 2 8 8 8 8 8 8 8
2 1 0 0 0 0 0 8 8
My Hibernate code for querying the above table:
List<Timesheet> timesheetList = sessionFactory.getCurrentSession()
    .createCriteria(Timesheet.class)
    .add(Restrictions.between("date", formatter.parse("2014-11-09"), formatter.parse("2014-11-16")))
    .list();
Retrieve logic:
List<DisplayTable> display = new ArrayList<DisplayTable>();
for (int i = 0; i < timesheetList.size(); i += 7) {
    DisplayTable disp = new DisplayTable();
    disp.setProjectId(timesheetList.get(i).getProjectId());
    disp.setActivityId(timesheetList.get(i).getActivityId());
    disp.setSpentTimeDate1(timesheetList.get(i).getSpentTime());
    disp.setSpentTimeDate2(timesheetList.get(i + 1).getSpentTime());
    disp.setSpentTimeDate3(timesheetList.get(i + 2).getSpentTime());
    disp.setSpentTimeDate4(timesheetList.get(i + 3).getSpentTime());
    disp.setSpentTimeDate5(timesheetList.get(i + 4).getSpentTime());
    disp.setSpentTimeDate6(timesheetList.get(i + 5).getSpentTime());
    disp.setSpentTimeDate7(timesheetList.get(i + 6).getSpentTime());
    display.add(disp);
}
The above logic works fine for the first two iterations; after that it throws an IndexOutOfBoundsException.
I know the exception is thrown because project 2 contains only 2 rows.
Is there any way to achieve the desired result in Hibernate 3?
Any help will be greatly appreciated!
Change the loop condition to
i < timesheetList.size() - 6
because the loop body reads up to index i + 6, which must stay inside the list.
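Applied to the loop above, only the condition changes (note that with this bound any trailing group of fewer than 7 rows, such as project 2 here, is skipped rather than read out of bounds):

// The last index read is i + 6, so it must be < timesheetList.size().
for (int i = 0; i < timesheetList.size() - 6; i += 7) {
    DisplayTable disp = new DisplayTable();
    disp.setProjectId(timesheetList.get(i).getProjectId());
    // ... remaining setters exactly as before ...
    display.add(disp);
}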

Select single row for column value

Here is sample table data, which is dynamic.
ColId Name JobId Instance
1 aaaaaaaaa 1 2dc757b
2 bbbbbbbbb 1 2dc757b
3 aaaaaaaaa 1 010dbb8
4 bbbbbbbbb 1 010dbb8
5 bbbbbbbbb 1 faa2733
6 aaaaaaaaa 1 faa2733
7 aaaaaaaaa 1 bc13d69
8 aaaaaaaaa 1 9428f4d
I want output like
ColId Name JobId Instance
1 aaaaaaaaa 1 2dc757b
3 aaaaaaaaa 1 010dbb8
5 bbbbbbbbb 1 faa2733
7 aaaaaaaaa 1 bc13d69
8 aaaaaaaaa 1 9428f4d
What should the JPA query be so that I can retrieve one entire row per 'Instance' value (there is no max/min condition involved)?
I need one row for each 'Instance' value.
FROM table t GROUP BY t.instance should suit your needs.
Something like the JPQL "select entity from Entity entity where entity.id in (select min(subinstance.id) from Entity subinstance group by subinstance.instance)".
Functions like count, min, avg etc. are allowed over columns not included in the group by clause, so any such function should work here as long as it returns a single id value per group.
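Executed through an EntityManager, that query could look like this (a sketch; the entity is assumed to be mapped as Entity with id and instance fields):

import java.util.List;
import javax.persistence.EntityManager;

// em is an injected EntityManager. The subquery picks the smallest id
// in each instance group, so exactly one full row per instance is returned.
List<Entity> rows = em.createQuery(
        "select e from Entity e where e.id in "
      + "(select min(s.id) from Entity s group by s.instance)",
        Entity.class)
    .getResultList();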
