Count Unique Text and value count

I would like to get the count of unique values (text, numbers, etc.) in each column.
A B
2 BADER 111
3 FAISA 112
4 NASSE 113
5 NASSE 113
6 MOHS 121
7 ASI 122
8 AHME 100
9 AHME 100
10 AHME 100
11 ASI 122
The result should look like this, with the unique counts in the last row:
A B
2 BADER 111
3 FAISA 112
4 NASSE 113
5 NASSE 113
6 MOHS 121
7 ASI 122
8 AHME 100
9 AHME 100
10 AHME 100
11 ASI 122
6 6

For the number of different values in A2:A11, try this formula:
=SUMPRODUCT((A2:A11<>"")/COUNTIF(A2:A11,A2:A11&""))
That will work for both numeric and text values.
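Since the question is filed under Java, here is a minimal sketch of the same count done in code rather than in a spreadsheet. The class and method names are made up for illustration, and blank cells are skipped just as the formula skips them. One difference to keep in mind: COUNTIF ignores case, while this comparison is case-sensitive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DistinctCount {
    // Counts distinct non-blank entries; null and "" are ignored.
    static int countDistinct(List<String> values) {
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            if (v != null && !v.isEmpty()) {
                seen.add(v);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Columns A and B from the question.
        List<String> columnA = Arrays.asList("BADER", "FAISA", "NASSE", "NASSE",
                "MOHS", "ASI", "AHME", "AHME", "AHME", "ASI");
        List<String> columnB = Arrays.asList("111", "112", "113", "113",
                "121", "122", "100", "100", "100", "122");
        System.out.println(countDistinct(columnA) + " " + countDistinct(columnB)); // prints "6 6"
    }
}
```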

Related

Select the first duplicated element

I have 2 classes in Java (MODEL1 and MODEL2).
As you can see here:
MODEL1_ID ACTN_DTE MODEL2_ID
---------- --------- --------------
1 14/11/19 18
1000 14/11/19 4
1001 14/11/19 19
1002 14/11/19 4
1003 14/11/19 4
1004 14/11/19 18
2000 14/11/19 5
I am trying to find a way, with SQL or HQL, to get all the elements of MODEL1 whose MODEL2_ID is in a given list, keeping only the first MODEL1 (minimum MODEL1_ID) per MODEL2_ID in case it is duplicated.
Example:
Input: MODEL2_ID in (18,4,19,5)
MODEL1_ID ACTN_DTE MODEL2_ID
---------- --------- --------------
1 14/11/19 18
1000 14/11/19 4
1001 14/11/19 19
2000 14/11/19 5

select MIN(MODEL1_ID) FROM table GROUP BY (MODEL2_ID)

It is possible that by "first" you mean the minimum actn_date and the question just has a useless sample of data (because all the values are the same).
If so, you can use aggregation with keep to get the first value by actn_date:
select model2_id, min(actn_date) as actn_date,
min(model1_id) keep (dense_rank first order by actn_date) as model1_id
from t
group by model2_id;
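If the rows are already loaded on the Java side, the same "minimum MODEL1_ID per MODEL2_ID" selection can be sketched in memory. This is only an illustration under assumptions: the Model1 holder class and the method name are hypothetical, and only the two ID columns from the question are modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class FirstPerGroup {
    // Hypothetical row holder; fields follow the question's column names.
    static final class Model1 {
        final long model1Id;
        final long model2Id;

        Model1(long model1Id, long model2Id) {
            this.model1Id = model1Id;
            this.model2Id = model2Id;
        }
    }

    // Maps each requested MODEL2_ID to the smallest MODEL1_ID that references it.
    static Map<Long, Long> minModel1PerModel2(List<Model1> rows, Set<Long> wanted) {
        Map<Long, Long> firstPerGroup = new TreeMap<>();
        for (Model1 r : rows) {
            if (wanted.contains(r.model2Id)) {
                firstPerGroup.merge(r.model2Id, r.model1Id, Long::min);
            }
        }
        return firstPerGroup;
    }

    public static void main(String[] args) {
        List<Model1> rows = Arrays.asList(
                new Model1(1, 18), new Model1(1000, 4), new Model1(1001, 19),
                new Model1(1002, 4), new Model1(1003, 4), new Model1(1004, 18),
                new Model1(2000, 5));
        Set<Long> wanted = new HashSet<>(Arrays.asList(18L, 4L, 19L, 5L));
        System.out.println(minModel1PerModel2(rows, wanted)); // {4=1000, 5=2000, 18=1, 19=1001}
    }
}
```

Doing the grouping in the database is usually preferable, since it avoids loading the duplicate rows at all.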

Regexp for german phone number format

I am trying to extract phone numbers in German formats from a string, but I can't get it to work fully. The input text is a full HTML page with lots of content, not only the numbers.
Possible Formats:
(06442) 3933023
(02852) 5996-0
(042) 1818 87 9919
06442 / 3893023
06442 / 38 93 02 3
06442/3839023
042/ 88 17 890 0
+49 221 549144 – 79
+49 221 - 542194 79
+49 (221) - 542944 79
0 52 22 - 9 50 93 10
+49(0)121-79536 - 77
+49(0)2221-39938-113
+49 (0) 1739 906-44
+49 (173) 1799 806-44
0173173990644
0214154914479
02141 54 91 44 79
01517953677
+491517953677
015777953677
02162 - 54 91 44 79
(02162) 54 91 44 79
I have tried:
$regex = '~(?:\+?49|0)(?:\s*\d{3}){2}\s*\d{4,10}~';
if(preg_match_all($regex, $input_imprint , $matches)){
print_r($matches);
}
But it matches only a few of the formats. I have no idea how to do it.

Here is a regex that matches all your formats.
I would then suggest stripping all unwanted characters from each match to get your desired result.
(\(?([\d \-\)\–\+\/\(]+)\)?([ .\-–\/]?)([\d]+))
If you need a minimum length to match your numbers, use this:
(\(?([\d \-\)\–\+\/\(]+){6,}\)?([ .\-–\/]?)([\d]+))
https://regex101.com/r/CAVex8/143
Updated, thanks for the suggestion @Willi Mentzel.
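The "replace all unwanted characters" step could look roughly like this in Java. It is a sketch, not part of the answer: the class and method names are hypothetical, and it assumes that only the digits and a leading plus sign are worth keeping.

```java
class PhoneNormalizer {
    // Drops every character except digits, keeping a leading '+' if the input had one.
    static String normalize(String raw) {
        String trimmed = raw.trim();
        String digits = trimmed.replaceAll("[^0-9]", "");
        return trimmed.startsWith("+") ? "+" + digits : digits;
    }

    public static void main(String[] args) {
        System.out.println(normalize("+49 (0) 1739 906-44")); // +49017390644
        System.out.println(normalize("(06442) 3933023"));     // 064423933023
    }
}
```

Note that the "(0)" in the international formats is kept as a digit here; whether it should be dropped depends on how the numbers are used afterwards.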

[0-9]*\/*(\+49)*[ ]*(\([0-9]+\))*([ ]*(-|–)*[ ]*[0-9]+)*
Check this link: https://regex101.com/r/CAVex8/1
May introduce some false positives.

This one solved my problem (extracting phone numbers from emails):
r"\+?[0-9]+([0-9]|\/|\(|\)|\-| ){10,}"
An optional plus sign at the front, followed by at least 1 digit, followed by at least 10 digits or delimiting characters such as /, (, ), - or a space.
(There is no official "smallest number of digits" for a telephone number, but I assume they are all at least 11 digits long.)
I'm adding this because @Kakul's solution matched every line of my text, and with @despecial's my code would not terminate. (I am guessing it is too computationally expensive for my PC.)
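For anyone who wants to try that pattern from Java instead of Python, here is a minimal sketch. The class and method names are invented for the example. The pattern is the one quoted above, only re-escaped for a Java string literal. Because a space is one of the allowed delimiters, a match can end in a trailing space, so each match is trimmed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneExtractor {
    // Same pattern as in the answer, with backslashes doubled for Java.
    private static final Pattern PHONE =
            Pattern.compile("\\+?[0-9]+([0-9]|\\/|\\(|\\)|\\-| ){10,}");

    // Returns every substring of the text that the pattern matches.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PHONE.matcher(text);
        while (m.find()) {
            found.add(m.group().trim());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [+49 221 549144 79, 06442/3839023]
        System.out.println(extract("Tel: +49 221 549144 79, Fax: 06442/3839023."));
    }
}
```

A match has to start with a digit or a plus sign, so a number written as (02852) 5996-0 is picked up without its opening parenthesis.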

This is not a solution to the question asked, just some advice on matching phone numbers!
If you are about to store telephone numbers for the first time, then limit the number of different accepted formats. Get rid of these, for example:
(06442) 3933023
042/ 88 17 890 0
+49(0)121-79536 - 77
02162 - 54 91 44 79
Why?
The more formats you accept, the more ways of entering an invalid value you need to test.
These are the formats you absolutely need to consider according to DIN 5008:
0873 376461
03748 37682358
05444 347687-350
0764 812632-41
0180 2 12334
0800 5 23234213
+49 30 3432622-113
0179 1111111
Here is the regex I came up with:
^(([+]{1}[1-9]{1}[0-9]{0,2}[ ]{1}([1-9]{1}[0-9]{1,4}){1}[ ]{1}([1-9]{1}[0-9]{2,6}){1}([ -][0-9]{1,5})?)|([0]{1}[1-9]{1}[0-9]{1,4}[ ]{1}[0-9]{1,8}([ -][0-9]{1,8})?)?)
Positives:
06429 1111
06901 306180
06429 231
0800 3301000
0179 1111111
0873 376461
03748 37682358
05444 347687-350
0764 812632-41
0180 2 12334
0800 5 23234213
+49 6429 1111
+49 39857 2530
+55 11 2666-0054
+300 11 2666-0054
+49 641 20106 0
+49 641 20106
+49 30 3432622-113
Negatives:
++49 157 184977
+300 11 0000-0000
(06442) 3933023
(02852) 5996-0
(042) 1818 87 9919
06442 / 3893023
06442 / 38 93 02 3
06442/3839023
042/ 88 17 890 0
+49 221 - 542194 79
+49 (221) - 542944 79
0 52 22 - 9 50 93 10
+49(0)121-79536 - 77
+49(0)2221-39938-113
+49 (0) 1739 906-44
+49 (173) 1799 806-44
0173173990644
0214154914479
01517953677
+491517953677
015777953677
02162 - 54 91 44 79
(02162) 54 91 44 79
saddsadasdasd
asdasd
asdasd asdasd asd
asdasd
kjn asohas asdoiasd
23434 234 234 23
323
23434 234----234
///// ----
// id8834 3493934 //
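To check a single input field against this pattern from Java, something like the following sketch can be used. The class and method names are hypothetical. It relies on Matcher.matches(), so the whole input has to match. One thing to watch: the second alternative ends in an optional group, so the empty string also matches and probably needs to be rejected separately.

```java
import java.util.regex.Pattern;

class Din5008Check {
    // The pattern from the answer, copied unchanged (it contains no backslashes).
    private static final Pattern DIN_PHONE = Pattern.compile(
            "^(([+]{1}[1-9]{1}[0-9]{0,2}[ ]{1}([1-9]{1}[0-9]{1,4}){1}[ ]{1}([1-9]{1}[0-9]{2,6}){1}([ -][0-9]{1,5})?)"
            + "|([0]{1}[1-9]{1}[0-9]{1,4}[ ]{1}[0-9]{1,8}([ -][0-9]{1,8})?)?)");

    // True when the whole input is non-empty and matches the pattern.
    static boolean isValid(String input) {
        return !input.isEmpty() && DIN_PHONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("0179 1111111"));       // true
        System.out.println(isValid("+49 30 3432622-113")); // true
        System.out.println(isValid("(06442) 3933023"));    // false
        System.out.println(isValid("0173173990644"));      // false
    }
}
```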

Hey, I have a small enhancement for despecial's regex:
(\(?([\d \-\)\–\+\(]+\/?){6,}\)?([ .\-–\/]?)([\d]+))
It filters out numbers that contain too many occurrences of /.

organizing query result in sql developer (oracle 11g)

Currently I have a table called schedule in the database (SQL Developer).
Assuming the availableIDs are 1, 3, 7 and 8, the table contains something like this:
Stud Title Supervisor Examiner availableID
abc Hello 1024 1001 1
def Hi 1024 1001 1
ghi Hey 1002 1004 1
xxx hhh 1020 1011 1
jkl hhh 1027 1010 1
try ttt 1001 1011 1
654 bbb 1007 1012 1
gyg 888 1027 1051 1
yyi 333 1004 1022 3
fff 111 1027 1041 3
ggg 222 1032 1007 3
hhh 444 1007 1001 3
ppp 444 1005 1072 7
ooo 555 1067 1009 7
uuu 666 1030 1010 7
yyy 777 1004 1001 7
qqq yhh 1015 1072 8
www 767 1017 1029 8
eee 566 1030 1020 8
rrr 888 1004 1031 8
abc 5555 1045 1051 8
As you can see, I have sorted these values using ORDER BY availableID asc.
However, I would like to reorganize them into something like this:
Stud Title Supervisor Examiner availableID
abc Hello 1024 1001 1
def Hi 1024 1001 1
ghi Hey 1002 1004 1
xxx hhh 1020 1011 1
yyi 333 1004 1022 3
fff 111 1027 1041 3
ggg 222 1032 1007 3
hhh 444 1007 1001 3
ppp 444 1005 1072 7
ooo 555 1067 1009 7
uuu 666 1030 1010 7
yyy 777 1004 1001 7
qqq yhh 1015 1072 8
www 767 1017 1029 8
eee 566 1030 1020 8
rrr 888 1004 1031 8
jkl hhh 1027 1010 1
try ttt 1001 1011 1
654 bbb 1007 1012 1
gyg 888 1027 1051 1
........
abc 5555 1045 1051 8
Each availableID should appear four times before moving on to the next availableID. After the highest ID, the order goes back to the lowest ID with its next four rows, and so on. Stud must be distinct.
Is it possible to achieve this with an SQL query?

You can do this with row_number() and some arithmetic. Something like:
select t.*
from (select t.*,
             row_number() over (partition by availableid order by stud) as seqnum
      from t
     ) t
order by trunc((seqnum - 1) / 4), availableid
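To see why the arithmetic produces blocks of four, the same ordering can be reproduced in plain Java. This is only a sketch to illustrate the idea: the Row class and method name are hypothetical, and only stud and availableID are modelled. Each row gets a 1-based sequence number within its availableID, and integer division of (seqnum - 1) by 4 gives the block the row belongs to.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class BlockOrder {
    // Hypothetical row holder with just the two columns that drive the ordering.
    static final class Row {
        final String stud;
        final int availableId;

        Row(String stud, int availableId) {
            this.stud = stud;
            this.availableId = availableId;
        }
    }

    static List<Row> order(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        // Equivalent of: partition by availableid order by stud
        sorted.sort(Comparator.comparingInt((Row r) -> r.availableId).thenComparing(r -> r.stud));

        // Equivalent of row_number(): 1, 2, 3, ... within each availableId.
        Map<Integer, Integer> counters = new HashMap<>();
        Map<Row, Integer> seqnum = new IdentityHashMap<>();
        for (Row r : sorted) {
            seqnum.put(r, counters.merge(r.availableId, 1, Integer::sum));
        }

        // Equivalent of: order by trunc((seqnum - 1) / 4), availableid
        // The sort is stable, so rows keep their stud order inside each block.
        sorted.sort(Comparator.comparingInt((Row r) -> (seqnum.get(r) - 1) / 4)
                .thenComparingInt(r -> r.availableId));
        return sorted;
    }
}
```

With six rows for availableID 1 and two rows for availableID 3, the result is the first four rows of ID 1, then both rows of ID 3, then the remaining two rows of ID 1.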

A slightly different but equivalent approach to the one above, using floor and both partitioning and ordering by the same availableID column:
select a.stud,
a.title,
a.supervisor,
a.examiner,
a.availableID
from ( select s.*,row_number() over (partition by availableID order by availableID) rn
from student s) a
order by floor((rn-1)/4),availableID

I got a different result when I retrained the sentiment model with Stanford CoreNLP to compare with the related paper's result

I downloaded stanford-corenlp-full-2015-12-09
and trained a model with the following command:
java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz
When training finished, I found many model files in my directory.
(Image: the model list)
Then I used the evaluation tool from the package and ran it like this:
java -cp * edu.stanford.nlp.sentiment.Evaluate -model model-0024-79.82.ser.gz -treebank test.txt
The test.txt file was from trainDevTestTrees_PTB.zip. This is the output:
F:\trainDevTestTrees_PTB\trees>java -cp * edu.stanford.nlp.sentiment.Evaluate -model model-0024-79.82.ser.gz -treebank test.txt
EVALUATION SUMMARY
Tested 82600 labels
65331 correct
17269 incorrect
0.790932 accuracy
Tested 2210 roots
890 correct
1320 incorrect
0.402715 accuracy
Label confusion matrix
Guess/Gold          0      1      2      3      4  Marg. (Guess)
0                 551    340     87     32      6           1016
1                 956   5348   2476    686    191           9657
2                 354   2812  51386   3097    467          58116
3                 146    744   2525   6804   1885          12104
4                   1     11     74    379   1242           1707
Marg. (Gold)     2008   9255  56548  10998   3791
0 prec=0.54232, recall=0.2744, spec=0.99423, f1=0.36442
1 prec=0.5538, recall=0.57785, spec=0.94125, f1=0.56557
2 prec=0.8842, recall=0.90871, spec=0.74167, f1=0.89629
3 prec=0.56213, recall=0.61866, spec=0.92598, f1=0.58904
4 prec=0.72759, recall=0.32762, spec=0.9941, f1=0.4518
Root label confusion matrix
Guess/Gold          0      1      2      3      4  Marg. (Guess)
0                  50     60     12      9      3            134
1                 161    370    147     94     36            808
2                  31    103    102     60     32            328
3                  36     97    123    305    265            826
4                   1      3      5     42     63            114
Marg. (Gold)      279    633    389    510    399
0 prec=0.37313, recall=0.17921, spec=0.9565, f1=0.24213
1 prec=0.45792, recall=0.58452, spec=0.72226, f1=0.51353
2 prec=0.31098, recall=0.26221, spec=0.87589, f1=0.28452
3 prec=0.36925, recall=0.59804, spec=0.69353, f1=0.45659
4 prec=0.55263, recall=0.15789, spec=0.97184, f1=0.24561
Approximate Negative label accuracy: 0.638817
Approximate Positive label accuracy: 0.697140
Combined approximate label accuracy: 0.671925
Approximate Negative root label accuracy: 0.702851
Approximate Positive root label accuracy: 0.742574
Combined approximate root label accuracy: 0.722680
The fine-grained and positive/negative accuracies were quite different from those in the paper "Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y. and Potts, C., 2013, October. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing (EMNLP) (Vol. 1631, p. 1642)."
The paper reports higher fine-grained and positive/negative accuracy than I obtained.
(Image: the results reported in the paper)
Was there a problem with my procedure? Why are my results different from the paper's?

The short answer is that the paper used a different system written in Matlab. The Java system does not match the paper, though we do distribute the binary model we trained in Matlab with the English models jar. So you can RUN the binary model with Stanford CoreNLP, but you cannot TRAIN a binary model with similar performance with Stanford CoreNLP at this time.

index out of bound exception in hibernate3

In my project I store and retrieve the timesheet week by week.
I have a table like this:
Id projectId activityId Date Spenttime
1 1 1 2014-11-10 8
2 1 1 2014-11-11 8
3 1 1 2014-11-12 8
4 1 1 2014-11-13 8
5 1 1 2014-11-14 8
6 1 1 2014-11-15 8
7 1 1 2014-11-16 8
8 1 2 2014-11-10 8
9 1 2 2014-11-11 8
10 1 2 2014-11-12 8
11 1 2 2014-11-13 8
12 1 2 2014-11-14 8
13 1 2 2014-11-15 8
14 1 2 2014-11-16 8
15 2 1 2014-11-15 8
16 2 1 2014-11-16 8
I want the result for the above table to look like this:
projectId activityId 2014-11-10 2014-11-11 2014-11-12 2014-11-13 2014-11-14 2014-11-15 2014-11-16
1 1 8 8 8 8 8 8 8
1 2 8 8 8 8 8 8 8
2 1 0 0 0 0 0 8 8
My Hibernate code for the above table:
List<Timesheet> timesheetList = sessionfactory.getCurrentSession().createCriteria(Timesheet.class)
    .add(Restrictions.between("date", formatter.parse("2014-11-09"), formatter.parse("2014-11-16")))
    .list();
Retrieve logic:
List<DisplayTable> display = new ArrayList<DisplayTable>();
for (int i = 0; i < timesheetList.size(); i += 7)
{
    DisplayTable disp = new DisplayTable();
    disp.setProjectId(timesheetList.get(i).getProjectId());
    disp.setActivityId(timesheetList.get(i).getActivityId());
    disp.setSpentTimeDate1(timesheetList.get(i).getSpentTime());
    disp.setSpentTimeDate2(timesheetList.get(i+1).getSpentTime());
    disp.setSpentTimeDate3(timesheetList.get(i+2).getSpentTime());
    disp.setSpentTimeDate4(timesheetList.get(i+3).getSpentTime());
    disp.setSpentTimeDate5(timesheetList.get(i+4).getSpentTime());
    disp.setSpentTimeDate6(timesheetList.get(i+5).getSpentTime());
    disp.setSpentTimeDate7(timesheetList.get(i+6).getSpentTime());
}
The above logic works fine for the first two iterations; after that it throws an IndexOutOfBoundsException.
I know the exception is thrown because project 2 contains only 2 rows.
Is there any way to achieve the desired result in Hibernate 3?
Any help will be greatly appreciated!

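Change the loop condition to
i < timesheetList.size() - 6
because get(i+6) must stay within the list. Note that this only avoids the exception: a trailing group with fewer than seven rows (project 2 here) is skipped rather than displayed.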
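A different way to build the weekly rows, one that does not depend on every project having exactly seven entries, is to group the entries by project and activity and place each one in the column for its date. The sketch below is only an illustration under assumptions: the Entry class stands in for Timesheet, java.time.LocalDate is used instead of whatever date type the entity really has, and the "projectId/activityId" string key is just a convenient way to group.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WeeklyPivot {
    // Hypothetical stand-in for the Timesheet entity.
    static final class Entry {
        final int projectId;
        final int activityId;
        final LocalDate date;
        final int spentTime;

        Entry(int projectId, int activityId, LocalDate date, int spentTime) {
            this.projectId = projectId;
            this.activityId = activityId;
            this.date = date;
            this.spentTime = spentTime;
        }
    }

    // One int[7] per "projectId/activityId"; days without an entry stay 0.
    static Map<String, int[]> pivot(List<Entry> entries, LocalDate weekStart) {
        Map<String, int[]> table = new LinkedHashMap<>();
        for (Entry e : entries) {
            int day = (int) ChronoUnit.DAYS.between(weekStart, e.date);
            if (day < 0 || day > 6) {
                continue; // outside the requested week
            }
            String key = e.projectId + "/" + e.activityId;
            table.computeIfAbsent(key, k -> new int[7])[day] += e.spentTime;
        }
        return table;
    }
}
```

For the sample data with a week starting on 2014-11-10, the row for project 2 and activity 1 comes out as 0 0 0 0 0 8 8, which matches the desired output in the question.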
