I'm new to the Spark Java API. My dataset contains two columns (account, Lib). I want to display the accounts that have different Lib values. My dataset looks something like this:
ds1
+---------+------------+
| account| Lib |
+---------+------------+
| 222222 | bbbb |
| 222222 | bbbb |
| 222222 | bbbb |
| | |
| 333333 | aaaa |
| 333333 | bbbb |
| 333333 | cccc |
| | |
| 444444 | dddd |
| 444444 | dddd |
| 444444 | dddd |
| | |
| 555555 | vvvv |
| 555555 | hhhh |
| 555555 | vvvv |
I want to get ds2 like this:
+---------+------------+
| account| Lib |
+---------+------------+
| | |
| 333333 | aaaa |
| 333333 | bbbb |
| 333333 | cccc |
| | |
| 555555 | vvvv |
| 555555 | hhhh |
If groups are small you can use window functions:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
df
  .withColumn("cnt", approx_count_distinct("Lib").over(Window.partitionBy("account")))
  .where(col("cnt") > 1)
If groups are large:
df.join(
  df
    .groupBy("account")
    .agg(countDistinct("Lib").alias("cnt"))
    .where(col("cnt") > 1),
  Seq("account"),
  "leftsemi"
)
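Since the question is about the Java API, it may help to see what both queries compute, written as plain Java collections code rather than Spark. This is only a sketch of the semantics (group by account, count distinct Lib values, keep the rows of accounts with more than one); the class name `DistinctLibFilter` and the `{account, lib}` array layout are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DistinctLibFilter {

    /** Keeps the rows whose account has more than one distinct Lib. Each row is {account, lib}. */
    static List<String[]> accountsWithSeveralLibs(List<String[]> rows) {
        // Equivalent of groupBy("account").agg(countDistinct("Lib")):
        // collect the distinct Lib values seen for each account.
        Map<String, Set<String>> libsByAccount = new HashMap<>();
        for (String[] row : rows) {
            libsByAccount.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
        }
        // Equivalent of where(cnt > 1) followed by the left semi join:
        // keep the original rows, in order, for accounts that qualify.
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (libsByAccount.get(row[0]).size() > 1) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> ds1 = Arrays.asList(
            new String[] {"222222", "bbbb"}, new String[] {"222222", "bbbb"},
            new String[] {"333333", "aaaa"}, new String[] {"333333", "bbbb"},
            new String[] {"555555", "vvvv"}, new String[] {"555555", "hhhh"},
            new String[] {"555555", "vvvv"});
        for (String[] row : accountsWithSeveralLibs(ds1)) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```

Like the two queries above, this keeps duplicate rows (555555/vvvv appears twice in the input and in the result). The ds2 shown in the question lists that pair only once, so matching it exactly would need a final de-duplication step.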
I create a temp table that combines several tables with a UNION ALL statement, as shown below. Later I want to map this table to an entity for a Spring repository. In other words, I want to map the temp table to an entity in Spring JPA or Hibernate.
select * from name UNION ALL
select * from soft where id >3
into temp namesoft_tmp
I tried the following.
select * from namesoft_tmp
but I can't see which column, if any, could serve as the primary key.
What is the unique id (primary key) of the table namesoft_tmp?
How can I add an auto-generated id to the temp table?
How can I execute a select statement based on the unique id?
In general, the result of a UNION ALL query does not have a primary key; there is no guarantee that there are not duplicate rows in the result set.
Imagine a table called elements that describes the periodic table of elements.
SELECT * FROM elements WHERE atomic_number < 10
UNION ALL
SELECT * FROM elements WHERE symbol MATCHES '[A-F]*'
INTO TEMP union_all;
Here, the elements Boron (B), Carbon (C), Beryllium (Be) and Fluorine (F) are all listed twice.
However, you can use:
SELECT ROWID, * FROM union_all ORDER BY atomic_number;
to get a unique identifier, the ROWID, in the result set. Note that this unique identifier is unique at any given time, but is not guaranteed to be stable. If you delete rows and add them again, the ROWID of the replaced rows may be different from before. But the ROWID will be unique until you edit the table.
+-------+--------+--------+--------------+-----------+--------+-------+
| rowid | atomic | symbol | name | atomic | period | group |
| | number | | | weight | | |
+-------+--------+--------+--------------+-----------+--------+-------+
| 257 | 1 | H | Hydrogen | 1.0079 | 1 | 1 |
| 258 | 2 | He | Helium | 4.0026 | 1 | 18 |
| 259 | 3 | Li | Lithium | 6.9410 | 2 | 1 |
| 260 | 4 | Be | Beryllium | 9.0122 | 2 | 2 |
| 266 | 4 | Be | Beryllium | 9.0122 | 2 | 2 |
| 267 | 5 | B | Boron | 10.8110 | 2 | 13 |
| 261 | 5 | B | Boron | 10.8110 | 2 | 13 |
| 268 | 6 | C | Carbon | 12.0110 | 2 | 14 |
| 262 | 6 | C | Carbon | 12.0110 | 2 | 14 |
| 263 | 7 | N | Nitrogen | 14.0070 | 2 | 15 |
| 264 | 8 | O | Oxygen | 15.9990 | 2 | 16 |
| 265 | 9 | F | Fluorine | 18.9980 | 2 | 17 |
| 269 | 9 | F | Fluorine | 18.9980 | 2 | 17 |
| 270 | 13 | Al | Aluminium | 26.9820 | 3 | 13 |
| 271 | 17 | Cl | Chlorine | 35.4530 | 3 | 17 |
| 272 | 18 | Ar | Argon | 39.9480 | 3 | 18 |
| 273 | 20 | Ca | Calcium | 40.0780 | 4 | 2 |
| 274 | 24 | Cr | Chromium | 51.9960 | 4 | 6 |
| 275 | 26 | Fe | Iron | 55.8450 | 4 | 8 |
| 276 | 27 | Co | Cobalt | 58.9330 | 4 | 9 |
| 277 | 29 | Cu | Copper | 63.5460 | 4 | 11 |
| 278 | 33 | As | Arsenic | 74.9220 | 4 | 15 |
| 279 | 35 | Br | Bromine | 79.9040 | 4 | 17 |
| 280 | 47 | Ag | Silver | 107.8700 | 5 | 11 |
| 281 | 48 | Cd | Cadmium | 112.4100 | 5 | 12 |
| 282 | 55 | Cs | Caesium | 132.9100 | 6 | 1 |
| 283 | 56 | Ba | Barium | 137.3300 | 6 | 2 |
| 284 | 58 | Ce | Cerium | 140.1200 | 6 | L |
| 285 | 63 | Eu | Europium | 151.9600 | 6 | L |
| 286 | 66 | Dy | Dysprosium | 162.5000 | 6 | L |
| 287 | 68 | Er | Erbium | 167.2600 | 6 | L |
| 288 | 79 | Au | Gold | 196.9700 | 6 | 11 |
| 289 | 83 | Bi | Bismuth | 208.9800 | 6 | 15 |
| 290 | 85 | At | Astatine | 209.9900 | 6 | 17 |
| 291 | 87 | Fr | Francium | 223.0200 | 7 | 1 |
| 292 | 89 | Ac | Actinium | 227.0300 | 7 | A |
| 293 | 95 | Am | Americium | 243.0600 | 7 | A |
| 294 | 96 | Cm | Curium | 247.0700 | 7 | A |
| 295 | 97 | Bk | Berkelium | 247.0700 | 7 | A |
| 296 | 98 | Cf | Californium | 251.0800 | 7 | A |
| 297 | 99 | Es | Einsteinium | 252.0800 | 7 | A |
| 298 | 100 | Fm | Fermium | 257.1000 | 7 | A |
| 299 | 105 | Db | Dubnium | 270.1300 | 7 | 5 |
| 300 | 107 | Bh | Bohrium | 270.1300 | 7 | 7 |
| 301 | 110 | Ds | Darmstadtium | 281.1700 | 7 | 10 |
| 302 | 112 | Cn | Copernicium | 285.1800 | 7 | 12 |
| 303 | 114 | Fl | Flerovium | 289.1900 | 7 | 14 |
+-------+--------+--------+--------------+-----------+--------+-------+
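To make the point about duplicates concrete, here is a small plain-Java sketch that mimics the two branches of the UNION ALL above on a handful of elements and tags each result row with a generated number. It is not Informix code: the `UnionAllIds` and `Row` names, the ten-element sample, and the sequential `id` starting at 1 are assumptions for illustration only, and the real ROWID values are chosen by the server.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UnionAllIds {

    /** One row of the emulated union_all result, tagged with a generated id. */
    static final class Row {
        final int id;
        final int atomicNumber;
        final String symbol;

        Row(int id, int atomicNumber, String symbol) {
            this.id = id;
            this.atomicNumber = atomicNumber;
            this.symbol = symbol;
        }
    }

    /** Emulates: WHERE atomic_number < 10 UNION ALL WHERE symbol MATCHES '[A-F]*'. */
    static List<Row> unionAll(int[] numbers, String[] symbols) {
        List<Row> result = new ArrayList<>();
        int nextId = 1; // stand-in for a generated identifier, not how ROWIDs are assigned
        for (int i = 0; i < numbers.length; i++) {
            if (numbers[i] < 10) {
                result.add(new Row(nextId++, numbers[i], symbols[i]));
            }
        }
        for (int i = 0; i < numbers.length; i++) {
            char first = symbols[i].charAt(0);
            if (first >= 'A' && first <= 'F') {
                result.add(new Row(nextId++, numbers[i], symbols[i]));
            }
        }
        return result;
    }

    /** Counts rows whose data columns repeat an earlier row, ignoring the generated id. */
    static long countDuplicateDataRows(List<Row> rows) {
        Set<String> seen = new HashSet<>();
        long duplicates = 0;
        for (Row r : rows) {
            if (!seen.add(r.atomicNumber + ":" + r.symbol)) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 13};
        String[] symbols = {"H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Al"};
        List<Row> rows = unionAll(numbers, symbols);
        System.out.println(rows.size() + " rows, " + countDuplicateDataRows(rows) + " duplicated");
    }
}
```

The data columns alone cannot identify a row, because Be, B, C and F each occur twice, while the generated number is unique per row. That is the role ROWID plays in the query above.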
For the past year, I've been developing a system that uses ANTLR for parsing. Now that it is in UAT and running on a Tomcat server, we regularly have performance issues caused by threads that run indefinitely while attempting to parse certain street addresses. For instance, we've found some that ran for over 7.5 hours before we caught and killed them. When enough threads get into this state, they tie up all the cores and cause every request sent to the system to time out. Although we've upgraded to ANTLR 4.7 (previously 4.5.3) and reduced the frequency of the problem through grammar changes over the past few weeks, we still haven't been able to fully resolve the issue. Below are the stack trace for one of these threads and the grammar. Does anyone have an idea of the cause?
"https-openssl-apr-8443-exec-17" #111 daemon prio=5 os_prio=0 tid=0x00007faa78018800 nid=0x3e91 runnable [0x00007fa9ef7f6000]
java.lang.Thread.State: RUNNABLE
at org.antlr.v4.runtime.misc.Array2DHashSet.getOrAdd(Array2DHashSet.java:59)
at org.antlr.v4.runtime.atn.ATNConfigSet.add(ATNConfigSet.java:146)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1529)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1496)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1496)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure_(ParserATNSimulator.java:1583)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closureCheckingStopState(ParserATNSimulator.java:1513)
at org.antlr.v4.runtime.atn.ParserATNSimulator.closure(ParserATNSimulator.java:1448)
at org.antlr.v4.runtime.atn.ParserATNSimulator.computeReachSet(ParserATNSimulator.java:856)
at org.antlr.v4.runtime.atn.ParserATNSimulator.execATNWithFullContext(ParserATNSimulator.java:664)
at org.antlr.v4.runtime.atn.ParserATNSimulator.execATN(ParserATNSimulator.java:505)
at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:393)
at com.americanpg.alis.parsers.StreetAddressParser.streetName(StreetAddressParser.java:2000)
at com.americanpg.alis.parsers.StreetAddressParser.addr1(StreetAddressParser.java:1857)
at com.americanpg.alis.parsers.StreetAddressParser.address(StreetAddressParser.java:1207)
at com.americanpg.alis.parsers.StreetAddressParser.prog(StreetAddressParser.java:153)
at com.americanpg.alis.common.antlr.runners.StreetAddressRunner.parseStreetAddress(StreetAddressRunner.java:39)
at com.americanpg.alis.common.engine.services.ServiceHelper.parseAddress(ServiceHelper.java:656)
at com.americanpg.alis.common.engine.objects.ParcelMatchCache.getAddress(ParcelMatchCache.java:49)
at com.americanpg.alis.common.engine.services.ServiceHelper.rank(ServiceHelper.java:75)
at com.americanpg.alis.common.engine.services.ServiceHelper.rankAndFilter(ServiceHelper.java:168)
at com.americanpg.alis.common.engine.services.ParcelService.locate(ParcelService.java:61)
at com.americanpg.alis.webapp.dispatcher.controller.core.LocateController.locate(LocateController.java:42)
at sun.reflect.GeneratedMethodAccessor248.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:116)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:963)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:897)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:478)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:799)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:861)
at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2268)
- locked <0x00000006c3de5fe8> (a org.apache.tomcat.util.net.AprEndpoint$AprSocketWrapper)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
GRAMMAR
grammar StreetAddress;
prog: (address SEPARATOR* (NEWLINE | <EOF>))+;
address: /* 15 alts */
| addr1
;
streetNumber: DIGIT+ (SEPARATOR (streetFractional | DIGIT+))?
| DIGIT+ '-'? NAME (SEPARATOR (streetFractional | DIGIT+))?
| (DIGIT+ (SEPARATOR? '/' SEPARATOR? DIGIT+)+ | DIGIT+ SEPARATOR? '-' SEPARATOR? DIGIT+) (SEPARATOR (streetFractional | DIGIT+))?
;
streetFractional: DIGIT SEPARATOR? '/' SEPARATOR? DIGIT+;
addr1: streetDirectional SEPARATOR (streetNameSpanish | streetName) SEPARATOR streetSuffix (SEPARATOR streetPostdirectional)??
| streetDirectional SEPARATOR (streetNameSpanish | streetName) (SEPARATOR streetPostdirectional)??
| (streetNameSpanish | streetName) SEPARATOR streetSuffix (SEPARATOR streetPostdirectional)??
| (streetNameSpanish | streetName) (SEPARATOR streetPostdirectional)??
;
streetName: (NAME | STREETSUFFIX | STREETSUFFIXABBREVIATED | VISTA | AVE | CIR | BUSINESS) (SEPARATOR (NAME | STREETSUFFIX | STREETSUFFIXABBREVIATED | VISTA | AVE | CIR | BUSINESS))* (SEPARATOR DIGIT+ NAME?) (SEPARATOR DIRECTIONAL)? (SEPARATOR BUSINESS)? // HWY 35 E? S^, COUNTY ROAD 913
| (NAME | STREETSUFFIX | STREETSUFFIXABBREVIATED | UNITPREFIXNONUMBER | DE | MI | BUSINESS) (SEPARATOR (NAME | DIRECTIONAL | BUSINESS))* (SEPARATOR (STREETSUFFIX | VISTA | AVE | CIR))? // ST PAUL ST, JOSEPH E LOWERY BLVD
| (DIRECTIONAL SEPARATOR)? NUMBEREDSTREET (SEPARATOR DIGIT SEPARATOR? '/' SEPARATOR? DIGIT+)? (SEPARATOR STREETSUFFIX)? //34th 1/2 ST
| (NAME | STREETSUFFIX | STREETSUFFIXABBREVIATED | VISTA | AVE | CIR) (SEPARATOR (NAME | STREETSUFFIX | STREETSUFFIXABBREVIATED | VISTA | AVE | CIR))* (SEPARATOR DIRECTIONAL) (SEPARATOR (STREETSUFFIX | VISTA | AVE | CIR))? //Great Southwest PKWY
| DIRECTIONAL (SEPARATOR NAME)* (SEPARATOR (STREETSUFFIX | VISTA | AVE | CIR))? //Northwest HWY
| (SPANISHSTREETPREFIX | VISTA) (SEPARATOR (NAME | STREETSUFFIX | STREETSUFFIXABBREVIATED | SPANISHSTREETPREFIX | VISTA | AVE | CIR | BUSINESS))*
| NAME
;
streetNameSpanish: (SPANISHSTREETPREFIX | SPANISHSTREETPREFIXABBREVIATED | VISTA | AVE | CIR) (SEPARATOR NAME)+? //VISTA MONTANA
| (SPANISHSTREETPREFIX | SPANISHSTREETPREFIXABBREVIATED | VISTA | AVE | CIR) SEPARATOR DE SEPARATOR (LA | LO) (SEPARATOR NAME)+? //CALLE DE LA MESA
| (SPANISHSTREETPREFIX | SPANISHSTREETPREFIXABBREVIATED | VISTA | AVE | CIR) SEPARATOR DE SEPARATOR (LAS | LOS) (SEPARATOR NAME)+? //AVENIDA DE LAS AMERICAS
| (SPANISHSTREETPREFIX | SPANISHSTREETPREFIXABBREVIATED | VISTA | AVE | CIR) SEPARATOR (DE | DEL) (SEPARATOR NAME)+? //CAMINO DE CLARA, CALLE DEL REY
| (LA | LO | LAS | LOS | DEL) (SEPARATOR NAME)+ //LAS CARINO, DEL TORO
;
streetSuffix: (CT | VISTA | AVE | CIR | STREETSUFFIX | STREETSUFFIXABBREVIATED) '.'?;
streetPostdirectional: DIRECTIONAL
| NE
;
DIGIT: [0-9];
//List of state abbreviations, some of which must be used in other parts of an address
AL: 'AL' | 'Al' | 'al';
AK: 'AK' | 'Ak' | 'ak';
AZ: 'AZ' | 'Az' | 'az';
AR: 'AR' | 'Ar' | 'ar';
CA: 'CA' | 'Ca' | 'ca';
CO: 'CO' | 'Co' | 'co';
CT: 'CT' | 'Ct' | 'ct';
DC: 'DC' | 'Dc' | 'dc';
DE: 'DE' | 'De' | 'de';
FL: 'FL' | 'Fl' | 'fl';
GA: 'GA' | 'Ga' | 'ga';
HI: 'HI' | 'Hi' | 'hi';
ID: 'ID' | 'Id' | 'id';
IL: 'IL' | 'Il' | 'il';
IN: 'IN' | 'In' | 'in';
IA: 'IA' | 'Ia' | 'ia';
KS: 'KS' | 'Ks' | 'ks';
KY: 'KY' | 'Ky' | 'ky';
LA: 'LA' | 'La' | 'la';
ME: 'ME' | 'Me' | 'me';
MD: 'MD' | 'Md' | 'md';
MA: 'MA' | 'Ma' | 'ma';
MI: 'MI' | 'Mi' | 'mi';
MN: 'MN' | 'Mn' | 'mn';
MS: 'MS' | 'Ms' | 'ms';
MO: 'MO' | 'Mo' | 'mo';
MT: 'MT' | 'Mt' | 'mt';
NE: 'NE' | 'Ne' | 'ne';
NV: 'NV' | 'Nv' | 'nv';
NH: 'NH' | 'Nh' | 'nh';
NJ: 'NJ' | 'Nj' | 'nj';
NM: 'NM' | 'Nm' | 'nm';
NY: 'NY' | 'Ny' | 'ny';
NC: 'NC' | 'Nc' | 'nc';
ND: 'ND' | 'Nd' | 'nd';
OH: 'OH' | 'Oh' | 'oh';
OK: 'OK' | 'Ok' | 'ok';
OR: 'OR' | 'Or' | 'or';
PA: 'PA' | 'Pa' | 'pa';
RI: 'RI' | 'Ri' | 'ri';
SC: 'SC' | 'Sc' | 'sc';
SD: 'SD' | 'Sd' | 'sd';
TN: 'TN' | 'Tn' | 'tn';
TX: 'TX' | 'Tx' | 'tx';
UT: 'UT' | 'Ut' | 'ut';
VT: 'VT' | 'Vt' | 'vt';
VA: 'VA' | 'Va' | 'va';
WA: 'WA' | 'Wa' | 'wa';
WV: 'WV' | 'Wv' | 'wv';
WI: 'WI' | 'Wi' | 'wi';
WY: 'WY' | 'Wy' | 'wy';
LO: 'LO' | 'Lo' | 'lo';
LAS: 'LAS' | 'Las' | 'las';
LOS: 'LOS' | 'Los' | 'los';
DEL: 'DEL' | 'Del' | 'del';
VISTA: 'VISTA' | 'Vista' | 'vista' | 'VIS' | 'Vis' | 'vis';
AVE: 'AVE' | 'Ave' | 'ave';
CIR: 'CIR' | 'Cir' | 'cir';
MC: 'MC ' | 'Mc ' | 'mc ';
SAN: 'SAN ' | 'San ' | 'san ';
REV_DR: ('REVEREND' | 'Reverend' | 'reverend' | ('REV' | 'Rev' | 'rev') '.'?) SEPARATOR ('DOCTOR' | 'Doctor' | 'doctor' | ('DR' | 'Dr' | 'dr') '.'?) SEPARATOR;
BUSINESS: ('BUSINESS' | 'Business' | 'business' | ( 'BUS' | 'Bus' | 'bus' | 'BSNS' | 'Bsns' | 'bsns' | 'BUSINES' | 'Busines' | 'busines' | 'BUSN' | 'Busn' | 'busn') '.'?)
(SEPARATOR ( STREETSUFFIX | STREETSUFFIXABBREVIATED ))?
;
DIRECTIONAL: 'NORTH' | 'SOUTH' | 'EAST' | 'WEST' | 'North' | 'South' | 'East' | 'West' | 'north' | 'south' | 'east' | 'west'
| 'NORTHWEST' | 'SOUTHWEST' | 'NORTHEAST' | 'SOUTHEAST' | 'Northwest' | 'Southwest' | 'Northeast' | 'Southeast'
| 'NorthWest' | 'SouthWest' | 'NorthEast' | 'SouthEast' | 'northwest' | 'southwest' | 'northeast' | 'southeast'
| 'N' | 'S' | 'E' | 'W' | 'n' | 's' | 'e' | 'w' | 'NW' | 'SW' | /*NE |*/ 'SE' | 'Nw' | 'Sw' | 'Se' | 'nw' | 'sw' | 'se'
; //NE, a lexer rule, has been commented here and coded in the parser to avoid ambiguity
STREETSUFFIX: 'Allee' | 'Alley' | 'Ally' | 'Anex' | 'Annex' | 'Annx' | 'Arcade' | 'Av' | 'Aven' | 'Avenu' | 'Avenue' | 'Avn' | 'Avnue' | 'Bayoo' | 'Bayou' | 'Beach' | 'Bend' | 'Bluf' | 'Bluff' | 'Bluffs' | 'Bot' | 'Bottm' | 'Bottom' | 'Boul' | 'Boulevard' | 'Boulv' | 'Brnch' | 'Branch' | 'Brdge' | 'Bridge' | 'Brook' | 'Brooks' | 'Burg' | 'Burgs' | 'Bypa' | 'Bypas' | 'Bypass' | 'Byps' | 'Camp' | 'Cmp' | 'Canyn' | 'Canyon' | 'Cnyn' | 'Cape' | 'Causeway' | 'Causwa' | 'Cen' | 'Cent' | 'Center' | 'Centr' | 'Centre' | 'Cnter' | 'Cntr' | 'Ctr' | 'Centers' | 'Circ' | 'Circl' | 'Circle' | 'Crcl' | 'Crcle' | 'Circles' | 'Cliff' | 'Cliffs' | 'Club' | 'Common' | 'Commons' | 'Corner' | 'Corners' | 'Course' | 'Court' | 'Courts' | 'Cove' | 'Coves' | 'Creek' | 'Crescent' | 'Crsent' | 'Crsnt' | 'Crest' | 'Crossing' | 'Crssng' | 'Crossroad' | 'Crossroads'| 'Curve' | 'Dale' | 'Dam' | 'Div' | 'Divide' | 'Dvd' | 'Driv' | 'Drive' | 'Drv' | 'Drives' | 'Estate' | 'Estates' | 'Exp' | 'Expr' | 'Express' | 'Expressway'| 'Expw' | 'Expy' | 'Extension' | 'Extn' | 'Extnsn' | 'Falls' | 'Ferry' | 'Frry' | 'Field' | 'Fields' | 'Flat' | 'Flats' | 'Ford' | 'Fords' | 'Forest' | 'Forests' | 'Forg' | 'Forge' | 'Forges' | 'Fork' | 'Forks' | 'Fort' | 'Frt' | 'Freeway' | 'Freewy' | 'Frway' | 'Frwy' | 'Fwy' | 'Garden' | 'Gardn' | 'Grden' | 'Grdn' | 'Gardens' | 'Grdns' | 'Gateway' | 'Gatewy' | 'Gatway' | 'Gtway' | 'Gtwy' | 'Glen' | 'Glens' | 'Green' | 'Greens' | 'Grov' | 'Grove' | 'Groves' | 'Harb' | 'Harbor' | 'Harbr' | 'Hrbor' | 'Harbors' | 'Haven' | 'Ht' | 'Highway' | 'Highwy' | 'Hiway' | 'Hiwy' | 'Hway' | 'Hwy' | 'Hill' | 'Hills' | 'Hllw' | 'Hollow' | 'Hollows' | 'Holws' | 'Island' | 'Islnd' | 'Islands' | 'Islnds' | 'Isles' | 'Jction' | 'Jctn' | 'Junction' | 'Junctn' | 'Juncton' | 'Jctns' | 'Junctions' | 'Key' | 'Keys' | 'Knol' | 'Knoll' | 'Knolls' | 'Lake' | 'Lakes' | 'Landing' | 'Lndng' | 'Lane' | 'Light' | 'Lights' | 'Loaf' | 'Lock' | 'Locks' | 'Ldge' | 'Lodg' | 'Lodge' | 'Loops' | 'Manor' | 'Manors' 
| 'Meadow' | 'Meadows' | 'Medows' | 'Mill' | 'Mills' | 'Missn' | 'Mssn' | 'Motorway' | 'Mnt' | 'Mount' | 'Mntain' | 'Mntn' | 'Mountain' | 'Mountin' | 'Mtin' | 'Mtn' | 'Mntns' | 'Mountains' | 'Neck' | 'Orchard' | 'Orchrd' | 'Ovl' | 'Overpass' | 'Prk' | 'Parks' | 'Parkway' | 'Parkwy' | 'Pkway' | 'Pky' | 'Parkways' | 'Pkwys' | 'Passage' | 'Paths' | 'Pikes' | 'Pine' | 'Pines' | 'Plain' | 'Plains' | 'Plaza' | 'Plza' | 'Point' | 'Points' | 'Port' | 'Ports' | 'Prairie' | 'Prr' | 'Rad' | 'Radial' | 'Radiel' | 'Ranch' | 'Ranches' | 'Rnchs' | 'Rapid' | 'Rapids' | 'Rest' | 'Rdge' | 'Ridge' | 'Ridges' | 'River' | 'Rvr' | 'Rivr' | 'Road' | 'Roads' | 'Route' | 'Shoal' | 'Shoals' | 'Shoar' | 'Shore' | 'Shoars' | 'Shores' | 'Skyway' | 'Spng' | 'Spring' | 'Sprng' | 'Spngs' | 'Springs' | 'Sprngs' | 'Spurs' | 'Sqr' | 'Sqre' | 'Squ' | 'Square' | 'Sqrs' | 'Squares' | 'Station' | 'Statn' | 'Stn' | 'Strav' | 'Straven' | 'Stravenue' | 'Stravn' | 'Strvn' | 'Strvnue' | 'Stream' | 'Streme' | 'Street' | 'Strt' | 'Str' | 'Streets' | 'Sumit' | 'Sumitt' | 'Summit' | 'Terr' | 'Terrace' | 'Throughway'| 'Trace' | 'Traces' | 'Track' | 'Tracks' | 'Trk' | 'Trks' | 'Trafficway'| 'Trail' | 'Trails' | 'Trls' | 'Trailer' | 'Trlrs' | 'Tunel' | 'Tunls' | 'Tunnel' | 'Tunnels' | 'Tunnl' | 'Trnpk' | 'Turnpike' | 'Turnpk' | 'Underpass' | 'Union' | 'Unions' | 'Valley' | 'Vally' | 'Vlly' | 'Valleys' | 'Vdct' | 'Viadct' | 'Viaduct' | 'View' | 'Views' | 'Vill' | 'Villag' | 'Village' | 'Villg' | 'Villiage' | 'Vlg' | 'Villages' | 'Ville' | 'Vist' /*| 'Vista'*/ | 'Vst' | 'Vsta' | 'Walks' | 'Wy' | 'Well' | 'Wells' | 'Fall' | 'Isle' | 'Land' | 'Loop' | 'Mall' | 'Mews' | 'Oval' | 'Park' | 'Pass' | 'Path' | 'Pike' | 'Ramp' | 'Row' | 'Rue' | 'Run' | 'Spur' | 'Walk' | 'Wall' | 'Way' | 'Ways'
;
STREETSUFFIXABBREVIATED: ('ALY' | 'ANX' | 'ARC' | /*'AVE' |*/ 'BYU' | 'BCH' | 'BND' | 'BLF' | 'BLFS' | 'BTM' | 'BLVD' | 'BR' | 'BRG' | 'BRK' | 'BRKS' | 'BG' | 'BGS' | 'BYP' | 'CP' | 'CYN' | 'CPE' | 'CSWY' | 'CTR' | 'CTRS' | /*'CIR' |*/ 'CIRS' | 'CLF' | 'CLFS' | 'CLB' | 'CMN' | 'CMNS' | 'COR' | 'CORS' | 'CRSE' | /*CT |*/ 'CTS' | 'CV' | 'CVS' | 'CRK' | 'CRES' | 'CRST' | 'XING' | 'XRD' | 'XRDS' | 'CURV' | 'DL' | 'DM' | 'DV' | 'DR' | 'DRS' | 'EST' | 'ESTS' | 'EXPY' | 'EXT' | 'EXTS' | 'FALL' | 'FLS' | 'FRY' | 'FLD' | 'FLDS' | 'FLT' | 'FLTS' | 'FRD' | 'FRDS' | 'FRST' | 'FRG' | 'FRGS' | 'FRK' | 'FRKS' | 'FT' | 'FWY' | 'GDN' | 'GDNS' | 'GTWY' | 'GLN' | 'GLNS' | 'GRN' | 'GRNS' | 'GRV' | 'GRVS' | 'HBR' | 'HBRS' | 'HVN' | 'HTS' | 'HWY' | 'HL' | 'HLS' | 'HOLW' | 'INLT' | 'IS' | 'ISS' | 'ISLE' | 'JCT' | 'JCTS' | 'KY' | 'KYS' | 'KNL' | 'KNLS' | 'LK' | 'LKS' | 'LAND' | 'LNDG' | 'LN' | 'LGT' | 'LGTS' | 'LF' | 'LCK' | 'LCKS' | 'LDG' | 'LOOP' | 'MALL' | 'MNR' | 'MNRS' | 'MDW' | 'MDWS' | 'MEWS' | 'ML' | 'MLS' | 'MSN' | 'MTWY' | 'MT' | 'MTN' | 'MTNS' | 'NCK' | 'ORCH' | 'OVAL' | 'OPAS' | 'PARK' | 'PARK' | 'PKWY' | 'PKWY' | 'PASS' | 'PSGE' | 'PATH' | 'PIKE' | 'PNE' | 'PNES' | 'PL' | 'PLN' | 'PLNS' | 'PLZ' | 'PT' | 'PTS' | 'PRT' | 'PRTS' | 'PR' | 'RADL' | 'RAMP' | 'RNCH' | 'RPD' | 'RPDS' | 'RST' | 'RDG' | 'RDGS' | 'RIV' | 'RD' | 'RDS' | 'RTE' | 'ROW' | 'RUE' | 'RUN' | 'SHL' | 'SHLS' | 'SHR' | 'SHRS' | 'SKWY' | 'SPG' | 'SPGS' | 'SPUR' | 'SPUR' | 'SQ' | 'SQS' | 'STA' | 'STRA' | 'STRM' | 'ST' | 'STS' | 'SMT' | 'TER' | 'TRWY' | 'TRCE' | 'TRAK' | 'TRFY' | 'TRL' | 'TRLR' | 'TUNL' | 'TPKE' | 'UPAS' | 'UN' | 'UNS' | 'VLY' | 'VLYS' | 'VIA' | 'VW' | 'VWS' | 'VLG' | 'VLGS' | 'VL' /*| 'VIS'*/ | 'WALK' | 'WALK' | 'WALL' | 'WAY' | 'WAYS' | 'WL' | 'WLS'
| 'Aly' | 'Anx' | 'Arc' | /*'Ave' |*/ 'Byu' | 'Bch' | 'Bnd' | 'Blf' | 'Blfs' | 'Btm' | 'Blvd' | 'Br' | 'Brg' | 'Brk' | 'Brks' | 'Bg' | 'Bgs' | 'Byp' | 'Cp' | 'Cyn' | 'Cpe' | 'Cswy' | 'Ctr' | 'Ctrs' | /*'Cir' |*/ 'Cirs' | 'Clf' | 'Clfs' | 'Clb' | 'Cmn' | 'Cmns' | 'Cor' | 'Cors' | 'Crse' | /*'Ct' |*/ 'Cts' | 'Cv' | 'Cvs' | 'Crk' | 'Cres' | 'Crst' | 'Xing' | 'Xrd' | 'Xrds' | 'Curv' | 'Dl' | 'Dm' | 'Dv' | 'Dr' | 'Drs' | 'Est' | 'Ests' | 'Expy' | 'Ext' | 'Exts' | 'Fall' | 'Fls' | 'Fry' | 'Fld' | 'Flds' | 'Flt' | 'Flts' | 'Frd' | 'Frds' | 'Frst' | 'Frg' | 'Frgs' | 'Frk' | 'Frks' | 'Ft' | 'Fwy' | 'Gdn' | 'Gdns' | 'Gtwy' | 'Gln' | 'Glns' | 'Grn' | 'Grns' | 'Grv' | 'Grvs' | 'Hbr' | 'Hbrs' | 'Hvn' | 'Hts' | 'Hwy' | 'Hl' | 'Hls' | 'Holw' | 'Inlt' | 'Is' | 'Iss' | 'Isle' | 'Jct' | 'Jcts' | 'Ky' | 'Kys' | 'Knl' | 'Knls' | 'Lk' | 'Lks' | 'Land' | 'Lndg' | 'Ln' | 'Lgt' | 'Lgts' | 'Lf' | 'Lck' | 'Lcks' | 'Ldg' | 'Loop' | 'Mall' | 'Mnr' | 'Mnrs' | 'Mdw' | 'Mdws' | 'Mews' | 'Ml' | 'Mls' | 'Msn' | 'Mtwy' | 'Mt' | 'Mtn' | 'Mtns' | 'Nck' | 'Orch' | 'Oval' | 'Opas' | 'Park' | 'Pkwy' | 'Pass' | 'Psge' | 'Path' | 'Pike' | 'Pne' | 'Pnes' | 'Pl' | 'Pln' | 'Plns' | 'Plz' | 'Pt' | 'Pts' | 'Prt' | 'Prts' | 'Pr' | 'Radl' | 'Ramp' | 'Rnch' | 'Rpd' | 'Rpds' | 'Rst' | 'Rdg' | 'Rdgs' | 'Riv' | 'Rd' | 'Rds' | 'Rte' | 'Row' | 'Rue' | 'Run' | 'Shl' | 'Shls' | 'Shr' | 'Shrs' | 'Skwy' | 'Spg' | 'Spgs' | 'Spur' | 'Spur' | 'Sq' | 'Sqs' | 'Sta' | 'Stra' | 'Strm' | 'St' | 'Sts' | 'Smt' | 'Ter' | 'Trwy' | 'Trce' | 'Trak' | 'Trfy' | 'Trl' | 'Trlr' | 'Tunl' | 'Tpke' | 'Upas' | 'Un' | 'Uns' | 'Vly' | 'Vlys' | 'Via' | 'Vw' | 'Vws' | 'Vlg' | 'Vlgs' | 'Vl' /*| 'Vis'*/ | 'Walk' | 'Wall' | 'Way' | 'Ways' | 'Wl' | 'Wls'
| 'aly' | 'anx' | 'arc' | /*'ave' |*/ 'byu' | 'bch' | 'bnd' | 'blf' | 'blfs' | 'btm' | 'blvd' | 'br' | 'brg' | 'brk' | 'brks' | 'bg' | 'bgs' | 'byp' | 'cp' | 'cyn' | 'cpe' | 'cswy' | 'ctr' | 'ctrs' | /*'cir' |*/ 'cirs' | 'clf' | 'clfs' | 'clb' | 'cmn' | 'cmns' | 'cor' | 'cors' | 'crse' | /*'ct' |*/ 'cts' | 'cv' | 'cvs' | 'crk' | 'cres' | 'crst' | 'xing' | 'xrd' | 'xrds' | 'curv' | 'dl' | 'dm' | 'dv' | 'dr' | 'drs' | 'est' | 'ests' | 'expy' | 'ext' | 'exts' | 'fall' | 'fls' | 'fry' | 'fld' | 'flds' | 'flt' | 'flts' | 'frd' | 'frds' | 'frst' | 'frg' | 'frgs' | 'frk' | 'frks' | 'ft' | 'fwy' | 'gdn' | 'gdns' | 'gtwy' | 'gln' | 'glns' | 'grn' | 'grns' | 'grv' | 'grvs' | 'hbr' | 'hbrs' | 'hvn' | 'hts' | 'hwy' | 'hl' | 'hls' | 'holw' | 'inlt' | 'is' | 'iss' | 'isle' | 'jct' | 'jcts' | 'ky' | 'kys' | 'knl' | 'knls' | 'lk' | 'lks' | 'land' | 'lndg' | 'ln' | 'lgt' | 'lgts' | 'lf' | 'lck' | 'lcks' | 'ldg' | 'loop' | 'mall' | 'mnr' | 'mnrs' | 'mdw' | 'mdws' | 'mews' | 'ml' | 'mls' | 'msn' | 'mtwy' | 'mt' | 'mtn' | 'mtns' | 'nck' | 'orch' | 'oval' | 'opas' | 'park' | 'park' | 'pkwy' | 'pkwy' | 'pass' | 'psge' | 'path' | 'pike' | 'pne' | 'pnes' | 'pl' | 'pln' | 'plns' | 'plz' | 'pt' | 'pts' | 'prt' | 'prts' | 'pr' | 'radl' | 'ramp' | 'rnch' | 'rpd' | 'rpds' | 'rst' | 'rdg' | 'rdgs' | 'riv' | 'rd' | 'rds' | 'rte' | 'row' | 'rue' | 'run' | 'shl' | 'shls' | 'shr' | 'shrs' | 'skwy' | 'spg' | 'spgs' | 'spur' | 'spur' | 'sq' | 'sqs' | 'sta' | 'stra' | 'strm' | 'st' | 'sts' | 'smt' | 'ter' | 'trwy' | 'trce' | 'trak' | 'trfy' | 'trl' | 'trlr' | 'tunl' | 'tpke' | 'upas' | 'un' | 'uns' | 'vly' | 'vlys' | 'via' | 'vw' | 'vws' | 'vlg' | 'vlgs' | 'vl' /*| 'vis'*/ | 'walk' | 'walk' | 'wall' | 'way' | 'ways' | 'wl' | 'wls') '.'?
;
SPANISHSTREETPREFIX: 'AVENIDA' | 'CALLE' | 'CAMINITO' | 'CAMINO' | 'CARREDA' | 'CIRCULO' | 'ENTRADA' | 'PASEO' | 'PLACITA' | 'RANCHO' | 'VEREDA' /*| 'VISTA'*/
| 'Avenida' | 'Calle' | 'Caminito' | 'Camino' | 'Carreda' | 'Circulo' | 'Entrada' | 'Paseo' | 'Placita' | 'Rancho' | 'Vereda' /*| 'Vista'*/
| 'avenida' | 'calle' | 'caminito' | 'camino' | 'carreda' | 'circulo' | 'entrada' | 'paseo' | 'placita' | 'rancho' | 'vereda' /*| 'vista'*/
;
SPANISHSTREETPREFIXABBREVIATED: (/*'AVE' |*/ 'CLL' | 'CMT' | 'CAM' | 'CER' | /*'CIR' |*/ 'ENT' | 'PSO' | 'PLA' | 'RCH' | 'VER' /*| 'VIS'*/
| /*'Ave' |*/ 'Cll' | 'Cmt' | 'Cam' | 'Cer' | /*'Cir' |*/ 'Ent' | 'Pso' | 'Pla' | 'Rch' | 'Ver' /*| 'Vis'*/
| /*'ave' |*/ 'cll' | 'cmt' | 'cam' | 'cer' | /*'cir' |*/ 'ent' | 'pso' | 'pla' | 'rch' | 'ver' /*| 'vis'*/) '.'?
; //The commented abbreviations are included in STREETSUFFIXABBREVIATED and could cause ambiguity if uncommented.
UNITPREFIX: 'APARTMENT' | 'APT' | 'BUILDING' | 'BLDG' | 'DEPARTMENT' | 'DEPT' | 'FLOOR' | 'FL' | 'HANGER' | 'HNGR' | 'KEY' | 'KEY' | 'LOT' | 'LOT' | 'PIER' | 'PIER' | 'ROOM' | 'RM' | 'SLIP' | 'SPACE' | 'SPC' | 'STOP' | 'STOP' | 'SUITE' | 'STE' | 'TRAILER' | 'TRLR' | 'UNIT'
| 'Apartment' | 'Apt' | 'Building' | 'Bldg' | 'Department' | 'Dept' | 'Floor' | 'Fl' | 'Hanger' | 'Hngr' | 'Key' | 'Lot' | 'Pier' | 'Room' | 'Rm' | 'Slip' | 'Space' | 'Spc' | 'Stop' | 'Suite' | 'Ste' | 'Trailer' | 'Trlr' | 'Unit'
| 'apartment' | 'apt' | 'building' | 'bldg' | 'department' | 'dept' | 'floor' | 'fl' | 'hanger' | 'hngr' | 'key' | 'lot' | 'pier' | 'room' | 'rm' | 'slip' | 'space' | 'spc' | 'stop' | 'suite' | 'ste' | 'trailer' | 'trlr' | 'unit'
;
UNITPREFIXNONUMBER: 'BASEMENT' | 'BSMT' | 'FRONT' | 'FRNT' | 'LOBBY' | 'LBBY' | 'LOWER' | 'LOWR' | 'OFFICE' | 'OFC' | 'PENTHOUSE' | 'PH' | 'REAR' | 'SIDE' | 'UPPER' | 'UPPR'
| 'Basement' | 'Bsmt' | 'Front' | 'Frnt' | 'Lobby' | 'Lbby' | 'Lower' | 'Lowr' | 'Office' | 'Ofc' | 'Penthouse' | 'Ph' | 'Rear' | 'Side' | 'Upper' | 'Uppr'
| 'basement' | 'bsmt' | 'front' | 'frnt' | 'lobby' | 'lbby' | 'lower' | 'lowr' | 'office' | 'ofc' | 'penthouse' | 'ph' | 'rear' | 'side' | 'upper' | 'uppr'
;
NUMBEREDSTREET: DIGIT+ ('st' | 'nd' | 'rd' | 'th' | 'St' | 'Nd' | 'Rd' | 'Th' | 'ST' | 'ND' | 'RD' | 'TH');
NAME: (MC | SAN | REV_DR | [a-zA-Z'])? [a-zA-Z\-/'.]*;
NEWLINE: '\r'? '\n';
COMMA: ',';
SEPARATOR: ('-' | ' ')+;
WS: [\t]+ -> skip;
I have a huge .csv file with several columns, but the ones that matter to me are USER_ID (user identifier), DURATION (duration of the call), TYPE (incoming or outgoing), DATE, and NUMBER (mobile number).
What I am trying to do is replace all null values in the DURATION column with the average duration of all calls of the same type by the same user (i.e. the same USER_ID).
I computed the average as follows. The query below finds the average duration of all calls of the same type by the same user:
Dataset<Row> filteredData = callLogsDataSet.selectExpr(USER_ID, DURATION, TYPE, DATE, NORMALIZE_NUMBER)
/*1*/ .filter(col(USER_ID).isNotNull().and(col(TYPE).isNotNull()).and(col(NORMALIZE_NUMBER).isNotNull()).and(col(DATE).gt(0)).and(col(TYPE).isin("OUTGOING","INCOMING")))
/*2*/ .groupBy(col(USER_ID), col(TYPE), col(NORMALIZE_NUMBER))
/*3*/ .agg(sum(DURATION).alias(DURATION_IN_MIN).divide(count(col(USER_ID))));
filteredData.show() gives :
|USER_ID |type |normalized_number|(sum(duration) AS `durationInMin` / count(USER_ID))|
+--------------------------------+--------+-----------------+---------------------------------------------------+
|8a8a8a8a592b4ace01595e65901b0013|OUTGOING|+435657456354 |0.0 |
|8a8a8a8a592b4ace01595e70dcbd0016|OUTGOING|+876454354353 |48.6 |
|8a8a8a8a592b4ace01595e099764000c|INCOMING|+132445686765 |15.0 |
|8a8a8a8a592b4ace01592b4ff4b90000|INCOMING|+097645634324 |74.16666666666667 |
|8a8a8a8a592b4ace0159366a56290005|INCOMING|+134435657656 |15.0 |
|8a8a8a8a592b4ace01595e70dcbd0016|OUTGOING|+135879878543 |31.0 |
|8a8a8a8a592b4ace0159366a56290005|INCOMING|+768435245243 |11.0 |
|8a8a8a8a592b4ace01592cd8fd160003|INCOMING|+787685534523 |0.0 |
|8a8a8a8a592b4ace01595e65901b0013|OUTGOING|+098976865745 |61.5 |
|8a8a8a8a592b4ace01592b4ff4b90000|OUTGOING|+123456787644 |43.333333333333336 |
In the query below I filter the data and, in step 2, replace every null DURATION with 0.
Dataset<Row> filteredData2 = callLogsDataSet.selectExpr(USER_ID, DURATION, TYPE, DATE, NORMALIZE_NUMBER)
/*1*/ .filter(col(USER_ID).isNotNull().and(col(TYPE).isNotNull()).and(col(NORMALIZE_NUMBER).isNotNull())
.and(col(DATE).gt(0)).and(col(DURATION).gt(0)).and(col(TYPE).isin("OUTGOING","INCOMING")))
/*2*/ .withColumn(DURATION, when(col(DURATION).isNull(), 0).otherwise(col(DURATION).cast(LONG)))
/*3*/ .withColumn(DATE, col(DATE).cast(LONG).minus(col(DATE).cast(LONG).mod(ROUND_ONE_MIN)).cast(LONG))
/*4*/ .groupBy(col(USER_ID), col(DURATION), col(TYPE), col(DATE), col(NORMALIZE_NUMBER))
/*5*/ .agg(sum(DURATION).alias(DURATION_IN_MIN))
/*6*/ .withColumn(DAY_TIME, lit(""))
/*7*/ .withColumn(WEEK_DAY, lit(""))
/*8*/ .withColumn(HOUR_OF_DAY, lit(0));
filteredData2.show() gives :
|USER_ID |duration|type |date |normalized_number|durationInMin|DAY_TIME|WEEK_DAY|HourOfDay|
+--------------------------------+--------+--------+-------------+-----------------+-------------+--------+--------+---------+
|8a8a8a8a592b4ace01595e70dcbd0016|25 |INCOMING|1479017220000|+465435534353 |25 | | |0 |
|8a8a8a8a592b4ace01595e099764000c|29 |INCOMING|1482562560000|+545765765775 |29 | | |0 |
|8a8a8a8a592b4ace01595e099764000c|75 |OUTGOING|1483363980000|+124435665755 |75 | | |0 |
|8a8a8a8a592b4ace01595e70dcbd0016|34 |OUTGOING|1483261920000|+098865563645 |34 | | |0 |
|8a8a8a8a592b4ace01595e70dcbd0016|22 |OUTGOING|1481712180000|+232434656765 |22 | | |0 |
|8a8a8a8a592b4ace0159366a56290005|64 |OUTGOING|1482984060000|+875634521325 |64 | | |0 |
|8a8a8a8a592b4ace0159366a56290005|179 |OUTGOING|1482825060000|+876542543554 |179 | | |0 |
|8a8a8a8a592b4ace01595e65901b0013|12 |OUTGOING|1482393360000|+098634563456 |12 | | |0 |
|8a8a8a8a592b4ace01595e70dcbd0016|14 |OUTGOING|1482820860000|+1344365i8787 |14 | | |0 |
|8a8a8a8a592b4ace01592b4ff4b90000|105 |INCOMING|1478772240000|+234326886784 |105 | | |0 |
|8a8a8a8a592b4ace01592b4ff4b90000|453 |OUTGOING|1480944480000|+134435676578 |453 | | |0 |
|8a8a8a8a592b4ace01595e099764000c|42 |OUTGOING|1483193100000|+413247687686 |42 | | |0 |
|8a8a8a8a592b4ace01595e099764000c|41 |OUTGOING|1481696820000|+134345435645 |41 | | |0 |
Please help me combine these two queries, or otherwise use them, to get the required result. I am new to Spark and Spark SQL.
Thanks.
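The requirement above (fill each missing duration with the mean of the known durations for the same user and call type) can be sketched in plain Java, outside Spark, to make the logic concrete. The `CallRecord` and `DurationImputer` names are hypothetical and exist only for this illustration. In Spark the same idea is commonly expressed with `avg` over a `Window.partitionBy` on the user and type columns, combined with `coalesce`, but that is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type for this sketch; duration is null when missing.
final class CallRecord {
    final String userId;
    final String type;
    final Double duration;

    CallRecord(String userId, String type, Double duration) {
        this.userId = userId;
        this.type = type;
        this.duration = duration;
    }
}

final class DurationImputer {
    // Returns a new list in which each null duration is replaced by the mean
    // of the non-null durations that share the same (userId, type).
    static List<CallRecord> fillWithGroupAverage(List<CallRecord> calls) {
        Map<String, double[]> stats = new HashMap<>(); // key -> {sum, count}
        for (CallRecord c : calls) {
            if (c.duration != null) {
                double[] s = stats.computeIfAbsent(key(c), k -> new double[2]);
                s[0] += c.duration;
                s[1] += 1;
            }
        }
        List<CallRecord> result = new ArrayList<>();
        for (CallRecord c : calls) {
            if (c.duration != null) {
                result.add(c);
                continue;
            }
            double[] s = stats.get(key(c));
            // A group with no known durations has no average; keep the null.
            Double filled = (s == null) ? null : s[0] / s[1];
            result.add(new CallRecord(c.userId, c.type, filled));
        }
        return result;
    }

    private static String key(CallRecord c) {
        return c.userId + "|" + c.type;
    }

    public static void main(String[] args) {
        List<CallRecord> calls = new ArrayList<>();
        calls.add(new CallRecord("u1", "INCOMING", 10.0));
        calls.add(new CallRecord("u1", "INCOMING", 20.0));
        calls.add(new CallRecord("u1", "INCOMING", null));
        calls.add(new CallRecord("u1", "OUTGOING", null));
        for (CallRecord c : fillWithGroupAverage(calls)) {
            System.out.println(c.userId + " " + c.type + " " + c.duration);
        }
    }
}
```

The averages are computed from the non-null values only, before any nulls are replaced; replacing nulls with 0 first would pull the averages down.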
import java.util.ArrayList;
import java.util.List;
public class Tree
{
private Board board;
private List<Tree> children;
private Tree parent;
public Tree(Board board1)
{
this.board = board1;
this.children = new ArrayList<Tree>();
}
public Tree(Tree t1)
{
}
public Tree createTree(Tree tree, boolean isHuman, int depth)
{
Player play1 = new Player();
ArrayList<Board> potBoards = new ArrayList<Board>(play1.potentialMoves(tree.board, isHuman));
if (board.gameEnd() || depth == 0)
{
return null;
}
//Tree oldTree = new Tree(board);
for (int i = 0; i < potBoards.size() - 1; i++)
{
Tree newTree = new Tree(potBoards.get(i));
createTree(newTree, !isHuman, depth - 1);
tree.addChild(newTree);
}
return tree;
}
private Tree addChild(Tree child)
{
Tree childNode = new Tree(child);
childNode.parent = this;
this.children.add(childNode);
return childNode;
}
}
Hi there. I'm trying to build a game tree that will later be searched with minimax. I think the error is either in the addChild function or in potentialMoves. potentialMoves returns all the moves a player or the computer can make. For example, in Othello a player can reach any of these boards
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
0| | | | | | | | |
+-+-+-+-+-+-+-+-+
1| | | | | | | | |
+-+-+-+-+-+-+-+-+
2| | | |b| | | | |
+-+-+-+-+-+-+-+-+
3| | | |b|b| | | |
+-+-+-+-+-+-+-+-+
4| | | |b|w| | | |
+-+-+-+-+-+-+-+-+
5| | | | | | | | |
+-+-+-+-+-+-+-+-+
6| | | | | | | | |
+-+-+-+-+-+-+-+-+
7| | | | | | | | |
+-+-+-+-+-+-+-+-+
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
0| | | | | | | | |
+-+-+-+-+-+-+-+-+
1| | | | | | | | |
+-+-+-+-+-+-+-+-+
2| | | | | | | | |
+-+-+-+-+-+-+-+-+
3| | |b|b|b| | | |
+-+-+-+-+-+-+-+-+
4| | | |b|w| | | |
+-+-+-+-+-+-+-+-+
5| | | | | | | | |
+-+-+-+-+-+-+-+-+
6| | | | | | | | |
+-+-+-+-+-+-+-+-+
7| | | | | | | | |
+-+-+-+-+-+-+-+-+
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
0| | | | | | | | |
+-+-+-+-+-+-+-+-+
1| | | | | | | | |
+-+-+-+-+-+-+-+-+
2| | | | | | | | |
+-+-+-+-+-+-+-+-+
3| | | |w|b| | | |
+-+-+-+-+-+-+-+-+
4| | | |b|b|b| | |
+-+-+-+-+-+-+-+-+
5| | | | | | | | |
+-+-+-+-+-+-+-+-+
6| | | | | | | | |
+-+-+-+-+-+-+-+-+
7| | | | | | | | |
+-+-+-+-+-+-+-+-+
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
0| | | | | | | | |
+-+-+-+-+-+-+-+-+
1| | | | | | | | |
+-+-+-+-+-+-+-+-+
2| | | | | | | | |
+-+-+-+-+-+-+-+-+
3| | | |w|b| | | |
+-+-+-+-+-+-+-+-+
4| | | |b|b| | | |
+-+-+-+-+-+-+-+-+
5| | | | |b| | | |
+-+-+-+-+-+-+-+-+
6| | | | | | | | |
+-+-+-+-+-+-+-+-+
7| | | | | | | | |
+-+-+-+-+-+-+-+-+
on the first turn. The potentialMoves method does not permanently change the board being played on; it returns an ArrayList.
I have this in my main:
Tree gameTree = new Tree(boardOthello);
Tree pickTree = gameTree.createTree(gameTree, true, 2);
Does the addChild() function look ok or is there something else I'm missing in my code?
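Judging only from the code as posted, three things look suspicious: the `Tree(Tree t1)` constructor is empty, so `addChild` stores a blank node and discards the subtree that was just built; the loop bound `potBoards.size() - 1` skips the last move; and `board.gameEnd()` tests the receiver's board rather than `tree.board`. A minimal sketch of one way to restructure it follows. `StubBoard` is a hypothetical stand-in for the real `Board` and `Player.potentialMoves`, and `GameTree` and `expand` are illustrative names rather than the original ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Board plus Player.potentialMoves.
final class StubBoard {
    final int value;

    StubBoard(int value) {
        this.value = value;
    }

    boolean gameEnd() {
        return value >= 100;
    }

    // Pretends every position has exactly two follow-up positions.
    // isHuman is ignored here; a real move generator would use it.
    List<StubBoard> potentialMoves(boolean isHuman) {
        List<StubBoard> moves = new ArrayList<>();
        moves.add(new StubBoard(value * 2 + 1));
        moves.add(new StubBoard(value * 2 + 2));
        return moves;
    }
}

final class GameTree {
    private final StubBoard board;
    private final List<GameTree> children = new ArrayList<>();
    private GameTree parent;

    GameTree(StubBoard board) {
        this.board = board;
    }

    // Attaches the node that was passed in, rather than wrapping it in a
    // new empty node, so the subtree already built under it is kept.
    private void addChild(GameTree child) {
        child.parent = this;
        children.add(child);
    }

    // Expands this node in place, down to the given depth.
    void expand(boolean isHuman, int depth) {
        if (depth == 0 || board.gameEnd()) {
            return; // leaf: nothing to add, but the node itself stays valid
        }
        for (StubBoard next : board.potentialMoves(isHuman)) { // visits every move
            GameTree child = new GameTree(next);
            child.expand(!isHuman, depth - 1);
            addChild(child);
        }
    }

    List<GameTree> getChildren() {
        return children;
    }

    GameTree getParent() {
        return parent;
    }

    // Number of nodes in the subtree rooted here.
    int size() {
        int n = 1;
        for (GameTree c : children) {
            n += c.size();
        }
        return n;
    }

    public static void main(String[] args) {
        GameTree root = new GameTree(new StubBoard(0));
        root.expand(true, 2);
        System.out.println(root.size()); // prints 7 (1 root + 2 children + 4 grandchildren)
    }
}
```

Expanding the node in place and returning nothing avoids the situation in the posted code where `createTree` returns null for leaves, which a later minimax pass would have to special-case.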
In the example below, how can I add a formula for the individual user totals, given that the number of rows per user can vary?
+------+--------------+----------+--------------+-------+
| Name | Date         | Billable | Non-Billable | Total |
+------+--------------+----------+--------------+-------+
| abc  | 06/23/2012   |      860 |           10 |   870 |
|      | User Totals: |      860 |           10 |   870 |
| xyz  | 07/12/2012   |       45 |            0 |    45 |
|      | User Totals: |       45 |            0 |    45 |
| ccc  | 09/19/2013   |      165 |           35 |   200 |
|      | 10/15/2013   |      240 |            0 |   240 |
|      | User Totals: |      405 |           35 |   440 |
|      | Grand Totals |     1310 |           45 |  1355 |
+------+--------------+----------+--------------+-------+
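The question does not say which tool produces this report, so the following is not a formula for any particular product. It is only a sketch, in Java, of the computation such a formula has to perform: a blank Name means "same user as the row above", and each user's totals accumulate over however many rows that user has. The `UserTotals` class and the row layout `{name, billable, nonBillable}` are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UserTotals {
    // Each row is {name, billable, nonBillable}; an empty name continues the
    // previous user's group. Returns name -> {billable, nonBillable} in
    // first-seen order.
    static Map<String, int[]> totalsPerUser(List<String[]> rows) {
        Map<String, int[]> totals = new LinkedHashMap<>();
        String current = null;
        for (String[] row : rows) {
            if (!row[0].isEmpty()) {
                current = row[0]; // a new user group starts here
            }
            int[] t = totals.computeIfAbsent(current, k -> new int[2]);
            t[0] += Integer.parseInt(row[1]);
            t[1] += Integer.parseInt(row[2]);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"abc", "860", "10"});
        rows.add(new String[] {"xyz", "45", "0"});
        rows.add(new String[] {"ccc", "165", "35"});
        rows.add(new String[] {"", "240", "0"});

        int grandBillable = 0;
        int grandNonBillable = 0;
        for (Map.Entry<String, int[]> e : totalsPerUser(rows).entrySet()) {
            int[] t = e.getValue();
            System.out.println(e.getKey() + " User Totals: "
                    + t[0] + " " + t[1] + " " + (t[0] + t[1]));
            grandBillable += t[0];
            grandNonBillable += t[1];
        }
        System.out.println("Grand Totals " + grandBillable + " "
                + grandNonBillable + " " + (grandBillable + grandNonBillable));
    }
}
```

The grand totals are summed from the per-user totals rather than from every row in the table, which avoids double counting the subtotal rows.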