Regex to extract a specific string - java

1X79 "The X-Files" (1.01) 9/10/93 1/17/94* 11/ 6/94*
1X01 "Deep Throat" (1.02) 9/17/93 12/24/93 6/24/94
1X02 "Squeeze" (1.03) 9/24/93 12/ 3/93 6/10/94 11/ 4/95*
1X03 "Conduit" (1.04) 10/ 1/93 12/14/93* 5/27/94
1X04 "Jersey Devil" (1.05) 10/ 8/93 12/31/93 7/22/94
1X05 "Shadows" (1.06) 10/22/93 3/ 4/94 5/26/95
1X06 "Ghost in the Machine" (1.07) 10/29/93 1/14/94
1X07 "Ice" (1.08) 11/ 5/93 1/17/94* 8/12/94 3/ 3/95
1X08 "Space" (1.09) 11/12/93 1/28/94 8/22/94*
1X09 "Fallen Angel" (1.10) 11/19/93 3/29/94* 11/13/94*
1X10 "Eve" (1.11) 12/10/93 3/11/94 8/26/94
1X11 "Fire" (1.12) 12/17/93 3/25/94 11/20/94*
1X12 "Beyond the Sea" (1.13) 1/ 7/94 4/ 8/94 12/22/95?
1X13 "GenderBender" (1.14) 1/21/94 5/20/94 7/21/95
1X14 "Lazarus" (1.15) 2/ 4/94 6/ 3/94 9/ 2/94
1X15 "Young at Heart" (1.16) 2/11/94 6/17/94 8/19/94
1X16 "E.B.E." (1.17) 2/18/94 7/ 8/94 11/27/94*
1X17 "Miracle Man" (1.18) 3/18/94 7/ 1/94
1X18 "Shapes" (1.19) 4/ 1/94 10/28/94 8/ 4/95
1X19 "Darkness Falls" (1.20) 4/15/94 8/ 5/94 12/ 2/94
1X20 "Tooms" (1.21) 4/22/94 7/15/94 11/ 4/95*
1X21 "Born Again" (1.22) 4/29/94 8/22/94*
1X22 "Roland" (1.23) 5/ 6/94 7/29/94
1X23 "The Erlenmeyer Flask" (1.24) 5/13/94 9/ 9/94 9/ 1/95
Given this list, I need to extract:
the name (in quotes)
the season (in parentheses)
the first year mentioned.
e.g. for "The X-Files" I would need to extract '93'
I've come up with:
"(.*?)"+([\D]+)(.{4})
This gets the first two items, but I can't figure out how to grab the year.

Consider the following:
"([^"]+)"\s*\(([^\)]+)\)\s*\d+\/\s*\d+\/(\d+)
https://regex101.com/r/jU6kI0/1
The name is group(1), the season is group(2), and the year is group(3).
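A minimal Java sketch of applying that regex to each line. The Java string literal drops the `\/` and `\)` escapes, which are unnecessary in Java regex; `EpisodeParser` and `parse` are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EpisodeParser {
    // group(1) = name, group(2) = season, group(3) = year of the first date
    private static final Pattern EPISODE =
            Pattern.compile("\"([^\"]+)\"\\s*\\(([^)]+)\\)\\s*\\d+/\\s*\\d+/(\\d+)");

    // Returns {name, season, year}, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = EPISODE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] lines = {
            "1X79 \"The X-Files\" (1.01) 9/10/93 1/17/94* 11/ 6/94*",
            "1X03 \"Conduit\" (1.04) 10/ 1/93 12/14/93* 5/27/94",
        };
        for (String line : lines) {
            String[] f = parse(line);
            if (f != null) {
                // prints "The X-Files | 1.01 | 93", then "Conduit | 1.04 | 93"
                System.out.println(f[0] + " | " + f[1] + " | " + f[2]);
            }
        }
    }
}
```

The `\s*` after each `/` matters because single-digit days and months in this list are padded with a space (e.g. `10/ 1/93`).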


Get CLIPS/Jess output in string variable

I have integrated Jess with Java in NetBeans. I want to capture the output in a String variable.
When I (run) the .clp file and give it an input, it shows me the output, but I want to get this output into a String variable. How can I do this? This is my .clp file:
(deftemplate problem
  (multislot name)
  (slot symptom))
(deffacts problems
  (problem (name (create$ "Air filter" "fuel injector problem" "fuel pressure regulator"))
           (symptom Black-Smoke)))
(defrule reading-input
  =>
  (printout t "Enter the symptom your car shows: ")
  (assert (var (read))))
(defrule checking-input
  (var ?symptom)
  (problem (symptom ?symptom1) (name $?name1))
  (test (eq ?symptom ?symptom1))
  =>
  (printout t "Problems can be " $?name1 crlf))
Code to run this in Java:
import jess.Rete;
public static String path = "C:\\Users\\Taimoor Mirza\\Documents\\car.CLP";
Rete r = new Rete();
r.batch(path);
r.reset();
r.run();
This runs fine: when I enter Black-Smoke, it prints the possible problems.
I want to get these results in a String. How can I do that?
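Set up a Writer and tell Jess to use it:
import java.io.StringWriter;
import java.io.Writer;
Writer writer = new StringWriter();
rete.addOutputRouter("t", writer);
// now run Jess (batch/reset/run); everything printed to router "t" is collected in writer
System.out.println(writer.toString());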
Look at the section on I/O routers in the manual: http://www.jessrules.com/jess/docs/71/library.html#routers. Use a java.io.StringWriter as an output router, then retrieve the text from the StringWriter.

Parsing a Tab Separated File

I'm attempting to parse a TSV file from IMDb:
$hutter Battle of the Sexes (2017) (as $hutter Boy) [Bobby Riggs Fan] <10>
NVTION: The Star Nation Rapumentary (2016) (as $hutter Boy) [Himself] <1>
Secret in Their Eyes (2015) (uncredited) [2002 Dodger Fan]
Steve Jobs (2015) (uncredited) [1988 Opera House Patron]
Straight Outta Compton (2015) (uncredited) [Club Patron/Dopeman]
$lim, Bee Moe Fatherhood 101 (2013) (as Brandon Moore) [Himself - President, Passages]
For Thy Love 2 (2009) [Thug 1]
Night of the Jackals (2009) (V) [Trooth]
"Idle Talk" (2013) (as Brandon Moore) [Himself]
"Idle Times" (2012) {(#1.1)} (as Brandon Moore) [Detective Ryan Turner]
As you can see, some lines start with a tab and some do not. I want a map with the actor's name as the key and a list of movies as the value. Between the actor's name and the movie listing there are one or more tabs.
My code:
while ((line = reader.readLine()) != null) {
    Matcher matcher = headerPattern.matcher(line);
    boolean headerMatchFound = matcher.matches();
    if (headerMatchFound) {
        Logger.getLogger(ActorListParser.class.getName()).log(Level.INFO, "Header for actor list found");
        String newline;
        reader.readLine();
        while ((newline = reader.readLine()) != null) {
            String[] fullLine = null;
            String actor;
            String title;
            Pattern startsWithTab = Pattern.compile("^\t.*");
            Matcher tab = startsWithTab.matcher(newline);
            boolean tabStartMatcher = tab.matches();
            if (!tabStartMatcher) {
                fullLine = newline.split("\t.*");
                System.out.println("Actor: " + fullLine[0] +
                        "Movie: " + fullLine[1]);
            } // this line will have code to match lines that start with tabs.
        }
    }
}
The way I've done this only works for a few lines before I get an ArrayIndexOutOfBoundsException. How can I parse the lines and split them into at most 2 strings if they contain one or more tabs?
There are subtleties in parsing tab/comma-delimited data files having to do with quoting and escaping.
To save yourself a lot of work, frustration and headaches, you really should consider using one of the existing CSV parsing libraries such as OpenCSV or Apache Commons CSV.
Posted as an answer instead of a comment because the OP has not stated a reason for reinventing the wheel and there are some tasks that really have been "solved" once and for all.
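If you do want to hand-roll it, the exception most likely comes from `split("\t.*")`: that regex matches the tab and everything after it, leaving only an empty trailing string that `split` discards, so `fullLine[1]` does not exist. Splitting on `"\t+"` with a limit of 2 treats a run of tabs as one separator and yields at most two parts. A minimal sketch, assuming actor lines look like `Name<tabs>Title`, continuation lines look like `<tabs>Title` (as in the sample), and there is no quoting or escaping; `ActorListSketch` and `parse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ActorListSketch {
    // Builds actor -> list of titles from the lines that follow the list header.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> movies = new LinkedHashMap<>();
        String currentActor = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip empty or whitespace-only lines
            }
            // "\t+" = one or more tabs as a single separator; limit 2 = at most two parts,
            // so anything after the first run of tabs stays together as the title.
            String[] parts = line.split("\t+", 2);
            if (!line.startsWith("\t")) {
                // Actor line: "Name<tabs>Title" -> ["Name", "Title"]
                currentActor = parts[0];
                List<String> titles = movies.computeIfAbsent(currentActor, k -> new ArrayList<>());
                if (parts.length == 2) {
                    titles.add(parts[1].trim());
                }
            } else if (currentActor != null && parts.length == 2) {
                // Continuation line: "<tabs>Title" -> ["", "Title"]
                movies.get(currentActor).add(parts[1].trim());
            }
        }
        return movies;
    }
}
```

It could be fed with something like `ActorListSketch.parse(Files.readAllLines(path))` after skipping the header; the caveats above about quoting and escaping still apply.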

Clojure and HBase: Format the result of HBase Scan to Clojure maps

I have written a Clojure function to scan an HBase table:
(defn- ^Scan make-scan []
  (Scan.))
(defn hscan [hbase tbl]
  (let [htbl    (.getTable (:connection hbase) (TableName/valueOf tbl))
        scanner (.getScanner htbl (make-scan))
        results (mapv (fn preprocess-result [result]
                        result)
                      scanner)]
    (println "Results: " results)))
I call the function like this for the HBase table named lookup:
(hscan #hbase "lookup")
P.S.: #hbase holds the HBase configuration.
I'm getting this as output:
Results: [
#object[org.apache.hadoop.hbase.client.Result 0x16cf8438 keyvalues={Pepsi/A:Canada/1443095322877/Put/vlen=5/seqid=0, Pepsi/A:USA/1443095303916/Put/vlen=5/seqid=0}]
#object[org.apache.hadoop.hbase.client.Result 0x3e5beab5 keyvalues={Wallmart/A:Canada/1443095361758/Put/vlen=5/seqid=0, Wallmart/A:USA/1443095349956/Put/vlen=5/seqid=0}]
]
The HBase table actually has 2 rows:
hbase(main):007:0> scan 'lookup'
ROW COLUMN+CELL
Pepsi column=A:Canada, timestamp=1443095322877, value=upc-b
Pepsi column=A:USA, timestamp=1443095303916, value=upc-a
Wallmart column=A:Canada, timestamp=1443095361758, value=upc-d
Wallmart column=A:USA, timestamp=1443095349956, value=upc-c
2 row(s) in 0.0790 seconds
I want to format the result into a Clojure map of the form:
{
:Pepsi {:A {:USA "upc-a" :Canada "upc-b"}}
:Wallmart {:A {:USA "upc-c" :Canada "upc-d"}}
}
I'm lost on how to implement that. Any help with this would be appreciated.
Thanks!
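One possible approach: the transformation is a fold of the flat cells into nested maps keyed by row, then column family, then qualifier. In Clojure that fold is typically a `reduce` with `(assoc-in acc [row family qualifier] value)`; the same logic is sketched here in Java for consistency with the rest of this page. The `Cell` class is hypothetical and stands in for the row, family, qualifier and value you would pull out of each `org.apache.hadoop.hbase.client.Result` (for example via `getRow()` and `getNoVersionMap()`); converting the byte arrays to strings (e.g. with HBase's `Bytes.toString`) and to keywords is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NestedCells {
    // Hypothetical flat cell, one per line of the `scan 'lookup'` shell output.
    static final class Cell {
        final String row, family, qualifier, value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Folds flat cells into row -> family -> qualifier -> value,
    // the same shape as the desired Clojure map.
    static Map<String, Map<String, Map<String, String>>> nest(List<Cell> cells) {
        Map<String, Map<String, Map<String, String>>> out = new TreeMap<>();
        for (Cell c : cells) {
            out.computeIfAbsent(c.row, k -> new TreeMap<>())
               .computeIfAbsent(c.family, k -> new TreeMap<>())
               .put(c.qualifier, c.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell("Pepsi", "A", "Canada", "upc-b"),
                new Cell("Pepsi", "A", "USA", "upc-a"),
                new Cell("Wallmart", "A", "Canada", "upc-d"),
                new Cell("Wallmart", "A", "USA", "upc-c"));
        // prints {Pepsi={A={Canada=upc-b, USA=upc-a}}, Wallmart={A={Canada=upc-d, USA=upc-c}}}
        System.out.println(nest(cells));
    }
}
```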

Regex Java word context

What I want to achieve is to obtain the context of an acronym. Can you help me with the regular expression?
I am looping over the text (a String) looking for dots. After each match I try to get the context of that particular acronym so that I can do further processing, but I can't get the context. I need at least 5 words before and 5 words after the acronym.
// Pattern to match each word ending with a dot
Pattern pattern = Pattern.compile("(\\w+)\\b([.])");
Matcher matchDot = pattern.matcher(textToCorrect);
while (matchDot.find()) {
    System.out.println("abbreviation ---" + matchDot.group() + " ---");
    // 5 words before and after the match = context
    // Matcher matchContext = Pattern.compile("(.{25})(" + matchDot.group() + ")(.{25})").matcher(textToCorrect);
    Pattern patternContext = Pattern.compile("(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,10}" + matchDot.group() + "(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,10}");
    Matcher matchContext = patternContext.matcher(textToCorrect);
    if (matchContext.find()) {
        System.out.println("context: " + matchContext.group() + " :");
        // System.out.println("context: " + matchContext.group(1) + " :");
        // System.out.println("context: " + matchContext.group(2) + " :");
    }
}
Example:
input:
Some 84% of Paris residents see fighting pol. as a priority and 54% supported a diesel ban in the city by 2020, according a poll carried out for the Journal du Dimanche.
output:
The first regex should find pol.
The second regex should find "of Paris residents see fighting pol. as a priority and 54%"
Another example with more text:
I need to loop through this once and, every time I match an acronym, get the context of that particular acronym. After that I do some data mining. Here's the original text:
neklidná nemocná, vyš. je možné provést pouze nativně
Na mozku je patrna hyperdenzita v počátečním úseku a. cerebri media
vlevo, vlevo se objevuje již smazání hranic mezi bazálními ganglii a
okolní bílou hmotou a mírná difuzní hypointenzita v periventrikulární
bílé hmotě. Kromě těchto čerstvých změn jsou patrné staré
postmalatické změny temporálně a parietookcipitálně vlevo. Oboustranně
jsou patrné vícečetné vaskulární mikroléze v centrum semiovale bilat.
Nejsou známky nitrolebního krvácení. skelet kalvy orientačně nihil tr.
Z á v ě r: Známky hyperakutní ischemie v povodí ACM vlevo, staré
postmalatickéé změny T,P a O vlevo, vaskulární mikroléze v centrum
semiovale bilat.
CT AG: vyš. po bolu k.l..
Po zklidnění nemocné se podařilo provést CT AG. Na krku je naznačený
kinkink na ACC vlevo a ACI vlevo pod bazí. Kalcifikace v karotických
sifonech nepůsobí hemodynamicky významné stenozy. Intrakraniálně je
patrný konický uzávěr operkulárního úseku a. cerebri media vlevo pro
parietální lalok. Ostatní nález na intrakraniálním tepenném řečišti je
v mezích normy.
Z á v ě r: uzávěr operkulárního úseku a. cerebri media vlevo.
Of course, if it also matches the end of a sentence, that's OK for me :-) The point is to find all the acronyms, even if they are right before a newline (\n).
I would try this out:
(?:\w+\W+){5}((?:\w.?)+)(?:\w+\W+){5}
Be aware, though, that natural language processing with regular expressions cannot be fully accurate.
((?:[\w!##$%&*]+\s+){5}([\w!##$%&*]+\.)(?:\s+[\w!##$%&*]+){5})
Try this. See the demo:
https://regex101.com/r/aQ3zJ3/9
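As an alternative to a second, dynamically built regex, here is a minimal sketch that tokenizes on whitespace and takes up to five tokens on each side of every token ending in a dot. Splitting on `\s+` treats newlines like spaces, and it avoids relying on `\w`, which in Java matches only ASCII word characters unless `Pattern.UNICODE_CHARACTER_CLASS` is set (relevant for tokens such as `vyš.`). It counts tokens such as `84%` or `2020,` as words and also reports sentence-final words, which the question says is acceptable. `AcronymContext` and `contexts` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AcronymContext {
    // For every whitespace-separated token ending with '.', returns that token
    // joined with up to `window` tokens before and after it.
    static List<String> contexts(String text, int window) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+")); // \s+ also spans newlines
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).endsWith(".")) {
                int from = Math.max(0, i - window);
                int to = Math.min(tokens.size(), i + window + 1);
                result.add(String.join(" ", tokens.subList(from, to)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Some 84% of Paris residents see fighting pol. as a priority and 54% supported "
                + "a diesel ban in the city by 2020, according a poll carried out for the Journal du Dimanche.";
        // prints "of Paris residents see fighting pol. as a priority and 54%"
        // and then "out for the Journal du Dimanche."
        for (String c : contexts(text, 5)) {
            System.out.println(c);
        }
    }
}
```

If you need the character offsets of each acronym as well, you could keep the first regex for finding matches and map them back to token indices; the token window itself stays the same.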

How to find ISD code from ISO 3166 code?

I want to show each ISO 3166 country code and its ISD code in a list, like:
ind +91
irq +964
ita +39
With the help of java.util.Locale I have got all the ISO 3166 country codes, but now I want the ISD code of every country.
You can use Google's libphonenumber and call its PhoneNumberUtil.getInstance().getCountryCodeForRegion(). It takes a two-letter ISO 3166-1 region code as a String (e.g. "IN") and returns the calling code as an int (e.g. 91).
Note that this utility class also has a .getSupportedRegions() method.
I had the same requirement in my application and couldn't find any API for it. So I created an array in res, put all the ISD codes in it, and then used Locale for all the other details. The list of codes is as follows:
<string-array name="countryArray">
<item>AC,247</item>
<item>AD,376</item>
<item>AE,971</item>
<item>AF,93</item>
<item>AG,1</item>
<item>AI,1</item>
<item>AL,355</item>
<item>AM,374</item>
<item>AN,599</item>
<item>AO,244</item>
<item>AQ,672</item>
<item>AR,54</item>
<item>AS,1</item>
<item>AT,43</item>
<item>AU,61</item>
<item>AW,297</item>
<item>AZ,994</item>
<item>BA,387</item>
<item>BB,1</item>
<item>BD,880</item>
<item>BE,32</item>
<item>BF,226</item>
<item>BG,359</item>
<item>BH,973</item>
<item>BI,257</item>
<item>BJ,229</item>
<item>BL,590</item>
<item>BM,1</item>
<item>BN,673</item>
<item>BO,591</item>
<item>BR,55</item>
<item>BS,1</item>
<item>BT,975</item>
<item>BW,267</item>
<item>BY,375</item>
<item>BZ,501</item>
<item>CA,1</item>
<item>CD,243</item>
<item>CF,236</item>
<item>CG,242</item>
<item>CH,41</item>
<item>CI,225</item>
<item>CK,682</item>
<item>CL,56</item>
<item>CM,237</item>
<item>CN,86</item>
<item>CO,57</item>
<item>CR,506</item>
<item>CU,53</item>
<item>CV,238</item>
<item>CY,357</item>
<item>CZ,420</item>
<item>DE,49</item>
<item>DJ,253</item>
<item>DK,45</item>
<item>DM,1</item>
<item>DO,1</item>
<item>DZ,213</item>
<item>EC,593</item>
<item>EE,372</item>
<item>EG,20</item>
<item>ER,291</item>
<item>ES,34</item>
<item>ET,251</item>
<item>FI,358</item>
<item>FJ,679</item>
<item>FK,500</item>
<item>FM,691</item>
<item>FO,298</item>
<item>FR,33</item>
<item>GA,241</item>
<item>GB,44</item>
<item>GD,1</item>
<item>GE,995</item>
<item>GF,594</item>
<item>GH,233</item>
<item>GI,350</item>
<item>GL,299</item>
<item>GM,220</item>
<item>GN,224</item>
<item>GP,590</item>
<item>GQ,240</item>
<item>GR,30</item>
<item>GT,502</item>
<item>GU,1</item>
<item>GW,245</item>
<item>GY,592</item>
<item>HK,852</item>
<item>HN,504</item>
<item>HR,385</item>
<item>HT,509</item>
<item>HU,36</item>
<item>ID,62</item>
<item>IE,353</item>
<item>IL,972</item>
<item>IN,91</item>
<item>IO,246</item>
<item>IQ,964</item>
<item>IR,98</item>
<item>IS,354</item>
<item>IT,39</item>
<item>JM,1</item>
<item>JO,962</item>
<item>JP,81</item>
<item>KE,254</item>
<item>KG,996</item>
<item>KH,855</item>
<item>KI,686</item>
<item>KM,269</item>
<item>KN,1</item>
<item>KP,850</item>
<item>KR,82</item>
<item>KW,965</item>
<item>KY,1</item>
<item>KZ,7</item>
<item>LA,856</item>
<item>LB,961</item>
<item>LC,1</item>
<item>LI,423</item>
<item>LK,94</item>
<item>LR,231</item>
<item>LS,266</item>
<item>LT,370</item>
<item>LU,352</item>
<item>LV,371</item>
<item>LY,218</item>
<item>MA,212</item>
<item>MC,377</item>
<item>MD,373</item>
<item>ME,382</item>
<item>MG,261</item>
<item>MH,692</item>
<item>MK,389</item>
<item>ML,223</item>
<item>MM,95</item>
<item>MN,976</item>
<item>MO,853</item>
<item>MP,1</item>
<item>MQ,596</item>
<item>MR,222</item>
<item>MS,1</item>
<item>MT,356</item>
<item>MU,230</item>
<item>MV,960</item>
<item>MW,265</item>
<item>MX,52</item>
<item>MY,60</item>
<item>MZ,258</item>
<item>NA,264</item>
<item>NC,687</item>
<item>NE,227</item>
<item>NG,234</item>
<item>NI,505</item>
<item>NL,31</item>
<item>NO,47</item>
<item>NP,977</item>
<item>NR,674</item>
<item>NU,683</item>
<item>NZ,64</item>
<item>OM,968</item>
<item>PA,507</item>
<item>PE,51</item>
<item>PF,689</item>
<item>PG,675</item>
<item>PH,63</item>
<item>PK,92</item>
<item>PL,48</item>
<item>PM,508</item>
<item>PR,1</item>
<item>PS,970</item>
<item>PT,351</item>
<item>PW,680</item>
<item>PY,595</item>
<item>QA,974</item>
<item>RE,262</item>
<item>RO,40</item>
<item>RS,381</item>
<item>RU,7</item>
<item>RW,250</item>
<item>SA,966</item>
<item>SB,677</item>
<item>SC,248</item>
<item>SD,249</item>
<item>SE,46</item>
<item>SG,65</item>
<item>SH,290</item>
<item>SI,386</item>
<item>SK,421</item>
<item>SL,232</item>
<item>SM,378</item>
<item>SN,221</item>
<item>SO,252</item>
<item>SR,597</item>
<item>ST,239</item>
<item>SV,503</item>
<item>SX,1</item>
<item>SY,963</item>
<item>SZ,268</item>
<item>TC,1</item>
<item>TD,235</item>
<item>TG,228</item>
<item>TH,66</item>
<item>TJ,992</item>
<item>TK,690</item>
<item>TM,993</item>
<item>TN,216</item>
<item>TO,676</item>
<item>TR,90</item>
<item>TT,1</item>
<item>TV,688</item>
<item>TW,886</item>
<item>TZ,255</item>
<item>UA,380</item>
<item>UG,256</item>
<item>UK,44</item>
<item>US,1</item>
<item>UY,598</item>
<item>UZ,998</item>
<item>VA,379</item>
<item>VC,1</item>
<item>VE,58</item>
<item>VG,1</item>
<item>VI,1</item>
<item>VN,84</item>
<item>VU,678</item>
<item>WF,681</item>
<item>WS,685</item>
<item>XT,800</item>
<item>YE,967</item>
<item>ZA,27</item>
<item>ZM,260</item>
<item>ZW,263</item>
</string-array>
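A minimal sketch combining the two ideas in this answer: parse the `CC,code` items into a map, then use `java.util.Locale` to turn each two-letter code into the lowercase three-letter form shown in the question. On Android the items would presumably come from `getResources().getStringArray(R.array.countryArray)`; here a plain `String[]` stands in. `IsdLookup`, `parse` and `row` are hypothetical names. Some entries (such as `AC` or `XT`) may have no three-letter code in `Locale`, so the sketch falls back to the two-letter code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

class IsdLookup {
    // Parses "CC,code" items (the format used in countryArray) into
    // two-letter ISO 3166 code -> ISD code. Later duplicates overwrite earlier ones.
    static Map<String, String> parse(String[] items) {
        Map<String, String> isd = new LinkedHashMap<>();
        for (String item : items) {
            String[] parts = item.split(",", 2);
            if (parts.length == 2) {
                isd.put(parts[0].trim(), parts[1].trim());
            }
        }
        return isd;
    }

    // Formats a row such as "ind +91" using Locale's three-letter country code,
    // falling back to the two-letter code when Locale has no alpha-3 mapping.
    static String row(String alpha2, String isdCode) {
        String code;
        try {
            code = new Locale.Builder().setRegion(alpha2).build().getISO3Country();
        } catch (MissingResourceException e) {
            code = alpha2;
        }
        return code.toLowerCase(Locale.ROOT) + " +" + isdCode;
    }

    public static void main(String[] args) {
        String[] items = {"IN,91", "IQ,964", "IT,39"};
        // prints "ind +91", "irq +964", "ita +39"
        for (Map.Entry<String, String> e : parse(items).entrySet()) {
            System.out.println(row(e.getKey(), e.getValue()));
        }
    }
}
```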
