I need to extract a substring, but only if a condition is met.
Users enter clients' telephone numbers in this column (it's a varchar column).
Some examples of those stored values are:
23==880-3112==9435
52==031 31466==171
321==15850
The '=' characters stand in for digits I don't want to share.
I need to extract the mobile number (the one with a length of 10), but as you can see, the numbers are not stored in the same positions, so I can't use LEFT() or RIGHT(). Some are separated with '-', others with a space between the house number and the mobile number.
If this can't be done with SQL, I'm using Java but I don't even know where to start.
Thanks in advance.
EDIT:
I'm using SQL SERVER 2012
The expected results from the examples are
3112==9435
31466==171
312==15850
I always want to obtain the 10-character number.
Sorry and thanks
You can use a string comparison function.
The WHERE part for your first example would be
WHERE TELEPHONE_NO LIKE '23%%880-3112%%9435'
[Edit]
To extract a substring you can use https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_substring-index
mysql> SELECT SUBSTRING_INDEX('23==880-3112==9435', '-', -1);
-> '3112==9435'
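Since the question also mentions Java as a fallback, here is a minimal sketch of one way to do it there. It assumes the house and mobile numbers are separated only by spaces or hyphens, as in the examples, and that the mobile number is the only piece with exactly 10 characters. The class and method names are made up for illustration.

```java
import java.util.Optional;

class MobileExtractor {
    // Splits the stored value on runs of whitespace or hyphens (the separators
    // seen in the examples) and returns the first piece that is exactly
    // 10 characters long, or an empty Optional if there is none.
    static Optional<String> extractMobile(String stored) {
        if (stored == null) {
            return Optional.empty();
        }
        for (String piece : stored.trim().split("[\\s-]+")) {
            if (piece.length() == 10) {
                return Optional.of(piece);
            }
        }
        return Optional.empty();
    }
}
```

With real (unmasked) data, the length check could probably be tightened to piece.matches("\\d{10}") so that only 10-digit pieces are accepted.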
I am new to Java and I am trying to write a program for finding a specific string value in CSV documents that use a pipe ( | ) as the delimiter (only using core Java, so no CSVreader library). I need to find a specific string "Employee Name" in the CSV docs, but since they are inconsistently formatted, I want the program to essentially use the searched string("Employee Name") as the column header and return all of the values under it, until it reaches the null value at the end of the list (the number of elements that would be in said column are unknown).
I was planning on using the Scanner class with a series of loops to keep track of the number of delimiter occurrences per line (that is, restarting the count after every line break) so that I could use the relative location of the searched string to retrieve the values under it as a "column". So if there were 3 delimiters before "Employee Name" was matched on line 4, the program would return the value of the field that is 3 delimiters in for line 5, line 6, line 7, etc., until it hits a null value in one of the lines.
Sidenote: I am trying to let the search term contain whitespace (as in the case of "Employee Name"), so I have been using the custom delimiter:
scanner.useDelimiter("[|\r\n]")
If this regex isn't correct for the task I am trying to accomplish, please let me know.
I would sincerely appreciate any suggestions or guidance in solving this.
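One possible way to approach the plan described above, sketched in core Java only. It reads the file as a list of lines instead of using Scanner tokens, which makes it easier to know which column a field belongs to. The class and method names are hypothetical, and it assumes an empty or missing field marks the end of the column.

```java
import java.util.ArrayList;
import java.util.List;

class PipeColumnFinder {
    // Finds the first line that contains the header, remembers its column
    // index, then collects the field at that index from the following lines
    // until a line has no value there.
    static List<String> columnUnder(List<String> lines, String header) {
        int column = -1;
        List<String> values = new ArrayList<>();
        for (String line : lines) {
            // The -1 limit keeps trailing empty fields such as "a|b|".
            String[] fields = line.split("\\|", -1);
            if (column < 0) {
                for (int i = 0; i < fields.length; i++) {
                    if (fields[i].trim().equals(header)) {
                        column = i;
                        break;
                    }
                }
                continue; // header line itself (or a line before it) holds no values
            }
            if (column >= fields.length || fields[column].trim().isEmpty()) {
                break; // empty or missing field: treat as the end of the column
            }
            values.add(fields[column].trim());
        }
        return values;
    }
}
```

The lines could come from java.nio.file.Files.readAllLines. Because each line is split only on the pipe, a header with spaces such as "Employee Name" needs no special handling.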
I have a Java web application that takes input from the user. Once I have the input I need to parse it, and how I parse it depends on the kind of input. I decided to use Java's Pattern class for some predefined user inputs.
So I need the last 2 regex patterns:
a) Enumeration:
input can be - A03,B24.1,A25.7
The simple way would be to check whether there is a comma ([^,]+), but that would mean a lot of changes to the parsing function, which I would like to avoid. So, in addition to the comma, it should check that the input:
starts with a letter
has a minimum of 3 characters (letters combined with numbers)
can have one dot in the word
has a minimum of 1 comma (updated)
b) Mixed
input can be A03,B24.1-B35.5,A25.7
So everything the Enumeration part has, with the addition that it can contain at least one dash.
I've tried several online regex generators but didn't get it right. It would be much appreciated if you could help.
Here is what I have for a simple range such as B24.1-B35.5:
"='.{1}\\d{0,2}-.{1}\\d{0,2}'|='.{1}\\d{1,2}.\\d{1,2}-.{1}\\d{1,2}.\\d{1,2}'";
Edit1: Valid and Invalid inputs
for a) Enumeration
A03,B24.1,A25.7 Valid
A03,B24.1 Valid
A03,B24.1-B25.1 - Invalid, because an enumeration should not contain a dash
A03 - Invalid, because there is no comma
A03,B24.1 - Valid
A03 Invalid
for b)Mixed
Everything an enumeration has, with the addition that it can contain a dash too.
You can use this regex for (a) Enumeration part as per your rules:
[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?(?:,[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+
Rules:
Verifies that each segment starts with a letter
Minimum of three letters or numbers [A-Za-z][A-Za-z0-9]{2,}
Optionally followed by a decimal point . and one or more letters or numbers, i.e. (?:\.[A-Za-z0-9]{1,})?
The same thing repeated, separated by a comma. There must also be at least one comma, hence the +, i.e. (?:,[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+
?: to indicate non-capturing group
Using [A-Za-z0-9] instead of \w to avoid underscores
Regex101 Demo
For (b) Mixed, you haven't shared too many valid and invalid cases, but based on my current understanding here's what I have:
[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?(?:[,-][A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+
Note that , from previous regex has been replaced with [,-] to allow - as well!
Regex101 Demo
// Will match
A03,B24.1-B35.5,A25.7
A03,B24.1,A25.7
A03,B24.1-B25.1
Hope this helps!
EDIT: Making sure each group starts with a letter (and not a number)
Thanks to #diginoise and #anubhava for pointing out! Changed [A-Za-z0-9]{3,} to [A-Za-z][A-Za-z0-9]{2,}
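Since the question uses Java's Pattern class, here is a small sketch of how the two regexes above might be wired in. The class and method names are hypothetical. It uses Matcher.matches() so the whole input has to match, not just a part of it.

```java
import java.util.regex.Pattern;

class CodeListPatterns {
    // One code: a letter, at least two more letters/digits, optional ".xxx" part.
    private static final String CODE =
            "[A-Za-z][A-Za-z0-9]{2,}(?:\\.[A-Za-z0-9]{1,})?";

    // (a) Enumeration: codes separated by commas, at least one comma.
    static final Pattern ENUMERATION = Pattern.compile(CODE + "(?:," + CODE + ")+");

    // (b) Mixed: codes separated by commas or dashes, at least one separator.
    static final Pattern MIXED = Pattern.compile(CODE + "(?:[,-]" + CODE + ")+");

    static boolean isEnumeration(String input) {
        return ENUMERATION.matcher(input).matches();
    }

    static boolean isMixed(String input) {
        return MIXED.matcher(input).matches();
    }
}
```

Note that the mixed pattern also accepts a pure enumeration, so if the two cases need different handling, isEnumeration would have to be checked first.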
As I said in the comments, I would chop the input by commas and verify each segment separately. Your domain (ICD-10-CM codes) is very well defined, and I would be very wary of any input that could be invalid yet still pass validation.
Here is my solution:
regex
([A-TV-Z][0-9][A-Z0-9](\.?[A-Z0-9]{0,4})?)
... however I would avoid that.
Since your domain is (most likely) medical software, people's lives (or at least their well-being) are at stake. Not to mention astronomical damages and the lawyers ever-chasing ambulances. Therefore avoid the easy solution, and implement the bomb-proof one.
You could use the regex to establish that given code is definitely not valid. However if a code passes your regex it does not mean that it is valid.
bomb proof method
See this example: O09.7, O09.70, O09.71, O09.72, O09.73 are valid entries, but O09.1 is not valid.
Therefore, just get all possible codes. According to this gist there are 42784 different codes. Load them into memory; any code that is not in the set is not valid. You could compress the list and be clever about the in-memory encoding to save space, but verbatim all the codes take under 300kB on disk, so a few MB at most in memory. That is not a massive cost to pay to avoid someone having the left kidney removed instead of the right one.
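A minimal sketch of the set-based check in Java, assuming the full code list has already been loaded into a Set (for example from a file with one code per line). The class name is hypothetical, and ranges are only checked at their two end points, which is an assumption and not something the question specifies.

```java
import java.util.Set;

class CodeSetValidator {
    private final Set<String> validCodes;

    // validCodes is expected to hold every known code, loaded elsewhere.
    CodeSetValidator(Set<String> validCodes) {
        this.validCodes = validCodes;
    }

    // Chops the input by commas, then by dashes (for ranges), and requires
    // every resulting code to be in the set. The -1 limit keeps empty
    // segments, so inputs like "A03," or "" are rejected.
    boolean allValid(String input) {
        for (String segment : input.split(",", -1)) {
            for (String code : segment.split("-", -1)) {
                if (!validCodes.contains(code.trim())) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With a set containing O09.7 and O09.70 but not O09.1, the input "O09.7,O09.70" passes and "O09.7,O09.1" is rejected, even though O09.1 would pass the regex.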
We are using Endeca to fetch the records since they are huge in number. We have a dataTable at frontend that displays the records fetched from Endeca through Endeca query.
Now, when we filter the results based on the checkbox values at the frontend, the query appends the Nrs attribute and gets the filtered results. For any Chinese, Russian, or special characters, we encode them and create the query. Example:
N=0&Ntk=All&Ntx=mode+matchall&Ntt=rumtek&Nrs=collection()/record[(customerName="%22RUMTEK%22+LTD.")]&No=0&Ns=,Endeca.stratify(collection()/record[not%20(invoiceDate)])||invoiceDate|1||,Endeca.stratify(collection()/record[not%20(invoiceNumber)])||invoiceNumber|1
In the above query, results are fetched based on the value "rumtek", and we apply a filter with the value ""RUMTEK" LTD.". After encoding, the filter value is converted to "%22RUMTEK%22+LTD.". This query fetches no results.
Results are fetched when we give either a completely encoded term (for example, the encoded value of a Chinese word) or an English word. Results are not fetched when we give terms containing double quotes ("), for example "ABC" LTD., or an ampersand, for example AB&C (AB%26C).
One more issue: what if we have made AB a stop word (a word that won't be searched)? If we search for AB&C, would it search for AB&C, or would it treat the entire term as a stop word?
Any suggestion will be appreciated.
Thanks in Advance.
First, you need to make sure that your Nrs parameter is entirely and properly URL encoded. Second, you need to make sure you properly escape your double quotes because you want to match against them.
As you said, your data contains some record whose customerName property is (without brackets) ["RUMTEK" LTD.]. According to the MDEX Development Guide, to use double quotes as a literal value you need to escape it by prepending it with a double quote character (how confusing!). So, in order to match on this, you would need to have a query string like (separated into lines for readability):
N=0&
Ntk=All&
Ntx=mode+matchall&
Ntt=rumtek&
Nrs=collection()/record[(customerName="""RUMTEK"" LTD.")]&
&No=0&
Ns=,Endeca.stratify(collection()/record[not%20(invoiceDate)])||invoiceDate|1||,Endeca.stratify(collection()/record[not%20(invoiceNumber)])||invoiceNumber|1
Now, it isn't ready yet. You need to URL encode the ENTIRE Nrs parameter value. So it would become:
N=0&
Ntk=All&
Ntx=mode+matchall&
Ntt=rumtek&
Nrs=collection%28%29%2Frecord%5B%28customerName%3D%22%22%22RUMTEK%22%22+LTD.%22%29%5D&
&No=0&
Ns=,Endeca.stratify(collection()/record[not%20(invoiceDate)])||invoiceDate|1||,Endeca.stratify(collection()/record[not%20(invoiceNumber)])||invoiceNumber|1
That should get you what you need without having to resort to wildcard queries.
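The two steps above (double the quotes, then URL-encode the whole Nrs value) could be sketched in Java roughly like this. The class and method names are hypothetical. URLEncoder is the standard java.net class; it encodes spaces as + and produces the same encoded value as shown above for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class NrsFilterBuilder {
    // Escapes literal double quotes by doubling them.
    static String escapeQuotes(String literal) {
        return literal.replace("\"", "\"\"");
    }

    // Builds the record filter for one property/value pair and URL-encodes
    // the ENTIRE filter expression, not only the value.
    static String buildNrs(String property, String value) {
        String filter = "collection()/record[(" + property + "=\""
                + escapeQuotes(value) + "\")]";
        try {
            return URLEncoder.encode(filter, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

The returned string is what goes after Nrs= in the query; the other parameters stay as they are.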
I have numbers like 32,33,33.1,33.2,34,34.1,35,35.1,35.2,35.3,35.4,36 and so on. Is it possible that if I change the number 32 to 50, all the following numbers change accordingly, e.g. 50,51,51.1,51.2,52,52.1,53,53.1,53.2,53.3,53.4,54, maybe using a regex pattern or code in Java?
Based on the Excel tag and assuming the numbers are in different cells, key 18 and copy that cell, select the numbers and Paste Special with Operation Add.
You will need to code this in Java. Arithmetic is impractical to do with regexes.
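A minimal sketch of what that Java code might look like, assuming the numbers arrive as one comma-separated string and that only the whole-number part should shift while the part after the dot stays as it is. The class and method names are made up for illustration.

```java
class NumberShifter {
    // Returns the part of a token before the dot, e.g. "33" for "33.1".
    private static String wholePart(String token) {
        int dot = token.indexOf('.');
        return dot < 0 ? token : token.substring(0, dot);
    }

    // Shifts every number so that the first one becomes newStart.
    // The suffix after the dot is kept as text, so "33.1" becomes "51.1".
    static String shift(String csv, int newStart) {
        String[] tokens = csv.split(",");
        int offset = newStart - Integer.parseInt(wholePart(tokens[0].trim()));
        StringBuilder out = new StringBuilder();
        for (String raw : tokens) {
            String token = raw.trim();
            String whole = wholePart(token);
            String suffix = token.substring(whole.length());
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(Integer.parseInt(whole) + offset).append(suffix);
        }
        return out.toString();
    }
}
```

Calling shift with the list from the question and 50 as the new start applies an offset of 18 to each whole part, which gives the renumbered list the question asks for.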
I am looking to compare partial strings of data that are in two separate columns and, if there is a match, print a statement in a third column such as "yes, it's a match" or "there is no match". The problem is that there is extra data in the first column, so it won't be an exact match; I'm essentially searching for or comparing certain words. I have over 8000 rows, and doing it one by one would take forever. Is there a function I can use in Excel to make this process easier?
In Excel you can combine SEARCH() and ISERROR().
SEARCH() returns the index of the start of a string match, or the #VALUE error otherwise. Using IF(ISERROR()) on this will let you output something based on whether there was a match or not.
=IF(ISERROR(SEARCH(B2,A2)),"No Match", "Match")