regex to find integers in string not matching - java

I have the contents of two tables stored in StringBuffers. One has data in it; the other is only a header. I converted the StringBuffers into Strings and removed whitespace.
table1:
ACCOUNT_NUMBER;BRANCH_CODE;RECALC_ACTION_CODE;RECALC_DATE;PROCESS_NO;PRINCIPAL_CHG_AMXX23QRUP120970003;023;E;05.09.2013;1;-522.53
table2:
ACCOUNT_NUMBER;BRANCH_CODE;MSG_TYPE
I only want to proceed with a table if it has data in it, like table1.
To check for data (i.e. integers) I used a regex: table1.matches("\\d"), but this returns false. I also tried table1.matches("(?s)\\d") to account for newline characters, but even this returns false.
How can I check for integer data in the strings?

Read the documentation on matches. The "match" requires the entire string to match, and so your table1.matches("\\d") fails -- "table1" is not 'one digit only'.
Use table1.matches(".*\\d.*") instead. Note the double backslash! You might not be aware they need escaping in a String constant.
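As a minimal sketch of the difference, the snippet below compares matches() with Matcher.find(), which succeeds as soon as any part of the input matches. The class name DigitCheck and the helper containsDigit are made up for illustration; the table strings are taken from the question.

```java
import java.util.regex.Pattern;

class DigitCheck {
    // Compiled once; "\\d" in Java source is the regex \d (a single digit).
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // find() looks for a match anywhere in the input,
    // whereas String.matches() requires the whole input to match.
    static boolean containsDigit(String s) {
        return DIGIT.matcher(s).find();
    }

    public static void main(String[] args) {
        String table1 = "ACCOUNT_NUMBER;BRANCH_CODE;RECALC_ACTION_CODE;RECALC_DATE;"
                + "PROCESS_NO;PRINCIPAL_CHG_AMXX23QRUP120970003;023;E;05.09.2013;1;-522.53";
        String table2 = "ACCOUNT_NUMBER;BRANCH_CODE;MSG_TYPE";

        System.out.println(table1.matches("\\d"));      // whole string is not one digit
        System.out.println(table1.matches(".*\\d.*"));  // whole string contains a digit
        System.out.println(containsDigit(table1));      // same check via find()
        System.out.println(containsDigit(table2));      // header only, no digits
    }
}
```

Note that . does not match line terminators by default, so if the string could still contain newlines, ".*\\d.*" would need the (?s) flag, while find() works either way.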

Related

Match custom pattern in regex multiple times

I am trying to parse a query which I need to modify to replace a specific property and its value with another property and different values. I am struggling to write a regex that will match the specific property and its value.
Here are some examples to illustrate my point. test:property is the property name that we need to match.
Property with a single value: test:property:schema:Person
Property with multiple values (there is no limit on how many values there can be - this example uses 3): test:property:(schema:Person OR schema:Organization OR schema:Place)
Property with a single value in brackets: test:property:(schema:Person)
Property with another property in the query string (i.e. there are other parts of the string that I'm not interested in): test:property:schema:Person test:otherProperty:anotherValue
Also note that other combinations are possible such as other properties being before the property I need to capture, my property having multiple values with another property present in the query.
I want to match on the entire test:property section with each value captured within that match. Given the examples above these are the results I am looking for:
# | Match | Groups
1 | test:property:schema:Person | schema:Person
2 | test:property:(schema:Person OR schema:Organization OR schema:Place) | schema:Person, schema:Organization, schema:Place
3 | test:property:(schema:Person) | schema:Person
4 | test:property:schema:Person | schema:Person
Note: #1 and #4 produce the same output. I wanted to illustrate that the rest of the string should be ignored (I only need to change the test:property key and value).
The pattern of schema:Person is defined as \w+\:\w+, i.e. one or more word characters, followed by a colon, followed by one or more word characters.
If we define the known parts of the string with names I think I can express what I want to match.
schema:Person - <TypeName> - note that the first part, schema in this case, is not fixed and can be different
test:property - <MatchProperty>
<MatchProperty>: // property name (which is known and the same - in the examples this is `test:property`) followed by a colon
( // optional open bracket
<TypeName>
(OR <TypeName>)* // optional additional TypeNames separated by an OR
) // optional close bracket
Every example I've found has had simple alphanumeric characters in the repeating section but my repeating pattern contains the colon which seems to be tripping me up. The closest I've got is this:
(test\:property:(?:\(([\w+\:\w+]+ [OR [\w+\:\w+]+)\))|[\w+\:\w+]+)
Which works okayish when there are no other properties (although the match for example #2 contains the entire property and value as the first group result, and a second group with the property value) but goes crazy when other properties are included.
Also, putting that regex through https://regex101.com/ I know it's not right as the backslash characters in the square brackets are being matched exactly. I started to have a go with capturing and non-capturing groups but got as far as this before giving up!
(?:(\w+\:\w+))(?:(\sOR\s))*(?:(\w+\:\w+))*
This isn't a complete solution if you want pure regex because there are some limitations to regex and Java regex in particular, but the regexes I came up with seem to work.
If you're looking to match the entire sequence, the following regex will work.
test:property:(?:\((\w+:\w+)(?:\sOR\s(\w+:\w+))*\)|(\w+:\w+))
Unfortunately, the repeated capture groups will only capture the last match, so in queries with multiple values (like example 2), groups 1 and 2 will be the first and last values (schema:Person and schema:Place). In queries without parentheses, the value will be in group 3.
If you know the maximum number of values, you could just generate a massive regex that will have enough groups, but this might not be ideal depending on your application.
The other regex to find values in groups of arbitrary length uses regex's positive lookbehind to match valid values. You can then generate an array of matches.
(?<=test:property:(?:(?:\((?:\w+:\w+\sOR\s)+)|\(?))\w+:\w+
The issue with this method is that Java lookbehind appears to have some limitations, specifically, not allowing unbounded or complex quantifiers. I'm not a Java person so I haven't tried things out for myself, but it seems like this wouldn't work either. If someone else has another solution, please post another answer!
With this in mind, I would probably suggest going with a combination regex + string parsing method. You can use regex to parse out the value or multiple values (separated by OR), then split the string to get your final values.
To match either the entire part inside the parentheses or the single value without parentheses, you can use this regex:
test:property:(?:\((\w+:\w+(?:\sOR\s\w+:\w+)*)\)|(\w+:\w+))
It's still split into two groups where one matches values with parentheses and the other matches values without (to avoid matching unpaired parentheses), but it should be usable.
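The regex-plus-split approach could look roughly like this in Java. This is only a sketch: the class name PropertyValues and the method extract are invented for illustration, and it assumes values are separated by exactly " OR " as in the question's examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropertyValues {
    // Group 1: everything inside the parentheses; group 2: a single bare value.
    private static final Pattern PROPERTY = Pattern.compile(
            "test:property:(?:\\((\\w+:\\w+(?:\\sOR\\s\\w+:\\w+)*)\\)|(\\w+:\\w+))");

    // Returns the values of the first test:property occurrence, or an empty list.
    static List<String> extract(String query) {
        Matcher m = PROPERTY.matcher(query);
        List<String> values = new ArrayList<>();
        if (!m.find()) {
            return values;
        }
        if (m.group(1) != null) {
            // Parenthesised form: split the captured text on the OR separator.
            values.addAll(Arrays.asList(m.group(1).split("\\sOR\\s")));
        } else {
            // Bare form: the single value is in group 2.
            values.add(m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract(
                "test:property:(schema:Person OR schema:Organization OR schema:Place)"));
        System.out.println(extract(
                "test:property:schema:Person test:otherProperty:anotherValue"));
    }
}
```

m.start() and m.end() give the span of the whole test:property section, which is what you would replace when rewriting the query.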
If you want to play around with these regexes or learn more, here's a regexr: https://regexr.com/65kma

Regex - replace all nulls with "", except before pattern

I have a json (not pretty formatted) where all fields with null values need to be replaced with empty string (""), except when the field names (or keys) contains the word "date" or "Date" (or "_Date").
Example (not exhaustive):
"Effective_Date__c":null
"Birthdate":null
How to do this using Java Regex?
First, I will echo the sentiment that a real JSON parser is a better idea.
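Second, assuming there is one field per line as in your example, you can do this with a negative lookbehind that checks that :null is not preceded by a key ending in 'date' or 'Date'. Because the keys are quoted, the lookbehind also has to allow for the closing quote. Hard-coding the : as a separator, replacing
(?<![dD]ate"?):null
with
:""
should get you most of what you've asked for.
This works by searching for :null without date or Date (plus an optional closing quote) directly before it, and replacing that with the :"" empty string you have requested. Note that the lookbehind only inspects the characters immediately before the colon, so a key with Date in the middle, such as "Effective_Date__c", is not excluded by this pattern.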
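A lookbehind only sees the characters directly before the colon, so it cannot exclude a key like "Effective_Date__c", where Date sits in the middle. One possible alternative, sketched below, captures the whole key and uses a negative lookahead to reject any key containing date or Date. This swaps the lookbehind for a lookahead, and the class and method names are made up. It assumes keys and values contain no escaped quotes and that no string value contains text resembling "key":null; a real JSON parser remains the safer choice.

```java
import java.util.regex.Pattern;

class NullToEmpty {
    // "(key)":null where the lookahead rejects keys containing "date" or "Date".
    // Group 1 is the key without its surrounding quotes.
    private static final Pattern NULL_FIELD =
            Pattern.compile("\"((?![^\"]*[dD]ate)[^\"]*)\"\\s*:\\s*null\\b");

    // Replaces null with "" for every matching field; whitespace around the
    // colon of a replaced field is dropped.
    static String replaceNulls(String json) {
        return NULL_FIELD.matcher(json).replaceAll("\"$1\":\"\"");
    }

    public static void main(String[] args) {
        String json = "{\"Name\":null,\"Effective_Date__c\":null,\"Birthdate\":null,\"Age\":5}";
        System.out.println(replaceNulls(json));
    }
}
```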

Nrs in Endeca query is not fetching results when we give encoded value along with English character in url

We are using Endeca to fetch the records since they are huge in number. We have a dataTable at frontend that displays the records fetched from Endeca through Endeca query.
Now, when we filter the results based on the checkbox values at the frontend, the query appends the Nrs attribute and gets the filtered results. For any Chinese, Russian, or special characters, we encode them and create the query. Example:
N=0&Ntk=All&Ntx=mode+matchall&Ntt=rumtek&Nrs=collection()/record[(customerName="%22RUMTEK%22+LTD.")]&No=0&Ns=,Endeca.stratify(collection()/record[not%20(invoiceDate)])||invoiceDate|1||,Endeca.stratify(collection()/record[not%20(invoiceNumber)])||invoiceNumber|1
In the above query, results are fetched based on the value "rumtek", and we apply a filter with the value ""RUMTEK" LTD.". After encoding, the filter value becomes "%22RUMTEK%22+LTD.". This query fetches no results.
Results are fetched when we give either a completely encoded term (for example, for any Chinese word we give the encoded value) or any English word. Results are not fetched when we give terms containing double quotes ("), for example "ABC" LTD., or AB&C (AB%26C).
One more issue: what if we have made AB a stop word (a word that won't be searched)? If we search for AB&C, would it search for AB&C, or would it treat the entire term as a stop word?
Any suggestion will be appreciated.
Thanks in Advance.
First, you need to make sure that your Nrs parameter is entirely and properly URL encoded. Second, you need to make sure you properly escape your double quotes because you want to match against them.
As you said, your data contains some record whose customerName property is (without brackets) ["RUMTEK" LTD.]. According to the MDEX Development Guide, to use double quotes as a literal value you need to escape it by prepending it with a double quote character (how confusing!). So, in order to match on this, you would need to have a query string like (separated into lines for readability):
N=0&
Ntk=All&
Ntx=mode+matchall&
Ntt=rumtek&
Nrs=collection()/record[(customerName="""RUMTEK"" LTD.")]&
No=0&
Ns=,Endeca.stratify(collection()/record[not%20(invoiceDate)])||invoiceDate|1||,Endeca.stratify(collection()/record[not%20(invoiceNumber)])||invoiceNumber|1
Now, it isn't ready yet. You need to URL encode the ENTIRE Nrs parameter value. So it would become:
N=0&
Ntk=All&
Ntx=mode+matchall&
Ntt=rumtek&
Nrs=collection%28%29%2Frecord%5B%28customerName%3D%22%22%22RUMTEK%22%22+LTD.%22%29%5D&
No=0&
Ns=,Endeca.stratify(collection()/record[not%20(invoiceDate)])||invoiceDate|1||,Endeca.stratify(collection()/record[not%20(invoiceNumber)])||invoiceNumber|1
That should get you what you need without having to resort to wildcard queries.
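If the query string is built in Java, the two steps (double the quotes, then URL-encode the whole Nrs value) might be sketched as follows. The class NrsBuilder and its methods are hypothetical helpers, not part of any Endeca API; only java.net.URLEncoder is a real library call here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class NrsBuilder {
    // Builds the raw record filter, escaping each literal double quote
    // by doubling it, as described above.
    static String recordFilter(String property, String value) {
        String escaped = value.replace("\"", "\"\"");
        return "collection()/record[(" + property + "=\"" + escaped + "\")]";
    }

    // URL-encodes the entire parameter value as UTF-8.
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = recordFilter("customerName", "\"RUMTEK\" LTD.");
        System.out.println(raw);
        System.out.println("Nrs=" + encode(raw));
    }
}
```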

Split result not usable on android

I have a string that I split, and it works perfectly until I want to use it: when I use a 'for' loop to read what I have in my String table, it shows exactly what I want, but when I use if(MyStringTable[1] == "a") the condition isn't true, even though I just saw that MyStringTable[1] was equal to "a".
My string table is "static" declared.
I'm wondering if there is an invisible character or something that has been created with the split.
For Strings, use .equals() to check whether one String is equal to another; == compares object references, not contents. If one of them is a char, convert it to a String first (for example with String.valueOf()) so the same approach applies.
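A small sketch of the difference, using a made-up input string and class name:

```java
class SplitCompare {
    // Content comparison; putting the literal first also avoids a
    // NullPointerException if s is null.
    static boolean isA(String s) {
        return "a".equals(s);
    }

    public static void main(String[] args) {
        String[] parts = "x,a,b".split(",");

        // Reference comparison: split() creates new String objects, so this
        // is usually false even though the text is the same.
        System.out.println(parts[1] == "a");

        // Content comparison: true whenever the characters are the same.
        System.out.println(isA(parts[1]));

        // Comparing against a char: convert it to a String first.
        char c = 'a';
        System.out.println(String.valueOf(c).equals(parts[1]));
    }
}
```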

Hidden char StringTokenizer/split in Java?

I have a text file in which each line looks like this:
author-title-kind
I have a Java program that parses this file and must return only the books whose author is "example".
I read one line at a time, and then I split the string with StringTokenizer or split().
So I will get 3 items: author, title, kind.
Then I check if the first item string is equal to "example".
The problem is that I always get false, and never true.
Is there any hidden character so that this comparison ends always with false?
Maybe I should check with "example-", or "-example"...or anything else?
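Remember that String.split() takes a regular expression as a separator and not just a string. I would use Apache Commons StringUtils.split() if you want basic string splitting without regex semantics (note that it treats each character of the separator argument as a delimiter; StringUtils.splitByWholeSeparator() splits on the whole string).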
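To illustrate the regex point with the standard library only, here is a sketch. The class AuthorFilter and the method isByAuthor are invented for illustration, and it assumes stray whitespace around the author field should be ignored. Pattern.quote() makes split() treat the separator literally; a hyphen happens to be safe as a regex, but a separator such as "." is not.

```java
import java.util.regex.Pattern;

class AuthorFilter {
    // Splits "author-title-kind" on a literal hyphen and compares the first
    // field by content, ignoring surrounding whitespace.
    static boolean isByAuthor(String line, String author) {
        String[] fields = line.split(Pattern.quote("-"), 3);
        return fields[0].trim().equals(author);
    }

    public static void main(String[] args) {
        System.out.println(isByAuthor("example-Some Title-novel", "example"));

        // "." as a regex matches every character, so nothing is left over:
        System.out.println("a.b.c".split(".").length);                // 0
        // Quoting the separator splits on the literal dot:
        System.out.println("a.b.c".split(Pattern.quote(".")).length); // 3
    }
}
```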
