Issue with Regular Expression - java

I made an application in which I get a field's value from a response string using a regular expression and a Matcher. I wrote a method to which I pass the response and a field name, and which returns the field's value. Today I noticed some odd behaviour: the response contains AgentId=25001220052805950, but the matcher returns fake. So I need to check whether a field whose name contains "AgentId" exists and verify its value.
Needed Fields:
SecondaryAgentId=fake; PrimaryAgentId=fake;
Response:
IsPrimaryAgentId=true; AgentId=25001220052805950; MerchantID=19; Cashier=michael; IsManualPayment=1; UserID=GraceRose; Password=rose1234; AmountUserEntered=2; AmountApproved=0; AmountDifference=0; Amount=0; CustomerNameAttempts=0; ProductID=Agriculture; InvoiceID=inv7443; SiteUrl=http://www.thcelink.com/index.php/shoping/checkout/step/step-1; ReturnURL=http://220.2.3:2027/Customer/Thanks.aspx; ResponseType=1; PrimaryAgentId=fake; PrimaryCurrencyCode=fake; SecondaryAgentId=fake; SecondaryCurrencyCode=fake; MerchantName=GraceRose; EmailId=rr#myglobal.com; Query1Attempts=0; MerchantTransactionID=543; MerchantTransactionSequenceID=246; txtAmtIsVisible=false; isQuery1Executed=false; isQuery2Executed=false; Voucher=fake; Passcode=fake; Error=fake; QueryType=fake; Payer=fake; CurrencyName=fake; CurrencySymbol=fake; CustomerName=fake; EmailBody=fake; ErrorText=fake; CustomerEmailID=fake; NavigatePageValue=0; IsCustomerInsertSucess=false; IdType=fake; IdNumber=fake; AggregateAttempts=0; Voucher2=fake; PassCode2=fake; Voucher3=fake; PassCode3=fake; TransCode=0; TransactionDate=2012-06-11T12:04:52.921875+05:30; NumberInWords=fake; MerchantCompany=fake; InvoiceNumber=fake; OverPaidAmount=0; InsufficientAmount=0; OverPaymentForEmail=fake; RedirectPage=false;
Update::
private String GetString1(String strManualproResponce2, String paternField) {
// TODO Auto-generated method stub
String s = null;
if(paternField.equalsIgnoreCase("AgentId"))
{
Pattern pinPattern2 = Pattern.compile("^"+paternField + "=(.*?);");
ArrayList<String> pins2 = new ArrayList<String>();
Matcher m2 = pinPattern2.matcher(strManualproResponce2);
while (m2.find()) {
pins2.add(m2.group(1));
s = m2.group(1);
}
}else
{
Pattern pinPattern2 = Pattern.compile(paternField + "=(.*?);");
ArrayList<String> pins2 = new ArrayList<String>();
Matcher m2 = pinPattern2.matcher(strManualproResponce2);
while (m2.find()) {
pins2.add(m2.group(1));
s = m2.group(1);
}
}
return s;
}

Your question is a little cryptic, but as I understand it, the code does not work when you try to extract the value of the AgentId field. The issue seems to be your regular expression: "^"+paternField + "=(.*?);" assumes that the text AgentId is at the beginning of your string, which it is not, since your string begins with IsPrimaryAgentId.
Also, without the ^ anchor your regex would match both IsPrimaryAgentId and AgentId, since both contain the substring AgentId. To fix this, you can use this regex: \\s+AgentId=(.*?);, which requires white space before the AgentId text.
Another option (if your AgentId is always numeric) is to use this: AgentId=(\\d+);.
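Another way to keep AgentId from matching inside longer names such as IsPrimaryAgentId is to require that the name starts either at the beginning of the string or right after a field separator. The following is only a minimal sketch: it assumes the fields are always laid out as name=value; as in the sample response, and the names FieldExtractor and extract are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FieldExtractor {

    // Returns the value of the field whose name is exactly fieldName, or null if absent.
    // Assumes the "name=value; name=value;" layout of the sample response.
    static String extract(String response, String fieldName) {
        // (?:^|;\s*) ties the name to the start of the input or to the previous ';',
        // so "AgentId" cannot match inside "IsPrimaryAgentId" or "PrimaryAgentId".
        Pattern p = Pattern.compile("(?:^|;\\s*)" + Pattern.quote(fieldName) + "=(.*?);");
        Matcher m = p.matcher(response);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String response = "IsPrimaryAgentId=true; AgentId=25001220052805950; "
                + "PrimaryAgentId=fake; SecondaryAgentId=fake;";
        System.out.println(extract(response, "AgentId"));        // 25001220052805950
        System.out.println(extract(response, "PrimaryAgentId")); // fake
    }
}
```

Pattern.quote is used so that a field name containing regex metacharacters is still treated as literal text.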

Related

Using regex and android for categorizing different fields

I am currently trying to build a business name card scanner app. The idea is to take a picture of a name card, extract the text, and sort the text into different EditText fields.
I have already completed the OCR part which extract out all the text from a name card image.
What I am missing now is to make a regex method which can take this entire text extracted from OCR and categorize the name, email address, phone number into their respective fields in EditText.
Through some googling I have already found the regex formulas below:
private static final String EMAIL_PATTERN =
"[a-zA-Z0-9\\+\\.\\_\\%\\-\\+]{1,256}" +
"\\@" +
"[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}" +
"(" +
"\\." +
"[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25}" +
")+";
private static final String PHONE_PATTERN =
"^[89]\\d{7}$";
private static final String NAME_PATTERN =
"/^[a-z ,.'-]+$/i";
Currently I am just able to extract out the email address using the below method:
public String EmailValidator(String email) {
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
return email.substring(matcher.start(), matcher.end());
} else {
// TODO handle condition when input doesn't have an email address
}
return email;
}
I am unsure of how to edit the above method so that it uses all three regex patterns at once and displays the results in different EditText fields (name, email address, phone number).
--------------------------------------------EDIT-------------------------------------------------
After using @Styx's answer,
I have a problem with how to pass the text "textToUse" to the method.
I have also tried passing the text into all three parameters, but since the method is void, that cannot be done. And if I change the method to return a String instead of void, it requires a return value.
Try this code. The function takes the recognized text and splits it on line breaks. It then loops over the lines and determines the type of each one by running a pattern check. As soon as a pattern matches, the loop moves on to the next iteration using the continue keyword. This code can also handle the case where more than one email address or phone number appears on a single business card. Hope it helps. Cheers!
public void validator(String recognizeText) {
Pattern emailPattern = Pattern.compile(EMAIL_PATTERN);
Pattern phonePattern = Pattern.compile(PHONE_PATTERN);
Pattern namePattern = Pattern.compile(NAME_PATTERN);
String possibleEmail, possiblePhone, possibleName;
possibleEmail = possiblePhone = possibleName = "";
Matcher matcher;
String[] words = recognizeText.split("\\r?\\n");
for (String word : words) {
//try to determine is the word an email by running a pattern check.
matcher = emailPattern.matcher(word);
if (matcher.find()) {
possibleEmail = possibleEmail + word + " ";
continue;
}
//try to determine is the word a phone number by running a pattern check.
matcher = phonePattern.matcher(word);
if (matcher.find()) {
possiblePhone = possiblePhone + word + " ";
continue;
}
//try to determine is the word a name by running a pattern check.
matcher = namePattern.matcher(word);
if (matcher.find()) {
possibleName = possibleName + word + " ";
continue;
}
}
//after the loop then only set possibleEmail, possiblePhone, and possibleName into
//their respective EditText here.
}
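Two details are worth checking when trying this out. A name pattern written as "/^[a-z ,.'-]+$/i" uses JavaScript-style delimiters, which Java treats as literal slashes, so in Java the case-insensitive flag has to be passed to Pattern.compile instead. Also, returning the results rather than using a void method makes it easier to fill the EditText fields from the caller. Below is a minimal sketch under those assumptions; the class name CardTextClassifier, the map keys, and the simplified email pattern are hypothetical, and the phone pattern keeps the 8-digit rule from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class CardTextClassifier {
    // Simplified email pattern, for illustration only.
    private static final Pattern EMAIL = Pattern.compile(
            "[a-zA-Z0-9+._%-]+@[a-zA-Z0-9][a-zA-Z0-9-]*(\\.[a-zA-Z0-9][a-zA-Z0-9-]*)+");
    // Eight digits starting with 8 or 9, as in the question's PHONE_PATTERN.
    private static final Pattern PHONE = Pattern.compile("^[89]\\d{7}$");
    // Java form of the name pattern: no slashes, flag passed separately.
    private static final Pattern NAME =
            Pattern.compile("^[a-z ,.'-]+$", Pattern.CASE_INSENSITIVE);

    // Classifies each line of the OCR text as "email", "phone" or "name".
    // Lines that match none of the patterns are ignored.
    static Map<String, String> classify(String recognizedText) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : recognizedText.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String key;
            if (EMAIL.matcher(trimmed).find()) {
                key = "email";
            } else if (PHONE.matcher(trimmed).find()) {
                key = "phone";
            } else if (NAME.matcher(trimmed).find()) {
                key = "name";
            } else {
                continue;
            }
            // Several lines of the same kind are joined with a space.
            result.merge(key, trimmed, (a, b) -> a + " " + b);
        }
        return result;
    }
}
```

The caller can then read each entry, for example with result.getOrDefault("email", ""), and set it on the matching EditText.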

complex regular expression in Java

I have a rather complex (to me it seems rather complex) problem that I'm using regular expressions in Java for:
I can get any text string that must be of the format:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
I started with a regular expression for extracting the text between the M:/:D:/:C:/:Q: as:
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
And that works fine if the <either a url or string> is just an alphanumeric string. But it all falls apart when the embedded string is a url of the format:
tcp://someurl.something:port
Can anyone help me adjust the above regex to extract the text after :D: when it is either a URL or an alphanumeric string?
Here's an example:
public static void main(String[] args) {
String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
boolean matchFound = false;
ArrayList<String> values = new ArrayList<>();
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
Matcher m3 = Pattern.compile(pattern2).matcher(name);
while (m3.find()) {
matchFound = true;
String m = m3.group(2);
System.out.println("regex found match: " + m);
values.add(m);
}
}
In the above example, my results would be:
myString1
tcp://someurl.com:8989
myString2
1
And note that the strings can be of variable length and alphanumeric, but some other characters are allowed too (such as the :// and/or . - characters of the URL format).
You mention that the format is constant:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
Capture groups can do this for you with the pattern:
"M:(.*):D:(.*):C:(.*):Q:(.*)"
Or you can do a String.split() with a pattern of "M:|:D:|:C:|:Q:". However, the split will return an empty element at the first index. Everything else will follow.
public static void main(String[] args) throws Exception {
System.out.println("Regex: ");
String data = "M:<some text>:D:tcp://someurl.something:port:C:<some more text>:Q:<a number>";
Matcher matcher = Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)").matcher(data);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
System.out.println();
System.out.println("String.split(): ");
String[] pieces = data.split("M:|:D:|:C:|:Q:");
for (String piece : pieces) {
System.out.println(piece);
}
}
Results:
Regex:
<some text>
tcp://someurl.something:port
<some more text>
<a number>
String.split():
<some text>
tcp://someurl.something:port
<some more text>
<a number>
To extract the URL/text part you don't need the regular expression. Use
int startPos = input.indexOf(":D:")+":D:".length();
int endPos = input.indexOf(":C:", startPos);
String urlOrText = input.substring(startPos, endPos);
Assuming you need to do some validation along with the parsing, break the regex into different parts like this:
String m_regex = "[\\w.]+"; // in Java a . inside [] is just a plain dot
String url_regex = "."; // placeholder: there are a bunch online, pick your favorite
String d_regex = "(?:" + url_regex + "|\\p{Alnum}+)"; // url or a sequence of alphanumeric characters
String c_regex = "[\\w.]+"; // but I'm assuming you want this to be a bit more restrictive, not sure
String q_regex = "\\d+"; // what sort of number exactly? assuming any string of digits here
// each named group needs a unique name, otherwise Pattern.compile throws a PatternSyntaxException
String regex = "M:(?<M>" + m_regex + "):"
+ "D:(?<D>" + d_regex + "):"
+ "C:(?<C>" + c_regex + "):"
+ "Q:(?<Q>" + q_regex + ")";
Pattern p = Pattern.compile(regex);
It might be a good idea to keep the pattern as a static field and compile it in a static block, so that the temporary regex strings don't clutter the class with basically useless fields.
Then you can retrieve each part by its name:
Matcher m = p.matcher( input );
if (m.matches()) {
String m_part = m.group( "M" );
...
String q_part = m.group( "Q" );
}
You can go a step further by making a RegexGroup interface whose implementing objects each represent one part of the regex, with a name and the actual regex. You definitely lose simplicity, though, which makes the code harder to understand at a quick glance. (I wouldn't do this; I'm just pointing out that it's possible and has its own benefits.)
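For reference, here is a small self-contained sketch of the named-group approach applied to the example string from the question. It makes a simplifying assumption that is not in the answer above: the D part accepts any non-empty text instead of a validated URL. The class and method names (MessageParser, parse) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MessageParser {
    // One uniquely named group per part. The D part is deliberately loose (.+);
    // swap in a stricter URL regex if validation is needed.
    private static final Pattern MESSAGE = Pattern.compile(
            "M:(?<M>[\\w.]+):D:(?<D>.+):C:(?<C>[\\w.]+):Q:(?<Q>\\d+)");

    // Returns the four parts in order M, D, C, Q, or null if the input does not match.
    static String[] parse(String input) {
        Matcher m = MESSAGE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("M"), m.group("D"), m.group("C"), m.group("Q") };
    }

    public static void main(String[] args) {
        String input = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        for (String part : parse(input)) {
            System.out.println(part);
        }
    }
}
```

Because matches() has to consume the whole input, the greedy .+ in the D group backtracks until the :C: and :Q: parts fit, which is what lets the colons inside the URL through.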

regex matcher check in if logic not working

Hi, you can see my code below. I have the strings country, Rank and Grank in my code. Initially they are null, but if the regex matches, the value should change. However, even when the regex matches, the value does not change; it is always null. If I remove all the if statements and assign the strings directly, it works fine, but then an exception is thrown when no match is found. Please let me know how I can check this with if logic.
System.err.println(content);
Pattern c = Pattern.compile("NAME=\"(.*)\" RANK");
Pattern r = Pattern.compile("\" RANK=\"(.*)\"");
Pattern gr = Pattern.compile("\" TEXT=\"(.*)\" SOURCE");
Matcher co = c.matcher(content);
Matcher ra = r.matcher(content);
Matcher gra = gr.matcher(content);
co.find();
ra.find();
gra.find();
String country = null;
String Rank = null;
String Grank = null;
if (co.matches()) {
country = co.group(1);
}
if (ra.matches()) {
Rank = ra.group(1);
}
if (gra.matches()) {
Grank = gra.group(1);
}
You have to escape a single \ - use double \\ then it should work.
Tried this?
while (co.find()) {
System.out.print("Start index: " + co.start());
System.out.print(" End index: " + co.end() + " ");
System.out.println(co.group());
}
Personally, I can't make your program work with or without the if statements, so it's not a problem of logic; the patterns simply don't match the string for me.
So I changed it to get something working; maybe you can use it :)
String content = "NAME=\"salut\" RANK=\"pouet\" TEXT=\"text\" SOURCE";
System.out.println(content);
System.out.println(content.replaceAll(("NAME=\"(.*)\"\\sRANK=\"(.*)\"\\sTEXT=\"(.*)\" SOURCE"), "$1---$2---$3"));
Output
NAME="salut" RANK="pouet" TEXT="text" SOURCE
salut---pouet---text
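One thing that may explain the null values in the question: Matcher.matches() only succeeds when the pattern covers the entire input, whereas Matcher.find() looks for the pattern anywhere in it. Calling find() and then testing matches() in the if therefore fails for patterns that describe only part of the content. A minimal sketch of testing the result of find() directly follows; the sample content string and the names AttributeReader and firstGroup are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttributeReader {

    // Returns group 1 of the first match of regex in content, or null if there is none.
    static String firstGroup(String regex, String content) {
        Matcher m = Pattern.compile(regex).matcher(content);
        // find() both searches and reports success, so its result is what the if should test.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical content; the real input is not shown in the question.
        String content = "NAME=\"India\" RANK=\"5\" TEXT=\"12\" SOURCE";
        // [^\"]* stops at the next quote, unlike the greedy (.*).
        String country = firstGroup("NAME=\"([^\"]*)\" RANK", content);
        String rank = firstGroup(" RANK=\"([^\"]*)\"", content);
        String grank = firstGroup(" TEXT=\"([^\"]*)\" SOURCE", content);
        System.out.println(country + " / " + rank + " / " + grank);
    }
}
```

When a pattern is not found, the corresponding variable simply stays null and no exception is thrown, because group(1) is only called after a successful find().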

String Parsing in Java and android

I want to parse the data below in java. What approach shall I follow?
I want to ignore the ; characters inside { },
so that Version, Content, Provide, UserConfig and Icon are the names, each with its corresponding value.
Version:"1";
Content:2013091801;
Provide:"Airtel";
UserConfig :
{
Checksum = "sha1-234448e7e573b6dedd65f50a2da72245fd3b";
Source = "content\\user.ini";
};
Icon:
{
Checksum = "sha1-a99f835tytytyt3177674489770e613c89390a8c4";
Source = "content\\resept_ico.bmp";
};
Here we can't use String.split(";") function.
It would have been a lot more complex to convert this using regexes and then write a method to extract the required fields.
What I did was convert the above input to a JSON-compatible string and then use Google's GSON library to parse the string into my custom class:
class MyVer
{
String Version;
long Content;
String Provide;
Config UserConfig;
Config Icon;
String Source;
}
class Config
{
String Checksum;
String Source;
}
public static void main(String[] args)
{
String s = "Version:\"1\";Content:2013091801;Provide:\"Airtel\";UserConfig :{ Checksum = \"sha1-234448e7e573b6dedd65f50a2da72245fd3b\"; Source = \"content\\user.ini\";};Icon:{ Checksum = \"sha1-a99f835tytytyt3177674489770e613c89390a8c4\"; Source = \"content\\resept_ico.bmp\";};";
String startingBracePattern = Pattern.quote("{");
String endBracePattern = Pattern.quote("}");
s=s.replaceAll(Pattern.quote("\\"), "\\\\\\\\"); //Replacing all the single \ with double \\
s = s.replaceAll("\\s*"+startingBracePattern +"\\s*", "\\{\""); //Replacing all the `spaces { spaces` with `{"` MEANS all the { to replace with {"
s = s.replaceAll(";\\s*"+endBracePattern +"\\s*;", "\\};"); //Replacing all the `; spaces } spaces ;` with `},"` MEANS all the ;}; to replace with };
s = "{\"" + s.substring(0, s.length() - 1) +"}"; //Removing last ; and appending {" and }
s = s.replaceAll("\\s*:", "\":"); // Replacing all the `space with :` with `":`
s = s.replaceAll("\\s*;\\s*", ",\""); //Replacing all the `spaces ; spaces` with `,"`
s = s.replaceAll("\\s*=\\s*", "\":"); //Replacing all the `spaces = spaces` with `":`
Gson gson = new Gson();
MyVer newObj = gson.fromJson(s, MyVer.class);
}
This converts the input and gives you a MyVer object, from which you can access all the variables.
NOTE: You can alter the code a little to replace all \r\n if they are present in your input. I have not handled them, and I put the data from your question on a single line for simplicity.
JSON sounds a lot easier in this case.
However, if you were to do this using regular expressions, one way would be:
For the simple cases (e.g. Version):
// look for Version: some stuff ;
Pattern versionPattern = Pattern.compile("Version\\s*:\\s*\"\\w+\"\\s*;");
// the whole big string you're looking in
String bigString = ...; // the entire string from before can go here
// create a matcher for the "version pattern"
Matcher versionMatcher = versionPattern.matcher(bigString);
// check if there's a match in the string
if(versionMatcher.find()) {
// get the matching substring
String matchingSubstring = bigString.substring(
versionMatcher.start(),
versionMatcher.end()
);
// we need the area between the quotes
String version = matchingSubstring.split("\"")[1];
// do something with it
...
}
for the harder (multi-line) cases (eg. UserConfig):
// look for UserConfig : { some stuff };
Pattern userconfigPattern = Pattern.compile("UserConfig\\s*:\\s*\\{[^}]*\\};", Pattern.DOTALL);
// create a matcher for the "user config pattern"
Matcher userconfigMatcher = userconfigPattern.matcher(bigString);
// check if there's a match in the string
if(userconfigMatcher.find()) {
// get the matching substring
String matchingSubstring = bigString.substring(
userconfigMatcher.start(),
userconfigMatcher.end()
);
// we need the area between the curly braces
String version = matchingSubstring.split("[{}]")[1];
// do something with it
...
}
EDIT: this is probably an easier way
// match each key-value pair: a key, a colon, then a simple value or a { } block, then a semi-colon
// (String.split() would discard the matched fields, so iterate over the matches instead;
// the braces are escaped because an unescaped { is a syntax error in a Java regex)
Pattern fieldPattern = Pattern.compile("([^:;]+):\\s*([^{;]+|\\{[^}]*\\})\\s*;");
Matcher fieldMatcher = fieldPattern.matcher(bigString);
// for each key-value pair
while (fieldMatcher.find()) {
// group 1 is the key, group 2 is the value
String key = fieldMatcher.group(1).trim();
String value = fieldMatcher.group(2).trim();
// do something with them, or add them to a map
...
}
This last way finds the fields in the input string based on the assumption that each key-value pair consists of:
some (non-colon) characters at the start, followed by
a colon,
either
-> some characters that are not curly braces or semi-colons (for simple attributes), or
-> curly braces containing some characters that are not curly braces
a semi-colon
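A fuller, self-contained sketch of this idea is shown below. It iterates over the fields with Matcher.find() and additionally parses each { } block into a nested map. It rests on assumptions that go beyond the question: blocks are never nested, values never contain ; or }, and inner entries use name = value;. The names ConfigFieldParser, parse, parseBlock and unquote are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConfigFieldParser {
    // Top level: key, ':', then a { } block or a simple value, then ';'.
    private static final Pattern FIELD = Pattern.compile(
            "\\s*([^:;{}]+?)\\s*:\\s*(\\{[^}]*\\}|[^{;]+?)\\s*;");
    // Inside a block: key, '=', value, ';'.
    private static final Pattern INNER = Pattern.compile(
            "\\s*([^=;{}]+?)\\s*=\\s*([^;]+?)\\s*;");

    // Simple values become Strings; braced blocks become Map<String, String>.
    static Map<String, Object> parse(String input) {
        Map<String, Object> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            String key = m.group(1);
            String value = m.group(2);
            if (value.startsWith("{")) {
                // Drop the surrounding braces before parsing the block body.
                fields.put(key, parseBlock(value.substring(1, value.length() - 1)));
            } else {
                fields.put(key, unquote(value));
            }
        }
        return fields;
    }

    private static Map<String, String> parseBlock(String body) {
        Map<String, String> entries = new LinkedHashMap<>();
        Matcher m = INNER.matcher(body);
        while (m.find()) {
            entries.put(m.group(1), unquote(m.group(2)));
        }
        return entries;
    }

    // Removes one pair of surrounding double quotes, if present.
    private static String unquote(String s) {
        boolean quoted = s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
        return quoted ? s.substring(1, s.length() - 1) : s;
    }
}
```

Since each find() resumes after the previous match, the semi-colons inside a block are consumed as part of that block and never treated as top-level separators.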
Here is a JSON solution:
str = "{" + str.substring(0, str.lastIndexOf(";")).replace(";\n}", "}") + "}";
try {
JSONObject json = new JSONObject(str);
String version = json.getString("Version");
JSONObject config = json.getJSONObject("UserConfig");
String source = config.getString("Source");
} catch (JSONException e) {
e.printStackTrace();
}
The replace(";\n}", "}") call is needed because a ";" must not appear directly before a "}", as it does here:
Source = "content\\resept_ico.bmp";
}

Validate File name using regex in java

I want to validate a filename using a regex in Java. I implemented the code below, but it does not work for the 3rd type of file.
Can I check the prefix and extension in the regex?
Valid filenames take one of these 3 forms:
1) prefix_digit.digit.extension, example: AB_1.1.fuij (here fuij is my extension)
2) prefix_digit.digit.digit.extension, example: AB_1.1.1.fuij
3) prefix_digit.digit.B/P.digit.extension, example: AB_1.1.B.1.fuij
Only these 3 types of file are valid. The 3rd one is for beta and pilot version files; a beta or pilot version file must look like the form shown above.
Here are some valid and invalid filenames:
**Valid :**
AB_1.1.fuij
AB_1.4.fuij
AB_1.1.1.fuij
AB_1.1.B.1.fuij
AB_3.4.P.7.fuij
***Invalid :***
AB_0.1.fuij
AB_1.B.1.1.fuij(B/P should be place on 3rd always)
AB_1.2.B.0.fuij
CODE :
import java.util.ArrayList;
import java.util.regex.Pattern;
public class democlass {
/**
* Test harness.
*/
public static void main(String[] args) {
ArrayList<String> demoversion = new ArrayList<String>();
System.out.println("Result >>>>>>>>>>>> "
+isFileValid("AB_1.1.fuij"));
System.out.println("Result >>>>>>>>>>>> "
+isFileValid("AB_1.B.fuij"));
System.out.println("Result >>>>>>>>>>>> "
+isFileValid("AB_1.1.1.fuij"));
System.out.println("Result >>>>>>>>>>>> "
+isFileValid("AB_1.P.1.1.fuij"));
System.out.println("Result >>>>>>>>>>>> "
+isFileValid("AB_1.1.B.1.fuij"));
}
private static boolean isFileValid(String input)
{
String regexFinalBugFix = "^\\d+\\.\\d+\\.\\d+$";
String regexFinal = "^\\d+\\.\\d+$";
String regexBetaPilot = "^\\d+\\.\\d+\\.\\[BP]+\\.\\d+$";
final Pattern pattern1 = Pattern.compile(regexFinal);
final Pattern pattern2 = Pattern.compile(regexBetaPilot);
final Pattern pattern3 = Pattern.compile(regexFinalBugFix);
String inputVersion = null;
int suffixIndex = input.lastIndexOf(".");
int prefixIndex = input.lastIndexOf("_");
if (suffixIndex > 0 && prefixIndex > 0) {
inputVersion = input.substring(prefixIndex + 1,
suffixIndex);
String prefixString1 = input.substring(0, 3);
String suffixString1 = input.substring(suffixIndex);
if(prefixString1.equals("AB_") && suffixString1.equals(".fuij"))
{
if (pattern1.matcher(inputVersion).matches()
|| pattern2.matcher(inputVersion).matches()
|| pattern3.matcher(inputVersion).matches()) {
return true;
}
return false;
}
return false;
}
return false;
}
}
OUTPUT :
Result >>>>>>>>>>>> true
Result >>>>>>>>>>>> false
Result >>>>>>>>>>>> true
Result >>>>>>>>>>>> false
Result >>>>>>>>>>>> false : It should be valid but it is false, why??
Your regexBetaPilot is wrong: you are escaping the opening bracket of the [BP] class. Try this instead:
String regexBetaPilot = "^\\d+\\.\\d+\\.[BP]+\\.\\d+$";
You can easily combine all three patterns into a single pattern:
String regex = "\\d+\\.(\\d+\\.([BP]+\\.)?)?\\d+";
You don't need the anchors (^ and $). Since you are using matches() instead of find(), it will always try to match the entire string.
EDIT I left in the + after [BP] because that's what you had in your original code. However, if you want to match a single B or P, then you should remove the + from the pattern.
You are escaping the opening bracket of [BP], so it tries to find a [ in the string.
This works:
String regexBetaPilot = "^\\d+\\.\\d+\\.[BP]+\\.\\d+$";
Something like this should work with AB being static:
Regular Expression: AB_\d+\.\d+((\.\d){0,1}|\.[BP]\.\d+)\.fuij
as a Java string AB_\\d+\\.\\d+((\\.\\d){0,1}|\\.[BP]\\.\\d+)\\.fuij
This misses two of your listed invalids, but I was unsure why they should be invalid. I can help more if you explain the rules for success / failure better.
You can simplify your regular expression to
AB_\d+\.\d+(?:(?:\.[BP])?\.\d+)?\.fuij
This matches AB_digits.digits. Then comes an optional .digits, .B.digits or .P.digits. And finally matches .fuij. From your examples, there might be only a single B or P. If you wish to match multiple Bs and Ps, just add the + again.
And then your isFileValid() function might be reduced to
private static boolean isFileValid(String input)
{
final String re = "AB_\\d+\\.\\d+(?:(?:\\.[BP])?\\.\\d+)?\\.fuij";
final Pattern pattern = Pattern.compile(re);
return pattern.matcher(input).matches();
}
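The patterns above still accept AB_0.1.fuij and AB_1.2.B.0.fuij, which the question lists as invalid. The question does not state the rule, so the following sketch is only a guess: it assumes the first number and the number after B/P must not start with 0. The class name FileNameRules is hypothetical.

```java
import java.util.regex.Pattern;

final class FileNameRules {
    // Assumption (guessed from the listed invalid names): the first number and the
    // number after B/P must not start with 0. Everything else follows the three forms.
    private static final Pattern STRICT = Pattern.compile(
            "AB_[1-9]\\d*\\.\\d+(?:\\.\\d+|\\.[BP]\\.[1-9]\\d*)?\\.fuij");

    static boolean isFileValid(String input) {
        // matches() must cover the whole name, so no ^ or $ anchors are needed.
        return STRICT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "AB_1.1.fuij", "AB_1.4.fuij", "AB_1.1.1.fuij", "AB_1.1.B.1.fuij", "AB_3.4.P.7.fuij",
            "AB_0.1.fuij", "AB_1.B.1.1.fuij", "AB_1.2.B.0.fuij"
        };
        for (String name : names) {
            System.out.println(name + " -> " + isFileValid(name));
        }
    }
}
```

With this pattern the five names listed as valid are accepted and the three listed as invalid are rejected; if the real rule is different, only the two [1-9]\\d* parts need to change.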
