String Parsing in Java and android - java

I want to parse the data below in Java. What approach should I follow?
I want to ignore any ; that appears inside { }.
The names should be Version, Content, Provide, UserConfig and Icon, each with its corresponding value.
Version:"1";
Content:2013091801;
Provide:"Airtel";
UserConfig :
{
Checksum = "sha1-234448e7e573b6dedd65f50a2da72245fd3b";
Source = "content\\user.ini";
};
Icon:
{
Checksum = "sha1-a99f835tytytyt3177674489770e613c89390a8c4";
Source = "content\\resept_ico.bmp";
};
Here we can't use the String.split(";") function, because it would also split on the ; inside the braces.

Doing this with a regex and then writing a method to extract the required fields would have been a lot more complex.
What I did instead was convert the input above to a JSON-compatible string and then use Google's GSON library to parse that string into my own class:
import java.util.regex.Pattern;
import com.google.gson.Gson;
class MyVer
{
String Version;
long Content;
String Provide;
Config UserConfig;
Config Icon;
String Source;
}
class Config
{
String Checksum;
String Source;
}
public static void main(String[] args)
{
String s = "Version:\"1\";Content:2013091801;Provide:\"Airtel\";UserConfig :{ Checksum = \"sha1-234448e7e573b6dedd65f50a2da72245fd3b\"; Source = \"content\\user.ini\";};Icon:{ Checksum = \"sha1-a99f835tytytyt3177674489770e613c89390a8c4\"; Source = \"content\\resept_ico.bmp\";};";
String startingBracePattern = Pattern.quote("{");
String endBracePattern = Pattern.quote("}");
s = s.replaceAll(Pattern.quote("\\"), "\\\\\\\\"); // replace every single \ with a double \\
s = s.replaceAll("\\s*" + startingBracePattern + "\\s*", "\\{\""); // replace `spaces { spaces` with `{"`
s = s.replaceAll(";\\s*" + endBracePattern + "\\s*;", "\\};"); // replace `; spaces } spaces ;` with `};`
s = "{\"" + s.substring(0, s.length() - 1) + "}"; // drop the last ; and wrap the whole string in {" and }
s = s.replaceAll("\\s*:", "\":"); // replace `spaces :` with `":`
s = s.replaceAll("\\s*;\\s*", ",\""); // replace `spaces ; spaces` with `,"`
s = s.replaceAll("\\s*=\\s*", "\":"); // replace `spaces = spaces` with `":`
Gson gson = new Gson();
MyVer newObj = gson.fromJson(s, MyVer.class);
}
This converts the input and gives you a MyVer object, whose fields you can then access.
NOTE: You can alter the code slightly to strip any \r\n present in your input. I have not handled them here; for simplicity I put the data from your question on a single line.

JSON sounds a lot easier in this case.
However, if you were to do this using regular expressions, one way would be the following.
For the simple cases (e.g. Version):
// look for Version: some stuff ;
Pattern versionPattern = Pattern.compile("Version\\s*:\\s*\"\\w+\"\\s*;");
// the whole big string you're looking in
String bigString = ...; // the entire string from before can go here
// create a matcher for the "version pattern"
Matcher versionMatcher = versionPattern.matcher(bigString);
// check if there's a match in the string
if(versionMatcher.find()) {
// get the matching substring
String matchingSubstring = bigString.substring(
versionMatcher.start(),
versionMatcher.end()
);
// we need the area between the quotes
String version = matchingSubstring.split("\"")[1];
// do something with it
...
}
For the harder (multi-line) cases (e.g. UserConfig):
// look for UserConfig : { some stuff };
// the braces are escaped because java.util.regex rejects a bare { as an illegal repetition
Pattern userconfigPattern = Pattern.compile("UserConfig\\s*:\\s*\\{[^}]*\\};", Pattern.DOTALL);
// create a matcher for the "user config pattern"
Matcher userconfigMatcher = userconfigPattern.matcher(bigString);
// check if there's a match in the string
if(userconfigMatcher.find()) {
// get the matching substring
String matchingSubstring = bigString.substring(
userconfigMatcher.start(),
userconfigMatcher.end()
);
// we need the area between the curly braces
String userConfigBody = matchingSubstring.split("[{}]")[1];
// do something with it
...
}
EDIT: this is probably an easier way
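// match one "key : value ;" field at a time
// (calling split() with this pattern would throw the fields away, so use a Matcher instead)
Pattern fieldPattern = Pattern.compile("([^:;{}]+?)\\s*:\\s*(\\{[^}]*\\}|[^{;]+);");
Matcher fieldMatcher = fieldPattern.matcher(bigString);
// for each key-value pair
while (fieldMatcher.find()) {
// group 1 is the key, group 2 is either a simple value or a { ... } block
String key = fieldMatcher.group(1).trim();
String value = fieldMatcher.group(2).trim();
// do something with them, or add them to a map
...
}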
This last way splits the input string based on the assumption that each key-value pair consists of:
some (non-colon) characters at the start, followed by
a colon,
either
-> some characters that are not curly braces or semi-colons (for simple attributes), or
-> curly braces containing some characters that are not curly braces
a semi-colon
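The same field structure can also be handled without regular expressions. The sketch below is a different technique from the answers above: it scans the input once and splits on a ; only when it is outside both { } and a quoted string. The class and method names (TopLevelFields, splitTopLevel, parse) are made up for illustration, and it assumes quotes are never backslash-escaped inside values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopLevelFields {
    // Split on ';' only when outside { } and outside a quoted string.
    static List<String> splitTopLevel(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // current { } nesting level
        boolean inQuotes = false; // assumption: no escaped quotes inside values
        for (char ch : input.toCharArray()) {
            if (ch == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && ch == '{') {
                depth++;
            } else if (!inQuotes && ch == '}') {
                depth--;
            }
            if (ch == ';' && depth == 0 && !inQuotes) {
                String field = current.toString().trim();
                if (!field.isEmpty()) {
                    fields.add(field);
                }
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        String last = current.toString().trim();
        if (!last.isEmpty()) {
            fields.add(last);
        }
        return fields;
    }

    // Turn "name : value" fields into an ordered name -> raw value map.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : splitTopLevel(input)) {
            int colon = field.indexOf(':');
            if (colon < 0) {
                continue; // not a "name : value" field
            }
            result.put(field.substring(0, colon).trim(), field.substring(colon + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "Version:\"1\";\nContent:2013091801;\nUserConfig :\n{\n"
                + "Checksum = \"sha1-abc\";\nSource = \"content\\\\user.ini\";\n};";
        parse(data).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

The values for UserConfig and Icon come back as raw { ... } blocks; the same splitTopLevel method can be applied to the text inside the braces to get at Checksum and Source.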

Here is a JSON solution:
str = "{" + str.substring(0, str.lastIndexOf(";")).replace(";\n}", "}") + "}";
try {
JSONObject json = new JSONObject(str);
String version = json.getString("Version");
JSONObject config = json.getJSONObject("UserConfig");
String source = config.getString("Source");
} catch (JSONException e) {
e.printStackTrace();
}
The replace call is needed because a ";" must not appear directly in front of a "}", as in
Source = "content\\resept_ico.bmp";
}
so we need to remove those.

Related

Matching input String pattern with a given string and separating w.r.t special characters

I have a piece of data in the following formats/patterns :
String inputFruit = "[Apple,Banana(Mango-Juice,lemon-Pickle,Grape-Drinks)]";
String inputFruit = "Apple,Banana(Mango-Juice,lemon-Pickle,Grape-Drinks)";
String inputFruit = "Apple(Mango-Juice,lemon-Pickle,Grape-Drinks)Banana";
Now I have to extract and store the individual values like this:
firstFruit = Apple
secondFruit = Banana
miscFruit = Mango-Juice,lemon-Pickle,Grape-Drinks
I have the following code snippet which I am using :
public static void splitFruits(String inputFruit)
{
String firstFruit = StringUtils.EMPTY;
String secondFruit = StringUtils.EMPTY;
String miscFruit = StringUtils.EMPTY;
inputFruit = inputFruit.replaceAll("\\[" , "");
inputFruit = inputFruit.replaceAll("\\]" , "");
String frts[] = inputFruit.split("\\(");
String frtp[] = frts[0].split(",");
firstFruit = frtp[0];
secondFruit = frtp[1];
miscFruit = frts[1];
}
Here I need to store Apple in the variable firstFruit, Banana in secondFruit, and whatever is inside () in miscFruit.
My code can extract the values only for the pattern in format 1. How can I create pattern-matching statements that handle all three input formats above and store the values separately?
Instead of using only frts[0] to get the first and second fruits, combine frts[0] and frts[2] (that is, the parts on either side of the parenthetical section, which means splitting on both ( and )) and split that on commas.
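A minimal sketch of that idea, assuming the split is done on both parentheses so that the text before, inside and after them lands in separate array elements. The class name FruitSplitter and the choice to return the three values as an array are illustrative only.

```java
import java.util.Arrays;

class FruitSplitter {
    // Returns {firstFruit, secondFruit, miscFruit} for any of the three input formats.
    static String[] splitFruits(String inputFruit) {
        // Drop the optional surrounding [ ].
        String cleaned = inputFruit.replace("[", "").replace("]", "");
        // Split on either parenthesis: [before, inside, after]; "after" may be absent.
        String[] parts = cleaned.split("[()]");
        String misc = parts.length > 1 ? parts[1] : "";
        // Combine the text on both sides of the parentheses, then split on commas.
        String outside = parts[0] + (parts.length > 2 ? "," + parts[2] : "");
        String[] fruits = outside.split(",");
        String first = fruits.length > 0 ? fruits[0].trim() : "";
        String second = fruits.length > 1 ? fruits[1].trim() : "";
        return new String[] { first, second, misc };
    }

    public static void main(String[] args) {
        String[] inputs = {
            "[Apple,Banana(Mango-Juice,lemon-Pickle,Grape-Drinks)]",
            "Apple,Banana(Mango-Juice,lemon-Pickle,Grape-Drinks)",
            "Apple(Mango-Juice,lemon-Pickle,Grape-Drinks)Banana"
        };
        for (String input : inputs) {
            // each line prints: [Apple, Banana, Mango-Juice,lemon-Pickle,Grape-Drinks]
            System.out.println(Arrays.toString(splitFruits(input)));
        }
    }
}
```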

complex regular expression in Java

I have a rather complex (to me it seems rather complex) problem that I'm using regular expressions in Java for:
I can get any text string that must be of the format:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
I started with a regular expression for extracting the text between the M:/:D:/:C:/:Q: as:
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
And that works fine if the <either a url or string> is just an alphanumeric string. But it all falls apart when the embedded string is a url of the format:
tcp://someurl.something:port
Can anyone help me adjust the above regular expression so that the text extracted after :D: can be either a URL or an alphanumeric string?
Here's an example:
public static void main(String[] args) {
String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
boolean matchFound = false;
ArrayList<String> values = new ArrayList<>();
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
Matcher m3 = Pattern.compile(pattern2).matcher(name);
while (m3.find()) {
matchFound = true;
String m = m3.group(2);
System.out.println("regex found match: " + m);
values.add(m);
}
}
In the above example, my results would be:
myString1
tcp://someurl.com:8989
myString2
1
Note that the strings can be of variable length and are alphanumeric, but may also contain some other characters (such as the URL format with :// and/or . and - characters).
You mention that the format is constant:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
Capture groups can do this for you with the pattern:
"M:(.*):D:(.*):C:(.*):Q:(.*)"
Or you can do a String.split() with a pattern of "M:|:D:|:C:|:Q:". However, the split will return an empty element at the first index. Everything else will follow.
public static void main(String[] args) throws Exception {
System.out.println("Regex: ");
String data = "M:<some text>:D:tcp://someurl.something:port:C:<some more text>:Q:<a number>";
Matcher matcher = Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)").matcher(data);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
System.out.println();
System.out.println("String.split(): ");
String[] pieces = data.split("M:|:D:|:C:|:Q:");
for (String piece : pieces) {
System.out.println(piece);
}
}
Results:
Regex:
<some text>
tcp://someurl.something:port
<some more text>
<a number>
String.split():
<some text>
tcp://someurl.something:port
<some more text>
<a number>
To extract the URL/text part you don't need the regular expression. Use
int startPos = input.indexOf(":D:")+":D:".length();
int endPos = input.indexOf(":C:", startPos);
String urlOrText = input.substring(startPos, endPos);
Assuming you need to do some validation along with the parsing:
break the regex into different parts like this:
String m_regex = "[\\w.]+"; // in Java a . inside [] is just a plain dot
String url_regex = "."; // placeholder: there are plenty of URL regexes online, pick your favorite
String d_regex = "(?:" + url_regex + "|\\p{Alnum}+)"; // url or a sequence of alphanumeric characters
String c_regex = "[\\w.]+"; // but I'm assuming you want this to be a bit more restrictive; not sure
String q_regex = "\\d+"; // what sort of number exactly? assuming any string of digits here
// each named group needs its own name; reusing a name is a PatternSyntaxException
String regex = "M:(?<M>" + m_regex + "):"
+ "D:(?<D>" + d_regex + "):"
+ "C:(?<C>" + c_regex + "):"
+ "Q:(?<Q>" + q_regex + ")";
Pattern p = Pattern.compile(regex);
Might be a good idea to keep the pattern as a static field somewhere and compile it in a static block so that the temporary regex strings don't overcrowd some class with basically useless fields.
Then you can retrieve each part by its name:
Matcher m = p.matcher( input );
if (m.matches()) {
String m_part = m.group( "M" );
...
String q_part = m.group( "Q" );
}
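Put together, the named-group approach might look like the sketch below. The URL sub-pattern is an assumption made only for this example (scheme://host:port, as in the question's tcp://someurl.com:8989); a real URL regex would be more involved. The class name MessageParser is also illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessageParser {
    // Assumed URL shape, for illustration only: scheme://host:port
    private static final String URL_REGEX = "\\w+://[\\w.-]+:\\d+";

    // Compiled once and kept in a static field.
    private static final Pattern MESSAGE = Pattern.compile(
            "M:(?<M>[\\w.]+):"
          + "D:(?<D>" + URL_REGEX + "|\\p{Alnum}+):"
          + "C:(?<C>[\\w.]+):"
          + "Q:(?<Q>\\d+)");

    // Returns {M, D, C, Q}, or null when the input does not match the format.
    static String[] parse(String input) {
        Matcher m = MESSAGE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("M"), m.group("D"), m.group("C"), m.group("Q") };
    }

    public static void main(String[] args) {
        String[] parts = parse("M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1");
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```

Because the whole input has to match, a malformed message yields null instead of a partial result, which is where the validation comes from.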
You can go a step further by making a RegexGroup interface whose implementing objects each represent one part of the regex, with a name and the actual regex. You would lose the simplicity, though, which makes it harder to understand at a quick glance. (I wouldn't do this; I'm just pointing out that it's possible and has its own benefits.)

replacing the carriage return with white space in java

I have the string below in a String variable in Java.
rule "6"
no-loop true
when
then
String prefix = null;
prefix = "900";
String style = null;
style = "490";
String grade = null;
grade = "GL";
double basePrice = 0.0;
basePrice = 837.00;
String ruleName = null;
ruleName = "SIVM_BASE_PRICE_006
Rahul Kumar Singh";
ProductConfigurationCreator.createFact(drools, prefix, style,grade,baseprice,rulename);
end
rule "5"
no-loop true
when
then
String prefix = null;
prefix = "800";
String style = null;
style = "481";
String grade = null;
grade = "FL";
double basePrice = 0.0;
basePrice = 882.00;
String ruleName = null;
ruleName = "SIVM_BASE_PRICE_005";
ProductConfigurationCreator.createFact(drools, prefix, style,grade,baseprice,rulename);
end
I need to replace the carriage return between the "then" and "end" keywords with a white space so that it becomes like the code below:
rule "6"
no-loop true
when
then
String prefix = null;
prefix = "900";
String style = null;
style = "490";
String grade = null;
grade = "GL";
double basePrice = 0.0;
basePrice = 837.00;
String ruleName = null;
ruleName = "SIVM_BASE_PRICE_006 Rahul Kumar Singh";
ProductConfigurationCreator.createFact(drools, prefix, style,grade,baseprice,rulename);
end
rule "5"
no-loop true
when
then
String prefix = null;
prefix = "800";
String style = null;
style = "481";
String grade = null;
grade = "FL";
double basePrice = 0.0;
basePrice = 882.00;
String ruleName = null;
ruleName = "SIVM_BASE_PRICE_005";
ProductConfigurationCreator.createFact(drools, prefix, style,grade,baseprice,rulename);
end
Of the two strings above, the second is in the correct format that I need. However, in the first one I am getting this:
ruleName = "SIVM_BASE_PRICE_006
Rahul Kumar Singh";
This in particular needs to look like this:
ruleName = "SIVM_BASE_PRICE_006 Rahul Kumar Singh";
and I also need to ensure that this doesn't affect anything else in the string.
So I need to replace this carriage return with a white space so that the value is on one line. That is my requirement. I tried the replace and replaceAll methods of String, but they did not work properly.
Problem:
I need to look between the strings "then" and "end", and whenever there is a carriage return between two double quotes, I need to replace that carriage return with a white space so that the text is on one line.
Thanks
EDIT:
DRT:
template header
Prefix
Style
Product
package com.xx
import com.xx.drools.ProductConfigurationCreator;
template "ProductSetUp"
rule "Product_#{row.rowNumber}"
no-loop true
when
then
String prefix = null;
prefix = "#{Prefix}";
String style = null;
prefix = "#{Style}";
String product = null;
product = "#{Product}";
ProductConfigurationCreator.createProductFact(drools,prefix,style,product);
end
end template
The Excel sheet and DRT are for demonstration purposes only.
In the image, the Product column contains "SOFAS \rkumar shorav". This is what creates the problem. It generates the following:
product = "SOFAS
kumar shorav";
I need it like this:
product = "SOFAS kumar shorav";
Then Excel data :
attached image.
Instead of a regex I would probably write my own formatter, which will:
check whether the cursor is inside a quote,
replace each \r with a space,
replace each \n with a space, unless it comes right after a \r, which means a space was already written for that \r,
write the rest of the characters without change.
The only possible problem is that this formatter does not care where in the text a string appears, so if you want to format only a specific part of the text you will need to pass in only that part.
Code implementing such formatter can look like:
public static String format(String text){
StringBuilder sb = new StringBuilder();
boolean insideQuote = false;
char previous = '\0';//to track `\r\n`
for (char ch : text.toCharArray()) {
if (insideQuote && (ch == '\r' || ch == '\n')) {
// one space per line break: the `\n` of a `\r\n` pair is skipped,
// because its `\r` already produced the space
if (ch != '\n' || previous != '\r') {
sb.append(' ');
}
} else {
if (ch == '"') {
insideQuote = !insideQuote;
}
sb.append(ch); //write other characters without change
}
previous = ch;
}
return sb.toString();
}
Helper utility method:
public static String format(File file, String encoding) throws IOException {
String text = new String(Files.readAllBytes(file.toPath()), encoding);
return format(text);
}
Usage:
String formatted = format(new File("input.txt"), "utf-8");
System.out.println(formatted);
You might say that there is a bug in org.drools.template.parser.StringCell, method
public void addValue(Map<String, Object> vars) {
vars.put(column.getName(), value);
}
Here, the value is added to the Map as a String but this does not take into account that string values are usually expanded into string literals. Therefore, an embedded newline should be converted to the escape sequence \n. You might try this patch:
public void addValue(Map<String, Object> vars) {
String h = value.replaceAll( "\n", "\\\\n" );
vars.put(column.getName(), h);
}
Take the source file, put it into a suitable subdirectory, compile it to a class file and make sure that the root directory precedes drools-templates-6.2.0.Final-sources.jar in the class path. You should then see
ruleName = "SIVM_BASE_PRICE_006\nRahul Kumar Singh";
in the generated DRL file. Obviously, this is not a space, but it is what is written in the spreadsheet cell!
I suggest (urgently) that you do not follow this approach. The reason is simply that strings are not always expanded between quotes, and in that case the replacement would almost certainly result in invalid code. There is simply no remedy, as the template compiler is "dumb" and does not really "know" what it is expanding.
If a String in a spreadsheet contains a line break, template expansion must render this faithfully, and break the line just there. If this produces invalid (Java) code: why was the line break entered in the first place? There is absolutely no reason not to have a space in that cell if that's what you want.
s = s.replaceAll("(?m)^([^\"]*(\"[^\"]*\")*[^\"]*\"[^\"]*)\r?\n\\s*", "$1 ");
This joins each line that contains an unpaired quote with the line that follows it, replacing the line ending with a space.
^ means starting at the beginning of a line
[^\"] means any character that is not a quote
\r?\n catches both CR+LF (Windows) and LF (everything else) line endings
The captured group reads as:
not-quotes,
a repetition of " not-quotes ",
not-quotes, quote, not-quotes, followed by the newline
Mind that this does not cover backslash-escaped quotes or escaped backslashes.
Use the "multi line" flag:
str = str.replaceAll("(?m)^\\s+", "");
The multi-line flag (?m) makes ^ and $ match start/end of each line (rather than start/end of input). \s+ means "one or more whitespace characters".
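A small sketch of what the flag changes, using a made-up two-line input. Note that this pattern strips leading whitespace from every line; it does not by itself join a quoted value that was broken across lines. The class name MultilineFlagDemo is illustrative.

```java
class MultilineFlagDemo {
    // With (?m), ^ matches at the start of every line, not only at the start of the input.
    static String stripLeadingWhitespace(String text) {
        return text.replaceAll("(?m)^\\s+", "");
    }

    public static void main(String[] args) {
        String text = "  rule \"6\"\n    no-loop true";
        // Without the flag only the indentation of the first line is removed.
        System.out.println(text.replaceAll("^\\s+", ""));
        // With the flag the indentation of both lines is removed.
        System.out.println(stripLeadingWhitespace(text));
    }
}
```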

Issue with Regular Expression

I made an application that reads a field's value from a response string using a regular expression and a Matcher. I have a method to which I pass a field name and which returns that field's value. Today I saw some odd behaviour: the response contains AgentId=25001220052805950, but the matcher returned fake. So I have to check whether a field whose name contains "AgentId" exists and verify the values.
Needed Fields:
SecondaryAgentId=fake; PrimaryAgentId=fake;
Response:
IsPrimaryAgentId=true; AgentId=25001220052805950; MerchantID=19; Cashier=michael; IsManualPayment=1; UserID=GraceRose; Password=rose1234; AmountUserEntered=2; AmountApproved=0; AmountDifference=0; Amount=0; CustomerNameAttempts=0; ProductID=Agriculture; InvoiceID=inv7443; SiteUrl=http://www.thcelink.com/index.php/shoping/checkout/step/step-1; ReturnURL=http://220.2.3:2027/Customer/Thanks.aspx; ResponseType=1; PrimaryAgentId=fake; PrimaryCurrencyCode=fake; SecondaryAgentId=fake; SecondaryCurrencyCode=fake; MerchantName=GraceRose; EmailId=rr#myglobal.com; Query1Attempts=0; MerchantTransactionID=543; MerchantTransactionSequenceID=246; txtAmtIsVisible=false; isQuery1Executed=false; isQuery2Executed=false; Voucher=fake; Passcode=fake; Error=fake; QueryType=fake; Payer=fake; CurrencyName=fake; CurrencySymbol=fake; CustomerName=fake; EmailBody=fake; ErrorText=fake; CustomerEmailID=fake; NavigatePageValue=0; IsCustomerInsertSucess=false; IdType=fake; IdNumber=fake; AggregateAttempts=0; Voucher2=fake; PassCode2=fake; Voucher3=fake; PassCode3=fake; TransCode=0; TransactionDate=2012-06-11T12:04:52.921875+05:30; NumberInWords=fake; MerchantCompany=fake; InvoiceNumber=fake; OverPaidAmount=0; InsufficientAmount=0; OverPaymentForEmail=fake; RedirectPage=false;
Update:
private String GetString1(String strManualproResponce2, String paternField) {
// TODO Auto-generated method stub
String s = null;
if(paternField.equalsIgnoreCase("AgentId"))
{
Pattern pinPattern2 = Pattern.compile("^"+paternField + "=(.*?);");
ArrayList<String> pins2 = new ArrayList<String>();
Matcher m2 = pinPattern2.matcher(strManualproResponce2);
while (m2.find()) {
pins2.add(m2.group(1));
s = m2.group(1);
}
}else
{
Pattern pinPattern2 = Pattern.compile(paternField + "=(.*?);");
ArrayList<String> pins2 = new ArrayList<String>();
Matcher m2 = pinPattern2.matcher(strManualproResponce2);
while (m2.find()) {
pins2.add(m2.group(1));
s = m2.group(1);
}
}
return s;
}
Your question is a little cryptic. From what I understand, the code does not work when you try to extract the value of the AgentId field. The issue seems to be with your regular expression: "^"+paternField + "=(.*?);" assumes that the text AgentId is at the beginning of your string, which it is not, since your string begins with IsPrimaryAgentId.
Also, the unanchored form of your regex (paternField + "=(.*?);") matches inside IsPrimaryAgentId, PrimaryAgentId and SecondaryAgentId as well as AgentId itself, since they all contain the substring AgentId. Because your loop keeps the last match, you end up with fake. To fix this, you can use this regex: \\s+AgentId=(.*?);, which requires white space before the AgentId text.
Another option would be (if your AgentId will always be numerical) to use this: AgentId=(\\d+);.
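As a hedged sketch of the same idea, the lookup below only accepts the field name at the start of the input or directly after the previous field's ; separator, so AgentId no longer matches inside IsPrimaryAgentId or PrimaryAgentId. This anchor is a variation on the \\s+ suggestion above, not the asker's code, and the names FieldExtractor and getField are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldExtractor {
    // Returns the value of the field named exactly `field`, or null if it is absent.
    static String getField(String response, String field) {
        // The name must sit at the start of the input or right after a ';' separator.
        Pattern p = Pattern.compile("(?:^|;\\s*)" + Pattern.quote(field) + "=([^;]*);");
        Matcher m = p.matcher(response);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String response = "IsPrimaryAgentId=true; AgentId=25001220052805950; MerchantID=19; "
                + "PrimaryAgentId=fake; SecondaryAgentId=fake;";
        System.out.println(getField(response, "AgentId"));          // 25001220052805950
        System.out.println(getField(response, "PrimaryAgentId"));   // fake
        System.out.println(getField(response, "IsPrimaryAgentId")); // true
    }
}
```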

Problem replacing a String in Java

I am trying to replace a URL with a shortened URL inside of a String:
public void shortenMessage()
{
String body = composeEditText.getText().toString();
String shortenedBody = new String();
String [] tokens = body.split("\\s");
// Attempt to convert each item into an URL.
for( String token : tokens )
{
try
{
Url url = as("mycompany", "someapikey").call(shorten(token));
Log.d("SHORTENED", token + " was shortened!");
shortenedBody = body.replace(token, url.getShortUrl());
}
catch(BitlyException e)
{
//Log.d("BitlyException", token + " could not be shortened!");
}
}
composeEditText.setText(shortenedBody);
// url.getShortUrl() -> http://bit.ly/fB05
}
After the links are shortened, I want to print the modified string in an EditText. My EditText is not displaying my messages properly.
For example:
"I like www.google.com" should be "I like [some shortened url]" after my code executes.
In Java, strings are immutable. String.replace() returns a new string which is the result of the replacement. Thus, when you do shortenedBody = body.replace(token, url.getShortUrl()); in a loop, shortenedBody will hold the result of (only the very) last replace.
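The effect can be reproduced without the Bitly client. In the sketch below, the shortener parameter is a hypothetical stand-in for the real shortening call and returns null for tokens that are not URLs; the point is only that each replace() result has to be carried into the next iteration.

```java
import java.util.function.UnaryOperator;

class ShortenDemo {
    // `shortener` is a hypothetical stand-in: it returns a short URL, or null for non-URLs.
    static String shortenAll(String body, UnaryOperator<String> shortener) {
        String result = body;
        for (String token : body.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace yields an empty token; replacing "" would corrupt the text
            }
            String shortUrl = shortener.apply(token);
            if (shortUrl != null) {
                // replace() returns a new String, so build on `result`, not on the original `body`
                result = result.replace(token, shortUrl);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        UnaryOperator<String> fakeShortener = t -> t.startsWith("www.") ? "[short]" : null;
        System.out.println(shortenAll("I like www.google.com and www.example.com", fakeShortener));
        // prints: I like [short] and [short]
    }
}
```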
Here's a fix, using StringBuilder.
public void shortenMessage()
{
String body = composeEditText.getText().toString();
StringBuilder shortenedBody = new StringBuilder();
String [] tokens = body.split("\\s");
// Attempt to convert each item into an URL.
for( String token : tokens )
{
try
{
Url url = as("mycompany", "someapikey").call(shorten(token));
Log.d("SHORTENED", token + " was shortened!");
shortenedBody.append(url.getShortUrl()).append(" ");
}
catch(BitlyException e)
{
// Not a URL (or it could not be shortened): keep the original token in the output.
shortenedBody.append(token).append(" ");
}
}
composeEditText.setText(shortenedBody.toString().trim());
// url.getShortUrl() -> http://bit.ly/fB05
}
You'll probably want String.replaceAll and Pattern.quote to "quote" your string before you pass it to replaceAll, which expects a regex.
