I have the regular expressions below in Java code, and they take a long time to complete in some cases. Is there a way to improve them?
String decimal = "([0-9]+(\\.[0-9]+)?[/-]?)+";
String units = "(in|ft)\\.?";
String unitName = "(cu\\.? *ft|gauge|watt|rpm|ft|lbs|K|GPF|btu|mph|cfm|volt|oz|pounds|dbi|miles|amp|hour|kw|f|degrees|year)";
sizePattern.add(Pattern.compile("(?i)" + decimal + " *" + units + " *x? *" + decimal + " *" + units + " *x? *" + decimal + " *" + units + ""));
sizePattern.add(Pattern.compile("(?i)" + decimal + " *" + units + " *x? *" + decimal + " *" + units));
sizePattern.add(Pattern.compile("(?i)" + decimal + " *x *" + decimal + " *" + units));
sizePattern.add(Pattern.compile("(?i)" + decimal + "( *" + units + ")"));
sizePattern.add(Pattern.compile("(?i)" + decimal + "( *sq?\\.?)( *ft?\\.?)"));
sizePattern.add(Pattern.compile("(?i)" + decimal + " *" + unitName));
sizePattern.add(Pattern.compile("(?i)" + decimal + "(d)"));
sizePattern.add(Pattern.compile("(?i)" + decimal + "( *(%|percent))"));
sizePattern.add(Pattern.compile("(?i)" + decimal));
for (Pattern p : sizePattern)
{
ODebug.Write(Level.FINER, "PRD-0079: Using pattern = " + p.pattern());
m = p.matcher(_data);
while (m.find())
{
ODebug.Write(Level.FINER, " Got => [" + m.group(0) + "]");
this.Dimensions.add(m.group(0));
_data = _data.replaceAll("\\Q" + m.group(0) + "\\E", ".");
m = p.matcher(_data);
}
}
String causing the issue:
Micro-Induction Cooktop provides the best in cooktop performance, safety and efficiency. Induction heats as electricity flows through a coil to produce a magnetic field under the ceramic plate. When a ferromagnetic cookware is placed on the ceramic surface, currents are induced in the cookware and instant heat is generated due to the resistance of the pan. Heat is generated to the pan only and no heat is lost. As there are no open flames, inductions are safer to use than conventional burners. Once cookware is removed, all molecular activity ceases and heating is stopped immediately.Flush surface for built-in or freestanding applicationDual functions: Cook and Warm7 power settings (100-300-500-700-900-1100-1300W)* The 2 lowest power settings cannot be actually achieved, but are ""simulated"":100W = 500W intermittently heat for 2 seconds and stop for 8 seconds300W = 500W intermittently heat for 6 seconds and stop for 4 seconds13 Keep Warm settings (100-120-140-160-180-190-210-230-250-280-300-350-390F)Touch sensitive panel with control lockUp to 8 hours timerMicro-crystal ceramic plateAutomatic pan detectionLED panelETL/ETL-Sanitation/FCC certified for household or commercial useHome Depot Protection Plan:
Assuming your _data is long, it's not the matching that takes the time, but rather the assignment
_data = _data.replaceAll("\\Q" + m.group(0) + "\\E", ".");
which rescans and copies the whole string for every match, so the loop is O(n^2) in the string length. Just don't do it.
You could do it simpler with
_data = _data.replace(m.group(0), ".");
but just don't do it. Do you need a reduced _data at the end? If so, use a single replaceAll per pattern (it uses a StringBuffer internally and is only linear in the size of the string).
Additionally:
Use non-capturing groups.
Consider recycling the Matcher by using reset(CharSequence) and usePattern(Pattern).
Consider combining all the patterns into one. As all of them start the same, it could be quite efficient.
Your decimal pattern can get slow when the overall match fails. Leaving out the optional parts, it reduces to "([0-9]+)+", a nested quantifier that can backtrack exponentially in the number of digits. Consider using atomic groups or possessive quantifiers.
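A minimal sketch of the points above, assuming it is acceptable to replace each match where it was found in a single pass per pattern. This differs from the original loop, which replaced every occurrence of the matched text anywhere in _data and then restarted the matcher. The class SizeExtractor and the method extract are made-up names, and only two of the patterns are shown. Possessive quantifiers (++, ?+) are used here as one way to get the atomic behaviour; for these patterns they should not change what matches, because nothing that follows decimal can start with a digit, a dot, a slash or a hyphen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SizeExtractor {
    // Possessive quantifiers stop the engine from re-trying shorter digit runs;
    // non-capturing groups avoid bookkeeping for groups nobody reads.
    static final String DECIMAL = "(?:[0-9]++(?:\\.[0-9]++)?+[/-]?+)++";
    static final String UNITS = "(?:in|ft)\\.?";

    // Only two of the original patterns, most specific first (illustrative subset).
    static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("(?i)" + DECIMAL + " *" + UNITS),
        Pattern.compile("(?i)" + DECIMAL));

    // Collects every match into 'found' and returns the text with each match
    // replaced by ".", copying the string once per pattern instead of once per match.
    static String extract(String data, List<String> found) {
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(data);
            StringBuilder out = new StringBuilder(data.length());
            int last = 0;
            while (m.find()) {
                found.add(m.group());
                out.append(data, last, m.start()).append('.');
                last = m.end();
            }
            out.append(data, last, data.length());
            data = out.toString();
        }
        return data;
    }

    public static void main(String[] args) {
        List<String> found = new ArrayList<>();
        String rest = extract("Pipe 12.5 in long, 3/4 ft wide, 100-300W", found);
        System.out.println(found);
        System.out.println(rest);
    }
}
```

Each pattern now costs one linear scan and one copy of the string, regardless of how many matches it produces.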
I don't get why it says it cannot find symbol append.
Do I need to use StringBuffer? I got this code from a YouTube tutorial on receipts, and the uploader disabled comments, so I can't ask him directly. Please help me; I'm still an amateur at Java.
Tell me if I need to post my whole code, or which part you would want to see to find the errors. Thanks in advance.
Calendar timer = Calendar.getInstance();
timer.getTime();
SimpleDateFormat tTime = new SimpleDateFormat("HH:mm:ss");
tTime.format(timer.getTime());
SimpleDateFormat Tdate = new SimpleDateFormat("dd-MMM-yyyy");
Tdate.format(timer.getTime());
jtxtReceipt.append("\ Water Station Receipt:\n" +
"Reference:\t\t\t" + refs +
"\n=========================================\n" +
"Mineral:\t\t\t" + jtxtMineral.getText() + "\n\n" +
"Purified:\t\t\t" + jtxtPurified.getText() + "\n\n" +
"Travel:\t\t\t" + jtxtTravel.getText() + "\n\n" +
"VAT:\t\t\t" + jtxtVat.getText() + "\n"+
"\n========================================\n" + "\n" +
"Tax:\t\t\t" + jtxtTax2.getText() + "\n" +
"Subtotal:\t\t\t" + jtxtSubTotal.getText() + "\n" +
"Total:\t\t\t" + jtxtTotal.getText() + "\n" +
"===========================================" +
"\nDate:" + Tdate.format(timer.getTime()) +
"\ntTime:" + tTime.format(timer.getTime()) +
"\n\t\tThank you ");
The append method doesn't exist in the String class. You can either use a StringBuilder to do the job or, if it's a light concatenation, just use the + operator.
The append method also doesn't exist on a text field (JTextField); it belongs to the text area (JTextArea). So if jtxtReceipt was created from the TextField palette item, replacing it with a TextArea should solve the problem.
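As a rough sketch of the StringBuilder route, the receipt text can be assembled first and handed to the component afterwards. The class ReceiptText, the method build and its three parameters are hypothetical and cover only a few of the receipt lines.

```java
class ReceiptText {
    // Builds the receipt body as a plain String; the caller decides where to show it.
    static String build(String reference, String mineral, String total) {
        StringBuilder sb = new StringBuilder();
        sb.append("Water Station Receipt:\n");
        sb.append("Reference:\t").append(reference).append('\n');
        sb.append("Mineral:\t").append(mineral).append('\n');
        sb.append("Total:\t").append(total).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("R1", "2", "50.00"));
    }
}
```

The result could then be shown with jtxtReceipt.setText(ReceiptText.build(...)), since setText is available on both JTextField and JTextArea.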
Here's what I'm trying to do:
String output = "If you borrow" + currencyFormatter.format(loanAmount);
+" at an interest rate of" + rate + %;
+"\nfor" + years;
+",you will pay" + totalInterest + "in interest.";
Take out the semicolons before the end of your concatenation, and put the % sign in quotes so that it is a string.
String output = "If you borrow" + currencyFormatter.format(loanAmount)
+" at an interest rate of" + rate + "%"
+"\nfor" + years
+",you will pay" + totalInterest + "in interest.";
I also recommend that you move the concatenation operator to the end of the line rather than the start of the line. It's a minor stylistic preference...
String output = "If you borrow" + currencyFormatter.format(loanAmount) +
" at an interest rate of" + rate + "%" +
"\nfor" + years +
",you will pay" + totalInterest + "in interest.";
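Finally, you may notice that some whitespace is missing when you print that string. The String.format method helps with that (also see the documentation for Formatter), because the whole sentence sits in one template. It is usually not faster than concatenation; the benefit is readability. Note that %d only accepts integer types, so if rate or totalInterest are floating-point values, use %s or %.2f for them instead.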
String output = String.format(
"If you borrow %s at an interest rate of %d%%\nfor %d years, you will pay %d in interest.", currencyFormatter.format(loanAmount), rate, years, totalInterest
);
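Here is a self-contained sketch of the String.format version. The variable types are not shown in the question, so this assumes loanAmount, rate and totalInterest are doubles and years is an int; the class LoanMessage and the method describe are made-up names, and Locale.US is fixed only to make the output predictable.

```java
import java.text.NumberFormat;
import java.util.Locale;

class LoanMessage {
    // Assumed types: double amounts and rate, int years.
    static String describe(double loanAmount, double rate, int years, double totalInterest) {
        NumberFormat currency = NumberFormat.getCurrencyInstance(Locale.US);
        // %s takes the pre-formatted currency strings, %.1f the rate, %d the int; %% is a literal percent sign.
        return String.format(Locale.US,
            "If you borrow %s at an interest rate of %.1f%%\nfor %d years, you will pay %s in interest.",
            currency.format(loanAmount), rate, years, currency.format(totalInterest));
    }

    public static void main(String[] args) {
        System.out.println(describe(10000, 5.5, 3, 1650));
    }
}
```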
I have this statement:
s = s + "Id: " + lc.getID() + " Name: " + lc.getName() + "\n"
+ " Phone Number: " + lc.getPhone() + " Email: " + lc.getEmail() + "\n"
+ " Description: " + lc.getDescription() + "\n\n"
that prints this out:
Id: 1 Name: Eric
Phone Number: 8294038 Email: foo#gmail.com
Description: Cool guy Eric
I want to bold only the titles (Id, Name, etc.).
I tried this:
s = s + Html.fromHtml(" <b> Id: </b>" + lc.getID() + " <b> Name: </b>" + lc.getName() + "\n"
+ " Phone Number: " + lc.getPhone() + " Email: " + lc.getEmail() + "\n"
+ " Description: " + lc.getDescription() + "\n"
+ "\n\n");
But not only does it not bold, but it takes away the new lines (\n). Any ideas on how to get this done? Thanks.
Html.fromHtml() returns a Spanned object, designed to be put directly into a TextView or similar widget.
A Spanned is not a String.
By doing s = s + Html.fromHtml(...), you are saying "please parse this HTML into a Spanned, then throw out all the formatting to give me a String that I can concatenate onto some other String". That's not what you want -- you want to keep the formatting. But a Java String does not have formatting, and so ordinary string concatenation has no way to keep it.
Beyond that, as Manishika pointed out, newlines are ignored in HTML anyway, as you use HTML elements for vertical whitespace.
Your options include:
Generate a complete HTML snippet -- including whatever it is you are trying to concatenate it to -- and then use Html.fromHtml() on the entire thing. You may wish to use a template engine (e.g., jmustache) for that, or String.format(). Or, use StringBuilder, rather than lots of + operations (less memory churn, faster performance). Be sure to use <br/> or <p> for your line breaks/paragraph delimiters.
Use SpannableStringBuilder to assemble the string and its formatting from component parts.
Use TextUtils.concat(s, Html.fromHtml(...)) instead of s + Html.fromHtml(...), as concat() will maintain the spans that implement the formatting. While the implementation of Spanned returned by fromHtml() is not a String, both it and String are a CharSequence, and hence work with concat().
It will require a little bit of parsing on your end, but you definitely want to look into SpannableStrings.
For example, let's say I have the following string:
String s = "How now brown cow";
I would then turn it into a SpannableString by simply feeding the string to the constructor as follows:
SpannableString ss = new SpannableString(s);
From there you need to apply your styling to the spanned area. For this, I'll just use SubscriptSpan (for bold text you would use new StyleSpan(Typeface.BOLD) instead), though if you wish to make your own you can simply write a class extending CharacterStyle and override the updateDrawState(TextPaint ds) method. The following is how you can set your span:
/*
* the first argument is the span effect you want, the second and third
* are the start and end indices, respectively, and the last argument is
* for setting a flag, which you probably won't need.
*/
ss.setSpan(new SubscriptSpan(), 0, 2, 0);
And now you can just put your string straight into the TextView and it should appear how you want, like so:
myTextView.setText(ss);
I'm trying to run a mapReduce query on Riak 1.4 that queries by secondary index, sorts the records by date, and then limits the results to the first record.
I've got the secondary index query working. The sort doesn't seem to do anything. No errors on the sort, just returns the results unsorted. The limit on the number of records returned yields a 'bad_json' error returned by the server.
Here's what I have. It is supposed to query the "cars" bucket for the most recent car owned by "john_doe". (Some names have been changed to protect the innocent. ;)
JSSourceFunction dateSortFunction = new JSSourceFunction(
"function(v) {" +
"return v.sort(function(a, b) {" +
"return a.issueDate - b.issueDate ;" +
"}" +
");" +
"}");
IndexQuery iq = new BinValueQuery(BinIndex.named("person"), "cars", "john_doe");
MapReduceResult response = session.mapReduce(iq)
.addMapPhase(NamedErlangFunction.MAP_OBJECT_VALUE)
.addReducePhase(dateSortFunction)
.addReducePhase(new NamedJSFunction("Riak.reduceLimit"), 1)
.execute();
I've seen a number of posts on sorting and am hoping to figure it out eventually. However, I haven't seen any help on how the LIMIT function might work.
Thanks in advance!
Update:
Thanks to Joe, who put me on the right track. Here's what ended up working for me. My date format is ISO 8601 (e.g. 2011-05-18T17:00:00-07:00), so I can compare the strings lexically to get the correct sort order. Also, I found JavaScript's array-shortening trick (assigning to length) and updated the code to return up to the first 5 objects.
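JSSourceFunction sortLimitFunction = new JSSourceFunction(
"function(v) {" +
"v.sort(function(a, b) {" +
// A sort comparator must return a negative, zero, or positive number, not a boolean (newest first here).
"return a.issueDate < b.issueDate ? 1 : (a.issueDate > b.issueDate ? -1 : 0);" +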
"}" +
");" +
"if (v.length > " + "5" + ") { " +
"v.length = " + "5" + ";" +
"}" +
"return v;" +
"}");
IndexQuery iq = new BinValueQuery(BinIndex.named("person"), "cars", "john_doe");
MapReduceResult response = session.mapReduce(iq)
.addMapPhase(new NamedJSFunction("Riak.mapValuesJson"))
.addReducePhase(sortLimitFunction)
.execute();
For the sorting, there is a mailing list post that covers this topic. The main difference I see between that implementation and yours is the use of the JavaScript Riak.mapValuesJson function in the map phase.
For the limiting, if you want just the first item from the sorted list, try having your sort function return only the first element. While the reduce function can be (and probably is) called multiple times as partial result sets arrive from the various vnodes, the first element in the consolidated list must also be the first element in the partial list where it originated, so this should give you what you are looking for:
JSSourceFunction dateSortFunction = new JSSourceFunction(
"function(v) {" +
"var arr = v.sort(function(a, b) {" +
"return a.issueDate - b.issueDate ;" +
"}" +
");" +
"if (arr.length == 0) { " +
"return [];" +
"} else {" +
// A reduce phase has to return a list, so wrap the single element.
"return [arr[0]];" +
"}" +
"}"
);
I want a regular expression to extract the text between EVALUATE and END-EVALUATE or a period (.), whichever comes first.
At present I am using this regular expression:
EVALUATE\\s*(((?!EVALUATE|(END-EVALUATE|\\.)).)+)\\s*(END-EVALUATE|\\.)
But my problem is that I do not want to consider a . if it appears within double quotes.
Please suggest a better regular expression, or correct the one I have mentioned above.
Thanks in advance.
You could try this:
EVALUATE("[^"]*"|((?!EVALUATE|END-EVALUATE)[^."])+)*(END-EVALUATE|\.)
A Java demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) throws Exception {
String src =
" EVALUATE WS-ADDITIONAL-FILE-WORK \n" +
" WHEN \"ACCNT\" \n" +
" IF LINK-TRIG-FILE-NAME NOT = \"ACTMSTR \" \n" +
" PERFORM 04510-GET-ACCOUNT-MASTER \n" +
" ELSE \n" +
" MOVE \"0106H\" TO WS-ERROR-CODE \n" +
" PERFORM 09000-PROCESS-ABORT-ERROR \n" +
" END-IF \n" +
" WHEN \"ADDRM\" \n" +
" IF LINK-TRIG-FILE-NAME NOT = \"ADDRMSTR \" \n" +
" IF PROGRAM-HBMS-RELEASE (1:3) > \"5.0\" \n" +
" PERFORM 04520-GET-ADDRESS-MASTER \n" +
" END-IF \n" +
" ELSE \n" +
" MOVE \"0106H\" TO WS-ERROR-CODE \n" +
" PERFORM 09000-PROCESS-ABORT-ERROR \n" +
" END-IF \n" +
" WHEN OTHER \n" +
" MOVE \"0106F\" TO WS-ERROR-CODE \n" +
" PERFORM 09000-PROCESS-ABORT-ERROR \n" +
" END-EVALUATE. ";
Matcher m = Pattern.compile("EVALUATE(\"[^\"]*\"|((?!EVALUATE|END-EVALUATE)[^.\"])+)*(END-EVALUATE|\\.)").matcher(src);
while(m.find()) {
System.out.println(m.group());
}
}
}
which prints:
EVALUATE WS-ADDITIONAL-FILE-WORK
WHEN "ACCNT"
IF LINK-TRIG-FILE-NAME NOT = "ACTMSTR "
PERFORM 04510-GET-ACCOUNT-MASTER
ELSE
MOVE "0106H" TO WS-ERROR-CODE
PERFORM 09000-PROCESS-ABORT-ERROR
END-IF
WHEN "ADDRM"
IF LINK-TRIG-FILE-NAME NOT = "ADDRMSTR "
IF PROGRAM-HBMS-RELEASE (1:3) > "5.0"
PERFORM 04520-GET-ADDRESS-MASTER
END-IF
ELSE
MOVE "0106H" TO WS-ERROR-CODE
PERFORM 09000-PROCESS-ABORT-ERROR
END-IF
WHEN OTHER
MOVE "0106F" TO WS-ERROR-CODE
PERFORM 09000-PROCESS-ABORT-ERROR
END-EVALUATE
Just thought I'd point out that the regex given by Bart will match a basic, single-level EVALUATE block, however it will NOT cope with nested EVALUATEs.
For instance, try the regex on the following example:
EVALUATE TRUE
WHEN FILE-ERROR
EVALUATE ERROR-CODE
WHEN FILE-NOT-FOUND
DISPLAY "File Not Found!"
WHEN ACCESS-DENIED
DISPLAY "Access Denied!"
END-EVALUATE
WHEN OTHER
DISPLAY "Success!"
END-EVALUATE
Another approach would be to read through the COBOL source line by line and, for each EVALUATE you find on a line (that's not inside quotes), increment an evaluate 'level', decrementing it again for each END-EVALUATE. That way you can keep track of where you are in the nested levels.
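A rough sketch of that level-counting idea follows. It scans token by token rather than line by line, but keeps the same counter. It assumes blocks are closed only by END-EVALUATE (a terminating period is not handled) and that quoted literals use double quotes; the class EvaluateBlocks and the method outermostBlocks are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EvaluateBlocks {
    // Matches either a quoted literal (skipped) or one of the two keywords as a whole word.
    // COBOL words may contain hyphens, so the word boundary is spelled out by hand.
    private static final Pattern TOKEN = Pattern.compile(
        "\"[^\"]*\"|(?<![A-Za-z0-9-])(END-EVALUATE|EVALUATE)(?![A-Za-z0-9-])");

    // Returns each outermost EVALUATE ... END-EVALUATE block, keywords included.
    static List<String> outermostBlocks(String source) {
        List<String> blocks = new ArrayList<>();
        int depth = 0;   // current nesting level
        int start = -1;  // offset of the outermost EVALUATE currently open
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String keyword = m.group(1);
            if (keyword == null) {
                continue; // a quoted literal: ignore whatever it contains
            }
            if (keyword.equals("EVALUATE")) {
                if (depth == 0) {
                    start = m.start();
                }
                depth++;
            } else if (depth > 0) { // END-EVALUATE closing an open block
                depth--;
                if (depth == 0) {
                    blocks.add(source.substring(start, m.end()));
                }
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String src = "EVALUATE TRUE\n  WHEN FILE-ERROR\n    EVALUATE ERROR-CODE\n"
            + "      WHEN OTHER\n        DISPLAY \"EVALUATE me.\"\n    END-EVALUATE\nEND-EVALUATE\n";
        outermostBlocks(src).forEach(System.out::println);
    }
}
```

If only the text between the keywords is wanted, the substring bounds could be moved to the end of the opening match and the start of the closing one.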
Also, the OP said he was looking for a way to get the text "between" the EVALUATE and END-EVALUATE, which seems to imply that they should not be included. Maybe I misinterpreted that one, but if that is the requirement, then the regex is including the keywords incorrectly.