A friend gave me this piece of code and said there is a bug. And yes, this code runs forever.
The answer I got is:
It runs for >10^15 years before printing anything.
public class Match {
public static void main(String[] args) {
Pattern p = Pattern.compile("(aa|aab?)+");
int count = 0;
for(String s = ""; s.length() < 200; s += "a")
if (p.matcher(s).matches())
count++;
System.out.println(count);
}
}
I don't really understand why I am seeing this behavior. I am new to Java; do you have any suggestions?
The pattern you are using is known as an evil regex according to OWASP (they know what they're talking about most of the time):
https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
It basically matches aa OR (aa or aab), since the b is made optional by the ?. Both alternatives can match the same text, and the group is repeated with +.
A regex like this is vulnerable to a ReDoS, or Regex Denial of Service, attack.
So yes, sort out what you want to match. I suggest that in the above example you should simply match aa, with no need for groups, repetition or alternation:
Pattern p = Pattern.compile("aa");
Also as someone pointed out, who now deleted his post, you should not use += to append to strings. You should use a StringBuffer instead:
public class Match {
public static void main(String[] args) {
Pattern p = Pattern.compile("aa");
StringBuffer buffy = new StringBuffer(200);
int count = 0;
for(int i = 0; i < 200; i++){
buffy.append("a");
if (p.matcher(buffy.toString()).matches()){
count++;
}
}
System.out.println(count);
}
}
The regular expression (aa|aab?)+ is one that takes an especially long time for the regular expression engine to handle. These are colorfully called evil regexes. It is similar to the (a|aa)+ example at the link. This particular one is very slow on a string composed entirely of as.
What this code does is check the evil regex against increasingly long strings of as, up to length 200, so it certainly ought to take a long time, and it doesn't print until the loop ends. I'd be interested to know where the 10^15 years figure came from.
Edit
OK, the 10^15 (and in fact the entire piece of code in the question) comes from this talk, slide 37. Thanks to zengr for that link. The most relevant piece of information to the question is that the check for this regex takes time that is exponential in the length of the string. Specifically it's O(2^(n/2)), so it takes 2^99 (or so) times longer to check the last string than the first one.
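To make the exponential cost concrete, here is a minimal sketch rather than a definitive fix. It assumes that (aab?)+ is an acceptable rewrite, on the grounds that aa|aab? accepts exactly aa or aab, so the alternation adds nothing but extra backtracking paths. The class name LinearMatch and the helper countMatches are made up for illustration. The evil pattern is only run on short inputs, where it still finishes quickly.

```java
import java.util.regex.Pattern;

public class LinearMatch {
    // Counts how many of the strings "", "a", "aa", ... (lengths 0 to maxLength - 1)
    // are matched in full by the pattern, mirroring the loop in the question.
    static int countMatches(Pattern p, int maxLength) {
        int count = 0;
        StringBuilder sb = new StringBuilder();
        while (sb.length() < maxLength) {
            if (p.matcher(sb).matches()) {
                count++;
            }
            sb.append('a');
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern evil = Pattern.compile("(aa|aab?)+");
        // Assumption: same language as the evil pattern, but no overlapping alternatives.
        Pattern simplified = Pattern.compile("(aab?)+");

        // Short inputs only for the evil pattern: its cost roughly doubles every two characters.
        System.out.println(countMatches(evil, 20));        // 9
        System.out.println(countMatches(simplified, 20));  // 9
        // The simplified pattern handles the full range from the question.
        System.out.println(countMatches(simplified, 200)); // 99
    }
}
```

Only even lengths of at least 2 match, which is where the 9 and 99 come from.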
I have to create a program in which I input 2 values simultaneously but, these values must range from 50 to 127. If the value is less than 50 or above 127 it should show an exception saying "value entered" is too low or too high. As a requirement for the program, I cannot use flow-control statements. I have not tried anything as I am lost
UPDATE: WITH THE HELP OF YOUR COMMENTS I HAVE SATISFIED THE 127 CONDITION, AND MAY BE ON THE WAY TO SATISFYING THE LOWER CONDITION WITH THE NEXT(PATTERN) METHOD. I AM CURRENTLY CONTACTING MY TEACHER AND ASKING IF SATISFYING THE 127 CONDITION IS ENOUGH. THANK YOU ALL
UPDATE 2: the 127 condition was enough!
import java.util.Scanner;
public class A1_Q1 {
public static void main(String[] args) {
// declaration of variables
double x, y, z;
int p, q;
Scanner kBoard = new Scanner(System.in);
System.out.println("Welcome to the Simple 3D-Space Program:");
System.out.println("+++++++++++++++++++++++++++++++++++++++");
System.out.print("Enter values for 'p' and 'q' simultaneously. Must be between 50 and 127:");
String input = kBoard.nextLine();
//separating first number and second number from the inputed line
String firstNum = input.substring(0, input.indexOf(" "));
String secondNum = input.substring(input.indexOf(" "), input.length());
//isolating numbers from the strings and filtering out
String numOnlyP = firstNum.replaceAll("[^0-9]", "");
String numOnlyQ = secondNum.replaceAll("[^0-9]", "");
//transforming string into integer values
p = Integer.parseInt(numOnlyP);
q = Integer.parseInt(numOnlyQ);
//to check and see if previous code was functioning correctly
System.out.println(p + " " + q);
}
}
The constraint against using flow control statements is highly artificial, but it does force you into a certain direction. For the program to behave differently for some inputs than it does for others requires flow control. If you cannot use flow control in your program then that means the needed flow control must be somewhere else. And the only "somewhere else" you are allowed to use is java.util.Scanner.
As a budding programmer, some of the things you need to learn and have ready at all times at the tip of your brain are the powers of two, and certain ones of them especially. Among those is 2^7 = 128, which has special significance as one more than 127, the maximum value of an 8-bit two's complement integer, such as a Java byte (these properties are closely related). And lo and behold, the maximum value your program is supposed to accept is 127.
Read the API docs of Scanner, especially those for scanning inputs of various specific Java types. You will find some that can serve the purpose of (as a side effect) throwing an exception for inputs larger than 127. If you want to throw a custom exception instead, then you can catch the one thrown by the Scanner and throw one of your choice in its place -- provided that try / catch is not considered flow control.
Scanner could be pressed into service for an exception for violation of the lower bound, too, but that would require a very unnatural hack.
Details are left as the exercise they are intended to be.
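For readers who want to check the hint about 8-bit types, a minimal sketch follows. It assumes that Scanner.nextByte() is the method being hinted at, which the answer does not say outright, and the class and method names are hypothetical. The try / catch in main is only there to show the exception; it is not part of the technique.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

public class ByteBound {
    // Reads one token as a Java byte. Scanner itself rejects any value above 127,
    // so no explicit comparison is needed for the upper bound.
    static int readUpTo127(String input) {
        Scanner scanner = new Scanner(input);
        return scanner.nextByte();
    }

    public static void main(String[] args) {
        System.out.println(readUpTo127("127"));
        try {
            readUpTo127("128");
        } catch (InputMismatchException e) {
            // Demonstration only: shows that the out-of-range token was refused.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that this does nothing for the lower bound: a byte happily holds 0 or 49.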
This question is a tad disingenuous. A requirement that 50-127 is accepted, the rest is not, is not normally possible without flow control. So we need flow control, but, we're not allowed to use flow control.
Quite the dilemma.
It's a bit like an escape room. You must screw together this intricate thing, but, you don't have a screwdriver (all your pockets were emptied before entering it), so now what? Well, you look around the room. What tools do you have? Even if they are epically not suited to the job, can they be used to deliver?
So, let's look. What do we have? We have scanner.
We need to look at what you can do with those, and see if that helps, because other than Scanner you're not given much room to work.
And lo! Scanner delivers. It's a method that is almost never used with scanner, but it's there, and it'll give us a way through: The next(Pattern) method can do the job.
Quoting from the javadoc:
Returns the next token if it matches the specified pattern.
Great!
Now, I'm flabbergasted, but the docs are broken. Java is used by millions of people and the docs are taken fairly seriously, so that is mostly an indictment on how little this method is used. Nobody, apparently, noticed. The thing missing from the docs is subtle, but important: What happens if the next token in the stream does not match the specified pattern? The docs don't say. But the scanner acts the exact same way as when you call e.g. nextInt() and the next token is not an int: It throws InputMismatchException.
This means next(Pattern) is the complete package: Will pass if input 'matches the pattern' and throws if it does not, so now all we need is a pattern that matches if the input is 50-127 and fails otherwise. We can trivially do that, though we'll need to dive into Regular Expressions to do the job - as that is the 'language' for the pattern we can provide to this next method. (There is a String variant too, but it just parses your string as... a regular expression, so we can't get out of reading up on those).
Scanner s = new Scanner(System.in);
int value = Integer.parseInt(s.next("[5-9][0-9]|1[0-1][0-9]|12[0-7]")); // next(String) returns a String, so parse it
It's a rather unwieldy pattern, but it'll get the job done. This pattern says:
The input has to be one of these 3 different options (a|b|c is regexp for: either a, or b, or c)
Option 1: A 5/6/7/8/9, followed by any digit.
Option 2: A 1, followed by a 0 or a 1, followed by any digit.
Option 3: A 1, a 2, and then a 0/1/2/3/4/5/6/7.
That covers 50-127. If you want e.g. 00050 to be okay too, stick a 0* on the front, which is regexp for 'any amount of zeroes'.
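The pattern is easy to get subtly wrong, so here is a small sketch that checks it against every candidate value. The class name RangePattern and the helpers read and accepts are made up for this illustration; only Scanner.next(Pattern) and the pattern itself come from the answer.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class RangePattern {
    static final Pattern IN_RANGE = Pattern.compile("[5-9][0-9]|1[0-1][0-9]|12[0-7]");

    // Returns the next token as an int; Scanner throws if the token does not match the pattern.
    static int read(String input) {
        Scanner scanner = new Scanner(input);
        return Integer.parseInt(scanner.next(IN_RANGE));
    }

    // Applies the same pattern without Scanner, so it can be checked exhaustively.
    static boolean accepts(int n) {
        return IN_RANGE.matcher(Integer.toString(n)).matches();
    }

    public static void main(String[] args) {
        // Verification only: the pattern should agree with 50 <= n <= 127 everywhere.
        for (int n = 0; n <= 200; n++) {
            if (accepts(n) != (n >= 50 && n <= 127)) {
                throw new AssertionError("pattern disagrees at " + n);
            }
        }
        System.out.println(read("99"));
    }
}
```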
I think this exercise is quite a reach. As the answers provided so far clearly attest, no Java programmer in their right mind would solve such a task by using the regexp pattern powers of Scanner; they'd just read an int and use an if like normal. As the unwieldiness of that regexp shows, limiting numeric input is not really what regex patterns are meant for. A key insight into writing readable, flexible code is to avoid doing that: using a thing in a way that its author clearly was not thinking about when they wrote it is a fine way to create confusing code that runs into bugs quickly and may no longer work nearly as well in future updates of said thing. I guess it's testing if you can think creatively and read documentation, perhaps?
If you can contact the author of this question, you might want to have them read this answer too :)
I'll be filing a bug over at OpenJDK about the subtle mistake in the docs, at any rate.
A fairly standard approach for such problems is to use lookup tables, something like this:
import java.util.Scanner;
public class TestProgram {
private interface Handler {
void handle(final int value);
}
private static class NullHandler implements Handler {
public void handle(final int value) { }
}
private static class TooLowHandler implements Handler {
public void handle(final int value) {
throw new IllegalArgumentException("Value " + value + " is too low");
}
}
private static class TooHighHandler implements Handler {
public void handle(final int value) {
throw new IllegalArgumentException("Value " + value + " is too high");
}
}
private static void checkLow(final int value) {
final int sign = ((value - 50) >> 31) & 1;
new Handler[]{new NullHandler(), new TooLowHandler()}[sign].handle(value);
}
private static void checkHigh(final int value) {
final int sign = ((value - 128) >> 31) & 1;
new Handler[]{new TooHighHandler(), new NullHandler()}[sign].handle(value);
}
public static void main(final String args[]) {
final Scanner scanner = new Scanner(System.in);
final int p = scanner.nextInt();
final int q = scanner.nextInt();
System.out.println(p + " " + q);
checkLow(p); checkLow(q);
checkHigh(p); checkHigh(q);
}
}
I'm trying to solve this HackerEarth problem: https://www.hackerearth.com/practice/basic-programming/input-output/basics-of-input-output/practice-problems/algorithm/anagrams-651/description/
I have tried searching through the internet but couldn't find the ideal solution to solve my problem
This is my code:
String a = new String();
String b = new String();
a = sc.nextLine();
b = sc.nextLine();
int t = sc.nextInt();
int check = 0;
int againCheck =0;
for (int k =0; k<t; k++)
{
for (int i =0; i<a.length(); i++)
{
char ch = a.charAt(i);
for (int j =0; j<b.length(); j++)
{
check =0;
if (ch != b.charAt(j))
{
check=1;
}
}
againCheck += check;
}
}
System.out.println(againCheck*againCheck);
I expect the output to be 4, but it is showing the "NZEC" error
Can anyone help me, please?
The requirements state [1] that the input is a number (N) followed by 2 x N lines. Your code is reading two strings followed by a number. It is probably throwing an InputMismatchException when it attempts to parse the 3rd line of input as a number.
Hints:
It pays to read the requirements carefully.
Read this article on CodeChef about how to debug a NZEC: https://discuss.codechef.com/t/tutorial-how-to-debug-an-nzec-error/11221. It explains techniques such as catching exceptions in your code and printing out a Java stacktrace so that you can see what is going wrong.
1 - Admittedly, the requirements are not crystal clear. But in the sample input the first line is a number.
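A minimal sketch of reading input in that shape, a count N followed by 2 x N lines, could look like the following. The class name, the readPairs helper and the sample input are all made up for illustration, and the sketch assumes each string sits on its own line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadPairs {
    // Reads a count N from the first line, then N pairs of lines.
    // Using nextLine() for the count avoids mixing nextInt() with nextLine().
    static List<String[]> readPairs(Scanner sc) {
        int n = Integer.parseInt(sc.nextLine().trim());
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String a = sc.nextLine();
            String b = sc.nextLine();
            pairs.add(new String[] {a, b});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up sample input; on the judge you would pass new Scanner(System.in).
        Scanner sc = new Scanner("2\nabc\ncde\nxy\nyx\n");
        for (String[] pair : readPairs(sc)) {
            System.out.println(pair[0] + " / " + pair[1]);
        }
    }
}
```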
As I've written in other answers as well, it is best to write your code like this when submitting on sites:
def myFunction():
    try:
        pass  # MY LOGIC HERE
    except Exception as E:
        print("ERROR Occurred : {}".format(E))
This will clearly show you what error you are facing in each test case. For a site like hacker earth, that has several input problems in various test cases, this is a must.
Coming to your question, NZEC stands for : NON ZERO EXIT CODE
This could mean any and everything from input error to server earthquake.
Regardless of hacker-whatsoever.com I am going to give two useful things:
An easier algorithm, so you can code it yourself, because your algorithm will not work as you expect;
A Java 8+ solution with totally a different algorithm, more complex but more efficient.
SIMPLE ALGORITHM
In your solution you have a typical double for loop that you use to check whether every char in a is also in b. That part is good but the rest can be discarded. Try to implement this:
For each character of a, find the first occurrence of that character in b.
If there is a match, remove that character from both a and b.
The number of remaining characters in both strings is the number of deletions you have to perform to turn them into strings that have the same characters, aka anagrams. So, return the sum of the lengths of a and b.
NOTE: It is important that you keep track of what you already encountered: with your approach you would have counted the same character several times!
As you can see it's just pseudo code, of a naive algorithm. It's just to give you a hint to help you with your studying. In fact this algorithm has a max complexity of O(n^2) (because of the nested loop), which is generally bad. Now, a better solution.
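The naive steps above can be sketched in Java roughly as follows. This is one possible reading of the pseudo code, not the answerer's own implementation; the class and method names and the sample strings are invented for the example.

```java
public class SimpleAnagramDeletions {
    // For each character of a, look for its first occurrence in what is left of b.
    // On a match, drop the character from both; what remains must be deleted.
    static int deletions(String a, String b) {
        StringBuilder restA = new StringBuilder(a);
        StringBuilder restB = new StringBuilder(b);
        int i = 0;
        while (i < restA.length()) {
            int j = restB.indexOf(String.valueOf(restA.charAt(i)));
            if (j >= 0) {
                restA.deleteCharAt(i); // matched: remove from both strings
                restB.deleteCharAt(j);
            } else {
                i++;                   // unmatched: keep it and move on
            }
        }
        return restA.length() + restB.length();
    }

    public static void main(String[] args) {
        // Only 'c' is shared, so "de" and "ab" must go.
        System.out.println(deletions("cde", "abc")); // 4
    }
}
```

Because matched characters are removed from b as well, the same character is never counted twice, which is the pitfall mentioned in the note above.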
BETTER SOLUTION
My algorithm is just O(n). It works this way:
I build a map. (If you don't know what that is, put simply it's a data structure that stores key-value pairs.) In this case the keys are characters, and the values are integer counters bound to the respective character.
Every time a character is found in a, its counter increases by 1;
Every time a character is found in b, its counter decreases by 1;
Now every counter represents the difference between the number of times its character is present in a and in b. So, the sum of the absolute values of the counters is the solution!
To implement it, I actually add an entry to the map whenever I find a character for the first time, instead of pre-constructing a map with the whole alphabet. I also went overboard with lambda expressions, to give you a very different perspective.
Here's the code:
import java.util.HashMap;
public class HackerEarthProblemSolver {
private static final String a = "", // replace "" with your input string
b = ""; // replace "" with your input string
static int sum = 0; //the result, must be static because lambda
public static void main (String[] args){
HashMap<Character,Integer> map = new HashMap<>(); //creating the map
for (char c: a.toCharArray()){ //for each character in a
map.computeIfPresent(c, (k,i) -> i+1); //+1 to its counter
map.computeIfAbsent(c , k -> 1); //initialize its counter to 1 (0+1)
}
for (char c: b.toCharArray()){ //for each character in b
map.computeIfPresent(c, (k,i) -> i-1); //-1 to its counter
map.computeIfAbsent(c , k -> -1); //initialize its counter to -1 (0-1)
}
map.forEach((k,i) -> sum += Math.abs(i) ); //summing the absolute values of the counters
System.out.println(sum);
}
}
Basically both solutions just count how many letters the two strings have in common, but with different approaches.
Hope I helped!
Given an array of strings like ["crane", "drain", "refrain"] and a pattern such as *an* where * can match any number of characters.
Return the matching word in an efficient manner. (In this example, "crane")
I can solve it in very simple way:
String [] array = {"crane", "drain", "refrain"};
String pattern="an";
for(String s:array){
if(s.contains(pattern)){
System.out.println(s);
}
}
Is there a way to optimize the code performance in Java? Consider that the array can contain a large number of strings.
You could try it with Regular Expressions (regex).
public class RegexExample3{
public static void main(String args[]){
String [] array = {"crane", "drain", "refrain"};
for(String s:array){
if(java.util.regex.Pattern.matches(".*an.*", s))
System.out.println(""+s);
}
}
}
Here is the link if someone doesn't know about regex and would want to understand it.
Well, if you want to check whether a word matches a pattern without using any regex, contains, etc.,
I suggest encoding the pattern in such a way that a word encoded the same way will have the same hash...
But in your case I suggest doing this:
static String EncodeString(String x){
String output="";
for(int i=0;i<x.length();i++){
// *an* == 0110
char c=x.charAt(i);
if(c=='n'|| c=='a'){
output +="1";
} else {
output +="0";
}
}
return output;
}
public static void main(String args[])
{
String pattern="*an*";
String enPattern=EncodeString(pattern);
String word="xyanxvsdanfgh";
String enWord=EncodeString(word);
System.out.println(enPattern+" "+enWord);
int v1=Integer.parseInt(enPattern);
int v2=Integer.parseInt(enWord);
System.out.println(" % :"+ v2%v1);// notice here if word not match the pattern then the MOD operation will NOT return 0
}
The assignment asks for the matching word to be returned, so the assumption is that there is one word, and only one word, that matches.
And if there is just one word, it is efficient to return early instead of looping on. You were close.
String matching (String pattern, String [] array) {
for (String s:array)
if (s.contains (pattern))
return s;
return "";
}
Think about alternatives: how would you measure s.contains(pattern) against Pattern.matches, and how many cases would you have to generate to find a difference? Without doing the measurement, you can't be sure that the regex isn't less efficient. Maybe the pattern should be precompiled?
In such assignments, assuming you quoted it accurately, you usually have to read everything very carefully.
Often people have good ideas about a topic and can't wait to implement the first one that comes to mind. Don't do it!
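If you do want to try the regex route with a precompiled pattern, a sketch might look like this. The wildcard-to-regex translation is my own assumption about what *an* is meant to express ('*' matches any run of characters, everything else is literal), and the names compileWildcard and firstMatch are hypothetical. It returns an Optional rather than an empty String so that "nothing found" is explicit.

```java
import java.util.Optional;
import java.util.regex.Pattern;

public class WildcardMatch {
    // Translates a wildcard such as "*an*" into a regex:
    // '*' becomes ".*" and every other character is quoted as a literal.
    static Pattern compileWildcard(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            regex.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the first word that matches in full, stopping as soon as one is found.
    static Optional<String> firstMatch(Pattern pattern, String[] words) {
        for (String word : words) {
            if (pattern.matcher(word).matches()) {
                return Optional.of(word);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Pattern pattern = compileWildcard("*an*"); // compiled once, outside the loop
        String[] words = {"crane", "drain", "refrain"};
        System.out.println(firstMatch(pattern, words).orElse("(no match)")); // crane
    }
}
```

Whether this beats a plain contains call is exactly the thing to measure, not to assume.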
Given array of strings like ["crane", "drain", "refrain"] and a
pattern such as *an* where * can match any number of characters.
Return the matching word in an efficient manner. (In this example,
"crane")
Be very sensitive to every violation of your expectations. You are asked to return the matching word. Did you notice the singular? What might it mean in the context of "efficient manner"?
Of course you have to know your teacher: whether they are a bit sloppy or not, and how fluent they are in the language they use. But method interfaces that fit together are a big issue in software development, and so is reading the specs carefully. Otherwise you soon end up investing a lot of time in a solution that works but doesn't fit the problem.
Returning an empty String is probably not the best idea, but it seems sufficient for this level, and without further context it is hard to decide what an appropriate reaction would be if nothing is found. Again, the wording suggests that there is exactly one solution.
I have a serious problem with extracting terms from each string line. To be more specific, I have one csv formatted file which is actually not csv format (it saves all terms into line[0] only)
So, here's just example string line among thousands of string lines:
(split() doesn't work.!!! )
test.csv
"31451 CID005319044 15939353 C8H14O3S2 beta-lipoic acid C1C[S#](=O)S[C##H]1CCCCC(=O)O "
"12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O "
"9048 CTD042032 23241 C3HO4O3S2 Berberine [C##H]1CCCCC(=O)O "
I want to extract "beta-lipoic acid" ,"saponin" and "Berberine" only which is located in 5th position.
You can see there are big spaces between terms, so that's why I said 5th position.
In this case, how can I extract terms located in 5th position for each line?
One more thing: the length of whitespace between each of the six terms is not always equal. the length could be one, two, three, four, or five, or something like that.
Because the length of whitespace is random, I can not use the .split() function.
For example, in the first line I would get "beta-lipoic" instead of "beta-lipoic acid".
Here is a solution for your problem using the string split and index of,
import java.util.ArrayList;
public class StringSplit {
public static void main(String[] args) {
String[] seperatedStr = null;
int fourthStrIndex = 0;
String modifiedStr = null, finalStr = null;
ArrayList<String> strList = new ArrayList<String>();
strList.add("31451 CID005319044 15939353 C8H14O3S2 beta-lipoic acid C1C[S#](=O)S[C##H]1CCCCC(=O)O ");
strList.add("12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O ");
strList.add("9048 CTD042032 23241 C3HO4O3S2 Berberine [C##H]1CCCCC(=O)O ");
for (String item: strList) {
seperatedStr = item.split("\\s+");
fourthStrIndex = item.indexOf(seperatedStr[3]) + seperatedStr[3].length();
modifiedStr = item.substring(fourthStrIndex, item.length());
finalStr = modifiedStr.substring(0, modifiedStr.indexOf(seperatedStr[seperatedStr.length - 1]));
System.out.println(finalStr.trim());
}
}
}
Output:
beta-lipoic acid
saponin
Berberine
Option 1 : Use String.split and split on multiple consecutive spaces, as in the code below:
String s[] = str.split("\\s\\s+");
for (String string : s) {
System.out.println(string);
}
Option 2 : Implement your own string split logic by walking through all the characters. Sample code below (this code is just to give an idea; I did not test it).
public static List<String> getData(String str) {
List<String> list = new ArrayList<>();
String s="";
int count=0;
for(char c : str.toCharArray()){
System.out.println(c);
if (c==' '){
count++;
}else {
s = s+c;
}
if(count>1&&!s.equalsIgnoreCase("")){
list.add(s);
count=0;
s="";
}
}
return list;
}
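As a runnable variant of Option 1, here is a minimal sketch. It rests on a loud assumption: that columns in the real file are separated by at least two whitespace characters and that single spaces occur only inside a term such as "beta-lipoic acid". If a column gap can be a single space, this will not work. The class name, the fifthField helper and the spacing in the sample line are made up for illustration.

```java
public class FifthColumn {
    // Splits on runs of two or more whitespace characters, so a single space
    // inside a term (e.g. "beta-lipoic acid") does not break the field apart.
    static String fifthField(String line) {
        String[] fields = line.trim().split("\\s{2,}");
        return fields[4];
    }

    public static void main(String[] args) {
        // Assumed spacing: the gaps between columns here are invented for the example.
        String line = "31451   CID005319044  15939353    C8H14O3S2   beta-lipoic acid   C1C[S#](=O)S[C##H]1CCCCC(=O)O  ";
        System.out.println(fifthField(line)); // beta-lipoic acid
    }
}
```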
This would be a relatively easy fix if it weren't for beta-lipoic acid...
Assuming that only spaces/tabs/other whitespace separate terms, you could split on whitespace.
Pattern whitespace = Pattern.compile("\\s+");
String[] terms = whitespace.split(line); // Not 100% sure of syntax here...
// Your desired term should be index 4 of the terms array
While this would work for the majority of your terms, this would also result in you losing the "acid" in "beta-lipoic acid"...
Another hacky solution would be to add in a check for the 6th spot in the array produced by the above code and see if it matches English letters. If so, you can be reasonably confident that the 6th spot is actually part of the same term as the 5th spot, so you can then concatenate those together. This falls apart pretty quickly though if you have terms with >= 3 words. So something like
Pattern possibleEnglishWord = Pattern.compile("[a-zA-Z]*"); // Can add dashes and such as needed
if (terms.length > 5 && possibleEnglishWord.matcher(terms[5]).matches()) {
// return terms[4] + " " + terms[5], or something like that
}
Another thing you can try is to replace all groups of spaces with a single space, and then remove everything that isn't made up of just english letters/dashes
line = whitespace.matcher(line).replaceAll(" "); // collapse each run of whitespace to one space
Pattern notEnglishWord = Pattern.compile("\\S*[^a-zA-Z\\s-]\\S*"); // any token containing something other than letters/dashes
line = notEnglishWord.matcher(line).replaceAll("").trim();
Then hopefully the only thing that is left would be the term you're looking for.
Hopefully this helps, but I do admit it's rather convoluted. One of the issues is that it appears that non-term words may have only one space between them, which would fool Option 1 as presented by Hirak... If that weren't the case that option should work.
Oh by the way, if you do end up doing this, put the Pattern declarations outside of any loops. They only need to be created once.
I'm trying to convert the first letter of a string to lowercase.
value.substring(0,1).toLowerCase() + value.substring(1)
This works, but are there any better ways to do this?
I could use a replace function, but Java's replace doesn't accept an index. You have to pass the actual character/substring. It could be done like this:
value.replaceFirst(value.charAt(0), value.charAt(0).toLowerCase())
Except that replaceFirst expects 2 strings, so the value.charAt(0)s would probably need to be replaced with value.substring(0,1).
Is there any standard way to replace the first letter of a String?
I would suggest you take a look at the Commons Lang library from Apache. They have a class
StringUtils
which allows you to do a lot of tasks with Strings. In your case just use
StringUtils.uncapitalize( value )
Read here about uncapitalize as well as the other functionality of the suggested class.
Added: my experience is that Commons Lang is quite well optimized, so if you want to know what is better from an algorithmic point of view, you could take a look at its source from Apache.
The downside of the code you used (and I've used in similar situations) is that it seems a bit clunky and in theory generates at least two temporary strings that are immediately thrown away. There's also the issue of what happens if your string is empty: substring(0,1) throws in that case.
The upside is that you don't reference those temporary strings outside the expression (leaving it open to optimization by the bytecode compiler or the JIT optimizer) and your intent is clear to any future code maintainer.
Barring your needing to do several million of these any given second and detecting a noticeable performance issue doing so, I wouldn't worry about performance and would prefer clarity. I'd also bury it off in a utility class somewhere. :-) See also jambjo's response to another answer pointing out that there's an important difference between String#toLowerCase and Character.toLowerCase. (Edit: The answer and therefore comment have been removed. Basically, there's a big difference related to locales and Unicode and the docs recommend using String#toLowerCase, not Character.toLowerCase; more here.)
Edit: Because I'm in a weird mood, I thought I'd see if there was a measurable difference in performance in a simple test. There is. It could be because of the locale difference (e.g., apples vs. oranges):
public class Uncap
{
public static final void main(String[] params)
{
String s;
String s2;
long start;
long end;
int counter;
// Warm up
s = "Testing";
start = System.currentTimeMillis();
for (counter = 1000000; counter > 0; --counter)
{
s2 = uncap1(s);
s2 = uncap2(s);
s2 = uncap3(s);
}
// Test v2
start = System.currentTimeMillis();
for (counter = 1000000; counter > 0; --counter)
{
s2 = uncap2(s);
}
end = System.currentTimeMillis();
System.out.println("2: " + (end - start));
// Test v1
start = System.currentTimeMillis();
for (counter = 1000000; counter > 0; --counter)
{
s2 = uncap1(s);
}
end = System.currentTimeMillis();
System.out.println("1: " + (end - start));
// Test v3
start = System.currentTimeMillis();
for (counter = 1000000; counter > 0; --counter)
{
s2 = uncap3(s);
}
end = System.currentTimeMillis();
System.out.println("3: " + (end - start));
System.exit(0);
}
// The simple, direct version; also allows the library to handle
// locales and Unicode correctly
private static final String uncap1(String s)
{
return s.substring(0,1).toLowerCase() + s.substring(1);
}
// This will *not* handle locales and unicode correctly
private static final String uncap2(String s)
{
return Character.toLowerCase(s.charAt(0)) + s.substring(1);
}
// This will *not* handle locales and unicode correctly
private static final String uncap3(String s)
{
StringBuffer sb;
sb = new StringBuffer(s);
sb.setCharAt(0, Character.toLowerCase(sb.charAt(0)));
return sb.toString();
}
}
I mixed up the order in various tests (moving them around and recompiling) to avoid issues of ramp-up time (and tried to force some initially anyway). Very unscientific, but uncap1 was consistently slower than uncap2 and uncap3 by about 40%. Not that it matters, we're talking a difference of 400ms across a million iterations on an Intel Atom processor. :-)
So: I'd go with your simple, straightforward code, wrapped up in a utility function.
Watch out for any of the character functions in strings. Because of unicode, it is not always a 1 to 1 mapping. Stick to string based methods unless char is really what you want. As others have suggested, there are string utils out there, but even if you don't want to use them for your project, just make one yourself as you work. The worst thing you can do is to make a special function for lowercase and hide it in a class and then use the same code slightly differently in 12 different places. Put it somewhere it can easily be shared.
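A shared utility along those lines might be sketched as follows. The class name TextUtil, the explicit Locale parameter and the choice to pass null and empty strings through unchanged are my assumptions, not something the answers above prescribe. The Turkish example shows the kind of case where the String-based and char-based methods disagree.

```java
import java.util.Locale;

public class TextUtil {
    // Lowercases the first code point using String#toLowerCase so that
    // locale-specific rules apply; null and empty input are returned as-is.
    static String uncapitalize(String s, Locale locale) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        int end = s.offsetByCodePoints(0, 1); // do not split a surrogate pair
        return s.substring(0, end).toLowerCase(locale) + s.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(uncapitalize("Testing", Locale.ROOT));
        // In Turkish, capital I lowercases to dotless i (U+0131), not to 'i'.
        System.out.println(uncapitalize("Istanbul", Locale.forLanguageTag("tr")));
        // The char-based method has no locale and always gives 'i' here.
        System.out.println(Character.toLowerCase('I'));
    }
}
```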
Use StringBuffer:
buffer.setCharAt(0, Character.toLowerCase(buffer.charAt(0)));