regex VS Contains. Best Performance? [closed]


Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I want to compare a URI string against different patterns in Java, and I want the fastest code possible.
Should I use:
if (uri.contains("/br/fab") || uri.contains("/br/err") || uri.contains("/br/sts"))
Or something like:
if (uri.matches(".*/br/(fab|err|sts).*"))
Note that I may have many more URIs, and this method is called very often.
Which of the two is the better choice?

If you're going to use a regular expression, create it up-front and reuse the same Pattern object:
private static final Pattern pattern = Pattern.compile(".*/br/(fab|err|sts).*");
Do you actually need the ".*" at each end? I wouldn't expect it to be required, if you use Matcher.find().
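As a minimal sketch of that suggestion (the class and method names here are made up for illustration), the pattern can be compiled once without the surrounding ".*" and searched with find(). The contains() version is included next to it so the two can be compared on the same input:

```java
import java.util.regex.Pattern;

class UriMatcher {
    // Compiled once and reused; no leading/trailing ".*" because find() searches anywhere in the input.
    private static final Pattern BR_PATTERN = Pattern.compile("/br/(fab|err|sts)");

    // Regex variant: true if any of the three paths occurs somewhere in the URI.
    static boolean matchesWithRegex(String uri) {
        return BR_PATTERN.matcher(uri).find();
    }

    // Plain substring variant from the question.
    static boolean matchesWithContains(String uri) {
        return uri.contains("/br/fab") || uri.contains("/br/err") || uri.contains("/br/sts");
    }

    public static void main(String[] args) {
        String uri = "http://example.com/br/err/details";
        System.out.println(matchesWithRegex(uri));
        System.out.println(matchesWithContains(uri));
    }
}
```

Both methods should agree on every input; which one is faster still has to be measured on realistic data.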
Which is faster? The easiest way to find out is to measure it against some sample data - with as realistic samples as possible. (The fastest solution may very well depend on
Are you already sure this is a bottleneck though? If you've already measured the code enough to find out that it's a bottleneck, I'm surprised you haven't just tried both already. If you haven't verified that it's a problem, that's the first thing to do before worrying about the "fastest code possible".
If it's not a bottleneck, I would personally opt for the non-regex version unless you're a regex junkie. Regular expressions are very powerful, but also very easy to get wrong.

I've done a test, and it is faster to use contains(). As Ewan Todd said, they are both fast enough that it isn't really worth worrying about.

Both are fast enough, but contains() is faster. Measured: about 21 million ops/s versus about 1.2 million ops/s.
I used the following JMH code to test:
import org.openjdk.jmh.annotations.*;
@State(Scope.Benchmark)
public class Main {
    private String uri = "https://google.com/asdfasdf/ptyojty/aeryethtr";
    @Benchmark
    @Warmup(iterations = 5)
    @Measurement(iterations = 5)
    @Fork(value = 1, warmups = 0)
    public void initContains() throws InterruptedException {
        if (uri.contains("/br/fab") || uri.contains("/br/err") || uri.contains("/br/sts")) {}
    }
    @Benchmark
    @Warmup(iterations = 5)
    @Measurement(iterations = 5)
    @Fork(value = 1, warmups = 0)
    public void initMatches() throws InterruptedException {
        if (uri.matches(".*/br/(fab|err|sts).*")) {}
    }
    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}
The results
# Run complete. Total time: 00:00:37
Benchmark           Mode  Cnt         Score         Error  Units
Main.initContains  thrpt    5  21004897.968 ± 1987176.746  ops/s
Main.initMatches   thrpt    5   1177562.581 ±  248488.092  ops/s

I would expect contains() to be faster since it won't have to compile and iterate through a (relatively) complex regular expression, but rather simply look for a sequence of characters.
But (as with all optimisations) you should measure this. Your particular situation may impact results, to a greater or lesser degree.
Furthermore, is this known to be causing you grief (wrt. performance) ? If not, I wouldn't worry about it too much, and choose the most appropriate solution for your requirements regardless of performance issues. Premature optimisation will cause you an inordinate amount of grief if you let it!

UPDATE:
I know this is not the best benchmark code, and each case could be optimized in several ways.
What I wanted to show is the "common" usage: what a regular developer who is not a JVM expert would write when doing things the simple way. So here it goes.
ORIGINAL:
The code below produced the following output (times in milliseconds):
contains took: 70
matches took: 113
matches with pre pattern took: 419
The test class
import java.util.regex.Pattern;
public class MatchesTester {
public static void main(String[] args) {
String typeStr = "Nunc rhoncus odio ac tellus pulvinar, et volutpat sapien aliquet. Nam sed libero nec ex laoreet pretium sed id mi. Aliquam erat volutpat. Aenean at erat vitae massa iaculis mattis. Quisque sagittis massa orci, sit amet vestibulum turpis tempor a. Etiam eget venenatis arcu. Nunc enim augue, pulvinar at nulla ut, pellentesque porta sapien. Maecenas ut erat id nisi tincidunt faucibus eget vel erat. Morbi quis magna et massa pharetra venenatis ut a lacus. Vivamus egestas vitae nulla eget tristique. Praesent consectetur, tellus quis bibendum suscipit, nisl turpis mattis sapien, ultrices mollis leo quam eu eros.application/binaryNunc rhoncus odio ac tellus pulvinar, et volutpat sapien aliquet. Nam sed libero nec ex laoreet pretium sed id mi. Aliquam erat volutpat. Aenean at erat vitae massa iaculis mattis. Quisque sagittis massa orci, sit amet vestibulum turpis tempor a. Etiam eget venenatis arcu. Nunc enim augue, pulvinar at nulla ut, pellentesque porta sapien. Maecenas ut erat id nisi tincidunt faucibus eget vel erat. Morbi quis magna et massa pharetra venenatis ut a lacus. Vivamus egestas vitae nulla eget tristique. Praesent consectetur, tellus quis bibendum suscipit, nisl turpis mattis sapien, ultrices mollis leo quam eu eros.";
int timesToTest = 10000;
long start = System.currentTimeMillis();
int count = 0;
//test contains
while(count < timesToTest){
if (typeStr.contains("image") || typeStr.contains("audio") || typeStr.contains("video") || typeStr.contains("application")) {
//do something non expensive like creating a simple native var
int a = 10;
}
count++;
}
long end = System.currentTimeMillis();
System.out.println("contains took: "+ (end - start));
long start2 = System.currentTimeMillis();
count = 0;
while(count < timesToTest){
if (typeStr.matches("(image|audio|video|application)")) {
//do something non expensive like creating a simple native var
int a = 10;
}
count++;
}
long end2 = System.currentTimeMillis(); //new var to have the same cost as contains
System.out.println("matches took: "+ (end2 - start2));
long start3 = System.currentTimeMillis();
count = 0;
Pattern pattern = Pattern.compile("(image|audio|video|application)");
while(count < timesToTest){
if (pattern.matcher(typeStr).find()) {
//do something non expensive like creating a simple native var
int a = 10;
}
count++;
}
long end3 = System.currentTimeMillis(); //new var to have the same cost as contains
System.out.println("matches with pre pattern took: "+ (end3 - start3));
}
}
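One caveat when reading these timings: String.matches() succeeds only if the whole input matches the pattern, whereas contains() and Matcher.find() look for the text anywhere in the input. The three loops above therefore do not perform the same check. A minimal sketch of the difference (class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    private static final Pattern TYPES = Pattern.compile("(image|audio|video|application)");

    // matches() requires the ENTIRE string to match the pattern.
    static boolean wholeMatch(String s) {
        return s.matches("(image|audio|video|application)");
    }

    // find() succeeds if the pattern occurs anywhere in the string, like contains().
    static boolean partialMatch(String s) {
        return TYPES.matcher(s).find();
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum application/binary dolor sit amet";
        System.out.println("contains: " + text.contains("application"));
        System.out.println("matches:  " + wholeMatch(text));
        System.out.println("find:     " + partialMatch(text));
    }
}
```

To compare like with like, the matches() variant would need a pattern such as ".*(image|audio|video|application).*", or the benchmark should use find() in both regex cases.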

If the bit you are trying to match against is always at the beginning, or end, or is in some other way predictable then: neither!
For example, if URLs are like http://example.com/br/fab or http://example.com/br/err all the time, then you could store "br/fab" and "br/err" etc. in a HashSet or similar, and then, given an incoming URL, chop off the last part of it and query the Set to see if it contains it. This will scale better than either method you gave (with a HashSet, lookups should get no slower no matter how many entries there are).
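A minimal sketch of that set lookup, assuming every URL ends in exactly the two path segments of interest (the class name, the method name, and the set contents are illustrative):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class UriSuffixLookup {
    // Hypothetical set of known suffixes; in practice it would hold all patterns of interest.
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("br/fab", "br/err", "br/sts"));

    // Extracts the last two path segments (e.g. "br/fab") and looks them up in the set.
    static boolean isKnown(String url) {
        int last = url.lastIndexOf('/');
        if (last <= 0) {
            return false; // no slash, or nothing before it
        }
        int secondLast = url.lastIndexOf('/', last - 1);
        if (secondLast < 0) {
            return false; // only one slash in the URL
        }
        return KNOWN.contains(url.substring(secondLast + 1));
    }

    public static void main(String[] args) {
        System.out.println(isKnown("http://example.com/br/fab"));
        System.out.println(isKnown("http://example.com/br/other"));
    }
}
```

The cost of a lookup depends on the length of the URL, not on how many suffixes are stored, which is why this scales better than a growing chain of contains() calls.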
If you do need to match against substrings appearing in arbitrary locations... it depends what you mean by "a lot more". One thing you should do regardless of the specifics of the problem is try things out and benchmark them!

You can also use indexOf():
if (uri.indexOf("/br/fab") > -1 || uri.indexOf("/br/err") > -1 || uri.indexOf("/br/sts") > -1)
{
// your code
}
Note, however, that contains() does not create a Matcher (java.util.regex.Matcher) object: in OpenJDK it simply delegates to indexOf(), so the two perform essentially the same.
It is String.matches() that compiles a pattern and creates a Matcher on every call, which is very costly when processing large amounts of data.

They're both fast enough to be over before you know it. I'd go for the one that you can read more easily.

Related

Check if one List<String> contains specific string from another list

I have a List<String> emails containing emails, of length n, and another List<String> keywords containing keywords, of the same length. These lists should meet the following condition: for each index i, emails.get(i).contains(keywords.get(i)).
So, if emails.get(0) is "quick brown fox", then keywords.get(0) is "fox".
If emails.get(5) is "foo bar", then keywords.get(5) is "foo".
How can I check (other than with a for loop) that each email contains its keyword?

First, check that both lists have the same size; then compare the corresponding list items by index using IntStream:
public static boolean allKeywordsFound(List<String> emails, List<String> keywords) {
    return emails.size() == keywords.size() &&
        IntStream.range(0, emails.size())
            .allMatch(i -> emails.get(i).contains(keywords.get(i)));
}

I see that others correctly answered your question but here's my take on the issue.
I presume you want the emails to be checked in order, so here's a piece of code that uses the Stream API instead of a for loop. You didn't specify whether you want a single boolean for all the emails together or one boolean per email (for the keyword at the same position), so I also put the emails and the results together into a Map:
//mock data initialization
List<String> emails = new ArrayList<>();
List<String> keywords = new ArrayList<>();
//mock data initialization
emails.add("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua");
emails.add("eu lobortis elementum nibh tellus molestie nunc non blandit massa enim nec dui nunc mattis enim ut tellus elementum sagittis");
emails.add("Dignissim suspendisse in est ante in nibh mauris");
//mock data initialization
keywords.add("consectetur");
keywords.add("Foo");
keywords.add("Dignissim");
//initialized a list to contain whether a keyword exists for each email
List<Boolean> exists = new ArrayList<>();
//loaded it with boolean values (the exists List has the same order as the emails list)
//iterating by index (java.util.stream.IntStream) instead of emails.indexOf(email), which returns the wrong position for duplicate emails
IntStream.range(0, emails.size())
    .forEach(i -> exists.add(emails.get(i).contains(keywords.get(i))));
//since I don't know what you wanna do with the result, I decided to just put them together in a Map
//with the email string as the key and the existence variable as a value
Map<String, Boolean> mapOfTruth = new LinkedHashMap<>();
IntStream.range(0, emails.size())
    .forEach(i -> mapOfTruth.put(emails.get(i), exists.get(i)));
Output
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua = true
eu lobortis elementum nibh tellus molestie nunc non blandit massa enim nec dui nunc mattis enim ut tellus elementum sagittis = false
Dignissim suspendisse in est ante in nibh mauris = true

This code uses Java streams to check whether each email contains its respective keyword. It pairs the lists by index, because emails.indexOf(email) would return the wrong position for duplicate emails.
boolean allEmailsContainKeyword(List<String> emails, List<String> keywords) {
    return !IntStream.range(0, emails.size())
        .mapToObj(i -> emails.get(i).contains(keywords.get(i)))
        .collect(Collectors.toList())
        .contains(false);
}

Find and replace text iteratively from a list

Suppose I have this text:
Donec sollicitudin ? malesuada. "Curabitur" arcu erat, accumsan
id imperdiet et, porttitor at sem. Quisque velit nisi, ? ut
lacinia in, ? id enim. Proin eget tortor risus.
and I have these strings in a list:
["apple", "banana", "cherry"]
How can I replace each occurrence of ? with the next string from the list, in order? Expected output:
Donec sollicitudin apple malesuada. "Curabitur" arcu erat, accumsan
id imperdiet et, porttitor at sem. Quisque velit nisi, banana ut
lacinia in, cherry id enim. Proin eget tortor risus.
Is it possible to use Notepad++ to achieve something like this for a longer text and list? Or is there any other technology I can use?

This Python script will get the job done. If there are more ? than replacements in the list, it will leave them as ?.
import re
replacements = ["apple", "banana", "cherry"]
lines = ""
with open("file.txt") as file:
    lines = file.read()
def replace(m):
    # Leave the placeholder untouched once the list is exhausted.
    if not replacements:
        return "?"
    return replacements.pop(0)
lines = re.sub(r"\?", replace, lines)
with open("file.txt", "w") as file:
    file.write(lines)
Admittedly, there are better ways of doing this, such as not loading the entire file into a string.

You could try doing three regex replacements in succession:
Find:
([^?]*)?\?(.*)
Replace:
$1apple$2
The trick here is that ([^?]*)?\? matches everything up until the first question mark. This allows us to do a controlled replacement of only one ? placeholder at a time.
You would then repeat the above replacement from left to right for the other two keywords.

You can use below regex:
\?(?!(.|\s)*\?(.|\s)*)
It will pick the last ? and provide you the index of it. After that you can replace it with the last element of your array (it would be better if you create a stack which contains ["apple", "banana", "cherry"] so that stack.pop method will always give you the last element.)

In Perl:
$text =~ s{\?}{shift @fruits}eg; # Consumes @fruits array
Or
my $i = 0;
$text =~ s{\?}{$fruits[$i++]}g; # Preserves @fruits
To cycle over @fruits (if the number of ?s exceeds the number of fruits):
my $i = 0;
$text =~ s{\?}{$fruits[ $i++ % @fruits ]}g;
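The same idea as the scripts above can be sketched in Java with Matcher.appendReplacement, which lets each match receive a different replacement. The class and method names are illustrative, and leaving surplus placeholders as ? mirrors the behavior of the Python answer:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderFiller {
    // Replaces each "?" with the next entry of the list; extra "?" are left unchanged.
    static String fill(String text, List<String> replacements) {
        Deque<String> queue = new ArrayDeque<>(replacements);
        Matcher matcher = Pattern.compile("\\?").matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String next = queue.isEmpty() ? "?" : queue.poll();
            // quoteReplacement keeps "$" and "\" in the replacement from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(next));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Donec sollicitudin ? malesuada. Quisque velit nisi, ? ut lacinia in, ? id enim.";
        System.out.println(fill(text, Arrays.asList("apple", "banana", "cherry")));
    }
}
```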

time complexity Java

The program counts the maximum number of words in a sentence of a given text. The text can contain multiple sentences, and I have to find the sentence with the most words.
I have the following code and need to improve its time complexity.
It should not take more than 5 seconds.
import java.util.*;
import java.io.*;
class Solution {
public int solution(String S) {
// write your code in Java SE 8
List<Integer> wca=new ArrayList<Integer>();
int wc,i;
String[] sent=S.split("\\.+");
while(sent.length!=0){
for(i=0;i<sent.length;i++){
wc=sent[i].split("\\s+").length;
wca.add(wc);
}
}
Collections.sort(wca);
return(wca.get(wca.size()-1));
}
}

You don't need to sort the list to simply find its largest value. In fact you don't need a list at all. Simply store the longest sentence as you go along.
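public int findLongestSentence(String paragraph) {
    // Sentences end with '.', '!' or '?'.
    String[] sentences = paragraph.split("[.!?]");
    int maxSentenceLength = 0;
    for (String sentence : sentences) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            continue; // skip empty fragments, e.g. between consecutive terminators
        }
        // Trim first so a leading space does not produce an empty first "word".
        String[] words = trimmed.split("\\s+");
        maxSentenceLength = Math.max(words.length, maxSentenceLength);
    }
    return maxSentenceLength;
}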
This could be made more efficient by not using the split() method, but that would not affect the asymptotic time complexity.
P.S. Informative variable names are important and, along with good code formatting, make your code much easier to read.
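As a sketch of the split-free variant mentioned above (class and method names are made up, and a "word" is assumed to be any run of characters that are neither whitespace nor one of '.', '!' or '?'), the text can be scanned once, character by character:

```java
class LongestSentence {
    // Single pass over the characters: counts words per sentence without split().
    static int maxWordsInSentence(String text) {
        int max = 0;
        int wordsInSentence = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean terminator = c == '.' || c == '!' || c == '?';
            if (terminator || Character.isWhitespace(c)) {
                inWord = false;              // a separator ends the current word
            } else if (!inWord) {
                inWord = true;               // first character of a new word
                wordsInSentence++;
            }
            if (terminator) {
                max = Math.max(max, wordsInSentence);
                wordsInSentence = 0;         // start counting the next sentence
            }
        }
        // The last sentence may lack a terminator.
        return Math.max(max, wordsInSentence);
    }

    public static void main(String[] args) {
        System.out.println(maxWordsInSentence("We test coders. Give us a try? Forget CVs"));
    }
}
```

This is still linear in the length of the text, like the split() version, but it avoids allocating the intermediate arrays and strings.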

The program counts maximum number of words in a sentence
Suppose you have this text:
Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Split by dot (.)
arr[0]= Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua
arr[1]= Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat
arr[2]= Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur
arr[3]= Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
The number of words is directly related to the number of spaces, so counting the spaces is easier.
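int max = 0;   // holds the maximal space count
int index = 0; // holds the index of the sentence with the maximal space count
Iterate over the array:
for (int i = 0; i < arr.length; i++) {
    String s = arr[i].trim(); // drop the space left over after the preceding dot
    int spaces = s.length() - s.replace(" ", "").length();
    if (spaces > max) {
        max = spaces;
        index = i;
    }
}
At the end of that loop, max holds the highest space count (with single spaces between words, the word count is spaces + 1) and index holds the position of that sentence in the array.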

From what I understand, you want to parse an input text so that you can get the word count of each sentence and find the sentence with the highest one.
First of all, you are only returning the highest word count, nothing to identify the sentence itself.
Second (as others have already pointed out), the sorting can be replaced by keeping only the longest sentence and replacing it whenever a longer one is found. That would indeed bring it to O(n).
Third, there is the problem that sentences don't only end with periods.
String longest = "";
int maxWords = 0;
for (String s : sentences) {
    int words = s.trim().isEmpty() ? 0 : s.trim().split("\\s+").length;
    if (words > maxWords) {
        maxWords = words;
        longest = s;
    }
}
return longest;

Why are two references of Integer equal to a certain point? [duplicate]

This question already has answers here:
Using == operator in Java to compare wrapper objects
(8 answers)
Closed 7 years ago.
Why do Integer objects not behave the way String objects do?
I read that the reason is performance, but I cannot understand how it would perform better.
Look at the code below, for example:
public class MyClass{
public static void main(String[] args){
String one = "myString";
String two = "myString";
System.out.println(one == two); // true
String oneLong = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus in leo at massa vehicula rhoncus quis eu mauris. Pellentesque non nulla convallis, tempus augue sed, sollicitudin risus. Aenean posuere nulla ipsum, at faucibus massa dignissim non. Duis felis felis, iaculis eu posuere id, elementum id nulla. Fusce tristique arcu vitae consectetur vehicula. Mauris tincidunt nunc placerat tortor rhoncus, eget venenatis felis dapibus. Sed scelerisque ligula congue ligula elementum hendrerit. Proin et mauris vestibulum, rutrum ante ut, sollicitudin massa. Fusce tempus mattis eleifend. Phasellus ut ante turpis. Suspendisse eu leo nec elit ornare rhoncus sed nec ex. In at tellus mi.";
String twoLong = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus in leo at massa vehicula rhoncus quis eu mauris. Pellentesque non nulla convallis, tempus augue sed, sollicitudin risus. Aenean posuere nulla ipsum, at faucibus massa dignissim non. Duis felis felis, iaculis eu posuere id, elementum id nulla. Fusce tristique arcu vitae consectetur vehicula. Mauris tincidunt nunc placerat tortor rhoncus, eget venenatis felis dapibus. Sed scelerisque ligula congue ligula elementum hendrerit. Proin et mauris vestibulum, rutrum ante ut, sollicitudin massa. Fusce tempus mattis eleifend. Phasellus ut ante turpis. Suspendisse eu leo nec elit ornare rhoncus sed nec ex. In at tellus mi.";
System.out.println(oneLong == twoLong); // true
Integer first = 1;
Integer second = 1;
System.out.println(first == second); // true
Integer third = 500;
Integer fourth = 500;
System.out.println(third == fourth); // false
}
}
Here are the questions I found, but no answer was given there:
Why Integer In Java is Immutable
Is Integer Immutable?

It is generally a good idea to keep an object as immutable as possible, because an immutable object can be shared without problems in a multithreaded environment.
It is possible to create n copies of the same Integer, but those copies are immutable. Note that it is also possible to create n different copies of the same String; to do that you need to explicitly use the new keyword.
Two references point to the same object if comparing them with the == operator returns true. Two objects have the same content if comparing them with the equals method returns true.
obj1 == obj2 // if true obj1 and obj2 are the same
obj1.equals(obj2) // if true obj1 and obj2 have the same content
// (can be the same or not)
In your example, third and fourth are two Integer objects with the same content (so equals returns true) but different memory locations (so == returns false).
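The reason first == second is true while third == fourth is false is the Integer cache: autoboxing goes through Integer.valueOf(), which is specified to return the same cached object for values from -128 to 127, while values outside that range normally get a new object each time. A minimal sketch (class and method names are illustrative):

```java
class IntegerCacheDemo {
    // Reference comparison: true only if both variables point to the same object.
    static boolean sameReference(Integer x, Integer y) {
        return x == y;
    }

    // Content comparison: true if both objects hold the same int value.
    static boolean sameValue(Integer x, Integer y) {
        return x.equals(y);
    }

    public static void main(String[] args) {
        Integer small1 = 127, small2 = 127; // inside the cached range
        Integer big1 = 500, big2 = 500;     // outside the default cached range
        System.out.println(sameReference(small1, small2)); // cached, so the same object
        System.out.println(sameReference(big1, big2));     // typically false with default JVM settings
        System.out.println(sameValue(big1, big2));         // equals() compares the values
    }
}
```

This is why wrapper objects should be compared with equals() (or unboxed to int) rather than with ==.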

What is the difference between a built-in font and a third-party font to Java?

Trying to figure out what may be a bug in the code for MultiLineLabel found here: http://samuelsjoberg.com/archive/2009/10/multiline-labels-in-swing
Essentially, that sample will run just fine with, say, Arial. But if I install a custom font to my Mac, like ITCKorinna-Bold, it still renders the MultiLineLabel, but it doesn't add any linebreaks, so
"This is a
long line." becomes "This is ..."

On Linux with OpenJDK 7 update 19 it works just fine. You haven't supplied your code, so it's kind of hard for us to see if there might be anything wrong with it.
When I modify the example to use Roboto, the line breaks work as expected.
I don't know which Java runtime you are using, but if you are using the one supplied by Apple there might be a difference in the values returned by the FontMetrics object (since the Apple Java runtime does font rendering differently than, for example, OpenJDK on Linux). I would suggest trying OpenJDK, or running your app on a different platform, to see if that makes a difference.
Font robotoFont = null;
try {
robotoFont = Font.createFont(Font.TRUETYPE_FONT, new File("/usr/share/fonts/roboto/Roboto-Black.ttf"));
robotoFont = robotoFont.deriveFont(14f);
} catch (Exception e) {
e.printStackTrace();
}
// Using the MultiLineLabel class.
final MultiLineLabel mLabel = new MultiLineLabel(
"Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
+ "Phasellus non sapien quam. Fusce posuere, nisl "
+ "vitae tristique volutpat, augue erat faucibus nisl, "
+ "nec venenatis metus sem vel enim. Cras in libero "
+ "sapien, vitae euismod neque. Proin hendrerit, odio "
+ "et faucibus suscipit, eros tellus blandit justo, "
+ "ac cursus risus elit ut risus.");
mLabel.setForeground(Color.WHITE);
mLabel.setFont(robotoFont);
