Regex filename with exactly 2 underscores - java

I need to match if filenames have exactly 2 underscores and extension 'txt'.
For example:
asdf_assss_eee.txt -> true
asdf_assss_eee_txt -> false
asdf_assss_.txt -> false
I tried this pattern, but it does not work:
private static final String FILENAME_PATTERN = "/^[A-Za-z0-9]+_[A-Za-z0-9]+_[A-Za-z0-9]\\.txt";

You just need to add + after the third char class and you must remove the first forward slash.
private static final String FILENAME_PATTERN = "^[A-Za-z0-9]+_[A-Za-z0-9]+_[A-Za-z0-9]+\\.txt$";
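To check the corrected pattern against the three sample names from the question, a small harness like the sketch below should do. The class name FilenameCheck and the method isValid are illustrative and not part of the original post. Because Matcher.matches() already requires the whole input to match, the ^ and $ anchors are redundant here but harmless.

```java
import java.util.regex.Pattern;

class FilenameCheck {
    // Same pattern as in the answer above, compiled once for reuse.
    private static final Pattern FILENAME_PATTERN =
            Pattern.compile("^[A-Za-z0-9]+_[A-Za-z0-9]+_[A-Za-z0-9]+\\.txt$");

    // Hypothetical helper: true when the whole filename fits the pattern.
    static boolean isValid(String filename) {
        return FILENAME_PATTERN.matcher(filename).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"asdf_assss_eee.txt", "asdf_assss_eee_txt", "asdf_assss_.txt"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

For the three samples this prints true, false, false, which is what the question asks for.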

You can use a regex like this with the case-insensitive flag:
[a-z\d]+_[a-z\d]+_[a-z\d]+\.txt
Or with the inline case-insensitive flag:
(?i)[a-z\d]+_[a-z\d]+_[a-z\d]+\.txt
In case you want to shorten it a little, you could do:
([a-z\d]+_){2}[a-z\d]+\.txt
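In Java the backslashes need doubling inside a string literal, and the flag can be passed to Pattern.compile. A minimal sketch, assuming you want the whole filename to match (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

class ShortFilenameCheck {
    // Two "name_" groups, a third name, then ".txt".
    // CASE_INSENSITIVE lets [a-z] cover A-Z and also accepts ".TXT".
    private static final Pattern P =
            Pattern.compile("([a-z\\d]+_){2}[a-z\\d]+\\.txt", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: matches() requires the entire input to match.
    static boolean isValid(String filename) {
        return P.matcher(filename).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("asdf_assss_eee.txt"));
        System.out.println(isValid("asdf_assss_.txt"));
    }
}
```

Note that the flag also makes the extension case-insensitive, which may or may not be what you want.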

Update
So let's assume you want at least one character after the second underscore, before the file extension.
Regex is still not "needed" for this. You could split the String by the underscore and you should have 3 elements from the split. If the 3rd element is just ".txt" then it's not valid.
Example:
public static void main(String[] args) throws Exception {
String[] data = new String[] {
"asdf_assss_eee.txt",
"asdf_assss_eee_txt",
"asdf_assss_.txt"
};
for (String d : data) {
System.out.println(validate(d));
}
}
public static boolean validate(String str) {
if (!str.endsWith(".txt")) {
return false;
}
String[] pieces = str.split("_");
return pieces.length == 3 && !pieces[2].equalsIgnoreCase(".txt");
}
Results:
true
false
false
Old Answer
Not sure I understand why your third example is false, but this is something that can easily be done without regex.
Start with checking to see if the String ends with ".txt", then check if it contains only two underscores.
Example:
public static void main(String[] args) throws Exception {
String[] data = new String[] {
"asdf_assss_eee.txt",
"asdf_assss_eee_txt",
"asdf_assss_.txt"
};
for (String d : data) {
System.out.println(validate(d));
}
}
public static boolean validate(String str) {
if (!str.endsWith(".txt")) {
return false;
}
return str.chars().filter(c -> c == '_').count() == 2;
}
Results:
true
false
true

Use this Pattern:
Pattern p = Pattern.compile("_[^_]+_[^_]+\\.txt");
and use .find() instead of .matches() in the Matcher:
Matcher m = p.matcher(filename);
if (m.find()) {
// found
}
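One caveat with find(): it succeeds as soon as any part of the name matches, so a name with more than two underscores can still be accepted. The sketch below contrasts the unanchored find() with a variant that matches the whole name. The class and method names are illustrative, and the leading [^_]+ in the second pattern is my addition, not part of the answer above.

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    // Pattern from the answer above, used with find().
    private static final Pattern PARTIAL = Pattern.compile("_[^_]+_[^_]+\\.txt");
    // Assumed variant: covers the first name part too, used with matches().
    private static final Pattern WHOLE = Pattern.compile("[^_]+_[^_]+_[^_]+\\.txt");

    static boolean partial(String name) {
        return PARTIAL.matcher(name).find();
    }

    static boolean whole(String name) {
        return WHOLE.matcher(name).matches();
    }

    public static void main(String[] args) {
        String name = "a_b_c_d.txt"; // three underscores
        System.out.println("find:    " + partial(name));
        System.out.println("matches: " + whole(name));
    }
}
```

Here find() reports a hit because "_c_d.txt" matches on its own, while matches() rejects the name.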

Related

Regex to validate that every digit is different from each other

I have to validate strings with specific conditions using a regex statement. The condition is that every digit is different from each other. So, 123 works but not 112 or 131.
So, I wrote a statement that filters a string according to the condition and prints true once a string fulfills everything. However, it seems to print "true" even though some strings do not meet the condition.
public class MyClass {
public static void main(String args[]) {
String[] value = {"123","951","121","355","110"};
for (String s : value){
System.out.println("\"" + s + "\"" + " -> " + validate(s));
}
}
public static boolean validate(String s){
return s.matches("([0-9])(?!\1)[0-9](?!\1)[0-9]");
}
}
@Vinz's answer is perfect, but if you insist on using regex, then you can use:
public static boolean validate(String s) {
return s.matches("(?!.*(.).*\\1)[0-9]+");
}
You don't need to use regex for that. You can simply count the number of unique characters in the String and compare it to the length like so:
public static boolean validate(String s) {
return s.chars().distinct().count() == s.length();
}
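To compare the two answers on the sample values from the question, a sketch like the following could be used (the class and method names are made up here). One difference worth knowing: the stream version only checks uniqueness, so it also accepts non-digit strings such as "abc", while the regex version insists on digits.

```java
class DistinctDigits {
    // Regex variant: the negative lookahead rejects any character that repeats.
    static boolean byRegex(String s) {
        return s.matches("(?!.*(.).*\\1)[0-9]+");
    }

    // Stream variant: compares the number of distinct chars with the length.
    static boolean byStream(String s) {
        return s.chars().distinct().count() == s.length();
    }

    public static void main(String[] args) {
        String[] values = {"123", "951", "121", "355", "110"};
        for (String s : values) {
            System.out.println("\"" + s + "\" -> " + byRegex(s) + " / " + byStream(s));
        }
    }
}
```

Both variants accept "123" and "951" and reject "121", "355" and "110".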

How do I check if the number of occurrences of two words in a String is Equal without using loops?

I am trying to find out whether "dog" and "cat" occur the same number of times in a given String.
It should return true if they are equal, or false otherwise. How can I find out this without while, for etc. loops?
This is my current process
class Main {
public static boolean catsDogs(String s) {
String cat = "cat";
String dog = "dog";
if (s.contains(cat) && s.contains(dog)) {
return true;
}
return false;
}
public static void main(String[] args) {
boolean r = catsDogs("catdog");
System.out.println(r); // => true
System.out.println(catsDogs("catcat")); // => false
System.out.println(catsDogs("1cat1cadodog")); // => true
}
}
With Java 9+, Matcher has a results() method that returns a stream of matches, which you can count:
public static boolean catsDogs(String s) {
Pattern pCat = Pattern.compile("cat");
Pattern pDog = Pattern.compile("dog");
Matcher mCat = pCat.matcher(s);
Matcher mDog = pDog.matcher(s);
return (mCat.results().count() == mDog.results().count());
}
You can also count the occurrences by replacing the word with an empty string and comparing lengths (in case you don't want to use split):
public static boolean catsDogs(String s) {
return count(s,"cat") == count(s,"dog");
}
public static int count(String s, String catOrDog) {
return (s.length() - s.replace(catOrDog, "").length()) / catOrDog.length();
}
public static void main(String[] args) {
boolean r = catsDogs("catdog");
System.out.println(r); // => true
System.out.println(catsDogs("catcat")); // => false
System.out.println(catsDogs("1cat1cadodog")); // => true
}
Here's a couple of single-line solutions based on the Java 9 method Matcher.results(), which produces a stream of MatchResult objects corresponding to each matching subsequence in the given string.
We can also make this method more versatile by providing a pair of regular expressions as arguments instead of hard-coding them.
teeing() + summingInt()
We can turn the stream of MatchResult into a stream of strings by extracting each matched group, and then collect the data with the teeing() collector (available since Java 12), which takes two downstream collectors and a function that produces the result from the values returned by each collector.
public static boolean hasSameFrequency(String str,
String regex1,
String regex2) {
return Pattern.compile(regex1 + "|" + regex2).matcher(str).results()
.map(MatchResult::group)
.collect(Collectors.teeing(
Collectors.summingInt(group -> group.matches(regex1) ? 1 : 0),
Collectors.summingInt(group -> group.matches(regex2) ? 1 : 0),
Objects::equals
));
}
collectingAndThen() + partitioningBy()
Similarly, we can use a combination of collectors collectingAndThen() and partitioningBy().
The downside of this approach compared to the one above is that partitioningBy() materializes the stream elements as the values of the map (while we're only interested in their quantity), but it performs fewer comparisons.
public static boolean hasSameFrequency(String str,
String regex1,
String regex2) {
return Pattern.compile(regex1 + "|" + regex2).matcher(str).results()
.map(MatchResult::group)
.collect(Collectors.collectingAndThen(
Collectors.partitioningBy(group -> group.matches(regex1)),
map -> map.get(true).size() == map.get(false).size()
));
}
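Both versions above take regular expressions, so a plain word containing metacharacters (such as "." or "+") would have to be escaped by the caller. If you only ever compare literal words, a simpler sketch is to quote each word and count its matches separately. The class name and the helper countOccurrences are hypothetical, and results() still needs Java 9+.

```java
import java.util.regex.Pattern;

class LiteralFrequency {
    // Counts non-overlapping occurrences of a literal word.
    // Pattern.quote turns the word into a literal pattern.
    static long countOccurrences(String str, String word) {
        return Pattern.compile(Pattern.quote(word)).matcher(str).results().count();
    }

    static boolean hasSameFrequency(String str, String word1, String word2) {
        return countOccurrences(str, word1) == countOccurrences(str, word2);
    }

    public static void main(String[] args) {
        System.out.println(hasSameFrequency("catdog", "cat", "dog"));
        System.out.println(hasSameFrequency("catcat", "cat", "dog"));
        System.out.println(hasSameFrequency("1cat1cadodog", "cat", "dog"));
    }
}
```

This scans the string twice instead of once, which is usually fine for short inputs.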

Java Pattern.split() with overlapping delimiters

Firstly, I'm aware of similar questions that have been asked such as here:
How to split a string, but also keep the delimiters?
However, I'm having an issue implementing a split of a string using Pattern.split() where the pattern is based on a list of delimiters that can sometimes appear to overlap. Here is the example:
The goal is to split a string based on a set of known codewords which are surrounded by slashes, where I need to keep both the delimiter (codeword) itself and the value after it (which may be empty string).
For this example, the codewords are:
/ABC/
/DEF/
/GHI/
Based on the thread referenced above, the pattern is built as follows using look-ahead and look-behind to tokenise the string into codewords AND values:
((?<=/ABC/)|(?=/ABC/))|((?<=/DEF/)|(?=/DEF/))|((?<=/GHI/)|(?=/GHI/))
Working string:
"123/ABC//DEF/456/GHI/789"
Using split, this tokenises nicely to:
"123","/ABC/","/DEF/","456","/GHI/","789"
Problem string (note single slash between "ABC" and "DEF"):
"123/ABC/DEF/456/GHI/789"
Here the expectation is that "DEF/456" is the value after "/ABC/" codeword because the "DEF/" bit is not actually a codeword, but just happens to look like one!
Desired outcome is:
"123","/ABC/","DEF/456","/GHI/","789"
Actual outcome is:
"123","/ABC","/","DEF/","456","/GHI/","789"
As you can see, the slash between "ABC" and "DEF" is getting isolated as a token itself.
I've tried solutions as per the other thread using only look-ahead OR look-behind, but they all seem to suffer from the same issue. Any help appreciated!
If you are OK with find rather than split, using some non-greedy matches, try this:
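import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
public class SampleJava {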
static final String[] CODEWORDS = {
"ABC",
"DEF",
"GHI"};
static public void main(String[] args) {
String input = "/ABC/DEF/456/GHI/789";
String codewords = Arrays.stream(CODEWORDS)
.collect(Collectors.joining("|", "/(", ")/"));
// codewords = "/(ABC|DEF|GHI)/";
Pattern p = Pattern.compile(
/* codewords */ ("(DELIM)"
/* pre-delim */ + "|(.+?(?=DELIM))"
/* final bit */ + "|(.+?$)").replace("DELIM", codewords));
Matcher m = p.matcher(input);
while(m.find()) {
System.out.print(m.group(0));
if(m.group(1) != null) {
System.out.print(" ← code word");
}
System.out.println();
}
}
}
Output:
/ABC/ ← code word
DEF/456
/GHI/ ← code word
789
Use a combination of positive and negative look arounds:
String[] parts = s.split("(?<=/(ABC|DEF|GHI)/)(?<!/(ABC|DEF|GHI)/....)|(?=/(ABC|DEF|GHI)/)(?<!/(ABC|DEF|GHI))");
There's also a considerable simplification by using alternations inside single look ahead/behind.
Following some TDD principles (Red-Green-Refactor), here is how I would implement such behaviour:
Write specs (Red)
I defined a set of unit tests that explain how I understood your "tokenization process". If any test is not correct according to what you expect, feel free to tell me and I'll edit my answer accordingly.
import static org.assertj.core.api.Assertions.assertThat;
import java.util.List;
import org.junit.Test;
public class TokenizerSpec {
Tokenizer tokenizer = new Tokenizer("/ABC/", "/DEF/", "/GHI/");
@Test
public void itShouldTokenizeTwoConsecutiveCodewords() {
String input = "123/ABC//DEF/456";
List<String> tokens = tokenizer.splitPreservingCodewords(input);
assertThat(tokens).containsExactly("123", "/ABC/", "/DEF/", "456");
}
@Test
public void itShouldTokenizeMisleadingCodeword() {
String input = "123/ABC/DEF/456/GHI/789";
List<String> tokens = tokenizer.splitPreservingCodewords(input);
assertThat(tokens).containsExactly("123", "/ABC/", "DEF/456", "/GHI/", "789");
}
@Test
public void itShouldTokenizeWhenValueContainsSlash() {
String input = "1/23/ABC/456";
List<String> tokens = tokenizer.splitPreservingCodewords(input);
assertThat(tokens).containsExactly("1/23", "/ABC/", "456");
}
@Test
public void itShouldTokenizeWithoutCodewords() {
String input = "123/456/789";
List<String> tokens = tokenizer.splitPreservingCodewords(input);
assertThat(tokens).containsExactly("123/456/789");
}
@Test
public void itShouldTokenizeWhenEndingWithCodeword() {
String input = "123/ABC/";
List<String> tokens = tokenizer.splitPreservingCodewords(input);
assertThat(tokens).containsExactly("123", "/ABC/");
}
@Test
public void itShouldTokenizeWhenStartingWithCodeword() {
String input = "/ABC/123";
List<String> tokens = tokenizer.splitPreservingCodewords(input);
assertThat(tokens).containsExactly("/ABC/", "123");
}
@Test
public void itShouldTokenizeWhenOnlyCodeword() {
String input = "/ABC//DEF//GHI/";
List<String> tokens = tokenizer.splitPreservingCodewords(input);
assertThat(tokens).containsExactly("/ABC/", "/DEF/", "/GHI/");
}
}
Implement according to the specs (Green)
This class makes all the tests above pass:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
public final class Tokenizer {
private final List<String> codewords;
public Tokenizer(String... codewords) {
this.codewords = Arrays.asList(codewords);
}
public List<String> splitPreservingCodewords(String input) {
List<String> tokens = new ArrayList<>();
int lastIndex = 0;
int i = 0;
while (i < input.length()) {
final int idx = i;
Optional<String> codeword = codewords.stream()
.filter(cw -> input.startsWith(cw, idx))
.findFirst();
if (codeword.isPresent()) {
if (i > lastIndex) {
tokens.add(input.substring(lastIndex, i));
}
tokens.add(codeword.get());
i += codeword.get().length();
lastIndex = i;
} else {
i++;
}
}
if (i > lastIndex) {
tokens.add(input.substring(lastIndex, i));
}
return tokens;
}
}
Improve implementation (Refactor)
Not done at the moment (not enough time that I can spend on that answer now). I'll do some refactor on Tokenizer with pleasure if you request me to (but later). :-) Or you can do it yourself quite securely since you have the unit tests to avoid regressions.

Java substring string when specific string occurs

I need help cutting a string at the point where a given substring occurs.
Example
Initial string: 123456789abcdefgh
string to substr: abcd
result : 123456789
I checked the substring method, but it only accepts index positions. Do I need to search for the occurrence of the substring and then pass the index?
If you want to split the String at a particular character (here "a"), then the code would look like this.
You can change the "a" to any char within the string:
package nl.testing.startingpoint;
public class Main {
public static void main(String args[]) {
String[] part = getSplitArray("123456789abcdefgh", "a");
System.out.println(part[0]);
System.out.println(part[1]);
}
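public static String[] getSplitArray(String toSplitString, String splitChar) {
// The look-behind splits right after the character, so it stays in the preceding piece.
return toSplitString.split("(?<=" + splitChar + ")");
}
}
Bear in mind that toSplitString.split("(?<=" + splitChar + ")"); splits after every occurrence of that character, and the character itself stays at the end of the preceding piece.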
Hope this might help:
public static void main(final String[] args)
{
searchString("123456789abcdefghabcd", "abcd");
}
public static void searchString(String inputValue, final String searchValue)
{
while (!(inputValue.indexOf(searchValue) < 0))
{
System.out.println(inputValue.substring(0, inputValue.indexOf(searchValue)));
inputValue = inputValue.substring(inputValue.indexOf(searchValue) +
searchValue.length());
}
}
Output:
123456789
efgh
Use a regular expression, like this
static String regex = "abcd.*";
public String remove(String string, String regex) {
// Removes everything from the first match of "abcd" to the end of the line.
return string.replaceFirst(regex, "");
}
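For the case in the question, a plain indexOf followed by substring is probably the most direct route, and it avoids regex escaping issues. A minimal sketch (the class name CutBefore and the method before are hypothetical):

```java
class CutBefore {
    // Returns the part of text before the first occurrence of marker,
    // or text unchanged when marker does not occur.
    static String before(String text, String marker) {
        int idx = text.indexOf(marker);
        return idx < 0 ? text : text.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(before("123456789abcdefgh", "abcd")); // prints 123456789
    }
}
```

Returning the input unchanged when the marker is missing is a design choice; you could return an empty string or throw instead, depending on what the caller expects.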

HashSet contains

I have a set of keywords and a string that contains keyword instances separated by '/'. For example, 'Food' and 'Car' are keywords, and '/food/oatmeal/fruits' and '/tyre/car/wheel' are strings. There are 5500 keywords in total. I need to flag a string as 'eligible' if it contains at least one of the 5500 keywords. One way I can do this is to load all 5500 keywords into a HashSet, split the string into tokens, and check whether the HashSet contains any of the tokens. If I find a match, I flag that string as 'eligible'.
Performance-wise, can there be a better solution?
A simplified solution for token matching could be
import java.util.HashSet;
import java.util.StringTokenizer;
public class REPL {
private static final HashSet<String> keyWords = new HashSet<>();
public static void main(String[] args) {
keyWords.add("food");
keyWords.add("car");
String[] strings = {
"/food/oatmeal/fruits",
"/tyre/car/wheel",
"/steel/nuts/bolts",
"/cart/handle/grill"
};
for (String s : strings) {
System.out.printf("string: %-20s ", s);
if (isEligible(s)) {
System.out.println("eligible: true");
} else {
System.out.println("eligible: false");
}
}
}
private static boolean isEligible(String s) {
StringTokenizer st = new StringTokenizer(s, "/");
while (st.hasMoreTokens()) {
if (keyWords.contains(st.nextToken())) {
return true;
}
}
return false;
}
}
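The question writes the keywords as 'Food' and 'Car' but the paths in lower case, so you may want to normalize case before the lookup. A sketch of the same idea with split and a stream, assuming the keywords are stored in lower case (the class name is made up, and Set.of needs Java 9+):

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

class KeywordMatcher {
    // Assumption: keywords are kept in lower case so lookups can be case-insensitive.
    private static final Set<String> KEYWORDS = Set.of("food", "car");

    // True if any '/'-separated token of path is a keyword, ignoring case.
    static boolean isEligible(String path) {
        return Arrays.stream(path.split("/"))
                .map(token -> token.toLowerCase(Locale.ROOT))
                .anyMatch(KEYWORDS::contains);
    }

    public static void main(String[] args) {
        System.out.println(isEligible("/food/oatmeal/fruits"));
        System.out.println(isEligible("/cart/handle/grill"));
    }
}
```

Each lookup is a constant-time hash check on average, so the cost is driven by the number of tokens in the string rather than by the 5500 keywords.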
