Imagine a string like this:
#*****~~~~~~**************~~~~~~~~***************************#
I am looking for an elegant way to find the indices of the longest continuous section that contains a specific character. Let's assume we are searching for the * character; then I expect the method to return the start and end index of the last long section of *.
I know I could just brute-force this by checking something like
indexOf(*)
lastIndexOf(*)
//Check if in between the indices is something else if so, remember length start from new
//substring and repeat until lastIndex reached
//Return saved indices
This brute-force approach is ugly. Is there a more elegant way of doing this? I thought about regular expression groups and comparing their lengths, but how do I get the indices with that?
Regex-based solution
If you don't want to hardcode a specific character like * and instead want to "find the longest section of repeating characters", as the title of the question states, then the proper regular expression for a section of repeated characters would be:
"(.)\\1*"
Here (.) is a group that consists of a single character, and \\1 is a backreference that refers to that group. * is a greedy quantifier, which means that the preceding backreference can be repeated zero or more times.
Altogether, "(.)\\1*" captures a sequence of consecutive identical characters.
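To make the behaviour of this expression concrete, here is a minimal sketch that splits a string into its runs of identical characters. The class and method names (RepeatedRunsDemo, runs) are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedRunsDemo {
    // Hypothetical helper: collects every match of (.)\1*, i.e. every run of identical characters.
    static List<String> runs(String str) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile("(.)\\1*").matcher(str);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runs("#**~~~*#")); // [#, **, ~~~, *, #]
    }
}
```

Note that . does not match line terminators by default, so a string containing line breaks would need the DOTALL flag for this to cover every character.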
Now, to use it, we need to compile the regex into a Pattern. This action has a cost, so if the regex will be used multiple times it is wise to declare a constant:
public static final Pattern REPEATED_CHARACTER_SECTION =
Pattern.compile("(.)\\1*");
Using features of modern Java, the longest sequence that matches the above pattern can be found with a single statement.
Since Java 9 we have the method Matcher.results(), which returns a stream of MatchResult objects, each describing one match.
MatchResult.start() and MatchResult.end() give access to the start and end indices of the match. To extract the matched text itself, we need to invoke MatchResult.group().
That's how an implementation might look:
public static void printLongestRepeatedSection(String str) {
String longestSection = REPEATED_CHARACTER_SECTION.matcher(str).results() // Stream<MatchResult>
.map(MatchResult::group) // Stream<String>
.max(Comparator.comparingInt(String::length)) // find the longest string in the stream
.orElse(""); // or orElseThrow() if you don't want to allow an empty string to be received as an input
System.out.println("Longest section:\t" + longestSection);
}
To report the indices as well, keep the MatchResult instead of mapping it to a String:
public static void printLongestRepeatedSection(String str) {
MatchResult longestSection = REPEATED_CHARACTER_SECTION.matcher(str).results() // Stream<MatchResult>
.max(Comparator.comparingInt(m -> m.group().length())) // find the longest match in the stream
.orElseThrow(); // would throw an exception if an empty string was received as an input
System.out.println("Section start: " + longestSection.start());
System.out.println("Section end: " + longestSection.end());
System.out.println("Longest section: " + longestSection.group());
}
Output:
Section start: 34
Section end: 61
Longest section: ***************************
Links:
Official tutorials on Lambda expressions and Stream API provided by Oracle
A quick tutorial on Regular expressions
Simple and Performant Iterative solution
You can do it without regular expressions by manually iterating over the indices of the given string and checking if the previous character matches the current one.
You just need to maintain a couple of variables denoting the start and the end of the longest previously encountered section, and a variable to store the starting index of the section that is being currently examined.
That's how it might be implemented:
public static void printLongestRepeatedSection(String str) {
if (str.isEmpty()) throw new IllegalArgumentException();
int maxStart = 0;
int maxEnd = 1;
int curStart = 0;
for (int i = 1; i < str.length(); i++) {
if (str.charAt(i) != str.charAt(i - 1)) { // current and previous characters are not equal
if (maxEnd - maxStart < i - curStart) { // current repeated section is longer than the maximum section discovered previously
maxStart = curStart;
maxEnd = i;
}
curStart = i;
}
}
if (str.length() - curStart > maxEnd - maxStart) { // checking the very last section
maxStart = curStart;
maxEnd = str.length();
}
System.out.println("Section start: " + maxStart);
System.out.println("Section end: " + maxEnd);
System.out.println("Section: " + str.substring(maxStart, maxEnd));
}
main()
public static void main(String[] args) {
String source = "#*****~~~~~~**************~~~~~~~~***************************#";
printLongestRepeatedSection(source);
}
Output:
Section start: 34
Section end: 61
Section: ***************************
Use methods of class Matcher.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Solution {
public static void main(String args[]) {
String str = "#*****~~~~~~**************~~~~~~~~***************************#";
Pattern pattern = Pattern.compile("\\*+");
Matcher matcher = pattern.matcher(str);
int max = 0;
while (matcher.find()) {
int length = matcher.end() - matcher.start();
if (length > max) {
max = length;
}
}
System.out.println(max);
}
}
The regular expression searches for occurrences of one or more asterisk (*) characters.
Method end returns the index of the first character after the last character matched and method start returns the index of the first character matched. Hence the length is simply the value returned by method end minus the value returned by method start.
Each subsequent call to method find starts searching from the end of the previous match.
The only thing left is to get the longest string of asterisks.
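The program above only prints the length of the longest run, while the question asks for its indices. A minimal sketch of the same find() loop that also remembers start and end could look like this. The class and method names are hypothetical, and returning {-1, -1} when there is no asterisk is an assumption.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestAsteriskRun {
    // Returns {start, end} of the longest run of '*', or {-1, -1} if there is none (assumed convention).
    // On ties the earliest run wins; change > to >= to prefer the last one.
    static int[] longestRun(String str) {
        Matcher matcher = Pattern.compile("\\*+").matcher(str);
        int bestStart = -1;
        int bestEnd = -1;
        while (matcher.find()) {
            if (matcher.end() - matcher.start() > bestEnd - bestStart) {
                bestStart = matcher.start();
                bestEnd = matcher.end();
            }
        }
        return new int[] {bestStart, bestEnd};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(longestRun("#**~~***#"))); // [5, 8]
    }
}
```

As with Matcher.end(), the second value is exclusive, so str.substring(start, end) yields the run itself.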
The solution based on the regular expression may use some features from Stream API to get an array of indexes of the longest sequence of a given character:
Pattern.quote should be used to safely wrap the input character for the search within a regular expression
Stream<MatchResult> returned by Matcher::results provides necessary information about the start and end of the match
Stream::max selects the longest matching sequence
Optional::map and Optional::orElseGet help convert the match into the desired array of indexes
public static int[] getIndexes(char c, String str) {
return Pattern.compile(Pattern.quote(Character.toString(c)) + "+")
.matcher(str)
.results()
.max(Comparator.comparing(mr -> mr.end() - mr.start()))
.map(mr -> new int[]{mr.start(), mr.end()})
.orElseGet(() -> new int[]{-1, -1});
}
// Test:
System.out.println(Arrays.toString(getIndexes('*', "#*****~~~~~~**************~~~~~~~~***************************#")));
// -> [34, 61]
Related
How can I replace consecutive characters with a single character in java?
String fileContent = "def  mnop.UVW";
String oldDelimiters = " .";
String newDelimiter = "!";
for (int i = 0; i < oldDelimiters.length(); i++){
Character character = oldDelimiters.charAt(i);
fileContent = fileContent.replace(String.valueOf(character), newDelimiter);
}
Current output: def!!mnop!UVW
Desired output: def!mnop!UVW
Notice the two spaces are replaced with two exclamation marks. How can I replace consecutive delimiters with one delimiter?
Since you want to match consecutive characters from the old delimiters, a regex solution doesn't seem feasible to me here. Instead, you can go char by char, check whether each char is one of the old delimiters, and replace it with the new one as shown below.
import java.util.*;
public class Main{
public static void main(String[] args) {
String fileContent = "def  mnop.UVW";
String oldDelimiters = " .";
// add all old delimiters in a set for fast checks
Set<Character> set = new HashSet<>();
for(int i=0;i<oldDelimiters.length();++i) set.add(oldDelimiters.charAt(i));
/*
match all consecutive chars at once, check if it belongs to an old delimiter
and replace it with the new one
*/
String newDelimiter = "!";
StringBuilder res = new StringBuilder("");
for(int i=0;i<fileContent.length();++i){
if(set.contains(fileContent.charAt(i))){
while(i + 1 < fileContent.length() && fileContent.charAt(i) == fileContent.charAt(i+1)) i++;
res.append(newDelimiter);
}else{
res.append(fileContent.charAt(i));
}
}
System.out.println(res.toString());
}
}
Demo: https://onlinegdb.com/r1BC6qKP8
s = s.replaceAll("([ \\.])[ \\.]+", "$1");
Or if only several same delimiters have to be replaced:
s = s.replaceAll("([ \\.])\\1+", "$1");
[....] is a group of alternative characters
First (...) is group 1, $1
\\1 is the text of the first group
Although it doesn't use regex, I thought a solution with Streams was needed, because everyone loves streams:
private static class StatefulFilter implements Predicate<String> {
private final String needle;
private String last = null;
public StatefulFilter(String needle) {
this.needle = needle;
}
@Override
public boolean test(String value) {
boolean duplicate = last != null && last.equals(value) && value.equals(needle);
last = value;
return !duplicate;
}
}
public static void main(String[] args) {
System.out.println(
"def  mnop.UVW"
.codePoints()
.sequential()
.mapToObj(c -> String.valueOf((char) c))
.filter(new StatefulFilter(" "))
.map(x -> x.equals(" ") ? "!" : x)
.collect(Collectors.joining(""))
);
}
Runnable example: https://onlinegdb.com/BkY0R2twU
Explanation:
Theoretically, you aren't really supposed to have a stateful filter, but technically, as long as the stream is not parallelized, it works fine:
.codePoints() - splits the String into a Stream
.sequential() - since we care about the order of characters, our Stream must not be processed in parallel
.mapToObj(c -> String.valueOf((char) c)) - the comparison in the filter is more intuitive if we convert to String, but it's not really needed
.filter(new StatefulFilter(" ")) - here we filter out any space that comes after another space
.map(x -> x.equals(" ") ? "!" : x) - now we can replace the remaining spaces with exclamation marks
.collect(Collectors.joining("")) - and finally we can join the characters together to reconstitute a String
The StatefulFilter itself is pretty straightforward: it checks a) whether we have a previous character at all, b) whether the previous character is the same as the current character and c) whether the current character is the delimiter (space). It returns false (meaning the character gets deleted) only if all of a, b and c are true.
The biggest difficulty in using a regex for this is creating an expression from your oldDelimiters string. For example:
String oldDelimiters = " .";
String expression = "\\" + String.join("+|\\", oldDelimiters.split("")) + "+";
String text = "def  mnop.UVW;abc .df";
String result = text.replaceAll(expression, "!");
(Edit: since characters in the expression are now escaped anyway, I removed the character classes and edited the following text to reflect that change.)
Where the generated expression looks like \ +|\.+, i.e. each character is quantified and constitutes one alternative of the expression. The engine will match and replace one alternative at a time if it can be matched. result now contains:
def!mnop!UVW;abc!!df
Not sure how backwards compatible this is due to split() behaviour in previous versions of Java (producing a leading space in splitting on the empty string), but with current versions this should be fine.
Edit: As it is, this breaks if the delimiting characters contain digits or characters representing unescaped regex tokens (i.e. 1, b, etc.).
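One way to sidestep that limitation is to let Pattern.quote escape each delimiter instead of prefixing a backslash by hand. The following is a sketch under that assumption, not the original author's code. The names DelimiterCollapser and collapse are made up, and the delimiters are assumed to be plain (BMP) characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DelimiterCollapser {
    // Replaces every run of the same delimiter character with a single newDelimiter.
    static String collapse(String text, String oldDelimiters, String newDelimiter) {
        if (oldDelimiters.isEmpty()) {
            return text; // an empty pattern would match everywhere, so bail out early
        }
        // Builds e.g. \Q \E+|\Q.\E+ : each delimiter is quoted literally, then quantified.
        String expression = oldDelimiters.chars()
                .mapToObj(c -> Pattern.quote(String.valueOf((char) c)) + "+")
                .collect(Collectors.joining("|"));
        // quoteReplacement keeps '$' and '\' in the new delimiter from being interpreted.
        return text.replaceAll(expression, Matcher.quoteReplacement(newDelimiter));
    }

    public static void main(String[] args) {
        System.out.println(collapse("def  mnop.UVW;abc .df", " .", "!")); // def!mnop!UVW;abc!!df
        System.out.println(collapse("a11b..c", "1.", "!"));               // a!b!c
    }
}
```

Because each character is quoted, digits and regex metacharacters in oldDelimiters are treated as literals.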
The output should be like this:
Hans4444müller ---> HansIVmüller
Mary555kren ---> MaryVkren
First, I tried to match all repeated digits in a word with this regex:
(\d)\1+ // and replace that with $1
After getting the repeated digit, such as 4, I tried to change it to IV, but
unfortunately I can't find the correct regex for this.
The algorithm I have in mind is: if there is a repeating digit, replace it with its Roman form.
Is there any way to do this with regex?
I don't know Java very well, but I do know regular expressions, C# and JavaScript. I am confident you can adapt one of my techniques to Java.
I have sample code with two different techniques.
The first invokes a function on every match to perform the replacement
The second iterates over the matches provided by your regular expression, converts each match into Roman numerals, then injects the result into your original text.
The link below illustrates technique 1 using DotNetFiddle. The replacement function takes a method name; that method is invoked for every match. This technique requires very little code.
https://dotnetfiddle.net/o9gG28. If you're lucky, Java has a similar technique available.
Technique 2: a JavaScript version that loops through every match found by the regex:
https://jsfiddle.net/ActualRandy/rxnzoc3u/81/. The method does some string concatenation using the replacement value.
Here's some code for technique 2 using .NET syntax; Java should be similar. The key methods are 'Match' and 'NextMatch'. Match uses your regex to get the first match.
private void btnRegexRep_Click(object sender, RoutedEventArgs e) {
string fixThis = @"Hans4444müller,Mary555kren";
var re = new Regex("\\d+");
string result = "";
int lastIndex = 0;
string lastMatch = "";
//Get the first match using the regular expression:
var m = re.Match(fixThis);
//Keep looping while we can match:
while (m.Success) {
//Get length of text between last match and current match:
int len = m.Index - (lastIndex + lastMatch.Length);
result += fixThis.Substring(lastIndex + lastMatch.Length, len) + GetRomanText(m);
//Save values for next iteration:
lastIndex = m.Index;
lastMatch = m.Value;
m = m.NextMatch();
}
//Append text after last match (if nothing matched, this copies the whole input):
result += fixThis.Substring(lastIndex + lastMatch.Length);
Console.WriteLine(result);
}
private string GetRomanText(Match m) {
// Index = digit value; mapping 0 to an empty string is an assumption, since 0 has no Roman numeral:
string[] roman = new[] { "", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX" };
string result = "";
// Get ASCII value of first digit from the match (remember, 48= ascii 0, 57=ascii 9):
char c = m.Value[0];
if (c >= 48 && c <= 57) {
int index = c - 48;
result = roman[index];
}
return result;
}
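For Java, technique 1 (a function invoked on every match) is available since Java 9 through Matcher.replaceAll(Function<MatchResult, String>). Here is a minimal sketch using the (\d)\1+ expression from the question. The class and method names are hypothetical, and mapping 0 to an empty string is an assumption.

```java
import java.util.regex.Pattern;

class RomanDigits {
    // Index = digit value; 0 has no Roman numeral, so it maps to "" (assumption).
    private static final String[] ROMAN =
            {"", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"};

    // A digit followed by one or more repetitions of itself.
    private static final Pattern REPEATED_DIGIT = Pattern.compile("(\\d)\\1+");

    static String replaceRepeatedDigits(String text) {
        // The lambda is called once per match; group(1) is the single repeated digit.
        return REPEATED_DIGIT.matcher(text)
                .replaceAll(m -> ROMAN[m.group(1).charAt(0) - '0']);
    }

    public static void main(String[] args) {
        System.out.println(replaceRepeatedDigits("Mary555kren")); // MaryVkren
    }
}
```

Unlike the \d+ expression in the .NET code, this only touches digits that actually repeat, so a single digit such as the 7 in "a7b" is left alone.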
I wrote the following Java code, and it returned a time-out error. I'm not exactly sure what that means or why the code doesn't run.
public int countHi(String str) {
int pos = str.indexOf("hi");
int count = 0;
while(pos!=-1)
{
count++;
pos = str.substring(pos).indexOf("hi");
}
return count;
}
I know an alternative solution using a for loop, but I really thought this would work too.
You're getting into an infinite loop because pos never advances past the first match: the substring starts at pos, so the first match is always found again.
You can fix this by using this overloaded version of indexOf() inside your while loop:
pos = str.indexOf("hi", pos + 1);
Or use a do... while loop to avoid having to repeat the call to indexOf():
public static int countHi(String str) {
int pos = -1, count = -1;
do {
count++;
pos = str.indexOf("hi", pos + 1);
} while (pos != -1);
return count;
}
str.substring(pos) returns the substring starting at the given index, so the match itself is still included. Therefore the while loop in your code never travels through the whole string; it stays stuck on the first "hi". Use this instead:
while(pos!=-1){
count++;
str = str.substring(pos+2);
pos = str.indexOf("hi");
}
The str variable now holds the remainder of the string after the match (+2 skips the two characters of "hi"), and pos holds the index of the next "hi" in that remainder.
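Put into a complete method, that loop might look like the sketch below. The class name HiCounter is made up for illustration; the method signature follows the question.

```java
class HiCounter {
    static int countHi(String str) {
        int count = 0;
        int pos = str.indexOf("hi");
        while (pos != -1) {
            count++;
            // Drop everything up to and including the match, then search the remainder.
            str = str.substring(pos + 2);
            pos = str.indexOf("hi");
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countHi("hi there, hi")); // 2
    }
}
```

Each substring call copies the remainder of the string, so for long inputs the indexOf("hi", pos + 1) variant is likely the cheaper choice.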
Just for added fun......
If the supplied substring (i.e. "hi") is to be counted no matter where it is located within the input string (as a single word or as part of a word), you can use a one-liner and let String.replace() do the job: remove the substring you want to count from the input string, then divide the number of characters removed by the length of the substring (this does not modify the original input string):
String inputString = "Hi there. This is a hit in his pocket";
String subString = "hi";
int count = (inputString.length() - inputString.replace(subString, "").
length()) / subString.length();
//Display the result...
System.out.println(count);
Console will display: 3
You will note that the above code is letter case sensitive and therefore in the example above the substring "hi" differs from the word "Hi" because of the uppercase "H" so "Hi" is ignored. If you want to ignore letter case when counting for the supplied substrings then you can use the same code but utilize the String.toLowerCase() method within it:
String inputString = "Hi there. This is a hit in his pocket";
String subString = "hi";
int count = (inputString.length() - inputString.toLowerCase().
replace(subString.toLowerCase(), "").
length()) / subString.length();
//Display the result...
System.out.println(count);
Console will display: 4
If however the supplied substring you want to count is a specific word (not part of another word) then it gets a little more complicated. One way you can do this is by utilizing the Pattern and Matcher Classes along with a small Regular Expression. It could look something like this:
String inputString = "Hi there. This is a hit in his pocket";
String subString = "Hi";
String regEx = "\\b" + subString + "\\b";
int count = 0; // To hold the word count
// Compile the regular expression
Pattern p = Pattern.compile(regEx);
// See if there are matches of subString within the
// input string utilizing the compiled pattern
Matcher m = p.matcher(inputString);
// Count the matches found
while (m.find()) {
count++;
}
//Display the count result...
System.out.println(count);
Console will display: 1
Again, the above code is letter case sensitive. In other words, if the supplied substring was "hi" then the display in the console would have been 0, since "hi" is different from "Hi", which is in fact contained within the input string as the first word. If you want to ignore letter case then it is just a matter of converting both the input string and the supplied substring to either all upper case or all lower case, for example:
String inputString = "Hi there. This is a hit in his pocket";
String subString = "this is";
String regEx = "\\b" + subString.toLowerCase() + "\\b";
int count = 0; // To hold the word count
// Compile the regular expression
Pattern p = Pattern.compile(regEx);
// See if there are matches of subString within the
// input string utilizing the compiled pattern
Matcher m = p.matcher(inputString.toLowerCase());
// Count the matches found
while (m.find()) {
count++;
}
//Display the count result...
System.out.println(count);
Console will display: 1
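As you can see in the two most recent code examples above, the regular expression "\\bHi\\b" was used (in code, a variable takes the place of Hi). Here is what it means: \\b is a word-boundary anchor, so the expression matches Hi only where it stands as a whole word and not where it is part of a longer word.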
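The two word-counting examples can be folded into one helper. The sketch below is a variation, not the code from the answer: it swaps the toLowerCase() calls for the Pattern.CASE_INSENSITIVE flag and wraps the word in Pattern.quote so that regex metacharacters in it are taken literally. WordCounter and countWord are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCounter {
    // Counts occurrences of word as a whole word (delimited by \b word boundaries).
    static int countWord(String input, String word, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", flags);
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Hi there. This is a hit in his pocket";
        System.out.println(countWord(input, "hi", false)); // 0
        System.out.println(countWord(input, "hi", true));  // 1
    }
}
```

The \b anchors only behave as expected when the word itself starts and ends with a word character (letter, digit or underscore).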
I'm trying to write a function to count specific Strings.
The Strings to count look like the following:
first any character except comma at least once -
the comma -
any character, at least once
example string:
test, test, test,
should count to 3
I've tried do that by doing the following:
int countSubstrings = 0;
final Pattern pattern = Pattern.compile("[^,]*,.+");
final Matcher matcher = pattern.matcher(commaString);
while (matcher.find()) {
countSubstrings++;
}
My solution doesn't work, though. It always ends up counting to one and no further.
Try this pattern instead: [^,]+
As you can see in the API, find() will give you the next subsequence that matches the pattern. So this will find your sequences of "non-commas" one after the other.
Your regex, especially the .+ part, will match any char sequence of at least length 1, so it greedily consumes the rest of the input. You want the match to be reluctant/lazy, so add a ?: [^,]*,.+?
Note that .+? will still match a comma that directly follows a comma, so you might want to replace .+? with [^,]+ instead (since commas can't match, laziness is not needed there).
Besides that, an easier solution might be to split the string and get the length of the array (or loop and check the elements if you don't want to allow for empty strings):
countSubstrings = commaString.split(",").length;
Edit:
Since you added an example that clarifies your expectations, you need to adjust your regex. You seem to want to count the number of strings followed by a comma so your regex can be simplified to [^,]+,. This matches any char sequence consisting of non-comma chars which is followed by a comma.
Note that this wouldn't match multiple commas or text at the end of the input, e.g. test,,test would result in a count of 1. If you have that requirement you need to adjust your regex.
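As a quick check of the simplified expression, here is a minimal sketch that counts matches of [^,]+, with the find() loop from the question. CommaItemCounter and countItems are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommaItemCounter {
    // Counts sequences of one or more non-comma characters that are followed by a comma.
    static int countItems(String commaString) {
        Matcher matcher = Pattern.compile("[^,]+,").matcher(commaString);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countItems("test, test, test,")); // 3
        System.out.println(countItems("test,,test"));        // 1
    }
}
```

The second call shows the caveat mentioned above: the empty item between the two commas and the trailing text without a comma are not counted.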
So, quite good answers have already been given, and they are very readable. Something like this should work too; beware, it's not clean and probably not the fastest way to do this, but it is quite readable. :)
public int countComma(String lots_of_words) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == ',') {
count++;
}
}
return count;
}
Or even better:
public int countChar(String lots_of_words, char the_chosen_char) {
int count = 0;
for (int x = 0; x < lots_of_words.length(); x++) {
if (lots_of_words.charAt(x) == the_chosen_char) {
count++;
}
}
return count;
}
I am looking for an elegant way to find the first appearance of one of a set of delimiters.
For example, let's assume my delimiter set is composed of {";",")","/"}.
If my String is
"aaa/bbb;ccc)"
I would like to get the result 3 (the index of the "/", since it is the first to appear).
If my String is
"aa;bbbb/"
I would like to get the result 2 (the index of the ";", since it is the first to appear).
and so on.
If the String does not contain any delimiter, I would like to return -1.
I know I can do it by first finding the index of each delimiter, then calculating the minimum of the indices, disregarding the -1's. This code becomes very cumbersome. I am looking for a shorter and more generic way.
With a regex, it could be done like this:
String s = "aa;bbbb/";
Matcher m = Pattern.compile("[;/)]").matcher(s); // [;/)] would match a forward slash or semicolon or closing parenthesis.
if(m.find()) // if there is a match found, note that it would find only the first match because we used `if` condition not `while` loop.
{
System.out.println(m.start()); // print the index where the match starts.
}
else
{
System.out.println("-1"); // else print -1
}
Search the list of delimiters for each character of the input string. If one is found, print the index.
You can also use a Set to store the delimiters.
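A minimal sketch of that idea, assuming single-character delimiters stored in a Set<Character>. The names FirstDelimiterFinder and indexOfAny are hypothetical, and Set.of requires Java 9 or later.

```java
import java.util.Set;

class FirstDelimiterFinder {
    // Returns the index of the first character that is one of the delimiters, or -1 if there is none.
    static int indexOfAny(String text, Set<Character> delimiters) {
        for (int i = 0; i < text.length(); i++) {
            if (delimiters.contains(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Set<Character> delimiters = Set.of(';', ')', '/');
        System.out.println(indexOfAny("aaa/bbb;ccc)", delimiters)); // 3
        System.out.println(indexOfAny("aa;bbbb/", delimiters));     // 2
        System.out.println(indexOfAny("abc", delimiters));          // -1
    }
}
```

The scan stops at the first hit, so the string is traversed at most once regardless of how many delimiters there are.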
The program below gives the result. It is done using regex.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class FindIndexUsingRegex {
/**
* @param args
*/
public static void main(String[] args) {
findMatches("aaa/bbb;ccc\\)",";|,|\\)|/");
}
public static void findMatches(String source, String regex) {
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(source);
while (matcher.find()) {
System.out.print("First index: " + matcher.start()+"\n");
System.out.print("Last index: " + matcher.end()+"\n");
System.out.println("Delimiter: " + matcher.group()+"\n");
break;
}
}
}
Output:
First index: 3
Last index: 4
Delimiter: /