I have a pair of Strings in an array that I need to find in another String:
String[] validPair = {"[BOLD]", "[/BOLD]" };
String toCheck = "Example [BOLD]bold long text[/BOLD] other example [BOLD]bold short[/BOLD]";
I need to check that the tags are balanced. I know how to check whether a string is inside another string, and also how to achieve this by calling indexOf for each element of validPair across the string and saving the positions, but that is an ugly way and I don't want to reinvent the wheel.
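Something like:
int lastIndex = 0;
while (lastIndex != -1) {
    // findNextOccurrence is the method I'm looking for (String has no such method); it would use indexOf internally
    int index = toCheck.findNextOccurrence(validPair, lastIndex);
    System.out.println(index);
    lastIndex = (index == -1) ? -1 : index + 1; // move past the current hit
}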
I was wondering whether there is a way to find the next occurrence of any of the Strings in String[] validPair within the String toCheck.
Something like an Iterator or Tokenizer, but one that doesn't split the string and returns only the occurrences of the array's contents (or of a List or any other Object).
Or:
OwnIterator<String> ownIterator = new OwnIterator<String>(toCheck, validPair);
while (ownIterator.hasNext()) {
    String next = ownIterator.findNextOccurrence();
    System.out.println(next);
}
OUTPUT:
[BOLD]
[/BOLD]
[BOLD]
[/BOLD]
This is the solution I came up with. It uses an array of regular expressions to search for each item in validPair separately, then combines all found occurrences into one sorted list and iterates over it:
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OwnIterator implements Iterator<Integer>
{
    private final Iterator<Integer> occurrencesItr;

    public OwnIterator(String toCheck, String[] validPair) {
        // build a regex to search for every item in validPair
        Matcher[] matchValidPair = new Matcher[validPair.length];
        for (int i = 0 ; i < validPair.length ; i++) {
            String regex =
                "(" +          // start capturing group
                "\\Q" +        // quote the item so it is not interpreted as regex
                validPair[i] + // this is what we are looking for
                "\\E" +        // end quote
                ")";           // end capturing group
            Pattern p = Pattern.compile(regex);
            matchValidPair[i] = p.matcher(toCheck);
        }
        // do the search, saving found occurrences in a list
        List<Integer> occurrences = new ArrayList<>();
        for (int i = 0 ; i < matchValidPair.length ; i++) {
            while (matchValidPair[i].find()) {
                occurrences.add(matchValidPair[i].start(0) + 1); // +1 so that indexes start at 1
            }
        }
        // sort the list so occurrences come out in text order
        Collections.sort(occurrences);
        occurrencesItr = occurrences.iterator();
    }

    @Override
    public boolean hasNext()
    {
        return occurrencesItr.hasNext();
    }

    @Override
    public Integer next()
    {
        return occurrencesItr.next();
    }
}
A quick test:
public static void main(String[] args)
{
String[] validPair = {"[BOLD]", "[/BOLD]" };
String toCheck = "Example [BOLD]bold long text[/BOLD] other example [BOLD]bold short[/BOLD]";
OwnIterator itr = new OwnIterator(toCheck, validPair);
while (itr.hasNext()) {
System.out.println(itr.next());
}
}
gives the desired output:
9
29
51
67
EDIT:
I found a better solution, with just one regular expression that includes all items in validPair joined by the "or" operator (|). The Matcher's own find() method then serves as the iterator:
String regex = "(";
for (int i = 0 ; i < validPair.length ; i++) {
regex += (i == 0 ? "" : "|") + // add "or" after first item
"\\Q" + // quote entire input string so it is not interpreted as regex
validPair[i] + // this is what we are looking for, duhh
"\\E"; // end quote
}
regex += ")";
System.out.println("using regex : " + regex);
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(toCheck);
while (m.find()) {
System.out.println(m.group(0));
}
You get the output:
using regex : (\Q[BOLD]\E|\Q[/BOLD]\E)
[BOLD]
[/BOLD]
[BOLD]
[/BOLD]
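Since the goal in the question is to check the balance of the tags, the same alternation matcher can drive a simple depth counter. The following is a minimal sketch, assuming exactly one opening and one closing tag and treating a closing tag that appears before any open tag as unbalanced; the class TagBalance and the method isBalanced are illustrative names. Pattern.quote produces the same \Q...\E quoting as the regex built above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagBalance {

    // Returns true if every opening tag is closed and no closing tag
    // appears before a matching opening tag.
    static boolean isBalanced(String toCheck, String open, String close) {
        // one regex: quoted open tag OR quoted close tag
        Pattern p = Pattern.compile(Pattern.quote(open) + "|" + Pattern.quote(close));
        Matcher m = p.matcher(toCheck);
        int depth = 0;
        while (m.find()) {
            if (m.group().equals(open)) {
                depth++;
            } else if (--depth < 0) {
                return false; // closing tag without an open one
            }
        }
        return depth == 0; // every opening tag was closed
    }

    public static void main(String[] args) {
        String[] validPair = {"[BOLD]", "[/BOLD]"};
        String toCheck = "Example [BOLD]bold long text[/BOLD] other example [BOLD]bold short[/BOLD]";
        System.out.println(isBalanced(toCheck, validPair[0], validPair[1]));          // true
        System.out.println(isBalanced("[/BOLD]x[BOLD]", validPair[0], validPair[1])); // false
    }
}
```

A counter is enough here because there is only one kind of tag; with several tag pairs you would push the expected closing tag onto a stack instead.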
You can just do:
int first = toCheck.indexOf(validPair[0]);
// true if an opening tag exists and a closing tag follows it (only the first pair is checked)
boolean ok = first > -1 && toCheck.indexOf(validPair[1], first) > -1;
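The findNextOccurrence method from the question does not exist on String, but it can be approximated without regular expressions by calling indexOf once per candidate and keeping the smallest hit. A sketch under that assumption; findNextOccurrence here is a hypothetical static helper (not a JDK method) that returns -1 when none of the strings occurs.

```java
class NextOccurrence {

    // Hypothetical helper: index of the earliest occurrence, at or after
    // fromIndex, of any string in candidates; -1 if none occurs.
    static int findNextOccurrence(String text, String[] candidates, int fromIndex) {
        int best = -1;
        for (String candidate : candidates) {
            int idx = text.indexOf(candidate, fromIndex);
            if (idx != -1 && (best == -1 || idx < best)) {
                best = idx; // keep the leftmost hit
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] validPair = {"[BOLD]", "[/BOLD]"};
        String toCheck = "Example [BOLD]bold long text[/BOLD] other example [BOLD]bold short[/BOLD]";
        int index = findNextOccurrence(toCheck, validPair, 0);
        while (index != -1) {
            System.out.println(index);
            // advance past the current hit so the loop terminates
            index = findNextOccurrence(toCheck, validPair, index + 1);
        }
    }
}
```

This prints 8, 28, 50 and 66, the zero-based equivalents of the 9, 29, 51, 67 shown earlier. Each call rescans the text once per candidate, which is fine for a handful of tags.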
I'm trying to search for patterns of letters in a file. The pattern is entered by the user and comes in as a String. So far I've got it to find the first letter, but I'm unsure how to make it test whether the next letter also matches the pattern.
This is the loop I currently have. Any help would be appreciated.
public void exactSearch(){
if (pattern==null){UI.println("No pattern");return;}
UI.println("===================\nExact searching for "+patternString);
int j = 0 ;
for(int i=0; i<data.size(); i++){
if(patternString.charAt(i) == data.get(i) )
j++;
UI.println( "found at " + j) ;
}
}
You need to iterate over the first string until you find the first character of the other string. From there, you can create an inner loop and iterate over both simultaneously, like you did.
Hint: be sure to watch for boundaries, as the strings might not be the same size.
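A minimal sketch of that hint, assuming the text is a plain String rather than the data list from the question (the class NaiveSearch and the method findAll are illustrative): the outer loop stops early enough that the inner loop can never run past the end of the text, which handles the boundary issue.

```java
import java.util.ArrayList;
import java.util.List;

class NaiveSearch {

    // Returns every index at which pattern starts inside text (overlaps included).
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        // stop when fewer than pattern.length() characters remain
        for (int i = 0; i + pattern.length() <= text.length(); i++) {
            int j = 0;
            // inner loop: walk text and pattern together while they agree
            while (j < pattern.length() && text.charAt(i + j) == pattern.charAt(j)) {
                j++;
            }
            if (j == pattern.length()) {
                hits.add(i); // every character of pattern matched
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("foo-bar-baz-bar-", "bar")); // [4, 12]
    }
}
```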
You can try this:
String a1 = "foo-bar-baz-bar-";
String pattern = "bar";
int foundIndex = 0;
while(foundIndex != -1) {
foundIndex = a1.indexOf(pattern,foundIndex);
if(foundIndex != -1)
{
System.out.println(foundIndex);
foundIndex += 1;
}
}
indexOf: the first parameter is the pattern string,
and the second parameter is the index from which to start searching.
If the pattern is found, indexOf returns the index at which the match starts.
If the pattern is not found, indexOf returns -1.
String data = "foo-bar-baz-bar-";
String pattern = "bar";
int foundIndex = data.indexOf(pattern);
while (foundIndex > -1) {
System.out.println("Match found at: " + foundIndex);
foundIndex = data.indexOf(pattern, foundIndex + pattern.length());
}
Based on your request, you can use this algorithm to find the positions:
1) To avoid a StringIndexOutOfBoundsException at the end of the string, we stop as soon as the remaining part of the string is shorter than the pattern.
2) At each iteration, we take the substring of the pattern's length starting at the current position and compare it with the pattern.
List<Integer> positionList = new LinkedList<>();
String inputString = "AAACABCCCABC";
String pattern = "ABC";
for (int i = 0 ; i < inputString.length(); i++) {
if (inputString.length() - i < pattern.length()){
break;
}
String currentSubString = inputString.substring(i, i + pattern.length());
if (currentSubString.equals(pattern)){
positionList.add(i);
}
}
for (Integer pos : positionList) {
System.out.println(pos); // Positions : 4 and 9
}
EDIT:
This could be optimized to avoid using a Collection for such a simple task, but I used a LinkedList to write the approach quickly.
My task is to split a string, which starts with digits and contains both digits and letters, into two substrings. The first one consists of all the digits before the first letter. The second one is the remaining part, and shouldn't be split even if it contains digits.
For example, the string "123abc34de" should be split into "123" and "abc34de".
I know how to write a regular expression for such a string, and it might look like this:
[0-9]{1,}[a-zA-Z]{1,}[a-zA-Z0-9]{0,}
I have tried multiple times but still don't know how to apply a regex in the String.split() method, and there seem to be very few online materials about this. Thanks for any help.
You can do it this way:
final String regex = "([0-9]{1,})([a-zA-Z]{1,}[a-zA-Z0-9]{0,})";
final String string = "123ahaha1234";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
matcher.group(1) contains the first part and matcher.group(2) contains the second;
you can add these values to a list or array.
You can use a pretty simple pattern, "^(\\d+)(\\w+)", which captures the leading digits and then, once letters appear, the remaining word characters:
String string = "123abc34de";
Matcher matcher = Pattern.compile("^(\\d+)(\\w+)").matcher(string);
String firstpart = "";
String secondPart = "";
if (matcher.find()) {
firstpart = matcher.group(1);
secondPart = matcher.group(2);
}
System.out.println(firstpart + " - " + secondPart); // 123 - abc34de
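Since the question asks about String.split() specifically: split() can do this if the regex matches the zero-width position between the last leading digit and the first letter, instead of matching any text. A sketch assuming the input starts with at least one digit followed by a letter; the class and method names are illustrative. The limit argument 2 stops split() from also cutting at later digit/letter boundaries, such as the one between "34" and "de".

```java
import java.util.Arrays;

class SplitDigitsLetters {

    static String[] splitLeadingDigits(String s) {
        // lookbehind: preceded by a digit; lookahead: followed by a letter.
        // The match is empty, so no characters are consumed by the split.
        // Limit 2: only the first such position is used.
        return s.split("(?<=\\d)(?=[a-zA-Z])", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLeadingDigits("123abc34de"))); // [123, abc34de]
    }
}
```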
This is not the proper way, but you will get the result:
public static void main(String[] args) {
    String example = "1234abc123";
    int index = 0; // number of leading digits
    for (int i = 0; i < example.length(); i++) {
        try {
            // parseInt throws NumberFormatException for a non-digit character
            Integer.parseInt(String.valueOf(example.charAt(i)));
            index = i + 1;
        } catch (NumberFormatException e) {
            break; // first letter reached: stop scanning
        }
    }
    String firstHalf = example.substring(0, index);
    String secondHalf = example.substring(index);
    System.out.println(firstHalf);
    System.out.println(secondHalf);
}
The output is 1234, followed by abc123 on the next line.
I'm using a regex to validate an input, and I want to get the exact index of the wrong character.
My regex is :
^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])?
If I type the following input :
DATE/201A08
Then matcher.group() (using the lookingAt() method) returns "DATE" instead of "DATE/201", so I can't tell that the wrong index is 9.
If I read this right, you can't do this using only one regex.
^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])? matches a String starting with 1 to 4 uppercase letters, followed either by nothing or by / and exactly 6 digits. So it correctly matches "DATE" in your input, since that prefix is valid according to your regex.
Try to split this into two checks. First, check whether the leading letters (DATE) are valid.
Then, if there is an actual / part, check it against the date pattern, this time as a required (non-optional) pattern.
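A sketch of those two checks, reusing the character classes from the question's regex; PREFIX, SUFFIX and validate are illustrative names. lookingAt() anchors the first check at the start of the input, and the second check runs only when a / follows, so a failure there is known to lie in the date part. It reports which part failed rather than the exact character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepCheck {

    // first check: 1 to 4 uppercase letters at the start of the input
    private static final Pattern PREFIX = Pattern.compile("[A-Z]{1,4}");
    // second check: the date part, no longer optional
    private static final Pattern SUFFIX = Pattern.compile("/[1-2][0-9]{3}[0-1][0-9]");

    // Returns -1 if the input is valid, otherwise the index at which the
    // failing part starts (0 for the letters, the '/' position for the date).
    static int validate(String input) {
        Matcher prefix = PREFIX.matcher(input);
        if (!prefix.lookingAt()) {
            return 0;
        }
        int end = prefix.end();
        if (end == input.length()) {
            return -1; // letters only, no date part: valid
        }
        if (input.charAt(end) != '/') {
            return end; // something other than '/' follows the letters
        }
        // the whole remainder must be the date part
        return SUFFIX.matcher(input.substring(end)).matches() ? -1 : end;
    }

    public static void main(String[] args) {
        System.out.println(validate("DATE/201A08")); // 4: the date part is wrong
        System.out.println(validate("DATE/201108")); // -1: valid
    }
}
```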
You want to know whether the entire pattern matched and, if not, how far it matched.
That is where a single regex falls short. A regex match must succeed before group() returns anything, and if it succeeds on only part of the input, you cannot tell whether everything was matched.
The sensible thing to do is to split up the matching.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProgressiveMatch {
    private final String[] regexParts;
    private String group;

    ProgressiveMatch(String... regexParts) {
        this.regexParts = regexParts;
    }

    // Variant 1: lookingAt with one regex of optional groups, ^(...)?(...)?...
    public boolean lookingAtOptional(String text) {
        StringBuilder sb = new StringBuilder();
        sb.append('^');
        for (int i = 0; i < regexParts.length; ++i) {
            String part = regexParts[i];
            sb.append("(");
            sb.append(part);
            sb.append(")?");
        }
        Pattern pattern = Pattern.compile(sb.toString());
        Matcher m = pattern.matcher(text);
        if (m.lookingAt()) {
            boolean all = true;
            group = "";
            // concatenate groups up to the first part that did not match
            for (int i = 1; i <= regexParts.length; ++i) {
                if (m.group(i) == null) {
                    all = false;
                    break;
                }
                group += m.group(i);
            }
            return all;
        }
        group = null;
        return false;
    }

    // Variant 2: lookingAt with multiple patterns, from all parts down to one
    public boolean lookingAt(String text) {
        for (int n = regexParts.length; n > 0; --n) {
            // match for the first n parts
            StringBuilder sb = new StringBuilder();
            sb.append('^');
            for (int i = 0; i < n; ++i) {
                sb.append(regexParts[i]);
            }
            Pattern pattern = Pattern.compile(sb.toString());
            Matcher m = pattern.matcher(text);
            if (m.lookingAt()) {
                group = m.group();
                return n == regexParts.length;
            }
        }
        group = null;
        return false;
    }

    public String group() {
        return group;
    }

    public static void main(String[] args) {
        // ^[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])?
        ProgressiveMatch match = new ProgressiveMatch("[A-Z]{1,4}", "/",
                "[1-2]", "[0-9]", "[0-9]", "[0-9]", "[0-1]", "[0-9]");
        boolean matched = match.lookingAt("DATE/201A08");
        System.out.println("Matched: " + matched);     // Matched: false
        System.out.println("Up to: " + match.group()); // Up to: DATE/201
    }
}
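One could make a small DSL in Java, like the following (ProgressiveMatchBuilder and its methods are hypothetical and would still have to be written):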
ProgressiveMatch match = ProgressiveMatchBuilder
.range("A", "Z", 1, 4)
.literal("/")
.range("1", "2")
.range("0", "9", 3, 3)
.range("0", "1")
.range("0", "9")
.match();
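A different technique, which keeps the question's regex in one piece, is Matcher.hitEnd(): after a failed matches() call on a prefix of the input, hitEnd() returns true if the engine ran out of input, meaning the prefix could still be extended into a match. The first prefix for which neither matches() nor hitEnd() is true ends at the offending character. This is a sketch assuming a plain pattern like the one in the question (no lookarounds or other constructs that could make the prefix test unreliable); firstErrorIndex is an illustrative name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstErrorIndex {

    // Returns -1 if the whole input matches, otherwise the zero-based index of
    // the first character no match can continue through, or input.length()
    // if the input is a valid but incomplete prefix.
    static int firstErrorIndex(Pattern pattern, String input) {
        if (pattern.matcher(input).matches()) {
            return -1;
        }
        for (int k = 1; k <= input.length(); k++) {
            Matcher m = pattern.matcher(input.subSequence(0, k));
            // hitEnd() reports whether the last matches() call ran out of input
            if (!m.matches() && !m.hitEnd()) {
                return k - 1; // the prefix broke down at character k - 1
            }
        }
        return input.length(); // every prefix was fine: input ends too early
    }

    public static void main(String[] args) {
        // the question's regex; matches() already anchors at both ends, so ^ is dropped
        Pattern p = Pattern.compile("[A-Z]{1,4}(/[1-2][0-9][0-9][0-9][0-1][0-9])?");
        System.out.println(firstErrorIndex(p, "DATE/201A08")); // 8, i.e. the 9th character
    }
}
```

The loop re-runs the regex once per prefix, which is quadratic in the input length but harmless for short inputs like these.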
I want to write a function that extracts a variable number of values from a String according to a regex pattern.
Here is my function code:
/**
* Get substrings in a string using groups in regular expression.
*
* @param str
* @param regex
* @return
*/
public static String[] regexMatch(String str, String regex) {
String[] rtn = null;
if (str != null && regex != null) {
Pattern pat = Pattern.compile(regex);
Matcher matcher = pat.matcher(str);
if (matcher.find()) {
int nGroup = matcher.groupCount();
rtn = new String[nGroup];
for (int i = 0; i < nGroup; i++) {
rtn[i] = matcher.group(i);
}
}
}
return rtn;
}
When I test it using:
String str = "nets-(90000,5,4).dat";
String regex = "(\\d+),(\\d+),(\\d+)";
String[] rtn = regexMatch(str, regex);
I get:
rtn: [90000,5,4,90000,5]
How can I get rtn to be [90000,5,4] as I expected?
Your array currently stores
[0] -> 90000,5,4
[1] -> 90000
[2] -> 5
That is why you are seeing [90000,5,4,90000,5] as output: group(0) represents the entire match, so it returns 90000,5,4.
What you need are the matches from groups 1, 2 and 3.
(\\d+),(\\d+),(\\d+)
   1      2      3
So change
rtn[i] = matcher.group(i);
to
rtn[i] = matcher.group(i+1);
First, I would start the for loop at 1 so you get the groups you declare in your regex, storing each one at index i - 1 because rtn has only nGroup slots. The loop should look like this:
for (int i = 1; i <= nGroup; i++) {
    rtn[i - 1] = matcher.group(i);
}
Group 0 is always the entire match of your regex. The groups come from:
String regex = "(\\d+),(\\d+),(\\d+)";
matcher.group(1), matcher.group(2), and matcher.group(3) will give you what you want.
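Putting the two fixes above together, a sketch of the complete regexMatch that skips group 0 and copies groups 1 through groupCount() into a zero-based array. Apart from that off-by-one fix, it keeps the original behaviour of returning null when either argument is null or nothing matches; the class name RegexGroups is illustrative.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexGroups {

    /**
     * Returns the capturing groups of the first match of regex in str,
     * or null if either argument is null or nothing matches.
     */
    public static String[] regexMatch(String str, String regex) {
        if (str == null || regex == null) {
            return null;
        }
        Matcher matcher = Pattern.compile(regex).matcher(str);
        if (!matcher.find()) {
            return null;
        }
        String[] rtn = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            rtn[i - 1] = matcher.group(i); // group 0 (the whole match) is skipped
        }
        return rtn;
    }

    public static void main(String[] args) {
        String[] rtn = regexMatch("nets-(90000,5,4).dat", "(\\d+),(\\d+),(\\d+)");
        System.out.println(Arrays.toString(rtn)); // [90000, 5, 4]
    }
}
```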
As input I have a string such as "today snowing know ". It contains 3 words, and I must parse them as follows: I must compare every character with all the other characters and count how many of the same characters these words share. For example, for the letter "o" the count will be 2 (from "today" and "snowing"), and for "w" it will be 2 (from "know" and "snowing"). After that I must replace each character with its count (converted to a char). The result should be "13111 133211 1332".
What did I do?
First I type in some words:
public void inputStringsForThreads () {
boolean flag;
do {
// will invite to input
stringToParse = Input.value();
try {
flag = true;
// in case that found nothing , space , number and other special character , throws an exception
if (stringToParse.equals("") | stringToParse.startsWith(" ") | stringToParse.matches(".*[0-9].*") | stringToParse.matches(".*[~`!@#$%^&*()-+={};:',.<>?/'_].*"))
throw new MyStringException(stringToParse);
else analizeString(stringToParse);
}
catch (MyStringException exception) {
stringToParse = null;
flag = false;
exception.AnalizeException();
}
}
while (!flag);
}
Then I split the input into words, and I also remove the spaces to join those words into one string:
static void analizeString (String someString) {
// the + makes the delimiter treat several consecutive spaces as one
String delimitator = " +";
// words is a String Array
words = someString.split(delimitator);
// temp is a string , will contain a single word
temp = someString.replaceAll("[^a-z^A-Z]","");
System.out.println("=============== Words are : ===============");
for (int i=0;i<words.length;i++)
System.out.println((i+1)+")"+words[i]);
}
So I try to compare each word, split into letters, with all the letters from all the words. But I don't know how to count the number of identical letters and then replace each letter with its count. Any ideas?
// this will contain the characters of the current word
char[] motot = words[id].toCharArray();
// this will contain all the characters from all the words
char[] notot = temp.toCharArray();
for (int i =0;i<words[i].length();i++)
for (int j=0;j<temp.length ;j++)
{
if (i == j) {
System.out.println("Same word");
}
else if (motot[i] == notot[j] ) {
System.out.println("Found equal :"+lol[i]+" "+lol1[j]);
}}
For counting, you might want to use a Map<Character, Integer> counter such as java.util.HashMap. If getting the value (Integer) for a specific key (Character) from the counter returns non-null, put back that value + 1 (autoboxing handles the conversion); otherwise, put a new entry (char, 1) in the counter.
Replacing the letters with the numbers should then be fairly easy.
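A sketch of that approach in Java, assuming words are separated by spaces and that counts are taken over all words together, so each letter is replaced by its total number of occurrences wherever it appears; the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class LetterCounts {

    static String replaceWithCounts(String input) {
        // first pass: count every non-space character across all words
        Map<Character, Integer> counter = new HashMap<>();
        for (char c : input.toCharArray()) {
            if (c == ' ') continue;
            Integer count = counter.get(c);
            // autoboxing turns the int result back into an Integer
            counter.put(c, count == null ? 1 : count + 1);
        }
        // second pass: replace each character with its total count
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            out.append(c == ' ' ? " " : String.valueOf(counter.get(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceWithCounts("today snowing know"));
    }
}
```

For "today snowing know" this prints 13111 1332131 1332: "snowing" has seven letters, and its "n" counts 3 because it occurs twice in "snowing" and once in "know".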
It is better to use pattern matching, like this.
Initially:
private Matcher matcher;
Pattern regexPattern = Pattern.compile( pattern );
matcher = regexPattern.matcher("");
For multiple patterns to match:
private final String[] patterns = new String[] {/* instantiate patterns here */};
private final Matcher[] matchers = new Matcher[patterns.length];
for ( int i = 0; i < patterns.length; i++) {
    Pattern regexPattern = Pattern.compile( patterns[i] );
    matchers[i] = regexPattern.matcher("");
}
And then, to match a pattern, you do this:
if (matcher.reset(charBuffer).find()) { /* matching pattern */ }
To check multiple matchers:
for ( int i = 0; i < matchers.length; i++ ) if (matchers[i].reset(charBuffer).find()) { /* matching pattern */ }
Don't use plain string matching; it is not efficient.
Always use a CharBuffer instead of a String.
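A self-contained sketch of the reuse idea above: each Pattern is compiled once, each Matcher is created once on an empty string, and reset(CharSequence) points it at new input. A java.nio.CharBuffer works here because it implements CharSequence. The class ReusedMatchers, the method firstMatchingPattern and the example patterns are illustrative. Matcher instances are not safe for concurrent use, so this reuse only works from a single thread.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReusedMatchers {

    private final Matcher[] matchers;

    ReusedMatchers(String... patterns) {
        matchers = new Matcher[patterns.length]; // allocate before filling
        for (int i = 0; i < patterns.length; i++) {
            // compile once; bind to an empty input until reset() is called
            matchers[i] = Pattern.compile(patterns[i]).matcher("");
        }
    }

    // Returns the index of the first pattern found in input, or -1.
    int firstMatchingPattern(CharSequence input) {
        for (int i = 0; i < matchers.length; i++) {
            if (matchers[i].reset(input).find()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ReusedMatchers m = new ReusedMatchers("\\d+", "[a-z]+");
        System.out.println(m.firstMatchingPattern(CharBuffer.wrap("ABC def"))); // 1
    }
}
```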
Here is some C# code (which is reasonably similar to Java):
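using System.Collections.Generic;
using System.Text;

string Replace(string s){
    Dictionary<char, int> counts = new Dictionary<char, int>();
    foreach(char c in s){
        // skip spaces
        if(c == ' ') continue;
        // update count for char c
        if(!counts.ContainsKey(c)) counts.Add(c, 1);
        else counts[c]++;
    }
    // strings are immutable, so build the result with a StringBuilder
    StringBuilder result = new StringBuilder(s.Length);
    foreach(char c in s)
        result.Append(c == ' ' ? " " : counts[c].ToString());
    return result.ToString();
}
Note that strings are immutable (in C# as in Java), which is why the second loop builds the result in a StringBuilder instead of assigning to s[i].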
Here is a solution that works for lower case strings only. Horrible horrible code, but I was trying to see how few lines I could write a solution in.
public static String letterCount(String in) {
StringBuilder out = new StringBuilder(in.length() * 2);
int[] count = new int[26];
for (int t = 1; t >= 0; t--)
for (int i = 0; i < in.length(); i++) {
if (in.charAt(i) != ' ') count[in.charAt(i) - 'a'] += t;
out.append((in.charAt(i) != ' ') ? "" + count[in.charAt(i) - 'a'] : " ");
}
return out.substring(in.length());
}