Why is my String array length 3 instead of 2? - java

I'm trying to understand regex. I wanted to build a String[] with split to show me how many letters are in a given string.
import java.util.*;
import java.io.*;
public class Main {
    public static String simpleSymbols(String str) {
        String result = "";
        String[] alpha = str.split("[\\+\\w\\+]");
        int alphaLength = alpha.length;
        // System.out.print(alphaLength);
        String[] charCount = str.split("[a-z]");
        int charCountLength = charCount.length;
        System.out.println(charCountLength);
        return result; // the method is declared to return a String
    }
}
My input string is "+d+=3=+s+". I split the string to count the number of letters in string. The array length should be two but I'm getting three. Also, I'm trying to make a regex to check the pattern +b+, with b being any letter in the alphabet? Is that correct?

So, a few things pop out to me:
First, your regex looks correct. If you're ever worried about how your regex will perform, you can use https://regexr.com/ to check it out. Just put your regex on the top and enter your string in the bottom to see if it is matching correctly
Second, I see you're using the split function. While it is convenient for quickly splitting strings, you need to be careful about what you are splitting on. split discards the delimiters, and here the delimiters are the very letters you were looking for, so they never appear in the result. If you print the array, you will see the following (for an input string of +d+=3=+s+):
+
+=3=+
+
Which shows that you accidentally cut out what you were looking to find in the first place. Now, there are several ways of fixing this, depending on what your criteria is.
Now, if what you wanted was just to separate on all +s and it doesn't matter that you find only what is directly bounded by +s, then split works well. Because + is a regex metacharacter it has to be escaped, so use str.split("\\+"). For +d+=3=+s+ this returns an empty first element (the input starts with a +) followed by:
d
=3=
s
However, you can see that this poses a few problems. First, it doesn't strip out the =3= that we don't want, and second, it does not truly give us values that are surrounded by a +_+ format, where the underscore represents the string/char you're looking for.
Seeing as you're using \w, you intend to find word characters (letters, digits and underscore) that are surrounded by +s. However, if you're just looking to find one letter, I would suggest using a more specific class like [a-z] or [a-zA-Z]. However, if you want to find multiple alphabetical characters, your pattern is fine. You can also add a * (0 or more) or a + (1 or more) after the class to dictate what exactly you're looking for.
I won't give you the answer outright, but I'll give you a clue as to what to move towards. Try using a pattern and a matcher to find the regex that you listed above and then if you find a match, make sure to store it somewhere :)
Also, for future reference, you should always start a function name with a lower case, at least in Java. Only constants and class names should start in a capital :)

I am trying to use split to count the number of letters in that string. The array length should be two, but I'm getting three.
The regex passed to split is used as the delimiter and does not appear in the results. In your case str.split("[a-z]") uses lowercase letters as delimiters, which cuts your input into three substrings: "(+)|d|(+=3=+)|s|(+)".
If you really want to count the letters with split, note that str.split("[^a-z]") also produces an empty string wherever two delimiters are adjacent (and at the start of the input), so you would have to count only the non-empty elements. I would recommend java.util.regex.Matcher.find() instead, to find each letter in turn.
Also, I'm trying to make a regex to check the pattern +b+, with b being any letter in the alphabet? Is that correct?
Similarly, check the functions in "java.util.regex.Matcher".
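A minimal sketch of the Matcher approach suggested above. The class and method names (LetterCount, countLetters, containsPlusLetterPlus) are made up for illustration, and the second pattern assumes that "+b+" means exactly one ASCII letter with a literal + on each side.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterCount {
    // Matches a single lowercase letter.
    private static final Pattern LETTER = Pattern.compile("[a-z]");
    // Assumption: "+b+" means one ASCII letter between two literal plus signs.
    private static final Pattern PLUS_LETTER_PLUS = Pattern.compile("\\+[a-zA-Z]\\+");

    // Counts lowercase letters by advancing the matcher one hit at a time.
    static int countLetters(String str) {
        Matcher m = LETTER.matcher(str);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns true if the input contains at least one "+letter+" occurrence.
    static boolean containsPlusLetterPlus(String str) {
        return PLUS_LETTER_PLUS.matcher(str).find();
    }

    public static void main(String[] args) {
        System.out.println(countLetters("+d+=3=+s+"));           // prints 2
        System.out.println(containsPlusLetterPlus("+d+=3=+s+")); // prints true
    }
}
```

Unlike split, find() reports the matches themselves, so there are no empty strings or leftover delimiters to filter out.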

Related

Rearranging one string to another in Java

I am trying to find whether a part of given string A can be or can not be rearranged to given string B (Boolean output).
Since the algorithm must be at most O(n), to ease it, I used stringA.retainAll(stringB), so now I know string A and string B consist of the same set of characters and now the whole task smells like regex.
And .. reading about regex, I might be now having two problems(c).
The question is: do I risk getting O(infinity) by using regex, or is it more efficient to use the Stream API to check whether each character of string A has enough duplicates to cover each character of string B? Not to mention that regex syntax is not easy to read or build.
As of now, I can't use sorting (any sorting is at least n*log(n)) nor hashsets and the likes (as it eliminates duplicates in both strings).
Thank you.
You can use a HashMap<Character,Integer> to count the number of occurrences of each character of the first String. That would take linear time.
Then, for each Character of the second String, find if it's in the HashMap and decrement the counter (if it's still positive). This will also take linear time, and if you manage to decrement the counters for all the characters of the second String, you succeed.
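The two passes described above could be sketched as follows. This is only an illustration: the names Rearrange and canBuild are invented here, and it assumes the goal is to check that every character of B is available in A at least as many times as B needs it.

```java
import java.util.HashMap;
import java.util.Map;

class Rearrange {
    // Returns true if every character of b can be taken from a,
    // respecting how many times each character occurs.
    static boolean canBuild(String a, String b) {
        // Pass 1: count the occurrences of each character of a (linear in a).
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : a.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Pass 2: consume one occurrence per character of b (linear in b).
        for (char c : b.toCharArray()) {
            int remaining = counts.getOrDefault(c, 0);
            if (remaining == 0) {
                return false; // a has run out of this character
            }
            counts.put(c, remaining - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canBuild("hello world", "low")); // prints true
        System.out.println(canBuild("abc", "aab"));         // prints false
    }
}
```

Because the map stores a count per character rather than just membership, duplicates are preserved, which addresses the concern about hash sets.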

Regex to match a fixed sub string in a String

I am trying to write a regular expression to verify the presence of a specific number in a fixed position in a String.
String: 109300300330066611111111100000000017000656052086116020170111Name 1
Number to find: 111111111 (Starting from position 17)
I have written the following regular expression:
^.{16}(?<Ones>111111111)(.*)
My understanding is:
Let first 16 characters be whatever they are
Use the Named Capturing Group to grab the specific word
Let the rest of the characters be whatever they are
I am new to regex, is there any issue with the above approach?
Can it be done in other/better way?
I am using Java 8.
Without more details of why you're doing what you're doing, there's just one possible improvement I can see. You repeated any character 16 times at the beginning of the string rather than writing out 16 .s, which is nice and readable, but then, it would be nice to do the same for the repeated 1s:
^.{16}(?<Ones>1{9})(.*)
Otherwise, the string of 1s is hard to understand without the coder manually counting how many there are in the regex.
If you want to hard-code the ones, you know the starting position, and you just want to know whether they are there, using a regex seems unnecessary. You can use this (indexOf returns the zero-based index of the first occurrence):
String s = "109300300330066611111111100000000017000656052086116020170111Name 1";
if (s.indexOf("111111111") == 16) doSomething();
Another possible solution without regex:
if (s.substring(16, 25).equals("111111111")) doSomething();
Otherwise your regex looks good.
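For comparison, here is a small sketch of both routes in Java 8: the shortened regex with its named group, and a regex-free check using String.startsWith(prefix, offset). The class and method names are hypothetical. One difference from indexOf is that startsWith with an offset only looks at the given position, so an earlier run of ones elsewhere in the string cannot affect the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedPosition {
    // Any 16 characters, then nine 1s captured in the named group "Ones".
    private static final Pattern P = Pattern.compile("^.{16}(?<Ones>1{9})(.*)");

    // Regex route: succeeds only if the nine 1s start at zero-based index 16.
    static boolean matchesByRegex(String s) {
        Matcher m = P.matcher(s);
        return m.find() && m.group("Ones").equals("111111111");
    }

    // Regex-free route: checks the prefix at a fixed zero-based offset.
    static boolean matchesByOffset(String s) {
        return s.startsWith("111111111", 16);
    }

    public static void main(String[] args) {
        String s = "109300300330066611111111100000000017000656052086116020170111Name 1";
        System.out.println(matchesByRegex(s));
        System.out.println(matchesByOffset(s));
    }
}
```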

Replacing substrings in String

I am 16 and trying to learn Java. I have a paper that my uncle gave me that has things to do in Java. One of these things is to write and execute a program that will accept an extended message as a string such as
Each time she saw the painting, she was happy
and replace the word she with the word he.
Each time he saw the painting, he was happy.
This part is simple, but he wants me to be able to take any form of she and replace it with he (she to he, She to He, she? to he?, she. to he., she' to he' and so on). Can someone help me make a program to accomplish this?
I have this
public static void main(String[] args) {
Scanner keyboard = new Scanner(System.in);
System.out.println("Write Sentence");
String original = keyboard.nextLine();
String changeWord = "he";
String modified = original.replaceAll("she", changeWord);
System.out.println(modified);
}
If this isn't the right site to find answers like this, can you redirect me to a site that answers such questions?
The best way to do this is with regular expressions (regex). Regex allow you to match patterns or classes of words so you can deal with general cases. Consider the cases you have already listed:
(she to he, She to He, she? to he?, she. to he., she' to he' and so on)
What is common between these cases? Can you think of some general rule(s) that would apply to all such transformations?
But also consider some cases you haven't listed: for example, as you've written it now, your code will change the word "ashes" to "ahes" because "ashes" contains "she". A properly written regular expression allows you to avoid this.
Before delving into regex, try and express, in plain English, a rule or set of rules for what you want to replace and what it should be replaced with.
Then, learn some regex and attempt to apply those rules.
Lastly, try and write some tests (i.e. using JUnit) for various cases so you can see which cases your code is working for and which cases it isn't working for.
Once you have done this, if something still doesn't work, feel free to post a new question here showing us your code and explaining what doesn't work. We'll be happy to help.
I would recommend this regular expression to solve this. It seems you have to search and replace separately the uppercase S and the lowercase s
String modified = original
.replaceAll("(she)(\\W)", "he$2")
.replaceAll("(She)(\\W)", "He$2");
Explanation :
The pattern (she) will match the word she and store it as the first captured group of characters
The pattern (\\W) will match one non-word character, that is, anything other than a letter, digit or underscore (e.g. ', .), and store it as the second captured group of characters
Both of these patterns must match consecutive parts of the input string for replaceAll to replace something.
"he$2" put in the resulting string the word he followed by the second captured group of characters (in our case the group has only one character)
The above means that the regular expression will match a pattern like She'll and replace with He'll, but it will not match a pattern like Sherlock because here She is followed by an alphabetic character r
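A variation on the same idea, offered as a sketch rather than the definitive answer: the word boundary \b is zero-width, so it also matches when she is the very last word of the input (where there is no following character for \\W to consume) and it rules out a preceding letter as well. The names Pronouns and replaceShe are invented for this example, and all-caps SHE is deliberately not handled.

```java
class Pronouns {
    // Replaces the standalone words "she" and "She", leaving words that
    // merely contain those letters (ashes, Sherlock) untouched.
    static String replaceShe(String text) {
        return text
            .replaceAll("\\bshe\\b", "he")
            .replaceAll("\\bShe\\b", "He");
    }

    public static void main(String[] args) {
        // prints: Each time he saw the painting, he was happy.
        System.out.println(replaceShe("Each time she saw the painting, she was happy."));
        // prints: ashes, Sherlock, He'll, he
        System.out.println(replaceShe("ashes, Sherlock, She'll, she"));
    }
}
```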

How to retrieve portion of number that's within parenthesis in Java?

For part of my Java assignment I'm required to select all records that have a certain area code. I have custom objects within an ArrayList, like ArrayList<Foo>.
Each object has a String phoneNumber variable. They are formatted like "(555) 555-5555"
My goal is to search through each custom object in the ArrayList<Foo> (call it listOfFoos) and place the objects with area code "616" in a temporaryListOfFoos ArrayList<Foo>.
I have looked into tokenizers, but was unable to get the syntax correct. I feel like what I need to do is similar to this post, but since I'm only trying to retrieve the first 3 digits (and I don't care about the remaining 7), this really didn't give me exactly what I was looking for. Ignore parentheses with string tokenizer?
What I did as a temporary work-around, was...
for (int i = 0; i<listOfFoos.size();i++){
if (listOfFoos.get(i).getPhoneNumber().contains("616")){
tempListOfFoos.add(listOfFoos.get(i));
}
}
This worked for our current dataset, however, if there was a 616 anywhere else in the phone numbers [like "(555) 616-5555"] it obviously wouldn't work properly.
If anyone could give me advice on how to retrieve only the first 3 digits, while ignoring the parentheses, I would greatly appreciate it.
You have two options:
Use value.startsWith("(616)") or,
Use regular expressions with this pattern (backslashes doubled for a Java string literal): "^\\(616\\).*"
The first option will be a lot quicker.
areaCode = number.substring(number.indexOf('(') + 1, number.indexOf(')')).trim() should do the job for you, given the formatting of phone numbers you have.
Or if you don't have any extraneous spaces, just use areaCode = number.substring(1, 4).
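Put together, the substring approach might look like the sketch below. To keep it self-contained it filters a list of plain phone-number strings instead of the Foo objects from the question; AreaCodes, areaCode and withAreaCode are hypothetical names, and every number is assumed to contain a "(" followed later by a ")".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AreaCodes {
    // Extracts the text between the first '(' and the first ')'.
    // Assumes both characters are present, as in "(555) 555-5555".
    static String areaCode(String phoneNumber) {
        int open = phoneNumber.indexOf('(');
        int close = phoneNumber.indexOf(')');
        return phoneNumber.substring(open + 1, close).trim();
    }

    // Keeps only the numbers whose area code equals the given code.
    static List<String> withAreaCode(List<String> phoneNumbers, String code) {
        List<String> result = new ArrayList<>();
        for (String number : phoneNumbers) {
            if (areaCode(number).equals(code)) {
                result.add(number);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> numbers = Arrays.asList("(616) 555-5555", "(555) 616-5555");
        System.out.println(withAreaCode(numbers, "616")); // prints [(616) 555-5555]
    }
}
```

With the real list, the condition inside the loop would compare areaCode(foo.getPhoneNumber()) instead.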
I think what you need is a capturing group. Have a look at the Groups and capturing section in this document.
Once you are done matching the input with a pattern (for example "\\((\\d+)\\) \\d+-\\d+"), you can get the number in the parentheses using a matcher (object of java.util.regex.Matcher) with matcher.group(1).
You could use a regular expression as shown below. The pattern will ensure the entire phone number conforms to your pattern ((XXX) XXX-XXXX) plus grabs the number within the parentheses.
int areaCodeToSearch = 555;
String pattern = String.format("\\((%d)\\) \\d{3}-\\d{4}", areaCodeToSearch);
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(phoneNumber);
if (m.matches()) {
String areaCode = m.group(1);
// ...
}
Whether you choose to use a regular expression versus a simple String lookup (as mentioned in other answers) will depend on how bothered you are about the format of the entire string.

Java pattern matcher find multiple strings

I am using Pattern.compile() to find if a text string contains two other strings. But it needs to be in one regex pattern.
For example the string must have "StringOne" and "StringTwo" in it.
I could do Pattern.compile("(StringOne StringTwo|StringTwo StringOne)"), but both strings are quite long and I want to see if I can compress it.
If I do "(StringOne )?StringTwo( StringOne)?" it would match "StringTwo" and "StringOne StringTwo StringOne".
Use this regex:
^(?=.*\\bStringOne\\b)(?=.*\\bStringTwo\\b)
This uses two look-aheads anchored to start of input to assert that both strings appear somewhere
Edit:
Added word boundaries \b to ends of strings to prevent matches of one string within another, although this was not a stated requirement of the question.
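A short sketch of how the look-ahead pattern might be used. BothStrings and containsBoth are illustrative names. It calls find() rather than matches(), since the pattern only asserts and consumes nothing, and it assumes single-line input, because . does not match line terminators unless Pattern.DOTALL is set.

```java
import java.util.regex.Pattern;

class BothStrings {
    // Two look-aheads anchored at the start: each asserts that its word
    // appears somewhere later in the input, in any order.
    private static final Pattern BOTH =
        Pattern.compile("^(?=.*\\bStringOne\\b)(?=.*\\bStringTwo\\b)");

    static boolean containsBoth(String text) {
        return BOTH.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsBoth("StringTwo and StringOne")); // prints true
        System.out.println(containsBoth("StringOne only"));          // prints false
    }
}
```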
There is the question of speed.
You could probably use lookaheads to accomplish this, but it's costly speed-wise. Lookaheads are really expensive on long strings.
If the strings are long, the faster approach would be to do two separate matches.
If you really need to do it in one pattern, use your original approach: StringOne StringTwo|StringTwo StringOne
