Java Finding all words beginning with a letter - java

I am trying to get all words that begin with a letter from a long string. How would you do this in Java? I don't want to loop through every letter or do something similarly inefficient.
EDIT: I also can't use any built-in data structures (except arrays, of course); it's for a CS class. I can, however, write my own data structures (I have already created several).

You could split your String into an array of words and then iterate through it:
String s = "my very long string to test";
for(String st : s.split(" ")){
if(st.startsWith("t")){
System.out.println(st);
}
}

You need to be clear about some things. What is a "word"? You want to find only "words" starting with a letter, so I assume that words can have other characters too. But what chars are allowed? What defines the start of such a word? Whitespace, any non letter, any non letter/non digit, ...?
e.g.:
String TestInput = "test séntènce îwhere I'm want,to üfind 1words starting $with le11ers.";
String regex = "(?<=^|\\s)\\pL\\w*";
Pattern p = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = p.matcher(TestInput);
while (matcher.find()) {
System.out.println(matcher.group());
}
The regex (?<=^|\s)\pL\w* will find sequences that start with a letter (\pL is a Unicode property for letter), followed by 0 or more "word" characters (Unicode letters and numbers, because of the modifier Pattern.UNICODE_CHARACTER_CLASS).
The lookbehind assertion (?<=^|\s) ensures that there is the start of the string or a whitespace before the sequence.
So my code will print:
test
séntènce ==> contains non ASCII letters
îwhere ==> starts with a non ASCII letter
I ==> 'm is missing, because `'` is not in `\w`
want
üfind ==> starts with a non ASCII letter
starting
le11ers ==> contains digits
Missing words:
,to ==> starting with a ","
1words ==> starting with a digit
$with ==> starting with a "$"
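If "a letter" means one specific letter rather than any letter, the same lookbehind idea can be narrowed. The following is only a sketch under that assumption: the class name WordsByInitial and the helper wordsStartingWith are made up for illustration, matching is case-insensitive by choice, and the result is collected into a plain array (two passes over the matcher) because the question rules out built-in collections.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsByInitial {
    // Hypothetical helper: returns the words in text whose first character is `letter`.
    static String[] wordsStartingWith(String text, char letter) {
        // Same lookbehind as above, but \pL is replaced by the quoted target letter.
        String regex = "(?<=^|\\s)" + Pattern.quote(String.valueOf(letter)) + "\\w*";
        Pattern p = Pattern.compile(regex,
                Pattern.UNICODE_CHARACTER_CLASS | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);

        // First pass: count the matches so the array can be sized exactly.
        int count = 0;
        while (m.find()) {
            count++;
        }

        // Second pass: rewind the matcher and copy the matches into the array.
        String[] found = new String[count];
        m.reset();
        for (int i = 0; m.find(); i++) {
            found[i] = m.group();
        }
        return found;
    }

    public static void main(String[] args) {
        for (String word : wordsStartingWith("my very long string to test", 't')) {
            System.out.println(word);
        }
    }
}
```

The "t" inside "string" is not reported, because the lookbehind requires the start of the input or whitespace directly before the letter.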

You could build a HashMap -
HashMap<String,String> map = new HashMap<String,String>();
example -
ant, bat, art, cat
Hashmap
a -> ant,art
b -> bat
c -> cat
to find all words that begin with "a", just do
map.get("a")
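Since the question rules out built-in collections, the same "index by first letter" idea can be sketched with nothing but arrays. This is an illustration, not the answerer's code: the class FirstLetterIndex and its methods are hypothetical names, and it assumes words are separated by whitespace and that only ASCII letters a-z are indexed (case-insensitively).

```java
public class FirstLetterIndex {
    // One bucket per letter 'a'..'z'; each bucket is a plain, manually grown String array.
    private final String[][] buckets = new String[26][];
    private final int[] sizes = new int[26];

    public FirstLetterIndex(String text) {
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for blank input
            }
            char first = Character.toLowerCase(word.charAt(0));
            if (first < 'a' || first > 'z') {
                continue; // assumption: words not starting with an ASCII letter are ignored
            }
            add(first - 'a', word);
        }
    }

    private void add(int bucket, String word) {
        if (buckets[bucket] == null) {
            buckets[bucket] = new String[4];
        } else if (sizes[bucket] == buckets[bucket].length) {
            // Bucket is full: double its capacity.
            String[] bigger = new String[sizes[bucket] * 2];
            System.arraycopy(buckets[bucket], 0, bigger, 0, sizes[bucket]);
            buckets[bucket] = bigger;
        }
        buckets[bucket][sizes[bucket]++] = word;
    }

    // Returns a copy of the bucket for the given letter (empty array if none).
    public String[] wordsStartingWith(char letter) {
        char lower = Character.toLowerCase(letter);
        if (lower < 'a' || lower > 'z') {
            return new String[0];
        }
        int b = lower - 'a';
        String[] result = new String[sizes[b]];
        if (sizes[b] > 0) {
            System.arraycopy(buckets[b], 0, result, 0, sizes[b]);
        }
        return result;
    }

    public static void main(String[] args) {
        FirstLetterIndex index = new FirstLetterIndex("ant bat art cat");
        for (String word : index.wordsStartingWith('a')) {
            System.out.println(word);
        }
    }
}
```

Building the index costs one pass over the text; after that each lookup only copies the matching bucket, which pays off when the same text is queried for several letters.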

You can take the first character of each word and use the API method Character.isLetter to check whether it is a letter.
String input = "jkk ds 32";
String[] array = input.split(" ");
for (String word : array) {
char[] arr = word.toCharArray();
char c = arr[0];
if (Character.isLetter(c)) {
System.out.println( word + "\t isLetter");
} else {
System.out.println(word + "\t not Letter");
}
}
Sample output:
jkk isLetter
ds isLetter
32 not Letter

Scanner scan = new Scanner(text); // text being the string you are looking in
char test = 'x'; //whatever letter you are looking for
while(scan.hasNext()){
String wordFound = scan.next();
if(wordFound.charAt(0)==test){
//do something with the wordFound
}
}
This will do what you are looking for; inside the if statement, do what you want with the word.

Regexp way:
public static void main(String[] args) {
String text = "my very long string to test";
Matcher m = Pattern.compile("(^|\\W)(\\w*)").matcher(text);
while (m.find()) {
System.out.println("Found: "+m.group(2));
}
}

You can use split() method. Here is an example :
String string = "your string";
String[] parts = string.split(" C");
for(int i=0; i<parts.length; i++) {
String[] word = parts[i].split(" ");
if( i > 0 ) {
// ignore the remaining words because they don't start with C
System.out.println("C" + word[0]);
}
else { // check the first part explicitly
for(int j=0; j<word.length; j++) {
if ( word[j].startsWith("c") || word[j].startsWith("C"))
System.out.println(word[j]);
}
}
}
where "C" is your letter. Then just loop over the array. For parts[0] you have to check explicitly whether each word starts with "C", so the loop must start at i = 0, not at i = 1 as I mistakenly wrote at first.

Related

(hello-> h3o) How to replace in a String the middle letters for the number of letters replaced

I need to build a method which receives a String, e.g. "elephant-rides are really fun!", and returns another similar String; in this example the return value should be "e6t-r3s are r4y fun!" (because e-lephan-t has 6 middle letters, r-ide-s has 3 middle letters, and so on).
To get that result I need to replace the middle letters of each word with the number of letters replaced, leaving unchanged everything which isn't a letter, as well as the first and the last letter of every word.
So far I've tried using a regex to split the received string into words and saving these words in an array of strings. I also have an int array in which I save the number of middle letters, but I don't know how to join both arrays and the symbols into the correct String to return.
String string="elephant-rides are really fun!";
String[] parts = string.split("[^a-zA-Z]");
int[] sizes = new int[parts.length];
int index=0;
for(String aux: parts)
{
sizes[index]= aux.length()-2;
System.out.println( sizes[index]);
index++;
}
You may use
String text = "elephant-rides are really fun!";
Pattern r = Pattern.compile("(?U)(\\w)(\\w{2,})(\\w)");
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, m.group(1) + m.group(2).length() + m.group(3));
}
m.appendTail(sb); // append the rest of the contents
System.out.println(sb);
// => e6t-r3s are r4y fun!
Here, (?U)(\\w)(\\w{2,})(\\w) matches any Unicode word char capturing it into Group 1, then captures any 2 or more word chars into Group 2 and then captures a single word char into Group 3, and inside the .appendReplacement method, the second group contents are "converted" into its length.
Java 9+:
String text = "elephant-rides are really fun!";
Pattern r = Pattern.compile("(?U)(\\w)(\\w{2,})(\\w)");
Matcher m = r.matcher(text);
String result = m.replaceAll(x -> x.group(1) + x.group(2).length() + x.group(3));
System.out.println( result );
// => e6t-r3s are r4y fun!
For the instructions you gave us, this would be sufficient:
String [] result = string.split("[\\s-]");
for (int i=0; i<result.length; i++){
result[i] = "" + result[i].charAt(0) + ((result[i].length())-2) + result[i].charAt(result[i].length()-1);
}
With your input, it creates the array [ "e6t", "r3s", "a1e", "r4y", "f2!" ]
It does not fail on one- or two-letter words, but for them it gives results such as:
Input: I am a small; Output: [ "I-1I", "a0m", "a-1a", "s3l" ]
Again, for the instructions you gave us this would be legal.
Hope I helped!
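For comparison, here is a regex-free sketch that walks the string once, copies every non-letter unchanged, and leaves words of fewer than four letters alone (which is what the expected output "are" and "fun" implies). The class and method names are hypothetical, and "letter" is assumed to mean Character.isLetter, which is slightly narrower than the \w used in the regex answer.

```java
public class MiddleLetterCounter {
    // Replaces the middle letters of each word with their count, e.g. "elephant" -> "e6t".
    static String abbreviate(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) {
                out.append(text.charAt(i)); // punctuation, spaces, digits: copied as-is
                i++;
                continue;
            }
            // Find the end of the current run of letters.
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) {
                i++;
            }
            int length = i - start;
            if (length < 4) {
                out.append(text, start, i); // too short to abbreviate, keep unchanged
            } else {
                out.append(text.charAt(start))
                   .append(length - 2)
                   .append(text.charAt(i - 1));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("elephant-rides are really fun!"));
    }
}
```

Because separators are copied through rather than split away, there is nothing to re-join afterwards, and inputs such as "I am a small" come out as "I am a s3l".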

How to exclude the words that have non-alphabetic characters from string

For example, if I want to delete the non-alphabetic characters I would do:
for (int i = 0; i < s.length; i++) {
s[i] = s[i].replaceAll("[^a-zA-Z]", "");
}
How do I completely exclude a word with a non-alphabetic character from the string?
For example:
Initial input:
"a cat jumped jumped; on the table"
It should exclude "jumped;" because of ";".
Output:
"a cat jumped on the table"
Edit: (in response to your edit)
You could do this:
String input = "a cat jumped jumped; on the table";
input = input.replaceAll("(^| )[^ ]*[^A-Za-z ][^ ]*(?=$| )", "");
Let's break down the regex:
(^| ) matches after the beginning of a word, either after a space or after the start of the string.
[^ ]* matches any sequence, including the null string, of non-spaces (because spaces break the word)
[^A-Za-z ] checks if the character is non-alphabetical and does not break the string.
Lastly, we need to append [^ ]* to make it match until the end of the word.
(?=$| ) matches the end of the word, either the end of the string or the next space character, but it doesn't consume the next space, so that consecutive words will still match (ie "I want to say hello, world! everybody" becomes "I want to say everybody")
Note: if "a cat jumped off the table." should output "a cat jumped off the table", then use this:
input = input.replaceAll(" [^ ]*[^A-Za-z ][^ ]*(?= )", "").replaceAll("[^A-Za-z]$", "");
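To check the first regex against both examples from the breakdown, it can be wrapped in a small test harness. This is only a sketch; the class name DropNonAlphabeticWords and the method dropWords are made up for illustration, while the pattern itself is the one from the answer.

```java
public class DropNonAlphabeticWords {
    // Removes every space-delimited word that contains a non-alphabetic character.
    static String dropWords(String input) {
        return input.replaceAll("(^| )[^ ]*[^A-Za-z ][^ ]*(?=$| )", "");
    }

    public static void main(String[] args) {
        System.out.println(dropWords("a cat jumped jumped; on the table"));
        System.out.println(dropWords("I want to say hello, world! everybody"));
    }
}
```

One thing to be aware of: when the very first word is removed, the space that followed it stays, so "hi! there" becomes " there"; a trim() on the result takes care of that if it matters.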
Assuming you have 1 word per array element, you can do this to replace them with the empty string:
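for (int i = 0; i < s.length; i++) {
if (s[i].matches(".*[^A-Za-z].*")) {
s[i] = ""; // write back to the array; assigning to a for-each variable would have no effect
}
}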
If you actually want to remove it, consider using an ArrayList:
ArrayList<String> stringList = new ArrayList<>();
for (int index = 0; index < s.length; index++) {
if (!s[index].matches(".*[^A-Za-z].*")) { // keep only words without non-alphabetic characters
stringList.add(s[index]);
}
}
And the ArrayList will have all the elements that don't have non-alphabetical characters in them.
Try this:
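s = String.join(" ", s).replaceAll("\\b\\w*\\W+\\w*(?=\\b)", "").split(" ");
It joins the array with spaces (String.join is a static method, available since Java 8), then applies the regex. The regex looks for a word break (\b), then a word with at least one non-word character (\w*\W+\w*), and then a word break at the end (not matched, there will still be a space). The split splits the string back into an array. Note that \W also matches a space, so the pattern can span two adjacent words; check the result against your own input.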
public static void main(String[] args) {
String str[] ={ "123abass;[;[]","abcde","1234"};
for(String s : str)
{
if(s.matches("^[a-zA-Z]+$")) // must consist of letters [a-zA-Z] only
System.out.println(s);
}
}
Output: abcde
You could call .toLowerCase() on each value in the array and then check every character against the range a-z; for short strings this is likely to be faster than a regular expression, though that is worth measuring. Assume that your values are in an array called "myArray".
List<String> newValues = new ArrayList<>();
for(String s : myArray) {
if(containsOnlyLetters(s)) {
newValues.add(s);
}
}
//do this if you have to go back to an array instead of an ArrayList
String[] newArray = newValues.toArray(new String[0]); // casting the Object[] from toArray() would throw ClassCastException
This is the containsOnlyLetters method:
boolean containsOnlyLetters(String input) {
char[] inputLetters = input.toLowerCase().toCharArray();
for(char c : inputLetters) {
if(c < 'a' || c > 'z') {
return false;
}
}
return true;
}

How to get all substrings occurring between two characters?

If I wanted to pull all substrings between two (arbitrary) characters in a String, how would I do that?
I also want to keep the first char I match but not the second one.
So, for example, if I wanted to keep the characters between a # char and either the next whitespace OR the next occurrence of another char (in this case # again, but it could be anything), and I had a string, say: "hello i'm #chilling#likeAVillain but like #forreal"
How would I get, say, a Set of [#chilling, #likeAVillain, #forreal]?
I'm having difficulty because of the either/or end-of-substring case: I want the substring starting with # and ending before the first occurrence of either another # or a whitespace (or the end of the string if neither of those is found).
Put most simply in pseudocode:
for every String W between [char A, either (char B || char C)) // notice [A,B) - want the
//first to be inclusive
Set.add(W);
This regex #\\w+ seems to do what you need. It will find # and all alphanumeric characters after it. Since whitespace is not part of \\w it will not be included in your match.
String s = "hello i'm #chilling#likeAVillain but like #forreal";
Pattern p = Pattern.compile("#\\w+");
Matcher m = p.matcher(s);
while (m.find())
System.out.println(m.group());
output:
#chilling
#likeAVillain
#forreal
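Since the question asks for a Set and for the match to end at "either another # or whitespace", a negated character class states that rule more directly than \w+. The following is a sketch only; TagExtractor and extractTags are hypothetical names, and LinkedHashSet is chosen merely to keep the tags in order of appearance.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagExtractor {
    // '#' followed by one or more characters that are neither '#' nor whitespace.
    private static final Pattern TAG = Pattern.compile("#[^#\\s]+");

    static Set<String> extractTags(String text) {
        Set<String> tags = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extractTags("hello i'm #chilling#likeAVillain but like #forreal"));
    }
}
```

The difference from #\w+ shows up with punctuation inside a tag: for "#foo-bar", this pattern yields "#foo-bar", whereas #\w+ would stop at the hyphen and yield "#foo".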
public static void main(String[] args) throws Exception{
String s1 = "hello i'm #chilling#likeAVillain but like #forreal";
String[] strArr = s1.split("\\#");
List<String> strOutputArr = new ArrayList<String>();
int i = 0;
for(String str: strArr){
if(i>0){
strOutputArr.add("#" + str.split("\\s+")[0]);
}
i++;
}
System.out.println(strOutputArr.toString());
}

How can I remove punctuation from input text in Java?

I am trying to get a sentence using input from the user in Java, and I need to make it lowercase and remove all punctuation. Here is my code:
String[] words = instring.split("\\s+");
for (int i = 0; i < words.length; i++) {
words[i] = words[i].toLowerCase();
}
String[] wordsout = new String[50];
Arrays.fill(wordsout,"");
int e = 0;
for (int i = 0; i < words.length; i++) {
if (words[i] != "") {
wordsout[e] = words[e];
wordsout[e] = wordsout[e].replaceAll(" ", "");
e++;
}
}
return wordsout;
I can't seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.
This first removes all non-letter characters, folds to lowercase, then splits the input, doing all the work in a single line:
String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
Spaces are initially left in the input so the split will still work.
By removing the rubbish characters before splitting, you avoid having to loop through the elements.
You can use following regular expression construct
Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
inputString = inputString.replaceAll("\\p{Punct}", ""); // Strings are immutable, so keep the returned value
You may try this:-
Scanner scan = new Scanner(System.in);
System.out.println("Type a sentence and press enter.");
String input = scan.nextLine();
String strippedInput = input.replaceAll("\\W", "");
System.out.println("Your string: " + strippedInput);
\W (equivalent to [^\w]) matches a non-word character, so the above regular expression will match and remove all non-word characters, including the spaces between words.
If you don't want to use RegEx (which seems highly unnecessary given your problem), perhaps you should try something like this:
public String modified(final String input){
final StringBuilder builder = new StringBuilder();
for(final char c : input.toCharArray())
if(Character.isLetterOrDigit(c))
builder.append(Character.isLowerCase(c) ? c : Character.toLowerCase(c));
return builder.toString();
}
It loops through the underlying char[] in the String and only appends the char if it is a letter or digit (filtering out all symbols, which I am assuming is what you are trying to accomplish) and then appends the lower case version of the char.
I don't like to use regex, so here is another simple solution.
public String removePunctuations(String s) {
String res = "";
for (Character c : s.toCharArray()) {
if(Character.isLetterOrDigit(c))
res += c;
}
return res;
}
Note: This will include both Letters and Digits
If your goal is to REMOVE punctuation, then refer to the above. If the goal is to find words, none of the above solutions does that.
INPUT: "This. and:that. with'the-other".
OUTPUT: ["This", "and", "that", "with", "the", "other"]
but what most of these "replaceAll" solutions actually give you is:
OUTPUT: ["This", "andthat", "withtheother"]
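One way to get separate words is to turn every run of non-letters into a single space before splitting, instead of deleting those characters. The sketch below makes a few assumptions that are not in the thread: the class and method names are invented, only ASCII letters count as word characters, and the result is lowercased because the question asks for that.

```java
import java.util.Locale;

public class WordSplitter {
    // Lowercases the input and splits it into words at every run of non-letter characters.
    static String[] words(String input) {
        String cleaned = input.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z]+", " ") // punctuation acts as a separator, it is not just dropped
                .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    public static void main(String[] args) {
        for (String word : words("This. and:that. with'the-other")) {
            System.out.println(word);
        }
    }
}
```

For the input above this yields six words: this, and, that, with, the, other. The trade-off is that apostrophes and hyphens also split, so "don't" becomes "don" and "t".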

Java - how can I remove a word after using a special symbol

How can I delete a word which is followed by a special character ( # )? Ex: from the phrase Eva have green#eyes I would like to delete the word "green" (giving Eva have eyes), that is, everything back to the preceding space.
I also have a problem with deleting the whole line that is followed by #,
ex: Eva have green#eyes ---> should become "eyes"
my code (I have no idea how I can complete it):
class Stack {
public Stack() {
del = '&'; //that method deletes only one character
//ex: ab&cd ---> acd
destroy = '#';
delWord = '#';
stack = new Stack<Character>();
}
public void methods(String string)
{
for (int i=0; i< string.length(); i++) //that method deletes only one character
//ex: ab&cd ---> acd
{
if ( del == string.charAt(i) )
{
if (stack.size() > 0)
stack.remove(stack.size()-1);
}
else if( destroy == string.charAt(i) )
{
stack.remove(stack.size());
}
}
}
}
Yes, using the replaceAll method in the String class, e.g.
String myString = "eyes#green";
myString = myString.replaceAll("#.*", " ");
The first argument is a regular expression matching what you want to replace. In this case hashtag followed by anything (i.e. matching the remainder of the string).
The second argument is what it should be replaced with, in this case a space.
Modify the regex as needed for your purposes.
You can split your string into an array:
String[] words = plainText.split("[ ,;(!)*=]"); // '#' is deliberately not a delimiter
Then loop over it with a simple for (String word : words)
and check every word for the presence of #. If a word contains it, do not copy that word to the second ArrayList (or String array).
To delete the whole line that is followed by #, try this:
String s = "Eva have green#eyes";
s = s.replaceAll(".*#"," ");
System.out.println(s);
The Output : eyes
and to remove the word before #, try this:
String s= "Eva have green#eyes ";
s= s.replaceAll("\\s*(\\w+)#", " ");
System.out.println(s);
The Output : Eva have eyes
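For the character-by-character approach the question started with, the two commands can also be handled in a single pass, using a StringBuilder as the stack. This is a sketch under assumptions taken from the question's comments: '&' deletes the previous character, '#' deletes the word typed since the last space, and the names EditCommands and apply are made up.

```java
public class EditCommands {
    // Applies '&' (delete previous character) and '#' (delete previous word) to the input.
    static String apply(String input) {
        StringBuilder out = new StringBuilder(); // used like a stack of characters
        for (char c : input.toCharArray()) {
            if (c == '&') {
                if (out.length() > 0) {
                    out.setLength(out.length() - 1); // pop one character
                }
            } else if (c == '#') {
                // Pop characters until the previous space (or the start) is reached.
                int end = out.length();
                while (end > 0 && out.charAt(end - 1) != ' ') {
                    end--;
                }
                out.setLength(end);
            } else {
                out.append(c); // ordinary character: push
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply("ab&cd"));
        System.out.println(apply("Eva have green#eyes"));
    }
}
```

To make '#' delete the whole line instead, the inner loop can be replaced by out.setLength(0).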
