How to get all substrings occurring between two characters?

If I wanted to pull all substrings between two (arbitrary) characters in a String, how would I do that?
I also want to keep the first char I match but not the second one.
So, for example, if I wanted to keep the characters between a # char and either the next whitespace OR the next occurrence of another char (in this case # again, but it could be anything), and I had a string, say: "hello i'm #chilling#likeAVillain but like #forreal"
How would I get, say, a Set of [#chilling, #likeAVillain, #forreal]?
I'm having difficulty because of the either/or end-of-substring case: I want the substring starting with # and ending before the first occurrence of either another # or a whitespace (or the end of the string if neither of those is found).
Put simplest in pseudocode:
for every String W between [char A, either (char B || char C)) // notice [A,B) - want the
//first to be inclusive
Set.add(W);

The regex #\\w+ seems to do what you need. It matches # followed by one or more word characters (letters, digits, or underscore). Since whitespace and # are not part of \\w, they will not be included in your match.
String s = "hello i'm #chilling#likeAVillain but like #forreal";
Pattern p = Pattern.compile("#\\w+");
Matcher m = p.matcher(s);
while (m.find())
System.out.println(m.group());
output:
#chilling
#likeAVillain
#forreal
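
The question asks for a Set and describes the end of each token as "the next # or whitespace" rather than "the next non-word character", so a negated character class may be closer to the stated rule. A minimal sketch follows; the class name HashtagExtractor and the method extractTags are made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagExtractor {
    // '#' followed by one or more characters that are neither '#' nor whitespace
    private static final Pattern TAG = Pattern.compile("#[^#\\s]+");

    // Hypothetical helper: collects every match into an insertion-ordered Set
    static Set<String> extractTags(String input) {
        Set<String> tags = new LinkedHashSet<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String s = "hello i'm #chilling#likeAVillain but like #forreal";
        System.out.println(extractTags(s)); // [#chilling, #likeAVillain, #forreal]
    }
}
```

Unlike #\\w+, this variant keeps punctuation such as - inside a tag, because only # and whitespace end a match.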

// Requires: import java.util.ArrayList; import java.util.List;
public static void main(String[] args) {
    String s1 = "hello i'm #chilling#likeAVillain but like #forreal";
    String[] strArr = s1.split("#");
    List<String> strOutputArr = new ArrayList<>();
    // Element 0 is the text before the first '#', so start at index 1
    for (int i = 1; i < strArr.length; i++) {
        // Keep only the text up to the first whitespace
        strOutputArr.add("#" + strArr[i].split("\\s+")[0]);
    }
    System.out.println(strOutputArr);
}

Related

Java: How to replace consecutive characters with a single character?

How can I replace consecutive characters with a single character in java?
String fileContent = "def mnop.UVW";
String oldDelimiters = " .";
String newDelimiter = "!";
for (int i = 0; i < oldDelimiters.length(); i++){
Character character = oldDelimiters.charAt(i);
fileContent = fileContent.replace(String.valueOf(character), newDelimiter);
}
Current output: def!!mnop!UVW
Desired output: def!mnop!UVW
Notice the two spaces are replaced with two exclamation marks. How can I replace consecutive delimiters with one delimiter?

Since you want to match consecutive characters from the old delimiter, a regex solution doesn't seem to be feasible here. You can instead match char by char if it belongs to one of the old delimiter chars and then set it with the new one as shown below.
import java.util.*;
public class Main{
public static void main(String[] args) {
String fileContent = "def mnop.UVW";
String oldDelimiters = " .";
// add all old delimiters in a set for fast checks
Set<Character> set = new HashSet<>();
for(int i=0;i<oldDelimiters.length();++i) set.add(oldDelimiters.charAt(i));
/*
match all consecutive chars at once, check if it belongs to an old delimiter
and replace it with the new one
*/
String newDelimiter = "!";
StringBuilder res = new StringBuilder("");
for(int i=0;i<fileContent.length();++i){
if(set.contains(fileContent.charAt(i))){
while(i + 1 < fileContent.length() && fileContent.charAt(i) == fileContent.charAt(i+1)) i++;
res.append(newDelimiter);
}else{
res.append(fileContent.charAt(i));
}
}
System.out.println(res.toString());
}
}
Demo: https://onlinegdb.com/r1BC6qKP8

s = s.replaceAll("([ \\.])[ \\.]+", "$1");
Or if only several same delimiters have to be replaced:
s = s.replaceAll("([ \\.])\\1+", "$1");
[....] is a group of alternative characters
First (...) is group 1, $1
\\1 is the text of the first group

While not using regex, I thought a solution with Streams was needed, because everyone loves streams:
private static class StatefulFilter implements Predicate<String> {
private final String needle;
private String last = null;
public StatefulFilter(String needle) {
this.needle = needle;
}
@Override
public boolean test(String value) {
boolean duplicate = last != null && last.equals(value) && value.equals(needle);
last = value;
return !duplicate;
}
}
public static void main(String[] args) {
System.out.println(
"def mnop.UVW"
.codePoints()
.sequential()
.mapToObj(c -> String.valueOf((char) c))
.filter(new StatefulFilter(" "))
.map(x -> x.equals(" ") ? "!" : x)
.collect(Collectors.joining(""))
);
}
Runnable example: https://onlinegdb.com/BkY0R2twU
Explanation:
Theoretically, you aren't really supposed to have a stateful filter, but technically, as long as the stream is not parallelized, it works fine:
.codePoints() - splits the String into a Stream
.sequential() - since we care about the order of characters, our Stream must not be processed in parallel
.mapToObj(c -> String.valueOf((char) c)) - the comparison in the filter is more intuitive if we convert to String, but it's not really needed
.filter(new StatefulFilter(" ")) - here we filter out any space that comes after another space
.map(x -> x.equals(" ") ? "!" : x) - now we can replace the remaining spaces with exclamation marks
.collect(Collectors.joining("")) - and finally we can join the characters together to reconstitute a String
The StatefulFilter itself is pretty straightforward: it checks a) whether we have a previous character at all, b) whether the previous character is the same as the current character and c) whether the current character is the delimiter (space). It returns false (meaning the character gets deleted) only if a, b and c are all true.

The biggest difficulty to using a regex for this, is to create an expression from your oldDelimiters string. For example:
String oldDelimiters = " .";
String expression = "\\" + String.join("+|\\", oldDelimiters.split("")) + "+";
String text = "def mnop.UVW;abc .df";
String result = text.replaceAll(expression, "!");
(Edit: since characters in the expression are now escaped anyway, I removed the character classes and edited the following text to reflect that change.)
Where the generated expression looks like \ +|\.+, i.e. each character is quantified and constitutes one alternative of the expression. The engine will match and replace one alternative at a time if it can be matched. result now contains:
def!mnop!UVW;abc!!df
Not sure how backwards compatible this is due to split() behaviour in previous versions of Java (splitting on the empty string produced a leading empty string), but with current versions this should be fine.
Edit: As it is, this breaks if the delimiting characters contain digits or characters representing unescaped regex tokens (i.e. 1, b, etc.).
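
One way to sidestep the escaping problem mentioned in the last edit is to quote each delimiter character with Pattern.quote before building the expression. The sketch below is not the answer's code: DelimiterCollapser and collapse are hypothetical names, it assumes oldDelimiters is non-empty, and it treats a mixed run such as " ." as one run (so "abc .df" becomes "abc!df" rather than "abc!!df").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DelimiterCollapser {
    // Hypothetical helper; assumes oldDelimiters is non-empty and holds only BMP characters
    static String collapse(String text, String oldDelimiters, String newDelimiter) {
        // Quote every delimiter character so metacharacters, digits and letters stay literal
        String alternatives = oldDelimiters.chars()
                .mapToObj(c -> Pattern.quote(String.valueOf((char) c)))
                .collect(Collectors.joining("|"));
        // A run of one or more delimiters (in any mix) becomes a single replacement
        Pattern runs = Pattern.compile("(?:" + alternatives + ")+");
        // quoteReplacement keeps '$' and '\' in the new delimiter literal
        return runs.matcher(text).replaceAll(Matcher.quoteReplacement(newDelimiter));
    }

    public static void main(String[] args) {
        System.out.println(collapse("def  mnop.UVW", " .", "!")); // def!mnop!UVW
        System.out.println(collapse("abc .df", " .", "!"));       // abc!df
    }
}
```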

How to split a string in Java with two different separators? [duplicate]

I want to split the string "004-034556" into two strings by the delimiter "-":
part1 = "004";
part2 = "034556";
That means the first string will contain the characters before '-', and the second string will contain the characters after '-'.
I also want to check if the string has '-' in it.

Use the appropriately named method String#split().
String string = "004-034556";
String[] parts = string.split("-");
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556
Note that split's argument is assumed to be a regular expression, so remember to escape special characters if necessary.
There are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {. These special characters are often called "metacharacters".
For instance, to split on a period/dot . (which means "any character" in regex), use either backslash \ to escape the individual special character like so split("\\."), or use character class [] to represent literal character(s) like so split("[.]"), or use Pattern#quote() to escape the entire string like so split(Pattern.quote(".")).
String[] parts = string.split(Pattern.quote(".")); // Split on the exact string.
To test beforehand if the string contains certain character(s), just use String#contains().
if (string.contains("-")) {
// Split it.
} else {
throw new IllegalArgumentException("String " + string + " does not contain -");
}
Note, this does not take a regular expression. For that, use String#matches() instead.
If you'd like to retain the split character in the resulting parts, then make use of positive lookaround. In case you want to have the split character to end up in left hand side, use positive lookbehind by prefixing ?<= group on the pattern.
String string = "004-034556";
String[] parts = string.split("(?<=-)");
String part1 = parts[0]; // 004-
String part2 = parts[1]; // 034556
In case you want to have the split character to end up in right hand side, use positive lookahead by prefixing ?= group on the pattern.
String string = "004-034556";
String[] parts = string.split("(?=-)");
String part1 = parts[0]; // 004
String part2 = parts[1]; // -034556
If you'd like to limit the number of resulting parts, then you can supply the desired number as 2nd argument of split() method.
String string = "004-034556-42";
String[] parts = string.split("-", 2);
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556-42

An alternative to processing the string directly would be to use a regular expression with capturing groups. This has the advantage that it makes it straightforward to impose more sophisticated constraints on the input. For example, the following splits the string into two parts, and ensures that both consist only of digits:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class SplitExample
{
private static Pattern twopart = Pattern.compile("(\\d+)-(\\d+)");
public static void checkString(String s)
{
Matcher m = twopart.matcher(s);
if (m.matches()) {
System.out.println(s + " matches; first part is " + m.group(1) +
", second part is " + m.group(2) + ".");
} else {
System.out.println(s + " does not match.");
}
}
public static void main(String[] args) {
checkString("123-4567");
checkString("foo-bar");
checkString("123-");
checkString("-4567");
checkString("123-4567-890");
}
}
As the pattern is fixed in this instance, it can be compiled in advance and stored as a static member (initialised at class load time in the example). The regular expression is:
(\d+)-(\d+)
The parentheses denote the capturing groups; the string that matched that part of the regexp can be accessed by the Matcher.group() method, as shown. The \d matches a single decimal digit, and the + means "match one or more of the previous expression". The - has no special meaning, so it just matches that character in the input. Note that you need to double-escape the backslashes when writing this as a Java string. Some other examples:
([A-Z]+)-([A-Z]+) // Each part consists of only capital letters
([^-]+)-([^-]+) // Each part consists of characters other than -
([A-Z]{2})-(\d+) // The first part is exactly two capital letters,
// the second consists of digits

Use:
String[] result = yourString.split("-");
if (result.length != 2)
throw new IllegalArgumentException("String not in correct format");
This will split your string into two parts. The first element in the array will be the part containing the stuff before the -, and the second element in the array will contain the part of your string after the -.
If the array length is not 2, then the string was not in the format: string-string.
Check out the split() method in the String class.

This:
String[] out = string.split("-");
should do what you want. The String class has many methods for operating on strings.

// This leaves the regexes issue out of question
// But we must remember that each character in the Delimiter String is treated
// like a single delimiter
public static String[] SplitUsingTokenizer(String subject, String delimiters) {
StringTokenizer strTkn = new StringTokenizer(subject, delimiters);
ArrayList<String> arrLis = new ArrayList<String>(subject.length());
while(strTkn.hasMoreTokens())
arrLis.add(strTkn.nextToken());
return arrLis.toArray(new String[0]);
}

With Java 8:
List<String> stringList = Pattern.compile("-")
.splitAsStream("004-034556")
.collect(Collectors.toList());
stringList.forEach(s -> System.out.println(s));

Use org.apache.commons.lang.StringUtils' split method which can split strings based on the character or string you want to split.
Method signature:
public static String[] split(String str, String separatorChars);
In your case, you want to split a string when there is a "-".
You can simply do as follows:
String str = "004-034556";
String split[] = StringUtils.split(str,"-");
Output:
004
034556
If - does not exist in your string, the method returns an array containing only the given string, and you will not get any exception.

The requirements left room for interpretation. I recommend writing a method,
public final static String[] mySplit(final String s)
which encapsulates this function. Of course, you can use String.split(..) for the implementation, as mentioned in the other answers.
You should write some unit-tests for input strings and the desired results and behaviour.
Good test candidates should include:
- "0022-3333"
- "-"
- "5555-"
- "-333"
- "3344-"
- "--"
- ""
- "553535"
- "333-333-33"
- "222--222"
- "222--"
- "--4555"
By defining the expected results for these tests, you specify the behaviour.
For example, should "-333" result in [,333], or is it an error?
Can "333-333-33" be separated in [333,333-33] or [333-333,33] or is it an error? And so on.
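
To make this concrete, here is a sketch of one possible interpretation only: exactly one - with non-empty text on both sides, everything else rejected. The class name StrictSplit is an assumption, and a different specification would lead to different checks.

```java
import java.util.Arrays;

class StrictSplit {
    // One possible interpretation: exactly one '-' with non-empty text on both sides
    static String[] mySplit(final String s) {
        // Limit -1 keeps trailing empty strings, so "5555-" yields ["5555", ""]
        String[] parts = s.split("-", -1);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Expected <text>-<text> but got: \"" + s + "\"");
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(mySplit("0022-3333"))); // [0022, 3333]
        String[] candidates = {"-", "5555-", "-333", "--", "", "553535", "333-333-33", "222--222"};
        for (String candidate : candidates) {
            try {
                mySplit(candidate);
                System.out.println("accepted: " + candidate);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + candidate);
            }
        }
    }
}
```

Under this interpretation every candidate in the loop is rejected. Without the -1 limit, split("-") drops trailing empty strings, so "5555-" would come back as a one-element array and "-" as an empty one.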

To summarize: there are at least five ways to split a string in Java:
String.split():
String[] parts ="10,20".split(",");
Pattern.compile(regexp).splitAsStream(input):
List<String> strings = Pattern.compile("\\|")
.splitAsStream("010|020202")
.collect(Collectors.toList());
StringTokenizer (legacy class):
StringTokenizer strings = new StringTokenizer("Welcome to EXPLAINJAVA.COM!", ".");
while(strings.hasMoreTokens()){
String substring = strings.nextToken();
System.out.println(substring);
}
Google Guava Splitter:
Iterable<String> result = Splitter.on(",").split("1,2,3,4");
Apache Commons StringUtils:
String[] strings = StringUtils.split("1,2,3,4", ",");
So you can choose the best option for you depending on what you need, e.g. return type (array, list, or iterable).
Here is a big overview of these methods and the most common examples (how to split by dot, slash, question mark, etc.)

You can try like this also
String concatenated_String="hi^Hello";
String split_string_array[]=concatenated_String.split("\\^");

Assuming, that
you don't really need regular expressions for your split
you happen to already use apache commons lang in your app
The easiest way is to use StringUtils#split(java.lang.String, char). That's more convenient than the one provided by Java out of the box if you don't need regular expressions. Like its manual says, it works like this:
A null input String returns null.
StringUtils.split(null, *) = null
StringUtils.split("", *) = []
StringUtils.split("a.b.c", '.') = ["a", "b", "c"]
StringUtils.split("a..b.c", '.') = ["a", "b", "c"]
StringUtils.split("a:b:c", '.') = ["a:b:c"]
StringUtils.split("a b c", ' ') = ["a", "b", "c"]
I would recommend using commons-lang, since it usually contains a lot of stuff that's usable. However, if you don't need it for anything other than doing a split, then implementing it yourself or escaping the regex is a better option.

For simple use cases String.split() should do the job. If you use guava, there is also a Splitter class which allows chaining of different string operations and supports CharMatcher:
Splitter.on('-')
.trimResults()
.omitEmptyStrings()
.split(string);

The fastest way, which also consumes the least resource could be:
String s = "abc-def";
int p = s.indexOf('-');
if (p >= 0) {
String left = s.substring(0, p);
String right = s.substring(p + 1);
} else {
// s does not contain '-'
}

String Split with multiple characters using Regex
public class StringSplitTest {
public static void main(String args[]) {
String s = " ;String; String; String; String, String; String;;String;String; String; String; ;String;String;String;String";
//String[] strs = s.split("[,\\s\\;]");
String[] strs = s.split("[,\\;]");
System.out.println("Substrings length:"+strs.length);
for (int i=0; i < strs.length; i++) {
System.out.println("Str["+i+"]:"+strs[i]);
}
}
}
Output:
Substrings length:17
Str[0]:
Str[1]:String
Str[2]: String
Str[3]: String
Str[4]: String
Str[5]: String
Str[6]: String
Str[7]:
Str[8]:String
Str[9]:String
Str[10]: String
Str[11]: String
Str[12]:
Str[13]:String
Str[14]:String
Str[15]:String
Str[16]:String
But do not expect the same output across all JDK versions. I have seen a bug in some JDK versions where the first empty string was ignored. This bug is not present in the latest JDK version, but it exists in some versions between late JDK 1.7 releases and early 1.8 releases.

There are only two methods you really need to consider.
Use String.split for a one-character delimiter, or if you don't care about performance
If performance is not an issue, or if the delimiter is a single character that is not a regular expression special character (i.e., not one of .$|()[{^?*+\) then you can use String.split.
String[] results = input.split(",");
The split method has an optimization to avoid using a regular expression if the delimiter is a single character and not in the above list. Otherwise, it has to compile a regular expression, and this is not ideal.
Use Pattern.split and precompile the pattern if using a complex delimiter and you care about performance.
If performance is an issue, and your delimiter is not a single character outside the above list, you should pre-compile a regular expression pattern which you can then reuse.
// Save this somewhere
Pattern pattern = Pattern.compile("[,;:]");
/// ... later
String[] results = pattern.split(input);
This last option still creates a new Matcher object. You can also cache this object and reset it for each input for maximum performance, but that is somewhat more complicated and not thread-safe.
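
For illustration, the cached-Matcher idea could look roughly like the sketch below. ReusableSplitter is a hypothetical class, not a JDK API. It is not thread-safe, and unlike Pattern.split it keeps trailing empty strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReusableSplitter {
    // One Matcher instance reused for every input; NOT safe to share between threads
    private final Matcher matcher;

    ReusableSplitter(Pattern delimiter) {
        this.matcher = delimiter.matcher("");
    }

    List<String> split(CharSequence input) {
        matcher.reset(input); // point the cached Matcher at the new input
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (matcher.find()) {
            // Text between the previous delimiter and this one
            parts.add(input.subSequence(start, matcher.start()).toString());
            start = matcher.end();
        }
        // Remainder after the last delimiter (may be empty)
        parts.add(input.subSequence(start, input.length()).toString());
        return parts;
    }

    public static void main(String[] args) {
        ReusableSplitter splitter = new ReusableSplitter(Pattern.compile("[,;:]"));
        System.out.println(splitter.split("a,b;c:d")); // [a, b, c, d]
        System.out.println(splitter.split("x:y"));     // [x, y]
    }
}
```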

You can split a string by a line break by using the following statement:
String textStr[] = yourString.split("\\r?\\n");
You can split a string by a hyphen/character by using the following statement:
String textStr[] = yourString.split("-");

public class SplitTest {
public static String[] split(String text, String delimiter) {
java.util.List<String> parts = new java.util.ArrayList<String>();
text += delimiter;
for (int i = text.indexOf(delimiter), j=0; i != -1;) {
String temp = text.substring(j,i);
if(temp.trim().length() != 0) {
parts.add(temp);
}
j = i + delimiter.length();
i = text.indexOf(delimiter,j);
}
return parts.toArray(new String[0]);
}
public static void main(String[] args) {
String str = "004-034556";
String delimiter = "-";
String result[] = split(str, delimiter);
for(String s:result)
System.out.println(s);
}
}

Please don't use StringTokenizer class as it is a legacy class that is retained for compatibility reasons, and its use is discouraged in new code. And we can make use of the split method as suggested by others as well.
String[] sampleTokens = "004-034556".split("-");
System.out.println(Arrays.toString(sampleTokens));
And as expected it will print:
[004, 034556]
In this answer I also want to point out one change that has taken place for split method in Java 8. The String#split() method makes use of Pattern.split, and now it will remove empty strings at the start of the result array. Notice this change in documentation for Java 8:
When there is a positive-width match at the beginning of the input
sequence then an empty leading substring is included at the beginning
of the resulting array. A zero-width match at the beginning however
never produces such empty leading substring.
It means for the following example:
String[] sampleTokensAgain = "004".split("");
System.out.println(Arrays.toString(sampleTokensAgain));
we will get three strings: [0, 0, 4] and not four as was the case in Java 7 and before. Also check this similar question.

One way to do this is to split the String on the required character and run through the resulting parts in a for-each loop.
public class StringSplitTest {
public static void main(String[] arg){
String str = "004-034556";
String split[] = str.split("-");
System.out.println("The split parts of the String are:");
for(String s:split)
System.out.println(s);
}
}
Output:
The split parts of the String are:
004
034556

import java.io.*;
public class BreakString {
public static void main(String args[]) {
String string = "004-034556-1234-2341";
String[] parts = string.split("-");
for(int i=0;i<parts.length;i++) {
System.out.println(parts[i]);
}
}
}

You can use split():
import java.io.*;
public class Splitting
{
public static void main(String args[])
{
String Str = new String("004-034556");
String[] SplittoArray = Str.split("-");
String string1 = SplittoArray[0];
String string2 = SplittoArray[1];
}
}
Else, you can use StringTokenizer:
import java.util.*;
public class Splitting
{
public static void main(String[] args)
{
StringTokenizer Str = new StringTokenizer("004-034556");
String string1 = Str.nextToken("-");
String string2 = Str.nextToken("-");
}
}

Here are two ways to achieve it.
WAY 1: As you have to split two numbers by a special character you can use regex
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TrialClass
{
public static void main(String[] args)
{
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher("004-034556");
while(m.find())
{
System.out.println(m.group());
}
}
}
WAY 2: Using the string split method
public class TrialClass
{
public static void main(String[] args)
{
String temp = "004-034556";
String [] arrString = temp.split("-");
for(String splitString:arrString)
{
System.out.println(splitString);
}
}
}

You can simply use StringTokenizer to split a string into two or more parts, whatever the delimiters are:
StringTokenizer st = new StringTokenizer("004-034556", "-");
while(st.hasMoreTokens())
{
System.out.println(st.nextToken());
}

Check out the split() method in the String class on javadoc.
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
String data = "004-034556-1212-232-232";
int cnt = 1;
for (String item : data.split("-")) {
System.out.println("string "+cnt+" = "+item);
cnt++;
}
There are many examples of splitting a string here; this one is just a slightly more compact version.

String str = "004-034556";
String[] sTemp = str.split("-"); // '-' is the delimiter
String string1 = sTemp[0]; // 004
String string2 = sTemp[1]; // 034556

I just wanted to write an algorithm instead of using Java built-in functions:
public static List<String> split(String str, char c){
List<String> list = new ArrayList<>();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i++){
if(str.charAt(i) != c){
sb.append(str.charAt(i));
}
else{
if(sb.length() > 0){
list.add(sb.toString());
sb = new StringBuilder();
}
}
}
if(sb.length() >0){
list.add(sb.toString());
}
return list;
}

You can use the method split:
public class Demo {
public static void main(String args[]) {
String str = "004-034556";
if ((str.contains("-"))) {
String[] temp = str.split("-");
for (String part:temp) {
System.out.println(part);
}
}
else {
System.out.println(str + " does not contain \"-\".");
}
}
}

To split a string, use String.split(regex). Review the following example:
String data = "004-034556";
String[] output = data.split("-");
System.out.println(output[0]);
System.out.println(output[1]);
Output
004
034556
Note:
split(regex) takes a regex as its argument. Remember to escape the regex special characters, like period/dot.

String s = "TnGeneral|DOMESTIC";
String[] a = s.split("\\|"); // | is a regex metacharacter, so it must be escaped
System.out.println(a[0]);
System.out.println(a[1]);
Output:
TnGeneral
DOMESTIC

String s="004-034556";
for(int i=0;i<s.length();i++)
{
if(s.charAt(i)=='-')
{
System.out.println(s.substring(0,i));
System.out.println(s.substring(i+1));
}
}
As mentioned by everyone, split() is the best option which may be used in your case. An alternative method can be using substring().

Splitting a string between a char

I want to split a String on a delimiter.
Example String:
String str="ABCD/12346567899887455422DEFG/15479897445698742322141PQRS/141455798951";
Now I want the Strings ABCD/12346567899887455422, DEFG/15479897445698742322141, and so on. That is, I want:
only 4 chars before the /
any number of chars (digits and letters) after the /
Update:
The only time I need the previous 4 characters is after a delimiter is shown, as the string may contain letters or numbers...
My code attempt:
public class StringReq {
public static void main(String[] args) {
String str = "BONL/1234567890123456789CORT/123456789012345678901234567890HOLD/123456789012345678901234567890INTC/123456789012345678901234567890OTHR/123456789012345678901234567890PHOB/123456789012345678901234567890PHON/123456789012345678901234567890REPA/123456789012345678901234567890SDVA/123456789012345678901234567890TELI/123456789012345678901234567890";
testSplitStrings(str);
}
public static void testSplitStrings(String path) {
System.out.println("splitting of sprint starts \n");
String[] codeDesc = path.split("/");
String[] codeVal = new String[codeDesc.length];
for (int i = 0; i < codeDesc.length; i++) {
codeVal[i] = codeDesc[i].substring(codeDesc[i].length() - 4,
codeDesc[i].length());
System.out.println("line" + i + "==> " + codeDesc[i] + "\n");
}
for (int i = 0; i < codeVal.length - 1; i++) {
System.out.println(codeVal[i]);
}
System.out.println("splitting of sprint ends");
}
}

You claim that digits and letters can appear after /, but in your example I don't see any letters that should be included in the result after /.
So, based on that assumption, you can simply split at places that have a digit before them and an A-Z character after them.
To do so, you can split with a regex that uses the look-around mechanism, like str.split("(?<=[0-9])(?=[A-Z])")
Demo:
String str = "BONL/1234567890123456789CORT/123456789012345678901234567890HOLD/123456789012345678901234567890INTC/123456789012345678901234567890OTHR/123456789012345678901234567890PHOB/123456789012345678901234567890PHON/123456789012345678901234567890REPA/123456789012345678901234567890SDVA/123456789012345678901234567890TELI/123456789012345678901234567890";
for (String s : str.split("(?<=[0-9])(?=[A-Z])"))
System.out.println(s);
Output:
BONL/1234567890123456789
CORT/123456789012345678901234567890
HOLD/123456789012345678901234567890
INTC/123456789012345678901234567890
OTHR/123456789012345678901234567890
PHOB/123456789012345678901234567890
PHON/123456789012345678901234567890
REPA/123456789012345678901234567890
SDVA/123456789012345678901234567890
TELI/123456789012345678901234567890
If letters can actually appear in the second part (after /), then you can use a split that looks for places followed by four alphabetic characters and a /, like split("(?=[A-Z]{4}/)") (assuming that you are using at least Java 8; if not, you will need to manually exclude the case of splitting at the start of the string, for instance by adding (?!^) or (?<=.) at the start of your regex).
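
A short sketch of that second variant, assuming Java 8 or later and a made-up input in which the value part contains letters. CodeSplitter and splitCodes are illustrative names.

```java
import java.util.Arrays;

class CodeSplitter {
    // Splits before every position that is followed by four uppercase letters and a '/'
    static String[] splitCodes(String input) {
        return input.split("(?=[A-Z]{4}/)");
    }

    public static void main(String[] args) {
        // Hypothetical sample: the values after '/' mix digits with letters
        String str = "ABCD/12XY34DEFG/56cdPQRS/78";
        System.out.println(Arrays.toString(splitCodes(str)));
        // [ABCD/12XY34, DEFG/56cd, PQRS/78]
    }
}
```

Because the lookahead consumes nothing, the four-letter code stays attached to the part that follows it.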

You can use a regex:
Pattern pattern = Pattern.compile("[A-Z]{4}/[0-9]*");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}

Instead of:
String[] codeDesc = path.split("/");
Just use this regex (4 characters before / and any characters after):
String[] codeDesc = path.split("(?=.{4}/)(?<=.)");

Even simpler using \d:
path.split("(?=[A-Za-z])(?<=\\d)");
EDIT:
Included a condition requiring four letters (of either case).
path.split("(?=[A-Za-z]{4})(?<=\\d)");
output:
BONL/1234567890123456789
CORT/123456789012345678901234567890
HOLD/123456789012345678901234567890
INTC/123456789012345678901234567890
OTHR/123456789012345678901234567890
PHOB/123456789012345678901234567890
PHON/123456789012345678901234567890
REPA/123456789012345678901234567890
SDVA/123456789012345678901234567890
TELI/123456789012345678901234567890
It is still unclear whether this is the author's expected result.

How to exclude the words that have non-alphabetic characters from string

For example, if I want to delete the non-alphabetic characters I would do:
for (int i = 0; i < s.length; i++) {
s[i] = s[i].replaceAll("[^a-zA-Z]", "");
}
How do I completely exclude a word with a non-alphabetic character from the string?
For example:
Initial input:
"a cat jumped jumped; on the table"
It should exclude "jumped;" because of ";".
Output:
"a cat jumped on the table"

Edit: (in response to your edit)
You could do this:
String input = "a cat jumped jumped; on the table";
input = input.replaceAll("(^| )[^ ]*[^A-Za-z ][^ ]*(?=$| )", "");
Let's break down the regex:
(^| ) matches after the beginning of a word, either after a space or after the start of the string.
[^ ]* matches any sequence, including the null string, of non-spaces (because spaces break the word)
[^A-Za-z ] checks that the character is non-alphabetical and does not break the word.
Lastly, we need to append [^ ]* to make it match until the end of the word.
(?=$| ) matches the end of the word, either the end of the string or the next space character, but it doesn't consume the next space, so that consecutive words will still match (ie "I want to say hello, world! everybody" becomes "I want to say everybody")
Note: if "a cat jumped off the table." should output "a cat jumped off the table", then use this:
input = input.replaceAll(" [^ ]*[^A-Za-z ][^ ]*(?= )", "").replaceAll("[^A-Za-z]$", "");
Assuming you have 1 word per array element, you can do this to replace them with the empty string:
for (int i = 0; i < s.length; i++) {
    if (s[i].matches(".*[^A-Za-z].*")) {
        s[i] = "";
    }
}
If you actually want to remove it, consider using an ArrayList:
ArrayList<String> stringList = new ArrayList<>();
for (int index = 0; index < s.length; index++) {
    // Keep only the words that contain no non-alphabetic character
    if (!s[index].matches(".*[^A-Za-z].*")) {
        stringList.add(s[index]);
    }
}
And the ArrayList will have all the elements that don't have non-alphabetical characters in them.
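
If the input is a single sentence rather than an array, the same filter can be written with a stream. This is only a sketch; WordFilter and keepAlphabeticWords are made-up names, and words are assumed to be separated by whitespace.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class WordFilter {
    // Keeps only the words made up entirely of ASCII letters and re-joins them with spaces
    static String keepAlphabeticWords(String input) {
        return Arrays.stream(input.trim().split("\\s+"))
                .filter(word -> word.matches("[A-Za-z]+"))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(keepAlphabeticWords("a cat jumped jumped; on the table"));
        // a cat jumped on the table
    }
}
```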

Try this:
s = String.join(" ", s).replaceAll("\\S*[^\\w\\s]\\S*\\s?", "").split(" ");
It joins the array with spaces, then applies the regex. The regex matches any run of non-whitespace characters that contains at least one character that is neither a word character nor whitespace (\S*[^\w\s]\S*), plus one trailing space if there is one (\s?), and removes it. The split then turns the string back into an array.

public static void main(String[] args) {
    String[] str = { "123abass;[;[]", "abcde", "1234" };
    for (String s : str) {
        // The whole string must consist of letters only
        if (s.matches("^[a-zA-Z]+$")) {
            System.out.println(s);
        }
    }
}
Output: abcde

You could use .toLowerCase() on each value in the array, then search the array against a-z values and it will be faster than a regular expression. Assume that your values are in an array called "myArray."
List<String> newValues = new ArrayList<>();
for(String s : myArray) {
if(containsOnlyLetters(s)) {
newValues.add(s);
}
}
//do this if you have to go back to an array instead of an ArrayList
String[] newArray = newValues.toArray(new String[0]);
This is the containsOnlyLetters method:
boolean containsOnlyLetters(String input) {
char[] inputLetters = input.toLowerCase().toCharArray();
for(char c : inputLetters) {
if(c < 'a' || c > 'z') {
return false;
}
}
return true;
}

Java Finding all words begining with a letter

I am trying to get all words that begin with a letter from a long string. How would you do this in Java? I don't want to loop through every letter or do something similarly inefficient.
EDIT: I also can't use any built-in data structures (except arrays, of course); it's for a CS class. I can, however, make my own data structures (I have already created several).

You could try obtaining an array of words from your String and then iterating through it:
String s = "my very long string to test";
for(String st : s.split(" ")){
if(st.startsWith("t")){
System.out.println(st);
}
}

You need to be clear about some things. What is a "word"? You want to find only "words" starting with a letter, so I assume that words can have other characters too. But what chars are allowed? What defines the start of such a word? Whitespace, any non letter, any non letter/non digit, ...?
e.g.:
String TestInput = "test séntènce îwhere I'm want,to üfind 1words starting $with le11ers.";
String regex = "(?<=^|\\s)\\pL\\w*";
Pattern p = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = p.matcher(TestInput);
while (matcher.find()) {
System.out.println(matcher.group());
}
The regex (?<=^|\s)\pL\w* will find sequences that starts with a letter (\pL is a Unicode property for letter), followed by 0 or more "word" characters (Unicode letters and numbers, because of the modifier Pattern.UNICODE_CHARACTER_CLASS).
The lookbehind assertion (?<=^|\s) ensures that there is the start of the string or a whitespace before the sequence.
So my code will print:
test
séntènce ==> contains non ASCII letters
îwhere ==> starts with a non ASCII letter
I ==> 'm is missing, because `'` is not in `\w`
want
üfind ==> starts with a non ASCII letter
starting
le11ers ==> contains digits
Missing words:
,to ==> starting with a ","
1words ==> starting with a digit
$with ==> starting with a "$"

You could build a HashMap -
HashMap<String,String> map = new HashMap<String,String>();
example -
ant, bat, art, cat
Hashmap
a -> ant,art
b -> bat
c -> cat
to find all words that begin with "a", just do
map.get("a")
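
The map above could be built along these lines. This is a sketch only: FirstLetterIndex and index are illustrative names, a TreeMap is used merely to get sorted output, and it relies on java.util collections, which the question's edit rules out for the asker's class assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FirstLetterIndex {
    // Groups whitespace-separated words by their lower-cased first letter
    static Map<Character, List<String>> index(String text) {
        Map<Character, List<String>> byLetter = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            // Skip empty tokens and words that do not start with a letter
            if (word.isEmpty() || !Character.isLetter(word.charAt(0))) {
                continue;
            }
            char key = Character.toLowerCase(word.charAt(0));
            byLetter.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
        }
        return byLetter;
    }

    public static void main(String[] args) {
        Map<Character, List<String>> map = index("ant bat art cat");
        System.out.println(map);          // {a=[ant, art], b=[bat], c=[cat]}
        System.out.println(map.get('a')); // [ant, art]
    }
}
```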

You can get the first character of each word and check with an API method whether it is a letter or not.
String input = "jkk ds 32";
String[] array = input.split(" ");
for (String word : array) {
char[] arr = word.toCharArray();
char c = arr[0];
if (Character.isLetter(c)) {
System.out.println( word + "\t isLetter");
} else {
System.out.println(word + "\t not Letter");
}
}
Sample output:
jkk isLetter
ds isLetter
32 not Letter

Scanner scan = new Scanner(text); // text being the string you are looking in
char test = 'x'; //whatever letter you are looking for
while(scan.hasNext()){
String wordFound = scan.next();
if(wordFound.charAt(0)==test){
//do something with the wordFound
}
}
This will do what you are looking for; inside the if statement, do what you want with the word.

Regexp way:
public static void main(String[] args) {
String text = "my very long string to test";
Matcher m = Pattern.compile("(^|\\W)(\\w*)").matcher(text);
while (m.find()) {
System.out.println("Found: "+m.group(2));
}
}

You can use split() method. Here is an example :
String string = "your string";
String[] parts = string.split(" C");
for(int i=0; i<parts.length; i++) {
String[] word = parts[i].split(" ");
if( i > 0 ) {
// ignore the remaining words because they don't start with C
System.out.println("C" + word[0]);
}
else { // Check the first part explicitly
for(int j=0; j<word.length; j++) {
if ( word[j].startsWith("c") || word[j].startsWith("C"))
System.out.println(word[j]);
}
}
}
where "C" is your letter. Then just loop over the array. For parts[0] you have to check each word to see whether it starts with "C". It was my mistake to start looping from i=1; the correct start is 0.
