How would I go about splitting a sentence in Java [duplicate]

This question already has answers here:
Wrap the string after a number of characters word-wise in Java
(6 answers)
Closed 8 years ago.
I am trying to wrap a sentence that is printed to the console so that words are not cut off at the 80-character limit, like this:
Welcome to fancy! A text based rpg. Perhaps you could tell us your name brave ad
venturer?
I would like it to print like this instead:
Welcome to fancy! A text based rpg. Perhaps you could tell us your name brave
adventurer?
Is there a way to do this with String.split() ?

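You can do something like this, where you decide whether to start a new line or not depending on the current line's size:
public static String getLines(String line) {
    String[] words = line.split(" ");
    StringBuilder str = new StringBuilder();
    int size = 0; // number of characters on the current output line
    for (int i = 0; i < words.length; i++) {
        if (size == 0) {
            // first word on a line: no leading space
            str.append(words[i]);
            size = words[i].length();
        } else if (size + 1 + words[i].length() <= 80) {
            // the word fits, together with the separating space
            str.append(" ").append(words[i]);
            size += 1 + words[i].length();
        } else {
            // the word does not fit: start a new line with it
            str.append("\n").append(words[i]);
            size = words[i].length();
        }
    }
    return str.toString();
}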
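Another way of doing the same:
public static String getLines2(String line) {
    StringBuilder str = new StringBuilder();
    int begin = 0;
    while (begin < line.length()) {
        int limit = Math.min(begin + 80, line.length());
        int lineEnd = limit;
        // back up to the nearest space, unless the rest of the text already fits
        while (lineEnd < line.length() && lineEnd > begin && line.charAt(lineEnd) != ' ') {
            lineEnd--;
        }
        // no space found within 80 characters: hard-break the long word
        if (lineEnd == begin) lineEnd = limit;
        str.append(line, begin, lineEnd);
        if (lineEnd < line.length()) str.append("\n");
        // skip the space we broke at, if there is one
        begin = (lineEnd < line.length() && line.charAt(lineEnd) == ' ') ? lineEnd + 1 : lineEnd;
    }
    return str.toString();
}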

Depending on how exact it has to be, Regex can do this pretty well. If you just need to split at spaces, a simple
String eg = "Example sentence with, some. spaces! and stuff";
String[] splitSentence = eg.split(" ");
will do the job: it splits the string at every space and returns the words, together with any adjacent punctuation, as a String array. You can then add up the word lengths (plus the spaces in between) and, whenever adding the next word would pass your limit (80 in your case), insert a '\n' and put that word on the next row:
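String getConsoleFormattedString(String s, int rowLength) {
    String[] split = s.split(" ");
    StringBuilder ret = new StringBuilder();
    int counter = 0; // characters already on the current row
    for (int i = 0; i < split.length; i++) {
        // row length if this word is added (plus a space unless the row is empty)
        int needed = (counter == 0) ? split[i].length() : counter + 1 + split[i].length();
        if (needed <= rowLength || counter == 0) {
            if (counter > 0) ret.append(" ");
            ret.append(split[i]);
            counter = needed;
        } else {
            ret.append("\n");
            counter = 0;
            i--; // retry the same word on the new row
        }
    }
    return ret.toString();
}
For the sake of simplicity, a word longer than rowLength is simply put on a row of its own and overflows it. I will let you figure out how to split such words.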
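One possible way to deal with over-long words is to hard-split them at the row length. The following is only a sketch under that assumption; the class name WordWrap and the method wrap are made up for illustration and are not part of any answer above. It returns the rows as a list instead of one string:

```java
import java.util.ArrayList;
import java.util.List;

class WordWrap {
    // Wraps text word-wise at the given width. Words longer than the width
    // are cut into width-sized pieces (an assumption, other policies are possible).
    static List<String> wrap(String text, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // hard-split words that cannot fit on a row by themselves
            while (word.length() > width) {
                if (current.length() > 0) {
                    lines.add(current.toString());
                    current.setLength(0);
                }
                lines.add(word.substring(0, width));
                word = word.substring(width);
            }
            if (word.isEmpty()) {
                continue;
            }
            if (current.length() == 0) {
                current.append(word);
            } else if (current.length() + 1 + word.length() <= width) {
                current.append(' ').append(word);
            } else {
                // the word does not fit: close the row and start a new one
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

Printing is then a matter of String.join("\n", WordWrap.wrap(text, 80)).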

split is not the best option here, but you can use the Pattern and Matcher classes with this regex (written as a Java string literal)
\\G.{1,80}(\\s+|$)
which means
\\G the position where the previous match ended or, on the first search (when there is no previous match yet), the start of the string (like ^)
.{1,80} any characters, between one and eighty times
(\\s+|$) one or more whitespace characters, or the end of the string
You can use it this way
String data = "Welcome to fancy! A text based rpg. Perhaps you could tell us your name brave "
+ "adventurer? ";
Pattern p = Pattern.compile("\\G.{1,80}(\\s+|$)");
Matcher m = p.matcher(data);
while(m.find())
System.out.println(m.group().trim());
Output:
Welcome to fancy! A text based rpg. Perhaps you could tell us your name brave
adventurer?
But if you may encounter very long words that shouldn't be split, you can add
\\S{80,}
to your regex so that it also finds runs of non-whitespace characters whose length is 80 or more.
Example:
String data = "Welcome to fancy! A text based rpg. Perhaps you could tell us your name brave "
+ "adventurer? foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar";
Pattern p = Pattern.compile("\\G.{1,80}(\\s+|$)|\\S{80,}");
Matcher m = p.matcher(data);
while (m.find())
System.out.println(m.group().trim());
Output:
Welcome to fancy! A text based rpg. Perhaps you could tell us your name brave
adventurer?
foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar-foo-bar

Related

How do I replace the same word but different case in the same sentence separately?

For example, replace "HOW do I replace different how in the same sentence by using Matcher?" with "LOL do I replace different lol in the same sentence?"
If HOW is all caps, replace it with LOL. Otherwise, replace it with lol.
I only know how to find them:
String source = "HOW do I replace different how in the same " +
        "sentence by using Matcher?";
Pattern pattern = Pattern.compile("how", Pattern.CASE_INSENSITIVE);
Matcher m = pattern.matcher(source);
while (m.find()) {
    if (m.group().matches("^[A-Z]*$"))
        System.out.println("I am uppercase");
    else
        System.out.println("I am lowercase");
}
But I don't know how to replace them by using matcher and pattern.

Here's one way to achieve your goal (not necessarily the most efficient, but it works and is easy to understand):
String source = "HOW do I replace different how in the same sentence by using Matcher?";
String[] split = source.replaceAll("HOW", "LOL").split(" ");
String newSource = "";
for (int i = 0; i < split.length; i++) {
    String at = split[i];
    if (at.equalsIgnoreCase("how")) at = "lol";
    newSource += " " + at;
}
// Strings are immutable, so the result of substring must be assigned back
newSource = newSource.substring(1);
// The output string is newSource
First replace all uppercase occurrences, then iterate over each word and replace the remaining "how"s with "lol". The substring call at the end simply removes the extra leading space.

I came up with a really dumb solution:
String result = source;
result = result.replaceAll(old_Word, new_Word);
result = result.replaceAll(old_Word.toUpperCase(),
        new_Word.toUpperCase());
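Since the question asks specifically about Matcher and Pattern, here is a minimal sketch of how the replacement could be done with Matcher.appendReplacement and Matcher.appendTail. The class name CaseAwareReplace and the method replaceKeepingCase are hypothetical, and the rule "all caps in, all caps out, otherwise lower case" is taken from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseAwareReplace {
    // Replaces every whole-word occurrence of oldWord, ignoring case.
    // An all-uppercase match gets the uppercase replacement, anything else the lowercase one.
    static String replaceKeepingCase(String source, String oldWord, String newWord) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(oldWord) + "\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = pattern.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String found = m.group();
            String replacement = found.equals(found.toUpperCase())
                    ? newWord.toUpperCase()
                    : newWord.toLowerCase();
            // quoteReplacement keeps '$' and '\' in the replacement literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }
}
```

Calling replaceKeepingCase(source, "how", "lol") on the sentence from the question turns HOW into LOL and how into lol in a single pass.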

Regex pattern to convert comma separated String

Changing string with comma separated values to numbered new-line values
For example:
Input: a,b,c
Output:
1.a
2.b
3.c
Finding it hard to change it using regex pattern, instead of converting string to string array and looping through.

I'm not really sure that it's possible to achieve this with a regex alone, without any kind of loop. As for me, splitting the string into an array and iterating over it is the most straightforward solution:
String value = "a,b,c";
String[] values = value.split(",");
String result = "";
for (int i=1; i<=values.length; i++) {
result += i + "." + values[i-1] + "\n";
}
Sure, it's possible to do this without splitting and without any arrays, but the solution is a bit awkward, for example:
String value = "a,b,c";
Pattern pattern = Pattern.compile("(?m)^\\w+");
Matcher matcher = pattern.matcher(value.replace(",", "\n"));
StringBuffer s = new StringBuffer();
int i = 0;
while (matcher.find()) {
    matcher.appendReplacement(s, ++i + "." + matcher.group());
}
matcher.appendTail(s); // copy whatever follows the last match
System.out.println(s.toString());
Here every , is replaced with a \n newline, and then we look for a run of word characters at the start of every line with (?m)^\\w+, where (?m) makes ^ match at the start of each line. For every run found, we prepend the line number to it. But even here we have to use a loop to set the line number, and the logic is not as clear as in the first solution.
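If the goal is mainly to avoid writing the loop by hand, a stream over the indexes is another option. This is only a sketch; the class name NumberedLines and the method number are invented for the example, and it still splits the string into an array:

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class NumberedLines {
    // Turns "a,b,c" into "1.a\n2.b\n3.c" without an explicit loop.
    static String number(String csv) {
        String[] values = csv.split(",");
        return IntStream.range(0, values.length)
                .mapToObj(i -> (i + 1) + "." + values[i])
                .collect(Collectors.joining("\n"));
    }
}
```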

StringUtils.countMatches words starting with a string?

I'm using StringUtils.countMatches to count word frequencies. Is there a way to search the text for words starting with some characters?
Example:
searching for art in "artificial art in my apartment" will return 3! I need it to return 2 for words starting with art only.
My solution was to replace \r and \n in the text with a space and modify the code to be:
text = text.replaceAll("(\r\n|\n)"," ").toLowerCase();
searchWord = " "+searchWord.toLowerCase();
StringUtils.countMatches(text, searchWord);
I also tried the following Regex:
patternString = "\\b(" + searchWord.toLowerCase().trim() + "([a-zA-Z]*))";
pattern = Pattern.compile(patternString);
matcher = pattern.matcher(text.toLowerCase());
Questions:
- Does my first solution make sense, or is there a better way to do this?
- Is my second solution faster? I'm working with large text files and a decent number of search words.
Thanks

text = text.replaceAll("(\r\n|\n)", " ").toLowerCase();
searchWord = searchWord.toLowerCase();
String[] words = text.split(" ");
int count = 0;
for (String word : words) {
    // count only the words that begin with the search word
    if (word.startsWith(searchWord)) {
        count++;
    }
}
A plain loop gives the same result.

Use a regular expression to count the words that start with art. The pattern to use is:
\b<search-word>
Here, \b matches a word boundary. Of course, the \b needs to be escaped when listed in the pattern string. Below is an example:
String input = "artificial art in my apartment";
Matcher matcher = Pattern.compile("\\bart").matcher(input);
int count = 0;
while (matcher.find()) {
count++;
}
System.out.println(count);
Output: 2
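For an arbitrary search word, the same idea can be wrapped in a small helper. This is a sketch, not a tested library method: PrefixCounter and countWordsStartingWith are made-up names, Pattern.quote is used so that special characters in the search word are taken literally, and the leading \b only works as intended when the search word starts with a letter, digit or underscore:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixCounter {
    // Counts the words in text that start with prefix, ignoring case.
    static int countWordsStartingWith(String text, String prefix) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(prefix), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

Because \b also matches next to line breaks, there is no need to replace \r and \n with spaces first. With many search words, compiling each pattern once and reusing it is likely cheaper than recompiling it per file.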

I need to get a substring from a java string Tokenizer

I need to get a substring from a java string tokenizer.
My input string is: Pizza-1*Nutella-20*Chicken-65*
StringTokenizer productsTokenizer = new StringTokenizer("Pizza-1*Nutella-20*Chicken-65*", "*");
do
{
try
{
int pos = productsTokenizer .nextToken().indexOf("-");
String product = productsTokenizer .nextToken().substring(0, pos+1);
String count= productsTokenizer .nextToken().substring(pos, pos+1);
System.out.println(product + " " + count);
}
catch(Exception e)
{
}
}
while(productsTokenizer .hasMoreTokens());
My output must be:
Pizza 1
Nutella 20
Chicken 65
I need the product value and the count value in separate variables to insert that values in the Data Base.
I hope you can help me.

You could use String.split() as
String[] products = "Pizza-1*Nutella-20*Chicken-65*".split("\\*");
for (String product : products) {
String[] prodNameCount = product.split("\\-");
System.out.println(prodNameCount[0] + " " + prodNameCount[1]);
}
Output
Pizza 1
Nutella 20
Chicken 65

You invoke the nextToken() method 3 times. That will get you 3 different tokens
int pos = productsTokenizer .nextToken().indexOf("-");
String product = productsTokenizer .nextToken().substring(0, pos+1);
String count= productsTokenizer .nextToken().substring(pos, pos+1);
Instead you should do something like:
String token = productsTokenizer .nextToken();
int pos = token.indexOf("-");
String product = token.substring(...);
String count= token.substring(...);
I'll let you figure out the proper indexes for the substring() method.
Also instead of using a do/while structure it is better to just use a while loop:
while(productsTokenizer .hasMoreTokens())
{
// add your code here
}
That is, don't assume there is a token.
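Put together, the advice above could look like the following sketch. The class name ProductParser and the choice to return name/count pairs as String arrays are assumptions made for the example; the substring indexes are one way to fill in the part left open above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ProductParser {
    // Parses "Pizza-1*Nutella-20*" into pairs of {product, count}.
    static List<String[]> parse(String input) {
        List<String[]> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, "*");
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken(); // call nextToken() once per iteration
            int pos = token.indexOf('-');
            if (pos < 0) {
                continue; // skip tokens without a hyphen
            }
            String product = token.substring(0, pos); // text before the hyphen
            String count = token.substring(pos + 1);  // text after the hyphen
            result.add(new String[] {product, count});
        }
        return result;
    }
}
```

Each pair can then be printed or passed on to the database code; Integer.parseInt(pair[1]) gives the count as a number.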

An alternative answer you may want to use if your input grows:
String products = "Pizza-1*Nutella-20*Chicken-65*";
// find all strings that match START or '*' followed by the name (matched),
// a hyphen and then a positive number (not starting with 0)
Pattern p = Pattern.compile("(?:^|[*])(\\w+)-([1-9]\\d*)");
Matcher finder = p.matcher(products);
while (finder.find()) {
    // possibly check if the new match directly follows the previous one
    String product = finder.group(1);
    int count = Integer.parseInt(finder.group(2));
    System.out.printf("Product: %s , count %d%n", product, count);
}

Some people dislike regex, but this is a good application for it. All you need is "(\\w+)-(\\d+)\\*" as your pattern. Here's a toy example:
String template = "Pizza-1*Nutella-20*Chicken-65*";
String pattern = "(\\w+)-(\\d+)\\*";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(template);
while(m.find())
{
System.out.println(m.group(1) + " " + m.group(2));
}
To explain this a bit more, "(\\w+)-(\\d+)\\*" looks for a (\\w+), which is a run of at least one character from [A-Za-z0-9_], followed by a -, followed by a number \\d+, where the + means at least one character in length, followed by a *, which must be escaped. The parentheses capture what's inside of them. There are two sets of capturing parentheses in this regex, so we reference them by group(1) and group(2) as seen in the while loop, which prints:
Pizza 1
Nutella 20
Chicken 65

Java regex to filter phone numbers

I have following example string that needs to be filtered
0173556677 (Alice), 017545454 (Bob)
This is how phone numbers are added to a text view. I want the text to look like that
0173556677;017545454
Is there a way to change the text using a regular expression? What would such an expression look like? Or do you recommend another method?

You can do as follows:
String orig = "0173556677 (Alice), 017545454 (Bob)";
String regex = " \\(.+?\\)";
String res = orig.replaceAll(regex, "").replaceAll(",", ";");
// first replaceAll: remove all content in parentheses
// second replaceAll: replace comma with semicolon

Use the expression in android.util.Patterns
Access the static variable
Patterns.PHONE
or use this expression here (Android Source Code)

Here's a resource that can guide you :
http://www.zparacha.com/validate-email-ssn-phone-number-using-java-regular-expression/

This solution works with phone numbers separated with any string that does not contain numbers:
String orig = "0173556677 (Alice), 017545454 (Bob)";
String[] numbers = orig.split("\\D+"); //split at everything that is not a digit
StringBuilder sb = new StringBuilder();
if (numbers.length > 0) {
sb.append(numbers[0]);
for (int i = 1; i < numbers.length; i++) { //concatenate all that is left
sb.append(";");
sb.append(numbers[i]);
}
}
String res = sb.toString();
or, with com.google.common.base.Joiner:
String[] numbers = orig.split("\\D+"); //split at everything that is not a digit
String res = Joiner.on(";").join(numbers);
PS. There is a minor deviation from the requirements in the top-voted example: it should be replaceAll(", ", ";"), with a space after the comma (or a \\s), otherwise a space is left after each semicolon. It seems I cannot submit an edit that adds just one character, though, and I do not want to mess with somebody else's code.
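A variation on the same idea is to collect the digit runs with a Matcher and join them with String.join (available since Java 8), which avoids the extra library. This is a sketch with made-up names (PhoneFilter, joinNumbers), and like the split approach it assumes that every run of digits in the text is a phone number:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneFilter {
    // Extracts every run of digits and joins the runs with semicolons.
    static String joinNumbers(String text) {
        Matcher matcher = Pattern.compile("\\d+").matcher(text);
        List<String> numbers = new ArrayList<>();
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return String.join(";", numbers);
    }
}
```

Unlike split("\\D+"), this does not produce an empty first element when the text starts with a non-digit character.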
