How do I split/parse this String properly using Regex - java

I am inexperienced with regex and rusty with Java, so some help here would be appreciated.
So I have a String in the form:
statement|digit|statement
statement|digit|statement
etc.
where statement can be any combination of characters, digits, and spaces.
I want to parse this string so that I save the first and last statements of each line in two separate string arrays.
For example, if I had the string:
cats|1|short hair and long hair
cats|2|black, blue
dogs|1|cats are better than dogs
I want to be able to parse the string into two arrays.
Array one = [cats], [cats], [dogs]
Array two = [short hair and long hair],[black, blue],[cats are better than dogs]
Matcher m = Pattern.compile("(\\.+)|\\d+|=(\\.+)").matcher(str);
while(m.find()) {
String key = m.group(1);
String value = m.group(2);
System.out.printf("key=%s, value=%s\n", key, value);
}
I would have gone on to add the keys and values to separate arrays if my output had been right, but no luck. Any help with this would be very much appreciated.

Here is a solution with RegEx:
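import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ParseString {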
public static void main(String[] args) {
String data = "cats|1|short hair and long hair\n"+
"cats|2|black, blue\n"+
"dogs|1|cats are better than dogs";
List<String> result1 = new ArrayList<>();
List<String> result2 = new ArrayList<>();
Pattern pattern = Pattern.compile("(.+)\\|\\d+\\|(.+)");
Matcher m = pattern.matcher(data);
while (m.find()) {
String key = m.group(1);
String value = m.group(2);
result1.add(key);
result2.add(value);
System.out.printf("key=%s, value=%s\n", key, value);
}
}
}
Here is a great site to help with regular expressions: http://txt2re.com/. Enter some example text in step 1, select the parts you are interested in during step 2, and select a language in step 3. Then copy, paste and massage the code that it spits out.

Double split should work:
class ParseString
{
public static void main(String[] args)
{
String s = "cats|1|short hair and long hair\ncats|2|black, blue\ndogs|1|cats are better than dogs";
String[] sa1 = s.split("\n");
for (int i = 0; i < sa1.length; i++)
{
String[] sa2 = sa1[i].split("\\|");
System.out.printf("key=%s, value=%s\n", sa2[0], sa2[2]);
} // end for i
} // end main
} // end class ParseString
Output:
key=cats, value=short hair and long hair
key=cats, value=black, blue
key=dogs, value=cats are better than dogs

There is no need for a complex regex pattern; you could simply split the string by the delimiter token using the String's split method (String#split()) in Java.
Working Example
import java.util.ArrayList;
public class StackOverFlow31840211 {
private static final int SENTENCE1_TOKEN_INDEX = 0;
private static final int DIGIT_TOKEN_INDEX = SENTENCE1_TOKEN_INDEX + 1;
private static final int SENTENCE2_TOKEN_INDEX = DIGIT_TOKEN_INDEX + 1;
public static void main(String[] args) {
String[] text = {
"cats|1|short hair and long hair",
"cats|2|black, blue",
"dogs|1|cats are better than dogs"
};
ArrayList<String> arrayOne = new ArrayList<String>();
ArrayList<String> arrayTwo = new ArrayList<String>();
for (String s : text) {
String[] tokens = s.split("\\|");
int tokenType = 0;
for (String token : tokens) {
switch (tokenType) {
case SENTENCE1_TOKEN_INDEX:
arrayOne.add(token);
break;
case SENTENCE2_TOKEN_INDEX:
arrayTwo.add(token);
break;
}
++tokenType;
}
}
System.out.println("Sentences for first token: " + arrayOne);
System.out.println("Sentences for third token: " + arrayTwo);
}
}
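If the last statement can itself contain a |, split("\\|") cuts it into extra tokens, and a switch on the token index silently drops the pieces. A minimal sketch, assuming every line has the form statement|digit|statement and only the first statement is guaranteed to be free of |: passing a limit of 3 to String#split keeps everything after the second | together. The class and method names (SplitWithLimit, parse) are just for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitWithLimit {
    // Returns {firstStatements, lastStatements} for lines shaped like
    // "statement|digit|statement". Lines without two '|' are skipped.
    static String[][] parse(String[] lines) {
        List<String> first = new ArrayList<>();
        List<String> last = new ArrayList<>();
        for (String line : lines) {
            // limit 3: split at most twice, so any later '|' stays in tokens[2]
            String[] tokens = line.split("\\|", 3);
            if (tokens.length == 3) {
                first.add(tokens[0]);
                last.add(tokens[2]);
            }
        }
        return new String[][] {first.toArray(new String[0]), last.toArray(new String[0])};
    }

    public static void main(String[] args) {
        String[][] result = parse(new String[] {
            "cats|1|short hair and long hair",
            "dogs|1|cats | dogs"
        });
        System.out.println(Arrays.toString(result[0])); // [cats, dogs]
        System.out.println(Arrays.toString(result[1])); // [short hair and long hair, cats | dogs]
    }
}
```

Skipping malformed lines is a design choice here; throwing an exception instead would be just as reasonable.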

I agree with the other answers that you should use split, but I am providing an answer that uses Pattern.split, since it uses a regex.
import java.util.regex.Pattern;
class MatchExample
{
public static void main (String[] args) {
String[] data = {
"cats|1|short hair and long hair",
"cats|2|black, blue",
"dogs|1|cats are better than dogs"
};
Pattern p = Pattern.compile("\\|\\d+\\|");
for(String line: data){
String[] elements = p.split(line);
System.out.println(elements[0] + " // " + elements[1]);
}
}
}
Notice that the pattern will match on one or more digits between two |'s. I see what you are doing with the groupings.

The main problem is that you need to escape the | and not the . (dot). Also, what is the = doing in your regex? I generalized the regex a little, but you can replace .* with \\d+ to get the same behavior as yours.
Matcher m = Pattern.compile("^(.+?)\\|.*\\|(.+)$", Pattern.MULTILINE).matcher(str);
Here is the strict version: "^([^|]+)\\|\\d+\\|([^|]+)$" (also with MULTILINE)
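A sketch of the strict version applied to the whole multi-line string at once, collecting both groups into the two arrays the question asks for. One adjustment, which is my assumption rather than part of the answer: the character classes also exclude \r and \n, because [^|] alone can match a line break and let a malformed line run into the next one. StrictLineParser is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictLineParser {
    // With MULTILINE, ^ and $ match at the start and end of every line.
    // [^|\r\n]+ keeps a statement from spanning a '|' or a line break.
    private static final Pattern LINE = Pattern.compile(
            "^([^|\\r\\n]+)\\|\\d+\\|([^|\\r\\n]+)$", Pattern.MULTILINE);

    // Returns {keys, values}; lines that don't fit the pattern are ignored.
    static String[][] parse(String str) {
        List<String> keys = new ArrayList<>();
        List<String> values = new ArrayList<>();
        Matcher m = LINE.matcher(str);
        while (m.find()) {
            keys.add(m.group(1));
            values.add(m.group(2));
        }
        return new String[][] {keys.toArray(new String[0]), values.toArray(new String[0])};
    }

    public static void main(String[] args) {
        String str = "cats|1|short hair and long hair\n"
                + "cats|2|black, blue\n"
                + "dogs|1|cats are better than dogs";
        String[][] result = parse(str);
        System.out.println(Arrays.toString(result[0])); // [cats, cats, dogs]
        System.out.println(Arrays.toString(result[1]));
        // [short hair and long hair, black, blue, cats are better than dogs]
    }
}
```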
And it's indeed easier to use split (on each line), as some have said, but like this:
String[] parts = str.split("\\|\\d+\\|");
If parts.length is not two then you know it is not a legal line.
If your input is always formatted like that, a single statement is enough: the left parts end up at the even indexes and the right parts at the odd indexes (0: line1-left, 1: line1-right, 2: line2-left, 3: line2-right, 4: line3-left ...), so you get an array twice as long as the number of lines.
String[] parts = str.split("\\|\\d+\\||\\n+");
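Turning that interleaved array back into two arrays is a short loop. A sketch, assuming every line is well formed and the input doesn't start with a newline (a leading newline would produce an empty first element and shift the pairing). The parity check below catches only some malformed input. InterleavedSplit is an illustrative name.

```java
import java.util.Arrays;

class InterleavedSplit {
    // Splits on "|digits|" or on runs of newlines, then separates the
    // alternating left/right parts into two arrays.
    static String[][] columns(String str) {
        String[] parts = str.split("\\|\\d+\\||\\n+");
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of parts: malformed line?");
        }
        int lineCount = parts.length / 2;
        String[] left = new String[lineCount];
        String[] right = new String[lineCount];
        for (int i = 0; i < lineCount; i++) {
            left[i] = parts[2 * i];      // even index: left statement
            right[i] = parts[2 * i + 1]; // odd index: right statement
        }
        return new String[][] {left, right};
    }

    public static void main(String[] args) {
        String str = "cats|1|short hair and long hair\ncats|2|black, blue\ndogs|1|cats are better than dogs";
        String[][] cols = columns(str);
        System.out.println(Arrays.toString(cols[0])); // [cats, cats, dogs]
    }
}
```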

Related

How to find a specific occurrence of a string inside of a string

The problem is this: I'm trying to find and get only one occurrence of a string, but the only way I can get at it is through a keyword that occurs multiple times.
Ex. 4 potato, 4 (string I want), 4 house, 4 car
How do I get only the string I want when I can't type in any keywords that the string might contain?
Imagine it as trying to take only one paragraph out of an essay.
I've tried the stringy.replaceAll(Str1, Str2); method, but to no avail. All that happens is that every occurrence of the string gets replaced (go figure, with a name like replaceAll).
package com.donovan.cunningham;
import java.util.ArrayList;
import java.util.Random;
public class EssayCreator {
//Creating varz
private static String[] lf = {"happy", "sad", "unhappy", "atractive",
"fast", "lazy"};
private static String[] op = {"estatic", "melhencohly", "depressed",
"alluring", "swift", "lackadaisical"};
private static String pF = " ";
private static String temp[];
private static String conv = " ";
private static String comm = ", ";
private static Random random = new Random();
private ArrayList<String> array = new ArrayList<String>();
public static void Converter(String in) {
in = in.replace(comm, conv);
for (int i = 0; i < lf.length; i++){
in = in.replace(lf[i], op[i]);
}
in = in.replace(conv, comm);
//int rand = random.nextInt(in.indexOf(pF));
for (int i = 0; i < in.indexOf(pF); i++){
/*
Where I want to get an exact string of an essay
I'd convert pF to conv, and then remove the paragraph to
change the order
} */
}
CreateGUI.output.setText(in);
Sound.stopSound();
}
}
Thanks
Your question is not very clear, though. Do you want (1) to find the most frequent string, which you don't know in advance, or (2) to replace a specific occurrence of a string that you do know?
The most naive way to do (1) is to split your text on spaces and put the words into a string-to-integer HashMap that counts how often each one occurs. You can also scan this HashMap to find all the strings that occur N times.
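For (1), a minimal sketch of that counting idea, using Map.merge to increment the counts. It assumes words are separated by whitespace and does no punctuation stripping, so "potato," and "potato" count as different words. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordFrequency {
    // Splits on runs of whitespace and counts how often each word occurs.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) { // "".split(...) yields one empty string
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The word with the highest count (on ties, any of them), or null if empty.
    static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // All words that occur exactly n times, sorted for stable output.
    static List<String> withCount(Map<String, Integer> counts, int n) {
        List<String> words = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == n) {
                words.add(e.getKey());
            }
        }
        Collections.sort(words);
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("4 potato, 4 house, 4 car");
        System.out.println(mostFrequent(counts)); // 4
        System.out.println(withCount(counts, 3)); // [4]
    }
}
```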
For (2), supposing you already know which key string you want to find, you can call indexOf(String str, int fromIndex) on the String repeatedly, as follows:
String keyWord = "key_word";
String replacement = "new_key_word";
int targetOccurrence = 4; // find the 4th occurrence and replace it
String input = "Here is your text with key_word1, key_word2, ...etc";
StringBuilder output = new StringBuilder();
int occurrenceCount = 0;
int copiedIndex = 0;
int index = input.indexOf(keyWord);
while (index >= 0) {
// copy the text between the previous match and this one
output.append(input, copiedIndex, index);
occurrenceCount++;
if (occurrenceCount == targetOccurrence) {
output.append(replacement);
} else {
output.append(keyWord);
}
copiedIndex = index + keyWord.length();
index = input.indexOf(keyWord, copiedIndex);
}
// copy whatever follows the last match
output.append(input.substring(copiedIndex));
Ref: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String,%20int)
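The same indexOf(String, int) loop can be wrapped into small helpers. This is a sketch with hypothetical names (indexOfNth, replaceNth, betweenNth), assuming non-overlapping occurrences as in the loop above. betweenNth is one way to read the original question: cut out the text between the Nth and (N+1)th occurrence of a repeated marker such as "4".

```java
class NthOccurrence {
    // Index of the nth (1-based) non-overlapping occurrence of target, or -1.
    static int indexOfNth(String text, String target, int n) {
        int from = 0;
        for (int found = 1; found <= n; found++) {
            int index = text.indexOf(target, from);
            if (index < 0) {
                return -1;
            }
            if (found == n) {
                return index;
            }
            from = index + target.length();
        }
        return -1; // n < 1
    }

    // Replaces only the nth occurrence; returns text unchanged if there is none.
    static String replaceNth(String text, String target, String replacement, int n) {
        int index = indexOfNth(text, target, n);
        if (index < 0) {
            return text;
        }
        return text.substring(0, index) + replacement + text.substring(index + target.length());
    }

    // Text between the nth and (n+1)th occurrences, or null if either is missing.
    static String betweenNth(String text, String target, int n) {
        int start = indexOfNth(text, target, n);
        int end = indexOfNth(text, target, n + 1);
        if (start < 0 || end < 0) {
            return null;
        }
        return text.substring(start + target.length(), end);
    }

    public static void main(String[] args) {
        String text = "key_word1, key_word2, key_word3, key_word4, key_word5";
        System.out.println(replaceNth(text, "key_word", "new_key_word", 4));
        // key_word1, key_word2, key_word3, new_key_word4, key_word5
        System.out.println(betweenNth("4 potato, 4 (string I want), 4 house, 4 car", "4", 2));
        // " (string I want), " (including the surrounding spaces)
    }
}
```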
Not sure if I have answered your question, though...
I've combed through your code and tried running it with a use case I made up from the cryptic info you've given. As folks above are saying, it's not clear what you're trying to accomplish. Regex is typically your friend when you need more complex string pattern matching than String's built-in methods provide. Try googling 'pattern match string in paragraph regex java' or something to that effect. Meanwhile, I've added some comments to your code that might help make this question clearer. Happy to help if I can better understand what you're trying to do. See the code and comments below:
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
public class EssayCreator {
// Creating varz
private static String[] lf = { "happy", "sad", "unhappy", "atractive",
"fast", "lazy" };
private static String[] op = { "estatic", "melhencohly", "depressed",
"alluring", "swift", "lackadaisical" };
private static String pF = " ";
private static String temp[];
private static String conv = " ";
private static String comm = ", ";
private static Random random = new Random();
private List<String> array = new ArrayList<String>();
// Bradley D: Just some side notes here:
// Don't capitalize method names and don't use nouns.
// They're not class names or constructors. Changed Converter to convert. It'd also be good to
// stipulate what you are converting, i.e. convertMyString to make this a
// little more intuitive
public static void convert(String in) {
/*
* Bradley D: First, you are replacing every comma followed by a space
* with 3 spaces. It would be good to know why you're doing that.
*/
in = in.replace(comm, conv);
for (int i = 0; i < lf.length; i++) {
in = in.replace(lf[i], op[i]);
}
/*
* Bradley D: Now you are replacing the 3 spaces with a comma and a
* space again??
*/
in = in.replace(conv, comm);
// Bradley D: Not really sure what you are trying to iterate through
// here. in.indexOf(pF) is -1
// for the use case I've created for you with the text below (what did I
// miss?)...
// Perhaps you're trying to find the first place in your essay where pF
// (3 spaces) occurs....
// but you've already reconverted your 3 spaces back to a comma and
// single space, so I'm getting even more lost here....
// int rand = random.nextInt(in.indexOf(pF));
for (int i = 0; i < in.indexOf(pF); i++) {
/*
* Bradley D: What is pF?? It appears to be the same as comm...
*/
/*
* Where I want to get an exact string of an essay I'd convert pF to
* conv, and then remove the paragraph to change the order }
*/
// Bradley D: Ok, so if you can make this question clear, I'll give it a shot here
}
// Bradley D: Commenting this out since it was not included
// CreateGUI.output.setText(in);
// Sound.stopSound();
}
public static void main(String[] args) {
EssayCreator ec = new EssayCreator();
String essay = "Let's see if we find the desired string in here. "
+ "Are we happy? Nope, we're not happy. Who's happy? What does happiness mean anyway? "
+ "I'd be very happy if this question were clearer, but let's give it a go anyway. Maybe "
+ "we're lazy, and that's not attractive, thus rendering us unhappy and lackadasical... jk! "
+ "So hey man... why are you replacing all of the commas with spaces? "
+ "Can you put comments in your code? What is pF? "
+ "Also you should not capitalize method names. They should be in camelCase and they should not "
+ "be nouns like Converter, which makes them look like a constructor. Methods represent "
+ "an action taken, so a verb to describe them is standard practice. "
+ "So use convert, but what are you converting? convertString?? convertWords? "
+ "Anyway, making your method names intuitive would be helpful to anyone trying to understand "
+ "the code.";
ec.convert(essay);
}
}

How to check if an Array contains a particular word in a String and get it?

I have a String[] and an input String:
String[] ArrayEx = new String[1];
String textInput = "a whole bunch of words";
What I want to do is check if the String contains a word present in the Array, like this.
Ex: textInput = "for example" and ArrayEx[0] = "example"
I know about this method:
Arrays.asList(yourArray).contains(yourValue)
but it checks the full String, right? How do I check whether the String contains a particular word present in the Array? It's fine if the solution uses an ArrayList instead.
Also if yes, can I get that word from the String[]? i.e., in the above case get the String "example".
EDIT:
public void searchNearestPlace(String v2txt)
{
Log.e("TAG", "Started");
v2txt = v2txt.toLowerCase();
String[] places = {"accounting, airport, amusement_park, aquarium, art_gallery, atm, bakery, bank, bar, beauty_salon, bicycle_store, book_store, bowling_alley, bus_station, cafe, campground, car_dealer, car_rental, car_repair, car_wash, casino, cemetery, church, city_hall, clothing_store, convenience_store, courthouse, dentist, department_store, doctor, electrician, electronics_store, embassy, establishment, finance, fire_station, florist, food, funeral_home, furniture_store, gas_station, general_contractor, grocery_or_supermarket, gym, hair_care, hardware_store, health, hindu_temple, home_goods_store, hospital, insurance_agency, jewelry_store, laundry, lawyer, library, liquor_store, local_government_office, locksmith, lodging, meal_delivery, meal_takeaway, mosque, movie_rental, movie_theater, moving_company, museum, night_club, painter, park, parking, pet_store, pharmacy, physiotherapist, place_of_worship, plumber, police, post_office, real_estate_agency, restaurant, roofing_contractor, rv_park, school, shoe_store, shopping_mall, spa, stadium, storage, store, subway_station, synagogue, taxi_stand, train_station, travel_agency, university, veterinary_care, zoo"};
int index;
for(int i = 0; i<= places.length - 1; i++)
{
Log.e("TAG","for");
if(v2txt.contains(places[i]))
{
Log.e("TAG", "sensed?!");
index = i;
}
}
Say v2txt is "awesome airport": the "sensed?!" log never appears, even though all the other logs indicate the method is running.
Edit2:
I am so embarrassed that I made such a dunderheaded mistake. My array is declared wrongly: there should be a " on each side of every , so that each place is a separate string. I am such a big idiot!
Sorry, I will change it and let you know.
First of all, this has nothing to do with Android.
Second, the solution:
boolean flag = false;
String textInput = "for example";
int index = 0;
String[] yourArray = {"ak", "example"};
for (int i = 0; i <= yourArray.length - 1; i++) {
if (textInput.contains(yourArray[i])) {
flag = true;
index = i;
}
}
if (flag)
System.out.println("found at index " + index);
else
System.out.println("not found ");
EDIT :
Change your array to
String[] places = {"accounting", "airport", "amusement_park" };
and so on for the other values; with your current declaration, the array has only one element.
You can split your string to get an array of words:
txArray = textInput.split(" ");
then for each element in txArray check if
Arrays.asList(ArrayEx).contains(txArray[i])
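// requires: import java.util.Arrays;
String textInput = "Hello I'm your String";
String[] splitStr = textInput.split(" ");
for (int i = 0; i < splitStr.length; i++) {
// compare each word of the input against the array entries
if (Arrays.asList(ArrayEx).contains(splitStr[i])) {
System.out.println("FOUND: " + splitStr[i]);
}
}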
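The split-and-look-up idea, packaged as a method that also returns the matching word (the question asks how to get it). A sketch, assuming words are separated by whitespace and must match exactly; building a HashSet once avoids calling Arrays.asList(...).contains(...) for every token. WholeWordLookup and findWord are illustrative names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WholeWordLookup {
    // Returns the first word of input that appears in words, or null.
    static String findWord(String[] words, String input) {
        Set<String> wanted = new HashSet<>(Arrays.asList(words));
        for (String token : input.split("\\s+")) {
            if (wanted.contains(token)) {
                return token;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findWord(new String[] {"ak", "example"}, "for example")); // example
    }
}
```

Unlike String.contains, this only matches whole words, so "port" is not found in "awesome airport".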
You can use Java regular expressions.
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Testing {
public static void main(String[] args) {
String textInput = "for example";
String[] arrayEx = new String[1];
arrayEx[0] = "example";
Pattern p = Pattern.compile(arrayEx[0]);
Matcher m = p.matcher(textInput);
boolean matchedFoundStatus = false;
while (m.find()) {
matchedFoundStatus = true;
}
System.out.println("matchedFoundStatus:" + matchedFoundStatus);
}
}
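A variation on the regex approach that checks all array entries in one pass and reports which ones matched. Each word goes through Pattern.quote so characters such as . or + are taken literally, and \b anchors restrict matches to whole words. This assumes the entries start and end with word characters (letters, digits, underscore), as the place names in the question do; \b does not behave as expected around entries like c++ or .net. RegexWordLookup is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWordLookup {
    // Builds \b(?:w1|w2|...)\b, quoting each word so '.', '+' etc. are literal.
    static Pattern buildPattern(String[] words) {
        StringBuilder alternation = new StringBuilder();
        for (String word : words) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(word));
        }
        return Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    // Every whole-word match of any array entry, in order of appearance.
    static List<String> findAll(String[] words, String input) {
        List<String> found = new ArrayList<>();
        if (words.length == 0) {
            return found; // an empty alternation would match the empty string
        }
        Matcher m = buildPattern(words).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] places = {"airport", "bank", "bus_station"};
        System.out.println(findAll(places, "awesome airport near the bank")); // [airport, bank]
    }
}
```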
Try this:
String text2check = "Your Name";
for (int t = 0; t < array.length; t++) {
if (text2check.equals(array[t])) {
// Process it here
break;
}
}
"How do I check if the String contains a particular word present in the Array?" is the same question as "Is there an element in the array that the input string contains?"
Java 8
String[] words = { "example", "hello world" };
String input = "a whole bunch of words";
Arrays.stream(words).anyMatch(input::contains);
(The matching words can also be extracted, if needed:)
Arrays.stream(words)
.filter(input::contains)
.toArray(String[]::new);
If you are stuck with Java 7, you will have to re-implement "anyMatch" and "filter" yourself:
Java 7
boolean anyMatch(String[] words, String input) {
for(String s : words)
if(input.contains(s))
return true;
return false;
}
List<String> filter(String[] words, String input) {
List<String> matches = new ArrayList<>();
for(String s : words)
if(input.contains(s))
matches.add(s);
return matches;
}
This takes a String array and searches through all of its strings, checking whether each one occurs as a character sequence in an input string. Also, native Android apps are programmed in the Java language, so you might find it beneficial to read up more on Strings.
String [] stringArray = new String[5];
//populate your array
String inputText = "abc";
for(int i = 0; i < stringArray.length; i++){
if(inputText.contains(stringArray[i])){
//Do something
}
}

Regex pattern for matching words like c++ in a text

I have a text which can have words like c++, c, .net, asp.net in any format.
Sample Text:
Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.
I already have c,c++,java,.net,asp.net stored somewhere.
All I need is to pick the occurrences of all these words in the text.
The pattern I was using to match was (?i)\\b(" +Pattern.quote(key)+ ")\\b, which doesn't match things like c++ and .net. So I tried escaping the literals with (?i)\\b(" +forRegex(key)+ ")\\b, using a forRegex escaping helper, and got the same result.
The expected output is that it should match(case insensitive):
C++ : 2
C : 2
java: 2
asp.net : 1
.net : 1
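// requires: import java.util.*;
Set<String> keywords = new HashSet<>(Arrays.asList("c++", "c", "java", ".net", "asp.net"));
String text="Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.";
// turn commas and semicolons into spaces so "C,C++" becomes two tokens
text=text.replaceAll("[, ; ]"," ");
String[] textArray=text.split(" ");
for(String s : keywords){
int count=0;
for(int i=0;i<textArray.length;i++){
// equalsIgnoreCase so "Java" and "C++" match the lower-case keywords
if(textArray[i].equalsIgnoreCase(s)){
count++;
}
}
System.out.println(s + " : " + count);
}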
This works most of the time. (If you want better results, refine the regular expression passed to replaceAll.)
I would choose a non-regex solution to your problem. Just put the keywords into an array and search for each occurrence in the input string. It uses String.indexOf(String, int) to iterate through the string without creating any new objects (beyond the index and counter).
public class SearchWordCountNonRegex {
public static final void main(String[] ignored) {
//Keywords and input searched for with lowercase, so the keyword "java"
//matches "Java", "java", and "JAVA".
String[] searchWords = {"c++", "c", "java", "asp.net", ".net"};
String input = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.".
toLowerCase();
for(int i = 0; i < searchWords.length; i++) {
String searchWord = searchWords[i];
System.out.print(searchWord + ": ");
int foundCount = 0;
int currIdx = 0;
while(currIdx != -1) {
currIdx = input.indexOf(searchWord, currIdx);
if(currIdx != -1) {
foundCount++;
currIdx += searchWord.length();
} else {
currIdx = -1;
}
}
System.out.println(foundCount);
}
}
}
Output:
c++: 2
c: 4
java: 2
asp.net: 1
.net: 2
If you are really wanting a regex solution, you could try something like the following, which uses a case insensitive pattern to match each keyword.
The problem is that the number of occurrences must be kept track of separately. This could be done, for example, by adding each found keyword to a map, where the key is the keyword, and the value is its current count. In addition, once a match is found, the search continues from that point, which implies that any potential overlapping matches are hidden (such as when Asp.NET is found, that particular .NET match will never be found)--this may or may not be a desired behavior.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class SearchWordsRegexNoCounts {
public static final void main(String[] ignored) {
Matcher keywordMtchr = Pattern.compile("(C\\+\\+|C|Java|Asp\\.NET|\\.NET)",
Pattern.CASE_INSENSITIVE).matcher("");
String input = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me C,C++,Java,asp.net skills.";
keywordMtchr.reset(input);
while(keywordMtchr.find()) {
System.out.println("Keyword found at index " + keywordMtchr.start() + ": " + keywordMtchr.group(1));
}
}
}
Output:
Keyword found at index 7: java
Keyword found at index 32: .net
Keyword found at index 57: C
Keyword found at index 60: C++
Keyword found at index 90: C
Keyword found at index 92: C++
Keyword found at index 96: Java
Keyword found at index 101: asp.net
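The counting described above can be done with a map. A sketch using the same case-insensitive alternation, Map.merge to increment counts, and a TreeMap so the output order is predictable; matches are lower-cased (with Locale.ROOT) so that "Java" and "java" share a key. The same caveats apply: matches don't overlap, and with no boundary checks a c inside an ordinary word would also be counted. KeywordCounts is an illustrative name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordCounts {
    // Same alternation as above: C++ is tried before C.
    private static final Pattern KEYWORDS = Pattern.compile(
            "(C\\+\\+|C|Java|Asp\\.NET|\\.NET)", Pattern.CASE_INSENSITIVE);

    // Counts matches per keyword, keyed by the lower-cased match text.
    static Map<String, Integer> count(String input) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = KEYWORDS.matcher(input);
        while (m.find()) {
            counts.merge(m.group(1).toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "Hello, java is what I want. Hmm .net should be fine too. "
                + "C, C++ are also need. So, get me C,C++,Java,asp.net skills.";
        System.out.println(count(input));
    }
}
```

For the question's sample text this prints {.net=1, asp.net=1, c=2, c++=2, java=2}, which matches the expected counts.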
Using regex, I've come up with the following solution, although it can potentially find undesired matches, as described in the code comments:
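// These members belong inside a class. They need imports from java.util
// (ArrayList, Arrays, Collection, Collections, Comparator, HashMap, HashSet,
// List, Map, Set) and java.util.regex (Matcher, Pattern).
// "\\" is first because we don't want to escape any escape characters we will
// be adding ourselves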
private static final String[] regexSpecial = {"\\", "(", ")", "[", "]", "{",
"}", ".", "+", "*", "?", "^", "$", "|"};
private static final String regexEscape = "\\";
private static final String[] regexEscapedSpecial;
static {
regexEscapedSpecial = new String[regexSpecial.length];
for (int i = 0; i < regexSpecial.length; i++) {
regexEscapedSpecial[i] = regexEscape + regexSpecial[i];
}
}
public static void main(String[] args) throws Throwable {
Set<String> searchWords = new HashSet<String>(Arrays.asList("c++", "c",
".net", "asp.net", "java"));
String text = "Hello, java is what I want. Hmm .net should be fine too. C, C++ are also need. So, get me\nC,C++,Java,asp.net skills.";
System.out.println(numOccurrences(text, searchWords, false));
}
/**
* Counts the number of occurrences of the given words in the given text. This
* allows the given "words" to contain non-word characters. Note that it is
* possible for unexpected matches to occur. For example if one of the words
* to match is "c" then while none of the "c"s in "coconut" will be matched,
* the "c" in "c-section" will even if only matches of "c" as in the "c
* programming language" were intended.
*/
public static Map<String, Integer> numOccurrences(String text,
Set<String> searchWords, boolean caseSensitive) {
Map<String, String> lowerCaseToSearchWords = new HashMap<String, String>();
List<String> searchWordsInOrder = sortByNonInclusion(searchWords);
StringBuilder regex = new StringBuilder("(?<!\\w)(");
boolean started = false;
for (String searchWord : searchWordsInOrder) {
lowerCaseToSearchWords.put(searchWord.toLowerCase(), searchWord);
if (started) {
regex.append("|");
} else {
started = true;
}
regex.append(escapeRegex(searchWord));
}
regex.append(")(?!\\w)");
Pattern pattern = null;
if (caseSensitive) {
pattern = Pattern.compile(regex.toString());
} else {
pattern = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
}
Matcher matcher = pattern.matcher(text);
Map<String, Integer> matches = new HashMap<String, Integer>();
while (matcher.find()) {
String match = lowerCaseToSearchWords.get(matcher.group(1).toLowerCase());
Integer oldVal = matches.get(match);
if (oldVal == null) {
oldVal = 0;
}
matches.put(match, oldVal + 1);
}
return matches;
}
/**
* Sorts the given collection of words in such a way that if A is a prefix of
* B, then it is guaranteed that A will appear after B in the sorted list.
*/
public static List<String> sortByNonInclusion(Collection<String> toSort) {
List<String> sorted = new ArrayList<String>(new HashSet<String>(toSort));
// sorting in reverse alphabetical order will ensure that if A is a prefix
// of B it will appear later in the list than B
Collections.sort(sorted, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return o2.compareTo(o1);
}
});
return sorted;
}
/**
* Escape all regex special characters in the given text.
*/
public static String escapeRegex(String toEscape) {
for (int i = 0; i < regexSpecial.length; i++) {
toEscape = toEscape.replace(regexSpecial[i], regexEscapedSpecial[i]);
}
return toEscape;
}
The printed result is
{asp.net=1, c=2, c++=2, java=2, .net=1}

Returning an array of string words with more than 5 letters [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
Here is the problem specification:
Write a static method, getBigWords, that gets a single String parameter and returns an array whose elements are the words in the parameter that contain more than 5 letters. (A word is defined as a contiguous sequence of letters.)
EXAMPLE: So, if the String argument passed to the method was "There are 87,000,000 people in Canada", getBigWords would return an array of two elements, "people" and "Canada".
ANOTHER EXAMPLE: If the String argument passed to the method was "Send the request to support#turingscraft.com", getBigWords would return an array of three elements, "request", "support" and "turingscraft".
My approach was to declare a variable-length array and split the string into tokens. Then I passed the tokens to a method. Essentially, I wanted to check that each token with more than 5 characters is a contiguous sequence of letters. If not, break out of the loop and move on to the next token. If so, add each letter in the token to a new array of StringBuilders. Then, when all the loop iterations are done, I return the StringBuilder array as a String array and assign the return value to the variable-length array.
I am trying to figure out how to declare an array of StringBuilders. The array must be of variable length, so that I can append letters to each element.
Here is my code.
public static String getBigWords(String string){
String[] tokens = string.split(" ");
java.util.ArrayList<String> array = new ArrayList<String>();
array = isNoDigit(tokens);
public static String isNodigit(String[] param);
StringBuilder[] builder = new StringBuilder[param.length];
for (int i = 0; param.length(); i++){
if (param[i].length() > 5){
for (int j = 0; j < param[i].length(); j++)
if ((Character.isLetter(param[i].charAt(j))){
builder.append(param[i].charAt(j));
}
else
break;
}
}
}
return builder.toString
}
return array.toArray(new String[0]);
}
Updated code:
public static String[] getBigWords(String sequence) {
sequence = sequence.replace("#", " ");
sequence = sequence.replace(".", " ");
sequence = sequence.replaceAll("\\d+.", "");
java.util.ArrayList<String> array = new java.util.ArrayList<String>();
String[] tokens = sequence.split(" ");
for (int i = 0; i < tokens.length; i++) {
if (tokens[i].length() > 5) {
array.add(tokens[i]);
}
}
return array.toArray(new String[0]);
}
I believe this is what you want. Obviously you'll need to take the code and adapt it to your needs. This gives you the gist of how to use regular expressions to reach your end goal much more easily. It finds every word of more than 5 letters (letters a-z only, upper or lower case) and adds it to an ArrayList.
import java.util.*;
import java.util.regex.*;
public class Test
{
public static void main(String[] args)
{
String s = "There are 87,000,000 people in Canada";
List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile("([a-zA-Z]{6,})").matcher(s);
while (m.find())
{
allMatches.add(m.group());
}
System.out.println(allMatches); // prints [people, Canada]
}
}
If you want to learn more about regex, I recommend this tutorial: regexone
Please note: This code is for demo only. I have not performed any error checking etc. You need to add that as fit for your application.
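Since the specification asks for a static getBigWords that returns an array, here is a sketch that wraps the same pattern, written as [a-zA-Z]{6,} (a run of more than 5 letters), in that signature. It assumes "letters" means ASCII letters, as the pattern above does; \p{L} would also accept accented letters. BigWords is an illustrative class name. With this approach, an input with no long words yields an empty array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BigWords {
    // A word is a run of letters; "more than 5 letters" means 6 or more.
    private static final Pattern BIG_WORD = Pattern.compile("[a-zA-Z]{6,}");

    static String[] getBigWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = BIG_WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(getBigWords("There are 87,000,000 people in Canada")));
        // [people, Canada]
        System.out.println(Arrays.toString(getBigWords("Send the request to support#turingscraft.com")));
        // [request, support, turingscraft]
    }
}
```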
You can try something like this.
public class StackOverFlow {
public static void main(String[] args) {
String str1 = "There are 87,000,000 people in Canada";
String str2 = "Send the request to support#turingscraft.com";
String[] bigWordsArray = getBigWords(str1);
System.out.println("Printing results from first String");
for (String string : bigWordsArray) {
System.out.println(string);
}
bigWordsArray = getBigWords(str2);
System.out.println("Printing results from second String");
for (String string : bigWordsArray) {
System.out.println(string);
}
}
public static String[] getBigWords(String testString) {
testString = testString.replace('#', ' ');
testString = testString.replace('.', ' ');
testString = testString.replaceAll("\\d+.", "");
String[] tokens = testString.split(" ");
StringBuilder sb = new StringBuilder();
for (String string : tokens) {
if (string.length() > 5) {
sb.append(string).append(" ");
}
}
return sb.toString().split(" ");
}
}
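This compiles and works; the output is:
Printing results from first String
people
Canada
Printing results from second String
request
support
turingscraft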

Java's split method has leading blank records that I can't suppress

I'm parsing an input file that has multiple keywords preceded by a +. The + is my delimiter in a split, with individual tokens being written to an array. The resulting array includes a blank record in the [0] position.
I suspect that split is taking the "nothing" before the first token and populating project[0], then moving on to subsequent tokens which all show up as correct.
Documentation says that this method has a limit parameter:
If n is zero then the pattern will be applied as many times as
possible, the array can have any length, and trailing empty strings
will be discarded.
and I found this post on SO, but the proposed solution, editing out the leading delimiter (I used a substring(1) to create a temp field), yielded the same blank record for me.
Code and output appear below. Any tips would be appreciated.
import java.util.regex.*;
import java.io.*;
import java.nio.file.*;
import java.lang.*;
//
public class eadd
{
public static void main(String args[])
{
String projStrTemp = "";
String projString = "";
String[] project = new String[10];
int contextSOF = 0;
int projStringSOF = 0;
int projStringEOF = 0;
//
String inputLine = "foo foofoo foo foo #bar.com +foofoofoo +foo1 +foo2 +foo3";
contextSOF = inputLine.indexOf("#");
int tempCalc = (inputLine.indexOf("+")) ;
if (tempCalc == -1) {
projStrTemp = "+Uncategorized";
} else {
projStringSOF = inputLine.indexOf("+",contextSOF);
projStrTemp = inputLine.trim().substring(projStringSOF).trim();
}
project = projStrTemp.split("\\+");
//
System.out.println(projStrTemp+"\n"+projString);
for(int j=0;j<project.length;j++) {
System.out.println("Project["+j+"] "+project[j]);
}
}
}
CONSOLE OUTPUT:
+foofoofoo +foo1 +foo2 +foo3
Project[0]
Project[1] foofoofoo
Project[2] foo1
Project[3] foo2
Project[4] foo3
Change:
projStrTemp = inputLine.trim().substring(projStringSOF).trim();
to:
projStrTemp = inputLine.trim().substring(projStringSOF + 1).trim();
If you have a leading delimiter, your array will start with a blank element. It might be worthwhile for you to experiment with split() without all the other baggage.
public static void main(String[] args) {
String s = "an+example";
String[] items = s.split("\\+");
for (int i = 0; i < items.length; i++) {
System.out.println(i + " = " + items[i]);
}
}
With String s = "an+example"; it produces:
0 = an
1 = example
Whereas String s = "+an+example"; produces:
0 =
1 = an
2 = example
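The documentation quoted in the question only mentions trailing empty strings. A small sketch contrasting the default limit with a negative one: a delimiter at index 0 yields a leading empty string either way (the Javadoc says an empty leading substring is included when there is a positive-width match at the beginning), while a trailing one survives only with a negative limit. SplitLimitDemo is an illustrative name.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // limit 0 (what split(regex) uses): trailing empty strings are removed,
    // but the leading empty string from a '+' at index 0 is kept.
    static String[] splitDefault(String s) {
        return s.split("\\+");
    }

    // A negative limit keeps trailing empty strings as well.
    static String[] splitKeepAll(String s) {
        return s.split("\\+", -1);
    }

    public static void main(String[] args) {
        String s = "+an+example+";
        System.out.println(Arrays.toString(splitDefault(s))); // [, an, example]
        System.out.println(Arrays.toString(splitKeepAll(s))); // [, an, example, ]
    }
}
```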
One simple solution would be to remove the first + from the string. This way, it won't split before the first keyword:
projStrTemp = inputLine.trim().substring(projStringSOF + 1).trim();
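Note that after removing the first +, split("\\+") still leaves a trailing space on every token except the last ("foofoofoo ", "foo1 ", "foo2 "), because the space before each + is kept. A sketch that avoids both problems, assuming keywords are separated by optional whitespace and a +: split on \s*\+\s* and skip empty tokens, so the leading + can stay in place. ProjectTokens is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

class ProjectTokens {
    // Splits text like "+foofoofoo +foo1 +foo2" into [foofoofoo, foo1, foo2].
    // The regex consumes any whitespace around each '+', and the empty
    // token produced by a leading '+' is skipped.
    static List<String> parse(String projStrTemp) {
        List<String> projects = new ArrayList<>();
        for (String token : projStrTemp.trim().split("\\s*\\+\\s*")) {
            if (!token.isEmpty()) {
                projects.add(token);
            }
        }
        return projects;
    }

    public static void main(String[] args) {
        System.out.println(parse("+foofoofoo +foo1 +foo2 +foo3")); // [foofoofoo, foo1, foo2, foo3]
    }
}
```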
Edit: Personally, I'd go for a more robust solution using regular expressions. This finds all keywords preceded by +. It also requires that the + be preceded by whitespace or sit at the start of the line, so that expressions like 3+4 aren't matched.
String inputLine = "+foo 3+4 foofoo foo foo #bar.com +foofoofoo +foo1 +foo2 +foo3";
Pattern re = Pattern.compile("(\\s|^)\\+(\\w+)");
Matcher m = re.matcher(inputLine);
while (m.find()) {
System.out.println(m.group(2));
}
For an input such as
+foofoofoo +foo1 +foo2 +foo3
split breaks the string around every match of +, so the resulting array has 5 elements and the first one is empty. If you want the data that came before the keywords, work with inputLine instead of the processed projStrTemp, which is the substring starting at (and including) the first +.
