Splitting a string in Java based on multiple delimiters - java

I need to split a string based on delimiters and assign it to an object. I am aware of the split function, but I am unable to figure how to do it for my particular string.
The object is of the format:
class Selections{
int n;
ArrayList<Integer> choices;
}
The string is of the form :
1:[1,3,2],2:[1],3:[4,3],4:[4,3]
where:
1:[1,3,2] is an object with n=1 and Arraylist should have numbers 1,2,3.
2:[1] is an object with n=2 and Arraylist should have number 1
and so on .
I cannot use split with "," as delimiter because both individual objects and the elements within [] are separated by ",".
Any ideas would be appreciated.

You could use a regex to have a more robust result as follows:
String s = "1:[1,3,2],2:[1],3:[4,3],4:[4,3],5:[123,53,1231],123:[54,98,434]";
// commented one handles white spaces correctly
//Pattern p = Pattern.compile("[\\d]*\\s*:\\s*\\[((\\d*)(\\s*|\\s*,\\s*))*\\]");
Pattern p = Pattern.compile("[\\d]*:\\[((\\d*)(|,))*\\]");
Matcher matcher = p.matcher(s);
while (matcher.find())
System.out.println(matcher.group());
The regex can probably be tuned to be more accurate (e.g., handling white spaces) but it works fine on the example.

How about using "]," as delimiter?
If your structure is strictly like you said, it should be able to identify and split.
(Sorry, I want to leave it as comment, but my reputation does not allow)

You will need to perform multiple splits.
1. Split with the delimiter "]," (as mentioned in other comments and answers).
2. For each of the resulting strings, split with the delimiter ":[".
3. Clean up the last entry (from the split in step 1), because it will end with ']'.
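The steps above can be sketched roughly as follows. This is only a sketch: the Selections class here is a hypothetical stand-in for the asker's class (with an assumed two-argument constructor), and the input is assumed to follow the format from the question exactly. Note that split() takes a regular expression, so the brackets must be escaped.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiSplit {
    // Hypothetical stand-in for the asker's Selections class.
    static class Selections {
        final int n;
        final List<Integer> choices;

        Selections(int n, List<Integer> choices) {
            this.n = n;
            this.choices = choices;
        }
    }

    static List<Selections> parse(String s) {
        List<Selections> result = new ArrayList<>();
        // Step 1: "]," only occurs between objects, never inside the brackets.
        for (String part : s.split("\\],")) {
            // Step 2: ":[" separates n from the choices (limit 2 keeps an empty tail).
            String[] pair = part.split(":\\[", 2);
            int n = Integer.parseInt(pair[0].trim());
            // Step 3: the last entry still ends with ']', so strip it.
            String body = pair[1].replace("]", "").trim();
            List<Integer> choices = new ArrayList<>();
            if (!body.isEmpty()) {
                for (String number : body.split(",")) {
                    choices.add(Integer.parseInt(number.trim()));
                }
            }
            result.add(new Selections(n, choices));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Selections sel : parse("1:[1,3,2],2:[1],3:[4,3],4:[4,3]")) {
            System.out.println(sel.n + " -> " + sel.choices);
        }
    }
}
```

Unlike a character-by-character scan, this handles multi-digit numbers, because each piece is parsed with Integer.parseInt.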

I have no idea how to use a built-in function for this. I would just write my own split method:
// Assumes Selections has a constructor Selections(int n, List<Integer> choices)
private List<Selections> split(String s) {
    List<Selections> sections = new ArrayList<>();
    boolean insideBracket = false;
    int n = 0;
    List<Integer> ints = new ArrayList<>();
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (!insideBracket && Character.isDigit(c)) {
            // digit before the colon: this is n
            n = Character.getNumericValue(c);
        } else if (c == '[') {
            insideBracket = true;
        } else if (c == ']') {
            insideBracket = false;
            sections.add(new Selections(n, ints));
            ints = new ArrayList<>();
        } else if (insideBracket && c != ',') {
            // digit inside the brackets: one of the choices
            ints.add(Character.getNumericValue(c));
        }
    }
    return sections;
}
You will probably need to modify that a little bit. Right now it doesn't work if a number has multiple digits.

Try this
while(true){
int tmp=str.indexOf("]")+1;
System.out.println(str.substring(0,tmp));
if(tmp==str.length())
break;
str=str.substring(tmp+1);
}

Related

Java: Replace a specific character with a substring in a string at index

I am struggling with how to actually do this. Say I have this string
"This Str1ng i5 fun"
I want to replace the '1' with "One" and the 5 with "Five"
"This StrOneng iFive fun"
I have tried to loop through the string and manually replace them, but the count is off. I have also tried to use lists, arrays, StringBuilder, etc. but I cannot get it to work:
char[] stringAsCharArray = inputString.toCharArray();
ArrayList<Character> charArraylist = new ArrayList<Character>();
for(char character: stringAsCharArray) {
charArraylist.add(character);
}
int counter = startPosition;
while(counter < endPosition) {
char temp = charArraylist.get(counter);
String tempString = Character.toString(temp);
if(Character.isDigit(temp)){
char[] tempChars = digits.getDigitString(Integer.parseInt(tempString)).toCharArray(); //convert to number
charArraylist.remove(counter);
int addCounter = counter;
for(char character: tempChars) {
charArraylist.add(addCounter, character);
addCounter++;
}
counter += tempChars.length;
endPosition += tempChars.length;
}
counter++;
}
I feel like there has to be a simple way to replace a single character at a string with a substring, without having to do all this iterating. Am I wrong here?
String[][] arr = {{"1", "one"},
{"5", "five"}};
String str = "String5";
for(String[] a: arr) {
str = str.replace(a[0], a[1]);
}
System.out.println(str);
This would help you to replace multiple words with different text.
Alternatively, you could use chained replace calls for this, e.g.:
str = str.replace("1", "One").replace("5", "Five");
Check this much better approach : Java Replacing multiple different substring in a string at once (or in the most efficient way)
You can do
string = string.replace("1", "one");
Don't use replaceAll, because that replaces based on regular expression matches (so that you have to be careful about special characters in the pattern, not a problem here).
Despite the name, replace also replaces all occurrences.
Since Strings are immutable, be sure to assign the result value somewhere.
Try the below:
string = string.replace("1", "one");
string = string.replace("5", "five");
.replace replaces all occurrences of the given string with the specified string, and is quite useful.
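To illustrate the difference between replace and replaceAll mentioned in the answers, here is a small sketch. The class and method names (ReplaceDemo, spellOut) are made up for this example; only String.replace and String.replaceAll come from the standard library.

```java
public class ReplaceDemo {
    // Chained literal replacements; each call returns a new String.
    static String spellOut(String s) {
        return s.replace("1", "One").replace("5", "Five");
    }

    public static void main(String[] args) {
        System.out.println(spellOut("This Str1ng i5 fun")); // This StrOneng iFive fun

        // replace treats "." literally; replaceAll treats it as the regex "any character".
        System.out.println("a.b".replace(".", "-"));    // a-b
        System.out.println("a.b".replaceAll(".", "-")); // ---
    }
}
```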

Scanning 2 Different Data Types Java

I have a data file that is a list of names followed by "*****" and then continues with integers. How do I scan the names and then break with the asterisks, followed by scanning the integers?
This question might help : Splitting up data file in Java Scanner
Use the Scanner.useDelimiter() method with "*****" as the delimiter. The argument is a regular expression, so the asterisks have to be escaped, like this for example:
sc.useDelimiter("\\*{5}");
Alternatively:
Read the whole string.
Split the string using String.split() (which also takes a regular expression).
The resulting String array will contain the names at index 0 and the integers at index 1.
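The split-based alternative could look roughly like this. It is a sketch under some assumptions not stated in the question: the whole file has already been read into one String, and names and integers are single tokens separated by whitespace. The class and method names are hypothetical. Pattern.quote is used so that the asterisks are matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class NamesAndNumbers {
    // Splits one section into its whitespace-separated tokens.
    static List<String> tokens(String section) {
        List<String> out = new ArrayList<>();
        for (String t : section.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Everything before the first "*****" is treated as names.
    static List<String> names(String data) {
        return tokens(data.split(Pattern.quote("*****"), 2)[0]);
    }

    // Everything after the first "*****" is treated as integers.
    static List<Integer> numbers(String data) {
        String[] halves = data.split(Pattern.quote("*****"), 2);
        List<Integer> out = new ArrayList<>();
        if (halves.length > 1) {
            for (String t : tokens(halves[1])) {
                out.add(Integer.parseInt(t));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "Alice\nBob\n*****\n3\n14\n";
        System.out.println(names(data));   // [Alice, Bob]
        System.out.println(numbers(data)); // [3, 14]
    }
}
```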
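The code below should work for you:
// input is the text to scan (a Scanner can also be built from a File)
// Treat any run of whitespace and/or asterisks as the delimiter
Scanner scanner = new Scanner(input).useDelimiter("[\\s*]+");
while (scanner.hasNext()) {
    if (scanner.hasNextInt()) {
        int number = scanner.nextInt(); // For Integer
    } else {
        String name = scanner.next(); // For String
    }
}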
Although this seems a tedious thing, I think this would solve the issue without worrying if the split returns anything, and the out of bounds.
final String x = "abc****12354";
final Pattern p = Pattern.compile("[A-Z]*[a-z]*\\*{4}");
final Matcher m = p.matcher(x);
while (m.find()) {
System.out.println(m.group());
}
final Pattern p1 = Pattern.compile("\\*{4}[0-9]*");
final Matcher m1 = p1.matcher(x);
while (m1.find()) {
System.out.println(m1.group());
}
The first pattern match minus the last 4 stars (can be substring-ed out) and the second pattern match minus the leading 4 stars (also can be removed) would give the request fields.

Check the number of occurrences of word(s) stored in an ArrayList

I have big text such as :
really!!! Oh Oh! You read about them in a book and they told you to wear clothes? buahahahaham Did they also tell you how they were able to sew the leaves that they used to cover up? You amu
Also I have an arraylist of some words and expression such as really or oh oh!
Now I want to count the number of occurrence of the phrases (which is in the arraylist ) in the given text above or any similar text.
So for that I first split the text to words and start looping as follow:
String[] word=content.split("\\s+");
for(int j=0;j<word.length;j++){
if(sexuality.contains(word[j])){
swCount=sw+1;
}
But this does not work since the oh oh! or really cannot be picked by the above method. Can anyone help?
This counts the occurrences of any searchString in your input.
String input = "....";
List<String> searchStrings = Arrays.asList("oh oh!", "really");
int count = 0;
for (String searchString : searchStrings) {
int indexOf = input.indexOf(searchString);
while (indexOf > -1) {
count++;
indexOf = input.indexOf(searchString, indexOf+1);
}
}
If you want case insensitive search, convert both the input and the search words to lowercase. If you don't want to count words twice, replace the indexOf and the while loop with a simple contains:
int count = 0;
for (String searchString : searchStrings) {
if (input.contains(searchString)) {
count++;
}
}
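If you have something like god in your blacklist and don't want to match goddamn in input (for whatever reason), you need to make sure there are word boundaries around your search word. Keep in mind that \b only matches between a word character and a non-word character, so a phrase that ends in punctuation, such as "oh oh!", will not match when it is followed by a space. Have a look at this code: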
int count = 0;
for (String searchString : searchStrings) {
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(searchString) + "\\b");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
count++;
}
}
I also don't understand exactly: is the problem that "oh oh!" should be one word? or is "!" the problem? Anyway, consider overriding "Equals" in ArrayList (I assume "sexuality" is your arraylist) to fit your needs. Check out this post:
ArrayList's custom Contains method
The brute-force approach is to insert all strings of the sexuality list into a HashMap and then look up each substring of content in the map. You can limit the length of the substrings to the length of the longest entry in the sexuality list. However, this could be really expensive, depending on the length of content and the length of the longest entry in sexuality.
For a smarter approach you should have a look at another data structure, the trie.
An implementation is available in the Apache Commons Collections 4 library. This approach is much faster because it lets you stop scanning a substring as soon as you find a prefix that doesn't exist in your dictionary (in your case the sexuality list).
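As a rough illustration of the trie idea, here is a minimal hand-written sketch. It does not use the Apache Commons implementation, and PhraseTrie and its methods are hypothetical names. It lower-cases everything for a case-insensitive search and, like the indexOf approaches, also counts matches inside longer words.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseTrie {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean terminal; // true if a phrase ends at this node
    }

    private final Node root = new Node();

    void add(String phrase) {
        Node node = root;
        for (char c : phrase.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    // Counts every occurrence of every stored phrase in text.
    int countOccurrences(String text) {
        int count = 0;
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.next.get(text.charAt(i));
                if (node == null) {
                    break; // no phrase has this prefix, stop scanning early
                }
                if (node.terminal) {
                    count++;
                }
            }
        }
        return count;
    }

    static int count(List<String> phrases, String text) {
        PhraseTrie trie = new PhraseTrie();
        for (String p : phrases) {
            trie.add(p.toLowerCase());
        }
        return trie.countOccurrences(text.toLowerCase());
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList("oh oh!", "really");
        System.out.println(count(phrases, "really!!! Oh Oh! You read really")); // 3
    }
}
```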
If your "sentence" is not too big and your List doesn't contain too many items, I would go the easy way and do it like this:
String sentence = "Here is my my sentence";
List<String> searchList = new ArrayList<>();
searchList.add("is");
searchList.add("my");
int occurences[] = new int[searchList.size()];
for (int i = 0; i < searchList.size(); i++) {
int searchFromPos = 0;
String wordToSearch = searchList.get(i);
while ((searchFromPos = sentence.indexOf(wordToSearch, searchFromPos)) != -1) {
occurences[i]++;
searchFromPos += wordToSearch.length();
}
}
NOTE, however, that this will also detect word parts.
E.g. when your sentence is "This is sneaky" and you search for "is", there will be two results, because "This" also contains "is".

Java - separate numbers from a string

I have a string that contains a few numbers (usually a date) and separators. The separators can be either "," or "."; for example, 01.05,2000.5000
Now I need to separate those numbers and put them into an array, but I'm not sure how to do the separating part. I also need to check that the string is valid: it cannot be 01.,05.
I'm not asking anyone to solve this for me (though I'd appreciate it if someone wants to); just point me in the right direction :)
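This is a way of doing it with the StringTokenizer class. By default StringTokenizer silently skips repeated separators, so pass true as the third argument to get the separators back as tokens; two separators in a row (or one at the start or end) then mean the String is invalid. Convert the remaining tokens to integers with the parseInt method to check that they are valid integer numbers:
import java.util.*;
public class t {
    public static void main(String... args) {
        String line = "01.05,2000.5000";
        // returnDelims = true, otherwise repeated separators would be skipped silently
        StringTokenizer strTok = new StringTokenizer(line, ",.", true);
        List<Integer> values = new ArrayList<Integer>();
        boolean expectNumber = true; // a valid String alternates number, separator, number, ...
        while (strTok.hasMoreTokens()) {
            String s = strTok.nextToken();
            if (s.equals(",") || s.equals(".")) {
                if (expectNumber) {
                    // Found a leading or repeated separator, String is not valid, do something about it
                }
                expectNumber = true;
                continue;
            }
            expectNumber = false;
            try {
                int value = Integer.parseInt(s, 10);
                values.add(value);
            } catch (NumberFormatException e) {
                // Number not valid, do something about it or continue the parsing
            }
        }
        if (expectNumber) {
            // Empty String or trailing separator, String is not valid, do something about it
        }
        // At the end, get an array from the ArrayList
        Integer[] arrayOfValues = values.toArray(new Integer[values.size()]);
        for (Integer i : arrayOfValues) {
            System.out.println(i);
        }
    }
}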
Iterate through an String#split(regex) generated array and check each value to make sure your source String is "valid".
In:
String src = "01.05,2000.5000";
String[] numbers = src.split("[.,]");
numbers here will be an array of Strings, like {"01", "05", "2000", "5000"}. Each value is a number.
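Now iterate over numbers. If you find an element that is not a number (it is a number when numbers[i].matches("\\d+") is true), then your src is invalid. Note that split drops trailing empty strings by default, so a trailing separator goes unnoticed unless you call src.split("[.,]", -1).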
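Put together, the split-and-validate idea might look like the sketch below. The class name and the choice to throw IllegalArgumentException on invalid input are assumptions made for this example; the question does not say how invalid input should be handled.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparatedNumbers {
    // Splits on "." or "," and rejects empty or non-numeric pieces.
    static List<Integer> parse(String src) {
        List<Integer> values = new ArrayList<>();
        // The limit -1 keeps trailing empty strings, so "01.05." is rejected too.
        for (String part : src.split("[.,]", -1)) {
            if (!part.matches("\\d+")) {
                throw new IllegalArgumentException("Invalid input: " + src);
            }
            values.add(Integer.parseInt(part));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("01.05,2000.5000")); // [1, 5, 2000, 5000]
        try {
            parse("01.,05.");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Invalid input: 01.,05.
        }
    }
}
```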
If possible, I would use the Guava Splitter for that. It is much more reliable, predictable and flexible than String#split. You can tell it exactly what to expect, what to omit, and so on.
For an example usage, and a small rant on how stupidly Java's split sometimes behaves, have a look here: http://code.google.com/p/guava-libraries/wiki/StringsExplained#Splitter
Use regex to group and match the input
String s = "01.05,2000.5000";
Pattern pattern = Pattern.compile("(\\d{2})[.,](\\d{2})[.,](\\d{4})[.,](\\d{4})");
Matcher m = pattern.matcher(s);
if(m.matches()) {
String[] matches = { m.group(1),m.group(2), m.group(3),m.group(4) };
for(String match : matches) {
System.out.println(match);
}
} else {
System.err.println("Mismatch");
}
Try this:
String str = "01.05,2000.5000";
str = str.replace(".",",");
int number = StringUtils.countMatches(str, ",");
String[] arrayStr = new String[number+1];
arrayStr = str.split(",");
StringUtils is from Apache Commons >> http://commons.apache.org/proper/commons-lang/
To validate:
if (input.matches("^(?!.*[.,]{2})[\\d.,]+"))
This regex checks that:
dot and comma are never adjacent
input is comprised only of digits, dots and commas
To split:
String[] numbers = input.split("[.,]");
To separate the string, use split(); the argument of the method is the delimiter, interpreted as a regular expression:
array = string.split("separator");

Java regex to filter phone numbers

I have following example string that needs to be filtered
0173556677 (Alice), 017545454 (Bob)
This is how phone numbers are added to a text view. I want the text to look like that
0173556677;017545454
Is there a way to change the text using a regular expression? What would such an expression look like? Or do you recommend another method?
You can do as follows:
String orig = "0173556677 (Alice), 017545454 (Bob)";
String regex = " \\(.+?\\)";
String res = orig.replaceAll(regex, "") // remove all content in parentheses
        .replaceAll(",", ";"); // replace comma with semicolon
Use the expression in android.util.Patterns
Access the static variable
Patterns.PHONE
or use this expression here (Android Source Code)
Here's a resource that can guide you :
http://www.zparacha.com/validate-email-ssn-phone-number-using-java-regular-expression/
This solution works with phone numbers separated with any string that does not contain numbers:
String orig = "0173556677 (Alice), 017545454 (Bob)";
String[] numbers = orig.split("\\D+"); //split at everything that is not a digit
StringBuilder sb = new StringBuilder();
if (numbers.length > 0) {
sb.append(numbers[0]);
for (int i = 1; i < numbers.length; i++) { //concatenate all that is left
sb.append(";");
sb.append(numbers[i]);
}
}
String res = sb.toString();
or, with com.google.common.base.Joiner:
String[] numbers = orig.split("\\D+"); //split at everything that is not a digit
String res = Joiner.on(";").join(numbers);
PS. There is a minor deviation from the requirements in the top-voted example: it should be replaceAll(", ", ";"), with a space after the comma (or a \\s), but it seems I cannot make an edit that adds just one character, and I do not want to mess with somebody else's code.
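For completeness, the same split-and-join idea can be sketched without Guava by using String.join, which is part of the standard library since Java 8. The class and method names are hypothetical, and the sketch assumes that the names in parentheses contain no digits.

```java
import java.util.ArrayList;
import java.util.List;

public class PhoneNumbers {
    static String extract(String orig) {
        List<String> numbers = new ArrayList<>();
        // Split at every run of non-digits; skip the empty leading piece
        // that appears when the text does not start with a digit.
        for (String part : orig.split("\\D+")) {
            if (!part.isEmpty()) {
                numbers.add(part);
            }
        }
        return String.join(";", numbers);
    }

    public static void main(String[] args) {
        System.out.println(extract("0173556677 (Alice), 017545454 (Bob)")); // 0173556677;017545454
    }
}
```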
