Java: Parsing a string based on delimiter


I have to design an interface that fetches data from a machine and then plots it. The fetch part is already done, and it returns a string of the format A&B#.13409$13400$13400$13386$13418$13427$13406$13383$13406$13412$13419$00000$00000$
The first five characters, A&B#., are the identifier. Please note that the fifth character (shown here as .) is a line feed, i.e. ASCII 0x0A.
The function I have written -
public static boolean checkStart(String str, String startStr) {
    String initials = str.substring(0, 5);
    System.out.println("Here is start: " + initials);
    return startStr.equals(initials);
}
shows Here is start: A&B#. which is correct.
Question 1:
Why do I need str.substring(0,5)? When I use str.substring(0,4) it shows only Here is start: A&B#, i.e. the line feed is missing. Why does the line feed make this difference?
Likewise, to extract the remaining string I have to use s.substring(5,s.length()) instead of s.substring(6,s.length()),
i.e.
s.substring(6,s.length()) produces 3409$13400$13400$13386$13418$13427$13406$13383$13406$13412$13419$00000$00000$, which is missing the first character after the identifier A&B#.
Question 2:
My parsing function is:
public static String[] StringParser(String str, String del) {
    String[] sParsed = str.split(del);
    for (int i = 0; i < sParsed.length; i++) {
        System.out.println(sParsed[i]);
    }
    return sParsed;
}
It parses correctly for a string such as String s = "A&B#.13409/13400/13400/13386/13418/13427/13406/13383/13406/13412/13419/00000/00000/"; when called as String[] tokens = StringParser(rightChannelString,"/");
But for a string such as String s = "A&B#.13409$13400$13400$13386$13418$13427$13406$13383$13406$13412$13419$00000$00000$", the call String[] tokens = StringParser(rightChannelString,"$"); does not split the string at all.
I cannot figure out the reason for this behaviour. Can anyone please let me know the solution?
Thanks

Regarding question 1, the java API says that the substring method takes 2 parameters:
beginIndex the begin index, inclusive.
endIndex the end index, exclusive.
So in your example
String: A&B#.134
Index: 01234567
substring(0,4) covers indexes 0 to 3, so it returns A&B#. The line feed sits at index 4, which is why you have to pass 5 as the second parameter to include it. For the same reason the data starts at index 5, not 6.
Regarding question 2, the split method takes a regular expression as its parameter, and $ is a special character in regular expressions. To match a literal dollar sign you have to escape it with the \ character (and since \ is itself special in Java string literals, you must escape that too):
String[] tokens = StringParser(rightChannelString,"\\$");

Q1: review the description of substring in the documentation:
Returns a new string that is a substring of this string.
The substring begins at the specified beginIndex and extends to the
character at index endIndex - 1. Thus the length of the substring
is endIndex-beginIndex.
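Q2: the split method takes a regular expression for the separator. $ is a special character in regular expressions: it matches the end of the input (or of a line in multiline mode), so it never matches the literal dollar signs in your data. Escape it as "\\$" to split on the character itself.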
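To make both answers concrete, here is a minimal sketch. The class name FrameParser and its helper names are made up for illustration, and the sample frame is a shortened version of the one in the question with the fifth identifier character written as \n.

```java
import java.util.regex.Pattern;

class FrameParser {
    // Length of the identifier: "A&B#" plus the trailing line feed.
    static final int ID_LENGTH = 5;

    // substring(0, 5) returns indexes 0..4 because the end index is exclusive.
    static String identifier(String frame) {
        return frame.substring(0, ID_LENGTH);
    }

    // Everything from index 5 onward, split on a literal dollar sign.
    static String[] values(String frame) {
        String payload = frame.substring(ID_LENGTH);
        // Pattern.quote("$") has the same effect as writing "\\$" by hand.
        return payload.split(Pattern.quote("$"));
    }

    public static void main(String[] args) {
        String frame = "A&B#\n13409$13400$13386$00000$";
        System.out.println(identifier(frame).equals("A&B#\n")); // true
        for (String value : values(frame)) {
            System.out.println(value);
        }
        // An unescaped "$" is the end-of-input anchor, so nothing is split.
        System.out.println(frame.substring(ID_LENGTH).split("$").length); // 1
    }
}
```

Pattern.quote is convenient when the delimiter arrives as a parameter, as in StringParser, because the caller then does not need to know which characters are regex metacharacters.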

Related

Empty Strings within a non empty String [duplicate]

I'm confused by this code:
public class StringReplaceWithEmptyString
{
    public static void main(String[] args)
    {
        String s1 = "asdfgh";
        System.out.println(s1);
        s1 = s1.replace("", "1");
        System.out.println(s1);
    }
}
And the output is:
asdfgh
1a1s1d1f1g1h1
So my first guess was that every character in a String has an empty String "" on both sides. But if that were the case, there should be two '1's after 'a' in the second line of output (one for the end of 'a' and one for the start of 's').
I then checked whether a String is represented as a char[] (see In Java, is a String an array of chars? and String representation in Java), and the answer was yes.
So I tried to assign an empty character '' to a char variable, but it gives a compiler error:
Invalid character constant
The same compiler error occurs when I try it in a char[]:
char[] c = {'','a','','s'}; // compile-time error
So I'm confused about three things.
How is an empty String represented by a char[]?
Why am I getting that output for the above code?
How is the String s1 represented as a char[] when it is first initialized?
Sorry if I'm wrong in any part of my question.

Just adding some more explanation to Tim Biegeleisen's answer.
As of Java 8, the code of the replace method in the java.lang.String class is
public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Here you can clearly see that the replacement is done by a regex Pattern and Matcher. The empty target "" produces a zero-length match, and such a match is found at every position: before the first character, between each pair of adjacent characters, and after the last character.
So, behind the scenes, your code is executed as follows:
Pattern.compile("".toString(), Pattern.LITERAL).matcher("asdfgh").replaceAll(Matcher.quoteReplacement("1".toString()));
The output then becomes
1a1s1d1f1g1h1

Going with Andy Turner's great comment, your call to String#replace() is implemented on top of the regex engine (in Java 8, through Matcher#replaceAll()). As such, there is a regex replacement happening here. The matches occur before the first character, in between each pair of characters in the string, and after the last character.
^|a|s|d|f|g|h|$
Here ^, $, and every pipe mark a position where the empty string "" matches.
The match you are making is a zero-length match. In Java's regex implementation, this behaves as the example above shows, namely matching at each inter-character position and at the positions before the first and after the last characters.
Here is a reference which discusses zero length matches in more detail: http://www.regexguru.com/2008/04/watch-out-for-zero-length-matches/
A zero-width or zero-length match is a regular expression match that does not match any characters. It matches only a position in the string. E.g. the regex \b matches between the 1 and , in 1,2.

This is because replace() runs a regex match on the target you pass to it:
public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
The Javadoc for the method says:
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
Parameters:
target - The sequence of char values to be replaced
replacement - The replacement sequence of char values
Returns: The resulting string
Throws: NullPointerException if target or replacement is null.
Since: 1.5
Please read more at the link below (and also browse through the source code).
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java#String.replace%28java.lang.CharSequence%2Cjava.lang.CharSequence%29
An empty pattern such as "" matches at every possible position in a string. In this case that means the position at the start, the position after every character, and therefore also the one at the end.
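The behaviour described in these answers can be checked with a small sketch. The class name EmptyMatchDemo and the helper markPositions are hypothetical names chosen for this example; the calls themselves are plain JDK methods.

```java
class EmptyMatchDemo {
    // Replaces the empty string, which matches at every position of s.
    static String markPositions(String s, String marker) {
        return s.replace("", marker);
    }

    public static void main(String[] args) {
        String s = "asdfgh";
        // 6 characters give 7 positions: one before, five between, one after.
        System.out.println(markPositions(s, "1")); // 1a1s1d1f1g1h1
        // The empty string itself is backed by a zero-length array.
        System.out.println("".toCharArray().length); // 0
        // The characters of s are stored without any "empty" entries.
        System.out.println(s.toCharArray().length); // 6
    }
}
```

This also addresses the char[] part of the question: there is no empty char, and the empty positions are never stored; they are simply the gaps between indexes that a zero-length match can sit in.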

How to remove all characters before a specific character in Java?

I have a string whose value comes from an HTML form, so it arrives as part of a URL. I want to remove all the characters before a specific character, which is =, and I also want to remove that character itself. I only want to keep the value that comes after =, because I need to read that value from the variable.
EDIT: I need to remove the = too, since I'm trying to get the characters/value in the string after it.

You can use .substring():
String s = "the text=text";
String s1 = s.substring(s.indexOf("=") + 1);
s1 = s1.trim();
s1 then contains everything after = in the original string. If the string contains no =, indexOf returns -1 and s1 is the whole string.
Strings are immutable, so the result of trim() has to be assigned back; calling s1.trim() on its own has no effect.
.trim() removes leading whitespace (everything before the first non-whitespace character) and trailing whitespace (everything after the last one).

While there are many answers, here is a regex example:
String test = "eo21jüdjüqw=realString";
test = test.replaceAll(".+=", "");
System.out.println(test);
// prints realString
Explanation:
. matches any character (except for line terminators)
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
= matches the character = literally (case sensitive)
Because .+ is greedy, this removes everything up to and including the last = in the string; use replaceFirst(".+?=", "") to stop at the first one.
The explanation is copied from https://regex101.com/, where you can try regexes out.

You can split the string at the = into an array and take the second element, which holds the part after the = sign.
For example:
String currentString = "Fruit = they taste good";
String[] separated = currentString.split("=");
String before = separated[0]; // contains "Fruit " (with the trailing space)
String after = separated[1]; // contains " they taste good" (with the leading space)
separated[1] then contains everything after = in the original string, provided the value itself contains no further =; call trim() on it to drop the surrounding spaces.

I know this is asked about Java but this seems to also be the first search result for Kotlin so you should know that Kotlin has the String.substringAfter(delimiter: String, missingDelimiterValue: String = this) extension for this case.
Its implementation is:
val index = indexOf(delimiter)
return if (index == -1)
missingDelimiterValue
else
substring(index + delimiter.length, length)

You could locate the first occurrence of the character in the URL string. For example:
String URL = "http://test.net/demo_form.asp?name1=stringTest";
int index = URL.indexOf("=");
Then take the substring that starts right after that index:
String Result = URL.substring(index+1); //index+1 to skip =
Result now contains the value: stringTest

If you use the Apache Commons Lang3 library, you can also use the substringAfter method of the StringUtils utility class; see its official Javadoc for details.
Examples:
String value = StringUtils.substringAfter("key=value", "=");
// when the value is surrounded by spaces (e.g. read from a file instead of query params)
String trimmed = StringUtils.trimToEmpty(StringUtils.substringAfter("key = value", "=")); // = "value"
It handles values that contain the '=' character, since it splits at the first occurrence.
If your keys can also contain the '=' character it will not work (nor will the other methods); in URL query params such a character should be escaped anyway.
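For readers who do not want an extra dependency, the same idea can be sketched with the JDK alone. The class AfterDelimiter and its method are hypothetical names, and returning an empty string when the delimiter is missing is an assumed policy for this example (the Kotlin extension above defaults to returning the original string instead).

```java
class AfterDelimiter {
    // Returns the text after the first occurrence of delimiter,
    // or an empty string when the delimiter is absent (assumed policy).
    static String substringAfter(String s, String delimiter) {
        int index = s.indexOf(delimiter);
        if (index == -1) {
            return "";
        }
        return s.substring(index + delimiter.length());
    }

    public static void main(String[] args) {
        System.out.println(substringAfter("name1=stringTest", "="));      // stringTest
        System.out.println(substringAfter("key = value", "=").trim());    // value
        System.out.println(substringAfter("a=b=c", "="));                 // b=c
        System.out.println(substringAfter("noDelimiter", "=").isEmpty()); // true
    }
}
```

Checking for -1 explicitly matters: without it, substring(index + 1) would silently return the whole input when no = is present.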

How do I extract the second occurrence of a character?

How do I extract '1358751074-6824' from this
http://api.discogs.com/images/R-1169056-1358751074-6824.jpeg
It also needs to extract '13587510746824' from this
http://api.discogs.com/images/R-1169056-13587510746824.jpeg
I thought I could do it by taking the substring from the second - of the last path component up to the final dot, but how do I find the position of the second -?

Depending on the allowed variations of the string, you could do something like:
String extract = s.replaceAll(".*?-.*?-([\\d-]+).*", "$1");
.*?- skips everything up to the first hyphen
.*?- skips everything up to the second hyphen
([\\d-]+) is the part you want to keep: digits and hyphens
.* skips the rest of the string

You can work out the position of the second dash without regular expressions - by finding the position of the first dash, and working from there:
int pos = str.indexOf('-', str.indexOf('-')+1);

You can try something like this:
// Your original String
String str = "http://api.discogs.com/images/R-1169056-1358751074-6824.jpeg";
// Find the second-to-last dash
int i = str.lastIndexOf("-", str.lastIndexOf("-") - 1);
// Extract the value you want (everything between that dash and the last dot)
String newStr = str.substring(i + 1, str.lastIndexOf("."));
// Keep the digits only
String strNums = newStr.replaceAll("[^0-9]+", "");
Note that this counts dashes from the end, so it suits the first URL (three dashes) but not the second one, which has only two.
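Putting the indexOf approach together, the following sketch handles both URLs from the question. ImageIdExtractor and extractId are hypothetical names, and the code assumes the file name always contains at least two dashes and an extension.

```java
class ImageIdExtractor {
    // Returns the part of the file name after its second '-' and before
    // the extension. Assumes at least two dashes and one dot in the name.
    static String extractId(String url) {
        int nameStart = url.lastIndexOf('/') + 1;          // start of last path component
        int firstDash = url.indexOf('-', nameStart);       // first '-' in the file name
        int secondDash = url.indexOf('-', firstDash + 1);  // search resumes after the first
        int lastDot = url.lastIndexOf('.');                // start of the extension
        return url.substring(secondDash + 1, lastDot);
    }

    public static void main(String[] args) {
        System.out.println(extractId(
                "http://api.discogs.com/images/R-1169056-1358751074-6824.jpeg")); // 1358751074-6824
        System.out.println(extractId(
                "http://api.discogs.com/images/R-1169056-13587510746824.jpeg"));  // 13587510746824
    }
}
```

Counting dashes from the start of the file name, rather than from the end of the string, is what lets the same code cope with the optional third dash.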

split a string in java into equal length substrings while maintaining word boundaries

How to split a string into equal parts of maximum character length while maintaining word boundaries?
Say, for example, if I want to split a string "hello world" into equal substrings of maximum 7 characters it should return me
"hello "
and
"world"
But my current implementation returns
"hello w"
and
"orld "
I am using the following code, taken from Split string to equal length substrings in Java, to split the input string into equal parts:
public static List<String> splitEqually(String text, int size) {
    // Give the list the right capacity to start with. You could use an array
    // instead if you wanted.
    List<String> ret = new ArrayList<String>((text.length() + size - 1) / size);
    for (int start = 0; start < text.length(); start += size) {
        ret.add(text.substring(start, Math.min(text.length(), start + size)));
    }
    return ret;
}
Is it possible to maintain word boundaries while splitting the string into substrings?
To be more specific, I need the splitting algorithm to respect the word boundaries given by spaces rather than rely solely on character length. The length still matters, but as a maximum number of characters per part rather than a hard-coded size.

If I understand your problem correctly, this code should do what you need (it assumes that maxLenght is greater than or equal to the length of the longest word):
String data = "Hello there, my name is not importnant right now."
        + " I am just simple sentecne used to test few things.";
int maxLenght = 10;
Pattern p = Pattern.compile("\\G\\s*(.{1,"+maxLenght+"})(?=\\s|$)", Pattern.DOTALL);
Matcher m = p.matcher(data);
while (m.find())
    System.out.println(m.group(1));
Output:
Hello
there, my
name is
not
importnant
right now.
I am just
simple
sentecne
used to
test few
things.
Short (or not) explanation of the "\\G\\s*(.{1,"+maxLenght+"})(?=\\s|$)" regex:
(Remember that in Java \ is special not only in regex but also in String literals, so to use a predefined character class like \d we have to write it as "\\d", escaping the \ in the string literal as well.)
\G - an anchor representing the end of the previous match or, if there is no match yet (when we have just started searching), the beginning of the string (as ^ does)
\s* - zero or more whitespace characters (\s represents whitespace, * is the "zero-or-more" quantifier)
(.{1,"+maxLenght+"}) - let's split this into parts (at runtime maxLenght will hold some numeric value like 10, so the regex engine will see .{1,10}):
. represents any character (by default any character except line separators like \n or \r, but thanks to the Pattern.DOTALL flag it matches those too; you may drop this flag if you want each line to be split separately, since its start will be printed on a new line anyway)
{1,10} - a quantifier that lets the preceding element appear 1 to 10 times (by default it tries to find the maximal number of repetitions)
.{1,10} - so, based on the above, it simply represents "1 to 10 of any characters"
( ) - parentheses create groups, structures that let us retrieve specific parts of a match (here the group starts after \\s* because we only want the part after the whitespace)
(?=\\s|$) - a look-ahead that makes sure the text matched by .{1,10} is followed by:
whitespace (\\s)
OR (written as |)
the end of the string ($).
So thanks to .{1,10} we can match up to 10 characters, and thanks to the (?=\\s|$) after it we require that the last character matched by .{1,10} is not in the middle of a word (there must be whitespace or the end of the string after it).
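The regex explained above can be wrapped in a method with the same shape as the asker's splitEqually. This is only a sketch: WordBoundarySplitter and its split method are hypothetical names, and it keeps the answer's assumption that no single word is longer than the maximum length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundarySplitter {
    // Splits text into chunks of at most maxLength characters, breaking
    // only at whitespace. Assumes no word is longer than maxLength.
    static List<String> split(String text, int maxLength) {
        Pattern p = Pattern.compile(
                "\\G\\s*(.{1," + maxLength + "})(?=\\s|$)", Pattern.DOTALL);
        Matcher m = p.matcher(text);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group(1)); // group 1 excludes the leading whitespace
        }
        return parts;
    }

    public static void main(String[] args) {
        // "hello w" would fit in 7 characters but ends mid-word,
        // so the match backtracks to "hello".
        System.out.println(split("hello world", 7)); // [hello, world]
    }
}
```

Unlike the output requested in the question, the chunks carry no trailing space, because the whitespace between chunks is consumed by \s* outside the captured group.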

Non-regex solution, just in case someone is more comfortable (?) not using regular expressions:
private String justify(String s, int limit) {
    StringBuilder justifiedText = new StringBuilder();
    StringBuilder justifiedLine = new StringBuilder();
    String[] words = s.split(" ");
    for (int i = 0; i < words.length; i++) {
        justifiedLine.append(words[i]).append(" ");
        if (i+1 == words.length || justifiedLine.length() + words[i+1].length() > limit) {
            justifiedLine.deleteCharAt(justifiedLine.length() - 1);
            justifiedText.append(justifiedLine.toString()).append(System.lineSeparator());
            justifiedLine = new StringBuilder();
        }
    }
    return justifiedText.toString();
}
Test:
String text = "Long sentence with spaces, and punctuation too. And supercalifragilisticexpialidocious words. No carriage returns, tho -- since it would seem weird to count the words in a new line as part of the previous paragraph's length.";
System.out.println(justify(text, 15));
Output:
Long sentence
with spaces,
and punctuation
too. And
supercalifragilisticexpialidocious
words. No
carriage
returns, tho --
since it would
seem weird to
count the words
in a new line
as part of the
previous
paragraph's
length.
It takes into account words that are longer than the set limit, so it doesn't skip them (unlike the regex version, which just stops processing when it reaches supercalifragilisticexpialidocious).
PS: The comment about all input words being expected to be shorter than the set limit was made after I came up with this solution ;)

having trouble with arrays and maybe split

String realstring = "&&&.&&&&";
Double value = 555.55555;
String[] arraystring = realstring.split(".");
String stringvalue = String.valueOf(value);
String[] valuearrayed = stringvalue.split(".");
System.out.println(arraystring[0]);
Sorry if it looks bad; I retyped it on my phone. I keep getting ArrayIndexOutOfBoundsException: 0 at the System.out.println line. I have looked and can't figure it out. Thanks for the help.

split() takes a regexp as argument, not a literal string. You have to escape the dot:
string.split("\\.");
or
string.split(Pattern.quote("."));
Or you could also simply use indexOf('.') and substring() to get the two parts of your string.
And if the goal is to get the integer part of a double, you could also simply use
long truncated = (long) doubleValue;

split takes a regex as its parameter, and in regex . means "any character except line separators", so you might expect "a.bc".split(".") to create an array of empty strings like ["","","","",""]. The only reason that does not happen is (from the split Javadoc):
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
So because all the strings are empty you get an empty array (and that is why you see the ArrayIndexOutOfBoundsException).
To turn off this removal you have to use the split(regex, limit) version with a negative limit.
To split on a literal . you need to escape it as \. (which in Java has to be written as "\\." because \ is also a metacharacter in String literals), write it as [.], or use another regex quoting mechanism.
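The cases described above can be reproduced with a short sketch. SplitDotDemo and splitOnDot are hypothetical names used only for this example.

```java
import java.util.Arrays;

class SplitDotDemo {
    // Splits on a literal dot by escaping the regex metacharacter.
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    public static void main(String[] args) {
        String s = "a.bc";
        // Every character is a delimiter; all pieces are empty and get dropped.
        System.out.println(s.split(".").length);             // 0
        // A negative limit keeps the trailing empty strings.
        System.out.println(s.split(".", -1).length);         // 5
        // Escaped dot and character class both match a literal '.'.
        System.out.println(Arrays.toString(splitOnDot(s)));  // [a, bc]
        System.out.println(Arrays.toString(s.split("[.]"))); // [a, bc]
    }
}
```

The first line is the situation from the question: the array has length 0, so reading element 0 throws ArrayIndexOutOfBoundsException.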

Dot (.) is a special character in regular expressions, so you need to escape it.
String realstring = "&&&.&&&&";
String[] partsOfString = realstring.split("\\.");
String part1 = partsOfString[0];
String part2 = partsOfString[1];
System.out.println(part1);
This will print the expected result of
&&&
It's also handy to test whether the given string contains this character before splitting. You can do this as follows:
if (string.contains(".")) {
    // Split it.
} else {
    throw new IllegalArgumentException("String " + string + " does not contain .");
}
