Finding the strings in a TreeSet that start with a given prefix

I'm trying to find the strings in a TreeSet<String> that start with a given prefix. I found a previous question asking for the same thing — Searching for a record in a TreeSet on the fly — but the answer given there doesn't work for me, because it assumes that the strings don't include Character.MAX_VALUE, and mine can.
(The answer there is to use treeSet.subSet(prefix, prefix + Character.MAX_VALUE), which gives all strings between prefix (inclusive) and prefix + Character.MAX_VALUE (exclusive), which comes out to all strings that start with prefix except those that start with prefix + Character.MAX_VALUE. But in my case I need to find all strings that start with prefix, including those that start with prefix + Character.MAX_VALUE.)
How can I do this?

To start with, I suggest re-examining your requirements. Character.MAX_VALUE is U+FFFF, which is not a valid Unicode character and never will be; so I can't think of a good reason why you would need to support it.
But if there's a good reason for that requirement, then — you need to "increment" your prefix to compute the least string that's greater than all strings starting with your prefix. For example, given "city", you need "citz". You can do that as follows:
/**
* @param prefix
* @return The least string that's greater than all strings starting with
* prefix, if one exists. Otherwise, returns Optional.empty().
* (Specifically, returns Optional.empty() if the prefix is the
* empty string, or is just a sequence of Character.MAX_VALUE-s.)
*/
private static Optional<String> incrementPrefix(final String prefix) {
final StringBuilder sb = new StringBuilder(prefix);
// remove any trailing occurrences of Character.MAX_VALUE:
while (sb.length() > 0 && sb.charAt(sb.length() - 1) == Character.MAX_VALUE) {
sb.setLength(sb.length() - 1);
}
// if the prefix is empty, then there's no upper bound:
if (sb.length() == 0) {
return Optional.empty();
}
// otherwise, increment the last character and return the result:
sb.setCharAt(sb.length() - 1, (char) (sb.charAt(sb.length() - 1) + 1));
return Optional.of(sb.toString());
}
To use it, you need to use subSet when the above method returns a string, and tailSet when it returns nothing:
/**
* @param allElements - a SortedSet of strings. This set must use the
* natural string ordering; otherwise this method
* may not behave as intended.
* @param prefix
* @return The subset of allElements containing the strings that start
* with prefix.
*/
private static SortedSet<String> getElementsWithPrefix(
final SortedSet<String> allElements, final String prefix) {
final Optional<String> endpoint = incrementPrefix(prefix);
if (endpoint.isPresent()) {
return allElements.subSet(prefix, endpoint.get());
} else {
return allElements.tailSet(prefix);
}
}
See it in action at: http://ideone.com/YvO4b3.
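As a quick check of the two methods above, here is a small self-contained sketch. The class name PrefixSubsetDemo and the sample words are made up for illustration, and the two methods are repeated (slightly condensed) only so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical demo class; the name and the sample data are illustrative only.
class PrefixSubsetDemo {

    // Same logic as incrementPrefix above, repeated so this sketch is self-contained.
    static Optional<String> incrementPrefix(String prefix) {
        StringBuilder sb = new StringBuilder(prefix);
        // Drop trailing Character.MAX_VALUE characters; they cannot be incremented.
        while (sb.length() > 0 && sb.charAt(sb.length() - 1) == Character.MAX_VALUE) {
            sb.setLength(sb.length() - 1);
        }
        if (sb.length() == 0) {
            return Optional.empty(); // no upper bound exists
        }
        sb.setCharAt(sb.length() - 1, (char) (sb.charAt(sb.length() - 1) + 1));
        return Optional.of(sb.toString());
    }

    // Same logic as getElementsWithPrefix above; assumes natural string ordering.
    static SortedSet<String> getElementsWithPrefix(SortedSet<String> all, String prefix) {
        Optional<String> endpoint = incrementPrefix(prefix);
        return endpoint.isPresent()
                ? all.subSet(prefix, endpoint.get())
                : all.tailSet(prefix);
    }

    public static void main(String[] args) {
        String withMax = "city" + Character.MAX_VALUE + "hall";
        SortedSet<String> words = new TreeSet<>(Arrays.asList(
                "cit", "city", withMax, "cityscape", "citz", "town"));

        // "city", "cityscape" and the string containing Character.MAX_VALUE
        System.out.println(getElementsWithPrefix(words, "city").size());
        // only the string containing Character.MAX_VALUE
        System.out.println(getElementsWithPrefix(words, "city" + Character.MAX_VALUE).size());
        // the empty prefix has no upper bound, so tailSet returns everything
        System.out.println(getElementsWithPrefix(words, "").size());
    }
}
```

Note that "citz" itself is not returned for the prefix "city", because subSet excludes its upper endpoint.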

If anybody is looking for a shorter version of ruakh's answer:
The first element is set.ceiling(prefix). For the last one, increment the prefix and take set.lower(nextPrefix): lower is strictly less than its argument, so a string equal to the incremented prefix (which does not start with the prefix) is left out. This short version assumes the prefix does not end in Character.MAX_VALUE.
public NavigableSet<String> subSetWithPrefix(NavigableSet<String> set, String prefix) {
if(prefix.isEmpty())
return set; // every string starts with the empty prefix
String first = set.ceiling(prefix);
char[] chars = prefix.toCharArray();
chars[chars.length-1] = (char) (chars[chars.length-1]+1);
String last = set.lower(new String(chars));
if(first==null || last==null || last.compareTo(first)<0)
return new TreeSet<>();
return set.subSet(first, true, last, true);
}

Related

I need to parse integers after a specific character from list of strings

I've got a problem here. I need to get all the numbers from each string in a list of strings.
Let's say one of the strings in the list is "Jhon [B] - 14, 15, 16"
and the format of the strings is constant: every string has a maximum of 7 numbers in it, and the numbers are separated with ",". I want to get every number after the "-". I am really confused here; I tried everything I know of, but I am not even getting close.
public static List<String> readInput() {
final Scanner scan = new Scanner(System.in);
final List<String> items = new ArrayList<>();
while (scan.hasNextLine()) {
items.add(scan.nextLine());
}
return items;
}
public static void main(String[] args) {
final List<String> stats= readInput();
}
}

You could...
Just manually parse the String using things like String#indexOf and String#split (and String#trim)
String text = "Jhon [B] - 14, 15, 16";
int indexOfDash = text.indexOf("-");
if (indexOfDash < 0 || indexOfDash + 1 >= text.length()) {
return;
}
String trailingText = text.substring(indexOfDash + 1).trim();
String[] parts = trailingText.split(",");
// There's probably a really sweet and awesome
// way to use Streams, but the point is to try
// and keep it simple 😜
List<Integer> values = new ArrayList<>(parts.length);
for (int index = 0; index < parts.length; index++) {
values.add(Integer.parseInt(parts[index].trim()));
}
System.out.println(values);
which prints
[14, 15, 16]
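For comparison, the Streams variant hinted at in the comment could look roughly like the sketch below. The class and method names (NumbersAfterDash, parse) are made up for illustration, and it assumes the first "-" in the line is the separator, so a name that itself contains a dash would break it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; names are illustrative only.
class NumbersAfterDash {

    // Returns the integers listed after the first '-' in the line,
    // or an empty list if there is no dash or nothing after it.
    static List<Integer> parse(String text) {
        int indexOfDash = text.indexOf('-');
        if (indexOfDash < 0 || indexOfDash + 1 >= text.length()) {
            return Collections.emptyList();
        }
        return Arrays.stream(text.substring(indexOfDash + 1).split(","))
                .map(String::trim)
                .filter(part -> !part.isEmpty()) // skip blanks such as a trailing space
                .map(Integer::valueOf)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse("Jhon [B] - 14, 15, 16")); // prints [14, 15, 16]
    }
}
```

Integer.valueOf still throws NumberFormatException for a part that is not a number, which is probably what you want while the input format is supposed to be constant.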
You could...
Make use of a custom delimiter for Scanner for example...
String text = "Jhon [B] - 14, 15, 16";
Scanner parser = new Scanner(text);
parser.useDelimiter(" - ");
if (!parser.hasNext()) {
// This is an error
return;
}
// We know that the string has leading text before the "-"
parser.next();
if (!parser.hasNext()) {
// This is an error
return;
}
String trailingText = parser.next();
parser = new Scanner(trailingText);
parser.useDelimiter(", ");
List<Integer> values = new ArrayList<>(8);
while (parser.hasNextInt()) {
values.add(parser.nextInt());
}
System.out.println(values);
which prints...
[14, 15, 16]

Or you could use a method that extracts signed or unsigned whole or floating-point numbers from a string. The method below relies on String#replaceAll():
/**
* This method will extract all signed or unsigned Whole or floating point
* numbers from a supplied String. The numbers extracted are placed into a
* String[] array in the order of occurrence and returned.<br><br>
*
* It doesn't matter if the numbers within the supplied String have leading
* or trailing non-numerical (alpha) characters attached to them.<br><br>
*
* A Locale can also be optionally supplied so to use whatever decimal symbol
* that is desired otherwise, the decimal symbol for the system's current
* default locale is used.
*
* @param inputString (String) The supplied string to extract all the numbers
* from.<br>
*
* @param desiredLocale (Optional - Locale varArgs) If a locale is desired for a
* specific decimal symbol then that locale can be optionally
* supplied here. Only one Locale argument is expected and used
* if supplied.<br>
*
* @return (String[] Array) A String[] array is returned with each element of
* that array containing a number extracted from the supplied
* Input String in the order of occurrence.
*/
public static String[] getNumbersFromString(String inputString, java.util.Locale... desiredLocale) {
// Get the decimal symbol for the current system's locale.
char decimalSeparator = new java.text.DecimalFormatSymbols().getDecimalSeparator();
/* Is there a supplied Locale? If so, set the decimal
separator to that for the supplied locale */
if (desiredLocale != null && desiredLocale.length > 0) {
decimalSeparator = new java.text.DecimalFormatSymbols(desiredLocale[0]).getDecimalSeparator();
}
/* The first replaceAll() removes every dash (-) that is followed by
whitespace, together with the surrounding whitespace. The second
replaceAll() removes all periods from the input string except those
that are part of a floating point number. The third replaceAll()
replaces everything else except the actual numbers with spaces. */
return inputString.replaceAll("\\s*\\-\\s{1,}","")
.replaceAll("\\.(?![\\d](\\.[\\d])?)", "")
.replaceAll("[^-?\\d+" + decimalSeparator + "\\d+]", " ")
.trim().split("\\s+");
}

Strange behavior of Java String split() method

I have a method which takes a string parameter, splits the string by #, and after splitting prints the length of the array along with the array elements. Below is my code:
public void StringSplitTesting(String inputString) {
String tokenArray[] = inputString.split("#");
System.out.println("tokenArray length is " + tokenArray.length
+ " and array elements are " + Arrays.toString(tokenArray));
}
Case I : Now when my input is abc# the output is tokenArray length is 1 and array elements are [abc]
Case II : But when my input is #abc the output is tokenArray length is 2 and array elements are [, abc]
But I was expecting the same output in both cases. What is the reason behind this implementation? Why does the split() method behave like this? Could someone give me a proper explanation?

One aspect of the behavior of the one-argument split method can be surprising: trailing empty strings are discarded from the returned array. As the documentation puts it:
Trailing empty strings are therefore not included in the resulting array.
To get a length of 2 for each case, you can pass in a negative second argument to the two-argument split method, which means that the length is unrestricted and no trailing empty strings are discarded.

Just take a look in the documentation:
Trailing empty strings are therefore not included in the resulting
array.
So in case 1, the output would be {"abc", ""} but Java cuts the trailing empty String.
If you don't want the trailing empty String to be discarded, you have to use split("#", -1).
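A small sketch of the difference, using the two inputs from the question. The class and helper names (SplitLimitDemo, splitKeepingTrailing) are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class SplitLimitDemo {

    // A negative limit tells split() to keep trailing empty strings.
    static String[] splitKeepingTrailing(String input, String regex) {
        return input.split(regex, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("abc#".split("#")));                  // [abc]
        System.out.println(Arrays.toString("#abc".split("#")));                  // [, abc]
        System.out.println(Arrays.toString(splitKeepingTrailing("abc#", "#")));  // [abc, ]
        System.out.println(Arrays.toString(splitKeepingTrailing("#abc", "#")));  // [, abc]
    }
}
```

With the negative limit both inputs produce an array of length 2; only the position of the empty string differs.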

The observed behavior is due to the inherently asymmetric nature of the substring() method in Java:
This is the core of the implementation of split():
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
The key to understanding the behavior of the above code is to understand the behavior of the substring() method:
From the Javadocs:
String java.lang.String.substring(int beginIndex, int endIndex)
Returns a new string that is a substring of this string. The substring
begins at the specified beginIndex and extends to the character at index
endIndex - 1. Thus the length of the substring is endIndex-beginIndex.
Examples:
"hamburger".substring(4, 8) returns "urge" (not "urger")
"smiles".substring(1, 5) returns "mile" (not "miles")
Hope this helps.

Prefix search using BinarySearch

Given a prefix string, I need to display all matching strings from an ArrayList using binary search. Is this possible, and what is the right way to do it?
BinarySearch(myList, SearchString);

For every String prefix and every non-empty String suffix, prefix < prefix+suffix holds.
So you can use Collections.<String>binarySearch(List<String>,String) to search for the position of prefix. Of course, the existence or absence of the prefix does not say anything about the existence or absence of the prefix+suffix Strings. So if the index is negative, convert it using -index-1 and check that position whether it is equal to the list’s size (in this case no prefixed String was found) or if the String at that index has the prefix. If the index was not negative, i.e. prefix without a suffix was found you have to decide whether to include Strings with empty suffix or not. Since binarySearch will return an arbitrary index in the case of multiple occurrences you have to use the returned index to go linearly to the first occurrence, either backward or forward depending on your decision whether prefix without a suffix shall be included or not.
Once you have the first position you could use binarySearch again to find the end of all prefixed Strings by searching for the smallest String which is greater than any prefixed String. This String can be constructed by incrementing the last character of the prefix by one. Here again it doesn’t matter whether that String is really there, it just gives us the delimiter for the range of prefixed Strings. So you will convert negative values using -index-1 and have the first index of a String without the prefix; it will be the size of the list if no such String exists.
public static List<String> findAllPrefixed(
List<String> list, String prefix, boolean includeEmptySuffixed)
{
int first=Collections.binarySearch(list, prefix);
if(first<0)
{
first=-first-1;
if(first==list.size() || !list.get(first).startsWith(prefix))
return Collections.emptyList();
}
else
{
if(includeEmptySuffixed)
while(first>0 && list.get(first-1).equals(prefix)) first--;
else
{
do first++; while(first<list.size() && list.get(first).equals(prefix));
if(first==list.size() || !list.get(first).startsWith(prefix))
return Collections.emptyList();
}
}
// the conditional is just a small optimization
List<String> notSmaller=first>0? list.subList(first, list.size()): list;
final int p = prefix.length()-1;
if(p<0) return notSmaller;//empty prefix, there are no larger values
final String after=prefix.substring(0, p)+(char)(prefix.charAt(p)+1);
int last=Collections.binarySearch(notSmaller, after);
if(last<0) last=-last-1;
// could just do notSmaller.subList(0,last); but this here reduces heap usage
return last==notSmaller.size()? notSmaller: list.subList(first, first+last);
}
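The same range idea can also be sketched with a hand-written lower-bound search instead of Collections.binarySearch. This is a different technique from the method above, not a rewrite of it: a lower bound always lands on the first of several equal elements, so no linear walk over duplicates is needed. The names PrefixRange, lowerBound and withPrefix are made up, and the sketch assumes a list sorted in natural order, a prefix whose last character is not Character.MAX_VALUE, and that strings equal to the prefix should be included.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; names are illustrative only.
class PrefixRange {

    // Index of the first element >= key in a list sorted in natural order
    // (equals the list size if every element is smaller than key).
    static int lowerBound(List<String> sorted, String key) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // View of all elements starting with prefix (the prefix itself included).
    // Assumption: the last character of prefix is not Character.MAX_VALUE.
    static List<String> withPrefix(List<String> sorted, String prefix) {
        if (prefix.isEmpty()) {
            return sorted; // every string starts with the empty prefix
        }
        int p = prefix.length() - 1;
        // Smallest string greater than every string that starts with prefix.
        String after = prefix.substring(0, p) + (char) (prefix.charAt(p) + 1);
        return sorted.subList(lowerBound(sorted, prefix), lowerBound(sorted, after));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "car", "card", "care", "cat", "dog");
        System.out.println(withPrefix(words, "car")); // prints [car, card, care]
    }
}
```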

indexOf (String str) - equals string to other string

I need to write a method that checks for "String str" inside another string and returns the index at which str starts.
That sounds like homework, and it partly is, but I'm using it to study for a test...
I've tried:
public int IndexOf (String str) {
for (i= 0;i<_st.length();i++)
{
if (_st.charAt(i) == str.charAt(i)) {
i++;
if (_st.charAt(i) == str.charAt(i)) {
return i;
}
}
}
return -1;
}
but I don't get the right return value. Why? Am I on the right track, or not even close?

I am afraid, you are not close.
Here's what you have to do:
Loop on the characters of the string (the one on which you are supposed to do an indexOf, I will call this the master) (you are doing this right)
For every character check whether your other string's character and this character are the same.
If they are (a potential start of the same sequence) check whether the next characters in the master match with your String to check (You might want to loop through the elements of the string and check one by one).
If they don't match, continue with the characters in the master string
Something like:
Loop master string
for every character (using index i, lets say)
check whether this is same as first character of the other string
if it is
//potential match
loop through the characters in the child string (lets say using index j)
match them with the consecutive characters in the master string
(something like master[j+i] == sub[j])
If everything match, 'i' is what you want
otherwise, continue with the master, hoping you find a match
Some other points:
In Java, method names start with a lower-case letter by convention (meaning the compiler won't complain, but your fellow programmers may). So IndexOf should actually be indexOf.
Having instance variables (class-level variables) start with a _ (as in _st) is not really good practice. If your professor insists, you may not have many options, but keep this in mind.

Not really very close, I'm afraid. What that code basically does is check whether the two strings have two matching characters in the same positions at any point and, if so, return the index of the second of those characters. E.g., if _st is "abcdefg" and str is "12cd45", you'll return 3 because they have "cd" in the same place, and that's the index of the "d". At least, that's as near as I can tell what it's actually doing. That's because you're indexing into both strings with the same indexing variable.
To re-write indexOf, looking for str within _st, you have to scan _st for the first character in str and then check whether the remaining characters match; if not, bump forward one place from where you started checking and continue your scan. (There are optimisations you can do, but that's the essence of it.) So for instance, if you find the first character of str at index 4 in _st and str is six characters long, having found the first character you need to see if the remaining five (str's indexes 1-5 inclusive) match _st's indexes 5-9 inclusive (easiest just to check all six of str's characters against a substring of _st starting at 4 and going for six characters). If everything matches, return the index at which you found the first character (so, 4 in that example). You can stop scanning at _st.length() - str.length() since if you haven't found it starting at or before that point, you're not going to find it at all.
Side point: Don't call the length function on every loop. The JIT may be able to optimize out the call, but if you know that _st won't change during the course of this function (and if you don't know that, you should require it), grab length() to a local and then refer to that. And of course, since you know you can stop earlier than length(), you'll use a local to remember where you can stop.
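Put together, the scan described above might look like the following sketch. It is a plain brute-force search written as a static method taking both strings (the class name NaiveIndexOf is made up, and the question's _st field becomes the parameter st), not the JDK's implementation.

```java
// Hypothetical class; a brute-force sketch, not the JDK implementation.
class NaiveIndexOf {

    // Returns the index in st at which str first occurs, or -1 if it does not occur.
    static int indexOf(String st, String str) {
        int targetLength = str.length();
        // Last index in st at which a full match could still begin,
        // computed once instead of on every iteration.
        int lastStart = st.length() - targetLength;
        for (int i = 0; i <= lastStart; i++) {
            int j = 0; // separate index into str, restarted for every candidate position
            while (j < targetLength && st.charAt(i + j) == str.charAt(j)) {
                j++;
            }
            if (j == targetLength) {
                return i; // every character of str matched, starting at i
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abcdefg", "cd"));     // prints 2
        System.out.println(indexOf("abcdefg", "12cd45")); // prints -1
    }
}
```

As with String.indexOf, an empty str is reported at index 0.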

You are using i to index both strings, but what you want is for the index into str to always start at 0 and only advance once its character has been found in the other string. Then check whether the next characters are equal, and so on.
Hope this helps

Your code loops through the string being searched and if the characters at position i match, it checks the next position. If the strings match at the next position, you assume that the string str is contained in _st.
What you probably want to do is:
keep track of whether the whole of str is contained in _st. You could probably check whether the string that you are searching for has length equal to the number of matching characters so far.
if you do the above then you could get the starting index by subtracting the number of matches so far from the current value of i.
One question:
Why are you not using the built-in String.indexOf() method? Is this assignment meant for you to implement this functionality on your own?

Maybe the Oracle Java API Source code does help:
/**
* Returns the index within this string of the first occurrence of the
* specified substring. The integer returned is the smallest value
* <i>k</i> such that:
* <blockquote><pre>
* this.startsWith(str, <i>k</i>)
* </pre></blockquote>
* is <code>true</code>.
*
* @param str any string.
* @return if the string argument occurs as a substring within this
* object, then the index of the first character of the first
* such substring is returned; if it does not occur as a
* substring, <code>-1</code> is returned.
*/
public int indexOf(String str) {
return indexOf(str, 0);
}
/**
* Returns the index within this string of the first occurrence of the
* specified substring, starting at the specified index. The integer
* returned is the smallest value <tt>k</tt> for which:
* <blockquote><pre>
* k >= Math.min(fromIndex, this.length()) && this.startsWith(str, k)
* </pre></blockquote>
* If no such value of <i>k</i> exists, then -1 is returned.
*
* @param str the substring for which to search.
* @param fromIndex the index from which to start the search.
* @return the index within this string of the first occurrence of the
* specified substring, starting at the specified index.
*/
public int indexOf(String str, int fromIndex) {
return indexOf(value, offset, count,
str.value, str.offset, str.count, fromIndex);
}
/**
* Code shared by String and StringBuffer to do searches. The
* source is the character array being searched, and the target
* is the string being searched for.
*
* @param source the characters being searched.
* @param sourceOffset offset of the source string.
* @param sourceCount count of the source string.
* @param target the characters being searched for.
* @param targetOffset offset of the target string.
* @param targetCount count of the target string.
* @param fromIndex the index to begin searching from.
*/
static int indexOf(char[] source, int sourceOffset, int sourceCount,
char[] target, int targetOffset, int targetCount,
int fromIndex) {
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
if (fromIndex < 0) {
fromIndex = 0;
}
if (targetCount == 0) {
return fromIndex;
}
char first = target[targetOffset];
int max = sourceOffset + (sourceCount - targetCount);
for (int i = sourceOffset + fromIndex; i <= max; i++) {
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first);
}
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end && source[j] ==
target[k]; j++, k++);
if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
}
return -1;
}

Why do Strings start with a "" in Java? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why does “abcd”.StartsWith(“”) return true?
Whilst debugging through some code I found a particular piece of my validation was using the .startsWith() method on the String class to check if a String started with a blank character
Considering the following :
public static void main(String args[])
{
String s = "Hello";
if (s.startsWith(""))
{
System.out.println("It does");
}
}
It prints out It does
My question is, why do Strings start off with a blank character? I'm presuming that under the hood Strings are essentially character arrays, but in this case I would have thought the first character would be H
Can anyone explain please?

"" is an empty string containing no characters. There is no "empty character", unless you mean a space or the null character, neither of which are empty strings.
You can think of a string as starting with an infinite number of empty strings, just like you can think of a number as starting with an infinite number of leading zeros without any change to the meaning.
1 = ...00001
"foo" = ... + "" + "" + "" + "foo"
Strings also end with an infinite number of empty strings (as do decimal numbers with zeros):
1 = 001.000000...
"foo" = "foo" + "" + "" + "" + ...

There seems to be a misunderstanding in your code. Your statement s.startsWith("") checks whether the string starts with an empty string (not a blank character). It may look like a weird implementation choice, but that is how it is: every string reports that it starts with the empty string.
Also notice a blank character will be the " " string, as opposed to your empty string "".

"Hello" starts with "" and it also starts with "H" and it also starts with "He" and it also starts with "Hel" ... do you see?

That "" is not a blank; it's an empty string. I guess the API is asking whether this is a substring of that, and the zero-length empty string is a substring of everything.

The empty String ("") basically "satisfies" every string. In your example, Java forwards the call
s.startsWith("");
to
s.startsWith("", 0);
which essentially follows the principle that an empty element (here, the empty string) trivially satisfies the constraint: there are no characters to compare, so nothing can mismatch.
From String.java
/**
* Tests if the substring of this string beginning at the
* specified index starts with the specified prefix.
*
* @param prefix the prefix.
* @param toffset where to begin looking in this string.
* @return <code>true</code> if the character sequence represented by the
* argument is a prefix of the substring of this object starting
* at index <code>toffset</code>; <code>false</code> otherwise.
* The result is <code>false</code> if <code>toffset</code> is
* negative or greater than the length of this
* <code>String</code> object; otherwise the result is the same
* as the result of the expression
* <pre>
* this.substring(toffset).startsWith(prefix)
* </pre>
*/
public boolean startsWith(String prefix, int toffset) {
char ta[] = value;
int to = offset + toffset;
char pa[] = prefix.value;
int po = prefix.offset;
int pc = prefix.count;
// Note: toffset might be near -1>>>1.
if ((toffset < 0) || (toffset > count - pc)) {
return false;
}
while (--pc >= 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;
}

For folks who have taken automata theory, this makes sense because the empty string ε is a substring of any string and also is the concatenation identity element, ie:
for all strings x, ε + x = x, and x + ε = x
So yes, every string "startsWith" the empty string. Also note (as many others have said) that the empty string is different from a blank or null character.
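A few lines are enough to check these claims against the String API. The class and method names (EmptyStringDemo, isConcatIdentity) are made up for illustration.

```java
// Hypothetical demo class; names are illustrative only.
class EmptyStringDemo {

    // True if concatenating the empty string on either side leaves x unchanged.
    static boolean isConcatIdentity(String x) {
        return ("" + x).equals(x) && (x + "").equals(x);
    }

    public static void main(String[] args) {
        String s = "Hello";
        System.out.println(s.startsWith(""));     // true: the empty string is a prefix
        System.out.println(s.endsWith(""));       // true: and also a suffix
        System.out.println(s.indexOf(""));        // 0: found at the very start
        System.out.println(isConcatIdentity(s));  // true
        System.out.println(s.startsWith(" "));    // false: a blank is a real character
    }
}
```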

A blank is (" "), that's different from an empty string (""). A blank space is a character, the empty string is the absence of any character.

An empty string is not a blank character. Assuming your question is about the empty string, I guess they decided to leave it that way, though it does seem odd. They could have checked the length but they didn't.
