So in Java, whenever an indexed range is given, the upper bound is almost always exclusive.
From java.lang.String:
substring(int beginIndex, int endIndex)
Returns a new string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1
From java.util.Arrays:
copyOfRange(T[] original, int from, int to)
from - the initial index of the range to be copied, inclusive
to - the final index of the range to be copied, exclusive.
From java.util.BitSet:
set(int fromIndex, int toIndex)
fromIndex - index of the first bit to be set.
toIndex - index after the last bit to be set.
As you can see, it does look like Java tries to make it a consistent convention that upper bounds are exclusive.
My questions are:
Is this the official authoritative recommendation?
Are there notable violations that we should be wary of?
Is there a name for this system? (ala "0-based" vs "1-based")
CLARIFICATION: I fully understand that a collection of N objects in a 0-based system is indexed 0..N-1. My question is that if a range (2,4) is given, it can denote either 3 items or 2, depending on the system. What do you call these systems?
AGAIN, the issue is not "first index 0 last index N-1" vs "first index 1 last index N" system; that's known as the 0-based vs 1-based system.
The issue is "There are 3 elements in (2,4)" vs "There are 2 elements in (2,4)" systems. What do you call these, and is one officially sanctioned over the other?
In general, yes. If you are working in a language with C-like syntax (C, C++, Java), then arrays are zero-indexed, and most random access data structures (vectors, array-lists, etc.) are going to be zero-indexed as well.
Starting indices at zero means that the size of the data structure is always going to be one greater than the last valid index in the data structure. People often want to know the size of things, of course, and so it's more convenient to talk about the size than to talk about the last valid index. People get accustomed to talking about ending indices in an exclusive fashion, because an array a[] that is n elements long has its last valid element in a[n-1].
There is another advantage to using an exclusive index for the ending index, which is that you can compute the size of a sublist by subtracting the inclusive beginning index from the exclusive ending index. If I call myList.subList(3, 7), then I get a sublist with 7 - 3 = 4 elements in it. If the subList() method had used inclusive indices for both ends of the list, then I would need to add an extra 1 to compute the size of the sublist.
This is particularly handy when the starting index is a variable: getting the sublist of myList starting at i that is 5 elements long is just myList.subList(i, i + 5).
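For instance, a minimal illustration of that arithmetic with java.util.List (the list contents here are invented):

List<String> myList = List.of("a", "b", "c", "d", "e", "f", "g", "h");
List<String> sub = myList.subList(3, 7);  // from 3 (inclusive) to 7 (exclusive)
System.out.println(sub);                  // [d, e, f, g]
System.out.println(sub.size());           // 4, i.e. 7 - 3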
All of that being said, you should always read the API documentation, rather than assuming that a given beginning index or ending index will be inclusive or exclusive. Likewise, you should document your own code to indicate if any bounds are inclusive or exclusive.
Credit goes to FredOverflow in his comment saying that this is called the "half-open range". So presumably, Java Collections can be described as "0-based with half-open ranges".
I've compiled some discussions about half-open vs closed ranges elsewhere:
siliconbrain.com - 16 good reasons to use half-open ranges (edited for conciseness):
The number of elements in the range [n, m) is just m-n (and not m-n+1).
The empty range is [n, n) (and not [n, n-1], which can be a problem if n is an iterator already pointing to the first element of a list, or if n == 0).
For floats you can write [13, 42) (instead of [13, 41.999999999999]).
The +1 and -1 are almost never needed when handling ranges. This is an advantage if they are expensive (as it is for dates).
If you write a find over a range, the fact that nothing was found can easily be indicated by returning the end as the found position: if( find( [begin, end) ) == end) nothing found.
In languages that start array subscripts at 0 (like C, C++, Java, NCL), the upper bound is equal to the size.
Half-open versus closed ranges
Advantages of half-open ranges:
Empty ranges are valid: [0 .. 0]
Easy for subranges to go to the end of the original: [x .. $]
Easy to split ranges: [0 .. x] and [x .. $]
Advantages of closed ranges:
Symmetry.
Arguably easier to read.
['a' ... 'z'] does not require awkward + 1 after 'z'.
[0 ... uint.max] is possible.
That last point is very interesting. It's really awkward to write a numberIsInRange(int n, int min, int max) predicate with a half-open range if Integer.MAX_VALUE could legally be in a range.
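To make the pitfall concrete, here is a small sketch (the method and parameter names are mine, not from any API): a closed upper bound handles Integer.MAX_VALUE naturally, while a half-open upper bound would need max + 1, which overflows.

// Closed (inclusive) upper bound: works even when max == Integer.MAX_VALUE.
static boolean numberIsInRange(int n, int min, int max) {
    return n >= min && n <= max;
}

// Half-open upper bound: to cover values up to Integer.MAX_VALUE, maxExclusive
// would have to be Integer.MAX_VALUE + 1, which overflows to Integer.MIN_VALUE,
// and the test then rejects everything.
static boolean numberIsInHalfOpenRange(int n, int min, int maxExclusive) {
    return n >= min && n < maxExclusive;
}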
It's just 0 to n-1 based.
A list/array containing 10 items is indexed 0-9.
You cannot have a 0-indexed list that goes 0 to n where the count is n; that would include an item that does not exist...
This is the typical way things work.
Yes.
Excel Ranges/Sheets/Workbooks.
Index (information technology)
This practice was introduced to the Collections API by Josh Bloch as part of its contract.
After that it became a de facto standard in Java, and when anybody decides to create a public library, they are expected to keep the same contract, because users expect to see already-known behavior in new libraries.
The indexes in array-like data structures are indeed always 0-based. A String is basically backed by a char[], the Collections framework is under the hood based on arrays, and so on. This makes designing/maintaining/using the API easier, without changing the under-the-hood way of accessing the desired element(s) in the array.
There are however some "exceptions", such as the parameterIndex-based setter methods of PreparedStatement and the columnIndex-based getter methods of ResultSet. They are 1-based. Behind the scenes they also do not really represent an array of values.
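For example (assuming an open java.sql.Connection named connection; the table and column are invented), both the parameter index and the column index start at 1:

PreparedStatement ps = connection.prepareStatement(
        "SELECT name FROM users WHERE id = ?");
ps.setInt(1, 42);                             // the first (and only) parameter is index 1, not 0
try (ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        System.out.println(rs.getString(1));  // the first column is index 1, not 0
    }
}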
This would probably bring up a new question: "Why are array indexes zero-based?". Now, the respected computer scientist E.W. Dijkstra explains here why it should start with zero.
The easy way to think of half-open ranges is this: the first term identifies the start of elements within the range, and the second term identifies the start of elements after the range. Keep that in mind, and it all makes a good deal more sense. Plus the arithmetic works out better in many cases, per polygenelubricants' answer.
Related
I am going through official TreeMap documentation. I see subMap() prototype as:
public SortedMap<K,V> subMap(K fromKey, K toKey) is equivalent to subMap(fromKey, true, toKey, false).
I have seen everywhere that Java does not include the last value of a given range by default. Why was the decision made that the default inclusion value for toKey is false (and not true)?
This has been "answered" for String#substring here:
https://stackoverflow.com/a/26631968/1571268
The question about the "Why" may be considered philosophical or academic, and may provoke answers along the lines of "That's just the way it is".
[...]
However, regardless of which choice is made, and regardless how it is justified: It is important to be consistent throughout the whole API.
If you are going to specify ranges by their end-points, it's generally best to specify the range as B <= x < E.
At the very least it means you can specify the empty range by B=E.
But when it comes to 'infinite' value spaces it becomes essential.
Suppose I want to specify every string beginning with C?
In the 'end exclusive' model that's B="C" and E="D".
In the 'end inclusive' model that's B="C" and E="CZZZZZZZZZZZZZZ..." where E is some string we've decided is longer than any string we're going to practically encounter.
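A concrete Java version of that "every string beginning with C" case, using java.util.TreeSet (the contents are invented):

SortedSet<String> words = new TreeSet<>(List.of("Apple", "Car", "Cat", "Cup", "Dog"));
// Half-open subSet: every string >= "C" and < "D", i.e. everything starting with "C".
System.out.println(words.subSet("C", "D"));  // [Car, Cat, Cup]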
It also makes it practical to define non-overlapping coverages as [B,E),[E,F) and so on.
NB: Mathematical convention is that [ indicates >= and ) indicates <
Some people argue it's a matter of taste. But in practice you can create a partition from ordered values A,B,C,D,E... as [A,B),[B,C),[C,D) without any fiddling around trying to identify (the potentially non-existent or unconstructable) value immediately before B.
It's somewhere between messy and impossible to construct the partition [A,B-e1],[B,C-e2]. Just duck the issue and use half-inclusive intervals!
It normally makes sense to use inclusive-exclusive ranges, but the approach works the other way round (exclusive-inclusive) as well.
This is general in almost all programming languages, I guess!
It's what I see in Python and C, for example.
This is kind of a rule of sets (collections) in mathematics. You can imagine it as below:
fromKey <= x < toKey
eg:
0 <= x < 10
Anyway, if you want to include toKey itself in the range, you should use:
toKey + 1
or
toKey ++
Indices in most programming languages start at 0. Thus, the last element of any linear collection is at index size - 1.
Now, any range is defined as start (inclusive) to end (exclusive), since it makes the code more readable.
Example
Let's say you have "Java" and you want to split it at the "v".
With this definition, it is very easy:
String java = "Java"; // length: 4
int index = java.indexOf("v"); // 2
String start = java.substring(0, index); // from 0 (inclusive) to 2 (exclusive)
String end = java.substring(index); // from 2 (inclusive) to length (exclusive, implicit)
System.out.println(start + ", " + end); // Ja, va
In Java, almost every method that returns a sub-part of a data structure goes from the fromKey (included) to the toKey (not included), so what you have to do is pass toKey + 1 instead of just toKey.
For example, if we call map.subMap(a, b), we get the entries from a up to b-1.
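A small sketch with a TreeMap of Integer keys (the contents are invented); note that the toKey + 1 trick only works for discrete key types, and that TreeMap also has the subMap(fromKey, fromInclusive, toKey, toInclusive) overload quoted earlier:

TreeMap<Integer, String> map = new TreeMap<>(Map.of(1, "one", 2, "two", 3, "three", 4, "four"));
System.out.println(map.subMap(2, 4));              // {2=two, 3=three}          (4 excluded)
System.out.println(map.subMap(2, 4 + 1));          // {2=two, 3=three, 4=four}
System.out.println(map.subMap(2, true, 4, true));  // {2=two, 3=three, 4=four}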
Hope this helps.
I'm working on my homework and there's a question that asks us to sort a struct array.
The struct citizen consists of an int id and a boolean gender, where id is randomly generated between 1 and 100,
and gender is determined by whether id is odd or even: odd = true (male) and even = false (female).
for example a = {33, true}
The question requires me to sort the citizen[] array by gender. It seems very easy, but it has the following requirements:
run in linear time, O(N)
no new array
only constant extra space can be used
I am thinking about using counting sort, but it seems a little bit hard to do without a new array. Are there any suggestions?
Since this is a homework question I'm not going to provide code. The following should be sufficient to get you started.
"Sorting" by gender here really means partitioning into two groups. A general purpose sort cannot be better than O(n*log(n)), but partitioning can be done in O(n) with constant space.
Consider iterating from both ends simultaneously (while loop, two index pointers initialized to first and last elements) looking for elements that are in the "wrong" section. When you find one such element at each end, swap them. Note that the pointers move independently of each other, only when skipping over elements that are already in the right section, and of course immediately after a swap, which is a subcase of "elements already in the right section".
Quit when the index pointers meet somewhere in the middle.
This is not a general purpose sort. You cannot do this for the case where the number of keys is unknown.
Since you only have two values to sort, you could use a kind of swap-counting-sort (I couldn't find any relevant paper on that one).
There is room for optimisation on that sort, but that will be your job.
Here is that special sort, sketched in Java for your issue (array here is the citizen[] from your question):
int maleIndex = 0;                     // current position for the next male in the array
for (int i = 0; i < array.length; i++) {
    if (array[i].gender) {             // male
        // After a while, all males end up at the beginning
        // while all females end up at the end.
        citizen tmp = array[maleIndex];
        array[maleIndex] = array[i];
        array[i] = tmp;
        maleIndex++;
    }
}
One approach is similar to the partition stage in QuickSort, or to the median/rank finding algorithm QuickSelect.
I'm going to describe the outline of the algorithms, but not provide any code. Hopefully, it will be good enough that it is easy to make the translation.
You basically want to reorganize the array so that one gender is at the start, and the other is at the end of the array. You'll have the array partitioned in three:
From 0 to i-1 you have the first gender (male or female, up to you)
From i to j-1 you have both male/female. This is the unknown area.
From j to n-1 you have the second gender.
At the start of the algorithm i is set to 0, so the first gender's area is empty, and j is set to n, so the second gender's area is empty. Basically the whole array is in the unknown state.
Then, you iterate over the array in a particular way. At each step, you look at citizen[i].gender. If it is the first gender, you leave it alone and increment i. If it is the second gender, you swap A[i] with A[j-1] and decrement j. You stop when i is equal to j.
Why is this correct? Well, at each step we can see that the constraint of having the three areas is maintained, assuming it held to begin with (which it does), and either the first gender's area or the second gender's area grows while the unknown area shrinks. At the end, the unknown area has no elements, so we're only left with the first gender at the start, and the second at the end.
Why is it linear? Well, at each step we make a constant-time decision for one element in the array about where it should belong. There are n such elements, so the time is linear in that. Alternatively, the iteration test can be expressed as while (j - i > 0), and the expression j - i starts at n and drops by 1 for each iteration.
My teacher and I were discussing whether or not a recursive permutation function could be written without the use of substrings and/or arrays in Java.
Is there a way to do this?
The answer is yes, this can be done. I'm assuming that "without the use of substrings and/or arrays" refers to the info being passed to the recursion. You have to have some sort of container for the elements that are to be permuted.
In that case it can be done by pulling some hideous tricks with numerically encoding the indices of the elements as digits of a numeric argument. For instance, if there are 3 elements and I use 1 as a sentinel value in the left-most digit (so you can have 0 as the leading index sometimes), 1 means I haven't started, 10 means the first element has been selected, 102 means the first and third, and 1021 means I'm ready to print the permutation since I now have a 4 digit argument and there are 3 elements in the set. I can then deconstruct which elements to print using % 10 and / 10 arithmetic to pick them off.
I implemented this in Ruby rather than Java, and I'm not going to share the actual code because it's too horrible to contemplate. However, it works recursively with only the input array of elements and an integer as arguments, no partial solution substrings or arrays.
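For what it's worth, here is a rough Java sketch of that digit-encoding idea (my own illustration, not the answer's original Ruby code). It only handles up to 10 elements, since each index has to fit into one decimal digit, and the leading 1 is the sentinel described above:

public class DigitPermutations {

    public static void main(String[] args) {
        permute(new char[] {'a', 'b', 'c'}, 1);  // start with just the sentinel
    }

    static void permute(char[] elements, long code) {
        if (countDigits(code) - 1 == elements.length) {  // every index has been chosen
            printPermutation(elements, code);
            return;
        }
        for (int i = 0; i < elements.length; i++) {
            if (!contains(code, i)) {                    // index i not used yet
                permute(elements, code * 10 + i);
            }
        }
    }

    // Number of decimal digits in code, sentinel included.
    static int countDigits(long code) {
        int n = 0;
        for (long c = code; c > 0; c /= 10) n++;
        return n;
    }

    // Does the encoded sequence already contain index d? (The leading sentinel 1 is skipped.)
    static boolean contains(long code, int d) {
        for (long c = code; c > 1; c /= 10) {
            if (c % 10 == d) return true;
        }
        return false;
    }

    // Peel the digits off with % 10 and / 10 and print the corresponding elements in order.
    static void printPermutation(char[] elements, long code) {
        StringBuilder sb = new StringBuilder();
        for (long c = code; c > 1; c /= 10) {
            sb.insert(0, elements[(int) (c % 10)]);
        }
        System.out.println(sb);
    }
}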
I am currently working on this coding problem for class.
Given a sorted array of n distinct values as well as a target value T, determine in O(n) time whether or not there exist two distinct values in the array that sum to T. (For example, if the array contained 3, 5, 6, 7, and 9 and T = 14, then the method you are to write should return true, since 5+9 = 14. It should return false if for the same array of values T = 17.)
So, initially, I just solved the problem with a nested linear search, which obviously results in an O(n^2) runtime, to establish a baseline to simplify from; however, so far I have only been able to simplify it to O(n log(n)). I did this by creating a new array made up of the differences Target - array[i], and then comparing the new array to the original array using a binary search nested within a loop that goes linearly up the new array.
I am not asking for an answer but rather a hint at where to look to simplify my code. I feel like the fact that the array is sorted is important in getting it down to O(n), but I'm not sure how to go about doing it.
Thanks for your time!
Imagine you have two pointers (s, e) which are set to the start and end of your array.
If you move them toward each other (with a specific rule) and look at the sum of the two elements, you will see that moving one pointer increases the sum and moving the other decreases it.
The only thing you need is to find the balance.
If that doesn't help, ask for the next tip.
Some tips/steps:
1 - Start the iteration with one pointer at array[i], the nearest value below T
2 - Put another pointer at array[0]
3 - Sum both values and compare with T
4 - If the sum is bigger or lower, move the appropriate pointer and repeat step 3 (see the sketch below)
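A minimal sketch of the two-pointer idea from the hints above, using the two ends of the sorted array (the method and variable names are mine):

static boolean hasPairWithSum(int[] sorted, int target) {
    int s = 0, e = sorted.length - 1;       // one pointer at each end
    while (s < e) {
        int sum = sorted[s] + sorted[e];
        if (sum == target) return true;     // two distinct values summing to target
        if (sum < target) s++;              // sum too small: move the left pointer right
        else e--;                           // sum too big: move the right pointer left
    }
    return false;
}

// hasPairWithSum(new int[] {3, 5, 6, 7, 9}, 14) -> true  (5 + 9)
// hasPairWithSum(new int[] {3, 5, 6, 7, 9}, 17) -> false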
A Hint:
Something like Binary Search, start with middle (compare with middle)
we have startindex = 0, endindex = N-1
while (some condition) {
    middleindex = startindex + (endindex - startindex) / 2, middle = array[middleindex]
    if T - array[middleindex] > middle, startindex = middleindex
    if T - array[middleindex] < middle, endindex = middleindex
}
It will do the task in O(log(n)) :D
Directly from this java api:
Why adding a "\0" would "open" one range end as explained in the following quote?
I checked the "\0" escape sequence and it says it represents the null character.
What is the null character in terms of Strings? And why does appending it to the "high" parameter of a subSet give a range that includes the parameter itself?
If you need a closed range (which includes both endpoints), and the
element type allows for calculation of the successor of a given value,
merely request the subrange from lowEndpoint to
successor(highEndpoint). For example, suppose that s is a sorted set
of strings. The following idiom obtains a view containing all of the
strings in s from low to high, inclusive:
SortedSet sub = s.subSet(low, high+"\0");
Thanks in advance for your time.
high+"\0" is a way to obtaining the String that would be sorted immediately after high.
So, if you want a subset that includes the high element, you need to specify the limit to the subset as high+"\0"
For example, if you were dealing with a SortedSet<Int> and you wanted the subset between 4 and 8, both inclusive, you would use s.subSet(4, 8+1). high+"\0" is the String equivalent.
When you call subset with a high and a low limit, the high limit element will not be included (ie low <= element < high will be included, but that excludes high).
If you want it included, you need to give a limit slightly higher, but not high enough to include another element.
The easiest way to make the next bigger string is to append a \0: making the string longer makes it sort just after the high limit (so the high limit element is included), and no other string can sort strictly between the two, so there's no risk of inadvertently including an extra element.
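A quick demonstration with java.util.TreeSet (the contents are invented):

SortedSet<String> s = new TreeSet<>(List.of("apple", "banana", "cherry"));
System.out.println(s.subSet("apple", "banana"));          // [apple]          (banana excluded)
System.out.println(s.subSet("apple", "banana" + "\0"));   // [apple, banana]  (banana included)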