Customize the sorting of tuples? [duplicate] - java

In Python 2.x, I could pass a custom function to sorted and .sort:
>>> x=['kar','htar','har','ar']
>>>
>>> sorted(x)
['ar', 'har', 'htar', 'kar']
>>>
>>> sorted(x,cmp=customsort)
['kar', 'htar', 'har', 'ar']
Because in my language, consonants come in this order:
"k","kh",....,"ht",..."h",...,"a"
But in Python 3.x, it looks like I can no longer pass the cmp keyword:
>>> sorted(x,cmp=customsort)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'cmp' is an invalid keyword argument for this function
Are there any alternatives, or should I write my own sorted function too?
Note: I simplified by using "k", "kh", etc. The actual characters are Unicode and the rules are more complicated (sometimes vowels come before and after consonants). I have already written the custom comparison function, so that part is fine. The only problem is that I cannot pass my custom comparison function to sorted or .sort.

Use the key keyword and functools.cmp_to_key to transform your comparison function:
sorted(x, key=functools.cmp_to_key(customsort))
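For illustration, here is a minimal self-contained sketch of that call. The asker's real comparison function is not shown, so the customsort below is only a stand-in, and the single-letter order in ORDER is an assumption loosely based on the simplified example in the question.

```python
from functools import cmp_to_key

# Hypothetical single-letter ordering, for illustration only.
ORDER = {"k": 0, "h": 1, "a": 2}

def customsort(left, right):
    """Old-style cmp function: returns a negative, zero, or positive number."""
    left_ranks = [ORDER[ch] for ch in left]
    right_ranks = [ORDER[ch] for ch in right]
    # Lists compare element by element, so this yields -1, 0, or 1.
    return (left_ranks > right_ranks) - (left_ranks < right_ranks)

words = ["ah", "ha", "ka", "hk"]
result = sorted(words, key=cmp_to_key(customsort))
print(result)
```

The same wrapper works for list.sort(key=...), so an existing Python 2 comparison function can usually be reused unchanged.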

Use the key argument (and follow the recipe on how to convert your old cmp function to a key function).
functools has a function cmp_to_key mentioned at docs.python.org/3.6/library/functools.html#functools.cmp_to_key

A complete python3 cmp_to_key lambda example:
from functools import cmp_to_key
nums = [28, 50, 17, 12, 121]
nums.sort(key=cmp_to_key(lambda x, y: 1 if str(x)+str(y) < str(y)+str(x) else -1))
Compare to common object sorting:
class NumStr:
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        return self.v + other.v < other.v + self.v
A = [NumStr("12"), NumStr("121")]
A.sort()
print(A[0].v, A[1].v)
A = [obj.v for obj in A]
print(A)

Instead of a customsort(), you need a function that translates each word into something that Python already knows how to sort. For example, you could translate each word into a list of numbers where each number represents where each letter occurs in your alphabet. Something like this:
my_alphabet = ['a', 'b', 'c']
def custom_key(word):
    numbers = []
    for letter in word:
        numbers.append(my_alphabet.index(letter))
    return numbers
x=['cbaba', 'ababa', 'bbaa']
x.sort(key=custom_key)
Since your language includes multi-character letters, your custom_key function will obviously need to be more complicated. That should give you the general idea though.
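As a rough sketch of what "more complicated" could look like, the key function below splits each word into letters by greedy longest match, so that "kh" and "ht" count as single letters. The alphabet is an assumption taken from the simplified order in the question, and "r" is placed last only so that the sample words can be tokenized.

```python
# Assumed alphabet in collation order; "r" is added last just for the demo words.
MY_ALPHABET = ["k", "kh", "ht", "h", "a", "r"]
RANK = {letter: i for i, letter in enumerate(MY_ALPHABET)}
MAX_LEN = max(len(letter) for letter in MY_ALPHABET)

def custom_key(word):
    """Translate a word into a list of letter ranks, preferring longer letters."""
    ranks = []
    pos = 0
    while pos < len(word):
        # Try the longest possible letter first, then shorter ones.
        for size in range(MAX_LEN, 0, -1):
            chunk = word[pos:pos + size]
            if chunk in RANK:
                ranks.append(RANK[chunk])
                pos += len(chunk)
                break
        else:
            raise ValueError(f"no letter matches at position {pos} in {word!r}")
    return ranks

words = ["ar", "har", "htar", "kar"]
result = sorted(words, key=custom_key)
print(result)
```

Greedy matching is a design choice, not the only option; a real script with vowels that attach before or after consonants would likely need a proper tokenizer in place of the inner loop.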

I don't know if this will help, but you may check out the locale module. It looks like you can set the locale to your language and use locale.strcoll to compare strings using your language's sorting rules.
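A hedged sketch of how that could be wired into Python 3 sorting: locale.strcoll is a two-argument comparison, so it needs functools.cmp_to_key, while locale.strxfrm can be used directly as a key. The "C" locale is used here only because it is available everywhere; the locale name for your language is system-dependent and may not be installed.

```python
import locale
from functools import cmp_to_key

# "C" always exists; replace it with your language's locale name if available.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["b", "a", "C"]

# strcoll compares two strings, so wrap it with cmp_to_key.
by_strcoll = sorted(words, key=cmp_to_key(locale.strcoll))

# strxfrm transforms one string into a sortable key.
by_strxfrm = sorted(words, key=locale.strxfrm)

print(by_strcoll, by_strxfrm)
```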

Use the key argument instead. It takes a function that takes the value being processed and returns a single value giving the key to use to sort by.
sorted(x, key=somekeyfunc)

Related

How to sort an array of Strings that contain "m" first, everything else second

String[] strArray = {"xyz", "aaazzz","abc","mft","gh","j", "aaazaz", "mm", "am"};
Arrays.sort(strArray, Comparator.comparing((s) -> s.contains("m")));
System.out.println("Array sorted by 'm': " + Arrays.toString(strArray));
I have been able to get the array sorted using 'm', but the results are in descending order, as in all strings that contain 'm' are at the end.
My printout reads:
Array sorted by 'm': [xyz, aaazzz, abc, gh, j, aaazaz, mft, mm, am]
I have considered using indexOf() but I haven't been able to figure out a way to get this to work.
Many thanks for any suggestions!
The natural order of boolean values is false -> true (i.e. false comes first).
You need to negate the condition in your Comparator so that elements containing m are placed at the beginning of the sorted array:
Comparator.comparing(s -> !s.contains("m"))
You might want both groups of strings (with and without m) to be sorted as well. For that you can use the thenComparing() method, which allows you to build an aggregate Comparator based on several conditions:
Comparator.comparing((String s) -> !s.contains("m"))
.thenComparing(Comparator.naturalOrder());
Note that when comparators are chained together, the type inference mechanism fails to infer the types of the arguments used in the chained methods from the target type (i.e. the expected aggregate Comparator). We need to either provide the argument types in the lambda expression explicitly, as shown above with (String s) -> ..., or use a so-called type witness: Comparator.<String, Boolean>comparing(...).
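For readers comparing languages, the same two-level ordering can be sketched in Python, where a tuple key plays the role of comparing(...).thenComparing(...). This is only a cross-language illustration and not part of the Java answer; it relies on False sorting before True, just like Java's Boolean ordering.

```python
words = ["xyz", "aaazzz", "abc", "mft", "gh", "j", "aaazaz", "mm", "am"]

# First element: False for strings containing "m", so they sort first.
# Second element: the string itself, giving natural order inside each group.
result = sorted(words, key=lambda s: ("m" not in s, s))
print(result)
```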

With Scala's Set, is there a method analog to the containsAll method in Java's Set?

While working through converting some Java code over to Scala, I discovered while there is a contains method for Scala's Set, there isn't a containsAll method. Am I just missing the correct method name?
Here's a bit of code I worked up to fill in the gap so I could quickly get back to working. Is it sufficient, or am I missing some subtlety?
def containsAll[A](set: Set[A], subset: Set[A]): Boolean =
  if (set.size >= subset.size)
    subset.forall(a => set.contains(a))
  else
    false
There is subsetOf, which tests whether or not the elements of a Set are contained within another Set. (Kind of the reverse in terms of the expression)
val set = Set(1,2,3,4)
val subset = Set(1,2)
scala> subset.subsetOf(set)
res0: Boolean = true
scala> set.subsetOf(subset)
res1: Boolean = false
In Scala, Set is equipped with set operations such as intersect, thus for instance
set.intersect(subset) == subset
conveys the semantics of containsAll, although subsetOf, as already mentioned, is the most succinct.
It's worth adding that you can make derived helper methods like containsAll available on Set[T] if you want, by using an implicit enriched class. You might also consider making a variadic overload:
implicit class RichSet[T](val x: Set[T]) extends AnyVal {
  def containsAll(y: Set[T]): Boolean = y.subsetOf(x)
  def containsAll(y: T*): Boolean = x.containsAll(y.toSet)
}
So then you can do:
Set(1, 2, 3).containsAll(Set(1, 2))
Or:
Set(1, 2, 3).containsAll(1, 2)
Previous answers are all good; I'm just throwing in another option. This one also works with Lists, which don't have a subsetOf method:
Set(1,2,3) forall(Set(3, 2, 1) contains)

Shortest possible resulting length after iterated string replacement

How would I go about reasonably efficiently finding the shortest possible output given by repeatedly applying replacements to an input sequence? I believe (please correct me if I am wrong) that this is exponential-time in the worst case, but I am not sure due to the second constraint below. The naive method certainly is.
I tried coding the naive method (for all possible replacements, for all valid positions, recurse on a copy of the input after applying the replacement at the position. Return the shortest of all valid recursions and the input, with a cache on the function to catch equivalent replacement sequences), but it is (unworkably) slow, and I'm pretty sure it's an algorithmic issue as opposed to the implementation.
A couple of things that may (or may not) make a difference:
Token is an enumerated type.
The length of the output of each entry in the map is strictly less than the input of the entry.
I do not need what replacements were done and where, just the resulting sequence.
So, as an example where each character is a token (for simplicity's sake), if I have the replacement map as aaba -> a, aaa -> ab, and aba -> bb, and I apply minimalString('aaaaa'), I want to get 'a'.
The actual method signature is something along the following lines:
List<Token> getMinimalAfterReplacements(List<Token> inputList, Map<List<Token>, List<Token>> replacements) {
?
}
Is there a better method than brute-force? If not, is there, for example, a SAT library or similar that could be harnessed? Is there any preprocessing to the map that could be done to make it faster when called multiple times with different token lists but with the same replacement map?
The code below is a Python version to find the shortest possible reduction. It is non-recursive but not too far from the naive algorithm. In every step it tries all possible single reductions, thus, obtaining a set of strings to reduce for the next step.
One optimization that helps in cases when there are "symbol eating" rules like "aa" -> "a" is to check the next set of strings for duplicates.
Another optimization (not implemented in the code below) would be to process the replacement rules into a finite automaton that finds locations of all possible single reductions with a single pass through the input string. This would not help the exponential nature of the main tree search algorithm, though.
class Replacer:
    def __init__(self, replacements):
        self.replacements = [[tuple(key), tuple(value)] for key, value in replacements.items()]
    def get_possible_replacements(self, input):
        "Return all possible variations where a single replacement was done to the input"
        result = []
        for replace_what, replace_with in self.replacements:
            # print(replace_what, replace_with)
            for p in range(1 + len(input) - len(replace_what)):
                if input[p : p + len(replace_what)] == replace_what:
                    input_copy = list(input[:])
                    input_copy[p : p + len(replace_what)] = replace_with
                    result.append(tuple(input_copy))
        return result
    def get_minimum_sequence_list(self, input):
        "Return the shortest irreducible sequence that can be obtained from the given input"
        irreducible = []
        to_reduce = [tuple(input)]
        to_reduce_new = []
        step = 1
        while to_reduce:
            print("Reduction step", step, ", number of candidates to reduce:", len(to_reduce))
            step += 1
            for current_input in to_reduce:
                reductions = self.get_possible_replacements(current_input)
                if not reductions:
                    irreducible.append(current_input)
                else:
                    to_reduce_new += reductions
            to_reduce = set(to_reduce_new[:]) # This dramatically reduces the tree width by removing duplicates
            to_reduce_new = []
        irreducible_sorted = sorted(set(irreducible), key = lambda x: len(x))
        # print("".join(input), "could be reduced to any of", ["".join(x) for x in irreducible_sorted])
        return irreducible_sorted[0]
    def get_minimum_sequence(self, input):
        return "".join(self.get_minimum_sequence_list(list(input)))
input = "aaaaa"
replacements = {
    "aaba" : "a",
    "aaa" : "ab",
    "aba" : "bb",
}
replacer = Replacer(replacements)
replaced = replacer.get_minimum_sequence(input)
print("The shortest string", input, "could be reduced to is", replaced)
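The duplicate check in that code only compares candidates produced within one step. A possible variation, sketched here and not part of the answer itself, also remembers every string seen in earlier steps so that nothing is expanded twice. It works on plain strings for simplicity (one character per token), and the function name shortest_reduction is made up for this example. It still explores the whole search tree, so the worst case stays exponential.

```python
def shortest_reduction(text, rules):
    """Return one shortest irreducible string reachable from text.

    rules maps each pattern to its (strictly shorter) replacement.
    """
    seen = {text}          # every string ever generated, across all steps
    frontier = [text]      # strings still to be expanded
    irreducible = []
    while frontier:
        next_frontier = []
        for current in frontier:
            reduced = False
            for pattern, replacement in rules.items():
                start = current.find(pattern)
                while start != -1:
                    reduced = True
                    candidate = current[:start] + replacement + current[start + len(pattern):]
                    if candidate not in seen:
                        seen.add(candidate)
                        next_frontier.append(candidate)
                    # Continue one position later so overlapping matches are found too.
                    start = current.find(pattern, start + 1)
            if not reduced:
                irreducible.append(current)
        frontier = next_frontier
    return min(irreducible, key=len)

rules = {"aaba": "a", "aaa": "ab", "aba": "bb"}
result = shortest_reduction("aaaaa", rules)
print(result)
```

Termination relies on the constraint from the question that every replacement is strictly shorter than what it replaces.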
Just a simple idea which might reduce the branching: With rules like
ba -> c
ca -> b
and a string like
aaabaacaa
^ ^
you can do two substitutions and their order doesn't matter. This is already sort of covered by memoization, however, there's still a considerable overhead for generating the useless string. So I'd suggest the following rule:
After a substitution on position p, consider only substitutions on positions q such that
q + length(lhs_of_the_rule) > p
i.e., such that don't start to the left of the previous substitutions or they overlap.
As a simple low-level optimization, I'd suggest replacing the List<Token> with a String (or an encapsulated byte[] or short[] or whatever). The lower memory footprint should help the cache, and you can index an array by a string element (or two) in order to find out which rules may be applicable to it.

How to use or implement arrays in XQuery?

Is there any built-in support for arrays in XQuery? For example, if we wanted to implement
this simple Java program in XQuery, how would we do it?
(I am not asking to translate the entire program into XQuery, just asking
how to implement the array in line number 2 of the code below in XQuery. I am
also using MarkLogic / xdmp functions.)
java.lang.String test = new String("Hello XQuery");
char[] characters = test.toCharArray();
for(int i = 0; i<characters.length; i++) {
    if(characters[i] == (char)13) {
        characters[i] = (char) 0x00;
    }
}
Legend:
hex 0x00 dec 0 : null
hex 0x0d dec 13: carriage return
hex 0x0a dec 10: line feed
hex 0x22 dec 34: dquote
The problem with converting your sample code to XQuery is not the absence of support for arrays, but the fact that x00 is not a valid character in XML. If it weren't for this problem, you could express your query with the simple function call:
translate($input, '&#x0D;', '&#x00;')
Now, you could argue that's cheating, it just happens so that there's a function that does exactly what you are trying to do by hand. But if this function didn't exist, you could program it in XQuery: there are sufficient primitives available for strings to allow you to manipulate them any way you want. If you need to (and it's rarely necessary) you can convert a string to a sequence of integers using the function string-to-codepoints(), and then take advantage of all the XQuery facilities for manipulating sequences.
The lesson is, when you use a declarative language like XQuery or XSLT, don't try to use the same low-level programming techniques you were forced to use in more primitive languages. There's usually a much more direct way of expressing the problem.
XQuery has built-in support for sequences. The function tokenize() (as suggested by @harish.ray) returns a sequence. You can also construct one yourself using parentheses and commas:
let $mysequence := (1, 2, 3, 4)
Sequences are ordered lists, so you can rely on that. That is slightly different from a node-set returned from an XPath expression, which is usually in document order.
On a side note: actually, everything in XQuery is either a node-set or a sequence. Even if a function is declared to return one string or int, you can treat that returned value as if it is a sequence of one item. No explicit casting is necessary, for which there are no constructs in XQuery anyhow. Functions like fn:exists() and fn:empty() always work.
HTH!
Just for fun, here's how I would do this in XQuery if fn:translate did not exist. I think Michael Kay's suggestion would end up looking similar.
let $test := "Hello XQuery"
return codepoints-to-string(
  for $c in string-to-codepoints($test)
  return if ($c eq 32) then 44 else $c)
Note that I changed the transformation because of the problem he pointed out: 0 is not a legal codepoint. So instead I translated spaces to commas.
With MarkLogic, another option is to use http://docs.marklogic.com/json:array and its associated functions. The json:set-item-at function would allow coding in a vaguely imperative style. Coding both variations might be a good learning exercise.
There are two ways to do this.
Firstly you can create an XmlResults object using
XmlManager.createResults(), and use XmlResults.add() to add your
strings to this. You can then use the XmlResults object to set the
value of a variable in XmlQueryContext, which can be used in your
query.
Example:
XmlResults values = XMLManager.createResults();
values.add(new XmlValue("value1"));
values.add(new XmlValue("value2"));
XmlQueryContext.setVariableValue("files", values);
The alternative is to split the string in XQuery. You
can do this using the tokenize() function, which works using a
regular expression to match the string separator.
http://www.w3.org/TR/xpath-functions/#func-tokenize
Thanks.
A little outlook: XQuery 3.1 will provide native support for arrays. See http://www.w3.org/TR/xquery-31/ for more details.
You can construct an array like this:
let $myArray := tokenize('a b c d e f g', '\s')
(: $myArray[3] -> c :)
Please note that the first index of this pseudo-array is 1 not 0!
Since the question "How to use or implement arrays in XQuery?" is being held generic (and thus shows up in search results on this topic), I would like to add a generic answer for future reference (making it a Community Wiki, so others may expand):
As Christian Grün has already hinted at, XQuery 3.1 introduced a native array datatype, which is a subtype of the function datatype.
Since an array is an 'ordered list of values' and an XPath/XQuery sequence is as well, the first question that may arise is: "What's the difference?" The answer is simple: a sequence cannot contain another sequence. All sequences are automatically flattened. Not so an array, which can be an array of arrays. Just like sequences, arrays in XQuery can also hold any mix of other datatypes.
The native XQuery array datatype can be expressed in either of two ways: as [] or via array {}. The difference is that, when using the former constructor, a comma is considered a 'hard' comma, meaning that the following array consists of two members:
[ ("apples", "oranges"), "plums" ]
while the following will consist of three members:
array { ("apples", "oranges"), "plums" }
which means that the expression within curly braces is first resolved to a flat sequence, and then each item of that sequence becomes a member of the array.
Since Array is a subtype of function, an array can be thought of as an anonymous function, that takes a single parameter, the numeric index. To get the third member of an array, named $foo, we thus can write:
$foo(3)
If an array contains another array as a member you can chain the function calls together, as in:
$foo(3)(5)
Along with the array datatype, special operators have been added, which make it easy to look up the values of an array. One such operator (also used by the new Map datatype) is the question mark followed by an integer (or an expression that evaluates to zero or more integers).
$foo?(3)
would, again, return the third member within the array, while
$foo?(3, 6)
would return the members 3 and 6.
The parentheses can be left out when working with literal integers. However, they are needed to form the lookup index from a dynamic expression, as in:
$foo?(3 to 6)
here, the expression in the parens gets evaluated to a sequence of integers and thus the expression would return a sequence of all members from index position 3 to index position 6.
The asterisk * is used as wildcard operator. The expression
$foo?*
will return a sequence of all items in the array. Again, chaining is possible:
$foo?3?5
matches the previous example of $foo(3)(5).
More in-depth information can be found in the official spec: XML Path Language (XPath) 3.1 / 3.11.2 Arrays
Also, a new set of functions specific to arrays has been added. These functions reside in the namespace http://www.w3.org/2005/xpath-functions/array, which is conventionally bound to the prefix array, and are documented here: XPath and XQuery Functions and Operators 3.1 / 17.3 Functions that Operate on Arrays

Array declaration using lists

So I noticed something.
When doing a recursive method on codingbat, I looked at the:
String abc = "abc";
String mod = abc.substring(1);
System.out.println(mod); //prints "bc"
So I thought:
Hey, why not have a substring-like method for arrays?
For example:
String[] abc = {"a", "b", "c"};
String[] mod = abc[1, abc.length];
for(int i = 0; i < mod.length; i++){
    System.out.print(mod[i]);
    if(i + 1 != mod.length){
        System.out.print(", ");
    }
}
// This would print out: "b, c"
So, as you can see, it works like the substring method: it copies the objects from the first array into the second array from the start index up to, but not including, the end index (which avoids OutOfBounds exceptions).
How would one go about implementing this? I cannot seem to find the class that controls the "[ ]" brackets; something has to regulate them, because they aren't just "there", they had to be added in some way.
Thanks for any constructive criticism and feedback.
FirexRanger8
How would one go about implementing this? I cannot seem to find the class that controls the "[ ]" brackets; something has to regulate them, because they aren't just "there", they had to be added in some way.
They are "just there" in that they're hard-coded as part of the language and platform.
If you want to change how arrays are handled by the language, you'll have to make a change to the Java compiler. If you want to change how they behave at execution time, you'll have to change the JVM...
They're part of the Java language spec and what they do can't be changed, in Java.
There are other languages (e.g., Groovy) that target the JVM, have a language syntax similar to Java, and do support things like overriding operators such as array indexing.
But, it can't be done in Java.
You can't overload operators like in C++, so achieving that exact syntax is not possible in Java. But there already is a subList method exposed in the List interface; maybe that is what you want?
If you're interested in doing something like this, you should look at alternative JVM languages like Scala or Groovy, which have much more flexible syntax. For example, see:
Slice notation in scala?
I'm pretty sure this is a wheel that's already been invented:
String[] abc = { "a", "b", "c" };
String[] mod = Arrays.copyOfRange(abc, 1, abc.length); // now mod = [b, c]
The java.util.Arrays.copyOfRange function will do what you want ("a substring-like method for arrays") (just not with the syntax you want).
