Getting shortestPaths in GraphFrames with Java

I am new to Spark and GraphFrames.
While learning about the shortestPaths method in GraphFrame, I found that the GraphFrames documentation provides sample code in Scala, but not in Java.
The documentation gives the following Scala code:
import org.graphframes.{examples, GraphFrame}
val g: GraphFrame = examples.Graphs.friends // get example graph
val results = g.shortestPaths.landmarks(Seq("a", "d")).run()
results.select("id", "distances").show()
and in Java, I tried:
import org.graphframes.GraphFrames;
import scala.collection.Seq;
import scala.collection.JavaConverters;
GraphFrame g = new GraphFrame(...,...);
Seq landmarkSeq = JavaConverters.collectionAsScalaIterableConverter(Arrays.asList((Object)"a",(Object)"d")).asScala().toSeq();
g.shortestPaths().landmarks(landmarkSeq).run().show();
or
g.shortestPaths().landmarks(new ArrayList<Object>(List.of((Object)"a",(Object)"d"))).run().show();
Casting to java.lang.Object was necessary because the API demands Seq<Object> or ArrayList<Object>, and passing an ArrayList<String> would not compile.
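(As an aside, the per-element casts can be avoided by giving Arrays.asList an explicit type witness; a sketch of the same conversion:)
import java.util.Arrays;
import scala.collection.JavaConverters;
import scala.collection.Seq;

// Arrays.<Object>asList builds a List<Object> up front, so no casts are needed.
Seq<Object> landmarkSeq = JavaConverters
        .asScalaBufferConverter(Arrays.<Object>asList("a", "d"))
        .asScala()
        .toSeq();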
After running the code, I saw the message:
Exception in thread "main" org.apache.spark.sql.AnalysisException: You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. To get rid of this error, you could:
1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`
2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType)`, if input types are all non primitive
3. set spark.sql.legacy.allowUntypedScalaUDF to true and use this API with caution;
To follow option 3, I added this line:
System.setProperty("spark.sql.legacy.allowUntypedScalaUDF","true");
but the situation did not change.
Since there is little sample code and there are few Stack Overflow questions about GraphFrames in Java, I could not find any useful information while searching around.
Could anyone experienced in this area help me solve this problem?

This seems to be a bug in GraphFrames 0.8.0.
See issue #367 on GitHub.
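Until you can move to a release where the issue is fixed, one workaround is to apply option 3 properly: the flag is a Spark SQL configuration, so System.setProperty does not reach it; it has to be set on the SparkSession (or SparkConf). A sketch, with placeholder app name and master:
import org.apache.spark.sql.SparkSession;

// Set the legacy flag when building the session...
SparkSession spark = SparkSession.builder()
        .appName("graphframes-shortest-paths")  // placeholder
        .master("local[*]")                     // placeholder
        .config("spark.sql.legacy.allowUntypedScalaUDF", "true")
        .getOrCreate();

// ...or on an already-running session:
spark.conf().set("spark.sql.legacy.allowUntypedScalaUDF", "true");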

Related

How do I rewrite this java function in scala keeping the same Optional input parameter?

I have the following method in Java:
protected <T> T getObjectFromNullableOptional(final Optional<T> opt) {
return Optional.ofNullable(opt).flatMap(innerOptional -> innerOptional).orElse(null);
}
It takes a Java Optional that can itself be null (I know, this is really bad, and we're going to fix it eventually) and wraps it in another Optional, so it becomes either Some(Optional<T>) or None. That is then flatMapped, so we get back Optional<T> or None, and finally orElse() is applied to get T or null.
How do I write the same method with the same java.util.Optional in scala?
protected def getObjectFromNullableOptional[T >: Null](opt : Optional[T]): T =
???
I tried
protected def getObjectFromNullableOptional[T >: Null](opt : Optional[T]): T =
Optional.ofNullable(opt).flatMap(o => o).orElse(null)
But this gives me a Type mismatch error
Required: Function[_ >: Optional[T], Optional[NotInferedU]]
Found: Nothing => Nothing
I tried
protected def getObjectFromNullableOptional[T >: Null](opt : Optional[T]): T =
Option(opt).flatMap(o => o).getOrElse(null)
But this gives me
Cannot resolve overloaded method 'flatMap'
Edit: I neglected to mention I'm using Scala 2.11. I believe @stefanobaghino's solution is for Scala 2.13, but it guided me towards the right path. I put my final solution in the comments under that answer.
The last error raises a few suspicions: it looks like you're wrapping a Java Optional in a Scala Option. I would instead have expected this to fail because you're trying to flatMap to a different type, with something like
error: type mismatch;
found : java.util.Optional[T] => java.util.Optional[T]
required: java.util.Optional[T] => Option[?]
This seems to fulfill your requirement:
import java.util.Optional
def getObjectFromNullableOptional[T](opt: Optional[T]): T =
Optional.ofNullable(opt).orElse(Optional.empty).orElse(null.asInstanceOf[T])
assert(getObjectFromNullableOptional(null) == null)
assert(getObjectFromNullableOptional(Optional.empty) == null)
assert(getObjectFromNullableOptional(Optional.of(1)) == 1)
You can play around with this here on Scastie.
Note that asInstanceOf is compiled to a cast, not to an actual method call, so this code will not throw a NullPointerException.
You can also go into something closer to your original solution by helping Scala's type inference a bit:
def getObjectFromNullableOptional[T](opt: Optional[T]): T =
Optional.ofNullable(opt).flatMap((o: Optional[T]) => o).orElse(null.asInstanceOf[T])
Or alternatively using Scala's identity:
def getObjectFromNullableOptional[T](opt: Optional[T]): T =
Optional.ofNullable(opt).flatMap(identity[Optional[T]]).orElse(null.asInstanceOf[T])
For a solution using Scala's Option you can do something very close:
def getObjectFromNullableOption[T](opt: Option[T]): T =
Option(opt).getOrElse(None).getOrElse(null.asInstanceOf[T])
Note that going to your flatMap solution with Scala's Option allows you to avoid having to be explicit about the function type:
def getObjectFromNullableOption[T](opt: Option[T]): T =
Option(opt).flatMap(identity).getOrElse(null.asInstanceOf[T])
I'm not fully sure about the specifics, but I believe the issue is that, when using java.util.Optional, you are passing a Scala function to Optional.flatMap, which takes a Java Function. The Scala compiler can convert this automatically for you, but apparently you have to be explicit about the type for this to work, at least in this case.
A note about your original code: you required T to be a supertype of Null but this is not necessary.
You have better context on what you are doing, but as general advice, it's usually better to avoid letting nulls leak into Scala code as much as possible.

Scala Collections - type casting Any to Seq[T] using Converters

I am new to Scala and I am trying to replace the deprecated JavaConversions library with JavaConverters. The original code looks like this:
addresses = {
import scala.collection.JavaConversions._
config.getConfigList("amqp.addresses").map(address ⇒
Address(
host = address.foo()
))(collection.breakOut)
}
When I replace the JavaConversions with JavaConvertors in the code above, I get a compilation error:
Type mismatch: expected Seq[Address], actual: Any
I understand what the error means, but I am not sure how I can change the code above to make it return a Seq[Address] and not an Any. Also, there is an asJava method in JavaConverters to convert a Scala list to a Java list, but I am not sure how I can use it here. Thoughts?
You would need to state the final type and explicitly convert the Java collection into a Scala one (asScala):
addresses: Seq[Address] = {
import scala.collection.JavaConverters._
config.getConfigList("amqp.addresses").asScala.map(address ⇒
Address(
host = address.foo()
))(collection.breakOut)
}

Java-callable n-Sampler for Spark Dataset

I'm migrating code from Python to Java and want to build an n-sampler for Dataset<Row>. It's been a bit frustrating: I ended up cheating and making a very inefficient Scala function for it, based on other posts. I then ran the function from my Java code, but even that hasn't worked.
N-Sample behaviour:
- Select N-rows randomly from dataset
- No repetitions (no replacement)
Current Solution (broken)
import scala.util.Random
object ScalaFunctions {
def nSample(df : org.apache.spark.sql.Dataset[org.apache.spark.sql.Row], n : Int) : org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = {
//inefficient! Shuffles entire dataframe
val output = Random.shuffle(df).take(n)
return output.asInstanceOf[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]]
}
}
Error Message
Error:(6, 25) inferred type arguments [org.apache.spark.sql.Row,org.apache.spark.sql.Dataset] do not conform to method shuffle's type parameter bounds [T,CC[X] <: TraversableOnce[X]]
val output = Random.shuffle(df).take(n)
Error:(6, 33) type mismatch;
found : org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
required: CC[T]
val output = Random.shuffle(df).take(n)
I'm new to Java and Scala, so even though I understand the shuffle function doesn't seem to like Datasets, I have no idea how to fix it.
- Virtual beer if you have a solution that doesn't involve shuffling the entire dataframe (for me, this could be like 4M rows) for a small n sample (250)
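A sketch of one way to get a small random sample without sorting or shuffling the whole dataframe, using Spark's built-in Bernoulli sampling (the 1.1 oversampling factor is a heuristic, and sample can still return slightly fewer than n rows in rare cases):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public static Dataset<Row> nSample(Dataset<Row> df, int n) {
    long total = df.count();
    if (total <= n) {
        return df;
    }
    // Oversample a little so that, with high probability, at least n rows
    // survive the sampling pass, then trim to exactly n.
    double fraction = Math.min(1.0, 1.1 * n / total);
    return df.sample(false, fraction).limit(n);
}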

Using Scala in Java - how to convert a Java Object to Option<Object>

I'm writing in Java, and I need to use an external library that is written in Scala. In particular, I need the following constructor:
new PartitionMetadata(partitionId: Int, leader: Option[Broker], replicas: Seq[Broker], isr: Seq[Broker] = collection.this.Seq.empty[Nothing], errorCode: Short = kafka.common.ErrorMapping.NoError)
I was able to convert all but the leader: Option[Broker] and the Seq parameters in my Java code:
partitionMetadata = new kafka.api.PartitionMetadata(
partitionId, leader.getBroker(),(Seq)brokerReplicas, (Seq)brokerIsr, errorCode);
I'm getting the following error in my editor:
'PartitionMetadata(int, scala.Option<kafka.cluster.Broker>, scala.collection.Seq<kafka.cluster.Broker>, scala.collection.Seq<kafka.cluster.Broker>, short)' in 'kafka.api.PartitionMetadata'
cannot be applied to (int, kafka.cluster.Broker, scala.collection.Seq, scala.collection.Seq, short)
Is it possible to use a Scala constructor in Java? Also, how do I convert a Java object (leader) to an Option?
Lastly, am I converting the ArrayList -> scala.collection.Seq fields correctly?
Thanks
Yes, it's possible to use this Scala constructor in Java. The error message from your editor gives you a hint: it expects a scala.Option<kafka.cluster.Broker> as the second argument.
You can create that scala.Option as follows: scala.Option.apply(leader.getBroker())
Also, you shouldn't just cast your Java ArrayLists to scala.Seq. Instead, check out scala.collection.JavaConversions.
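Putting that together, a sketch (variable names are taken from the question; assuming brokerReplicas and brokerIsr are java.util.Lists of Broker, and that leader.getBroker() may return null):
import kafka.api.PartitionMetadata;
import kafka.cluster.Broker;
import scala.Option;
import scala.collection.JavaConversions;
import scala.collection.Seq;

// Option.apply maps null to None and a non-null value to Some(value).
Option<Broker> leaderOpt = Option.apply(leader.getBroker());

// asScalaBuffer wraps a Java list in a mutable Buffer, which is a Seq.
Seq<Broker> replicaSeq = JavaConversions.asScalaBuffer(brokerReplicas);
Seq<Broker> isrSeq = JavaConversions.asScalaBuffer(brokerIsr);

PartitionMetadata metadata =
        new PartitionMetadata(partitionId, leaderOpt, replicaSeq, isrSeq, errorCode);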

Using BabelNet in Java

I am using BabelNet API 2.5 to get the synset of a word. The code for various purposes is clearly elucidated here: http://babelnet.org/guide#Mainclasses
Accordingly, I wrote my code (in Java):
BabelNet bn = BabelNet.getInstance();
...
for (BabelSynset synset : bn.getSynsets(Language.EN, value, BabelPOS.NOUN,BabelSenseSource.WN))
{
System.out.println("Synset ID: " + synset.getId());
}
In the code, value contains the String whose synset I need.
But I get this error:
'The method getSynsets(Language, String, POS, BabelSenseSource...) in the type BabelNet is not applicable for the arguments (Language, String, BabelPOS, BabelSenseSource)' with the bn.getSynsets highlighted.
I am using Eclipse to do this.
Can anybody explain the error?
You must use the POS class from edu.mit.jwi.item instead of BabelPOS.
POS is contained in jwi-2.1.4.jar; after importing edu.mit.jwi.item.POS, you can write:
bn.getSynsets(Language.EN, value, POS.NOUN, BabelSenseSource.WN)
The example code you saw at that link is for the latest version of BabelNet (3.0), not for 2.5.
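A sketch of the corrected call in full (the BabelNet and jlt import paths below are from the 2.5-era API and may differ in your setup; value is the query string from the question):
import edu.mit.jwi.item.POS;                      // from jwi-2.1.4.jar
import it.uniroma1.lcl.babelnet.BabelNet;         // 2.5-era path, may differ
import it.uniroma1.lcl.babelnet.BabelSenseSource; // 2.5-era path, may differ
import it.uniroma1.lcl.babelnet.BabelSynset;      // 2.5-era path, may differ
import it.uniroma1.lcl.jlt.util.Language;         // 2.5-era path, may differ

BabelNet bn = BabelNet.getInstance();
// POS.NOUN comes from JWI, not from BabelNet's BabelPOS.
for (BabelSynset synset : bn.getSynsets(Language.EN, value, POS.NOUN,
        BabelSenseSource.WN)) {
    System.out.println("Synset ID: " + synset.getId());
}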
