I'm working on a project that uses Spark Streaming, Apache Kafka and Cassandra.
I use the Spark streaming-kafka integration. In Kafka I have a producer that sends data using this configuration:
props.put("metadata.broker.list", KafkaProperties.ZOOKEEPER);
props.put("bootstrap.servers", KafkaProperties.SERVER);
props.put("client.id", "DemoProducer");
where ZOOKEEPER = localhost:2181, and SERVER = localhost:9092.
Once I send data I can receive and consume it with Spark. My Spark configuration is:
SparkConf sparkConf = new SparkConf().setAppName("org.kakfa.spark.ConsumerData").setMaster("local[4]");
sparkConf.set("spark.cassandra.connection.host", "localhost");
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));
After that I am trying to store this data in a Cassandra database, but when I try to open a session using this:
CassandraConnector connector = CassandraConnector.apply(jssc.sparkContext().getConf());
Session session = connector.openSession();
I get the following error:
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table schema_keyspaces))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:220)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1231)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:334)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:182)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:161)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:161)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:36)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:61)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:70)
at org.kakfa.spark.ConsumerData.main(ConsumerData.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
As for Cassandra, I'm using the default configuration:
start_native_transport: true
native_transport_port: 9042
- seeds: "127.0.0.1"
cluster_name: 'Test Cluster'
rpc_address: localhost
rpc_port: 9160
start_rpc: true
I can connect to Cassandra from the command line using cqlsh localhost, and get the following message:
Connected to Test Cluster at 127.0.0.1:9042. [cqlsh 5.0.1 | Cassandra 3.0.5 | CQL spec 3.4.0 | Native protocol v4] Use HELP for help. cqlsh>
I ran nodetool status too, which shows this:
http://pastebin.com/ZQ5YyDyB
To run Cassandra I invoke bin/cassandra -f.
What I am trying to run is this:
try (Session session = connector.openSession()) {
System.out.println("dentro del try");
session.execute("DROP KEYSPACE IF EXISTS test");
System.out.println("dentro del try - 1");
session.execute("CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
System.out.println("dentro del try - 2");
session.execute("CREATE TABLE test.users (id TEXT PRIMARY KEY, name TEXT)");
System.out.println("dentro del try - 3");
}
My pom.xml file looks like this:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.6.0-M1</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.6.0-M2</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.1.0-alpha2</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.1.0-alpha2</version>
</dependency>
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20160212</version>
</dependency>
</dependencies>
I have no idea why I can't connect to Cassandra from Spark. Is my configuration bad, or what am I doing wrong?
Thank you!
com.datastax.driver.core.exceptions.InvalidQueryException:
unconfigured table schema_keyspaces
That error indicates an old driver talking to a newer Cassandra version: the old driver queries the system.schema_keyspaces table, which no longer exists in Cassandra 3.x. Looking at the POM file, we find the spark-cassandra-connector dependencies declared twice.
One pair uses the 1.6.0 milestones (good) and the other pair uses 1.1.0-alpha2 (old).
Remove the old 1.1.0-alpha2 dependencies from your config:
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.1.0-alpha2</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.1.0-alpha2</version>
</dependency>
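After removing them, the connector entries that remain (versions copied from the pom above) would be:
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.6.0-M1</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.6.0-M2</version>
</dependency>
It may also be worth aligning the -java artifact on the same milestone as the core connector, if one is published.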
Related
Exception message - 'Unsupported OP_QUERY command: insert. The client driver may require an upgrade. For more details see https://dochub.mongodb.org/core/legacy-opcode-removal' on server localhost:27017. The full response is { "ok" : 0.0, "errmsg" : "Unsupported OP_QUERY command: insert. The client driver may require an upgrade. For more details see https://dochub.mongodb.org/core/legacy-opcode-removal", "code" : 352, "codeName" : "UnsupportedOpQueryCommand" }
Dependencies:
<!-- mongodb java driver -->
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongo-java-driver</artifactId>
<version>3.3.0</version>
</dependency>
<!-- Spring data mongodb -->
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-mongodb</artifactId>
<version>1.2.0.RELEASE</version>
</dependency>
Properties
info.mongo.host=localhost
info.mongo.port=27017
info.mongo.dbname=testing
Code:
private static MongoClient mongo;
DBCollection dbCollection = mongo.getDB(dbName).getCollection(collectionName);
BasicDBObject document = new BasicDBObject();
document.put("field", "test field");
dbCollection.insert(document);
Can anyone please help with this issue?
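The error text itself hints at the cause: the server no longer accepts the legacy OP_QUERY opcode for commands, and mongo-java-driver 3.3.0 still issues inserts that way, so a driver upgrade is the usual remedy. A possible direction, as a sketch only (the version below is an assumption, and the very old spring-data-mongodb 1.2.0.RELEASE would almost certainly need a matching upgrade as well):
<!-- hypothetical version bump; the legacy DB/DBCollection API used above still exists in the 3.12.x driver line -->
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongo-java-driver</artifactId>
<version>3.12.11</version>
</dependency>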
I'm trying to execute a Hive DDL statement with the streaming Table API on Flink 1.13.2. The code looks like this:
String hiveDDL = ResourceUtil.readClassPathSource("hive-ddl.sql");
EnvironmentSettings settings = EnvironmentSettings.newInstance()
.useBlinkPlanner()
.inStreamingMode().build();
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);
String name = "hive";
String defaultDatabase = "stream";
String hiveConfDir = "conf";
HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog("hive", hive);
tableEnv.useCatalog("hive");
tableEnv.useDatabase("stream");
tableEnv.executeSql("DROP TABLE IF EXISTS dimension_table");
// set the Hive dialect
tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
tableEnv.executeSql(hiveDDL);
The Hive server runs on CDH 5.14.2, and the DDL looks like this:
CREATE TABLE dimension_table (
product_id STRING,
product_name STRING,
unit_price DECIMAL(10, 4),
pv_count BIGINT,
like_count BIGINT,
comment_count BIGINT,
update_time TIMESTAMP(3),
update_user STRING
)
PARTITIONED BY (
pt_year STRING,
pt_month STRING,
pt_day STRING
)
TBLPROPERTIES (
-- using default partition-name order to load the latest partition every 12h (the most recommended and convenient way)
'streaming-source.enable' = 'true',
'streaming-source.partition.include' = 'latest',
'streaming-source.monitor-interval' = '12 h',
'streaming-source.partition-order' = 'partition-name', -- option with default value, can be ignored.
-- using partition file create-time order to load the latest partition every 12h
'streaming-source.enable' = 'true',
'streaming-source.partition.include' = 'latest',
'streaming-source.partition-order' = 'create-time',
'streaming-source.monitor-interval' = '12 h'
-- using partition-time order to load the latest partition every 12h
'streaming-source.enable' = 'true',
'streaming-source.partition.include' = 'latest',
'streaming-source.monitor-interval' = '12 h',
'streaming-source.partition-order' = 'partition-time',
'partition.time-extractor.kind' = 'default',
'partition.time-extractor.timestamp-pattern' = '$pt_year-$pt_month-$pt_day 00:00:00'
)
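Note that the TBLPROPERTIES block above pastes together three alternative configurations from the Flink documentation, so keys such as 'streaming-source.enable' and 'streaming-source.partition.include' appear several times; table properties are a key/value map, so only one variant should be kept. Trimmed to just the partition-name variant, the clause would reduce to:
TBLPROPERTIES (
'streaming-source.enable' = 'true',
'streaming-source.partition.include' = 'latest',
'streaming-source.monitor-interval' = '12 h',
'streaming-source.partition-order' = 'partition-name'
)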
Then I run it, but it throws a NullPointerException:
2021-11-18 15:33:00,387 INFO [org.apache.flink.table.catalog.hive.HiveCatalog] - Setting hive conf dir as conf
2021-11-18 15:33:00,481 WARN [org.apache.hadoop.util.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-11-18 15:33:01,345 INFO [org.apache.flink.table.catalog.hive.HiveCatalog] - Created HiveCatalog 'hive'
2021-11-18 15:33:01,371 INFO [hive.metastore] - Trying to connect to metastore with URI thrift://cdh-dev-node-119:9083
2021-11-18 15:33:01,441 INFO [hive.metastore] - Opened a connection to metastore, current connections: 1
2021-11-18 15:33:01,521 INFO [hive.metastore] - Connected to metastore.
2021-11-18 15:33:01,856 INFO [org.apache.flink.table.catalog.hive.HiveCatalog] - Connected to Hive metastore
2021-11-18 15:33:01,899 INFO [org.apache.flink.table.catalog.CatalogManager] - Set the current default catalog as [hive] and the current default database as [stream].
2021-11-18 15:33:03,290 INFO [org.apache.hadoop.hive.ql.session.SessionState] - Created local directory: /var/folders/4m/n1wgh7rd2yqfv301kq00l4q40000gn/T/681dd0aa-ba35-4a0e-b069-3ad48f030774_resources
2021-11-18 15:33:03,298 INFO [org.apache.hadoop.hive.ql.session.SessionState] - Created HDFS directory: /tmp/hive/chenxiaojun/681dd0aa-ba35-4a0e-b069-3ad48f030774
2021-11-18 15:33:03,305 INFO [org.apache.hadoop.hive.ql.session.SessionState] - Created local directory: /var/folders/4m/n1wgh7rd2yqfv301kq00l4q40000gn/T/chenxiaojun/681dd0aa-ba35-4a0e-b069-3ad48f030774
2021-11-18 15:33:03,311 INFO [org.apache.hadoop.hive.ql.session.SessionState] - Created HDFS directory: /tmp/hive/chenxiaojun/681dd0aa-ba35-4a0e-b069-3ad48f030774/_tmp_space.db
2021-11-18 15:33:03,314 INFO [org.apache.hadoop.hive.ql.session.SessionState] - No Tez session required at this point. hive.execution.engine=mr.
Exception in thread "main" java.lang.NullPointerException
at org.apache.flink.table.catalog.hive.client.HiveShimV100.registerTemporaryFunction(HiveShimV100.java:422)
at org.apache.flink.table.planner.delegation.hive.HiveParser.parse(HiveParser.java:217)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:724)
at com.hacker.flinksql.hive.HiveSqlTest.main(HiveSqlTest.java:48)
I found the failing code in flink-1.13.2:
org.apache.flink.table.catalog.hive.client.HiveShimV100.java - line 422
The parameters passed into this method are null at that point. The code:
@Override
public void registerTemporaryFunction(String funcName, Class funcClass) {
    try {
        registerTemporaryFunction.invoke(null, funcName, funcClass);
    } catch (IllegalAccessException | InvocationTargetException e) {
        throw new FlinkHiveException("Failed to register temp function", e);
    }
}
My Maven dependencies:
<properties>
<hadoop.version>2.6.0-cdh5.14.2</hadoop.version>
<hive.version>1.1.0-cdh5.14.2</hive.version>
</properties>
<!-- flink sql core -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.5</version>
<scope>provided</scope>
</dependency>
<!-- hive catalog -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.version}</version>
<scope>provided</scope>
</dependency>
<!-- catalog hadoop dependency -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0-cdh5.15.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.6.0-cdh5.15.2</version>
<scope>provided</scope>
</dependency>
I created an issue at https://issues.apache.org/jira/browse/FLINK-24950
Is this a bug in the Flink source? What should I do?
I am new to Cassandra and Spark and am trying to fetch data from the database using Spark.
I am using Java for this purpose.
The problem is that no exception is thrown and no error occurs, but I am still not able to get the data. My code is below:
SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("Spark-Cassandra Integration");
sparkConf.setMaster("local[4]");
sparkConf.set("spark.cassandra.connection.host", "stagingHost22");
sparkConf.set("spark.cassandra.connection.port", "9042");
sparkConf.set("spark.cassandra.connection.timeout_ms", "5000");
sparkConf.set("spark.cassandra.read.timeout_ms", "200000");
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
String keySpaceName = "testKeySpace";
String tableName = "testTable";
CassandraJavaRDD<CassandraRow> cassandraRDD = CassandraJavaUtil.javaFunctions(javaSparkContext).cassandraTable(keySpaceName, tableName);
final ArrayList dataList = new ArrayList();
JavaRDD<String> userRDD = cassandraRDD.map(new Function<CassandraRow, String>() {
private static final long serialVersionUID = -165799649937652815L;
public String call(CassandraRow row) throws Exception {
System.out.println("Inside RDD call");
dataList.add(row);
return "test";
}
});
System.out.println( "data Size -" + dataList.size());
The Cassandra and Spark Maven dependencies are:
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-mapping</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-extras</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.sparkjava</groupId>
<artifactId>spark-core</artifactId>
<version>2.5.4</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>2.0.0-M3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.3.0</version>
</dependency>
I am sure that the stagingHost22 host has the Cassandra data, with keyspace testKeySpace and table testTable. The query output is below:
cqlsh:testKeySpace> select count(*) from testTable;
count
34
(1 rows)
Can anybody please suggest what I am missing here?
Thanks in advance.
Warm regards,
Vibhav
Your current code does not perform any Spark action. Therefore no data is loaded.
See the Spark documentation to understand the difference between transformations and actions in Spark:
http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations
Furthermore, adding CassandraRows to an ArrayList isn't usually necessary when using the Cassandra connector, and collecting rows into a driver-side list from inside a map function will not work reliably anyway. I would suggest implementing a simple select first, following the Spark Cassandra Connector documentation (see the sketch after the links below). Once that works you can extend the code as needed.
Check the following links for samples of how to load data using the connector:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md
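As a starting point, here is a minimal sketch of such a select, reusing the keyspace/table names and connection settings from the question together with the connector's Java RDD API. It is only an illustration; adjust it to your Spark and connector versions.
import com.datastax.spark.connector.japi.CassandraJavaUtil;
import com.datastax.spark.connector.japi.CassandraRow;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.List;

public class SimpleSelect {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("Spark-Cassandra Integration")
                .setMaster("local[4]")
                .set("spark.cassandra.connection.host", "stagingHost22");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // count() is an action, so it actually triggers reading from Cassandra
        long rowCount = CassandraJavaUtil.javaFunctions(sc)
                .cassandraTable("testKeySpace", "testTable")
                .count();
        System.out.println("row count = " + rowCount);

        // collect() is also an action; it brings the rows to the driver (fine for small tables only)
        List<CassandraRow> rows = CassandraJavaUtil.javaFunctions(sc)
                .cassandraTable("testKeySpace", "testTable")
                .collect();
        for (CassandraRow row : rows) {
            System.out.println(row);
        }

        sc.stop();
    }
}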
I wrote code to get data from the "topicTest1" Kafka topic, but I am not able to print data from the consumer. The errors are shown below.
Below is my code to consume the data:
public static void main(String[] args) throws Exception {
// StreamingExamples.setStreamingLogLevels();
SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount").setMaster("local[*]");
// Create the context with a 100 ms batch interval
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(100));
int numThreads = Integer.parseInt("3");
Map<String, Integer> topicMap = new HashMap<>();
String[] topics = "topicTest1".split(",");
for (String topic : topics) {
topicMap.put(topic, numThreads);
}
JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(jssc, "9.98.171.226:9092", "1",
topicMap);
messages.print();
jssc.start();
jssc.awaitTermination();
}
I am using the following dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-twitter_2.11</artifactId>
<version>1.6.1</version>
</dependency>
Below is the error I got:
Exception in thread "dispatcher-event-loop-0" java.lang.NoSuchMethodError: scala/Predef$.$conforms()Lscala/Predef$$less$colon$less; (loaded from file:/C:/Users/Administrator/.m2/repository/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar by sun.misc.Launcher$AppClassLoader#4b69b358) called from class org.apache.spark.streaming.scheduler.ReceiverSchedulingPolicy (loaded from file:/C:/Users/Administrator/.m2/repository/org/apache/spark/spark-streaming_2.11/1.6.2/spark-streaming_2.11-1.6.2.jar by sun.misc.Launcher$AppClassLoader#4b69b358).
at org.apache.spark.streaming.scheduler.ReceiverSchedulingPolicy.scheduleReceivers(ReceiverSchedulingPolicy.scala:138)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receive$1.applyOrElse(ReceiverTracker.scala:450)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
16/11/14 13:38:00 INFO ForEachDStream: metadataCleanupDelay = -1
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:785)
Another error:
Exception in thread "JobGenerator" java.lang.NoSuchMethodError: scala/Predef$.$conforms()Lscala/Predef$$less$colon$less; (loaded from file:/C:/Users/Administrator/.m2/repository/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar by sun.misc.Launcher$AppClassLoader#4b69b358) called from class org.apache.spark.streaming.scheduler.ReceivedBlockTracker (loaded from file:/C:/Users/Administrator/.m2/repository/org/apache/spark/spark-streaming_2.11/1.6.2/spark-streaming_2.11-1.6.2.jar by sun.misc.Launcher$AppClassLoader#4b69b358).
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.allocateBlocksToBatch(ReceivedBlockTracker.scala:114)
at org.apache.spark.streaming.scheduler.ReceiverTracker.allocateBlocksToBatch(ReceiverTracker.scala:203)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Make sure that you use matching versions. Let's say you use the following Maven dependency:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.1</version>
</dependency>
So the artifact is spark-streaming-kafka_2.10, i.e. it is built against Scala 2.10.
Now check whether you use the matching Kafka build:
cd /KAFKA_HOME/libs
Now find kafka_YOUR-VERSION-sources.jar.
In case you have kafka_2.10-0xxxx-sources.jar you are fine! :)
If the versions differ, just change the Maven dependencies OR download the matching Kafka build.
After that, check your Spark dependencies and make sure the Scala suffix and version are consistent there as well:
groupId: org.apache.spark
artifactId: spark-core_2.xx
version: xxx
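In this particular case the stack trace shows spark-streaming_2.11-1.6.2 being loaded next to scala-library 2.10.5, which is exactly such a mismatch. A sketch of the dependency block with every Spark artifact aligned on the _2.10 suffix and version 1.6.1 (the twitter artifact is only kept because it appears in the question):
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-twitter_2.10</artifactId>
<version>1.6.1</version>
</dependency>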
I'm trying to build some filters to filter data from Bigtable. I'm using the bigtable-hbase driver and the HBase client libraries. Here are my dependencies from pom.xml:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>${hbase.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-protocol</artifactId>
<version>${hbase.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}</version>
</dependency>
<dependency>
<groupId>com.google.cloud.bigtable</groupId>
<artifactId>bigtable-hbase</artifactId>
<version>${bigtable.version}</version>
</dependency>
I'm filtering data like this:
Filter filterName = new SingleColumnValueFilter(Bytes.toBytes("FName"), Bytes.toBytes("FName"),
CompareFilter.CompareOp.EQUAL, new RegexStringComparator("JOHN"));
FilterList filters = new FilterList();
filters.addFilter(filterName);
Scan scan1 = new Scan();
scan1.setFilter(filters);
This works OK. But then I add the following to the previous code:
Filter filterSalary = new SingleColumnValueFilter(Bytes.toBytes("Salary"), Bytes.toBytes("Salary"),
CompareFilter.CompareOp.GREATER_OR_EQUAL, new LongComparator(100000));
filters.addFilter(filterSalary);
and it gives me this exception:
Exception in thread "main" com.google.cloud.bigtable.hbase.adapters.filters.UnsupportedFilterException: Unsupported filters encountered: FilterSupportStatus{isSupported=false, reason='ValueFilter must have either a BinaryComparator with any compareOp or a RegexStringComparator with an EQUAL compareOp. Found (LongComparator, GREATER_OR_EQUAL)'}
at com.google.cloud.bigtable.hbase.adapters.filters.FilterAdapter.throwIfUnsupportedFilter(FilterAdapter.java:144)
at com.google.cloud.bigtable.hbase.adapters.ScanAdapter.throwIfUnsupportedScan(ScanAdapter.java:55)
at com.google.cloud.bigtable.hbase.adapters.ScanAdapter.adapt(ScanAdapter.java:91)
at com.google.cloud.bigtable.hbase.adapters.ScanAdapter.adapt(ScanAdapter.java:43)
at com.google.cloud.bigtable.hbase.BigtableTable.getScanner(BigtableTable.java:247)
So my question is: how do I filter on a long data type? Is this an HBase issue or Bigtable-specific?
I found this: How do you use a custom comparator with SingleColumnValueFilter on HBase? But I can't load my own jars onto the server, so it is not applicable in my case.
SingleColumnValueFilter supports the following comparators:
BinaryComparator
BinaryPrefixComparator
RegexStringComparator.
See this link for an up-to-date list:
https://cloud.google.com/bigtable/docs/hbase-differences
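If the salary values were written with Bytes.toBytes(long), i.e. as fixed-length 8-byte big-endian values, one workaround that stays within the supported comparators is a BinaryComparator, because byte-wise comparison of that encoding matches numeric order for non-negative values. A sketch under that assumption (column family and qualifier names copied from the question):
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Assumes salaries are stored via Bytes.toBytes(long) and are non-negative.
Filter filterSalary = new SingleColumnValueFilter(
        Bytes.toBytes("Salary"),                       // column family
        Bytes.toBytes("Salary"),                       // qualifier
        CompareFilter.CompareOp.GREATER_OR_EQUAL,
        new BinaryComparator(Bytes.toBytes(100000L))); // 8-byte big-endian value

FilterList filters = new FilterList();
filters.addFilter(filterSalary);

Scan scan = new Scan();
scan.setFilter(filters);
If the column can hold negative values, this byte-wise comparison breaks down (the sign bit makes negative longs sort after positive ones), so the values would need an order-preserving encoding instead.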