I'm trying to execute a Hive DDL statement with the streaming Table API on Flink 1.13.2. The code looks like this:
String hiveDDL = ResourceUtil.readClassPathSource("hive-ddl.sql");
EnvironmentSettings settings = EnvironmentSettings.newInstance()
        .useBlinkPlanner()
        .inStreamingMode()
        .build();
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

String name = "hive";
String defaultDatabase = "stream";
String hiveConfDir = "conf";
HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog("hive", hive);
tableEnv.useCatalog("hive");
tableEnv.useDatabase("stream");

tableEnv.executeSql("DROP TABLE IF EXISTS dimension_table");
// switch to the Hive dialect before executing the Hive DDL
tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
tableEnv.executeSql(hiveDDL);
The Hive server runs on CDH 5.14.2, and the DDL looks like this:
CREATE TABLE dimension_table (
  product_id STRING,
  product_name STRING,
  unit_price DECIMAL(10, 4),
  pv_count BIGINT,
  like_count BIGINT,
  comment_count BIGINT,
  update_time TIMESTAMP(3),
  update_user STRING
)
PARTITIONED BY (
  pt_year STRING,
  pt_month STRING,
  pt_day STRING
)
TBLPROPERTIES (
  -- using default partition-name order to load the latest partition every 12h (the most recommended and convenient way)
  'streaming-source.enable' = 'true',
  'streaming-source.partition.include' = 'latest',
  'streaming-source.monitor-interval' = '12 h',
  'streaming-source.partition-order' = 'partition-name' -- option with default value, can be ignored

  -- alternative: using partition file create-time order to load the latest partition every 12h
  -- 'streaming-source.enable' = 'true',
  -- 'streaming-source.partition.include' = 'latest',
  -- 'streaming-source.partition-order' = 'create-time',
  -- 'streaming-source.monitor-interval' = '12 h'

  -- alternative: using partition-time order to load the latest partition every 12h
  -- 'streaming-source.enable' = 'true',
  -- 'streaming-source.partition.include' = 'latest',
  -- 'streaming-source.monitor-interval' = '12 h',
  -- 'streaming-source.partition-order' = 'partition-time',
  -- 'partition.time-extractor.kind' = 'default',
  -- 'partition.time-extractor.timestamp-pattern' = '$pt_year-$pt_month-$pt_day 00:00:00'
)
When I run it, the following NullPointerException is thrown:
2021-11-18 15:33:00,387 INFO [org.apache.flink.table.catalog.hive.HiveCatalog] - Setting hive conf dir as conf
2021-11-18 15:33:00,481 WARN [org.apache.hadoop.util.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-11-18 15:33:01,345 INFO [org.apache.flink.table.catalog.hive.HiveCatalog] - Created HiveCatalog 'hive'
2021-11-18 15:33:01,371 INFO [hive.metastore] - Trying to connect to metastore with URI thrift://cdh-dev-node-119:9083
2021-11-18 15:33:01,441 INFO [hive.metastore] - Opened a connection to metastore, current connections: 1
2021-11-18 15:33:01,521 INFO [hive.metastore] - Connected to metastore.
2021-11-18 15:33:01,856 INFO [org.apache.flink.table.catalog.hive.HiveCatalog] - Connected to Hive metastore
2021-11-18 15:33:01,899 INFO [org.apache.flink.table.catalog.CatalogManager] - Set the current default catalog as [hive] and the current default database as [stream].
2021-11-18 15:33:03,290 INFO [org.apache.hadoop.hive.ql.session.SessionState] - Created local directory: /var/folders/4m/n1wgh7rd2yqfv301kq00l4q40000gn/T/681dd0aa-ba35-4a0e-b069-3ad48f030774_resources
2021-11-18 15:33:03,298 INFO [org.apache.hadoop.hive.ql.session.SessionState] - Created HDFS directory: /tmp/hive/chenxiaojun/681dd0aa-ba35-4a0e-b069-3ad48f030774
2021-11-18 15:33:03,305 INFO [org.apache.hadoop.hive.ql.session.SessionState] - Created local directory: /var/folders/4m/n1wgh7rd2yqfv301kq00l4q40000gn/T/chenxiaojun/681dd0aa-ba35-4a0e-b069-3ad48f030774
2021-11-18 15:33:03,311 INFO [org.apache.hadoop.hive.ql.session.SessionState] - Created HDFS directory: /tmp/hive/chenxiaojun/681dd0aa-ba35-4a0e-b069-3ad48f030774/_tmp_space.db
2021-11-18 15:33:03,314 INFO [org.apache.hadoop.hive.ql.session.SessionState] - No Tez session required at this point. hive.execution.engine=mr.
Exception in thread "main" java.lang.NullPointerException
at org.apache.flink.table.catalog.hive.client.HiveShimV100.registerTemporaryFunction(HiveShimV100.java:422)
at org.apache.flink.table.planner.delegation.hive.HiveParser.parse(HiveParser.java:217)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:724)
at com.hacker.flinksql.hive.HiveSqlTest.main(HiveSqlTest.java:48)
I found the failing code in flink-1.13.2:
org.apache.flink.table.catalog.hive.client.HiveShimV100.java, line 422
The reflective call on this line hits a null reference. The code:
@Override
public void registerTemporaryFunction(String funcName, Class funcClass) {
    try {
        registerTemporaryFunction.invoke(null, funcName, funcClass);
    } catch (IllegalAccessException | InvocationTargetException e) {
        throw new FlinkHiveException("Failed to register temp function", e);
    }
}
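For context, a NullPointerException on that invoke(...) line typically means one of two things: the reflected Method field itself is null (a failed lookup), or invoke(null, ...) was called on a method that is not static. The sketch below is purely my illustration of the first case, not Flink's actual source; the class and field names are assumptions.

import java.lang.reflect.Method;

// Hypothetical illustration of a shim-style deferred reflective lookup;
// this is NOT Flink's real code, only the failure pattern it resembles.
class ShimSketch {
    static Method registerTemporaryFunction; // remains null if the lookup fails

    static {
        try {
            Class<?> registry = Class.forName("org.apache.hadoop.hive.ql.exec.FunctionRegistry");
            registerTemporaryFunction =
                    registry.getMethod("registerTemporaryFunction", String.class, Class.class);
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            // if the Hive version on the classpath lacks this signature, the
            // field stays null and the later invoke(...) throws an NPE
        }
    }
}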
My Maven dependencies:
<properties>
<hadoop.version>2.6.0-cdh5.14.2</hadoop.version>
<hive.version>1.1.0-cdh5.14.2</hive.version>
</properties>
<!-- flink sql core -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.5</version>
<scope>provided</scope>
</dependency>
<!-- hive catalog -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.version}</version>
<scope>provided</scope>
</dependency>
<!-- catalog hadoop dependency -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0-cdh5.15.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.6.0-cdh5.15.2</version>
<scope>provided</scope>
</dependency>
I have created an issue at https://issues.apache.org/jira/browse/FLINK-24950.
Is this a bug in the Flink source, and what should I do?
Related
I'm facing a problem when trying to use the Quarkus Flyway extension with Quarkus Reactive Hibernate & RESTEasy. When starting my application, I get the following error:
[io.qu.ru.Application] (Quarkus Main Thread) Failed to start application (with profile dev): java.lang.IllegalStateException: Booting an Hibernate Reactive serviceregistry on a non-reactive RecordedState!
at io.quarkus.hibernate.reactive.runtime.boot.registry.PreconfiguredReactiveServiceRegistryBuilder.checkIsReactive(PreconfiguredReactiveServiceRegistryBuilder.java:76)
at io.quarkus.hibernate.reactive.runtime.boot.registry.PreconfiguredReactiveServiceRegistryBuilder.<init>(PreconfiguredReactiveServiceRegistryBuilder.java:66)
at io.quarkus.hibernate.reactive.runtime.FastBootHibernateReactivePersistenceProvider.rewireMetadataAndExtractServiceRegistry(FastBootHibernateReactivePersistenceProvider.java:177)
at io.quarkus.hibernate.reactive.runtime.FastBootHibernateReactivePersistenceProvider.getEntityManagerFactoryBuilderOrNull(FastBootHibernateReactivePersistenceProvider.java:156)
at io.quarkus.hibernate.reactive.runtime.FastBootHibernateReactivePersistenceProvider.createEntityManagerFactory(FastBootHibernateReactivePersistenceProvider.java:82)
Here are the relevant Quarkus configurations:
quarkus:
  datasource:
    db-kind: "postgresql"
    username: "sarah"
    password: "connor"
    jdbc:
      ~: true
      url: "jdbc:postgresql://localhost:5432/mybase"
    reactive:
      ~: true
      url: "postgresql://localhost:5432/mybase"
And the relevant dependencies:
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-hibernate-reactive-panache</artifactId>
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-reactive-pg-client</artifactId>
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-flyway</artifactId>
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-jdbc-postgresql</artifactId>
</dependency>
Disabling the JDBC configuration with ~: false avoids the exception, but then the application does not run the Flyway migration at startup. In that case, I see the following message:
[io.qu.ag.de.AgroalProcessor] (build-39) The Agroal dependency is present but no JDBC datasources have been defined.
I found in some Quarkus issues that it's indeed not possible to run a reactive and a blocking database connection at the same time, but is there a way to make Flyway work with a reactive Quarkus application?
Quarkus indeed doesn't currently seem to support both blocking JDBC and a reactive SQL client at the same time.
A workaround is to disable JDBC for the Quarkus runtime and write your own wrapper to execute the Flyway migration.
The workaround below is based on the corresponding GitHub issue.
The Flyway wrapper, run when the application starts:
import javax.enterprise.context.ApplicationScoped;
import javax.enterprise.event.Observes;

import org.eclipse.microprofile.config.inject.ConfigProperty;
import org.flywaydb.core.Flyway;

import io.quarkus.runtime.StartupEvent;

@ApplicationScoped
public class RunFlyway {

    @ConfigProperty(name = "myapp.flyway.migrate")
    boolean runMigration;

    @ConfigProperty(name = "quarkus.datasource.reactive.url")
    String datasourceUrl;

    @ConfigProperty(name = "quarkus.datasource.username")
    String datasourceUsername;

    @ConfigProperty(name = "quarkus.datasource.password")
    String datasourcePassword;

    public void runFlywayMigration(@Observes StartupEvent event) {
        if (runMigration) {
            Flyway flyway = Flyway.configure()
                    .dataSource("jdbc:" + datasourceUrl, datasourceUsername, datasourcePassword)
                    .load();
            flyway.migrate();
        }
    }
}
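Note the design here: since quarkus.datasource.jdbc is disabled, Agroal provides no JDBC datasource, so the wrapper hands Flyway a plain JDBC URL built by prefixing jdbc: to the configured reactive URL. Both clients then point at the same database.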
pom.xml:
<!-- DB -->
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-reactive-pg-client</artifactId>
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-hibernate-reactive</artifactId>
</dependency>
<!-- Flyway -->
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-flyway</artifactId>
</dependency>
<dependency>
<groupId>io.quarkus</groupId>
<artifactId>quarkus-jdbc-postgresql</artifactId>
</dependency>
application.yml:
myapp:
  flyway:
    migrate: true

quarkus:
  datasource:
    db-kind: postgresql
    username: myuser
    password: mypassword
    jdbc: false
    reactive:
      url: postgresql://localhost:5432/mydb
In our Spring Boot project I am trying to accumulate messages in an H2 in-memory database and load them into a MySQL database once the message count reaches 50.
Here are the dependencies from my pom.xml:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
<exclusions>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-logging</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-log4j -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-log4j</artifactId>
<version>1.3.8.RELEASE</version>
</dependency>
<!-- H2 DB -->
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<version>${h2.version}</version>
</dependency>
My log4j.properties is as follows:
log4j.rootLogger = DEBUG, sql
# Define the file appender
log4j.appender.sql=org.apache.log4j.jdbc.JDBCAppender
log4j.appender.sql.URL=jdbc:h2:mem:liteDatabase
# Set Database Driver
log4j.appender.sql.driver=org.h2.Driver
# Set database user name and password
log4j.appender.sql.user=sa
log4j.appender.sql.password=sa
# Set the SQL statement to be executed.
log4j.appender.sql.sql=INSERT INTO LOGS (LOG_DATE, LOGGER, LOG_LEVEL, MESSAGE) VALUES (now() ,'%C','%p','%m')
# Define the xml layout for file appender
log4j.appender.sql.layout=org.apache.log4j.PatternLayout
From my createTables-h2.sql file:
CREATE TABLE LOGS
(
id INT auto_increment PRIMARY KEY,
LOG_DATE DATETIME NOT NULL,
LOGGER VARCHAR2(250) NOT NULL,
LOG_LEVEL VARCHAR2(10) NOT NULL,
MESSAGE VARCHAR2(1000) NOT NULL
);
The datasource bean for H2:
@Bean(name = "liteDataSource")
public BasicDataSource liteDataSource() {
    BasicDataSource dataSource = new BasicDataSource();
    dataSource.setDriverClassName(env.getProperty("h2.driver-class-name"));
    dataSource.setUrl(env.getProperty("h2.url"));
    dataSource.setUsername(env.getProperty("h2.username"));
    dataSource.setPassword(env.getProperty("h2.password"));
    // populate the fresh in-memory database with the DDL script
    Resource initData = new ClassPathResource("scripts/createTables-h2.sql");
    DatabasePopulator databasePopulator = new ResourceDatabasePopulator(initData);
    DatabasePopulatorUtils.execute(databasePopulator, dataSource);
    return dataSource;
}
The problem is that log4j tries to log into the database before the table has been created.
Error message:
log4j:ERROR Failed to excute sql
org.h2.jdbc.JdbcSQLException: Table "LOGS" not found; SQL statement:
INSERT INTO LOGS VALUES ('', now() ,'org.springframework.core.env.MutablePropertySources','DEBUG','Adding [servletConfigInitParams] PropertySource with lowest search precedence') [42102-193]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.command.Parser.readTableOrView(Parser.java:5389)
at org.h2.command.Parser.readTableOrView(Parser.java:5366)
at org.h2.command.Parser.parseInsert(Parser.java:1053)
at org.h2.command.Parser.parsePrepared(Parser.java:413)
at org.h2.command.Parser.parse(Parser.java:317)
at org.h2.command.Parser.parse(Parser.java:289)
at org.h2.command.Parser.prepareCommand(Parser.java:254)
at org.h2.engine.Session.prepareLocal(Session.java:561)
at org.h2.engine.Session.prepareCommand(Session.java:502)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1203)
at org.h2.jdbc.JdbcStatement.executeUpdateInternal(JdbcStatement.java:126)
at org.h2.jdbc.JdbcStatement.executeUpdate(JdbcStatement.java:115)
at org.apache.log4j.jdbc.JDBCAppender.execute(JDBCAppender.java:218)
at org.apache.log4j.jdbc.JDBCAppender.flushBuffer(JDBCAppender.java:289)
at org.apache.log4j.jdbc.JDBCAppender.append(JDBCAppender.java:186)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.log(Log4jLoggerAdapter.java:581)
at org.apache.commons.logging.impl.SLF4JLocationAwareLog.debug(SLF4JLocationAwareLog.java:131)
at org.springframework.core.env.MutablePropertySources.addLast(MutablePropertySources.java:109)
at org.springframework.web.context.support.StandardServletEnvironment.customizePropertySources(StandardServletEnvironment.java:84)
at org.springframework.core.env.AbstractEnvironment.<init>(AbstractEnvironment.java:122)
at org.springframework.core.env.StandardEnvironment.<init>(StandardEnvironment.java:54)
at org.springframework.web.context.support.StandardServletEnvironment.<init>(StandardServletEnvironment.java:44)
at org.springframework.boot.SpringApplication.getOrCreateEnvironment(SpringApplication.java:436)
at org.springframework.boot.SpringApplication.prepareEnvironment(SpringApplication.java:335)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:308)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1186)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1175)
at com.example.DemoApplication.main(DemoApplication.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.springframework.boot.devtools.restart.RestartLauncher.run(RestartLauncher.java:49)
Please help me out if you can offer any solution!
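One possible direction (my suggestion, not from the original thread): H2 can run an initialization script as part of opening a connection via its INIT parameter, which sidesteps the startup-ordering problem entirely, because whichever connection comes first, including the JDBCAppender's, creates the table. A minimal sketch of the idea, reusing the script path and credentials from the question; the same URL would then go into log4j.appender.sql.URL:

import java.sql.Connection;
import java.sql.DriverManager;

public class H2InitSketch {
    public static void main(String[] args) throws Exception {
        // H2's INIT clause runs the DDL script when a connection opens, so the
        // LOGS table exists before the appender's first INSERT. The script may
        // need CREATE TABLE IF NOT EXISTS, since every new connection re-runs it.
        String url = "jdbc:h2:mem:liteDatabase;"
                + "INIT=RUNSCRIPT FROM 'classpath:scripts/createTables-h2.sql'";
        try (Connection conn = DriverManager.getConnection(url, "sa", "sa")) {
            conn.createStatement().executeQuery("SELECT COUNT(*) FROM LOGS");
        }
    }
}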
I am new to Cassandra and Spark and am trying to fetch data from the database using Spark.
I am using Java for this purpose.
The problem is that no exceptions are thrown and no errors occur, but I still cannot get the data. My code is below:
SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("Spark-Cassandra Integration");
sparkConf.setMaster("local[4]");
sparkConf.set("spark.cassandra.connection.host", "stagingHost22");
sparkConf.set("spark.cassandra.connection.port", "9042");
sparkConf.set("spark.cassandra.connection.timeout_ms", "5000");
sparkConf.set("spark.cassandra.read.timeout_ms", "200000");

JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);

String keySpaceName = "testKeySpace";
String tableName = "testTable";

CassandraJavaRDD<CassandraRow> cassandraRDD =
        CassandraJavaUtil.javaFunctions(javaSparkContext).cassandraTable(keySpaceName, tableName);

final ArrayList dataList = new ArrayList();
JavaRDD<String> userRDD = cassandraRDD.map(new Function<CassandraRow, String>() {

    private static final long serialVersionUID = -165799649937652815L;

    public String call(CassandraRow row) throws Exception {
        System.out.println("Inside RDD call");
        dataList.add(row);
        return "test";
    }
});
System.out.println("data Size -" + dataList.size());
The Cassandra and Spark Maven dependencies are:
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-mapping</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-extras</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.sparkjava</groupId>
<artifactId>spark-core</artifactId>
<version>2.5.4</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>2.0.0-M3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.3.0</version>
</dependency>
I am sure that the stagingHost22 host has the Cassandra data, with keyspace testKeySpace and table testTable. See the query output below:
cqlsh:testKeySpace> select count(*) from testTable;

 count
-------
    34

(1 rows)
Can anybody please suggest what I am missing here?
Thanks in advance.
Your current code does not perform any Spark action, so no data is ever loaded: map is a lazy transformation.
See the Spark documentation to understand the difference between transformations and actions in Spark:
http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations
Furthermore, adding CassandraRows to an ArrayList is usually not necessary when using the Cassandra connector, and mutating a driver-side list from inside a map function won't work once the job runs on a real cluster. I would suggest implementing a simple select first, following the Spark-Cassandra-Connector documentation (see the sketch after the links below). If this works, you can extend the code as needed.
Check the following links on samples how to load data using the connector:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md
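For illustration, a minimal sketch of a first read that uses an action (collect) to actually trigger execution. It reuses the host, keyspace, and table names from the question and assumes a consistent Spark/connector version pair on the classpath:

import com.datastax.spark.connector.japi.CassandraJavaUtil;
import com.datastax.spark.connector.japi.CassandraRow;
import com.datastax.spark.connector.japi.rdd.CassandraJavaRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.List;

public class CassandraReadSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("Spark-Cassandra Integration")
                .setMaster("local[4]")
                .set("spark.cassandra.connection.host", "stagingHost22");
        JavaSparkContext sc = new JavaSparkContext(conf);

        CassandraJavaRDD<CassandraRow> rdd =
                CassandraJavaUtil.javaFunctions(sc).cassandraTable("testKeySpace", "testTable");

        // collect() is an action: it forces the lazy read/map pipeline to run and
        // ships the rows back to the driver (fine for 34 rows, not for big tables).
        List<CassandraRow> rows = rdd.collect();
        System.out.println("Fetched " + rows.size() + " rows");

        sc.stop();
    }
}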
I'm working on a project which uses Spark Streaming, Apache Kafka, and Cassandra.
I use the streaming-kafka integration. In Kafka I have a producer which sends data using this configuration:
props.put("metadata.broker.list", KafkaProperties.ZOOKEEPER);
props.put("bootstrap.servers", KafkaProperties.SERVER);
props.put("client.id", "DemoProducer");
where ZOOKEEPER = localhost:2181, and SERVER = localhost:9092.
Once I send data I can receive it with Spark, and I can consume it too. My Spark configuration is:
SparkConf sparkConf = new SparkConf().setAppName("org.kakfa.spark.ConsumerData").setMaster("local[4]");
sparkConf.set("spark.cassandra.connection.host", "localhost");
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));
After that I am trying to store this data in a Cassandra database. But when I try to open a session using this:
CassandraConnector connector = CassandraConnector.apply(jssc.sparkContext().getConf());
Session session = connector.openSession();
I get the following error:
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table schema_keyspaces))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:220)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1231)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:334)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:182)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:161)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:161)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:36)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:61)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:70)
at org.kakfa.spark.ConsumerData.main(ConsumerData.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Regarding Cassandra, I'm using the default configuration:
start_native_transport: true
native_transport_port: 9042
- seeds: "127.0.0.1"
cluster_name: 'Test Cluster'
rpc_address: localhost
rpc_port: 9160
start_rpc: true
I can connect to Cassandra from the command line using cqlsh localhost, and I get the following message:
Connected to Test Cluster at 127.0.0.1:9042. [cqlsh 5.0.1 | Cassandra 3.0.5 | CQL spec 3.4.0 | Native protocol v4] Use HELP for help. cqlsh>
I used nodetool status too, which shows this:
http://pastebin.com/ZQ5YyDyB
For running Cassandra I invoke bin/cassandra -f.
What I am trying to run is this:
try (Session session = connector.openSession()) {
    System.out.println("dentro del try");
    session.execute("DROP KEYSPACE IF EXISTS test");
    System.out.println("dentro del try - 1");
    session.execute("CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
    System.out.println("dentro del try - 2");
    session.execute("CREATE TABLE test.users (id TEXT PRIMARY KEY, name TEXT)");
    System.out.println("dentro del try - 3");
}
My pom.xml file looks like this:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.6.0-M1</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.6.0-M2</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.1.0-alpha2</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.1.0-alpha2</version>
</dependency>
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20160212</version>
</dependency>
</dependencies>
I have no idea why I can't connect to Cassandra from Spark. Is my configuration bad, or what am I doing wrong?
Thank you!
com.datastax.driver.core.exceptions.InvalidQueryException:
unconfigured table schema_keyspaces
That error indicates an old driver talking to a new Cassandra version: pre-3.0 drivers read schema metadata from system.schema_keyspaces, a table that no longer exists in Cassandra 3.x. Looking at the POM file, we find the spark-cassandra-connector dependencies declared twice.
One set uses version 1.6.0-M2 / 1.6.0-M1 (good) and the other 1.1.0-alpha2 (old, pulling in the outdated driver).
Remove the references to the old 1.1.0-alpha2 dependencies from your POM:
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.1.0-alpha2</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector-java_2.10</artifactId>
<version>1.1.0-alpha2</version>
</dependency>
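With the old artifacts gone, one consistent pair remains (spark-cassandra-connector 1.6.0-M2 and spark-cassandra-connector-java 1.6.0-M1), matching the Spark 1.6.1 dependencies. If in doubt, running mvn dependency:tree shows which driver version actually ends up on the classpath.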
The Mongo Java driver has been verbose since v3.x.x, and it's not very optimal to store a lot of logs like this. I would like to hide some of them.
Here are my dependencies (Maven pom.xml):
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongo-java-driver</artifactId>
<version>3.0.1</version>
</dependency>
<!-- LOGS -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.5</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>jcl-over-slf4j</artifactId>
<version>1.7.5</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>jul-to-slf4j</artifactId>
<version>1.7.5</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.5</version>
</dependency>
I want to hide the Mongo Java driver logs:
2015-05-16 10:43:22 023 DEBUG command:56 - Sending command {count : BsonString{value='skydatas'}} to database rizze on connection [connectionId{localValue:2, serverValue:9}] to server 127.0.0.1:27017
2015-05-16 10:43:22 041 DEBUG command:56 - Command execution completed
2015-05-16 10:43:22 042 DEBUG command:56 - Sending command {count : BsonString{value='skydatas'}} to database rizze on connection [connectionId{localValue:2, serverValue:9}] to server 127.0.0.1:27017
2015-05-16 10:43:22 060 DEBUG command:56 - Command execution completed
2015-05-16T10:49:34.448Z DEBUG Checking status of 127.0.0.1:27017 | | cluster:56
2015-05-16T10:49:34.452Z DEBUG Updating cluster description to {type=STANDALONE, servers=[{address=127.0.0.1:27017, type=STANDALONE, roundTripTime=1,3 ms, state=CONNECTED}] | | cluster:56
Thanks
You should be able to do this from your log4j configuration file.
The first thing I would do is temporarily add the fully-qualified class name to your log4j appender format (%C if you're using a formatter based on PatternLayout). That gives you the package and class names of the source of the logging statements you want to exclude.
Then follow the example from the log4j manual to specifically raise the logging level of the packages/classes you don't want to see (search the page for text from the snippet below to find it in context):
# Print only messages of level WARN or above in the package com.foo.
log4j.logger.com.foo=WARN
Don't forget to take the %C out of your pattern when you've excluded all the packages you don't want to see.
Here is a log4j configuration I'm using:
# Log4J logger file
log = ./log4j
log4j.rootLogger = WARN, CONSOLE
log4j.logger.com.rizze.db=INFO
# Direct log messages to stdout
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Target=System.out
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss SSS} %-5p %m / %c{1}:%L %n
What this workaround does:
- shows logs with level >= INFO for the package com.rizze.db
- shows all other logs with level >= WARN
It works for me.
Add this line to your log4j configuration to fix the problem:
log4j.logger.org.mongodb.driver=ERROR
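For the 3.x driver, org.mongodb.driver is the parent of the driver's SLF4J logger names (for example org.mongodb.driver.cluster and org.mongodb.driver.protocol.command), so raising it to ERROR silences the DEBUG chatter shown in the question.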