I have created a Spring Boot microservice that runs aggregations on a stream of data and writes the results into various Cassandra tables. I am looking for a Java library, similar to Flyway, that migrates the Cassandra schema based on scripts in a resource folder. Does anyone have any recommendations, preferably for a library you have personally used in production?
I used builtamont's cassandra-migration:
<dependency>
<groupId>com.builtamont</groupId>
<artifactId>cassandra-migration</artifactId>
<version>0.9</version>
</dependency>
migration in code:
import com.builtamont.cassandra.migration.CassandraMigration;
import com.builtamont.cassandra.migration.api.configuration.KeyspaceConfiguration;
import org.springframework.beans.factory.InitializingBean;
class CassandraDataSourceMigration implements InitializingBean {
private final String ip;
private final String clusterName;
private final Integer port;
private final String keyspaceName;
private final String migrationsPath;
public CassandraDataSourceMigration(String ip, String clusterName, Integer port, String keyspaceName, String migrationsPath) {
this.ip = ip;
this.clusterName = clusterName;
this.port = port;
this.keyspaceName = keyspaceName;
this.migrationsPath = migrationsPath;
}
// getters/setters
@Override
public void afterPropertiesSet() throws Exception {
final KeyspaceConfiguration keyspaceConfig = new KeyspaceConfiguration();
keyspaceConfig.setName(keyspaceName);
keyspaceConfig.getClusterConfig().setContactpoints(new String[]{ip});
if (port != null) {
keyspaceConfig.getClusterConfig().setPort(port);
}
final CassandraMigration migrationProcessor = new CassandraMigration();
migrationProcessor.setLocations(new String[]{migrationsPath});
migrationProcessor.setKeyspaceConfig(keyspaceConfig);
migrationProcessor.migrate();
}
}
application.properties
cassandra.ip=127.0.0.1
cassandra.cluster=My cluster
cassandra.keyspace=saya
cassandra.migration=classpath:db/migration
cassandra.port=9042
The migration script V1_0__Init_table.cql lives under resources/db/migration.
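For completeness, here is a minimal sketch of wiring the migration class above into a Spring context from those properties. The configuration class name is mine; the constructor arguments come from the answer above.
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CassandraMigrationConfig {

    // Registered as a bean so afterPropertiesSet() runs the migration at startup.
    @Bean
    public CassandraDataSourceMigration cassandraDataSourceMigration(
            @Value("${cassandra.ip}") String ip,
            @Value("${cassandra.cluster}") String clusterName,
            @Value("${cassandra.port}") Integer port,
            @Value("${cassandra.keyspace}") String keyspaceName,
            @Value("${cassandra.migration}") String migrationsPath) {
        return new CassandraDataSourceMigration(ip, clusterName, port, keyspaceName, migrationsPath);
    }
}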
I am having an issue when I deploy the jar to an AWS Ubuntu server. I have two packages in Spring Boot:
1) core package
2) process package
The first package (core) does not contain the Spring Boot dependency, and its jar is built with the Maven package goal. The code below works fine when I run it in IntelliJ, but when I deploy it to the AWS Ubuntu machine I get a NullPointerException for the MySQLAdapter class used in the controller of the process package.
Below is the code
//Code in Process
@SpringBootApplication
public class ProcessApplication
{
@Bean
MySQLAdapter mySQLAdapter() {
MySQLAdapter mySQLAdapter = new MySQLAdapter();
return mySQLAdapter;
}
public static void main(String[] args) {
SpringApplication.run(com.test.ProcessApplication.class, args);
}
}
// Core package jar
public class MySQLAdapter {
private String hostName;
private int port;
private String DBName;
private String username;
private String pwd;
private String secretPwdKey;
private com.test.Util.CryptoUtil cryptoUtil;
public MySQLAdapter() {
hostName = "localhost";
port = 3306;
username = "test";
pwd = "test";
DBName = "test";
secretPwdKey = "test";
cryptoUtil = new com.test.Util.CryptoUtil();
}
}
Is this NullPointerException caused by cryptoUtil not being injected or instantiated?
How can this be solved?
// Controller inside Process package
@RestController
class MyRestController{
@Autowired
MySQLAdapter mySQLAdapter;
@RequestMapping(value = {"/register"}, method = {RequestMethod.POST}, produces = {"application/json"})
@ResponseBody
public RegistrationResponse register(@Valid @RequestBody UserPojo userPojo) {
if (!this.mySQLAdapter.checkClientID(userPojo.getClientID())) { //Null Pointer thrown here
}
}
}
Our project does not currently use the Spring framework, so it is tested with a standalone Tomcat runner. However, since integration tests such as those driven by @SpringBootTest are not possible, Tomcat is started in advance and the HTTP API tests are run with Spock.
Is there a way to make this work like @SpringBootTest?
TomcatRunner
import java.io.File;
import javax.servlet.ServletContextListener;
import javax.servlet.http.HttpServlet;
import org.apache.catalina.Context;
import org.apache.catalina.LifecycleException;
import org.apache.catalina.startup.Tomcat;

public class Tomcat8Launcher {
private Tomcat tomcat = null;
private int port = 8080;
private String contextPath = null;
private String docBase = null;
private Context rootContext = null;
public Tomcat8Launcher(){
init();
}
public Tomcat8Launcher(int port, String contextPath, String docBase){
this.port = port;
this.contextPath = contextPath;
this.docBase = docBase;
init();
}
private void init(){
tomcat = new Tomcat();
tomcat.setPort(port);
tomcat.enableNaming();
if(contextPath == null){
contextPath = "";
}
if(docBase == null){
File base = new File(System.getProperty("java.io.tmpdir"));
docBase = base.getAbsolutePath();
}
rootContext = tomcat.addContext(contextPath, docBase);
}
public void addServlet(String servletName, String uri, HttpServlet servlet){
Tomcat.addServlet(this.rootContext, servletName, servlet);
rootContext.addServletMapping(uri, servletName);
}
public void addListenerServlet(ServletContextListener listener){
rootContext.addApplicationListener(listener.getClass().getName());
}
public void startServer() throws LifecycleException {
tomcat.start();
// Note: await() blocks the calling thread until the server is shut down.
tomcat.getServer().await();
}
public void stopServer() throws LifecycleException {
tomcat.stop();
}
public static void main(String[] args) throws Exception {
System.setProperty("java.util.logging.manager", "org.apache.logging.log4j.jul.LogManager");
System.setProperty(javax.naming.Context.INITIAL_CONTEXT_FACTORY, "org.apache.naming.java.javaURLContextFactory");
System.setProperty(javax.naming.Context.URL_PKG_PREFIXES, "org.apache.naming");
Tomcat8Launcher tomcatServer = new Tomcat8Launcher();
tomcatServer.addListenerServlet(new ConfigInitBaseServlet());
tomcatServer.addServlet("restServlet", "/rest/*", new RestServlet());
tomcatServer.addServlet("jsonServlet", "/json/*", new JsonServlet());
tomcatServer.startServer();
}
}
Spock API Test example
class apiTest extends Specification {
//static final Tomcat8Launcher tomcat = new Tomcat8Launcher()
static final String testURL = "http://localhost:8080/api/"
@Shared
def restClient
def setupSpec() {
// tomcat.main()
restClient = new RESTClient(testURL)
}
def 'findAll user'() {
when:
def response = restClient.get([path: 'user/all'])
then:
with(response){
status == 200
contentType == "application/json"
}
}
}
The test does not work when the comment markers are removed from the lines below (i.e. when they are uncommented).
// static final Tomcat8Launcher tomcat = new Tomcat8Launcher()
This line is at the top of the API test.
// tomcat.main()
This line is in the API test's setupSpec() method.
I don't know why, but once Tomcat is up, only logs are printed and the test methods are never executed.
Is there a way to fix this?
I would suggest creating a Spock extension to encapsulate everything you need. See the chapter on writing custom extensions in the Spock docs, as well as the built-in extensions, for inspiration.
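For what it's worth, the hang you describe is consistent with startServer() calling tomcat.getServer().await(), which blocks the calling thread until shutdown, so nothing after it (including the specs) ever runs. Below is a minimal sketch of a global extension, assuming the Spock 1.1+ API (AbstractGlobalExtension) and a hypothetical non-blocking start method on your launcher; the class must also be registered in META-INF/services/org.spockframework.runtime.extension.IGlobalExtension:
import org.spockframework.runtime.extension.AbstractGlobalExtension;

public class TomcatExtension extends AbstractGlobalExtension {

    private Tomcat8Launcher tomcat;

    @Override
    public void start() {
        tomcat = new Tomcat8Launcher();
        // Register servlets/listeners here, then start Tomcat WITHOUT calling
        // getServer().await(): await() would block this thread and the specs
        // would never run. startServerNoAwait() is a hypothetical variant of
        // startServer() that only calls tomcat.start().
        try {
            tomcat.startServerNoAwait();
        } catch (Exception e) {
            throw new RuntimeException("Failed to start embedded Tomcat", e);
        }
    }

    @Override
    public void stop() {
        try {
            tomcat.stopServer();
        } catch (Exception e) {
            // best-effort shutdown at the end of the test run
        }
    }
}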
Introduction:
Let me start by apologizing for any vagueness in my question. I will try to provide as much information on this topic as I can (hopefully not too much); please let me know if I should provide more. I am also quite new to Kafka and will probably stumble on terminology.
So, from my understanding of how sinks and sources work, I can use the FileStreamSourceConnector provided by the Kafka Quickstart guide to write data (Neo4j commands) to a topic held in a Kafka cluster. Then I can write my own Neo4j sink connector and task to read those commands and send them to one or more Neo4j servers. To keep the project as simple as possible, for now, I based the sink connector and task on the Kafka Quickstart guide's FileStreamSinkConnector and FileStreamSinkTask.
Kafka's FileStream:
FileStreamSourceConnector
FileStreamSourceTask
FileStreamSinkConnector
FileStreamSinkTask
My Neo4j Sink Connector:
package neo4k.sink;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;
import org.apache.kafka.common.utils.AppInfoParser;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Neo4jSinkConnector extends SinkConnector {
public enum Keys {
;
static final String URI = "uri";
static final String USER = "user";
static final String PASS = "pass";
static final String LOG = "log";
}
private static final ConfigDef CONFIG_DEF = new ConfigDef()
.define(Keys.URI, Type.STRING, "", Importance.HIGH, "Neo4j URI")
.define(Keys.USER, Type.STRING, "", Importance.MEDIUM, "User Auth")
.define(Keys.PASS, Type.STRING, "", Importance.MEDIUM, "Pass Auth")
.define(Keys.LOG, Type.STRING, "./neoj4sinkconnecterlog.txt", Importance.LOW, "Log File");
private String uri;
private String user;
private String pass;
private String logFile;
@Override
public String version() {
return AppInfoParser.getVersion();
}
@Override
public void start(Map<String, String> props) {
uri = props.get(Keys.URI);
user = props.get(Keys.USER);
pass = props.get(Keys.PASS);
logFile = props.get(Keys.LOG);
}
@Override
public Class<? extends Task> taskClass() {
return Neo4jSinkTask.class;
}
@Override
public List<Map<String, String>> taskConfigs(int maxTasks) {
ArrayList<Map<String, String>> configs = new ArrayList<>();
for (int i = 0; i < maxTasks; i++) {
Map<String, String> config = new HashMap<>();
if (uri != null)
config.put(Keys.URI, uri);
if (user != null)
config.put(Keys.USER, user);
if (pass != null)
config.put(Keys.PASS, pass);
if (logFile != null)
config.put(Keys.LOG, logFile);
configs.add(config);
}
return configs;
}
@Override
public void stop() {
}
@Override
public ConfigDef config() {
return CONFIG_DEF;
}
}
My Neo4j Sink Task:
package neo4k.sink;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;
import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.StatementResult;
import org.neo4j.driver.v1.exceptions.Neo4jException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Collection;
import java.util.Map;
public class Neo4jSinkTask extends SinkTask {
private static final Logger log = LoggerFactory.getLogger(Neo4jSinkTask.class);
private String uri;
private String user;
private String pass;
private String logFile;
private Driver driver;
private Session session;
public Neo4jSinkTask() {
}
@Override
public String version() {
return new Neo4jSinkConnector().version();
}
@Override
public void start(Map<String, String> props) {
uri = props.get(Neo4jSinkConnector.Keys.URI);
user = props.get(Neo4jSinkConnector.Keys.USER);
pass = props.get(Neo4jSinkConnector.Keys.PASS);
logFile = props.get(Neo4jSinkConnector.Keys.LOG);
driver = null;
session = null;
try {
driver = GraphDatabase.driver(uri, AuthTokens.basic(user, pass));
session = driver.session();
} catch (Neo4jException ex) {
log.trace(ex.getMessage(), logFilename());
}
}
@Override
public void put(Collection<SinkRecord> sinkRecords) {
StatementResult result;
for (SinkRecord record : sinkRecords) {
result = session.run(record.value().toString());
log.trace(result.toString(), logFilename());
}
}
@Override
public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
}
@Override
public void stop() {
if (session != null)
session.close();
if (driver != null)
driver.close();
}
private String logFilename() {
return logFile == null ? "stdout" : logFile;
}
}
The Issue:
After writing that, I built it, with all of its dependencies except the Kafka ones, into a single jar (an uber jar? It was one file). Then I edited the plugin path in connect-standalone.properties to include that artifact and wrote a properties file for my Neo4j sink connector. I did all of this in an attempt to follow these guidelines.
My Neo4j sink connector properties file:
name=neo4k-sink
connector.class=neo4k.sink.Neo4jSinkConnector
tasks.max=1
uri=bolt://localhost:7687
user=neo4j
pass=Hunter2
topics=connect-test
But upon running the standalone connector, I get this error in the output, which shuts down the stream (see the ERROR on the fifth line of the log below):
[2017-08-14 12:59:00,150] INFO Kafka version : 0.11.0.0 (org.apache.kafka.common.utils.AppInfoParser:83)
[2017-08-14 12:59:00,150] INFO Kafka commitId : cb8625948210849f (org.apache.kafka.common.utils.AppInfoParser:84)
[2017-08-14 12:59:00,153] INFO Source task WorkerSourceTask{id=local-file-source-0} finished initialization and start (org.apache.kafka.connect.runtime.WorkerSourceTask:143)
[2017-08-14 12:59:00,153] INFO Created connector local-file-source (org.apache.kafka.connect.cli.ConnectStandalone:91)
[2017-08-14 12:59:00,153] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:100)
java.lang.IllegalArgumentException: Malformed \uxxxx encoding.
at java.util.Properties.loadConvert(Properties.java:574)
at java.util.Properties.load0(Properties.java:390)
at java.util.Properties.load(Properties.java:341)
at org.apache.kafka.common.utils.Utils.loadProps(Utils.java:429)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:84)
[2017-08-14 12:59:00,156] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:65)
[2017-08-14 12:59:00,156] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer:154)
[2017-08-14 12:59:00,168] INFO Stopped ServerConnector@540accf4{HTTP/1.1}{0.0.0.0:8083} (org.eclipse.jetty.server.ServerConnector:306)
[2017-08-14 12:59:00,173] INFO Stopped o.e.j.s.ServletContextHandler@6d548d27{/,null,UNAVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:865)
Edit: I should mention that in the part of the startup log where the loader declares which plugins have been added, I do not see any mention of the jar that I built earlier and added to the plugin path in connect-standalone.properties. Here's a snippet for context:
[2017-08-14 12:58:58,969] INFO Added plugin 'org.apache.kafka.connect.file.FileStreamSinkConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-08-14 12:58:58,969] INFO Added plugin 'org.apache.kafka.connect.tools.MockSourceConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-08-14 12:58:58,969] INFO Added plugin 'org.apache.kafka.connect.tools.VerifiableSourceConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-08-14 12:58:58,969] INFO Added plugin 'org.apache.kafka.connect.tools.VerifiableSinkConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-08-14 12:58:58,970] INFO Added plugin 'org.apache.kafka.connect.tools.MockConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
Conclusion:
I am at a loss; I've been testing and researching for a couple of hours, and I'm not exactly sure what question to ask. So I'll say thank you for reading if you've gotten this far. If you noticed anything glaring that I may have done wrong in code or in method (e.g. packaging the jar), or think I should provide more context, console logs, or anything really, let me know. Thank you again.
As pointed out by @Randall Hauch, my properties file had hidden characters in it because it was a rich text document. I fixed this by duplicating the connect-file-sink.properties file provided with Kafka, which I believe is a plain text document, and then renaming and editing that duplicate for my Neo4j sink properties.
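If you want to catch this earlier next time: the stack trace above shows Kafka's Utils.loadProps parsing the file with plain java.util.Properties, so a quick standalone check like this sketch (the file name is illustrative) reproduces the same Malformed \uxxxx encoding failure outside of Connect:
import java.io.FileInputStream;
import java.util.Properties;

public class PropsCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // load() throws IllegalArgumentException on malformed \uxxxx escapes,
        // exactly as in the Connect standalone stack trace above.
        try (FileInputStream in = new FileInputStream("neo4j-sink.properties")) {
            props.load(in);
        }
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}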
I am trying to import data from MySQL to Hive using SqoopOptions, but I am getting this error:
ERROR tool.ImportTool: Imported Failed: Wrong FS: hdfs://localhost:8020/user/hive/warehouse/default/emp/_logs, expected: file:///
It imports the data into HDFS but not into Hive.
Here is my complete code.
import org.apache.sqoop.tool.ImportTool;
import com.cloudera.sqoop.SqoopOptions;
public class App
{
public static void main( String[] args )
{
importToHive("emp");
}
/* CONSTANTS */
private static final String JOB_NAME = "Sqoop Hive Job";
private static final String MAPREDUCE_JOB = "Hive Map Reduce Job";
private static final String DBURL ="jdbc:mysql://localhost:3306/sample";
private static final String DRIVER = "com.mysql.jdbc.Driver";
private static final String USERNAME = "root";
private static final String PASSWORD = "cloudera";
private static final String HADOOP_HOME ="/usr/lib/hadoop-0.20-mapreduce";
private static final String JAR_OUTPUT_DIR = "/tmp/sqoop/compile";
private static final String HIVE_HOME = "/usr/lib/hive";
private static final String HIVE_DIR = "/user/hive/warehouse/";
private static final String WAREHOUSE_DIR = "hdfs://localhost:8020/user/hive/warehouse/default";
private static final String SUCCESS = "SUCCESS !!!";
private static final String FAIL = "FAIL !!!";
/**
* Imports data from RDBMS MySQL and uploads into Hive environment
*/
public static void importToHive(String table){
System.out.println("SqoopOptions loading .....");
/* MySQL connection parameters */
SqoopOptions options = new SqoopOptions();
options.setConnectString(DBURL);
options.doOverwriteHiveTable();
options.setTableName(table);
options.setDriverClassName(DRIVER);
options.setUsername(USERNAME);
options.setPassword(PASSWORD);
options.setHadoopMapRedHome(HADOOP_HOME);
/* Hive connection parameters */
options.setHiveHome(HIVE_HOME);
options.setHiveImport(true);
options.setHiveTableName("bsefmcgh");
options.setOverwriteHiveTable(true);
options.setFailIfHiveTableExists(false);
//options.setFieldsTerminatedBy(',');
options.setOverwriteHiveTable(true);
options.setDirectMode(true);
options.setNumMappers(1); // No. of Mappers to be launched for the job
options.setWarehouseDir(WAREHOUSE_DIR);
options.setJobName(JOB_NAME);
options.setMapreduceJobName(MAPREDUCE_JOB);
options.setTableName(table);
options.setJarOutputDir(JAR_OUTPUT_DIR);
System.out.println("Import Tool running ....");
ImportTool it = new ImportTool();
int retVal = it.run((com.cloudera.sqoop.SqoopOptions) options);
}
}
I believe you do not need to specify the name node address in the warehouse-dir option.
Try this:
private static final String WAREHOUSE_DIR = "/user/hive/warehouse/default";
I would like to write an integration with Elasticsearch. For testing, I would like to run ES in-memory.
I found some information in the documentation, but no example of how to write that kind of test: Elasticsearch Reference [1.6] » Testing » Java Testing Framework » integration tests.
I also found the following article, but it's out of date: Easy JUnit testing with Elastic Search.
I'm looking for an example of how to start and run ES in-memory and access it over the REST API.
Based on the second link you provided, I created this abstract test class:
@RunWith(SpringJUnit4ClassRunner.class)
public abstract class AbstractElasticsearchTest {
private static final String HTTP_PORT = "9205";
private static final String HTTP_TRANSPORT_PORT = "9305";
private static final String ES_WORKING_DIR = "target/es";
private static final String CLUSTER_NAME = "monkeys.elasticsearch";
private static Node node;
@BeforeClass
public static void startElasticsearch() throws Exception {
removeOldDataDir(ES_WORKING_DIR + "/" + CLUSTER_NAME);
Settings settings = Settings.builder()
.put("path.home", ES_WORKING_DIR)
.put("path.conf", ES_WORKING_DIR)
.put("path.data", ES_WORKING_DIR)
.put("path.work", ES_WORKING_DIR)
.put("path.logs", ES_WORKING_DIR)
.put("http.port", HTTP_PORT)
.put("transport.tcp.port", HTTP_TRANSPORT_PORT)
.put("index.number_of_shards", "1")
.put("index.number_of_replicas", "0")
.put("discovery.zen.ping.multicast.enabled", "false")
.build();
node = nodeBuilder().settings(settings).clusterName(CLUSTER_NAME).client(false).node();
node.start();
}
@AfterClass
public static void stopElasticsearch() {
node.close();
}
private static void removeOldDataDir(String datadir) throws Exception {
File dataDir = new File(datadir);
if (dataDir.exists()) {
FileSystemUtils.deleteRecursively(dataDir);
}
}
}
In the production code, I configured an Elasticsearch client as follows. The integration test extends the abstract class defined above and sets the property elasticsearch.port to 9305 and elasticsearch.host to localhost.
@Configuration
public class ElasticsearchConfiguration {
@Bean(destroyMethod = "close")
public Client elasticsearchClient(@Value("${elasticsearch.clusterName}") String clusterName,
@Value("${elasticsearch.host}") String elasticsearchClusterHost,
@Value("${elasticsearch.port}") Integer elasticsearchClusterPort) throws UnknownHostException {
Settings settings = Settings.settingsBuilder().put("cluster.name", clusterName).build();
InetSocketTransportAddress transportAddress = new InetSocketTransportAddress(InetAddress.getByName(elasticsearchClusterHost), elasticsearchClusterPort);
return TransportClient.builder().settings(settings).build().addTransportAddress(transportAddress);
}
}
That's it. The integration test will run the production code, which is configured to connect to the node started in AbstractElasticsearchTest.startElasticsearch().
In case you want to use the Elasticsearch REST API, use port 9205, e.g. with Apache HttpComponents:
HttpClient httpClient = HttpClients.createDefault();
HttpPut httpPut = new HttpPut("http://localhost:9205/_template/" + templateName);
httpPut.setEntity(new FileEntity(new File("template.json")));
httpClient.execute(httpPut);
Here is my implementation:
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.UUID;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;
/**
*
* @author Raghu Nair
*/
public final class ElasticSearchInMemory {
private static Client client = null;
private static File tempDir = null;
private static Node elasticSearchNode = null;
public static Client getClient() {
return client;
}
public static void setUp() throws Exception {
tempDir = File.createTempFile("elasticsearch-temp", Long.toString(System.nanoTime()));
tempDir.delete();
tempDir.mkdir();
System.out.println("writing to: " + tempDir);
String clusterName = UUID.randomUUID().toString();
elasticSearchNode = NodeBuilder
.nodeBuilder()
.local(false)
.clusterName(clusterName)
.settings(
ImmutableSettings.settingsBuilder()
.put("script.disable_dynamic", "false")
.put("gateway.type", "local")
.put("index.number_of_shards", "1")
.put("index.number_of_replicas", "0")
.put("path.data", new File(tempDir, "data").getAbsolutePath())
.put("path.logs", new File(tempDir, "logs").getAbsolutePath())
.put("path.work", new File(tempDir, "work").getAbsolutePath())
).node();
elasticSearchNode.start();
client = elasticSearchNode.client();
}
public static void tearDown() throws Exception {
if (client != null) {
client.close();
}
if (elasticSearchNode != null) {
elasticSearchNode.stop();
elasticSearchNode.close();
}
if (tempDir != null) {
removeDirectory(tempDir);
}
}
public static void removeDirectory(File dir) throws IOException {
if (dir.isDirectory()) {
File[] files = dir.listFiles();
if (files != null && files.length > 0) {
for (File aFile : files) {
removeDirectory(aFile);
}
}
}
Files.delete(dir.toPath());
}
}
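A possible usage sketch, driving this helper from a JUnit 4 test (the test class and assertion are mine, not part of the implementation above):
import org.elasticsearch.client.Client;
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Test;

public class ElasticSearchInMemoryTest {

    @BeforeClass
    public static void start() throws Exception {
        ElasticSearchInMemory.setUp(); // boots the throwaway node in a temp dir
    }

    @AfterClass
    public static void stop() throws Exception {
        ElasticSearchInMemory.tearDown(); // stops the node and deletes the temp dir
    }

    @Test
    public void clientIsAvailable() {
        Client client = ElasticSearchInMemory.getClient();
        Assert.assertNotNull(client);
        // index documents and query them back through this client
    }
}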
You can start ES locally with:
Settings settings = Settings.settingsBuilder()
.put("path.home", ".")
.build();
NodeBuilder.nodeBuilder().settings(settings).node();
Once ES has started, access it over REST, e.g.:
http://localhost:9200/_cat/health?v
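Or, to check from Java instead of a browser, a small sketch against the default port (assuming the node is up on localhost:9200):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class EsHealthCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:9200/_cat/health?v");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // epoch, cluster name, status, node counts, ...
            }
        }
    }
}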
As of 2016, embedded Elasticsearch is no longer supported.
As per a response from one of the developers in 2017, you can use the following approaches:
Use the Gradle tools Elasticsearch already has. You can read some information about this here: https://github.com/elastic/elasticsearch/issues/21119
Use the Maven plugin: https://github.com/alexcojocaru/elasticsearch-maven-plugin
Use Ant scripts like http://david.pilato.fr/blog/2016/10/18/elasticsearch-real-integration-tests-updated-for-ga
Using Docker: https://www.testcontainers.org/modules/elasticsearch (a minimal sketch follows after this list)
Using Docker from maven: https://github.com/dadoonet/fscrawler/blob/e15dddf72b1ed094dad279d492e4e0314f73683f/pom.xml#L241-L289
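To illustrate the Testcontainers route from the links above, a minimal sketch (assumes Docker is available and the org.testcontainers:elasticsearch module is on the classpath; the image tag is only an example):
import org.testcontainers.elasticsearch.ElasticsearchContainer;

public class EsContainerSketch {
    public static void main(String[] args) {
        try (ElasticsearchContainer es =
                new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:7.9.2")) {
            es.start();
            // The container maps port 9200 to a random host port; point your REST client here.
            System.out.println("Elasticsearch reachable at http://" + es.getHttpHostAddress());
        }
    }
}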