Introduction:
Let me start by apologizing for any vagueness in my question. I will try to provide as much information on this topic as I can (hopefully not too much); please let me know if I should provide more. I am also quite new to Kafka and will probably stumble on terminology.
So, from my understanding of how the sink and source work, I can use the FileStreamSourceConnector provided by the Kafka Quickstart guide to write data (Neo4j commands) to a topic held in a Kafka cluster. Then I can write my own Neo4j sink connector and task to read those commands and send them to one or more Neo4j servers. To keep the project as simple as possible, for now, I based the sink connector and task on the Kafka Quickstart guide's FileStreamSinkConnector and FileStreamSinkTask.
Kafka's FileStream:
FileStreamSourceConnector
FileStreamSourceTask
FileStreamSinkConnector
FileStreamSinkTask
My Neo4j Sink Connector:
package neo4k.sink;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;
import org.apache.kafka.common.utils.AppInfoParser;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Neo4jSinkConnector extends SinkConnector {

    public enum Keys {
        ;
        static final String URI = "uri";
        static final String USER = "user";
        static final String PASS = "pass";
        static final String LOG = "log";
    }

    private static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define(Keys.URI, Type.STRING, "", Importance.HIGH, "Neo4j URI")
            .define(Keys.USER, Type.STRING, "", Importance.MEDIUM, "User Auth")
            .define(Keys.PASS, Type.STRING, "", Importance.MEDIUM, "Pass Auth")
            .define(Keys.LOG, Type.STRING, "./neoj4sinkconnecterlog.txt", Importance.LOW, "Log File");

    private String uri;
    private String user;
    private String pass;
    private String logFile;

    @Override
    public String version() {
        return AppInfoParser.getVersion();
    }

    @Override
    public void start(Map<String, String> props) {
        uri = props.get(Keys.URI);
        user = props.get(Keys.USER);
        pass = props.get(Keys.PASS);
        logFile = props.get(Keys.LOG);
    }

    @Override
    public Class<? extends Task> taskClass() {
        return Neo4jSinkTask.class;
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        ArrayList<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            Map<String, String> config = new HashMap<>();
            if (uri != null)
                config.put(Keys.URI, uri);
            if (user != null)
                config.put(Keys.USER, user);
            if (pass != null)
                config.put(Keys.PASS, pass);
            if (logFile != null)
                config.put(Keys.LOG, logFile);
            configs.add(config);
        }
        return configs;
    }

    @Override
    public void stop() {
    }

    @Override
    public ConfigDef config() {
        return CONFIG_DEF;
    }
}
My Neo4j Sink Task:
package neo4k.sink;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;
import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.StatementResult;
import org.neo4j.driver.v1.exceptions.Neo4jException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Collection;
import java.util.Map;
public class Neo4jSinkTask extends SinkTask {

    private static final Logger log = LoggerFactory.getLogger(Neo4jSinkTask.class);

    private String uri;
    private String user;
    private String pass;
    private String logFile;

    private Driver driver;
    private Session session;

    public Neo4jSinkTask() {
    }

    @Override
    public String version() {
        return new Neo4jSinkConnector().version();
    }

    @Override
    public void start(Map<String, String> props) {
        uri = props.get(Neo4jSinkConnector.Keys.URI);
        user = props.get(Neo4jSinkConnector.Keys.USER);
        pass = props.get(Neo4jSinkConnector.Keys.PASS);
        logFile = props.get(Neo4jSinkConnector.Keys.LOG);
        driver = null;
        session = null;
        try {
            driver = GraphDatabase.driver(uri, AuthTokens.basic(user, pass));
            session = driver.session();
        } catch (Neo4jException ex) {
            log.trace(ex.getMessage(), logFilename());
        }
    }

    @Override
    public void put(Collection<SinkRecord> sinkRecords) {
        StatementResult result;
        for (SinkRecord record : sinkRecords) {
            result = session.run(record.value().toString());
            log.trace(result.toString(), logFilename());
        }
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
    }

    @Override
    public void stop() {
        if (session != null)
            session.close();
        if (driver != null)
            driver.close();
    }

    private String logFilename() {
        return logFile == null ? "stdout" : logFile;
    }
}
The Issue:
After writing that, I built it, including any dependencies it had (excluding the Kafka dependencies), into a jar (or uber jar? It was one file). Then I edited the plugin path in connect-standalone.properties to include that artifact and wrote a properties file for my Neo4j sink connector. I did this all in an attempt to follow these guidelines.
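For reference, the edit in question is the plugin.path entry in config/connect-standalone.properties; a hedged example (the directory name is made up, and the built uber jar just gets dropped into it):

# connect-standalone.properties (excerpt)
# Comma-separated list of directories that Connect scans for plugins;
# place the built uber jar (or an exploded plugin directory) inside it.
plugin.path=/home/me/connect-plugins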
My Neo4j sink connector properties file:
name=neo4k-sink
connector.class=neo4k.sink.Neo4jSinkConnector
tasks.max=1
uri=bolt://localhost:7687
user=neo4j
pass=Hunter2
topics=connect-test
But upon running the standalone worker, I get this error in the output, which shuts down the stream (error on line 5):
[2017-08-14 12:59:00,150] INFO Kafka version : 0.11.0.0 (org.apache.kafka.common.utils.AppInfoParser:83)
[2017-08-14 12:59:00,150] INFO Kafka commitId : cb8625948210849f (org.apache.kafka.common.utils.AppInfoParser:84)
[2017-08-14 12:59:00,153] INFO Source task WorkerSourceTask{id=local-file-source-0} finished initialization and start (org.apache.kafka.connect.runtime.WorkerSourceTask:143)
[2017-08-14 12:59:00,153] INFO Created connector local-file-source (org.apache.kafka.connect.cli.ConnectStandalone:91)
[2017-08-14 12:59:00,153] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:100)
java.lang.IllegalArgumentException: Malformed \uxxxx encoding.
at java.util.Properties.loadConvert(Properties.java:574)
at java.util.Properties.load0(Properties.java:390)
at java.util.Properties.load(Properties.java:341)
at org.apache.kafka.common.utils.Utils.loadProps(Utils.java:429)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:84)
[2017-08-14 12:59:00,156] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:65)
[2017-08-14 12:59:00,156] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer:154)
[2017-08-14 12:59:00,168] INFO Stopped ServerConnector@540accf4{HTTP/1.1}{0.0.0.0:8083} (org.eclipse.jetty.server.ServerConnector:306)
[2017-08-14 12:59:00,173] INFO Stopped o.e.j.s.ServletContextHandler@6d548d27{/,null,UNAVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:865)
Edit: I should mention that during the part of the connector loading where the output declares which plugins have been added, I do not see any mention of the jar that I built earlier and added to the plugin path in connect-standalone.properties. Here's a snippet for context:
[2017-08-14 12:58:58,969] INFO Added plugin 'org.apache.kafka.connect.file.FileStreamSinkConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-08-14 12:58:58,969] INFO Added plugin 'org.apache.kafka.connect.tools.MockSourceConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-08-14 12:58:58,969] INFO Added plugin 'org.apache.kafka.connect.tools.VerifiableSourceConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-08-14 12:58:58,969] INFO Added plugin 'org.apache.kafka.connect.tools.VerifiableSinkConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
[2017-08-14 12:58:58,970] INFO Added plugin 'org.apache.kafka.connect.tools.MockConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:132)
Conclusion:
I am at a loss. I've been testing and researching for a couple of hours, and I'm not exactly sure what question to ask. So I'll say thank you for reading if you've gotten this far. If you noticed anything glaring that I may have done wrong in the code or in my approach (e.g. packaging the jar), or if you think I should provide more context, console logs, or anything else, please let me know. Thank you again.
As pointed out by @Randall Hauch, my properties file had hidden characters within it because it was a rich text document. I fixed this by duplicating the connect-file-sink.properties file provided with Kafka, which I believe is just a plain text document, then renaming and editing that duplicate for my Neo4j sink properties.
Related
I am getting the following exception when I try to retrieve an entry from the Kafka Streams persistent state store:
org.apache.kafka.streams.errors.InvalidStateStoreException: Cannot get state store kafka-state-dir because the stream thread is STARTING, not RUNNING
I am using Spring Boot and Kafka Streams.
Here is my code:
Configuration class:
@Configuration
@EnableKafka
@EnableKafkaStreams
public class KafkaStreamsConfig {

    @Value(value = "${spring.kafka.bootstrap-servers}")
    private String bootstrapAddress;

    @Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
    KafkaStreamsConfiguration kStreamsConfig() {
        Map<String, Object> props = new HashMap<>();
        props.put(APPLICATION_ID_CONFIG, "applicationId");
        props.put(BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
        props.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
        props.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, CustomSerdes.messageContextSerde().getClass().getName());
        props.put(STATE_DIR_CONFIG, "/app/kafka-state-dir");
        return new KafkaStreamsConfiguration(props);
    }
}
The Kafka Streams processor:
@Component
public class Processor {

    private static final Logger logger = LoggerFactory.getLogger(MessageFinProcessor.class);
    private static final Serde<String> STRING_SERDE = Serdes.String();
    public static final String FIN_CACHE = "kafka-state-dir";

    @Value("${kafka.topic.fin.message.ctx}")
    private String finMessageContextTopic;

    @Autowired
    private void process(StreamsBuilder streamsBuilder) {
        logger.info("Starting MessageFinProcessor on topic {}", finMessageContextTopic);
        streamsBuilder.table(finMessageContextTopic,
                Materialized.<String, MessageContext, KeyValueStore<Bytes, byte[]>>as(FIN_CACHE)
                        .withKeySerde(STRING_SERDE)
                        .withValueSerde(CustomSerdes.messageContextSerde()));
    }
}
The service where I am retrieving the entry from the cache:
@Service
public class KafkaStreamsStorageService {

    private final StreamsBuilderFactoryBean streamsFactoryBean;

    public static final String FIN_CACHE = "kafka-state-dir";

    public KafkaStreamsStorageService(StreamsBuilderFactoryBean streamsFactoryBean) {
        this.streamsFactoryBean = streamsFactoryBean;
    }

    public MessageContext get(String correlationId) {
        KafkaStreams kafkaStreams = streamsFactoryBean.getKafkaStreams();
        if (kafkaStreams != null) {
            ReadOnlyKeyValueStore<String, MessageContext> keyValueStore = kafkaStreams.store(StoreQueryParameters.fromNameAndType(
                    FIN_CACHE, QueryableStoreTypes.keyValueStore()));
            return keyValueStore.get(correlationId);
        }
        return null;
    }
}
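For reference, the exception above is thrown because interactive queries are only served once KafkaStreams has reached the RUNNING state; a lookup made while the thread is still STARTING fails exactly like this. A minimal illustrative guard on the lookup (a sketch only, not the actual code):

public MessageContext get(String correlationId) {
    KafkaStreams kafkaStreams = streamsFactoryBean.getKafkaStreams();
    // Only query the store once the streams instance has finished starting/rebalancing.
    if (kafkaStreams == null || kafkaStreams.state() != KafkaStreams.State.RUNNING) {
        return null; // or wait/retry, depending on what the caller needs
    }
    ReadOnlyKeyValueStore<String, MessageContext> store = kafkaStreams.store(
            StoreQueryParameters.fromNameAndType(FIN_CACHE, QueryableStoreTypes.keyValueStore()));
    return store.get(correlationId);
}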
Inside the container where my Java app runs, I see only the following files in the store dir:
ls -a /app/kafka-state-dir/applicationId
.lock
kafka-streams-process-metadata
Here is my Dockerfile:
FROM ...
ENV JVM_MEM_ARGS=-Xms128m\ -Xmx2g
ARG JAR_FILE
ADD ./target/${JAR_FILE} /app/myapp.jar
WORKDIR /app
CMD ["./run.sh", "myapp.jar"]
Here are also the volumes that I pass to the service in docker-compose.yml:
...
volumes:
- /home/myuser/kafka-streams/:/app/kafka-state-dir/
Whereas when I run my app as a standalone Java jar from IntelliJ (with a different profile), the whole procedure works as expected (I can retrieve the entry from the persistent store), and I see the following files inside the store dir:
ls -a /app/kafka-state-dir/applicationId
0_0
0_1
0_2
0_3
kafka-streams-process-metadata
.lock
ls kafka-state-dir/applicationId/0_0
.checkpoint
rocksdb
I have tried many different paths for state.dir in order for the kafka-streams lib to be able to find it, but none of them worked. Do you have any ideas?
Thanks
In my code I have two Cloud Functions, cf1 and cf2. cf1 is triggered via Pub/Sub topic t1 by a Google Cloud Scheduler cron job every 10 minutes; it creates a list and sends it to topic t2, which triggers cf2. When I use Google's example for cf2, I can see my message and it works. However, when I deploy my own code and log the message, this is what I see:
cf2.accept:81) - data
.accept:83) - ms {"data_":{"bytes":[],"hash":0},"messageId_":"","orderingKey_":"","memoizedIsInitialized":-1,"unknownFields":{"fields":{},"fieldsDescending":{}},"memoizedSize":-1,"memoizedHashCode":0}
My code is:
public class cf2 implements BackgroundFunction<PubsubMessage> {

    @Override
    public void accept(PubsubMessage message, Context context) throws Exception {
        if (message.getData() == null) {
            logger.info("No message provided");
            return;
        }
        String messageString = new String(
                Base64.getDecoder().decode(message.getData().toStringUtf8()),
                StandardCharsets.UTF_8);
        logger.info(messageString);
        logger.info("Starting the job");
        String data = message.getData().toStringUtf8();
        logger.info("data " + data);
        String ms = new Gson().toJson(message);
        logger.info("ms " + ms);
    }
}
But when I use Google's example code:
package com.example;
import com.example.Example.PubSubMessage;
import com.google.cloud.functions.BackgroundFunction;
import com.google.cloud.functions.Context;
import java.util.Base64;
import java.util.Map;
import java.util.logging.Logger;
public class Example implements BackgroundFunction<PubSubMessage> {

    private static final Logger logger = Logger.getLogger(Example.class.getName());

    @Override
    public void accept(PubSubMessage message, Context context) {
        String data = message.data != null
                ? new String(Base64.getDecoder().decode(message.data))
                : "empty message";
        logger.info(data);
    }

    public static class PubSubMessage {
        String data;
        Map<String, String> attributes;
        String messageId;
        String publishTime;
    }
}
I see my message body very neatly in the logs. Can someone help me with what is wrong with my code?
Here's how I deploy my function:
gcloud --project=${PROJECT_ID} functions deploy \
cf2 \
--entry-point=path.to.cf2 \
--runtime=java11 \
--trigger-topic=t2 \
--timeout=540 \
--source=folder \
--set-env-vars="PROJECT_ID=${PROJECT_ID}" \
--vpc-connector=projects/${PROJECT_ID}/locations/us-central1/connectors/appengine-default-connect
and when I log message.getData() I get <ByteString@37c278a2 size=0 contents=""> while I know the message is not empty (I made another test subscription on the topic that lets me see the message there).
You need to define what a Pub/Sub message is. This part is missing from your code, and I don't know which PubSubMessage type you are using:
public static class PubSubMessage {
    String data;
    Map<String, String> attributes;
    String messageId;
    String publishTime;
}
It should solve your issue. Let me know.
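For illustration, here is a hedged sketch of what the function could look like with such a POJO, modeled on Google's example above (the class and logger names are just placeholders):

import com.google.cloud.functions.BackgroundFunction;
import com.google.cloud.functions.Context;
import java.util.Base64;
import java.util.Map;
import java.util.logging.Logger;

public class cf2 implements BackgroundFunction<cf2.PubSubMessage> {

    private static final Logger logger = Logger.getLogger(cf2.class.getName());

    @Override
    public void accept(PubSubMessage message, Context context) {
        // The Functions Framework maps the Pub/Sub event payload onto this POJO,
        // so 'data' arrives as the base64-encoded message body.
        String data = message.data != null
                ? new String(Base64.getDecoder().decode(message.data))
                : "empty message";
        logger.info("data " + data);
    }

    // Plain POJO describing the Pub/Sub event payload, as in Google's example above.
    public static class PubSubMessage {
        String data;
        Map<String, String> attributes;
        String messageId;
        String publishTime;
    }
}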
I have a Spring Boot application and it needs to process some Kafka streaming data. I added an infinite loop to a CommandLineRunner class that will run on startup. In there is a Kafka consumer that can be woken up. I added a shutdown hook with Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));. Will I run into any problems? Is there a more idiomatic way of doing this in Spring? Should I use @Scheduled instead? The code below is stripped of specific Kafka-implementation stuff but otherwise complete.
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;
import java.time.Duration;
import java.util.Properties;
@Component
public class InfiniteLoopStarter implements CommandLineRunner {

    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    @Override
    public void run(String... args) {
        Consumer<AccountKey, Account> consumer = new KafkaConsumer<>(new Properties());
        Runtime.getRuntime().addShutdownHook(new Thread(consumer::wakeup));

        try {
            while (true) {
                ConsumerRecords<AccountKey, Account> records = consumer.poll(Duration.ofSeconds(10L));
                //process records
            }
        } catch (WakeupException e) {
            logger.info("Consumer woken up for exiting.");
        } finally {
            consumer.close();
            logger.info("Closed consumer, exiting.");
        }
    }
}
I'm not sure if you'll run into any issues there, but it's a bit dirty. Spring has really nice built-in support for working with Kafka, so I would lean towards that (there's plenty of documentation on the web, but a nice one is https://www.baeldung.com/spring-kafka).
You'll need the following dependency:
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
    <version>2.2.2.RELEASE</version>
</dependency>
Configuration is as easy as adding the @EnableKafka annotation to a config class and then setting up listener and ConsumerFactory beans.
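Here's a rough sketch of what such a config class can look like, purely illustrative (the bootstrap address, group id, and String deserializers are assumptions; substitute your own):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Configuration
@EnableKafka
public class KafkaConsumerConfig {

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // assumption
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return new DefaultKafkaConsumerFactory<>(props);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
        // Container factory used by @KafkaListener methods like the one below.
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        return factory;
    }
}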
Once configured, you can set up a consumer easily as follows:
@KafkaListener(topics = "topicName")
public void listenWithHeaders(
        @Payload String message,
        @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition) {
    System.out.println("Received Message: " + message + " from partition: " + partition);
}
The implementation looks OK, but CommandLineRunner is not made for this. CommandLineRunner is used to run some task only once on startup. From a design perspective it's not very elegant. I would rather use a Spring Integration adapter component with Kafka. You can find an example here: https://github.com/raphaelbrugier/spring-integration-kafka-sample/blob/master/src/main/java/com/github/rbrugier/esb/consumer/Consumer.java
To just answer my own question: I had a look at Kafka integration libraries like Spring-Kafka and Spring Cloud Stream, but the integration with Confluent's Schema Registry is either not finished or not quite clear to me. It's simple enough for primitives, but we need it for typed Avro objects that are validated by the schema registry. I've now implemented a Kafka-agnostic solution, based on the answer at Spring Boot - Best way to start a background thread on deployment.
The final code looks like this:
@Component
public class AccountStreamConsumer implements DisposableBean, Runnable {

    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    private final AccountService accountService;
    private final KafkaProperties kafkaProperties;
    private final Consumer<AccountKey, Account> consumer;

    @Autowired
    public AccountStreamConsumer(AccountService accountService, KafkaProperties kafkaProperties,
                                 ConfluentProperties confluentProperties) {
        this.accountService = accountService;
        this.kafkaProperties = kafkaProperties;

        if (!kafkaProperties.getEnabled()) {
            consumer = null;
            return;
        }

        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
        props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, confluentProperties.getSchemaRegistryUrl());
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, kafkaProperties.getSecurityProtocolConfig());
        props.put(SaslConfigs.SASL_MECHANISM, kafkaProperties.getSaslMechanism());
        props.put(SaslConfigs.SASL_JAAS_CONFIG, PlainLoginModule.class.getName() + " required username=\"" + kafkaProperties.getUsername() + "\" password=\"" + kafkaProperties.getPassword() + "\";");
        props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, true);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, kafkaProperties.getAccountConsumerGroupId());
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class);

        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList(kafkaProperties.getAccountsTopicName()));

        Thread thread = new Thread(this);
        thread.start();
    }

    @Override
    public void run() {
        if (!kafkaProperties.getEnabled())
            return;

        logger.debug("Started account stream consumer");
        try {
            //noinspection InfiniteLoopStatement
            while (true) {
                ConsumerRecords<AccountKey, Account> records = consumer.poll(Duration.ofSeconds(10L));
                List<Account> accounts = new ArrayList<>();
                records.iterator().forEachRemaining(record -> accounts.add(record.value()));
                if (accounts.size() != 0)
                    accountService.store(accounts);
            }
        } catch (WakeupException e) {
            logger.info("Account stream consumer woken up for exiting.");
        } finally {
            consumer.close();
        }
    }

    @Override
    public void destroy() {
        if (consumer != null)
            consumer.wakeup();
        logger.info("Woke up account stream consumer, exiting.");
    }
}
I have created a Spring Boot microservice which runs aggregation on a stream of data and writes it into various Cassandra tables. I am looking for a Java library, similar to Flyway, that will migrate the Cassandra schema based on scripts in a resources folder. Does anyone have any recommendations, preferably for a library which you personally have used in production?
I used builtamont:
<dependency>
    <groupId>com.builtamont</groupId>
    <artifactId>cassandra-migration</artifactId>
    <version>0.9</version>
</dependency>
migration in code:
import com.builtamont.cassandra.migration.CassandraMigration;
import com.builtamont.cassandra.migration.api.configuration.KeyspaceConfiguration;
import org.springframework.beans.factory.InitializingBean;
class CassandraDataSourceMigration implements InitializingBean {

    private final String ip;
    private final String clusterName;
    private final Integer port;
    private final String keyspaceName;
    private final String migrationsPath;

    public CassandraDataSourceMigration(String ip, String clusterName, Integer port, String keyspaceName, String migrationsPath) {
        this.ip = ip;
        this.clusterName = clusterName;
        this.port = port;
        this.keyspaceName = keyspaceName;
        this.migrationsPath = migrationsPath;
    }

    // getters/setters

    @Override
    public void afterPropertiesSet() throws Exception {
        final KeyspaceConfiguration keyspaceConfig = new KeyspaceConfiguration();
        keyspaceConfig.setName(keyspaceName);
        keyspaceConfig.getClusterConfig().setContactpoints(new String[]{ip});
        if (port != null) {
            keyspaceConfig.getClusterConfig().setPort(port);
        }
        final CassandraMigration migrationProcessor = new CassandraMigration();
        migrationProcessor.setLocations(new String[]{migrationsPath});
        migrationProcessor.setKeyspaceConfig(keyspaceConfig);
        migrationProcessor.migrate();
    }
}
application.properties
cassandra.ip=127.0.0.1
cassandra.cluster=My cluster
cassandra.keyspace=saya
cassandra.migration=classpath:db/migration
cassandra.port=9042
And the migration script is under resources/db/migration, e.g. V1_0__Init_table.cql.
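Such a script is plain CQL; a minimal made-up example of what V1_0__Init_table.cql might contain (the table and columns are placeholders, not from my actual project):

-- V1_0__Init_table.cql (illustrative only)
CREATE TABLE IF NOT EXISTS example_table (
    id         uuid PRIMARY KEY,
    name       text,
    created_at timestamp
);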
I would like to write some integration with Elasticsearch. For testing, I would like to run an in-memory ES.
I found some information in the documentation, but without an example of how to write that kind of test: Elasticsearch Reference [1.6] » Testing » Java Testing Framework » Integration tests.
I also found the following article, but it's out of date: Easy JUnit testing with Elastic Search.
I'm looking for an example of how to start and run ES in-memory and access it over the REST API.
Based on the second link you provided, I created this abstract test class:
@RunWith(SpringJUnit4ClassRunner.class)
public abstract class AbstractElasticsearchTest {

    private static final String HTTP_PORT = "9205";
    private static final String HTTP_TRANSPORT_PORT = "9305";
    private static final String ES_WORKING_DIR = "target/es";
    // Cluster name is used both for the node and for cleaning up the old data dir.
    private static final String CLUSTER_NAME = "monkeys.elasticsearch";

    private static Node node;

    @BeforeClass
    public static void startElasticsearch() throws Exception {
        removeOldDataDir(ES_WORKING_DIR + "/" + CLUSTER_NAME);

        Settings settings = Settings.builder()
                .put("path.home", ES_WORKING_DIR)
                .put("path.conf", ES_WORKING_DIR)
                .put("path.data", ES_WORKING_DIR)
                .put("path.work", ES_WORKING_DIR)
                .put("path.logs", ES_WORKING_DIR)
                .put("http.port", HTTP_PORT)
                .put("transport.tcp.port", HTTP_TRANSPORT_PORT)
                .put("index.number_of_shards", "1")
                .put("index.number_of_replicas", "0")
                .put("discovery.zen.ping.multicast.enabled", "false")
                .build();

        node = nodeBuilder().settings(settings).clusterName(CLUSTER_NAME).client(false).node();
        node.start();
    }

    @AfterClass
    public static void stopElasticsearch() {
        node.close();
    }

    private static void removeOldDataDir(String datadir) throws Exception {
        File dataDir = new File(datadir);
        if (dataDir.exists()) {
            FileSystemUtils.deleteRecursively(dataDir);
        }
    }
}
In the production code, I configured an Elasticsearch client as follows. The integration test extends the abstract class defined above and configures the property elasticsearch.port as 9305 and elasticsearch.host as localhost.
@Configuration
public class ElasticsearchConfiguration {

    @Bean(destroyMethod = "close")
    public Client elasticsearchClient(@Value("${elasticsearch.clusterName}") String clusterName,
                                      @Value("${elasticsearch.host}") String elasticsearchClusterHost,
                                      @Value("${elasticsearch.port}") Integer elasticsearchClusterPort) throws UnknownHostException {
        Settings settings = Settings.settingsBuilder().put("cluster.name", clusterName).build();
        InetSocketTransportAddress transportAddress = new InetSocketTransportAddress(InetAddress.getByName(elasticsearchClusterHost), elasticsearchClusterPort);
        return TransportClient.builder().settings(settings).build().addTransportAddress(transportAddress);
    }
}
That's it. The integration test will run the production code, which is configured to connect to the node started in AbstractElasticsearchTest.startElasticsearch().
In case you want to use the Elasticsearch REST API, use port 9205, e.g. with Apache HttpComponents:
HttpClient httpClient = HttpClients.createDefault();
HttpPut httpPut = new HttpPut("http://localhost:9205/_template/" + templateName);
httpPut.setEntity(new FileEntity(new File("template.json")));
httpClient.execute(httpPut);
Here is my implementation
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.UUID;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;
/**
 *
 * @author Raghu Nair
 */
public final class ElasticSearchInMemory {

    private static Client client = null;
    private static File tempDir = null;
    private static Node elasticSearchNode = null;

    public static Client getClient() {
        return client;
    }

    public static void setUp() throws Exception {
        tempDir = File.createTempFile("elasticsearch-temp", Long.toString(System.nanoTime()));
        tempDir.delete();
        tempDir.mkdir();
        System.out.println("writing to: " + tempDir);

        String clusterName = UUID.randomUUID().toString();
        elasticSearchNode = NodeBuilder
                .nodeBuilder()
                .local(false)
                .clusterName(clusterName)
                .settings(
                        ImmutableSettings.settingsBuilder()
                                .put("script.disable_dynamic", "false")
                                .put("gateway.type", "local")
                                .put("index.number_of_shards", "1")
                                .put("index.number_of_replicas", "0")
                                .put("path.data", new File(tempDir, "data").getAbsolutePath())
                                .put("path.logs", new File(tempDir, "logs").getAbsolutePath())
                                .put("path.work", new File(tempDir, "work").getAbsolutePath())
                ).node();
        elasticSearchNode.start();
        client = elasticSearchNode.client();
    }

    public static void tearDown() throws Exception {
        if (client != null) {
            client.close();
        }
        if (elasticSearchNode != null) {
            elasticSearchNode.stop();
            elasticSearchNode.close();
        }
        if (tempDir != null) {
            removeDirectory(tempDir);
        }
    }

    public static void removeDirectory(File dir) throws IOException {
        if (dir.isDirectory()) {
            File[] files = dir.listFiles();
            if (files != null && files.length > 0) {
                for (File aFile : files) {
                    removeDirectory(aFile);
                }
            }
        }
        Files.delete(dir.toPath());
    }
}
You can start ES locally with:
Settings settings = Settings.settingsBuilder()
        .put("path.home", ".")
        .build();
NodeBuilder.nodeBuilder().settings(settings).node();
When ES has started, access it over REST like:
http://localhost:9200/_cat/health?v
As of 2016, embedded Elasticsearch is no longer supported.
As per a response from one of the developers in 2017, you can use the following approaches:
Use the Gradle tools elasticsearch already has. You can read some information about this here: https://github.com/elastic/elasticsearch/issues/21119
Use the Maven plugin: https://github.com/alexcojocaru/elasticsearch-maven-plugin
Use Ant scripts like http://david.pilato.fr/blog/2016/10/18/elasticsearch-real-integration-tests-updated-for-ga
Using Docker: https://www.testcontainers.org/modules/elasticsearch (see the sketch after this list)
Using Docker from maven: https://github.com/dadoonet/fscrawler/blob/e15dddf72b1ed094dad279d492e4e0314f73683f/pom.xml#L241-L289
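As a rough illustration of the Testcontainers route mentioned above (the image tag and test name are assumptions, not from the linked docs):

import org.junit.Test;
import org.testcontainers.elasticsearch.ElasticsearchContainer;

public class ElasticsearchContainerIT {

    @Test
    public void startsAndServesRest() throws Exception {
        // Spin up a disposable Elasticsearch in Docker; the image tag is only an example.
        try (ElasticsearchContainer es =
                     new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:7.17.9")) {
            es.start();
            // getHttpHostAddress() returns host:port of the mapped REST endpoint,
            // usable with any HTTP client, e.g. http://<address>/_cat/health?v
            String address = es.getHttpHostAddress();
            System.out.println("Elasticsearch REST endpoint: " + address);
        }
    }
}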