Elasticsearch does not see static scripts? - java

I'm seeing strange behavior in my Elasticsearch cluster: it seems it no longer sees static Groovy scripts.
It complains that "dynamic scripting is disabled", even though I am using a static script and referring to it by its correct name.
They worked before, and I can't figure out what changed.
Here are the steps I use to reproduce the problem:
Create an index with a mapping that defines one string field and one nested object:
curl -XPUT localhost:9200/test/ -d '{
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}'
curl -XPUT localhost:9200/test/_mapping/testtype -d '{
"testtype": {
"properties": {
"name": {
"type": "string"
},
"features": {
"type": "nested",
"properties": {
"key": {
"type": "string"
},
"value": {
"type": "string"
}
}
}
}
}
}'
response:
{
"acknowledged": true
}
Put a single object there:
curl -XPUT localhost:9200/test/testtype/1 -d '{
"name": "hello",
"features": []
}'
Call update using the script:
curl -XPOST http://localhost:9200/test/testtype/1/_update -d '{
"script": "add-feature-if-not-exists",
"params": {
"new_feature": {
"key": "Manufacturer",
"value": "China"
}
}
}'
response:
{
"error": "RemoteTransportException[[esnew1][inet[/78.46.100.39:9300]][indices:data/write/update]];
nested: ElasticsearchIllegalArgumentException[failed to execute script];
nested: ScriptException[dynamic scripting for [groovy] disabled]; ",
"status": 400
}
Getting "dynamic scripting for [groovy] disabled" - but I am using a reference to static script name in "script" field. However I've seen this message occurring if the name of script was incorrect. But looks like it is correct:
The script is located in /etc/elasticsearch/scripts/ .
Verifying that /etc/elasticsearch is used as a config directory:
ps aux | grep elas
elastic+ 944 0.8 4.0 21523740 1322820 ? Sl 15:35 0:39 /usr/lib/jvm/java-7-oracle/bin/java -Xms16g -Xmx16g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-1.4.0.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* -Des.default.config=/etc/elasticsearch/elasticsearch.yml -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch
Checking that the scripts are there:
$ ls -l /etc/elasticsearch/
total 24
-rw-r--r-- 1 root root 13683 Nov 25 14:52 elasticsearch.yml
-rw-r--r-- 1 root root 1511 Nov 15 04:13 logging.yml
drwxr-xr-x 2 root root 4096 Nov 25 15:07 scripts
$ ls -l /etc/elasticsearch/scripts/
total 8
-rw-r--r-- 1 elasticsearch elasticsearch 438 Nov 25 15:07 add-feature-if-not-exists.groovy
-rw-r--r-- 1 elasticsearch elasticsearch 506 Nov 23 02:52 add-review-if-not-exists.groovy
Any hints on why this is happening? What else should I check?
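For reference, the same update call can spell out the script language explicitly; a minimal sketch, assuming the 1.x update API where "lang" is an optional field (it falls back to the configured default language):
curl -XPOST http://localhost:9200/test/testtype/1/_update -d '{
"script": "add-feature-if-not-exists",
"lang": "groovy",
"params": {
"new_feature": {
"key": "Manufacturer",
"value": "China"
}
}
}'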
Update: cluster has two nodes.
Config on node1:
cluster.name: myclustername
node.name: "esnew1"
node.master: true
node.data: true
bootstrap.mlockall: true
network.host: zz.zz.zz.zz
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["esnew1.company.com","esnew2.company.com"]
index.store.type: niofs
script.disable_dynamic: true
script.auto_reload_enabled: true
watcher.interval: 30s
Config on node 2:
cluster.name: myclustername
node.name: "esnew2"
node.master: true
node.data: true
bootstrap.mlockall: true
network.bind_host: 127.0.0.1,zz.zz.zz.zz
network.publish_host: zz.zz.zz.zz
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["esnew1.company.com","esnew2.company.com"]
index.store.type: niofs
script.disable_dynamic: true
script.auto_reload_enabled: true
watcher.interval: 30s
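To check whether both nodes actually picked up the same script-related settings, each node's effective settings can be listed via the node info API (a minimal check, assuming the standard _nodes endpoint):
curl 'localhost:9200/_nodes/settings?pretty'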
Elasticsearch version:
$ curl localhost:9200
{
"status" : 200,
"name" : "esnew2",
"cluster_name" : "myclustername",
"version" : {
"number" : "1.4.0",
"build_hash" : "bc94bd81298f81c656893ab1ddddd30a99356066",
"build_timestamp" : "2014-11-05T14:26:12Z",
"build_snapshot" : false,
"lucene_version" : "4.10.2"
},
"tagline" : "You Know, for Search"
}
P.S.: One observation suggesting that ES simply does not see the scripts: at one point ES saw one of the scripts but not the other. After a restart it sees neither of them.
P.P.S.: The script:
def stringsEqual(s1, s2) {
  if (s1 == null) {
    return s2 == null;
  }
  return s1.equalsIgnoreCase(s2);
}
// Assume the feature is missing until an item with a matching key is found.
do_add = true
for (item in ctx._source.features) {
  if (stringsEqual(item['key'], new_feature.key)) {
    // Key already present: update the value if it changed, and do not add a duplicate.
    do_add = false
    if (!stringsEqual(item['value'], new_feature.value)) {
      item.value = new_feature.value;
    }
  }
}
if (do_add) {
  ctx._source.features.add(new_feature)
}

Related

Java - Filtering an array of JSON objects by multiple properties

I have an object stored in Cloudant like this:
{
"attr1": "value",
"Objects": [{
"code": "v1",
"allowNull": "true",
"country": "BE"
},
{
"code": "v2",
"allowNull": "false",
"country": "EG"
}
]
}
I want to filter by code/country so that the output is only the matching object from the Objects list.
Is there a way to do that on the Cloudant side, or an efficient way to do it on the Java side?
You can achieve this with a map-reduce view in Cloudant. Try something along the lines of this:
function(doc) {
if (doc && doc.Objects) {
doc.Objects.forEach(function(obj) {
emit([obj.code, obj.country], obj);
});
}
}
This emits all items in the Objects list into the index, with a vector-valued key [code, country].
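To make the view queryable, the map function first has to be stored in a design document; a minimal sketch (the demo database and the query/by_code_country names are taken from the queries below):
curl -X PUT 'https://skruger.cloudant.com/demo/_design/query' -H 'Content-Type: application/json' -d '{
"views": {
"by_code_country": {
"map": "function(doc) { if (doc && doc.Objects) { doc.Objects.forEach(function(obj) { emit([obj.code, obj.country], obj); }); } }"
}
}
}'
With the design document in place, you can query the view by exact [code, country] key: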
curl -s -XPOST -H "content-type:application/json" 'https://skruger.cloudant.com/demo/_design/query/_view/by_code_country?reduce=false' -d '{ "keys": [["v1","BE"]] }'
{"total_rows":2,"offset":0,"rows":[
{"id":"1c593a931bcd7f0052ed8f9184810fd9","key":["v1","BE"],"value":
{"code":"v1","allowNull":"true","country":"BE"}}
]}
You can query by code only, using the {} wildcard marker, e.g.
curl -g 'https://skruger.cloudant.com/demo/_design/query/_view/by_code_country?reduce=false&startkey=["v1"]&endkey=["v1",{}]'
{"total_rows":2,"offset":0,"rows":[
{"id":"1c593a931bcd7f0052ed8f9184810fd9","key":["v1","BE"],"value":
{"code":"v1","allowNull":"true","country":"BE"}}
]}

How many empty transactions with 0-gas will ethereumJ mine in order to make the first block?

My application waits for an onSyncDone() event of a test ethereumj network, using the genesis file (see the bottom of this post) and a configuration that resets the database every time.
In onSyncDone() a contract is submitted.
ethereumj starts mining, and each block is mined without using any gas. The difficulty rises a bit with each one. It mines and mines ... and I am asking myself when the first block will be "completed" (and onSyncDone() will be fired / executed).
What can I do to avoid all of that (empty 0-gas mining)?
=======
the genesis file used:
{
"config": {
"chainId": 313,
"eip158Block": 10,
"byzantiumBlock": 1700000,
"headerValidators": [
{"number": 10, "hash": "0xb3074f936815a0425e674890d7db7b5e94f3a06dca5b22d291b55dcd02dde93e"},
{"number": 585503, "hash": "0xe8d61ae10fd62e639c2e3c9da75f8b0bdb3fa5961dbd3aed1223f92e147595b9"}
]
},
"nonce": "0x0000000000000042",
"timestamp": "0x00",
"parentHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
"extraData": "0x3535353535353535353535353535353535353535353535353535353535353535",
"gasLimit": "0xFFFFFFF",
"gasPrice": "0x0F",
"difficulty": "0x01",
"mixhash": "0x0000000000000000000000000000000000000000000000000000000000000000",
"coinbase": "0x0000000000000000000000000000000000000000",
"alloc": {
"0000000000000000000000000000000000000013": {
"balance": "1000"
},
"0000000000000000000000000000000000000014": {
"balance": "1000"
},
"0000000000000000000000000000000000000015": {
"balance": "1000"
},
"0000000000000000000000000000000000000016": {
"balance": "1000"
},
"0000000000000000000000000000000000000017": {
"balance": "1000"
},
"0000000000000000000000000000000000000018": {
"balance": "1000"
},
"0000000000000000000000000000000000000019": {
"balance": "1000"
},
"000000000000000000000000000000000000001a": {
"balance": "1000"
},
"000000000000000000000000000000000000001b": {
"balance": "1000"
},
"000000000000000000000000000000000000001c": {
"balance": "1000"
},
"000000000000000000000000000000000000001d": {
"balance": "1000"
},
"000000000000000000000000000000000000001e": {
"balance": "1000"
},
"000000000000000000000000000000000000001f": {
"balance": "1000"
}
}
}
====== the conf used:
peer.discovery.enabled = false
peer.listen.port = 20202
peer.networkId = 888
cache.flush.memory = 0
cache.flush.blocks = 1
sync.enabled = false
mine.start = true
database.dir = sksDB
database.reset = true
#Key value data source values: [leveldb/redis/mapdb]
keyvalue.datasource=rocksdb
// a number of public peers for this network (not all of then may be functioning)
// need to get Seed/Miner node info and fill it in active peer section of all regular nodes like this:
peer.active =
[
{
ip= 10.0.1.120
port= 20202
nodeName = "sks1"
name = b3ea40366eae0206f7923a38c61ccfd1fcbd1185aa46596cfcba5eb762d484c15f998d6447162905507212742fbbda96507667d834192dd32bdc980e08e16ad3
}
]
// special genesis for this test network
genesis = eth_genesis_sksprivate.json
blockchain.config.name = "sks"
blockchain.config.class = "org.ethereum.config.blockchain.FrontierConfig"
====
the repeating "null?" transaction:
19:15:51.263 INFO [db] Flush started
19:15:51.263 INFO [db] Flush completed in 0 ms
19:15:53.889 INFO [net] TCP: Speed in/out 0b / 0b(sec), packets in/out 0/0, total in/out: 0b / 0b
19:15:53.889 INFO [net] UDP: Speed in/out 0b / 0b(sec), packets in/out 0/0, total in/out: 0b / 0b
19:15:53.891 INFO [sync] Sync state: Off, last block #2258, best known #2258
19:15:54.361 INFO [mine] Wow, block mined !!!: f90210f9020ba0bce27ba3b3f49e4aee35b456cbd0613889f30d1eabe34e5dd3439d88bbd11488a01dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347940000000000000000000000000000000000000000a03916196ec987cfc44c5e046681ce1c7f5ad77dd28f28565a0375a82479040ef8a056e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421a056e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421b90100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000008305fcd98208d3840fffffff80845b390c4891457468657265756d4a20706f7765726564a05f2cf27106233b7b2a90df459ad19fb6da3e79d2a220e5ef65759cc395e0d2b888844682ba8b355f9bc0c0
BlockData [ hash=d797ff12bebf61dcdff54dc90b71e5bf33b2af169a18a5ffc1ff4c9ad3875417
parentHash=bce27ba3b3f49e4aee35b456cbd0613889f30d1eabe34e5dd3439d88bbd11488
unclesHash=1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347
coinbase=0000000000000000000000000000000000000000
stateRoot=3916196ec987cfc44c5e046681ce1c7f5ad77dd28f28565a0375a82479040ef8
txTrieHash=56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421
receiptsTrieHash=56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421
difficulty=05fcd9
number=2259
gasLimit=0fffffff
gasUsed=0
timestamp=1530465352 (2018.07.01 19:15:52)
extraData=457468657265756d4a20706f7765726564
mixHash=5f2cf27106233b7b2a90df459ad19fb6da3e79d2a220e5ef65759cc395e0d2b8
nonce=844682ba8b355f9b
Uncles []
Txs []
]

java.lang.OutOfMemoryError: Java heap space when transferring data from jdbc to elasticsearch via logstash [duplicate]

This question already has answers here:
How to deal with "java.lang.OutOfMemoryError: Java heap space" error?
(31 answers)
Closed 1 year ago.
I have a huge Postgres database with 20 million rows and I want to transfer it to Elasticsearch via Logstash. I followed the advice mentioned here and tested it on a simple database with 300 rows, and everything worked fine, but when I tested it on my main database I always run into this error:
nargess#nargess-Surface-Book:/usr/share/logstash/bin$ sudo ./logstash -w 1 -f students.conf --path.data /usr/share/logstash/data/students/ --path.settings /etc/logstash
Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid3453.hprof ...
Heap dump file created [13385912484 bytes in 53.304 secs]
Exception in thread "Ruby-0-Thread-11: /usr/share/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/thread_pool.rb:216" java.lang.ArrayIndexOutOfBoundsException: -1
at org.jruby.runtime.ThreadContext.popRubyClass(ThreadContext.java:729)
at org.jruby.runtime.ThreadContext.postYield(ThreadContext.java:1292)
at org.jruby.runtime.ContextAwareBlockBody.post(ContextAwareBlockBody.java:29)
at org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:198)
at org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
at org.jruby.runtime.Block.call(Block.java:101)
at org.jruby.RubyProc.call(RubyProc.java:300)
at org.jruby.RubyProc.call(RubyProc.java:230)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:103)
at java.lang.Thread.run(Thread.java:748)
The signal INT is in use by the JVM and will not work correctly on this platform
Error: Your application used more memory than the safety cap of 12G.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace
Although I go to file /etc/logstash/jvm.options and set -Xms256m
-Xmx12000m, but I have had these errors yet. I have 13g memory free. how can i send my data to elastic search with this memory ?
this is the student-index.json that i use in elasticsearch
{
"aliases": {},
"warmers": {},
"mappings": {
"tab_students_dfe": {
"properties": {
"stcode": {
"type": "text"
},
"voroodi": {
"type": "integer"
},
"name": {
"type": "text"
},
"family": {
"type": "text"
},
"namp": {
"type": "text"
},
"lastupdate": {
"type": "date"
},
"picture": {
"type": "text"
},
"uniquename": {
"type": "text"
}
}
}
},
"settings": {
"index": {
"number_of_shards": "5",
"number_of_replicas": "1"
}
}
}
Then I try to create this index in Elasticsearch with:
curl -XPUT --header "Content-Type: application/json"
http://localhost:9200/students -d @postgres-index.json
Next, this is my configuration file, /usr/share/logstash/bin/students.conf:
input {
jdbc {
jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
jdbc_user => "postgres"
jdbc_password => "postgres"
# The path to downloaded jdbc driver
jdbc_driver_library => "./postgresql-42.2.1.jar"
jdbc_driver_class => "org.postgresql.Driver"
# The path to the file containing the query
statement => "select * from students"
}
}
filter {
aggregate {
task_id => "%{stcode}"
code => "
map['stcode'] = event.get('stcode')
map['voroodi'] = event.get('voroodi')
map['name'] = event.get('name')
map['family'] = event.get('family')
map['namp'] = event.get('namp')
map['uniquename'] = event.get('uniquename')
event.cancel()
"
push_previous_map_as_event => true
timeout => 5
}
}
output {
elasticsearch {
document_id => "%{stcode}"
document_type => "postgres"
index => "students"
codec => "json"
hosts => ["127.0.0.1:9200"]
}
}
Thank you for your help
This is a bit old, but I just had the same issue and increasing the heap size of Logstash helped me. I added this to my Logstash service in the docker-compose file:
environment:
LS_JAVA_OPTS: "-Xmx2048m -Xms2048m"
Further read: What are the -Xms and -Xmx parameters when starting JVM?
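Beyond raising the heap, it often helps to keep the JDBC result set from being pulled into memory all at once. The logstash jdbc input supports paging; a hedged sketch of the relevant options (the page size is just an example value):
input {
jdbc {
jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
jdbc_user => "postgres"
jdbc_password => "postgres"
jdbc_driver_library => "./postgresql-42.2.1.jar"
jdbc_driver_class => "org.postgresql.Driver"
statement => "select * from students"
# Fetch the rows in pages instead of one huge result set
jdbc_paging_enabled => true
jdbc_page_size => 50000
}
}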

Index_not_found_exception no such index found in elasticsearch using Powershell

I have created two files:
jdbc_sqlserver.json:
{
"type": "jdbc",
"jdbc": {
"url": "jdbc:sqlserver://localhost:1433;databaseName=merchant2merchant;integratedSecurity=true;",
"user": "",
"password": "",
"sql": "select * from planets",
"treat_binary_as_string": true,
"elasticsearch": {
"cluster": "elasticsearch",
"host": "localhost",
"port": 9200
},
"index": "testing"
}
}
jdb_sqlserver.ps1:
function Get-PSVersion {
    if (test-path variable:psversiontable) {
        $psversiontable.psversion
    } else {
        [version]"1.0.0.0"
    }
}
$powershell = Get-PSVersion
if ($powershell.Major -le 2) {
    Write-Error "Oh, so sorry, this script requires Powershell 3 (due to convertto-json)"
    exit
}
if ((Test-Path env:\JAVA_HOME) -eq $false) {
    Write-Error "Environment variable JAVA_HOME must be set to your java home"
    exit
}
curl -XDELETE "http://localhost:9200/users/"
$DIR = "C:\Program Files\elasticsearch\plugins\elasticsearch-jdbc-2.3.4.0-dist\elasticsearch-jdbc-2.3.4.0\"
$FEEDER_CLASSPATH = "$DIR\lib"
$FEEDER_LOGGER = "file://$DIR\bin\log4j2.xml"
java -cp "$FEEDER_CLASSPATH\*" "-Dlog4j.configurationFile=$FEEDER_LOGGER" "org.xbib.tools.Runner" "org.xbib.tools.JDBCImporter" jdbc_sqlserver.json
I run the second one in PowerShell with the command .\jdb_sqlserver.ps1 from the "C:\servers\elasticsearch\bin\feeder" path, but I get the error Index_not_found_exception no such index found in PowerShell.
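One quick sanity check is to list which indices actually exist after the import; note that the script deletes the users index while jdbc_sqlserver.json writes to an index called testing. A minimal check using the standard _cat API:
curl 'http://localhost:9200/_cat/indices?v'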

elasticsearch - Return the tokens of a field

How can I have the tokens of a particular field returned in the result?
For example, a GET request
curl -XGET 'http://localhost:9200/twitter/tweet/1'
returns
{
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_source" : {
"user" : "kimchy",
"postDate" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}
}
I would like to have the tokens of the '_source.message' field included in the result.
There is also another way to do it using the following script_fields script:
curl -H 'Content-Type: application/json' -XPOST 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
"query" : {
"match_all" : { }
},
"script_fields": {
"terms" : {
"script": "doc[field].values",
"params": {
"field": "message"
}
}
}
}'
It's important to note that while this script returns the actual terms that were indexed, it also caches all field values and on large indices can use a lot of memory. So, on large indices, it might be more useful to retrieve field values from stored fields or source and reparse them again on the fly using the following MVEL script:
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import java.io.StringReader;
// Cache analyzer for further use
cachedAnalyzer=(isdef cachedAnalyzer)?cachedAnalyzer:doc.mapperService().documentMapper(doc._type.value).mappers().indexAnalyzer();
terms=[];
// Get value from Fields Lookup
//val=_fields[field].values;
// Get value from Source Lookup
val=_source[field];
if(val != null) {
tokenStream=cachedAnalyzer.tokenStream(field, new StringReader(val));
CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute);
while(tokenStream.incrementToken()) {
terms.add(termAttribute.toString())
};
tokenStream.close();
}
terms
This MVEL script can be stored as config/scripts/analyze.mvel and used with the following query:
curl 'http://localhost:9200/test-idx/_search?pretty=true' -d '{
"query" : {
"match_all" : { }
},
"script_fields": {
"terms" : {
"script": "analyze",
"params": {
"field": "message"
}
}
}
}'
If you mean the tokens that have been indexed you can make a terms facet on the message field. Increase the size value in order to get more entries back, or set to 0 to get all terms.
Lucene provides the ability to store the term vectors, but there's no way to have access to it with elasticsearch by now (as far as I know).
Why do you need that? If you only want to check what you're indexing you can have a look at the analyze api.
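For completeness, a minimal sketch of such an analyze call against the example index (the field name is taken from the question; on recent Elasticsearch versions the field and text go in a JSON body):
curl -H 'Content-Type: application/json' 'http://localhost:9200/twitter/_analyze?pretty' -d '{
"field": "message",
"text": "trying out Elastic Search"
}'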
Nowadays, it's possible with the Term vectors API:
curl http://localhost:9200/twitter/_termvectors/1?fields=message
Result:
{
"_index": "twitter",
"_id": "1",
"_version": 1,
"found": true,
"took": 0,
"term_vectors": {
"message": {
"field_statistics": {
"sum_doc_freq": 4,
"doc_count": 1,
"sum_ttf": 4
},
"terms": {
"elastic": {
"term_freq": 1,
"tokens": [
{
"position": 2,
"start_offset": 11,
"end_offset": 18
}
]
},
"out": {
"term_freq": 1,
"tokens": [
{
"position": 1,
"start_offset": 7,
"end_offset": 10
}
]
},
"search": {
"term_freq": 1,
"tokens": [
{
"position": 3,
"start_offset": 19,
"end_offset": 25
}
]
},
"trying": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 6
}
]
}
}
}
}
}
Note: Mapping types (here: tweets) have been removed in Elasticsearch 8.x (see migration guide).
