OrientDB Lucene Index very slow

I have a database with 100,000+ records. I want to index them on two fields with Lucene, so I added the following index:
create index Book.search on Book (title,isbn) FULLTEXT ENGINE LUCENE
However, when I search on one of the fields using the following query:
select from Book where [title,isbn] LUCENE "android"
The query takes a very long time, as if it were doing a full table scan. The explain output suggests the same:
explain select from Book where [title,isbn] LUCENE "android"
Result:
{
"result": [
{
"#type": "d",
"#version": 0,
"documentReads": 80551,
"current": "#16:217944",
"documentAnalyzedCompatibleClass": 80551,
"recordReads": 80551,
"_memoryIndex": "isbn:\n\t'[61 6c 6c 61]':1: [(1)]\n\t'[63 6f 6d 70 6c 65 74 6f]':1: [(6)]\n\t'[63 6f 6e]':1: [(3)]\n\t'[63 6f 72 73 6f]':1: [(5)]\n\t'[65 64 69 74 69 6f 6e]':1: [(15)]\n\t'[67 75 69 64 61]':1: [(0)]\n\t'[69 6d 70 61 72 61 72 65]':1: [(8)]\n\t'[69 74 61 6c 69 61 6e]':1: [(14)]\n\t'[70 65 72]':1: [(7)]\n\t'[70 6f 63 6f]':1: [(12)]\n\t'[70 72 6f 67 72 61 6d 6d 61 72 65]':1: [(10)]\n\t'[70 72 6f 67 72 61 6d 6d 61 7a 69 6f 6e 65]':1: [(2)]\n\t'[72]':1: [(4)]\n\t'[74 65 6d 70 6f]':1: [(13)]\n\tterms=14, positions=14, memory=32.9 KB\ntitle:\n\t'[31 35 33 30 30 35 38 32 33 36]':1: [(0)]\n\tterms=1, positions=1, memory=32.9 KB\n\nfields=2, terms=15, positions=15, memory=66.6 KB",
"fetchingFromTargetElapsed": 17037,
"evaluated": 80551,
"user": "#5:0",
"tips": [
"Query 'SELECT FROM Book WHERE [title, isbn] LUCENE \"android\"' fetched more than 50000 records: to speed up the execution, create an index or change the query to use an existent index"
],
"elapsed": 17040.559,
"resultType": "collection",
"resultSize": 848,
"#fieldTypes": "documentReads=l,current=x,documentAnalyzedCompatibleClass=l,recordReads=l,fetchingFromTargetElapsed=l,evaluated=l,user=x,elapsed=f"
}
],
"warnings": [
"Query 'SELECT FROM Book WHERE [title, isbn] LUCENE \"android\"' fetched more than 50000 records: to speed up the execution, create an index or change the query to use an existent index"
],
"notification": "Query executed in 17.686 sec. Returned 1 record(s)"
}
What am I missing here?

Your explain output shows no index involved (there is no involvedIndexes entry), so yes, the query is doing a full scan.
From the picture of your indexes I saw that the fields are declared in the order [isbn,title].
Using the same order in the query should solve it:
select count(1) from Book where [isbn,title] LUCENE "android"

I tried to replicate your problem with 96,000 records, using OrientDB 2.1.12.
In a class Book I inserted a book with title "android" and isbn "12345".
The query select from Book where [title,isbn] LUCENE "android" ran quickly.
The explain output was:
{
"result": [
{
"#type": "d",
"#version": 0,
"documentReads": 1,
"fullySortedByIndex": false,
"documentAnalyzedCompatibleClass": 1,
"recordReads": 1,
"Book_search_totalHits": 1,
"luceneIndex": true,
"fetchingFromTargetElapsed": 16,
"indexIsUsedInOrderBy": false,
"score": 8.087625,
"current": "#12:140533",
"totalHits": 1,
"_memoryIndex": "isbn:\n\t'[31 32 33 34 35]':1: [(0)]\n\tterms=1, positions=1, memory=32.9 KB\ntitle:\n\t'[61 6e 64 72 6f 69 64]':1: [(0)]\n\tterms=1, positions=1, memory=32.9 KB\n\nfields=2, terms=2, positions=2, memory=66.5 KB",
"involvedIndexes": [
"Book.search"
],
"limit": -1,
"evaluated": 1,
"user": "#5:0",
"elapsed": 11.263393,
"resultType": "document",
"resultSize": 1,
"#fieldTypes": "documentReads=l,documentAnalyzedCompatibleClass=l,recordReads=l,fetchingFromTargetElapsed=l,score=f,current=x,involvedIndexes=e,evaluated=l,user=x,elapsed=f"
}
],
"notification": "Query executed in 0.042 sec. Returned 1 record(s)"
}
How many records do you have whose title contains "android"?
Are there more than 50,000?

How to encode properly the plus sign (+) when making a request with webflux webclient?

I am trying to send an internationally formatted phone number using Spring WebFlux WebClient and to read it in another application that also uses WebFlux.
My code looks like this:
webClient = WebClient.builder()
.baseUrl(baseUrl)
.build();
return webClient
.get()
.uri(uriBuilder -> uriBuilder
.path("/endpoint")
.queryParam("phone-number", "+33612345678")
.build()
)
.retrieve()
.bodyToMono(String.class);
Unfortunately, somewhere between this call and the receiver, the plus sign is replaced by a space.
The endpoint receives " 33612345678" as a String.
The Netty debug log of the request shows this:
+--------+-------------------------------------------------+----------------+
|00000000| 47 45 54 20 2f 63 75 73 74 6f 6d 65 72 73 3f 70 |GET /endpoint?p|
|00000010| 68 6f 6e 65 2d 6e 75 6d 62 65 72 3d 2b 33 33 36 |hone-number=+336|
|00000020| 31 32 33 34 35 36 37 38 26 6f 6e 6c 79 2d 72 65 |12345678
I tried to encode the phone number myself like this:
.queryParam("phone-number", UriUtils.encode("+34612345678", StandardCharsets.UTF_8))
And Netty's log shows:
+--------+-------------------------------------------------+----------------+
|00000000| 47 45 54 20 2f 63 75 73 74 6f 6d 65 72 73 3f 70 |GET /endpoint?p|
|00000010| 68 6f 6e 65 2d 6e 75 6d 62 65 72 3d 25 32 35 32 |hone-number=%252|
|00000020| 42 33 34 36 31 32 33 34 35 36 37 38 20 48 54 54 |B34612345678 HTT|
It seems that the phone number has been encoded twice:
+ -> %2B -> %252B
The plus sign was encoded by UriUtils.encode, and then the UriBuilder encoded the %.
The only way I found to make it work is to disable encoding in the UriBuilder:
DefaultUriBuilderFactory factory = new DefaultUriBuilderFactory(baseUrl);
factory.setEncodingMode(DefaultUriBuilderFactory.EncodingMode.NONE);
this.webClient = WebClient.builder()
.baseUrl(baseUrl)
.uriBuilderFactory(factory)
.build();
and applying my own encoding, UriUtils.encode("+34612345678", StandardCharsets.UTF_8),
in which case Netty's log looks as expected:
+--------+-------------------------------------------------+----------------+
|00000000| 47 45 54 20 2f 63 75 73 74 6f 6d 65 72 73 3f 70 |GET /endpoint?p|
|00000010| 68 6f 6e 65 2d 6e 75 6d 62 65 72 3d 25 32 42 33 |hone-number=%2B3|
|00000020| 34 36 31 32 33 34 35 36 37 38 20 48 54 54 50 2f |4612345678 HTTP/|
And of course, the endpoint receiving the phone number gets: "+33612345678"
To sum up, it looks like the UriBuilder encodes certain characters such as "%" but does not encode the "+" sign.
Spring reference : https://docs.spring.io/spring-framework/docs/current/reference/html/web.html#web-uri-encoding

I struggled with the same issue but found a workaround in the Spring reference you linked.
This should work for you:
return webClient
.get()
.uri(uriBuilder -> UriComponentsBuilder.fromUri(uriBuilder.build())
.path("/endpoint")
.queryParam("phone-number", "{phone-number}")
.encode()
.buildAndExpand("+33612345678")
.toUri()
)
.retrieve()
.bodyToMono(String.class);

I had the same issue and was able to resolve it in my Kotlin application using:
webClient.get()
.uri { uriBuilder ->
val queryParams = mapOf(
"phone-number" to "+33612345678",
"mobile-number" to "+61432111222"
)
uriBuilder.path("/endpoint")
.apply {
queryParams.keys.forEach { key ->
queryParam(key, "{$key}")
}
}
.build(queryParams)
}
.retrieve()
.bodyToMono<String>()

Another solution you can try:
DefaultUriBuilderFactory factory = new DefaultUriBuilderFactory();
factory.setEncodingMode(DefaultUriBuilderFactory.EncodingMode.VALUES_ONLY);
URI uri = factory.uriString("https://spring.io/")
.queryParam("query", "{query}")
.build("spring+framework");
// https://spring.io/?query=spring%2Bframework
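
To see why a literal + turns into a space on the receiving side, and why encoding by hand on top of the builder's own encoding produces %252B, the behaviour can be reproduced with the JDK alone. This is only a sketch: it assumes the receiver applies form-style decoding (where + means space), and the class and method names (PlusSignDemo, formEncode, formDecode) are made up for illustration; they are not part of Spring.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class PlusSignDemo {
    // Form-style decoding: '+' is treated as an encoded space.
    static String formDecode(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Form-style encoding: '+' becomes %2B and '%' becomes %25.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String phone = "+33612345678";
        // A raw '+' on the wire is decoded as a space.
        System.out.println("[" + formDecode(phone) + "]");
        // Encoded once, the value survives the round trip.
        String once = formEncode(phone);
        System.out.println(once + " -> " + formDecode(once));
        // Encoded twice, the '%' itself is escaped.
        System.out.println(formEncode(once));
    }
}
```

The template-variable approaches in the answers above avoid both problems because the value is encoded exactly once, when the variable is expanded.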

Store data into aerospike

I have some data to store in Aerospike in a single bin (column).
Suppose the incoming data is
["A", 1]
The first question is how to hold this data in Java.
I tried this:
ArrayList<Object> value = new ArrayList<>();
value.add("A");
value.add(Integer.valueOf(1));
When I write this data to Aerospike using
client.put(new WritePolicy(),
new Key("namespace", "set", "test"),
new Bin("binName", value));
and then query it with AQL, I see
| AC ED 00 05 73 72 00 13 6A 61 76 61 2E 75 74 69 6C 2E 41 72 72 61 79 4C 69 73 74 78 81 D2 1D 99 C7 61 9D 03 00 01 49 00 04 73 69 7A 65 78 70 00 00 00 02 77 04 00 00 00 02 73 72 00 11 6A 61 76 61 2E 6C 61 6E 67 2E 49 6E 74 65 67 65 72 12 E2 A0 A4 F7 81 87 |
that is, a run of raw hexadecimal bytes.
But when I store the data using
client.put(new WritePolicy(),
new Key("namespace", "set", "test"),
new Bin("binName", Value.getAsList(value)));
the same AQL query gives me
["A",1]
This looks like the intended behaviour, but when I fetch the value with the Aerospike client and check the element types
List<Object> ret = (List<Object>) client.get(new Policy(), new Key("namespace", "set", "test"), "binName").getValue("binName");
// index 1 is the numeric element; index 0 is the String "A"
if (ret.get(1) instanceof Long) {
System.out.println("Got instance of long");
}
the print statement runs, even though I originally sent Integer data.
Why is this happening, and is there an alternative way to store incoming data such as
["A",1]
PS: Please support your answer with a small code snippet.
Update: I found some information on GitHub.
The linked source contains the following function,
which I am copying below:
/**
* Write/Read ArrayList<Object> directly instead of relying on java serializer.
*/
private void testListComplex(AerospikeClient client, Parameters params) throws Exception {
console.info("Read/Write ArrayList<Object>");
Key key = new Key(params.namespace, params.set, "listkey2");
client.delete(params.writePolicy, key);
byte[] blob = new byte[] {3, 52, 125};
ArrayList<Object> list = new ArrayList<Object>();
list.add("string1");
list.add(2);
list.add(blob);
Bin bin = new Bin(params.getBinName("listbin2"), list);
client.put(params.writePolicy, key, bin);
Record record = client.get(params.policy, key, bin.name);
List<?> receivedList = (List<?>) record.getValue(bin.name);
validateSize(3, receivedList.size());
validate("string1", receivedList.get(0));
// Server convert numbers to long, so must expect long.
validate(2L, receivedList.get(1));
validate(blob, (byte[])receivedList.get(2));
console.info("Read/Write ArrayList<Object> successful.");
}
The comment says that the server converts numbers to long.
So does this mean that an Integer cannot be stored in this case?

If you check the Aerospike data types, you'll see that it only supports 64-bit integers, which correspond to long in Java.
If you don't need to access this data on the server through Lua UDF scripts, you can just save it as a serialized blob. The Java driver already supports native serialization, which is what you did in your first attempt. AQL is just showing you the serialized bytes, but you can read the value back in the Java client just fine.
Or you can store it as a JSON-serialized string so that it's more compatible with other language drivers you might use in the future.
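
If the list form is kept, one option is to accept that numbers come back as Long and narrow them on the client. The sketch below does not call Aerospike at all: the list built in main is a stand-in for what record.getValue("binName") returned in the question, and the class and method names (LongBackedList, asInt) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LongBackedList {
    // Narrows a numeric list element to int, failing loudly if it does not fit.
    static int asInt(Object element) {
        if (!(element instanceof Number)) {
            throw new IllegalArgumentException("not a number: " + element);
        }
        // toIntExact throws ArithmeticException on overflow instead of truncating.
        return Math.toIntExact(((Number) element).longValue());
    }

    public static void main(String[] args) {
        // Stand-in for the list read back from the bin: the integer arrives as a Long.
        List<Object> ret = new ArrayList<>(Arrays.asList("A", 1L));
        String name = (String) ret.get(0);
        int number = asInt(ret.get(1));
        System.out.println(name + " " + number);
    }
}
```

Checking against Number rather than Integer or Long keeps the read path working whichever integer width the value had when it was written.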

compile-time error: illegal Character: \8279

When I try to compile my servlet I get the following error:
illegal character: \8279
And it's pointing to the & in this code:
msg.setContent("<a href=\"" + server +
":8080/myApp/ResetPasswordPage.jsp?randNum=" + randNum + ‌
"&practiceName=" + practiceName+"\" Click Here </a>",
"text/html" );
I can't find a whole lot on the net about it...

I copied this string into a Java file in Eclipse, and Eclipse complained when I tried to save it.
There are two problematic invisible characters just after randNum +.
Remove them.

This is a hex dump of your code after copying and pasting it:
00000010 3c 61 20 68 72 65 66 3d 5c 22 22 20 2b 20 73 65 |<a href=\"" + se|
00000020 72 76 65 72 20 2b 20 0a 20 20 20 20 20 20 20 20 |rver + . |
00000030 20 20 20 20 20 20 20 22 3a 38 30 38 30 2f 6d 79 | ":8080/my|
00000040 41 70 70 2f 52 65 73 65 74 50 61 73 73 77 6f 72 |App/ResetPasswor|
00000050 64 50 61 67 65 2e 6a 73 70 3f 72 61 6e 64 4e 75 |dPage.jsp?randNu|
00000060 6d 3d 22 20 2b 20 72 61 6e 64 4e 75 6d 20 2b 20 |m=" + randNum + |
00000070 e2 80 8c e2 80 8b 0a 20 20 20 20 20 20 20 20 20 |....... |
00000080 20 20 20 20 20 20 22 26 70 72 61 63 74 69 63 65 | "&practice|
00000090 4e 61 6d 65 3d 22 20 2b 20 70 72 61 63 74 69 63 |Name=" + practic|
000000a0 65 4e 61 6d 65 2b 22 5c 22 20 43 6c 69 63 6b 20 |eName+"\" Click |
Note the e2 80 8c and e2 80 8b between randNum + and the next line. These are the UTF-8 encodings of U+200C (zero width non-joiner) and U+200B (zero width space). You need to remove them.
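
When the editor does not show such characters, they can be located programmatically. The following is a minimal sketch, assuming the source text has already been read into a String; the class and method names (InvisibleCharFinder, findFormatChars, strip) are illustrative only. It flags characters in the Unicode "format" category, which covers both U+200C and U+200B on current Java versions, and checks U+200B explicitly as well.

```java
import java.util.ArrayList;
import java.util.List;

class InvisibleCharFinder {
    // True for zero-width and other invisible "format" characters.
    static boolean isInvisible(char c) {
        return c == '\u200B' || Character.getType(c) == Character.FORMAT;
    }

    // Returns the indexes of all invisible characters in the given text.
    static List<Integer> findFormatChars(String source) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < source.length(); i++) {
            if (isInvisible(source.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    // Returns the text with all invisible characters removed.
    static String strip(String source) {
        StringBuilder cleaned = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (!isInvisible(c)) {
                cleaned.append(c);
            }
        }
        return cleaned.toString();
    }

    public static void main(String[] args) {
        String line = "randNum + \u200C\u200B";
        System.out.println(findFormatChars(line));
        System.out.println("[" + strip(line) + "]");
    }
}
```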

convert txt packet data to pcap format to open it by Wireshark

Hi, I am working on an application that has to read live packets from the network, process them, and display them in a sophisticated way.
The problem is that the packets I have are in a text file, so to open them in Wireshark I have to convert them to .pcap format.
How can I convert packets from text to pcap format?
My text file format is shown below:
Frame:
Frame: number = 0
Frame: timestamp = 2014-02-13 09:39:11.288
Frame: wire length = 174 bytes
Frame: captured length = 174 bytes
Frame:
Eth: ******* Ethernet - "Ethernet" - offset=0 (0x0) length=14
Eth:
Eth: destination = 01:00:5e:7f:ff:fa
Eth: .... ..0. .... .... = [0] LG bit
Eth: .... ...0 .... .... = [0] IG bit
Eth: source = ec:9a:74:4d:8e:03
Eth: .... ..0. .... .... = [0] LG bit
Eth: .... ...0 .... .... = [0] IG bit
Eth: type = 0x800 (2048) [ip version 4]
Eth:
Ip: ******* Ip4 - "ip version 4" - offset=14 (0xE) length=20 protocol suite=NETWORK
Ip:
Ip: version = 4
Ip: hlen = 5 [5 * 4 = 20 bytes, No Ip Options]
Ip: diffserv = 0x0 (0)
Ip: 0000 00.. = [0] code point: not set
Ip: .... ..0. = [0] ECN bit: not set
Ip: .... ...0 = [0] ECE bit: not set
Ip: length = 160
Ip: id = 0x4CD1 (19665)
Ip: flags = 0x0 (0)
Ip: 0.. = [0] reserved
Ip: .0. = [0] DF: do not fragment: not set
Ip: ..0 = [0] MF: more fragments: not set
Ip: offset = 0
Ip: ttl = 0 [time to live]
Ip: type = 17 [next: User Datagram]
Ip: checksum = 0xB0AA (45226) [correct]
Ip: source = 124.125.80.90
Ip: destination = 239.255.255.250
Ip:
Udp: ******* Udp offset=34 (0x22) length=8
Udp:
Udp: source = 58845
Udp: destination = 1900
Udp: length = 140
Udp: checksum = 0x5154 (20820) [correct]
Udp:
Data: ******* Payload offset=42 (0x2A) length=132
Data:
002a: 4d 2d 53 45 41 52 43 48 20 2a 20 48 54 54 50 2f M-SEARCH * HTTP/
003a: 31 2e 31 0d 0a 48 6f 73 74 3a 32 33 39 2e 32 35 1.1..Host:239.25
004a: 35 2e 32 35 35 2e 32 35 30 3a 31 39 30 30 0d 0a 5.255.250:1900..
005a: 53 54 3a 75 72 6e 3a 73 63 68 65 6d 61 73 2d 75 ST:urn:schemas-u
006a: 70 6e 70 2d 6f 72 67 3a 64 65 76 69 63 65 3a 57 pnp-org:device:W
007a: 41 4e 43 6f 6e 6e 65 63 74 69 6f 6e 44 65 76 69 ANConnectionDevi
008a: 63 65 3a 31 0d 0a 4d 61 6e 3a 22 73 73 64 70 3a ce:1..Man:"ssdp:
009a: 64 69 73 63 6f 76 65 72 22 0d 0a 4d 58 3a 33 0d discover"..MX:3.
00aa: 0a 0d 0a 00

Wireshark provides a command-line tool, text2pcap, that converts text files to pcap:
https://www.wireshark.org/docs/man-pages/text2pcap.html
AutoHotkey solution:
; Change appropriate file locations
Run, %A_ProgramFiles%\ethereal\text2pcap.exe c:\test.txt c:\testconv.cap,%A_ProgramFiles%\ethereal
If you want a fully automated solution, you can modify this function, which actively watches a directory for file changes/creation:
http://www.autohotkey.com/board/topic/41653-watchdirectory/

If you have raw packets captured, you can write them directly in pcap file format (see man 5 pcap-savefile) or use hexdump/xxd plus the text2pcap utility, as ahkcoder recommends. text2pcap also supports generating dummy L2-L4 headers (Ethernet, IP, TCP/UDP/SCTP).
If you have only a text representation, you can either reconstruct the packet from it (that is, generate all the appropriate headers for each protocol used in your system) or adjust the offsets of the hex dump part (so that they begin at 0000) and feed it to text2pcap.
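
The offset adjustment described above could be sketched as follows. This is an illustration under stated assumptions, not a complete converter: it assumes every dump line looks like the ones in the question (a four-digit hex offset followed by a colon, up to 16 space-separated hex bytes, then an optional ASCII column), it handles a single packet, and the class and method names (HexDumpRebase, rebase) are hypothetical. A short final line whose ASCII column happens to consist of two-character hex-looking words would be misread.

```java
import java.util.ArrayList;
import java.util.List;

class HexDumpRebase {
    // Keeps only the hex-dump lines and renumbers their offsets from 0000.
    static List<String> rebase(List<String> lines) {
        List<String> out = new ArrayList<>();
        int offset = 0;
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            // Skip decoded header lines such as "Eth: ..." or "Frame: ...".
            if (tokens.length < 2 || !tokens[0].matches("[0-9a-fA-F]{4}:")) {
                continue;
            }
            StringBuilder bytes = new StringBuilder();
            int count = 0;
            // Take at most 16 byte tokens; anything after that is the ASCII column.
            for (int i = 1; i < tokens.length && count < 16; i++) {
                if (!tokens[i].matches("[0-9a-fA-F]{2}")) {
                    break;
                }
                bytes.append(' ').append(tokens[i]);
                count++;
            }
            if (count == 0) {
                continue;
            }
            out.add(String.format("%04x", offset) + bytes);
            offset += count;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("Data: ******* Payload offset=42 (0x2A) length=132");
        input.add("002a: 4d 2d 53 45 41 52 43 48 20 2a 20 48 54 54 50 2f M-SEARCH * HTTP/");
        input.add("00aa: 0a 0d 0a 00");
        for (String line : rebase(input)) {
            System.out.println(line);
        }
    }
}
```

The rebased lines can then be written to a file and passed to text2pcap. Because only the payload bytes survive in this kind of dump, the dummy-header options mentioned above would be needed to get a packet Wireshark can dissect (for UDP, the text2pcap man page describes a -u srcport,destport option; check it against your installed version).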

Read Java in as Hex

I have tried to solve this but keep coming up with things that don't help. I'm sure it's easy (when you know how, of course ;) ).
What I would like to do is read in a file using a byte stream, like below:
while((read = in.read()) != -1){
//code removed to save space
Integer.toHexString(read);
System.out.println(read);
}
When it prints the hex to the screen, it prints most values fine, e.g.
31
13
12
0
but when it comes to hex codes that should be 01 31, it prints 0 131. I want to read each byte into a variable as you would see it in a hex editor, i.e. 00 11 21 31, with no single digits, because I need to scan the whole file and look for patterns. I know how to do the pattern matching; I'm just stuck on this :/
So, in short, I need a variable that contains both hex characters, i.e. 01 rather than 0. I hope this all makes sense; I'm a little confused as it's 3am!
If anyone knows how to do this I would be most grateful. Thanks in advance for the help; this site has saved me loads of research and I have learnt a lot!
Many thanks.

This method:
public static void printHexStream(final InputStream inputStream, final int numberOfColumns) throws IOException {
long streamPtr = 0;
int b;
// read() returns -1 at end of stream; available() is only an estimate and can be 0 before EOF
while ((b = inputStream.read()) != -1) {
final long col = streamPtr++ % numberOfColumns;
// %02x pads single-digit values with a leading zero
System.out.printf("%02x ", b);
if (col == (numberOfColumns - 1)) {
System.out.printf("%n");
}
}
}
will output something like this:
40 32 38 00 5f 57 69 64 65 43
68 61 72 54 6f 4d 75 6c 74 69
42 79 74 65 40 33 32 00 5f 5f
69 6d 70 5f 5f 44 65 6c 65 74
65 46 69 6c 65 41 40 34 00 5f
53 65 74 46 69 6c 65 50 6f 69
6e 74 65 72 40 31 36 00 5f 5f
69 6d 70 5f 5f 47 65 74 54 65
6d 70 50 61 74 68 41 40 38 00
Is this what you are looking for?

I think what you're looking for is a formatter. Try:
Formatter formatter = new Formatter();
formatter.format("%02x", your_int);
System.out.println(formatter.toString());
Does that do what you're looking for? Your question wasn't all that clear (and I think maybe you deleted too much code from your snippet).

Using Apache Commons IO and Commons Codec:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
import org.apache.commons.codec.binary.Hex;
InputStream is = new FileInputStream(new File("c:/file.txt"));
String hexString = Hex.encodeHexString(IOUtils.toByteArray(is));
In Java 7 and later you can read the byte array directly from a file, as below:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.Path;
Path path = Paths.get("path/to/file");
byte[] data = Files.readAllBytes(path);

Hi everyone who posted, thanks for the replies, but the way I ended up doing it was:
hexIn = in.read();
s = Integer.toHexString(hexIn);
if (s.length() < 2) {
s = "0" + s;
}
Just thought I would post the way I did it for anyone else in future. Thank you so much for your help though!
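
For completeness, the zero-padding used in the answers above can be wrapped in a small helper that turns a whole byte array into the hex-editor style string the question asks for. This is a sketch only; the class and method names (HexBytes, toHex) are made up, and the sample bytes stand in for file contents.

```java
class HexBytes {
    // Formats every byte as exactly two lowercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xff so negative bytes are not sign-extended.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x11, 0x21, 0x31, 0x01};
        System.out.println(toHex(sample));
    }
}
```

Searching the resulting string for a pattern such as "01 31" then works with String.indexOf, because every byte occupies a fixed width.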
