Accessing IMDB dataset in AWS in R, Python or Java


I am trying to connect to the IMDb dataset on AWS.
I've already signed up for AWS and set up my credentials.
I'm more familiar with R, and there is an R package called aws.s3. When I call its s3HTTP function, I get the error below:
s3HTTP(verb="GET", bucket="imdb-datasets", path="documents/v1/current/name.basics.tsv.gz",
request_body = "documents/v1/current/name.basics.tsv.gz",
headers=list('x-amz-request-payer' = "requester"),
key=Sys.setenv("AWS_ACCESS_KEY_ID"="*******"), secret=Sys.setenv("AWS_SECRET_KEY"="******"))
List of 5
$ Code : chr "InvalidAccessKeyId"
$ Message : chr "The AWS Access Key Id you provided does not
exist in our records."
$ AWSAccessKeyId: chr "TRUE"
$ RequestId : chr "234D5ED951AD2468"
$ HostId : chr "ugVtbV2Qz6NrNFD7ODO84MnzYttftsjHwbAawExo75Bg9xq3JAXOuDqF8GcYLd5vD6TgcHe/ib4="
- attr(*, "headers")=List of 6
..$ x-amz-request-id : chr "234D5ED951AD2468"
..$ x-amz-id-2 : chr "ugVtbV2Qz6NrNFD7ODO84MnzYttftsjHwbAawExo75Bg9xq3JAXOuDqF8GcYLd5vD6TgcHe/ib4="
..$ content-type : chr "application/xml"
..$ transfer-encoding: chr "chunked"
..$ date : chr "Mon, 20 Nov 2017 08:37:13 GMT"
..$ server : chr "AmazonS3"
..- attr(*, "class")= chr [1:2] "insensitive" "list"
- attr(*, "class")= chr "aws_error"
- attr(*, "request_canonical")= chr "GET\n/imdb-
datasets/\nlocation=\nhost:s3.amazonaws.com\nx-amz-
date:20171120T083712Z\n\nhost;x-amz-date\ne3b0c44"| __truncated__
- attr(*, "request_string_to_sign")= chr "AWS4-HMAC-
SHA256\n20171120T083712Z\n20171120/us-east-
1/s3/aws4_request\n760638139d8fa8fa1e36b824f481abe59184955"| __truncated__
- attr(*, "request_signature")= chr "AWS4-HMAC-SHA256
Credential=TRUE/20171120/us-east-1/s3/aws4_request,
SignedHeaders=host;x-amz-date, Signature=b"| __truncated__
NULL
My access key is up to date and I have no problem accessing my own bucket.
I also copied the Java example code provided by IMDb on their webpage (http://www.imdb.com/interfaces/). It seemed to compile without errors, but no file was downloaded to my bucket in AWS.

Related

Only recognizing one token in antlr4 grammar

I want my grammar to recognize the following expression &COL[0]. I have built the following grammar:
array:
ARRAY_NAME L_RIGHT_PAR (ARRAY_DIGIT|STRING) R_RIGHT_PAR;
ARRAY_DIGIT:DIGIT+;
ARRAY_NAME: '&''COL';
STRING : QUOT ('\\"' | ~'"')* QUOT
;
L_RIGHT_PAR : '[' ;
R_RIGHT_PAR : ']' ;
fragment
DIGIT : '0'..'9' ;
This gives the error:
mismatched input '[1]' expecting '['
It only works if I write &COL[ 0] with spaces between the [ and ]

I changed the grammar a bit to make it complete enough to run. The text &COL[0] lexes fine with this amended grammar.
grammar test1; // different name for my test rig
test1: ARRAY_NAME L_RIGHT_PAR (ARRAY_DIGIT|STRING) R_RIGHT_PAR;
ARRAY_DIGIT:DIGIT+;
ARRAY_NAME: '&''COL';
STRING : QUOT ('\\"' | ~'"')* QUOT
;
QUOT: '"'; // assumed this
L_RIGHT_PAR : '[' ;
R_RIGHT_PAR : ']' ;
fragment
DIGIT : '0'..'9' ;
WS : [ \t\r\n] -> skip; // added whitespace just so I could add \r\n
Here's the tokenized output:
[#0,0:3='&COL',<ARRAY_NAME>,1:0]
[#1,4:4='[',<'['>,1:4]
[#2,5:5='0',<ARRAY_DIGIT>,1:5]
[#3,6:6=']',<']'>,1:6]
[#4,9:8='<EOF>',<EOF>,2:0]
This answers the question you asked, though I'm still not sure about your definition of STRING. In any case, &COL[0] now lexes and parses correctly.

Parse without ignoring whitespaces - Java

I have the following string input (from a netstat -a command):
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ] DGRAM 11453 /run/systemd/shutdownd
unix 2 [ ] DGRAM 7644 /run/systemd/notify
unix 2 [ ] DGRAM 7646 /run/systemd/cgroups-agent
unix 5 [ ] DGRAM 7657 /run/systemd/journal/socket
unix 14 [ ] DGRAM 7659 /dev/log
unix 3 [ ] STREAM CONNECTED 16620
unix 3 [ ] STREAM CONNECTED 16621
I'm attempting to parse the above string as follows:
// lines is an array representing each line above
for (int i = 0; i < lines.length; i++) {
String[] tokens = lines[i].split("\\s+");
}
I want tokens to be an array of 7 entries [Proto, RefCnt, Flags, Type, State, I-Node, Path]. Instead, I get an array that splits the brackets under Flags into two tokens and drops the empty State:
["unix", "2", "[", "]", "DGRAM", "11453", "/run/systemd/shutdownd"]
instead of
["unix", "2", "[]", "DGRAM", "", "11453", "/run/systemd/shutdownd"]
How can I fix my regex to produce the correct output?

You need to set the minimum length of the whitespace run in your regular expression to 2. Try splitting like this:
String[] tokens = lines[i].split("\\s{2,16}+");
Or, as @revo suggests, use lookarounds:
String[] tokens = lines[i].split("(?<!\\[)\\s{2,16}+(?!\\])");
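As a minimal sketch of the first suggestion: the class name SplitColumns, the helper splitColumns, and the sample row are hypothetical. The row's column padding is an assumption, since the listing in the question shows the columns separated by single spaces. The unbounded quantifier {2,} is used here instead of {2,16}+ so that wider gaps are also treated as one separator.

```java
import java.util.Arrays;

class SplitColumns {
    // Split one netstat row on runs of two or more whitespace characters,
    // so the single space inside "[ ]" is not treated as a separator.
    public static String[] splitColumns(String line) {
        return line.trim().split("\\s{2,}");
    }

    public static void main(String[] args) {
        // Hypothetical row with column padding (assumed, not taken from the question).
        String row = "unix  2      [ ]         DGRAM                    11453    /run/systemd/shutdownd";
        System.out.println(Arrays.toString(splitColumns(row)));
    }
}
```

Splitting this way keeps "[ ]" as one token, but it does not by itself produce an empty string for a blank State column. If all 7 entries are required, slicing each line at fixed column offsets taken from the header row may be a more reliable approach.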

How do I get my external IP address and external Port Number from a STUN request?

I am behind the NAT of my WiFi router. When I send a packet from port 20,000, it leaves my WiFi router on port 56867. I want to get that outgoing port number (56867) programmatically, and I am trying to do so via the STUN protocol.
I am using the STUN client code in package "com.sun.stun" at https://code.google.com/p/openwonderland-jvoicebridge/source/browse/branches/jp/stun/src/com/sun/stun/StunClient.java . I tried using the STUN client code to get my external IP address and port number by writing the following code:
import com.sun.stun.*;
import java.net.*;
public class Main {
public static void main(String args[]) throws Exception {
InetSocketAddress ii = new InetSocketAddress("stun.ekiga.net", 3478);
DatagramSocket s = new DatagramSocket();
System.out.println("LocalAddress(): " + s.getLocalAddress());
System.out.println("LocalPort(): " + s.getLocalPort());
s.setSoTimeout(10000);
StunClient stunClient1 = new StunClient(ii, s);
System.out.println("external InetAddress/Port: " + stunClient1.getMappedAddress().toString());
}
}
My output looks like this:
run:
LocalAddress(): 0.0.0.0/0.0.0.0
LocalPort(): 32979
May 30, 2015 11:38:16 PM unmodified.com.sun.stun.StunClient run
INFO: using STUN server stun.ekiga.net/217.10.68.152:3478
external InetAddress/Port: /0.0.0.0:32979
BUILD SUCCESSFUL (total time: 0 seconds)
The problem is that stunClient1.getMappedAddress() returns the bind address and ephemeral port number of DatagramSocket s, whereas I want my public IP address and public port number. Please help.
* Explanation / Clarification *
When I create a DatagramSocket, the OS/Java assigns it an ephemeral port number, in this case 58100. I have a WiFi router that performs NAT. When I send a packet through my router to stun.ekiga.net, the external port number that stun.ekiga.net sends its response back to is not the same as the internal port number. I want the external port number that the STUN server is replying to.
* Helpful information *
I have received the following STUN packet:
Length of response is: 88 bytes
STUN response type received: 257 [StunHeader.BINDING_RESPONSE]
STUN response packet in binary format:
00000001::00000001::00000000::01000100::11011100::00001110::00110110::10101000::11011100::00001110::00110110::10101000::11011100::00001110::00110110::10101000::11011100::00001110::00110110::10101000::00000000::00000001::00000000::00001000::00000000::00000001::10000000::00010100::00000000::00000000::00000000::00000000::00000000::00000100::00000000::00001000::00000000::00000001::00001101::10010110::11011001::00001010::01000100::10011000::00000000::00000101::00000000::00001000::00000000::00000001::00001101::10010111::11011001::01110100::01111010::10001010::10000000::00100000::00000000::00001000::00000000::00000001::01011100::00011010::11011100::00001110::00110110::10101000::10000000::00100010::00000000::00010000::01010110::01101111::01110110::01101001::01100100::01100001::00101110::01101111::01110010::01100111::00100000::00110000::00101110::00111001::00110110::00000000::
STUN response packet in hexadecimal format:
01:01:00:44:DC:0E:36:A8:DC:0E:36:A8:DC:0E:36:A8:DC:0E:36:A8:00:01:00:08:00:01:80:14:00:00:00:00:00:04:00:08:00:01:0D:96:D9:0A:44:98:00:05:00:08:00:01:0D:97:D9:74:7A:8A:80:20:00:08:00:01:5C:1A:DC:0E:36:A8:80:22:00:10:56:6F:76:69:64:61:2E:6F:72:67:20:30:2E:39:36:00:
STUN response packet in decimal format:
1 : 1 : 0 : 68 : -36 : 14 : 54 : -88 : -36 : 14 : 54 : -88 : -36 : 14 : 54 : -88 : -36 : 14 : 54 : -88 : 0 : 1 : 0 : 8 : 0 : 1 : -128 : 20 : 0 : 0 : 0 : 0 : 0 : 4 : 0 : 8 : 0 : 1 : 13 : -106 : -39 : 10 : 68 : -104 : 0 : 5 : 0 : 8 : 0 : 1 : 13 : -105 : -39 : 116 : 122 : -118 : -128 : 32 : 0 : 8 : 0 : 1 : 92 : 26 : -36 : 14 : 54 : -88 : -128 : 34 : 0 : 16 : 86 : 111 : 118 : 105 : 100 : 97 : 46 : 111 : 114 : 103 : 32 : 48 : 46 : 57 : 54 : 0 :
How do I extract my public/external port number from the binary data in the stun packet?

Here's how: https://github.com/patmooney/jstun
Run a client (which simply pings the Server and gets back the external IP and port of the request)
./run Test
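The packet in the question can also be decoded by hand. The sketch below assumes the classic STUN layout from RFC 3489: a 20-byte header (message type, message length, 16-byte transaction ID) followed by attributes, each with a 2-byte type, a 2-byte length, and a value. It looks for MAPPED-ADDRESS (type 0x0001), whose value is one unused byte, one family byte, a 2-byte port, and a 4-byte IPv4 address. The class name StunMappedAddress and its helpers are hypothetical and are not part of com.sun.stun or jstun.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class StunMappedAddress {
    static final int HEADER_LENGTH = 20;      // type(2) + length(2) + transaction ID(16)
    static final int MAPPED_ADDRESS = 0x0001; // attribute type defined in RFC 3489

    // Hex dump of the response, copied from the question.
    public static final String QUESTION_PACKET_HEX =
        "01:01:00:44:DC:0E:36:A8:DC:0E:36:A8:DC:0E:36:A8:DC:0E:36:A8:"
      + "00:01:00:08:00:01:80:14:00:00:00:00:"
      + "00:04:00:08:00:01:0D:96:D9:0A:44:98:"
      + "00:05:00:08:00:01:0D:97:D9:74:7A:8A:"
      + "80:20:00:08:00:01:5C:1A:DC:0E:36:A8:"
      + "80:22:00:10:56:6F:76:69:64:61:2E:6F:72:67:20:30:2E:39:36:00:";

    // Convert a colon-separated hex dump to bytes (a trailing ':' is ignored).
    public static byte[] fromHex(String dump) {
        String[] parts = dump.split(":");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    // Read an unsigned 16-bit big-endian value.
    static int u16(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    // Walk the attribute list; return the IPv4 MAPPED-ADDRESS, or null if absent.
    public static InetSocketAddress mappedAddress(byte[] packet) {
        int end = Math.min(packet.length, HEADER_LENGTH + u16(packet, 2));
        int pos = HEADER_LENGTH;
        while (pos + 4 <= end) {
            int type = u16(packet, pos);
            int length = u16(packet, pos + 2);
            int value = pos + 4;
            if (type == MAPPED_ADDRESS && length >= 8 && value + 8 <= end
                    && packet[value + 1] == 0x01) { // family 0x01 = IPv4
                int port = u16(packet, value + 2);
                byte[] ip = { packet[value + 4], packet[value + 5],
                              packet[value + 6], packet[value + 7] };
                try {
                    return new InetSocketAddress(InetAddress.getByAddress(ip), port);
                } catch (UnknownHostException e) {
                    throw new IllegalStateException(e); // not reachable for a 4-byte array
                }
            }
            pos = value + ((length + 3) & ~3); // skip value, rounded up to 4 bytes
        }
        return null;
    }

    public static void main(String[] args) {
        InetSocketAddress a = mappedAddress(fromHex(QUESTION_PACKET_HEX));
        System.out.println(a.getAddress().getHostAddress() + ":" + a.getPort());
        // prints "0.0.0.0:32788"
    }
}
```

For the dump in the question, the MAPPED-ADDRESS attribute itself carries the address 0.0.0.0 and port 0x8014 (32788), so the zero address is present in the server's response rather than introduced by the parsing.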

Transferring data structure from R to Java

I have an R script that does some computation. The last step of the computation is a kernel density estimate: http://www.inside-r.org/packages/cran/kerdiest/docs/kde
I now need to convert the result of calling kde into a string in R, or save it to a file, so that I can read and "unmarshal" it from a Java program.
What is the best format to use for the exchange and what R and Java libraries can read / write that format?
The structure is not ridiculously complex, but also not trivial:
> str(tmp)
List of 8
$ x : num [1:1398, 1:3] 1.035 0.902 0.679 0.826 1.243 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "Rb ppm" "Sb ppm" "Cr ppm"
$ eval.points:'data.frame': 1398 obs. of 3 variables:
..$ Rb ppm: num [1:1398] 1.035 0.902 0.679 0.826 1.243 ...
..$ Sb ppm: num [1:1398] -2.58 -2.6 -2.48 -2.44 -2.53 ...
..$ Cr ppm: num [1:1398] 4.56 4.44 4.3 4.26 4.49 ...
$ estimate : Named num [1:1398] 0.1572 0.0897 0.0311 0.0434 0.099 ...
..- attr(*, "names")= chr [1:1398] "1" "2" "3" "4" ...
$ H : num [1:3, 1:3] 0.02395 0.00927 -0.014 0.00927 0.06868 ...
$ gridded : logi FALSE
$ binned : logi FALSE
$ names : chr [1:3] "Rb ppm" "Sb ppm" "Cr ppm"
$ w : num [1:1398] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "class")= chr "kde"

RJSONIO seems to do the job, although it seems quite verbose.

Antlr4 - Parser for multi line file

I'm trying to use ANTLR4 to parse the result of an SSH command, but I cannot figure out why this code doesn't work. I keep getting an "extraneous input" error.
Here is a sample of the file I'm trying to parse :
system
home[1] HOME-NEW
sp
cpu[1]
cpu[2]
home[2] SECOND-HOME
sp
cpu[1]
cpu[2]
Here is my grammar file :
listAll
: ( system | home | NL)*
;
elements
: (sp | cpu )*
;
home
: 'home[' number ']' value NL elements
;
system
: 'system' NL
;
sp
: 'sp' NL
;
cpu
: 'cpu[' number ']' NL
;
value
: VALUE
;
number
: INT
;
VALUE : STRING+;
STRING: ('a'..'z'|'A'..'Z'| '-' | ' ' | '(' | ')' | '/' | '.' | '[' | ']');
INT : ('0'..'9')+ ;
NL : '\r'? '\n';
WS : (' '|'\t')* {skip();} ;
The entry point is 'listAll'.
Here is the result I get :
(listAll \r\n (system system \r\n) home[1] HOME-NEW \r\n sp \r\n cpu[1] \r\n cpu[2] \r\n[...])
The parsing failed after 'system'. And I get this error :
line 2:1 extraneous input 'home[1] HOME-NEW' expecting {, system', NL, WS}
Does anybody know why this is not working?
I am a beginner with ANTLR, and I'm not sure I really understand how it works.
Thank you all!

You need to combine NL and WS into a single WS rule and skip it using -> skip (not {skip()}).
Since WS is then skipped automatically, there is no need to mention it in the parser rules.
Also, your STRING rule included a space (' '), which caused the error by letting VALUE swallow the input that followed.
Here is your complete grammar:
listAll : ( system | home )* ;
elements : ( sp | cpu )* ;
home : 'home[' number ']' value elements;
system : 'system' ;
sp : 'sp' ;
cpu : 'cpu[' number ']' ;
value : VALUE ;
number : INT ;
VALUE : STRING+;
STRING : ('a'..'z'|'A'..'Z'| '-' | '(' | ')' | '/' | '.' | '[' | ']') ;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
I'd also suggest going through the ANTLR4 documentation.
