Parse MIME sender in Java (RFC 822)


MIME message senders appear in formats such as:
"John Doe" <johndoe@gmail.com>
<johndoe@gmail.com>
I'm trying to figure out how to extract the string "johndoe@gmail.com" in the above examples, although I will also need the "johndoe" and "gmail.com" parts (per the RFC I'm pretty sure splitting on @ is all that's needed from here). Obviously, regex-ing up my own parser is one (not great) option.
It seemed this might be possible using javax.mail.internet.MimeMessage. All of the constructors I found require a Folder, which I do not have (well, I sort of do; it exists in the IMAP layer), e.g.
MimeMessage(Folder folder, InputStream is, int msgnum)
This makes me feel I'm using the class wrong. Nonetheless, if I parse this way I do get access to the getFrom() method, which returns an array of Address, and Address itself doesn't offer any methods of use to me.
Using mime4j it's easy to get this far:
case T_FIELD: // a field is a header
    if ("from".equalsIgnoreCase(token.getName())) { // compare strings with equals, not ==
        // get the raw, unparsed string as above
    }
    break;
So, using mime4j or Java/javax utilities, it should be possible to extract the "a@b.com" part of the address from there, but I haven't yet found a class in javax or mime4j that is responsible for this.

I think you need the InternetAddress class from javax.mail:
http://docs.oracle.com/javaee/6/api/javax/mail/internet/InternetAddress.html#getAddress()
Minimal working example:
import javax.mail.internet.AddressException;
import javax.mail.internet.InternetAddress;

public class JavaMailExample {
    public static void main(String[] args) throws AddressException {
        String fullemail = "\"John Doe\" <johndoe@gmail.com>";
        InternetAddress addr = new InternetAddress(fullemail);
        System.out.println(addr.getPersonal()); // John Doe
        System.out.println(addr.getAddress());  // johndoe@gmail.com
    }
}
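The question also asks for the "johndoe" and "gmail.com" parts. Once getAddress() has returned the bare address, plain String methods are enough. The following is a minimal sketch that splits at the last '@' rather than the first, because RFC 822 (and its successor RFC 5322) allows a quoted local part to contain '@', while the domain cannot. EmailParts and its methods are hypothetical names, not part of JavaMail.

```java
public final class EmailParts {
    private EmailParts() {}

    /** Local part: everything before the last '@'. */
    public static String localPart(String address) {
        return address.substring(0, atIndex(address));
    }

    /** Domain: everything after the last '@'. */
    public static String domain(String address) {
        return address.substring(atIndex(address) + 1);
    }

    // The last '@' separates local part and domain; a quoted local part may contain '@'.
    private static int atIndex(String address) {
        int at = address.lastIndexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            throw new IllegalArgumentException("Not an addr-spec: " + address);
        }
        return at;
    }

    public static void main(String[] args) {
        String address = "johndoe@gmail.com"; // e.g. the value returned by InternetAddress.getAddress()
        System.out.println(localPart(address)); // johndoe
        System.out.println(domain(address));    // gmail.com
    }
}
```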

Related

KSQL: UDF does not accept parameters (STRING, STRING)

I'm running into a problem with KSQL while trying to set up an ETL pipeline using a UDF. At some point in the ETL process I need to isolate specific info from a description field (VARCHAR) in my data. A made-up example for context:
description = "species=dog.sex=male.color=blonde.age=10." (the real data is formatted in the same way)
I've written a simple UDF to isolate any information on demand. It looks like this:
package com.my.package;
/** IMPORTS **/
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;
/** CLASS DEFINITION **/
@UdfDescription(name = "extract_from_description",
    author = "Me",
    version = "0.0.1",
    description = "Given a description and a request for information, isolates and returns the requested information. Pass requested tag as 'tag='.")
public class Extract_From_Description {
    @Udf(description = "Given a description and a request for information, isolates and returns the requested information. Pass requested tag as 'tag='.")
    public String extract_from_description(final String description, final String request) {
        return description.split(request)[1].split("\\.")[0];
    }
}
I can upload and register the function just fine, it's listed and described properly when I run:
ksql> list functions;
ksql> describe function EXTRACT_FROM_DESCRIPTION;
I call the function like this to create a new stream:
CREATE STREAM result AS
SELECT recordId,
OtherVariables,
EXTRACT_FROM_DESCRIPTION(description, 'species=') AS species
FROM parent_stream
EMIT CHANGES;
There I get an error I can't make sense of:
Function 'extract_from_description' does not accept parameters (STRING, STRING).
Valid alternatives are:
Apparently KSQL can't properly interpret what the input for the function is supposed to be (it looks like it expects no input?), and I can't figure out why. I've read through the documentation to see whether I defined my function in a weird way, but I can't find any differences between the examples and my function. I did notice there are supposed to be several ways to define the input a function takes and tried them all, but the result is always the same.
I use Maven to create the jar file for this function (JDK1.8.0_201). Can anyone help me figure out what's going on?
TL;DR: My KSQL UDF doesn't accept input of type (String, String) even though the function specifies the input should be of type (String, String)

Found the problem; answering here for anyone who might run into the same problem.
You need to specify the parameters using @UdfParameter, like this:
import io.confluent.ksql.function.udf.UdfParameter; // add this to the list of imports
// add @UdfParameter(value = ...) to each input parameter
public String extract_from_description(@UdfParameter(value = "description") final String description, @UdfParameter(value = "request") final String request) {
    return description.split(request)[1].split("\\.")[0]; // function body unchanged
}
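Separately from the @UdfParameter fix, the body in the question uses description.split(request), which treats the tag as a regular expression and throws ArrayIndexOutOfBoundsException when the tag is missing. Below is a minimal plain-Java sketch of a more defensive extraction, assuming the "tag=value." format from the made-up example. It returns null when the tag is absent and only matches a tag at the start of a field, so "species=" does not match inside "subspecies=". DescriptionExtractor is a hypothetical helper; the annotated UDF method could simply delegate to it.

```java
public final class DescriptionExtractor {
    private DescriptionExtractor() {}

    /**
     * Returns the value that follows tag (e.g. "species=") up to the next '.',
     * or null when the tag does not occur at the start of a field.
     */
    public static String extract(String description, String tag) {
        if (description == null || tag == null || tag.isEmpty()) {
            return null;
        }
        int from = 0;
        while (true) {
            int idx = description.indexOf(tag, from); // plain substring search, no regex
            if (idx < 0) {
                return null; // tag not present
            }
            // Accept only matches at the start of a field, so "species=" does not match "subspecies="
            if (idx == 0 || description.charAt(idx - 1) == '.') {
                int start = idx + tag.length();
                int end = description.indexOf('.', start);
                return end < 0 ? description.substring(start) : description.substring(start, end);
            }
            from = idx + 1;
        }
    }

    public static void main(String[] args) {
        String d = "species=dog.sex=male.color=blonde.age=10.";
        System.out.println(extract(d, "species=")); // dog
        System.out.println(extract(d, "age="));     // 10
        System.out.println(extract(d, "name="));    // null
    }
}
```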

j2objc - exception when deserialize enum

I am having trouble deserializing objects that contain an enum. The object serializes without complaint, but I get an InvalidObjectException when I deserialize the object. The exception message says that there is "No enum constant com.mypackagname."
I have isolated and reproduced the problem by creating some test code based on the testSerialization() method in SerializationTest.java.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class SerializationTest {
    private static final String TEST_FILE_NAME = "serialization-test.bin";
    public enum Gender { MALE, FEMALE }
    public void testEnumSerialization() throws IOException, ClassNotFoundException {
        Gender gender = Gender.MALE;
        // Save the enum to a file.
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(TEST_FILE_NAME));
        out.writeObject(gender);
        out.close();
        // Read back the enum.
        ObjectInputStream in = new ObjectInputStream(new FileInputStream(TEST_FILE_NAME));
        Gender gender2 = (Gender) in.readObject();
        in.close();
    }
}
I have discovered that if I add a string value to the enum initialization in the generated Objective-C code, deserialization works fine. The resulting initialize method in Obj-C looks like this:
+ (void)initialize {
  if (self == [SerializationTest_Gender class]) {
    JreEnum(SerializationTest_Gender, MALE) = new_SerializationTest_Gender_initWithNSString_withInt_(@"MALE", 0);
    JreEnum(SerializationTest_Gender, FEMALE) = new_SerializationTest_Gender_initWithNSString_withInt_(@"FEMALE", 1);
    J2OBJC_SET_INITIALIZED(SerializationTest_Gender)
  }
}
Note that I added the @"MALE" and @"FEMALE"; the default j2objc output is @"".
I have two questions. (1) Is this the correct way to enable a round trip serialization/deserialization of enums? (2) If so, is there a way to have j2objc automatically populate the string constants in the enum rather than coding them by hand?
Thanks for any help you can provide.

We probably broke this with a recent change eliminating redundant enum constant name strings. We had the name defined both in the enum's class initializer and in its metadata, plus we had an important request to stop making enum constants easily discovered in app binaries (apparently tech writers have been known to dump early access binaries and run strings on them to get scoops on any new features). Now the constant name is only in the metadata (no redundancy), and if an app builds with --strip-reflection, the enum has no metadata and the name becomes the enum class plus the constant's ordinal. However, serialization support was overlooked since Google apps use protocol buffers instead (faster and less version-sensitive).
Thanks for the excellent test case, which will make it easier to fix. Please file a bug if you want to be notified when this is fixed.
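On the Java side, the serialization specification writes only an enum constant's name and resolves it on reading with Enum.valueOf, which is why an empty name string in the generated code breaks the round trip. The following minimal sketch runs on a plain JVM (it does not involve j2objc) and uses in-memory streams instead of the test file. The round trip returns the same constant, and Enum.valueOf with an empty name throws IllegalArgumentException, which is consistent with the "No enum constant ..." message in the question. EnumRoundTrip is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class EnumRoundTrip {
    public enum Gender { MALE, FEMALE }

    /** Serializes value to an in-memory buffer and reads it back. */
    public static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value); // for an enum, only its class and constant name are written
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject(); // the constant is looked up again by name
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    /** True if looking up a constant by an empty name fails, as an empty generated name would. */
    public static boolean emptyNameFails() {
        try {
            Enum.valueOf(Gender.class, "");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Gender.MALE) == Gender.MALE); // true
        System.out.println(emptyNameFails());                      // true
    }
}
```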

Get all hostnames for an IP address in the network

I have a requirement where an IP address can have multiple hostnames mapped to it. I tried InetAddress.getAllByName("10.33.28.55"), but it did not give the desired result: it returned just one entry, whereas nslookup on the IP address returns all DNS entries. How do I retrieve all the hostnames associated with this IP address in Java?

Looking at the source code for InetAddress.getAllByName(), you find that it doesn't actually do a DNS query if the provided String is the textual representation of an IP address. It simply returns an array containing a single InetAddress object holding the IP. They even put a handy comment right in the method:
// if host is an IP address, we won't do further lookup
(See: http://javasourcecode.org/html/open-source/jdk/jdk-6u23/java.net/InetAddress.java.html)
If only the JavaDoc were so clear. It states "If a literal IP address is supplied, only the validity of the address format is checked." ... I would argue that doesn't tell you that it isn't going to be looked up.
Thinking about it, however, it makes sense in the context of InetAddress: the class encapsulates an IP address, of which you only have one. It really needs getHostNames() and getAllCanonicalNames() methods (note the plurality) that would do what you are asking. I'm thinking of opening an issue / submitting a patch.
That said, it appears there's currently no built-in method of doing an rDNS query where multiple PTR records are supported. All the other lookup methods simply take the first record returned, and that's what you get.
You're going to have to look into third-party DNS libraries for Java (sorry, I don't have experience with any of them).
Edit to add: I like figuring things out. I do not have an IP handy that has multiple PTR records to test this against, but the following should do the trick.
import java.io.IOException;
import java.util.Properties;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.InitialDirContext;
public class App
{
public static void main(String[] args) throws IOException, NamingException
{
Properties env = new Properties();
env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
InitialDirContext idc = new InitialDirContext(env);
String ipAddr = "74.125.225.196";
// Turn the IP into an in-addr.arpa name
// 196.225.125.74.in-addr.arpa.
String[] quads = ipAddr.split("\\.");
StringBuilder sb = new StringBuilder();
for (int i = quads.length - 1; i >= 0; i--)
{
sb.append(quads[i]).append(".");
}
sb.append("in-addr.arpa.");
ipAddr = sb.toString();
Attributes attrs = idc.getAttributes(ipAddr, new String[] {"PTR"});
Attribute attr = attrs.get("PTR");
if (attr != null)
{
for (int i = 0; i < attr.size(); i++)
{
System.out.println((String)attr.get(i));
}
}
}
}
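The string-reversal loop above only handles dotted IPv4 text. The following minimal sketch of a helper also builds the ip6.arpa name for IPv6, assuming the input is an IP literal. It relies on the behaviour described earlier (no DNS lookup happens for a literal) and works from the raw address bytes; note that if a hostname were passed instead, InetAddress.getByName would resolve it. ReverseDnsName is a hypothetical name; its result has the same form as the ipAddr string built above, so it could be passed to idc.getAttributes(..., new String[] {"PTR"}) unchanged.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ReverseDnsName {
    /** Builds the PTR query name (in-addr.arpa or ip6.arpa) for an IP literal. */
    public static String reverseName(String ipLiteral) {
        byte[] bytes;
        try {
            // For a literal, only the format is checked; no DNS query is made
            bytes = InetAddress.getByName(ipLiteral).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
        StringBuilder sb = new StringBuilder();
        if (bytes.length == 4) {
            // IPv4: octets in reverse order
            for (int i = 3; i >= 0; i--) {
                sb.append(bytes[i] & 0xFF).append('.');
            }
            return sb.append("in-addr.arpa.").toString();
        }
        // IPv6: one label per hex nibble, least significant nibble first
        for (int i = bytes.length - 1; i >= 0; i--) {
            int b = bytes[i] & 0xFF;
            sb.append(Character.forDigit(b & 0x0F, 16)).append('.');
            sb.append(Character.forDigit(b >> 4, 16)).append('.');
        }
        return sb.append("ip6.arpa.").toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseName("74.125.225.196")); // 196.225.125.74.in-addr.arpa.
        System.out.println(reverseName("2001:db8::1"));
    }
}
```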

Well, there is only one good way: call nslookup, dig or a similar tool from the Java process, with Runtime.getRuntime().exec(...) or, better, with ProcessBuilder.
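A minimal sketch of the external-tool approach with ProcessBuilder, assuming the dig tool is installed and on the PATH (nslookup output would need more parsing): dig +short -x prints one PTR target per line. DigPtr and its methods are hypothetical names. Standard error is inherited by the Java process so that a full error pipe cannot block the child.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DigPtr {
    /** Command line for a reverse lookup; assumes dig is on the PATH. */
    public static List<String> command(String ip) {
        return Arrays.asList("dig", "+short", "-x", ip);
    }

    /** Runs dig and returns each non-empty output line (one PTR target per line). */
    public static List<String> ptrNames(String ip) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(ip));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // stderr goes to this process's stderr
        Process p = pb.start();
        List<String> names = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    names.add(line.trim());
                }
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("dig exited with status " + exit);
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        String ip = args.length > 0 ? args[0] : "74.125.225.196";
        try {
            System.out.println(ptrNames(ip));
        } catch (IOException e) {
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```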

This answer might be helpful: https://stackoverflow.com/a/24205035/8026752
Using the lookupAllHostAddr method of DNSNameService (sun.net.spi.nameservice.dns.DNSNameService, an internal JDK 8 class that was removed in Java 9) works for me and returns all IP addresses for a hostname. Maybe it will also help with finding all hostnames for an IP address, but that seems to depend on the DNS server configuration. In my case I couldn't even find all hostnames using nslookup, so I couldn't test it and I'm not sure about this solution.
Note that lookupAllHostAddr is not a static method, so you have to call it on an instance:
InetAddress[] ipAddress = new DNSNameService().lookupAllHostAddr("hostname");
Also, this link could be interesting (it comes from the same answer thread mentioned above; I just summarize it a bit):
https://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html
On the linked page you can find properties that disable lookup caching:
sun.net.inetaddr.ttl - add it to the JVM start command line, e.g. -Dsun.net.inetaddr.ttl=0 (0 means the hostname is cached for 0 seconds)
networkaddress.cache.ttl - set the desired value in the java.security file located at %JRE%\lib\security
More information can also be found here:
http://www.rgagnon.com/javadetails/java-0445.html
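The networkaddress.cache.ttl property mentioned above is a security property, so it can also be set from code with java.security.Security.setProperty instead of editing java.security. The following is a minimal sketch, assuming it runs at application startup: the JDK reads the value when its address cache policy is first initialized, so it should be set before the first name lookup. networkaddress.cache.negative.ttl covers failed lookups. DnsCacheConfig is a hypothetical name.

```java
import java.security.Security;

public class DnsCacheConfig {
    /** Disables caching of successful and failed lookups; call before the first name lookup. */
    public static void disableDnsCaching() {
        Security.setProperty("networkaddress.cache.ttl", "0");          // successful lookups
        Security.setProperty("networkaddress.cache.negative.ttl", "0"); // failed lookups
    }

    public static void main(String[] args) {
        disableDnsCaching();
        System.out.println("networkaddress.cache.ttl = " + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```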

Making A Javascript Function Available To Java Code

Problem Description
A somewhat contrived example to illustrate my question. Imagine we have some library of javascript functions that is already maintained and updated daily by an army of frontend devs. To be specific, imagine one such function looks like this:
function employeesForStore(store) {
var dictionary = {
"downtown": ["Joe", "Mary", "Steve"],
"uptown": ["Jules", "Vincent", "Matt"],
// and so on for hundreds of locations
};
return dictionary[store];
}
NOTE: Please ignore the details of this function's implementation. The actual function may be far more complex than simple JSON dictionary lookups; assume we don't know any implementation details of the js function. All we know is that it takes a String argument and returns an array of Strings.
Now we would like to take advantage of this function in our Java code. That is, in our Java code, we'd like to "load" this function, and then be able to call it multiple times, passing it String args and receiving String[] or ArrayList<String> results.
From searching SO and google so far, I understand that this will involve using:
javax.script.ScriptEngineManager
javax.script.ScriptEngine
and perhaps scriptEngine.getContext() for passing values into the function and receiving results.
I am a bit hazy on the details of the above, especially since most examples I've found involve running javascript code a single time, rather than making javascript function available to Java.
Example Code I'd Like To See
1. Assuming the js function is in the file "my_functions.js", load that file into Java so all of its functions will be available for use.
2. Call employeesForStore("downtown") and store its results in a native Java String[] or List<String> in a variable called downtownResults.
3. Same as 2, but call employeesForStore("uptown") and store the results in a variable called uptownResults.

Create an interface to act as a facade to your JavaScript code.
Here is an example using the Rhino implementation embedded in Oracle's Java 1.7 implementation:
package demo;
import java.io.*; import java.util.*;
import java.util.concurrent.atomic.AtomicReference;
import javax.script.*;
public class StoreData {
public static interface Stores {
public String[] employees(String store);
}
public static Stores stores() throws IOException, ScriptException {
ScriptEngineManager sem = new ScriptEngineManager();
ScriptEngine engine = sem.getEngineByName("JavaScript");
AtomicReference<Stores> ref = new AtomicReference<>();
engine.put("ref", ref);
String adapt = "ref.set("
+ "new Packages.demo.StoreData.Stores({employees:employeesForStore})"
+ ");";
try (Reader myFns = new FileReader("my_functions.js")) { // TODO encoding
engine.eval(myFns);
engine.eval(adapt);
return ref.get();
}
}
public static void main(String[] args) throws IOException, ScriptException {
List<String> employees = Arrays.asList(stores().employees("uptown"));
System.out.println(employees);
}
}
By specifying an interface we let Rhino coerce the JavaScript types to Java types (String, String[], etc.)
The JRE spec makes no guarantees about what scripting engines should be provided so it may be wise to rely on an external engine. I don't know if Nashorn will change this.
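Because the JRE doesn't guarantee a JavaScript engine (Nashorn, the engine bundled with Java 8 through 14, was removed in Java 15), it can help to fail early with a clear message. A minimal sketch using only javax.script: getEngineByName returns null when no engine is registered under that name, and getEngineFactories lists what is available. EngineCheck is a hypothetical name; stores() in the answer could call requireJavaScript() instead of calling getEngineByName directly.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

public class EngineCheck {
    /** Describes every script engine the ScriptEngineManager can discover. */
    public static List<String> availableEngines() {
        List<String> engines = new ArrayList<>();
        for (ScriptEngineFactory f : new ScriptEngineManager().getEngineFactories()) {
            engines.add(f.getEngineName() + " " + f.getEngineVersion() + " " + f.getNames());
        }
        return engines;
    }

    /** Returns a JavaScript engine or fails with a message listing what is available. */
    public static ScriptEngine requireJavaScript() {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) { // no engine registered under that name
            throw new IllegalStateException("No JavaScript engine found; available: " + availableEngines());
        }
        return engine;
    }

    public static void main(String[] args) {
        System.out.println("Engines: " + availableEngines());
        try {
            System.out.println("Using " + requireJavaScript().getFactory().getEngineName());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```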

You can use the Rhino API to execute JS code in Java.
This tutorial covers the examples requested.

Get declared methods in order they appear in source code

The situation seems abnormal, but I was asked to build a serializer that turns an object into a string by concatenating the results of its "get" methods. The values should appear in the same order as their "get" methods are declared in the source file.
So, for example, we have
class TestBean1 {
    public String getValue1() {
        return "value1";
    }
    public String getValue2() {
        return "value2";
    }
}
The result should be:
"value1 - value2"
And not
"value2 - value1"
According to the documentation, it can't be done with the Class object. But I wonder whether this information can be found in the "*.class" file, or is it lost? If such data exists, does anyone know a ready-to-use tool for that purpose? If it can't be found, please suggest the most professional way of achieving the goal. I thought about adding some kind of custom annotation to the getters of the classes that should be serialized.

If you want that, you have to parse the source code, not the bytecode.
There are a number of libraries that parse a source file into a node tree; my favorite is javaparser (hosted at code.google.com), which, in a slightly modified version, is also used by Spring Roo.
On the usage page you can find some samples. Basically, you will want to use a visitor that handles MethodDeclaration nodes.

Although reflection no longer (as of Java 7, I think) gives you the methods in the order in which they appear in the source code, the class file still appears (as of Java 8) to contain the methods in source order.
So you can parse the class file looking for method names and then sort the methods by the file offset at which each method was found.
If you want to do it in a less hacky way, you can use Javassist, which gives you the line number of each declared method, so you can sort the methods by line number.
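The class-file approach can be sketched with the standard library alone. The following reads the .class resource with DataInputStream, walks the constant pool (tag layout per chapter 4 of the JVM specification), skips the fields and returns method names in the order they are stored. That javac stores methods in source order is observed behaviour, not something the JVM specification guarantees, so treat the result with the same caution as the answer does. ClassFileMethodOrder, Bean and getterNames are hypothetical names, and the class must have a .class resource reachable through its class loader.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ClassFileMethodOrder {

    /** Hypothetical bean: getters deliberately not in alphabetical order. */
    public static class Bean {
        public String getName() { return "name"; }
        public int getAge() { return 42; }
    }

    /** Method names in the order they are stored in c's .class file. */
    public static List<String> methodNames(Class<?> c) throws IOException {
        String resource = "/" + c.getName().replace('.', '/') + ".class";
        try (InputStream raw = c.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("No .class resource for " + c.getName());
            }
            DataInputStream in = new DataInputStream(raw);
            in.readInt();           // magic (0xCAFEBABE)
            in.readUnsignedShort(); // minor_version
            in.readUnsignedShort(); // major_version
            int cpCount = in.readUnsignedShort();
            String[] utf8 = new String[cpCount];
            for (int i = 1; i < cpCount; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;         // Utf8: u2 length + modified UTF-8
                    case 3: case 4: in.readInt(); break;           // Integer, Float
                    case 5: case 6: in.readLong(); i++; break;     // Long, Double take two slots
                    case 7: case 8: case 16: case 19: case 20:
                        in.readUnsignedShort(); break;             // one u2 index
                    case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readInt(); break;                       // two u2 indexes
                    case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                    default: throw new IOException("Unknown constant pool tag " + tag);
                }
            }
            in.readUnsignedShort(); // access_flags
            in.readUnsignedShort(); // this_class
            in.readUnsignedShort(); // super_class
            int interfaceCount = in.readUnsignedShort();
            for (int i = 0; i < interfaceCount; i++) {
                in.readUnsignedShort();
            }
            int fieldCount = in.readUnsignedShort();
            for (int i = 0; i < fieldCount; i++) {
                in.readUnsignedShort(); in.readUnsignedShort(); in.readUnsignedShort(); // flags, name, descriptor
                skipAttributes(in);
            }
            int methodCount = in.readUnsignedShort();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < methodCount; i++) {
                in.readUnsignedShort();                  // access_flags
                names.add(utf8[in.readUnsignedShort()]); // name_index
                in.readUnsignedShort();                  // descriptor_index
                skipAttributes(in);
            }
            return names;
        }
    }

    private static void skipAttributes(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            in.readUnsignedShort();               // attribute_name_index
            in.readFully(new byte[in.readInt()]); // attribute_length bytes of payload
        }
    }

    /** Names starting with "get", in class-file order. */
    public static List<String> getterNames(Class<?> c) {
        try {
            List<String> getters = new ArrayList<>();
            for (String name : methodNames(c)) {
                if (name.startsWith("get")) {
                    getters.add(name);
                }
            }
            return getters;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getterNames(Bean.class));
    }
}
```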

I don't think the information is retained.
JAXB, for example, has @XmlType(propOrder = {"field1", "field2"}), where you define the order of the fields when they are serialized to XML. You can implement something similar.

Edit: This works only on concrete classes (the class to inspect must have its own .class file). I changed the code below to reflect this. This limitation remains until I dig deeper into the ClassFileAnalyzer library to work with classes directly instead of reading them from a temporary file.
The following approach works for me:
Download and import the following library: ClassFileAnalyzer
Add the following two static methods (attention: getClassDump() needs a small modification to write the class file out to a temporary file; I removed my code here because it's very specific at this point):
public static String getClassDump(Class<?> c) throws Exception {
    String classFileName = c.getSimpleName() + ".class";
    URL resource = c.getResource(classFileName);
    if (resource == null) {
        throw new RuntimeException("Works only for concrete classes!");
    }
    String absolutePath = ...; // write to temp file and get absolute path
    ClassFile classFile = new ClassFile(absolutePath);
    classFile.parse();
    Info infos = new Info(classFile, absolutePath);
    StringBuffer infoBuffer = infos.getInfos();
    return infoBuffer.toString();
}
public static <S extends List<Method>> S sortMethodsBySourceOrder(Class<?> c, S methods) throws Exception {
    String classDump = getClassDump(c);
    int index = classDump.indexOf("constant_pool_count:");
    final String dump = classDump.substring(index);
    final String lineSeparator = System.lineSeparator(); // assumes the dump uses the platform line separator
    Collections.sort(methods, new Comparator<Method>() {
        public int compare(Method o1, Method o2) {
            Integer i1 = Integer.valueOf(dump.indexOf(" " + o1.getName() + lineSeparator));
            Integer i2 = Integer.valueOf(dump.indexOf(" " + o2.getName() + lineSeparator));
            return i1.compareTo(i2);
        }
    });
    return methods;
}
Now you can call sortMethodsBySourceOrder with any List of methods (a List, because sorting arrays is not very convenient) and you will get the list back sorted.
It works by looking at the class dump's constant pool, which in turn is produced by the library.
Greetz,
GHad

Write your own custom annotation to store ordering data, then read it with Method.getAnnotation(Class annotationClass).
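A minimal sketch of the annotation approach, which the question also suggests: a runtime-retained annotation carries each getter's position, and the serializer sorts the annotated getters by it, so the output no longer depends on the order in which reflection returns methods. The Order annotation, OrderedSerializer and serialize are hypothetical names; getValue2 is declared first on purpose to show that the annotation, not the declaration order, decides.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.StringJoiner;

public class OrderedSerializer {

    /** Hypothetical annotation: getters with lower values are serialized first. */
    @Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at run time
    @Target(ElementType.METHOD)
    public @interface Order {
        int value();
    }

    /** Hypothetical bean; getValue2 is declared first on purpose. */
    public static class TestBean1 {
        @Order(2) public String getValue2() { return "value2"; }
        @Order(1) public String getValue1() { return "value1"; }
    }

    /** Joins the results of the @Order-annotated no-arg public methods with " - ". */
    public static String serialize(Object bean) {
        List<Method> getters = new ArrayList<>();
        for (Method m : bean.getClass().getMethods()) {
            if (m.getAnnotation(Order.class) != null && m.getParameterCount() == 0) {
                getters.add(m);
            }
        }
        getters.sort(Comparator.comparingInt(m -> m.getAnnotation(Order.class).value()));
        StringJoiner out = new StringJoiner(" - ");
        for (Method m : getters) {
            try {
                out.add(String.valueOf(m.invoke(bean)));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot call " + m, e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(serialize(new TestBean1())); // value1 - value2
    }
}
```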
