I have a data set like this:
Result categoricF1 categoricF2 categoricF3
N red a apple
P green b banana
....
I then want to convert each element in each column into a bit representation.
For example, red will be 10000 and green will be 01000, and I will store 10000 in a BigInteger array. I will do the same for each element in the data set.
What is the best way to load the data in this case (DataFrame, Dataset, or RDD)?
I need code in Java. Thanks for helping.
Spark Datasets are similar to RDDs; however, instead of using Java serialization or Kryo, they use a specialized Encoder to serialize the objects for processing or transmitting over the network. While both encoders and standard serialization are responsible for turning an object into bytes, encoders are generated dynamically as code and use a format that allows Spark to perform many operations like filtering, sorting and hashing without deserializing the bytes back into an object.
For example, suppose you have a bean class ClassName with one field for each column in your data:
import java.io.Serializable;
public class ClassName implements Serializable {
private String result;
private String categoricF1;
private String categoricF2;
private String categoricF3;
public String getResult() {
return result;
}
public String getCategoricF1() {
return categoricF1;
}
public String getCategoricF2() {
return categoricF2;
}
public String getCategoricF3() {
return categoricF3;
}
public void setResult(String result) {
this.result = result;
}
public void setCategoricF1(String categoricF1) {
this.categoricF1 = categoricF1;
}
public void setCategoricF2(String categoricF2) {
this.categoricF2 = categoricF2;
}
public void setCategoricF3(String categoricF3) {
this.categoricF3 = categoricF3;
}
}
Then you can create a Dataset of the required data like this:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import java.util.ArrayList;
import java.util.List;
public class Test {
public static void main(String[] args) {
SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL basic example")
.master("local")
.getOrCreate();
// Create an instance of a Bean class
ClassName elem1 = new ClassName();
elem1.setResult("N");
elem1.setCategoricF1("red");
elem1.setCategoricF2("a");
elem1.setCategoricF3("apple");
ClassName elem2 = new ClassName();
elem2.setResult("P");
elem2.setCategoricF1("green");
elem2.setCategoricF2("b");
elem2.setCategoricF3("banana");
List<ClassName> obj = new ArrayList<>();
obj.add(elem1);
obj.add(elem2);
// Encoders are created for Java beans
Encoder<ClassName> classNameEncoder = Encoders.bean(ClassName.class);
Dataset<ClassName> javaBeanDS = spark.createDataset(obj, classNameEncoder);
javaBeanDS.show();
}
}
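The bit encoding described in the question can be sketched in plain Java, independent of how the rows are loaded. This is only a minimal sketch under stated assumptions: the class OneHotSketch and its methods buildIndex, encode and toBits are hypothetical names, each distinct value gets its position in order of first appearance, and the first category gets the leftmost bit (so that red becomes 10000 when a column has five categories). In a Spark job the index would have to be built from the distinct values of the whole column before encoding.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OneHotSketch {

    // Assigns each distinct value an index in order of first appearance.
    public static Map<String, Integer> buildIndex(List<String> column) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String value : column) {
            index.putIfAbsent(value, index.size());
        }
        return index;
    }

    // Encodes a value as a BigInteger with exactly one bit set;
    // the category with index 0 gets the leftmost of `width` bits.
    public static BigInteger encode(String value, Map<String, Integer> index, int width) {
        return BigInteger.ONE.shiftLeft(width - 1 - index.get(value));
    }

    // Renders the encoded value as a zero-padded binary string of length `width`.
    public static String toBits(BigInteger encoded, int width) {
        StringBuilder bits = new StringBuilder(encoded.toString(2));
        while (bits.length() < width) {
            bits.insert(0, '0');
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // Hypothetical column values; only "red" and "green" come from the question.
        List<String> colors = Arrays.asList("red", "green", "blue", "red", "white", "black");
        Map<String, Integer> index = buildIndex(colors);
        int width = index.size(); // five distinct categories

        BigInteger[] encoded = new BigInteger[colors.size()];
        for (int i = 0; i < colors.size(); i++) {
            encoded[i] = encode(colors.get(i), index, width);
        }

        System.out.println(toBits(encoded[0], width)); // red   -> 10000
        System.out.println(toBits(encoded[1], width)); // green -> 01000
    }
}
```

The same encode step could be applied to each bean property once the per-column index is known; whether that happens in a Dataset map or on collected data depends on the size of the data.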
I'm trying to parse the value of a static variable from a Java file, but I can't get it.
I've used JavaParser to parse the code and fetch the values of the variables. I can fetch every other class-level variable and its value, but not the static field.
The Java File looks like ...
public class ABC {
public String variable1 = "Hello How are you?";
public boolean variable2 = false;
public static String variable3;
static{
variable3 = new String("Want to Fetch this...");
}
//Can't change this file, this is input.
public static void main(String args[]){
//....Other Code
}
}
I'm able to parse all the variable values except "variable3". The Java file looks like the code above, and I need to parse the value of "variable3".
I've written the code below to parse the class-level variables:
import java.util.HashMap;
import java.util.List;
import com.github.javaparser.ast.body.FieldDeclaration;
import com.github.javaparser.ast.body.VariableDeclarator;
import com.github.javaparser.ast.expr.VariableDeclarationExpr;
import com.github.javaparser.ast.visitor.VoidVisitorAdapter;
public class StaticCollector extends
VoidVisitorAdapter<HashMap<String, String>> {
@Override
public void visit(FieldDeclaration n, HashMap<String, String> arg) {
// TODO Auto-generated method stub
List <VariableDeclarator> myVars = n.getVariables();
for (VariableDeclarator vars: myVars){
vars.getInitializer().ifPresent(initValue -> System.out.println(initValue.toString()));
//System.out.println("Variable Name: "+vars.getNameAsString());
}
}
}
Main Method ...
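import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import com.github.javaparser.JavaParser;
import com.github.javaparser.ast.CompilationUnit;
public class Test {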
public static void main(String[] args) {
File file = new File("filePath");
CompilationUnit compilationUnit = null;
try {
compilationUnit = JavaParser.parse(file);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
HashMap<String, String> collector = new HashMap<String, String>();
compilationUnit.accept(new StaticCollector(), collector);
}
}
How can I parse the value of "variable3", which is static and assigned inside a static block? There might be other variables in the code, but I need to find the value of one particular variable (in this case variable3).
Am I doing something wrong, or do I need some other approach? Please suggest.
Inspecting the AST in an easily readable form, e.g., a DOT (GraphViz) image rendered with PlantUML, is a huge help in solving this kind of problem. See this blog on how to generate the DOT as well as other formats.
Here's the overview, with the "variable3" nodes highlighted (I just searched for it in the .dot output and put a fill color). You'll see that there are TWO spots where it occurs:
Zooming in on the node space on the right, we can see that the second sub-tree is under an InitializerDeclaration. Further down, it's part of an AssignExpr where the value is an ObjectCreationExpr:
So, I adapted your Visitor (it's an inner class to make the module self contained) and you need to override the visit(InitializerDeclaration n... method to get to where you want:
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.FieldDeclaration;
import com.github.javaparser.ast.body.InitializerDeclaration;
import com.github.javaparser.ast.body.VariableDeclarator;
import com.github.javaparser.ast.stmt.Statement;
import com.github.javaparser.ast.visitor.VoidVisitorAdapter;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.List;
public class Test {
public static void main(String[] args) {
File file = new File("src/main/java/ABC.java");
CompilationUnit compilationUnit = null;
try {
compilationUnit = StaticJavaParser.parse(file);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
HashMap<String, String> collector = new HashMap<String, String>();
compilationUnit.accept(new StaticCollector(), collector);
}
private static class StaticCollector extends
VoidVisitorAdapter<HashMap<String, String>> {
@Override
public void visit(FieldDeclaration n, HashMap<String, String> arg) {
List<VariableDeclarator> myVars = n.getVariables();
for (VariableDeclarator vars: myVars){
vars.getInitializer().ifPresent(initValue -> System.out.println(initValue.toString()));
//System.out.println("Variable Name: "+vars.getNameAsString());
}
}
@Override
public void visit(InitializerDeclaration n, HashMap<String, String> arg) {
List<Statement> myStatements = n.getBody().getStatements();
for (Statement s: myStatements) {
s.ifExpressionStmt(expressionStmt -> expressionStmt.getExpression()
.ifAssignExpr(assignExpr -> System.out.println(assignExpr.getValue())));
}
}
}
}
Here's the output showing additionally variable3's initialization in the static block:
"Hello How are you?"
false
new String("Want to Fetch this...")
I have an existing object that I want to serialize in MongoDB using Java + the POJO codec. For some reason the driver tries to create an instance of an enum instead of using valueOf:
org.bson.codecs.configuration.CodecConfigurationException: Failed to decode 'phase'. Failed to decode 'value'. Cannot find a public constructor for 'SimplePhaseEnumType'.
at org.bson.codecs.pojo.PojoCodecImpl.decodePropertyModel(PojoCodecImpl.java:192)
at org.bson.codecs.pojo.PojoCodecImpl.decodeProperties(PojoCodecImpl.java:168)
at org.bson.codecs.pojo.PojoCodecImpl.decode(PojoCodecImpl.java:122)
at org.bson.codecs.pojo.PojoCodecImpl.decode(PojoCodecImpl.java:126)
at com.mongodb.operation.CommandResultArrayCodec.decode(CommandResultArrayCodec.java:52)
The enumeration:
public enum SimplePhaseEnumType {
PROPOSED("Proposed"),
INTERIM("Interim"),
MODIFIED("Modified"),
ASSIGNED("Assigned");
private final String value;
SimplePhaseEnumType(String v) {
value = v;
}
public String value() {
return value;
}
public static SimplePhaseEnumType fromValue(String v) {
for (SimplePhaseEnumType c: SimplePhaseEnumType.values()) {
if (c.value.equals(v)) {
return c;
}
}
throw new IllegalArgumentException(v);
}}
And the class that uses the enumeration (only showing the relevant fields):
public class SpecificPhaseType {
protected SimplePhaseEnumType value;
protected String date;
public SimplePhaseEnumType getValue() {
return value;
}
public void setValue(SimplePhaseEnumType value) {
this.value = value;
}}
I was looking for a way to maybe annotate the class to tell the driver to use a different method to serialize / deserialize those fields when they are encountered. I know how to skip them during the serialization / deserialization but that doesn't fix the problem:
public class SpecificPhaseType {
@BsonIgnore
protected SimplePhaseEnumType value;
Any help on where I could look (code, documentation)? I already checked PojoQuickTour.java, MongoDB Driver Quick Start - POJOs, and POJOs - Plain Old Java Objects.
Thanks!
--Jose
I figured out what to do. You first need to write a custom Codec to read and write the enum as a String (an ordinal is another option if you want to save space, but a string was more than OK for me):
package com.kodegeek.cvebrowser.persistence.serializers;
import com.kodegeek.cvebrowser.entity.SimplePhaseEnumType;
import org.bson.BsonReader;
import org.bson.BsonWriter;
import org.bson.codecs.Codec;
import org.bson.codecs.DecoderContext;
import org.bson.codecs.EncoderContext;
public class SimplePhaseEnumTypeCodec implements Codec<SimplePhaseEnumType>{
@Override
public SimplePhaseEnumType decode(BsonReader reader, DecoderContext decoderContext) {
return SimplePhaseEnumType.fromValue(reader.readString());
}
@Override
public void encode(BsonWriter writer, SimplePhaseEnumType value, EncoderContext encoderContext) {
writer.writeString(value.value());
}
@Override
public Class<SimplePhaseEnumType> getEncoderClass() {
return SimplePhaseEnumType.class;
}
}
Then you need to register the new codec so MongoDB can handle the enum using your class:
/**
* MongoDB could not make this any simpler ;-)
* @return a Codec registry
*/
public static CodecRegistry getCodecRegistry() {
final CodecRegistry defaultCodecRegistry = MongoClient.getDefaultCodecRegistry();
final CodecProvider pojoCodecProvider = PojoCodecProvider.builder().register(packages).build();
final CodecRegistry cvePojoCodecRegistry = CodecRegistries.fromProviders(pojoCodecProvider);
final CodecRegistry customEnumCodecs = CodecRegistries.fromCodecs(
new SimplePhaseEnumTypeCodec(),
new StatusEnumTypeCodec(),
new TypeEnumTypeCodec()
);
return CodecRegistries.fromRegistries(defaultCodecRegistry, customEnumCodecs, cvePojoCodecRegistry);
}
Jackson makes it easier to register a custom serializer / deserializer with annotations like @JsonSerialize / @JsonDeserialize, while Mongo forces you to deal with the registry. Not a big deal :-)
You can peek at the full source code here. Hope this saves some time to anyone who has to deal with a similar issue.
I have a Java file named E2BXmlParser in which I read and manipulate XML data fetched from the database.
Now I am trying to execute the Java file from Oracle SQL Developer after changing the file like this:
CREATE AND COMPILE JAVA SOURCE NAMED "E2BXmlParser" AS
--(Rest of Code).
The rest of the code looks like this:
import oracle.jdbc.*;
import oracle.xdb.XMLType;
import oracle.xml.parser.v2.XMLDocument;
import oracle.jdbc.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.*;
import java.sql.Connection;
import java.util.*;
import javax.xml.xpath.*;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.StringReader;
class Reaction {
}
public class E2BXmlParser {
//variables
public E2BXmlParser(int regReportId, int reportId) {
//connection
}
public static void parseXML(int regReportId, int reportId, int isBlinded, int reportFormid,int pi_is_r3_profile,int pi_max_length,String pi_risk_category) throws SQLException, XPathExpressionException, TransformerException {
//fetching data
}
private static Document getDocumentFromString(String xmlContent) throws Exception {
}
private String getStringByElementName(String tagName, Element element) {
}
private OracleConnection getConnecton() {
//oracle connection
}
private Document getXmlDocumentFromDb(int regReportId, int reportId) {
//fetching and manipulating data
}
private List<Reaction> getReactionIds() {
//logic
}
private void findById(Reaction reaction, String id) {
//xpath for finding nodes
}
private boolean checkNodeExists(Element el, String nodeName) {
NodeList list = el.getElementsByTagName(nodeName);
return list.getLength() > 0;
}
private void updateNode(Reaction reaction, Element el) {
//update xml
}
private void updateXmlInDB(int regReportId, int reportId) throws SQLException {
//update xml in db
}
private void updateDrugNode() {
Element rootElement = document.getDocumentElement();
//logic
}
private void updateDrugEventandDrugRelatedness(int reportFormid) {
//update xml
}
private void updateMedicinalActiveSubstance(int regReportId, int isBlinded, int reportFormid,int pi_is_r3_profile,int pi_max_length,String pi_risk_category) {
//update xml after fetching data and changing in DB
}
private Boolean compareStrings(String strOne, String strTwo) {
//logic
}
private void updateDosageInformation() {
//logic
}
private void updateActiveSubstanceName() {
//updating activesubstance using xpath
}
private void RemoveDuplicateActiveSubstance(NodeList activesubstancenameList, List<String> names) {
// logic
}
}
Now SQL Developer asks for multiple values (reactions, nodelist, node) that are used in the code.
This does not happen when I load the Java file from the command line like this:
loadjava -user username/password@DBalias -r E2BXmlParser.java
P.S. I have to rename my E2BXmlParser.java file to E2BXmlParser.sql so that I can execute it from Oracle SQL Developer.
Please help.
The easiest solution is to wrap all the logic of your class in one static method of the class. You then have to publish this method to PL/SQL.
The publication of the static method will look (more or less) like this:
CREATE PROCEDURE parseXML (regReportId NUMBER, reportId NUMBER, isBlinded NUMBER, reportFormid NUMBER, pi_is_r3_profile NUMBER, pi_max_length NUMBER, pi_risk_category varchar2)
AS LANGUAGE JAVA
NAME 'E2BXmlParser.parseXML(int, int, int, int, int, int, java.lang.String)';
Note: in the call specification you have to use fully qualified Java type names, for example java.lang.String instead of String.
Of course, Oracle also allows you to use a Java class in a more object-oriented way, but this is more complicated.
For more information, check this manual: https://docs.oracle.com/cd/E18283_01/java.112/e10588/toc.htm
Chapter 3 (Calling Java Methods in Oracle Database) covers the basic solutions.
Chapter 6 (Publishing Java Classes With Call Specifications), section "Writing Object Type Call Specifications", covers publishing a full Java class.
I have a class to be saved into the App Engine datastore which contains, among others, a Text field (a String-like App Engine datatype, but not limited to 500 chars). I also have a twin class which is basically the same, but is used on the client side (i.e. without any com.google.appengine.api.datastore.* import).
Is there any datatype which would let me carry the server-side Text field over to the client side?
A possible option would be to split the Text into several Strings, but that sounds pretty ugly...
Any suggestions?
You can call getValue() to make it a String.
You can use Text for your persistable field. You just need to have a RPC serializer to be able to use it on the client (in GWT).
Take a look at http://blog.js-development.com/2010/02/gwt-app-engine-and-app-engine-data.html, it explains how to do it.
Some additions to the custom serialization libraries posted before
( http://juristr.com/blog/2010/02/gwt-app-engine-and-app-engine-data/
http://www.resmarksystems.com/code/
- these get com.google.appengine.api.datastore.Text and other datastore types transferred to the client):
You also need to update com.google.appengine.eclipse.core.prefs to include the library:
filesCopiedToWebInfLib=...|appengine-utils-client-1.1.jar
Another workaround is to make the String a serialized blob to overcome the 1500 bytes limit (the field will lose its sort and filter ability):
@Persistent(serialized = "true")
public String content;
It is possible to have less overhead on the client by converting from com.google.appengine.api.datastore.Text to String with lifecycle listeners (not instance listeners, which would get sent to the client and make it fail). Use this together with custom serialization, which gives the client support for com.google.appengine.api.datastore.Text with no additional transport class required.
The com.google.appengine.api.datastore.Text field may be cleared before sending to the client to avoid transfer overhead (the simplest way is to mark it transient).
On the server side, avoid setting the String property directly, because JDO will not catch the change (it will catch it only for new records, or when some persistent field is modified afterwards). This adds very little overhead.
Records should be detached via pm.makeTransient. When using pm.detachCopy, the entity must be marked detachable = "true" (so that DetachLifecycleListener is called), and DetachLifecycleListener.postDetach must be implemented in a similar way to StoreLifecycleListener.preStore. Otherwise non-persistent fields will not be copied (by pm.detachCopy) and will be empty on the client.
It is possible to handle several classes in a similar way:
import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;
import javax.jdo.listener.DetachLifecycleListener;
import javax.jdo.listener.InstanceLifecycleEvent;
import javax.jdo.listener.LoadLifecycleListener;
import javax.jdo.listener.StoreLifecycleListener;
import com.google.appengine.api.datastore.Text;
import com.mycompany.mywebapp.shared.Entity;
import com.mycompany.mywebapp.shared.Message;
@SuppressWarnings("rawtypes")
public class PersistenceManagerStuff
{
public static final PersistenceManagerFactory PMF = JDOHelper.getPersistenceManagerFactory("transactions-optional");
public static EntityLifecycleListener entityLifecycleListener = new EntityLifecycleListener();
public static Class[] entityClassList = new Class[] { Entity.class };
public static MessageLifecycleListener messageLifecycleListener = new MessageLifecycleListener();
public static Class[] messageClassList = new Class[] { Message.class };
public static PersistenceManager getPersistenceManager()
{
PersistenceManager pm = PMF.getPersistenceManager();
pm.addInstanceLifecycleListener(entityLifecycleListener, entityClassList);
pm.addInstanceLifecycleListener(messageLifecycleListener, messageClassList);
return pm;
}
// [start] lifecycle listeners
public static class EntityLifecycleListener implements LoadLifecycleListener, StoreLifecycleListener//, DetachLifecycleListener
{
public void postLoad(InstanceLifecycleEvent event)
{
Entity entity = ((Entity) event.getSource());
if (entity.content_long != null)
entity.content = entity.content_long.getValue();
else
entity.content = null;
}
public void preStore(InstanceLifecycleEvent event)
{
Entity entity = ((Entity) event.getSource());
entity.setContent(entity.content);
/*
the class must be marked @PersistenceAware to use the code below; otherwise use the setter
if (entity.content != null)
entity.content_long = new Text(entity.content);
else
entity.content_long = null;
*/
}
public void postStore(InstanceLifecycleEvent event)
{
}
/*public void postDetach(InstanceLifecycleEvent event)
{
}
public void preDetach(InstanceLifecycleEvent event)
{
}*/
}
public static class MessageLifecycleListener implements LoadLifecycleListener, StoreLifecycleListener
{
public void postLoad(InstanceLifecycleEvent event)
{
Message message = ((Message) event.getSource());
if (message.content_long != null)
message.content = message.content_long.getValue();
else
message.content = null;
}
public void preStore(InstanceLifecycleEvent event)
{
Message message = ((Message) event.getSource());
message.setContent(message.content);
}
public void postStore(InstanceLifecycleEvent event)
{
}
}
// [end] lifecycle listeners
}
@SuppressWarnings("serial")
@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "false")
public class Entity implements Serializable
{
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
public Long id;
@NotPersistent
public String content;
@Persistent(column = "content")
public transient com.google.appengine.api.datastore.Text content_long;
public void setContent(String content)
{
this.content = content;
if (content != null)
content_long = new Text(content);
else
content_long = null;
}
public Entity() {}
}
@PersistenceAware
public class DataServiceImpl extends RemoteServiceServlet implements DataService
{
public Entity renameEntity(long id, String newContent) throws NotLoggedInException
{
PersistenceManager pm = PersistenceManagerStuff.getPersistenceManager();
Entity result = null;
try
{
Entity entity = (Entity) pm.getObjectById(Entity.class, id);
if (entity.longUserId != getLongUserId(pm))
throw new NotLoggedInException(String.format("wrong entity %d ownership", entity.id));
entity.modificationDate = java.lang.System.currentTimeMillis(); // will call lifecycle handlers for entity.content, but is still old value
//entity.content = newContent; // will not work, even if the owner class is @PersistenceAware
entity.setContent(newContent); // correct way to set long value
pm.makeTransient(result = entity);
}
catch (Exception e)
{
LOG.log(Level.WARNING, e.getMessage());
throw e;
}
finally
{
pm.close();
}
return result;
}
}
Also, in lifecycle handlers it is possible to mix old (short) and new (long) values in a single entity if you have both (with different field names) and do not want to convert old to new. However, it seems com.google.appengine.api.datastore.Text supports loading from old String values.
Some low-level code to batch-convert old values into new ones (using the low-level com.google.appengine.api.datastore API):
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Query q = new Query("Entity");
PreparedQuery pq = datastore.prepare(q);
for (com.google.appengine.api.datastore.Entity result : pq.asIterable())
{
String content = (String) result.getProperty("content");
if (content != null)
{
result.setProperty("content", new com.google.appengine.api.datastore.Text(content));
datastore.put(result);
}
}
I was wondering, given the following JSON, how can I produce a ResultSet instance whose Query field has the value ppb?
package jsontest;
import com.google.gson.Gson;
/**
*
* @author yccheok
*/
public class Main {
public static class ResultSet {
public String Query;
}
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
final String s = "{\"ResultSet\":{\"Query\":\"ppb\"}}";
System.out.println(s);
Gson gson = new Gson();
ResultSet resultSet = gson.fromJson(s, ResultSet.class);
// {}
System.out.println(gson.toJson(resultSet));
// null?
System.out.println(resultSet.Query);
}
}
Currently, here is what I get :
{"ResultSet":{"Query":"ppb"}}
{}
null
Without modifying the String, how can I get a correct Java object?
Try first to construct a new object, call gson.toJson(object), and see the result.
I don't have gson, but jackson (another object-to-json mapper) prints this:
{"Query":"ppb"}
So, you don't include the class name. Actually, the gson user guide gives an example showing exactly this. Look at the BagOfPrimitives.
(And a final note: in Java, the accepted practice is that variable names start with a lowercase letter, i.e. query rather than Query.)
Update: If you really can't change the JSON input, you can mirror the structure this way:
public static class Holder {
public ResultSet ResultSet;
}
public static class ResultSet {
public String Query;
}
(and then use Holder h = gson.fromJson(s, Holder.class);)