Instead of converting an entire CSV file to an object, is there a simple API that takes in one CSV or TSV string and converts it to an object? The APIs I've found so far are geared towards converting a CSV/TSV file to a list of objects.
Obviously I could just split the string and call a constructor, but I was wondering if there was a clean API I could use.
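For reference, a minimal sketch of the split-and-construct approach mentioned above (MyBean and its constructor are hypothetical); it works for trivial input but breaks on quoted or escaped separators:

String line = "1,2";
String[] parts = line.split(","); // naive: fails on quoted fields like "a,b"
MyBean bean = new MyBean(parts[0], parts[1]);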
You can do this with Jackson. It looks pretty similar to the other answers but seems to perform better than SuperCSV according to their tests.
Define your POJO (both the annotation and the constructor seem to be necessary):
#JsonPropertyOrder({ "foo", "bar" })
public class FooBar {
private String foo;
private String bar;
public FooBar() {
}
// Setters, getters, toString()
}
Then parse it:
String input = "1,2\n3,4";
StringReader reader = new StringReader(input);
CsvMapper m = new CsvMapper();
CsvSchema schema = m.schemaFor(FooBar.class).withoutHeader().withLineSeparator("\n").withColumnSeparator(',');
try {
MappingIterator<FooBar> r = m.reader(FooBar.class).with(schema).readValues(reader);
while (r.hasNext()) {
System.out.println(r.nextValue());
}
} catch (JsonProcessingException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
Go with uniVocity-parsers, as it is at least twice as fast as SuperCSV and has many more features.
For example, let's say your bean is:
class TestBean {

    // if the value parsed in the quantity column is "?" or "-", it will be replaced by null.
    @NullString(nulls = { "?", "-" })
    // if a value resolves to null, it will be converted to the String "0".
    @Parsed(defaultNullRead = "0")
    // The attribute type defines which conversion will be executed when processing the value.
    // In this case, IntegerConversion will be used.
    // The attribute name will be matched against the column header in the file automatically.
    private Integer quantity;

    @Trim
    @LowerCase
    // the value for the comments attribute is in the column at index 4 (0 is the first column, so this means the fifth column in the file)
    @Parsed(index = 4)
    private String comments;

    // you can also explicitly give the name of a column in the file.
    @Parsed(field = "amount")
    private BigDecimal amount;

    @Trim
    @LowerCase
    // values "no", "n" and "null" will be converted to false; values "yes" and "y" will be converted to true
    @BooleanString(falseStrings = { "no", "n", "null" }, trueStrings = { "yes", "y" })
    @Parsed
    private Boolean pending;
}
Now, to read your input as a list of TestBean:
// BeanListProcessor converts each parsed row to an instance of a given class, then stores each instance into a list.
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(getReader("/examples/bean_test.csv"));
// The BeanListProcessor provides a list of objects extracted from the input.
List<TestBean> beans = rowProcessor.getBeans();
To parse TSV files, just change the combination of CsvParserSettings & CsvParser to TsvParserSettings & TsvParser. For example:
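A sketch of the TSV variant, mirroring the CSV code above (the getReader helper and the .tsv path are assumptions carried over from the earlier example):

BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);

TsvParserSettings parserSettings = new TsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);

TsvParser parser = new TsvParser(parserSettings);
parser.parse(getReader("/examples/bean_test.tsv")); // assumed TSV counterpart of the file above

List<TestBean> beans = rowProcessor.getBeans();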
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
I'm using this API:
http://jsefa.sourceforge.net/
You can use annotations to convert your entities to and from CSV.
In the case of SuperCSV, which you mentioned in a comment, you could pass it a String wrapped in a StringReader, i.e.
CsvBeanReader beanReader = new CsvBeanReader(new StringReader(theString), preferences);
beanReader.read(theBean, nameMapping);
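For a runnable end-to-end version, here is a sketch assuming a FooBar bean with foo/bar properties and input that carries a header row:

import java.io.StringReader;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

String csv = "foo,bar\n1,2";
ICsvBeanReader beanReader = new CsvBeanReader(new StringReader(csv), CsvPreference.STANDARD_PREFERENCE);
String[] header = beanReader.getHeader(true); // use the header row as the name mapping
FooBar bean = beanReader.read(FooBar.class, header);
beanReader.close();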
I was dealing with a similar issue recently. In my case I wanted to import a single CSV row at a time into a single POJO, as I was getting my data in the form of discrete single-line WebSocket updates. In the end Jackson worked best for me, as I didn't have to put everything into a list of POJOs first.
Here is the code:
String csvString="rick|sanchez|99"
private CsvMapper mapper=new CsvMapper();
private CsvSchema schema = mapper.schemaFor(Pojo.class).withColumnSeparator('|');
private ObjectReader r=mapper.readerFor(Pojo.class).with(schema);
Pojo pojo=r.readValue(csvString);
For this to work you also need to add the following annotation to your POJO:
@JsonPropertyOrder({ "firstName", "lastName", "age" })
As far as I know it's the only library that easily lets you parse a single CSV line into a single POJO instance. Obviously you could also do this by hand with a constructor, but these libraries deal with type conversions for you, so it's particularly useful if your POJO contains lots of different attributes.
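For completeness, a minimal sketch of the Pojo assumed above (field names taken from the annotation; Jackson needs the no-arg constructor and setters):

@JsonPropertyOrder({ "firstName", "lastName", "age" })
public class Pojo {
    private String firstName;
    private String lastName;
    private int age;

    public Pojo() {
    }

    // getters and setters omitted
}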
I am receiving messages in protobuf format. I need to convert them to JSON format fast, as all my business logic is written to handle JSON-based POJO objects.
byte[] request = ..; // msg received
// convert to intermediate POJO
AdxOpenRtb.BidRequest bidRequestProto = AdxOpenRtb.BidRequest.parseFrom(request, reg);
// convert intermediate POJO to json string.
// THIS STEP IS VERY SLOW
Printer printer = JsonFormat.printer().printingEnumsAsInts().omittingInsignificantWhitespace();
String jsonBody = printer.print(bidRequestProto);
// convert json string to final POJO format
BidRequest bidRequest = super.parse(jsonBody.getBytes());
The proto object to JSON conversion step is very slow. Is there a faster approach for it?
Can I reuse the printer object? Is it thread-safe?
Note: these POJO classes (AdxOpenRtb.BidRequest & BidRequest) are very complex, with many levels of hierarchy and many fields, but they contain similar data with slightly different field names and data types.
I ran into some performance issues as well and ended up writing the QuickBuffers library. It generates dedicated JSON serialization methods (i.e. no reflection) and should give you a 10-30x speedup. It can be used side-by-side with Google's implementation. The code should look something like this:
// Initialization (objects can be reused if desired)
AdxOpenRtb.BidRequest bidRequestProto = AdxOpenRtb.BidRequest.newInstance();
ProtoSource protoSource = ProtoSource.newArraySource();
JsonSink jsonSink = JsonSink.newInstance().setWriteEnumsAsInts(true);
// Convert Protobuf to JSON
bidRequestProto.clearQuick() // or ::parseFrom if you want a new object
.mergeFrom(protoSource.setInput(request))
.writeTo(jsonSink.clear());
// Use the raw json bytes
RepeatedByte jsonBytes = jsonSink.getBytes();
JsonSinkBenchmark has some sample code for replacing the built-in JSON encoder with more battle-tested Gson/Jackson backends.
Edit: if you're doing this within a single process and are worried about performance, you're better off writing or generating code to convert the Java objects directly. JSON is not a very efficient format to go through.
I ended up using MapStruct, as suggested by some of you (@M.Deinum).
new code:
byte[] request = ..; // msg received
// convert to intermediate POJO
AdxOpenRtb.BidRequest bidRequestProto = AdxOpenRtb.BidRequest.parseFrom(request, reg);
// direct conversion from protobuf Pojo to my custom Pojo
BidRequest bidRequest = BidRequestMapper.INSTANCE.adxOpenRtbToBidRequest(bidRequestProto);
Code snippet of BidRequestMapper:
@Mapper(
        collectionMappingStrategy = CollectionMappingStrategy.ADDER_PREFERRED,
        nullValueCheckStrategy = NullValueCheckStrategy.ALWAYS,
        unmappedSourcePolicy = ReportingPolicy.WARN,
        unmappedTargetPolicy = ReportingPolicy.WARN)
@DecoratedWith(BidRequestMapperDecorator.class)
public abstract class BidRequestMapper {

    public static final BidRequestMapper INSTANCE = Mappers.getMapper(BidRequestMapper.class);

    @Mapping(source = "impList", target = "imp")
    @Mapping(target = "impOverride", ignore = true)
    @Mapping(target = "ext", ignore = true)
    public abstract BidRequest adxOpenRtbToBidRequest(AdxOpenRtb.BidRequest adxOpenRtb);

    ...
    ...
}
// manage proto extensions
abstract class BidRequestMapperDecorator extends BidRequestMapper {

    private final BidRequestMapper delegate;

    BidRequestMapperDecorator(BidRequestMapper delegate) {
        this.delegate = delegate;
    }

    @Override
    public BidRequest adxOpenRtbToBidRequest(AdxOpenRtb.BidRequest bidRequestProto) {
        // Convert the protobuf msg to a basic bid request object
        BidRequest bidRequest = delegate.adxOpenRtbToBidRequest(bidRequestProto);
        ...
        ...
    }
}
The new approach is 20-50x faster in my local test environment.
It's worth mentioning that MapStruct is an annotation processor, which makes it much faster than similar libraries that use reflection, and it also has very good support for customization.
Error: com.fasterxml.jackson.databind.exc.MismatchedInputException: No content to map due to end-of-input
YAML file:
formatting.template:
  fields:
    - name: birthdate
      type: java.lang.String
      subType: java.util.Date
      lenght: 10
ConfigurationProperties:
@Data
public class FormattingConfigurationProperties {

    private List<Field> fields;

    @Data
    public static class Field {
        private String name;
        private String type;
        private String subType;
        private String lenght;
    }
}
Method to read the YAML:
private static FormattingConfigurationProperties buildFormattingConfigurationProperties() throws IOException {
    InputStream inputStream = new FileInputStream(new File("./src/test/resources/" + "application_formatting.yaml"));
    YAMLMapper mapper = new YAMLMapper();
    mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
    mapper.enable(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY);
    return mapper.readerFor(FormattingConfigurationProperties.class)
            .at("/formatting/template")
            .readValue(inputStream);
}
I actually solved it by changing the YAML file, splitting formatting.template onto separate lines:
formatting:
  template:
    fields:
      - name: birthdate
        type: java.lang.String
        subType: java.util.Date
        lenght: 10
This suggests it is not able to read a key containing a dot (period).
Does someone know how to avoid the MismatchedInputException when the prefix is on the same line, separated by a dot?
You're using the JSON Pointer /formatting/template. This is for nested mappings, as shown in your second YAML file. If you have a condensed key formatting.template, you'll need the JSON Pointer /formatting.template instead.
YAML is perfectly able to read keys with a dot, it just does not do what you think it does. The dot is not a special character in YAML, just part of the content.
You may have worked with Spring which loads YAML files by rewriting them as Properties files, where . is a separator. Since the existing dots are not escaped, for YAML files used with Spring, a dot is the same as nested keys. However when you directly use a YAML loader, such as Jackson, that is not the case.
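A minimal sketch of the corrected call, keeping the rest of the method from the question unchanged (in JSON Pointer syntax only / and ~ are special characters, so the dot needs no escaping):

return mapper.readerFor(FormattingConfigurationProperties.class)
        .at("/formatting.template") // one segment: the dot is plain content of the key
        .readValue(inputStream);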
I am testing RedisGraph as a way to store my data which originates from a client as JSON.
The JSON passes through a bean for validation etc and I use Jackson to serialise the bean so the RedisGraph string is in the correct format. For completeness on that formatting step see the sample code at the end.
The data properties might contain single quotes in valid JSON format, eg: O'Toole
{ "name" : "Peter O'Toole", "desc" : "An actors actor" }
I can use a formatter, as per the code block at the end, to get the JSON into a format the RedisGraph command will allow, which copes with the single quotes (without me needing to escape the data content, i.e. it can use what the client sends). E.g. this works:
GRAPH.QUERY movies "CREATE (:Actor {name:\"Peter O'Toole\", desc:\"An actors actor\", actor_id:1})"
So far, so good.
Now, the problem: I am having trouble with the syntax to persist original JSON where it ALSO contains escaped double quotes. eg:
{ "name" : "Peter O'Toole", "desc" : "An \"actors\" actor" }
I don't want to have to escape or wrap the desc property value because it is already escaped as valid JSON. But then how do I construct the RedisGraph command so it persists the properties using the values it is given? ie containing escaped double quotes.
In other words, this throws a parsing error because of the \" in the desc property.
GRAPH.QUERY movies "CREATE (:Actor {name:\"Peter O'Toole\", desc:\"An \"actors\" actor\", actor_id:1})"
Given it would be quite common to want to persist data containing valid JSON escaped double quotes \" AND unescaped single quotes, there must be a way to do this. eg name and address data.
Any ideas?
Thanks, Murray.
PS: this doesn't work either: it chokes on the embedded ' in O'Toole
GRAPH.QUERY movies "CREATE (:Actor {name:\'Peter O'Toole\', desc:\'an \"actors\" actor\', actor_id:3})"
// \u007F is the "delete" character.
// This is the highest char value Jackson allows and is
// unlikely to be in the JSON (hopefully!)
JsonFactory builder = new JsonFactoryBuilder().quoteChar('\u007F').build();
ObjectMapper objectMapper = new ObjectMapper(builder);
// Set pretty printing of json
objectMapper.enable(SerializationFeature.INDENT_OUTPUT);
// Do not surround property names with quotes. ie { firstName : "Peter" }
objectMapper.configure(JsonWriteFeature.QUOTE_FIELD_NAMES.mappedFeature(), false);
// Make a Person
Person person = new Person("Peter", "O'Toole");
// Set the desc property using embedded quotes
person.setDesc("An \"actors\" actor");
// Convert Person to JSON
String json = objectMapper.writeValueAsString(person);
// Now convert your json to escape the double quotes around the string properties:
String j2 = json.replaceAll("\u007F", "\\\\\"");
System.out.println(j2);
This yields:
{
firstName : \"Peter\",
lastName : \"O'Toole\",
desc : \"An \"actors\" actor\"
}
which is in a format Redis GRAPH.QUERY movies "CREATE..." can use (apart from the issue with \"actors\" as discussed above).
OK. The issue was an artefact of trying to test the syntax by entering the commands into RedisInsight directly. As it turns out, all one needs to do is remove the double quotes from the valid JSON.
So, to be clear, based on normal valid json coming from the client app,
the formatter test is:
ObjectMapper objectMapper = new ObjectMapper();
// (Optional) Set pretty printing of json
objectMapper.enable(SerializationFeature.INDENT_OUTPUT);
// Do not surround property names with quotes. ie { firstname : "Peter" }
objectMapper.configure(JsonWriteFeature.QUOTE_FIELD_NAMES.mappedFeature(), false);
// Make a Person
// For this example this is done directly,
// although in the Java this is done using
// objectMapper.readValue(incomingJson, Person.class)
Person person = new Person("Peter", "O'Toole");
// Set the desc property using escaped double quotes
person.setDesc("An \"actor's\" actor");
// Convert Person to JSON without quoted property names
String json = objectMapper.writeValueAsString(person);
System.out.println(json);
yields:
{
firstname : "Peter",
lastname : "O'Toole",
desc : "An \"actor's\" actor"
}
and the command string is consumed by the Vertx Redis client:
Vertx vertx = Vertx.vertx();
private final Redis redisClient;
// ...
redisClient = Redis.createClient(vertx);
String cmdStr = "CREATE (:Actor {firstname:"Peter", lastname: "O'Toole", desc:"An \"actor's\" actor", actor_id:1})";
Future<String> futureResponse = redisClient.send(Request.cmd(Command.GRAPH_QUERY).arg("movies").arg(cmdStr))
.compose(response -> {
Log.info("createRequest response=" + response.toString());
return Future.succeededFuture("OK");
})
.onFailure(failure -> {
Log.error("createRequest failure=" + failure.toString());
});
:-)
I am working on a Java Maven project. I want to read employee.csv records into a POJO class.
I generate this POJO class from the employee.csv header, and all fields of the POJO class are of type String. Now I want to map employee.csv to the generated POJO class. My requirement is that I don't want to specify the column names manually, because if the CSV file changes I would have to change my code again; it should map dynamically to any file. For instance:
firstName,lastName,title,salary
john,karter,manager,54372
I want to map this to the POJO which I already have:
public class Employee
{
    private String firstName;
    private String lastName;
    .
    .
    // getters and setters
    // toString()
}
uniVocity-parsers allows you to map your pojo easily.
class Employee {

    @Trim
    @LowerCase
    @Parsed
    private String firstName;

    @Parsed
    private String lastName;

    @NullString(nulls = { "?", "-" }) // if the value parsed in the salary column is "?" or "-", it will be replaced by null.
    @Parsed(defaultNullRead = "0") // if a value resolves to null, it will be converted to the String "0".
    private Integer salary; // the attribute name will be matched against the column header in the file automatically.

    ...
}
To parse:
BeanListProcessor<Employee> rowProcessor = new BeanListProcessor<Employee>(Employee.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
//And parse!
//this submits all rows parsed from the input to the BeanListProcessor
parser.parse(new FileReader(new File("/path/to/your.csv")));
List<Employee> beans = rowProcessor.getBeans();
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
You can use the openCSV jar to read the data and then map each column value to the class attributes.
Due to security reasons, I cannot share my code with you.
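Since that answer has no code, here is a hedged openCSV sketch (not the poster's code): with no annotations, openCSV's header-name mapping strategy matches header names to field names automatically, which fits the requirement of not listing columns manually.

import java.io.FileReader;
import java.util.List;
import com.opencsv.bean.CsvToBeanBuilder;

List<Employee> employees = new CsvToBeanBuilder<Employee>(new FileReader("employee.csv"))
        .withType(Employee.class)
        .build()
        .parse();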
I'm using XStream and Jettison's StAX JSON serializer to send/receive messages to/from JSON JavaScript clients and Java web applications.
I want to be able to create a list of objects to send to the server and have it properly unmarshalled into Java, but the format that XStream and Jettison expect is very non-intuitive and requires our JavaScript libraries to jump through hoops.
[EDIT: update on issues using the GSON library]
I attempted to use the GSON library but it cannot deserialize concrete objects when I only have it expect generic super classes (XStream and Jettison handles this because type information is baked into the serialization).
GSON FAQ states Collection Limitation:
Collections Limitations
Can serialize collection of arbitrary objects but can not deserialize from it
Because there is no way for the user to indicate the type of the resulting object
While deserializing, Collection must be of a specific generic type
Maybe I'm using bad java practices but how would I go about building a JSON to Java messaging framework that sent/received various concrete Message objects in JSON format?
For example this fails:
public static void main(String[] args) {
    Gson gson = new Gson();
    MockMessage mock1 = new MockMessage();
    MockMessage mock2 = new MockMessage();
    MockMessageOther mock3 = new MockMessageOther();
    List<MockMessage> messages = new ArrayList<MockMessage>();
    messages.add(mock1);
    messages.add(mock2);
    messages.add(mock3); // note: compiles only if MockMessageOther extends MockMessage
    String jsonString = gson.toJson(messages);
    // JSON list format is non-intuitive: a single-element array with class-name fields
    System.out.println(jsonString);
    List gsonJSONUnmarshalledMessages = (List) gson.fromJson(jsonString, List.class);
    // This will print 3 messages unmarshalled
    System.out.println("XStream format JSON Number of messages unmarshalled: " + gsonJSONUnmarshalledMessages.size());
}
[{"val":1},{"val":1},{"otherVal":1,"val":1}]
Exception in thread "main" com.google.gson.JsonParseException: The JsonDeserializer com.google.gson.DefaultTypeAdapters$CollectionTypeAdapter#638bd7f1 failed to deserialized json object [{"val":1},{"val":1},{"otherVal":1,"val":1}] given the type interface java.util.List
Here's an example, I want to send a list of 3 Message objects, 2 are of the same type and the 3rd is a different type.
import java.util.ArrayList;
import java.util.List;
import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.io.json.JettisonMappedXmlDriver;
class MockMessage {
    int val = 1;
}

class MockMessageOther {
    int otherVal = 1;
}

public class TestJSONXStream {

    public static void main(String[] args) {
        JettisonMappedXmlDriver xmlDriver = new JettisonMappedXmlDriver();
        XStream xstream = new XStream(xmlDriver);

        MockMessage mock1 = new MockMessage();
        MockMessage mock2 = new MockMessage();
        MockMessageOther mock3 = new MockMessageOther();
        List messages = new ArrayList();
        messages.add(mock1);
        messages.add(mock2);
        messages.add(mock3);

        String jsonString = xstream.toXML(messages);
        // JSON list format is non-intuitive: a single-element array with class-name fields
        System.out.println(jsonString);
        List xstreamJSONUnmarshalledMessages = (List) xstream.fromXML(jsonString);
        // This will print 3 messages unmarshalled
        System.out.println("XStream format JSON Number of messages unmarshalled: " + xstreamJSONUnmarshalledMessages.size());

        // Attempt to deserialize a reasonable looking JSON string
        String jsonTest =
            "{" +
            "\"list\" : [" +
            "{" +
            "\"MockMessage\" : {" +
            "\"val\" : 1" +
            "}" +
            "}, {" +
            "\"MockMessage\" : {" +
            "\"val\" : 1" +
            "}" +
            "}, {" +
            "\"MockMessageOther\" : {" +
            "\"otherVal\" : 1" +
            "}" +
            "} ]" +
            "}";
        List unmarshalledMessages = (List) xstream.fromXML(jsonTest);
        // We expect 3 messages but XStream only deserializes one
        System.out.println("Normal format JSON Number of messages unmarshalled: " + unmarshalledMessages.size());
    }
}
Intuitively I expect the XStream JSON to be serialized (and able to deserialize correctly) from the following format:
{
  "list" : [
    {
      "MockMessage" : {
        "val" : 1
      }
    }, {
      "MockMessage" : {
        "val" : 1
      }
    }, {
      "MockMessageOther" : {
        "otherVal" : 1
      }
    } ]
}
Instead XStream creates a single-element list with fields named after the class names, and nested arrays of objects of the same type.
{
  "list" : [ {
    "MockMessage" : [ {
      "val" : 1
    }, {
      "val" : 1
    } ],
    "MockMessageOther" : {
      "otherVal" : 1
    }
  } ]
}
The trouble may be caused by it using the XStream XML CollectionConverter?
Does anyone have a suggestion for a good JSON Java object serialization library that allows you to read/write arbitrary Java objects? I looked at the Jackson Java JSON Processor, but when you read objects from a stream you have to specify what type of object it is, unlike XStream, which will read in any object (because the serialized XStream JSON contains class name information).
I agree with the other poster in that XStream is not a good fit: it's an OXM (Object/XML Mapper), and JSON is handled as a secondary output format using the XML processing path. This is why a "convention" (of how to convert the hierarchical XML model into the object-graph model of JSON and vice versa) is needed, and your choice boils down to picking whichever of the sub-optimal options is least intrusive.
That works OK if XML is your primary data format and you just need some rudimentary JSON(-like) support.
To get good JSON support, I would consider using a JSON processing library that does real OJM mapping (I assume Svenson does as well), such as:
Jackson
Google-gson
Also: even if you do need to support both XML and JSON, you are IMO better off using separate libraries for these tasks; the objects (beans) you use on the server side need not be different, just the serialization libs that convert to/from XML and JSON.
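For illustration, a hedged Jackson sketch of the polymorphic-list case from the question (the annotations are standard Jackson; the base class and the assumption that both mock classes extend it are mine):

import java.util.List;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.WRAPPER_OBJECT)
@JsonSubTypes({
    @JsonSubTypes.Type(value = MockMessage.class, name = "MockMessage"),
    @JsonSubTypes.Type(value = MockMessageOther.class, name = "MockMessageOther")
})
abstract class BaseMessage { } // assumes MockMessage and MockMessageOther extend this

// serialization writes each element as {"MockMessage": {...}}, close to the
// "intuitive" format the question expected; deserialization restores the
// concrete types
ObjectMapper mapper = new ObjectMapper();
String json = mapper.writeValueAsString(messages); // messages: List<BaseMessage>
List<BaseMessage> back = mapper.readValue(json, new TypeReference<List<BaseMessage>>() { });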
I realize this is off-topic, but I'd like to present a solution in svenson JSON.
Do you really need public fields in your domain classes? Apart from having to use properties, svenson can handle cases like this with a simpler JSON output and a discriminator property:
class Message
{
    // .. your properties with getters and setters ..

    // the special property "type" acts as a signal for conversion
}

class MessageOther
{
    ...
}
List list = new ArrayList();
list.add(new Message());
list.add(new MessageOther());
list.add(new Message());
String jsonDataSet = JSON.defaultJSON().forValue(list);
would output JSON like
[
{"type":"message", ... },
{"type":"message_other", ... },
{"type":"message", ... }
]
which could be parsed again with code like this
// configure reusable parser instance
JSONParser parser = new JSONParser();

// type mapper to map to your types
PropertyValueBasedTypeMapper mapper = new PropertyValueBasedTypeMapper();
mapper.setParsePathInfo("[]");
mapper.addFieldValueMapping("message", Message.class);
mapper.addFieldValueMapping("message_other", MessageOther.class);
parser.setTypeMapper(mapper);

List list = parser.parse(List.class, jsonDataSet);
A svenson type mapper based on the full class name would look something like this
public class ClassNameBasedTypeMapper extends PropertyValueBasedTypeMapper
{
    protected Class getTypeHintFromTypeProperty(String value) throws IllegalStateException
    {
        try
        {
            return Class.forName(value);
        }
        catch (ClassNotFoundException e)
        {
            throw new IllegalStateException(value + " is no valid class", e);
        }
    }
}
which is not an ideal implementation, as it inherits the configuration of PropertyValueBasedTypeMapper without really needing it. (I should include a cleaner version in svenson.)
The setup is very much like above
JSONParser parser = new JSONParser();
ClassNameBasedTypeMapper mapper = new ClassNameBasedTypeMapper();
mapper.setParsePathInfo("[]");
parser.setTypeMapper(mapper);
List foos = parser
.parse( List.class, "[{\"type\":\"package.Foo\"},{\"type\":\"package.Bar\"}]");