Storing data in flat files - java

I have two data files in some weird format. I need to parse them into some decent format for future use. After parsing I end up with two formats, where one has an ID and the information pertaining to that ID comes from the other file.
Ex :
From file 1 I get:
Name, Position, PropertyID
From file 2:
PropertyId, Property1, Property2
Like this, I have more columns from both files.
What is the ideal way to store this information in a flat file to serve as a database? I don't want to use a database (MySQL, MSSQL) for some reason.
Initially I thought of using a single comma-separated file, but I'll end up with so many columns that updating the information becomes a problem.
I'll be using the parsed data in some other applications using Java and Python.
Can anyone suggest a better way to handle this?
Thanks

I would use JSON. JSON can be easily converted to and from objects in either Python or Java. In Python, JSON maps directly to a dict. Java has various facilities to convert; far less work than doing all that yourself. For Java, see JAXB.
Something like this:
File 1: Map people to propertyID
[
  {"firstName": "John", "lastName": "Smith", "position": "sales", "propertyId": 123},
  {"firstName": "Jane", "lastName": "Doe", "position": "manager", "propertyId": 456}
]
File 2: Map propertyId to list of properties.
{
  "123": [{"address": "123 street", "city": "LA"}, {"address": "456 street", "city": "SF"}],
  "456": [{"address": "123 ave", "city": "XX"}, {"address": "456 ave", "city": "SF"}]
}
P.S. It might make more sense to associate a person with a list of property IDs and have each property have its own ID. Easier to move things around and reassign. Just my $0.02.
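To make the "easily converted" claim concrete, here is a minimal sketch of reading file 2 into Java objects with Jackson (my substitution for illustration; the answer suggests JAXB, and the file name is hypothetical):
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.util.List;
import java.util.Map;

public class PropertyLoader {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // file 2: propertyId -> list of {address, city} objects
        Map<String, List<Map<String, String>>> properties = mapper.readValue(
                new File("file2.json"),
                new TypeReference<Map<String, List<Map<String, String>>>>() {});
        System.out.println(properties.get("123")); // all properties for ID 123
    }
}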

Ensure that you normalize your data with an ID, to avoid touching many different data columns with even a single change. Take file 2 you mentioned above: you can reduce the columns to two by having just a propertyId column and a property column. Rather than having one propertyId associated with two properties in a single row, you'd have one propertyId associated with one property per row. You then need another file to correlate your two main data tables, as shown in file3 below. Normalizing your data like this keeps your updates minimal when a change occurs.
file1:
owner_id | name    | position |
1        | Jack Ma | CEO      |
file2:
property_id | property          |
101         | Hollywood Mansion |
102         | Miami Beach House |
file3:
owner_id | property_id |
1        | 101         |
1        | 102         |
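Looking up an owner's properties is then a small in-memory join across the three files. A minimal sketch, under the assumption that each file has already been parsed (the structures and values below are mine, taken from the example):
import java.util.List;
import java.util.Map;

public class FlatFileJoin {
    public static void main(String[] args) {
        Map<Integer, String> owners = Map.of(1, "Jack Ma");          // file1: owner_id -> name
        Map<Integer, String> properties = Map.of(
                101, "Hollywood Mansion", 102, "Miami Beach House"); // file2: property_id -> property
        List<int[]> links = List.of(new int[]{1, 101}, new int[]{1, 102}); // file3 rows

        // Join: print every property belonging to owner 1
        for (int[] link : links) {
            if (link[0] == 1) {
                System.out.println(owners.get(1) + " -> " + properties.get(link[1]));
            }
        }
    }
}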

Related

Using Cucumber examples data in "scenario outline" line, not in a step, data isn't read and column marked as unused

When I use a table column in the "Scenario Outline" line of a Cucumber feature file, but not in any step, using Java and IntelliJ IDEA, such as the following:
Scenario Outline: my test <lastname>
  Given Customer Ask Chatbot "My name is <fname>"
  When Verify Chatbot responses contain
    """
    Hello <fname>!
    """
  Then Customer clicks on "Yes"
  Examples:
    | fname   | lastname |
    | ahmed   | amir     |
    | saad    | sameh    |
    | mohamed | morad    |
"fname" is acting normal, but "lastname" column is marked as unused, as it is only used in the "scenario outline" line and not in any step.
My question is, does this happen with you? and if so, is this the intended behavior? or is it an issue that needs to be reported and fixed? and if so, is it a problem within intellij or cucumber or something else?
Thank you
There's an open ticket for this issue in IDEA: https://youtrack.jetbrains.com/issue/IDEA-261249, you can vote for it
Most likely, the feature file is not placed where IntelliJ expects it.
Put it under the resources folder.
For a correct display in IntelliJ, it is sometimes necessary to mark that directory as "test resources root".
There is also a small delay before the column is displayed as "used".

REST API for updating information with empty or null values

I have a general question about how best to build an API that can modify records in a database.
Suppose we have a table with 10 columns and we can query these 10 columns using REST (GET). The JSON response will contain all 10 fields. This is easy and works without problems.
The next step is that someone wants to create a new record via POST. In this case the person sends only 8 of the 10 fields in the JSON Request. We would then only fill the 8 fields in the database (the rest would be NULL). This also works without problems.
But what happens if someone wants to update a record? We see different possibilities here, each with advantages and disadvantages.
Only what should be updated is sent.
Problem: How can you explicitly empty / delete a field? If a "NULL" is passed in the JSON, we get NULL in the object, but any other field that is not passed is NULL as well. Therefore we cannot distinguish which field can be deleted and which field cannot be touched.
The complete object is sent.
Problem: Here the object could be fetched via a GET before, changed accordingly, and returned via PUT. Now we get all the information back and could write it directly back into the database, because empty fields were either already empty before or were cleared by the user.
What happens if the objects are extended by an update of the API? Suppose we extend the database by five more fields. The user of the API makes a GET, gets the 15 fields, but can only read the 10 fields he knows (because he hasn't updated his side yet). Then he changes some of the 10 fields and sends them back via PUT. We would then update only the 10 fields on our side, and the 5 new fields would be emptied in the database.
Or do you have to create a separate endpoint for each field? We have also thought about creating a map with key/value pairs describing what exactly should be changed.
About the technology: we use WildFly 15 with RESTEasy and Jackson.
For example:
Database at the beginning
+----+----------+---------------+-----+--------+-------+
| ID | Name | Country | Age | Weight | Phone |
+----+----------+---------------+-----+--------+-------+
| 1 | Person 1 | Germany | 22 | 60 | 12345 |
| 2 | Person 2 | United States | 32 | 62 | 56789 |
| 3 | Person 3 | Canada | 52 | 102 | 99999 |
+----+----------+---------------+-----+--------+-------+
GET .../person/2
{
  "id": 2,
  "name": "Person 2",
  "country": "United States",
  "age": 32,
  "weight": 62,
  "phone": "56789"
}
Now I want to update his weight and remove the phone number
PUT .../person/2
{
  "id": 2,
  "name": "Person 2",
  "country": "United States",
  "age": 32,
  "weight": 78
}
or
{
  "id": 2,
  "name": "Person 2",
  "country": "United States",
  "age": 32,
  "weight": 78,
  "phone": null
}
Now the database should look like this:
+----+----------+---------------+-----+--------+-------+
| ID | Name | Country | Age | Weight | Phone |
+----+----------+---------------+-----+--------+-------+
| 1 | Person 1 | Germany | 22 | 60 | 12345 |
| 2 | Person 2 | United States | 32 | 78 | NULL |
| 3 | Person 3 | Canada | 52 | 102 | 99999 |
+----+----------+---------------+-----+--------+-------+
The problem is
We extend the table like this (salary):
+----+----------+---------------+-----+--------+--------+-------+
| ID | Name | Country | Age | Weight | Salary | Phone |
+----+----------+---------------+-----+--------+--------+-------+
| 1 | Person 1 | Germany | 22 | 60 | 1929 | 12345 |
| 2 | Person 2 | United States | 32 | 78 | 2831 | NULL |
| 3 | Person 3 | Canada | 52 | 102 | 3921 | 99999 |
+----+----------+---------------+-----+--------+--------+-------+
The person using the API does not know that there is a new field in JSON for the salary. And this person now wants to change the phone number of someone again, but does not send the salary. This would also empty the salary:
{
  "id": 3,
  "name": "Person 3",
  "country": "Canada",
  "age": 52,
  "weight": 102,
  "phone": null
}
+----+----------+---------------+-----+--------+--------+-------+
| ID | Name | Country | Age | Weight | Salary | Phone |
+----+----------+---------------+-----+--------+--------+-------+
| 1 | Person 1 | Germany | 22 | 60 | 1929 | 12345 |
| 2 | Person 2 | United States | 32 | 78 | 2831 | NULL |
| 3 | Person 3 | Canada | 52 | 102 | NULL | NULL |
+----+----------+---------------+-----+--------+--------+-------+
And salary should not be set to NULL, because it was not included in the JSON request.
You could deserialize your JSON to a Map.
This way, if a property has not been sent, the property is not present in the Map. If it is null, it is in the Map with a null value.
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;

ObjectMapper mapper = new ObjectMapper();
TypeReference<HashMap<String, Object>> typeReference = new TypeReference<HashMap<String, Object>>() {};
HashMap<String, Object> jsonMap = mapper.readValue(json, typeReference);
jsonMap.keySet().forEach(System.out::println); // prints only the fields that were actually sent
Not a very convenient solution, but it might work for you.
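The point of the Map is that it lets you tell "sent as null" apart from "not sent at all". For example (reusing the phone field from the question):
if (jsonMap.containsKey("phone")) {
    if (jsonMap.get("phone") == null) {
        // "phone" was sent explicitly as null: clear the column
    } else {
        // "phone" was sent with a value: update the column
    }
} else {
    // "phone" was not sent at all: leave the column untouched
}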
A common technique is to track changes on the entity POJO.
Load Dog with color = black, size = null and age = null
Set size to null (the setter will mark this field as changed)
Run update SQL
The POJO will have an internal state knowing that size was changed, and will thus include that field in the UPDATE. age, on the other hand, was never set, and is thus left unchanged. jOOQ works like that; I'm sure there are others.
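A minimal hand-rolled sketch of that change-tracking idea (for illustration only; this is not jOOQ's actual API):
import java.util.HashMap;
import java.util.Map;

public class Dog {
    private String color;
    private Integer size;
    private Integer age;
    // Remembers which fields were explicitly set after loading
    private final Map<String, Object> changed = new HashMap<>();

    public void setSize(Integer size) {
        this.size = size;
        changed.put("size", size); // marks the field as changed, even when set to null
    }

    // Only the fields present here end up in the UPDATE statement
    public Map<String, Object> changedFields() {
        return changed;
    }
}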
Only what should be updated is sent. Problem: How can you explicitly empty / delete a field? If a "NULL" is passed in the JSON, we get NULL in the object, but any other field that is not passed is NULL as well. Therefore we cannot distinguish which field can be deleted and which field cannot be touched.
The problem you have identified is genuine; I have faced this too. I think it is reasonable not to provide a technical solution for this, but rather to document the API usage and let the caller know the impact of leaving out a field or sending it as null. Of course, this assumes that the validations on the server side are tight and ensure sanity.
The complete object is sent. Problem: Here the object could be fetched via a GET before, changed accordingly, and returned via PUT. Now we get all the information back and could write it directly back into the database, because empty fields were either already empty before or were cleared by the user.
This is "straighter-forward" and should be documented in the API.
What happens if the objects are extended by an update of the API?
With the onus put on the caller through the documentation, this too is handled implicitly.
Or do you have to create a separate endpoint for each field?
This, again, is a design issue, and the solution varies from person to person. I would rather keep the API at the record level than at the level of individual values. However, there may be cases where individual endpoints are needed, e.g. status updates.
Suppose we extend the database by five more fields. The user of the API makes a GET, gets the 15 fields, but can only read the 10 fields he knows (because he hasn't updated his side yet). Then he changes some of the 10 fields and sends them back via PUT. We would then update only the 10 fields on our side, and the 5 new fields would be emptied in the database.
So let's start with an example - what would happen on the web, where clients are interacting with your API via HTML rendered in browsers. The client would GET a form, and that form would have input controls for each of the fields. Client updates the fields in the form, submits it, and you apply those changes to your database.
When you want to extend the API to include more fields, you add those fields to the form. The client doesn't know about those fields. So what happens?
One way to manage this is that you make sure that you include in the form the correct default values for the new fields; then, if the client ignores the new fields, the correct value will be returned when the form is submitted.
More generally, the representations we exchange in our HTTP payloads are messages; if we want to support old clients, then we need the discipline of evolving the message schema in a backwards compatible way, and our clients have to be written with the understanding that the message schema may be extended with additional fields.
The person using the API does not know that there is a new field in JSON for the salary.
The same idea holds here - the new representation includes a field "salary" that the client doesn't know about, so it is the responsibility of the client to forward that data back to you unchanged, rather than just dropping it on the floor assuming it is unimportant.
There's a bunch of prior art on this from 15-20 years ago, because people writing messages in XML were facing exactly the same sort of problems. They have left some of their knowledge behind. The easiest way to find it is to search for some key phrases, for instance must ignore or must forward.
See:
Versioning XML Vocabularies
Extensibility, XML Vocabularies, and XML Schema
Events in an event store have the same kinds of problems. Greg Young's book Versioning in an Event Sourced System covers a lot of the same ground (representations of events are also messages).
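One way to honor that "must forward" discipline with the Jackson stack mentioned in the question is to collect unknown fields and echo them back; a sketch of my own (the DTO and field names are illustrative):
import com.fasterxml.jackson.annotation.JsonAnyGetter;
import com.fasterxml.jackson.annotation.JsonAnySetter;
import java.util.HashMap;
import java.util.Map;

public class PersonDto {
    public String name;
    public String phone;
    // Collects every field this client version does not know about
    private final Map<String, Object> unknown = new HashMap<>();

    @JsonAnySetter
    public void setUnknown(String key, Object value) {
        unknown.put(key, value);
    }

    // Writes the unknown fields back out unchanged when the object is serialized for the PUT
    @JsonAnyGetter
    public Map<String, Object> getUnknown() {
        return unknown;
    }
}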
The accepted answer works well, but it has a huge caveat: it's completely untyped. If the object's fields change, you'll have no compile-time warning that you're looking for the wrong fields.
I would argue instead that it's better to force all fields to be present in the request body. Then a null means the user explicitly set it to null, while a user who misses a field receives a 400 Bad Request with a response body describing the error in detail.
Here's a great post on how to achieve this: Configure Jackson to throw an exception when a field is missing
Here's my example in Kotlin:
data class PlacementRequestDto(
    val contentId: Int,
    @param:JsonProperty(required = true)
    val tlxPlacementId: Int?,
    val keywords: List<Int>,
    val placementAdFormats: List<Int>
)
Notice that the nullable field is marked as required. This way the user has to explicitly include it in the request body.
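For the Java side of this thread, the same idea looks roughly like this (a sketch; note that Jackson only enforces required = true for creator, i.e. constructor, properties, which is why the Kotlin data class above works):
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

public class PlacementRequestDto {
    public final int contentId;
    public final Integer tlxPlacementId; // nullable, but must appear in the JSON

    @JsonCreator
    public PlacementRequestDto(
            @JsonProperty("contentId") int contentId,
            @JsonProperty(value = "tlxPlacementId", required = true) Integer tlxPlacementId) {
        this.contentId = contentId;
        this.tlxPlacementId = tlxPlacementId;
    }
}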
You can control empty or null values as below:
public class Person {
    @JsonInclude(JsonInclude.Include.NON_NULL)
    private BigDecimal salary; // salary is omitted from the serialized JSON when it is null
    private String phone;      // phone is always written, even when null or empty
    // same logic for other fields
}
i) As you're updating the weight and removing the phone number, ask the client to send only the fields which need to be updated, along with the record identifier, i.e. id in this case:
{
  "id": 2,
  "weight": 78,
  "phone": null
}
ii) As you're adding salary as one more column, which is a mandatory field, the client should be aware of it; maybe you have to redesign the contract.

Firestore search and order by document id or by document field, which is a better way?

I have two scenarios, described below, and I would like to know whether searching or ordering by document ID or by a document field is the better way, with regard to time cost or other considerations such as Firebase pricing.
A: I set the food name as the document ID (assume they are all unique), and each document has a field called "price". Another architecture, B, is to give the document a random ID and put the food name into the field "name".
Both architectures can be used to search for a certain document by ID or by field, but if I am going to search for the price of a certain food, which architecture is better?
A: collection "Food":
document
|- "apple" |- "price" : 10
|- "banana" |- "price" : 13
|- "lemon" |- "price" : 15
B: collection "Food":
document
|- "DXE9JK3V3" |- "name" : "apple"
| |- "price" : 10
|- "VS92S0DV0" |- "name" : "banana"
| |- "price" : 13
|- "VAS0D3JMV" |- "name" : "lemon"
|- "price" : 15
The second question is about document ordering (or maybe filtering). A: I set the document ID to an event timestamp string, and each document has a field called "event". Another architecture, B, is to give the document a random ID and put the time data into the field "time".
I would like to order these documents by time, either by ID or by field, but I don't know which architecture is better.
A: collection "Event":
document
|- "201912201822" |- "event" : "gym"
|- "201912130723" |- "event" : "work"
|- "201911262247" |- "event" : "party"
B: collection "Event":
document
|- "DXE9JK3V3" |- "time" : "2019-12-20 18:22"
| |- "event" : "gym"
|- "VS92S0DV0" |- "time" : "2019-12-13 07:23"
| |- "event" : "work"
|- "VAS0D3JMV" |- "time" : "2019-11-26 22:47"
|- "event" : "party"
A: I set the food name as the document ID (assume they are all unique), and each document has a field called "price". Another architecture, B, is to give the document a random ID and put the food name into the field "name".
In my opinion, B is the option to go with: that approach is the one more likely to hold up if you want your app to scale massively. Please also see Dan McGrath's answer to the following post:
Limitations of using sequential IDs in Cloud Firestore
The second question is about document ordering (or maybe filtering). A: I set the document ID to an event timestamp string, and each document has a field called "event". Another architecture, B, is to give the document a random ID and put the time data into the field "time".
Again, B is the option you can go ahead with. To order the results of a query, you can simply pass the direction as the second argument to the orderBy() method:
FirebaseFirestore rootRef = FirebaseFirestore.getInstance();
CollectionReference eventsRef = rootRef.collection("Events");
Query queryEventsByTime = eventsRef.orderBy("time", Query.Direction.ASCENDING);
Or Query.Direction.DESCENDING, according to your needs. But be aware that the time property should be of type Date and not of type String, as I see in your schema. To be able to save the date in Firestore, please see my answer from the following post:
ServerTimestamp is always null on Firebase Firestore
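For reference, a minimal sketch of an event model whose time field is filled in by the server (using Firestore's @ServerTimestamp annotation; the class and field names are mine, matching schema B):
import com.google.firebase.firestore.ServerTimestamp;
import java.util.Date;

public class Event {
    private String event;
    @ServerTimestamp
    private Date time; // Firestore populates this with the server's timestamp on write

    public Event() {} // no-arg constructor required by Firestore deserialization

    public Event(String event) { this.event = event; }

    public String getEvent() { return event; }
    public Date getTime() { return time; }
}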

FRED Economic Series IDs

I am trying to use the FRED API to get economic data. The examples provided on the FRED website use a series ID to get the observations/values for a particular series, but I am not able to find where to look up the series IDs. For example, the request below asks for a series ID to get the data:
https://api.stlouisfed.org/fred/series/observations?series_id=GNPCA&api_key=*your_key*
Any help is appreciated.
From the documentation on the FRED API, fred/category/series ("Get the series in a category"); an example:
https://api.stlouisfed.org/fred/category/series?category_id=125&api_key=abcdefghijklmnopqrstuvwxyz123456
Just go to Federal Reserve Economic Data | FRED | St. Louis Fed and search for the series you're interested in. Alternatively, you can browse by several attributes, including Category, from there. Once you've searched, if you find it, click one of the results. A search for S&P 500 yielded a link to S&P 500 | FRED | St. Louis Fed, which has a chart of it. You'll see some letters and/or numbers in parentheses right beside the page's title. In this case, it is SP500. THAT is your series ID.
Your complete link will be https://api.stlouisfed.org/fred/series/observations?series_id=SP500&file_type=json&api_key=YourAPIKey. More parameters are here: St. Louis Fed Web Services: fred/series/observations
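Putting that together, a hedged sketch of fetching the observations from Java (using java.net.http.HttpClient from Java 11+; substitute your own API key):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FredClient {
    public static void main(String[] args) throws Exception {
        // Series ID taken from the FRED page title, e.g. SP500
        String url = "https://api.stlouisfed.org/fred/series/observations"
                + "?series_id=SP500&file_type=json&api_key=YourAPIKey";
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // raw JSON with the observations
    }
}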
Perhaps this will help you:
GET https://api.stlouisfed.org/fred/series/search?search_text=your+search+query&api_key=abc123&file_type=json&observation_start=2016-03-01&observation_end=2016-03-15
This will return JSON data with the series ID you are looking for:
{
  "realtime_start": "2016-04-06",
  "realtime_end": "2016-04-06",
  "order_by": "search_rank",
  "sort_order": "desc",
  "count": 164,
  "offset": 0,
  "limit": 1000,
  "seriess": [
    {
      "id": "FEDFUNDS",
      "realtime_start": "2016-04-06",
      "realtime_end": "2016-04-06",
      "title": "Effective Federal Funds Rate",
------------------ json result intentionally cut off -------------
There are different ways to get the series IDs and category IDs, described here:
https://research.stlouisfed.org/docs/api/fred/

How to break string from an excel file into substrings and load it?

I'm working on a Talend job. I need to load data from an Excel file into an Oracle 11g database.
I can't figure out how to split a field of my Excel input file within Talend and load the resulting substrings into the database.
For example I've got a field like this:
toto:12;tata:1;titi:15
And I need to load it into a table, for example grade:
| name | grade |
|------|-------|
| toto | 12    |
| titi | 15    |
| tata | 1     |
Thanks in advance
In a Talend job, you can use tFileInputExcel to read your Excel file, and then tNormalize to split your special column into individual rows with a separator of ";". After that, use tExtractDelimitedFields with a separator of ":" to split the normalized column into name and grade columns. Then you can use a tOracleOutput component to write the result to the database.
While this solution is more verbose than the Java snippet suggested by AlexR, it has the advantage that it stays within Talend's graphical programming model.
// str holds the raw field value, e.g. "toto:12;tata:1;titi:15"
for (String pair : str.split(";")) {
    String[] kv = pair.split(":");
    // at this point you have the separated values
    String name = kv[0];
    String grade = kv[1];
    dbInsert(name, grade);
}
Now you have to implement dbInsert(). Do it either using JDBC or using any higher-level tool (e.g. Hibernate, iBatis, JDO, JPA, etc.).
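For the JDBC route, a minimal sketch of dbInsert() (the table and column names are assumed from the example above):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class GradeDao {
    private final Connection connection;

    public GradeDao(Connection connection) {
        this.connection = connection;
    }

    // Inserts one name/grade pair into the grade table
    public void dbInsert(String name, String grade) throws SQLException {
        String sql = "INSERT INTO grade (name, grade) VALUES (?, ?)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, name);
            ps.setString(2, grade);
            ps.executeUpdate();
        }
    }
}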
