How can I read this unstructured flat file in Java? - java

I have a text file.
I am trying to read it into a two-dimensional array.
Does anyone have example code, or know of a question where this was already answered?

Treat the file as being divided down the middle into two records that share the same format. First design a class that contains the fields you want to extract from the file. Then read all the lines:
List<String> fileLines = Files.readAllLines(pathToYourFile, StandardCharsets.UTF_8); // pathToYourFile is a java.nio.file.Path; use whichever Charset your file is encoded in
and parse them with the help of regular expressions. To simplify the task, read the lines first and then apply a separate regular expression to each line.
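class UnstructuredFile {
    private final List<String> rawLines;

    public UnstructuredFile(List<String> rawLines) {
        this.rawLines = rawLines;
    }

    public List<FileRecord> readAllRecords() {
        // Determine where each record starts and stops, e.g. rawLines.subList(0, 5),
        // or split rawLines into a List<List<String>>, then call readOneRecord on each part.
        throw new UnsupportedOperationException("record boundaries depend on your file");
    }

    private FileRecord readOneRecord(List<String> record) {
        // Parse a single record from its lines.
        throw new UnsupportedOperationException("parsing depends on your record format");
    }
}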
In this class we first detect the start and end of every record, and then pass each record's lines to the method that parses one FileRecord from a List.
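The record-detection step could be sketched as follows. This is only an illustration under an assumption that is not stated in the question: that records are separated by a line consisting only of dashes. The class name RecordSplitter and the method splitRecords are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups raw lines into records, assuming each record
// is delimited by a line made up only of dashes (for example "------").
class RecordSplitter {

    static List<List<String>> splitRecords(List<String> rawLines) {
        List<List<String>> records = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : rawLines) {
            String trimmed = line.trim();
            if (trimmed.matches("-{3,}")) {
                // A separator closes the record collected so far (if any).
                if (!current.isEmpty()) {
                    records.add(current);
                    current = new ArrayList<>();
                }
            } else if (!trimmed.isEmpty()) {
                current.add(trimmed);
            }
        }
        // Keep a trailing record that is not followed by a separator.
        if (!current.isEmpty()) {
            records.add(current);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("------", "data 1", "data 2", "------", "data 3", "------");
        System.out.println(splitRecords(lines)); // prints [[data 1, data 2], [data 3]]
    }
}
```

Each inner list can then be handed to a method such as readOneRecord. If your file marks record boundaries differently, only the separator test needs to change.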
You may want to decompose the task even further. Suppose you have one record:
------
data 1
data 2
data 3
------
We could create classes RecordRowOne, RecordRowTwo, and so on. Each class holds a regex that knows how
to parse one particular line of the record and returns the parsed values, for example:
class RecordRowOne {
    private int dataOne; // value parsed from the line

    public RecordRowOne(String regex, String dataToParse) {
        // apply regex to dataToParse and store the captured value in dataOne
    }

    int getDataOne() {
        return dataOne;
    }
}
The other row classes in this example would have similar methods, such as
getDataTwo();
Once you have created all the row classes, pass them to the FileRecord class,
which collects the data from every row class and represents one record of your file:
class FileRecord {
//fields
public FileRecord(RecordRowOne one, RecordRowTwo two) {
//get all data from rows and set it to fields
}
//all getters for fields
}
That is the basic idea.

Related

Extract values from embedded JSON string in java [closed]

I need help extracting a JSON array string into an array of objects so that it can be later processed.
The JSON string is embedded as a value within a pipe delimited string that is itself an XML element value.
A sample string is as below
<MSG>registerProfile|.|D|D|B95||43|5000|43100||UBSROOT43|NA|BMP|508|{"biometrics":{"fingerprints":{"fingerprints":[{"position":"RIGHT_INDEX","image":{"format":"BMP","resolutionDpi":"508","data":"Qk12WQEAAAAAADYAAAA="}},{"position":"LEFT_INDEX","image":{"format":"BMP","resolutionDpi":"508","data":"Qk12WQEADoAAAA"}}]}}}</MSG>
How can I extract the JSON properties and store them in separate arrays like
Format[0] =BMP
Position[0] =RIGHT_INDEX
Data[0]=Qk12WQEAAAAAADYAAAA=
Format[1] =BMP
Position[1]=LEFT_INDEX
Data[1]= Qk12WQEADoAAAA
These objects would then be passed to a separate function like below
FingerprintImage(Format[0],Position[0],Data[0]);
// ...
FingerprintImage(Format[1],Position[1],Data[1]);
// ...
public FingerprintImage(String format, String position, String data) {
setFormat(format);
setPosition(position);
setData(data);
}
I am not a Java developer, but the following will hopefully be helpful to you or to others who can provide more succinct Java syntax.
First, we should identify the three different layers of data serialization in your value:
<MSG></MSG> This is an outer XML element, so the first step is to interpret the value as an XML fragment and extract the element's text value.
The reason we use XML deserialization at this top level, rather than plain string positions, is that the inner values may have been XML-escaped, so we need to decode the inner value using the XML encoding rules.
This leaves us with the string value: registerProfile|.|D|D|B95||43|5000|43100||UBSROOT43|NA|BMP|508|{"biometrics":{"fingerprints":{"fingerprints":[{"position":"RIGHT_INDEX","image":{"format":"BMP","resolutionDpi":"508","data":"Qk12WQEAAAAAADYAAAA="}},{"position":"LEFT_INDEX","image":{"format":"BMP","resolutionDpi":"508","data":"Qk12WQEADoAAAA"}}]}}}
The next level is pipe-delimited, which is the same as CSV except that the delimiter is a | instead of a comma. Usually there are no other encoding rules, because | isn't considered part of the normal lexical domain, so we shouldn't need any further escaping.
You could therefore split this string into an array.
The value we are interested in is the 15th element in the array. Either you know this in advance, or you can simply iterate through the elements to find the first one that starts with {
This leaves a JSON value: {"biometrics":{"fingerprints":{"fingerprints":[{"position":"RIGHT_INDEX","image":{"format":"BMP","resolutionDpi":"508","data":"Qk12WQEAAAAAADYAAAA="}},{"position":"LEFT_INDEX","image":{"format":"BMP","resolutionDpi":"508","data":"Qk12WQEADoAAAA"}}]}}}
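The three layers described above can be peeled off with the JDK alone. The following is a minimal sketch, not the answer's own code: the class name EmbeddedJsonExtractor and the method extractJson are hypothetical, and it assumes that the JSON itself never contains a | character and that exactly one field starts with {. The sample message is shortened for readability.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: peels off the XML and pipe-delimited layers
// to isolate the embedded JSON text.
class EmbeddedJsonExtractor {

    static String extractJson(String xml) {
        String payload;
        try {
            // Layer 1: parse the XML so that escaped characters (&amp;, &lt;, ...) are decoded.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            payload = doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML fragment", e);
        }
        // Layer 2: split on the pipe; the limit -1 keeps trailing empty fields.
        // Assumption: the JSON itself never contains a '|' character.
        for (String field : payload.split("\\|", -1)) {
            // Layer 3: the first field that starts with '{' is taken to be the JSON.
            if (field.startsWith("{")) {
                return field;
            }
        }
        throw new IllegalArgumentException("No JSON field found");
    }

    public static void main(String[] args) {
        String xml = "<MSG>registerProfile|.|D||{\"position\":\"RIGHT_INDEX\"}</MSG>";
        System.out.println(extractJson(xml)); // prints {"position":"RIGHT_INDEX"}
    }
}
```

If the position of the JSON field is known in advance (the 15th element in the full sample), indexing into the split array directly avoids the scan.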
Now that we have isolated the inner value in JSON format, the usual thing to do next is deserialize this value into an object.
I know OP is asking for arrays, but we can realize JSON objects as arrays if we really want to with the right tools.
In C# the above process is pretty simple, and I'm sure it should be in Java as well, but my attempts keep throwing errors.
So, let's instead assume (I know... Ass-U-Me...) that there is only ever a single JSON value in the pipe-delimited array. With this knowledge we can isolate the JSON using String.indexOf and String.lastIndexOf:
String xml = "<MSG>registerProfile|.|D|D|B95||43|5000|43100||UBSROOT43|NA|BMP|508|{\"biometrics\":{\"fingerprints\":{\"fingerprints\":[{\"position\":\"RIGHT_INDEX\",\"image\":{\"format\":\"BMP\",\"resolutionDpi\":\"508\",\"data\":\"Qk12WQEAAAAAADYAAAA=\"}},{\"position\":\"LEFT_INDEX\",\"image\":{\"format\":\"BMP\",\"resolutionDpi\":\"508\",\"data\":\"Qk12WQEADoAAAA\"}}]}}}</MSG>";
int start = xml.indexOf('{');
int end = xml.lastIndexOf('}') + 1; // +1 because we want to include the last character, so we need the index after it
String json = xml.substring(start, end);
results in: {"biometrics":{"fingerprints":{"fingerprints":[{"position":"RIGHT_INDEX","image":{"format":"BMP","resolutionDpi":"508","data":"Qk12WQEAAAAAADYAAAA="}},{"position":"LEFT_INDEX","image":{"format":"BMP","resolutionDpi":"508","data":"Qk12WQEADoAAAA"}}]}}}
Formatted to be pretty:
{
"biometrics": {
"fingerprints": {
"fingerprints": [
{
"position": "RIGHT_INDEX",
"image": {
"format": "BMP",
"resolutionDpi": "508",
"data": "Qk12WQEAAAAAADYAAAA="
}
},
{
"position": "LEFT_INDEX",
"image": {
"format": "BMP",
"resolutionDpi": "508",
"data": "Qk12WQEADoAAAA"
}
}
]
}
}
}
One way would be to create a class structure that matches this whole JSON value, so that we can simply call .fromJson() on it. Instead, let's meet halfway, so that we only need to define the inner class structure for the data we will actually use.
Now from this structure we can see there is an outer object that has only a single property called biometrics. This value is again an object with a single property called fingerprints. The value of that property is another object with a single property, also called fingerprints, except that this time it has an array value.
The following is a proof of concept in Java. I have included first an example using deserialization (with the gson library) and after that a similar implementation using only the simple-JSON library to read the values into arrays.
Try it out on JDoodle.com
MyClass.java
import java.util.*;
import java.lang.*;
import java.io.*;
//import javax.json.*;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;
import com.google.gson.Gson;
public class MyClass {
public static void main(String args[]) {
String xml = "<MSG>registerProfile|.|D|D|B95||43|5000|43100||UBSROOT43|NA|BMP|508|{\"biometrics\":{\"fingerprints\":{\"fingerprints\":[{\"position\":\"RIGHT_INDEX\",\"image\":{\"format\":\"BMP\",\"resolutionDpi\":\"508\",\"data\":\"Qk12WQEAAAAAADYAAAA=\"}},{\"position\":\"LEFT_INDEX\",\"image\":{\"format\":\"BMP\",\"resolutionDpi\":\"508\",\"data\":\"Qk12WQEADoAAAA\"}}]}}}</MSG>";
int start = xml.indexOf('{');
int end = xml.lastIndexOf('}') + 1; // +1 because we want to include the last character, so we need the index after it
String jsonString = xml.substring(start, end);
JSONParser parser = new JSONParser();
Gson gson = new Gson();
try
{
// locate the fingerprints inner array using simple-JSON (org.apache.clerezza.ext:org.json.simple:0.4 )
JSONObject jsonRoot = (JSONObject) parser.parse(jsonString);
JSONObject biometrics = (JSONObject)jsonRoot.get("biometrics");
JSONObject fpOuter = (JSONObject)biometrics.get("fingerprints");
JSONArray fingerprints = (JSONArray)fpOuter.get("fingerprints");
// Using de-serialization from gson (com.google.code.gson:gson:2.8.6)
FingerPrint[] prints = new FingerPrint[fingerprints.size()];
for(int i = 0; i < fingerprints.size(); i ++)
{
JSONObject fpGeneric = (JSONObject)fingerprints.get(i);
prints[i] = gson.fromJson(fpGeneric.toString(), FingerPrint.class);
}
// Call the FingerprintImage function using the FingerPrint objects
System.out.print("FingerPrint Object Index: 0");
FingerprintImage(prints[0].image.format, prints[0].position, prints[0].image.data );
System.out.println();
System.out.print("FingerPrint Object Index: 1");
FingerprintImage(prints[1].image.format, prints[1].position, prints[1].image.data );
System.out.println();
// ALTERNATE Array Implementation (doesn't use gson)
String[] format = new String[fingerprints.size()];
String[] position = new String[fingerprints.size()];
String[] data = new String[fingerprints.size()];
for(int i = 0; i < fingerprints.size(); i ++)
{
JSONObject fpGeneric = (JSONObject)fingerprints.get(i);
position[i] = (String)fpGeneric.get("position");
JSONObject image = (JSONObject)fpGeneric.get("image");
format[i] = (String)image.get("format");
data[i] = (String)image.get("data");
}
System.out.print("Generic Arrays Index: 0");
FingerprintImage(format[0], position[0], data[0] );
System.out.println();
System.out.print("Generic Arrays Index: 1");
FingerprintImage(format[1], position[1], data[1] );
System.out.println();
}
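catch (ParseException e) {
// Report the failure instead of silently swallowing it
System.err.println("Could not parse JSON: " + e);
}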
}
public static void FingerprintImage(String format, String position, String data) {
setFormat(format);
setPosition(position);
setData(data);
}
public static void setFormat(String format) {
System.out.print(", Format=" + format);
}
public static void setPosition(String position) {
System.out.print(", Position=" + position);
}
public static void setData(String data) {
System.out.print(", Data=" + data);
}
}
FingerPrint.java
public class FingerPrint {
public String position;
public FingerPrintImage image;
}
FingerPrintImage.java
public class FingerPrintImage {
public String format;
public int resolutionDpi; // field name must match the JSON key "resolutionDpi"
public String data;
}
Deserialization techniques are generally considered superior to forced/manual parsing, especially when we need to pass around references to multiple parsed values. In the above example, by simply reading format, position and data into separate arrays, the relationship between them has become decoupled. Through our code we can still use them together as long as we use the same array index, but the structure no longer defines the relationship between the values. Deserializing into a typed structure preserves the relationship between values and simplifies the task of passing around values that are related to each other.
update
If you used deserialization, you could pass the equivalent FingerPrint object to any method that needs it, instead of passing the related values individually. Beyond this, you could simply pass around the entire array of FingerPrint objects.
public static void FingerprintImage(FingerPrint print) {
setFormat(print.image.format);
setPosition(print.position);
setData(print.image.data);
}
To process multiple FingerPrint objects in a batch, change the method to accept an array: FingerPrint[]
You could use the same technique to process arrays of each of the Format, Position and Data values, though it is really poor practice to do so. Passing around multiple arrays requires the receiving code to know that the arrays must be interpreted in sync, that is, that the same index in each array corresponds to the same fingerprint. This level of implementation detail is too ambiguous and will lead to maintenance nightmares down the track. It's far better to become proficient in OO concepts and to create business objects for passing around related data elements, instead of packaging everything into disassociated arrays.
The following code can assist you in processing multiple items using OP's array method, but it should highlight why the practice is a bad habit to pick up:
public static void FingerprintImage(String[] formats, String[] positions, String[] datas) {
// now you must iterate each of the arrays using the same index
// however as there are no restrictions on the arrays, for each array
// and each index we should be checking that the array has not gone out
// of length.
}
From an OO point of view, passing multiple arrays like this raises a number of issues. First, the developer simply needs to know that the same index must be used in each array to retrieve correlated information.
The next important issue is error handling...
If datas only has 1 element, but positions has 2 elements, which of the 2 elements does the 1 data element belong to? Or does this indicate that the same data should be used for both?
There are many other issues, consider when you expect 3 elements...
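To make the length problem concrete, here is a minimal sketch of the checking that parallel arrays force on the receiving code. The names ParallelArrays, Print and zip are hypothetical and are not part of the answer above; the sketch simply rejects mismatched lengths instead of guessing which elements belong together.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelArrays {

    // Hypothetical value type that keeps the three related values together.
    static final class Print {
        final String format;
        final String position;
        final String data;

        Print(String format, String position, String data) {
            this.format = format;
            this.position = position;
            this.data = data;
        }
    }

    // Converts index-aligned arrays into objects, rejecting mismatched lengths up front.
    static List<Print> zip(String[] formats, String[] positions, String[] datas) {
        if (formats.length != positions.length || positions.length != datas.length) {
            throw new IllegalArgumentException("formats, positions and datas must have the same length");
        }
        List<Print> prints = new ArrayList<>();
        for (int i = 0; i < formats.length; i++) {
            prints.add(new Print(formats[i], positions[i], datas[i]));
        }
        return prints;
    }
}
```

Once the values live in one object per fingerprint, the mismatch can no longer occur further down the call chain, which is the point the answer is making.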
While you can get away with what seems like a shortcut in code if you really need to, you shouldn't unless you fully understand what you are doing, you fully document the related code, and you take responsibility for the potential fallout down the track.

CSV to a String Array - JAVA

I have a CSV file which has only one column with 100+ rows. I would like to put those values into a one-dimensional array (only if it's possible), so that it works the same as if I had written the string array manually, i.e.
String[] username = {"lalala", "tatata", "mamama"}; //<---if I did it manually
String[] username = {after passing the CSV values}; //<---I want this like the above ones.
Later I would like to be able to use that class from a different class. Say the class that holds the array is called ArrayClass; I would like to be able to instantiate it in a different class, like this:
public class MainClass{
ArrayClass array = new ArrayClass();
//Then I would like to be able to do this
someMethod(array.username);
}
I know I asked a lot of things but I seriously appreciate all your help. Even if you see this question and say THIS IS BS. Oh and one more thing I would prefer it to be in JAVA.
It might be easier to use an ArrayList rather than an array, as you don't have to worry about the number of rows. An array has a fixed size that can't be changed, whereas an ArrayList grows as needed.
As you have only one column, you will not need to worry about commas in the CSV.
Example code would look something like this:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

public class MyClass {
    private final ArrayList<String> myArray = new ArrayList<String>();
    private Scanner scan;

    public MyClass() {
        try {
            scan = new Scanner(new File("MyFile.csv"));
        } catch (FileNotFoundException ex) {
            System.out.println("File Not Found");
        }
    }

    public ArrayList<String> getArray() {
        // scan is null if the file could not be opened
        while (scan != null && scan.hasNextLine()) {
            String line = scan.nextLine().trim();
            if (!line.isEmpty()) { // skip blank lines
                myArray.add(line);
            }
        }
        return myArray;
    }
}
And in the main:
MyClass f = new MyClass();
System.out.println(f.getArray());
If it's just a CSV, you can use the split method of String with a proper regex.
Do check the documentation for the split method.
The first half of your question is easy and can be handled in a number of different ways. Personally, I would use the Scanner class and set the delimiter to ",". Create a new Scanner object and then call useDelimiter(",") on it. Then simply scan through the tokens. See the example in the documentation. This approach is effective because it handles both reading the file and separating it based on your criteria (the ',' character).
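A minimal sketch of the Scanner approach follows. The class name CsvColumnReader and the method readValues are hypothetical. The delimiter pattern here treats line breaks as separators too, which is an assumption made because the file in the question has one value per row rather than comma-separated values on one line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

class CsvColumnReader {

    // Reads every value from the scanner into a String[].
    static String[] readValues(Scanner scanner) {
        // Treat commas and line breaks (with optional surrounding whitespace) as separators.
        scanner.useDelimiter("\\s*[,\\r\\n]+\\s*");
        List<String> values = new ArrayList<>();
        while (scanner.hasNext()) {
            String token = scanner.next().trim();
            if (!token.isEmpty()) {
                values.add(token);
            }
        }
        // Convert the growable list into the fixed-size array the question asks for.
        return values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // A String source keeps the demo self-contained; for a real file,
        // pass new Scanner(new File("MyFile.csv")) instead.
        String[] username = readValues(new Scanner("lalala\ntatata\nmamama\n"));
        System.out.println(Arrays.toString(username)); // prints [lalala, tatata, mamama]
    }
}
```

The resulting array can be exposed as a field or through a getter, so that another class can call someMethod(array.username) as in the question.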

Design pattern for assembling disparate data?

I am designing a system which assembles disparate data into a standard row/column type output.
Each column can:
Exist in an independent system.
Can be paginated.
Can be sorted.
Each column can contain millions of rows.
And the system:
Needs to be extensible so different tables of different columns can be outputted.
The final domain object is known (the row).
The key is constant across all systems.
My current implementation plan is to design two classes per column (or one column class that implements two interfaces). The interfaces would:
Implement pagination and sorting.
Implement "garnishing".
The idea is that the table constructor would receive information about the current sort column and page. From this it would obtain a list of the appropriate keys for the table. These keys would be used to create a list of domain-object rows, which would then be passed in turn to each column's "garnishing" implementation so that each column's information could be added.
I guess my question is: what design patterns would be recommended, or what alternative design decisions would people use, for assembling disparate data with common keys and variable columns?
I'm not sure I completely understood what you're trying to do, but from what I gather, you want to store rows of arbitrary data in a way that will allow you to build structured tables from it later on. What I would do in this case (assuming you're using Java) is make a very simple Column interface that just exposes a "value" property:
public interface Column {
    // interfaces cannot declare instance fields, so the value is exposed through a getter
    String getValue();
}
Then, you could make columns by implementing Column:
public class Key implements Column {
    private final String value;

    public Key(String keyValue) {
        this.value = keyValue;
    }

    @Override
    public String getValue() {
        return value;
    }
}
So then you can make a class called DataRow (or whatever you like) whose objects would contain the actual data. For example, you could have a method in that class that allows you to add data:
public class DataRow {
    private final List<Column> data = new ArrayList<Column>();

    public DataRow(String key) {
        this.setColumn(new Key(key));
    }

    public void setColumn(Column columnData) {
        this.data.add(columnData);
    }

    // Returns the first column of exactly the given class, or null if the row has none.
    public Column getColumn(Class<? extends Column> column) {
        for (Column c : this.data) {
            if (c.getClass().equals(column)) {
                return c;
            }
        }
        return null;
    }
}
As you can see, you can call the method setColumn() by giving it a new Column object. This allows you to add data of any type to the DataRow object. Then, to make some tables, you could have a function that takes a List of DataRows and a List of classes, and returns rows that contain only the specified columns:
public List<DataRow> createTable(List<DataRow> data, List<Class<? extends Column>> columns) {
    List<DataRow> table = new ArrayList<DataRow>();
    for (DataRow row : data) {
        DataRow ret = new DataRow(row.getColumn(Key.class).getValue());
        for (Class<? extends Column> column : columns) {
            // column is already a Class object; column.getClass() would look up Class itself
            Column match = row.getColumn(column);
            if (match != null) {
                ret.setColumn(match);
            }
        }
        table.add(ret);
    }
    return table;
}
This will allow you to "create" tables using your data, and the columns you want to include in the table.
Note that I wrote this code to convey an idea, and that it's pretty messy at the moment. But I hope this will help you in some small way.
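For comparison, the plan described in the question (one interface for paging and sorting that yields keys, and one "garnishing" interface per column) could be sketched as below. Every name here (KeySource, Garnisher, TableAssembler, assemble) is hypothetical, rows are simplified to maps, and keys are assumed to be strings; treat it as an illustration of the shape of the design rather than a recommended implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: implemented by the column the table is currently sorted by.
interface KeySource {
    // Returns the keys of one page, already in sort order.
    List<String> keys(int page, int pageSize, boolean ascending);
}

// Hypothetical: implemented by every column to add its value to each row.
interface Garnisher {
    void garnish(Map<String, Map<String, Object>> rowsByKey);
}

class TableAssembler {

    static List<Map<String, Object>> assemble(KeySource sortColumn, List<Garnisher> columns,
                                              int page, int pageSize, boolean ascending) {
        // LinkedHashMap keeps the rows in the order the sort column returned the keys.
        Map<String, Map<String, Object>> rowsByKey = new LinkedHashMap<>();
        for (String key : sortColumn.keys(page, pageSize, ascending)) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("key", key);
            rowsByKey.put(key, row);
        }
        // Each column looks up its own data for the page's keys and fills it in.
        for (Garnisher column : columns) {
            column.garnish(rowsByKey);
        }
        return new ArrayList<>(rowsByKey.values());
    }
}
```

Because each Garnisher receives the whole page of keys at once, a column backed by a remote system can fetch its values in one batch instead of one call per row.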

Loading a CSV file and creating new class Instances from the values

I have a class defined like this, with the appropriate getter and setter methods...
public class Album {
private int id;
private String artist;
private String name;
private int published;
}
I also have a .csv file that stores this content for a number of Albums. In the file, one line represents one Album.
I'm trying to read the information from the .csv file, and then use the setters of the Album class to assign the values. Here is my code...
public Map<Integer, Album> load() {
Scanner scanner = new Scanner(fileName);
Map<Integer, Album> loadedAlbums = new HashMap<Integer, Album>();
while(scanner.hasNextLine()) {
Album album = new Album();
String[] albumDivided = scanner.nextLine().split(",");
//in the .csv file every unit of information is divided by a comma.
album.setId(Integer.parseInt(albumDivided[0])); //this is line 11.
album.setArtist(albumDivided[1]);
album.setName(albumDivided[2]);
album.setPublished(Integer.parseInt(albumDivided[3]));
loadedAlbums.put(album.getId(), album);
}
return loadedAlbums;
}
However, trying to use this code, I get the following Exception:
java.lang.NumberFormatException: For input string: "albums.csv" at line 11.
Could you please help me to understand the cause of this problem.
Well the problem is described to you by the Exception...
A NumberFormatException would have been triggered by one of your Integer.parseInt() calls. The line of your code that triggers the exception is line 11 (as per the exception message). I'm not sure which one that is, but it's probably the first Integer.parseInt() line.
Your code is trying to convert the value "albums.csv" to a number, which it obviously isn't. So somewhere the value albums.csv is being read where a number is expected: either a line in your CSV file contains it, or the Scanner is not reading the file's contents at all.
Hope this helps pinpoint the problem.
Since you don't want the whole solution here is a hint to resolve your problem:
You should take a look at the API documentation of the Scanner class. Take a really close look at the constructor that expects a single String parameter (as you use it in your code).
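The difference that hint points at can be seen in a small experiment. This is a sketch with hypothetical names (ScannerSourceDemo and its methods) and a made-up CSV row; it assumes fileName in the question is a String holding "albums.csv".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class ScannerSourceDemo {

    // Scanner(String) scans the characters of the string itself.
    static String firstLineOfString(String source) {
        try (Scanner scanner = new Scanner(source)) {
            return scanner.nextLine();
        }
    }

    // Scanner(Path) opens the file and scans its contents.
    static String firstLineOfFile(Path file) {
        try (Scanner scanner = new Scanner(file)) {
            return scanner.nextLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file so the demo does not depend on a real albums.csv.
    static Path writeTemp(String content) {
        try {
            Path file = Files.createTempFile("albums", ".csv");
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The "line" is the file name itself, which Integer.parseInt cannot handle.
        System.out.println(firstLineOfString("albums.csv"));
        // Here the line comes from the file's contents (made-up sample row).
        System.out.println(firstLineOfFile(writeTemp("1,Some Artist,Some Album,1999\n")));
    }
}
```

In the question's code, constructing the Scanner from a File or Path instead of from the file-name string is the corresponding change.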
As far as I can tell, albumDivided[0] will contain "1.", which cannot be parsed as an integer because of the dot. Either remove the dot from your CSV file, or build a new string without the dot before you parse it. The approach might look something like this:
String newString = ""; // must be initialized before appending to it
for (int i = 0; i < albumDivided[0].length() - 1; i++) { // length() - 1 to drop the trailing dot
    newString = newString + albumDivided[0].charAt(i); // copy each char of albumDivided[0] into the new string
}

how to perform joins in java without database

I need to perform joins on 2 tables (that I have read from 2 CSV files) without using a database. I have no experience with collections (List, ArrayList). If anyone can give a detailed piece of sample code for any one type of join, that would be helpful.
For example I have 2 lists :
a=[2,3,4]
b=[3,4,5]
If it is an inner join
output: [3,4]
Tried so far:
for i in a:
for j in i:
if (i==j):
print(i)
Assuming that you have the following CSV files:
id,name,description
1,Foo,FooBar
2,Bar,BarFo
3,Hey,Ho
and the second one:
id,year
2,1990
1,1923
Then you could have the following structures (I'm skipping the constructors and methods for now):
public class Item {
public String name;
public String description;
}
and the second:
public class Date {
public final int year;
}
Then you could have a third one:
public class Joined {
public final Item item;
public final Date date;
}
And then you could have a Map<Integer,Joined>, and you can read the first CSV and create the Joined objects with only the Item part filled out, then read the second CSV and you could fill up the Date part of the Joined object.
In this joining part, you can decide which joining type you want to implement.
If you have a different key, then you have to change the key of the Map, or you may need to create a new class if you have a complex key.
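The Map-based idea above can be sketched for an inner join as follows. The class name CsvJoin and its methods are hypothetical; the sketch assumes that each file starts with a header line, that the id is the first field, that no field contains a comma, and that the lines have already been read into lists (for example with Files.readAllLines).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvJoin {

    // Indexes the data rows by their integer id (first field), skipping the header line.
    static Map<Integer, String[]> indexById(List<String> csvLines) {
        Map<Integer, String[]> byId = new LinkedHashMap<>();
        for (int i = 1; i < csvLines.size(); i++) { // start at 1 to skip the header
            String[] fields = csvLines.get(i).split(",");
            byId.put(Integer.parseInt(fields[0].trim()), fields);
        }
        return byId;
    }

    // Inner join: only ids present in both inputs produce an output row.
    static List<String> innerJoin(List<String> left, List<String> right) {
        Map<Integer, String[]> rightById = indexById(right);
        List<String> joined = new ArrayList<>();
        for (Map.Entry<Integer, String[]> entry : indexById(left).entrySet()) {
            String[] match = rightById.get(entry.getKey());
            if (match != null) {
                // Left row followed by the right row's second field (the year).
                joined.add(String.join(",", entry.getValue()) + "," + match[1]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> items = List.of("id,name,description", "1,Foo,FooBar", "2,Bar,BarFo", "3,Hey,Ho");
        List<String> years = List.of("id,year", "2,1990", "1,1923");
        System.out.println(innerJoin(items, years)); // prints [1,Foo,FooBar,1923, 2,Bar,BarFo,1990]
    }
}
```

For a left join, the `match != null` check would be replaced by emitting the left row with an empty year when no match exists. The hash lookup keeps the join roughly linear in the total number of rows, unlike the nested loops in the question's attempt.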
