There are lots of questions on how to format the results of an SQL query as an HTML table, but I'd like to go the other way: given an arbitrary HTML table with a header row, I'd like to be able to extract information from one or more rows using SQL (or an SQL-like language). Simple to state, but apparently not so simple to accomplish.
Ultimately, I'd prefer to parse the HTML properly with something like libtidy or JSoup, but while the API documentation is usually reasonable, when it comes to examples or tutorials on actually using them, you usually find an example of extracting the <title> tag (which could be accomplished with regexes) with no real-world examples of how to use the library. So, a good resource or example code for one of the existing, established libraries would also be good.
A simple piece of code that transforms a table into a list of tuples using JSoup looks like this:
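import java.util.LinkedList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Main {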
public static void main(String[] args) throws Exception {
final String html =
"<html><head/><body>" +
"<table id=\"example\">" +
"<tr><td>John</td><td>Doe</td></tr>" +
"<tr><td>Michael</td><td>Smith</td></tr>" +
"</table>" +
"</body></html>";
final List<Tuple> tuples = parse (html, "example");
//... Here the table is parsed
}
private static final List<Tuple> parse(final String html, final String tableId) {
final List<Tuple> tuples = new LinkedList<Tuple> ();
final Element table = Jsoup.parse (html).getElementById(tableId);
final Elements rows = table.getElementsByTag("tr");
for (final Element row : rows) {
final Elements children = row.children();
final int childCount = children.size();
final Tuple tuple = new Tuple (childCount);
for (final Element child : children) {
tuple.addColumn (child.text ());
}
tuples.add(tuple); // without this the method always returns an empty list
}
return tuples;
}
}
public final class Tuple {
private final String[] columns;
private int cursor;
public Tuple (final int size) {
columns = new String[size];
cursor = 0;
}
public String getColumn (final int no) {
return columns[no];
}
public void addColumn(final String value) {
columns[cursor++] = value;
}
}
From here you can, for example, load the tuples into an in-memory H2 table and query it with regular SQL.
I have an unbounded stream of complex objects that I want to load into BigQuery. The structure of these objects represents the schema of my destination table in BigQuery.
The problem is that, since there are a lot of nested fields in the POJO, it's an extremely tedious task to convert it to a TableSchema object by hand. I'm looking for a quick, automated way to convert my POJO to a TableSchema object while writing to BigQuery.
I'm not very familiar with the Apache Beam API, and any help will be appreciated.
In a pipeline, I load a list of schemas from GCS. I keep them in string format because TableSchema is not serializable. However, I first load each one into a TableSchema to validate it.
Then I add them, in string format, to a map in the Options object.
String schema = new String(blob.getContent());
// Decorate list of fields for allowing a correct parsing
String targetSchema = "{\"fields\":" + schema + "}";
try {
//Preload schema to ensure validity, but then use string version
Transport.getJsonFactory().fromString(targetSchema, TableSchema.class);
String tableName = blob.getName().replace(SCHEMA_FILE_PREFIX, "").replace(SCHEMA_FILE_SUFFIX, "");
tableSchemaStringMap.put(tableName, targetSchema);
} catch (IOException e) {
logger.warn("impossible to read schema " + blob.getName() + " in bucket gs://" + options.getSchemaBucket());
}
I didn't find another solution when I developed this.
In my company I created a kind of ORM (we call it OBQM) to do this. We are hoping to release it to the public. The code is quite big (especially because I created annotations and so on), but I can share some snippets for quick schema generation:
public TableSchema generateTableSchema(@Nonnull final Class cls) {
final TableSchema tableSchema = new TableSchema();
tableSchema.setFields(generateFieldsSchema(cls));
return tableSchema;
}
public List<TableFieldSchema> generateFieldsSchema(@Nonnull final Class cls) {
final List<TableFieldSchema> schemaFields = new ArrayList<>();
final Field[] clsFields = cls.getFields();
for (final Field field : clsFields) {
schemaFields.add(fromFieldToSchemaField(field));
}
return schemaFields;
}
This code takes all the public fields from the POJO class (getFields() does not return private ones) and creates a TableSchema object (the one that BigQueryIO uses in Apache Beam). You can see a method that I created called fromFieldToSchemaField. This method identifies each field's type and sets the field name, mode, description and type. To keep it simple here, I'm going to focus on the type and name:
public static TableFieldSchema fromFieldToSchemaField(@Nonnull final Field field) {
return fromFieldToSchemaField(field, 0);
}
public static TableFieldSchema fromFieldToSchemaField(
@Nonnull final Field field,
final int iteration) {
final TableFieldSchema schemaField = new TableFieldSchema();
final String customType = field.getGenericType().getTypeName(); // not used in this short version
schemaField.setName(field.getName());
schemaField.setMode("NULLABLE"); // You can add better logic here, we use annotations to override this value
schemaField.setType(getFieldTypeString(field));
schemaField.setDescription("Optional"); // Optional
if (iteration < MAX_RECURSION
&& (isStruct(schemaField.getType())
|| isRecord(schemaField.getType()))) {
final List<TableFieldSchema> schemaFields = new ArrayList<>();
final Field[] fields = getFieldsFromComplexObjectField(field);
for (final Field subField : fields) {
schemaFields.add(
fromFieldToSchemaField(
subField, iteration + 1));
}
schemaField.setFields(schemaFields.isEmpty() ? null : schemaFields);
}
return schemaField;
}
And now the method that returns the BigQuery field type.
public static String getFieldTypeString(@Nonnull final Field field) {
// My real code is much more complex; this is a shortened version
// getType() returns the raw class, so generic fields such as List<String> do not cause a ClassCastException
final Class<?> cls = field.getType();
if (cls.isAssignableFrom(String.class)) {
return "STRING";
} else if (cls.isAssignableFrom(Integer.class) || cls.isAssignableFrom(Short.class)) {
return "INT64";
} else if (cls.isAssignableFrom(Double.class)) {
return "NUMERIC";
} else if (cls.isAssignableFrom(Float.class)) {
return "FLOAT64";
} else if (cls.isAssignableFrom(Boolean.class)) {
return "BOOLEAN";
} else if (cls.isAssignableFrom(byte[].class)) { // the original repeated Double.class here, which made this branch unreachable
return "BYTES";
} else if (cls.isAssignableFrom(Date.class)
|| cls.isAssignableFrom(DateTime.class)) {
return "TIMESTAMP";
} else {
return "STRUCT";
}
}
Keep in mind that I'm not showing how to identify primitive types or arrays. But this is a good start for your code :). Please let me know if you need any help.
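For the primitive types and arrays that the short version leaves out, one possible approach is to unwrap arrays and collections to their element class and report them as REPEATED. This is only a sketch under assumptions: `FieldTypeSketch`, `Sample`, `describe` and the other names are hypothetical and are not part of the OBQM code, and mapping `double` to FLOAT64 is a choice made for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Collection;
import java.util.Date;
import java.util.List;

class FieldTypeSketch {

    /** Hypothetical POJO, used only to exercise the mapping below. */
    static class Sample {
        public String name;
        public int count;
        public double ratio;
        public boolean active;
        public byte[] payload;
        public Date created;
        public List<String> tags;
        public long[] samples;
        public Sample child;
    }

    /** Arrays (other than byte[]) and collections are treated as REPEATED columns. */
    static boolean isRepeated(Field field) {
        Class<?> raw = field.getType();
        return (raw.isArray() && raw != byte[].class)
                || Collection.class.isAssignableFrom(raw);
    }

    /** Unwraps arrays and collections to the class of a single element. */
    static Class<?> elementClass(Field field) {
        Class<?> raw = field.getType();
        if (raw.isArray() && raw != byte[].class) {
            return raw.getComponentType();
        }
        if (Collection.class.isAssignableFrom(raw)) {
            Type generic = field.getGenericType();
            if (generic instanceof ParameterizedType) {
                Type arg = ((ParameterizedType) generic).getActualTypeArguments()[0];
                if (arg instanceof Class) {
                    return (Class<?>) arg;
                }
            }
            return Object.class; // raw or wildcard collection: element type unknown
        }
        return raw;
    }

    /** Maps a single element class to a BigQuery type name. */
    static String typeName(Class<?> cls) {
        if (cls == String.class) return "STRING";
        if (cls == byte.class || cls == Byte.class
                || cls == short.class || cls == Short.class
                || cls == int.class || cls == Integer.class
                || cls == long.class || cls == Long.class) return "INT64";
        if (cls == float.class || cls == Float.class
                || cls == double.class || cls == Double.class) return "FLOAT64";
        if (cls == boolean.class || cls == Boolean.class) return "BOOLEAN";
        if (cls == byte[].class) return "BYTES";
        if (Date.class.isAssignableFrom(cls)) return "TIMESTAMP";
        return "STRUCT"; // anything else is assumed to be a nested object
    }

    /** Returns "name: TYPE MODE" for one public field of the given class. */
    static String describe(Class<?> pojo, String fieldName) {
        try {
            Field field = pojo.getField(fieldName);
            String mode = isRepeated(field) ? "REPEATED" : "NULLABLE";
            return field.getName() + ": " + typeName(elementClass(field)) + " " + mode;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No public field named " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        for (Field field : Sample.class.getFields()) {
            System.out.println(describe(Sample.class, field.getName()));
        }
    }
}
```

The type and mode strings produced here could then be passed to setType and setMode in fromFieldToSchemaField.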
If you're using JSON for the message serialization in PubSub, you can make use of one of the provided templates:
PubSub To BigQuery Template
The code for that template is here:
PubSubToBigQuery.java
This is actually a re-do of an older question of mine that I have completely redone because my old question seemed to confuse people.
I have written a Java program that queries a database and is intended to retrieve several rows of data. I previously wrote the program in Informix-4GL, where I used an SQL cursor to loop through the database and store each row in a "dynamic row of record". I understand there is no row of records in Java, so I have ended up with the following code.
public class Main {
// DB CONNECT VARIABLE ===========================
static Connection gv_conn = null;
// PREPARED STATEMENT VARIABLES ==================
static PreparedStatement users_sel = null;
static ResultSet users_curs = null;
static PreparedStatement uinfo_sel = null;
static ResultSet uinfo_curs = null;
// MAIN PROGRAM START ============================
public static void main(String[] args) {
try {
// CONNECT TO DATABASE CODE
} catch(Exception log) {
// YOU FAILED CODE
}
f_prepare(); // PREPARE THE STATEMENTS
ArrayList<Integer> list_id = new ArrayList<Integer>();
ArrayList<String> list_name = new ArrayList<String>();
ArrayList<String> list_info = new ArrayList<String>();
ArrayList<String> list_extra = new ArrayList<String>();
try {
users_sel.setInt(1, 1);
users_curs = users_sel.executeQuery();
// RETRIEVE ROWS FROM USERS
while (users_curs.next()) {
int lv_u_id = users_curs.getInt("u_id");
String lv_u_name = users_curs.getString("u_name");
uinfo_sel.setInt(1, lv_u_id);
uinfo_curs = uinfo_sel.executeQuery();
// RETRIEVE DATA FROM UINFO RELATIVE TO USER
uinfo_curs.next(); // the cursor starts before the first row, so advance before reading columns
String lv_ui_info = uinfo_curs.getString("ui_info");
String lv_ui_extra = uinfo_curs.getString("ui_extra");
// STORE DATA I WANT IN THESE ARRAYS
list_id.add(lv_u_id);
list_name.add(lv_u_name);
list_info.add(lv_ui_info);
list_extra.add(lv_ui_extra);
}
} catch(SQLException log) {
// EVERYTHING BROKE
}
// MAKING SURE IT WORKED
System.out.println(
list_id.get(0) +
list_name.get(0) +
list_info.get(0) +
list_extra.get(0)
);
// TESTING WITH ARBITRARY ROWS
System.out.println(
list_id.get(2) +
list_name.get(5) +
list_info.get(9) +
list_extra.get(14)
);
}
// PREPARE STATEMENTS SEPARATELY =================
public static void f_prepare() {
String lv_sql = null;
try {
lv_sql = "select * from users where u_id >= ?";
users_sel = gv_conn.prepareStatement(lv_sql);
lv_sql = "select * from uinfo where ui_u_id = ?";
uinfo_sel = gv_conn.prepareStatement(lv_sql);
} catch(SQLException log) {
// IT WON'T FAIL COZ I BELIEEEVE
}
}
}
class DBConn {
// connect to SQLite3 code
}
All in all this code works: I can hit the database once, get all the data I need, store it in variables and work with it as I please. However, it does not feel right, and I think it's far from the most suitable way to do this in Java, considering I can do it with only 15 lines of code in Informix-4GL.
Can anyone give me advice on a better way to achieve a similar result?
In order to use Java effectively you need to use custom objects. What you have here is a lot of static methods inside a class. It seems that you are coming from a procedural background, and if you try to use Java as a procedural language, you will not get much value from it. So first off, create a type; you can put it right inside your class or create it as a separate file:
class User
{
final int id;
final String name;
final String info;
final String extra;
User(int id, String name, String info, String extra)
{
this.id = id;
this.name = name;
this.info = info;
this.extra = extra;
}
void print()
{
System.out.println(id + name + info + extra);
}
}
Then the loop becomes:
List<User> list = new ArrayList<User>();
try {
users_sel.setInt(1, 1);
users_curs = users_sel.executeQuery();
// RETRIEVE ROWS FROM USERS
while (users_curs.next()) {
int lv_u_id = users_curs.getInt("u_id");
String lv_u_name = users_curs.getString("u_name");
uinfo_sel.setInt(1, lv_u_id);
uinfo_curs = uinfo_sel.executeQuery();
// RETRIEVE DATA FROM UINFO RELATIVE TO USER
if (uinfo_curs.next()) { // the cursor starts before the first row
String lv_ui_info = uinfo_curs.getString("ui_info");
String lv_ui_extra = uinfo_curs.getString("ui_extra");
User user = new User(lv_u_id, lv_u_name, lv_ui_info, lv_ui_extra);
// STORE DATA
list.add(user);
}
}
} catch(SQLException log) {
// EVERYTHING BROKE
}
// MAKING SURE IT WORKED
list.get(0).print();
This doesn't necessarily address the number of lines. Most people who use Java don't interact with databases with this low-level API but in general, if you are looking to get down to the fewest number of lines (a questionable goal) Java isn't going to be your best choice.
Your code is actually quite close to box stock JDBC.
The distinction is that in Java, rather than having a discrete collection of arrays per field, we'd have a simple Java Bean, and a collection of that.
Some examples:
public class ListItem {
Integer id;
String name;
String info;
String extra;
// ... constructors and setters/getters elided ...
}
List<ListItem> items = new ArrayList<>();
…
while(curs.next()) {
ListItem item = new ListItem();
item.setId(curs.getInt(1));
item.setName(curs.getString(2));
item.setInfo(curs.getString(3));
item.setExtra(curs.getString(4));
items.add(item);
}
This is more idiomatic, and of course does not touch on the several frameworks and libraries available to make DB access a bit easier.
Currently, this is my main screen:
I have 2 files: “patient.txt” and “treatment.txt” which hold records of multiple patients and treatments.
What I’m trying to do is to display all of those records in a nice JTable whenever I click “Display Treatments” or “Display Patients”, in a screen like so:
I am using an MVC model for this Hospital Management System (with HMSGUIModel.java, HMSGUIView.java, HMSGUIController.java, HMSGUIInterface.java files), and add records using the following code:
FileWriter tfw = new FileWriter(file.getAbsoluteFile(), true);
BufferedWriter tbw = new BufferedWriter(tfw);
tbw.write(this.view.gettNumber() + "," + this.view.gettName() + "," + this.view.gettDoctor() + "," + this.view.gettRoom());
tbw.newLine();
tbw.flush();
JOptionPane.showMessageDialog(null, "Successfully added treatment!"); }
Please help on how I can add a reader as well, to display all the records from the text file to a table?
Many thanks in advance!!
Keeping in line with your MVC, you could create a TableModel which knows how to read a given patient record.
Personally though, I'd prefer to separate the management of the patient data from the view, so the view didn't care about where the data came from.
To this end, I would start by creating a Patient object and a Treatment object, these would hold the data in a self contained entity, making the management simpler...
You would need to read this data in and parse the results...
List<Treatment> treatments = new ArrayList<Treatment>(25);
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String text = null;
while ((text = br.readLine()) != null) {
String parts[] = text.split(",");
Treatment treatment = new Treatment(parts[0],
parts[1],
parts[2],
parts[3]);
treatments.add(treatment);
}
} // Handle exception as required...
I'd wrap this into a readTreatments method in some utility class to make it easier to use...
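A minimal sketch of such a utility could look like the following. The `Treatment` class shown here is hypothetical (all four columns are kept as strings), the file is assumed to be UTF-8, and blank or malformed lines are simply skipped; adjust those choices to your data.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class TreatmentUtilities {

    /** Hypothetical value object: one field per comma-separated column. */
    static final class Treatment {
        private final String number;
        private final String name;
        private final String doctor;
        private final String roomNumber;

        Treatment(String number, String name, String doctor, String roomNumber) {
            this.number = number;
            this.name = name;
            this.doctor = doctor;
            this.roomNumber = roomNumber;
        }

        String getNumber() { return number; }
        String getName() { return name; }
        String getDoctor() { return doctor; }
        String getRoomNumber() { return roomNumber; }
    }

    /** Turns "number,name,doctor,room" lines into Treatment objects. */
    static List<Treatment> parseLines(List<String> lines) {
        List<Treatment> treatments = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",", -1); // -1 keeps trailing empty columns
            if (parts.length != 4) {
                continue; // blank or malformed line: skipped in this sketch
            }
            treatments.add(new Treatment(
                    parts[0].trim(), parts[1].trim(), parts[2].trim(), parts[3].trim()));
        }
        return treatments;
    }

    /** Reads the whole file (assumed to be UTF-8) and parses every line. */
    static List<Treatment> readTreatments(File file) throws IOException {
        return parseLines(Files.readAllLines(file.toPath(), StandardCharsets.UTF_8));
    }
}
```

Keeping the parsing in `parseLines` separate from the file access makes it possible to check the parsing without touching the disk.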
Around about here, I'd be considering using a stand alone database or even an XML document, but that's just me.
Once you have this, you can design a TableModel to support it...
public class TreatmentTableModel extends AbstractTableModel {
protected static final String[] COUMN_NAMES = {
"Treatment-Number",
"Treatment-Name",
"Doctor-in-charge",
"Room-No",
};
protected static final Class[] COLUMN_CLASSES = new Class[]{
Integer.class,
String.class,
Doctor.class,
Integer.class,
};
private List<Treatment> treatments;
public TreatmentTableModel() {
this.treatments = new ArrayList<>();
}
public TreatmentTableModel(List<Treatment> treatments) {
this.treatments = new ArrayList<>(treatments);
}
@Override
public int getRowCount() {
return treatments.size();
}
@Override
public int getColumnCount() {
return 4;
}
@Override
public String getColumnName(int column) {
return COUMN_NAMES[column];
}
@Override
public Class<?> getColumnClass(int columnIndex) {
return COLUMN_CLASSES[columnIndex];
}
@Override
public Object getValueAt(int rowIndex, int columnIndex) {
Treatment treatment = treatments.get(rowIndex);
Object value = null;
switch (columnIndex) {
case 0:
value = treatment.getNumber();
break;
case 1:
value = treatment.getName();
break;
case 2:
value = treatment.getDoctor();
break;
case 3:
value = treatment.getRoomNumber();
break;
}
return value;
}
}
Then you simply apply it to whatever JTable you need...
private JTable treatments;
//...
treatments = new JTable(new TreatmentTableModel());
add(new JScrollPane(treatments));
Then, when you need to, you would load the List of Treatments and apply it to the table...
File file = new File("...");
treatments.setModel(new TreatmentTableModel(TreatmentUtilities.readTreatments(file)));
Depending on your needs for the table, you can look at using the DefaultTableModel and populating your data using that model. The downside is that you may want special capabilities from your table, like preventing cell edits or storing more than strings, in which case you might look into extending AbstractTableModel and defining your own behavior for the model.
A simple thing to do would be to start with the default model and expand on that.
String[] myColumns = {"Treatment-Number","Treatment-Name", "Doctor-in-charge", "Room-No"};
// init a model with empty cells and the specified column names
DefaultTableModel myModel = new DefaultTableModel(new Object[myList.size()][4], myColumns);
// assuming you have a list of lists...
int i = 0;
for (ArrayList<Object> list : myList) {
int j = 0; // restart the column index for every row
for ( Object o : list ) {
myModel.setValueAt(o, i, j); // set the value at cell i,j to o
j++;
}
i++;
}
JTable myTable = new JTable(myModel); // make a new table with the specified data model
// ... do other stuff with the table
If you want to change the table data, get the model with myTable.getModel() and update it. This will automatically update the view of the table (completing the MVC connection).
Look here for more info on using tables.
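As a sketch of "start with the default model and expand on that", the model can also be created with zero rows and filled with addRow, and a small anonymous subclass is enough to make the cells read-only. `TreatmentTableSketch` and `buildModel` are hypothetical names, and each row is assumed to be a four-element String array.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.table.DefaultTableModel;

class TreatmentTableSketch {

    static final String[] COLUMNS = {
        "Treatment-Number", "Treatment-Name", "Doctor-in-charge", "Room-No"
    };

    /** Builds a read-only model with one table row per String[] in rows. */
    static DefaultTableModel buildModel(List<String[]> rows) {
        // Start with the column names and zero rows, then append row by row.
        DefaultTableModel model = new DefaultTableModel(COLUMNS, 0) {
            @Override
            public boolean isCellEditable(int row, int column) {
                return false; // the one behavior changed from the default model
            }
        };
        for (String[] row : rows) {
            model.addRow(row);
        }
        return model;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "X-Ray", "Dr Jones", "12"});
        rows.add(new String[] {"2", "MRI", "Dr Smith", "7"});
        DefaultTableModel model = buildModel(rows);
        System.out.println(model.getRowCount() + " rows, " + model.getColumnCount() + " columns");
    }
}
```

The resulting model can be handed to `new JTable(model)` or to `setModel` on an existing table.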
I'm working with the Lucene library, and I get the documents I need after executing a BooleanQuery.
I loop over the hits, and on each iteration I would like to put the Document into a HashMap.
int docId = hits[i].doc;
Document doc = searcher.doc(docId);
HashMap X = new HashMap ();
Now I want to know how to fill the hashmap X with the name_Field and the value_Field of the document?
You can iterate over document fields like this:
for (IndexableField field : doc.getFields())
{
X.put(field.name(), field.stringValue());
}
But it will work only for fields which are stored in the index (those which were added with the Field.Store.YES flag). Also, if you have several values for the same field in the document, this code has to be modified, because each put would overwrite the previous value.
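For the multi-valued case, one option is to map each field name to a list of values instead of a single string. The sketch below deliberately uses plain name/value pairs as a stand-in for what iterating over `doc.getFields()` would yield, so that it stays self-contained; `MultiValueFieldsSketch` and `group` are hypothetical names, and skipping null values is an assumption of this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiValueFieldsSketch {

    /** Collects every value seen for a field name, in encounter order. */
    static Map<String, List<String>> group(List<Map.Entry<String, String>> fields) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields) {
            if (field.getValue() == null) {
                continue; // nothing to store for this field
            }
            // Create the list on first sight of a name, then append to it.
            result.computeIfAbsent(field.getKey(), k -> new ArrayList<>())
                  .add(field.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> fields = new ArrayList<>();
        fields.add(new SimpleEntry<>("title", "Lucene in Action"));
        fields.add(new SimpleEntry<>("author", "First Author"));
        fields.add(new SimpleEntry<>("author", "Second Author"));
        System.out.println(group(fields));
    }
}
```

In the loop from the answer, the same `computeIfAbsent(...).add(...)` call would replace `X.put(...)`, with X declared as a `Map<String, List<String>>`.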
You could extend the Lucene Collector and then add each document in whatever way you want.
IndexSearcher searcher = new IndexSearcher(indexReader);
private Map<String, String> docs = new HashMap<String, String>();
searcher.search(query, new Collector() {
private int docBase;
// ignore scorer
public void setScorer(Scorer scorer) {
}
// accept docs out of order (for a BitSet it doesn't matter)
public boolean acceptsDocsOutOfOrder() {
return true;
}
public void collect(int docNum) throws IOException {
Document luceneDoc = searcher.doc(docNum + docBase);
// get() returns a single String; getValues() would return a String[]
docs.put(luceneDoc.get(name_Field), luceneDoc.get(value_Field));
}
public void setNextReader(AtomicReaderContext context) {
this.docBase = context.docBase;
}
});
I want to implement 2 reports with OO. The reports are all like (but have different columns and data):
name age gender phone_number
A 10 male 1234
B 20 female 5678
C 30 n/a 9012
As you can see, in the report, each column has its own header and parser (for parsing the data). I have designed an object Column:
class Column<T extends Object>
{
private String header;
private ColumnParser parser;
public Column(String header)
{
this.header = header;
this.parser = new ColumnParser<T>()
{
public String parse(T t)
{
return t.toString();
}
};
}
public Column(String header, ColumnParser parser)
{
this.header = header;
this.parser = parser;
}
public interface ColumnParser<T>
{
public String parse(T t);
}
}
So that each column has its own parser to parse the data in that column. But after this, I don't know how to store the data so that they can be mapped to each column and can be parsed.
Please advise.
First, it would be helpful to know what format your original data (in memory) is in - e.g. is it an Object[][]?
Second, the output you require looks like it's tab separated. Is that correct?
Third, to write to a text file you have to append row by row. Your current code seems to suggest you want to append column by column - this would be much harder to implement.
If you can convert your data into a String[][], which should be straightforward, you can then use the following to write to a file. If you want tab-delimited output, you can use "\t" as the delimiter; the tab character is the same on every OS, unlike the line separator.
public static void writeToFile(File file, String[][] data, String delimiter) throws IOException {
// try-with-resources closes the writer even if a write fails
try (PrintWriter out = new PrintWriter(new FileWriter(file))) {
for (String[] row : data){
out.write(makeLine(row, delimiter));
}
}
}
private static String makeLine(String[] row, String delimiter) {
StringBuilder str = new StringBuilder();
for (String cell : row){
str.append("\""+cell+"\"").append(delimiter);
}
if (str.length() > 0) {
str.setLength(str.length() - delimiter.length()); // drop the trailing delimiter, whatever its length
}
str.append("\n");
return str.toString();
}
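To connect the Column idea from the question with row-by-row output, one possible arrangement is to let each column hold a function that extracts and formats its cell from a row object. Note that this is a different design from the `ColumnParser` in the question, shown only as a sketch: `ReportSketch`, `Person`, `personColumns` and `render` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ReportSketch {

    /** A column knows its header and how to turn one row object into a cell. */
    static final class Column<R> {
        final String header;
        final Function<R, String> formatter;

        Column(String header, Function<R, String> formatter) {
            this.header = header;
            this.formatter = formatter;
        }
    }

    /** Hypothetical row type matching the sample report in the question. */
    static final class Person {
        final String name;
        final int age;
        final String gender; // may be null when unknown
        final String phone;

        Person(String name, int age, String gender, String phone) {
            this.name = name;
            this.age = age;
            this.gender = gender;
            this.phone = phone;
        }
    }

    static List<Column<Person>> personColumns() {
        List<Column<Person>> columns = new ArrayList<>();
        columns.add(new Column<Person>("name", p -> p.name));
        columns.add(new Column<Person>("age", p -> String.valueOf(p.age)));
        columns.add(new Column<Person>("gender", p -> p.gender == null ? "n/a" : p.gender));
        columns.add(new Column<Person>("phone_number", p -> p.phone));
        return columns;
    }

    /** Produces the header line followed by one line per row. */
    static <R> List<String> render(List<Column<R>> columns, List<R> rows, String delimiter) {
        List<String> lines = new ArrayList<>();
        List<String> cells = new ArrayList<>();
        for (Column<R> column : columns) {
            cells.add(column.header);
        }
        lines.add(String.join(delimiter, cells));
        for (R row : rows) {
            cells = new ArrayList<>();
            for (Column<R> column : columns) {
                cells.add(column.formatter.apply(row));
            }
            lines.add(String.join(delimiter, cells));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("A", 10, "male", "1234"));
        people.add(new Person("C", 30, null, "9012"));
        for (String line : render(personColumns(), people, "\t")) {
            System.out.println(line);
        }
    }
}
```

A second report would then only need its own row type and its own list of columns; `render` stays the same.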