SparkSQL + Java: Pojo to Tabular Format while working with Datasets - java

I'm pretty new to Spark SQL. While implementing one of training tasks I faced the following issue and could not find an answer (all the following examples are a bit dumb, but should be still ok for demonstration purposes).
My app reads a parquet file and creates a dataset basing on its content:
DataFrame input = sqlContext.read().parquet("src/test/resources/integration/input/source.gz.parquet");
Dataset<Row> dataset = input.as(RowEncoder$.MODULE$.apply(input.schema()));
The dataset.show() call results in:
+------------+----------------+--------+
+ Names + Gender + Age +
+------------+----------------+--------+
| Jack, Jill | Male, Female | 30, 25 |
Then I convert the dataset into a new dataset with the Person type inside:
public static Dataset<Person> transformToPerson(Dataset<Row> rawData) {
return rawData
.flatMap((Row sourceRow) -> {
// code to parse an input row and split person data goes here
Person person1 = new Person(name1, gender1, age1);
Person person2 = new Person(name2, gender2, age2);
return Arrays.asList(person1, person2);
}, Encoders.bean(Person.class));
}
where
public abstract class Human implements Serializable {
protected String name;
protected String gender;
// getters/setters go here
// default constructor + constructor with the name and gender params
}
public class Person extends Human {
private String age;
// getters/setters for the age param go here
// default constructor + constructor with the age, name and gender params
// overriden toString() method which returns the string: (<name>, <gender>, <age>)
}
Finally, when I show the dataset's content I expect to see
+------------+----------------+--------+
+ name + gender + age +
+------------+----------------+--------+
| Jack | Male | 30 |
| Jill | Femail | 25 |
However, I see
+-------------------+----------------+--------+
+ name + gender + age +
+-------------------+----------------+--------+
|(Jack, Male, 30) | | |
|(Jill, Femail, 25) | | |
Which is a result of the toString() method, while the header is correct.
I believe something is wrong with the Encoder, as far as if I use the Encoders.javaSerizlization(T) or Encoders.kryo(T) it shows
+------------------+
+ value +
+------------------+
|(Jack, Male, 30) |
|(Jill, Femail, 25)|
What worries me most is maybe the incorrect usage of encoders could result in incorrect SerDe and/or performance penalties.
I cannot not see anything special in all Spark Java examples that I can find...
Could you please suggest what I do wrong?
UPDATE 1
Here are my project's dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.2</version>
</dependency>
SOLUTION
As abaghel suggested I upgraded the version to 2.0.2 (please be aware that on version 2.0.0 there is the bug for Windows), used Dataset instead of DataFrames everywhere in my code (seems like DataFrames are not a part of Apache Spark starting from 2.0.0), and used the iterator-based flatMap function to transform from Row to Person.
Just to share, the approach of using the TraversableOnce-based flatMap for version 1.6.2 did not work for me as it threw the 'MyPersonConversion$function1 not Serializable' exception.
Now everything is working as expected.

What is the version of Spark you are using? Method for flatMap you have provided is not compiling with version 2.2.0. Return type required is Iterator<Person>. Please use below FlatMapFunction and you will get the desired output.
public static Dataset<Person> transformToPerson(Dataset<Row> rawData) {
return rawData.flatMap(row -> {
String[] nameArr = row.getString(0).split(",");
String[] genArr = row.getString(1).split(",");
String[] ageArr = row.getString(2).split(",");
Person person1 = new Person(nameArr[0], genArr[0], ageArr[0]);
Person person2 = new Person(nameArr[1], genArr[1], ageArr[1]);
return Arrays.asList(person1, person2).iterator();
}, Encoders.bean(Person.class));
}
//Call function
Dataset<Person> dataset1 = transformToPerson(dataset);
dataset1.show();

Related

Grouping multiple columns without aggregation

I have a dataframe (Dataset<Row>) which has six columns in it, out of six, four needs to be grouped and for the other two columns it may repeat the grouped columns n times based on the varying value in those two columns.
required dataset like below:
id | batch | batch_Id | session_name | time | value
001| abc | 098 | course-I | 1551409926133 | 2.3
001| abc | 098 | course-I | 1551404747843 | 7.3
001| abc | 098 | course-I | 1551409934220 | 6.3
I tired something like below
Dataset<Row> df2 = df.select("*")
.groupBy(col("id"), col("batch_Id"), col("session_name"))
.agg(max("time"));
I added agg to get groupby output but don't know how to achieve it.
Help much appreciated... Thank you.
I don't think you were too far off.
Given your first dataset:
+---+-----+--------+------------+-------------+-----+
| id|batch|batch_Id|session_name| time|value|
+---+-----+--------+------------+-------------+-----+
|001| abc| 098| course-I|1551409926133| 2.3|
|001| abc| 098| course-I|1551404747843| 7.3|
|001| abc| 098| course-I|1551409934220| 6.3|
|002| def| 097| course-II|1551409926453| 2.3|
|002| def| 097| course-II|1551404747843| 7.3|
|002| def| 097| course-II|1551409934220| 6.3|
+---+-----+--------+------------+-------------+-----+
And assuming your desired output is:
+---+--------+------------+-------------+
| id|batch_Id|session_name| max(time)|
+---+--------+------------+-------------+
|002| 097| course-II|1551409934220|
|001| 098| course-I|1551409934220|
+---+--------+------------+-------------+
I would write the following code for the aggregation:
Dataset<Row> maxValuesDf = rawDf.select("*")
.groupBy(col("id"), col("batch_id"), col("session_name"))
.agg(max("time"));
And the whole app would look like:
package net.jgp.books.spark.ch13.lab900_max_value;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.max;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
public class MaxValueAggregationApp {
/**
* main() is your entry point to the application.
*
* #param args
*/
public static void main(String[] args) {
MaxValueAggregationApp app = new MaxValueAggregationApp();
app.start();
}
/**
* The processing code.
*/
private void start() {
// Creates a session on a local master
SparkSession spark = SparkSession.builder()
.appName("Aggregates max values")
.master("local[*]")
.getOrCreate();
// Reads a CSV file with header, called books.csv, stores it in a
// dataframe
Dataset<Row> rawDf = spark.read().format("csv")
.option("header", true)
.option("sep", "|")
.load("data/misc/courses.csv");
// Shows at most 20 rows from the dataframe
rawDf.show(20);
// Performs the aggregation, grouping on columns id, batch_id, and
// session_name
Dataset<Row> maxValuesDf = rawDf.select("*")
.groupBy(col("id"), col("batch_id"), col("session_name"))
.agg(max("time"));
maxValuesDf.show(5);
}
}
Does it help?

Connect to Hive from Spark without using "hive-site.xml"

Is there any way to connect to Hive from Spark without using "hive-site.xml"?
SparkLauncher sl = new SparkLauncher(evnProps);
sl.addSparkArg("--verbose");
sl.addAppArgs(appArgs);
sl.addFile(evnProps.get(KEY_YARN_CONF_DIR) + "/hive-site.xml");
We are passing "hive-site.xml" to SparkLauncher.I want to remove dependency on "hive-site.xml"enter code here.
Spark SQL supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, these dependencies are not included in the default Spark distribution. If Hive dependencies can be found on the classpath, Spark will load them automatically. Note that these Hive dependencies must also be present on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries (SerDes) in order to access data stored in Hive.
Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) file in conf/.
When working with Hive, one must instantiate SparkSession with Hive support, including connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions. Users who do not have an existing Hive deployment can still enable Hive support. When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started. Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse. You may need to grant write privilege to the user who starts the Spark application.
import java.io.File;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
public static class Record implements Serializable {
private int key;
private String value;
public int getKey() {
return key;
}
public void setKey(int key) {
this.key = key;
}
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
}
// warehouseLocation points to the default location for managed databases and tables
String warehouseLocation = new File("spark-warehouse").getAbsolutePath();
SparkSession spark = SparkSession
.builder()
.appName("Java Spark Hive Example")
.config("spark.sql.warehouse.dir", warehouseLocation)
.enableHiveSupport()
.getOrCreate();
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive");
spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src");
// Queries are expressed in HiveQL
spark.sql("SELECT * FROM src").show();
// +---+-------+
// |key| value|
// +---+-------+
// |238|val_238|
// | 86| val_86|
// |311|val_311|
// ...
// Aggregation queries are also supported.
spark.sql("SELECT COUNT(*) FROM src").show();
// +--------+
// |count(1)|
// +--------+
// | 500 |
// +--------+
// The results of SQL queries are themselves DataFrames and support all normal functions.
Dataset<Row> sqlDF = spark.sql("SELECT key, value FROM src WHERE key < 10 ORDER BY key");
// The items in DataFrames are of type Row, which lets you to access each column by ordinal.
Dataset<String> stringsDS = sqlDF.map(
(MapFunction<Row, String>) row -> "Key: " + row.get(0) + ", Value: " + row.get(1),
Encoders.STRING());
stringsDS.show();
// +--------------------+
// | value|
// +--------------------+
// |Key: 0, Value: val_0|
// |Key: 0, Value: val_0|
// |Key: 0, Value: val_0|
// ...
// You can also use DataFrames to create temporary views within a SparkSession.
List<Record> records = new ArrayList<>();
for (int key = 1; key < 100; key++) {
Record record = new Record();
record.setKey(key);
record.setValue("val_" + key);
records.add(record);
}
Dataset<Row> recordsDF = spark.createDataFrame(records, Record.class);
recordsDF.createOrReplaceTempView("records");
// Queries can then join DataFrames data with data stored in Hive.
spark.sql("SELECT * FROM records r JOIN src s ON r.key = s.key").show();
// +---+------+---+------+
// |key| value|key| value|
// +---+------+---+------+
// | 2| val_2| 2| val_2|
// | 2| val_2| 2| val_2|
// | 4| val_4| 4| val_4|
// ...

How to get Windows process Description from Java?

Here is the code to get list of currently running process in windows.
import com.sun.jna.platform.win32.Kernel32;
import com.sun.jna.platform.win32.Tlhelp32;
import com.sun.jna.platform.win32.WinDef;
import com.sun.jna.platform.win32.WinNT;
import com.sun.jna.win32.W32APIOptions;
import com.sun.jna.Native;
public class ListProcesses {
public static void main(String[] args) {
Kernel32 kernel32 = (Kernel32) Native.loadLibrary(Kernel32.class, W32APIOptions.UNICODE_OPTIONS);
Tlhelp32.PROCESSENTRY32.ByReference processEntry = new Tlhelp32.PROCESSENTRY32.ByReference();
WinNT.HANDLE snapshot = kernel32.CreateToolhelp32Snapshot(Tlhelp32.TH32CS_SNAPPROCESS, new WinDef.DWORD(0));
try {
while (kernel32.Process32Next(snapshot, processEntry)) {
System.out.println(processEntry.th32ProcessID + "\t" + Native.toString(processEntry.szExeFile)+"\t"+processEntry.readField(""));
}
}
finally {
kernel32.CloseHandle(snapshot);
}
}
}
But I am unable to get description of the process/service in output.Kindly provide solution to get process description of each running proceess. Thanks in advance.
Instead of loading and invoking Kernel32 you could simply use the following code snippet in windows which uses the Runtime to execute a native process:
public List<String> execCommand(String ... command)
{
try
{
// execute the desired command
Process proc = null;
if (command.length > 1)
proc = Runtime.getRuntime().exec(command);
else
proc = Runtime.getRuntime().exec(command[0]);
// process the response
String line = "";
List<String> output = new ArrayList<>();
try (BufferedReader input = new BufferedReader(new InputStreamReader(proc.getInputStream())))
{
while ((line = input.readLine()) != null)
{
output.add(line);
}
}
return output;
}
catch (IOException e)
{
e.printStackTrace();
}
return Collections.<String>emptyList();
}
and then execute the command which invokes the Windows Management Information Command-line:
List<String> output = execCommand("wmic.exe PROCESS where name='"+processName+"'");
processName should contain the name of the running application or exe you try to get information from.
The returned list will then contain the line output of the status information of the running application. The first entry will contain header-information for the respective fields while the following entries will contain information on all matching process names.
Further infos on WMIC:
MSDN
Product documentation
Generating HTML output for WMI
HTH
As I stumbled today across this post here which showcases how to extract the version info from an executable file, your post came back to my mind and so I started a bit of investigation.
This further post states that the process description can only be extracted from the executable file itself so we need to lay our hands on JNA instead of parsing some output from WMIC or TASKLIST. This post further links the MSDN page for VerQueryValue function which provides a C way of extracting the process description. Here especially the 2nd parameter should be of interest as it defines what to return.
With the code mentioned in the first post it is now a bit of converting the C struct typedefs into Java equivalents. I will post the complete code here which at least works for me on Windows 7 64bit:
Maven pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>at.rovo.test</groupId>
<artifactId>JNI_Test</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>JNI_Test</name>
<url>http://maven.apache.org</url>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
</plugins>
</build>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<!-- JNA 3.4
<dependency>
<groupId>net.java.dev.jna</groupId>
<artifactId>jna</artifactId>
<version>3.4.0</version>
</dependency>
<dependency>
<groupId>net.java.dev.jna</groupId>
<artifactId>platform</artifactId>
<version>3.4.0</version>
</dependency>
-->
<!-- JNA 4.0.0 -->
<dependency>
<groupId>net.java.dev.jna</groupId>
<artifactId>jna</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>net.java.dev.jna</groupId>
<artifactId>jna-platform</artifactId>
<version>4.0.0</version>
</dependency>
<!-- -->
</dependencies>
</project>
LangAndCodePage.java - a helper class which maps the extracted hex-values to human readable output of the language and code page information contained within the translation table. The values therefore are taken from this page here:
package at.rovo.test.jni_test;
import java.util.HashMap;
import java.util.Map;
public final class LangAndCodePage
{
private final static Map<String, String> languages = new HashMap<>();
private final static Map<String, String> codePage = new HashMap<>();
static
{
languages.put("0000", "Language Neutral");
languages.put("0401", "Arabic");
languages.put("0402", "Bulgarian");
languages.put("0403", "Catalan");
languages.put("0404", "Traditional Chinese");
languages.put("0405", "Czech");
languages.put("0406", "Danish");
languages.put("0407", "German");
languages.put("0408", "Greek");
languages.put("0409", "U.S. English");
languages.put("040A", "Castilian Spanish");
languages.put("040B", "Finnish");
languages.put("040C", "French");
languages.put("040D", "Hebrew");
languages.put("040E", "Hungarian");
languages.put("040F", "Icelandic");
languages.put("0410", "Italian");
languages.put("0411", "Japanese");
languages.put("0412", "Korean");
languages.put("0413", "Dutch");
languages.put("0414", "Norwegian ? Bokmal");
languages.put("0810", "Swiss Italian");
languages.put("0813", "Belgian Dutch");
languages.put("0814", "Norwegian ? Nynorsk");
languages.put("0415", "Polish");
languages.put("0416", "Portuguese (Brazil)");
languages.put("0417", "Rhaeto-Romanic");
languages.put("0418", "Romanian");
languages.put("0419", "Russian");
languages.put("041A", "Croato-Serbian (Latin)");
languages.put("041B", "Slovak");
languages.put("041C", "Albanian");
languages.put("041D", "Swedish");
languages.put("041E", "Thai");
languages.put("041F", "Turkish");
languages.put("0420", "Urdu");
languages.put("0421", "Bahasa");
languages.put("0804", "Simplified Chinese");
languages.put("0807", "Swiss German");
languages.put("0809", "U.K. English");
languages.put("080A", "Spanish (Mexico)");
languages.put("080C", "Belgian French");
languages.put("0C0C", "Canadian French");
languages.put("100C", "Swiss French");
languages.put("0816", "Portuguese (Portugal)");
languages.put("081A", "Serbo-Croatian (Cyrillic)");
codePage.put("0000", "7-bit ASCII");
codePage.put("03A4", "Japan (Shift ? JIS X-0208)");
codePage.put("03B5", "Korea (Shift ? KSC 5601)");
codePage.put("03B6", "Taiwan (Big5)");
codePage.put("04B0", "Unicode");
codePage.put("04E2", "Latin-2 (Eastern European)");
codePage.put("04E3", "Cyrillic");
codePage.put("04E4", "Multilingual");
codePage.put("04E5", "Greek");
codePage.put("04E6", "Turkish");
codePage.put("04E7", "Hebrew");
codePage.put("04E8", "Arabic");
}
// prohibit instantiation
private LangAndCodePage()
{
}
public static void printTranslationInfo(String lang, String cp)
{
StringBuilder builder = new StringBuilder();
builder.append("Language: ");
builder.append(languages.get(lang));
builder.append(" (");
builder.append(lang);
builder.append("); ");
builder.append("CodePage: ");
builder.append(codePage.get(cp));
builder.append(" (");
builder.append(cp);
builder.append(");");
System.out.println(builder.toString());
}
}
Last but not least the code which extracts the file version and the file description of the Windows explorer. The code contains plenty of documentation as I used it to learn the stuff myself ;)
package at.rovo.test.jni_test;
import java.io.IOException;
import com.sun.jna.Memory;
import com.sun.jna.Pointer;
import com.sun.jna.platform.win32.VerRsrc.VS_FIXEDFILEINFO;
import com.sun.jna.platform.win32.Version;
import com.sun.jna.ptr.IntByReference;
import com.sun.jna.ptr.PointerByReference;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
public class FileVersion
{
// The structure as implemented by the MSDN article
public static class LANGANDCODEPAGE extends Structure
{
/** The language contained in the translation table **/
public short wLanguage;
/** The code page contained in the translation table **/
public short wCodePage;
public LANGANDCODEPAGE(Pointer p)
{
useMemory(p);
}
public LANGANDCODEPAGE(Pointer p, int offset)
{
useMemory(p, offset);
}
public static int sizeOf()
{
return 4;
}
// newer versions of JNA require a field order to be set
#Override
protected List getFieldOrder()
{
List fieldOrder = new ArrayList();
fieldOrder.add("wLanguage");
fieldOrder.add("wCodePage");
return fieldOrder;
}
}
public static void main(String[] args) throws IOException
{
// http://msdn.microsoft.com/en-us/library/ms647464%28v=vs.85%29.aspx
//
// VerQueryValue will take two input and two output parameters
// 1. parameter: is a pointer to the version-information returned
// by GetFileVersionInfo
// 2. parameter: will take a string and return an output depending on
// the string:
// "\\"
// Is the root block and retrieves a VS_FIXEDFILEINFO struct
// "\\VarFileInfo\Translation"
// will return an array of Var variable information structure
// holding the language and code page identifier
// "\\StringFileInfo\\{lang-codepage}\\string-name"
// will return a string value of the language and code page
// requested. {lang-codepage} is a concatenation of a language
// and the codepage identifier pair found within the translation
// array in a hexadecimal string! string-name must be one of the
// following values:
// Comments, InternalName, ProductName, CompanyName,
// LegalCopyright, ProductVersion, FileDescription,
// LegalTrademarks, PrivateBuild, FileVersion,
// OriginalFilename, SpecialBuild
// 3. parameter: contains the address of a pointer to the requested
// version information in the buffer of the 1st parameter.
// 4. parameter: contains a pointer to the size of the requested data
// pointed to by the 3rd parameter. The length depends on
// the input of the 2nd parameter:
// *) For root block, the size in bytes of the structure
// *) For translation array values, the size in bytes of
// the array stored at lplpBuffer;
// *) For version information values, the length in
// character of the string stored at lplpBuffer;
String filePath = "C:\\Windows\\explorer.exe";
IntByReference dwDummy = new IntByReference();
dwDummy.setValue(0);
int versionlength =
Version.INSTANCE.GetFileVersionInfoSize(filePath, dwDummy);
if (versionlength > 0)
{
// will hold the bytes of the FileVersionInfo struct
byte[] bufferarray = new byte[versionlength];
// allocates space on the heap (== malloc in C/C++)
Pointer lpData = new Memory(bufferarray.length);
// will contain the address of a pointer to the requested version
// information
PointerByReference lplpBuffer = new PointerByReference();
// will contain a pointer to the size of the requested data pointed
// to by lplpBuffer.
IntByReference puLen = new IntByReference();
// reads versionLength bytes from the executable file into the FileVersionInfo struct buffer
boolean fileInfoResult =
Version.INSTANCE.GetFileVersionInfo(
filePath, 0, versionlength, lpData);
// retrieve file description for language and code page "i"
boolean verQueryVal =
Version.INSTANCE.VerQueryValue(
lpData, "\\", lplpBuffer, puLen);
// contains version information for a file. This information is
// language and code page independent
VS_FIXEDFILEINFO lplpBufStructure =
new VS_FIXEDFILEINFO(lplpBuffer.getValue());
lplpBufStructure.read();
int v1 = (lplpBufStructure.dwFileVersionMS).intValue() >> 16;
int v2 = (lplpBufStructure.dwFileVersionMS).intValue() & 0xffff;
int v3 = (lplpBufStructure.dwFileVersionLS).intValue() >> 16;
int v4 = (lplpBufStructure.dwFileVersionLS).intValue() & 0xffff;
System.out.println(
String.valueOf(v1) + "." +
String.valueOf(v2) + "." +
String.valueOf(v3) + "." +
String.valueOf(v4));
// creates a (reference) pointer
PointerByReference lpTranslate = new PointerByReference();
IntByReference cbTranslate = new IntByReference();
// Read the list of languages and code pages
verQueryVal = Version.INSTANCE.VerQueryValue(
lpData, "\\VarFileInfo\\Translation", lpTranslate, cbTranslate);
if (cbTranslate.getValue() > 0)
{
System.out.println("Found "+(cbTranslate.getValue()/4)
+ " translation(s) (length of cbTranslate: "
+ cbTranslate.getValue()+" bytes)");
}
else
{
System.err.println("No translation found!");
return;
}
// Read the file description
// msdn has this example here:
// for( i=0; i < (cbTranslate/sizeof(struct LANGANDCODEPAGE)); i++ )
// where LANGANDCODEPAGE is a struct holding two WORDS. A word is
// 16 bits (2x 8 bit = 2 bytes) long and as the struct contains two
// words the length of the struct should be 4 bytes long
for (int i=0; i < (cbTranslate.getValue()/LANGANDCODEPAGE.sizeOf()); i++))
{
// writes formatted data to the specified string
// out: pszDest - destination buffer which receives the formatted, null-terminated string created from pszFormat
// in: ccDest - the size of the destination buffer, in characters. This value must be sufficiently large to accomodate the final formatted string plus 1 to account for the terminating null character.
// in: pszFormat - the format string. This string must be null-terminated
// in: ... The arguments to be inserted into the pszFormat string
// hr = StringCchPrintf(SubBlock, 50,
// TEXT("\\StringFileInfo\\%04x%04x\\FileDescription"),
// lpTranslate[i].wLanguage,
// lpTranslate[i].wCodePage);
// fill the structure with the appropriate values
LANGANDCODEPAGE langCodePage =
new LANGANDCODEPAGE(lpTranslate.getValue(), i*LANGANDCODEPAGE.sizeOf());
langCodePage.read();
// convert short values to hex-string:
// https://stackoverflow.com/questions/923863/converting-a-string-to-hexadecimal-in-java
String lang = String.format("%04x", langCodePage.wLanguage);
String codePage = String.format("%04x",langCodePage.wCodePage);
// see http://msdn.microsoft.com/en-us/library/windows/desktop/aa381058.aspx
// for proper values for lang and codePage
LangAndCodePage.printTranslationInfo(lang.toUpperCase(), codePage.toUpperCase());
// build the string for querying the file description stored in
// the executable file
StringBuilder subBlock = new StringBuilder();
subBlock.append("\\StringFileInfo\\");
subBlock.append(lang);
subBlock.append(codePage);
subBlock.append("\\FileDescription");
printDescription(lpData, subBlock.toString());
}
}
else
System.out.println("No version info available");
}
private static void printDescription(Pointer lpData, String subBlock)
{
PointerByReference lpBuffer = new PointerByReference();
IntByReference dwBytes = new IntByReference();
// Retrieve file description for language and code page "i"
boolean verQueryVal = Version.INSTANCE.VerQueryValue(
lpData, subBlock, lpBuffer, dwBytes);
// a single character is represented by 2 bytes!
// the last character is the terminating "\n"
byte[] description =
lpBuffer.getValue().getByteArray(0, (dwBytes.getValue()-1)*2);
System.out.println("File-Description: \""
+ new String(description, StandardCharsets.UTF_16LE)+"\"");
}
}
Finally, the output I'm receiving on my German Windows 7 64bit:
[exec:exec]
Version: 6.1.7601.17567
Found 1 translation(s) (length of cbTranslate: 4 bytes)
Language: German (0407); CodePage: Unicode (04B0);
File-Description: "Windows-Explorer"
HTH
#Edit: updated the code to use a class representation of the struct to simplify the extraction of the values (dealing with bytes does require them to swith the order of the bytes received - big and little endian issue)
Found after some tries an application that uses multiple languages and code pages within their file where I could test the output of multiple translations.
#Edit2: refactored the code and put it up on github. As mentioned in the comment, the output running the code from the github repo for the Logitech WingMan Event Monitor returns multiple language and codepage segments:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXX Information contained in: C:\Program Files\Logitech\Gaming Software\LWEMon.exe
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
File: C:\Program Files\Logitech\Gaming Software\LWEMon.exe
Version: 5.10.127.0
Language: U.S. English
CodePage: Multilingual
Original-Filename: LWEMon.exe
Company-Name: Logitech Inc.
File-Description: Logitech WingMan Event Monitor
File-Version: 5.10.127
Product-Version: 5.10.127
Product-Name: Logitech Gaming Software
Internal-Name: LWEMon
Private-Build:
Special-Build:
Legal-Copyright: © 1999-2010 Logitech. All rights reserved.
Legal-Trademark: Logitech, the Logitech logo, and other Logitech marks are owned by Logitech and may be registered. All other trademarks are the property of their respective owners.
Comment: Created by the WingMan Team.
File: C:\Program Files\Logitech\Gaming Software\LWEMon.exe
Version: 5.10.127.0
Language: Japanese
CodePage: Multilingual
Original-Filename: LWEMon.exe
Company-Name: Logicool Co. Ltd.
File-Description: Logicool WingMan Event Monitor
File-Version: 5.10.127
Product-Version: 5.10.127
Product-Name: Logicool Gaming Software
Internal-Name: LWEMon
Private-Build:
Special-Build:
Legal-Copyright: © 1999-2010 Logicool Co. Ltd. All rights reserved.
Legal-Trademark: Logicool, the Logicool logo, and other Logicool marks are owned by Logicool and may be registered. All other trademarks are the property of their respective owners.
Comment: Created by the WingMan Team.
(Answered by the OP in an answer-response. Converted to a community wiki answer. See Question with no answers, but issue solved in the comments (or extended in chat) )
Actually I find out another way ie. Windows PowerShell commands:
get-process notepad | select-object description.
Therefore, I am using command line to get descriptions of currently running process. Similarly for Services:
get-service | where {$_.status -eq 'running'}.

Regular expression, omit bracket

Need a help in figuring out the regular expression where I need to remove all the data between {{ and }}?
Below is the coupus:
{{for|the American actor|Russ Conway (actor)}}
{{Use dmy dates|date=November 2012}}
{{Infobox musical artist <!-- See Wikipedia:WikiProject_Musicians -->
| birth_name = Trevor Herbert Stanford
| birth_date = {{birth date|1925|09|2|df=y}}
| birth_place = [[Bristol]], [[England]], UK
| death_date = {{death date and age|2000|11|16|1925|09|02|df=y}}
| death_place = [[Eastbourne]], [[Sussex]], England, UK
| origin =
}}
record|hits]].<ref name="British Hit Singles & Albums"/>
{{reflist}}
==External links==
*[http://www.russconway.co.uk/ Russ Conway]
*{{YouTube|TnIpQhDn4Zg|Russ Conway playing Side Saddle}}
{{Authority control|VIAF=41343596}}
<!-- Metadata: see [[Wikipedia:Persondata]] -->
{{Persondata
| NAME =Conway, Russ
}}
{{DEFAULTSORT:Conway, Russ}}
[[Category:1925 births]]
Below is the output with all the curly braces are removed along with the text within it:
record|hits]].<ref name="British Hit Singles & Albums"/>
==External links==
*[http://www.russconway.co.uk/ Russ Conway]
*
<!-- Metadata: see [[Wikipedia:Persondata]] -->
[[Category:1925 births]]
P.S - I have omitted the space in the output, I will take care of that.
This would take care of nested {{ }}
Matcher m=Pattern.compile("\\{[^{}]*\\}").matcher(input);
while(m.find())
{
input=m.replaceAll("");
m.reset(input);
}
string.replaceAll("\\{\\{[\\s\\S]*?\\}\\}","");
will produce:
record|hits]].<ref name="British Hit Singles & Albums"/>
==External links==
*[http://www.russconway.co.uk/ Russ Conway]
*
<!-- Metadata: see [[Wikipedia:Persondata]] -->
[[Category:1925 births]]

How do I pass the data to a ResultSet from a MySQL Query?

When I run a query it returns the following results
periodEndingDate | TotalMin | TimesheetId |
-------------------------------------------
2007-08-19 | 38.000 | 1|
2010-09-17 | 26.500 | 2|
So, I have the following way of getting the values, I know the third one is getting it right (The Number one) but how can I cast to the second and first columns ? (Date) and (Float or Double), the TotalMin is an addition of several columns (minutes) of ints divided by 60.
This is what I have
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery("SELECT periodEndingDate,
((minutesMon+minutesTue+minutesWed+minutesThu+"
+"minutesFri+minutesSat+minutesSun)/60) as TotalMin,
TimesheetId FROM timesheet WHERE employeeID='"+empID+"';");
for (; rs.next();) {
ArrayList tmData = new ArrayList();
tmData.add((String) rs.getObject(1));
tmData.add((String) rs.getObject(2));
//for the timesheet id
tmData.add(((Number) rs.getObject(3)).intValue());
TimeSheetData.add(tmData);
}
Thank you
You can use rs.getDate() or rs.getTimestamp() for the first column.
You can use rs.getDouble() for the second one.
This is better than using rs.getObject() and then casting it. I'd change your 3rd column to rs.getInt()
Note : rs.getInt() and rs.getDouble() return primitives that get autoboxed when they are added to your list.
Hope this is what you are after
Create a class
class Timesheet
{
//periodEndingDate | TotalMin | TimesheetId
private Date periodEndingDate;
private BigDecimal totalMin;
private int timesheetId;
//---------GETTERS AND SETTERS
}
Then in your code above replace for loop with this
List<Timesheet> tmData = new ArrayList<Timesheet>();
while (rs.next()) {
Timesheet timesheet = new Timesheet();
//periodEndingDate,TotalMin and TimesheetId should be the name of columns in the table
timesheet.setPeriodEndingDate( rs.getDate("periodEndingDate"));
timesheet.setTotalMin( rs.getBigDecimal("TotalMin"));
timesheet.setTimesheetId(rs.getInt("TimesheetId"));
tmData.add(timesheet);
}
Why cant you just do:
tmData.add((DateTime) rs.getObject(1));
tmData.add((decimal) rs.getObject(2));
What am I missing?

Categories