Can't get Armenian month names - java

I want to print Armenian month names, but it doesn't work. This is my code:
Locale loc = new Locale("hy");
Calendar cal = Calendar.getInstance(loc);
System.out.println(cal.getDisplayName(Calendar.MONTH, Calendar.LONG_STANDALONE, loc));
I have tried many other abbreviations like "hye" or "arm", but nothing works. Other languages such as Russian ("ru") work fine. I have no idea what I'm doing wrong.

There was an enhancement in JDK 8 whereby the CLDR's XML-based locale data was incorporated into the JDK 8 release; however, it is disabled by default.
So, if you run your code with the argument -Djava.locale.providers=CLDR, or set the java.locale.providers system property in your code, hy (Armenian) and hy_AM (Armenian, Armenia) will be supported.
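For example (a minimal sketch; the property has to be set before the first locale lookup, and the value "CLDR,JRE" keeps the standard JRE data as a fallback):
// Assumption: this runs before any locale-sensitive class is first used
System.setProperty("java.locale.providers", "CLDR,JRE");
Locale loc = new Locale("hy");
Calendar cal = Calendar.getInstance(loc);
System.out.println(cal.getDisplayName(Calendar.MONTH, Calendar.LONG_STANDALONE, loc));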
With the JDK 9 enhancements, CLDR locale data is enabled by default, so the code will run without adding any system property.
Hope this helps.

After browsing Oracle's website I've found a list of supported languages and locale IDs. It seems the language you want is not supported by the JDK 7 locale data.
http://www.oracle.com/technetwork/java/javase/javase7locales-334809.html

This language is not supported, but you can create your own locale by following this guide.
This is the Javadoc of Locale.Builder:
https://docs.oracle.com/javase/8/docs/api/java/util/Locale.Builder.html
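For illustration, a minimal Locale.Builder sketch ("hy" and "AM" are the standard BCP 47 codes for Armenian and Armenia; note that building the locale by itself does not add month-name data, which still has to come from a locale data provider):
Locale armenian = new Locale.Builder()
        .setLanguage("hy")
        .setRegion("AM")
        .build();
System.out.println(armenian.toLanguageTag()); // prints "hy-AM"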

Pallavi's answer is correct for Java 8 and Java 9.
However, if you are on Java 7, then you could set up your own DateFormatSymbolsProvider specialized for the Armenian language via the service loader mechanism.
You will need a file in the META-INF/services subdirectory with exactly this name:
META-INF/services/java.text.spi.DateFormatSymbolsProvider
And this file should contain a line like the following (please adjust the name to your real implementation class of the service provider mentioned above):
mypackage.MyImplementationOfDateFormatSymbolsProvider
As soon as you have created an appropriate JAR library with this META-INF substructure included, the new service provider for Armenian will be queried, too.
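A hypothetical skeleton of such a provider could look like the following (the class name matches the services file above; the month names are the Armenian CLDR forms quoted from memory, so please verify them against the actual CLDR data; also note that DateFormatSymbols carries only the format forms, not the standalone forms):
package mypackage;

import java.text.DateFormatSymbols;
import java.text.spi.DateFormatSymbolsProvider;
import java.util.Locale;

public class MyImplementationOfDateFormatSymbolsProvider extends DateFormatSymbolsProvider {

    private static final Locale ARMENIAN = new Locale("hy");

    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[] { ARMENIAN };
    }

    @Override
    public DateFormatSymbols getInstance(Locale locale) {
        // Start from the ROOT symbols and overwrite the month names
        DateFormatSymbols symbols = new DateFormatSymbols(Locale.ROOT);
        symbols.setMonths(new String[] {
            "հունվար", "փետրվար", "մարտ", "ապրիլ", "մայիս", "հունիս",
            "հուլիս", "օգոստոս", "սեպտեմբեր", "հոկտեմբեր", "նոյեմբեր", "դեկտեմբեր"
        });
        return symbols;
    }
}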
As for the required text resources, I have imported the CLDR v30 resources into my own library Time4J. Maybe you can benefit from the resource file for Armenian (which also contains the standalone forms of the month names) and use part of its content for your own service provider.

With the following code you can print out all supported Calendar locales (sorted by languageTag):
Locale[] locales = Calendar.getAvailableLocales();
Arrays.sort(locales, Comparator.comparing(Locale::toLanguageTag));
for (Locale locale : locales) {
    System.out.print(" " + locale.toLanguageTag());
}
Unfortunately, in my Oracle Java 8, there is no Armenian locale (beginning with "hy") in this list.
ar ar-AE ar-BH ar-DZ ar-EG ar-IQ ar-JO ar-KW ar-LB ar-LY ar-MA ar-OM ar-QA ar-SA ar-SD ar-SY ar-TN ar-YE be be-BY bg bg-BG ca ca-ES cs cs-CZ da da-DK de de-AT de-CH de-DE de-GR de-LU el el-CY el-GR en en-AU en-CA en-GB en-IE en-IN en-MT en-NZ en-PH en-SG en-US en-ZA es es-AR es-BO es-CL es-CO es-CR es-CU es-DO es-EC es-ES es-GT es-HN es-MX es-NI es-PA es-PE es-PR es-PY es-SV es-US es-UY es-VE et et-EE fi fi-FI fr fr-BE fr-CA fr-CH fr-FR fr-LU ga ga-IE he he-IL hi hi-IN hr hr-HR hu hu-HU id id-ID is is-IS it it-CH it-IT ja ja-JP ja-JP-u-ca-japanese-x-lvariant-JP ko ko-KR lt lt-LT lv lv-LV mk mk-MK ms ms-MY mt mt-MT nl nl-BE nl-NL nn-NO no no-NO pl pl-PL pt pt-BR pt-PT ro ro-RO ru ru-RU sk sk-SK sl sl-SI sq sq-AL sr sr-BA sr-CS sr-Latn sr-Latn-BA sr-Latn-ME sr-Latn-RS sr-ME sr-RS sv sv-SE th th-TH th-TH-u-nu-thai-x-lvariant-TH tr tr-TR uk uk-UA und vi vi-VN zh zh-CN zh-HK zh-SG zh-TW
Edit:
With Oracle Java 8 and the additional option -Djava.locale.providers=CLDR, as suggested in Pallavi's answer, the resulting list contains the Armenian locale ("hy"):
aa af af-NA agq ak am ar ar-AE ar-BH ar-DZ ar-EG ar-IQ ar-JO ar-KW ar-LB ar-LY ar-MA ar-OM ar-QA ar-SA ar-SD ar-SY ar-TN ar-YE as asa az az-Cyrl bas be be-BY bem bez bg bg-BG bm bn bn-IN bo br brx bs byn ca ca-ES cgg chr cs cs-CZ cy da da-DK dav de de-AT de-CH de-DE de-GR de-LI de-LU dje dua dyo dz ebu ee el el-CY el-GR en en-AU en-BE en-BW en-BZ en-CA en-Dsrt en-GB en-HK en-IE en-IN en-JM en-MT en-NA en-NZ en-PH en-PK en-SG en-TT en-US en-US-POSIX en-ZA en-ZW eo es es-419 es-AR es-BO es-CL es-CO es-CR es-CU es-DO es-EC es-ES es-GQ es-GT es-HN es-MX es-NI es-PA es-PE es-PR es-PY es-SV es-US es-UY es-VE et et-EE eu ewo fa fa-AF ff fi fi-FI fil fo fr fr-BE fr-CA fr-CH fr-FR fr-LU fur ga ga-IE gd gl gsw gu guz gv ha haw he he-IL hi hi-IN hr hr-HR hu hu-HU hy ia id id-ID ig ii is is-IS it it-CH it-IT ja ja-JP ja-JP-u-ca-japanese-x-lvariant-JP jmc ka kab kam kde kea khq ki kk kl kln km kn ko ko-KR kok ksb ksf ksh kw lag lg ln lo lt lt-LT lu luo luy lv lv-LV mas mer mfe mg mgh mk mk-MK ml mr ms ms-BN ms-MY mt mt-MT mua my naq nb nd ne ne-IN nl nl-BE nl-NL nmg nn nn-NO no no-NO nr nso nus nyn om or pa pa-Arab pl pl-PL ps pt pt-BR pt-PT rm rn ro ro-RO rof ru ru-RU ru-UA rw rwk saq sbp se seh ses sg shi shi-Tfng si sk sk-SK sl sl-SI sn so sq sq-AL sr sr-BA sr-CS sr-Cyrl-BA sr-Latn sr-Latn-BA sr-Latn-ME sr-Latn-RS sr-ME sr-RS ss ssy st sv sv-FI sv-SE sw sw-KE swc ta te teo th th-TH th-TH-u-nu-thai-x-lvariant-TH ti ti-ER tig tn to tr tr-TR ts twq tzm uk uk-UA und ur ur-IN uz uz-Arab uz-Latn vai vai-Latn ve vi vi-VN vun wae wal xh xog yav yo zh zh-CN zh-HK zh-Hans-HK zh-Hans-MO zh-Hans-SG zh-Hant zh-Hant-HK zh-Hant-MO zh-SG zh-TW zu

Related

Different precision/recall/f1 values when reproducing Stanford CoreNLP austen demo in CLI and Code

I'm trying to validate my Stanford CoreNLP-based Java code with the Jane Austen files from the Stanford CRF FAQ. I'm training as described in the FAQ on the CLI with the following commands:
# Training with corenlp 3.9.2
java -cp stanford-ner-2018-10-16/stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop
# testing with corenlp 3.9.2
java -cp stanford-ner-2018-10-16/stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier \
-loadClassifier ner-model.ser.gz -testFile jane-austen-emma-ch2.tsv
This gives me the following results:
CRFClassifier tagged 1999 words in 1 documents at 6227,41 words per second.
Entity P R F1 TP FP FN
PERS 0,8205 0,7273 0,7711 32 7 12
Totals 0,8205 0,7273 0,7711 32 7 12
Now I have Java code to train and test the model programmatically:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Properties;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.sequences.SeqClassifierFlags;
import edu.stanford.nlp.util.Triple;

public class train {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        System.out.println("Start NER training");
        Properties properties = new Properties();
        properties.load(new FileInputStream(new File("data/eval/austen.prop")));
        SeqClassifierFlags flags = new SeqClassifierFlags(properties);
        CRFClassifier<CoreLabel> crf = new CRFClassifier<>(flags);
        crf.train();
        crf.serializeClassifier("data/eval/ner-model.ser.gz");
        Triple<Double, Double, Double> scores =
                crf.classifyAndWriteAnswers("data/eval/jane-austen-emma-ch2.tsv", true);
        System.out.println("Scores:");
        System.out.format("  Precision:\t%.2f%%\n", scores.first);
        System.out.format("  Recall:\t%.2f%%\n", scores.second);
        System.out.format("  F1:\t\t%.2f%%\n", scores.third);
        System.out.println();
        System.out.println("End NER training done");
    }
}
This gives me different values for precision/recall/f1:
CRFClassifier tagged 1999 words in 1 documents at 12572,33 words per second.
Entity P R F1 TP FP FN
PERS 0,8250 0,7500 0,7857 33 7 11
Totals 0,8250 0,7500 0,7857 33 7 11
Scores:
Precision: 82,50%
Recall: 75,00%
F1: 78,57%
The training and testing files and austen.prop were taken unchanged from Stanford. Only the testing file jane-austen-emma-ch2.tsv was modified, as described in this question.
I read the Stanford sources to see if I missed something, but my values remained unchanged.
What am I missing?
Thanks for your help in advance.
Update: I did some further tests. It's definitely a training issue: the model trained with the CLI has one more false negative. I made a diff to see which entity is not detected:
371,372c371,372
< Miss PERS PERS
< Churchill PERS PERS
---
> Miss PERS O
> Churchill PERS O
It's 'Miss Churchill'.
Update 2: Found it. It's the following line in edu/stanford/nlp/ie/crf/CRFClassifier.java:
crf.knownLCWords.setMaxSize(-1);
Unfortunately the field is not publicly visible, and the object returned by crf.getKnownLCWords() has no setMaxSize() method.
Some further reading of the CoreNLP sources led to a solution: I cast the object returned by crf.getKnownLCWords() to a MaxSizeConcurrentHashSet. Now setMaxSize() is available, and the precision/recall/F1 values are the same as with the CoreNLP CLI training.
CRFClassifier<CoreLabel> crf = new CRFClassifier<>(flags);
// The cast exposes setMaxSize() (MaxSizeConcurrentHashSet lives in edu.stanford.nlp.util)
MaxSizeConcurrentHashSet<String> knownLCWords = (MaxSizeConcurrentHashSet<String>) crf.getKnownLCWords();
knownLCWords.setMaxSize(-1); // -1 = unbounded, matching what the CLI sets internally
crf.train();

Regex Java word context

What I want to achieve is to obtain the context of an acronym. Can you help me with the regular expression, please?
I am looping over the text (a String) and looking for dots; after a match I am trying to get the context of the particular acronym found, so that I can do some other processing afterwards, but I can't get the context. I need to take at least 5 words before and 5 words after the acronym.
// Pattern to match each word ending with a dot
Pattern pattern = Pattern.compile("(\\w+)\\b([.])");
Matcher matchDot = pattern.matcher(textToCorrect);
while (matchDot.find()) {
    System.out.println("zkratka ---" + matchDot.group() + " ---");
    // 5 words before and after the match = context
    // Matcher matchContext = Pattern.compile("(.{25})(" + matchDot.group() + ")(.{25})").matcher(textToCorrect);
    Pattern patternContext = Pattern.compile("(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,10}" + matchDot.group() + "(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,10}");
    Matcher matchContext = patternContext.matcher(textToCorrect);
    if (matchContext.find()) {
        System.out.println("context: " + matchContext.group() + " :");
        // System.out.println("context: " + matchContext.group(1) + " :");
        // System.out.println("context: " + matchContext.group(2) + " :");
    }
}
Example:
input:
Some 84% of Paris residents see fighting pol. as a priority and 54% supported a diesel ban in the city by 2020, according a poll carried out for the Journal du Dimanche.
output:
The first regex will find "pol."
The second regex will find "of Paris residents see fighting pol. as a priority and 54%"
Another example, with more text:
I need to loop through this once, and every time I match an acronym, get the context of that particular acronym; after that I do some data mining. Here's the original text:
neklidná nemocná, vyš. je možné provést pouze nativně
Na mozku je patrna hyperdenzita v počátečním úseku a. cerebri media
vlevo, vlevo se objevuje již smazání hranic mezi bazálními ganglii a
okolní bílou hmotou a mírná difuzní hypointenzita v periventrikulární
bílé hmotě. Kromě těchto čerstvých změn jsou patrné staré
postmalatické změny temporálně a parietookcipitálně vlevo. Oboustranně
jsou patrné vícečetné vaskulární mikroléze v centrum semiovale bilat.
Nejsou známky nitrolebního krvácení. skelet kalvy orientačně nihil tr.
Z á v ě r: Známky hyperakutní ischemie v povodí ACM vlevo, staré
postmalatickéé změny T,P a O vlevo, vaskulární mikroléze v centrum
semiovale bilat.
CT AG: vyš. po bolu k.l..
Po zklidnění nemocné se podařilo provést CT AG. Na krku je naznačený
kinkink na ACC vlevo a ACI vlevo pod bazí. Kalcifikace v karotických
sifonech nepůsobí hemodynamicky významné stenozy. Intrakraniálně je
patrný konický uzávěr operkulárního úseku a. cerebri media vlevo pro
parietální lalok. Ostatní nález na intrakraniálním tepenném řečišti je
v mezích normy.
Z á v ě r: uzávěr operkulárního úseku a. cerebri media vlevo.
Of course, stopping at the end of a sentence is OK for me :-) The question is how to find all the acronyms, even when they come right before a newline (\n).
I would try this out:
(?:\w+\W+){5}((?:\w.?)+)(?:\w+\W+){5}
Though natural language processing with regular expressions cannot be accurate.
((?:[\w!@#$%&*]+\s+){5}([\w!@#$%&*]+\.)(?:\s+[\w!@#$%&*]+){5})
Try this. See the demo:
https://regex101.com/r/aQ3zJ3/9
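To make this concrete, here is a small self-contained test of the second pattern against the example sentence from the question (a sketch; the class and variable names are mine):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcronymContext {
    public static void main(String[] args) {
        String text = "Some 84% of Paris residents see fighting pol. as a priority "
                + "and 54% supported a diesel ban in the city by 2020, according "
                + "a poll carried out for the Journal du Dimanche.";
        // Exactly 5 words before and 5 words after a word ending with a dot
        Pattern p = Pattern.compile(
                "((?:[\\w!@#$%&*]+\\s+){5}([\\w!@#$%&*]+\\.)(?:\\s+[\\w!@#$%&*]+){5})");
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println("acronym: " + m.group(2)); // pol.
            System.out.println("context: " + m.group(1)); // of Paris ... and 54%
        }
    }
}
Changing {5} to {0,5} on both sides would also catch acronyms that sit closer than five words to the start or end of the text.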

DB2 insert UTF-8 characters on non unicode database with ALT_COLLATE UNICODE

I am trying to insert Chinese text into a DB2 database, but it is not working.
The database is configured by default as ANSI (en_US, code page 819), which is a requirement for other applications that use the same database. ALT_COLLATE IDENTITY_16BIT is defined and Unicode tables are created using CCSID UNICODE, but Unicode characters for Chinese or Korean are not inserted correctly.
Example table:
CREATE TABLE LANGS (
    IDIOMA char(2) NOT NULL,
    PAIS char(2) NOT NULL,
    TRADUC long varchar NOT NULL
) CCSID UNICODE;
Example insert:
INSERT INTO LANGS (IDIOMA,PAIS,TRADUC) VALUES ('zh','TW','其他');
System Information:
Server: DB2 9.7 on Ubuntu 64bit (en_US)
Client: Windows 7 32bit (es_ES) Java 7 with db2jcc.jar
Example Java extract:
Class.forName("com.ibm.db2.jcc.DB2Driver");
...
Properties props = new Properties();
props.setProperty("user", user);
props.setProperty("password", pass);
props.setProperty("DB2CODEPAGE", "1208");
props.setProperty("retrieveMessagesFromServerOnGetMessage", "true");
con = DriverManager.getConnection(url, props);
...
Statement statement = con.createStatement();
statement.execute(sql);
...
statement.close();
con.close();
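As a side note, a parameterized insert (standard JDBC, sketched against the LANGS table above) keeps the Chinese text out of the SQL string entirely, which helps to rule out statement-text encoding as the culprit:
String sql = "INSERT INTO LANGS (IDIOMA, PAIS, TRADUC) VALUES (?, ?, ?)";
PreparedStatement ps = con.prepareStatement(sql);
ps.setString(1, "zh");
ps.setString(2, "TW");
ps.setString(3, "其他"); // passed as a Java String, not embedded in the SQL text
ps.executeUpdate();
ps.close();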
DB cfg get
DB2 Database locale configuration
Database territory = en_US
Database code page = 819
Database code set = iso8859-1
Database country/region code = 1
Database collating sequence = UNIQUE
Alternate collating sequence (ALT_COLLATE) = IDENTITY_16BIT
Database page size = 4096
Statements are executed correctly and rows appear correctly in the database for:
en_GB
en_US
es_ES
pt_PT
but not for:
cy_GB
ko_KR
zh_TW
Inserting from the command line with db2cmd also does not work for these languages (it inserts, but only 1 byte).
Inserting from the command line in a Linux environment localized as zh_TW works.
Inserting from the command line in a Linux environment localized as en_US.utf-8 works.
It never works from Java in any of these environments.
Using "X" as a prefix for the VARCHAR field is not an option due to some restrictions, and besides, the same SQL already works in two environments.
I think it may be some encoding problem on the client or the server, due to the configuration, the file, or the SQL encoding.
Update:
I also tried to load a UTF-8 file with the SQL statements. The file loads correctly, and while debugging I can see the SQL with UTF-8 characters is passed correctly to the Statement, but the result is the same.
new InputStreamReader(new FileInputStream(file),"UTF-8")
...
private void executeLineByLine(Reader reader) throws SQLException {
    StringBuffer command = new StringBuffer();
    try {
        BufferedReader lineReader = new BufferedReader(reader);
        String line;
        while ((line = lineReader.readLine()) != null) {
            command = handleLine(command, line);
        }
        checkForMissingLineTerminator(command);
    } catch (Exception e) {
        String message = "Error executing: " + command + ". Cause: " + e;
        printlnError(message);
        throw new SQLException(message, e);
    }
}

private StringBuffer handleLine(StringBuffer command, String line) throws SQLException, UnsupportedEncodingException {
    String trimmedLine = line.trim();
    if (lineIsComment(trimmedLine)) {
        println(trimmedLine);
    } else if (commandReadyToExecute(trimmedLine)) {
        command.append(line.substring(0, line.lastIndexOf(delimiter)));
        command.append(LINE_SEPARATOR);
        println(command);
        executeStatement(command.toString());
        command.setLength(0);
    } else if (trimmedLine.length() > 0) {
        command.append(line);
        command.append(LINE_SEPARATOR);
    }
    return command;
}

private void executeStatement(String command) throws SQLException, UnsupportedEncodingException {
    boolean hasResults = false;
    Statement statement = connection.createStatement();
    hasResults = statement.execute(command);
    printResults(statement, hasResults);
    statement.close();
}
Update 2:
It's not possible to change the data types; the database is part of other systems and already contains data.
The database is installed on 7 different servers; on three of them, where the data is inserted using Linux in a UTF-8 shell, the data was inserted correctly from the db2 command line.
From the Windows db2 command line, or using Java, it's not possible to insert the characters correctly.
Changing the Java sources to UTF-8 encoding makes System.out print the SQL correctly, just as I see it when debugging the sql variable.
When I insert this test SQL, it is shown correctly with Chinese characters in System.out and in the Statement's internal variable:
INSERT INTO LANGS (IDIOMA,PAIS,TRADUC) VALUES ('zh','TW','TEST1 其他 FIN TEST1');
But in the database the text appears as:
TEST3 FIN TEST3
HEX representation:
54 45 53 54 33 20 1A 1A 1A 1A 1A 1A 1A 1A 20 46 49 4E 20 54 45 53 54 33
T E S T 3 _ ? ? ? ? ? ? ? ? _ F I N _ T E S T 3
I think the DB2 Java client is probably always using the Windows code page (in this case ISO-8859-1 or cp1252) instead of UTF-8, or the server is converting the data using the main collation instead of the table's alternative collation.
Update 3:
I installed a Java SQL tool called DbVisualizer; using this tool on Windows, when I paste the SQL into the SQL panel and run it, the row is inserted correctly into the database.
This makes me suspect that it is not a problem of installation or data types. It is probably one of these three factors:
Client configuration
Server properties sent when the client connects
Type or version of the driver used
The problem is solved using these steps:
Always use db2jcc4.jar, not db2jcc.jar (JDBC 4).
(In some places JDBC level 2 was configured in the OS classpath with db2jcc instead of db2jcc4.)
Set the environment variable DISABLEUNICODE=0.
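To verify which driver actually gets loaded at runtime, standard JDBC metadata can be queried (a small sketch reusing the url and props from the question):
Connection con = DriverManager.getConnection(url, props);
DatabaseMetaData md = con.getMetaData();
System.out.println(md.getDriverName() + " " + md.getDriverVersion());
// db2jcc4.jar should report JDBC 4.x here; db2jcc.jar reports an older level
System.out.println("JDBC " + md.getJDBCMajorVersion() + "." + md.getJDBCMinorVersion());
con.close();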
There is complete information about Unicode on DB2 on the page Understanding DB2 Universal Database character conversion.

Tess-two OCR not working

I'm trying to get text from an image using tess-two on Android, but it's giving me a really bad result.
01-16 12:00:25.339: I/Tesseract(native)(29038): Initialized Tesseract API with language=spa
and about 30 seconds later it shows this as the result string:
{ga
.,
r¿
y“: A
r M í
:3
' ‘Ev’.-:.. -: A 7
» w- ?" _
Á.» ¿"A ¿rw-V r
mjÏfn 'n’n . Y
' "\'ZA".‘.¡ A‘ :‘ïvAv- « ‘
:"Éf‘Ï'" -Ï«l :‘,.v:...»- .
' RFI' .. ’ g)" 3;:- 1-;4',
= * ¿,arifgggk mw; .1. ,
' "53» "J
't‘ ‘ ¿Las ;.‘».L',-‘»
' ' 'N‘“ "“=: - '. V . ‘9!
5.? ' “F a .“
Y , <_ 7- . 7.-, .
;« z "1:;2wr . A - . ' -»‘ 5“:
“4-”, ¿rn 73:33: w v'.‘ ¿a ‘ A ,z, v VA
...,,« ' 'Q ' ‘ 4 214€. 5 . AV ¿JL y .13:
1 » . 21mm; » ¿ati-“fl ¿ab-1377*“ w”
. x ‘ ‘ ú F v'v:
1 . ' . ; (“ya í .
Of course that's not correct. I'm using this photo:
I have tried it a lot of times, always with a similar result.
What can be wrong? This is my code using tess-two:
TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init("/mnt/sdcard/external_sd/tess/", "spa",TessBaseAPI.OEM_TESSERACT_ONLY);
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
Log.d("Texto leido", "texto: "+recognizedText);
baseApi.end();
And this is how I get the bitmap from the file:
BitmapFactory.Options options = new BitmapFactory.Options();
options.inPreferredConfig = Bitmap.Config.ARGB_8888;
Bitmap bitmap = BitmapFactory.decodeFile(photopath.getAbsolutePath(), options);
I'm using that bitmap in an ImageView and it looks correct, so I can't find why it's working that badly.
Any idea?
Here, change the language code to match the language of the text in the image, e.g. use 'eng' for English text recognition or 'spa' for Spanish:
1)
TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init("/mnt/sdcard/external_sd/tess/", "eng");
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
Log.d("Texto leido", "texto: "+recognizedText);
baseApi.end();
2) Download the language package files (Download here). You must download the osd.traineddata.zip file and tesseract-ocr-3.01.eng.tar.zip (here eng is for English, spa for Spanish, etc.) and paste the files into the assets folder.
3) Before setting the bitmap, convert the image into a grayscale bitmap.
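One common way to do the grayscale conversion on Android is a ColorMatrix with saturation 0 (a sketch; the method name toGrayscale is mine, not from the original answer):
import android.graphics.Bitmap;
import android.graphics.Canvas;
import android.graphics.ColorMatrix;
import android.graphics.ColorMatrixColorFilter;
import android.graphics.Paint;

public static Bitmap toGrayscale(Bitmap src) {
    Bitmap gray = Bitmap.createBitmap(src.getWidth(), src.getHeight(),
            Bitmap.Config.ARGB_8888);
    Canvas canvas = new Canvas(gray);
    Paint paint = new Paint();
    ColorMatrix cm = new ColorMatrix();
    cm.setSaturation(0); // drop all color information
    paint.setColorFilter(new ColorMatrixColorFilter(cm));
    canvas.drawBitmap(src, 0, 0, paint);
    return gray;
}
The result can then be passed to baseApi.setImage(...) as before.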

dbf CREATE TABLE throws java.sql.SQLException: Syntax error: Stopped parse at

I have a dbf file, and I can see in the viewer that the types of the interesting fields are L (I suppose it is a Logical type) and M (I suppose it's a Memo type).
I try to recreate the dbf template using dbf_jdbc, with a table like:
private static final String TABLE = "create table SAMPLE ( "
+ " SM Logical, "
+ " PRIM MEMO " + ")";
...
String url = "jdbc:DBF:/C:\\TEST";
Connection dbfConn = null;
PreparedStatement ps = null;
...
// instantiate it
Class.forName( "com.hxtt.sql.dbf.DBFDriver" ).newInstance();
dbfConn = DriverManager.getConnection( url, properties );
Statement stmt = dbfConn.createStatement();
stmt.executeUpdate(TABLE);
But I'm getting the following errors:
java.sql.SQLException: Syntax error: Stopped parse at MEMO
java.sql.SQLException: Syntax error: Stopped parse at LOGICAL
The reason is the type names: when I use varchar, everything is fine.
Dbf_jdbc version (from jar manifest file):
Manifest-Version: 1.0
Created-By: HXTT Version Robot
Main-Class: com.hxtt.sql.admin.Admin
Name: com/hxtt/sql/dbf/
Specification-Title: HXTT DBF JDBC 3.0 Package
Implementation-Title: com.hxtt.sql.dbf
Specification-Version: 4.2.056 on April 01, 2009
Specification-Vendor: Hongxin Technology & Trade Ltd.
Comment: JDBC 3.0 Package for Xbase database
Implementation-Version: 4.2.056 on April 01, 2009
Implementation-Vendor: Hongxin Technology & Trade Ltd.
Implementation-URL: http://www.hxtt.com/dbf.html
Name: com/hxtt/sql/admin/
Specification-Title: HXTT Database Admin
Implementation-Title: com.hxtt.sql.admin
Specification-Vendor: Hongxin Technology & Trade Ltd.
Specification-Version: 0.5 on April 01, 2009
Comment: HXTT Database Admin
Implementation-Version: 0.5 on April 01, 2009
Implementation-Vendor: Hongxin Technology & Trade Ltd.
Implementation-URL: http://www.hxtt.com/dbf/dbadmin.html
So my question is: which SQL types should I use so that I can create the dbf template from code and, when I open the file in a dbf viewer, see the letters M and L as the type short names?
private static final String TABLE = "create table SAMPLE ( "
    + " SM BIT, "
    + " PRIM longvarchar " + ")";
See SQL Data Types for Create Table at http://www.hxtt.com/dbf/sqlsyntax.html#createtable
I could not find the reason for the problem with dbf_jdbc, so I used the javadbf framework to create the template. The following example illustrates it:
File file = new File( filePathName );
DBFWriter dbfWriter = new DBFWriter( file );
dbfWriter.setCharactersetName( "cp866" );
DBFField[] fields = new DBFField[ 29 ];
fields[ 0 ] = new DBFField();
fields[ 0 ].setDataType( DBFField.FIELD_TYPE_L );
fields[ 0 ].setName( "SM" );
...
fields[ 19 ] = new DBFField();
fields[ 19 ].setDataType( DBFField.FIELD_TYPE_M );
fields[ 19 ].setName( "PRIM" );
I don't know about the Java-based dbf driver, but an implied abbreviated version is to just use "L" or "M" respectively:
create table SAMPLE ( SM L, PRIM M )
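Combined with the JDBC setup from the question, an end-to-end sketch would look like this (whether the HXTT parser accepts the single-letter type names is exactly the suggestion of this answer; I have not verified it against the driver):
private static final String TABLE = "create table SAMPLE ( SM L, PRIM M )";
...
Class.forName("com.hxtt.sql.dbf.DBFDriver").newInstance();
Connection dbfConn = DriverManager.getConnection("jdbc:DBF:/C:\\TEST", properties);
Statement stmt = dbfConn.createStatement();
stmt.executeUpdate(TABLE);
stmt.close();
dbfConn.close();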
Additionally, for some other types:
C(?) = character (?=length of character based field)
I = integer
D = date (only date portion)
T = date/time
B(?) = double(?=decimal precision -- ex: B(3) = up to 3 decimals )
dBase III files support:
Char name C(40)
Date birth D
Logical member L
Memo desc M
Numeric rate N(6, 2)
The first letter of the type is what you want to use.
Additionally, other dbf formats allow:
Currency price Y (note Y, not C)
DateTime appt T (note T, not D)
Double mass B (note B, not D)
Float (same as Numeric)
General bin_data G
Integer age I
Picture photo P
Currency, Double, Integer, General, and Picture all store the data as binary, while the others store the data as text.
