Is it possible to code Java in another language?

I was curious whether a person could code Java in another language.
I DON'T mean naming your variables in different languages, like this:
String[] tableau = {"Janvier", "Fevrier"};
System.out.println(tableau[0].length());
But more like
Chaîne[] tableau = {"Janvier", "Fevrier"};
Système.sortie.imprimeln(tableau[0].longueur());
Is that doable, or would you need to write your own French (or [insert language]) based programming language?

If you were to use a preprocessor to accomplish this, I believe it would work perfectly well. Java does not ship with one, but C and C++ do (e.g. cpp), so you could add a step to your build chain that performs the preprocessing, and your code would be translated into ordinary "English" Java before being compiled. For another example, consider CoffeeScript, a language that compiles to JavaScript. So, as long as your mapping is one-for-one, I believe the answer is Oui.
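To make the idea concrete, here is a minimal sketch of such a preprocessing step in Java. The French-to-English mapping and the handful of terms it covers are purely illustrative, and a real preprocessor would tokenize properly so it doesn't rewrite string literals or comments:
import java.util.LinkedHashMap;
import java.util.Map;

// Toy "French Java" preprocessor: rewrites a few illustrative French terms
// into plain Java before the result is handed to javac.
public class FrenchJavaPreprocessor {

    private static final Map<String, String> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put("Système.sortie.imprimeln", "System.out.println");
        MAPPING.put("Chaîne", "String");
        MAPPING.put("longueur", "length");
        MAPPING.put("publique", "public");
        MAPPING.put("classe", "class");
    }

    public static String translate(String source) {
        String result = source;
        for (Map.Entry<String, String> e : MAPPING.entrySet()) {
            // Naive textual replacement; a real tool would work on tokens
            // so that string literals and comments are left untouched.
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(translate("Système.sortie.imprimeln(tableau[0].longueur());"));
        // prints: System.out.println(tableau[0].length());
    }
}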

No.
You could write your own frontend to convert "French Java source" into "English Java" (either bytecode or source) for the base language, but you are still going to have problems with all the libraries and any third-party tools, which will still be English-based.

Java syntax has to be written in plain English; the compiler only understands the English keywords when turning source into bytecode. You can, of course, display labels and messages in other languages using Locale.
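As a minimal illustration of that last point, here is the usual ResourceBundle pattern; the bundle name and the Messages_fr.properties file it implies are assumptions for this example, not something that ships with the JDK:
import java.util.Locale;
import java.util.ResourceBundle;

public class GreetingDemo {
    public static void main(String[] args) {
        // Looks for Messages_fr.properties on the classpath (falling back to
        // Messages.properties); e.g. a file containing: greeting=Bonjour
        ResourceBundle bundle = ResourceBundle.getBundle("Messages", Locale.FRENCH);
        System.out.println(bundle.getString("greeting"));
    }
}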

Programming languages do not, as a rule, translate their library identifiers and built-in keywords from one natural language to another.
If you're programming in Java, you can use UTF-8 for encoding your source files.
You can then use Unicode symbols such as characters from languages other than English in your own identifiers.
You can name your own type Chaîne; but the String type stays String, and keywords like if or for or public stay English.
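For example, the following compiles fine (with javac -encoding UTF-8 if your platform default encoding isn't UTF-8); the identifiers are French, but the keywords and library types stay English:
// Compile with: javac -encoding UTF-8 Exemple.java
public class Exemple {
    public static void main(String[] args) {
        String[] tableau = {"Janvier", "Février"};
        for (String mois : tableau) {
            int longueur = mois.length();                // your own identifiers can be French...
            System.out.println(mois + ": " + longueur);  // ...but String, for and int stay English
        }
    }
}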
The concept of localizing the keywords of a language has been tried. For instance in the obscure language Protium. (I'd give a link if all leads weren't defunct; but Rosetta Code has some examples.) In Protium, all symbols are made up of character trigraphs to create semi-readable abbreviations. For instance, this code snippet which is rendered in English:
<# SAI>
<# ITEFORLI3>2121|2008|
<# LETVARCAP>Christmas Day|25-Dec-<# SAYVALFOR>...</#></#>
<# TSTDOWVARLIT>Christmas Day|1</#>
<# IFF>
<# SAYCAP>Christmas Day <# SAYVALFOR>...</#> is a Sunday</#><# SAYKEY>__Newline</#>
</#>
</#>
</#>
Now the idea is that these trigraphs, like LET VAR CAP, which make up an identifier like LETVARCAP, individually map to some corresponding trigraphs in other languages. Or perhaps, in the case of languages with complex characters in their writing system like Chinese or Japanese, to a single ideographic character.
Make of it what you will.

Related

Getting the alphabet of a language from the Locale in Java

I am making an internationalized app in Java. I need a list of all the letters in a language, starting from the Locale. There are some questions, like Alphabet constant in Java? or Create Alphabet List from list : Java, which touch on the issue, but I'm wondering whether there is a Utils class or something where this is already defined, and from which I can get a list of chars or a String containing all the letters in the alphabet of a language given its Locale.
You can use the ICU library's com.ibm.icu.util.LocaleData class and its methods; pass Locale.ENGLISH (as a ULocale) to get the English alphabet.
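A sketch of what that looks like with ICU4J (the exact iteration details may differ slightly between ICU versions):
import com.ibm.icu.text.UnicodeSet;
import com.ibm.icu.util.LocaleData;
import com.ibm.icu.util.ULocale;

public class AlphabetFromLocale {
    public static void main(String[] args) {
        // Standard exemplar characters for English; 0 means no extra options
        UnicodeSet exemplars = LocaleData.getExemplarSet(ULocale.ENGLISH, 0);

        StringBuilder alphabet = new StringBuilder();
        for (String s : exemplars) {   // UnicodeSet is iterable over its strings
            alphabet.append(s);
        }
        System.out.println(alphabet);  // abcdefghijklmnopqrstuvwxyz
    }
}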
There are several issues here.
First, I have to point out that there are many languages that aren't alphabetic; Chinese and Japanese are obvious examples of ideographic languages. Unfortunately, it would be very hard, next to impossible, to create a list of all the characters in those languages.
Second, although the Common Locale Data Repository, and as a consequence ICU, have predefined sets of index and exemplar characters, this information is far from complete.
Third, there are languages that use more than one script (i.e. writing system). Depending on the source of your locale, you may or may not know which characters need to be displayed.
Finally, it is hard to give you the right answer when you haven't provided your use case. The design of your application may impose serious limitations on usability or localizability...

Parser tokens like those in PHP in other languages?

Short version:
Is there something similar to PHP parser tokens in other interpreted languages (Python, Ruby, etc.) and compiled languages (the C family, Java, etc.)?
Long Version:
On the CPP Rocks website there is an article showing a visual comparison of language complexity by means of a graph that breaks the various building blocks of a language down into categories (the example shown there is a graph for CoffeeScript).
I wanted to make such a graph for PHP, using the parser tokens as a starting point (to make sure I don't miss anything, and because I'm lazy). I was wondering if there is something similar to these tokens in other interpreted languages (Python, Ruby, etc.) and compiled languages (the C family, Java, etc.).
Findings thus far:
Java: the Chapters of the Language Specification describing Syntax and Lexical Structure seem a good place to start.
Python: Chapter 2 of the manual does describe Python's lexical structure.
Ruby: the token list for Ruby.
All parsers turn the input into tokens. The language may or may not expose what those tokens are; the actual meaning and names of the tokens vary, and since different languages have different syntax, reserved words and other constructs, each language will have a slightly different set of tokens.
A token here is just a "named representation of the actual symbol in the language specification". So for example, the parser will see the word break as input, and make it into the token T_BREAK.
For the type of graph you are looking at, you need to know what the different language constructs are, categorise them and then show them graphically - I'm not sure looking at the list of tokens is the best way to achieve that.
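To make the break -> T_BREAK idea concrete, here is a toy sketch in Java; the token names and the handful of keywords are invented for illustration, and real lexers are of course far more complete:
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy lexer: splits source on whitespace and maps each word to a token name.
public class ToyLexer {

    private static final Set<String> KEYWORDS = Set.of("break", "if", "while", "return");

    public static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String word : source.trim().split("\\s+")) {
            if (KEYWORDS.contains(word)) {
                tokens.add("T_" + word.toUpperCase());     // e.g. break -> T_BREAK
            } else {
                tokens.add("T_IDENTIFIER(" + word + ")");  // everything else is an "identifier"
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if counter break"));
        // [T_IF, T_IDENTIFIER(counter), T_BREAK]
    }
}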
There is no such thing as a list of tokens for a language. Tokens are a property of the parser (more precisely: they are a property of the interface between the lexer and the parser), not of the language. A different parser parsing the same language may use a completely different set of tokens. Many modern parsers are lexerless, which means they don't have tokens at all.
In Ruby, for example, Melbourne (the parser used by Rubinius) uses a very different set of tokens than RedParse/RubyLexer (used by all sorts of projects) which again uses a very different set of tokens than the ANTLR-based parser used by XRuby and Sapphire in Steel.

Convert language code to (dojo) locale code

I have a system that returns language codes to me, and I wish to use them to compute the corresponding locale code that Dojo accepts.
Is there a neat way to do this?
(not sure what kind of mapping can be used)
Given that you've tagged this with Java, I'm guessing you're getting a java.util.Locale. Convert the underscores to dashes and make everything lowercase.
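Something along these lines, assuming the input really is a java.util.Locale (the method name here is just for illustration):
import java.util.Locale;

public class DojoLocales {

    // e.g. en_US -> "en-us", pt_BR -> "pt-br"
    static String toDojoLocale(Locale locale) {
        return locale.toString().replace('_', '-').toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toDojoLocale(Locale.US));     // en-us
        System.out.println(toDojoLocale(Locale.FRANCE)); // fr-fr
    }
}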

How can I parse REXX code in Java?

I'd like to parse REXX source so that I can analyse the structure of the program from Java.
I need to do things like normalise equivalent logic structures in the source that are syntactically different, find duplicate variable declarations, etc. and I already have a Java background.
Any easier ways to do this than writing a load of code?
REXX is not an easy language to parse with common tools, especially those that expect a BNF grammar. Unlike most languages designed by people exposed to C, REXX doesn't have any reserved words, making the task somewhat complicated. Every term that looks like a reserved word is actually only resolved in its specific context (e.g., "PULL" is only reserved as the first word of a PULL instruction or the second word of a PARSE PULL instruction - you can also have a variable called PULL ("PULL = 1 + 2")). Plus there are some very surprising effects of comments. But the ANSI REXX standard has the full syntax and all the rules.
If you have a BNF Rexx grammar, then JavaCC can help you build an AST (Abstract Syntax Tree) representation of that Rexx code.
More accurately, JavaCC will generate the Java classes which will:
parse the Rexx code, and
actually build the AST.
There would still be a "load of code", but you would not be the one writing the classes for that Rexx parser - only generating them.
Have a look at ANTLR; it really does a nice job of building an AST, transforming it, etc.
It has a nice editor (ANTLRWorks), is built on Java, and can debug your parser / tree walkers while they run in your application. Really worth investigating for any kind of parsing job.
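To give an idea of the Java side, here is a rough sketch of driving an ANTLR-generated parser; RexxLexer, RexxParser and the program start rule are hypothetical names you would get by running ANTLR over a Rexx grammar (e.g. a Rexx.g4 you write or find):
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public class RexxAstDemo {
    public static void main(String[] args) {
        String source = "say 'Hello, world!'";

        // Lexer and parser classes are generated by ANTLR from the grammar
        RexxLexer lexer = new RexxLexer(CharStreams.fromString(source));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        RexxParser parser = new RexxParser(tokens);

        // "program" is assumed to be the grammar's start rule
        ParseTree tree = parser.program();

        // LISP-style view of the parse tree, handy for inspecting structure
        System.out.println(tree.toStringTree(parser));
    }
}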

How can I determine what the alphabet for a locale is in java?

I would like to determine what the alphabet for a given locale is, preferably based on the browser's Accept-Language header values. Does anyone know how to do this, using a library if necessary?
Take a look at LocaleData.getExemplarSet: http://icu-project.org/apiref/icu4j/com/ibm/icu/util/LocaleData.html#getExemplarSet(com.ibm.icu.util.ULocale, int)
For example, for English this returns abcdefghijklmnopqrstuvwxyz.
If you just want to know the name of an appropriate character set for a user's locale, then you might try the java.nio.charset.Charset class.
If you really want to use the Accept-Language header, then there's an old O'Reilly article on this matter which introduces a pretty handy class called LanguageNegotiator.
I think one of those will give you a decent enough start.
It depends on how specific you want to get. One place to look would be at the "Suppress-Script" properties in the IANA language registry.
Some languages have multiple "alphabets" that can be used for writing. For example, Azerbaijani can be written in Latin or Arabic script. Most languages, like English, are written almost exclusively in a single script, so the correct script goes without saying, and should be "suppressed" in language codes.
So, looking at the entry for Russian, you can tell that the preferred script is Cyrillic, while for Amharic it is Ethiopic. But German, Norwegian, and English aren't more specific than "Latin". So, with this method, you'd have a hard time hiding umlauts and thorns from Americans, or offering any script to a Kashmiri writer.
This is an English answer written in Århus. Yesterday, I heard some Germans say 'Blödheit, à propos, ist dumm'. However, one of them wore a shirt that said 'I know the difference between 文字 and الْعَرَبيّة'.
What's the answer to your question for this text? Is it allowed? Isn't this an English text?
The International Components for Unicode might help here. Specifically the UScript class looks promising.
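For instance, something like this (a rough sketch using ICU4J; the exact output names come from ICU's script data):
import com.ibm.icu.lang.UScript;
import java.util.Locale;

public class ScriptForLocale {
    public static void main(String[] args) {
        // UScript.getCode returns the script code(s) associated with a locale
        int[] scripts = UScript.getCode(new Locale("ru"));
        if (scripts != null) {
            for (int script : scripts) {
                System.out.println(UScript.getName(script)); // e.g. Cyrillic
            }
        }
    }
}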
Out of curiosity: What do you need it for?
