Convert foreign characters (Greek) to English ones.


I am installing alarms, and their interface requires me to give names to various zones. The problem is that it only accepts Greek characters that have no look-alike in the English alphabet; every other character has to be entered as its English equivalent.
For example, if I write "ΠΑΡΑΘΥΡΟ", the characters at positions 1, 2, 3, 5, 6, 7 must be entered in English because they look the same as the Greek ones. Only the characters at positions 0 and 4 must stay in Greek.
I only care about capitals.
Any idea how to do it with 2 simple JTextFields?
Thank you!

Use a HashMap to translate characters. Since the problem domain is small and will probably never change, it's justifiable to hard-code the content of the map, like so:
private static final Map<Character, Character> GREEK_TO_ROMAN = new HashMap<>();
static {
    GREEK_TO_ROMAN.put('\u0391', '\u0041'); // uppercase alpha
    GREEK_TO_ROMAN.put('\u03A1', '\u0050'); // uppercase rho
    // ...
}
Then get the input string's character array, translate characters as needed, and create a new String from the changed array:
String s = "ΠΑΡΑΘΥΡΟ";
char[] chars = s.toCharArray();
for (int i = 0; i < chars.length; i++) {
    Character repl = GREEK_TO_ROMAN.get(chars[i]);
    if (repl != null)
        chars[i] = repl;
}
s = new String(chars);
I don't quite see how JTextField comes into play, but if you want you can subclass it, override the getText() method, and make sure that any String it returns is already converted.
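For reference, the map could be filled in along the following lines. This is only a sketch: the class name GreekLookalikes and the method toLatinLookalikes are made up for illustration, and the list of 14 capitals treated as look-alikes is an assumption about what the alarm panel accepts, so check it against the device. Unicode escapes are used so the file compiles regardless of source encoding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the names are illustrative, not part of any library.
class GreekLookalikes {
    private static final Map<Character, Character> GREEK_TO_ROMAN = new HashMap<>();
    static {
        // Assumed look-alike pairs: Greek capitals Alpha, Beta, Epsilon, Zeta, Eta, Iota,
        // Kappa, Mu, Nu, Omicron, Rho, Tau, Upsilon, Chi and their Latin twins.
        String greek = "\u0391\u0392\u0395\u0396\u0397\u0399\u039A\u039C\u039D\u039F\u03A1\u03A4\u03A5\u03A7";
        String latin = "ABEZHIKMNOPTYX";
        for (int i = 0; i < greek.length(); i++) {
            GREEK_TO_ROMAN.put(greek.charAt(i), latin.charAt(i));
        }
    }

    // Replaces every mapped Greek capital; all other characters are kept as they are.
    static String toLatinLookalikes(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            Character repl = GREEK_TO_ROMAN.get(chars[i]);
            if (repl != null) {
                chars[i] = repl;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // The word from the question; Pi and Theta stay Greek, the other six become Latin.
        String zone = "\u03A0\u0391\u03A1\u0391\u0398\u03A5\u03A1\u039F";
        System.out.println(toLatinLookalikes(zone));
    }
}
```

With two text fields, the converted value could be written to the second field whenever the first one changes, for example from an ActionListener or DocumentListener.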

Related

Converting letters to alphabet position with two JLists

I'm trying to replace every alphabet letter in the words from JList1 with the number corresponding to its place in the alphabet, and show the result in JList2 at the press of the Run button (e.g. A becomes 01). If a character is not an English alphabet letter, it should be left as it is. Capitalization doesn't matter (a and A are both 01), and spaces should be kept.
For visual purposes:
"Apple!" should be converted to "0116161205!"
"stack Overflow" to "1920010311 1522051806121523"
"über" to "ü020518"
I have tried a few methods I found on here, but had zero clue how to add the extra 0 in front of the first 9 letters or keep the spaces. Any help is much appreciated.

Here is a solution:
// Create a map from each letter to its zero-padded position in the alphabet
Map<Character, String> lettersToNumber = new HashMap<>();
int i = 1;
for (char c = 'a'; c <= 'z'; c++) {
    lettersToNumber.put(c, String.format("%02d", i++));
}
// Loop over the characters of the input and append the corresponding number,
// or the character itself if it is not a letter a-z
String result = "";
for (char c : "Apple!".toCharArray()) {
    char x = Character.toLowerCase(c);
    result += lettersToNumber.containsKey(x) ? lettersToNumber.get(x) : c;
}
Input, Output
Apple! => 0116161205!
stack Overflow => 1920010311 1522051806121523
über => ü020518

So given...
(ex. A to 01) And if it's not an English alphabet letter then leaving it as it is. Capitalization doesn't matter (a and A is still 01) and spaces should be kept.
This raises some interesting points:
We don't care about non-English characters, so we can set aside issues around Unicode encodings
Capitalization doesn't matter
Spaces should be kept
These points are interesting to me because they mean we're only interested in a small subset of characters (the 26 letters, mapped to 1-26). This immediately screams "ASCII" to me!
ASCII gives us a ready-made lookup table: nothing has to be built up front, it's immediately accessible.
A quick look at any ASCII table provides all the information we need. A-Z is in the range 65-90 (since we don't care about case, we don't need to worry about the lower-case range).
But how does that help us!?
Well, the primary question now becomes "How do we convert a char to an int?", which is amazingly simple! In Java a char is a numeric type, and for A-Z its value is the same as the ASCII code, so it can be treated as both a "character" and a "number" at the same time.
So if you were to print out (int)'A', it would print 65! And since all the characters are in order, we just need to subtract 64 from 65 to get 1!
That's basically your entire problem solved right there!
Oh, okay, you need to deal with the edge cases of characters not falling between A-Z, but that's just a simple if statement.
A solution based on the above "might" look something like...
public static String convert(String text) {
    int offset = 64; // 'A' is 65, so 'A' - 64 == 1
    StringBuilder sb = new StringBuilder(32);
    for (char c : text.toCharArray()) {
        char input = Character.toUpperCase(c);
        int value = ((int) input) - offset;
        // Valid positions are 1 ('A') through 26 ('Z'); anything else is kept as-is
        if (value < 1 || value > 26) {
            sb.append(c);
        } else {
            sb.append(String.format("%02d", value));
        }
    }
    return sb.toString();
}
Now, there are a number of ways you might approach this, I've chosen a path based on my understanding of the problem and my experience.
And based on your example input...
String[] test = {"Apple!", "stack Overflow", "über"};
for (String value : test) {
    System.out.println(value + " = " + convert(value));
}
would produce the following output...
Apple! = 0116161205!
stack Overflow = 1920010311 1522051806121523
über = ü020518

Change lowercase and uppercase of characters in java

If I want to create a dictionary where the user can create a custom alphabet (that still uses Unicode), is there a way to change the lowercase and uppercase mapping of the characters?
Let's say I want the lowercase of 'I' to be 'ı' instead of 'i' or upperCase of 'b' to be 'P' instead of 'B' so that System.out.println("PAI".toLowerCase()); would write baı to the console.
I suppose I can create a method toLowerCase(String s) that first replaces "P" with "b"s then converts to lowercase but wouldn't that be slower when searching through a dictionary of hundreds of thousands of words?

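toLowerCase() uses the default locale to decide how to convert the characters, and toLowerCase(Locale) takes the locale explicitly. You could pick a locale and, for example, install it as the default via Locale.setDefault(Locale) before calling toLowerCase(). Note that the locale-sensitive case rules themselves are built into the JDK, so this only helps when an existing locale already has the mapping you want.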
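As a small illustration of the locale-based route, Turkish is one locale whose built-in rules lowercase 'I' to the dotless 'ı'. The sketch below assumes that mapping is all you need; the class and method names are made up, and it cannot produce the 'P' to 'b' mapping from the question, since that is not a rule of any locale.

```java
import java.util.Locale;

// Hypothetical demo class; only String.toLowerCase(Locale) is a real API here.
class LocaleCaseDemo {
    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Lowercases using Turkish rules, where capital I maps to dotless i (U+0131).
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        System.out.println(lowerTurkish("PAI").equals("pa\u0131")); // true
        // Locale.ROOT gives the locale-neutral mapping instead.
        System.out.println("PAI".toLowerCase(Locale.ROOT)); // pai
    }
}
```

Passing the locale explicitly avoids changing the JVM-wide default, which would also affect unrelated formatting and parsing code.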

No, it would not be noticeably slower: you are simply traversing the character array once without moving any elements, which is O(n). Performance shouldn't be affected, and any system should easily handle a single replacement pass followed by a toLowerCase call.
You could also write your own toLowerCase(String s) method that applies your custom mapping (String itself is final, so its method cannot be overridden). Even simpler!

This should do the trick:
import java.util.HashMap;
import java.util.Map;

class MyString {
    String string;
    // Custom case mappings that take precedence over the default Unicode ones
    static final Map<Character, Character> toLowerCaseMap, toUpperCaseMap;
    static {
        toLowerCaseMap = new HashMap<>();
        toLowerCaseMap.put('I', 'ı'); // dotless i, as asked in the question
        toLowerCaseMap.put('P', 'b'); // inverse of the b -> P mapping below
        toUpperCaseMap = new HashMap<>();
        toUpperCaseMap.put('b', 'P');
    }

    MyString(String string) {
        this.string = string;
    }

    String toLowerCase() {
        char[] chars = string.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            // Use the custom mapping if there is one, otherwise the standard one
            chars[i] = toLowerCaseMap.containsKey(c) ? toLowerCaseMap.get(c) : Character.toLowerCase(c);
        }
        return new String(chars);
    }

    String toUpperCase() {
        char[] chars = string.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            chars[i] = toUpperCaseMap.containsKey(c) ? toUpperCaseMap.get(c) : Character.toUpperCase(c);
        }
        return new String(chars);
    }
}

You cannot inherit from the String class because it is final, but you could create your own class with its own toLowerCase method. I suggest giving that method a different name, for maintainability.
As for the dictionary of hundreds of thousands of words:
You could use a Map such as a HashMap where the key is the string entered by the user and the value is the precomputed lowercase form; it depends on what you need.
For better performance, though, I would recommend storing the values in a database.
Regards.

Run-length decompression

CS student here. I want to write a program that will decompress a string that has been encoded according to a modified form of run-length encoding (which I've already written code for). For instance, if a string contains 'bba10' it would decompress to 'bbaaaaaaaaaa'. How do I get the program to recognize that part of the string ('10') is an integer?
Thanks for reading!

A simple regex will do.
final Matcher m = Pattern.compile("(\\D)(\\d+)").matcher(input);
final StringBuffer b = new StringBuffer();
while (m.find()) {
    // quoteReplacement keeps '$' and '\' in the data from being read as group references
    m.appendReplacement(b, Matcher.quoteReplacement(replicate(m.group(1), Integer.parseInt(m.group(2)))));
}
m.appendTail(b);
String result = b.toString();
where replicate is
String replicate(String s, int count) {
    final StringBuilder b = new StringBuilder(count);
    for (int i = 0; i < count; i++) b.append(s);
    return b.toString();
}

I'm not sure whether this is an efficient way, but just for reference:
for (int i = 0; i < your_string.length(); i++) {
    if (your_string.charAt(i) >= '0' && your_string.charAt(i) <= '9') {
        integer_begin_location = i; // index of the first digit
        break;
    }
}

I think you can divide the chars into numeric and non-numeric symbols.
When you find a numeric one (>= '0' and <= '9'), you look at the next char and decide whether to extend your number (current * 10 + new digit) or to expand your string.

Assuming that the uncompressed data never contains digits: iterate over the string character by character until you reach a digit. Then continue until you reach a non-digit (or the end of the string). The digits in between can be parsed to an integer, as others have already stated:
int count = Integer.parseInt(str.substring(start, end));
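Putting the scanning approach together might look like the sketch below. The class and method names are invented for illustration, and it relies on the same assumption as above: the uncompressed data never contains digits. A symbol that is not followed by digits is taken to occur once.

```java
// Hypothetical decoder; assumes symbols are never digits themselves.
class ManualRunLengthDecoder {
    static String decompress(String str) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < str.length()) {
            char symbol = str.charAt(i++);
            // Collect the run of digits that follows the symbol, if any.
            int start = i;
            while (i < str.length() && str.charAt(i) >= '0' && str.charAt(i) <= '9') {
                i++;
            }
            // No digits means the symbol appears exactly once.
            int count = (i > start) ? Integer.parseInt(str.substring(start, i)) : 1;
            for (int k = 0; k < count; k++) {
                out.append(symbol);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decompress("a3b2c")); // aaabbc
    }
}
```

Counts too large for an int would make Integer.parseInt throw a NumberFormatException, so real input may need a length check first.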

Here is a working implementation in Python. It also works for counts with 2, 3, or more digits.
inputString = "a1b3s22d4a2b22"
inputString = inputString + "\0"  # sentinel so that the last run is flushed
charcount = ""
previouschar = ""
outputString = ""
for char in inputString:
    if char.isnumeric():
        charcount = charcount + char
    else:
        if previouschar:
            outputString = outputString + (previouschar * int(charcount))
        charcount = ""
        previouschar = char
print(outputString)  # outputString = abbbssssssssssssssssssssssddddaabbbbbbbbbbbbbbbbbbbbbb

Presuming that you're not asking about the parsing, you can convert a string like "10" into an integer like this:
int i = Integer.parseInt("10");

Remove all non alphabetic characters from a String array in java

I'm trying to write a method that removes all non-alphabetic characters from a Java String[] and then converts each String to lower case. I've tried using a regular expression to replace every occurrence of a non-alphabetic character with "". However, the output shows that nothing is removed. Here is the code:
static String[] inputValidator(String[] line) {
    for (int i = 0; i < line.length; i++) {
        line[i].replaceAll("[^a-zA-Z]", "");
        line[i].toLowerCase();
    }
    return line;
}
However if I try to supply an input that has non alphabets (say - or .) the output also consists of them, as they are not removed.
Example Input
A dog is an animal. Animals are not people.
Output that I'm getting
A
dog
is
an
animal.
Animals
are
not
people.
Output that is expected
a
dog
is
an
animal
animals
are
not
people

The problem is your changes are not being stored because Strings are immutable. Each of the method calls is returning a new String representing the change, with the current String staying the same. You just need to store the returned String back into the array.
line[i] = line[i].replaceAll("[^a-zA-Z]", "");
line[i] = line[i].toLowerCase();
Because each method returns a String, you can chain the method calls together. This performs the second method call on the result of the first, allowing you to do both actions in one line.
line[i] = line[i].replaceAll("[^a-zA-Z]", "").toLowerCase();
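For completeness, here is the method from the question with that fix applied, as a small runnable sketch. The wrapper class InputValidatorDemo and the way the sample sentence is split on spaces are assumptions made for the demonstration.

```java
import java.util.Arrays;

// Hypothetical wrapper class around the question's method.
class InputValidatorDemo {
    // Same method as in the question, with the results assigned back to the array.
    static String[] inputValidator(String[] line) {
        for (int i = 0; i < line.length; i++) {
            line[i] = line[i].replaceAll("[^a-zA-Z]", "").toLowerCase();
        }
        return line;
    }

    public static void main(String[] args) {
        String[] words = "A dog is an animal. Animals are not people.".split(" ");
        System.out.println(Arrays.toString(inputValidator(words)));
        // prints [a, dog, is, an, animal, animals, are, not, people]
    }
}
```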

You need to assign the result of your regex back to line[i].
for (int i = 0; i < line.length; i++) {
    line[i] = line[i].replaceAll("[^a-zA-Z]", "").toLowerCase();
}

It doesn't work because strings are immutable; you need to assign the returned value back, e.g.
line[i] = line[i].toLowerCase();

You must reassign the result of toLowerCase() and replaceAll() back to line[i], since Java String is immutable (its internal value never changes, and the methods in String class will return a new String object instead of modifying the String object).

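As this has already been answered, I just thought of sharing one more way that was not mentioned here:
str = str.replaceAll("\\P{Alnum}", "").toLowerCase();
Note that \\P{Alnum} keeps digits as well as letters; use \\P{Alpha} if digits should be removed too.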

A cool (but slightly cumbersome, if you don't like casting) way of doing what you want is to go through the entire string, index by index, casting each result of String.charAt(index) to (int), and then checking whether that value is either a) in the numeric range of lower-case alphabetic characters (a = 97 to z = 122), in which case you cast it back to char and add it to a String, array, or what-have-you, or b) in the numeric range of upper-case alphabetic characters (A = 65 to Z = 90), in which case you add 32 (A + 32 = 65 + 32 = 97 = a), cast that to char, and add it in. If it is in neither of those ranges, simply discard it. Cast to int rather than byte: a byte keeps only the low 8 bits, so a character such as Ł (U+0141) would be mistaken for A.
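A minimal sketch of that range-check idea follows. The class and method names are hypothetical, and the code compares char values directly, which is equivalent to comparing the numeric codes 65-90 and 97-122.

```java
// Hypothetical helper illustrating the numeric range check.
class AsciiLetterFilter {
    static String lettersOnlyLowerCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z') {
                sb.append(c);               // already lower case (97-122)
            } else if (c >= 'A' && c <= 'Z') {
                sb.append((char) (c + 32)); // shift 65-90 up to 97-122
            }
            // anything else is discarded
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnlyLowerCase("Animals.")); // animals
    }
}
```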

You can also use Arrays.setAll for this:
Arrays.setAll(array, i -> array[i].replaceAll("[^a-zA-Z]", "").toLowerCase());

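Here is a working method. Note that \\W+ splits on non-word characters only, so digits and underscores are kept:
String name = "Joy.78#,+~'{/>";
String[] stringArray = name.split("\\W+");
StringBuilder result = new StringBuilder();
for (int i = 0; i < stringArray.length; i++) {
    result.append(stringArray[i]);
}
// Strings are immutable, so keep the value returned by toLowerCase()
String nameNew = result.toString().toLowerCase(); // "joy78"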

public static void solve(String line) {
    // trim to remove unwanted spaces
    line = line.trim();
    // split on runs of non-word characters
    String[] split = line.split("\\W+");
    // print using for-each
    for (String s : split) {
        System.out.println(s);
    }
}

Java: looking for the fastest way to check String for presence of Unicode chars in certain range

I need to implement a very crude language identification algorithm. In my world, there are only two languages: English and not-English. I have an ArrayList<String> and I need to determine whether each String is likely in English or in the other language, which has its Unicode chars in a certain range. So what I want to do is check each String against this range using some type of "presence" test. If it passes the test, I say the String is not English, otherwise it's English. I want to try two types of tests:
TEST-ANY: If any char in the string falls within the range, the string passes the test
TEST-ALL: If all chars in the string fall within the range, the string passes the test
Since the array might be very long, I need to implement this very efficiently. What would be the fastest way of doing this in Java?
Thx
UPDATE: I am specifically checking for non-English by looking at a specific range of Unicode code points rather than checking whether the characters are ASCII, in part to take care of the "resume" problem mentioned below. What I am trying to figure out is whether Java provides any classes/methods that essentially implement TEST-ANY or TEST-ALL (or another similar test) as efficiently as possible. In other words, I am trying to avoid reinventing the wheel, especially if the wheel invented before me is better anyway.

Here's how I ended up implementing TEST-ANY:
// TEST-ANY
String str = "wordToTest";
int UrangeLow = 1234; // can get range from e.g. http://www.utf8-chartable.de/unicode-utf8-table.pl
int UrangeHigh = 2345;
for (int iLetter = 0; iLetter < str.length(); ) {
    int cp = str.codePointAt(iLetter);
    if (cp >= UrangeLow && cp <= UrangeHigh) {
        // word is NOT English
        return;
    }
    iLetter += Character.charCount(cp); // step over both halves of a surrogate pair
}
// word is English
return;
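On Java 8 or later, both tests can also be written with String.codePoints(), whose anyMatch and allMatch stop as soon as the answer is known. This is a sketch rather than a benchmarked "fastest" solution; the class name RangeTests is made up, and the Greek and Coptic block (U+0370 to U+03FF) is used purely as an example range.

```java
// Hypothetical helper; the range bounds are passed in by the caller.
class RangeTests {
    // TEST-ANY: true if at least one code point lies in [low, high].
    static boolean testAny(String s, int low, int high) {
        return s.codePoints().anyMatch(cp -> cp >= low && cp <= high);
    }

    // TEST-ALL: true if every code point lies in [low, high].
    // Note: allMatch is vacuously true for the empty string.
    static boolean testAll(String s, int low, int high) {
        return s.codePoints().allMatch(cp -> cp >= low && cp <= high);
    }

    public static void main(String[] args) {
        int low = 0x0370, high = 0x03FF; // assumed example range: Greek and Coptic
        System.out.println(testAny("abc\u03A9", low, high));    // true
        System.out.println(testAll("abc\u03A9", low, high));    // false
        System.out.println(testAll("\u03A9\u03A0", low, high)); // true
    }
}
```

If the target language corresponds to a whole script, Character.UnicodeScript.of(codePoint) may be a convenient alternative to hard-coded bounds.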

I really don't think that this solution is ideal for determining language, but if you want to check whether a string is all ASCII, you could do something like this:
public static boolean isASCII(String s) {
    boolean ret = true;
    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) >= 128) {
            ret = false;
            break;
        }
    }
    return ret;
}
So then if you try this:
boolean r = isASCII("Hello");
r would equal true. But if you try:
boolean r = isASCII("Grüß dich");
then r would equal false. I haven't tested performance, but this would work reasonably fast, because all it does is compare a character to the number 128.
But as @AlexanderPogrebnyak mentioned in the comments above, this will return false if you give it "résumé". Be aware of that.
Update:
I am specifically checking for non-English by looking at a specific range of Unicode code points rather than checking whether the characters are ASCII
But ASCII is a range in Unicode: the first 128 Unicode code points are exactly the ASCII characters, so Unicode is an extension of ASCII. What the code @mP. and I provided does is check whether each character is in a certain range. I chose that range to be ASCII, which is any Unicode character with a decimal value of less than 128. You can just as well choose any other range. But the reason I chose ASCII is that it contains the Latin alphabet, the Arabic numerals, and some other common characters that would normally be in an 'English' string.

public static boolean isAscii(String s) {
    int length = s.length();
    for (int i = 0; i < length; i++) {
        final char c = s.charAt(i);
        // 'z' (122) is the upper bound here, so '{', '|', '}' and '~' also fail the test
        if (c > 'z') {
            return false;
        }
    }
    return true;
}
@Hassan, thanks for spotting the typo; I replaced the test against capital Z with lowercase z.
