I want to create a dictionary in which the user can define a custom alphabet (still using Unicode). Is there a way to change the lowercase and uppercase mappings of characters?
Let's say I want the lowercase of 'I' to be 'ı' instead of 'i', or the uppercase of 'b' to be 'P' instead of 'B', so that System.out.println("PAI".toLowerCase()); would write baı to the console.
I suppose I can create a method toLowerCase(String s) that first replaces "P" with "b"s then converts to lowercase but wouldn't that be slower when searching through a dictionary of hundreds of thousands of words?
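String.toLowerCase() uses a locale to decide how to convert characters: either the default locale or one passed explicitly via toLowerCase(Locale). You can change the default with Locale.setDefault(Locale) before calling toLowerCase(). Note that the case rules themselves are built into the JDK, so a locale only helps when an existing one (for example Turkish, for 'I' to 'ı') already has the mapping you want.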
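For the specific 'I' to 'ı' case, a built-in locale already applies that rule. A minimal sketch, assuming Turkish case rules are acceptable for the rest of the alphabet (the class and method names are made up for illustration); it does not cover arbitrary pairs such as 'P' to 'b':

```java
import java.util.Locale;

class LocaleLowerCaseDemo {
    // Lowercases using Turkish case rules, where 'I' maps to dotless 'ı' (U+0131).
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        // 'P' is still lowercased to 'p' here, not to 'b'.
        System.out.println(lowerTurkish("PAI"));
    }
}
```

Passing the locale explicitly avoids changing the JVM-wide default, which would affect every other locale-sensitive call in the program.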
No, it would not be noticeably slower: you are simply traversing the character array once without moving any elements, which is O(n). Any system should be able to handle a single replacement pass followed by a toLowerCase call easily.
You could also write your own toLowerCase method to accommodate your needs. Even simpler!
This should do the trick:
import java.util.HashMap;
import java.util.Map;
class MyString {
String string;
static final Map<Character, Character> toLowerCaseMap, toUpperCaseMap;
static {
toLowerCaseMap = new HashMap<>();
toLowerCaseMap.put('I', 'ı'); // dotless i (U+0131), as requested in the question
toUpperCaseMap = new HashMap<>();
toUpperCaseMap.put('b', 'P');
}
MyString(String string) {
this.string = string;
}
String toLowerCase() {
char[] chars = string.toCharArray();
for (int i = 0; i < chars.length; i++) {
char c = chars[i];
chars[i] = toLowerCaseMap.containsKey(c) ? toLowerCaseMap.get(c) : Character.toLowerCase(c);
}
return new String(chars);
}
String toUpperCase() {
char[] chars = string.toCharArray();
for (int i = 0; i < chars.length; i++) {
char c = chars[i];
chars[i] = toUpperCaseMap.containsKey(c) ? toUpperCaseMap.get(c) : Character.toUpperCase(c);
}
return new String(chars);
}
}
You cannot inherit from the String class because it is final, but you can create your own class with its own toLowerCase method. I suggest giving the methods different names to make maintenance easier.
As for the dictionary of hundreds of thousands of words:
You could use a Map such as a HashMap, with the string entered by the user as the key, and store the lowercase form as the value so it is computed only once. It depends on what you need.
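One way to read the Map suggestion is to normalize each word once, when it is stored, so that lookups never re-convert the whole dictionary. A minimal sketch under that assumption; the class name NormalizedDictionary and the two hard-coded rules ('P' to 'b', 'I' to 'ı') are illustrative only, and here the normalized form is used as the key:

```java
import java.util.HashMap;
import java.util.Map;

class NormalizedDictionary {
    // Maps the custom-lowercased form of a word to the word as the user entered it.
    private final Map<String, String> entries = new HashMap<>();

    // Hypothetical custom rules taken from the question; everything else uses the default mapping.
    static String normalize(String word) {
        char[] chars = word.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c == 'P') {
                chars[i] = 'b';
            } else if (c == 'I') {
                chars[i] = '\u0131'; // dotless i
            } else {
                chars[i] = Character.toLowerCase(c);
            }
        }
        return new String(chars);
    }

    void add(String word) {
        entries.put(normalize(word), word);
    }

    boolean contains(String word) {
        return entries.containsKey(normalize(word));
    }
}
```

Each lookup then costs one pass over the query word plus a hash lookup, regardless of how many words are stored.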
For better performance with very large dictionaries, I would recommend storing the values in a database.
I'm currently trying to loop through a String, identify a specific character within it, and then add the same character immediately after each occurrence.
For example using the string: aaaabbbcbbcbb
And the character I want to identify being: c
So every time a c is detected a following c will be added to the string and the loop will continue.
Thus aaaabbbcbbcbb will become aaaabbbccbbccbb.
I've been trying to make use of indexOf(), substring() and charAt(), but I'm currently either overwriting other characters with a c or only detecting one c.
I know you've asked for a loop, but won't something as simple as a replace suffice?
String inputString = "aaaabbbcbbcbb";
String charToDouble = "c";
String result = inputString.replace(charToDouble, charToDouble+charToDouble);
// or `charToDouble+charToDouble` could be `charToDouble.repeat(2)` in JDK 11+
If you insist on using a loop however:
String inputString = "aaaabbbcbbcbb";
char charToDouble = 'c';
String result = "";
for(char c : inputString.toCharArray()){
result += c;
if(c == charToDouble){
result += c;
}
}
Iterate over all the characters. Add each one to a StringBuilder. If it matches the character you're looking for then add it again.
final String test = "aaaabbbcbbcbb";
final char searchChar = 'c';
final StringBuilder builder = new StringBuilder();
for (final char c : test.toCharArray())
{
builder.append(c);
if (c == searchChar)
{
builder.append(c);
}
}
System.out.println(builder.toString());
Output
aaaabbbccbbccbb
You are probably trying to modify a String in place. Strings in Java are immutable and cannot be changed the way one might in C++.
You can use StringBuilder to insert characters. eg:
StringBuilder builder = new StringBuilder("acb");
builder.insert(1, 'c');
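Applied to the original problem, StringBuilder.insert can be used in a loop. A minimal sketch (the class and method names are invented for illustration); it walks backwards so that an insertion never shifts the positions that are still to be examined:

```java
class DoubleCharWithInsert {
    // Returns a copy of input in which every occurrence of target is doubled.
    static String doubleChar(String input, char target) {
        StringBuilder builder = new StringBuilder(input);
        // Iterate from the end: inserting at index i only moves characters at i and above.
        for (int i = builder.length() - 1; i >= 0; i--) {
            if (builder.charAt(i) == target) {
                builder.insert(i, target);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleChar("aaaabbbcbbcbb", 'c'));
    }
}
```

Each insert shifts the tail of the buffer, so for long strings with many matches the append-based loop is likely the cheaper choice.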
The previous answer suggesting String.replace is the best solution, but if you need to do it some other way (e.g. for an exercise), then here's a 'modern' solution:
public static void main(String[] args) {
final String inputString = "aaaabbbcbbcbb";
final int charToDouble = 'c'; // A Unicode codepoint
final String result = inputString.codePoints()
.flatMap(c -> c == charToDouble ? IntStream.of(c, c) : IntStream.of(c))
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
assert result.equals("aaaabbbccbbccbb");
}
This looks at each character in turn (in an IntStream). It doubles the character if it matches the target. It then accumulates each character in a StringBuilder.
A micro-optimization can be made to pre-allocate the StringBuilder's capacity. We know the maximum possible size of the new string is double the old string, so StringBuilder::new can be replaced by () -> new StringBuilder(inputString.length()*2). However, I'm not sure if it's worth the sacrifice in readability.
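For reference, the pre-allocated variant described above could look roughly like this. It is a sketch only; the class and method names are placeholders, and the capacity of twice the input length is the upper bound mentioned in the text:

```java
import java.util.stream.IntStream;

class DoubleCodePoint {
    // Doubles every occurrence of the code point target, using a pre-sized StringBuilder.
    static String doubleCodePoint(String input, int target) {
        return input.codePoints()
                .flatMap(c -> c == target ? IntStream.of(c, c) : IntStream.of(c))
                .collect(() -> new StringBuilder(input.length() * 2), // worst case: every char doubled
                        StringBuilder::appendCodePoint,
                        StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleCodePoint("aaaabbbcbbcbb", 'c'));
    }
}
```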
I need to split a String into an array of single character Strings.
Eg, splitting "cat" would give the array "c", "a", "t"
"cat".split("(?!^)")
This will produce
array ["c", "a", "t"]
"cat".toCharArray()
But if you need strings
"cat".split("")
Edit: on Java 7 and earlier this returns an empty first value; from Java 8 on, the leading empty string is no longer produced.
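A quick way to check how both patterns behave on the Java version in use. This is a sketch assuming Java 8 or later, where both calls are expected to yield the same three elements (the class name is made up):

```java
import java.util.Arrays;

class SplitDemo {
    // Splits at every position except the start of the input.
    static String[] splitLookahead(String s) {
        return s.split("(?!^)");
    }

    // Splits on the empty pattern; the result on Java 7 and earlier differs (leading empty string).
    static String[] splitEmpty(String s) {
        return s.split("");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLookahead("cat")));
        System.out.println(Arrays.toString(splitEmpty("cat")));
    }
}
```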
String str = "cat";
char[] cArray = str.toCharArray();
If characters beyond the Basic Multilingual Plane are expected in the input (some CJK characters, newer emoji...), approaches such as "a💫b".split("(?!^)") cannot be used, because they break such characters apart (resulting in the array ["a", "?", "?", "b"]), and something safer has to be used:
"a💫b".codePoints()
.mapToObj(cp -> new String(Character.toChars(cp)))
.toArray(size -> new String[size]);
split("(?!^)") does not work correctly if the string contains surrogate pairs. You should use split("(?<=.)").
String[] splitted = "花ab🌹🌺🌷".split("(?<=.)");
System.out.println(Arrays.toString(splitted));
output:
[花, a, b, 🌹, 🌺, 🌷]
To sum up the other answers...
This works on all Java versions:
"cat".split("(?!^)")
This only works on Java 8 and up:
"cat".split("")
An efficient way of turning a String into an array of one-character Strings would be to do this:
String[] res = new String[str.length()];
for (int i = 0; i < str.length(); i++) {
res[i] = Character.toString(str.charAt(i));
}
However, this does not take account of the fact that a char in a String could actually represent half of a Unicode code-point. (If the code-point is not in the BMP.) To deal with that you need to iterate through the code points ... which is more complicated.
This approach will be faster than using String.split(/* clever regex*/), and it will probably be faster than using Java 8+ streams. It is probably also faster than this:
String[] res = new String[str.length()];
int i = 0;
for (char ch : str.toCharArray()) {
res[i++] = Character.toString(ch);
}
because toCharArray has to copy the characters to a new array.
for(int i=0;i<str.length();i++)
{
System.out.println(str.charAt(i));
}
You can use a for loop that goes through the String and extracts it character by character using the charAt method.
Combined with an ArrayList<String>, for example, you can build your array of individual characters.
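The charAt plus ArrayList idea might be sketched as follows. The class and method names are illustrative, and like the loop above it works on char units, so characters outside the BMP would be split into two elements:

```java
import java.util.ArrayList;
import java.util.List;

class CharListDemo {
    // Collects each char of str as a one-character String, then converts the list to an array.
    static String[] toSingleCharStrings(String str) {
        List<String> parts = new ArrayList<>(str.length());
        for (int i = 0; i < str.length(); i++) {
            parts.add(String.valueOf(str.charAt(i)));
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String s : toSingleCharStrings("cat")) {
            System.out.println(s);
        }
    }
}
```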
If the original string contains supplementary Unicode characters, then split() on char boundaries will not work, because it splits each such character into its two surrogates. Code like the following handles them correctly:
String[] chars = new String[stringToSplit.codePointCount(0, stringToSplit.length())];
for (int i = 0, j = 0; i < stringToSplit.length(); j++) {
int cp = stringToSplit.codePointAt(i);
char c[] = Character.toChars(cp);
chars[j] = new String(c);
i += Character.charCount(cp);
}
In my previous answer I confused this with JavaScript. Here is an analysis of the performance in Java.
I agree with the need for attention on the Unicode Surrogate Pairs in Java String. This breaks the meaning of methods like String.length() or even the functional meaning of Character because it's ultimately a technical object which may not represent one character in human language.
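To make the length point concrete, here is a small sketch comparing String.length() with codePointCount on a string containing one supplementary character. The sample string and class name are chosen only for illustration:

```java
class SurrogateLengthDemo {
    // "a", U+1F4AB (outside the BMP, stored as two chars), then "b".
    static final String SAMPLE = "a" + new String(Character.toChars(0x1F4AB)) + "b";

    // Number of UTF-16 code units.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charCount(SAMPLE));      // counts both surrogates
        System.out.println(codePointCount(SAMPLE)); // counts the pair once
    }
}
```

Even code points are not always what a reader perceives as one character (combining marks, for instance), so this only addresses the surrogate-pair part of the problem.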
I implemented several methods that split a string into a list of character-representing strings (strings corresponding to the human meaning of a character); three of them are shown below. Here is the result of the comparison:
A line is a String consisting of 1000 arbitrarily chosen emojis and 1000 ASCII characters (1000 times <emoji><ascii>, 2000 "characters" in total in the human sense).
(discarding 256 and 512 measures)
Implementations:
codePoints (java 11 and above)
public static List<String> toCharacterStringListWithCodePoints(String str) {
if (str == null) {
return Collections.emptyList();
}
return str.codePoints()
.mapToObj(Character::toString)
.collect(Collectors.toList());
}
classic
public static List<String> toCharacterStringListWithIfBlock(String str) {
if (str == null) {
return Collections.emptyList();
}
List<String> strings = new ArrayList<>();
char[] charArray = str.toCharArray();
int delta = 1;
for (int i = 0; i < charArray.length; i += delta) {
delta = 1;
if (i < charArray.length - 1 && Character.isSurrogatePair(charArray[i], charArray[i + 1])) {
delta = 2;
strings.add(String.valueOf(new char[]{ charArray[i], charArray[i + 1] }));
} else {
strings.add(Character.toString(charArray[i]));
}
}
return strings;
}
regex
static final Pattern p = Pattern.compile("(?<=.)");
public static List<String> toCharacterStringListWithRegex(String str) {
if (str == null) {
return Collections.emptyList();
}
return Arrays.asList(p.split(str));
}
Annex (RAW DATA):
codePoints;classic;regex;lines
45;44;84;256
14;20;98;512
29;42;91;1024
52;56;99;2048
87;121;174;4096
175;221;375;8192
345;411;839;16384
667;826;1285;32768
1277;1536;2440;65536
2426;2938;4238;131072
We can do this simply by
const string = 'hello';
console.log([...string]); // -> ['h','e','l','l','o']
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax says
Spread syntax (...) allows an iterable such as an array expression or string to be expanded...
So, strings can be quite simply spread into arrays of characters.
I want to loop through the alphabet with a for loop and add each letter to my HashMap
for(char alphabet = 'A'; alphabet <= 'Z';alphabet++) {
System.out.println(alphabet);
}
doesn't work for me, because my HashMap is of form
HashMap<String, HashMap<String, Student>> hm;
I need my iterator to be a string, but
for(String alphabet = 'A'; alphabet <= 'Z';alphabet++) {
System.out.println(alphabet);
}
doesn't work.
Basically, I want to do this:
for i from 'A' to 'Z' do
hm.put(i, null);
od
Any ideas?
Basically convert the char to a string, like this:
for(char alphabet = 'A'; alphabet <= 'Z';alphabet++) {
hm.put(""+alphabet, null);
}
Note, however, that ""+alphabet is not efficient, as it boils down to a call to StringBuilder.
An equivalent but more efficient way is
String.valueOf(alphabet)
or
Character.toString(alphabet)
which are actually the same.
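Put together, filling the map could look like the sketch below. The type parameter V stands in for the question's inner HashMap<String, Student>, which is not defined here, and the class and method names are invented:

```java
import java.util.HashMap;
import java.util.Map;

class AlphabetKeys {
    // Builds a map with the keys "A".."Z", each mapped to null as in the question.
    static <V> Map<String, V> withAlphabetKeys() {
        Map<String, V> hm = new HashMap<>();
        for (char letter = 'A'; letter <= 'Z'; letter++) {
            hm.put(String.valueOf(letter), null);
        }
        return hm;
    }

    public static void main(String[] args) {
        Map<String, Object> hm = withAlphabetKeys();
        System.out.println(hm.keySet());
    }
}
```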
You cannot assign a char to a String, nor increment a String with ++. You can iterate over the char the way you did in your first sample and convert that char to a String, like this:
for(char letter = 'A'; letter <= 'Z'; letter++) {
String s = new String(new char[] {letter});
System.out.println(s);
}
First problem: When you work with a HashMap, you are supposed to map a key to a value. You don't just put something in a hash map. The letter you wanted to put, is it a value? Then what is the key? Is it a key? Then what is the value?
You might think that using "null" as a value is a good idea, but you should ask yourself: in that case, should I use a map at all? Maybe using a HashSet is a better idea?
The second problem is that a HashMap, like all Java collections, only takes objects, both as keys and as values. If you want to use a character as a key, you could define your map as Map<Character,Map<String,Student>>, which will auto-box your character (convert it to an object of type Character), or you could convert the character to a string using
Character.toString(alphabet);
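The Character-keyed alternative might be sketched like this. Since Student is not shown in the question, String is used as a placeholder for the inner value type, and an empty inner map is stored instead of null; both are assumptions made for the example:

```java
import java.util.HashMap;
import java.util.Map;

class CharacterKeyedMap {
    // Builds a map keyed by 'A'..'Z'; the inner value type String is a stand-in for Student.
    static Map<Character, Map<String, String>> build() {
        Map<Character, Map<String, String>> byInitial = new HashMap<>();
        for (char letter = 'A'; letter <= 'Z'; letter++) {
            byInitial.put(letter, new HashMap<>()); // char is auto-boxed to Character
        }
        return byInitial;
    }

    public static void main(String[] args) {
        Map<Character, Map<String, String>> byInitial = build();
        byInitial.get('S').put("Smith", "placeholder student");
        System.out.println(byInitial.get('S'));
    }
}
```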
In Java 8 with Stream API, you can do this.
IntStream.rangeClosed('A', 'Z').mapToObj(var -> String.valueOf((char) var)).forEach(System.out::println);
I am installing alarms whose interface requires me to name various zones. The problem is that it accepts Greek characters only when they have no look-alike in the English alphabet; otherwise the English equivalent must be entered.
For example, if I write "ΠΑΡΑΘΥΡΟ", the characters at positions 1, 2, 3, 5, 6 and 7 must be entered in English because they look the same as the Greek ones, and only the characters at positions 0 and 4 must be in Greek.
I care only about capitals.
Any idea how to do this with two simple JTextFields?
Thank you!
Use a HashMap to translate characters. Since the problem domain is small and will probably never change, it's justifiable to hard-code the content of the map, like so:
private static final Map<Character, Character> GREEK_TO_ROMAN = new HashMap<>();
static {
GREEK_TO_ROMAN.put('\u0391', '\u0041'); // uppercase alpha
GREEK_TO_ROMAN.put('\u03A1', '\u0050'); // uppercase rho
// ...
}
Then get the input string's character array, translate characters as needed, and create a new String from the changed array:
String s = "ΠΑΡΑΘΥΡΟ";
char[] chars = s.toCharArray();
for (int i = 0; i < chars.length; i++) {
Character repl = GREEK_TO_ROMAN.get(chars[i]);
if (repl != null)
chars[i] = repl;
}
s = new String(chars);
I don't quite see how JTextField comes into play, but if you want, you can subclass it, override the getText() method and make sure that any String it returns is already converted.
I'm trying to write a method that removes all non-alphabetic characters from each element of a Java String[] and then converts it to lower case. I've tried using a regular expression to replace every non-alphabetic character with "". However, the output shows that nothing is removed. Here is the code:
static String[] inputValidator(String[] line) {
for(int i = 0; i < line.length; i++) {
line[i].replaceAll("[^a-zA-Z]", "");
line[i].toLowerCase();
}
return line;
}
However if I try to supply an input that has non alphabets (say - or .) the output also consists of them, as they are not removed.
Example Input
A dog is an animal. Animals are not people.
Output that I'm getting
A
dog
is
an
animal.
Animals
are
not
people.
Output that is expected
a
dog
is
an
animal
animals
are
not
people
The problem is your changes are not being stored because Strings are immutable. Each of the method calls is returning a new String representing the change, with the current String staying the same. You just need to store the returned String back into the array.
line[i] = line[i].replaceAll("[^a-zA-Z]", "");
line[i] = line[i].toLowerCase();
Because each method returns a String, you can chain your method calls together. This performs the second method call on the result of the first, allowing you to do both actions in one line.
line[i] = line[i].replaceAll("[^a-zA-Z]", "").toLowerCase();
You need to assign the result of your regex back to line[i].
for ( int i = 0; i < line.length; i++) {
line[i] = line[i].replaceAll("[^a-zA-Z]", "").toLowerCase();
}
It doesn't work because strings are immutable, you need to set a value
e.g.
line[i] = line[i].toLowerCase();
You must reassign the result of toLowerCase() and replaceAll() back to line[i], since Java String is immutable (its internal value never changes, and the methods in String class will return a new String object instead of modifying the String object).
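A short sketch showing both halves of this: the discarded result leaves the original String untouched, and the corrected validator stores the result back. The method name inputValidator follows the question; the class name and sample values are invented:

```java
import java.util.Arrays;

class ImmutableStringDemo {
    // Corrected version: the new String returned by each call is stored back into the array.
    static String[] inputValidator(String[] line) {
        for (int i = 0; i < line.length; i++) {
            line[i] = line[i].replaceAll("[^a-zA-Z]", "").toLowerCase();
        }
        return line;
    }

    // Returns the input unchanged, because the result of toLowerCase() is thrown away.
    static String discardedResult(String s) {
        s.toLowerCase(); // creates a new String that nobody keeps
        return s;
    }

    public static void main(String[] args) {
        System.out.println(discardedResult("Animal."));
        System.out.println(Arrays.toString(inputValidator("A dog is an animal.".split(" "))));
    }
}
```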
This has already been answered, but here is one more way that has not been mentioned yet. Note that \P{Alnum} keeps digits; use \P{Alpha} instead if digits should be removed as well:
str = str.replaceAll("\\P{Alnum}", "").toLowerCase();
A cool (but slightly cumbersome, if you don't like casting) way of doing what you want is to go through the entire string, index by index, casting each result of String.charAt(index) to (byte), and then checking whether that byte is either a) in the numeric range of lower-case alphabetic characters (a = 97 to z = 122), in which case you cast it back to char and add it to a String, array, or what-have-you, or b) in the numeric range of upper-case alphabetic characters (A = 65 to Z = 90), in which case you add 32 (A + 32 = 65 + 32 = 97 = a), cast that to char and add it. If it is in neither of those ranges, simply discard it.
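The range-check idea could be sketched as below. One deliberate difference from the description: the comparison is done on the char itself rather than on a byte cast, because narrowing a non-ASCII char to byte could make it land in the letter ranges by accident. The class and method names are illustrative:

```java
class AsciiLetterFilter {
    // Keeps only ASCII letters and converts upper case to lower case by adding 32.
    static String lettersToLower(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= 'a' && c <= 'z') {          // 97..122: already lower case
                out.append(c);
            } else if (c >= 'A' && c <= 'Z') {   // 65..90: shift into the lower-case range
                out.append((char) (c + 32));
            }
            // anything else is discarded
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersToLower("Animals."));
    }
}
```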
You can also use Arrays.setAll for this:
Arrays.setAll(array, i -> array[i].replaceAll("[^a-zA-Z]", "").toLowerCase());
Here is a working method:
String name = "Joy.78#,+~'{/>";
String[] stringArray = name.split("\\W+");
StringBuilder result = new StringBuilder();
for (int i = 0; i < stringArray.length; i++) {
result.append(stringArray[i]);
}
String nameNew = result.toString();
nameNew = nameNew.toLowerCase(); // store the result: Strings are immutable
public static void solve(String line){
// trim to remove unwanted spaces
line= line.trim();
String[] split = line.split("\\W+");
// print using for-each
for (String s : split) {
System.out.println(s);
}