Apache Commons Text: Random String for special characters java - java

I'm using apache commons-text:RandomStringGenerator for generating a random String like so:
//Utilities
private static RandomStringGenerator generator(int minimumCodePoint, int maximumCodePoint, CharacterPredicates... predicates) {
return new RandomStringGenerator.Builder()
.withinRange(minimumCodePoint, maximumCodePoint)
.filteredBy(predicates)
.build();
}
public static String randStringAlpha(int length) {
return generator('A', 'z', CharacterPredicates.LETTERS).generate(length);
}
public static String randStringAlphaNum(int length) {
return generator('0', 'z', CharacterPredicates.LETTERS, CharacterPredicates.DIGITS).generate(length);
}
//Generation
private void foo() {
String alpha = randStringAlpha(255);
String num = randStringAlphaNum(255);
}
I'm looking for a way to use the same library to generate the following:
A - special characters (could be limited to keyboard special characters)
B - alpha + A
C - num + A
D - alpha + num + A
I already checked the CharacterPredicates enum but it only has LETTERS and DIGITS for filtering. Any help would be really appreciated!
EDIT:===============================================
I decided to shelve my current solution in favor of this answer.
To clarify the scope of 'special characters' I was actually looking for this subset:
Snippet for case A:
public static CharSequence asciiSpecial() {
return asciiCharacters().toString().replaceAll("[0-9A-Za-z]","");
}

Your category “special characters” is quite fuzzy. As long as you stay within the ASCII range, every printable character is either a letter, a digit, or “special”, and all of them can be entered with an ordinary keyboard. In other words, you don’t need to specify a filter at all for that. On the other hand, when you leave the ASCII range, there is a variety of character categories you would have to take care of (e.g. you don’t want to insert random combining characters at arbitrary points); further, there is no general test for whether a character can be entered with a keyboard (as there is no general keyboard)…
But note that your code trying to use that library is already bigger than code doing the actual work would be. E.g. to get a random letter string, you could use
public static String randStringAlpha(int size) {
return ThreadLocalRandom.current().ints('A', 'z'+1)
.filter(Character::isLetter)
.limit(size)
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
}
or the likely more efficient variant
public static String randStringAlpha(int size) {
return ThreadLocalRandom.current().ints(size, 'A', 'Z'+1)
.map(c -> ThreadLocalRandom.current().nextBoolean()? c: Character.toLowerCase(c))
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
}
without any 3rd party library.
Likewise, you could generalize the task using
public static String randomString(int size, CharSequence validChars) {
return ThreadLocalRandom.current().ints(size, 0, validChars.length())
.map(validChars::charAt)
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
}
public static String randomString(int minSizeIncl, int maxSizeIncl, CharSequence valid) {
return randomString(
ThreadLocalRandom.current().nextInt(minSizeIncl, maxSizeIncl + 1), valid); // upper bound of nextInt is exclusive
}
public static CharSequence asciiLetters() {
return IntStream.concat(IntStream.rangeClosed('A','Z'), IntStream.rangeClosed('a','z'))
.collect(StringBuilder::new,StringBuilder::appendCodePoint,StringBuilder::append);
}
public static CharSequence asciiLetterOrDigit() {
return IntStream.concat(asciiLetters().chars(),IntStream.rangeClosed('0', '9'))
.collect(StringBuilder::new,StringBuilder::appendCodePoint,StringBuilder::append);
}
public static CharSequence asciiCharacters() {
return IntStream.rangeClosed('!', '~')
.collect(StringBuilder::new,StringBuilder::appendCodePoint,StringBuilder::append);
}
Which you can use by combining two methods, e.g.
RandomString.randomString(10, asciiLetters()),
RandomString.randomString(10, asciiLetterOrDigit()), or
RandomString.randomString(10, asciiCharacters()), resp. their variable-size counterparts like RandomString.randomString(10, 20, asciiCharacters()).
The CharSequences can be reused between multiple string generation calls, which is similar to building a RandomStringGenerator once and using it multiple times.
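Applied to the four cases from the question (A to D), the character-set approach could look like the sketch below. The class name SpecialCharStrings and its helpers are hypothetical, and "special" is assumed to mean printable ASCII ('!' to '~') that is neither a letter nor a digit.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

// Hypothetical helper class; the names are illustrative, not part of any library.
final class SpecialCharStrings {

    // Assumption: "special" = printable ASCII that is neither letter nor digit (32 characters).
    static String asciiSpecial() {
        return IntStream.rangeClosed('!', '~')
                .filter(c -> !Character.isLetterOrDigit(c))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    static String asciiLetters() {
        return IntStream.concat(IntStream.rangeClosed('A', 'Z'), IntStream.rangeClosed('a', 'z'))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    static String asciiDigits() {
        return "0123456789";
    }

    // Picks 'size' characters uniformly at random from validChars.
    static String randomString(int size, CharSequence validChars) {
        return ThreadLocalRandom.current().ints(size, 0, validChars.length())
                .map(validChars::charAt)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String special = asciiSpecial();
        System.out.println(randomString(12, special));                                  // A
        System.out.println(randomString(12, asciiLetters() + special));                 // B
        System.out.println(randomString(12, asciiDigits() + special));                  // C
        System.out.println(randomString(12, asciiLetters() + asciiDigits() + special)); // D
    }
}
```

Building the alphabets once and passing them around keeps each call cheap; concatenating two alphabets is all it takes to combine categories.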

You can modify your argument type in method generator from CharacterPredicates to CharacterPredicate and write your custom CharacterPredicate like:
private static RandomStringGenerator generator(int minimumCodePoint, int maximumCodePoint, CharacterPredicate... predicates) {
return new RandomStringGenerator.Builder()
.withinRange(minimumCodePoint, maximumCodePoint)
.filteredBy(predicates)
.build();
}
public static String randSomething(int length) {
return generator('1', 'z', new CharacterPredicate() {
@Override
public boolean test(int i) {
return true; // Write your logic here
}
}).generate(length);
}
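The interesting part is the predicate logic itself. The sketch below expresses a "special character" test as a plain java.util.function.IntPredicate so that it runs without Commons Text. It assumes, as the anonymous class above suggests, that CharacterPredicate.test(int) has the same shape, so the same condition could be used as the body of a custom predicate. The class and constant names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntPredicate;

// Hypothetical stand-in that mimics "range plus filter" with the standard library only.
final class SpecialPredicateDemo {

    // Assumption: "special" = printable ASCII that is neither letter nor digit.
    static final IntPredicate ASCII_SPECIAL =
            cp -> cp >= '!' && cp <= '~' && !Character.isLetterOrDigit(cp);

    // Draws random code points from [minCodePoint, maxCodePoint] and keeps those passing the filter.
    // Note: this never terminates if no code point in the range satisfies the filter.
    static String generate(int length, int minCodePoint, int maxCodePoint, IntPredicate filter) {
        return ThreadLocalRandom.current()
                .ints(minCodePoint, maxCodePoint + 1)
                .filter(filter)
                .limit(length)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(generate(16, '!', '~', ASCII_SPECIAL));
    }
}
```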


Most efficient way to convert Enum values into comma separated String

I have a Java class in which I store an enum (shown at the bottom of this question). In this enum, I have a method named toCommaSeperatedString() which returns a comma-separated String of the enum's values. I am using a StringBuilder after reading some information on performance in this question here.
Is the way I am converting this enum's values into a comma-separated String the most efficient way of doing so, and if so, what would be the most efficient way to remove the extra comma at the end of the String?
For example, my method returns "123, 456, " (with a trailing separator), however I would prefer "123, 456". If I wanted to return PROPERTY1, PROPERTY2 I could easily use the Apache Commons library's StringUtils.join(), however, I need to go one level lower by calling the getValue method when I am iterating through the array.
public class TypeEnum {
public enum validTypes {
PROPERTY1("123"),
PROPERTY2("456");
private String value;
validTypes(String value) {
this.value = value;
}
public String getValue() {
return value;
}
public static boolean contains(String type) {
for (validTypes msgType : validTypes.values()) {
if (msgType.value.equals(type)) {
return true;
}
}
return false;
}
public static String toCommaSeperatedString() {
StringBuilder commaSeperatedValidMsgTypes = new StringBuilder();
for(validTypes msgType : validTypes.values()) {
commaSeperatedValidMsgTypes.append(msgType.getValue() + ", ");
}
return commaSeperatedValidMsgTypes.toString();
}
}
}
I wouldn't worry much about efficiency. It's simple enough to do this that it will be fast, provided you don't do it in a crazy way. If this is the most significant performance bottleneck in your code, I would be amazed.
I'd do it something like this:
return Arrays.stream(validTypes.values())
.map(t -> t.value)
.collect(Collectors.joining(","));
Cache it if you want; but that's probably not going to make a huge difference.
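As a self-contained illustration of the stream approach, here is a minimal sketch. The wrapper class JoinDemo and the enum name ValidType are made up for the example; the values are the ones from the question.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical demo class wrapping a copy of the enum from the question.
final class JoinDemo {

    enum ValidType {
        PROPERTY1("123"),
        PROPERTY2("456");

        private final String value;

        ValidType(String value) {
            this.value = value;
        }

        String getValue() {
            return value;
        }

        // Collectors.joining places the separator only between elements,
        // so there is no trailing comma to remove afterwards.
        static String toCommaSeparatedString() {
            return Arrays.stream(values())
                    .map(ValidType::getValue)
                    .collect(Collectors.joining(", "));
        }
    }

    public static void main(String[] args) {
        System.out.println(ValidType.toCommaSeparatedString()); // 123, 456
    }
}
```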
A common pattern for the trailing comma problem I see is something like
String[] values = {"A", "B", "C"};
boolean is_first = true;
StringBuilder commaSeperatedValidMsgTypes = new StringBuilder();
for(String value : values){
if(is_first){
is_first = false;
}
else{
commaSeperatedValidMsgTypes.append(',');
}
commaSeperatedValidMsgTypes.append(value);
}
System.out.println(commaSeperatedValidMsgTypes.toString());
which results in
A,B,C
Combining this with the answers about using a static block to initialize a static final field will probably give the best performance.
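The first-element flag in the loop above can also be left to the standard library. A minimal sketch using java.util.StringJoiner (available since Java 8); the class name TrailingCommaDemo is only illustrative:

```java
import java.util.StringJoiner;

// Hypothetical demo class: StringJoiner handles the "no separator before the first element" rule.
final class TrailingCommaDemo {

    static String join(String[] values) {
        StringJoiner joiner = new StringJoiner(",");
        for (String value : values) {
            joiner.add(value);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String[] values = {"A", "B", "C"};
        System.out.println(join(values));             // A,B,C
        System.out.println(String.join(",", values)); // same result in one call
    }
}
```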
The most efficient code is code that doesn't run. This answer can't ever change, so run that code as you have it once when creating the enums. Take the hit once, return the calculated answer every other time somebody asks for it. The savings in doing that would be far greater in the long term over worrying about how specifically to construct the string, so use whatever is clearest to you (write code for humans to read).
For example:
public enum ValidTypes {
PROPERTY1("123"),
PROPERTY2("345");
private final static String asString = calculateString();
private final String value;
private static String calculateString() {
// Do your work here; for example:
return Arrays.stream(values()).map(v -> v.value).collect(Collectors.joining(","));
}
ValidTypes(final String value) {
this.value = value;
}
public static String toCommaSeparatedString() {
return asString;
}
}
You may worry about performance if you have to call this static method thousands and thousands of times in a short period, but you should first check that it actually has a performance cost.
The JVM performs many optimizations at runtime.
So you could end up writing more complex code without any added value.
Anyway, what you should actually do is store the String returned by toCommaSeperatedString and return the same instance each time.
Enum values are constants, so caching the result is not a problem.
You could use a static initializer that assigns a static String field.
As for the trailing , character, just remove it after the loop.
public enum validTypes {
PROPERTY1("123"), PROPERTY2("456");
private static String valueSeparatedByComma;
// Runs once, after the constants above have been created.
static {
StringBuilder commaSeperatedValidMsgTypes = new StringBuilder();
for (validTypes msgType : validTypes.values()) {
commaSeperatedValidMsgTypes.append(msgType.getValue());
commaSeperatedValidMsgTypes.append(",");
}
// Drop the trailing comma.
commaSeperatedValidMsgTypes.deleteCharAt
(commaSeperatedValidMsgTypes.length()-1);
valueSeparatedByComma = commaSeperatedValidMsgTypes.toString();
}
private final String value;
validTypes(String value) {
this.value = value;
}
public String getValue() {
return value;
}
public static String getvalueSeparatedByComma() {
return valueSeparatedByComma;
}
}
I usually add a static method on the enum class itself:
public enum Animal {
CAT, DOG, LION;
public static String possibleValues() {
return Arrays.stream(Animal.values())
.map(Enum::toString)
.collect(Collectors.joining(","));
}
}
So I can use it like String possibleValues = Animal.possibleValues();

using java streams in parallel with collect(supplier, accumulator, combiner) not giving expected results

I'm trying to find number of words in given string. Below is sequential algorithm for it which works fine.
public int getWordcount() {
boolean lastSpace = true;
int result = 0;
for(char c : str.toCharArray()){
if(Character.isWhitespace(c)){
lastSpace = true;
}else{
if(lastSpace){
lastSpace = false;
++result;
}
}
}
return result;
}
But when I tried to 'parallelize' this with the Stream.collect(supplier, accumulator, combiner) method, I got wordCount = 0. I am using an immutable class (WordCountState) just to maintain the state of the word count.
Code :
public class WordCounter {
private final String str = "Java8 parallelism helps if you know how to use it properly.";
public int getWordCountInParallel() {
Stream<Character> charStream = IntStream.range(0, str.length())
.mapToObj(i -> str.charAt(i));
WordCountState finalState = charStream.parallel()
.collect(WordCountState::new,
WordCountState::accumulate,
WordCountState::combine);
return finalState.getCounter();
}
}
public class WordCountState {
private final boolean lastSpace;
private final int counter;
private static int numberOfInstances = 0;
public WordCountState(){
this.lastSpace = true;
this.counter = 0;
//numberOfInstances++;
}
public WordCountState(boolean lastSpace, int counter){
this.lastSpace = lastSpace;
this.counter = counter;
//numberOfInstances++;
}
//accumulator
public WordCountState accumulate(Character c) {
if(Character.isWhitespace(c)){
return lastSpace ? this : new WordCountState(true, counter);
}else{
return lastSpace ? new WordCountState(false, counter + 1) : this;
}
}
//combiner
public WordCountState combine(WordCountState wordCountState) {
//System.out.println("Returning new obj with count : " + (counter + wordCountState.getCounter()));
return new WordCountState(this.isLastSpace(),
(counter + wordCountState.getCounter()));
}
public boolean isLastSpace() {
return lastSpace;
}
public int getCounter() {
return counter;
}
}
I've observed the following issues with the above code:
1. Number of objects (WordCountState) created are greater than number of characters in the string.
2. Result is always 0.
3. As per accumulator/consumer documentation, shouldn't the accumulator return void? Even though my accumulator method is returning an object, compiler doesn't complain.
Any clue where I might have gone off track?
UPDATE :
Used solution as below -
public int getWordCountInParallel() {
Stream<Character> charStream = IntStream.range(0, str.length())
.mapToObj(i -> str.charAt(i));
WordCountState finalState = charStream.parallel()
.reduce(new WordCountState(),
WordCountState::accumulate,
WordCountState::combine);
return finalState.getCounter();
}
You can always invoke a method and ignore its return value, so it’s logical to allow the same when using method references. Therefore, it’s no problem creating a method reference to a non-void method when a consumer is required, as long as the parameters match.
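To see the effect of such an ignored return value in isolation, consider the following minimal sketch. Counter is a made-up immutable class, not the WordCountState from the question. With collect, the objects returned by add are thrown away and the container from the supplier is never changed; with reduce, the returned objects are the ones carried forward.

```java
import java.util.stream.Stream;

// Hypothetical demo: the same non-void method used as a BiConsumer and as a BiFunction.
final class IgnoredReturnDemo {

    // Immutable counter: add() returns a new instance and leaves 'this' unchanged.
    static final class Counter {
        final int n;

        Counter(int n) {
            this.n = n;
        }

        Counter add(String ignored) {
            return new Counter(n + 1);
        }

        Counter merge(Counter other) {
            return new Counter(n + other.n);
        }
    }

    static int countWithCollect() {
        // Counter::add is accepted where a BiConsumer is required; its result is discarded.
        Counter result = Stream.of("a", "b", "c")
                .collect(() -> new Counter(0), Counter::add, Counter::merge);
        return result.n;
    }

    static int countWithReduce() {
        // Here the accumulator is a BiFunction, so the returned Counter is used.
        Counter result = Stream.of("a", "b", "c")
                .reduce(new Counter(0), Counter::add, Counter::merge);
        return result.n;
    }

    public static void main(String[] args) {
        System.out.println(countWithCollect()); // 0
        System.out.println(countWithReduce());  // 3
    }
}
```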
What you have created with your immutable WordCountState class, is a reduction operation, i.e. it would support a use case like
Stream<Character> charStream = IntStream.range(0, str.length())
.mapToObj(i -> str.charAt(i));
WordCountState finalState = charStream.parallel()
.map(ch -> new WordCountState().accumulate(ch))
.reduce(new WordCountState(), WordCountState::combine);
whereas the collect method supports the mutable reduction, where a container instance (may be identical to the result) gets modified.
There is still a logical error in your solution: each WordCountState instance starts out assuming that it is preceded by a space character, without knowing the actual situation, and the combiner makes no attempt to fix this.
A way to fix and simplify this, still using reduction, would be:
public int getWordCountInParallel() {
return str.codePoints().parallel()
.mapToObj(WordCountState::new)
.reduce(WordCountState::new)
.map(WordCountState::getResult).orElse(0);
}
public class WordCountState {
private final boolean firstSpace, lastSpace;
private final int counter;
public WordCountState(int character){
firstSpace = lastSpace = Character.isWhitespace(character);
this.counter = 0;
}
public WordCountState(WordCountState a, WordCountState b) {
this.firstSpace = a.firstSpace;
this.lastSpace = b.lastSpace;
this.counter = a.counter + b.counter + (a.lastSpace && !b.firstSpace? 1: 0);
}
public int getResult() {
return counter+(firstSpace? 0: 1);
}
}
If you are worrying about the number of WordCountState instances, note how many Character instances this solution does not create, compared to your initial approach.
However, this task is indeed suitable for mutable reduction, if you rewrite your WordCountState to a mutable result container:
public int getWordCountInParallel() {
return str.codePoints().parallel()
.collect(WordCountState::new, WordCountState::accumulate, WordCountState::combine)
.getResult();
}
public class WordCountState {
private boolean firstSpace, lastSpace=true, initial=true;
private int counter;
public void accumulate(int character) {
boolean white=Character.isWhitespace(character);
if(lastSpace && !white) counter++;
lastSpace=white;
if(initial) {
firstSpace=white;
initial=false;
}
}
public void combine(WordCountState b) {
if(initial) {
this.initial=b.initial;
this.counter=b.counter;
this.firstSpace=b.firstSpace;
this.lastSpace=b.lastSpace;
}
else if(!b.initial) {
this.counter += b.counter;
if(!lastSpace && !b.firstSpace) counter--;
this.lastSpace = b.lastSpace;
}
}
public int getResult() {
return counter;
}
}
Note how consistently using int to represent Unicode characters allows using the codePoints() stream of a CharSequence, which is not only simpler, but also handles characters outside the Basic Multilingual Plane and is potentially more efficient, as it doesn’t need boxing to Character instances.
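The difference between the two views of a string shows up as soon as a character outside the Basic Multilingual Plane is involved. A small sketch (the class name CodePointDemo is illustrative; the sample string is 'a' followed by U+1F600, written as a surrogate pair):

```java
// Hypothetical demo comparing UTF-16 code units with Unicode code points.
final class CodePointDemo {

    static long charCount(String s) {
        return s.chars().count();      // UTF-16 code units
    }

    static long codePointCount(String s) {
        return s.codePoints().count(); // Unicode code points
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00";    // 'a' plus one supplementary character
        System.out.println(charCount(s));      // 3
        System.out.println(codePointCount(s)); // 2
    }
}
```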
When you call stream().collect(supplier, accumulator, combiner), the accumulator and the combiner are BiConsumers, so they return void. The problem is that this:
collect(WordCountState::new,
WordCountState::accumulate,
WordCountState::combine)
in your case actually means the following (shown for the accumulator only, but the same goes for the combiner):
(wc, c) -> {
WordCountState state = wc.accumulate(c); // the returned state is discarded
return;
}
This is indeed not obvious. Let's say we have two methods:
public void accumulate(Character c) {
if (!Character.isWhitespace(c)) {
counter++;
}
}
public WordCountState accumulate2(Character c) {
if (Character.isWhitespace(c)) {
return lastSpace ? this : new WordCountState(true, counter);
} else {
return lastSpace ? new WordCountState(false, counter + 1) : this;
}
}
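For both of them the code below compiles just fine. The same holds for an expression lambda such as (s, c) -> s.accumulate2(c), whose result is likewise discarded, but not for a block lambda that explicitly returns the value.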
BiConsumer<WordCountState, Character> cons = WordCountState::accumulate;
BiConsumer<WordCountState, Character> cons2 = WordCountState::accumulate2;
You can picture it slightly differently, for example via a class that implements BiConsumer:
BiConsumer<WordCountState, Character> clazz = new BiConsumer<WordCountState, Character>() {
@Override
public void accept(WordCountState state, Character character) {
WordCountState newState = state.accumulate2(character);
return;
}
};
As such, your combine and accumulate methods need to change to:
public void combine(WordCountState wordCountState) {
counter = counter + wordCountState.getCounter();
}
public void accumulate(Character c) {
if (!Character.isWhitespace(c)) {
counter++;
}
}
First of all, would it not be easier to just use something like input.split("\\s+").length to get the word count?
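One caveat with the split approach: a string that starts with whitespace yields an empty first element, and an empty string yields an array of length 1. A minimal sketch that guards against both (the class and method names are made up):

```java
// Hypothetical helper: word count via split, guarded against leading whitespace and empty input.
final class SplitWordCount {

    static int countWords(String input) {
        String trimmed = input.trim();
        // "".split("\\s+") returns a one-element array, so handle the empty case separately.
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Java8 parallelism helps if you know how to use it properly.")); // 11
        System.out.println(countWords("  a b "));                                                      // 2
    }
}
```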
In case this is an exercise in streams and collectors, let's discuss your implementation. You already pointed out the biggest mistake yourself: your accumulator and combiner should not return new instances. The signature of collect tells you that it expects BiConsumers, which do not return anything. Because you create a new object in the accumulator, you never increase the count of the WordCountState objects your collector actually uses. And by creating a new object in the combiner, you discard any progress you would have made. This is also why you create more objects than there are characters in your input: one per character, and then some for the return values.
See this adapted implementation:
public static class WordCountState
{
private boolean lastSpace = true;
private int counter = 0;
public void accumulate(Character character)
{
if (!Character.isWhitespace(character))
{
if (lastSpace)
{
counter++;
}
lastSpace = false;
}
else
{
lastSpace = true;
}
}
public void combine(WordCountState wordCountState)
{
counter += wordCountState.counter;
}
}
Here, we do not create new objects in every step, but change the state of the ones we have. I think you tried to create new objects because your ternary operators forced you to return something and/or you couldn't change the instance fields as they are final. They do not need to be final, though, and you can easily change them.
Running this adapted implementation sequentially now works fine, as we nicely look at the chars one by one and end up with 11 words.
In parallel, though, it fails. It seems it creates a new WordCountState for every char, but does not count all of them, and ends up at 29 (at least for me). This shows a basic flaw with your algorithm: Splitting on every character doesn't work in parallel. Imagine the input abc abc, which should result in 2. If you do it in parallel and do not specify how to split the input, you might end up with these chunks: ab, c a, bc, which would add up to 4.
The problem is that by parallelizing between characters (i.e. in the middle of words), you make your separate WordCountStates dependent on each other (because each would need to know which one comes before it and whether that one ended with a whitespace char). This defeats the parallelism and results in errors.
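The abc abc example can be reproduced without any streams by counting each chunk independently and adding up the results. The sketch below does exactly that; the class and method names are made up, and the chunk boundaries are the ones from the example above.

```java
// Hypothetical demo: counting words per chunk and summing overcounts words split across chunks.
final class ChunkingDemo {

    // The sequential algorithm from the question.
    static int countWords(String s) {
        boolean lastSpace = true;
        int result = 0;
        for (char c : s.toCharArray()) {
            if (Character.isWhitespace(c)) {
                lastSpace = true;
            } else {
                if (lastSpace) {
                    result++;
                }
                lastSpace = false;
            }
        }
        return result;
    }

    // Each chunk starts with lastSpace = true, so a word continued from the previous chunk is counted again.
    static int naiveChunkSum(String... chunks) {
        int sum = 0;
        for (String chunk : chunks) {
            sum += countWords(chunk);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(countWords("abc abc"));             // 2
        System.out.println(naiveChunkSum("ab", "c a", "bc"));  // 4
    }
}
```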
Aside from all that, it might be easier to implement the Collector interface instead of providing the three methods:
public static class WordCountCollector
implements Collector<Character, SimpleEntry<AtomicInteger, Boolean>, Integer>
{
@Override
public Supplier<SimpleEntry<AtomicInteger, Boolean>> supplier()
{
return () -> new SimpleEntry<>(new AtomicInteger(0), true);
}
@Override
public BiConsumer<SimpleEntry<AtomicInteger, Boolean>, Character> accumulator()
{
return (count, character) -> {
if (!Character.isWhitespace(character))
{
if (count.getValue())
{
String before = count.getKey().get() + " -> ";
count.getKey().incrementAndGet();
System.out.println(before + count.getKey().get());
}
count.setValue(false);
}
else
{
count.setValue(true);
}
};
}
@Override
public BinaryOperator<SimpleEntry<AtomicInteger, Boolean>> combiner()
{
return (c1, c2) -> new SimpleEntry<>(new AtomicInteger(c1.getKey().get() + c2.getKey().get()), false);
}
@Override
public Function<SimpleEntry<AtomicInteger, Boolean>, Integer> finisher()
{
return count -> count.getKey().get();
}
@Override
public Set<java.util.stream.Collector.Characteristics> characteristics()
{
return new HashSet<>(Arrays.asList(Characteristics.CONCURRENT, Characteristics.UNORDERED));
}
}
We use a pair (SimpleEntry) to keep the count and the knowledge about the last space. This way, we do not need to implement the state in the collector itself or write a param object for it. You can use this collector like this:
return charStream.parallel().collect(new WordCountCollector());
This collector parallelizes more nicely than the initial implementation, but its results still vary (mostly between 14 and 16) because of the mentioned weaknesses in your approach.

Removing accents from String

Recently I found a very helpful method in the StringUtils library:
StringUtils.stripAccents(String s)
I found it really helpful for removing any special characters and converting them to some ASCII "equivalent", for instance ç=c etc.
Now I am working for a German customer who really needs to do such a thing, but only for non-German characters. Any umlauts should stay untouched. I realised that stripAccents won't be useful in that case.
Does anyone have some experience with that?
Are there any useful tools/libraries/classes or maybe regular expressions?
I tried to write a class which parses and replaces such characters, but it can be very difficult to build such a map for all languages...
Any suggestions appreciated...
It is best to build a custom function, like the following. If you want to avoid the conversion of a character, remove its pair from the two strings (the constants).
private static final String UNICODE =
"ÀàÈèÌìÒòÙùÁáÉéÍíÓóÚúÝýÂâÊêÎîÔôÛûŶŷÃãÕõÑñÄäËëÏïÖöÜüŸÿÅåÇçŐőŰű";
private static final String PLAIN_ASCII =
"AaEeIiOoUuAaEeIiOoUuYyAaEeIiOoUuYyAaOoNnAaEeIiOoUuYyAaCcOoUu";
public static String toAsciiString(String str) {
if (str == null) {
return null;
}
StringBuilder sb = new StringBuilder();
for (int index = 0; index < str.length(); index++) {
char c = str.charAt(index);
int pos = UNICODE.indexOf(c);
if (pos > -1)
sb.append(PLAIN_ASCII.charAt(pos));
else {
sb.append(c);
}
}
return sb.toString();
}
public static void main(String[] args) {
System.out.println(toAsciiString("Höchstalemannisch"));
}
My gut feeling tells me the easiest way to do this would be to just list allowed characters and strip accents from everything else. This would be something like
import java.util.regex.*;
import java.text.*;
public class Replacement {
public static void main(String args[]) {
String from = "aoeåöäìé";
String result = stripAccentsFromNonGermanCharacters(from);
System.out.println("Result: " + result);
}
private static String patternContainingAllValidGermanCharacters =
"a-zA-Z0-9äÄöÖéÉüÜß";
private static Pattern nonGermanCharactersPattern =
Pattern.compile("([^" + patternContainingAllValidGermanCharacters + "])");
public static String stripAccentsFromNonGermanCharacters(
String from) {
return stripAccentsFromCharactersMatching(
from, nonGermanCharactersPattern);
}
public static String stripAccentsFromCharactersMatching(
String target, Pattern myPattern) {
StringBuffer myStringBuffer = new StringBuffer();
Matcher myMatcher = myPattern.matcher(target);
while (myMatcher.find()) {
myMatcher.appendReplacement(myStringBuffer,
Matcher.quoteReplacement(stripAccents(myMatcher.group(1)))); // quote so that '$' and '\' in the input are taken literally
}
myMatcher.appendTail(myStringBuffer);
return myStringBuffer.toString();
}
// pretty much the same thing as StringUtils.stripAccents(String s)
// used here so I can demonstrate the code without StringUtils dependency
public static String stripAccents(String text) {
return Normalizer.normalize(text,
Normalizer.Form.NFD)
.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
}
(I realize the pattern probably doesn't contain all the characters needed, but add whatever is missing.)
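The same allow-list idea can be written without a regex matcher loop by walking the code points. This is only a sketch under assumptions: the KEEP set (umlauts and ß) is a guess at what counts as "German" and should be extended as needed, and the class and method names are made up. Input is first normalized to NFC so that an umlaut typed as a base letter plus a combining mark is recognized as well.

```java
import java.text.Normalizer;

// Hypothetical helper: strips accents from everything except an allow-list of characters.
final class GermanSafeStripper {

    // Assumed allow-list: lower and upper case a/o/u umlauts and sharp s, as Unicode escapes.
    private static final String KEEP = "\u00e4\u00c4\u00f6\u00d6\u00fc\u00dc\u00df";

    static String stripAccentsExceptGerman(String input) {
        // Compose first, so decomposed umlauts become single code points.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        composed.codePoints().forEach(cp -> {
            if (KEEP.indexOf(cp) >= 0) {
                out.appendCodePoint(cp); // allowed character: keep as-is
            } else {
                // Decompose the single character and drop its combining marks.
                String decomposed = Normalizer.normalize(
                        new String(Character.toChars(cp)), Normalizer.Form.NFD);
                out.append(decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Input: "Hoechst..." spelled with o-umlaut, followed by c-cedilla and e-acute.
        System.out.println(stripAccentsExceptGerman("H\u00f6chstalemannisch \u00e7\u00e9"));
    }
}
```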
This might give you a workaround: you can detect the language and keep only the language-specific text.
EDIT:
You can take the raw string as input, set the language detection to German, and it will then detect the German characters and discard the rest.

actual code in static method or instance method

I'm writing a small library.
public class MyClass {
public static String doSomethingWithString(final String s) {
return new MyClass().doSomething(s);
}
public String doSomething(final String s) {
return null;
}
}
Or I can do like this.
public class MyClass {
public static String doSomethingWithString(final String s) {
return null;
}
public String doSomething(final String s) {
return doSomethingWithString(s);
}
}
Which style is preferable? Are they same?
UPDATE
Thank you for comments and answers.
Here are two classes.
public class IdEncoder {
private static String block(final long decoded) {
final StringBuilder builder = new StringBuilder(Long.toString(decoded));
builder.append(Integer.toString(
ThreadLocalRandom.current().nextInt(9) + 1)); // 1-9
builder.append(Integer.toString(
ThreadLocalRandom.current().nextInt(9) + 1)); // 1-9
builder.reverse();
return Long.toString(
Long.parseLong(builder.toString()), Character.MAX_RADIX);
}
public static String encodeLong(final long decoded) {
return block(decoded >>> 0x20) + "-" + block(decoded & 0xFFFFFFFFL);
}
public String encode(final long decoded) {
return encodeLong(decoded);
}
}
And another style.
public class IdDecoder {
public static long decodeLong(final String encoded) {
return new IdDecoder().decode(encoded);
}
public long decode(final String encoded) {
final int index = encoded.indexOf('-');
if (index == -1) {
throw new IllegalArgumentException("wrong encoded: " + encoded);
}
return (block(encoded.substring(0, index)) << 32)
| (block(encoded.substring(index + 1)));
}
private long block(final String encoded) {
final StringBuilder builder = new StringBuilder(
Long.toString(Long.parseLong(encoded, Character.MAX_RADIX)));
builder.reverse();
builder.deleteCharAt(builder.length() - 1);
builder.deleteCharAt(builder.length() - 1);
return Long.parseLong(builder.toString());
}
}
If you are just picking between these 2 options, take the second one.
The reason is the first requires you to allocate a new dummy object on the heap just to call a method. If there is truly no other difference, don't waste the time and space and just call the static method from the class.
The second is more akin to a static utility function, which is a fine coding practice.
When writing a library, ease of use dramatically trumps general best practices. Your method should be static if it doesn't make sense for a user to instantiate something in order to access it. However, it is often much cleaner and more powerful for a method to be part of an object, because it allows the user (as well as the library writer) to override it in child classes.
In a sense, you aren't actually asking a programming question, but a UX question. Ask yourself how your users would best benefit from accessing your code, and implement it that way. As a good benchmark, look at the Guava API; it consists of many static utility classes, but just as many classes and interfaces designed to be easily extended. Do what you think is best.

Convert All Chars in String to Different Escaped Formats(Java)

I'm looking to convert characters in a string to different escaped formats like the following, where the letter 'a' is the string being converted:
hex-url: %61
hex-html: &#x61;
decimal-html: &#97
I've searched and tried various built-in methods, but they merely pick out the characters that URL encoding requires to be escaped (like '<') and escape those. I want to escape the ENTIRE string. Is there any way to convert a string into the formats above in Java (using built-in libraries, preferably)?
public class StringEncoders
{
static public void main(String[] args)
{
System.out.println("hex-url: " + hexUrlEncode("a"));
System.out.println("hex-html: " + hexHtmlEncode("a"));
System.out.println("decimal-html: " + decimalHtmlEncode("a"));
}
static public String hexUrlEncode(String str) {
return encode(str, hexUrlEncoder);
}
static public String hexHtmlEncode(String str) {
return encode(str, hexHtmlEncoder);
}
static public String decimalHtmlEncode(String str) {
return encode(str, decimalHtmlEncoder);
}
static private String encode(String str, CharEncoder encoder)
{
StringBuilder buff = new StringBuilder();
for ( int i = 0; i < str.length(); i++)
encoder.encode(str.charAt(i), buff);
return ""+buff;
}
private static class CharEncoder
{
String prefix, suffix;
int radix;
public CharEncoder(String prefix, String suffix, int radix) {
this.prefix = prefix;
this.suffix = suffix;
this.radix = radix;
}
void encode(char c, StringBuilder buff) {
buff.append(prefix).append(Integer.toString(c, radix)).append(suffix);
}
}
static final CharEncoder hexUrlEncoder = new CharEncoder("%","",16);
static final CharEncoder hexHtmlEncoder = new CharEncoder("&#x",";",16);
static final CharEncoder decimalHtmlEncoder = new CharEncoder("&#",";",10);
}
I'm not sure about built in libraries, but it's pretty easy to write a method to do this yourself. All you need to do is loop through the string character by character and for each character do something like this:
"&#x"+Integer.toHexString(character)+";";
and then append it to a new string you are making that has all the characters encoded.
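A minimal sketch of that loop for all three formats follows. The class and method names are made up. Two choices here are assumptions rather than anything stated above: the URL variant percent-encodes the UTF-8 bytes with two hex digits each (so non-ASCII input stays valid), and the HTML variants work on code points rather than chars.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: escapes every character of a string, not only the ones that need it.
final class EscapeAll {

    // Percent-encodes each UTF-8 byte, e.g. "a" becomes %61.
    static String hexUrl(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hexadecimal numeric character reference per code point.
    static String hexHtml(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#x").append(Integer.toHexString(cp)).append(';'));
        return sb.toString();
    }

    // Decimal numeric character reference per code point.
    static String decimalHtml(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hexUrl("a"));      // %61
        System.out.println(hexHtml("a"));     // &#x61;
        System.out.println(decimalHtml("a")); // &#97;
    }
}
```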
There is unlikely to be an existing library method that does what you want:
In each of those examples, the escaping is unnecessary; e.g. for the letter 'a'. Library methods that do escaping only do it if it is necessary.
Libraries that allow you to do HTML / XML escaping don't allow you to choose the specific escaping syntax (AFAIK).
Your third example is incorrectly escaped.
You will need to implement this yourself. (The code is trivial ... and I'm assuming that you are capable.)
