Convert All Chars in String to Different Escaped Formats (Java)

I'm looking to convert characters in a string to different escaped formats like the following, where the letter 'a' is the string being converted:
hex-url: %61
hex-html: &#x61;
decimal-html: &#97
I've tried various built-in methods, but they only escape the characters that URL-encoding requires (like '<'). I want to escape the ENTIRE string. Is there any way to convert a string into the formats above in Java (using built-in libraries, preferably)?

public class StringEncoders
{
static public void main(String[] args)
{
System.out.println("hex-url: " + hexUrlEncode("a"));
System.out.println("hex-html: " + hexHtmlEncode("a"));
System.out.println("decimal-html: " + decimalHtmlEncode("a"));
}
static public String hexUrlEncode(String str) {
return encode(str, hexUrlEncoder);
}
static public String hexHtmlEncode(String str) {
return encode(str, hexHtmlEncoder);
}
static public String decimalHtmlEncode(String str) {
return encode(str, decimalHtmlEncoder);
}
static private String encode(String str, CharEncoder encoder)
{
StringBuilder buff = new StringBuilder();
for ( int i = 0; i < str.length(); i++)
encoder.encode(str.charAt(i), buff);
return ""+buff;
}
private static class CharEncoder
{
String prefix, suffix;
int radix;
public CharEncoder(String prefix, String suffix, int radix) {
this.prefix = prefix;
this.suffix = suffix;
this.radix = radix;
}
void encode(char c, StringBuilder buff) {
buff.append(prefix).append(Integer.toString(c, radix)).append(suffix);
}
}
static final CharEncoder hexUrlEncoder = new CharEncoder("%","",16);
static final CharEncoder hexHtmlEncoder = new CharEncoder("&#x",";",16);
static final CharEncoder decimalHtmlEncoder = new CharEncoder("&#",";",10);
}

I'm not sure about built-in libraries, but it's pretty easy to write a method to do this yourself. All you need to do is loop through the string character by character and for each character do something like this (note the x after &#, which marks the number as hexadecimal):
"&#x"+Integer.toHexString(character)+";";
and then append it to a new string you are building that holds all the encoded characters.
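A minimal sketch of that loop for all three formats. The class and method names (EscapeSketch, hexUrl, hexHtml, decimalHtml) are made up for illustration. It assumes UTF-8 for the URL form, so each byte becomes a two-digit %xx group, and it walks the string by code point so that a character outside the BMP gets a single HTML reference.

```java
import java.nio.charset.StandardCharsets;

public class EscapeSketch {

    // Percent-encode every byte of the UTF-8 form, always with two hex digits.
    public static String hexUrl(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%%%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Hexadecimal numeric character reference for every code point.
    public static String hexHtml(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#x").append(Integer.toHexString(cp)).append(';'));
        return sb.toString();
    }

    // Decimal numeric character reference for every code point.
    public static String decimalHtml(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("hex-url: " + hexUrl("a"));           // hex-url: %61
        System.out.println("hex-html: " + hexHtml("a"));         // hex-html: &#x61;
        System.out.println("decimal-html: " + decimalHtml("a")); // decimal-html: &#97;
    }
}
```

Encoding the bytes rather than the char values matters for the URL form: a line feed comes out as %0a instead of %a, and a non-ASCII character becomes several %xx groups.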

There is unlikely to be an existing library method that does what you want:
In each of those examples, the escaping is unnecessary; e.g. for the letter 'a'. Library methods that do escaping only do it if it is necessary.
Libraries that allow you to do HTML / XML escaping don't allow you to choose the specific escaping syntax (AFAIK).
Your third example is incorrectly escaped.
You will need to implement this yourself. (The code is trivial ... and I'm assuming that you are capable.)

Related

Detect file type (json, html, text) from a String object in Java

My Java class will receive a String object that could be json, html, or plain text. I need to be able to detect which type from the Java String object.
Apache Tika does this, but only detects the type from a File object. When I pass it a String object it returns "application/octet-stream" as the type (for all types), which is incorrect.
Until now, we have only had to detect whether the String was html or plain text. In the code sample provided, we only had to search for obvious html tags. Now, we need to scan the String and figure out whether it is html, json, or plain text.
I would love to use a third-party library if one exists that can detect the type from a String object.
public static final String[] HTML_STARTS = {
"<html>",
"<!--",
"<!DOCTYPE",
"<?xml",
"<body"
};
public static boolean isJSON(String str)
{
str = str.trim();
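// Heuristic only: a String is not an array, so use charAt() and length()
if(!str.isEmpty() && str.charAt(0) == '{' && str.charAt(str.length()-1) == '}') {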
return true;
}
return false;
}
public static boolean isHTML(String str)
{
List<String> htmlTags = Arrays.asList(
"<html>",
"<!--",
"<!DOCTYPE",
"<?xml",
"<body"
);
return htmlTags.stream().anyMatch(str::contains);
}
public static int IS_PLAIN = 0;
public static int IS_HTML = 1;
public static int IS_JSON = 2;
public static int getType(String str)
{
if(isJSON(str)) return IS_JSON;
else if(isHTML(str)) return IS_HTML;
else return IS_PLAIN;
}
You can use JSoup for parsing HTML and Jackson or Gson for JSON.

Apache Commons Text: Random String for special characters java

I'm using apache commons-text:RandomStringGenerator for generating a random String like so:
//Utilities
private static RandomStringGenerator generator(int minimumCodePoint, int maximumCodePoint, CharacterPredicates... predicates) {
return new RandomStringGenerator.Builder()
.withinRange(minimumCodePoint, maximumCodePoint)
.filteredBy(predicates)
.build();
}
public static String randStringAlpha(int length) {
return generator('A', 'z', CharacterPredicates.LETTERS).generate(length);
}
public static String randStringAlphaNum(int length) {
return generator('0', 'z', CharacterPredicates.LETTERS, CharacterPredicates.DIGITS).generate(length); // start at '0' so every digit is in range
}
//Generation
private void foo() {
String alpha = randStringAlpha(255);
String num = randStringAlphaNum(255);
}
I'm looking for a way to use the same library to generate the following:
A - special characters (could be limited to keyboard special characters)
B - alpha + A
C - num + A
D - alpha + num + A
I already checked the CharacterPredicates enum but it only has LETTERS and DIGITS for filtering. Any help would be really appreciated!
EDIT:
I decided to shelve my current solution in favor of this answer.
To clarify the scope of 'special characters' I was actually looking for this subset:
Snippet for case A:
public static CharSequence asciiSpecial() {
return asciiCharacters().toString().replaceAll("(\\d|[A-Za-z])",""); // [A-z] would also strip [ \ ] ^ _ `
}
Your category “special characters” is quite fuzzy. As long as you stay within the ASCII range, all printable characters are either letter, digit or “special”, and all of them can be entered with an ordinary keyboard. In other words, you don’t need to specify a filter at all for that. On the other hand, when you leave the ASCII range, there is a variety of character categories you would have to take care of (e.g. you don’t want to insert random combining characters at arbitrary points); further, there is no general test for whether a character can be entered with a keyboard (as there is no general keyboard)…
But note that your code trying to use that library is already bigger than code doing the actual work would be. E.g. to get a random letter string, you could use
public static String randStringAlpha(int size) {
return ThreadLocalRandom.current().ints('A', 'z'+1)
.filter(Character::isLetter)
.limit(size)
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
}
or the likely more efficient variant
public static String randStringAlpha(int size) {
return ThreadLocalRandom.current().ints(size, 'A', 'Z'+1)
.map(c -> ThreadLocalRandom.current().nextBoolean()? c: Character.toLowerCase(c))
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
}
without any 3rd party library.
Likewise, you could generalize the task using
public static String randomString(int size, CharSequence validChars) {
return ThreadLocalRandom.current().ints(size, 0, validChars.length())
.map(validChars::charAt)
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
}
public static String randomString(int minSizeIncl, int maxSizeIncl, CharSequence valid) {
return randomString(
ThreadLocalRandom.current().nextInt(minSizeIncl, maxSizeIncl + 1), valid); // upper bound of nextInt is exclusive
}
public static CharSequence asciiLetters() {
return IntStream.concat(IntStream.rangeClosed('A','Z'), IntStream.rangeClosed('a','z'))
.collect(StringBuilder::new,StringBuilder::appendCodePoint,StringBuilder::append);
}
public static CharSequence asciiLetterOrDigit() {
return IntStream.concat(asciiLetters().chars(),IntStream.rangeClosed('0', '9'))
.collect(StringBuilder::new,StringBuilder::appendCodePoint,StringBuilder::append);
}
public static CharSequence asciiCharacters() {
return IntStream.rangeClosed('!', '~')
.collect(StringBuilder::new,StringBuilder::appendCodePoint,StringBuilder::append);
}
Which you can use by combining two methods, e.g.
RandomString.randomString(10, asciiLetters()),
RandomString.randomString(10, asciiLetterOrDigit()), or
RandomString.randomString(10, asciiCharacters()), resp. their variable-size counterparts like RandomString.randomString(10, 20, asciiCharacters()).
The CharSequences can be reused between multiple string generation calls, which is similar to building a RandomStringGenerator once and using it multiple times.
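For the four cases from the question (A to D), the same idea only needs one more character set. The following is a sketch, not part of the original answer: SpecialCharsSketch, asciiSpecial, LETTERS and DIGITS are names chosen here for illustration, "special" is assumed to mean printable ASCII that is neither letter nor digit, and randomString is repeated from above so the block compiles on its own.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class SpecialCharsSketch {

    static final String LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    static final String DIGITS = "0123456789";

    // Printable ASCII ('!' to '~') that is neither a letter nor a digit.
    public static String asciiSpecial() {
        return IntStream.rangeClosed('!', '~')
                .filter(c -> !Character.isLetterOrDigit(c))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    // Picks size characters uniformly at random from validChars.
    public static String randomString(int size, CharSequence validChars) {
        return ThreadLocalRandom.current().ints(size, 0, validChars.length())
                .map(validChars::charAt)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String special = asciiSpecial();
        System.out.println(randomString(10, special));                    // A
        System.out.println(randomString(10, LETTERS + special));          // B
        System.out.println(randomString(10, DIGITS + special));           // C
        System.out.println(randomString(10, LETTERS + DIGITS + special)); // D
    }
}
```

Because each set is just a string, a narrower notion of "special" can be expressed by replacing asciiSpecial() with a literal such as "!@#$%".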
You can modify your argument type in method generator from CharacterPredicates to CharacterPredicate and write your custom CharacterPredicate like:
private static RandomStringGenerator generator(int minimumCodePoint, int maximumCodePoint, CharacterPredicate... predicates) {
return new RandomStringGenerator.Builder()
.withinRange(minimumCodePoint, maximumCodePoint)
.filteredBy(predicates)
.build();
}
public static String randSomething(int length) {
return generator('1', 'z', new CharacterPredicate() {
@Override
public boolean test(int i) {
return true; // Write your logic here
}
}).generate(length);
}

Remove part of string after or before a specific word in java

Is there a command in java to remove the rest of the string after or before a certain word;
Example:
Remove substring before the word "taken"
before:
"I need this words removed taken please"
after:
"taken please"
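Strings are immutable; you can, however, find the word and create a substring:
public static String removeTillWord(String input, String word) {
int index = input.indexOf(word);
// indexOf returns -1 when the word is absent; substring(-1) would throw
return index < 0 ? input : input.substring(index);
}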
removeTillWord("I need this words removed taken please", "taken");
The Apache Commons Lang class StringUtils contains exactly what you want:
e.g. public static String substringBefore(String str, String separator)
public static String foo(String str, String remove) {
return str.substring(str.indexOf(remove));
}
Clean way to safely remove until a string
String input = "I need this words removed taken please";
String token = "taken";
String result = input.contains(token)
? token + StringUtils.substringAfter(input, token)
: input;
Apache StringUtils functions are null-, empty-, and no match- safe
Since OP provided clear requirements
Remove the rest of the string after or before a certain word
and nobody has fulfilled those yet, here is my approach to the problem. There are certain rules to the implementation, but overall it should satisfy OP's needs, if he or she comes to revisit the question.
public static String remove(String input, String separator, boolean before) {
Objects.requireNonNull(input);
Objects.requireNonNull(separator);
if (input.trim().equals(separator)) {
return separator;
}
if (separator.isEmpty() || input.trim().isEmpty()) {
return input;
}
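// Split on the literal separator (not a regex), into at most two parts;
// needs import java.util.regex.Pattern
String[] tokens = input.split(Pattern.quote(separator), 2);
if (tokens.length < 2) {
return input; // separator not found
}
String target = before ? tokens[0] : tokens[1];
return input.replace(target, "");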
}
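For comparison, here is a sketch of both directions built only on indexOf and substring, so no regex and no replace are involved. The names TrimAroundWord, removeBefore and removeAfter are hypothetical; both methods keep the word itself and return the input unchanged when the word is missing.

```java
public class TrimAroundWord {

    // Drops everything before the first occurrence of word.
    public static String removeBefore(String input, String word) {
        int index = input.indexOf(word);
        return index < 0 ? input : input.substring(index);
    }

    // Drops everything after the first occurrence of word.
    public static String removeAfter(String input, String word) {
        int index = input.indexOf(word);
        return index < 0 ? input : input.substring(0, index + word.length());
    }

    public static void main(String[] args) {
        String input = "I need this words removed taken please";
        System.out.println(removeBefore(input, "taken")); // taken please
        System.out.println(removeAfter(input, "taken"));  // I need this words removed taken
    }
}
```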

Removing accents from String

Recently I found a very helpful method in the StringUtils library, which is
StringUtils.stripAccents(String s)
I found it really helpful for removing any special characters and converting them to some ASCII "equivalent", for instance ç=c etc.
Now I am working for a German customer who really needs to do such a thing, but only for non-German characters. Any umlauts should stay untouched. I realised that stripAccents won't be useful in that case.
Does anyone have some experience with that?
Are there any useful tools/libraries/classes or maybe regular expressions?
I tried to write a class which parses and replaces such characters, but it can be very difficult to build such a map for all languages...
Any suggestions appreciated...
It is best to build a custom function. It can look like the following. If you want to avoid the conversion of a character, remove it from both strings (the constants), keeping the positions in the two strings aligned.
private static final String UNICODE =
"ÀàÈèÌìÒòÙùÁáÉéÍíÓóÚúÝýÂâÊêÎîÔôÛûŶŷÃãÕõÑñÄäËëÏïÖöÜüŸÿÅåÇçŐőŰű";
private static final String PLAIN_ASCII =
"AaEeIiOoUuAaEeIiOoUuYyAaEeIiOoUuYyAaOoNnAaEeIiOoUuYyAaCcOoUu";
public static String toAsciiString(String str) {
if (str == null) {
return null;
}
StringBuilder sb = new StringBuilder();
for (int index = 0; index < str.length(); index++) {
char c = str.charAt(index);
int pos = UNICODE.indexOf(c);
if (pos > -1)
sb.append(PLAIN_ASCII.charAt(pos));
else {
sb.append(c);
}
}
return sb.toString();
}
public static void main(String[] args) {
System.out.println(toAsciiString("Höchstalemannisch"));
}
My gut feeling tells me the easiest way to do this would be to just list allowed characters and strip accents from everything else. This would be something like
import java.util.regex.*;
import java.text.*;
public class Replacement {
public static void main(String args[]) {
String from = "aoeåöäìé";
String result = stripAccentsFromNonGermanCharacters(from);
System.out.println("Result: " + result);
}
private static String patternContainingAllValidGermanCharacters =
"a-zA-Z0-9äÄöÖéÉüÜß";
private static Pattern nonGermanCharactersPattern =
Pattern.compile("([^" + patternContainingAllValidGermanCharacters + "])");
public static String stripAccentsFromNonGermanCharacters(
String from) {
return stripAccentsFromCharactersMatching(
from, nonGermanCharactersPattern);
}
public static String stripAccentsFromCharactersMatching(
String target, Pattern myPattern) {
StringBuffer myStringBuffer = new StringBuffer();
Matcher myMatcher = myPattern.matcher(target);
while (myMatcher.find()) {
myMatcher.appendReplacement(myStringBuffer,
stripAccents(myMatcher.group(1)));
}
myMatcher.appendTail(myStringBuffer);
return myStringBuffer.toString();
}
// pretty much the same thing as StringUtils.stripAccents(String s)
// used here so I can demonstrate the code without StringUtils dependency
public static String stripAccents(String text) {
return Normalizer.normalize(text,
Normalizer.Form.NFD)
.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
}
(I realize the pattern probably doesn't contain all the characters needed, but add whatever is missing)
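The same allow-list idea can also be sketched without a regex matcher, by walking the string one code point at a time. This is only an illustration: KeepUmlautsSketch, KEEP and stripExceptGerman are made-up names, and the allow-list is assumed to be just ä ö ü Ä Ö Ü ß (written as Unicode escapes so the source file encoding does not matter).

```java
import java.text.Normalizer;

public class KeepUmlautsSketch {

    // Characters that must survive untouched: ä ö ü Ä Ö Ü ß
    private static final String KEEP = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";

    public static String stripExceptGerman(String text) {
        // Compose first, so an umlaut typed as letter + combining mark is recognised too.
        String composed = Normalizer.normalize(text, Normalizer.Form.NFC);
        StringBuilder sb = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); ) {
            int cp = composed.codePointAt(i);
            i += Character.charCount(cp);
            if (KEEP.indexOf(cp) >= 0) {
                sb.appendCodePoint(cp); // allowed character, keep as is
            } else {
                // Decompose this one character and drop its combining marks.
                String decomposed = Normalizer.normalize(
                        new String(Character.toChars(cp)), Normalizer.Form.NFD);
                sb.append(decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Input is "aoeåöäìé"; å, ì and é lose their accents, ö and ä are kept.
        System.out.println(stripExceptGerman("aoe\u00e5\u00f6\u00e4\u00ec\u00e9"));
    }
}
```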
This might give you a workaround. Here you can detect the language and get the specific text only.
EDIT:
You can have the raw string as an input, put the language detection to German and then it will detect the German characters and will discard the remaining.

What is the most elegant way to convert a hyphen separated word (e.g. "do-some-stuff") to the lower camel-case variation (e.g. "doSomeStuff")?

What is the most elegant way to convert a hyphen separated word (e.g. "do-some-stuff") to the lower camel-case variation (e.g. "doSomeStuff") in Java?
Use CaseFormat from Guava:
import static com.google.common.base.CaseFormat.*;
String result = LOWER_HYPHEN.to(LOWER_CAMEL, "do-some-stuff");
With Java 8 there is finally a one-liner:
Arrays.stream(name.split("\\-"))
.map(s -> Character.toUpperCase(s.charAt(0)) + s.substring(1).toLowerCase())
.collect(Collectors.joining());
Though it takes splitting over 3 actual lines to be legible ツ
(Note: "\\-" is for kebab-case as per question, for snake_case simply change to "_")
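Note that the stream above capitalizes every segment, including the first, so "do-some-stuff" comes out as DoSomeStuff, and an empty segment (from "--" or a leading "-") makes charAt(0) throw. A sketch of a variant that yields lower camel case follows; KebabToCamel and toLowerCamel are names picked here for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class KebabToCamel {

    public static String toLowerCamel(String name) {
        String upperCamel = Arrays.stream(name.split("-"))
                .filter(s -> !s.isEmpty()) // skip empty segments from "--" or a leading "-"
                .map(s -> Character.toUpperCase(s.charAt(0)) + s.substring(1).toLowerCase())
                .collect(Collectors.joining());
        // Lowercase the very first character to get lower camel case.
        return upperCamel.isEmpty()
                ? upperCamel
                : Character.toLowerCase(upperCamel.charAt(0)) + upperCamel.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(toLowerCamel("do-some-stuff")); // doSomeStuff
    }
}
```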
The following method should handle the task quite efficiently, in O(n). We just iterate over the characters of the XML method name, skip any '-' and capitalize chars if needed.
public static String toJavaMethodName(String xmlMethodName) {
StringBuilder nameBuilder = new StringBuilder(xmlmethodName.length());
boolean capitalizeNextChar = false;
for (char c:xmlMethodName.toCharArray()) {
if (c == '-') {
capitalizeNextChar = true;
continue;
}
if (capitalizeNextChar) {
nameBuilder.append(Character.toUpperCase(c));
} else {
nameBuilder.append(c);
}
capitalizeNextChar = false;
}
return nameBuilder.toString();
}
Why not try this:
split on "-"
uppercase each word, skipping the first
join
EDIT: On second thoughts... While trying to implement this, I found out there is no simple way to join a list of strings in Java, unless you use StringUtils from Apache. So you will need to create a StringBuilder anyway and thus the algorithm is going to get a little ugly :(
CODE: Here is a sample of the above-mentioned approach. Could someone with a Java compiler (sorry, don't have one handy) test this? And benchmark it against the other versions found here?
public static String toJavaMethodNameWithSplits(String xmlMethodName)
{
String[] words = xmlMethodName.split("-"); // split on "-"
StringBuilder nameBuilder = new StringBuilder(xmlMethodName.length());
nameBuilder.append(words[0]);
for (int i = 1; i < words.length; i++) // skip first
{
nameBuilder.append(words[i].substring(0, 1).toUpperCase());
nameBuilder.append(words[i].substring(1));
}
return nameBuilder.toString(); // join
}
If you don't want to depend on a library, you can use a combination of a regex and String.format. Use a regex to extract the starting characters after each -. Use these as input for String.format. A bit tricky, but it works without an (explicit) loop ;).
public class Test {
public static void main(String[] args) {
System.out.println(convert("do-some-stuff"));
}
private static String convert(String input) {
return String.format(input.replaceAll("\\-(.)", "%S"), input.replaceAll("[^-]*-(.)[^-]*", "$1-").split("-"));
}
}
Here is a slight variation of Andreas' answer that does more than the OP asked for:
public static String toJavaMethodName(final String nonJavaMethodName){
final StringBuilder nameBuilder = new StringBuilder();
boolean capitalizeNextChar = false;
boolean first = true;
for(int i = 0; i < nonJavaMethodName.length(); i++){
final char c = nonJavaMethodName.charAt(i);
if(!Character.isLetterOrDigit(c)){
if(!first){
capitalizeNextChar = true;
}
} else{
nameBuilder.append(capitalizeNextChar
? Character.toUpperCase(c)
: Character.toLowerCase(c));
capitalizeNextChar = false;
first = false;
}
}
return nameBuilder.toString();
}
It handles a few special cases:
fUnnY-cASe is converted to funnyCase
--dash-before-and--after- is converted to dashBeforeAndAfter
some.other$funky:chars? is converted to someOtherFunkyChars
For those who have the com.fasterxml.jackson library in the project and don't want to add Guava, you can use the Jackson naming strategy method:
new PropertyNamingStrategy.SnakeCaseStrategy().translate(String);
Get the Apache Commons jar for StringUtils. Then you can use the capitalize method
import org.apache.commons.lang.StringUtils;
public class MyClass{
public String myMethod(String str) {
StringBuffer buff = new StringBuffer();
String[] tokens = str.split("-");
for (String i : tokens) {
buff.append(StringUtils.capitalize(i));
}
return buff.toString();
}
}
As I'm not a big fan of adding a library just for one method, I implemented my own solution (from camel case to snake case):
public String toSnakeCase(String name) {
StringBuilder buffer = new StringBuilder();
for(int i = 0; i < name.length(); i++) {
if(Character.isUpperCase(name.charAt(i))) {
if(i > 0) {
buffer.append('_');
}
buffer.append(Character.toLowerCase(name.charAt(i)));
} else {
buffer.append(name.charAt(i));
}
}
return buffer.toString();
}
It needs to be adapted depending on the input and output cases.
In case you use Spring Framework, you can use provided StringUtils.
import org.springframework.util.StringUtils;
import java.util.Arrays;
import java.util.stream.Collectors;
public class NormalizeUtils {
private static final String DELIMITER = "_";
private NormalizeUtils() {
throw new IllegalStateException("Do not init.");
}
/**
* Take name like SOME_SNAKE_ALL and convert it to someSnakeAll
*/
public static String fromSnakeToCamel(final String name) {
if (StringUtils.isEmpty(name)) {
return "";
}
final String allCapitalized = Arrays.stream(name.split(DELIMITER))
.filter(c -> !StringUtils.isEmpty(c))
.map(StringUtils::capitalize)
.collect(Collectors.joining());
return StringUtils.uncapitalize(allCapitalized);
}
}
Iterate through the string. When you find a hyphen, remove it, and capitalise the next letter.
