Avoid overwriting files using regex - java

I have a class that replaces the illegal characters a string might contain so that the string can be used as a filename. The problem is that it replaces every illegal character with "_", which is fine as long as the string does not consist entirely of illegal characters.
For example, cleanFilename(">>>") returns the same string as cleanFilename("***"). So storing "***" in a file after storing ">>>" would overwrite the first file.
public class StringCleaner {
public static String cleanFilename(String dirtyString) {
return dirtyString.replaceAll("[:\\/*?|<> ]", "_");
}
public static String cleanDirectory(String dirtyDirectory) {
return dirtyDirectory.replaceAll("[:\\*?|<> ]", "_");
}
}
What can I change to avoid this problem?
Sorry for the awkward title I could not find a better one.
Update: I want it to create readable filenames so that identification through reading the filename only will be possible.
Thanks
Selim

So you are looking for a reversible and repeatable mechanism for replacing funny characters in file names. A typical way to do this is to use an escape sequence. For example, consider the following:
Pick a single character to use as the escape character. It must be legal in a file name, but not commonly used.
Let's choose the + character. Then we replace every illegal character (and + itself) with a sequence of characters that uniquely identifies the replaced character.
For example, replacing the space (character 32) in the file name "this has a space" gives the result "this+32+has+32+a+32+space":
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class StringCleaner {
public static void main(String[] args) {
StringCleaner sc = new StringCleaner();
System.out.println(sc.cleanFilename("this has a space"));
System.out.println(sc.cleanFilename("this has a plus +"));
System.out.println(sc.cleanFilename("this is full :\\/*?|<> + of stuff"));
}
private static final Pattern illegalfilechars = Pattern.compile("[:\\/*?|<> +]");
private static final Pattern illegaldirchars = Pattern.compile("[:\\*?|<> +]");
private static final String replaceall(Pattern pattern, String dirtyString) {
Matcher mat = pattern.matcher(dirtyString);
if (!mat.find()) {
return dirtyString;
}
StringBuffer sb = new StringBuffer();
do {
mat.appendReplacement(sb, "+" + (int)mat.group(0).charAt(0) + "+");
} while (mat.find());
mat.appendTail(sb);
return sb.toString();
}
public static String cleanFilename(String dirtyString) {
return replaceall(illegalfilechars, dirtyString);
}
public static String cleanDirectory(String dirtyDirectory) {
return replaceall(illegaldirchars, dirtyDirectory);
}
}
When I run the code I get the results:
this+32+has+32+a+32+space
this+32+has+32+a+32+plus+32++43+
this+32+is+32+full+32++58+\+47++42++63++124++60++62++32++43++32+of+32+stuff
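which also shows that the pattern is wrong for the character '\': in a Java string literal, "\\/" becomes the regex \/, which is just an escaped slash, so the backslash itself is never matched. Writing "\\\\" inside the character class (for example "[:\\\\/*?|<> +]") matches a literal backslash.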
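For the scheme to be reversible in practice, a decoder is needed as well. The following is a minimal sketch, not part of the original answer: the class name FilenameEscaper and the method names escape and unescape are made up for illustration, and the backslash is matched with "\\\\".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilenameEscaper {
    // Illegal characters plus the escape character '+' itself.
    // "\\\\" in a Java string literal is the regex \\, which matches one backslash.
    private static final Pattern ILLEGAL = Pattern.compile("[:\\\\/*?|<> +]");
    // An escape sequence: '+', the decimal character code, '+'.
    private static final Pattern ESCAPED = Pattern.compile("\\+(\\d+)\\+");

    static String escape(String name) {
        Matcher m = ILLEGAL.matcher(name);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // The replacement contains only '+' and digits, so it needs no quoting.
            m.appendReplacement(sb, "+" + (int) m.group().charAt(0) + "+");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    static String unescape(String escaped) {
        Matcher m = ESCAPED.matcher(escaped);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char original = (char) Integer.parseInt(m.group(1));
            // quoteReplacement protects '\' and '$', which are special in replacements.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(original)));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Because every + in the escaped name belongs to an escape sequence, decoding from left to right is unambiguous, and ">>>" and "***" no longer map to the same file name.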


How to match specific part of the URI string based on the following characters / words

I am trying to match a particular part of the URI only when it is not followed by anything, or when it is followed by '?'.
.../survey?expand=all //should match
.../survey //should match
.../survey/.. //should not match
I could not find a way to do that in one pattern. I tried (?=.*survey(?!\\?)) and it did not work. I also could not find a way to do it with two separate patterns. For example, I want to match .../survey and not .../survey/..., but .*?/survey\\b did not work for me.
My parser:
public class UriParser {
private static String reqUriPath = null;
private static String pattern = null;
public static Boolean isURIMatching(Object routePattern, String pattern){
reqUriPath = routePattern.toString();
return checkPattern(reqUriPath, pattern);
}
private static Boolean checkPattern(String reqUriPath, String pattern) {
Pattern p = Pattern.compile(pattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(reqUriPath);
return m.find();
}
}
I have updated my answer with a better regex. Note that it relies on a conditional construct, which java.util.regex does not support:
\\/survey(?(?=\\?)\\S*|$)
Explanation:
Here (?(?=\\?)\\S*|$) is basically an if/else clause: (?=\\?) is the condition (is the next character a literal ?), \\S* is the "then" branch, and $ after the | is the "else" branch. So we have the following conditions:
1. If a literal ? follows, allow \S*, i.e. any run of non-whitespace characters.
2. Otherwise (the part after |), the string must end there, so there is no chance of matching a / any further.
If the explanation is confusing, the regex101 link below gives a detailed breakdown of the regex:
https://regex101.com/r/WCt3WB/2
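In Java the same intent has to be expressed without a conditional, because java.util.regex has no such construct. The following is a sketch of one equivalent formulation, using an optional query-string group followed by the end anchor. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class SurveyUriMatcher {
    // "/survey" followed either by the end of the input
    // or by a query string ("?" plus any non-whitespace characters).
    private static final Pattern SURVEY =
            Pattern.compile("/survey(?:\\?\\S*)?$", Pattern.CASE_INSENSITIVE);

    static boolean matches(String uri) {
        return SURVEY.matcher(uri).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("http://test.com/survey?expand=all"));
        System.out.println(matches("http://test.com/survey"));
        System.out.println(matches("http://test.com/survey/anything"));
    }
}
```

The first two URIs match and the third does not, because after /survey the optional group cannot consume a / and the end anchor then fails.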
I am trying to match a particular part of the URI only when it is not followed by anything, or when it is followed by '?'.
You can do this without regex:
public class UriParser {
public static void main(String []args){
System.out.println(isURIMatching("http://test.com/survey?expand=all", "survey"));
System.out.println(isURIMatching("http://test.com/survey", "survey"));
System.out.println(isURIMatching("http://test.com/survey/anything", "survey"));
}
public static Boolean isURIMatching(Object routePattern, String pattern){
final String reqUriPath = routePattern.toString();
final int lastIndex = reqUriPath.lastIndexOf(pattern);
if (lastIndex < 0) {
return false; // the pattern does not occur in the URI at all
}
final String lastPart = reqUriPath.substring(lastIndex + pattern.length()).trim();
return "".equals(lastPart) || lastPart.startsWith("?");
}
}
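Another regex-free option is to let java.net.URI separate the path from the query string and then inspect only the path. This is a minimal sketch of that idea rather than the answerer's method; the class and method names are hypothetical, and URI.create throws IllegalArgumentException for strings that are not valid URIs.

```java
import java.net.URI;

class UriPathCheck {
    // True when the URI's path ends with "/" + lastSegment; the query string is ignored.
    static boolean pathEndsWith(String uri, String lastSegment) {
        String path = URI.create(uri).getPath(); // null for opaque URIs such as mailto:
        return path != null && path.endsWith("/" + lastSegment);
    }

    public static void main(String[] args) {
        System.out.println(pathEndsWith("http://test.com/survey?expand=all", "survey"));
        System.out.println(pathEndsWith("http://test.com/survey", "survey"));
        System.out.println(pathEndsWith("http://test.com/survey/anything", "survey"));
    }
}
```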

How to split a string in Java with two different separators? [duplicate]

I want to split the string "004-034556" into two strings by the delimiter "-":
part1 = "004";
part2 = "034556";
That means the first string will contain the characters before '-', and the second string will contain the characters after '-'.
I also want to check if the string has '-' in it.
Use the appropriately named method String#split().
String string = "004-034556";
String[] parts = string.split("-");
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556
Note that split's argument is assumed to be a regular expression, so remember to escape special characters if necessary.
There are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {. These special characters are often called "metacharacters".
For instance, to split on a period/dot . (which means "any character" in regex), use either backslash \ to escape the individual special character like so split("\\."), or use character class [] to represent literal character(s) like so split("[.]"), or use Pattern#quote() to escape the entire string like so split(Pattern.quote(".")).
String[] parts = string.split(Pattern.quote(".")); // Split on the exact string.
To test beforehand if the string contains certain character(s), just use String#contains().
if (string.contains("-")) {
// Split it.
} else {
throw new IllegalArgumentException("String " + string + " does not contain -");
}
Note, this does not take a regular expression. For that, use String#matches() instead.
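A short sketch of the String#matches() variant follows; the class name, method name and the pattern are made up for illustration. Unlike contains(), matches() has to match the entire string, which is why the pattern is wrapped in .* on both sides.

```java
class ContainsCheck {
    // True when the string contains a '-' with a digit directly on each side.
    static boolean hasDigitDashDigit(String s) {
        return s.matches(".*\\d-\\d.*");
    }

    public static void main(String[] args) {
        System.out.println(hasDigitDashDigit("004-034556"));
        System.out.println(hasDigitDashDigit("004034556"));
    }
}
```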
If you'd like to retain the split character in the resulting parts, then make use of positive lookaround. In case you want to have the split character to end up in left hand side, use positive lookbehind by prefixing ?<= group on the pattern.
String string = "004-034556";
String[] parts = string.split("(?<=-)");
String part1 = parts[0]; // 004-
String part2 = parts[1]; // 034556
In case you want to have the split character to end up in right hand side, use positive lookahead by prefixing ?= group on the pattern.
String string = "004-034556";
String[] parts = string.split("(?=-)");
String part1 = parts[0]; // 004
String part2 = parts[1]; // -034556
If you'd like to limit the number of resulting parts, then you can supply the desired number as 2nd argument of split() method.
String string = "004-034556-42";
String[] parts = string.split("-", 2);
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556-42
An alternative to processing the string directly would be to use a regular expression with capturing groups. This has the advantage that it makes it straightforward to impose more sophisticated constraints on the input. For example, the following splits the string into two parts, and ensures that both consist only of digits:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class SplitExample
{
private static Pattern twopart = Pattern.compile("(\\d+)-(\\d+)");
public static void checkString(String s)
{
Matcher m = twopart.matcher(s);
if (m.matches()) {
System.out.println(s + " matches; first part is " + m.group(1) +
", second part is " + m.group(2) + ".");
} else {
System.out.println(s + " does not match.");
}
}
public static void main(String[] args) {
checkString("123-4567");
checkString("foo-bar");
checkString("123-");
checkString("-4567");
checkString("123-4567-890");
}
}
As the pattern is fixed in this instance, it can be compiled in advance and stored as a static member (initialised at class load time in the example). The regular expression is:
(\d+)-(\d+)
The parentheses denote the capturing groups; the string that matched that part of the regexp can be accessed by the Matcher.group() method, as shown. The \d matches any single decimal digit, and the + means "match one or more of the previous expression". The - has no special meaning, so it just matches that character in the input. Note that you need to double the backslashes when writing this as a Java string. Some other examples:
([A-Z]+)-([A-Z]+) // Each part consists of only capital letters
([^-]+)-([^-]+) // Each part consists of characters other than -
([A-Z]{2})-(\d+) // The first part is exactly two capital letters,
// the second consists of digits
Use:
String[] result = yourString.split("-");
if (result.length != 2)
throw new IllegalArgumentException("String not in correct format");
This will split your string into two parts. The first element in the array will be the part containing the stuff before the -, and the second element in the array will contain the part of your string after the -.
If the array length is not 2, then the string was not in the format: string-string.
Check out the split() method in the String class.
This:
String[] out = string.split("-");
should do what you want. The String class has many methods for operating on strings.
// This leaves the regexes issue out of question
// But we must remember that each character in the Delimiter String is treated
// like a single delimiter
public static String[] SplitUsingTokenizer(String subject, String delimiters) {
StringTokenizer strTkn = new StringTokenizer(subject, delimiters);
ArrayList<String> arrLis = new ArrayList<String>(subject.length());
while(strTkn.hasMoreTokens())
arrLis.add(strTkn.nextToken());
return arrLis.toArray(new String[0]);
}
With Java 8:
List<String> stringList = Pattern.compile("-")
.splitAsStream("004-034556")
.collect(Collectors.toList());
stringList.forEach(s -> System.out.println(s));
Use org.apache.commons.lang.StringUtils' split method which can split strings based on the character or string you want to split.
Method signature:
public static String[] split(String str, char separatorChar);
In your case, you want to split a string when there is a "-".
You can simply do as follows:
String str = "004-034556";
String split[] = StringUtils.split(str,"-");
Output:
004
034556
Note that if - does not exist in your string, it returns an array containing just the given string, and you will not get any exception.
The requirements left room for interpretation. I recommend writing a method,
public final static String[] mySplit(final String s)
which encapsulate this function. Of course you can use String.split(..) as mentioned in the other answers for the implementation.
You should write some unit-tests for input strings and the desired results and behaviour.
Good test candidates should include:
- "0022-3333"
- "-"
- "5555-"
- "-333"
- "3344-"
- "--"
- ""
- "553535"
- "333-333-33"
- "222--222"
- "222--"
- "--4555"
By defining the expected test results, you specify the behaviour.
For example, should "-333" return [,333], or is it an error?
Can "333-333-33" be separated in [333,333-33] or [333-333,33] or is it an error? And so on.
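To make this concrete, here is a sketch of mySplit under one possible interpretation, chosen purely for illustration: the input must contain exactly one '-' with non-empty text on both sides, and anything else is rejected. The class name StrictSplit is hypothetical, and a different contract would be just as legitimate.

```java
class StrictSplit {
    // Assumed contract: exactly one '-' with non-empty text on both sides.
    // Everything else is treated as an error.
    static String[] mySplit(final String s) {
        int first = s.indexOf('-');
        boolean missingOrLeading = first <= 0;            // no '-' at all, or nothing before it
        boolean moreThanOne = first != s.lastIndexOf('-');
        boolean trailing = first == s.length() - 1;       // nothing after the '-'
        if (missingOrLeading || moreThanOne || trailing) {
            throw new IllegalArgumentException("Expected <left>-<right> but got: " + s);
        }
        return new String[] { s.substring(0, first), s.substring(first + 1) };
    }
}
```

Under this contract only "0022-3333" from the list above is accepted; every other candidate throws IllegalArgumentException.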
To summarize: there are at least five ways to split a string in Java:
String.split():
String[] parts ="10,20".split(",");
Pattern.compile(regexp).splitAsStream(input):
List<String> strings = Pattern.compile("\\|")
.splitAsStream("010|020202")
.collect(Collectors.toList());
StringTokenizer (legacy class):
StringTokenizer strings = new StringTokenizer("Welcome to EXPLAINJAVA.COM!", ".");
while(strings.hasMoreTokens()){
String substring = strings.nextToken();
System.out.println(substring);
}
Google Guava Splitter:
Iterable<String> result = Splitter.on(",").split("1,2,3,4");
Apache Commons StringUtils:
String[] strings = StringUtils.split("1,2,3,4", ",");
So you can choose the best option for you depending on what you need, e.g. return type (array, list, or iterable).
Here is a big overview of these methods and the most common examples (how to split by dot, slash, question mark, etc.)
You can try like this also
String concatenated_String="hi^Hello";
String split_string_array[]=concatenated_String.split("\\^");
Assuming, that
you don't really need regular expressions for your split
you happen to already use apache commons lang in your app
The easiest way is to use StringUtils#split(java.lang.String, char). That's more convenient than the one provided by Java out of the box if you don't need regular expressions. Like its manual says, it works like this:
A null input String returns null.
StringUtils.split(null, *) = null
StringUtils.split("", *) = []
StringUtils.split("a.b.c", '.') = ["a", "b", "c"]
StringUtils.split("a..b.c", '.') = ["a", "b", "c"]
StringUtils.split("a:b:c", '.') = ["a:b:c"]
StringUtils.split("a b c", ' ') = ["a", "b", "c"]
I would recommend using commons-lang, since it usually contains a lot of useful stuff. However, if you don't need it for anything other than doing a split, then implementing it yourself or escaping the regex is a better option.
For simple use cases String.split() should do the job. If you use guava, there is also a Splitter class which allows chaining of different string operations and supports CharMatcher:
Splitter.on('-')
.trimResults()
.omitEmptyStrings()
.split(string);
The fastest way, which also consumes the least resource could be:
String s = "abc-def";
int p = s.indexOf('-');
if (p >= 0) {
String left = s.substring(0, p);
String right = s.substring(p + 1);
} else {
// s does not contain '-'
}
String Split with multiple characters using Regex
public class StringSplitTest {
public static void main(String args[]) {
String s = " ;String; String; String; String, String; String;;String;String; String; String; ;String;String;String;String";
//String[] strs = s.split("[,\\s\\;]");
String[] strs = s.split("[,\\;]");
System.out.println("Substrings length:"+strs.length);
for (int i=0; i < strs.length; i++) {
System.out.println("Str["+i+"]:"+strs[i]);
}
}
}
Output:
Substrings length:17
Str[0]:
Str[1]:String
Str[2]: String
Str[3]: String
Str[4]: String
Str[5]: String
Str[6]: String
Str[7]:
Str[8]:String
Str[9]:String
Str[10]: String
Str[11]: String
Str[12]:
Str[13]:String
Str[14]:String
Str[15]:String
Str[16]:String
But do not expect the same output across all JDK versions. I have seen a bug in some JDK versions where the leading empty string was dropped. The bug is not present in the latest JDK version, but it exists in some versions between late JDK 1.7 releases and early 1.8 releases.
There are only two methods you really need to consider.
Use String.split for a one-character delimiter or if you don't care about performance
If performance is not an issue, or if the delimiter is a single character that is not a regular expression special character (i.e., not one of .$|()[{^?*+\) then you can use String.split.
String[] results = input.split(",");
The split method has an optimization to avoid using a regular expression if the delimiter is a single character and not in the above list. Otherwise, it has to compile a regular expression, and this is not ideal.
Use Pattern.split and precompile the pattern if using a complex delimiter and you care about performance.
If performance is an issue, and your delimiter is not one of the above, you should pre-compile a regular expression pattern which you can then reuse.
// Save this somewhere
Pattern pattern = Pattern.compile("[,;:]");
/// ... later
String[] results = pattern.split(input);
This last option still creates a new Matcher object. You can also cache this object and reset it for each input for maximum performance, but that is somewhat more complicated and not thread-safe.
You can split a string by a line break by using the following statement:
String textStr[] = yourString.split("\\r?\\n");
You can split a string by a hyphen/character by using the following statement:
String textStr[] = yourString.split("-");
public class SplitTest {
public static String[] split(String text, String delimiter) {
java.util.List<String> parts = new java.util.ArrayList<String>();
text += delimiter;
for (int i = text.indexOf(delimiter), j=0; i != -1;) {
String temp = text.substring(j,i);
if(temp.trim().length() != 0) {
parts.add(temp);
}
j = i + delimiter.length();
i = text.indexOf(delimiter,j);
}
return parts.toArray(new String[0]);
}
public static void main(String[] args) {
String str = "004-034556";
String delimiter = "-";
String result[] = split(str, delimiter);
for(String s:result)
System.out.println(s);
}
}
Please don't use StringTokenizer class as it is a legacy class that is retained for compatibility reasons, and its use is discouraged in new code. And we can make use of the split method as suggested by others as well.
String[] sampleTokens = "004-034556".split("-");
System.out.println(Arrays.toString(sampleTokens));
And as expected it will print:
[004, 034556]
In this answer I also want to point out one change to the split method in Java 8. The String#split() method makes use of Pattern.split, and since Java 8 a zero-width match at the beginning of the input no longer produces an empty leading string in the result array. Notice this change in the documentation for Java 8:
When there is a positive-width match at the beginning of the input
sequence then an empty leading substring is included at the beginning
of the resulting array. A zero-width match at the beginning however
never produces such empty leading substring.
It means for the following example:
String[] sampleTokensAgain = "004".split("");
System.out.println(Arrays.toString(sampleTokensAgain));
we will get three strings, [0, 0, 4], and not four as was the case in Java 7 and before.
One way to do this is to split the String on the required character and run through the resulting parts in a for-each loop.
public class StringSplitTest {
public static void main(String[] arg){
String str = "004-034556";
String split[] = str.split("-");
System.out.println("The split parts of the String are:");
for(String s:split)
System.out.println(s);
}
}
Output:
The split parts of the String are:
004
034556
import java.io.*;
public class BreakString {
public static void main(String args[]) {
String string = "004-034556-1234-2341";
String[] parts = string.split("-");
for(int i=0;i<parts.length;i++) {
System.out.println(parts[i]);
}
}
}
You can use split():
import java.io.*;
public class Splitting
{
public static void main(String args[])
{
String Str = new String("004-034556");
String[] SplittoArray = Str.split("-");
String string1 = SplittoArray[0];
String string2 = SplittoArray[1];
}
}
Else, you can use StringTokenizer:
import java.util.*;
public class Splitting
{
public static void main(String[] args)
{
StringTokenizer Str = new StringTokenizer("004-034556");
String string1 = Str.nextToken("-");
String string2 = Str.nextToken("-");
}
}
Here are two ways to achieve it.
WAY 1: As you have to split two numbers by a special character you can use regex
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TrialClass
{
public static void main(String[] args)
{
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher("004-034556");
while(m.find())
{
System.out.println(m.group());
}
}
}
WAY 2: Using the string split method
public class TrialClass
{
public static void main(String[] args)
{
String temp = "004-034556";
String [] arrString = temp.split("-");
for(String splitString:arrString)
{
System.out.println(splitString);
}
}
}
You can simply use StringTokenizer to split a string into two or more parts, whatever the delimiter is:
StringTokenizer st = new StringTokenizer("004-034556", "-");
while(st.hasMoreTokens())
{
System.out.println(st.nextToken());
}
Check out the split() method in the String class on javadoc.
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
String data = "004-034556-1212-232-232";
int cnt = 1;
for (String item : data.split("-")) {
System.out.println("string "+cnt+" = "+item);
cnt++;
}
There are many examples of splitting a string here; this is a slightly tidied-up version.
String str = "004-034556";
String[] sTemp = str.split("-"); // '-' is the delimiter
String string1 = sTemp[0]; // 004
String string2 = sTemp[1]; // 034556
I just wanted to write an algorithm instead of using Java built-in functions:
public static List<String> split(String str, char c){
List<String> list = new ArrayList<>();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i++){
if(str.charAt(i) != c){
sb.append(str.charAt(i));
}
else{
if(sb.length() > 0){
list.add(sb.toString());
sb = new StringBuilder();
}
}
}
if(sb.length() >0){
list.add(sb.toString());
}
return list;
}
You can use the method split:
public class Demo {
public static void main(String args[]) {
String str = "004-034556";
if ((str.contains("-"))) {
String[] temp = str.split("-");
for (String part:temp) {
System.out.println(part);
}
}
else {
System.out.println(str + " does not contain \"-\".");
}
}
}
To split a string, use String.split(regex). Review the following examples:
String data = "004-034556";
String[] output = data.split("-");
System.out.println(output[0]);
System.out.println(output[1]);
Output
004
034556
Note:
This split (regex) takes a regex as an argument. Remember to escape the regex special characters, like period/dot.
String s = "TnGeneral|DOMESTIC";
String a[] = s.split("\\|"); // | is a regex metacharacter, so it must be escaped
System.out.println(a[0]);
System.out.println(a[1]);
Output:
TnGeneral
DOMESTIC
String s="004-034556";
for(int i=0;i<s.length();i++)
{
if(s.charAt(i)=='-')
{
System.out.println(s.substring(0,i));
System.out.println(s.substring(i+1));
}
}
As mentioned by everyone, split() is the best option which may be used in your case. An alternative method can be using substring().

How to change characters of a string into '*'

So I'm trying to make a simple Wheel of fortune type game. But I'm having a serious issue getting started. I'm just trying to convert my phrase into "*" so that it can't be seen until the user guesses what one of the letters is. Here's what I have so far:
public class Puzzle
{
private String solution="DOG PILE";
private StringBuilder puzzle;
public Puzzle(String solution)
{
int startindex=puzzle.indexOf(solution);
puzzle.replace(startIndex, endIndex, "-");
}
}
Use a regular expression and replace method:
String hideSolution = solution.replaceAll(".", "-");
Use guava library
example:
String noDigits = CharMatcher.inRange('0', '9').replaceFrom(string, "*"); // star out all ASCII digits (recent Guava releases no longer provide the JAVA_DIGIT constant)
You can try something like this
public static String hide(String data, StringBuilder charactersToShow) {
return data.replaceAll("[^\\s" + charactersToShow.toString() + "]", "*");
}
public static void main(String[] args) throws Exception {
StringBuilder gueses = new StringBuilder();
String solution = "DOG PILE";
System.out.println(hide(solution, gueses));//
gueses.append('D');
System.out.println(hide(solution, gueses));
gueses.append('I');
System.out.println(hide(solution, gueses));
}
Output:
*** ****
D** ****
D** *I**
A little explanation:
The replaceAll method takes two arguments: a regular expression that describes which parts of the String should be replaced, and the replacement. The result is a new String, so the original String is not changed.
As the regular expression I used a negated character class [^...], so it matches any character that is not in the class. Besides the user's characters I added \\s at the beginning, because it represents any whitespace (spaces, tabs, newlines, and so on), which you probably don't want to replace with *.
You may also want to add ' to that set if you don't want it to be replaced.
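One caveat with building a character class from user input is that a guess such as ] or ^ changes the meaning of the pattern. A loop-based sketch avoids the issue entirely. This is a different technique from the regex above, and the class name PuzzleMask is made up for illustration.

```java
class PuzzleMask {
    // Shows whitespace and guessed characters; everything else becomes '*'.
    static String hide(String solution, String guessed) {
        StringBuilder masked = new StringBuilder(solution.length());
        for (int i = 0; i < solution.length(); i++) {
            char c = solution.charAt(i);
            boolean visible = Character.isWhitespace(c) || guessed.indexOf(c) >= 0;
            masked.append(visible ? c : '*');
        }
        return masked.toString();
    }

    public static void main(String[] args) {
        System.out.println(hide("DOG PILE", ""));   // *** ****
        System.out.println(hide("DOG PILE", "D"));  // D** ****
        System.out.println(hide("DOG PILE", "DI")); // D** *I**
    }
}
```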

Java and SEO friendly URLs: ©reate ╨ a valid http URL from a string composed by special caracters

I'm trying to extract SEO-friendly URLs from strings that can contain special characters, letters with accents, Chinese-like characters, etc.
SO is doing this, and it's translating this post's title into
java-and-seo-friendly-urls-reate--a-valid-http-url-from-a-string-composed-by-s
I'm trying to do this in Java.
I'm using this post solution with URLEncoder.encode to translate Chinese and other symbols into valid URL characters.
Have you ever implemented something like this? Is there a better way?
This might be an oversimplistic approach to the problem, but you could just use regular expressions to remove all non-standard characters. After converting your string to lowercase, you can remove everything that is not a lowercase letter or whitespace and then replace each whitespace character with '-'.
private static String encodeForUrl(String input) {
return input.toLowerCase().replaceAll("[^a-z\\s]", "").replaceAll("\\s", "-");
}
I don't know of any standard way to do this; I've been using a similar solution to the one you are referring to. Not sure which one is better, so here you have it:
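import static java.lang.String.format;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.text.Normalizer;
import java.util.regex.Pattern;
import com.ibm.icu.text.Transliterator; // ICU4J
import org.springframework.util.Assert; // assumption: Assert.isTrue(boolean, String) looks like Spring's utility
public class TextUtils {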
private static final Pattern DIACRITICS_AND_FRIENDS =
Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");
private static final Transliterator TO_LATIN_TRANSLITERATOR = Transliterator.getInstance("Any-Latin");
private static final Pattern EEQUIVALENTS = Pattern.compile("[ǝƏ]+");
private static final Pattern IEQUIVALENTS = Pattern.compile("[ı]+");
private static final Pattern DEQUIVALENTS = Pattern.compile("[Ððđ]+");
private static final Pattern OEQUIVALENTS = Pattern.compile("[Øø]+");
private static final Pattern LEQUIVALENTS = Pattern.compile("[Ł]+");
//all spaces, non-ascii and punctuation characters except _ and -
private static final Pattern CRAP = Pattern.compile("[\\p{IsSpace}\\P{IsASCII}\\p{IsP}\\+&&[^_]]");
private static final Pattern SEPARATORS = Pattern.compile("[\\p{IsSpace}/`-]");
private static final Pattern URLFRIENDLY = Pattern.compile("([a-zA-Z0-9_])*");
// US-ASCII, not ISO-8859-1: a Latin-1 encoder would also accept characters such as é
private static final CharsetEncoder ASCII_ENCODER = Charset.forName("US-ASCII").newEncoder();
/**
* Returns true when the input text contains only characters from the ASCII set, false otherwise.
*/
public static boolean isPureAscii(String text) {
return ASCII_ENCODER.canEncode(text);
}
/**
* Replaces all characters that normalize into two characters with their base symbol (e.g. ü -> u)
*/
public static String replaceCombiningDiacriticalMarks(String text) {
return DIACRITICS_AND_FRIENDS.matcher(Normalizer.normalize(text, Normalizer.Form.NFKD)).replaceAll("");
}
/**
* Turns the input string into a url friendly variant (containing only alphanumeric characters and '-' and '_').
* If the input string cannot be converted an IllegalArgumentException is thrown.
*/
public static String urlFriendlyStrict(String unfriendlyString) throws IllegalArgumentException {
String friendlyString =
urlFriendly(unfriendlyString);
//Assert can be removed to improve performance
Assert.isTrue(URLFRIENDLY.matcher(friendlyString).matches(),
format("Friendly string [%s] based on [%s] is not friendly enough", friendlyString, unfriendlyString));
return friendlyString;
}
/**
* Turns the input string into a url friendly variant (containing only alphanumeric characters and '-' and '_').
* Use {@link #urlFriendlyStrict(String)} to avoid potential bugs in this code.
*/
private static String urlFriendly(String unfriendlyString) {
return removeCrappyCharacters(
replaceEquivalentsOfSymbols(
replaceCombiningDiacriticalMarks(
transLiterateSymbols(
replaceSeparatorsWithUnderscores(
unfriendlyString.trim()))))).toLowerCase();
}
private static String transLiterateSymbols(String incomprehensibleString) {
String latin = TO_LATIN_TRANSLITERATOR.transform(incomprehensibleString);
return latin;
}
private static String replaceEquivalentsOfSymbols(String unfriendlyString) {
return
LEQUIVALENTS.matcher(
OEQUIVALENTS.matcher(
DEQUIVALENTS.matcher(
IEQUIVALENTS.matcher(
EEQUIVALENTS.matcher(unfriendlyString).replaceAll("e"))
.replaceAll("i"))
.replaceAll("d"))
.replaceAll("o"))
.replaceAll("l");
}
private static String removeCrappyCharacters(String unfriendlyString) {
return CRAP.matcher(unfriendlyString).replaceAll("");
}
private static String replaceSeparatorsWithUnderscores(String unfriendlyString) {
return SEPARATORS.matcher(unfriendlyString).replaceAll("_");
}
}
I would say URLEncoder.encode is the way to go. All non-URL chars are mapped, and you surely don't want to reinvent the wheel (again and again and again).
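For reference, a minimal sketch of that call; the wrapper class SlugEncoding is made up for illustration. Note that the result is a valid URL component but not necessarily a readable one, since non-ASCII characters become percent-escapes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SlugEncoding {
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 must be supported by every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("a b"));     // a+b
        System.out.println(encode("\u00e9"));  // %C3%A9
    }
}
```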

java replaceLast() [duplicate]

Is there replaceLast() in Java? I saw there is replaceFirst().
EDIT: If there is not in the SDK, what would be a good implementation?
It could (of course) be done with regex:
public class Test {
public static String replaceLast(String text, String regex, String replacement) {
return text.replaceFirst("(?s)"+regex+"(?!.*?"+regex+")", replacement);
}
public static void main(String[] args) {
System.out.println(replaceLast("foo AB bar AB done", "AB", "--"));
}
}
although it is a bit CPU-hungry because of the look-aheads, that will only be an issue when working with very large strings (and many occurrences of the regex being searched for).
A short explanation (in case of the regex being AB):
(?s) # enable dot-all option
A # match the character 'A'
B # match the character 'B'
(?! # start negative look ahead
.*? # match any character and repeat it zero or more times, reluctantly
A # match the character 'A'
B # match the character 'B'
) # end negative look ahead
EDIT
Sorry to wake up an old post. But this is only for non-overlapping instances.
For example .replaceLast("aaabbb", "bb", "xx"); returns "aaaxxb", not "aaabxx"
True, that could be fixed as follows:
public class Test {
public static String replaceLast(String text, String regex, String replacement) {
return text.replaceFirst("(?s)(.*)" + regex, "$1" + replacement);
}
public static void main(String[] args) {
System.out.println(replaceLast("aaabbb", "bb", "xx"));
}
}
If you don't need regex, here's a substring alternative.
public static String replaceLast(String string, String toReplace, String replacement) {
int pos = string.lastIndexOf(toReplace);
if (pos > -1) {
return string.substring(0, pos)
+ replacement
+ string.substring(pos + toReplace.length());
} else {
return string;
}
}
Testcase:
public static void main(String[] args) throws Exception {
System.out.println(replaceLast("foobarfoobar", "foo", "bar")); // foobarbarbar
System.out.println(replaceLast("foobarbarbar", "foo", "bar")); // barbarbarbar
System.out.println(replaceLast("foobarfoobar", "faa", "bar")); // foobarfoobar
}
Use replaceAll and add a dollar sign right after your pattern (this only replaces a match that sits at the very end of the string):
replaceAll("pattern$", replacement);
You can combine StringUtils.reverse() with String.replaceFirst()
Or is your question actually "How do I implement a replaceLast()?"
Let me attempt an implementation (this should behave pretty much like replaceFirst(), so it should support regexes and backreferences in the replacement String):
public static String replaceLast(String input, String regex, String replacement) {
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
if (!matcher.find()) {
return input;
}
int lastMatchStart=0;
do {
lastMatchStart=matcher.start();
} while (matcher.find());
matcher.find(lastMatchStart);
StringBuffer sb = new StringBuffer(input.length());
matcher.appendReplacement(sb, replacement);
matcher.appendTail(sb);
return sb.toString();
}
Use StringUtils from apache:
org.apache.commons.lang.StringUtils.chomp(value, ignoreChar);
No.
You could do reverse / replaceFirst / reverse, but it's a bit expensive.
If the inspected string is so that
myString.endsWith(substringToReplace) == true
you also can do
myString = myString.replaceFirst("(.*)" + substringToReplace + "$", "$1" + replacement);
It is slow, but it works:
import org.apache.commons.lang.StringUtils;
public static String replaceLast(String str, String oldValue, String newValue) {
str = StringUtils.reverse(str);
str = str.replaceFirst(StringUtils.reverse(oldValue), StringUtils.reverse(newValue));
str = StringUtils.reverse(str);
return str;
}
split the haystack by your needle using a lookahead regex and replace the last element of the array, then join them back together :D
String haystack = "haystack haystack haystack";
String lookFor = "hay";
String replaceWith = "wood";
String[] matches = haystack.split("(?=" + lookFor + ")");
matches[matches.length - 1] = matches[matches.length - 1].replace(lookFor, replaceWith);
String brandNew = StringUtils.join(matches);
I also have encountered such a problem, but I use this method:
public static String replaceLast2(String text,String regex,String replacement){
int i = text.length();
int j = regex.length();
if(i<j){
return text;
}
while (i>j&&!(text.substring(i-j, i).equals(regex))) {
i--;
}
if(i<=j&&!(text.substring(i-j, i).equals(regex))){
return text;
}
StringBuilder sb = new StringBuilder();
sb.append(text.substring(0, i-j));
sb.append(replacement);
sb.append(text.substring(i));
return sb.toString();
}
This works well. Supply the string to work on as s, put the substring you want to replace in place of "he", and put its replacement in place of "mt".
import java.util.Scanner;
public class FindSubStr
{
public static void main(String str[])
{
Scanner on=new Scanner(System.in);
String s=on.nextLine().toLowerCase();
String st1=s.substring(0, s.lastIndexOf("he"));
String st2=s.substring(s.lastIndexOf("he"));
String n=st2.replace("he","mt");
System.out.println(st1+n);
}
}
