Remove specified trailing and leading punctuation from a word (Java) [duplicate] - java

This question already has answers here:
How can I remove all leading and trailing punctuation?
(3 answers)
Closed 9 years ago.
I have defined a constant called PUNCTUATION, for use with regex, that contains every character I treat as punctuation, i.e.
PUNCTUATION = " !\"',;:.-_?)([]<>*#\n\t\r"
The only problem is that I am not sure how to use it to remove all leading and trailing punctuation from a given word. I have tried methods like replaceAll and startsWith but have had no luck.
Any suggestions anyone?

Completely untested, but should work:
public static String trimChars(String source, String trimChars) {
char[] chars = source.toCharArray();
int length = chars.length;
int start = 0;
while (start < length && trimChars.indexOf(chars[start]) > -1) {
start++;
}
while (start < length && trimChars.indexOf(chars[length - 1]) > -1) {
length--;
}
if (start > 0 || length < chars.length) {
return source.substring(start, length);
} else {
return source;
}
}
And you'd call it this way:
String trimmed = trimChars(input, PUNCTUATION);

A method that strips the given characters from the start and end of a string (this should be more time-efficient than applying regex patterns):
public class StringUtil {
    public static final String PUNCTUATION = " !\"',;:.-_?)([]<>*#\n\t\r";
    public static String strip(String original, String charsToRemove) {
        if (original == null) {
            return null;
        }
        int end = original.length();
        int start = 0;
        char[] val = original.toCharArray();
        // advance past leading characters that are in the set
        while (start < end && charsToRemove.indexOf(val[start]) >= 0) {
            start++;
        }
        // back up past trailing characters that are in the set
        while (start < end && charsToRemove.indexOf(val[end - 1]) >= 0) {
            end--;
        }
        return ((start > 0) || (end < original.length())) ? original.substring(start, end) : original;
    }
}
Use like this:
assertEquals("abc", StringUtil.strip(" !abc;-< ", StringUtil.PUNCTUATION));

String PUNCTUATION = " !\"',;:.-_?)([]<>*#\n\t\r";
String pattern = "([" + PUNCTUATION.replaceAll("(.)", "\\\\$1") + "]+)";
//[\ \!\"\'\,\;\:\.\-\_\?\)\(\[\]\<\>\*\#\t\n]
pattern = "\\b" + pattern + "|" + pattern + "\\b";
String text = ".\n<>#aword,... \n\t..# asecondword,?";
System.out.println( text.replaceAll(pattern, "") );
//awordasecondword
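\b
is a word boundary.
First, put your characters into [ ] (a character class) and escape the special characters.
"\\b" + pattern
matches trailing punctuation (a run that directly follows a word character), and
pattern + "\\b"
matches leading punctuation (a run that directly precedes a word character).
Because the space is part of PUNCTUATION, everything between two words is removed as well, as the output above shows, so this is best applied to a single word.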
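If only a single word is being trimmed, a variant that anchors the class to the very start and end of the input may be closer to what the question asks, because it leaves punctuation inside the word (such as the apostrophe in don't) untouched. This is a sketch, assuming the PUNCTUATION constant from the question; the class TrimPunctuation and the helper charClass are hypothetical names. Each character is written as a \uXXXX escape so that none of them (], [, -, the backslash) is read as regex syntax, and \z is used instead of $ so that a trailing line terminator is not treated specially.

```java
import java.util.regex.Pattern;

public class TrimPunctuation {
    static final String PUNCTUATION = " !\"',;:.-_?)([]<>*#\n\t\r";

    // Builds a character class in which each character is written as a
    // Unicode escape (backslash-u plus four hex digits), so every character
    // in the set is matched literally.
    static String charClass(String chars) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : chars.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.append(']').toString();
    }

    // Removes one run of set characters at the start (^) and one at the very end (\z).
    static String trim(String word, String chars) {
        String cls = charClass(chars);
        return Pattern.compile("^" + cls + "+|" + cls + "+\\z")
                .matcher(word)
                .replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(trim("\"(don't!)\"", PUNCTUATION)); // prints don't
    }
}
```

In practice you would compile the pattern once and reuse it rather than rebuilding it on every call.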

Related

CodingBat plusOut, why using substring won't work correctly

I am solving a coding challenge on CodingBat.com. Here is the question:
Given a string and a non-empty word string, return a version of the
original String where all chars have been replaced by pluses ("+"),
except for appearances of the word string which are preserved
unchanged.
plusOut("12xy34", "xy") → "++xy++"
plusOut("12xy34", "1") → "1+++++"
plusOut("12xy34xyabcxy", "xy") → "++xy++xy+++xy"
Here is my attempted solution:
public String plusOut(String str, String word)
{
String ret = "";
for (int i = 0; i < str.length() - word.length() + 1; ++i) {
if (str.substring(i, i + word.length()).equals(word))
ret += word;
else
ret += "+";
}
return ret;
}
But it gives the wrong output: too many plus signs. I don't understand why this doesn't work. I suspect that the substring method is not returning enough matches, so plus signs get appended, but I don't see why this may be so.
I would use a StringBuilder to construct the result, to avoid creating multiple String objects (String in Java is immutable):
public String plusOut(String str, String word) {
StringBuilder result = new StringBuilder(str);
int len = str.length(), wordLen = word.length(), index = 0;
while(index < len){
if ( (index <= len-wordLen) && (str.substring(index, index+wordLen).equals(word))){
index += wordLen;
continue;
}
result.setCharAt(index++, '+');
}
return result.toString();
}
You were doing a few things wrong. I've corrected your code, although there is probably a cleaner way to do this. I explain what changed below.
public static String plusOut(String str, String word)
{
String ret = "";
for (int i = 0; i < str.length(); ++i) {
int endIndex = i + word.length();
if (endIndex < str.length() + 1
&& str.substring(i, i + word.length()).equals(word)) {
ret += word;
i = i + word.length() - 1;
} else
ret += "+";
}
return ret;
}
The first mistake is that you don't loop over the whole of str, so you never reach its last character.
Another problem is that once you find a word, you don't "jump" to the correct next index in the loop; you keep looping over the characters of the found word, which adds extra + characters to your result string.
i = i + word.length() - 1;
In the corrected solution, this line moves i to the last character of the match, so the loop's ++i lands on the next index of str that you should be looking at. Example:
Take the string 12xy34xyabcxy and look for xy.
You will find that the word xy starts at index 2 and ends at index 3.
At that point your result string is ++xy, after adding the found word to it.
Now the problem begins: you still go over index 3 and add an extra +, because the two characters starting there (y3) do not make up your word.
The two characters after the found xy also add +, and you now have ++xy+++, which is incorrect.
endIndex < str.length() + 1
endIndex is exactly what its name says: the end index of your substring.
This check prevents us from testing for xy when there aren't enough characters left between the current index and the end of the string to make up xy, so we add a + for each remaining character instead.
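The bounds check can also be avoided with String.startsWith(prefix, toffset), which simply returns false when fewer than prefix.length() characters remain from toffset. A minimal sketch of that variant, assuming word is non-empty as the problem states (the class name PlusOut is only for illustration):

```java
public class PlusOut {
    // Copies each occurrence of word unchanged and replaces every other char with '+'.
    public static String plusOut(String str, String word) {
        StringBuilder sb = new StringBuilder(str.length());
        int i = 0;
        while (i < str.length()) {
            // startsWith(word, i) is false when the rest of str is shorter than word
            if (str.startsWith(word, i)) {
                sb.append(word);
                i += word.length(); // jump past the whole match
            } else {
                sb.append('+');
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("12xy34xyabcxy", "xy")); // ++xy++xy+++xy
    }
}
```

Unlike substring, startsWith does not allocate a new String for every position it checks.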
Do it like this:
public static String plusOut(String str, String word)
{
String ret = "";
int i;
for (i = 0; i < str.length() - word.length() +1 ; i++) {
if (str.substring(i, i + word.length()).equals(word)) {
ret += word;
i += word.length() - 1;
}
else
ret += "+";
}
while (i < str.length()) {
ret += "+";
i++;
}
return ret;
}
Here is a working version of your solution:
public String plusOut(String str, String word)
{
String ret = "";
for (int i = 0; i < str.length();) {
if (i + word.length()<= str.length() && str.substring(i, i + word.length()).equals(word)) {
ret += word;
i+=word.length();
}
else{
ret += "+";
i++;
}
}
return ret;
}

How can I find a String within a Java program converted to a string?

Basically, I read a Java program into my program as a string, and I'm trying to find a way to extract the string literals from it. I have a loop going through each character of the program, and this is what happens when it reaches a '"':
else if (ch == '"')
{
String subString = " ";
index ++;
if (ch != '"')
{
subString += ch;
}
else
{
System.out.println(lineNumber + ", " + TokenType.STRING + ", " + subString);
index ++;
continue;
}
Unfortunately, this isn't working. This is how I am trying to output the subString.
Essentially, I am looking for a way to collect all the characters between two "s in order to get a String.
You could use regular expressions:
Pattern regex = Pattern.compile("(?:(?<!')\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\")");
Matcher m = regex.matcher(content);
while (m.find())
    System.out.println(m.group(1));
This captures the contents of quoted strings and takes escaped quotes and backslashes into account.
To break down the pattern:
1. (?: ... ) = don't capture as a group (the inside is captured instead)
2. (?<!') = make sure there isn't a single quote before the opening quote (to avoid '"')
3. \"( ... )\" = capture what is inside the quotes
4. .*? = match the shortest string of any chars
5. (?<!\\\\) = don't match a single backslash before (double-escaped = single backslash in content)
6. (?:\\\\\\\\)* = match zero or an even number of backslashes
Together, 5. and 6. only allow an even number of backslashes before the closing quote. This allows string endings like \\", \\\\", but not \" and \\\", which would be part of the string.
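A small runnable check of the pattern, written with the negative lookbehind (?<!') that the breakdown describes, so the '"' character literal in the sample is skipped while an escaped \" inside a string is kept. The class QuotedStrings and the method extract are hypothetical names; like the original, it assumes string literals never span lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStrings {
    // (?<!') skips a quote preceded by ', as in the char literal '"';
    // (?<!\\)(?:\\\\)* lets the closing quote follow only an even run of backslashes.
    private static final Pattern STRING_LITERAL =
            Pattern.compile("(?<!')\"(.*?(?<!\\\\)(?:\\\\\\\\)*)\"");

    static List<String> extract(String content) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(content);
        while (m.find()) {
            found.add(m.group(1)); // group 1 is the text between the quotes
        }
        return found;
    }

    public static void main(String[] args) {
        String src = "String a = \"hi\"; char q = '\"'; String b = \"say \\\"yo\\\"\";";
        extract(src).forEach(System.out::println);
    }
}
```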
Non-regex solution, also taking care of escaped quotes:
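List<String> strings = new ArrayList<>();
int start = -1;       // index just after the opening quote, or -1 outside a string
int backslashes = 0;  // length of the current run of consecutive backslashes
for (int i = 0; i < content.length(); i++) {
    char ch = content.charAt(i);
    if (ch == '"') {
        if (start == -1) {
            start = i + 1;
            backslashes = 0;
        } else if (backslashes % 2 == 1) {
            // escaped quote: still inside the string
            backslashes = 0;
        } else {
            strings.add(content.substring(start, i));
            start = -1;
        }
    } else if (ch == '\\') {
        backslashes++;
    } else {
        // any other char ends the backslash run (e.g. the t in \t)
        backslashes = 0;
    }
}
strings.forEach(System.out::println);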

Java - extract content inside square brackets (ignore nested square brackets)? [duplicate]

This question already has an answer here:
Match contents within square brackets, including nested square brackets
(1 answer)
Closed 3 years ago.
I want to extract the string content inside square brackets (if the content inside a pair of square brackets contains nested square brackets, those should be ignored).
Example:
c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5
Should return:
match1 = "ts[0],99:99,99:99";
match2 = "ts[1],99:99,99:99, ts[2]";
The code I have so far only works with non-nested square brackets:
String in = "c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5";
Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher(in);
while(m.find()) {
System.out.println(m.group(1));
}
// print: ts[0, ts[1, 2
I wrote a loop to do it (not with regex, but it works):
String result = "";
for (int i = 0; i < in.length(); i++){
char c = in.charAt(i);
String part = String.valueOf(c);
int numberOfOpenBrackets = 0;
if (c == '[') {
part = "";
numberOfOpenBrackets++;
for (int j = i + 1; j < in.length(); j++) {
char d = in.charAt(j);
if (d == '[') {
numberOfOpenBrackets++;
}
if (d == ']') {
numberOfOpenBrackets--;
i = j;
if (numberOfOpenBrackets == 0) {
break;
}
}
part += d;
}
System.out.println(part);
part = "[" + part + "]";
}
result += part;
}
// print: ts[0],99:99,99:99
// ts[1],99:99,99:99, ts[2]
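If the nesting is only one level deep, you can search for a sequence between the brackets.
A sequence of:
either a character that is neither [ nor ]
or a [ followed by the shortest sequence up to ]
So
Pattern p = Pattern.compile("\\[((?:[^\\[\\]]|\\[.*?\\])*)\\]");
// \\[ ... \\]      the outer [ and ]
// ( ... )          group 1: the content between them
// (?: ... )*       repeated any number of times, either
//   [^\\[\\]]      a character that is neither [ nor ], or
//   \\[.*?\\]      a [ followed by the shortest sequence up to ]
The ] must be excluded from the first alternative; otherwise the repetition runs past the first closing bracket and everything from the first [ to the last ] becomes a single match.
The catch is that the brackets must be correctly paired: no syntax errors are allowed.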
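A quick check of the one-level pattern on the question's input, with group 1 holding the content between the outer brackets. As an assumption of this sketch, the inner pair is written as \[[^\[\]]*\] rather than \[.*?\], so it cannot run past a ]; the class OneLevelBrackets and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OneLevelBrackets {
    // [ then (a char that is neither [ nor ], or a whole [...] pair) repeated, then ]
    private static final Pattern ONE_LEVEL =
            Pattern.compile("\\[((?:[^\\[\\]]|\\[[^\\[\\]]*\\])*)\\]");

    static List<String> contents(String in) {
        List<String> out = new ArrayList<>();
        Matcher m = ONE_LEVEL.matcher(in);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String in = "c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5";
        contents(in).forEach(System.out::println);
    }
}
```

Deeper nesting such as [a[b[c]]] is not handled; for that, the counting approaches without regex are the safer choice.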
Without regex, just plain Java:
import java.util.ArrayList;
import java.util.List;
public class BracketParser {
public static List<String> parse(String target) throws Exception {
List<String> results = new ArrayList<>();
for (int idx = 0; idx < target.length(); idx++) {
if (target.charAt(idx) == '[') {
String result = readResult(target, idx + 1);
if (result == null) throw new Exception();
results.add(result);
idx += result.length() + 1;
}
}
return results;
}
private static String readResult(String target, int startIdx) {
int openBrackets = 0;
for (int idx = startIdx; idx < target.length(); idx++) {
char c = target.charAt(idx);
if (openBrackets == 0 && c == ']')
return target.substring(startIdx, idx);
if (c == '[') openBrackets++;
if (c == ']') openBrackets--;
}
return null;
}
public static void main(String[] args) throws Exception {
System.out.println(parse("c[ts[0],99:99,99:99] + 5 - d[ts[1],99:99,99:99, ts[2]] + 5"));
}
}
You might want to anchor your expression on ts at the start and on a right boundary at the end, and capture everything in between. Something similar to this expression might work:
(ts.*?)(\]\s+\+)
If more characters than (\s\+) can follow the closing bracket, you can add them as alternatives (logical ORs) or in a character class and it should still work.
If this wasn't your desired expression, you can modify or test it on regex101.com. You can also visualize expressions on jex.im.

java regex, split on comma only if not in quotes or brackets

I would like to split a string in Java using a regex.
I would like to split my string on every comma that is NOT inside single quotes or brackets.
Example:
Hello, 'my,',friend,(how ,are, you),(,)
should give:
Hello
my,
friend
how ,are, you
,
I tried this:
(?i),(?=([^\'|\(]*\'|\([^\'|\(]*\'|\()*[^\'|\)]*$)
But I can't get it to work (I tested via http://java-regex-tester.appspot.com/)
Any ideas?
Nested parentheses can't be handled by a regex split. It's easier to split manually:
public static List<String> split(String orig) {
    List<String> splitted = new ArrayList<String>();
    int nestingLevel = 0;
    StringBuilder result = new StringBuilder();
    for (char c : orig.toCharArray()) {
        if (c == ',' && nestingLevel == 0) {
            splitted.add(result.toString());
            result.setLength(0); // clear the buffer
        } else {
            if (c == '(')
                nestingLevel++;
            if (c == ')')
                nestingLevel--;
            result.append(c);
        }
    }
    // Thanks PoeHah for pointing it out. This adds the last element to it.
    splitted.add(result.toString());
    return splitted;
}
Hope this helps.
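The method above only tracks parentheses, so the comma inside 'my,' from the question would still be treated as a separator. A minimal sketch that additionally ignores commas between single quotes, assuming quotes are never escaped; the class CommaSplitter and the method splitOutside are hypothetical names. Like the method above, it keeps the quotes and parentheses in each piece.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaSplitter {
    // Splits on commas that are outside single quotes and outside ( ).
    static List<String> splitOutside(String orig) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int nestingLevel = 0;
        boolean inQuotes = false;
        for (char c : orig.toCharArray()) {
            if (c == '\'') {
                inQuotes = !inQuotes;            // toggle on every single quote
            } else if (!inQuotes && c == '(') {
                nestingLevel++;
            } else if (!inQuotes && c == ')') {
                nestingLevel--;
            } else if (!inQuotes && nestingLevel == 0 && c == ',') {
                parts.add(current.toString());  // top-level comma: close the piece
                current.setLength(0);
                continue;                       // the separator itself is dropped
            }
            current.append(c);
        }
        parts.add(current.toString());          // last piece has no trailing comma
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOutside("Hello, 'my,',friend,(how ,are, you),(,)"));
    }
}
```

Stripping the quotes, parentheses and surrounding spaces from each piece, as the expected output in the question does, is left as a separate step.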
A Java CSV parser library would be better suited to this task than regex: http://sourceforge.net/projects/javacsv/
Assuming no nested (), you could split on
",(?=(?:[^']*'[^']*')*[^']*$)(?=(?:[^()]*\\([^()]*\\))*[^()]*$)"
It only splits on a comma when the rest of the string after it contains an even number of ' characters and only complete ( ) pairs.
It's a brittle solution, but it may be good enough.
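A quick demonstration of that pattern with String.split on the question's sample; the input comes from the question, while the class name SplitOutsideQuotes is just for illustration. The quotes and parentheses stay in the pieces, and the space before 'my,' is kept.

```java
import java.util.Arrays;

public class SplitOutsideQuotes {
    // Split on a comma only if the rest of the string has an even number of '
    // and consists of complete, non-nested (...) pairs.
    static final String REGEX =
            ",(?=(?:[^']*'[^']*')*[^']*$)(?=(?:[^()]*\\([^()]*\\))*[^()]*$)";

    public static void main(String[] args) {
        String input = "Hello, 'my,',friend,(how ,are, you),(,)";
        String[] parts = input.split(REGEX);
        System.out.println(Arrays.toString(parts));
    }
}
```

Both lookaheads rescan the whole remainder of the string at every comma, so this approach gets slow on long inputs.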
As mentioned in some comments and in the answer by @Balthus, this is better done with a CSV parser. You do need some smart regex replacement to prepare the input string for parsing, though. Consider code like this:
String str = "Hello, 'my,',friend,(how ,are, you),(,)"; // input string
// prepare String for CSV parser: replace left/right brackets OR ' by a "
CsvReader reader = CsvReader.parse(str.replaceAll("[(')]", "\""));
reader.readRecord(); // read the CSV input
for (int i=0; i<reader.getColumnCount(); i++)
System.out.printf("col[%d]: [%s]%n", i, reader.get(i));
OUTPUT
col[0]: [Hello]
col[1]: [my,]
col[2]: [friend]
col[3]: [how ,are, you]
col[4]: [,]
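I also needed to split on commas outside of quotes and brackets.
After going through all the related answers on SO, I realized a lexer is needed in such a case, so I wrote a generic implementation for myself. It supports a separator, multiple quotes and multiple brackets, all given as regexes. The regexes you pass in must not contain capturing groups of their own (use (?:...) instead), because the scanner identifies which delimiter matched by its group number.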
public static List<String> split(String string, String regex, String[] quotesRegex, String[] leftBracketsRegex,
String[] rightBracketsRegex) {
if (leftBracketsRegex.length != rightBracketsRegex.length) {
throw new IllegalArgumentException("Bracket count mismatch, left: " + leftBracketsRegex.length + ", right: "
+ rightBracketsRegex.length);
}
// Prepare all delimiters.
String[] delimiters = new String[1 + quotesRegex.length + leftBracketsRegex.length + rightBracketsRegex.length];
delimiters[0] = regex;
System.arraycopy(quotesRegex, 0, delimiters, 1, quotesRegex.length);
System.arraycopy(leftBracketsRegex, 0, delimiters, 1 + quotesRegex.length, leftBracketsRegex.length);
System.arraycopy(rightBracketsRegex, 0, delimiters, 1 + quotesRegex.length + leftBracketsRegex.length,
rightBracketsRegex.length);
// Build delimiter regex.
StringBuilder delimitersRegexBuilder = new StringBuilder("(?:");
boolean first = true;
for (String delimiter : delimiters) {
if (delimiter.endsWith("\\") && !delimiter.endsWith("\\\\")) {
throw new IllegalArgumentException("Delimiter contains trailing single \\: " + delimiter);
}
if (first) {
first = false;
} else {
delimitersRegexBuilder.append("|");
}
delimitersRegexBuilder
.append("(")
.append(delimiter)
.append(")");
}
delimitersRegexBuilder.append(")");
String delimitersRegex = delimitersRegexBuilder.toString();
// Scan.
int pendingQuoteIndex = -1;
Deque<Integer> bracketStack = new LinkedList<>();
StringBuilder pendingSegmentBuilder = new StringBuilder();
List<String> segmentList = new ArrayList<>();
Matcher matcher = Pattern.compile(delimitersRegex).matcher(string);
int matcherIndex = 0;
while (matcher.find()) {
pendingSegmentBuilder.append(string.substring(matcherIndex, matcher.start()));
int delimiterIndex = -1;
for (int i = 1; i <= matcher.groupCount(); ++i) {
if (matcher.group(i) != null) {
delimiterIndex = i - 1;
break;
}
}
if (delimiterIndex < 1) {
// Regex.
if (pendingQuoteIndex == -1 && bracketStack.isEmpty()) {
segmentList.add(pendingSegmentBuilder.toString());
pendingSegmentBuilder.setLength(0);
} else {
pendingSegmentBuilder.append(matcher.group());
}
} else {
delimiterIndex -= 1;
pendingSegmentBuilder.append(matcher.group());
if (delimiterIndex < quotesRegex.length) {
// Quote.
if (pendingQuoteIndex == -1) {
pendingQuoteIndex = delimiterIndex;
} else if (pendingQuoteIndex == delimiterIndex) {
pendingQuoteIndex = -1;
}
// Ignore unpaired quotes.
} else if (pendingQuoteIndex == -1) {
delimiterIndex -= quotesRegex.length;
if (delimiterIndex < leftBracketsRegex.length) {
// Left bracket
bracketStack.push(delimiterIndex);
} else {
delimiterIndex -= leftBracketsRegex.length;
// Right bracket
Integer topBracket = bracketStack.peek();
// Ignore unbalanced brackets (peek() returns null when no left bracket is open).
if (topBracket != null && delimiterIndex == topBracket) {
bracketStack.pop();
}
}
}
}
matcherIndex = matcher.end();
}
pendingSegmentBuilder.append(string.substring(matcherIndex, string.length()));
segmentList.add(pendingSegmentBuilder.toString());
while (segmentList.size() > 0 && segmentList.get(segmentList.size() - 1).isEmpty()) {
segmentList.remove(segmentList.size() - 1);
}
return segmentList;
}

Search for Capital Letter in String

I am trying to search a string for the last index of a capital letter. I don't mind using regular expressions, but I'm not too familiar with them.
int searchPattern = searchString.lastIndexOf(" ");
String resultingString = searchString.substring(searchPattern + 1);
As you can see, with my current code I'm looking for the last space that is included in a string. I need to change this to search for last capital letter.
You can write a method as follows:
public int lastIndexOfUCL(String str) {
for(int i=str.length()-1; i>=0; i--) {
if(Character.isUpperCase(str.charAt(i))) {
return i;
}
}
return -1;
}
Pattern pat = Pattern.compile("[A-Z][^A-Z]*$");
Matcher match = pat.matcher(inputString);
int lastCapitalIndex = -1;
if(match.find())
{
lastCapitalIndex = match.start();
}
lastCapitalIndex will contain the index of the last capital letter in the inputString or -1 if no capitals exist.
EDIT NOTE: Solution formerly contained a loop, now it will work with one call to find() and no looping thanks to an improved regex. Tested new pattern as well, and it worked.
In Android (Java) you can use this:
String s = "MyDocumentFileIsHere";
String textWithSpace = s.replaceAll("(.)([A-Z])", "$1 $2");
holder.titleTxt.setText(textWithSpace);
The resulting string will be "My Document File Is Here"
You can compare each character of the string with the range of uppercase characters in the ASCII table (decimal 65 ('A') to 90 ('Z')).
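A minimal sketch of that comparison, scanning from the end of the string. Note that it only recognizes the unaccented letters A to Z, unlike Character.isUpperCase; the class LastAsciiCapital and its method name are hypothetical.

```java
public class LastAsciiCapital {
    // Returns the index of the last char in 'A'..'Z' (65..90), or -1 if there is none.
    static int lastAsciiUppercaseIndex(String s) {
        for (int i = s.length() - 1; i >= 0; i--) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(lastAsciiUppercaseIndex("saveChangesInTheEditor")); // 16
    }
}
```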
You can increase the readability of your code and benefit from some other features of modern Java by using a Stream-based approach:
/**
* Finds the last uppercase letter in a String.
*/
public class FindLastCapitalLetter {
public static void main(String[] args) {
String str = "saveChangesInTheEditor";
int lastUppercaseLetter = findLastUppercaseLetter(str);
if (lastUppercaseLetter != -1) {
System.out.println("The last uppercase letter is "
+ Character.toString((char) lastUppercaseLetter));
} else {
System.out.println("No uppercase letter was found in the String.");
}
}
private static int findLastUppercaseLetter(String str) {
return new StringBuilder(str).reverse().toString().chars()
.filter(c -> Character.isUpperCase(c)).findFirst().orElse(-1);
}
}
Sample output:
The last uppercase letter is E
The following code gives you the index of the last capital letter in the String instead:
import java.util.stream.IntStream;
/**
* Finds the index of the last uppercase letter in a String.
*/
public class FindIndexOfLastUppercaseLetter {
public static void main(String[] args) {
String str = "saveChangesInTheEditor";
int lastUppercaseLetterIndex = findLastUppercaseLetter(str);
if (lastUppercaseLetterIndex != -1) {
System.out.println("The last uppercase letter index is " + lastUppercaseLetterIndex
+ " which is " + str.charAt(lastUppercaseLetterIndex));
} else {
System.out.println("No uppercase letter was found in the String.");
}
}
private static int findLastUppercaseLetter(String str) {
int[] stringChars = str.chars().toArray();
int stringCharsLength = stringChars.length;
return IntStream.range(0, stringCharsLength)
.map(i -> stringCharsLength - i - 1)
.filter(i -> Character.isUpperCase(stringChars[i]))
.findFirst().orElse(-1);
}
}
Sample output:
The last uppercase letter index is 16 which is E
LeetCode - Detect capitals
class Solution {
public boolean detectCapitalUse(String word) {
int len = word.length();
if (word.charAt(0) >= 'A' && word.charAt(0) <= 'Z') {
if (word.charAt(len-1) >= 'A' && word.charAt(len-1) <= 'Z') {
for (int i = 1 ; i < len-1 ; i++) {
if ( word.charAt(i) < 'A' || word.charAt(i) > 'Z')
return false;
}
} else {
for (int i = 1 ; i <= len-1 ; i++) {
if ( word.charAt(i) < 'a' || word.charAt(i) > 'z')
return false;
}
}
} else {
for (int i = 0 ; i <= len-1 ; i++) {
if ( word.charAt(i) < 'a' || word.charAt(i) > 'z')
return false;
}
}
return true;
}
}
