Missing characters in text Java


I have a small string like: "OT-0*02"
and I have a "database" where I have the full strings "OT-0502" and "OT-0602".
I need to locate these based on the user input, which looks like the first string above: the star marks the unknown character, and it could be anywhere.
How do I do this? I've tried fooling around with Pattern.matches and regular expressions, but it doesn't seem to give me a solution.
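This is what I've got so far:
String rsz = "OT-0*02";
int cv = 0;
// note: "size()-1" skips the last entry of jarmu
while (cv < jarmu.size()-1) {
    // note: Pattern.matches expects (regex, input), so the arguments are reversed here,
    // and in a regex '*' is a quantifier, not a wildcard
    if (Pattern.matches(jarmu.get(cv).substring(9), rsz)) {
        System.out.println("match found");
    }
    cv++;
}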

It sounds like you just want a simple wildcard search where * matches any single character. To do that, change every * in the user input to . and escape everything else. Then you can use the result as a pattern.
String userPattern = "O*-0*02";
// split at each '*'; the -1 limit keeps empty parts at the end
String[] parts = userPattern.split("\\*", -1);
// quote each literal part so characters like '-' or '.' lose any regex meaning
for (int i = 0; i < parts.length; i++) parts[i] = "\\Q" + parts[i] + "\\E";
// '.' matches exactly one character where each '*' used to be
userPattern = String.join(".", parts);
Then use yourString.matches(userPattern) to check whether a string matches the pattern.
Basically, you want * to mean "match any single character", but in regex . performs this function, so you replace each * with a . (dot). However, you don't want anything else to be interpreted as a special character, so the string is split at the *s, each part is quoted with \Q and \E to remove any special meaning, and the parts are joined back together with .s where the *s used to be. The -1 limit makes split keep trailing empty strings, so a * at the end of the input is not lost (a leading * is kept either way).
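For reference, here is a self-contained sketch of the same idea applied to a list like the one in the question. It swaps the hand-written \Q...\E for Pattern.quote, which does the same quoting but also copes with input that itself contains \E. The class name WildcardSearch, the helpers toRegex and findMatches, and the sample list are illustrative assumptions, not code from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardSearch {
    // Turns a user pattern where each '*' stands for exactly one unknown character
    // into a regex: literal parts are quoted, each '*' becomes '.'.
    static String toRegex(String userPattern) {
        String[] parts = userPattern.split("\\*", -1); // -1 keeps a trailing empty part
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append('.'); // one arbitrary character where a '*' was
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return regex.toString();
    }

    // Returns every candidate that matches the user pattern in full.
    static List<String> findMatches(String userPattern, List<String> candidates) {
        Pattern pattern = Pattern.compile(toRegex(userPattern)); // compiled once
        List<String> matches = new ArrayList<>();
        for (String candidate : candidates) {
            if (pattern.matcher(candidate).matches()) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> database = List.of("OT-0502", "OT-0602", "OT-1502");
        System.out.println(findMatches("OT-0*02", database)); // [OT-0502, OT-0602]
    }
}
```

Compiling the pattern once and reusing it for every candidate avoids recompiling the regex inside the loop.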

Related

split a string in java into equal length substrings while maintaining word boundaries

How to split a string into equal parts of maximum character length while maintaining word boundaries?
Say, for example, if I want to split a string "hello world" into equal substrings of maximum 7 characters it should return me
"hello "
and
"world"
But my current implementation returns
"hello w"
and
"orld "
I am using the following code taken from Split string to equal length substrings in Java to split the input string into equal parts
public static List<String> splitEqually(String text, int size) {
// Give the list the right capacity to start with. You could use an array
// instead if you wanted.
List<String> ret = new ArrayList<String>((text.length() + size - 1) / size);
for (int start = 0; start < text.length(); start += size) {
ret.add(text.substring(start, Math.min(text.length(), start + size)));
}
return ret;
}
Is it possible to maintain word boundaries while splitting the string into substrings?
To be more specific: I need the splitting algorithm to take into account the word boundaries given by spaces, and not rely solely on character length. The length still needs to be taken into account, but as a maximum number of characters per piece rather than a fixed length.

If I understand your problem correctly, this code should do what you need (but it assumes that maxLength is equal to or greater than the length of the longest word):
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String data = "Hello there, my name is not important right now."
        + " I am just simple sentence used to test few things.";
int maxLength = 10;
Pattern p = Pattern.compile("\\G\\s*(.{1," + maxLength + "})(?=\\s|$)", Pattern.DOTALL);
Matcher m = p.matcher(data);
while (m.find())
    System.out.println(m.group(1));
Output:
Hello
there, my
name is
not
important
right now.
I am just
simple
sentence
used to
test few
things.
Short (or not) explanation of the "\\G\\s*(.{1,"+maxLength+"})(?=\\s|$)" regex:
(let's just remember that in Java \ is special not only in regex but also in String literals, so to use predefined character classes like \d we need to write "\\d", because the \ itself must be escaped in the string literal)
\G - an anchor representing the end of the previous match, or, if there is no match yet (when we have just started searching), the beginning of the string (same as ^)
\s* - zero or more whitespace characters (\s represents whitespace, * is the "zero-or-more" quantifier)
(.{1,"+maxLength+"}) - let's split it into more parts (at runtime maxLength holds a numeric value like 10, so the regex sees it as .{1,10})
. represents any character (by default it matches any character except line separators like \n or \r, but thanks to the Pattern.DOTALL flag it matches any character; you may drop this flag if you want every line of the input to start a new piece, since the text after a line break will then start on a new output line anyway)
{1,10} - a quantifier that lets the previous element appear 1 to 10 times (by default it tries to match as many repetitions as possible),
.{1,10} - so, based on what we just said, it simply represents "1 to 10 characters of any kind"
( ) - parentheses create groups, structures that let us hold specific parts of the match (here we placed the parentheses after \\s* because we want only the part after the whitespace)
(?=\\s|$) - a look-ahead that makes sure the text matched by .{1,10} is followed by:
whitespace (\\s)
OR (written as |)
the end of the string ($).
So thanks to .{1,10} we can match up to 10 characters, but with (?=\\s|$) after it we require that the last character matched by .{1,10} is not part of an unfinished word (there must be whitespace or the end of the string after it).
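The same regex can be wrapped in a small reusable method that returns the pieces instead of printing them. This is a sketch; the class and method names are hypothetical, and it keeps the answer's assumption that no word is longer than maxLength (when one is, \G can no longer match, so the pieces stop at that word).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordWrapRegex {
    // Splits text into pieces of 1..maxLength characters that end at whitespace
    // or at the end of the input. Assumes no word is longer than maxLength.
    static List<String> split(String text, int maxLength) {
        Pattern p = Pattern.compile("\\G\\s*(.{1," + maxLength + "})(?=\\s|$)", Pattern.DOTALL);
        Matcher m = p.matcher(text);
        List<String> pieces = new ArrayList<>();
        while (m.find()) {
            pieces.add(m.group(1)); // group 1 excludes the leading whitespace
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(split("hello world", 7)); // [hello, world]
        // A word longer than maxLength stops the matching early:
        System.out.println(split("ab supercalifragilistic", 5)); // [ab]
    }
}
```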

Non-regex solution, just in case someone is more comfortable (?) not using regular expressions:
private String justify(String s, int limit) {
StringBuilder justifiedText = new StringBuilder();
StringBuilder justifiedLine = new StringBuilder();
String[] words = s.split(" ");
for (int i = 0; i < words.length; i++) {
justifiedLine.append(words[i]).append(" ");
if (i+1 == words.length || justifiedLine.length() + words[i+1].length() > limit) {
justifiedLine.deleteCharAt(justifiedLine.length() - 1);
justifiedText.append(justifiedLine.toString()).append(System.lineSeparator());
justifiedLine = new StringBuilder();
}
}
return justifiedText.toString();
}
Test:
String text = "Long sentence with spaces, and punctuation too. And supercalifragilisticexpialidocious words. No carriage returns, tho -- since it would seem weird to count the words in a new line as part of the previous paragraph's length.";
System.out.println(justify(text, 15));
Output:
Long sentence
with spaces,
and punctuation
too. And
supercalifragilisticexpialidocious
words. No
carriage
returns, tho --
since it would
seem weird to
count the words
in a new line
as part of the
previous
paragraph's
length.
It takes into account words that are longer than the set limit, so it doesn't skip them (unlike the regex version, which stops processing when it reaches supercalifragilisticexpialidocious).
PS: The comment about all input words being expected to be shorter than the set limit was made after I came up with this solution ;)
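A variant of the same greedy approach is sketched below, under two assumptions of my own: it splits on runs of whitespace (\\s+) instead of single spaces, so doubled spaces do not produce empty "words", and it returns a List<String> rather than one string joined with line separators. The names GreedyWrap and wrap are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyWrap {
    // Greedy line filling: keep adding words while the line stays within limit.
    // A word longer than limit is placed on a line of its own rather than dropped.
    static List<String> wrap(String text, int limit) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when text is blank
            }
            // +1 accounts for the space that would separate the word from the line
            if (line.length() > 0 && line.length() + 1 + word.length() > limit) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(wrap("hello  world", 7)); // [hello, world]
    }
}
```

For the test sentence above and a limit of 15 it should give the same 16 lines as justify, because both add a word only while line length + 1 + word length stays within the limit.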

Remove Special Characters For A Pattern Java

I want to remove these characters from a String:
+ - ! ( ) { } [ ] ^ ~ : \
I also want to remove these sequences:
/*
*/
&&
||
That is, I don't want to remove a single & or |; I want to remove them only when the same character follows (&& and ||), and / and * only in the pairs /* and */.
How can I do that efficiently and quickly in Java?
Example:
a:b+c1|x||c*(?)
will be:
abc1|xc*?

This can be done via a long, but actually very simple regex.
String aString = "a:b+c1|x||c*(?)";
String sanitizedString = aString.replaceAll("[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|", "");
System.out.println(sanitizedString);
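Since the question asks about speed: String.replaceAll(regex, replacement) is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(replacement), so it compiles the regex on every call. If you sanitize many strings, a sketch like the following compiles the same pattern once and reuses it. The Sanitizer class and sanitize method are illustrative names.

```java
import java.util.regex.Pattern;

public class Sanitizer {
    // Same regex as above, compiled once: single special characters,
    // or the pairs /*, */, && and || (a lone & or | is left alone).
    private static final Pattern SPECIAL =
            Pattern.compile("[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|");

    static String sanitize(String input) {
        return SPECIAL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a:b+c1|x||c*(?)")); // abc1|xc*?
    }
}
```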

I think that the java.lang.String.replaceAll(String regex, String replacement) is all you need:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll(java.lang.String, java.lang.String).

There are two ways to do that:
1)
import java.util.Arrays;
import java.util.List;
// two-character sequences first; a lone "/" is deliberately not in the list
List<String> tokens = Arrays.asList(
        "/*", "*/", "&&", "||",
        "+", "-", "!", "(", ")", "{", "}", "[", "]", "^", "~", ":", "\\");
String string = "a:b+c1|x||c*(?)";
for (String token : tokens) {
    string = string.replace(token, ""); // literal replacement, no regex involved
}
System.out.println(string);
2)
String string = "a:b+c1|x||c*(?)";
string = string.replaceAll("[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|", "");
System.out.println(string);

Thomas wrote on How to remove special characters from a string?:
That depends on what you define as special characters, but try
replaceAll(...):
String result = yourString.replaceAll("[-+.^:,]","");
Note that the ^ character must not be the first one in the list, since
you'd then either have to escape it or it would mean "any but these
characters".
Another note: the - character needs to be the first or last one on the
list; otherwise you'd have to escape it, or it would define a range
(e.g. :-, would mean "all characters in the range : to ,").
So, in order to keep consistency and not depend on character
positioning, you might want to escape all those characters that have a
special meaning in regular expressions (the following list is not
complete, so be aware of other characters like (, {, $ etc.):
String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
If you want to get rid of all punctuation and symbols, try this regex:
[\p{P}\p{S}] (keep in mind that in Java strings you'd have to escape
the backslashes: "[\\p{P}\\p{S}]").
A third way could be something like this, if you can exactly define
what should be left in your string:
String result = yourString.replaceAll("[^\\w\\s]","");
Here's a less restrictive alternative to the "define allowed characters"
approach, as suggested by Ray:
String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");
The regex matches everything that is not a letter in any language and
not a separator (whitespace, linebreak etc.). Note that you can't use
[\P{L}\P{Z}] (upper case P means not having that property), since that
would mean "everything that is not a letter or not whitespace", which
almost matches everything, since letters are not whitespace and vice
versa.
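To see how the three character classes above differ, here is a small comparison on one sample string. The sample and the class name are made up for illustration; the string uses Unicode escapes for the accented letters so the source file's encoding does not matter. Note that by default Java's \w covers only [a-zA-Z_0-9], so [^\\w\\s] also removes accented letters, while [^\\p{L}\\p{Z}] keeps them but removes digits.

```java
public class RemovalComparison {
    // Sample text with two accented letters, written as Unicode escapes
    static final String SAMPLE = "H\u00e9llo, w\u00f6rld! 42 + x_y";

    public static void main(String[] args) {
        // Punctuation or symbol: removes ',', '!', '+' and also '_' (connector punctuation)
        System.out.println(SAMPLE.replaceAll("[\\p{P}\\p{S}]", ""));  // -> "H\u00e9llo w\u00f6rld 42  xy"
        // Neither an ASCII word character nor whitespace: the accented letters go too
        System.out.println(SAMPLE.replaceAll("[^\\w\\s]", ""));        // -> "Hllo wrld 42  x_y"
        // Neither a letter nor a separator: accented letters stay, digits and '_' go
        System.out.println(SAMPLE.replaceAll("[^\\p{L}\\p{Z}]", ""));  // -> "H\u00e9llo w\u00f6rld   xy"
    }
}
```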

the best way for character replacement in String in java

I want to go through a string and, for each character, either replace it with other characters or keep it. Because the string is long, the time this takes matters a lot. Which of these approaches is best, or is there a better idea?
In all of them I append the result to a StringBuilder.
1) check all of the characters with a for loop and charAt.
2) use a switch, as in the previous approach.
3) use replaceAll twice.
And if one of the first two methods is better, is there any way to check a character against a group of characters, like:
if (st.charAt(i)=='a'..'z') ....
Edit:
Please tell me which way takes the least time, and why. I know all of the ways you mentioned!

If you want to replace a single character (or a single sequence), use replace(), as other answers have suggested.
If you want to replace several characters (e.g., 'a', 'b', and 'c') with a single substitute character or character sequence (e.g., "X"), you should use a regular expression replace:
String result = original.replaceAll("[abc]", "X");
If you want to replace several characters, each with a different replacement (e.g., 'a' with 'A', 'b' with 'B'), then looping through the string yourself and building the result in a StringBuilder will probably be the most efficient. This is because, as you point out in your question, you will be going through the string only once.
StringBuilder sb = new StringBuilder(original.length());
String targets = "abc";
String replacements = "ABC";
for (int i = 0; i < original.length(); ++i) {
    char c = original.charAt(i);
    int loc = targets.indexOf(c); // -1 if c is not one of the targets
    sb.append(loc >= 0 ? replacements.charAt(loc) : c);
}
String result = sb.toString();
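If the same set of replacements is applied to many long strings, one alternative to calling indexOf for every character is a lookup table built once. This is a sketch of that swapped-in technique, not something from the answer above; the class name CharTranslator is hypothetical, the table costs 65,536 chars (128 KB) of memory, and whether it is actually faster than the indexOf loop for your data is something to measure rather than assume. Because each char maps to exactly one char, it can overwrite a char[] copy in place instead of appending to a StringBuilder.

```java
public class CharTranslator {
    // One slot per possible char value; table[c] is the replacement for c.
    private final char[] table = new char[Character.MAX_VALUE + 1];

    CharTranslator(String targets, String replacements) {
        if (targets.length() != replacements.length()) {
            throw new IllegalArgumentException("targets and replacements must have the same length");
        }
        for (int c = 0; c < table.length; c++) {
            table[c] = (char) c; // identity mapping by default
        }
        for (int i = 0; i < targets.length(); i++) {
            table[targets.charAt(i)] = replacements.charAt(i);
        }
    }

    String translate(String input) {
        char[] chars = input.toCharArray(); // private copy that can be overwritten
        for (int i = 0; i < chars.length; i++) {
            chars[i] = table[chars[i]];
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        CharTranslator t = new CharTranslator("abc", "ABC");
        System.out.println(t.translate("a cab, by the sea")); // A CAB, By the seA
    }
}
```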

Check the documentation and find some good methods:
char from = 'a';
char to = 'b';
str = str.replace(from, to);

String replaceSample = "This String replace Example shows "
        + "how to replace one char from String";
String newString = replaceSample.replace('r', 't');
Output: This Stting teplace Example shows how to teplace one chat ftom Stting
Also, you could use contains:
str1.toLowerCase().contains(str2.toLowerCase())
to check (ignoring case) whether the substring str2 exists in str1.
Edit:
Just read that the String comes from a file. You can use a regex for this. That would be the best method.
http://docs.oracle.com/javase/tutorial/essential/regex/literals.html

This is your comment:
I want to replace all of the uppercases to lower cases and replace all
of the characters except a-z with space.
You can do it like this:
str = str.toLowerCase().replaceAll("[^a-z]", " ");
Your requirement should be part of the question, not in comment #7 under a posted answer...
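For comparison, the same transformation can be written as one explicit pass, which also shows the range check the question asked about: 'a'..'z' becomes c >= 'a' && c <= 'z'. This sketch assumes ASCII input; String.toLowerCase() also lowercases non-ASCII letters and depends on the default locale, so for other text the two versions can differ. The class and method names are hypothetical.

```java
public class LowercaseLetters {
    // One pass: 'A'..'Z' are lowercased, 'a'..'z' are kept, anything else becomes a space.
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length()); // final size is known up front
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                sb.append((char) (c + ('a' - 'A'))); // ASCII upper case to lower case
            } else if (c >= 'a' && c <= 'z') {
                sb.append(c);
            } else {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("Hello, World 42!") + "]"); // [hello  world    ]
    }
}
```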

You should look into regex for Java; it lets you match an entire set of characters at once. Strings have several methods, such as replace, replaceAll, and matches, that you may find useful here.
You can match the set of ASCII letters, for instance, using [a-zA-Z] (or [a-zA-Z0-9] for letters and digits), which may be what you're looking for.

Java String Regex Divide - Always the Same Pattern

I never understood how to properly write a regex to split my Strings.
I have Strings of this type: String example = "on[?a, ?b, ?c]";
Sometimes I have this: String example2 = "not clear[?c]";
For the first example I would like to split it into this:
[on, a, b, c]
or
String name = "on";
String[] vars = {"a", "b", "c"};
And for the second example I would like to split it into this:
[not clear, c]
or
String name = "not clear";
String[] vars = {"c"};
Thanks a lot in advance guys ;)

If you know the character set of your identifiers, you can simply do a split on all of the text that isn't in that set. For example, if your identifiers only consist of word characters ([a-zA-Z_0-9]) you can use:
import java.util.Arrays;
String[] parts = "on[?a, ?b, ?c]".split("[\\W]+");
String name = parts[0];
String[] vars = Arrays.copyOfRange(parts, 1, parts.length);
If your identifiers only contain A-Z (upper and lower case), you could replace [\\W]+ above with [^A-Za-z]+.
I feel that this is more elegant than using a complex regular expression.
Edit: I realize that this will have issues with your second example "not clear". If you have no option of using something like an underscore instead of a space there, you could do one split on [? (or substring) to get the "name", and another split on the remainder, like so:
String s = "not clear[?a, ?b, ?c]";
String[] parts = s.split("\\[\\?"); //need the '?' so we don't get an extra empty array element in the next split
String name = parts[0];
String[] vars = parts[1].split("[\\W]+");
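Another option is to let one regex separate the name from the bracket contents and then split the contents on commas. This is a sketch that assumes the input always has the shape name[?x, ?y, ...]; the class name BracketParser, the parse method, and the choice to return the name as the first list element are my own, not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketParser {
    // Group 1: the name before the first '['; group 2: everything inside the brackets.
    private static final Pattern SHAPE = Pattern.compile("(.*?)\\[(.*)\\]");

    // Returns the name followed by the variables, e.g. [on, a, b, c].
    static List<String> parse(String s) {
        Matcher m = SHAPE.matcher(s.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("expected name[...]: " + s);
        }
        List<String> result = new ArrayList<>();
        result.add(m.group(1).trim());
        for (String var : m.group(2).split(",")) {
            String v = var.trim();
            if (v.startsWith("?")) {
                v = v.substring(1); // drop the leading '?'
            }
            if (!v.isEmpty()) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("on[?a, ?b, ?c]")); // [on, a, b, c]
        System.out.println(parse("not clear[?c]"));  // [not clear, c]
    }
}
```

To get the name and the array separately, take result.get(0) and result.subList(1, result.size()).toArray(new String[0]).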

This comes close, but the problem is the third remembered group is actually repeated so it only captures the last match.
(.*?)\[(?:\s*(?:\?(.*?)(?:\s*,\s*\?(.*?))*)\s*)?]
For example, the first one you list, on[?a, ?b, ?c], would give group 1 as on, group 2 as a, and group 3 as c. If you are using Perl, you could use the g flag to apply a regex to a line multiple times, like this:
my @tokens;
while ($line =~ /\s*(.*?)\s*[[,\]]/g) {
    push(@tokens, $1);
}
Note: I did not actually test the Perl code; it's off the top of my head. It should give you the idea, though.
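In Java, the counterpart of Perl's while (... =~ /.../g) loop is calling Matcher.find() repeatedly, since each call continues after the previous match. The sketch below ports the Perl regex with two changes: the [ inside the character class is escaped, because Java would otherwise read it as the start of a nested class, and an optional \\?? is added (my assumption about the desired output) so that the leading ? of each variable is not captured. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenLoop {
    // Perl's /\s*(.*?)\s*[[,\]]/ with '[' escaped inside the class and an optional '?' added.
    private static final Pattern TOKEN = Pattern.compile("\\s*\\??(.*?)\\s*[\\[,\\]]");

    static List<String> tokens(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) { // each call resumes after the previous match, like Perl's /g
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokens("on[?a, ?b, ?c]")); // [on, a, b, c]
        System.out.println(tokens("not clear[?c]"));  // [not clear, c]
    }
}
```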

String[] parts = example.split("[^\\w ]");
List<String> x = new ArrayList<String>();
for (int i = 0; i < parts.length; i++) {
if (!"".equals(parts[i]) && !" ".equals(parts[i])) {
x.add(parts[i]);
}
}
This will work as long as you don't have more than one space separating your non-space characters. There's probably a cleverer way of filtering out the empty and single-space strings.

Regular Expression - Java

For the string value "ABC_12" (including the quotes), I would like to extract only the content and leave out the double quotes, i.e. ABC_12. My code is:
private static void checkRegex()
{
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9])+\"");
Matcher findMatches = stringPattern.matcher("\"ABC_12\"");
if (findMatches.matches())
System.out.println("Match found" + findMatches.group(0));
}
Now I have tried findMatches.group(1), but that only returns the last character in the string (I don't understand why!).
How can I extract only the content leaving out the double quotes?

Try this regex:
Pattern.compile("\"([a-zA-Z_0-9]+)\"");
OR
Pattern.compile("\"([^\"]+)\"");
The problem in your code is the + placed outside the closing parenthesis. It makes the capturing group match only one character, and the + repeats the whole group; a repeated group keeps only what its last repetition matched, which is why you eventually get only the last character.

A nice simple (read: non-regex) way to do this is:
String myString = "\"ABC_12\"";
String myFilteredString = myString.replace("\"", ""); // literal replacement, no regex involved
System.out.println(myFilteredString);
gets you
ABC_12

You should change your pattern to this:
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9]+)\"");
Note that the + sign was moved inside the group, since you want the character repetition to be part of the group. In the code you posted, you were actually searching for a repetition of the group, and the group consisted of a single occurrence of a single character from [a-zA-Z_0-9].
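A quick way to see the difference is to print group(1) for both patterns against the same input. The class QuotedContent and the group1 helper are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedContent {
    // Returns group 1 if the whole input matches the regex, otherwise null.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"ABC_12\"";
        // '+' outside the group repeats the group; only its last repetition is kept
        System.out.println(group1("\"([a-zA-Z_0-9])+\"", input)); // 2
        // '+' inside the group: the group captures the whole run of characters
        System.out.println(group1("\"([a-zA-Z_0-9]+)\"", input)); // ABC_12
    }
}
```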

If your pattern is strictly any text in between double quotes, then you may be better off using substring:
String str = "\"ABC_12\"";
System.out.println(str.substring(1, str.lastIndexOf('\"')));
Assuming it is a bit more complex (double quotes inside a larger string), you can use the split() method of the Pattern class with \" as your regex. This splits the string around each \", so you can easily extract the content you want:
Pattern p = Pattern.compile("\"");
// Split input with the pattern
String[] result = p.split(str);
for (int i = 0; i < result.length; i++)
    System.out.println(result[i]);
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html#split%28java.lang.CharSequence%29
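If there can be several quoted parts inside a larger string, a Matcher.find() loop with a pattern such as \"([^\"]*)\" collects the text inside each pair of quotes directly, without having to pick the right elements out of a split result. This is a sketch; it assumes quotes are never escaped inside the quoted text, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSubstrings {
    // A double quote, then any run of non-quote characters (captured), then a closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Collects the contents of every "..." pair in the input, left to right.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("id=\"ABC_12\", name=\"x y\"")); // [ABC_12, x y]
    }
}
```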
