replaceFirst for character "`" - java

First time here. I'm trying to write a program that takes a string input from the user and encodes it using the replaceFirst method. All letters and symbols, with the exception of "`" (grave accent), encode and decode properly.
e.g. When I input
`12
I am supposed to get 28AABB as my encryption, but instead, it gives me BB8AA2
public class CryptoString {
public static void main(String[] args) throws IOException, ArrayIndexOutOfBoundsException {
String input = "";
input = JOptionPane.showInputDialog(null, "Enter the string to be encrypted");
JOptionPane.showMessageDialog(null, "The message " + input + " was encrypted to be "+ encrypt(input));
}
public static String encrypt (String s){
String encryptThis = s.toLowerCase();
String encryptThistemp = encryptThis;
int encryptThislength = encryptThis.length();
for (int i = 0; i < encryptThislength ; ++i){
String test = encryptThistemp.substring(i, i + 1);
//Took out all code with regard to all cases OTHER than "`" "1" and "2"
//All other cases would have followed the same format, except with a different string replacement argument.
if (test.equals("`")){
encryptThis = encryptThis.replaceFirst("`" , "28");
}
else if (test.equals("1")){
encryptThis = encryptThis.replaceFirst("1" , "AA");
}
else if (test.equals("2")){
encryptThis = encryptThis.replaceFirst("2" , "BB");
}
}
return encryptThis;
}
}
I've tried putting escape characters in front of the grave accent, however, it is still not encoding it properly.

Take a look at how your program works in each loop iteration:
i=0
encryptThis = '12 (I used ' instead of ` to make this post easier to write)
and now you replace ' with 28, so it becomes 2812
i=1
we read character at position 1 and it is 1 so
we replace 1 with AA making 2812 -> 28AA2
i=2
we read character at position 2, it is 2 so
we replace the first 2 with BB, but the first 2 is now the one in the leading 28, making 28AA2 -> BB8AA2
Try maybe using appendReplacement from Matcher class from java.util.regex package like
public static String encrypt(String s) {
Map<String, String> replacementMap = new HashMap<>();
replacementMap.put("`", "28");
replacementMap.put("1", "AA");
replacementMap.put("2", "BB");
Pattern p = Pattern.compile("[`12]"); //regex that will match ` or 1 or 2
Matcher m = p.matcher(s);
StringBuffer sb = new StringBuffer();
while (m.find()){//we found one of `, 1, 2
m.appendReplacement(sb, replacementMap.get(m.group()));
}
m.appendTail(sb);
return sb.toString();
}
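Since each input character maps to a fixed replacement, another option is to skip regex entirely and build the output one character at a time, so that text already written is never scanned again. A minimal sketch, assuming the same three mappings as above; the class name and the rule that unmapped characters pass through unchanged are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class CharByCharEncoder {
    // Only the three mappings shown in the question; the real table would be larger.
    private static final Map<Character, String> CODES = new HashMap<>();
    static {
        CODES.put('`', "28");
        CODES.put('1', "AA");
        CODES.put('2', "BB");
    }

    static String encrypt(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toLowerCase().toCharArray()) {
            String code = CODES.get(c);
            // Assumption: characters without a mapping are copied as-is.
            out.append(code != null ? code : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encrypt("`12")); // prints 28AABB
    }
}
```

Because the replacement for one character is appended to a separate buffer, the 2 produced by "28" can never be mistaken for an input 2.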

encryptThistemp.substring(i, i + 1); Note that in Java the second parameter of substring is the exclusive end index, not a length. Increasing i is therefore fine: test stays exactly one character long on every iteration, so this call is not what throws off your cases.
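As a quick check of how substring behaves: in Java, String.substring(beginIndex, endIndex) takes an exclusive end index rather than a length, so substring(i, i + 1) yields one character for every valid i. A small sketch (the class and helper names are made up for illustration):

```java
class SubstringDemo {
    // Returns the single character at position i as a String.
    static String oneCharAt(String s, int i) {
        return s.substring(i, i + 1); // second argument is an end index, not a length
    }

    public static void main(String[] args) {
        String input = "`12";
        for (int i = 0; i < input.length(); i++) {
            System.out.println(i + " -> " + oneCharAt(input, i));
        }
    }
}
```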

Related

Length of String within tags in java

We need to find the length of the tag names within the tags in java
{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}
so the length of Student tag is 7 and that of subject tag is 7 and that of marks is 5.
I am trying to split the tags and then find the length of each string within the tag.
But the code I am trying gives me only the first tag name and not others.
Can you please help me on this?
I am very new to java. Please let me know if this is a very silly question.
Code part:
System.out.println(
getParenthesesContent("{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}"));
public static String getParenthesesContent(String str) {
return str.substring(str.indexOf('{')+1,str.indexOf('}'));
}
You can use Patterns with this regex \\{([a-zA-Z]*)\\} :
String text = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
Matcher matcher = Pattern.compile("\\{([a-zA-Z]*)\\}").matcher(text);
while (matcher.find()) {
System.out.println(
String.format(
"tag name = %s, Length = %d ",
matcher.group(1),
matcher.group(1).length()
)
);
}
Outputs
tag name = Student, Length = 7
tag name = Subject, Length = 7
tag name = Marks, Length = 5
You might want to give a try to another regex:
String s = "{Abc}{Defg}100{Hij}100{/Klmopr}{/Stuvw}"; // just a sample String
Pattern p = Pattern.compile("\\{\\W*(\\w++)\\W*\\}");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1) + ", length: " + m.group(1).length());
}
Output you get:
Abc, length: 3
Defg, length: 4
Hij, length: 3
Klmopr, length: 6
Stuvw, length: 5
If you need to use charAt() to walk over the input String, you might want to consider using something like this (I made some explanations in the comments to the code):
String s = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
ArrayList<String> tags = new ArrayList<>();
for(int i = 0; i < s.length(); i++) {
StringBuilder sb = new StringBuilder(); // Use StringBuilder and its append() method to build the tag name (it's more efficient than "+=")
if(s.charAt(i) == '{') { // If start of tag is found...
while(!(Character.isLetter(s.charAt(i)))) { // Skip characters that are not letters
i++;
}
while(Character.isLetter(s.charAt(i))) { // Append String with letters that are found
sb.append(s.charAt(i));
i++;
}
if(!(tags.contains(sb.toString()))) { // Add final String to ArrayList only if it not contained here yet
tags.add(sb.toString());
}
}
}
for(String tag : tags) { // Printing Strings contained in ArrayList and their length
System.out.println(tag + ", length: " + tag.length());
}
Output you get:
Student, length: 7
Subject, length: 7
Marks, length: 5
Yes, use a regular expression: find the pattern that matches a tag and apply it to the whole string.
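One way that could look, as a sketch rather than a definitive answer: match both opening and closing tags with an optional slash, and keep each distinct name once together with its length. The class name, the method name, and the choice of a LinkedHashMap (to preserve first-seen order) are assumptions for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagLengths {
    // "{" then an optional "/" then the tag name (letters only) then "}"
    private static final Pattern TAG = Pattern.compile("\\{/?([a-zA-Z]+)\\}");

    static Map<String, Integer> tagLengths(String text) {
        Map<String, Integer> lengths = new LinkedHashMap<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            // Closing tags repeat names already seen, so only the first hit is stored.
            lengths.putIfAbsent(m.group(1), m.group(1).length());
        }
        return lengths;
    }

    public static void main(String[] args) {
        String text = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
        System.out.println(tagLengths(text)); // prints {Student=7, Subject=7, Marks=5}
    }
}
```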

complex regular expression in Java

I have a rather complex (to me it seems rather complex) problem that I'm using regular expressions in Java for:
I can get any text string that must be of the format:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
I started with a regular expression for extracting the text between the M:/:D:/:C:/:Q: as:
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
And that works fine if the <either a url or string> is just an alphanumeric string. But it all falls apart when the embedded string is a url of the format:
tcp://someurl.something:port
Can anyone help me adjust the above regex to extract the text after :D: whether it is a URL or an alphanumeric string?
Here's an example:
public static void main(String[] args) {
String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
boolean matchFound = false;
ArrayList<String> values = new ArrayList<>();
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
Matcher m3 = Pattern.compile(pattern2).matcher(name);
while (m3.find()) {
matchFound = true;
String m = m3.group(2);
System.out.println("regex found match: " + m);
values.add(m);
}
}
In the above example, the results I want are:
myString1
tcp://someurl.com:8989
myString2
1
Note that the strings can be of variable length and are alphanumeric, but may also contain some other characters (such as the :// and the . or - characters of the URL format).
You mention that the format is constant:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
Capture groups can do this for you with the pattern:
"M:(.*):D:(.*):C:(.*):Q:(.*)"
Or you can do a String.split() with a pattern of "M:|:D:|:C:|:Q:". However, the split will return an empty element at the first index. Everything else will follow.
public static void main(String[] args) throws Exception {
System.out.println("Regex: ");
String data = "M:<some text>:D:tcp://someurl.something:port:C:<some more text>:Q:<a number>";
Matcher matcher = Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)").matcher(data);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
System.out.println();
System.out.println("String.split(): ");
String[] pieces = data.split("M:|:D:|:C:|:Q:");
for (String piece : pieces) {
System.out.println(piece);
}
}
Results:
Regex:
<some text>
tcp://someurl.something:port
<some more text>
<a number>
String.split():
<some text>
tcp://someurl.something:port
<some more text>
<a number>
To extract the URL/text part you don't need the regular expression. Use
int startPos = input.indexOf(":D:")+":D:".length();
int endPos = input.indexOf(":C:", startPos);
String urlOrText = input.substring(startPos, endPos);
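The same indexOf idea extends to all four fields. A minimal sketch, assuming the markers always appear in the order M:, :D:, :C:, :Q: and never occur inside the field values; the class and the helper name between are hypothetical:

```java
class MarkerParser {
    // Returns the text between startMarker and endMarker.
    // A null endMarker means "up to the end of the input".
    static String between(String input, String startMarker, String endMarker) {
        int start = input.indexOf(startMarker);
        if (start < 0) {
            throw new IllegalArgumentException("missing marker " + startMarker);
        }
        start += startMarker.length();
        int end = (endMarker == null) ? input.length() : input.indexOf(endMarker, start);
        if (end < 0) {
            throw new IllegalArgumentException("missing marker " + endMarker);
        }
        return input.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        System.out.println(between(input, "M:", ":D:"));
        System.out.println(between(input, ":D:", ":C:"));
        System.out.println(between(input, ":C:", ":Q:"));
        System.out.println(between(input, ":Q:", null));
    }
}
```

This does no validation of the field contents; it only cuts the string at the markers.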
Assuming you need to do some validation along with the parsing:
break the regex into different parts like this:
String m_regex = "[\\w.]+"; // in Java, a . inside [] is just a plain dot
String url_regex = "."; // placeholder: there are plenty of URL regexes online, pick your favorite
String d_regex = "(?:" + url_regex + "|\\p{Alnum}+)"; // a URL or a sequence of alphanumeric characters
String c_regex = "[\\w.]+"; // I'm assuming you want this to be a bit more restrictive; not sure
String q_regex = "\\d+"; // what sort of number exactly? Assuming any string of digits here
String regex = "M:(?<M>" + m_regex + "):"
+ "D:(?<D>" + d_regex + "):"
+ "C:(?<C>" + c_regex + "):" // each named group needs its own unique name
+ "Q:(?<Q>" + q_regex + ")";
Pattern p = Pattern.compile(regex);
Might be a good idea to keep the pattern as a static field somewhere and compile it in a static block so that the temporary regex strings don't overcrowd some class with basically useless fields.
Then you can retrieve each part by its name:
Matcher m = p.matcher( input );
if (m.matches()) {
String m_part = m.group( "M" );
...
String q_part = m.group( "Q" );
}
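Putting the pieces together with the pattern held in a static field might look like the sketch below. The URL sub-pattern here is a deliberately loose stand-in (scheme://host with an optional :port) and is an assumption, not a full URL validator; the class and method names are also made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessageParser {
    // Assumption: a loose URL shape, scheme://host[:port]. Swap in a stricter one if needed.
    private static final String URL_REGEX = "[a-zA-Z][a-zA-Z0-9+.-]*://[\\w.-]+(?::\\d+)?";

    // Compiled once and reused for every call.
    private static final Pattern MESSAGE = Pattern.compile(
            "M:(?<M>[\\w.]+):"
          + "D:(?<D>(?:" + URL_REGEX + "|\\p{Alnum}+)):"
          + "C:(?<C>[\\w.]+):"
          + "Q:(?<Q>\\d+)");

    // Returns the four parts in order M, D, C, Q, or throws if the input does not match.
    static String[] parse(String input) {
        Matcher m = MESSAGE.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected format: " + input);
        }
        return new String[] { m.group("M"), m.group("D"), m.group("C"), m.group("Q") };
    }

    public static void main(String[] args) {
        for (String part : parse("M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1")) {
            System.out.println(part);
        }
    }
}
```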
You can go even a step further by making a RegexGroup interface whose implementing objects each represent a part of the regex, with a name and the actual regex. You definitely lose simplicity, though, which makes the code harder to understand at a quick glance. (I wouldn't do this; I'm just pointing out that it's possible and has its own benefits.)

String.split by semicolon

I want to split a string by semicolon(";"):
String phrase = "‫;‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid";
String[] dateSplit = phrase.split(";");
System.out.println("dateSplit[0]:" + dateSplit[0]);
System.out.println("dateSplit[1]:" + dateSplit[1]);
But it removes the ";" from the string and puts the whole string into dateSplit[1],
so the output is:
dateSplit[0]:‫
dateSplit[1]:‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid`
and on doing
System.out.println("Real String :"+phrase);
string printed is
Real String :‫;‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid
The phrase contains bidirectional control characters such as right-to-left embedding. That is why some editors fail to display the string correctly.
This piece of code shows the actual characters in the String (for some people the phrase won't display correctly here, but it compiles and looks fine in Eclipse). I simply render left-to-right embedding as ->, right-to-left embedding as <- and pop directional formatting as ^:
public static void main(String[]args) {
String phrase = "‫;‪14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid";
String[] dateSplit = phrase.split(";");
for (String d : dateSplit) {
System.out.println(d);
}
char[] c = phrase.toCharArray();
StringBuilder p = new StringBuilder();
for (int i = 0; i < c.length;i++) {
int code = Character.codePointAt(c, i);
switch (code) {
case 8234:
p.append(" -> ");
break;
case 8235:
p.append(" <- ");
break;
case 8236:
p.append(" ^ ");
break;
default:
p.append(c[i]);
}
}
System.out.println(p.toString());
}
Prints:
<- ; -> 14/May/2015 ^ ^ <- -> FC ^ ^ <- -> Barcelona ^ ^ <- -> VS. ^ ^ <- -> Real ^ ^ <- -> Madrid
The String#split() will work on the actual character string and not on what the editor displays, hence you can see the ; is the second character after a right-to-left, which gives (beware of display again: the ; is not part of the string in dateSplit[1]):
dateSplit[0] = "";
dateSplit[1] = "14/May/2015‬‬ ‫‪FC‬‬ ‫‪Barcelona‬‬ ‫‪VS.‬‬ ‫‪Real‬‬ ‫‪Madrid";
I guess you are processing data from a language that is written and read right-to-left, mixed with football team names that are left-to-right. The solution is most likely to get rid of the directional characters and put the ; in the right place, i.e. as a separator between the tokens.
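Getting rid of the directional characters could be sketched as follows. Which code points to strip is an assumption: this version removes the explicit embeddings, overrides and pop (U+202A to U+202E), the isolates (U+2066 to U+2069) and the two directional marks (U+200E, U+200F). The class and method names are made up for illustration:

```java
class BidiCleaner {
    // Removes explicit bidirectional formatting characters from the text.
    static String stripBidiControls(String s) {
        return s.replaceAll("[\\u202A-\\u202E\\u2066-\\u2069\\u200E\\u200F]", "");
    }

    public static void main(String[] args) {
        // Same structure as the question's phrase, written with escapes so it displays safely.
        String phrase = "\u202B;\u202A14/May/2015\u202C\u202C \u202B\u202AFC\u202C\u202C";
        String clean = stripBidiControls(phrase);
        System.out.println(clean); // prints ;14/May/2015 FC
    }
}
```

After stripping, the ; is still the first visible character, so the input itself still needs the separator moved to where the split should happen.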
I retyped your code instead of copying it from here, and it's working perfectly fine.
public static void main(String[] args) {
String phrase = "14/May/2015; FC Barcelona VS. Real Madrid";
String[] dateSplit = phrase.split(";");
System.out.println("dateSplit[0]:" + dateSplit[0]);
System.out.println("dateSplit[1]:" + dateSplit[1]);
}
Cutting and pasting your code into IntelliJ messed up the editor; as @Palcente said, there are possibly encoding issues.
However, I would recommend using a StringTokenizer instead.
StringTokenizer sTok = new StringTokenizer(phrase, ";");
You can then iterate over it, which leads to nicer (and safer) code.
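Iterating over the tokenizer could look like this sketch. It uses the clean phrase from the retyped example; the class name, the method name, and the trim() call are assumptions for illustration. Note that StringTokenizer silently skips empty tokens, which differs from String.split:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    static List<String> splitOnSemicolon(String phrase) {
        List<String> parts = new ArrayList<>();
        StringTokenizer sTok = new StringTokenizer(phrase, ";");
        while (sTok.hasMoreTokens()) {
            parts.add(sTok.nextToken().trim()); // trim the space left after the ';'
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : splitOnSemicolon("14/May/2015; FC Barcelona VS. Real Madrid")) {
            System.out.println(part);
        }
    }
}
```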

How to remove special characters from input text

I want to remove all special characters from input text, as well as some restricted words.
The things to remove are supplied dynamically: the user decides which words and characters to exclude, which is why I did not hard-code a regex.
In production, restricted_words_list (see my code) will come from the database;
for demonstration purposes, I kept the entries in a static String array to check whether my code works properly.
public class TestKeyword {
private static final String[] restricted_words_list={"#","of","an","^","#","<",">","(",")"};
private static final Pattern restrictedReplacer;
private static Set<String> restrictedWords = null;
static {
StringBuilder strb= new StringBuilder();
for(String str:restricted_words_list){
strb.append("\\b").append(Pattern.quote(str)).append("\\b|");
}
strb.setLength(strb.length()-1);
restrictedReplacer = Pattern.compile(strb.toString(),Pattern.CASE_INSENSITIVE);
strb = new StringBuilder();
}
public static void main(String[] args)
{
String inputText = "abcd abc# cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D";
System.out.println("inputText : " + inputText);
String modifiedText = restrictedWordCheck(inputText);
System.out.println("Modified Text : " + modifiedText);
}
public static String restrictedWordCheck(String input){
Matcher m = restrictedReplacer.matcher(input);
StringBuffer strb = new StringBuffer(input.length());//ensuring capacity
while(m.find()){
if(restrictedWords==null)restrictedWords = new HashSet<String>();
restrictedWords.add(m.group()); //m.group() returns what was matched
m.appendReplacement(strb,""); //this writes out what came in between matching words
for(int i=m.start();i<m.end();i++)
strb.append("");
}
m.appendTail(strb);
return strb.toString();
}
}
The output is :
inputText : abcd abc# cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D
Modified Text : abcd abc# cbda ssef jjj the gg wh&at gggg ss%ss ### (()) DhD
Here the excluded words are of and an, but only some of the special characters, not all that I specified in restricted_words_list
I have since found a better solution:
String inputText = title;// assigning input
List<String> restricted_words_list = catalogueService.getWordStopper(); // getting all stopper words from database dynamically (inside getWordStopper() method just i wrote a query and getting list of words)
String finalResult = "";
List<String> stopperCleanText = new ArrayList<String>();
String[] afterTextSplit = inputText.split("\\s"); // split and add to list
for (int i = 0; i < afterTextSplit.length; i++) {
stopperCleanText.add(afterTextSplit[i]); // adding to list
}
stopperCleanText.removeAll(restricted_words_list); // remove all word stopper
for (String addToString : stopperCleanText)
{
finalResult += addToString+";"; // add semicolon to cleaned text
}
return finalResult;
public String replaceAll(String regex,
String replacement)
Replaces each substring of this string (which matches the given regular expression) with the given replacement.
Parameters:
regex - the regular expression to which this string is to be
matched
replacement - the string to be substituted for each match.
So you just need to provide replacement parameter with an empty String.
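With a dynamically supplied list, the regex for replaceAll has to be built at run time. A minimal sketch, assuming that entries made only of word characters should match as whole words while symbols should be removed wherever they occur; the class name, method name, and the final whitespace clean-up are assumptions for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RestrictedTextCleaner {
    static String clean(String input, List<String> restricted) {
        // Words get \b on both sides; symbols are quoted and matched anywhere.
        String regex = restricted.stream()
                .map(w -> w.matches("\\w+")
                        ? "\\b" + Pattern.quote(w) + "\\b"
                        : Pattern.quote(w))
                .collect(Collectors.joining("|"));
        String stripped = Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(input)
                .replaceAll(""); // empty replacement removes every match
        // Collapse the gaps left behind by removed words.
        return stripped.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        List<String> restricted = Arrays.asList("#", "of", "an", "^", "<", ">", "(", ")");
        String input = "abcd abc# cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D";
        System.out.println(clean(input, restricted));
    }
}
```

Pattern.quote keeps characters such as ^ and ( from being interpreted as regex syntax.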
You should change your loop
for(String str:restricted_words_list){
strb.append("\\b").append(Pattern.quote(str)).append("\\b|");
}
to this:
for(String str:restricted_words_list){
strb.append("\\b*").append(Pattern.quote(str)).append("\\b*|");
}
Because \\b only matches at a word boundary, i.e. between a word character and a non-word character, your loop matches a symbol from restricted_words_list only when it has a letter or digit on both sides. Since abc# has a space after the #, it is not replaced. If you add * (which means 0 or more occurrences) to the \\b on either side, the boundary becomes optional and things like abc# match as well. Be aware that this also makes the boundary optional for of and an, so they can then match inside longer words.
You might consider using a regex directly to replace those special characters with an empty string ''. See: Java; String replace (using regular expressions)?, and a tutorial here: http://www.vogella.com/articles/JavaRegularExpressions/article.html
You can also do like this :
String inputText = "abcd abc# cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D";
String regx="([^a-z^ ^0-9]*\\^*)";
String textWithoutSpecialChar=inputText.replaceAll(regx,"");
System.out.println("Without Special Char:"+textWithoutSpecialChar);
String yourSetofString="of|an"; // your restricted words.
String op=textWithoutSpecialChar.replaceAll(yourSetofString,"");
System.out.println("output : "+op);
Output:
Without Special Char:abcd abc cbda ssef of jjj the gg an what gggg ssss h
output : abcd abc cbda ssef jjj the gg what gggg ssss h
String s = "abcd abc# cbda ssef of jjj t#he g^g an wh&at ggg (blah) and | then";
String[] words = new String[]{ " of ", "|", "(", " an ", "#", "#", "&", "^", ")" };
StringBuilder sb = new StringBuilder();
for( String w : words ) {
if( w.length() == 1 ) {
sb.append( "\\" );
}
sb.append( w ).append( "|" );
}
System.out.println( s.replaceAll( sb.toString(), "" ) );

Tokenize a string with a space in java

I want to tokenize a string like this
String line = "a=b c='123 456' d=777 e='uij yyy'";
I cannot simply split on spaces like this, because the quoted values contain spaces:
String [] words = line.split(" ");
Any idea how can I split so that I get tokens like
a=b
c='123 456'
d=777
e='uij yyy'
The simplest way to do this is by hand implementing a simple finite state machine. In other words, process the string a character at a time:
When you hit a space, break off a token;
When you hit a quote keep getting characters until you hit another quote.
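The two rules above can be sketched as a tiny state machine with a single flag that tracks whether we are inside a quote. This assumes single quotes only, no escape character, and a plain space as the separator; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareTokenizer {
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false; // the only state: inside or outside a quoted value
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote;   // entering or leaving a quoted value
                current.append(c);    // keep the quote as part of the token
            } else if (c == ' ' && !inQuote) {
                if (current.length() > 0) { // a space outside quotes ends the token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // last token has no trailing space
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("a=b c='123 456' d=777 e='uij yyy'")) {
            System.out.println(token);
        }
    }
}
```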
Depending on the formatting of your original string, you should be able to pass a regular expression as the parameter to the Java "split" method.
Note that the linked example doesn't use the regular expression that you would need for this task, though.
You can also use this SO thread as a guideline (although it's in PHP) which does something very close to what you need. Manipulating that slightly might do the trick (although having quotes be part of the output or not may cause some issues). Keep in mind that regex is very similar in most languages.
Edit: going too much further into this type of task may be ahead of the capabilities of regex, so you may need to create a simple parser.
line.split(" (?=[a-z+]=)")
correctly gives:
a=b
c='123 456'
d=777
e='uij yyy'
Make sure you adapt the [a-z+] part in case your keys structure changes.
Edit: this solution can fail miserably if there is a "=" character in the value part of the pair.
StreamTokenizer can help, although it is easiest to set up to break on '=', as it will always break at the start of a quoted string:
String s = "Ta=b c='123 456' d=777 e='uij yyy'";
StreamTokenizer st = new StreamTokenizer(new StringReader(s));
st.ordinaryChars('0', '9');
st.wordChars('0', '9');
while (st.nextToken() != StreamTokenizer.TT_EOF) {
switch (st.ttype) {
case StreamTokenizer.TT_NUMBER:
System.out.println(st.nval);
break;
case StreamTokenizer.TT_WORD:
System.out.println(st.sval);
break;
case '=':
System.out.println("=");
break;
default:
System.out.println(st.sval);
}
}
outputs
Ta
=
b
c
=
123 456
d
=
777
e
=
uij yyy
If you leave out the two lines that convert numeric characters to alpha, then you get d=777.0, which might be useful to you.
Assumptions:
Your variable name ('a' in the assignment 'a=b') can be of length 1 or more
Your variable name ('a' in the assignment 'a=b') can not contain the space character, anything else is fine.
Validation of your input is not required (input assumed to be in valid a=b format)
This works fine for me.
Input:
a=b abc='123 456' &=777 #='uij yyy' ABC='slk slk' 123sdkljhSDFjflsakd#*#&=456sldSLKD)#(
Output:
a=b
abc='123 456'
&=777
#='uij yyy'
ABC='slk slk'
123sdkljhSDFjflsakd#*#&=456sldSLKD)#(
Code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
// SPACE CHARACTER followed by
// sequence of non-space characters of 1 or more followed by
// first occurring EQUALS CHARACTER
final static String regex = " [^ ]+?=";
// static pattern defined outside so that you don't have to compile it
// for each method call
static final Pattern p = Pattern.compile(regex);
public static List<String> tokenize(String input, Pattern p){
input = input.trim(); // this is important for "last token case"
// see end of method
Matcher m = p.matcher(input);
ArrayList<String> tokens = new ArrayList<String>();
int beginIndex=0;
while(m.find()){
int endIndex = m.start();
tokens.add(input.substring(beginIndex, endIndex));
beginIndex = endIndex+1;
}
// LAST TOKEN CASE
//add last token
tokens.add(input.substring(beginIndex));
return tokens;
}
private static void println(List<String> tokens) {
for(String token:tokens){
System.out.println(token);
}
}
public static void main(String args[]){
String test = "a=b " +
"abc='123 456' " +
"&=777 " +
"#='uij yyy' " +
"ABC='slk slk' " +
"123sdkljhSDFjflsakd#*#&=456sldSLKD)#(";
List<String> tokens = RegexTest.tokenize(test, p);
println(tokens);
}
}
Or, with a regex for tokenizing, and a little state machine that just adds the key/val to a map:
String line = "a = b c='123 456' d=777 e = 'uij yyy'";
Map<String,String> keyval = new HashMap<String,String>();
String state = "key";
Matcher m = Pattern.compile("(=|'[^']*?'|[^\\s=]+)").matcher(line);
String key = null;
while (m.find()) {
String found = m.group();
if (state.equals("key")) {
if (found.equals("=") || found.startsWith("'"))
{ System.err.println ("ERROR"); }
else { key = found; state = "equals"; }
} else if (state.equals("equals")) {
if (! found.equals("=")) { System.err.println ("ERROR"); }
else { state = "value"; }
} else if (state.equals("value")) {
if (key == null) { System.err.println ("ERROR"); }
else {
if (found.startsWith("'"))
found = found.substring(1,found.length()-1);
keyval.put (key, found);
key = null;
state = "key";
}
}
}
if (! state.equals("key")) { System.err.println ("ERROR"); }
System.out.println ("map: " + keyval);
prints out
map: {d=777, e=uij yyy, c=123 456, a=b}
It does some basic error checking, and takes the quotes off the values.
This solution is both general and compact (it is effectively the regex version of cletus' answer):
String line = "a=b c='123 456' d=777 e='uij yyy'";
Matcher m = Pattern.compile("('[^']*?'|\\S)+").matcher(line);
while (m.find()) {
System.out.println(m.group()); // or whatever you want to do
}
In other words, find all runs of characters that are combinations of quoted strings or non-space characters; nested quotes are not supported (there is no escape character).
public static void main(String[] args) {
String token;
String value="";
HashMap<String, String> attributes = new HashMap<String, String>();
String line = "a=b c='123 456' d=777 e='uij yyy'";
StringTokenizer tokenizer = new StringTokenizer(line," ");
while(tokenizer.hasMoreTokens()){
token = tokenizer.nextToken();
value = token.contains("'") ? value + " " + token : token ;
if(!value.contains("'") || value.endsWith("'")) {
//Split the strings and get variables into hashmap
attributes.put(value.split("=")[0].trim(),value.split("=")[1]);
value ="";
}
}
System.out.println(attributes);
}
output:
{d=777, a=b, e='uij yyy', c='123 456'}
In this case, consecutive spaces inside a value are collapsed to a single space.
Here the attributes HashMap contains the values.
import java.io.*;
import java.util.Scanner;
public class ScanXan {
public static void main(String[] args) throws IOException {
Scanner s = null;
try {
s = new Scanner(new BufferedReader(new FileReader("<file name>")));
while (s.hasNext()) {
System.out.println(s.next());
// <write for output file>
}
} finally {
if (s != null) {
s.close();
}
}
}
}
java.util.StringTokenizer tokenizer = new java.util.StringTokenizer(line, " ");
while (tokenizer.hasMoreTokens()) {
String token = tokenizer.nextToken();
int index = token.indexOf('=');
String key = token.substring(0, index);
String value = token.substring(index + 1);
}
Have you tried splitting by '=' and creating a token out of each pair of the resulting array?
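One way that idea might work, as a rough sketch: after splitting on '=', every middle piece holds the previous value followed by the next key, separated by the last space. This assumes keys contain no spaces and values contain no '=' character; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class EqualsSplitter {
    static List<String> tokenize(String line) {
        String[] pieces = line.split("=");
        List<String> tokens = new ArrayList<>();
        String key = pieces[0].trim();
        for (int i = 1; i < pieces.length; i++) {
            String piece = pieces[i];
            String value = piece;
            String nextKey = null;
            if (i < pieces.length - 1) {
                // Middle piece: "<value> <nextKey>", split at the last space.
                int lastSpace = piece.lastIndexOf(' ');
                if (lastSpace < 0) {
                    throw new IllegalArgumentException("no key before '=' in: " + piece);
                }
                value = piece.substring(0, lastSpace);
                nextKey = piece.substring(lastSpace + 1);
            }
            tokens.add(key + "=" + value.trim());
            key = nextKey;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("a=b c='123 456' d=777 e='uij yyy'")) {
            System.out.println(token);
        }
    }
}
```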
