How to insert a comma at a specific index in a string using Java?

I have a line that contains several usernames run together, and I want to separate the usernames with commas. Each username consists of a first name and a last name with an "&" symbol between them.
Within a username, the first name and the last name always have the same length.
My plan is to count the letters of the first name (everything before the "&"), take the same number of letters after the "&" as the last name, and insert the comma right after that.
For example, the input line looks like this:
ramy&hanyfrank&gerry
The output should be like this:
ramy&hany,frank&gerry
======
I tried to implement it as follows, but it doesn't work:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class one {
public static void main(String[] args) {
// String inputText = "ramy&hanyfrank&gerry";
StringBuffer inputText2 = new StringBuffer("ramy&hanyfrank&gerry");
int usernameLength = 0;
Pattern pattern = Pattern.compile("&");
Matcher matcher = pattern.matcher(inputText2);
boolean found = false;
while (matcher.find()) {
System.out.println("I found " + matcher.group() + " starting at index " +
matcher.start() + " and ending at index " + matcher.end());
usernameLength = matcher.start() + matcher.end();
System.out.println(usernameLength);
inputText2.insert(usernameLength, ",");
found = true;
}
System.out.println(inputText2);
if (!found) {
System.out.println("No match found.");
}
}
}

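Your code throws a StringIndexOutOfBoundsException because matcher.start() + matcher.end() is not the right insert position. It only happens to work for the first match (4 + 5 = 9); on the second match it is 15 + 16 = 31, which is past the end of the 21-character buffer. The comma belongs at matcher.end() plus the length of the first name, and the length of the first name is matcher.start() minus the index where the current username begins (the position just after the last inserted comma).
You can try the code below.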
StringBuffer inputText2 = new StringBuffer("ramy&hanyfrank&gerry");
int usernameLength = 0;
Pattern pattern = Pattern.compile("&");
Matcher matcher = pattern.matcher(inputText2);
boolean found = false;
int a = 0; // index where the current username starts (just after the last inserted comma)
while (matcher.find()) {
System.out.println("I found " + matcher.group() + " starting at index " +
matcher.start() + " and ending at index " + matcher.end());
usernameLength = matcher.start()-a+matcher.end();
System.out.println(usernameLength);
if (usernameLength<inputText2.length())
inputText2.insert(usernameLength, ",");
a= usernameLength+1;
System.out.println(inputText2);
found = true;
}
System.out.println(inputText2);
if (!found) {
System.out.println("No match found.");
}
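Inserting into the StringBuffer while a Matcher is still scanning it works here, but it is easy to get wrong. As an alternative, here is a minimal sketch that builds a new string instead. It assumes, as the question states, that the first and last name of each username have the same length; the class and method names (CommaInserter, insertCommas) are made up for this example.

```java
class CommaInserter {
    // Splits run-together usernames of the form first&last, where first and
    // last have the same length, and joins them with commas.
    static String insertCommas(String input) {
        StringBuilder out = new StringBuilder();
        int pos = 0; // start of the current username
        while (pos < input.length()) {
            int amp = input.indexOf('&', pos);
            if (amp < 0) {
                // No further '&': copy whatever is left unchanged.
                out.append(input.substring(pos));
                break;
            }
            int firstLen = amp - pos; // length of the first name
            // The last name has the same length; clamp in case the input is cut short.
            int end = Math.min(amp + 1 + firstLen, input.length());
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(input, pos, end);
            pos = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertCommas("ramy&hanyfrank&gerry")); // ramy&hany,frank&gerry
    }
}
```

Because nothing is inserted into the text being scanned, no index correction is needed.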


Is there a way to identify tokens in a string while also going by longest substring?

I'm trying to figure out how to properly identify tokens from an input file and return what type it is supposed to be while using a delimiter for white-spaces and new lines.
The four types that the lexer is supposed to identify are:
Identifiers = ([a-z] | [A-Z])([a-z] | [A-Z] | [0-9])*
Numbers = [0-9]+
Punctuation = \+ | \- | \* | / | \( | \) | := | ;
Keywords = if | then | else | endif | while | do | endwhile | skip
For example, if the file has a line that says:
tcu else i34 2983 ( + +eqdQ
it should tokenize and print out:
identifier: tcu
keyword: else
identifier: i34
number: 2983
punctuation: (
punctuation: +
punctuation: +
identifier: eqdQ
I can't figure out how to get the lexer to go by longest substring for a case in which two different types are right beside each other.
This is what I have for an attempt:
//start
public static void main(String[] args) throws IOException {
//input file//
File file = new File("input.txt");
//output file//
FileWriter writer = new FileWriter("output.txt");
//instance variables
String sortedOutput = "";
String current = "";
Scanner scan = new Scanner(file);
String delimiter = "\\s+ | \\s*| \\s |\\n|$ |\\b\\B|\\r|\\B\\b|\\t";
String[] analyze;
BufferedReader read = new BufferedReader(new FileReader(file));
//lines get read here from the .txt file
while(scan.hasNextLine()){
sortedOutput = sortedOutput.concat(scan.nextLine() + System.lineSeparator());
}
//lines are tokenized here
analyze = sortedOutput.split(delimiter);
//first line is printed here through a separate reader
current = read.readLine();
System.out.println("Current Line: " + current + System.lineSeparator());
writer.write("Current Line: " + current + System.lineSeparator() +"\n");
//string matching starts here
for(String a: analyze)
{
//matches identifiers if it doesn't match with a keyword
if(a.matches(patternAlpha))
{
if(a.matches(one))
{
System.out.println("Keyword: " + a);
writer.write("Keyword: "+ a + System.lineSeparator());
}
else if(a.matches(two))
{
System.out.println("Keyword: " + a);
writer.write("Keyword: "+ a + System.lineSeparator());
}
else if(a.matches(three))
{
System.out.println("Keyword: " + a);
writer.write("Keyword: "+ a + System.lineSeparator());
}
else if(a.matches(four))
{
System.out.println("Keyword: " + a);
writer.write("Keyword: "+ a + System.lineSeparator());
}
else if(a.matches(five))
{
System.out.println("Keyword: " + a);
writer.write("Keyword: "+ a + System.lineSeparator());
}
else if(a.matches(six))
{
System.out.println("Keyword: " + a);
writer.write("Keyword: "+ a + System.lineSeparator());
}
else if(a.matches(seven))
{
System.out.println("Keyword: " + a);
writer.write("Keyword: "+ a + System.lineSeparator());
}
else if(a.matches(eight))
{
System.out.println("Keyword: " + a);
writer.write("Keyword: "+ a + System.lineSeparator());
}
else
{
System.out.println("Identifier: " + a);
writer.write("Identifier: "+ a + System.lineSeparator());
}
}
//number check
else if(a.matches(patternNumber))
{
System.out.println("Number: " + a);
writer.write("Number: "+ a + System.lineSeparator());
}
//punctuation check
else if(a.matches(patternPunctuation))
{
System.out.println("Punctuation: " + a);
writer.write("Punctuation: "+ a + System.lineSeparator());
}
//this special case here updates the current line with the next line
else if(a.matches(nihil))
{
System.out.println();
current = read.readLine();
System.out.println("\nCurrent Line: " + current + System.lineSeparator());
writer.write("\nCurrent Line: " + current + System.lineSeparator() + "\n");
}
//everything not listed in regex is read as an error
else
{
System.out.println("Error reading: " + a);
writer.write("Error reading: "+ a + System.lineSeparator());
}
}
//everything closes here to avoid errors
scan.close();
read.close();
writer.close();
}
}
I would greatly appreciate any advice. Thank you in advance.
This can definitely be done without a parser, since the tokens that are input to a parser can almost always be defined by a regular language (the Unix tools Lex and Flex have been doing this for years; see Flex (lexical analyser generator)). I did not want to take the time to hand-translate some Python code I had that did this very thing into Java, but I took a few minutes to modify it for your example. I did make a few changes that I think are appropriate. As input to a parser, you would typically want to treat the (, ), and ; characters as distinct tokens. You would also want to treat each reserved word as a distinct token class rather than lumping them together as KEYWORDS (or the singular KEYWORD, as I have done).
Methodology
1. Define your tokens using regular expressions with named capture groups. Make sure you have one for whitespace and one for comments (if your language defines comments).
2. Include an ERROR token that matches any single character (using '.' for the regex) to ensure that find() always returns a match until the input is exhausted. This ERROR regex must be the last alternative, and if it is matched, it represents an unrecognizable token.
3. Place these in a list, making sure that the regular expression(s) for all of your reserved words precede your regular expression for an identifier.
4. Create a single regular expression from the list in Step 3 by "joining" its items with the "|" operator.
5. Search for the next match. If the match found is whitespace or a comment and these tokens have no semantic meaning to the parser, continue matching. If it is the ERROR token, return that to the parser, but do not return successive error tokens. When the input is exhausted, return an end-of-file token.
Quick Java Implementation
This version is structured so that the next method can be called to return a Token object. Also, it is usually more convenient for the token type to be represented by an integer, because it will ultimately be used to index into parse tables:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Lexer {
public static class Token
{
public int tokenNumber;
public String tokenValue;
public Token(int tokenNumber, String tokenValue)
{
this.tokenNumber = tokenNumber;
this.tokenValue = tokenValue;
}
}
public static int WHITESPACE = 1; // group 1
public static int PUNCTUATION = 2; // group 2 etc.
public static int LPAREN = 3;
public static int RPAREN = 4;
public static int KEYWORD = 5;
public static int IDENTIFIER = 6;
public static int NUMBER = 7;
public static int SEMICOLON = 8;
public static int ERROR = 9;
public static int EOF = 10;
Matcher m;
String text;
boolean skipError;
public static void main(String[] args) {
Lexer lexer = new Lexer("tcu else i34 !!!! 2983 ( + +eqdQ!!!!"); // With some error characters "!" thrown in the middle and at the end
for(;;) {
Token token = lexer.next();
System.out.println(token.tokenNumber + ": " + token.tokenValue);
if (token.tokenNumber == EOF)
break;
}
}
public Lexer(String text)
{
String _WHITESPACE = "(\\s+)";
String _PUNCTUATION = "((?:[+*/-]|:=))";
String _LPAREN = "(\\()";
String _RPAREN = "(\\))";
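String _KEYWORD = "((?:if|then|else|endif|while|do|endwhile|skip)\\b)"; // the word boundary keeps a keyword from matching the start of a longer identifier such as "elsewhere"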
String _IDENTIFIER = "([a-zA-Z][0-9a-zA-Z]*)";
String _NUMBER = "([0-9]+)";
String _SEMICOLON = "(;)";
String _ERROR = "(.)"; // must be last and able to capture one character
String regex = String.join("|", _WHITESPACE, _PUNCTUATION, _LPAREN, _RPAREN, _KEYWORD, _IDENTIFIER, _NUMBER, _SEMICOLON, _ERROR);
Pattern p = Pattern.compile(regex);
this.text = text;
m = p.matcher(this.text);
skipError = false;
}
public Token next()
{
Token token = null;
for(;;) {
if (!m.find())
return new Token(EOF, "<EOF>");
for (int tokenNumber = 1; tokenNumber <= 9; tokenNumber++) {
String tokenValue = m.group(tokenNumber);
if (tokenValue != null) {
token = new Token(tokenNumber, tokenValue);
break;
}
}
if (token.tokenNumber == ERROR) {
if (!skipError) {
skipError = true; // we don't want successive errors
return token;
}
}
else {
skipError = false;
if (token.tokenNumber != WHITESPACE)
return token;
}
}
}
}
Prints:
6: tcu
5: else
6: i34
9: !
7: 2983
3: (
2: +
2: +
6: eqdQ
9: !
10: <EOF>
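The methodology mentions named capture groups, while the implementation above uses numbered ones. Here is a minimal sketch of the named-group variant, not a drop-in replacement for the class above. The group names (WS, PUNCT, KEYWORD, IDENT, NUMBER, ERROR) and the class name are my own choices for illustration, and for brevity it lumps all punctuation into one class and reports every error character. It also adds a \b after the keyword alternatives so that an input such as "elsewhere" is reported as one identifier rather than a keyword followed by an identifier, which is the longest-match behaviour the question asks about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupLexer {
    // Checked in this order; KEYWORD precedes IDENT and ERROR is last.
    static final String[] TYPES = {"WS", "PUNCT", "KEYWORD", "IDENT", "NUMBER", "ERROR"};

    static final Pattern TOKENS = Pattern.compile(
            "(?<WS>\\s+)"
            + "|(?<PUNCT>:=|[-+*/();])"
            + "|(?<KEYWORD>(?:if|then|else|endif|while|do|endwhile|skip)\\b)"
            + "|(?<IDENT>[a-zA-Z][0-9a-zA-Z]*)"
            + "|(?<NUMBER>[0-9]+)"
            + "|(?<ERROR>.)");

    // Returns "TYPE: text" for every token except whitespace.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKENS.matcher(text);
        while (m.find()) {
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    if (!type.equals("WS")) {
                        out.add(type + ": " + m.group(type));
                    }
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokenize("tcu else i34 2983 ( + +eqdQ")) {
            System.out.println(token);
        }
    }
}
```

Looking groups up by name means the token list can be reordered without renumbering constants, at the cost of a string lookup per match.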

Check a string contains at least one Unicode letter using regex

I want to validate that my string contains at least one Unicode letter, that is, a character for which Character.isLetter() returns true.
For example, I want:
~!##$%^&*(()_+<>?:"{}|\][;'./-=` : false
~~1_~ : true
~~k_~ : true
~~汉_~ : true
I know I can use a for loop with Character.isLetter(), but I don't want to do that.
This is also different from the linked question, which only checks for letters of the English alphabet; my case is about any Unicode letter.
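You can use the regex "\\p{L}|[0-9]" together with Matcher.find(). \p{L} matches any Unicode letter; the [0-9] alternative is there only because your example expects ~~1_~ to give true, so drop it if digits should not count.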
Usage code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String args[]) {
// String to be scanned to find the pattern.
String line = "~!##$%^&*(()_+<>?:\"{}|\\][;'./-=`";
String pattern = "\\p{L}|[0-9]"; // regex needed
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
System.out.print("String \"" + line + "\" results to ");
if (m.find()) {
System.out.println("TRUE -> Found value: " + m.group(0));
} else {
System.out.println("FALSE");
}
line = "~~1_~";
m = r.matcher(line);
System.out.print("String \"" + line + "\" results to ");
if (m.find()) {
System.out.println("TRUE -> Found value: " + m.group(0));
} else {
System.out.println("FALSE");
}
line = "~~k_~";
m = r.matcher(line);
System.out.print("String \"" + line + "\" results to ");
if (m.find()) {
System.out.println("TRUE -> Found value: " + m.group(0));
} else {
System.out.println("FALSE");
}
line = "~~汉_~";
m = r.matcher(line);
System.out.print("String \"" + line + "\" results to ");
if (m.find()) {
System.out.println("TRUE -> Found value: " + m.group(0));
} else {
System.out.println("FALSE");
}
}
}
Result:
String "~!##$%^&*(()_+<>?:"{}|\][;'./-=`" results to FALSE
String "~~1_~" results to TRUE -> Found value: 1
String "~~k_~" results to TRUE -> Found value: k
String "~~汉_~" results to TRUE -> Found value: 汉
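The four repeated blocks above can be folded into a helper method. This is only a sketch: the class and method names (LetterCheck, hasLetter, hasLetterOrDigit) are made up, and the digit variant exists only to reproduce the ~~1_~ : true case from the question.

```java
import java.util.regex.Pattern;

class LetterCheck {
    // \p{L} is the Unicode "letter" category.
    private static final Pattern LETTER = Pattern.compile("\\p{L}");
    private static final Pattern LETTER_OR_DIGIT = Pattern.compile("[\\p{L}0-9]");

    // True if the string contains at least one Unicode letter.
    static boolean hasLetter(String s) {
        return LETTER.matcher(s).find();
    }

    // True if the string contains at least one Unicode letter or ASCII digit.
    static boolean hasLetterOrDigit(String s) {
        return LETTER_OR_DIGIT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasLetter("~~k_~"));        // true
        System.out.println(hasLetter("~~1_~"));        // false
        System.out.println(hasLetterOrDigit("~~1_~")); // true
    }
}
```

find() is used rather than matches() because the letter may appear anywhere in the string.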

How to cut word in java before and after space

Can you help me with this, please?
I would like to get only the specified WORD from the string below.
String test1 = "This is WORD test";
I did this:
String regex = "\\s*\\bWORD\\b\\s*";
Text= test1.replaceAll(regex, " ");
and I get this: This is test
But what I want is the opposite: I want only the part matching the regex.
Sometimes my string could be:
String test2 = "WORD it is the text";
String test3 = "Text WORD";
But in every case I would like to cut out only the specified word and put it into another string. Thanks.
A simple solution using a regular expression: I only check that the word is either surrounded by spaces, at the beginning of the line with a space after it, or at the end of the line with a space before it.
String regex = "( WORD )|(^WORD )|( WORD$)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(test1);
if (m.find()) {
System.out.println("[" + m.group(0).trim() + "]");
}
EDIT
A possible way to solve this without a regular expression:
String test1 = "This is WORD test";
String wordToFind = "WORD";
String message = "";
// lastIndexOf returns -1 when the word is absent, so substring is never called with a bad index
int k = test1.lastIndexOf(wordToFind);
if (k != -1) {
message = test1.substring(k, k + wordToFind.length());
} else {
message = "The word \"" + wordToFind + "\" was not found in \"" + test1 + "\"";
}
System.out.print(message);
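Both approaches above can also be sketched with word boundaries, which cover the start of the string, the end of the string, and a string that consists of the word alone. The helper below is only an illustration; the names WordExtractor and extract are made up, and Pattern.quote is used so that a word containing regex metacharacters is taken literally.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // Returns the word if it occurs as a whole word in the text, otherwise empty.
    static Optional<String> extract(String text, String word) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("This is WORD test", "WORD"));   // Optional[WORD]
        System.out.println(extract("WORD it is the text", "WORD")); // Optional[WORD]
        System.out.println(extract("Text WORD", "WORD"));           // Optional[WORD]
        System.out.println(extract("PASSWORDS", "WORD"));           // Optional.empty
    }
}
```

Unlike indexOf, the \b anchors reject the word when it is only part of a longer word.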

Regex for finding a pattern in a string

I can have a string like switchaubcsp-loafsyvgvhv, which may contain any of the following patterns: s-loaf, p-loaf, etc.
The requirement in detail:
1st character: any of [p,s,a,r,l],
2nd character: [-], followed by the word [loaf].
In the above example, a search for [p-loaf] using java.util.regex found the text p-loaf starting at index 11 and ending at index 17.
What regular expression finds a first character that is any of [p,s,a,r,l], a second character [-], followed by the word [loaf]?
[psarl]-loaf
If the input must start with this pattern, add ^ to the beginning; if it must end with it, add $ to the end.
import java.util.regex.*;
public class RegexExamples {
public static void main(String[] args) {
RegexTag("devicexyzas-loafcdbdd", "[psarl][-]loaf");
}
public static void RegexTag(String Content, String PatternToMatch){
Pattern pattern;
try {
pattern = Pattern.compile(PatternToMatch);
}
catch (PatternSyntaxException e)
{
System.err.println ("Regex syntax error: " + e.getMessage ());
System.err.println ("Error description: " + e.getDescription ());
System.err.println ("Error index: " + e.getIndex ());
System.err.println ("Erroneous pattern: " + e.getPattern ());
return;
}
Matcher matcher = pattern.matcher(Content);
System.out.println ("Regex = " + PatternToMatch);
System.out.println ("Text = " + Content);
System.out.println ();
while (matcher.find()) {
System.out.println("Found the text \"" + matcher.group()
+ "\" starting at " + matcher.start()
+ " index and ending at index " + matcher.end());
}
}
}
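To make the difference between the plain pattern and the anchored variants concrete, here is a small sketch. The class and method names (LoafPatterns, contains, startsWith, endsWith) are made up for this example; all three use find(), so the anchors alone decide where a match is allowed.

```java
import java.util.regex.Pattern;

class LoafPatterns {
    private static final Pattern ANYWHERE = Pattern.compile("[psarl]-loaf");
    private static final Pattern AT_START = Pattern.compile("^[psarl]-loaf");
    private static final Pattern AT_END = Pattern.compile("[psarl]-loaf$");

    // True if the pattern occurs anywhere in the input.
    static boolean contains(String s) {
        return ANYWHERE.matcher(s).find();
    }

    // True only if the input begins with the pattern.
    static boolean startsWith(String s) {
        return AT_START.matcher(s).find();
    }

    // True only if the input ends with the pattern.
    static boolean endsWith(String s) {
        return AT_END.matcher(s).find();
    }

    public static void main(String[] args) {
        String text = "switchaubcsp-loafsyvgvhv";
        System.out.println(contains(text));   // true
        System.out.println(startsWith(text)); // false
        System.out.println(endsWith(text));   // false
    }
}
```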

Markdown algorithm: string difficulties

I started writing this algorithm:
public static String convert(String str) {
if (str.equals("# "))
return " ";
if (str.matches("#+.+")) {
int n = str.length() - str.replaceFirst("#+", "").length();
return "<h" + n + ">" + str.substring(n) + "</h" + n + ">";
}
return str;
}
}
So when I type ####title, it returns <h4>title</h4>.
My problem is that when I write ####title###title, I would like it to return <h4>title</h4> <h3>title</h3>, but it only returns <h4>title</h4>. What am I doing wrong?
That's because you are using the pattern #+.+.
Since . matches any character in a regex (except line terminators by default), the .+ in that pattern matches everything after the initial run of #'s.
So, for your input ####title###title, the pattern matches as follows:
#+ matches ####
.+ matches title###title
You need to change your regex to (#+[^#]+), and you probably need to use the Pattern and Matcher classes to get the desired output, because you want to match every part of your string against the pattern.
#+[^#]+ matches the first run of #'s and then everything after it except #, so it stops where the next run of #'s starts.
Here's how you can use it:
String str = "####title###title"; // str is the method parameter
if (str.equals("# "))
System.out.println(" ");
Pattern pattern = Pattern.compile("(#+[^#]+)");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
String str1 = matcher.group(1);
int n = str1.length() - str1.replaceFirst("#+", "").length();
System.out.println("<h" + n + ">" + str1.substring(n) + "</h" + n + ">");
}
OUTPUT:
<h4>title</h4>
<h3>title</h3>
You are matching the wrong string; try this pattern instead:
#+[^#]+
And of course you will need to apply it recursively or in a loop.
You are handling only the first occurrence of #+. Try replacing the if with a while and, instead of returning inside the if, append the result to a StringBuilder.
Something like:
String str = "####title###title2";
StringBuilder sb = new StringBuilder();
while (str.matches("#+.+")) {
int n = str.length() - str.replaceFirst("#+", "").length();
str = str.replaceFirst("#+", "");
int y = str.length();
if(str.matches(".+#+.+")) {
y = str.indexOf("#");
sb.append( "<h" + n + ">" + str.substring(0,y) + "</h" + n + ">");
str = str.substring(y, str.length());
} else {
sb.append( "<h" + n + ">" + str.substring(0,y) + "</h" + n + ">");
}
}
System.out.println(sb.toString());
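Since the original convert method returns a String rather than printing, here is a sketch of the Matcher-based approach wrapped up in the same shape. It uses two capture groups so that the heading level is simply the length of the first group. The class name HeadingConverter is made up, joining the headings with a single space follows the output asked for in the question, and the sketch assumes the input starts with # (any text before the first # is dropped).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeadingConverter {
    // Group 1: the run of #'s; group 2: the title text up to the next #.
    private static final Pattern HEADING = Pattern.compile("(#+)([^#]+)");

    static String convert(String str) {
        Matcher m = HEADING.matcher(str);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            int n = m.group(1).length(); // heading level = number of #'s
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("<h").append(n).append('>')
              .append(m.group(2))
              .append("</h").append(n).append('>');
        }
        // No heading found: return the input unchanged.
        return sb.length() == 0 ? str : sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("####title###title")); // <h4>title</h4> <h3>title</h3>
    }
}
```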
