How to tokenize the string with and without delimiter in single split

Assume I have a single string with the following content.
Input:
FTX+AAA+++201707141009UTC'
FTX+BBB+++201707141009UTC'
FTX+CCC+++201707141009UTC?:??'
PISCO US LTS;?:V.D??'
SOUZA?:GB?:GB'
FTX+ZZZ+++201707141009UTC'
Expected Output:
Number of segments: 4
Input:
FTX+AAA+++201707141009UTC'
FTX+CCC+++201707141009UTC?:??'
PISCO US LTS;?:V.D??'
FTX+ZZZ+++201707141009UTC'
Expected Output:
Number of segments: 3
Basically, I want to treat the text as part of the same segment when the delimiter ' is preceded by a question mark. The segment delimiter is '.
How can I tokenize the string and count the segments in Java?
Thanks in advance.

You can use a negative lookbehind in a regex:
String input = "FTX+AAA+++201707141009UTC'\n"
+ " FTX+BBB+++201707141009UTC'\n"
+ " FTX+CCC+++201707141009UTC?:??'\n"
+ " PISCO US LTS;?:V.D??' \n"
+ " SOUZA?:GB?:GB'\n"
+ " FTX+ZZZ+++201707141009UTC'";
String[] tokens = input.split("(?<!\\?)'\\s*");
System.out.println(tokens.length);
4
But, in the second example I would expect two segments, not three...
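To check the lookbehind against both inputs from the question, here is a minimal, self-contained sketch. The class name SegmentCounter and the method countSegments are made up for this illustration; the regex is the one used in the answer above.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration, not part of the original answer.
final class SegmentCounter {
    // An apostrophe that is not directly preceded by '?', plus any trailing whitespace.
    private static final Pattern TERMINATOR = Pattern.compile("(?<!\\?)'\\s*");

    static int countSegments(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // split drops trailing empty strings, so the final terminator
        // does not produce an extra segment.
        return TERMINATOR.split(trimmed).length;
    }

    public static void main(String[] args) {
        String first = "FTX+AAA+++201707141009UTC'\n"
                + "FTX+BBB+++201707141009UTC'\n"
                + "FTX+CCC+++201707141009UTC?:??'\n"
                + "PISCO US LTS;?:V.D??'\n"
                + "SOUZA?:GB?:GB'\n"
                + "FTX+ZZZ+++201707141009UTC'";
        String second = "FTX+AAA+++201707141009UTC'\n"
                + "FTX+CCC+++201707141009UTC?:??'\n"
                + "PISCO US LTS;?:V.D??'\n"
                + "FTX+ZZZ+++201707141009UTC'";
        System.out.println(countSegments(first));  // 4
        System.out.println(countSegments(second)); // 2
    }
}
```

The second result is 2 rather than the 3 the question expects, because the PISCO line also ends in ?' and is therefore merged with the segment that follows it.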

Another alternative to the above - but again demonstrating that the second example you post may be wrong because the third line ends with a ?' which, by your definition should not be a break.
public void test() {
test("FTX+AAA+++201707141009UTC'" +
"FTX+BBB+++201707141009UTC'" +
"FTX+CCC+++201707141009UTC?:??'" +
"PISCO US LTS;?:V.D??'" +
"SOUZA?:GB?:GB'" +
"FTX+ZZZ+++201707141009UTC'");
test("FTX+AAA+++201707141009UTC'" +
"FTX+CCC+++201707141009UTC?:??'" +
"PISCO US LTS;?:V.D??'" +
"FTX+ZZZ+++201707141009UTC'");
}
private void test(String s) {
String[] split = s.split("(?<!\\?)'");
System.out.println(split.length+"->"+Arrays.toString(split));
}
prints
4->[FTX+AAA+++201707141009UTC, FTX+BBB+++201707141009UTC, FTX+CCC+++201707141009UTC?:??'PISCO US LTS;?:V.D??'SOUZA?:GB?:GB, FTX+ZZZ+++201707141009UTC]
2->[FTX+AAA+++201707141009UTC, FTX+CCC+++201707141009UTC?:??'PISCO US LTS;?:V.D??'FTX+ZZZ+++201707141009UTC]

I think what he/she wants is this:
String a = "FTX+AAA+++201707141009UTC'"
+ "FTX+BBB+++201707141009UTC'"
+ "FTX+CCC+++201707141009UTC?:??'"
+ "PISCO US LTS;?:V.D??' "
+ "SOUZA?:GB?:GB'"
+ "FTX+ZZZ+++201707141009UTC'";
String result[] = a.split("'");
List<String> stringList = new ArrayList<String>(Arrays.asList(result));
for (int i = 0; i < stringList.size(); i++) {
if (!stringList.get(i).startsWith("FTX") && i != 0) {
stringList.set(i-1, stringList.get(i-1) + stringList.get(i));
stringList.remove(i);
i--;
}
}
for (int j = 0; j < stringList.size(); j++) {
System.out.println(stringList.get(j));
}
FTX+AAA+++201707141009UTC
FTX+BBB+++201707141009UTC
FTX+CCC+++201707141009UTC?:??PISCO US LTS;?:V.D?? SOUZA?:GB?:GB
FTX+ZZZ+++201707141009UTC
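If the rule really is "a new segment starts only where the next token begins with FTX", the same grouping can be sketched with a single split and a lookahead. This assumes every segment begins with the literal tag FTX, which holds for the sample data but is not stated in the question; TagSplitter and splitSegments are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; assumes every segment starts with the tag "FTX".
final class TagSplitter {
    static List<String> splitSegments(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        // Split on an apostrophe (plus optional whitespace) only when the next
        // segment tag or the end of the input follows.
        return Arrays.asList(trimmed.split("'\\s*(?=FTX|$)"));
    }

    public static void main(String[] args) {
        String first = "FTX+AAA+++201707141009UTC'"
                + "FTX+BBB+++201707141009UTC'"
                + "FTX+CCC+++201707141009UTC?:??'"
                + "PISCO US LTS;?:V.D??' "
                + "SOUZA?:GB?:GB'"
                + "FTX+ZZZ+++201707141009UTC'";
        String second = "FTX+AAA+++201707141009UTC'"
                + "FTX+CCC+++201707141009UTC?:??'"
                + "PISCO US LTS;?:V.D??'"
                + "FTX+ZZZ+++201707141009UTC'";
        System.out.println(splitSegments(first).size());  // 4
        System.out.println(splitSegments(second).size()); // 3
    }
}
```

Unlike the loop above, this keeps the inner apostrophes inside a merged segment instead of dropping them, and it gives the 4 and 3 that the question expects.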

Related

Tough Algorithm - Do not let the same character repeat for n positions

I wasn't able to figure this one out since I don't know how to calculate "inserting" an underscore. I included my attempt at solving this problem.
Given a string, do not let the same character repeat for n positions. If it does repeat, insert an underscore to push
it X positions down. The final output needed is just the total number of characters.
Example 1) Input "QQ",2 becomes "Q__Q", the return value is 4.
Example 2) Input "ABCA",2 becomes "ABCA" (no spaces needed), total characters is 4.
Example 3) Input "DEDEE", 1 becomes "DEDE_E", total chars is 6.
Example 4) Input "JKJK", 2 becomes "JK_JK", total characters is 5 (The toughest example).
import java.lang.Math;
import java.util.HashMap;
import java.util.ArrayList;
public class Spacer {
public static void main (String args[]) {
System.out.println("QQ,2 = " + spacey("QQ", 2) + ", expected 4");
System.out.println("ABCA,2 = " + spacey("ABCA",2) + ", expected 4");
System.out.println("DEDEE,1 = " + spacey("DEDEE", 1) + ", expected 6");
System.out.println("JKJK,2 = " + spacey("JKJK", 2) + ", expected 5");
}
private static int spacey(String word, int spaces) {
// int shift = 0;
HashMap<Character, Integer> hm = new HashMap<>();
for (int i=0; i<word.length(); i++) {
char letter = word.charAt(i);
System.out.println(i + "=" + letter + " last saw " + hm.get(word.charAt(i)));
if (hm.get(letter) == null) {
hm.put(letter, i);
} else {
System.out.println(i + "-" + hm.get(letter) + "<=" + spaces);
if (i - hm.get(word.charAt(i)) <= spaces) {
// System.out.println("add " + (spaces + 1 - (i - hm.get(letter))));
// shift += (spaces + 1) - (i - hm.get(letter));
word = word.substring(0, i) + "_" + word.substring(i);
System.out.println(i + " word=" + word);
}
hm.put(letter, i); // update the hashmap with the last seen again
}
}
return word.length();
}
}

Your question is (mainly) about inserting underscores. A key insight that can help move forward is that the input and output strings are different, so it would be cleaner to treat them as such, using a StringBuilder for example. Additionally, it doesn't hurt at this stage to use temporary variables to capture concepts such as distance between characters. Leveraging these two ideas, you can have more self-explanatory code, for example:
public static String space(String input, int spaces) {
HashMap<Character, Integer> map = new HashMap<>();
StringBuilder result = new StringBuilder();
for( char symbol : input.toCharArray() ) {
int position = result.length();
int lastPosition = map.getOrDefault(symbol, position-spaces-1);
int distance = position - lastPosition -1;
for( int j = 0; j < Math.max( spaces - distance, 0) ; j++ ) {
result.append('_');
}
result.append(symbol);
map.put(symbol, result.length()-1);
}
return result.toString();
}
(and once this is mastered and digested, it's of course possible to in-line the temps)

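The requirement doesn't ask you to display the constructed string, so we only need to do the calculation. The regex (.+)\1 matches any adjacent repetition of one or more characters, and countPattern returns how many times that pattern was found. Note that this only detects back-to-back repeats: for "ABA" with n = 2 the pattern finds nothing, although one underscore is needed ("AB_A"), and for "JKJK" it yields 6 rather than the 5 given in the question.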
public static void main(String[] args) {
System.out.println("QQ,2 = " + spacey("QQ", 2) + ", expected 4");
System.out.println("ABCA,2 = " + spacey("ABCA",2) + ", expected 4");
System.out.println("DEDEE,1 = " + spacey("DEDEE", 1) + ", expected 6");
System.out.println("JKJK,2 = " + spacey("JKJK", 2) + ", expected 6"); // it becomes JK__JK, i.e. 4 + 2 x '_'
}
private static int spacey(String word, int spaces) {
if(spaces<0){
throw new IllegalArgumentException("should be positive value");
}
if(word==null){
return 0;
}
if(spaces==0){
return word.length();
}
final Pattern repeatedCharRegex = Pattern.compile("(.+)\\1");
final int repetitions = countPattern(word, repeatedCharRegex);
return word.length() + repetitions*spaces;
}
public static int countPattern(String references, Pattern referencePattern) {
Matcher matcher = referencePattern.matcher(references);
int count = 0;
while (matcher.find()){
count++;
}
return count;
}

First of all you have an error in one of your test cases. Assuming you want to reproduce the cases in the quoted challenge, you need a 1 as second argument to the call to spacey here:
System.out.println("DEDEE,1 = " + spacey("DEDEE", 1) + ", expected 6");
// ^ ^
The formula to calculate the number of underscores to insert is:
previousindex + n + 1 - i
...where previousindex is the index at which the current letter occurred before, and i is the current index.
You can repeat an underscore with the String.repeat method (available since Java 11). Don't forget to update i afterwards, so it keeps pointing to the currently processed character (which has moved forward).
So your code could work like this:
import java.lang.Math;
import java.util.HashMap;
import java.util.ArrayList;
public class Spacer {
public static void main (String args[]) {
System.out.println("QQ,2 = " + spacey("QQ", 2) + ", expected 4");
System.out.println("ABCA,2 = " + spacey("ABCA",2) + ", expected 4");
System.out.println("DEDEE,1 = " + spacey("DEDEE", 1) + ", expected 6");
System.out.println("JKJK,2 = " + spacey("JKJK", 2) + ", expected 5");
}
private static int spacey(String word, int spaces) {
HashMap<Character, Integer> hm = new HashMap<>();
for (int i=0; i<word.length(); i++) {
char letter = word.charAt(i);
if (hm.get(letter) == null) {
hm.put(letter, i);
} else {
int underscores = hm.get(letter) + spaces + 1 - i;
if (underscores > 0) { // Need to add underscores
word = word.substring(0, i) + "_".repeat(underscores) + word.substring(i);
i += underscores; // update i so it still points to the current character
}
hm.put(letter, i);
}
}
return word.length();
}
}
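Since only the final length is required, the same formula can also be applied without rebuilding the string. The following is a sketch under that assumption; SpacerCount and paddedLength are names invented here, and length plays the role of the index i in the padded word.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical count-only variant of the approach above.
final class SpacerCount {
    static int paddedLength(String word, int n) {
        // Position in the padded output where each character was last written.
        Map<Character, Integer> lastSeen = new HashMap<>();
        int length = 0; // length of the padded output so far
        for (char c : word.toCharArray()) {
            Integer previous = lastSeen.get(c);
            if (previous != null) {
                // previousindex + n + 1 - i, with i being the current output position
                int underscores = previous + n + 1 - length;
                if (underscores > 0) {
                    length += underscores;
                }
            }
            lastSeen.put(c, length); // the character lands at this position
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println(paddedLength("QQ", 2));    // 4
        System.out.println(paddedLength("ABCA", 2));  // 4
        System.out.println(paddedLength("DEDEE", 1)); // 6
        System.out.println(paddedLength("JKJK", 2));  // 5
    }
}
```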

how to delete empty line and rest of the character in java

I want to drop the empty line and everything after it from my string, so that I can parse one particular value out of it.
I only want the value 23243232. After the product price there is an empty line followed by more characters, so I'm using that empty line as a delimiter to get the product price alone. But I'm getting other values along with 23243232. Can someone help me get only 23243232 from this string?
String actualResponse = "--sGEFoZV85Qnkco_QAU5b6B3Tt1OrOOFkArwzoF_yDmmW5DfupJDtuHlh20LL2SAbWZb8a3exzoF_yDmmW5DfupJDtuHlh20LL2SAbWZb8a3exsGEFoZV85Qnkco_QAU5b6B3Tt1OrOOFkArw\r\n"
+ "Product-Discription: form-name; productName=\"iPhone\"\r\n" + "Product-Type: Mobile\r\n"
+ "Product-Price: 23243232\r\n" + "\r\n" + "%dsafdfw32.323efaeed\r\n" + "#$#####";
String productPrice = actualResponse.substring(actualResponse.lastIndexOf("Product-Price:") + 15);
System.out.println("Printing product price ..." + productPrice);
String finalString = productPrice.replaceAll(" .*", "");
This is the output I'm getting:
Printing product price ...23243232
%dsafdfw32.323efaeed
#$#####
But I want only 23243232 - this value alone.

Use a regular expression for more flexibility.
String content = "--sGEFoZV85Qnkco_QAU5b6B3Tt1OrOOFkArwzoF_yDmmW5DfupJDtuHlh20LL2SAbWZb8a3exzoF_yDmmW5DfupJDtuHlh20LL2SAbWZb8a3exsGEFoZV85Qnkco_QAU5b6B3Tt1OrOOFkArw\r\n"
+ "Product-Discription: form-name; productName=\"iPhone\"\r\n" + "Product-Type: Mobile\r\n"
+ "Product-Price: 23243232\r\n" + "\r\n" + "%dsafdfw32.323efaeed\r\n" + "#$#####";
String re1 = "\\bProduct-Price:\\s"; // Word 1
String re2 = "(\\d+)"; // Integer Number 1
Pattern p = Pattern.compile(re1 + re2, Pattern.DOTALL);
Matcher m = p.matcher(content);
while (m.find()) {
for (int i = 0; i <= m.groupCount(); i++) {
System.out.println(String.format("Group=%d | Value=%s",i, m.group(i)));
}
}
It will print out:
Group=0 | Value=Product-Price: 23243232
Group=1 | Value=23243232

This is the first solution that came to mind. It's not the best, but it will solve your problem.
StringBuilder finalString =new StringBuilder();
for (Character c : productPrice.toCharArray()) {
if(Character.isDigit(c)){
finalString.append(c);
}else{
break;
}
}

This is because you are printing the entire sub-string right from index: actualResponse.lastIndexOf("Product-Price:") + 15 to the end of the string.
You need to provide the end index too as a second parameter in substring method.
You need to use this:
int start = actualResponse.lastIndexOf("Product-Price:") + 15;
int end = actualResponse.indexOf("\r\n", start); // The first "\r\n" from the index `start`
String productPrice = actualResponse.substring(start, end);
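The start/end idea can be wrapped in a small helper that avoids the hard-coded 15 and copes with a missing header or a missing line break. This is only a sketch: HeaderValue and headerValue are hypothetical names, and it assumes the header name does not also occur as the tail of another header name.

```java
import java.util.Optional;

// Hypothetical helper built on the substring(start, end) approach above.
final class HeaderValue {
    static Optional<String> headerValue(String response, String name) {
        String key = name + ":";
        int keyAt = response.lastIndexOf(key);
        if (keyAt < 0) {
            return Optional.empty(); // header not present
        }
        int start = keyAt + key.length();
        int end = response.indexOf("\r\n", start);
        if (end < 0) {
            end = response.length(); // header is on the last line
        }
        // trim removes the space after the colon
        return Optional.of(response.substring(start, end).trim());
    }

    public static void main(String[] args) {
        String response = "Product-Type: Mobile\r\n"
                + "Product-Price: 23243232\r\n" + "\r\n"
                + "%dsafdfw32.323efaeed\r\n" + "#$#####";
        System.out.println(headerValue(response, "Product-Price").orElse("not found")); // 23243232
    }
}
```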

This will give you the final answer:
String actualResponse ="--sGEFoZV85Qnkco_QAU5b6B3Tt1OrOOFkArwzoF_yDmmW5DfupJDtuHlh20LL2SAbWZb8a3exzoF_yDmmW5DfupJDtuHlh20LL2SAbWZb8a3exsGEFoZV85Qnkco_QAU5b6B3Tt1OrOOFkArw\r\n"
+ "Product-Discription: form-name; productName=\"iPhone\"\r\n" + "Product-Type: Mobile\r\n"
+ "Product-Price: 23243232\r\n" + "\r\n" + "%dsafdfw32.323efaeed\r\n" + "#$#####";
String productPrice = actualResponse.substring(actualResponse.lastIndexOf("Product-Price:") + 15);
System.out.println("Printing product price..." + productPrice.split("\r\n")[0]);

How to only add something to a string if it doesn't contain it?

I am making a Lipogram program where any words with the banned letter are printed, however, the words are sometimes printed twice. How do I get it to not repeat the words?
Here is my code:
public String allWordsWith(char letter) {
String str = "";
String word = "";
s = s.replace(".", " ");
s = s.replace(",", " ");
s = s.replace("?", " ");
s = s.replace("!", " ");
s = " " + s + " ";
for (int i = 0; i <= s.lastIndexOf(letter); i++) {
if (s.charAt(i) == letter) {
if (str.contains(s.substring(s.lastIndexOf(" ", i), s.lastIndexOf(" ", i) + 1) + '\n') == true) {
} else {
word += s.substring(s.lastIndexOf(" ", i), s.indexOf(" ", i)) + '\n';
str += word;
}
}
}
return str;
}

Important clarification: is the function, when run with the letter "o" on the string "hello hi hello howdy", meant to return "hello hello howdy" or "hello howdy"? That is, if a word appears twice, do you want to print it twice, or only once regardless of repetition?
If only once regardless of repetition, then you should be using a Set to store your data.
However, I think there's a chance you're instead dealing with an issue that when running the function with the letter chosen as "l" on that same string, "hello hi hello howdy", you are getting an output of "hello hello hello hello". Correct?
The issue here is that you are checking every letter and not testing each word. To fix this, I would use:
String[] words = s.split(" ");
to create an array of your words. Test each value in that array to see if it contains the given letter using:
if (words[index].indexOf(letter) >= 0) { // letter is a char; String.contains only accepts a CharSequence
str += " " + words[index];
}
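Putting the two suggestions together (test whole words, and use a Set if each word should appear only once), a sketch could look like this. It takes the text as a parameter instead of the field s used in the question, and the names Lipogram and allWordsWith(String, char) are chosen for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical rewrite for illustration; the text is passed in explicitly.
final class Lipogram {
    static String allWordsWith(String text, char letter) {
        // LinkedHashSet drops duplicates but keeps first-seen order.
        Set<String> found = new LinkedHashSet<>();
        // Split on whitespace and the punctuation the question replaces with spaces.
        for (String word : text.split("[\\s.,?!]+")) {
            if (word.indexOf(letter) >= 0) {
                found.add(word);
            }
        }
        return String.join("\n", found);
    }

    public static void main(String[] args) {
        // Prints "hello" and "howdy" on separate lines.
        System.out.println(allWordsWith("hello hi hello howdy", 'o'));
    }
}
```

If repeated words should be listed every time they occur, an ArrayList in place of the LinkedHashSet would keep the duplicates.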

Swapping the position of elements within an array in java?

Ok, so it's the first time I am posting out here, so bear with me.
I have a name in the format "Smith, Bob I." and I need to switch this string around to read "Bob I. Smith". Any ideas on how to go about doing this?
This is one way that I've tried; while it does get the job done, it looks pretty sloppy.
public static void main(String[] args) {
String s = "Smith, Bob I.", r = "";
String[] names;
for(int i =0; i < s.length(); i++){
if(s.indexOf(',') != -1){
if(s.charAt(i) != ',')
r += s.charAt(i);
}
}
names = r.split(" ");
for(int i = 0; i < names.length; i++){
}
System.out.println(names[1] +" " + names[2] + " " + names[0]);
}

If the name is always <last name>, <firstname>, try this:
String name = "Smith, Bob I.".replaceAll( "(.*),\\s+(.*)", "$2 $1" );
This will collect Smith into group 1 and Bob I. into group 2, which then are accessed as $1 and $2 in the replacement string. Due to the (.*) groups in the expression the entire string matches and will be replaced completely by the replacement, which is just the 2 groups swapped and separated by a space character.

String[] names = "Smith, Bob I.".split("[, ]+");
System.out.println(names[1] + " " + names[2] + " " + names[0]);

final String[] s = "Smith, Bob I.".split(",");
System.out.println(String.format("%s %s", s[1].trim(), s[0]));

String s = "Smith, Bob I.";
String result = s.substring(s.indexOf(" ")).trim() + " "
+ s.substring(0, s.indexOf(","));
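The index-based variants above assume the input always contains a comma and a space. A slightly more defensive sketch, with the hypothetical names NameSwap and firstLast, splits at the first comma only and returns the input unchanged when there is none:

```java
// Hypothetical helper; splits "<last name>, <given names>" at the first comma.
final class NameSwap {
    static String firstLast(String name) {
        int comma = name.indexOf(',');
        if (comma < 0) {
            return name.trim(); // no comma: nothing to swap
        }
        String last = name.substring(0, comma).trim();
        String given = name.substring(comma + 1).trim();
        return given.isEmpty() ? last : given + " " + last;
    }

    public static void main(String[] args) {
        System.out.println(firstLast("Smith, Bob I.")); // Bob I. Smith
        System.out.println(firstLast("Smith, Bob"));    // Bob Smith
    }
}
```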

Java program that acts as an assembler (for a made up language): will not quit while loop

Basically, it is a two pass assembler and I am working on implementing entry points into a given assembly file. The format of the command is as follows:
Prog .ORIG
.ENT some,entry,point
some LD R0,entry
entry LD R1,point
point .FILL #42
.END some
The relevant part is the .ENT line. That is the line the assembler is getting hung up on.
The first pass takes care of handling .ENTs but it will not work for anything more than two arguments (that is, more than one comma). It does work for two operands and less, though. The code for the specific .ENT part is as follows:
String comma = ",";
String entry = "";
String[] tempEntryArray = new String[2];
int indexOfComma = read.indexOf(comma);
int startingIndex = 17;
int numOperands = 1;
while (indexOfComma != -1) {
if ((indexOfComma-startingIndex) == 0) {
return "An operand must precede the comma.";
}
if (numOperands > 4) {
return "The .ENT pseudo-op on line " + lineCounter
+ " has more than 4 operands.";
}
entry = overSubstring(read, startingIndex, indexOfComma);
if (entry.contains(" ")) {
return "The operand \"" + entry + "\" on line "
+ lineCounter + " has a space in it.";
}
if (entry.length() > 6) {
return "The operand \"" + entry + "\" on line "
+ lineCounter + " is longer than 6 characters.";
}
machineTables.externalSymbolTable.put(entry, tempEntryArray);
entry = read.substring(indexOfComma + 1);
startingIndex = indexOfComma + 1;
indexOfComma = entry.indexOf(comma);
if (indexOfComma != -1) {
indexOfComma += (startingIndex - 1);
}
numOperands++;
}
entry = overSubstring(read, startingIndex, read.length());
if (entry.contains(" ")) {
return "The operand \"" + entry + "\" on line "
+ lineCounter + " has a space in it.";
}
if (entry.length() > 6) {
return "The operand \"" + entry + "\" on line "
+ lineCounter + " is longer than 6 characters.";
}
machineTables.externalSymbolTable.put(entry, tempEntryArray);
read is a String containing one line of the input file.
overSubstring is a method that will perform similarly to substring but it will return a whitespace character if it reads a null string.
I am sorry for the huge block of code, and I know the error messages can be done a lot better, but for now I am concerned with this particular code hanging the assembler whenever there are more than two operands (more than one comma).
I would appreciate it very much if someone could help me with this problem.
Thanks.

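I think that you're reading the same indexOfComma value infinitely. In the posted loop, the relative result of entry.indexOf(comma) is shifted by startingIndex - 1 instead of startingIndex, so the recomputed position lands one character before the real comma and the next search finds that same comma again. Instead of all that startingIndex and substring() business, just use String#indexOf(String, int) instead of String#indexOf(String) to properly skip the indices you've already found.
Compute indexOfComma in one consistent way, something like this: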
int indexOfComma = -1;
int numOperands = 1;
while ((indexOfComma = read.indexOf(comma, indexOfComma+1)) != -1) {
// snip...
machineTables.externalSymbolTable.put(entry, tempEntryArray);
numOperands++;
}
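As an alternative to tracking comma indices at all, the operand field can be split in one call and each piece validated afterwards. The sketch below is an assumption-laden illustration: EntOperands and parse are hypothetical names, the method receives only the operand field (the part of the line that starts at column 17 in the question), and it reports errors with exceptions instead of the returned message strings used in the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; mirrors the checks from the question's loop.
final class EntOperands {
    static List<String> parse(String operandField) {
        List<String> operands = new ArrayList<>();
        // Limit -1 keeps trailing empty strings, so "a,b," is reported as an error.
        for (String operand : operandField.split(",", -1)) {
            if (operand.isEmpty()) {
                throw new IllegalArgumentException("Empty operand next to a comma.");
            }
            if (operand.contains(" ")) {
                throw new IllegalArgumentException("The operand \"" + operand + "\" has a space in it.");
            }
            if (operand.length() > 6) {
                throw new IllegalArgumentException("The operand \"" + operand + "\" is longer than 6 characters.");
            }
            operands.add(operand);
        }
        if (operands.size() > 4) {
            throw new IllegalArgumentException("More than 4 operands.");
        }
        return operands;
    }

    public static void main(String[] args) {
        System.out.println(parse("some,entry,point")); // [some, entry, point]
    }
}
```

Each returned operand can then be put into the symbol table in a plain for-each loop, which removes the loop condition that was hanging.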

