String Tokenizer separation

I want to know how to separate the words of a sentence where the delimiter is a ' ' (space), '?' or '.'.
For example:
Input: THIS IS A STRING PROGRAM.IS THIS EASY?YES,IT IS.
Output:
THIS
IS
A
STRING
PROGRAM
IS
THIS
EASY
YES
IT
IS

Refer to the constructors of the StringTokenizer class in Java. One of them accepts a custom set of delimiter characters. The expected output also separates YES and IT, so the comma has to be in that set as well.
Try this:
StringTokenizer tokenizer = new StringTokenizer("THIS IS A STRING PROGRAM.IS THIS EASY?YES,IT IS", " .?,");
while (tokenizer.hasMoreElements()) {
System.out.println(tokenizer.nextElement());
}
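As a fuller sketch of the same idea, the loop can be wrapped in a small helper that collects the tokens into a list. The class and method names (DelimiterTokenizerSketch, tokenize) are made up for illustration, and the comma is included in the delimiter set on the assumption that YES,IT should be split as in the expected output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterTokenizerSketch {
    // Collects every token of the input; each character of delims acts as a delimiter.
    static List<String> tokenize(String input, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delims);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "THIS IS A STRING PROGRAM.IS THIS EASY?YES,IT IS.";
        // Prints one word per line.
        for (String word : tokenize(sentence, " .?,")) {
            System.out.println(word);
        }
    }
}
```

Consecutive delimiters and the trailing '.' do not produce empty tokens, because StringTokenizer skips runs of delimiter characters.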

public static void main(String[] args) {
String str = "THIS IS A STRING PROGRAM.IS THIS EASY?YES,IT IS";
StringTokenizer st = new StringTokenizer(str);
System.out.println("---- Split by space ------");
while (st.hasMoreElements()) {
System.out.println(st.nextElement());
}
System.out.println("---- Split by comma ',' ------");
StringTokenizer st2 = new StringTokenizer(str, ",");
while (st2.hasMoreElements()) {
System.out.println(st2.nextElement());
}
}
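The StringTokenizer documentation describes the class as legacy and recommends String.split or java.util.regex for new code. A minimal sketch of the split-based equivalent, assuming the delimiters are space, '.', '?' and ',' (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

class SplitSketch {
    // Splits on runs of space, '.', '?' or ','; inside a character class '.' and '?' are literals.
    static List<String> words(String sentence) {
        return Arrays.asList(sentence.split("[ .?,]+"));
    }

    public static void main(String[] args) {
        for (String word : words("THIS IS A STRING PROGRAM.IS THIS EASY?YES,IT IS.")) {
            System.out.println(word);
        }
    }
}
```

One difference from StringTokenizer: if the sentence starts with a delimiter, split returns an empty first element, so trim or filter the input if that can happen.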

Related

StringTokeniser not reading data

I've to read data from a file and I've just entered some sample data as a String in StringTokenizer. I can't understand what is wrong with my code here. Can someone please advise?
import java.util.StringTokenizer;
public class rough {
public static void main(String[] args) throws Exception {
StringTokenizer itr = new StringTokenizer("PKE2324 02-12-2020 200"
+ "\nMJD432 19-05-2019 150");
while (itr.hasMoreTokens()){
String line = itr.nextToken().toString();
String[] tokens = line.split(" ");
String[] date = tokens[1].split("-");
String yr = date[2];
String reg = tokens[0];
String regy = new String(reg + " " + yr);
System.out.println(regy);
}
}
}
I want to get the registration number and year as a String. When I run this, I keep getting an ArrayIndexOutOfBoundsException.

This is a runtime error caused by a logic error: the tokenizer is using the wrong delimiter.
Assuming you want to tokenize the big string on the newline character, pass "\n" as the delimiter to StringTokenizer.
Have a look at the corrected code below, which satisfies your use case:
import java.util.StringTokenizer;
public class rough {
public static void main(String[] args) throws Exception {
StringTokenizer itr = new StringTokenizer("PKE2324 02-12-2020 200"
+ "\nMJD432 19-05-2019 150", "\n");
while (itr.hasMoreTokens()){
String line = itr.nextToken().toString();
String[] tokens = line.split(" ");
String[] date = tokens[1].split("-");
String yr = date[2];
String reg = tokens[0];
String regy = new String(reg + " " + yr);
System.out.println(regy);
}
}
}
Output:
PKE2324 2020
MJD432 2019

The exception is caused by these lines:
String[] tokens = line.split(" ");
String[] date = tokens[1].split("-");
The first token is line = "PKE2324", so tokens[] = { "PKE2324" } --> length = 1
tokens[1] --> ArrayIndexOutOfBoundsException
To fix this, pass the delimiter you want in the constructor:
StringTokenizer itr = new StringTokenizer("PKE2324 02-12-2020 200"
+ "\nMJD432 19-05-2019 150", "\n");
The default delimiters are the whitespace characters (space, tab, newline, carriage return and form feed), so without the second argument the string is also split at every space.
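To see the difference between the default delimiters and an explicit "\n", it can help to compare countTokens() for both tokenizers. The sketch below uses the sample data from the question; the class name, the DATA constant and the regAndYear helper are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class RecordTokenizerSketch {
    static final String DATA = "PKE2324 02-12-2020 200" + "\nMJD432 19-05-2019 150";

    // One token per line: only '\n' is treated as a delimiter.
    static List<String> regAndYear(String data) {
        List<String> result = new ArrayList<>();
        StringTokenizer lines = new StringTokenizer(data, "\n");
        while (lines.hasMoreTokens()) {
            String[] fields = lines.nextToken().trim().split("\\s+");
            String[] date = fields[1].split("-");   // dd-MM-yyyy
            result.add(fields[0] + " " + date[2]);  // registration number + year
        }
        return result;
    }

    public static void main(String[] args) {
        // Default delimiters also split on spaces, so every field becomes its own token.
        System.out.println(new StringTokenizer(DATA).countTokens());
        // With "\n" only, each line stays together as one token.
        System.out.println(new StringTokenizer(DATA, "\n").countTokens());
        System.out.println(regAndYear(DATA));
    }
}
```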

How to tokenize brackets?

I have used StringTokenizer as follows and expected it to separate each bracket, but it returned the whole string as one token. How can I tokenize them?
Stack<String> a=new Stack<>();
String S = "{[()()]}";
String temp="";
StringTokenizer str=new StringTokenizer(S);
while (str.hasMoreTokens()){
temp=str.nextToken();
a.push(temp);
}

// str is the input String; list every delimiter character you want in the second argument
StringTokenizer st = new StringTokenizer(str, "#!");
String s = "Hello, i am using Stack Overflow;";
System.out.println("s = " + s);
String delims = " ,;";
StringTokenizer tokens = new StringTokenizer(s, delims);
while(tokens.hasMoreTokens())
System.out.println(tokens.nextToken());

If you try
StringTokenizer st = new StringTokenizer("[[]{}[[]]()]","[]{}()");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
This prints nothing: every character in the string is a delimiter, so no tokens are left once the brackets are stripped. If you instead try:
StringTokenizer st = new StringTokenizer("[[a]{b}[[c]d]()]","[]{}()");
you will get a b c d as the tokens.
If you want to keep the brackets as tokens too, I'd recommend splitting with a lookahead and lookbehind regex:
String z = "[[a]{b}[[c]d]()]";
String regEx = "(?<=[{}()\\[\\]])|(?=[{}()\\[\\]])";
System.out.println(Arrays.toString(z.split(regEx)));
That will print:
[[, [, a, ], {, b, }, [, [, c, ], d, ], (, ), ]]
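StringTokenizer can also return the delimiters themselves: its three-argument constructor takes a returnDelims flag, and when it is true each delimiter character comes back as a token of length one. A minimal sketch of that option for the bracket string from the question (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class BracketTokenizerSketch {
    // Returns each bracket as its own token; any text between brackets is kept as a token too.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // Third argument: returnDelims = true, so delimiters are returned as tokens.
        StringTokenizer st = new StringTokenizer(input, "[]{}()", true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{[()()]}"));
        System.out.println(tokenize("[[a]{b}[[c]d]()]"));
    }
}
```

Each token can then be pushed onto the Stack from the question in place of adding it to the list.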

how to delete up extra line breakers in string

I have got a text like this in my String s (which I have already read from txt.file)
trump;Donald Trump;trump#yahoo.eu
obama;Barack Obama;obama#google.com
bush;George Bush;bush#inbox.com
clinton,Bill Clinton;clinton#mail.com
Then I'm trying to cut off everything besides an e-mail address and print out on console
String f1[] = null;
f1=s.split("(.*?);");
for (int i=0;i<f1.length;i++) {
System.out.print(f1[i]);
}
and I have output like this:
trump#yahoo.eu
obama#google.com
bush#inbox.com
clinton#mail.com
How can I avoid this? I mean, how can I get the output text without the line breaks?

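Try the approach below. I have read your file with a Scanner as well as a BufferedReader, and in both cases I don't get any line breaks, because nextLine() and readLine() return each line without its line terminator. file.txt is the file that contains the text, and the splitting logic remains the same as yours.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;
public class CC {
public static void main(String[] args) throws IOException {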
Scanner scan = new Scanner(new File("file.txt"));
while (scan.hasNext()) {
String f1[] = null;
f1 = scan.nextLine().split("(.*?);");
for (int i = 0; i < f1.length; i++) {
System.out.print(f1[i]);
}
}
scan.close();
BufferedReader br = new BufferedReader(new FileReader(new File("file.txt")));
String str = null;
while ((str = br.readLine()) != null) {
String f1[] = null;
f1 = str.split("(.*?);");
for (int i = 0; i < f1.length; i++) {
System.out.print(f1[i]);
}
}
br.close();
}
}

You can simply strip all line breaks, as shown in the code below:
String f1[] = null;
f1=s.split("(.*?);");
for (int i=0;i<f1.length;i++) {
System.out.print(f1[i].replaceAll("\r", "").replaceAll("\n", ""));
}
This replaces every \r and \n with an empty string.
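If the text is already in one String, another option is to split it into lines first and take whatever follows the last ';' on each line. This is only a sketch under that assumption; the class and method names are hypothetical, and \R is the regex line-break matcher available since Java 8.

```java
class EmailLineSketch {
    // Splits the text on any line break and concatenates the last ';'-separated field of each line.
    static String emailsWithoutLineBreaks(String s) {
        StringBuilder out = new StringBuilder();
        for (String line : s.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // lastIndexOf returns -1 when there is no ';', in which case the whole line is kept.
            out.append(trimmed.substring(trimmed.lastIndexOf(';') + 1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "trump;Donald Trump;trump#yahoo.eu\n"
                + "obama;Barack Obama;obama#google.com\n"
                + "bush;George Bush;bush#inbox.com\n"
                + "clinton,Bill Clinton;clinton#mail.com";
        System.out.println(emailsWithoutLineBreaks(s));
    }
}
```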

Instead of split, you might match an email-like format: use a negated character class [^\\s;]+ to match one or more characters that are neither a semicolon nor whitespace, followed by a # and then the same class again.
final String regex = "[^\\s;]+#[^\\s;]+";
final String string = "trump;Donald Trump;trump#yahoo.eu \n"
+ " obama;Barack Obama;obama#google.com \n"
+ " bush;George Bush;bush#inbox.com \n"
+ " clinton,Bill Clinton;clinton#mail.com";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final List<String> matches = new ArrayList<String>();
while (matcher.find()) {
matches.add(matcher.group());
}
System.out.println(String.join("", matches));

package com.test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String s = "trump;Donald Trump;trump#yahoo.eu "
+ "obama;Barack Obama;obama#google.com "
+ "bush;George Bush;bush#inbox.com "
+ "clinton;Bill Clinton;clinton#mail.com";
String spaceStrings[] = s.split("[\\s,;]+");
String output="";
for(String word:spaceStrings){
if(validate(word)){
output+=word;
}
}
System.out.println(output);
}
public static final Pattern VALID_EMAIL_ADDRESS_REGEX = Pattern.compile(
"^[A-Z0-9._%+-]+#[A-Z0-9.-]+\\.[A-Z]{2,6}$",
Pattern.CASE_INSENSITIVE);
public static boolean validate(String emailStr) {
Matcher matcher = VALID_EMAIL_ADDRESS_REGEX.matcher(emailStr);
return matcher.find();
}
}

Just remove the '\n' that may appear at the start and end of each piece.
Write it this way:
String f1[] = null;
f1=s.split("(.*?);");
for (int i=0;i<f1.length;i++) {
f1[i] = f1[i].replace("\n", "");
System.out.print(f1[i]);
}

Parse hashtags between symbols

I need to parse hashtags from a String (test comment #georgios#gsabanti sefse #afa).
String text = "test comment #georgios#gsabanti sefse #afa";
String[] words = text.split(" ");
List<String> tags = new ArrayList<String>();
for ( final String word : words) {
if (word.substring(0, 1).equals("#")) {
tags.add(word);
}
}
In the end I need an array with the elements "#georgios", "#gsabanti" and "#afa".
But right now #georgios#gsabanti shows up as a single hashtag.
How can I fix it?

+1 for the Regular Expressions:
Matcher matcher = Pattern.compile("(#[^#\\s]*)")
.matcher("test comment #georgios#gsabanti sefse #afa");
List<String> tags = new ArrayList<>();
while (matcher.find()) {
tags.add(matcher.group());
}
System.out.println(tags);

Here is a simple way of doing that
String text = "test comment #georgios#gsabanti sefse #afa";
String patternst = "#[a-zA-Z0-9]*";
Pattern pattern = Pattern.compile(patternst);
Matcher matcher = pattern.matcher(text);
List<String> tags = new ArrayList<String>();
while (matcher.find()) {
tags.add(matcher.group(0));
}
I hope it will work for you :)
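Both patterns above use *, which also matches a lone # with nothing after it and would add it as a tag. A small variant with + avoids that. This is a sketch only; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagSketch {
    // '#' followed by one or more characters that are neither '#' nor whitespace.
    private static final Pattern TAG = Pattern.compile("#[^#\\s]+");

    static List<String> extractTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = TAG.matcher(text);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extractTags("test comment #georgios#gsabanti sefse #afa"));
    }
}
```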

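Use an ArrayList instead of an array:
String text = "test comment #georgios#gsabanti sefse #afa";
List<String> hashTags = new ArrayList<>();
char[] c = text.toCharArray();
for (int i = 0; i < c.length; i++) {
    if (c[i] == '#') {
        // Keep the leading '#', since the expected tags include it
        StringBuilder hash = new StringBuilder("#");
        // Collect characters until the next space, the next '#', or the end of the text
        for (int j = i + 1; j < c.length && c[j] != ' ' && c[j] != '#'; j++) {
            hash.append(c[j]);
        }
        hashTags.add(hash.toString());
    }
}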

String text = "test comment #georgios#gsabanti sefse #afa";
String[] words = text.split("(?=#)|\\s+");
List<String> tags = new ArrayList<String>();
for ( final String word : words) {
if (!word.isEmpty() && word.startsWith("#")) {
tags.add(word);
}
}

You can split your string before each " " or "#" (keeping the delimiters) and then keep only the pieces that start with "#", like below:
public static void main(String[] args){
String text = "test comment #georgios#gsabanti sefse #afa";
String[] tags = Stream.of(text.split("(?=#)|(?= )")).filter(e->e.startsWith("#")).toArray(String[]::new);
System.out.println(Arrays.toString(tags));
}

String reverse using Java's StringBuilder

I am using Java to build a small project.
I want to reverse a String.
If I enter "I am a girl", each word should be printed reversed.
I have already tried StringBuilder.
I also tried writing it with StringBuffer.
But I failed.
It does not print what I want.
What I want:
"I am a girl" -> "I ma a lrig"
How can I do this?
Please help me, thank you!
public String reverse() {
String[] words = str.split("\\s");
StringTokenizer stringTokenizer = new StringTokenizer(str, " ");
for (String string : words) {
System.out.print(string);
}
String a = Arrays.toString(words);
StringBuilder builder = new StringBuilder(a);
System.out.println(words[0]);
for (String st : words){
System.out.print(st);
}
return "";
}

Java 8 code to do this :
public static void main(String[] args) {
String str = "I am a girl";
StringBuilder sb = new StringBuilder();
// split() returns an array of words; reverse each word and append it to sb followed by a space.
Arrays.asList(str.split("\\s+")).stream().forEach(s -> {
sb.append(new StringBuilder(s).reverse() + " ");
});
String reversed = sb.toString().trim(); // remove trailing space
System.out.println(reversed);
}
O/P :
I ma a lrig
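A variant of the same stream idea lets Collectors.joining place the spaces, so there is no trailing space to trim afterwards. This is a sketch, not the only way to do it; the class and method names are made up, and it assumes words are separated by whitespace.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class WordReverseSketch {
    // Reverses the letters of each word while keeping the words in their original order.
    static String reverseEachWord(String sentence) {
        return Arrays.stream(sentence.trim().split("\\s+"))
                .map(word -> new StringBuilder(word).reverse().toString())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(reverseEachWord("I am a girl"));
    }
}
```

Note that runs of whitespace between words are collapsed to a single space in the result.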

If you do not want to use a lambda, you can try this solution too:
String str = "I am a girl";
String finalString = "";
String s[] = str.split(" ");
for (String st : s) {
finalString += new StringBuilder(st).reverse().append(" ").toString();
}
System.out.println(finalString.trim());
