How to add delimiters from the StringTokenizers to a separate string?

I am inputting a string and I want to add the delimiters in that string to a different string, and I was wondering how you would do that. This is the code I have at the moment.
StringTokenizer tokenizer = new StringTokenizer(input, "'.,><-=[]{}+!##$%^&*()~`;/?");
while (tokenizer.hasMoreTokens()){
//add delimiters to string here
}
Any help would be greatly appreciated(:

If you want StringTokenizer to return the delimiters it parses, pass true as the third constructor argument (returnDelims), as shown here:
StringTokenizer tokenizer = new StringTokenizer(input, "'.,><-=[]{}+!##$%^&*()~`;/?", true);
But if you are only looking for the delimiters, I don't think this is the right approach.

I don't think StringTokenizer is good for this task, try
StringBuilder sb = new StringBuilder();
for(char c : input.toCharArray()) {
if ("'.,><-=[]{}+!##$%^&*()~`;/?".indexOf(c) >= 0) {
sb.append(c);
}
}

I'm guessing you want to extract all the delimiters from the string and process them
String allTokens = "'.,><-=[]{}+!##$%^&*()~`;/?";
StringTokenizer tokenizer = new StringTokenizer(input, allTokens, true);
while(tokenizer.hasMoreTokens()) {
String nextToken = tokenizer.nextToken();
if(nextToken.length()==1 && allTokens.contains(nextToken)) {
//this token is a delimiter
//append to string or whatever you want to do with the delimiter
processDelimiter(nextToken);
}
}
Create a processDelimiter method in which you add the delimiter to a different string or perform any action you want.
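A minimal sketch of what such a processDelimiter method might look like, assuming the goal is simply to collect every delimiter into one string. The class name DelimiterCollector and the collect method are hypothetical names introduced here for illustration; the delimiter set is copied from the question.

```java
import java.util.StringTokenizer;

public class DelimiterCollector {
    // Delimiter set copied from the question.
    private static final String DELIMS = "'.,><-=[]{}+!##$%^&*()~`;/?";

    // Accumulates the delimiters seen so far (hypothetical helper state).
    private final StringBuilder found = new StringBuilder();

    // Called once per delimiter token; here it just appends to the buffer.
    private void processDelimiter(String delimiter) {
        found.append(delimiter);
    }

    // Returns all delimiters in the input, in order of appearance.
    public static String collect(String input) {
        DelimiterCollector collector = new DelimiterCollector();
        // The third argument (true) makes the tokenizer return delimiters as tokens.
        StringTokenizer tokenizer = new StringTokenizer(input, DELIMS, true);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (token.length() == 1 && DELIMS.contains(token)) {
                collector.processDelimiter(token);
            }
        }
        return collector.found.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect("adfhkla.asijdf.';.akjsdhfkjsda")); // prints ..';.
    }
}
```

Unlike the de-duplicating loop in the next answer, this keeps repeated delimiters, so it is only a fit if you want every occurrence.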

This also handles repeated delimiters: each delimiter that occurs in the input is recorded only once.
String input = "adfhkla.asijdf.';.akjsdhfkjsda";
String compDelims = "'.,><-=[]{}+!##$%^&*()~`;/?";
String delimsUsed = "";
for (char a : compDelims.toCharArray()) {
if (input.indexOf(a) >= 0 && delimsUsed.indexOf(a) == -1) {
delimsUsed += a;
}
}
System.out.println("The delims used are " + delimsUsed);

Related

Split command on a nextElement

I am making a java servlet and am trying to make it display a preview of 3 different articles. I want it to preview the first sentence of each article, but can't seem to get split to work properly since I am reading the articles in with tokenizer. So I have something like:
while ((s = br.readLine()) != null) {
out.println("<tr>");
StringTokenizer s2 = new StringTokenizer(s, "|");
while (s2.hasMoreElements()) {
if (index == 0) {
out.println("<td class='first'>" + s2.nextElement() + "</td>");
}
out.println("</tr>");
}
index = 0;
}
How do I make s2.nextElement print out only the first sentence instead of the whole article? I imagine I could do split with a delimiter of ".", but can't get the code to work right. Thanks.

Try
s2.nextToken().split("\\.")[0];
to get the first sentence in the paragraph. (nextElement() returns Object, so use nextToken() to get a String you can call split on.)
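One thing to watch for is text that contains no period at all, or where you want to keep the trailing period. Here is a small sketch that handles both; the FirstSentence class and the sample record are hypothetical and only for illustration.

```java
public class FirstSentence {
    // Returns the text up to and including the first period,
    // or the whole text if it contains no period.
    public static String firstSentence(String text) {
        int end = text.indexOf('.');
        return end >= 0 ? text.substring(0, end + 1) : text;
    }

    public static void main(String[] args) {
        // Hypothetical pipe-delimited record, in the spirit of the question.
        String record = "Title|First sentence. Second sentence.|2013-01-01";
        String article = record.split("\\|")[1];
        System.out.println(firstSentence(article)); // prints First sentence.
    }
}
```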

It would be better to use a Scanner:
Scanner scanner = new Scanner(new File("articles.txt"));
while (scanner.hasNextLine()) {
String article = scanner.nextLine();
String[] parts = article.split("\\s*\\|\\s*");
String title = parts[0];
String text = parts[1];
String date = parts[2];
String image = parts[3];
String firstSentence = text.replaceAll("\\..*", ".");
// Output what you like about the article using the extracted parts
}
Scanner.nextLine() reads in the whole line. (Scanner.next() would return only the next whitespace-delimited token, because the default delimiter is whitespace.)
split("\\s*\\|\\s*") splits the line on pipe chars (which have to be escaped because the pipe char has special regex meaning) and the \s* consumes any whitespace that may surround the pipe chars.

What I did was change hasMoreElements() to hasMoreTokens(). I then found the index of the first "." in the token, stored it in an int, and printed the substring up to that index. Here is what my code looked like:
while((s = br.readLine()) != null){
out.println("<tr>");
StringTokenizer s2 = new StringTokenizer(s, "|");
while (s2.hasMoreTokens()){
if (index == 0){
String one = s2.nextToken();
int i = one.indexOf(".");
out.println("<td>"+one.substring(0 , i)+"."+"</td>");
}

How to replace all special characters with another character in java?

I want to replace all 'special characters' with a special character in java
For example 'cash&carry' will become 'cash+carry' and also 'cash$carry' will become 'cash+carry'
I have a sample CSV file, shown below. The CSV headers are 'What' and 'Where'.
What,Where
salon,new+york+metro
pizza,los+angeles+metro
crate&barrel,los+angeles+metro
restaurants,los+angeles+metro
gas+station,los+angeles+metro
persian+restaurant,los+angeles+metro
car+wash,los+angeles+metro
book store,los+angeles+metro
garment,los+angeles+metro
"cash,carry",los+angeles+metro
cash&carry,los+angeles+metro
cash carry,los+angeles+metro
The expected output
What,Where
salon,new+york+metro
pizza,los+angeles+metro
crate+barrel,los+angeles+metro
restaurants,los+angeles+metro
gas+station,los+angeles+metro
persian+restaurant,los+angeles+metro
car+wash,los+angeles+metro
book+store,los+angeles+metro
garment,los+angeles+metro
cash+carry,los+angeles+metro
cash+carry,los+angeles+metro
cash+carry,los+angeles+metro
The sample code is as follows
String csvfile="BidAPI.csv";
try{
// create the 'Array List'
ArrayList<String> What=new ArrayList<String>();
ArrayList<String> Where=new ArrayList<String>();
BufferedReader br=new BufferedReader(new FileReader(csvfile));
StringTokenizer st=null;
String line="";
int linenumber=0;
int columnnumber;
int free=0;
int free1=0;
while((line=br.readLine())!=null){
linenumber++;
columnnumber=0;
st=new StringTokenizer(line,",");
while(st.hasMoreTokens()){
columnnumber++;
String token=st.nextToken();
if("What".equals(token)){
free=columnnumber;
System.out.println("the value of free :"+free);
} else if("Where".equals(token)){
free1=columnnumber;
System.out.println("the value of free1 :"+free1);
}
if(linenumber>1){
if (columnnumber==free){
What.add(token);
} else if(columnnumber==free1){
Where.add(token);
}
}
}
}
// converting the 'What' Array List to array
String[] what=What.toArray(new String[What.size()]);
// converting the 'Where' Array List to array
String[] where = Where.toArray(new String[Where.size()]);
for(int i=0;i<what.length;i++){
String data = what[i].replaceAll("[^A-Za-z0-9\",]| (?!([^\"]*\"){2}[^\"]*$)", "+").replace("\"", "");
System.out.println(data);
System.out.println(where[i]);
String finaldata = data+where[i];
String json = readUrl(desturl);
}
br.close();
}catch(Exception e){
System.out.println("There is an error :"+e);
}
All the special characters, spaces, and double quotes should be replaced or removed as in the desired output.
I am using value.replaceAll("[^A-Za-z0-9 ]", "+"), but it is not working.
Erroneous output:
cash
carry"
Any help is appreciated; I am new to regex.

You need to:
replace all commas within quotes with +
replace non-whitelist characters with + (and you need to add commas to your whitelist)
remove double quotes
Try this:
line = line.replaceAll("[^A-Za-z0-9\",]|,(?!(([^\"]*\"){2})*[^\"]*$)", "+").replace("\"", "");
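To see what that expression does, here is a small self-contained check that runs it against a few rows from the question. The class name CsvPlusReplacer and the normalize method are hypothetical; the regex itself is the one from this answer.

```java
public class CsvPlusReplacer {
    // Regex from the answer: matches any character outside the whitelist
    // (letters, digits, double quote, comma), or a comma that is followed
    // by an odd number of double quotes, i.e. a comma inside quotes.
    private static final String REGEX =
            "[^A-Za-z0-9\",]|,(?!(([^\"]*\"){2})*[^\"]*$)";

    // Replaces every match with '+' and then drops the double quotes.
    public static String normalize(String line) {
        return line.replaceAll(REGEX, "+").replace("\"", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "crate&barrel,los+angeles+metro",
            "book store,los+angeles+metro",
            "\"cash,carry\",los+angeles+metro"
        };
        for (String sample : samples) {
            System.out.println(normalize(sample));
        }
    }
}
```

The negative lookahead is what tells the two kinds of comma apart: a field-separating comma has an even number of quotes after it, so it is left alone.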

I think your regex is pretty close. Add an exception for commas as well, get rid of the space, and you are good.
BufferedReader r = new BufferedReader(new InputStreamReader(System.in));
String line;
while ((line = r.readLine()) != null)
{
String replaced = line.replace("\"", "");
replaced = replaced.replaceAll("[^A-Za-z0-9,]", "+");
System.out.println(replaced);
}
Of course, Strings are immutable in Java. Keep that in mind. replaceAll() returns a new String and does not modify the original instance.

You need to first find each quoted section and replace the , inside it with +. After that you can just use replaceAll("[^A-Za-z0-9,]", "+"), which replaces every character that is neither alphanumeric nor a comma with +. To locate the quoted sections you can use the pattern
Pattern p = Pattern.compile("\"([^\"]*)\"");
together with appendReplacement and appendTail from the Matcher class, which replace each quotation found with its new version.
So in short your code can look something like
Scanner scanner = new Scanner(new File(csvfile));
Pattern p = Pattern.compile("\"([^\"]*)\"");
StringBuffer sb = new StringBuffer();
while(scanner.hasNextLine()){
String line = scanner.nextLine();
Matcher m = p.matcher(line);
while (m.find()){//find quotes
//and replace each quoted section with its content, with `,` replaced by `+`
//BTW group(1) holds the part of the quotation without the `"` marks
//quoteReplacement keeps `$` and `\` in the content from being read as group references
m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1).replace(',', '+')));
}
m.appendTail(sb);//we also need to add the rest of the unmatched data to the buffer
//now we can just normally replace special characters with +
String result = sb.toString().replaceAll("[^A-Za-z0-9,]", "+");
//after the job is done we can use the result, so let's print it
System.out.println(result);
//let's not forget to reset the buffer for the next line
sb.delete(0, sb.length());
}

Answer to the question
String csvfile="BidAPI.csv";
try{
// create the 'Array List'
ArrayList<String> What=new ArrayList<String>();
ArrayList<String> Where=new ArrayList<String>();
BufferedReader br=new BufferedReader(new FileReader(csvfile));
StringTokenizer st=null;
String line="";
int linenumber=0;
int columnnumber;
int free=0;
int free1=0;
while((line=br.readLine())!=null){
line =line.replaceAll("[^A-Za-z0-9\",]|,(?!(([^\"]*\"){2})*[^\"]*$)", "+").replace("\"", "");
linenumber++;
columnnumber=0;
st=new StringTokenizer(line,",");
while(st.hasMoreTokens()){
columnnumber++;
String token=st.nextToken();
if("What".equals(token)){
free=columnnumber;
System.out.println("the value of free :"+free);
} else if("Where".equals(token)){
free1=columnnumber;
System.out.println("the value of free1 :"+free1);
}
if(linenumber>1){
if (columnnumber==free){
What.add(token);
} else if(columnnumber==free1){
Where.add(token);
}
}
}
}
// converting the 'What' Array List to array
String[] what=What.toArray(new String[What.size()]);
// converting the 'Where' Array List to array
String[] where = Where.toArray(new String[Where.size()]);
for(int i=0;i<what.length;i++){
String data = what[i].replaceAll("[^A-Za-z0-9\",]| (?!([^\"]*\"){2}[^\"]*$)", "+").replace("\"", "");
System.out.println(data);
System.out.println(where[i]);
String finaldata = data+where[i];
String json = readUrl(desturl);
}
br.close();
}catch(Exception e){
System.out.println("There is an error :"+e);
}

Tokenize words ignoring hashtags with OpenNLP

I'm trying to tokenize some sentences. For example, the sentences:
String sentence = "The sky is blue. A cat is #blue.";
I use the following code with OpenNLP:
SimpleTokenizer tokenizer = SimpleTokenizer.INSTANCE;
String[] result = tokenizer.tokenize(sentence);
But I want OpenNLP to treat '#' as a letter of a word, so that '#blue' will be a single token.
How can I do this?

You just have to create a new Tokenizer object (implementing the Tokenizer interface).
Tokenizer t = new Tokenizer() {
@Override
public Span[] tokenizePos(String arg0) {
// paste the body of SimpleTokenizer.tokenizePos here
}
@Override
public String[] tokenize(String arg0) {
// paste the body of SimpleTokenizer.tokenize here
}
};
Then copy and paste the SimpleTokenizer code into those two methods,
and treat '#' like the other alphabetic characters:
if (StringUtil.isWhitespace(c)) {
charType = CharacterEnum.WHITESPACE;
} else if (Character.isLetter(c) || c=='#') {
charType = CharacterEnum.ALPHABETIC;
} else if (Character.isDigit(c)) {
charType = CharacterEnum.NUMERIC;
} else {
charType = CharacterEnum.OTHER;
}
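To illustrate the idea without depending on OpenNLP, here is a simplified character-class tokenizer in plain Java. It is not the SimpleTokenizer source, only a sketch of the same principle: consecutive characters of the same class form one token, and '#' is placed in the alphabetic class. The class HashtagTokenizer and the enum CharType are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class HashtagTokenizer {
    private enum CharType { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    // Assigns each character to a class; '#' is treated like a letter.
    private static CharType classify(char c) {
        if (Character.isWhitespace(c)) {
            return CharType.WHITESPACE;
        } else if (Character.isLetter(c) || c == '#') {
            return CharType.ALPHABETIC;
        } else if (Character.isDigit(c)) {
            return CharType.NUMERIC;
        }
        return CharType.OTHER;
    }

    // Splits the text wherever the character class changes; whitespace is dropped.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, -1 if none
        CharType current = CharType.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharType type = classify(text.charAt(i));
            if (type != current) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                }
                start = (type == CharType.WHITESPACE) ? -1 : i;
                current = type;
            }
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [The, sky, is, blue, ., A, cat, is, #blue, .]
        System.out.println(tokenize("The sky is blue. A cat is #blue."));
    }
}
```

In this simplified version a run of punctuation characters stays together as one token, which may differ from what the real SimpleTokenizer does.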

Maybe you are just being unlucky, try this:
public static void tokenize() throws InvalidFormatException, IOException {
InputStream is = new FileInputStream("models/en-token.bin");
TokenizerModel model = new TokenizerModel(is);
Tokenizer tokenizer = new TokenizerME(model);
String tokens[] = tokenizer.tokenize("The sky is blue. A cat is #blue. ");
for (String a : tokens)
System.out.println(a);
is.close();
}
As you can see, "#blue" is tokenized as a single token, and the intelligence of the Tokenizer remains.
For this to work you will need the "en-token.bin" model.

You could just try String[] tokens = sentence.split(" ");
split() is a method of String in Java. Passing it a space (i.e. " ") will just give you all the tokens in the string delimited by a space.

I want to search for a string using StringTokenizer but the string I'm looking for has a delimiter in it - Java

I have an external file named quotes.txt and I'll show you some contents of the file:
1 Everybody's always telling me one thing and out the other.
2 I love criticism just so long as it's unqualified praise.
3 The difference between 'involvement' and 'commitment' is like an eggs-and-ham
breakfast: the chicken was 'involved' - the pig was 'committed'.
I used this: StringTokenizer str = new StringTokenizer(line, " .'");
This is the code for the searching:
String line = "";
boolean wordFound = false;
while((line = bufRead.readLine()) != null) {
while(str.hasMoreTokens()) {
String next = str.nextToken();
if(next.equalsIgnoreCase(targetWord)) {
wordFound = true;
output = line;
break;
}
}
if(wordFound) break;
else output = "Quote not found";
}
Now, I want to search for the strings "Everybody's" and "it's" in lines 1 and 2, but it won't work since the apostrophe is one of the delimiters. If I remove that delimiter, then I won't be able to search for "involvement", "commitment", "involved" and "committed" in line 3.
What code would be suitable for this problem? Please help, and thanks.

I would suggest using regular expressions (the Pattern class) rather than StringTokenizer for this. For example:
final Pattern targetWordPattern =
Pattern.compile("\\b" + Pattern.quote(targetWord) + "\\b",
Pattern.CASE_INSENSITIVE);
String line = "";
boolean wordFound = false;
while((line = bufRead.readLine()) != null) {
if(targetWordPattern.matcher(line).find()) {
wordFound = true;
output = line;
break;
}
else
output = "Quote not found";
}
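Here is a self-contained version of the same idea that you can run against the sample quotes. The QuoteSearch class and the findQuote method are hypothetical names; the pattern is built the same way as above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class QuoteSearch {
    // Returns the first line containing targetWord as a whole word
    // (case-insensitive), or null if no line matches.
    public static String findQuote(Iterable<String> lines, String targetWord) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(targetWord) + "\\b",
                Pattern.CASE_INSENSITIVE);
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                return line;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> quotes = Arrays.asList(
            "1 Everybody's always telling me one thing and out the other.",
            "2 I love criticism just so long as it's unqualified praise.",
            "3 The difference between 'involvement' and 'commitment' is like an eggs-and-ham");
        System.out.println(findQuote(quotes, "it's"));       // prints the second quote
        System.out.println(findQuote(quotes, "commitment")); // prints the third quote
    }
}
```

The \b word boundaries work here because the target words start and end with letters, so both "it's" and a word wrapped in apostrophes are found without any delimiter list.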

Tokenize by whitespace, then trim by the ' character.
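A rough sketch of that suggestion, assuming you also want to strip other leading or trailing punctuation (such as the final period) and not only the apostrophe. The ApostropheTrim class and its method names are hypothetical.

```java
public class ApostropheTrim {
    // Strips leading and trailing non-alphanumeric characters (apostrophes,
    // periods, and so on) while keeping apostrophes inside the word.
    static String trimToken(String token) {
        return token.replaceAll("^[^A-Za-z0-9]+|[^A-Za-z0-9]+$", "");
    }

    // Returns true if the line contains targetWord as a whitespace-separated
    // token, ignoring case and surrounding punctuation.
    public static boolean containsWord(String line, String targetWord) {
        for (String token : line.trim().split("\\s+")) {
            if (trimToken(token).equalsIgnoreCase(targetWord)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String line = "2 I love criticism just so long as it's unqualified praise.";
        System.out.println(containsWord(line, "it's"));   // prints true
        System.out.println(containsWord(line, "praise")); // prints true
    }
}
```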

Regarding Java String Manipulation

I have the string "MO""RET", which gets stored in the items[1] array element after the split command. After it gets stored I do a replaceAll on this string, and it removes all the double quotes.
But I want it to be stored as MO"RET. How do I do it? In the CSV file that I process with the split command, double quotes within the contents of a text field are repeated (example: "This account is a ""large"" one"). So I want to retain one of the two quotes in the middle of the string when it is repeated, and drop the enclosing quotes if present. How can I do it?
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
items[1] has "MO""RET"
String recordType = items[1].replaceAll("\"","");
After this recordType has MORET; I want it to have MO"RET

Don't use regex to split a CSV line. This is asking for trouble ;) Just parse it character-by-character. Here's an example:
public static List<List<String>> parseCsv(InputStream input, char separator) throws IOException {
BufferedReader reader = null;
List<List<String>> csv = new ArrayList<List<String>>();
try {
reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
for (String record; (record = reader.readLine()) != null;) {
boolean quoted = false;
StringBuilder fieldBuilder = new StringBuilder();
List<String> fields = new ArrayList<String>();
for (int i = 0; i < record.length(); i++) {
char c = record.charAt(i);
fieldBuilder.append(c);
if (c == '"') {
quoted = !quoted;
}
if ((!quoted && c == separator) || i + 1 == record.length()) {
fields.add(fieldBuilder.toString().replaceAll(separator + "$", "")
.replaceAll("^\"|\"$", "").replace("\"\"", "\"").trim());
fieldBuilder = new StringBuilder();
}
if (c == separator && i + 1 == record.length()) {
fields.add("");
}
}
csv.add(fields);
}
} finally {
if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
}
return csv;
}
Yes, there's a little regex involved, but it only trims off the trailing separator and the surrounding quotes of a single field.
You can however also grab any 3rd party Java CSV API.

How about:
String recordType = items[1].replaceAll( "\"\"", "\"" );
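Note that collapsing the doubled quotes alone still leaves the enclosing quotes in place. Here is a small sketch that handles both steps, assuming a field is either fully wrapped in one pair of double quotes or not quoted at all; the CsvUnquote class and unquote method are hypothetical names.

```java
public class CsvUnquote {
    // Removes one pair of surrounding double quotes, if present, and then
    // collapses each doubled quote ("") into a single quote (").
    public static String unquote(String field) {
        String value = field;
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return value.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"MO\"\"RET\"")); // prints MO"RET
    }
}
```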

I would suggest using replace instead of replaceAll.
replaceAll treats its first argument as a regular expression.
The requirement is to replace two consecutive quotes with one quote:
String recordType = items[1].replace( "\"\"", "\"" );
To see the difference between replace and replaceAll, execute the code below:
recordType = items[1].replace( "$$", "$" );
recordType = items[1].replaceAll( "$$", "$" );
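A short runnable comparison, as a sketch: replace works on literal text, while replaceAll needs Pattern.quote and Matcher.quoteReplacement to behave literally. The ReplaceDemo class and its two helper methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceDemo {
    // Literal replacement: no regex involved.
    public static String literal(String s, String target, String replacement) {
        return s.replace(target, replacement);
    }

    // Regex-based replacement made literal by quoting both arguments.
    public static String quotedRegex(String s, String target, String replacement) {
        return s.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(literal("cost: $$5", "$$", "$"));     // prints cost: $5
        System.out.println(quotedRegex("cost: $$5", "$$", "$")); // prints cost: $5

        // Without quoting, "." is a regex meaning "any character".
        System.out.println("a.b".replaceAll(".", "-")); // prints ---
        System.out.println("a.b".replace(".", "-"));    // prints a-b
    }
}
```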

Here you can use regular expressions.
recordType = items[1].replaceAll( "\\B\"", "" );
recordType = recordType.replaceAll( "\"\\B", "" );
The first statement removes a quote that is not preceded by a word character, such as the quote at the beginning of the field.
The second statement removes a quote that is not followed by a word character, such as the quote at the end of the field.
