Summary
I have a string [tab] [ch]C[/ch] [ch]Am[/ch] \n I heard there was a secret chord[/tab]
When the TextView is big enough to hold it with no wrapping it should (and does) look like this:
C Am
I heard there was a secret chord
When the line(s) are too long to fit in the TextView, I want it to wrap like this:
C
I heard there was a
Am
secret chord
Right now it wraps like this (as you'd expect if it were just text):
C
Am
I heard there was a
secret chord
Constraints:
I use a monospace text font to keep alignment
The chords (C, F, Am, G) are clickable so if you make a custom implementation of TextView, it still has to be able to handle ClickableSpans or otherwise keep them clickable
Kotlin or Java (or XML) is fine
If it's helpful, this is for an open source project of mine, so the source is available on GitHub. Here's the fragment source (look for fun processTabContent(text: CharSequence); that's where I process the text right now). Here's the layout XML.
Input Format
My data is stored in a single string (this can't be changed -- I get it from an API). Here's how the above tab would be formatted:
[Intro]\n[tab][ch]C[/ch] [ch]Am[/ch] [ch]C[/ch] [ch]Am[/ch][/tab]\n[Verse 1][tab] [ch]C[ch] [ch]Am[/ch] I heard there was a secret chord [/tab][tab] [ch]C[/ch] [ch]Am[/ch]\nThat David played, and it pleased the Lord[/tab][tab] [ch]C[/ch] [ch]F[/ch] [ch]G[/ch]\n But you don't really care for music, do you?[/tab]
Note that the chords (notes that a guitarist would play, like C or F) are wrapped in [ch] tags. I currently have code that finds these, removes the [ch] tags, and wraps each chord in a ClickableSpan. On click, my application shows another fragment with instructions how to play the chord on a guitar. This is only important in that the answer to this question must allow these chords to be clicked like this still.
What I'm doing right now (that isn't working)
As you may have noticed by now, it's the [tab] tags that we're going to have to focus on for this question. Right now, I'm going through the string and replacing [tab] with a newline and removing all instances of [/tab]. This works fine if my TextView's text size is small enough that entire lines fit on the device screen. However, when the word wrap kicks in I start having problems.
This:
C Am
I heard there was a secret chord
Should wrap to this:
C
I heard there was a
Am
secret chord
But instead wraps like this:
C
Am
I heard there was a
secret chord
I think this solution might solve the issue, but it relies on some assumptions:
Every lyric starts with [tab] and ends with [/tab]
The chords and the lyric are always separated by \n
You will also need to cleanse the data before you use it. Since the Intro and Verse markers are likely easy to handle, I will focus on the lyric tab only.
Here is the sample data for a single lyric:
[tab] [ch]C[/ch] [ch]F[/ch] [ch]G[/ch]
\n But you don't really care for music, do you?[/tab]
First, we need to remove the unwanted tags.
val inputStr = singleLyric
.replace("[tab]", "")
.replace("[/tab]", "")
.replace("[ch]", "")
.replace("[/ch]", "")
After that, I separated the chords and lyric
val indexOfLineBreak = inputStr.indexOf("\n")
val chords = inputStr.substring(0, indexOfLineBreak)
val lyrics = inputStr.substring(indexOfLineBreak + 1, inputStr.length).trim()
After we clean the data, we can start to set the data.
text_view.text = lyrics
text_view.post {
val lineCount = text_view.lineCount
var currentLine = 0
var newStr = ""
if (lineCount <= 1) {// if it's not multi line, no need to manipulate data
newStr += chords + "\n" + lyrics
} else {
val chordsCount = chords.count()
while (currentLine < lineCount) {
//get start and end index of selected line
val lineStart = text_view.layout.getLineStart(currentLine)
val lineEnd = text_view.layout.getLineEnd(currentLine)
// add chord substring
if (lineEnd <= chordsCount) //chords string can be shorter than lyric
newStr += chords.substring(lineStart, lineEnd) + "\n"
else if (lineStart < chordsCount) //it can be no more chords data to show
newStr += chords.substring(lineStart, chordsCount) + "\n"
// add lyric substring
newStr += lyrics.substring(lineStart, lineEnd) + "\n"
currentLine++
}
}
text_view.text = newStr
}
The idea is simple. After we set the lyric data on the TextView, we can get its line count. For each line number, the layout gives us the start and end index of that line, and with those indexes we can slice the chord and lyric strings. Hope this helps.
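The slicing step can also be tried without a TextView. The following is a minimal plain-Java sketch, not the answer's code: it assumes the wrapped line ends are already known (on Android they would come from Layout.getLineEnd), and the class and method names are hypothetical.

```java
class ChordLineInterleaver {
    /**
     * Splits the chord line and the lyric line at the same character offsets
     * and interleaves the pieces. lineEnds holds the exclusive end offset of
     * each wrapped lyric line (an assumption: on Android these values would
     * be read from the TextView's layout).
     */
    static String interleave(String chords, String lyrics, int[] lineEnds) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        for (int end : lineEnds) {
            // The chord line can be shorter than the lyric line,
            // so only emit chords while some are left.
            if (start < chords.length()) {
                out.append(chords, start, Math.min(end, chords.length())).append('\n');
            }
            out.append(lyrics, start, end).append('\n');
            start = end;
        }
        return out.toString();
    }
}
```

Because both strings are cut at identical offsets, a chord stays above the same lyric character as long as the font is monospace.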
This is based off of Hein Htet Aung's answer. The general idea is that you have two lines passed in (singleLyric), but the lines might have to be processed before appending them (hence the middle while loop). For convenience, this was written with a parameter appendTo that the lyric will be appended to. It returns a finished SpannableStringBuilder with the lyric appended. It would be used like this:
var ssb = SpannableStringBuilder()
for (lyric in listOfDoubleLyricLines) {
    ssb = processLyricLine(lyric, ssb)
}
textView.movementMethod = LinkMovementMethod.getInstance() // without LinkMovementMethod, the chord spans are not clickable
textView.setText(ssb, TextView.BufferType.SPANNABLE)
Here's the processing function:
private fun processLyricLine(singleLyric: CharSequence, appendTo: SpannableStringBuilder): SpannableStringBuilder {
val indexOfLineBreak = singleLyric.indexOf("\n")
var chords: CharSequence = singleLyric.subSequence(0, indexOfLineBreak).trimEnd()
var lyrics: CharSequence = singleLyric.subSequence(indexOfLineBreak + 1, singleLyric.length).trimEnd()
var startLength = appendTo.length
var result = appendTo
// break lines ahead of time
// thanks @Andro https://stackoverflow.com/a/11498125
val availableWidth = binding.tabContent.width.toFloat() //- binding.tabContent.textSize / resources.displayMetrics.scaledDensity
while (lyrics.isNotEmpty() || chords.isNotEmpty()) {
// find good word break spot at end
val plainChords = chords.replace("\\[/?ch\\]".toRegex(), "") // escape the brackets so the pattern matches the literal [ch] and [/ch] tags
val wordCharsToFit = findMultipleLineWordBreak(listOf(plainChords, lyrics), binding.tabContent.paint, availableWidth)
// make chord substring
var i = 0
while (i < min(wordCharsToFit, chords.length)) {
if (i+3 < chords.length && chords.subSequence(i .. i+3) == "[ch]"){
//we found a chord; add it.
chords = chords.removeRange(i .. i+3) // remove [ch]
val start = i
while(chords.subSequence(i .. i+4) != "[/ch]"){
// find end
i++
}
// i is now 1 past the end of the chord name
chords = chords.removeRange(i .. i+4) // remove [/ch]
result = result.append(chords.subSequence(start until i))
//make a clickable span
val chordName = chords.subSequence(start until i)
val clickableSpan = makeSpan(chordName)
result.setSpan(clickableSpan, startLength+start, startLength+i, Spanned.SPAN_EXCLUSIVE_EXCLUSIVE)
} else {
result = result.append(chords[i])
i++
}
}
result = result.append("\r\n")
// make lyric substring
val thisLine = lyrics.subSequence(0, min(wordCharsToFit, lyrics.length))
result = result.append(thisLine).append("\r\n")
// update for next pass through
chords = chords.subSequence(i, chords.length)
lyrics = lyrics.subSequence(thisLine.length, lyrics.length)
startLength = result.length
}
return result
}
And finally, I found the need to break my text at words rather than just at the max line length, so here's the word break finder function for that:
private fun findMultipleLineWordBreak(lines: List<CharSequence>, paint: TextPaint, availableWidth: Float): Int{
val breakingChars = "‐–〜゠= \t\r\n" // all the chars that we'll break a line at
var totalCharsToFit: Int = 0
// find max number of chars that will fit on a line
for (line in lines) {
totalCharsToFit = max(totalCharsToFit, paint.breakText(line, 0, line.length,
true, availableWidth, null))
}
var wordCharsToFit = totalCharsToFit
// go back from max until we hit a word break
var allContainWordBreakChar: Boolean
do {
allContainWordBreakChar = true
for (line in lines) {
allContainWordBreakChar = allContainWordBreakChar
&& (line.length <= wordCharsToFit || breakingChars.contains(line[wordCharsToFit]))
}
} while (!allContainWordBreakChar && --wordCharsToFit > 0)
// if we had a super long word, just break at the end of the line
if (wordCharsToFit < 1){
wordCharsToFit = totalCharsToFit
}
return wordCharsToFit
}
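For readers who want to try the pairing logic outside Android, here is a rough plain-Java sketch of the same idea. It assumes a monospace font (as stated in the question), so it counts characters instead of calling paint.breakText, and it ignores the [ch] tags and spans entirely. The names PairedWrap, findBreak and wrapPair are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class PairedWrap {
    static final String BREAK_CHARS = " \t\r\n";

    /** Largest offset <= maxChars where every line has either ended or sits on a break character. */
    static int findBreak(List<String> lines, int maxChars) {
        for (int pos = maxChars; pos > 0; pos--) {
            boolean allBreakHere = true;
            for (String line : lines) {
                if (line.length() > pos && BREAK_CHARS.indexOf(line.charAt(pos)) < 0) {
                    allBreakHere = false;
                    break;
                }
            }
            if (allBreakHere) {
                return pos;
            }
        }
        return maxChars; // no shared break point: fall back to a hard break
    }

    /** Wraps a chord line and a lyric line together, cutting both at the same offsets. */
    static String wrapPair(String chords, String lyrics, int maxChars) {
        if (maxChars < 1) {
            throw new IllegalArgumentException("maxChars must be at least 1");
        }
        StringBuilder out = new StringBuilder();
        while (!chords.isEmpty() || !lyrics.isEmpty()) {
            int cut = findBreak(Arrays.asList(chords, lyrics), maxChars);
            String chordPart = chords.substring(0, Math.min(cut, chords.length()));
            String lyricPart = lyrics.substring(0, Math.min(cut, lyrics.length()));
            out.append(chordPart).append('\n').append(lyricPart).append('\n');
            chords = chords.substring(chordPart.length());
            lyrics = lyrics.substring(lyricPart.length());
        }
        return out.toString();
    }
}
```

The break character itself is carried over to the start of the next pair of lines in both strings, which keeps the chords aligned with the lyrics at the cost of a leading space.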
Related
I am working on an exercise with the following criteria:
"The input consists of pairs of tokens where each pair begins with the type of ticket that the person bought ("coach", "firstclass", or "discount", case-sensitively) and is followed by the number of miles of the flight."
The list can be paired -- coach 1500 firstclass 2000 discount 900 coach 3500 -- and this currently works great. However, when the String and int value are split like so:
firstclass 5000 coach 1500 coach
100 firstclass
2000 discount 300
it breaks entirely. I am almost certain that it has something to do with me using this format (not full)
while(fileScanner.hasNextLine())
{
StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
while(token.hasMoreTokens())
{
String ticketClass = token.nextToken().toLowerCase();
int count = Integer.parseInt(token.nextToken());
...
}
}
because it will always read the first value as a String and the second value as an integer. I am very lost on how to keep track of one or the other while going to read the next line. Any help is truly appreciated.
Similar (I think) problems:
Efficient reading/writing of key/value pairs to file in Java
Java-Read pairs of large numbers from file and represent them with linked list, get the sum and product of each pair
Reading multiple values in multiple lines from file (Java)
If you can afford to read the text file in all at once as a very long String, simply use the built-in String.split() with the regex \\s+, like so
String[] tokens = fileAsString.split("\\s+");
This will split the input file into tokens, assuming the tokens are separated by one or more whitespace characters (a whitespace character covers newline, space, tab, and carriage return). Even and odd tokens are ticket types and mile counts, respectively.
If you absolutely have to read line by line and use StringTokenizer, one solution is to count the number of tokens in the previous line. If that number is odd, the first token of the current line is of a different type than the first token of the previous line. Once you know the starting type of the current line, simply alternate types from there.
int tokenCount = 0;
boolean startingType = true; // true for String, false for integer
boolean currentType;
while(fileScanner.hasNextLine())
{
StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
startingType = startingType ^ (tokenCount % 2 == 1); // if tokenCount is odd, the XOR ^ operator will flip the starting type of this line
tokenCount = 0;
while(token.hasMoreTokens())
{
tokenCount++;
currentType = startingType ^ (tokenCount % 2 == 0); // alternating between types in current line
if (currentType) {
String ticketClass = token.nextToken().toLowerCase();
// do something with ticketClass here
} else {
int mileCount = Integer.parseInt(token.nextToken());
// do something with mileCount here
}
...
}
}
I found another way to do this problem without using either the StringTokenizer or the regex...admittedly I had trouble with the regular expressions haha.
I declare these outside of the try-catch block because I want to use them in both my finally statement and return the points:
int points = 0;
ArrayList<String> classNames = new ArrayList<>();
ArrayList<Integer> classTickets = new ArrayList<>();
Then, inside the try block, I declare the index variable because I won't need it outside of this block. It increases each time a new element is read. Elements at even indexes (0, 2, ...) are read as ticket classes and elements at odd indexes are read as the numbers that follow them:
try
{
int index = 0;
// read till the file is empty
while(fileScanner.hasNext())
{
// first entry is the ticket type
if(index % 2 == 0)
classNames.add(fileScanner.next());
// second entry is the number of points
else
classTickets.add(Integer.parseInt(fileScanner.next()));
index++;
}
}
You can either catch the exception here like this or declare throws NoSuchElementException on your method, as long as you catch it where the method is called:
catch(NoSuchElementException noElement)
{
System.out.println("<###-NoSuchElementException-###>");
}
Then down here, loop through the number of elements. See which flight class it is and multiply the ticket count respectively and return the points outside of the block:
finally
{
for(int i = 0; i < classNames.size(); i++)
{
switch(classNames.get(i).toLowerCase())
{
case "firstclass": // 2 points for first
points += 2 * classTickets.get(i);
break;
case "coach": // 1 point for coach
points += classTickets.get(i);
break;
default:
// discount gets nothing
}
}
}
return points;
The regex seems like the most convenient way, but this was more intuitive to me for some reason. Either way, I hope the variety will help out.
simply use the built-in String.split() - @bui
I was finally able to wrap my head around regular expressions, but \s+ was not being recognized for some reason. It kept giving me this error message:
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )Java(1610612990)
So when I went through with those characters instead, I was able to write this:
int points = 0, multiplier = 0, tracker = 0;
while(fileScanner.hasNext())
{
String read = fileScanner.next().split(
"[\b \t \n \f \r \" \' \\ ]")[0];
if(tracker % 2 == 0)
{
if(read.toLowerCase().equals("firstclass"))
multiplier = 2;
else if(read.toLowerCase().equals("coach"))
multiplier = 1;
else
multiplier = 0;
}else
{
points += multiplier * Integer.parseInt(read);
}
tracker++;
}
This code goes one entry at a time instead of reading a whole array void of whitespace as a work-around for that error message I was getting. If you could show me what the code would look like with String[] tokens = fileAsString.split("\s+"); instead I would really appreciate it :)
You need to add another "\" before "\s" to escape the backslash itself, i.e. write "\\s+" – @bui
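To answer the request above, here is one way the split-based version could look. It is only a sketch: the class and method names are invented, the scoring rules (2 points per mile for firstclass, 1 for coach, 0 otherwise) are copied from the switch statement in the earlier answer, and it assumes the whole file has already been read into one String.

```java
class TicketPoints {
    /** Sums the points for a whitespace-separated list of "ticketClass miles" pairs. */
    static int points(String fileAsString) {
        // "\\s+" in Java source is the regex \s+ (one or more whitespace characters).
        // trim() avoids an empty first token when the text starts with whitespace.
        String[] tokens = fileAsString.trim().split("\\s+");
        int points = 0;
        // Even indexes hold ticket classes, odd indexes hold mile counts.
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            String ticketClass = tokens[i];
            int miles = Integer.parseInt(tokens[i + 1]);
            switch (ticketClass) {
                case "firstclass":
                    points += 2 * miles;
                    break;
                case "coach":
                    points += miles;
                    break;
                default:
                    // discount (or anything else) earns nothing
                    break;
            }
        }
        return points;
    }
}
```

Since line breaks count as whitespace, a pair that is split across two lines is handled the same way as a pair on one line. The comparison is case-sensitive, as the exercise text requires.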
I’m developing an android app that gets objects from a server and shows them in a simple list.
I’m trying to figure out how to deal with long object titles:
Every title populates a designated multi-line TextView.
If a title is longer than 16 characters, it messes with my desired UI.
There are two scenarios I need to solve -
1). If the title is longer than 16 characters & contains more than one word, I need to split the words into different lines (I tried to .split("") and .trim(), but I don’t want to use another view, just break a line in the same one, and the use in ("") seems unreliable to me).
2). If the title is longer than 16 characters and contains only one long word, I only need to change font size specifically.
Any ideas for a good and reliable solution?
Thanks a lot in advance.
Use a SpannableString so that everything stays in a single view.
For the title:
SpannableString titleSpan = new SpannableString("title String");
titleSpan.setSpan(new RelativeSizeSpan(1.3f), 0, titleSpan.length(), Spanned.SPAN_EXCLUSIVE_EXCLUSIVE);
For the message:
SpannableString messageSpan = new SpannableString("Message String");
messageSpan.setSpan(new RelativeSizeSpan(1.0f), 0, messageSpan.length(), Spanned.SPAN_EXCLUSIVE_EXCLUSIVE);
Then set both in the TextView:
tvTermsPolicyHeading.setText(TextUtils.concat(titleSpan, messageSpan));
Code like the following should do what you need:
String title; // your title
// find the length of your title
int length = title.length();
if (length > 16) {
    String[] titles = title.split("\\s+");
    int size = titles.length;
    if (size < 2) {
        // a single long word: show it as-is and reduce the text size of your TextView
        yourTextview.setText(title);
    } else {
        // several words: put each word on its own line
        String newTitle = "";
        for (int i = 0; i < titles.length; i++) {
            newTitle += titles[i] + "\n";
        }
        yourTextview.setText(newTitle.trim());
    }
}
You can split the title and then join the words with "\n" if there is more than one word.
For the case of a single long word, see this question:
Auto-fit TextView for Android
try this:
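if (title.split(" ").length > 1) {
    // look for the last space within the first 16 characters
    String line1 = title.substring(0, Math.min(16, title.length()));
    int end = line1.lastIndexOf(" ");
    if (end < 0) {
        end = title.indexOf(" "); // the first word alone is longer than 16 characters
    }
    titleTextView.setText(title.substring(0, end) + "\n" +
            title.substring(end + 1));
} else {
    titleTextView.setText(title);
    titleTextView.setTextSize(yourTextSize);
}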
this code should work perfectly for your case.
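If a title can need more than two lines, a greedy word wrap is another option. The sketch below is plain Java with hypothetical names (TitleWrapper, wrap); the 16-character limit from the question is passed in as maxChars. It is an illustration of the splitting idea, not a replacement for measuring the text on a real device.

```java
class TitleWrapper {
    /**
     * Joins the words of a title with spaces, starting a new line whenever
     * the next word would push the current line past maxChars.
     * A single word longer than maxChars is left intact on its own line.
     */
    static String wrap(String title, int maxChars) {
        StringBuilder out = new StringBuilder();
        int lineLen = 0;
        for (String word : title.trim().split("\\s+")) {
            if (lineLen > 0 && lineLen + 1 + word.length() > maxChars) {
                out.append('\n'); // word does not fit: start a new line
                lineLen = 0;
            } else if (lineLen > 0) {
                out.append(' ');  // word fits: separate it with a space
                lineLen++;
            }
            out.append(word);
            lineLen += word.length();
        }
        return out.toString();
    }
}
```

The caller could then set the result on the TextView and, for the single-long-word case, reduce the text size when the returned string contains no "\n" but is still longer than the limit.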
I have over a gigabyte of text that I need to go through and surround punctuation with spaces (tokenizing). I have a long regular expression (1818 characters, though that's mostly lists) that defines when punctuation should not be separated. Being long and complicated makes it hard to use groups with it, though I wouldn't leave that out as an option since I could make most groups non-capturing (?:).
Question: How can I efficiently replace certain characters that don't match a particular regular expression?
I've looked into using lookaheads or similar, and I haven't quite figured it out, but it seems to be terribly inefficient anyway. It would likely be better than using placeholders though.
I can't seem to find a good "replace with a bunch of different regular expressions for both finding and replacing in one pass" function.
Should I do this line by line instead of operating on the whole text?
String completeRegex = "[^\\w](("+protectedPrefixes+")|(("+protectedNumericOnly+")\\s*\\p{N}))|"+protectedRegex;
Matcher protectedM = Pattern.compile(completeRegex).matcher(s);
ArrayList<String> protectedStrs = new ArrayList<String>();
//Take note of the protected matches.
while (protectedM.find()) {
protectedStrs.add(protectedM.group());
}
//Replace protected matches.
String replaceStr = "<PROTECTED>";
s = protectedM.replaceAll(replaceStr);
//Now that it's safe, separate punctuation.
s = s.replaceAll("([^\\p{L}\\p{N}\\p{Mn}_\\-<>'])"," $1 ");
// These are for apostrophes. Can these be combined with either the protecting regular expression or the one above?
s = s.replaceAll("([\\p{N}\\p{L}])'(\\p{L})", "$1 '$2");
s = s.replaceAll("([^\\p{L}])'([^\\p{L}])", "$1 ' $2");
Note the two additional replacements for apostrophes. Using placeholders protects against those replacements as well, but I'm not really concerned with apostrophes or single quotes in my protecting regex anyway, so it's not a real concern.
I'm rewriting what I considered very inefficient Perl code with my own in Java, keeping track of speed, and things were going fine until I started replacing the placeholders with the original strings. With that addition it's too slow to be reasonable (I've never seen it get even close to finishing).
//Replace placeholders with original text.
String resultStr = "";
String currentStr = "";
int currentPos = 0;
int[] protectedArray = replaceStr.codePoints().toArray();
int protectedLen = protectedArray.length;
int[] strArray = s.codePoints().toArray();
int protectedCount = 0;
for (int i=0; i<strArray.length; i++) {
int pt = strArray[i];
// System.out.println("pt: "+pt+" symbol: "+String.valueOf(Character.toChars(pt)));
if (protectedArray[currentPos]==pt) {
if (currentPos == protectedLen - 1) {
resultStr += protectedStrs.get(protectedCount);
protectedCount++;
currentPos = 0;
} else {
currentPos++;
}
} else {
if (currentPos > 0) {
resultStr += replaceStr.substring(0, currentPos);
currentPos = 0;
currentStr = "";
}
resultStr += ParseUtils.getSymbol(pt);
}
}
s = resultStr;
This code may not be the most efficient way to return the protected matches. What is a better way? Or better yet, how can I replace punctuation without having to use placeholders?
I don't know exactly how big your in-between strings are, but I suspect that you can do somewhat better than using Matcher.replaceAll, speed-wise.
You're doing 3 passes across the string, each time creating a new Matcher instance and a new String. And because you're using + to concatenate the strings, each step creates one string for the concatenation of the in-between text and the protected group, and then another when that is concatenated onto the current result. You don't really need all of these extra instances.
Firstly, you should accumulate the resultStr in a StringBuilder, rather than via direct string concatenation. Then you can proceed something like:
StringBuilder resultStr = new StringBuilder();
int currIndex = 0;
while (protectedM.find()) {
    protectedStrs.add(protectedM.group());
    // tokenize the unprotected text between the previous match and this one
    appendInBetween(resultStr, str, currIndex, protectedM.start());
    resultStr.append(protectedM.group());
    currIndex = protectedM.end();
}
// the text after the last protected match is unprotected too
appendInBetween(resultStr, str, currIndex, str.length());
where appendInBetween is a method implementing the equivalent to the replacements, just in a single pass:
void appendInBetween(StringBuilder resultStr, String s, int start, int end) {
// Pass the whole input string and the bounds, rather than taking a substring.
// Allocate roughly enough space up-front.
resultStr.ensureCapacity(resultStr.length() + end - start);
for (int i = start; i < end; ++i) {
char c = s.charAt(i);
// Check if c matches "([^\\p{L}\\p{N}\\p{Mn}_\\-<>'])".
if (!(Character.isLetter(c)
|| Character.isDigit(c)
|| Character.getType(c) == Character.NON_SPACING_MARK
|| "_-<>'".indexOf(c) != -1)) { // the regex's \- is just an escaped hyphen, so no backslash belongs in this set
resultStr.append(' ');
resultStr.append(c);
resultStr.append(' ');
} else if (c == '\'' && i > 0 && i + 1 < s.length()) {
// We have a quote that's not at the beginning or end.
// Call these 3 characters bcd, where c is the quote.
char b = s.charAt(i - 1);
char d = s.charAt(i + 1);
if ((Character.isDigit(b) || Character.isLetter(b)) && Character.isLetter(d)) {
// If the 3 chars match "([\\p{N}\\p{L}])'(\\p{L})"
resultStr.append(' ');
resultStr.append(c);
} else if (!Character.isLetter(b) && !Character.isLetter(d)) {
// If the 3 chars match "([^\\p{L}])'([^\\p{L}])"
resultStr.append(' ');
resultStr.append(c);
resultStr.append(' ');
} else {
resultStr.append(c);
}
} else {
// Everything else, just append.
resultStr.append(c);
}
}
}
Obviously, there is a maintenance cost associated with this code - it is undeniably more verbose. But the advantage of doing it explicitly like this (aside from the fact it is just a single pass) is that you can debug the code like any other - rather than it just being the black box that regexes are.
I'd be interested to know if this works any faster for you!
At first I thought that appendReplacement wasn't what I was looking for, but indeed it was. Since it's replacing the placeholders at the end that slowed things down, all I really needed was a way to dynamically replace matches:
StringBuffer replacedBuff = new StringBuffer();
Matcher replaceM = Pattern.compile(replaceStr).matcher(s);
int index = 0;
while (replaceM.find()) {
replaceM.appendReplacement(replacedBuff, "");
replacedBuff.append(protectedStrs.get(index));
index++;
}
replaceM.appendTail(replacedBuff);
s = replacedBuff.toString();
Reference: Second answer at this question.
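The same restore step can be packaged as a small helper, sketched below under a few assumptions: the names PlaceholderRestorer and restore are invented, the placeholder is treated as literal text via Pattern.quote, and Matcher.quoteReplacement is used as an alternative to the empty-replacement-plus-append trick so that a "$" or "\" inside a protected string is not interpreted as a group reference.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderRestorer {
    /** Replaces the n-th occurrence of placeholder in text with originals.get(n). */
    static String restore(String text, String placeholder, List<String> originals) {
        Matcher m = Pattern.compile(Pattern.quote(placeholder)).matcher(text);
        StringBuffer out = new StringBuffer();
        int index = 0;
        while (m.find()) {
            // Copies the text since the previous match, then the original string.
            m.appendReplacement(out, Matcher.quoteReplacement(originals.get(index)));
            index++;
        }
        m.appendTail(out); // copy whatever follows the last placeholder
        return out.toString();
    }
}
```

Each character of the input is copied once, which is why this approach avoids the repeated String concatenation of the first attempt.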
Another option to consider:
During the first pass through the String, to find the protected Strings, take the start and end indices of each match, replace the punctuation for everything outside of the match, add the matched String, and then keep going. This takes away the need to write a String with placeholders, and requires only one pass through the entire String. It does, however, require many separate small replacement operations. (By the way, be sure to compile the patterns before the loop, as opposed to using String.replaceAll()). A similar alternative is to add the unprotected substrings together, and then replace them all at the same time. However, the protected strings would then have to be added to the replaced string at the end, so I doubt this would save time.
int currIndex = 0;
while (protectedM.find()) {
protectedStrs.add(protectedM.group());
String substr = s.substring(currIndex,protectedM.start());
substr = p1.matcher(substr).replaceAll(" $1 ");
substr = p2.matcher(substr).replaceAll("$1 '$2");
substr = p3.matcher(substr).replaceAll("$1 ' $2");
resultStr += substr+protectedM.group();
currIndex = protectedM.end();
}
Speed comparison for 100,000 lines of text:
Original Perl script: 272.960579875 seconds
My first attempt: Too long to finish.
With appendReplacement(): 14.245160866 seconds
Replacing while finding protected: 68.691842962 seconds
Thank you, Java, for not letting me down.
I'm trying to ensure text does not appear outside of the window as the window size cannot be changed.
The image above shows what happens when the string of the order numbers exceeds the length of the window. I'm trying to ensure that when the length of the string of order numbers reaches a certain length, I use regex to make a new line for the next orders.
private String listOfOrders( Map<String, List<Integer> > map, String key )
{
String res = "";
if ( map.containsKey( key ))
{
List<Integer> orders = map.get(key);
for ( Integer i : orders )
{
res += " " + i + ",";
}
} else {
res = "-No key-";
}
return res;
}
}
This is the code to display the text. It works by forming the string res and filling it with the order numbers from the array list.
I found, through researching, a cool little piece of code which replaces a string every set amount of characters with itself plus a new line.
if(res.length() >= W-10)
{
res = res.replaceAll("(.{20})", "$1\n");
}
else
{
res += " " + i + ",";
}
But this has no effect at all. I also realised that this code cannot tell how long each line is, because I'm using length() on the whole string rather than measuring the text between each "\n".
My question is: how do I go about using regex to ensure each line in the string is at most a certain number of characters long? My attempt does not work. The above just provides context as to why I want the lines in a string to be a certain length.
Thanks!
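One regex that is commonly used for this kind of wrapping is sketched below. It is only an illustration under stated assumptions: the class and method names are hypothetical, the width is counted in characters rather than pixels, and the string is wrapped once after it has been fully built (not inside the loop that appends order numbers). The pattern takes up to width characters and requires them to be followed by whitespace or the end of the text, so breaks land between order numbers rather than inside them.

```java
class OrderListWrapper {
    /** Inserts line breaks so that lines are at most width characters where possible. */
    static String wrap(String text, int width) {
        // (.{1,width}) grabs as many characters as fit, greedy;
        // (?:\s+|$) forces the cut to happen at whitespace or at the end.
        String regex = "(.{1," + width + "})(?:\\s+|$)";
        return text.trim().replaceAll(regex, "$1\n").trim();
    }
}
```

A single token longer than width has no whitespace to break at, so it would still produce an over-long line; that case needs separate handling.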
Problem:
I have to design an algorithm, which does the following for me:
Say that I have a line (e.g.)
alert tcp 192.168.1.1 (caret is currently here)
The algorithm should process this line, and return a value of 4.
I coded something for it, I know it's sloppy, but it works, partly.
private int counter = 0;
public void determineRuleActionRegion(String str, int index) {
if (str.length() == 0 || str.indexOf(" ") == -1) {
triggerSuggestionList(1);
return;
}
//remove duplicate space, spaces in front and back before searching
int num = str.trim().replaceAll(" +", " ").indexOf(" ", index);
//Check for occurances of spaces, recursively
if (num == -1) { //if there is no space
//no need to check if it's 0 times it will assign to 1
triggerSuggestionList(counter + 1);
counter = 0;
return; //set to rule action
} else { //there is a space
counter++;
determineRuleActionRegion(str, num + 1);
}
} //end of determineactionRegion()
So basically I look for spaces to determine the region (the number of words typed). However, I want it to change when the user presses the space bar (types a space character).
How can I get there with the current code?
Or better yet, what would be the correct way to do it? I'm looking into BreakIterator for this case...
In addition, I believe my algorithm won't work for multiple sentences. How should I address that as well?
--
The source of String str is acquired from textPane.getText(0, pos + 1);, the JTextPane.
Thanks in advance. Do let me know if my question is still not specific enough.
--
More examples:
alert tcp $EXTERNAL_NET any -> $HOME_NET 22 <caret>
return -1 (maximum of the typed text is 7 words)
alert tcp 192.168.1.1 any<caret>
return 4 (as it is still at 4th arg)
alert tcp<caret>
return 2 (as it is still at 2nd arg)
alert tcp <caret>
return 3
alert tcp $EXTERNAL_NET any -> <caret>
return 6
It is something like shell commands. As above. Though I think it does not differ much I believe, I just want to know how many arguments are typed. Thanks.
--
Pseudocode
Get whole paragraph from textpane
if more than 1 line -> process the last line
count how many arguments typed and return appropriate number
else
process current line
count how many arguments typed and return appropriate number
End
This uses String.split; I think this is what you want.
String[] texts = {
"alert tcp $EXTERNAL_NET any -> $HOME_NET 22 ",
"alert tcp 192.168.1.1 any",
"alert tcp",
"alert tcp ",
"alert tcp $EXTERNAL_NET any -> ",
"multine\ntest\ntest 1 2 3",
};
for (String text : texts) {
String[] lines = text.split("\r?\n|\r");
String lastLine = lines[lines.length - 1];
String[] tokens = lastLine.split("\\s+", -1);
for (String token : tokens) {
System.out.print("[" + token + "]");
}
int pos = (tokens.length <= 7) ? tokens.length : -1;
System.out.println(" = " + pos);
}
This produces the following output:
[alert][tcp][$EXTERNAL_NET][any][->][$HOME_NET][22][] = -1
[alert][tcp][192.168.1.1][any] = 4
[alert][tcp] = 2
[alert][tcp][] = 3
[alert][tcp][$EXTERNAL_NET][any][->][] = 6
[test][1][2][3] = 4
The code provided by polygenelubricants and helios works, to a certain extent. Both address the problem I stated, but not with multiple lines. helios's code is more straightforward.
However, neither handles pressing enter in the JTextPane: the old count is still returned instead of 1, because split() returns one sentence instead of two.
E.g. alert tcp <enter is pressed>
It should return 1 since it is a new sentence, but it returned 2 for both algorithms.
Also, if I select everything and delete it, both algorithms will throw NullPointerException as there is no string to be split.
I added one line, and it solved the problems mentioned above:
public void determineRuleActionRegion(String str) {
//remove repetitive spaces and concat $ for new line indicator
str = str.trim().replaceAll(" +", " ") + "$";
String[] lines = str.split("\r?\n|\r");
String lastLine = lines[lines.length - 1];
String[] tokens = lastLine.split("\\s+", -1);
int pos = (tokens.length <= 7) ? tokens.length : -1;
triggerSuggestionList(pos);
System.out.println("Current pos: " + pos);
return;
} //end of determineactionRegion()
With that, when split() parses the str, the "$" will create another line, which will be the last line regardless, and the count now will return to one. Also, there will not be NullPointerException as the "$" is always there.
However, without the help of polygenelubricants and helios, I don't think I will be able to figure it out so soon. Thanks guys!
EDIT: Okay... apparently split("\r?\n|\r",-1) works the same. Question is should I accept polygenelubricants or my own? Hmm.
2nd EDIT: One bad thing about concatenating '$' to the end of str is that lastLine.endsWith(" ") then returns false. So the complete solution uses split("\r?\n|\r", -1) together with the lastLine.endsWith(" ") check.
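Putting the two edits together, the counting part could look roughly like the sketch below. The class and method names are invented, the call to triggerSuggestionList is left out so the logic can be run on its own, and the limit of 7 words is taken from the examples in the question.

```java
class RuleRegion {
    /** Returns the 1-based argument position of the caret, or -1 past 7 words. */
    static int position(String textUpToCaret) {
        // Limit -1 keeps trailing empty strings, so text ending in a line
        // break yields an empty last line instead of the previous line.
        String[] lines = textUpToCaret.split("\r?\n|\r", -1);
        String lastLine = lines[lines.length - 1];
        // Limit -1 again: a trailing space produces an empty last token,
        // which counts as the argument that is about to be typed.
        String[] tokens = lastLine.split("\\s+", -1);
        return tokens.length <= 7 ? tokens.length : -1;
    }
}
```

With this shape an empty string gives 1 rather than an exception, and pressing enter resets the count to 1. Leading whitespace on the last line would add an empty first token, so it may be worth stripping it first.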
What about this: get last line, count what's between spaces...
String text = ...
String[] lines = text.split("\n"); // or \r\n depending on how you get the string
String lastLine = lines[lines.length-1];
StringTokenizer tokenizer = new StringTokenizer(lastLine, " ");
// note that StringTokenizer ignores empty tokens, that is, whatever lies between two consecutive spaces
int count = 0;
while (tokenizer.hasMoreTokens()) {
tokenizer.nextToken();
count++;
}
return count;
Edit: you could check whether there is a trailing space (lastLine.endsWith(" ")) to tell that a new word is being started. It's a basic approach for you to build on :)
Is the sample line representative? An editor for some rule based language (ACLs)?
How about going for a full Information Extraction/named entity recognition solution, the one that will be able to recognize entities (keywords, ip addresses, etc)? You don't have to write everything from scratch, there're existing tools and libraries.
UPDATE: Here's a piece of Snort code that I believe does the parsing:
Function ParseRule()
if (*args == '(') {
// "Preprocessor Rule detected"
} else {
/* proto ip port dir ip port r*/
toks = mSplit(args, " \t", 7, &num_toks, '\\');
/* A rule might not have rule options */
if (num_toks < 6) {
ParseError("Bad rule in rules file: %s", args);
}
..
}
otn = ParseRuleOptions(sc, rtn, roptions, rule_type, protocol);
..
mSplit is defined in mstring.c, a function to split a string into tokens.
In your case, ParseRuleOptions should return one for the whole string inside brackets I guess.
UPDATE 2: btw, is your first example correct, since in snort, you can add options to rules? For example this is a valid rule being written (options section not completed):
alert tcp any any -> 192.168.1.0/24 111 (content:"|00 01 86 a5|"; <caret>
In some cases you can have either 6 or 7 'words', so your algorithm should have a bit more knowledge, right?