DefaultHighlightPainter keeps shifting in JTextPane - java

I am currently working on an application that reads a large text and highlights a specific substring inside that text. It kind of works...
But for some reason (and I have no clue why) the highlight keeps shifting every time it highlights the string.
public List<int[]> findString(String text) {
text = text.toLowerCase();
List<int[]> highlightPositions = new ArrayList<int[]>();
JTextPane pane = getTextPane();
String paneText = pane.getText().toLowerCase();
int end = 0;
while (paneText.indexOf(text, end) != -1) {
int start = paneText.indexOf(text, end);
end = start + text.length();
highlightPositions.add(new int[] {start, end});
}
try {
highlight(highlightPositions);
} catch (Exception ex) {
}
return null;
}
and this is the code that does the actual highlighting
public void highlight(List<int[]> highlightPositions) throws BadLocationException {
DefaultHighlighter.DefaultHighlightPainter highlightPainter = new DefaultHighlighter.DefaultHighlightPainter(Color.YELLOW);
JTextPane textPane = getTextPane();
for (int[] position : highlightPositions) {
System.out.println("Highlight: " + position[0] + " : " + position[1]);
textPane.getHighlighter().addHighlight(position[0], position[1], highlightPainter);
}
}
Does anyone know how to fix this?
EDIT:
Here is how it looks when I attempt to highlight the word "Device".
Highlighting output

String paneText = pane.getText().toLowerCase();
Don't use getText(). The getText() method will return the string with the end-of-line string for the platform which in the case of Windows is \r\n. However the Document only stores \n for the EOL string so you have a mismatch of offsets for every extra line in the Document.
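The solution is to read the text from the Document instead. Note that Document.getText(int, int) takes an offset and a length and throws BadLocationException:
Document doc = pane.getDocument();
String paneText = doc.getText(0, doc.getLength()).toLowerCase();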
See Text and New Lines for more complete information on the problem.
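To see how large the shift gets, here is a minimal sketch that runs the same search over two strings: one with \r\n line endings (what getText() gives you on Windows) and one with \n only (what the Document stores). The class name OffsetDemo, the helper findAll and the sample text are made up for illustration; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetDemo {
    // Returns {start, end} pairs for every case-insensitive match of needle.
    static List<int[]> findAll(String haystack, String needle) {
        List<int[]> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits; // an empty needle would otherwise loop forever
        }
        String h = haystack.toLowerCase();
        String n = needle.toLowerCase();
        int start = h.indexOf(n);
        while (start != -1) {
            hits.add(new int[] {start, start + n.length()});
            start = h.indexOf(n, start + n.length());
        }
        return hits;
    }

    public static void main(String[] args) {
        String windowsText = "line one\r\nline two\r\nDevice";  // as returned by getText() on Windows
        String documentText = "line one\nline two\nDevice";     // as stored in the Document
        int fromGetText = findAll(windowsText, "device").get(0)[0];
        int fromDocument = findAll(documentText, "device").get(0)[0];
        // The difference is one character per preceding line break.
        System.out.println(fromGetText + " vs " + fromDocument);
    }
}
```

The highlighter works with Document offsets, so only the second number lines up with what is painted; the first drifts by one character for every line above the match.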


hadoop mapper input deal with hex values

I have a list of tweets as input on HDFS, and I am trying to run a map-reduce task. This is my mapper implementation:
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
try {
String[] fields = value.toString().split("\t");
StringBuilder sb = new StringBuilder();
for (int i = 1; i < fields.length; i++) {
if (i > 1) {
sb.append("\t");
}
sb.append(fields[i]);
}
tid.set(fields[0]);
content.set(sb.toString());
context.write(tid, content);
} catch(DecoderException e) {
e.printStackTrace();
}
}
As you can see, I tried to split the input by "\t", but the input (value.toString()) looks like this when I print it out:
2014\x091880284777\x09argento_un\x090\x090\x09RT #topmusic619: #RETWEET THIS!!!!!\x5CnFOLLOW ME &amp
; EVERYONE ELSE THAT RETWEETS THIS FOR 35+ FOLLOWERS\x5Cn#TeamFollowBack #Follow2BeFollowed #TajF\xE2\x80\xA6
here is another example:
2014\x0934447260\x09RBEKP\x090\x090\x09\xE2\x80\x9C#LENEsipper: Wild lmfaooo RT #Yerrp08: L**o some
n***a nutt up while gettin twerked
I noticed that \x09 should be a tab character (ASCII 09 is tab), so I tried to use Apache's Hex class:
String tmp = value.toString();
byte[] bytes = Hex.decodeHex(tmp.toCharArray());
But the decodeHex function returns null.
This is weird, since some of the characters are in hex while others are not. How can I decode them?
Edit:
Also note that besides tab, emojis are also encoded as hex values.
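One possible approach, sketched under assumptions: Hex.decodeHex expects input made up entirely of hex digit pairs, whereas this input mixes plain text with \xNN escapes, so each escape has to be decoded individually. The sketch below assumes every escape is exactly a backslash, an x and two hex digits, and that the decoded bytes are UTF-8. The class name HexEscapes and the method decode are hypothetical names, not part of Hadoop or Commons Codec.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class HexEscapes {
    // Replaces each \xNN escape with the byte it names, then decodes the bytes as UTF-8.
    static String decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            boolean isEscape = s.startsWith("\\x", i)
                    && i + 4 <= s.length()
                    && Character.digit(s.charAt(i + 2), 16) >= 0
                    && Character.digit(s.charAt(i + 3), 16) >= 0;
            if (isEscape) {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // Ordinary text: copy one code point through as UTF-8 bytes.
                int cp = s.codePointAt(i);
                byte[] b = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
                i += Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "2014\\x091880284777\\x09argento_un";
        // After decoding, split("\t") sees real tab characters.
        String[] fields = decode(raw).split("\t");
        System.out.println(fields.length);
    }
}
```

In the mapper this would be called as decode(value.toString()) before the split. Note that \x5Cn decodes to a backslash followed by n, not to a newline, so an escaped line break inside a tweet stays on one line.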

how to stop a java spell checker program from correcting repetitive words

I've implemented a program that does the following:
Scan all of the words in a web page into a string (using jsoup)
Filter out all of the HTML markup and code
Put these words into a spell checking program and offer suggestions
The spell checking program loads a dictionary.txt file into an array and compares the string input to the words inside the dictionary.
My current problem is that when the input contains the same word multiple times, such as "teh program is teh worst", the code will print out
You entered 'teh', did you mean 'the'?
You entered 'teh', did you mean 'the'?
Sometimes a website will have multiple words over and over again and this can become messy.
If it's possible, printing the word along with how many times it was spelled incorrectly would be perfect, but putting a limit to each word being printed once would be good enough.
My program has a handful of methods and two classes, but the spell checking method is below:
Note: the original code contains some 'if' statements that remove punctuation marks but I've removed them for clarity.
static boolean suggestWord;
public static String checkWord(String wordToCheck) {
String wordCheck;
String word = wordToCheck.toLowerCase();
if ((wordCheck = (String) dictionary.get(word)) != null) {
suggestWord = false; // no need to ask for suggestion for a correct
// word.
return wordCheck;
}
// If after all of these checks a word could not be corrected, return as
// a misspelled word.
return word;
}
TEMPORARY EDIT: As requested, the complete code:
Class 1:
public class ParseCleanCheck {
static Hashtable<String, String> dictionary;// To store all the words of the
// dictionary
static boolean suggestWord;// To indicate whether the word is spelled
// correctly or not.
static Scanner urlInput = new Scanner(System.in);
public static String cleanString;
public static String url = "";
public static boolean correct = true;
/**
* PARSER METHOD
*/
public static void PageScanner() throws IOException {
System.out.println("Pick an english website to scan.");
// This do-while loop allows the user to try again after a mistake
do {
try {
System.out.println("Enter a URL, starting with http://");
url = urlInput.nextLine();
// This creates a document out of the HTML on the web page
Document doc = Jsoup.connect(url).get();
// This converts the document into a string to be cleaned
String htmlToClean = doc.toString();
cleanString = Jsoup.clean(htmlToClean, Whitelist.none());
correct = false;
} catch (Exception e) {
System.out.println("Incorrect format for a URL. Please try again.");
}
} while (correct);
}
/**
* SPELL CHECKER METHOD
*/
public static void SpellChecker() throws IOException {
dictionary = new Hashtable<String, String>();
System.out.println("Searching for spelling errors ... ");
try {
// Read and store the words of the dictionary
BufferedReader dictReader = new BufferedReader(new FileReader("dictionary.txt"));
while (dictReader.ready()) {
String dictInput = dictReader.readLine();
String[] dict = dictInput.split("\\s"); // create an array of
// dictionary words
for (int i = 0; i < dict.length; i++) {
// key and value are identical
dictionary.put(dict[i], dict[i]);
}
}
dictReader.close();
String user_text = "";
// Initializing a spelling suggestion object based on probability
SuggestSpelling suggest = new SuggestSpelling("wordprobabilityDatabase.txt");
// get user input for correction
{
user_text = cleanString;
String[] words = user_text.split(" ");
int error = 0;
for (String word : words) {
if(!dictionary.contains(word)) {
checkWord(word);
dictionary.put(word, word);
}
suggestWord = true;
String outputWord = checkWord(word);
if (suggestWord) {
System.out.println("Suggestions for " + word + " are: " + suggest.correct(outputWord) + "\n");
error++;
}
}
if (error == 0) {
System.out.println("No mistakes found");
}
}
} catch (IOException e) {
e.printStackTrace();
System.exit(-1);
}
}
/**
* METHOD TO SPELL CHECK THE WORDS IN A STRING. IS USED IN SPELL CHECKER
* METHOD THROUGH THE "WORD" STRING
*/
public static String checkWord(String wordToCheck) {
String wordCheck;
String word = wordToCheck.toLowerCase();
if ((wordCheck = (String) dictionary.get(word)) != null) {
suggestWord = false; // no need to ask for suggestion for a correct
// word.
return wordCheck;
}
// If after all of these checks a word could not be corrected, return as
// a misspelled word.
return word;
}
}
There is a second class (SuggestSpelling.java) which holds a probability calculator but that isn't relevant right now, unless you planned on running the code for yourself.
Use a HashSet to detect duplicates -
Set<String> wordSet = new HashSet<>();
Store each word of the input sentence in it. If a word is already in the HashSet, don't call checkWord(String wordToCheck) for that word. Something like this -
String[] words = user_text.split(" "); // split the input sentence into words
for(String word: words) {
if(!wordSet.contains(word)) {
checkWord(word);
// do stuff
wordSet.add(word);
}
}
Edit
// ....
{
user_text = cleanString;
String[] words = user_text.split(" ");
Set<String> wordSet = new HashSet<>();
int error = 0;
for (String word : words) {
// wordSet is another data-structure. Its only for duplicates checking, don't mix it with dictionary
if(!wordSet.contains(word)) {
// put all your logic here
wordSet.add(word);
}
}
if (error == 0) {
System.out.println("No mistakes found");
}
}
// ....
You have other bugs as well: for example, you pass a String wordCheck as the argument to checkWord() and then declare String wordCheck; again inside checkWord(), which is not right. Please check the other parts as well.
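If you also want the count the question asks for, a Map can replace the Set. This is only a sketch under assumptions: the dictionary is modelled as a plain Set<String> of lower-case words, and MisspellingCounter and countMisspellings are hypothetical names rather than part of the asker's classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class MisspellingCounter {
    // Maps each unknown word to the number of times it appears, in first-seen order.
    static Map<String, Integer> countMisspellings(String text, Set<String> dictionary) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String raw : text.split("\\s+")) {
            String word = raw.toLowerCase();
            if (!word.isEmpty() && !dictionary.contains(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("the", "program", "is", "worst"));
        Map<String, Integer> counts = countMisspellings("teh program is teh worst", dictionary);
        // One line per distinct misspelling instead of one line per occurrence.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println("You entered '" + e.getKey() + "' " + e.getValue() + " time(s)");
        }
    }
}
```

The suggestion lookup (suggest.correct(...) in the question) would then run once per map entry rather than once per word.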

Can i read file in java and print its contents without comment statements?

When I read a Java file as tokens and print its content using BufferedReader and StringTokenizer, how can I print only the code, without the comments that begin with "//" or are enclosed in "/* */"?
I want to print the content of the file without these statements, which are only there to clarify the code.
You can do that very easily using JavaParser: just parse the code specifying that you want to ignore comments and then dump the AST
CompilationUnit cu = JavaParser.parse(reader, false /*considerComments*/);
String codeWithoutComments = cu.toString();
Note that dumping the AST also reformats the code.
1. If you want to remove the comments by hand:
remove // comments: see the same question here (no regex needed): Find single line comments in byte array
remove /* */ comments: this is more difficult. A regex could work, but it can cause a lot of pain, so I don't recommend it.
2. Use a Java parser: Java : parse java source code, extract methods
JavaParser, for example: https://github.com/javaparser/javaparser
Then iterate over the code and remove the comments, etc.
This code removes the comments inside a text file. It does not remove the comment symbols themselves; if you need to remove them too, you can edit the three functions below. This is the test case I used:
// helloworld
/* comment */
a /* comment */
b
/*
comment
*/
c
d
e
// xxxx
f // xxxx
The Output will be:
//
/* */
a /* */
b
/*
*/
c
d
e
//
f //
In this program I didn't remove the comment symbols because I was writing a lexical analyzer. You can remove them by editing the statements that I have marked with comments.
public class testSpace {
public static void main(String[] args) {
try {
String filePath = "C:\\Users\\Sibil\\eclipse-workspace\\Assignment1\\src\\Input.txt";
FileReader fr = new FileReader(filePath);
String line;
BufferedReader br = new BufferedReader(fr);
int lineNumber = 0;
while ((line = br.readLine()) != null) {
lineNumber++;
if ((line.contains("/*") && line.contains("*/")) || (line.contains("//"))) {
line = findreplacement(line);
System.out.println(line);//Line comment, or a block comment opened and closed on the same line
} else if (line.contains("/*")) {
line = getStartString(line);
System.out.println(line);
while ((line = br.readLine()) != null) {
lineNumber++;
if (line.contains("*/")) {
line = getEndString(line);
System.out.println(line);//Print the end of a multiline comment
break;
} else {
line = " ";
System.out.println(line);//Blank Space for commented line inside a multiline comment
}
}
} else
System.out.println(line);//Line without comment
}
} catch (Exception e) {
System.out.println(e);
}
}
private static String getEndString(String s) {
int end = s.indexOf("*/");
String lineEnd = s.substring(end, s.length());//Use end + 2 here if you don't need the closing comment symbol
return lineEnd;
}
private static String getStartString(String s) {
int start = s.indexOf("/*");
String lineStart = s.substring(0, start + 2);//Use start (without + 2) here if you don't need the opening comment symbol
return lineStart;
}
private static String findreplacement(String s) {
String line = "";
if (s.contains("//")) {
int start = s.indexOf("//");
line = s.substring(0, start + 2);//Use start (without + 2) here if you don't need the comment symbol
} else if ((s.contains("/*") && s.contains("*/"))) {
int start = s.indexOf("/*");
int end = s.indexOf("*/");
String lineStart = s.substring(0, start + 2);//Use start (without + 2) here if you don't need the opening comment symbol
String lineEnd = s.substring(end, s.length());//Use end + 2 here if you don't need the closing comment symbol
line = lineStart + " " + lineEnd;
}
return line;
}
}
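If your file has a line where the comment symbols appear inside a string literal, like this,
System.out.println("Hello World/*Do Something */");
it will fail: the text between the symbols is treated as a comment, and the output will be:
System.out.println("Hello World/* */");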
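One way around the string-literal problem, short of using a full parser, is to scan character by character and track whether you are inside a literal. The sketch below is a simplified scanner, not a complete Java lexer: it assumes ordinary "..." and '...' literals with backslash escapes and does not handle text blocks or unicode escapes. CommentStripper and strip are hypothetical names. Unlike the program above, it removes the comment symbols as well.

```java
class CommentStripper {
    // Removes // and /* */ comments, leaving string and char literals untouched.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Copy the literal verbatim, honoring backslash escapes.
                char quote = c;
                out.append(c);
                i++;
                while (i < n) {
                    char d = src.charAt(i);
                    out.append(d);
                    i++;
                    if (d == '\\' && i < n) {
                        out.append(src.charAt(i)); // escaped character
                        i++;
                    } else if (d == quote || d == '\n') {
                        break; // end of literal (or an unterminated one)
                    }
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: skip up to, but not including, the newline.
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block comment: skip past the closing symbol, or to the end if it is missing.
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("System.out.println(\"Hello World/*Do Something */\"); // greet"));
    }
}
```

Because a block comment is dropped whole, the line breaks inside it disappear too; if line numbers must be preserved, the newlines inside the skipped span would have to be copied to the output.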

Scanner reading "\n" or Enter/Return key

So, I'm trying to set up a simple config for a project. The goal here is to read certain values from a file and, if the file does not exist, to write said file. Currently, the creation of the file works fine, but my Scanner is acting a bit funny. When I reach the code
case "resolution": resolution = readConfig.next();
it makes the value of resolution "1024x768\nvsync" whereas it should only be "1024x768". If it were working as I planned, then the next value for
readingConfig = readConfig.next();
at the beginning of my while loop would be "vsync", which my switch statement would then catch and continue editing the values to those of the file.
Why is my Scanner picking up on the "\n" that is the 'enter' to the next line in the text document?
public static void main(String[] args) {
int musicVol = 0;
int soundVol = 0;
String resolution = null;
boolean vsync = false;
Scanner readConfig;
String readingConfig;
File configFile = new File(gameDir + "\\config.txt");
if (configFile.exists() != true) {
try {
configFile.createNewFile();
PrintWriter writer = new PrintWriter(gameDir + "\\config.txt");
writer.write("resolution = 1024x768 \n vsync = true \n music = 100 \n sound = 100");
writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
try {
readConfig = new Scanner(configFile);
readConfig.useDelimiter(" = ");
while (readConfig.hasNext()) {
readingConfig = readConfig.next();
switch (readingConfig) {
case "resolution":
resolution = readConfig.next();
break;
case "vsync":
vsync = readConfig.nextBoolean();
break;
case "music":
musicVol = readConfig.nextInt();
break;
case "sound":
soundVol = readConfig.nextInt();
break;
}
}
readConfig.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
Instead of using .hasNext() and .next(), you will have to use .hasNextLine() and .nextLine(). I would have written this as a comment, but I do not have enough rep to comment yet.
You are using next(), which splits on your delimiter (" = ") rather than on line breaks; try using nextLine() instead:
String nextLine() Advances this scanner past the current line and
returns the input that was skipped.
I'd suggest not using the delimiter: read the whole line as a string instead, and then split it into the parts you want.
Something like
String nextLine = readConfig.nextLine();
String[] split = nextLine.split(" = ");
String resolution = split[1]; // just an example
...
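A slightly fuller sketch of the same idea, assuming every config line has the form key = value: read line by line, split on the first =, and trim both sides so the stray spaces around \n in the generated file do not matter. ConfigLines and parse are hypothetical names; here the Scanner reads from a String, but new Scanner(configFile) works the same way.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

class ConfigLines {
    // Parses "key = value" lines into a map, ignoring lines without '='.
    static Map<String, String> parse(String text) {
        Map<String, String> values = new LinkedHashMap<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLine()) {
                String[] parts = in.nextLine().split("=", 2);
                if (parts.length == 2) {
                    values.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String file = "resolution = 1024x768 \n vsync = true \n music = 100 \n sound = 100";
        Map<String, String> config = parse(file);
        String resolution = config.get("resolution");
        boolean vsync = Boolean.parseBoolean(config.get("vsync"));
        int musicVol = Integer.parseInt(config.get("music"));
        System.out.println(resolution + " " + vsync + " " + musicVol);
    }
}
```

With the values in a map, the switch over token names is no longer needed, and the order of the lines in the file stops mattering.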
Aha, solved!
What this does is pull the entire text file into a String (using Scanner.nextLine() removes the '\n') and then adds the " = " at the end of each line instead. Thus, when the Scanner runs back over the String for the switch, it will already be ignoring the " = " and pull the desired information from the String.
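String config = "";
try {
readConfig = new Scanner(configFile);
while (readConfig.hasNextLine()) {
// nextLine() drops the line terminator; append the delimiter in its place
config += readConfig.nextLine() + " = ";
}
// Re-scan the joined string, splitting on " = "
readConfig = new Scanner(config);
readConfig.useDelimiter(" = ");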

how to fix wrong encoding in translation application?

I implemented a translation application and ran into a problem: incorrect output.
For example:
Input:
"Three predominant stories interweave: a dynastic war among several
families for control of Westeros; the rising threat of the dormant
cold supernatural Others dwelling beyond an immense wall of ice on
Westeros' northern border; and the am"
Output:
"%0D%0A%0D%0AThe+история+ -
+A+Песня+из+Лед+и+Fire+принимает++++вымышленный+континентах+Вестероса+и+Essos%2C+with+a+история++тысяч++лет.++Точка+++++главе+в+в+история+
- +a+ограниченной+перспектива+++ассортимент++символы+,+растет+from+девяти+в+в+первое++тридцать
один+++пятый+of+the+романов.+Три+преобладающим+рассказы+переплетаются%3A+a+династические+war+среди+несколько+семей+for+control++Вестероса%3B++рост+угрозу+of+the+спящие+cold+сверхъестественное+Другие+жилье+за+an+огромный+wall++лед+on+Вестероса%27+сев.
границы%3B+и++am"
I know that URLEncoder is the cause of the wrong output (all these "+" and "%"), but I don't know how to fix it.
Here is some code:
// This method should take an original text that should be
// translated and encode it to use as URL parameter.
private String encodeText(String text) throws IOException {
return URLEncoder.encode(text, "UTF-8");
}
// It should “extract” the translated text from the Yandex Translator response.
// More details about response format you can find at
// http://api.yandex.ru/translate/doc/dg/reference/translate.xml,
// we need to use XML interface.
private String parseContent(String content)
throws UnsupportedEncodingException {
String begin = "<text>";
String end = "</text>";
String result = "";
int i, j;
i = content.indexOf(begin);
j = content.indexOf(end);
if ((i != -1) && (j != -1)) {
result = content.substring((i + begin.length()), j);
}
return new String(result.getBytes(), "UTF-8");
}
// method translate() should return translation of original text.
// urlSourceProvider loads translated text
public String translate(String original) throws IOException {
return parseContent(urlSourceProvider
.load(prepareURL(encodeText(original))));
}
Try:
String result = URLDecoder.decode(variable, "UTF-8");
it should decode your text.
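For illustration, a small round trip showing what encode() produces and how decode() reverses it. UrlRoundTrip and its decode helper are hypothetical names; in the question's code the natural place for the call would probably be on the substring extracted in parseContent(), but that is an assumption about where the encoded text comes back.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlRoundTrip {
    // Turns "+" back into spaces and %XX sequences back into characters.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "control of Westeros; the rising threat";
        String encoded = URLEncoder.encode(original, "UTF-8");
        System.out.println(encoded);          // spaces become "+", ";" becomes "%3B"
        System.out.println(decode(encoded));  // back to the original text
    }
}
```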
