Ideal method to truncate a string with ellipsis


I'm sure all of us have seen ellipses on Facebook statuses (or elsewhere), clicked "Show more", and found only another 2 characters or so. I'd guess this is because of lazy programming, because surely there is an ideal method.
Mine counts slim characters [iIl1] as "half characters", but this doesn't get around ellipses looking silly when they hide barely any characters.
Is there an ideal method? Here is mine:
/**
* Return a string with a maximum length of <code>length</code> characters.
* If there are more than <code>length</code> characters, then string ends with an ellipsis ("...").
*
* @param text
* @param length
* @return
*/
public static String ellipsis(final String text, int length)
{
// The letters [iIl1] are slim enough to only count as half a character.
length += Math.ceil(text.replaceAll("[^iIl1]", "").length() / 2.0d);
if (text.length() > length)
{
return text.substring(0, length - 3) + "...";
}
return text;
}
Language doesn't really matter, but tagged as Java because that's what I'm mostly interested in seeing.

I like the idea of letting "thin" characters count as half a character. Simple and a good approximation.
The main issue with most ellipsizing, however, is (imho) that it chops off words in the middle. Here is a solution that takes word boundaries into account (but does not dive into pixel math and the Swing API).
private final static String NON_THIN = "[^iIl1\\.,']";
private static int textWidth(String str) {
return (int) (str.length() - str.replaceAll(NON_THIN, "").length() / 2);
}
public static String ellipsize(String text, int max) {
if (textWidth(text) <= max)
return text;
// Start by chopping off at the word before max
// This is an over-approximation due to thin-characters...
int end = text.lastIndexOf(' ', max - 3);
// Just one long word. Chop it off.
if (end == -1)
return text.substring(0, max-3) + "...";
// Step forward as long as textWidth allows.
int newEnd = end;
do {
end = newEnd;
newEnd = text.indexOf(' ', end + 1);
// No more spaces.
if (newEnd == -1)
newEnd = text.length();
} while (textWidth(text.substring(0, newEnd) + "...") < max);
return text.substring(0, end) + "...";
}

I'm shocked no one mentioned Commons Lang StringUtils#abbreviate().
Update: yes, it doesn't take the slim characters into account, but I don't think it should: everyone has different screens and fonts set up, and a large portion of the people who land on this page are probably looking for a maintained library like the above.

It seems like you might get more accurate geometry from the Java graphics context's FontMetrics.
Addendum: In approaching this problem, it may help to distinguish between the model and view. The model is a String, a finite sequence of UTF-16 code units, while the view is a series of glyphs, rendered in some font on some device.
In the particular case of Java, one can use SwingUtilities.layoutCompoundLabel() to effect the translation. The example below intercepts the layout call in BasicLabelUI to demonstrate the effect. It may be possible to use the utility method in other contexts, but the appropriate FontMetrics would have to be determined empirically.
import java.awt.Color;
import java.awt.EventQueue;
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.GridLayout;
import java.awt.Rectangle;
import java.awt.event.ComponentAdapter;
import java.awt.event.ComponentEvent;
import javax.swing.BorderFactory;
import javax.swing.Icon;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.border.EmptyBorder;
import javax.swing.border.LineBorder;
import javax.swing.plaf.basic.BasicLabelUI;
/** @see http://stackoverflow.com/questions/3597550 */
public class LayoutTest extends JPanel {
private static final String text =
"A damsel with a dulcimer in a vision once I saw.";
private final JLabel sizeLabel = new JLabel();
private final JLabel textLabel = new JLabel(text);
private final MyLabelUI myUI = new MyLabelUI();
public LayoutTest() {
super(new GridLayout(0, 1));
this.setBorder(BorderFactory.createCompoundBorder(
new LineBorder(Color.blue), new EmptyBorder(5, 5, 5, 5)));
textLabel.setUI(myUI);
textLabel.setFont(new Font("Serif", Font.ITALIC, 24));
this.add(sizeLabel);
this.add(textLabel);
this.addComponentListener(new ComponentAdapter() {
@Override
public void componentResized(ComponentEvent e) {
sizeLabel.setText(
"Before: " + myUI.before + " after: " + myUI.after);
}
});
}
private static class MyLabelUI extends BasicLabelUI {
int before, after;
@Override
protected String layoutCL(
JLabel label, FontMetrics fontMetrics, String text, Icon icon,
Rectangle viewR, Rectangle iconR, Rectangle textR) {
before = text.length();
String s = super.layoutCL(
label, fontMetrics, text, icon, viewR, iconR, textR);
after = s.length();
System.out.println(s);
return s;
}
}
private void display() {
JFrame f = new JFrame("LayoutTest");
f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
f.add(this);
f.pack();
f.setLocationRelativeTo(null);
f.setVisible(true);
}
public static void main(String[] args) {
EventQueue.invokeLater(new Runnable() {
@Override
public void run() {
new LayoutTest().display();
}
});
}
}

If you're talking about a web site - ie outputting HTML/JS/CSS, you can throw away all these solutions because there is a pure CSS solution.
text-overflow:ellipsis;
It's not quite as simple as just adding that style to your CSS, because it interacts with other CSS; e.g. it requires that the element has overflow:hidden; and if you want your text on a single line, white-space:nowrap; is good too.
I have a stylesheet that looks like this:
.myelement {
word-wrap:normal;
white-space:nowrap;
overflow:hidden;
-o-text-overflow:ellipsis;
text-overflow:ellipsis;
width: 120px;
}
You can even have a "read more" button that simply runs a javascript function to change the styles, and bingo, the box will re-size and the full text will be visible. (in my case though, I tend to use the html title attribute for the full text, unless it's likely to get very long)
Hope that helps. It's a much simpler solution than trying to calculate the text size and truncate it, and all that. (Of course, if you're writing a non-web-based app, you may still need to do that.)
There is one down-side to this solution: Firefox doesn't support the ellipsis style. Annoying, but I don't think critical -- it does still truncate the text correctly, as that is dealt with by overflow:hidden, it just doesn't display the ellipsis. It does work in all the other browsers (including IE, all the way back to IE5.5!), so it's a bit annoying that Firefox doesn't do it yet. Hopefully a new version of Firefox will solve this issue soon.
[EDIT]
People are still voting on this answer, so I should edit it to note that Firefox does now support the ellipsis style. The feature was added in Firefox 7. If you're using an earlier version (FF3.6 and FF4 still have some users) then you're out of luck, but most FF users are now okay. There's a lot more detail about this here: text-overflow:ellipsis in Firefox 4? (and FF5)

For me this would be ideal -
public static String ellipsis(final String text, int length)
{
return text.substring(0, length - 3) + "...";
}
I would not worry about the size of every character unless I really know where and in what font it is going to be displayed. Many fonts are fixed-width fonts, where every character has the same dimensions.
Even if it's a variable-width font, and you count 'i' and 'l' as taking half the width, then why not count 'w' and 'm' as taking double the width? A mix of such characters in a string will generally average out the effect of their size, and I would prefer ignoring such details. Choosing the value of 'length' wisely would matter the most.

Using Guava's com.google.common.base.Ascii.truncate(CharSequence, int, String) method:
Ascii.truncate("foobar", 7, "..."); // returns "foobar"
Ascii.truncate("foobar", 5, "..."); // returns "fo..."

How about this (to get a string of 50 chars):
text.replaceAll("(?<=^.{47}).*$", "...");
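One caveat with the pattern above: strings of 47 to 50 characters also match it, so they gain an ellipsis even though they already fit in 50 characters. A minimal sketch of a variant that only truncates when more than 50 characters are present; the class and method names are illustrative, not from any library:

```java
final class RegexEllipsis {
    // Keeps the first 47 characters and replaces the rest with "...",
    // but only when at least 4 characters would be hidden (length > 50).
    // '.' does not match line terminators, so input containing line
    // breaks may be left untruncated.
    static String truncate50(String text) {
        return text.replaceAll("(?<=^.{47}).{4,}$", "...");
    }

    public static void main(String[] args) {
        String sixty = new String(new char[60]).replace('\0', 'x');
        System.out.println(truncate50(sixty).length());
    }
}
```

The limit of 50 is hard-coded here to mirror the one-liner; a parameterised version would have to build the pattern string from the desired length.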

public static String getTruncated(String str, int maxSize){
int limit = maxSize - 3;
return (str.length() > maxSize) ? str.substring(0, limit) + "..." : str;
}

If you're worried about the ellipsis only hiding a very small number of characters, why not just check for that condition?
public static String ellipsis(final String text, int length)
{
// The letters [iIl1] are slim enough to only count as half a character.
length += Math.ceil(text.replaceAll("[^iIl1]", "").length() / 2.0d);
if (text.length() > length + 20)
{
return text.substring(0, length - 3) + "...";
}
return text;
}

I'd go with something similar to the standard model that you have. I wouldn't bother with the character widths thing - as @Gopi said, it is probably going to all balance out in the end. What I'd do that is new is have another parameter called something like "minNumberOfhiddenCharacters" (maybe a bit less verbose). Then when doing the ellipsis check I'd do something like:
if (text.length() > length+minNumberOfhiddenCharacters)
{
return text.substring(0, length - 3) + "...";
}
What this will mean is that if your text length is 35, your "length" is 30 and your min number of characters to hide is 10 then you would get your string in full. If your min number of character to hide was 3 then you would get the ellipsis instead of those three characters.
The main thing to be aware of is that I've subverted the meaning of "length" so that it is no longer a maximum length. The length of the outputted string can now be anything from 30 characters (when the text length is >40) to 40 characters (when the text length is 40 characters long). Effectively our max length becomes length+minNumberOfhiddenCharacters. The string could of course be shorter than 30 characters when the original string is less than 30 but this is a boring case that we should ignore.
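The soft-limit variant described so far can be put together as a complete method. This is only a sketch under the assumptions in the text; the class name and the shortened parameter name minHidden are made up for illustration:

```java
final class MinHiddenEllipsis {
    // Truncates to (length - 3) characters plus "..." only when the text
    // is longer than length + minHidden. Otherwise the text is returned
    // in full, so the result can be up to length + minHidden characters.
    static String ellipsis(String text, int length, int minHidden) {
        if (text.length() > length + minHidden) {
            return text.substring(0, length - 3) + "...";
        }
        return text;
    }

    public static void main(String[] args) {
        String text = "abcdefghijklmnopqrstuvwxyz123456789"; // 35 characters
        System.out.println(ellipsis(text, 30, 10)); // printed in full
        System.out.println(ellipsis(text, 30, 3));  // 27 characters + "..."
    }
}
```

With a 35-character text and a length of 30, a minimum of 10 hidden characters leaves the string intact, while a minimum of 3 yields the truncated form, matching the example above.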
If you want length to be a hard and fast maximum then you'd want something more like:
if (text.length() > length)
{
if (text.length() - length < minNumberOfhiddenCharacters-3)
{
return text.substring(0, text.length() - minNumberOfhiddenCharacters) + "...";
}
else
{
return text.substring(0, length - 3) + "...";
}
}
So in this example if text.length() is 37, length is 30 and minNumberOfhiddenCharacters = 10 then we'll go into the second branch of the inner if and get 27 characters + ... to make 30. This is actually the same as if we'd gone into the first branch (which is a sign we have our boundary conditions right). If the text length was 36 we'd get 26 characters + the ellipsis, giving us 29 characters with 10 hidden.
I was debating whether rearranging some of the comparison logic would make it more intuitive but in the end decided to leave it as it is. You might find that text.length() - minNumberOfhiddenCharacters < length-3 makes it more obvious what you are doing though.

In my eyes, you can't get good results without pixel math.
Thus, Java is probably the wrong end to fix this problem when you are in a web application context (like facebook).
I'd go for javascript. Since Javascript is not my primary field of interest, I can't really judge if this is a good solution, but it might give you a pointer.

Most of these solutions don't take font metrics into account. Here is a very simple but working solution for Java Swing that I have used for years now.
private String ellipsisText(String text, FontMetrics metrics, Graphics2D g2, int targetWidth) {
String shortText = text;
int activeIndex = text.length() - 1;
Rectangle2D textBounds = metrics.getStringBounds(shortText, g2);
// Stop at one character so substring() never receives a negative index.
while (activeIndex > 0 && textBounds.getWidth() > targetWidth) {
shortText = text.substring(0, activeIndex--);
textBounds = metrics.getStringBounds(shortText + "...", g2);
}
return activeIndex != text.length() - 1 ? shortText + "..." : text;
}

For simple cases, I have used String.format for this.
Here I abbreviate to max 10 chars and add ellipses:
String abbreviate(String longString) {
return String.format("%.10s...", longString);
}
A little-known fact: the "precision" number in the format pattern is used for truncation when formatting strings.
Add your own length-check, of course, if you want to make ellipses conditional. (I was shortening a JWT for logging, so I know it's going to be longer)
As a bonus, if the String is already shorter than the precision, there is no padding, it simply leaves it as is.
> System.out.println(abbreviate("This is a very long string"));
> System.out.println(abbreviate("Shorty"));
This is a ...
Shorty...
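As the second output line shows, the unconditional version appends the dots even when nothing was cut. A small sketch of the length check mentioned above, with the maximum passed in as a parameter; the class name and the maxChars parameter are assumptions for illustration:

```java
final class FormatAbbreviate {
    // Uses the "%.Ns" precision to keep at most maxChars characters and
    // appends "..." only when something was actually cut off.
    static String abbreviate(String s, int maxChars) {
        if (s.length() <= maxChars) {
            return s;
        }
        return String.format("%." + maxChars + "s...", s);
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("This is a very long string", 10));
        System.out.println(abbreviate("Shorty", 10));
    }
}
```

Note that the result can be up to maxChars + 3 characters long, since the dots are added after the cut rather than counted against the limit.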

You can also simply implement like this:
mb_strimwidth($string, 0, 120, '...')
Thanks.

Related

Write/Print to the bottom of terminal

I'm creating a sort of messaging program, where users can send messages to each other. I would like to make this look like any social messaging app where the most recently received message appears at the bottom of the screen and the previous messages are shown above (older messages appear higher up). Is there a way to print a message (String) to a specific line in terminal, or to the bottom line?
Mac OS x

Oh the days of yore! I remember when every village lad, who dreamed to become a Wizard, knew his ANSI control codes by heart. Surely those byte sequences allowed one in the knowledge to produce awesome visions and bright colors of exotic places on text displays lesser ones thought only as dumb terminals.
But the days are gone, and we have high resolution displays and 3D graphics, mouses and touchscreens. Time to go, you old fool! But wait! In the heart of every terminal window, (we are not speaking about Windows now) there beats a golden heart of old VT100 terminal. So yes, it is possible to control cursor positions and scroll areas with Java console output.
The idea is that the terminal is not so dumb after all. It only waits for escape sequences that tell it what special thing to do. Instructions start with the escape character, which is the non-printable ASCII character 27, usually written in hexadecimal 0x1b or octal 033. After that some human-readable stuff follows, most often a bracket [, some numerical information and a letter. If more numbers are needed, they are separated by semicolons.
For instance, to change font color to red, you output sequence <ESC>[31m like this:
System.out.println("\033[31mThis should appear red");
You'd better open a separate terminal window for fiddling with the codes, because the display easily turns garbled. In fact, the colors of the shell prompt and directory listings are implemented by these codes and can easily be personalized.
If you want to separate areas from top and bottom of the window from scrolling, there's a code for that, <ESC>[<top>;<bottom>r where you declare the lines from top and bottom, between where the scrolling happens, for instance: \033[2;22r
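To answer the original question directly, writing to a specific line comes down to a cursor-position sequence followed by the text. A minimal sketch, assuming a 24-row terminal (the real height is not queried) and using the standard erase-line sequence <ESC>[2K; the class and method names are made up:

```java
final class BottomLine {
    static final String ESC = "\033";

    // Moves the cursor to column 1 of the given row, erases that line,
    // and then writes the text there.
    static String lineAt(int row, String text) {
        return ESC + "[" + row + ";1H" + ESC + "[2K" + text;
    }

    public static void main(String[] args) {
        // Row 24 is an assumption about the window size.
        System.out.print(lineAt(24, "newest message"));
        System.out.flush();
    }
}
```

Combined with a scroll region covering the rows above, older messages can scroll upwards while the last row stays reserved for the newest message or an input prompt.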
Links:
Wikipedia's quite exhaustive article: Ansi escape code
https://en.wikipedia.org/wiki/ANSI_escape_code
Clear list of the codes: http://www.termsys.demon.co.uk/vtansi.htm
And of course answer is not an answer without running code. Right? I have to admit I got little carried away with this. Although it felt strange to use Java for terminal sequences. K&R C or COMMODORE BASIC would have been more fitting. I made very simple demo, but slowed down the printing speed to same level as old 1200 baud modem, so you can easily observe what's happening. Set the SPEED to 0 to disable.
Escape sequences are same kind of markup as HTML text formatting tags, and what's best, The BLINK is still there!
Save as TerminalDemo.java, compile with javac TerminalDemo.java and run with command: java TerminalDemo
import java.io.*;
public class TerminalDemo {
// Default speed in bits per second for serial line simulation
// Set 0 to disable
static final int SPEED = 1200;
// ANSI Terminal codes
static final String ESC = "\033";
static String clearScreen() { return "\033[2J"; }
static String cursorHome() { return "\033[H"; }
static String cursorTo(int row, int column) {
return String.format("\033[%d;%dH", row, column);
}
static String cursorSave() { return "\033[s"; }
static String cursorRestore() { return "\033[u"; }
static String scrollScreen() { return "\033[r"; }
static String scrollSet(int top, int bottom) {
return String.format("\033[%d;%dr", top, bottom);
}
static String scrollUp() { return "\033M"; }
static String scrollDown() { return "\033D"; }
static String setAttribute(int attr) {
return String.format("\033[%dm", attr);
}
static final int ATTR_RESET = 0;
static final int ATTR_BRIGHT = 1;
static final int ATTR_USCORE = 4;
static final int ATTR_BLINK = 5;
static final int ATTR_REVERSE = 7;
static final int ATTR_FCOL_BLACK = 30;
static final int ATTR_FCOL_RED = 31;
static final int ATTR_FCOL_GREEN = 32;
static final int ATTR_FCOL_YELLOW = 33;
static final int ATTR__BCOL_BLACK = 40;
static final int ATTR__BCOL_RED = 41;
static final int ATTR__BCOL_GREEN = 42;
public static void main(String[] args) {
// example string showing some text attributes
String s = "This \033[31mstring\033[32m should \033[33mchange \033[33m color \033[41m and start \033[5m blinking!\033[0m Isn't that neat?\n";
// Reset scrolling, clear screen and bring cursor home
System.out.print(clearScreen());
System.out.print(scrollScreen());
// Print example string s
slowPrint(s);
// some text attributes
slowPrint("This "
+ setAttribute(ATTR_USCORE) + "should be underscored\n"
+ setAttribute(ATTR_RESET)
+ setAttribute(ATTR_FCOL_RED) + "and this red\n"
+ setAttribute(ATTR_RESET)
+ "some "
+ setAttribute(ATTR_BRIGHT)
+ setAttribute(ATTR_FCOL_YELLOW)
+ setAttribute(ATTR_BLINK) + "BRIGHT YELLOW BLINKIN\n"
+ setAttribute(ATTR_RESET)
+ "could be fun.\n\n"
+ "Please press ENTER");
// Wait for ENTER
try { System.in.read(); } catch(IOException e) {e.printStackTrace();}
// Set scroll area
slowPrint(""
+ clearScreen()
+ scrollSet(2,20)
+ cursorTo(1,1)
+ "Cleared screen and set scroll rows Top: 2 and Bottom: 20\n"
+ cursorTo(21,1)
+ "Bottom area starts here"
+ cursorTo(2,1)
+ "");
// print some random text
slowPrint(randomText(60));
// reset text attributes, reset scroll area and set cursor
// below scroll area
System.out.print(setAttribute(ATTR_RESET));
System.out.print(scrollScreen());
System.out.println(cursorTo(22,1));
}
// Slow things down to resemble old serial terminals
private static void slowPrint(String s) {
slowPrint(s, SPEED);
}
private static void slowPrint(String s, int bps) {
for (int i = 0; i < s.length(); i++) {
System.out.print(s.charAt(i));
if(bps == 0) continue;
try { Thread.sleep((int)(8000.0 / bps)); }
catch(InterruptedException ex)
{ Thread.currentThread().interrupt(); }
}
}
// Returns a character representation of a sine graph
private static String randomText(int lines) {
String r = "";
for(int i=0; i<lines; i++) {
int sin = (int)Math.abs((Math.sin(1.0/20 * i)*30));
r += setAttribute((sin / 4) + 30);
for(int j=0; j<80; j++) {
if(j > 40 + sin)
break;
r += (j < (40-sin)) ? " " : "X";
}
r += setAttribute(ATTR_RESET) + "\n";
}
return r;
}
}

How can I display a short length version of a file name in java inside of a jlist

I have a JList that displays a filelist. The style I have it set to looks good with a FileFilter set to only show files and directories with names that are 15 characters long, however I still want to show the files that are longer than that, just show the first 15 characters or so. Basically, I want it to show this:
If I have a text file that says "1234567891234567.txt" - that has 20 characters including the ".txt" and it won't show up in the list. But I want it to show something like this:
"12345...567.txt" or something similar. Is there a way to do this?
Would I have to create a separate array, copy everything over, and edit the values in the new array to be no longer than 15 characters? I tried looking for a function that would change the name of the file but I couldn't find any. Suggestions?

You can check the length of file name and abbreviate it if it contains more than 20 characters, like the method below:
private static String getShortName(String fileName){
if(fileName.length() <= 20){
return fileName;
}
String extension = fileName.substring(fileName.lastIndexOf("."));
String name = fileName.substring(0, fileName.lastIndexOf("."));
return name.substring(0, 5) + "..." + name.substring(name.length() - 4) + extension;
}
public static void main(String[] args) throws Exception {
System.out.println(getShortName("123.txt"));
System.out.println(getShortName("123rewe.txt"));
System.out.println(getShortName("123fdsfdsfdasfadsfdsgafgaf.txt"));
}
Please note that it won't work if the extension itself is more than 20 characters or file name does not have any extension. However, you can modify it as per your requirement.

Coming up with an abbreviated string that will fit in a certain amount of space isn’t as trivial as it may sound. Sure, you could just make sure your text is no longer than 15 characters, but all Swing look-and-feels assign a variable-width font to JLists. The system look-and-feel will use whatever font the underlying desktop uses for lists, which also is a variable-width font in all desktops I’m aware of.
Which means, the 20-character string IIIIIIIIIIIIIIII.txt and the 20-character string WWWWWWWWWWWWWWWW.txt are not the same width. Truncating each of them to fit in the JList’s space will not be as simple as making them 15 characters long.
Fortunately, you can use a FontMetrics to calculate a string’s visual size.
The simplest, though hardly most efficient, algorithm is to whittle down a string one character at a time until it fits in the JList’s width:
static <T> JList<T> createList(Collection<T> items) {
JList<T> list = new JList<T>(new Vector<T>(items)) {
private static final long serialVersionUID = 1;
@Override
public boolean getScrollableTracksViewportWidth() {
return true;
}
};
list.setCellRenderer(new DefaultListCellRenderer() {
private static final long serialVersionUID = 1;
private Insets insets = new Insets(0, 0, 0, 0);
@Override
public Component getListCellRendererComponent(JList<?> list,
Object value,
int index,
boolean selected,
boolean focused) {
insets = list.getInsets(insets);
int listWidth =
list.getWidth() - insets.left - insets.right - 4;
if (listWidth > 0 &&
value != null &&
!(value instanceof Icon)) {
FontMetrics metrics = list.getFontMetrics(list.getFont());
Graphics g = list.getGraphics();
String text = value.toString();
while (text.length() > 1 &&
metrics.getStringBounds(text, g).getWidth() > listWidth) {
int midpoint = text.length() / 2;
if (text.charAt(midpoint) != '\u2026') {
// Replace center character with ellipsis.
text = text.substring(0, midpoint) + '\u2026'
+ text.substring(midpoint + 1);
} else {
// Remove character before or after ellipsis.
if (text.length() % 2 == 0) {
midpoint--;
} else {
midpoint++;
}
text = text.substring(0, midpoint)
+ text.substring(midpoint + 1);
}
}
value = text;
g.dispose();
}
return super.getListCellRendererComponent(list, value, index,
selected, focused);
}
});
return list;
}
(Notice that “…” is not three period characters, but rather the ellipsis character. What’s the difference? The ellipsis is kerned differently, justified differently, read by screen readers differently, can’t be broken up by word-wrap, and is simply the correct punctuation. You wouldn’t use two apostrophes to represent a double-quote.)
I naïvely start by replacing the center character in each string, regardless of the width of the characters on either side of that character, but a more intelligent and possibly more visually pleasing approach would be to use the character visually located at the center, using TextLayout.hitTestChar.

Implementation of Basic Sliding Window Algorithm in Java

I am attempting to implement the following Basic Sliding Window algorithm in Java. I get the basic idea of it, but I am a bit confused by some the wording, specifically the sentence in bold:
A sliding window of fixed width w is moved across the file,
and at every position k in the file, the fingerprint of
its content is computed. Let k be a chunk boundary
(i.e., Fk mod n = 0). Instead of taking the hash of the
entire chunk, we choose the numerically smallest fingerprint
of a sliding window within this chunk. Then we compute a hash
of this randomly chosen window within the chunk. Intuitively,
this approach would permit small edits within the chunks to
have less impact on the similarity computation. This method
produces a variable length document signature, where the
number of fingerprints in the signature is proportional to
the document length.
Please see my code/results below. Am I understanding the basic idea of the algorithm? As per the text in bold, what does it mean to "choose the numerically smallest fingerprint of a sliding window within its chunk"? I am currently just hashing the entire chunk.
code:
import java.util.ArrayList;
import java.util.List;
public class BSW {
/**
* @param args
*/
public static void main(String[] args) {
int w = 15; // fixed width of sliding window
// The text is concatenated because a Java string literal cannot span lines.
char[] chars = ("Once upon a time there lived in a certain village a little "
+ "country girl, the prettiest creature who was ever seen. Her mother was "
+ "excessively fond of her; and her grandmother doted on her still more. This "
+ "good woman had a little red riding hood made for her. It suited the girl so "
+ "extremely well that everybody called her Little Red Riding Hood.")
.toCharArray();
List<String> fingerprints = new ArrayList<String>();
for (int i = 0; i < chars.length; i = i + w) {
StringBuffer sb = new StringBuffer();
if (i + w < chars.length) {
sb.append(chars, i, w);
System.out.println(i + ". " + sb.toString());
} else {
sb.append(chars, i, chars.length - i);
System.out.println(i + ". " + sb.toString());
}
fingerprints.add(hash(sb));
}
}
private static String hash(StringBuffer sb) {
// Implement hash (MD5)
return sb.toString();
}
}
results:
0. Once upon a tim
15. e there lived i
30. n a certain vil
45. lage a little c
60. ountry girl, th
75. e prettiest cre
90. ature who was e
105. ver seen. Her m
120. other was exces
135. sively fond of
150. her; and her gr
165. andmother doted
180. on her still m
195. ore. This good
210. woman had a lit
225. tle red riding
240. hood made for h
255. er. It suited t
270. he girl so extr
285. emely well that
300. everybody call
315. ed her Little R
330. ed Riding Hood.

That is not a sliding window. All you have done is break up the input into disjoint chunks. An example of a sliding window would be
Once upon a time
upon a time there
a time there lived
etc.
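The word-by-word example above can be reproduced with a short method. This is only a sketch of the sliding idea, assuming words are separated by whitespace; the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordWindows {
    // Returns every run of 'size' consecutive words, advancing one word
    // at a time, so neighbouring windows overlap in size - 1 words.
    static List<String> windows(String text, int size) {
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int start = 0; start + size <= words.length; start++) {
            String[] slice = Arrays.copyOfRange(words, start, start + size);
            result.add(String.join(" ", slice));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String w : windows("Once upon a time there lived", 4)) {
            System.out.println(w);
        }
    }
}
```

The key difference from the code in the question is the loop step: the start index advances by one unit per iteration rather than by the full window width.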

The simple answer is no, as far as I understand it. (I studied the sliding window algorithm years ago, so I remember the principles but not every detail. Correct me if you have a better understanding.)
As the name "sliding window" suggests, your window should slide, not jump. As your quote says:
at every position k in the file, the fingerprint of its content is computed
That is to say, the window slides one character at a time.
As far as I know, chunks and windows are distinct concepts. So are fingerprints and hashes, although they can be the same. Since it is too expensive to compute a full hash as the fingerprint at every position, I think a Rabin fingerprint is a more appropriate choice. A chunk is a large block of text in the document, and a window highlights a small portion of a chunk.
IIRC, the sliding windows algorithm works like this:
The text file is chunked at first;
For each chunk, you slide the window (a 15-char block in your running case) and compute their fingerprint for each window of text;
You now have the fingerprint of the chunk, whose length is proportional to the length of chunk.
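The per-chunk step, together with the "numerically smallest fingerprint" selection from the quoted passage, might look roughly like this. It is a sketch only: String.hashCode() stands in for a real rolling fingerprint such as Rabin's, chunk boundaries are assumed to be found elsewhere, and the names are made up:

```java
final class MinFingerprint {
    // Slides a window of width w one character at a time across the chunk
    // and returns the numerically smallest fingerprint seen.
    // String.hashCode() is only a stand-in for a real rolling fingerprint.
    static int smallestFingerprint(String chunk, int w) {
        if (w <= 0 || chunk.length() < w) {
            throw new IllegalArgumentException("chunk must be at least w characters long");
        }
        int min = Integer.MAX_VALUE;
        for (int k = 0; k + w <= chunk.length(); k++) {
            int fp = chunk.substring(k, k + w).hashCode();
            min = Math.min(min, fp);
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(smallestFingerprint("Once upon a time", 4));
    }
}
```

Recomputing a hash from scratch at every position is exactly the cost a rolling fingerprint avoids, so this version is for illustration rather than large files.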
The next step is how you use the fingerprints to compute the similarity between different documents, which is beyond my knowledge. Could you please give us a pointer to the article you referred to in the OP? In exchange, I recommend this paper, which introduces a variant of the sliding window algorithm to compute document similarity.
Winnowing: local algorithms for document fingerprinting
Another application you can refer to is rsync, which is a data synchronisation tool with block-level (corresponding to your chunk) deduplication. See this short article for how it works.

package com.perturbation;
import java.util.ArrayList;
import java.util.List;
public class BSW {
/**
* @param args
*/
public static void main(String[] args) {
int w = 2; // fixed width of sliding window
char[] chars = "umang shukla"
.toCharArray();
List<String> fingerprints = new ArrayList<String>();
// Advance one character at a time; stop when a full window no longer fits.
for (int i = 0; i + w <= chars.length; i++) {
StringBuffer sb = new StringBuffer();
sb.append(chars, i, w);
System.out.println(i + ". " + sb.toString());
fingerprints.add(hash(sb));
}
}
private static String hash(StringBuffer sb) {
// Implement hash (MD5)
return sb.toString();
}
}
This program may help you. Please try to make it more efficient.

Java update jLabel.setText via for loop

I basically checked out a book from the library and started learning Java. I'm trying to code a little score calculator for my golf league and this site has been a lot of help! So thanks for even being here!
Now to the question:
I have 9 labels, created with the NetBeans GUI builder, with names like jLabel_Hole1, jLabel_Hole2, ...
If a user selects the radio option to play the front nine, those labels show the numbers 1 - 9, and if they change it to the "Back Nine" then they should display 10 - 18. I can manually set each label to the new value on a selection change, but I wanted to know if there was a more elegant way and, if so, if one of you could be kind enough to explain how it works.
Here is the code that I want to try and truncate:
TGL.jLbl_Hole1.setText("10");
TGL.jLbl_Hole2.setText("11");
TGL.jLbl_Hole3.setText("12");
TGL.jLbl_Hole4.setText("13");
TGL.jLbl_Hole5.setText("14");
TGL.jLbl_Hole6.setText("15");
TGL.jLbl_Hole7.setText("16");
TGL.jLbl_Hole8.setText("17");
TGL.jLbl_Hole9.setText("18");
I've read some things about String being immutable and maybe it's just a limitation but I would think there has to be way and I just can't imagine it.
Thanks.

Basically, rather than creating an individual label for each hole, you should create an array of labels, where each element in the array represents an individual hole.
So instead of...
TGL.jLbl_Hole1.setText("10");
TGL.jLbl_Hole2.setText("11");
TGL.jLbl_Hole3.setText("12");
TGL.jLbl_Hole4.setText("13");
TGL.jLbl_Hole5.setText("14");
TGL.jLbl_Hole6.setText("15");
TGL.jLbl_Hole7.setText("16");
TGL.jLbl_Hole8.setText("17");
TGL.jLbl_Hole9.setText("18");
You would have...
for (JLabel label : TGL.holeLabels) {
label.setText(...);
}
A better solution would be to hide the labels from the developer and simply provide a setter...
TGL.setHoleText(hole, text); // hole is int and text is String
Internally to your TGL class, you have two choices...
If you've used the form editor in Netbeans, you're going to have to place the components that Netbeans creates into your own array...
private JLabel[] holes;
//...//
// Some where after initComponents is called...
holes = new JLabel[9];
holes[0] = jLbl_Hole1;
// The other 8 holes...
Then you would simply provide setter and getter methods that can update or return the value...
public void setHole(int hole, String text) {
if (hole >= 0 && hole < holes.length) {
holes[hole].setText(text);
}
}
public String getHole(int hole) {
String text = null;
if (hole >= 0 && hole < holes.length) {
text = holes[hole].getText();
}
return text;
}
Take a closer look at the Arrays tutorial for more details...
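Once the labels live in an array, switching between the front and back nine becomes a single loop. A minimal sketch, assuming a nine-element JLabel array like the holes field above; the class name, method names and the backNine flag are illustrative:

```java
import javax.swing.JLabel;

final class HoleLabels {
    // Text for the nine labels: 1-9 for the front nine, 10-18 for the back nine.
    static String[] holeTexts(boolean backNine) {
        int first = backNine ? 10 : 1;
        String[] texts = new String[9];
        for (int i = 0; i < texts.length; i++) {
            texts[i] = Integer.toString(first + i);
        }
        return texts;
    }

    // Applies the texts to the labels, position by position.
    static void relabel(JLabel[] holes, boolean backNine) {
        String[] texts = holeTexts(backNine);
        for (int i = 0; i < holes.length && i < texts.length; i++) {
            holes[i].setText(texts[i]);
        }
    }
}
```

The radio button's listener would then only need to call relabel(holes, true) or relabel(holes, false), which replaces the nine hand-written setText calls.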

I've never found a Java GUI-generator to provide code that's any good. I may be wrong--there may be a good one, but I always prefer to position and name them myself. So,
/**
* The JLabels for the holes on the golf course.
* <p>
* holeLabels[0][i] are for the outward holes, 1-9.
* holeLabels[1][i] are for the inward holes, 10-18.
*/
private JLabel[][] holeLabels;
/**
* The starts of the outward and inward ranges of holes.
*/
private static final int[] holeStart = {1, 10};
// Later
holeLabels = new JLabel[2][9];
for (int i = 0; i < holeLabels.length; i++) {
for (int j = 0; j < holeLabels[i].length; j++) {
holeLabels[i][j] = new JLabel();
holeLabels[i][j].setText(Integer.toString(holeStart[i] + j));
}
}
Interestingly, holeLabels.length is 2. holeLabels is an array of 2 arrays of 9 JLabels. i goes from 0 to 1, and j goes from 0 to 8, so the text computation works. The reason I did things this way is so you can easily place the labels in an appropriate GridLayout later.

Non-ASCII Character not displaying properly in JFrame or any Swing Component

Let's say we desire to have a non-ASCII character, for example, U+2082 (subscript 2).
Normally, we can display this in a swing component, such as JFrame, as Character.toString('\u2082').
Now, my issue is that I can't determine the exact Unicode code, since the exact code is determined by the String supplied in the parameter. The parameter will always be a polyatomic ion - e.g. PO3. My goal is to find the "3" and turn that into a subscript 3 (U+2083), but also have the algorithm/method be abstracted enough that it will apply to any polyatomic ion (not just PO3, but also PO4 as well), and have it display correctly on a JFrame. I supply my method below.
private static String processName(String original)
{
char[] or = original.toCharArray();
int returned = -1;
for(int i = 0; i < or.length; i++)
{
if(Character.isDigit(or[i]))
{
returned = Integer.parseInt(Character.toString(or[i]));
or[i] = (char) (returned + 2080);
returned = -1;
}
}
return new String(or);
}
You probably are thinking, well, the code looks clean and should display correctly. However, the part (char) (returned+2080) doesn't display the symbol - it displays a blank box. I tried to fix it by setting a compatible font (GNU Unifont), but that didn't do anything. Any ideas?

2083 is a hex value, not decimal. See the Unicode page for details about this character. I think the value you want is 0x2083, or 8323 in decimal, so the offset in your code should be 0x2080 (8320) rather than 2080.
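Applied to the method from the question, the fix might look like the sketch below. It assumes only the ASCII digits 0-9 need converting; whether the result displays still depends on the font having glyphs for U+2080 to U+2089. The class name is made up:

```java
final class Subscripts {
    // Replaces each ASCII digit 0-9 with the matching subscript digit,
    // U+2080 through U+2089. Note the hexadecimal offset 0x2080.
    static String processName(String original) {
        char[] chars = original.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] >= '0' && chars[i] <= '9') {
                chars[i] = (char) (0x2080 + (chars[i] - '0'));
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(processName("PO3"));
        System.out.println(processName("PO4"));
    }
}
```

Subtracting '0' also avoids the round trip through Integer.parseInt in the original.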
