I'm trying to solve an interview problem I was given a few years ago, in preparation for upcoming interviews. The problem is outlined in a PDF here. I wrote a simple solution using DFS that works fine for the example outlined in the document, but I haven't been able to get the program to meet this criterion:
Your code should produce correct answers in under a second for a
10,000 x 10,000 Geo GeoBlock containing 10,000 occupied Geos.
To test this, I generated a CSV file with 10,000 random entries. When I run the code against it, it takes just over 2 seconds on average to find the largest geo block. I'm not sure what improvements could be made to my approach to cut the runtime by more than half, other than running it on a faster laptop. From my investigations, the search itself only seems to take about 8 ms, so perhaps the way I load the data into memory is the inefficient part?
I'd greatly appreciate any advice on how this could be improved. See code below:
GeoBlockAnalyzer
package analyzer.block.geo.main;
import analyzer.block.geo.model.Geo;
import analyzer.block.geo.result.GeoResult;
import java.awt.*;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.*;
public class GeoBlockAnalyzer {
private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
private final int width;
private final int height;
private final String csvFilePath;
private GeoResult result = new GeoResult();
// Map of the geo id and respective geo object
private final Map<Integer, Geo> geoMap = new HashMap<>();
// Map of coordinates to each geo in the grid
private final Map<Point, Geo> coordMap = new HashMap<>();
/**
* Constructs a geo grid of the given width and height, populated with the geo data provided in
* the csv file
*
* @param width the width of the grid
* @param height the height of the grid
* @param csvFilePath the csv file containing the geo data
* @throws IOException
*/
public GeoBlockAnalyzer(final int width, final int height, final String csvFilePath)
throws IOException {
if (!Files.exists(Paths.get(csvFilePath)) || Files.isDirectory(Paths.get(csvFilePath))) {
throw new FileNotFoundException(csvFilePath);
}
if (width <= 0 || height <= 0) {
throw new IllegalArgumentException("Input height or width is 0 or smaller");
}
this.width = width;
this.height = height;
this.csvFilePath = csvFilePath;
populateGeoGrid();
populateCoordinatesMap();
calculateGeoNeighbours();
// printNeighbours();
}
/** @return the largest geo block in the input grid */
public GeoResult getLargestGeoBlock() {
for (final Geo geo : this.geoMap.values()) {
final List<Geo> visited = new ArrayList<>();
search(geo, visited);
}
return this.result;
}
/**
* Iterative DFS implementation to find largest geo block.
*
* @param geo the geo to be evaluated
* @param visited list of visited geos
*/
private void search(Geo geo, final List<Geo> visited) {
final Deque<Geo> stack = new LinkedList<>();
stack.push(geo);
while (!stack.isEmpty()) {
geo = stack.pop();
if (visited.contains(geo)) {
continue;
}
visited.add(geo);
final List<Geo> neighbours = geo.getNeighbours();
for (int i = neighbours.size() - 1; i >= 0; i--) {
final Geo g = neighbours.get(i);
if (!visited.contains(g)) {
stack.push(g);
}
}
}
if (this.result.getSize() < visited.size()) {
this.result = new GeoResult(visited);
}
}
/**
* Creates a map of the geo grid from the csv file data
*
* @throws IOException
*/
private void populateGeoGrid() throws IOException {
try (final BufferedReader br = Files.newBufferedReader(Paths.get(this.csvFilePath))) {
int lineNumber = 0;
String line = "";
while ((line = br.readLine()) != null) {
lineNumber++;
final String[] geoData = line.split(",");
LocalDate dateOccupied = null;
// Handle for empty csv cells
for (int i = 0; i < geoData.length; i++) {
// Remove leading and trailing whitespace
geoData[i] = geoData[i].trim();
if (geoData[i].isEmpty() || geoData.length > 3) {
throw new IllegalArgumentException(
"There is missing data in the csv file at line: " + lineNumber);
}
}
try {
dateOccupied = LocalDate.parse(geoData[2], formatter);
} catch (final DateTimeParseException e) {
throw new IllegalArgumentException("The input date is invalid on line: " + lineNumber);
}
this.geoMap.put(
Integer.parseInt(geoData[0]),
new Geo(Integer.parseInt(geoData[0]), geoData[1], dateOccupied));
}
}
}
/** Create a map of each coordinate in the grid to its respective geo */
private void populateCoordinatesMap() {
// Using the geo id, calculate its point on the grid
for (int i = this.height - 1; i >= 0; i--) {
int blockId = (i * this.width);
for (int j = 0; j < this.width; j++) {
if (this.geoMap.containsKey(blockId)) {
final Geo geo = this.geoMap.get(blockId);
geo.setCoordinates(i, j);
this.coordMap.put(geo.getCoordinates(), geo);
}
blockId++;
}
}
}
private void calculateGeoNeighbours() {
for (final Geo geo : this.geoMap.values()) {
addNeighboursToGeo(geo);
}
}
private void addNeighboursToGeo(final Geo geo) {
final int x = geo.getCoordinates().x;
final int y = geo.getCoordinates().y;
final Point[] possibleNeighbours = {
new Point(x, y + 1), new Point(x - 1, y), new Point(x + 1, y), new Point(x, y - 1)
};
Geo g;
for (final Point p : possibleNeighbours) {
if (this.coordMap.containsKey(p)) {
g = this.coordMap.get(p);
if (g != null) {
geo.getNeighbours().add(g);
}
}
}
}
private void printNeighbours() {
for (final Geo geo : this.geoMap.values()) {
System.out.println("Geo " + geo.getId() + " has the following neighbours: ");
for (final Geo g : geo.getNeighbours()) {
System.out.println(g.getId());
}
}
}
}
GeoResult
package analyzer.block.geo.result;
import analyzer.block.geo.model.Geo;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
public class GeoResult {
private final List<Geo> geosInBlock = new ArrayList<>();
public GeoResult() {
}
public GeoResult(final List<Geo> geosInBlock) {
this.geosInBlock.addAll(geosInBlock);
}
public List<Geo> getGeosInBlock() {
this.geosInBlock.sort(Comparator.comparingInt(Geo::getId));
return this.geosInBlock;
}
public int getSize() {
return this.geosInBlock.size();
}
@Override
public String toString() {
final StringBuilder sb = new StringBuilder();
sb.append("The geos in the largest cluster of occupied Geos for this GeoBlock are: \n");
for(final Geo geo : this.geosInBlock) {
sb.append(geo.toString()).append("\n");
}
return sb.toString();
}
}
Geo
package analyzer.block.geo.model;
import java.awt.Point;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
public class Geo {
private final int id;
private final String name;
private final LocalDate dateOccupied;
private final Point coordinate;
private final List<Geo> neighbours = new ArrayList<>();
public Geo (final int id, final String name, final LocalDate dateOccupied) {
this.id = id;
this.name = name;
this.dateOccupied = dateOccupied;
this.coordinate = new Point();
}
public int getId() {
return this.id;
}
public String getName() {
return this.name;
}
public LocalDate getDateOccupied() {
return this.dateOccupied;
}
public void setCoordinates(final int x, final int y) {
this.coordinate.setLocation(x, y);
}
public Point getCoordinates() {
return this.coordinate;
}
public String toString() {
return this.id + ", " + this.name + ", " + this.dateOccupied;
}
public List<Geo> getNeighbours() {
return this.neighbours;
}
@Override
public int hashCode() {
return Objects.hash(this.id, this.name, this.dateOccupied);
}
@Override
public boolean equals(final Object obj) {
if(this == obj) {
return true;
}
if(obj == null || this.getClass() != obj.getClass()) {
return false;
}
final Geo geo = (Geo) obj;
return this.id == geo.getId() &&
this.name.equals(geo.getName()) &&
Objects.equals(this.dateOccupied, geo.getDateOccupied());
}
}
The major optimization available here is a conceptual one. Unfortunately, this type of optimization is neither easy to teach nor easy to look up in a reference. The principle being used here is:
It's (almost always) cheaper to use an analytic formula to compute a known result than to (pre)compute it. [1]
It's clear from your code and the definition of your problem that you are not taking advantage of this principle or of the problem specification. In particular, one of the key points taken directly from the problem specification is this:
Your code should produce correct answers in under a second for a 10,000 x 10,000 Geo GeoBlock containing 10,000 occupied Geos.
When you read this statement a few things should be going through your mind (when thinking about runtime efficiency):
10,000^2 is a much larger number than 10,000 (exactly 10,000 times larger!). There is a clear efficiency gain if you can keep the algorithm O(n) as opposed to O(n^2) (in the expected case, because of the use of hashing).
Touching the entire grid (i.e. performing even a single O(1) operation per cell) immediately yields an O(n^2) algorithm; clearly, this is something that must be avoided if possible.
From the problem statement, we should never expect O(n^2) geos that need to be touched. This should be a major hint as to what the person who wrote the problem is looking for. BFS or DFS is an O(N+M) algorithm, where N and M are the numbers of nodes and edges touched. Thus, we should expect an O(n) search.
Based on the above points, it is clear that the solution being looked for here should be O(10,000) for a problem input with a 10,000 x 10,000 grid and 10,000 geos.
The solution you provided is O(n^2) because,
(1) You use visited.contains where visited is a List, so every membership check is linear in the size of the block found so far. I suspect this is not showing up in your testing as a problem area because you are using small geo clusters. Try a large geo cluster (one with 10,000 geos); you should see a major slowdown compared to, say, a largest cluster of 3 geos. The solution here is to use an efficient data structure for visited; options that come to mind are a bit set (Java has java.util.BitSet) or a hash set (e.g. java.util.HashSet). The fact that you did not notice this in testing suggests to me you are not vetting/testing your code well enough with enough varied examples of the corner cases you expect. This should have come up immediately in any thorough testing/profiling of your code. As per my comment, I would have liked to see this type of groundwork/profiling done before the question was posted.
(2) You touch the entire 10,000 x 10,000 grid in populateCoordinatesMap. That alone is already O(n^2), where n = 10,000. Notice that the only place coordMap is used outside of populateCoordinatesMap is addNeighboursToGeo. This is a major bottleneck, and for no reason: the neighbours in addNeighboursToGeo can be computed in O(1) time without the need for a coordMap. However, we can still use your code as is with the minor modification given below.
I hope it is obvious how to fix (1). To fix (2), replace populateCoordinatesMap with:
/** Create a map of each coordinate in the grid to its respective geo */
private void populateCoordinatesMap() {
    // Visit only the occupied geos; each id is converted to (x, y) arithmetically
    for (Map.Entry<Integer, Geo> entry : geoMap.entrySet()) {
        int key = entry.getKey();
        Geo value = entry.getValue();
        int x = key % this.width;
        int y = key / this.width;
        value.setCoordinates(x, y);
        this.coordMap.put(value.getCoordinates(), value);
    }
}
Notice the principle being put to use here. Instead of iterating over the entire grid as you were doing before (O(n^2) immediately), this iterates only over the occupied Geos, and uses the analytic formula for indexing a 2D array (as opposed to doing copious computation to compute the same thing.) Effectively, this change improves populateCoordinatesMap from being O(n^2) to being O(n).
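For fix (1), and for the remark that neighbours can be found in O(1) without coordMap, here is a minimal sketch in plain Java. It works on bare integer IDs rather than the question's Geo objects, assumes row-major IDs (id = row * width + col, as in the question's populateCoordinatesMap), and the class and method names are hypothetical. One design choice goes slightly beyond the answer: a single visited HashSet is shared across all start points, so each occupied geo is explored once overall rather than once per member of its block.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LargestBlockSketch {

    // Neighbour IDs of `id` in a row-major grid (id = row * width + col),
    // computed arithmetically instead of through a coordinate map.
    static List<Integer> neighbourIds(int id, int width, int height) {
        List<Integer> out = new ArrayList<>(4);
        int col = id % width;
        if (col > 0) out.add(id - 1);                        // same row, previous column
        if (col < width - 1) out.add(id + 1);                // same row, next column
        if (id >= width) out.add(id - width);                // previous row
        if (id < width * (height - 1)) out.add(id + width);  // next row
        return out;
    }

    // Largest 4-connected block of occupied IDs, found with an iterative DFS.
    // Every occupied ID is pushed at most once, so the work is proportional
    // to the number of occupied IDs, not to the grid area.
    static List<Integer> largestBlock(Set<Integer> occupied, int width, int height) {
        Set<Integer> visited = new HashSet<>();
        List<Integer> best = new ArrayList<>();
        for (int start : occupied) {
            if (!visited.add(start)) continue;  // already part of an earlier block
            List<Integer> block = new ArrayList<>();
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                int id = stack.pop();
                block.add(id);
                for (int n : neighbourIds(id, width, height)) {
                    // visited.add returns false if n was already seen
                    if (occupied.contains(n) && visited.add(n)) {
                        stack.push(n);
                    }
                }
            }
            if (block.size() > best.size()) best = block;
        }
        return best;
    }

    public static void main(String[] args) {
        // Sample occupied IDs on a 4-wide, 7-high grid
        Set<Integer> occupied = new HashSet<>(Arrays.asList(4, 5, 6, 11, 13, 15, 17, 21, 22));
        List<Integer> block = largestBlock(occupied, 4, 7);
        block.sort(null);
        System.out.println(block);  // [13, 17, 21, 22]
    }
}
```

Swapping the HashSet for a java.util.BitSet indexed by ID would also work; for a 10,000 x 10,000 grid that is a 100,000,000-bit set, roughly 12.5 MB.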
Some general & opinionated comments below:
Overall, I strongly disagree with using an object oriented approach over a procedural one for this problem. I think the OO approach is completely unjustified for how simple this code should be, but I understand that the interviewer wanted to see it.
This is a very simple problem you are trying to solve, and I think the object-oriented approach you took here confounds it so much that you could not see the forest for the trees (or perhaps the trees for the forest). A much simpler approach could have been taken in how this algorithm was implemented, even using an object-oriented approach.
It's clear from the points above that you could benefit from knowing the available tools in the language you are working in. By this I mean you should know what containers are readily available and what the trade-offs are for each operation on each container. You should also know at least one decent profiling tool for the language you are working with if you are going to be optimizing code. Given that you did not post a profiling summary, even after I asked for it, I suspect you do not know of such a tool for Java. Learn one.
[1] I provide no reference for this principle because it is a first principle: running fewer constant-time operations is cheaper than running many. The assumption here is that the known analytic form requires less computation. There are occasional exceptions to this rule, but it should be stressed that such exceptions are almost always due to hardware limitations or advantages. For example, when computing the Hamming distance, it is cheaper to use a precomputed LUT (lookup table) for the population count on a hardware architecture without access to SSE registers/operations.
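The footnote's Hamming-distance example can be made concrete. Below is a minimal sketch of the lookup-table approach in Java: a 256-entry table of per-byte bit counts and four lookups per 32-bit word. The names are hypothetical. Java also offers Integer.bitCount, which HotSpot can compile to a hardware popcount instruction where the CPU has one, so the table is mainly of interest in the situation the footnote describes.

```java
public class PopcountLutSketch {
    // BITS_IN_BYTE[b] = number of set bits in the byte value b (0..255)
    private static final int[] BITS_IN_BYTE = new int[256];

    static {
        for (int i = 1; i < 256; i++) {
            // bits(i) = lowest bit + bits of i shifted right by one
            BITS_IN_BYTE[i] = (i & 1) + BITS_IN_BYTE[i >>> 1];
        }
    }

    // Population count of a 32-bit int using four table lookups, one per byte.
    static int popcountLut(int x) {
        return BITS_IN_BYTE[x & 0xFF]
                + BITS_IN_BYTE[(x >>> 8) & 0xFF]
                + BITS_IN_BYTE[(x >>> 16) & 0xFF]
                + BITS_IN_BYTE[x >>> 24];
    }

    // Hamming distance: the number of bit positions in which a and b differ.
    static int hammingDistance(int a, int b) {
        return popcountLut(a ^ b);
    }

    public static void main(String[] args) {
        System.out.println(hammingDistance(0b1011, 0b0110));    // 3
        System.out.println(Integer.bitCount(0b1011 ^ 0b0110));  // 3, library version
    }
}
```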
Without testing, it seems to me that the main bottleneck here is the literal creation of the map, which could have up to 100,000,000 cells. There would be no need for that if instead we labeled each CSV entry and had a function getNeighbours(id, width, height) that returned the list of possible neighbour IDs (think modular arithmetic). As we iterate over each CSV entry in turn: (1) if neighbour IDs were already seen and all had the same label, we'd label the new ID with that label; (2) if no neighbours were seen, we'd use a new label for the new ID; and (3) if two or more different labels existed among the seen neighbour IDs, we'd combine them into one label (say the minimal label) by having a hash that mapped a label to its "final" label. Also store the sum and size for each label. Your current solution is O(n), where n is width x height. The idea here would be O(n), where n is the number of occupied Geos.
Here's something really crude in Python that I wouldn't expect to have all scenarios handled but could hopefully give you an idea (sorry, I don't know Java):
def get_neighbours(id, width, height):
    neighbours = []
    if id % width != 0:
        neighbours.append(id - 1)
    if (id + 1) % width != 0:
        neighbours.append(id + 1)
    if id - width >= 0:
        neighbours.append(id - width)
    if id + width < width * height:
        neighbours.append(id + width)
    return neighbours

def f(data, width, height):
    ids = {}
    labels = {}
    current_label = 0
    for line in data:
        [idx, name, dt] = line.split(",")
        idx = int(idx)
        this_label = None
        neighbours = get_neighbours(idx, width, height)
        no_neighbour_was_seen = True
        for n in neighbours:
            # A neighbour was seen
            if n in ids:
                no_neighbour_was_seen = False
                # We have yet to assign a label to this ID
                # (compare with None: label 0 is falsy)
                if this_label is None:
                    this_label = ids[n]["label"]
                    ids[idx] = {"label": this_label, "data": name + " " + dt}
                    final_label = labels[this_label]["label"]
                    labels[final_label]["size"] += 1
                    labels[final_label]["sum"] += idx
                    labels[final_label]["IDs"] += [idx]
                # This neighbour has yet to be connected
                elif ids[n]["label"] != this_label:
                    old_label = ids[n]["label"]
                    old_obj = labels[old_label]
                    final_label = labels[this_label]["label"]
                    # Relabel every ID of the old group, not only n
                    for m in old_obj["IDs"]:
                        ids[m]["label"] = final_label
                    labels[final_label]["size"] += old_obj["size"]
                    labels[final_label]["sum"] += old_obj["sum"]
                    labels[final_label]["IDs"] += old_obj["IDs"]
                    del labels[old_label]
        if no_neighbour_was_seen:
            this_label = current_label
            current_label += 1
            ids[idx] = {"label": this_label, "data": name + " " + dt}
            labels[this_label] = {"label": this_label, "size": 1, "sum": idx, "IDs": [idx]}
    for i in ids:
        print i, ids[i]["label"], ids[i]["data"]
    print ""
    for i in labels:
        print i
        print labels[i]
    return labels, ids
data = [
"4, Tom, 2010-10-10",
"5, Katie, 2010-08-24",
"6, Nicole, 2011-01-09",
"11, Mel, 2011-01-01",
"13, Matt, 2010-10-14",
"15, Mel, 2011-01-01",
"17, Patrick, 2011-03-10",
"21, Catherine, 2011-02-25",
"22, Michael, 2011-02-25"
]
f(data, 4, 7)
print ""
f(data, 7, 4)
Output:
"""
4 0 Tom 2010-10-10
5 0 Katie 2010-08-24
6 0 Nicole 2011-01-09
11 1 Mel 2011-01-01
13 2 Matt 2010-10-14
15 1 Mel 2011-01-01
17 2 Patrick 2011-03-10
21 2 Catherine 2011-02-25
22 2 Michael 2011-02-25
0
{'sum': 15, 'size': 3, 'IDs': [4, 5, 6], 'label': 0}
1
{'sum': 26, 'size': 2, 'IDs': [11, 15], 'label': 1}
2
{'sum': 73, 'size': 4, 'IDs': [13, 17, 21, 22], 'label': 2}
---
4 0 Tom 2010-10-10
5 0 Katie 2010-08-24
6 0 Nicole 2011-01-09
11 0 Mel 2011-01-01
13 0 Matt 2010-10-14
15 3 Mel 2011-01-01
17 2 Patrick 2011-03-10
21 3 Catherine 2011-02-25
22 3 Michael 2011-02-25
0
{'sum': 39, 'size': 5, 'IDs': [4, 5, 6, 11, 13], 'label': 0}
2
{'sum': 17, 'size': 1, 'IDs': [17], 'label': 2}
3
{'sum': 58, 'size': 3, 'IDs': [21, 22, 15], 'label': 3}
"""
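For readers who want this idea in Java: tracking which label each ID belongs to and merging labels when a new ID touches several of them is essentially a disjoint-set (union-find) structure. The sketch below swaps in a standard union-find with path halving and union by size rather than translating the Python line for line. The class and method names are hypothetical, and it assumes row-major IDs (id = row * width + col) with each ID appearing only once in the input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GeoUnionFindSketch {
    private final int width;
    private final int height;
    private final Map<Integer, Integer> parent = new HashMap<>();
    private final Map<Integer, Integer> size = new HashMap<>();  // kept only for set roots

    GeoUnionFindSketch(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // Root of x's set; path halving keeps later lookups short.
    int find(int x) {
        int p;
        while ((p = parent.get(x)) != x) {
            int gp = parent.get(p);
            parent.put(x, gp);  // point x at its grandparent
            x = gp;
        }
        return x;
    }

    // Merge the sets containing a and b, attaching the smaller set to the larger.
    void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (size.get(ra) < size.get(rb)) { int t = ra; ra = rb; rb = t; }
        parent.put(rb, ra);
        size.put(ra, size.get(ra) + size.get(rb));
        size.remove(rb);
    }

    // Add one occupied geo and merge it with any already-added 4-neighbours.
    void add(int id) {
        parent.put(id, id);
        size.put(id, 1);
        int col = id % width;
        if (col > 0) unionIfPresent(id, id - 1);
        if (col < width - 1) unionIfPresent(id, id + 1);
        if (id >= width) unionIfPresent(id, id - width);
        if (id < width * (height - 1)) unionIfPresent(id, id + width);
    }

    private void unionIfPresent(int a, int b) {
        if (parent.containsKey(b)) union(a, b);
    }

    // Sorted IDs of the largest set (on ties, whichever root is seen first).
    List<Integer> largestBlock() {
        int bestRoot = -1, best = 0;
        for (Map.Entry<Integer, Integer> e : size.entrySet()) {
            if (e.getValue() > best) { best = e.getValue(); bestRoot = e.getKey(); }
        }
        List<Integer> out = new ArrayList<>();
        for (int id : new ArrayList<>(parent.keySet())) {
            if (find(id) == bestRoot) out.add(id);
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        GeoUnionFindSketch uf = new GeoUnionFindSketch(7, 4);
        for (int id : new int[] {4, 5, 6, 11, 13, 15, 17, 21, 22}) uf.add(id);
        System.out.println(uf.largestBlock());  // [4, 5, 6, 11, 13]
    }
}
```

Each add does at most four near-constant-time unions, so the total work grows with the number of occupied geos rather than the grid area.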
I am trying out easyModbus TCP to read the holding registers of an ADAM 6017 analog unit. I am using the example client code to get familiar with this library. The problem I seem to be having is that I need the register values to be read as unsigned, but the library gives me signed values. I have 3.3548 V attached to the unit, the scale is set to 0-5 V, and it outputs -10781 from the ADC. Here is what I have written:
package modbus.logger;
import de.re.easymodbus.modbusclient.*;
import java.lang.*;
/**
*
* @author Michael Haire
* SJVAPCD
*/
public class ModbusLogger {
public static void main(String[] args)
{
float volt;
int Input;
int x = 1;
float input;
ModbusClient modbusClient = new ModbusClient("192.168.1.201",502);
try
{ while(x>0){
modbusClient.Connect();
System.out.print("Raw ADC Value: ");System.out.println(modbusClient.ReadHoldingRegisters(0, 1)[0]);
Input = modbusClient.ReadHoldingRegisters(0, 1)[0];
System.out.print("Input: ");System.out.println(Input);
input = (float) Input;
volt = (float) ((input / 65536)*5.0);
System.out.print("Voltage: ");System.out.printf("%f%n" , volt);System.out.println("");
}}
catch (Exception e){
}
}
}
What should I do to get an unsigned value?
If I'm not mistaken the manual of your device:
https://www.i-components.fi/pdf/76-ADAM-6066-CE.pdf
says (page 268) that your channel 0 analog value should be in register 40001.
That means you should be using modbusClient.ReadHoldingRegisters instead of input registers.
Sometimes it's useful to check your device before you get busy writing code. To do that you can use modpoll (https://www.modbusdriver.com/modpoll.html) or something like QModMaster (https://sourceforge.net/projects/qmodmaster/).
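Whichever register turns out to be the right one, the signed number itself can be reinterpreted as unsigned in plain Java: the 16 register bits are the same, only their interpretation differs. A minimal sketch, assuming ReadHoldingRegisters hands back each register as an int holding a signed 16-bit value (which the -10781 in the question suggests); in the question's loop this would amount to `Input = modbusClient.ReadHoldingRegisters(0, 1)[0] & 0xFFFF;`. The class name is hypothetical, and how the unsigned count maps to volts should be checked against the ADAM-6017 manual.

```java
public class UnsignedRegisterSketch {

    // Keep only the low 16 bits, turning a register delivered as a signed
    // 16-bit value (-32768..32767) into its unsigned reading (0..65535).
    static int toUnsigned16(int signedRegister) {
        return signedRegister & 0xFFFF;
    }

    public static void main(String[] args) {
        int raw = -10781;  // the signed value reported in the question
        System.out.println(toUnsigned16(raw));                 // 54755
        System.out.println(Short.toUnsignedInt((short) raw));  // same result, Java 8+
    }
}
```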
I'm creating a sort of messaging program where users can send messages to each other. I would like it to look like any social messaging app, where the most recently received message appears at the bottom of the screen and previous messages are shown above it (older messages appear higher up). Is there a way to print a message (String) to a specific line in the terminal, or to the bottom line?
Mac OS X
Oh, the days of yore! I remember when every village lad who dreamed of becoming a Wizard knew his ANSI control codes by heart. Surely those byte sequences allowed those in the know to produce awesome visions and bright colors of exotic places on text displays that lesser folk thought of only as dumb terminals.
But those days are gone, and we have high-resolution displays and 3D graphics, mice and touchscreens. Time to go, you old fool! But wait! In every terminal window (we are not speaking about Windows now) there beats the golden heart of an old VT100 terminal. So yes, it is possible to control cursor positions and scroll areas with Java console output.
The idea is that the terminal is not so dumb after all. It just waits for escape sequences telling it what special thing to do. Instructions start with the escape character, the non-printable ASCII character 27, usually written in hexadecimal as 0x1b or in octal as 033. After that, some human-readable stuff follows: most often an opening bracket [, some numerical information and a letter. If more numbers are needed, they are separated by semicolons.
For instance, to change font color to red, you output sequence <ESC>[31m like this:
System.out.println("\033[31mThis should appear red");
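The general shape described above (ESC, an opening bracket, semicolon-separated numbers, a final letter) can be wrapped in a small helper so the numbers are not spliced into strings by hand. A minimal sketch; the class and method names are hypothetical, while 31m (red), 0m (reset), 2J (clear screen) and H (cursor position) are the standard sequences also used in the demo in this answer.

```java
import java.util.StringJoiner;

public class CsiSketch {
    static final char ESC = 0x1b;  // ASCII 27, written "\033" in Java string literals

    // Builds ESC [ n1 ; n2 ; ... followed by the final letter.
    static String csi(char finalLetter, int... params) {
        StringJoiner numbers = new StringJoiner(";");
        for (int p : params) {
            numbers.add(Integer.toString(p));
        }
        return ESC + "[" + numbers + finalLetter;
    }

    public static void main(String[] args) {
        // 31m = red foreground, 0m = reset attributes
        System.out.println(csi('m', 31) + "This should appear red" + csi('m', 0));
        // 2J = clear screen, then H = move the cursor to row 5, column 10
        System.out.print(csi('J', 2) + csi('H', 5, 10));
        System.out.println("Hello from row 5, column 10");
    }
}
```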
You'd better open a separate terminal window for fiddling with the codes, because the display easily gets garbled. In fact, the colors of the shell prompt and directory listings are implemented with these codes and can easily be personalized.
If you want to keep areas at the top and bottom of the window from scrolling, there's a code for that: <ESC>[<top>;<bottom>r, where you declare the top and bottom lines between which scrolling happens, for instance \033[2;22r
Links:
Wikipedia's quite exhaustive article: ANSI escape code
https://en.wikipedia.org/wiki/ANSI_escape_code
Clear list of the codes: http://www.termsys.demon.co.uk/vtansi.htm
And of course an answer is not an answer without running code. Right? I have to admit I got a little carried away with this, although it felt strange to use Java for terminal sequences; K&R C or COMMODORE BASIC would have been more fitting. I made a very simple demo, but slowed the printing speed down to the level of an old 1200 baud modem so you can easily observe what's happening. Set SPEED to 0 to disable the delay.
Escape sequences are the same kind of markup as HTML text formatting tags, and best of all, BLINK is still there!
Save as TerminalDemo.java, compile with javac TerminalDemo.java and run with command: java TerminalDemo
import java.io.*;
public class TerminalDemo {
// Default speed in bits per second for serial line simulation
// Set 0 to disable
static final int SPEED = 1200;
// ANSI Terminal codes
static final String ESC = "\033";
static String clearScreen() { return "\033[2J"; }
static String cursorHome() { return "\033[H"; }
static String cursorTo(int row, int column) {
return String.format("\033[%d;%dH", row, column);
}
static String cursorSave() { return "\033[s"; }
static String cursorRestore() { return "\033[u"; }
static String scrollScreen() { return "\033[r"; }
static String scrollSet(int top, int bottom) {
return String.format("\033[%d;%dr", top, bottom);
}
static String scrollUp() { return "\033D"; }    // IND: scrolls the text up at the bottom margin
static String scrollDown() { return "\033M"; }  // RI: scrolls the text down at the top margin
static String setAttribute(int attr) {
return String.format("\033[%dm", attr);
}
static final int ATTR_RESET = 0;
static final int ATTR_BRIGHT = 1;
static final int ATTR_USCORE = 4;
static final int ATTR_BLINK = 5;
static final int ATTR_REVERSE = 7;
static final int ATTR_FCOL_BLACK = 30;
static final int ATTR_FCOL_RED = 31;
static final int ATTR_FCOL_GREEN = 32;
static final int ATTR_FCOL_YELLOW = 33;
static final int ATTR_BCOL_BLACK = 40;
static final int ATTR_BCOL_RED = 41;
static final int ATTR_BCOL_GREEN = 42;
public static void main(String[] args) {
// example string showing some text attributes
String s = "This \033[31mstring\033[32m should \033[33mchange \033[33m color \033[41m and start \033[5m blinking!\033[0m Isn't that neat?\n";
// Reset scrolling, clear screen and bring cursor home
System.out.print(clearScreen());
System.out.print(scrollScreen());
// Print example string s
slowPrint(s);
// some text attributes
slowPrint("This "
+ setAttribute(ATTR_USCORE) + "should be underscored\n"
+ setAttribute(ATTR_RESET)
+ setAttribute(ATTR_FCOL_RED) + "and this red\n"
+ setAttribute(ATTR_RESET)
+ "some "
+ setAttribute(ATTR_BRIGHT)
+ setAttribute(ATTR_FCOL_YELLOW)
+ setAttribute(ATTR_BLINK) + "BRIGHT YELLOW BLINKIN\n"
+ setAttribute(ATTR_RESET)
+ "could be fun.\n\n"
+ "Please press ENTER");
// Wait for ENTER
try { System.in.read(); } catch(IOException e) {e.printStackTrace();}
// Set scroll area
slowPrint(""
+ clearScreen()
+ scrollSet(2,20)
+ cursorTo(1,1)
+ "Cleared screen and set scroll rows Top: 2 and Bottom: 20\n"
+ cursorTo(21,1)
+ "Bottom area starts here"
+ cursorTo(2,1)
+ "");
// print some random text
slowPrint(randomText(60));
// reset text attributes, reset scroll area and set cursor
// below scroll area
System.out.print(setAttribute(ATTR_RESET));
System.out.print(scrollScreen());
System.out.println(cursorTo(22,1));
}
// Slow things down to resemble old serial terminals
private static void slowPrint(String s) {
slowPrint(s, SPEED);
}
private static void slowPrint(String s, int bps) {
for (int i = 0; i < s.length(); i++) {
System.out.print(s.charAt(i));
if(bps == 0) continue;
try { Thread.sleep((int)(8000.0 / bps)); }
catch(InterruptedException ex)
{ Thread.currentThread().interrupt(); }
}
}
// Returns a character representation of a sine graph
private static String randomText(int lines) {
String r = "";
for(int i=0; i<lines; i++) {
int sin = (int)Math.abs((Math.sin(1.0/20 * i)*30));
r += setAttribute((sin / 4) + 30);
for(int j=0; j<80; j++) {
if(j > 40 + sin)
break;
r += (j < (40-sin)) ? " " : "X";
}
r += setAttribute(ATTR_RESET) + "\n";
}
return r;
}
}
I am trying to fill in the code for the updateLetterCount() method, which should be quite similar to the updateDigramCount() method. However, I am stuck. Any suggestions would be helpful! I am having trouble with defining the letter variable, because I know it has to be defined for the Map. Any idea about how to go about doing so?
// imports of classes used for creating GUI
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import javax.swing.border.*;
// imports related to reading/writing files
import java.io.File;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.IOException;
// imports of general-purpose data-structures
import java.util.TreeMap;
import java.util.Map;
import java.util.Set;
/**
* WordCount
*
* WordCount is an application for analyzing the occurrence of words, letters,
* and letter-combinations that are found in a text block.
*
* The text block can be either pasted into a window, or it can be loaded from a
* text-file.
*
*/
public class WordCount extends JFrame {
//-------------------------------------------------------------------------------------------------------
public static final String startString =
"Infrequently Asked Questions\n"
+ "\n"
+ " 1. Why does my country have the right to be occupying Iraq?\n"
+ " 2. Why should my country not support an international court of justice?\n"
+ " 3. Is my country not strong enough to achieve its aims fairly?\n"
+ " 4. When the leaders of a country cause it to do terrible things, what is the best way to restore the honor of that country?\n"
+ " 5. Is it possible for potential new leaders to raise questions about their country's possible guilt, without committing political suicide?\n"
+ " 6. Do I deserve retribution from aggrieved people whose lives have been ruined by actions that my leaders have taken without my consent?\n"
+ " 7. How can I best help set in motion a process by which reparations are made to people who have been harmed by unjust deeds of my country?\n"
+ " 8. If day after day goes by with nobody discussing uncomfortable questions like these, won't the good people of my country be guilty of making things worse?\n"
+ "\n"
+ "Alas, I cannot think of a satisfactory answer to any of these questions. I believe the answer to number 6 is still no; yet I fear that a yes answer is continually becoming more and more appropriate, as month upon month goes by without any significant change to the status quo.\n"
+ "\n"
+ "Perhaps the best clues to the outlines of successful answers can be found in a wonderful speech that Richard von Weizsäcker gave in 1985:\n"
+ "\n"
+ " > The time in which I write ... has a horribly swollen belly, it carries in its womb a national catastrophe ... Even an ignominious issue remains something other and more normal than the judgment that now hangs over us, such as once fell on Sodom and Gomorrah ... That it approaches, that it long since became inevitable: of that I cannot believe anybody still cherishes the smallest doubt. ... That it remains shrouded in silence is uncanny enough. It is already uncanny when among a great host of the blind some few who have the use of their eyes must live with sealed lips. But it becomes sheer horror, so it seems to me, when everybody knows and everybody is bound to silence, while we read the truth from each other in eyes that stare or else shun a meeting. \n"
+ " >\n"
+ " > Germany ... today, clung round by demons, a hand over one eye, with the other staring into horrors, down she flings from despair to despair. When will she reach the bottom of the abyss? When, out of uttermost hopelessness --- a miracle beyond the power of belief --- will the light of hope dawn? A lonely man folds his hands and speaks: ``God be merciful to thy poor soul, my friend, my Fatherland!'' \n"
+ " >\n"
+ " > -- Thomas Mann, Dr. Faustus (1947, written in 1945)\n"
+ " > [excerpts from chapter 33 and the epilogue] \n"
+ "\n"
+ "[ Author: Donald Knuth ; Source: http://www-cs-faculty.stanford.edu/~uno/iaq.html ]\n";
//-------------------------------------------------------------------------------------------------------
/**
* getDigramCount
*
* Get a count of how many times each digram occurs in an input String.
* A digram, in case you don't know, is just a pair of letters.
*
* @param text a string containing the text you wish to analyze
* @return a map containing entries whose keys are digrams, and
* whose values correspond to the number of times that digram occurs
* in the input String text.
*/
public Map<String,Integer> getDigramCount(String text)
{
Map<String,Integer> digramMap = new TreeMap<String,Integer>();
text = text.toLowerCase();
text = text.replaceAll("\\W|[0-9]|_","");
for(int i=0;i<text.length()-1;i++)
{
String digram = text.substring(i,i+2);
if(!digramMap.containsKey(digram))
{
digramMap.put(digram,1);
} else {
int freq = digramMap.get(digram);
freq++;
digramMap.put(digram,freq);
}
}
return digramMap;
}
/**
* updateDigramCount
*
* Use the getDigramCount method to get the digram counts from the
* input text area, and then update the appropriate output area with
* the information.
*/
public void updateDigramCount()
{
Map<String,Integer> wordCountList = getDigramCount(words);
StringBuffer sb = new StringBuffer();
Set<Map.Entry<String,Integer>> values = wordCountList.entrySet();
for(Map.Entry<String,Integer> me : values)
{
// We will only print the digrams that occur at least 5 times.
if(me.getValue() >= 5)
{
sb.append(me.getKey()+" "+me.getValue()+"\n");
}
}
digramCountText.setText(sb.toString());
}
/**
* getLetterCount
*
* Get a count of how many times each letter occurs in an input String.
*
* #param text a string containing the text you wish to analyze
* #return a map containing entries whose keys are alphabetic letters, and
* whose values correspond to the number of times that letter occurs
* in the input String text.
*/
public Map<Character,Integer> getLetterCount(String text)
{
Map<Character,Integer> letterMap = new TreeMap<Character,Integer>();
text = text.toLowerCase();
// Now get rid of anything that is not an alphabetic character.
text = text.replaceAll("\\W|[0-9]|_","");
for(int i=0;i<text.length()-1;i++)
{
Character letter = text.charAt(i);
if(!letterMap.containsKey(letter))
{
letterMap.put(letter,1);
} else {
int freq = letterMap.get(letter);
freq++;
letterMap.put(letter,freq);
}
}
return new TreeMap<Character,Integer>();
}
/**
* updateLetterCount
*
* Use the getLetterCount method to get the letter counts from the
* input text area, and then update the appropriate output area with
* the information.
*/
public void updateLetterCount()
{
String words = theText.getText();
Map<Character,Integer> letterCountList = getLetterCount(letter);
StringBuffer sb = new StringBuffer();
Set<Map.Entry<Character,Integer>> values = letterCountList.entrySet();
for(Map.Entry<Character,Integer> me : values)
{
if(me.getValue() >= 5)
{
sb.append(me.getKey()+" "+me.getValue()+"\n");
}
}
letterCountText.setText(sb.toString());
}
public Map<Character,Integer> getLetterCount(String text)
{
...
return new TreeMap<Character,Integer>();
}
returns an empty map. You want to return letterMap here instead.
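For reference, here is a sketch of the corrected method. It is made static and wrapped in a hypothetical LetterCounter class so it runs on its own, and it assumes Java 8 or later for Map.merge. Besides returning letterMap, it loops up to text.length() rather than text.length() - 1: the shorter bound is right for digrams but silently skips the last letter here. Note also that updateLetterCount passes letter, which is not declared anywhere in the code shown; presumably words was meant.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wrapper class; in the question the method lives in the GUI class.
class LetterCounter {
    public static Map<Character, Integer> getLetterCount(String text) {
        Map<Character, Integer> letterMap = new TreeMap<>();
        // Same cleanup as the original: lower-case, then strip
        // non-word characters, digits and underscores.
        String cleaned = text.toLowerCase().replaceAll("\\W|[0-9]|_", "");
        // Visit every character, including the last one.
        for (int i = 0; i < cleaned.length(); i++) {
            // merge() stores 1 for a new key, otherwise adds 1 to the old count
            letterMap.merge(cleaned.charAt(i), 1, Integer::sum);
        }
        // Return the populated map, not a fresh empty TreeMap
        return letterMap;
    }
}
```

For example, getLetterCount("Hello, World 42!") counts the letters of "helloworld", so 'l' maps to 3 and the final 'd' is counted once.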
I'm sure all of us have seen ellipses on Facebook statuses (or elsewhere), clicked "Show more", and found only another two characters or so. I'd guess this is because of lazy programming, because surely there is an ideal method.
Mine counts slim characters [iIl1] as "half characters", but this doesn't stop ellipses from looking silly when they hide barely any characters.
Is there an ideal method? Here is mine:
/**
* Return a string with a maximum length of <code>length</code> characters.
* If there are more than <code>length</code> characters, then string ends with an ellipsis ("...").
*
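* @param text the string to shorten
* @param length the maximum number of characters to return
* @return <code>text</code>, or a truncated copy ending in "..."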
*/
public static String ellipsis(final String text, int length)
{
// The letters [iIl1] are slim enough to only count as half a character.
length += Math.ceil(text.replaceAll("[^iIl1]", "").length() / 2.0d);
if (text.length() > length)
{
return text.substring(0, length - 3) + "...";
}
return text;
}
Language doesn't really matter, but tagged as Java because that's what I'm mostly interested in seeing.
I like the idea of letting "thin" characters count as half a character. Simple and a good approximation.
The main issue with most ellipsizing approaches, however, is (IMHO) that they chop off words in the middle. Here is a solution that takes word boundaries into account (but does not dive into pixel math and the Swing API).
private final static String NON_THIN = "[^iIl1\\.,']";
private static int textWidth(String str) {
return (int) (str.length() - str.replaceAll(NON_THIN, "").length() / 2);
}
public static String ellipsize(String text, int max) {
if (textWidth(text) <= max)
return text;
// Start by chopping off at the word before max
// This is an over-approximation due to thin-characters...
int end = text.lastIndexOf(' ', max - 3);
// Just one long word. Chop it off.
if (end == -1)
return text.substring(0, max-3) + "...";
// Step forward as long as textWidth allows.
int newEnd = end;
do {
end = newEnd;
newEnd = text.indexOf(' ', end + 1);
// No more spaces.
if (newEnd == -1)
newEnd = text.length();
} while (textWidth(text.substring(0, newEnd) + "...") < max);
return text.substring(0, end) + "...";
}
I'm shocked no one mentioned Commons Lang StringUtils#abbreviate().
Update: yes, it doesn't take slim characters into account, but I don't agree that it should: everyone has different screen and font setups, and a large portion of the people who land on this page are probably looking for a maintained library like the one above.
It seems like you might get more accurate geometry from the Java graphics context's FontMetrics.
Addendum: In approaching this problem, it may help to distinguish between the model and view. The model is a String, a finite sequence of UTF-16 code points, while the view is a series of glyphs, rendered in some font on some device.
In the particular case of Java, one can use SwingUtilities.layoutCompoundLabel() to effect the translation. The example below intercepts the layout call in BasicLabelUI to demonstrate the effect. It may be possible to use the utility method in other contexts, but the appropriate FontMetrics would have to be determined empirically.
import java.awt.Color;
import java.awt.EventQueue;
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.GridLayout;
import java.awt.Rectangle;
import java.awt.event.ComponentAdapter;
import java.awt.event.ComponentEvent;
import javax.swing.BorderFactory;
import javax.swing.Icon;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.border.EmptyBorder;
import javax.swing.border.LineBorder;
import javax.swing.plaf.basic.BasicLabelUI;
/** @see http://stackoverflow.com/questions/3597550 */
public class LayoutTest extends JPanel {
private static final String text =
"A damsel with a dulcimer in a vision once I saw.";
private final JLabel sizeLabel = new JLabel();
private final JLabel textLabel = new JLabel(text);
private final MyLabelUI myUI = new MyLabelUI();
public LayoutTest() {
super(new GridLayout(0, 1));
this.setBorder(BorderFactory.createCompoundBorder(
new LineBorder(Color.blue), new EmptyBorder(5, 5, 5, 5)));
textLabel.setUI(myUI);
textLabel.setFont(new Font("Serif", Font.ITALIC, 24));
this.add(sizeLabel);
this.add(textLabel);
this.addComponentListener(new ComponentAdapter() {
@Override
public void componentResized(ComponentEvent e) {
sizeLabel.setText(
"Before: " + myUI.before + " after: " + myUI.after);
}
});
}
private static class MyLabelUI extends BasicLabelUI {
int before, after;
@Override
protected String layoutCL(
JLabel label, FontMetrics fontMetrics, String text, Icon icon,
Rectangle viewR, Rectangle iconR, Rectangle textR) {
before = text.length();
String s = super.layoutCL(
label, fontMetrics, text, icon, viewR, iconR, textR);
after = s.length();
System.out.println(s);
return s;
}
}
private void display() {
JFrame f = new JFrame("LayoutTest");
f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
f.add(this);
f.pack();
f.setLocationRelativeTo(null);
f.setVisible(true);
}
public static void main(String[] args) {
EventQueue.invokeLater(new Runnable() {
@Override
public void run() {
new LayoutTest().display();
}
});
}
}
If you're talking about a web site, i.e. outputting HTML/JS/CSS, you can throw away all these solutions because there is a pure CSS solution.
text-overflow:ellipsis;
It's not quite as simple as just adding that style to your CSS, because it interacts with other CSS; e.g. it requires that the element has overflow:hidden;, and if you want your text on a single line, white-space:nowrap; is good too.
I have a stylesheet that looks like this:
.myelement {
word-wrap:normal;
white-space:nowrap;
overflow:hidden;
-o-text-overflow:ellipsis;
text-overflow:ellipsis;
width: 120px;
}
You can even have a "read more" button that simply runs a javascript function to change the styles, and bingo, the box will re-size and the full text will be visible. (in my case though, I tend to use the html title attribute for the full text, unless it's likely to get very long)
Hope that helps. It's a much simpler solution than trying to calculate the text size and truncate it, and all that. (Of course, if you're writing a non-web-based app, you may still need to do that.)
There is one downside to this solution: Firefox doesn't support the ellipsis style. Annoying, but I don't think critical; it does still truncate the text correctly, as that is dealt with by overflow:hidden, it just doesn't display the ellipsis. It does work in all the other browsers (including IE, all the way back to IE5.5!), so it's a bit annoying that Firefox doesn't do it yet. Hopefully a new version of Firefox will solve this issue soon.
[EDIT]
People are still voting on this answer, so I should edit it to note that Firefox does now support the ellipsis style. The feature was added in Firefox 7. If you're using an earlier version (FF3.6 and FF4 still have some users) then you're out of luck, but most FF users are now okay. There's a lot more detail about this here: text-overflow:ellipsis in Firefox 4? (and FF5)
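For me this would be ideal:
public static String ellipsis(final String text, int length)
{
// Only shorten text that is actually longer than the limit;
// otherwise substring() throws for strings shorter than length - 3.
if (text.length() <= length)
{
return text;
}
return text.substring(0, length - 3) + "...";
}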
I would not worry about the size of every character unless I really know where and in what font it is going to be displayed. Many fonts are fixed-width fonts, where every character has the same dimensions.
Even if it's a variable-width font, if you count 'i' and 'l' as taking half the width, then why not count 'w' and 'm' as taking double the width? A mix of such characters in a string will generally average out the effect of their size, and I would prefer ignoring such details. Choosing the value of 'length' wisely would matter the most.
Using Guava's com.google.common.base.Ascii.truncate(CharSequence, int, String) method:
Ascii.truncate("foobar", 7, "..."); // returns "foobar"
Ascii.truncate("foobar", 5, "..."); // returns "fo..."
How about this (to get a string of 50 chars):
text.replaceAll("(?<=^.{47}).*$", "...");
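One caveat with the regex is that it does not check the length first. A 47-character string matches at position 47 and gets "..." appended with nothing hidden, and a 49-character string loses two characters even though it would fit in 50. Also, '.' does not match line terminators by default, so text containing line breaks may not be truncated at all. A guarded sketch follows; the class and method names are hypothetical, and max must be at least 3 or the {max - 3} quantifier becomes invalid.

```java
// Hypothetical helper around the look-behind regex from the answer above.
class RegexTruncate {
    // Returns text unchanged if it fits, otherwise its first max - 3
    // characters followed by "..." (assumes max >= 3).
    static String truncate(String text, int max) {
        if (text.length() <= max) {
            return text; // already fits: leave it untouched
        }
        // (?s) lets '.' match line terminators too, so the tail after the
        // first max - 3 characters is replaced even in multi-line text.
        return text.replaceAll("(?s)(?<=^.{" + (max - 3) + "}).*$", "...");
    }
}
```

For example, truncate("abcdefghijk", 10) gives "abcdefg...", while truncate("abcdefghij", 10) is returned unchanged.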
public static String getTruncated(String str, int maxSize){
int limit = maxSize - 3;
return (str.length() > maxSize) ? str.substring(0, limit) + "..." : str;
}
If you're worried about the ellipsis only hiding a very small number of characters, why not just check for that condition?
public static String ellipsis(final String text, int length)
{
// The letters [iIl1] are slim enough to only count as half a character.
length += Math.ceil(text.replaceAll("[^iIl1]", "").length() / 2.0d);
if (text.length() > length + 20)
{
return text.substring(0, length - 3) + "...";
}
return text;
}
I'd go with something similar to the standard model that you have. I wouldn't bother with the character widths thing; as @Gopi said, it is probably going to all balance out in the end. What I'd do that is new is have another parameter called something like "minNumberOfhiddenCharacters" (maybe a bit less verbose). Then when doing the ellipsis check I'd do something like:
if (text.length() > length+minNumberOfhiddenCharacters)
{
return text.substring(0, length - 3) + "...";
}
What this will mean is that if your text length is 35, your "length" is 30 and your min number of characters to hide is 10, then you would get your string in full. If your min number of characters to hide was 3, then you would get the ellipsis instead.
The main thing to be aware of is that I've subverted the meaning of "length" so that it is no longer a maximum length. The length of the outputted string can now be anything from 30 characters (when the text length is >40) to 40 characters (when the text length is 40 characters long). Effectively our max length becomes length+minNumberOfhiddenCharacters. The string could of course be shorter than 30 characters when the original string is less than 30 but this is a boring case that we should ignore.
If you want length to be a hard and fast maximum then you'd want something more like:
if (text.length() > length)
{
if (text.length() - length < minNumberOfhiddenCharacters-3)
{
return text.substring(0, text.length() - minNumberOfhiddenCharacters) + "...";
}
else
{
return text.substring(0, length - 3) + "...";
}
}
So in this example, if text.length() is 37, length is 30 and minNumberOfhiddenCharacters = 10, then we'll go into the second part of the inner if and get 27 characters + ... to make 30. This is actually the same as if we'd gone into the first branch of the inner if (which is a sign we have our boundary conditions right). If the text length was 36, we'd get 26 characters + the ellipsis, giving us 29 characters with 10 hidden.
I was debating whether rearranging some of the comparison logic would make it more intuitive but in the end decided to leave it as it is. You might find that text.length() - minNumberOfhiddenCharacters < length-3 makes it more obvious what you are doing though.
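Put together, the hard-maximum version might look like the sketch below. The early return for text that already fits is an assumption (the fragments above leave it implicit), minHidden is simply a shorter name for minNumberOfhiddenCharacters, the class name is hypothetical, and it assumes length >= 3 and minHidden <= length so the substring bounds stay valid.

```java
// Hypothetical wrapper for the "hard maximum" variant described above.
class MinHiddenEllipsis {
    // length is a hard maximum on the result; minHidden is the fewest
    // characters the "..." is allowed to hide.
    static String ellipsis(String text, int length, int minHidden) {
        if (text.length() <= length) {
            return text; // fits already, nothing to hide
        }
        if (text.length() - length < minHidden - 3) {
            // Only a few characters would be hidden: cut further back so
            // that exactly minHidden characters disappear behind the dots.
            return text.substring(0, text.length() - minHidden) + "...";
        }
        // Otherwise use the full length, counting the three dots.
        return text.substring(0, length - 3) + "...";
    }
}
```

With length = 30 and minHidden = 10, a 37-character input yields 27 characters plus "..." (30 in total), and a 36-character input yields 26 characters plus "..." (29 in total, 10 hidden), matching the walkthrough above.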
In my eyes, you can't get good results without pixel math.
Thus, Java is probably the wrong end to fix this problem when you are in a web application context (like facebook).
I'd go for JavaScript. Since JavaScript is not my primary field of interest, I can't really judge if this is a good solution, but it might give you a pointer.
Most of these solutions don't take font metrics into account. Here is a very simple but working solution for Java Swing that I have used for years.
private String ellipsisText(String text, FontMetrics metrics, Graphics2D g2, int targetWidth) {
String shortText = text;
int activeIndex = text.length() - 1;
Rectangle2D textBounds = metrics.getStringBounds(shortText, g2);
// Stop at the empty prefix so substring() never gets a negative index
while (textBounds.getWidth() > targetWidth && activeIndex >= 0) {
shortText = text.substring(0, activeIndex--);
textBounds = metrics.getStringBounds(shortText + "...", g2);
}
return activeIndex != text.length() - 1 ? shortText + "..." : text;
}
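The loop above re-measures the string once per removed character. A sketch of the same idea using a binary search over the prefix length needs only about log2(n) measurements. Here the width function is passed in as a ToIntFunction<String>, a hypothetical signature rather than the answer's, so the logic can be tried without a display. It assumes width never decreases as the prefix grows.

```java
import java.util.function.ToIntFunction;

// Hypothetical class: binary-search variant of the linear ellipsisText loop.
class BinaryEllipsis {
    // Returns text if it fits in targetWidth, otherwise the longest prefix p
    // with width(p + "...") <= targetWidth, followed by "...".
    static String ellipsize(String text, ToIntFunction<String> width, int targetWidth) {
        if (width.applyAsInt(text) <= targetWidth) {
            return text;
        }
        int lo = 0, hi = text.length() - 1; // range of candidate prefix lengths
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the range always shrinks
            if (width.applyAsInt(text.substring(0, mid) + "...") <= targetWidth) {
                lo = mid;     // mid fits: try a longer prefix
            } else {
                hi = mid - 1; // mid is too wide: try a shorter one
            }
        }
        // lo == 0 means even "..." alone may not fit; it is returned anyway
        return text.substring(0, lo) + "...";
    }
}
```

In Swing one could pass metrics::stringWidth as the width function; note that FontMetrics.stringWidth measures advance width, which may differ slightly from the getStringBounds width used above.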
For simple cases, I have used String.format for this.
Here I abbreviate to max 10 chars and add an ellipsis:
String abbreviate(String longString) {
return String.format("%.10s...", longString);
}
Little-known fact: the "precision" number in a format pattern is used for truncation of strings.
Add your own length check, of course, if you want to make the ellipsis conditional. (I was shortening a JWT for logging, so I know it's going to be longer.)
As a bonus, if the String is already shorter than the precision, there is no padding, it simply leaves it as is.
> System.out.println(abbreviate("This is a very long string"));
> System.out.println(abbreviate("Shorty"));
This is a ...
Shorty...
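If you want the ellipsis only when something is actually cut, the length check and the precision can be combined. Below is a sketch with a hypothetical FormatAbbreviate class; the precision max - 3 leaves room for the dots, so the result never exceeds max characters (assumes max >= 3).

```java
// Hypothetical helper: conditional version of the String.format trick.
class FormatAbbreviate {
    static String abbreviate(String s, int max) {
        if (s.length() <= max) {
            return s; // short enough: no truncation, no dots
        }
        // "%.Ns" keeps at most N characters of s; the dots follow literally
        return String.format("%." + (max - 3) + "s...", s);
    }
}
```

For example, abbreviate("This is a very long string", 13) gives "This is a ...", while abbreviate("Shorty", 13) returns "Shorty" unchanged.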
In PHP, you can also simply do this:
mb_strimwidth($string, 0, 120, '...')