fastest way to match fingerprints? - java

I am trying to check whether a fingerprint exists/matches in a huge collection of fingerprints (100,000 fingerprints). Searching for matches sequentially will take too much time. Is there any better way to search for a match? Is it possible to organize the fingerprints in a binary tree structure so that the number of comparisons can be reduced? If yes, how can we do it? It would be helpful if the answers were from a Java perspective.
edit:
I have all the fingerprints as .gif images. How can I convert the fingerprint images into data?
Thanks.

1) You need to use a wavelet compression algorithm to encode the fingerprint as a sequence of wavelet compression parameters:
0, -1, 2.4, 5.6, 7.7, 32, -1.5, etc.
2) You need to define a match function which will find similarities. There are two options:
- the geometry approach (compare quadrants to quadrants; all fields are spaced in contiguous blocks by some space-partitioning algorithm)
Pros:
hardware-accelerated (SSE) pixel-matching algorithm; normalization of all fingerprints to a standard basis using an affine transformation, e.g. to a 512x512 px square
Cons:
high sensitivity to fingerprint quality (e.g. if a part of the searched fingerprint is omitted entirely)
- the topology approach (the connectivity of lines and arcs, the breakpoints, their mutual positioning)
Pros:
low sensitivity to the angle, position, and quality of the fingerprint; can use the original image scale and direction
Cons:
low speed of analysis; highly dependent upon the quality of the classification function
3) You need to define some sort of genetic algorithm to train the evaluation function on a known set of fingerprints.
Your knowledge system will then be able to find fingerprints matching a given sample not previously known to the system; trained to find particular differences/matches, it raises the probability of a successful search while lowering the probability of false matches.
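A minimal Java sketch of step 2's match function, assuming the wavelet encoding has already produced a fixed-length coefficient vector per fingerprint (the encoding itself is not shown). Cosine similarity is just one plausible score; nothing here is specific to fingerprints:

```java
// Hypothetical sketch: once each fingerprint is encoded as a vector of
// wavelet coefficients, a simple match function can score similarity.
// The wavelet transform of the image is assumed to be done elsewhere.
public class WaveletMatch {

    // Cosine similarity between two coefficient vectors: 1.0 = same direction.
    public static double similarity(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] probe = {0, -1, 2.4, 5.6, 7.7};
        double[] close = {0.1, -0.9, 2.5, 5.5, 7.6};
        double[] far   = {9, 0, -3, 1, 0.2};
        System.out.println(similarity(probe, close) > similarity(probe, far)); // true
    }
}
```

A real evaluation function (step 3) would be trained to weight the coefficients rather than treating them all equally.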

This is not my field of expertise (I'm a web developer), but I think you should look into neural networks. I downloaded some demo code once and did some experimenting with character recognition. It was amazing to see how the neural network that I had set up could recognize characters that I drew on the screen. But before it could do this, it first had to learn (backpropagation learning).
Here's a slideshow that provides an outline:
http://www.slideshare.net/alessandrobaffa/fingerprints-recognition-using-neural-networks
The last slide contains further references.
Good luck!
/Thomas Kahn

You can't just do some kind of image comparison - there are specific ways to analyze and store fingerprint information already established which, for example, take into account the quality of the lifted/scanned fingerprint and that of the stored fingerprint data.
I googled for fingerprint encoding standard and came up with several interesting results, including the Encyclopedia of Biometrics which mentions "quality in various fingerprint encoding standards", and an article talking about the FBI image coding standard (among other things)

I know that this question was asked 4 years ago; however, many individuals are viewing it, and for those viewers I think my response might be helpful.
There are a few questions posed:
1) Is there any way to search for a fingerprint match as fast as possible in large-scale databases?
Ans: Yes. Before matching a fingerprint, there is an important step you are missing. This step is fingerprint classification, which is broken down into exclusive classification and continuous classification. Exclusive classification is easier to implement: you identify a pattern of the fingerprint, known as a class, and compare the probe only to the fingerprints in the database that are of the same class. This is what is done to speed up fingerprint matching.
The page by Peter Kovesi linked below provides code for orientation fields and minutia extraction for matching:
http://www.csse.uwa.edu.au/~pk/research/matlabfns/#fingerprints
Singular point detection and orientation fields aid in identifying classes. Code for both can be found at the link.
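As a rough Java illustration of exclusive classification (the class names are the standard Henry-style pattern classes, but the classifier itself is a placeholder; a real one would use the singular points and orientation fields mentioned above):

```java
import java.util.*;

// Hedged sketch: bucket the database by pattern class so a probe is only
// compared in detail against fingerprints of the same class. The detailed
// matcher and the classifier that assigns PatternClass are not shown.
public class ClassifiedIndex {

    enum PatternClass { ARCH, LEFT_LOOP, RIGHT_LOOP, WHORL }

    private final Map<PatternClass, List<String>> buckets = new EnumMap<>(PatternClass.class);

    public void add(String fingerprintId, PatternClass c) {
        buckets.computeIfAbsent(c, k -> new ArrayList<>()).add(fingerprintId);
    }

    // Only candidates in the probe's class are ever compared in detail.
    public List<String> candidates(PatternClass probeClass) {
        return buckets.getOrDefault(probeClass, Collections.emptyList());
    }

    public static void main(String[] args) {
        ClassifiedIndex idx = new ClassifiedIndex();
        idx.add("fp-001", PatternClass.WHORL);
        idx.add("fp-002", PatternClass.ARCH);
        idx.add("fp-003", PatternClass.WHORL);
        // A whorl probe is compared against 2 candidates instead of all 3.
        System.out.println(idx.candidates(PatternClass.WHORL)); // [fp-001, fp-003]
    }
}
```

With only four exclusive classes the speed-up is bounded (roughly 4x on a balanced database), which is why continuous classification and multi-stage matching also exist.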
2) How can one convert the fingerprint images into data?
Ans: It doesn't matter what format the image is in; I use TIFF. You need to know that fingerprints are made up of ridges and valleys, with ridges represented by the darker lines. You need to apply something called ridge segmentation to discard the background and extract only the ridges. The result is stored in a mask.
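As a minimal Java sketch of that image-to-data step: grayscale the image and threshold it into a boolean ridge mask. The global threshold and synthetic test image are illustrative only; real ridge segmentation uses local statistics (variance, normalization) rather than one global cut-off:

```java
import java.awt.image.BufferedImage;

// Hedged sketch: dark pixels (ridges) become true in the mask.
// For a .gif on disk you would obtain the BufferedImage with
// javax.imageio.ImageIO.read(new File("print.gif")) instead.
public class RidgeMask {

    public static boolean[][] segment(BufferedImage img, int threshold) {
        boolean[][] ridge = new boolean[img.getHeight()][img.getWidth()];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int gray = (r + g + b) / 3;
                ridge[y][x] = gray < threshold;   // dark pixel -> part of a ridge
            }
        }
        return ridge;
    }

    public static void main(String[] args) {
        // Synthetic 4x4 image: left half black (ridge), right half white.
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 4; y++)
            for (int x = 0; x < 4; x++)
                img.setRGB(x, y, x < 2 ? 0x000000 : 0xFFFFFF);
        boolean[][] mask = segment(img, 128);
        System.out.println(mask[0][0] + " " + mask[0][3]); // true false
    }
}
```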
3) "the existing image and the image that is scanned wont be exactly similar. that's my problem"
Ans: That depends on how they differ: noise, rotation, translation, etc. To reduce noise, use enhancement techniques. For rotation, use reference points and align the fingerprints.
I know this is a brief overview, but I hope it points you in the right direction. Good luck!

I can't comment on the fully-DIY best approach, but I do have a lot of field expertise in this area. All large (expensive!) commercial products have 2 or more algorithms to do fingerprint matching on larger datasets. There are some that use fingerprint classes (loop, whorl etc) to do some pre-filtering, but in general fingerprints do not index very well, you'll have to brute force it in an intelligent way. That is where the multiple algorithms come into play.
There are a few classes of algorithms that can do very fast fingerprint compares (ridge shape) but are highly susceptible to errors, so they are not by themselves accurate enough to do a sane identification on reasonably sized databases. Those algorithms are therefore typically deployed as the first stage. If the algorithm is in any doubt, it passes to the next stage. This could be some 'mid-class' algorithm, e.g. spectral minutiae, or a 'slow & accurate' algorithm, e.g. something that actually compares all minutiae. The net effect is that the secondary stages typically correct for most of the false accepts of the first stage. The only unrecoverable loss is the false rejects in the first (and second) stage. Depending on the application domain, this can be negligible or pretty high. It's a trade-off between accuracy and performance. In our own test environment we have seen speeds over 100,000,000 fingerprint compares per second in this way on a single (beefy) desktop, solving the original problem in ~1 ms. It is, however, a complex, expensive and very specialist piece of software.
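The staged structure itself is simple to sketch in Java, even though the commercial scoring algorithms behind it are proprietary. Both scoring functions below are stand-ins (here just toy string lengths) for a cheap first-stage score and a slow, accurate second-stage score:

```java
import java.util.*;
import java.util.function.*;

// Hedged sketch of a two-stage matcher: a cheap score discards most
// candidates; only survivors reach the slow, accurate comparison.
public class StagedMatcher {

    public static <T> List<T> match(List<T> db,
                                    ToDoubleFunction<T> fastScore, double fastCutoff,
                                    ToDoubleFunction<T> slowScore, double slowCutoff) {
        List<T> hits = new ArrayList<>();
        for (T candidate : db) {
            if (fastScore.applyAsDouble(candidate) < fastCutoff) continue; // stage 1
            if (slowScore.applyAsDouble(candidate) >= slowCutoff)          // stage 2
                hits.add(candidate);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> db = Arrays.asList("a", "bb", "ccc", "dddd");
        // Toy scores: stage 1 keeps strings of length >= 2, stage 2 length >= 3.
        List<String> hits = match(db,
                s -> s.length(), 2.0,
                s -> s.length(), 3.0);
        System.out.println(hits); // [ccc, dddd]
    }
}
```

The tuning decision described above lives in the cutoffs: lowering `fastCutoff` reduces first-stage false rejects at the cost of more work for the slow stage.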

Fingerprint matching, if you want accuracy, is best done using the tried-and-true methods that just about all automated fingerprint matching algorithms use.
They extract minutia points and store their location and other data in a template, then use statistical analysis of the relative positioning of the minutia data within two templates to calculate a score of how closely the two templates match.
Using this technique often requires taking into consideration things like differences in the rotation and area of the finger as it was placed on the fingerprint scanner for each impression.
Biometric algorithms are not perfect, and their performance is measured by their False Accept Rate (FAR) and their False Reject Rate (FRR). These two measures are inversely related to each other, meaning that as you increase security (decrease FAR) you increase the FRR.
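A hedged Java sketch of that template scoring idea, assuming the two templates are already aligned for rotation and translation (real matchers align first and also compare minutia angle and type, which is omitted here):

```java
import java.util.*;

// Hedged sketch: each template is a list of minutia points; the score is the
// fraction of probe minutiae that have a counterpart within a small tolerance.
public class MinutiaScore {

    record Minutia(double x, double y) {}

    public static double score(List<Minutia> probe, List<Minutia> stored, double tol) {
        int matched = 0;
        for (Minutia p : probe)
            for (Minutia s : stored)
                if (Math.hypot(p.x() - s.x(), p.y() - s.y()) <= tol) { matched++; break; }
        return probe.isEmpty() ? 0 : (double) matched / probe.size();
    }

    public static void main(String[] args) {
        List<Minutia> a = List.of(new Minutia(10, 10), new Minutia(40, 25), new Minutia(70, 60));
        List<Minutia> b = List.of(new Minutia(11, 9), new Minutia(41, 26), new Minutia(5, 90));
        System.out.println(score(a, b, 3.0)); // 2 of 3 matched -> 0.666...
    }
}
```

The FAR/FRR trade-off mentioned above corresponds to the decision threshold applied to this score: a higher threshold lowers FAR and raises FRR.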

Related

How to force dependency / linkage of genes in a genetic algorithm?

For a current project, I want to use genetic algorithms - currently I had a look at the jenetics library.
How can I force some genes to be dependent on each other? I want to map CSS onto the genes; e.g. I have genes indicating whether an image is displayed, and in case it is, also the respective height and width. So I want to have those genes together as a group, as it would make no sense that after a crossover the chromosome would indicate something like "no image" - height 100px - width 0px.
Is there a method to do so? Or maybe another library (in java) which supports this?
Many thanks!
You want to embed more knowledge into your system to reduce the search space.
If it were knowledge about the structure of the solution, I would propose taking a look at grammatical evolution (GE). Your knowledge appears to be more about valid combinations of codons, so GE is not easily applicable.
It might be possible to combine a few features into a single codon, but this may be undesirable and/or unfeasible (e.g. due to great number of possible combinations).
But in fact you don't have an issue here:
it's fine to have meaningless genotypes — they will be removed due to the selection pressure
it's fine to have meaningless codon sequences — this is called "bloat"; bloat is quite common in some evolutionary algorithms (usually discussed in the context of genetic programming) and is not strictly bad; fighting bloat too hard can reduce the search performance
If you know how your genome is encoded - that is, you know which sequences of chromosomes form groups - then you could extend (since you mention jenetics) io.jenetics.MultiPointCrossover to avoid splitting groups. (Source code available on GitHub.)
It could be as simple as storing ranges of genes which form groups if one of the random cut indexes would split a group, adjusting the index to the nearest end of the group. (Of course this would cause a statistically higher likelihood of cuts at the ends of groups; it would probably be better to generate a new random location until it doesn't intersect a group.)
But it's also valid (as Pete notes) to have genes which aren't meaningful (ignored) based on other genes; if the combination is anti-survival it will be selected out.
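The cut-point adjustment described above can be sketched without depending on any particular GA library (extending jenetics' `io.jenetics.MultiPointCrossover` would wrap logic like this around its cut-point selection; the group representation here is an assumption):

```java
import java.util.Random;

// Hedged sketch: given the index ranges of linked gene groups, a proposed
// crossover cut is re-drawn until it does not fall inside a group, so linked
// genes (image flag + height + width) always travel together.
public class GroupSafeCut {

    // groups[i] = {startInclusive, endExclusive} of a linked gene block
    public static boolean insideGroup(int cut, int[][] groups) {
        for (int[] g : groups)
            if (cut > g[0] && cut < g[1]) return true; // cut would split the block
        return false;
    }

    public static int drawCut(int genomeLength, int[][] groups, Random rnd) {
        int cut;
        do {
            cut = 1 + rnd.nextInt(genomeLength - 1); // a cut lies between genes
        } while (insideGroup(cut, groups));
        return cut;
    }

    public static void main(String[] args) {
        int[][] groups = { {2, 5} };            // genes 2..4 must stay together
        Random rnd = new Random(42);
        for (int i = 0; i < 1000; i++) {
            int cut = drawCut(10, groups, rnd);
            if (cut > 2 && cut < 5) throw new AssertionError("split a group");
        }
        System.out.println("no group was ever split");
    }
}
```

Re-drawing (rather than snapping to the nearest group edge) avoids the statistical bias toward cuts at group boundaries that the answer warns about.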

How can I compare two images for similarities (Not exact matches with MD5)?

How can I take two images and compare them to see how similar they are?
I'm not talking about comparing two exact images using MD5. The two images that I am comparing will be completely different, as well as likely different sizes at times.
Using Pokemon cards as an example:
I'm going to have scanned HD images of each of the cards. I want the user to be able to take a picture of their Pokemon card with their phone and I want to be able to compare it against my scanned images and then determine which card it is that they took a picture of.
The processing does not have to be done directly on the phone, offloading to a web service is an option however note that my knowledge somewhat limited on the programming languages (limited to PHP/JAVA/Android pretty much). The server I'm using is my own Ubuntu server so I do have access to the exec command from php if this would help.
At first I figured someone would have done something like this before (comparing two images). I tried using PHP with Imagick, following an example I found that claimed to do what I was trying (utilizing compareImages()), but it didn't work at all. There doesn't seem to be much (if any) documentation on doing something like this, which is why I'm so stuck. All I'm looking for is a push in the right direction.
My second thought was to try using OCR to pull just the title of the card; I would then compare that against a database of titles and display the images tied to that title. So far I've tried phpocr first, which didn't work at all, as to my understanding it requires monochrome images. Next I tried Tesseract directly from the console on my server, and while it did WAY better than phpocr, more than 80% of the characters were wrong on a scanned image, so a lower-quality image coming from a smartphone would really have trouble.
I also tried OpenCV for Android but couldn't get any of the samples working.
Has anyone done anything like this, or at least used something that can accomplish what I'm looking for?
There are two distinct tasks: identifying the area of interest (which can be done with Haar cascades, the same technique as face detection) and recognizing the identified image, which can be done with invariant moment techniques (like Hu moments - they were good enough to count Soviet tanks on satellite images, so they should be good enough for Pokemon). A nice property of invariant moments is the soft degradation of results in case of low quality: you get a list of probabilities per symbol, e.g. 80% Pikachu and 30% something else.
We are developing an OCR library based on invariant moments for use on Android here:
https://sourceforge.net/projects/javaocr/
(pure Java and reasonable speed; there are Android samples in the demos subdirectory). And here is an app based on javaocr that will recognize a black-on-white phone number and dial it: https://play.google.com/store/apps/details?id=de.pribluda.android.ocrcall&feature=search_result#?t=W251bGwsMSwyLDEsImRlLnByaWJsdWRhLmFuZHJvaWQub2NyY2FsbCJd
You may also consider some aiming help so the user positions the symbol to be matched properly (so the first task will use real intellect).
You should decide what kind of similarity comparison you need. There are geometric algorithms. They use edge detection and then try to match detected edges in both images. They are probably useful when dealing with different colours of objects with the same shape. And there are algorithms that are more based on colour similarity. They compare what colours are in the image and how they are distributed.
If you are looking for a concrete algorithm, you probably should have a look at the Hough Transform.
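The colour-distribution approach mentioned above can be sketched in plain Java with a coarse colour histogram compared by histogram intersection (1.0 = identical distribution). Because the histograms are normalized, the two images may have different sizes; the bin count is an arbitrary choice. Edge/shape approaches (e.g. the Hough transform) would be separate:

```java
// Hedged sketch of colour-histogram similarity for image comparison.
public class HistogramSimilarity {

    // 8 bins per channel -> 512-bin colour histogram, normalized to sum 1.
    public static double[] histogram(int[] pixels) {
        double[] h = new double[8 * 8 * 8];
        for (int rgb : pixels) {
            int r = ((rgb >> 16) & 0xFF) >> 5;
            int g = ((rgb >> 8) & 0xFF) >> 5;
            int b = (rgb & 0xFF) >> 5;
            h[(r << 6) | (g << 3) | b]++;
        }
        for (int i = 0; i < h.length; i++) h[i] /= pixels.length;
        return h;
    }

    // Histogram intersection: sum of per-bin minima.
    public static double intersection(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += Math.min(a[i], b[i]);
        return s;
    }

    public static void main(String[] args) {
        int[] reddish = {0xFF0000, 0xEE1010, 0xF00505, 0xFF2020};
        int[] alsoRed = {0xFA0808, 0xEF0000};
        int[] bluish  = {0x0000FF, 0x1010EE};
        double same = intersection(histogram(reddish), histogram(alsoRed));
        double diff = intersection(histogram(reddish), histogram(bluish));
        System.out.println(same > diff); // true
    }
}
```

For the card use case this would only be a coarse first filter; cards with similar colour schemes would still need a shape- or keypoint-based second stage.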

Handwritten character (English letters, kanji,etc.) analysis and correction

I would like to know how practical it would be to create a program which takes handwritten characters in some form, analyzes them, and offers corrections to the user. The inspiration for this idea is to have elementary school students in other countries or University students in America learn how to write in languages such as Japanese or Chinese where there are a lot of characters and even the slightest mistake can make a big difference.
I am unsure how the program will analyze the character. My current idea is to get a single pixel width line to represent the stroke, compare how far each pixel is from the corresponding pixel in the example character loaded from a database, and output which area needs the most work. Endpoints will also be useful to know. I would also like to tell the user if their character could be interpreted as another character similar to the one they wanted to write.
I imagine I will need a library of some sort to complete this project in any sort of timely manner but I have been unable to locate one which meets the standards I will need for the program. I looked into OpenCV but it appears to be meant for vision than image processing. I would also appreciate the library/module to be in python or Java but I can learn a new language if absolutely necessary.
Thank you for any help in this project.
Character Recognition is usually implemented using Artificial Neural Networks (ANNs). It is not a straightforward task to implement seeing that there are usually lots of ways in which different people write the same character.
The good thing about neural networks is that they can be trained. So, to change from one language to another all you need to change are the weights between the neurons, and leave your network intact. Neural networks are also able to generalize to a certain extent, so they are usually able to cope with minor variances of the same letter.
Tesseract is an open source OCR engine that was originally developed in the mid-90s. You might want to read about it to gain some pointers.
You can follow company links from this Wikipedia article:
http://en.wikipedia.org/wiki/Intelligent_character_recognition
I would not recommend that you attempt to implement a solution yourself, especially if you want to complete the task in less than a year or two of full-time work. It would be unfortunate if an incomplete solution provided poor guidance for students.
A word of caution: some companies that offer commercial ICR libraries may not wish to support you and/or may not provide a quote. That's their right. However, if you do not feel comfortable working with a particular vendor, either ask for a different sales contact and/or try a different vendor first.
My current idea is to get a single pixel width line to represent the stroke, compare how far each pixel is from the corresponding pixel in the example character loaded from a database, and output which area needs the most work.
The initial step of getting a stroke representation only a single pixel wide is much more difficult than you might guess. Although there are simple algorithms (e.g. Stentiford and Zhang-Suen) to perform thinning, stroke crossings and rough edges present serious problems. This is a classic (and unsolved) problem. Thinning works much of the time, but when it fails, it can fail miserably.
You could work with an open source library, and although that will help you learn algorithms and their uses, to develop a good solution you will almost certainly need to dig into the algorithms themselves and understand how they work. That requires quite a bit of study.
Here are some books that are useful as introductory textbooks:
Digital Image Processing by Gonzalez and Woods
Character Recognition Systems by Cheriet, Kharma, Siu, and Suen
Reading in the Brain by Stanislas Dehaene
Gonzalez and Woods is a standard textbook in image processing. Without some background knowledge of image processing it will be difficult for you to make progress.
The book by Cheriet, et al., touches on the state of the art in optical character recognition (OCR) and also covers handwriting recognition. The sooner you read this book, the sooner you can learn about techniques that have already been attempted.
The Dehaene book is a readable presentation of the mental processes involved in human reading, and could inspire development of interesting new algorithms.
Have you seen http://www.skritter.com? They do this in combination with spaced recognition scheduling.
I guess you want to classify features such as curves in your strokes (http://en.wikipedia.org/wiki/CJK_strokes), then as a next layer identify components, then estimate the most likely character, all the while statistically weighting the candidates. Where there are two likely matches you will want to show them as likely to be confused. You will also need to create a database of probably 3,000 to 5,000 characters, or up to 10,000 for the ambitious.
See also http://www.tegaki.org/ for an open source program to do this.

Feature/blob correlation and histogram analysis

I'm working on a sketch search engine that correlates whatever someone's sketching with a picture in the database (the db is just about 40 pictures now). I'm doing this mostly for fun so I'm not that well-versed in computer imaging techniques.
First of all, are there any rules of thumb on how one should create histograms (bin sizes, ranges, etc)? I'm using some histogram code found at http://www.scribd.com/doc/6194304/Histograms (but ported to JavaCV). Sometimes I get good results, sometimes I get bad results, most of the time I get "meh" results. I've been experimenting a TON with bin sizes and ranges and I'm wondering if comparing higher dimensional histograms may be the answer here.
Second of all, it seems that black makes a very strong presence in my current histogram setup (even a black dot shifts the entire result set). Should this be expected? Or did I screw something up? Example:
And after the dot:
Note how I'm already getting pictures of "earthrise" as "close" matches.
I'm also wondering what methods I should use for blob or feature analysis. I think that stuff like SURF may be overkill because I only want to broadly compare blobs, not accurately map templates. Is there any way I can compare the edges after being passed through a Canny filter? (Low complexity if possible):
For example, here, I want the two smiley faces to be at the top because the needle smiley "blob" is more closely related to the smily face shape than to a bunch of passion fruit or a galaxy.
Phew long question. If you want to try out the engine for yourself, go to http://skrch.dvt.name/ (shameless plug, I know, I know -- only works in FF/Chrome/Safari). Maybe more experienced computer vision people can make suggestions based on results. Oh, I'm using the CV_COMP_BHATTACHARYYA distance when comparing histograms (it seemed that it gave the best results although chi-square isn't bad either).
Is there a background? Is it significant?
Maybe you need to look at whether there is a user-supplied background or not.
Then you "just" need to have two histograms per db entry: one with the background, one without.
That'll stop earthrise looking like an apple with a dot.
For basic background separation, try a Canny edge detection, then take the "outside" and remove it from a copy of the original.
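One way to sketch the "histogram without background" idea in Java: zero out a known background bin (here bin 0, standing in for black) and re-normalize before comparing. The comparison below uses the Bhattacharyya coefficient (higher = more similar), the same family as the CV_COMP_BHATTACHARYYA distance the question mentions; the 4-bin histograms are toy data:

```java
// Hedged sketch: background-aware histogram comparison.
public class BgAwareHistCompare {

    // Remove one bin (e.g. the black/background bin) and re-normalize.
    public static double[] dropBin(double[] h, int bgBin) {
        double[] out = h.clone();
        out[bgBin] = 0;
        double sum = 0;
        for (double v : out) sum += v;
        if (sum > 0) for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    // Bhattacharyya coefficient: 1.0 for identical normalized histograms.
    public static double bhattacharyya(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += Math.sqrt(a[i] * b[i]);
        return s;
    }

    public static void main(String[] args) {
        double[] sketch = {0.5, 0.25, 0.25, 0};   // half the pixels are black ink
        double[] photo  = {0.0, 0.5, 0.5, 0};     // same colours, no black bg
        double raw  = bhattacharyya(sketch, photo);
        double noBg = bhattacharyya(dropBin(sketch, 0), dropBin(photo, 0));
        System.out.println(raw < noBg); // ignoring the black bin makes them match
    }
}
```

Keeping both scores per db entry (with and without background), as the answer suggests, lets you pick whichever interpretation fits the user's sketch.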

What is the best way to read, represent and render map data?

I am interested in writing a simplistic navigation application as a pet project. After searching around for free map-data I have settled on the US Census Bureau TIGER 2007 Line/Shapefile map data. The data is split up into zip files for individual counties and I've downloaded a single counties map-data for my area.
What would be the best way to read in this map-data into a useable format?
How should I:
Read in these files
Parse them - Regular expression or some library that can already parse these Shapefiles?
Load the data into my application - Should I load the points directly into some datastructure in memory? Use a small database? I have no need for persistence once you close the application of the map data. The user can load the Shapefile again.
What would be the best way to render the map once I have read the in the Shapefile data?
Ideally I'd like to be able to read in a county's map-data shapefile and render all the polylines onto the screen and allow rotating and scaling.
How should I:
Convert lat/lon points to screen coordinates? - As far as I know the Shapefile uses longitude and latitude for its points. So obviously I'm going to have to convert these somehow to screen coordinates to display the map features.
Render the map data (A series of polylines for roads, boundaries, etc) in a way that I can easily rotate and scale the entire map?
Render my whole map as a series of "tiles" so only the features/lines within the viewing area are rendered?
Ex. of TIGER data rendered as a display map:
Anyone with some experience and insight into what the best way for me to read in these files, how I should represent them (database, in memory datastructure) in my program, and how I should render (with rotating/scaling) the map-data on screen would be appreciated.
EDIT: To clarify, I do not want to use any Google or Yahoo maps API. Similarly, I don't want to use OpenStreetMap. I'm looking for a more from-scratch approach than utilizing those apis/programs. This will be a desktop application.
First, I recommend that you use the 2008 TIGER files.
Second, as others point out there are a lot of projects out there now that already read in, interpret, convert, and use the data. Building your own parser for this data is almost trivial, though, so there's no reason to go through another project's code and try to extract what you need unless you plan on using their project as a whole.
If you want to start from the lower level
Parsing
Building your own TIGER parser (reasonably easy - just a DB of line segments), and building a simple render on top of that (lines, polygons, letters/names) is also going to be fairly easy. You'll want to look at various map projection types for the render phase. The most frequently used (and therefore most familiar to users) is the Mercator projection - it's fairly simple and fast. You might want to play with supporting other projections.
This will provide a bit of 'fun' in terms of seeing how to project a map, and how to reverse that projection (say a user clicks on the map, you want to see the lat/lon they clicked - requires reversing the current projection equation).
Rendering
When I developed my renderer I decided to base my window on a fixed size (embedded device), and a fixed magnification. This meant that I could center the map at a lat/lon, and with the center pixel=center lat/lon at a given magnification, and given the mercator projection I could calculate which pixel represented each lat/lon, and vice-versa.
Some programs instead allow the window to vary, and instead of using magnification and a fixed point, they use two fixed points (often the upper left and lower right corners of a rectangle defining the window). In this case it becomes trivial to determine the pixel to lat/lon transfer - it's just a few interpolation calculations. Rotating and scaling make this transfer function a little more complex, but shouldn't be considerably so - it's still a rectangular window with interpolation, but the window corners don't need to be in any particular orientation with respect to north. This adds a few corner cases (you can turn the map inside out and view it as if from inside the earth, for instance) but these aren't onerous, and can be dealt with as you work on it.
Once you've got the lat/lon-to-pixel transfer done, rendering lines and polygons is fairly simple except for normal graphics issues (such as edges of lines or polygons overlapping inappropriately, anti-aliasing, etc). But rendering a basic, ugly map such as is done by many open source renderers is fairly straightforward.
You'll also be able to play with distance and great-circle calculations - for instance, a nice rule of thumb is that every degree of lat or lon at the equator is approximately 111.1 km - but a degree of longitude shrinks as you get closer to either pole, while a degree of latitude remains at roughly 111.1 km everywhere.
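The center-pixel-at-a-fixed-magnification scheme described above can be sketched in Java with a spherical Mercator projection. The scale constant is arbitrary (a real renderer derives it from the zoom level), and the window sizes are just examples:

```java
// Hedged sketch: lat/lon <-> pixel mapping for a window centred on a
// chosen lat/lon, using the spherical Mercator projection.
public class MercatorWindow {

    final double centerLonRad, centerMercY, scale; // scale = pixels per radian
    final int width, height;

    public MercatorWindow(double centerLat, double centerLon, double scale, int w, int h) {
        this.centerLonRad = Math.toRadians(centerLon);
        this.centerMercY = mercY(Math.toRadians(centerLat));
        this.scale = scale;
        this.width = w;
        this.height = h;
    }

    static double mercY(double latRad) {
        return Math.log(Math.tan(Math.PI / 4 + latRad / 2));
    }

    public double[] toPixel(double lat, double lon) {
        double x = width / 2.0 + (Math.toRadians(lon) - centerLonRad) * scale;
        double y = height / 2.0 - (mercY(Math.toRadians(lat)) - centerMercY) * scale;
        return new double[]{x, y};
    }

    // Reverse projection: e.g. the user clicks a pixel, we recover lat/lon.
    public double[] toLatLon(double px, double py) {
        double lonRad = centerLonRad + (px - width / 2.0) / scale;
        double y = centerMercY + (height / 2.0 - py) / scale;
        double latRad = 2 * Math.atan(Math.exp(y)) - Math.PI / 2;
        return new double[]{Math.toDegrees(latRad), Math.toDegrees(lonRad)};
    }

    public static void main(String[] args) {
        MercatorWindow win = new MercatorWindow(38.0, -97.0, 4000, 800, 600);
        double[] px = win.toPixel(38.5, -96.5);
        double[] back = win.toLatLon(px[0], px[1]);
        System.out.printf("%.4f %.4f%n", back[0], back[1]); // 38.5000 -96.5000
    }
}
```

The forward and reverse methods are exact inverses, which is what makes click-to-lat/lon trivial once rendering works.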
Storage and Structures
How you store and refer to the data, however, depends greatly on what you plan on doing with it. A lot of difficult problems arise if you want to use the same database structure for demographics vs routing - a given data base structure and indexing will be fast for one, and slow for the other.
Using zipcodes and loading only the nearby zipcodes works for small map rendering projects, but if you need a route across the country you need a different structure. Some implementations have 'overlay' databases which only contain major roads and snaps routes to the overlay (or through multiple overlays - local, metro, county, state, country). This results in fast, but sometimes inefficient routing.
Tiling
Tiling your map is actually not easy. At lower magnifications you can render a whole map and cut it up. At higher magnifications you can't render the whole thing at once (due to memory/space constraints), so you have to slice it up.
Cutting lines at boundaries of tiles so you can render individual tiles results in less than perfect results - often what is done is lines are rendered beyond the tile boundary (or, at least the data of the line end is kept, though rendering stops once it finds it's fallen off the edge) - this reduces error that occurs with lines looking like they don't quite match as they travel across tiles.
You'll see what I'm talking about as you work on this problem.
It isn't trivial to find the data that goes into a given tile as well - a line may have both ends outside a given tile, but travel across the tile. You'll need to consult graphics books about this (Michael Abrash's book is the seminal reference, freely available now at the preceding link). While it talks mostly about gaming, the windowing, clipping, polygon edges, collision, etc all apply here.
However, you might want to play at a higher level.
Once you have the above done (either by adapting an existing project, or doing the above yourself) you may want to play with other scenarios and algorithms.
Reverse geocoding is reasonably easy. Input lat/lon (or click on map) and get the nearest address. This teaches you how to interpret addresses along line segments in TIGER data.
Basic geocoding is a hard problem. Writing an address parser is a useful and interesting project, and then converting that into lat/lon using the TIGER data is non-trivial, but a lot of fun. Start out simple and small by requiring exact name and format matching, and then start to look into 'like' matching and phonetic matching. There's a lot of research in this area - look at search engine projects for some help here.
Finding the shortest path between two points is a non-trivial problem. There are many, many algorithms for doing that, most of which are patented. I recommend that if you try this go with an easy algorithm of your own design, and then do some research and compare your design to the state of the art. It's a lot of fun if you're into graph theory.
Following a path and pre-emptively giving instructions is not as easy as it looks on first blush. Given a set of instructions with an associated array of lat/lon pairs, 'follow' the route using external input (GPS, or simulated GPS) and develop an algorithm that gives the user instructions as they approach each real intersection. Notice that there are more lat/lon pairs than instructions due to curving roads, etc, and you'll need to detect direction of travel and so forth. Lots of corner cases you won't see until you try to implement it.
Point of interest search. This one is interesting - you need to find the current location, and all the points of interest (not part of TIGER, make your own or get another source) within a certain distance (as the crow flies, or harder - driving distance) of the origin. This one is interesting in that you have to convert the POI database into a format that is easy to search in this circumstance. You can't take the time to go through millions of entries, do the distance calculation (sqrt(x^2 + y^2)), and return the results. You need to have some method or algorithm to cut the amount of data down first.
Traveling salesman. Routing with multiple destinations. Just a harder version of regular routing.
You can find a number of links to many projects and sources of information on this subject here.
Good luck, and please publish whatever you do, no matter how rudimentary or ugly, so others can benefit!
-Adam
SharpMap is an open-source .NET 2.0 mapping engine for WinForms and ASP.NET. This may provide all the functionality that you need. It deals with most common GIS vector and raster data formats including ESRI shapefiles.
The solution is:
a geospatial server like MapServer, GeoServer, or deegree (open source). They can read and serve shapefiles (and many other things). For example, GeoServer (when installed) serves data from US Census Bureau TIGER shapefiles as a demo.
a JavaScript cartographic library like OpenLayers (see the examples on their site).
There are plenty of examples on the web using this solution.
Funny question. Here's how I do it.
I gather whatever geometry I need in whatever formats they come in. I've been pulling data from USGS, so that amounts to a bunch of:
SHP Files (ESRI Shapefile Technical Description)
DBF Files (Xbase Data file (*.dbf))
I then wrote a program that "compiles" those shape definitions into a form that is efficient to render. This means doing any projections and data format conversions that are necessary to efficiently display the data. Some details:
For a 2D application, you can use whatever projection you want: Map Projections.
For 3D, you want to convert those latitude/longitudes into 3D coordinates. Here is some math on how to do that: transformation from
spherical coordinates to normal rectangular coordinates.
Break up all the primitives into a quadtree/octree (2D/3D). Leaf nodes in this tree contain references to all geometry that intersects that leaf node's (axis-aligned) bounding-box. (This means that a piece of geometry can be referenced more than once.)
The geometry is then split into a table of vertices and a table of drawing commands. This is an ideal format for OpenGL. Commands can be issued via glDrawArrays using vertex buffers (Vertex Buffer Objects).
A general visitor pattern is used to walk the quadtree/octree. Walking involves testing whether the visitor intersects the given nodes of the tree until a leaf node is encountered. Visitors include: drawing, collision detection, and selection. (Because the tree leaves can contain duplicate references to geometry, the walker marks nodes as being visited and ignores them thereafter. These marks have to be reset or otherwise updated before doing the next walk.)
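The quadtree described above can be sketched in Java. This is a deliberately simplified version: leaves store references to every item whose bounding box intersects the leaf's box (so one item can appear in several leaves), and the de-duplication that the answer handles with visited marks is done here with a set instead. Drawing-command generation is omitted:

```java
import java.util.*;

// Hedged sketch of a fixed-depth quadtree over axis-aligned bounding boxes.
public class QuadTree<T> {

    // box = {minX, minY, maxX, maxY}
    static boolean intersects(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
    }

    final double[] box;
    final int depth;
    QuadTree<T>[] children;                       // null until first insert splits
    final List<T> items = new ArrayList<>();      // populated only at leaves
    final List<double[]> itemBoxes = new ArrayList<>();

    public QuadTree(double[] box, int depth) { this.box = box; this.depth = depth; }

    @SuppressWarnings("unchecked")
    public void insert(T item, double[] itemBox) {
        if (!intersects(box, itemBox)) return;
        if (depth == 0) { items.add(item); itemBoxes.add(itemBox); return; }
        if (children == null) {
            double mx = (box[0] + box[2]) / 2, my = (box[1] + box[3]) / 2;
            children = new QuadTree[] {
                new QuadTree<>(new double[]{box[0], box[1], mx, my}, depth - 1),
                new QuadTree<>(new double[]{mx, box[1], box[2], my}, depth - 1),
                new QuadTree<>(new double[]{box[0], my, mx, box[3]}, depth - 1),
                new QuadTree<>(new double[]{mx, my, box[2], box[3]}, depth - 1)};
        }
        for (QuadTree<T> c : children) c.insert(item, itemBox);
    }

    public Set<T> query(double[] queryBox) {
        Set<T> out = new LinkedHashSet<>();       // set de-duplicates multi-leaf items
        collect(queryBox, out);
        return out;
    }

    private void collect(double[] q, Set<T> out) {
        if (!intersects(box, q)) return;
        for (int i = 0; i < items.size(); i++)
            if (intersects(itemBoxes.get(i), q)) out.add(items.get(i));
        if (children != null) for (QuadTree<T> c : children) c.collect(q, out);
    }

    public static void main(String[] args) {
        QuadTree<String> tree = new QuadTree<>(new double[]{0, 0, 100, 100}, 3);
        tree.insert("road A", new double[]{10, 10, 60, 12});  // spans several leaves
        tree.insert("road B", new double[]{80, 80, 90, 95});
        System.out.println(tree.query(new double[]{0, 0, 50, 50})); // [road A]
    }
}
```

Replacing the set with visited marks, as the answer does, avoids allocating a collection per walk, which matters when the same walk drives drawing, collision detection, and selection.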
Using a spatial partitioning system (one of the trees) and a drawing-efficient representation is crucial to achieving high framerates. I have found that in these types of applications, you want your frame rate as high as possible, with 20 fps as a minimum. Not to mention that performance headroom gives you lots of opportunities to create a better-looking map. (Mine's far from good looking, but will get there some day.)
The spatial partitioning helps rendering performance by reducing the number of draw commands issued. However, there could come a time when the user actually wants to view the entire dataset (perhaps an aerial view). In this case, you need a level-of-detail control system. Since my application deals with streets, I give priority to highways and larger roads. My drawing code knows how many primitives it can draw before the framerate goes down. The primitives are sorted by this priority, and I draw only the first x items, where x is the number of primitives I can draw at my desired framerate.
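That priority-based frame budget can be sketched in a few lines of Java (illustrative names; a real renderer would keep the list pre-sorted rather than sorting every frame):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Level-of-detail sketch: each primitive carries a draw priority
// (highways high, side streets low). Per frame, take only as many
// primitives, highest priority first, as the frame budget allows.
public class LodRenderer {
    public static class Primitive {
        final String name;
        final int priority;
        public Primitive(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    /** Returns the primitives that fit in this frame's budget, highest priority first. */
    public static List<Primitive> selectForFrame(List<Primitive> prims, int budget) {
        List<Primitive> sorted = new ArrayList<>(prims);
        sorted.sort(Comparator.comparingInt((Primitive p) -> p.priority).reversed());
        return sorted.subList(0, Math.min(budget, sorted.size()));
    }
}
```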
The rest is camera control and animation of whatever data you want to display.
Here are some examples of my existing implementation:
http://seabusmap.com/assets/Picture%205.png
http://seabusmap.com/assets/Picture%207.png
For storing TIGER data locally, I would choose PostgreSQL with the PostGIS tools.
They have an impressive collection of tools; for you especially, the Tiger Geocoder offers a good way of importing and using the TIGER data.
You will also need to look at the tools that interact with PostGIS, most likely some sort of map server.
from http://postgis.refractions.net/documentation/:
There are now several open source tools which work with PostGIS. The uDig project is working on a full read/write desktop environment that can work with PostGIS directly. For internet mapping, the University of Minnesota Mapserver can use PostGIS as a data source. The GeoTools Java GIS toolkit has PostGIS support, as does the GeoServer Web Feature Server. GRASS supports PostGIS as a data source. The JUMP Java desktop GIS viewer has a simple plugin for reading PostGIS data, and the QGIS desktop has good PostGIS support. PostGIS data can be exported to several output GIS formats using the OGR C++ library and commandline tools (and of course with the bundled Shape file dumper). And of course any language which can work with PostgreSQL can work with PostGIS -- the list includes Perl, PHP, Python, TCL, C, C++, Java, C#, and more.
edit: despite MapServer having the word SERVER in its name, it is usable in a desktop environment.
Though you have already decided to use the TIGER data, you might be interested in OSM (OpenStreetMap), because OSM includes a complete import of the TIGER data, enriched with user-contributed data. If you stick to the TIGER format, your app will be useless to international users; with OSM you get TIGER and everything else at once.
OSM is an open project featuring a collaboratively edited free world map. You can get all this data as well structured XML, either query for a region, or download the whole world in a large file.
There are some map renderers for OSM available in various programming languages, most of them open source, but still there is much to be done.
There is also an OSM routing service available. It has a web interface and might also be queryable via a web service API. Again, it's not all finished; users could definitely use a desktop or mobile routing application built on top of it.
Even if you don't decide to go with that project, you can get lots of inspiration from it. Just have a look at the project wiki and at the sources of the various software projects which are involved (you will find links to them inside the wiki).
You could also work with Microsoft's Virtual Earth mapping application and API, or use Google's API. I have always programmed commercially with ESRI products and have not played with the open APIs that much.
Also, you might want to look at Maker! and Finder! They are relatively new programs, but I think they are free. They might be limited on embedding the data. Maker can be found here.
The problem is that spatial processing is fairly new in the non commercial scale.
If you don't mind paying for a solution, Safe Software produces a product called FME. This tool will help you translate data from almost any format to just about any other, including KML (the Google Earth format), or render it as a JPEG (or series of JPEGs). After converting the data, you can embed Google Earth into your application using their API or just display the tiled images.
As a side note, FME is a very powerful platform, so while doing your translations you can add or remove parts of the data you don't necessarily need, merge sources if you have more than one, convert coordinates (I don't remember exactly what Google Earth uses), and store backups in a database. Seriously, if you're willing to shell out a few bucks you should look into this.
You can also create flags (much like in your sample map) which contain a location (where to put it) and other data/comments about the location. These flags come in many shapes and sizes.
One simplification over a Mercator or other projection is to assume a constant conversion factor for the latitude and longitude. Multiply the degrees of latitude by 69.172 miles; for the longitude, pick the middle latitude of your map area and multiply (180-longitude) by cosine(middle_latitude)*69.172. Once you've converted to miles, you can use another set of conversions to get to screen coordinates.
This is what worked for me back in 1979.
My source for the number of miles per degree.
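The constant-factor conversion above can be sketched in Java as follows; the class and method names are illustrative, and the (180 - longitude) origin shift is applied exactly as the answer states it:

```java
// Sketch of a flat (equirectangular-style) conversion from degrees to
// miles: latitude degrees times 69.172 miles per degree; longitude
// degrees shifted to a positive origin and scaled by the cosine of the
// map area's middle latitude, as described above.
public class FlatProjection {
    static final double MILES_PER_DEGREE = 69.172;
    final double cosMidLat;

    public FlatProjection(double middleLatitudeDeg) {
        this.cosMidLat = Math.cos(Math.toRadians(middleLatitudeDeg));
    }

    public double latToMiles(double latDeg) {
        return latDeg * MILES_PER_DEGREE;
    }

    public double lonToMiles(double lonDeg) {
        return (180 - lonDeg) * cosMidLat * MILES_PER_DEGREE;
    }
}
```

Once you have miles, a second linear scale takes you to screen coordinates. The approximation degrades for maps spanning a wide latitude range, since the single cosine factor is only exact at the middle latitude.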
When I gave this answer the question was labeled
"What would be the best way to render a Shapefile (map data) with polylines in .Net?"
Now it is a different question but I leave my answer to the original question.
I wrote a .NET version that could draw vector data (such as the geometry from a .shp file) using plain GDI+ in C#. It was quite fun.
The reason was that we needed to handle different versions of geometries and attributes with a lot of additional information, so we could not use a commercial map component or an open source one.
The main thing when doing this is to establish a viewport and translate/transform the WGS84 coordinates down to GDI+ x,y coordinates, and to hold off on projection unless you actually need to reproject at all.
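The viewport idea in that answer was C#/GDI+, but the same transform is a few lines in any language; here is a Java sketch with illustrative names (screen y grows downward, matching the GDI+/AWT convention):

```java
// Viewport sketch: mapping WGS84 degrees to pixel coordinates by a
// simple offset-and-scale, with no map projection applied.
public class Viewport {
    final double minLon;   // longitude at the left edge of the screen
    final double maxLat;   // latitude at the top edge of the screen
    final double scale;    // pixels per degree

    public Viewport(double minLon, double maxLat, double scale) {
        this.minLon = minLon;
        this.maxLat = maxLat;
        this.scale = scale;
    }

    /** Returns {x, y}; x grows east, y grows south. */
    public int[] toScreen(double lonDeg, double latDeg) {
        int x = (int) Math.round((lonDeg - minLon) * scale);
        int y = (int) Math.round((maxLat - latDeg) * scale);
        return new int[] { x, y };
    }
}
```

Panning just changes minLon/maxLat, and zooming changes scale; the geometry itself never needs to be touched.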
One solution is to use MapXtreme. They have APIs for Java and C#. The API is able to load these files and render them.
For Java:
http://www.mapinfo.com/products/developer-tools/desktop%2c-mobile-%26-internet-offering/mapxtreme-java
For .NET:
http://www.mapinfo.com/products/developer-tools/desktop%2c-mobile-%26-internet-offering/mapxtreme-2008
I used this solution in a desktop application and it worked well. It offers a lot more than just rendering information.
Now, doing this from scratch could take quite a while. They do have an evaluation version that you can download. I think it just prints "MAPXTREME" over the map as a watermark, but it is otherwise completely usable.