Sunday, October 17, 2010

Reading #10

Comments:
Ozgur

Summary:

This paper describes the HUNCH system, a primitive sketch recognition system. It is not as fully functional as Paleo, but considering that it was published in 1976, HUNCH introduced many important concepts that a well-functioning sketch recognition system should have.

The HUNCH system works as follows. First, it finds corners from the drawing speed; a corner is usually a local minimum of the speed function. Then it latches endpoints that are near each other, that is, within a given radius of each other. The last two steps infer the intent of the user. However, this inference does not come from a learned classifier as in today's approaches, but from context that the user provides: a simple description of the hierarchical interpretation of the data.
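
The first two steps can be sketched roughly as below. This is only my own minimal illustration, not HUNCH's actual code: the function names, the `slow_fraction` cutoff, and the fixed latching `radius` are all assumptions made for the example.

```python
import math

def speeds(points):
    """Pen speed over each segment between consecutive (x, y, t) samples."""
    out = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = max(t1 - t0, 1e-9)  # guard against duplicate timestamps
        out.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return out

def find_corners(points, slow_fraction=0.5):
    """Indices of points that end a segment whose speed is a local minimum.

    Assumption: a minimum only counts as a corner when it is also slower
    than `slow_fraction` times the mean speed of the stroke.
    """
    s = speeds(points)
    mean = sum(s) / len(s)
    corners = []
    for i in range(1, len(s) - 1):
        if s[i] < s[i - 1] and s[i] <= s[i + 1] and s[i] < slow_fraction * mean:
            corners.append(i + 1)  # segment i ends at point i + 1
    return corners

def latch(endpoints, radius):
    """Snap (x, y) endpoints lying within `radius` of each other onto their midpoint."""
    snapped = list(endpoints)
    for i in range(len(snapped)):
        for j in range(i + 1, len(snapped)):
            (xi, yi), (xj, yj) = snapped[i], snapped[j]
            if math.hypot(xi - xj, yi - yj) <= radius:
                mid = ((xi + xj) / 2, (yi + yj) / 2)
                snapped[i] = snapped[j] = mid
    return snapped
```

For an L-shaped stroke where the pen slows down at the bend, `find_corners` returns the index of the sample at the bend, and `latch` pulls two nearly touching endpoints onto one shared point.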

Since more emphasis is placed on interpretation, different user interpretations produce different "recognition" results.


Discussion:

Considering the year it was published, this paper undoubtedly provided the most basic steps that a recognition system should have, such as a corner finder and a latching/connector step.

However, this does not really seem to be a "recognition" system. If the user can give rich context describing what they would like to draw, why should the system bother to guess the user's intent? The context can tell the system anything if it is well defined.

So this is more like an augmented drawing or paint tool. The user gives a primitive drawing that sets some basic constraints (feature points such as lines, corners, or the like) that the final shape should follow, tells the system what they would like to draw through a context, and then the system beautifies the user's primitive drawing.

Even though I do not think much in the paper can be applied to today's recognition domain, this is a very interesting direction to follow. Imagine a very simple drawing tool with which a novice can produce a very complicated work. However, we would need to design a good context: simple for the user but rich enough for the computer to understand.

Reading #9

Comments:
Yue

Summary:

This paper introduces the Paleo system, a new low-level recognition and beautification system with high accuracy. The recognizer classifies single strokes into primitives. Primitives drawn with multiple strokes can be merged by an upper-level recognition system.

A stroke is defined as the set of points (consisting of an x coordinate, y coordinate, and time value) sampled between pen down and pen up events. The primitive shapes that can be recognized as one stroke are:

• Line: a stroke with a relatively constant slope between all sample points
• Polyline: a stroke consisting of multiple, connected lines
• Circle: a stroke that has a total direction close to 2π, constant radius between the center point and each stroke point, and whose major and minor axes are close in size
• Ellipse: a stroke with similar properties of a circle, but whose major and minor axes are not similar.
• Arc: a segment of an incomplete circle
• Curve: a stroke whose points can be fit smoothly up to a fifth degree curve
• Spiral: a stroke that is composed of a series of circles with continuously descending (or ascending) radii but a constant center.
• Helix: a stroke that is composed of a series of circles with similar radii but with moving centers. We also assume that helixes are drawn linearly.
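
To make one of these definitions concrete, the "total direction close to 2π" property in the circle test can be computed as the sum of signed turning angles along the stroke. The sketch below is my own illustration, not code from the paper: the function names and the `tolerance` value are assumptions, and the time value of each sample is dropped because it is not needed here.

```python
import math

def total_rotation(points):
    """Sum of signed turning angles (radians) along a stroke of (x, y) points."""
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        before = math.atan2(y1 - y0, x1 - x0)
        after = math.atan2(y2 - y1, x2 - x1)
        # Wrap the difference into [-pi, pi) so crossing the atan2 branch cut does not add a jump.
        total += (after - before + math.pi) % (2 * math.pi) - math.pi
    return total

def turns_about_once(points, tolerance=0.15):
    """Hypothetical check: total rotation within `tolerance` (as a fraction) of 2*pi."""
    full_turn = 2 * math.pi
    return abs(abs(total_rotation(points)) - full_turn) <= tolerance * full_turn
```

A straight line gives a total rotation of zero, while a stroke that goes once around a circle gives a value close to 2π, whichever way it is drawn.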

So there are eight classifiers, one for each of these eight shapes. The user's input stroke goes through each classifier, and each returns whether the stroke belongs to its shape.
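
The flow described above (every stroke is handed to every shape test) might look something like the following. The two tests here are simplified stand-ins that I made up for illustration, with assumed thresholds; they are not Paleo's real line and circle tests, and the other six shapes are omitted.

```python
import math

def path_length(points):
    """Total length of the polyline through the (x, y) points."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def is_line(points, min_straightness=0.95):
    """Stand-in line test: endpoint distance nearly equals path length."""
    length = path_length(points)
    if length == 0:
        return False
    (x0, y0), (xn, yn) = points[0], points[-1]
    return math.hypot(xn - x0, yn - y0) / length >= min_straightness

def is_circle(points, max_radius_spread=0.1, max_gap=0.3):
    """Stand-in circle test: near-constant radius about the centroid, nearly closed."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_r = sum(radii) / len(radii)
    if mean_r == 0:
        return False
    spread = (max(radii) - min(radii)) / mean_r
    (x0, y0), (xn, yn) = points[0], points[-1]
    gap = math.hypot(xn - x0, yn - y0) / mean_r
    return spread <= max_radius_spread and gap <= max_gap

# Each entry pairs a shape name with its yes/no test.
CLASSIFIERS = [("line", is_line), ("circle", is_circle)]

def classify(points, classifiers=CLASSIFIERS):
    """Run the stroke through every test and collect the shapes that accept it."""
    return [name for name, test in classifiers if test(points)]
```

Because each test answers independently, a sloppy stroke can pass more than one test (or none), so something still has to choose among the candidates afterwards.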

It should also be mentioned that the authors introduce two important features, the normalized distance between direction extremes (NDDE) and the direction change ratio (DCR), which improve the results considerably.
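
As I understand the two features, NDDE is the stroke length between the points of highest and lowest direction divided by the whole stroke length, and DCR is the largest change in direction divided by the average change in direction. The sketch below follows that reading; it is an assumption on my part rather than the paper's exact formulation, and in particular I measure direction as an unwrapped `atan2` angle per segment.

```python
import math

def _segment_lengths(points):
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def _directions(points):
    """Direction angle of each segment, unwrapped so it changes continuously."""
    raw = [math.atan2(y1 - y0, x1 - x0)
           for (x0, y0), (x1, y1) in zip(points, points[1:])]
    unwrapped = [raw[0]]
    for angle in raw[1:]:
        step = (angle - unwrapped[-1] + math.pi) % (2 * math.pi) - math.pi
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped

def ndde(points):
    """Length between the segments of extreme direction, over total stroke length."""
    lengths = _segment_lengths(points)
    dirs = _directions(points)
    i, j = sorted((dirs.index(max(dirs)), dirs.index(min(dirs))))
    return sum(lengths[i:j + 1]) / sum(lengths)

def dcr(points):
    """Largest direction change over the mean direction change (0.0 if no change)."""
    dirs = _directions(points)
    changes = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
    mean = sum(changes) / len(changes)
    return max(changes) / mean if mean > 0 else 0.0
```

Under this reading, a smooth arc turns steadily in one direction, so its extremes sit at the two ends (NDDE near 1) and no single turn stands out (DCR near 1). A polyline has its extremes somewhere in the middle and a few sharp spikes in direction, so NDDE drops and DCR rises.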


Discussion:

Since we have already used Paleo, there is no need to dwell on the high accuracy it achieves (98.56%). In almost every case, Paleo gives a decent result. The only problem I have found in using it is that it can accidentally classify a triangle as a circle. It is hard to tell a triangle from a circle if the user draws it in one stroke and is not careful enough to make every edge a straight line. In other words, past some degree of "roundness" the triangle is treated as a circle rather than a triangle, because fixed thresholds are set.

In the paper these thresholds are hard-coded and chosen empirically; in other words, they are tuned manually. This leaves quite a lot of room for improvement: can we find an automatic way to tune these thresholds, or choose thresholds that are more robust? However, I doubt there are ways to beat an accuracy of 98.56%.
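
One very simple form of automatic tuning would be to sweep candidate values for a single threshold over a set of labeled strokes and keep the one with the best accuracy. The sketch below is only an illustration of that idea, not something from the paper; the function name and the "accept when the feature is at least the threshold" convention are my assumptions.

```python
def best_threshold(values, labels, candidates):
    """Pick the candidate threshold with the highest accuracy on labeled examples.

    values:     one feature value per example (for instance a straightness score)
    labels:     True if the example really is the shape, else False
    candidates: thresholds to try; an example is accepted when value >= threshold
    """
    def accuracy(threshold):
        hits = sum((value >= threshold) == label
                   for value, label in zip(values, labels))
        return hits / len(values)

    return max(candidates, key=accuracy)
```

This tunes one threshold in isolation. Paleo's thresholds interact across the eight tests, so a real tuning procedure would have to score the whole recognizer rather than each test separately.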