Thursday, September 16, 2010

Reading #8

Comments:
Sampath

Summary:

This paper proposed the $N recognizer. As the name indicates, it is an extension of the $1 recognizer, with several improvements: recognizing gestures comprising multiple strokes, automatically generalizing from one multistroke template to all possible multistrokes with alternative stroke orderings and directions, recognizing 1D gestures such as lines, and providing bounded rotation invariance.

As an extension, the $N recognizer is built on top of the $1 recognizer. The major difference is that $N adds extra preprocessing to convert a multistroke into a unistroke before comparing it with the templates; from that point on, $1 and $N are almost the same (though the rotation handling differs slightly).

Dealing with multistrokes: The user only needs to define one multistroke template; the recognizer then generates all permutations of stroke order and direction, so that a gesture drawn with a different order and/or direction is still properly recognized. At runtime, the input multistroke is first converted into a unistroke and then follows $1's algorithm.
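As a rough sketch of that generalization step (my own illustrative Python, not the authors' code; representing each stroke as a list of (x, y) points is an assumption), the permutations of stroke order and direction can be enumerated like this:

```python
from itertools import permutations

def unistroke_permutations(strokes):
    """Enumerate every way of replaying a multistroke as one unistroke:
    all stroke orderings (n!) times all per-stroke directions (2^n).
    Each stroke is assumed to be a list of (x, y) points."""
    results = []
    for order in permutations(strokes):
        for dirs in range(2 ** len(order)):
            unistroke = []
            for i, stroke in enumerate(order):
                # Bit i of dirs decides whether stroke i is reversed.
                pts = stroke[::-1] if (dirs >> i) & 1 else list(stroke)
                unistroke.extend(pts)  # connect strokes end to end
            results.append(unistroke)
    return results

# A two-stroke "X" yields 2! * 2^2 = 8 unistroke permutations.
x_gesture = [[(0, 0), (2, 2)], [(2, 0), (0, 2)]]
print(len(unistroke_permutations(x_gesture)))  # 8
```

Each resulting unistroke can then be resampled and matched exactly as in $1, which is also why the template count, and hence matching time, grows so quickly with the number of strokes.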

Dealing with rotation: This part is a little similar to Protractor: since full rotation invariance can discard useful orientation information, rotation is only performed within a bounded range; as the authors indicate, 45 degrees is a good bound.

Dealing with 1D gestures: A MIN-SIDE-to-MAX-SIDE threshold on the bounding box decides whether the stroke is too "thin"; if it is, the gesture is treated as 1D, otherwise as 2D.
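A minimal sketch of that test (the 0.25 threshold here is an illustrative value, not necessarily the paper's constant):

```python
def is_1d(points, threshold=0.25):
    """Treat a stroke as 1D when the bounding box's shorter side
    (MIN-SIDE) is tiny relative to its longer side (MAX-SIDE)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    if max(w, h) == 0:  # a single point: degenerate, call it 1D
        return True
    return min(w, h) / max(w, h) < threshold

print(is_1d([(0, 0), (5, 0.1), (10, 0.2)]))  # True: nearly a line
print(is_1d([(0, 0), (5, 5), (10, 0)]))      # False: clearly 2D
```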

Discussion:

The main contribution of this paper is its handling of multistrokes. The method is pretty simple (take both directions for each stroke and connect all the end points), but I wonder how accurate it really is. The speed is also questionable: each component stroke's direction is a dichotomous [0,1] variable, which already indicates exponential complexity, 2^N direction choices for N strokes (and N!·2^N unistroke permutations once stroke orderings are included). The authors claim the speed is still fast, quoting another paper: "most multistroke gestures have only a few strokes because more elaborate gestures are harder for users to remember and use." Well, if gestures really do have very few strokes, I would not be too surprised to see high accuracy.

Tell me if I am wrong.......

Reading #7

Comments:
Sampath Jayarathna


Summary:

This paper discusses a sketch user interface with which the user can draw naturally, as with pen and paper. The system recognizes the geometric model of the object the user intended to draw, and the drawing is unrestricted: it can use any stroke order, any number of strokes, etc.


The algorithm can be described generally in three steps.

Approximation: This step mainly deals with detecting vertices. For strokes made of straight line segments, vertices are selected according to drawing speed and the curvature at corners. For curves, a Bezier curve is used to approximate/interpolate the shape.
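The speed-based part of vertex detection can be sketched as follows (my own simplified Python; the 0.4 fraction and the use of mean speed are illustrative assumptions, and the paper additionally combines this with a curvature-based pass):

```python
import math

def speed_vertices(points, times, speed_frac=0.4):
    """The pen slows down at corners, so interior points whose speed
    falls below a fraction of the mean speed are vertex candidates."""
    speeds = []
    for i in range(1, len(points)):
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        dt = (times[i] - times[i - 1]) or 1e-9
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    mean = sum(speeds) / len(speeds)
    vertices = [0]  # stroke endpoints are always vertices
    for i in range(1, len(points) - 1):
        if speeds[i - 1] < speed_frac * mean:
            vertices.append(i)
    vertices.append(len(points) - 1)
    return vertices

# An "L"-shaped stroke: the slow point at index 3 is the corner.
pts = [(0, 0), (1, 0), (2, 0), (2.1, 0.1), (2.1, 1.1)]
print(speed_vertices(pts, [0, 1, 2, 3, 4]))  # [0, 3, 4]
```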

Beautification: Make the drawing look more like the user’s intention, e.g., making lines straight or parallel; this may or may not require moving the vertices.

Recognition: Only some simple geometric models are handled; a simple template matching is applied.


Finally, the author asked several participants to try the system, and they all gave positive replies.



Discussion:

The most important part of the system is the approximation, in other words, extracting vertices from the input stroke. If this part can be solved well, the subsequent beautification and recognition become much easier.

However, vertex extraction has remained a problem for years, and this paper is no exception: although the author claims 96% accuracy, the evaluation is based on only 10 figures. The vertices should also be considered a kind of feature, but the way of finding them does not seem convincing to me (it does not deal with rotation and scaling the way conventional methods do). I am wondering how the system would work on more complex examples.

Monday, September 13, 2010

Reading #6

Comments:
Yue Li

Summary:


In this paper the author introduces a gesture recognizer, Protractor, which is an extension of the $1 recognizer, so the two share a lot in common. I will use the same four steps as in the $1 recognizer to explain Protractor.

1. Point resampling: N equidistant points are sampled along the stroke.

2. Rotation: The user can choose orientation-invariant or orientation-sensitive matching. If invariant is chosen, a rotation similar to $1's is done. If sensitive is chosen, Protractor aligns the indicative orientation of a gesture with whichever of eight base orientations requires the least rotation; these eight orientations are considered the major gesture orientations.

3. Scaling and translation: All points are translated so that the centroid is at (0,0). No scaling is done; however, because of the way angular distances are computed, Protractor is inherently scale-invariant.

4. Calculating optimal angular distances: Based on the vector representation obtained from the previous steps, Protractor uses the inverse cosine (angular) distance between vectors as the similarity measure. The template with the maximum similarity score is the best match.

At last, the author gives many examples indicating that Protractor is superior to $1 in terms of both accuracy and speed.



Discussion:

Protractor, as a template-based recognizer, shares properties with the $1 recognizer: no training step is involved, and training samples are simply stored as templates. However, before classification some preprocessing is needed, and the unknown gesture is compared against all the templates. This is time-consuming and may ultimately be slower than a feature-based algorithm. The author does not make a comparison with feature-based methods in this paper, but these properties should be similar.


As a comparison with the $1 recognizer:

Similar parts:

1. Point resampling: done the same way.

2. Translation: still the same.

3. Scaling: even though Protractor has no rescaling step, the way the angular distance is calculated makes it effectively scale-invariant.

4. Rotation invariance: if this option is chosen, both recognizers are rotation-invariant.


Different parts:

1. Rotation-sensitive mode: even though it is called rotation-sensitive, the effect of rotation noise still needs to be reduced. Instead of a full rotation search, Protractor rotates only to align with the nearest of the eight orientations; to me this is something in between fully aligning away the rotation and ignoring rotation noise altogether.

2. Calculating optimal angular distances: this part is the main contribution of the paper. The author indicates that the closed-form solution for the vector-based similarity measurement is the key reason Protractor outperforms the $1 recognizer: it saves a lot of computation compared with $1's iterative search for the best rotation. The results also indicate better accuracy.

Reading #5

Comment:
chris aikens


Summary:



In this paper, the author introduces the $1 recognizer. As the name indicates, the algorithm is simple, easy to implement, and can be integrated into other systems easily.

Its algorithm can be described in four steps:

1. Resample the point path, to make gesture paths directly comparable even at different movement speeds.
2. Rotate once based on the “indicative angle”, defined as the angle formed between the centroid of the gesture (x̄, ȳ) and the gesture’s first point.
3. Scale and translate, during which the gesture is scaled and translated to a reference square and point.
4. Find the optimal angle for the best score: after the previous steps are done for all templates, a candidate C is compared to each stored template T to find the average distance, defined as the path-distance. The template Ti with the least path-distance to C is the result of the recognition.
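The four steps above can be sketched compactly (my own Python rendering; the reference square size of 250 and the omission of $1's golden-section search for the truly optimal angle in step 4 are simplifications):

```python
import math

def resample(points, n=64):
    """Step 1: resample the path into n equidistantly spaced points."""
    pts = list(points)
    total = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    interval, d = total / (n - 1), 0.0
    out = [pts[0]]
    i = 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if d + seg >= interval:
            t = (interval - d) / seg
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            d = 0.0
        else:
            d += seg
        i += 1
    while len(out) < n:       # guard against floating-point shortfall
        out.append(pts[-1])
    return out[:n]

def rotate_by_indicative_angle(points):
    """Step 2: rotate so the centroid-to-first-point angle becomes zero."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    angle = -math.atan2(points[0][1] - cy, points[0][0] - cx)
    c, s = math.cos(angle), math.sin(angle)
    return [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c)
            for x, y in points]

def scale_translate(points, size=250.0):
    """Step 3: scale to a reference square, translate centroid to origin."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(x * size / w, y * size / h) for x, y in points]
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    """Step 4: average point-to-point distance between two aligned paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```

A candidate processed through steps 1 to 3 is compared with every template via `path_distance`, and the template with the smallest distance wins.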

The $1 recognizer is limited to single strokes; the author reports an accuracy of over 99%, as well as a comparison with Rubine's algorithm and Dynamic Time Warping.


Discussion:

Feature-based vs. template-based: Rubine and many other researchers framed gesture recognition as a feature-based recognition process, as long as the features can be easily computed (which is usually true). Even though the training step can be slow, once the classifier is trained, classification is fast. The $1 recognizer has no explicit training step; instead, an alignment is required for every input, which is even more time-consuming.

Invariant features: a translation/scaling/rotation-invariant feature is the kind of feature people are looking for. The $1 recognizer, as I said, is not a feature-based method, but its alignment step eliminates the effects of these transformations. This raises another problem: in sketching, these transformations sometimes carry meaning of their own. For example, Rubine's 4th feature, the angle between the bounding-box diagonal and the bottom horizontal, captures rotation information, and this information is lost in the $1 recognizer.
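For instance, that bounding-box diagonal angle takes only a couple of lines to compute (`rubine_f4` is a hypothetical helper name I am using for illustration):

```python
import math

def rubine_f4(points):
    """Rubine's 4th feature: the angle of the bounding-box diagonal
    above the bottom horizontal. Unlike $1's rotation normalization,
    this deliberately retains orientation information."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return math.atan2(max(ys) - min(ys), max(xs) - min(xs))

print(round(rubine_f4([(0, 0), (10, 1)]), 3))  # 0.1   (wide, flat stroke)
print(round(rubine_f4([(0, 0), (1, 10)]), 3))  # 1.471 (tall stroke)
```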

Thursday, September 9, 2010

Reading #4

Comment:
chris aikens

Summary:


In this paper, the author introduces Sketchpad, the first pen-based sketching system, developed in the early 1960s. The first chapter starts with an example of how to draw a hexagon from a circle and how this can be used to generate a hexagonal lattice; the following parts analyze the capabilities, design, and usage of the system. The second chapter covers the system's data structures: all lines and points are stored with their properties, and all things of the same type are stored in a ring under a generic heading that contains the information distinguishing this type from other types. The remaining chapters discuss how the light pen tracks and locates position; how lines, circles, digits, etc. are drawn on the screen; how recursion enables efficient operations like deleting and merging; and what kinds of constraints the user can apply to the drawing.



Discussion:

Actually, I first learned of Sutherland in a Computer Graphics class, since the famous "Sutherland-Hodgman" polygon-clipping algorithm is named after him. I just googled it, and I am sure it is the same person.

For this paper,

From the graphics point of view, I can see quite a lot of similar ideas. What graphics is concerned with is the description of the drawing, just as Sutherland stated in the paper: "Each time a drawing is made, a description of that drawing is stored." Although most aspects of Sketchpad can easily be realized with today's tools, like OpenGL with an object-oriented language such as C++, it was undoubtedly a great breakthrough in the 1960s.

From the view of sketch recognition, I was wondering whether we could use a graphics-style approach like Sutherland's: build a representation/description of the object, for example fitting a spline to the shape and then checking its parameters. This is another kind of feature, I agree; I am just wondering whether some of the features in Rubine's or other papers are redundant or correlated.

Tuesday, September 7, 2010

Reading #3

Comments:

George
and some others


Summary

This paper is mainly about the toolkit Quill rather than recognition. The ultimate goal of the tool is to advise designers on how to improve their gestures by warning them when two gestures are so similar that they may cause confusion.

First the author describes how the similarity model is built: it is learned from human judgments (three experiments, with 49+49+266 participants). After an introduction to Quill's user interface, most of the paper discusses the difficulties and solutions in the UI, the implementation, and the similarity metrics.



Discussion

Admittedly, the idea of building a gesture design tool is really fascinating, and many difficulties can be foreseen. However, I would like to see more discussion of how the similarity metrics are built up; this should be the core of the whole system, and it is disappointing that recognition plays only a small part in it (I may need to check some of Long's previous papers). Besides, the challenges and solutions discussed in the paper (like timing and hierarchy) seem to me more related to AI.

Reading #2

Comments:
Danielle


Summary:


In the first part of the paper, the author mainly discusses the GRANDMA toolkit, the first sketch recognition toolkit ever developed. The toolkit provides a graphical user interface that can rapidly extract features from a drawing and recognize its meaning.

The rest of the paper covers the algorithm behind GRANDMA, which is similar to the last reading. The algorithm has two parts, feature extraction and classifier training; more specifically, 13 features and a linear classifier.


Discussion:

I think one main contribution of the paper is the GRANDMA system itself, which provides a simple but efficient interaction for sketch gesture recognition; another contribution is the features the author defined.

However, are these features robust? Are they suitable for more complex cases? Should we add features beyond x and y (like pressure, stroke width, etc.)? This paper provides a fundamental framework, so we can explore these points further.

Reading #1

COMMENTS:
chris aikens

SUMMARY:


This paper first relates sketch recognition to the more common technique of gesture recognition. Even though the author considers them fundamentally different, in certain situations gesture recognition techniques can give good results for sketch recognition problems, so they are worth some effort.

For the rest of the paper, the author discusses three fundamental gesture recognition methods, by Rubine, Long, and Wobbrock.

Rubine's work (1991) is considered by the author to be the first to apply gesture recognition techniques to the sketch problem. The method can be summarized simply as 13 features plus a linear classifier. Long's work (1996) is an extension of Rubine's, using 22 features. Wobbrock introduced the so-called $1 recognizer, which is a template matcher rather than a feature-based method.



DISCUSSION:

In general, the gesture recognition problem comes down to building a good classifier, whether based on features or on templates. This usually has two steps, feature extraction and classifier training; after the classifier has been trained, classification is done based on it.

A good feature should be distinctive and rotation/translation invariant, and hopefully scaling invariant; how to define good features is key to the problem, and Rubine and Long defined features in different ways. These features can be extracted quickly for both training and input data, and even though training a linear classifier might be slow, classification with it is fast.

As for templates, as in Wobbrock's method, instead of finding invariant features in the first place, the preprocessing (rotation/translation/scaling) is done on every whole input object, and I think that is why this method is slower.

There is no classifier training step; the classifier is just the templates from each gesture class (or maybe these templates are trained? I am not sure and need to look at the original paper), and classification is simply based on the distance between template and input data. However, the preprocessing (transformations) of the input data already takes a lot of time.

As for the classifier itself, whether a linear classifier, a neural network, etc., there are plenty of well-developed classifiers, so I do not think that is a big issue.

Monday, September 6, 2010

Questionnaire

1. Photo of yourself.





2. E-mail address (e.g., yourname at domain.com).


kingyy@neo.tamu.edu




3. Graduate standing (e.g., 3rd Year PhD, 2nd Year Masters, 1st Year PhD w/ Masters).


I am a first-year PhD.



4. Why are you taking this class?

This is a new area, and I believe my previous knowledge may have a good use in this area.



5. What experience do you bring to this class?

I have background in Computer Vision and Graphics, as well as Pattern Recognition, Machine learning, etc.



6. What do you expect to be doing in 10 years?

Since both my parents are professors, I would like to be a professor myself. If this goal can not be achieved, I would like to go to some research facilities.



7. What do you think will be the next biggest technological advancement in computer science?

Human computer interaction

Computers should enter people’s daily life, and this use of computers is receiving the most attention nowadays. Therefore, how to interact efficiently with computers has become a crucial topic, and there must be some breakthrough in this area.


8. What was your favorite course when you were an undergraduate (computer science or otherwise)?

Computer Principles

It is a directly translated name from Chinese; the course mainly covered how computers work at the hardware level, and mostly assembly language. Even though I have forgotten most of the assembly, since my undergraduate degree is in Electrical Engineering, it feels comfortable to understand how a computer works from the hardware point of view.



9. What is your favorite movie and why?

Final Destination 1.

Rather than the horrific and gory scenes, I would give more credit to its originality. When I first saw this movie in 2002, I was completely drawn in by this new idea of 'destiny'. In my own words: everything in the world is connected, and this connection could be strong or weak. I am not religious, but I do believe there are connections pulling everything together, and I am very pleased to see people trying to find them, use them, or fight against them.



10. If you could travel back in time, who would you like to meet and why?

Albert Einstein; he is the second person I admire most. The first is my father, but I do not need to travel back in time to see him.

11. Give some interesting fact about yourself.

Even though I am a CS student, I am extremely interested in geography and history, as well as arts and music.