Tuesday, December 14, 2010

Reading #30

Comments:
Ozgur

Summary:

The paper introduces Tahuti, a geometric sketch recognition system for drawing UML diagrams. The proposed system uses a multi-layer framework for sketch recognition, whose stages are: 1) preprocessing, 2) selection, 3) recognition, and 4) identification. Through user studies, the authors found that Tahuti's interpreted view was judged easier to draw in and easier to edit in than the comparison systems.


Discussion:

This is a new application area for sketch recognition. UML diagrams consist almost entirely of straight lines, which makes the processing much easier. The system also allows people to drag and move elements, which is a great idea for this kind of application.

Reading #29

Comment:
Wen zhe

Summary:

This paper presents a system that takes acoustic input -- as the name says, scratches. The authors propose using a modified stethoscope: a microphone is attached to the surface and listens through the solid material. It is particularly good at amplifying the sound and detecting high-frequency noise. The authors claim an average accuracy of 89.5%.

Discussion:

In my opinion, the major obstacle for this idea is eliminating noise. Compared with sketching, scratching is much noisier, and some systems even treat scratches themselves as a kind of noise. Maybe the stethoscope microphone is the key that makes this idea work; I am not sure about the stethoscope, but it seems to do all the magic in this paper.

Reading #28

Comments:
Ozgur

Summary:

This paper presents a system that evaluates how well people draw faces. On the image side, face recognition is applied to model features of the target human face. On the user side, sketch recognition is applied to measure how similar the drawing is to the features of the face in the image. Moreover, the system can guide the user step by step to draw a more accurate face.

Discussion:

This is a good way of combining computer vision techniques with sketch techniques. Feature extraction from a face image is not very hard; the really hard part is coming up with such an interesting application.

Reading #27

Comments:
Jianjie

Summary:

This paper proposes K-Sketch, an animation tool that helps novices create animations. It is a pen-based system that asks the user to give timing and spatial information. The paper adopts a novel optimization algorithm that keeps the whole system simple and fast at the same time. K-Sketch currently supports all ten desired animation operations: Translate, Scale, Rotate, Set Timing, Move Relative, Appear, Disappear, Trace, Copy Motion, and Orient to Path.

Discussion:

I have been wondering how sketching could be used efficiently for creating animation, and here comes this paper. I really want to see it in action; the data in the paper cannot tell me what it really feels like to use.

Reading #26

Comments:
Jonathon

Summary:

This paper proposes a sketch-based game for collecting data on how people make and describe sketches. Picture-phone was actually already discussed in Reading #24; this paper gives a more detailed description as well as the implementation of the system.
The system has three modes:
Draw: a text description is given and players are asked to draw according to it.
Describe: the inverse of Draw; a sketch is given and players need to describe it.
Rate: players judge how well a drawing matches its description.

Discussion:

Not much to say about this paper, since it covers the same system as Reading #24, only in more detail. Again, this is a really interesting way of collecting data.

Reading #25

Comments:
Jinjie

Summary:

This paper proposes a method of retrieving images by combining text information with sketch information. The whole approach is based on a descriptor constructed from both the color image and the sketch, where the sketch essentially provides the edges of the desired image. A per-cell edge histogram descriptor is stored for each image. Searching among 1.5 million images takes up to 3 seconds.
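Roughly how such a per-cell descriptor might look; this is a hypothetical miniature, and the paper's actual grid size, binning, and normalization will differ:

```python
# Toy per-cell edge histogram (stand-in for the paper's descriptor):
# split the edge map into a grid and histogram edge orientations per cell.
import numpy as np

def edge_histogram(edge_angle, edge_mask, grid=4, n_bins=6):
    """edge_angle: HxW orientation map; edge_mask: HxW boolean edge map."""
    h, w = edge_mask.shape
    desc = []
    for gy in range(grid):
        for gx in range(grid):
            sl = (slice(gy * h // grid, (gy + 1) * h // grid),
                  slice(gx * w // grid, (gx + 1) * w // grid))
            angles = edge_angle[sl][edge_mask[sl]]
            hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
            desc.append(hist / max(1, hist.sum()))  # per-cell normalization
    return np.concatenate(desc)

# The same descriptor would be computed for a query sketch (its strokes
# treated as edges) and compared against the stored image descriptors.
```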

Discussion:

The major contribution of this paper is the idea of combining sketch information with text. We have all experienced trying to find a picture with a text description alone; it does not work well. Adding the sketch is a really interesting idea.

Reading #24

Comments:
Sampath

Summary:

This paper presents game systems in which players interact through drawing. The systems collect raw sketch input and associate it with text information. Two games are described in the paper, Picture-phone and Stella-sketch. While Picture-phone lets people play at their own pace, a game of Stella-sketch requires several people to play at the same time.

Discussion:

In general, this paper presents Picture-phone and Stella-sketch, two sketching games for collecting data about how people make and describe hand-made drawings. It is a very interesting idea, since a game will naturally attract more people from whom to collect data.

Reading #23

Comments:
Ozgur

Summary:

This paper proposes InkSeine, a Tablet PC application that supports active note taking with pen and ink, offering fast interactions for users to search, gather, and link across multiple documents. The InkSeine interface is tailored to the unique demands of pen input and maintains the primacy of inking above all other tasks.

Discussion:

This paper is a perfect example of putting a text recognizer to use. However, I have doubts about this idea: since different users have different habits of organizing documents, will this system cover all the cases?

Reading #22

Comments:
Wenzhe

Summary:

This is another paper on generating 3D models from 2D sketches. The major difference between this one and the previous reading is that Plushie generates not only the 3D mesh but also the texture attached to it, which makes the final 3D model more realistic. The user interactively draws free-form strokes on the canvas as gestures, and the system handles all the remaining operations. The system was even tested on children, who were able to generate new plush toys.

Discussion:

It is definitely amazing to see this application. I am so surprised that children, who have no professional training in creating 3D models or even in 3D spatial thinking, can use it. Working for even the most inexperienced user is a great success.

Reading #21

Comment:
Sampath

Summary:
This paper designs a sketching interface for quickly and easily creating freeform models: the user draws 2D freeform strokes interactively on the screen, and the system automatically constructs plausible 3D polygonal surfaces. In general, the user draws a 2D sketch of the silhouette, indicating what can be seen from that angle, and a 3D mesh is generated in real time.
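Teddy's real algorithm inflates the silhouette polygon around its chordal axis; as a much cruder stand-in, here is a toy inflation based on a distance transform, only to convey the intuition that wider regions rise higher:

```python
# Toy silhouette "inflation" (NOT Teddy's chordal-axis algorithm):
# lift each interior pixel by an amount tied to its distance from the
# outline, giving a rounded height field.
import numpy as np
from scipy.ndimage import distance_transform_edt

def inflate(mask, scale=1.0):
    """mask: 2D boolean array, True inside the drawn silhouette."""
    d = distance_transform_edt(mask)   # distance to the outline
    return scale * np.sqrt(d)          # rounded cross-section

# A filled circle inflates into a dome-like height field.
yy, xx = np.mgrid[-50:50, -50:50]
height = inflate(xx**2 + yy**2 < 40**2)
print(height.max())  # the tallest point is at the center, as expected
```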

Discussion:
Creating 3D models is tedious and hard. Actually I am really surprised that they can create 3D polygonal surfaces. It would be easy to just produce a certain view from one angle, but to get an actual 3D polygonal mesh is really a great contribution.

Reading #20

Comments:
Longfei

Summary:

This paper builds MathPad2, a system that lets users do mathematical operations efficiently. The system recognizes sketched math notation and equations, then gives the calculated results. Besides the sketches the user writes on the screen, the system also provides some useful gestures. MathPad2 also provides a set of computational functions to choose from, as well as different colors to help recognition. The current MathPad2 can calculate simple equations, but fails on complicated ones.

Discussion:

Since this is a specific piece of software, the authors do not give many details about the algorithm. However, this paper presents a very good application of sketch recognition, one that was even promoted into a product.

Reading #19

Comments:
Jianjie

Summary:

The paper proposes using Bayesian conditional random fields to recognize sketches. The conditional random field takes into account not only the current stroke but also the neighboring strokes, so the recognition also uses contextual information.
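As a rough sketch of the idea in my own notation (not the paper's exact Bayesian formulation), a CRF scores a joint labeling y of all strokes x instead of classifying each stroke alone:

```latex
% Unary potentials \phi score a single stroke's label from its own
% features; pairwise potentials \psi couple the labels of neighboring
% strokes -- this is where the contextual information enters.
P(y \mid x) = \frac{1}{Z(x)}
  \exp\Big( \sum_i \phi(y_i, x) + \sum_{(i,j) \in E} \psi(y_i, y_j, x) \Big)
```

Inference then picks the labeling that maximizes this joint probability, rather than the best label for each stroke in isolation.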

Discussion:

The main contribution of this paper lies in the idea of using contextual information. It is reasonable to assume that the strokes are related, since humans tend to draw related things together. The math is fancy, and complicated to me.

Reading #18

Comments:
Ozgur

Summary:
In this paper, the authors present a framework for simultaneous grouping and recognition of shapes and symbols in free-form ink diagrams. This paper also adopts a graph-based method to recognize shapes. One advantage of this method is that it does not depend on the order in which the user draws the strokes. For every symbol, a graph is built and then separated into smaller subgraphs, and recognition is done between the subgraphs.

Discussion:
Matching whole graphs is hard; if the matching can instead be applied to smaller subgraphs, it becomes much easier. That is really the clever part of this paper.

Reading #17

Comments:
Drew

Summary:
In this paper the authors propose an algorithm to distinguish text. The extracted features are the gaps between strokes, timing data, the strokes' relations to each other, and some other characteristic features for classification. They then use an HMM for recognition.

Discussion:
Distinguishing shapes from text is a hard problem. I think the authors have made the problem even harder by using an HMM. One limitation I see is that an HMM asks for a fixed order, which means users would have to draw in the same order.

Reading #16

Comments:
Yue

Summary:
In this paper, the authors propose a graph-based recognizer. Each sketch symbol is represented by an attributed relational graph, where each vertex represents a primitive and each edge represents a relation between primitives.
For every input, the authors perform graph matching to find the corresponding nodes between two graphs. This is a well-known NP-complete problem; however, the authors give approximations to the best solution. They define six different matching score metrics in this paper, along with their weight values, which are obtained from an empirical study. Results show about 93% accuracy.
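To make the representation concrete, here is a toy attributed relational graph and a brute-force matcher in my own encoding; the paper instead uses approximate matching with its six weighted metrics, precisely because exhaustive matching blows up:

```python
# Toy attributed relational graph matching (nodes: primitive types,
# edges: pairwise relations keyed by ordered node pairs). Brute force is
# fine for tiny graphs; the general problem is NP-complete.
import itertools

def match_score(g1, g2, node_sim, edge_sim):
    """Best score over all assignments of g1's nodes onto g2's nodes
    (requires len(g1 nodes) <= len(g2 nodes))."""
    best = 0.0
    nodes1, nodes2 = list(g1["nodes"]), list(g2["nodes"])
    for perm in itertools.permutations(nodes2, len(nodes1)):
        m = dict(zip(nodes1, perm))
        s = sum(node_sim(g1["nodes"][a], g2["nodes"][b]) for a, b in m.items())
        s += sum(edge_sim(attr, g2["edges"].get((m[u], m[v])))
                 for (u, v), attr in g1["edges"].items())
        best = max(best, s)
    return best

arrow = {"nodes": {0: "line", 1: "line", 2: "line"},
         "edges": {(0, 1): "coincident", (0, 2): "coincident"}}
same = lambda a, b: 1.0 if a == b else 0.0
print(match_score(arrow, arrow, same, same))  # perfect self-match: 5.0
```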

Discussion:
The key idea of this paper is the graph matching problem the authors try to solve. The accuracy depends on how good the approximate isomorphism algorithm is; more improvement can be made here.

Reading #15

Comments:
 Yue

Summary:
This paper proposes an image-based method, which follows ideas from vision. In this method, the sketch is rasterized and compared against 48 by 48 bitmap images. The algorithm is invariant to similarity transformations, and fast. The handling of rotation is worth mentioning: they first transform the symbol image into polar coordinates, mapped into the [-pi, pi] range. In polar coordinates they can easily calculate the rotation angle, and then transform back into screen coordinates to continue the recognition. The whole system is instance-based, so it is easy to add or remove already-defined classes.
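Roughly what the polar trick buys (my simplified sketch, not their code): in polar coordinates a rotation about the center becomes a shift along the angle axis, so finding the best rotation reduces to a 1-D alignment:

```python
# Sample a bitmap on an (r, theta) grid, then find the column shift that
# best aligns two polar maps -- that shift is the rotation angle.
import numpy as np

def to_polar(img, n_r=32, n_theta=64):
    c = (np.asarray(img.shape) - 1) / 2.0
    rs = np.linspace(0, c.min(), n_r)
    thetas = np.linspace(-np.pi, np.pi, n_theta, endpoint=False)
    r, t = np.meshgrid(rs, thetas, indexing="ij")
    ys = np.clip(np.round(c[0] + r * np.sin(t)).astype(int), 0, img.shape[0] - 1)
    xs = np.clip(np.round(c[1] + r * np.cos(t)).astype(int), 0, img.shape[1] - 1)
    return img[ys, xs]

def best_rotation(p1, p2):
    """Column shift minimizing the difference of two polar maps."""
    errs = [np.abs(p1 - np.roll(p2, s, axis=1)).sum() for s in range(p2.shape[1])]
    return int(np.argmin(errs))  # multiply by 2*pi/n_theta to get radians
```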

Discussion:
It is a clever idea to convert the sketch into an image, so that all the algorithms related to vision can be applied. However, I am pretty surprised by the speed, since I would assume the conversion from sketch to image is slow, even though they can solve the rotation efficiently.

Reading #14

Comments:
Ozgur

Summary:
This paper uses an entropy idea to distinguish text, which has high entropy, from non-text information, which has low entropy. Entropy is a measure of uncertainty; small and complex shapes usually have higher entropy. Strokes are translated into a string of characters representing the angles the stroke points along. An entropy model is introduced that captures the degree of curvature and measures its density. They achieve a 92% recognition rate.
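A toy version of the pipeline as I understand it; the alphabet and binning here are my own, not the paper's exact model:

```python
# Quantize stroke directions into symbols, then compute the Shannon
# entropy of the symbol distribution: wiggly "text-like" strokes use
# many symbols (high entropy), a straight line uses one (zero entropy).
import math
from collections import Counter

def direction_string(points, n_bins=8):
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        a = math.atan2(y1 - y0, x1 - x0)  # segment angle in [-pi, pi]
        out.append(int((a + math.pi) / (2 * math.pi) * n_bins) % n_bins)
    return out

def entropy(symbols):
    n = len(symbols)
    return sum(-c / n * math.log2(c / n) for c in Counter(symbols).values())

line = [(i, 0) for i in range(20)]
squiggle = [(i, (-1) ** i * (i % 5)) for i in range(20)]
print(entropy(direction_string(line)), entropy(direction_string(squiggle)))
# The squiggle yields noticeably higher entropy than the straight line.
```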

Discussion:
The idea of entropy is very interesting. However, I doubt the assumption that text has high entropy, since a complex shape can also produce high entropy.

Reading #13

Comments:
Ozgur

Summary
This paper uses features such as curvature, speed, intersections, etc. to distinguish shape strokes from text strokes. In total, 46 features are chosen, grouped into 7 categories. The experiment was run with 26 participants, and the authors use statistical partitioning to find that 8 of these features are useful for splitting off text strokes, from which they build a decision tree.
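The overall shape of that pipeline, sketched with stand-in features and scikit-learn rather than the authors' statistically selected 8 features:

```python
# Hypothetical illustration only: featurize strokes, then fit a small
# decision tree. The features below are made up, not the paper's.
from sklearn.tree import DecisionTreeClassifier

# Each row: [mean curvature, mean speed, self-intersections]
X = [[0.9, 1.2, 3], [0.8, 1.1, 2], [0.1, 0.4, 0], [0.2, 0.5, 0]]
y = ["text", "text", "shape", "shape"]
clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(clf.predict([[0.85, 1.0, 2]]))  # -> ['text']
```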

Discussion:
This paper broadens our horizons: sketch recognition may not just be about points and lines. Since it is a human who is drawing, it is reasonable that the human can give other information, like text.

Tuesday, November 30, 2010

Reading #12

Comments:
Wenzhe

Summary

In general this paper uses a template-based method; however, the template is built by learning from a collection of labeled data. The template built from these models is a statistical model capturing the distances between the sketch parts in the model. As for the sketch parts, some mandatory parts must exist in the sketch, while some optional parts can vary across examples. This is one strong assumption in the paper.

For the recognition, the first step is to label the mandatory parts; the second is to search for the optional parts among the remaining strokes. Both of these searches use the same objective function, a maximum likelihood. And here comes the second strong assumption: all the strokes for similar parts should be drawn with similar strokes, which makes the mapping much easier.

Discussion:

I am really glad to see a paper use a more traditional computer vision method, where a statistical model is built for the shape/sketch. However, these two assumptions are my major concerns.

The assumption that mandatory parts must exist is fine with me; for example, a human face must contain eyes, a nose, ears, etc., so this assumption is natural. However, the assumption that requires all strokes to be drawn similarly adds a big constraint for the user: a flowerpot that was drawn with one stroke should not be drawn with four separate strokes. A traditional shape model would not put any constraint on drawing order; here it lowers the difficulty of recognizing each part.

Monday, November 29, 2010

Reading #11

Comments:
Longfei

Summary

Rather than a detailed algorithm, LADDER is a language for describing how sketched diagrams in a domain are drawn, displayed, and edited. This description can be automatically transformed into domain-specific shape recognizers, editing recognizers, and shape exhibitors for use in that sketch recognition domain.

As a language, LADDER itself consists of the following components: shape definitions, language contents, and vectors.

The whole recognition system consists of recognition of primitive shapes, recognition of domain shapes, editing recognition, and a constraint solver.

The constraints play an important role in the whole system. These constraints can be predefined or user-customized, and they both restrict and simplify the recognition.
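To make "description instead of algorithm" concrete, here is a hypothetical, heavily simplified shape definition in the spirit of LADDER, written as Python data rather than LADDER's own syntax:

```python
# Hypothetical arrow description: a shape is its component primitives
# plus constraints between them. A generated recognizer accepts a group
# of primitives exactly when some assignment of primitives to components
# satisfies every constraint.
arrow = {
    "name": "arrow",
    "components": {"shaft": "Line", "head1": "Line", "head2": "Line"},
    "constraints": [
        ("coincident", "shaft.p2", "head1.p1"),
        ("coincident", "shaft.p2", "head2.p1"),
        ("acuteAngle", "head1", "shaft"),
        ("acuteAngle", "head2", "shaft"),
    ],
}
```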

Discussion

This method is not a traditional recognition method like learning features or templates. This is like what I discussed in the previous post -- the context given by the user can be rich enough and well defined that, instead of extracting features and templates from the input sketch, the system becomes more robust by interpreting the context.

Undoubtedly, LADDER is a rich and well-defined context. Instead of being a context that merely augments the traditional methods, this idea has opened a new way of doing sketch recognition.

Sunday, October 17, 2010

Reading #10

Comments:
Ozgur

Summary:

This paper describes the HUNCH system, a primitive sketch recognition system. It is not as fully functional as Paleo, but considering that it was published in 1976, HUNCH introduced many important concepts of what a well-functioning sketch recognition system should have.

The HUNCH system works like this: first, find the corners from the drawing speed; a corner is usually a local minimum of the speed function. Then, latch endpoints that are near each other -- that is, within a small radius of each other. The last two steps are about inferring the intent of the user. However, this inference does not come from a learned classifier as in today's approaches, but from the context the user provides -- a simple description of the hierarchical interpretation of the data.
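A minimal sketch of the speed-based corner step; the thresholds here are my guesses, and HUNCH's actual values and latching logic are not reproduced:

```python
# Corners = local minima of pen speed that fall well below the average.
import numpy as np

def find_corners(points, times, frac=0.25):
    pts = np.asarray(points, dtype=float)
    dt = np.maximum(np.diff(times), 1e-9)
    speed = np.linalg.norm(np.diff(pts, axis=0), axis=1) / dt
    corners = []
    for i in range(1, len(speed) - 1):
        if (speed[i] <= speed[i - 1] and speed[i] <= speed[i + 1]
                and speed[i] < frac * speed.mean()):
            corners.append(i + 1)  # segment i ends at point i + 1
    return corners
```

Latching would then snap any two detected endpoints to a single point whenever they lie within a small radius of each other.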

Since more emphasis is put on the interpretation, different "recognition" results will be generated by different user interpretations.


Discussion:

Considering the year it was published, this paper undoubtedly provided the most basic steps that a recognition system should have, like a corner finder and latching/connecting.

However, this system does not really seem to be a "recognition" system: if the user can give rich context describing what they would like to draw, why should the system bother to guess/recognize the user's intent? The context can tell everything if it is well defined.

So this is more like augmented drawing or painting -- the user gives a primitive drawing that sets some basic constraints (feature points like lines, corners, and the like) that the final shape should follow, and tells the system what they would like to draw through context; then the system beautifies the user's primitive drawing.

Even though I do not think much in the paper can be applied to today's recognition domain, this is a very interesting trail to follow: imagine a very simple drawing tool with which a novice can produce a very complicated work. However, we would need to design a good context -- simple for the user but rich enough for the computer to understand.

Reading #9

Comments:
Yue

Summary:

This paper introduces the Paleo system, a new low-level recognition and beautification system with high accuracy. The recognizer can classify single strokes into primitives; primitives drawn with multiple strokes can be merged by an upper-level recognition system.

A stroke is defined as the set of points (consisting of an x coordinate, y coordinate, and time value) sampled
between pen down and pen up events. The primitive shapes that can be recognized from one stroke are:

• Line: a stroke with a relatively constant slope between all sample points
• Polyline: a stroke consisting of multiple, connected lines
• Circle: a stroke whose total direction is close to 2π, whose radius from the center point to each stroke point is roughly constant, and whose major and minor axes are close in size
• Ellipse: a stroke with properties similar to a circle's, but whose major and minor axes are not close in size
• Arc: a segment of an incomplete circle
• Curve: a stroke whose points can be fit smoothly by up to a fifth-degree curve
• Spiral: a stroke composed of a series of circles with continuously descending (or ascending) radii but a constant center
• Helix: a stroke composed of a series of circles with similar radii but moving centers; helixes are also assumed to be drawn linearly

So there are eight classifiers, one for each of these eight shapes. The user input goes through each of these classifiers and gets back a verdict on whether the stroke belongs to that class.

It also needs to be mentioned that the authors introduce two important features -- the normalized distance between direction extremes (NDDE) and the direction change ratio (DCR) -- which improve the results a lot.
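As I understand their definitions (Paleo's exact trimming and details may differ), the two features can be computed along these lines:

```python
# NDDE: arc length between the points of extreme direction, over total
# length (near 1 for arcs/circles). DCR: max direction change over mean
# direction change (high for polylines with sharp corners).
import numpy as np

def ndde_dcr(points):
    pts = np.asarray(points, dtype=float)
    seg = np.diff(pts, axis=0)
    arclen = np.concatenate([[0], np.cumsum(np.linalg.norm(seg, axis=1))])
    direction = np.arctan2(seg[:, 1], seg[:, 0])
    i_hi, i_lo = np.argmax(direction), np.argmin(direction)
    ndde = abs(arclen[i_hi] - arclen[i_lo]) / arclen[-1]
    dd = np.abs(np.diff(direction))
    dcr = float(dd.max() / dd.mean()) if dd.size and dd.mean() > 0 else 0.0
    return ndde, dcr
```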


Discussion:

Since we have already used Paleo ourselves, the high accuracy it achieves (98.56%) needs no further comment, and in almost every case Paleo gives a decent answer. The only problem I have found while using it is that it can accidentally classify a triangle as a circle. It is hard to tell the difference between a triangle and a circle when the user draws in one stroke but is not careful enough to make every edge a straight line; at some degree of "roundness" the triangle stops being a triangle and becomes a circle, and thresholds are set to decide where.

In the paper, these thresholds are hard-coded and chosen empirically -- in other words, tuned manually -- so there is quite a lot of room to improve. Can we find an automatic way to tune these thresholds, or choose thresholds that are more robust? However, I doubt there are many ways to beat an accuracy of 98.56%.

Thursday, September 16, 2010

Reading #8

Comments:
Sampath

Summary:

This paper proposes the $N recognizer. As the name indicates, it is an extension of the $1 recognizer, but with many improvements, such as recognizing gestures comprising multiple strokes, automatically generalizing from one multistroke template to all possible multistrokes with alternative stroke orderings and directions, recognizing 1D gestures such as lines, and providing bounded rotation invariance.

As an extension, the $N recognizer is built on top of the $1 recognizer. The major difference is some extra preprocessing that turns a multistroke into a unistroke, which is then compared with the templates; for that part, $1 and $N are almost the same (well, the rotation part is a little different).

Dealing with multistrokes: the user defines one multistroke, and the recognizer generates all its permutations, ensuring that different stroke orders and/or directions will be properly recognized, as sketched below. At runtime, the multistroke is first converted to a unistroke and then follows $1's algorithm.
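A rough sketch of this template expansion as I understand it: every stroke order combined with every per-stroke direction yields one unistroke, i.e. n! * 2^n variants for n strokes:

```python
# Expand one multistroke template into all order/direction unistrokes.
from itertools import permutations, product

def unistroke_variants(strokes):
    """strokes: list of point lists. Yields concatenated unistrokes."""
    n = len(strokes)
    for order in permutations(range(n)):
        for flips in product((False, True), repeat=n):
            uni = []
            for i, flip in zip(order, flips):
                s = strokes[i]
                uni.extend(reversed(s) if flip else s)
            yield uni

two = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]
print(sum(1 for _ in unistroke_variants(two)))  # 2! * 2^2 = 8 variants
```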

Dealing with rotation: this is a little similar to Protractor -- since full rotation invariance can throw information away, a bounded rotation (within a range) is performed; as the authors indicate, 45 degrees is a good range.

Dealing with 1D gestures: a MINSIDE-to-MAXSIDE threshold decides whether the stroke's bounding box is too "thin"; if it is, the gesture is 1D, otherwise it is 2D.

Discussion:

The main contribution of this paper is the handling of multistrokes. The method here is pretty simple (take both directions for each stroke and connect all the endpoints), but I wonder if it is really accurate. The speed is also questionable: defining each component stroke as a dichotomous [0,1] variable indicates exponential complexity, 2^N for N strokes. But the authors say the speed is fast, and also that "most multistroke gestures have only a few strokes because more elaborate gestures are harder for users to remember and use" (which is itself a quotation from another paper). Well, if it only ever sees very few strokes, I won't be too surprised to see high accuracy.

Tell me if I am wrong...

Reading #7

Comments:
Sampath Jayarathna


Summary:

This paper discusses a sketch user interface with which the user can draw as naturally as with pen and paper. The system is able to recognize the geometric model of the object the user intended to draw. The drawing is also unrestricted -- it can be made in any order, with any number of strokes, etc.


The algorithm can be described generally in three steps.

Approximation: this process mainly deals with how to detect vertices (see the sketch after these steps). For strokes with a lot of straight line segments, vertices are selected according to the speed of drawing and the curvature at corners. For curves, a Bezier curve is used to approximate/interpolate the shape.

Beautification: make the drawing closer to the user's intention, e.g., make lines straight or parallel; this may or may not require moving the vertices.

Recognition: actually, only some simple geometric models are involved; simple template matching is applied.
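A hypothetical combination of the two cues from the approximation step (the thresholds are mine, not the paper's): points that are both slow and high-curvature become vertex candidates.

```python
# Vertex candidates = slow pen speed AND sharp turning angle.
import numpy as np

def vertex_candidates(points, times, speed_frac=0.4, turn_thresh=0.5):
    pts = np.asarray(points, dtype=float)
    v = np.diff(pts, axis=0)
    speed = np.linalg.norm(v, axis=1) / np.maximum(np.diff(times), 1e-9)
    ang = np.arctan2(v[:, 1], v[:, 0])
    turn = np.abs(np.diff(ang))                # curvature proxy at each point
    turn = np.minimum(turn, 2 * np.pi - turn)  # wrap to [0, pi]
    return [i + 1 for i in range(len(turn))
            if speed[i] < speed_frac * speed.mean() and turn[i] > turn_thresh]
```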


Finally, the authors asked several participants to try the system, and they all gave positive feedback.



Discussion:

The most important part of the system is the approximation part, in other words extracting vertices from the input stroke. If this part could be solved perfectly, the following beautification and recognition parts would be much easier.

However, how to extract the vertices has remained a problem for years, and this paper is no exception: although the authors claim 96% accuracy, it is based on only 10 figures. The vertices should also be considered a kind of feature, yet the way of finding them does not seem convincing to me (it does not deal with rotation and scaling like conventional methods do). I wonder how it would work on some complex examples.

Monday, September 13, 2010

Reading #6

Comments:
Yue Li

Summary:


In this paper the author introduces a gesture recognizer, Protractor, which is an extension of the $1 recognizer, so they share a lot in common. I will therefore use the same four steps as for the $1 recognizer to explain Protractor.

1. Point resampling: N equidistant points are sampled along the gesture.

2. Rotation: the user can choose orientation-invariant or orientation-sensitive matching. If invariant is chosen, a rotation similar to $1's is done. If sensitive is chosen, Protractor aligns the indicative orientation of a gesture with whichever of eight base orientations requires the least rotation; these eight orientations are considered the major gesture orientations.

3. Scaling and translation: all the points are translated so that the centroid is at (0,0). No scaling is done; however, given the way the angular distance is computed, Protractor is inherently scale-invariant.

4. Calculating the optimal angular distance: based on the vector representation obtained from the previous steps, Protractor uses the inverse cosine distance between the vectors as the similarity score (see the sketch below). The template with the maximum similarity score is the best match.
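The closed-form step, following my reading of the paper, for flattened, preprocessed vectors t, g = (x1, y1, ..., xn, yn):

```python
# Closed-form optimal angular distance: the rotation of g that best
# matches t has an analytic solution, so no iterative search is needed.
import math

def optimal_cosine_distance(t, g):
    a = sum(t[i] * g[i] + t[i + 1] * g[i + 1] for i in range(0, len(t), 2))
    b = sum(t[i] * g[i + 1] - t[i + 1] * g[i] for i in range(0, len(t), 2))
    angle = math.atan2(b, a)  # optimal rotation of g toward t
    norm = math.sqrt(sum(x * x for x in t)) * math.sqrt(sum(x * x for x in g))
    cos_sim = (a * math.cos(angle) + b * math.sin(angle)) / norm
    return math.acos(max(-1.0, min(1.0, cos_sim)))  # smaller is better
```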

Finally, the author gives many examples, indicating that Protractor is superior to $1 in terms of both accuracy and speed.



Discussion:

Protractor, as a template-based recognizer, should share similar properties with the $1 recognizer. No training step is involved, and training samples are stored as templates. However, before classification some preprocessing is needed, and the unknown gesture is compared against all the templates. This is time-consuming, and will eventually turn out to be slower than feature-based algorithms. The author does not make a comparison with feature-based methods in this paper, but these properties should be similar.


Comparing with the $1 recognizer:

Similar parts:

1. Point resampling: done the same way.

2. Translation: Still the same.

3. Scaling: even though Protractor has no rescaling step, given the way the angular distance is calculated, we can assume scale invariance here.

4. Rotation invariance: if this option is chosen, both recognizers are rotation-invariant.


Different parts:

1. Rotation-sensitive mode: even though it is called rotation-sensitive, we still need to eliminate the effect of rotation noise. Instead of doing a full rotation, Protractor rotates only to align with the nearest of the eight base orientations. To me this is an in-between choice: neither fully aligning the rotation nor completely ignoring the rotation noise.

2. Calculating the optimal angular distance: this part is the main contribution of the paper. The author indicates that the closed-form solution for the vector-based similarity measurement is the key reason why Protractor outperforms the $1 recognizer: this closed-form solution saves a lot of computation compared with the iterative approach of finding the rotation in the $1 recognizer. The results also indicate better accuracy.

Reading #5

Comment:
chris aikens


Summary:



In this paper, the author introduces the $1 recognizer. As the name indicates, this algorithm is simple, easy to implement, and can be plugged into other systems easily.

Its algorithm can be described in four steps:

1. Resample the point path, to make gesture paths directly comparable even at different movement speeds.
2. Rotate once based on the “indicative angle”, which is defined as the angle between the centroid of the gesture (x̄, ȳ) and the gesture’s first point.
3. Scale and translate, during which the gesture is scaled and translated to a reference square and point.
4. Find the optimal angle for the best score: after the previous steps are done for all templates, a candidate C is compared to each stored template T to find the average distance, defined as the path distance. The template Ti with the least path distance to C is the result of the recognition (a minimal sketch follows this list).
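A minimal sketch of steps 1 and 4 (resampling and path distance); rotation, scaling, and the golden-section search for the best angle are omitted, so this is not the full $1 recognizer:

```python
import math

def resample(points, n=64):
    """Step 1: n points spaced equally along the path."""
    step = sum(math.dist(a, b) for a, b in zip(points, points[1:])) / (n - 1)
    out, pts, acc, i = [points[0]], list(points), 0.0, 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if seg > 0 and acc + seg >= step:
            t = (step - acc) / seg
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)   # keep measuring from the new point
            acc = 0.0
        else:
            acc += seg
        i += 1
    while len(out) < n:        # guard against floating-point shortfall
        out.append(pts[-1])
    return out[:n]

def path_distance(a, b):
    """Step 4's core: mean pointwise distance of two resampled paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```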

The $1 recognizer is limited to single strokes, and the authors state an accuracy of more than 99%, along with a comparison against Rubine's algorithm and Dynamic Time Warping.


Discussion:

Feature-based vs. template-based: Rubine and many other researchers formulated gesture recognition as a feature-based recognition process, which works as long as the features can be computed easily (usually true). Even though the training step can be slow, once the classifier is trained, classification is fast. The $1 recognizer has no explicit training step; instead, an alignment is required for every input, which is even more time-consuming.

Invariant features: a translation/scaling/rotation-invariant feature is the best feature people are looking for. The $1 recognizer, as I said, is not a feature-based method; instead, the alignment step is applied to eliminate the effects of these transformations. But another problem arises: in sketching, these transformations are sometimes themselves an important feature carrying meaning. For example, the 4th feature in Rubine's paper, the angle between the bounding box diagonal and the bottom horizontal, actually captures rotation information; that information is lost in the $1 recognizer.

Thursday, September 9, 2010

Reading #4

Comment:
chris aikens

Summary:


In this paper, the author introduces Sketchpad, the first pen-based sketching system, from the 1960s. The first chapter of the paper starts with an example of drawing a hexagon from a circle and how this can be used to generate a hexagonal lattice. The following parts analyze the capabilities, design, and usage of the system. The second chapter talks about the data structures of the system: all lines and points are stored with their properties, and all things of the same type are stored in a ring under a generic heading, which contains all the information needed to distinguish that type from other types. The remaining chapters talk about how the light pen tracks and locates position; how lines, circles, digits, etc. are drawn on the screen; how recursion can be used for efficient operations like deleting and merging; and what kinds of constraints the user can apply to the drawing.



Discussion:

Actually, I first learned of Sutherland in Computer Graphics class, since the famous "Sutherland-Hodgman" polygon clipping algorithm is named after him. I just googled, and I am sure it is the same person here.

For this paper,

From the view of graphics, I can see quite a lot of similar ideas. What graphics is concerned with is the description of the drawing, just as Sutherland states in the paper: "Each time a drawing is made, a description of that drawing is stored." Although most of the aspects of Sketchpad can easily be realized by today's tools, like OpenGL with an object-oriented programming language such as C++, this was undoubtedly a great breakthrough back in the 1960s.

From the view of sketch recognition, I was wondering whether we could use a graphics-style approach like Sutherland did in the paper. Would it be more efficient to build a representation/description of the object -- say, fit a spline to the shape, then check its parameters? Well, this is another kind of feature, I agree. I am just wondering whether some of the features in Rubine's and others' papers are redundant or correlated.

Tuesday, September 7, 2010

Reading #3

Comments:

George
and some others


Summary

This paper mainly talks about the toolkit Quill. Rather than recognition, the ultimate goal of this tool is to advise designers on how to improve their gestures by warning the user when two gestures are so similar that they can cause confusion.

First the author talks about how the similarity measure is built up: a similarity model is learned from human judgments (three experiments with 49+49+266 participants). After an introduction to Quill's user interface, most of the paper talks about the difficulties and solutions in the UI, the implementation, and the similarity metrics.



Discussion

Admittedly, the idea of building a gesture design tool is really fascinating, and many difficulties can be foreseen. However, I would like to see more discussion about how the similarity metrics are built up; this should be the core of the whole system, and it is disappointing that recognition plays only a small part in it (I may need to check some previous papers from Long). Besides, the challenges and solutions discussed in the paper (like timing and hierarchy) seem to me more related to AI.

Reading #2

Comments:
Danielle


Summary:


In the first part of the paper, the author mainly discusses the GRANDMA toolkit, the first sketch recognition toolkit ever developed. This toolkit uses a graphical user interface and can rapidly extract features from a drawing and recognize the meaning of that drawing.

The rest of the paper talks about the algorithm behind GRANDMA, which is similar to the last reading: the whole algorithm has two parts, feature extraction and classifier training -- to be more specific, 13 features and a linear classifier.
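To show the "features + linear classifier" shape concretely, here is a sketch with only 3 stand-in features and caller-supplied weights (Rubine's actual training estimates the weights from labeled examples):

```python
import math

def features(points):
    """A few Rubine-like features (the paper defines 13); needs >= 3 points."""
    (x0, y0), (x2, y2) = points[0], points[2]
    d = math.hypot(x2 - x0, y2 - y0) or 1.0
    f1, f2 = (x2 - x0) / d, (y2 - y0) / d  # cos/sin of the initial angle
    f3 = sum(math.dist(a, b) for a, b in zip(points, points[1:]))  # length
    return [f1, f2, f3]

def classify(points, weights):
    """weights: {class: [w0, w1, ..., wn]}; pick the max linear score."""
    f = features(points)
    score = lambda w: w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))
    return max(weights, key=lambda c: score(weights[c]))
```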


Discussion:

I think one main contribution of the paper is building the GRANDMA system, which provides a simple but efficient interaction for sketch gesture recognition; another contribution is the features he defines.

However, are these features robust? Are they suitable for more complex cases? Should we add more features beyond x,y (like pressure, stroke width, etc.)? This paper provides a fundamental framework, so we can explore these points further.

Readings #1

COMMENTS:
chris aikens

SUMMARY:


This paper first relates sketch recognition to the more common technique of gesture recognition. Even though the author considers them fundamentally different, in certain situations gesture recognition techniques can give good results on sketch recognition problems, so it is worthwhile to give them some effort.

For the rest of the paper, the author discusses three fundamental gesture recognition methods, by Rubine, Long, and Wobbrock.

Rubine's work (1991) is considered by the author as the first to apply gesture recognition techniques to sketch problems. This method can be simply formulated as 13 features + a linear classifier. Long's work (1996) is an extension of Rubine's, using 22 features. Wobbrock introduced the so-called $1 recognizer, which is a template matcher rather than a feature-based method.



DISCUSSION:

In general, the gesture recognition problem lies in how to build a good classifier, which could be based on features or templates. This usually has two steps, feature extraction and classifier training; after the classifier has been trained, classification is done based on it.

A good feature should be discriminative and rotation/translation-invariant, or hopefully also scaling-invariant; how to define a good feature is a key to the problem, and Rubine and Long have different ways of defining features. These features can be extracted quickly from both training data and input data, and even though the training of a linear classifier might be slow, classification with a linear classifier is fast.

As for templates, as in Wobbrock's method, instead of finding the invariant features in the first place, the preprocessing (rotation/translation/scaling) is done on every whole object. I think that is why this method is slower.

There is no classifier training step; the classifier is just the templates from the different gesture classes (or maybe these templates are trained? I am not sure and need to take a look at the original paper), and the classification is simply based on the distance between template and input data. However, the preprocessing (transformations) of the input data already takes a lot of time.

As for the classifier, whether a linear classifier, a neural network, or anything else, there are quite a lot of well-developed classifiers, so I think this part is not a big deal.

Monday, September 6, 2010

Questionnaire

1. Photo of yourself.





2. E-mail address (e.g., yourname at domain.com).


kingyy@neo.tamu.edu




3. Graduate standing (e.g., 3rd Year PhD, 2nd Year Masters, 1st Year PhD w/ Masters).


I am a first-year PhD.



4. Why are you taking this class?

This is a new area, and I believe my previous knowledge may have a good use in this area.



5. What experience do you bring to this class?

I have background in Computer Vision and Graphics, as well as Pattern Recognition, Machine learning, etc.



6. What do you expect to be doing in 10 years?

Since both my parents are professors, I would like to be a professor myself. If this goal cannot be achieved, I would like to go to a research institute.



7. What do you think will be the next biggest technological advancement in computer science?

Human-computer interaction

Computers are entering people's daily lives, and this use of computers gets the most attention nowadays. Therefore, how to interact efficiently with the computer has become a crucial topic, and there must be breakthroughs in this area.


8. What was your favorite course when you were an undergraduate (computer science or otherwise)?

Computer Principles

It is a directly translated name from Chinese; the course mainly talked about how a computer works at the hardware level, mostly through assembly language. Even though I have forgotten most of the assembly language, since my undergraduate degree is in Electrical Engineering, it feels comfortable to understand how a computer works from the hardware point of view.



9. What is your favorite movie and why?

Final Destination 1.

Rather than the horrific and gory scenes, I would give more credit to its originality. When I first saw this movie in 2002, I was totally drawn in by this new idea of 'destiny'. In my own words: everything in the world is connected, and this connection can be strong or weak. I am not religious, but I do believe there are connections holding everything together, and I am very pleased to see people trying to find them, use them, or fight against them.



10. If you could travel back in time, who would you like to meet and why?

Albert Einstein; he is the second man I admire most. The first is my father, but I do not need to travel back in time to see him.

11. Give some interesting fact about yourself.

Even though I am a CS student, I am extremely interested in geography and history, as well as arts and music.