Intelligent and Interactive Systems

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

In the area of AI (earlier) machine learning took a back seat to Expert Systems

Expert system development usually consists of an expert and a knowledge engineer (KE). The KE extracts rules from the expert and uses them to create an expert program.

The bottleneck of expert system development is the expert-KE interaction.

It would be better if we could automate the knowledge engineering process or better yet allow systems to learn to be experts by example.

According to Herbert Simon, learning is, “Any change in a System that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population.” [G. F. Luger and W. A. Stubblefield, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, The Benjamin/Cummings Publishing Company, Inc. 1989.]

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Machine learning algorithms have been used in:

speech recognition

drive automobiles

play world-class backgammon

program generation

routing in communication networks

understanding handwritten text

data mining

etc.

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

When solving a machine learning problem we must be sure to identify:

What task is to be learned?

How do we (will we) test the performance of our system?

What knowledge do we want to learn?

How do we represent this knowledge?

What learning paradigm would be best to use?

How do we construct a training experience for our learner?

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Training Experience

The way one decides to train a learner is important.

An appropriately designed training experience will allow the learner to generalize well on new, previously unseen examples

One can view training as “practice” for real-world situations

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Types of Training Experience

Direct (Supervised)

Supervised learning is a machine learning technique for learning a function from training data. The training data consist of pairs of input objects (typically vectors), and desired outputs.
(e.g. case based reasoning)

Indirect (Reinforcement)

Reinforcement learning is a sub-area of machine learning concerned with how an agent ought to take
actions in an environment so as to maximize some notion of long-term reward.
Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states (i.e. typically modeled in finite state Markov decision process (MDP))

Unsupervised

Unsupervised learning is a class of problems in which one seeks to determine how the data are organised. It is distinguished from supervised learning (and reinforcement learning) in that the learner is given only unlabeled examples

Control of Training Experience

Teacher Controlled
Learner Controlled

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Distribution of Training Examples

It is important that the set of training examples given to the learner represents the type of examples that will be given to the learner during the performance measurement.

If the distribution of the training examples does not represent the test elements then the learner will generalize poorly.

Machine Learning as Search

Machine Learning can be viewed as search through hypothesis-space.

A hypothesis can be viewed as a mapping function from percepts to desired output.

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

The Four Components of Learning Systems

The Performance System

solves the problem at hand using the learned hypothesis (learned approximation of the target function) constructed by the learning element

takes unseen percepts as inputs and provides the outputs (again based on the learned hypothesis)

The Critic

informs the learner of how well the agent is doing

provides training examples to the learning element

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

The Four Components of Learning Systems (cont.)

The Learning Element (Generalizer)

takes training examples as input

outputs a hypothesis

The Experiment Generator

takes as input a hypothesis

outputs a new problem for the performance system to explore

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Learning Decision Trees

Decision tree induction is a simple but powerful learning paradigm. In this method a set of training examples is broken down into smaller and smaller subsets while at the same time an associated decision tree get incrementally developed. At the end of the learning process, a decision tree covering the training set is returned.

The decision tree can be thought of as a set sentences (in Disjunctive Normal Form) written propositional logic.

Some characteristics of problems that are well suited to Decision Tree Learning are:

Attribute-value paired elements

Discrete target function

Disjunctive descriptions (of target function)

Works well with missing or erroneous training data

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

(Outlook = Sunny Ù Humidity = Normal) Ú (Outlook = Overcast) Ú (Outlook = Rain Ù Wind = Weak)

[picture from: Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997]
COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

This decision tree was created using the following training examples:

Day	Outlook	Temperature	Humidity	Wind	PlayTennis
D1	Sunny	Hot	High	Weak	No
D2	Sunny	Hot	High	Strong	No
D3	Overcast	Hot	High	Weak	Yes
D4	Rain	Mild	High	Weak	Yes
D5	Rain	Cool	Normal	Weak	Yes
D6	Rain	Cool	Normal	Strong	No
D7	Overcast	Cool	Normal	Strong	Yes
D8	Sunny	Mild	High	Weak	No
D9	Sunny	Cool	Normal	Weak	Yes
D10	Rain	Mild	Normal	Weak	Yes
D11	Sunny	Mild	Normal	Strong	Yes
D12	Overcast	Mild	High	Strong	Yes
D13	Overcast	Hot	Normal	Weak	Yes
D14	Rain	Mild	High	Strong	No

[Table from: Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997]
COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Building a Decision Tree

In building a decision tree we:

first test all attributes and select the on that would function as the best root;

break-up the training set into subsets based on the branches of the root node;

test the remaining attributes to see which ones fit best underneath the branches of the root node;

continue this process for all other branches until

all examples of a subset are of one type

there are no examples left (return majority classification of the parent)

there are no more attributes left (default value should be majority classification)

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Determining which attribute is best (Entropy & Gain)

Entropy (E) is the minimum number of bits needed in order to classify an arbitrary example as yes or no

E(S) = Sci=1 –pi log2 pi Where S is a set of training examples

For our entropy equation 0 log2 0 = 0

The information gain G(S,A) where A is an attribute

G(S,A) º E(S) - Sv in Values(A) (|Sv| / |S|) * E(Sv)

COMP-6600/7600: Artificial & Computational Intelligence
(Learning from Observations)

Let Try an Example!

Overfitting
In decision tree learning (given a large number of training examples) it is possible to “overfit” the training set. … Finding meaningless regularity in the data. p662 Overfitting occurs with the target function is not at all random.

At the onset of overfitting, the accuracy of the “growing” tree continues to increase (on the training set) but in actuality the tree (as successive trees) begin to lose their ability to generalize beyond the training set.

Overfitting can be eliminated or reduced by using:

Pruning

Preventing recursive splitting on attributes that are clearly not relevant even when the data at that node in the tree is not uniformly classified.

Cross-Validation

Try to estimate how well the current hypothesis will predict unseen data.
This is done by setting aside some fraction of the known data and using it to test the prediction performance of a hypothesis induced from the remaining data
k-fold cross validation means run k experiments each time setting aside a different 1/k of the data to test on, and average the results

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Decision Tree learning can be extended to deal with:

Missing Data

In many domains not all attribute values will be known

Attributes with Differing Costs

We could add an attribute Restaurant Name, with different value for every example

Continuous-valued Attributes

If we are trying to predict a numerical value such as the price of a work of art, rather than a discrete classification, then we need a regressions tree.
Each leaf has a subset of numerical attributes, rather than a single value.

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Concept Learning

In concept learning, the objective is to infer a boolean valued function that is a conjunction of the attribute value assigns of the training examples.

In concept learning, n+2 values can be assigned to an attribute

The n values of the attribute.

The value “?” which means any value assignment is accepted for the corresponding attribute.

The value “Æ” which means no value assignment is acceptable

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Consider the following set of training examples:

height	attitude	KOG	scoring ability	defensive ability	Great Player
tall	good	good	high	good	yes
short	bad	good	low	poor	no
tall	good	good	low	good	yes
tall	good	poor	high	poor	no
short	good	poor	low	good	yes

One hypothesis that works is:

<?,good,?,?,good>

Are there any other hypothesis that will work? Are they more or less general?

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

The Find-S Algorithm

Step1: Initialize S to the most specific hypothesis

Step2: For each positive training example x
For each value ai assigned to an attribute
If ai is different than the corresponding value in x Then
Replace ai with next most general value

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

The Find-G Algorithm

Step1: Initialize G to the most general hypothesis in the hypothesis-space

Step2: If x is negative Then

For each g in G that matches x, replace g with its most general specializations that do not match x
Remove all g in G that are more specific than any other hypothesis in G
Remove all g in G that fail to match a positive example in the training set
Remove any hypothesis in G that matches a previously seen negative example

Else
Remove from G all hypotheses that do not match x

COMP-4640: Intelligent & Interactive Systems
(Learning from Observations)

Candidate Elimination Algorithm

Step 1: Initialize G to the set of most general hypotheses
Initialize S to the set of most specific hypotheses

Step 2: For each training example, x,
If x is positive Then

Remove from G those hypotheses that do not match x
Add all generalizations so that each s in S is consistent with x, and a member of G that is more general than s (the original s is removed)
Only keep the most specific hypothesis in S

Else

Remove any hypotheses in S that matches x
For each g in G that matches x,
Remove g
replace g with its most general specializations that do not match x and some member of S that is more specific
Remove all g in G that are more specific than any other hypothesis in G

Source: http://www.eng.auburn.edu/~sealscd/chapter18.doc

Web site to visit: http://www.eng.auburn.edu

Author of the text: indicated on the source document of the above text

If you are the author of the text above and you not agree to share your knowledge for teaching, research, scholarship (for fair use as indicated in the United States copyrigh low) please send us an e-mail and we will remove your text quickly. Fair use is a limitation and exception to the exclusive right granted by copyright law to the author of a creative work. In United States copyright law, fair use is a doctrine that permits limited use of copyrighted material without acquiring permission from the rights holders. Examples of fair use include commentary, search engines, criticism, news reporting, research, teaching, library archiving and scholarship. It provides for the legal, unlicensed citation or incorporation of copyrighted material in another author's work under a four-factor balancing test. (source: http://en.wikipedia.org/wiki/Fair_use)

The information of medicine and health contained in the site are of a general nature and purpose which is purely informative and for this reason may not replace in any case, the council of a doctor or a qualified entity legally to the profession.

Intelligent and Interactive Systems

The texts are the property of their respective authors and we thank them for giving us the opportunity to share for free to students, teachers and users of the Web their texts will used only for illustrative educational and scientific purposes only.

All the information in our site are given for nonprofit educational purposes

Intelligent and Interactive Systems

Intelligent and Interactive Systems

Training Experience

Types of Training Experience

Control of Training Experience

Distribution of Training Examples

Machine Learning as Search

The Four Components of Learning Systems

Learning Decision Trees

D2

Building a Decision Tree

Intelligent and Interactive Systems

Intelligent and Interactive Systems

Intelligent and Interactive Systems