\documentclass[main]{subfiles}
\begin{document}
\setcounter{section}{1}
\section{Representations}
%@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
% summarizes lecture 1 & 2
%author: Benjamin Ellenberger
% This was never tested in the exam before
\subsection{Learning Problem of Pattern Recognition}
\begin{itemize}
\item \textbf{Representation of objects $\rightarrow$ Data representation}:
Choosing the wrong data representation can induce inappropriate similarity
measures!
\item \textbf{Definition/modelling of structure}: What is a pattern?
A statistical definition of good and poor structures is mandatory for rational
pattern recognition!
\item \textbf{Optimization}: Search for preferred structures; multi-scale optimization yields efficient algorithms to detect good structures in data.
\item \textbf{Validation}: Are the structures indeed in the data, or are they
merely explained by random fluctuations? How well are you solving the problem?
Without validation, any pattern recognition strategy is doomed to fail.
\end{itemize}
\subsection{Objects, Measurements and Data}
In order to detect, classify, or abstract over objects, we have to measure them and thereby represent them in a data space. How to represent data is a crucial part of successful learning, and the concrete choice of representation ('features') matters: learning methods expect standardized representations of data {\color{orange}\emph{(e.g. points in vector spaces, nodes in a graph, similarity matrices, ...)}}.
\begin{itemize}
\item Object space: Design/configuration/object space \(\printlatex{\mathcal{O}}\)
\item Measurement/Feature: A measurement \(X\) maps objects into a domain,
\(\printlatex{X: \mathcal{O}^{(1)}\times\ldots\times\mathcal{O}^{(R)}\rightarrow \K}\), and thereby represents an object in a data space.
\item Data:
\begin{itemize}
\item \textbf{Monadic data} \(\printlatex{\mathcal{O} \rightarrow \R^{d},~~~~ o \rightarrow X_o}\): An indivisible, impenetrable unit of data such as {\color{orange}\emph{temperature, water depth, pressure, intensity}} etc.
\item \textbf{Dyadic data} \(\printlatex{\mathcal{O}^{(1)}\times \mathcal{O}^{(2)} \rightarrow \R,~~~~ (o_1,o_2) \rightarrow X_{o_1,o_2}}\): Data relating pairs of objects from two different object spaces, e.g. {\color{orange}\emph{\(\printlatex{\{users\} \times \{websites\}}\)}}.
\item \textbf{Polyadic data} \(\printlatex{\mathcal{O}^{(1)}\times \mathcal{O}^{(2)} \times \ldots \times \mathcal{O}^{(R)} \rightarrow \R,~~~~ (o_1,\ldots,o_R) \rightarrow X_{o_1,\ldots,o_R}}\): Data relating objects from \(R\) different object spaces.
\item \textbf{Pairwise data} \(\printlatex{\mathcal{O} \times \mathcal{O} \rightarrow \R,~~~~ (o_1,o_2) \rightarrow X_{o_1,o_2}}\): Data that comes in pairs from the same object space, such as {\color{orange}\emph{\(\printlatex{\{proteins\} \times \{proteins\}}\)}} comparisons (see the concrete example after this list).
\end{itemize}
\end{itemize}
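To make the pairwise case concrete (the protein-similarity matrix is an illustrative assumption, not fixed by the lecture): for \(n\) objects, pairwise data can be collected in a matrix
\[\printlatex{X \in \R^{n \times n}, \qquad X_{o_1,o_2} = \text{similarity of objects } o_1 \text{ and } o_2,}\]
e.g. sequence-alignment scores between \(n\) proteins. Many learning methods (kernel methods, pairwise clustering) then operate directly on such a similarity matrix instead of on vectorial features.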
The choice of features \(X\) is an engineering problem.
\begin{enumerate}
\item Ideally as informative as possible about the label \(Y\) (one way to quantify this is sketched after this list)
\item The cost of acquiring / computing them must be taken into account
\item A poor choice of features \(\printlatex{\rightarrow}\) no luck with learning!
\end{enumerate}
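A minimal way to quantify how informative a feature is (using mutual information here is my assumption, not stated in the notes): for discrete \(X\) and \(Y\), the mutual information
\[\printlatex{I(X;Y) = \sum_{x}\sum_{y} P(x,y)\log\frac{P(x,y)}{P(x)\,P(y)}}\]
is zero exactly when \(X\) and \(Y\) are independent, i.e. when \(X\) carries no information about the label.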
In training, we must find a trade-off between \textbf{underfitting} and \textbf{overfitting} the data.
\begin{enumerate}
\item Depending on the model, you might get classification regions that are not supported by any data.
\item It is important to take the nature of the data into account (e.g. if the values are probabilities and you do not exploit this from the start, you will pay for it with computation time).
\end{enumerate}
\subsection{Loss Minimization}
Given the random variable \(X\), the conditional
expected loss is defined as:
\[\printlatex{R(f, X) = \int \underbrace{Q(Y, f(X))}_{\text{Loss function}}P(Y|X)dY}\]
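A worked instance (the squared loss is my choice of example): for \(\printlatex{Q(Y, f(X)) = (Y - f(X))^2}\), the conditional expected loss becomes
\[\printlatex{R(f, X) = \int (Y - f(X))^2\, P(Y|X)\, dY,}\]
which is minimized pointwise by the conditional mean \(\printlatex{f^*(X) = \mathbb{E}[Y|X]}\).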
Since we cannot calculate the conditional expected loss directly (we do not know \(\printlatex{P(Y|X)}\)), we have to resort to empirical data and minimize the loss between the labels of our data and the values predicted by our estimator.
Separate your data into training data \(Z^{train}\) and test data \(Z^{test}\).
The training data is used to train your estimator, i.e. the estimator learns from that data.
An additional validation data set \(Z^{validation}\) is usually used to guide estimator selection.
Test data must not be used before the final estimator has been selected!
\[\printlatex{Z^{train} = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}}\]
\[\printlatex{Z^{test} = \{(X_{n+1}, Y_{n+1}), \ldots, (X_{n+m}, Y_{n+m})\}}\]
First we find the function \(\hat f\) that minimizes the empirical loss on the training data,
\[\printlatex{\hat f \in \underset{f \in \mathbb{H}}{\mathrm{arg\,min}}~ \hat R (f, Z^{train})},\]
where the training error \(\hat R(\hat f, Z^{train})\) of the empirical risk minimizer is computed from the loss function \(Q(y, f(x))\):
\[\printlatex{\hat R(\hat f,Z^{train}) = \frac{1}{n} \sum\limits^n_{i=1}Q(Y_i,\hat f(X_i))}\]
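A concrete instance (the 0--1 loss is an illustrative assumption): for classification with \(\printlatex{Q(y, f(x)) = \mathbf{1}\{y \neq f(x)\}}\), the training error
\[\printlatex{\hat R(\hat f, Z^{train}) = \frac{1}{n}\sum\limits^n_{i=1} \mathbf{1}\{Y_i \neq \hat f(X_i)\}}\]
is simply the fraction of misclassified training points.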
\subsection{Data Types, Scale, Transformation invariances}
\subsubsection{Data Types and Scale}
\begin{itemize}
\item \textbf{Nominal or categorical scale}: Qualitative, but without quantitative measurements, e.g. binary scale \(\printlatex{\mathbb{X} = \{0, 1\}}\) (presence or absence of
properties). Ordering does not matter.
\item \textbf{Ordinal scale}: Measurement values are meaningful only with respect to other measurements, i.e., the rank order of measurements carries
the information, not the numerical differences {\color{orange}(\emph{e.g. information on the ranking of different marathon races)}}
\item \textbf{Quantitative scales}
\begin{itemize}
\item \textbf{Interval scale}: The relation of numerical differences carries
the information. Invariance w.r.t. translation and scaling {\color{orange}\emph{(Fahrenheit scale of temperature)}}.
\item \textbf{Ratio scale}: Zero value of the scale carries information but
not the measurement unit. {\color{orange}\emph{(Kelvin scale)}}.
\item \textbf{Absolute scale}: Absolute values are meaningful. {\color{orange}\emph{(grades of final exams)}}
\end{itemize}
\end{itemize}
\subsubsection{Transformation invariances}
\textbf{Importance of invariances:} if the measurements are invariant under a set of transformations, then the mathematical definition of structure should obey the same invariances. Otherwise, our structure-search procedure breaks the symmetry in an a priori (not data-dependent) way.
\begin{table}[H]
\centering
\begin{tabular}{|l|l|}
\hline
scale type & transformation invariances \\
\hline
nominal & \(\printlatex{T = \{f : \R \rightarrow \R~|~\text{f bijective}\}}\)\\
ordinal & \(\printlatex{T = \{f : \R \rightarrow \R~|~f(x_1 ) < f (x_2 ), \forall x_1 < x_2 \}}\)\\
interval & \(\printlatex{T = \{f : \R \rightarrow \R~|~f(x) = ax + c, a \in \R^+ , c \in \R\}}\)\\
ratio & \(\printlatex{T = \{f : \R \rightarrow \R~|~f(x) = ax, a \in \R^+ \}}\)\\
absolute & \(\printlatex{T = \{f : \R \rightarrow \R~|~\text{f is identity map}\}}\)\\
\hline
\end{tabular}
\caption{Formal characterisation of different scale types and their invariance properties.}
\end{table}
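As a sanity check for the table (the Celsius/Fahrenheit conversion is my example): converting Celsius to Fahrenheit,
\[\printlatex{f(x) = \tfrac{9}{5}x + 32, \qquad a = \tfrac{9}{5} \in \R^+,~ c = 32 \in \R,}\]
is an allowed transformation of the interval scale but not of the ratio scale (since \(c \neq 0\)): Celsius and Fahrenheit have no meaningful zero point, whereas Kelvin does.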
\subsection{Readings}
\begin{enumerate}
\item For each topic, list the relevant book chapters, e.g.:
\item Topic 1: Bishop, Chapter x / Duda, Chapter y
\end{enumerate}
\todo[inline]{Write down readings from the books covering the topics of the representations section.}
\end{document}