Discriminative model

Discriminative models (also called conditional models) are a class of models used for statistical classification or regression.^[1] They learn decision boundaries from observed data and assign labels such as pass/fail, win/lose, alive/dead, or healthy/sick.

Typical discriminative models include logistic regression (LR), $k$-nearest neighbors, support vector machines (SVM), conditional random fields (CRFs, which are specified over an undirected graph), decision trees, and many others.

The other main class of models is generative models. Typical generative approaches include naive Bayes classifiers, Gaussian mixture models, variational autoencoders, and others.

Definition

Unlike generative modeling, which studies the joint probability distribution $P(x,y)$, discriminative modeling studies $P(y|x)$ or a direct mapping from a given, previously unseen input $x$ to a class label $y$, learned from the observed variables (the training samples). For example, in object recognition, $x$ is typically a vector of raw pixels (or of features extracted from the raw pixels of the image). Within a probabilistic framework, this is done by modeling the conditional probability distribution $P(y|x)$, which can then be used to predict $y$ from $x$. Note that there is still a distinction between conditional models and discriminative models, although both are often simply categorized as discriminative models.
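The relationship between the joint distribution $P(x,y)$ and the conditional distribution $P(y|x)$ can be illustrated with a small table. The sketch below is only an illustration: the probabilities and the names `joint`, `conditional`, and `predict` are invented for the example and are not part of any particular library.

```python
# Toy joint distribution P(x, y) over x in {0, 1} and y in {"neg", "pos"}.
# The numbers are invented for illustration only.
joint = {
    (0, "neg"): 0.4,
    (0, "pos"): 0.1,
    (1, "neg"): 0.1,
    (1, "pos"): 0.4,
}

def conditional(joint, x):
    """Return P(y | x), obtained from the joint table by normalising over y."""
    marginal_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / marginal_x for (xi, y), p in joint.items() if xi == x}

def predict(joint, x):
    """Pick the label with the highest conditional probability."""
    cond = conditional(joint, x)
    return max(cond, key=cond.get)

p_given_1 = conditional(joint, 1)
```

A generative model has to represent the whole table, whereas a discriminative model only needs the quantity returned by `conditional` (or just the decision returned by `predict`).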

Pure discriminative model vs. conditional model

A conditional model models the conditional probability distribution, whereas a traditional discriminative model aims to optimize the mapping from an input to the most similar training samples.^[2]
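One possible reading of a "purely" discriminative mapping is a nearest-neighbour rule, which assigns a label directly from the most similar training sample without ever estimating a probability. The sketch below assumes squared Euclidean distance as the similarity measure; the function name and the training pairs are hypothetical.

```python
def nearest_neighbour_label(x, training_set):
    """Return the label of the training sample closest to x."""
    def sq_dist(a, b):
        # Squared Euclidean distance; an assumed choice of similarity.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(training_set, key=lambda pair: sq_dist(x, pair[0]))
    return label

# Invented training pairs (x_i, y_i).
training_set = [
    ([0.0, 0.0], "fail"),
    ([0.2, 0.1], "fail"),
    ([1.0, 1.0], "pass"),
    ([0.9, 1.2], "pass"),
]

label = nearest_neighbour_label([0.1, 0.0], training_set)
```

The function returns only a label, in contrast with a conditional model, which would return a distribution over labels.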

Typical discriminative modeling approaches

The following approaches assume a training data set $D=\{(x_{i};y_{i})|i\leq N\in \mathbb {Z} \}$ , where $y_{i}$ is the output corresponding to the input $x_{i}$ .

Linear classifier

In the linear classifier method, a function $f(x)$ is used to reproduce the behavior observed in the training set. Using the joint feature vector $\phi (x,y)$ , the decision function is defined as:

$$f(x,w)=\arg \max _{y}w^{T}\phi (x,y)$$

According to Memisevic's interpretation,^[3] $w^{T}\phi (x,y)$ , also written $c(x,y;w)$ , computes a score that measures the compatibility of the input $x$ with the potential output $y$ . The $\arg \max$ then selects the class with the highest score.
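The decision rule can be sketched in a few lines of Python. The text does not fix a particular joint feature vector, so the block layout used in `phi` below (the input is copied into the slot belonging to class $y$) is an assumption made for the example, as are the weights.

```python
def phi(x, y, num_classes):
    """Joint feature vector: copy x into the block that belongs to class y.

    This block layout is an assumption; any joint feature map would do.
    """
    features = [0.0] * (len(x) * num_classes)
    features[y * len(x):(y + 1) * len(x)] = x
    return features

def score(x, y, w, num_classes):
    """c(x, y; w) = w^T phi(x, y)."""
    return sum(wi * fi for wi, fi in zip(w, phi(x, y, num_classes)))

def f(x, w, num_classes):
    """Decision function: the class with the highest score."""
    return max(range(num_classes), key=lambda y: score(x, y, w, num_classes))

# Hypothetical weights for 2 classes and 2 input features.
w = [1.0, -1.0,   # block for class 0
     -1.0, 1.0]   # block for class 1

predicted = f([2.0, 0.5], w, 2)
```

With this layout, the rule is equivalent to keeping one weight vector per class and choosing the class whose weight vector has the largest inner product with $x$.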

Logistic regression (LR)

Because the 0-1 loss function is commonly used in decision theory, the conditional probability distribution $P(y|x;w)$ , where $w$ is the parameter vector fitted to the training data, can be written as follows for the logistic regression model:

$$P(y|x;w)={\frac {1}{Z(x;w)}}\exp(w^{T}\phi (x,y))$$

where

$$Z(x;w)=\sum _{y}\exp(w^{T}\phi (x,y))$$

The expression for $P(y|x;w)$ above is the logistic regression model. Note that a major distinction between models is the way they introduce the posterior probability, which here is inferred from the parametric model. The parameter $w$ can then be chosen to maximize the log-likelihood, with the sum running over the training pairs $(x^{i},y^{i})$ :

$$L(w)=\sum _{i}\log p(y^{i}|x^{i};w)$$

Equivalently, one can minimize the log loss, defined for each training pair as:

$$l^{\log }(x^{i},y^{i},c(x^{i};w))=-\log p(y^{i}|x^{i};w)=\log Z(x^{i};w)-w^{T}\phi (x^{i},y^{i})$$
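The conditional probability and the two forms of the log loss can be checked numerically. The following is a minimal sketch, assuming a block-structured joint feature vector `phi` and hypothetical weights; neither is prescribed by the text.

```python
import math

def phi(x, y, num_classes):
    """Joint feature vector: x copied into the block of class y (assumed layout)."""
    features = [0.0] * (len(x) * num_classes)
    features[y * len(x):(y + 1) * len(x)] = x
    return features

def score(x, y, w, num_classes):
    """w^T phi(x, y)."""
    return sum(wi * fi for wi, fi in zip(w, phi(x, y, num_classes)))

def log_partition(x, w, num_classes):
    """log Z(x; w), computed with the log-sum-exp trick to avoid overflow."""
    scores = [score(x, y, w, num_classes) for y in range(num_classes)]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))

def conditional_prob(y, x, w, num_classes):
    """P(y | x; w) = exp(w^T phi(x, y)) / Z(x; w)."""
    return math.exp(score(x, y, w, num_classes) - log_partition(x, w, num_classes))

def log_loss(x, y, w, num_classes):
    """log Z(x; w) - w^T phi(x, y), the right-hand side of the log loss."""
    return log_partition(x, w, num_classes) - score(x, y, w, num_classes)

# Hypothetical weights and one hypothetical training pair.
w = [1.0, -1.0, -1.0, 1.0]
x, y = [2.0, 0.5], 0
probs = [conditional_prob(k, x, w, 2) for k in range(2)]
loss = log_loss(x, y, w, 2)
```

Here `loss` agrees with `-math.log(probs[y])` up to rounding, and for two classes `probs[0]` reduces to the familiar sigmoid of the score difference.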

Because the log loss is differentiable, a gradient-based method can be used to optimize the model. A global optimum is guaranteed because the objective function is convex. The gradient of the log-likelihood is:

$${\frac {\partial L(w)}{\partial w}}=\sum _{i}\left(\phi (x^{i},y^{i})-E_{p(y|x^{i};w)}\phi (x^{i},y)\right)$$

where $E_{p(y|x^{i};w)}$ denotes the expectation taken under $p(y|x^{i};w)$ .

This method is computationally efficient when the number of classes is relatively small, since both $Z(x;w)$ and the expectation sum over all classes.
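A gradient-based fit can be sketched as plain gradient ascent on the log-likelihood. This is an illustrative sketch rather than a reference implementation: the block-structured `phi`, the step size, the number of steps, and the four training pairs are all assumptions made for the example.

```python
import math

def phi(x, y, num_classes):
    """Joint feature vector: x copied into the block of class y (assumed layout)."""
    features = [0.0] * (len(x) * num_classes)
    features[y * len(x):(y + 1) * len(x)] = x
    return features

def score(x, y, w, num_classes):
    """w^T phi(x, y)."""
    return sum(wi * fi for wi, fi in zip(w, phi(x, y, num_classes)))

def conditional_probs(x, w, num_classes):
    """p(y | x; w) for every class y, via a numerically stable softmax."""
    scores = [score(x, y, w, num_classes) for y in range(num_classes)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def log_likelihood(data, w, num_classes):
    """L(w) = sum_i log p(y_i | x_i; w)."""
    return sum(math.log(conditional_probs(x, w, num_classes)[y]) for x, y in data)

def gradient(data, w, num_classes):
    """sum_i ( phi(x_i, y_i) - E_{p(y | x_i; w)} phi(x_i, y) )."""
    grad = [0.0] * len(w)
    for x, y in data:
        probs = conditional_probs(x, w, num_classes)
        for k in range(num_classes):
            # Indicator of the observed class minus its predicted probability.
            coeff = (1.0 if k == y else 0.0) - probs[k]
            for j, fj in enumerate(phi(x, k, num_classes)):
                grad[j] += coeff * fj
    return grad

def fit(data, num_classes, dim, steps=200, learning_rate=0.1):
    """Gradient ascent on L(w), starting from w = 0 (hypothetical settings)."""
    w = [0.0] * (dim * num_classes)
    for _ in range(steps):
        g = gradient(data, w, num_classes)
        w = [wi + learning_rate * gi for wi, gi in zip(w, g)]
    return w

# Invented, linearly separable training pairs (x_i, y_i).
data = [([0.0, 1.0], 0), ([0.2, 0.8], 0), ([1.0, 0.0], 1), ([0.9, 0.3], 1)]
w_start = [0.0] * 4
w_fit = fit(data, num_classes=2, dim=2)
```

Each call to `gradient` loops over every class for every training pair, which is why the cost grows with the number of classes.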

See also

Generative model

References

^ “Background: What is a Generative Model?”. Retrieved 26 January 2021.
^ Ballesteros, Miguel. “Discriminative Models” (PDF). Retrieved 28 October 2018.^{[dead link]}
^ Memisevic, Roland (21 December 2006). “An introduction to structured discriminative learning”. Retrieved 29 October 2018.
