10-701: Introduction to Machine Learning
Lecture 8 - Regularization
Henry Chai
2/12/24

Front Matter
Announcements:
HW2 released 2/7, due 2/19 (previously 2/16) at 11:59 PM
HW3 released 2/19 (previously 2/16), due 2/28 (previously 2/26) at 11:59 PM
Lecture schedule has been updated, see the course website for full details
Lecture on 2/21 (Wednesday) and Recitation on 2/23 (Friday) have been [...] $= \theta_7 = \theta_8 = \theta_9 = \theta_{10} = 0$
$$\frac{1}{N}(X\theta - y)^T(X\theta - y)$$

Hard Constraints
$\mathcal{H}_{10}$ = 10th-order polynomials
$\phi_{\mathcal{H}_{10}}(x) = [x, x^2, x^3, x^4, x^5, x^6, x^7, x^8, x^9, x^{10}]$
Given
$$X = \begin{bmatrix} 1 & \phi_{\mathcal{H}_{10}}(x^{(1)}) \\ 1 & \phi_{\mathcal{H}_{10}}(x^{(2)}) \\ \vdots & \vdots \\ 1 & \phi_{\mathcal{H}_{10}}(x^{(N)}) \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(N)} \end{bmatrix}$$
find $\theta = [\theta_0, \theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \theta_6, \theta_7, \theta_8, \theta_9, \theta_{10}]$ that minimizes
$$\frac{1}{N}\sum_{n=1}^{N}\left(\sum_{d=0}^{10} x_d^{(n)}\theta_d - y^{(n)}\right)^2$$
Subject to $\theta_3 = \theta_4 = \theta_5 = \theta_6$ [...] $= \theta_7 = \theta_8 = \theta_9 = \theta_{10} = 0$

Hard Constraints
With the same $\mathcal{H}_{10}$, $\phi_{\mathcal{H}_{10}}$, $X$ and $y$, find $\theta = [\theta_0, \theta_1, \ldots, \theta_{10}]$ that minimizes
Subject to nothing!
$\frac{1}{N}\sum_{n=1}$ [...]

Soft Constraints
Given
$$X = \begin{bmatrix} 1 & \phi_1(\mathbf{x}^{(1)}) & \cdots & \phi_D(\mathbf{x}^{(1)}) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \phi_1(\mathbf{x}^{(N)}) & \cdots & \phi_D(\mathbf{x}^{(N)}) \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(N)} \end{bmatrix},$$
find $\theta$ that minimizes
$$\frac{1}{N}(X\theta - y)^T(X\theta - y)$$
Subject to:
$$\|\theta\|_2^2 = \theta^T\theta = \sum_{d=0}^{D}\theta_d^2 \le C$$

Soft Constraints
minimize $\ell_{\mathcal{D}}(\theta) = (X\theta - y)^T(X\theta - y)$
subject to $\theta^T\theta \le C$
[Figure labels: constraint boundary $\theta^T\theta = C$, origin $(0,0)$, $\hat{\theta}$]

Soft Constraints
subject to $\theta^T\theta \le C$
minimize $\ell_{\mathcal{D}}(\theta) = (X\theta -$ [...] $(\theta) = \ell_{\mathcal{D}}(\theta) + \lambda_C\theta^T\theta$

Soft Constraints: Solving for $\hat{\theta}_{MAP}$
subject to $\theta^T\theta \le C$
minimize $\ell_{\mathcal{D}}(\theta) = (X\theta - y)^T(X\theta - y)$
$\nabla_\theta \ell_{\mathcal{D}}^{AUG}$ [...] $(\theta) = 2\left(X^TX\theta - X^Ty + \lambda_C\theta\right)$
$$2\left(X^TX\hat{\theta}_{MAP} - X^Ty + \lambda_C\hat{\theta}_{MAP}\right) = 0$$
$$\left(X^TX + \lambda_C I_{D+1}\right)\hat{\theta}_{MAP} = X^Ty$$
$$\hat{\theta}_{MAP} = \left(X^TX + \lambda_C I_{D+1}\right)^{-1}X^Ty$$

Ridge Regression
Adding this positive ($\lambda_C \ge 0$) diagonal matrix can help if $X^TX$ is not invertible!
minimize $\ell_{\mathcal{D}}^{AUG}$
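[...] $(\theta) = \ell_{\mathcal{D}}(\theta) + \lambda_C\theta^T\theta$

Ridge Regression
[Figure: both axes from 0 to 1; legend: Target Function, 10th-Order Hypothesis, Noisy Samples]
10-dimensional target function with additive Gaussian noise
$\mathcal{H}_{10}$ = 10th-order polynomial

[Figure: both axes from 0 to 1; legend: Target Function, 10th-Order Hypothesis] [...]

$K_\Phi(\mathbf{x}, \mathbf{x}') = \Phi(\mathbf{x})^T\Phi(\mathbf{x}') \quad \forall\ \mathbf{x}, \mathbf{x}' \in \mathcal{X}$
$K_\Phi(\mathbf{x}, \mathbf{x}')$ should be cheaper to compute than $\Phi(\mathbf{x})$
Example:
$$\Phi_2'(\mathbf{x}) = \left[x_1, \ldots, x_D, x_1^2, \sqrt{2}x_1x_2, \ldots, \sqrt{2}x_{D-1}x_D, x_D^2\right]$$
$$\Phi_2'(\mathbf{x})^T\Phi_2'(\mathbf{x}') = \sum_{d=1}^{D} x_dx_d' + \sum_{d=1}^{D} x_d^2x_d'^2 + \sum_{d=1}^{D}\sum_{e>d} 2x_dx_d'x_ex_e'$$
$$\Phi_2'(\mathbf{x})^T\Phi_2'(\mathbf{x}') = \sum_{d=1}^{D} x_dx_d' + \left(\sum_{d=1}^{D} x_dx_d'\right)^2 = \mathbf{x}^T\mathbf{x}' + \left(\mathbf{x}^T\mathbf{x}'\right)^2$$
$$K_{\Phi_2'}(\mathbf{x}, \mathbf{x}') = \mathbf{x}^T\mathbf{x}' + \left(\mathbf{x}^T\mathbf{x}'\right)^2$$
Computing $\Phi_2'(\mathbf{x})^T\Phi_2'(\mathbf{x}')$ requires $O(D^2)$ time whereas computing $K_{\Phi_2'}$ [...]

$K(\mathbf{x}, \mathbf{x}') = \Phi(\mathbf{x})^T\Phi(\mathbf{x}')\ \forall\ \mathbf{x}, \mathbf{x}'$ $\iff$ the Gram matrix
$$\mathbf{K} = \begin{bmatrix} K(\mathbf{x}^{(1)}, \mathbf{x}^{(1)}) & K(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}) & \cdots & K(\mathbf{x}^{(1)}, \mathbf{x}^{(N)}) \\ K(\mathbf{x}^{(2)}, \mathbf{x}^{(1)}) & K(\mathbf{x}^{(2)}, \mathbf{x}^{(2)}) & \cdots & K(\mathbf{x}^{(2)}, \mathbf{x}^{(N)}) \\ \vdots & \vdots & \ddots & \vdots \\ K(\mathbf{x}^{(N)}, \mathbf{x}^{(1)}) & K(\mathbf{x}^{(N)}, \mathbf{x}^{(2)}) & \cdots & K(\mathbf{x}^{(N)}, \mathbf{x}^{(N)}) \end{bmatrix}$$
is symmetric and positive semi-definite $\forall$ sets $\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N)}\}$

Key Takeaways
Polynomial/non-linear feature transformations allow for learning non-linear functions/decision bo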
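
The closed-form ridge regression solution from the Ridge Regression slides, $\hat{\theta}_{MAP} = (X^TX + \lambda_C I_{D+1})^{-1}X^Ty$, can be sketched in NumPy as follows. This is a minimal illustration, not code from the lecture: the function names `poly_design_matrix` and `ridge_fit`, the sine target, the noise level, and the two values of `lam` are all assumptions chosen for the example. As in the slides, the penalty covers every coordinate of $\theta$, including the intercept.

```python
import numpy as np

def poly_design_matrix(x, degree):
    """Rows are [1, x, x^2, ..., x^degree] for each scalar input (hypothetical helper)."""
    return np.vander(x, N=degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) theta = X^T y for theta (hypothetical helper)."""
    d = X.shape[1]
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Assumed toy data: noisy samples of a smooth target on [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=15)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.shape)

X = poly_design_matrix(x, degree=10)        # 10th-order polynomial features
theta_small = ridge_fit(X, y, lam=1e-8)     # almost unregularized
theta_ridge = ridge_fit(X, y, lam=1e-2)     # noticeably regularized

# Gradient of the augmented objective at the ridge solution (should vanish).
lam = 1e-2
grad = 2 * (X.T @ X @ theta_ridge - X.T @ y + lam * theta_ridge)

print("||theta|| with lam=1e-8:", np.linalg.norm(theta_small))
print("||theta|| with lam=1e-2:", np.linalg.norm(theta_ridge))
print("max |gradient| at ridge solution:", np.abs(grad).max())
```

Increasing `lam` shrinks the norm of the fitted parameter vector, which is the soft-constraint behavior the slides describe; the right value of `lam` for a given data set is not determined by this sketch.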
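
The kernel identity from the example in the kernel slides, $\Phi_2'(\mathbf{x})^T\Phi_2'(\mathbf{x}') = \mathbf{x}^T\mathbf{x}' + (\mathbf{x}^T\mathbf{x}')^2$, can be checked numerically. The sketch below is an illustration under assumptions: the names `phi2` and `k_phi2` and the random test points are hypothetical, and the explicit features are ordered differently from the slide (linear terms, then squares, then cross terms), which does not change the inner product. It also checks that a Gram matrix built from this kernel is symmetric with no significantly negative eigenvalues on one sample of points.

```python
import numpy as np

def phi2(x):
    """Explicit map: x_d, x_d^2, and sqrt(2) * x_d * x_e for d < e (hypothetical helper)."""
    D = len(x)
    cross = [np.sqrt(2.0) * x[d] * x[e] for d in range(D) for e in range(d + 1, D)]
    return np.concatenate([x, x ** 2, np.array(cross)])

def k_phi2(x, xp):
    """Kernel form: x^T x' + (x^T x')^2, computed with a single inner product."""
    s = x @ xp
    return s + s ** 2

rng = np.random.default_rng(0)
x, xp = rng.normal(size=5), rng.normal(size=5)

explicit = phi2(x) @ phi2(xp)   # inner product in the expanded feature space
kernel = k_phi2(x, xp)          # same value without building the features

# Gram matrix on a small assumed sample of points.
pts = rng.normal(size=(6, 5))
S = pts @ pts.T
G = S + S ** 2                  # G[i, j] = k_phi2(pts[i], pts[j])
eigs = np.linalg.eigvalsh(G)

print("explicit:", explicit, "kernel:", kernel)
print("smallest Gram eigenvalue:", eigs.min())
```

The explicit map has on the order of $D^2$ coordinates, while the kernel form needs only one $D$-dimensional inner product, which is the cost difference the slide points to.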