10-701: Introduction to Machine Learning Lecture 8 - Regularization

2024

Henry Chai, 2/12/24

Front Matter

Announcements:
  HW2 released 2/7, due 2/19 (previously 2/16) at 11:59 PM
  HW3 released 2/19 (previously 2/16), due 2/28 (previously 2/26) at 11:59 PM
  Lecture schedule has been updated, see the course website for full details
  Lecture on 2/21 (Wednesday) and Recitation on 2/23 (Friday) have been [...]

[...] $= \omega_7 = \omega_8 = \omega_9 = \omega_{10} = 0$, with objective

$$\frac{1}{N} (X\boldsymbol{\omega} - \mathbf{y})^T (X\boldsymbol{\omega} - \mathbf{y})$$

Hard Constraints

$\mathcal{H}_{10}$ = 10th-order polynomials, $\phi_{10}(x) = [x, x^2, x^3, x^4, x^5, x^6, x^7, x^8, x^9, x^{10}]$

Given

$$X = \begin{bmatrix} 1 & \phi_{10}(x^{(1)}) \\ 1 & \phi_{10}(x^{(2)}) \\ \vdots & \vdots \\ 1 & \phi_{10}(x^{(N)}) \end{bmatrix} \quad \text{and} \quad \mathbf{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(N)} \end{bmatrix}$$

find $\boldsymbol{\omega} = [\omega_0, \omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6, \omega_7, \omega_8, \omega_9, \omega_{10}]$ that minimizes

$$\frac{1}{N} \sum_{n=1}^{N} \left( \sum_{d=0}^{10} \left(x^{(n)}\right)^d \omega_d - y^{(n)} \right)^2$$

subject to $\omega_3 = \omega_4 = \omega_5 = \omega_6$ [...] $= \omega_7 = \omega_8 = \omega_9 = \omega_{10} = 0$

Hard Constraints

$\mathcal{H}_{10}$ = 10th-order polynomials, with $\phi_{10}$, $X$, $\mathbf{y}$ and $\boldsymbol{\omega}$ as above.

Find $\boldsymbol{\omega}$ that minimizes $\frac{1}{N} \sum_{n=1}^{N}$ [...] subject to nothing!

Soft Constraints

[...] Given a design matrix $X$ whose $n$-th row is $[1, \phi_1(\mathbf{x}^{(n)}), \cdots]$ for $n = 1, \ldots, N$, and $\mathbf{y} = [y^{(1)}, y^{(2)}, \ldots, y^{(N)}]^T$, find $\boldsymbol{\omega}$ that minimizes

$$\frac{1}{N} (X\boldsymbol{\omega} - \mathbf{y})^T (X\boldsymbol{\omega} - \mathbf{y})$$

subject to:

$$\|\boldsymbol{\omega}\|_2^2 = \boldsymbol{\omega}^T\boldsymbol{\omega} = \sum_{d=0}^{D} \omega_d^2 \le C$$

Soft Constraints

minimize $\ell_{\mathcal{D}}(\boldsymbol{\omega}) = (X\boldsymbol{\omega} - \mathbf{y})^T (X\boldsymbol{\omega} - \mathbf{y})$
subject to $\boldsymbol{\omega}^T\boldsymbol{\omega} \le C$

[Figure, labels: $\boldsymbol{\omega}^T\boldsymbol{\omega} = C$, $(0,0)$, $\hat{\boldsymbol{\omega}}$, $\ell_{\mathcal{D}}(\boldsymbol{\omega})$]

Soft Constraints

[Figure, label: $(0,0)$]

minimize $\ell_{\mathcal{D}}(\boldsymbol{\omega}) = (X\boldsymbol{\omega} -$ [...]
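The hard-constraint setup above can be sketched numerically. This is a minimal illustration, not the lecture's own code: the sine target, the noise level, the sample size and the helper names `poly_design` and `mse` are all assumptions made for the example. It shows that forcing $\omega_3 = \cdots = \omega_{10} = 0$ in the 10th-order model is the same as fitting a 2nd-order polynomial, and that dropping the constraint ("subject to nothing") can only lower the training error.

```python
import numpy as np

def poly_design(x, degree):
    """Design matrix whose n-th row is [1, x_n, x_n^2, ..., x_n^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def mse(X, w, y):
    """Mean squared error (1/N) (Xw - y)^T (Xw - y)."""
    r = X @ w - y
    return (r @ r) / len(y)

# Hypothetical data: noisy samples of a smooth target on [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.shape)

X10 = poly_design(x, 10)

# Hard constraint omega_3 = ... = omega_10 = 0: only the first three
# columns of the design matrix can contribute to the prediction.
w_free, *_ = np.linalg.lstsq(X10[:, :3], y, rcond=None)
w_hard = np.zeros(11)
w_hard[:3] = w_free

# Fitting a 2nd-order polynomial directly gives the same coefficients.
w_deg2, *_ = np.linalg.lstsq(poly_design(x, 2), y, rcond=None)

# "Subject to nothing": the unconstrained 10th-order least-squares fit.
w_full, *_ = np.linalg.lstsq(X10, y, rcond=None)

print("constrained training MSE:  ", mse(X10, w_hard, y))
print("unconstrained training MSE:", mse(X10, w_full, y))
```

A lower training error for the unconstrained fit says nothing about how well it generalizes, which is the motivation for the softer constraint on $\boldsymbol{\omega}^T\boldsymbol{\omega}$.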
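$(\boldsymbol{\omega}) = \ell_{\mathcal{D}}(\boldsymbol{\omega}) + \lambda_C \boldsymbol{\omega}^T\boldsymbol{\omega}$

$\Updownarrow$

Soft Constraints: Solving for $\hat{\boldsymbol{\omega}}_{MAP}$

minimize $\ell_{\mathcal{D}}(\boldsymbol{\omega}) = (X\boldsymbol{\omega} - \mathbf{y})^T (X\boldsymbol{\omega} - \mathbf{y})$
subject to $\boldsymbol{\omega}^T\boldsymbol{\omega} \le C$

$\nabla_{\boldsymbol{\omega}} \ell_{\mathcal{D}}^{AUG}$ [...] $(\boldsymbol{\omega}) = 2\left(X^TX\boldsymbol{\omega} - X^T\mathbf{y} + \lambda_C \boldsymbol{\omega}\right)$

$$2\left(X^TX\hat{\boldsymbol{\omega}}_{MAP} - X^T\mathbf{y} + \lambda_C \hat{\boldsymbol{\omega}}_{MAP}\right) = 0$$

$$\left(X^TX + \lambda_C I_{D+1}\right) \hat{\boldsymbol{\omega}}_{MAP} = X^T\mathbf{y}$$

$$\hat{\boldsymbol{\omega}}_{MAP} = \left(X^TX + \lambda_C I_{D+1}\right)^{-1} X^T\mathbf{y}$$

Ridge Regression

minimize $\ell_{\mathcal{D}}^{AUG}$ [...] $(\boldsymbol{\omega}) = \ell_{\mathcal{D}}(\boldsymbol{\omega}) + \lambda_C \boldsymbol{\omega}^T\boldsymbol{\omega}$

Adding this positive ($\lambda_C \ge 0$) diagonal matrix can help if $X^TX$ is not invertible!

Ridge Regression

10-dimensional target function with additive Gaussian noise
$\mathcal{H}_{10}$ = 10th-order polynomial

[Figure: both axes run from 0 to 1; legend: Target Function, 10th-Order Hypothesis, Noisy Samples]

[Figure: both axes run from 0 to 1; legend: Target Function, 10th-Order Hypothesis [...]]

[...] $K_{\Phi}(\mathbf{x}, \mathbf{x}') = \Phi(\mathbf{x})^T \Phi(\mathbf{x}') \;\; \forall \; \mathbf{x}, \mathbf{x}' \in \mathcal{X}$

$K_{\Phi}(\mathbf{x}, \mathbf{x}')$ should be cheaper to compute than $\Phi(\mathbf{x})$

Example: $\Phi_2'(\mathbf{x}) = [x_1, \ldots, x_D, x_1^2, \sqrt{2}x_1x_2, \ldots, \sqrt{2}x_{D-1}x_D, x_D^2]$

$$\Phi_2'(\mathbf{x})^T \Phi_2'(\mathbf{x}') = \sum_{i=1}^{D} x_i x_i' + \sum_{i=1}^{D} x_i^2 x_i'^2 + \sum_{i=1}^{D} \sum_{j > i} 2 x_i x_i' x_j x_j'$$

$$\Phi_2'(\mathbf{x})^T \Phi_2'(\mathbf{x}') = \sum_{i=1}^{D} x_i x_i' + \left( \sum_{i=1}^{D} x_i x_i' \right)^2 = \mathbf{x}^T\mathbf{x}' + \left(\mathbf{x}^T\mathbf{x}'\right)^2$$

$$K_{\Phi_2'}(\mathbf{x}, \mathbf{x}') = \mathbf{x}^T\mathbf{x}' + \left(\mathbf{x}^T\mathbf{x}'\right)^2$$

Computing $\Phi_2'(\mathbf{x})^T \Phi_2'(\mathbf{x}')$ requires $O(D^2)$ time whereas computing $K_{\Phi_2'}$ [...]

$K(\mathbf{x}, \mathbf{x}') = \Phi(\mathbf{x})^T \Phi(\mathbf{x}') \;\; \forall \; \mathbf{x}, \mathbf{x}'$

$\Updownarrow$

the Gram matrix

$$\mathrm{K} = \begin{bmatrix} K(\mathbf{x}^{(1)}, \mathbf{x}^{(1)}) & K(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}) & \cdots & K(\mathbf{x}^{(1)}, \mathbf{x}^{(N)}) \\ K(\mathbf{x}^{(2)}, \mathbf{x}^{(1)}) & K(\mathbf{x}^{(2)}, \mathbf{x}^{(2)}) & \cdots & K(\mathbf{x}^{(2)}, \mathbf{x}^{(N)}) \\ \vdots & \vdots & \ddots & \vdots \\ K(\mathbf{x}^{(N)}, \mathbf{x}^{(1)}) & K(\mathbf{x}^{(N)}, \mathbf{x}^{(2)}) & \cdots & K(\mathbf{x}^{(N)}, \mathbf{x}^{(N)}) \end{bmatrix}$$

is symmetric and positive semi-definite $\forall$ sets $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N)}$

Key Takeaways

Polynomial/non-linear feature transformations allow for learning non-linear functions/decision bo[...]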
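The ridge regression closed form from the slides, $\hat{\boldsymbol{\omega}}_{MAP} = (X^TX + \lambda_C I_{D+1})^{-1} X^T\mathbf{y}$, can be sketched in a few lines of NumPy. This is an illustrative sketch under assumed inputs: the sine target, the noise level, the sample size, the two values of $\lambda_C$ and the helper name `ridge_fit` are not from the lecture. It also checks the remark about invertibility: for $\lambda_C > 0$ the matrix $X^TX + \lambda_C I_{D+1}$ is positive definite, so the system stays solvable even when $X^TX$ is singular.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y, the ridge closed form."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(A, X.T @ y)

# Hypothetical data: a few noisy samples, 10th-order polynomial features.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=15)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.shape)
X = np.vander(x, 11, increasing=True)  # rows [1, x, ..., x^10]

w_small = ridge_fit(X, y, 1e-6)  # weak regularization
w_large = ridge_fit(X, y, 1.0)   # strong regularization

# The gradient of the augmented loss, 2 (X^T X w - X^T y + lam w),
# should vanish at the ridge solution.
grad = 2 * (X.T @ X @ w_large - X.T @ y + 1.0 * w_large)

# A duplicated column makes X^T X singular, yet ridge is still solvable.
X_dup = np.hstack([X[:, :3], X[:, 1:2]])
w_dup = ridge_fit(X_dup, y, 0.1)

print("squared norm, lam=1e-6:", w_small @ w_small)
print("squared norm, lam=1.0: ", w_large @ w_large)
print("rank of X_dup:", np.linalg.matrix_rank(X_dup), "of", X_dup.shape[1])
```

Larger values of $\lambda_C$ shrink $\boldsymbol{\omega}^T\boldsymbol{\omega}$, which mirrors tightening the bound $C$ in the constrained formulation.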
Pages
58
Published in
United States of America