Getting to the gradient descent algorithmBatch vs incremental gradient descentA constrained gradient descent...

Examples of a statistic that is not independent of sample's distribution?

Are the terms "stab" and "staccato" synonyms?

How much attack damage does the AC boost from a shield prevent on average?

BitNot does not flip bits in the way I expected

Finding algorithms of QGIS commands?

Could you please stop shuffling the deck and play already?

Why doesn't this Google Translate ad use the word "Translation" instead of "Translate"?

How to pass a string to a command that expects a file?

Space in array system equations

Who deserves to be first and second author? PhD student who collected data, research associate who wrote the paper or supervisor?

How did Alan Turing break the enigma code using the hint given by the lady in the bar?

Algorithm to convert a fixed-length string to the smallest possible collision-free representation?

Do f-stop and exposure time perfectly cancel?

What wound would be of little consequence to a biped but terrible for a quadruped?

Should I tell my boss the work he did was worthless

A three room house but a three headED dog

How strictly should I take "Candidates must be local"?

Making a sword in the stone, in a medieval world without magic

Does "variables should live in the smallest scope as possible" include the case "variables should not exist if possible"?

Is Gradient Descent central to every optimizer?

What are some noteworthy "mic-drop" moments in math?

Fourth person (in Slavey language)

infinitive telling the purpose

How are such low op-amp input currents possible?

Getting to the gradient descent algorithm

Batch vs incremental gradient descentA constrained gradient descent algorithmHow does one rigorously prove that gradient descent indeed decreases the function in question locally i.e. show $f(x^{(t+1)}) leq f(x^{(t)})$?How do mirror prox and accelerated methods differ conceptually?(steepest) gradient descent for minimizing a quadratic function $langle x, Ax rangle$ with $A succeq 0$Taylor Series Remainder, gradient methodsAbout the feasibility of a subgradient stepUnderstanding Newton method when optimizing cost function depending on a triangle.Second-Order Taylor Series Terms In Gradient DescentHow to show convergence of Frank-Wolfe algorithm ( or conditional gradient method)?

I understand that gradient descent comes from the (quite natural) idea that we might want to choose our next weight vector ($w^{t+1}$) as

$$w^{t+1} = arg min_w frac{1}{2} |w-w^t|^{2} + eta f(w^t) + langle w-w^t, nabla f(w^t) rangle$$

because the approximation $$f(w) approx f(w^t)~+ langle w-w^t,nabla f(w^t) rangle$$ gets worse as $|w-w^t|$ grows.

But, how do I get from the first expression above to the gradient descent algorithm:
$$w^{t+1} = w^t + nabla f(w^t)?$$
A step-by-step explanation would be much appreciated.

edited Mar 9 at 16:47

Rodrigo de Azevedo

13k41960

asked Mar 9 at 16:22

Barton

154

$begingroup$
Why is the term $eta f(w^t)$ in the objective function?
$endgroup$
– Rodrigo de Azevedo
Mar 9 at 16:48

add a comment |

I understand that gradient descent comes from the (quite natural) idea that we might want to choose our next weight vector ($w^{t+1}$) as

$$w^{t+1} = arg min_w frac{1}{2} |w-w^t|^{2} + eta f(w^t) + langle w-w^t, nabla f(w^t) rangle$$

because the approximation $$f(w) approx f(w^t)~+ langle w-w^t,nabla f(w^t) rangle$$ gets worse as $|w-w^t|$ grows.

But, how do I get from the first expression above to the gradient descent algorithm:
$$w^{t+1} = w^t + nabla f(w^t)?$$
A step-by-step explanation would be much appreciated.

edited Mar 9 at 16:47

Rodrigo de Azevedo

13k41960

asked Mar 9 at 16:22

Barton

154

$begingroup$
Why is the term $eta f(w^t)$ in the objective function?
$endgroup$
– Rodrigo de Azevedo
Mar 9 at 16:48

add a comment |

I understand that gradient descent comes from the (quite natural) idea that we might want to choose our next weight vector ($w^{t+1}$) as

$$w^{t+1} = arg min_w frac{1}{2} |w-w^t|^{2} + eta f(w^t) + langle w-w^t, nabla f(w^t) rangle$$

because the approximation $$f(w) approx f(w^t)~+ langle w-w^t,nabla f(w^t) rangle$$ gets worse as $|w-w^t|$ grows.

But, how do I get from the first expression above to the gradient descent algorithm:
$$w^{t+1} = w^t + nabla f(w^t)?$$
A step-by-step explanation would be much appreciated.

edited Mar 9 at 16:47

Rodrigo de Azevedo

13k41960

asked Mar 9 at 16:22

Barton

154

I understand that gradient descent comes from the (quite natural) idea that we might want to choose our next weight vector ($w^{t+1}$) as

$$w^{t+1} = arg min_w frac{1}{2} |w-w^t|^{2} + eta f(w^t) + langle w-w^t, nabla f(w^t) rangle$$

because the approximation $$f(w) approx f(w^t)~+ langle w-w^t,nabla f(w^t) rangle$$ gets worse as $|w-w^t|$ grows.

But, how do I get from the first expression above to the gradient descent algorithm:
$$w^{t+1} = w^t + nabla f(w^t)?$$
A step-by-step explanation would be much appreciated.

optimization numerical-optimization gradient-descent

edited Mar 9 at 16:47

Rodrigo de Azevedo

13k41960

asked Mar 9 at 16:22

Barton

154

edited Mar 9 at 16:47

Rodrigo de Azevedo

13k41960

asked Mar 9 at 16:22

Barton

154

edited Mar 9 at 16:47

Rodrigo de Azevedo

13k41960

edited Mar 9 at 16:47

Rodrigo de Azevedo

13k41960

edited Mar 9 at 16:47

Rodrigo de Azevedo

13k41960

asked Mar 9 at 16:22

Barton

154

asked Mar 9 at 16:22

Barton

154

asked Mar 9 at 16:22

Barton

154

$begingroup$
Why is the term $eta f(w^t)$ in the objective function?
$endgroup$
– Rodrigo de Azevedo
Mar 9 at 16:48

add a comment |

$begingroup$
Why is the term $eta f(w^t)$ in the objective function?
$endgroup$
– Rodrigo de Azevedo
Mar 9 at 16:48

Why is the term $eta f(w^t)$ in the objective function?

– Rodrigo de Azevedo
Mar 9 at 16:48

add a comment |

1 Answer
1

active

oldest

votes

You can motivate gradient descent as both minimization of a quadratic approximation of the objective or a linear function with a penalty for moving away from the current point. Here's what you are minimizing actually:
$$x_{k+1} = argmin_{xin mathbb{R}^n}{f(x_{k}) + nabla f(x_k) cdot (x-x_k) + frac{1}{2tau}||x-x_k||^2}$$
$$x_{k+1} = argmin_{xin mathbb{R}^n}{nabla f(x_k) cdot (x-x_k) + frac{1}{2tau}||x-x_k||^2}$$
$$nabla f(x_k) + frac{1}{tau}(x_{k+1}-x_k) = 0$$
$$x_{k+1} = x_k - tau nabla f(x_k)$$
The second interpretation follows from considering the function $f_{x_k}(x) = f(x_k) + nabla f(x_k) cdot (x-x_k)$ which is obviously linear, and adding the penalty you get $f_{x_k}^{tau}(x) = f_{x_k}(x) + frac{1}{2tau}||x-x_k||^2$ which is a $tau^{-1}$ strongly convex function and thus has a unique minimizer $x_{k+1}$.

edited Mar 9 at 17:07

answered Mar 9 at 16:46

lightxbulb

1,125311

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3141295%2fgetting-to-the-gradient-descent-algorithm%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

edited Mar 9 at 17:07

answered Mar 9 at 16:46

lightxbulb

1,125311

add a comment |

edited Mar 9 at 17:07

answered Mar 9 at 16:46

lightxbulb

1,125311

add a comment |

edited Mar 9 at 17:07

answered Mar 9 at 16:46

lightxbulb

1,125311

edited Mar 9 at 17:07

answered Mar 9 at 16:46

lightxbulb

1,125311

edited Mar 9 at 17:07

answered Mar 9 at 16:46

lightxbulb

1,125311

answered Mar 9 at 16:46

lightxbulb

1,125311

answered Mar 9 at 16:46

lightxbulb

1,125311

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Mathematics Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Dtjytyk