coursera_week2_6_Derivatives with a Computation Graph

by raphael3 2017. 10. 4.
Here is a computation graph. Let's say you want to compute the derivative of J with respect to v. If we were to take the value of v and change it a little bit, how would the value of J change? Well, J is defined as 3 * v, and right now v is equal to 11. If we bump v up by a little bit to 11.001, then J, which is currently 33, ends up at 33.003.
So the derivative of J with respect to v is equal to 3, because the increase in J is three times the increase in v. This is analogous to the example from the previous video, where we had f(a) = 3a. There we derived that df(a)/da, which in slightly simplified notation you can write as df/da, is equal to 3. Here we have J = 3v, so dJ/dv is equal to 3, with J playing the role of f and v playing the role of a in that previous example.
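
The nudge argument above can be checked numerically. The following is only a minimal sketch: the function name J_of_v and the variable names are made up for illustration, and the step size 0.001 is the one used in the text.

```python
# Assumed setup from the lecture's graph: J = 3 * v, evaluated at v = 11.
def J_of_v(v):
    return 3 * v

v = 11.0
eps = 0.001  # the small nudge used in the text

# Finite-difference estimate of dJ/dv: change in J divided by change in v.
dJ_dv_estimate = (J_of_v(v + eps) - J_of_v(v)) / eps
print(round(dJ_dv_estimate, 6))
```

Because J is linear in v, the estimate should match the exact derivative up to floating-point rounding.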

In the terminology of backpropagation, what we've seen is that if you want to compute the derivative of the final output variable, which is usually the variable you care most about, with respect to v, then we have done one step of backpropagation. This is one step backwards in the graph.

Now, let's look at another example. What is dJ/da? In other words, if we bump up the value of a, how does that affect the value of J?
Variable a is equal to 5. Let's bump it up to 5.001.
The effect is that v, which is a + u and was previously 11, increases to 11.001.
And we've already seen above that J then goes up to 33.003.
So what we've seen is that if you increase a by 0.001, J increases by 0.003.


By "increase a" I mean that if you take this value 5 and plug in the new value, the change in a propagates to the right of the computation graph and affects the value of the next node.

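The increase in J is three times the increase in a; in other words, the derivative is 3. Changing a changes v, and the change in v in turn changes J.
How much does v increase? It increases by an amount determined by dv/da. The change in v then causes a change in J. In calculus this is called the chain rule.

The effect flows a -> v -> J.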



What we saw from this calculation is that if you increase a by 0.001, v changes by the same amount. So dv/da is equal to 1.
dJ/dv is equal to 3 and dv/da is equal to 1, so their product is 3 * 1 = 3, which is the correct value of dJ/da. This little illustration shows how having computed dJ/dv helps you compute dJ/da.
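
As a rough check of the chain rule step above, the two local slopes and the end-to-end slope can each be estimated by nudging a. This is a sketch only: the helper forward_from_a is a hypothetical name, and u is held at its value of 6 from the graph.

```python
# Assumed values from the lecture's graph: a = 5, u = 6, v = a + u, J = 3 * v.
def forward_from_a(a, u=6.0):
    v = a + u
    J = 3 * v
    return v, J

a, eps = 5.0, 0.001
v0, J0 = forward_from_a(a)
v1, J1 = forward_from_a(a + eps)

dv_da = (v1 - v0) / eps          # local slope of v with respect to a
dJ_dv = (J1 - J0) / (v1 - v0)    # local slope of J with respect to v
dJ_da = (J1 - J0) / eps          # end-to-end slope of J with respect to a

# The chain rule says dJ_da should equal dJ_dv * dv_da.
print(round(dv_da, 6), round(dJ_dv, 6), round(dJ_da, 6))
```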



A lot of the computation you do consists of taking the derivative of the final output variable, J in this case, with respect to various intermediate variables such as a, b, c, u, or v.

What we've done so far is go backward in the graph and figure out that dv is equal to 3, where dv is defined as dJ/dv. Likewise, da is equal to 3, where da stands for dJ/da.

Let's keep computing derivatives. Look at the value u. What is dJ/du? Through a calculation similar to the one we did before, we start off with u equal to 6.
If you bump u up to 6.001,
then v, which was previously 11, goes up to 11.001,
and so J goes from 33 to 33.003.

The analysis for u is very similar to the analysis we did for a. The derivative is computed as dJ/dv * dv/du. Here dJ/dv is 3, as we already found, and dv/du is 1, so the product is 3. With one more step of backpropagation, we end up computing that du is also equal to 3, where du stands for dJ/du.

Now let's step through one last example in detail: what is dJ/db? Imagine you are allowed to change the value of b, and you want to tweak b a little bit in order to minimize or maximize the value of J. What is the derivative, that is, the slope of J, when you change the value of b a little bit? It turns out that, using the chain rule from calculus, this can be written as the product of two things: dJ/du * du/db.

If you change b a little bit, say from 3 to 3.001, then before it affects J it first affects u. So how much does it affect u? u is defined as b * c, so when b goes from 3 to 3.001, u goes from 6 to 6.002, because c is equal to 2 in our example. This tells us that du/db is equal to 2: when you bump b up by 0.001, u increases by twice as much.

Now we know that u goes up twice as much as b does. What is dJ/du? We have already figured out that it is equal to 3, so multiplying the two out gives dJ/db = 6. To see why, ask how J changes when u goes up by 0.002. Since dJ/du is equal to 3, J goes up 3 times as much as u, so J should go up by 0.006.

And if you check the math in detail, you will find that if b becomes 3.001, then u becomes 6.002 and v becomes 11.002, since v is a + u, that is 5 + u. Then J, which is equal to 3 * v, becomes 33.006.
That's how you get that dJ/db is equal to 6.
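
The same check can be run through the whole graph in code. This is a minimal sketch, assuming the graph u = b * c, v = a + u, J = 3 * v with a = 5, b = 3, c = 2; the function name forward is made up for illustration.

```python
# Forward pass through the assumed graph: u = b * c, v = a + u, J = 3 * v.
def forward(a, b, c):
    u = b * c
    v = a + u
    J = 3 * v
    return u, v, J

a, b, c = 5.0, 3.0, 2.0
eps = 0.001

u0, v0, J0 = forward(a, b, c)
u1, v1, J1 = forward(a, b + eps, c)  # nudge only b

du_db = (u1 - u0) / eps  # how much u moves per unit change in b
dJ_db = (J1 - J0) / eps  # how much J moves per unit change in b
print(round(du_db, 6), round(dJ_db, 6))
```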

Going backwards in the graph, this gives db = 6, where db is the Python variable name for dJ/db.

You can also compute dJ/dc. This turns out to be dJ/du * du/dc, which is 3 * 3 = 9, since du/dc is equal to b, which is 3.

When computing all of these derivatives, the most efficient way is a right-to-left computation, following the direction of the red arrows.
 

We first compute the derivative with respect to v,
which then becomes useful for computing the derivative with respect to a and the derivative with respect to u.
The derivative with respect to u, in turn, becomes useful for computing the derivative with respect to b and the derivative with respect to c.

So that was a computation graph: there is a forward, left-to-right calculation to compute the cost function, such as J, that you might want to optimize,
and a backward, right-to-left calculation to compute derivatives.
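
Putting both directions together, the forward and backward passes for this small graph might look like the sketch below. It assumes the graph u = b * c, v = a + u, J = 3 * v with a = 5, b = 3, c = 2, and uses the dv, da, du, db, dc naming convention from the text, where each name stands for the derivative of J with respect to that variable.

```python
a, b, c = 5.0, 3.0, 2.0

# Forward pass (left to right): compute J.
u = b * c      # 6
v = a + u      # 11
J = 3 * v      # 33

# Backward pass (right to left): each step reuses the derivative
# computed just before it.
dv = 3.0       # dJ/dv, since J = 3 * v
da = dv * 1.0  # dJ/da = dJ/dv * dv/da, and dv/da = 1
du = dv * 1.0  # dJ/du = dJ/dv * dv/du, and dv/du = 1
db = du * c    # dJ/db = dJ/du * du/db, and du/db = c
dc = du * b    # dJ/dc = dJ/du * du/dc, and du/dc = b

print(J, dv, da, du, db, dc)  # 33.0 3.0 3.0 3.0 6.0 9.0
```

Note that dv is computed once and reused for both da and du, and du is reused for both db and dc, which is why the right-to-left order is the efficient one.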

