Gradient descent algorithm
- Minimizes a cost function
- Gradient descent is used in many minimization problems
- For a given cost function cost(W, b), it finds the W and b that minimize the cost
- It can also be applied to more general cost functions: cost(w1, w2, ...) (see the short sketch below)
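As a concrete example, the cost minimized in the TensorFlow code later in this post is the mean squared error of a linear hypothesis W * x (a minimal sketch; the bias b is omitted there, as in the code below):

```python
import tensorflow as tf

# cost(W) = mean((W * x - y)^2), the mean squared error of the hypothesis W * x
def cost_fn(W, x, y):
    return tf.reduce_mean(tf.square(W * x - y))
```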
How does it work?
- Start with initial guesses
- Start at (0, 0) (or any other value)
- Keep changing W and b a little bit to try to reduce cost(W, b)
- Each time you change the parameters, pick the gradient that reduces cost(W, b) the most
- Repeat
- Do so until you converge to a local minimum
- It has an interesting property
- Where you start can determine which minimum you end up in (illustrated in the sketch below)
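A minimal sketch of this loop in plain Python (the cost f(w) = w^4 - 2w^2 + 0.3w and its gradient are hypothetical, chosen only because f has two local minima):

```python
def gradient_descent(grad_fn, w_init, alpha=0.01, steps=1000, tol=1e-8):
    """Minimize a 1-D function given its gradient grad_fn (illustrative sketch)."""
    w = w_init
    for _ in range(steps):
        step = alpha * grad_fn(w)   # change w a little bit in the downhill direction
        w -= step
        if abs(step) < tol:         # converged: the updates have become negligible
            break
    return w

# Hypothetical non-convex cost f(w) = w**4 - 2*w**2 + 0.3*w with two local minima,
# so the starting point determines which minimum we end up in.
grad = lambda w: 4 * w**3 - 4 * w + 0.3
print(gradient_descent(grad, w_init=-2.0))  # converges near the left minimum  (~ -1.04)
print(gradient_descent(grad, w_init=+2.0))  # converges near the right minimum (~ +0.96)
```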
```python
import tensorflow as tf

x_data = [1., 2., 3., 4.]
y_data = [1., 3., 5., 7.]
X = tf.constant(x_data)
Y = tf.constant(y_data)

# tf.random.normal: draw one random number from a normal distribution
# (mean -100, standard deviation 100) as the initial weight
W = tf.Variable(tf.random.normal([1], -100., 100.))

for step in range(300):
    hypothesis = W * X
    cost = tf.reduce_mean(tf.square(hypothesis - Y))

    alpha = 0.01
    # gradient of the cost with respect to W (constant factor of 2 omitted)
    gradient = tf.reduce_mean(tf.multiply(tf.multiply(W, X) - Y, X))
    descent = W - tf.multiply(alpha, gradient)
    W.assign(descent)

    if step % 10 == 0:
        print('{:5} | {:10.4f} | {:10.6f}'.format(
            step, cost.numpy(), W.numpy()[0]))
```
Output (step | cost | W):

```
    0 | 78516.1484 | -122.657616
   10 | 30189.4043 | -75.677628
   20 | 11607.8047 | -46.546268
   30 |  4463.1934 | -28.482494
   40 |  1716.0947 | -17.281507
   50 |   659.8373 | -10.335999
   60 |   253.7070 |  -6.029227
   70 |    97.5501 |  -3.358683
   80 |    37.5080 |  -1.702733
   90 |    14.4218 |  -0.675911
  100 |     5.5452 |  -0.039199
  110 |     2.1321 |   0.355613
  120 |     0.8198 |   0.600429
  130 |     0.3152 |   0.752234
  140 |     0.1212 |   0.846365
  150 |     0.0466 |   0.904734
  160 |     0.0179 |   0.940928
  170 |     0.0069 |   0.963370
  180 |     0.0026 |   0.977287
  190 |     0.0010 |   0.985916
  200 |     0.0004 |   0.991267
  210 |     0.0002 |   0.994585
  220 |     0.0001 |   0.996642
  230 |     0.0000 |   0.997918
  240 |     0.0000 |   0.998709
  250 |     0.0000 |   0.999199
  260 |     0.0000 |   0.999504
  270 |     0.0000 |   0.999692
  280 |     0.0000 |   0.999809
  290 |     0.0000 |   0.999882
```
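Note that the gradient in the loop above is written out by hand from cost = mean((W * X - Y)^2), with the constant factor of 2 dropped, which only rescales the learning rate. A rough equivalent that lets TensorFlow compute the gradient automatically (a sketch assuming TF 2.x eager mode and reusing X, Y, W from the code above; not part of the original run) would be:

```python
alpha = 0.01
for step in range(300):
    with tf.GradientTape() as tape:
        cost = tf.reduce_mean(tf.square(W * X - Y))
    grad = tape.gradient(cost, W)  # d(cost)/dW via automatic differentiation
    W.assign_sub(alpha * grad)     # W <- W - alpha * grad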