
[Machine Learning] Week 4 Study - Application & Tips

mopipi 2022. 10. 31. 10:40

Lec 7. Application & Tips

๐Ÿ“– Learning rate

  • ๋ฐ์ดํ„ฐ๋ฅผ ํ†ตํ•ด model ์ƒ์„ฑ ์‹œ ํ•ต์‹ฌ ์š”์†Œ
    • Learning rate์™€ Gradient๋ฅผ ์ด์šฉํ•ด ์ตœ์ ์˜ ํ•™์Šต ๋ชจ๋ธ ๊ฐ’ ์ฐพ์Œ (cost๊ฐ€ ์ตœ์†Œ๊ฐ€ ๋˜๋Š” ์ง€์ )
    • Hyper parameter (์‚ฌ์šฉ์ž๊ฐ€ ์ง์ ‘ ์„ธํŒ…ํ•ด ์ฃผ๋Š” ๊ฐ’) ์ค‘ ํ•˜๋‚˜
  • ์–ด๋–ค optimizer๋ฅผ ํ†ตํ•ด ์ ์šฉํ• ์ง€ ์„ ์–ธํ•  ๋•Œ learning_rate๋„ ํ•จ๊ป˜ ์ง€์ •ํ•จ [GradientDescentOptimizer(learning_rate= x.xx)]

  • ์ ์ ˆํ•˜์ง€ ๋ชปํ•œ ํฌ๊ธฐ์˜ learning rate (Stepํฌ๊ธฐ)๋Š” ์ตœ์ € ๊ฐ’์„ ์–ป๊ธฐ ์–ด๋ ค์›€
    → ๋„ˆ๋ฌด ํด ๊ฒฝ์šฐ: overshooting (High) ๋ฐœ์ƒ ๊ฐ€๋Šฅ
    → ๋„ˆ๋ฌด ์ž‘์€ ๊ฒฝ์šฐ: ์‹œ๊ฐ„ ์†Œ์š”, local minimum์—์„œ ๋ฉˆ์ถค
    ∴ ์ ์ ˆํ•œ ํฌ๊ธฐ์˜ learning rate ํ•„์š” (ํ‰๊ท ์ ์œผ๋กœ learning rate = 0.01 ๋กœ ๋งž์ถ˜ ํ›„ ์กฐ์ •ํ•ด๋‚˜๊ฐ)

๐Ÿ“– Learning rate Decay ๊ธฐ๋ฒ•

  • Cost๊ฐ€ ์ผ์ • ๊ฐ’์—์„œ ๋” ์ด์ƒ ๋ณ€๋™ ์—†์„ ๋•Œ๋งˆ๋‹ค, learning rate ๊ฐ’์„ ์กฐ์ ˆํ•ด์คŒ์œผ๋กœ์จ cost๊ฐ€ ์ตœ์†Œํ™” ๋˜๊ฒŒ๋” ํ•จ
  • ์ฃผ๋กœ ์ฒ˜์Œ ์‹œ์ž‘ํ•  ๋•Œ learning rate๊ฐ’์„ ํฌ๊ฒŒ ์ค€ ํ›„, ์ผ์ • epoch ์ฃผ๊ธฐ๋กœ ๊ฐ’์„ ๊ฐ์†Œ์‹œํ‚ด
    → ์ตœ์ ์˜ ํ•™์Šต์— ๋„๋‹ฌํ•˜๊ธฐ๊นŒ์ง€์˜ ์‹œ๊ฐ„ ๋‹จ์ถ• ๊ฐ€๋Šฅ๊ตฌํ˜„ ๋ฐฉ์‹

  1. Step decay
    : Adjust the learning rate every N epochs,
    i.e., at specific steps (epoch intervals)
  2. Exponential decay
    : Decrease the learning rate exponentially,
    e.g., scale the learning rate by a factor of 0.96 every 1000 training steps
  3. 1/t decay
    : Scale the learning rate by 1/epoch

๐Ÿ“– Data preprocessing

- Feature scaling techniques

๋ฐ€์ง‘ ๋ถ„ํฌ ์ง€์—ญ์„ ์ œ์™ธํ•œ ๊ณณ์— ์กด์žฌํ•˜๋Š” outlier data๋“ค์„ ์ œ๊ฑฐํ•ด์คŒ → ์ฃผ์š” ๋ฐ์ดํ„ฐ์— ์ง‘์ค‘ํ•ด์„œ ํ•™์Šต ์„ฑ๋Šฅ ํ–ฅ์ƒ

(1) Standardization

: Measures how far each value lies from the mean, in units of the standard deviation: x' = (x - μ) / σ

(2) Normalization

: Rescales the data distribution into the 0 ~ 1 range: x' = (x - min) / (max - min)

 

- Noisy Data

  • ์“ธ๋ชจ ์—†๋Š” ๋ฐ์ดํ„ฐ(Noisy Data)๋“ค์„ ์ œ๊ฑฐํ•ด, ํ•™์Šต์— ์œ ์šฉํ•œ data๋งŒ ๋‚จ๊ธฐ๋Š” ๊ณผ์ •์ด preprocessing
  • Numeric, NLP(์ž์—ฐ์–ด ์ฒ˜๋ฆฌ), Face Image ๋“ฑ์—์„œ ์ •ํ™•ํ•œ ๋ชจ๋ธ์„ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ํ•„์š”

๐Ÿ“– Overfitting

์ผ์ข…์˜ ๊ณผ์ ํ•ฉ ์ƒํƒœ. ์ฃผ์–ด์ง„ ํ•™์Šต data์— ๋Œ€ํ•ด Validation, accuracy๊ฐ’์ด ๋†’๊ฒŒ ๋‚˜์˜ด์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ , ์‹ค์ œ data์— ์ ์šฉํ•  ๋•Œ ์™ธ๋ ค ์ •ํ™•๋„ ๊ฐ์†Œ

  • High bias (underfit) : the model is undertrained
  • High variance (overfit) : the model fits the given training data so tightly that it is hard to use on other data in general (high variability)

  ํ•ด๊ฒฐ ๋ฐฉ๋ฒ•

  1. ํ•™์Šต ์‹œ ๋” ๋งŽ์€ training data ์ œ๊ณต
  2. Set a features
    b. feature ์ˆ˜ ์ค„์ด๊ธฐ : ์ผ์ข…์˜ ์ฐจ์› ์ถ•์†Œ(PCA)
    c. feature ์ˆ˜ ์ฆ๊ฐ€ : fitting์ด ๋œ ๋œ ๋ชจ๋ธ์„ ๊ตฌ์ฒดํ™” ํ•˜๊ธฐ์œ„ํ•ด ์‚ฌ์šฉ

 

→ ์ ์ ˆํ•œ ์ •๋„์˜ feature๋กœ ์กฐ์ • ํ•„์š”

  3. Regularization
    Add a regularization-related term after the cost function

  • Regularization strength (λ)
    : A constant between 0 and 1. The closer it is to 0, the less the regularization matters

์‹ค์Šต ์ฝ”๋“œ

# ์‚ฌ์šฉํ•  ํ•™์Šต data ์„ ์–ธ : 800 ๋Œ€ data + ์ผ๋ถ€ outlier๋กœ ๊ตฌ์„ฑ
xy = np.array([[828.659973, 833.450012, 908100, 828.349976, 831.659973],
               [823.02002, 828.070007, 1828100, 821.655029, 828.070007],
               [819.929993, 824.400024, 1438100, 818.97998, 824.159973],
               [816, 820.958984, 1008100, 815.48999, 819.23999],
               [819.359985, 823, 1188100, 818.469971, 818.97998],
               [819, 823, 1198100, 816, 820.450012],
               [811.700012, 815.25, 1098100, 809.780029, 813.669983],
               [809.51001, 816.659973, 1398100, 804.539978, 809.559998]])
x_train = xy[:, 0:-1]
y_train = xy[:, [-1]]

#์ •๊ทœํ™” ํ•จ์ˆ˜ - 0 ~ 1 ์‚ฌ์ด ๊ฐ’์œผ๋กœ scaling
def normalization(data):
    numerator = data - np.min(data, 0)
    denominator = np.max(data, 0) - np.min(data, 0)
    return numerator / denominator

#ํ•™์Šต data preprocessing (์ •๊ทœํ™” ํ•จ์ˆ˜ ์‚ฌ์šฉ) ํ•ด์„œ xy๋ฐ์ดํ„ฐ ๊ฐฑ์‹ 
xy = normalization(xy)

#ํ•™์Šต ๋ฐ์ดํ„ฐ ๊ฐฑ์‹ 
x_train = xy[:, 0:-1]
y_train = xy[:, [-1]]

์ •๊ทœํ™”ํ•œ Data๋กœ Linear Regression ๋ชจ๋ธ ์ƒ์„ฑํ•  ํ•จ์ˆ˜ ์ •์˜

# Declare the dataset
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(len(x_train))

# Declare the weight and bias values
W = tf.Variable(tf.random.normal((4, 1)), dtype=tf.float32)
b = tf.Variable(tf.random.normal((1,)), dtype=tf.float32)

# hypothesis ์ •์˜ (y = Wx + b)
def linearReg_fn(features):
    hypothesis = tf.matmul(features, W) + b
    return hypothesis
# L2 loss function - mitigates overfitting
def l2_loss(loss, beta = 0.01): # beta => the lambda value (regularization strength)
    W_reg = tf.nn.l2_loss(W) # output = sum(t ** 2) / 2
    loss = tf.reduce_mean(loss + W_reg * beta) # compute the regularized loss
    return loss

# hypothesis ๊ฒ€์ฆํ•  Cost ํ•จ์ˆ˜ ์ •์˜
def loss_fn(hypothesis, features, labels, flag = False):
    cost = tf.reduce_mean(tf.square(hypothesis - labels)) #(๊ฐ€์„ค-y๊ฐ’)์„ ์ตœ์†Œํ™”
    if(flag): #flag๋ฅผ ํ†ตํ•ด L2 loss ์ ์šฉ ์—ฌ๋ถ€ ํŒ๋ณ„
        cost = l2_loss(cost)
    return cost

 

ํ•™์Šต ์ง„ํ–‰์„ ์œ„ํ•ด Learning rate ์„ค์ • - learning decay

is_decay = True # whether to apply the decay technique
starter_learning_rate = 0.1

#์ตœ์ ์˜ learning rate ์ฐพ๊ธฐ
if(is_decay):    
    # Use exponential decay to adjust the learning rate every 50 steps (one step per epoch here, since each batch covers the whole dataset)
    learning_rate = tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=starter_learning_rate,
                                                                  decay_steps=50,
                                                                  decay_rate=0.96,
                                                                  staircase=True)
    optimizer = tf.keras.optimizers.SGD(learning_rate)
else:
    optimizer = tf.keras.optimizers.SGD(learning_rate=starter_learning_rate)

# loss๊ฐ’(|๊ฐ€์„ค - ์‹ค์ œ๊ฐ’|)๊ตฌํ•˜๋ฉฐ l2 loss ์ ์šฉ ์—ฌ๋ถ€ ๋”ฐ์ ธ์„œ loss๊ฐ’ ๋„์ถœ
def grad(hypothesis, features, labels, l2_flag):
    with tf.GradientTape() as tape:
        loss_value = loss_fn(linearReg_fn(features),features,labels, l2_flag)
    return tape.gradient(loss_value, [W,b]), loss_value # return the gradients and the loss
# ํ•™์Šตํ•˜๊ธฐ
EPOCHS = 101

for step in range(EPOCHS):
    for features, labels  in dataset:
        # Match dtypes
        features = tf.cast(features, tf.float32)
        labels = tf.cast(labels, tf.float32)
        # Compute the gradients for the linear regression + check the loss
        grads, loss_value = grad(linearReg_fn(features), features, labels, False)
        optimizer.apply_gradients(grads_and_vars=zip(grads,[W,b]))        
    if step % 10 == 0:
        print("Iter: {}, Loss: {:.4f}".format(step,loss_value))

ํ•™์Šต๊ณผ์ •


๐Ÿ“–Data sets

  • ํ•™์Šต์—์„œ ์‚ฌ์šฉ๋˜๋Š” data ์ข…๋ฅ˜
    • Training data (ํ•™์Šต ๋ฐ์ดํ„ฐ)
    • Validation data (ํ‰๊ฐ€ ๋ฐ์ดํ„ฐ)
    • Testing data (ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ)
  • Traing , Validation data ๊ตฌ์„ฑ์ด ์ค‘์š”ํ•จ

  • ๋ชจ๋ธ ์„ฑ๋Šฅ์„ ์˜ฌ๋ฆฌ๊ธฐ ์œ„ํ•œ ํ›„์ฒ˜๋ฆฌ ํ•„์š”
    • hyperparameter ์„ค์ •, layer ๊ตฌ์„ฑ์„ ํ†ตํ•ด ํ–ฅ์ƒ ๊ฐ€๋Šฅ
  • ๋™์ผํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ํ†ตํ•ด ๋ฐ˜๋ณต์ ์œผ๋กœ ํ…Œ์ŠคํŠธ ํ›„ ๋ชจ๋ธ ์ƒ์„ฑ ๊ณผ์ • ๋ฐ˜๋ณต ์ˆ˜ํ–‰ํ•˜๋ฉฐ ์„ฑ๋Šฅ ๋†’์ž„

Evaluating a hypothesis

: ๋ชจ๋ธ์ด ์„ ํƒ ๋œ ํ›„ ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ๋กœ ๋ชจ๋ธ ํ…Œ์ŠคํŠธ ๊ฐ€๋Šฅ

  • ํ•™์Šต์šฉ ๋ฐ์ดํ„ฐ + ํ…Œ์ŠคํŠธ์šฉ ๋ฐ์ดํ„ฐ๋กœ ๊ตฌ์„ฑ
  • x๊ฐ’ = y๊ฐ’ ๊ฐ™์€์ง€ ํ™•์ธํ•˜๋Š” ๊ณผ์ • ํ•„์š”

Anomaly detection

Healthy Data๋กœ ํ•™์Šต ํ›„ ๋ชจ๋ธ ์ƒ์„ฑ → Unseen Data๋กœ model์„ ๋Œ๋ฆฌ๋ฉฐ ์ด์ƒ ์ผ€์ด์Šค ๋ฐœ์ƒ ๊ฐ์ง€


๐Ÿ“–Learning ๋ฐฉ๋ฒ•

Online Learning

  • ๋ฐ์ดํ„ฐ๊ฐ€ ์ธํ„ฐ๋„ท์— ์—ฐ๊ฒฐ๋œ ์ƒํƒœ. ์ง€์†์ ์œผ๋กœ ๋ณ€๊ฒฝ๋˜๋Š” ์ƒํ™ฉ์—์„œ ํ•™์Šต ์ง„ํ–‰
  • ์†๋„ ์ค‘์š”

Batch Learning (Offline)

  • ๋ฐ์ดํ„ฐ๊ฐ€ ์ธํ„ฐ๋„ท์— ์—ฐ๊ฒฐ๋˜์ง€ ์•Š์€ ์ƒํƒœ(static). ๋ฐ์ดํ„ฐ๊ฐ€ ๋ณ€ํ•˜์ง€ ์•Š๋Š” ์ƒํ™ฉ์—์„œ ํ•™์Šต ์ง„ํ–‰
  • ์ •ํ™•๋„ ์ค‘์š”

Fine Tuning

  • ๊ธฐ์กด Data(A)๋ฅผ ๊ฐ€์ง€๊ณ  ํ•™์Šต ์‹œํ‚จ ๋ชจ๋ธ์— ๋Œ€ํ•ด ๊ธฐ์กด weight ๊ฐ’ ๋ฏธ์„ธํ•˜๊ฒŒ ์กฐ์ ˆ
    → ์ƒˆ๋กœ์šด ๋‹ค๋ฅธ ์ข…๋ฅ˜์˜ Data๋กœ ๋‹ค์‹œ ํ•œ๋ฒˆ ๊ฐ€์ค‘์น˜ ์„ธ๋ฐ€ํ•˜๊ฒŒ ์กฐ์ •ํ•˜๋ฉฐ ํ•™์Šต (๊ธฐ์กด ๋ฐ์ดํ„ฐ๋Š” ๊ธฐ์กด๋Œ€๋กœ ๋ถ„๋ฅ˜ ์œ ์ง€)

Feature Extraction

  • ๊ธฐ์กด Data(A)๋ฅผ ๊ฐ€์ง€๊ณ  ํ•™์Šต ์‹œํ‚จ ๋ชจ๋ธ์— ๋Œ€ํ•ด, ์ƒˆ๋กœ์šด Data (B, C)๋ฅผ ๋ณ„๋„์˜ Task ์ทจ๊ธ‰ํ•ด ์ด๊ฒƒ๋“ค์— ๋Œ€ํ•ด์„œ๋งŒ ์ƒˆ๋กญ๊ฒŒ ํ•™์Šต์‹œํ‚ด
    (→ ๊ธฐ์กด weight ์กฐ์ ˆ X, ์ƒˆ๋กœ์šด layer ์ถ”๊ฐ€ ํ›„ ์ด๊ฒƒ์„ ํ•™์Šตํ•˜๊ณ  ์ตœ์ข…๊ฒฐ๊ณผ๋ฅผ ๋‚ด๋„๋ก ํ•จ)

 

Efficient Models - creating effective models

๊ฒฐ๊ณผ์ ์œผ๋กœ inference time ์ž์ฒด๋ฅผ ์ตœ์†Œํ™” ํ•˜๋Š” ๊ฒƒ์ด ํ•„์š” (๋งŽ์€ ์–‘์˜ data ์ฒ˜๋ฆฌ๋Š” ํ•„์—ฐ์ ์ด๋ฏ€๋กœ)
→ inference์— ์˜ํ–ฅ ๋ฏธ์น˜๋Š” weight ๊ฐ’ ์ค„์—ฌ์•ผ ํ•จ


์‹ค์Šต ์ฝ”๋“œ

(1) Fashion MNIST - Image Classification

# keras ์ œ๊ณต ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ Fashion MNIST ์‚ฌ์šฉ
fashion_mnist = tf.keras.datasets.fashion_mnist
# train data, test dat ๊ฐ€์ ธ์™€ set์œผ๋กœ ์ €์žฅ
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
#class 10๊ฐ€์ง€ ์กด์žฌ (0 ~ 9์‚ฌ์ด ๊ฐ’๊ณผ ๋งค์นญ)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# 0 ~ 1์‚ฌ์ด๋กœ ์ •๊ทœํ™”
train_images = train_images / 255.0
test_images = test_images / 255.0

# keras์˜ ๋ชจ๋ธ ์ •์˜
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)), #์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํฌ๊ธฐ ๋งž์ถฐ์คŒ
    keras.layers.Dense(128, activation=tf.nn.relu), #128๊ฐœ์˜ layer๋กœ ์„ ์–ธ
    keras.layers.Dense(10, activation=tf.nn.softmax) #10๊ฐœ์˜ class๋กœ ๊ตฌ๋ถ„
])

#์ •์˜ํ•œ ๋ชจ๋ธ ์ปดํŒŒ์ผ - optimizer์„ ์–ธ, loss๊ฐ’ ์ •์˜, ์ •ํ™•๋„ ์ธก์ • ๋ฐฉ์‹ ๊ฒฐ์ •
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
#๋ชจ๋ธ ํ›ˆ๋ จ (5๋ฒˆ)
model.fit(train_images, train_labels, epochs=5)
# Get the loss and accuracy of the model on the test data
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

ํ•™์Šต ๊ฒฐ๊ณผ : ์ •ํ™•๋„ = 0.87

(2) IMDB - Text Classification

import tensorflow as tf
from tensorflow import keras

imdb = keras.datasets.imdb
# Compose the data for training (train, test)
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

word_index = imdb.get_word_index()

# data ์ค‘ ์ž์—ฐ์—ฌ ์ „์ฒ˜๋ฆฌ ๊ณผ์ • (๊ธฐ์กด ์ด๋ฏธ์ง€๋Š” 255(ํฌ๊ธฐ)๋กœ ๋‚˜๋ˆ  ์ •๊ทœํ™”)
word_index = {k:(v+3) for k,v in word_index.items()}
word_index["<PAD>"] = 0 #๊ณต๋ฐฑ์— ๋Œ€ํ•œ ๊ฐ’
word_index["<START>"] = 1 #์‹œ์ž‘ ๊ฐ’
word_index["<UNK>"] = 2  # unknown - ๋ชจ๋ฅด๋Š” ๋‹จ์–ด
word_index["<UNUSED>"] = 3 #์‚ฌ์šฉ๋˜์ง€ ์•Š์€ ๊ฐ’ ์ •์˜

reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])
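
# Usage sketch (an addition, not from the original notes): decode the
# first review back into words to sanity-check the index mapping
print(decode_review(train_data[0]))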
    
# Pad the train and test data to length 256 (filled with 0 at the end)
train_data = keras.preprocessing.sequence.pad_sequences(train_data,
                                                        value=word_index["<PAD>"],
                                                        padding='post',
                                                        maxlen=256)

test_data = keras.preprocessing.sequence.pad_sequences(test_data,
                                                       value=word_index["<PAD>"],
                                                       padding='post',
                                                       maxlen=256)
# input shape is the vocabulary count used for the movie reviews (10,000 words)
vocab_size = 10000

#๋ชจ๋ธ ์„ ์–ธ
model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation=tf.nn.relu))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))

#์ •์˜ํ•œ ๋ชจ๋ธ ์ปดํŒŒ์ผ - optimizer์„ ์–ธ, loss๊ฐ’ ์ •์˜, ์ •ํ™•๋„ ์ธก์ • ๋ฐฉ์‹ ๊ฒฐ์ •
model.compile(optimizer='adam',loss='binary_crossentropy', metrics=['accuracy'])

#๋ชจ๋ธ ํ‰๊ฐ€ํ•  test ๋ฐ์ดํ„ฐ ์ •์˜ ๋ฐ ๋ชจ๋ธ ํ›ˆ๋ จ
x_val = train_data[:10000]
partial_x_train = train_data[10000:]

y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]

history = model.fit(partial_x_train, partial_y_train,
                    epochs=40, batch_size=512,
                    validation_data=(x_val, y_val), verbose=1)
                    
results = model.evaluate(test_data, test_labels)
print(results)

Result after 40 epochs of training
Accuracy on the test data : 0.86