{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Lecture4.ipynb",
      "provenance": [],
      "private_outputs": true,
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Bq_pFqPuN-ty"
      },
      "source": [
        "# Lecture 4: Intro to Neural Networks\n",
        "In this lecture, we will begin examining how a neural network operates and how it is trained. To do this, we will build a perceptron that classifies the Iris dataset."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tL6UPHfQN-tz"
      },
      "source": [
        "![neuron](https://appliedgo.net/media/perceptron/neuron.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lsgN5nrbN-t0"
      },
      "source": [
        "The simplest form of a neural network is called the __Perceptron__. A perceptron is essentially a single neuron that performs __Binary Classification__, that is, classification with only two possible outputs."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7D4gg2wjN-t1"
      },
      "source": [
        "![perceptron](https://cdn-images-1.medium.com/max/1600/1*n6sJ4yZQzwKL9wnF5wnVNg.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sO5dKXARN-t2"
      },
      "source": [
        "Our goal is to learn the weights in the green boxes above, the hypothesis being that there is some linear relationship between the inputs and the desired output. Let's express the image above as an equation.\n",
        "\n",
        "$f(x) = \\text{sign}(\\boldsymbol{w}\\boldsymbol{x} + b)$\n",
        "\n",
        "Remembering that\n",
        "$\\boldsymbol{w}\\boldsymbol{x} = \\sum\\limits_{j=1}^d w[j]\\cdot x[j]$\n",
        "\n",
        "In the case of the image above, $b$ is equivalent to $w_0$."
      ]
    },
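    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the equation above concrete, here is a small worked example. The numbers are made up purely for illustration (they are not learned from any data): take $d = 2$ with $\\boldsymbol{w} = (0.5, -1)$, $\\boldsymbol{x} = (2, 3)$ and $b = 0.5$.\n",
        "\n",
        "$\\boldsymbol{w}\\boldsymbol{x} + b = 0.5\\cdot 2 + (-1)\\cdot 3 + 0.5 = -1.5$\n",
        "\n",
        "$f(x) = \\text{sign}(-1.5) = -1$\n",
        "\n",
        "So with these values the perceptron would output $-1$ for this input."
      ]
    },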
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "i9Jx_VwHN-t2"
      },
      "source": [
        "This is all well and good, but how are we going to figure out the values of $\\boldsymbol{w}$? Because perceptrons perform binary classification, it is very obvious when they make a mistake. Like a person does, let's learn from those mistakes!\n",
        "\n",
        "Let's consider the simplest case of a perceptron: a single input value $x$ and a corresponding single weight $w$.\n",
        "\n",
        "$y = \\text{sign}(w \\cdot x + b)$\n",
        "\n",
        "If we were to make a classification mistake, we'd want to change $w$ and $b$ so that next time, we wouldn't make that mistake. Given the simplicity of the equation representing our perceptron, we can use the derivative of the output with respect to $w$ and $b$ to figure out how to change them. The sign function itself is flat almost everywhere, so we take the derivative of the linear part inside it, $w \\cdot x + b$, and keep calling it $y$ for simplicity.\n",
        "\n",
        "$\\frac{dy}{dw} = x$\n",
        "\n",
        "$\\frac{dy}{db} = 1$\n",
        "\n",
        "If we should have guessed 1, but guessed -1 instead, then we want to change $w$ and $b$ to make $y$ more positive. The derivatives tell us how to do that: increasing $w$ by 1 changes the value of $y$ by $x$, and increasing $b$ by 1 increases $y$ by 1. Stepping in the direction of the derivative always helps, because adding $x$ to $w$ changes $y$ by $x^2$, which is never negative. So we should update $w$ and $b$ by\n",
        "\n",
        "$w \\leftarrow w + x$\n",
        "\n",
        "$b \\leftarrow b + 1$\n",
        "\n",
        "If we instead guessed 1 when we should have guessed -1, we want to make $y$ more negative, so we step in the opposite direction\n",
        "\n",
        "$w \\leftarrow w - x$\n",
        "\n",
        "$b \\leftarrow b - 1$\n",
        "\n",
        "We can consolidate these two cases by calling the correct output $y_n$ (either 1 or -1). Then we can simply write\n",
        "\n",
        "$w \\leftarrow w + y_n\\cdot x$\n",
        "\n",
        "$b \\leftarrow b + y_n$"
      ]
    },
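    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a quick sanity check of the update rule, here is a single update worked by hand. The numbers are made up for illustration and are not taken from any dataset: suppose $w = 0.5$, $b = 0$, the input is $x = 2$ and the correct output is $y_n = -1$.\n",
        "\n",
        "The current guess is $\\text{sign}(0.5\\cdot 2 + 0) = \\text{sign}(1) = 1$, which is wrong, so we update:\n",
        "\n",
        "$w \\leftarrow 0.5 + (-1)\\cdot 2 = -1.5$\n",
        "\n",
        "$b \\leftarrow 0 + (-1) = -1$\n",
        "\n",
        "With the new values the guess becomes $\\text{sign}(-1.5\\cdot 2 - 1) = \\text{sign}(-4) = -1$, which matches $y_n$. In general a single update is not guaranteed to fix the mistake right away, but it does move the value inside the sign toward the correct side."
      ]
    },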
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2e1mEVtDN-t3"
      },
      "source": [
        "And that's it! This is a basic form of __Gradient Descent__, a fundamental ML tool. If we iterate through the training data and update the weight and bias after every mistake, we will eventually have a trained perceptron, as long as the data is linearly separable. Fortunately, this same principle applies to perceptrons with multiple inputs as well! We simply use vector operations for $\\boldsymbol{w}$ and $\\boldsymbol{x}$.\n",
        "\n",
        "![grad_descent](https://cdn-images-1.medium.com/max/500/1*9sd4Ve9DH-k4EcNba5fGTA.jpeg)"
      ]
    },
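    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Written out for $d$ inputs, the update might look like the following. This is a sketch in the notation used above, where $y_n$ is the correct output (either 1 or -1). The symbol $z$ for the value inside the sign is introduced here only for convenience.\n",
        "\n",
        "$z = \\boldsymbol{w}\\boldsymbol{x} + b$\n",
        "\n",
        "On a mistake, that is when $\\text{sign}(z) \\neq y_n$:\n",
        "\n",
        "$\\boldsymbol{w} \\leftarrow \\boldsymbol{w} + y_n\\boldsymbol{x}$\n",
        "\n",
        "$b \\leftarrow b + y_n$\n",
        "\n",
        "To see why this helps, compare $y_n z$ before and after the update, using $y_n^2 = 1$:\n",
        "\n",
        "$y_n\\big((\\boldsymbol{w} + y_n\\boldsymbol{x})\\boldsymbol{x} + (b + y_n)\\big) = y_n z + \\boldsymbol{x}\\boldsymbol{x} + 1$\n",
        "\n",
        "Since $\\boldsymbol{x}\\boldsymbol{x} = \\sum\\limits_{j=1}^d x[j]^2 \\geq 0$, the quantity $y_n z$ grows by at least 1 with every update on that example, and the guess is correct exactly when $y_n z > 0$."
      ]
    },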
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "caBSD0BjN-t4"
      },
      "source": [
        "Let's see if we can apply the perceptron algorithm to the Iris dataset introduced in the previous lecture."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "e6YxoYvlN-t5"
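      },
      "source": [
        "# start by importing some of the modules we're going to use for this lecture\n",
        "import tensorflow as tf\n",
        "import matplotlib.pyplot as plt\n",
        "import matplotlib.image as mpimg\n",
        "from matplotlib.pyplot import imshow\n",
        "import numpy as np\n",
        "%matplotlib inline\n",
        "# eager execution is already the default in TensorFlow 2.x,\n",
        "# so only enable it where the function exists (TensorFlow 1.x)\n",
        "if hasattr(tf, 'enable_eager_execution'):\n",
        "    tf.enable_eager_execution()"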
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "9GWBYN6jN-t7"
      },
      "source": [
        "import matplotlib.pyplot as plt\n",
        "from mpl_toolkits.mplot3d import Axes3D\n",
        "from sklearn import datasets\n",
        "from sklearn.decomposition import PCA\n",
        "from sklearn.utils import shuffle\n",
        "\n",
        "iris = datasets.load_iris()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "nujz3f_FN-t9"
      },
      "source": [
        "print(iris.DESCR)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-wRgZWtFN-uB"
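      },
      "source": [
        "We want to perform binary classification, but Iris normally has 3 possible output classes. Let's drop the third. The samples are stored in class order with 50 per class, so keeping the first 100 rows leaves only classes 0 and 1."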
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "NqMuy9iDN-uB"
      },
      "source": [
        "iris_data = iris.data[:100]\n",
        "iris_labels = iris.target[:100]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "CkiJK0u2N-uD"
      },
      "source": [
        "iris_labels"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WKNlIFR8N-uG"
      },
      "source": [
        "Great! Just like before, let's shuffle this and split it into training and validation sets."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jSFm6WyON-uG"
      },
      "source": [
        "iris_data, iris_labels = shuffle(iris_data, iris_labels)\n",
        "train_data = iris_data[:80]\n",
        "val_data = iris_data[80:]\n",
        "train_labels = iris_labels[:80]\n",
        "val_labels = iris_labels[80:]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EIrnK_t9N-uI"
      },
      "source": [
        "Finally, let's remove all but one feature. The `redux_feature` variable below picks which column to keep (index 1 is sepal width)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "lBa5vYi8N-uI"
      },
      "source": [
        "redux_feature = 1\n",
        "\n",
        "train_data_redux = train_data[:,redux_feature]\n",
        "val_data_redux = val_data[:,redux_feature]\n",
        "\n",
        "# now lets plot our input feature versus the output for the training set\n",
        "plt.scatter(train_data_redux, train_labels, c=train_labels)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "x-wUSo9eN-uL"
      },
      "source": [
        "Great! This looks like it should be linearly separable (which is what the perceptron specializes in). Let's go ahead and implement it."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "inM6ecAUN-uL"
      },
      "source": [
        "# initialize learnable parameters\n",
        "weight = 0\n",
        "bias = 0\n",
        "\n",
        "# make 5 passes (epochs) through all the training data\n",
        "for epoch in range(5):\n",
        "    for i, data in enumerate(train_data_redux):\n",
        "        # make our prediction\n",
        "        guess = np.sign(weight*data + bias)\n",
        "        correct = train_labels[i]\n",
        "        # perceptron wants labels to be -1 or 1, if correct is 0, switch it to -1\n",
        "        if correct == 0:\n",
        "            correct = -1\n",
        "        # check if our guess is correct, update our parameters if we were wrong\n",
        "        # the factor 0.1 is a step size that scales each update\n",
        "        if guess != correct:\n",
        "            weight = weight + 0.1*correct*data\n",
        "            bias = bias + 0.1*correct\n",
        "\n",
        "print(weight)\n",
        "print(bias)"
      ],
      "execution_count": null,
      "outputs": []
    },
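    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The factor of 0.1 in the update above acts as a step size, usually called the learning rate. Calling it $\\eta$ (a symbol introduced here for illustration, not used elsewhere in this notebook), the update reads:\n",
        "\n",
        "$w \\leftarrow w + \\eta\\cdot y_n\\cdot x$\n",
        "\n",
        "$b \\leftarrow b + \\eta\\cdot y_n$\n",
        "\n",
        "One thing worth noticing: because `weight` and `bias` both start at 0, every update is a multiple of $\\eta$, so the learned values are just $\\eta$ times what they would have been with $\\eta = 1$. Scaling both by a positive number does not change the sign:\n",
        "\n",
        "$\\text{sign}(\\eta\\cdot w\\cdot x + \\eta\\cdot b) = \\text{sign}(w\\cdot x + b)$ for any $\\eta > 0$\n",
        "\n",
        "So in this particular setup, the choice of 0.1 scales the printed values but should not change which points are classified correctly."
      ]
    },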
    {
      "cell_type": "code",
      "metadata": {
        "id": "Ah76UVZVN-uO"
      },
      "source": [
        "# define a function that uses our learned values to make a new prediction\n",
        "def predict(input, weight, bias):\n",
        "    guess = np.sign(weight*input + bias)\n",
        "    # for comparison to the dataset, we want labels to be 0 or 1\n",
        "    if guess == -1:\n",
        "        guess = 0\n",
        "    return guess"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "scrolled": true,
        "id": "3NSOjM5jN-uP"
      },
      "source": [
        "# lets check how our model learned!\n",
        "total_correct = 0\n",
        "for i, data in enumerate(val_data_redux):\n",
        "    guess = predict(data, weight, bias)\n",
        "    if guess != val_labels[i]:\n",
        "        print(\"Incorrect\")\n",
        "    else:\n",
        "        print(\"Correct\")\n",
        "        total_correct += 1\n",
        "print(\"Accuracy: %f%%\" % (100*total_correct/len(val_labels)))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5d786NAHN-uS"
      },
      "source": [
        "Great! If you were using feature 3 (petal width, `redux_feature = 3`), chances are your network did very well."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NBHc_HltN-uT"
      },
      "source": [
        "## Activity: Perceptron Sandbox\n",
        "What happens if we try to learn a perceptron model using other features? Go back to the `redux_feature` variable and change it. What is the result? Why do you think this happens? Do you have any ideas for improving the accuracy? If you look at the results of other tables, you might notice they have different accuracy despite everything else seeming similar. Can you explain why that's happening?"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "otPSbi1uN-uT"
      },
      "source": [
        "## Perceptron With Multiple Features\n",
        "Now that we've looked at the simple case, let's try training a perceptron that uses all 4 of Iris' input features."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "3wdjkxTFN-uT"
      },
      "source": [
        "# initialize a weights vector and a bias for our new 4 features perceptron\n",
        "weights = np.zeros(train_data[0].shape)\n",
        "bias = 0"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "EVbWy5i3N-uX"
      },
      "source": [
        "# iterate through all the training data\n",
        "for i, data in enumerate(train_data):\n",
        "    output = 0\n",
        "    # note that now data is a vector of features\n",
        "    # this means we'll have to build up the output from each feature\n",
        "    for j, feature in enumerate(data):\n",
        "        output += feature * weights[j]\n",
        "    # add the bias after all features are accumulated\n",
        "    output += bias\n",
        "    # finally, apply the sign to get our guess\n",
        "    guess = np.sign(output)\n",
        "    \n",
        "    # like before, we want to compare to the correct label\n",
        "    correct = train_labels[i]\n",
        "    # if correct is 0, we want to make it -1 for proper training\n",
        "    if correct == 0:\n",
        "        correct = -1\n",
        "    \n",
        "    # check if we got the correct answer, update our parameters if we did not\n",
        "    if guess != correct:\n",
        "        # now apply the updates\n",
        "        for j in range(len(data)):\n",
        "            weights[j] = weights[j] + correct*data[j]\n",
        "        bias = bias + correct\n",
        "\n",
        "print(weights)\n",
        "print(bias)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aJTuHqT6N-ub"
      },
      "source": [
        "Cool! We learned a full Iris model! Notice that this barely looks different from the case where we learned from one feature; there are just a couple of added for loops. Let's see how accurate it is."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "4KDRUBm5N-uc"
      },
      "source": [
        "def full_predict(input, weights, bias):\n",
        "    # again, iterate through features and accumulate\n",
        "    output = 0\n",
        "    for i, feature in enumerate(input):\n",
        "        output += feature*weights[i]\n",
        "    # add the bias\n",
        "    output += bias\n",
        "    # take the sign to find the guess\n",
        "    guess = np.sign(output)\n",
        "    # map guesses of -1 to 0 for compatibility with iris\n",
        "    if guess == -1:\n",
        "        guess = 0\n",
        "    return guess"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "LGe8okMlN-uf"
      },
      "source": [
        "# lets check how our model learned!\n",
        "total_correct = 0\n",
        "for i, data in enumerate(val_data):\n",
        "    guess = full_predict(data, weights, bias)\n",
        "    if guess != val_labels[i]:\n",
        "        print(\"Incorrect\")\n",
        "    else:\n",
        "        print(\"Correct\")\n",
        "        total_correct += 1\n",
        "print(\"Accuracy: %f%%\" % (100*total_correct/len(val_labels)))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kBNEM7o0N-uh"
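      },
      "source": [
        "Woohoo! If things went well, this model got 100% accuracy on our two-class version of Iris. That's much better than the KNN algorithm we looked at, although keep in mind that dropping the third class made the problem easier."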
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "W8-FOpwON-uh"
      },
      "source": [
        "## Activity: Stump the Perceptron\n",
        "Do you think Perceptron is always better than KNN? Are there any cases where the Perceptron doesn't work at all? We're going to make our own toy datasets and see how learnable they are. Before you start, let's look at an example and some utility functions."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "mAWzBlceN-uh"
      },
      "source": [
        "# as an example, generate some random two dimensional data\n",
        "\n",
        "def my_data_generator(size):\n",
        "    datalist = []\n",
        "    labels = []\n",
        "    for i in range(size):\n",
        "        # sample a random 2D point from a standard normal distribution\n",
        "        x = np.random.randn(1)[0]\n",
        "        y = np.random.randn(1)[0]\n",
        "        # label the point -1 if it lies inside a circle around the origin, 1 otherwise\n",
        "        if x**2 + y**2 < 0.5:\n",
        "            label = -1\n",
        "        else:\n",
        "            label = 1\n",
        "        # uncomment the next two lines to add some noise and make it more interesting\n",
        "        #x += np.random.randn()\n",
        "        #y += np.random.randn()\n",
        "        datalist.append([x, y])\n",
        "        labels.append(label)\n",
        "    return np.asarray(datalist), np.asarray(labels)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "YxdPTrS4N-uj"
      },
      "source": [
        "data, target = my_data_generator(10)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "3S29eRBDN-ul"
      },
      "source": [
        "plt.scatter(data[:,0], data[:,1], c=target)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "r1s0w5fMN-un"
      },
      "source": [
        "Neat, this looks pretty separable, so we'd expect a perceptron to do well. Let's make a training and validation set too.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "scrolled": true,
        "id": "aPxNav9nN-un"
      },
      "source": [
        "train_data, train_labels = my_data_generator(100)\n",
        "val_data, val_labels = my_data_generator(200)\n",
        "plt.scatter(train_data[:,0], train_data[:,1], c=train_labels)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "CwuNKziYN-up"
      },
      "source": [
        "plt.scatter(val_data[:,0], val_data[:,1], c=val_labels)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-Jky7EmoN-ur"
      },
      "source": [
        "# define a function that trains a perceptron\n",
        "def train_perceptron(train_data, train_labels):\n",
        "    num_features = train_data.shape[1]\n",
        "    # initialize learnable parameters\n",
        "    weights = np.zeros(num_features)\n",
        "    bias = 0\n",
        "    \n",
        "    for i, data in enumerate(train_data):\n",
        "        # compute the output by summing features\n",
        "        output = 0\n",
        "        for j in range(num_features):\n",
        "            output += data[j]*weights[j]\n",
        "        output += bias\n",
        "        # apply sign to get our guess\n",
        "        guess = np.sign(output)\n",
        "        correct = train_labels[i]\n",
        "        \n",
        "        if guess != correct:\n",
        "            #update parameters if guess was wrong\n",
        "            for j in range(num_features):\n",
        "                weights[j] = weights[j] + correct*data[j]\n",
        "            bias = bias + correct\n",
        "            \n",
        "    return weights, bias           "
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "H5C64eZAN-ut"
      },
      "source": [
        "# train a new perceptron using our data\n",
        "weights, bias = train_perceptron(train_data, train_labels)\n",
        "print(weights)\n",
        "print(bias)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "O_gnxEp_N-uu"
      },
      "source": [
        "# now define a function to test the perceptron\n",
        "def test_perceptron(weights, bias, val_data, val_labels):\n",
        "    num_features = val_data.shape[1]\n",
        "    total_correct = 0\n",
        "    for i, data in enumerate(val_data):\n",
        "        output = 0\n",
        "        for j in range(num_features):\n",
        "            output += weights[j] * data[j]\n",
        "        output += bias\n",
        "        guess = np.sign(output)\n",
        "        correct = val_labels[i]\n",
        "        \n",
        "        if guess == correct:\n",
        "            total_correct += 1\n",
        "    return total_correct / val_data.shape[0]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "7d346iAUN-uv"
      },
      "source": [
        "test_perceptron(weights, bias, val_data, val_labels)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6mAk5Kh4N-uz"
      },
      "source": [
        "Not bad! Our perceptron here does an alright job, although not a perfect one. Here are a few things to try:\n",
        "* Make a dataset that the perceptron can learn with very high accuracy\n",
        "* Make a dataset that the perceptron cannot learn\n",
        "* Make a dataset that a person can learn, but the perceptron cannot\n",
        "* For the last dataset, what changes could you make to help?"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "X5zATUXHN-uz"
      },
      "source": [
        "## Comparison to KNN\n",
        "Now, let's try classifying your dataset with KNN and see how it goes."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "kVw8mmrQN-u0"
      },
      "source": [
        "# let's define some utility functions to make things cleaner\n",
        "\n",
        "def euclid_distance(a, b):\n",
        "    return np.linalg.norm(a - b)\n",
        "\n",
        "def get_neighbors(distances, train_labels, k=5):\n",
        "    # top_k finds the largest values, so negate the distances to get the k smallest\n",
        "    _, closest_points = tf.nn.top_k(-distances, k=k)\n",
        "    closest_points = closest_points.numpy()\n",
        "    return [train_labels[point] for point in closest_points]\n",
        "\n",
        "def best_guess(labels):\n",
        "    labels = np.asarray(labels)\n",
        "    # bincount needs non-negative integers, so shift labels from -1/1 to 0/2\n",
        "    return np.argmax(np.bincount(labels+1))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "PglvBeY8N-u2"
      },
      "source": [
        "# implement KNN the same way as the previous lecture\n",
        "def KNN_Accuracy(train_data, train_labels, val_data, val_labels, k=5):\n",
        "    total_correct = 0\n",
        "    # iterate through validation samples\n",
        "    for i, sample in enumerate(val_data):\n",
        "        # compute distances to all points in the training set\n",
        "        distances = np.asarray([euclid_distance(sample, neighbor) for neighbor in train_data])\n",
        "        # find closest k neighbors\n",
        "        knn = get_neighbors(distances, train_labels, k)\n",
        "        # determine which label is best\n",
        "        guess = best_guess(knn) - 1\n",
        "        # check if we got the right answer\n",
        "        if guess == val_labels[i]:\n",
        "            total_correct += 1\n",
        "    print(\"Accuracy: %f%%\" % (100*total_correct / val_data.shape[0]))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "LVtOTTbwN-u3"
      },
      "source": [
        "KNN_Accuracy(train_data, train_labels, val_data, val_labels)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JlPl4mLTN-u5"
      },
      "source": [
        "Interesting! KNN actually does better in this case. Try to figure out why. Can you consistently create a dataset where KNN outperforms the perceptron? How about vice-versa?"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MsDFcs9hN-u6"
      },
      "source": [
        ""
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
}