{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", " \n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Natural Language Processing Project\n", "\n", "Welcome to the NLP Project for this section of the course. In this NLP project you will be attempting to classify Yelp Reviews into 1 star or 5 star categories based off the text content in the reviews. This will be a simpler procedure than the lecture, since we will utilize the pipeline methods for more complex tasks.\n", "\n", "We will use the [Yelp Review Data Set from Kaggle](https://www.kaggle.com/c/yelp-recsys-2013).\n", "\n", "Each observation in this dataset is a review of a particular business by a particular user.\n", "\n", "The \"stars\" column is the number of stars (1 through 5) assigned by the reviewer to the business. (Higher stars is better.) In other words, it is the rating of the business by the person who wrote the review.\n", "\n", "The \"cool\" column is the number of \"cool\" votes this review received from other Yelp users. \n", "\n", "All reviews start with 0 \"cool\" votes, and there is no limit to how many \"cool\" votes a review can receive. In other words, it is a rating of the review itself, not a rating of the business.\n", "\n", "The \"useful\" and \"funny\" columns are similar to the \"cool\" column.\n", "\n", "Let's get started! Just follow the directions below!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports\n", " **Import the usual suspects. :) **" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Data\n", "\n", "**Read the yelp.csv file and set it as a dataframe called yelp.**" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": true }, "outputs": [], "source": [ "yelp = pd.read_csv('yelp.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Check the head, info , and describe methods on yelp.**" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
business_iddatereview_idstarstexttypeuser_idcoolusefulfunny
09yKzy9PApeiPPOUJEtnvkg2011-01-26fWKvX83p0-ka4JS3dc6E5A5My wife took me here on my birthday for breakf...reviewrLtl8ZkDX5vH5nAx9C3q5Q250
1ZRJwVLyzEJq1VAihDhYiow2011-07-27IjZ33sJrzXqU-0X6U8NwyA5I have no idea why some people give bad review...review0a2KyEL0d3Yb1V6aivbIuQ000
26oRAC4uyJCsJl1X0WZpVSA2012-06-14IESLBzqUCLdSzSqm0eCSxQ4love the gyro plate. Rice is so good and I als...review0hT2KtfLiobPvh6cDC8JQg010
3_1QQZuf4zZOyFCvXc0o6Vg2010-05-27G-WvGaISbqqaMHlNnByodA5Rosie, Dakota, and I LOVE Chaparral Dog Park!!...reviewuZetl9T0NcROGOyFfughhg120
46ozycU1RpktNG2-1BroVtw2012-01-051uJFq2r5QfJG_6ExMRCaGw5General Manager Scott Petello is a good egg!!!...reviewvYmM4KTsC8ZfQBg-j5MWkw000
\n", "
" ], "text/plain": [ " business_id date review_id stars \\\n", "0 9yKzy9PApeiPPOUJEtnvkg 2011-01-26 fWKvX83p0-ka4JS3dc6E5A 5 \n", "1 ZRJwVLyzEJq1VAihDhYiow 2011-07-27 IjZ33sJrzXqU-0X6U8NwyA 5 \n", "2 6oRAC4uyJCsJl1X0WZpVSA 2012-06-14 IESLBzqUCLdSzSqm0eCSxQ 4 \n", "3 _1QQZuf4zZOyFCvXc0o6Vg 2010-05-27 G-WvGaISbqqaMHlNnByodA 5 \n", "4 6ozycU1RpktNG2-1BroVtw 2012-01-05 1uJFq2r5QfJG_6ExMRCaGw 5 \n", "\n", " text type \\\n", "0 My wife took me here on my birthday for breakf... review \n", "1 I have no idea why some people give bad review... review \n", "2 love the gyro plate. Rice is so good and I als... review \n", "3 Rosie, Dakota, and I LOVE Chaparral Dog Park!!... review \n", "4 General Manager Scott Petello is a good egg!!!... review \n", "\n", " user_id cool useful funny \n", "0 rLtl8ZkDX5vH5nAx9C3q5Q 2 5 0 \n", "1 0a2KyEL0d3Yb1V6aivbIuQ 0 0 0 \n", "2 0hT2KtfLiobPvh6cDC8JQg 0 1 0 \n", "3 uZetl9T0NcROGOyFfughhg 1 2 0 \n", "4 vYmM4KTsC8ZfQBg-j5MWkw 0 0 0 " ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yelp.head()" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 10000 entries, 0 to 9999\n", "Data columns (total 10 columns):\n", "business_id 10000 non-null object\n", "date 10000 non-null object\n", "review_id 10000 non-null object\n", "stars 10000 non-null int64\n", "text 10000 non-null object\n", "type 10000 non-null object\n", "user_id 10000 non-null object\n", "cool 10000 non-null int64\n", "useful 10000 non-null int64\n", "funny 10000 non-null int64\n", "dtypes: int64(4), object(6)\n", "memory usage: 781.3+ KB\n" ] } ], "source": [ "yelp.info()" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
starscoolusefulfunny
count10000.00000010000.00000010000.00000010000.000000
mean3.7775000.8768001.4093000.701300
std1.2146362.0678612.3366471.907942
min1.0000000.0000000.0000000.000000
25%3.0000000.0000000.0000000.000000
50%4.0000000.0000001.0000000.000000
75%5.0000001.0000002.0000001.000000
max5.00000077.00000076.00000057.000000
\n", "
" ], "text/plain": [ " stars cool useful funny\n", "count 10000.000000 10000.000000 10000.000000 10000.000000\n", "mean 3.777500 0.876800 1.409300 0.701300\n", "std 1.214636 2.067861 2.336647 1.907942\n", "min 1.000000 0.000000 0.000000 0.000000\n", "25% 3.000000 0.000000 0.000000 0.000000\n", "50% 4.000000 0.000000 1.000000 0.000000\n", "75% 5.000000 1.000000 2.000000 1.000000\n", "max 5.000000 77.000000 76.000000 57.000000" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yelp.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Create a new column called \"text length\" which is the number of words in the text column.**" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": true }, "outputs": [], "source": [ "yelp['text length'] = yelp['text'].apply(len)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# EDA\n", "\n", "Let's explore the data\n", "\n", "## Imports\n", "\n", "**Import the data visualization libraries if you haven't done so already.**" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "sns.set_style('white')\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Use FacetGrid from the seaborn library to create a grid of 5 histograms of text length based off of the star ratings. Reference the seaborn documentation for hints on this**" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABDAAAADSCAYAAAC8VzCMAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XuYXXV97/H35AYJTAKoyTkKAo3NFy+VShEMxAAtVKC2\nlscWT6mKSoKllGpbUUTheIvhtEoFW/EIsVDw0haLtc0Bq2AlkVa5eTAVvwk0gJbKPZlAEpLJTP9Y\nK7IdJ8zO2mtmr5l5v56HZ2avvfZvfdcwn+yZ7/zWb/UMDg4iSZIkSZLUZFO6XYAkSZIkSdJIbGBI\nkiRJkqTGs4EhSZIkSZIazwaGJEmSJElqPBsYkiRJkiSp8WxgSJIkSZKkxpvW7QLUuYj4APC1zPxW\nl44/Hbge+FBm3tyNGqROdDNDEXEmcA4wANwGvD0z+8e6DqmqLufnLOAPgEFgZWa+Z6xrkDrV7Z/j\nyhrOBn4rM4/rVg1SFV1+D/oscDTwVLnpg5n5D2Ndx2TjDIyJ4RhgajcOHBELgG8AC7txfKkmXclQ\nRPw88CfAqzLz0LKGs8e6DqlD3crPQcAfAYcDvwAcHRHHj3UdUg269nMcQES8BDiPohEojTfdzM/h\nwOLMPKz8z+bFGHAGxjgSES8APgfMovhr7TuABRThuSIiTgGeC3wEmAnsC7w7M78UEX8FPAeYD7wb\nOBY4HtgBfCUzPzTkWB8BTh5Swucy8+NDtr0N+FPgnTWdpjRqGpihp4GzMnNn5/57wAvrOVupXk3L\nT2beFxEvzswdEfEcYA6wod6zlurTtAyV+80APg28Hzi9tpOVata0/ETELIqf2S6PiAOB6zLzA3We\ns4bnDIzx5QzgHzPzCOA9wNGZeTXFtPMzMvPfKf56e0ZmHg4sAS5sef2jmflSil+STsrMV1BMe3pR\n+Qb2E5n5/pZu4s7/hjYvyMzzMvMrQM8onK9Ut0ZlKDMfyMybACLieRRT4b88Gicu1aBR+Sn32xER\nS4B7gQeB79Z+1lJ9GpchYDlwBXBfvacq1a5p+ZkH3Ai8FTgSeHVEnFH7WetnOANjfPk68KWIOAxY\nCfxFy3M7GwhvAl4bEacCrwL2btnn2+XH/wQ2R8Rq4J+A92fmttYDlZ3HX2vZNMjwMzCk8aSRGSr/\nqvD/gMszc1XVk5NGWSPzk5lXlNchXwl8gOIvyVITNSpDEXEC8MLM/JOIOLbTk5NGWaPyk5nrgde3\nvOaT5fFXVD5DtcUGxjiSmbeU1ym+FngD8BbgV4fstpqiG/gv5cfPtTy3pRxnR0S8ClhMMT3q3yJi\ncWbe03Ks9+MPgZpgmpihiDiEYhHcSzLzE9XOTBp9TctPROxP8cvXLZk5EBFfBH6v+hlKo6tpGQL+\nF/CSiLgD6AXmRcQXMvN3qp2hNHqalp+IeBmwIDP/vtzUA2yvdHLaLV5CMo5ExP8B3lxOlzoHeEX5\nVD8wLSL2BV4EXJiZNwCvYZhFbSLiF4FvAjdn5ruB7wMxBqcgdVXTMhQRewNfBd5n80JN17T8UKx5\n8bmImB0RPcBvUfzwKjVS0zKUmWdk5ksz8zCK6fa32bxQUzUtPxQNiz+PiDnlHRnPBK6rMI52kw2M\n8eWTwOsj4k7gSzzzl6YbKBZgCorrGL8fEbdTLGQzMyJm0rKydGZ+F7gF+PeIuA1YT/EX4E64crXG\ng6ZlaAkwF3hXRNwZEXeUtwOTmqhR+Smvd/4o8K/AnRS3sfMyRzVZozIkjTONyk9mfo9iDZlbgDXA\nHZn5NxXPTbuhZ3DQ3zslSZIkSVKztbUGRkQcCVyUmceV024uo7jGZ21mLin3WUoxdWY7sCwzV0bE\nnsA1FH9h7ANOz8zHRuE8JEmSJEnSBDbiJSQRcS5wObBHuelC4AOZuRjYMyJ+LSLmUVyLtBA4EVhe\nXgt0FnBXue/VwAWjcA6SJEmSJGmCa2cNjHuAU1oe3wk8t1wwq5dixsURwOrM7M/MPmAdcCiwiOK6\nJCiuLTq+rsIlSZIkSdLkMeIlJJl5XUQc2LJpHfCXwPuAjRS3qfnt8vOdnqRYHby3ZfsmYPZIx4uI\nacD+wI8ys3/kU5C0k/mROmOGpOrMj9QZMySNrK01MIa4BDg6M38QEb8PXEwxy6K1OdELPEGx7kVv\ny7YNbYy/P7D+xhtvrFCa1Hg9ozy++dFEZ4ak6syP1BkzJFVXS36qNDAeo5hNAfAgcBRwK7AsImYA\nM4FDKG4ncwtwMnBb+XFVpwVLkiRJ0kS18oavs++++3U0xsIjD+eA/Z9fU0VSc1RpYCwF/iYitgPb\ngKWZ+VBEXAqspuisnJ+Z2yLiMuCqiFgFPA2cVlfhkiRJkjTRXPvt7Uyfta2jMR59/Bv8/tLfraki\nqTnaamBk5v0UMy3IzG9RLM45dJ8VwIoh27YAp3ZepiRJkiRNfFOnTWPqtOkdjdHT01kDRGqqdu5C\nIkmSJEmS1FU2MCRJkiRJUuPZwJAkSZIkSY1nA0OSJEmSJDWeDQxJkiRJktR4NjAkSZIkSVLj2cCQ\nJEmSJEmNZwNDkiRJkiQ1ng0MSZIkSZLUeNPa2SkijgQuyszjIuJ5wOXAPsBU4M2ZuT4ilgJnAtuB\nZZm5MiL2BK4B5gJ9wOmZ+dhonIgkSZIkSZq4RpyBERHnUjQs9ig3/SlwTWYeC1wAHBIR84BzgIXA\nicDyiJgOnAXclZmLgavL/SVJkiRJknZLO5eQ3AOc0vL4aGD/iPgacBrwL8ARwOrM7M/MPmAdcCiw\nCLihfN31wPE11S1JkiRJkiaRES8hyczrIuLAlk0HAY9n5gkRcQFwHrAW2Niyz5PAHKC3ZfsmYHYd\nRUuSJKmZfv/9n2bGzN6Oxth3j61c/on/XVNFkqSJoq01MIZ4DPjH8vN/BJYBt/LTzYle4AmKdS96\nW7ZtqFamJEmSxoNtM+czOGu/jsYY7FlfUzWSpImkyl1IVgEnl58vBtZQNDAWRcSMiJgDHFJuv6Vl\n35PL10qSJEmSJO2WKg2MdwGnR8Rq4DXARzPzIeBSYDXwdeD8zNwGXAa8LCJWAUuAD9ZTtiRJkiRJ\nmkzauoQkM+8Hjio/fwD41WH2WQGsGLJtC3Bq52VKkiRJkqTJrMoMDEmSJEmSpDFlA0OSJEmSJDWe\nDQxJkiRJktR4NjAkSZIkSVLj2cCQJEmSJEmNZwNDkiRJkiQ1ng0MSZIkSZLUeDYwJEmSJElS49nA\nkCRJkiRJjTetnZ0i4kjgosw8rmXbacAfZOZR5eOlwJnAdmBZZq6MiD2Ba4C5QB9wemY+VvM5SJIk\nSZKkCW7EGRgRcS5wObBHy7ZXAG9reTwPOAdYCJwILI+I6cBZwF2ZuRi4Grig1uolSZIkSdKk0M4l\nJPcAp+x8EBHPAT4CvKNlnyOA1ZnZn5l9wDrgUGARcEO5z/XA8XUULUmSJEmSJpcRGxiZeR3QDxAR\nU4ArgD8GnmrZbTawseXxk8AcoLdl+6ZyP0mSJEmSpN3S1hoYLQ4DXgRcBswEXhwRFwPf4KebE73A\nExTrXvS2bNvQUbWSJEmSJGlS2p0GRk9m3gb8AkBEHAh8ITP/uFwD4yMRMYOisXEIsAa4BTgZuK38\nuKrO4iVJkiRJ0uSwO7dRHdzVE5n5EHApsBr4OnB+Zm6jmKnxsohYBSwBPthBrZIkSZIkaZJqawZG\nZt4PHPVs2zJzBbBiyD5bgFM7L1OSJEmSJE1muzMDQ5IkSZIkqStsYEiSJEmSpMazgSFJkiRJkhrP\nBoYkSZIkSWo8GxiSJEmSJKnxbGBIkiRJkqTGs4EhSZIkSZIazwaGJEmSJElqvGnt7BQRRwIXZeZx\nEfGLwKVAP/A08ObMfCQilgJnAtuBZZm5MiL2BK4B5gJ9wOmZ+dhonIgkSZIkSZq4RpyBERHnApcD\ne5SbPgGcnZm/DFwHvCci5gHnAAuBE4HlETEdOAu4KzMXA1cDF9R/CpIkSZIkaaJr5xKSe4BTWh6/\nITO/V34+DdgKHAGszsz+zOwD1gGHAouAG8p9rweOr6VqSZIkSZI0qYzYwMjM6yguF9n5+CGAiDgK\nOBv4c2A2sLHlZU8Cc4Delu2byv0kSZIkSZJ2S6VFPCPiDcCngJPLNS36+OnmRC/wRLm9t2Xbhuql\nSpIkSZKkyaqtRTxbRcQbKRbrPDYzdzYkvgN8JCJmADOBQ4A1wC3AycBt5cdVdRQtSZIkSZIml91q\nYETEFOAS4H7guogYBL6ZmR+MiEuB1UAPcH5mbouIy4CrImIVxR1LTqu3fEmSJEmSNBm01cDIzPuB\no8qHz9nFPiuAFUO2bQFO7aRASZIkSZKkSmtgSJIkSZIkjaXdXgNDkiRJGk0DAwOsXbu2lrHmz5/P\n1KlTaxlLktRdNjAkSZLUKJs2PMqb3vt5Zs2Z29E4mzc+zNXLT2PBggU1VSZJ6iYbGJIkSWqcWXPm\nsve+L+h2GZKkBnENDEmSJEmS1Hg2MCRJkiRJUuPZwJAkSZIkSY1nA0OSJEmSJDWeDQxJkiRJktR4\nbd2FJCKOBC7KzOMiYj5wJTAArMnMs8t9lgJnAtuBZZm5MiL2BK4B5gJ9wOmZ+Vj9pyFJkiRJkiay\nEWdgRMS5wOXAHuWmi4HzM/MYYEpEvC4i5gHnAAuBE4HlETEdOAu4KzMXA1cDF4zCOUiSJEmSpAmu\nnUtI7gFOaXn8S5m5qvz8euAE4AhgdWb2Z2YfsA44FFgE3NCy7/G1VC1JkiRJkiaVERsYmXkd0N+y\nqafl803AbKAX2Niy/UlgzpDtO/eVJEmSJEnaLVUW8Rxo+bwX2ECxvsXsIdufKLf3DtlXkiRJkiRp\nt1RpYNwREYvLz08CVgG3AosiYkZEzAEOAdYAtwAnl/ueXO4rSZIkSZK0W9q6C8kQ7wIuLxfpvBu4\nNjMHI+JSYDXFJSbnZ+a2iLgMuCoiVgFPA6fVVbgkSZIk6acNDuzgoR//F2vXrq1lvPnz5zN16tRa\nxpI61VYDIzPvB44qP18HHDvMPiuAFUO2bQFO7bhKSZIkSdKIntr4Y65/YBPf/I+vdzzW5o0Pc/Xy\n01iwYEENlUmdqzIDQ5IkSZLUULPmzGXvfV/Q7TKk2lVZA0OSJEmSJGlM2cCQJEmSJEmNZwNDkiRJ\nkiQ1ng0MSZIkSZLUeDYwJEmSJElS49nAkCRJkiRJjWcDQ5IkSZIkNZ4NDEmSJEmS1HjTqrwoIqYB\nVwEHAf3AUmAHcCUwAKzJzLPLfZcCZwLbgWWZubLjqiVJkiRJ0qRSdQbGycDUzDwa+DDwUeBi4PzM\nPAaYEhGvi4h5wDnAQuBEYHlETK+hbkmSJEmSNIlUbWCsBaZFRA8wh2J2xWGZuap8/nrgBOAIYHVm\n9mdmH7AOeHmHNUuSJEmSpEmm0iUkwJPAwcAPgOcAvw68uuX5TcBsoBfYOOR1cyoeU5IkSZIkTVJV\nZ2D8EXBDZgZwKPDXwIyW53uBDUAfRSNj6HZJkiRJkqS2VW1gPM4zMys2UMzkuDMijim3nQSsAm4F\nFkXEjIiYAxwCrOmgXkmSJEmSNAlVvYTkE8BnI+JmYDpwHnA7cEW5SOfdwLWZORgRlwKrgR6KRT63\n1VC3JEmSJEmaRCo1MDLzKeANwzx17DD7rgBWVDmOJEmSJEkSVL+ERJIkSZIkaczYwJAkSZIkSY1n\nA0OSJEmSJDWeDQxJkiRJktR4NjAkSZIkSVLj2cCQJEmSJEmNZwNDkiRJkiQ1ng0MSZIkSZLUeDYw\nJEmSJElS402r+sKIOA/4DWA68CngZuBKYABYk5lnl/stBc4EtgPLMnNlhzVLkiRJkqRJptIMjIg4\nBliYmUcBxwIvBC4Gzs/MY4ApEfG6iJgHnAMsBE4ElkfE9FoqlyRJkiRJk0bVS0heA6yJiC8DXwH+\nCTgsM1eVz18PnAAcAazOzP7M7APWAS/vsGZJkiRJkjTJVL2E5LkUsy5eC/wcRROjtRmyCZgN9AIb\nW7Y/CcypeExJksbE7Xd8l/sf+M+Oxnjl4a9gxowZNVUkSZKkqg2Mx4C7M7MfWBsRW4H9W57vBTYA\nfRSNjKHbJUlqrE9cu45pMx+p/PrNfQ/zf9/fy8te+pIaq5IkSZrcqjYwVgN/CPx5RDwf2Au4MSKO\nycxvAicBNwG3AssiYgYwEzgEWNN52ZIkjZ499tqH6bP2q/z6HdufrrEaSZIkQcUGRmaujIhXR8R3\ngB7gLOA+4Ipykc67gWszczAiLqVoePRQLPK5rZ7SJUmSJEnSZFH5NqqZed4wm48dZr8VwIqqx5Ek\nSZIkSap6FxJJkiRJkqQxYwNDkiRJkiQ1ng0MSZIkSZLUeJXXwJAkScMbHBjgvvvuY8b0zt9m58+f\nz9SpU2uoSpIkaXyzgSFJUs22bHqED3/2YWbNua+jcTZvfJirl5/GggUL6ilMkiRpHLOBIUnSKJg1\nZy577/uCbpchSZI0YbgGhiRJkiRJajxnYEiSJGlCGhwYYP369bWM5Xo0ktR9NjAkSZI0IW3Z9AgX\nfuZRZs25t6NxXI9GkpqhowZGRMwFbgOOB3YAVwIDwJrMPLvcZylwJrAdWJaZKzs5piRJktQu16OR\npImj8hoYETEN+DSwudx0MXB+Zh4DTImI10XEPOAcYCFwIrA8IqZ3WLMkSZIkSZpkOlnE82PAZcCD\nQA9wWGauKp+7HjgBOAJYnZn9mdkHrANe3sExJUmSJEnSJFSpgRERbwEezsyvUTQvho61CZgN9AIb\nW7Y/CcypckxJkiRJkjR5VV0D463AQEScABwK/DXwvJbne4ENQB9FI2PodkmSJEmSpLZVamCU61wA\nEBE3Ab8H/FlELM7Mm4GTgJuAW4FlETEDmAkcAqzpuGpJkiRJ0qjyVsRqmjpvo/ou4PJykc67gWsz\nczAiLgVWU1xqcn5mbqvxmJIkSZKkUeCtiNU0HTcwMvOXWx4eO8zzK4AVnR5HUvsuvuyL7NXb2XIz\ne+/Zw3veuaSmiiRJkjQeeStiNUmdMzAkNcT3H38O07fu19EYc7bfU1M1kiRJktS5Tm6jKkmSJEmS\nNCZsYEiSJEmSpMbzEhJJwxoYGGDt2rW1jOWq05IkSZI6ZQND0rCe3PgYb3rv55k1Z25H47jqtCRJ\nkqQ6NLaBseySa5i1d7W7KGzdspn3nP0GXnjA/jVXJU0urjotSZIkqSka28BY/+Q8pg9Uu4vC5o0P\n8eijj9nAkCRJkiRpgnART0mSJEmS1Hg2MCRJkiRJUuNVuoQkIqYBnwUOAmYAy4DvA1cCA8CazDy7\n3HcpcCawHViWmSs7rnoEgwMD3H///ey918yOxvHOCZIkSZIkNUPVNTDeCDyamW+OiH2A/w98Fzg/\nM1dFxGUR8Trg34BzgMOAWcDqiPjnzNxeR/G7smXTI1x09Y+ZNedHlcfwzgmSJEmSJDVH1QbG3wJ/\nV34+FegHDsvMVeW264FfpZiNsToz+4G+iFgHvBy4vXrJ7fHuCZKk8W5wYID169fXNp4zCyVJ0nhW\nqYGRmZsBIqKXopHxPuBjLbtsAmYDvcDGlu1PAtXujSpJ0iSzZdMjXPiZR5k1596Ox3JmoVSdzURJ\naobKt1GNiAOAvwf+IjO/GBF/2vJ0L7AB6KNoZAzdLkmS2uCMQqn7bCZKUjNUXcRzHvBV4OzM/Ea5\n+c6IWJyZNwMnATcBtwLLImIGMBM4BFjTedmSJEnS2LGZKEndV3UGxnuBfYALIuJCYBB4B/DJiJgO\n3A1cm5mDEXEpsBrooVjkc1sNdUuSJEmSpEmk6hoY7wTeOcxTxw6z7wpgRZXjSJIkSZIkQQdrYEhS\nO+pc+MxFzyRJkqTJywaGpFFV18JnLnomSZI0PvkHLdXFBoakUefCZ5IkSZOXf9BSXWxg7EJdXUI7\nhJIkSZImO/+gpTrYwNiFOrqEdgglSZIkSaqHDYxnYZdQkjRReP2xJEka72xgjCIvQ5Hq4y9fUme8\n/lhqBt/PJKk6GxijyMtQpPr4y5fUOWcWSt3n+5lUnQ1A2cAYZf6wKNXHPEnd5w+PUud8P5OqsQGo\nUW9gREQP8CngUGArsCQz/2O0jytJw6nzly/wFzBNPv7wKDWDzURNVjYAJ7exmIHxm8AemXlURBwJ\nXFxuUxtcR0OqV12/fAE8teHHfPjtR3PwwQd3PJYZ1XhSxw+P/vIldaau97M638vAPGp88A9a49dY\nNDAWATcAZOa3I+LwMTjmhFHHm1Mdb0w7duwA6CiYdYzR6T8OO3bs4N57O//F1X+kxre6OvebNz7E\nhZ/518b88FhHxuocp+6xzN3EMpF/+arrvaauejRx1fF+Vtd7GTSvsV9nFgFnjE0gTfyD1kT/Oa6u\n/IxFA2M2sLHlcX9ETMnMgV3sPxWg5/G76Nm8V6UDTnnyATZt3Uj/1r5KrwfY/MR/smPbU40YY4+9\n9qF/656Vx9iy4UHeddEX2WPWPpXH2PToA8yYNburYzy9eQPvXXoCBxxwQOUafvjDH7L88q91dB5P\nb97Apz70tkr/SP3Kr/zKQcCPMrO/cgHPruP87LT9qf9i0/b+jr5/oZ4cNHGcnWN1mk+oJ6NQT07r\nHKfOsXbmbsmSJQcxDjI09akH2LRtU+O+X5uUoablB+p5n4F63mvqrAfg4IMPHl/vQVsfZtPmbY36\nfjWLI6srj03L4s6atj14y0GMgwz19D1OX/+sCfv9OlG/7yf6z3F15adncHCwo2JGEhEfB/41M68t\nHz+QmS98lv0XAatGtSipuw7OzPtGY2Dzo0nCDEnVmR+pM2ZIqq7j/IzFDIxvAa8Fro2IVwHfG2H/\nW4FXA/8F7Bjl2qRu+NEojm1+NBmYIak68yN1xgxJ1XWcn7GYgbHzLiQvLze9NTPXjupBJUmSJEnS\nhDLqDQxJkiRJkqROTel2AZIkSZIkSSOxgSFJkiRJkhrPBoYkSZIkSWq8sbgLSVtaFvs8FNgKLMnM\n/xiD4x4JXJSZx0XEfOBKYABYk5lnl/ssBc4EtgPLMnNlROwJXAPMBfqA0zPzsQ5rmQZ8FjgImAEs\nA77f5ZqmAJcDUdbwe8DT3aypPNZc4DbgeIpVmrtaT3m824GN5cP1wEfHsi4zZIZ2s65GZcj8mJ9d\n1GR+2q/JDJmh4WoyQ+3VY37Mz3A1mZ/2axqzDDVpBsZvAntk5lHAe4GLR/uAEXEuxTflHuWmi4Hz\nM/MYYEpEvC4i5gHnAAuBE4HlETEdOAu4KzMXA1cDF9RQ0huBR8sxTwT+ogE1/TowmJmLyvE+2u2a\nyn/gPg1sLjd1+2tEROwBkJm/XP53RhfqMkNmqC1Ny5D5MT/Pwvy0V5MZKnT7/4UZakPTMmR+zM+z\nMD/t1TSmGWpSA2MRcANAZn4bOHwMjnkPcErL41/KzFXl59cDJwBHAKszsz8z+4B1FN3Rn9Rb7nt8\nDfX8Lc/8D5sK9AOHdbOmzPwHii4ZwIHAE92uCfgYcBnwINDTgHoox94rIr4aEV+PoqM91nWZITPU\nrqZlyPwUzM8Q5qdtZqhghoYwQ20xPwXzM4T5aduYZqhJDYzZPDPtBKA/imk7oyYzr6MIx049LZ9v\nKmvqHVLXk8CcIdt37ttpPZsz86mI6AX+Dnhft2sq6xqIiCuBS4HPd7OmiHgL8HBmfq2ljtbvk658\njSi6oH+Wma+h6CJ+jrH/OpkhMzSihmbI/BTMz/B1mZ+RmaGCGRq+LjP07MxPwfwMX5f5GdmYZqhJ\nDYw+iuJ3mpKZA2NcQ+vxeoENFHXNHrL9CX663p37diwiDgBuAq7KzC82oSaAzHwLsAC4ApjZxZre\nCpwQEd+g6Nj9NfC8Ltaz01qKsJKZ64DHgHljXJcZwgy1oYkZMj+Frn+vmp8RNTE/YIZ26vr3qxka\nURMzZH4KXf9eNT8jamJ+YIwz1KQGxreAkwEi4lXA97pQwx0Rsbj8/CRgFXArsCgiZkTEHOAQYA1w\ny856y4+rhg62u8rrgr4KvDszryo339nlmt4YEeeVD7dSLBRzW0Qc042aMvOYzDwuM48Dvgu8Cbi+\nm1+j0tuAjwNExPMpwvnPY/x1MkNmaEQNzZD5KZifn63J/LTHDBXM0M/WZIZGZn4K5udnazI/7RnT\nDPUMDg7WVHdn4pnVd19ebnprZq4dg+MeCHwhM4+KiJ+nWMxmOnA3sDQzByPiDODtFFNhlmXmlyNi\nJnAV8D8pVqM9LTMf7rCWTwCnAj8ojzUIvAP4ZBdrmgX8FfA/KO5as7ys74pu1dRS200UqwEP0sX/\nb2Ut0ym+TgdSdIvfTdF9HLOvkxkyQxVqa0SGzI/5eZaazE97tZghM7SrmszQyHWYH/Ozq5rMT3u1\njGmGGtPAkCRJkiRJ2pUmXUIiSZIkSZI0LBsYkiRJkiSp8WxgSJIkSZKkxrOBIUmSJEmSGs8GhiRJ\nkiRJajwbGJIkSZIkqfFsYDRYRMyOiOsqvvaVEXHRMNtPj4i/6ry64Y81GuNLVZkhqTrzI3XGDEnV\nmR/tig2MZtsPOLTia18CzN3Fc4MVx2z3WHWPL1VlhqTqzI/UGTMkVWd+NKxp3S5Az+oS4PkR8aXM\nfH1EvBl4B9AD3A6cDbwUuL78OAjcAfwG8CFgr4h4b2YuH27wiHglcDEwE3gUeHtm3h8R3wC+A7wa\neC5wTmZ+NSJeAHwO2AdYAxxTHvcnxwIeBH6+HOOFwI2ZeWbdXxipTWZIqs78SJ0xQ1J15kfDcgZG\ns/0h8GAZ2pcAS4CFmXkY8AhwbmbeCVwGfAy4FPjLzLwLuBD4yrOEdjpwOfA7mXk4RYCvaNllemYe\nBfwx8JFy2yXAFzLzF4FrgednZt8wxzoA+E3gxcBJEfHiOr4YUgVmSKrO/EidMUNSdeZHw3IGxvhx\nHPAi4N8iogeYTtFlBFgG3AZszsw3tjneAmA+8JVyPIC9W56/ofy4hmIKF8AJwOkAmfnliNiwi7Fv\nzsyNABFxL0X3Uuo2MyRVZ36kzpghqTrzo5+wgTF+TAX+NjPfCRARs3jm/98+QC+wd0Tsl5mPtzne\nvWUXkzK881qe31p+HKSYqgWwg/Zm7fS3fN76eqmbzJBUnfmROmOGpOrMj37CS0iarZ9nwvkvwCkR\n8bwyZJ/RjxoNAAABOUlEQVQG3lk+95fAJ4FPUUyj2vna6c8y9g+A/SJiUfl4CfD5Eer5Z+B3ASLi\nJIp/MIbWKTWJGZKqMz9SZ8yQVJ350bBsYDTbQ8API+LG8nquDwE3Ad8rn78oIn4b+DmK67IuoVg4\n5rcoFp85MiI+OtzAmbkNOBX4eER8F3gT8Lby6V2tnvtHwOsj4vbytTunTn0HeFV5rKGvdSVedZMZ\nkqozP1JnzJBUnfnRsHoGB/26qj0RcQ7wtcz8QUS8AvhMZr6y23VJ44UZkqozP1JnzJBUnflpDqe7\naHesA74YEQPAFmBpl+uRxhszJFVnfqTOmCGpOvPTEM7AkCRJkiRJjecaGJIkSZIkqfFsYEiSJEmS\npMazgSFJkiRJkhrPBoYkSZIkSWo8GxiSJEmSJKnxbGBIkiRJkqTG+2/qcrM3MzgSxAAAAABJRU5E\nrkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "g = sns.FacetGrid(yelp,col='stars')\n", "g.map(plt.hist,'text length')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Create a boxplot of text length for each star category.**" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAERCAYAAACO6FuTAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH51JREFUeJzt3X90XXWZ7/F3miZNk5wWaCnFi068/nhwOrEjKNWhUvA3\njFS9OpglIJVYRoTOuOY6rqFOuTM4/HDJZTGKorcWi9J7uTM4akZWQR2c2sK1FkclVNYDLunciwMt\niU3TNE3SNrl/7JNzTo77pCft2Xufnf15rZXVJ/vsk/3N7sl+9vfnbpiYmEBERCTMnKQLICIi9UtJ\nQkREKlKSEBGRipQkRESkIiUJERGpSElCREQqmhv1Aczsp8CB/LfPArcAm4Fx4El3vy6/31rgGuAI\ncLO7P2hmLcB9wBJgELjK3fujLrOIiAQaopwnYWbzgMfc/dySbd8Bbnf37WZ2N/AQ8GPg+8A5QCuw\nAzgXuB7IuftNZvZB4E3u/onICiwiIlNEXZNYDrSZ2cNAI/Bp4Bx3355/fSvwDoJaxQ53PwoMmtkz\n+feuBD5bsu+GiMsrIiIlou6TGAY+5+7vBK4FtgANJa8fBBYAOYpNUgBDwMKy7ZP7iohITKKuSTwN\n/ArA3Z8xs36CJqVJOWCAoL9hQdn2/fntubJ9K8o3b70BeB44VoPyi4hkQSNwJrDL3UdLX4g6SVwN\ndALXmdlLCBLB98xslbtvAy4GHgF2ATebWTMwHzgbeBJ4DLgEeDz/7/bfPcQUb6hiHxERCfdmgj7h\ngqiTxCbga2a2naDfYQ3QD3zVzJqAp4AH3H3CzD6fL1wDsN7dx/Id2/fm3z8KfOg4x3seYMuWLSxd\nujSSX0iqc9NNN/Hcc88BcNZZZ3HjjTcmXKLk3H777Tz99NMAvPrVr+aTn/xkwiVKzvXXX8/Y2BgA\nzc3N3HXXXQmXKDn19DfywgsvcPnll0P+Gloq0iTh7keAK0JeujBk300ESaV022Hgshkc8hjA0qVL\nOeuss2bwNqm11tZWmpubC3GW/z+6u7vZsGFDIc7yuejo6GDPnj2FOMvnYt26dYXPxbp16+rlXPxO\nM70m00kk2traQuMs6uzsZNmyZSxbtozOzs6ki5Oo7u7u0DiLOjs76ejooKOjo64/F5FPppNsWrFi\nBbt37y7EWdfV1ZV0EaQOpSFRqiYhkdi5c2donFWdnZ11fbcYl/vvvz80lvqlmoRIDHp7ewGUKGSK\nyURZz58L1SQkEqXNK2pqCS4GunPW56JUb28vu3fvZvfu3YWbiHqkmoREYrKzdjLOssmLwWSc5fOh\nz0VRedNbvZ4PJQmJTNbvFCel5WIQF30u0kXNTRIZddZKGH0uAmlpelOSEIlYWi4Gcent7a3rNvi4\npGX+jJqbRCKmdvipNm0KFla48847Ey5J8tJw06AkUWMa6ihh0nAxiENvb29hWY6sd+JDOq4Tam6q\nMQ11lDBqhw9M1iLKY6lfShI1lJZxzyJJ2bdvX2gs9UtJooa05IDI9HK5XGgs9UtJQkRiM3/+/NBY\n6peSRA1pqKPI9LSE/FRpGA6sJFFDnZ2dnHHGGZxxxhnqpBQJoRupqdIw0EVDYGtsYGAg6SKI1C3N\nGSlKy5peqknUUE9PD6Ojo4yOjtLT05N0cUTqUldXl2oRpGegi5JEDaXlP10kSZozki5KEjV07Nix\n0FhEpFxa+mfUJ1FDra2tjI6OFmKRSZPNj6tXr064JFIvOjs7C9eJeq5ZqSZRQ5MJojwWScMoFolX\nb28vw8PDDA8P1/UwWCWJGlqyZEloLNnW09NTuBhoQINMSksfppJEDXV3d4fGkm1puRiIhFGSqKHO\nzk46Ojro6Oio6zZGEUmeOq4zSjUIKdfV1cU999xTiEVAHdeZpTHgUm716tXMmTOHOXPmaHSTFKjj\nWkSA4GIwPj7O+Ph4XV8MJF5p6atSkhCJWFouBiJhlCRERBKQlo5rJQmJTBrWyo9DWi4GcdHnIjC5\nIu6yZcvquh9To5skMpNNK/X8BxCHyaHRk3HW6XNRtGLFiqSLcFyqSUgkJtfK3717t+4agcOHD3P4\n8OGki5E4fS6m2rlzJzt37ky6GNNSkpBIqLO2qLe3l71797J3797MXxj1uShKS8JUkhCJ2KZNm0Jj\nyba0JEwlCYmEOmuL9u3bFxpnkT4X6RN5x7WZLQEeB94GHAM2A+PAk+5+XX6ftcA1wBHgZnd/0Mxa\ngPuAJcAgcJW790ddXpFaW7JkCXv27CnEWaZnXBd1dXWxYcOGQlyvIq1JmNlc4MvAcH7THcB6d18F\nzDGz95jZGcA64E3Au4BbzawJuBZ4wt0vAL4BbIiyrFJbamIp0urAU+kZ1wENgQ3cDtwN3AA0AOe4\n+/b8a1uBdxDUKna4+1Fg0MyeAZYDK4HPluyrJJEiamIp0hDYqXQOijI9BNbM1gD73P37BAmi/HgH\ngQVADjhQsn0IWFi2fXJfSQk9gGmq7u5u1SLyNJmuKOtDYD8CvN3MfkhQM/g6cHrJ6zlggKC/YUHZ\n9v357bmyfSUl1MQylVYHLtKjXAOZHwLr7qvc/SJ3vwj4OXAlsNXMLsjvcjGwHdgFrDSzZjNbCJwN\nPAk8BlyS3/eS/L6SEnoAk4RJy4UxDmkZAhv3shyfBDbmO6afAh5w9wkz+zywg6BZar27j5nZ3cC9\nZrYdGAU+FHNZ5SSpBiHlyi+MWb6BOHToUGhcb2JJEu7+lpJvLwx5fROwqWzbYeCyaEsmUcryBUBk\nttBkOhGJjSbTFbW1tYXG9UZJQiKjUSwilaUlYWqpcImMloSWcuqTKErL/BklCYnE5CiWybie/wgk\nPmnprI1LGpaPV3OTRCItw/skXiMjI6FxFqVlCXklCYmE7hglzODgYGicRWlZ30xJQkRio+VaitKy\nvpmShEQiLcP7JF5vectbQuMsSkvCVJKoMQ37DKRleJ/Eq3Qxu3pf2C5qaVnfTKObakzDPgNpGd4n\nItNTTaKGtHiZyPRKn5+QhmcpRCktIwCVJGooLf/pcejt7WXPnj3s2bNHCVMK1NyUPkoSEgklTJHp\npaVWpSRRQ+qsLdI8CQmjv5GitNSq1HEtIrHp7OyktbW1EEv9U02ihtTEUqR5EhKmt7eX4eFhhoeH\nM99XpeamDFITS5GaFSSMbqSK1NyUQaUrOqZhdccodXZ2smzZskIsIumkmkQNHTx4MDTOqq6uLtUi\nZArVMIvSci6UJGooLWuxiEjyJlcl6OjoqOvatpJEDWnxsqnuv//+zLc7y1RpWR5bipQkaigtHVFx\n0BIlEub5558PjbMoLasSKElIJDSKZSqtDizl0vI3oiRRQ2kZ9yzxU9Nb4NRTTw2NpX4pSdSQmpuK\n0jJyIw5qeiuamJgIjbMoLX8jShISicl5EsuWLavrkRtxSEuzQhw0TDx9lCRqKC13BnHRPImAZuIX\naZh4UVpGemnGdQ1plvFUOgeBkZGR0DiLXvOa17Bnz55CnGX79u0LjeuNahI1prtnKbd///7QOIu2\nbdsWGmdRWmpVShI11tnZqTtokQqOHTsWGmdRd3d3aFxvlCRqTOPhi3QuAmeeeWZonEVNTU2hcRZ1\ndnaydOlSli5dWtc3lkoSNabx8EU6FwEt11KkTvypWlpaaGlpSboY01KSqCGNhy/SuSh65JFHQmPJ\nNi3LkUEaD1+kc1GUllEscTjllFNC4yxKy9+IkkQNqSotYdIyikXilZbrhZKEREITC4vUJ1F04MCB\n0DiL0vIky0gn05nZHGAjYMA48DFgFNic//5Jd78uv+9a4BrgCHCzuz9oZi3AfcASYBC4yt37oyyz\nSK2Vr+m1evXqBEsj9eK3v/1taFxvoq5JXApMuPtKYANwC3AHsN7dVwFzzOw9ZnYGsA54E/Au4FYz\nawKuBZ5w9wuAb+R/hqRAWtpbJV7j4+OhcRYdOXIkNK43kSYJd/8OQe0A4PeA/cA57r49v20r8Hbg\nPGCHux9190HgGWA5sBJ4qGTft0VZ3pO1d+/e0DiL0tLeGgctIS9pVlVzk5m9BlgMNExuc/cfVfNe\ndx83s83Ae4E/IUgKkw4CC4AcUNpAOQQsLNs+uW/dSksbo8RLzU1F5513Hj/5yU8KcZY1NTUVahD1\nPLHwuDUJM/sfwPeAzwB/m//6m5kcxN3XAK8GvgrML3kpBwwQ9DcsKNu+P789V7avpIAWtZMwl156\naWicRVdeeWVoXG+qaW56K/AKd7/Q3S/Kf1U1RMPMrjCzv8p/OwIcAx43s1X5bRcD24FdwEozazaz\nhcDZwJPAY8Al+X0vye9bt+bOnRsaZ1F/f39onEVqbiq66667QuMsWr16NY2NjTQ2NtZ17bKaJPF/\nmXr3PxP/BLzOzLYR9Cn8GXAd8Ldm9ijQBDzg7nuBzwM7gB8QdGyPAXcDf2Bm24GPEtRi6lZjY2No\nnEVayK1IM66LXnzxxdA4q04//XROP/30pIsxrYq3u2b2NWAiv88vzOxHwNHJ19396uP9cHcfBj4Y\n8tKFIftuAjaVbTsMXHa849QLJYmitra2wpPH2traEi5NsjTjuqipqYnR0dFCnGW9vb288MILhbhe\nF/mbribxr8A2gn6E/wb8S/77bfnXpMyqVatC4ywaGxsLjbNowYIFoXEWLV++PDTOorQME69Yk3D3\newHM7AZ3v7X0NTO7JeqCpdFTTz0VGmeRalVFExMToXEW/eIXvwiNpX5N19x0G8FM59Vm9qqy97wR\nWB9x2VJHzQpFq1atYuvWrYU4yyab3crjLFINs2jFihXs3r27ENer6ZqbvknQtHSIYjPTNuBh4I+j\nL1r6aCG3ItWqinK5XGicRQ0NDaFxFn37298OjevNdM1Nu4BdZvat/CxoOY7u7m42bNhQiLPsN7/5\nTWicRfPnzw+Ns0jLchSlZe2magbz7zazl1CcyHZKPv41sNbdfx5V4dLm2WefnRLX62iFOBw9ejQ0\nlmxraGgo9MtkvSaRFtXMk9gGvN/dF7n7IuDdQA/BmkxfjLJwabNly5bQOIs0sbBIs8+L9NChorQM\n7qgmSfyBuxcazNx9K/Bad/8ZJz7JblZKy6qOcTjttNNC4yzS7POihQsXhsZZ9NKXvjQ0rjfVJIkB\nM/tTM2szs5yZfQz4rZmdXeX7M6N05mS9z6KM2uLFi0PjLNLNQ5EeOlSUlodRVXORv5xg5db/AP6d\nYLb0h/Pb/qry27Ln+uuvD42zSOsVSZiBgYHQOIvSslzLcRuL3f03wAdCXvpC7Ysjs4WWx5YwmlhY\n9Pzzz4fG9ea4ScLM3gn8HXAaU58n8Z8jLFcqbdq0aUp85513JlgaEZGTV01z0xcIVl99K3BRyZeU\nScudQRy6urpC4yzSBDIJ09raGhrXm2rGJva5+3cjL4nILNXc3FxY+bS5uTnh0ki9SEsnfjU1ie1m\ndoeZvcPMLpj8irxkKXTmmWeGxllU3vSWZZdffnlonEWqVRWlZfZ5NTWJyQfRvq5k2wRQv2O2EqJl\nOYq02GHRy1/+8tA4ixobGwsz8Ot5ApkUVTO6Sf0PVers7CzcHWV5SQ6AefPmMTw8XIizrPyRnV/5\nylcSLE2ytFxLUS6XK6wKXM8LP1Yzuun3CB481AG8GfifwNXuvifSkqVQT09PYVhfT09Ppod9TiaI\n8jiL9MhOCdPa2lpIEvXccV1Nn8RXgM8BQ8Be4H8BX4+yUGmVlidNxSEt69LEQedCwqRlYmE1SWKx\nu38PwN0n3H0jkO1nMMpxvexlLwuNs6i0uS3rTW9SlJamt2qSxGEzO4ugsxozWwmMRlqqlNLcgCJ3\nD42z6NChQ6GxZNtsGt30F8B3gVeY2c8JZl5fFmmpUkqT6Yq0/EKRzoWEScvn4rg1ifwT6t5A8Fzr\nDwOvdPcfR12wNHr44YdDYxGRtKpYkzCzr5FvYgp5DXe/OrJSpVRaqo9xmDt3bqGdNesPHRJJs+n+\nev81rkLI7LNo0SL27t1biEUknSomCXe/N86CyOyi5RdEZgc9Wa6G2tvbQ+Ms0gQykdlBSaKG9MjO\nomPHjoXGIpIux00SZnZDyLZboilOurW1tYXGkm1z5swJjUXSYLrRTbcBS4DVZvaqkpeagBXA+ojL\nljpdXV2FVWCzPplOiubNm8fhw4cLsUiaTDe66ZvA7xM8kW5byfajwE1RFiqtOjs7WbZsWSEWAQoJ\nojwWSYPpRjftAnaZ2c/c/YnS18zsA8AzURcujVSDEJHZpJpZTj1m9kV3/5yZnQbcDbwKeCDaoiVv\n8+bNPProozN6z9DQEHBio5vOP/981qxZM+P3xeFEzkWptWvXzmj/ej4XM9XS0sLIyEghFkmTanrR\nzgGWm9ljwE+AnQTLdEiI0dHRwvOMs0yd+EWlnwd9NiRtqqlJNABHgNZ8PJ7/mvXWrFkz47vZyTvm\njRs3RlCi5JzIuXjve98LwJYtWyIoUXqkZSE3kTDV1CR2A3uA1xOManoTQY1CZFptbW2Zr0WIpF01\nNYmL3f1n+bgP+KCZ/UmEZZJZop4fySgi1akmSew2s08DBlwPfAK4rZofbmZzgXsIno/dDNwM/BLY\nTNBk9aS7X5ffdy1wDUHT1s3u/qCZtQD3EczXGASucvf+an85ERE5OdU0N30RaCPowD4KvBL4apU/\n/wqgz90vAN4F3AXcAax391XAHDN7j5mdAawjaMp6F3CrmTUB1wJP5N//DWBD1b+ZiIictGqSxLnu\nvh444u7DwFUECaMa/0Dxwt5IkGTOcfft+W1bgbcD5wE73P2ouw8SzMFYDqwEHirZ921VHldERGqg\nmuamCTNrpvgAosVUeBhRuXxSwcxywD8CnwZuL9nlILAAyAEHSrYPAQvLtk/uKyIiMammJvH3wA+A\npWZ2J/A4cGe1BzCzlwKPAPe6+/1MHT6bAwYI+hsWlG3fn9+eK9tXRERiUs0zrr8OfIyg0/nXwKXu\nvqmaH57va3gY+FTJQ4x+ZmYX5OOLge3ALmClmTWb2ULgbOBJ4DHgkvy+l+T3FRGRmBy3ucnMvunu\n7ycYlTS57V/c/a1V/PwbgFOADWZ2I0Ez1Z8DX8h3TD8FPODuE2b2eWAHwYS99e4+ZmZ3A/ea2XZg\nFPjQDH8/ERE5CdMtFf4tgs7jl5jZr8ve8/+q+eHu/gmCIbPlLgzZdxOwqWzbYeCyao4lEhetYyVZ\nMl1N4irgNII+iT8r2X4U2BtloURmk9m8wF+cCVPJMhnTLRU+SNBx/J74iiNS/05mHav7778/ghKJ\nRKeaIbAicpJmWw1i0kwTZk9PD/fccw8AV199NatXr46oZFIreuCuSAxyuRy5XO74O85ypUlBCSId\nVJMQkVhpZeB0UZIQkVhpdeB0UXOTiIhUpCQhIiIVKUmIiEhF6pMQEamB2TqxUDUJERGpSDUJEZEa\nmK0TC1WTEBFJQFomFqomISKSkDRMLFSSEBFJSBomFqq5SUREKlKSEBGRipQkRESkIiUJERGpSElC\nREQqUpIQEZGKlCRERKQiJQkREalISUJERCpSkhARkYqUJEREpCIlCRERqUhJQkREKlKSEBGRipQk\nRESkIiUJERGpSElCREQqUpIQEZGKlCRERKQiJQkREalobtIFkHjdcMMN9PX1xXKs/v5+ANauXRvL\n8RYvXsytt94ay7FEskJJImP6+vrY19fHRC4X+bEa5gYfr72jo9Ef6+DByI8hkkWRJwkzWwHc5u4X\nmdkrgM3AOPCku1+X32ctcA1wBLjZ3R80sxbgPmAJMAhc5e79J1IG3T1PNZHLMfTxj0dUomS0f+lL\nSRdBZFaKNEmY2V8CVwJD+U13AOvdfbuZ3W1m7wF+DKwDzgFagR1m9j3gWuAJd7/JzD4IbAA+cSLl\n6OvrY9+L/Uw0LzzJ3+j4GmgCYO+Bo9Efa+xA5McQkWyLuibxK+B9wDfy35/r7tvz8VbgHQS1ih3u\nfhQYNLNngOXASuCzJftuOJmCTDQvZOS1J/Uj6k7LE59JuggiMstFOrrJ3b8FlN5SN5TEB4EFQA4o\nvSUeAhaWbZ/cV0REYhR3x/V4SZwDBgj6GxaUbd+f354r21ekZtRXJXJ8cSeJfzOzC9z9R8DFwCPA\nLuBmM2sG5gNnA08CjwGXAI/n/90e/iNFTkxfXx99/X3MO7Ut8mM1NDcCcHD8cOTHGt1/KPJjSHbE\nnSQ+CWw0sybgKeABd58ws88DOwiao9a7+5iZ3Q3ca2bbgVHgQzGXVTJg3qltvOm/X5F0MWrq//zX\n+2b8HtWqpJLIk4S7/zvwR/n4GeDCkH02AZvKth0GLou6fCIyWat6kQUL5kV+rLlNQdfk2JHByI81\nOBj9HJ3ZTpPpRASABQvmse4v3pB0MWrqC3fsSroIqae1m0REpCIlCRERqUjNTSIiJdSJP5WShIhI\nib6+PvpffJFTmhuOv/NJamYCgGMHok9KA2MTJ/Q+JQkRkTKnNDdwyznzky5GTa3/txObo6M+CRER\nqUhJQkREKlKSEBGRipQkRESkIiUJERGpSKObMmZoaIiGkZFZ97jPhoMHGTpyJOliiMw6qkmIiEhF\nmahJDA0N0TA2Ouse99kwdoChoZmt2tne3s6hpiaGPv7xiEqVjPYvfYn2eTM7F0NDQ4yMjpzQ0tr1\nbGT/IRrmHUu6GDJLqCYhIiIVZaIm0d7ezqFjLYy8dkPSRamplic+Q3t7Jv4LI9He3s5Ea+OsfOhQ\n+5yZzRYeGhpiZGR01i2tPXhglJaWoaSLkWqqSYiISEW6DRUR2tvbaZ43PisfOtTc1J50MVJNNQkR\nEalINQkRkRJDQ0OMjk2c8Kqp9WpgbIJ5QzPvn1FNQkREKlJNQkSkRHt7O/OPjczK50k0ts+8f0Y1\nCRERqUg1iQxqOHgwlrWbGkZGAJhoaYn+WAcPwgxnXIvI8SlJZMzixYtjO1Z/vpNsURwX73nzYv3d\nRLJCSSJjbr311tiOtXbtWgA2btwY2zFnanT/oVjWbjpyaBSAprboE+bo/kPkFs2u9nRJTmaSRMPY\ngVgW+Gs4OgzAxNzW6I81dgBYFPlxZqtYa1Vjwecil4v+4p1bNP+EfrfBwXiW5Th8+CgA8+dHf/kZ\nHBxlsf5ETkomkkSsF4P+4JkGixbGcWoXqYnlJKhWVRTn5+jgYD8AzQsWRH6sxYvi/d1mo0wkCV0M\nRKanvxGpJBNJQkRkJgZimnE9fHQCgNa5DZEfa2Bs4oQap5UkRERKxNk8NdYfNL3lFkbfcbKIE/vd\nlCREREqo6W0qzbgWEZGKlCRERKQiJQkREalISUJERCqq645rM2sAvgQsB0aAj7r7r5MtlYhIdtR7\nTeK9wDx3/yPgBuCOhMsjIpIp9Z4kVgIPAbj7TuD1yRZHRCRb6rq5CVgAHCj5/qiZzXH38TgOvnnz\nZh599NEZvac/PzlmcvzzTJx//vmsWbNmxu+Lg86FSDbVe5IYBHIl3x8vQTQCvPDCCzU5+MDAACP5\nB+fM1Im8b2BggOeee+6Ejhc1nYuiBx54gJ/+9Kczes/+/fsBuPLKK2d8vHPPPZcPfOADM35fHOI8\nF/V8HiDd56LkmtlY/lrDxMREzQ5Ua2b2X4B3u/vVZvZGYIO7//E0+68EtsdWQBGR2eXN7r6jdEO9\n1yS+BbzdzCbbOT5ynP13AW8GngeORVkwEZFZpBE4k+AaOkVd1yRERCRZ9T66SUREEqQkISIiFSlJ\niIhIRUoSIiJSUb2PbkolM1sB3ObuFyVdlqSY2VzgHqADaAZudvd/TrRQCTGzOcBGwIBx4GPu/stk\nS5UcM1sCPA68zd2fTro8STKzn1KcMPysu3cnWZ4wShI1ZmZ/CVwJDCVdloRdAfS5+4fN7FTg50Am\nkwRwKTDh7ivNbBVwC8G6ZJmTv3n4MjCcdFmSZmbzANz9LUmXZTpqbqq9XwHvS7oQdeAfgA35eA5w\nJMGyJMrdvwNck/+2A9ifXGkSdztwN/AfSRekDiwH2szsYTP7Qb4Fou4oSdSYu38LOJp0OZLm7sPu\nfsjMcsA/Ap9OukxJcvdxM9sM/D2wJeHiJMLM1gD73P37QEPCxakHw8Dn3P2dwLXAlnzTZF2puwLJ\n7GFmLwUeAe519/+ddHmS5u5rgFcDXzWz+QkXJwkfIVhB4YfAHwJfz/dPZNXT5G8Y3P0ZoJ9g1nNd\nUZ9EdDJ9p2RmZwAPA9e5+w+TLk+SzOwK4Cx3v43g4VnHCDqwM8XdV03G+UTxp+6+L8EiJe1qoBO4\nzsxeQrCY6fPJFul3KUlEJ+vrndwAnAJsMLMbCc7Hxe4+mmyxEvFPwNfMbBvB39yfZ/Q8lMr63wfA\nJoLPxXaCm4ar43oMwkxo7SYREalIfRIiIlKRkoSIiFSkJCEiIhUpSYiISEVKEiIiUpGShIiIVKQk\nIVJjZvY3ZnZ+0uUQqQUlCZHaW0XwYHmR1NNkOpGTYGb/iWD9nVaCWbMPAp8iWF7hfcBi4O+A+cCp\nwKfc/Ztm9jVgEfCK/P4XAm8jWLKjx91vivc3EQmnmoTIyekG/tndzyO42B8CdgHd7r4buC4fvx74\nKHBjyXv73H0Z0EuwZMnrgPOBV5pZc5y/hEglWrtJ5OT8APimmZ0DfBf4IsFDhiYXeLwSeLeZXQa8\nEWgvee/O/L+/AYbNbEf+Z/y1u4/FUXiR41FNQuQkuPtjwO8DDwEfJHj6Xmkb7g7gDQSP67yZqasD\nH87/jGMECeSvgdOAH5vZKyMvvEgVlCREToKZfRb4sLt/A1gHnEPw0Km5+ce2vhK40d0fAt5JSIe2\nmf0hsA34kbt/CvglwfOwRRKnJCFycr4AvN/MfkawJPjHCJ6j8WWCC/1XgV/mH3i/GJiff+BQobbh\n7j8HHgN2m9njwLPA1lh/C5EKNLpJREQqUk1CREQqUpIQEZGKlCRERKQiJQkREalISUJERCpSkhAR\nkYqUJEREpCIlCRERqej/AwbzFS0Gcw8CAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.boxplot(x='stars',y='text length',data=yelp,palette='rainbow')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Create a countplot of the number of occurrences for each type of star rating.**" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAERCAYAAACO6FuTAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFgVJREFUeJzt3X2QX1Wd5/F3Hggk2In4kCCCsBMr39S6s2GDomhLUGGU\n7Fjg7A5MIQo+kCHFsuKWsBINrtREoVTcCbMTp6CFDLDlGkZhJRNgXCyme1I6gcXCDPglto66QKLE\nkJAHyOP+cW+WXzp9kk429/drut+vKop7zz3319/ugv70uefec8fs2bMHSZIGM7bTBUiShi9DQpJU\nZEhIkooMCUlSkSEhSSoyJCRJReOb/gIRMRV4BDgb2AXcDuwGVmfmFXWfy4B5wA5gUWYuj4hjgDuB\nqcAm4JLMXN90vZKklzU6koiI8cA3gK11003AgsycA4yNiPMiYhpwJXAG8AHgyxFxFDAfeDwzzwTu\nABY2WaskaX9NX276KrAEeAYYA8zOzN762ArgHOB0oC8zd2bmJmANMAvoBu5v6Xt2w7VKkgZoLCQi\n4lLgN5n5d1QBMfDrvQBMBrqAjS3tm4EpA9r39pUktVGTcxIfA3ZHxDlUI4O/Bl7fcrwLeJ5qvmHy\ngPYNdXvXgL4HFBFHA28DnqWa/5AkHdw44A3Aqsx8qfVAYyFRzzsAEBEPAZcDX4mIMzPz74FzgYeA\nVcCiiJgATARmAquBlcBcqknvuUAvB/e2IfaTJO3v3UBfa0PjdzcN8Bnglnpi+kng7szcExGL68LG\nUE1sb4+IJcDSiOgFXgIuGsLnPwtw1113cfzxxzfzHUjSCLN27Vo+/OEPQ/07tFVbQiIz39uye9Yg\nx3uAngFt24ALDvFL7QI4/vjjOfHEEw/xVEka9fa7TO/DdJKkIkNCklRkSEiSigwJSVKRISFJKjIk\nJElFhoQkqciQkCQVGRKSpCJDQpJUZEhIkooMCUlSkSEhSSoyJCRJRYaEJKnIkJAkFRkSkqQiQ0KS\nVGRISJKKDAlJUtH4Jj88IsYCtwAB7AYuByYA9wFP1d2WZOayiLgMmAfsABZl5vKIOAa4E5gKbAIu\nycz1TdYsSXpZoyEBfBDYk5ndETEH+BLwPeBrmfn1vZ0iYhpwJTAbmAT0RcSDwHzg8cy8PiIuBBYC\nVzVcsySp1mhIZOa9EfG9evcUYANwGhARcT7VaOLTwOlAX2buBDZFxBpgFtAN3Fifv4IqJCQdYbt2\n7aK/v7/TZTRi+vTpjBs3rtNlvGI1PZIgM3dHxO3A+cC/B94I3JKZj0XEtcAXgB8DG1tO2wxMAbpa\n2l8AJjddrzQa9ff388D/+gonvPE1nS7liHrm6d/xfq5mxowZnS7lFavxkADIzEsjYirwj8AZmfls\nfegeYDHwMPsGQBfVqGNTvb237fl21CuNRie88TWcfMrrO12GhplG726KiIsj4rP17otUk9ffiYi3\n1W3vAx4FVgHdETEhIqYAM4HVwEpgbt13LtDbZL2SpH01PZL4DnBbRDxcf61PAb8G/iIitgNrgXmZ\nuTkiFgN9wBhgQWZuj4glwNKI6AVeAi5quF5JUoumJ663AhcOcqh7kL49QM+Atm3ABc1UJ0k6GB+m\nkyQVGRKSpCJDQpJUZEhIkooMCUlSUVseppOkVwqXKNmXISFJLfr7+3n0hj/lTccd2+lSjqhfbdgC\nn/2rQ16ixJCQpAHedNyxTH+9S8WBcxKSpAMwJCRJRYaEJKnIkJAkFRkSkqQiQ0KSVGRISJKKDAlJ\nUpEhIUkqMiQkSUWGhCSpyJCQJBU1usBfRIwFbgEC2A1cDrwE3F7vr87MK+q+lwHzgB3AosxcHhHH\nAHcCU4FNwCWZub7JmiVJL2t6JPFBYE9mdgMLgS8BNwELMnMOMDYizouIacCVwBnAB4AvR8RRwHzg\n8cw8E7ij/gxJUps0GhKZeS/V6ADgZGADMDsze+u2FcA5wOlAX2buzMxNwBpgFtAN3N/S9+wm65Uk\n7avxOYnM3B0RtwOLgf8OjGk5/AIwGegCNra0bwamDGjf21eS1CZtmbjOzEuBGcCtwMSWQ13A81Tz\nDZMHtG+o27sG9JUktUmjIRERF0fEZ+vdF4FdwCMRMaduOxfoBVYB3RExISKmADOB1cBKYG7dd27d\nV5LUJk2/vvQ7wG0R8XD9tf4j8FPg1npi+kng7szcExGLgT6qy1ELMnN7RCwBlkZEL9VdURc1XK8k\nqUWjIZGZW4ELBzl01iB9e4CeAW3bgAsaKU6SdFA+TCdJKjIkJElFhoQkqciQkCQVGRKSpCJDQpJU\nZEhIkooMCUlSkSEhSSoyJCRJRYaEJKnIkJAkFRkSkqQiQ0KSVGRISJKKDAlJUpEhIUkqMiQkSUWG\nhCSpyJCQJBWNb+qDI2I88E3gFGACsAj4NXAf8FTdbUlmLouIy4B5wA5gUWYuj4hjgDuBqcAm4JLM\nXN9UvZKk/TUWEsDFwHOZ+dGIOA74MfBF4GuZ+fW9nSJiGnAlMBuYBPRFxIPAfODxzLw+Ii4EFgJX\nNVivJGmAJkPi28Cyenss1SjhNGBmRJxPNZr4NHA60JeZO4FNEbEGmAV0AzfW56+gCglJUhs1NieR\nmVszc0tEdFGFxeeBfwQ+k5lzgJ8DXwAmAxtbTt0MTAG6WtpfqPtJktqo0YnriDgJeAhYmpnfAu7J\nzMfqw/cAp1IFQWsAdAEbqOYhulranm+yVknS/hoLiXqu4QHgmsxcWjc/EBFvrbffBzwKrAK6I2JC\nREwBZgKrgZXA3LrvXKC3qVolSYNrck7iWuDVwMKIuA7YQzUH8V8jYjuwFpiXmZsjYjHQB4wBFmTm\n9ohYAiyNiF7gJeCiBmuVJA2isZDIzKsY/G6k7kH69gA9A9q2ARc0U50kaSh8mE6SVGRISJKKDAlJ\nUpEhIUkqMiQkSUWGhCSpyJCQJBUZEpKkIkNCklRkSEiSigwJSVKRISFJKjIkJElFQwqJiLh5kLal\ng/WVJI0cB1wqPCJuBX4PeGtEvKXl0FFUrxiVJI1gB3ufxJ8BpwB/DnyxpX0n8GRDNUmShokDhkRm\n/jPwz8CsiJhMNXoYUx9+FfC7JouTJHXWkN5MFxHXUr2OdH1L8x6qS1GSpBFqqK8v/SQwPTN/22Qx\nkqThZai3wP4KLy1J0qgz1JHEGqAvIn4AvLi3MTOvL50QEeOBb1JNfE8AFgFPALcDu4HVmXlF3fcy\nYB6wA1iUmcsj4hjgTmAqsAm4JDPXI0lqm6GOJJ4G7gdeopq43vvPgVwMPJeZZwIfAP4CuAlYkJlz\ngLERcV5ETAOuBM6o+305Io4C5gOP1+ffASw8pO9MkvT/bUgjicz84sF77efbwLJ6exzVbbOzM7O3\nblsB/AHVqKIvM3cCmyJiDTAL6AZubOlrSEhSmw317qbdVHcztXomM08qnZOZW+tzu6jC4nPAV1u6\nvABMBrqAjS3tm6lutW1t39tXktRGQ7rclJljM3NcZo4DjgH+hJdHCUURcRLwELA0M79FNWrYqwt4\nnmq+YfKA9g11e9eAvpKkNjrkBf4yc0dmLgPee6B+9VzDA8A1mbl3nafHIuLMevtcoBdYBXRHxISI\nmALMBFYDK4G5dd+5dV9JUhsN9XLTR1t2xwBvAbYf5LRrgVcDCyPiOqrLVZ8Cbq4npp8E7s7MPRGx\nGOirP3tBZm6PiCXA0ojopZowv+gQvi9J0hEw1Ftg39OyvQd4DrjwQCdk5lXAVYMcOmuQvj1Az4C2\nbcAFQ6xPktSAod7d9LH6r/+oz1ld340kSRrBhvo+idOoHqhbCtwG/Coi3t5kYZKkzhvq5abFwIWZ\n+SOAiHgHcDNwelOFSZI6b6h3N71qb0AAZOYPqW6FlSSNYEMNid9FxHl7dyLifPZdNlySNAIN9XLT\nPOC+iOihuk11D/DOxqqSJA0LQx1JnAtsBU6muh32twxyK6skaWQZakjMA96VmVsy83HgNKqVWyVJ\nI9hQQ+Io9n3Cejv7L/gnSRphhjoncQ/wUER8u97/I+DeZkqSJA0XQ10F9j9TPSsRwO8BizPT9ztI\n0gg31JEEmXk3cHeDtUiShplDXipckjR6GBKSpCJDQpJUZEhIkooMCUlSkSEhSSoyJCRJRYaEJKlo\nyA/THa76Nac3ZOZ7IuJU4D7gqfrwksxcFhGXUS0iuANYlJnLI+IY4E5gKrAJuCQzfYeFJLVRoyER\nEVcDHwE2102nAV/LzK+39JlGtaLsbGAS0BcRDwLzgccz8/qIuBBYCFzVZL2SpH01PZL4GfAh4I56\n/zRgRv1mu6eAT1O9J7svM3cCmyJiDTAL6AZurM9bQRUSkqQ2anROIjO/C+xsafoRcHVmzgF+DnwB\nmAxsbOmzGZgCdLW0v1D3kyS1Ubsnru/JzMf2bgOnUgVBawB0ARuo5iG6Wtqeb1eRkqRK4xPXAzwQ\nEf8hMx8B3gc8CqwCFkXEBGAiMBNYDawE5gKP1P/ubXOtGuF27dpFf39/p8toxPTp0xk3blyny9AI\n0O6QmA/cHBHbgbXAvMzcHBGLgT5gDLAgM7dHxBJgaUT0Ai8BF7W5Vo1w/f393PSTZbz25GmdLuWI\nWv/Ldfwn/pgZM2Z0uhSNAI2HRGb+Enhnvf0Y1YT0wD49QM+Atm3ABU3Xp9HttSdPY+qbT+x0GdKw\n5cN0kqQiQ0KSVGRISJKKDAlJUpEhIUkqMiQkSUWGhCSpyJCQJBUZEpKkIkNCklRkSEiSigwJSVKR\nISFJKjIkJElFhoQkqciQkCQVGRKSpCJDQpJUZEhIkooMCUlS0fimv0BEvB24ITPfExHTgduB3cDq\nzLyi7nMZMA/YASzKzOURcQxwJzAV2ARckpnrm65XkvSyRkcSEXE1cAtwdN10E7AgM+cAYyPivIiY\nBlwJnAF8APhyRBwFzAcez8wzgTuAhU3WKknaX9OXm34GfKhl/7TM7K23VwDnAKcDfZm5MzM3AWuA\nWUA3cH9L37MbrlWSNECjIZGZ3wV2tjSNadl+AZgMdAEbW9o3A1MGtO/tK0lqo3ZPXO9u2e4Cnqea\nb5g8oH1D3d41oK8kqY3aHRL/OyLOrLfPBXqBVUB3REyIiCnATGA1sBKYW/edW/eVJLVRu0PiM8D1\nEfEPwFHA3Zm5DlgM9AHfp5rY3g4sAf5VRPQCnwS+2OZaJWnUa/wW2Mz8JfDOensNcNYgfXqAngFt\n24ALmq5vtNm1axf9/f2dLqMR06dPZ9y4cZ0uQxpRGg8JDS/9/f388d/+LRNPOKHTpRxR2555hmVz\n5zJjxoxOlyKNKIbEKDTxhBOY9KY3dboMSa8ALsshSSoyJCRJRYaEJKnIkJAkFRkSkqQiQ0KSVGRI\nSJKKRsVzEj5lLEmHZ1SERH9/P3+yaCUTjzup06UcUds2/JpvfQ6fMpbUmFEREgATjzuJY1/3Lzpd\nhiS9ojgnIUkqMiQkSUWGhCSpyJCQJBUZEpKkIkNCklRkSEiSijrynEREPApsrHd/AXwJuB3YDazO\nzCvqfpcB84AdwKLMXN7+aiVp9Gp7SETE0QCZ+d6WtnuBBZnZGxFLIuI84IfAlcBsYBLQFxEPZuaO\ndtcsSaNVJ0YSs4BjI+IBYBzwOWB2ZvbWx1cAf0A1qujLzJ3ApohYA/xr4NEO1CxJo1In5iS2Al/J\nzPcD84G7gDEtx18AJgNdvHxJCmAzMKVdRUqSOhMST1EFA5m5BlgPTGs53gU8D2yiCouB7ZKkNulE\nSHwc+BpARJxAFQQPRsSc+vi5QC+wCuiOiAkRMQWYCazuQL2SNGp1Yk6iB7gtInqp5h0upRpN3BoR\nRwFPAndn5p6IWAz0UV2OWpCZ2ztQrySNWm0PifrupIsHOXTWIH17qEJFktQBPkwnSSoyJCRJRYaE\nJKnIkJAkFRkSkqQiQ0KSVGRISJKKDAlJUpEhIUkqMiQkSUWGhCSpyJCQJBUZEpKkIkNCklRkSEiS\nigwJSVKRISFJKjIkJElFhoQkqciQkCQVje90AQcSEWOAvwRmAS8Cn8zMn3e2KkkaPYb7SOJ84OjM\nfCdwLXBTh+uRpFFluIdEN3A/QGb+CHhrZ8uRpNFluIfEZGBjy/7OiBjuNUvSiDGs5ySATUBXy/7Y\nzNx9gP7jANauXbtP47p169jy7E/YtXX9ka+wg17c+Azr1k1g0qRJQz5n3bp1bPnpT9m1YUODlbXf\ni+vWse4Nbzjkn8X/yX62/HbjwTu/gmx4+jnWRRzyz+KnTzzDht9tabCy9lu3diNTj1t3yD+Lnzy7\nkee27miwsvZ7euNWfn/d4D+Llt+Z4wYeG7Nnz56GSzt8EfFHwB9m5scj4h3Awsz8twfo3w30tq1A\nSRpZ3p2Zfa0Nw30k8V3gnIj4h3r/Ywfpvwp4N/AssKvJwiRpBBkHvIHqd+g+hvVIQpLUWU4CS5KK\nDAlJUpEhIUkqMiQkSUXD/e6mV5yIeDtwQ2a+p9O1dFJEjAe+CZwCTAAWZeb3OlpUh9QPgN4CBLAb\nuDwzn+hsVZ0TEVOBR4CzM/OpTtfTSRHxKC8/MPyLzPxEJ+sZjCFxBEXE1cBHgM2drmUYuBh4LjM/\nGhHHAT8GRmVIAB8E9mRmd0TMAb5EtS7ZqFP/8fANYGuna+m0iDgaIDPf2+laDsTLTUfWz4APdbqI\nYeLbwMJ6eywwsh5fPQSZeS8wr949BRhZj7sfmq8CS4BnOl3IMDALODYiHoiI79dXIYYdQ+IIyszv\nAjs7XcdwkJlbM3NLRHQBy4DPdbqmTsrM3RFxO/DnwF0dLqcjIuJS4DeZ+XfAmA6XMxxsBb6Sme8H\n5gN3Dce16YZdQRo5IuIk4CFgaWb+j07X02mZeSkwA7g1IiZ2uJxO+BjVCgo/AE4F/rqenxitnqL+\ngyEz1wDrqZ56Hlack2jGqP8rKSKmAQ8AV2TmDzpdTydFxMXAiZl5A9XLs3ZRTWCPKpk5Z+92HRR/\nmpm/6WBJnfZx4PeBKyLiBKrFTJ/tbEn7MySa4Von1UuiXg0sjIjrqH4m52bmS50tqyO+A9wWEQ9T\n/T/3qVH6c2jl/yPQQ/XfRS/VHw0fP8gq1x3h2k2SpCLnJCRJRYaEJKnIkJAkFRkSkqQiQ0KSVGRI\nSJKKDAnpCIqI/xIR7+p0HdKRYkhIR9YcqpfKSyOCD9NJhyki3ki19s4kqidmlwPXUC2t8CHgdcCf\nAROB44BrMvNvIuI24LXA9Lr/WcDZVMt1/M/MvL6934lU5khCOnyfAL6XmadT/bLfAqwCPpGZ/wRc\nUW+/FfgkcF3Luc9l5luAn1AtV/JvgHcBb46ICe38JqQDce0m6fB9H/ibiJgN3Af8N6oXDO1d4PEj\nwB9GxAXAO4BXtZz7o/rfTwNbI6Kv/ozPZ+b2dhQvDYUjCekwZeZK4F8C9wMXUr15r/X6bR/wNqpX\ndS5i39WBt9WfsYsqQD4PvAb4YUS8ufHipSEyJKTDFBE3Ah/NzDuAK4HZVC+dGl+/svXNwHWZeT/w\nfgaZ0I6IU4GHgb/PzGuAJ6jehS0NC4aEdPhuBv5dRDxGtRz45VTv0PgG1S/6W4En6pfdvw6YWL9s\n6P+NNjLzx8BK4J8i4hHgF8CKtn4X0gF4d5MkqciRhCSpyJCQJBUZEpKkIkNCklRkSEiSigwJSVKR\nISFJKjIkJElF/xcZofnjpeOA5AAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.countplot(x='stars',data=yelp,palette='rainbow')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Use groupby to get the mean values of the numerical columns, you should be able to create this dataframe with the operation:**" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coolusefulfunnytext length
stars
10.5767691.6048061.056075826.515354
20.7195251.5631070.875944842.256742
30.7885011.3066390.694730758.498289
40.9546231.3959160.670448712.923142
50.9442611.3817800.608631624.999101
\n", "
" ], "text/plain": [ " cool useful funny text length\n", "stars \n", "1 0.576769 1.604806 1.056075 826.515354\n", "2 0.719525 1.563107 0.875944 842.256742\n", "3 0.788501 1.306639 0.694730 758.498289\n", "4 0.954623 1.395916 0.670448 712.923142\n", "5 0.944261 1.381780 0.608631 624.999101" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stars = yelp.groupby('stars').mean()\n", "stars" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Use the corr() method on that groupby dataframe to produce this dataframe:**" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coolusefulfunnytext length
cool1.000000-0.743329-0.944939-0.857664
useful-0.7433291.0000000.8945060.699881
funny-0.9449390.8945061.0000000.843461
text length-0.8576640.6998810.8434611.000000
\n", "
" ], "text/plain": [ " cool useful funny text length\n", "cool 1.000000 -0.743329 -0.944939 -0.857664\n", "useful -0.743329 1.000000 0.894506 0.699881\n", "funny -0.944939 0.894506 1.000000 0.843461\n", "text length -0.857664 0.699881 0.843461 1.000000" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stars.corr()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Then use seaborn to create a heatmap based off that .corr() dataframe:**" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWEAAAD9CAYAAABtLMZbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl4XHW5wPHvmSX71iZ0SVK60PJ2k1aWglLZhCKLioqy\nqCACIqAIKIvcWwRbrooriLKDRe69elUWBUQeWSy7UJaWlr7dIC1dQ9oknWSyzHL/ONM06ZZJOjNn\nZvp+nmeeyVnmzHsmM++88zu/8ztOPB7HGGOMN3xeB2CMMfsyS8LGGOMhS8LGGOMhS8LGGOMhS8LG\nGOMhS8LGGOOhQDo3/nhQrP9bwkPXv+B1CFmjYUmD1yFkjfLqSq9DyBoP3zbB2dttDCTnnNKte/18\nqZDWJGyMMZnkBLMirw6IJWFjTN7wBSwJG2OMZ5xg7h3msiRsjMkb/mJLwsYY4xlrjjDGGA/ZgTlj\njPGQVcLGGOMhx29J2BhjPOOzJGyMMd5xfJaEjTHGM/4Cv9chDJglYWNM3rBK2BhjPGRtwsYY4yHr\nHWGMMR5yfHbasjHGeMbahI0xxkPWJmyMMR7yBVLTRU1EHOC3wDSgA7hAVVf1Wv5l4EogAtyvqncM\n9rlyrwHFGGN2w/E5Sd/6cRpQqKofB74P/GKH5T8FjgNmAt8VkUFfp8qSsDEmb/j8TtK3fswEngRQ\n1VeBQ3dY/jYwBChOTA/6epp7bI4QkZd3sXEHiCe+IYwxJmuk8MBcBdDSazoiIj5VjSWmFwMLgBDw\nkKq2DvaJ+msTPnOwGzbGmExLYRe1VqC813RPAhaRjwCnAKOBNuC/ReQLqvqXwTzRHiNW1QZVbQCi\nwM+AJ4Bf4VbDxhiTVVLYJvwicDKAiBwBLOq1rAVoBzpVNQ5swm2aGJRke0fcDdwOzAeOAe4FPjnY\nJ82kqhkHMfGm7/HKCed4HUraNX3wAmsWzcPxBRg+7mRGTPh0n+WrXr+Vti3LAYeucBOBgnKmfWr7\nQd0Vr9xMoLCSMR+9KMORp164+Q1aNzwMjp/S6qMpqzm2z/JIZyObG9x99xfUMHT/83F8BT3LN6++\nF5+/jKq6MzIadzpsbXyVxlV/wHECVNUez5D6E/ss7+5oZO07PwfAHyynbupV+PwFhFuWsXHZvQAE\nCodQN/W7OL5gxuMfiBQ2RzwMnCAiLyamzxORs4BSVb1HRO4CXhCRTmAl8LvBPlGySbhIVf+a+PsR\nEblysE+YSeOuPJ+6r3yWaKjN61DSLh6L8N6C25h+0r34AoUs/MfFVI+aSbBo+xf0uEMv61l34VOX\nMuGIa3qWrV/2CG0t71E5bHrGY0+1eDxK89oHGT5xLo6vgE16I8WVh+APVvSs07z2fyirOZ6SoR8j\n9OFzbN34BBUjTwMg1Pg03eE1FJZN8moXUiYei7JR72HsEbfg8xXw3mtXUT7sCAIF2w/mNzU8QsXw\noxg66mQ2rfg9zeueYuioU1n/7m3UH3QdBSUj2LL2KbrCmygsrfNwb/qXqi5qiQr34h1mL+u1/E7g\nzlQ8V7INKIFEO8i29pBBHwnMpLaVDSw4/VKvw8iI9pYGisvrCRSU4vMFqNjvIFo2vb3Lddct/TNV\nI2dQUjUWgNbGdwg1LWXEhM9kMuS06e5YS6BwBD5/CY4ToKBM6Awt3WmdosppABSWHUhnm/v56gwt\np6t9FWU1OfFDr1+dbWsoKKnFHyjB8QUoqZpM+5Z3+qxTVD6OaCQEQCzSjuME6Gxbiz9YTtPqh3n/\n9WuJdoeyPgFDSntHZEyySfgy4D4R+QC3KeI76QspdTY++k/ikajXYWREpDuEP1jaM+0PlhDpCu20\nXiwWYcOKv1I3+SwAusJNrF54HwfMuCJHvlr7F4+G8flLeqZ9viJi0fY+6xQUjyHcvACAcPMC4rFO\not3NtG54iCGjziVfXoxYpA1fYPv7whcoIRrp+8swWFTDltWPsfKlSwg1LaBi+Eyi3a20Ny9l6KjP\nMPrgm2jb/BZtmxdmOvwBS2GbcMYk1Ryhqm+KyKeAA4BVqvphesMyyWp4625aGxfS1ryK8urJPfOj\n3e0ECsp3Wr95/WtUDptOIOgmqQ8bniXS2criZ66iK9xELNpJceX+DB93Usb2IVVa1v2JzpDSHV5D\nQen4nvmxWAdBf2mfdavqz2bLmt/R1jSfosrp+ALltG/5N9FIiMYVPyXa3Uw83kWwqJbS6k9kelf2\n2qYVv6e9eQmdoQaKKw/smR+LtOMPlPVZd+Oy+6ideiVl1dPZ2vgaa9/5OcMPPJ+CkpE91W9Z9SGE\nW1dQOvSgjO7HQOXtAD4i8iVgDrAE+IiI3KCqD6Y1slRysudbL9VGT78QcNt53/jbV4l0bcXnL6J1\n01vUJ6rd3po3vM6QuiN6pmsnnk7txNMB2Ljy74RbV+dkAgaorP0i4LYJb1hyDbFIG46vkM7QUiqG\nn9Jn3Y7WRVTWnkmwaARbNz5BUflUyvY7nvJhswBoa5pPd8f6nEzAAMPGfxVw24RXvnwJ0e4QPn8h\n7VsWUz3mC33W9QfL8Qfccw4ChUOJRtooKB5BLNpBV/sGCkpG0N68mKq6WRnfj4HKpgo3WckemLsC\nOERVQyJSDjwD5E4SjufHT8s9cXwBxh7ybd55+kqIxxk+/tMUlNQQ6Wxl+as3M+mouQCEW9fkbJJN\nluP4qar/Mo0rfkwcKKs+Bn9wCLFIG5tX303NuMsJFNXS9P5vcJwgweI6how6z+uw08Lx+Rl+4AU0\nvDEbgKq6WQQLhxLtDrFuya2MmnYdI+Qi1uvtEHfPQxg58Zs4vgC1ky/jg0U3A1BSNYnymh1PGss+\nuZiEnXgSCUpEXlTVI3tNP6+q/ZYIjwcl/7Nfkh66/gWvQ8gaDUsavA4ha5RXD3rIgbzz8G0T9jqD\nrrnkC0nnnFG//UtWZOxkK+FVIvJz3H7Cn8DtF2eMMVklb9uEcfvDHQ2cAJwFnLjn1Y0xxgM5ePwn\n2a+NXwJ/UNVvAYex87BuxhjjuVzsopZsEu5W1ZUAiYGNY/2sb4wxGef4fEnfskWyzRENIvJfwMvA\nDGBt+kIyxpjByaYKN1nJfh2chztS0MlAI/D1tEVkjDGDlLeVsKp24A5haYwxWcsXyJ7kmiy70Kcx\nJn9kUYWbLEvCxpi84eRgFzVLwsaYvJFNbb3JsiRsjMkbudg7wpKwMSZ/WCVsjDHesUrYGGM85PhT\nc425TLIkbIzJG3ZgzhhjPGTNEcYY4yXHKmFjjPGMVcLGGOMlaxM2xhjv2GnLxhjjISdgXdSMMcY7\ndmDOGGM8ZAfmjDHGO45Vwn09dP0L6dx8Tvn8D2d6HULWqD640usQskZ9zRivQ8gi/7f3m7BK2Bhj\nvGOnLRtjjJesi5oxxnjIRlEzxhjvWHOEMcZ4yXpHGGOMh6x3hDHGeCdV/YRFxAF+C0wDOoALVHXV\nLta7E2hS1esG+1y5V7sbY8zu+Jzkb3t2GlCoqh8Hvg/8YscVROQiYOpeh7y3GzDGmKzh8yd/27OZ\nwJMAqvoqcGjvhSLyMeAw4M69DnlvN2CMMVnD50v+tmcVQEuv6YiI+ABEZATwA+BbwF43QlubsDEm\nf6Sud0QrUN5r2qeqscTfXwSqgSeAkUCxiCxV1QcG80SWhI0x+SN1vSNeBE4F/iwiRwCLti1Q1V8D\nvwYQkXMBGWwCBkvCxph8krpK+GHgBBF5MTF9noicBZSq6j2pehKwJGyMyScpGjtCVePAxTvMXraL\n9ebt7XNZEjbG5I98O21ZRGbtbpmqPpX6cIwxZi/03/Us6/RXCZ+1m/lxwJKwMSa75FslrKrnZSoQ\nY4zZa/k6nrCIrMetfh1gKLBKVSelMzBjjBmwfB1FTVVHbvtbREYDN6QrIGOMGbQcrIQH/LWhqg3A\nxDTEYowxeyd1py1nTLLNEf+L2xwB7ml6G9MWkTHGDFI8Byvh/rqoHaWq84F5QDgxuwN4Pd2BJavp\ngxdYs2geji/A8HEnM2LCp/ssX/X6rbRtWQ44dIWbCBSUM+1Td/QsX/HKzQQKKxnz0YsyHHnmVc04\niIk3fY9XTjjH61DSbuzV11AyYQKxri5W3TSXznXrepZVn3giI88+GyJRNj32GJsefggnEOCA2ddT\nWFdHNBTivZ/eTOfatR7uQerF43FueXctK7eGKfD5+O6UempLCgHY3NnN3IUNODjEibNyawcXHjiS\nU+urPY56gHy5d+pDfxHfKiJHAtcCJ7B9xCA/EE1nYMmIxyK8t+A2pp90L75AIQv/cTHVo2YSLBrS\ns864Qy/rWXfhU5cy4YhrepatX/YIbS3vUTlsesZjz7RxV55P3Vc+SzTU5nUoaTfk6GNwCoIsvvAC\nyqZMYfTlV7Ds6qt6lo/+9mW8fcaXiHV0cNAf/0jTU/+g5qSTiLa3s/iC8ykatT9jr7qapZd/x8O9\nSL0XN7XSHYvx68Mn8G5zG7frOuZ8dCwAQwuD/OKw8QAsaW7j/hUbOKVuqJfhDkouVsL9NYz8A1gI\nHA5or9vSNMeVlPaWBorL6wkUlOLzBajY7yBaNr29y3XXLf0zVSNnUFLlvulaG98h1LSUERM+k8mQ\nPdO2soEFp1/qdRgZUT5tGs0vvwJAaPFiSif1PYTRvnw5gYoKfEVFPfOKx46l+eWXAOhYs5riMWMy\nFm+mLGpu47CaCgAmVZWyrDW8y/VuW7qWyyfX4+RgQsPxJX/LEv31E74GuEZEZqvqnAzFlLRIdwh/\nsLRn2h8sIdIV2mm9WCzChhV/ZdpJ7rgbXeEmVi+8j8nH/IjG95/JWLxe2vjoPynev9brMDLCX1pK\nNNTrfRCJukfN4+5hjfb3VjF13gPE2tvZ/NxzRNvaaF+2jKojZ7Jl/nzKpk4luN9+HkWfPu2RKKWB\n7cnH70AsHsfXK9m+tKmFMWVF1CWaKXJODn5xJNuAcr+IPAgMA/4ELEyMNu+JhrfuprVxIW3Nqyiv\nntwzP9rdTqCgfKf1m9e/RuWw6QSCJQB82PAskc5WFj9zFV3hJmLRToor92f4uJMytg8mfaJtbfhL\nS7bP8Pl6EnDxAQcw5ONH8uZnP0MsHGb8D+cw9Nhj2fS3vzF6zFgm33EnWxe+TdvSdz2KPn1KAn7a\nI7Ge6Tj0ScAAT6/fwudH5/AXUBb1ekhWskn4TuDnwGxg24G6I9IVVH9GT78QcNt53/jbV4l0bcXn\nL6J101vUT975TOvmDa8zpG57uLUTT6d24ukAbFz5d8Ktq/edBJyDlcJAbV34NkNmzmTzM89QNnUq\n7StX9CyLhkJEOzuId3UB0L1lM/7yCsomT6bl9ddouOVXlE6cSOGIkbvbfM6aWlXCK42tHD2iiiXN\nbYwtK9ppHW0NM6WqdBePzg252CacbBIuVtVnROQ/VVVFpCOtUSXJ8QUYe8i3eefpKyEeZ/j4T1NQ\nUkOks5Xlr97MpKPmAhBuXbPvJNn+xOP9r5Pjtjz3HFUzDmfKXXcDsHLOHKpnzcJXVEzjXx9l0yOP\nMOWuu4l1d9Ox9gMaH3+MQGkZ4+deRN3XziOytZVVc2/yeC9Sb+awShY0hbjs38sBuGrK/jyzfgvh\naIxT6qtp6YpQGsi9AXD6yKK23mQ58SQ+lCLyBHALcB1wDXCDqn6qv8edP6cx/z/xSfr8D2d6HULW\nqD640usQskb9jDFeh5A16n/9f3tdxoZe/VvSOafs8E9nRdmcbCX8DeBnQA3wPXYe7NgYYzwXz8FK\nONmxIz4QkS/j9hP+GJBfvdiNMfkhX9uEReRXwLvAaOBg3NOWz01jXMYYM3A5WAknG/Fhqnon8LFE\nW3B9GmMyxphBiTtO0rdskWybsF9EDgHeF5ECYOfOuMYY47UcrISTTcLzgN8C5wE/Ae7Y8+rGGJN5\ncbKnwk1Wskn46sT9Y7gH544F7ktLRMYYM0jxPBxFbZttI6A4wCHA6ekJxxhjBi+b2nqTlWwXtc5e\nky+KyI/SFI8xxgxa3vYTTiTd3lfWiO1hdWOM8Ua+VsL0HT/4beDJNMRijDF7JW8rYVWdl+5AjDFm\nb+Vz7whjjMl6cV/ujQJnSdgYkzfiSZ8EnD0sCRtj8kbedlEzxphckLcH5owxJhfYgTljjPGQVcLG\nGOMhaxM2xhgPxRzromaMMZ5JVZuwiDi4w/dOAzqAC1R1Va/lnwZmA93A/ap6z2CfK/caUIwxZjfi\nji/pWz9OAwpV9ePA94FfbFsgIoHE9PHAMcA3RGS/wcZsSdgYkzfiOEnf+jGTxBg5qvoqcGivZZOA\n5araqqrdwAvAUYON2ZKwMSZvpLASrgBaek1HRMS3m2VbgcrBxmxtwsaYvJHCfsKt9L2Wpk9VY72W\nVfRaVg40D/aJ0pqEG5Y0pHPzOaX64EF/Ueadpjda+l9pH1E8ZIPXIWSNVFzCPYVd1F4ETgX+LCJH\nAIt6LXsXGC8iVUA7blPETwf7RFYJG2PyRiyesi5qDwMniMiLienzROQsoFRV7xGRK4GncC/5do+q\nrh/sE1kSNsbkjVQ1R6hqHLh4h9nLei1/HHg8Fc9lSdgYkzds7AhjjPGQJWFjjPGQJWFjjPFQPG5J\n2BhjPGOVsDHGeCiWgycBWxI2xuQNa44wxhgPxaw5whhjvGNtwsYY4yFrjjDGGA9ZJWyMMR6yStgY\nYzwUi1sXNWOM8Uys/1WyTr9fGyLyvb25iJ0xxmRKPO4kfcsWyVTCIeBhEdkA3As8mRhr0xhjskou\nHpjrtxJW1TtUdSbwA+CrQIOI3CAiQ9IenTHGDEBeVsKJ6yidCZyDezG77wB+4DHgyLRGZ4wxA5CL\nlXAyzRGvAQ8CZ6rq6m0zReSjaYvKGGMGIZpFFW6ykknCB+6qDVhV/yMN8RhjzKBlUzNDspJJwteK\nyDW4l3Z2gLiq1qY3rOSFm9+gdcPD4PgprT6asppj+yyPdDayueEOAPwFNQzd/3wcX0HP8s2r78Xn\nL6Oq7oyMxp0OY6++hpIJE4h1dbHqprl0rlvXs6z6xBMZefbZEImy6bHH2PTwQziBAAfMvp7Cujqi\noRDv/fRmOteu9XAPMqdqxkFMvOl7vHLCOV6HklbxeJx7mrbS0BUh6MA3ayoYHtz+sX8+FOaxlnb8\nDhxTVsysipKeZS3RGNeubWL2yCHUBnOjN2s8B7sMJPPKngnUqmp7uoMZqHg8SvPaBxk+cS6Or4BN\neiPFlYfgD1b0rNO89n8oqzmekqEfI/Thc2zd+AQVI08DINT4NN3hNRSWTfJqF1JmyNHH4BQEWXzh\nBZRNmcLoy69g2dVX9Swf/e3LePuMLxHr6OCgP/6Rpqf+Qc1JJxFtb2fxBedTNGp/xl51NUsv/46H\ne5EZ4648n7qvfJZoqM3rUNLutfZOuuNx5tYOZXlHN/M2h7h6eFXP8gc3h/hlfTUFjsOVHzQxs6yI\nEp+PaDzO3R+2UujLrcoyF0dRS+b0kveAcLoDGYzujrUECkfg85fgOAEKyoTO0NKd1imqnAZAYdmB\ndLa5V63uDC2nq30VZTWfzHjc6VA+bRrNL78CQGjxYkonTeyzvH35cgIVFfiKinrmFY8dS/PLLwHQ\nsWY1xWPGZCxeL7WtbGDB6Zd6HUZGLO3oZnpxIQATioKs6uzus3x0QYBQNE7XDiXk7zeHmFVRzBB/\nbp2Blpe9I4ACYJGILEpMx1X17DTGlLR4NIzPv/3nk89XRCzat2AvKB5DuHkBpdWfINy8gHisk2h3\nM60bHqJm3BW0b3kl02Gnhb+0lGgotH1GJAqO0/P7rP29VUyd9wCx9nY2P/cc0bY22pcto+rImWyZ\nP5+yqVMJ7rdvnJOz8dF/Urx/1rSopVV7LE5Jr2rW70AsHsfnuPPqgwGuXddEkeMwo9Stgp/bGqbC\n7+Og4kIebs6tXwv52hzxk7RHMUAt6/5EZ0jpDq+hoHR8z/xYrIOgv7TPulX1Z7Nlze9oa5pPUeV0\nfIFy2rf8m2gkROOKnxLtbiYe7yJYVEtp9ScyvSspE21rw1+6/QsJn6/nHVl8wAEM+fiRvPnZzxAL\nhxn/wzkMPfZYNv3tb4weM5bJd9zJ1oVv07b0XY+iN+lS4nPo6JWZYnF6EvDqrm7eDHfy21E1FDoO\ntza28kpbB8+GwviAReFO3u+K8JvGVq4eXkVlDlTF+dpF7Q3gJKCovxUzpbL2i4DbJrxhyTXEIm04\nvkI6Q0upGH5Kn3U7WhdRWXsmwaIRbN34BEXlUynb73jKh80CoK1pPt0d63M6AQNsXfg2Q2bOZPMz\nz1A2dSrtK1f0LIuGQkQ7O4h3dQHQvWUz/vIKyiZPpuX112i45VeUTpxI4YiRXoXvDSf3PrADJUVB\nFrR3ckRpEcs6uti/YPtHvsTno8BxCDgOjuNQ6ffRFotx48ihPevcuH4zF9ZU5EQCBojGcu9/mkwS\nfhRYB6xJTGdNwe84fqrqv0zjih8TB8qqj8EfHEIs0sbm1XdTM+5yAkW1NL3/GxwnSLC4jiGjzvM6\n7LTY8txzVM04nCl33Q3AyjlzqJ41C19RMY1/fZRNjzzClLvuJtbdTcfaD2h8/DECpWWMn3sRdV87\nj8jWVlbNvcnjvciwXPztOkAzSgpZGO5i9rrNAFy8XwUvhMJ0xuN8sryE48uLuX79ZoI4DA/6Oaas\nzOOI904u/kudeD9Ri8hzqnrMYDZ+/Fmv5+BLkh5zV13idQhZo+mNFq9DyBr1nxzudQhZY9qT8/e6\njH3sjUjSOefUgwNZUTYnUwkvFJHDgbdIVMGq2pXWqIwxZhBysRJOJgkfDXy613QcGJeecIwxZvCy\nqetZsvpNwqo6LROBGGPM3orlYyUsIs+yw8E4VT0ubREZY8wg5WtzxDcT9w5wCDA9feEYY8zg5eUo\naqqqvSaXisj5aYzHGGMGLa8qYRGpVNUWEflGr9m1QG53JDTG5K28SsLA48BM4GDckzXAHc7yi+kO\nyhhjBiOWZ80R3SLyGjAB6D2owGnAx9MalTHGDEK+VcLHA3XA7YCd7mWMyXrpTMIiUoR7qbdhQCtw\nrqo27WI9B7cl4RFVvau/7e42CatqFFgNnLK7dYwxJpukuZ/wxcBCVf2hiJwBzAYu38V6c4GqXczf\npdwYGskYY5IQizlJ3wZhJvBk4u+/47YW9CEiXwCivdbrV25cOMoYY5KQqkpYRL4OXMH2E9UcYAOw\nbfSprUDFDo+ZApwNnA5cn+xzWRI2xuSNVLUJq+p9wH2954nIX4DyxGQ50LzDw87B7cb7DDAG6BSR\n91X1qT09lyVhY0zeSHPviBeBk4HXE/fP916oqtds+1tEfgCs7y8BgyVhY0weSfOBuduBeSLyPNCJ\n2/SAiFwBLFfVxwazUUvCxpi8kc5KWFXDwJd2Mf+Xu5h3Y7LbtSRsjMkbsZjXEQycJWFjTN6wJGyM\nMR7Ky0HdjTEmV/R34eK+smOwH0vCxpi8kW8D+BhjTE6xNmFjjPGQVcI7KK+uTOfmc0p9zRivQ8ga\nxUM2eB1C1vjg6Y1eh5A1UnFZ96hVwsYY4534gLpH2IE5Y4xJKeuiZowxHrI2YWOM8VAsB0thS8LG\nmLxhlbAxxngoL5OwiJyCe7Xl4m3zVPW4dAZljDGDEY3mXhZOphKeg3utJevcaYzJagMbOyI7JJOE\nN6vqv9IeiTHG7KW8Om1ZRL6R+LNLRO4CFpC48qiq3pWB2IwxZkDyrRIembh/NXE/InGfe3tpjNkn\n5GAPtd0n4W3XSBKR/1TVudvmi8iPMhGYMcYM1MBOW84Oe2qOOB+4AJgkIicnZvuBIPD9DMRmjDED\nkoOtEXtsjngQeBq4DrgpMS8GbEp3UMYYMxjRHBxGbU/NEZ3A+yLyInB0r0XdIrJGVV9Ie3TGGDMA\n8dzLwUl1UTsDKAVeAmYARUBURBao6hXpDM4YYwYiloPtEb4k1gkCx6rq94ETgK2qehRweFojM8aY\nAYrH40nfskUylXA1biLuTNwPTcwvTFdQxhgzGPk6itpvgIUishiYCNwsItcBT6Y1MmOMGaAsKnCT\n1m8SVtV7ReQRYDywQlWbRMSvqtH0h2eMMcnLq37C24jIdOAbuAfkEBFU9evpDswYYwYqr7qo9fI7\n4DZgTXpDMcaYvZOXlTCwQVXvSXskxhizl3IwByeVhN8XkWuBN9k+itpTaY1qALY2vkrjqj/gOAGq\nao9nSP2JfZZ3dzSy9p2fA+APllM39Sp8/gLCLcvYuOxeAAKFQ6ib+l0cXzDj8adDPB7nlnfXsnJr\nmAKfj+9Oqae2xO3Msrmzm7kLG3BwiBNn5dYOLjxwJKfWV3scderE43HuadpKQ1eEoAPfrKlgeHD7\nW/35UJjHWtrxO3BMWTGzKkp6lrVEY1y7tonZI4dQG9w3LjxTNeMgJt70PV454RyvQ9lr+VoJFwKS\nuIGbiLMiCcdjUTbqPYw94hZ8vgLee+0qyocdQaCgsmedpoZHqBh+FENHncymFb+ned1TDB11Kuvf\nvY36g66joGQEW9Y+RVd4E4WldR7uTeq8uKmV7liMXx8+gXeb27hd1zHno2MBGFoY5BeHjQdgSXMb\n96/YwCl1Q/e0uZzzWnsn3fE4c2uHsryjm3mbQ1w9vKpn+YObQ/yyvpoCx+HKD5qYWVZEic9HNB7n\n7g9bKfQ5HkafWeOuPJ+6r3yWaKjN61BSIpv6/yar35M1VPU84EfAn4DZuIP6ZIXOtjUUlNTiD5Tg\n+AKUVE2mfcs7fdYpKh9HNBICIBZpx3ECdLatxR8sp2n1w7z/+rVEu0N5k4ABFjW3cVhNBQCTqkpZ\n1hre5Xq3LV3L5ZPrcZz8SjpLO7qZXuxW/hOKgqzq7O6zfHRBgFA0TtcOH9jfbw4xq6KYIf5kzmHK\nD20rG1hw+qVeh5EysVg86Vu2SKZ3xLeAz+GepPE7YALwrfSGlZxYpA1foLRn2hcoIRrp+40eLKph\n0/J5tK5/jng8wn4HfJnOtjW0Ny9lxMRLKCgeweq3bqS4YjylQw/K9C6kRXskSmlgeyLxO+7pnL5e\nyfalTS2MKSuiriT/zrlpj8Up6VXN7rj/9cEA165roshxmFHqVsHPbQ1T4fdxUHEhDzfnR1WYjI2P\n/pPi/WtZ+XYPAAAKEElEQVS9DiNlcrESTqY54kzgKOBpVb1FRF5Lc0z92rTi97Q3L6Ez1EBx5YE9\n82ORdvyBsj7rblx2H7VTr6SsejpbG19j7Ts/Z/iB51NQMrKn+i2rPoRw64q8ScIlAT/tke1ddeLQ\nJwEDPL1+C58fvV+GI8uMEp9DR68PYyy+ff9Xd3XzZriT346qodBxuLWxlVfaOng2FMYHLAp38n5X\nhN80tnL18Coq96GqOB/EIvnZRc2H+zne9q7uTF84yRk2/quA2ya88uVLiHaH8PkLad+ymOoxX+iz\nrj9Yjj/gXig6UDiUaKSNguIRxKIddLVvoKBkBO3Ni6mqm5Xx/UiXqVUlvNLYytEjqljS3MbYsqKd\n1tHWMFOqSnfx6NwnRUEWtHdyRGkRyzq62L9g+9u8xOejwHEIOA6O41Dp99EWi3HjyO3t4jeu38yF\nNRX7VgLOkyapXBzAJ5kk/D/AfGC0iDwBPJLekJLn+PwMP/ACGt6YDUBV3SyChUOJdodYt+RWRk27\njhFyEev19p4x7kZO/CaOL0Dt5Mv4YNHNAJRUTaK85lDP9iPVZg6rZEFTiMv+vRyAq6bszzPrtxCO\nxjilvpqWrgilAb/HUabPjJJCFoa7mL1uMwAX71fBC6EwnfE4nywv4fjyYq5fv5kgDsODfo4pK+tn\ni/uAHExeu5LO3hEiUoQ7zvowoBU4V1Wbdljnu8BZQBT4kar2my+dZNpQRGQSMBVQVV2YbNCf+9by\n/PjPpsCvnf/wOoSs0bR8g9chZI0Pnt7odQhZ45Ru3ety/JzZ65POOQ/MGTmg5xORK4ByVf2hiJwB\nfExVL++1vBJYCIwDyoG3VHVMf9vd0+WNfsTOF/X8qIicqarXDSR4Y4zJhDT3epgJ/CTx999xe4v1\n1ga8j5uAy3Cr4X7tqTli6cDiM8YYb6WqOUJEvg5cwfZC1AE2AC2J6a1AxS4e+gGwBPdYWlIXRd7T\n5Y3mJRmvMcZkhVR1UVPV+4D7es8Tkb/gVrkk7pt3eNhJwAhgNG7SfkpEXlTV1/f0XPvGeZnGmH1C\nNJLWEXZfBE4GXk/cP7/D8i1AWFW7AUSkGaiiH8mcrBFQ1Uiv6SpV3fEbwBhjPJfmkzVuB+aJyPO4\nXXXPhp4DdstV9TEReV1EXsFtD35BVf/Z30b3dGBuBG6bxwMi8lXc8toHPIB7wU9jjMkq6eyipqph\n4Eu7mP/LXn/fANwwkO3uqRI+AvgO7sA9d+Im4Rjwj4E8gTHGZEpejaKW6GT8iIh8RlX/um2+iJTv\n7jHGGOOlWDz3TltO5rzM74rISAARORx4Ob0hGWPM4MRj8aRv2SKZ3hE3Ak+IyL+AQ4HT0xuSMcYM\nTixPrzG3GNgEnIDbHrwyrREZY8wgxWK5l4STaY54Hvitqk4B1mHNEcaYLJWvzRHHqeoHAKr6MxF5\nNs0xGWPMoMRz8MBcMkm4UkT+FxiCO4zbO/2sb4wxnsimCjdZyTRH3AqcBzQC9zLAjsjGGJMpudgc\nkdSlA1R1BRBX1Ubc0YOMMSbrxOKxpG/ZIpnmiM0ichFQKiJnsvPIQcYYkxVi6R3AJy2SqYTPB8YC\nH+L2E/56WiMyxphBysXmiGQq4ctU9dptE4krbnw/fSEZY8zg5FXvCBE5H7gAmCQiJydm+4ACLAkb\nY7JQmi9vlBZ7qoQfBJ4GrgNuSsyL4Z49Z4wxWSeeg2fM7WkUtU7ci9Z9I2PRGGPMXsimtt5k2eWN\njDF5I6/ahI0xJtfkYhc1J83XZDLGGLMHSZ0xZ4wxJj0sCRtjjIcsCRtjjIcsCRtjjIcsCRtjjIcs\nCRtjjIf26SQsIqNFZJ+9Zp6I/ERE3hKRo3az/H4RmZXpuAZDRPwi8qyIvCAilV7Hk2oiUpgYz2Wg\njztNREbsMO/cxEBcKY1LRH4gInaG7QDt00k4YV/uKH06cKSqzvc6kBSoA8pUdaaqtngdTBqMxB1Q\na6C+A1TsYn6q3veDjcsk5PwZcyJSBNwPjAaCwBXARcA43C+ZX6rq/4nIR3Ev1RQBOoALvYk4dUTk\nXGCiqn5fRAqBpcDNwLlAFHhNVS8XkXrgLqAICOO+PucBtcDjIvJj4FxVPSux3fWqOjLze7RXbgcm\niMgdwBuqepeICHCHqh4rIm8D/wIOwh2I6rPAwcA1QBfumNl/AH4MLAMOU9VmEfkmbnL/WeZ3qY/r\ncEc0/E/c9/G9wNDEssuAFuAZ4BPAFOAHwM+B6cADIjJTVSM7blREvgWcjfua/EFVbxOR+4FOYAww\nAviaqr6VqHgvBZqAbuCPwJG94gI4TUS+lIhttqo+ntqXIf/kQyX8TeA9Vf04cCZwNLBJVY8ETgDm\niEg1bhK6RFWPxf3A/tKrgFNsx4rma8Clif1/V0T8wM+AW1T1ONwP5o9VdQ6wHvc1Cu+wnVz8dXAJ\n8C6wbof52/alAvhvVT0msc5Jifn7A58DPgZco6px3BEEz0ws/wowL31hJ+0mYImqzsVNyP9U1U/i\nfqHekbgi+lXAA7j/4zNV9a/Am8BXd5OAJwFn4CbSo4DPiciBicXvq+qngNuAbyQ+Q1fjvk4nAqW4\nr23vuAA+UNXjcYuhS1L9IuSjfEjCArwMoKorcX8ezU9Mh4AlwAHASFVdlHjMfGBy5kNNKwf3Q3Ee\n8C0ReRb314EDfAS4TkSeAWYDw3o9xtnNtvLBjvvxVuJ+De6vAoBFqhpX1XagPTHvfuAcEZkCbEhc\nWzGbfAT4euL/eTfuldBR1UeBeuBfqro+se7u/scAU3HfI08nbkOB8Yllbybut71W44HFqtqpqjHg\npd1sc0HifgNQPPBd2/fkQxJ+F5gBICLjgLNwf5IhIuW4b9hVwDoR+UjiMcfg/uSE3E44HbhfOgCH\n4O7LhcBFiYr/YNzK5V3cKu843F8Of9rddkRkNNt/5uaiDtxmFnBfk976q/AdAFVdjXstxf/A/dmf\nDWJs/7y+i9vMdhzwJdzKHRH5HvAP4FAROXwXj9uRAu+o6nGJ98vvgIWJZTu+ViuAiYkDcT4Sn7nE\n9v291svFX1GeyockfCcwTkSew30TnQjUiMjzuG1kN6jqh7jjIt8mIvOBb+P+XILcftM8CYxN7NPp\nuO2Ci4AXRORpYCPwKu7P1BsSr9E8dv6gvQ60JHqK3ID7pdV7ea6I47ZTnpyoEqfvsGwgf98NzMR9\njbPBJqAg0avhJuCMxK+dvwPviMghuE0o1+AeKLs3UYS8hNsmXLXjBlV1IfBMokfJa8AE3Kaanf7v\nqtqEe7zheeAJ3Oq4OxFXMBFXrr1fsoKNombMLojI6cBUVb3B61iyQeLYwjWq+l+J6fnAdar6greR\n5b6c7x1hTKqJyE24TVanehxK1lDVqIiUisgC3J4Tr1oCTg2rhI0xxkP50CZsjDE5y5KwMcZ4yJKw\nMcZ4yJKwMcZ4yJKwMcZ4yJKwMcZ46P8BCWGXIaznedoAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.heatmap(stars.corr(),cmap='coolwarm',annot=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## NLP Classification Task\n", "\n", "Let's move on to the actual task. To make things a little easier, go ahead and only grab reviews that were either 1 star or 5 stars.\n", "\n", "**Create a dataframe called yelp_class that contains the columns of yelp dataframe but for only the 1 or 5 star reviews.**" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "collapsed": true }, "outputs": [], "source": [ "yelp_class = yelp[(yelp.stars==1) | (yelp.stars==5)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Create two objects X and y. X will be the 'text' column of yelp_class and y will be the 'stars' column of yelp_class. (Your features and target/labels)**" ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X = yelp_class['text']\n", "y = yelp_class['stars']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Import CountVectorizer and create a CountVectorizer object.**" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.feature_extraction.text import CountVectorizer\n", "cv = CountVectorizer()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Use the fit_transform method on the CountVectorizer object and pass in X (the 'text' column). Save this result by overwriting X.**" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "collapsed": false }, "outputs": [], "source": [ "X = cv.fit_transform(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train Test Split\n", "\n", "Let's split our data into training and testing data.\n", "\n", "** Use train_test_split to split up the data into X_train, X_test, y_train, y_test. Use test_size=0.3 and random_state=101 **" ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3,random_state=101)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training a Model\n", "\n", "Time to train a model!\n", "\n", "** Import MultinomialNB and create an instance of the estimator and call is nb **" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.naive_bayes import MultinomialNB\n", "nb = MultinomialNB()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now fit nb using the training data.**" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nb.fit(X_train,y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predictions and Evaluations\n", "\n", "Time to see how our model did!\n", "\n", "**Use the predict method off of nb to predict labels from X_test.**" ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "collapsed": true }, "outputs": [], "source": [ "predictions = nb.predict(X_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Create a confusion matrix and classification report using these predictions and y_test **" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.metrics import confusion_matrix,classification_report" ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[159 69]\n", " [ 22 976]]\n", "\n", "\n", " precision recall f1-score support\n", "\n", " 1 0.88 0.70 0.78 228\n", " 5 0.93 0.98 0.96 998\n", "\n", "avg / total 0.92 0.93 0.92 1226\n", "\n" ] } ], "source": [ "print(confusion_matrix(y_test,predictions))\n", "print('\\n')\n", "print(classification_report(y_test,predictions))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Great! Let's see what happens if we try to include TF-IDF to this process using a pipeline.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Text Processing\n", "\n", "** Import TfidfTransformer from sklearn. **" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.feature_extraction.text import TfidfTransformer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Import Pipeline from sklearn. **" ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.pipeline import Pipeline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Now create a pipeline with the following steps:CountVectorizer(), TfidfTransformer(),MultinomialNB()**" ] }, { "cell_type": "code", "execution_count": 157, "metadata": { "collapsed": false }, "outputs": [], "source": [ "pipeline = Pipeline([\n", " ('bow', CountVectorizer()), # strings to token integer counts\n", " ('tfidf', TfidfTransformer()), # integer counts to weighted TF-IDF scores\n", " ('classifier', MultinomialNB()), # train on TF-IDF vectors w/ Naive Bayes classifier\n", "])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the Pipeline\n", "\n", "**Time to use the pipeline! Remember this pipeline has all your pre-process steps in it already, meaning we'll need to re-split the original data (Remember that we overwrote X as the CountVectorized version. What we need is just the text**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train Test Split\n", "\n", "**Redo the train test split on the yelp_class object.**" ] }, { "cell_type": "code", "execution_count": 158, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X = yelp_class['text']\n", "y = yelp_class['stars']\n", "X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3,random_state=101)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now fit the pipeline to the training data. Remember you can't use the same training data as last time because that data has already been vectorized. We need to pass in just the text and labels**" ] }, { "cell_type": "code", "execution_count": 159, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Pipeline(steps=[('bow', CountVectorizer(analyzer='word', binary=False, decode_error='strict',\n", " dtype=, encoding='utf-8', input='content',\n", " lowercase=True, max_df=1.0, max_features=None, min_df=1,\n", " ngram_range=(1, 1), preprocessor=None, stop_words=None,\n", " strip_...f=False, use_idf=True)), ('classifier', MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True))])" ] }, "execution_count": 159, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# May take some time\n", "pipeline.fit(X_train,y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Predictions and Evaluation\n", "\n", "** Now use the pipeline to predict from the X_test and create a classification report and confusion matrix. You should notice strange results.**" ] }, { "cell_type": "code", "execution_count": 153, "metadata": { "collapsed": false }, "outputs": [], "source": [ "predictions = pipeline.predict(X_test)" ] }, { "cell_type": "code", "execution_count": 154, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0 228]\n", " [ 0 998]]\n", " precision recall f1-score support\n", "\n", " 1 0.00 0.00 0.00 228\n", " 5 0.81 1.00 0.90 998\n", "\n", "avg / total 0.66 0.81 0.73 1226\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/marci/anaconda/lib/python3.5/site-packages/sklearn/metrics/classification.py:1074: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.\n", " 'precision', 'predicted', average, warn_for)\n" ] } ], "source": [ "print(confusion_matrix(y_test,predictions))\n", "print(classification_report(y_test,predictions))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks like Tf-Idf actually made things worse! That is it for this project. But there is still a lot more you can play with:\n", "\n", "**Some other things to try....**\n", "Try going back and playing around with the pipeline steps and seeing if creating a custom analyzer like we did in the lecture helps (note: it probably won't). Or recreate the pipeline with just the CountVectorizer() and NaiveBayes. Does changing the ML model at the end to another classifier help at all?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Great Job!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }