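Python keeps gaining ground in data science. Its relatively readable style and simple, beginner-friendly syntax have spared many people in the field part of the burden of learning a programming language. The combination of Pandas + NumPy + Matplotlib can handle essentially any simple data analysis and visualization task; more complex work may also call for SciPy.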
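Purpose of This Article
This time I want to use one long post to record how I go about analyzing data with Pandas. There are plenty of introductory Pandas tutorials online, so I will not explain every basic Pandas operation in detail; I would rather spend the space on ideas about data analysis and how to implement them.
Broadly speaking, data analysis covers the whole pipeline: importing data -> cleaning data -> analyzing data -> presenting data. Narrowly speaking, it refers only to the analysis step in the middle. This article follows the broad definition and works through the process step by step.
With that, let's get this data analysis journey started.
Main Text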
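The code below mainly imports packages and configures the environment. Seaborn is another plotting package that can produce better-looking charts than Matplotlib; it is built on top of Matplotlib and works well with NumPy and Pandas. I won't go into detail here; if you are interested in Seaborn, feel free to leave a comment or explore it on your own.
import pandas as pd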
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(rc={"figure.figsize": (10, 6.25)})
sns.set_style("darkgrid")
colors = ["windows blue", "amber", "faded green", "greyish", "dusty purple", "red violet", "marine", "jungle green", "chocolate brown", "dull pink", "reddish orange"]
sns.set_palette(sns.xkcd_palette(colors))
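Importing the Data
The data I use this time is IGN's game reviews across all kinds of platforms over roughly the past 20 years; it comes from here.
reviews = pd.read_csv("ign.csv")
Once the data is loaded, let's look at what it contains.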
reviews.head()
Unnamed: 0 | score_phrase | title | url | platform | score | genre | editors_choice | release_year | release_month | release_day |
---|---|---|---|---|---|---|---|---|---|---|
0 | Amazing | LittleBigPlanet PS Vita | /games/littlebigplanet-vita/vita-98907 | PlayStation Vita | 9.0 | Platformer | Y | 2012 | 9 | 12 |
1 | Amazing | LittleBigPlanet PS Vita – Marvel Super Hero E… | /games/littlebigplanet-ps-vita-marvel-super-he… | PlayStation Vita | 9.0 | Platformer | Y | 2012 | 9 | 12 |
2 | Great | Splice: Tree of Life | /games/splice/ipad-141070 | iPad | 8.5 | Puzzle | N | 2012 | 9 | 12 |
3 | Great | NHL 13 | /games/nhl-13/xbox-360-128182 | Xbox 360 | 8.5 | Sports | N | 2012 | 9 | 11 |
4 | Great | NHL 13 | /games/nhl-13/ps3-128181 | PlayStation 3 | 8.5 | Sports | N | 2012 | 9 | 11 |
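Here is a brief description of each column:
- score_phrase – the single word IGN uses to rate the game; directly tied to the score;
- title – the name of the game;
- url – the address of the full review;
- platform – the game platform (PS4, PC, Xbox, etc.);
- score – the game's numeric score, from 1.0 to 10.0;
- genre – the game's genre;
- editors_choice – whether the game is an IGN Editors' Choice; related to the score;
- release_year – the release year;
- release_month – the release month;
- release_day – the release day.
Let's see how many records there are in total.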
reviews.shape
(18625, 11)
So this dataset has 18625 rows and 11 columns.
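Cleaning the Data
Freshly imported source data usually cannot be used directly and needs some amount of cleaning. This dataset needs almost none, because Eric Grinstein, who collected it, has already cleaned it. We still need one small step, though: removing content we don't need, here the leftover first column.
reviews = reviews.iloc[:, 1:]  # drop the first column (Unnamed: 0)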
| | score_phrase | title | url | platform | score | genre | editors_choice | release_year | release_month | release_day |
---|---|---|---|---|---|---|---|---|---|---|
0 | Amazing | LittleBigPlanet PS Vita | /games/littlebigplanet-vita/vita-98907 | PlayStation Vita | 9.0 | Platformer | Y | 2012 | 9 | 12 |
1 | Amazing | LittleBigPlanet PS Vita – Marvel Super Hero E… | /games/littlebigplanet-ps-vita-marvel-super-he… | PlayStation Vita | 9.0 | Platformer | Y | 2012 | 9 | 12 |
2 | Great | Splice: Tree of Life | /games/splice/ipad-141070 | iPad | 8.5 | Puzzle | N | 2012 | 9 | 12 |
3 | Great | NHL 13 | /games/nhl-13/xbox-360-128182 | Xbox 360 | 8.5 | Sports | N | 2012 | 9 | 11 |
4 | Great | NHL 13 | /games/nhl-13/ps3-128181 | PlayStation 3 | 8.5 | Sports | N | 2012 | 9 | 11 |
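An `Unnamed: 0` column is what `read_csv` typically produces when a CSV was saved with its index included. Below is a minimal sketch of two alternatives to slicing by position. It uses a tiny in-memory CSV as a stand-in for `ign.csv`; the reduced column set is an assumption made to keep the example short.

```python
import io
import pandas as pd

# A tiny stand-in for ign.csv: the header starts with an empty name,
# which is how a saved DataFrame index shows up in a CSV file.
csv_text = """,score_phrase,title,platform,score
0,Amazing,LittleBigPlanet PS Vita,PlayStation Vita,9.0
1,Great,Splice: Tree of Life,iPad,8.5
2,Great,NHL 13,Xbox 360,8.5
"""

# Option 1: read normally, then drop the leftover column by name.
raw = pd.read_csv(io.StringIO(csv_text))
by_name = raw.drop(columns=["Unnamed: 0"])

# Option 2: tell read_csv that the first column is the index.
by_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(raw.columns.tolist())
print(by_name.columns.tolist())
print(by_index.columns.tolist())
```

Dropping by name fails loudly if the column is not there, whereas `iloc[:, 1:]` silently removes whatever happens to be first, so the named version is a little safer when the file layout might change.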
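Analyzing the Data
With cleaning done, the analysis proper begins. Before that, a short digression. I love games, and growing up I owned quite a few platforms: consoles, handhelds, and PCs. First came the Famicom and the Subor learning machine, for which I fought a battle of wits with my parents over every minute of play time; then the Game Boy, which I could play under the covers, though I still had to poke my head out now and then in case my parents walked in and discovered my little secret; then the Sega, which once again meant wheedling my parents for game time. Once our family got its first PC, I rarely touched other platforms, apart from the PSP later on, my first handheld in many years since the Game Boy. I could talk about games, platforms, and gaming history for three days and nights without running out of things to say, which is exactly why I chose this IGN dataset. I also want to see how the video game industry has developed over these 20 years and where it is heading.
Back to the topic. Faced with this much data, my first reaction was: how many platforms are all these games spread across? I have personally used only about 10. So how many platforms does this dataset contain?
all_platforms = reviews["platform"].unique()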
array(['PlayStation Vita', 'iPad', 'Xbox 360', 'PlayStation 3',
'Macintosh', 'PC', 'iPhone', 'Nintendo DS', 'Nintendo 3DS',
'Android', 'Wii', 'PlayStation 4', 'Wii U', 'Linux',
'PlayStation Portable', 'PlayStation', 'Nintendo 64', 'Saturn',
'Lynx', 'Game Boy', 'Game Boy Color', 'NeoGeo Pocket Color',
'Game.Com', 'Dreamcast', 'Dreamcast VMU', 'WonderSwan', 'Arcade',
'Nintendo 64DD', 'PlayStation 2', 'WonderSwan Color',
'Game Boy Advance', 'Xbox', 'GameCube', 'DVD / HD Video Game',
'Wireless', 'Pocket PC', 'N-Gage', 'NES', 'iPod', 'Genesis',
'TurboGrafx-16', 'Super NES', 'NeoGeo', 'Master System',
'Atari 5200', 'TurboGrafx-CD', 'Atari 2600', 'Sega 32X', 'Vectrex',
'Commodore 64/128', 'Sega CD', 'Nintendo DSi', 'Windows Phone',
'Web Games', 'Xbox One', 'Windows Surface', 'Ouya',
'New Nintendo 3DS', 'SteamOS'], dtype=object)
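That is a lot of platforms... Honestly, there are many here I have never even heard of, such as Dreamcast, Atari 2600, and Vectrex. It seems the game industry has been quite diverse over these 20 years, and the range of platforms alone hints at that.
With the platform list in hand, the natural next question is: roughly how many games does each platform have?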
reviews["platform"].value_counts(dropna=False)
PC 3370
PlayStation 2 1686
Xbox 360 1631
Wii 1366
PlayStation 3 1356
Nintendo DS 1045
PlayStation 952
Wireless 910
iPhone 842
Xbox 821
PlayStation Portable 633
Game Boy Advance 623
GameCube 509
Game Boy Color 356
Nintendo 64 302
Dreamcast 286
PlayStation 4 277
Nintendo DSi 254
Nintendo 3DS 225
Xbox One 208
PlayStation Vita 155
Wii U 114
iPad 99
Lynx 82
Macintosh 81
Genesis 58
NES 49
TurboGrafx-16 40
Android 39
Super NES 33
NeoGeo Pocket Color 31
N-Gage 30
Game Boy 22
iPod 17
Sega 32X 16
Windows Phone 14
Master System 13
Arcade 11
Linux 10
NeoGeo 10
Nintendo 64DD 7
Commodore 64/128 6
Saturn 6
Atari 2600 5
WonderSwan 4
TurboGrafx-CD 3
Game.Com 3
Atari 5200 2
New Nintendo 3DS 2
Vectrex 2
Pocket PC 1
WonderSwan Color 1
Ouya 1
Web Games 1
SteamOS 1
Dreamcast VMU 1
Windows Surface 1
DVD / HD Video Game 1
Sega CD 1
Name: platform, dtype: int64
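Judging by these counts, PC is by far the largest contributor. That makes sense: personal computers boomed from the end of the last century, and although shipments later declined, they have remained an indispensable part of how people study, live, and play; the vast majority of early personal computers also ran Windows. What I did not expect is that Dreamcast has 286 games. Apparently I was just ignorant...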
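Raw counts are easier to compare once they are turned into shares. The sketch below shows `value_counts` with `normalize=True` on a made-up mini-sample of the `platform` column; the sample values are hypothetical and only there to make the example runnable.

```python
import pandas as pd

# Hypothetical mini-sample of the "platform" column, with one missing value.
platform = pd.Series(
    ["PC", "PC", "PC", "Xbox 360", "Xbox 360", "Wii", None], name="platform"
)

# Absolute counts; dropna=False keeps missing values as their own row.
counts = platform.value_counts(dropna=False)

# Fractions of the non-missing rows, sorted from largest to smallest.
shares = platform.value_counts(normalize=True)

print(counts)
print(shares.round(3))
```

On the full dataset, the counts above give PC 3370 of 18625 reviews, or roughly 18%.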
Now let's see which platforms make the top ten.
platforms = reviews["platform"].value_counts()[:10].index.tolist()
['PC',
'PlayStation 2',
'Xbox 360',
'Wii',
'PlayStation 3',
'Nintendo DS',
'PlayStation',
'Wireless',
'iPhone',
'Xbox']
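Now that I know the top ten platforms, let's look at the quality of games on each. PC has the most games, but that does not necessarily mean it has the highest share of good ones, right?
To find out, I first need to extract from the full data only the games that belong to the top ten platforms. Here I create a filter to select by platform.
fil = reviews["platform"] == platforms[0]  # create a filter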
fil
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 True
8 False
9 True
...
18615 False
18616 True
18617 False
18618 True
18619 True
18620 False
18621 False
18622 False
18623 False
18624 True
Name: platform, Length: 18625, dtype: bool
for platform in platforms[1:]:
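The body of the loop above is not shown. Here is a minimal sketch of how such a filter could be completed, assuming the loop ORs each platform's mask into `fil` and that `filtered_reviews` is simply the rows where the combined mask is True. The four-row `reviews` table and the three-item `platforms` list are stand-ins for the real data.

```python
import pandas as pd

# Mini-sample standing in for the full reviews table.
reviews = pd.DataFrame({
    "title": ["Guild Wars 2", "NHL 13", "Splice: Tree of Life", "Pokemon White Version 2"],
    "platform": ["PC", "Xbox 360", "iPad", "Nintendo DS"],
})
platforms = ["PC", "Xbox 360", "Nintendo DS"]  # stand-in for the top-ten list

# Loop version (assumed): start from the first platform and OR in the others.
fil = reviews["platform"] == platforms[0]
for platform in platforms[1:]:
    fil = fil | (reviews["platform"] == platform)

# Equivalent one-step version using Series.isin.
fil_isin = reviews["platform"].isin(platforms)

# .copy() makes the result an independent DataFrame, so adding columns
# to it later does not trigger pandas' SettingWithCopyWarning.
filtered_reviews = reviews[fil].copy()
print(filtered_reviews)
```

Both masks select the same rows; `isin` is just shorter and avoids building the mask one platform at a time.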
Here is all the extracted data:
filtered_reviews
| | score_phrase | title | url | platform | score | genre | editors_choice | release_year | release_month | release_day |
---|---|---|---|---|---|---|---|---|---|---|
3 | Great | NHL 13 | /games/nhl-13/xbox-360-128182 | Xbox 360 | 8.5 | Sports | N | 2012 | 9 | 11 |
4 | Great | NHL 13 | /games/nhl-13/ps3-128181 | PlayStation 3 | 8.5 | Sports | N | 2012 | 9 | 11 |
6 | Awful | Double Dragon: Neon | /games/double-dragon-neon/xbox-360-131320 | Xbox 360 | 3.0 | Fighting | N | 2012 | 9 | 11 |
7 | Amazing | Guild Wars 2 | /games/guild-wars-2/pc-896298 | PC | 9.0 | RPG | Y | 2012 | 9 | 11 |
8 | Awful | Double Dragon: Neon | /games/double-dragon-neon/ps3-131321 | PlayStation 3 | 3.0 | Fighting | N | 2012 | 9 | 11 |
9 | Good | Total War Battles: Shogun | /games/total-war-battles-shogun/pc-142564 | PC | 7.0 | Strategy | N | 2012 | 9 | 11 |
10 | Good | Tekken Tag Tournament 2 | /games/tekken-tag-tournament-2/ps3-124584 | PlayStation 3 | 7.5 | Fighting | N | 2012 | 9 | 11 |
11 | Good | Tekken Tag Tournament 2 | /games/tekken-tag-tournament-2/xbox-360-124581 | Xbox 360 | 7.5 | Fighting | N | 2012 | 9 | 11 |
12 | Good | Wild Blood | /games/wild-blood/iphone-139363 | iPhone | 7.0 | NaN | N | 2012 | 9 | 10 |
13 | Amazing | Mark of the Ninja | /games/mark-of-the-ninja-135615/xbox-360-129276 | Xbox 360 | 9.0 | Action, Adventure | Y | 2012 | 9 | 7 |
14 | Amazing | Mark of the Ninja | /games/mark-of-the-ninja-135615/pc-143761 | PC | 9.0 | Action, Adventure | Y | 2012 | 9 | 7 |
16 | Okay | Home: A Unique Horror Adventure | /games/home-a-unique-horror-adventure/pc-137135 | PC | 6.5 | Adventure | N | 2012 | 9 | 6 |
17 | Great | Avengers Initiative | /games/avengers-initiative/iphone-141579 | iPhone | 8.0 | Action | N | 2012 | 9 | 5 |
18 | Mediocre | Way of the Samurai 4 | /games/way-of-the-samurai-4/ps3-23516 | PlayStation 3 | 5.5 | Action, Adventure | N | 2012 | 9 | 3 |
19 | Good | JoJo’s Bizarre Adventure HD | /games/jojos-bizarre-adventure/xbox-360-137717 | Xbox 360 | 7.0 | Fighting | N | 2012 | 9 | 3 |
20 | Good | JoJo’s Bizarre Adventure HD | /games/jojos-bizarre-adventure/ps3-137896 | PlayStation 3 | 7.0 | Fighting | N | 2012 | 9 | 3 |
21 | Good | Mass Effect 3: Leviathan | /games/mass-effect-3-leviathan/xbox-360-138918 | Xbox 360 | 7.5 | RPG | N | 2012 | 8 | 31 |
22 | Good | Mass Effect 3: Leviathan | /games/mass-effect-3-leviathan/ps3-138915 | PlayStation 3 | 7.5 | RPG | N | 2012 | 8 | 31 |
23 | Good | Mass Effect 3: Leviathan | /games/mass-effect-3-leviathan/pc-138919 | PC | 7.5 | RPG | N | 2012 | 8 | 31 |
24 | Amazing | Dark Souls (Prepare to Die Edition) | /games/dark-souls-prepare-to-die-edition/pc-13… | PC | 9.0 | Action, RPG | Y | 2012 | 8 | 31 |
25 | Good | Symphony | /games/symphony/pc-136470 | PC | 7.0 | Shooter | N | 2012 | 8 | 30 |
27 | Good | Tom Clancy’s Ghost Recon Phantoms | /games/tom-clancys-ghost-recon-online/pc-109114 | PC | 7.5 | Shooter | N | 2012 | 8 | 29 |
28 | Great | Thirty Flights of Loving | /games/thirty-flights-of-loving/pc-138374 | PC | 8.0 | Adventure | N | 2012 | 8 | 29 |
29 | Okay | Legasista | /games/legasista/ps3-127147 | PlayStation 3 | 6.5 | Action, RPG | N | 2012 | 8 | 28 |
31 | Great | World of Warcraft: Mists of Pandaria | /games/world-of-warcraft-mists-of-pandaria/pc-… | PC | 8.7 | RPG | Y | 2012 | 10 | 4 |
32 | Bad | Hell Yeah! Wrath of the Dead Rabbit | /games/hell-yeah-wrath-of-the-dead-rabbit/ps3-… | PlayStation 3 | 4.9 | Platformer | N | 2012 | 10 | 4 |
33 | Amazing | Pokemon White Version 2 | /games/pokemon-white-version-2/nds-129228 | Nintendo DS | 9.6 | RPG | Y | 2012 | 10 | 3 |
34 | Good | War of the Roses | /games/war-of-the-roses-140577/pc-115849 | PC | 7.3 | Action | N | 2012 | 10 | 3 |
35 | Amazing | Pokemon Black Version 2 | /games/pokemon-black-version-2/nds-129224 | Nintendo DS | 9.6 | RPG | Y | 2012 | 10 | 3 |
36 | Okay | Drakerider | /games/drakerider/iphone-135745 | iPhone | 6.5 | RPG | N | 2012 | 10 | 3 |
… | … | … | … | … | … | … | … | … | … | … |
18546 | Great | Devil Daggers | /games/devil-daggers/pc-20049771 | PC | 8.5 | Shooter | N | 2016 | 2 | 27 |
18547 | Good | Superhot | /games/superhot/pc-20018899 | PC | 7.5 | Action | N | 2016 | 2 | 25 |
18549 | Good | Battleborn | /games/battleborn/pc-20021225 | PC | 7.1 | Shooter | N | 2016 | 5 | 6 |
18554 | Good | The Park | /games/the-park/pc-20042102 | PC | 7.0 | Adventure | N | 2016 | 5 | 4 |
18555 | Great | Hitman: Episode 2 | /games/hitman-episode-2/pc-20051629 | PC | 8.5 | Shooter | N | 2016 | 4 | 29 |
18557 | Amazing | Hearts of Iron IV | /games/hearts-of-iron-iv/pc-20012080 | PC | 9.0 | Strategy | Y | 2016 | 6 | 6 |
18559 | Okay | Dangerous Golf | /games/dangerous-golf/pc-20048436 | PC | 6.0 | Sports, Action | N | 2016 | 6 | 3 |
18567 | Great | Offworld Trading Company | /games/offworld-trading-company/pc-20018639 | PC | 8.0 | Strategy | N | 2016 | 4 | 28 |
18568 | Okay | The Walking Dead: Michonne – Episode 3: What … | /games/the-walking-dead-michonne-episode-3/pc-… | PC | 6.3 | Adventure | N | 2016 | 4 | 27 |
18570 | Good | Battlefleet Gothic: Armada | /games/battlefleet-gothic-armada/pc-20030300 | PC | 7.1 | Strategy | N | 2016 | 4 | 22 |
18572 | Amazing | Overwatch | /games/overwatch/pc-20027413 | PC | 9.4 | Shooter | Y | 2016 | 5 | 28 |
18575 | Good | Fallout 4: Nuka World | /games/fallout-4-nuka-world/pc-20054761 | PC | 7.9 | RPG | N | 2016 | 8 | 30 |
18578 | Good | Master of Orion | /games/master-of-orion-wargaming/pc-20038452 | PC | 7.1 | Strategy | N | 2016 | 8 | 26 |
18580 | Great | Quadrilateral Cowboy | /games/quadrilateral-cowboy/pc-159788 | PC | 8.5 | Puzzle | N | 2016 | 7 | 28 |
18581 | Great | Fallout 4: Vault-Tec Workshop | /games/fallout-4-vault-tec-workshop/pc-20054769 | PC | 8.2 | RPG | N | 2016 | 7 | 27 |
18583 | Great | Kentucky Route Zero: Act 4 | /games/kentucky-route-zero-act-4/pc-20046280 | PC | 8.5 | Adventure | N | 2016 | 7 | 22 |
18586 | Great | F1 2016 | /games/f1-2016/pc-20054151 | PC | 8.8 | Racing | N | 2016 | 8 | 24 |
18589 | Amazing | Deus Ex: Mankind Divided | /games/deus-ex-mankind-divided/pc-20013794 | PC | 9.2 | Action, RPG | Y | 2016 | 8 | 19 |
18595 | Bad | Ghostbusters | /games/ghostbusters-the-movie/pc-20052317 | PC | 4.4 | Action | N | 2016 | 7 | 16 |
18596 | Okay | Necropolis | /games/necropolis/pc-20030346 | PC | 6.5 | Action, Adventure | N | 2016 | 7 | 14 |
18598 | Okay | Furi | /games/furi/pc-20044439 | PC | 6.8 | Action | N | 2016 | 7 | 13 |
18600 | Good | Hitman: Episode 4 | /games/hitman-episode-4/pc-20051637 | PC | 7.4 | Shooter | N | 2016 | 8 | 19 |
18603 | Good | Grow Up | /games/grow-up/pc-20054824 | PC | 7.8 | Platformer | N | 2016 | 8 | 18 |
18606 | Okay | Starcraft II: Nova Covert Ops – Mission Pack 2 | /games/starcraft-ii-nova-covert-ops-mission-pa… | PC | 6.4 | Strategy | N | 2016 | 8 | 4 |
18607 | Good | Pokemon Go | /games/pokemon-go/iphone-20042699 | iPhone | 7.0 | Battle | N | 2016 | 7 | 13 |
18613 | Great | XCOM 2: Shen’s Last Gift | /games/xcom-2-shens-last-gift/pc-20055520 | PC | 8.0 | Strategy | N | 2016 | 7 | 1 |
18616 | Good | Batman: The Telltale Series – Episode 1: Real… | /games/batman-the-telltale-series-episode-1-re… | PC | 7.5 | Adventure | N | 2016 | 8 | 2 |
18618 | Amazing | Starbound | /games/starbound-2016/pc-128879 | PC | 9.1 | Action | Y | 2016 | 7 | 28 |
18619 | Good | Human Fall Flat | /games/human-fall-flat/pc-20051928 | PC | 7.9 | Puzzle, Action | N | 2016 | 7 | 28 |
18624 | Masterpiece | Inside | /games/inside-playdead/pc-20055740 | PC | 10.0 | Adventure | Y | 2016 | 6 | 28 |
13979 rows × 10 columns
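Presenting the Data
Now that we have the data for the top ten platforms, the question is how to present each platform's game quality. We could compare the mean score per platform, but that feels a bit thin. The data has a score_phrase column, a single word describing how good the game is, tied directly to score; using it should be easier to understand and analyze.
We could draw this with bar from Matplotlib.pyplot, or with countplot from Seaborn; the latter is easier to use.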
sns.countplot(x="platform", hue="score_phrase", data=filtered_reviews, palette=sns.xkcd_palette(colors));
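The result is shown in the figure above. On PC, the Great and Good categories alone account for more than half of the games, but I cannot conclude that PC games are a cut above those on other platforms, because we still cannot tell what share of each platform's games are excellent. This chart only shows, at a glance, the distribution of scores within each platform.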
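The numbers behind a count plot like this, and the per-platform shares it cannot show, can both be obtained with `pd.crosstab`. This is a minimal sketch on a made-up six-row sample; with the real data the two arguments would be the `platform` and `score_phrase` columns of `filtered_reviews`.

```python
import pandas as pd

# Hypothetical mini-sample; the real input would be filtered_reviews.
sample = pd.DataFrame({
    "platform": ["PC", "PC", "PC", "PC", "Wii", "Wii"],
    "score_phrase": ["Great", "Good", "Good", "Bad", "Great", "Bad"],
})

# Raw counts: the numbers behind a countplot with hue="score_phrase".
counts = pd.crosstab(sample["platform"], sample["score_phrase"])

# Row-normalized: each platform's row sums to 1, so platforms with
# very different numbers of games can be compared directly.
shares = pd.crosstab(sample["platform"], sample["score_phrase"], normalize="index")

print(counts)
print(shares)
```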
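So next I will refine both the analysis and the presentation.
Further Analysis and Presentation
The original score_phrase has too many categories, so I decided to regroup them into three: better than Good, worse than Okay, and everything in between. My standard may be strict: in my view only games scoring 8.0 or above, that is, above Good, count as excellent; as for those scoring below 6.0, that is, not even Okay, calling them poor hardly seems unfair.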
all_score_phrases = set(reviews["score_phrase"].unique())
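The rest of this cell, which defines the `category_score_phrase` function used below, is not shown. Here is a minimal sketch of what such a function might look like. The two phrase sets are my assumption, based on the 8.0 and 6.0 thresholds described above; the three bucket labels are the ones that appear in the output tables.

```python
import pandas as pd

# Assumed grouping: the phrases above "Good" and the two middle phrases.
# Any other phrase is treated as below "Okay".
BETTER_THAN_GOOD = {"Great", "Amazing", "Masterpiece"}
AVERAGE = {"Good", "Okay"}

def category_score_phrase(phrase):
    """Map an IGN score phrase to one of three coarse buckets."""
    if phrase in BETTER_THAN_GOOD:
        return "Better than Good"
    if phrase in AVERAGE:
        return "Average"
    return "Worse than Okay"

phrases = pd.Series(
    ["Great", "Awful", "Amazing", "Good", "Okay", "Mediocre", "Masterpiece"]
)
print(phrases.apply(category_score_phrase).tolist())
```

Listing only the top and middle phrases and letting everything else fall through keeps the function short, at the cost of silently putting any unexpected value in the lowest bucket.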
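Here I first use a pie chart to show the overall quality distribution across the top ten platforms.
I created a new column called score_phrase_new to distinguish it from the original score_phrase.
filtered_reviews["score_phrase_new"] = filtered_reviews["score_phrase"].apply(category_score_phrase)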
| | score_phrase | title | url | platform | score | genre | editors_choice | release_year | release_month | release_day | score_phrase_new |
---|---|---|---|---|---|---|---|---|---|---|---|
3 | Great | NHL 13 | /games/nhl-13/xbox-360-128182 | Xbox 360 | 8.5 | Sports | N | 2012 | 9 | 11 | Better than Good |
4 | Great | NHL 13 | /games/nhl-13/ps3-128181 | PlayStation 3 | 8.5 | Sports | N | 2012 | 9 | 11 | Better than Good |
6 | Awful | Double Dragon: Neon | /games/double-dragon-neon/xbox-360-131320 | Xbox 360 | 3.0 | Fighting | N | 2012 | 9 | 11 | Worse than Okay |
7 | Amazing | Guild Wars 2 | /games/guild-wars-2/pc-896298 | PC | 9.0 | RPG | Y | 2012 | 9 | 11 | Better than Good |
8 | Awful | Double Dragon: Neon | /games/double-dragon-neon/ps3-131321 | PlayStation 3 | 3.0 | Fighting | N | 2012 | 9 | 11 | Worse than Okay |
First, let's look at the raw numbers: the count of each score bucket on each platform.
filtered_reviews.groupby(["platform", "score_phrase_new"]).count()
platform | score_phrase_new | score_phrase | title | url | score | genre | editors_choice | release_year | release_month | release_day |
---|---|---|---|---|---|---|---|---|---|---|
Nintendo DS | Average | 462 | 462 | 462 | 462 | 462 | 462 | 462 | 462 | 462 |
Nintendo DS | Better than Good | 207 | 207 | 207 | 207 | 207 | 207 | 207 | 207 | 207 |
Nintendo DS | Worse than Okay | 376 | 376 | 376 | 376 | 375 | 376 | 376 | 376 | 376 |
PC | Average | 1394 | 1394 | 1394 | 1394 | 1393 | 1394 | 1394 | 1394 | 1394 |
PC | Better than Good | 1323 | 1323 | 1323 | 1323 | 1322 | 1323 | 1323 | 1323 | 1323 |
PC | Worse than Okay | 653 | 653 | 653 | 653 | 652 | 653 | 653 | 653 | 653 |
PlayStation | Average | 362 | 362 | 362 | 362 | 362 | 362 | 362 | 362 | 362 |
PlayStation | Better than Good | 313 | 313 | 313 | 313 | 313 | 313 | 313 | 313 | 313 |
PlayStation | Worse than Okay | 277 | 277 | 277 | 277 | 277 | 277 | 277 | 277 | 277 |
PlayStation 2 | Average | 716 | 716 | 716 | 716 | 716 | 716 | 716 | 716 | 716 |
PlayStation 2 | Better than Good | 542 | 542 | 542 | 542 | 542 | 542 | 542 | 542 | 542 |
PlayStation 2 | Worse than Okay | 428 | 428 | 428 | 428 | 426 | 428 | 428 | 428 | 428 |
PlayStation 3 | Average | 516 | 516 | 516 | 516 | 515 | 516 | 516 | 516 | 516 |
PlayStation 3 | Better than Good | 569 | 569 | 569 | 569 | 569 | 569 | 569 | 569 | 569 |
PlayStation 3 | Worse than Okay | 271 | 271 | 271 | 271 | 271 | 271 | 271 | 271 | 271 |
Wii | Average | 551 | 551 | 551 | 551 | 547 | 551 | 551 | 551 | 551 |
Wii | Better than Good | 321 | 321 | 321 | 321 | 321 | 321 | 321 | 321 | 321 |
Wii | Worse than Okay | 494 | 494 | 494 | 494 | 494 | 494 | 494 | 494 | 494 |
Wireless | Average | 473 | 473 | 473 | 473 | 471 | 473 | 473 | 473 | 473 |
Wireless | Better than Good | 308 | 308 | 308 | 308 | 306 | 308 | 308 | 308 | 308 |
Wireless | Worse than Okay | 129 | 129 | 129 | 129 | 129 | 129 | 129 | 129 | 129 |
Xbox | Average | 307 | 307 | 307 | 307 | 307 | 307 | 307 | 307 | 307 |
Xbox | Better than Good | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 |
Xbox | Worse than Okay | 160 | 160 | 160 | 160 | 160 | 160 | 160 | 160 | 160 |
Xbox 360 | Average | 631 | 631 | 631 | 631 | 631 | 631 | 631 | 631 | 631 |
Xbox 360 | Better than Good | 646 | 646 | 646 | 646 | 646 | 646 | 646 | 646 | 646 |
Xbox 360 | Worse than Okay | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 | 354 |
iPhone | Average | 412 | 412 | 412 | 412 | 405 | 412 | 412 | 412 | 412 |
iPhone | Better than Good | 321 | 321 | 321 | 321 | 315 | 321 | 321 | 321 | 321 |
iPhone | Worse than Okay | 109 | 109 | 109 | 109 | 108 | 109 | 109 | 109 | 109 |
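In fact, most of the table above is not needed; we really only want three columns: platform, score bucket, and count. So I condense the original table into the form below.
count_df = filtered_reviews.groupby(["platform", "score_phrase_new"]).count().reset_index().iloc[:, :3]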
count_df.rename(columns={"score_phrase": "count"}, inplace=True)
count_df
| | platform | score_phrase_new | count |
---|---|---|---|
0 | Nintendo DS | Average | 462 |
1 | Nintendo DS | Better than Good | 207 |
2 | Nintendo DS | Worse than Okay | 376 |
3 | PC | Average | 1394 |
4 | PC | Better than Good | 1323 |
5 | PC | Worse than Okay | 653 |
6 | PlayStation | Average | 362 |
7 | PlayStation | Better than Good | 313 |
8 | PlayStation | Worse than Okay | 277 |
9 | PlayStation 2 | Average | 716 |
10 | PlayStation 2 | Better than Good | 542 |
11 | PlayStation 2 | Worse than Okay | 428 |
12 | PlayStation 3 | Average | 516 |
13 | PlayStation 3 | Better than Good | 569 |
14 | PlayStation 3 | Worse than Okay | 271 |
15 | Wii | Average | 551 |
16 | Wii | Better than Good | 321 |
17 | Wii | Worse than Okay | 494 |
18 | Wireless | Average | 473 |
19 | Wireless | Better than Good | 308 |
20 | Wireless | Worse than Okay | 129 |
21 | Xbox | Average | 307 |
22 | Xbox | Better than Good | 354 |
23 | Xbox | Worse than Okay | 160 |
24 | Xbox 360 | Average | 631 |
25 | Xbox 360 | Better than Good | 646 |
26 | Xbox 360 | Worse than Okay | 354 |
27 | iPhone | Average | 412 |
28 | iPhone | Better than Good | 321 |
29 | iPhone | Worse than Okay | 109 |
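Taking the first column of `count()` works here because `score_phrase` has no missing values, but `count()` skips NaN, which is why the `genre` column in the wider table shows slightly smaller numbers (375 against 376, for example). A more direct way to get one row count per group is `size()`. Here is a minimal sketch on a made-up sample with one missing genre:

```python
import pandas as pd

# Hypothetical mini-sample: two grouping columns plus a column with a gap.
sample = pd.DataFrame({
    "platform": ["PC", "PC", "PC", "Wii", "Wii"],
    "score_phrase_new": ["Average", "Average", "Better than Good", "Average", "Worse than Okay"],
    "genre": ["RPG", None, "Shooter", "Sports", "Action"],
})

# size() counts rows per group regardless of missing values, and
# reset_index(name=...) names the resulting column directly.
count_df = (
    sample.groupby(["platform", "score_phrase_new"])
    .size()
    .reset_index(name="count")
)

# For comparison: count() on a column with gaps undercounts the group.
genre_counts = sample.groupby(["platform", "score_phrase_new"])["genre"].count()

print(count_df)
print(genre_counts)
```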
With the data in hand, it is time to present it graphically again. This time, let's see what share each score bucket makes up within its platform.
bar_width = 1
f, ax = plt.subplots(1)
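Only the first line of each plotting cell is shown above. Here is a rough sketch of one way to draw a 100% stacked bar chart from a table shaped like `count_df`. It is not necessarily the original code: the pivot step, the bucket order, and the styling are all my assumptions. The counts are copied from the table above for three of the ten platforms.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Counts copied from the table above for three of the ten platforms.
count_df = pd.DataFrame({
    "platform": ["PC"] * 3 + ["Wii"] * 3 + ["Xbox"] * 3,
    "score_phrase_new": ["Average", "Better than Good", "Worse than Okay"] * 3,
    "count": [1394, 1323, 653, 551, 321, 494, 307, 354, 160],
})

# One row per platform, one column per bucket, then row percentages.
wide = count_df.pivot(index="platform", columns="score_phrase_new", values="count")
percent = wide.div(wide.sum(axis=1), axis=0) * 100

bar_width = 1
f, ax = plt.subplots(1)
bottom = np.zeros(len(percent))  # running top of the stack for each platform
for bucket in ["Worse than Okay", "Average", "Better than Good"]:
    heights = percent[bucket].to_numpy()
    ax.bar(list(percent.index), heights, bottom=bottom, width=bar_width,
           edgecolor="white", label=bucket)
    bottom = bottom + heights
ax.set_ylabel("Share of reviews (%)")
ax.legend()
```

Every bar reaches 100, so the only thing that varies between platforms is how the bar is split, which is the comparison we are after.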
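I did not label the exact percentages on the chart, but its shape still conveys some information. As I read the chart, relatively speaking, PlayStation, PlayStation 2, Wii, PC, and Nintendo DS all look quite good: a high share of high-quality games and a low share of low-quality ones. PlayStation 3 has a small share of low-quality games but not many high-quality ones either. iPhone and Xbox look the worst: they have the two highest shares of low-quality games and the two lowest shares of high-quality games. For iPhone this is not surprising, since the barrier to entry is much lower than on other platforms; games there are often made by three to five people, or even one person, which makes it hard to deliver on fun, story, and everything else at once, and post-release maintenance is bound to lag far behind games from large studios. It is a pity that Xbox also appears to do so poorly, which I honestly find hard to understand.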
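A reading taken by eye from a stacked chart is easy to get wrong, so it is worth checking against the numbers. This minimal sketch recomputes each platform's percentages from the counts in the `count_df` table above; the dictionary is just those counts retyped by hand.

```python
import pandas as pd

# (Average, Better than Good, Worse than Okay) counts copied from count_df above.
counts = {
    "Nintendo DS": (462, 207, 376),
    "PC": (1394, 1323, 653),
    "PlayStation": (362, 313, 277),
    "PlayStation 2": (716, 542, 428),
    "PlayStation 3": (516, 569, 271),
    "Wii": (551, 321, 494),
    "Wireless": (473, 308, 129),
    "Xbox": (307, 354, 160),
    "Xbox 360": (631, 646, 354),
    "iPhone": (412, 321, 109),
}
wide = pd.DataFrame.from_dict(
    counts, orient="index",
    columns=["Average", "Better than Good", "Worse than Okay"],
)

# Convert each platform's counts to percentages of its own total.
percent = (wide.div(wide.sum(axis=1), axis=0) * 100).round(1)
print(percent.sort_values("Better than Good", ascending=False))
```

By these counts, Xbox has the largest "Better than Good" share (about 43%) and iPhone the smallest "Worse than Okay" share (about 13%), while Wii and Nintendo DS have the largest low-quality shares (about 36% each) and the smallest high-quality shares. That does not match the visual reading described above, so the chart's colors and stacking order are worth double-checking against the table.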
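Summary
That is everything I set out to analyze. It started as a simple idea I had on first seeing the data, which I then tried to work out with data analysis methods for my own understanding. Later I will analyze this dataset further, for example the relationship between year and score, and between genre and score. I hope this post serves as a starting point and leaves you with your own ideas about how to begin analyzing a dataset.
If anything in this post is unclear, questions are welcome, as are corrections and suggestions.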