Python + Neo4j (Installation): Visualizing the Character Relationship Graph of Marvel's First Ten Years

Lili Liang 梁莉莉

Created: May 3, 2019 | Last updated: Jan 5, 2025 | # Visualization

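1 Data scraping
- 1.1 The Marvel character relationship graph website
- 1.2 Scraping the character relationship data
2 Installing Neo4j and starting the service
- 2.1 Downloading and installing Neo4j
- 2.2 Starting the Neo4j service
3 Data preparation
- 3.1 Adding column names
- 3.2 Copying the files into the local Neo4j import folder
4 Data visualization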

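1 Data scraping

1.1 The Marvel character relationship graph website

Link: https://graphics.straitstimes.com/STI/STIMEDIA/Interactives/2018/04/marvel-cinematic-universe-whos-who-interactive/index.html
- Note: the site is blocked in mainland China, so many readers have reported that it will not open; it is reachable through a VPN. The GitHub repository with the scraped csv files is linked later in this post, so feel free to take the files from there.
About the site: it is built on graph technology and presents a graph of Marvel characters and Marvel films.
Site overview

Home page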

python-neo4j-marvel-1

Character relationships

python-neo4j-marvel-2

Click an avatar to see that character's details (Iron Man!!)

python-neo4j-marvel-3

Marvel Cinematic Universe films

python-neo4j-marvel-4

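1.2 Scraping the character relationship data

Note: Google Chrome is the recommended browser

Press F12 to open the developer tools and select the Network tab (make sure you are on the home page)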

python-neo4j-marvel-5

Search by keyword (Ctrl + F) for marvel-data.json

python-neo4j-marvel-6

Open marvel-data.json and copy the URL of the JSON endpoint

python-neo4j-marvel-7

The two parts of the JSON we need to scrape are characters and relationship

python-neo4j-marvel-8
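Before scraping, it can help to confirm what these two sections contain. The helper below is only a sketch: `summarize_sections` is a hypothetical name, it assumes each section is either a mapping keyed by id or a list of records, and it is shown on a tiny made-up sample rather than on the live JSON.

```python
def summarize_sections(data, sections=("characters", "relationship")):
    """Return {section: (record_count, sorted field names of the first record)}."""
    summary = {}
    for section in sections:
        block = data.get(section, {})
        # Accept both a mapping keyed by id and a plain list of records
        records = list(block.values()) if isinstance(block, dict) else list(block)
        fields = sorted(records[0].keys()) if records else []
        summary[section] = (len(records), fields)
    return summary


# Made-up sample shaped like the fields the scraper reads; not real site data
sample = {
    "characters": {
        "0": {"id": "tonys", "name": "Tony Stark", "status": "sample", "species": "sample"},
    },
    "relationship": {
        "0": {"id": "tonys", "target_id": "stever", "relationship": "0"},
    },
}

for section, (count, fields) in summarize_sections(sample).items():
    print(section, count, fields)
```

With the real payload, pass the parsed JSON (for example the result of `requests.get(url).json()`) instead of `sample`.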

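Python code

A Python environment must be set up before running the code. I use PyCharm as the IDE; for its installation, see my post on installing Python + TensorFlow + Jupyter on Windows 10.

The url in the code is the JSON endpoint copied from the site in the previous step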

import csv
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}

url = 'https://graphics.straitstimes.com/STI/STIMEDIA/Interactives/2018/04/marvel-cinematic-universe-whos-who-interactive/data/marvel-data.json'
response = requests.get(url=url, headers=headers)
response.raise_for_status()  # stop early if the request failed
result = response.json()

# Numeric relationship codes used by the site, mapped to readable labels
item = {0: 'friend', 1: 'enemy', 2: 'creation', 3: 'family', 4: 'work', 5: 'love'}

names = []  # every character id that appears in a relationship, in first-seen order

# Each file is opened once in 'w' mode, so re-running the script overwrites the
# output instead of appending duplicate rows. csv.writer quotes any field that
# itself contains a comma.
with open('relation_message.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, lineterminator='\n')
    for rel in result['relationship'].values():
        subject = rel['id']
        target = rel['target_id']

        if subject not in names:
            names.append(subject)
        if target not in names:
            names.append(target)

        relation = int(rel['relationship'])
        writer.writerow([subject, target, item[relation]])

# One row per character id, numbered from 1
with open('names_message.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, lineterminator='\n')
    for num, name in enumerate(names, start=1):
        writer.writerow([name, num])

# Full character details: id, name, status, species
with open('message.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, lineterminator='\n')
    for character in result['characters'].values():
        writer.writerow([character['id'], character['name'],
                         character['status'], character['species']])

When the script finishes, it produces three csv files

GitHub repository with the csv files: https://github.com/leungll/Marvel-File

python-neo4j-marvel-9

That completes the data scraping.

2 Installing Neo4j and starting the service

2.1 Downloading and installing Neo4j

Download it from the official site: https://neo4j.com/download-center/#releases

python-neo4j-marvel-10

If no download prompt appears after you click, use the manual download link instead

python-neo4j-marvel-11

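Unpack the archive

Note: make sure the installation path contains no Chinese characters and no spaces (for example, a folder named Program Files). Otherwise text becomes garbled when a remote server accesses the local database, and the data cannot be read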

python-neo4j-marvel-12

2.2 Starting the Neo4j service

Open Windows PowerShell as administrator

python-neo4j-marvel-13

Start the service (replace <NEO4J_HOME> with the directory where you unpacked Neo4j)

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned
Import-Module '<NEO4J_HOME>\bin\Neo4j-Management.psd1'
Invoke-Neo4j console

python-neo4j-marvel-14

Open the server in a browser

localhost:7474

python-neo4j-marvel-15

Log in

Initial username: neo4j; initial password: neo4j

python-neo4j-marvel-16

The server then asks you to change the password

python-neo4j-marvel-17

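3 Data preparation

Of the three generated csv files, names_message.csv holds the characters and relation_message.csv holds the relationships between them

3.1 Adding column names

In names_message.csv, add the header row name,id (the column names that the Cypher statement in section 4.1 reads)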

python-neo4j-marvel-18

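In relation_message.csv, add the header row subject,object,relation (the column names that the Cypher statement in section 4.2 reads)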

python-neo4j-marvel-19
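Instead of editing the files by hand, the header rows can also be prepended with a short script. This is only a sketch: `add_header` is a hypothetical helper, the files are assumed to be UTF-8, and the column names are inferred from the Cypher statements in section 4 and from the column order written by the scraper.

```python
from pathlib import Path


def add_header(path, header):
    """Prepend `header` as the first line of a csv file unless it is already there."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    lines = text.splitlines()
    if lines and lines[0].strip() == header:
        return False  # header already present, leave the file untouched
    path.write_text(header + "\n" + text, encoding="utf-8")
    return True


# Example calls, to be run in the folder that contains the generated files:
# add_header("names_message.csv", "name,id")
# add_header("relation_message.csv", "subject,object,relation")
```

Because the function checks the first line before writing, running it twice does not add a second header row.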

3.2 Copying the files into the local Neo4j import folder

python-neo4j-marvel-20
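By default, a `file:///` URL in LOAD CSV is resolved against Neo4j's import directory, which is why the files have to be placed there. The copy can also be scripted. The sketch below assumes the import folder sits directly under the Neo4j install directory; `copy_to_import` is a hypothetical helper and the path in the example call is a placeholder.

```python
import shutil
from pathlib import Path


def copy_to_import(csv_files, neo4j_home):
    """Copy csv files into <neo4j_home>/import and return the destination paths."""
    import_dir = Path(neo4j_home) / "import"
    if not import_dir.is_dir():
        # Fail loudly rather than silently creating a folder in the wrong place
        raise FileNotFoundError(f"No import folder found at {import_dir}")
    copied = []
    for name in csv_files:
        destination = import_dir / Path(name).name
        shutil.copy(name, destination)
        copied.append(destination)
    return copied


# Example call; D:/neo4j is a hypothetical install path, replace it with your own:
# copy_to_import(["names_message.csv", "relation_message.csv"], "D:/neo4j")
```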

4 Data visualization

4.1 Loading names_message.csv

LOAD CSV WITH HEADERS FROM 'file:///names_message.csv' AS data
CREATE (:people{name:data.name, id:data.id});

python-neo4j-marvel-21

182 character nodes have been created

4.2 Loading relation_message.csv

LOAD CSV WITH HEADERS FROM "file:///relation_message.csv" AS relations
MATCH (entity1:people{name:relations.subject}) , (entity2:people{name:relations.object})
CREATE (entity1)-[:rel{relation: relations.relation}]->(entity2)

python-neo4j-marvel-22

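1144 relationships between characters have been created

4.3 Viewing the character relationship graph

Remove the limit (delete LIMIT 25 from the query)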

python-neo4j-marvel-23

Run the query

python-neo4j-marvel-24

The characters and their relationships are displayed

Switch to full screen

python-neo4j-marvel-25

Switch the node captions to the character names

python-neo4j-marvel-26

Switch the relationship captions to the relation type

python-neo4j-marvel-27

4.4 Filtering relationships

Tony Stark's friends

MATCH p=(n:people{name:"tonys"})-[:rel{relation:"friend"}]->() RETURN p;

python-neo4j-marvel-28

Here "thor" is Thor, "stever" is Captain America, "blackw" is Black Widow, "vision" is Vision, "peterp" is Spider-Man, and "bruceb" is the Hulk
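The short ids can also be translated back to full names with the third file produced by the scraper. The sketch below assumes message.csv has no header row and keeps the scraper's column order (id, name, status, species); `load_display_names` and `describe` are hypothetical helpers.

```python
import csv


def load_display_names(path="message.csv"):
    """Map each character id (e.g. "tonys") to its full name from message.csv."""
    with open(path, newline="", encoding="utf-8") as f:
        # Columns follow the scraper's write order: id, name, status, species
        return {row[0]: row[1] for row in csv.reader(f) if len(row) >= 2}


def describe(ids, display_names):
    """Replace known ids with full names and keep unknown ids unchanged."""
    return [display_names.get(i, i) for i in ids]


# Example, assuming message.csv is in the current folder:
# names = load_display_names()
# print(describe(["thor", "stever", "blackw"], names))
```

Unknown ids are returned as they are, so the helper is safe to apply to any query result.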

Captain America's girlfriend

MATCH p=(n:people{name:"stever"})-[:rel{relation:"love"}]->() RETURN p;

python-neo4j-marvel-29

Other queries follow the same pattern and read much like the SQL statements we already know, so feel free to experiment.
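The same filters can be run from Python with a parameterized query, which avoids editing the Cypher text for every character. This is a sketch under assumptions: `related_names` and `RELATED_QUERY` are hypothetical names, the `session` argument is expected to behave like a session from the official `neo4j` Python driver, and the connection details in the usage comment are placeholders.

```python
RELATED_QUERY = (
    "MATCH (n:people {name: $name})-[:rel {relation: $relation}]->(m:people) "
    "RETURN m.name AS name"
)


def related_names(session, name, relation):
    """Return the names reached from `name` through relationships of the given kind."""
    result = session.run(RELATED_QUERY, {"name": name, "relation": relation})
    return [record["name"] for record in result]


# Usage sketch, assuming the `neo4j` package is installed and the server from
# section 2.2 is running; the password is a placeholder:
#
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "<your-password>"))
# with driver.session() as session:
#     print(related_names(session, "tonys", "friend"))
#     print(related_names(session, "stever", "love"))
# driver.close()
```

Passing the name and relation as parameters rather than formatting them into the string keeps the query text fixed and sidesteps quoting problems.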

Tags: / Python / Neo4j / Marvel