  • Content-Based Recommendation Example (Movie Recommendation)

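    About the example

    Recommendations are based on four key metadata fields for each movie:

    director

    cast

    keywords

    genres

    The corresponding dataset columns are 'keywords', 'cast', 'genres', 'director'.

    These fields are concatenated into one natural-language string per movie. CountVectorizer then counts how many times each word occurs, and those counts form the movie's feature vector.

    Cosine similarity between the feature vectors gives the similarity between every pair of movies.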
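
    To make the two steps concrete, here is a minimal sketch on three made-up movies. The strings in `docs` are hypothetical combined features, not rows from the actual dataset; only `CountVectorizer` and `cosine_similarity` come from the article.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical combined-feature strings for three made-up movies
# (keywords, cast, genres and director joined by spaces).
docs = [
    "space hero alice action smith",
    "space hero bob action smith",
    "romance paris carol drama jones",
]

cv = CountVectorizer()
counts = cv.fit_transform(docs)   # sparse matrix: one row per movie, one column per word
sim = cosine_similarity(counts)   # 3 x 3 matrix of pairwise similarities

print(sorted(cv.vocabulary_))     # the words that became feature columns
print(sim.round(2))
```

    The first two movies share four of their five words, so their similarity is 4 / (sqrt(5) * sqrt(5)) = 0.8. The third movie shares no words with the others, so its similarity to both is 0.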

    Code

    https://github.com/fanqingsong/Content-based-Recommandation-Engine

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    
    
    def get_title_from_index(index):
        return df[df.index == index]["title"].values[0]
    
    
    def get_index_from_title(title):
        return df[df.title == title]["index"].values[0]
    
    
    # Reading CSV File
    df = pd.read_csv("movie_dataset_content.csv", encoding='utf-8')
    
    
    # Selecting Features
    features = ['keywords', 'cast', 'genres', 'director']
    
    # Creating a column in DF which combines all selected features
    for feature in features:
        df[feature] = df[feature].fillna('')
    
    
    def combine_features(row):
        return row['keywords'] + " " + row['cast'] + " " + row["genres"] + " " + row["director"]
    
    
    df["combined_features"] = df.apply(combine_features, axis=1)
    
    # making an object of CountVectorizer class to create count matrix
    cv = CountVectorizer()
    
    # Creating count matrix from this new combined column
    count_matrix = cv.fit_transform(df["combined_features"])
    
    # Computing the Cosine Similarity based on the count_matrix
    cosine_sim = cosine_similarity(count_matrix)
    
    movie_liked_by_user = "Thor"
    
    # Getting index of this movie from its title
    liked_movie_index = get_index_from_title(movie_liked_by_user)
    
    similar_movies = list(enumerate(cosine_sim[liked_movie_index]))
    
    # Get a list of similar movies in descending order of similarity score
    predictions = sorted(similar_movies, key=lambda x: x[1], reverse=True)
    
    # Print the top 11 titles: the liked movie itself (similarity 1.0) plus its 10 most similar movies
    i = 0
    for movie in predictions:
        print(get_title_from_index(movie[0]))
        i = i+1
        if i > 10:
            break

    Running it

    root@DESKTOP-OGSLB14:~/mine/Content-based-Recommandation-Engine# python3 content_based_recommender.py
    Thor
    Thor: The Dark World
    The Avengers
    Captain America: The Winter Soldier
    Avengers: Age of Ultron
    Captain America: Civil War
    Pirates of the Caribbean: Dead Man's Chest
    Cinderella
    Jack Ryan: Shadow Recruit
    The Amazing Spider-Man 2
    Captain America: The First Avenger
    root@DESKTOP-OGSLB14:~/mine/Content-based-Recommandation-Engine#
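
    The first title printed is the liked movie itself, because every movie has similarity 1.0 with itself. One possible way to leave it out and return exactly n recommendations is sketched below. The function name `recommend` and the `toy` DataFrame are hypothetical and not part of the original repository; the sketch only assumes a DataFrame with `title` and `combined_features` columns, as built in the script above.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend(df, title, n=10):
    """Return up to n titles most similar to `title`, excluding `title` itself."""
    df = df.reset_index(drop=True)  # make row labels equal to row positions
    counts = CountVectorizer().fit_transform(df["combined_features"])
    sim = cosine_similarity(counts)
    matches = df.index[df["title"] == title]
    if len(matches) == 0:
        raise KeyError(f"title not found: {title!r}")
    liked = matches[0]
    # Rank all row positions by similarity to the liked movie, highest first.
    ranked = sim[liked].argsort()[::-1]
    # Drop the liked movie itself, then keep the first n.
    ranked = [i for i in ranked if i != liked][:n]
    return df["title"].iloc[ranked].tolist()


# Hypothetical toy data standing in for movie_dataset_content.csv.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "combined_features": [
        "space hero action smith",
        "space hero action jones",
        "space romance drama jones",
        "cooking documentary lee",
    ],
})
print(recommend(toy, "A", n=2))
```

    Raising `KeyError` for an unknown title is a design choice of this sketch. The original `get_index_from_title` would fail with an `IndexError` instead.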

  • Original article: https://www.cnblogs.com/lightsong/p/11235333.html