  • Teacher Recruitment Exam Data Analysis

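    Background: my girlfriend recently passed the teacher recruitment exam (for a permanent, publicly funded teaching post), so I obtained the written-test and interview data for that exam and analyzed how the two scores relate to being hired.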

    Data source: teacher recruitment exam scores from xx city, xx province

    1. Prepare the data:

    # Import the required packages
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from sklearn import svm
    from sklearn import metrics
    import matplotlib.pyplot as plt
    import seaborn as sns
    from pandas import plotting

    sns.set_style("whitegrid")
    # Note: Matplotlib 3.6+ renamed this style to 'seaborn-v0_8'
    plt.style.use('seaborn')
    # Load the dataset (path kept as it appears in the source; its directory separators seem to have been lost)
    io = r'G:PythonLearnirisdataDataCalculate.xlsx'
    data = pd.read_excel(io, sheet_name='Sheet1')

    Inspect the columns and their types:

    data.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 39 entries, 0 to 38
    Data columns (total 7 columns):
     #   Column            Non-Null Count  Dtype  
    ---  ------            --------------  -----  
     0   ranking_written   39 non-null     int64  
     1   written           39 non-null     float64
     2   ranking_audition  39 non-null     int64  
     3   audition          39 non-null     float64
     4   total             39 non-null     float64
     5   ranking_total     39 non-null     int64  
     6   complete          39 non-null     object 

    Print the data:

    print(data)
        ranking_written  written  ranking_audition  ...  total  ranking_total  complete
    0                 1    84.75                 2  ...  87.30              1        ON
    1                 2    78.70                 3  ...  84.40              2        ON
    2                 7    75.15                 1  ...  83.58              3        ON
    3                12    72.70                 4  ...  81.88              4        ON
    4                 8    74.70                 8  ...  81.72              5        ON
    5                 4    75.70                15  ...  81.52              6        ON
    6                 3    76.15                21  ...  81.34              7        ON
    7                13    72.05                 6  ...  81.26              8        ON
    8                 6    75.20                19  ...  81.08              9        ON
    9                11    73.95                16  ...  80.82             10        ON
    10               15    70.70                 7  ...  80.60             11        ON
    11               10    73.95                22  ...  80.46             12        ON
    12               14    71.65                10  ...  80.38             13        ON
    13                9    74.15                29  ...  79.82             14       OFF
    14                5    75.55                33  ...  79.78             15       OFF
    15               29    65.10                 5  ...  78.72             16       OFF
    16               19    68.80                18  ...  78.64             17       OFF
    17               21    67.05                11  ...  78.30             18       OFF
    18               17    69.60                31  ...  77.76             19       OFF
    19               25    65.70                13  ...  77.64             20       OFF
    20               20    68.35                26  ...  77.62             21       OFF
    21               22    66.50                20  ...  77.60             22       OFF
    22               26    65.60                14  ...  77.60             23       OFF
    23               30    65.10                12  ...  77.52             24       OFF
    24               32    63.85                 9  ...  77.38             25       OFF
    25               16    70.20                35  ...  77.16             26       OFF
    26               24    65.75                23  ...  76.82             27       OFF
    27               27    65.55                25  ...  76.62             28       OFF
    28               31    64.95                24  ...  76.50             29       OFF
    29               18    69.10                38  ...  76.12             30       OFF
    30               28    65.45                32  ...  76.10             31       OFF
    31               23    65.85                34  ...  75.78             32       OFF
    32               38    59.30                17  ...  74.96             33       OFF
    33               34    60.65                27  ...  74.54             34       OFF
    34               36    60.00                28  ...  74.28             35       OFF
    35               33    62.35                37  ...  73.78             36       OFF
    36               39    59.25                30  ...  73.74             37       OFF
    37               35    60.20                36  ...  73.16             38       OFF
    38               37    59.90                39  ...  23.96             39       OFF
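
    The printed table elides the audition column, but the visible columns allow a quick consistency check. The sketch below assumes that total = 0.4 × written + 0.6 × audition. This weighting is an inference from the numbers, not something the source states. It then solves for the implied audition score on a few rows copied from the table. The weights and the helper name `implied_audition` are assumptions made for illustration.

```python
# Assumed weighting (inferred from the table, NOT stated in the source):
#   total = 0.4 * written + 0.6 * audition
W_WRITTEN, W_AUDITION = 0.4, 0.6

# (row index, written, total) copied from the printed table
rows = [
    (0, 84.75, 87.30),
    (1, 78.70, 84.40),
    (2, 75.15, 83.58),
    (3, 72.70, 81.88),
    (38, 59.90, 23.96),
]


def implied_audition(written, total):
    """Solve the assumed weighting formula for the audition score."""
    score = (total - W_WRITTEN * written) / W_AUDITION
    return round(score, 2) + 0.0  # "+ 0.0" turns a possible -0.0 into 0.0


implied = {idx: implied_audition(w, t) for idx, w, t in rows}
for idx, score in implied.items():
    print(idx, score)
```

    Under this assumption the first four rows give 89.0, 88.2, 89.2 and 88.0, whose order matches their interview ranks (2, 3, 1, 4), and the last row gives 0. That would explain its total of 23.96, for example if that candidate did not attend the interview. It also marks the row as an outlier worth keeping in mind for the plots that follow.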

    1. Explore the relationships in the data:

     Use violinplot and pointplot to explore, through the distributions and slopes, how the written test and the interview relate to being hired.

    # Color theme
    antV = ['#1890FF', '#2FC25B', '#FACC14', '#223273', '#8543E0', '#13C2C2', '#3436c7', '#F04864']
    # Draw violin plots
    # Relationship between each feature and being hired
    f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
    sns.despine(left=True)
    sns.violinplot(x='complete', y='ranking_written', data=data, palette=antV, ax=axes[0, 0])
    sns.violinplot(x='complete', y='written', data=data, palette=antV, ax=axes[0, 1])
    sns.violinplot(x='complete', y='ranking_audition', data=data, palette=antV, ax=axes[1, 0])
    sns.violinplot(x='complete', y='audition', data=data, palette=antV, ax=axes[1, 1])

     

    # Draw point plots
    # Relationship between each feature and being hired
    f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
    sns.despine(left=True)
    sns.pointplot(x='complete', y='ranking_written', data=data, color=antV[0], ax=axes[0, 0])
    sns.pointplot(x='complete', y='written', data=data, color=antV[0], ax=axes[0, 1])
    sns.pointplot(x='complete', y='ranking_audition', data=data, color=antV[0], ax=axes[1, 0])
    sns.pointplot(x='complete', y='audition', data=data, color=antV[0], ax=axes[1, 1])
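
    By default `sns.pointplot` marks the mean of each group (with a confidence interval), so the slope between ON and OFF reflects the difference in group means. A minimal sketch that computes those means directly for the `written` column, using the values from the printed table above (`demo` is a hypothetical name and the other columns are omitted for brevity):

```python
import pandas as pd

# Written scores copied from the printed table:
# rows 0-12 are ON (hired), rows 13-38 are OFF
written_on = [84.75, 78.70, 75.15, 72.70, 74.70, 75.70, 76.15,
              72.05, 75.20, 73.95, 70.70, 73.95, 71.65]
written_off = [74.15, 75.55, 65.10, 68.80, 67.05, 69.60, 65.70,
               68.35, 66.50, 65.60, 65.10, 63.85, 70.20, 65.75,
               65.55, 64.95, 69.10, 65.45, 65.85, 59.30, 60.65,
               60.00, 62.35, 59.25, 60.20, 59.90]

demo = pd.DataFrame({
    "written": written_on + written_off,
    "complete": ["ON"] * len(written_on) + ["OFF"] * len(written_off),
})

# Group size and mean written score per outcome
summary = demo.groupby("complete")["written"].agg(["count", "mean"])
print(summary.round(2))
```

    The same `groupby` call on the full `data` frame would give the values behind the other three panels.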

     

    Pairwise matrix plot of the features:

    sns.pairplot(data=data, palette=antV, hue='complete')

    Andrews curves are useful for data validation: a record with anomalous values shows up as a curve that departs from the rest.
    plt.subplots(figsize=(10, 8))
    plotting.andrews_curves(data, 'complete', colormap='cool')
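
    For intuition, each row of features is turned into one curve using the usual Andrews definition, f(t) = x1/√2 + x2·sin(t) + x3·cos(t) + x4·sin(2t) + x5·cos(2t) + ... The sketch below evaluates that formula with NumPy. The function name `andrews_curve` and the two feature rows are made up for illustration and are not taken from the dataset.

```python
import numpy as np


def andrews_curve(x, t):
    """Evaluate f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ..."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2.0))
    for i, coef in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            result = result + coef * np.sin(harmonic * t)
        else:
            result = result + coef * np.cos(harmonic * t)
    return result


t = np.linspace(-np.pi, np.pi, 200)
# Hypothetical feature rows: one typical, one with an unusually low fourth value
typical = andrews_curve([10, 74.0, 12, 88.0, 82.0], t)
unusual = andrews_curve([10, 74.0, 12, 5.0, 82.0], t)

# The two curves differ by exactly (88 - 5) * sin(2t), so the gap peaks at 83
print(np.abs(typical - unusual).max())
```

    Because a single unusual feature changes the whole curve, an anomalous record tends to stand apart visually from the bundle of ordinary ones.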

    Linear regression of the interview score on the written score, and of the interview rank on the written rank:

    sns.lmplot(data=data, x='written', y='audition', palette=antV, hue='complete')

    sns.lmplot(data=data, x='ranking_written', y='ranking_audition', palette=antV, hue='complete')
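
    To put a number on the second plot, the sketch below fits one least-squares line through the two rank columns and computes their correlation, using the ranks copied from the printed table. Note one difference: `lmplot` with `hue='complete'` fits a separate line per group, whereas this sketch pools all 39 candidates.

```python
import numpy as np

# Ranks copied from the printed table, in row order 0..38
ranking_written = np.array([
    1, 2, 7, 12, 8, 4, 3, 13, 6, 11, 15, 10, 14,
    9, 5, 29, 19, 21, 17, 25, 20, 22, 26, 30, 32, 16,
    24, 27, 31, 18, 28, 23, 38, 34, 36, 33, 39, 35, 37,
])
ranking_audition = np.array([
    2, 3, 1, 4, 8, 15, 21, 6, 19, 16, 7, 22, 10,
    29, 33, 5, 18, 11, 31, 13, 26, 20, 14, 12, 9, 35,
    23, 25, 24, 38, 32, 34, 17, 27, 28, 37, 30, 36, 39,
])

# Least-squares line: ranking_audition ≈ slope * ranking_written + intercept
slope, intercept = np.polyfit(ranking_written, ranking_audition, deg=1)

# Pearson correlation of two rank columns (this is Spearman's rho when there are no ties)
rho = np.corrcoef(ranking_written, ranking_audition)[0, 1]
print(f"slope={slope:.3f}, intercept={intercept:.3f}, rho={rho:.3f}")
```

    Since both columns are permutations of 1 to 39, they have the same variance, so the fitted slope equals the correlation coefficient.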

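    Finally, use a heatmap to find the correlations between attributes; the sign of each cell shows whether the correlation is positive or negative: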
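
    The source does not include the code for this step. One possible sketch, assuming a Pearson correlation matrix drawn with `sns.heatmap`, is shown here on made-up data with the same column names (the scores, the weighting used for `total` and the name `demo` are all invented for the example):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up scores with the same column names as the article's dataset
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "written": rng.normal(68, 6, 39).round(2),
    "audition": rng.normal(85, 3, 39).round(2),
})
# Arbitrary weighting, chosen only for this demo
demo["total"] = ((demo["written"] + demo["audition"]) / 2).round(2)
for col in ["written", "audition", "total"]:
    # Rank 1 = highest score
    demo["ranking_" + col] = demo[col].rank(ascending=False, method="first").astype(int)

# Pearson correlation between every pair of numeric columns
corr = demo.corr()

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation heatmap (synthetic data)")
```

    On the article's `data` frame the non-numeric `complete` column would have to be excluded first, for example with `data.drop(columns='complete').corr()`. Scores and their ranks correlate negatively because rank 1 is the highest score.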

    2. Machine learning

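    Use machine learning to predict from the written and interview scores whether a candidate is hired, with the written and interview ranks as auxiliary features.

    Before training, split the dataset into a training set and a test set, and encode the hired / not hired label as 1 / 0.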

    # Load the features and the labels
    X = data[['ranking_written', 'written', 'ranking_audition', 'audition', 'total', 'ranking_total']]
    Y = data['complete']
    # Encode the labels
    encoder = LabelEncoder()
    y = encoder.fit_transform(Y)
    print(y)
    [1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     0 0]
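
    `LabelEncoder` sorts the distinct labels and numbers them in that order, which is why `OFF` becomes 0 and `ON` becomes 1 above. A small sketch on toy labels (the list `labels` is made up; only the two label values come from the dataset):

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels using the same two values as the `complete` column
labels = ["ON", "ON", "OFF", "OFF", "ON"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

print(encoder.classes_.tolist())                    # ['OFF', 'ON']
print(codes.tolist())                               # [1, 1, 0, 0, 1]
# inverse_transform maps the integer codes back to the original labels
print(encoder.inverse_transform([0, 1]).tolist())   # ['OFF', 'ON']
```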

     Split the dataset 7:3 into training data and test data:

    # Train on the ranks and scores of every stage to predict the final outcome
    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=101)
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
    (27, 6) (27,) (12, 6) (12,)
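
    With only 39 rows and a 13:26 class balance, a plain random split can leave very few hired candidates in the test set. As an optional variant (not what the article does), `train_test_split` accepts `stratify=` to keep the class ratio equal in both parts. The sketch uses a placeholder feature column and the same label counts as the printed `y`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same class balance as the printed labels: 13 hired (1), 26 not hired (0)
y = np.array([1] * 13 + [0] * 26)
# Placeholder feature (just the row index); the real features are not needed here
X = np.arange(len(y)).reshape(-1, 1)

train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=101, stratify=y
)

print(train_X.shape, test_X.shape)
print(train_y.sum(), test_y.sum())
```

    The split sizes stay 27 and 12, and because exactly one third of the candidates are hired, the stratified split puts 9 of them in the training set and 4 in the test set.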

    Compare the accuracy of models trained on different feature sets:

    # Generic fit / predict / score pattern
    model = svm.SVC()
    model.fit(train_X, train_y)
    prediction = model.predict(test_X)
    print('The accuracy of the SVM is: {0}'.format(metrics.accuracy_score(test_y, prediction)))
    The accuracy of the SVM is: 1.0
    # Written-test features vs. the final outcome
    written = data[['ranking_written', 'written', 'complete']]
    train_w, test_w = train_test_split(written, test_size=0.3, random_state=0)
    train_x_w = train_w[['ranking_written', 'written']]
    train_y_w = train_w.complete
    test_x_w = test_w[['ranking_written', 'written']]
    test_y_w = test_w.complete
    
    model = svm.SVC()
    model.fit(train_x_w, train_y_w)
    prediction = model.predict(test_x_w)
    print('The accuracy of the SVM using Written is: {0}'.format(metrics.accuracy_score(test_y_w, prediction)))
    # Interview features vs. the final outcome
    audition = data[['ranking_audition', 'audition', 'complete']]
    train_a, test_a = train_test_split(audition, test_size=0.3, random_state=0)
    train_x_a = train_a[['ranking_audition', 'audition']]
    train_y_a = train_a.complete
    test_x_a = test_a[['ranking_audition', 'audition']]
    test_y_a = test_a.complete
    
    model = svm.SVC()
    model.fit(train_x_a, train_y_a)
    prediction = model.predict(test_x_a)
    print('The accuracy of the SVM using audition is: {0}'.format(metrics.accuracy_score(test_y_a, prediction)))
    # Total-score features vs. the final outcome
    total = data[['ranking_total', 'total', 'complete']]
    train_t, test_t = train_test_split(total, test_size=0.3, random_state=0)
    train_x_t = train_t[['ranking_total', 'total']]
    train_y_t = train_t.complete
    test_x_t = test_t[['ranking_total', 'total']]
    test_y_t = test_t.complete
    model = svm.SVC()
    model.fit(train_x_t, train_y_t)
    prediction = model.predict(test_x_t)
    print('The accuracy of the SVM using total is: {0}'.format(metrics.accuracy_score(test_y_t, prediction)))
    The accuracy of the SVM is: 1.0
    The accuracy of the SVM using Written is: 0.9166666666666666
    The accuracy of the SVM using audition is: 0.8333333333333334
    The accuracy of the SVM using total is: 1.0
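
    With a 12-sample test set, accuracy can only move in steps of 1/12, so the figures above correspond to 0, 1 and 2 misclassified candidates. A small sketch of that arithmetic follows; the label lists are hypothetical and are not the author's actual test split.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Hypothetical 12-sample test set (not the author's actual split)
y_true = ["ON"] * 4 + ["OFF"] * 8
one_miss = ["OFF"] + y_true[1:]          # 11 of 12 correct
two_miss = ["OFF", "OFF"] + y_true[2:]   # 10 of 12 correct

print(accuracy(y_true, y_true))    # 1.0
print(accuracy(y_true, one_miss))  # 0.9166666666666666
print(accuracy(y_true, two_miss))  # 0.8333333333333334
```

    The perfect score for the models that see `total` and `ranking_total` is unsurprising if, as the printed table suggests, hiring follows the total rank directly: every row ranked 1 to 13 is ON and every row ranked 14 or lower is OFF.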
  • Original article: https://www.cnblogs.com/ad-zhou/p/13716971.html