  • S&P_01_Analyzing one categorical variable

    1. Analyzing categorical data

    1.1 Identifying individuals, variables and categorical variables in a data set

    Two types of variables are used in statistics: Quantitative and Categorical (also called qualitative). Quantitative variables are numerical variables: counts, percents, or numbers. Categorical variables are descriptions of groups or things, like “breeds of dog” or “voting preference”.

    Quantitative variables can be counted, like the numbers on a deck of cards: 2, 3, 4, 5, 6, and so on are all quantitative. In other words, they are numerical values.

    General rule: if you can add it, it’s quantitative. For example, a G.P.A. of 3.3 and a G.P.A. of 4.0 can be added together (3.3 + 4.0 = 7.3), so that means it’s quantitative. 

    A deck of cards also has qualitative values. The qualitative values are descriptions: spades, clubs, diamonds, and hearts.

    As a general rule, if you can’t add something, then it’s categorical. For example, you can’t add cat + dog, or Republican + Democrat.
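    The card example can be sketched in a few lines of Python. This is only an illustration: the mini-deck (number cards 2 to 10 in four suits) and the variable names are assumptions made for the example. It shows that arithmetic makes sense for the quantitative variable, while the categorical variable can only be counted by category.

```python
from collections import Counter
from itertools import product

# Hypothetical mini-deck: number cards 2-10 in each of the four suits.
ranks = range(2, 11)                               # quantitative values
suits = ["spades", "clubs", "diamonds", "hearts"]  # categorical values
deck = list(product(ranks, suits))

# Quantitative: adding and averaging the values is meaningful.
mean_rank = sum(rank for rank, _ in deck) / len(deck)

# Categorical: we cannot add suits, only count cards per category.
suit_counts = Counter(suit for _, suit in deck)

print(mean_rank)          # 6.0
print(dict(suit_counts))  # 9 cards in each suit
```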

    1.2 Distributions in 2-way tables

    We have buckets for the amount of time spent studying, and we also create buckets for the percent correct. Then we figure out what percentage of the entire student population falls into each combination of categories. For example, 2% of the students studied 21 to 40 minutes and scored between 80 and 100% on the exam. This is a 2-way table, and it describes a joint distribution. You can view it in terms of 2 variables: the time studied and the percent correct.
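    As a rough sketch of how such a table is built, the Python below turns raw counts into joint percentages. The counts, the class size of 200, and every bucket label other than "21-40 min" and "80-100%" are invented for illustration. They are chosen so that the one cell quoted above comes out to 2%.

```python
# Hypothetical counts: rows are score buckets, columns are study-time buckets.
counts = {
    "80-100%": {"0-20 min": 8,  "21-40 min": 4,  "41+ min": 28},
    "60-79%":  {"0-20 min": 20, "21-40 min": 40, "41+ min": 40},
    "0-59%":   {"0-20 min": 32, "21-40 min": 20, "41+ min": 8},
}

# Total number of students in the (made-up) class.
total = sum(sum(row.values()) for row in counts.values())

# Joint distribution: each cell as a percentage of ALL students.
joint = {
    score: {time: 100 * n / total for time, n in row.items()}
    for score, row in counts.items()
}

print(joint["80-100%"]["21-40 min"])  # 2.0
```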

     

    All we did here is total up each of these rows and write the total in the margin; the row totals together add up to 100%. This marginal distribution describes the distribution of the scores in the class. For example, 20% of the students got 80 to 100% correct on the test, but the marginal distribution alone doesn't tell you how that breaks down by how much they studied.

    There is another marginal distribution: the distribution of the amount of time people in the class studied. To get it, we total up each of the columns, which gives the marginal distribution of the time studied.
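    Both marginal distributions can be computed by summing a table of joint percentages along rows or along columns. The sketch below uses an invented table (only the 2% cell and the 20% row total match figures quoted in the text; the other numbers and bucket labels are assumptions).

```python
# Hypothetical joint distribution in percent of all students.
# Rows are score buckets, columns are study-time buckets.
joint = {
    "80-100%": {"0-20 min": 4,  "21-40 min": 2,  "41+ min": 14},
    "60-79%":  {"0-20 min": 10, "21-40 min": 20, "41+ min": 20},
    "0-59%":   {"0-20 min": 16, "21-40 min": 10, "41+ min": 4},
}

# Marginal distribution of scores: total each row.
score_marginal = {score: sum(row.values()) for score, row in joint.items()}

# Marginal distribution of time studied: total each column.
time_marginal = {}
for row in joint.values():
    for time, pct in row.items():
        time_marginal[time] = time_marginal.get(time, 0) + pct

print(score_marginal)  # {'80-100%': 20, '60-79%': 50, '0-59%': 30}
print(time_marginal)   # {'0-20 min': 30, '21-40 min': 32, '41+ min': 38}
```

    Each marginal distribution sums to 100, since every student falls into exactly one bucket of each variable.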

    The distribution of one variable, given the bucket that you fall into for the other variable, is called a conditional distribution, because you are getting a distribution conditioned on a value of another variable.
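    A conditional distribution can be sketched by taking one column of the table and rescaling it so it sums to 100%. The table below is invented for illustration (only the 2% cell matches a figure from the text), and conditioning on the "21-40 min" bucket is an arbitrary choice.

```python
# Hypothetical joint distribution in percent of all students.
# Rows are score buckets, columns are study-time buckets.
joint = {
    "80-100%": {"0-20 min": 4,  "21-40 min": 2,  "41+ min": 14},
    "60-79%":  {"0-20 min": 10, "21-40 min": 20, "41+ min": 20},
    "0-59%":   {"0-20 min": 16, "21-40 min": 10, "41+ min": 4},
}

given_time = "21-40 min"  # the bucket we condition on

# Share of all students who fall into the chosen time bucket.
column_total = sum(row[given_time] for row in joint.values())

# Distribution of scores among only those students.
conditional = {
    score: 100 * row[given_time] / column_total
    for score, row in joint.items()
}

print(conditional)  # {'80-100%': 6.25, '60-79%': 62.5, '0-59%': 31.25}
```

    In this made-up table, 6.25% of the students who studied 21 to 40 minutes scored 80 to 100%, even though that group is only 2% of the whole class.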

  • Original article: https://www.cnblogs.com/tlfox2006/p/9394253.html