  • Credit Scorecards (part 4 of 7): Advanced Analytics

    Python credit scorecards (with code, recorded by the blogger)

    Credit Scorecards – Advanced Analytics (part 4 of 7)

    Modeling in Advanced Analytics

    Advanced Analytics: Model Development – by Roopam

    The room, full of analysts, erupts in loud laughter when a young business analyst tells us about an incident from his recent trip back home. A distant aunt had asked about his new profession. His response – I am into modeling. She got all excited and asked – is it just on the ramp, or will I see you on television? Jokes apart, this left me wondering about the roots of the word modeling, or model. What is a model?

    A model is defined as a simplified representation of reality. A representation of reality, hmmm: a photograph is a representation of reality, a moment of reality captured on the reel. Does that make it a model? I think yes. Similarly, a newspaper report that turns an incident into breaking news is also a model, a descriptive model. Now, let us try to link models with analytics.

    Data warehouse, Business Intelligence and Advanced Analytics

    Analytics has received a massive boost from the emergence of information technology. We are living in the era of big data. The plethora of data collected at every stage of the business process has created a need to extract knowledge from that information. This overall process has three aspects:

    1. Data warehouse or data marts: transactional data is extracted, transformed, and loaded (ETL) into a data model / schema for the purpose of analysis
    2. Business Intelligence or dashboards: "as is" business reports
    3. Predictive Analytics or Advanced Analytics: high-end statistical and data mining exercises
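    The ETL step in item 1, followed by an item-2-style "as is" report, can be sketched in a few lines. This is a minimal illustration only. Python's built-in sqlite3 in-memory database stands in for a real warehouse. The rows, the table name fact_payments and the helper functions extract/transform/load are hypothetical names invented for this example.

```python
import sqlite3

# HYPOTHETICAL raw transactional rows, as an operational system might export them:
# (customer_id, month, payment_amount stored as text with stray whitespace).
RAW_ROWS = [
    ("C1", "2018-01", "1200.50"),
    ("C1", "2018-02", "1199.50 "),
    ("C2", "2018-01", "800.00"),
    ("C2", "2018-02", " 0.00"),
]

def extract():
    """Extract: stand-in for reading from a source system."""
    return list(RAW_ROWS)

def transform(rows):
    """Transform: strip whitespace and convert amounts to numbers."""
    return [(cid, month, float(amount.strip())) for cid, month, amount in rows]

def load(conn, rows):
    """Load: write the cleaned rows into an analysis table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE fact_payments (customer_id TEXT, month TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_payments VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory DB standing in for a warehouse
load(conn, transform(extract()))

# A BI-style "as is" report on the loaded table: total paid and months per customer.
report = conn.execute(
    "SELECT customer_id, SUM(amount), COUNT(*) FROM fact_payments "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(report)  # [('C1', 2400.0, 2), ('C2', 800.0, 2)]
```

    The report only describes what happened. Predicting who will stop paying is the job of item 3.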

    As the volume of data grows exponentially, Hadoop and big data technologies are replacing data warehouses. However, the thought process for business intelligence and predictive analytics – the focus of this article – will not change much. Let me try to distinguish between business intelligence and predictive analytics using something I learned at a professional theater.

    5 Ws for Business Intelligence & Predictive Analytics – Lessons from Theater

    5 Ws for Data Warehouse, Business Intelligence, and Advanced Analytics – by Roopam

    I joined a professional theater group a few years ago. To understand the nuances of acting, we started with improv, or improvisational theater. This form of theater has no predefined script; the actors build the story while performing. Most people thought I was a good improv actor. However, the style of remembering dialogue while performing did not work very well for me, and hence it was the end of my theater gig. Still, I learned some good lessons from the whole experience. One of them was the five Ws of deciphering a character to build the drama.

    1. What happened?
    2. When did it happen?
    3. Where did it happen?
    4. Who was part of this?
    5. Why did it happen?

    Clearly, the first four questions try to report an as-is version of reality – a descriptive model. This is exactly what business intelligence professionals try to achieve through fancy reporting platforms and software. The fifth question is the trickiest of the lot: the question that keeps scientists and inquisitive minds awake late at night.

    Newton’s Legacy

    An apple falls from a tree. How difficult is it to answer the first four questions? Most of us can answer them with the help of a clock and a map. Isaac Newton, however, answered the fifth question, and his answer was gravity. Had he stopped there, nobody would remember him close to four hundred years after his birth. He went further and gave a mathematical model to explain this phenomenon.
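    For reference, the mathematical model Newton gave is the law of universal gravitation. Written for the apple and the earth, it is shown below. F is the attractive force between them, G is the gravitational constant, the m terms are the two masses, and r is the distance between their centres of mass.

```latex
F = G \, \frac{m_{\text{apple}} \, m_{\text{earth}}}{r^{2}}
```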

    Replace the apple and the earth with any other objects and you have the general equation for the model. Albert Einstein did shatter the Newtonian notion of gravity. However, Newton's model still holds good for most practical problems and is used extensively in rocket science.

    Advanced analytics uses predictive modeling to help answer the fifth question: why did something happen? The combination of high-end statistical and data mining techniques with analysts' business acumen produces models that help organizations make informed decisions. Remember, this is just the beginning; causality is still a fair distance away!

    Credit Scoring Models

    Credit scorecards are models that predict the probability that a borrower will default on his or her loan. The following is a simplified version of a credit score with three variables:

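    Credit Score = Age + Loan to Value Ratio (LTV) + Installment (EMI) to Income Ratio (IIR)

    Here each term stands for the score points of the bucket the borrower falls into for that variable, not the raw value of the variable.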

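    The loan-to-value ratio (LTV) is the ratio of the loan amount to the value of the collateral. It is mostly seen in secured lending, such as property-mortgage loans.

    For example, suppose customer A takes out a mortgage against a property appraised at RMB 1 million, and the bank's credit policy caps LTV at 70%. The bank can then lend customer A at most RMB 700,000.

    LTV limits for different types of collateral vary with each bank's own policy. They reflect the bank's expectations about the risk of that collateral.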
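    The LTV arithmetic from the mortgage example can be sketched as two small functions. The function names are hypothetical. LTV is expressed in percent, as in the LTV of 75 used in the scorecard example. Integer arithmetic is used for the cap so that no floating-point rounding creeps into the loan limit.

```python
def ltv_percent(loan_amount, collateral_value):
    """Loan-to-value ratio in percent: loan amount / collateral value * 100."""
    return loan_amount * 100 / collateral_value

def max_loan(collateral_value, ltv_cap_percent):
    """Largest loan allowed under an LTV cap (integer arithmetic, amounts in whole RMB)."""
    return collateral_value * ltv_cap_percent // 100

# Customer A: property appraised at RMB 1,000,000, bank caps LTV at 70%.
print(max_loan(1_000_000, 70))          # 700000
print(ltv_percent(700_000, 1_000_000))  # 70.0
```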

    A 28-year-old man with an LTV of 75 and an IIR of 60 will have a score of 10 + 50 + 5 = 65 and hence is a high credit risk.
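    A bucket-based scorecard like this one can be sketched in code. The article's actual point table is not reproduced in the text, so the bucket boundaries, the points and the high-risk cut-off below are HYPOTHETICAL. They were chosen only so that the worked example (age 28, LTV 75, IIR 60 gives 10 + 50 + 5 = 65) comes out the same. As in the article, a higher score here means higher risk.

```python
from bisect import bisect_right

# HYPOTHETICAL scorecard table (not the article's real one).
# Each variable maps to (bucket boundaries, points per bucket); a value equal to
# a boundary falls into the higher bucket, and there is one more bucket than boundaries.
SCORE_TABLE = {
    "age": ([25, 35, 50], [15, 10, 5, 0]),    # younger borrowers get more risk points
    "ltv": ([50, 70, 90], [10, 30, 50, 70]),  # higher LTV -> more risk points
    "iir": ([40, 65], [0, 5, 15]),            # higher IIR -> more risk points
}

# HYPOTHETICAL cut-off: scores at or above this are treated as high risk.
HIGH_RISK_CUTOFF = 60

def bucket_points(variable, value):
    """Return the points for the bucket that `value` falls into."""
    boundaries, points = SCORE_TABLE[variable]
    # bisect_right counts the boundaries <= value, which is the bucket index
    return points[bisect_right(boundaries, value)]

def credit_score(age, ltv, iir):
    """Sum the bucket points of the three variables."""
    return (bucket_points("age", age)
            + bucket_points("ltv", ltv)
            + bucket_points("iir", iir))

def risk_band(score):
    return "high" if score >= HIGH_RISK_CUTOFF else "low"

score = credit_score(age=28, ltv=75, iir=60)
print(score, risk_band(score))  # 65 high
```

    Keeping the buckets as sorted boundary lists makes the lookup a binary search. Estimating the points themselves from historical good/bad loans is the modeling question the article turns to next.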

    Classification of good & bad loans using two variables – LTV & IIR – by Roopam

    Now the question is: how did we arrive at the bucket-wise score points and the associated risk tables? By now, after going through the previous three articles in the series, you must have some idea of how we will go about it. We have a historical list of good / bad borrowers (article 2) that we want to distinguish using predictor variables (article 3). Several statistical and data mining techniques could help us achieve this objective, such as:

    1. Decision Trees
    2. Neural Networks
    3. Support Vector Machines
    4. Probit Regression
    5. Linear Discriminant Analysis
    6. Logistic Regression

    Logistic regression is the most commonly used technique for this purpose. We will explore logistic regression further in the next article.
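    Ahead of the next article, here is a minimal from-scratch sketch of the idea. Logistic regression separates good and bad loans using the two variables from the figure, LTV and IIR. The eight data points are HYPOTHETICAL toy values invented for illustration, not real loans; label 1 marks a bad loan. The fit uses plain batch gradient descent on the log-loss. This is a teaching sketch, not the article's method; in practice you would use a statistics or machine-learning library.

```python
import math

# HYPOTHETICAL toy data (not real loans): each row is (LTV, IIR) as fractions;
# label 1 = bad loan (default), 0 = good loan.
X = [(0.40, 0.20), (0.50, 0.30), (0.55, 0.25), (0.60, 0.35),
     (0.85, 0.65), (0.90, 0.70), (0.80, 0.60), (0.95, 0.55)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=1.0, epochs=5000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_default_prob(w, b, x):
    """Estimated probability that a borrower with features x defaults."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

w, b = fit_logistic(X, y)
print(predict_default_prob(w, b, (0.90, 0.70)))  # high LTV and IIR: expected above 0.5
print(predict_default_prob(w, b, (0.40, 0.20)))  # low LTV and IIR: expected below 0.5
```

    The fitted model outputs a default probability. Turning that probability into bucket-wise score points is a separate scaling step.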

    Sign-off Note

    I must conclude this article by saying that good analysts find a good mathematical model as beautiful as the model walking the catwalk.

    Python risk-control modeling in practice with LendingClub data (recorded by the blogger; CatBoost and LightGBM modeling; 2K ultra-HD resolution)

    https://study.163.com/course/courseMain.htm?courseId=1005988013&share=2&shareId=400000000398149

  • Original article: https://www.cnblogs.com/webRobot/p/9736387.html