Add directories and files

marslbr 2025-10-29 12:55:17 +08:00
parent fc4e7fce59
commit 0605aa66dd
11 changed files with 2728 additions and 0 deletions

6
.idea/vcs.xml Normal file

@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="VcsDirectoryMappings">
<mapping directory="" vcs="Git" />
</component>
</project>


@ -0,0 +1,256 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>Analysis Report - 结网</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="icon" type ="image/png" href="../icon.png">
<link rel="stylesheet" type="text/css" href="../stylesheet.css">
<script>
MathJax = {
tex: {inlineMath: [['$', '$'], ['\\(', '\\)']]}
};
</script>
<script id="MathJax-script" type="text/javascript" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
</head>
<body>
<div class="document-container">
<p class="document-block-title">User Persona Modeling Report Based on MovieLens</p>
<div class="document-block-description">
<span>刘弼仁</span>
<span> | </span>
<span>{{report_date}}</span>
</div>
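<h1>Background</h1>
<p>A user persona is a user model formed by converting information such as user attributes, behaviors, and preferences into labels and linking them together. Its purpose is to connect user needs with product operations, helping a business understand its users more efficiently and precisely, allocate resources more effectively when providing products and services, and improve operating efficiency.</p>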
<blockquote>
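<i>What kind of person is this user?</i>
<br>
Male, young to middle-aged, product manager, northerner; likes food, pets, 3C products, and the online games Onmyoji and Genshin Impact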
</blockquote>
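<p>The data for this report comes from MovieLens. By walking through user persona modeling and its application to user segmentation and movie recommendation, the report aims to give readers a basic understanding of user personas.</p>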
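<h1>User Persona Modeling</h1>
<h2>Dataset Description</h2>
<p>This report uses the MovieLens10K dataset, which consists of user, movie, and rating data collected from the MovieLens website and published by GroupLens. It was released on January 1, 2009.</p>
<h3>Dataset File Structure</h3>
<img src="MovieLens数据集的文件结构.png" alt="File structure of the MovieLens dataset">
<h2>Modeling Process</h2>
<h3>Loading the Data</h3>
<p>The dataset is loaded from local data files. It contains {{users}} users, {{movies}} movies, and {{ratings}} ratings.</p>
<h3>Building the Label System</h3>
<p>Because the user personas built in this report will be applied to user segmentation and movie recommendation, the label categories are designed as follows:</p>
<img src="标签分类.png" alt="Label categories">
<p>User segmentation will use kernel principal component analysis and K-means clustering, and movie recommendation will use the Apriori algorithm, so this report generates labels with one-hot encoding.</p>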
<blockquote>
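<i>One-Hot Encoding</i>
<br>
A method for converting nominal or ordinal features into a format that machine learning algorithms can handle easily. It derives one new binary feature for each value of the feature. In a derived feature, a sample is assigned 1 if it has that feature value and 0 otherwise.
<br>
<br>
For example, the "gender" feature derives two new features, "gender: male" and "gender: female", and every sample is assigned 0/1 on each derived feature. A male user is {"gender: male": 1, "gender: female": 0}; a female user is {"gender: male": 0, "gender: female": 1}.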
</blockquote>
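<p>The one-hot encoding described above can be sketched in a few lines. This is a minimal illustration, not the report's own implementation: the function name <code>one_hot</code> and the two sample users are assumptions made for the example.</p>

```python
def one_hot(samples, feature):
    """Derive one binary feature per distinct value of `feature`.

    `samples` is a list of dicts; the result is a list of dicts whose keys
    follow the "feature: value" naming used in this report.
    """
    values = sorted({sample[feature] for sample in samples})
    return [
        {f"{feature}: {value}": int(sample[feature] == value) for value in values}
        for sample in samples
    ]


# Hypothetical sample data for illustration only.
users = [{"gender": "male"}, {"gender": "female"}]
encoded = one_hot(users, "gender")
print(encoded)
```

<p>Each sample ends up with exactly one derived feature set to 1, which is what makes the encoded labels sparse.</p>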
<h3>Overall User Persona</h3>
<p>The overall user persona is described feature by feature.</p>
<h4>Gender</h4>
<iframe src="label_genders.html"></iframe>
<h4>Age</h4>
<p>Ages under 18 are labeled "age: under18", 18 to 24 "age: 18~24", 25 to 34 "age: 25~34", 35 to 44 "age: 35~44", 45 to 54 "age: 45~54", and over 54 "age: above54".</p>
<iframe src="label_ages.html"></iframe>
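<p>The age buckets defined above map directly onto a small lookup. The sketch below assumes ages are given in years; the names <code>AGE_BOUNDS</code> and <code>age_label</code> are hypothetical.</p>

```python
# (exclusive upper bound, label) pairs in ascending order.
AGE_BOUNDS = [
    (18, "age: under18"),
    (25, "age: 18~24"),
    (35, "age: 25~34"),
    (45, "age: 35~44"),
    (55, "age: 45~54"),
]


def age_label(age):
    """Return the age label for an age in years."""
    for upper, label in AGE_BOUNDS:
        if age < upper:
            return label
    return "age: above54"


print([age_label(a) for a in (17, 18, 24, 25, 54, 55)])
```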
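<h4>Occupation</h4>
<iframe src="label_occupations.html"></iframe>
<h4>State</h4>
<p>The state is resolved from the (US) postal code.</p>
<iframe src="label_states.html"></iframe>
<h4>Number of Ratings</h4>
<p>The number of ratings per user is counted and split into 5 equal-frequency bins.</p>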
<iframe src="label_ratings.html"></iframe>
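<p>Equal-frequency binning can be sketched by ranking the values and cutting the ranks into equally sized groups. This is a simplified illustration: the function name and the sample counts are assumptions, and tied values may be split across neighboring bins, which a production implementation would usually avoid.</p>

```python
def equal_frequency_bins(values, n_bins=5):
    """Return a bin index (0 .. n_bins - 1) for each value.

    Values are ranked in ascending order and the ranks are cut into
    n_bins groups of (nearly) equal size.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


# Hypothetical rating counts for ten users.
rating_counts = [10, 3, 50, 7, 22, 1, 90, 15, 4, 60]
bins = equal_frequency_bins(rating_counts)
print(bins)
```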
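<h4>Average Rating</h4>
<p>The average rating per user is computed and split into 5 equal-frequency bins.</p>
<iframe src="label_rating.html"></iframe>
<h4>Top 1 to 5 Most Recent Highest-Rated Movies</h4>
<p>For each user, the five most recent movies with the highest ratings are taken and defined as the user's top 1 to 5 favorite movies.</p>
<iframe src="label_movies.html"></iframe>
<h4>Top 1 to 5 Favorite Genres</h4>
<p>For each user, the five genres with the most rated movies are taken and defined as the user's top 1 to 5 favorite genres.</p>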
<iframe src="label_genres.html"></iframe>
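<h3>User Segmentation</h3>
<p>By grouping users according to how their labels are associated, a business can divide its user base into segments with similar attributes, behaviors, and preferences.</p>
<h4>Algorithm Description</h4>
<p>This report first reduces the dimensionality of the derived features with kernel principal component analysis, then clusters the result with K-means, using the gap statistic to determine the optimal number of clusters, and thereby segments the users.</p>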
<blockquote>
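<i>Kernel Principal Component Analysis</i>
<br>
A kernel function (for example, the radial basis function) first maps high-dimensional, linearly inseparable data into an even higher-dimensional feature space where it becomes linearly separable; principal component analysis is then applied in that feature space to reduce the dimensionality.
<br>
<br>
<i>Principal Component Analysis</i>
<br>
A widely used statistical method for dimensionality reduction and feature extraction. It assumes the data has an underlying linear structure and transforms the data into a coordinate system formed by the principal components, reducing dimensionality while retaining as much information as possible.
<br>
<br>
<i>Kernel Function</i>
<br>
The label data generated by one-hot encoding in this report is high-dimensional and sparse, and for such data principal component analysis with a cosine similarity kernel usually reduces dimensionality more effectively.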
</blockquote>
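<p>To make the cosine similarity kernel concrete, the sketch below computes it for two one-hot label vectors. It only illustrates the kernel itself, not the full kernel PCA step; the function name and the example vectors are assumptions.</p>

```python
import math


def cosine_kernel(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two hypothetical users described by four one-hot labels each.
user_a = [1, 0, 1, 0]
user_b = [1, 0, 0, 1]
print(cosine_kernel(user_a, user_b))
```

<p>For binary vectors the kernel depends only on how many labels two users share relative to how many labels each has, which is why it copes well with sparse one-hot data.</p>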
<blockquote>
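<i>K-Means Clustering</i>
<br>
An unsupervised, distance-based clustering algorithm. Its goal is to partition the data points into K clusters so that points in the same cluster are as similar as possible and points in different clusters are as different as possible. Its drawback is that the number of clusters must be specified in advance; this report uses the gap statistic to determine the optimal number of clusters.
<br>
<br>
<i>Gap Statistic</i>
<br>
A statistical method for evaluating clustering quality, given by:
\[Gap(K)=E(\log {D}_{K})-\log {D}_{K}\]
where \({D}_{K}=\sum {\sum {dist{(x,c)}^{2}}}\) is the sum, over all clusters, of the squared distances between each point \(x\) and its cluster center \(c\), and the expectation is taken over randomly generated reference data. The smallest K satisfying \(Gap(K)\geq Gap(K+1)\) is chosen as the optimal number of clusters.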
</blockquote>
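<p>The selection rule for the number of clusters can be sketched as follows. The sketch assumes the gap values have already been computed for consecutive K; the function name and the gap values are made up for illustration and are not results from this report.</p>

```python
def optimal_k(gaps):
    """Return the smallest K with Gap(K) >= Gap(K + 1), or None if no K qualifies.

    `gaps` maps each candidate K to its gap statistic.
    """
    for k in sorted(gaps):
        if k + 1 in gaps and gaps[k] >= gaps[k + 1]:
            return k
    return None


# Hypothetical gap values for K = 1..5.
example_gaps = {1: 0.10, 2: 0.35, 3: 0.52, 4: 0.49, 5: 0.50}
print(optimal_k(example_gaps))
```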
</div>
<footer>
<p>Rather than stand by the water longing for fish, go home and weave a net</p>
</footer>
<script>
function resetIframeHeight(iframeId) {
const iframe = document.getElementById(iframeId);
iframe.style.height = '0px';
iframe.contentWindow.postMessage({action: 'requestHeight', iframeId: iframeId}, '*');
}
window.addEventListener('message', function(event) {
if (event.data.action === 'responseHeight') {
const iframe = document.getElementById(event.data.iframeId);
if (iframe) {
iframe.style.height = event.data.height + 'px';
}
}
}, false);
</script>
</body>
</html>


@ -0,0 +1,283 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>Analysis Report - 结网</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="icon" type ="image/png" href="../icon.png">
<link rel="stylesheet" type="text/css" href="../stylesheet.css">
<script>
MathJax = {
tex: {inlineMath: [['$', '$'], ['\\(', '\\)']]}
};
</script>
<script id="MathJax-script" type="text/javascript" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
</head>
<body>
<div class="document-container">
<p class="document-block-title">Loan Application Scorecard Modeling Report Based on GiveMeSomeCredit</p>
<div class="document-block-description">
<span>刘弼仁</span>
<span> | </span>
<span>{{report_date}}</span>
</div>
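<h1>Modeling Background</h1>
<p>A loan application scorecard is a mature applied statistical model. It assesses an applicant's risk, identifies customers who are likely to become overdue, and supports decisions such as application approval and risk-based pricing, with relatively high accuracy and reliability.</p>
<p>The data for this report comes from GiveMeSomeCredit. By walking through a typical scorecard modeling process, the report aims to give readers a basic understanding of loan application scoring models.</p>
<h1>Choice of Algorithm</h1>
<p>Loan application scorecards are usually built with logistic regression. The logistic function combines the information in an applicant's loan application and converts it into a probability of becoming overdue, giving decision makers a quantitative basis for risk assessment.</p>
<h2>Decision makers assess risk as follows:</h2>
<img src="risk_assessment.png" alt="Risk assessment approach">
<p>When a loan application is reviewed, assume there are only two outcomes, approval or rejection, and that the probability of approval is \(approve\). After approval, the customer also has only two outcomes, repayment or overdue; the probability of overdue is \(overdue\) (the probability of repayment is \(repay\), with \(overdue+repay=1\)). The bank's loss on an overdue loan is \(loss\) and its revenue on a repaid loan is \(revenue\). The combined return is:</p>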
\[approve(repay\times revenue-overdue\times loss)\]
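<p>From the decision maker's standpoint, a sufficient condition for approval is that the combined return is greater than zero, that is, \(repay\times revenue>overdue\times loss\). Dividing both sides by \(repay\times loss\) gives:</p>
\[overdue / repay < revenue / loss\]
<p>In other words, the application is approved when the ratio of the applicant's probability of overdue to the probability of repayment (defined as the overdue-to-repayment odds, \(odd\)) is less than the ratio of revenue to loss. Computing the applicant's overdue-to-repayment odds is therefore the primary task.</p>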
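<p>The approval condition can be checked numerically. The sketch below is an illustration under assumed inputs: the function name is hypothetical, and the revenue and loss figures simply encode a 1:16 revenue-to-loss ratio.</p>

```python
def should_approve(p_overdue, revenue, loss):
    """Approve when overdue / repay < revenue / loss.

    p_overdue must be strictly between 0 and 1.
    """
    p_repay = 1.0 - p_overdue
    odds = p_overdue / p_repay
    return odds < revenue / loss


# With a 1:16 revenue-to-loss ratio the odds must stay below 0.0625.
print(should_approve(0.05, revenue=1, loss=16))
print(should_approve(0.10, revenue=1, loss=16))
```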
<p>Suppose the applicant's loan application information is known (defined as the set of feature-variable values, \(x=({x}_{1},{x}_{2},\cdots ,{x}_{m})\)). The applicant's overdue-to-repayment odds are then:</p>
\[odd(overdue|({x}_{1},{x}_{2},\cdots ,{x}_{m}))=p(overdue|({x}_{1},{x}_{2},\cdots ,{x}_{m}))/p(repay|({x}_{1},{x}_{2},\cdots ,{x}_{m}))\]
\[=p(overdue)/p(repay)\times (f({x}_{1},{x}_{2},\cdots ,{x}_{m}|overdue)/f({x}_{1},{x}_{2},\cdots ,{x}_{m}|repay))\]
\[=p(overdue)/p(repay)\times f({x}_{1}|overdue)/f({x}_{1}|repay)\times f({x}_{2}|overdue)/f({x}_{2}|repay)\times \cdots \times f({x}_{m}|overdue)/f({x}_{m}|repay)\]
\[\to F(x)=\ln (odd(overdue|({x}_{1},{x}_{2},\cdots ,{x}_{m})))\]
\[=\ln (p(overdue)/p(repay))+\ln (f({x}_{1}|overdue)/f({x}_{1}|repay))+\cdots +\ln (f({x}_{m}|overdue)/f({x}_{m}|repay))\]
<p>Define \(\ln (f({x}_{i}|overdue)/f({x}_{i}|repay))\) as the weight of evidence \(woe({x}_{i})\) of the feature variable's value. For a data set, the weight of evidence is a good statistic for describing how overdue and repayment outcomes are distributed over a feature variable.</p>
<p>In summary, when the feature variables are mutually independent, the applicant's log overdue-to-repayment odds equal the log of the overdue-to-repayment sample ratio plus the weights of evidence of the feature-variable values, that is, \(F(x)=a+\sum ^{m}_{i=1} {woe({x}_{i})}\), where \(a=\ln (p(overdue)/p(repay))\).</p>
<p>In addition, it follows that:</p>
\[odd(overdue|({x}_{1},{x}_{2},\cdots ,{x}_{m}))={e}^{F(x)}\]
\[=p(overdue|({x}_{1},{x}_{2},\cdots ,{x}_{m}))/p(repay|({x}_{1},{x}_{2},\cdots ,{x}_{m}))\]
\(\to p(overdue|({x}_{1},{x}_{2},\cdots ,{x}_{m}))=1/(1+{e}^{-F(x)})\), which is exactly the logistic regression function!
<p>Applying a linear transformation to the log odds \(F(x)\) yields the loan application scorecard!</p>
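<p>The relationship between the log odds \(F(x)\), the weights of evidence, and the logistic function can be verified numerically. This is a minimal sketch; the function names, the 7% prior overdue rate, and the two weights of evidence are assumptions chosen for illustration.</p>

```python
import math


def log_odds(prior_overdue, woes):
    """F(x) = ln(p(overdue) / p(repay)) + sum of the weights of evidence."""
    a = math.log(prior_overdue / (1.0 - prior_overdue))
    return a + sum(woes)


def overdue_probability(f):
    """Logistic function: p(overdue | x) = 1 / (1 + e^(-F(x)))."""
    return 1.0 / (1.0 + math.exp(-f))


# Hypothetical applicant: 7% prior overdue rate, two feature values.
f = log_odds(0.07, [0.4, -0.2])
p = overdue_probability(f)
# The odds recovered from p match e^F(x) up to rounding.
print(p, p / (1.0 - p), math.exp(f))
```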
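<h1>Modeling Process</h1>
<h2>Loading the Data</h2>
<p>The raw data set is loaded from a database. The target variable is <b>SeriousDlqin2yrs</b>, and there are <b>10</b> feature variables. A preview of the data follows:</p>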
<iframe id="dataset_preview" src="dataset_preview.html" onload="resetIframeHeight('dataset_preview')"></iframe>
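<h2>Data Preprocessing</h2>
<h3>Data Cleaning</h3>
<p>Samples with a missing target variable and duplicate samples are removed. After cleaning, <b>{{samples}}</b> samples remain.</p>
<h3>Missing Value Handling</h3>
<p>During weight-of-evidence encoding of the feature variables, missing values are placed in a bin of their own and included in the model.</p>
<h3>Outlier Handling</h3>
<p>Weight-of-evidence encoding of the feature variables removes the influence of outliers, so no separate outlier handling is performed.</p>
<h3>Weight-of-Evidence Encoding of Feature Variables</h3>
<p>One assumption of logistic regression is a linear relationship between the feature variables and the target variable (more precisely, its log odds), but in practice the relationship is usually nonlinear. Binning can turn a nonlinear relationship into a linear one. Binning also reduces the influence of missing values and outliers on logistic regression and makes it more robust.</p>
<p>This report bins with a decision tree and then applies weight-of-evidence encoding. Taking the feature variable "Age" as an example, its weight-of-evidence encoding is as follows:</p>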
<iframe src="dictionary.html"></iframe>
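<p>As the chart above shows, after binning the weights of evidence of the "Age" bins are linear and monotonically decreasing: as age increases, the overdue-to-repayment odds decrease. This matches loan approval experience: greater economic stability, higher income, a longer credit history, more mature spending habits, and better risk management all go with lower overdue risk.</p>
<p><b>About Decision Tree Binning</b></p>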
<ol>
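<li>Count the distinct values of the feature variable and use the smaller of that count and 5 as the decision tree's maximum number of leaf nodes (this report caps the number of bins at 5 and sets the minimum leaf size to 5% of the samples).</li>
<li>With that maximum number of leaf nodes, fit a decision tree of the target variable on the feature variable, and use the tree's split points as cut points to divide the continuous feature variable into intervals.</li>
<li>Compute the weight of evidence of each interval and test for monotonicity.</li>
<li>If the test passes, use these intervals as the binning result for the feature variable. If it fails, decrease the maximum number of leaf nodes by 1 and repeat the steps above until the test passes.</li>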
</ol>
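<p>Steps 3 and 4 above, computing the weight of evidence per bin and testing monotonicity, can be sketched as below. The decision tree fit itself is not shown. The function names and the bin counts are assumptions, and the sketch assumes every bin contains both overdue and repaid samples.</p>

```python
import math


def woe_by_bin(bins):
    """Weight of evidence per bin.

    `bins` is a list of (overdue_count, repay_count) pairs ordered by
    feature value; WOE = ln(share of overdue / share of repaid).
    """
    total_overdue = sum(overdue for overdue, _ in bins)
    total_repay = sum(repay for _, repay in bins)
    return [
        math.log((overdue / total_overdue) / (repay / total_repay))
        for overdue, repay in bins
    ]


def is_monotonic(values):
    """True if the sequence never increases or never decreases."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Hypothetical counts for four age bins, youngest first.
age_bins = [(40, 160), (30, 270), (20, 280), (10, 290)]
woes = woe_by_bin(age_bins)
print(woes, is_monotonic(woes))
```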
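<h2>Feature Variable Selection</h2>
<h3>Selecting Feature Variables by Information Value</h3>
<p>Information value is closely related to weight of evidence and can be used to assess the predictive power of a feature variable. Feature variables with an information value of at least 0.1 are usually selected.</p>
\[iv=\sum _{x}{(f(x|overdue)-f(x|repay))\times \ln (f(x|overdue)/f(x|repay))}\]
<p><b>About Information Value</b></p>
<p>Probability measures the certainty of a random variable, and entropy measures its uncertainty. Suppose \(p(x)\) and \(q(x)\) are the probability distributions for overdue and repayment. Relative entropy expresses the information lost when \(q(x)\) is used to fit \(p(x)\):</p>
\[D(p||q)=\sum {p(x)\ln (p(x)/q(x))}\]
<p>Relative entropy is not symmetric, that is, \(D(p||q)\neq D(q||p)\). If the relative entropies in the two directions are added together, a larger sum means the two distributions are further apart. This sum is the KL distance:</p>
\[DistanceKL=\int {(f(x|overdue)-f(x|repay))\times \ln (f(x|overdue)/f(x|repay))dx}\]
<p>The discrete form of this expression is the information value. When selecting feature variables, a larger information value means the overdue and repayment distributions are further apart and the feature variable separates overdue from repayment better.</p>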
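<p>The discrete form of the information value can be computed directly from per-bin counts. This is an illustrative sketch; the function name and the counts are assumptions, and every bin is assumed to contain both overdue and repaid samples.</p>

```python
import math


def information_value(bins):
    """Sum over bins of (overdue share - repay share) * ln(overdue share / repay share).

    `bins` is a list of (overdue_count, repay_count) pairs.
    """
    total_overdue = sum(overdue for overdue, _ in bins)
    total_repay = sum(repay for _, repay in bins)
    iv = 0.0
    for overdue, repay in bins:
        p = overdue / total_overdue
        q = repay / total_repay
        iv += (p - q) * math.log(p / q)
    return iv


# Hypothetical counts for four bins of one feature variable.
iv = information_value([(40, 160), (30, 270), (20, 280), (10, 290)])
print(iv, iv >= 0.1)
```

<p>Every term in the sum is non-negative, so the information value is zero only when the overdue and repayment distributions coincide in every bin.</p>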
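<h3>Eliminating Feature Variables by Conditional Backward Stepwise Selection</h3>
<p>Using logistic regression requires checking its preconditions:</p>
<ul>
<li>The feature variables are mutually independent</li>
<li>The regression coefficients of the feature variables are all positive</li>
</ul>
<p>This report uses the variance inflation factor (VIF) to assess the collinearity between a feature variable and the other variables. Feature variables with a variance inflation factor greater than 5 are usually eliminated.</p>
\[vif=1/(1-{r}^{2})\]
<p>where \(r\) is the multiple correlation coefficient between the feature variable and the other feature variables.</p>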
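<p>With only two feature variables, the multiple correlation coefficient reduces to the Pearson correlation between them, which makes the variance inflation factor easy to sketch without a regression library. The function names and data below are assumptions for illustration; with more than two feature variables, \(r\) would come from regressing each variable on all the others.</p>

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def vif_two_features(xs, ys):
    """VIF = 1 / (1 - r^2) for the special case of two feature variables."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)


# Hypothetical WOE-encoded values of two feature variables.
vif = vif_two_features([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(vif, vif > 5)
```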
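<p><b>About Conditional Backward Stepwise Elimination</b></p>
<ol>
<li>Compute the variance inflation factor and regression coefficient of each feature variable.</li>
<li>Among the feature variables whose variance inflation factor is greater than 5 or whose regression coefficient is less than 0.1, eliminate the one with the largest variance inflation factor.</li>
<li>Repeat the steps above until no feature variable can be eliminated.</li>
</ol>
<p>After this step, <b>{{variables_independent}}</b> feature variables are selected. A preview of the feature variables follows:</p>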
<iframe id="statistics" src="statistics.html" onload="resetIframeHeight('statistics')"></iframe>
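<h2>Scorecard Development and Validation</h2>
<h3>Scorecard Development</h3>
<p>The loan application scorecard formula in this report is as follows (this report sets \(a\) to 500 and \(b\) to \(50/\ln (2)\)):</p>
\[score=a-b\ln (odd(overdue|({x}_{1},{x}_{2},\cdots ,{x}_{m})))\]
\[=a-b({\beta }_{0}+{\beta }_{1}woe({x}_{1})+{\beta }_{2}woe({x}_{2})+\cdots +{\beta }_{m}woe({x}_{m}))\]
<p>where \({\beta }_{i}\) is the regression coefficient of a feature variable (\({\beta }_{0}\) is apportioned to the feature variables based on their regression coefficients).</p>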
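<p>With \(a=500\) and \(b=50/\ln (2)\), odds of 1 map to 500 points and every doubling of the odds lowers the score by 50 points. The sketch below shows this; the names <code>A</code>, <code>B</code>, and <code>score</code> are hypothetical.</p>

```python
import math

A = 500.0
B = 50.0 / math.log(2)


def score(odds):
    """score = a - b * ln(odds), where odds = p(overdue) / p(repay)."""
    return A - B * math.log(odds)


for odds in (0.5, 1.0, 2.0):
    print(odds, round(score(odds), 6))
```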
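<p>Taking "Age" as an example, its scorecard is as follows:</p>
<iframe id="dictionary_score" src="dictionary_score.html" onload="resetIframeHeight('dictionary_score')"></iframe>
<p>As the table above shows, \(\text{score}=\text{weighted base score}+\text{weighted regression coefficient}\times \text{weight of evidence}\).</p>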
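<h3>Scorecard Validation</h3>
<p>This report evaluates the scorecard with the KS statistic and the lift statistic. The KS statistic is <b>{{ks}}</b> and the lift statistic is <b>{{lift}}</b>.</p>
<p><b>About the KS Statistic</b></p>
<p>The KS (Kolmogorov-Smirnov) statistic is commonly used to assess how well a model separates the classes of the target variable. The total score is divided into intervals for the horizontal axis, and the cumulative shares of overdue and repaid samples are plotted on the vertical axis, giving two Lorenz curves. The KS statistic is the maximum distance between the two curves.</p>
<p>As a rule of thumb, a KS statistic below 20 means the scorecard is not recommended; 20~40 indicates fair separation, 40~50 good, 50~60 very good, and 60~75 excellent; above 75 the scorecard should be used with caution.</p>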
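<p>The KS statistic can be sketched as the largest gap between the cumulative shares of overdue and repaid samples as the score increases. For simplicity this sketch treats every distinct score as its own interval rather than binning the scores first, and it reports the result on the 0 to 100 scale used above. The function name and data are assumptions.</p>

```python
def ks_statistic(scores, labels):
    """Maximum gap, in percent, between cumulative overdue and repaid shares.

    `labels` uses 1 for overdue and 0 for repaid; both classes must occur.
    """
    n_overdue = sum(labels)
    n_repaid = len(labels) - n_overdue
    pairs = sorted(zip(scores, labels))
    cum_overdue = cum_repaid = 0
    ks = 0.0
    for i, (value, label) in enumerate(pairs):
        cum_overdue += label
        cum_repaid += 1 - label
        # Compare the curves only after all samples sharing this score are counted.
        last_of_score = i + 1 == len(pairs) or pairs[i + 1][0] != value
        if last_of_score:
            ks = max(ks, abs(cum_overdue / n_overdue - cum_repaid / n_repaid))
    return 100.0 * ks


# Hypothetical scores and outcomes for six customers.
ks = ks_statistic([400, 450, 500, 550, 600, 650], [1, 1, 0, 1, 0, 0])
print(ks)
```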
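<p><b>About the Lift Statistic</b></p>
<p>The lift statistic is commonly used to quantify how much better a model predicts the target variable than random selection. The total score is divided into intervals for the horizontal axis, and for each interval the ratio of the cumulative share of overdue samples to the cumulative share of all samples is computed; the maximum ratio is the lift statistic.</p>
<p>In general, a lift curve that stays high for several intervals and then falls quickly to 1 indicates that the scorecard separates well.</p>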
<iframe src="model_evaluation.html"></iframe>
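<p>The lift calculation described above can be sketched from per-interval counts. The sketch assumes the intervals are ordered from the lowest score (highest risk) upward; the function name and counts are made up for illustration.</p>

```python
def lift_statistic(bins):
    """Return (maximum lift, lift per interval).

    `bins` is a list of (overdue_count, sample_count) pairs ordered from
    the lowest score interval upward. Lift is the cumulative share of
    overdue samples divided by the cumulative share of all samples.
    """
    total_overdue = sum(overdue for overdue, _ in bins)
    total_samples = sum(count for _, count in bins)
    cum_overdue = cum_samples = 0
    lifts = []
    for overdue, count in bins:
        cum_overdue += overdue
        cum_samples += count
        lifts.append((cum_overdue / total_overdue) / (cum_samples / total_samples))
    return max(lifts), lifts


# Hypothetical counts for four score intervals of 100 customers each.
best, lifts = lift_statistic([(30, 100), (15, 100), (4, 100), (1, 100)])
print(best, lifts)
```

<p>The last value is always 1 because by the final interval all samples have been included.</p>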
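<p><b>Scorecard Evaluation Table</b></p>
<iframe id="business_evaluation" src="business_evaluation.html" onload="resetIframeHeight('business_evaluation')"></iframe>
<p>Take the bin <b>[500, 550)</b> as an example: 5.61% of the customers in this bin are overdue customers. Assuming the revenue from 16 approved customers offsets the loss from 1 overdue customer, 5.61% can serve as the break-even point, and the rejection threshold should not be set below 550; otherwise the loss exceeds the revenue.</p>
<p>Take the rejection rule <b>&lt;550</b> as an example: under this rule 36.53% of customers are rejected, of whom 15.72% are overdue customers. With this scorecard, the number of overdue customers falls by 85.59%.</p>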
</div>
<footer>
<p>Rather than stand by the water longing for fish, go home and weave a net</p>
</footer>
<script>
function resetIframeHeight(iframeId) {
const iframe = document.getElementById(iframeId);
iframe.style.height = '0px';
iframe.contentWindow.postMessage({action: 'requestHeight', iframeId: iframeId}, '*');
}
window.addEventListener('message', function(event) {
if (event.data.action === 'responseHeight') {
const iframe = document.getElementById(event.data.iframeId);
if (iframe) {
iframe.style.height = event.data.height + 'px';
}
}
}, false);
</script>
</body>
</html>

238
rfm/template.html Normal file

@ -0,0 +1,238 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>Data Report</title>
<link rel="icon" type ="image/png" href="icon.png">
<link rel="stylesheet" type="text/css" href="stylesheet.css">
</head>
<body>
<div class="document-container">
<p class="document-block-title">Customer Value Analysis Report Based on the RFM Model</p>
<div class="document-block-description">
<span>刘弼仁</span>
<span> | </span>
<span>{{report_date}}</span>
</div>
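<h1>Background</h1>
<p>When designing customer-facing operating strategies, we want to apply different strategies to different customers, achieving targeted operations and the highest possible return on investment (ROI). Targeted operations depend on customer classification: customers are divided into distinct groups, each group is given its own operating strategy, and limited resources are allocated sensibly to maximize return.</p>
<p>The RFM model is a classic customer classification model. It uses the three core variables of a transaction, namely Recency, Frequency, and Monetary value, to segment customers and analyze the value of each segment.</p>
<p>This report uses Kaggle's SuperstoreData as its data set and explores how to segment customers with the RFM model and how to analyze customer value after segmentation.</p>
<h1>Analysis</h1>
<h2>Data Preview</h2>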
<iframe scrolling="no" style="height: 300px !important;" src="数据预览.html"></iframe>
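<p>The data set contains {{sample_size}} samples. The customer ID is a string, the transaction amount is a decimal, and the transaction date is a date.</p>
<h2>Building the RFM Model</h2>
<p>R is the interval between a customer's most recent transaction date and the earliest transaction date, in days, stored as an integer; F is the number of transactions, stored as an integer; M is the cumulative transaction amount, stored as a decimal. R, F, and M are all oriented so that larger values are better.</p>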
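<p>The R, F, and M definitions can be sketched as a single pass over the transactions. This sketch assumes that the "earliest transaction date" is the earliest date in the whole data set; the function name and the three transactions are hypothetical.</p>

```python
from datetime import date


def rfm(transactions):
    """Return {customer_id: (R, F, M)}.

    `transactions` is a list of (customer_id, amount, transaction_date).
    R: days from the earliest date in the data to the customer's latest
    transaction; F: number of transactions; M: total amount.
    """
    earliest = min(day for _, _, day in transactions)
    table = {}
    for customer_id, amount, day in transactions:
        r, f, m = table.get(customer_id, (0, 0, 0.0))
        table[customer_id] = (max(r, (day - earliest).days), f + 1, m + amount)
    return table


# Hypothetical transactions for two customers.
sample = [
    ("A", 100.0, date(2024, 1, 1)),
    ("A", 50.0, date(2024, 3, 1)),
    ("B", 20.0, date(2024, 1, 15)),
]
print(rfm(sample))
```

<p>Measuring R from the earliest date rather than from today is what makes a larger R mean a more recent transaction.</p>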
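<h2>Customer Classification</h2>
<p>This report splits each of R, F, and M at its mean into a part less than or equal to the mean and a part greater than the mean:</p>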
<table>
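<tr>
<th>Customer segment</th>
<th>R greater than mean R</th>
<th>F greater than mean F</th>
<th>M greater than mean M</th>
</tr>
<tr>
<td>Churned customers</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td>General maintenance customers</td>
<td>No</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td>New customers</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td>Potential customers</td>
<td>Yes</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td>Important retention customers</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>Important development customers</td>
<td>Yes</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>Important win-back customers</td>
<td>No</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>Important value customers</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>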
</table>
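<p>The mean-based classification can be sketched as a lookup keyed by whether each of R, F, and M exceeds its mean. The mapping from combinations to segment names follows conventional RFM naming and the segment descriptions at the end of this report, so treat it as an assumption; the names <code>SEGMENTS</code> and <code>classify</code> and the sample values are hypothetical.</p>

```python
# Segment names keyed by (R > mean R, F > mean F, M > mean M).
SEGMENTS = {
    (False, False, False): "Churned customers",
    (False, True, False): "General maintenance customers",
    (True, False, False): "New customers",
    (True, True, False): "Potential customers",
    (False, False, True): "Important retention customers",
    (True, False, True): "Important development customers",
    (False, True, True): "Important win-back customers",
    (True, True, True): "Important value customers",
}


def classify(rfm_table):
    """Map each customer in {customer_id: (R, F, M)} to a segment name."""
    n = len(rfm_table)
    means = [sum(values[i] for values in rfm_table.values()) / n for i in range(3)]
    return {
        customer_id: SEGMENTS[tuple(values[i] > means[i] for i in range(3))]
        for customer_id, values in rfm_table.items()
    }


# Hypothetical (R, F, M) values for three customers.
segments = classify({"A": (60, 2, 150.0), "B": (14, 1, 20.0), "C": (90, 5, 30.0)})
print(segments)
```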
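<h1>Interpreting the Data</h1>
<h2>Customer Count Trend over the Last Twelve Calendar Months</h2>
<iframe src="近十二个自然月客户数趋势.html"></iframe>
<p>The chart above shows, for each of the last twelve calendar months, the number of customers in the rolling twelve-calendar-month window for that month. It shows that the number of customers keeps growing.</p>
<h2>Distribution of Customer Segments</h2>
<iframe src="客户分类分布.html"></iframe>
<p>The chart above shows how the customer segments are distributed over R, F, and M. The further right a segment is on R, the more recently it has transacted; the higher it is on F, the more frequently it transacts; the larger its M, the larger its transaction amount.</p>
<h2>Share of Customers</h2>
<iframe src="客户占比.html"></iframe>
<p>The chart above shows each segment's share of customers. Important value customers, churned customers, and new customers make up the largest shares and are the focus of the analysis that follows.</p>
<h2>Share of Transaction Amount</h2>
<iframe src="交易金额占比.html"></iframe>
<p>The chart above shows each segment's share of the transaction amount. Important value customers, new customers, and important win-back customers account for the largest shares.</p>
<h2>Customer Share Trend over the Last Twelve Calendar Months</h2>
<iframe src="近十二个自然月客户占比趋势.html"></iframe>
<p>The chart above shows the customer shares of important value customers, churned customers, and new customers. Recently the share of new customers has risen while the shares of important value customers and churned customers have fallen; an operating strategy targeted at important value customers is recommended.</p>
<h2>Retention Rate Trend over the Last Twelve Calendar Months</h2>
<iframe src="近十二个自然月留存率趋势.html"></iframe>
<p>The chart above shows the retention rates of important value customers, churned customers, and new customers over the last twelve calendar months. Important value customers are stickier than churned customers and new customers, and new customers have become stickier recently.</p>
<p>With customer classification, product and operating strategies can be tailored to each customer segment:</p>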
<ul>
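<li>Important value customers
<p>Recent, frequent, high-value transactions. Offer customized products/services to maintain these customers' spending and loyalty.</p></li>
<li>Potential customers
<p>Recent, frequent, low-value transactions. Offer short-term, targeted promotions to raise these customers' spending.</p></li>
<li>Important win-back customers
<p>No recent transactions, but frequent and high-value transactions in the past. Investigate why these customers left and try to win them back, and strengthen customer relationship management to improve customer satisfaction.</p></li>
<li>...</li>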
</ul>
</div>
<footer>
<p>Rather than stand by the water longing for fish, go home and weave a net</p>
</footer>
</body>
</html>

224
rfm/交易金额占比.html Normal file

@ -0,0 +1,224 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Awesome-pyecharts</title>
<script type="text/javascript" src="https://assets.pyecharts.org/assets/v5/echarts.min.js"></script>
<script type="text/javascript" src="https://assets.pyecharts.org/assets/v5/themes/walden.js"></script>
</head>
<body >
<div id="72698a8dae1c4f28857f18e2c25602af" class="chart-container" style="width:500px; height:350px; "></div>
<script>
var chart_72698a8dae1c4f28857f18e2c25602af = echarts.init(
document.getElementById('72698a8dae1c4f28857f18e2c25602af'), 'walden', {renderer: 'canvas'});
var option_72698a8dae1c4f28857f18e2c25602af = {
"animation": false,
"animationThreshold": 2000,
"animationDuration": 1000,
"animationEasing": "cubicOut",
"animationDelay": 0,
"animationDurationUpdate": 300,
"animationEasingUpdate": "cubicOut",
"animationDelayUpdate": 0,
"aria": {
"enabled": false
},
"series": [
{
"type": "pie",
"name": "\u5ba2\u6237\u5206\u7c7b",
"colorBy": "data",
"legendHoverLink": true,
"selectedMode": false,
"selectedOffset": 10,
"clockwise": true,
"startAngle": 90,
"minAngle": 0,
"minShowLabelAngle": 0,
"avoidLabelOverlap": true,
"stillShowZeroSum": true,
"percentPrecision": 2,
"showEmptyCircle": true,
"emptyCircleStyle": {
"color": "lightgray",
"borderColor": "#000",
"borderWidth": 0,
"borderType": "solid",
"borderDashOffset": 0,
"borderCap": "butt",
"borderJoin": "bevel",
"borderMiterLimit": 10,
"opacity": 1
},
"data": [
{
"name": "\u91cd\u8981\u4ef7\u503c\u5ba2\u6237",
"value": 72.11
},
{
"name": "\u65b0\u5ba2\u6237",
"value": 7.44
},
{
"name": "\u91cd\u8981\u5524\u56de\u5ba2\u6237",
"value": 5.87
},
{
"name": "\u6d41\u5931\u5ba2\u6237",
"value": 5.57
},
{
"name": "\u6f5c\u529b\u5ba2\u6237",
"value": 4.41
},
{
"name": "\u91cd\u8981\u6df1\u8015\u5ba2\u6237",
"value": 2.92
},
{
"name": "\u91cd\u8981\u633d\u7559\u5ba2\u6237",
"value": 0.98
},
{
"name": "\u4e00\u822c\u7ef4\u6301\u5ba2\u6237",
"value": 0.69
}
],
"radius": [
"0%",
"75%"
],
"center": [
"50%",
"60%"
],
"roseType": "area",
"label": {
"show": true,
"position": "outside",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"formatter": "{b|{b}}\n{c|{c}}",
"rich": {
"b": {
"fontWeight": "bold"
},
"c": {
"lineHeight": 25
}
},
"valueAnimation": false
},
"labelLine": {
"show": true,
"showAbove": false,
"length": 15,
"length2": 15,
"smooth": false,
"minTurnAngle": 90,
"maxSurfaceAngle": 90
}
}
],
"legend": [
{
"data": [
"\u91cd\u8981\u4ef7\u503c\u5ba2\u6237",
"\u65b0\u5ba2\u6237",
"\u91cd\u8981\u5524\u56de\u5ba2\u6237",
"\u6d41\u5931\u5ba2\u6237",
"\u6f5c\u529b\u5ba2\u6237",
"\u91cd\u8981\u6df1\u8015\u5ba2\u6237",
"\u91cd\u8981\u633d\u7559\u5ba2\u6237",
"\u4e00\u822c\u7ef4\u6301\u5ba2\u6237"
],
"selected": {},
"show": false,
"left": "center",
"top": "top",
"orient": "horizontal",
"align": "auto",
"padding": 5,
"itemGap": 10,
"itemWidth": 10,
"itemHeight": 10,
"inactiveColor": "#86909C",
"textStyle": {
"color": "#86909C",
"fontStyle": "normal",
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"fontSize": 12
},
"backgroundColor": "transparent",
"borderColor": "#ccc",
"borderWidth": 0,
"borderRadius": 0,
"pageButtonItemGap": 5,
"pageButtonPosition": "end",
"pageFormatter": "{current}/{total}",
"pageIconColor": "#2f4554",
"pageIconInactiveColor": "#aaa",
"pageIconSize": 15,
"animationDurationUpdate": 800,
"selector": false,
"selectorPosition": "auto",
"selectorItemGap": 7,
"selectorButtonGap": 10
}
],
"tooltip": {
"show": false,
"trigger": "item",
"triggerOn": "mousemove|click",
"axisPointer": {
"type": "line"
},
"showContent": true,
"alwaysShowContent": false,
"showDelay": 0,
"hideDelay": 100,
"enterable": false,
"confine": false,
"appendToBody": false,
"transitionDuration": 0.4,
"textStyle": {
"fontSize": 14
},
"borderWidth": 0,
"padding": 5,
"order": "seriesAsc"
},
"title": [
{
"show": true,
"target": "blank",
"subtarget": "blank",
"padding": 5,
"itemGap": 10,
"textAlign": "auto",
"textVerticalAlign": "auto",
"triggerEvent": false
}
]
};
chart_72698a8dae1c4f28857f18e2c25602af.setOption(option_72698a8dae1c4f28857f18e2c25602af);
</script>
</body>
</html>

613
rfm/客户分类分布.html Normal file

@ -0,0 +1,613 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Awesome-pyecharts</title>
<script type="text/javascript" src="https://assets.pyecharts.org/assets/v5/echarts.min.js"></script>
<script type="text/javascript" src="https://assets.pyecharts.org/assets/v5/themes/walden.js"></script>
</head>
<body >
<div id="091c7865f8594b17bc01d20333af9128" class="chart-container" style="width:500px; height:350px; "></div>
<script>
var chart_091c7865f8594b17bc01d20333af9128 = echarts.init(
document.getElementById('091c7865f8594b17bc01d20333af9128'), 'walden', {renderer: 'canvas'});
var option_091c7865f8594b17bc01d20333af9128 = {
"animation": false,
"animationThreshold": 2000,
"animationDuration": 1000,
"animationEasing": "cubicOut",
"animationDelay": 0,
"animationDurationUpdate": 300,
"animationEasingUpdate": "cubicOut",
"animationDelayUpdate": 0,
"aria": {
"enabled": false
},
"series": [
{
"type": "scatter",
"name": "\u4e00\u822c\u7ef4\u6301\u5ba2\u6237",
"symbolSize": 10,
"data": [
{
"name": 3003.74,
"value": [
564.64,
18.43
],
"symbolSize": 32.63309719719160003383614417,
"symbolKeepAspect": false,
"label": {
"show": true,
"position": "right",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"formatter": "{a|{a}}\n{b|{b}}",
"rich": {
"a": {
"fontWeight": "bold"
},
"b": {
"lineHeight": 25
}
},
"valueAnimation": false
}
}
],
"label": {
"show": true,
"position": "right",
"margin": 8,
"valueAnimation": false
}
},
{
"type": "scatter",
"name": "\u65b0\u5ba2\u6237",
"symbolSize": 10,
"data": [
{
"name": 1216.97,
"value": [
690.72,
7.48
],
"symbolSize": 22.38607900739663846510074138,
"symbolKeepAspect": false,
"label": {
"show": true,
"position": "right",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"formatter": "{a|{a}}\n{b|{b}}",
"rich": {
"a": {
"fontWeight": "bold"
},
"b": {
"lineHeight": 25
}
},
"valueAnimation": false
}
}
],
"label": {
"show": true,
"position": "right",
"margin": 8,
"valueAnimation": false
}
},
{
"type": "scatter",
"name": "\u6d41\u5931\u5ba2\u6237",
"symbolSize": 10,
"data": [
{
"name": 800.91,
"value": [
467.05,
5.30
],
"symbolSize": 20,
"symbolKeepAspect": false,
"label": {
"show": true,
"position": "right",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"formatter": "{a|{a}}\n{b|{b}}",
"rich": {
"a": {
"fontWeight": "bold"
},
"b": {
"lineHeight": 25
}
},
"valueAnimation": false
}
}
],
"label": {
"show": true,
"position": "right",
"margin": 8,
"valueAnimation": false
}
},
{
"type": "scatter",
"name": "\u6f5c\u529b\u5ba2\u6237",
"symbolSize": 10,
"data": [
{
"name": 2977.77,
"value": [
687.19,
19.88
],
"symbolSize": 32.48416081344384562115848649,
"symbolKeepAspect": false,
"label": {
"show": true,
"position": "right",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"formatter": "{a|{a}}\n{b|{b}}",
"rich": {
"a": {
"fontWeight": "bold"
},
"b": {
"lineHeight": 25
}
},
"valueAnimation": false
}
}
],
"label": {
"show": true,
"position": "right",
"margin": 8,
"valueAnimation": false
}
},
{
"type": "scatter",
"name": "\u91cd\u8981\u4ef7\u503c\u5ba2\u6237",
"symbolSize": 10,
"data": [
{
"name": 7775.70,
"value": [
702.08,
28.24
],
"symbolSize": 60,
"symbolKeepAspect": false,
"label": {
"show": true,
"position": "right",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"formatter": "{a|{a}}\n{b|{b}}",
"rich": {
"a": {
"fontWeight": "bold"
},
"b": {
"lineHeight": 25
}
},
"valueAnimation": false
}
}
],
"label": {
"show": true,
"position": "right",
"margin": 8,
"valueAnimation": false
}
},
{
"type": "scatter",
"name": "\u91cd\u8981\u5524\u56de\u5ba2\u6237",
"symbolSize": 10,
"data": [
{
"name": 7291.27,
"value": [
598.71,
25.16
],
"symbolSize": 57.22182316600213053009481289,
"symbolKeepAspect": false,
"label": {
"show": true,
"position": "right",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"formatter": "{a|{a}}\n{b|{b}}",
"rich": {
"a": {
"fontWeight": "bold"
},
"b": {
"lineHeight": 25
}
},
"valueAnimation": false
}
}
],
"label": {
"show": true,
"position": "right",
"margin": 8,
"valueAnimation": false
}
},
{
"type": "scatter",
"name": "\u91cd\u8981\u633d\u7559\u5ba2\u6237",
"symbolSize": 10,
"data": [
{
"name": 5439.61,
"value": [
503.55,
9.09
],
"symbolSize": 46.60266473972693084666348378,
"symbolKeepAspect": false,
"label": {
"show": true,
"position": "right",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"formatter": "{a|{a}}\n{b|{b}}",
"rich": {
"a": {
"fontWeight": "bold"
},
"b": {
"lineHeight": 25
}
},
"valueAnimation": false
}
}
],
"label": {
"show": true,
"position": "right",
"margin": 8,
"valueAnimation": false
}
},
{
"type": "scatter",
"name": "\u91cd\u8981\u6df1\u8015\u5ba2\u6237",
"symbolSize": 10,
"data": [
{
"name": 5081.89,
"value": [
686.49,
12.03
],
"symbolSize": 44.55116211384142031516361066,
"symbolKeepAspect": false,
"label": {
"show": true,
"position": "right",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"formatter": "{a|{a}}\n{b|{b}}",
"rich": {
"a": {
"fontWeight": "bold"
},
"b": {
"lineHeight": 25
}
},
"valueAnimation": false
}
}
],
"label": {
"show": true,
"position": "right",
"margin": 8,
"valueAnimation": false
}
}
],
"legend": [
{
"data": [
"\u4e00\u822c\u7ef4\u6301\u5ba2\u6237",
"\u65b0\u5ba2\u6237",
"\u6d41\u5931\u5ba2\u6237",
"\u6f5c\u529b\u5ba2\u6237",
"\u91cd\u8981\u4ef7\u503c\u5ba2\u6237",
"\u91cd\u8981\u5524\u56de\u5ba2\u6237",
"\u91cd\u8981\u633d\u7559\u5ba2\u6237",
"\u91cd\u8981\u6df1\u8015\u5ba2\u6237"
],
"selected": {},
"show": false,
"left": "center",
"top": "top",
"orient": "horizontal",
"align": "auto",
"padding": 5,
"itemGap": 10,
"itemWidth": 10,
"itemHeight": 10,
"inactiveColor": "#86909C",
"textStyle": {
"color": "#86909C",
"fontStyle": "normal",
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"fontSize": 12
},
"backgroundColor": "transparent",
"borderColor": "#ccc",
"borderWidth": 0,
"borderRadius": 0,
"pageButtonItemGap": 5,
"pageButtonPosition": "end",
"pageFormatter": "{current}/{total}",
"pageIconColor": "#2f4554",
"pageIconInactiveColor": "#aaa",
"pageIconSize": 15,
"animationDurationUpdate": 800,
"selector": false,
"selectorPosition": "auto",
"selectorItemGap": 7,
"selectorButtonGap": 10
}
],
"tooltip": {
"show": false,
"trigger": "item",
"triggerOn": "mousemove|click",
"axisPointer": {
"type": "line"
},
"showContent": true,
"alwaysShowContent": false,
"showDelay": 0,
"hideDelay": 100,
"enterable": false,
"confine": false,
"appendToBody": false,
"transitionDuration": 0.4,
"textStyle": {
"fontSize": 14
},
"borderWidth": 0,
"padding": 5,
"order": "seriesAsc"
},
"xAxis": [
{
"type": "value",
"name": "R",
"show": true,
"scale": false,
"nameLocation": "end",
"nameGap": 15,
"nameTextStyle": {
"color": "#86909C",
"fontStyle": "normal",
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"fontSize": 12
},
"gridIndex": 0,
"axisLine": {
"show": true,
"onZero": true,
"onZeroAxisIndex": 0,
"lineStyle": {
"show": true,
"width": 1,
"opacity": 1,
"curveness": 0,
"type": "solid",
"color": "#86909C"
}
},
"axisTick": {
"show": false,
"alignWithLabel": false,
"inside": true,
"lineStyle": {
"show": true,
"width": 1,
"opacity": 1,
"curveness": 0,
"type": "solid",
"color": "#86909C"
}
},
"axisLabel": {
"show": true,
"position": "inside",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"valueAnimation": false
},
"inverse": false,
"offset": 0,
"splitNumber": 5,
"min": 475,
"max": 750,
"minInterval": 0,
"splitLine": {
"show": false,
"lineStyle": {
"show": true,
"width": 1,
"opacity": 1,
"curveness": 0,
"type": "dashed",
"color": "#E5E6EB"
}
},
"animation": true,
"animationThreshold": 2000,
"animationDuration": 1000,
"animationEasing": "cubicOut",
"animationDelay": 0,
"animationDurationUpdate": 300,
"animationEasingUpdate": "cubicOut",
"animationDelayUpdate": 0,
"data": [
686.49
]
}
],
"yAxis": [
{
"type": "value",
"name": "F",
"show": true,
"scale": false,
"nameLocation": "end",
"nameGap": 15,
"nameTextStyle": {
"color": "#86909C",
"fontStyle": "normal",
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"fontSize": 12
},
"gridIndex": 0,
"axisLine": {
"show": true,
"onZero": true,
"onZeroAxisIndex": 0,
"lineStyle": {
"show": true,
"width": 1,
"opacity": 1,
"curveness": 0,
"type": "solid",
"color": "#86909C"
}
},
"axisTick": {
"show": false,
"alignWithLabel": false,
"inside": true,
"lineStyle": {
"show": true,
"width": 1,
"opacity": 1,
"curveness": 0,
"type": "solid",
"color": "#86909C"
}
},
"axisLabel": {
"show": true,
"position": "inside",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"valueAnimation": false
},
"inverse": false,
"offset": 0,
"splitNumber": 5,
"minInterval": 0,
"splitLine": {
"show": false,
"lineStyle": {
"show": true,
"width": 1,
"opacity": 1,
"curveness": 0,
"type": "dashed",
"color": "#E5E6EB"
}
},
"animation": true,
"animationThreshold": 2000,
"animationDuration": 1000,
"animationEasing": "cubicOut",
"animationDelay": 0,
"animationDurationUpdate": 300,
"animationEasingUpdate": "cubicOut",
"animationDelayUpdate": 0
}
],
"title": [
{
"show": true,
"target": "blank",
"subtarget": "blank",
"padding": 5,
"itemGap": 10,
"textAlign": "auto",
"textVerticalAlign": "auto",
"triggerEvent": false
}
]
};
chart_091c7865f8594b17bc01d20333af9128.setOption(option_091c7865f8594b17bc01d20333af9128);
</script>
</body>
</html>

224
rfm/客户占比.html Normal file

@ -0,0 +1,224 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Awesome-pyecharts</title>
<script type="text/javascript" src="https://assets.pyecharts.org/assets/v5/echarts.min.js"></script>
<script type="text/javascript" src="https://assets.pyecharts.org/assets/v5/themes/walden.js"></script>
</head>
<body >
<div id="72fcdbab20ae43dfbddd4023dc41aed3" class="chart-container" style="width:500px; height:350px; "></div>
<script>
var chart_72fcdbab20ae43dfbddd4023dc41aed3 = echarts.init(
document.getElementById('72fcdbab20ae43dfbddd4023dc41aed3'), 'walden', {renderer: 'canvas'});
var option_72fcdbab20ae43dfbddd4023dc41aed3 = {
"animation": false,
"animationThreshold": 2000,
"animationDuration": 1000,
"animationEasing": "cubicOut",
"animationDelay": 0,
"animationDurationUpdate": 300,
"animationEasingUpdate": "cubicOut",
"animationDelayUpdate": 0,
"aria": {
"enabled": false
},
"series": [
{
"type": "pie",
"name": "\u5ba2\u6237\u5206\u7c7b",
"colorBy": "data",
"legendHoverLink": true,
"selectedMode": false,
"selectedOffset": 10,
"clockwise": true,
"startAngle": 90,
"minAngle": 0,
"minShowLabelAngle": 0,
"avoidLabelOverlap": true,
"stillShowZeroSum": true,
"percentPrecision": 2,
"showEmptyCircle": true,
"emptyCircleStyle": {
"color": "lightgray",
"borderColor": "#000",
"borderWidth": 0,
"borderType": "solid",
"borderDashOffset": 0,
"borderCap": "butt",
"borderJoin": "bevel",
"borderMiterLimit": 10,
"opacity": 1
},
"data": [
{
"name": "\u91cd\u8981\u4ef7\u503c\u5ba2\u6237",
"value": 36.20
},
{
"name": "\u6d41\u5931\u5ba2\u6237",
"value": 27.15
},
{
"name": "\u65b0\u5ba2\u6237",
"value": 23.88
},
{
"name": "\u6f5c\u529b\u5ba2\u6237",
"value": 5.78
},
{
"name": "\u91cd\u8981\u5524\u56de\u5ba2\u6237",
"value": 3.15
},
{
"name": "\u91cd\u8981\u6df1\u8015\u5ba2\u6237",
"value": 2.25
},
{
"name": "\u4e00\u822c\u7ef4\u6301\u5ba2\u6237",
"value": 0.90
},
{
"name": "\u91cd\u8981\u633d\u7559\u5ba2\u6237",
"value": 0.71
}
],
"radius": [
"0%",
"75%"
],
"center": [
"50%",
"60%"
],
"roseType": "area",
"label": {
"show": true,
"position": "outside",
"color": "#86909C",
"margin": 8,
"fontSize": 12,
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"formatter": "{b|{b}}\n{c|{c}}",
"rich": {
"b": {
"fontWeight": "bold"
},
"c": {
"lineHeight": 25
}
},
"valueAnimation": false
},
"labelLine": {
"show": true,
"showAbove": false,
"length": 15,
"length2": 15,
"smooth": false,
"minTurnAngle": 90,
"maxSurfaceAngle": 90
}
}
],
"legend": [
{
"data": [
"\u91cd\u8981\u4ef7\u503c\u5ba2\u6237",
"\u6d41\u5931\u5ba2\u6237",
"\u65b0\u5ba2\u6237",
"\u6f5c\u529b\u5ba2\u6237",
"\u91cd\u8981\u5524\u56de\u5ba2\u6237",
"\u91cd\u8981\u6df1\u8015\u5ba2\u6237",
"\u4e00\u822c\u7ef4\u6301\u5ba2\u6237",
"\u91cd\u8981\u633d\u7559\u5ba2\u6237"
],
"selected": {},
"show": false,
"left": "center",
"top": "top",
"orient": "horizontal",
"align": "auto",
"padding": 5,
"itemGap": 10,
"itemWidth": 10,
"itemHeight": 10,
"inactiveColor": "#86909C",
"textStyle": {
"color": "#86909C",
"fontStyle": "normal",
"fontWeight": "normal",
"fontFamily": "PingFang SC",
"fontSize": 12
},
"backgroundColor": "transparent",
"borderColor": "#ccc",
"borderWidth": 0,
"borderRadius": 0,
"pageButtonItemGap": 5,
"pageButtonPosition": "end",
"pageFormatter": "{current}/{total}",
"pageIconColor": "#2f4554",
"pageIconInactiveColor": "#aaa",
"pageIconSize": 15,
"animationDurationUpdate": 800,
"selector": false,
"selectorPosition": "auto",
"selectorItemGap": 7,
"selectorButtonGap": 10
}
],
"tooltip": {
"show": false,
"trigger": "item",
"triggerOn": "mousemove|click",
"axisPointer": {
"type": "line"
},
"showContent": true,
"alwaysShowContent": false,
"showDelay": 0,
"hideDelay": 100,
"enterable": false,
"confine": false,
"appendToBody": false,
"transitionDuration": 0.4,
"textStyle": {
"fontSize": 14
},
"borderWidth": 0,
"padding": 5,
"order": "seriesAsc"
},
"title": [
{
"show": true,
"target": "blank",
"subtarget": "blank",
"padding": 5,
"itemGap": 10,
"textAlign": "auto",
"textVerticalAlign": "auto",
"triggerEvent": false
}
]
};
chart_72fcdbab20ae43dfbddd4023dc41aed3.setOption(option_72fcdbab20ae43dfbddd4023dc41aed3);
</script>
</body>
</html>


@ -0,0 +1,59 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>pyecharts</title>
<style>
/* Global styles */
* {
font-family: "PingFang SC", "Nunito";
font-size: 14px;
color: rgb(29, 33, 41);
}
body {
margin: 0px;
}
/* Page layout */
.document-container {
width: 16cm;
margin: auto;
}
/* Document title */
.document-block-title {
font-size: 24px;
font-weight: 700;
}
/* Table */
table {
width: 16cm;
table-layout: auto;
border-collapse: collapse;
text-align: right;
}
/* Table: header row */
table tr th {
height: 35px;
font-size: 14px;
font-weight: 500;
background-color: rgb(242 243 245);
padding: 8px 16px;
}
/* Table: data rows */
table tr td {
height: 30px;
font-size: 12px;
font-weight: 400;
border-bottom: 1px solid rgb(229 230 235);
padding: 8px 16px;
}
/* Left-align the first table column */
table th:first-child, table td:first-child {
text-align: left;
}
</style>
</head>
<body>
{{ html_content }}
</body>
</html>


@ -0,0 +1,2 @@
load_from:
- python_file: dagster.py


@ -0,0 +1,602 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>赔案档案</title>
<style>
:root {
--arcoblue-1: #e8f3ff;
--arcoblue-2: #bedaff;
--arcoblue-3: #94bfff;
--arcoblue-4: #6aa1ff;
--arcoblue-5: #4080ff;
--arcoblue-6: #165dff;
--arcoblue-7: #0e42d2;
--arcoblue-8: #072ca6;
--arcoblue-9: #031a79;
--arcoblue-10: #000d4d;
--green-1: #e8ffea;
--green-2: #aff0b5;
--green-3: #7be188;
--green-4: #4cd263;
--green-5: #23c343;
--green-6: #00b42a;
--green-7: #009a29;
--green-8: #008026;
--green-9: #006622;
--green-10: #004d1c;
--red-1: #ffece8;
--red-2: #fdcdc5;
--red-3: #fbaca3;
--red-4: #f98981;
--red-5: #f76560;
--red-6: #f53f3f;
--red-7: #cb272d;
--red-8: #a1151e;
--red-9: #770611;
--red-10: #4d000a;
--orange-1: #fff7e8;
--orange-2: #ffe4ba;
--orange-3: #ffcf8b;
--orange-4: #ffb65d;
--orange-5: #ff9a2e;
--orange-6: #ff7d00;
--orange-7: #d25f00;
--orange-8: #a64500;
--orange-9: #792e00;
--orange-10: #4d1b00;
--gray-1: #f7f8fa;
--gray-2: #f2f3f5;
--gray-3: #e5e6eb;
--gray-4: #c9cdd4;
--gray-5: #a9aeb8;
--gray-6: #86909c;
--gray-7: #6b7785;
--gray-8: #4e5969;
--gray-9: #272e3b;
--gray-10: #1d2129;
--color-primary: var(--arcoblue-6);
--color-primary-light: var(--arcoblue-1);
--color-success: var(--green-6);
--color-warning: var(--orange-6);
--color-danger: var(--red-6);
--color-text: var(--gray-10);
--color-text-secondary: var(--gray-8);
--color-border: var(--gray-3);
--color-bg: var(--gray-1);
--color-bg-secondary: #FFFFFF;
--border-radius-small: 4px;
--border-radius-medium: 6px;
--border-radius-large: 8px;
--font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, 'Noto Sans', sans-serif;
--box-shadow: 0 4px 10px rgba(0, 0, 0, 0.04);
--box-shadow-hover: 0 8px 20px rgba(0, 0, 0, 0.08);
--spacing-xs: 4px;
--spacing-sm: 8px;
--spacing-md: 16px;
--spacing-lg: 24px;
--spacing-xl: 32px;
}
* {
box-sizing: border-box;
margin: 0;
padding: 0;
font-family: var(--font-family);
}
body {
background-color: var(--color-bg);
color: var(--color-text);
line-height: 1.6;
padding: var(--spacing-lg);
min-height: 100vh;
}
.container {
max-width: 1200px;
margin: 0 auto;
background: var(--color-bg-secondary);
border-radius: var(--border-radius-large);
box-shadow: var(--box-shadow);
overflow: hidden;
}
header {
background: var(--color-primary);
color: white;
padding: var(--spacing-xl);
position: relative;
}
.header-content {
display: flex;
justify-content: space-between;
align-items: center;
flex-wrap: wrap;
}
h1 {
font-size: 20px;
font-weight: 600;
margin-bottom: var(--spacing-sm);
}
.header-info {
font-size: 14px;
opacity: 0.9;
}
.insurance-logo {
background: white;
padding: var(--spacing-xs) var(--spacing-md);
border-radius: 50px;
font-weight: 500;
color: var(--color-primary);
box-shadow: var(--box-shadow);
}
.section {
padding: var(--spacing-lg);
border-bottom: 1px solid var(--color-border);
}
.section:last-child {
border-bottom: none;
}
h2 {
color: var(--color-primary);
font-size: 16px;
margin-bottom: var(--spacing-lg);
padding-bottom: var(--spacing-sm);
border-bottom: 1px solid var(--color-primary-light);
display: flex;
align-items: center;
font-weight: 500;
}
h2:before {
content: "";
display: inline-block;
width: 3px;
height: 16px;
background: var(--color-primary);
border-radius: var(--border-radius-small);
margin-right: var(--spacing-sm);
}
.card-container {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(300px, 1fr));
gap: var(--spacing-md);
}
.card {
background: var(--color-bg-secondary);
border-radius: var(--border-radius-medium);
padding: var(--spacing-md);
border: 1px solid var(--color-border);
box-shadow: var(--box-shadow);
transition: transform 0.2s ease, box-shadow 0.2s ease;
}
.card:hover {
transform: translateY(-2px);
box-shadow: var(--box-shadow-hover);
}
.card h3 {
color: var(--color-primary);
font-size: 14px;
margin-bottom: var(--spacing-md);
padding-bottom: var(--spacing-sm);
border-bottom: 1px dashed var(--color-border);
font-weight: 500;
}
/* Field layout */
.info-grid {
display: grid;
grid-template-columns: 1fr;
gap: var(--spacing-sm);
font-size: 14px;
}
.info-item {
display: flex;
flex-direction: column;
margin-bottom: 12px;
}
.info-label {
font-size: 12px;
color: var(--color-text-secondary);
margin-bottom: 4px;
}
.info-value {
font-size: 15px;
font-weight: 500;
word-break: break-word;
padding: 4px 0;
}
.invoice-card {
background: var(--color-bg-secondary);
border-radius: var(--border-radius-medium);
padding: var(--spacing-md);
margin-bottom: var(--spacing-md);
box-shadow: var(--box-shadow);
border: 1px solid var(--color-border);
}
.invoice-header {
display: flex;
justify-content: space-between;
align-items: flex-start;
margin-bottom: var(--spacing-md);
padding-bottom: var(--spacing-sm);
border-bottom: 1px solid var(--color-border);
}
.invoice-number-container {
display: flex;
flex-direction: column;
}
.invoice-number {
font-size: 15px;
font-weight: 600;
color: var(--color-primary);
}
.invoice-reference {
margin-top: 4px;
font-size: 12px;
color: var(--color-text-secondary);
}
.invoice-examination {
display: flex;
align-items: center;
}
.examination-tag {
padding: 6px 12px;
border-radius: 50px;
font-weight: 600;
display: inline-block;
font-size: 13px;
border: 1px solid;
}
.examination-tag.valid {
background: var(--green-1);
color: var(--green-7);
border-color: var(--green-3);
}
.examination-tag.suspicious {
background: var(--orange-1);
color: var(--orange-7);
border-color: var(--orange-3);
}
.examination-tag.invalid {
background: var(--red-1);
color: var(--red-7);
border-color: var(--red-3);
}
.invoice-details {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(200px, 1fr));
gap: var(--spacing-md);
margin-bottom: var(--spacing-md);
font-size: 14px;
}
.detail-item {
display: flex;
flex-direction: column;
}
.detail-label {
font-size: 12px;
color: var(--color-text-secondary);
margin-bottom: 4px;
}
.detail-value {
font-size: 14px;
font-weight: 500;
}
table {
width: 100%;
margin-top: var(--spacing-md);
border-collapse: collapse;
border-radius: var(--border-radius-medium);
overflow: hidden;
box-shadow: var(--box-shadow);
font-size: 14px;
table-layout: fixed;
}
th {
background-color: var(--color-primary-light);
color: var(--color-primary);
font-weight: 500;
padding: 12px;
text-align: left;
width: 20%;
word-break: break-word;
}
td {
padding: 12px;
border-bottom: 1px solid var(--color-border);
width: 20%;
word-break: break-word;
}
tr:nth-child(even) {
background-color: var(--color-bg);
}
tr:hover {
background-color: var(--color-primary-light);
}
.amount-total {
text-align: right;
font-weight: 600;
font-size: 15px;
color: var(--color-primary);
margin-top: var(--spacing-md);
padding-top: var(--spacing-md);
border-top: 1px solid var(--color-border);
}
footer {
text-align: center;
padding: var(--spacing-md);
color: var(--color-text-secondary);
font-size: 12px;
background: var(--color-bg);
border-top: 1px solid var(--color-border);
}
.highlight {
color: var(--color-danger);
font-weight: 500;
}
@media (max-width: 768px) {
.card-container {
grid-template-columns: 1fr;
}
.invoice-details {
grid-template-columns: 1fr;
}
.header-content {
flex-direction: column;
align-items: flex-start;
}
.insurance-logo {
margin-top: var(--spacing-md);
}
}
</style>
</head>
<body>
<div class="container">
<header>
<div class="header-content">
<div>
<h1>赔案档案</h1>
<div class="header-info">
<p>赔案编号: {{ dossier.赔案层.赔案编号 }} | 保险总公司: {{ dossier.赔案层.保险总公司 }}</p>
</div>
</div>
</div>
</header>
<div class="section">
<h2>影像件层</h2>
<table>
<thead>
<tr>
<th>影像件序号</th>
<th>影像件名称</th>
<th>已分类(含旋正)</th>
<th>影像件类型</th>
<th>已识别</th>
</tr>
</thead>
<tbody>
{% for image in dossier.影像件层 %}
<tr>
<td>{{ image.影像件序号 }}</td>
<td>{{ image.影像件名称 }}</td>
<td>{{ image.已分类 }}</td>
<td>{{ image.影像件类型 }}</td>
<td>{{ image.已识别 }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
<div class="section">
<h2>赔案层</h2>
<div class="card-container">
<div class="card">
<h3>申请人信息</h3>
<div class="info-grid">
<div class="info-item">
<div class="info-label">与被保险人关系</div>
<div class="info-value">{{ dossier.赔案层.申请人信息.与被保险人关系 }}</div>
</div>
<div class="info-item">
<div class="info-label">姓名</div>
<div class="info-value">{{ dossier.赔案层.申请人信息.姓名 }}</div>
</div>
<div class="info-item">
<div class="info-label">证件类型</div>
<div class="info-value">{{ dossier.赔案层.申请人信息.证件类型 }}</div>
</div>
<div class="info-item">
<div class="info-label">证件号码</div>
<div class="info-value">{{ dossier.赔案层.申请人信息.证件号码 }}</div>
</div>
<div class="info-item">
<div class="info-label">证件有效期</div>
<div class="info-value">{{ dossier.赔案层.申请人信息.证件有效期起 | date }} 至 {{
dossier.赔案层.申请人信息.证件有效期止 | date }}
</div>
</div>
<div class="info-item">
<div class="info-label">性别</div>
<div class="info-value">{{ dossier.赔案层.申请人信息.性别 }}</div>
</div>
<div class="info-item">
<div class="info-label">出生</div>
<div class="info-value">{{ dossier.赔案层.申请人信息.出生 | date }} | {{
dossier.赔案层.申请人信息.年龄 }}岁
</div>
</div>
<div class="info-item">
<div class="info-label">手机号</div>
<div class="info-value">{{ dossier.赔案层.申请人信息.手机号 }}</div>
</div>
<div class="info-item">
<div class="info-label">住址</div>
<div class="info-value">{{ dossier.赔案层.申请人信息.省 }} {{ dossier.赔案层.申请人信息.地 }} {{
dossier.赔案层.申请人信息.县 }}
</div>
<div class="info-value">{{ dossier.赔案层.申请人信息.详细地址 }}</div>
</div>
</div>
</div>
<div class="card">
<h3>受益人信息</h3>
<div class="info-grid">
<div class="info-item">
<div class="info-label">与被保险人关系</div>
<div class="info-value">{{ dossier.赔案层.受益人信息.与被保险人关系 }}</div>
</div>
<div class="info-item">
<div class="info-label">户名</div>
<div class="info-value">{{ dossier.赔案层.受益人信息.户名 }}</div>
</div>
<div class="info-item">
<div class="info-label">开户银行</div>
<div class="info-value">{{ dossier.赔案层.受益人信息.开户银行 }}</div>
</div>
<div class="info-item">
<div class="info-label">银行账号</div>
<div class="info-value">{{ dossier.赔案层.受益人信息.银行账号 }}</div>
</div>
</div>
</div>
</div>
</div>
<div class="section">
<h2>发票层</h2>
{% for invoice in dossier.发票层 %}
<div class="invoice-card">
<div class="invoice-header">
<div class="invoice-number-container">
<div class="invoice-number">{{ invoice.票据类型 }} | {{ invoice.票据号码 }}</div>
<span class="invoice-reference">关联影像件序号: {{ invoice.关联影像件序号 }}</span>
</div>
<div class="invoice-examination">
{% if invoice.查验状态 == '真票' %}
<span class="examination-tag valid">{{ invoice.查验状态 }}</span>
{% elif invoice.查验状态 == '无法查验' %}
<span class="examination-tag suspicious">{{ invoice.查验状态 }}</span>
{% else %}
<span class="examination-tag invalid">{{ invoice.查验状态 }}</span>
{% endif %}
</div>
</div>
<div class="invoice-details">
<div class="detail-item">
<div class="detail-label">就诊人</div>
<div class="detail-value">{{ invoice.就诊人 }}</div>
</div>
<div class="detail-item">
<div class="detail-label">票据代码</div>
<div class="detail-value">{{ invoice.票据代码 }}</div>
</div>
<div class="detail-item">
<div class="detail-label">校验码后六位</div>
<div class="detail-value">{{ invoice.校验码后六位 }}</div>
</div>
<div class="detail-item">
<div class="detail-label">开票日期</div>
<div class="detail-value">{{ invoice.开票日期 | date }}</div>
</div>
<div class="detail-item">
<div class="detail-label">票据金额</div>
<div class="detail-value">{{ invoice.票据金额 }}元</div>
</div>
<div class="detail-item">
<div class="detail-label">就诊类型</div>
<div class="detail-value">{{ invoice.就诊类型 }}</div>
</div>
<div class="detail-item">
<div class="detail-label">医药机构</div>
<div class="detail-value">{{ invoice.医药机构 }}</div>
</div>
<div class="detail-item">
<div class="detail-label">推定疾病</div>
<div class="detail-value">{{ invoice.推定疾病 }}</div>
</div>
</div>
<table>
<thead>
<tr>
<th>大项</th>
<th>小项</th>
<th>数量</th>
<th>金额</th>
</tr>
</thead>
<tbody>
{% for item in invoice.项目 %}
<tr>
<td>{{ item.大项 }}</td>
<td>{{ item.小项 }}</td>
<td>{{ item.数量 }}</td>
<td>{{ item.金额 }}元</td>
</tr>
{% endfor %}
</tbody>
</table>
<div class="amount-total">总金额: {{ invoice.票据金额 }}元</div>
</div>
{% endfor %}
</div>
<footer>
<p>@liubiren.cloud</p>
</footer>
</div>
</body>
</html>


@ -0,0 +1,221 @@
# -*- coding: utf-8 -*-
import json
import re
from csv import DictReader, DictWriter
from pathlib import Path
from typing import List, Dict
import torch
from transformers import BertTokenizerFast, BertForTokenClassification
# Named entity recognition (NER)
class NER:
    def __init__(self):
        # Mapping from predicted label index to entity tag
        self.label_map = {
            0: "O",  # not part of a drug entity
            1: "B-DRUG",  # beginning of a drug entity
            2: "I-DRUG",  # inside a drug entity
        }
        # Load the pretrained tokenizer
        self.tokenizer = BertTokenizerFast.from_pretrained(
            pretrained_model_name_or_path=Path("./models/bert-base-chinese").resolve()
        )
        # Load the pretrained model
        self.model = BertForTokenClassification.from_pretrained(
            pretrained_model_name_or_path=Path("./models/bert-base-chinese").resolve(),
            # Size the classification head to match label_map
            num_labels=len(self.label_map),
        )
        # Switch the model to inference mode
        self.model.eval()

    def recognize_drugs(self, text: str) -> List[Dict]:
        """Recognize drug named entities in the given text."""
        if not text.strip():
            return []
        # Tokenize and encode
        inputs = self.tokenizer(
            text,
            return_tensors="pt",
            padding=True,
            truncation=True,
            return_offsets_mapping=True,
        )
        # Start/end character offsets of each token in the text
        offset_mapping = inputs.pop("offset_mapping")[0].cpu().numpy()
        with torch.no_grad():
            # Run the model
            outputs = self.model(**inputs)
        # Predicted label index for each token
        predictions = torch.argmax(outputs.logits, dim=2)
        entities = []
        current_entity = None
        # Walk over tokens, their character offsets and predicted label indices
        for token, offset, label_id in zip(
            self.tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
            offset_mapping,
            predictions[0].cpu().numpy(),
        ):
            # Map the label index to its tag
            label = self.label_map.get(int(label_id), "O")
            start, end = int(offset[0]), int(offset[1])
            # Skip special tokens (they carry a (0, 0) offset)
            if token in ["[CLS]", "[SEP]", "[PAD]"] or (start == 0 and end == 0):
                continue
            if label == "B-DRUG":
                # A new entity begins: close the one in progress, if any
                if current_entity:
                    self._combine_tokens(current_entity, text)
                    entities.append(current_entity)
                current_entity = {
                    "start": start,
                    "end": end,
                    "tokens": [token],
                    "offsets": [(start, end)],
                    "type": label,
                }
            elif label == "I-DRUG":
                if current_entity:
                    if start == current_entity["end"]:
                        # Contiguous with the entity in progress: extend it
                        current_entity["end"] = end
                        current_entity["tokens"].append(token)
                        current_entity["offsets"].append((start, end))
                    else:
                        # Not contiguous: close it and start a new entity here
                        self._combine_tokens(current_entity, text)
                        entities.append(current_entity)
                        current_entity = {
                            "start": start,
                            "end": end,
                            "tokens": [token],
                            "offsets": [(start, end)],
                            "type": label,
                        }
            else:
                # An "O" token ends the entity in progress
                if current_entity:
                    self._combine_tokens(current_entity, text)
                    entities.append(current_entity)
                    current_entity = None
        if current_entity:
            self._combine_tokens(current_entity, text)
            entities.append(current_entity)
        return entities

    @staticmethod
    def _combine_tokens(current_entity: Dict, text: str):
        """Merge the tokens of an entity into its surface text."""
        # Slice the entity text out of the original input
        current_entity["text"] = text[current_entity["start"] : current_entity["end"]]


"""
# Usage example (requires a fine-tuned model)
dl_ner = NER()
text = "患者需要硫酸吗啡缓释片治疗癌症疼痛"
entities = dl_ner.recognize_drugs(text)
print(entities)
exit()
"""


def drug_extraction(text) -> tuple[str, str | None]:
    """Extract the drug category and drug name from an invoice item name."""
    # The text between the first two "*" is the drug category;
    # everything after the second "*" is the drug name.
    if match := re.match(
        pattern=r"\*(?P<drug_type>.*?)\*(?P<drug_name>.*)",
        string=(text := text.strip()),
    ):
        # Drug category
        drug_type = match.group("drug_type").strip()
        # Drug name: upper-case it and replace the listed symbols with spaces.
        # The full-width characters are assumed to be the counterparts of the
        # half-width ones next to them; the source showed empty strings there.
        drug_name = (
            match.group("drug_name")
            .upper()
            .replace("(", " ")
            .replace(")", " ")
            .replace("（", " ")
            .replace("）", " ")
            .replace("[", " ")
            .replace("]", " ")
            .replace("【", " ")
            .replace("】", " ")
            .replace(":", " ")
            .replace("：", " ")
            .replace(",", " ")
            .replace("，", " ")
            .replace("·", " ")
            .replace("`", " ")
            .replace("@", " ")
            .replace("#", " ")
            .replace("*", " ")
            .replace("/", " ")
            .strip()
        )
        # Collapse runs of whitespace into a single space
        drug_name = re.sub(pattern=r"\s+", repl=" ", string=drug_name)
    # If the pattern does not match, the category defaults to the raw text
    # and the name defaults to None
    else:
        drug_type, drug_name = text, None
    return drug_type, drug_name


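dataframe = []
# Clean the invoice verification results and their disease mapping
# (for now only genuine VAT invoices are considered)
with open("票据查验结果和疾病对应关系.csv", "r", encoding="utf-8") as file:
    for row in DictReader(file):
        try:
            disease = row["疾病"]
            response = json.loads(row["票据查验结果"])
            # Iterate over the invoice items
            for item in response["data"]["details"]["items"]:
                drug_type, drug_name = drug_extraction(item["name"])
                # Output column names are an assumption; adjust as needed
                dataframe.append(
                    {"疾病": disease, "药品类别": drug_type, "药品名称": drug_name}
                )
        except Exception as e:
            # Stop at the first malformed row
            print(e)
            raise SystemExit(1)
# Write the output only when at least one record was collected
if dataframe:
    with open("1.csv", "w", newline="", encoding="utf-8") as file:
        writer = DictWriter(file, fieldnames=dataframe[0].keys())
        writer.writeheader()
        writer.writerows(dataframe)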