  • A Weighted-Average Person-Job Matching Model Based on Hamming Distance (Implemented with SQL Server 2014/2016 Memory-Optimized Tables)

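    I recently built a feature for a university's website that automatically matches every student in the school with the positions posted by employers. After logging in to the employment site, a student can see the positions that suit them and apply online.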

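    The university has tens of thousands of students, and registered companies have posted more than ten thousand positions. How can students and positions be matched against a prepared matching model quickly enough that students browsing the site are not affected?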

    • Modeling

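    I once developed automatic real-estate valuation software for a bank. The standard approach there is a Euclidean closeness algorithm or Hamming closeness, but those algorithms are too complex: they belong to applied mathematics and depend on precise modeling. The teachers in our employment office are practitioners who work on the front line and do not have an advanced theoretical background, so the model had to be simple. After some research, I found that combining Hamming distance with weighting is fairly simple and also easy to convert into a matching percentage. The algorithm comes from the paper "Matching Degree Calculation Method in Fuzzy Matching" (《模糊匹配中的匹配度计算方法》), published in Computer Engineering (《计算机工程》) in March 2010.

    (The formula images that followed here are not preserved.)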

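    The basic idea is that every item carries a weight. For each item, the degree of match between the student and the position is multiplied by the item's weight; the products are summed and divided by the total weight, which gives the student's match against each position.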
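
    As a hedged reading of the matching stored procedure further down (not a transcription of the paper's formulas), the score for a student s and a position p can be written as below. The symbols are introduced here only for illustration: v_i is the 0 to 10 value looked up for item i, w_i is that item's weight, and I is the set of items that could actually be scored.

```latex
d_i = \lvert 10 - v_i \rvert, \qquad
D(s, p) = \frac{\sum_{i \in I} w_i \, d_i}{\sum_{i \in I} w_i}
```

    Under this reading D is a distance: 0 is a perfect match, and the procedure sorts its results in ascending order.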

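    For students who have set up a position searcher on the employment site, matching uses the items configured in the searcher. (The item list was an image and is not preserved; the score variables in the matching procedure below cover salary, job location, education, profession, industry, job nature, job type, computer skills, and language.)

    For example, the "salary" item is matched through a lookup table. The original table was an image and is not preserved; the Salary rows of the Parameters table in the script below carry the values.

    Students who have not set up a searcher are matched on a different set of dimensions, which draws on big-data analysis of employment records; the details are omitted here.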
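
    A minimal Python sketch of the lookup-then-average idea, assuming that a looked-up value of 10 means a perfect fit (which is how the stored procedure below treats it). The four salary rows are copied from the Parameters script in this post; the function names are hypothetical, and so are the weights in the usage note, because the real 'HasSearch' weights are not published.

```python
from typing import Dict, Optional, Tuple

# Similarity values (0-10, 10 = best fit), keyed by
# (position's salary band, salary band the student asked for).
# Only four rows from the Parameters script are reproduced here.
SALARY_SIMILARITY: Dict[Tuple[str, str], float] = {
    ("1500~2000", "1500~2000"): 10,
    ("1500~2000", "2000~3000"): 5,
    ("2000~3000", "3000~4000"): 6,
    ("3000~4000", "4000~5000"): 8,
}


def item_distance(similarity: Optional[float]) -> Optional[float]:
    """Turn a 0-10 similarity into a distance; None means 'item not scored'."""
    if similarity is None:
        return None
    return abs(10 - similarity)


def weighted_distance(
    distances: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> Optional[float]:
    """Weighted average over the scored items; lower means a better match."""
    total = 0.0
    weight_sum = 0.0
    for item, distance in distances.items():
        if distance is None:
            continue  # unscored items add to neither sum
        total += distance * weights[item]
        weight_sum += weights[item]
    if weight_sum == 0:
        return None  # nothing could be scored for this position
    return total / weight_sum
```

    For instance, with hypothetical weights of 30 for salary and 20 for education, a salary distance of 5 and an education distance of 0 give (5 × 30 + 0 × 20) / 50 = 3.0. An item whose distance is None contributes to neither the numerator nor the denominator.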

     

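    • Implementation

    The site is built on .NET (C#) and SQL Server. Every request has to match against all active positions, so a conventional approach would certainly be slow and could become a performance bottleneck. I considered two ways to speed up the computation:

    1. Use MongoDB.
    2. Use the memory-optimized tables of SQL Server 2014 or SQL Server 2016.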

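    MongoDB is a typical NoSQL database that exchanges data as JSON-style documents. Reads and writes are very fast, and it carries less of SQL Server's machinery for permissions, concurrency, locking, and storage engines, which makes it a good fit for high-throughput data storage.

    In SQL Server 2014 and SQL Server 2016, Microsoft introduced memory-optimized tables and natively compiled stored procedures, and both perform very well.

    (A side complaint: people online say that in Oracle a single command turns a table into an in-memory table and a single command turns a stored procedure into a natively compiled one, whereas SQL Server has too many restrictions here. Memory-optimized tables cannot have indexes added or CHECK constraints (the 2016 version allows them), and natively compiled stored procedures are restricted far more: no functions, no cursors, no linked databases, and so on. I think this comes from the two databases' different implementation mechanisms. Page 151 of SQL Programming Style (《SQL编程风格》) says: "T-SQL is a simple one-pass compiler modeled on the C and Algol languages... Oracle's PL/SQL is modeled on ADA and SQL/PSM; it is a complex language that can be used to develop applications." So upgrading a stored procedure in Oracle is effortless.)

    We chose SQL Server memory-optimized tables as the data buffer pool. I had wanted to implement the matching algorithm as a natively compiled stored procedure, but the restrictions were too many, so I fell back to an ordinary stored procedure. A daily batch job synchronizes the position data into the memory-optimized tables, the computation runs after a student logs in, and positions are also pushed to students every week.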

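    By popular request, here is the algorithm code. (Because of a confidentiality agreement with the university, some lines have been removed; thanks for your understanding.)

    Basic table creation and data initialization script: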

      -- Create the database and tables
      CREATE DATABASE [DataAnalysis]
       CONTAINMENT = NONE
       ON  PRIMARY 
      ( NAME = N'DataAnalysis', FILENAME = N'd:\DATA\DataAnalysis.mdf' , SIZE = 5120KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ), 
       FILEGROUP [DataAnalysisFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA  DEFAULT
      ( NAME = N'DataAnalysisContainer', FILENAME = N'd:\DATA\HashCollisionsContainer' , MAXSIZE = UNLIMITED)
       LOG ON 
      ( NAME = N'DataAnalysis_log', FILENAME = N'd:\DATA\DataAnalysis_log.ldf' , SIZE = 2304KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
      GO

      ALTER DATABASE [DataAnalysis] SET COMPATIBILITY_LEVEL = 120
      GO


      Use [DataAnalysis]
      GO

      if exists(select * from sysobjects where id=object_id('EnterPrisePositions'))
          DROP TABLE EnterPrisePositions
      GO

      CREATE TABLE EnterPrisePositions
      (
          [ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 4096),
          [EntID] uniqueidentifier NOT NULL,
          [EntUserID] uniqueidentifier NOT NULL,
          [EntName] [nvarchar](80) NOT NULL,
          [PosiID] [uniqueidentifier] NOT NULL,
          [PosiName] [nvarchar](40) NULL,
          [JobTypeID] [uniqueidentifier] NULL,
          --.......
      ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
      GO

      INSERT INTO [EnterPrisePositions] ([EntID], [EntUserID], [EntName], [OrgCode]..........)
      SELECT e.[EntID], p.[EntUserID], e.[EntName], e.[OrgCode], e.[HyID], e.[SubHyID], e.[ThirdHyID],..........
        FROM .....[dbo].[Position] p INNER JOIN ......[dbo].[Enterprise] e ON p.[EntUserID] = e.[EntUserID]
       WHERE p.DelFlag <> 1 AND p.EffectiveDate <= GetDate() AND GetDate() <= p.ExpiryDate AND e.[CheckFlag] = 1 AND e.[DelFlag] = 0 AND e.[IsBlack] = 0
      GO


      if exists(select * from sysobjects where id=object_id('UserPositionResult'))
          DROP TABLE UserPositionResult
      GO

      CREATE TABLE UserPositionResult
      (
          [ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 4096),
          [Score] FLOAT,
          [PosiID] [uniqueidentifier] NOT NULL,
          [PosiName] [nvarchar](40) NULL,
          --............
      ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
      GO


      -- Helper functions
      -- =============================================
      -- Author:        Ben
      -- Create date: 2016-08-08
      -- Description:    Splits a comma-separated list of specialty IDs into rows
      -- =============================================
      CREATE FUNCTION [GetPositionSpecialty] (
          @String NVARCHAR(150)
      ) RETURNS @temptable TABLE (ID INT IDENTITY(1,1), Specialty NVARCHAR(8)) AS
      BEGIN

          DECLARE @idx INT=1
          DECLARE @slice NVARCHAR(150) 
          DECLARE @Delimiter NVARCHAR(1) = ','
          -- An empty or NULL list yields the single placeholder specialty '0'
          IF LEN(@String) < 1 OR LEN(ISNULL(@String,'')) = 0
          BEGIN
              INSERT INTO @temptable(Specialty) VALUES('0')
              RETURN
          END
          WHILE @idx != 0
          BEGIN
              -- Cut off the text before the next delimiter and store it as one row
              SET @idx = CHARINDEX(@Delimiter,@String)
              IF @idx != 0
                  SET @slice = LEFT(@String,@idx - 1)
              ELSE
                  SET @slice = @String
              IF LEN(@slice) > 0
                  INSERT INTO @temptable(Specialty) VALUES(@slice)
              SET @String = RIGHT (@String, LEN(@String) - @idx)
              IF LEN(@String) = 0
                  BREAK
          END
          RETURN
      END

      GO

      -- =============================================
      -- Author:        Ben
      -- Create date: 2016-08-08
      -- Description:    Parses a position-searcher criteria string ('|'-delimited)
      -- =============================================
      CREATE PROCEDURE [GetStudentSearch]
          -- Add the parameters for the stored procedure here
          @Ssqtj NVARCHAR(1000), @ProvinceId NCHAR(6) OUTPUT, @Zydm NVARCHAR(10) OUTPUT, @JobNature NVARCHAR(8) OUTPUT, @JobTypeID uniqueidentifier OUTPUT, @SubJobTypeID uniqueidentifier OUTPUT,
          @Salary NVARCHAR(30) OUTPUT, @Computer NVARCHAR(50) OUTPUT, @Language NVARCHAR(50) OUTPUT, @Education NVARCHAR(4) OUTPUT, @HyID uniqueidentifier OUTPUT, @SubHyID uniqueidentifier OUTPUT,
          @ThirdHyID uniqueidentifier OUTPUT
      AS
      BEGIN
          DECLARE @idx INT=1, @StartPos INT
          DECLARE @Delimiter NVARCHAR(1) = '|'

          IF LEN(@Ssqtj) < 1 OR LEN(ISNULL(@Ssqtj, '')) = 0
              RETURN

          -- First field: province ID
          SET @idx = CHARINDEX(@Delimiter, @Ssqtj, @idx)
          IF @idx != 0
              SET @ProvinceId = LEFT(@Ssqtj, @idx - 1)


          -- Second field: major code (Zydm)
          SET @StartPos = @idx + 1
          SET @idx = CHARINDEX(@Delimiter, @Ssqtj, @StartPos)
          IF @idx != 0
              SET @Zydm = SUBSTRING(@Ssqtj, @StartPos, @idx - @StartPos)

          --........ (lines omitted in the original, including the matching BEGIN TRY)

              SET @StartPos = @idx + 1
              SET @ThirdHyID = SUBSTRING(@Ssqtj, @StartPos, 36)
          END TRY
          BEGIN CATCH
          END CATCH
      END
      GO

      -- Important: the parameter table
      -- For 'Salary' rows, Set1 is the position's salary band and Set2 is the band the student asked for.
      -- The literals are data values and are kept as stored: '面议' = negotiable, '8000以上' = 8000 and above.
      if exists(select * from sysobjects where id=object_id('Parameters'))
          DROP TABLE Parameters
      GO
      CREATE TABLE [Parameters]
      (
          [ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
          [Type] NVARCHAR(50) NOT NULL,
          [Set1] NVARCHAR(50) NULL,
          [Set2] NVARCHAR(50) NULL,
          [Value] FLOAT NOT NULL
      ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
      GO

      INSERT INTO [Parameters] ([Type], [Set1], [Set2], [Value])
      SELECT 'Salary', '面议', '面议', 10
      UNION ALL
      SELECT 'Salary', '面议', '1500~2000', 10
      UNION ALL
      SELECT 'Salary', '面议', '2000~3000', 10
      UNION ALL
      SELECT 'Salary', '面议', '3000~4000', 10
      UNION ALL
      SELECT 'Salary', '面议', '4000~5000', 10
      UNION ALL
      SELECT 'Salary', '面议', '5000~6000', 10
      UNION ALL
      SELECT 'Salary', '面议', '6000~7000', 10
      UNION ALL
      SELECT 'Salary', '面议', '7000~8000', 10
      UNION ALL
      SELECT 'Salary', '面议', '8000以上', 10

      UNION ALL
      SELECT 'Salary', '1500~2000', '面议', 10
      UNION ALL
      SELECT 'Salary', '1500~2000', '1500~2000', 10
      UNION ALL
      SELECT 'Salary', '1500~2000', '2000~3000', 5
      UNION ALL
      SELECT 'Salary', '1500~2000', '3000~4000', 2
      UNION ALL
      SELECT 'Salary', '1500~2000', '4000~5000', 2
      UNION ALL
      SELECT 'Salary', '1500~2000', '5000~6000', 2
      UNION ALL
      SELECT 'Salary', '1500~2000', '6000~7000', 4
      UNION ALL
      SELECT 'Salary', '1500~2000', '7000~8000', 3
      UNION ALL
      SELECT 'Salary', '1500~2000', '8000以上', 2

      UNION ALL
      SELECT 'Salary', '2000~3000', '面议', 10
      UNION ALL
      SELECT 'Salary', '2000~3000', '1500~2000', 10
      UNION ALL
      SELECT 'Salary', '2000~3000', '2000~3000', 10
      UNION ALL
      SELECT 'Salary', '2000~3000', '3000~4000', 6
      UNION ALL
      SELECT 'Salary', '2000~3000', '4000~5000', 5
      UNION ALL
      SELECT 'Salary', '2000~3000', '5000~6000', 4
      UNION ALL
      SELECT 'Salary', '2000~3000', '6000~7000', 5
      UNION ALL
      SELECT 'Salary', '2000~3000', '7000~8000', 4
      UNION ALL
      SELECT 'Salary', '2000~3000', '8000以上', 3

      UNION ALL
      SELECT 'Salary', '3000~4000', '面议', 10
      UNION ALL
      SELECT 'Salary', '3000~4000', '1500~2000', 10
      UNION ALL
      SELECT 'Salary', '3000~4000', '2000~3000', 10
      UNION ALL
      SELECT 'Salary', '3000~4000', '3000~4000', 10
      UNION ALL
      SELECT 'Salary', '3000~4000', '4000~5000', 8
      UNION ALL
      SELECT 'Salary', '3000~4000', '5000~6000', 6
      UNION ALL
      SELECT 'Salary', '3000~4000', '6000~7000', 6
      UNION ALL
      SELECT 'Salary', '3000~4000', '7000~8000', 5
      UNION ALL
      SELECT 'Salary', '3000~4000', '8000以上', 4

      --........

      UNION ALL
      SELECT 'Weight', 'HasNoSearch', 'Education', 20
      UNION ALL
      SELECT 'Weight', 'HasNoSearch', 'Profession', 20
      UNION ALL
      SELECT 'Weight', 'HasNoSearch', 'Industry', 8
      UNION ALL
      SELECT 'Weight', 'HasNoSearch', 'Enterprise', 12
      GO


      -- Other temporary tables, used for the big-data analysis
      CREATE TABLE IndutryRanking
      (
          Gzydm NVARCHAR(10)  COLLATE Chinese_PRC_Stroke_90_BIN2  NOT NULL, SubIndustry uniqueidentifier  NOT NULL, Ranking TINYINT NOT NULL,
      CONSTRAINT [PK_IndutryRanking] PRIMARY KEY NONCLUSTERED HASH
      (
          Gzydm ,
          SubIndustry 
      )WITH ( BUCKET_COUNT = 2048)
      --........ (lines omitted in the original, including the end of this CREATE TABLE)

      CREATE TABLE EnterpriseRanking
      (
          Gzydm NVARCHAR(10) COLLATE Chinese_PRC_Stroke_90_BIN2  NOT NULL, Zzjgdm NVARCHAR(10) COLLATE Chinese_PRC_Stroke_90_BIN2 NOT NULL, Ranking FLOAT NOT NULL,
      CONSTRAINT [PK_EnterpriseRanking] PRIMARY KEY NONCLUSTERED HASH
      (
          Gzydm ,
          Zzjgdm 
      )WITH ( BUCKET_COUNT = 131072)
      ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
      GO

      -- Ranking = log base 1.8 of (number of employment records + 1) per major/organization pair
      INSERT INTO EnterpriseRanking (Gzydm, Zzjgdm, Ranking)
      SELECT Gzydm, Zzjgdm, LOG(Count(*)+1, 1.8)
      FROM ..........[dbo].[vm_AllSyEmployment]
      WHERE Gzydm IS NOT NULL AND ZZJGDM IS NOT NULL
      GROUP BY Gzydm,Zzjgdm
      ORDER BY Count(*) DESC

      GO


      CREATE FUNCTION [dbo].[GetCurrentBynf]()
      RETURNS char(4)
      AS
      BEGIN
          -- Declare the return variable here
          DECLARE @Bynf int

          -- Start from the current calendar year
          SELECT @Bynf = YEAR(GETDATE())

          -- From 26 August onward, roll over to the next year
          IF MONTH(GETDATE()) >= 9 OR ( MONTH(GETDATE()) = 8 AND DAY(GETDATE()) >25 )
          BEGIN
              SET @Bynf = @Bynf + 1
          END

          RETURN CAST(@Bynf AS char(4))
      END

      GO

    A batch job loads the data every time the database restarts and refreshes the related tables on a daily schedule.

    Position matching script:

      CREATE PROCEDURE RetirePositionsByXsxh
          -- Parameters
          -- Add the parameters for the stored procedure here
          @Xsxh NVARCHAR(20), @ResultType INT = 0, @PageSize INT = 99999, @StartPage INT = 0, @ReleaseDateRange SMALLINT = 9999
          WITH ENCRYPTION
      AS
      BEGIN
          -- SET NOCOUNT ON added to prevent extra result sets from
          -- interfering with SELECT statements.
          SET NOCOUNT ON;

          -- Declare variables
          -- Insert statements for procedure here
      DECLARE @Ssqtj NVARCHAR(1000), @TableRows int, @PositionRows int,
      @ProvinceId NCHAR(6), @Zydm NVARCHAR(10), @JobNature NVARCHAR(8), @JobTypeID uniqueidentifier, @SubJobTypeID uniqueidentifier,@Salary NVARCHAR(30), 
      @Computer NVARCHAR(50), @Language NVARCHAR(50), @Education NVARCHAR(4), @HyID uniqueidentifier, @SubHyID uniqueidentifier, @ThirdHyID uniqueidentifier,
      --........

      DECLARE @StudentSsqtj TABLE (SsqtjID INT IDENTITY(1,1) NOT NULL Primary Key, Ssqtj NVARCHAR(1000))
      DECLARE @PositionSpecialty TABLE (ID INT NOT NULL Primary Key, Specialty NVARCHAR(1000))
      --...........

      -- Load every position searcher the student has saved
      INSERT INTO @StudentSsqtj (Ssqtj)
      SELECT Ssqtj
        FROM .......[dbo].[PosiSearch] WITH (SNAPSHOT)
       WHERE Xsxh = @Xsxh

      SELECT @TableRows = Count(*) FROM @StudentSsqtj
      -- The loops below walk position IDs downward from this value.
      -- MAX(ID) is used instead of COUNT(*) so that gaps in the identity values do not hide positions.
      SELECT @PositionRows = ISNULL(MAX(ID), 0) FROM [EnterPrisePositions]

      IF @TableRows > 0
      BEGIN
          -- Item weights
          SELECT @Salary_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasSearch' AND [Set2] = 'Salary'
          SELECT @JobLocation_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasSearch' AND [Set2] = 'JobLocation'
          SELECT @Education_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasSearch' AND [Set2] = 'Education'
          --........

          -- Per-item match values
          SELECT @JobLocation_Same_Province = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Province' AND [Set1] = 'SameProvince'
          SELECT @JobLocation_Same_City = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Province' AND [Set1] = 'SameCity'
          SELECT @JobLocation_Not_Same = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Province' AND [Set1] = 'NotSame'
          SELECT @Education_Same = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Education' AND [Set1] = 'Same'
          SELECT @Education_Not_Same = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Education' AND [Set1] = 'NotSame'
          --..........

          WHILE @TableRows != 0
          BEGIN
              SELECT @Ssqtj = Ssqtj
                FROM @StudentSsqtj
               WHERE SsqtjID = @TableRows

              -- Parse the position searcher
              EXEC [GetStudentSearch] @Ssqtj, @ProvinceId OUTPUT, @Zydm OUTPUT, @JobNature OUTPUT, @JobTypeID OUTPUT , @SubJobTypeID OUTPUT, @Salary OUTPUT, 
                   @Computer OUTPUT, @Language OUTPUT, @Education OUTPUT, @HyID OUTPUT, @SubHyID OUTPUT,@ThirdHyID OUTPUT

              -- Restart the position scan for this searcher. Without this reset the counter
              -- is still 0 after the first searcher, and every later searcher is skipped.
              SELECT @PositionRows = ISNULL(MAX(ID), 0) FROM [EnterPrisePositions]

              WHILE @PositionRows != 0
              BEGIN

                  -- Fetch one position
                  SELECT @Pos_HyID = [HyID], @Pos_SubHyID = [SubHyID], @Pos_ThirdHyID = --.......
                    FROM [EnterPrisePositions]
                   WHERE ID = @PositionRows AND DATEDIFF(hour, ReleaseDate, GETDATE()) <= 24 * @ReleaseDateRange

                  IF @@ROWCOUNT <> 0
                  BEGIN
                      SET @Salary_Score = NULL
                      SET @JobLocation_Score = NULL
                      SET @Education_Score = NULL
                      SET @Profession_Score = NULL
                      SET @Industry_Score = NULL
                      SET @JobNature_Score = NULL
                      SET @JobType_Score = NULL
                      SET @Computer_Score = NULL
                      SET @Language_Score = NULL
                      SET @CurrentValue = 0
                      SET @WeightSummary = 0
                      SET @ParaValue = NULL
                      SET @CurrentScore = NULL

                      -- Salary: distance = |10 - lookup value|, times the item weight
                      SELECT @ParaValue = [Value] FROM [Parameters] WHERE [Type] = 'Salary' AND [Set1] = @Pos_Salary AND [Set2] = @Salary
                      IF @ParaValue IS NOT NULL
                      BEGIN
                          SET @Salary_Score = ABS(10 - @ParaValue)*@Salary_Weight
                      END

                      -- Job location: same city, same province, or different
                      IF @ProvinceId <> ''
                      BEGIN
                          IF (LEFT(@Pos_ProvinceId,4) = LEFT(@ProvinceId, 4) OR (
                             LEFT(@Pos_ProvinceId,2) = LEFT(@ProvinceId, 2) AND LEFT(@Pos_ProvinceId,2) IN ('10','12','31','50')))
                             AND @JobLocation_Same_City IS NOT NULL
                          BEGIN
                              SET @JobLocation_Score = ABS(10 - @JobLocation_Same_City)*@JobLocation_Weight
                          END
                          ELSE IF LEFT(@Pos_ProvinceId,2) = LEFT(@ProvinceId, 2) AND @JobLocation_Same_Province IS NOT NULL
                          BEGIN
                              SET @JobLocation_Score = ABS(10 - @JobLocation_Same_Province)*@JobLocation_Weight
                          END
                          ELSE IF @JobLocation_Not_Same IS NOT NULL
                          BEGIN
                              SET @JobLocation_Score = ABS(10 - @JobLocation_Not_Same)*@JobLocation_Weight
                          END
                      END

                      -- Education ('不限' = no requirement)
                      IF @Education NOT IN ('', '不限')
                      BEGIN
                          IF @Pos_Education = @Education
                          BEGIN
                              SET @Education_Score = ABS(10 - @Education_Same) * @Education_Weight
                          END
                          ELSE
                          BEGIN
                              SET @Education_Score = ABS(10 - @Education_Not_Same)*@Education_Weight
                          END
                      END

                      -- Profession: compare the student's major with each specialty the position lists
                      IF @Zydm <> ''
                      BEGIN
                          DELETE FROM @PositionSpecialty
                          INSERT INTO @PositionSpecialty (ID, Specialty)
                          SELECT ID, Specialty FROM [GetPositionSpecialty](@Pos_SpecialtyIds)
                          SELECT @Specialty_Rows = Count(*) FROM @PositionSpecialty
                          WHILE @Specialty_Rows <> 0
                          BEGIN
                              SELECT @Pos_SpecialtyId = Specialty FROM @PositionSpecialty WHERE ID = @Specialty_Rows
                              IF @Pos_SpecialtyId = '0' OR LEFT(@Pos_SpecialtyId,2) = LEFT(@Zydm,2)
                              BEGIN
                                  SET @Profession_Score =  ABS(10 - @Profession_Match) * @Profession_Weight
                              END
                              ELSE
                              BEGIN
                                  -- Do not overwrite a zero-distance match found for another specialty
                                  IF @Profession_Score <> 0 OR @Profession_Score IS NULL
                                      SET @Profession_Score =  ABS(10 - @Profession_Not_Match) * @Profession_Weight
                              END
                              SET @Specialty_Rows = @Specialty_Rows - 1
                          END

                      END

                      -- Other items omitted


                      -- Compute the overall score and store it in the score table variable;
                      -- only items that were actually scored contribute their weight
                      IF @Salary_Score IS NOT NULL
                      BEGIN
                          SET @CurrentValue = @CurrentValue + @Salary_Score
                          SET @WeightSummary = @WeightSummary + @Salary_Weight
                      END
                      IF @JobLocation_Score IS NOT NULL
                      BEGIN
                          SET @CurrentValue = @CurrentValue + @JobLocation_Score
                          SET @WeightSummary = @WeightSummary + @JobLocation_Weight
                      END
                      IF @Education_Score IS NOT NULL
                      BEGIN
                          SET @CurrentValue = @CurrentValue + @Education_Score
                          SET @WeightSummary = @WeightSummary + @Education_Weight
                      END

                      IF @WeightSummary != 0
                      BEGIN
                          SET @CurrentScore = @CurrentValue / @WeightSummary
                          SET @ParaValue = NULL
                          SELECT @ParaValue = [Score] FROM @PositionScore WHERE ID = @PositionRows
                          BEGIN
                              IF @ParaValue IS NULL
                              BEGIN
                                  INSERT INTO @PositionScore(ID, Score, Salary_Score, Province_Score, Education_Score , Profession_Score , Industry_Score , JobNature_Score , JobType_Score , Computer_Score , Language_Score)
                                  VALUES (@PositionRows, @CurrentScore, @Salary_Score, @JobLocation_Score,@Education_Score , @Profession_Score , @Industry_Score , @JobNature_Score , @JobType_Score , @Computer_Score , @Language_Score)
                              END
                              ELSE
                              BEGIN
                                  -- Keep the lowest (best) score across the student's searchers
                                  IF @CurrentScore < @ParaValue
                                  BEGIN
                                      UPDATE @PositionScore SET Score = @CurrentScore WHERE ID = @PositionRows
                                  END
                              END
                          END
                      END
                  END
                  SET @PositionRows = @PositionRows - 1
              END

              --SELECT @ProvinceId, @Zydm, @JobNature, @JobTypeID, @SubJobTypeID,@Salary, @Computer, @Language , @Education, @HyID, @SubHyID, @ThirdHyID
              SET @TableRows = @TableRows - 1
          END
      END
      ELSE -- The student has not set up a position searcher
      BEGIN
          SELECT @Top1_Industry = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Industry' AND [Set1] = 'Top1'
          SELECT @Top2_Industry = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Industry' AND [Set1] = 'Top2'
          --.......
          SELECT @Enterprise_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasNoSearch' AND [Set2] = 'Enterprise'

          -- Education level and major come from the student's basic record
          -- ('研究生' = postgraduate, '本科' = undergraduate, '高职' = higher vocational)
          SELECT @Education = (CASE Xldm WHEN '01' THEN '研究生' WHEN '11' THEN '研究生' WHEN '31' THEN '本科' WHEN '61' THEN '高职' ELSE '' END), @Zydm = Gzydm 
            FROM .......[dbo].[StudentBasic] WITH (SNAPSHOT)
           WHERE Xsxh = @Xsxh

          WHILE @PositionRows != 0
          BEGIN

              -- Same approach as above: score the items one by one

              SELECT @Pos_SubHyID = [SubHyID], @Pos_Education = [Education], @Pos_SpecialtyIds = [SpecialtyIds], @Pos_Zzjgdm = [OrgCode]
                FROM [EnterPrisePositions] WITH(SNAPSHOT)
               WHERE ID = @PositionRows
                 AND DATEDIFF(hour, ReleaseDate, GETDATE()) <= 24 * @ReleaseDateRange

              IF @@ROWCOUNT <> 0
              BEGIN
                  SET @Education_Score = NULL
                  SET @Profession_Score = NULL
                  SET @Industry_Score = NULL
                  SET @Enterprise_Score = NULL

                  --..........

                  IF @WeightSummary != 0
                  BEGIN
                      SET @CurrentScore = @CurrentValue / @WeightSummary
                      SET @ParaValue = NULL
                      SELECT @ParaValue = [Score] FROM @PositionScore WHERE ID = @PositionRows
                      BEGIN
                          IF @ParaValue IS NULL
                          BEGIN
                              INSERT INTO @PositionScore(ID, Score,  Education_Score , Profession_Score , Industry_Score , Enterprise_Score)
                              VALUES (@PositionRows, @CurrentScore, @Education_Score , @Profession_Score , @Industry_Score , @Enterprise_Score)
                          END
                          ELSE
                          BEGIN
                              IF @CurrentScore < @ParaValue
                              BEGIN
                                  UPDATE @PositionScore SET Score = @CurrentScore WHERE ID = @PositionRows
                              END
                          END
                      END
                  END
              END

              SET @PositionRows = @PositionRows - 1
          END

          --SELECT @Education, @Zydm

      END


      -- Return results according to the input parameters
      IF @ResultType = 1
          SELECT s.*, p.*
          FROM [EnterPrisePositions] P WITH (SNAPSHOT) LEFT OUTER JOIN @PositionScore S ON p.ID = s.ID
          WHERE s.Score IS NOT NULL
          ORDER BY Score 
          offset @PageSize*@StartPage rows fetch next @PageSize rows only  -- OFFSET/FETCH paging, new in SQL Server 2012; very efficient
      ELSE IF @ResultType = 2
      BEGIN
          if exists(select * from sysobjects where id=object_id('UserPositionResult'))
              DROP TABLE UserPositionResult

          CREATE TABLE UserPositionResult
          (
              [ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 4096),
              [Score] FLOAT,
              [PosiID] [uniqueidentifier] NOT NULL,
              [PosiName] [nvarchar](40) NULL,
              [EntName] [nvarchar](80) NOT NULL,
          --......

          WHERE s.Score IS NOT NULL
          ORDER BY Score 
      END
      ELSE
          SELECT s.Score, p.PosiID, p.PosiName, p.EntName, p.EntUserID, p.JobNature, p.Number, p.Salary, p.Education, p.Specialty
          FROM [EnterPrisePositions] P WITH (SNAPSHOT) LEFT OUTER JOIN @PositionScore S ON p.ID = s.ID
          WHERE s.Score IS NOT NULL
          ORDER BY Score 
          offset @PageSize*@StartPage rows fetch next @PageSize rows only

      -- Return the total number of matched rows
      SELECT Count(*) AS [TotalCount] FROM [EnterPrisePositions] P WITH (SNAPSHOT) LEFT OUTER JOIN @PositionScore S ON p.ID = s.ID
          WHERE s.Score IS NOT NULL
      END
      GO
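
    The two bookkeeping steps at the end of the procedure (keep the lowest score per position across a student's searchers, then sort ascending and return one page) can be sketched in Python as follows. This is an illustration only, not the author's code: the function names are hypothetical, and the tie-break on position ID is an assumption added to make the ordering deterministic, which the T-SQL ORDER BY Score does not guarantee.

```python
from typing import Dict, Iterable, List, Tuple


def best_scores(runs: Iterable[Dict[int, float]]) -> Dict[int, float]:
    """Merge per-searcher results, keeping the lowest (best) distance per position ID."""
    best: Dict[int, float] = {}
    for run in runs:
        for position_id, score in run.items():
            if position_id not in best or score < best[position_id]:
                best[position_id] = score
    return best


def page(scores: Dict[int, float], page_size: int, start_page: int) -> List[Tuple[int, float]]:
    """Sort ascending by score and return one page, like OFFSET ... FETCH NEXT."""
    # Ties are broken by position ID (an assumption made for this sketch).
    ordered = sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))
    offset = page_size * start_page
    return ordered[offset:offset + page_size]
```

    With two searchers that score position 1 at 3.0 and 2.0, the merged result keeps 2.0, and page 0 of size 2 starts with whichever positions have the smallest distances.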

     

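    The SQL Server memory-optimized tables performed well: computing the match against every position takes less than one second. Hash buckets are powerful. The B-tree index used before needs O(log n) per lookup, while a hash index brings a point lookup down to roughly O(1), and this holds for larger data volumes as long as the bucket count is set appropriately.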
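
    On choosing the bucket count: as far as I know, Microsoft's guidance for hash indexes is a BUCKET_COUNT of roughly one to two times the number of distinct key values, and SQL Server rounds the value up to the next power of two. The Python helper below is a hypothetical sizing sketch under that assumption; the function name and the default factor of 1.5 are illustrative choices, not part of any SQL Server API.

```python
import math


def suggested_bucket_count(distinct_keys: int, factor: float = 1.5) -> int:
    """Round distinct_keys * factor up to the next power of two."""
    if distinct_keys < 1:
        raise ValueError("distinct_keys must be positive")
    target = math.ceil(distinct_keys * factor)
    power = 1
    while power < target:
        power *= 2
    return power
```

    For example, about ten thousand positions with the default factor would suggest 16384 buckets, whereas a table that stays near four thousand rows fits the 4096 used in the script above.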

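    (A screenshot of the computation followed here; the image is not preserved.)

    With a larger data sample of more than ten thousand rows, it still finishes in about one second.

    Lessons learned:

    1. For data that does not need to be persisted, SQL Server memory-optimized tables are an excellent choice. Create the table with DURABILITY = SCHEMA_ONLY so that data changes are not logged; reads and writes are very fast. The rows are lost when the server restarts, so they have to be reloaded.
    2. In long stored procedures, never use cursors: they perform extremely poorly and take a large number of locks that affect other processes. Use a WHILE loop instead.
    3. Read memory-optimized tables under the SNAPSHOT isolation level. For ordinary reads where real-time accuracy matters less (data analysis, for example), use the NOLOCK hint, which reads at READ UNCOMMITTED.

    (A screenshot of the front-end page followed here; the image is not preserved.)

    We also built email delivery that periodically sends positions to students.

    I hope this article serves as a starting point for discussion, and I welcome your suggestions.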

     

  • Original article: https://www.cnblogs.com/thanks/p/5866315.html