Research and Implementation of a Course-Correlation Method Based on Data Mining


With the large-scale deployment of network-based information systems, a great deal of historical data has accumulated, laying a solid foundation for decision support. In a teaching management system running on a campus network, the teaching information generated in daily use already provides the conditions for building a teaching-information data warehouse. On the other hand, as teaching programs expand, it is difficult for academic administrators and teachers to discover, from the distribution of students' grade data, the relationships between prerequisite courses and subsequent courses, and to adjust teaching decisions accordingly. It is therefore both necessary and feasible to use appropriate data mining tools to uncover the course-correlation rules or patterns hidden in the data and so provide support for decision-making.

Association rules are one of the main models in current data mining research. They focus on relationships between different fields of the data, identifying dependencies among attributes that satisfy given support and confidence thresholds. Mining association rules means finding rules of the form "when certain events occur, certain other events also occur" in the data warehouse. For example, in a student grade database we may find that students who score above 80 in "Discrete Mathematics" also score above 80 in "Data Structures" with a probability of 66%; we can therefore strengthen the teaching of "Discrete Mathematics" to improve the teaching effect of "Data Structures".

1. Formal definition of course-correlation association rules

We describe the course-correlation association rules over the student grade database as follows.

Let I = {I1, I2, ..., Im} be the set of items (courses), and let D = {T1, T2, ..., Tn} be the set of records in the student grade database, where each record Ti ⊆ I (1 ≤ i ≤ n) contains only the courses that satisfy the given condition, for example a course score of 80 or above.

An association rule then takes the form:

{P1, P2, ..., Pn} → {Q1, Q2, ..., Qm}

where {P1, P2, ..., Pn} ⊆ I, {Q1, Q2, ..., Qm} ⊆ I, and {P1, P2, ..., Pn} ∩ {Q1, Q2, ..., Qm} = ∅.

The definitions of confidence and support are given below. For each rule, let B be the set of records containing the antecedent itemset {P1, P2, ..., Pn} and H the set of records containing the consequent itemset {Q1, Q2, ..., Qm}; both itemsets are subsets of I. Let G = H ∩ B, the set of records containing both itemsets simultaneously. The confidence of the rule is C = |G| / |B|, and its support is S = |G| / |D|. For example, if 300 of 1000 student grade records show an excellent result in "Discrete Mathematics", and 150 of those 300 records also show an excellent result in "Data Structures", then the rule ' "Discrete Mathematics" excellent → "Data Structures" excellent ' has confidence C = 150/300 = 0.5 and support S = 150/1000 = 0.15.

Semantically, the confidence of a rule indicates how reliably the rule holds, while the support indicates what fraction of the records the rule covers, that is, how important the rule is to the data as a whole. The user can define two thresholds and require that the support and confidence of every rule produced by the mining system be no less than the given thresholds.

In this way, we characterize each mined association rule by its implication, its support, and its confidence. For example, the rule in the example above can be written as:

"Discrete Mathematics" excellent → "Data Structures" excellent, C = 0.50, S = 0.15

2. Data mining algorithm

Following the classic Apriori algorithm, the association rule mining problem can be decomposed into two subproblems:

1. Find all itemsets (nonempty subsets of I) whose support is not less than the user-specified minimum support. An itemset with at least the minimum support is called a frequent itemset; the others are infrequent itemsets.

2. Use the frequent itemsets to generate the required association rules: for each frequent itemset A, find every nonempty proper subset a of A, and if support(A) / support(a) ≥ minconf, generate the association rule a → (A − a).

Since the second subproblem is comparatively easy and intuitive, most research work has concentrated on the first.
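For completeness, a minimal Python sketch of subproblem 2 (rule generation) follows; this is our illustration, not the paper's code, and it assumes the supports of all frequent itemsets have already been computed (Apriori guarantees every subset of a frequent itemset is itself frequent).

from itertools import combinations

def gen_rules(frequent, min_conf):
    """Generate rules a -> (A - a) from frequent itemsets.

    frequent: dict mapping frozenset itemsets to their support.
    """
    rules = []
    for A, sup_A in frequent.items():
        if len(A) < 2:
            continue
        for r in range(1, len(A)):                 # every nonempty proper subset a of A
            for a in map(frozenset, combinations(A, r)):
                conf = sup_A / frequent[a]         # support(A) / support(a)
                if conf >= min_conf:
                    rules.append((set(a), set(A - a), conf, sup_A))
    return rules

# Using the supports from the example in section 1:
frequent = {
    frozenset({"Discrete Mathematics"}): 0.30,
    frozenset({"Data Structures"}): 0.20,
    frozenset({"Discrete Mathematics", "Data Structures"}): 0.15,
}
for lhs, rhs, c, s in gen_rules(frequent, min_conf=0.5):
    print(lhs, "->", rhs, f"(C = {c:.2f}, S = {s:.2f})")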

For our specific problem, we proceed as follows.

2.1 Transaction database data structure

In classic data mining methods, the transaction database uses a horizontal structure, for example a student grade table in which each student is one transaction containing all of that student's data (the table is not reproduced here). This structure, however, does not match the structure of the management systems we usually run. The student score management system we currently use stores data in a longitudinal (vertical) structure: each student can have multiple records, and each record carries comparatively more information. Since the database is very large, converting it to the horizontal form would take a great deal of time, and materializing a new database would take a great deal of storage space, so we decided to keep the structure of the original database and only adjust the mining algorithm.
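To make the two layouts concrete, here is a hypothetical sketch using the column names that appear in the SQL statement of section 2.2 (xh = student number, km = course name, kscj = exam score); the exact schemas are not reproduced in this copy of the paper.

Horizontal structure (one row per student, one column per course):

    xh      Discrete Mathematics    Data Structures    ...
    9701    85                      90                 ...

Vertical structure (one row per student per course, as in the table k1999):

    xh      km                      kscj
    9701    Discrete Mathematics    85
    9701    Data Structures         90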

2.2 Mining algorithm

The mining algorithm is based on the classic Apriori algorithm, with some adjustments to account for the longitudinal data structure.

The algorithm is as follows:

L1 = find_frequent_1_itemsets(D);
for (k = 2; L(k-1) ≠ ∅; k++) {
    Ck = apriori_gen(L(k-1), min_sup);
    for each c ∈ Ck {                     // scan D to obtain the count of c
        c.count = number of transactions in D containing c;
    }
    Lk = {c ∈ Ck | c.count ≥ min_sup};
}
return L = ∪k Lk;

procedure apriori_gen(L(k-1): frequent (k-1)-itemsets; min_sup: minimum support threshold)
    for each itemset l1 ∈ L(k-1)
        for each itemset l2 ∈ L(k-1)
            if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ ... ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
                c = l1 join l2;           // join step: generate candidates
                if has_infrequent_subset(c, L(k-1)) then
                    delete c;             // prune step: remove unfruitful candidates
                else
                    add c to Ck;
            }
    return Ck;

procedure has_infrequent_subset(c: candidate k-itemset; L(k-1): frequent (k-1)-itemsets)
    for each (k-1)-subset s of c
        if s ∉ L(k-1) then
            return true;
    return false;

Determining the frequent itemsets from the candidate frequent itemsets, as the Apriori method does, yields the desired result. However, because our database uses the vertical structure, the data of each transaction is spread across many records, so our adjusted algorithm scans the transaction database once for each candidate itemset to obtain the required count.
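As an illustration only (not the paper's actual implementation), the following compact Python sketch implements the Apriori skeleton above with one counting pass per candidate over an in-memory list of transactions; in the real system each count would instead come from a SQL query against the vertical table, as described next.

from itertools import combinations

def apriori(transactions, min_sup):
    """Return {frozenset itemset: absolute count} for all frequent itemsets.

    transactions: list of sets of items; min_sup: minimum absolute count.
    """
    def count(c):
        # One pass over the data per candidate, mirroring the
        # per-candidate scan used with the vertical table.
        return sum(1 for t in transactions if c <= t)

    items = {i for t in transactions for i in t}
    prev = {frozenset({i}): n for i in items
            if (n := count(frozenset({i}))) >= min_sup}
    frequent = dict(prev)
    while prev:
        k = len(next(iter(prev))) + 1
        # Join step: unite (k-1)-itemsets that differ in one item;
        # prune step: drop candidates with an infrequent (k-1)-subset.
        cand = {a | b for a in prev for b in prev if len(a | b) == k}
        cand = {c for c in cand
                if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        prev = {c: n for c in cand if (n := count(c)) >= min_sup}
        frequent.update(prev)
    return frequent

# Example: minimum absolute support of 2 over four students' course sets.
data = [{"DM", "DS"}, {"DM"}, {"DS"}, {"DM", "DS", "C"}]
print(apriori(data, min_sup=2))  # counts: {DM}: 3, {DS}: 3, {DM, DS}: 2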

Since the transaction database is very large, current research has mostly revolved around reducing the number of scans of the transaction database [2][3][4][5]. Our approach instead scans the database once per candidate itemset, so is it feasible? To make it so, we adopt two techniques: (1) on the database implementation side, exploiting the set-oriented nature of SQL queries, the following query is generated inside the mining loop:

SELECT COUNT(*) FROM k1999
WHERE km = kc1 AND kscj > 80
  AND xh IN (SELECT xh FROM k1999
             WHERE km = kc2 AND kscj > 80
               AND xh IN (SELECT xh FROM k1999
                          WHERE km = kc3 AND ... AND kscj > 80));

Here kc1, kc2, ... stand for the different courses of the candidate itemset.

This single statement finds the students who are simultaneously excellent in several courses, which effectively reduces the program's looping and data retrieval and greatly improves its running efficiency.
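The statement above can be generated programmatically for each candidate itemset. A minimal sketch follows; the table and column names (k1999, km, xh, kscj) are taken from the query itself, everything else is our own assumption, and a production system should use bound parameters rather than string interpolation.

def build_count_query(courses, threshold=80):
    """Build the nested COUNT(*) query for one candidate course itemset."""
    if len(courses) == 1:
        return (f"SELECT COUNT(*) FROM k1999 "
                f"WHERE km = '{courses[0]}' AND kscj > {threshold}")
    # Build from the innermost course outward, one IN-subquery per course.
    inner = f"SELECT xh FROM k1999 WHERE km = '{courses[-1]}' AND kscj > {threshold}"
    for course in reversed(courses[1:-1]):
        inner = (f"SELECT xh FROM k1999 WHERE km = '{course}' "
                 f"AND kscj > {threshold} AND xh IN ({inner})")
    return (f"SELECT COUNT(*) FROM k1999 WHERE km = '{courses[0]}' "
            f"AND kscj > {threshold} AND xh IN ({inner})")

print(build_count_query(["Discrete Mathematics", "Data Structures"]))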

Compared with the classic Apriori method: although the classic method scans the database only once per level of frequent itemsets, during that scan it must determine which candidate itemsets each transaction contains, and this counting itself takes considerable time. From this standpoint, the approach we take is feasible.

(2) A time-phased mining method, described in the next section.

3. Time-phased mining method

The transaction database is usually generated year by year. Much of it has already been used in earlier mining runs and has already yielded valid results; if those earlier results are discarded and the old data is mined again in every future run, both resources and time are wasted.

We therefore propose a time-phased mining method: the intermediate results produced by earlier runs are retained, and each new run processes only the newly added data, which greatly improves the mining speed.

For example, when the data of the four academic years 1997, 1998, 1999, and 2000 is mined for the first time, the frequent 1-itemsets are generated, and from them the frequent 2-itemsets (the two tables are not reproduced here).

The n-itemsets generated in this way are retained at the end of the mining run. If we mine again in 2001, we only need to scan the 2001 data, accumulate that year's figures (the number of students taking each course and the number with excellent results) onto the retained totals, and recompute the support.

This is only the ideal case, in which the minimum support stays fixed; in practice the support threshold is usually adjusted from run to run.

We found that generating and counting the candidate 2-itemsets is the most expensive step and is the bottleneck of the algorithm's efficiency; removing it improves the mining speed. We therefore apply time-phased mining only to the candidate frequent 2-itemsets. The mining algorithm is as follows:

open history_record;                      // check the retained history
if empty then {
    C1 = find_candidate_frequent_1_itemsets(D);
    L1 = {c ∈ C1 | sup(c) ≥ least_sup};
    C2 = apriori_gen(L1, least_sup);      // least_sup: the lowest support the user will ever accept
    for each c ∈ C2 {                     // scan D to obtain the count of c
        c.count = number of transactions in D containing c;
    }
}
else if have_new_data then {              // check D for newly added data
    locate new_data;
    for each c ∈ C2 {                     // scan only new_data to update the retained counts
        c.count += number of new transactions containing c;
    }
}
L2 = {c ∈ C2 | sup(c) ≥ min_sup};         // filter the retained candidates with the current min_sup
for (k = 3; L(k-1) ≠ ∅; k++) {
    Ck = apriori_gen(L(k-1), min_sup);
    for each c ∈ Ck {                     // scan D to obtain the count of c
        c.count = number of transactions in D containing c;
    }
    Lk = {c ∈ Ck | c.count ≥ min_sup};
}
return L = ∪k Lk;

The apriori_gen(L(k-1), min_sup) procedure itself is unchanged from section 2.2.

The system must set a support threshold least_sup, the smallest support the user will ever accept; it cannot be changed while the system is running, and it may simply be defined as 0. In this way, whenever mining is performed at different times, the candidate 1-itemsets and candidate 2-itemsets whose support is at least least_sup (0, or whatever value the user set) are permanently retained together with their counts. When the system runs next, database filtering quickly yields the frequent 2-itemsets. With this bottleneck broken, the running speed of the system improves greatly.
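A minimal Python sketch of this incremental bookkeeping follows; the storage format (a JSON file of pair counts) and all names here are our own assumptions, since the paper does not specify how the retained candidate sets are stored. With least_sup = 0, every pair ever observed is retained, so any later min_sup can be applied by filtering alone.

import json
import os
from itertools import combinations

STATE_FILE = "c2_counts.json"  # hypothetical store for retained candidate-2-itemset counts

def load_state():
    """Load the retained pair counts and total number of records."""
    if not os.path.exists(STATE_FILE):
        return {"total": 0, "counts": {}}
    with open(STATE_FILE) as f:
        return json.load(f)

def update_with_new_year(state, new_transactions):
    """Accumulate only the newly added year's data onto the retained counts."""
    state["total"] += len(new_transactions)
    counts = state["counts"]
    for t in new_transactions:
        for pair in combinations(sorted(t), 2):
            key = "|".join(pair)
            counts[key] = counts.get(key, 0) + 1
    return state

def frequent_2_itemsets(state, min_sup):
    """Filter the retained counts with the current, adjustable min_sup."""
    return {k: c for k, c in state["counts"].items()
            if c / state["total"] >= min_sup}

# One incremental run: scan only the new year's data, then recompute support.
state = update_with_new_year(load_state(), [{"DM", "DS"}, {"DM", "C"}])
with open(STATE_FILE, "w") as f:
    json.dump(state, f)
print(frequent_2_itemsets(state, min_sup=0.2))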

4. Conclusion

In our research we applied the above algorithm to course-correlation analysis with the fixed minimum support least_sup set to 0.2; compared with the traditional method, the mining speed improved greatly. The results further demonstrate the effectiveness and practicality of association rules for course-correlation analysis, and they can support course-related decision-making for students.

About the Author:

Qu Shouning: male, born September 1962; Professor, Vice President, and Master's supervisor; Director of the Shandong Computer Society and Director of the Jinan Computer Society. He is mainly engaged in teaching and research on network databases and information systems; he has undertaken 5 national 863 Program projects and 5 projects under the 21st-century teaching reform program and provincial and ministerial programs, has published more than 30 papers, and has won 4 provincial and ministerial achievement awards.
