Thesis Defence: Yange Bian (MSc Mathematics)

Campus: Online

You are encouraged to attend the defence. The details of the defence and attendance information are included below.

Date: 27 January 2026

Time: 11:30 AM (PT)

Defence mode: Remote

Virtual Attendance: via Microsoft Teams

LINK TO JOIN: Please contact the Office of Graduate Administration for information regarding remote attendance for online defences.

To ensure the defence proceeds with no interruptions, please mute your audio and video on entry and do not inadvertently share your screen. The meeting will be locked to entry 5 minutes after it begins: please ensure you are on time.

Thesis entitled: K-Medoids Based Semi-Supervised Classification Methods: An Application to the Categorization of Respondents by Their Self-Rated Mental Health Status for the Canadian Perspectives Survey Series 4

 Abstract: 

Background: Because data representativeness has hardly been addressed in existing research on categorical data, this work proposes a mode-based measure of data representativeness by response level. It also proposes two novel methods, an additive procedure and a multilayered filtering procedure, that modify the K-Medoids clustering algorithm to select a small set of highly representative and distinctive individuals that compactly represents a dataset with mostly categorical variables. The fourth survey of the Canadian Perspectives Survey Series (CPSS4) was designed to monitor the major impacts of COVID-19. CPSS4 includes 4,218 original responses that, in this work, are pseudo-randomly resampled into a subsample of 400 observations according to survey weights representative of the computed population counts by age group and sex by geographic region from the Labour Force Survey (LFS) demographic projection data. Each survey questionnaire assesses 98 unique variables that are either categorical or numerical. The chosen response variable is the self-rated mental health status, which comprises 5 ordinal levels: ‘0: Poor’, ‘1: Fair’, ‘2: Good’, ‘3: Very Good’, and ‘4: Excellent’.
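A minimal sketch of weight-proportional resampling is given here for orientation only. The abstract does not say whether the draw was made with or without replacement, which generator or seed was used, or how the weights were calibrated, so the function name `weighted_subsample`, the seed, the with-replacement draw, and the placeholder weights are all assumptions.

```python
import random


def weighted_subsample(records, weights, size=400, seed=2026):
    """Draw `size` records with probability proportional to `weights`.

    Assumption: sampling is done WITH replacement; the thesis may differ.
    """
    rng = random.Random(seed)  # fixed seed makes the pseudo-random draw repeatable
    return rng.choices(records, weights=weights, k=size)


# Placeholder data: 4,218 respondent indices with made-up survey weights.
respondents = list(range(4218))
placeholder_weights = [1.0 + (i % 5) for i in respondents]

subsample = weighted_subsample(respondents, placeholder_weights)
```

Fixing the seed is what makes the subsample reproducible across runs; real survey weights would replace the placeholder values.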

Methods: The original ordinal response levels are partitioned using different numbers of cut points to form 15 possible combinations for analysis. The additive procedure involves visually assessing each response level to select a fixed number of well-separated, distinguishable clusters, each represented by a medoid that serves as a data prototype. In the multilayered procedure, stepwise, comprehensive filtering criteria based on multiple clustering metrics aim to efficiently reduce the number of medoid observations in each response level. For each combination of response levels, an optimal set of medoids is selected as prototypes after assessing average prediction accuracy along with a few other features. For both procedures, only the medoids selected most repeatedly across the different combinations of response levels are included in the corresponding final selections.
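One reading of the count of 15, offered as an assumption rather than as the thesis's own definition: five ordinal levels leave four gaps where a cut point can be placed, and choosing any non-empty subset of those gaps gives 2^4 - 1 = 15 groupings. The sketch below enumerates them; the function name `cut_point_groupings` is hypothetical.

```python
from itertools import combinations

LEVELS = ["0: Poor", "1: Fair", "2: Good", "3: Very Good", "4: Excellent"]


def cut_point_groupings(levels):
    """Return every split of the ordered levels into two or more contiguous groups."""
    gaps = range(1, len(levels))  # a cut at g separates levels[g - 1] from levels[g]
    groupings = []
    for n_cuts in range(1, len(levels)):
        for cuts in combinations(gaps, n_cuts):
            bounds = [0, *cuts, len(levels)]
            groupings.append([levels[a:b] for a, b in zip(bounds, bounds[1:])])
    return groupings


groupings = cut_point_groupings(LEVELS)
```

Under this reading, the grouping that uses all four cut points reproduces the original five levels, and the single-cut groupings give the four possible binary splits.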

Results: This work implemented both proposed methods across all 15 combinations of response levels and eventually identified the 6 most commonly selected medoid prototypes, which achieved an average data representativeness of 0.7148 by the original response levels. As a major trade-off, they yielded an average prediction accuracy of only 0.3225 (95% CI: 0.2234 - 0.43524) under 5-fold cross-validation. In this final selection, the medoid from the second most frequent response level ‘3: Very Good’ had the highest representativeness of 0.824742, the medoid from the most frequent response level ‘2: Good’ had the second highest representativeness of 0.793814, and the one from the least frequent response level ‘0: Poor’ had the lowest representativeness of 0.546392. This work also found that applying either the K-Medoids Prototype-Based classifiers or the optimized SVM classifiers improved the average prediction accuracy, but consistently at the cost of using nearly all 400 observations. Both types of existing classifiers thus incurred high costs in model complexity and non-interpretability.
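The reported representativeness values are numerically consistent with simple proportions out of 97, which would be the 98 survey variables less the response variable. This is an inference from the rounded figures, not a definition given in the abstract; the counts recovered below are back-calculated assumptions.

```python
N_PREDICTORS = 98 - 1  # assumption: 98 variables minus the response variable
N_MEDOIDS = 6

reported = {"3: Very Good": 0.824742, "2: Good": 0.793814, "0: Poor": 0.546392}

# Nearest whole-number count that each reported proportion would correspond to.
counts = {level: round(value * N_PREDICTORS) for level, value in reported.items()}

# Re-derive the proportions from those counts at the reported precision.
recovered = {level: round(c / N_PREDICTORS, 6) for level, c in counts.items()}

# Same check for the reported average of 0.7148 across the 6 medoids.
total_count = round(0.7148 * N_MEDOIDS * N_PREDICTORS)
average = round(total_count / (N_MEDOIDS * N_PREDICTORS), 4)
```

The recovered proportions match the reported ones to six decimal places, which supports, but does not confirm, the proportion-of-variables reading.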

Conclusions: For future research, a major consideration is automating the currently manual implementations of both proposed methods. Modifying the current procedures to account for different types of variables should also be considered, to enhance the diversity and consistency of the final selection of medoids.

Defence Committee:

Chair: Dr. Catharine Schiller, University of Northern British Columbia

Supervisor: Dr. Kevin J. Keen, University of Northern British Columbia

Committee Member: Dr. Michael D. de B. Edwardes, Everest Clinical Research Inc.

Committee Member: Dr. Mohammad El Smaily, University of Northern British Columbia

External Examiner: Dr. Brian Franczak, MacEwan University

Contact Information

Graduate Administration in the Office of the Registrar, 

University of Northern British Columbia  

Email: grad-office@unbc.ca

Web:  https://www2.unbc.ca/graduate-programs