Table of contents
  1. About Me
  2. 2023
    1. CR4CR Autograder Model Presentation
    2. CR4CR GeoGebra Interactive Proof Presentation
    3. Inference and Prediction on Crude Diabetes Prevalence in U.S. States Based on Vegetable Consumption
  3. 2022
    1. Replication and Improvement on “How do 401(k)s Affect Saving? Evidence from Change in 401(k) Eligibility”
    2. Effects of International Monetary Fund’s Financial Crisis Policy Program in the Republic of Korea
    3. Transit and Housing in California
  4. Gap Years Due to Military Service (2020-2022)
  5. 2019
    1. SAAS x Trace Data
  6. 2018
    1. Meaning of Probabilities in Social Sciences
    2. Analyzing Undergraduate Statistics Majors’ Preparation in Communication with Non-Statisticians in the University of California, Berkeley
    3. Is there a statistical relationship between a region’s legalization of euthanasia and that region’s suicide rate?
    4. Philosophy of Human Rights

About Me

Welcome to My Portfolio Website!


My name is Yeh Chan (Yehchan) Yoo! I am currently a student in the Master of Science program in Statistics - Advanced Methods and Data Analysis at the University of Washington, and I expect to graduate in March 2026! I graduated from the University of California, Berkeley, in December 2023, where I studied Statistics and Political Economy and minored in Data Science. In 2024, before starting graduate school at the University of Washington, I worked as the primary Data Scientist for Mindful Conversion.

I built this website to host some of the research I have done over the last few years; I sincerely hope you enjoy reading and interacting with my work!

(The work below is arranged by creation date, from most recent to least recent.)

2023

CR4CR Autograder Model Presentation

Presented on October 9, 2023 (Link, PDF Link)

In this presentation, I share the results of my work as a research assistant on the CR4CR project at UC Berkeley’s BEAR Center. The project explored how RoBERTa, a pretrained transformer language model, could be applied to automatically grade short answers, a task with significant implications for scaling educational assessment. Through data collection, model training, and rigorous evaluation on a test set, I developed a grading system that achieved 75% accuracy on the test set. I discuss the methodology, results, and limitations of the research to clarify both the potential and the challenges of using deep learning models like RoBERTa in educational applications.


CR4CR GeoGebra Interactive Proof Presentation

Presented on October 6, 2023 (Link, PDF Link)

In this presentation for UC Berkeley’s BEAR Center, I discuss my past work with GeoGebra and the program’s potential for developing interactive educational material. GeoGebra is free educational software for creating and sharing dynamic visualizations of geometry and algebra. I walk through the process of developing GeoGebra visualizations and share my tips for building them effectively. I also share some of the applets I made on GeoGebra, which have garnered over 15,000 views in total.


Inference and Prediction on Crude Diabetes Prevalence in U.S. States Based on Vegetable Consumption

Last updated on May 12, 2023 (Link)

This paper was written in collaboration with my classmates Christina Đặng, Conan Minihan, and Tetsuro Escudero as the final project report for DATA 102: Data, Inference, and Decisions, taught by Mr. Ramesh Sridharan and Professor Eaman Jahani in the Spring 2023 semester. It uses inferential and predictive techniques to examine the relationship between vegetable consumption and crude diabetes prevalence across U.S. states, and to predict diabetes prevalence from vegetable consumption.

2022

Replication and Improvement on “How do 401(k)s Affect Saving? Evidence from Change in 401(k) Eligibility”

Last updated on December 16, 2022 (Link)

This paper was written in collaboration with my classmate Xinyi Zi as the final project paper for STAT 156: Causal Inference, taught by Professor Peng Ding in the Fall 2022 semester. It attempts to explore, replicate, critique, and redo Professor Alexander M. Gelber’s 2011 causal inference paper “How Do 401(k)s Affect Saving? Evidence from Changes in 401(k) Eligibility”.


Effects of International Monetary Fund’s Financial Crisis Policy Program in the Republic of Korea

Last updated on December 9, 2022 (Link)

I wrote this as my term paper for POLECON 101: Contemporary Theories of Political Economy, taught by Mr. Khalid Kadir in the Fall 2022 semester. Drawing on political economy theories taught in the course, the paper explores how the International Monetary Fund’s program in response to South Korea’s financial crisis in the late 1990s affected South Korean society in the short, medium, and long run.


Transit and Housing in California

Last updated on November 13, 2022 (Link)

This article was created within three days for the Fall 2022 UC Berkeley Datathon for Social Good and won second place in the Urban Studies track. For this article, my Datathon team (consisting of Gain Boonavich, Anita Ding, Yixin Huang, and me) used Python to run linear regressions and create data visualizations exploring the relationship between the amount of investment in transit and the number of housing units in California counties.

Gap Years Due to Military Service (2020-2022)

From May 4, 2020, to February 3, 2022

From 2020 to 2022, I served in the Republic of Korea Air Force as a translator sergeant, translating government, legal, mechanical, and logistics documents between Korean and English to facilitate smooth communication on logistical issues between the Republic of Korea Air Force and various foreign arms manufacturers.

2019

SAAS x Trace Data

Last updated on December 13, 2019 (Link)

Wordcloud from SAAS x Trace Data

During the Fall 2019 semester, as a member of the Data Consulting committee of the Student Association for Applied Statistics (SAAS), I worked on a team that built our own classification systems for JSON key-value pairs using machine learning and natural language processing models, with data provided by a startup named Trace Data (later acquired by Netskope). More specifically, Amal Bhatnagar and I created an unsupervised clustering algorithm that used tf-idf (term frequency-inverse document frequency) scores as its main metric.
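The blurb above does not describe the algorithm in detail, so the following is only a minimal sketch of the general idea under stated assumptions, not the code we wrote for Trace Data. It assumes each JSON key is treated as a short document (split on snake_case and camelCase boundaries), weights its words by tf-idf using the smoothed inverse document frequency that scikit-learn uses by default, and groups keys with a simple single-pass "leader" clustering on cosine similarity. The function names, the sample keys, and the 0.3 similarity threshold are hypothetical choices for illustration.

```python
import math
import re
from collections import Counter


def tokenize_key(key: str) -> list[str]:
    """Split a JSON key such as 'userEmail' or 'billing_address' into lowercase words."""
    # Insert a space at camelCase boundaries, then split on non-alphanumeric characters.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", key)
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", spaced) if tok]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return a sparse tf-idf vector (term -> weight) for each tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            # tf = relative frequency; idf = ln((1 + N) / (1 + df)) + 1 (smoothed, always > 0)
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_keys(keys: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Hypothetical single-pass leader clustering of JSON keys on tf-idf cosine similarity."""
    vectors = tfidf_vectors([tokenize_key(k) for k in keys])
    clusters: list[list[int]] = []  # each cluster holds key indices; the first is its leader
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing leader is similar enough, so this key starts a new cluster.
            clusters.append([i])
    return [[keys[i] for i in members] for members in clusters]


if __name__ == "__main__":
    sample_keys = ["user_email", "userEmail", "email_address",
                   "billing_address", "shippingAddress", "order_id"]
    for group in cluster_keys(sample_keys, threshold=0.3):
        print(group)
```

With these sample keys and a threshold of 0.3, the sketch groups `user_email`, `userEmail`, and `email_address` together, pairs `billing_address` with `shippingAddress`, and leaves `order_id` on its own. The leader method is used here only because it is short; k-means or hierarchical clustering on the same tf-idf vectors would be more typical choices.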

2018

Meaning of Probabilities in Social Sciences

Last updated on December 20, 2023; originally written during the Fall 2018 semester (Link)

As a declared Statistics major interested in the social sciences, I noticed that probability is used extensively in social science research. But I often wondered: what do these probabilities fundamentally mean? To help answer this question, I wrote an article on the meaning of probabilities in the social sciences during my time as a member of the Research and Publication Committee of the Statistics Undergraduate Student Association (now the Student Association for Applied Statistics, or SAAS).

  • The title links to the raw HTML file of my article. The officially published version of my article is linked here.

Analyzing Undergraduate Statistics Majors’ Preparation in Communication with Non-Statisticians in the University of California, Berkeley

Last updated on May 9, 2018 (Link)

My assessment of how well undergraduate statistics majors at the University of California, Berkeley, are prepared for statistical writing aimed at non-statisticians revealed a significant gap. Despite substantial training in statistical writing within courses for the major, students lacked practical experience communicating with non-statisticians. Interviews with professors indicated a hesitancy to enforce stricter writing requirements, citing resource constraints and a desire for program growth. Even so, the findings underscored a pressing need for more resources devoted to statistical writing education within the Department of Statistics. I wrote this as my final project for COLWRIT R4B, a class on discourse conventions in various academic fields.


Is there a statistical relationship between a region’s legalization of euthanasia and that region’s suicide rate?

Last updated on May 1, 2018 (Link)

Statistical analysis of data from Mexico indicates that the legalization of passive euthanasia in certain Mexican regions is likely unrelated to those regions’ raw suicide rates in the short run. Difference-in-differences analysis of data from the Netherlands (and Norway) indicates that, while major steps toward the legalization of active and passive euthanasia may have reduced the Netherlands’ raw suicide rate in the short run, any such effect likely faded over time. I wrote this research article during my time as a member of the Research and Publication Committee of the Statistics Undergraduate Student Association (now the Student Association for Applied Statistics, or SAAS).
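For readers unfamiliar with the method, the textbook two-group, two-period difference-in-differences estimator can be written as below. This is a generic sketch, not the article's exact specification: it assumes the Netherlands (NL) is the treated region, Norway (NO) is the comparison region, and $\bar{Y}$ denotes an average raw suicide rate before or after a legalization event.

```latex
\hat{\tau}_{\mathrm{DiD}}
  = \left( \bar{Y}_{\mathrm{NL,post}} - \bar{Y}_{\mathrm{NL,pre}} \right)
  - \left( \bar{Y}_{\mathrm{NO,post}} - \bar{Y}_{\mathrm{NO,pre}} \right)

% Equivalent regression form, with i indexing regions and t indexing years;
% in the 2x2 case, the OLS estimate of \tau equals \hat{\tau}_{DiD} above.
Y_{it} = \alpha + \beta\,\mathrm{Treat}_i + \gamma\,\mathrm{Post}_t
       + \tau\,(\mathrm{Treat}_i \times \mathrm{Post}_t) + \varepsilon_{it}
```

Under the parallel-trends assumption (that, absent legalization, both regions' suicide rates would have followed the same trend), a negative $\hat{\tau}$ corresponds to the kind of short-run decrease described above.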


Philosophy of Human Rights

Last updated on April 11, 2018 (Link)


I created this presentation during my time in Amnesty International at Berkeley to teach fellow members about the philosophical meaning of, and questions behind, the concept of human rights.

  • The title links to the PowerPoint version of my presentation. For the PDF version, go here.