Benchmarking LLM-Based Synthetic Data Generators for Structural Coherence in Behavioural and Human-Centred Datasets

Application process

A completed online application must be submitted by 4.30 pm on 22 September 2025. Late or incomplete applications will not be accepted. Any required supporting documentation (including references) must also be received by 4.30 pm on the closing date for the application to be considered.

Project number

201

Project description

This project benchmarks large language model (LLM)-based synthetic data generators, such as GReaT and TabulaLLM, with a focus on their ability to preserve structural coherence in behavioural and human-centred datasets.

These datasets encompass psychological, educational, and user behaviour data that often include ordinal scales, categorical variables, logical constraints, and complex theory-driven relationships unique to human-centred research.

The project will evaluate how effectively current LLM-based models generate synthetic data that maintains these important structural and semantic properties.
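
As a rough illustration of what a structural coherence check might look like, the sketch below computes the share of synthetic rows that violate each of a few hand-written rules. It is a minimal example under stated assumptions, not the project's evaluation method: the column names, the rule set, the minimum working age of 15, and the sample rows are all hypothetical and chosen only to show one ordinal, one categorical, and one logical constraint.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical structural rules for an illustrative survey dataset.
# Each rule returns True when a row is structurally valid.
RULES: Dict[str, Callable[[Row], bool]] = {
    # Ordinal scale: a Likert item must be an integer from 1 to 5.
    "likert_in_range": lambda r: isinstance(r["satisfaction"], int)
    and 1 <= r["satisfaction"] <= 5,
    # Categorical variable: the value must be one of the known levels.
    "valid_education": lambda r: r["education"]
    in {"primary", "secondary", "tertiary"},
    # Logical constraint (assumed): work experience cannot exceed age minus 15.
    "experience_le_age": lambda r: r["years_experience"] <= r["age"] - 15,
}


def violation_rates(
    rows: List[Row], rules: Dict[str, Callable[[Row], bool]] = RULES
) -> Dict[str, float]:
    """Return, for each rule, the fraction of rows that break it."""
    if not rows:
        return {name: 0.0 for name in rules}
    return {
        name: sum(not rule(row) for row in rows) / len(rows)
        for name, rule in rules.items()
    }


# Made-up rows standing in for the output of a synthetic data generator.
synthetic_rows: List[Row] = [
    {"satisfaction": 4, "education": "tertiary", "age": 30, "years_experience": 8},
    {"satisfaction": 7, "education": "secondary", "age": 25, "years_experience": 3},
    {"satisfaction": 2, "education": "college", "age": 20, "years_experience": 12},
    {"satisfaction": 5, "education": "primary", "age": 40, "years_experience": 20},
]

rates = violation_rates(synthetic_rows)
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```

Comparing these rates between the real data and each generator's output would give one simple, per-rule view of how well structure is preserved. Theory-driven relationships between variables would need richer checks than row-level rules.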

This project is for a single student.

Location

Mainly at the University.

Supervisor

Senior Lecturer
School of Information Management