We propose a new method for the analysis of longitudinal (repeated-measures) data in which subject identifiers are missing for all subjects. In this pseudo-longitudinal setting, the aggregated information consisting of all repeated observations of all subjects is available, but the links connecting these observations to the correct study participants are not. Such scenarios occur when a subject identification variable was not created initially because of study constraints, or when that information was subsequently lost or kept out of reach because of confidentiality or data access problems. Classical complete-likelihood approaches are inapplicable to these data designs because they must explicitly and exhaustively account for every possible link between observations, and the sheer number of such links makes the computation prohibitively complex. The method we propose is based on maximizing an appropriately constructed pseudo-likelihood of the observed data, which eliminates the least likely links from consideration, combined with a range of parametric models for the effect and error sizes that greatly reduce the computational time. We conduct a large simulation study to assess the performance of this approach across various scenarios under both the null and the alternative hypotheses. We illustrate the approach with an analysis of nursing care quality improvement data in which the variable identifying study participants was not available. The data consist of 51 pre and post knowledge assessment scores on ten survey items, as well as background information on nursing unit, education level, certification, years of experience in nursing and in the unit, workday schedule, and employment status.
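
To make the computational argument concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple pre/post design in which each post score belongs to exactly one pre score, and a hypothetical Gaussian model for the paired differences with mean `effect` and standard deviation `sd`. All function names are illustrative. The full likelihood averages over every one-to-one matching, while the truncated version keeps only the `keep` most likely matchings, in the spirit of discarding the least likely links.

```python
import math
from itertools import permutations


def count_links(n):
    """Number of one-to-one matchings between n pre and n post observations."""
    return math.factorial(n)


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution (assumed error model)."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def matching_log_terms(pre, post, effect, sd):
    """Log-likelihood of the paired differences under each possible matching."""
    return [
        sum(gaussian_logpdf(post[j] - pre[i], effect, sd) for i, j in enumerate(perm))
        for perm in permutations(range(len(pre)))
    ]


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def full_log_likelihood(pre, post, effect, sd):
    """Average over ALL matchings; brute force, feasible only for tiny n."""
    terms = matching_log_terms(pre, post, effect, sd)
    return log_sum_exp(terms) - math.log(len(terms))


def truncated_log_likelihood(pre, post, effect, sd, keep):
    """Keep only the `keep` most likely matchings (illustrative truncation).

    This toy version still enumerates every matching before sorting, so it
    shows the approximation, not the computational saving.
    """
    terms = sorted(matching_log_terms(pre, post, effect, sd), reverse=True)
    return log_sum_exp(terms[:keep]) - math.log(len(terms))
```

Under this one-to-one assumption, `count_links(51)` is 51!, a 67-digit number, which is why exhaustive enumeration is out of reach for a data set of the size described, whereas the brute-force functions above are usable only for a handful of observations.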


Author: Cyril Rakovski

Coauthor(s): Sadeeka Al-Majid, PhD

Status: Work In Progress