GDPR and risk-averse ethics committees could undermine global data research collaboration on data sets like MIMIC.

GDPR Fact sheet:

  • Entered into force in May 2016, but only applied from 25 May 2018
  • The GDPR is directly enforceable in all EU member states
  • Goals: (i) harmonizing data protection across the EU (ii) facilitating the flow of information across borders (iii) enhancing privacy protection

Developments in health information systems in recent decades have made it increasingly possible to capture large amounts of data at the point of patient care. The use of this digital “big” data is creating new opportunities for machine learning and artificial intelligence (AI) applications to gain valuable insights and guide clinical decision-making [1].

The MIMIC database and the potential for growth

The MIMIC (Medical Information Mart for Intensive Care) database, maintained by the Laboratory for Computational Physiology at the Massachusetts Institute of Technology, is a good example of a database used for AI research [2].

MIMIC-III, the third iteration of the database, contains clinical data acquired during the routine hospital care of patients admitted to critical care units at the Beth Israel Deaconess Medical Center in Boston, Massachusetts, between 2001 and 2012 (data up to 2016 are planned to be added this fall).

Data include vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more [2]. The local institutional review board (IRB) has waived the requirement for individual patient consent for over 10 years, because the project does not affect clinical care and because the data are de-identified before inclusion in the database by removing all protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) [2].

The database is made freely accessible to researchers globally for secondary analysis once a data use agreement is accepted. MIMIC has proved to be a highly valuable resource, supporting, among other things, a wide range of AI research [2].

However, MIMIC is somewhat limited by the fact that it is a single-centre database. As a result, there have been attempts in recent years to scale up the project both nationally and internationally. A multicentre, multinational database has a number of potential advantages, including allowing cross-validation across institutions to determine which findings and models are institution-specific and which are generalisable [3].

The role of ethics committees in promoting/curtailing data research

In recent years, national efforts have led to the development of the multicentre MIT-Philips eICU Collaborative Research Database, which contains data from patients admitted to critical care units in more than 300 hospitals throughout the continental United States between 2014 and 2015.

Although there have also been numerous efforts with international collaborators to set up other critical care databases (e.g. in China, Brazil, Belgium and Spain) and to link MIMIC with other established critical care databases (e.g. in the United Kingdom and France), these efforts have so far been undermined by local ethics committees suggesting that individual patient consent may be required for data to be included in databases or linked with MIMIC [4].

Ethics committees play an essential role in ensuring that patients’ interests regarding the use of their data are respected. However, there are growing concerns that ethics committees focus too heavily on risk and give insufficient consideration to the societal value of such research [5].

Risk-averse decisions by ethics committees that require individual informed consent before pseudonymised (de-identified) data can be used in databases may be problematic, however, because they can undermine AI research by: (a) making the creation of databases impractical due to increased costs, and (b) creating selection bias, because the resulting “training data” from which the algorithm learns and identifies patterns are insufficient or unrepresentative [1,6].
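The selection-bias point in (b) can be made concrete with a small worked example. The sketch below assumes a hypothetical ICU cohort in which consent is obtained less often from patients who die (for example, because they deteriorate before consent can be sought). All counts and consent rates are invented for illustration; they are not taken from MIMIC or from the cited registry study.

```python
# Hypothetical ICU cohort, for illustration only: the counts below are invented
# and are not taken from MIMIC or from Tu et al.
COHORT = {
    # outcome: (patients in cohort, patients for whom consent was obtained)
    "survived": (800, 640),  # assumed 80% consent rate
    "died": (200, 80),       # assumed 40% consent rate
}


def mortality(cohort: dict, consented_only: bool) -> float:
    """Return the mortality rate in the full cohort or in the consented subset."""
    column = 1 if consented_only else 0
    total = sum(counts[column] for counts in cohort.values())
    deaths = cohort["died"][column]
    return deaths / total


full = mortality(COHORT, consented_only=False)
consented = mortality(COHORT, consented_only=True)
print(f"full cohort: {full:.1%}, consent-only database: {consented:.1%}")
# full cohort: 20.0%, consent-only database: 11.1%
```

Under these assumed rates, a model trained only on the consented records would see a mortality rate well below the true one, which is the kind of unrepresentative training data described above.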

Large variations in consent requirements between countries can also hinder global collaborative health research efforts, decreasing their generalisability and slowing down knowledge discovery [7].

The impact of GDPR on data collaboration

The uncertainty among ethics committees regarding data research appears only to have been heightened by the new EU General Data Protection Regulation (GDPR), which entered into force in May 2016 but has applied only since 25 May 2018. The GDPR is directly enforceable in all EU member states and was introduced with the ultimate goals of harmonising data protection across the EU, facilitating the flow of information across borders, and enhancing privacy protection.

Early drafts of the GDPR raised significant concerns that the regulation might severely restrict data research [8]. Although the final text adopted a more research-friendly approach [9], persistent concerns may partly be a legacy of the reaction to those earlier drafts.

The GDPR applies to any personal data of data subjects residing in the EU, regardless of whether the processing takes place in the EU or not. Pseudonymised data are now explicitly recognised as personal data if they could be attributed to a natural person by the use of additional information.

However, the GDPR does not apply to anonymous or anonymised data. Consequently, unless databases used for AI research contain fully anonymised data, they will need either to obtain explicit consent from the data subjects or to process the data under the scientific research exemption set out in the GDPR, which can permit processing without consent provided that appropriate technical and organisational safeguards are in place [10].
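Pseudonymised data remain personal data because whoever holds the “additional information” can link records back to individuals. The sketch below illustrates this, assuming pseudonymisation by a keyed hash (HMAC-SHA256) of a hospital identifier. This is one common technique chosen for illustration; it is not a description of how MIMIC (which is de-identified under HIPAA) or any other database is prepared. The function names `pseudonymise` and `reidentify` and the `MRN-…` identifiers are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import List, Optional


def pseudonymise(patient_id: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def reidentify(pseudonym: str, candidate_ids: List[str], key: bytes) -> Optional[str]:
    """Whoever holds the key (the 'additional information') can re-link a pseudonym."""
    for patient_id in candidate_ids:
        if hmac.compare_digest(pseudonymise(patient_id, key), pseudonym):
            return patient_id
    return None


# Hypothetical medical record numbers, for illustration only.
hospital_ids = ["MRN-0001", "MRN-0002", "MRN-0003"]

# Secret key held by the data controller, stored separately from the research data.
key = secrets.token_bytes(32)

research_record = {"subject": pseudonymise("MRN-0002", key), "heart_rate": 88}

print(reidentify(research_record["subject"], hospital_ids, key))  # MRN-0002
print(reidentify(research_record["subject"], hospital_ids, secrets.token_bytes(32)))  # None
```

Without the key, the research record carries only an opaque 64-character string; with it, the record is attributable to a person again. Discarding the key would not by itself guarantee anonymity, since the remaining fields may still allow re-identification, and judging when that risk is low enough is one of the points the GDPR leaves open to interpretation.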

Nevertheless, the GDPR leaves considerable room for interpretation by member states on key aspects of data protection, including what counts as sufficient pseudonymisation, when data are considered fully non-identifiable, what further limitations should be placed on processing sensitive data for research purposes, and what safeguards and conditions are sufficient for processing data under the research exemption [4]. This has raised concerns that the GDPR’s goal of harmonising data protection across the EU will be undermined [10].

It remains to be seen what the full impact of the GDPR on AI research will be. However, it is important that consent requirements for databases are addressed proactively.

It has been suggested that sector-specific codes of conduct negotiated by professional bodies could help address this issue [11] by providing guidance to database curators, researchers, and ethics committees on the organisational and technical safeguards needed to protect patients’ rights without unduly impeding important research [4].

Greater harmonisation internationally in the use of data in AI research will be beneficial both to investigators and to members of the public.

  1. House of Commons Science and Technology Committee. Algorithms in decision-making. 2018. Retrieved from
  2. Johnson AE, Pollard TJ, Shen L, Lehman LW, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035.
  3. Celi LA, Mark RG, Stone DJ, Montgomery RA. “Big data” in the intensive care unit. Closing the data loop. Am J Respir Crit Care Med. 2013;187(11):1157-60.
  4. McLennan S, Shaw D, Celi LA. The challenge of local consent requirements for global critical care databases. Intensive Care Medicine. 2018:
  5. Spector T, Prainsack B. Ethics for healthcare data is obsessed with risk – not public benefits. The Conversation 2018. Available online at:
  6. Tu JV, Willison DJ, Silver FL, Fang J, Richards JA, Laupacis A, Kapral MK. Investigators in the Registry of the Canadian Stroke Network. Impracticability of informed consent in the Registry of the Canadian Stroke Network. N Engl J Med. 2004;350(14):1414-21.
  7. Celi LA, Mark RG, Stone DJ, Montgomery RA. “Big data” in the intensive care unit. Closing the data loop. Am J Respir Crit Care Med. 2013;187(11):1157-60.
  8. Nyrén O, Stenbeck M, Grönberg H. The European Parliament proposal for the new EU General Data Protection Regulation may severely restrict European epidemiological research. Eur J Epidemiol. 2014;29:227-30.
  9. Rumbold JMM, Pierscionek B. The Effect of the General Data Protection Regulation on Medical Research. J Med Internet Res 2017;19(2):e47.
  10. Shabani M, Borry P. Rules for processing genetic data for research purposes in view of the new EU General Data Protection Regulation. Eur J Hum Genet. 2018;26:149-56.
  11. BBMRI-ERIC. Position Paper on General Data Protection Regulation 2015. Available online at:
Author names and affiliations:

Stuart McLennan1, David Shaw1,2, Leo Anthony Celi3,4


1 Institute for Biomedical Ethics, University of Basel, Basel, Switzerland.

2 Care and Public Health Research Institute, Maastricht University, the Netherlands

3 Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA

4 Harvard–Massachusetts Institute of Technology Division of Health Sciences and Technology, Cambridge, MA, USA


*Corresponding author:

Dr. Stuart McLennan

Institute for Biomedical Ethics

University of Basel

Bernoullistrasse 28

4056 Basel


Email: [email protected]


Conflicts of interest:

This work was supported by the Swiss National Science Foundation’s National Research Programme “Smarter Health Care” (NRP 74) and the Universität Basel’s Forschungsfonds for excellent young researchers. LAC works at the Laboratory for Computational Physiology at the Massachusetts Institute of Technology, which developed and maintains the Medical Information Mart for Intensive Care (MIMIC) database. The authors have no other competing interests to declare.