Paradigm for Including Privacy: A Proposal for CS 4710; Mutual Benefit: How Communicating Privacy Risk of Healthcare Big Data Benefits All

Peng, Kelvin, School of Engineering and Applied Science, University of Virginia
Seabrook, Bryn, EN-Engineering and Society, University of Virginia
Graham, Daniel, EN-Comp Science Dept, University of Virginia
Vrugtman, Rosanne, EN-Comp Science Dept, University of Virginia

Machine learning privacy encompasses both technical and ethical concerns. Technical awareness and advancements in privacy protection affect the privacy policy environment, just as policy sets the permissible bounds for how – and what – technology can be used. This work investigates the relationship between specific aspects of privacy education in machine learning research and the policy that researchers use to guide their work and their relationships with patients. Educating students about the existence of privacy risk in machine learning raises awareness of the concept among working machine learning scientists. Awareness in turn directly affects how often privacy-preserving techniques are used in machine learning – without knowledge of the risk, there is no ability to counter privacy threats. The legal and policy environment regulates the practice of machine learning in healthcare and, because of this, likewise influences machine learning education requirements.

Awareness of privacy risk in machine learning is the foundation for practical reduction of the risk. Reforming machine learning education to include exposure to privacy threat concepts can be important in promoting this awareness. By including exercises and lessons on privacy risk in CS 4710 – Artificial Intelligence at the University of Virginia, educators may provide undergraduates with an understanding and awareness of privacy risk in machine learning. The issue of privacy risk is of great importance, as evidenced by several prominent institutions offering graduate-level courses on the topic. The technical capstone proposes an additional focus when teaching courses in artificial intelligence, including machine learning: privacy. The new paradigm seeks to expose undergraduate students to the concept of privacy risk while developing the critical thinking skills necessary to identify the presence of the risk. As new threats are discovered, the paradigm of privacy risk education in artificial intelligence and machine learning should place those threats at the forefront of the curriculum. With exposure to privacy risk as a concept in the academic setting, students will be able to identify threats more readily in practice and protect those who contribute to research datasets. By integrating artificial intelligence education with education on the technology's privacy risks, a holistic awareness of the issue can be conveyed to students. Students will be better prepared for the threats facing their work in artificial intelligence and better able to protect individuals' private data in practice.

Being able to counteract privacy threats in machine learning is not enough to address privacy as an issue. A significant purpose of reducing privacy risk is building trust in the institutions which conduct machine learning research. In the application of machine learning to healthcare, this effect is ever more important, as patients’ consent for data use, and thus their buy-in, is needed to carry out effective research. Building that trust is not a trivial task. Institutions have lost the public’s trust when dealing with risk; the mass media has consolidated its power and now exerts great influence over the public’s perception of risks. Without trust in institutions to rely on, researchers must communicate the risks – and benefits – of machine learning in healthcare in a new paradigm. Social media presents an ideal environment for the healthcare industry to communicate about healthcare machine learning, because perceptions of risk are increasingly being developed on social media platforms. Researchers and healthcare providers must pursue a policy of openness surrounding their use of patients’ data, and as such must communicate developments clearly. Social media is the ideal, but not the only, format for such communication. This solution is only one part of the socio-technical framework which should govern relationships of risk between researchers and patients. Rather, the framework of risk analysis and policies of openness and communication, in whatever form, should dictate researchers’ behavior.

Technical achievements do not exist in a vacuum. Nor do political developments. There exist important, influential interactions between the technical and social effects of machine learning in healthcare. Investigating these effects simultaneously provides a holistic understanding of the entire field of machine learning in healthcare. A single-perspective view of privacy risk in machine learning is myopic. Without an understanding of the prevalence of privacy vulnerabilities in machine learning technology, the formation of best-practice policies regulating the practice cannot be adequately guided. Likewise, the topics of importance in machine learning privacy education cannot be known without an understanding of the socio-technical constraints of privacy rights and healthcare law. Only a full-breadth understanding can produce solutions which encompass the entire problem, and thus provide a true solution and not a temporary fix.

BS (Bachelor of Science)

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisors: Daniel Graham, Rosanne Vrugtman

STS Advisor: Bryn Seabrook

All rights reserved (no additional license for public reuse)
Issued Date: