Synthetic Data Generation: Generating High-Utility Synthetic Parcel Data; Applications and Implications of Synthetic Data

Author:
Jiang, Janessa, School of Engineering and Applied Science, University of Virginia
Elliott, Travis, EN-Engineering and Society, University of Virginia
Baritaud, Catherine, EN-Engineering and Society, University of Virginia
Graham, Daniel, EN-Comp Science Dept, University of Virginia

As technology advances rapidly, the demand for data grows with it. The spread of information through online platforms has also raised concerns about data privacy. Synthetic data has gained popularity and acceptance in various fields as a way to address both data scarcity and privacy concerns. Synthetic data is algorithmically created and serves as an alternative to real-world data. Because the technology is broadly applicable, its impact will continue to grow as the field of artificial intelligence flourishes. Customer personal information is one example of data that requires anonymity. At Amazon Web Services, customer billing data and usage statistics are bundled into packages called parcels. Using real parcel data for testing raised two issues: data scarcity and the risk of re-identifying anonymized data. The technical report describes the research, process, and challenges involved in generating high-utility synthetic parcel data to fit business needs. The STS research paper explores further applications of synthetic data, along with the social, ethical, and environmental implications of this technology. The use of synthetic data carries significant risks related to bias, a lack of consistent legislation, and environmental impact.

BS (Bachelor of Science)
artificial intelligence, synthetic data
All rights reserved (no additional license for public reuse)