Article Abstract

Kinship identification based on rare severe variants from trio whole exome sequencing

Authors: Qian Qin, Zi-Xiu Li, Bing-Bing Wu, Hui-Jun Wang, Xin-Ran Dong, Yu-Lan Lu, Wen-Hao Zhou

Abstract

Background: To automatically identify potential sample swaps or kinship label errors in inter-family or intra-family whole exome sequencing (WES) data, this study added a kinship quality control module to the current trio WES analysis pipeline.
Methods: The study involved 105 trio WES datasets, comprising 323 samples in total. The module evaluated the similarity between samples using either all variants (total variants) or only the rare, severe variants retained during data processing (feature variants). It then identified kinship from sample similarity by setting thresholds to distinguish related from unrelated pairs and further thresholds to reconstruct the degree of kinship (first-, second-, and third-degree).
Results: Based on total variants, no clear threshold of sample similarity could be set to identify kinship. In contrast, sample similarity scores based on feature variants could not only accurately identify whether samples were related but also reconstruct the pedigree tree among samples. Finally, the study simulated sample swap events in two trios to test whether feature variants could accurately identify the swapped samples.
Conclusions: We integrated a kinship quality control module into a previously published NGS data processing pipeline (Fudan pipeline 2.0) to automate the identification of kinship label errors and sample swap events.
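
The threshold-based approach described in the Methods can be illustrated with a minimal sketch. The abstract does not state which similarity measure or cut-off values the module uses, so everything here is an assumption made for illustration: the Jaccard index stands in for the similarity score, the values in THRESHOLDS are invented, and the function names (jaccard, classify, pairwise_kinship, flag_label_errors) are hypothetical rather than part of Fudan pipeline 2.0.

```python
from itertools import combinations

# Hypothetical cut-offs for illustration only; NOT values from the study.
THRESHOLDS = [
    (0.30, "first-degree"),
    (0.15, "second-degree"),
    (0.05, "third-degree"),
]


def jaccard(a, b):
    """Assumed similarity score: shared variants / all variants in either sample."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(score):
    """Map a similarity score to a kinship label using the cut-offs above."""
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "unrelated"


def pairwise_kinship(samples):
    """Classify every pair of samples; keys are name pairs in sorted order."""
    return {
        (s1, s2): classify(jaccard(samples[s1], samples[s2]))
        for s1, s2 in combinations(sorted(samples), 2)
    }


def flag_label_errors(samples, expected):
    """Return pairs whose inferred kinship disagrees with the recorded label."""
    observed = pairwise_kinship(samples)
    return [
        pair
        for pair, label in expected.items()
        if observed.get(tuple(sorted(pair))) != label
    ]


# Toy feature-variant sets (made-up identifiers, not real data).
toy = {
    "child": {"v1", "v2", "v3", "v4", "v5", "v6"},
    "mother": {"v1", "v2", "v3", "v7", "v8", "v9"},
    "father": {"v4", "v5", "v6", "v10", "v11", "v12"},
    "stranger": {"v13", "v14"},
}
recorded = {
    ("child", "mother"): "first-degree",
    ("child", "father"): "first-degree",
}

# Simulate a sample swap: the child's data is replaced by an unrelated sample.
swapped = dict(toy, child=toy["stranger"], stranger=toy["child"])
```

On this toy data, flag_label_errors(toy, recorded) returns an empty list, while flag_label_errors(swapped, recorded) reports both parent-child pairs, mirroring the simulated swap test described in the Results. A set-overlap score is used here only because it is simple to follow; any pairwise similarity measure could be substituted in its place.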