Human Cancer Cell Lines: Fact and Fantasy (PDF)
By Bonnie R., 07.12.2020 at 09:30, 4 min read
Oncotarget, a primarily oncology-focused, peer-reviewed, open-access, biweekly journal, aims to maximize research impact through insightful peer review, to eliminate borders between specialties by linking different fields of oncology, cancer research and the biomedical sciences, and to foster the application of basic and clinical science. Its scope is unique. The term "oncotarget" encompasses all molecules, pathways, cellular functions, cell types, and even tissues that can be viewed as targets relevant to cancer as well as to other diseases.
- Isogenic human disease models
- False cell lines: The problem and a solution
Breast cancer cell lines are frequently used to elucidate the molecular mechanisms of the disease. However, a large proportion of cell lines are affected by problems such as mislabeling and cross-contamination. Selecting optimal breast cancer cell line models is therefore of great clinical significance. Using tamoxifen survival-related genes from breast cancer tissues as the gold standard, we selected the cell line model that best represents the characteristics of clinical tissue samples.
Isogenic human disease models
The term "oncotarget" was introduced in the inaugural Editorial, "Introducing OncoTarget." Impact Journals is a member of the Society for Scholarly Publishing.
Keywords: cancer cell lines, next-generation sequencing, cell line identification, DNA sequencing, data heterogeneity and incompleteness.
Cancer cell lines (CCLs) are important tools for cancer researchers worldwide. However, handling cancer cell lines is error-prone, and critical errors such as misidentification and cross-contamination occur more often than is acceptable.
Because CCLs today are very often sequenced, partly or entirely, as part of the studies performed anyway, we developed Uniquorn, a computational method that reliably identifies CCL samples based on variant profiles derived from whole-exome or whole-genome sequencing. Notably, Uniquorn requires neither a particular sequencing technology nor a particular downstream analysis pipeline, but works robustly across different NGS platforms and analysis steps.
We evaluated Uniquorn by comparing more than … CCL profiles from three large CCL libraries, including duplicates, against each other. Errors are strongly associated with low-quality mutation profiles. The R package Uniquorn is freely available from Bioconductor.
CCLs help to uncover cancer etiology and to study the mode of action of anticancer drugs.
They are indispensable for the functional investigation of proteins and pathways, with far fewer ethical and legal issues than patient-derived tumor samples [1, 2]. This error had wide-ranging, negative consequences because a number of research results were attributed to the wrong tissue type.
Since no universally accepted nomenclature system for CCLs exists [1, 8], researchers keep inventing names of little discriminative power. … T, but the similarity of the two names makes them very easy to confuse. Meanwhile, high-impact journals require explicit verification of CCL integrity, with respect to both identity and the absence of cross-contamination, before publishing related research results [1]. The usual way to establish the identity of a CCL sample under study (henceforth called the query sample q) is to compare it with CCLs of known identity (henceforth R, a library of reference samples) by experimentally comparing certain cell-line-specific features [1, 3, 5, 6, 8].
Established identification methods differ in the characteristic genomic entity that is compared between q and the samples in R. Both established approaches, short tandem repeat (STR) profiling and SNP-based identification such as SPIA, require additional, costly experiments that do not contribute to the scientific goal of the original study.
Furthermore, all available methods require that the genotyping technology (including the downstream software) used to analyze the query q and the references R be identical in order to achieve the expected accuracy. This implies access to the physical samples, which is difficult in large projects with numerous partners, where often only information about the samples, or data generated from them, is exchanged rather than the samples themselves.
At the same time, modern CCL-based research increasingly relies on high-throughput next-generation sequencing (NGS) [3, 12–14]. It is a natural idea to use these profiles to identify the origin of a given query sample within such a reference library, or within multiple libraries. However, typical NGS procedures do not extract the kind of genetic information needed for STR- or SPIA-based identification: both methods require homogeneous, locus-specific genotype data, but these loci are often omitted from sequencing or filtered out afterwards because they are assumed to be unrelated to the cancer itself.
Furthermore, major chromosomal deletions, e.g. … Thus, the information required for identification is not readily available. Both methods were evaluated only with homogeneous NGS profiles, i.e. … Such a scenario of homogeneous, easily comparable NGS data sets is quite different from the one typically found today, in which different labs use different technologies, leading to heterogeneous NGS profiles.
For instance, Hudson et al. … The causes of data heterogeneity between large-scale sequencing projects are complex and include both technical and design aspects. For example, sequencing sub-clonal and aneuploid cancer cell cultures may produce heterogeneous sequencing results [19]. Furthermore, studies differ in their aims and priorities, leading to different choices of algorithmic parameters and workflow designs, which in turn can yield differing genotyping results even for the same CCLs [20].
Here, we present Uniquorn, a novel in silico approach for the robust and fast identification of CCLs within reference libraries based on their variant profiles.
Uniquorn uses only NGS data and is based on the assumption that most experiments on CCLs already involve extensive sequencing.
The algorithm is designed to compare variant profiles derived from a wide range of sequencing technologies, qualities, depths, and scopes, which makes it useful for large, distributed research projects. Technically, Uniquorn computes confidence scores for the pairwise identity of the query sample with every sample in a reference library R, taking into account the prevalence of each variant in the library and a statistical assessment of the observed number of shared variants.
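To make the scoring idea concrete, the following is a minimal Python sketch, not Uniquorn's implementation. It assumes that variant profiles are sets of variant IDs, weights each shared variant by the inverse of its prevalence in the library, and, as an assumed stand-in for the statistical assessment, applies a one-sided hypergeometric test to the observed number of shared variants, reporting -log10(p) as the confidence. The function names `hypergeom_sf` and `rank_references` are hypothetical.

```python
from collections import Counter
from math import comb, log10


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return hits / comb(population, draws)


def rank_references(query, library):
    """Score a query variant profile against every reference profile.

    query: set of variant IDs; library: dict mapping name -> set of variant IDs.
    Returns (name, shared, weighted_shared, confidence) tuples, best first.
    """
    prevalence = Counter(v for profile in library.values() for v in profile)
    universe = len(set(prevalence) | query)  # all distinct variants observed
    results = []
    for name, profile in library.items():
        shared = query & profile
        # Rare variants count more: each shared variant adds 1 / prevalence.
        weighted = sum(1.0 / prevalence[v] for v in shared)
        p = hypergeom_sf(len(shared), universe, len(profile), len(query))
        confidence = -log10(max(p, 1e-300))  # guard against p underflowing to 0
        results.append((name, len(shared), weighted, confidence))
    return sorted(results, key=lambda row: row[3], reverse=True)
```

A reference that shares many variants with the query obtains a small p-value and rises to the top of the ranking. Treating the set of all observed variants as the sampling universe is a simplifying assumption of this sketch.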
The NGS profiles in these libraries are highly heterogeneous, because different laboratories generated the data using different technologies and software, sometimes even covering different genomic regions [18]. SNP-based identification using the available data is impractical, because in two of these three sets all SNPs were filtered out to facilitate the identification of driver mutations.
Furthermore, none of these data sets contains information on STRs. We also show that several pairs of differently named cell lines that our method identifies as identical should indeed be considered identical, given their extremely similar mutational profiles, and we identify several candidates for cell line cross-contamination.
To this end, each variant in a reference library is weighted according to its inverse frequency, and only rare variants are retained for further analysis. To assess the impact of different thresholds for this weight, we studied the distribution of variant counts in each of the three libraries (Figure 2A). In Figure 2C, we show the distribution of the number of variants per CCL for different weight thresholds.
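The weighting and filtering step can be sketched as follows, assuming (hypothetically) that a library is a dict from sample name to a set of variant IDs and that a variant's weight is 1 divided by the number of samples carrying it. Under that assumption, a threshold of 1.0 keeps only variants unique to a single sample, and lower thresholds admit progressively more common variants. All function names are illustrative.

```python
from collections import Counter


def variant_weights(library):
    """Weight each variant by its inverse frequency across the library.

    library: dict mapping sample name -> set of variant IDs.
    A variant found in one sample gets weight 1.0, in two samples 0.5, and so on.
    """
    counts = Counter(v for profile in library.values() for v in profile)
    return {variant: 1.0 / n for variant, n in counts.items()}


def filter_rare(library, threshold):
    """Keep only variants whose weight is at least `threshold`."""
    weights = variant_weights(library)
    return {
        name: {v for v in profile if weights[v] >= threshold}
        for name, profile in library.items()
    }


def variants_per_sample(library, thresholds):
    """Number of retained variants per sample for each threshold (cf. Figure 2C)."""
    return {
        t: {name: len(p) for name, p in filter_rare(library, t).items()}
        for t in thresholds
    }
```

Raising the threshold shrinks each profile to its most specific variants, which is the trade-off between specificity and the number of usable variants that the text describes.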
Figure 1: Uniquorn workflow. CCLs from a reference library are compared with a given query sample q based on their sets of small variants (variant profiles). Variants are weighted according to their prevalence within the library (e.g. CCLE), and frequent variants are excluded afterwards.
Subsequently, Uniquorn computes a confidence score quantifying the likelihood that each reference sample r is identical to q. Large differences between the numbers of variants in q and r affect the statistical test that assesses whether q and r are similar. Therefore, a regularization step calculates the minimum number of matching variants required to predict that q and r are related.
Figure 2: Distribution of CCL variant frequencies and weights across libraries.
Differences between software, technologies and filters (non-exhaustive), i.e. … It is shown that all panels possess unique, i.e. … B Distribution of weights per library. C Number of variants per reference sample for different weight thresholds in the different reference libraries.
We manually identified duplicates in this set and tested how reliably Uniquorn would detect them.
To this end, each CCL sample was used once as the query sample, with all three libraries serving as references. For each query–reference pair, Uniquorn predicted whether the two were derived from the same cell line.
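The all-against-all benchmark can be sketched as below. This is a hypothetical harness, not the authors' code: `score(q, r)` stands for any confidence function, `duplicates` holds the manually identified same-line pairs, and a pair counts as a predicted match when its score reaches `cutoff`.

```python
from itertools import permutations


def benchmark(samples, score, duplicates, cutoff):
    """Confusion counts and summary metrics for an all-against-all identification run.

    samples: sample names; score(q, r): confidence that q and r are the same line;
    duplicates: set of frozenset({a, b}) pairs known to be the same cell line.
    """
    tp = fp = fn = tn = 0
    for q, r in permutations(samples, 2):  # every sample serves once as query
        predicted = score(q, r) >= cutoff
        actual = frozenset((q, r)) in duplicates
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }
```

Because each sample is used as a query against every other sample, each true duplicate pair is counted twice, once in each direction.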
Results are shown in Table 1. The more important metric is sensitivity, which is also very high for thresholds 0.… Limiting the comparison to unique variants (weight threshold 1.…)
Quantitative regularization slightly reduces identification efficiency but suppresses many false-positive predictions. Figure 3 shows more detailed performance characteristics.
Table 1: Results of cross-validation for different weight thresholds (columns 2 to 5). A higher threshold enforces the use of more specific variants but reduces the number of variants considered.
Depending on the threshold 0.… Numbers in brackets show results when the expected number of matching variants is set manually to 10; numbers without brackets show results with the statistically estimated background-noise strength (regularized; see Methods).
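The contrast between a fixed and an estimated matching requirement can be illustrated with the sketch below. The estimator is an assumption chosen for illustration (the median plus k times the median absolute deviation of the query's shared-variant counts against all references); Uniquorn's own background-noise estimate may differ. Passing `manual=10` mimics the bracketed setting in Table 1. Function names are hypothetical.

```python
import math
from statistics import median


def required_matches(shared_counts, k=3.0, manual=None):
    """Minimum number of shared variants needed to call q and r related.

    Hypothetical heuristic: background noise is estimated robustly as the
    median plus k times the median absolute deviation (MAD) of the counts.
    A manual value, if given, overrides the estimate.
    """
    if manual is not None:
        return manual
    counts = list(shared_counts)
    center = median(counts)
    mad = median([abs(c - center) for c in counts])
    background = center + k * mad
    return math.floor(background) + 1  # require strictly more than the noise level


def regularized_hits(shared_by_reference, k=3.0, manual=None):
    """References whose shared-variant count clears the required minimum."""
    needed = required_matches(shared_by_reference.values(), k, manual)
    return {ref for ref, shared in shared_by_reference.items() if shared >= needed}
```

A robust estimate is used here so that a single genuine match with many shared variants does not inflate the noise level it is tested against.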
Figure 3: Results of the cross-identification benchmark depending on regularization and variant inclusion weight. A Number of false positives. B Number of false negatives.
C Number of false positives. D Number of true negatives. E Sensitivity. F F1 score (harmonic mean of specificity and sensitivity). G Specificity. H Positive predictive value. The best specificity and sensitivity values are achieved using a weight threshold of 0.… A threshold of 1.…
The previous evaluation measured the performance of Uniquorn when searching for a CCL from a reference library within the set of reference libraries. We also tested how the method performs on profiles that are not derived from CCLs.
Specifically, we used profiles from the genomes data set [21] as query samples and tested whether Uniquorn would assign them to a reference CCL; any such assignment would certainly be an error. Using a weight threshold of 1.… By default, the regularization filter automatically measures the strength of the background noise and adjusts the required number of matching mutations accordingly.
However, users can set both thresholds manually to adapt to different reference libraries or to change the balance between false prediction rates and sensitivity (see Figure 4 for the ROC analysis). Thresholds 0.… The vertical black line shows the Uniquorn default threshold (confidence score of …). The threshold was chosen as the optimal cutoff between sensitivity and specificity.
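The source does not state how the optimal cutoff was defined. One common criterion, assumed here, is the cutoff that maximizes Youden's J statistic (sensitivity + specificity - 1) over labeled query-reference pairs. The sketch below, with the hypothetical name `youden_cutoff`, scans every observed score as a candidate cutoff.

```python
def youden_cutoff(scores, labels):
    """Pick the score cutoff that maximizes Youden's J = sensitivity + specificity - 1.

    scores: confidence scores; labels: True where the pair truly matches.
    A pair is predicted as a match when its score is >= the cutoff.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both matching and non-matching pairs")
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # on ties, keep the lowest (most sensitive) cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

Youden's J weights sensitivity and specificity equally. A user who cares more about avoiding false positives would choose a higher cutoff along the ROC curve instead.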
Uniquorn compares favourably with other methods for the identification of CCLs in terms of the amount of data and experimental work required (see Table 2). In the first place, it is similar to established methods, e.g. … Uniquorn, however, differs from the aforementioned methods in its focus on the in silico identification of CCLs based on variant profiles obtained from different high-throughput sequencing technologies. Unlike SNP-based methods, Uniquorn does not depend on common, well-characterized and publicly available genomic entities but relies predominantly on rare somatic mutations, because SNP-based comparisons have severe drawbacks when applied in cancer research.
Second, the loci of the most characteristic SNPs are often not genotyped during exome sequencing, and even less often in panel sequencing. Moreover, cancer is frequently associated with large structural variants, which often remove important loci, and with polyploid chromosomes whose variant calls cannot be directly compared with diploid references. Uniquorn was designed to deal robustly with such problems.
False cell lines: The problem and a solution
Hundreds of misleading reports are published every year containing data on human cancer cell lines that are actually derived from a species, tissue or individual other than the one claimed. As a consequence, millions of dollars provided for cancer research are being spent on producing misleading data. This review describes how cross-contamination occurs, catalogues the use of false cell lines in leading biomedical journals, and suggests ways to resolve the problem.
Isogenic human disease models are a family of cells that are selected or engineered to accurately model the genetics of a specific patient population in vitro. They are paired with a genetically matched 'normal' cell to provide an isogenic system for researching disease biology and novel therapeutic agents. Cancer is one disease for which isogenic human disease models have been widely used.
Midge stared at the numbers, unable to believe her eyes. "This file, the one that was uploaded last night…" "Well?" "The cipher still hasn't been broken."
The girl laughed: "It's a miracle marker."