KRAS and EGFR Scatter plot
import CanDI.candi as can
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import cluster, decomposition, preprocessing
Cancer Object Instantiation
I’m interested in studying non-small cell lung cancer using the data in depmap and ccle. I start by instantiating a cancer object that will allow me to explore the data space of non-small cell lung cancer cell lines. Since I don’t want any small cell lung cancer cell lines included I will specify a disease subtype during instantiation. The subtype argument of Cancer object instantiation works by string matching in the lineage_subtype collumn of the cell_lines dataset. Below you can see that we have a variety of cell types within a given lineage subtype.
lung = can.Cancer("Lung Cancer", subtype="NSCLC")
lung.subtypes
array(['Non-Small Cell Lung Cancer (NSCLC), Adenocarcinoma',
'Non-Small Cell Lung Cancer (NSCLC), Large Cell Carcinoma',
'Non-Small Cell Lung Cancer (NSCLC), unspecified',
'Non-Small Cell Lung Cancer (NSCLC), Squamous Cell Carcinoma',
'Non-Small Cell Lung Cancer (NSCLC), Adenosquamous Carcinoma',
'Non-Small Cell Lung Cancer (NSCLC), Mucoepidermoid Carcinoma'],
dtype=object)
I want to look at how oncogenic mutations affect global genetic dependencies. Let’s choose KRAS and EGFR as our oncogenic mutations. I’m going to make two CellLineCluster objects per oncogene, eight in total. For each oncogene I want to make a CellLineCluster where the oncogene of interest is mutated and another where it is wild type.
To Analyze KRAS
Lung - KRAS MT
Lung - KRAS WT
To Analyze EGFR
Lung - EGFR MT
Lung - EGFR WT
MT = Mutant
WT = Wild Type
#Mutated function automatically ignores silent mutations
kras_mt_ids = lung.mutated("KRAS", variant="Variant_Classification", item = "Missense_Mutation")
egfr_mt_ids = lung.mutated("EGFR", variant="Variant_Classification")
kras_wt_ids = list(set(lung.depmap_ids) - set(kras_mt_ids))
egfr_wt_ids = list(set(lung.depmap_ids) - set(egfr_mt_ids))
#Instantiate KRAS Clusters
kras_mt = can.CellLineCluster(kras_mt_ids)
kras_wt = can.CellLineCluster(kras_wt_ids)
#Instantiate EGFR Clusters
egfr_mt = can.CellLineCluster(egfr_mt_ids)
egfr_wt = can.CellLineCluster(egfr_wt_ids)
Load Complete
Analyzing Global Gene Dependency
To see how KRAS and EGFR mutations affect global gene dependency I’m going to plot the average gene effect for every gene of the mutant and wildtype clusters against each other. This if gene effect skews towards wildtype or mutation status for any give gene. The Function below will be used to make this plot. Unless you are interested in specifically how this plot is made you can skip the following cell.
def gene_effect_scatter(mt, wt, gene, control, tc1=None, tc2=None, name=None):
#Average Gene Effect for control agnostic groups
mt_effect = mt.gene_dependency.mean(1)
wt_effect = wt.gene_dependency.mean(1)
#For Labeling
mt_lab = mt_effect.loc[[gene, control]]
wt_lab = wt_effect.loc[[gene, control]]
#Make Figure appropriate size, dpi, and font
plt.rcParams.update({"figure.figsize": (8, 8),
"savefig.dpi": 300,
"font.family": "sans-serif",
"font.size": 12
})
#Generate Figure and Axis objects
fig, ax = plt.subplots(1,1)
#Label Axes
ax.set_xlabel(f"{gene} MT Average Gene Effect (CERES Score)")
ax.set_ylabel(f"{gene} WT Average Gene Effect (CERES Score)")
#Draw Line at median common essential value
ax.axhline(y = 0.50,
c = "black",
linewidth=0.5,
label = "Minimun Gene Dependencey Probability"
)
ax.axvline(x = 0.50,
c= "black",
linewidth=0.5)
#Plot all genes
ax.scatter(mt_effect,
wt_effect,
c = "#2166ac",
alpha = 0.7,
s = 50
)
#Outline Genes To label
ax.scatter(mt_lab,
wt_lab,
c = "#2166ac",
s = 50,
edgecolor = "black",
linewidth = 2,
alpha = 0.7
)
ax.legend()
#Label control agnostic Series
if tc1:
for i in range(mt_lab.shape[0]):
text = list(mt_lab.index)
ax.annotate(text[i],
xy = (mt_lab[i], wt_lab[i]),
xytext = tc1[i],
xycoords = "data",
arrowprops = {"arrowstyle": "-"}
)
plt.show()
if name:
fig.savefig(name, dpi=300)
return
Note about Gene Effect Scores: Dependency vs Essentiality
A more negative gene effect means more dependent. A gene effect of -1.0 is the median gene effect of all common essential genes. If a gene has a gene effect of -1.0 or lower it then that gene is essential. A cell line can still be dependent on a gene with a lower gene effect if knocking out that gene slows growth/proliferation.
Average Gene Effect in KRAS Wildtype and KRAS Mutant Cell Lines
KRAS dependency heavily favors KRAS mutant cell lines. No other gene’s depedencies are as skewed toward KRAS mutant cell lines. KRAS mutations appear to be self essentializing.
#Text_coords are custom per graph
text_coords1 = [(0.95, 0.25), (0.1, 0.4)]
text_coords2 = [(0.95, 0.25), (0.1, 0.4)]
gene_effect_scatter(kras_mt,
kras_wt,
"KRAS",
"EGFR",
tc1 = text_coords1,
tc2 = text_coords2,
name= None#figures/kras_gene_dependency_scatter.pdf"
)
Load Complete
/tmp/ipykernel_2402738/1514998632.py:62: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
xy = (mt_lab[i], wt_lab[i]),
Average Gene Effect of EGFR MT vs EGFR WT Cell Lines
We don’t observed the same self esentializing effect with EGFR and EGFR mutations as we do with KRAS and KRAS mutations. The KRAS point moves upwards along the diaganol when KRAS mutant cell lines are removed, indicating the self essentializing effect of KRAS mutations was represented equally in both EGFR mutant and EGFR wild type cell lines.
tc1 = [(0.4, 0.1), (0.2, 0.6)]
tc2 = [(0.4, 0.1), (0.2, 0.6)]
gene_effect_scatter(egfr_mt,
egfr_wt,
"EGFR",
"KRAS",
tc1 = tc1,
tc2 = tc2,
name = None#"figures/egfr_gene_dependency_scatter.pdf"
)
/tmp/ipykernel_2402738/1514998632.py:62: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
xy = (mt_lab[i], wt_lab[i]),