No gene sets were produced for the remaining 118 drugs because no genes were consistently differentially expressed across their samples. In addition, 25 drugs have only one sample in the MCF7 cell line. As a result, these 118 drugs with inconsistent MCF7 profiles and the 25 single-sample drugs were removed. Figure 1C shows the identified signature gene sets for three drugs: Estradiol, Estrol and raloxifene. Estradiol and Estrol are two forms of estrogen, which plays an important role in human breast cancer. It is therefore natural that the signature gene sets of these two drugs share many genes with similar expression patterns. For instance, EGR3, MYBL1 and C8orf33 are significantly up-regulated and EFNA1 is down-regulated after treatment with either drug.
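The exact selection criterion is not given in this section, so the following is only a minimal sketch of the filtering idea, under stated assumptions. It assumes each sample is a mapping from gene to a log fold change against control, and treats a gene as "consistently differentially expressed" when its change exceeds a hypothetical `threshold` with the same sign in every sample. The function name `select_signature` and the toy values are illustrative, not the authors' algorithm or data.

```python
from typing import Dict, List


def select_signature(samples: List[Dict[str, float]], threshold: float = 1.0) -> Dict[str, int]:
    """Return {gene: +1 or -1} for genes consistently up or down in every sample.

    Hypothetical criterion (an assumption, not the paper's method): a gene is kept
    only if |log fold change| >= threshold in all samples and the sign agrees.
    """
    if len(samples) < 2:
        # A single sample gives no evidence of consistency, so no signature is built.
        return {}
    # Only genes measured in every sample can be checked for consistency.
    common = set(samples[0]).intersection(*samples[1:])
    signature: Dict[str, int] = {}
    for gene in sorted(common):
        values = [s[gene] for s in samples]
        if all(v >= threshold for v in values):
            signature[gene] = 1  # consistently up-regulated
        elif all(v <= -threshold for v in values):
            signature[gene] = -1  # consistently down-regulated
    return signature


# Toy, made-up log fold changes for three hypothetical drugs.
drug_samples = {
    "drug_a": [{"EGR3": 2.1, "EFNA1": -1.4, "GAPDH": 0.1},
               {"EGR3": 1.7, "EFNA1": -1.2, "GAPDH": -0.2}],
    "drug_b": [{"EGR3": 1.5}, {"EGR3": -1.3}],  # inconsistent direction
    "drug_c": [{"EGR3": 2.0}],                  # single sample
}
signatures = {drug: select_signature(s) for drug, s in drug_samples.items()}
# Drugs with an empty signature are dropped, mirroring the removal described above.
kept = {drug: sig for drug, sig in signatures.items() if sig}
print(kept)
```

Here `drug_b` is dropped because its only gene changes direction between samples, and `drug_c` because it has a single sample.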
Furthermore, these genes are highly relevant to breast cancer. EGR3 encodes a transcriptional regulator of the EGR family and has been shown to be involved in the estrogen signaling pathway in breast cancer. MYBL1 belongs to the MYB family of genes, which encode MYB proto-oncogene proteins. MYB has been shown to be highly expressed in ER-positive breast tumors and tumor cell lines and is essential for the proliferation of ER-positive breast cancer cells. EFNA1 encodes a member of the ephrin family; it is highly compartmentalized in normal breast tissue and lost in invasive cancers, so its down-regulation after estradiol (E2) treatment is plausible. The third drug, raloxifene, is a known selective estrogen receptor modulator that antagonizes estrogen signaling in breast tissue. Consistent with this, its signature contains both EGR3 and MYBL1, but down-regulated.
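To make the comparison between signatures concrete, here is a minimal sketch of one simple way to score agreement between two signed gene sets. It is an illustrative measure chosen for this example, not the MoA prediction method used in the paper: over the genes two signatures share, it averages the product of their directions, giving +1 when all shared genes move the same way and -1 when all move in opposite directions. Only the genes named in the text are used, so these are partial signatures.

```python
from typing import Dict


def signed_agreement(sig_a: Dict[str, int], sig_b: Dict[str, int]) -> float:
    """Score in [-1, 1] over shared genes: +1 all concordant, -1 all opposite.

    Illustrative measure (an assumption, not the paper's scoring function).
    Returns 0.0 when the two signatures share no genes.
    """
    shared = set(sig_a) & set(sig_b)
    if not shared:
        return 0.0
    return sum(sig_a[g] * sig_b[g] for g in shared) / len(shared)


# Partial signatures built only from the genes mentioned in the text (+1 up, -1 down).
estradiol = {"EGR3": 1, "MYBL1": 1, "C8orf33": 1, "EFNA1": -1}
estrol = {"EGR3": 1, "MYBL1": 1, "C8orf33": 1, "EFNA1": -1}
raloxifene = {"EGR3": -1, "MYBL1": -1}

print(signed_agreement(estradiol, estrol))      # 1.0
print(signed_agreement(estradiol, raloxifene))  # -1.0
```

A positive score points towards a similar mode of action and a negative score towards an opposite one, which is the intuition behind comparing the Estradiol, Estrol and raloxifene signatures.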
This similarity between the identified Estrol and Estradiol signature gene sets suggests that the two drugs may share a similar MoA. In contrast, the inverse relationship between the raloxifene and E2 signatures suggests that their MoAs may be opposite to each other. Subsequent analysis indeed assigned E2 and Estrol, together with 15 other drugs, to the same MoA, while raloxifene was ranked at the top of the reverse prediction list for an independent E2 treatment sample. These results indicate that the signature gene sets selected by our proposed algorithm are biologically meaningful.
Quality control
Quality control is applied to the cMap MCF7 cell line drugs with more than three samples.
The goal of quality control is to remove samples whose expression is inconsistent with the other samples of the same drug. Our investigation of the cMap data revealed a considerable number of outlier samples, whose expression patterns differ significantly from those of the other samples of the same drug. Including these outliers would only introduce noise in defining the MoA, so it is important to remove them. Note that signature gene set selection can also serve as a form of quality control, since some drugs may end up with no selected gene set.
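The outlier criterion itself is not specified here, so the following is a minimal sketch under an explicit assumption. It flags a sample as an outlier when the median Pearson correlation between its expression profile and the other samples of the same drug falls below a hypothetical cutoff `min_corr`. The median is used instead of the mean so that a single reversed outlier does not drag down the scores of the consistent samples. The names `pearson` and `remove_outliers` and the toy four-gene profiles are illustrative only.

```python
from math import sqrt
from statistics import median
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def remove_outliers(profiles: List[List[float]], min_corr: float = 0.3) -> List[int]:
    """Return indices of samples kept for one drug.

    Hypothetical rule (an assumption): keep a sample if its median correlation
    with the other samples of the same drug is at least min_corr.
    """
    if len(profiles) < 2:
        return list(range(len(profiles)))
    kept = []
    for i, p in enumerate(profiles):
        corrs = [pearson(p, q) for j, q in enumerate(profiles) if j != i]
        if median(corrs) >= min_corr:
            kept.append(i)
    return kept


# Toy profiles over four genes for one drug; sample 3 has a reversed pattern.
profiles = [
    [2, 1, -1, -2],
    [4, 2, -2, -4],
    [1, 2, -2, -1],
    [-2, -1, 1, 2],
]
print(remove_outliers(profiles))  # [0, 1, 2]
```

In practice the profiles would span many genes and the cutoff would need tuning on the cMap data; the values here only demonstrate the mechanics.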