4. Experimental Design

The searches were carried out on the MDL Drug Data Report (MDDR) database. The 102,516 molecules in the MDDR database were converted to Pipeline Pilot ECFC_4 fingerprints and folded to give 1024-element fingerprints [28].

For the screening experiments, three datasets (DS1–DS3) [29] were chosen from the MDDR database. Dataset DS1 contains 11 MDDR activity classes, some of which involve structurally homogeneous actives and others structurally heterogeneous (i.e., structurally diverse) actives. The DS2 dataset contains 10 homogeneous MDDR activity classes, and the DS3 dataset contains 10 heterogeneous MDDR activity classes. Full details of these datasets are given in Tables 1–3.
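The folding step mentioned above can be illustrated with a minimal sketch. Pipeline Pilot's actual folding procedure is not described in the text, so the modulo scheme below is an assumption: each integer feature identifier is mapped to one of 1024 positions and the counts that collide are summed. The function name `fold_counts` and the input layout (a mapping from feature identifier to count) are hypothetical.

```python
def fold_counts(feature_counts, n_bins=1024):
    """Fold a sparse count fingerprint into a fixed-length vector.

    feature_counts: dict mapping an integer feature id to its count.
    Assumption: folding is done by taking the feature id modulo n_bins
    and summing the counts that land on the same position.
    """
    folded = [0] * n_bins
    for feature_id, count in feature_counts.items():
        folded[feature_id % n_bins] += count
    return folded


# Hypothetical sparse fingerprint: features 5 and 1029 collide at position 5.
example = fold_counts({5: 2, 1029: 1, 7: 3})
```

Folding trades some resolution (colliding features become indistinguishable) for a fixed vector length that is convenient to store and compare.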

Each row in the tables lists an activity class, the number of molecules belonging to the class, and the class's diversity, computed as the mean pairwise Tanimoto similarity across all pairs of molecules in the class using ECFP6 fingerprints. The pairwise similarity calculations for all datasets were performed using Pipeline Pilot software [28].

Table 1. MDDR activity classes for DS1 dataset.

Table 3. MDDR activity classes for DS3 dataset.

For each dataset (DS1–DS3), the screening experiments were conducted with 10 reference structures selected randomly from each activity class, and the similarity measure was used to obtain an activity score for each compound in the dataset. These activity scores were then sorted in descending order, and the recall of the active compounds, that is, the percentage of the compounds of the desired activity class retrieved in the top 1% and 5% of the sorted scores, was used as the measure of the performance of our similarity method.
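The two quantities described above, class diversity and recall at a cutoff, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Pipeline Pilot implementation: fingerprints are represented as sets of on-bit indices (the binary form of the Tanimoto coefficient), the cutoff size is obtained by rounding, and all function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Binary Tanimoto coefficient between two sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_pairwise_tanimoto(fingerprints):
    """Mean Tanimoto similarity over all unordered pairs in a class."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0  # assumption: a class with fewer than two molecules scores 0
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def recall_at(scores, is_active, fraction):
    """Percentage of actives found in the top `fraction` of the ranking.

    scores: activity score per compound (higher ranks first).
    is_active: 1/0 (or True/False) flag per compound, same order as scores.
    Assumption: the cutoff size is round(fraction * N), with a minimum of 1.
    """
    n_top = max(1, round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in order[:n_top] if is_active[i])
    total = sum(1 for flag in is_active if flag)
    return 100.0 * hits / total if total else 0.0
```

For the cutoffs used here, `recall_at(scores, is_active, 0.01)` and `recall_at(scores, is_active, 0.05)` would give the top 1% and top 5% recall for one reference structure.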

3. Results and Discussion

Our goal was to compare the retrieval effectiveness of different search approaches. In this study, we tested the TAN, BIN, and BINRF models against the MDDR database using three different datasets (DS1–DS3). The results of the searches of DS1–DS3 are presented in Tables 4–6, respectively, using cutoffs at both 1% and 5%.

Table 4. Recall calculated using the top 1% and top 5% of the DS1 dataset when ranked using TAN, BIN, and BINRF.

Table 6. Recall calculated using the top 1% and top 5% of the DS3 dataset when ranked using TAN, BIN, and BINRF.

In these tables, the first column from the left contains the results for TAN, the second column contains the corresponding results when BIN is used, and the last column contains the corresponding results when BINRF is used. Each row lists the recall for the top 1% and 5% of a sorted ranking, averaged over the ten searches for each activity class; the penultimate row in each table gives the mean value for that similarity method, averaged over all of the activity classes in a dataset.
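The two levels of averaging described above (over the ten searches within a class, then over the classes of a dataset) can be sketched as follows. The function name, the input layout, and the class names in the example are hypothetical, and the numbers are made up purely to show the arithmetic.

```python
from statistics import mean


def summarise_recall(recalls_by_class):
    """Average recall per class, then average those means over classes.

    recalls_by_class: dict mapping an activity class name to the list of
    recall values obtained with each of its reference structures.
    Returns (per-class means, mean over all classes).
    """
    per_class = {name: mean(values) for name, values in recalls_by_class.items()}
    overall = mean(list(per_class.values()))
    return per_class, overall


# Illustrative values only; class names and recalls are not from the paper.
per_class, overall = summarise_recall({
    "class_A": [10.0, 20.0],
    "class_B": [30.0, 50.0],
})
```

Note that the overall figure is a mean of class means, so each activity class carries equal weight regardless of how many molecules it contains.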
