We give the corresponding correlations between these measures on our datasets in Table S1 in the Supplementary Material. The eigenvector centrality is an extension of the node degree in which connections to more important nodes have more impact on the score.
The nodes that are connected to many highly connected nodes end up with a higher score than nodes that are connected to the same number of less-connected nodes (Bonacich). The closeness centrality measures the average length of the shortest path from the node to other nodes.
The nodes with higher closeness centrality have, on average, a smaller distance to the other nodes (Bavelas). The betweenness centrality quantifies the frequency with which a given node appears in the shortest paths between nodes in the network. Thus, removal of nodes with high betweenness centrality has a large impact on the shortest paths between nodes (Freeman). Finally, information centrality is based on information along the paths from a given node to the other nodes (Stephenson and Zelen). Besides quantifying several different topological features, we also annotate hub proteins, defined as proteins that interact with many proteins (Jeong et al.).
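To make the closeness centrality definition concrete, the following minimal sketch computes it for a small unweighted network using breadth-first search. The toy network, the protein labels, and the function names are hypothetical illustrations and are not taken from the study; the score is assumed here to be the reciprocal of the average shortest-path length to the reachable nodes.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(graph, node):
    """Reciprocal of the average shortest-path length to reachable nodes."""
    dist = shortest_path_lengths(graph, node)
    total = sum(d for n, d in dist.items() if n != node)
    reachable = len(dist) - 1
    return reachable / total if total > 0 else 0.0


# Hypothetical toy PPI network stored as an undirected adjacency dict.
toy_ppi = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}

closeness = {p: closeness_centrality(toy_ppi, p) for p in toy_ppi}
```

In this toy network the protein A, which sits closest to all others, receives the highest score, while the peripheral protein E receives the lowest.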
While early works on hub proteins defined them using a fixed minimal number of Jeong et al. This results in different cut-offs that define hubs for different interactomes (different organisms) and emphasizes the fact that hubs are a property of the whole interactome system rather than a property of individual proteins.
We used the latter definition with the cut-off that corresponds to the 90th percentile of the interaction counts in the complete human PPI network, which is consistent with several recent studies (Han et al.). Hub proteins have increased levels of intrinsic disorder (Meng et al.). The disordered protein-binding regions are also linked to certain human diseases (Uversky). Thus, we also annotate putative disordered protein-binding regions. The selection of this method is motivated by the fact that it is accurate and popular, and provides fast predictions i.
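The percentile-based hub definition can be sketched as follows. This is an illustration under stated assumptions: the nearest-rank percentile rule, the inclusive comparison (a protein at the cut-off counts as a hub), the function names, and the toy interaction counts are all hypothetical choices, since the text does not specify them.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (assumed rule) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def annotate_hubs(interaction_counts, q=90):
    """Return the cut-off and the proteins whose counts reach it."""
    cutoff = percentile(list(interaction_counts.values()), q)
    # Assumption: proteins exactly at the cut-off are treated as hubs.
    hubs = {p for p, c in interaction_counts.items() if c >= cutoff}
    return cutoff, hubs


# Hypothetical interaction counts for ten proteins.
counts = {"P1": 1, "P2": 2, "P3": 2, "P4": 3, "P5": 3,
          "P6": 4, "P7": 5, "P8": 6, "P9": 8, "P10": 40}

cutoff, hubs = annotate_hubs(counts)
```

Because the cut-off is derived from the distribution of interaction counts, it adapts to the size and density of each interactome instead of relying on a fixed minimal number of interactions.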
We annotate and separately analyze the molecular functions, biological processes, and cellular components, where the latter define the subcellular locations.
We compare the sequence-derived and functional characteristics between the drug targets, non-drug targets, and possibly druggable proteins using statistical tests of significance of differences.
We used the Anderson-Darling test with the p-value cutoff of 0. We annotate the cellular functions and subcellular locations associated with a particular set of proteins using enrichment analysis offered by the PANTHER system (Muruganujan et al.).
This system generates a list of annotations that are statistically over-represented when compared with the annotations present in the whole human proteome. We measure similarity between two sets of proteins by comparing the cellular function and subcellular location GO terms associated with these two protein sets.
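The idea of statistical over-representation can be illustrated with a generic hypergeometric test. This sketch is not PANTHER's implementation, and its exact statistical procedure may differ; the function names and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total, annotated, sample, hits):
    """P(X >= hits) when `sample` proteins are drawn from `total`,
    of which `annotated` carry the annotation of interest."""
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(total - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total, sample)


def fold_enrichment(total, annotated, sample, hits):
    """Observed number of annotated proteins divided by the expected number."""
    expected = sample * annotated / total
    return hits / expected


# Hypothetical example: 500 of 20000 proteome proteins carry a GO term,
# and 10 of the 100 proteins in the analyzed set carry it as well.
p_value = hypergeom_enrichment_p(20000, 500, 100, 10)
fold = fold_enrichment(20000, 500, 100, 10)
```

Here the analyzed set contains four times more annotated proteins than expected by chance, and the small tail probability marks the annotation as over-represented relative to the background proteome.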
The set of the non-drug targets likely includes a relatively large number of druggable proteins. The ability to characterize properties that differentiate the drug targets and druggable proteins from the non-drug targets hinges on the annotation of the non-druggable and possibly druggable proteins in the set of these non-drug targets.
Druggability of proteins requires that they interact with a drug-like compound and that this interaction provides a desired therapeutic effect (Hopkins and Groom; Russ and Lampel; Keller et al.).
Thus, one way to annotate possibly druggable and non-druggable proteins is to analyze protein-disease associations. Figure 1 shows the fractions of the proteins associated with different classes of diseases among the drug targets and the non-drug targets.
As expected, the number of the disease-associated proteins is significantly higher among the drug targets compared to the non-drug targets. The latter suggests that the non-drug targets include both non-druggable proteins (those that lack association with any of the diseases) and possibly druggable proteins (those that are associated with diseases).
We note that the use of the disease associations provides only partial support for their druggability since it does not address the ability of the possibly druggable proteins to interact with drug-like molecules.
Figure 1 Fraction of drug targets and non-drug targets associated with different classes of diseases. The green and red bars show the fraction of disease-associated proteins among the drug targets and non-drug targets for each disease class. The disease classes are sorted by the value of the fraction of the drug targets.
Figure 2 analyzes the relation between the drug targets, non-drug targets, and disease associations. Figure 2A reveals that the disease-associated proteins are likely to be drug targets.
The fraction of drug targets increases for the proteins that are associated with more diseases. This increase is sharper for a lower number of diseases and plateaus for proteins with about 10 or more disease associations. Therefore, we hypothesize that the non-drug targets with a relatively large number of disease associations can be used as a proxy for possibly druggable proteins.
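The quantity behind this observation, the fraction of drug targets among proteins associated with at least a given number of diseases, can be sketched as follows. The function name, the protein labels, and the toy data are hypothetical and serve only to illustrate the calculation.

```python
def drug_target_fraction(disease_counts, drug_targets, k):
    """Fraction of drug targets among proteins with at least k disease associations."""
    selected = [p for p, n in disease_counts.items() if n >= k]
    if not selected:
        return None  # no protein reaches this threshold
    return sum(p in drug_targets for p in selected) / len(selected)


# Hypothetical numbers of disease associations per protein.
disease_counts = {"P1": 0, "P2": 1, "P3": 2, "P4": 5, "P5": 12, "P6": 15}
drug_targets = {"P3", "P5", "P6"}

# Fraction of drug targets for increasing minimal numbers of diseases.
curve = {k: drug_target_fraction(disease_counts, drug_targets, k) for k in (1, 5, 12)}
```

In this toy data the fraction grows with the threshold, which mirrors the qualitative trend described above.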
Figure 2B is a Venn diagram that visualizes the overlap between the disease-associated proteins (black borders), the drug targets (dataset D; green border), and the non-drug targets (dataset N; red border).
We define the set of the non-drug targets that are associated with 13 or more diseases as possibly druggable proteins (Nd dataset; orange area in Figure 2B). The non-drug targets that lack association with any disease constitute the set of the non-druggable proteins (Nn dataset).
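The split of the non-drug targets into the Nd and Nn datasets can be sketched as below. The threshold of 13 diseases comes from the text, and the rule that non-druggable proteins lack any disease association follows the earlier description of the non-drug targets; the function name and the toy data are hypothetical.

```python
def split_non_drug_targets(disease_counts, drug_targets, min_diseases=13):
    """Partition non-drug targets into possibly druggable (Nd) and non-druggable (Nn)."""
    nd, nn = set(), set()
    for protein, n in disease_counts.items():
        if protein in drug_targets:
            continue  # drug targets belong to the D dataset
        if n >= min_diseases:
            nd.add(protein)  # many disease associations: possibly druggable
        elif n == 0:
            nn.add(protein)  # no disease association: non-druggable
        # proteins with 1 to min_diseases - 1 associations stay unassigned
    return nd, nn


# Hypothetical numbers of disease associations per protein.
disease_counts = {"P1": 0, "P2": 0, "P3": 4, "P4": 13, "P5": 20, "P6": 15}
drug_targets = {"P2", "P5"}

nd, nn = split_non_drug_targets(disease_counts, drug_targets)
```

Note that non-drug targets with an intermediate number of disease associations (P3 in this example) fall into neither set, so Nd and Nn do not cover the entire N dataset.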
Figure 2 Relation between drug targets, non-drug targets and disease associations. Panel A shows the fraction of the drug targets among proteins associated with a given minimal number of diseases K.
Among the non-drug targets we define the Nn dataset of non-druggable proteins (brown area), i. The high degree of the latter overlap suggests that the Nd dataset should include a substantial number of druggable proteins.
Figure 3 further tests the validity of the hypothesis that the Nd and Nn datasets include the possibly druggable and the non-druggable proteins, respectively.
It quantifies similarity in the context of cellular functions and subcellular location between the drug targets, possibly druggable proteins, non-druggable proteins, and the non-drug targets. First, we generate a set of GO terms that are associated with each of these datasets, i. We perform this analysis separately for each of the three GO term categories: molecular functions, biological processes, and cellular components; the latter is a proxy for the subcellular location.
The gray lines in Figure 3 show the similarity values for each GO term category while the blue lines show the average across the three categories. The left-most set of results reveals that the cellular functions and subcellular location of the drug targets (D dataset) are similar to those of the possibly druggable proteins (Nd dataset), which aligns with our hypothesis that the Nd dataset in fact includes druggable proteins.
The second set of results, which compares the drug targets against the non-druggable proteins (Nn dataset), shows a lack of similarity in the biological processes and subcellular locations and modestly reduced levels of similarity in the molecular functions.
The other two sets of results, which compare the possibly druggable against the non-druggable proteins and the drug targets against the non-drug targets, similarly reveal the lack of similarity in the biological processes and subcellular locations, while showing similarity in the molecular functions. The average similarities for these two dataset pairs are low and equal 0. To sum up, the above analysis demonstrates that drug targets and the possibly druggable proteins share much higher levels of functional and subcellular location similarity compared to the similarity between possibly druggable proteins, non-druggable proteins, and non-drug targets.
This finding, which uses an independent source of information compared to the approach we used to annotate the possibly druggable proteins, supports the validity of our annotations of the possibly druggable and the non-druggable proteins.
Figure 3 Similarity in cellular processes and subcellular locations between the drug targets (D dataset), possibly druggable proteins (Nd dataset), non-druggable proteins (Nn dataset), and non-drug targets (N dataset).
We measure similarity for four pairs of these datasets (D vs. Nd, D vs. Nn, D vs. N, and Nn vs. Nd) based on the comparison of the corresponding sets of GO terms associated with these datasets, i. The gray markers show the similarity for each GO-term category while the blue markers are the average across the three categories.
Our ability to identify novel druggable proteins relies on the understanding of functional and sequence-derived characteristics that differentiate drug targets from the non-drug targets. We compare a broad range of these characteristics between the drug targets, non-drug targets, possibly druggable proteins, and non-druggable proteins.
Figure 4 focuses on the characteristics derived directly from the protein sequence, including the residue-level conservation (content of conserved residues in protein chains), number of domains and the content of domain-annotated residues, and the number of the alternative splicing isoforms. Altogether, relatively low numbers of the conserved residues are characteristic of the drug targets and these numbers are also relatively low among the possibly druggable proteins.
Interestingly, the residue-level conservation of the residues on the protein surface, where the protein-drug interaction occurs, follows the same pattern (Figure 5E). This finding complements prior results that show that drug targets have lower evolutionary rates and higher similarity to orthologous genes (Lv et al.). Panel A shows the amount of conserved residues. Panels B and C focus on the protein domains while Panel D quantifies the number of splicing isoforms.
The whiskers show the 5th and 95th percentiles, the bottom and top of the box correspond to the first and third quartiles, the middle bar is the median, and the cross marker is the average. Panels A, B, and C quantify the abundance of intrinsic disorder while Panels D and E quantify the amount of surface and the amount of conserved residues on the surface, respectively. At the same time, they have a similar number of domains when contrasted with the possibly druggable proteins.
The underlying reasons for this enrichment could be two-fold. First, there could be proportionally more multi-domain proteins among the drug targets and the possibly druggable proteins.
Consequently, inclusion of a larger number of domains could increase the likelihood that these proteins host at least one druggable domain. However, our result could also mean that these proteins are more studied and understood, and thus their domain annotations are more complete. Moreover, close to half or more of the proteins in every considered dataset have domain annotations, which indicates that they are functionally annotated and suggests that our functional similarity analysis in Figure 3 should be robust.
This suggests that enrichment in the number of alternative splicing variants could serve as a marker for druggability. Alternative splicing was found to contribute to drug resistance (Siegfried and Karni; Zhao), which supports the veracity of our result. Interestingly, recent studies suggest that targeting alternative splicing events could lead to therapeutic opportunities (Le et al.). Our analysis also reveals that the majority of the drug targets and the possibly druggable proteins have multiple isoforms.
Thus, gene level analysis of drug targets may not be adequate, considering that these genes would encode multiple proteins. Overall, we identified three potential sequence-derived markers of druggability. The drug targets and possibly druggable proteins share lower numbers of conserved residues and are more likely to have multiple domains and isoforms when compared to the non-druggable proteins.
This study is the first to analyze two relevant sequence-derived structural characteristics that can be accurately predicted from the protein sequence: intrinsic disorder and solvent accessibility. Proteins with disordered regions are associated with a wide range of human diseases (Uversky et al.). We note that while authors in Kim et al. Figures 5A-C quantify two key aspects of the disorder: the overall content of disordered residues and the length of disordered regions.
Proteins with higher disorder content are functionally distinct from structured proteins while long disordered regions are thought to correspond to disordered protein domains (Tompa et al.). This is in agreement with a recent study that demonstrates that the current drug targets are biased to exclude disordered proteins (Hu et al.). There are several reasons for this bias.
The protein structures are used during the rational drug design process (Gane and Dean; Lundstrom; Mavromoustakos et al.). The structures are also indispensable for modeling associated with drug repurposing and repositioning (Moriaud et al.). In contrast, proteins with disordered regions are much less likely to have structures (Hu et al.). This coincides with the observation that disordered regions are capable of interactions with multiple partners (Oldfield et al.). Our results suggest that although low disorder amounts are a strong marker for the current drug targets, the set of possibly druggable proteins includes large amounts of disorder.
In fact, the disordered proteins may become the key to unlocking a substantial portion of yet to be discovered druggable targets (Uversky; Hu et al.). This could be driven by the fact that drug targets are often membrane proteins (Yildirim et al.). They are also mostly structured proteins (Hu et al.). Moreover, the presence of disordered regions on the protein surface also leads to an increase of the surface area compared to structured conformations (Wu et al.).
This again, as in the case of the results in Figure 4, shows that the possibly druggable proteins are more similar to drug targets than to the non-druggable proteins. Finally, we observe that the number of conserved residues on the putative surface (Figure 5E) maintains the same relation between the different protein sets as the overall number of conserved residues shown in Figure 4A, i.
Topological features of the PPI networks are among the most studied characteristics of the drug targets (Zhu et al.). A unique aspect of our analysis is that we focus on a set of orthogonal measures, i. This offers a more focused and balanced analysis given the high degree of similarity between many of these measures. Our results are in line with several prior studies that correspondingly show that drug targets have more connected and denser local network neighborhoods (Zhu et al.).
This finding suggests that drug targets are possibly more relevant biologically or are at a higher point of control and thus can better modify physiology, making them better therapeutic targets.
The novel element in our study is that we find that all considered network centrality measures for the possibly druggable proteins are even higher than for the drug targets orange vs. Consequently, they are also significantly higher than for the non-druggable proteins orange vs.
Thus, our study suggests that these measures can be used as markers of druggability. Panels A, B, C, and D concern the betweenness centrality, eigenvector centrality, closeness centrality, and information centrality measures, respectively. Figure 7 analyzes the abundance of the PPI network hubs among the drug targets, possibly druggable and non-druggable proteins.
A similarly large difference was observed in Mitsopoulos et al. Our study reveals additional important details. This suggests that high connectivity in the PPI network is a strong marker for druggability. Several studies analyzed cellular functions and subcellular locations of the drug targets (Lauss et al.). The green bars in Figure 8 provide a list of significantly enriched functions and locations for our set of drug targets. Our results indicate that most of the drug targets are enzymes, including kinases and oxidoreductases, followed by substantial numbers of channels, in particular ion channels.
They are often involved in binding, signaling, regulation, and transport. These findings are in close agreement with the results in Bakheet and Doig. Figure 8 also shows that drug targets are primarily found in membranes, with large numbers also found in the cytoplasm and the intracellular space.
Consistent results are found in Bakheet and Doig; Wang et al.
Figure 8 Molecular functions, processes, and subcellular locations that are enriched among the drug targets (D dataset) and the possibly druggable proteins (Nd dataset). The bars quantify the ratios of enrichment relative to the human proteome and the corresponding p-values are shown on the right.
GO terms are identified on the left, including their names and the number of the corresponding proteins in the given dataset.
This study is the first to perform this type of analysis for the possibly druggable proteins (orange bars in Figure 8). Our analysis suggests that the possibly druggable proteins share functional similarities with the drug targets. They are similarly involved in catalysis, signaling, and binding. However, the possibly druggable proteins tend to bind proteins and nucleic acids, instead of anions and ions, which are the main partners for the drug targets. Moreover, the possibly druggable proteins are often involved in the metabolic and biosynthesis processes, and in the cell death cycle.
The preference for the protein-protein and protein-nucleic acid binding and the cell death cycle involvement is supported by their significant enrichment in intrinsic disorder compared to the drug targets (see Figures 5A, B) and by the fact that disordered regions are known to facilitate these types of functions (Vuzman and Levy; Uversky et al.).
We further investigate this in Figure 9 that analyzes the differences in the content of the putative disordered protein-protein binding regions. These results confirm the enrichment in the corresponding functional annotations for the possibly druggable proteins. Interestingly, Figure 8 also reveals that the possibly druggable proteins are localized across the cell and they do not have a specifically associated subcellular location, unlike the drug targets that are found mostly in the membranes and cytoplasm.
Overall, our empirical analysis provides new insights into the cellular functions and subcellular locations of the druggable proteins. Recent research approximates that the druggable human proteome has about 4, proteins (Finan et al.). Annotation of the remaining druggable human proteins would facilitate development and screening of drugs, drug repurposing and repositioning, understanding and mitigation of drug side-effects, and prediction of drug-protein interactions.
We contrast the drug targets against the possibly druggable and non-druggable proteins to identify markers that could be used to identify novel druggable proteins. This is in contrast to the prior studies that compare drug targets against non-drug targets (Zheng et al.).
We annotate the possibly druggable and non-druggable proteins based on the presence and promiscuity of disease associations, and we validate these annotations via functional similarity analysis. We cover a wide range of sequence-derived characteristics to define these markers.
These characteristics can be computed across the entire human proteome, allowing for a complete sweep of all potential candidate proteins. We investigate several important characteristics that were missed in the past studies, including putative intrinsic disorder, residue-level conservation, presence and number of alternative splicing isoforms, inclusion of domains, and putative solvent accessibility (surface area), as well as the key features from the prior works, such as the topological features of PPIs, cellular functions and subcellular locations.
Figure 10 summarizes the results. It shows the difference in the values of the key markers when comparing the possibly druggable proteins (in orange), the non-druggable proteins (in brown), all non-drug targets (in red), and the expanded set of human and human-like drug targets (in light green) against the human drug targets (in dark green). We observe that the possibly druggable proteins are significantly more similar to the drug targets than the non-druggable proteins for the majority of the markers.
These markers include high abundance of alternative splicing isoforms, relatively large number of domains, higher degree of centrality in the corresponding PPI network (and correspondingly much higher rate of hubs), lower number of conserved residues, and lower number of residues on the putative sequence-derived surface.
Thus, these factors could serve as high-quality markers for druggability. This suggests that the high levels of disorder combined with the presence of the abovementioned markers should be used together to effectively enlarge the current collection of drug targets.
This is in accord with several recent studies that postulate inclusion of the disorder-enriched proteins into the set of druggable proteins (Cuchillo and Michel; Uversky; Chen and Tou; Joshi and Vendruscolo; Ambadipudi and Zweckstetter; Hu et al.).
The markers are sorted in the ascending order by the difference for the non-druggable proteins (in brown). Our analysis also shows that the possibly druggable proteins are functionally similar to the drug targets, being involved in catalysis, signaling, and binding. The main difference is that the possibly druggable proteins target interactions with proteins and nucleic acids, unlike the current drug targets that favor interactions with anions and ions.
Figure 10 points to the high amount of the disordered protein-binding regions for the possibly druggable proteins compared to the drug targets, which is in concert with the disordered nature of the druggable proteins. This is in agreement with the literature that shows that disordered regions often facilitate PPIs (Mohan et al.).
Finally, we show that the possibly druggable proteins are involved in the metabolic and biosynthesis processes and that they are localized across the cell, without a preference for specific subcellular locations. This is unlike the current drug targets that are located primarily in the membranes. To sum up, our empirical analysis has led us to formulate several markers that may help with identifying novel druggable human proteins and has produced interesting insights into the cellular functions and subcellular locations of potentially druggable proteins.
Overington, B. Al-Lazikani, A. Hopkins. Published 1 December. Nature Reviews Drug Discovery.
For the past decade, the number of molecular targets for approved drugs has been debated. Here, we reconcile apparently contradictory previous reports into a comprehensive survey, and propose a consensus number of current drug targets for all classes of approved therapeutic drugs. One striking feature is the relatively constant historical rate of target innovation (the rate at which drugs against new targets are launched); however, the rate of developing drugs against new families is…
The question is often raised, but the answer remains to be uncovered because the definition of drug "target" continues to evolve. Historical conceptualization is focused on catalytic sites, substrate binding sites, or epigenetic modification sites. Current understanding that protein-protein interactions are druggable, along with the emerging realization that "nodes" in signaling pathways and biological networks themselves can be manipulated with small molecules in non-traditional ways, has opened up new targeting options.
This review is intended to provide a status update, and you can also access a list of 36 actionable web resources for target hunting. Overington and colleagues concluded on the "basis of existing knowledge" that "all current drugs with a known mode-of-action act through distinct molecular drug targets" Ref.