Document Type : Original Research Article
Department of Chemistry, Faculty of Physical Science, Ahmadu Bello University, P.M.B 1045, Zaria, Kaduna State Nigeria
Epidermal growth factor receptor (EGFR) belongs to the tyrosine kinase receptor family and plays a significant role in critical cellular procedures in many cancers. EGFR was also identified as the main target for combating of tumor related-illness such as non-small cell lung cancer (NSCLC). NSCLC was the most common and lethal type of lung cancer, with nearly 1.8 million cases and less than 20% survival rate in every 5-years after diagnosis. This research aimed to identify potential EGFR-tyrosine kinase inhibitors (TKIs) using computer-aided techniques. The virtual screening executed identified compounds C4, C6, C14, and C26 as the hit compounds, with compound C6 as the best hit with MolDock score: -138.245 kcal/mol, re-rank score: -116.868 and pose energy: -79.4185, respectively. All the identified hit EGFR-TKIs were seen to have a higher MolDock score than the reference drug Afatinib with a MolDock score of -112.894 kcal/mol. Based on the quantum chemical calculations, compound C4 was the most reactive among the hit, with a minor energy gap of 3.20 eV. The best hit EGFR-TKIs were ascertained to be drug-like and orally bioavailable due to their compliance with the filtering criteria used in evaluating their drug-likeness. Furthermore, their average pharmacokinetic profiles were displayed based on their absorption, distribution, metabolism, excretion, and toxicity (ADMET) features. These hit EGFR-TKIs can serve as potential EGFR-TKIs because of their affinity towards EGFR-TK receptor, reactivity and safety.
In 1956, Stanley Cohen discovered EGFR while working at the University of Vanderbilt, USA . As disseminated in the mammalian cell surfaces and cells epithelial, ﬁbroblasts, cells glial, keratinocytes, EGFR which belongs to tyrosine kinase receptor family plays a significant part in crucial cellular processes which include propagation, variation, movement, apoptosis, and angiogenesis and in many cancers. EGFR overexpression has been described in several cancers. EGFR was also identified as the primary target for combating of tumor-related illnesses, such as NSCLC [2-5]. Consequently, developing potent EGFR inhibitors is one of the research hotspots for managing human tumors like lung cancer (most especially non-small cell lung cancer) [6, 7].
Lung cancer remains the foremost cause of tumor-related death worldwide. This tumor (lung cancer) is divided into Oat cell lung cancer (known as small cell lung cancer) and non-small cell lung cancer, respectively . NSCLC, which comprises bulky cell carcinoma, glandular carcinoma, and squamous cell carcinoma, was the most common and lethal type of tumor (cancers of the lung), with close to 2.0 million cases and about 17.0% rate of survival every five years after diagnosis at all stages and 2.0% rate of survival in every 5-years at the fourth (IV) stage [9, 10]. NSCLC accounted for about 80% to 85% of lung cancers, while Oat cell lung cancer accounted for about 15 to 20% of remaining lung cancer cases . The critical approaches used in the management of NSCLC are operation, radio-therapy targeted-therapy, and chemotherapy. Despite of development in the management modalities, the diagnosis in NSCLC patients has not meaningfully improved .
EGFR-TKIs have been used to manage NSCLC among patients with the mutated activated, classical, and gatekeeper classes of EGFR mutations [13, 14]. EGFR-TKIs targeting the NSCLC, such as Erlotinib and Gefitinib (first-generation EGFR-TKIs), can inhibit tumor growth by binding to the EGFR ATP-binding site as well as providing substantial medical assistance in patients with NSCLC by harboring the EGFR activating (L858R) and classical (del. E746-A750) mutations in exon 19 and exon 21, respectively . However, the time frame for their efficacy was minimal due to the resistance established by the EGFRT790M (the gatekeeper mutation), which was observed in about 50% of patients with EGFRT790M mutation . Afatinib and dacomitinib (second-generation EGFR-TKIs) were developed to resolve the induced EGFRT790M mutation-related resistance to the EGFR-TKIs of the first-generation [15, 17]. Unfortunately, their clinical effectiveness was limited due to serious side effects such as rashes on the skin and toxicity of gastrointestinal as well as the nonexistence of choices between EGFR mutant and its wild type (EGFRWT) [17, 18]. Osimertinib) and rociletinib (third-generation EGFR-TKIs) were designed and established to inhibit the EGFRT790M resistance mutation while being more selective to the EGFRWT and overcoming the observed toxicity to the irreversible EGFR-TKIs of the second-generation. However, these irreversible third-generation EGFR-TKIs cannot achieve the stated goal due to the emergence of C797S mutation [17, 19, 20].
Pawara and a co-worker reported computational identification of 2,4-disubstituted amino-pyrimidines as L858R/T790M-EGFR double mutant inhibitors using pharmacophore mapping, molecular docking, binding free energy calculation, DFT study, and molecular dynamic simulation . Pharmacophore modeling, 3D-QSAR, docking, and ADME prediction of quinazoline-based EGFR inhibitors were reported by shaquiquzzaman in 2016 . This research aims to identify potential EGFR-TKIs by employing some computer-aided techniques; structure-based virtual screening, quantum mechanical calculations, drug-likeness, and ADMET properties evaluation.
Materials and Methods
Dataset, structure generation and optimization
Thirty (30) sets of quinazoline analogs bearing aryl semicarbazone scaffolds as potent EGFR-TKIs were designed, synthesized and assessed for their anti-proliferative activities against four different cancer cell lines: A549, HepG2, MCF-7, PC-3 and EGFR-TK by Tu et al. were sourced and used in this work . The software developed by the University of Cambridge (Chemdraw) was used in drawing the two-dimensional structures of the sourced dataset . In this work, the MMFF with density functional Theory (DFT) at B3LYP/6-311G* theory level (which gives better results for neutral molecules) was used for the optimum conformational search for all the thirty (30) sets of quinazoline analogs [17, 23].
Preparation of the quinazoline analogues (Ligands)
The optimum conformations of the thirty sets of quinazoline analogues obtained were saved in mol2 file format. The thirty sets of quinazoline analogues were further prepared using the default setting of the molegro virtual docker (MVD) by selecting "if missing" to all set of the parameters, respectively [24, 25].
Identification of the amino acids in the active site of the 3IKA-EGFR-TK receptor
The crystal structure of the 3IKA-EGFR-TK receptor, together with WZ4002 (as its co-crystalized ligand) was retrieved from the RCSB pdb database (https://www.rcsb.org/), respectively . In this work, for the identification of the group of amino acids in the binding poses of the 3IKA-EGFR-TK receptor, the co-crystalized ligand (WZ4002) fused to the 3IKA-EGFR-TK receptor were visualized with version 184.108.40.20650 of discovery studio. The group of amino acids recognized in the binding poses of the 3IKA-EGFR-TK receptor were ALA743, LEU844, LEU718, MET793, MET790, and VAL726, respectively . The 2D view of the native ligand in the binding poses of the 3IKA-EGFR-TK receptor is presented in Figure 1.
Figure 1. The WZ4002 in the binding poses of 3IKA-EGFR-TK receptor in 2D-view
Preparation of 3IKA-EGFR-TK receptor
Before investigating the protein-ligand interactions, the 3IKA-EGFR-TK receptor was prepared by importing it onto the 3D view space (interface) of the MVD. Then, a group of amino acids with structure errors were reconstructed and restored. Furthermore, after re-building and repairing the groups of amino acids with errors in their structures, the surface was molded, and cavities were detected before removing the WZ4002 from the 3IKA-EGFR-TK receptor, respectively .
Molecular docking calculations and validation of the molecular docking procedure
The molecular docking protocol was implemented by choosing the molecular docking algorithm to be the plant score and the scoring function to be the MolDock score, respectively. The grid box for defining the binding poses that consist of the binding cavities in the protein was set to be 24Å. For all other calculations, default settings were sustained and preserved . To successfully validate the molecular docking protocol, the co-crystallized ligand (the best hit compound 6) is re-docked into the binding pose of the 3IKA-EGFR-TK receptor, and the root means square deviation (RMSD) value is calculated .
Quantum chemical calculations
Determining the structural behavior of the best hit compounds at the end of the molecular docking-based virtual screening is of utmost importance to see how structural orientation influences the best hit compound's biological activities. For this purpose, DFT computations were performed to gain insight into the comprehensive information in structure, electronics, and energy states for the best hit compounds. The computation of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) was achieved using spartan 14 software at the B3LYP theory level, using 6-31G* basis set and hybrid DFT, respectively [31-33]. The quantum chemical descriptors (chemical hardness (η), softness (δ), electronegativity (χ), and chemical potential (µ)) were computed and were derived from the HOMO and LUMO energies of the investigated compounds .
Drug-like and ADMET-Pharmacokinetics Studies
The modeling of the drug-like and ADMET features of these investigated compounds were evaluated using the SWISSADME (http://www.swissadme.ch/index.php) and pkCSM (http://structure.bioc.cam.ac.uk/pkcsm) online web servers, respectively [22, 35, 36].
Result and Discussions
Structure-based (molecular docking) virtual screening
The molecular docking as a tool in structure-based design is performed to virtually screen all the investigated EGFR-TKIs and identify which among them has the highest affinity against EGFR-Tk receptor (as the best hit) that can serve as potential EGFR-TKI, respectively. The investigated compounds were scored based on their MolDock score values. The MolDock score values of the investigated compounds range between -85.0697 to -138.245 kcal/mol, respectively. Based on the virtual screening performed on the investigated compounds, compound C6 with a MolDock score of -138.245 kcal/mol and -116.868 as its re-rank score was identified to be the best hit, followed by C14 with a MolDock score of -137.792 kcal/mol and -117.998 as its re-rank score, followed by C4 with a MolDock score of -136.397 kcal/mol and -113.395 as its re-rank score and then C26 with a MolDock score of -133.261 kcal/mol and -112.208 as its re-rank score, respectively (Table 1). The docking parameters (such as the MolDock and re-rank scores) for the hit compounds were higher than the minimum recommended values of -90.00 kcal/mol and -80.0, respectively .
Table 1. Ligand-protein interactions of the identified four (4) top hit compounds
The interactions between C6 and the binding poses of the 3IKA-EGFR-TK receptor observed were hydrogen bonds with these set of amino acids LYS745 (2.30Å), ASN842 (3.03Å), ARG841 (2.50Å) and GLU762 (2.64Å), hydrophobic interactions with these set of amino acids PHE723, LEU792, LEU718, VAL726, ALA743, LEU844 and ARG841, electrostatic interaction with ASP855 residue, and other interactions with MET790 and CYS797 group of amino acids as shown in Table 1 and Figure 2A, respectively.
The interactions of C14 with the binding poses of the 3IKA-EGFR-TK receptor were via hydrogen bonds with amino acid residue PRO794 (2.85Å), hydrophobic interactions with these set of amino acids LEU718, GLY796, PHE723, ALA743, MET790 and LEU844, electrostatic interaction with ASP855 (2) residue, and halogen interactions with ASP800 amino acid as clearly presented in Table 2 and Figure 2B, respectively.
The interactions between C4 and the active site of the 3IKA-EGFR-TK receptor were observed to be with these group of amino acids CYS797 (2.13Å), ARG841 (2.86Å), ARG841 (2.56Å) through hydrogen bonds. Hydrophobic interactions with this set of amino acid residues ALA743, MET793, LEU844, CYS797, LEU799, ARG841, and LEU718 were also seen. The electrostatic interaction with this set of amino acid ASP800 was seen in Table 1 and Figure 2C.
C26 interacted with the binding poses of the 3IKA-EGFR-TK receptor with these set of amino acids LYS745 (2.56Å), THR854 (2.56Å), and GLN791 (2.56Å) via hydrogen bond. It also interacted with this set of amino acids LEU718, LEU858, LEU792, PHE723, ALA743, LEU844, MET793, and LEU844 via hydrophobic interactions. Apart from hydrogen bond and hydrophobic interactions, it further interacted with this set of amino acids LYS745 and MET790 via electrostatic and other interactions, as depicted in Table 1 and Figure 2D, respectively.
Figure 2. 2D interaction of (A) C6, (B) C14, (C) C6 and (D) C26 in complex with 3IKA-EGFR-TK receptor
Afatinib, an FDA-accepted drug, was also docked into the active site of the 3IKA- EGFR-TK receptor to compare the reference drug (Afatinib) and the identified best-hit compounds. Afatinib with a MolDock score of 112.894 kcal/mol and a re-rank score of -91.2323 interacted via hydrogen bond interactions with this set of amino acid residues: MET793 (1.80Å), LEU792 (2.74Å), GLN791 (2.64Å), PRO794 (2.45Å) and MET793 (2.34Å), hydrophobic interactions with these sets of residues LEU718 (2), GLY719, ALA743, MET793, LEU844, PHE723 and VAL726, and halogen interactions with SER720 residues as presented in Table 1 and Figure 3A, respectively. All the identified hit compounds were seen to have better MolDock scores and re-rank scores than Afatinib, the reference drug.
The native ligand (compound 6) was re-docked into the binding poses of the 3IKA-EGFR-TK receptor to successfully validate the molecular docking procedure, and the root means square deviation (RMSD) value was calculated to be 3.47Å which indicates that the compound 6 deviated from its initial geometry with an RMSD value of 3.47Å, respectively. This further validates the reproducible molecular docking procedure employed in this study. The 3D interactions of the superimposed ligands (re-docked compound 6 and the co-crystallized ligand 6) in the binding pose of the 3IKA-EGFR-TK receptor are demonstrated in Figure 3B.
Figure 3. 2D interactions of (A) Afatinib in complex with 3IKA-EGFR-TK receptor and (B) 2D interaction of re-docked compound 6 and the co-crystallized ligand 6 in the binding pose of 3IKA-EGFR-TK receptor
Quantum Chemical Calculations
The energy DFT calculations on the best hit EGFR-TKIs was performed at the B3LYP/6-31G* level of theory. Based on these calculations, compound C4 has the least energy gap (ΔE) of 3.20 eV (Table 2), which is the most reactive among them and third-ranked in terms of affinity towards the 3IKA-EGFR-TK receptor. The trend in terms of increasing order in their reactivity is according to: C4 (3.20eV) > C14 (3.42eV) > C6 (4.02eV) > C26 (4.12eV), respectively. The reactivity of the best hit molecules was further confirmed by the values of the quantum chemical descriptors (η, δ, χ and µ), which were derivatives of the HOMO and LUMO energies. The HOMO and LUMO maps of the reactive compound (compound C4) is shown in Figure 4A and B. The molecular electrostatic potential surface (MEPS) map describes the charge propagation and distribution (positive and negative), which gives a better understanding of a molecule's chemical and physical features. It also forecasts the reactive sites for a nucleophilic or electrophilic attack in a molecule for binding to a receptor protein in a ligand-protein interaction. MEPS map of the reactive compound (compound C4) is depicted in Figure 4C. From the MEPS maps, the red color indicates negative potential (nucleophilic region), which has the affinity to attract proton; the blue color represents the positive potential (nucleophilic region) which has the affinity to reject proton, and the green color represents zero potential. For the studied molecules, the blue regions are located on the amino groups of the molecules. In contrast, their red colors are located on the oxygen and nitrogen atoms of the molecules. And the green colors were distributed on the carbon atoms, respectively.
Table 2. HOMO, LUMO, Energy gap, and quantum chemical descriptors of the hit compounds
Chemical Hardness: η, Chemical Softness: δ, electronegativity: χ, and Chemical potential: µ.
Figure 4. The (A) HOMO, (B) LUMO and (C) MEP maps of compound C4
Drug-like and ADMET evaluation
SwissADME web server was utilized to theoretically predict the drug-likeness of the investigated EGFR-TKIs, including the best hit identified by adopting the filtering conditions of Lipinski's rule of five . Any small molecule that violates more than 1 of the stated condition may have issues related to its bioavailability. The drug-like features of the best-hit EGFR-TKIs showed only 1 violation (that is, their molecular weight was more than 500) (Table 3). All the best-hit EGFR-TKIs have their number hydrogen of bond donors between 3 and 4, which is within the accepted range. Their number of hydrogen bond acceptors is between 7 and 9, which is also within the accepted range. Furthermore, the calculated MlogP value was never beyond the accepted range of 5. Based on filtering conditions, the best-hit EGFR-TKIs were ascertained to be drug-like by not violating more than the threshold values set by the filtering conditions. Furthermore, according to the value of their bioavailability score (0.55 for all), they were all further confirmed to be orally bioavailable and pharmacologically active.
Table 3. Drug-like features of the best hit compounds based on Lipinski's RO5 filtering criteria
The online web server pkCSM was also used to theoretically evaluate the ADMET features of the investigated EGFR-TKIs, including the best hit compounds identified. The best identified hit EGFR-TKIs were all observed to have human intestinal absorption (HIA) between the range of 88.462 to 95.723%, respectively. The values of their HIA were found to be greater than the minimum recommended rate of 30% set for the assessment of this property. This signals that these EGFR-TKIs can be absorbed within the human intestine. The recognized threshold value set for the blood-brain barrier (BBB) permeability is < -1 to > 0.3 and central nervous system (CNS) permeability is > -2 to < -3, respectively. The BBB permeability for these hit EGFR-TKIs were observed to be <-1; this means that the compounds can poorly penetrate via the BBB. For their CNS permeability value, it is > -3 for all, except for C6, the best hit (-2.795), which indicates that they can as well poorly penetrate the CNS except C6, respectively. Cytochrome is important in the enzymatic breakdown/ metabolism of small molecules in the body. As such, it becomes necessary to consider the breakdown/metabolism of these small molecules in the human body. The following group of cytochromes (CYP) 1A2, 2C9, 2C19, 2D6, and 3A4 were responsible for the breaking down/metabolism of small molecules in the body, respectively. However, the significant one in the set is CYP3A4 (a good small molecule is expected to be a substrate and an inhibitor of CYP3A4). The best-recognized hits EGFR-TKIs were all substrates and inhibitors of CYP3A4, respectively. Based on this, it is further confirmed that these small molecules can be a breakdown in the body. Another important factor is how these small molecules can be removed from the body (excretion/total clearance). Excretion/total clearance defines the relationship between the removal rate and concentration within the body. The identified hit compounds displayed a greater excretion value and were found to be in the acknowledged threshold for a drug. Based on the toxicity test, they were all observed to be non-toxic. Finally, the general ADMET features of these best identified hit EGFR-TKIs displayed their average pharmacokinetic profiles (Table 4).
Table 4. ADMET-Pharmacokinetic features of the best hit compounds
The molecular docking-based virtual screening executed identified compounds C6, C14, C4, and C26 with the highest MolDock scores as the best hit EGFR-TKIs among the investigated compounds, respectively. The MolDock score values of the investigated EGFR-TKIs range between -85.0697 to -138.245 kcal/mol, respectively. Compound C6 has the highest MolDock score of -138.245 kcal/mol and a re-rank score of -116.868, respectively. All the identified hit EGFR-TKIs were seen to have higher MolDock scores than the reference drug Afatinib with a MolDock score of -112.894 kcal/mol and a re-rank score of -95.8943. The DFT calculations identified compound C4 as the most reactive among them, with the least energy gap (ΔE) of 3.20 eV, respectively. The best hit EGFR-TKIs were ascertained to be drug-like due to their compliance with the filtering criteria used in the evaluation of drug-likeness of a small molecule (or by not having more than one violation of the filtering conditions used).
Furthermore, based on the value of their bioavailability score of 0.55, they were all confirmed to be orally bioavailable and active pharmacologically. And their ADMET features displayed their average pharmacokinetic profiles. These hit compounds can serve as potential EGFR-TKIs because of their safety and efficacy after passing the pre-clinical trial. Also, they can serve as a template for designing new EGFR-TKIs.
The authors sincerely acknowledged the Tertiary Education Trust Fund (Tetfund) for sponsoring this research under the Tetfund IBR research project grant (identification number: TETF/DR&D/UNI/ZARIA/IBR/2020/VOL.1/52) as well as Ahmadu Bello University, Zaria for its technical support in the course of this research.
The authors declare no conflict of interest.
M. T. Ibrahim : 0000-0002-6146-5667
A. Uzairu : 0000-0002-6973-6361