Biobix: Applied Bioinformatics Research
Thesis topics
- Ongoing research:
  - Biomarker prediction / methylation
  - Metabonomics
  - Peptidomics
  - Translational biotechnology (text mining)
  - Structural genomics
  - miRNA prediction / target prediction
  - Exploring genomic dark matter (junk mining)
- Collaboration with various institutes
- Ambition to publish in peer-reviewed journals
The reason for bioinformatics to exist?
Empirical finding: if two biological sequences are sufficiently
similar, almost invariably they have similar biological functions and
are descended from a common ancestor. This implies that
(i) function is encoded in sequence: the sequence provides the syntax;
and
(ii) there is redundancy in the encoding: many positions in the
sequence may be changed without perceptible changes in function, so
the semantics of the encoding is robust.
Protein Structure
- Introduction: Why? How do proteins fold?
- Levels of protein structure: 0, 1, 2, 3, 4
- X-ray / NMR
- The Protein Data Bank (PDB)
- Protein Modeling
- Bioinformatics & Proteomics
- Weblems
Why protein structure?
- Proteins perform a variety of cellular tasks in living cells.
- Each protein adopts a particular fold that determines its function.
- The 3D structure of a protein can bring into close proximity residues that are far apart in the amino acid sequence.
- Catalytic site: the "business end" of the molecule.
Rationale for understanding protein structure and function
- Protein sequence: large numbers of sequences, including whole genomes
- From sequence to structure: structure determination, structure prediction
- Protein structure: three-dimensional, complicated, mediates function
- From structure to function: homology, rational mutagenesis, biochemical analysis, model studies
- Protein function:
  - rational drug design and treatment of disease
  - protein and genetic engineering
  - build networks to model cellular pathways
  - study organismal function and evolution
About the use of protein models (Peitsch)
- Structure is preserved under evolution when sequence is not.
- Interpreting the impact of mutations/SNPs and conserved residues on protein function; potential link to disease.
- Function?
  - Biochemical: the chemical interactions occurring in a protein
  - Biological: the role within the cell
  - Phenotypic: the role in the organism
  - Gene Ontology functional classification!
- Prioritization of residues to mutate to determine protein function.
- Providing hints for protein function: catalytic mechanisms of enzymes often require key residues to be close together in 3D space (protein-ligand complexes, rational drug design, putative interaction interfaces).
MIS-SENSE MUTATION
e.g. Sickle Cell Anaemia
Cause: defective haemoglobin due to a mutation in the β-globin gene
Symptoms: severe anaemia and death in homozygotes
Normal β-globin - 146 amino acids
val - his - leu - thr - pro - glu - glu - ...
 1     2     3     4     5     6     7

            Normal gene (aa 6)   Mutant gene
DNA         CTC                  CAC
mRNA        GAG                  GUG
Product     Glu                  Val

Mutant β-globin
val - his - leu - thr - pro - val - glu - ...
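The codon change on this slide can be traced in a few lines of Python. This is only a sketch: it assumes the DNA triplets shown (CTC, CAC) are the template strand written 3'->5', so that a base-by-base complement gives the mRNA codon, and the codon table is deliberately limited to the two codons discussed here. The names `transcribe_template` and `CODON_TO_AA` are illustrative.

```python
# Codon table restricted to the two codons on the slide (not a full genetic code).
CODON_TO_AA = {"GAG": "Glu", "GUG": "Val"}

# DNA template base -> complementary RNA base.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def transcribe_template(dna_template):
    """Complement a template-strand triplet (assumed written 3'->5') into mRNA."""
    return "".join(COMPLEMENT[base] for base in dna_template.upper())


for label, triplet in (("normal", "CTC"), ("mutant", "CAC")):
    codon = transcribe_template(triplet)
    print(label, triplet, codon, CODON_TO_AA[codon])
```

A single base change in the DNA (T to A in the middle position) is enough to swap glutamate for valine at position 6.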
Protein Conformation
- Christian Anfinsen: studies on reversible denaturation
- Sequence specifies conformation
- Chaperones and disulfide interchange enzymes are involved but do not control the final state; they provide an environment in which a misfolded protein can refold
- Structure implies function: the amino acid sequence encodes the protein's structural information
How does a protein fold?
By itself: Anfinsen developed what he called his "thermodynamic
hypothesis" of protein folding to explain the native conformation of
proteins. He theorized that the native, or natural, conformation
occurs because this particular shape is thermodynamically the most
stable in the intracellular environment. That is, it takes this shape
as a result of the constraints of the peptide bonds as modified by the
other chemical and physical properties of the amino acids. To test
this hypothesis, Anfinsen unfolded the enzyme RNase under extreme
chemical conditions and observed that the enzyme refolded
spontaneously back into its original form when he returned the
chemical environment to natural cellular conditions. "The native
conformation is determined by the totality of interatomic interactions
and hence by the amino acid sequence, in a given environment."
The Basics
- Proteins are linear heteropolymers: one or more polypeptide chains.
- Below about 40 residues the term peptide is frequently used.
- A certain number of residues is necessary to perform a particular biochemical function, and around 40-50 residues appears to be the lower limit for a functional domain size.
- Protein sizes range from this lower limit to several hundred residues in multi-functional proteins.
- The three-dimensional shapes (folds) adopted vary enormously.
- Experimental methods: X-ray crystallography, NMR (nuclear magnetic resonance), electron microscopy
- Ab initio calculations
Levels of protein structure
Zeroth: amino acid composition (proteomics, %cysteine, %glycine)
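The zeroth level can be computed directly from a sequence string. The sketch below assumes one-letter amino acid codes; the function name `composition` and the example sequence are made up for illustration.

```python
from collections import Counter


def composition(seq):
    """Return the percentage of each residue type in a one-letter sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}


comp = composition("MCGGACGK")  # hypothetical peptide
print("%cysteine:", comp.get("C", 0.0))
print("%glycine:", comp.get("G", 0.0))
```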
Amino Acid Residues
The basic structure of an α-amino acid is quite simple. R denotes any
one of the 20 possible side chains (see table below). The Cα atom has
4 different ligands (the H is omitted in the drawing) and is thus
chiral. An easy trick to remember the correct L-form is the CORN rule:
when the Cα atom is viewed with the H in front, the groups read
"CO-R-N" in a clockwise direction.
Levels of protein structure
Primary: this is simply the order of covalent linkages along the
polypeptide chain, i.e. the sequence itself.
Backbone Torsion Angles
Levels of protein structure
Secondary: local organization of the protein backbone: alpha-helix,
beta-strand (strands assemble into beta-sheets), turn and
interconnecting loop.
Ramachandran / Phi-Psi Plot
The alpha-helix
A Practical Approach: Interpretation
- Residues with hydrophobic properties conserved at i, i+2, i+4, separated by unconserved or hydrophilic residues, suggest surface beta-strands.
- A short run of hydrophobic amino acids (4 residues) suggests a buried beta-strand.
- Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggest an alpha-helix with one face packing in the protein core.
- Likewise, an i, i+3, i+4, i+7 pattern of conserved hydrophobic residues suggests an alpha-helix.
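These spacing rules can be tried out on a single sequence with a small pattern scanner. It is a rough sketch with two loud simplifications: it uses hydrophobicity in one sequence as a stand-in for conservation across an alignment, and the `HYDROPHOBIC` residue set is an assumed choice, not one given on the slide. It also checks only the pattern positions, not the residues in between.

```python
# Assumed set of hydrophobic residues (one-letter codes); adjust as needed.
HYDROPHOBIC = set("AVLIMFWC")


def find_pattern(seq, offsets):
    """Return start positions where every position start+offset is hydrophobic."""
    mask = [aa in HYDROPHOBIC for aa in seq.upper()]
    hits = []
    for start in range(len(mask)):
        if all(start + o < len(mask) and mask[start + o] for o in offsets):
            hits.append(start)
    return hits


STRAND_OFFSETS = (0, 2, 4)    # i, i+2, i+4: candidate surface beta-strand
HELIX_OFFSETS = (0, 3, 4, 7)  # i, i+3, i+4, i+7: candidate alpha-helix

print(find_pattern("LSLSLS", STRAND_OFFSETS))
print(find_pattern("LKKLLKKL", HELIX_OFFSETS))
```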
Beta-sheets
Topologies of Beta-sheets
Secondary structure prediction ?
Secondary structure prediction: CHOU-FASMAN
- Chou, P.Y. and Fasman, G.D. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 13, 211-221.
- Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13, 222-245.
Secondary structure prediction: CHOU-FASMAN
Method
- Assigning a set of prediction values to each residue, based on statistical analysis of 15 proteins
- Applying a simple algorithm to those numbers
Secondary structure prediction: CHOU-FASMAN
Calculation of preference parameters
For each of the 20 residues and each secondary structure (α-helix, β-sheet and β-turn):

    P = log(observed counts / expected counts) + 1.0

- Preference parameter > 1.0: the residue has a preference for the specific secondary structure.
- Preference parameter = 1.0: the residue neither prefers nor dislikes the specific secondary structure.
- Preference parameter < 1.0: the residue dislikes the specific secondary structure.
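The preference parameter as written on this slide is simple to evaluate. The sketch below assumes a natural logarithm, since the slide does not state the base; the sign of P - 1.0 does not depend on that choice. The counts in the example are invented.

```python
import math


def preference(observed, expected):
    """P = log(observed / expected) + 1.0 (natural log assumed)."""
    return math.log(observed / expected) + 1.0


def interpret(p, tol=1e-9):
    """Map a preference parameter onto the three cases listed on the slide."""
    if p > 1.0 + tol:
        return "prefers"
    if p < 1.0 - tol:
        return "dislikes"
    return "neutral"


# Hypothetical counts of one residue type inside helices.
p = preference(observed=30, expected=20)
print(round(p, 3), interpret(p))
```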
Secondary structure prediction: CHOU-FASMAN
Applying the algorithm
1. Assign parameters to each residue.
2. Identify regions where 4 out of 6 residues have P(a) > 100: α-helix. Extend the helix in both directions until four contiguous residues have an average P(a) < 100. If the average P(a) > P(b) over the segment: α-helix.
3. Repeat this procedure to locate all of the helical regions.
4. Identify regions where 3 out of 5 residues have P(b) > 100: β-sheet. Extend the sheet in both directions until four contiguous residues have an average P(b) < 100. If the average P(b) > 105 and P(b) > P(a): β-sheet.
5. Rest: P(a) > P(b): α-helix. P(b) > P(a): β-sheet.
6. To identify a bend at residue number i, calculate the following value:
       p(t) = f(i) f(i+1) f(i+2) f(i+3)
   If (1) p(t) > 0.000075; (2) the average P(t) > 1.00 in the tetrapeptide; and (3) the averages for the tetrapeptide obey P(a) < P(t) > P(b): β-turn.
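The helix part of the procedure (nucleation in a 6-residue window, then extension until a 4-residue window averages below 100) can be sketched as follows. This is a simplified illustration, not the published implementation: the `P_ALPHA` table holds placeholder values for six residue types only and should not be taken as the published parameters, and the comparison against P(b) is left out.

```python
# Placeholder helix parameters (x100 scale) for illustration only.
P_ALPHA = {"A": 142, "E": 151, "L": 121, "K": 116, "G": 57, "P": 57}


def find_helix_nuclei(seq, p_alpha, window=6, min_hits=4, threshold=100):
    """Start indices of windows where at least min_hits residues exceed threshold."""
    starts = []
    for i in range(len(seq) - window + 1):
        hits = sum(1 for aa in seq[i:i + window] if p_alpha[aa] > threshold)
        if hits >= min_hits:
            starts.append(i)
    return starts


def extend_helix(seq, p_alpha, start, window=6, threshold=100):
    """Grow the nucleus [start, start+window) while the terminal tetrapeptide
    (including the candidate residue) averages at least threshold.
    Returns a half-open (left, right) index pair."""
    def average(lo, hi):
        values = [p_alpha[aa] for aa in seq[lo:hi]]
        return sum(values) / len(values)

    left, right = start, start + window
    while right < len(seq) and average(right - 3, right + 1) >= threshold:
        right += 1
    while left > 0 and average(left - 1, left + 3) >= threshold:
        left -= 1
    return left, right


seq = "GPGAELKAELGPG"  # hypothetical test sequence
nuclei = find_helix_nuclei(seq, P_ALPHA)
print(nuclei)
print(extend_helix(seq, P_ALPHA, nuclei[0]))
```

Passing the parameter table in as an argument keeps the scan independent of any particular set of published values.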
Secondary structure prediction: CHOU-FASMAN
Successful method?
19 proteins evaluated:
- Successful in locating 88% of helical and 95% of β regions
- Correctly predicting 80% of helical and 86% of β-sheet residues
- Accuracy of predicting the three conformational states (helix, β, and coil) for all residues is 77%
Chou & Fasman: successful method
After 1974: improvement of preference parameters
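A three-state accuracy of this kind is the percentage of residues whose predicted state matches the observed one. The sketch below assumes both assignments are given as equal-length strings over H (helix), E (β) and C (coil); the function name `three_state_accuracy` and the example strings are illustrative.

```python
def three_state_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed states agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty assignments of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


print(three_state_accuracy("HHHEECCC", "HHHEEECC"))
```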
Sander-Schneider: Evolution of overall structure
Naturally occurring sequences with more than 20% sequence identity
over 80 or more residues always adopt the same basic structure (Sander
and Schneider 1991).
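The rule can be applied to a pairwise alignment by counting identical positions. The sketch assumes the two sequences are already aligned, equal in length, and use '-' for gaps; the thresholds are the ones quoted on the slide, and the function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, number of aligned residue pairs), ignoring gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs), len(pairs)


def same_structure_expected(aligned_a, aligned_b, min_identity=20.0, min_length=80):
    """Apply the slide's rule: more than 20% identity over 80 or more residues."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity > min_identity and length >= min_length


a = "ACDE" * 25  # hypothetical aligned sequences, 100 columns
b = "ACDF" * 25
print(percent_identity(a, b), same_structure_expected(a, b))
```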
Structural Family Databases
- SCOP: Structural Classification of Proteins
- FSSP: Family of Structurally Similar Proteins
- CATH: Class, Architecture, Topology, Homology
Levels of protein structure
Tertiary: packing of secondary structure elements into a compact
spatial unit.
Fold or domain: this is the level to which structure prediction is
currently possible.
Domains
Protein Architecture
Domains
- Protein dissection into domains
- Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence (automatic when running BLAST)
Domains
- From the analysis of alignments of protein families
- Conserved sequence features, usually associated with a specific function
- PROSITE database of protein signatures (large number of false positives and false negatives)
- From alignment of homologous sequences (PRINTS/PRODOM)
- From Hidden Markov Models (PFAM)
- Meta approach: INTERPRO
The positive inside rule (EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41)
Fraction of Lys + Arg (KR) residues on each side of the membrane:

Membrane            In       Out
Bacterial IM        16% KR   4% KR
Eukaryotic PM       17% KR   7% KR
Thylakoid membrane  13% KR   5% KR
Mitochondrial IM    10% KR   3% KR
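The same statistic can be computed for the loops on either side of a predicted membrane segment. The sketch below is an illustration only: the loop sequences are invented, and calling the side with the higher Lys + Arg percentage "inside" is a simplified reading of the rule.

```python
def kr_percentage(segment):
    """Percentage of Lys (K) and Arg (R) residues in a loop sequence."""
    seg = segment.upper()
    if not seg:
        return 0.0
    return 100.0 * (seg.count("K") + seg.count("R")) / len(seg)


def likely_inside(loop_a, loop_b):
    """Return 'a' or 'b' for the loop richer in K + R (ties go to 'a')."""
    return "a" if kr_percentage(loop_a) >= kr_percentage(loop_b) else "b"


loop_a = "MKRSLKAR"  # hypothetical loops flanking one transmembrane helix
loop_b = "DSTGNEQA"
print(kr_percentage(loop_a), kr_percentage(loop_b), likely_inside(loop_a, loop_b))
```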
GPCR Topology
- Membrane-bound receptors
- Transducing messages such as photons, organic odorants, nucleotides, nucleosides, peptides, lipids and proteins
- 6 different families
- A very large number of different domains, both to bind their ligand and to activate G proteins
- Pharmaceutically the most important class
- Challenge: methods to find novel GPCRs in the human genome
GPCR Topology
GPCR Structure
- Seven transmembrane regions
- Hydrophobic / hydrophilic domains
- Conserved residues and motifs (e.g. NPXXY)
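A motif such as NPXXY, with X standing for any residue, maps directly onto a regular expression. The sketch below uses a lookahead so that overlapping occurrences are also reported; the example sequence is made up, and a motif hit alone does not identify a GPCR.

```python
import re

# N, P, any two residues, then Y; the lookahead allows overlapping matches.
NPXXY = re.compile(r"(?=NP..Y)")


def find_npxxy(seq):
    """Return 0-based start positions of NPXXY motifs in a one-letter sequence."""
    return [m.start() for m in NPXXY.finditer(seq.upper())]


print(find_npxxy("AAANPLIYAA"))
```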
Levels of protein structure
Quaternary: difficult to predict
Functional units: apoptosome, proteasome
What is X-ray Crystallography
X-ray crystallography is an experimental technique that exploits the
fact that X-rays are diffracted by crystals. X-rays have the proper
wavelength (in the ångström range, ~10^-8 cm) to be scattered by the
electron cloud of an atom of comparable size. Based on the diffraction
pattern obtained from X-ray scattering off the periodic assembly of
molecules or atoms in the crystal, the electron density can be
reconstructed. A model is then progressively built into the
experimental electron density and refined against the data; the result
is a quite accurate molecular structure.
NMR or Crystallography?
NMR uses protein in solution
- Can look at the dynamic properties of the protein structure
- Can look at the interactions between the protein and ligands, substrates or other proteins
- Can look at protein folding
- Sample is not damaged in any way
- The maximum size of a protein for NMR structure determination is ~30 kDa. This eliminates ~50% of all proteins
- High solubility is a requirement
X-ray crystallography uses protein crystals
- No size limit: as long as you can crystallise it
- Solubility requirement is less stringent
- Simple definition of resolution
- Direct calculation from data to electron density and back again
- Crystallisation is the process bottleneck: binary (all or nothing)
- Phase problem: relies on heavy atom soaks or SeMet incorporation
Both techniques require large amounts of pure protein and expensive equipment!
PDB
Visualizing Structures
Cn3D version 4.0 (NCBI)
Visualizing Structures
- Ball: Van der Waals radius; N, blue / O, red / S, yellow / C, gray (green)
- Stick: joins the centers of bonded atoms
Visualizing Structures From N to C
Visualizing Structures
Demonstration of Protein Explorer
- PDB, install Chime
- Search helicase (select a structure where DNA is present)
- Stop spinning, hide water molecules
- Show basic residues, which interact with the negatively charged backbone
- RASMOL / Cn3D
Modeling
Protein Structure
Molecular Modeling: building a 3D protein structure from its sequence
Modeling
Finding a structural homologue
- BLAST against the PDB database, or PSI-BLAST (E …
- … 40% identity: any alignment method is OK
- 5.0 ~ 3.0 ~ 2.5 ~ 2.0
CASP4: overall model accuracy ranging from 1 to 6 Å for 50-10% sequence identity

Target      Accuracy   Residues   Sequence identity
T128/sodm   1.0 Å      198        50%
T111/eno    1.7 Å      430        51%
T122/trpa   2.9 Å      241        33%
T125/sp18   4.4 Å      137        24%
T112/dhso   4.9 Å      348        24%
T92/yeco    5.6 Å      104        12%
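Model accuracy of this kind is commonly expressed as a root-mean-square deviation (RMSD) between equivalent atoms of the model and the experimental structure. Treating the CASP4 figures above as RMSD values is an assumption here, since the slide does not name the measure. The sketch computes a plain coordinate RMSD for two already superimposed sets of (x, y, z) positions and does not perform the superposition itself; the coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]      # hypothetical C-alpha positions
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(model, reference), 3))
```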