Biology 403, Third Lecture
Wednesday 20 January 2010

Amino Acids and Peptides


Proteins are made up of amino acids, and free amino acids and oligomers of amino acids (peptides) play important roles in cells. We discuss amino acids and peptides in terms of their structures and functions.

Acid-base equilibrium

Almost all biochemical reactions take place in aqueous solution, which always contains some protons (hydrogen ions) and hydroxide ions. By definition the pH of a solution is the negative base-10 logarithm of the hydrogen ion concentration; similarly, pOH is the negative logarithm of the hydroxide ion concentration. The product of the hydrogen ion concentration and the hydroxide ion concentration is very nearly 10^-14 M^2 (at 25 °C), so
-log([H+][OH-]) = 14 = -log([H+]) - log([OH-]) = pH + pOH.

Neutral pH refers to the condition under which pH = pOH; clearly pH = 7 under those conditions. Pure water is at pH=pOH=7, but pH=7 can occur even with externally-introduced ions present, provided that the externally introduced ions yield the same number of protons as hydroxide ions.
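As a quick numerical check of these definitions, the short Python sketch below computes pH and pOH from a few assumed hydrogen ion concentrations. It takes the ion product of water as exactly 10^-14 M^2, an approximation that holds near 25 °C; the function names are illustrative only.

```python
import math

KW = 1.0e-14  # ion product of water in M^2; approximately true near 25 °C

def ph_from_h(h_conc: float) -> float:
    """pH = -log10[H+], with [H+] in mol/L."""
    return -math.log10(h_conc)

def poh_from_h(h_conc: float) -> float:
    """pOH = -log10[OH-], where [OH-] = Kw / [H+]."""
    return -math.log10(KW / h_conc)

for h in (1e-7, 1e-3, 1e-11):
    ph, poh = ph_from_h(h), poh_from_h(h)
    # pH + pOH should come out as 14 in every case
    print(f"[H+] = {h:.0e} M: pH = {ph:.2f}, pOH = {poh:.2f}, sum = {ph + poh:.2f}")
```

The first case, [H+] = 10^-7 M, is the neutral condition, where pH and pOH are both 7.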

The Henderson-Hasselbalch equation

A fundamental equation regarding ions in aqueous solution relates the acid-base equilibrium to the pH:
pH = pKa + log([base] / [acid])
where pKa = -log10 Ka, and Ka is the equilibrium constant for the ionization (acid dissociation) of the solute.

This is the Henderson-Hasselbalch equation. With it we can determine the ratio [base]/[acid] for a solute at a given pH. Several of the problems in chapter 3 depend on this equation. We can actually derive it rather easily. The chemical reaction in which an acid becomes ionized can be represented as
HA ↔ H+ + A-
Therefore the equilibrium constant for this process is
Keq = [products]/[reactants] = [A-][H+] / [HA]
This particular equilibrium constant is so frequently analyzed that it has a particular name, namely
Ka ≡ Keq for this reaction.
Thus Ka = [A-][H+] / [HA]
We define the pKa of a reaction to be the negative base-10 logarithm of this equilibrium constant, i.e.
pKa ≡ -log10Ka
But that means that
pKa = -log10 ([A-][H+] / [HA]) = -log10 (([A-] / [HA]) * [H+])
But the logarithm of a product is the sum of the logs, so
pKa = -(log10([A-] / [HA]) + log10[H+]) = -log10([A-] / [HA]) - log10[H+]
But that second term (-log10[H+]) is, by definition, the pH of the system, so
pKa = -log10 ([A-] / [HA]) + pH
If we rearrange the terms in this equation,
pH = pKa + log10([A-] / [HA]) = pKa + log10([base] / [acid])
That was a pretty easy derivation, in my opinion. To solve the homework problems, though, you need to know the equation, and how to use it.
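The Henderson-Hasselbalch equation and its rearranged form are easy to evaluate numerically. The Python sketch below is one way to do it; the function names and the example pKa of 4.8 are hypothetical, chosen only for illustration.

```python
import math

def ph_from_ratio(pka: float, base_over_acid: float) -> float:
    """Henderson-Hasselbalch: pH = pKa + log10([base]/[acid])."""
    return pka + math.log10(base_over_acid)

def ratio_from_ph(pka: float, ph: float) -> float:
    """Rearranged form: [base]/[acid] = 10**(pH - pKa)."""
    return 10 ** (ph - pka)

# Hypothetical weak acid with pKa 4.8
print(ratio_from_ph(4.8, 5.8))  # about 10: one pH unit above pKa, base outnumbers acid ~10:1
print(ph_from_ratio(4.8, 1.0))  # equal amounts of acid and base, so pH = pKa
```

The second function is the form used most often in practice: given the pH and the pKa, it tells you how the solute is divided between its acid and base forms.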

Amino acid structures

An amino acid is any molecule that contains both a carboxyl group and an amine group. The amino acids that serve as building blocks of proteins are alpha-amino acids, i.e. molecules in which the amine group and the carboxyl group are separated by one intervening saturated carbon atom:


There are amino acids of relevance to biochemistry that are not alpha-amino acids, such as beta alanine:


but we will concentrate on the alpha-amino acids today.

Note that we depict amino acids as zwitterions, i.e. molecules that contain both a positive charge and a negative charge. At extremes of the pH range, an amino acid will not be a zwitterion. At very low pH the carboxyl group will become protonated:


At very high pH the amine group loses a proton and becomes uncharged:


These acid-base equilibrium phenomena are instances of the reactivity of free amino acids.

Among the alpha amino acids the only variation possible from one to another is in the identity of the R group. The simplest R consists of a hydrogen atom: in that instance, the amino acid is glycine. The next simplest R is a methyl (CH3) group; this is alanine. The other eighteen are shown throughout chapter three of your textbook; they range from the simplest ones (glycine and alanine) up to tryptophan, which contains a fused double-ring system.

A special case, but nonetheless one of the amino acids coded for by the ribosomal protein synthetic system, is proline. It is, strictly speaking, not an amino acid at all: it is an imino acid, because the amine group is covalently bonded through three methylene groups to the alpha carbon. Thus the alpha carbon and the amine nitrogen contribute to a five-membered ring. This structure is more restricted rotationally than the ordinary amino acids. We will see the implications of that restriction in proline's role in proteins.


Every alpha-amino acid except glycine is a chiral molecule; that is, it contains at least one carbon atom with four nonequivalent substituents, so that the molecule is not superimposable upon its mirror image. Glycine is achiral because two of the substituents on its alpha carbon are hydrogens. Two of the amino acids, isoleucine and threonine, have a second chiral center on the side chain. If you don't immediately remember what chirality is and how it works, please review what you learned about it in your organic chemistry course. We will discuss chirality repeatedly throughout the remainder of the semester.

The amino acids that make up proteins are all L-amino acids; that is, the substituents on the alpha carbon are arranged in a specific order that relates them to a reference molecule that rotates polarized light in the leftward direction. The mirror image of an L-amino acid is a D-amino acid; it is related to a reference molecule that rotates polarized light to the right. D-amino acids do play a role in a few biochemical systems, such as bacterial cell walls and some antibiotics, but they are not synthesized by the ribosomal apparatus. It is not accidental that bacteria incorporate these D-amino acids into their cell walls: most of the proteolytic enzymes (enzymes that cleave peptide bonds) that the hosts of these bacteria produce in order to destroy the bacteria act only against L-amino acids, so including D-amino acids in their cell walls confers a competitive advantage to these bacteria.

To remember what the absolute configuration of an L-amino acid is, we use the mnemonic CORN. Envision the amino acid, arranged so that you are looking down the bond from the alpha hydrogen to the alpha carbon. Specifically, you are looking from the H to the C, and examining the order in which the other three substituents on the alpha carbon appear. One of those three will be the carbonyl carbon (CO); the next, as you go around clockwise, will be the side chain (R); the last will be the nitrogen (N). Thus, CORN. For this mnemonic to be useful you must remember to start by looking down the H-C bond, and you have to remember to traverse the other three substituents clockwise. A D-amino acid would be NROC.
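The CORN rule amounts to a check of cyclic order: any rotation of CO, R, N read clockwise from the H side describes the same arrangement. The Python sketch below encodes that check. The string labels "CO", "R", "N" and the function name are illustrative assumptions, and the function does not apply to glycine, which is not chiral.

```python
def configuration(clockwise_from_h: tuple) -> str:
    """
    Classify an alpha carbon as L or D using the CORN mnemonic.

    clockwise_from_h lists the three non-hydrogen substituents in the order
    they appear going clockwise, viewed looking from the H toward C-alpha.
    Labels assumed here: "CO" (carbonyl carbon), "R" (side chain), "N" (amine).
    """
    corn = ("CO", "R", "N")
    if len(clockwise_from_h) != 3 or set(clockwise_from_h) != set(corn):
        raise ValueError("expected the substituents CO, R and N, once each")
    # Any rotation of CO -> R -> N describes the same clockwise arrangement.
    rotations = {corn[i:] + corn[:i] for i in range(3)}
    return "L" if tuple(clockwise_from_h) in rotations else "D"

print(configuration(("R", "N", "CO")))  # a rotation of CORN, so "L"
print(configuration(("CO", "N", "R")))  # opposite sense, so "D"
```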

Abbreviations for Amino Acids

Each of the 20 ribosomally-encoded amino and imino acids has a three-letter abbreviation and a one-letter abbreviation. A few other letters get used in discussions of amino acids; these are shown in the chart as well. By convention the three-letter abbreviations are lower-case, whereas the one-letter abbreviations are upper-case. This table lists all of them, in alphabetical order by one-letter code. The columns in this table following the abbreviations will be discussed in the next section.

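amino or imino acid | side chain (R) | 3-letter | 1-letter | pKa (alpha-COOH) | pKa (alpha-NH3+) | pKa (side chain) | ionizable side-chain group (charged form)
alanine | –CH3 | ala | A | 2.4 | 9.9 | n/a | n/a
aspartate or asparagine* | see asp, asn | asx | B | 2.0-2.1 | 8.7-9.9 | depends | depends
cysteine | –CH2–SH | cys | C | 1.9 | 10.7 | 8.4 | S-
aspartate | –CH2–COO- | asp | D | 2.0 | 9.9 | 3.9 | COO-
glutamate | –CH2–CH2–COO- | glu | E | 2.1 | 9.5 | 4.1 | COO-
phenylalanine | –CH2–phenyl | phe | F | 2.2 | 9.3 | n/a | n/a
glycine | –H | gly | G | 2.4 | 9.8 | n/a | n/a
histidine | –CH2–imidazole | his | H | 1.8 | 9.3 | 6.0 | ring NH+
isoleucine | –CH(–CH3)–CH2–CH3 | ile | I | 2.3 | 9.8 | n/a | n/a
ile or leu* | see ile, leu | Xle | J | 2.3 | ~9.8 | n/a | n/a
lysine | –(CH2)4–NH3+ | lys | K | 2.2 | 9.1 | 10.5 | NH3+
leucine | –CH2–CH–(CH3)2 | leu | L | 2.3 | 9.7 | n/a | n/a
methionine | –CH2–CH2–S–CH3 | met | M | 2.1 | 9.3 | n/a | n/a
asparagine | –CH2–CONH2 | asn | N | 2.1 | 8.7 | n/a | n/a
pyrrolysine | see textbook | pyl | O | 2.2? | 9.1? | n/a | n/a
proline | –(CH2)3– (ring closed onto the amine N) | pro | P | 2.0 | 10.6 | n/a | n/a
glutamine | –CH2–CH2–CONH2 | gln | Q | 2.2 | 9.1 | n/a | n/a
arginine | –(CH2)3–NH–C(NH2)=NH | arg | R | 1.8 | 9.0 | 12.5 | guanidinium (=NH2+)
serine | –CH2–OH | ser | S | 2.2 | 9.2 | ~13 | O-
threonine | –CH(CH3)–OH | thr | T | 2.1 | 9.1 | ~13 | O-
selenocysteine† | –CH2–SeH | sec | U | 1.9? | 10.7? | 5.2 | Se-
valine | –CH–(CH3)2 | val | V | 2.3 | 9.7 | n/a | n/a
tryptophan | –CH2–indole | trp | W | 2.5 | 9.4 | n/a | n/a
unknown‡ | any of these | Xaa | X | 1.8-2.4 | 8.7-10.7 | varies | varies
tyrosine | –CH2–phenyl–OH (para) | tyr | Y | 2.2 | 9.2 | 10.5 | O-
glutamate or glutamine* | see glu, gln | glx | Z | 2.1-2.2 | 9.1-9.5 | depends | depends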

Notes for table:

* The utility of the B and Z codes derives from the fact that proteins were traditionally analyzed by acid hydrolysis, which breaks the peptide bonds between amino acids, enabling quantitation of the individual amino acids in the protein. But this type of hydrolysis also converts the amide side chains of asn and gln to the carboxylate forms, asp and glu, respectively. So we cannot distinguish between asp and asn, or between glu and gln, in acid hydrolysis, and it's therefore useful to have a simple way of representing the one-or-the-other cases. Similarly, mass-spectrometric analysis cannot distinguish between leu and ile, because their molecular masses are identical. So the J code is used to identify ile or leu if the only basis for sequence information is derived from mass spectrometry.

† Selenocysteine is ribosomally encoded under special circumstances. See F.Zinoni et al (1986), Proc.Natl.Acad.Sci. 83: 4650-4654. I put question marks next to the main-chain pKa values for sec because I have been unable to find values for them. I'm reasonably confident that they're similar to the cys values.

‡ The use of X or Xaa for an unknown amino acid might perhaps be in memory of Albert Einstein's uncle Jacob, who allegedly told Albert, "algebra is a merry science. We go hunting for a little animal whose name we don't know, so we call it x. When we bag our game we pounce on it and give it its right name." Unfortunately, I do not have much confidence that this quote is really correct: I found it in four slightly different versions on the web, and it's not obvious which one, if any, is accurate. The whole story may be apocryphal, but it's entertaining.

The three-letter abbreviations are straightforward; the one-letter abbreviations are mostly logical, but in a few cases aren't. Some useful mnemonics for the one-letter abbreviations are:

Whether these mnemonics will be useful to you depends on how you process information.

I do expect you to memorize the structures of the twenty standard amino acids (i.e., all of those above except sec, pyl, asx, glx, Xle, and Xaa). Most of them are pretty easy; the harder ones are those where, on this chart, I have cheated and put English words like "imidazole" or "indole" in place of an actual structure. Look them up and memorize them. If some of my crabbed notations in this table aren't clear, look at the structures as they're depicted in the textbook.
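Converting between the one-letter and three-letter codes is a simple lookup. The Python sketch below builds the mapping from the table above (keeping its lower-case three-letter convention) and translates a short made-up sequence; the dictionary and function names are illustrative.

```python
# One-letter -> three-letter codes for the twenty standard amino acids,
# written lower-case to match the table above.
ONE_TO_THREE = {
    "A": "ala", "C": "cys", "D": "asp", "E": "glu", "F": "phe",
    "G": "gly", "H": "his", "I": "ile", "K": "lys", "L": "leu",
    "M": "met", "N": "asn", "P": "pro", "Q": "gln", "R": "arg",
    "S": "ser", "T": "thr", "V": "val", "W": "trp", "Y": "tyr",
}
# Ambiguity and non-standard codes from the same table
EXTRA = {"B": "asx", "J": "Xle", "O": "pyl", "U": "sec", "X": "Xaa", "Z": "glx"}

def to_three_letter(seq: str) -> str:
    """Translate a one-letter sequence into hyphen-joined three-letter codes."""
    table = {**ONE_TO_THREE, **EXTRA}
    try:
        return "-".join(table[c] for c in seq.upper())
    except KeyError as err:
        raise ValueError(f"unknown one-letter code: {err.args[0]}") from None

print(to_three_letter("GAVLIB"))  # prints gly-ala-val-leu-ile-asx
```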

Amino acids: acid/base chemistry

The most obvious chemistry in which free amino acids can participate involves acid-base equilibrium at the main-chain carboxyl and amine groups:
H3N+–CHR–COO- + OH- ↔ H2N–CHR–COO- + H2O
H3N+–CHR–COO- + H+ ↔ H3N+–CHR–COOH

Every amino and imino acid can undergo these interconversions. The equilibrium in these reactions lies far to the left at pH values close to neutral, but at low pH the equilibrium in the second reaction will lie to the right, and at high pH the equilibrium in the first reaction will lie to the right.

The pKa values for these reactions (the pH values at which the reactions depicted above have 50% products and 50% reactants) depend somewhat on which side chain (R group) is present. For example, since the pKa for deprotonating the amine group in alanine is 9.9, an aqueous solution of alanine at pH 9.9 will be half in the protonated (H3N+) form and half in the deprotonated (H2N) form. Below pH 9.9, more than half will be protonated; above 9.9, less than half will be protonated. Exactly one pH unit below the pKa, the ratio [H2N]/[H3N+] is 10^-1, so about 91% (10/11) of the alanine will be protonated at the amine end and about 9% will be deprotonated. The pKa for the amino group of threonine is lower, about 9.1, so at pH 9.5 more than half of the threonine in a solution will be deprotonated.
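Solving the Henderson-Hasselbalch equation for the protonated fraction gives [acid] / ([acid] + [base]) = 1 / (1 + 10^(pH - pKa)). The Python sketch below applies that expression to the alanine and threonine amine pKa values from the table; the function and constant names are illustrative.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """
    Fraction of an ionizable group in its protonated (acid) form.
    From Henderson-Hasselbalch, [base]/[acid] = 10**(pH - pKa), so
    [acid] / ([acid] + [base]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))

ALA_AMINE_PKA = 9.9  # from the table above
THR_AMINE_PKA = 9.1

print(f"{fraction_protonated(9.9, ALA_AMINE_PKA):.3f}")  # 0.500 at pH = pKa
print(f"{fraction_protonated(8.9, ALA_AMINE_PKA):.3f}")  # 0.909, one unit below pKa
print(f"{fraction_protonated(9.5, THR_AMINE_PKA):.3f}")  # 0.285: mostly deprotonated
```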

Every free amino acid has at least two pKa values: the one associated with protonation of the carboxylate and the one associated with deprotonation of the amine group. Amino acids whose side chains contain an ionizable group have a third pKa value, associated with protonation or deprotonation at the side chain. The full collection of pKa values appears in the table above, taken from your textbook.
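Because each ionizable group follows its own Henderson-Hasselbalch equilibrium, the average charge on a free amino acid at a given pH can be estimated by adding up the contribution of each group. The Python sketch below does this for a few amino acids using pKa values from the table above. It assumes the groups ionize independently, which is an approximation, and the names are hypothetical. At pH 7 it gives a net charge near 0 for glycine, near -1 for aspartate, and a small positive value (roughly +0.09) for histidine, whose side-chain pKa of 6.0 leaves it only partly protonated.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Protonated fraction of one group: 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10 ** (ph - pka))

def net_charge(ph: float, groups: list) -> float:
    """
    Average charge per molecule, assuming independent groups.
    groups: list of (pKa, kind). "acidic" groups are neutral when protonated
    and -1 when deprotonated (COOH/COO-); "basic" groups are +1 when
    protonated and neutral when deprotonated (NH3+/NH2).
    """
    charge = 0.0
    for pka, kind in groups:
        f = fraction_protonated(ph, pka)
        if kind == "acidic":
            charge -= 1.0 - f
        elif kind == "basic":
            charge += f
        else:
            raise ValueError(f"unknown group kind: {kind}")
    return charge

# pKa values from the table above: alpha-COOH, alpha-NH3+, then side chain
GLYCINE = [(2.4, "acidic"), (9.8, "basic")]
ASPARTATE = [(2.0, "acidic"), (9.9, "basic"), (3.9, "acidic")]
HISTIDINE = [(1.8, "acidic"), (9.3, "basic"), (6.0, "basic")]

for name, groups in [("gly", GLYCINE), ("asp", ASPARTATE), ("his", HISTIDINE)]:
    print(name, f"{net_charge(7.0, groups):+.2f}")
```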

Side-chain reactivity

From the table above it is clear that one of the ways in which amino acid side chains can participate in chemical reactions is through acid-base interactions. But other kinds of chemistry occur in side chains as well. The sulfur atoms in cysteine and methionine can become oxidized to more highly oxidized sulfur species, such as the sulfinic and sulfonic acid forms of cysteine and the sulfoxide of methionine. The side-chain hydroxyl groups of serine and tyrosine can form covalent bonds to ligands, such as phosphate groups. Both of the nitrogen atoms in the imidazole side chain of histidine can covalently bond to various ligands.

Not all side-chain reactivities involve formation of covalent bonds. Side-chain polar groups can form hydrogen bonds with other polar groups. "Salt bridges" between oppositely charged groups (e.g. the side-chain terminal amine group of lysine and the side-chain carboxylate of aspartate) are often found in proteins.

Peptides and proteins

Peptides and proteins are, respectively, oligomers and polymers of amino acids. Most are heteropolymers, i.e. the individual building blocks are not all identical. Chemists can manufacture homopolymers of amino acids, in which all the building blocks are identical, but these do not play roles in real biological systems.

A dipeptide is produced, formally, by removing water from two amino acids:
H3N+–CHR1–COO- + H3N+–CHR2–COO- → H3N+–CHR1–CO–NH–CHR2–COO- + H2O
The covalent bond between the carbonyl carbon in the center of this assembly and the amide nitrogen is called an amide bond or peptide bond. The C-N bond has some double-bond character because of a resonance form in which the carbonyl oxygen carries a formal negative charge and the amide nitrogen carries a formal positive charge. This partial double-bond character obliges the six atoms of the peptide group (the main-chain carbonyl carbon and its oxygen, the amide nitrogen and the hydrogen attached to it, and both adjoining alpha carbons) to lie in a single plane, termed the peptide plane. Study fig. 4.6 to see how this works.

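Building up a tri-, tetra-, oligo-, or polypeptide is accomplished, formally, in the same way as the creation of the dipeptide, with one more water removed for each residue added:
H3N+–CHR1–CO–NH–CHR2–COO- + H3N+–CHR3–COO- → H3N+–CHR1–CO–NH–CHR2–CO–NH–CHR3–COO- + H2O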
In the cell this is usually accomplished by the ribosome, where the lengthening of the polypeptide chain is carefully controlled. The ligation reaction, in which the chain is lengthened by one residue, is endergonic. Part of the energy needed is invested beforehand, when each amino acid is attached to its tRNA at the expense of our familiar energy currency ATP; in addition, each elongation step on the ribosome consumes ATP's cousin GTP. Schematically, emphasizing the GTP contribution:
GTP + n-length peptide + amino acid → GDP + Pi + (n+1)-length peptide

Ordinarily there is a free amine group at one end of the polymer (the amino terminus) and a free carboxylate at the opposite end (the carboxyl terminus):
H3N+–CHR1–CO–NH–CHR2–CO– ... –NH–CHRn–COO-
But occasionally cyclic peptides are formed, in which the chain bends around and a peptide bond is formed between the terminal amine group and the terminal carboxyl group, with the usual elimination of water. This type of cyclization is not carried out at the ribosome; it is carried out by a specific enzymatic synthesis.
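The idea that peptide formation removes one water per peptide bond can be checked against molecular masses. The Python sketch below computes the average mass of a short linear or head-to-tail cyclic peptide. Its mass table covers only a handful of residues, using standard average masses of the free amino acids rounded to 0.01 g/mol; the dictionary and function names are illustrative.

```python
# Average masses (g/mol) of a few free amino acids, rounded to 0.01.
AA_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
    "C": 121.16,  # cysteine
    "V": 117.15,  # valine
    "L": 131.17,  # leucine
    "I": 131.17,  # isoleucine (same mass as leucine)
}
WATER = 18.02  # g/mol

def peptide_mass(seq: str, cyclic: bool = False) -> float:
    """
    Formal condensation: one water is removed per peptide bond.
    A linear chain of n residues has n - 1 peptide bonds; a head-to-tail
    cyclic peptide has n.
    """
    n = len(seq)
    if n < (2 if cyclic else 1):
        raise ValueError("sequence too short")
    bonds = n if cyclic else n - 1
    return sum(AA_MASS[aa] for aa in seq.upper()) - bonds * WATER

print(round(peptide_mass("GA"), 2))               # dipeptide gly-ala: 146.14
print(round(peptide_mass("GAV", cyclic=True), 2))  # cyclic: one more water lost
print(peptide_mass("GL") == peptide_mass("GI"))   # True: leu and ile have equal mass
```

The last line shows, in miniature, why mass spectrometry alone cannot tell leucine from isoleucine.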

Side-chain reactivity in peptides and proteins

We have already mentioned that the side chains in peptides and proteins can undergo acid-base interactions. It is worth noting that in an intact protein, these are the only acid-base interactions that can occur, apart from those involving the terminal amino and carboxyl groups. Every main-chain amine except the one on the first residue, and every main-chain carboxyl except the one on the last residue, is engaged in a peptide bond, and therefore cannot participate in acid-base reactions without rupturing the polypeptide chain.

Protein side chains do show other kinds of reactivity as well, in similar ways to those mentioned above in the contexts of free amino acids. The reactivities of side chains in intact proteins differ somewhat from the reactivities of side chains in free amino acids, in that the molecular environment in which the side chain finds itself is likely to be different from that of a free amino acid. Thus a zwitterionic free amino acid, even if it has a hydrophobic side chain, tends to end up in a fairly hydrophilic (water-loving) environment, so its reactivity will be characteristic of an aqueous species. By contrast, an amino acid with a hydrophobic side chain in an intact protein will usually be found in a hydrophobic environment, and its reactivity will be altered appropriately.


Disulfides are covalent bonds between sulfur atoms. The amino acid cysteine can participate in disulfide bonds. The formation of a disulfide is an oxidation-reduction reaction:
    R–SH + R'–SH + 1/2 O2 → R–S–S–R' + H2O
In this equation I have designated the oxidizing agent as dioxygen; other oxidizing agents can in fact operate to produce the disulfide.

The only one of the twenty ribosomally-encoded amino acids that can produce disulfide bonds is cysteine, whose sulfur is attached to the side-chain methylene (beta) carbon. When two cysteine residues are oxidized to produce the disulfide, the resulting species (R–S–S–R') is sometimes (especially in the older literature) known as a cystine moiety. Within a protein, a disulfide bond can be produced from a pair of cysteine residues that are far apart in amino acid sequence, as long as there is an energetically favorable way to bring the two cysteines spatially close to one another. Some proteins, especially those that operate in an oxidizing environment, contain one or more disulfides that are important to their stability. We will discuss the phenomena that drive proteins to fold into a stable conformation in the next lecture, so this brief discussion is intended to whet your appetites for more details later.
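Two consequences of the disulfide chemistry above are easy to compute. Forming a disulfide removes two hydrogen atoms from the protein (R–SH + R'–SH → R–S–S–R'), and any pair of cysteines in a sequence is, in principle, a candidate partner, however far apart they are. The Python sketch below lists candidate pairs and their separation for a made-up sequence, and applies the mass change. The names and the sequence are hypothetical, and whether any given pair actually forms a disulfide depends on the folded structure.

```python
from itertools import combinations

H_MASS = 1.008  # average mass of a hydrogen atom, g/mol

def cysteine_pairs(seq: str) -> list:
    """
    Every pair of cysteine positions (1-based) that could, in principle,
    form a disulfide, with the difference between their sequence positions.
    """
    positions = [i + 1 for i, aa in enumerate(seq.upper()) if aa == "C"]
    return [(a, b, b - a) for a, b in combinations(positions, 2)]

def oxidized_mass(reduced_mass: float, n_disulfides: int) -> float:
    """Each disulfide (R-SH + R'-SH -> R-S-S-R') removes two hydrogen atoms."""
    return reduced_mass - 2 * H_MASS * n_disulfides

seq = "ACGKCLLSTC"  # hypothetical sequence with cysteines at 2, 5 and 10
print(cysteine_pairs(seq))  # [(2, 5, 3), (2, 10, 8), (5, 10, 5)]
```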