Main Content

Ground-State Protein Folding Using Variational Quantum Eigensolver (VQE)

Note

Installation Required: This functionality requires MATLAB Support Package for Quantum Computing.

This example shows an efficient method for using qubits to encode a protein fold on a 3-D tetrahedral lattice [1],[2]. The ground-state is found through a simulated variational quantum eigensolver (VQE) routine. The final circuit from the simulation is run on a real QPU for comparison. Running this example requires Global Optimization Toolbox as well as MATLAB® Support Package for Quantum Computing.

The protein is a neuropeptide with seven amino acids, APRLRFY, pictured below. The example assumes a coarse-grained protein model, where "beads" representing amino acids can traverse the lattice and interact with each other. Each bond between amino acids can be in one of four directions, corresponding to corners of a tetrahedron. These four turns can be represented by two qubits.

Image of APRLRFY neuropeptide molecule.

Model Protein Fold with Qubits

The purpose of the embedding methods of [1] and [2] is to model the physical pairwise interaction energy of the protein beads while also imposing energy penalties on unrealistic configurations (such as overlaps or beads on the same lattice site). Therefore, the example is configurable in that you can modify the amino acid letters to test the effects on folding. However, the example code assumes a protein with seven amino acids, so you cannot change the length of the protein string.

Configuration Qubits

A protein of N beads (here N=7) can make N-1 turns, therefore 2*(N-1)=12 bits are required for the APRLRFY protein bonds. However, without loss of generality, the first two turns can be fixed to 01 and 00. One of the bits in the third turn is fixed due to other symmetry considerations. The turn2qubit mapping depicts the 12 bits, denoting the value of the 5 that are fixed, and the 7 that are variable which will be represented by qubits. For details, see the dense encoding scheme in [2].

hyperParams.protein = 'APRLRFY';                     
hyperParams.turn2qubit = '0100q1qqqqqq';            
hyperParams.numQubitsConfig = sum(hyperParams.turn2qubit=='q');

Interaction Qubits

The methods of [1] and [2] can consider interactions between an arbitrary number of nearest-neighbors (NN). This example only considers 1-NN interaction terms, which are only possible between some beads due to the structure of the lattice. Beads separated by less than 5 bonds cannot be 1-NN (Eq. 29 of [2]). There are two interaction pairs in the length-7 protein, between beads 1 and 6, and between beads 2 and 7. Two interaction bits are used to specify whether there is any interaction between the beads in each of these pairs.

hyperParams.numQubitsInteraction = 2;

A matrix of contact energy values is used as a look-up table for pairwise interaction between different amino acids. For this example, the values are chosen at random.

Save the buildMJInteractions function in a separate file on the path.

function mat = buildMJInteractions(protein)
% Create the MJ interaction energy matrix for the protein, specified with
% 1-letter codes 
N = length(protein);
mat = zeros(N,N);
rng(29507)
MJ = rand(20)*-6;
MJ = triu(MJ) + triu(MJ, 1)';
acids = ["C","M","F","I","L","V","W","Y","A","G","T","S","N","Q","D","E","H","R","K","P"];
acid2idx = dictionary(acids, 1:20);
for i = 1:N
    for j=1:N
        mat(i,j) = MJ(acid2idx(protein(i)), acid2idx(protein(j)));
    end
end
end

Call the buildMJInteractions function to find the interaction energies for the protein.

hyperParams.interactionEnergy = buildMJInteractions(hyperParams.protein);

Write Function to Calculate Energy of Folds

The exactHamiltonian function calculates the energy for each possible fold of the protein. Only 1-NN interactions are considered, and energy penalties are imposed for unrealistic configurations.

Save the exactHamiltonian function in a separate file on the path.

function energies = exactHamiltonian(bitstrings,hyperParams)
% Compute the Hamiltonian for each bit string (i.e., the energy for each fold).
% See [2] for details. This does not consider the Hch constraint from
% side-chains and the interaction term is only 1-nearest-neighbor (1-NN).

% H = Hgc + Hin_1

lambdaDis = 720;    % Penalty for interaction distance
lambdaLoc = 20;     % Penalty for interaction location
lambdaBack = 50;    % Penalty for unphysical geometry

energies = zeros(size(bitstrings,1),1);
numBeads = length(hyperParams.protein);

for idx = 1:length(energies)
    bitstring = bitstrings(idx,:);
    config = hyperParams.turn2qubit; 
    config(config=='q') = bitstring(1:hyperParams.numQubitsConfig);
    turns = bin2dec(reshape(config,2,numBeads-1)');
    
    %% Geometric Hamiltonian Hgc

    % Count number of adjacent turns which are equal and impose a penalty for each
    energies(idx) = lambdaBack*sum(turns(1:end-1) == turns(2:end));
  
    %% 1-NN Interaction Hamiltonian Hin
    currInteractionQubit = hyperParams.numQubitsConfig;
    for i=1:(numBeads-4) 
       for j=(i+5):2:numBeads 
           
           currInteractionQubit = currInteractionQubit+1;
           if bitstring(currInteractionQubit)=='0'
               continue;
           end       
            
           % Add the interaction energy 
           energies(idx) = energies(idx) + hyperParams.interactionEnergy(i,j);
            
           % Compute distances between interacting beads
           deltaN_ij = zeros(1,4);
           deltaN_ir = zeros(1,4);
           deltaN_mj = zeros(1,4);
           for k=0:3
               deltaN_ij(k+1) = (-1).^(i:j-1)*(turns(i:j-1) == k);
               deltaN_ir(k+1) = (-1).^(i:j-2)*(turns(i:j-2) == k);
               deltaN_mj(k+1) = (-1).^(i+1:j-1)*(turns(i+1:j-1) == k);
           end
           d_ij = norm(deltaN_ij)^2;
           d_ir = norm(deltaN_ir)^2;
           d_mj = norm(deltaN_mj)^2;
            
           % Add penalty for distance not equal to 1
           energies(idx) = energies(idx) + lambdaDis*(d_ij-1);         
            
           % Add penalty for unphysical nearest neighbour collisons
           energies(idx) = energies(idx) + lambdaLoc*(2-d_ir);
           energies(idx) = energies(idx) + lambdaLoc*(2-d_mj);
           if i-1 >= 1
               for k=0:3
                   deltaN_mj(k+1) = (-1).^(i-1:j-1)*(turns(i-1:j-1) == k);
               end
               d_mj = norm(deltaN_mj)^2;
               energies(idx) = energies(idx) + lambdaLoc*(2-d_mj);
           end
           
           if j+1 <= numBeads
               for k=0:3
                   deltaN_ir(k+1) = (-1).^(i:j)*(turns(i:j) == k);
               end
               d_ir = norm(deltaN_ir)^2;
               energies(idx) = energies(idx) + lambdaLoc*(2-d_ir);
           end
       end
    end
end
end

Compute Minimum Energy for All Folds

Nine qubits are required to fully model a fold (7 configuration qubits and 2 interaction qubits). Use the exactHamiltonian function directly with all possible bit strings (each representing a possible fold), to find the minimum energy fold. You can later compare this value to the result of the quantum-based optimization. This exhaustive search is realistic for small proteins, but is not practical for larger proteins since the number of combinations of folds grows quickly.

hyperParams.numQubitsTotal = hyperParams.numQubitsConfig + hyperParams.numQubitsInteraction; 

allFolds = dec2bin(0:2^hyperParams.numQubitsTotal-1,hyperParams.numQubitsTotal);
allEnergies = exactHamiltonian(allFolds,hyperParams);

hyperParams.GroundState.Energy = min(allEnergies);
hyperParams.GroundState.Index = find(allEnergies == hyperParams.GroundState.Energy);
allFolds(hyperParams.GroundState.Index,:)
ans = 
    '101000110'
    '101001010'

There are two lowest-energy folds because the last amino acid can occupy either position in the lattice. Find the energy of each of the identified folds.

allEnergies(hyperParams.GroundState.Index)
ans = 2×1    
   -4.7973
   -4.7973

This minimum-energy value matches the interaction energy between beads 1 and 6.

Write CVaR-VQE Objective Function

Next, use a quantum circuit to translate this optimization problem over a set of bit strings into a different optimization problem, this one over a set of angles between -pi and pi. These angles have no connection to the angles between the protein beads. The objective function accepts the set of angles for use inside of the quantum circuit, and the circuit returns various bit strings with different probabilities. Call the exactHamiltonian function on the bit strings returned by the quantum circuit, and use a weighted mean of the smaller energies computed as the value of the objective function.

Specifically, after the energy is computed for each observed fold, the associated probabilities are sorted by energy. The objective function returns an expectation energy computed from the tail end of the probability distribution, cutoff by an alpha parameter. This expectation energy is a conditional value at risk (CVaR). An alpha value of 0.05 was used experimentally in [1], but for a noise-free simulation ProteinVQEObjective uses a smaller cutoff value of 0.025.

Save the ProteinVQEObjective function in a separate file on the path.

function [energy, maxProbFold] = ProteinVQEObjective(parameters,hyperParams)    
% Construct and simulate the variational circuit 
ansatz = ProteinConfigAnsatz(parameters);
qState = simulate(ansatz);
allProbs = (qState.Amplitudes).^2;

% There are 10 qubits in the circuit, but only these 9 are used to define a fold 
foldQubits = [1:8 10];

% Get the most probable fold
% Only compute this if second output maxProbFold is requested
if nargout > 1 
    [~,idx] = max(allProbs);
    maxProbKet = char(qState.BasisStates(idx));
    maxProbFold = maxProbKet(foldQubits);
end

% Sample, and query/get the states and probabilities of the fold qubits 
qMeasurement = randsample(qState, hyperParams.numShots);
[states, probs] = querystates(qMeasurement, foldQubits);

% Sort the probabilities by the energy
[energies,sort_idx] = sort(exactHamiltonian(char(states), hyperParams));
probs = probs(sort_idx);

% Compute CVaR over the low energy tail of the energy distribution,
% delimited by a cutoff parameter alpha. 
alpha = .025;  
cut_idx = nnz(cumsum(probs) < alpha);
cvar_probs = probs(1:cut_idx);
cvar_probs(end+1) = alpha - sum(cvar_probs);

% Compute expectation energy as the sum of cutoff state energies weighted by their probability 
energy = dot(cvar_probs, energies(1:cut_idx+1))/alpha;
end

Define the number of shots to use in ProteinVQEObjective, and create a function handle that passes in all of the parameter values to ProteinVQEObjective.

hyperParams.numShots = 1024; 
objFcn = @(theta) ProteinVQEObjective(theta,hyperParams);

Create Circuit Ansatz

Create and plot the variational circuit ansatz for this protein with random angles. Qubits 1-7 represent the configuration, and the other three qubits are for the interaction (including a helper qubit that does not get measured). This circuit can be found as Figure 3 in [1].

Save the ProteinConfigAnsatz function in a separate file on the path.

function ansatz = ProteinConfigAnsatz(parameters)
% Create the circuit ansatz for a 7 amino acid neuropeptide (10 qubit circuit).
parameters = reshape(parameters, [2 9]);

gates = [hGate([1:7 9 10])
         ryGate([1:7 9 10], parameters(1,:))
         cxGate(1:4, 2:5)
         cxGate(5,10)
         cxGate(10,9)
         cxGate(9,8)
         cxGate(8,9)
         cxGate(9,8)
         cxGate([8 7 6], [7 6 1])
         ryGate([1:8 10], parameters(2,:))
        ];

ansatz = quantumCircuit(gates);
end

Call ProteinConfigAnsatz with random angles to construct the circuit. Plot the circuit to view the qubits and gates.

ansatz = ProteinConfigAnsatz(rand(2,9));
plot(ansatz)

Image of protein circuit gates.

Simulate Iterations of CVaR-VQE

There are two layers of RY rotation gates over the configuration and interaction qubits. These rotation angles are learned parameters in the variational circuit. Use surrogateopt (Global Optimization Toolbox) with the objective function to find the set of RY rotation angles that produce the minimum expectation energy. The original paper [1] uses a genetic algorithm to find the rotation angles. surrogateopt converges to similar values, but because it uses intervals internally, the convergence behavior is comparatively different.

Define the number of angles and set options for the maximum number of function evaluations, plotting function, and initial points. Then call surrogateopt with the objective function, upper and lower bounds for the angles, and options.

numAngles = 2*hyperParams.numQubitsTotal; 
rng default

options = optimoptions("surrogateopt",...
    "MaxFunctionEvaluations",10, ...
    "PlotFcn","optimplotfval",...
    "InitialPoints",pi*ones(numAngles,1));

lb = repmat(-pi,numAngles,1);
ub = repmat(pi,numAngles,1);
[angles,minEnergy] = surrogateopt(objFcn,lb,ub,[],[],[],[],[],options);

Image of optimization plot for protein objective function.

surrogateopt stopped because it exceeded the function evaluation limit set by 
'options.MaxFunctionEvaluations'.

Call the objective function using the optimized angles to find the state bit string with highest probability as well as its associated energy.

[groundStateEnergy,groundStateFold] = ProteinVQEObjective(angles,hyperParams)
groundStateEnergy = -4.7973
groundStateFold = '101000110'

This ground-state fold is one of the two folds obtained earlier with the minimum energy calculation.

allFolds(allEnergies==minEnergy,:)
ans =

  2×9 char array

    '101000110'
    '101001010'

To visualize the ground-state fold, save the plotProtein function in a separate file on the path. This function enables you to plot the protein structure from the qubit values.

function plotProtein(bitstring,hyperParams)
% Plot protein structure from the bitstring

% The input bitstring is expected to be of length 9, with the first 7 bits
% specifying turns in direction in the structure, and the last 2 bits
% specifying interactions between beads.

% Number of beads
N = length(hyperParams.protein);

% Construct 3D coordinates representing the 4 corners of a tetrahedron
% centered in 0. These represent the 4 directions each new bead might
% be added on in a tetrahedral grid.
turn2bead = ones(4,3);
turn2bead(2:4,:) = -1+2*eye(3);

% Construct complete bitstring by inserting input bitstring into
% the complete string mask.
completeBitstring = hyperParams.turn2qubit;
completeBitstring(completeBitstring=='q') = bitstring(1:hyperParams.numQubitsConfig);

 % Each pair of bits is converted into a number between 0 and 3
turns = bin2dec(reshape(completeBitstring,2,[])');

% Change direction in which we follow the tetrahedral grid in each bead
% on the line.
signs = (-1).^(0:N-1)';

% Compute placements of each bead.
beads = cumsum(signs.*[zeros(1,3);turn2bead(turns+1,:)]);

% Plot the beads connecting by lines, and add a text label to each
figure
plot3(beads(:,1),beads(:,2),beads(:,3),'.-','LineWidth',2,'MarkerSize',80,'SeriesIndex',1)
axis off

viewDir = -[6 4 1];
view(viewDir);

% Plot text, ensuring it is in front of the lines
beadsText = beads + 0.03*viewDir;
text(beadsText(:,1),beadsText(:,2),beadsText(:,3),hyperParams.protein', ...
    'FontWeight', 'bold', 'HorizontalAlignment','center')

% Whether there are interactions between any pair of beads is determined
% by the additional bits in the string.
interactions = [];
currInteractionQubit = hyperParams.numQubitsConfig+1;

for i=1:(N-5)
    for j=(i+5):2:N
        if bitstring(currInteractionQubit) == '1'
            interactions = [interactions;beads(i,:);beads(j,:);nan*ones(1,3)]; %#ok<AGROW>
        end
        currInteractionQubit = currInteractionQubit+1;
    end
end

if ~isempty(interactions)
    hold on
    plot3(interactions(:,1),interactions(:,2),interactions(:,3),'k--','LineWidth',2)
    hold off
    legend([hyperParams.protein+" Protein Structure";"Interactions"], "Location","southoutside")
else
    legend(hyperParams.protein+" Protein Structure", "Location","southoutside")
end
end

The protein has two possible interactions considered in the 1-NN model. The fold with the lowest energy only has one of these interactions. Use plotProtein to visualize the lowest energy fold.

plotProtein(groundStateFold,hyperParams)

Plot of ground-state fold for APRLRFY protein

Next, construct the quantum circuit using the optimized angles and simulate the circuit to see the expected probability distribution over states. Use a threshold to filter out states with probabilities less than 2%. The ground-state fold 101000110 appears again as the state with the highest probability.

optimized_circuit = ProteinConfigAnsatz(angles);
sv = simulate(optimized_circuit);
histogram(sv,[1:8 10],Threshold=0.02)

Image of simulated states for protein folds.

Run Final Iteration on QPU

Connect to the IonQ device using quantum.backend.QuantumDeviceAWS. Specify the region of the device and path to a bucket to store results.

reg = "us-east-1";
bucketPath = "s3://amazon-braket-bat365/doc-examples";
device = quantum.backend.QuantumDeviceAWS("Harmony",S3Path=bucketPath,Region=reg)

Create a task to run the circuit ansatz with optimized angles on the QPU. Specify the number of shots as 1,000.

task = run(optimized_circuit,device,NumShots=1000);
wait(task)

Fetch the results and plot a histogram of the states for all but the ninth qubit. Specify a lower threshold of 2% to filter out unlikely states.

results = fetchOutput(task);
histogram(results,[1:8 10],Threshold=0.02)

Histogram of states and their probabilities as obtained from a QPU

The ground-state fold 101000110 appears again as the state with the highest probability, so the QPU results agree with the local simulation of the circuit.

References

[1] Robert, Anton, Panagiotis Kl. Barkoutsos, Stefan Woerner, and Ivano Tavernelli. “Resource-Efficient Quantum Algorithm for Protein Folding.” Npj Quantum Information 7, no. 1 (February 17, 2021): 38. https://doi.org/10.1038/s41534-021-00368-4.

[2] Robert, Anton, Panagiotis Kl. Barkoutsos, Stefan Woerner, and Ivano Tavernelli. “Supplementary Information for 'Resource-Efficient Quantum Algorithm for Protein Folding'" Npj Quantum Information 7, no. 1 (February 17, 2021): 38. https://doi.org/10.1038/s41534-021-00368-4.

See Also

Related Topics