The “engineered” claim centers on the furin cleavage site in the SARS-CoV-2 spike protein.
The virus shares 92% genetic similarity with bat coronaviruses, except for the spike protein, which is nearly identical to that of a pangolin coronavirus (which is otherwise only ~38% similar), with one key exception: the furin cleavage site, which uses “lab standard” sequences.
The gene sequence encoding the arginines in the CoV-2 furin site uses a very rare pair of codons (two three-letter words, so six letters in a row) that are rarely used individually and have never been seen in tandem in any other coronavirus in nature. But these same ‘rare in nature’ codons turn out to be the very ones that scientists always use in the laboratory when they want to add arginine, the amino acid found in the furin site. When scientists add a pair of arginine codons to a coronavirus, they invariably use CGG-CGG, but coronaviruses in nature rarely (<1%) use this codon pair. For example, across the 580,000 codons of 58 sarbecoviruses, the only CGG pair is in CoV-2; none of the other 57 sarbecoviruses has one.
According to Andersen, the CGG codon isn't quite that rare in coronaviruses. He also notes that the CGG codons in the furin cleavage site have been remarkably stable over the course of the pandemic, which hints that CGG may be selected for and important to the virus.
Quoting him:
> Now, the codons. Here, Baltimore is talking about the two codons coding for the first two arginines (R) following the P - CGG. The CGG codon is rare in viruses because it's an example of an unmethylated "CpG" site that can be bound by TLR9, leading to immune cell activation.
> Despite being rare, however, CGG codons are found in all coronaviruses, albeit at low frequency. Specifically, of all arginine codons, CGG is used at these frequencies in these viruses:
> SARS: 5% SARS2: 3% SARSr: 2% ccCoVs: 4% HKU9: 7% FCoV: 2%
> Nothing unusual here.
> Furthermore, if we go back to the FCoV sequences and compare them to SARS-CoV-2 at the nucleotide level you'll see that FCoV also uses CGG to code for R immediately following the P. The next R is CGA (non-CpG) in FCoV, while it's CGG in SARS-CoV-2 - one nucleotide difference.
> We see CGG multiple times in different ways - here's an example comparing another "PR" stretch between SARS-CoV-2, RaTG13, and SARS-CoV in the N gene. Note how SARS-CoV-2 and RaTG13 both use CGG, while SARS-CoV uses CGC for the first R, while later R's are coded by CGT or AGA.
> One final point about the CGG codons in the FCS - if they were somehow "unnatural", we'd see SARS-CoV-2 evolve away from "CGG" during the ongoing pandemic. We have more than a million genomes to analyze, so what do we find if we look at synonymous mutations at the "CGG_CGG" site?
> Remarkably stable. Specifically, CGG is 99.87% conserved in the first codon and 99.84% conserved in the second.
> This is very strong evidence that SARS-CoV-2 'prefers' CGG in these positions.
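
The conservation figures in the quote are, in effect, the share of sampled genomes that still carry CGG at each of the two positions. A minimal sketch of that calculation, assuming the genomes are already aligned so the codon starts at the same offset in each one; the function name `codon_conservation` and the short fragments are made up for illustration and are not real pandemic data:

```python
from collections import Counter


def codon_conservation(aligned_seqs, start, reference_codon):
    """Return the fraction of sequences whose codon at the 0-based
    offset `start` equals `reference_codon`, plus the raw counts."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    codons = [seq[start:start + 3].upper() for seq in aligned_seqs]
    counts = Counter(codons)
    return counts[reference_codon] / len(codons), counts


# Toy fragments (hypothetical): codons are CCT | CGG | CGG | GCA.
# Eight carry CGG-CGG; two carry a synonymous arginine codon instead.
toy_genomes = ["CCTCGGCGGGCA"] * 8 + ["CCTCGACGGGCA", "CCTCGGCGTGCA"]

first_frac, first_counts = codon_conservation(toy_genomes, 3, "CGG")
second_frac, second_counts = codon_conservation(toy_genomes, 6, "CGG")
print(f"first codon:  {first_frac:.2%} CGG")
print(f"second codon: {second_frac:.2%} CGG")
```

A high percentage from a count like this only says the codon persisted in the sampled genomes; it does not by itself say why.
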
The CGG-CGG furin cleavage site is the most potent one because it works both at the outer cell membrane and in the cell interior. Viruses that have it will outcompete all others, but all this means is that SARS-CoV-2 with the CGG-CGG FCS has been well adapted to humans since the beginning of the pandemic, and less potent mutants haven't been able to keep up. There's no "natural/unnatural" axis to consider. The most infectious virus "prefers" to be the most infectious; that is tautological. Evidence of efficacy doesn't disprove laboratory alteration.
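
The two codon statistics argued over above (how often CGG is used among all arginine codons, and how often two CGG codons sit in tandem) are different measurements, and both are easy to compute from a coding sequence. A rough sketch, assuming the input is a single in-frame coding sequence; the helper names and the toy sequence are hypothetical and chosen only for illustration:

```python
from collections import Counter

# The six codons that encode arginine in the standard genetic code.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping any
    trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def arginine_usage(cds):
    """Fraction of arginine codons accounted for by each synonymous codon."""
    counts = Counter(c for c in split_codons(cds) if c in ARG_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: counts[codon] / total for codon in sorted(ARG_CODONS)}


def count_tandem(cds, codon="CGG"):
    """Count adjacent in-frame occurrences of `codon` followed by `codon`.
    Three in a row would count as two overlapping pairs."""
    codons = split_codons(cds)
    return sum(1 for a, b in zip(codons, codons[1:]) if a == codon and b == codon)


# Toy sequence (hypothetical): ATG CCT CGG CGG GCA CGT AGA AGA TAA
toy_cds = "ATGCCTCGGCGGGCACGTAGAAGATAA"
usage = arginine_usage(toy_cds)
pairs = count_tandem(toy_cds)
print(f"CGG share of arginine codons: {usage['CGG']:.0%}")
print(f"CGG-CGG tandem pairs: {pairs}")
```

Single-codon usage and tandem-pair counts can diverge sharply, which is why a per-codon frequency of a few percent and a claim of "no other CGG pair" are not necessarily in conflict.
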