Jonathan and I were talking about Eternacon on Monday, and I pointed out that we really should start talking about moving the puzzle interface from a nearest-neighbor thermodynamic prediction model to a base pair probability (BPP) prediction model, and that we could discuss it as a community at Eternacon. Jonathan brought up the idea of taking what we display in the dot plot and showing it in the puzzle. If we truly are moving towards a base pair probability model, then let's start brainstorming about how BPP data could be communicated in Eterna puzzles.
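One way to bring dot plot data into the puzzle at the base level is to collapse the dot plot into a single number per base. The sketch below uses a hypothetical helper, not existing Eterna code. It sums P(i, j) over all partners j, which gives the probability that base i is paired at all (1 minus that value is its unpaired probability). It assumes the pair probabilities arrive as a dictionary keyed by 1-indexed (i, j) pairs with i < j.

```python
def per_base_pair_probability(bpp: dict[tuple[int, int], float], length: int) -> list[float]:
    """Collapse a dot plot (pair probabilities P(i, j)) into one value per base.

    Each base's value is the sum of P(i, j) over all its partners, i.e. the
    probability that the base is paired at all. Assumes 1-indexed bases and
    that each pair is stored once, with i < j.
    """
    paired = [0.0] * (length + 1)  # index 0 unused so base numbers match indices
    for (i, j), p in bpp.items():
        # A pair contributes its probability to both of its bases.
        paired[i] += p
        paired[j] += p
    return paired[1:]


# Made-up probabilities for a 9-base strand.
values = per_base_pair_probability({(1, 8): 0.7, (1, 9): 0.2, (2, 7): 0.6}, 9)
```

Each per-base value could then be mapped to a shade or color on the base itself, much as the dot plot maps probabilities to shades.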
Do we want to display BPP at the base level in the puzzle interface? What are our visualization options at the base level? And since so much "player puzzle" solving currently uses the energy metric, should we consider creating a BPP metric for the game? (Thanks @JR1 for the idea.) A BPP metric for pseudoknots, where we need to balance the two stems, would be different from a BPP metric for standard puzzles.
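To make the metric idea concrete, here is a minimal sketch of one possible BPP score, assuming the same dictionary-of-pairs input as a dot plot: the mean probability of the target structure's pairs. For a pseudoknot, scoring by the weaker of the two stems rewards designs that balance both stems. The function names and the choice of mean and minimum are illustrative assumptions; other choices, such as the ensemble defect used by NUPACK, would also be reasonable.

```python
from statistics import mean

Pair = tuple[int, int]  # (i, j) with i < j, 1-indexed (assumption)


def target_pair_score(bpp: dict[Pair, float], target: list[Pair]) -> float:
    """Mean probability of the target structure's pairs (1.0 = every pair certain)."""
    if not target:
        raise ValueError("target structure has no pairs")
    # A target pair missing from the dot plot is treated as probability 0.
    return mean(bpp.get(pair, 0.0) for pair in target)


def pseudoknot_score(bpp: dict[Pair, float], stem_a: list[Pair], stem_b: list[Pair]) -> float:
    """Score a two-stem pseudoknot by its weaker stem, so both stems must be likely."""
    return min(target_pair_score(bpp, stem_a), target_pair_score(bpp, stem_b))


# Made-up probabilities; stems (1,10),(2,9) and (5,20),(6,19) cross each other.
bpp = {(1, 10): 0.9, (2, 9): 0.8, (5, 20): 0.3, (6, 19): 0.5}
stem_a = [(1, 10), (2, 9)]
stem_b = [(5, 20), (6, 19)]
print(target_pair_score(bpp, stem_a))         # about 0.85
print(pseudoknot_score(bpp, stem_a, stem_b))  # about 0.4, limited by the weaker stem
```

Using the minimum means a player cannot raise the pseudoknot score by over-stabilizing one stem at the expense of the other.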
We don't know for certain that we are headed towards a BPP model in the game; a new algorithm could emerge. But since that looks like the most likely path right now, it would be beneficial to start envisioning possibilities.
Arcplot/arcknot uses shades of gray by default for differing probability levels. You can also specify your own colormap using the config section. The default is:
cmap: #000000#666666#999999#CCCCCC
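The cmap value is just the colors concatenated, one per bin. Here is a minimal Python sketch for splitting such a string into its colors. It is a hypothetical helper, not ArcKnot's own parser, and assumes every entry is '#' followed by six hex digits, as in the default.

```python
import re

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_cmap(cmap: str) -> list[str]:
    """Split a cmap string such as '#000000#666666' into its '#RRGGBB' colors."""
    cmap = cmap.strip()
    colors = HEX_COLOR.findall(cmap)
    # If anything besides well-formed 6-digit entries is present, the joined
    # matches will not reproduce the input string.
    if "".join(colors) != cmap:
        raise ValueError(f"malformed cmap: {cmap!r}")
    return [c.upper() for c in colors]


print(parse_cmap("#000000#666666#999999#CCCCCC"))
# ['#000000', '#666666', '#999999', '#CCCCCC']
```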
But you can use colors instead of shades of gray. The classic "web-safe" palette has 216 colors (effectively 00, 33, 66, 99, CC, FF for each of R, G and B), but telling more than a few colors apart with small lines or dots is a challenge.
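The 216 figure comes from allowing six levels per channel: 6 × 6 × 6 = 216. A small sketch that builds that palette and snaps an arbitrary color to its nearest web-safe neighbor could help when picking a few clearly distinguishable cmap colors. The helper names are hypothetical.

```python
from itertools import product

# Six evenly spaced levels per channel: 0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF.
LEVELS = (0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF)


def web_safe_palette() -> list[str]:
    """All 6 ** 3 = 216 web-safe colors as '#RRGGBB' strings."""
    return [f"#{r:02X}{g:02X}{b:02X}" for r, g, b in product(LEVELS, repeat=3)]


def snap_to_web_safe(hex_color: str) -> str:
    """Round each channel of a '#RRGGBB' color to the nearest web-safe level."""
    h = hex_color.lstrip("#")
    channels = [int(h[k:k + 2], 16) for k in (0, 2, 4)]
    snapped = [min(LEVELS, key=lambda level: abs(level - c)) for c in channels]
    return "#" + "".join(f"{c:02X}" for c in snapped)


print(len(web_safe_palette()))      # 216
print(snap_to_web_safe("#6A6A6A"))  # #666666
```

Because the levels are 51 (0x33) apart, no integer channel value sits exactly halfway between two levels, so the rounding never has to break a tie.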
What ArcKnot does not allow at the moment is explicit user-defined "binning" boundaries. In principle that could be added. It currently converts the pairing probability into a logarithmic value corresponding to free energy and "bins" the values into the colors. You can change the "binsize" to change the scaling. At the moment the default binsize corresponds to a free energy difference of roughly 1, or, if I recall correctly, a factor of 1/5.27 in probability.
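The logarithmic binning described above can be sketched as follows. This is not ArcKnot's source. It assumes each bin spans a probability factor of 5.27 (the figure given above, roughly one unit of free energy), that bin 0 holds the most probable pairs and takes the first (darkest) cmap color, and that anything weaker than the last boundary is lumped into the last bin.

```python
import math
from collections.abc import Sequence

# Assumption from the post: one bin spans a probability factor of about 5.27,
# roughly one unit of free energy.
BIN_FACTOR = 5.27

# Default cmap; assumption: the darkest color marks the most probable pairs.
DEFAULT_CMAP = ("#000000", "#666666", "#999999", "#CCCCCC")


def probability_to_bin(p: float, n_bins: int, bin_factor: float = BIN_FACTOR) -> int:
    """Map a pairing probability in (0, 1] to a color-bin index.

    -ln(p) is proportional to a free-energy penalty (-RT ln p), so dividing by
    ln(bin_factor) gives equal-width bins on a free-energy scale. Bin 0 holds
    the most probable pairs; anything past the last boundary lands in the last bin.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    index = math.floor(-math.log(p) / math.log(bin_factor))
    return min(index, n_bins - 1)


def color_for_probability(p: float, cmap: Sequence[str] = DEFAULT_CMAP) -> str:
    """Pick the cmap color for a pairing probability."""
    return cmap[probability_to_bin(p, len(cmap))]


for p in (0.9, 0.1, 0.01, 0.001):
    print(p, color_for_probability(p))
# 0.9 #000000
# 0.1 #666666
# 0.01 #999999
# 0.001 #CCCCCC
```

Changing bin_factor roughly plays the role of the binsize setting: a larger factor means wider bins, so a given cmap covers a wider range of probabilities.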
[Update: there was a bug that prevented the proper use of a user-defined colormap. It is fixed in ArcKnot v3.4.6, so yes, you can set a colormap (cmap) and use it to color the pairing probability data.]
I'm making it even easier to create colored dot plots. This is coming soon in the next beta release.
Jennifer Pearl challenged me to consider whether the energy metric (and ensemble) could actually be dropped, given that the thermodynamics are real. The discussion led to this article in Chemistry World about AlphaFold.
The algorithm may have learnt about energy landscapes, but it needs a little help to find the global minimum
The success of AlphaFold for predicting the structures of more than 200 million proteins, announced last August, led to excited claims that the algorithm will revolutionise biology, drug discovery and molecular medicine. That remains to be seen, but some were keen to temper the hype by pointing out that AlphaFold had not in fact "solved the protein-folding problem". Rather, it had sidestepped the question by using machine learning to find associations between sequence and known structures that it then generalised to unknown structures.
Perhaps we initially get an ML model with a confidence estimate or an MSA (multiple sequence alignment) strength calculation (if we are lucky), and then a researcher develops a corresponding energy calculation. It's still interesting to think about possible new visualizations. I envision a 3D model where color shading indicates which areas are hot or cold, so that a player can quickly see which areas need improvement.