Base pair probability depiction in Eterna puzzles

Jonathan and I were talking about Eternacon on Monday, and I pointed out that we really should start talking about transitioning from a nearest-neighbor thermodynamics prediction model in the puzzle interface to a base pair probability (BPP) prediction model, and that we could discuss it as a community at Eternacon. Jonathan brought up the idea of somehow taking what we display in the dot plot and displaying that in the puzzle. If we truly are moving towards a base pair probability model, then let’s start brainstorming about how BPP data could be communicated in Eterna puzzles.

Do we want to display BPP at the base level in the puzzle interface? What are our visualization options at the base level? And since so much “player puzzle” solving currently uses the energy metric, should we consider creating a BPP metric for the game? (Thanks @JR1 for the idea) The BPP metric for pseudoknots where we need to balance the two stems would be different than a BPP metric for standard puzzles.

We don’t know for certain that we are headed towards a BPP model in the game, since a new algorithm could emerge. But that looks like the most likely path right now, so it would be beneficial to start envisioning possibilities.

I do this to show pairing relationships for a design. It would get messy for a big one…

definitely setting filter levels for probs would be good so you can decide what strengths you want to see
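As a rough illustration of what a filter level could mean in practice, here is a minimal Python sketch. The function name `filter_pairs`, the dictionary layout, and the example probabilities are all made up for illustration; nothing here comes from Eterna or any existing tool.

```python
def filter_pairs(bpp, min_prob):
    """Return (i, j, p) triples with probability >= min_prob, strongest first."""
    kept = [(i, j, p) for (i, j), p in bpp.items() if p >= min_prob]
    return sorted(kept, key=lambda pair: pair[2], reverse=True)


# Hypothetical base pair probabilities keyed by 1-based (i, j) positions.
example_bpp = {
    (1, 20): 0.95,
    (2, 19): 0.90,
    (3, 18): 0.40,
    (5, 12): 0.08,
    (6, 11): 0.02,
}

# A strict filter shows only the dominant pairs; a loose one adds weak alternatives.
strong = filter_pairs(example_bpp, 0.5)
everything = filter_pairs(example_bpp, 0.05)
```

A player could then raise or lower `min_prob` to decide how much of the ensemble to draw, which keeps a big design from getting too messy.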

There are Dot Plot puzzles in Eterna that depict the degree of base pairing probabilities, which might be another way of visualizing the concept.

Here is an article discussing BPP:

I opened the doc and the first thing I saw was this

and it is a good example of filter levels, in that each color is a different filter level. I think that this implemented with arcplot would be great.

Arcplot/arcknot uses shades of gray by default for differing probability levels. You can also specify your own colormap using the config section. The default is:

cmap: #000000 #666666 #999999 #CCCCCC

But you can use colors instead of shades of gray. The classic “web-safe” palette has only 216 colors (00, 33, 66, 99, CC, FF for each of R, G, and B), and in any case telling more than a few colors apart with small lines/dots is a challenge.

What arcknot does not allow at the moment is explicit user-defined ‘binning’ boundaries. In principle that could be added. It currently converts the pairing probability into a logarithmic value corresponding to Free Energy values and “bins” the values into the colors. You can change the “binsize” to change the scaling. At the moment the default binsize corresponds to roughly a free energy difference of 1, or if I recall a factor of 1/5.27 in probability.
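To make the log-scale binning idea concrete, here is a minimal Python sketch. It is not ArcKnot's actual code: the function name `prob_to_color`, the 37 °C temperature, the kcal/mol units, and the choice that the darkest color goes to the most probable pairs are all assumptions for illustration. Because the temperature is assumed, the probability factor per bin may not match the figure quoted above.

```python
import math

# Assumed constants: gas constant in kcal/(mol*K) times 37 C in kelvin.
RT = 0.0019872 * 310.15

# The default gray colormap quoted above, darkest first.
CMAP = ["#000000", "#666666", "#999999", "#CCCCCC"]


def prob_to_color(p, binsize=1.0, cmap=CMAP):
    """Map a pairing probability to a color by binning -RT*ln(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    energy = -RT * math.log(p)  # 0 for p == 1, grows as p shrinks
    # Each bin spans `binsize` energy units; anything past the last bin
    # is clamped to the lightest color.
    index = min(int(energy // binsize), len(cmap) - 1)
    return cmap[index]


for p in (1.0, 0.1, 0.01, 1e-6):
    print(p, prob_to_color(p))
```

A smaller `binsize` spreads the colors over a narrower probability range, and passing a different `cmap` list swaps the grays for colors.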

[Update: there was a bug that prevented the proper use of a user-defined colormap. It is fixed in ArcKnot v3.4.6, so yes, you can set a colormap (cmap) and use it to color the pairing probabilities data.]

I’m making it even easier to create colored dotplots. Coming soon in the next beta release:

  • cmap: 5/RHGCB

Jennifer Pearl challenged me to consider whether the energy metric (and ensemble) actually could be dropped given that the thermodynamics are real. The discussion led to this article in Chemistry World about AlphaFold.

The algorithm may have learnt about energy landscapes, but it needs a little help to find the global minimum

The success of AlphaFold for predicting the structures of more than 200 million proteins, announced last August, led to excited claims that the algorithm will revolutionise biology, drug discovery and molecular medicine. That remains to be seen, but some were keen to temper the hype by pointing out that AlphaFold had not in fact ‘solved the protein-folding problem’. Rather, it had sidestepped the question by using machine learning to find associations between sequence and known structures that it then generalised to unknown structures.

Perhaps we initially get an ML model with a confidence estimate or MSA strength calculation (if we are lucky), and then a researcher develops a corresponding energy calculation. Still interesting to think about possible new visualizations. I envision a 3D model where color shading indicates which areas are hot or cold so that a player can quickly see which area needs improvement.

I really want to make one puzzle with more than 1/3 GU pairs that’s a harder version of one of the 1000 point puzzles.