Base pair probability depiction in Eterna puzzles

DigitalEmbrace · May 12, 2023, 9:50pm

Jonathan and I were talking about Eternacon on Monday and I pointed out that we really should start talking about transitioning from a nearest neighbor thermodynamics prediction model in the puzzle interface to a base pair probability (BPP) prediction model, and that we could discuss it as a community at Eternacon. Jonathan brought up the idea of somehow taking what we display in the dot plot and displaying that in the puzzle. If we truly are moving towards a base pair probability model, then let’s start brainstorming about how BPP data could be communicated in Eterna puzzles.

Do we want to display BPP at the base level in the puzzle interface? What are our visualization options at the base level? And since so much “player puzzle” solving currently uses the energy metric, should we consider creating a BPP metric for the game? (Thanks @JR1 for the idea.) The BPP metric for pseudoknots, where we need to balance the two stems, would be different from a BPP metric for standard puzzles.

We don’t know for certain that we are headed towards a BPP model in the game, and a new algorithm could still emerge. But since BPP looks like the most likely path right now, it would be beneficial to start envisioning possibilities.

Jennifer_Pearl · May 14, 2023, 12:14am

I do this to show pairing relationships for a design. It would get messy for a big one…

Setting filter levels for the probabilities would definitely be good, so you can decide which strengths you want to see.
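A minimal sketch of what a filter level could look like, assuming the BPP data is available as a symmetric matrix of probabilities. The function name `filter_pairs` and the toy values are made up for illustration and are not part of Eterna or any existing tool:

```python
def filter_pairs(bpp, min_prob):
    """Return (i, j, p) for every pair i < j whose probability is at least min_prob.

    bpp is assumed to be a symmetric square matrix (list of lists) where
    bpp[i][j] is the probability that base i pairs with base j.
    """
    n = len(bpp)
    return [
        (i, j, bpp[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if bpp[i][j] >= min_prob
    ]


# Toy 4-base matrix with made-up probabilities (hypothetical data).
bpp = [
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.6, 0.05],
    [0.1, 0.6, 0.0, 0.0],
    [0.9, 0.05, 0.0, 0.0],
]

# Only the stronger pairs survive a filter level of 0.5.
strong = filter_pairs(bpp, 0.5)
```

Raising or lowering `min_prob` is the "filter level": a high value shows only the dominant pairs, a low value brings in the weaker alternatives.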

jnicol · May 14, 2023, 11:26am

There are Dot Plot puzzles that depict the degree of base pairing probabilities, which might be another way of visualizing the concept.

JR1 · May 14, 2023, 3:16pm

Here is an article discussing BPP:

Jennifer_Pearl · May 17, 2023, 12:53am

I opened the doc and the first thing I saw was this

and it is a good example of filter levels, in that each color is a different filter level. I think that this implemented with arcplot would be great.

jandersonlee · May 18, 2023, 12:00am

Arcplot/arcknot uses shades of gray by default for differing probability levels. You can also specify your own colormap using the config section. The default is:

cmap: #000000 #666666 #999999 #CCCCCC

But you can use colors instead of shades of gray. The traditional web-safe palette has only 216 colors (effectively 00, 33, 66, 99, cc, ff for each of R, G, and B), and telling more than a few colors apart with small lines/dots is a challenge.

What arcknot does not allow at the moment is explicit user-defined ‘binning’ boundaries. In principle that could be added. It currently converts the pairing probability into a logarithmic value corresponding to Free Energy values and “bins” the values into the colors. You can change the “binsize” to change the scaling. At the moment the default binsize corresponds to roughly a free energy difference of 1, or if I recall, a factor of 1/5.27 in probability.
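As a rough illustration of that kind of logarithmic binning (this is not ArcKnot's actual code): the sketch below turns a probability into a free-energy-like value with -RT ln(p) and picks a color by dividing by the bin size. The value of `RT`, the function name `color_for_probability`, and the choice to leave pairs beyond the last bin undrawn are all assumptions for the example, so the exact bin boundaries will differ from ArcKnot's.

```python
import math

RT = 0.616  # kcal/mol near 37 C; an assumed constant, not taken from ArcKnot
CMAP = ["#000000", "#666666", "#999999", "#CCCCCC"]  # default shades quoted above


def color_for_probability(p, binsize=1.0, cmap=CMAP):
    """Map a pairing probability to a color via logarithmic binning.

    Returns None when the pair is too weak to fall into any bin
    (an assumption: such pairs are simply not drawn).
    """
    if p <= 0.0:
        return None
    energy = -RT * math.log(p)      # 0 for p = 1, grows as p shrinks
    index = int(energy // binsize)  # which bin the value lands in
    if index >= len(cmap):
        return None
    return cmap[index]


# Strong pairs get the darkest shade, weaker ones progressively lighter.
shades = [color_for_probability(p) for p in (0.5, 0.1, 0.01, 0.005, 0.001)]
```

Each step of one `binsize` in the energy-like value corresponds to a constant multiplicative factor in probability, which is why a larger `binsize` spreads the same colormap over a wider range of probabilities.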

[Update: there was a bug that prevented the proper use of a user-defined colormap. It is fixed in ArcKnot v3.4.6, so yes, you can set a colormap (cmap) and use it to color the pairing probability data.]

I’m making it even easier to create colored dotplots. Coming soon in the next beta release:

cmap: 5/RHGCB

DigitalEmbrace · August 2, 2023, 11:30am

Jennifer Pearl challenged me to consider whether the energy metric (and ensemble) actually could be dropped given that the thermodynamics are real. The discussion led to this article in Chemistry World about AlphaFold.

The algorithm may have learnt about energy landscapes, but it needs a little help to find the global minimum

The success of AlphaFold for predicting the structures of more than 200 million proteins, announced last August, led to excited claims that the algorithm will revolutionise biology, drug discovery and molecular medicine. That remains to be seen, but some were keen to temper the hype by pointing out that AlphaFold had not in fact ‘solved the protein-folding problem’. Rather, it had sidestepped the question by using machine learning to find associations between sequence and known structures that it then generalised to unknown structures.

Perhaps we initially get a ML model with a confidence estimate or MSA strength calculation (if we are lucky), and then a researcher develops a corresponding energy calculation. Still interesting to think about possible new visualizations. I envision a 3D model where color shading indicates which areas are hot or cold so that a player can quickly see which area needs improvement.

KangLeTian · January 31, 2024, 11:18pm

I really want to make a puzzle with more than 1/3 GU pairs that’s a harder version of one of the 1000-point puzzles.