NUPACK 4.0 in Eterna?

@LFP6 @DigitalEmbrace

NUPACK 4.0 now has native Python library support for Linux, so I was wondering whether we could use that to speed up deployment of all of the NUPACK functions to Eterna? Otherwise, I can help code up the C++ side of it.

(I was prevented from replying again in the analysis thread due to reply limits, and I figured this was a good place for the discussion…)

Do you happen to know if NUPACK 4.0 can recognize any tertiary structure besides pseudoknots?

No, and pseudoknots are excluded from the partition function in NUPACK 4.0; it says so at the start of the user manual… sorry.

I just checked my code, and what I used for the PNAS work was v3.0, so I will be quiet.

I am working on a project that I think the community will like as a workaround, and that should make my work and theories more accessible to community testing and consensus, as that is what is important. NUPACK 4.0 is a radical change in the codebase for the software. They have moved to a unified framework across architectures, and they have native Python libraries that expose everything, and it’s fast and nice… The instructions that you get from the NUPACK team if you request them guide you through installing it on Linux or on the Windows Subsystem for Linux. If you use the Linux subsystem, you can install everything and access it all via Jupyter notebooks from your browser. Anyone can run it, so I am working on a Jupyter notebook for the community to at least start with, so they can play with this a bit. Also, some of the outputs are weird, and I have to use the old documentation, old knowledge, and old code to figure out what I’m looking at. I hope it turns out OK…

Would be cool to see NUPACK 4.0 in action. I wonder if the new version still penalizes GU closing pairs? There is an issue with our current implementation that throws a huge energy penalty for some GU closing pairs.

@DigitalEmbrace

Here is NUPACK 4.0 in action…

https://forum.eternagame.org/t/first-version-of-nupack-4-0-jupyterlabs-script-with-mfe-and-ensemble-info/4224

I will look into the GU thing later… There are comments about closing pair energies in the energy parameters that you might want to look at. Specifically, look at the NUPACK 4.0 model specification in the user guide here:

https://docs.nupack.org/

Couple comments here:

  • Python support doesn’t really impact usage within Eterna at all - we expect the folding engines to be C/C++ and compile them to WebAssembly via Emscripten. Python doesn’t buy us anything in this case.
  • Re: “speed up deployment of all of NUPACK functions to Eterna” - what do you mean? What functionality is missing? Is it actually functionality new in 4.0, or just not exposed via EternaScript? (If it’s the latter, updating to 4.0 doesn’t have any impact)
  • As far as updating to 4.0 in general: it would be good to stay updated. However, I get nervous doing this, particularly if the actual output changes. That would mean submitted solutions and cached computations may no longer be valid, some puzzles may become unsolvable, etc. We haven’t thought through how to handle this - though we probably should (and now’s probably a good time to do it, during our major rewrite).
  • As far as GU closing pairs: @DigitalEmbrace I wouldn’t really call it “an issue with our current implementation”. As far as I know, this is purely NUPACK’s expected behavior. From my understanding, the idea is that these are expected never to fold correctly, so they parametrize it in a way that it will never be predicted to form. I don’t see that as a problem (or at the very least, not our problem).
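On the cached-computation concern above, one possible approach is to key cached results by engine name and version, so results from an older engine are never served for a newer one. This is only a rough sketch under assumptions: `EngineId`, `FoldResult`, and `VersionedFoldCache` are hypothetical names made up for illustration and do not describe how Eterna actually stores anything.

```typescript
// Hypothetical types for illustration only; not Eterna's actual cache.
interface EngineId {
    name: string;
    version: string;
}

interface FoldResult {
    structure: string; // dot-bracket notation
    freeEnergy: number; // kcal/mol
}

class VersionedFoldCache {
    private readonly entries = new Map<string, FoldResult>();

    // The engine version is part of the key, so an upgrade that changes
    // output cannot return a result computed by the previous version.
    private static key(engine: EngineId, sequence: string): string {
        return `${engine.name}@${engine.version}|${sequence}`;
    }

    put(engine: EngineId, sequence: string, result: FoldResult): void {
        this.entries.set(VersionedFoldCache.key(engine, sequence), result);
    }

    get(engine: EngineId, sequence: string): FoldResult | undefined {
        return this.entries.get(VersionedFoldCache.key(engine, sequence));
    }
}
```

With something like this, a lookup made under a new engine version simply misses and triggers a recompute. It does not address what to do about submitted solutions that stop validating; that would still need a policy decision.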

Hi LFP6,

I looked at the EternaJS source code, and FullFold.cpp and the other one only get the MFE secondary structure, the MFE, and the pairing probabilities. That is data from 3 specific libraries; no subopt functions are exposed, and neither is ED. There are something like 7 or 8 different functions under the pfunc group that we don’t have access to and that aren’t even coded up in EternaJS, and these are in both 3.1 and 4.

Also, you can use the old energy parameter models from NUPACK 3, for example “rna99-nupack3”, and its results so far match what I get with NUPACK 3 in my old software.

I don’t think this is something I have the bandwidth to do myself, but if someone (could be you) makes a PR, I could review it. It would require:

  1. Adding the C++ bindings (such as those currently in FullFold/FullEval)
  2. Adding the TypeScript bindings (ie, expanding the Folder class and one or more of its subclasses)
  3. Registering additional callbacks via buildScriptInterface in PoseEditMode

The main thing I’d be nervous about is that presumably not all engines expose this functionality. Presumably we could have a flag (like canScoreStructures, canDotPlot, canMultifold, etc.), though we’d want to design the API carefully so that if other engines can do comparable things, we don’t box ourselves into an awkward API when trying to expand it in the future.
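To make the flag idea concrete, here is a minimal sketch of a capability flag gating a script callback. Everything in it is an assumption for illustration: `SketchFolder` is a stand-in and not the real Folder class, `canSubopt` and `subopt` are hypothetical names modeled on the existing `canDotPlot`-style flags, and `buildSketchScriptInterface` only imitates the role of buildScriptInterface.

```typescript
// Stand-in types for illustration; NOT the real EternaJS classes.
interface SuboptResult {
    structures: string[]; // dot-bracket strings
    energies: number[]; // kcal/mol, same order as structures
}

abstract class SketchFolder {
    abstract get name(): string;

    // Hypothetical capability flag; engines opt in by overriding it.
    get canSubopt(): boolean {
        return false;
    }

    // Default for engines without support: no result.
    subopt(_sequence: string, _energyGap: number): SuboptResult | null {
        return null;
    }
}

class EngineWithSubopt extends SketchFolder {
    get name(): string {
        return "EngineWithSubopt";
    }

    get canSubopt(): boolean {
        return true;
    }

    subopt(sequence: string, _energyGap: number): SuboptResult {
        // Placeholder body: a real binding would call the compiled engine.
        // Here we only return the fully unpaired structure as dummy data.
        return {structures: [".".repeat(sequence.length)], energies: [0]};
    }
}

class EngineWithoutSubopt extends SketchFolder {
    get name(): string {
        return "EngineWithoutSubopt";
    }
}

type ScriptCallback = (...args: any[]) => unknown;

// Register the callback only when the active engine advertises support.
function buildSketchScriptInterface(folder: SketchFolder): Map<string, ScriptCallback> {
    const callbacks = new Map<string, ScriptCallback>();
    if (folder.canSubopt) {
        callbacks.set("subopt", (seq: string, gap: number) => folder.subopt(seq, gap));
    }
    return callbacks;
}
```

Keeping the result type engine-neutral (plain structures and energies rather than anything NUPACK-specific) is one way to avoid boxing the API in if another engine later offers a comparable feature.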

I’ll start by writing up the C++ bindings, as that is the part I can help with the most right now. I will test the folding with it on my own system first, of course, and then pull the Eterna code, integrate it, and test.

@LFP6 @lroppy

I finished up the code for NUPACK subopt in the Eterna source code and opened a pull request this afternoon, so if LFP6 and Eterna are OK with how I implemented it and push the release, we can have subopt whenever. I will be working on ED later, as it is a bit more complex to get working than I thought: it is a special C program with lots of file print commands that I need to filter through and change to output to strings and such in its own C++ program… basically doing a rewrite while keeping their code and intentions the same… I did not want to undertake that now…

@Jennifer_Pearl Will the new API cover oligo binding in NUPACK? I don’t know if pairing probabilities can even be applied to oligo binding, or applied with much utility.

The additions to NUPACK-3.0.4 that I made in Eterna, which I am waiting for Jonathan to review and implement, support subopt structure and energy calculations for both a single RNA strand and multifold oligos. It’s actually interesting that you brought up pairing probs for oligos, as I was just talking to Jonathan about that and what I saw in the code. Currently, in the build of NUPACK-3.0.6 that I use for my research, I go through the painstaking process of running the oligos, their concentrations in the test environment, and their masses through NUPACK’s concentrations and complexes utilities to generate a bunch of files that then allow NUPACK to very accurately predict the pfunc of multifold oligos. I’m going to work on implementing something like that, as it will probably help with accuracy and remove some variation in the background that I think may be creating a bit of noise in the data.