Prolog: An AI program that plays Eternagame

prolog · July 18, 2020, 8:09pm

I am writing an AI program in the Prolog language that plays Eternagame. The aim is not to explicitly solve the puzzles but to provide a reasonable number of candidate solutions. I do not want to take the fun out of the game, but instead to use the emergent wisdom of the gaming community as I go along. I am sharing my results and source code on a Google Drive. Here is the link:

https://drive.google.com/drive/folders/1ROnT1-C_X8OE1CGoHJEMUFE45WB4Nd5T?usp=sharing

I may not have anything new for a couple of days. The last time I ran the program on this dinky laptop, it took over 40 minutes to complete. I’m going to try and move it to my Z600 workstation. It’s a workhorse with two Xeon chips, but it’s old, can’t be upgraded to Windows 10 and therefore it is not safe to connect it to the internet anymore. So I’ll have to work there, then put the results on a flash drive and load them onto this laptop, from whence I can update the google drive.

I’m hoping to hear from some of you.

ch1ck3n · July 19, 2020, 12:20am

i can give you a good computer

prolog · July 19, 2020, 1:46am

That sounds great. How can we work this out?

ch1ck3n · July 19, 2020, 2:14pm

i was just joking
um sorry

prolog · July 19, 2020, 6:17pm

That’s okay. No harm done.

dosoonkim · July 20, 2020, 5:19am

This looks awesome!

Could you link to some descriptions of how to interpret Prolog and some of the basic strategies you might be implementing in the program?

Specifically, does providing candidate solutions mean all of the candidates are viable solutions? Or more that it provides jumping off points that you can use to get to the end of puzzles?

prolog · July 20, 2020, 5:23pm

Here is a link to the Prolog manual:
Prolog Manual
Once you get there you can also go to the download page and get the free(!) compiler so you can try it yourself. I find the manual a bit opaque, but you youngsters who were raised on Unix may find it quite readable.

There are a couple of good books on the subject: “Programming in Prolog” by Clocksin & Mellish and “The Art of Prolog” by Sterling and Shapiro.

Here’s a very quick primer for those who haven’t gotten the emails I sent you:
:- means ‘if’.
. ends a clause.
; means ‘or’.
, means ‘and’, except when it’s just an argument delimiter within () or [].
Variables start with an uppercase letter.
Constants start with a lowercase letter or a number.
Square brackets enclose a list.
[] is the empty list.
[H|T] is a list that starts with the head, H, and ends with the tail, T (itself a list).
Subroutines are called predicates.
A predicate may have multiple clauses, each ending in a period. The clauses are tried in order as alternatives, so together they also act like ‘or’. Within a clause, goals are separated by commas (again meaning ‘and’).

Your second question is one that also occupies my mind. The easiest thing to do is to generate all possible combinations of bases that are not locked in the puzzle description, thus producing an astronomical number of possible sequences. To pare that down I apply rules. The first is to specify that all positions which are bonded in the final structure hold bond pairs (u-a, c-g and u-g) in the sequence. In some cases it is necessary to add an empty cell (e) to make things line up. The e is then deleted before output.
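
For anyone who doesn't read Prolog, here is a rough Python sketch of that first rule. It is not the code in Rnafold.pl. It assumes the target structure is given in dot-bracket notation, and the `locked` argument (a dict of position to base) is a hypothetical stand-in for the puzzle's locked bases. The empty-cell (e) trick is left out.

```python
from itertools import product

# Ordered base pairs allowed at bonded positions: u-a, c-g and u-g, both orientations.
BOND_PAIRS = ["AU", "UA", "CG", "GC", "GU", "UG"]
BASES = "AUCG"

def pair_table(structure):
    """Map each '(' index to the index of its matching ')' in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    return pairs

def candidates(structure, locked=None):
    """Yield every sequence whose bonded positions hold an allowed bond pair.

    `locked` is a hypothetical dict {index: base} for bases fixed by the puzzle.
    """
    locked = locked or {}
    pairs = pair_table(structure)
    closers = set(pairs.values())
    # One "slot" per bonded pair or per free base, each with its list of choices.
    slots = []
    for i, ch in enumerate(structure):
        if i in pairs:
            j = pairs[i]
            opts = [p for p in BOND_PAIRS
                    if locked.get(i, p[0]) == p[0] and locked.get(j, p[1]) == p[1]]
            slots.append(((i, j), opts))
        elif i not in closers:
            slots.append(((i,), [locked[i]] if i in locked else list(BASES)))
    # Take the Cartesian product of the choices and write each into place.
    for combo in product(*(opts for _, opts in slots)):
        seq = [""] * len(structure)
        for (idx, _), choice in zip(slots, combo):
            for k, base in zip(idx, choice):
                seq[k] = base
        yield "".join(seq)
```

Even a five-base hairpin like "(...)" gives 6 x 4 x 4 x 4 = 384 candidates under this rule alone, which is why the further rules matter.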

Then there are more heuristic limits. These include requiring strong (c,g) bonds when closing loops and requiring that (u,a) bonds alternate along the length of the RNA when possible. I will be looking for a lot more of these in the future.
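
As an illustration only, the strong-bond rule might look like this in Python. The definition of "closing a loop" used here (a pair whose next inner positions are not paired to each other) is my simplifying assumption for the sketch, and the alternating (u,a) rule is not covered.

```python
def pair_table(structure):
    """Map each '(' index to the index of its matching ')' in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    return pairs

def closes_loop(i, j, pairs):
    """Assumed definition: (i, j) closes a loop if (i+1, j-1) is not also a pair."""
    return pairs.get(i + 1) != j - 1

def strong_closing_pairs(seq, structure):
    """True when every loop-closing pair in the structure is a c-g (or g-c) bond."""
    pairs = pair_table(structure)
    return all({seq[i], seq[j]} == {"C", "G"}
               for i, j in pairs.items()
               if closes_loop(i, j, pairs))
```

In "((...))" only the inner pair counts as closing the loop, so "AGAAACU" passes and "GAAAAUC" does not.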

So, the idea is to provide output that is more likely than random to succeed and is a much smaller set than all that could be imagined, but still leaves room for those “outside the box” solutions, which, I believe, are the main objective of the Open Vaccine project. I have added a Focus variable, which may be set to w (wide), m (medium), or n (narrow) to allow for the application of more or fewer rules during the run.
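
One way to picture the Focus setting, sketched in Python rather than Prolog: each rule carries the widest focus at which it applies, and a narrower focus switches on more rules. The names `FOCUS_RANK`, `rules_for` and `filter_candidates` are made up for this sketch, and which rules belong to which focus in Rnafold.pl is not shown here.

```python
# Assumed ordering: wide applies the fewest rules, narrow the most.
FOCUS_RANK = {"w": 0, "m": 1, "n": 2}

def rules_for(focus, rules):
    """Pick the rules active at `focus`.

    `rules` is a list of (min_focus, rule) pairs, where `rule` is a function
    from a sequence to True/False and `min_focus` is the widest focus that uses it.
    """
    level = FOCUS_RANK[focus]
    return [rule for min_focus, rule in rules if FOCUS_RANK[min_focus] <= level]

def filter_candidates(seqs, focus, rules):
    """Keep only the sequences that satisfy every rule active at `focus`."""
    active = rules_for(focus, rules)
    return [s for s in seqs if all(rule(s) for rule in active)]

# Placeholder rules, standing in for real heuristics.
EXAMPLE_RULES = [
    ("m", lambda s: s.startswith("G")),
    ("n", lambda s: s.endswith("C")),
]
```

With these placeholder rules, `filter_candidates(["GAAC", "GAAA", "AAAC"], "w", EXAMPLE_RULES)` keeps all three sequences, "m" keeps the two that start with G, and "n" keeps only "GAAC".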

The sets for the first two puzzles are different. Of course, finding a solution to either of these is simple. However, if we take the problem at its word, then most solutions will have more than five of the required base. You can’t actually get to those from within the game. I think there are about 12K solutions for each one and I think that the output sets enumerate all of them. I can’t say that I have either counted or checked them all.

A few quick words on the Google Drive.
The source file is Rnafold.pl
Text (.txt) files are the raw output from the program.
Files 1 through 11 are for the first eleven puzzles.
gs1 is for the next puzzle (#1 in gene synthesizer).
A “_w”, “_m” or “_n” in the file name indicates it was done with a specific focus.
Files with the extension .rtf have been edited to add comments and to read better when printed.

This is getting to be a pretty long reply, so I think I will stop now.

prolog · July 21, 2020, 8:33pm

One of my big problems right now is that the only way I have to find out which of the ‘candidates’ is an actual solution is to enter each sequence through the GUI of eternagame. This is pretty much impractical for more than a few sequences. I could get much more useful results if there were some way to run my output file, as a whole, through eternagame and have it tell me which ones worked.

quantropy · July 22, 2020, 8:06am

The folding engines in Eterna are buried deep in the code (for efficiency, but also possibly for copyright reasons) and I don’t think that there is any way to use them to test a design automatically. But the code for most of the folding engines is available online (mostly in C++, but there’s also Vienna 1 code in javascript) and so it should be possible to compile these to check your designs.

LFP6 · July 22, 2020, 3:46pm

The engines themselves are readily available from their original creators, and our custom patches and compilation instructions for usage within our game are available here: EternaJS/lib at master · eternagame/EternaJS · GitHub

We compile with emscripten to webassembly so that it’s usable within a browser, however that of course isn’t a necessity (they’re originally built to compile as CLI apps). One thing to note though is that if you use the models directly, you won’t get information on whether a design passes constraints - you just get what the model spits out (MFE, structure energies, etc).
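
A minimal Python sketch of what checking a whole output file against a model could look like. The helper names and the one-sequence-per-line input are assumptions, not anything from EternaJS or Rnafold. `fold` is any function that takes a sequence and returns a dot-bracket structure.

```python
def load_sequences(lines):
    """Clean up raw lines (assumed one sequence per line), skipping blanks."""
    return [line.strip().upper() for line in lines if line.strip()]

def passing_candidates(candidates, target, fold):
    """Return the candidates whose predicted structure equals `target`.

    `fold` is any callable mapping a sequence string to a dot-bracket string.
    """
    return [seq for seq in candidates if fold(seq) == target]
```

If the ViennaRNA Python bindings are installed, `fold = lambda s: RNA.fold(s)[0]` (after `import RNA`) is one way to supply the model, since `RNA.fold` returns the MFE structure along with its energy. A match here only says the model predicts the target shape, not that the design passes the puzzle's other constraints.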

If you want to work within the bounds of Eterna itself, you can write an in-game booster to run through the possibilities and check if it is satisfied or not.

Feel free to send questions my way on any of this - I’m happy to do what I can to clarify.

prolog · July 22, 2020, 9:36pm

Thanks so much. This may take me a while. I just got my brain reconfigured for Prolog and it’s been a while since I worked in C. I guess my first order of business is to get a C++ compiler. I suspect I will be back with some questions.

prolog · July 23, 2020, 3:04am

I’ve read this a little more thoroughly now and see that I don’t need my own C compiler and IDE. That’s a relief! I have a few things on my plate now, like implementing multi-threading in Prolog, making the next steps in rnafold.pl and trying to learn a few things about RNA. I will be getting back to this and I am sure it will be a great help in the near future.

prolog · July 24, 2020, 2:27pm

I have placed a document on the google drive (link in original post). “Rnafold_SDD.docx” is a Software Design Description which presents one way in which Rnafold could be integrated with the eternagame software. It is not a very detailed description at this point, but is intended as a starting point.

prolog · July 26, 2020, 11:46pm

There are four new files on the Google Drive. One is just the latest version of rnafold.pl. The other three are output files for gs1 (gene synthesizer level 1) with wide, medium and narrow focus. Note the difference in file size. The run times were similar (36 to 41 min.). I think this is because, although the narrower focus produces fewer results, it does extra crunching to weed out the less likely ones.

wayment · July 27, 2020, 12:04am

Wanted to put a plug here for an open source python package that I’ve developed and use for analyzing RNA structure: GitHub - DasLab/arnie: Python utility to estimate, compare, and reweight RNA energetics across many secondary structure algorithms. It includes instructions for downloading packages and then calling them all through a python interface.

prolog · July 27, 2020, 5:10pm

This looks extremely interesting and will take me a while to digest. I am wondering if you have already done what I am working on? I’m taking an expert system approach, rather than a neural net approach, to finding likely solutions to eternagame puzzles. I have always thought that the best thing was to combine the two, but that is not easily done.

I can see that you have done a lot along the lines of “give me a sequence and I will predict the structure.” I am not yet clear on whether you have done “give me a structure and I will predict one or more sequences that will produce it.”

prolog · July 27, 2020, 5:16pm

Oops!
I missed the “no c-g bond” restriction in gs1. I have replaced the wide and narrow focus results and eliminated the medium (ran out of rules). The file sizes have decreased significantly. I also replaced the current version of rnafold.pl.

ch1ck3n · July 27, 2020, 5:19pm

welcome wayment to eterna forums!

prolog · July 27, 2020, 5:25pm

[prolog eBook]
Here’s another link to what I think is a good Prolog tutorial.

prolog · July 28, 2020, 4:16pm

GitHub says to install Python 2.7.12. But Python wants to download version 3.8.5. Will that work?