First we cover some basics about adaptive sampling to get you going.

We will briefly talk about

  1. resources
  2. files
  3. generators
  4. how to run a simple trajectory

Imports

In [1]:
import sys, os

Alright, let’s load the package and import the Project class, since we want to start a project.

In [2]:
from adaptivemd import Project

Let’s open a project with a UNIQUE name. This will be the name used in the DB, so make sure it is new and not too short. Opening a project will always create the project if it does not exist and reopen it if it does. You cannot choose between opening modes as you would with a file; this is a precaution so that you do not accidentally delete your project.

In [3]:
# Use this to completely remove the tutorial project from the database.
Project.delete('tutorial')
In [4]:
project = Project('tutorial')

Now we have a handle for our project. The first thing is to set it up to work on a resource.

The Resource

What is a resource?

A Resource specifies a shared filesystem with one or more clusters attached to it. This can be your local machine, a regular cluster, or even a group of clusters that can access the same FS (like Titan, Eos and Rhea do).

Once you have chosen the place to store your results, it is fixed for the project and cannot (or at least should not) be altered, since all file references are made to match this resource.

Let us pick a local resource on your laptop or desktop machine; no cluster / HPC is involved for now.

In [6]:
from adaptivemd import LocalResource

We now create the Resource object.

In [7]:
resource = LocalResource()

Since this object defines the path where all files will be placed, let’s get the path to the shared folder, i.e. the one that can be accessed by all workers. On your local machine this is trivially the case.

In [8]:
resource.shared_path
Out[8]:
'$HOME/adaptivemd/'

Okay, files will be placed in $HOME/adaptivemd/. You can change this using an option when creating the Resource:

LocalResource(shared_path='$HOME/my/adaptive/folder/')

If you are interested in more information about Resource setup, consult the documentation about Resource.

Last, we save our configured Resource and initialize our empty project with it. This is done once per project and should not be altered afterwards.

In [17]:
project.initialize(resource)

Files

In [18]:
from adaptivemd import File, Directory

First we define a File object. Instead of just a string, these are used to represent files anywhere, on the cluster or in your local application. There are some subclasses or extensions of File that carry additional meta information, like Trajectory or Frame. The underlying base object of a File is called a Location.

We start with a first PDB file that is located on this machine at a relative path.

In [21]:
pdb_file = File('file://../files/alanine/alanine.pdb')

A File, like any complex object in adaptivemd, can have a .name attribute that makes it easier to find later. You can either set the .name property after creation, or use the little helper method .named() to get a one-liner; this method sets .name and returns the object itself, as shown below.
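
For illustration, this one-liner is equivalent to creating the file and then setting its name afterwards (a minimal sketch reusing the path from above):

In [ ]:
# .named() sets .name and returns the File object itself
pdb_file = File('file://../files/alanine/alanine.pdb').named('initial_pdb')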

For more information about the possibilities to specify file locations, consult the documentation for File.

In [ ]:
pdb_file.name = 'initial_pdb'

The call to .load() is important. It causes the File object to load the content of the file, and if you save the File object, the actual file is stored with it. This way it can simply be rewritten on the cluster or anywhere else.

In [ ]:
pdb_file.load()

Generators

TaskGenerators are instances whose purpose is to create tasks to be executed. This is similar to the way Kernels work. A TaskGenerator will generate Task objects for you which will be translated into a ComputeUnitDescription and executed. In simple terms:

> The task generator creates the bash scripts for you that run a simulation or run pyemma.

A task generator is initialized with all parameters needed to make it work, and it will know what needs to be staged in order to be used.

In [48]:
from adaptivemd.engine.openmm import OpenMMEngine

This task generator will create jobs to run simulations. Currently it uses a little Python script that executes OpenMM. It requires conda to be added to the PATH variable, or at least openmm to be installed on the cluster. If you set up your resource correctly, this should all happen automatically.

So let’s do an example for an OpenMM engine. This is simply a small Python script that makes OpenMM look like an executable. It runs a simulation given an initial frame, OpenMM-specific system.xml and integrator.xml files, and some additional parameters like the platform name, how often to store simulation frames, etc.

In [49]:
engine = OpenMMEngine(
    pdb_file=pdb_file,
    system_file=File('file://../files/alanine/system.xml').load(),
    integrator_file=File('file://../files/alanine/integrator.xml').load(),
    args='-r --report-interval 1 -p CPU'
).named('openmm')

We now have an OpenMMEngine which uses the previously created pdb File object and the location defined in it. The same holds for the OpenMM XML files; the args make it run on the CPU platform, etc.

Last, we named the engine openmm so we can find it later.

In [50]:
engine.name
Out[50]:
'openmm'

Next, we need to set the output types we want the engine to generate. We choose a stride of 10 for the master trajectory without atom selection, and a second trajectory with only protein atoms at native stride.

Note that the stride and all frame numbers ALWAYS refer to the native steps used in the engine. In our example the engine uses 2 fs time steps, so master stores a frame every 20 fs and protein a frame every 2 fs.

In [51]:
engine.add_output_type('master', 'master.dcd', stride=10)
engine.add_output_type('protein', 'protein.dcd', stride=1, selection='protein')
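
To make the arithmetic concrete, here is a small sanity check in plain Python (the 2 fs time step is taken from the text above):

In [ ]:
# Effective saving interval = output stride x engine-native time step
timestep_fs = 2
for name, stride in [('master', 10), ('protein', 1)]:
    print('%s stores a frame every %d fs' % (name, stride * timestep_fs))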
In [52]:
from adaptivemd.analysis.pyemma import PyEMMAAnalysis

This instance computes an MSM model from existing trajectories that you pass to it. It is initialized with a .pdb file that is used to create features between the \(c_\alpha\) atoms. This implementation requires a PDB, but in general this is not necessary; it is specific to this PyEMMAAnalysis show case.

In [53]:
modeller = PyEMMAAnalysis(
    engine=engine,
    outtype='protein',
    features={'add_inverse_distances': {'select_Backbone': None}}
).named('pyemma')

Again we name it pyemma for later reference.

The other two options choose which output type from the engine we want to analyse. We pick the protein trajectories since these are faster to load and have better time resolution.

The features dict expresses which features to use. In our case, all inverse distances between the backbone atoms selected by select_Backbone.
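
If you wanted different features, the same dict format should accept other PyEMMA featurizer calls. The following is a hypothetical sketch; the exact set of supported methods and the None-for-no-arguments convention are assumptions, so check the PyEMMAAnalysis documentation:

In [ ]:
# Hypothetical: backbone torsions instead of inverse distances
# (assumes the features dict maps featurizer method names to their arguments)
modeller_alt = PyEMMAAnalysis(
    engine=engine,
    outtype='protein',
    features={'add_backbone_torsions': None}
).named('pyemma-torsions')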

The next step is to add these to the project for later usage. We pick the .generators store and just add them. Consider a store to work like a set() in Python: it contains each object only once and is not ordered. Therefore we need a name to find the objects later. Of course you can always iterate over all objects, but the order is not guaranteed.

To be precise, there is an order given by the time of creation of each object, but it is only accurate to seconds, and it really is the time the object was created, not stored.

In [54]:
project.generators.add(engine)
project.generators.add(modeller)

Note that you cannot add the same engine twice. But if you create a new engine, it will be considered different and hence you can store it again.
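
Since the store is unordered, the name is how you find a generator again. A minimal sketch using plain iteration and the .name attribute we set earlier:

In [ ]:
# Retrieve the stored engine by the name we gave it
engine = next(g for g in project.generators if g.name == 'openmm')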

Create one initial trajectory

Finally, we are ready to run a first trajectory that we will store as a point of reference in the project. It is also nice to see how things work in general.

We are using a Worker approach. This simply means that someone (in our case the user, from inside a script or a notebook) creates a list of tasks to be done, and some other instance (the worker) actually does the work.

First we create the parameters for the engine to run the simulation. Since it seemed appropriate, we use a Trajectory object (a special File with an initial frame and a length) as the input. You could of course pass these things separately, but this way we can actually reference the not-yet-existing trajectory and do stuff with it.

A Trajectory should have a unique name, so there is a project function to create one for you. It uses numbers and makes sure that each number is used only once in the project.

In [56]:
trajectory = project.new_trajectory(engine['pdb_file'], 100, engine)
trajectory
Out[56]:
Trajectory('alanine.pdb' >> [0..100])

This says: the initial frame is from alanine.pdb, the run is 100 frames long, and the trajectory is named xxxxxxxx.dcd.

You might wonder why a Trajectory object is necessary. You could just build a function that takes these parameters and runs a simulation, returning the trajectory object at the end, the same object we created just now.

The main reason is to familiarize you with the general concept of asynchronous execution and so-called Promises. The trajectory object we built is similar to a Promise, so what is that exactly?

A Promise is a value (or an object) that represents the result of a function at some point in the future. In our case it represents a trajectory at some point in the future. Normal promises have specific functions to deal with the unknown result; for us this is a little different, but the general concept stands. We create an object that represents the specifications of a Trajectory, and so, regardless of whether it exists yet, we can use the trajectory as if it already existed:

Get the length

In [61]:
print(trajectory.length)
100

and since the length is fixed, we know how many frames there are and can access them

In [64]:
print(trajectory[20])
Frame(sandbox:///{}/00000001/[20])

ask for a way to extend the trajectory

In [65]:
print(trajectory.extend(100))
<adaptivemd.engine.engine.TrajectoryExtensionTask object at 0x110e6e210>

ask for a way to run the trajectory

In [66]:
print(trajectory.run())
<adaptivemd.engine.engine.TrajectoryGenerationTask object at 0x110dd46d0>

We can ask to extend it, we can save it, and we can reference specific frames in it before running a simulation. You could even build a whole set of related simulations this way without running a single frame, which is pretty powerful, especially in the context of running asynchronous simulations.
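
For instance, you could describe a run and its extension up front without executing anything. A minimal sketch reusing the calls shown above:

In [ ]:
# Describe a chain of related simulations without running a single frame
traj = project.new_trajectory(engine['pdb_file'], 100, engine)
first_task = traj.run()         # task that would generate the first 100 frames
longer_task = traj.extend(100)  # task that would extend the same trajectory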

Last, we did not answer why we have two separate steps: create the trajectory first and then a task from it. The main reason is educational:

> It needs to be clear that a ``Trajectory`` *can exist* before running some engine or creating a task for it. The ``Trajectory`` *is not* a result of a simulation action.

Now we want this trajectory to actually exist, so we have to make it. This requires a Task object that knows how to describe a simulation. Since Task objects are very flexible and can be complex, there are helper functions (i.e. factories) to create them in an easy manner, like the ones we already created just before. Let’s use the openmm engine to create an openmm task now.

In [57]:
task = engine.run(trajectory)

As an alternative, you can directly use the trajectory (which knows its engine) and call .run().

In [58]:
task = trajectory.run()

That’s it, just take a trajectory description and turn it into a task that contains the shell commands and needed files, etc.

Finally, we need to add this task to the things we want to be done. This is easy and only requires saving the task to the project. The task is added to the project.tasks bundle, and once it has been stored it can be picked up by any worker to execute it.

In [32]:
project.queue(task)  # shortcut for project.tasks.add(task)

That is all we can do from here. To execute the tasks you need to run a worker using

adaptivemdworker -l tutorial --verbose

Once this is done, come back here and check your results. If you want, you can also block from within the notebook until the task has been completed, as sketched below.
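
A minimal way to do that, assuming the Project API offers wait_until and tasks expose an is_done condition (this matches the adaptivemd examples, but verify against the documentation):

In [ ]:
project.wait_until(task.is_done)  # blocks until a worker has completed the task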

In [33]:
print(project.files)
print(project.trajectories)
<StoredBundle for with 6 file(s) @ 0x111fa1150>
<ViewBundle for with 0 file(s) @ 0x111fa1450>

and close the project.

In [27]:
project.close()

The final project.close() will close the DB connection.
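
If you want to continue later, reopening works just like creating (see the note at the top): constructing a Project with an existing name reopens it.

In [ ]:
project = Project('tutorial')  # reopens the existing project from the DB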
