:class:`adaptive.Worker` instances are the main execution units for your :class:`adaptive.Task` instances.
A :class:`adaptive.Task` object contains the specifics of what you want
to happen, e.g. create a trajectory of a given length; it does not know anything
about where it will run or how to achieve its goal there. The task
definition is concrete, but it lacks knowledge that only the actual
:class:`adaptive.Worker` executing it has: things like the actual working
directory (you do not want to interfere with other workers), how to copy
a file from A to B, etc.
There are two ways to use a worker:

- manually, in a script, or
- through the stand-alone ``adaptivemdworker`` bash command, which runs a Python script that creates a :class:`adaptive.Worker` with some options and runs it until it is shut down.

You will mostly use the second way, since it is much simpler: you typically submit it to the queue, and it will then poll the DB at regular intervals for tasks to run.
How does it work
----------------
Technically, a worker is handed a task to execute (picking a task from the DB is not the worker's job!). Then

- a new worker directory is created, named according to the task,
- the given task is converted into a bash script (this may already involve copying files from the DB to some folders, since that is something a bash script cannot handle),
- the bash script is executed within the current working directory,
- once it has finished successfully, the outputs are stored and created files are registered as now existing, and
- a callback is run, if the task had one.
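The steps above can be sketched in plain Python. This is an illustration only, not the actual adaptivemd worker code; the task is modeled as a simple dict with hypothetical ``name``, ``command``, and ``callback`` fields:

```python
import os
import subprocess
import tempfile

def execute_task(task):
    """Illustrative sketch of one worker iteration (not the adaptivemd API)."""
    # 1. create a fresh working directory named after the task
    workdir = tempfile.mkdtemp(prefix='worker-%s-' % task['name'])

    # 2. convert the task into a bash script (file staging from the DB
    #    would happen at this point, before the script runs)
    script = os.path.join(workdir, 'run.sh')
    with open(script, 'w') as f:
        f.write('#!/bin/bash\n' + task['command'] + '\n')

    # 3. execute the script inside the worker directory
    result = subprocess.run(['bash', script], cwd=workdir,
                            capture_output=True, text=True)

    # 4. on success, store the outputs (real workers would also register
    #    the created files in the DB)
    if result.returncode == 0:
        task['output'] = result.stdout
        task['state'] = 'success'
        # 5. run the callback, if the task had one
        if task.get('callback'):
            task['callback'](task)
    else:
        task['state'] = 'failed'
    return task
```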
The actual worker will run somewhere on the HPC, or as a separate process on your local machine. In either case, the :class:`adaptive.Worker` instance will not live in your execution script or notebook; changes you make or functions you call in your notebook therefore have no effect on the worker running elsewhere.

Still, any worker that you create through the ``adaptivemdworker`` script will
be stored in the project, so its settings are visible to anyone with access
to your project DB.
Using the DB, you have a way to connect to the worker: you can set a special property that the running worker checks at regular intervals, and if it takes certain values the worker will act on them, for example by shutting down.

The other thing that is typically of interest is the status of the worker.
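As a rough illustration of this polling pattern (the field names and command values here are assumptions for the sketch, not necessarily the real adaptivemd API):

```python
def poll_once(worker_doc):
    """One polling step of a running worker (illustrative only).

    `worker_doc` is a plain dict standing in for the worker's DB record;
    the 'command' and 'state' fields are hypothetical names.
    """
    cmd = worker_doc.get('command')
    if cmd == 'shutdown':
        return 'finish current task, then exit'
    if cmd == 'kill':
        return 'stop immediately'
    # no command set: just report the current state back
    return 'state: %s' % worker_doc.get('state', 'running')
```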
This is bad and should not happen, but it can: a worker can die. When a worker dies, that does not necessarily mean its execution thread died; the bash script runs in a separate thread that is monitored (and should also die if the worker is killed). Now suppose the worker stalls and stops accepting tasks. What happens?
The worker continuously sends a heartbeat to the DB, which is just the current timestamp. It does this every 10 seconds, and you can check the last heartbeat directly.
If it is supposed to write a heartbeat every 10 seconds and has not done so
for a minute, we get suspicious. When you call
:meth:`project.trigger`, which also
looks for open events to be run, the project additionally checks whether all
workers are still alive, where a worker is considered dead once its last
heartbeat is more than 60 seconds old.
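The staleness check itself amounts to comparing the last heartbeat timestamp against the current time. A minimal sketch, using the intervals stated above:

```python
import time

HEARTBEAT_INTERVAL = 10   # seconds between heartbeats, as described above
DEAD_AFTER = 60           # consider a worker dead after this many seconds

def is_alive(last_seen, now=None):
    """Return True if the worker's last heartbeat is recent enough.

    `last_seen` is the timestamp the worker last wrote to the DB.
    """
    if now is None:
        now = time.time()
    return (now - last_seen) <= DEAD_AFTER
```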
So, if a worker is considered dead, it is sent the
``kill`` command, just to make
sure that it really is dead when we consider it so, and does not secretly
keep on working. There would be no problem if it were in fact still running
correctly, but if it really has failed we want to retry the failed job.
Next, the current task is considered failed and will be restarted. This just
means setting its state back to
``created``, so that another worker that is
responding can pick it up. The restarted task will overwrite all files that the
failed task would have generated, so the database stays consistent.
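The recovery step described above might be sketched as follows; the dict fields and the ``'created'`` / ``'kill'`` values mirror the text, but the function and field names are hypothetical, not the adaptivemd implementation:

```python
def restart_failed_task(task, worker):
    """Illustrative sketch of recovering a task from a dead worker."""
    # tell the (possibly still running) worker to stop, just in case
    worker['command'] = 'kill'
    # put the task back into the pool so a responsive worker can pick it up;
    # its files will simply be overwritten on the retry
    task['state'] = 'created'
    task['worker'] = None
    return task
```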
``adaptivemdworker`` takes some options::

    usage: adaptivemdworker [-h] [-t [WALLTIME]] [-d [MONGO_DB_PATH]]
                            [-g [GENERATORS]] [-w [WRAPPERS]] [-l] [-v] [-a]
                            [--sheep] [-s [SLEEP]] [--heartbeat [HEARTBEAT]]
                            project_name

    Run an AdaptiveMD worker

    positional arguments:
      project_name          project name the worker should attach to

    optional arguments:
      -h, --help            show this help message and exit
      -t [WALLTIME], --walltime [WALLTIME]
                            minutes until the worker shuts down. If 0 (default)
                            it will run indefinitely
      -d [MONGO_DB_PATH], --mongodb [MONGO_DB_PATH]
                            the mongodb url to the db server
      -g [GENERATORS], --generators [GENERATORS]
                            a comma separated list of generator names used to
                            dispatch the tasks. The worker will only respond to
                            tasks from generators whose names match one of the
                            names in the given list. Example:
                            --generators=openmm will only run scripts from
                            generators named openmm
      -w [WRAPPERS], --wrappers [WRAPPERS]
                            a comma separated list of simple function calls to
                            the resource. This can be used to add e.g. CUDA
                            support for specific workers. Example:
                            --wrappers=add_path("something"),add_cuda_module()
      -l, --local           if true then the DB is set to the default local port
      -v, --verbose         if true then stdout and stderr of subprocesses will
                            be rerouted. Use for debugging
      -a, --allegro         if true then the DB is set to the default allegro
                            setting
      --sheep               if true then the DB is set to the default sheep
                            setting
      -s [SLEEP], --sleep [SLEEP]
                            polling interval for new jobs in seconds. Default is
                            2 seconds. Increase to get less traffic on the DB
      --heartbeat [HEARTBEAT]
                            heartbeat interval in seconds. Default is 10 seconds