Your first calculation#

This page contains a simple tutorial for setting up a CASTEP calculation with AiiDA using aidia_castep plugin.

Note

A notebook example can be found here.

Step-by-step example - Silicon#

Here we setup a simple calculations of silicon that is identical to the ones in the CASTEP’s on-line tutorial. Before we start, AiiDA should be setup properly already. You should have a working profile with a CASTEP Code node and a Computer node. Both of them can be added using verdi command-line interface.

Loading classes#

The plugin system allows plugin classes to be loaded using factory functions:

from aiida.plugins import DataFactory, CalculationFactory
# Classes defined in this plugin
CastepCalculation = CalculationFactory("castep.castep")
OtfgData = DataFactory("castep.otfgdata")
UspData = DataFactory("castep.uspdata")
# Native aiida classes
KpointsData = DataFactory("array.kpoints")
Dict = DataFactory("dict")
StructureData = DataFactory("structure")

Note

It is best to work through the example interactively using verdi shell.

The first step#

The inputs of the calculation needs to be stored in a nested dictionary. It is often more convenient to work with a ProcessBuilder which enables tab completions for link names and can also display doc strings:

builder = CastepCalculation.get_builder()

First, we need to define the code of the calculation which represents the actual executable to be used to do the work:

code = Code.get_from_string("castep-xx.xx@your_computer")
builder.code = code

Note

While defining calculations may seem to be complicated here, in practice most calculations will be performed by CastepBaseWorkChain which supports more concise and flexible inputs.

Setup CASTEP parameters#

For most CASTEP calculations we need a <seed>.param file and a <seed>.cell file.

The the <seed>.cell file contains the input geometry related parameters and should look like this:

%BLOCK LATTICE_CART
      2.6954645000       2.6954645000       0.0000000000
      2.6954645000       0.0000000000       2.6954645000
      0.0000000000       2.6954645000       2.6954645000
%ENDBLOCK LATTICE_CART
%BLOCK POSITIONS_ABS
Si                       0.0000000000       0.0000000000       0.0000000000
Si                       1.3477325500       1.3477325500       1.3477325500
%ENDBLOCK POSITIONS_ABS
kpoints_mp_grid     : 4 4 4
symmetry_generate   : True
%BLOCK SPECIES_POT
Si    C19
%ENDBLOCK SPECIES_POT

The <seed>.param file contains the other controlling parameters:

task                : singlepoint
basis_precision     : medium
fix_occupancy       : True
opt_strategy        : speed
num_dump_cycles     : 0
xc_functional       : lda
write_formatted_density: True

With aiida_castep we need a single Dict node that includes the keyword-value pairs that should have been put into the two files. The node is constructed based on a dictionary:

param_in = {"PARAM": {
                      "task" : "singlepoint",
                      "basis_precision": "medium",
                      "fix_occupancy": True,
                      "opt_strategy": "speed",
                      "num_dump_cycles": 0,
                      "xc_functional": "lda",
                      "write_formatted_density": True,
            },
            "CELL":  {
                      "symmetry_generate": True,
            }}

To construct the node, call:

param_data = Dict(dict=param_in)

Not everything you otherwise have to write in the <seed>.cell goes into the dictionary. For example, there is no need to supply lattice_cart and positions_abs as they will be defined by the StructureData input node. Finally, we store the Dict in the builder:

builder.parameters = param_data

A plain dictionary can also be used as the input as a Dict node can be automatically generated from it. This would also work:

builder.parameters = param_in

The downside is that a new Dict node is always created even the contents are identical.

Note

It is recommended to use python types instead of strings to make it easy for querying. No internal type check or enforcement is implemented. The bottom line is that the text files generated needs to be understood by CASTEP.

Note

Block type keywords can be set using a list of strings each for a single line.

Setup structure and k-points#

The input structure has to be stored as a StructureData node. Details about StructureData can be found in AiiDA’s documentation. For now, we can write:

StructureData = DataFactory("structure")
silicon = StructureData()
cell = [[2.6954645, 2.6954645, 0],
        [2.6954645, 0, 2.6954645],
        [0, 2.6954645, 2.6954645]]
silicon.set_cell(cell)
silicon.append_atom(position=[0, 0, 0], symbols="Si")
silicon.append_atom(position=[1.34773255, 1.34773255, 1.34773255], symbols="Si")

Alternatively, one can pass a ase.Atoms object to the constructive as keyword argument:

from ase import Atoms
a_si = Atoms("Si2", cell=cell, scaled_positions=[[0, 0, 0], [0.25, 0.25, 0.25]])
silicon = StructureData(ase=a_si)

To define the k points mesh, run:

KpointsData = DataFactory("array.kpoints")
kpoints = KpointsData()
kpoints.set_kpoints_mesh((4, 4, 4))

Here we are using a MP grid, alternatively k-points may be passed explicitly as in KpointsData. See AiiDA’s documentation for more information. Finally, we save them in the builder as inputs:

builder.kpoints = kpoints
builder.structure = silicon

Note

There are several useful routines in aiida_castep.utils to work with ase, such as generating constraints or converting trajectory to a list of Atoms for visualisation. The output structure of a CastepCalculation is automatically sorted to have a index consistent with the input structure.

Setup pseudo potentials#

CASTEP has the ability to generate pseudopotentials on-the-fly. Of course, using a pre-generated pseudo potential set is also supported and you can reuse the on-the-fly generated (OTFG)) potential files. There are several libraries built-in in CASTEP and new, revised versions comes out at new releases. Internally, OTFG potentials are generated based on a 1 line specification string which can be defined manually. A OTFG library is in fact a hard-coded collection of such string for a range of elements.

Files based native pseudopotentials has the suffix usp or recpot. In newer version of CASTEP, upf files are also supported. This plugin introduces UspData and OtfgData classes. Their usage is similar to the UpfData defined in aiida_core. To get a OtfgData:

otfg, create = OtfgData.get_or_create(otfg_string)

This avoids creation of duplicated nodes. If a new node is created, the variable create will be True. The element is automatically inferred from the otfg_string supplied. If no element is found, we assume that the string refers to built-in library in CASTEP, for example "C9".

A similar interface also exists for UspData node:

si00, create = UspData.get_or_create(path_to_workdir + "/Si_00.usp")

The md5 of usp files will be compared to see if the same UspData already exists. If that is the case the existing UspData node will be returned. A more convenient way of uploading a set of usp files is to use upload_usp_family function in aiida_castep.data.usp.

Note

The element is inferred from the file name which should be in the format <element>_<foo>.usp. Norm-conserving recpot files are treated as if they are usp files.

To let the builder use the pseudo potential:

builder.pseudos.Si = si00

Alternatively, and in fact used more commonly, is to create a family of the potentials:

from aiida_castep.data.usp import upload_usp_family
upload_usp_family("./", "LDA_test", "A family of LDA potentials for testing")

This will upload all valid pseudopotentials inside the current working directory into a family named “LDA_test”. More commonly, CASTEP calculations are done with on-the-fly generated pseudopotentials (OTFG), for which built-in libraries are available to use. However, these built-in libraries still have to be registered with AiiDA. To do so, one can upload a single family with the library string:

from aiida_castep.data.otfg import upload_otfg_family
upload_otfg_family(["C19"], "C19", "The C19 library shipped with CASTEP")

This family contains a single OTFG string that is the library named “C19”. The library name is treated specially in a way such that it will match to any elements, unless an explicity entry exists within the same family. For example, the following

upload_otfg_family(["C19", "O 2|1.5|12|13|15|20:21(qc=5)"], "MyFamily", "The family that uses a specific OTFG for oxygen and C19 for everything else.")

uploads a family “MyFamily”, which uses “C19” for any elements except for “O”, for which generation setting “2|1.5|12|13|15|20:21(qc=5)” will be used instead.

A family is just a collection of pseudopotentials and/or a library name. To apply it to a calculation, one can use an utility function:

CastepCalculation.use_pseudos_from_family(builder, "C19")

This sets the pseudos port of the builder to:

{'Si': <OTFGData: uuid: ca9d4083-e96e-4b12-a02a-81a6a4c34929 (pk: 32)>}

For this shortcut to work, the structure file of the builder must be define beforehand. Otherwise, one can also pass a dictionary manually to the pseudos port with keys and values being the specie names and the pseudopotential node to be used for each.

Note

Families of norm-conserving potentials from PseudoDojo can be installed with the aiida-pseudos package.

Setting the resources#

To run our calculations on remote clusters, we need request some resources. Please refer to AiiDA’s documentation for details as the settings are scheduler dependent. Options of running calculations are set under the metadata.options namespace. These properties are eventually stored as the attributes of the created CalcJobNode. As an example for now:

builder.metadata.options.max_wallclock_seconds = 3600
builder.metadata.options.resources = {"num_machines": 1}

This lets AiiDA know that we want to run on a single node for a maximum of 3600 seconds. You may want to set the custom_scheduler_commands for inserting additional lines in to the submission script, for example, to define the account to be charged.

Submission#

Now we are ready to submit the calculation. But before actual submission we can have a glance of the inputs to see if there is any mistake by using:

CastepCalculation.get_castep_input_summary(builder)

A dictionary is returned as a summary of the inputs of the calculation:

{'CELL': {'symmetry_generate': True},
 'PARAM': {'basis_precision': 'medium',
  'fix_occupancy': True,
  'num_dump_cycles': 0,
  'opt_strategy': 'speed',
  'task': 'singlepoint',
  'write_formatted_density': True,
  'xc_functional': 'lda'},
 'kpoints': 'Kpoints mesh: 4x4x4 (+0.0,0.0,0.0)',
 'label': None,
 'pseudos': {'Si': <OTFGData: uuid: ca9d4083-e96e-4b12-a02a-81a6a4c34929 (pk: 32)>},
 'structure': {'cell': [[2.6954645, 2.6954645, 0.0],
   [2.6954645, 0.0, 2.6954645],
   [0.0, 2.6954645, 2.6954645]],
  'formula': 'Si2',
  'label': None}}

To test generating the input files, call:

CastepCalculation.submit_test(builder)

This write inputs to written to date coded sub folders inside submit_test folder at current working directory. The input keywords for cell and param file will be check, and if there is any mistake an exception will be raised.

Note

The content of the folder should be identical to what will be uploaded to remote computer. Hence we can also check if the job script is correctly generated. The dryrun test can be performed locally with:

CastepCalculation.dryrun_test(builder)

Finally, we are ready to submit the calculation:

from aiida.engine import submit
calcjob = submit(builder)

The first line stores the calculation and all of its inputs. The seconds line mark our calculation for submission. The actual submission is handled by one of AiiDA’s daemon process, so you need to have it running in the background.

Note

Don’t forget to launch the daemon with verdi daemon start if you have not done so. When the submit function is called, the workchain and its inputs are serialized to the AiiDA database. Then task is sent to the RabbitMQ server to indicate that this workflow is ready to launch. It is then up to the daemon process to check-in the job and deserialize the inputs from the database to actually run the job.

Monitoring#

Monitoring the state of calculations can be done using verdi process list. Inside a interactive shell, the state of a calculation may be checked with calcjob.get_process_state().

Once the calculation is finished, the state can be access with calcjob.exit_status and calcjob.exit_message. If the calculation has finished without error then the exit_status should be 0.

Another way to monitor the jobs is to use verdi process list command, which lists all running calculations/workflows by default. It is also possible to filter the jobs by state/age using various optional arguments.

You will see that the just submitted job to be in Waiting state, and the process status going through several stages, such as upload, submit, update, retrieve.

Once the job is finished, it will be in the Finished state and no longer show up with verdi process list. Use verdi process list -p1 -a to list all jobs created within the past 24 hours.

Shell alias can be set to make interacting with verdi command easier. As an example, the following lines can be included in the .bashrc:

alias vplp="verdi process list -P pk process_label label state scheduler_state"
alias vpl="verdi process list"
alias vco="verdi calcjob outputcat"
alias vcg="verdi calcjob gotocomputer"
alias vci="verdi calcjob inputcat"

Accessing Results#

A series of node will be created when the calculation is finished and parsed. Use calc.get_outgoing().all() to access the output nodes. Alternatively, the main Dict node’s content can be return using calc.res.<tab completion>. Other nodes can be access using calc.outputs.<tab completion>. The calculation’s state is set to “FINISHED” after it is completed without error. This does not mean that the underlying task has succeeded. For example, an unconverged geometry optimization due to the maximum iteration being reached is still a successful calculation, as CASTEP has done what the user requested. On the other hand, if the calculation is terminated due to the time limit (cleanly exited or not), it will have an none-zero exit_status.

Further Reading#

As mentioned above, most calculation will be done using CastepBaseWorkChain which make it easier for defining the inputs and also adds the ability to correct some common problems, such as SCF convergence problems and running out of walltimes.

See also

Example notebook can be found here, or run interactively with Binder.