CLI

CurryBO Core is the software that contains all the CurryBO logic. It is used by currybo-benchmarks and CurryBO web, but also provides a CLI for direct interaction.

A command like


currybo \
--measurements denmark_measurements.csv \
--options denmark_options.csv \
--substrates name=Thiol,type=smiles name=Imine,type=smiles \
--conditions name=Catalyst,type=smiles \
--targets name=Delta_Delta_G,type=scalar \
--objectives name=Delta_Delta_G,abs_threshold=1,maximize=True \
--batch-size 2

might result in this:


{
  "estimated_current_optimum": {
    "point": {
      "Catalyst": "O=P1(O)OC2=C(C3=C(F)C=C(OC)C=C3F)C=C4C(C=CC=C4)=[C@]2[C@]5=C(O1)C(C6=C(F)C=C(OC)C=C6F)=CC7=C5C=CC=C7"
    },
    "value": {
      "Delta_Delta_G": 1.2688742979134011
    }
  },
  "next_points": [
    {
      "point": {
        "Thiol": "SC1CCCCC1",
        "Imine": "O=C(C1=CC=CC=C1)/N=C/C2=CC=C(Cl)C=C2Cl",
        "Catalyst": "O=P1(O)OC2=C(C3=C(F)C=C(OC)C=C3F)C=C4C(C=CC=C4)=[C@]2[C@]5=C(O1)C(C6=C(F)C=C(OC)C=C6F)=CC7=C5C=CC=C7"
      },
      "value": {
        "Delta_Delta_G": {
          "mean": 1.2686868041322046,
          "stdev": 0.021312972361864586
        }
      }
    },
    {
      "point": {
        "Thiol": "SC1CCCCC1",
        "Imine": "O=C(C1=CC=CC=C1)/N=C/C2=CC=C(C(F)(F)F)C=C2",
        "Catalyst": "O=P1(O)OC2=C(CC3=CC(C(F)(F)F)=CC(C(F)(F)F)=C3)C=C4C(CCCC4)=C2C5=C(O1)C(CC6=CC(C(F)(F)F)=CC(C(F)(F)F)=C6)=CC7=C5CCCC7"
      },
      "value": {
        "Delta_Delta_G": {
          "mean": 1.267070886641988,
          "stdev": 0.021635567155093717
        }
      }
    }
  ]
}

Output

In the output above, CurryBO returns

estimated_current_optimum: The condition that CurryBO currently estimates to be the most general, i.e. to give the best target value(s) across the substrates, reported as point (condition) and value (target)
next_points: A list of --batch-size points (substrates + conditions) to measure next in order to find the optimum as quickly as possible, together with the values (as mean and standard deviation) of all targets it currently expects for these points.
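
The output structure described above can be read with any JSON parser. The following is a minimal Python sketch, assuming the output has already been captured as a string. The SMILES strings are replaced by short placeholders (`CAT_A`, `IMINE_A`, ...) for readability, and the helper `summarize_next_points` is a hypothetical name, not part of CurryBO.

```python
import json

# Abbreviated copy of the example output; SMILES strings are placeholders.
raw_output = """
{
  "estimated_current_optimum": {
    "point": {"Catalyst": "CAT_A"},
    "value": {"Delta_Delta_G": 1.2688742979134011}
  },
  "next_points": [
    {"point": {"Thiol": "SC1CCCCC1", "Imine": "IMINE_A", "Catalyst": "CAT_A"},
     "value": {"Delta_Delta_G": {"mean": 1.2686868041322046,
                                 "stdev": 0.021312972361864586}}},
    {"point": {"Thiol": "SC1CCCCC1", "Imine": "IMINE_B", "Catalyst": "CAT_B"},
     "value": {"Delta_Delta_G": {"mean": 1.267070886641988,
                                 "stdev": 0.021635567155093717}}}
  ]
}
"""


def summarize_next_points(result):
    """Return one (point, target, mean, stdev) tuple per proposed point and target."""
    rows = []
    for entry in result["next_points"]:
        for target, stats in entry["value"].items():
            rows.append((entry["point"], target, stats["mean"], stats["stdev"]))
    return rows


result = json.loads(raw_output)
optimum = result["estimated_current_optimum"]
rows = summarize_next_points(result)

print("Current optimum:", optimum["point"], optimum["value"])
for point, target, mean, stdev in rows:
    print(f"Measure next: {point} (expected {target} = {mean:.3f} +/- {stdev:.3f})")
```

Note that the optimum reports a plain number per target, whereas each proposed point reports a mean and a standard deviation.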

Synopsis


usage: currybo [-h] [--measurements MEASUREMENTS] [--options OPTIONS] [--conditions CONDITIONS [CONDITIONS ...]]
               [--substrates SUBSTRATES [SUBSTRATES ...]] [--targets TARGETS [TARGETS ...]] [--objectives OBJECTIVES [OBJECTIVES ...]]
               [--final-objective FINAL_OBJECTIVE] [--seed SEED] [--surrogate {SimpleGP,AdditiveStructureGP}] [--kernel {TanimotoKernel}]
               [--likelihood {GaussianLikelihood}]
               [--x-utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}]
               [--x-utility-kwargs X_UTILITY_KWARGS]
               [--w-utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility}]
               [--w-utility-kwargs W_UTILITY_KWARGS]
               [--utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}]
               [--utility-kwargs UTILITY_KWARGS]
               [--acquisition {SequentialAcquisition,SequentialLookaheadAcquisition,JointLookaheadAcquisition}]
               [--aggregation {Mean,Sigmoid,MSE,Min}] [--batch-size BATCH_SIZE]
               [--batch-strategy {QSequentialAcquisition,QProbabilityOfOptimality}] [--qpo-num-samples QPO_NUM_SAMPLES] [--silent]
 
Find general parameters in synthesis using Bayesian Optimization
 
options:
  -h, --help            show this help message and exit
  --measurements MEASUREMENTS
                        Measurements .csv file
  --options OPTIONS     Options for substrate and condition columns
  --conditions CONDITIONS [CONDITIONS ...]
                        Condition columns of data set, as keyval. Specify [name, type (smiles, scalar, array)], e.g. `name=Catalyst,type=smiles`
  --substrates SUBSTRATES [SUBSTRATES ...]
                        Substrate columns that should be evaulated for generality. Specify [name, type (smiles, scalar, array)], e.g.
                        `name=Ketone,type=smiles`
  --targets TARGETS [TARGETS ...]
                        Target columns of data set. Specify [name, type (scalar)], e.g. `name=Yield,type=scalar`
  --objectives OBJECTIVES [OBJECTIVES ...]
                        Objectives for optimization. Specify [name, threshold, lower_bound, upper_bound, maximize]
  --final-objective FINAL_OBJECTIVE
                        Objective index to optimize when all objectives reached their threshold
  --seed SEED           Seed for RNG
  --surrogate {SimpleGP,AdditiveStructureGP}
                        Surrogate Model Type, defaults to `SimpleGP`
  --kernel {TanimotoKernel}
                        Covariance Kernel for the Surrogate Model
  --likelihood {GaussianLikelihood}
                        Likelihood
  --x-utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}
                        Utility function Type for x. Defaults to QuantileUtility
  --x-utility-kwargs X_UTILITY_KWARGS
                        Arguments to pass to the x utility, as a keyval string
  --w-utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility}
                        Utility function Type for w. Defaults to UncertaintyUtility
  --w-utility-kwargs W_UTILITY_KWARGS
                        Arguments to pass to the w utility, as a keyval string
  --utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}
                        Utility function for Joint Acquisitions
  --utility-kwargs UTILITY_KWARGS
                        Arguments to pass to the utility, as a keyval string
  --acquisition {SequentialAcquisition,SequentialLookaheadAcquisition,JointLookaheadAcquisition}
                        Acquisition Strategy, defaults to `SequentialAcquisition`
  --aggregation {Mean,Sigmoid,MSE,Min}
                        Aggregation Function, defaults to `Mean`
  --batch-size BATCH_SIZE
                        Batch Size, defaults to 1
  --batch-strategy {QSequentialAcquisition,QProbabilityOfOptimality}
                        Batch Strategy, defaults to QSequentialAcquisition
  --qpo-num-samples QPO_NUM_SAMPLES
                        Nuber of samples for qPO
  --silent              Do not generate any output. Useful for automated runs.

Arguments

`--help`

Print the help message and exit.

`--measurements`

required

e.g. --measurements measurements-file.csv

Specify which measurements file to use. This file lists the values (at least one row) that have already been measured. Provide it as a .csv (comma-separated) file with parameter names as column headers. Each measurement is one line.

Lists of values that belong to one parameter (array type) are space-separated. All substrate, condition, and target columns must be included here.

Example CSV:

measurements.csv


substrate,base,fluoride,yield
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,ClC1=CC=C(S(=O)(F)=O)C=C1,0.42
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,O=S(C1=CC=CC=N1)(F)=O,0.48
...
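
A measurements file in this layout can be produced with Python's standard `csv` module. The sketch below is illustrative only: the `descriptors` column is a hypothetical array-type column added to show the space-separated encoding, and `write_measurements` is not a CurryBO function.

```python
import csv
import io


def write_measurements(rows, fieldnames, handle):
    """Write one measurement per line; list values become space-separated cells."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        encoded = {
            key: " ".join(str(v) for v in value) if isinstance(value, (list, tuple)) else value
            for key, value in row.items()
        }
        writer.writerow(encoded)


fieldnames = ["substrate", "base", "fluoride", "descriptors", "yield"]
rows = [
    {
        "substrate": "OCCCCC1=CC=CC=C1",
        "base": "N12CCCN=C1CCCCC2",
        "fluoride": "ClC1=CC=C(S(=O)(F)=O)C=C1",
        "descriptors": [2.3, 4.5, 6.7],  # hypothetical array-type column
        "yield": 0.42,
    },
]

buffer = io.StringIO()  # use open("measurements.csv", "w", newline="") for a real file
write_measurements(rows, fieldnames, buffer)
lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```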

`--options`

required

e.g. --options options-file.csv

Specify which options file to use. This file lists the options (at least one per column) that CurryBO should consider. Provide it as a .csv (comma-separated) file with parameter names as column headers. Each option is one line in a column. Each column is an independent list: options that share a row but sit in different columns are unrelated.

Lists of values that belong to one parameter (array type) are space-separated. Only substrate and condition columns should be included here.

Note that duplicates in a column are automatically removed by CurryBO.

options.csv


substrate,base,fluoride
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,ClC1=CC=C(S(=O)(F)=O)C=C1
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,O=S(C1=CC=CC=N1)(F)=O
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2
OCCCCC1=CC=CC=C1
OCCCCC1=CC=CC=C1
OC(C)CCC1=CC=CC=C1
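
To illustrate how such a file can be interpreted, the sketch below reads the example above into one deduplicated list per column. It is not CurryBO's own loader; `read_option_columns` is a hypothetical helper, and enumerating every cross-column combination with `itertools.product` is an assumption made here to show that rows carry no meaning.

```python
import csv
import io
from itertools import product

options_csv = """substrate,base,fluoride
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,ClC1=CC=C(S(=O)(F)=O)C=C1
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,O=S(C1=CC=CC=N1)(F)=O
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2
OCCCCC1=CC=CC=C1
OCCCCC1=CC=CC=C1
OC(C)CCC1=CC=CC=C1
"""


def read_option_columns(text):
    """Collect the unique, non-empty options of each column, keeping first-seen order."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        # Rows may be shorter than the header; zip stops at the last filled cell.
        for name, value in zip(header, row):
            value = value.strip()
            if value and value not in columns[name]:
                columns[name].append(value)
    return columns


columns = read_option_columns(options_csv)
# Assumption: any combination of one option per column is a valid candidate.
candidates = list(product(*columns.values()))

for name, values in columns.items():
    print(name, values)
print(len(candidates), "candidate combinations")
```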

`--conditions`

required

e.g. --conditions name=fluoride,type=smiles name=temperature,type=scalar

Specify which columns of your measurements/options should be treated as conditions. For each condition, specify a name (matching a column name in your input files) and a type (one of smiles, scalar or array), separated by a comma. Do not put spaces around the = or the comma. Column names that contain spaces can be passed by quoting the whole argument, e.g. --conditions "name=my condition,type=smiles".

smiles: A molecule, defined by its SMILES string
array: A list of values that correspond to the same parameter, e.g. a list of descriptors for a molecule. Space-separated, e.g. 2.3 4.5 6.7
scalar: A number, e.g. a temperature
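
The key-value notation and the three column types can be sketched in a few lines of Python. This is an illustration of the format only, not CurryBO's actual parser; `parse_column_spec` and `parse_array` are hypothetical names. `shlex.split` is used to mimic how a POSIX shell passes the quoted argument as a single token.

```python
import shlex

ALLOWED_TYPES = {"smiles", "scalar", "array"}


def parse_column_spec(spec):
    """Turn 'name=Catalyst,type=smiles' into {'name': 'Catalyst', 'type': 'smiles'}."""
    fields = {}
    for pair in spec.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {pair!r}")
        fields[key] = value
    if fields.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    return fields


def parse_array(cell):
    """Decode a space-separated array cell such as '2.3 4.5 6.7'."""
    return [float(item) for item in cell.split()]


# The quotes keep the column name with a space inside one argument.
argv = shlex.split('--conditions "name=my condition,type=smiles" name=temperature,type=scalar')
specs = [parse_column_spec(arg) for arg in argv[1:]]
print(specs)
print(parse_array("2.3 4.5 6.7"))
```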

`--substrates`

required

e.g. --substrates name=substrate,type=smiles name=temperature,type=scalar

Same as --conditions, except for defining substrates.

`--targets`

required

e.g. --targets name=yield,type=scalar

Same as --conditions, except for defining targets. Targets must always be of type scalar.

`--objectives`

required

e.g. --objectives name=yield,abs_threshold=0.9,maximize=True name=stereoselectivity,rel_threshold=0.6

Specify what CurryBO should optimize for. More information on Multi-Objective BO can be found here. Use the same key-value notation as described above.

If only one objective is given, CurryBO will optimize this objective.

If multiple objectives are given, CurryBO will apply the following rules in order:

Optimize the first objective until its threshold is reached
Optimize the second objective until its threshold is reached
…
Optimize final-objective (below) to its optimum

If an objective cannot reach its threshold, CurryBO will optimize it as far as possible and then stop.
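
The ordering rules above can be read as "work on the first objective whose threshold is not yet met". The sketch below illustrates that reading; it is not CurryBO's implementation. The function `active_objective`, the `best_values` mapping, and the plain `threshold` key (an already resolved absolute threshold) are assumptions for illustration, as is treating a minimized objective as satisfied once its value is at or below the threshold.

```python
def active_objective(objectives, best_values, final_objective=0):
    """Return the index of the objective to optimize next.

    objectives:  list of dicts with 'name', 'threshold' and optional 'maximize'
    best_values: best value reached so far for each objective name (hypothetical input)
    """
    for index, objective in enumerate(objectives):
        value = best_values[objective["name"]]
        if objective.get("maximize", True):
            reached = value >= objective["threshold"]
        else:
            # Assumption: for minimization the threshold is an upper limit.
            reached = value <= objective["threshold"]
        if not reached:
            return index
    # All thresholds are met: continue with the final objective.
    return final_objective


objectives = [
    {"name": "yield", "threshold": 0.9, "maximize": True},
    {"name": "stereoselectivity", "threshold": 0.6},
]

print(active_objective(objectives, {"yield": 0.50, "stereoselectivity": 0.70}))
print(active_objective(objectives, {"yield": 0.95, "stereoselectivity": 0.40}))
print(active_objective(objectives, {"yield": 0.95, "stereoselectivity": 0.70}, final_objective=1))
```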

Possible keys:

name (required): Name of the column, usually a TARGET.
abs_threshold (required*): The absolute value this target should at least reach.
rel_threshold (required*): The value between 0 and 1 this target should at least reach. Here, 0 is the lowest measured value and 1 the highest. If bounds are set, 0 is lower_bound and 1 is upper_bound. (*A threshold must be given as either abs_threshold or rel_threshold.)
lower_bound: Lower bound of the scalarizer. If not set, this value is the lowest measurement.
upper_bound: Upper bound of the scalarizer. If not set, this value is the highest measurement.
maximize: Whether this objective should be maximized (default) or minimized (maximize=False).
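
As a worked example of how rel_threshold relates to the bounds, the sketch below maps a relative threshold onto the measurement scale. It assumes a linear scale between the lower and upper end, which the description suggests but does not state explicitly; `absolute_threshold` is a hypothetical helper, not a CurryBO function.

```python
def absolute_threshold(rel_threshold, measurements, lower_bound=None, upper_bound=None):
    """Map a relative threshold in [0, 1] onto the measured (or bounded) value range."""
    if not 0.0 <= rel_threshold <= 1.0:
        raise ValueError("rel_threshold must lie between 0 and 1")
    # Without explicit bounds, fall back to the lowest/highest measurement.
    low = min(measurements) if lower_bound is None else lower_bound
    high = max(measurements) if upper_bound is None else upper_bound
    return low + rel_threshold * (high - low)


yields = [0.42, 0.48]

# Bounds taken from the measurements: 0.42 + 0.6 * (0.48 - 0.42)
print(absolute_threshold(0.6, yields))

# Explicit bounds: 0.0 + 0.6 * (1.0 - 0.0)
print(absolute_threshold(0.6, yields, lower_bound=0.0, upper_bound=1.0))
```

With explicit bounds the resulting threshold no longer shifts as new measurements extend the observed range.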

`--final-objective`

e.g. --final-objective 1

When all objectives have reached their thresholds, further optimize the objective at this index. Indices start at 0. Defaults to 0.

`--seed`

e.g. --seed 1234

Sets the seed for all random number generations.

`--surrogate`

Choose from {SimpleGP,AdditiveStructureGP}

e.g. --surrogate SimpleGP

Sets the surrogate model type. More info here.

`--kernel`

Choose from {TanimotoKernel}

e.g. --kernel TanimotoKernel

Sets the kernel for the surrogate model.

`--likelihood`

Choose from {GaussianLikelihood}

e.g. --likelihood GaussianLikelihood

Sets the likelihood for the surrogate model.

`--x-utility`

Choose from {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}

e.g. --x-utility QualitativeImprovement

Sets the Condition utility function for SequentialAcquisition or SequentialLookaheadAcquisition. Defaults to QuantileUtility. More info here.

`--x-utility-kwargs`

e.g. --x-utility-kwargs beta=5

Some utility functions (e.g. QuantileUtility) can be configured with arguments. Pass these as key-value strings here.

`--w-utility`

Choose from {Random,SimpleRegret,UncertaintyUtility,QuantileUtility}

e.g. --w-utility UncertaintyUtility

Sets the Substrate utility function. Defaults to UncertaintyUtility. Otherwise the same as --x-utility.

`--w-utility-kwargs`

See --x-utility-kwargs

`--utility`

Choose from {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}

e.g. --utility QualitativeImprovement

Sets the Condition and Substrate utility function for JointLookaheadAcquisition. Defaults to QuantileUtility. More info here.

`--utility-kwargs`

See --x-utility-kwargs

`--acquisition`

Choose from {SequentialAcquisition,SequentialLookaheadAcquisition,JointLookaheadAcquisition}

e.g. --acquisition SequentialLookaheadAcquisition

Sets the acquisition strategy. Defaults to SequentialAcquisition. More info here.

Keep in mind that SequentialAcquisition and SequentialLookaheadAcquisition use --x-utility and --w-utility while JointLookaheadAcquisition uses --utility.
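
When assembling a call from a script, the pairing of acquisition strategy and utility flags can be encoded explicitly. The following sketch only builds and prints an argument list using the flags documented on this page; it does not run CurryBO. The helper `utility_args` and the file names are hypothetical.

```python
import shlex

SEQUENTIAL = {"SequentialAcquisition", "SequentialLookaheadAcquisition"}


def utility_args(acquisition, x_utility="QuantileUtility",
                 w_utility="UncertaintyUtility", utility="QuantileUtility"):
    """Return the acquisition flag together with the utility flags that apply to it."""
    if acquisition in SEQUENTIAL:
        # Sequential strategies use separate condition (x) and substrate (w) utilities.
        return ["--acquisition", acquisition,
                "--x-utility", x_utility, "--w-utility", w_utility]
    if acquisition == "JointLookaheadAcquisition":
        # The joint strategy uses a single utility for conditions and substrates.
        return ["--acquisition", acquisition, "--utility", utility]
    raise ValueError(f"unknown acquisition strategy: {acquisition!r}")


command = [
    "currybo",
    "--measurements", "measurements.csv",  # hypothetical file names
    "--options", "options.csv",
    "--substrates", "name=substrate,type=smiles",
    "--conditions", "name=fluoride,type=smiles",
    "--targets", "name=yield,type=scalar",
    "--objectives", "name=yield,abs_threshold=0.9,maximize=True",
] + utility_args("JointLookaheadAcquisition", utility="QualitativeImprovement")

# shlex.join quotes arguments where needed, giving a copy-pasteable command line.
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run` would avoid shell quoting issues for column names that contain spaces.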

`--aggregation`

Choose from {Mean,Sigmoid,MSE,Min}

e.g. --aggregation Min

Sets the aggregation function. Defaults to Mean. More info here.

`--batch-size`

e.g. --batch-size 5

Sets the number of conditions/substrates CurryBO proposes for the next round of measurements. Defaults to 1. More info here.

`--batch-strategy`

Choose from {QSequentialAcquisition,QProbabilityOfOptimality}

e.g. --batch-strategy QProbabilityOfOptimality

Sets the batching strategy. Defaults to QSequentialAcquisition. More info here.

`--qpo-num-samples`

e.g. --qpo-num-samples 20

Sets the number of samples QProbabilityOfOptimality should use. Ignored if --batch-strategy is not QProbabilityOfOptimality. Defaults to 10. More info here.

`--silent`

Do not generate any console output. Useful for automated runs.