CLI
CurryBO Core is the software that contains all of the CurryBO logic. It is used by currybo-benchmarks and CurryBO web, but also provides a CLI for direct interaction.
A command like
currybo \
--measurements denmark_measurements.csv \
--options denmark_options.csv \
--substrates name=Thiol,type=smiles name=Imine,type=smiles \
--conditions name=Catalyst,type=smiles \
--targets name=Delta_Delta_G,type=scalar \
--objectives name=Delta_Delta_G,abs_threshold=1,maximize=True \
--batch-size 2
might result in this:
{
"estimated_current_optimum": {
"point": {
"Catalyst": "O=P1(O)OC2=C(C3=C(F)C=C(OC)C=C3F)C=C4C(C=CC=C4)=[C@]2[C@]5=C(O1)C(C6=C(F)C=C(OC)C=C6F)=CC7=C5C=CC=C7"
},
"value": {
"Delta_Delta_G": 1.2688742979134011
}
},
"next_points": [
{
"point": {
"Thiol": "SC1CCCCC1",
"Imine": "O=C(C1=CC=CC=C1)/N=C/C2=CC=C(Cl)C=C2Cl",
"Catalyst": "O=P1(O)OC2=C(C3=C(F)C=C(OC)C=C3F)C=C4C(C=CC=C4)=[C@]2[C@]5=C(O1)C(C6=C(F)C=C(OC)C=C6F)=CC7=C5C=CC=C7"
},
"value": {
"Delta_Delta_G": {
"mean": 1.2686868041322046,
"stdev": 0.021312972361864586
}
}
},
{
"point": {
"Thiol": "SC1CCCCC1",
"Imine": "O=C(C1=CC=CC=C1)/N=C/C2=CC=C(C(F)(F)F)C=C2",
"Catalyst": "O=P1(O)OC2=C(CC3=CC(C(F)(F)F)=CC(C(F)(F)F)=C3)C=C4C(CCCC4)=C2C5=C(O1)C(CC6=CC(C(F)(F)F)=CC(C(F)(F)F)=C6)=CC7=C5CCCC7"
},
"value": {
"Delta_Delta_G": {
"mean": 1.267070886641988,
"stdev": 0.021635567155093717
}
}
}
]
}
Output
In the output above, CurryBO returns:
- estimated_current_optimum: The condition that CurryBO currently expects to give the best general target value(s), given as point (the condition) and value (the target).
- next_points: A list of --batch-size points (substrates + conditions) to measure next in order to find the optimum as quickly as possible, together with the values (as mean and standard deviation) that CurryBO currently expects for all targets at these points.
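If you call the CLI from a script, the printed result can be post-processed as ordinary JSON. The following is a minimal Python sketch, assuming the JSON text has been captured (e.g. from stdout); summarize_next_points is a hypothetical helper, and the placeholders CAT_1 and CAT_2 stand in for the long catalyst SMILES from the output above.

```python
import json


def summarize_next_points(output: str, target: str) -> list[tuple[dict, float, float]]:
    """Return (point, mean, stdev) for each proposed point, highest expected mean first."""
    result = json.loads(output)
    rows = []
    for entry in result["next_points"]:
        estimate = entry["value"][target]
        rows.append((entry["point"], estimate["mean"], estimate["stdev"]))
    # Highest expected mean first (matches maximize=True in the example).
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Shortened copy of the output above; CAT_1/CAT_2 are placeholder strings.
example_output = """{
  "estimated_current_optimum": {"point": {"Catalyst": "CAT_1"},
                                "value": {"Delta_Delta_G": 1.2688742979134011}},
  "next_points": [
    {"point": {"Thiol": "SC1CCCCC1", "Imine": "O=C(C1=CC=CC=C1)/N=C/C2=CC=C(Cl)C=C2Cl", "Catalyst": "CAT_1"},
     "value": {"Delta_Delta_G": {"mean": 1.2686868041322046, "stdev": 0.021312972361864586}}},
    {"point": {"Thiol": "SC1CCCCC1", "Imine": "O=C(C1=CC=CC=C1)/N=C/C2=CC=C(C(F)(F)F)C=C2", "Catalyst": "CAT_2"},
     "value": {"Delta_Delta_G": {"mean": 1.267070886641988, "stdev": 0.021635567155093717}}}
  ]
}"""

for point, mean, stdev in summarize_next_points(example_output, "Delta_Delta_G"):
    print(f"{point['Imine']}: {mean:.3f} +/- {stdev:.3f}")
```

Only the documented keys (point, value, mean, stdev) are used, so the sketch does not depend on how many substrate or condition columns a run has.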
Synopsis
usage: currybo [-h] [--measurements MEASUREMENTS] [--options OPTIONS] [--conditions CONDITIONS [CONDITIONS ...]]
[--substrates SUBSTRATES [SUBSTRATES ...]] [--targets TARGETS [TARGETS ...]] [--objectives OBJECTIVES [OBJECTIVES ...]]
[--final-objective FINAL_OBJECTIVE] [--seed SEED] [--surrogate {SimpleGP,AdditiveStructureGP}] [--kernel {TanimotoKernel}]
[--likelihood {GaussianLikelihood}]
[--x-utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}]
[--x-utility-kwargs X_UTILITY_KWARGS]
[--w-utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility}]
[--w-utility-kwargs W_UTILITY_KWARGS]
[--utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}]
[--utility-kwargs UTILITY_KWARGS]
[--acquisition {SequentialAcquisition,SequentialLookaheadAcquisition,JointLookaheadAcquisition}]
[--aggregation {Mean,Sigmoid,MSE,Min}] [--batch-size BATCH_SIZE]
[--batch-strategy {QSequentialAcquisition,QProbabilityOfOptimality}] [--qpo-num-samples QPO_NUM_SAMPLES] [--silent]
Find general parameters in synthesis using Bayesian Optimization
options:
-h, --help show this help message and exit
--measurements MEASUREMENTS
Measurements .csv file
--options OPTIONS Options for substrate and condition columns
--conditions CONDITIONS [CONDITIONS ...]
Condition columns of data set, as keyval. Specify [name, type (smiles, scalar, array)], e.g. `name=Catalyst,type=smiles`
--substrates SUBSTRATES [SUBSTRATES ...]
Substrate columns that should be evaluated for generality. Specify [name, type (smiles, scalar, array)], e.g.
`name=Ketone,type=smiles`
--targets TARGETS [TARGETS ...]
Target columns of data set. Specify [name, type (scalar)], e.g. `name=Yield,type=scalar`
--objectives OBJECTIVES [OBJECTIVES ...]
Objectives for optimization. Specify [name, threshold, lower_bound, upper_bound, maximize]
--final-objective FINAL_OBJECTIVE
Objective index to optimize when all objectives reached their threshold
--seed SEED Seed for RNG
--surrogate {SimpleGP,AdditiveStructureGP}
Surrogate Model Type, defaults to `SimpleGP`
--kernel {TanimotoKernel}
Covariance Kernel for the Surrogate Model
--likelihood {GaussianLikelihood}
Likelihood
--x-utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}
Utility function Type for x. Defaults to QuantileUtility
--x-utility-kwargs X_UTILITY_KWARGS
Arguments to pass to the x utility, as a keyval string
--w-utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility}
Utility function Type for w. Defaults to UncertaintyUtility
--w-utility-kwargs W_UTILITY_KWARGS
Arguments to pass to the w utility, as a keyval string
--utility {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}
Utility function for Joint Acquisitions
--utility-kwargs UTILITY_KWARGS
Arguments to pass to the utility, as a keyval string
--acquisition {SequentialAcquisition,SequentialLookaheadAcquisition,JointLookaheadAcquisition}
Acquisition Strategy, defaults to `SequentialAcquisition`
--aggregation {Mean,Sigmoid,MSE,Min}
Aggregation Function, defaults to `Mean`
--batch-size BATCH_SIZE
Batch Size, defaults to 1
--batch-strategy {QSequentialAcquisition,QProbabilityOfOptimality}
Batch Strategy, defaults to QSequentialAcquisition
--qpo-num-samples QPO_NUM_SAMPLES
Number of samples for qPO
--silent Do not generate any output. Useful for automated runs.
Arguments
--help
Print the help message and exit.
--measurements
required
e.g. --measurements measurements-file.csv
Specify which measurements file to use. This file contains the values that have already been measured (at least one measurement). Provide it as a .csv (comma-separated) file with parameter names as column headers and one measurement per line.
Lists of values that correspond to one parameter (array type) are space-separated. All substrate, condition and target columns must be included here.
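A measurements file can also be generated with Python's standard csv module. This sketch is illustrative only: base_descriptors is a hypothetical array-type column added to show the space-separated encoding (its values reuse the array example from --conditions), and measurements_csv is not part of CurryBO.

```python
import csv
import io

# Column order of the file; "base_descriptors" is a hypothetical array-type column.
COLUMNS = ["substrate", "base", "base_descriptors", "fluoride", "yield"]

# Two measurements from the example below, plus placeholder descriptor values.
measurements = [
    {"substrate": "OCCCCC1=CC=CC=C1", "base": "N12CCCN=C1CCCCC2",
     "base_descriptors": [2.3, 4.5, 6.7],
     "fluoride": "ClC1=CC=C(S(=O)(F)=O)C=C1", "yield": 0.42},
    {"substrate": "OCCCCC1=CC=CC=C1", "base": "N12CCCN=C1CCCCC2",
     "base_descriptors": [2.3, 4.5, 6.7],
     "fluoride": "O=S(C1=CC=CC=N1)(F)=O", "yield": 0.48},
]


def measurements_csv(rows, columns):
    """Render measurement dicts as CSV text, one measurement per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        cells = []
        for column in columns:
            value = row[column]
            # Array-type values become a single space-separated cell.
            if isinstance(value, (list, tuple)):
                value = " ".join(str(v) for v in value)
            cells.append(value)
        writer.writerow(cells)
    return buffer.getvalue()
```

Writing the returned text to measurements.csv gives a first data row of OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,2.3 4.5 6.7,ClC1=CC=C(S(=O)(F)=O)C=C1,0.42. The extra column would then be declared with e.g. --conditions name=base_descriptors,type=array.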
Example CSV:
substrate,base,fluoride,yield
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,ClC1=CC=C(S(=O)(F)=O)C=C1,0.42
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,O=S(C1=CC=CC=N1)(F)=O,0.48
...
--options
required
e.g. --options options-file.csv
Specify which options file to use. This file defines which options (at least 1 per column) CurryBO should consider. Provide it as a .csv (comma-separated) file with parameter names as column headers. Each option occupies one line in its column; options in different columns but the same row are unrelated.
Lists of values that correspond to one parameter (array type) are space-separated. Only substrate and condition columns (no targets) should be included here.
Note that CurryBO automatically removes duplicates within a column.
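Because the columns of an options file are independent, they may have different lengths. A minimal sketch of producing such a file with Python's csv module follows; options_csv is a hypothetical helper, and it assumes that CurryBO reads empty cells as missing (the example below only shows shorter columns on the right, where trailing cells are simply omitted).

```python
import csv
import io


def options_csv(options):
    """Render {column name: list of options} as options CSV text.

    Columns may differ in length; values sharing a row are unrelated.
    """
    columns = list(options)
    n_rows = max(len(values) for values in options.values())
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for i in range(n_rows):
        # Pad shorter columns with empty cells...
        row = [options[c][i] if i < len(options[c]) else "" for c in columns]
        # ...but drop trailing empty cells, as in the example file below.
        while row and row[-1] == "":
            row.pop()
        writer.writerow(row)
    return buffer.getvalue()


example = options_csv({
    "substrate": ["OCCCCC1=CC=CC=C1", "OC(C)CCC1=CC=CC=C1"],
    "base": ["N12CCCN=C1CCCCC2"],
    "fluoride": ["ClC1=CC=C(S(=O)(F)=O)C=C1", "O=S(C1=CC=CC=N1)(F)=O"],
})
print(example)
```

Duplicates are not filtered here, since CurryBO removes them itself.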
Example CSV:
substrate,base,fluoride
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,ClC1=CC=C(S(=O)(F)=O)C=C1
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2,O=S(C1=CC=CC=N1)(F)=O
OCCCCC1=CC=CC=C1,N12CCCN=C1CCCCC2
OCCCCC1=CC=CC=C1
OCCCCC1=CC=CC=C1
OC(C)CCC1=CC=CC=C1
--conditions
required
e.g. --conditions name=fluoride,type=smiles name=temperature,type=scalar
Specify which columns of your measurements/options files should be treated as conditions. For each condition, specify a name (matching a column name in your input files) and a type (one of smiles, scalar or array), separated by a comma. Do not use spaces around the = or ,. Column names containing spaces can be handled by quoting the argument, e.g. --conditions "name=my condition,type=smiles".
- smiles: A molecule, defined by its SMILES string
- array: A list of values that correspond to the same parameter, e.g. a list of descriptors for a molecule. Space-separated, e.g. 2.3 4.5 6.7
- scalar: A number, e.g. a temperature
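When CurryBO is launched from a script, the key-value arguments can be assembled programmatically. This sketch uses a hypothetical helper, keyval, and only standard-library calls; it assumes that , and = are reserved separators and therefore rejects them inside values. shlex.join shows how a name containing a space has to be quoted on a POSIX shell.

```python
import shlex


def keyval(**fields) -> str:
    """Build a key-value argument such as 'name=Catalyst,type=smiles'."""
    parts = []
    for key, value in fields.items():
        value = str(value)
        # Assumption: ',' and '=' separate pairs and keys, so they cannot appear in values.
        if "," in value or "=" in value:
            raise ValueError(f"{key}={value!r}: ',' and '=' are reserved separators")
        parts.append(f"{key}={value}")  # no spaces around '=' or ','
    return ",".join(parts)


argv = [
    "currybo",
    "--measurements", "measurements.csv",
    "--options", "options.csv",
    "--conditions",
    keyval(name="my condition", type="smiles"),
    keyval(name="temperature", type="scalar"),
]
print(shlex.join(argv))
```

Passing the argv list directly to subprocess.run(argv) avoids shell quoting altogether, because each list element reaches the program as exactly one argument.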
--substrates
required
e.g. --substrates name=substrate,type=smiles name=temperature,type=scalar
Same as --conditions, except for defining substrates.
--targets
required
e.g. --targets name=yield,type=scalar
Same as --conditions, except for defining targets. Targets must always be of type scalar.
--objectives
required
e.g. --objectives name=yield,abs_threshold=0.9,maximize=True name=stereoselectivity,rel_threshold=0.6
Specify what CurryBO should optimize for. More information on Multi-Objective BO can be found here. Use the same key-value notation as described above.
If only one objective is given, CurryBO will optimize this objective.
If multiple objectives are given, CurryBO applies the following rules in order:
- Optimize the first objective until its threshold is reached
- Optimize the second objective until its threshold is reached
- …
- Optimize the objective selected by --final-objective (below) to its optimum
If an objective cannot reach its threshold, CurryBO will optimize it as far as possible and then stop.
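The rule order above can be expressed as a small selection function. This is a simplified sketch, not CurryBO's implementation: Objective and active_objective are hypothetical names, thresholds are assumed to be absolute, a minimized objective is assumed to be satisfied when its value is at or below its threshold, and the "optimize as far as possible and then stop" case is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    name: str
    threshold: float       # assumed to be an absolute threshold
    maximize: bool = True

    def satisfied(self, value: float) -> bool:
        # Assumption: maximized objectives must reach at least the threshold,
        # minimized objectives at most the threshold.
        return value >= self.threshold if self.maximize else value <= self.threshold


def active_objective(objectives, best_values, final_objective=0):
    """Return the index of the objective to optimize next.

    best_values maps each objective name to its current best value.
    """
    for index, objective in enumerate(objectives):
        if not objective.satisfied(best_values[objective.name]):
            return index  # the first unmet threshold takes priority
    return final_objective  # all thresholds met: use --final-objective
```

For objectives [yield >= 0.9, stereoselectivity >= 0.6], a current best of yield 0.95 and stereoselectivity 0.4 selects index 1; once both are met, the index given by --final-objective is returned.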
Possible keys:
- name (required): Name of the column, usually a TARGET.
- abs_threshold (required*): Defines the value this target should at least reach.
- rel_threshold (required*): Defines the value between 0 and 1 this target should at least reach. Here, 0 is the lowest measured value and 1 the highest. If bounds are set, 0 is lower_bound and 1 is upper_bound.
- lower_bound: Lower bound of the scalarizer. If not set, this value is the lowest measurement.
- upper_bound: Upper bound of the scalarizer. If not set, this value is the highest measurement.
- maximize: Whether this objective should be maximized (default) or minimized (maximize=False).
(*) Either abs_threshold or rel_threshold must be given.
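A relative threshold can be read as a position on the scale between the two bounds. The sketch below assumes a linear mapping between lower_bound and upper_bound (the text only fixes the end points 0 and 1); absolute_threshold is a hypothetical helper, not a CurryBO function.

```python
def absolute_threshold(rel_threshold, measured, lower_bound=None, upper_bound=None):
    """Convert a rel_threshold in [0, 1] into a value on the target's own scale.

    0 maps to lower_bound (default: lowest measurement) and 1 to upper_bound
    (default: highest measurement); values in between are assumed to be linear.
    """
    if not 0.0 <= rel_threshold <= 1.0:
        raise ValueError("rel_threshold must lie between 0 and 1")
    low = min(measured) if lower_bound is None else lower_bound
    high = max(measured) if upper_bound is None else upper_bound
    return low + rel_threshold * (high - low)
```

For example, with measured yields 0.42, 0.48 and 0.62 and no explicit bounds, rel_threshold=0.6 would correspond to an absolute yield of about 0.54 under this assumption.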
--final-objective
e.g. --final-objective 1
When all objectives have been satisfied, further optimize the objective at this index. Indices start at 0; defaults to 0.
--seed
e.g. --seed 1234
Sets the seed for all random number generation.
--surrogate
Choose from {SimpleGP,AdditiveStructureGP}
e.g. --surrogate SimpleGP
Sets the surrogate model type. More info here.
--kernel
Choose from {TanimotoKernel}
e.g. --kernel TanimotoKernel
Sets the kernel for the surrogate model.
--likelihood
Choose from {GaussianLikelihood}
e.g. --likelihood GaussianLikelihood
Sets the likelihood for the surrogate model.
--x-utility
Choose from {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}
e.g. --x-utility QualitativeImprovement
Sets the Condition utility function for SequentialAcquisition or SequentialLookaheadAcquisition. Defaults to QuantileUtility. More info here.
--x-utility-kwargs
e.g. --x-utility-kwargs beta=5
Some utility functions (e.g. QuantileUtility) can be configured with arguments. Pass these as key-value strings here.
--w-utility
Choose from {Random,SimpleRegret,UncertaintyUtility,QuantileUtility}
e.g. --w-utility QuantileUtility
Sets the substrate utility function for SequentialAcquisition or SequentialLookaheadAcquisition. Defaults to UncertaintyUtility. Otherwise the same as --x-utility.
--w-utility-kwargs
See --x-utility-kwargs
--utility
Choose from {Random,SimpleRegret,UncertaintyUtility,QuantileUtility,QuantitativeImprovement,QualitativeImprovement}
e.g. --utility QualitativeImprovement
Sets the Condition and Substrate utility function for JointLookaheadAcquisition. Defaults to QuantileUtility. More info here.
--utility-kwargs
See --x-utility-kwargs
--acquisition
Choose from {SequentialAcquisition,SequentialLookaheadAcquisition,JointLookaheadAcquisition}
e.g. --acquisition SequentialLookaheadAcquisition
Sets the acquisition strategy. Defaults to SequentialAcquisition. More info here.
Keep in mind that SequentialAcquisition and SequentialLookaheadAcquisition use --x-utility and --w-utility while JointLookaheadAcquisition uses --utility.
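Since each acquisition strategy reads only some of the utility flags, a wrapper script may want to warn about flags that will have no effect. A hypothetical check based only on the rule stated above might look like this; it assumes each kwargs flag belongs to the same strategy as its utility flag and that options are passed as separate tokens (not --flag=value).

```python
# Utility flags read by each acquisition strategy, per the rule above.
SEQUENTIAL_FLAGS = {"--x-utility", "--x-utility-kwargs", "--w-utility", "--w-utility-kwargs"}
UTILITY_FLAGS = {
    "SequentialAcquisition": SEQUENTIAL_FLAGS,
    "SequentialLookaheadAcquisition": SEQUENTIAL_FLAGS,
    "JointLookaheadAcquisition": {"--utility", "--utility-kwargs"},
}
ALL_UTILITY_FLAGS = set().union(*UTILITY_FLAGS.values())


def ignored_utility_flags(argv):
    """Return the utility flags in argv that the chosen acquisition does not use."""
    acquisition = "SequentialAcquisition"  # documented default
    if "--acquisition" in argv:
        acquisition = argv[argv.index("--acquisition") + 1]
    used = UTILITY_FLAGS[acquisition]
    return sorted(flag for flag in argv if flag in ALL_UTILITY_FLAGS and flag not in used)
```

For example, passing --utility without --acquisition JointLookaheadAcquisition would be reported, because the default SequentialAcquisition reads --x-utility and --w-utility instead.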
--aggregation
Choose from {Mean,Sigmoid,MSE,Min}
e.g. --aggregation Min
Sets the aggregation function. Defaults to Mean. More info here.
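To illustrate why the choice matters, here is a sketch of the two aggregations whose meaning follows from their names, assuming (as in generality-oriented optimization) that the aggregation combines one condition's predicted target values across all substrates. Sigmoid and MSE are not reproduced because their exact definitions are not given here, and the predicted yields are hypothetical.

```python
from statistics import fmean


def aggregate(values, how="Mean"):
    """Combine one condition's predicted target values across all substrates."""
    if how == "Mean":
        return fmean(values)  # average performance over the substrate scope
    if how == "Min":
        return min(values)    # worst-performing substrate
    raise NotImplementedError(f"{how} is not sketched here")


# Hypothetical predicted yields of two conditions on the same three substrates.
predicted = {"condition A": [0.9, 0.85, 0.1], "condition B": [0.6, 0.55, 0.5]}

for how in ("Mean", "Min"):
    best = max(predicted, key=lambda c: aggregate(predicted[c], how))
    print(how, best)
```

With these numbers, Mean prefers condition A (high yields on two substrates) while Min prefers condition B (no substrate performs poorly).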
--batch-size
e.g. --batch-size 5
Sets the number of conditions/substrates CurryBO proposes for the next round of measurements. Defaults to 1. More info here.
--batch-strategy
Choose from {QSequentialAcquisition,QProbabilityOfOptimality}
e.g. --batch-strategy QProbabilityOfOptimality
Sets the batching strategy. Defaults to QSequentialAcquisition. More info here.
--qpo-num-samples
e.g. --qpo-num-samples 20
Sets the number of samples QProbabilityOfOptimality should use. Ignored if --batch-strategy is not QProbabilityOfOptimality. Defaults to 10. More info here.
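The idea behind a probability-of-optimality batch can be sketched with Monte Carlo sampling: draw a number of samples (here num_samples, analogous to --qpo-num-samples) of the predicted target, count how often each candidate comes out on top, and propose the most frequent winners. This is a simplified illustration, assuming maximization and independent normal predictions per candidate (a real surrogate would sample jointly from its posterior); qpo_batch is a hypothetical name.

```python
import random
from collections import Counter


def qpo_batch(means, stdevs, batch_size, num_samples=10, seed=0):
    """Pick a batch by estimated probability of being the optimum (maximization).

    Simplification: each candidate is sampled from an independent normal
    distribution instead of a joint surrogate posterior.
    """
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(num_samples):
        draws = [rng.gauss(m, s) for m, s in zip(means, stdevs)]
        # Count which candidate is best in this sample.
        wins[max(range(len(draws)), key=draws.__getitem__)] += 1
    # Rank by win count; candidates that never won count as 0 (ties keep list order).
    ranked = sorted(range(len(means)), key=lambda i: wins[i], reverse=True)
    return ranked[:batch_size]
```

With few samples, many candidates never win a single draw and tie at zero, so a larger sample count gives a better-resolved ranking at the cost of runtime.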
--silent
Do not generate any console output. Useful for automated runs.