BRAVO 2021

Colin A. Gross

2021-09-13

BRAVO for your VCFs

“Spin up an instance with your VCFs!”

Spinup Steps

  1. Prepare Data: (Bravo Data Prep) ✓
  2. Deploy Data: (Bravo API) ✓
  3. Deploy API: (Bravo API) ✓
  4. Deploy UI: (Bravo Vue UI) ◎

Inputs

  • VCFs
  • CRAMs

Software Dependencies

  • Nextflow, python, tabix, samtools, htslib, VEP, BamUtil, Loftee
  • Flask, pymongo, mongodb
  • npm, vue-cli

Data Dependencies

HUGO names, Ensembl Id mapping, OMIM mapping, Gencode gene list, loftee ancestor.fa, loftee conservation files, reference sequence

Current State: Data Prep

Consolidated and coherent

✓ Consolidate data processing scripts

workflows/
├── coverage
│   ├── Coverage.nf
│   └── nextflow.config
├── prepare_vcf
│   ├── PrepareVCF.nf
│   └── nextflow.config
└── sequences
    ├── Sequences.nf
    └── nextflow.config

✓ Document data processing config

  // Use wildcard like chr11.*.bcf for list of bcf files.
  chromosome      = "chr11"

  // Required input data
  vcfs            = "data/chr11.subset.bcf"
  // Use *.tsv.gz for list of cadd files.
  cadd_tsvs       = "data/chr11.sites.cadd.tsv.gz"

  // Optional input data
  // Use NO_FILE to indicate the optional samples list is not being used.
  samples         = "NO_FILE"

  // Executable paths.
  // Use name of exec if in PATH or is symlinked in bin/ of this pipeline
  vep             = "vep"
  counts_exec     = "ComputeAlleleCountsAndHistograms"

  // Scripts use full path or path to symlink in local bin/
  add_cadd_script = "bin/add_cadd_scores.py"

✓ Document software dependencies

## BamUtil
Clone and install from [bamUtil repo](https://github.com/statgen/bamUtil)

## VEP
Installation [instructions](https://useast.ensembl.org/info/docs/tools/vep/script/vep_download.html) on ensembl.org.

## Loftee
Master branch of [LoF plugin](https://github.com/konradjk/loftee) doesn't work for GRCh38
per this [issue](https://github.com/konradjk/loftee/issues/73#issuecomment-733109901)
    ```
    git clone --depth 1 --branch grch38 --single-branch git@github.com:konradjk/loftee.git
    ```

✓ Document data dependencies

#### Genenames: HUGO Gene Nomenclature Committee (HGNC)
Can be obtained from [genenames custom downloads](https://www.genenames.org/download/custom/)
 
The genenames set from all the chromosomes with the above columns can be obtained using curl.
    ```
    curl 'https://www.genenames.org/cgi-bin/...
    ```

and idiosyncrasies

Headers need to be rewritten to match expected column names and the results gzipped
    ```
echo -e "symbol\tname\talias_symbol\tprev_symbol\tensembl_gene_id" > hgcn_genenames.txt
tail -n +2 hgcn_custom_result.txt >> hgcn_genenames.txt
gzip hgcn_genenames.txt
    ```
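The same header rewrite can be done in Python when a shell is not convenient. This is a minimal sketch, not part of the Bravo data prep code: the function name `rewrite_genenames` is hypothetical, and it assumes the download is a tab-delimited file whose first line is the header to be replaced.

```python
import gzip

# Column names expected downstream, taken from the echo command above.
EXPECTED_HEADER = ["symbol", "name", "alias_symbol", "prev_symbol", "ensembl_gene_id"]


def rewrite_genenames(src_path, dest_path):
    """Swap the first line of src_path for EXPECTED_HEADER and gzip the result.

    Hypothetical helper: mirrors the echo / tail -n +2 / gzip steps above,
    but writes the gzipped file directly instead of an intermediate .txt.
    """
    with open(src_path, "rt", encoding="utf-8") as infile, \
            gzip.open(dest_path, "wt", encoding="utf-8") as outfile:
        next(infile, None)  # discard the original header line
        outfile.write("\t".join(EXPECTED_HEADER) + "\n")
        for line in infile:
            outfile.write(line)


# Example call, using the file names from the shell snippet:
# rewrite_genenames("hgcn_custom_result.txt", "hgcn_genenames.txt.gz")
```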

✓ Update dependencies

  • htslib, samtools, bamtools, tabix, vcftools
  • python packages & scripts
  • Nextflow 1.x -> 2.0

Current State: API

Up to date & operational

✓ Dependency Updates

  • marshmallow
  • pymongo
  • pysam

✓ Refactor Framework Code

Explicit calls to web arg parsing

@bp.route('/coverage', methods = ['GET'])
def get_coverage():
    arguments = {
       'chrom': fields.Str(required = True, validate ... ),
       'start': fields.Int(required = True, validate ...), }
    args = parser.parse(arguments,
    validate = partial(validate_coverage_http_request_args ...

Framework decorator

cov_argmap = {
    'chrom': fields.Str(required = True, validate ...),
    'start': fields.Int(required = True, validate ...), }

@bp.route('/coverage', methods=['GET'])
@use_args(cov_argmap, location='query', validate=validate_paging_args)
def get_coverage(args):

✓ Package

Allows for editable install to avoid local ENV complications.

python -m pip install -e .[dev,test]

Actual test tooling to separate integration & unit tests.

[pytest]
testpaths =
    tests
mongodb_fixture_dir = 
    tests/mongo_fixtures
markers =
    integration: using actual data or actual mongodb. (run using --integration)
    default: applied by top level conftest.py to otherwise unmarked tests.

✓ Define I/O Data

Mongo fixtures tests/fixtures/genes.json

    "chrom": "5",
    "start": 10736171,
    "xstart": { "$numberLong": "5010736171" },
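The fixture values suggest how `xstart` relates to `chrom` and `start`: chromosome 5 at position 10736171 is stored as 5010736171. A minimal sketch of that arithmetic, assuming the encoding is chromosome number × 10^9 + position. The function name `to_xpos` is hypothetical, and how the real pipeline encodes non-numeric chromosomes (X, Y, M) is not shown in the fixture, so the sketch rejects them rather than guess.

```python
# Assumption: one chromosome "slot" spans 10^9 positions, inferred from the fixture.
CHROM_MULTIPLIER = 1_000_000_000


def to_xpos(chrom: str, pos: int) -> int:
    """Pack chromosome and position into one sortable integer.

    Hypothetical helper; accepts "5" or "chr5". Non-numeric chromosomes
    raise ValueError because their encoding is not defined here.
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if not name.isdigit():
        raise ValueError(f"no numeric encoding assumed for chromosome {chrom!r}")
    return int(name) * CHROM_MULTIPLIER + pos


# Matches the fixture: chrom "5", start 10736171
print(to_xpos("5", 10736171))  # 5010736171
```

A single integer like this lets a region query be expressed as one range comparison on an indexed field instead of a chromosome match plus a position range.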

Endpoint expectations defined in tests.

GENE_RESULT = [
  {'chrom': '11', 'full_gene_name': 'hemoglobin subunit beta pseudogene 1',
    'gene_id': 'ENSG00000229988', 'gene_name': 'HBBP1'}

Current State: UI

Moving to Vue

✓ Move Server Side Functions to API

✓ Simplify Build & Deploy

Previously:

  1. build each vue component
  2. copy all dist/ directories to flask assets
  3. serve flask app

Currently:

npm run serve

◎ Consolidate Components

TODO: Migrate all .vue components under a single application.

workspace/ui_bravo2/web_components/
├── bravo-model
│   ├── package.json
│   └── src
├── bravo-region
│   ├── package.json
│   └── src
├── bravo-search
│   ├── package.json
│   └── src
└── bravo-variant
    ├── package.json
    └── src

◎ Test Components

TODO: Verify expected appearance & behavior

it("Horizontal rule gets centered", () => {
  expect(cmp.find(Message).attributes().style).toBe("padding-top: 10px;");
});

it('centers horizontal rule', () => {
  const wrapper = shallowMount(SearchBox, { props: { autofocus: true } })
  const hrStyle = wrapper.find('hr').attributes().style
  expect(hrStyle).to.include('margin-right: auto')
  expect(hrStyle).to.include('margin-left: auto')
})

Next work

Short Term

  • Finish UI Migration
  • Update deployment playbook
  • Test Data Prep Pipeline with freeze10 data.
  • Test Data Prep Pipeline with public data.
  • Advertise

Mid Term

  • Reduce Data Prep requirements
  • Better handle missing info
  • Flask app deployment plan for SPH infrastructure

Long Term

  • Continuous Integration via GH Actions
  • External deployments & collaboration

End