
BioTender-max/awesome-bio-agent-skills

A curated collection of AI agent skills for biomedical research, covering genomics, proteomics, single-cell analysis, clinical AI, and protein design.

Works with: Claude Code, Codex CLI, Cursor
npx skills add BioTender-max/awesome-bio-agent-skills


Documentation

ESM2 Protein Language Model

Prerequisites

| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.10 |
| PyTorch | 1.10+ | 2.0+ |
| CUDA | 11.0+ | 11.7+ |
| GPU VRAM | 8 GB | 24 GB (A10G) |
| RAM | 16 GB | 32 GB |
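The minimums above can be checked from Python before a run. This is a minimal sketch, assuming PyTorch is the only third-party dependency worth probing: the thresholds are copied from the table, RAM is not checked (the standard library has no portable way to read it), and VRAM is read from GPU 0 only, in decimal gigabytes.

```python
import sys
from importlib import util

# Minimums copied from the Prerequisites table (RAM is not checked here)
MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 10)
MIN_CUDA = (11, 0)
MIN_VRAM_GB = 8


def parse_version(text: str) -> tuple:
    """Turn e.g. '2.1.0+cu118' into (2, 1), ignoring build suffixes."""
    core = text.split("+")[0]
    parts = []
    for piece in core.split(".")[:2]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_environment() -> dict:
    """Map each requirement name to True (met) or False (missing/too old)."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    if util.find_spec("torch") is None:
        # Without PyTorch none of the GPU checks can run
        report.update(torch=False, cuda=False, vram=False)
        return report

    import torch

    report["torch"] = parse_version(torch.__version__) >= MIN_TORCH
    cuda_version = torch.version.cuda  # None on CPU-only builds
    report["cuda"] = bool(
        torch.cuda.is_available()
        and cuda_version
        and parse_version(cuda_version) >= MIN_CUDA
    )
    if report["cuda"]:
        # Decimal GB: reported totals sit slightly below the marketed size
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        report["vram"] = total_gb >= MIN_VRAM_GB
    else:
        report["vram"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:<7} {'OK' if ok else 'missing or below minimum'}")
```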

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal

cd biomodals
modal run modal_esm2_predict_masked.py \
  --input-faa sequences.fasta \
  --out-dir embeddings/

GPU: A10G (24GB) | Timeout: 300s default

Option 2: Python API (recommended)

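import torch
import esm

# Load the 650M-parameter ESM-2 model and its alphabet (tokenizer)
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.eval().to(device)

# Process sequences: a list of (label, sequence) tuples
data = [("seq1", "MKTAYIAKQRQISFVK...")]  # placeholder; use the full sequence
batch_labels, batch_strs, batch_tokens = batch_converter(data)
batch_lens = (batch_tokens != alphabet.padding_idx).sum(1)

with torch.no_grad():
    results = model(batch_tokens.to(device), repr_layers=[33])

# Per-token embeddings from layer 33: (batch, max_len + 2, 1280),
# including the BOS/EOS tokens added by the batch converter
token_embeddings = results["representations"][33].cpu()

# Mean-pool over residues only (drop BOS and EOS) -> (N, 1280)
embeddings = torch.stack(
    [token_embeddings[i, 1 : n - 1].mean(0) for i, n in enumerate(batch_lens)]
)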
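Option 1 reads a FASTA file, while the Python API expects a list of `(label, sequence)` tuples. A minimal standard-library sketch for converting one into the other follows; `read_fasta` is a hypothetical helper, not part of `esm` or biomodals, and it assumes the label should be the first word of each header line.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into the (label, sequence) tuples batch_converter expects.

    The label is the first whitespace-delimited word of each '>' header;
    sequence lines are joined and upper-cased. Lines before the first
    header are ignored.
    """
    records: List[Tuple[str, str]] = []
    label = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label = (line[1:].split() or [""])[0]
            chunks = []
        elif label is not None:
            chunks.append(line.upper())
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


# Usage:
# with open("sequences.fasta") as handle:
#     data = read_fasta(handle)
# batch_labels, batch_strs, batch_tokens = batch_converter(data)
```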

Key parameters

ESM2 Models

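| Model | Parameters | Layers | Embedding dim | Speed | Quality / use case |
|---|---|---|---|---|---|
| esm2_t6_8M | 8M | 6 | 320 | Fastest | Fast screening |
| esm2_t12_35M | 35M | 12 | 480 | Fast | Good |
| esm2_t33_650M | 650M | 33 | 1280 | Medium | Better |
| esm2_t36_3B | 3B | 36 | 2560 | Slow | Best |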

Output format

embeddings/
├── embeddings.npy       # (N, 1280) array
├── pll_scores.csv       # PLL for each sequence
└── metadata.json        # Sequence info
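One way to use `embeddings.npy` for diversity analysis is pairwise cosine similarity between its rows. This is a minimal NumPy sketch, assuming each row is one mean-pooled sequence embedding (the `(N, 1280)` shape above suggests this, but the file layout is not documented in more detail); a lower mean off-diagonal similarity indicates a more diverse set.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of an (N, D) embedding array."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # avoid division by zero
    return unit @ unit.T


def mean_pairwise_similarity(embeddings: np.ndarray) -> float:
    """Average off-diagonal cosine similarity; lower means a more diverse set."""
    sim = cosine_similarity_matrix(embeddings)
    n = sim.shape[0]
    if n < 2:
        raise ValueError("need at least two sequences")
    off_diag = sim[~np.eye(n, dtype=bool)]
    return float(off_diag.mean())


# Usage (assumes a Modal run wrote embeddings/embeddings.npy):
# emb = np.load("embeddings/embeddings.npy")  # (N, 1280) for the 650M model
# print(mean_pairwise_similarity(emb))
```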

Sample output

Successful run

$ modal run modal_esm2_predict_masked.py --input-faa designs.fasta
[INFO] Loading ESM2-650M model...
[INFO] Processing 100 sequences...
[INFO] Computing pseudo-log-likelihood...

embeddings/pll_scores.csv:
sequence_id,pll,pll_normalized,length
design_0,-0.82,0.15,78
design_1,-0.95,0.08,85
design_2,-1.23,-0.12,72
...

Summary:
  Mean PLL: -0.91
  Sequences with pll_normalized > 0: 42/100 (42%)

What good output looks like:

  • pll_normalized > 0.0 (more natural-like)
  • Embeddings shape: (N, 1280) for 650M model
  • Higher PLL = more natural sequence
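The `pll` column comes from pseudo-log-likelihood scoring: each position is masked in turn and the model's log-probability of the true residue is recorded. Below is a minimal sketch of the standard per-residue average, assuming a hypothetical `masked_log_prob(seq, i)` hook; with fair-esm it could be implemented by replacing token `i + 1` of `batch_tokens` (offset by the BOS token) with `alphabet.mask_idx` and taking `torch.log_softmax` of `results["logits"]` at that position. The Modal script's exact definitions of `pll` and `pll_normalized` are not given here, so this sketch is not guaranteed to reproduce its numbers. Because every term is a log-probability, the per-residue PLL itself is never positive.

```python
import math
from typing import Callable

# Hypothetical scoring hook (not part of biomodals or fair-esm): returns
# log p(seq[i] | seq with position i masked) under some masked language model.
MaskedLogProb = Callable[[str, int], float]


def pseudo_log_likelihood(seq: str, masked_log_prob: MaskedLogProb) -> float:
    """Per-residue PLL: (1/L) * sum over i of log p(x_i | x with x_i masked).

    Needs one masked forward pass per position, so cost grows with length.
    Every term is a log-probability (<= 0), so the result is never positive.
    """
    if not seq:
        raise ValueError("empty sequence")
    total = sum(masked_log_prob(seq, i) for i in range(len(seq)))
    return total / len(seq)


# Toy stand-in model for illustration only: fixed per-residue probabilities.
TOY_PROBS = {"A": 0.5, "G": 0.25}


def toy_masked_log_prob(seq: str, i: int) -> float:
    return math.log(TOY_PROBS.get(seq[i], 0.05))


print(pseudo_log_likelihood("AG", toy_masked_log_prob))  # about -1.04
```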

Decision tree

Should I use ESM2?
│
├─ What do you need?
│  ├─ Sequence plausibility score → ESM2 PLL ✓
│  ├─ Embeddings for clustering → ESM2 ✓
│  ├─ Variant effect prediction → ESM2 ✓
│  └─ Structure prediction → Use ESMFold
│
├─ What model size?
│  ├─ Fast screening → esm2_t12_35M
│  ├─ Standard use → esm2_t33_650M ✓
│  └─ Best quality → esm2_t36_3B
│
└─ Use case?
   ├─ QC filtering → normalized PLL > 0.0 threshold
   ├─ Diversity analysis → Mean-pooled embeddings
   └─ Mutation scanning → Per-position log-odds
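For the mutation-scanning branch, one common way to get per-position log-odds is the masked-marginal score: mask position i, then compare the model's log-probability of the mutant residue with that of the wild type, log p(mut) - log p(wt). This sketch assumes a hypothetical `log_probs_at(seq, i)` hook returning log-probabilities over the 20 amino acids at a masked position (with fair-esm, a `log_softmax` over `results["logits"]` at that token); it is not necessarily how the Modal script scores variants.

```python
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical hook: log-probabilities over amino acids at position i when
# that position is masked.
PositionLogProbs = Callable[[str, int], Dict[str, float]]


def scan_mutations(seq: str, log_probs_at: PositionLogProbs) -> List[Tuple[str, float]]:
    """Score every single substitution as log p(mut) - log p(wt) at its site.

    Mutations are named like 'A12G' (1-based position). Positive scores mean
    the model prefers the mutant residue over the wild type at that site.
    Results are sorted from most to least favourable.
    """
    scores: List[Tuple[str, float]] = []
    for i, wt in enumerate(seq):
        lp = log_probs_at(seq, i)  # one masked forward pass per position
        for mt in AMINO_ACIDS:
            if mt != wt:
                scores.append((f"{wt}{i + 1}{mt}", lp[mt] - lp[wt]))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores
```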

PLL interpretation

| Normalized PLL | Interpretation |
|---|---|
| > 0.2 | Very natural sequence |
| 0.0 to 0.2 | Good, natural-like |
| -0.5 to 0.0 | Acceptable |
| < -0.5 | May be unnatural |
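The bands above and the decision tree's QC rule (normalized PLL > 0.0) can be applied directly to `pll_scores.csv`. A minimal standard-library sketch, assuming the column names shown in the sample output (`sequence_id`, `pll_normalized`); how exact boundary values are assigned is an assumption, since the table does not say.

```python
import csv
from typing import Iterable, List


def interpret_normalized_pll(value: float) -> str:
    """Map a normalized PLL onto the bands of the table above.

    Edge handling is an assumption: 0.2 and 0.0 count as "Good, natural-like",
    and -0.5 counts as "Acceptable".
    """
    if value > 0.2:
        return "Very natural sequence"
    if value >= 0.0:
        return "Good, natural-like"
    if value >= -0.5:
        return "Acceptable"
    return "May be unnatural"


def passing_ids(csv_lines: Iterable[str], threshold: float = 0.0) -> List[str]:
    """IDs in pll_scores.csv whose pll_normalized is above the QC threshold."""
    return [
        row["sequence_id"]
        for row in csv.DictReader(csv_lines)
        if float(row["pll_normalized"]) > threshold
    ]


# Usage:
# with open("embeddings/pll_scores.csv", newline="") as handle:
#     keep = passing_ids(handle)
```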

Typical performance

| Campaign size | Time (A10G) | Cost (Modal) | Notes |
|---|---|---|---|
| 100 sequences | 5-10 min | ~$1 | Quick screen |
| 1,000 sequences | 30-60 min | ~$5 | Standard |
| 5,000 sequences | 2-3 h | ~$20 | Large batch |

Throughput: ~100-200 sequences/minute with 650M model.


Verify

grep -c '^>' sequences.fasta       # number of input sequences
wc -l < embeddings/pll_scores.csv  # should equal that number + 1 (header row)
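The same check can be done in Python, which avoids the off-by-one that `wc -l` gives when the CSV lacks a trailing newline and ignores blank lines. A minimal sketch; `verify_run` and its helpers are hypothetical names, and the file paths are the ones used in the Modal example. It compares counts only, not sequence IDs.

```python
import csv
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count '>' header lines, i.e. sequences, in FASTA text."""
    return sum(1 for line in lines if line.startswith(">"))


def count_csv_rows(lines: Iterable[str]) -> int:
    """Count data rows in CSV text; the header and blank lines are skipped."""
    return sum(1 for _ in csv.DictReader(lines))


def verify_run(fasta_path: str, scores_path: str) -> bool:
    """True when every input sequence has exactly one row in the scores CSV."""
    with open(fasta_path) as fasta, open(scores_path, newline="") as scores:
        expected = count_fasta_records(fasta)
        found = count_csv_rows(scores)
    if expected != found:
        print(f"Mismatch: {expected} input sequences, {found} scored rows")
    return expected == found


# Usage: verify_run("sequences.fasta", "embeddings/pll_scores.csv")
```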

Troubleshooting

  • OOM errors: use a smaller model or process the sequences in smaller batches.
  • Slow processing: use esm2_t12_35M for speed.
  • Low PLL scores: may indicate unusual or designed sequences.
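Processing sequences in smaller batches usually means capping the padded size of each batch. A minimal sketch of length-sorted batching under a token budget; `max_tokens` is a hypothetical knob to tune to your GPU memory, not a flag of the Modal script, and the `+ 2` assumes the BOS/EOS tokens ESM-2's batch converter adds. Each resulting batch can be passed to `batch_converter` separately; labels travel with the sequences, so results can be matched back after sorting.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (label, sequence), the format batch_converter takes


def batches_by_token_budget(records: List[Record], max_tokens: int) -> List[List[Record]]:
    """Group sequences so each padded batch holds at most max_tokens tokens.

    A padded batch costs n_seqs * (longest_seq + 2) tokens; the + 2 accounts
    for BOS/EOS. Sorting by length keeps padding small. A single sequence
    longer than the budget still gets its own batch; it cannot be split here.
    """
    batches: List[List[Record]] = []
    current: List[Record] = []
    longest = 0
    for rec in sorted(records, key=lambda r: len(r[1])):
        length = len(rec[1])
        new_longest = max(longest, length)
        if current and (len(current) + 1) * (new_longest + 2) > max_tokens:
            batches.append(current)  # adding rec would exceed the budget
            current, new_longest = [], length
        current.append(rec)
        longest = new_longest
    if current:
        batches.append(current)
    return batches
```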

Error interpretation

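| Error | Cause | Fix |
|---|---|---|
| RuntimeError: CUDA out of memory | Sequence too long or batch too large | Reduce the batch size or use a smaller model |
| KeyError: representation | Requested a layer that was not computed | Use layer 33 for the 650M model (the final layer index is the N in esm2_tN) |
| ValueError: sequence | Invalid amino acid | Check for non-standard amino acids |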
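Non-standard residues can be caught before a run. A minimal sketch that flags anything outside the 20 standard amino acids; treating X, B, U, Z and O as errors is an assumption made here (the fair-esm alphabet does include tokens for them), so loosen `STANDARD_AA` if your pipeline accepts them.

```python
from typing import Dict, Iterable, List, Tuple

# The 20 standard amino acids; X, B, U, Z, O are deliberately treated as errors
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_nonstandard(seq: str) -> List[Tuple[int, str]]:
    """(1-based position, character) for each residue outside STANDARD_AA."""
    return [(i + 1, ch) for i, ch in enumerate(seq.upper()) if ch not in STANDARD_AA]


def validate_records(records: Iterable[Tuple[str, str]]) -> Dict[str, List[Tuple[int, str]]]:
    """Map each offending sequence label to its non-standard residues."""
    problems: Dict[str, List[Tuple[int, str]]] = {}
    for label, seq in records:
        bad = find_nonstandard(seq)
        if bad:
            problems[label] = bad
    return problems
```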

Next: structure prediction with chai or boltz; protein-qc for filtering.
