{"passport":{"unfragile":{"@version":"1.0","version":"2026-05","artifact":{"id":"pypi_pypi-catboost","slug":"pypi-catboost","name":"catboost","type":"framework","url":"https://catboost.ai","page_url":"https://unfragile.ai/pypi-catboost","categories":["model-training"],"tags":["catboost"],"pricing":{"model":"open_source","free":true,"starting_price":null},"status":"active","verified":false},"capabilities":[{"id":"pypi_pypi-catboost__cap_0","uri":"capability://data.processing.analysis.gradient.boosting.model.training.with.categorical.feature.handling","name":"gradient-boosting model training with categorical feature handling","description":"Trains gradient boosting decision tree ensembles with native categorical feature support through ordered target encoding, eliminating the need for manual one-hot encoding. CatBoost implements symmetric trees and oblivious decision trees to reduce overfitting, with per-iteration metric tracking and early stopping via validation datasets. The training pipeline processes data through a columnar pool structure that maintains feature statistics and categorical mappings throughout the boosting iterations.","intents":["Train a gradient boosting model on datasets with mixed categorical and numerical features without preprocessing","Achieve better generalization on tabular data compared to standard XGBoost or LightGBM","Monitor model performance across iterations and stop training when validation metrics plateau"],"best_for":["Data scientists working with tabular datasets containing categorical variables","Teams building production ML pipelines that need minimal feature engineering","Practitioners optimizing for prediction accuracy on structured data competitions"],"limitations":["Training speed slower than LightGBM on very large datasets (>10M rows) due to symmetric tree construction overhead","Categorical feature encoding is learned during training, making inference on unseen categories require fallback strategies","GPU training requires 
NVIDIA CUDA 11.0+ with compute capability 3.5+, limiting deployment to recent hardware"],"requires":["Python 3.8+","NumPy 1.16.0+","Pandas 0.24.0+ (for DataFrame input)","For GPU training: CUDA 11.0+, cuDNN 8.0+, NVIDIA driver 450.0+"],"input_types":["CSV files","Pandas DataFrames","NumPy arrays","CatBoost Pool objects (columnar format)"],"output_types":["Trained CatBoostClassifier or CatBoostRegressor model object","Feature importance scores","Prediction arrays","Training history with per-iteration metrics"],"categories":["data-processing-analysis","machine-learning-training"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_1","uri":"capability://data.processing.analysis.gpu.accelerated.gradient.boosting.training","name":"gpu-accelerated gradient boosting training","description":"Executes the entire gradient boosting training pipeline on NVIDIA GPUs using CUDA kernels, including histogram computation, loss calculation, and tree construction. CatBoost implements GPU-specific optimizations through custom CUDA kernels in catboost/cuda/methods/ and catboost/cuda/targets/ that parallelize metric calculation and boosting progress tracking across GPU blocks. 
The GPU training path maintains feature-parity with CPU training while achieving 10-50x speedup on large datasets.","intents":["Train large-scale gradient boosting models 10-50x faster using GPU acceleration","Iterate quickly on hyperparameter tuning with reduced wall-clock time","Scale training to datasets that would be prohibitively slow on CPU"],"best_for":["ML engineers with access to NVIDIA GPUs training on datasets >1M rows","Kaggle competitors optimizing model training time within competition constraints","Production teams needing sub-minute training times for online learning scenarios"],"limitations":["GPU memory constraints limit batch sizes; datasets >100GB require careful memory management or multi-GPU strategies","GPU training only supports NVIDIA hardware; no AMD or Intel GPU support","Some advanced features (custom loss functions, certain metric types) have limited GPU implementation coverage","GPU training requires exact CUDA/cuDNN version matching; version mismatches cause silent failures or crashes"],"requires":["NVIDIA GPU with compute capability 3.5+ (Kepler generation or newer)","CUDA 11.0 or 11.8 (version-specific wheels)","cuDNN 8.0+","NVIDIA driver 450.0+","CatBoost GPU-enabled wheel (catboost-gpu package or GPU-compiled source)"],"input_types":["Pandas DataFrames","NumPy arrays","CatBoost Pool objects","CSV files (loaded into memory)"],"output_types":["Trained GPU-compatible model","Training metrics per iteration","GPU memory usage statistics"],"categories":["data-processing-analysis","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_10","uri":"capability://planning.reasoning.model.interpretation.through.shap.values.and.decision.path.analysis","name":"model interpretation through shap values and decision path analysis","description":"Provides model-agnostic and model-specific interpretation methods: SHAP values (Shapley Additive exPlanations) for feature contribution to individual 
predictions, and decision path analysis showing which tree splits influenced each prediction. CatBoost computes SHAP values by iterating through the tree ensemble and computing the marginal contribution of each feature to the final prediction. Decision paths trace the route through trees for each sample, identifying which splits were activated.","intents":["Explain individual predictions to stakeholders and regulators","Identify feature interactions and non-linear relationships in model decisions","Debug model failures by analyzing which features drove incorrect predictions"],"best_for":["Compliance teams needing model explainability for regulatory requirements (GDPR, Fair Lending)","Product teams explaining model decisions to end users","Data scientists debugging model errors and validating model behavior"],"limitations":["SHAP value computation is O(n_features × n_trees); slow for large models (>5000 trees) or high-dimensional data (>500 features)","SHAP values assume feature independence; misleading for highly correlated features","Decision path analysis is sample-specific; patterns may not generalize across the dataset","SHAP computation requires loading full model into memory; not suitable for extremely large ensembles"],"requires":["Trained CatBoost model","Feature names or indices","Dataset for SHAP computation (can be training, validation, or test set)","CatBoost 0.24+ for native SHAP support"],"input_types":["Trained CatBoostClassifier or CatBoostRegressor","Pandas DataFrame or NumPy array with same features as training data","Sample indices for which to compute SHAP values"],"output_types":["SHAP values (array of shape n_samples × n_features)","Base value (model's average prediction)","Decision paths (list of split indices per sample)","Feature contribution rankings per 
sample"],"categories":["planning-reasoning","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_11","uri":"capability://automation.workflow.multi.gpu.distributed.training.with.synchronization","name":"multi-gpu distributed training with synchronization","description":"Distributes gradient boosting training across multiple GPUs on a single machine or across multiple machines using AllReduce synchronization. CatBoost's distributed training (catboost/cuda/train_lib/) partitions data across GPUs, computes local histograms in parallel, and synchronizes gradients/Hessians using collective communication primitives (NCCL for multi-GPU, MPI for multi-machine). The training loop maintains consistency by ensuring all GPUs process the same boosting iterations.","intents":["Train on datasets too large for single GPU memory by distributing across multiple GPUs","Accelerate training 2-8x using multiple GPUs compared to single GPU","Scale training to multi-machine clusters for very large datasets"],"best_for":["ML engineers with access to multi-GPU infrastructure training on datasets >50GB","Teams building large-scale recommendation systems or NLP models","Researchers training on high-dimensional datasets requiring distributed computation"],"limitations":["Multi-GPU training requires NCCL 2.0+ and careful GPU synchronization; debugging distributed training is complex","Communication overhead between GPUs can dominate computation time for small models or datasets; speedup diminishes with >8 GPUs","Multi-machine training requires MPI setup and network bandwidth; not suitable for slow networks","Distributed training requires exact reproducibility of data partitioning; non-deterministic data loading breaks reproducibility"],"requires":["Multiple NVIDIA GPUs (2+) with compute capability 3.5+","CUDA 11.0+ and cuDNN 8.0+","NCCL 2.0+ for multi-GPU synchronization","For multi-machine: MPI implementation (OpenMPI or MPICH)","CatBoost 
compiled with multi-GPU support"],"input_types":["Training data (partitioned across GPUs)","Pandas DataFrame or NumPy array","CatBoost Pool objects"],"output_types":["Trained model (identical to single-GPU training)","Training metrics per iteration","GPU utilization statistics"],"categories":["automation-workflow","data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_12","uri":"capability://tool.use.integration.apache.spark.integration.for.distributed.inference.and.training","name":"apache spark integration for distributed inference and training","description":"Integrates CatBoost with Apache Spark through native JVM bindings (catboost4j-prediction, catboost4j-spark) enabling distributed inference on Spark DataFrames and distributed training on Spark clusters. The Spark integration wraps the native C++ model in Java classes, allowing Spark executors to load and run models in parallel. Training on Spark uses Spark's distributed data loading and partitioning, with CatBoost handling the boosting logic on the driver node.","intents":["Run batch inference on Spark DataFrames without converting to Python/Pandas","Train models on data stored in Spark (HDFS, Delta Lake, Parquet) without ETL to Python","Integrate CatBoost predictions into Spark ML pipelines for end-to-end ML workflows"],"best_for":["Data engineers building Spark-based ML pipelines at scale","Teams with data already in Spark (HDFS, Delta Lake) avoiding expensive data movement","Organizations standardized on Spark for distributed computing"],"limitations":["Spark integration adds JVM overhead; inference latency is 5-10% higher than native C++ inference","Distributed training on Spark requires data shuffling; network overhead can dominate for small datasets","Spark integration requires Java 8+ and Scala 2.11+; not compatible with Python-only Spark clusters","Model serialization through Spark requires careful handling of categorical feature encoding; encoding 
must be preserved across Spark workers"],"requires":["Apache Spark 2.4+","Java 8+","Scala 2.11+ (for Scala API)","CatBoost Spark package (catboost4j-spark)","Spark cluster with sufficient memory for model loading on each executor"],"input_types":["Spark DataFrame with feature columns","Parquet, CSV, or Delta Lake files","Trained CatBoost model (serialized)"],"output_types":["Spark DataFrame with predictions","Trained model (Spark ML format)","Feature importance (Spark DataFrame)"],"categories":["tool-use-integration","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_2","uri":"capability://data.processing.analysis.multi.class.and.multi.label.classification.with.custom.loss.functions","name":"multi-class and multi-label classification with custom loss functions","description":"Supports multi-class classification through softmax loss and multi-label classification through binary cross-entropy per label, with extensible custom loss function framework. CatBoost's loss function system (catboost/libs/metrics/metric.cpp) allows users to define custom objectives by implementing gradient and Hessian computations, which are then integrated into the boosting loop. 
The framework supports both built-in losses (CrossEntropy, MultiClass, MultiLogloss) and user-defined objectives, for which the user supplies the gradient and Hessian computations.","intents":["Train multi-class classifiers on problems with >2 target classes","Implement domain-specific loss functions that weight misclassification errors differently","Build multi-label classification models where samples can belong to multiple classes simultaneously"],"best_for":["NLP practitioners building multi-class text classification models","Computer vision teams training multi-label image classification systems","Domain experts with custom loss requirements (e.g., asymmetric costs for false positives vs false negatives)"],"limitations":["Custom loss functions require C++ implementation and recompilation; Python-only loss definitions not supported","Multi-label training requires manual label encoding; no built-in multi-hot vector support","Custom loss functions must provide analytically-computed gradients and Hessians; automatic differentiation not available"],"requires":["Python 3.8+","CatBoost 0.24+","For custom losses: C++ compiler and CatBoost source code","Properly formatted target labels (0-indexed class indices for multi-class, binary matrix for multi-label)"],"input_types":["Pandas DataFrames with feature columns","NumPy arrays","CatBoost Pool with label column"],"output_types":["Trained multi-class/multi-label classifier","Probability predictions (shape: n_samples × n_classes)","Class predictions (argmax of probabilities)","Per-class feature importance"],"categories":["data-processing-analysis","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_3","uri":"capability://data.processing.analysis.feature.importance.computation.with.multiple.attribution.methods","name":"feature importance computation with multiple attribution methods","description":"Computes feature importance through multiple attribution approaches: 
PredictionValuesChange (impact on predictions when feature is permuted), LossFunctionChange (impact on loss metric), and Shap values (Shapley-based feature contribution). The implementation in catboost/libs/model_interface/ computes importance scores by iterating through the trained tree ensemble and measuring how much each feature contributes to splits and predictions. Shap value computation uses tree-based algorithms optimized for gradient boosting structure.","intents":["Understand which features drive model predictions for model debugging and validation","Identify the most important features for feature selection and dimensionality reduction","Explain individual predictions using Shapley values for model interpretability"],"best_for":["Data scientists validating model behavior and detecting data leakage","Regulatory teams needing model explainability for compliance (GDPR, Fair Lending)","Feature engineers prioritizing which features to engineer or collect"],"limitations":["Shap value computation is O(n_features × n_trees) and becomes slow for models with >1000 trees or >100 features","Feature importance is computed on training/validation data; importance may differ significantly on out-of-distribution test data","PredictionValuesChange importance can be misleading for correlated features (high importance may be due to proxy effects)","Shap computation requires loading the entire model into memory; not suitable for extremely large ensembles (>10k trees)"],"requires":["Trained CatBoost model","Feature names or indices","Dataset for importance computation (can be training, validation, or test set)","For Shap values: CatBoost 0.24+"],"input_types":["Trained CatBoostClassifier or CatBoostRegressor","Pandas DataFrame or NumPy array with same features as training data","CatBoost Pool object"],"output_types":["Feature importance scores (array of shape n_features)","Shap values (array of shape n_samples × n_features)","Feature importance DataFrame with feature names and 
scores"],"categories":["data-processing-analysis","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_4","uri":"capability://data.processing.analysis.cross.validation.with.stratified.and.time.series.splits","name":"cross-validation with stratified and time-series splits","description":"Implements cross-validation framework supporting stratified k-fold (for classification), k-fold (for regression), and time-series splits with proper train/validation/test separation. CatBoost's cross-validation (cv function) handles data splitting, trains independent models on each fold, and aggregates metrics across folds. The implementation respects categorical feature encoding learned on training folds and applies it consistently to validation folds, preventing data leakage.","intents":["Estimate model generalization performance using k-fold cross-validation","Evaluate models on time-series data with proper temporal ordering","Tune hyperparameters using cross-validation scores without manual fold management"],"best_for":["ML practitioners with small-to-medium datasets (<100k rows) where cross-validation is computationally feasible","Time-series forecasters needing proper temporal validation","Hyperparameter tuning workflows requiring robust performance estimates"],"limitations":["Cross-validation is k times slower than single train/test split; impractical for very large datasets (>10M rows)","Stratified splits require discrete target variable; not applicable to regression with continuous targets","Time-series splits assume temporal ordering in data; requires manual data sorting before CV","No built-in support for grouped k-fold (e.g., multiple samples per group) without custom split logic"],"requires":["CatBoost 0.15+","Training data with labels","For time-series: data sorted by time","For stratified splits: discrete classification target"],"input_types":["Pandas DataFrame","NumPy arrays","CatBoost Pool objects","X (features) and y 
(labels) arrays"],"output_types":["Cross-validation metrics (mean and std across folds)","Per-fold predictions","Trained models for each fold","Feature importance aggregated across folds"],"categories":["data-processing-analysis","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_5","uri":"capability://tool.use.integration.model.serialization.and.deployment.across.languages","name":"model serialization and deployment across languages","description":"Exports trained models to multiple formats (ONNX, C++, Python pickle, JSON) enabling deployment across different runtime environments. CatBoost implements language-specific model interfaces: C++ API (catboost/libs/model_interface/) for production servers, Java/JVM bindings (catboost/jvm-packages/) for Spark integration, and Python pickle for simple deployments. The ONNX export converts the tree ensemble to ONNX standard format, enabling inference in any ONNX-compatible runtime (TensorFlow Lite, CoreML, etc.).","intents":["Deploy trained models to production servers written in C++, Java, or other languages","Export models to mobile/edge devices via ONNX or CoreML formats","Integrate CatBoost predictions into Apache Spark pipelines for distributed inference"],"best_for":["Production ML engineers deploying models to polyglot infrastructure","Mobile/edge ML teams targeting iOS, Android, or embedded systems","Data engineers building Spark-based ML pipelines at scale"],"limitations":["ONNX export does not preserve categorical feature encoding; requires manual preprocessing in deployment code","C++ API requires linking against CatBoost native libraries; adds deployment complexity","Java/JVM bindings have ~5-10% performance overhead compared to native C++ inference","Pickle serialization is Python-version dependent; models trained on Python 3.8 may not load on Python 3.11 without compatibility layers"],"requires":["Trained CatBoost model","For ONNX: skl2onnx or onnx-simplifier 
(optional)","For C++: CatBoost C++ headers and compiled libraries","For Java: CatBoost JVM package and Java 8+","For Spark: PySpark 2.4+ or Scala 2.11+"],"input_types":["Trained CatBoostClassifier or CatBoostRegressor","Model file path (for loading)"],"output_types":["ONNX model file (.onnx)","C++ model header file (.h)","Python pickle file (.pkl)","JSON model representation","Java/Scala model wrapper"],"categories":["tool-use-integration","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_6","uri":"capability://planning.reasoning.hyperparameter.optimization.with.bayesian.search","name":"hyperparameter optimization with bayesian search","description":"Integrates with Optuna and Hyperopt for Bayesian hyperparameter optimization, automatically tuning learning rate, tree depth, regularization, and categorical feature handling parameters. CatBoost provides a scikit-learn compatible interface (get_params/set_params) that enables seamless integration with standard hyperparameter optimization libraries. 
The optimization loop trains models on cross-validation folds and uses acquisition functions to select promising hyperparameter combinations.","intents":["Automatically find optimal hyperparameters without manual grid search","Tune categorical feature encoding parameters (target encoding smoothing, prior)","Balance model complexity vs generalization through regularization parameter search"],"best_for":["ML practitioners with moderate computational budgets (100-1000 model evaluations)","Teams using Optuna or Hyperopt for multi-model hyperparameter optimization","Researchers comparing CatBoost hyperparameter sensitivity across datasets"],"limitations":["Bayesian optimization requires 10-20 initial random evaluations before becoming efficient; total optimization time is high for large search spaces","Hyperparameter optimization is dataset-specific; optimal parameters for one dataset may not transfer to similar datasets","Some CatBoost parameters (e.g., custom loss functions) cannot be tuned through standard hyperparameter optimization","Optimization requires careful metric selection; optimizing for accuracy may not optimize for business metrics (precision, recall, AUC)"],"requires":["CatBoost 0.15+","Optuna 2.0+ or Hyperopt 0.2+","Training data with labels","Computational budget for 100+ model trainings"],"input_types":["Training data (Pandas DataFrame or NumPy array)","Hyperparameter search space definition (dict or Optuna sampler)","Metric function (sklearn.metrics function or custom callable)"],"output_types":["Best hyperparameters found","Optimization history (trials with scores)","Trained model with best hyperparameters","Optimization visualization (parameter importance, history plots)"],"categories":["planning-reasoning","automation-workflow"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_7","uri":"capability://data.processing.analysis.dataset.statistics.and.histogram.computation","name":"dataset statistics and histogram 
computation","description":"Computes and caches dataset statistics (histograms, quantiles, feature distributions) during training to accelerate tree construction and enable feature analysis. The statistics module (catboost/libs/dataset_statistics/) maintains columnar histograms for each feature, updated incrementally as the boosting ensemble grows. These statistics are used internally for split finding and can be exported for external analysis of feature distributions and relationships.","intents":["Understand feature distributions and identify data quality issues (missing values, outliers, skewness)","Accelerate tree construction by reusing cached histograms across boosting iterations","Analyze feature interactions and correlations to guide feature engineering"],"best_for":["Data scientists performing exploratory data analysis before model training","ML engineers optimizing training speed through histogram caching","Data quality teams monitoring feature distributions in production data"],"limitations":["Histogram computation requires loading full dataset into memory; not suitable for datasets >100GB","Histograms are approximate (binned) representations; exact quantiles require full data scan","Statistics are computed on training data only; distribution shift in production data is not detected","Histogram granularity (number of bins) is fixed at training time; cannot be changed post-hoc"],"requires":["CatBoost 0.24+","Training data loaded into memory","Sufficient RAM for histogram storage (~1KB per feature per bin)"],"input_types":["Pandas DataFrame","NumPy arrays","CatBoost Pool objects"],"output_types":["Feature histograms (bin edges and counts)","Quantile values (min, 25%, 50%, 75%, max)","Feature statistics (mean, std, skewness)","Missing value counts per 
feature"],"categories":["data-processing-analysis"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_8","uri":"capability://automation.workflow.early.stopping.with.validation.monitoring","name":"early stopping with validation monitoring","description":"Monitors validation metric (loss, accuracy, custom metric) during training and stops boosting when metric plateaus or degrades, preventing overfitting. CatBoost's early stopping (boosting_progress_tracker.cpp) tracks per-iteration validation metrics and compares against the best observed value. When validation metric fails to improve for a specified number of iterations (patience), training terminates and the best model is returned.","intents":["Prevent overfitting by stopping training when validation performance degrades","Reduce training time by terminating unpromising training runs early","Automatically find the optimal number of boosting iterations without manual tuning"],"best_for":["Practitioners with limited computational budgets wanting to avoid wasted training","Teams building production models where overfitting is a critical concern","Hyperparameter optimization workflows where early stopping reduces per-trial time"],"limitations":["Early stopping requires a separate validation dataset; reduces training data available for model fitting","Patience parameter (iterations without improvement) is a hyperparameter itself; suboptimal values lead to premature or delayed stopping","Early stopping is metric-specific; optimizing for one metric may degrade other metrics","Validation set must be representative of test distribution; biased validation sets lead to incorrect stopping decisions"],"requires":["CatBoost 0.15+","Separate validation dataset (10-20% of training data)","Metric function (built-in or custom)","Patience parameter (typically 10-50 iterations)"],"input_types":["Training data (Pandas DataFrame or NumPy array)","Validation data (same format as training)","Metric name 
(string) or custom metric function"],"output_types":["Trained model (stopped at best iteration)","Best iteration number","Validation metric history","Training/validation metric comparison"],"categories":["automation-workflow","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0},{"id":"pypi_pypi-catboost__cap_9","uri":"capability://data.processing.analysis.prediction.with.confidence.intervals.and.uncertainty.quantification","name":"prediction with confidence intervals and uncertainty quantification","description":"Generates predictions with associated uncertainty estimates through prediction interval computation and quantile regression. CatBoost supports quantile loss functions (MAE, Quantile) that enable training models to predict specific quantiles (e.g., 5th and 95th percentile) rather than point estimates. By training separate models for lower and upper quantiles, practitioners can construct prediction intervals that quantify model uncertainty.","intents":["Generate prediction intervals (confidence bounds) around point predictions","Quantify model uncertainty for risk-aware decision making","Build probabilistic forecasts for time-series applications"],"best_for":["Risk management teams needing uncertainty quantification for decision support","Time-series forecasters building probabilistic forecasts","Medical/financial applications where prediction confidence is critical"],"limitations":["Prediction intervals require training multiple models (one per quantile); 3x training time for 3-quantile setup","Quantile regression assumes independent errors; heteroscedastic data may require separate variance models","Prediction intervals are only as good as the quantile models; miscalibrated quantile models produce invalid intervals","No built-in calibration methods; intervals may be too narrow or too wide without post-hoc adjustment"],"requires":["CatBoost 0.24+","Training data with continuous target variable","Separate model training for each quantile 
level","Quantile loss function (Quantile or MAE)"],"input_types":["Training data (Pandas DataFrame or NumPy array)","Quantile levels (e.g., [0.05, 0.5, 0.95])","Test data for prediction"],"output_types":["Point predictions (median quantile)","Lower bound predictions (e.g., 5th percentile)","Upper bound predictions (e.g., 95th percentile)","Prediction interval width"],"categories":["data-processing-analysis","planning-reasoning"],"confidence":0.5,"matches":0,"success_rate":0}],"trust":{"score":32,"verified":false,"data_access_risk":"high","permissions":["Python 3.8+","NumPy 1.16.0+","Pandas 0.24.0+ (for DataFrame input)","For GPU training: CUDA 11.0+, cuDNN 8.0+, NVIDIA driver 450.0+","NVIDIA GPU with compute capability 3.5+ (Kepler generation or newer)","CUDA 11.0 or 11.8 (version-specific wheels)","cuDNN 8.0+","NVIDIA driver 450.0+","CatBoost GPU-enabled wheel (catboost-gpu package or GPU-compiled source)","Trained CatBoost model"],"failure_modes":["Training speed slower than LightGBM on very large datasets (>10M rows) due to symmetric tree construction overhead","Categorical feature encoding is learned during training, making inference on unseen categories require fallback strategies","GPU training requires NVIDIA CUDA 11.0+ with compute capability 3.5+, limiting deployment to recent hardware","GPU memory constraints limit batch sizes; datasets >100GB require careful memory management or multi-GPU strategies","GPU training only supports NVIDIA hardware; no AMD or Intel GPU support","Some advanced features (custom loss functions, certain metric types) have limited GPU implementation coverage","GPU training requires exact CUDA/cuDNN version matching; version mismatches cause silent failures or crashes","SHAP value computation is O(n_features × n_trees); slow for large models (>5000 trees) or high-dimensional data (>500 features)","SHAP values assume feature independence; misleading for highly correlated features","Decision path analysis is sample-specific; patterns 
may not generalize across the dataset","builder identity is not verified yet","no observed match outcomes yet"],"rank_breakdown":{"adoption":0.05,"quality":0.35,"ecosystem":0.43,"match_graph":0.25,"freshness":0.9,"weights":{"adoption":0.3,"quality":0.2,"ecosystem":0.15,"match_graph":0.23,"freshness":0.12}},"observed_outcomes":{"matches":0,"success_rate":0,"avg_confidence":0,"top_intents":[],"last_matched_at":null},"maintenance":{"status":"active","updated_at":"2026-05-24T12:16:25.060Z","last_scraped_at":"2026-05-03T15:20:16.568Z","last_commit":null},"community":{"stars":null,"forks":null,"weekly_downloads":null,"model_downloads":null,"model_likes":null}},"distribution":{"claim_url":"https://unfragile.ai/submit?claim=pypi-catboost","compare_url":"https://unfragile.ai/compare?artifact=pypi-catboost"}},"signature":"vVLjsH4k7a0uyOs5QOkNq9bNikFY40Aq5QVeCIsYli3mdipXUvhnqHYxcvKnY24C0kerl8BuNAOLaXHGqOTkCQ==","signedAt":"2026-06-15T16:49:33.273Z","signedBy":"unfragile.ai","version":1},"_links":{"self":"https://unfragile.ai/api/v1/passport/pypi-catboost","artifact":"https://unfragile.ai/pypi-catboost","verify":"https://unfragile.ai/api/v1/verify?slug=pypi-catboost","publicKey":"https://unfragile.ai/api/v1/trust-passport-public-key","spec":"https://unfragile.ai/trust","schema":"https://unfragile.ai/schema.json","docs":"https://unfragile.ai/docs"}}