What is LightGBM?

Contents

Get Started and Documentation
External (Unofficial) Repositories
How to Contribute
Microsoft Open Source Code of Conduct
Reference Papers

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages:

Faster training speed and higher efficiency.
Lower memory usage.
Better accuracy.
Support of parallel, distributed, and GPU learning.
Capable of handling large-scale data.

For further details, please refer to Features.

Benefiting from these advantages, LightGBM is widely used in many winning solutions of machine learning competitions.

Comparison experiments on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, distributed learning experiments show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

Get Started and Documentation

Our primary documentation is at https://lightgbm.readthedocs.io/ and is generated from this repository. If you are new to LightGBM, follow the installation instructions on that site.

Next you may want to read:

Documentation for contributors:

How we update readthedocs.io.
Check out the Development Guide.

News

Please refer to changelogs at GitHub releases page.

Some old update logs are available at Key Events page.

External (Unofficial) Repositories

FLAML (AutoML library for hyperparameter optimization): https://github.com/microsoft/FLAML

Optuna (hyperparameter optimization framework): https://github.com/optuna/optuna

Julia-package: https://github.com/IQVIA-ML/LightGBM.jl

JPMML (Java PMML converter): https://github.com/jpmml/jpmml-lightgbm

Nyoka (Python PMML converter): https://github.com/SoftwareAG/nyoka

Treelite (model compiler for efficient deployment): https://github.com/dmlc/treelite

lleaves (LLVM-based model compiler for efficient inference): https://github.com/siboehm/lleaves

Hummingbird (model compiler into tensor computations): https://github.com/microsoft/hummingbird

cuML Forest Inference Library (GPU-accelerated inference): https://github.com/rapidsai/cuml

daal4py (Intel CPU-accelerated inference): https://github.com/intel/scikit-learn-intelex/tree/master/daal4py

m2cgen (model appliers for various languages): https://github.com/BayesWitnesses/m2cgen

leaves (Go model applier): https://github.com/dmitryikh/leaves

ONNXMLTools (ONNX converter): https://github.com/onnx/onnxmltools

SHAP (model output explainer): https://github.com/slundberg/shap

Shapash (model visualization and interpretation): https://github.com/MAIF/shapash

dtreeviz (decision tree visualization and model interpretation): https://github.com/parrt/dtreeviz

SynapseML (LightGBM on Spark): https://github.com/microsoft/SynapseML

Kubeflow Fairing (LightGBM on Kubernetes): https://github.com/kubeflow/fairing

Kubeflow Operator (LightGBM on Kubernetes): https://github.com/kubeflow/xgboost-operator

lightgbm_ray (LightGBM on Ray): https://github.com/ray-project/lightgbm_ray

Mars (LightGBM on Mars): https://github.com/mars-project/mars

ML.NET (.NET/C#-package): https://github.com/dotnet/machinelearning

LightGBM.NET (.NET/C#-package): https://github.com/rca22/LightGBM.Net

Ruby gem: https://github.com/ankane/lightgbm-ruby

LightGBM4j (Java high-level binding): https://github.com/metarank/lightgbm4j

lightgbm-rs (Rust binding): https://github.com/vaaaaanquish/lightgbm-rs

MLflow (experiment tracking, model monitoring framework): https://github.com/mlflow/mlflow

{treesnip} (R {parsnip}-compliant interface): https://github.com/curso-r/treesnip

{mlr3extralearners} (R {mlr3}-compliant interface): https://github.com/mlr-org/mlr3extralearners

lightgbm-transform (feature transformation binding): https://github.com/microsoft/lightgbm-transform

Support

How to Contribute

Check CONTRIBUTING page.

Microsoft Open Source Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact with any additional questions or comments.

Reference Papers

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 3149-3157.

Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu. "A Communication-Efficient Parallel Algorithm for Decision Tree". Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 1279-1287.

Huan Zhang, Si Si and Cho-Jui Hsieh. "GPU Acceleration for Large-scale Tree Boosting". SysML Conference, 2018.

Note: If you use LightGBM in your GitHub projects, please add lightgbm to your requirements.txt.

License

This project is licensed under the terms of the MIT license. See LICENSE for additional details.

I am trying to run LightGBM for feature selection, as shown below:

Initialization

import numpy as np
import lightgbm as lgb

# Initialize an empty array to hold feature importances
feature_importances = np.zeros(features_sample.shape[1])

# Create the model with several hyperparameters
model = lgb.LGBMClassifier(objective='binary', boosting_type='goss',
                           n_estimators=10000, class_weight='balanced')

Then I fit the model as shown below:

from sklearn.model_selection import train_test_split

# Fit the model twice to avoid overfitting
for i in range(2):
    # Split into training and validation set
    train_features, valid_features, train_y, valid_y = train_test_split(
        train_X, train_Y, test_size=0.25, random_state=i)

    # Train using early stopping
    model.fit(train_features, train_y, early_stopping_rounds=100,
              eval_set=[(valid_features, valid_y)],
              eval_metric='auc', verbose=200)

    # Record the feature importances
    feature_importances += model.feature_importances_

but I get the following error:

Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[6]	valid_0's auc: 0.88648
ValueError: operands could not be broadcast together with shapes (87,) (83,) (87,)

LightGBM, short for Light Gradient Boosting Machine, is a free and open-source distributed gradient boosting framework for machine learning, originally developed by Microsoft.[4][5] It is based on decision tree algorithms and used for ranking, classification and other machine learning tasks. Its development focuses on performance and scalability.

LightGBM
Original author(s): Guolin Ke[1] / Microsoft Research
Developer(s): Microsoft and LightGBM Contributors[2]
Initial release: 2016
Stable release: v3.3.2[3] / January 6, 2022
Repository: github.com/microsoft/LightGBM
Written in: C++, Python, R, C
Operating system: Windows, macOS, Linux
Type: Machine learning, Gradient boosting framework
License: MIT License
Website: lightgbm.readthedocs.io

The LightGBM framework supports different algorithms including GBT, GBDT, GBRT, GBM, MART[6][7] and RF.[8] LightGBM has many of XGBoost's advantages, including sparse optimization, parallel training, multiple loss functions, regularization, bagging, and early stopping. A major difference between the two lies in the construction of trees. LightGBM does not grow a tree level-wise (row by row) as most other implementations do.[9] Instead, it grows trees leaf-wise: it chooses the leaf it believes will yield the largest decrease in loss.[10] In addition, LightGBM does not use the widely used sort-based decision tree learning algorithm, which searches for the best split point on sorted feature values,[11] as XGBoost's exact greedy algorithm and other implementations do. Instead, LightGBM implements a highly optimized histogram-based decision tree learning algorithm, which yields large gains in both efficiency and memory consumption.[12] The LightGBM algorithm also uses two novel techniques, Gradient-Based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), which allow it to run faster while maintaining a high level of accuracy.[13]
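To make the histogram-based and leaf-wise ideas concrete, here is a minimal NumPy sketch. It is not LightGBM's actual implementation. Assumptions: features are quantile-binned into at most max_bin bins, and split quality uses a common second-order gain, G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam), with only an L2 term lam (LightGBM adds further terms and constraints such as L1 regularization and min_data_in_leaf). The helper names quantize, best_split and grow_leaf_wise are hypothetical. Candidate thresholds are scanned over bin boundaries rather than sorted raw values, and growth always splits the leaf whose best split has the largest gain.

```python
import heapq
import itertools
import numpy as np


def quantize(X, max_bin=16):
    """Map each column of X to integer bin ids using quantile cut points."""
    binned = np.empty(X.shape, dtype=np.int64)
    n_bins = []
    qs = np.linspace(0.0, 1.0, max_bin + 1)[1:-1]
    for j in range(X.shape[1]):
        cuts = np.unique(np.quantile(X[:, j], qs))
        binned[:, j] = np.searchsorted(cuts, X[:, j], side="right")
        n_bins.append(len(cuts) + 1)
    return binned, n_bins


def best_split(binned, grad, hess, n_bins, lam=1.0):
    """Best (gain, feature, bin) over all features; feature is None if no split helps."""
    g_tot, h_tot, n = grad.sum(), hess.sum(), len(grad)
    parent = g_tot ** 2 / (h_tot + lam)
    best = (0.0, None, None)
    for j, nb in enumerate(n_bins):
        # One O(n) pass builds per-bin sums of gradients, hessians and counts...
        g_hist = np.bincount(binned[:, j], weights=grad, minlength=nb)
        h_hist = np.bincount(binned[:, j], weights=hess, minlength=nb)
        c_hist = np.bincount(binned[:, j], minlength=nb)
        # ...then every threshold "bin <= b" is scored in O(#bins) via prefix sums.
        g_left = np.cumsum(g_hist)[:-1]
        h_left = np.cumsum(h_hist)[:-1]
        c_left = np.cumsum(c_hist)[:-1]
        gain = (g_left ** 2 / (h_left + lam)
                + (g_tot - g_left) ** 2 / (h_tot - h_left + lam) - parent)
        gain = np.where((c_left > 0) & (c_left < n), gain, -np.inf)  # no empty children
        if gain.size and gain.max() > best[0]:
            b = int(gain.argmax())
            best = (float(gain[b]), j, b)
    return best


def grow_leaf_wise(X, grad, hess, num_leaves=4, max_bin=16):
    """Grow one tree leaf-wise: repeatedly split the leaf with the largest gain."""
    binned, n_bins = quantize(X, max_bin)
    tie = itertools.count()  # tie-breaker so heapq never compares arrays
    heap, leaves, splits = [], [], []

    def consider(rows):
        gain, j, b = best_split(binned[rows], grad[rows], hess[rows], n_bins)
        if j is None:
            leaves.append(rows)  # nothing worth splitting: final leaf
        else:
            heapq.heappush(heap, (-gain, next(tie), j, b, rows))

    consider(np.arange(len(X)))
    while heap and len(leaves) + len(heap) < num_leaves:
        neg_gain, _, j, b, rows = heapq.heappop(heap)
        splits.append((j, b, -neg_gain))
        go_left = binned[rows, j] <= b
        consider(rows[go_left])
        consider(rows[~go_left])
    leaves.extend(item[-1] for item in heap)
    return splits, leaves


# Toy data: only feature 0 matters. Squared loss at an initial prediction of 0
# gives gradient = 0 - y and hessian = 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0, 2.0, -1.0)
splits, leaves = grow_leaf_wise(X, grad=-y, hess=np.ones_like(y), num_leaves=4)
print(splits[0][0], len(leaves))
```

A level-wise learner would instead split every leaf at the current depth; the max-heap keyed on gain is what makes this growth leaf-wise.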

LightGBM works on Linux, Windows, and macOS and supports C++, Python,[14] R, and C#.[15] The source code is licensed under MIT License and available on GitHub.[16]

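Gradient-Based One-Side Sampling (GOSS) is a method that leverages the fact that there is no native weight for data instances in GBDT. Since data instances with different gradients play different roles in the computation of information gain, instances with larger gradients contribute more to the information gain. Thus, to retain the accuracy of the information gain estimate, GOSS keeps the instances with large gradients and randomly drops instances with small gradients. To compensate for the resulting change in the data distribution, it multiplies the sampled small-gradient instances by the constant (1 - a)/b when computing information gain, where a is the fraction of large-gradient instances kept and b is the fraction of the data randomly sampled from the remaining instances.[13]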
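The sampling step can be sketched in a few lines of NumPy, following Algorithm 2 of Ke et al. (2017) as we understand it: keep the top a fraction of instances by |gradient|, sample a b fraction of all instances from the rest, and weight the sampled ones by (1 - a)/b. The parameter names top_rate and other_rate mirror LightGBM's own GOSS parameters (defaults 0.2 and 0.1), but goss_sample itself is a hypothetical helper, not LightGBM's code.

```python
import numpy as np


def goss_sample(grad, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (row indices, weights) of a GOSS sample of len(grad) instances."""
    n = len(grad)
    top_n = int(top_rate * n)
    rand_n = int(other_rate * n)
    # Sort instances by |gradient|, largest first.
    order = np.argsort(-np.abs(grad))
    top_idx = order[:top_n]  # large-gradient instances: always kept
    rng = np.random.default_rng(seed)
    # Small-gradient instances: a uniform random sample without replacement.
    rand_idx = rng.choice(order[top_n:], size=rand_n, replace=False)
    idx = np.concatenate([top_idx, rand_idx])
    weights = np.ones(len(idx))
    # Amplify the sampled small-gradient instances by (1 - a) / b so that
    # gradient sums over them approximate sums over all small-gradient instances.
    weights[top_n:] = (1.0 - top_rate) / other_rate
    return idx, weights


grad = np.random.default_rng(1).normal(size=1000)
idx, w = goss_sample(grad)
print(len(idx), w.sum())
```

A histogram builder would then accumulate grad[idx] * w (and the matching weighted hessians) instead of the full arrays, touching 30% of the rows in this example.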

Exclusive Feature Bundling (EFB) is a near-lossless method for reducing the number of effective features. In a sparse feature space, many features are nearly exclusive, meaning they rarely take nonzero values simultaneously. One-hot encoded features are a perfect example of exclusive features. EFB bundles such features, reducing dimensionality to improve efficiency while maintaining a high level of accuracy. A set of exclusive features merged into a single feature is called an exclusive feature bundle; when merging, EFB adds an offset to each feature's bin values so that the original features occupy disjoint bin ranges within the bundle.[13]
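A simplified sketch of EFB, assuming the input columns are already binned into non-negative integers with 0 as the zero bin. The helper greedy_bundle (hypothetical) orders features by nonzero count, the cheaper ordering the paper suggests in place of graph degree, and adds each feature to the first bundle whose total conflict count stays within max_conflicts. The helper merge_bundle (hypothetical) then shifts each feature's bins by an offset so the merged column keeps them distinguishable. LightGBM's real implementation handles default bins and conflicts in more detail.

```python
import numpy as np


def greedy_bundle(X, max_conflicts=0):
    """Group columns of X into bundles whose nonzero rows (almost) never overlap."""
    nonzero = X != 0
    # Order features by nonzero count, most first (stand-in for graph degree).
    order = np.argsort(-nonzero.sum(axis=0), kind="stable")
    bundles, occupied, conflicts = [], [], []
    for j in order:
        for k in range(len(bundles)):
            # Conflicts = rows where feature j and bundle k are both nonzero.
            c = int(np.sum(occupied[k] & nonzero[:, j]))
            if conflicts[k] + c <= max_conflicts:
                bundles[k].append(int(j))
                occupied[k] |= nonzero[:, j]
                conflicts[k] += c
                break
        else:  # no existing bundle can take feature j: open a new one
            bundles.append([int(j)])
            occupied.append(nonzero[:, j].copy())
            conflicts.append(0)
    return bundles


def merge_bundle(X_binned, bundle):
    """Merge integer-binned columns (0 = zero bin) into one column via bin offsets."""
    merged = np.zeros(X_binned.shape[0], dtype=np.int64)
    offset = 0
    for j in bundle:
        col = X_binned[:, j]
        mask = col != 0
        merged[mask] = col[mask] + offset  # feature j occupies bins offset+1 .. offset+max
        offset += int(col.max())
    return merged


# Three one-hot columns for a categorical value plus one dense column.
cats = np.array([0, 1, 2, 0, 1, 2])
X = np.column_stack([np.eye(3, dtype=int)[cats], cats + 1])
bundles = greedy_bundle(X)
print(bundles)                       # [[3], [0, 1, 2]]
print(merge_bundle(X, bundles[1]))   # [1 2 3 1 2 3]
```

Here the three one-hot columns collapse into a single column whose values 1, 2 and 3 identify the original category, while the dense column stays in its own bundle.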

See also

Machine learning
ML.NET
Data binning
Gradient boosting
XGBoost
scikit-learn

References

^ "Guolin Ke".
^ "microsoft/LightGBM". GitHub.
^ "Releases · microsoft/LightGBM". GitHub.
^ Brownlee, Jason (March 31, 2020). "Gradient Boosting with Scikit-Learn, XGBoost, LightGBM, and CatBoost".
^ Kopitar, Leon; Kocbek, Primoz; Cilar, Leona; Sheikh, Aziz; Stiglic, Gregor (July 20, 2020). "Early detection of type 2 diabetes mellitus using machine learning-based prediction models". Scientific Reports. 10 (1): 11981. Bibcode:2020NatSR..1011981K. doi:10.1038/s41598-020-68771-z. PMC 7371679. PMID 32686721 – via www.nature.com.
^ "Understanding LightGBM Parameters (and How to Tune Them)". neptune.ai. May 6, 2020.
^ "An Overview of LightGBM". avanwyk. May 16, 2018.
^ "Parameters — LightGBM 3.0.0.99 documentation". lightgbm.readthedocs.io.
^ "The Gradient Boosters IV: LightGBM". Deep & Shallow.
^ Ye, Andre (September 2020). "XGBoost, LightGBM, and Other Kaggle Competition Favorites". Towards Data Science.
^ Mehta, Manish; Agrawal, Rakesh; Rissanen, Jorma (1996). "SLIQ: A fast scalable classifier for data mining". International Conference on Extending Database Technology. CiteSeerX 10.1.1.89.7734.
^ "Features — LightGBM 3.1.0.99 documentation". lightgbm.readthedocs.io.
^ Ke, Guolin; Meng, Qi; Finley, Thomas; Wang, Taifeng; Chen, Wei; Ma, Weidong; Ye, Qiwei; Liu, Tie-Yan (2017). "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". Advances in Neural Information Processing Systems. 30.
^ "lightgbm: LightGBM Python Package" – via PyPI.
^ "Microsoft.ML.Trainers.LightGbm Namespace". docs.microsoft.com.
^ "microsoft/LightGBM". October 6, 2020 – via GitHub.

Further reading

Ke, Guolin; Meng, Qi; Finley, Thomas; Wang, Taifeng; Chen, Wei; Ma, Weidong; Ye, Qiwei; Liu, Tie-Yan (2017). "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". Advances in Neural Information Processing Systems 30.
Quinto, Butch (2020). Next-Generation Machine Learning with Spark – Covers XGBoost, LightGBM, Spark NLP, Distributed Deep Learning with Keras, and More. Apress. ISBN 978-1-4842-5668-8.