LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient with the following advantages:

- Faster training speed and higher efficiency.
- Lower memory usage.
- Better accuracy.
- Support of parallel, distributed, and GPU learning.
- Capable of handling large-scale data.
For further details, please refer to Features. Benefiting from these advantages, LightGBM is widely used in many winning solutions of machine learning competitions. Comparison experiments on public datasets show that LightGBM can outperform existing boosting frameworks in both efficiency and accuracy, with significantly lower memory consumption. Moreover, distributed learning experiments show that LightGBM can achieve a linear speed-up in specific settings by using multiple machines for training.

## Get Started and Documentation

Our primary documentation is at https://lightgbm.readthedocs.io/ and is generated from this repository. If you are new to LightGBM, follow the installation instructions on that site; it also hosts examples, parameter guides, API references, and documentation for contributors.
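As a quick illustration of basic usage (a minimal sketch, not an official example from the documentation; the dataset and hyperparameter choices here are arbitrary):

```python
# Minimal sketch: train a LightGBM classifier through its scikit-learn API.
# Assumes `lightgbm` and `scikit-learn` are installed
# (e.g. `pip install lightgbm scikit-learn`).
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small public dataset and hold out a test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# num_leaves bounds the complexity of the leaf-wise trees LightGBM grows.
clf = lgb.LGBMClassifier(n_estimators=100, num_leaves=31, learning_rate=0.1)
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```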
## News

Please refer to changelogs at the GitHub releases page. Some old update logs are available at the Key Events page.

## External (Unofficial) Repositories

- FLAML (AutoML library for hyperparameter optimization): https://github.com/microsoft/FLAML
- Optuna (hyperparameter optimization framework): https://github.com/optuna/optuna
- Julia-package: https://github.com/IQVIA-ML/LightGBM.jl
- JPMML (Java PMML converter): https://github.com/jpmml/jpmml-lightgbm
- Nyoka (Python PMML converter): https://github.com/SoftwareAG/nyoka
- Treelite (model compiler for efficient deployment): https://github.com/dmlc/treelite
- lleaves (LLVM-based model compiler for efficient inference): https://github.com/siboehm/lleaves
- Hummingbird (model compiler into tensor computations): https://github.com/microsoft/hummingbird
- cuML Forest Inference Library (GPU-accelerated inference): https://github.com/rapidsai/cuml
- daal4py (Intel CPU-accelerated inference): https://github.com/intel/scikit-learn-intelex/tree/master/daal4py
- m2cgen (model appliers for various languages): https://github.com/BayesWitnesses/m2cgen
- leaves (Go model applier): https://github.com/dmitryikh/leaves
- ONNXMLTools (ONNX converter): https://github.com/onnx/onnxmltools
- SHAP (model output explainer): https://github.com/slundberg/shap
- Shapash (model visualization and interpretation): https://github.com/MAIF/shapash
- dtreeviz (decision tree visualization and model interpretation): https://github.com/parrt/dtreeviz
- SynapseML (LightGBM on Spark): https://github.com/microsoft/SynapseML
- Kubeflow Fairing (LightGBM on Kubernetes): https://github.com/kubeflow/fairing
- Kubeflow Operator (LightGBM on Kubernetes): https://github.com/kubeflow/xgboost-operator
- lightgbm_ray (LightGBM on Ray): https://github.com/ray-project/lightgbm_ray
- Mars (LightGBM on Mars): https://github.com/mars-project/mars
- ML.NET (.NET/C#-package): https://github.com/dotnet/machinelearning
- LightGBM.NET (.NET/C#-package): https://github.com/rca22/LightGBM.Net
- Ruby gem: https://github.com/ankane/lightgbm-ruby
- LightGBM4j (Java high-level binding): https://github.com/metarank/lightgbm4j
- lightgbm-rs (Rust binding): https://github.com/vaaaaanquish/lightgbm-rs
- MLflow (experiment tracking, model monitoring framework): https://github.com/mlflow/mlflow
- {treesnip} (R {parsnip}-compliant interface): https://github.com/curso-r/treesnip
- {mlr3extralearners} (R {mlr3}-compliant interface): https://github.com/mlr-org/mlr3extralearners
- lightgbm-transform (feature transformation binding): https://github.com/microsoft/lightgbm-transform

## Support

Ask questions on Stack Overflow with the `lightgbm` tag, and open bug reports and feature requests on GitHub issues.

## How to Contribute

Check the CONTRIBUTING page.

## Microsoft Open Source Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

## Reference Papers

- Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 3149-3157.
- Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu. "A Communication-Efficient Parallel Algorithm for Decision Tree". Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 1279-1287.
- Huan Zhang, Si Si and Cho-Jui Hsieh. "GPU Acceleration for Large-scale Tree Boosting". SysML Conference, 2018.

Note: if you use LightGBM in your GitHub projects, please add `lightgbm` in the `requirements.txt`.

## License

This project is licensed under the terms of the MIT license.
See LICENSE for additional details.

I am trying to run my LightGBM for feature selection as below. The initialization:

```python
# Initialize an empty array to hold feature importances
feature_importances = np.zeros(features_sample.shape[1])

# Create the model with several hyperparameters
model = lgb.LGBMClassifier(objective='binary', boosting_type='goss',
                           n_estimators=10000, class_weight='balanced')
```

Then I fit the model as below:

```python
# Fit the model twice to avoid overfitting
for i in range(2):
    # Split into training and validation set
    train_features, valid_features, train_y, valid_y = train_test_split(
        train_X, train_Y, test_size=0.25, random_state=i)

    # Train using early stopping
    model.fit(train_features, train_y, early_stopping_rounds=100,
              eval_set=[(valid_features, valid_y)], eval_metric='auc',
              verbose=200)

    # Record the feature importances
    feature_importances += model.feature_importances_
```

But I get the error below:

```
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is: [6] valid_0's auc: 0.88648
ValueError: operands could not be broadcast together with shapes (87,) (83,) (87,)
```
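The error points at the accumulation line: `feature_importances` has shape `(87,)` because it was sized from `features_sample`, while `model.feature_importances_` has shape `(83,)` because the model was fit on `train_X`, which evidently has only 83 columns. Sizing the accumulator from the matrix actually used for training resolves the mismatch. A minimal sketch of that fix, reusing `train_X` and `train_Y` from the question (and noting, as an aside, that on LightGBM >= 4.0 early stopping and logging are configured through callbacks rather than `fit()` keyword arguments):

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Size the accumulator from the matrix the model is actually trained on.
feature_importances = np.zeros(train_X.shape[1])

model = lgb.LGBMClassifier(objective='binary', boosting_type='goss',
                           n_estimators=10000, class_weight='balanced')

for i in range(2):
    train_features, valid_features, train_y, valid_y = train_test_split(
        train_X, train_Y, test_size=0.25, random_state=i)

    # On LightGBM >= 4.0, pass early stopping and logging as callbacks;
    # on 3.x, the original early_stopping_rounds/verbose keywords also work.
    model.fit(train_features, train_y,
              eval_set=[(valid_features, valid_y)], eval_metric='auc',
              callbacks=[lgb.early_stopping(100), lgb.log_evaluation(200)])

    feature_importances += model.feature_importances_

# Average the importances over the two runs.
feature_importances /= 2
```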
LightGBM, short for Light Gradient Boosting Machine, is a free and open-source distributed gradient boosting framework for machine learning, originally developed by Microsoft.[4][5] It is based on decision tree algorithms and is used for ranking, classification, and other machine learning tasks. The development focus is on performance and scalability.

The LightGBM framework supports different algorithms, including GBT, GBDT, GBRT, GBM, MART[6][7] and RF.[8] LightGBM has many of XGBoost's advantages, including sparse optimization, parallel training, multiple loss functions, regularization, bagging, and early stopping. A major difference between the two lies in the construction of trees. LightGBM does not grow a tree level-wise (row by row) as most other implementations do.[9] Instead, it grows trees leaf-wise: it chooses the leaf it believes will yield the largest decrease in loss.[10] In addition, LightGBM does not use the widely used sorted-based decision tree learning algorithm, which searches for the best split point on sorted feature values,[11] as XGBoost and other implementations do. Instead, LightGBM implements a highly optimized histogram-based decision tree learning algorithm, which yields great advantages in both efficiency and memory consumption.[12] The LightGBM algorithm utilizes two novel techniques, Gradient-Based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), which allow the algorithm to run faster while maintaining a high level of accuracy.[13]

LightGBM works on Linux, Windows, and macOS and supports C++, Python,[14] R, and C#.[15] The source code is licensed under the MIT License and available on GitHub.[16]

Gradient-Based One-Side Sampling (GOSS) is a method that leverages the fact that data instances carry no native weight in GBDT. Since data instances with different gradients play different roles in the computation of information gain, instances with larger gradients contribute more to it. Thus, in order to retain the accuracy of the estimated information gain, GOSS keeps the instances with large gradients, randomly samples from the instances with small gradients, and upweights the sampled small-gradient instances to compensate for those dropped.[13]

Exclusive Feature Bundling (EFB) is a near-lossless method to reduce the number of effective features. In a sparse feature space, many features are nearly exclusive, meaning they rarely take nonzero values simultaneously; one-hot encoded features are a perfect example. EFB bundles such features together, reducing dimensionality to improve efficiency while maintaining a high level of accuracy. The result of merging a group of exclusive features into a single feature is called an exclusive feature bundle.[13]
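To make the GOSS description concrete, here is a short, self-contained sketch of the sampling step (an illustration of the published algorithm, not LightGBM's actual internal code); `a` and `b` denote the keep ratios for large- and small-gradient instances:

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Sketch of Gradient-Based One-Side Sampling (GOSS).

    Keeps the top a*100% of instances by absolute gradient, randomly
    samples b*100% of the remaining instances, and returns indices plus
    weights; the sampled small-gradient instances are upweighted by
    (1 - a) / b so the estimated information gain stays approximately
    unbiased.
    """
    rng = rng or np.random.default_rng(0)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))  # sort by |gradient|, descending

    n_top = int(a * n)
    top_idx = order[:n_top]                 # large-gradient instances: always kept

    rest = order[n_top:]
    sampled_idx = rng.choice(rest, size=int(b * n), replace=False)

    idx = np.concatenate([top_idx, sampled_idx])
    weights = np.ones(len(idx))
    weights[n_top:] = (1 - a) / b           # compensate for the dropped instances
    return idx, weights

# Example: from 1000 gradients, keep the top 20% plus 10% of the rest.
grads = np.random.default_rng(42).normal(size=1000)
idx, w = goss_sample(grads, a=0.2, b=0.1)
print(len(idx))  # 300 instances retained out of 1000
```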