Xgbfi Feature Importance Analysis (XGBoost Extension)


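Xgbfi analyzes the importance of each feature in a trained XGBoost model. You can also use an fmap (feature map) file to inspect the features by name.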
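As a rough illustration of the fmap idea, the sketch below writes a feature map file in the layout XGBoost documents for it: one line per feature with a zero-based index, a name, and a type (`q` for quantitative, `i` for indicator, `int` for integer). The helper `write_fmap`, the file name `featmap.txt`, and the feature names are assumptions made up for this example; they are not part of Xgbfi or xgbfir.

```python
import os
import tempfile


def write_fmap(feature_names, path, feature_type="q"):
    """Write one '<index> <name> <type>' line per feature (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        for index, name in enumerate(feature_names):
            # Spaces are replaced because fields in the file are whitespace-separated.
            fh.write(f"{index}\t{name.replace(' ', '_')}\t{feature_type}\n")


# Hypothetical feature names, used only for illustration.
names = ["sepal length", "sepal width", "petal length", "petal width"]
fmap_path = os.path.join(tempfile.mkdtemp(), "featmap.txt")
write_fmap(names, fmap_path)

# Read the file back to show what was written.
with open(fmap_path, encoding="utf-8") as fh:
    fmap_lines = fh.read().splitlines()
print(fmap_lines[0])
```

A file like this can then be passed as the `fmap` argument when dumping a model, for example to `Booster.dump_model`, so that the dump shows names instead of `f0`, `f1`, and so on.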

Xgbfi is an XGBoost model dump parser, which ranks features as well as feature interactions by different metrics.
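To give a feel for what parsing a model dump involves, here is a minimal sketch that pulls the split feature, threshold, gain, and cover out of an XGBoost text dump produced with statistics enabled. It is not Xgbfi's actual parser. The function `parse_split_lines` and the regular expression are assumptions written for this example, and the sample dump is made up to match the usual text dump layout.

```python
import re

# Matches split nodes such as "0:[f2<2.45] yes=1,no=2,missing=1,gain=66.5,cover=100".
SPLIT_RE = re.compile(
    r"^\s*(?P<node>\d+):\[(?P<feature>[^<\]]+)<(?P<threshold>[^\]]+)\]"
    r".*?gain=(?P<gain>[-+0-9.eE]+),cover=(?P<cover>[-+0-9.eE]+)"
)


def parse_split_lines(dump_text):
    """Return one dict per split node in a text dump; leaf lines are skipped."""
    splits = []
    for line in dump_text.splitlines():
        match = SPLIT_RE.match(line)
        if match is None:
            continue  # leaf lines look like "3:leaf=0.12,cover=50"
        splits.append({
            "node": int(match["node"]),
            "feature": match["feature"],
            "threshold": float(match["threshold"]),
            "gain": float(match["gain"]),
            "cover": float(match["cover"]),
        })
    return splits


# Made-up dump of a single tree, for illustration only.
sample_dump = (
    "0:[f2<2.45] yes=1,no=2,missing=1,gain=66.5,cover=100\n"
    "\t1:leaf=0.43,cover=33\n"
    "\t2:[f3<1.75] yes=3,no=4,missing=3,gain=37.25,cover=67\n"
    "\t\t3:leaf=-0.2,cover=36\n"
    "\t\t4:leaf=0.4,cover=31\n"
)

splits = parse_split_lines(sample_dump)
for split in splits:
    print(split["feature"], split["gain"])
```

Ranking features then comes down to aggregating these per-split records across all trees.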

Xgbfir is a Python port of Xgbfi.

Notes on the evaluation metrics:

Gain: Total gain of each feature or feature interaction
FScore: Number of possible splits taken on a feature or feature interaction
wFScore: Number of possible splits taken on a feature or feature interaction, weighted by the probability of the splits taking place
Average wFScore: wFScore divided by FScore
Average Gain: Gain divided by FScore
Expected Gain: Total gain of each feature or feature interaction, weighted by the probability of gathering the gain
Average Tree Index
Average Tree Depth

Additional features:

Leaf Statistics
Split Value Histograms
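The arithmetic behind several of these metrics can be sketched on toy data. The example below assumes each split is already described by a feature name, a gain, and the probability of reaching that split. How Xgbfi derives that probability from the trees is not covered here, so the numbers are supplied by hand. The function `summarize` and the values in `toy_splits` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical splits: (feature, gain, probability that the split is reached).
toy_splits = [
    ("f2", 60.0, 1.0),
    ("f3", 30.0, 0.5),
    ("f2", 10.0, 0.25),
]


def summarize(splits):
    """Aggregate per-split records into per-feature metrics."""
    totals = defaultdict(
        lambda: {"Gain": 0.0, "FScore": 0, "wFScore": 0.0, "Expected Gain": 0.0}
    )
    for feature, gain, probability in splits:
        row = totals[feature]
        row["Gain"] += gain                         # total gain
        row["FScore"] += 1                          # number of splits
        row["wFScore"] += probability               # splits weighted by probability
        row["Expected Gain"] += gain * probability  # gain weighted by probability
    for row in totals.values():
        row["Average Gain"] = row["Gain"] / row["FScore"]
        row["Average wFScore"] = row["wFScore"] / row["FScore"]
    return dict(totals)


metrics = summarize(toy_splits)
for feature, row in metrics.items():
    print(feature, row)
```

Here `f2` has a Gain of 70.0 over 2 splits, so its Average Gain is 35.0, while its Expected Gain is 62.5 because the second split is reached only a quarter of the time.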

You can install using the pip package manager by running

pip install xgbfir

Clone the repo and install:

git clone https://github.com/limexp/xgbfir.git
cd xgbfir
sudo python setup.py install

Or download the source code by pressing ‘Download ZIP’ on the xgbfir GitHub page (https://github.com/limexp/xgbfir). Unpack the archive, then install by navigating to the unpacked directory and running

sudo python setup.py install

Example usage:

from sklearn.datasets import load_diabetes, load_iris
import xgboost as xgb
import xgbfir

# Regression example.
# load_boston was removed in scikit-learn 1.2, so the bundled diabetes
# dataset is used here as the regression dataset.
diabetes = load_diabetes()

# train an XGBoost regressor with default parameters
xgb_rmodel = xgb.XGBRegressor().fit(diabetes['data'], diabetes['target'])

# save the feature interaction report with proper feature names
xgbfir.saveXgbFI(xgb_rmodel, feature_names=diabetes.feature_names, OutputXlsxFile='diabetesFI.xlsx')


# Classification example: load the iris dataset
iris = load_iris()

# train an XGBoost classifier with default parameters
xgb_cmodel = xgb.XGBClassifier().fit(iris['data'], iris['target'])

# save the feature interaction report with proper feature names
xgbfir.saveXgbFI(xgb_cmodel, feature_names=iris.feature_names, OutputXlsxFile='irisFI.xlsx')

Now open the generated Excel files to see the results.

References

https://github.com/limexp/xgbfir

https://github.com/Far0n/xgbfi
