KeyError: 'iteration_indptr' in prev_model["learner"]["gradient_booster"]["model"]["iteration_indptr"].append( #2666
Unanswered
AvrAmit56741 asked this question in Q&A
Replies: 2 comments 3 replies
-
@AvrAmit56741 have you tried running with our example job as-is, or have you modified the code?
-
@AvrAmit56741 I consulted a researcher here who works with XGBoost. The error causing the crash, KeyError: 'iteration_indptr', is most likely a mismatch between the xgboost library version NVFlare expects and the version that produced the model: the JSON model representation of xgboost changes across releases, and some keys differ. We will update the documentation to specify which versions are supported.
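If it helps with debugging, here is a small stdlib-only sketch (a hypothetical helper, not NVFlare code) that checks whether a saved XGBoost JSON model actually contains the key the tree-based shareable generator reads:

```python
# Hypothetical helper, not part of NVFlare: check whether a saved
# XGBoost JSON model contains the 'iteration_indptr' key that
# nvflare.app_opt.xgboost.tree_based.shareable_generator.update_model
# accesses. If the key is absent, the model was most likely written by
# an xgboost release whose JSON layout predates that key.
import json

def has_iteration_indptr(model_json: str) -> bool:
    model = json.loads(model_json)
    booster_model = model["learner"]["gradient_booster"]["model"]
    return "iteration_indptr" in booster_model
```

Running this against the persisted global model on the server should tell you quickly whether the crash is the version-mismatch symptom described above.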
-
Python version (python3 -V): Python 3.10.6
NVFlare version (python3 -m pip list | grep "nvflare"): 2.4.1
NVFlare branch (if running examples, please use the branch that corresponds to the NVFlare version, git branch): No response
Operating system: Ubuntu 22
Have you successfully run any of the following examples?
Please describe your question
I am currently implementing a federated learning project using XGBoost within a Dockerized environment, as outlined in your Containerized Deployment with Docker guide. The setup consists of one Ubuntu server and two Docker clients, both built from the Dockerfile mentioned in the guide. I start the server with the start.sh script located in the server1/startup directory.
Both Docker clients run on Windows 11, in case that matters (the server is the machine that crashes).
The clients successfully connect to the server, and I am able to initiate tasks without any issues. However, the server crashes after the first round of tasks completes. The specific error message logged is as follows:
2024-06-25 17:33:13,092 - ScatterAndGather - ERROR - [identity=example_project, run=1d1dfb11-6be3-41c8-84f2-cee3d2d1b2f2, wf=scatter_and_gather]: Exception in ScatterAndGather control_flow: KeyError: 'iteration_indptr'
2024-06-25 17:33:13,103 - ScatterAndGather - ERROR - Traceback (most recent call last):
File "/home/amit/Desktop/NVFlare/venv/lib/python3.10/site-packages/nvflare/app_common/workflows/scatter_and_gather.py", line 275, in control_flow
self._global_weights = self.shareable_gen.shareable_to_learnable(aggr_result, fl_ctx)
File "/home/amit/Desktop/NVFlare/venv/lib/python3.10/site-packages/nvflare/app_opt/xgboost/tree_based/shareable_generator.py", line 136, in shareable_to_learnable
model = update_model(model, update)
File "/home/amit/Desktop/NVFlare/venv/lib/python3.10/site-packages/nvflare/app_opt/xgboost/tree_based/shareable_generator.py", line 57, in update_model
prev_model["learner"]["gradient_booster"]["model"]["iteration_indptr"].append(
KeyError: 'iteration_indptr'
2024-06-25 17:33:13,105 - ServerRunner - ERROR - [identity=example_project, run=1d1dfb11-6be3-41c8-84f2-cee3d2d1b2f2, wf=scatter_and_gather]: Aborting current RUN due to FATAL_SYSTEM_ERROR received: Exception in ScatterAndGather control_flow: KeyError: 'iteration_indptr'
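For anyone hitting the same traceback: the failing line appends to a list that only exists in JSON models written by sufficiently recent xgboost releases. A defensive workaround sketch is shown below; this is our assumption, not an official NVFlare fix, and aligning the xgboost versions on server and clients is the proper solution:

```python
# Hypothetical workaround sketch: make sure the 'iteration_indptr' list
# exists in the global model dict before NVFlare's update_model appends
# to it. The default [0] is an assumption (the indptr of an empty
# booster); matching xgboost versions is the proper fix.
def ensure_iteration_indptr(prev_model: dict) -> dict:
    booster_model = prev_model["learner"]["gradient_booster"]["model"]
    booster_model.setdefault("iteration_indptr", [0])
    return prev_model
```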
Both clients have uninterrupted access to the necessary data, and there are no errors reported on the client side prior to the crash.
(venv) amit@amit-VMware-Virtual-Platform:/tmp/nvflare/jobs/xgboost/app/config$ cat config_fed_server.conf
{
  format_version = 2
  task_data_filters = []
  task_result_filters = []
  workflows = [
    {
      id = "scatter_and_gather"
      name = "ScatterAndGather"
      args {
        min_clients = 2
        num_rounds = 100
        start_round = 0
        wait_time_after_min_received = 0
        aggregator_id = "aggregator"
        persistor_id = "persistor"
        shareable_generator_id = "shareable_generator"
        train_task_name = "train"
        train_timeout = 0
        allow_empty_global_weights = true
      }
    }
  ]
  components = [
    {
      id = "persistor"
      path = "nvflare.app_opt.xgboost.tree_based.model_persistor.XGBModelPersistor"
      args {
        save_name = "xgboost_model.json"
      }
    }
    {
      id = "shareable_generator"
      path = "nvflare.app_opt.xgboost.tree_based.shareable_generator.XGBModelShareableGenerator"
      args {}
    }
    {
      id = "aggregator"
      path = "nvflare.app_opt.xgboost.tree_based.bagging_aggregator.XGBBaggingAggregator"
      args {}
    }
  ]
}
Help!
Thanks.