the-stack-v2 is the training data of StarCoder2. However, the StarCoder project provides only the metadata of its training dataset, not the file contents. To make the dataset easier to use, I am sharing this script so that you can download the dataset directly.
You can download the dataset with the command line below. The only thing you need to do is pass your Hugging Face access token through the --hug_access_token parameter.
python --hug_access_token {your_huggingface_access_token}
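
For readers who want to see how the token is typically consumed, here is a minimal sketch and not the shared script itself. It assumes the `datasets` library is installed in a version that accepts the `token` argument (older releases used `use_auth_token`), that your account has accepted the terms of the gated `bigcode/the-stack-v2` dataset, and that streaming the metadata rows is enough for a first check. The `--max_rows` option and the function names are hypothetical and exist only for this illustration.

```python
import argparse
from itertools import islice


def parse_args(argv=None):
    """Parse the command line; --hug_access_token is the only required option."""
    parser = argparse.ArgumentParser(description="Stream the-stack-v2 metadata rows.")
    parser.add_argument(
        "--hug_access_token",
        required=True,
        help="Hugging Face access token",
    )
    # Hypothetical option for this sketch only: how many rows to print.
    parser.add_argument("--max_rows", type=int, default=3)
    return parser.parse_args(argv)


def iter_metadata(token, max_rows):
    """Yield up to max_rows metadata rows from the gated dataset."""
    # Imported here so that argument parsing works without `datasets` installed.
    from datasets import load_dataset

    # Assumption: streaming avoids downloading the full metadata up front.
    ds = load_dataset(
        "bigcode/the-stack-v2",
        split="train",
        streaming=True,
        token=token,
    )
    yield from islice(ds, max_rows)


def main(argv=None):
    """Print a few metadata rows so you can confirm the token works."""
    args = parse_args(argv)
    for row in iter_metadata(args.hug_access_token, args.max_rows):
        print(row)
```

Calling `main(["--hug_access_token", "<your token>"])` prints a few metadata rows. If the token is valid and access to the dataset has been granted, the rows should appear without an authentication error.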