Scrape project and production data on existing and planned generators in the
United States from the Energy Information Administration (EIA) for several
years. The data are processed and uploaded to a PostgreSQL database.
TABLE OF CONTENTS
* GOALS
* FILE LAYOUT
* RESOURCES
GOALS
Provide useful generation project data for power system modeling. In
particular, format these data for use in the Switch power system planning
model <https://github.com/switch-model/switch>.
Download and archive data in a way that can detect future changes to the
upstream repository (i.e. if the federal datasets are tampered with). We will
use a subset of the data for the moment, but would like to keep the remaining
data available for future work, especially since the upstream datasets could
be removed eventually.
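One way to make upstream changes detectable is to record a checksum for every
archived file and compare it on each new download. The sketch below is only an
illustration of that idea and is not the code in utils.py: the function name
archive(), the manifest.json file and the file naming scheme are all
hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def archive(name, data, archive_dir):
    """Store `data` under `archive_dir` and report whether it changed.

    Hypothetical helper: returns "new", "unchanged" or "changed" by
    comparing the SHA-256 digest of `data` with the digest recorded in
    a manifest file for the same `name`.
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = archive_dir / "manifest.json"
    manifest = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())

    digest = hashlib.sha256(data).hexdigest()
    previous = manifest.get(name)
    if previous == digest:
        return "unchanged"

    # Never overwrite an older copy: the digest prefix keeps versions apart.
    (archive_dir / f"{digest[:12]}_{name}").write_bytes(data)
    manifest[name] = digest
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return "new" if previous is None else "changed"


# Demo with in-memory bytes standing in for downloaded zip files.
with tempfile.TemporaryDirectory() as demo_dir:
    print(archive("eia860.zip", b"first release", demo_dir))
    print(archive("eia860.zip", b"first release", demo_dir))
    print(archive("eia860.zip", b"silently edited", demo_dir))
```

Keeping every version on disk, rather than only the latest one, means that a
changed upstream file can be compared against the copy that was archived
earlier.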
Code for data scraping and processing is meant to be clean, so that updating
the catalogue is easy as new data are released each year. Keeping the code
clean and generally useful should also make it possible to recruit outside
collaborators to help maintain it.
FILE LAYOUT
pip_requirements.txt is a working list of requirements. It needs to get moved
to a setup.py file.
The scraping code is currently in scrape.py which may later get re-organized as
a package. Functions for downloading files in an archive-safe manner and
unzipping files are in utils.py. Functions to interact with the Postgresql
database are in database_interface.py. All these should get migrated into a
package that lives in a subdirectory.
The codes located in other_dat/* were manually extracted from the latest
"Layout" Excel workbook from the EIA860 form. Extracting and saving them
should be automated, and they should live in the directory with other
auto-extracted files - either downloads or a new directory for intermediate
outputs.
The average heat rates located in other_dat/* were manually extracted from the
EIA website.
The resulting datasets listed below contain data suitable for general use in
power system analysis and modeling:
* generation_projects_YYYY.tab:
Unit-level characteristics sourced from the EIA 860 form. Turbines belonging
to the same combined cycle are lumped together. Units are aggregated by plant,
technology, energy source and vintage. This aggregation mostly affects plants
with several identical units, such as a motor facility.
- Technology, location, capacity, vintage, energy source, and other key data
- These outputs are not meant to be used for unit commitment modeling
- All existing plants and all those under construction are processed. Plants
in planning stages are only included if they have initiated their regulatory
approval process.
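The aggregation step described above can be sketched as a group-and-sum over
unit records. This is a minimal illustration on made-up data, not the code in
scrape.py: the field names are hypothetical and do not match EIA-860 column
names, and the lumping of combined cycle turbines is not shown.

```python
from collections import defaultdict

# Made-up unit records. Field names are illustrative and are not the
# actual EIA-860 column names.
UNITS = [
    {"plant": 101, "technology": "Internal Combustion Engine",
     "energy_source": "NG", "vintage": 2005, "capacity_mw": 2.0},
    {"plant": 101, "technology": "Internal Combustion Engine",
     "energy_source": "NG", "vintage": 2005, "capacity_mw": 2.0},
    {"plant": 101, "technology": "Internal Combustion Engine",
     "energy_source": "NG", "vintage": 2012, "capacity_mw": 3.0},
    {"plant": 202, "technology": "Onshore Wind Turbine",
     "energy_source": "WND", "vintage": 2010, "capacity_mw": 1.5},
]


def aggregate_units(units):
    """Sum capacity over units that share plant, technology, energy
    source and vintage."""
    totals = defaultdict(float)
    for unit in units:
        key = (unit["plant"], unit["technology"],
               unit["energy_source"], unit["vintage"])
        totals[key] += unit["capacity_mw"]
    return dict(totals)


for key, capacity_mw in sorted(aggregate_units(UNITS).items()):
    print(key, capacity_mw)
```

In this example the two identical 2005 engines at plant 101 collapse into one
4 MW project, while the 2012 engine stays separate because its vintage
differs.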
* historic_heat_rates_(NARROW/WIDE).tab:
Monthly generation data for thermal projects sourced from the EIA 923 form
and crossed with generation project data from the EIA 860 form. The EIA 923
form reports data on a plant-level basis, so generation data are also
calculated on that basis. No distinction is made between coal types, but
generation data are reported for every fuel if a plant uses multiple energy
sources. The following data are provided for each plant and fuel:
- Monthly net electricity production
- Monthly capacity factor
- Monthly heat rate
- Fraction of electricity produced by each of the plant's fuels
- Second-best monthly heat rate calculated, singled out from the rest
Plants missing from either the EIA860 or the EIA923 forms are printed out to
the file incomplete_data_thermal_YYYY.csv.
Plants that use a secondary fuel to generate more than 5% of their electricity
are also printed to multi_fuel_heat_rates.tab.
Plants with consistently negative heat rates are printed out to
negative_heat_rate_outputs.tab and are removed from the historic dataset.
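The monthly quantities listed above can be illustrated with their usual
definitions. The exact formulas used by the scripts are not stated here, so
the sketch below is an assumption: capacity factor as net generation divided
by the generation possible at full capacity during the month, and heat rate as
fuel consumed per unit of net generation. Function and argument names are
hypothetical.

```python
import calendar


def capacity_factor(net_generation_mwh, capacity_mw, year, month):
    """Net generation as a fraction of full-capacity output for the month."""
    hours_in_month = calendar.monthrange(year, month)[1] * 24
    return net_generation_mwh / (capacity_mw * hours_in_month)


def heat_rate(fuel_consumption_mmbtu, net_generation_mwh):
    """Fuel burned per unit of net generation, in MMBtu/MWh.

    Returns None when there was no net generation. A negative value
    appears when the plant consumed more electricity than it produced.
    """
    if net_generation_mwh == 0:
        return None
    return fuel_consumption_mmbtu / net_generation_mwh


def fuel_fractions(generation_by_fuel_mwh):
    """Share of a plant's electricity produced by each of its fuels."""
    total = sum(generation_by_fuel_mwh.values())
    return {fuel: mwh / total for fuel, mwh in generation_by_fuel_mwh.items()}


def significant_secondary_fuels(fractions, threshold=0.05):
    """Fuels other than the main one whose share exceeds `threshold`."""
    primary = max(fractions, key=fractions.get)
    return sorted(fuel for fuel, share in fractions.items()
                  if fuel != primary and share > threshold)


# Made-up plant: 100 MW, January 2015, burning gas and some fuel oil.
shares = fuel_fractions({"NG": 750.0, "DFO": 250.0})
print(capacity_factor(37200.0, 100.0, 2015, 1))
print(heat_rate(7000.0, 1000.0))
print(shares, significant_secondary_fuels(shares))
```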
* historic_hydro_capacity_factors_(NARROW/WIDE).tab:
Monthly generation data for hydro projects sourced from the EIA 923 form
and crossed with generation project data from the EIA 860 form. The following
data is provided for each plant:
- Monthly net electricity production
- Monthly electricity consumption (relevant for pumped hydro plants)
- Monthly capacity factor (calculated on the basis of electricity generated)
Plants missing from either the EIA860 or the EIA923 forms are printed out to
the file incomplete_data_hydro_YYYY.csv.
Quality control:
- Mismatches between the plants present in the EIA-860 and EIA-923 forms are
registered in csv files, and a summary of the incomplete information is
printed to the console
- Historical hydro capacity factors and heat rates are printed alongside
other relevant data, so QA/QC can be done by visual inspection as well
- To Do: Flag outliers for manual review
- To Do: Maybe use Jupyter Notebooks for manual filtering
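The mismatch check in the first quality control item amounts to comparing two
sets of plant identifiers. A minimal sketch, assuming each form has already
been reduced to a list of plant ids (the function name and the dictionary keys
are hypothetical):

```python
def find_mismatches(plant_ids_860, plant_ids_923):
    """Return the plant ids that appear in only one of the two forms."""
    ids_860 = set(plant_ids_860)
    ids_923 = set(plant_ids_923)
    return {
        "only_in_860": sorted(ids_860 - ids_923),
        "only_in_923": sorted(ids_923 - ids_860),
    }


# Made-up plant ids for illustration.
mismatches = find_mismatches([1, 2, 3, 4], [3, 4, 5])
for form, plant_ids in mismatches.items():
    print(form, len(plant_ids), plant_ids)
```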
* existing_generation_projects_YYYY.tab, new_generation_projects_YYYY.tab,
uprates_to_generation_projects_YYYY.tab:
These datasets result from crossing project data with heat rate data, as well
as filtering by NERC region.
- Heat rates better than the best historical records found online are
discarded and replaced with an average heat rate per technology, since it
is assumed that reporting errors occurred.
- The top and bottom 0.5% of heat rates are also discarded, since they contain
unrealistic values. These heat rates get replaced by the heat rate at the
99.5th and 0.5th percentile, respectively.
- Plants without heat rate data (such as plants under construction or with
missing information in the EIA923 form) are assigned the average heat rate
of plants with the same technology, energy source and vintage, considering
a 4-year window.
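Replacing the extreme heat rates with the values at the 0.5th and 99.5th
percentiles is a form of clipping (often called winsorizing). The sketch below
shows the idea with a linearly interpolated percentile. It is an illustration
only: the function names are hypothetical, and the interpolation method is an
assumption, since the method used by the scripts is not stated here.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile of a sorted list, q in [0, 100]."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower])


def clip_heat_rates(heat_rates, tail_percent=0.5):
    """Replace values in each tail with the value at the tail's boundary."""
    ordered = sorted(heat_rates)
    low = percentile(ordered, tail_percent)
    high = percentile(ordered, 100.0 - tail_percent)
    return [min(max(value, low), high) for value in heat_rates]


# Made-up heat rates: 0.0, 1.0, ..., 200.0 (MMBtu/MWh).
sample = [float(i) for i in range(201)]
clipped = clip_heat_rates(sample)
print(min(clipped), max(clipped))
```

Values between the two percentiles are left untouched, so the clipping only
affects the tails of the distribution.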
* heat_rate_distributions.pdf:
Histograms showing the distribution of heat rate values per technology and
energy source.
RESOURCES
Please keep updating & expanding this list as you explore available data.
EIA Electricity Data
https://www.eia.gov/electricity/data.cfm
EIA-860 - Catalog of existing & planned generation
https://www.eia.gov/electricity/data/eia860/
Generator-level information about existing and planned generators and
associated environmental equipment at electric power plants with 1
megawatt or greater of combined nameplate capacity.
Static data (zip files) and documentation
EIA-923 - Input and output of existing generators
https://www.eia.gov/electricity/data/eia923/
Detailed electric power data -- monthly and annually -- on electricity
generation, fuel consumption, fossil fuel stocks, and receipts at the
power plant and prime mover level.
Static data (zip files) and documentation
EIA OPEN DATA API
https://www.eia.gov/opendata/?category=0
This could be used for everything (including EIA-860 datasets).
We need to assess whether this is easier or harder to use than static datasets
from zip files.
* API Query Browser
https://www.eia.gov/opendata/qb.php
* The bulk download facility may be better for archiving
https://www.eia.gov/opendata/bulkfiles.php
* If we use their URL-based API, we should still use the functions in utils.py
to cache download results in an archive-safe manner.
EIA Electricity Data Browser
https://www.eia.gov/electricity/data/browser/
* Interactive graphical website for browsing their entire data portal.
* Could potentially be used to help construct API queries, but another one of
their tools may be more useful for that.
EIA Average Tested Heat Rates by Prime Mover and Energy Source, 2007 - 2015
https://www.eia.gov/electricity/annual/html/epa_08_02.html
Average heat rates for benchmarking.