Rework event selection #74

Merged
merged 68 commits into from
Mar 27, 2024
68 commits
909171e
first implementation of Aachen DL selection
mafrahm Feb 16, 2024
e89a005
fix loose muon definition
mafrahm Feb 19, 2024
844a946
switch to fakeable lepton as default lepton
mafrahm Feb 19, 2024
2c55d46
fixes in object definitions
mafrahm Feb 19, 2024
d28d66e
use fakeable leptons for isolation
mafrahm Feb 22, 2024
edd1c04
generalize btag usage in selection
mafrahm Feb 22, 2024
57fc6bd
use 2018 btag WPs in 2022
mafrahm Feb 22, 2024
6697c74
temporary fix for merged signal process
mafrahm Feb 22, 2024
e9e1243
add synchronization dataset
mafrahm Feb 22, 2024
04ca845
minor fixes
mafrahm Feb 22, 2024
6f9ab20
add remastered sl selection
mafrahm Feb 22, 2024
6ec30c4
update cf
mafrahm Feb 22, 2024
b2fcc2e
use cf pileup producer
mafrahm Feb 22, 2024
f238126
minor cleanup
mafrahm Feb 22, 2024
16f0851
add short process labels
mafrahm Feb 22, 2024
cb5b502
cleanup in dl selector steps
mafrahm Feb 22, 2024
92530b5
minor fix in cf pileup producer
mafrahm Feb 22, 2024
525bb51
change name of new dl/sl selectors
mafrahm Feb 22, 2024
be1b57d
add 2017 dl triggers to dl1
mafrahm Feb 22, 2024
bf5cb36
remove cutflow_features producer from sl1 and dl1
mafrahm Feb 22, 2024
11b3e0a
switch default selectors to dl1 or sl1
mafrahm Feb 22, 2024
a24b203
move common selector functions to different files
mafrahm Feb 22, 2024
237a2c5
minor changes
mafrahm Feb 22, 2024
c5dd041
add top pt reweighting module
mafrahm Feb 22, 2024
852239a
fix lepton selection for run 2
mafrahm Feb 22, 2024
16d6ee5
implement v boson pt reweighting
mafrahm Feb 22, 2024
9b889d9
fix typo in electron column name
mafrahm Feb 23, 2024
70a4f11
cleanup in top pt reweighting producer
mafrahm Feb 23, 2024
222a0cc
make muon object masks to regular array (remove potential nones)
mafrahm Feb 23, 2024
07dab4f
remove deepTagMD variable from producers
mafrahm Feb 23, 2024
7ca2bdd
enable muon and btag weight for 2022
mafrahm Feb 23, 2024
cada0d7
add some SF configuration for run 3
mafrahm Feb 26, 2024
b4fa107
switch to lepton pt requirement instead of cone pt
mafrahm Feb 26, 2024
43d2de9
update top pt reweighting
mafrahm Feb 27, 2024
ccbb770
set versioning in hbwtasks via law config
mafrahm Feb 27, 2024
6f6e82f
update cmsdb
mafrahm Feb 27, 2024
1dca9a2
add parameter to add dataset extensions
mafrahm Feb 27, 2024
f1174ab
bugfix in FatJet selection
mafrahm Feb 28, 2024
599343a
remove default pilot for ReduceEvents
mafrahm Feb 28, 2024
23fb2f6
add ABCD categories; NOTE: category id schema changed!
mafrahm Feb 28, 2024
7e0c391
simplify lepton definition
mafrahm Feb 29, 2024
c9f67c1
update cmsdb
mafrahm Feb 29, 2024
278ce96
consider qcd when running merged analysis
mafrahm Feb 29, 2024
1633852
add lt variable and angles between W and lepton
mafrahm Feb 29, 2024
da11d8f
bugfix for events with nan btagDeepFlavB scores
mafrahm Feb 29, 2024
abfb0b4
cleanup in ml_inputs
mafrahm Feb 29, 2024
34330ce
remove dataset attribute from MLClassifierBase
mafrahm Feb 29, 2024
67680b3
add all mli inputs to uses to allow reusing PrepareMLEvents
mafrahm Feb 29, 2024
03f0870
streamline setting default calibrators, producers
mafrahm Feb 29, 2024
b2f36dc
cleanup in sl ml models
mafrahm Feb 29, 2024
05cbb43
reduce set of created categories for now
mafrahm Feb 29, 2024
4af51df
separate category creation
mafrahm Feb 29, 2024
095c26a
remove category ids and event weights from reconstruction producers
mafrahm Feb 29, 2024
8a8cc9e
lint
mafrahm Feb 29, 2024
e949d9f
minor fixes and config
mafrahm Feb 29, 2024
3083365
run merging tasks with htcondor
mafrahm Feb 29, 2024
42bb15b
fix zero padding
mafrahm Feb 29, 2024
822bd9c
pass correct workflow to cf.PrepareMLEvents
mafrahm Feb 29, 2024
c58714c
cleanup
mafrahm Feb 29, 2024
55b6866
cleanup in dl ml models
mafrahm Feb 29, 2024
bf1e2db
add correct jec_era for 2022preEE data
mafrahm Feb 29, 2024
795ec6e
minor config changes
mafrahm Mar 7, 2024
18f20a8
update inference models
mafrahm Mar 7, 2024
de658f2
add helper to trace back function call
mafrahm Mar 7, 2024
8aa6d9c
update cf
mafrahm Mar 15, 2024
f48d843
add event mask to PrepareMLEvents producer
mafrahm Mar 15, 2024
c8d77e3
add missing ml input variable
mafrahm Mar 15, 2024
84e2309
remove missing variable
mafrahm Mar 15, 2024
25 changes: 17 additions & 8 deletions hbw/analysis/create_analysis.py
@@ -72,46 +72,55 @@ def create_hbw_analysis(
campaign_run3_2022_postEE_nano_v12 = cmsdb.campaigns.run3_2022_postEE_nano_v12.campaign_run3_2022_postEE_nano_v12
campaign_run3_2022_postEE_nano_v12.x.EE = "post"

# default configs
# 2017
c17 = add_config( # noqa
analysis_inst,
campaign_run2_2017_nano_v9.copy(),
config_name="c17",
config_id=17,
config_id=1700,
add_dataset_extensions=False,
)
# configs with limited number of files
l17 = add_config( # noqa
analysis_inst,
campaign_run2_2017_nano_v9.copy(),
config_name="l17",
config_id=117,
config_id=1701,
limit_dataset_files=2,
add_dataset_extensions=False,
)

# 2022 preEE
c22pre = add_config( # noqa
analysis_inst,
campaign_run3_2022_preEE_nano_v12.copy(),
config_name="c22pre",
config_id=2201,
config_id=2200,
add_dataset_extensions=False,
)
l22pre = add_config( # noqa
analysis_inst,
campaign_run3_2022_preEE_nano_v12.copy(),
config_name="l22pre",
config_id=12201,
config_id=2201,
limit_dataset_files=2,
add_dataset_extensions=False,
)

# 2022 postEE
c22post = add_config( # noqa
analysis_inst,
campaign_run3_2022_postEE_nano_v12.copy(),
config_name="c22post",
config_id=2202,
config_id=2210,
add_dataset_extensions=False,
)
l22post = add_config( # noqa
analysis_inst,
campaign_run3_2022_postEE_nano_v12.copy(),
config_name="l22post",
config_id=12202,
config_id=2211,
limit_dataset_files=2,
add_dataset_extensions=False,
)

return analysis_inst
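
The new config IDs appear to follow a positional scheme: the two-digit year, then an era digit (0 for 2017 and 2022 preEE, 1 for 2022 postEE), then a final digit that is 1 for the file-limited `l*` configs. The PR hard-codes these values. The helper below is only a sketch of that inferred pattern; `make_config_id` and its parameters are hypothetical names that do not exist in the repository.

```python
def make_config_id(year: int, era_index: int = 0, limited: bool = False) -> int:
    """Hypothetical helper: <2-digit year><era digit><limited flag>."""
    if not 0 <= era_index <= 9:
        raise ValueError("era_index must fit into a single digit")
    return (year % 100) * 100 + era_index * 10 + int(limited)


# IDs as they are hard-coded in the diff above
expected_ids = {
    "c17": 1700,
    "l17": 1701,
    "c22pre": 2200,
    "l22pre": 2201,
    "c22post": 2210,
    "l22post": 2211,
}

# the same IDs, recomputed from the assumed scheme
computed_ids = {
    "c17": make_config_id(2017),
    "l17": make_config_id(2017, limited=True),
    "c22pre": make_config_id(2022, era_index=0),
    "l22pre": make_config_id(2022, era_index=0, limited=True),
    "c22post": make_config_id(2022, era_index=1),
    "l22post": make_config_id(2022, era_index=1, limited=True),
}
```

Under this reading, the old IDs (17, 117, 2201, 12201, ...) were replaced so that a full config and its limited counterpart always sit next to each other.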
184 changes: 135 additions & 49 deletions hbw/config/categories.py
@@ -23,6 +23,8 @@

import law

from time import time

from columnflow.config_util import create_category_combinations
from columnflow.ml import MLModel
from hbw.util import call_once_on_config
@@ -72,62 +74,126 @@ def add_gen_categories(config: od.Config) -> None:


@call_once_on_config()
def add_categories_selection(config: od.Config) -> None:
"""
Adds categories to a *config*, that are typically produced in `SelectEvents`.
"""
def add_abcd_categories(config: od.Config) -> None:
config.add_category(
name="sr",
id=1,
selection="catid_sr",
)
config.add_category(
name="fake",
id=2,
selection="catid_fake",
)
config.add_category(
name="highmet",
id=3,
selection="catid_highmet",
label=r"MET \geq 20",
)
config.add_category(
name="lowmet",
id=6,
selection="catid_lowmet",
label=r"MET < 20",
)

# adds categories based on the existence of gen particles
add_gen_categories(config)

@call_once_on_config()
def add_lepton_categories(config: od.Config) -> None:
config.x.lepton_channels = {
"sl": ("1e", "1mu"),
"dl": ("2e", "2mu", "emu"),
}[config.x.lepton_tag]

config.add_category(
name="incl",
id=1,
id=0,
selection="catid_selection_incl",
label="Inclusive",
)

cat_1e = config.add_category( # noqa
name="1e",
id=1000,
id=10,
selection="catid_selection_1e",
label="1 Electron",
)

cat_1mu = config.add_category( # noqa
name="1mu",
id=2000,
id=20,
selection="catid_selection_1mu",
label="1 Muon",
)
# dl categories
cat_2e = config.add_category( # noqa
name="2e",
id=3000,
id=30,
selection="catid_selection_2e",
label="2 Electron",
)

cat_2mu = config.add_category( # noqa
name="2mu",
id=4000,
id=40,
selection="catid_selection_2mu",
label="2 Muon",
)

cat_emu = config.add_category( # noqa
name="emu",
id=5000,
id=50,
selection="catid_selection_emu",
label="1 Electron 1 Muon",
)


@call_once_on_config()
def add_jet_categories(config: od.Config) -> None:
cat_resolved = config.add_category( # noqa
name="resolved",
id=100,
selection="catid_resolved",
label="resolved",
)
cat_boosted = config.add_category( # noqa
name="boosted",
id=200,
selection="catid_boosted",
label="boosted",
)

cat_1b = config.add_category( # noqa
name="1b",
id=300,
selection="catid_1b",
label="1b",
)
cat_2b = config.add_category( # noqa
name="2b",
id=600,
selection="catid_2b",
label="2b",
)


@call_once_on_config()
def add_categories_selection(config: od.Config) -> None:
"""
Adds categories to a *config*, that are typically produced in `SelectEvents`.
"""

# adds categories based on the existence of gen particles
add_gen_categories(config)

# adds categories for ABCD background estimation
add_abcd_categories(config)

# adds categories based on number of leptons
add_lepton_categories(config)


def name_fn(root_cats):
cat_name = "__".join(cat.name for cat in root_cats.values())
return cat_name
@@ -149,6 +215,10 @@ def add_categories_production(config: od.Config) -> None:
"""
Adds categories to a *config*, that are typically produced in `ProduceColumns`.
"""
if config.has_tag("add_categories_ml_called"):
logger.warning("We should not call *add_categories_production* when also building ML categories")
# when ML categories already exist, don't do anything
return
#
# switch existing categories to different production module
#
@@ -168,81 +238,99 @@ def add_categories_production(config: od.Config) -> None:
cat_emu = config.get_category("emu")
cat_emu.selection = "catid_emu"

#
# define additional 'main' categories
#

cat_resolved = config.add_category(
name="resolved",
id=10,
selection="catid_resolved",
label="resolved",
)
cat_boosted = config.add_category(
name="boosted",
id=20,
selection="catid_boosted",
label="boosted",
)

cat_1b = config.add_category(
name="1b",
id=100,
selection="catid_1b",
label="1b",
)
cat_2b = config.add_category(
name="2b",
id=200,
selection="catid_2b",
label="2b",
)
add_jet_categories(config)

#
# define all combinations of categories
#

category_blocks = OrderedDict({
"lepid": [config.get_category("sr"), config.get_category("fake")],
# "met": [config.get_category("highmet"), config.get_category("lowmet")],
"lep": [config.get_category(lep_ch) for lep_ch in config.x.lepton_channels],
"jet": [cat_resolved, cat_boosted],
"b": [cat_1b, cat_2b],
"jet": [config.get_category("resolved"), config.get_category("boosted")],
"b": [config.get_category("1b"), config.get_category("2b")],
})

t0 = time()
n_cats = create_category_combinations(
config,
category_blocks,
name_fn=name_fn,
kwargs_fn=kwargs_fn,
skip_existing=False, # there should be no existing sub-categories
)
logger.info(f"Number of produced category insts: {n_cats}")
logger.info(f"Number of produced category insts: {n_cats} (took {(time() - t0):.3f}s)")
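
The commit note "category id schema changed!" refers to the root IDs used in `category_blocks`. A minimal, self-contained sketch of why the new values work is given below. It assumes that a combined category's ID is the sum of its root IDs (the `kwargs_fn` that decides this is not part of the diff) and that combinations are built for every subset of at least two blocks. The `combine` function is a stand-in written for illustration, not columnflow's `create_category_combinations`, and the `sl` lepton channels are used as the example.

```python
from collections import OrderedDict
from itertools import combinations, product

# root category IDs as defined in this PR (lepton block for the "sl" tag)
category_blocks = OrderedDict({
    "lepid": {"sr": 1, "fake": 2},
    "lep": {"1e": 10, "1mu": 20},
    "jet": {"resolved": 100, "boosted": 200},
    "b": {"1b": 300, "2b": 600},
})


def combine(blocks):
    """Illustrative stand-in: one combined category per pick of 2+ blocks."""
    combined = {}
    block_names = list(blocks)
    for n in range(2, len(block_names) + 1):
        for group in combinations(block_names, n):
            for picks in product(*(blocks[g].items() for g in group)):
                # same naming as name_fn in the diff: root names joined by "__"
                name = "__".join(cat_name for cat_name, _ in picks)
                # ASSUMPTION: the combined ID is the sum of the root IDs
                combined[name] = sum(cat_id for _, cat_id in picks)
    return combined


combined = combine(category_blocks)
root_ids = [i for block in category_blocks.values() for i in block.values()]
all_ids = root_ids + list(combined.values())

print(len(combined))                     # number of combined categories
print(combined["sr__1e__resolved__1b"])  # 1 + 10 + 100 + 300
print(len(set(all_ids)) == len(all_ids)) # True if no two categories share an ID
```

The `jet` and `b` blocks share the hundreds digit, which is why `1b` and `2b` use 300 and 600 rather than 300 and 400: every sum of one jet ID and one b ID (400, 500, 700, 800) stays distinct from the individual IDs. The ABCD block uses 1, 2, 3, 6 in the units digit for the same reason.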


@call_once_on_config()
def add_categories_ml(config, ml_model_inst):
if config.has_tag("add_categories_production_called"):
raise Exception("We should not call *add_categories_production* when also building ML categories")
#
# prepare non-ml categories
#

cat_1e = config.get_category("1e")
cat_1e.selection = "catid_1e"

cat_1mu = config.get_category("1mu")
cat_1mu.selection = "catid_1mu"

cat_2e = config.get_category("2e")
cat_2e.selection = "catid_2e"

cat_2mu = config.get_category("2mu")
cat_2mu.selection = "catid_2mu"

cat_emu = config.get_category("emu")
cat_emu.selection = "catid_emu"

add_jet_categories(config)

#
# add parent ml model categories
#

# if not already done, get the ml_model instance
if isinstance(ml_model_inst, str):
ml_model_inst = MLModel.get_cls(ml_model_inst)(config)

# add ml categories directly to the config
# NOTE: this is a bit dangerous, because our ID depends on the MLModel, but
# we can reconfigure our MLModel after having created these categories
ml_categories = []
for i, proc in enumerate(ml_model_inst.processes):
ml_categories.append(config.add_category(
# NOTE: name and ID are unique as long as we don't use
# multiple ml_models simultaneously
name=f"ml_{proc}",
id=(i + 1) * 10000,
id=(i + 1) * 1000,
selection=f"catid_ml_{proc}",
label=f"ml_{proc}",
))

#
# create combination of categories
#

# NOTE: building this many categories takes forever: has to be improved...
category_blocks = OrderedDict({
"lepid": [config.get_category("sr"), config.get_category("fake")],
# "met": [config.get_category("highmet"), config.get_category("lowmet")],
"lep": [config.get_category(lep_ch) for lep_ch in config.x.lepton_channels],
"jet": [config.get_category("resolved"), config.get_category("boosted")],
"b": [config.get_category("1b"), config.get_category("2b")],
"dnn": ml_categories,
})

# # NOTE: temporary solution: only build DNN leafs
# combined_categories = [cat for cat in config.get_leaf_categories() if len(cat.parent_categories) != 0]
# category_blocks = OrderedDict({
# "leafs": combined_categories,
# "dnn": ml_categories,
# })

t0 = time()
# create combination of categories
n_cats = create_category_combinations(
config,
@@ -251,6 +339,4 @@ def add_categories_ml(config, ml_model_inst):
kwargs_fn=kwargs_fn,
skip_existing=True,
)
logger.info(f"Number of produced ml category insts: {n_cats}")

# TODO unfinished
logger.info(f"Number of produced ml category insts: {n_cats} (took {(time() - t0):.3f}s)")
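
The guards on `add_categories_ml_called` and `add_categories_production_called` suggest that `call_once_on_config` records a tag named `<function name>_called` on the config. The implementation in `hbw.util` is not shown in this diff, so the following is only a sketch of that assumed behaviour. `FakeConfig` and `call_once_on_config_sketch` are hypothetical names, and whether the real decorator sets the tag before or after running the function is unknown.

```python
import functools


class FakeConfig:
    """Hypothetical stand-in for od.Config, supporting only tags."""

    def __init__(self):
        self.tags = set()

    def has_tag(self, tag):
        return tag in self.tags

    def add_tag(self, tag):
        self.tags.add(tag)


def call_once_on_config_sketch():
    """ASSUMED behaviour: run the function once per config, then tag the config."""
    def outer(func):
        tag = f"{func.__name__}_called"

        @functools.wraps(func)
        def inner(config, *args, **kwargs):
            if config.has_tag(tag):
                return None  # already called on this config, skip
            result = func(config, *args, **kwargs)
            config.add_tag(tag)
            return result
        return inner
    return outer


calls = []


@call_once_on_config_sketch()
def add_categories_ml(config):
    calls.append("ml")


@call_once_on_config_sketch()
def add_categories_production(config):
    # mirrors the guard added in this PR: skip when ML categories already exist
    if config.has_tag("add_categories_ml_called"):
        return
    calls.append("production")


config = FakeConfig()
add_categories_ml(config)
add_categories_ml(config)          # second call is skipped by the decorator
add_categories_production(config)  # skipped by the explicit guard
```

With this behaviour, the two category builders are mutually exclusive on a given config: whichever runs first blocks the other, by a warning and early return in one direction and by an exception in the other.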