In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search
This repository contains the code for our EMNLP 2024 paper, "In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search".
- Overview
- Data
- Searching
- Model evaluation using LINT
- Generating statement likelihood distribution plot over
This is the first work to tackle the problem of systematically generating evaluation data in the long-tail distribution for large language models. We propose Logic-Induced-Knowledge-Search (LINK🔗), a framework for systematically generating long-tail knowledge statements. Grounded in a symbolic logic rule, LINK searches for long-tail values for each variable of the rule in three steps: it first prompts a large language model for candidate values, then verifies the correctness of those values with a critic, and finally pushes them toward the long-tail distribution with a reranker.
Using this framework, we construct a dataset, Logic-Induced-Long-Tail (LINT), consisting of 200 symbolic rules and 50K knowledge statements spanning four domains.
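The generate, verify, and rerank loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: `search_values`, `propose`, `critic`, and `long_tail_score` are hypothetical names, and the stub proposer, critic, and rarity scores in the demo are made up. The default numbers only echo the values used in the example search script in this README.

```python
from typing import Callable, Iterable, List, Tuple


def search_values(
    propose: Callable[[int], Iterable[str]],    # stands in for prompting an LLM
    critic: Callable[[str], float],             # confidence that a value is correct
    long_tail_score: Callable[[str], float],    # higher = more long-tail (assumption)
    n_sample: int = 50,
    beam_size: int = 200,
    threshold: float = 0.85,
    max_calls: int = 10,
) -> List[str]:
    """Collect up to `beam_size` verified values, most long-tail first."""
    seen = set()
    kept: List[Tuple[float, str]] = []
    for _ in range(max_calls):
        for value in propose(n_sample):
            if value in seen:                   # deduplicate across calls
                continue
            seen.add(value)
            if critic(value) >= threshold:      # keep only values the critic accepts
                kept.append((long_tail_score(value), value))
        if len(kept) >= beam_size:
            break
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [value for _, value in kept[:beam_size]]


# Demo with hand-written stubs (all values and scores are illustrative).
pool = iter(["peanut", "peanut", "lupin", "stone", "mustard seed"])
rarity = {"peanut": 0.1, "lupin": 0.9, "mustard seed": 0.5}

result = search_values(
    propose=lambda n: [v for _, v in zip(range(n), pool)],
    critic=lambda v: 0.0 if v == "stone" else 0.9,
    long_tail_score=rarity.get,
    n_sample=2,
    beam_size=3,
    max_calls=5,
)
print(result)
```

Passing the three components as callables keeps the loop independent of any particular model, which is convenient for testing the control flow with stubs like the ones above.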
contains the definition of all the rules we use for LINK.
Here is an example rule definition.
"rule2": {
"premise_map": {
"is_allergic_to(P, A)": "[P] is allergic to [A]",
"is_ingredient_in(Z, B)": "[Z] is a ingredient in [B]",
"is_one_type_of(Z, A)": "[Z] is one type of [A]"
"conclusion_map": {
"is_not_able_to_eat(P, B)": "[P] is not able to eat [B]"
"variables": {
"P": [
"P", # variable name
"Person", # data type
true, # whether it is a generic node
null # node type(null or "factual")
"A": [
"Food Allergen",
"B": [
"Name of a dish or food",
"Z": [
"domain": "food and physical conditions"
Our dataset LINT is publicly released here.
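To make the rule format shown above concrete, here is a small sketch of how the `[P]`, `[A]`, `[Z]`, `[B]` placeholders in `premise_map` and `conclusion_map` could be filled with values. The helper names `fill` and `render`, the way premises and conclusion are returned, and the example bindings are assumptions for illustration; the repository may assemble statements differently.

```python
import re

# Templates copied from the example rule above.
rule = {
    "premise_map": {
        "is_allergic_to(P, A)": "[P] is allergic to [A]",
        "is_ingredient_in(Z, B)": "[Z] is a ingredient in [B]",
        "is_one_type_of(Z, A)": "[Z] is one type of [A]",
    },
    "conclusion_map": {
        "is_not_able_to_eat(P, B)": "[P] is not able to eat [B]",
    },
}


def fill(template, bindings):
    """Replace every [X] placeholder with the value bound to variable X."""
    return re.sub(r"\[([A-Z])\]", lambda m: bindings[m.group(1)], template)


def render(rule, bindings):
    """Return (list of premise sentences, conclusion sentence)."""
    premises = [fill(t, bindings) for t in rule["premise_map"].values()]
    (conclusion_template,) = rule["conclusion_map"].values()
    return premises, fill(conclusion_template, bindings)


# Hypothetical variable bindings, chosen only for illustration.
bindings = {"P": "Person X", "A": "tree nuts", "Z": "cashew", "B": "korma"}
premises, conclusion = render(rule, bindings)
print(premises)
print(conclusion)
```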
contains an example script for knowledge beam search with LINK.
rule_keys="rule0 rule1 rule2 rule3 rule4 rule5 rule6 rule7 rule8 rule9"
rule_path="../data/rules.json" # the path for rule definition
python LINK/ \
--do_search \
--knowledge_n_sample 50 \ # the number of generated values of each call
--beam_size 200 \
--deduplicate \ # whether to deduplicate across different calls
--rule_keys $rule_keys \
--output_directory $output_directory \
--rule_path $rule_path \
--traverse_order premise \ # start searching from the premise predicate that contains the generic node; the search automatically switches to starting from the conclusion if the rule contains a factual node
--get_verifier_samples \ # whether to return sample details of the critic model
--factual_verifier_threshold 0.85 \ # the start threshold of the factual critic model
--datatype_verifier_threshold 0.85 \ # the start threshold of the data type critic model
--accumulate_verifier_confidence \ # whether to filter values based on the accumulated confidence of all values in the beam
--dynamic_verifier_threshold \ # whether to use dynamic critic threshold
--dynamic_ranker # whether to use the dynamic ranker (take the top 75% of values)
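The comment on `--dynamic_ranker` mentions keeping the top 75% of values. A minimal sketch of such a cut is shown below. It assumes, without confirmation from the source, that each value carries a likelihood score and that lower likelihood counts as more long-tail; `keep_top_fraction` and the scores are hypothetical, and the "dynamic" part of the flag is not modelled.

```python
import math


def keep_top_fraction(scored_values, fraction=0.75):
    """Keep the `fraction` of values with the lowest likelihood score.

    scored_values: list of (value, likelihood) pairs.
    """
    ranked = sorted(scored_values, key=lambda pair: pair[1])  # least likely first
    n_keep = math.ceil(len(ranked) * fraction)
    return [value for value, _ in ranked[:n_keep]]


# Illustrative log-likelihood scores, not real model output.
scored = [("peanut", -3.2), ("lupin", -9.1), ("sesame", -5.0), ("celery", -6.5)]
kept = keep_top_fraction(scored)
print(kept)
```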
contains an example script to generate knowledge statements with LLMs.
rule_keys="rule0 rule1 rule2 rule3 rule4 rule5 rule6 rule7 rule8 rule9"
rule_path="../data/rules.json" # the path for rule definition
python LINK/ \
--beam_size 200 \
--search_n_sample 50 \ # the number of generated values of each call
--do_search \
--deduplicate \ # whether to deduplicate across different calls
--full \ # whether to use the full rule during generation
--output_directory $output_directory \
--rule_keys $rule_keys \
--knowledge_model_path $model \
--meta_rule_info $rule_path
contains an example script for entailment classification task on LINT.
rule_keys="rule0 rule1 rule2 rule3 rule4 rule5 rule6 rule7 rule8 rule9"
positive_conclusion_rules="rule26 rule27 rule28 rule29" # rules with a positive conclusion, e.g. Person X can do something
output_directory="../output/probing_set" # the path to probing set
rule_path="../data/rules.json" # the path for rule definition
# with COT
python LINK/ \
--output_directory $output_directory \
--rule_keys $rule_keys \
--method_name $method_name \
--rule_path $rule_path \
--do_probe \
--collect_rationale \
--traverse_order premise \
--probe_model_path $probe_model \
--positive_conclusion_rules $positive_conclusion_rules \
--cot --cot_icl
# without COT
python LINK/ \
--output_directory $output_directory \
--rule_keys $rule_keys \
--method_name $method_name \
--rule_path $rule_path \
--do_probe \
--collect_rationale \
--traverse_order premise \
--probe_model_path $probe_model \
--positive_conclusion_rules $positive_conclusion_rules
We provide a script for reproducing the distribution plots of the statements generated by LINK, ChatGPT, and GPT4.
- Run the script to get the likelihood of each knowledge statement in the dataset
contains an example script to get the likelihood of each knowledge statement in the dataset and preprocess it for further plots.
model_name=("gpt") # select from gpt(text-davinci-003), llama(llama-7b), ft(fasttext)
order_name=("conclusion_first" "premise_first")
rule_path="../data/rules.json" # path to rule definition
# conclusion first rules
# Rules that are searched starting from the predicate in the conclusion
rule_index="0 1 3 4 5 6 7 8 9 10"
for name in "${model_name[@]}"; do
    python --input_dir $input_dir --model $name --rule_indexes $rule_index --output_dir $save_dir --meta_rule_info $rule_path
done
# premise first rules
# Rules that are searched starting from the premise predicate that contains the generic node
rule_index="20 21 22 23 24 25 26 27 28 29 30"
for name in "${model_name[@]}"; do
    python --input_dir $input_dir --model $name --rule_indexes $rule_index --output_dir $save_dir --meta_rule_info $rule_path
done
# concatenate all the likelihoods
for name in "${order_name[@]}"; do
    python --data_dir ${save_dir}/${name} --model_indexes gpt --primary_rank_model gpt
done
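For readers unfamiliar with statement likelihoods, one common way to score a statement is the length-normalized log-probability of its tokens. The sketch below assumes the per-token log-probabilities have already been obtained from the scoring model. The function names and numbers are hypothetical, and the repository's script may compute or normalize the likelihood differently.

```python
from typing import Dict, List


def mean_log_likelihood(token_logprobs: List[float]) -> float:
    """Average log-probability per token (length-normalized likelihood)."""
    if not token_logprobs:
        raise ValueError("statement has no tokens")
    return sum(token_logprobs) / len(token_logprobs)


def rank_by_likelihood(statements: Dict[str, List[float]]) -> List[str]:
    """Order statements from least likely (most long-tail) to most likely."""
    return sorted(statements, key=lambda s: mean_log_likelihood(statements[s]))


# Made-up token log-probabilities for two statements.
token_logprobs = {
    "Person X is not able to eat korma": [-2.0, -4.0],
    "Person X is not able to eat peanut butter": [-0.5, -1.0, -1.5],
}
ranked = rank_by_likelihood(token_logprobs)
print(ranked)
```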
Get the distribution plot for each rule
contains our code to get the distribution plots for each rule.
Get the delta plot
contains our code to get the delta plot for all the rules.