Proposal: Remove interacts from conceptual modeling phase #5

Open
emjun opened this issue May 23, 2023 · 1 comment

Comments

@emjun
Owner

emjun commented May 23, 2023

Key points:

  • Proposal: Remove interacts from the DSL and conceptual model disambiguation phase.
  • Rationale: Interactions start to get at statistical formulation. They are a good example of a modeling decision that sits squarely between (or combines) conceptual and statistical concerns. However, for the process rTisane supports, they seem better suited to the statistical modeling phase.
  • Alternative implementations that allow for interactions:
    • (1) Update the statistical model disambiguation process to always ask about interactions among confounders
    • (2) Update the query function to accept a specification of interaction(s), and only show/ask about interactions in the statistical model disambiguation interface when interactions are specified in the query

Longer discussion:

  • Previous intention for including interacts: So that analysts could express research questions and hypotheses about interaction effects between two or more variables. In other words, we wanted analysts to be able to include interactions in their statistical models, either by specifying them in their query or by adding them during statistical model disambiguation.
  • Current implementation: interacts constructs a new variable that analysts involve in conceptual relationships. Concretely, something like this is possible:
a <- categorical("a")
b <- categorical("b")
z <- continuous("z")
ixn <- interacts(a, b)  # Construct a variable representing an interaction

cm <- ConceptualModel() %>%
     assume(causes(a, z)) %>% 
     assume(causes(ixn, z)) # Involve interaction in a conceptual relationship
  • The troubles:
    • Treating interactions as variables that are "equal" to their component variables in the conceptual model is a bit misleading. Interactions aren't actually equivalent concepts. In other words, the conceptual model is no longer purely conceptual: with interaction variables included, it starts to mix conceptual and statistical concepts.
    • Reasoning about interactions is also up for debate in the causal diagramming community. Pearl has argued that interactions are already captured in causal diagrams, and others have proposed new graphical structures to explicitly represent and reason about interactions. Because we are relying on Cinelli, Forney, and Pearl's recommendations for graphical reasoning about statistical models, I'm not sure it's completely sound to reason about interactions as nodes in the underlying graph.
    • Apart from the above reasons, because interactions start to call to mind statistical formulation, they are better considered during the statistical model disambiguation phase, when rTisane already guides analysts to think at a somewhat lower level than the input spec/DSL.
  • Conclusion: Basically, I think we can have a cleaner separation of conceptual and statistical models by removing interacts from the DSL while retaining it as a consideration/possibility in the statistical model disambiguation phase. Either alternative (listed at the very top) would help achieve that, although alternative 1 may be even cleaner.
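
To make the "interactions start to get at statistical formulation" point concrete, here is a minimal base R sketch. It does not use rTisane; the toy data frame and its values are made up for illustration, and only standard `stats` functions (`terms`, `model.matrix`) are assumed. It shows that an interaction only becomes a distinct object once the model is written down as a formula:

```r
# Hypothetical toy data; variable names mirror the snippet above,
# but the values are invented for illustration only.
set.seed(1)
df <- data.frame(
  a = factor(rep(c("a1", "a2"), each = 4)),
  b = factor(rep(c("b1", "b2"), times = 4)),
  z = rnorm(8)
)

# In an R formula, `a * b` is shorthand for both main effects
# plus their interaction term `a:b`.
f <- z ~ a * b
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# The interaction shows up as an additional design-matrix column,
# i.e., it is part of the statistical formulation of the model.
X <- model.matrix(f, data = df)
print(colnames(X))
```

In this sketch, `a` and `b` are the only concepts in play; `a:b` appears solely as a term in the formula and a column in the design matrix, which is the sense in which the interaction is a statistical rather than a conceptual object.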

Open to comments, debate, deliberation.

@emjun
Owner Author

emjun commented May 23, 2023

Key discussion points/decisions

  • Encouraging end-users to think through interactions (e.g., "A's influence conditioning on B") earlier rather than later seems important to the rTisane workflow.
  • How closely do we want our graph representation to correspond to a causal diagram? The graph IR is already separate from a causal graph given that it contains measurement + causal edges. Therefore, we should not restrict rTisane's graph to only causal information. It should contain any information that is helpful for compiling a statistical model from a conceptual model.
  • Default semantics: If the analyst does not specify interactions up front, rTisane will not consider including interactions in the output statistical model. This seems clearer/simpler/more straightforward than an implicit inclusion of interactions. (Mantra: All things equal, let's opt for something simple.)
  • Syntax: Likely to be similar to what we already have. Not perfect, but something we could improve in the future.

Other notes

  • The alternative of asking for interactions in the query feels kinda late.
  • The alternative of suggesting all possible interactions at the statistical model disambiguation phase seems potentially interesting and could encourage analysts to think about interactions more (especially since there is more opportunity for exposition/explanation in the interface), but the number of possible n-way interactions grows exponentially with the number of variables (NP-hard?), and finding a way to suggest/show them well seems hard. For the sake of the imminent evaluation, let's skip this. (Made an issue to revisit this later)
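
A rough sense of the scale involved, sketched in base R (this is not rTisane code; `count_interactions` and `enumerate_interactions` are hypothetical helper names). Assuming every subset of two or more variables is a candidate interaction, n variables yield 2^n - n - 1 candidates:

```r
# Count candidate interactions (2-way up to n-way) among n variables.
# Every subset of size >= 2 is a candidate, so the total is 2^n - n - 1.
count_interactions <- function(n) {
  if (n < 2) return(0)
  sum(choose(n, 2:n))
}

# Enumerate the candidates explicitly for a small set of variable names.
enumerate_interactions <- function(vars) {
  if (length(vars) < 2) return(character(0))
  unlist(lapply(2:length(vars), function(k) {
    combn(vars, k, paste, collapse = ":")
  }))
}

counts <- sapply(c(3, 5, 10, 20), count_interactions)
print(counts)

small <- enumerate_interactions(c("a", "b", "c"))
print(small)
```

Even at 10 variables there are over a thousand candidates, so listing them all in an interface is impractical regardless of the formal complexity class; some filtering or analyst-driven specification would be needed.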

Conclusion

Allow analysts to annotate conceptual models with interactions that are likely to exist among variables. Do this by supporting:

a <- categorical("a")
b <- categorical("b")
z <- continuous("z")
ixn <- interacts(a, b)  # Construct a variable representing an interaction

cm <- ConceptualModel() %>%
     assume(causes(a, z)) %>% 
     interacts(variables=list(a, b), outcome=z) # Annotate conceptual model with an interaction between a and b on outcome z
