Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-monday: new connector #2405

Merged
merged 3 commits into from
Feb 24, 2025
Merged

source-monday: new connector #2405

merged 3 commits into from
Feb 24, 2025

Conversation

JustinASmith
Copy link
Contributor

@JustinASmith JustinASmith commented Feb 14, 2025

Description:

This is an initial version of a connector to capture Boards, Teams, Users, Items, and Tags from Monday's GraphQL API.

There are GraphQL/API limitations worth mentioning:

  • Individual queries have a complexity limit of 5M / Minute. Complexity defines the load that each call puts on the API.
  • Queries have a max depth limit of 7, therefore, deeply nested items can sometime be challenging to obtain.
  • Limit on number of queries that can run per minute and day.

Currently the streams are set up on a 5 minute interval. However, the API still produces HTTP 429 (i.e., Too Many Requests) even with this interval since each binding try to execute concurrently. The HTTPSession class is already set up to handle rate limiting, back off, and retry. This has seemed to work okay, but Monday.com also has a way to check the current complexity budget and complexity cost of a query. I have a function that can be used to check the current budget and wait until the complexity budget next resets. We may either need to use the complexity checks or extend the CDK to have some sort of queuing mechanism for requests so that we do not exceed complexity or requests per minute limitations.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Tested on local stack. Confirmed:

  • OAuth works as expected in the UI when setting the redirect_uri to localhost:3000 for the OAuth app in Monday.com.
  • flowctl preview shows valid results in standard out.
  • flowctl raw spec
  • Currently running into SpecFailed errors again when installing connector in local stack.

OAuth2 Enhancements:

  • Added AuthorizationCodeFlowOAuth2Credentials class to support authorization code flow in OAuth2.
  • Modified TokenSource to handle AuthorizationCodeFlowOAuth2Credentials in the fetch_token and _fetch_oauth2_token methods. [1] [2] [3] [4]

This change is Reviewable

@williamhbaker
Copy link
Member

Looking really good so far! No major issues that I can see, and not even really any minor issues from a quick look.

Some general thoughts:

  • There's a balance between query complexity & usefullness of information we get back. Get more information is good but more expensive. I'm not saying anything needs to change with the queries, but one thing to keep in mind is that it is much easier for us to modify the connector to add new fields in the future, rather than removing them, from the standpoint of backwards compatibility.
  • I advised you to try to use our standard HTTP backoff mechanism...I think that's probably still a good idea, but considering what you've found about how restrictive the API limits are, I think it is very likely we are going to want to rig this up to processing bindings in a serial fashion, rather than all of them concurrently. We can wait and see if its an actual issue that user tasks encounter though.
  • It would be worth sharpening the pencil a little bit on some of these streams to see if a practical incremental strategy exists for them. That would prevent us from needing to manage compatibility issues in the future if we found that we wanted to make streams incremental and such a thing was possible, where the collection key would need to change as a result (from _meta/row_id. If we could make any of these incremental it should also help with how much of the complexity budget they use too.

Copy link
Contributor Author

@JustinASmith JustinASmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial self review

@JustinASmith
Copy link
Contributor Author

Limitation(s) identified:

  • Renaming a subitem does not register in the activity log as an event.

@JustinASmith JustinASmith marked this pull request as ready for review February 21, 2025 12:55
@JustinASmith
Copy link
Contributor Author

@williamhbaker I have moved this to ready for review. I have made the minor changes we discussed in Slack and on our previous VC.

@JustinASmith
Copy link
Contributor Author

I am checking into CI failure now. I tested locally and all worked fine, but I had my test.flow.yaml slightly changed to use custom config instead of config.yaml

Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some inline comments below!

A couple of more general things:

  • I recommend enabling a type checker and fixing anything it flags if you haven't already done that. I mention this in one of my comments as well - I pulled this PR down and pylance flagged a few things that I think are legitimate issues to resolve.
  • The graphql.py code could benefit from some more explicit data modeling. Ideally we'd have pydantic models to represent the shape of the API responses, and could use those in place of more dynamic typing and dictionary lookups. I am 100% sure how practical this will be do to so this isn't a total blocker, but if you're able to make any improvements there I think it would help the maintainability of the code.

subitems: list["Item"] | None = None


Item.model_rebuild()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will rebuild the model schema, which I think is needed since subitems is a Forward Reference and Python cannot know what Item is at class definition time for this subitems field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm we have lots of examples of using forward references that just put the class name in quotes. What happens if you get rid of this Item.model_rebuild line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm you are right. I guess I did not need that after all. I will update to remove that line since I just tested without and everything worked as expected!

Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

One remaining tiny question, but overall the updates look good!

= added 3 commits February 24, 2025 11:44
This is a new connector for Monday.com. The
requirements for this connector are
to incorporate the following streams:

- Boards
- Items
- Tags
- Teams
- Users

The connector will be able to perform full
refreshes and incremental updates on `Boards`
and `Items` using the Activity Log.
The rest are full refresh.

Monday's API is GraphQL-based, which comes with
complexity and rate limits. To combat this we do
not select all fields in the GraphQL query, but
instead only select a smaller subset. The streams
are in full parity with the fields that Airbytes
Monday source connector has. We can easily add
additional fields or streams if needed later.

Monday's Activity Logs do not capture all possible
events. The one event we know of now is renaming
a subitem. This event is not captured and therefore
will not be reflected in the incremental updates.
However, there are CDK improvements being worked
that will support scheduled backfills which can
be used to schedule backfills for the stream.

The connector supports API Access tokens as the
authentication method. The Monday OAuth2 app
will be set up and submitted to Monday for review
so that OAuth2 can be usesd for authentication in
the future.
Add an additional method of authentication using the authorization code flow grant type in the request to get access tokens. This is required by Monday.com. So this update supports the `source-monday` connector OAuth2 authentication once that is enabled.
@JustinASmith JustinASmith merged commit 2e6471e into main Feb 24, 2025
82 of 88 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants