From b52cca57d0cab07205f189caf9118eeae1e823a1 Mon Sep 17 00:00:00 2001
From: David Gasquez
Date: Fri, 18 Aug 2023 15:44:59 +0200
Subject: [PATCH] :art:

---
 Data/Dashboards.md            | 12 ++++++------
 Data/Data Culture.md          | 14 +++++++-------
 Data/Data Engineering.md      |  4 ++--
 Data/Reverse ETL.md           | 22 +++++++++++-----------
 Data/Sharing Data Insights.md | 18 +++++++++---------
 Identity.md                   |  2 +-
 Journaling.md                 |  6 +++---
 Knowledge Graphs.md           |  4 ++--
 Large Language Models.md      |  2 +-
 9 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/Data/Dashboards.md b/Data/Dashboards.md
index d7be688..2bb5ce5 100644
--- a/Data/Dashboards.md
+++ b/Data/Dashboards.md
@@ -12,10 +12,10 @@
 - Purpose and explanation of the data being shown.
 - Caveats and assumptions.
 - Extra Context:
-	- Why this dashboard exists.
-	- Who it's for.
-	- When it was built, and if and when it's set to expire.
-	- What features it's tracking via links to team repositories, project briefs, screenshots, or video walk-throughs.
+    - Why this dashboard exists.
+    - Who it's for.
+    - When it was built, and if and when it's set to expire.
+    - What features it's tracking via links to team repositories, project briefs, screenshots, or video walk-throughs.
 - Takeaways.
 - Metadata (owner, related OKRs, TTL, …).
 - Make them so it's easy to go one layer down (X went down in Y location, or for Z new users, etc.).
@@ -43,8 +43,8 @@ The value is that now discussions are happening about the data.
 - [They can serve endless needs, but in doing so, rarely do they serve _particular_ needs perfectly](https://win.hyperquery.ai/p/analysis-or-dashboard).
 - Dashboards shouldn't be single-use.
 - Ask this:
-	- Can this new dashboard request be added to an existing one?
-	- What are you going to do differently by looking at the dashboard? Focus on that [[Metrics|metric]] and add it to the main dashboard.
+    - Can this new dashboard request be added to an existing one?
+    - What are you going to do differently by looking at the dashboard? Focus on that [[Metrics|metric]] and add it to the main dashboard.
 - Beware of death by 1,000 filters: after a dashboard has gone live, you'll be flooded with requests for new views, filters, fields, pages, everything ([can you just ...](https://richardswinbank.net/blog/can_you_just)).
 - Dashboards are decision-making infrastructure, and infrastructure needs to be maintained. Be explicit about which dashboards are disposable and add a TTL to them.
 - The numbers and charts on a dashboard very rarely have any direct personal meaning to the people using it. There's tons of other work to do, and unless that dashboard is directly tied to your performance or compensation, there are probably more important things to look at. People are more likely to check stock prices when they actually own (and thus benefit from) the stock.

diff --git a/Data/Data Culture.md b/Data/Data Culture.md
index 50f33cb..e5cad1b 100644
--- a/Data/Data Culture.md
+++ b/Data/Data Culture.md
@@ -9,8 +9,8 @@
 - Data is fundamentally a collaborative design process rather than a tool, an analysis, or even a product. [Data works best when the entire feedback loop from ideation to production is an iterative process](https://pedram.substack.com/p/data-can-learn-from-design).
 - [To get buy-in, explain how the business could benefit from better data](https://youtu.be/Mlz1VwxZuDs) (e.g: more and better insights). Start small and show value.
 - Run *[Purpose Meetings](https://www.avo.app/blog/tracking-the-right-product-metrics)* or [Business Metrics Reviews](https://youtu.be/nlMn572Dabc).
-	- Purpose Meetings are 30-minute meetings in which stakeholders, engineers, and data align on the goal of a release and on the best way to evaluate its impact and understand its success. Align on the goal, commit to metrics, and design the data.
-	- A Business Metrics Review is a 30- to 60-minute meeting to chat about and explore key metrics, and to teach how to think with data.
+    - Purpose Meetings are 30-minute meetings in which stakeholders, engineers, and data align on the goal of a release and on the best way to evaluate its impact and understand its success. Align on the goal, commit to metrics, and design the data.
+    - A Business Metrics Review is a 30- to 60-minute meeting to chat about and explore key metrics, and to teach how to think with data.
 - Value of clear goals and expectations. Validate what you think your job is with your manager and stakeholders, repeatedly.
 - [While the output of your team is what you want to maximize, you'll need some indicators that will help guide you day-to-day](https://data-columns.hightouch.io/your-first-60-days-as-a-first-data-hire-weeks-3-4/). Decide what's important to you (test coverage, missing documentation, queries run, models created, ...), and generate some internal reports for yourself.
 - [Data teams should be a part of the business conversations from the beginning](https://cultivating-algos.stitchfix.com/). Get the data team involved early, have open discussions with them about the existing work, and how to prioritize new work against the existing backlog. Don’t accept new work without addressing the existing bottlenecks, and don’t accept new work without requirements. **Organizational [[politics]] matter way more than any data methods or technical knowledge**.
@@ -19,7 +19,7 @@
 - The modern data team needs to have *real organizational power*—it needs to be able to say “no” and mean it. If your data team does not truly have the power to say no to stakeholders, it will get sent on all kinds of wild goose chases, be unproductive, experience employee churn, etc.
 - Data should report to the CEO, ideally at least with some weekly metrics split into (a) notable trends, (b) watching closely, and (c) business as usual.
 - If data is the most precious asset in a company, does it make sense to have only one team responsible for it?
-	- [People talk about data as the new oil, but for most companies it’s a lot closer to uranium](https://news.ycombinator.com/item?id=27781286). It's hard to find people who can handle or process it correctly, there are nontrivial security risks/liabilities if PII is involved, it's expensive to store, and the return on effort is generally underwhelming relative to the anticipated utility.
+    - [People talk about data as the new oil, but for most companies it’s a lot closer to uranium](https://news.ycombinator.com/item?id=27781286). It's hard to find people who can handle or process it correctly, there are nontrivial security risks/liabilities if PII is involved, it's expensive to store, and the return on effort is generally underwhelming relative to the anticipated utility.
 - [The pain in data teams comes from needing to influence PMs/peers while having little control over them. Data teams need to become really great internal marketers/persuaders](https://anchor.fm/census/episodes/The-evolution-of-the-data-industry--data-jobs-w-Avo-CEO-and-Co-founder-Stefania-Olafsdottir-e16hu1l).
That said, it shouldn't be the data team's job to convince the organization to be data-driven. That's not an effective use of resources.
 - People problems are orders of magnitude more difficult to solve than data problems.
 - **Integrate data where the decision is made**. E.g: Google showing restaurant scores when you're looking for somewhere to have dinner.
@@ -43,7 +43,7 @@
 - Do weekly recaps in Slack to highlight key items, company-wide progress toward north stars, improvements in certain areas, and new customer highlights. All positive and fun stuff.
 - How can we measure the data team's impact?
 	- Making a [[Writing a Roadmap|roadmap]] can help you tell whether you are hitting milestone deadlines or letting them slip.
-		- Embedded data team members need to help other teams build their roadmaps too.
+        - Embedded data team members need to help other teams build their roadmaps too.
 	- Also, having a changelog ([do releases!](https://betterprogramming.pub/great-data-platforms-use-conventional-commits-51fc22a7417c)) will help show the team's impact on the data product over time.
 - [Push for a *centralization of the reporting structure*, while keeping the *work management decentralized*](https://erikbern.com/2021/07/07/the-data-team-a-short-story.html).
 - Unify resources (datasets, entities, definitions, metrics). Have one source of truth for each one and make that clear to everyone. That source of truth needs heavy curation. Poor curation leads to confusion, distrust, and… lots of wasted effort.
@@ -85,9 +85,9 @@
 - [Data ownership is a hard problem](https://www.linkedin.com/posts/chad-sanderson_heres-why-data-ownership-is-an-incredibly-activity-6904107936533114880-gw8n/). Data is fundamentally generated by services (or front-end instrumentation), which are managed by engineers. CDC and other pipelines are built by data engineers. The delineation of ownership responsibilities is very rarely established, with each group wanting to push 'ownership' onto someone else so they can do the jobs they were hired for.
 - [Becoming a data-driven organization is a journey, which unfolds over time and requires critical thinking, human judgement, and experimentation](https://hbr.org/2022/02/why-becoming-a-data-driven-organization-is-so-hard). Fail fast, learn faster.
 - [Path to create a data-driven organization](https://twitter.com/_abhisivasailam/status/1520274838450888704):
-	- 1. Get a well-placed leader with influence to message, model, and demand data-driven execution.
-	- 2. Hire/fire based on data aptitude and usage.
-	- 3. Create mechanisms that force analytical conversations. Sometimes there is no way around spending an afternoon breaking down metrics by different segments until you find The Thing.
+    - 1. Get a well-placed leader with influence to message, model, and demand data-driven execution.
+    - 2. Hire/fire based on data aptitude and usage.
+    - 3. Create mechanisms that force analytical conversations. Sometimes there is no way around spending an afternoon breaking down metrics by different segments until you find The Thing.
 - [Start small. Don't try to wrangle data for the entire company until you have the tools and process down for one team](https://data-columns.hightouch.io/your-first-60-days-as-a-first-data-hire-weeks-3-4/).
 	- The difficulty of working with data scales exponentially with size.
 - [Rule of thumb: your first customer as a data person should be growth](https://twitter.com/josh_wills/status/1577699871335010304).
diff --git a/Data/Data Engineering.md b/Data/Data Engineering.md
index 1dcfe40..575d6f5 100644
--- a/Data/Data Engineering.md
+++ b/Data/Data Engineering.md
@@ -36,8 +36,8 @@ graph LR;
 - Decouple producers and consumers by adding a layer in between. That can be something as simple as a text file or as complex as a [[Databases|database]].
 - **Schema changes**. Most of the time you won't be there at the exact time of the change, so aim to save everything.
 - Ideally, the schema will evolve in a backward-compatible way:
-	- Data types don't change in the same column.
-	- Columns are either deleted or added, but never renamed.
+    - Data types don't change in the same column.
+    - Columns are either deleted or added, but never renamed.
 - Create a few extra columns like `processed_at` or `schema_version`.
 - Generate stats to provide the operator with feedback.
 - Data coming from pipelines should be easily reproducible. If you want to re-run a process, you should ensure that it will always produce the same result. This can be achieved by enforcing the [Functional Data Engineering Paradigm](https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a).
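To make the schema-evolution bullets above concrete, here is a minimal ingestion sketch. Everything in it (`EXPECTED_SCHEMA`, `SCHEMA_VERSION`, the `ingest` helper) is hypothetical and illustrative, not from any library: it rejects in-place type changes, keeps unknown columns, and stamps each record with `processed_at` and `schema_version`.

```python
from datetime import datetime, timezone

SCHEMA_VERSION = 2  # hypothetical version number for this feed
EXPECTED_SCHEMA = {"user_id": int, "event": str, "value": float}  # illustrative

def ingest(record: dict, processed_at: datetime) -> dict:
    """Validate one record against the expected schema and stamp audit columns."""
    for column, expected_type in EXPECTED_SCHEMA.items():
        # Backward compatible: a column may be absent (deleted), but an
        # existing column must never change type in place.
        if column in record and not isinstance(record[column], expected_type):
            raise TypeError(f"Column {column!r} changed type")
    # Unknown columns are kept as-is: "aim to save everything".
    return {
        **record,
        "processed_at": processed_at.isoformat(),
        "schema_version": SCHEMA_VERSION,
    }

# Passing the run timestamp in (instead of calling now() inside) keeps the
# function pure, so re-running the pipeline reproduces the same output, in
# the spirit of the functional paradigm linked above.
row = ingest(
    {"user_id": 1, "event": "signup", "value": 9.99, "utm_source": "x"},
    processed_at=datetime(2023, 8, 18, tzinfo=timezone.utc),
)
```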
diff --git a/Data/Reverse ETL.md b/Data/Reverse ETL.md
index a512153..2252af9 100644
--- a/Data/Reverse ETL.md
+++ b/Data/Reverse ETL.md
@@ -5,17 +5,17 @@

 ## Why?

 - It provides a source of truth for all the tools: **the data warehouse**.
-	- Each tool can use and share the same definitions, events, and properties.
-	- Tracking is less dependent on business rules.
-	- Centralized tests can be added to validate assumptions.
-	- It removes some tools' limitations (e.g. Customer.io's way of doing segmentation, Pendo's limitation on event cohorts).
-	- SQL queries will return the same numbers as other BI tools like Mixpanel.
+    - Each tool can use and share the same definitions, events, and properties.
+    - Tracking is less dependent on business rules.
+    - Centralized tests can be added to validate assumptions.
+    - It removes some tools' limitations (e.g. Customer.io's way of doing segmentation, Pendo's limitation on event cohorts).
+    - SQL queries will return the same numbers as other BI tools like Mixpanel.
 - You get to use all the data you have, improving your [[Data Culture]].
-	- You can use the real source of truth for all the events and not rely on tracking only.
-	- You can join sources like ChartMogul, Customer.io, etc.
-	- You can create more interesting events by enriching the events and user profiles with extra properties/traits (Trial Started with a conversion probability attached). This makes product analytics much more powerful.
+    - You can use the real source of truth for all the events and not rely on tracking only.
+    - You can join sources like ChartMogul, Customer.io, etc.
+    - You can create more interesting events by enriching the events and user profiles with extra properties/traits (Trial Started with a conversion probability attached). This makes product analytics much more powerful.
 - It is much easier to re-use the data available in the warehouse than it is to import the data into any new tool we use in the future.
-	- You can be much more flexible with the tools you want to use because the data is shared and owned by you.
-	- You avoid being locked into BI tools like Mixpanel since the logic will be stored in your warehouse.
+    - You can be much more flexible with the tools you want to use because the data is shared and owned by you.
+    - You avoid being locked into BI tools like Mixpanel since the logic will be stored in your warehouse.
 - As with any new tool, it gives more flexibility and power.
-	- The current state is the starting point! We start using it to fix some issues or add some interesting profile properties.
+    - The current state is the starting point! We start using it to fix some issues or add some interesting profile properties.
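As a concrete sketch of the sync itself (the `user_traits` table, the endpoint, and the tool API below are hypothetical placeholders, not a real Customer.io or Mixpanel integration), this reads a warehouse-enriched trait and pushes it to a downstream tool:

```python
import json
import sqlite3  # stand-in for a real warehouse driver
import urllib.request

def sync_user_traits(warehouse: sqlite3.Connection) -> None:
    """Read warehouse-enriched traits and push them to a downstream tool."""
    rows = warehouse.execute(
        "SELECT user_id, conversion_probability FROM user_traits"  # hypothetical table
    )
    for user_id, conversion_probability in rows:
        body = json.dumps({"conversion_probability": conversion_probability})
        request = urllib.request.Request(
            f"https://api.example.com/v1/users/{user_id}/traits",  # placeholder endpoint
            data=body.encode(),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(request)  # a real sync would batch, retry, and diff
```

Because the trait is computed once in the warehouse, every downstream tool sees the same number your SQL queries return.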
diff --git a/Data/Sharing Data Insights.md b/Data/Sharing Data Insights.md
index ef6a968..eb88ee5 100644
--- a/Data/Sharing Data Insights.md
+++ b/Data/Sharing Data Insights.md
@@ -1,17 +1,17 @@
 # Sharing Data Insights

- [Sharing your data insights across your organization facilitates collaboration and mutual learning – increasing data literacy across the company](https://locallyoptimistic.com/post/share-your-data-insights-to-engage-your-colleagues/). It also helps remind folks that members of the data team can be strategic partners, creating opportunities for proactive brainstorming that can drive innovation.
+[Sharing your data insights across your organization facilitates collaboration and mutual learning – increasing data literacy across the company](https://locallyoptimistic.com/post/share-your-data-insights-to-engage-your-colleagues/). It also helps remind folks that members of the data team can be strategic partners, creating opportunities for proactive brainstorming that can drive innovation.

- The aim is to answer the following questions each time:
+The aim is to answer the following questions each time:

- 1. **What am I looking at?** A **short-but-informative title** can tell people immediately what data is the focus of the insight.
- 2. **What should I learn from this?** or, why should I care? Include the **most useful information** and/or a **clear takeaway**. For folks who only have a few seconds to scan the message, it should be **easy to spot** the **most valuable** bit of the insight, the **reason** this exploration was considered worth sharing.
- 3. **What caught my eye?** Share a chart or a related resource!
- 4. **What if I want to know more?** A **link to additional information** can be valuable for people who have time for more than a quick scan and want to understand how you developed the insight, or do some of their own related exploration.
- 5. **What if I have a question?** Explicitly **inviting questions** and responses is crucial. It’s the best part of sharing an insight! This is where you get to learn about things your colleagues know that you don’t, or what they’re curious about that has not yet risen to the level of a data request.
- 6. **What if posting this prompts a whole bunch of follow-up questions, or exposes incorrect assumptions?** If you have hit on something that’s interesting to a lot of people, there will likely be questions that spin off, new ways to slice the data you’re looking at, or assumptions you have made that need to be corrected.
+1. **What am I looking at?** A **short-but-informative title** can tell people immediately what data is the focus of the insight.
+2. **What should I learn from this?** or, why should I care? Include the **most useful information** and/or a **clear takeaway**. For folks who only have a few seconds to scan the message, it should be **easy to spot** the **most valuable** bit of the insight, the **reason** this exploration was considered worth sharing.
+3. **What caught my eye?** Share a chart or a related resource!
+4. **What if I want to know more?** A **link to additional information** can be valuable for people who have time for more than a quick scan and want to understand how you developed the insight, or do some of their own related exploration.
+5. **What if I have a question?** Explicitly **inviting questions** and responses is crucial. It’s the best part of sharing an insight! This is where you get to learn about things your colleagues know that you don’t, or what they’re curious about that has not yet risen to the level of a data request.
+6. **What if posting this prompts a whole bunch of follow-up questions, or exposes incorrect assumptions?** If you have hit on something that’s interesting to a lot of people, there will likely be questions that spin off, new ways to slice the data you’re looking at, or assumptions you have made that need to be corrected.

-### Slack Template
+## Slack Template

 ```md
 ![Chart]()

diff --git a/Identity.md b/Identity.md
index 1157fdc..dfa84c3 100644
--- a/Identity.md
+++ b/Identity.md
@@ -2,7 +2,7 @@
 - [Maintain a very small identity](http://www.paulgraham.com/identity.html). The act of labeling yourself is the act of restricting yourself to what you think fits that label. Don't have opinions on everything. Avoid forming an opinion at all about things that are not evident. Do not affiliate your identity with anything extrinsic - such as a religion, political party, country, company, profession, [[Programming]] language, social class, etc.
 - Identity can be helpful in some cases. When we identify as something that is aligned with our [[Values]] and that can self-correct (e.g: rationalism), it encourages us to behave better!
-	- [Try to affiliate more strongly with the communities whose core beliefs would be less dangerous if they turned out to be wrong](https://economicsdetective.com/2016/10/identity-mind-killer/).
+    - [Try to affiliate more strongly with the communities whose core beliefs would be less dangerous if they turned out to be wrong](https://economicsdetective.com/2016/10/identity-mind-killer/).
 - Identity labels are a way of [[Conceptual Compression]]. They help you infer some things about people who identify as something.
 - You're not your opinions. Don't define yourself by what you work on or what you hate. Once a belief becomes part of your identity, any evidence that threatens the belief is a personal attack.
 - The only constant in the world is that it changes. Identify as someone that changes their mind when the data changes!

diff --git a/Journaling.md b/Journaling.md
index a97a8cf..e16669f 100644
--- a/Journaling.md
+++ b/Journaling.md
@@ -14,6 +14,6 @@
 - Review a set of recurrent prompts. Tweak them over time. For example:
 	- Consistency at your core [[habits]] this week ([[Fitness]], [[Routine]], [[Productivity]], etc.). How can you tweak them to be more consistent or more useful?
 	- What did you do this week that was a mistake, and how can you avoid repeating it?
-	- What would you like to accomplish next week?
-	- Do you need to clarify something?
-	- Which actions will move you closer to your [[goals]]?
+    - What would you like to accomplish next week?
+    - Do you need to clarify something?
+    - Which actions will move you closer to your [[goals]]?

diff --git a/Knowledge Graphs.md b/Knowledge Graphs.md
index 54b6f26..c03173f 100644
--- a/Knowledge Graphs.md
+++ b/Knowledge Graphs.md
@@ -9,9 +9,9 @@
 - Why didn't it catch on?
 	- Graphs always appear like a complicated mess, and we prefer hierarchies and categories.
 	- The Knowledge Graph seems like the purest representation of all data in a company, but it requires you to have all the data in the right format: correctly annotated, correctly maintained, updated, and available.
-		- It takes too much effort to maintain it and keep it semantic instead of copy-pasting text around. This is one of the most interesting [[Large Language Models]] applications.
+        - It takes too much effort to maintain it and keep it semantic instead of copy-pasting text around. This is one of the most interesting [[Large Language Models]] applications.
 	- It offers no protection against some team inside the company breaking the whole web by moving to a different URI or refactoring their domain model in incompatible ways.
-		- For the Semantic Web to work, the infrastructure behind it needs to permanently keep all of the necessary sources that a file relies on. This could be a place where [[IPFS]] or other [[Decentralized Protocols]] could help!
+        - For the Semantic Web to work, the infrastructure behind it needs to permanently keep all of the necessary sources that a file relies on. This could be a place where [[IPFS]] or other [[Decentralized Protocols]] could help!
 	- It tends to assume that the world fits into neat categories. Instead, we live in a world where membership in categories is partial, probabilistic, contested (Pluto), and changes over time.
 - The status quo of the semantic web space is still SPARQL.
 - You can build [a knowledge graph database on top of a relational engine](https://twitter.com/RelationalAI).
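Since the notes above mention SPARQL and the Pluto problem, here is a tiny sketch (assuming `rdflib` as the library; any triple store would do) of why hard triples struggle with partial or contested category membership:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Earth, EX.memberOf, EX.Planets))
g.add((EX.Pluto, EX.memberOf, EX.Planets))  # contested since 2006, but a
                                            # triple is either asserted or not

results = g.query(
    "SELECT ?body WHERE { ?body <http://example.org/memberOf> <http://example.org/Planets> }"
)
for (body,) in results:
    print(body)  # no notion of partial, probabilistic, or time-varying membership
```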
diff --git a/Large Language Models.md b/Large Language Models.md
index b8863c7..54cc971 100644
--- a/Large Language Models.md
+++ b/Large Language Models.md
@@ -17,7 +17,7 @@
 - Designing prompts is an iterative process that requires a lot of experimentation to get optimal results. Start with simple prompts and keep adding more elements and context as you aim for better results.
 - Be very specific about the instruction and task you want the model to perform. The more descriptive and detailed the prompt is, the better the results.
 - Some additions:
-	- Be highly organized
+    - Be highly organized
 	- Suggest solutions that I didn’t think about - be proactive and anticipate my needs
 	- Treat me as an expert in all subject matter
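A minimal sketch of that iteration loop, assuming the 2023-era (pre-1.0) OpenAI Python client; the model name and parameters are illustrative, not prescriptive. The additions above are baked into a system message, and extra context gets appended between runs:

```python
import openai  # assumption: the pre-1.0 OpenAI Python client

SYSTEM_PROMPT = """You are a concise technical assistant.
- Be highly organized.
- Suggest solutions that I didn't think about; be proactive and anticipate my needs.
- Treat me as an expert in all subject matter."""

def ask(question: str, extra_context: str = "") -> str:
    # Iterate: start with the simple prompt, then keep adding more elements
    # and context between runs as you aim for better results.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT + "\n" + extra_context},
            {"role": "user", "content": question},
        ],
        temperature=0.2,  # lower temperature for more repeatable drafts
    )
    return response["choices"][0]["message"]["content"]
```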