diff --git a/tutorial/onboarding/cheat-sheet/index.md b/tutorial/onboarding/cheat-sheet/index.md index 5591e87a4..02a6f31fc 100644 --- a/tutorial/onboarding/cheat-sheet/index.md +++ b/tutorial/onboarding/cheat-sheet/index.md @@ -12,12 +12,12 @@ the core building blocks of a standard use case in Keboola. {:toc} ## Extracting Data from Sources -### Proper User Credentials +### User Credentials When working with data source components in Keboola, proper authorization is crucial. This involves providing credentials and connection details for source databases or relevant tokens and API keys for extracting data from services. It is advisable to use technical user credentials created specifically for Keboola integrations, as using the credentials of a real person may present challenges related to permissions, potential changes or terminations, and password resets. -### Accessibility of Your Data Sources +### Accessibility of Data Sources Ensure that the data sources you intend to integrate are accessible from the Keboola platform. Internal databases running on on-premise servers or private clouds may not be accessible by default. In such cases, consider whitelisting Keboola's IP addresses, establishing an SSH tunnel (if supported by the Keboola component), or requesting Keboola to set up a VPN server. Collaboration with administrators or owners of the data source on your side, coupled with support from the Keboola @@ -28,7 +28,7 @@ When integrating typical data sources such as MS SQL Server, PostgreSQL, MySQL, extract everything without evaluating necessity. This approach can lead to unnecessary credit spending. It is recommended to initially extract a limited batch of data to verify its relevance before proceeding with a full replication of the entire data history from the source. -### Incremental Fetching and Incremental Loading +### Incremental Fetching and Loading **Incremental fetching:** Keboola's ability to read data from the source in increments, either through specific parameters or Keboola's incremental fetching options in database connectors, is beneficial for larger datasets. This setup is particularly useful when the full extraction time exceeds that of extracting increments only. @@ -40,7 +40,7 @@ matching primary key values. Incremental load without a primary key set would al It's important to note that certain connectors may automatically implement both incremental fetching and loading without requiring manual setup by users. This information is usually highlighted in the configuration UI, providing users with transparency about the implemented behaviour. -### Optimizing with Parallelization +### Optimize with Parallelization To optimize the overall runtime of your pipeline, consider employing parallelization to execute multiple configurations simultaneously. It's essential to recognize that while parallelization can significantly reduce the total runtime, each individual job consumes credits independently. Therefore, parallelization is a tool for optimizing the execution timeline rather than cost. **Where to apply parallelization:** @@ -66,12 +66,12 @@ but there is a limit on the maximum parallel Storage jobs in a project. In multi slots, potentially extending the overall runtime. The default limit for parallel Storage jobs is 10, but it can be increased through Keboola Support. ## Developing a Transformation -### Using a Workspace for Development +### Using a Workspace It's common for users to directly dive into the Transformations section of the UI to set up and test scripts. 
However, this approach may not be optimal. Executing a transformation component, whether it involves Python, R, or SQL transformations, always incurs some overhead from the component execution layered on top of the script execution. This can result in unnecessary credit consumption during code debugging. Our recommendation is to start by creating a Workspace for development purposes. Develop and test your code within the Workspace environment. Once your script is functioning correctly, you can then transfer it to a Transformation configuration and execute it, ensuring more efficient credit usage. -### Input and Output Mapping in a Transformation +### Input and Output Mapping Every transformation operates within its designated, temporary transformation workspace. When a Transformation is executed, it establishes this distinct workspace, which is isolated from the primary Keboola Storage. Consequently, within your code, you cannot directly access all Storage Objects; instead, you must load selected Storage objects into your transformation using an input mapping. @@ -186,13 +186,13 @@ This is beneficial for tasks that may regularly fail due to specific conditions, Alternatively, it is suitable for independent tasks whose failure does not impact the rest of the flow. However, monitoring execution statuses becomes crucial to promptly address potential errors and implement necessary fixes. -### Notifications for Insightful Monitoring +### Notifications for Monitoring For a seamless execution of your use-cases, staying informed about errors or unusual execution times in your flows is crucial. Configure **notifications** within your flow to receive timely updates. Teams often opt to configure a group mailbox for specific user groups, ensuring that all team members receive notifications regarding errors, warnings, or instances where the flow runs longer than the expected duration. This proactive approach enhances awareness and facilitates prompt responses to any issues that may arise. -### Automating Flows with Scheduled Execution +### Automating Flows **Date & Time Schedule:** The most common setup for automating flows involves scheduling them to run at specific time slots. In a multi-tenant stack, it's advisable to avoid peak time slots, such as midnight, to optimize resource availability. A simple adjustment, like scheduling your flow for 0:15 am, can positively impact execution, minimizing competition for resources within the multi-tenant environment. @@ -213,7 +213,7 @@ attempting to write data to a destination. Frequently, specific privileges are e spectrum of users within an organization. Insufficient permissions often manifest as errors when writing data to a destination. In such cases, Keboola Support is available to assist in identifying the specific permissions required for a particular component. -### Who You’re Providing Access to Data +### Who Gets Access to Data In the Keboola project, you have a precise understanding of who can access the integrated data. However, when writing data to a destination, whether it's a database, object storage, or an API/service, you are essentially extending access to those data to users who have privileges for that specific destination. It is crucial to be vigilant and ensure that you do not inadvertently share your data with unintended recipients. 
@@ -239,7 +239,7 @@ handling of data updates, tailored to the specific requirements of the destinati ### Caution Before Data Writing To be straightforward, it's crucial to thoroughly understand the implications of your actions. While Keboola offers a straightforward process for restoring data in case of accidental corruption, this may not hold true for the destination where you intend to write your data. The restoration of data in such destinations can be challenging, and in certain instances, it might even be impossible. Therefore, exercising heightened caution is strongly advised. Make sure you are well-informed and deliberate in your decisions when it comes to writing data, recognizing that the ease of recovery in Keboola may not necessarily extend to all destinations. -## Understanding Job Log and Troubleshooting +## Job Log and Troubleshooting Whether you're a seasoned data engineer or just starting out, encountering errors during development is inevitable. Here are some tips for effectively troubleshooting errors. diff --git a/tutorial/onboarding/usage-blueprint/index.md b/tutorial/onboarding/usage-blueprint/index.md index 2eb39dfcb..51b8c9ffc 100644 --- a/tutorial/onboarding/usage-blueprint/index.md +++ b/tutorial/onboarding/usage-blueprint/index.md @@ -5,95 +5,89 @@ permalink: /tutorial/onboarding/usage-blueprint/ > Welcome to your personalized Keboola Platform Usage Blueprint Document! > -> *This comprehensive guide is designed to serve as a foundation for crafting your customized documentation. -> Each section outlines its purpose and recommends content, providing you with a framework that can be tailored to your organization's specific principles -> and details. As you navigate through this document, notes formatted for guidance will accompany you, ensuring a seamless and -> straightforward customization process. Feel free to delete these notes once you've incorporated your unique insights and details into your personalized -> blueprint. Let's embark on this journey of building a tailored documentation resource that aligns perfectly with your organization's needs and practices.* +> *This detailed guide helps you create your own documentation, explaining each part and what to include. +> It's designed to fit your organization's specific needs and values. You'll find helpful notes throughout to make customization easy. +> Once you've added your own details, you can remove these notes. Let's start creating documentation that matches your organization's unique requirements!* * TOC {:toc} ## Getting Access to the Keboola Platform ### Keboola Administration -> *In the initial stages, you'll need to identify your Keboola Organization administrators who will have the authority to create new Keboola projects -> and invite initial users. It is generally recommended to keep the number of Organization administrators limited, typically ranging from 2 to 4, -> based on the organization's size.* +> *At the start, choose a few people (usually 2 to 4, depending on how big your organization is) to be your Keboola organization administrators. +> They'll have the power to set up new Keboola projects and add the first users. It's best to keep this group small.* -**Keboola Organization admin** is a role with permissions to: +**Keboola organization admin** is a role with permissions to: - Leave and re-enter all existing projects in the organization. - View and edit billing details. - Manage [shared buckets](/catalog/#sharing-types). - Create [new projects](/management/organization/#manage-projects). 
- Change [organization settings](/management/organization/#organization-settings).
-- Allow [Keboola Support team](/management/support/#require-approval-for-support-access) to join your projects.
+- Allow [Keboola support team](/management/support/#require-approval-for-support-access) to join your projects.
 Our Keboola organization administrators are:
 - Name, [email@company.com](email@company.com)
 - Name, [email2@company.com](email2@company.com)
 - Name, [email3@company.com](email3@company.com)
-#### User requesting access to a Keboola project
-> *Usually, Keboola Administrators maintain a list of current projects, including the project owner or lead project engineer.
-> Users are directed to this list to contact the project owner directly. The project owner can then follow the guide in our public documentation
-> to invite the user into a project.*
+#### User requesting access to a project
+> *Usually, Keboola administrators keep a list of all current projects, project owners, and lead project engineers.
+> If you want access to a project, find the project leader on this list and ask them directly. They can use our public guide to add you to the project.*
 >
-> *It's important to note that if you are using a single-tenant Keboola deployment with customized identity and access management, such as Active Directory,
-> this process may not apply to you. In such cases, you should describe your organization's specific process for users to follow.*
+> *Keep in mind that if you use a single-tenant Keboola deployment with customized identity and access management, such as Active Directory,
+> this process may not apply to you. In that case, describe your organization's own process.*
-To request access to any of our existing Keboola projects, please contact the Project Owner directly to invite you to the project. Below is a list of our existing Keboola projects.
+To join an existing Keboola project, reach out to the **project owner** directly and ask to be invited. We’ve listed all our current projects below.
 | Project ID | Name | Description | Owner |
 |---|---|---|---|
 | 111 | [PROD] Marketing | Production project for marketing campaign automation | Jane Doe; jane@company.com |
 #### Member accessing a project
-> *Depending on your selection, you might be operating in a multi-tenant Azure (North Europe region), AWS (US or EU regions), or a GCP (Europe region) stack,
-> or in your dedicated single-tenant stack. The location of the deployment determines the root URL that'll take you to the platform's UI.
-> Please find the relevant link for your deployment:*
+> *Depending on your selection, you might be operating in a multi-tenant Azure (North Europe region), AWS (US or EU regions),
+> or GCP (Europe region) stack, or a dedicated single-tenant stack. The location of the stack determines the base URL that’ll take you to the platform’s UI.
+> Check below for the link that matches your stack:* > -> *Azure North Europe: [https://connection.north-europe.azure.keboola.com/admin/](https://connection.north-europe.azure.keboola.com/admin/) -> AWS EU: [https://connection.eu-central-1.keboola.com/admin](https://connection.eu-central-1.keboola.com/admin) -> AWS US: [https://connection.keboola.com/admin](https://connection.keboola.com/admin) -> GCP EU: [TODO Please insert the link for GCP EU] -> Single tenant stack: All relevant information is available within your Production Design document.* +> - Azure North Europe: [https://connection.north-europe.azure.keboola.com/admin/](https://connection.north-europe.azure.keboola.com/admin/) +> - AWS EU: [https://connection.eu-central-1.keboola.com/admin](https://connection.eu-central-1.keboola.com/admin) +> - AWS US: [https://connection.keboola.com/admin](https://connection.keboola.com/admin) +> - GCP EU: [TODO Please insert the link for GCP EU] +> - Single tenant stack: All relevant information is available within your Production Design document. -Navigate to the login site of our Keboola platform here: [https://connection.keboola.com/admin](https://connection.keboola.com/admin). -After you log in, you'll see a list of projects you have access to. Click on the selected project name to access the project environment. +Navigate to the login site of the Keboola platform here: [https://connection.keboola.com/admin](https://connection.keboola.com/admin). +After you log in, you'll see a list of projects you can access. Click on the selected project name to access the project environment. -#### Requesting a new Keboola project -If you wish to develop your own use-cases in Keboola, please reach out to one of the Organization administrators mentioned above to create a project for you. +#### Requesting a new project +If you wish to develop your own use cases in Keboola, reach out to one of the organization administrators mentioned above to create a project for you. -> *To request the creation of a new project, the requester should get in touch with a Keboola Administrator. -> The Organization Administrator holds the exclusive role of managing Keboola projects.* +> *To have a new project created, contact a Keboola administrator. Organization administrators are the only ones who can set up Keboola projects.* > -> *The specific process may vary for each organization. In larger organizations, it is common to implement a questionnaire or form that users can use -> to request a project. This form often provides administrators with additional details they may need for the project creation process.* +> *The way to request a new project might vary based on the company size. In larger companies, using a form or questionnaire for project requests is common. +> This helps give the administrators the extra information they need to create the project.* -#### Processing user termination -Terminated users must be manually removed from all projects they are members of. Keboola Organization administrators can leverage Telemetry data -(refer to the [**Keboola Governance Guide**](/tutorial/onboarding/governance-guide/) for more details) to identify the projects and -subsequently remove users from individual projects. +#### User termination +Terminated users must be manually removed from all projects they are members of. 
Keboola organization administrators can use telemetry data
+(see the [**Keboola Governance Guide**](/tutorial/onboarding/governance-guide/) for more details) to determine which projects to remove them from.
-Alternatively, Project Owners can take the responsibility to remove terminated users from their respective projects.
+Or, project owners can remove the terminated user themselves.
-It's important to note that removing a user from a project will not affect any configurations created by them. Configurations will still remain usable
-and functional after the user is removed.
+***Note:** Removing a user from a project will not affect any configurations they set up.
+All their configurations will remain usable and functional after the user is removed.*
 ### Project Naming Conventions
-> *It is advisable to establish and uphold naming conventions for project names. The chosen names should be clear and indicative of the project's purpose.
-> Below, we suggest some of the typical conventions. Please refer to the [Multi-Project Architecture Guide](/tutorial/onboarding/architecture-guide/)
-> for details about different project levels/stages etc.*
+> *It is a good idea to establish and follow a convention for creating project names so they are clear and show what the project is about.
+> We suggest some of the typical conventions below. See the [Multi-Project Architecture Guide](/tutorial/onboarding/architecture-guide/)
+> for more on how to name different project levels, stages, etc.*
 >
-> *In the following example, numerical codes such as 00, 10, 20 are used for project levels, but they can also be represented as L0, L1, L2, or other variations.*
+> *In the following example, we use numerical codes like 00, 10, and 20 to show project levels, but you can also use L0, L1, L2, or other styles.*
-All projects within our organization adhere to a specific naming convention:
+All projects in our organization follow this naming convention:
 `[STAGE]{Domain - optional}[Region - optional] Project Name`
-For the **Project Name** part, the convention dictates capitalizing the initial letter of each word, except for conjunctions like "and" and "or."
+In the **Project Name**, we capitalize the first letter of each word, except for conjunctions like "and" and "or."
-Here are examples of project names:
+Here are examples of how we name projects:
 - `[10]{Sales}[EU] Financial Reporting`
 - `[00]{Sales} Corporate Profiling`
@@ -101,27 +95,24 @@ ## Keboola Project Rules and Principles
 ### Managing Project Users
-> *As we mentioned in the previous chapters it’s typical to identify a Project Owner who is mainly responsible for managing project users.
-> Keboola identifies several project user roles.*
+> *As mentioned before, a project owner is mainly responsible for managing project users. Keboola has different roles for project users.*
-The Project Owner bears the responsibility of ensuring that users are invited into their projects with the suitable roles.
-Keboola user roles are documented [here](https://help.keboola.com/management/project/users/#user-roles).
-It's essential to understand that the status of Project Owner is purely a formal role and
-doesn't directly correspond to specific Keboola project roles and privileges. In most cases, users are invited under a project admin role
-(or a share role) unless there is a specific requirement for a different role.
+The project owner needs to ensure users are added with the right roles.
Keboola user roles are documented [here](https://help.keboola.com/management/project/users/#user-roles).
+Remember, being a project owner is a formal role and doesn’t directly correspond to specific Keboola project roles and privileges.
+Usually, users are invited as project admins or with a sharing role unless they need a different one.
-To invite or remove a user from your project, please refer to the official steps outlined in the Keboola documentation [here](https://help.keboola.com/management/project/users/#inviting-user).
+To invite or remove a user from your project, follow the steps in the [Keboola documentation](https://help.keboola.com/management/project/users/#inviting-user).
 ### Naming Conventions
-> *Implementing naming conventions for all components in Keboola is recommended. This ensures that the project remains organized, comprehensible,
-> and simpler to manage and navigate. Keep in mind that the following is just a suggestion, as there is no universally recognized best practice
-> for naming conventions.*
+> *Using naming conventions for all Keboola components is recommended to keep your project well organized, comprehensible, and simpler to manage and navigate.
+> The guidelines below are just suggestions, as there is no universally recognized best practice for naming.*
-All configurations for Keboola components, including data source and data destination connectors, Transformations, Workspaces, Flows, and others, along with any Storage objects created, are required to adhere to the standard naming conventions outlined below.
+Make sure to apply the naming rules below to all configurations of Keboola components, including data source and data destination connectors,
+transformations, workspaces, and flows, as well as to any Storage objects you create.
 #### Component configurations
-The name of each component (data source/data destination connector or application) should include a use case/category (if possible), a domain (if applicable),
-and the configuration name should follow the convention of capitalizing the first letter.
+When naming a component (like a data connector or application), include its use case or category and, if relevant, its domain.
+Capitalize the first letter of each word in the configuration name, except for conjunctions like "and."
 `{Domain}[USECASE - optional] Configuration Name`
 Examples:
 - `{Sales}[REPORTING] Payments and Invoices`
 - `{Operations}[PLANNING] Workloads and Plan`
-Certain components, particularly data source connectors, may incorporate configuration rows. For example, in a database configuration,
-a configuration row signifies a connection to a specific individual table. In this scenario, it is recommended to maintain the specific source object name
-as the configuration row's name. If it's an API or service connector, the name should represent a distinct endpoint or domain.
+For components with configuration rows, like connectors loading data from a database, name each row after the specific table it connects to.
+If it's an API or service connector, the row name should reflect the specific endpoint or domain.
 #### Transformations
-For transformations and optional transformation folders, employ a systematic approach such as `[USECASE - optional] Transformation Name` to categorize
-and distinguish different transformations.
+When naming transformations or optional folders, use a format like `[USECASE - optional] Transformation Name` to keep them organized.
Examples: - `[REPORTING] Payment Data Preprocessing` - `[REPORTING] Invoices Denormalization` -If it's anticipated that individual transformations are components of a larger process and will consistently be executed together in a specific order, -consider incorporating a stage sign, like: +If transformations are part of a bigger process and run in a set order, add a number to show the sequence: - `[REPORTING][00] Payment Data Preprocessing` - `[REPORTING][01] Invoices Denormalization` - `[REPORTING][02] Main Report Calculation` -Transformations can be further grouped in Transformation Folders. Naming of such folders should follow a convention of `[USECASE] Transformation Folder Name`. +Transformations can also be grouped into transformation folders. Name such folders using the following format: `[USECASE] Transformation Folder Name`. Example: - `[REPORTING] Financial Reporting` #### Workspaces -Names of private workspaces are completely up to the user. For shared workspaces, please follow the convention of `[USECASE]{Owner-optional} Workspace Name`. +You can name private workspaces however you like. For shared workspaces, use `[USECASE]{Owner-optional} Workspace Name`. Example: - `[REPORTING]{Jane} ML Model Development` #### Flows -Flow names should clearly articulate their purpose and use case. If there are dependencies between flows within one project, you can replicate the stage signs -as suggested in the Transformations section above. Also, you can introduce a Domain unless the Domain is given by the project directly. -The convention is `[USECASE]{Domain}[STAGE] Flow Name`. +Flow names need to state their purpose and use case. If flows depend on each other in a project, use the stage signs like in the Transformations section. +You can add a domain if it’s not part of the project. The convention is `[USECASE]{Domain}[STAGE] Flow Name`. Examples: - `[REPORTING]{Sales} Main Reporting Calculations` - `[PLANNING][00] Data extractions` - `[PLANNING][01] Data normalization` -Flows can be further grouped in Flows Folders. Naming of such folders should follow a convention of `{Group} Flow Folder Name`. -The `{Group}` can be for example the `Domain` or `STAGE` or simply anything you’d like to group your flows by. +Flows can also be grouped into flow folders. Name such folders using the following format: `{Group} Flow Folder Name`. +The `{Group}` can be, for example, the `Domain` or `STAGE` or simply anything that helps to organize your flows. Example: - `{Sales} Financial Reporting` #### Storage -Please be aware that certain objects, specifically Storage bucket names and Storage table names, are automatically generated by Keboola data source connectors. -The conventions outlined below are applicable solely to objects created through configurations, where users have the ability to set the object names. - -For Storage buckets, tables, and columns, use the following rules: -1. Use uppercase `SNAKE_CASE` type of naming convention. -2. Avoid using `OUT` buckets for temporary tables, by putting something in an out bucket you're de facto declaring "these are processed and validated data that are ready for consumption". Please refer to Keboola official documentation for details about IN and OUT stages of Storage Buckets [here](/storage/buckets/). -3. Buckets linked to your project from [Data Catalog](/catalog/) should be marked with a `SHARED` keyword, e.g., `SHARED_REPORTING_FINANCIAL` for a bucket that’s named `REPORTING_FINANCIAL` in the source project. -4. 
Avoid using too general words like `MAIN` and rather choose something a bit more descriptive `SALES_METRICS`. Even if it is a final output bucket from a domain specific project `Sales` it should be named as `SALES_MAIN` rather than just `MAIN`.
-5. Clearly differentiate between storage stages IN/OUT: - IN stage - Anything that comes to the project from the outside, before it is processed - Raw data from data source components, shared buckets - Tabular configurations for other components - OUT stage - Processed, output data that is ready for outputting outside – to BI tools, Snowflake, other projects.
-6. Set “_PK” and “_ID” columns within each table to clearly define primary and foreign keys.
+Keep in mind that some Storage bucket and table names are created automatically by Keboola data source connectors.
+The following naming conventions apply only to objects whose names you set yourself.
+
+For Storage buckets, tables, and columns:
+1. Use uppercase `SNAKE_CASE` naming.
+2. Don't use `OUT` buckets for temporary tables. `OUT` buckets should contain only processed, validated data that is ready for consumption. See Keboola's documentation for more on IN and OUT Storage buckets [here](/storage/buckets/).
+3. Mark buckets linked from the [Data Catalog](/catalog/) with a `SHARED` prefix, e.g., `SHARED_REPORTING_FINANCIAL` for a bucket named `REPORTING_FINANCIAL` in the source project.
+4. Be specific with names. Instead of `MAIN`, use something descriptive like `SALES_METRICS`. Even in a domain-specific project, use `SALES_MAIN` rather than just `MAIN`.
+5. Clearly separate IN/OUT Storage stages:
+   - IN stage: Incoming data, such as raw data from source connectors or shared buckets.
+   - OUT stage: Processed data ready to be used elsewhere (in BI tools, Snowflake, or other projects).
+6. Use `_PK` and `_ID` column suffixes within each table to mark primary and foreign keys.
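+
+To make these conventions concrete, below is a minimal sketch of a table definition as it might appear in a SQL transformation. This is an illustration only, not a prescribed pattern: the table and column names are hypothetical, and the syntax assumes a Snowflake-backed workspace.
+
+```sql
+-- Hypothetical OUT-stage table following the rules above: uppercase SNAKE_CASE,
+-- a descriptive name (SALES_METRICS) instead of a generic one like MAIN,
+-- and explicit _PK / _ID columns marking the primary and foreign keys.
+CREATE TABLE "SALES_METRICS" (
+    "INVOICE_PK"   VARCHAR NOT NULL,  -- primary key of this table
+    "CUSTOMER_ID"  VARCHAR,           -- foreign key referencing the customer table
+    "INVOICE_DATE" DATE,
+    "AMOUNT"       NUMBER(18, 2)
+);
+```
+
+In a Keboola transformation, a table like this would then be written to a bucket in the OUT stage through the transformation's output mapping.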