bemibrando/dataex_dwe_challenge

DataEx DWE 2023 Challenge Azure

(Developed on March 2nd, 2023.)
This is a data warehouse (DW) developed for a challenge in the Data Women Engineers 2023 bootcamp, run by DataEx in partnership with Microsoft.


Built with:

Azure
Jupyter Notebook
T-SQL

Overview

The challenge


Analyze the execution of government expenses, using data from the Transparency Portal covering Jan/22 to Mar/22.
At the end of the challenge, the deliverables are the data dictionary, the dimensional diagram of the applied model, and the Jupyter notebook.

Project Composition

1- DW_09_CHALLENGE_DICIONARIO_DADOS.pdf
A document that contains metadata about the data within the database. It serves as a reference guide to the structure, organization, and meaning of the data elements stored in the system.

2- DW_09_CHALLENGE_DIAGRAMA.pdf
Dimensional diagram used to organize and represent data for analytics and reporting purposes.

3- DESAFIO_ALUNA09.ipynb
Jupyter notebook containing the solution to the challenge.
It implements the ETL process: extracting raw data from the source, transforming it into a usable format (cleaning, filtering, and aggregating), and loading it into a destination where it can be accessed and analyzed effectively.

My Process

Built with

  • The project was developed in the Azure environment, using T-SQL in a Jupyter Notebook.
  • A log table was created to assist in the development process and to store error messages in case of failure.
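
The log-table idea can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it uses Python's built-in `sqlite3` module in place of Azure SQL and T-SQL, and the table name `etl_log`, its columns, and the `run_step` helper are hypothetical names chosen for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the Azure SQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical log table; the real table's name and columns are not given in this README.
conn.execute(
    """
    CREATE TABLE etl_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        step TEXT NOT NULL,
        status TEXT NOT NULL,
        message TEXT,
        logged_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def run_step(conn, step, sql):
    """Run one ETL statement and record either success or the error message."""
    try:
        conn.execute(sql)
        status, message = "OK", None
    except sqlite3.Error as exc:
        # Discard any partial work from the failed step before logging it.
        conn.rollback()
        status, message = "ERROR", str(exc)
    conn.execute(
        "INSERT INTO etl_log (step, status, message) VALUES (?, ?, ?)",
        (step, status, message),
    )
    conn.commit()
    return status == "OK"


# One step that succeeds and one that fails on purpose (the table does not exist).
run_step(conn, "create_stage", "CREATE TABLE stage (codigo INTEGER, nome TEXT)")
run_step(conn, "load_missing", "INSERT INTO missing_table VALUES (1)")

log_rows = conn.execute(
    "SELECT step, status, message FROM etl_log ORDER BY id"
).fetchall()
for row in log_rows:
    print(row)
```

Because the error text is written to the log instead of being lost with the failed cell, a later query against the log table shows which step failed and why.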

What I learned

  • This project was my first endeavor in the field of data engineering, and the entire ETL process was a completely new experience for me.
  • After the challenge was graded, I understood how the time dimension works and realized that I had failed to apply it in this project.
  • In the table "UnidadeGestora", there were records where the same code had different names. To handle this, I created a temporary table, partitioned the records by code, and sorted the names within each partition alphabetically. Then, I selected the first record from each partition.
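
The deduplication described for "UnidadeGestora" can be sketched like this. It is an illustration under assumptions, not the notebook's code: SQLite (via Python's `sqlite3`, version 3.25 or later for window functions) stands in for T-SQL, and the table names `unidade_gestora_raw` and `ug_ranked`, the column names, and the sample rows are all made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the Azure SQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical source table with made-up rows: codes 100 and 300 each have two names.
conn.execute("CREATE TABLE unidade_gestora_raw (codigo INTEGER, nome TEXT)")
conn.executemany(
    "INSERT INTO unidade_gestora_raw VALUES (?, ?)",
    [
        (100, "Secretaria B"),
        (100, "Secretaria A"),
        (200, "Hospital Central"),
        (300, "Universidade Y"),
        (300, "Universidade X"),
    ],
)

# Temporary table: number the names inside each code partition in alphabetical order.
conn.execute(
    """
    CREATE TEMP TABLE ug_ranked AS
    SELECT codigo,
           nome,
           ROW_NUMBER() OVER (PARTITION BY codigo ORDER BY nome) AS rn
    FROM unidade_gestora_raw
    """
)

# Keep only the first record of each partition, giving one name per code.
deduped = conn.execute(
    "SELECT codigo, nome FROM ug_ranked WHERE rn = 1 ORDER BY codigo"
).fetchall()
print(deduped)
```

The `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` expression is also available in T-SQL, where the temporary table would typically be created with `SELECT ... INTO #ug_ranked` instead of `CREATE TEMP TABLE`. Picking the alphabetically first name is an arbitrary but deterministic tie-break: it guarantees one row per code without claiming that the chosen name is the correct one.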

Author

Bianca Emi

Made with ♥ by Bianca Emi 👋 Get in touch!



Acknowledgments

  • DataEx is an established company that offers businesses consulting services in Power BI, Business Intelligence, Artificial Intelligence, Analytics, Cloud Transformation Journey, Data Management, and App Development.
  • Andre Rosa, the bootcamp instructor, is a Brazilian professional with over 20 years of experience in Information Technology.
  • Microsoft, the bootcamp supporter, is an American multinational technology corporation.
