From e1f4e558fa29aee63cdc81c6666ce7ccd32f6007 Mon Sep 17 00:00:00 2001 From: Aleksandra Nenadic Date: Thu, 14 Dec 2023 19:11:10 +0000 Subject: [PATCH] Mainly review of episode on refactoring --- _episodes/30-section3-intro.md | 2 +- _episodes/31-software-requirements.md | 4 +- _episodes/32-software-design.md | 5 +- _episodes/33-refactoring-functions.md | 272 ----------------- _episodes/33-refactoring.md | 274 ++++++++++++++++++ ...ng-decoupled-units.md => 34-decoupling.md} | 87 +++--- ...tecture.md => 35-software-architecture.md} | 3 +- 7 files changed, 327 insertions(+), 320 deletions(-) delete mode 100644 _episodes/33-refactoring-functions.md create mode 100644 _episodes/33-refactoring.md rename _episodes/{34-refactoring-decoupled-units.md => 34-decoupling.md} (84%) rename _episodes/{35-refactoring-architecture.md => 35-software-architecture.md} (99%) diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md index e969de22a..461d55f4c 100644 --- a/_episodes/30-section3-intro.md +++ b/_episodes/30-section3-intro.md @@ -2,7 +2,7 @@ title: "Section 3: Software Development as a Process" colour: "#fafac8" start: true -teaching: 5 +teaching: 10 exercises: 0 questions: - "How can we design and write 'good' software that meets its goals and requirements?" diff --git a/_episodes/31-software-requirements.md b/_episodes/31-software-requirements.md index 917726df2..87634a989 100644 --- a/_episodes/31-software-requirements.md +++ b/_episodes/31-software-requirements.md @@ -1,7 +1,7 @@ --- title: "Software Requirements" -teaching: 15 -exercises: 30 +teaching: 25 +exercises: 15 questions: - "Where do we start when beginning a new software project?" - "How can we capture and organise what is required for software to function as intended?" diff --git a/_episodes/32-software-design.md b/_episodes/32-software-design.md index 7cf76c767..33c3822c2 100644 --- a/_episodes/32-software-design.md +++ b/_episodes/32-software-design.md @@ -74,8 +74,7 @@ goal of having *maintainable* code, which is: * using meaningful and descriptive names for variables, functions, and classes * documenting code to describe it does and how it may be used * using simple control flow to make it easier to follow the code execution - * keeping functions and methods small and focused on a single task and avoiding large functions - that do a little bit of everything (also important for testing) + * keeping functions and methods small and focused on a single task (also important for testing) * *testable* through a set of (preferably automated) tests, e.g. by: * writing unit, functional, regression tests to verify the code produces the expected outputs from controlled inputs and exhibits the expected behavior over time @@ -125,7 +124,7 @@ software project and try to identify ways in which it can be improved. > {: .solution} {: .challenge} -## Technical Debt +## Poor Design Choices & Technical Debt When faced with a problem that you need to solve by writing code - it may be tempted to skip the design phase and dive straight into coding. diff --git a/_episodes/33-refactoring-functions.md b/_episodes/33-refactoring-functions.md deleted file mode 100644 index 42eae41f7..000000000 --- a/_episodes/33-refactoring-functions.md +++ /dev/null @@ -1,272 +0,0 @@ ---- -title: "Refactoring Functions to Do Just One Thing" -teaching: 30 -exercises: 20 -questions: -- "How do you refactor code without breaking it?" -- "How do you write code that is easy to test?" -- "What is functional programming?" 
-- "Which situations/problems is functional programming well suited for?" -objectives: -- "Understand how to refactor functions to be easier to test" -- "Be able to write regressions tests to avoid breaking existing code" -- "Understand what a pure function is." -keypoints: -- "By refactoring code into pure functions that act on data makes code easier to test." -- "Making tests before you refactor gives you confidence that your refactoring hasn't broken anything" -- "Functional programming is a programming paradigm where programs are constructed by applying and composing smaller and simple functions into more complex ones (which describe the flow of data within a program as a sequence of data transformations)." ---- - -## Introduction - -In this episode we will take some code and refactor it in a way which is going to make it -easier to test. -By having more tests, we can more confident of future changes having their intended effect. -The change we will make will also end up making the code easier to understand. - -## Writing tests before refactoring - -The process we are going to be following is: - -1. Write some tests that test the behaviour as it is now -2. Refactor the code to be more testable -3. Ensure that the original tests still pass - -By writing the tests *before* we refactor, we can be confident we haven't broken -existing behaviour through the refactoring. - -There is a bit of a chicken-and-the-egg problem here however. -If the refactoring is to make it easier to write tests, how can we write tests -before doing the refactoring? - -The tricks to get around this trap are: - - * Test at a higher level, with coarser accuracy - * Write tests that you intend to remove - -The best tests are ones that test single bits of code rigorously. -However, with this code it isn't possible to do that. - -Instead we will make minimal changes to the code to make it a bit testable, -for example returning the data instead of visualising it. - -We will make the asserts verify whatever the outcome is currently, -rather than worrying whether that is correct. -These tests are to verify the behaviour doesn't *change* rather than to check the current behaviour is correct. -This kind of testing is called **regression testing** as we are testing for -regressions in existing behaviour. - -As with everything in this episode, there isn't a hard and fast rule. -Refactoring doesn't change behaviour, but sometimes to make it possible to verify -you're not changing the important behaviour you have to make some small tweaks to write -the tests at all. - -> ## Exercise: Write regression tests before refactoring -> Add a new test file called `test_compute_data.py` in the tests folder. -> Add and complete this regression test to verify the current output of `analyse_data` -> is unchanged by the refactorings we are going to do: -> ```python -> def test_analyse_data(): -> from inflammation.compute_data import analyse_data -> path = Path.cwd() / "../data" -> result = analyse_data(path) -> -> # TODO: add an assert for the value of result -> ``` -> Use `assert_array_almost_equal` from the `numpy.testing` library to -> compare arrays of floating point numbers. -> -> You will need to modify `analyse_data` to not create a graph and instead -> return the data. -> ->> ## Hint ->> You might find it helpful to assert the results equal some made up array, observe the test failing ->> and copy and paste the correct result into the test. 
-> {: .solution} -> ->> ## Solution ->> One approach we can take is to: ->> * comment out the visualize (as this will cause our test to hang) ->> * return the data instead, so we can write asserts on the data ->> * See what the calculated value is, and assert that it is the same ->> Putting this together, you can write a test that looks something like: ->> ->> ```python ->> import numpy.testing as npt ->> from pathlib import Path ->> ->> def test_analyse_data(): ->> from inflammation.compute_data import analyse_data ->> path = Path.cwd() / "../data" ->> result = analyse_data(path) ->> expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211, ->> 0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094, ->> 1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312, ->> 1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578, ->> 0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417, ->> 0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707, ->> 0.50323031,0.47574665,0.45197398,0.22070227] ->> npt.assert_array_almost_equal(result, expected_output) ->> ``` ->> ->> Note - this isn't a good test: ->> * It isn't at all obvious why these numbers are correct. ->> * It doesn't test edge cases. ->> * If the files change, the test will start failing. ->> ->> However, it allows us to guarantee we don't accidentally change the analysis output. -> {: .solution} -{: .challenge} - -## Pure functions - -A **pure function** is a function that works like a mathematical function. -That is, it takes in some inputs as parameters, and it produces an output. -That output should always be the same for the same input. -That is, it does not depend on any information not present in the inputs (such as global variables, databases, the time of day etc.) -Further, it should not cause any **side effects**, such as writing to a file or changing a global variable. - -You should try and have as much of the complex, analytical and mathematical code in pure functions. - -By eliminating dependency on external things such as global state, we -reduce the cognitive load to understand the function. -The reader only needs to concern themselves with the input -parameters of the function and the code itself, rather than -the overall context the function is operating in. - -Similarly, a function that *calls* a pure function is also easier -to understand. -Since the function won't have any side effects, the reader needs to -only understand what the function returns, which will probably -be clear from the context in which the function is called. - -This property also makes them easier to re-use as the caller -only needs to understand what parameters to provide, rather -than anything else that might need to be configured -or side effects for calling it at a time that is different -to when the original author intended. - -Some parts of a program are inevitably impure. -Programs need to read input from the user, or write to a database. -Well designed programs separate complex logic from the necessary impure "glue" code that interacts with users and systems. -This way, you have easy-to-test, easy-to-read code that contains the complex logic. -And you have really simple code that just reads data from a file, or gathers user input etc, -that is maybe harder to test, but is so simple that it only needs a handful of tests anyway. 
- -> ## Exercise: Refactor the function into a pure function -> Refactor the `analyse_data` function into a pure function with the logic, and an impure function that handles the input and output. -> The pure function should take in the data, and return the analysis results: -> ```python -> def compute_standard_deviation_by_day(data): -> # TODO -> return daily_standard_deviation -> ``` -> The "glue" function should maintain the behaviour of the original `analyse_data` -> but delegate all the calculations to the new pure function. ->> ## Solution ->> You can move all of the code that does the analysis into a separate function that ->> might look something like this: ->> ```python ->> def compute_standard_deviation_by_day(data): ->> means_by_day = map(models.daily_mean, data) ->> means_by_day_matrix = np.stack(list(means_by_day)) ->> ->> daily_standard_deviation = np.std(means_by_day_matrix, axis=0) ->> return daily_standard_deviation ->> ``` ->> Then the glue function can use this function, whilst keeping all the logic ->> for reading the file and processing the data for showing in a graph: ->>```python ->>def analyse_data(data_dir): ->> """Calculate the standard deviation by day between datasets ->> Gets all the inflammation csvs within a directory, works out the mean ->> inflammation value for each day across all datasets, then graphs the ->> standard deviation of these means.""" ->> data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) ->> if len(data_file_paths) == 0: ->> raise ValueError(f"No inflammation csv's found in path {data_dir}") ->> data = map(models.load_csv, data_file_paths) ->> daily_standard_deviation = compute_standard_deviation_by_day(data) ->> ->> graph_data = { ->> 'standard deviation by day': daily_standard_deviation, ->> } ->> # views.visualize(graph_data) ->> return daily_standard_deviation ->>``` ->> Ensure you re-run our regression test to check this refactoring has not ->> changed the output of `analyse_data`. -> {: .solution} -{: .challenge} - -### Testing Pure Functions - -Now we have a pure function for the analysis, we can write tests that cover -all the things we would like tests to cover without depending on the data -existing in CSVs. - -This is another advantage of pure functions - they are very well suited to automated testing. - -They are **easier to write** - -we construct input and assert the output -without having to think about making sure the global state is correct before or after. - -Perhaps more important, they are **easier to read** - -the reader will not have to open up a CSV file to understand why the test is correct. - -It will also make the tests **easier to maintain**. -If at some point the data format is changed from CSV to JSON, the bulk of the tests -won't need to be updated. - -> ## Exercise: Write some tests for the pure function -> Now we have refactored our a pure function, we can more easily write comprehensive tests. -> Add tests that check for when there is only one file with multiple rows, multiple files with one row -> and any other cases you can think of that should be tested. 
->> ## Solution ->> You might have thought of more tests, but we can easily extend the test by parametrizing ->> with more inputs and expected outputs: ->> ```python ->>@pytest.mark.parametrize('data,expected_output', [ ->> ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]), ->> ([[[0, 2, 0]], [[0, 1, 0]]], [0, math.sqrt(0.25), 0]), ->> ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0]) ->>], ->>ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files']) ->>def test_compute_standard_deviation_by_day(data, expected_output): ->> from inflammation.compute_data import compute_standard_deviation_by_data ->> ->> result = compute_standard_deviation_by_data(data) ->> npt.assert_array_almost_equal(result, expected_output) -``` -> {: .solution} -{: .challenge} - -## Functional Programming - -**Pure Functions** are a concept that is part of the idea of **Functional Programming**. -Functional programming is a style of programming that encourages using pure functions, -chained together. -Some programming languages, such as Haskell or Lisp just support writing functional code, -but it is more common for languages to allow using functional and **imperative** (the style -of code you have probably been writing thus far where you instruct the computer directly what to do). -Python, Java, C++ and many other languages allow for mixing these two styles. - -In Python, you can use the built-in functions `map`, `filter` and `reduce` to chain -pure functions together into pipelines. - -In the original code, we used `map` to "map" the file paths into the loaded data. -Extending this idea, you could then "map" the results of that through another process. - -You can read more about using these language features [here](https://www.learnpython.org/en/Map%2C_Filter%2C_Reduce). -Other programming languages will have similar features, and searching "functional style" + your programming language of choice -will help you find the features available. - -There are no hard and fast rules in software design but making your complex logic out of composed pure functions is a great place to start -when trying to make code readable, testable and maintainable. -This tends to be possible when: - -* Doing any kind of data analysis -* Simulations -* Translating data from one format to another - -{% include links.md %} diff --git a/_episodes/33-refactoring.md b/_episodes/33-refactoring.md new file mode 100644 index 000000000..0251b6ad3 --- /dev/null +++ b/_episodes/33-refactoring.md @@ -0,0 +1,274 @@ +--- +title: "Refactoring Code" +teaching: 30 +exercises: 20 +questions: +- "How do you refactor code without breaking it?" +- "What are benefits of pure functions?" +objectives: +- "Understand the use of regressions tests to avoid breaking existing code when refactoring." +- "Understand the use of pure functions in software design to make the code easier to test." +keypoints: +- "Implementing regression tests before you refactor the code gives you confidence that your changes have not +broken anything." +- "By refactoring code into pure functions that process data without side effects makes code easier +to read, test and maintain." +--- + +## Introduction + +In this episode we will refactor the function `analyse_data()` in `compute_data.py` +from our project in the following two ways: +* add more tests so we can be more confident that future changes will have the +intended effect and will not break the existing code. 
+* split the `analyse_data()` function into a number of smaller functions, making the code
+easier to understand and test.
+
+## Writing Tests Before Refactoring
+
+When refactoring, it is useful to apply the following process:
+
+1. Write some tests that test the behaviour as it is now
+2. Refactor the code
+3. Check that the original tests still pass
+
+By writing the tests *before* we refactor, we can be confident we have not broken
+existing behaviour through refactoring.
+
+There is a bit of a "chicken and egg" problem here - if the refactoring is supposed to make it easier
+to write tests in the future, how can we write tests before doing the refactoring?
+The tricks to get around this trap are:
+
+ * Test at a higher level, with coarser accuracy
+ * Write tests that you intend to remove
+
+The best tests are ones that test single bits of functionality rigorously.
+However, with our current `analyse_data()` code that is not possible because it is a
+large function doing a little bit of everything.
+Instead, we will make minimal changes to the code to make it a bit more testable.
+
+Firstly,
+we will modify the function to return the data instead of visualising it, because graphs are harder
+to test automatically (i.e. they need to be viewed and inspected manually in order to determine
+their correctness).
+Next, we will make the assert statements verify what the output currently is,
+rather than checking whether that output is correct.
+Such tests are meant to verify that the behaviour does not *change*,
+rather than to check that the current behaviour is correct
+(there should be a separate set of tests checking correctness).
+This kind of testing is called **regression testing**, as we are testing for
+regressions in existing behaviour.
+
+Refactoring is not meant to change the behaviour of the code, but sometimes,
+in order to verify that you are not changing the important behaviour,
+you have to make small tweaks to the code to be able to write the tests at all.
+
+> ## Exercise: Write Regression Tests
+> Modify the `analyse_data()` function so that it returns the data instead of plotting a graph.
+> Then, add a new test file called `test_compute_data.py` in the `tests` folder and
+> add a regression test to verify the current output of `analyse_data()`. We will use this test
+> in the remainder of this section to verify that the output of `analyse_data()` is unchanged each time
+> we refactor or change the code in the future.
+>
+> Start from the skeleton test code below:
+>
+> ```python
+> def test_analyse_data():
+>     from inflammation.compute_data import analyse_data
+>     path = Path.cwd() / "../data"
+>     result = analyse_data(path)
+>
+>     # TODO: add an assert for the value of result
+> ```
+> Use `assert_array_almost_equal` from the `numpy.testing` library to
+> compare arrays of floating point numbers.
+>
+>> ## Hint
+>> When determining the correct return data result to use in tests, it may be helpful to assert the
+>> result equals some random made-up data, observe the test fail initially and then
+>> copy and paste the correct result into the test.
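+>>
+>> For example, you could start with a deliberately wrong placeholder assert -
+>> the `[0.0]` below is made up - run the test, and read the real values off the failure message:
+>> ```python
+>> import numpy.testing as npt
+>>
+>> npt.assert_array_almost_equal(result, [0.0])  # placeholder: the failure output reports the actual result
+>> ```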
+> {: .solution}
+>
+>> ## Solution
+>> One approach we can take is to:
+>> * comment out the call to `views.visualize()` in `analyse_data()`
+>> (as displaying the graph will block and cause our test to hang)
+>> * return the data instead, so we can write asserts on the data
+>> * see what the calculated value is, and assert that it is the same as the expected value
+>>
+>> Putting this together, your test may look like:
+>>
+>> ```python
+>> import numpy.testing as npt
+>> from pathlib import Path
+>>
+>> def test_analyse_data():
+>>     from inflammation.compute_data import analyse_data
+>>     path = Path.cwd() / "../data"
+>>     result = analyse_data(path)
+>>     expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211,
+>>                        0.25104719,0.22330897,0.89680503,0.21573875,1.24235548,0.63042094,
+>>                        1.57511696,2.18850242,0.3729574,0.69395538,2.52365162,0.3179312,
+>>                        1.22850657,1.63149639,2.45861227,1.55556052,2.8214853,0.92117578,
+>>                        0.76176979,2.18346188,0.55368435,1.78441632,0.26549221,1.43938417,
+>>                        0.78959769,0.64913879,1.16078544,0.42417995,0.36019114,0.80801707,
+>>                        0.50323031,0.47574665,0.45197398,0.22070227]
+>>     npt.assert_array_almost_equal(result, expected_output)
+>> ```
+>>
+>> Note that while the above test will detect if we accidentally break the analysis code and
+>> change the output of the analysis, it is not a good or complete test for the following reasons:
+>> * it is not at all obvious why the `expected_output` is correct
+>> * it does not test edge cases
+>> * if the data files in the directory change, the test will start failing
+>>
+>> We would need additional tests to check the above.
+> {: .solution}
+{: .challenge}
+
+## Separating Pure and Impure Code
+
+Now that we have our regression test for `analyse_data()` in place, we are ready to refactor the
+function further.
+We would like to separate out as much of its code as possible as **pure functions**.
+Pure functions are very useful and much easier to test as they take input only from their input
+parameters and produce output only via their return values.
+
+### Pure Functions
+
+A pure function in programming works like a mathematical function -
+it takes in some input and produces an output, and that output is
+always the same for the same input.
+That is, the output of a pure function does not depend on any information
+which is not present in the input (such as global variables).
+Furthermore, pure functions do not cause any *side effects* - they do not modify the input data
+or data that exist outside the function (such as printing text, writing to a file or
+changing a global variable). They perform actions that affect nothing but the value they return.
+
+### Benefits of Pure Functions
+
+Pure functions are easier to understand because they eliminate side effects.
+The reader only needs to concern themselves with the input
+parameters of the function and the function code itself, rather than
+the overall context the function is operating in.
+Similarly, a function that calls a pure function is also easier
+to understand - we only need to understand what the function returns, which will probably
+be clear from the context in which the function is called.
+Finally, pure functions are easier to reuse as the caller
+only needs to understand what parameters to provide, rather
+than anything else that might need to be configured prior to the call.
+For these reasons, you should try to write as much of the complex, analytical and mathematical
+code as possible in the form of pure functions.
+
+Some parts of a program are inevitably impure.
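+To make the distinction concrete, here is a minimal sketch (the functions below are simplified
+illustrations rather than code from our project):
+
+```python
+import numpy as np
+
+def column_means(data):
+    # pure: the result depends only on the input array and nothing outside the function is touched
+    return np.mean(data, axis=0)
+
+def print_column_means(csv_path):
+    # impure: the result depends on the contents of a file on disk, and printing is a side effect
+    data = np.loadtxt(csv_path, delimiter=',')
+    print(column_means(data))
+```
+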
+Programs need to read input from users, generate a graph, or write results to a file or a database.
+Well-designed programs separate the complex logic from the necessary impure "glue" code that
+interacts with users and other systems.
+This way, you have easy-to-read and easy-to-test pure code that contains the complex logic,
+and simplified impure code that reads data from a file or gathers user input. Impure code may
+be harder to test but, when simplified like this, it may only require a handful of tests anyway.
+
+> ## Exercise: Refactoring To Use a Pure Function
+> Refactor the `analyse_data()` function to delegate the data analysis to a new
+> pure function `compute_standard_deviation_by_day()` and separate it
+> from the impure code that handles the input and output.
+> The pure function should take in the data, and return the analysis result, as follows:
+> ```python
+> def compute_standard_deviation_by_day(data):
+>     # TODO
+>     return daily_standard_deviation
+> ```
+>> ## Solution
+>> The analysis code will be refactored into a separate function that may look something like:
+>> ```python
+>> def compute_standard_deviation_by_day(data):
+>>     means_by_day = map(models.daily_mean, data)
+>>     means_by_day_matrix = np.stack(list(means_by_day))
+>>
+>>     daily_standard_deviation = np.std(means_by_day_matrix, axis=0)
+>>     return daily_standard_deviation
+>> ```
+>> The `analyse_data()` function now calls the `compute_standard_deviation_by_day()` function,
+>> while keeping all the logic for reading the data, processing it and showing it in a graph:
+>>```python
+>>def analyse_data(data_dir):
+>>    """Calculate the standard deviation by day between datasets
+>>    Gets all the inflammation csvs within a directory, works out the mean
+>>    inflammation value for each day across all datasets, then graphs the
+>>    standard deviation of these means."""
+>>    data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv'))
+>>    if len(data_file_paths) == 0:
+>>        raise ValueError(f"No inflammation csv's found in path {data_dir}")
+>>    data = map(models.load_csv, data_file_paths)
+>>    daily_standard_deviation = compute_standard_deviation_by_day(data)
+>>
+>>    graph_data = {
+>>        'standard deviation by day': daily_standard_deviation,
+>>    }
+>>    # views.visualize(graph_data)
+>>    return daily_standard_deviation
+>>```
+>> Make sure to re-run the regression test to check this refactoring has not
+>> changed the output of `analyse_data()`.
+> {: .solution}
+{: .challenge}
+
+### Testing Pure Functions
+
+Now that we have our analysis implemented as a pure function, we can write tests that cover
+all the things we would like to check without depending on CSV files.
+This is another advantage of pure functions - they are very well suited to automated testing,
+i.e. their tests are:
+* **easier to write** - we construct input and assert the output
+without having to think about making sure the global state is correct before or after
+* **easier to read** - the reader will not have to open a CSV file to understand why
+the test is correct
+* **easier to maintain** - if at some point the data format changes
+from CSV to JSON, the bulk of the tests need not be updated
+
+> ## Exercise: Testing a Pure Function
+> Add tests for `compute_standard_deviation_by_day()` that check for situations
+> when there is only one file with multiple rows,
+> multiple files with one row, and any other cases you can think of that should be tested.
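+>
+>> ## Hint
+>> One convenient way to cover several situations in a single test is `pytest.mark.parametrize`.
+>> A possible skeleton (a sketch only - further input data and expected values are left for you to fill in):
+>> ```python
+>> import pytest
+>> import numpy.testing as npt
+>>
+>> @pytest.mark.parametrize('data, expected_output', [
+>>     ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]),  # two patients in one file
+>>     # TODO: add more (input data, expected output) pairs covering the other cases
+>> ])
+>> def test_compute_standard_deviation_by_day(data, expected_output):
+>>     from inflammation.compute_data import compute_standard_deviation_by_day
+>>     result = compute_standard_deviation_by_day(data)
+>>     npt.assert_array_almost_equal(result, expected_output)
+>> ```
+> {: .solution}
+>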
+>> ## Solution
+>> You might have thought of more tests, but we can easily extend the test by parametrizing
+>> it with more inputs and expected outputs:
+>> ```python
+>>@pytest.mark.parametrize('data,expected_output', [
+>>    ([[[0, 1, 0], [0, 2, 0]]], [0, 0, 0]),
+>>    ([[[0, 2, 0]], [[0, 1, 0]]], [0, math.sqrt(0.25), 0]),
+>>    ([[[0, 1, 0], [0, 2, 0]], [[0, 1, 0], [0, 2, 0]]], [0, 0, 0])
+>>],
+>>ids=['Two patients in same file', 'Two patients in different files', 'Two identical patients in two different files'])
+>>def test_compute_standard_deviation_by_day(data, expected_output):
+>>    from inflammation.compute_data import compute_standard_deviation_by_day
+>>
+>>    result = compute_standard_deviation_by_day(data)
+>>    npt.assert_array_almost_equal(result, expected_output)
+>> ```
+> {: .solution}
+{: .challenge}
+
+> ## Functional Programming
+> **Functional programming** is a programming paradigm where programs are constructed by
+> applying and composing/chaining pure functions.
+> Some programming languages, such as Haskell or Lisp, support writing pure functional code only.
+> Other languages, such as Python, Java and C++, allow mixing **functional** and **procedural**
+> programming paradigms.
+> Read more about functional programming, and the situations in which it is particularly useful
+> (e.g. to employ a MapReduce approach for data processing),
+> in the [extra episode on functional programming](/functional-programming/index.html).
+{: .callout}
+
+There are no definite rules in software design, but building your complex logic out of
+composed pure functions is a great place to start when trying to make your code readable,
+testable and maintainable. This is particularly useful for:
+
+* data processing and analysis
+(for example, using the [Python Pandas library](https://pandas.pydata.org/) for data manipulation, where most functions behave as pure functions)
+* simulations
+* translating data from one format to another
+
+{% include links.md %}
diff --git a/_episodes/34-refactoring-decoupled-units.md b/_episodes/34-decoupling.md
similarity index 84%
rename from _episodes/34-refactoring-decoupled-units.md
rename to _episodes/34-decoupling.md
index a9e82d9a9..02ab7044a 100644
--- a/_episodes/34-refactoring-decoupled-units.md
+++ b/_episodes/34-decoupling.md
@@ -1,5 +1,5 @@
 ---
-title: "Using Classes to De-Couple Code"
+title: "Decoupling Code"
 teaching: 30
 exercises: 45
 questions:
@@ -19,27 +19,33 @@ keypoints:
 
 ## Introduction
 
-When we're thinking about units of code, one important thing to consider is
-whether the code is **decoupled** (as opposed to **coupled**).
-Two units of code can be considered decoupled if changes in one don't
-necessitate changes in the other.
-While two connected units can't be totally decoupled, loose coupling
-allows for more maintainable code:
+In software design, an important consideration is the extent to which different components
+and smaller units of code are **coupled**.
+Two units of code can be considered **decoupled** if a change in one does not
+necessitate a change in the other.
+While two connected units cannot always be totally decoupled, **loose coupling**
+is something we should aim for. Benefits of decoupled code include:
 
-* Loosely coupled code is easier to read as you don't need to understand the
+* easier to read as you do not need to understand the
   detail of the other unit.
-* Loosely coupled code is easier to test, as one of the units can be replaced
-  by a test or mock version of it.
-* Loose coupled code tends to be easier to maintain, as changes can be isolated +* easier to test, as one of the units can be replaced + by a test or a mock version of it. +* code tends to be easier to maintain, as changes can be isolated from other parts of the code. -Introducing **abstractions** is a way to decouple code. +## Abstractions + +We have already mentioned abstractions as a principle that simplifies complexity by +hiding details and focusing on high-level view and efficiency. + + +Abstractions are a way of decoupling code. If one part of the code only uses another part through an appropriate abstraction then it becomes easier for these parts to change independently. -> ## Exercise: Decouple the file loading from the computation -> Currently the function is hard coded to load all the files in a directory. -> Decouple this into a separate function that returns all the files to load +> ## Exercise: Decouple Data Loading from Analysis +> Loading data from CSV files in a directory is baked into the `analyse_data()` function. +> Decouple this into a separate function that returns all the files to load. >> ## Solution >> You should have written a new function that reads all the data into the format needed >> for the analysis: @@ -56,45 +62,44 @@ then it becomes easier for these parts to change independently. >> def analyse_data(data_dir): >> data = load_inflammation_data(data_dir) >> daily_standard_deviation = compute_standard_deviation_by_data(data) ->> ... +>> ... >> ``` ->> This is now easier to understand, as we don't need to understand the the file loading ->> to read the statistical analysis, and we don't have to understand the statistical analysis ->> when reading the data loading. ->> Ensure you re-run our regression test to check this refactoring has not ->> changed the output of `analyse_data`. +>> The code is now easier to follow since we do not need to understand the the data loading from +>> files to read the statistical analysis, and vice versa - we do not have to understand the +>> statistical analysis when looking at data loading. +>> Ensure you re-run the regression tests to check this refactoring has not +>> changed the output of `analyse_data()`. > {: .solution} {: .challenge} -Even with this change, the file loading is coupled with the data analysis. -For example, if we wave to support reading JSON files or CSV files -we would have to pass into `analyse_data` some kind of flag indicating what we want. - -Instead, we would like to decouple the consideration of what data to load -from the `analyse_data`` function entirely. +However, even with this change, the data loading is still coupled with the data analysis. +For example, if we have to support loading data from different sources +(e.g. JSON files and CSV files), we would have to pass some kind of a flag indicating +what we want into `analyse_data()`. Instead, we would like to decouple the +consideration of what data to load from the `analyse_data()` function entirely. -One way we can do this is to use a language feature called a **class**. +One way we can do this is to use an object-oriented language feature called a *class*. -## Using Python Classes +## Classes -A class is a way of grouping together data with some specific methods. -In Python, you can declare a class as follows: +A class is a way of grouping together data with some specific methods on that data. +In Python, you can **declare** a class as follows: ```python class Circle: pass ``` -They are typically named using `UpperCase`. 
+They are typically named using "CapitalisedWords" naming convention. -You can then **construct** a class elsewhere in your code by doing the following: +You can then **construct** a class **instance** elsewhere in your code by doing the following: ```python my_circle = Circle() ``` -When you construct a class in this ways, the classes **construtor** is called. -It is possible to pass in values to the constructor that configure the class: +When you construct a class in this ways, the class' **constructor** is called. +It is also possible to pass in values to the constructor to configure the class instance: ```python class Circle: @@ -104,15 +109,15 @@ class Circle: my_circle = Circle(10) ``` -The constructor has the special name `__init__` (one of the so called "dunder methods"). -Notice it also has a special first parameter called `self` (called this by convention). +The constructor has the special name `__init__`. +Notice it has a special first parameter called `self` by convention. This parameter can be used to access the current **instance** of the object being created. A class can be thought of as a cookie cutter template, and the instances are the cookies themselves. That is, one class can have many instances. -Classes can also have methods defined on them. +Classes can also have other methods defined on them. Like constructors, they have an special `self` parameter that must come first. ```python @@ -130,11 +135,11 @@ Here the instance of the class, `my_circle` will be automatically passed in as the first parameter when calling `get_area`. Then the method can access the **member variable** `radius`. -> ## Exercise: Use a class to configure loading +> ## Exercise: Use Classes to Abstract out Data Loading > Put the `load_inflammation_data` function we wrote in the last exercise as a member method > of a new class called `CSVDataSource`. > Put the configuration of where to load the files in the classes constructor. -> Once this is done, you can construct this class outside the the statistical analysis +> Once this is done, you can construct this class outside the statistical analysis > and pass the instance in to `analyse_data`. >> ## Hint >> When we have completed the refactoring, the code in the `analyse_data` function @@ -172,7 +177,7 @@ Then the method can access the **member variable** `radius`. >> We can now pass an instance of this class into the the statistical analysis function. >> This means that should we want to re-use the analysis it wouldn't be fixed to reading >> from a directory of CSVs. ->> We have "decoupled" the reading of the data from the statistical analysis. +>> We have fully decoupled the reading of the data from the statistical analysis. >> ```python >> def analyse_data(data_source): >> data = data_source.load_inflammation_data() diff --git a/_episodes/35-refactoring-architecture.md b/_episodes/35-software-architecture.md similarity index 99% rename from _episodes/35-refactoring-architecture.md rename to _episodes/35-software-architecture.md index a00390828..3fad1388d 100644 --- a/_episodes/35-refactoring-architecture.md +++ b/_episodes/35-software-architecture.md @@ -1,5 +1,5 @@ --- -title: "Architecting Code to Separate Responsibilities" +title: "Software Architecture" teaching: 15 exercises: 50 questions: @@ -18,6 +18,7 @@ keypoints: ## Introduction +Separating Responsibilities Model-View-Controller (MVC) is a way of separating out different responsibilities of a typical application. Specifically we have: