-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpython2.qmd
executable file
·595 lines (412 loc) · 19.2 KB
/
python2.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
---
title: Python Unit 2
execute:
enabled: false
---
## Control flow
The Python interpreter executes the statements in a Python script one by one, and quits after the last statement.
```python
a = 4
b = 9
c = 5
mean = (a + b + c) / 3
print(mean)
```
Evidently, this basic flow of control only allows you to write the most simple programs. Therefore, Python supports compound statements that alter the control flow.
:::{.callout-note}
You may run all of the following examples in interactive mode. However, if you are writing a compound statement, entering several indented lines may be tedious. Thus, you might prefer to create one Python script per example (such as `func_test.py`) and then run this script (e.g., `python func_test.py`).
:::
### Functions
*Call* a function by writing
- function name
- left parenthesis
- arguments separated by commas
- right parenthesis
We already encountered `print()`, a so-called *built-in function* that prints its argument(s) to the standard output (stdout).
```python
print("Hello world!")
```
*Define* a function by writing
- keyword `def`
- function name
- left parenthesis
- parameters separated by commas
- right parenthesis
- colon
- indented statements (suite)
The following function calculates the mean of three numbers:
```python
def mean_of_three(a, b, c):
result = (a + b + c) / 3
return result
m = mean_of_three(5, 8, 2)
print(m)
```
A function may pass values to the caller by using the `return` statement. In the example above, Python assigns the return value to the variable `m`.
- A statement such as `return value1, value2, value3` tells a function to return *several* values.
- A function that *lacks* a `return` statement implicitly returns `None`.
In order to correctly use functions, we need to understand the difference between parameters and arguments:
- A *parameter* (also called “formal parameter”) is part of the function definition and specifies the number (and sometimes also the type) of input values the function may receive.
- An *argument* (also called “actual parameter”) is the value that is passed to a function when it is called.
When defining a function, we need to distinguish between required and optional parameters:
- A *required parameter* must receive a value when the function is called.
- An *optional parameter* may be omitted when the function is called. In this case, a default value that was specified in the function definition is assigned to the parameter.
Below, we define an exponential function. The parameter `exponent` is required; by contrast, `base` is optional, with a default value of 2.71828.
```python
def exp(exponent, base=2.71828):
return base ** exponent
print(exp(4)) # equal to exp(4, 2.71828)
print(exp(4, 10))
```
When calling a function, we need to distinguish between positional and keyword arguments:
- A *positional argument* is assigned to a parameter based on its position.
- A *keyword argument* is assigned to a parameter based on its key.
```python
print(exp(4, 10)) # 4 -> exponent, 10 -> base
print(exp(base=10, exponent=4)) # 10 -> base, 4 -> exponent
```
### If statements
if statements will execute a block of code if and only if a condition is true:
```python
n = -1
if n < 0:
print(n, "is negative")
```
Create an if statement by writing
- keyword `if`
- condition (an expression that yields a Boolean value)
- colon
- indented statements (suite)
Python executes the code block following the if statement if the condition is true. If Python should also do something if the *condition is false*, you must add an `else` block:
```python
n = 5
if n < 0:
print(n, "is negative")
else:
print(n, "is positive")
```
`elif` blocks allow you to *chain* if statements, i.e., to test an arbitrary number of conditions and execute a distinct set of statements for each condition. Thus, our final if statement, which also checks whether a provided number is zero, is:
```python
if n < 0:
print(n, "is negative")
elif n > 0:
print(n, "is positive")
else:
print(n, "is zero")
```
:::{.callout-note #example-function-with-if}
#### Example
Let's put the if statement which we developed above into a function. This will allow us to test several numbers for their sign. We will replace the `print()` statements with `return`, since our function should not print its result, but rather return it.
The function will format the result by means of an *f-string* (“formatted string literal”). An f-string is written as `f" ... "`. It behaves like a normal string, but each part enclosed in curly braces `{}` is interpreted as Python statement whose result is inserted into the string.
```python
f"An addition: {2+3}"
#> 'An addition: 5'
```
Store the following code in `test_sign.py` and execute the file with Python:
```python
def test_sign(n):
if n < 0:
return f"{n} is negative"
elif n > 0:
return f"{n} is positive"
else:
return f"{n} is zero"
print(test_sign(102))
print(test_sign(-12))
print(test_sign(0))
```
:::
### Loops
#### `while` loops
A `while` loop executes its suite as long as the given condition is true. Create such a loop by writing
- keyword `while`
- condition (an expression that yields a Boolean value)
- colon
- indented statements (suite, loop body)
Store the following code in `calc_powers.py` and execute the file with Python:
```python
i = 1
while i < 10000:
i = i * 2
print(i)
```
Typically, the condition will become false after several iterations of the loop body. Above, the variable `i` receives a new value whenever the loop body is run.
#### `for` loops
In Python, a `for` loop is a so-called *iterative loop*, since it executes the same code block once for each value in a sequence and may access the current element within the loop body.
Create a for loop by writing
- keyword `for`
- loop variable
- keyword `in`
- iterable object (e.g., string or list)
- colon
- indented statements (suite, loop body)
```python
for letter in "Python":
print(letter)
```
The loop above is executed six times, and during each run, the loop variable `letter` contains a different character of the string `"Python"` (i.e., the values `"P"`, `"y"`, `"t"`, `"h"`, `"o"`, and `"n"`).
:::{.callout-note #example-counter}
#### Example
A common theme is to iterate over a list and store some information on each list element for later. Below, we initialize a *counter variable* with zero. We then iterate over a list of numbers and add each element to the counter variable. After the loop has finished, this variable therefore contains the sum of all numbers in the list.
```python
counter = 0
for n in [1, 5, 13, 9, 4]:
counter = counter + n
print(counter)
#> 32
```
:::
Sometimes we wish to iterate over a long sequence of numbers. It would be cumbersome if we had to explicitly write a list containing the numbers from, say, 0 to 10000 – there must be a simpler approach, right? Indeed, there is the built-in `range()` function, which generates a sequence of numbers.
Depending on the numbers of arguments passed to `range()`, it will return
- a sequence of integers from 0 to a final value (excluded)
```python
for i in range(10):
print(i)
#> 0, 1, …, 9
```
- a sequence of integers from an initial value (included) to a final value (excluded)
```python
for i in range(2, 7):
print(i)
#> 2, 3, 4, 5, 6
```
- a sequence of integers as above, using the third argument as step size:
```python
for i in range(20, 3, -3):
print(i)
#> 20, 17, 14, 11, 8, 5
```
### Modularization
Modular programming is a technique that allows to split code into several files, each of which contains related functions and data types. Thereby, code may easily be reused, since a function
- has to be implemented only once in a *module* (also called *package*)
- and then may be used by every program that *imports* this module.
For instance, the `math` module defines common mathematical functions. (`math` is part of the Python *standard library*, whose modules are available on each Python installation. Other modules must be installed by the user.)
The keyword `import` imports a module and creates a *namespace* with the same name, which allows us to call a function from the module as follows:
- namespace
- `.`
- function name
```python
import math
math.factorial(7)
```
We also may select a different name for the namespace upon importing:
```python
import math as mathematics
mathematics.factorial(7)
```
Moreover, individual (or all) functions of a module may be imported into the *global namespace*:
```python
from math import factorial
factorial(7)
```
To increase readability of your code, you should place all import statements at the top of your Python script.
## Data structures
### Lists
Lists belong to the class of *compound data types*, which serve as a “collection” of other data types. Lists are created by enclosing comma-separated elements in brackets. For instance, the following list comprises the first eight prime numbers:
```python
primes = [2, 3, 5, 7, 11, 13, 17, 19]
primes
```
Indexing and slicing work [as explained for strings](python1.qmd#text-type), returning a single element or a new list containing the sliced elements, respectively:
```python
primes[3]
primes[2:5]
```
Since lists are *mutable*, you may change their elements:
```python
primes[3] = 100
primes
```
The built-in function `len()` returns the number of elements in a list:
```python
len(primes)
```
Importantly, you may iterate over a list via a `for` loop:
```python
for p in primes:
print(p)
```
The elements of a list may be lists themselves, which yields *nested lists*:
```python
nested_list = [
[1, 2, 3],
[4, 5],
[6, 7, 8, 9]
]
nested_list
```
Indexing and slicing also work for nested lists:
```python
nested_list[2]
nested_list[2][1]
nested_list[2][1:3]
```
Since lists are [objects](python3.qmd#object-oriented-programming), they may be manipulated by special functions called *methods*. Methods are called like functions, but refer to a given object. Thus, the syntax for calling a method is as follows:
- name of the object
- `.`
- name of the method
- left parenthesis etc. (like a function call)
Here are some of the methods supported by lists (check the contents of `bases` after every statement!):
```python
bases = ["A", "G", "X", "Y"]
bases.append("Z") # append an element to the list
del bases[2] # delete element with index 2
l = bases.pop() # remove the last element and return it
bases.extend(["T", "U"]) # appends elements from another list
bases.insert(1, "C") # insert the element given by the second argument
# at the index given by the first argument
bases.reverse() # reverse order of elements
bases.clear() # delete all elements
```
### Tuples
Tuples are similar to lists, since they also represent a sequence of items. In contrast to lists, however, tuples are *immutable* – once you have created a tuple, you can't change it. Tuples support common list operations:
```python
bases = ("A", "C", "C", "G", "T")
"A" in bases # check whether the tuple contains an element
bases[1:3] # slicing
len(bases) # number of elements
min(bases) # smallest element
max(bases) # largest element
bases.count("C") # count the number of a given element
bases.index("G") # find the index of a given element
```
### Sets
The *set* is a sequential data type that represents a mathematical set – thus, sets are *unordered*. Both mutable and immutable variants are available (`set` and `frozenset`, respectively). We create a set by placing their elements between curly braces, or by converting from another sequential type:
```python
A = {2, 4, 5} # variant 1: direct
A1 = set([2, 4, 5]) # variant 2: converting a list
A == A1 # are these sets equivalent?
```
Besides adding and removing elements, you may also perform the classical set operations:
```python
B = {1, 2, 3, 4, 5}
B.discard(5) # remove element if it exists
B.add(7) # add element
B.add(7) # set does not change – each element must be unique!
A | B # union (OR)
A & B # intersection (AND)
A - B # difference
B - A # note that the difference is not commutative
A ^ B # symmetric difference (XOR)
```
### Dictionaries
Dictionaries (“dicts”) are unordered collections of *key-value pairs*, where keys have to be unique. Create a dict
- either by separating key-value pairs by colons and enclosing comma-separated pairs with curly braces,
```python
atom_names = {"C": "carbon", "H": "hydrogen", "N": "nitrogen"}
atom_names
```
- or by using the built-in function `dict()`, to which we supply the key-value-pairs as keyword arguments:
```python
atom_names_2 = dict(C="carbon", H="hydrogen", N="nitrogen")
atom_names_2
atom_names_2 == atom_names
```
Typical dict operations include inserting, deleting, and – most importantly – searching:
```python
atom_names["C"] # find the value associated with key "S"
atom_names["O"] = "oxygen" # insert a new key-value pair
del atom_names["H"] # delete the value associated with "H"
```
You may iterate over all key-value pairs in a dict via the `.items()` method and a `for` loop:
```python
for key, value in atom_names.items():
print(key, value)
```
## Exercises
Store all files that you generate for unit 2 in the folder `python2` in your home directory.
### Exercise 2.1 (3 P)
You have successfully solved this exercise as soon as you have worked through this unit. In particular, the folder `python2` must contain the following files, which you have created in the course of this unit:
- `test_sign.py`
- `calc_powers.py`
### Exercise 2.2 (3 P)
Implement a function `get_charge` that checks whether an amino acid is positively charged (e.g., arginine), negatively charged (e.g., aspartate), or neutral (e.g., valine).
- The function should be called with a single argument that specifies the amino acid using its single-letter abbreviation. Only the 21 eukaryotic proteinogenic amino acids should be considered.
- Depending on the charge, the function should return one of the strings `"positive"`, `"negative"`, or `"neutral"`.
- If the user does not supply a valid abbreviation, the function should return `"invalid input"`.
Store the function in `exercise_2_2.py`.
You may check that your program works correctly by using the following exemplary calls:
```python
print(get_charge("D"))
#> "negative"
print(get_charge("F"))
#> "neutral"
print(get_charge("H"))
#> "positive"
print(get_charge("foo"))
#> "invalid input"
```
:::{.callout-tip collapse="true"}
You have [learned](#example-function-with-if) how to put an if statement into a function. In the section on [tuples](#tuples), you checked whether a tuple contained a given element. A similar test will be required for the if statement in this exercise.
:::
### Exercise 2.3 (3 P)
Implement a function `get_average_gc` that calculates the average GC-content of several DNA or RNA sequences.
- The function should be called with a single argument. It will receive a list containing an arbitrary number of strings, each of which represents a nucleotide sequence.
- The function should return a single number, i.e., the average GC-content across all sequences.
Store the function in `exercise_2_3.py`.
You may check that your program works correctly by using the following exemplary calls:
```python
print(get_average_gc(["CGACCACATTAATGGACTAC", "CTTGGATTATCACCCCCGTC"]))
#> 0.5
print(get_average_gc(["CCACCGAGACGCCAG", "CGGCCTCCCGAC", "GCCCGGGCCCGCGGGGGT"]))
#> 0.837037037037037
```
:::{.callout-tip collapse="true"}
- Use a [for loop](#for-loops) to iterate over the strings in the list.
- You will need a [counter variable](#example-counter).
- `s.count("X")` counts the number of times the character `"X"` appears in the string stored in `s`.
- The `len()` function also works on strings.
:::
### Exercise 2.4 (3 P)
If you code a lot, you will often use third-party packages and consult their documentation. This is what we will do in this exercise: Implement a function `make_sequence` that creates a random DNA or RNA sequence of a given length and calculates how often a given base appears in this sequence.
- The first (required) parameter sets the sequence length.
- The second (optional) parameter `rna` decides whether an RNA or DNA sequence should be created. This parameter has a Boolean type and is false by default.
- The third (optional) parameter `count_base` names the base that should be counted (default: `"A"`).
- The function should return two values: (1) A string representing the created sequence, and (2) how often the specified base appears.
Since the function should create the sequence randomly, you might consider to use the function `choices()` available in the `random` module. This module is part of the standard library – please read the [documentation](https://docs.python.org/3/library/index.html). You will also need the string method `join()` to convert the value returned by `choices()` into a string. Since `string` is a built-in data type, it is also documented in the standard library.
Store the function in `exercise_2_4.py`.
You may check that your program works correctly by using the following exemplary calls:
```python
random.seed(42)
print(make_sequence(20, False, "A"))
#> ('GACAGGTACAAGAAGGAGTA', 9)
print(make_sequence(15, rna=True, count_base="C"))
#> ('UGCAUCAAUGUGGUC', 3)
```
:::{.callout-tip collapse="true"}
- Exactly follow the specification of function parameters and return values:
- Should a parameter have a certain name?
- If a parameter is optional, what is its default value?
- How many values should the function return?
See the [functions](#functions) section for details.
- Remember how to [import](#modularization) a function from a module.
:::
### Exercise 2.5 (3 P)
In this exercise, you will create a command line program that calculates the similarity of two tissue samples by means of their expressed genes.
- The program should process two command line arguments `genes1` and `genes2`.
- Each argument specifices the genes that are expressed in the respective sample. Individual genes should be separated by semicolons (e.g., `GENE1;GENE2;GENE3;GENE4`). It does not matter whether a gene appears more than once.
- The program should print three values, each on a separate line:
1. the number of (unique) genes detected in the first sample
2. the number of (unique) genes detected in the second sample
3. the similarity of the samples, which is measured by the [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index).
Store the function in `exercise_2_5.py`.
You may check that your program works correctly by using the following exemplary calls:
```default
$ python exercise_2_5.py "RAB5C;PITX2;ZNF222;LMTK2;LMTK2" "RAB5C;PABPC4;ZNF222;PKN1;PTMA"
4
5
0.2857142857142857
$ python exercise_2_5.py "THOC5;RAD23B;GPR31;PIRC85;PANO1" "THOC5;ATP1A2;GPR31;THOC5"
5
3
0.3333333333333333
$ python exercise_2_5.py "RAD23B" "RAD23B"
1
1
1.0
```
:::{.callout-tip collapse="true"}
- Your program has to process [command line arguments](python1.qmd#command-line-arguments).
- To split each argument into individual genes, the string method `split()`may be useful – see [documentation](https://docs.python.org/3/library/index.html).
- Consult the [sets](#sets) section on set creation from lists and calculation of union and intersection.
:::