| 1 | +# Filtering data in SQLAlchemy |
| 2 | + |
| 3 | +The purpose of these laboratory classes is to familiarize participants with ways of creating and executing select queries with conditions. |
| 4 | + |
| 5 | +The scope of this class: |
| 6 | +- using select() - to create a select query |
| 7 | +- using query() - to create a query through a session |
| 8 | +- and_(), or_(), in_() - to add conditions to a query |
| 9 | +- order_by() - to sort results |
| 10 | +- label() - to create an alias |
| 11 | +- limit() - to limit the number of results of a query |
| 12 | + |
| 13 | +## Introduction |
| 14 | +From the previous classes we know two methods of creating a database model in SQLAlchemy based on: |
| 15 | +- [mapper](https://docs.sqlalchemy.org/en/13/orm/mapping_api.html#sqlalchemy.orm.mapper) |
| 16 | +- [Class representation](https://docs.sqlalchemy.org/en/13/orm/tutorial.html) |
| 17 | + |
| 18 | +For both, we must first connect to the database: |
| 19 | + |
| 20 | +```python |
| 21 | + |
| 22 | +from sqlalchemy import create_engine |
| 23 | + |
| 24 | +engine = create_engine(url_to_database) |
| 25 | +``` |
| 26 | + |
| 27 | +We can use the following script to reflect the existing tables for the mapper approach: |
| 28 | + |
| 29 | +```python |
| 30 | +from sqlalchemy import create_engine, MetaData, Table |
| 31 | + |
| 32 | +metadata = MetaData() |
| 33 | + |
| 34 | +dic_table = {} |
| 35 | +for table_name in engine.table_names(): |
| 36 | +    dic_table[table_name] = Table(table_name, metadata, autoload=True, autoload_with=engine) |
| 37 | + |
| 38 | +print(repr(dic_table['category'])) |
| 39 | +``` |
| 40 | +Here `dic_table` is a dictionary of table representations keyed by table name. The last line of the script above prints the representation of the table named *category*. |
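As an alternative to the explicit loop, the same dictionary of tables can be obtained with `MetaData.reflect()`. The sketch below is only an illustration: it assumes an in-memory SQLite database with two made-up tables in place of the real test database.

```python
from sqlalchemy import create_engine, MetaData, text

# Assumption: a throwaway in-memory SQLite database stands in for the real one.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE category (category_id INTEGER PRIMARY KEY, name VARCHAR(50))"
    ))
    conn.execute(text(
        "CREATE TABLE film (film_id INTEGER PRIMARY KEY, title VARCHAR(255))"
    ))

# Reflect every table at once; metadata.tables maps table name -> Table object.
metadata = MetaData()
metadata.reflect(bind=engine)

table_names = sorted(metadata.tables)
column_names = [column.name for column in metadata.tables["category"].columns]

print(table_names)
print(column_names)
```

Here `metadata.tables` plays the same role as `dic_table` in the script above.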
| 41 | + |
| 42 | +If we want to use the object (class) representation, we need to run the following script: |
| 43 | + |
| 44 | +```python |
| 45 | +from sqlalchemy.orm import sessionmaker |
| 46 | +from sqlalchemy.ext.declarative import declarative_base |
| 47 | + |
| 48 | +from sqlalchemy import Column, Integer, String, Date, ForeignKey |
| 49 | + |
| 50 | +session = (sessionmaker(bind=engine))() |
| 51 | + |
| 52 | +Base = declarative_base() |
| 53 | + |
| 54 | +class Category(Base): |
| 55 | +    __tablename__ = 'category' |
| 56 | +    category_id = Column(Integer, primary_key=True) |
| 57 | +    name = Column(String(50)) |
| 58 | +    last_update = Column(Date) |
| 59 | +    def __str__(self): |
| 60 | +        return 'Category id:{0}\nCategory name: {1}\nCategory last_update: {2}'.format(self.category_id, self.name, self.last_update) |
| 61 | +``` |
| 62 | +At this point we are ready to start creating database queries. The advantage of using an ORM is that you don't have to rewrite queries when changing the database engine. The disadvantage, however, is that we are limited to the query structures the ORM supports. |
| 63 | + |
| 64 | +If this does not suit us, we can of course run a hand-written SQL query: |
| 65 | + |
| 66 | +```python |
| 67 | +stmt = 'select * from category' |
| 68 | + |
| 69 | +results = engine.execute(stmt).fetchall() |
| 70 | + |
| 71 | +print(results) |
| 72 | +``` |
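When a hand-written query needs a condition, the value can be passed as a bound parameter through `text()` instead of being pasted into the SQL string. The following is a minimal sketch, assuming an in-memory SQLite database and two sample rows that are not part of the test database.

```python
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite with a tiny hand-made category table.
engine = create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE category (category_id INTEGER PRIMARY KEY, name VARCHAR(50))"
    ))
    conn.execute(
        text("INSERT INTO category (category_id, name) VALUES (:id, :name)"),
        [{"id": 1, "name": "Action"}, {"id": 10, "name": "Games"}],
    )
    # :name is a bound parameter; the driver handles quoting of the value.
    rows = conn.execute(
        text("SELECT category_id, name FROM category WHERE name = :name"),
        {"name": "Games"},
    ).fetchall()

found = [tuple(row) for row in rows]
print(found)
```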
| 73 | + |
| 74 | +## Basic select |
| 75 | + |
| 76 | +To build a query we can use the following script: |
| 77 | + |
| 78 | +```python |
| 79 | +from sqlalchemy import select |
| 80 | + |
| 81 | +# select * from category |
| 82 | + |
| 83 | +mapper_stmt = select([dic_table['category']]) |
| 84 | +print('Mapper select: ') |
| 85 | +print(mapper_stmt) |
| 86 | + |
| 87 | +session_stmt = session.query(Category) |
| 88 | +print('\nSession select: ') |
| 89 | +print(session_stmt) |
| 90 | +``` |
| 91 | + |
| 92 | +```sql |
| 93 | +Mapper select: |
| 94 | +SELECT category.category_id, category.name, category.last_update |
| 95 | +FROM category |
| 96 | + |
| 97 | +Session select: |
| 98 | +SELECT category.category_id AS category_category_id, category.name AS category_name, category.last_update AS category_last_update |
| 99 | +FROM category |
| 100 | +``` |
| 101 | +As can be seen, the query built from the session adds aliases to the names of the returned columns. This is the only difference at this stage of building queries. |
| 102 | + |
| 103 | +To run a query built with select(): |
| 104 | +```python |
| 105 | +mapper_results = engine.execute(mapper_stmt).fetchall() |
| 106 | +print(mapper_results) |
| 107 | +``` |
| 108 | +The script returns a list of tuples holding the values of the table rows. For example: |
| 109 | + |
| 110 | +```python |
| 111 | +[(1, 'Action', datetime.datetime(2006, 2, 15, 9, 46, 27)), (2, 'Animation', datetime.datetime(2006, 2, 15, 9, 46, 27)), (3, 'Children', datetime.datetime(2006, 2, 15, 9, 46, 27)), (4, 'Classics', datetime.datetime(2006, 2, 15, 9, 46, 27)), (5, 'Comedy', datetime.datetime(2006, 2, 15, 9, 46, 27)), (6, 'Documentary', datetime.datetime(2006, 2, 15, 9, 46, 27)), (7, 'Drama', datetime.datetime(2006, 2, 15, 9, 46, 27)), (8, 'Family', datetime.datetime(2006, 2, 15, 9, 46, 27)), (9, 'Foreign', datetime.datetime(2006, 2, 15, 9, 46, 27)), (10, 'Games', datetime.datetime(2006, 2, 15, 9, 46, 27)), (11, 'Horror', datetime.datetime(2006, 2, 15, 9, 46, 27)), (12, 'Music', datetime.datetime(2006, 2, 15, 9, 46, 27)), (13, 'New', datetime.datetime(2006, 2, 15, 9, 46, 27)), (14, 'Sci-Fi', datetime.datetime(2006, 2, 15, 9, 46, 27)), (15, 'Sports', datetime.datetime(2006, 2, 15, 9, 46, 27)), (16, 'Travel', datetime.datetime(2006, 2, 15, 9, 46, 27))] |
| 112 | +``` |
| 113 | +This form of presenting results is inconvenient if the rest of our software is object-oriented. To get the results back as objects of the mapped class, use: |
| 114 | + |
| 115 | +```python |
| 116 | +session_results = session_stmt.all() |
| 117 | +# all results print |
| 118 | +print('All results: ') |
| 119 | +print(session_results) |
| 120 | +# print information from first category in result list |
| 121 | +print('\nFirst category:') |
| 122 | +print(session_results[0]) |
| 123 | +``` |
| 124 | + |
| 125 | +```python |
| 126 | +All results: |
| 127 | +[<__main__.Category object at 0x000001F996CB8588>, <__main__.Category object at 0x000001F996CB83C8>, <__main__.Category object at 0x000001F996CB8FC8>, <__main__.Category object at 0x000001F996CB8948>, <__main__.Category object at 0x000001F996C97F88>, <__main__.Category object at 0x000001F996C97988>, <__main__.Category object at 0x000001F996C97EC8>, <__main__.Category object at 0x000001F996C97DC8>, <__main__.Category object at 0x000001F996C97B08>, <__main__.Category object at 0x000001F996C97C48>, <__main__.Category object at 0x000001F996C97C08>, <__main__.Category object at 0x000001F996C974C8>, <__main__.Category object at 0x000001F996C97CC8>, <__main__.Category object at 0x000001F996C7CB88>, <__main__.Category object at 0x000001F996C7CAC8>, <__main__.Category object at 0x000001F996C6A1C8>] |
| 128 | + |
| 129 | +First category: |
| 130 | +Category id:1 |
| 131 | +Category name: Action |
| 132 | +Category last_update: 2006-02-15 09:46:27 |
| 133 | +``` |
| 134 | +As you can see, printing a single object calls the overridden `__str__` method. This approach is very useful when implementing business logic. |
| 135 | + |
| 136 | +If we want to create a query for selected columns then we use the following pattern: |
| 137 | + |
| 138 | +```python |
| 139 | +mapper_stmt = select([dic_table['category'].columns.category_id,dic_table['category'].columns.name]) |
| 140 | + |
| 141 | +session_stmt = session.query(Category.category_id, Category.name) |
| 142 | +``` |
| 143 | +Here both queries return a list of row tuples rather than objects. If you want to use object mapping, create a class containing only the selected columns and query it through the session: |
| 144 | + |
| 145 | +```python |
| 146 | +class Category_filter(Base): |
| 147 | +    __tablename__ = 'category' |
| 148 | +    __table_args__ = {'extend_existing': True} |
| 149 | +    category_id = Column(Integer, primary_key=True) |
| 150 | +    name = Column(String(50)) |
| 151 | +    def __str__(self): |
| 152 | +        return 'Category id:{0}\nCategory name: {1}'.format(self.category_id, self.name) |
| 153 | + |
| 154 | +q = session.query(Category_filter) |
| 155 | +print(q) |
| 156 | +``` |
| 157 | + |
| 158 | + |
| 159 | +## Select with conditions |
| 160 | + |
| 161 | +To start filtering according to a given criterion: |
| 162 | +- mapper option: |
| 163 | +```python |
| 164 | +mapper_stmt = select([dic_table['category'].columns.category_id,dic_table['category'].columns.name]).where(dic_table['category'].columns.name == 'Games') |
| 165 | + |
| 166 | +``` |
| 167 | +- session option: |
| 168 | +```python |
| 169 | +session_stmt = session.query(Category.category_id, Category.name).filter(Category.name == 'Games') |
| 170 | + |
| 171 | +``` |
| 172 | + |
| 173 | +We can also combine conditions with logical operators such as: |
| 174 | +- or_ |
| 175 | +- and_ |
| 176 | +- in_ |
| 177 | + |
| 178 | +Example of using or_ and and_ in one query: |
| 179 | +```python |
| 180 | +from sqlalchemy import or_, and_ |
| 181 | + |
| 182 | +mapper_stmt = select([dic_table['category'].columns.category_id,dic_table['category'].columns.name]).\ |
| 183 | +where(and_(\ |
| 184 | + or_(dic_table['category'].columns.category_id > 10,dic_table['category'].columns.category_id < 2), \ |
| 185 | + or_(dic_table['category'].columns.category_id > 3,dic_table['category'].columns.category_id < 5))) |
| 186 | + |
| 187 | +session_stmt = session.query(Category_filter).\ |
| 188 | +filter(and_(\ |
| 189 | + or_(Category_filter.category_id > 10,Category_filter.category_id < 2), \ |
| 190 | + or_(Category_filter.category_id > 3,Category_filter.category_id < 5))) |
| 191 | +``` |
| 192 | + |
| 193 | +If we also want to use the in_ method of a column: |
| 194 | +```python |
| 195 | + |
| 196 | +mapper_stmt = select([dic_table['category'].columns.category_id,dic_table['category'].columns.name]).\ |
| 197 | +where(and_(\ |
| 198 | + or_(dic_table['category'].columns.category_id > 10,dic_table['category'].columns.category_id < 2),\ |
| 199 | + or_(dic_table['category'].columns.category_id > 3,dic_table['category'].columns.category_id < 5),\ |
| 200 | + dic_table['category'].columns.name.in_(['Sci-Fi','Horror','Action']) |
| 201 | + )) |
| 202 | + |
| 203 | +session_stmt = session.query(Category_filter).\ |
| 204 | +filter(and_(\ |
| 205 | + or_(Category_filter.category_id > 10,Category_filter.category_id < 2), \ |
| 206 | + or_(Category_filter.category_id > 3,Category_filter.category_id < 5)),\ |
| 207 | + Category_filter.name.in_(['Sci-Fi','Horror','Action']) |
| 208 | + ) |
| 209 | +``` |
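To see such conditions return actual rows, the sketch below builds a small category table in an in-memory SQLite database (an assumption made only for this illustration; the names mirror the categories listed earlier) and combines `and_`, `or_` and `in_`. It uses `Table.select()` so that it does not depend on how a particular SQLAlchemy version expects columns to be passed to `select()`.

```python
from sqlalchemy import (
    create_engine, MetaData, Table, Column, Integer, String, and_, or_,
)

# Assumption: in-memory SQLite with a hand-made copy of the category table.
engine = create_engine("sqlite://")
metadata = MetaData()
category = Table(
    "category", metadata,
    Column("category_id", Integer, primary_key=True),
    Column("name", String(50)),
)
metadata.create_all(engine)

names = ["Action", "Animation", "Children", "Classics", "Comedy", "Documentary",
         "Drama", "Family", "Foreign", "Games", "Horror", "Music", "New",
         "Sci-Fi", "Sports", "Travel"]

# (category_id > 10 OR category_id < 2) AND name IN (...)
stmt = category.select().where(
    and_(
        or_(category.c.category_id > 10, category.c.category_id < 2),
        category.c.name.in_(["Sci-Fi", "Horror", "Action"]),
    )
)

with engine.begin() as conn:
    conn.execute(
        category.insert(),
        [{"category_id": i, "name": n} for i, n in enumerate(names, start=1)],
    )
    # Sorted in Python because the statement itself has no ORDER BY.
    filtered = sorted(tuple(row) for row in conn.execute(stmt))

print(filtered)
```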
| 210 | + |
| 211 | +## Sort results in query |
| 212 | +In both cases the results can be sorted with the order_by function. For ascending order, the query looks like this: |
| 213 | +```python |
| 214 | +mapper_stmt = select([dic_table['category'].columns.category_id,dic_table['category'].columns.name]).\ |
| 215 | +where(and_(\ |
| 216 | + or_(dic_table['category'].columns.category_id > 10,dic_table['category'].columns.category_id < 2), \ |
| 217 | + or_(dic_table['category'].columns.category_id > 3,dic_table['category'].columns.category_id < 5))).\ |
| 218 | +order_by(dic_table['category'].columns.name) |
| 219 | + |
| 220 | +mapper_results = engine.execute(mapper_stmt).fetchall() |
| 221 | + |
| 222 | +print(mapper_results) |
| 223 | +``` |
| 224 | +```python |
| 225 | +[(1, 'Action'), (11, 'Horror'), (12, 'Music'), (13, 'New'), (14, 'Sci-Fi'), (15, 'Sports'), (16, 'Travel')] |
| 226 | + |
| 227 | +``` |
| 228 | +And in descending order: |
| 229 | + |
| 230 | +```python |
| 231 | +mapper_stmt = select([dic_table['category'].columns.category_id,dic_table['category'].columns.name]).\ |
| 232 | +where(and_(\ |
| 233 | + or_(dic_table['category'].columns.category_id > 10,dic_table['category'].columns.category_id < 2), \ |
| 234 | + or_(dic_table['category'].columns.category_id > 3,dic_table['category'].columns.category_id < 5))).\ |
| 235 | +order_by(dic_table['category'].columns.name.desc()) |
| 236 | + |
| 237 | +mapper_results = engine.execute(mapper_stmt).fetchall() |
| 238 | + |
| 239 | +print(mapper_results) |
| 240 | +``` |
| 241 | +```python |
| 242 | +[(16, 'Travel'), (15, 'Sports'), (14, 'Sci-Fi'), (13, 'New'), (12, 'Music'), (11, 'Horror'), (1, 'Action')] |
| 243 | +``` |
| 244 | + |
| 245 | +The same applies to sessions: |
| 246 | + |
| 247 | +```python |
| 248 | +session_stmt_asc= session.query(Category_filter).\ |
| 249 | +filter(and_(\ |
| 250 | + or_(Category_filter.category_id > 10,Category_filter.category_id < 2), \ |
| 251 | + or_(Category_filter.category_id > 3,Category_filter.category_id < 5))).\ |
| 252 | +order_by(Category_filter.name) |
| 253 | + |
| 254 | +session_stmt_desc= session.query(Category_filter).\ |
| 255 | +filter(and_(\ |
| 256 | + or_(Category_filter.category_id > 10,Category_filter.category_id < 2), \ |
| 257 | + or_(Category_filter.category_id > 3,Category_filter.category_id < 5))).\ |
| 258 | +order_by(Category_filter.name.desc()) |
| 259 | +``` |
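The session variant can be tried end to end with the sketch below. It assumes an in-memory SQLite database and three sample rows, and it defines its own minimal `Category` class so that it runs on its own; the import fallback is there only because `declarative_base` moved between SQLAlchemy versions.

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4 and later
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()


class Category(Base):
    __tablename__ = "category"
    category_id = Column(Integer, primary_key=True)
    name = Column(String(50))


# Assumption: in-memory SQLite with three sample rows inserted out of order.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Category(category_id=3, name="Children"),
    Category(category_id=1, name="Action"),
    Category(category_id=2, name="Animation"),
])
session.commit()

# Ascending is the default; .desc() on the column reverses the order.
ascending = [c.name for c in session.query(Category).order_by(Category.name)]
descending = [c.name for c in session.query(Category).order_by(Category.name.desc())]

print(ascending)
print(descending)
```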
| 260 | + |
| 261 | +## Alias name |
| 262 | + |
| 263 | +You can also assign aliases to column names with the label function. Examples of use: |
| 264 | +```python |
| 265 | +mapper_stmt = select([dic_table['category'].columns.category_id.label('id'),dic_table['category'].columns.name.label('category name')]) |
| 266 | +print(mapper_stmt) |
| 267 | +``` |
| 268 | +```sql |
| 269 | +SELECT category.category_id AS id, category.name AS "category name" |
| 270 | +FROM category |
| 271 | +``` |
| 272 | +```python |
| 273 | +session_stmt= session.query(Category_filter.category_id.label('id'), Category_filter.name.label('category name')) |
| 274 | +print(session_stmt) |
| 275 | + |
| 276 | +``` |
| 277 | +```sql |
| 278 | +SELECT category.category_id AS id, category.name AS "category name" |
| 279 | +FROM category |
| 280 | +``` |
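Labels also become the attribute names of the rows returned by a session query. A minimal sketch, assuming an in-memory SQLite database with two sample rows and using the hypothetical label `category_name` (without a space, so it can be read as an attribute):

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4 and later
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()


class Category(Base):
    __tablename__ = "category"
    category_id = Column(Integer, primary_key=True)
    name = Column(String(50))


# Assumption: in-memory SQLite with two sample rows.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Category(category_id=1, name="Action"),
    Category(category_id=2, name="Animation"),
])
session.commit()

rows = (
    session.query(
        Category.category_id.label("id"),
        Category.name.label("category_name"),
    )
    .order_by(Category.category_id)
    .all()
)

# Each row exposes the labelled columns as attributes.
first = rows[0]
print(first.id, first.category_name)
```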
| 281 | + |
| 282 | + |
| 283 | +## Limits on the results in query |
| 284 | +To limit the number of records returned by the database, we can use the limit function. Its use is illustrated by the following examples: |
| 285 | +```python |
| 286 | +mapper_stmt = select([dic_table['category'].columns.category_id.label('id'),dic_table['category'].columns.name.label('category name')]).limit(3) |
| 287 | +print(mapper_stmt) |
| 288 | +``` |
| 289 | +```sql |
| 290 | +SELECT category.category_id AS id, category.name AS "category name" |
| 291 | +FROM category |
| 292 | +LIMIT :param_1 |
| 293 | +``` |
| 294 | +```python |
| 295 | +session_stmt= session.query(Category_filter.category_id.label('id'), Category_filter.name.label('category name')).limit(3) |
| 296 | +print(session_stmt) |
| 297 | + |
| 298 | +``` |
| 299 | +```sql |
| 300 | +SELECT category.category_id AS id, category.name AS "category name" |
| 301 | +FROM category |
| 302 | +LIMIT %(param_1)s |
| 303 | +``` |
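Without an ordering, which rows a limited query returns is up to the database, so `limit` is usually combined with `order_by`. The sketch below shows this on an in-memory SQLite database with five sample rows (both are assumptions made for the illustration).

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

# Assumption: in-memory SQLite with a hand-made category table.
engine = create_engine("sqlite://")
metadata = MetaData()
category = Table(
    "category", metadata,
    Column("category_id", Integer, primary_key=True),
    Column("name", String(50)),
)
metadata.create_all(engine)

names = ["Action", "Comedy", "Drama", "Horror", "Music"]

# Sort by name descending, then keep only the first two rows.
stmt = category.select().order_by(category.c.name.desc()).limit(2)

with engine.begin() as conn:
    conn.execute(
        category.insert(),
        [{"category_id": i, "name": n} for i, n in enumerate(names, start=1)],
    )
    top_two = [tuple(row) for row in conn.execute(stmt)]

print(top_two)
```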
| 304 | +## Exercise |
| 305 | + |
| 306 | +Use all of these methods to create queries for the test database. Check their execution time using the [profiling and timing code methods](https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html). |
| 307 | + |
| 308 | +Queries to write: |
| 309 | +1. How many film categories do we have in the rental? |
| 310 | +2. Display the list of categories in alphabetical order. |
| 311 | +3. Find the oldest and the newest film in the rental. |
| 312 | +4. How many rentals were there between 2005-07-01 and 2005-08-01? |
| 313 | +5. How many rentals were there between 2010-01-01 and 2011-02-01? |
| 314 | +6. Find the biggest payment in the rental. |
| 315 | +7. Find all customers from Poland, Nigeria or Bangladesh. |
| 316 | +8. Where do staff members live? |
| 317 | +9. How many staff members live in Argentina or Spain? |
| 318 | +10. Which film categories were rented by clients? |
| 319 | +11. Find all film categories rented in America. |
| 320 | +12. Find the titles of all films in which Olympia Pfeiffer, Julia Zellweger or Ellen Presley played. |
| 321 | + |
| 322 | + |
| 323 | + |