Commit

Added section on case sensitivity to Cassandra notebook
khliland committed Oct 7, 2024
1 parent 5486f46 commit 42a84b8
Showing 198 changed files with 5,338 additions and 1,667,211 deletions.
18 changes: 10 additions & 8 deletions D2Dbook/3_Data_sources/2_Databases/2_MySQL.ipynb
@@ -32,7 +32,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "slide"
@@ -203,12 +203,14 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
},
"tags": []
"tags": [
"raises-exception"
]
},
"outputs": [],
"source": [
@@ -220,7 +222,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@@ -248,7 +250,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@@ -276,7 +278,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@@ -291,7 +293,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@@ -339,7 +341,7 @@
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "IND320_2024",
"language": "python",
"name": "python3"
},
112 changes: 111 additions & 1 deletion D2Dbook/3_Data_sources/2_Databases/3_Cassandra.ipynb
@@ -72,7 +72,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-17T10:25:34.787878Z",
@@ -190,6 +190,116 @@
"session.execute(\"INSERT INTO my_first_table (ind, company, model) VALUES (3, 'Polestar', '3');\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Query the data\n",
"rows = session.execute(\"SELECT * FROM my_first_table;\")\n",
"for i in rows:\n",
" print(i)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Case sensitivity\n",
"- Cassandra is by default case insensitive in column names.\n",
"- To use column names with capital letters, use double quotation marks both when creating tables and when inserting data.\n",
"- The effect of insensitivity may be surprising.\n",
" - Look carefully at the use of quotation marks and error message below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [],
"source": [
"session.set_keyspace('my_first_keyspace')\n",
"session.execute(\"DROP TABLE IF EXISTS my_first_keyspace.case_insensitive;\") # Starting from scratch every time\n",
"session.execute(\"CREATE TABLE IF NOT EXISTS case_insensitive (Capital int PRIMARY KEY, Letters text, Everywhere text);\")\n",
"session.execute(\"DROP TABLE IF EXISTS my_first_keyspace.case_sensitive;\") # Starting from scratch every time\n",
"session.execute(\"CREATE TABLE IF NOT EXISTS case_sensitive (\\\"Capital\\\" int PRIMARY KEY, \\\"Letters\\\" text, \\\"Everywhere\\\" text);\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"outputs": [],
"source": [
"session.execute(\"INSERT INTO case_insensitive (Capital, Letters, Everywhere) VALUES (1, 'Tesla', 'Model S');\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
},
"tags": [
"raises-exception"
]
},
"outputs": [],
"source": [
"session.execute(\"INSERT INTO case_sensitive (Capital, Letters, Everywhere) VALUES (1, 'Tesla', 'Model S');\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"outputs": [],
"source": [
"session.execute(\"INSERT INTO case_sensitive (\\\"Capital\\\", \\\"Letters\\\", \\\"Everywhere\\\") VALUES (1, 'Tesla', 'Model S');\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [],
"source": [
"# Query the data\n",
"rows = session.execute(\"SELECT * FROM case_insensitive;\")\n",
"for i in rows:\n",
" print(i)\n",
"rows = session.execute(\"SELECT * FROM case_sensitive;\")\n",
"for i in rows:\n",
" print(i)"
]
},
{
"cell_type": "markdown",
"metadata": {
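Note on the case-sensitivity cells added to `3_Cassandra.ipynb`: the failing `INSERT` follows from how CQL resolves identifiers. Unquoted names are folded to lowercase, while double-quoted names keep their case. The sketch below only imitates that rule in plain Python so it runs without a Cassandra session. The helpers `normalize_cql_identifier` and `quote_cql_identifier` are hypothetical names introduced for illustration and are not part of the `cassandra-driver` API.

```python
def normalize_cql_identifier(identifier: str) -> str:
    """Return the name CQL would resolve an identifier to (illustrative helper)."""
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        # Quoted: case is preserved; a doubled quote stands for one literal quote.
        return identifier[1:-1].replace('""', '"')
    # Unquoted: the name is folded to lowercase.
    return identifier.lower()


def quote_cql_identifier(name: str) -> str:
    """Wrap a column name in double quotes so its case survives (illustrative helper)."""
    return '"' + name.replace('"', '""') + '"'


# Column names as the two CREATE TABLE statements in the notebook would store them.
case_insensitive_cols = {
    normalize_cql_identifier(c) for c in ("Capital", "Letters", "Everywhere")
}
case_sensitive_cols = {
    normalize_cql_identifier(c) for c in ('"Capital"', '"Letters"', '"Everywhere"')
}

# An unquoted reference to Capital resolves to the lowercase name.
unquoted_ref = normalize_cql_identifier("Capital")
print(unquoted_ref in case_insensitive_cols)  # found in the unquoted table
print(unquoted_ref in case_sensitive_cols)    # not found in the quoted table
print(quote_cql_identifier("Capital"))        # form to use in the INSERT
```

Under this rule the unquoted `INSERT INTO case_sensitive (Capital, ...)` refers to a column `capital` that the quoted table does not have, which is consistent with the cell being tagged `raises-exception`.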
9 changes: 6 additions & 3 deletions D2Dbook/3_Data_sources/2_Databases/4_Spark.ipynb
@@ -93,7 +93,8 @@
" config('spark.sql.extensions', 'com.datastax.spark.connector.CassandraSparkExtensions').\\\n",
" config('spark.sql.catalog.mycatalog', 'com.datastax.spark.connector.datasource.CassandraCatalog').\\\n",
" config('spark.cassandra.connection.port', '9042').getOrCreate()\n",
"# Some warnings are to be expected."
"# Some warnings are to be expected.\n",
"# If running this cell does not give any output after ~30 seconds, there is likely an error in the configuration (JAVA_HOME, HADOOP_HOME, etc.)."
]
},
{
@@ -304,7 +305,9 @@
"## Write data to Cassandra\n",
"- One can append or overwrite data in existing database tables.\n",
"- PySpark is picky regarding data formats.\n",
" - Reading data from the existing table and extracting formatting is possible."
" - Reading data from the existing table and extracting formatting is possible.\n",
"- PySpark is case sensitive, while Cassandra is not by default.\n",
" - See the example of case sensitivity issues in the [Cassandra notebook](./3_Cassandra.ipynb)."
]
},
{
@@ -414,7 +417,7 @@
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "IND320_2024",
"language": "python",
"name": "python3"
},
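Note on the new comment in `4_Spark.ipynb` about a silent session start pointing to a configuration error (JAVA_HOME, HADOOP_HOME, etc.): a quick pre-flight check before building the Spark session might look like the sketch below. The function name `check_spark_environment` is hypothetical, and the choice of variables is an assumption taken from the comment in the diff. Which variables actually matter depends on the platform and installation.

```python
import os
from pathlib import Path


def check_spark_environment(names=("JAVA_HOME", "HADOOP_HOME"), environ=None):
    """Map each variable name to 'ok', 'unset', or 'not a directory'."""
    # Allow a custom mapping to be passed in; default to the real environment.
    environ = os.environ if environ is None else environ
    report = {}
    for name in names:
        value = environ.get(name)
        if not value:
            report[name] = "unset"
        elif not Path(value).is_dir():
            report[name] = "not a directory"
        else:
            report[name] = "ok"
    return report


# Print one status line per variable before creating the Spark session.
for name, status in check_spark_environment().items():
    print(f"{name}: {status}")
```

A status other than `ok` does not prove the session will fail, but it narrows down where to look when the cell hangs without output.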