Azure Blob Storage Processing with Event Grid Triggers & Python Azure Functions using Managed Identity
The developers at Mystique Unicorn process files as soon as they arrive. They currently use the system-assigned managed identity, and they are looking to scope the permissions of the function execution environment down to the bare minimum.
They heard about Azure's user-assigned managed identities. Can you help them implement this event processing at Mystique Unicorn?
Azure offers user-assigned managed identities that can be narrowly scoped to just the necessary permissions. But there are a few nitty-gritty things we need to remember when using user-assigned managed identities (a short sketch follows this list):

- **User-assigned managed identities and resource mappings are cached** - The update interval is 24 hours. Excerpt from the docs,

  > The back-end services for managed identities maintain a cache per resource URI for around 24 hours. If you update the access policy of a particular target resource and immediately retrieve a token for that resource, you may continue to get a cached token with outdated permissions until that token expires. There's currently no way to force a token refresh.

  It is possible you may get errors due to this. My testing didn't show any errors, but it is possible.

- The permission scoping for resources like the Cosmos DB data plane is slightly different from, say, Storage Accounts. We need to use `Microsoft.DocumentDB/databaseAccounts/sqlRoleDefinitions` to define the permissions, and `Microsoft.DocumentDB/databaseAccounts/sqlRoleAssignments` to assign that role to the user-assigned managed identity.

- The Azure Function needs these environment variables for permissions like Cosmos DB to work (thanks to this blog),
  - `SUBSCRIPTION_ID`
  - `RESOURCE_GROUP`
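To make the first two points concrete, here is a minimal sketch of how the function code can pick up the user-assigned identity and talk to the Cosmos DB data plane. The environment variable names (`AZURE_CLIENT_ID`, `COSMOS_DB_URL`) and the database/container names are assumptions for illustration, not necessarily the ones this stack uses.

```python
# A minimal sketch, NOT the exact code in this repo. Assumes the identity's
# client ID is exposed to the function app as AZURE_CLIENT_ID and the Cosmos
# DB endpoint as COSMOS_DB_URL; database/container names are placeholders.
import os

from azure.cosmos import CosmosClient
from azure.identity import ManagedIdentityCredential

# For a user-assigned identity, client_id must be passed explicitly;
# without it, the credential falls back to the system-assigned identity.
credential = ManagedIdentityCredential(client_id=os.environ["AZURE_CLIENT_ID"])

# Data-plane access is governed by the sqlRoleAssignments created for this
# identity; a control-plane RBAC role alone is not enough for Cosmos DB data.
# Note: if the role assignment was only just updated, a cached token may keep
# surfacing permission errors for up to ~24 hours (see the first point above).
cosmos_client = CosmosClient(url=os.environ["COSMOS_DB_URL"], credential=credential)
database = cosmos_client.get_database_client("store-events-db")  # placeholder
container = database.get_container_client("store-events")       # placeholder
```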
With this knowledge, we can process Blob Storage using Azure Functions and Event Grid triggers with a user-assigned managed identity. Blob Storage events such as blob creation and deletion can be used to trigger the function. A sample event from Event Grid is shown below,
```json
{
  "id": "538fcf9f-3..-1024-801417067d3a",
  "data": {
    "api": "PutBlob",
    "clientRequestId": "c0c0f290-ec..0bc9ef3b",
    "requestId": "538fcf9f-3..01417000000",
    "eTag": "0x8DB4E3BA4F8E488",
    "contentType": "application/json",
    "contentLength": 40,
    "blobType": "BlockBlob",
    "url": "https://warehouse6p5crf002.blob.core.windows.net/store-events-blob-002/source/7031_2023-05-06_event.json",
    "sequencer": "0000000000000000000000.000005276ba",
    "storageDiagnostics": { "batchId": "2901e730-b..-80d271000000" }
  },
  "topic": null,
  "subject": "/blobServices/default/containers/store-events-blob-002/blobs/source/7031_2023-05-06_event.json",
  "event_type": null
}
```
We can use this event as a trigger, retrieve the blob referenced in `data.url` using the input binding, and persist the processed event back to Blob Storage using the output binding, as well as to Cosmos DB. Although we could use an output binding for Cosmos DB, we will use the Python SDK instead to demonstrate the use of the managed identity, as sketched below.
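A minimal sketch of what such a function might look like, assuming the v1 Python programming model with an Event Grid trigger; the environment variable and resource names are illustrative, not necessarily the ones in this repo.

```python
# A minimal sketch of the function body, not the repo's exact code.
import json
import logging
import os

import azure.functions as func
from azure.identity import ManagedIdentityCredential
from azure.storage.blob import BlobClient


def main(event: func.EventGridEvent):
    payload = event.get_json()
    blob_url = payload["url"]  # the blob that raised the BlobCreated event

    credential = ManagedIdentityCredential(client_id=os.environ["AZURE_CLIENT_ID"])

    # Retrieve the blob contents directly from its URL.
    blob_client = BlobClient.from_blob_url(blob_url, credential=credential)
    raw_event = json.loads(blob_client.download_blob().readall())

    logging.info("Processing %s: %s", blob_url, raw_event)
    # ...process raw_event, then persist it to Cosmos DB with a CosmosClient
    # built on the same credential, as shown earlier...
```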
By leveraging the power of Bicep, all necessary resources can be easily provisioned and managed with minimal effort. Our solution uses Python for efficient event processing, allowing for quick and easy deployment of sophisticated event processing pipelines.
- This demo, its instructions, scripts and Bicep template are designed to be run in `westeurope`. With few or no modifications you can try it out in other regions as well (not covered here).
- 🛠 Azure CLI Installed & Configured - Get help here
- 🛠 Bicep Installed & Configured - Get help here
- 🛠 VS Code & Bicep Extensions - Get help here
- Get the application code

  ```bash
  git clone https://github.com/miztiik/azure-blob-trigger-function-user-identity
  cd azure-blob-trigger-function-user-identity
  ```
- Let's check that you have the Azure CLI working

  ```bash
  # You should have azure cli preinstalled
  az account show
  ```

  You should see an output like this,

  ```json
  {
    "environmentName": "AzureCloud",
    "homeTenantId": "16b30820b6d3",
    "id": "1ac6fdbff37cd9e3",
    "isDefault": true,
    "managedByTenants": [],
    "name": "YOUR-SUBS-NAME",
    "state": "Enabled",
    "tenantId": "16b30820b6d3",
    "user": {
      "name": "miztiik@",
      "type": "user"
    }
  }
  ```
- Stack: Main Bicep

  This will create the following resources:

  - General purpose Storage Account
    - This will be used by Azure Functions to store the function code
  - Storage Account with blob container
    - This will be used to store the events
  - Event Grid Topic
    - This will be used to trigger the Azure Function
    - A subscription to the topic filters for `Microsoft.Storage.BlobCreated` events specific to the blob container
  - Managed Identity
    - This will be used by the Azure Function to access Cosmos DB; the data-plane role assignment it needs is sketched below
  - Python Azure Function
    - Input, trigger and output bindings to the blob container for events
  - Cosmos DB
    - This will be used to store the processed events
  ```bash
  # make deploy
  sh deployment_scripts/deploy.sh
  ```
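For reference, the Cosmos DB data-plane role assignment that the template provisions for the managed identity is roughly equivalent to the following Python sketch using the `azure-mgmt-cosmosdb` package. All names here are placeholders, and the built-in `Cosmos DB Built-in Data Contributor` role definition is used for illustration; a custom `sqlRoleDefinition` would allow tighter scoping.

```python
# A hedged sketch of the Cosmos DB data-plane role assignment; all resource
# names below are placeholders, not the ones created by this stack.
import os
import uuid

from azure.identity import AzureCliCredential
from azure.mgmt.cosmosdb import CosmosDBManagementClient

sub_id = os.environ["SUBSCRIPTION_ID"]
rg = "Miztiik_Enterprises_xxx"            # placeholder resource group
account = "your-cosmos-account"           # placeholder Cosmos DB account
principal_id = "<identity-principal-id>"  # objectId of the user-assigned identity

client = CosmosDBManagementClient(AzureCliCredential(), sub_id)
account_scope = (
    f"/subscriptions/{sub_id}/resourceGroups/{rg}"
    f"/providers/Microsoft.DocumentDB/databaseAccounts/{account}"
)

# "...0002" is the built-in "Cosmos DB Built-in Data Contributor" role
# definition; scope the assignment to the account (or narrower).
client.sql_resources.begin_create_update_sql_role_assignment(
    role_assignment_id=str(uuid.uuid4()),
    resource_group_name=rg,
    account_name=account,
    create_update_sql_role_assignment_parameters={
        "role_definition_id": f"{account_scope}/sqlRoleDefinitions/00000000-0000-0000-0000-000000000002",
        "scope": account_scope,
        "principal_id": principal_id,
    },
).result()
```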
After successfully deploying the stack, check the `Resource Groups/Deployments` section for the resources.
- Upload file(s) to blob

  Get the storage account and container name from the output of the deployment. Upload a file to the container and check the logs of the function app to see the event processing in action.

  Sample bash script to upload files to the blob container. You can also upload manually from the portal,
  ```bash
  # Set variables
  LOG_FILE="/var/log/miztiik-$(date +'%Y-%m-%d').json"
  COMPUTER_NAME=$(hostname)
  SLEEP_AT_WORK_SECS=0
  LOG_COUNT=2

  GREEN="\e[32m"
  CYAN="\e[36m"
  YELLOW="\e[33m"
  RESET="\e[0m"

  RESOURCE_GROUP="Miztiik_Enterprises_azure_blob_eventgrid_trigger_function_003"
  LOCATION="northeurope"
  SA_NAME="warehouseenx5vm003"
  CONTAINER_NAME="store-events-blob-003"

  for ((i = 1; i <= LOG_COUNT; i++)); do
    FILE_NAME_PREFIX=$(openssl rand -hex 4)
    FILE_NAME="${RANDOM}_$(date +'%Y-%m-%d')_event.json"
    echo -n "{\"message\": \"hello world on $(date +'%Y-%m-%d')\" , \"timestamp\": \"$(date -u +"%Y-%m-%dT%H:%M:%SZ")\"}" > "${FILE_NAME}"

    UPLOAD_STATUS=$(az storage blob upload \
      --account-name "${SA_NAME}" \
      --container-name "${CONTAINER_NAME}" \
      --name "source/${FILE_NAME}" \
      --file "${FILE_NAME}" \
      --no-progress \
      --auth-mode login \
      --output json | tr -d '\r')

    sleep 2
    # echo -e "${GREEN}${UPLOAD_STATUS}${RESET}"
    echo -e "\n ${YELLOW}($i/$LOG_COUNT)${RESET} Blob: ${GREEN}${FILE_NAME}${RESET} uploaded to container: ${CYAN}${CONTAINER_NAME}${RESET} in storage account: ${CYAN}${SA_NAME}${RESET}"
  done
  ```
  You should see an output like this,

  ```text
  (1/2) Blob: 758_2023-05-13_event.json uploaded to container: store-events-blob-003 in storage account: warehouseenx5vm003
  (2/2) Blob: 7893_2023-05-13_event.json uploaded to container: store-events-blob-003 in storage account: warehouseenx5vm003
  ```
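To verify the pipeline end to end, a quick way is to query Cosmos DB for the freshly written documents. This is a hedged sketch: the database and container names are placeholders, `COSMOS_DB_URL` is an assumed environment variable, and your CLI identity needs its own data-plane role assignment to run queries.

```python
# A quick verification sketch; names are placeholders, not this stack's exact ones.
import os

from azure.cosmos import CosmosClient
from azure.identity import AzureCliCredential

client = CosmosClient(url=os.environ["COSMOS_DB_URL"], credential=AzureCliCredential())
container = client.get_database_client("store-events-db").get_container_client("store-events")

# List the most recently written documents.
for item in container.query_items(
    query="SELECT TOP 5 * FROM c ORDER BY c._ts DESC",
    enable_cross_partition_query=True,
):
    print(item["id"])
```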
Here we have demonstrated how to use Azure Functions to process blob files and persist them in Cosmos DB.

If you want to destroy all the resources created by the stack, execute the command below to delete the stack, or you can delete the stack from the console as well:

- Resources created during Deploying The Application
- Any other custom resources you have created for this demo
```bash
# Delete from resource group
az group delete --name Miztiik_Enterprises_xxx --yes
# Follow any on-screen prompt
```
This is not an exhaustive list; please carry out any other steps that may be applicable to your needs.
This repository aims to show how to use Bicep to new developers, Solution Architects & Ops Engineers in Azure.
Thank you for your interest in contributing to our project. Whether it is a bug report, new feature, correction, or additional documentation or solutions, we greatly value feedback and contributions from our community. Start here
Buy me a coffee ☕.
- Azure Event Grid trigger for Azure Functions
- Blob Storage events
- Azure Blob Storage Input Binding
- Azure Blob Storage Output Binding
- Azure Event Grid Filters
- Miztiik Blog - Blob Storage Event Processing with Python Azure Functions
- Miztiik Blog - Blob Storage Processing with Python Azure Functions with HTTP Triggers
- Azure Docs - Managed Identity
- Azure Docs - Managed Identity Caching
- Azure Docs - Azure Function and User Assigned Managed Identities
Level: 200