Key Storage Provider not registered in Docker Image #89

Closed
cj-lopez opened this issue Jan 25, 2021 · 51 comments

@cj-lopez

Venafi created a Key Storage Provider and is running into the following problem:

The Key Storage Provider (KSP) is not available after loading a container image with it installed. If the Docker host also has the KSP installed, there are no issues, but I should not need to install the KSP on the Docker host. If I register the KSP manually after the Docker image is loaded, the KSP works without any issues.

What assistance are you requesting from Microsoft?
My belief is that there is a bug or logic error in how containers handle KSPs, or perhaps bcrypt in general. I would like to see this issue investigated and resolved. I do not believe there is an issue with the KSP developed by Venafi (my company) for the following reasons:

  • Our KSP works without any issues when installing on a base operating system.
  • Our KSP works without any issue if we force registration after the container has loaded.
  • Other KSPs (e.g., SafeNet Luna) appear to have the exact same issue.

TROUBLESHOOTING INFORMATION:
I have a Dockerfile and an MSI of Venafi’s KSP that can be run to reproduce the issue.

High-level overview:

  • Deploy the container image for Windows Server Core LTSC 2019
  • Add PowerShell to the image
  • Install Venafi’s Key Storage Provider
  • Launch the container
  • Run certutil -csplist
    Notice that only Venafi’s CSP is available; the KSP is not.

If I then do the following, the KSP becomes available:
i. Run regsvr32 c:\windows\system32\venaficsp.dll
ii. Re-run certutil -csplist
iii. The KSP is now listed

All of this was performed with isolation mode set to hyperv.

I can reproduce Venafi's issue. I even added the regsvr32 command to the Dockerfile and that had no effect. So it looks like, for whatever reason, the registry changes that a KSP makes are not being persisted.

Can someone help investigate?
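
The check in the reproduction steps above (run certutil -csplist and look for the provider) could be automated roughly as follows. This is only a sketch: it assumes that certutil prints one "Provider Name: ..." line per provider, and the helper names are hypothetical and not part of any Venafi or Microsoft tooling.

```python
import subprocess
from typing import List

PREFIX = "Provider Name:"


def list_providers(csplist_output: str) -> List[str]:
    """Extract provider names from the text output of `certutil -csplist`.

    Assumption: each provider is announced on a line starting with
    "Provider Name:"; all other lines are ignored.
    """
    names = []
    for line in csplist_output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            names.append(line[len(PREFIX):].strip())
    return names


def is_registered(csplist_output: str, provider_name: str) -> bool:
    """Case-insensitive check that provider_name appears in the listing."""
    wanted = provider_name.casefold()
    return any(name.casefold() == wanted for name in list_providers(csplist_output))


def run_csplist() -> str:
    """Run certutil inside the container and return its output (Windows only)."""
    result = subprocess.run(
        ["certutil", "-csplist"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Inside the container, calling is_registered(run_csplist(), "<your KSP's display name>") before and after the regsvr32 step would show whether the registration survived; the display name has to be taken from your own provider.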

@ghost added the triage (New and needs attention) label on Jan 25, 2021
@vrapolinario self-assigned this on Jan 25, 2021
@vrapolinario added the Security & Identity label and removed the triage (New and needs attention) label on Jan 25, 2021
@iamjplant

iamjplant commented Jan 28, 2021

I have observed the exact same class of behavior with my company's KSP (not the same KSP as above).

@jwbrothers

This is not Venafi specific. I have also verified this issue is the same if you use the SafeNet Luna client. This issue will be seen if the KSP is not installed on the host machine. If I install the KSP on the host node, it is always available in the container.

The workaround of registering the KSP after the container loads should not be necessary.

@ghost

ghost commented Mar 4, 2021

This issue has been open for 30 days with no updates.
@vrapolinario, please provide an update or close this issue.

@jwbrothers

I am honestly surprised Microsoft is not actively looking at this issue. Seems like the crypto layer is not being initialized properly.

@ghost

ghost commented Apr 23, 2021

This issue has been open for 30 days with no updates.
@vrapolinario, please provide an update or close this issue.

@vrapolinario
Contributor

Sorry for the delay on this. We do have an internal bug created to investigate, but it's stalled for some reason. I just pinged the teams that own the internal bug to follow up. Will let this thread know once I hear back.

@vrapolinario
Contributor

For internal MS tracking, we have a bug for investigation and fix: 31877422.

@ghost

ghost commented Jun 12, 2021

This issue has been open for 30 days with no updates.
@vrapolinario, please provide an update or close this issue.

@ghost

ghost commented Jul 12, 2021

This issue has been open for 30 days with no updates.
@vrapolinario, please provide an update or close this issue.

@vrapolinario
Contributor

Unfortunately this is still in the works. We'll update as soon as we have any news.

@ghost

ghost commented Nov 8, 2021

This issue has been open for 90 days with no updates.
@weijuans-msft, please provide an update or close this issue.

@hotchkj

hotchkj commented Feb 21, 2022

As this has been open more than a year, is there any timescale for a fix on this? This impacts signing infrastructure & the ability to bundle providers as part of images without compromising container security.

In addition, this issue is tagged as In Progress but is on the Project's Backlog which is quite confusing as milestone handling goes.

@cwilhit
Contributor

cwilhit commented Mar 24, 2022

Agreed, that is confusing. I will look into this.

@cwilhit
Contributor

cwilhit commented Apr 6, 2022

Update: re-engaged the dev team who owns the code associated with this area. If and when there is an ETA I will share here.

@iamjplant

Is there a public roadmap you could link us to?

@cwilhit
Contributor

cwilhit commented Apr 19, 2022

Hey @iamjplant, there is not. Ideally it would be the roadmap on this GitHub, but that is very out of date and the team needs to improve its hygiene.

@michbern-ms
Contributor

I jumped in on this issue and sat down to reproduce it.

• Took an up-to-date Windows Server 2022 host VM
• Set it up for Containers: Prep Windows operating system containers | Microsoft Docs
• Downloaded a Dockerfile and Venafi install package that @cj-lopez had provided to me offline.
• docker build --isolation=hyperv .
• docker run -it <image-id> cmd.exe
• certutil -csplist

[Screenshot 2022-05-18 124106]

It appears to be working. I know other folks had run into this, so it would be helpful to know if anyone is still seeing this issue and could help me reproduce it. Thanks.

@iamjplant

iamjplant commented May 19, 2022

I have not been building or running with isolation mode explicitly set to anything, FWIW.

The other obvious difference that I am noticing is the OS. My FROM:
[image: the Dockerfile's FROM statement]

Just double-checked, and I still see the issue. As part of the docker build process I register my KSP, but when I run the container it is no longer registered, and I have to register it again.

@michbern-ms
Contributor

michbern-ms commented May 19, 2022

@iamjplant Just double checking - what host OS (edition and version) are you building from?

The original issue above has a big, bold sentence about running in Hyper-V isolation, so I assumed that was relevant.

@iamjplant

My host OS:
[images: host OS edition and version]

@michbern-ms
Contributor

Thank you. I'll need to see if I can reproduce it with that host/guest combination.

@michbern-ms
Contributor

It's been a busy few weeks with people-management stuff for me. I did update my dockerfile with precisely the FROM statement that @iamjplant provided:

FROM mcr.microsoft.com/windows/servercore:ltsc2019-amd64

And it still works for me.

[image]

I'm starting to run out of avenues of investigation here.

@iamjplant

Bummer...
How about the host OS? I just simplified my repro dockerfile and can still get the error. Is it possible this was fixed after Windows 10 20H1 or in the Server class OS?

@jwbrothers

@iamjplant the KSP still needs to be installed in the Docker container as well for my host comment to work.

@iamjplant

@jwbrothers agreed.

@michbern-ms
Contributor

michbern-ms commented Jun 10, 2022

Thank you, @jwbrothers - I can finally reproduce this on a Server 2022 host / Server 2019 guest. In fairness, this is pretty subtle; I appreciate your help in clarifying expectations. Now I can see about what might be causing this. I am looping in the appropriate feature teams to investigate further.

@michbern-ms
Contributor

Thanks again for all the help on this.

We made some good progress on this, in partnership with the subject matter experts for Windows Crypto. @jwbrothers' hypothesis was half-right: the crypto layer is initialized properly, but the KSP registration is not persisting through container shutdown. We understand why it is not persisting, but it is not trivial to fix.

We need to plan and cost the engineering work, so we do not have a timeline yet. We recognize this is a long-running issue.

In the meantime, the Crypto engineers confirmed that re-registering the KSP’s DLL is a reasonable workaround and is a safe way of keeping your workload functioning properly in Windows Containers.
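
For images that still need the workaround, re-registration at container start could be wrapped in a small entrypoint along these lines. This is a sketch under stated assumptions, not a recommended implementation: it assumes Python is available in the image, reuses the DLL path from the original report, and uses regsvr32's /s (silent) switch. The function names are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence

# Assumption: DLL path as quoted in the original report; adjust for your provider.
KSP_DLL = r"c:\windows\system32\venaficsp.dll"


def registration_command(dll_path: str) -> List[str]:
    """Build the regsvr32 call; /s suppresses the confirmation dialog."""
    return ["regsvr32", "/s", dll_path]


def main(argv: Sequence[str]) -> int:
    """Re-register the KSP, then run the real workload passed as arguments."""
    registered = subprocess.run(registration_command(KSP_DLL))
    if registered.returncode != 0:
        print(
            f"regsvr32 failed with exit code {registered.returncode}",
            file=sys.stderr,
        )
        return registered.returncode
    if not argv:
        return 0  # nothing else to start
    return subprocess.run(list(argv)).returncode
```

A script that calls sys.exit(main(sys.argv[1:])) could then serve as the container's entrypoint, with the actual workload command passed as its arguments.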

@fschmied

fschmied commented Jul 5, 2022

@michbern-ms Just a quick question, as we, too, are running into this issue. I understand that you've identified a problem where the KSP registration is not correctly persisted when performed within a container (e.g., while building the container), so information is lost and the KSP will not work when the container is later restarted.

However, I see a second problem here. If I understand correctly, the issue seems to go away if the KSP is registered on the host. AFAIK, a container should not behave differently depending on what is installed or registered on the host; it should have isolated surroundings. Does that imply that there is a bug in the isolation? And could a container thus corrupt its host's crypto layer configuration in some way?

@iamjplant

@fschmied I have not been able to duplicate that with my setup. The KSP is not registered in the container regardless of the host registration.

@fschmied

fschmied commented Jul 7, 2022

Okay, that's interesting, as @jwbrothers mentioned that above, and my team have also reported seeing this. However, we were using process isolation whereas the original issue was reported with Hyper-V isolation, maybe that's the difference.

@iamjplant Could you check whether you see what @jwbrothers reported above, i.e., different behavior depending on what's installed on the host when using process isolation?

@iamjplant

Sadly, as soon as I add --isolation=process to my command I run afoul of my IT department, so I can't test it. I did reconfirm that having the KSP registered locally did not carry over to my container without the --isolation=process argument.

@michbern-ms
Contributor

@fschmied, you raise an interesting question. Process-isolated containers are implemented with a shared kernel, whereas a Hyper-V isolated container has its own kernel. So, I'm also curious to see if the results are consistently different in those two cases.

As a reminder, there is not really a security boundary between a process-isolated container and its host. See:
https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/container-security for a robust discussion of this topic.

@fschmied

@michbern-ms We've been able to reproduce that there is a bug in process isolation regarding KSP registrations. See #263 for the full description.

@ghost

ghost commented Nov 21, 2022

This issue has been open for 90 days with no updates.
@weijuans-msft, please provide an update or close this issue.

@michbern-ms
Contributor

Wanted to provide an update - the engineering work to fix this issue was painstaking, but it is complete and available in builds for the Windows Server Insider Program: https://learn.microsoft.com/en-us/windows-insider/business/server-get-started

Meanwhile, we have started on the process to make this change available on Windows Server 2022.

@jwbrothers

Thanks for the update.

@fschmied

fschmied commented May 14, 2023

Also thanks for the update! For the time being, we're still on Windows Server 2019, so we'll need to keep working around the issues and hoping none of our customers will find a situation that we can't work around any more. We're planning to move on to Server 2022 in a couple of months, so I'm keeping my fingers crossed.

@microsoft-github-policy-service
Contributor

This issue has been open for 90 days with no updates.
@akarshm, please provide an update or close this issue.

@zosocanuck

Hi, I'm also encountering the same issues and was curious if there were any new updates?

@michbern-ms
Contributor

We are very close. I can't give you a precise date, but we're planning to service the fix to both Server 2019 and Server 2022 rather soon.

@zosocanuck Are you also using the Venafi CSP, or a different one?

@zosocanuck

@michbern-ms Yes I'm using the Venafi CSP

@michbern-ms
Contributor

michbern-ms commented Sep 12, 2023

This is a humbling update to write, but here goes.

The good news: We completed our work to backport the crypto-stack isolation feature to Server 2019 and Server 2022. It is available in the servicing releases that became generally available today (September 12, 2023). So, the support for having different crypto/KSP configs in a container than exist in the host machine is complete and available. Microsoft is very careful with the millions of existing Windows Server instances in production today, so authoring and verification of these two backports was time-consuming, but we're proud that the work is available now.

Moreover, the whole scenario works end-to-end in the Windows Server 23H2 release, which will become available later this fall:
https://techcommunity.microsoft.com/t5/containers/portability-with-windows-server-annual-channel-for-containers/ba-p/3885911

The bad news: In final testing, we found that there is an issue that only manifests when running with the Venafi provider. This issue manifests as the container failing to start after the Venafi provider is installed.

[image]

We are investigating this new issue with high priority.

This is clearly not the update I was hoping to post today, but I felt that the transparency was important for all of you who have been paying attention to this investigation.

Thanks,
Michael

@michbern-ms
Contributor

We asked our colleagues in the OS crypto and security team for a simple way to demonstrate the new functionality in the September servicing release, and they helpfully sent us the following - some PowerShell commands you can use to see the new crypto-config isolation support:

The demonstration uses the following PowerShell commands:
Get-TlsCipherSuite (TLS) | Microsoft Learn
Enable-TlsCipherSuite (TLS) | Microsoft Learn
Disable-TlsCipherSuite (TLS) | Microsoft Learn

Call Get-TlsCipherSuite on both the host and the container to show the current state.
Then, from within the container, call Enable-TlsCipherSuite -Name TLS_DHE_DSS_WITH_AES_256_CBC_SHA_BLA_BLA_BLA. This should add the (fake) cipher suite to the end of the list.
Re-run Get-TlsCipherSuite on both the host and the container. Without the fix, you'd expect to see the fake suite listed in both. With the fix, you see it only in the container where it was enabled.
The fake cipher suite can then be removed by calling Disable-TlsCipherSuite -Name 'TLS_DHE_DSS_WITH_AES_256_CBC_SHA_BLA_BLA_BLA'.

Note: Enable-TlsCipherSuite and Disable-TlsCipherSuite must be run from a PowerShell session started as Administrator. This works by default in a container, but has to be done explicitly on the host.
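
If you want to script the comparison, the before/after check described above can be reduced to a set difference. The sketch below assumes you have already collected the cipher suite names on each side (for example from the Name property of the objects that Get-TlsCipherSuite returns); the function names are hypothetical.

```python
from typing import Iterable, Set

# The deliberately fake suite name used in the walkthrough above.
FAKE_SUITE = "TLS_DHE_DSS_WITH_AES_256_CBC_SHA_BLA_BLA_BLA"


def leaked_to_host(host_before: Iterable[str], host_after: Iterable[str]) -> Set[str]:
    """Suites that appeared on the host after the container-side change."""
    return set(host_after) - set(host_before)


def isolation_holds(
    host_before: Iterable[str],
    host_after: Iterable[str],
    container_after: Iterable[str],
    test_suite: str = FAKE_SUITE,
) -> bool:
    """True if test_suite is visible in the container but did not reach the host."""
    in_container = test_suite in set(container_after)
    on_host = test_suite in leaked_to_host(host_before, host_after)
    return in_container and not on_host
```

With the fix in place, isolation_holds should return True for the three lists gathered during the walkthrough; without it, the fake suite would show up in the host's "after" list and the function would return False.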

@michbern-ms
Contributor

I'm happy to say that the new issue identified above on 9/12 is fixed in the Windows Server 11B release, which is available publicly today. We retested the original scenario with the Venafi provider and confirmed that it is working as expected now.

I'd love to get an independent confirmation of that, and then we can close out this thread.

@minego

minego commented Nov 14, 2023 via email

@zosocanuck

@michbern-ms Is there a specific image tag that you recommend we can test against?

@michbern-ms
Contributor

I believe the fix impacts both the container host and the container image. So, for the container, you can use:

mcr.microsoft.com/windows/servercore:ltsc2022-amd64
mcr.microsoft.com/windows/servercore:ltsc2019-amd64

For the host, please be sure you've received the 2023-11 cumulative update.

Thanks,
Michael

@zosocanuck

@michbern-ms We were also able to validate the latest servercore images fixed the issues we were seeing with the Venafi provider.

@michbern-ms
Contributor

Fantastic! After quite a long journey, I'm happy to see this one closed.
