Discussion:
[edk2] [PATCH] OvmfPkg/VirtioLib: Ensure VirtioFlush() is not blocked
Olivier Martin
2015-07-09 16:47:17 UTC
Permalink
.. in case the platform does not received the used buffer from the
device, VirtioFlush() returns EFI_TIMEOUT.

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Olivier Martin <***@arm.com>
---
OvmfPkg/Include/Library/VirtioLib.h | 3 +++
OvmfPkg/Library/VirtioLib/VirtioLib.c | 16 ++++++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/OvmfPkg/Include/Library/VirtioLib.h b/OvmfPkg/Include/Library/VirtioLib.h
index 36527a5..62f6811 100644
--- a/OvmfPkg/Include/Library/VirtioLib.h
+++ b/OvmfPkg/Include/Library/VirtioLib.h
@@ -170,6 +170,9 @@ VirtioAppendDesc (

@return Error code from VirtIo->SetQueueNotify() if it fails.

+ @retval EFI_TIMEOUT If it did not received the used buffer from the device
+ in approximatively less than 10ms
+
@retval EFI_SUCCESS Otherwise, the host processed all descriptors.

**/
diff --git a/OvmfPkg/Library/VirtioLib/VirtioLib.c b/OvmfPkg/Library/VirtioLib/VirtioLib.c
index 54cf225..566f596 100644
--- a/OvmfPkg/Library/VirtioLib/VirtioLib.c
+++ b/OvmfPkg/Library/VirtioLib/VirtioLib.c
@@ -3,7 +3,7 @@
Utility functions used by virtio device drivers.

Copyright (C) 2012, Red Hat, Inc.
- Portion of Copyright (C) 2013, ARM Ltd.
+ Portion of Copyright (C) 2013-2014, ARM Ltd.

This program and the accompanying materials are licensed and made available
under the terms and conditions of the BSD License which accompanies this
@@ -252,6 +252,9 @@ VirtioAppendDesc (

@return Error code from VirtIo->SetQueueNotify() if it fails.

+ @retval EFI_TIMEOUT If it did not received the used buffer from the device
+ in approximatively less than 10ms
+
@retval EFI_SUCCESS Otherwise, the host processed all descriptors.

**/
@@ -267,6 +270,7 @@ VirtioFlush (
UINT16 NextAvailIdx;
EFI_STATUS Status;
UINTN PollPeriodUsecs;
+ UINTN Timeout;

//
// virtio-0.9.5, 2.4.1.2 Updating the Available Ring
@@ -304,16 +308,24 @@ VirtioFlush (
// Keep slowing down until we reach a poll period of slightly above 1 ms.
//
PollPeriodUsecs = 1;
+ Timeout = 0;
MemoryFence();
- while (*Ring->Used.Idx != NextAvailIdx) {
+ while ((*Ring->Used.Idx != NextAvailIdx) && (Timeout < 10)) {
gBS->Stall (PollPeriodUsecs); // calls AcpiTimerLib::MicroSecondDelay

if (PollPeriodUsecs < 1024) {
PollPeriodUsecs *= 2;
+ } else {
+ Timeout++;
}
MemoryFence();
}

MemoryFence();
+
+ if (Timeout == 10) {
+ return EFI_TIMEOUT;
+ }
+
return EFI_SUCCESS;
}
--
2.1.1
Laszlo Ersek
2015-07-09 17:29:37 UTC
Permalink
Post by Olivier Martin
.. in case the platform does not received the used buffer from the
device, VirtioFlush() returns EFI_TIMEOUT.
Contributed-under: TianoCore Contribution Agreement 1.0
---
OvmfPkg/Include/Library/VirtioLib.h | 3 +++
OvmfPkg/Library/VirtioLib/VirtioLib.c | 16 ++++++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/OvmfPkg/Include/Library/VirtioLib.h b/OvmfPkg/Include/Library/VirtioLib.h
index 36527a5..62f6811 100644
--- a/OvmfPkg/Include/Library/VirtioLib.h
+++ b/OvmfPkg/Include/Library/VirtioLib.h
@@ -170,6 +170,9 @@ VirtioAppendDesc (
@return Error code from VirtIo->SetQueueNotify() if it fails.
+ in approximatively less than 10ms
+
@retval EFI_SUCCESS Otherwise, the host processed all descriptors.
**/
diff --git a/OvmfPkg/Library/VirtioLib/VirtioLib.c b/OvmfPkg/Library/VirtioLib/VirtioLib.c
index 54cf225..566f596 100644
--- a/OvmfPkg/Library/VirtioLib/VirtioLib.c
+++ b/OvmfPkg/Library/VirtioLib/VirtioLib.c
@@ -3,7 +3,7 @@
Utility functions used by virtio device drivers.
Copyright (C) 2012, Red Hat, Inc.
- Portion of Copyright (C) 2013, ARM Ltd.
+ Portion of Copyright (C) 2013-2014, ARM Ltd.
This program and the accompanying materials are licensed and made available
under the terms and conditions of the BSD License which accompanies this
@@ -252,6 +252,9 @@ VirtioAppendDesc (
@return Error code from VirtIo->SetQueueNotify() if it fails.
+ in approximatively less than 10ms
+
@retval EFI_SUCCESS Otherwise, the host processed all descriptors.
**/
@@ -267,6 +270,7 @@ VirtioFlush (
UINT16 NextAvailIdx;
EFI_STATUS Status;
UINTN PollPeriodUsecs;
+ UINTN Timeout;
//
// virtio-0.9.5, 2.4.1.2 Updating the Available Ring
@@ -304,16 +308,24 @@ VirtioFlush (
// Keep slowing down until we reach a poll period of slightly above 1 ms.
//
PollPeriodUsecs = 1;
+ Timeout = 0;
MemoryFence();
- while (*Ring->Used.Idx != NextAvailIdx) {
+ while ((*Ring->Used.Idx != NextAvailIdx) && (Timeout < 10)) {
gBS->Stall (PollPeriodUsecs); // calls AcpiTimerLib::MicroSecondDelay
if (PollPeriodUsecs < 1024) {
PollPeriodUsecs *= 2;
+ } else {
+ Timeout++;
}
MemoryFence();
}
MemoryFence();
+
+ if (Timeout == 10) {
+ return EFI_TIMEOUT;
+ }
+
return EFI_SUCCESS;
}
I think if this ever happens, that's a bug in the host / hypervisor.
Under what conditions are you seeing this as necessary?

VirtioFlush() blocks infinitely on purpose in this loop. The protocol
does not define any circumstances for timeout. The communicating peers
are not connected by a fragile network link, they are connected by host
memory. Whenever you time out, that's an arbitrary decision; it might
even misfire dependent on host load.

If you think this is really necessary, then please introduce an integer
PCD. The default value of the PCD (probably 0) should preserve existing
behavior. The PCD should be then overridden by downstreams (by directly
editing the appropriate DSC file). Alternatively, if you have a specific
host platform in mind that necessitates this change (and for which the
value 10 also seems right), then please introduce a new build flag as
well (in the affected DSC(s)). People who build for that platform can
then hardcode that flag in their build scripts.

There are two users of VirtioFlush(): the virtio-blk and virtio-scsi
drivers. Both check for (and propagate) any errors returned by
VirtioFlush() correctly. Any timeout error would reach the outermost
callers (the respective protocol users) just fine. So those parts of the
tree need not be fixed up, I agree. Still, this patch would work around
a host bug, as far as I understand, and the actual timeout looks
arbitrary, so it should be off by default, and controllable without
modifying the C source.

The commit message (and the PCD's comment) should also identify the host
platform that requires this as precisely as possible.

... Obviously, it is *also* possible that the guest code has some other
bug, elsewhere. (We can never exclude that, unless we prove the code
correct with formal tools.) It seems quite unlikely, but if that's the
case, then I'd prefer to track it down and fix it, instead of papering
it over.

Thanks!
Laszlo
Olivier Martin
2015-07-10 15:22:07 UTC
Permalink
Thanks Lazlo, I had this issue with the ARM FVP Models and its Virtio support (see: http://infocenter.arm.com/help/topic/com.arm.doc.dui0834a/Chunk446462681.html). I do not remember which model version. I am happy to abandon this patch until I track down the model version and an action is taken by the ARM model team to fix it.

-----Original Message-----
From: Laszlo Ersek [mailto:***@redhat.com]
Sent: 09 July 2015 18:30
To: Olivier Martin
Cc: ***@intel.com; edk2-***@lists.sourceforge.net
Subject: Re: [PATCH] OvmfPkg/VirtioLib: Ensure VirtioFlush() is not blocked
Post by Olivier Martin
.. in case the platform does not received the used buffer from the
device, VirtioFlush() returns EFI_TIMEOUT.
Contributed-under: TianoCore Contribution Agreement 1.0
---
OvmfPkg/Include/Library/VirtioLib.h | 3 +++
OvmfPkg/Library/VirtioLib/VirtioLib.c | 16 ++++++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/OvmfPkg/Include/Library/VirtioLib.h
b/OvmfPkg/Include/Library/VirtioLib.h
index 36527a5..62f6811 100644
--- a/OvmfPkg/Include/Library/VirtioLib.h
+++ b/OvmfPkg/Include/Library/VirtioLib.h
@@ -170,6 +170,9 @@ VirtioAppendDesc (
@return Error code from VirtIo->SetQueueNotify() if it fails.
+ in approximatively less than 10ms
+
@retval EFI_SUCCESS Otherwise, the host processed all descriptors.
**/
diff --git a/OvmfPkg/Library/VirtioLib/VirtioLib.c
b/OvmfPkg/Library/VirtioLib/VirtioLib.c
index 54cf225..566f596 100644
--- a/OvmfPkg/Library/VirtioLib/VirtioLib.c
+++ b/OvmfPkg/Library/VirtioLib/VirtioLib.c
@@ -3,7 +3,7 @@
Utility functions used by virtio device drivers.
Copyright (C) 2012, Red Hat, Inc.
- Portion of Copyright (C) 2013, ARM Ltd.
+ Portion of Copyright (C) 2013-2014, ARM Ltd.
This program and the accompanying materials are licensed and made available
under the terms and conditions of the BSD License which accompanies
@return Error code from VirtIo->SetQueueNotify() if it fails.
+ in approximatively less than 10ms
+
@retval EFI_SUCCESS Otherwise, the host processed all descriptors.
**/
@@ -267,6 +270,7 @@ VirtioFlush (
UINT16 NextAvailIdx;
EFI_STATUS Status;
UINTN PollPeriodUsecs;
+ UINTN Timeout;
//
// Keep slowing down until we reach a poll period of slightly above 1 ms.
//
PollPeriodUsecs = 1;
+ Timeout = 0;
MemoryFence();
- while (*Ring->Used.Idx != NextAvailIdx) {
+ while ((*Ring->Used.Idx != NextAvailIdx) && (Timeout < 10)) {
gBS->Stall (PollPeriodUsecs); // calls
AcpiTimerLib::MicroSecondDelay
if (PollPeriodUsecs < 1024) {
PollPeriodUsecs *= 2;
+ } else {
+ Timeout++;
}
MemoryFence();
}
MemoryFence();
+
+ if (Timeout == 10) {
+ return EFI_TIMEOUT;
+ }
+
return EFI_SUCCESS;
}
I think if this ever happens, that's a bug in the host / hypervisor.
Under what conditions are you seeing this as necessary?

VirtioFlush() blocks infinitely on purpose in this loop. The protocol does not define any circumstances for timeout. The communicating peers are not connected by a fragile network link, they are connected by host memory. Whenever you time out, that's an arbitrary decision; it might even misfire dependent on host load.

If you think this is really necessary, then please introduce an integer PCD. The default value of the PCD (probably 0) should preserve existing behavior. The PCD should be then overridden by downstreams (by directly editing the appropriate DSC file). Alternatively, if you have a specific host platform in mind that necessitates this change (and for which the value 10 also seems right), then please introduce a new build flag as well (in the affected DSC(s)). People who build for that platform can then hardcode that flag in their build scripts.

There are two users of VirtioFlush(): the virtio-blk and virtio-scsi drivers. Both check for (and propagate) any errors returned by
VirtioFlush() correctly. Any timeout error would reach the outermost callers (the respective protocol users) just fine. So those parts of the tree need not be fixed up, I agree. Still, this patch would work around a host bug, as far as I understand, and the actual timeout looks arbitrary, so it should be off by default, and controllable without modifying the C source.

The commit message (and the PCD's comment) should also identify the host platform that requires this as precisely as possible.

... Obviously, it is *also* possible that the guest code has some other bug, elsewhere. (We can never exclude that, unless we prove the code correct with formal tools.) It seems quite unlikely, but if that's the case, then I'd prefer to track it down and fix it, instead of papering it over.

Thanks!
Laszlo


-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590
ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
Laszlo Ersek
2015-07-10 15:59:02 UTC
Permalink
Post by Olivier Martin
Thanks Lazlo, I had this issue with the ARM FVP Models and its Virtio
http://infocenter.arm.com/help/topic/com.arm.doc.dui0834a/Chunk446462681.html).
I do not remember which model version. I am happy to abandon this
patch until I track down the model version and an action is taken by
the ARM model team to fix it.
I agree that this would be the best course of action -- in November 2013
or so I reported a virtio bug for the Foundation model (case #566785,
"full device reset doesn't reset QueuePFNs"), with a reproducer guest
kernel:

http://people.redhat.com/~lersek/arm_case566785_queuepfn_repro/

The model team at ARM fixed the bug for the next release. It was a
positive experience for me.

Thanks!
Laszlo
Post by Olivier Martin
-----Original Message-----
Sent: 09 July 2015 18:30
To: Olivier Martin
Subject: Re: [PATCH] OvmfPkg/VirtioLib: Ensure VirtioFlush() is not blocked
Post by Olivier Martin
.. in case the platform does not received the used buffer from the
device, VirtioFlush() returns EFI_TIMEOUT.
Contributed-under: TianoCore Contribution Agreement 1.0
---
OvmfPkg/Include/Library/VirtioLib.h | 3 +++
OvmfPkg/Library/VirtioLib/VirtioLib.c | 16 ++++++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/OvmfPkg/Include/Library/VirtioLib.h
b/OvmfPkg/Include/Library/VirtioLib.h
index 36527a5..62f6811 100644
--- a/OvmfPkg/Include/Library/VirtioLib.h
+++ b/OvmfPkg/Include/Library/VirtioLib.h
@@ -170,6 +170,9 @@ VirtioAppendDesc (
@return Error code from VirtIo->SetQueueNotify() if it fails.
+ in approximatively less than 10ms
+
@retval EFI_SUCCESS Otherwise, the host processed all descriptors.
**/
diff --git a/OvmfPkg/Library/VirtioLib/VirtioLib.c
b/OvmfPkg/Library/VirtioLib/VirtioLib.c
index 54cf225..566f596 100644
--- a/OvmfPkg/Library/VirtioLib/VirtioLib.c
+++ b/OvmfPkg/Library/VirtioLib/VirtioLib.c
@@ -3,7 +3,7 @@
Utility functions used by virtio device drivers.
Copyright (C) 2012, Red Hat, Inc.
- Portion of Copyright (C) 2013, ARM Ltd.
+ Portion of Copyright (C) 2013-2014, ARM Ltd.
This program and the accompanying materials are licensed and made available
under the terms and conditions of the BSD License which accompanies
@return Error code from VirtIo->SetQueueNotify() if it fails.
+ in approximatively less than 10ms
+
@retval EFI_SUCCESS Otherwise, the host processed all descriptors.
**/
@@ -267,6 +270,7 @@ VirtioFlush (
UINT16 NextAvailIdx;
EFI_STATUS Status;
UINTN PollPeriodUsecs;
+ UINTN Timeout;
//
// Keep slowing down until we reach a poll period of slightly above 1 ms.
//
PollPeriodUsecs = 1;
+ Timeout = 0;
MemoryFence();
- while (*Ring->Used.Idx != NextAvailIdx) {
+ while ((*Ring->Used.Idx != NextAvailIdx) && (Timeout < 10)) {
gBS->Stall (PollPeriodUsecs); // calls
AcpiTimerLib::MicroSecondDelay
if (PollPeriodUsecs < 1024) {
PollPeriodUsecs *= 2;
+ } else {
+ Timeout++;
}
MemoryFence();
}
MemoryFence();
+
+ if (Timeout == 10) {
+ return EFI_TIMEOUT;
+ }
+
return EFI_SUCCESS;
}
I think if this ever happens, that's a bug in the host / hypervisor.
Under what conditions are you seeing this as necessary?
VirtioFlush() blocks infinitely on purpose in this loop. The protocol does not define any circumstances for timeout. The communicating peers are not connected by a fragile network link, they are connected by host memory. Whenever you time out, that's an arbitrary decision; it might even misfire dependent on host load.
If you think this is really necessary, then please introduce an integer PCD. The default value of the PCD (probably 0) should preserve existing behavior. The PCD should be then overridden by downstreams (by directly editing the appropriate DSC file). Alternatively, if you have a specific host platform in mind that necessitates this change (and for which the value 10 also seems right), then please introduce a new build flag as well (in the affected DSC(s)). People who build for that platform can then hardcode that flag in their build scripts.
There are two users of VirtioFlush(): the virtio-blk and virtio-scsi drivers. Both check for (and propagate) any errors returned by
VirtioFlush() correctly. Any timeout error would reach the outermost callers (the respective protocol users) just fine. So those parts of the tree need not be fixed up, I agree. Still, this patch would work around a host bug, as far as I understand, and the actual timeout looks arbitrary, so it should be off by default, and controllable without modifying the C source.
The commit message (and the PCD's comment) should also identify the host platform that requires this as precisely as possible.
... Obviously, it is *also* possible that the guest code has some other bug, elsewhere. (We can never exclude that, unless we prove the code correct with formal tools.) It seems quite unlikely, but if that's the case, then I'd prefer to track it down and fix it, instead of papering it over.
Thanks!
Laszlo
-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590
ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
Loading...