Discussion:
[edk2] [PATCH 0/4] OvmfPkg: enable >= 64 GB guests
Laszlo Ersek
2015-06-16 17:05:23 UTC
Permalink
No functional changes in v2. I just added Tested-by tags, improved the
English grammar in a comment in patch #1, and dropped the BUG language
in the commit message of patch #4 (because that issue was caused by the
EPT setup / phys-address-width of my laptop).

Public branch: <https://github.com/lersek/edk2/commits/bigram_v1>.

(I placed "v1" in the branch name not because I expect to have to post a
v2, but to clearly distinguish it from the earlier RFC series.)

Cc: Maoming <***@huawei.com>
Cc: Huangpeng (Peter) <***@huawei.com>
Cc: Wei Liu <***@citrix.com>

Thanks
Laszlo

Laszlo Ersek (4):
OvmfPkg: PlatformPei: enable larger permanent PEI RAM
OvmfPkg: PlatformPei: create the CPU HOB with dynamic memory space
width
OvmfPkg: PlatformPei: beautify memory HOB order in QemuInitializeRam()
OvmfPkg: PlatformPei: invert MTRR setup in QemuInitializeRam()

OvmfPkg/PlatformPei/Platform.h | 7 +++
OvmfPkg/PlatformPei/MemDetect.c | 115 +++++++++++++++++++++++++++++++++++-----
OvmfPkg/PlatformPei/Platform.c | 7 ++-
3 files changed, 114 insertions(+), 15 deletions(-)
--
1.8.3.1


------------------------------------------------------------------------------
Laszlo Ersek
2015-06-16 17:05:24 UTC
Permalink
We'll soon increase the maximum guest-physical RAM size supported by OVMF.
For more RAM, the DXE IPL is going to build more page tables, and for that
it's going to need a bigger chunk from the permanent PEI RAM.
DXE IPL Entry
Loading PEIM at 0x000BFF61000 EntryPoint=0x000BFF61260 DxeCore.efi
Loading DXE CORE at 0x000BFF61000 EntryPoint=0x000BFF61260
AllocatePages failed: No 0x40201 Pages is available.
There is only left 0x3F1F pages memory resource to be allocated.
BigPageAddress != 0
(The above example corresponds to the artificially high, maximal address
width of 52, clamped by the DXE core to 48. An address width of 48 bits
covers 256 TB of RAM, and requires a bit more than 1 GB for paging
structures.)

Cc: Maoming <***@huawei.com>
Cc: Huangpeng (Peter) <***@huawei.com>
Cc: Wei Liu <***@citrix.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <***@redhat.com>
Tested-by: Wei Liu <***@citrix.com>
Tested-by: Maoming <***@huawei.com>
---
OvmfPkg/PlatformPei/Platform.h | 7 +++++
OvmfPkg/PlatformPei/MemDetect.c | 61 +++++++++++++++++++++++++++++++++++++++--
OvmfPkg/PlatformPei/Platform.c | 1 +
3 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/OvmfPkg/PlatformPei/Platform.h b/OvmfPkg/PlatformPei/Platform.h
index 31640e9..8b6a976 100644
--- a/OvmfPkg/PlatformPei/Platform.h
+++ b/OvmfPkg/PlatformPei/Platform.h
@@ -59,6 +59,11 @@ AddUntestedMemoryRangeHob (
EFI_PHYSICAL_ADDRESS MemoryLimit
);

+VOID
+AddressWidthInitialization (
+ VOID
+ );
+
EFI_STATUS
PublishPeiMemory (
VOID
@@ -100,4 +105,6 @@ extern EFI_BOOT_MODE mBootMode;

extern BOOLEAN mS3Supported;

+extern UINT8 mPhysMemAddressWidth;
+
#endif // _PLATFORM_PEI_H_INCLUDED_
diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index bd7bb02..6b424f7 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
@@ -36,6 +36,8 @@ Module Name:
#include "Platform.h"
#include "Cmos.h"

+UINT8 mPhysMemAddressWidth;
+
UINT32
GetSystemMemorySizeBelow4gb (
VOID
@@ -84,6 +86,47 @@ GetSystemMemorySizeAbove4gb (
return LShiftU64 (Size, 16);
}

+
+/**
+ Initialize the mPhysMemAddressWidth variable, based on guest RAM size.
+**/
+VOID
+AddressWidthInitialization (
+ VOID
+ )
+{
+ UINT64 FirstNonAddress;
+
+ //
+ // As guest-physical memory size grows, the permanent PEI RAM requirements
+ // are dominated by the identity-mapping page tables built by the DXE IPL.
+ // The DXE IPL keys off the physical address bits advertised in the CPU
+ // HOB. To conserve memory, we calculate the minimum address width here.
+ //
+ FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
+ mPhysMemAddressWidth = (UINT8)HighBitSet64 (FirstNonAddress);
+
+ //
+ // If FirstNonAddress is not an integral power of two, then we need an
+ // additional bit.
+ //
+ if ((FirstNonAddress & (FirstNonAddress - 1)) != 0) {
+ ++mPhysMemAddressWidth;
+ }
+
+ //
+ // The minimum address width is 36 (covers up to and excluding 64 GB, which
+ // is the maximum for Ia32 + PAE). The theoretical architecture maximum for
+ // X64 long mode is 52 bits, but the DXE IPL clamps that down to 48 bits. We
+ // can simply assert that here, since 48 bits are good enough for 256 TB.
+ //
+ if (mPhysMemAddressWidth <= 36) {
+ mPhysMemAddressWidth = 36;
+ }
+ ASSERT (mPhysMemAddressWidth <= 48);
+}
+
+
/**
Publish PEI core memory

@@ -99,6 +142,7 @@ PublishPeiMemory (
EFI_PHYSICAL_ADDRESS MemoryBase;
UINT64 MemorySize;
UINT64 LowerMemorySize;
+ UINT32 PeiMemoryCap;

if (mBootMode == BOOT_ON_S3_RESUME) {
MemoryBase = PcdGet32 (PcdS3AcpiReservedMemoryBase);
@@ -107,13 +151,24 @@ PublishPeiMemory (
LowerMemorySize = GetSystemMemorySizeBelow4gb ();

//
+ // For the minimum address width of 36, installing 64 MB as permanent PEI
+ // RAM is sufficient. For the maximum width, the DXE IPL needs a bit more
+ // than 1 GB for paging structures. Therefore we establish an exponential
+ // formula so that the 48-36+1=13 different widths map to permanent PEI RAM
+ // sizes in [64 MB, 2 GB], that is [1<<26, 1<<31]; 6 different powers.
+ //
+ PeiMemoryCap = SIZE_64MB << ((mPhysMemAddressWidth - 36) * 5 / 12);
+ DEBUG ((EFI_D_INFO, "%a: mPhysMemAddressWidth=%d PeiMemoryCap=%uMB\n",
+ __FUNCTION__, mPhysMemAddressWidth, PeiMemoryCap >> 20));
+
+ //
// Determine the range of memory to use during PEI
//
MemoryBase = PcdGet32 (PcdOvmfDxeMemFvBase) + PcdGet32 (PcdOvmfDxeMemFvSize);
MemorySize = LowerMemorySize - MemoryBase;
- if (MemorySize > SIZE_64MB) {
- MemoryBase = LowerMemorySize - SIZE_64MB;
- MemorySize = SIZE_64MB;
+ if (MemorySize > PeiMemoryCap) {
+ MemoryBase = LowerMemorySize - PeiMemoryCap;
+ MemorySize = PeiMemoryCap;
}
}

diff --git a/OvmfPkg/PlatformPei/Platform.c b/OvmfPkg/PlatformPei/Platform.c
index 1ad5bfc..54ec822 100644
--- a/OvmfPkg/PlatformPei/Platform.c
+++ b/OvmfPkg/PlatformPei/Platform.c
@@ -405,6 +405,7 @@ InitializePlatform (
}

BootModeInitialization ();
+ AddressWidthInitialization ();

PublishPeiMemory ();
--
1.8.3.1



------------------------------------------------------------------------------
Jordan Justen
2015-06-22 18:34:37 UTC
Permalink
Post by Laszlo Ersek
We'll soon increase the maximum guest-physical RAM size supported by OVMF.
For more RAM, the DXE IPL is going to build more page tables, and for that
it's going to need a bigger chunk from the permanent PEI RAM.
DXE IPL Entry
Loading PEIM at 0x000BFF61000 EntryPoint=0x000BFF61260 DxeCore.efi
Loading DXE CORE at 0x000BFF61000 EntryPoint=0x000BFF61260
AllocatePages failed: No 0x40201 Pages is available.
There is only left 0x3F1F pages memory resource to be allocated.
BigPageAddress != 0
(The above example corresponds to the artificially high, maximal address
width of 52, clamped by the DXE core to 48. An address width of 48 bits
covers 256 TB of RAM, and requires a bit more than 1 GB for paging
structures.)
Contributed-under: TianoCore Contribution Agreement 1.0
---
OvmfPkg/PlatformPei/Platform.h | 7 +++++
OvmfPkg/PlatformPei/MemDetect.c | 61 +++++++++++++++++++++++++++++++++++++++--
OvmfPkg/PlatformPei/Platform.c | 1 +
3 files changed, 66 insertions(+), 3 deletions(-)
diff --git a/OvmfPkg/PlatformPei/Platform.h b/OvmfPkg/PlatformPei/Platform.h
index 31640e9..8b6a976 100644
--- a/OvmfPkg/PlatformPei/Platform.h
+++ b/OvmfPkg/PlatformPei/Platform.h
@@ -59,6 +59,11 @@ AddUntestedMemoryRangeHob (
EFI_PHYSICAL_ADDRESS MemoryLimit
);
+VOID
+AddressWidthInitialization (
+ VOID
+ );
+
EFI_STATUS
PublishPeiMemory (
VOID
@@ -100,4 +105,6 @@ extern EFI_BOOT_MODE mBootMode;
extern BOOLEAN mS3Supported;
+extern UINT8 mPhysMemAddressWidth;
+
#endif // _PLATFORM_PEI_H_INCLUDED_
diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index bd7bb02..6b424f7 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
#include "Platform.h"
#include "Cmos.h"
+UINT8 mPhysMemAddressWidth;
+
UINT32
GetSystemMemorySizeBelow4gb (
VOID
@@ -84,6 +86,47 @@ GetSystemMemorySizeAbove4gb (
return LShiftU64 (Size, 16);
}
+
+/**
+ Initialize the mPhysMemAddressWidth variable, based on guest RAM size.
+**/
+VOID
+AddressWidthInitialization (
+ VOID
+ )
+{
+ UINT64 FirstNonAddress;
+
+ //
+ // As guest-physical memory size grows, the permanent PEI RAM requirements
+ // are dominated by the identity-mapping page tables built by the DXE IPL.
+ // The DXE IPL keys off the physical address bits advertised in the CPU
+ // HOB. To conserve memory, we calculate the minimum address width here.
+ //
+ FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
+ mPhysMemAddressWidth = (UINT8)HighBitSet64 (FirstNonAddress);
+
+ //
+ // If FirstNonAddress is not an integral power of two, then we need an
+ // additional bit.
+ //
+ if ((FirstNonAddress & (FirstNonAddress - 1)) != 0) {
+ ++mPhysMemAddressWidth;
+ }
+
+ //
+ // The minimum address width is 36 (covers up to and excluding 64 GB, which
+ // is the maximum for Ia32 + PAE). The theoretical architecture maximum for
+ // X64 long mode is 52 bits, but the DXE IPL clamps that down to 48 bits. We
+ // can simply assert that here, since 48 bits are good enough for 256 TB.
+ //
+ if (mPhysMemAddressWidth <= 36) {
+ mPhysMemAddressWidth = 36;
+ }
+ ASSERT (mPhysMemAddressWidth <= 48);
+}
+
+
/**
Publish PEI core memory
@@ -99,6 +142,7 @@ PublishPeiMemory (
EFI_PHYSICAL_ADDRESS MemoryBase;
UINT64 MemorySize;
UINT64 LowerMemorySize;
+ UINT32 PeiMemoryCap;
if (mBootMode == BOOT_ON_S3_RESUME) {
MemoryBase = PcdGet32 (PcdS3AcpiReservedMemoryBase);
@@ -107,13 +151,24 @@ PublishPeiMemory (
LowerMemorySize = GetSystemMemorySizeBelow4gb ();
//
+ // For the minimum address width of 36, installing 64 MB as permanent PEI
+ // RAM is sufficient. For the maximum width, the DXE IPL needs a bit more
+ // than 1 GB for paging structures. Therefore we establish an exponential
+ // formula so that the 48-36+1=13 different widths map to permanent PEI RAM
+ // sizes in [64 MB, 2 GB], that is [1<<26, 1<<31]; 6 different powers.
+ //
+ PeiMemoryCap = SIZE_64MB << ((mPhysMemAddressWidth - 36) * 5 / 12);
for p in map(lambda a: (a, 64 << ((a-36) * 5 / 12)), range(36,49)): print '%d bits => %d MB' % p
...
36 bits => 64 MB
37 bits => 64 MB
38 bits => 64 MB
39 bits => 128 MB
40 bits => 128 MB
41 bits => 256 MB
42 bits => 256 MB
43 bits => 256 MB
44 bits => 512 MB
45 bits => 512 MB
46 bits => 1024 MB
47 bits => 1024 MB
48 bits => 2048 MB

Interesting, but I still don't quite feel like it makes (intuitive)
sense to me. :)

What if you make a calculation of how big the tables should be and
then add 64 MB to it?

-Jordan
Post by Laszlo Ersek
+ DEBUG ((EFI_D_INFO, "%a: mPhysMemAddressWidth=%d PeiMemoryCap=%uMB\n",
+ __FUNCTION__, mPhysMemAddressWidth, PeiMemoryCap >> 20));
+
+ //
// Determine the range of memory to use during PEI
//
MemoryBase = PcdGet32 (PcdOvmfDxeMemFvBase) + PcdGet32 (PcdOvmfDxeMemFvSize);
MemorySize = LowerMemorySize - MemoryBase;
- if (MemorySize > SIZE_64MB) {
- MemoryBase = LowerMemorySize - SIZE_64MB;
- MemorySize = SIZE_64MB;
+ if (MemorySize > PeiMemoryCap) {
+ MemoryBase = LowerMemorySize - PeiMemoryCap;
+ MemorySize = PeiMemoryCap;
}
}
diff --git a/OvmfPkg/PlatformPei/Platform.c b/OvmfPkg/PlatformPei/Platform.c
index 1ad5bfc..54ec822 100644
--- a/OvmfPkg/PlatformPei/Platform.c
+++ b/OvmfPkg/PlatformPei/Platform.c
@@ -405,6 +405,7 @@ InitializePlatform (
}
BootModeInitialization ();
+ AddressWidthInitialization ();
PublishPeiMemory ();
--
1.8.3.1
------------------------------------------------------------------------------
_______________________________________________
edk2-devel mailing list
https://lists.sourceforge.net/lists/listinfo/edk2-devel
Laszlo Ersek
2015-06-22 19:07:05 UTC
Permalink
Post by Jordan Justen
Post by Laszlo Ersek
We'll soon increase the maximum guest-physical RAM size supported by OVMF.
For more RAM, the DXE IPL is going to build more page tables, and for that
it's going to need a bigger chunk from the permanent PEI RAM.
DXE IPL Entry
Loading PEIM at 0x000BFF61000 EntryPoint=0x000BFF61260 DxeCore.efi
Loading DXE CORE at 0x000BFF61000 EntryPoint=0x000BFF61260
AllocatePages failed: No 0x40201 Pages is available.
There is only left 0x3F1F pages memory resource to be allocated.
BigPageAddress != 0
(The above example corresponds to the artificially high, maximal address
width of 52, clamped by the DXE core to 48. An address width of 48 bits
covers 256 TB of RAM, and requires a bit more than 1 GB for paging
structures.)
Contributed-under: TianoCore Contribution Agreement 1.0
---
OvmfPkg/PlatformPei/Platform.h | 7 +++++
OvmfPkg/PlatformPei/MemDetect.c | 61 +++++++++++++++++++++++++++++++++++++++--
OvmfPkg/PlatformPei/Platform.c | 1 +
3 files changed, 66 insertions(+), 3 deletions(-)
diff --git a/OvmfPkg/PlatformPei/Platform.h b/OvmfPkg/PlatformPei/Platform.h
index 31640e9..8b6a976 100644
--- a/OvmfPkg/PlatformPei/Platform.h
+++ b/OvmfPkg/PlatformPei/Platform.h
@@ -59,6 +59,11 @@ AddUntestedMemoryRangeHob (
EFI_PHYSICAL_ADDRESS MemoryLimit
);
+VOID
+AddressWidthInitialization (
+ VOID
+ );
+
EFI_STATUS
PublishPeiMemory (
VOID
@@ -100,4 +105,6 @@ extern EFI_BOOT_MODE mBootMode;
extern BOOLEAN mS3Supported;
+extern UINT8 mPhysMemAddressWidth;
+
#endif // _PLATFORM_PEI_H_INCLUDED_
diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index bd7bb02..6b424f7 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
#include "Platform.h"
#include "Cmos.h"
+UINT8 mPhysMemAddressWidth;
+
UINT32
GetSystemMemorySizeBelow4gb (
VOID
@@ -84,6 +86,47 @@ GetSystemMemorySizeAbove4gb (
return LShiftU64 (Size, 16);
}
+
+/**
+ Initialize the mPhysMemAddressWidth variable, based on guest RAM size.
+**/
+VOID
+AddressWidthInitialization (
+ VOID
+ )
+{
+ UINT64 FirstNonAddress;
+
+ //
+ // As guest-physical memory size grows, the permanent PEI RAM requirements
+ // are dominated by the identity-mapping page tables built by the DXE IPL.
+ // The DXE IPL keys off the physical address bits advertised in the CPU
+ // HOB. To conserve memory, we calculate the minimum address width here.
+ //
+ FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
+ mPhysMemAddressWidth = (UINT8)HighBitSet64 (FirstNonAddress);
+
+ //
+ // If FirstNonAddress is not an integral power of two, then we need an
+ // additional bit.
+ //
+ if ((FirstNonAddress & (FirstNonAddress - 1)) != 0) {
+ ++mPhysMemAddressWidth;
+ }
+
+ //
+ // The minimum address width is 36 (covers up to and excluding 64 GB, which
+ // is the maximum for Ia32 + PAE). The theoretical architecture maximum for
+ // X64 long mode is 52 bits, but the DXE IPL clamps that down to 48 bits. We
+ // can simply assert that here, since 48 bits are good enough for 256 TB.
+ //
+ if (mPhysMemAddressWidth <= 36) {
+ mPhysMemAddressWidth = 36;
+ }
+ ASSERT (mPhysMemAddressWidth <= 48);
+}
+
+
/**
Publish PEI core memory
@@ -99,6 +142,7 @@ PublishPeiMemory (
EFI_PHYSICAL_ADDRESS MemoryBase;
UINT64 MemorySize;
UINT64 LowerMemorySize;
+ UINT32 PeiMemoryCap;
if (mBootMode == BOOT_ON_S3_RESUME) {
MemoryBase = PcdGet32 (PcdS3AcpiReservedMemoryBase);
@@ -107,13 +151,24 @@ PublishPeiMemory (
LowerMemorySize = GetSystemMemorySizeBelow4gb ();
//
+ // For the minimum address width of 36, installing 64 MB as permanent PEI
+ // RAM is sufficient. For the maximum width, the DXE IPL needs a bit more
+ // than 1 GB for paging structures. Therefore we establish an exponential
+ // formula so that the 48-36+1=13 different widths map to permanent PEI RAM
+ // sizes in [64 MB, 2 GB], that is [1<<26, 1<<31]; 6 different powers.
+ //
+ PeiMemoryCap = SIZE_64MB << ((mPhysMemAddressWidth - 36) * 5 / 12);
for p in map(lambda a: (a, 64 << ((a-36) * 5 / 12)), range(36,49)): print '%d bits => %d MB' % p
...
36 bits => 64 MB
37 bits => 64 MB
38 bits => 64 MB
39 bits => 128 MB
40 bits => 128 MB
41 bits => 256 MB
42 bits => 256 MB
43 bits => 256 MB
44 bits => 512 MB
45 bits => 512 MB
46 bits => 1024 MB
47 bits => 1024 MB
48 bits => 2048 MB
Interesting, but I still don't quite feel like it makes (intuitive)
sense to me. :)
When you add one bit to mPhysMemAddressWidth, the address space to be
mapped doubles. That most probably means some kind of multiplicative
increase for the space required by the page tables too.

However, since we know the "boundary conditions" from experimentation,
the multiplier for the page tables is not 2 (per bit added to
mPhysMemAddressWidth). Instead, "a cumulative multiplier" of 2^12 (36
--> 48 bits) in mPhysMemAddressWidth leads to an empirical "cumulative
multiplier" of 2^5 (2^6 MB --> 2^11 MB).

If you distribute 2^5 over 12 bumps, multiplicatively, you get the above
formula. Basically, the evened out multiplier for the memory needed by
the page tables is -- empirically -- 2^(5/12) ~= 1.3348. (Also known as
the 12th root of 2^5.)

So, the idea is, if you double the address space to identity-map, the
space used for the mapping should get multipled by approx. 1.3348. I've
seen similar formulas in similar situations. (For example: assume that a
VM is supposed to have N GB as *dynamically* settable maximum RAM; what
is then the minimum RAM you should guarantee for the guest OS at startup
so that it can map all of those N GB later on? Etc.)

As I said, this was derived empirically, working backwards from the
boundary values (which are known).
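For what it's worth, the formula is easy to sanity-check with a short
sketch (Python 3 here, unlike the Python 2 one-liner quoted above; the
helper name is just illustrative):

```python
SIZE_64MB = 64  # measured in MB for readability

def pei_memory_cap_mb(width):
    # Mirrors the C expression:
    #   PeiMemoryCap = SIZE_64MB << ((mPhysMemAddressWidth - 36) * 5 / 12)
    # C integer division truncates, hence // here.
    return SIZE_64MB << ((width - 36) * 5 // 12)

# Boundary conditions: 36 bits -> 64 MB, 48 bits -> 2 GB.
assert pei_memory_cap_mb(36) == 64
assert pei_memory_cap_mb(48) == 2048

# The cumulative multiplier over the 12 bumps is 2**5 == 32, i.e. an
# evened-out per-bit multiplier of 2**(5/12), roughly 1.3348.
assert pei_memory_cap_mb(48) // pei_memory_cap_mb(36) == 32
```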
Post by Jordan Justen
What if you make a calculation of how big the tables should be and
then add 64 MB to it?
Any calculation I could come up with (ie. how much RAM *I* would need
for identity mapping the address sapce) would not be any better than the
above empirical formula -- because whatever I'd invent might not bear
any resemblance to how the actual allocation happens.

The allocation is done in CreateIdentityMappingPageTables(), in the file
mentioned by the commit message (in the ASSERT()); I didn't want to try
to duplicate all that calculation here (which can depend on a PCD and on
CPUID results too). I think that a rough estimate is good enough. (Note
that across the entire range, the significance of the PEI RAM cap
diminishes, because for a factor of 4096 on the LHS, the RHS is only
multiplied by 32 overall.) If someone encounters a case where this
formula is off a little bit (ie. runs out of PEI RAM), we can tune it
further. (That's what I have seen in the situations mentioned above --
the constant multiplier idea is usually not scrapped, just the
boundaries are tweaked or a constant additive offset is introduced as
last step etc.)

Thanks
Laszlo
Post by Jordan Justen
-Jordan
Post by Laszlo Ersek
+ DEBUG ((EFI_D_INFO, "%a: mPhysMemAddressWidth=%d PeiMemoryCap=%uMB\n",
+ __FUNCTION__, mPhysMemAddressWidth, PeiMemoryCap >> 20));
+
+ //
// Determine the range of memory to use during PEI
//
MemoryBase = PcdGet32 (PcdOvmfDxeMemFvBase) + PcdGet32 (PcdOvmfDxeMemFvSize);
MemorySize = LowerMemorySize - MemoryBase;
- if (MemorySize > SIZE_64MB) {
- MemoryBase = LowerMemorySize - SIZE_64MB;
- MemorySize = SIZE_64MB;
+ if (MemorySize > PeiMemoryCap) {
+ MemoryBase = LowerMemorySize - PeiMemoryCap;
+ MemorySize = PeiMemoryCap;
}
}
diff --git a/OvmfPkg/PlatformPei/Platform.c b/OvmfPkg/PlatformPei/Platform.c
index 1ad5bfc..54ec822 100644
--- a/OvmfPkg/PlatformPei/Platform.c
+++ b/OvmfPkg/PlatformPei/Platform.c
@@ -405,6 +405,7 @@ InitializePlatform (
}
BootModeInitialization ();
+ AddressWidthInitialization ();
PublishPeiMemory ();
--
1.8.3.1
------------------------------------------------------------------------------
Jordan Justen
2015-06-22 19:46:56 UTC
Permalink
Post by Laszlo Ersek
Post by Jordan Justen
What if you make a calculation of how big the tables should be and
then add 64 MB to it?
Any calculation I could come up with (ie. how much RAM *I* would need
for identity-mapping the address space) would not be any better than the
above empirical formula -- because whatever I'd invent might not bear
any resemblance to how the actual allocation happens.
I don't think this is the case. The code that does this in DxeIpl
doesn't change that often, and it is not too different from other
similar firmware/OS code.

It seems a lot more intuitive to me to say something like: "we need a
big blob of memory for page tables (see DxeIpl), and then about 64MB
more for misc"

-Jordan
Brian J. Johnson
2015-06-22 19:22:54 UTC
Permalink
Post by Jordan Justen
Post by Laszlo Ersek
We'll soon increase the maximum guest-physical RAM size supported by OVMF.
For more RAM, the DXE IPL is going to build more page tables, and for that
it's going to need a bigger chunk from the permanent PEI RAM.
DXE IPL Entry
Loading PEIM at 0x000BFF61000 EntryPoint=0x000BFF61260 DxeCore.efi
Loading DXE CORE at 0x000BFF61000 EntryPoint=0x000BFF61260
AllocatePages failed: No 0x40201 Pages is available.
There is only left 0x3F1F pages memory resource to be allocated.
BigPageAddress != 0
(The above example corresponds to the artificially high, maximal address
width of 52, clamped by the DXE core to 48. An address width of 48 bits
covers 256 TB of RAM, and requires a bit more than 1 GB for paging
structures.)
Contributed-under: TianoCore Contribution Agreement 1.0
---
OvmfPkg/PlatformPei/Platform.h | 7 +++++
OvmfPkg/PlatformPei/MemDetect.c | 61 +++++++++++++++++++++++++++++++++++++++--
OvmfPkg/PlatformPei/Platform.c | 1 +
3 files changed, 66 insertions(+), 3 deletions(-)
diff --git a/OvmfPkg/PlatformPei/Platform.h b/OvmfPkg/PlatformPei/Platform.h
index 31640e9..8b6a976 100644
--- a/OvmfPkg/PlatformPei/Platform.h
+++ b/OvmfPkg/PlatformPei/Platform.h
@@ -59,6 +59,11 @@ AddUntestedMemoryRangeHob (
EFI_PHYSICAL_ADDRESS MemoryLimit
);
+VOID
+AddressWidthInitialization (
+ VOID
+ );
+
EFI_STATUS
PublishPeiMemory (
VOID
@@ -100,4 +105,6 @@ extern EFI_BOOT_MODE mBootMode;
extern BOOLEAN mS3Supported;
+extern UINT8 mPhysMemAddressWidth;
+
#endif // _PLATFORM_PEI_H_INCLUDED_
diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index bd7bb02..6b424f7 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
#include "Platform.h"
#include "Cmos.h"
+UINT8 mPhysMemAddressWidth;
+
UINT32
GetSystemMemorySizeBelow4gb (
VOID
@@ -84,6 +86,47 @@ GetSystemMemorySizeAbove4gb (
return LShiftU64 (Size, 16);
}
+
+/**
+ Initialize the mPhysMemAddressWidth variable, based on guest RAM size.
+**/
+VOID
+AddressWidthInitialization (
+ VOID
+ )
+{
+ UINT64 FirstNonAddress;
+
+ //
+ // As guest-physical memory size grows, the permanent PEI RAM requirements
+ // are dominated by the identity-mapping page tables built by the DXE IPL.
+ // The DXE IPL keys off the physical address bits advertised in the CPU
+ // HOB. To conserve memory, we calculate the minimum address width here.
+ //
+ FirstNonAddress = BASE_4GB + GetSystemMemorySizeAbove4gb ();
+ mPhysMemAddressWidth = (UINT8)HighBitSet64 (FirstNonAddress);
+
+ //
+ // If FirstNonAddress is not an integral power of two, then we need an
+ // additional bit.
+ //
+ if ((FirstNonAddress & (FirstNonAddress - 1)) != 0) {
+ ++mPhysMemAddressWidth;
+ }
+
+ //
+ // The minimum address width is 36 (covers up to and excluding 64 GB, which
+ // is the maximum for Ia32 + PAE). The theoretical architecture maximum for
+ // X64 long mode is 52 bits, but the DXE IPL clamps that down to 48 bits. We
+ // can simply assert that here, since 48 bits are good enough for 256 TB.
+ //
+ if (mPhysMemAddressWidth <= 36) {
+ mPhysMemAddressWidth = 36;
+ }
+ ASSERT (mPhysMemAddressWidth <= 48);
+}
+
+
/**
Publish PEI core memory
@@ -99,6 +142,7 @@ PublishPeiMemory (
EFI_PHYSICAL_ADDRESS MemoryBase;
UINT64 MemorySize;
UINT64 LowerMemorySize;
+ UINT32 PeiMemoryCap;
if (mBootMode == BOOT_ON_S3_RESUME) {
MemoryBase = PcdGet32 (PcdS3AcpiReservedMemoryBase);
@@ -107,13 +151,24 @@ PublishPeiMemory (
LowerMemorySize = GetSystemMemorySizeBelow4gb ();
//
+ // For the minimum address width of 36, installing 64 MB as permanent PEI
+ // RAM is sufficient. For the maximum width, the DXE IPL needs a bit more
+ // than 1 GB for paging structures. Therefore we establish an exponential
+ // formula so that the 48-36+1=13 different widths map to permanent PEI RAM
+ // sizes in [64 MB, 2 GB], that is [1<<26, 1<<31]; 6 different powers.
+ //
+ PeiMemoryCap = SIZE_64MB << ((mPhysMemAddressWidth - 36) * 5 / 12);
for p in map(lambda a: (a, 64 << ((a-36) * 5 / 12)), range(36,49)): print '%d bits => %d MB' % p
...
36 bits => 64 MB
37 bits => 64 MB
38 bits => 64 MB
39 bits => 128 MB
40 bits => 128 MB
41 bits => 256 MB
42 bits => 256 MB
43 bits => 256 MB
44 bits => 512 MB
45 bits => 512 MB
46 bits => 1024 MB
47 bits => 1024 MB
48 bits => 2048 MB
Interesting, but I still don't quite feel like it makes (intuitive)
sense to me. :)
What if you make a calculation of how big the tables should be and
then add 64 MB to it?
-Jordan
FWIW,
MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c:CreateIdentityMappingPageTables()
has the code which actually allocates the pages (as you point out in
your later email, but I was just about ready to send this one):

//
// Calculate the table entries needed.
//
if (PhysicalAddressBits <= 39 ) {
NumberOfPml4EntriesNeeded = 1;
NumberOfPdpEntriesNeeded = (UINT32)LShiftU64 (1,
(PhysicalAddressBits - 30));
} else {
NumberOfPml4EntriesNeeded = (UINT32)LShiftU64 (1,
(PhysicalAddressBits - 39));
NumberOfPdpEntriesNeeded = 512;
}

//
// Pre-allocate big pages to avoid later allocations.
//
if (!Page1GSupport) {
TotalPagesNum = (NumberOfPdpEntriesNeeded + 1) *
NumberOfPml4EntriesNeeded + 1;
} else {
TotalPagesNum = NumberOfPml4EntriesNeeded + 1;
}
BigPageAddress = (UINTN) AllocatePages (TotalPagesNum);

I stuck those calculations in gnumeric (assuming Page1GSupport==FALSE)
and got this:

PABits Pml4 Pdp TotPgs MB
------ ---- --- ------ --
36 1 64 66 0.258
37 1 128 130 0.508
38 1 256 258 1.008
39 1 512 514 2.008
40 2 512 1027 4.012
41 4 512 2053 8.020
42 8 512 4105 16.035
43 16 512 8209 32.066
44 32 512 16417 64.129
45 64 512 32833 128.254
46 128 512 65665 256.504
47 256 512 131329 513.004
48 512 512 262657 1026.004

So 48 bits of PA should take just over a GB of page tables.
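For reference, the quoted C logic transcribes into a short Python 3
sketch (the function name is mine) that reproduces the table above:

```python
def ident_map_pages(phys_addr_bits, page_1g_support=False):
    # Transcription of the table-entry math quoted above from
    # CreateIdentityMappingPageTables().
    if phys_addr_bits <= 39:
        pml4 = 1
        pdp = 1 << (phys_addr_bits - 30)
    else:
        pml4 = 1 << (phys_addr_bits - 39)
        pdp = 512
    if page_1g_support:
        # With 1 GB pages there is no PDE level to allocate.
        return pml4 + 1
    return (pdp + 1) * pml4 + 1

# Spot-check against the table: 36 bits -> 66 pages, 48 bits -> 262657.
assert ident_map_pages(36) == 66
assert ident_map_pages(48) == 262657
# 262657 pages * 4 KB is just over 1 GB:
assert round(ident_map_pages(48) * 4096 / 2 ** 20, 3) == 1026.004
```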

Can you set PcdUse1GPageTable=TRUE? That vastly reduces the number of
page table pages needed, and vastly reduces the time needed to
initialize them. Just wondering. (I've found that some older Microsoft
boot loaders don't like this setting, but I haven't tried the newer
ones. Linux is fine with it.)
--
Brian J. Johnson

--------------------------------------------------------------------

My statements are my own, are not authorized by SGI, and do not
necessarily represent SGI’s positions.
Jordan Justen
2015-06-22 19:53:14 UTC
Permalink
Post by Brian J. Johnson
I stuck those calculations in gnumeric (assuming Page1GSupport==FALSE)
PABits Pml4 Pdp TotPgs MB
------ ---- --- ------ --
36 1 64 66 0.258
37 1 128 130 0.508
38 1 256 258 1.008
39 1 512 514 2.008
40 2 512 1027 4.012
41 4 512 2053 8.020
42 8 512 4105 16.035
43 16 512 8209 32.066
44 32 512 16417 64.129
45 64 512 32833 128.254
46 128 512 65665 256.504
47 256 512 131329 513.004
48 512 512 262657 1026.004
So 48 bits of PA should take just over a GB of page tables.
Can you set PcdUse1GPageTable=TRUE? That vastly reduces the number of
page table pages needed, and vastly reduces the time needed to
initialize them. Just wondering. (I've found that some older Microsoft
boot loaders don't like this setting, but I haven't tried the newer
ones. Linux is fine with it.)
Sounds kind of complicated to tell if it is okay to use. Although,
maybe if we just use it dynamically when memory space is larger than
say 36 bits, then perhaps the risk of people running an unsupported
configuration is also low.

-Jordan
Laszlo Ersek
2015-06-23 07:26:47 UTC
Permalink
Post by Jordan Justen
Post by Brian J. Johnson
I stuck those calculations in gnumeric (assuming Page1GSupport==FALSE)
PABits Pml4 Pdp TotPgs MB
------ ---- --- ------ --
36 1 64 66 0.258
37 1 128 130 0.508
38 1 256 258 1.008
39 1 512 514 2.008
40 2 512 1027 4.012
41 4 512 2053 8.020
42 8 512 4105 16.035
43 16 512 8209 32.066
44 32 512 16417 64.129
45 64 512 32833 128.254
46 128 512 65665 256.504
47 256 512 131329 513.004
48 512 512 262657 1026.004
So 48 bits of PA should take just over a GB of page tables.
Thank you Brian for this!
Post by Jordan Justen
Post by Brian J. Johnson
Can you set PcdUse1GPageTable=TRUE? That vastly reduces the number of
page table pages needed, and vastly reduces the time needed to
initialize them. Just wondering. (I've found that some older Microsoft
boot loaders don't like this setting, but I haven't tried the newer
ones. Linux is fine with it.)
Sounds kind of complicated to tell if it is okay to use. Although,
maybe if we just use it dynamically when memory space is larger than
say 36 bits, then perhaps the risk of people running an unsupported
configuration is also low.
How about I simply add a switch statement and hardcode the values from
Brian's table, and after the switch I add 64MB for "misc"?

Thanks
Laszlo
Brian J. Johnson
2015-06-23 14:51:11 UTC
Permalink
Post by Laszlo Ersek
Post by Jordan Justen
Post by Brian J. Johnson
I stuck those calculations in gnumeric (assuming Page1GSupport==FALSE)
PABits Pml4 Pdp TotPgs MB
------ ---- --- ------ --
36 1 64 66 0.258
37 1 128 130 0.508
38 1 256 258 1.008
39 1 512 514 2.008
40 2 512 1027 4.012
41 4 512 2053 8.020
42 8 512 4105 16.035
43 16 512 8209 32.066
44 32 512 16417 64.129
45 64 512 32833 128.254
46 128 512 65665 256.504
47 256 512 131329 513.004
48 512 512 262657 1026.004
So 48 bits of PA should take just over a GB of page tables.
Thank you Brian for this!
Post by Jordan Justen
Post by Brian J. Johnson
Can you set PcdUse1GPageTable=TRUE? That vastly reduces the number of
page table pages needed, and vastly reduces the time needed to
initialize them. Just wondering. (I've found that some older Microsoft
boot loaders don't like this setting, but I haven't tried the newer
ones. Linux is fine with it.)
Sounds kind of complicated to tell if it is okay to use. Although,
maybe if we just use it dynamically when memory space is larger than
say 36 bits, then perhaps the risk of people running an unsupported
configuration is also low.
How about I simply add a switch statement and hardcode the values from
Brian's table, and after the switch I add 64MB for "misc"?
Personally I can't stand hardcoded constants. I'd much rather see the
"real" calculation, based on the code from
CreateIdentityMappingPageTables(). It's pretty straightforward, really:
calculate the number of PML4 and PDP entries needed based on the PA
size, via a couple of shifts. Then convert that to pages: one page for
the PML4s, one page per PML4 for the PDPs, and one page per PDP for the
PDEs:

Pages = 1 + Pml4 + (Pml4 * Pdp) = (Pdp + 1) * Pml4 + 1

If you're using 1GB pages, then there are no PDEs, so:

Pages = 1 + Pml4

That matches the code in CreateIdentityMappingPageTables(). And it's
all architectural for x86, so it's not likely to change.

If you want to simplify the calculations a bit, you could assume at
least 39 PA bits. Then the only variable is the number of PML4 entries
needed, which is just 1 << (PASize - 39).

But that's my opinion. There's always some estimating involved in code
like this, since you don't know exactly how much memory will be
allocated dynamically.
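A rough sketch of the two formulas side by side (Python 3; assumes at
least 39 PA bits as suggested, so the PDP level is fully populated, and
the names are illustrative):

```python
def pml4_entries(pa_bits):
    # Assumes pa_bits >= 39, i.e. 512 PDP entries per PML4 entry.
    return 1 << (pa_bits - 39)

def pages_without_1g(pa_bits):
    # Pages = (Pdp + 1) * Pml4 + 1, with Pdp == 512
    return (512 + 1) * pml4_entries(pa_bits) + 1

def pages_with_1g(pa_bits):
    # Pages = 1 + Pml4 (1 GB pages: no PDE level)
    return 1 + pml4_entries(pa_bits)

# At 48 PA bits: ~1 GB of page tables without 1 GB pages, but only
# 513 pages (~2 MB) with them -- hence the "vastly reduces" above.
assert pages_without_1g(48) == 262657
assert pages_with_1g(48) == 513
```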
--
Brian

--------------------------------------------------------------------

BASIC silently prevents you from tying your shoelaces together.
Pascal says, "You can't tie your shoelaces together."
C says, "Hmm, this person wants to tie his shoelaces together. Okay."
C++ finds a general instance of tying two things together and then
applies that to your shoelaces.
Prolog tries to prove your shoelaces are not tied together.
Perl provides primitives for square knots, slip knots,
granny knots, ...
-- Michael Rawdon et al.
Laszlo Ersek
2015-06-23 16:04:50 UTC
Permalink
Post by Brian J. Johnson
Post by Laszlo Ersek
Post by Jordan Justen
Post by Brian J. Johnson
I stuck those calculations in gnumeric (assuming Page1GSupport==FALSE)
PABits Pml4 Pdp TotPgs MB
------ ---- --- ------ --
36 1 64 66 0.258
37 1 128 130 0.508
38 1 256 258 1.008
39 1 512 514 2.008
40 2 512 1027 4.012
41 4 512 2053 8.020
42 8 512 4105 16.035
43 16 512 8209 32.066
44 32 512 16417 64.129
45 64 512 32833 128.254
46 128 512 65665 256.504
47 256 512 131329 513.004
48 512 512 262657 1026.004
So 48 bits of PA should take just over a GB of page tables.
Thank you Brian for this!
Post by Jordan Justen
Post by Brian J. Johnson
Can you set PcdUse1GPageTable=TRUE? That vastly reduces the number of
page table pages needed, and vastly reduces the time needed to
initialize them. Just wondering. (I've found that some older Microsoft
boot loaders don't like this setting, but I haven't tried the newer
ones. Linux is fine with it.)
Sounds kind of complicated to tell if it is okay to use. Although,
maybe if we just use it dynamically when memory space is larger than
say 36 bits, then perhaps the risk of people running an unsupported
configuration is also low.
How about I simply add a switch statement and hardcode the values from
Brian's table, and after the switch I add 64MB for "misc"?
Personally I can't stand hardcoded constants. I'd much rather see the
"real" calculation, based on the code from
CreateIdentityMappingPageTables().
Unless the real calculation is extracted from
CreateIdentityMappingPageTables() into a library, and reused by both the
original function and the new OVMF call site (via function calls), we're
just talking different degrees of open coding. In one case the results
of the calculation would be hard-coded, in the other case the
calculation itself would be hard-coded (but still not shared with the
code that actually allocates the memory).

If the algorithm is architecturally dictated, and uniquely defined, then
so are its results.
Post by Brian J. Johnson
calculate the number of PML4 and PDP entries needed based on the PA
size, via a couple of shifts. Then convert that to pages: one page for
the PML4s, one page per PML4 for the PDPs, and one page per PDP for the
Pages = 1 + Pml4 + (Pml4 * Pdp) = (Pdp + 1) * Pml4 + 1
Pages = 1 + Pml4
That matches the code in CreateIdentityMappingPageTables(). And it's
all architectural for x86, so it's not likely to change.
Thanks for summarizing it.

Although I still disagree with replicating the calculation in OvmfPkg, I
can try implementing it.

But then, please don't point out that this code is specific to X64 (and
Ia32X64), and a 32-bit DXE works differently, and that *too* should be
replicated in the OVMF code. That's just hugely overkill. The patch I
submitted is agnostic to the word size of DXE.

Actually: does anyone feel inclined to take on this patch? (Feel free to
grab this patch as basis.)

Thanks
Laszlo
Post by Brian J. Johnson
If you want to simplify the calculations a bit, you could assume at
least 39 PA bits. Then the only variable is the number of PML4 entries
needed, which is just 1 << (PASize - 39).
But that's my opinion. There's always some estimating involved in code
like this, since you don't know exactly how much memory will be
allocated dynamically.
Brian J. Johnson
2015-06-23 16:30:52 UTC
Permalink
Post by Laszlo Ersek
Post by Brian J. Johnson
Post by Laszlo Ersek
Post by Jordan Justen
Post by Brian J. Johnson
I stuck those calculations in gnumeric (assuming Page1GSupport==FALSE)
PABits Pml4 Pdp TotPgs MB
------ ---- --- ------ --
36 1 64 66 0.258
37 1 128 130 0.508
38 1 256 258 1.008
39 1 512 514 2.008
40 2 512 1027 4.012
41 4 512 2053 8.020
42 8 512 4105 16.035
43 16 512 8209 32.066
44 32 512 16417 64.129
45 64 512 32833 128.254
46 128 512 65665 256.504
47 256 512 131329 513.004
48 512 512 262657 1026.004
So 48 bits of PA should take just over a GB of page tables.
Thank you Brian for this!
Post by Jordan Justen
Post by Brian J. Johnson
Can you set PcdUse1GPageTable=TRUE? That vastly reduces the number of
page table pages needed, and vastly reduces the time needed to
initialize them. Just wondering. (I've found that some older Microsoft
boot loaders don't like this setting, but I haven't tried the newer
ones. Linux is fine with it.)
Sounds kind of complicated to tell if it is okay to use. Although,
maybe if we just use it dynamically when memory space is larger than
say 36 bits, then perhaps the risk of people running an unsupported
configuration is also low.
How about I simply add a switch statement and hardcode the values from
Brian's table, and after the switch I add 64MB for "misc"?
Personally I can't stand hardcoded constants. I'd much rather see the
"real" calculation, based on the code from
CreateIdentityMappingPageTables().
Unless the real calculation is extracted from
CreateIdentityMappingPageTables() into a library, and reused by both the
original function and the new OVMF call site (via function calls), we're
just talking different degrees of open coding. In one case the results
of the calculation would be hard-coded, in the other case the
calculation itself would be hard-coded (but still not shared with the
code that actually allocates the memory).
If the algorithm is architecturally dictated, and uniquely defined, then
so are its results.
Post by Brian J. Johnson
calculate the number of PML4 and PDP entries needed based on the PA
size, via a couple of shifts. Then convert that to pages: one page for
the PML4s, one page per PML4 for the PDPs, and one page per PDP for the
Pages = 1 + Pml4 + (Pml4 * Pdp) = (Pdp + 1) * Pml4 + 1
Pages = 1 + Pml4
That matches the code in CreateIdentityMappingPageTables(). And it's
all architectural for x86, so it's not likely to change.
Thanks for summarizing it.
Although I still disagree with replicating the calculation in OvmfPkg, I
can try implementing it.
Whatever... I'm not going to dictate how you code it (not that my
opinion should have much weight for OVMF in the first place... I don't
regularly contribute to it.) As you said, it's a matter of duplicating
code one way or the other.
Post by Laszlo Ersek
But then, please don't point out that this code is specific to X64 (and
Ia32X64), and a 32-bit DXE works differently, and that *too* should be
replicated in the OVMF code. That's just hugely overkill. The patch I
submitted is agnostic to the word size of DXE.
Probably just Ia32X64. 32-bit DXE doesn't need page tables, and 64-bit
PEI already generates them somewhere else.
Post by Laszlo Ersek
Actually: does anyone feel inclined to take on this patch? (Feel free to
grab this patch as basis.)
Thanks
Laszlo
Post by Brian J. Johnson
If you want to simplify the calculations a bit, you could assume at
least 39 PA bits. Then the only variable is the number of PML4 entries
needed, which is just 1 << (PASize - 39).
But that's my opinion. There's always some estimating involved in code
like this, since you don't know exactly how much memory will be
allocated dynamically.
--
Brian

--------------------------------------------------------------------
Laszlo Ersek
2015-06-16 17:05:25 UTC
Permalink
Maoming reported that guest memory sizes equal to or larger than 64GB
were not correctly handled by OVMF.

Enabling the DEBUG_GCD (0x00100000) bit in PcdDebugPrintErrorLevel, and
starting QEMU with 64GB guest RAM size, I found the following error in the
log:
GCD:AddMemorySpace(Base=0000000100000000,Length=0000000F40000000)
GcdMemoryType = Reserved
Capabilities = 030000000000000F
Status = Unsupported
This message is emitted when the DXE core is initializing the memory space
map, processing the "above 4GB" memory resource descriptor HOB that was
created by OVMF's QemuInitializeRam() function (see "UpperMemorySize").

The DXE core's call chain fails in:

CoreInternalAddMemorySpace() [MdeModulePkg/Core/Dxe/Gcd/Gcd.c]
CoreConvertSpace()
//
// Search for the list of descriptors that cover the range BaseAddress
// to BaseAddress+Length
//
CoreSearchGcdMapEntry()

CoreSearchGcdMapEntry() fails because the one entry (with type
"nonexistent") in the initial GCD memory space map is too small:
GCD:Initial GCD Memory Space Map
GCDMemType Range Capabilities Attributes
========== ================================= ================ ================
NonExist 0000000000000000-0000000FFFFFFFFF 0000000000000000 0000000000000000
The size of this initial entry is determined from the CPU HOB
(CoreInitializeGcdServices()).

Set the SizeOfMemorySpace field in the CPU HOB to mPhysMemAddressWidth,
which is the narrowest valid value to cover the entire guest RAM.

Reported-by: Maoming <***@huawei.com>
Cc: Maoming <***@huawei.com>
Cc: Huangpeng (Peter) <***@huawei.com>
Cc: Wei Liu <***@citrix.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <***@redhat.com>
Tested-by: Wei Liu <***@citrix.com>
Tested-by: Maoming <***@huawei.com>
---
OvmfPkg/PlatformPei/Platform.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/OvmfPkg/PlatformPei/Platform.c b/OvmfPkg/PlatformPei/Platform.c
index 54ec822..2a1b577 100644
--- a/OvmfPkg/PlatformPei/Platform.c
+++ b/OvmfPkg/PlatformPei/Platform.c
@@ -247,9 +247,11 @@ MiscInitialization (
IoOr8 (0x92, BIT1);

//
- // Build the CPU hob with 36-bit addressing and 16-bits of IO space.
+ // Build the CPU HOB with guest RAM size dependent address width and 16-bits
+ // of IO space. (Side note: unlike other HOBs, the CPU HOB is needed during
+ // S3 resume as well, so we build it unconditionally.)
//
- BuildCpuHob (36, 16);
+ BuildCpuHob (mPhysMemAddressWidth, 16);

//
// Determine platform type and save Host Bridge DID to PCD
--
1.8.3.1



------------------------------------------------------------------------------
Laszlo Ersek
2015-06-16 17:05:26 UTC
Permalink
Build the memory HOBs in a tight block, in increasing base address order.

Cc: Maoming <***@huawei.com>
Cc: Huangpeng (Peter) <***@huawei.com>
Cc: Wei Liu <***@citrix.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <***@redhat.com>
Tested-by: Maoming <***@huawei.com>
---
OvmfPkg/PlatformPei/MemDetect.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index 6b424f7..1228063 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
@@ -207,8 +207,11 @@ QemuInitializeRam (
//
// Create memory HOBs
//
- AddMemoryRangeHob (BASE_1MB, LowerMemorySize);
AddMemoryRangeHob (0, BASE_512KB + BASE_128KB);
+ AddMemoryRangeHob (BASE_1MB, LowerMemorySize);
+ if (UpperMemorySize != 0) {
+ AddUntestedMemoryBaseSizeHob (BASE_4GB, UpperMemorySize);
+ }
}

MtrrSetMemoryAttribute (BASE_1MB, LowerMemorySize - BASE_1MB, CacheWriteBack);
@@ -216,10 +219,6 @@ QemuInitializeRam (
MtrrSetMemoryAttribute (0, BASE_512KB + BASE_128KB, CacheWriteBack);

if (UpperMemorySize != 0) {
- if (mBootMode != BOOT_ON_S3_RESUME) {
- AddUntestedMemoryBaseSizeHob (BASE_4GB, UpperMemorySize);
- }
-
MtrrSetMemoryAttribute (BASE_4GB, UpperMemorySize, CacheWriteBack);
}
}
--
1.8.3.1



------------------------------------------------------------------------------
Laszlo Ersek
2015-06-16 17:05:27 UTC
Permalink
At the moment we work with a UC default MTRR type, and set three memory
ranges to WB:
- [0, 640 KB),
- [1 MB, LowerMemorySize),
- [4 GB, 4 GB + UpperMemorySize).

Unfortunately, coverage for the third range can fail with a high
likelihood. If the alignment of the base (i.e. 4 GB) and the alignment of
the size (UpperMemorySize) differ, then MtrrLib creates a series of
variable MTRR entries, with power-of-two sized MTRR masks. And it's
really easy to run out of variable MTRR entries, depending on the
alignment difference.

This is a problem because a Linux guest will loudly reject any high memory
that is not covered by MTRR.

So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
- flip the MTRR default type to WB,
- set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
type and variable MTRRs, so we can't avoid this,
- set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
- set [LowerMemorySize, 4 GB) to UC -- more likely to succeed with
variable MTRRs than the other scheme (due to less chaotic alignment
differences).

Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
in PcdDebugPrintErrorLevel.

Cc: Maoming <***@huawei.com>
Cc: Huangpeng (Peter) <***@huawei.com>
Cc: Wei Liu <***@citrix.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <***@redhat.com>
Tested-by: Maoming <***@huawei.com>
---
OvmfPkg/PlatformPei/MemDetect.c | 43 +++++++++++++++++++++++++++++++++++++----
1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index 1228063..e872126 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
@@ -194,6 +194,8 @@ QemuInitializeRam (
{
UINT64 LowerMemorySize;
UINT64 UpperMemorySize;
+ MTRR_SETTINGS MtrrSettings;
+ EFI_STATUS Status;

DEBUG ((EFI_D_INFO, "%a called\n", __FUNCTION__));

@@ -214,12 +216,45 @@ QemuInitializeRam (
}
}

- MtrrSetMemoryAttribute (BASE_1MB, LowerMemorySize - BASE_1MB, CacheWriteBack);
+ //
+ // We'd like to keep the following ranges uncached:
+ // - [640 KB, 1 MB)
+ // - [LowerMemorySize, 4 GB)
+ //
+ // Everything else should be WB. Unfortunately, programming the inverse (ie.
+ // keeping the default UC, and configuring the complement set of the above as
+ // WB) is not reliable in general, because the end of the upper RAM can have
+ // practically any alignment, and we may not have enough variable MTRRs to
+ // cover it exactly.
+ //
+ if (IsMtrrSupported ()) {
+ MtrrGetAllMtrrs (&MtrrSettings);

- MtrrSetMemoryAttribute (0, BASE_512KB + BASE_128KB, CacheWriteBack);
+ //
+ // MTRRs disabled, fixed MTRRs disabled, default type is uncached
+ //
+ ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0);
+ ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0);
+ ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0);

- if (UpperMemorySize != 0) {
- MtrrSetMemoryAttribute (BASE_4GB, UpperMemorySize, CacheWriteBack);
+ //
+ // flip default type to writeback
+ //
+ SetMem (&MtrrSettings.Fixed, sizeof MtrrSettings.Fixed, 0x06);
+ ZeroMem (&MtrrSettings.Variables, sizeof MtrrSettings.Variables);
+ MtrrSettings.MtrrDefType |= BIT11 | BIT10 | 6;
+ MtrrSetAllMtrrs (&MtrrSettings);
+
+ //
+ // punch holes
+ //
+ Status = MtrrSetMemoryAttribute (BASE_512KB + BASE_128KB,
+ SIZE_256KB + SIZE_128KB, CacheUncacheable);
+ ASSERT_EFI_ERROR (Status);
+
+ Status = MtrrSetMemoryAttribute (LowerMemorySize,
+ SIZE_4GB - LowerMemorySize, CacheUncacheable);
+ ASSERT_EFI_ERROR (Status);
}
}
--
1.8.3.1


------------------------------------------------------------------------------
Jordan Justen
2015-06-22 18:55:17 UTC
Permalink
Post by Laszlo Ersek
At the moment we work with a UC default MTRR type, and set three memory
- [0, 640 KB),
- [1 MB, LowerMemorySize),
- [4 GB, 4 GB + UpperMemorySize).
Unfortunately, coverage for the third range can fail with a high
likelihood. If the alignment of the base (ie. 4 GB) and the alignment of
the size (UpperMemorySize) differ, then MtrrLib creates a series of
variable MTRR entries, with power-of-two sized MTRR masks. And, it's
really easy to run out of variable MTRR entries, dependent on the
alignment difference.
This is a problem because a Linux guest will loudly reject any high memory
that is not covered by MTRR.
- flip the MTRR default type to WB,
- set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
type and variable MTRRs, so we can't avoid this,
- set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
- set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
more likely than the other scheme (due to less chaotic alignment
differences).
Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
in PcdDebugPrintErrorLevel.
Contributed-under: TianoCore Contribution Agreement 1.0
---
OvmfPkg/PlatformPei/MemDetect.c | 43 +++++++++++++++++++++++++++++++++++++----
1 file changed, 39 insertions(+), 4 deletions(-)
diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index 1228063..e872126 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
@@ -194,6 +194,8 @@ QemuInitializeRam (
{
UINT64 LowerMemorySize;
UINT64 UpperMemorySize;
+ MTRR_SETTINGS MtrrSettings;
+ EFI_STATUS Status;
DEBUG ((EFI_D_INFO, "%a called\n", __FUNCTION__));
@@ -214,12 +216,45 @@ QemuInitializeRam (
}
}
- MtrrSetMemoryAttribute (BASE_1MB, LowerMemorySize - BASE_1MB, CacheWriteBack);
+ //
+ // - [640 KB, 1 MB)
+ // - [LowerMemorySize, 4 GB)
+ //
+ // Everything else should be WB. Unfortunately, programming the inverse (ie.
+ // keeping the default UC, and configuring the complement set of the above as
+ // WB) is not reliable in general, because the end of the upper RAM can have
+ // practically any alignment, and we may not have enough variable MTRRs to
+ // cover it exactly.
+ //
+ if (IsMtrrSupported ()) {
+ MtrrGetAllMtrrs (&MtrrSettings);
- MtrrSetMemoryAttribute (0, BASE_512KB + BASE_128KB, CacheWriteBack);
+ //
+ // MTRRs disabled, fixed MTRRs disabled, default type is uncached
+ //
+ ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0);
+ ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0);
+ ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0);
- if (UpperMemorySize != 0) {
- MtrrSetMemoryAttribute (BASE_4GB, UpperMemorySize, CacheWriteBack);
+ //
+ // flip default type to writeback
+ //
+ SetMem (&MtrrSettings.Fixed, sizeof MtrrSettings.Fixed, 0x06);
+ ZeroMem (&MtrrSettings.Variables, sizeof MtrrSettings.Variables);
+ MtrrSettings.MtrrDefType |= BIT11 | BIT10 | 6;
+ MtrrSetAllMtrrs (&MtrrSettings);
+
+ //
+ // punch holes
What about: "Set memory from 640KB - 1MB to uncacheable"
Post by Laszlo Ersek
+ //
+ Status = MtrrSetMemoryAttribute (BASE_512KB + BASE_128KB,
+ SIZE_256KB + SIZE_128KB, CacheUncacheable);
What about instead specifying length as:

BASE_1MB - (BASE_512KB + BASE_128KB)
Post by Laszlo Ersek
+ ASSERT_EFI_ERROR (Status);
+
How about a comment: "Set MMIO just below 4GB to uncacheable"
Post by Laszlo Ersek
+ Status = MtrrSetMemoryAttribute (LowerMemorySize,
+ SIZE_4GB - LowerMemorySize, CacheUncacheable);
+ ASSERT_EFI_ERROR (Status);
}
}
--
1.8.3.1
------------------------------------------------------------------------------
_______________________________________________
edk2-devel mailing list
https://lists.sourceforge.net/lists/listinfo/edk2-devel
Laszlo Ersek
2015-06-22 19:10:34 UTC
Permalink
Post by Jordan Justen
Post by Laszlo Ersek
At the moment we work with a UC default MTRR type, and set three memory
- [0, 640 KB),
- [1 MB, LowerMemorySize),
- [4 GB, 4 GB + UpperMemorySize).
Unfortunately, coverage for the third range can fail with a high
likelihood. If the alignment of the base (ie. 4 GB) and the alignment of
the size (UpperMemorySize) differ, then MtrrLib creates a series of
variable MTRR entries, with power-of-two sized MTRR masks. And, it's
really easy to run out of variable MTRR entries, dependent on the
alignment difference.
This is a problem because a Linux guest will loudly reject any high memory
that is not covered by MTRR.
- flip the MTRR default type to WB,
- set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
type and variable MTRRs, so we can't avoid this,
- set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
- set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
more likely than the other scheme (due to less chaotic alignment
differences).
Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
in PcdDebugPrintErrorLevel.
Contributed-under: TianoCore Contribution Agreement 1.0
---
OvmfPkg/PlatformPei/MemDetect.c | 43 +++++++++++++++++++++++++++++++++++++----
1 file changed, 39 insertions(+), 4 deletions(-)
diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index 1228063..e872126 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
@@ -194,6 +194,8 @@ QemuInitializeRam (
{
UINT64 LowerMemorySize;
UINT64 UpperMemorySize;
+ MTRR_SETTINGS MtrrSettings;
+ EFI_STATUS Status;
DEBUG ((EFI_D_INFO, "%a called\n", __FUNCTION__));
@@ -214,12 +216,45 @@ QemuInitializeRam (
}
}
- MtrrSetMemoryAttribute (BASE_1MB, LowerMemorySize - BASE_1MB, CacheWriteBack);
+ //
+ // - [640 KB, 1 MB)
+ // - [LowerMemorySize, 4 GB)
+ //
+ // Everything else should be WB. Unfortunately, programming the inverse (ie.
+ // keeping the default UC, and configuring the complement set of the above as
+ // WB) is not reliable in general, because the end of the upper RAM can have
+ // practically any alignment, and we may not have enough variable MTRRs to
+ // cover it exactly.
+ //
+ if (IsMtrrSupported ()) {
+ MtrrGetAllMtrrs (&MtrrSettings);
- MtrrSetMemoryAttribute (0, BASE_512KB + BASE_128KB, CacheWriteBack);
+ //
+ // MTRRs disabled, fixed MTRRs disabled, default type is uncached
+ //
+ ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0);
+ ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0);
+ ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0);
- if (UpperMemorySize != 0) {
- MtrrSetMemoryAttribute (BASE_4GB, UpperMemorySize, CacheWriteBack);
+ //
+ // flip default type to writeback
+ //
+ SetMem (&MtrrSettings.Fixed, sizeof MtrrSettings.Fixed, 0x06);
+ ZeroMem (&MtrrSettings.Variables, sizeof MtrrSettings.Variables);
+ MtrrSettings.MtrrDefType |= BIT11 | BIT10 | 6;
+ MtrrSetAllMtrrs (&MtrrSettings);
+
+ //
+ // punch holes
What about: "Set memory from 640KB - 1MB to uncacheable"
Okay.
Post by Jordan Justen
Post by Laszlo Ersek
+ //
+ Status = MtrrSetMemoryAttribute (BASE_512KB + BASE_128KB,
+ SIZE_256KB + SIZE_128KB, CacheUncacheable);
BASE_1MB - (BASE_512KB + BASE_128KB)
Will do.
Post by Jordan Justen
Post by Laszlo Ersek
+ ASSERT_EFI_ERROR (Status);
+
How about a comment: "Set MMIO just below 4GB to uncacheable"
Hmm, "just below" doesn't quite reflect (to me) that it is supposed to
cover the entire 32-bit PCI hole, which can be gigabytes big. Do you
have a proposal that reflects this more accurately?
Post by Jordan Justen
Post by Laszlo Ersek
+ Status = MtrrSetMemoryAttribute (LowerMemorySize,
+ SIZE_4GB - LowerMemorySize, CacheUncacheable);
+ ASSERT_EFI_ERROR (Status);
Thank you. (I'll post a v2 later.)
Laszlo
Post by Jordan Justen
Post by Laszlo Ersek
}
}
--
1.8.3.1
------------------------------------------------------------------------------
Laszlo Ersek
2015-06-23 17:38:45 UTC
Permalink
Post by Laszlo Ersek
Post by Jordan Justen
Post by Laszlo Ersek
At the moment we work with a UC default MTRR type, and set three memory
- [0, 640 KB),
- [1 MB, LowerMemorySize),
- [4 GB, 4 GB + UpperMemorySize).
Unfortunately, coverage for the third range can fail with a high
likelihood. If the alignment of the base (ie. 4 GB) and the alignment of
the size (UpperMemorySize) differ, then MtrrLib creates a series of
variable MTRR entries, with power-of-two sized MTRR masks. And, it's
really easy to run out of variable MTRR entries, dependent on the
alignment difference.
This is a problem because a Linux guest will loudly reject any high memory
that is not covered by MTRR.
- flip the MTRR default type to WB,
- set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
type and variable MTRRs, so we can't avoid this,
- set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
- set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
more likely than the other scheme (due to less chaotic alignment
differences).
Effects of this patch can be observed by setting DEBUG_CACHE (0x00200000)
in PcdDebugPrintErrorLevel.
Contributed-under: TianoCore Contribution Agreement 1.0
---
OvmfPkg/PlatformPei/MemDetect.c | 43 +++++++++++++++++++++++++++++++++++++----
1 file changed, 39 insertions(+), 4 deletions(-)
diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
index 1228063..e872126 100644
--- a/OvmfPkg/PlatformPei/MemDetect.c
+++ b/OvmfPkg/PlatformPei/MemDetect.c
@@ -194,6 +194,8 @@ QemuInitializeRam (
{
UINT64 LowerMemorySize;
UINT64 UpperMemorySize;
+ MTRR_SETTINGS MtrrSettings;
+ EFI_STATUS Status;
DEBUG ((EFI_D_INFO, "%a called\n", __FUNCTION__));
@@ -214,12 +216,45 @@ QemuInitializeRam (
}
}
- MtrrSetMemoryAttribute (BASE_1MB, LowerMemorySize - BASE_1MB, CacheWriteBack);
+ //
+ // - [640 KB, 1 MB)
+ // - [LowerMemorySize, 4 GB)
+ //
+ // Everything else should be WB. Unfortunately, programming the inverse (ie.
+ // keeping the default UC, and configuring the complement set of the above as
+ // WB) is not reliable in general, because the end of the upper RAM can have
+ // practically any alignment, and we may not have enough variable MTRRs to
+ // cover it exactly.
+ //
+ if (IsMtrrSupported ()) {
+ MtrrGetAllMtrrs (&MtrrSettings);
- MtrrSetMemoryAttribute (0, BASE_512KB + BASE_128KB, CacheWriteBack);
+ //
+ // MTRRs disabled, fixed MTRRs disabled, default type is uncached
+ //
+ ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0);
+ ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0);
+ ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0);
- if (UpperMemorySize != 0) {
- MtrrSetMemoryAttribute (BASE_4GB, UpperMemorySize, CacheWriteBack);
+ //
+ // flip default type to writeback
+ //
+ SetMem (&MtrrSettings.Fixed, sizeof MtrrSettings.Fixed, 0x06);
+ ZeroMem (&MtrrSettings.Variables, sizeof MtrrSettings.Variables);
+ MtrrSettings.MtrrDefType |= BIT11 | BIT10 | 6;
+ MtrrSetAllMtrrs (&MtrrSettings);
+
+ //
+ // punch holes
What about: "Set memory from 640KB - 1MB to uncacheable"
Okay.
Post by Jordan Justen
Post by Laszlo Ersek
+ //
+ Status = MtrrSetMemoryAttribute (BASE_512KB + BASE_128KB,
+ SIZE_256KB + SIZE_128KB, CacheUncacheable);
BASE_1MB - (BASE_512KB + BASE_128KB)
Will do.
Post by Jordan Justen
Post by Laszlo Ersek
+ ASSERT_EFI_ERROR (Status);
+
How about a comment: "Set MMIO just below 4GB to uncacheable"
Hmm, "just below" doesn't quite reflect (to me) that it is supposed to
cover the entire 32-bit PCI hole, which can be gigabytes big. Do you
have a proposal that reflects this more accurately?
Nevermind, I went with your initial suggestion.
Post by Laszlo Ersek
Post by Jordan Justen
Post by Laszlo Ersek
+ Status = MtrrSetMemoryAttribute (LowerMemorySize,
+ SIZE_4GB - LowerMemorySize, CacheUncacheable);
+ ASSERT_EFI_ERROR (Status);
Thank you. (I'll post a v2 later.)
Laszlo
Post by Jordan Justen
Post by Laszlo Ersek
}
}
--
1.8.3.1
------------------------------------------------------------------------------
_______________________________________________
edk2-devel mailing list
https://lists.sourceforge.net/lists/listinfo/edk2-devel
------------------------------------------------------------------------------