Discussion:
[edk2] [RFC] edk2 support for a new QEMU device - PXB (PCI Expander Device)
Marcel Apfelbaum
2015-06-02 15:04:43 UTC
Permalink
Hi,

The following series:
- [Qemu-devel] [PATCH V8 00/17] hw/pc: implement multiple primary busses for pc machines
- https://www.mail-archive.com/qemu-***@nongnu.org/msg300089.html
adds a PCI Expander Device to QEMU that exposes a new PCI root bus.

The PXB is a "light-weight" host bridge whose purpose is to enable
the main host bridge to support multiple PCI root buses.

It does not have its own registers for configuration cycles; instead, it
snoops on the main host bridge's registers, and it lives on the same PCI
segment.

The device receives its bus number from the command line and expects the
firmware (BIOS/UEFI) to probe the bus for devices behind it and
configure them.

My question is: how can it be supported in edk2? Are there any
architectural limitations that would prevent it from working?

My edk2/UEFI knowledge is rather limited, but I did see in the spec
that there is support for this kind of device:

13.1.1 PCI Root Bridge I/O Overview
...
Depending on the chipset, a single EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL may abstract a portion of a PCI
Segment, or an entire PCI Segment. A PCI Host Bridge may produce one or more PCI Root
Bridges. When a PCI Host Bridge produces multiple PCI Root Bridges, it is possible to have
more than one PCI Segment.
...

It seems that the answer is multiple EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL
instances for the same PCI Host Bridge, mapped into the same PCI Segment.
The first instance belongs to the "main" host bridge and the others to
the PXBs.

The open question is, of course, how to assign resources (bus numbers/IO/MEM)
to the EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances.

For the bus numbers, I think the PCI Host Bridge driver can scan the
0x00 - 0xFF range and build the bus ranges incrementally.
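
As a thought experiment, that scan could look something like the sketch
below. All helper names are invented, and the config-space read is stubbed
to simulate a main host bridge at bus 0 plus one pxb with bus_nr=128;
before any PCI-PCI bridges are programmed, only root buses answer config
cycles, so a flat scan finds exactly the root buses:

```c
#include <assert.h>
#include <stdint.h>

/* Stubbed config-space read; a real version would use the 0xCF8/0xCFC
   mechanism. Here buses 0x00 and 0x80 are simulated as live. */
static uint32_t PciConfigRead32 (uint8_t Bus, uint8_t Dev, uint8_t Func, uint8_t Off)
{
  (void)Dev; (void)Func; (void)Off;
  if (Bus == 0x00 || Bus == 0x80) {
    return 0x12341b36;            /* simulated DeviceId/VendorId */
  }
  return 0xFFFFFFFF;              /* no device: all-ones */
}

/* Scan the whole 0x00-0xFF range; a bus is considered live if device 0,
   function 0 answers with a vendor ID other than 0xFFFF. */
static int FindRootBuses (uint8_t RootBus[], int Max)
{
  int Count = 0;
  for (int Bus = 0; Bus <= 0xFF; ++Bus) {
    uint32_t IdReg = PciConfigRead32 ((uint8_t)Bus, 0, 0, 0);
    if ((IdReg & 0xFFFF) != 0xFFFF && Count < Max) {
      RootBus[Count++] = (uint8_t)Bus;
    }
  }
  return Count;
}
```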

Regarding IO/MEM ranges I am still not sure.
The way it is done in SeaBIOS is that all devices behind a PXB root bus are
"considered" to be behind bus 0 for resource allocation.
Once resource allocation is done, each EFI_PCI_ROOT_BRIDGE gets the list
of MEM/IO ranges corresponding to the devices behind it.
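
A toy model of that hand-off might look like the sketch below (invented
names and numbers, not actual SeaBIOS code): every BAR is carved out of
one linear space as if behind bus 0, and afterwards each root bridge
receives the list of ranges belonging to its own devices:

```c
#include <assert.h>
#include <stdint.h>

/* One BAR allocation made in the single linear space; we only remember
   which root bridge the device really belongs to. */
typedef struct {
  int      Root;    /* index of the owning root bridge */
  uint64_t Base;    /* assigned base address */
  uint64_t Size;    /* BAR size */
} Alloc;

typedef struct { uint64_t Base, Limit; } Range;  /* [Base, Limit) */

/* Collect, in allocation order, the ranges belonging to one root
   bridge, merging ranges that happen to be contiguous. */
static int RangesForRoot (const Alloc *A, int N, int Root, Range *Out, int Max)
{
  int Count = 0;
  for (int i = 0; i < N; ++i) {
    if (A[i].Root != Root) continue;
    if (Count > 0 && Out[Count - 1].Limit == A[i].Base) {
      Out[Count - 1].Limit = A[i].Base + A[i].Size;   /* contiguous: extend */
    } else if (Count < Max) {
      Out[Count++] = (Range){ A[i].Base, A[i].Base + A[i].Size };
    }
  }
  return Count;
}
```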

Any comments and suggestions would be greatly appreciated.
Thank you in advance,
Marcel

------------------------------------------------------------------------------
Laszlo Ersek
2015-06-02 16:25:27 UTC
Permalink
Post by Marcel Apfelbaum
Hi,
- [Qemu-devel] [PATCH V8 00/17] hw/pc: implement multiple primary
busses for pc machines
adds a PCI Expander Device to QEMU that exposes a new PCI root bus.
(Let's tie this thread to the v7 question too:

http://thread.gmane.org/gmane.comp.emulators.qemu/338583/focus=338599
)
Post by Marcel Apfelbaum
[snip]
I'm attaching a horrible patch (applies on top of edk2 SVN r17543, aka
git commit d4848bb9df) that allows OVMF to recognize the e1000 NIC with
the following QEMU command line:

ISO=/mnt/data/isos/Fedora-Live-Xfce-x86_64-20-1.iso
CODE=/home/virt-images/OVMF_CODE.fd
TMPL=/home/virt-images/OVMF_VARS.fd

cp $TMPL vars.fd

qemu-system-x86_64 \
-m 2048 \
-M pc \
-enable-kvm \
-device qxl-vga \
-drive if=pflash,readonly,format=raw,file=$CODE \
-drive if=pflash,format=raw,file=vars.fd \
-drive id=cdrom,if=none,readonly,format=raw,file=$ISO \
-device virtio-scsi-pci,id=scsi0 \
-device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
-debugcon file:debug.log \
-global isa-debugcon.iobase=0x402 \
-device pxb,id=bridge1,bus_nr=128 \
-netdev user,id=netdev0,hostfwd=tcp:127.0.0.1:2222-:22 \
-device e1000,netdev=netdev0,bus=bridge1,addr=1 \
-monitor stdio

With this hack in place, and using the above QEMU command line,
"debug.log" bears witness to the PCI enumeration succeeding, and the
"PCI" command in the UEFI shell lists the e1000 NIC.

I agree with your analysis that the way to support this QEMU feature in
OVMF is to produce several EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances.
Beyond that agreement, I must say that invalidating the assumption that
"there is only one root bridge" breaks just about everything in OVMF.

Just skimming my hack-patch identifies the problems (or in some cases,
questions):

* The PCI host bridge driver under PcAtChipsetPkg needs to be cloned
under OvmfPkg, and (as you say) the bus ranges need to be determined
dynamically. On IRC you said that probing for device 0 on a bus is
sufficient to see if the bus lives, but for now I'm unsure if this would
be a layering violation or not for the UEFI protocols in question. Maybe
not.

* The bus ranges assigned to each "pxb" device (ie. root bridge) would
have to be able to accommodate any subordinate buses enumerated off that
root bridge. At least this is what PciRootBridgeEnumerator() in
"MdeModulePkg/Bus/Pci/PciBusDxe/PciEnumerator.c" seems to require. I've
got no clue how to size the bus ranges properly for the root bridges to
satisfy this.

* In fact, the bus range presented over the
EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL.Configuration() function cannot be just
a range. As far as I tested the PCI bus driver (see path above), it
doesn't find anything if the range retrieved from
EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL.Configuration() doesn't start *exactly*
with a live bus. In other words, it doesn't look for sibling buses, or
independent buses in the bus range exposed by the root bridge protocol.
It looks for *one* bus *at* the start of the range (and subordinate
buses hanging off of that). I have absolutely no clue why this is so,
but it means that for each pxb found, one root bridge io protocol would
have to be produced, and that proto should expose one bus range, with
the low end matching exactly the bus number, and the high end enabling
child buses to be enumerated.

* In the attached hack, I'm splitting the pre-patch, static, IO & MMIO
apertures in the middle. Maybe they could be the same shared ranges, as
you say. I don't know.

* In the OVMF BDS (boot device selection) code, we manually connect the
one and only root bridge. This would have to be made dynamic, to connect all
of them. This connection basically amounts to "starting the enumeration".

* The OVMF boot order processing code hardcodes PciRoot(0x0) in a bunch
of device path matching logic. That would not be appropriate any longer.
In fact the above command line should boot the fedora live CD, but it
doesn't, and in the UEFI setup utility I cannot even browse the CD
filesystem.

* Gabriel wrote earlier some code for setting the INTx interrupt pin
registers of all PCI devices in OVMF's BDS. That code breaks now: an
assert is triggered ("PCI host bridge (00:00.0) should have no
interrupts"). I'm not sure why this happens.

* The UEFI device paths for the PCI root bridges (textually,
PciRoot(0x0), PciRoot(0x1) etc) actually start with ACPI device path
nodes. They consist of a PNP0A03 _HID and a numeric _UID. If my reading
of the UEFI spec is correct, the _UIDs that OVMF would assign to these
device path nodes would have to match the *actual* ACPI payload that
QEMU exports. The _UID assignment is now static (just a 0), and my
hack-patch assigns a static 1 to the "other" root bridge's device path.
This is not good. OVMF would either have to parse ACPI payload
(horrible) or get the _UID<->pxb assignment via fw_cfg.
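
Regarding the third bullet above: the "one bus range per root bridge,
starting exactly at the pxb's bus number" shape corresponds to an ACPI
QWORD address space descriptor of resource type "bus", as returned by
Configuration(). A simplified sketch follows; the struct fields mirror
edk2's EFI_ACPI_ADDRESS_SPACE_DESCRIPTOR (packing omitted for brevity),
and the bus numbers are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified version of edk2's EFI_ACPI_ADDRESS_SPACE_DESCRIPTOR. */
typedef struct {
  uint8_t  Desc;                   /* 0x8A: QWORD address space descriptor */
  uint16_t Len;                    /* 0x2B: descriptor body length */
  uint8_t  ResType;                /* 2 = bus number range */
  uint8_t  GenFlag;
  uint8_t  SpecificFlag;
  uint64_t AddrSpaceGranularity;
  uint64_t AddrRangeMin;           /* must be exactly the live root bus */
  uint64_t AddrRangeMax;
  uint64_t AddrTranslationOffset;
  uint64_t AddrLen;
} AcpiDesc;

/* Build the bus-range descriptor for one root bridge: low end pinned to
   the root bus number, high end leaving room for child buses. */
static AcpiDesc BusRange (uint8_t RootBus, uint8_t LastBus)
{
  AcpiDesc D = {0};
  D.Desc         = 0x8A;
  D.Len          = 0x2B;
  D.ResType      = 2;              /* ACPI_ADDRESS_SPACE_TYPE_BUS */
  D.AddrRangeMin = RootBus;
  D.AddrRangeMax = LastBus;
  D.AddrLen      = (uint64_t)LastBus - RootBus + 1;
  return D;
}
```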

That's all the carnage I can think of right now, but I'm sure this is
just the tip of the iceberg. This would be a very large project, and
QEMU might have to expose a lot more info over fw_cfg than it does now.

In any case, the device model itself could be digestible for OVMF, based
on the results of this hack.

Thanks
Laszlo
Marcel Apfelbaum
2015-06-03 08:15:51 UTC
Permalink
Post by Laszlo Ersek
[snip]
I'm attaching a horrible patch (applies on top of edk2 SVN r17543, aka
git commit d4848bb9df) that allows OVMF to recognize the e1000 NIC with
[snip]
With this hack in place, and using the above QEMU command line,
"debug.log" bears witness to the PCI enumeration succeeding, and the
"PCI" command in the UEFI shell lists the e1000 NIC.
Hi Laszlo,

This is very good news; it means that the device *can work* with edk2.
Post by Laszlo Ersek
I agree with your analysis that the way to support this QEMU feature in
OVMF is to produce several EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances.
Beyond that agreement, I must say that invalidating the assumption that
"there is only one root bridge" breaks about everything in OVMF.
[snip]
In any case, the device model itself could be digestible for OVMF, based
on the results of this hack.
Thanks a lot for your analysis.
Since I am new to edk2, I cannot take on this project by myself, but if the
PCI folks can come up with a plan or design, I'll be glad to implement it,
or at least to contribute to it.


Thanks,
Marcel
Post by Laszlo Ersek
Thanks
Laszlo
------------------------------------------------------------------------------
Laszlo Ersek
2015-06-03 10:20:00 UTC
Permalink
Post by Marcel Apfelbaum
[snip]
Thanks a lot for your analysis.
Since I am new to edk2, I cannot take this project by myself, but if PCI
guys can come up with a plan or design, I'll be glad to implement it,
or at least to contribute to it.
After sleeping on it :), I'd certainly like to find the time to
collaborate on this myself. Maybe we can experiment some more; for
example we could start by you explaining to me how exactly to probe for
a root bus's presence (you mentioned device 0, but I'll need more than
that).

For the bus range allocation, here's an idea:
- create a bitmap with 256 bits (32 bytes) with all bits zero
- probe all root buses; whatever is found, flip its bit to 1
- assuming N root buses were found, divide the number of remaining zero
bits by N. The quotient Q is the number of subordinate buses each root
bus would be able to accommodate
- for each root bus:
- create an ACPI bus range descriptor that includes only the root
bus's number
- pull out Q zero bits from the bitmap, from the left, flipping them
to one as you proceed
- for each zero bit pulled, try to append that bus number to the ACPI
bus range descriptor (simply bumping the end). If there's a
discontinuity, start a new ACPI bus range descriptor.

This greedy algorithm would grant each root bus the same number of
possible subordinate buses, could be implemented in linear time, and
would keep the individual bus ranges reasonably contiguous (i.e. there
should be a reasonably low number of ACPI bus range descriptors per
root bus).
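
The steps above can be sketched roughly as follows (all names invented;
a full version would emit one ACPI bus-range descriptor per resulting
range instead of returning Start/End pairs). The caller is expected to
flip the bit of every found root bus to 1 before the per-root calls:

```c
#include <assert.h>
#include <stdint.h>

#define NBUS 256  /* one bit per possible bus number */

typedef struct { int Start, End; } Range;  /* inclusive bus range */

static int  TestBit (const uint8_t *Map, int i) { return (Map[i / 8] >> (i % 8)) & 1; }
static void SetBit  (uint8_t *Map, int i)       { Map[i / 8] |= (uint8_t)(1 << (i % 8)); }

/* For one root bus: start a range holding just the root bus number,
   then pull up to Quota free bus numbers from the left of the bitmap,
   extending the current range while the numbers stay contiguous and
   starting a new range on a discontinuity. Returns the number of
   ranges produced. */
static int AllocRanges (uint8_t *Map, int RootBus, int Quota, Range *Ranges, int MaxRanges)
{
  int NumRanges = 0;
  Ranges[NumRanges++] = (Range){ RootBus, RootBus };
  for (int Bus = 0; Bus < NBUS && Quota > 0; ++Bus) {
    if (TestBit (Map, Bus)) continue;              /* already taken */
    SetBit (Map, Bus);
    --Quota;
    if (Bus == Ranges[NumRanges - 1].End + 1) {
      Ranges[NumRanges - 1].End = Bus;             /* contiguous: extend */
    } else if (NumRanges < MaxRanges) {
      Ranges[NumRanges++] = (Range){ Bus, Bus };   /* discontinuity */
    }
  }
  return NumRanges;
}
```

With root buses 0 and 0x80 and N=2, each root gets Q=127 extra buses, so
root 0 ends up with 0-127 and the pxb with 128-255, each as one range.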

What do you think? This wouldn't be a very hard patch to write, and then
we could experiment with various -device pxb,bus_nr=xxx parameters.

The MMIO and IO spaces I would just share between all of them; the
allocations from those are delegated back to the host bridge / root
bridge driver, and the current implementation seems sufficient -- it
just assigns blocks from the same big MMIO ( / IO) space downwards.

Thanks
Laszlo

------------------------------------------------------------------------------
Brian J. Johnson
2015-06-03 14:19:07 UTC
Permalink
Post by Laszlo Ersek
After sleeping on it :), I'd certainly like to find the time to
collaborate on this myself. Maybe we can experiment some more; for
example we could start by you explaining to me how exactly to probe for
a root bus's presence (you mentioned device 0, but I'll need more than
that).
- create a bitmap with 256 bits (32 bytes) with all bits zero
- probe all root buses; whatever is found, flip its bit to 1
- assuming N root buses were found, divide the number of remaining zero
bits with N. The quotient Q means how many subordinate buses each root
bus would be able to accommodate
- create an ACPI bus range descriptor that includes only the root
bus's number
- pull out Q zero bits from the bitmap, from the left, flipping them
to one as you proceed
- for each zero bit pulled, try to append that bus number to the ACPI
bus range descriptor (simply bumping the end). If there's a
discontinuity, start a new ACPI bus range descriptor.
This greedy algorithm would grant each root bus the same number of
possible subordinate buses, could be implemented in linear time, and
would keep the individual bus ranges "reasonably continuous" (ie. there
should be a reasonably low number of ACPI bus range descriptors, per
root bus).
In the interest of keeping things simple, "real hardware"
implementations I'm aware of usually determine root bus range
assignments statically, by platform-specific means (often setup
screens). But it's certainly platform-specific, so whatever makes sense
for OVMF is fine.

If QEMU supports it, I'd strongly encourage you guys to add support for
multiple PCIe segments (multiple, possibly discontiguous 256-bus ranges)
from the beginning. Segment support doesn't show up in a lot of BIOSes, so
a lot of code gets written assuming a single segment. That makes it
hard for those of us with multi-segment hardware to port it to our
platforms. :)
[snip]
Thanks,
--
Brian J. Johnson

--------------------------------------------------------------------

My statements are my own, are not authorized by SGI, and do not
necessarily represent SGI’s positions.

------------------------------------------------------------------------------
Marcel Apfelbaum
2015-06-03 20:34:09 UTC
Permalink
Post by Laszlo Ersek
Post by Marcel Apfelbaum
Post by Laszlo Ersek
Post by Marcel Apfelbaum
Hi,
- [Qemu-devel] [PATCH V8 00/17] hw/pc: implement multiple primary
busses for pc machines
adds a PCI Expander Device to QEMU that exposes a new PCI root bus.
http://thread.gmane.org/gmane.comp.emulators.qemu/338583/focus=338599
)
Post by Marcel Apfelbaum
The PXB is a "light-weight" host bridge whose purpose is to enable
the main host bridge to support multiple PCI root buses.
It does not have its own registers for configuration cycles, but is
snoops on main host bridge registers and it lives on the same PCI
segment.
The device receives from the command line the bus number and expects
the firmware (bios/UEFI) to probe the bus for devices behind it and
configure them.
My question is how can it be supported in edk2? Are there any
architecture limitations that will prevent it to work?
My edk2/UEFI knowledge is rather limited, but I did see in the spec
13.1.1 PCI Root Bridge I/O Overview
...
Depending on the chipset, a single
EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL may abstract a portion of a PCI
Segment, or an entire PCI Segment. A PCI Host Bridge may produce
one or more PCI Root Bridges. When a PCI Host Bridge produces
multiple PCI Root Bridges, it is possible to have more than one
PCI Segment.
...
It seems that multiple EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances for
the same PCI Host Bridge mapped into the same PCI Segment is the
answer. First instance belongs to the "main" host bridge and the other
to the PXBs.
The open questions are of course how to assign resources (bus
numbers/IO/MEM) to the EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances.
For the bus numbers I think that the PCI Host Bridge can scan the 0x0
- 0xff range and build incrementally the bus ranges.
Regarding IO/MEM ranges I am still not sure. The way it is done in
SeaBIOS is that all devices behind PXB root bus are "considered" as
being behind bus 0 for resources allocation. Once the resources
allocation is done, each EFI_PCI_ROOT_BRIDGE gets the list of MEM/IO
ranges corresponding with the devices behind them.
Any comments and suggestions would be greatly appreciated.
Thank you in advance,
Marcel
I'm attaching a horrible patch (applies on top of edk2 SVN r17543, aka
git commit d4848bb9df) that allows OVMF to recognize the e1000 NIC with
ISO=/mnt/data/isos/Fedora-Live-Xfce-x86_64-20-1.iso
CODE=/home/virt-images/OVMF_CODE.fd
TMPL=/home/virt-images/OVMF_VARS.fd
cp $TMPL vars.fd
qemu-system-x86_64 \
-m 2048 \
-M pc \
-enable-kvm \
-device qxl-vga \
-drive if=pflash,readonly,format=raw,file=$CODE \
-drive if=pflash,format=raw,file=vars.fd \
-drive id=cdrom,if=none,readonly,format=raw,file=$ISO \
-device virtio-scsi-pci,id=scsi0 \
-device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
-debugcon file:debug.log \
-global isa-debugcon.iobase=0x402 \
-device pxb,id=bridge1,bus_nr=128 \
-netdev user,id=netdev0,hostfwd=tcp:127.0.0.1:2222-:22 \
-device e1000,netdev=netdev0,bus=bridge1,addr=1 \
-monitor stdio
With this hack in place, and using the above QEMU command line,
"debug.log" bears witness to the PCI enumeration succeeding, and the
"PCI" command in the UEFI shell lists the e1000 NIC.
Hi Laszlo,
These are very good news, that means that the device *can work* with edk2.
Post by Laszlo Ersek
I agree with your analysis that the way to support this QEMU feature in
OVMF is to produce several EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL instances.
Beyond that agreement, I must say that invalidating the assumption that
"there is only one root bridge" breaks about everything in OVMF.
Just skimming my hack-patch identifies the problems (or, in some cases, where they will surface):
* The PCI host bridge driver under PcAtChipsetPkg needs to be cloned
under OvmfPkg, and (as you say) the bus ranges need to be determined
dynamically. On IRC you said that probing for device 0 on a bus is
sufficient to see if the bus lives, but for now I'm unsure if this would
be a layering violation or not for the UEFI protocols in question. Maybe
not.
* The bus ranges assigned to each "pxb" device (ie. root bridge) would
have to be able to accommodate any subordinate buses enumerated off that
root bridge. At least this is what PciRootBridgeEnumerator() in
"MdeModulePkg/Bus/Pci/PciBusDxe/PciEnumerator.c" seems to require. I've
got no clue how to size the bus ranges properly for the root bridges to
satisfy this.
* In fact, the bus range presented over the
EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL.Configuration() function cannot be just
a range. As far as I tested the PCI bus driver (see path above), it
doesn't find anything if the range retrieved from
EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL.Configuration() doesn't start *exactly*
with a live bus. In other words, it doesn't look for sibling buses, or
independent buses in the bus range exposed by the root bridge protocol.
It looks for *one* bus *at* the start of the range (and subordinate
buses hanging off of that). I have absolutely no clue why this is so,
but it means that for each pxb found, one root bridge io protocol would
have to be produced, and that proto should expose one bus range, with
the low end matching exactly the bus number, and the high end enabling
child buses to be enumerated.
* In the attached hack, I'm splitting the pre-patch, static, IO & MMIO
apertures in the middle. Maybe they could be the same shared ranges, as
you say. I don't know.
* In the OVMF BDS (boot device selection) code, we manually connect the
only one root bridge. This would have to be made dynamic, to connect all
of them. This connection basically amounts to "starting the enumeration".
* The OVMF boot order processing code hardcodes PciRoot(0x0) in a bunch
of device path matching logic. That would not be appropriate any longer.
In fact the above command line should boot the fedora live CD, but it
doesn't, and in the UEFI setup utility I cannot even browse the CD
filesystem.
* Gabriel earlier wrote some code for setting the INTx interrupt pin
registers of all PCI devices in OVMF's BDS. That code breaks now; an
assert is triggered ("PCI host bridge (00:00.0) should have no
interrupts"). I am not sure why this happens.
* The UEFI device paths for the PCI root bridges (textually,
PciRoot(0x0), PciRoot(0x1) etc) actually start with ACPI device path
nodes. They consist of a PNP0A03 _HID and a numeric _UID. If my reading
of the UEFI spec is correct, the _UIDs that OVMF would assign to these
device path notes would have to match the *actual* ACPI payload that
QEMU exports. The _UID assignment is now static (just a 0), and my
hack-patch assigns a static 1 to the "other" root bridge's device path.
This is not good. OVMF would either have to parse ACPI payload
(horrible) or get the _UID<->pxb assignment via fw_cfg.
That's all the carnage I can think of right now, but I'm sure this is
just the tip of the iceberg. This would be a very large project, and
QEMU might have to expose a lot more info over fw_cfg than it does now.
In any case, the device model itself could be digestible for OVMF, based
on the results of this hack.
Thanks a lot for your analysis.
Since I am new to edk2, I cannot take this project by myself, but if PCI
guys can come up with a plan or design, I'll be glad to implement it,
or at least to contribute to it.
After sleeping on it :), I'd certainly like to find the time to
collaborate on this myself.
Great!

Maybe we can experiment some more; for
Post by Laszlo Ersek
example we could start by you explaining to me how exactly to probe for
a root bus's presence (you mentioned device 0, but I'll need more than
that).
Well, I lied. :)
I had a look at SeaBIOS now and it does the following:
- It receives the number of extra root buses via a fw_cfg file.
- It scans from bus 0 to bus 0xff until it discovers all
the extra root buses. The 'discovery' is: go over all of the bus's slots and
probe for a non-empty PCI header. If you find at least one device, you
have just discovered a new PCI root bus.
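For illustration, that discovery loop might look like the following C sketch. This is not actual SeaBIOS or edk2 code; `VendorReadFn` is a hypothetical stand-in for a real PCI config-space read of the VendorId field, and `fake_read` is a toy model of a single NIC on bus 128.

```c
#include <stdint.h>

/* Hypothetical callback standing in for a PCI config-space read of the
 * VendorId field (offset 0x00), function 0, of a given bus/slot. */
typedef uint16_t (*VendorReadFn)(int bus, int slot);

/* A bus is considered alive if any slot answers with a VendorId other
 * than all-ones (0xffff means "no device here"). */
static int bus_is_present(VendorReadFn read_vendor, int bus)
{
    int slot;

    for (slot = 0; slot < 32; slot++) {
        if (read_vendor(bus, slot) != 0xffff) {
            return 1;
        }
    }
    return 0;
}

/* Scan buses 1..0xff; each live bus found this way is an extra root
 * bus (bus 0 belongs to the main host bridge). */
static int count_extra_root_buses(VendorReadFn read_vendor)
{
    int bus, count = 0;

    for (bus = 1; bus <= 0xff; bus++) {
        count += bus_is_present(read_vendor, bus);
    }
    return count;
}

/* Toy fake for demonstration: pretend a single device sits at bus 128,
 * slot 1 (as with "-device pxb,bus_nr=128" plus a NIC behind it). */
static uint16_t fake_read(int bus, int slot)
{
    return (bus == 128 && slot == 1) ? 0x8086 : 0xffff;
}
```

In real firmware the callback would of course be a 0xcf8/0xcfc (or MMCONFIG) access instead of a fake table.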

I think that we can improve the fw_cfg file to pass the actual
bus numbers and not only the total. This way it should be relatively
easy for edk2 to handle the extra root buses.
Post by Laszlo Ersek
- create a bitmap with 256 bits (32 bytes) with all bits zero
- probe all root buses; whatever is found, flip its bit to 1
- assuming N root buses were found, divide the number of remaining zero
bits with N. The quotient Q means how many subordinate buses each root
bus would be able to accommodate
- create an ACPI bus range descriptor that includes only the root
bus's number
- pull out Q zero bits from the bitmap, from the left, flipping them
to one as you proceed
- for each zero bit pulled, try to append that bus number to the ACPI
bus range descriptor (simply bumping the end). If there's a
discontinuity, start a new ACPI bus range descriptor.
This greedy algorithm would grant each root bus the same number of
possible subordinate buses, could be implemented in linear time, and
would keep the individual bus ranges "reasonably continuous" (ie. there
should be a reasonably low number of ACPI bus range descriptors, per
root bus).
What do you think? This wouldn't be a very hard patch to write, and then
we could experiment with various -device pxb,bus_nr=xxx parameters.
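The greedy partition proposed above could be sketched as follows. This is an assumed data layout, not actual OVMF code: `used` is the 256-entry occupancy map with all discovered root bus numbers already flipped to 1, and `q` is the precomputed quota (free bus numbers divided by the number of root buses).

```c
#include <stddef.h>

#define MAX_DESC 16

typedef struct {
    int base;   /* first bus number in this ACPI descriptor */
    int limit;  /* last bus number in this ACPI descriptor  */
} AcpiBusRange;

/*
 * Greedy step for one root bus: the first descriptor covers just the
 * root bus itself; further free bus numbers are pulled from the left,
 * extending the current descriptor while contiguous and starting a
 * new one on a discontinuity.  Returns the descriptor count.
 */
static size_t pull_quota(int used[256], int root, int q,
                         AcpiBusRange *desc)
{
    size_t n_desc = 1;
    int i, taken = 0;

    desc[0].base = desc[0].limit = root;

    for (i = 0; i < 256 && taken < q && n_desc < MAX_DESC; i++) {
        if (used[i]) {
            continue;
        }
        used[i] = 1;
        taken++;
        if (i == desc[n_desc - 1].limit + 1) {
            desc[n_desc - 1].limit = i;                  /* extend */
        } else {
            desc[n_desc].base = desc[n_desc].limit = i;  /* new range */
            n_desc++;
        }
    }
    return n_desc;
}
```

With two root buses at 0 and 128, each gets a quota of 127 and ends up with a single contiguous descriptor (0-127 and 128-255 respectively).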
Well, it looks nice but I think that we can do something much simpler :)
Let's continue the above idea that QEMU passes to edk2 the *extra* root bus numbers
in ascending order for simplicity.
For example 8,16,32. From here you can derive that the bus ranges are:
0-7 host bridge 0
8-15 pxb root bridge 1
16-31 pxb root bridge 2
32-0xff pxb root bridge 3
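The derivation above is simple enough to sketch in C (assumed types; the premise is that QEMU passes the extra root bus numbers over fw_cfg in ascending order):

```c
#include <stddef.h>

typedef struct {
    int base;   /* first bus number owned by this root bridge */
    int limit;  /* last bus number owned by this root bridge  */
} BusRange;

/*
 * Derive the per-root-bridge bus ranges from the sorted extra root
 * bus numbers: range 0 always starts at bus 0 (main host bridge),
 * each extra root bus number ends the previous range, and the last
 * range runs to 0xff.  Returns the range count (n_extra + 1).
 */
static size_t derive_bus_ranges(const int *extra, size_t n_extra,
                                BusRange *out)
{
    size_t i;

    out[0].base = 0;
    for (i = 0; i < n_extra; i++) {
        out[i].limit = extra[i] - 1;
        out[i + 1].base = extra[i];
    }
    out[n_extra].limit = 0xff;
    return n_extra + 1;
}
```

Feeding it 8, 16, 32 reproduces exactly the 0-7 / 8-15 / 16-31 / 32-0xff split listed above.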

BTW, this is the way, as far as I know, that the real hw divides the ranges.
Limitation:
- How do you know you have enough bus numbers for a host bridge to cover
all the PCI-to-PCI bridges behind it? Say bus 0 has 10 bridges; the 0-7 range is not enough.
Reasoning:
- This is a *hw vendor* issue, not a firmware one; in our case QEMU should
check that the ranges are enough before starting edk2.
In conclusion, this assumption does not break anything or give us a big limitation.
And SeaBIOS already assumes that... and QEMU is not going to break it.
Post by Laszlo Ersek
The MMIO and IO spaces I would just share between all of them; the
allocations from those are delegated back to the host bridge / root
bridge driver, and the current implementation seems sufficient -- it
just assigns blocks from the same big MMIO ( / IO) space downwards
Yes, this is how it should be done, I am happy that it already works that way.

Thanks,
Marcel
Post by Laszlo Ersek
Thanks
Laszlo
------------------------------------------------------------------------------
Laszlo Ersek
2015-06-03 23:11:22 UTC
Permalink
Post by Marcel Apfelbaum
Maybe we can experiment some more; for example we could start by
you explaining to me how exactly to probe for a root bus's presence
(you mentioned device 0, but I'll need more than that).
Well, I lied. :)
- It receives the number of extra root buses via a fw_cfg file.
- It scans from bus 0 to bus 0xff until it discovers all
the extra root buses. The 'discovery' is: go over all of the bus's slots
and probe for a non-empty PCI header. If you find at least one
device, you have just discovered a new PCI root bus.
I thought about checking the VendorId header field for dev=0 func=0 on
each bus. (Sources on the net indicate that the VendorId field is
usually queried for presence -- all bits one means "nope".)
Post by Marcel Apfelbaum
I think that we can improve the fw_cfg file to pass the actual
bus numbers and not only the total. This way it should be relatively
easy for edk2 to handle the extra root buses.
Yes. I had thought this would be the easiest. I wasn't sure though if
you'd appreciate such an idea :)
Post by Marcel Apfelbaum
- create a bitmap with 256 bits (32 bytes) with all bits zero
- probe all root buses; whatever is found, flip its bit to 1
- assuming N root buses were found, divide the number of remaining
zero bits with N. The quotient Q means how many subordinate buses
each root bus would be able to accommodate
- create an ACPI bus range descriptor that includes only the root
bus's number
- pull out Q zero bits from the bitmap, from the left, flipping
them to one as you proceed
- for each zero bit pulled, try to append that bus number to the
ACPI bus range descriptor (simply bumping the end). If there's a
discontinuity, start a new ACPI bus range descriptor.
This greedy algorithm would grant each root bus the same number of
possible subordinate buses, could be implemented in linear time, and
would keep the individual bus ranges "reasonably continuous" (ie.
there should be a reasonably low number of ACPI bus range
descriptors, per root bus).
What do you think? This wouldn't be a very hard patch to write, and
then we could experiment with various -device pxb,bus_nr=xxx
parameters.
Well, it looks nice but I think that we can do something much simpler :)
Let's continue the above idea that QEMU passes to edk2 the *extra*
root bus numbers in ascending order for simplicity.
0-7 host bridge 0
8-15 pxb root bridge 1
16-31 pxb root bridge 2
32-0xff pxb root bridge 3
Sounds good, at least if the bus numbers assigned to the pxb's partition
the full range fairly uniformly.
Post by Marcel Apfelbaum
BTW, this is the way, as far as I know, that the real hw divides the ranges.
- How do you know you have enough bus numbers for a host bridge to
cover all PCI-2-PCI bridges behind it? Let's say bus 0 has 10
bridges, 0-7 range is not enough.
Exactly.
Post by Marcel Apfelbaum
- This is *hw vendor* issue, not firmware, in our case QEMU should
check the ranges are enough before starting edk2.
If you're willing to do the work in QEMU, you certainly won't meet any
resistance on my part! :)
Post by Marcel Apfelbaum
In conclusion, this assumption does not break anything or gives as a big limitation.
And Seabios already assumes that... and QEMU is not going to break it.
Great!
Post by Marcel Apfelbaum
The MMIO and IO spaces I would just share between all of them; the
allocations from those are delegated back to the host bridge / root
bridge driver, and the current implementation seems sufficient -- it
just assigns blocks from the same big MMIO ( / IO) space downwards
Yes, this is how it should be done, I am happy that it already works that way.
Tonight I've started to work on this anyway. Before attacking the bitmap
idea, I wanted to -- had to, really -- rewrap OVMF's fresh clone of
"PcAtChipsetPkg/PciHostBridgeDxe" to 79 columns. I expect to delve into
the driver more deeply this time than last time, and the consistently
overlong (130-148 character) lines make the code simply unreadable.

So, I just finished that. (It was surprisingly difficult; the rewrapping
took 8 patches, the cumulative diffstat is 9 files changed, 2261
insertions(+), 1445 deletions(-).) I thought I'd check my email before
embarking on the bitmap thing. Your email arrived at the best possible
moment! Not just because I don't have to implement the bitmap, the
search, the multiple ACPI bus ranges per root bridge, but also because
the internals of the driver rely quite heavily on each root bridge
having a single contiguous bus range.

I think I could have rebased that to bitmap checks, but the approach
you're suggesting makes it all unnecessary. (Plus, I don't have to worry
about any incompatibility with the PCI bus driver, which I won't touch.)

What element type do you propose for the array in the new fw_cfg file?
(And what name for the fw_cfg file itself?)

"etc/extra-pci-roots" uses uint64_t, little endian, for the number of
extra root buses. (In fact if you expose the explicit list in a separate
file, then the element count is not even necessary separately, because
file sizes are available in the fw_cfg directory, and I can divide the
file size by the element size.)

I have two more questions (raised earlier), about the _HID and the _UIDs
in the SSDT.

First, I can see in your patch

hw/acpi: add support for i440fx 'snooping' root busses

that the _UID is populated for each root bus with a string of the form

PC%02X

where the argument is "bus_num". UEFI can accommodate this, with the
Expanded ACPI Device Path node, but I'll have to know if the "bus_num"
argument matches the exact number that you're going to pass down in the
new fw_cfg file. Does it?

Second, about the _HID and _UID objects being populated by strings, and
not numbers. In *theory* this should be all fine, but I'm concerned that
in practice they will cause complications (eg. in booting).

Can you perhaps change (or update, separately) the QEMU patch in
question, so that _HID is populated with aml_eisaid() instead of
aml_string(), and that _UID is populated with a flat integer (the
"bus_num" value)? It should not be very intrusive for QEMU at this
point, and down the road I think it would ensure better compatibility.
Post by Marcel Apfelbaum
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index db32fd1..8fae3b9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -944,9 +944,8 @@ build_ssdt(GArray *table_data, GArray *linker,
scope = aml_scope("\\_SB");
dev = aml_device("PC%.02X", bus_num);
- aml_append(dev,
- aml_name_decl("_UID", aml_string("PC%.02X", bus_num)));
- aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A03")));
+ aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
+ aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
if (numa_node != NUMA_NODE_UNASSIGNED) {
As far as I can see in the QEMU source, filling in _HID and _UID like
this is existing practice.

Thanks!
Laszlo

------------------------------------------------------------------------------
Marcel Apfelbaum
2015-06-04 09:42:42 UTC
Permalink
Post by Laszlo Ersek
Post by Marcel Apfelbaum
Maybe we can experiment some more; for example we could start by
you explaining to me how exactly to probe for a root bus's presence
(you mentioned device 0, but I'll need more than that).
Well, I lied. :)
- It receives the number of extra root buses via a fw_cfg file.
- It scans from bus 0 to bus 0xff until it discovers all
the extra root buses. The 'discovery' is: go over all of the bus's slots
and probe for a non-empty PCI header. If you find at least one
device, you have just discovered a new PCI root bus.
I thought about checking the VendorId header field for dev=0 func=0 on
each bus. (Sources on the net indicate that the VendorId field is
usually queried for presence -- all bits one means "nope".)
Post by Marcel Apfelbaum
I think that we can improve the fw_cfg file to pass the actual
bus numbers and not only the total. This way it should be relatively
easy for edk2 to handle the extra root buses.
Yes. I had thought this would be the easiest. I wasn't sure though if
you'd appreciate such an idea :)
Well, this is the right thing to do, and the implementation will be straightforward.
Post by Laszlo Ersek
Post by Marcel Apfelbaum
- create a bitmap with 256 bits (32 bytes) with all bits zero
- probe all root buses; whatever is found, flip its bit to 1
- assuming N root buses were found, divide the number of remaining
zero bits with N. The quotient Q means how many subordinate buses
each root bus would be able to accommodate
- create an ACPI bus range descriptor that includes only the root
bus's number
- pull out Q zero bits from the bitmap, from the left, flipping
them to one as you proceed
- for each zero bit pulled, try to append that bus number to the
ACPI bus range descriptor (simply bumping the end). If there's a
discontinuity, start a new ACPI bus range descriptor.
This greedy algorithm would grant each root bus the same number of
possible subordinate buses, could be implemented in linear time, and
would keep the individual bus ranges "reasonably continuous" (ie.
there should be a reasonably low number of ACPI bus range
descriptors, per root bus).
What do you think? This wouldn't be a very hard patch to write, and
then we could experiment with various -device pxb,bus_nr=xxx
parameters.
Well, it looks nice but I think that we can do something much simpler :)
Let's continue the above idea that QEMU passes to edk2 the *extra*
root bus numbers in ascending order for simplicity.
0-7 host bridge 0
8-15 pxb root bridge 1
16-31 pxb root bridge 2
32-0xff pxb root bridge 3
Sounds good, at least if the bus numbers assigned to the pxb's partition
the full range fairly uniformly.
The management layer will assign the ranges, so libvirt will take care of that.
Post by Laszlo Ersek
Post by Marcel Apfelbaum
BTW, this is the way, as far as I know, that the real hw divides the ranges.
- How do you know you have enough bus numbers for a host bridge to
cover all PCI-2-PCI bridges behind it? Let's say bus 0 has 10
bridges, 0-7 range is not enough.
Exactly.
Post by Marcel Apfelbaum
- This is *hw vendor* issue, not firmware, in our case QEMU should
check the ranges are enough before starting edk2.
If you're willing to do the work in QEMU, you certainly won't meet any
resistance on my part! :)
Of course I will handle it in QEMU.
For the moment I only check that two pxbs do not receive the same root bus number.
I'll improve that by checking on PXB init that it does not interfere with the other ranges.

It will go like this:
Sort the root/host bridges in ascending order of their root bus number.
- For each root bridge:
  - Go over its PCI-to-PCI bridges.
  - Count all the buses using the subordinate bus PCI header field.
  - Check that the above number is less than <next PCI root bus nr> - <this PCI root bus nr> + 1.
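The check described in those steps could be sketched like this (a hypothetical helper, not actual QEMU code; `max_subordinate[i]` stands for the highest subordinate bus number programmed into the PCI-to-PCI bridges behind root `i`):

```c
#include <stddef.h>

/*
 * "root" holds the sorted root bus numbers (including bus 0);
 * "max_subordinate" holds, for each root, the highest subordinate
 * bus number found behind it.  Each root's buses must fit below the
 * next root bus number (the last root may use numbers up to 0xff).
 */
static int ranges_fit(const int *root, const int *max_subordinate,
                      size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int limit = (i + 1 < n) ? root[i + 1] - 1 : 0xff;
        if (max_subordinate[i] > limit) {
            return 0;   /* root i spills into the next root's range */
        }
    }
    return 1;
}
```

QEMU would run such a check at machine init and refuse to start the guest if any range is too small.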
Post by Laszlo Ersek
Post by Marcel Apfelbaum
In conclusion, this assumption does not break anything or gives as a big limitation.
And Seabios already assumes that... and QEMU is not going to break it.
Great!
:)
Post by Laszlo Ersek
Post by Marcel Apfelbaum
The MMIO and IO spaces I would just share between all of them; the
allocations from those are delegated back to the host bridge / root
bridge driver, and the current implementation seems sufficient -- it
just assigns blocks from the same big MMIO ( / IO) space downwards
Yes, this is how it should be done, I am happy that it already works that way.
Tonight I've started to work on this anyway. Before attacking the bitmap
idea, I wanted to -- had to, really -- rewrap OVMF's fresh clone of
"PcAtChipsetPkg/PciHostBridgeDxe" to 79 columns. I expect to delve into
the driver more deeply this time than last time, and the consistently
overlong (130-148 character) lines make the code simply unreadable.
So, I just finished that. (It was surprisingly difficult; the rewrapping
took 8 patches, the cumulative diffstat is 9 files changed, 2261
insertions(+), 1445 deletions(-).) I thought I'd check my email before
embarking on the bitmap thing. Your email arrived at the best possible
moment! Not just because I don't have to implement the bitmap, the
search, the multiple ACPI bus ranges per root bridge, but also because
the internals of the driver rely quite heavily on each root bridge
having a single contiguous bus range.
Good!
Post by Laszlo Ersek
I think I could have rebased that to bitmap checks, but the approach
you're suggesting makes it all unnecessary. (Plus, I don't have to worry
about any incompatibility with the PCI bus driver, which I won't touch.)
What element type do you propose for the array in the new fw_cfg file?
(And what name for the fw_cfg file itself?)
"etc/extra-pci-roots" uses uint64_t, little endian, for the number of
extra root buses. (In fact if you expose the explicit list in a separate
file, then the element count is not even necessary separately, because
file sizes are available in the fw_cfg directory, and I can divide the
file size with the element size.)
I can prepare another file. Regarding the new array, each element should be
a number between 0x0 and 0xff, so a uint8_t seems fair.
Post by Laszlo Ersek
I have two more questions (raised earlier), about the _HID and the _UIDs
in the SSDT.
First, I can see in your patch
hw/acpi: add support for i440fx 'snooping' root busses
that the _UID is populated for each root bus with a string of the form
PC%02X
where the argument is "bus_num". UEFI can accommodate this, with the
Expanded ACPI Device Path node, but I'll have to know if the "bus_num"
argument matches the exact number that you're going to pass down in the
new fw_cfg file. Does it?
Yes.
Post by Laszlo Ersek
Second, about the _HID and _UID objects being populated by strings, and
not numbers. In *theory* this should be all fine, but I'm concerned that
in practice they will cause complications (eg. in booting).
The _UID was inspired, I think, by a real HW machine with multiple root buses.
But I have no problem changing it.
Post by Laszlo Ersek
Can you perhaps change (or update, separately) the QEMU patch in
question, so that _HID is populated with aml_eisaid() instead of
aml_string(), and that _UID is populated with a flat integer (the
"bus_num" value)? It should not be very intrusive for QEMU at this
point, and down the road I think it would ensure better compatibility.
I see no problem with this; I only need to check that it plays nicely
with Linux and Windows guest OSes.
Post by Laszlo Ersek
Post by Marcel Apfelbaum
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index db32fd1..8fae3b9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -944,9 +944,8 @@ build_ssdt(GArray *table_data, GArray *linker,
scope = aml_scope("\\_SB");
dev = aml_device("PC%.02X", bus_num);
- aml_append(dev,
- aml_name_decl("_UID", aml_string("PC%.02X", bus_num)));
- aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A03")));
+ aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
+ aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
if (numa_node != NUMA_NODE_UNASSIGNED) {
As far as I can see in the QEMU source, filling in _HID and _UID like
this is existing practice.
I can submit the patch (or you can submit it and I'll ack) on top of the PXB series.
I am going to be on PTO, so it will have to wait a week :)

Thanks a lot!
Marcel
Post by Laszlo Ersek
Thanks!
Laszlo
------------------------------------------------------------------------------
Laszlo Ersek
2015-06-04 13:04:15 UTC
Permalink
Post by Marcel Apfelbaum
Post by Laszlo Ersek
What element type do you propose for the array in the new fw_cfg file?
(And what name for the fw_cfg file itself?)
"etc/extra-pci-roots" uses uint64_t, little endian, for the number of
extra root buses. (In fact if you expose the explicit list in a separate
file, then the element count is not even necessary separately, because
file sizes are available in the fw_cfg directory, and I can divide the
file size with the element size.)
I can prepare another file.
As long as we're crossing neither a QEMU nor a SeaBIOS release boundary,
I think we could just change the contents of the same file, with the
existing name.
Post by Marcel Apfelbaum
Regarding the new array, each element
should be
a number between 0x0 and 0xff, so a uint8_t seems fair.
Hm. The number of bytes to save here is really small, and it has been
suggested to maybe try to support segments? I don't know anything about
PCI segments; I vaguely recall that it allows for disjoint bus
intervals, with each interval having at most 256 elements. Maybe we
could accommodate that with a uint32_t element type?

In any case I'll leave it to you. I'll simply make the element type a
typedef in the OVMF code, and then I can easily flip it to another
integer type if necessary. One thing we should agree upon, though, is that
whatever the width, it should be little endian.
Post by Marcel Apfelbaum
Post by Laszlo Ersek
I have two more questions (raised earlier), about the _HID and the _UIDs
in the SSDT.
First, I can see in your patch
hw/acpi: add support for i440fx 'snooping' root busses
that the _UID is populated for each root bus with a string of the form
PC%02X
where the argument is "bus_num". UEFI can accommodate this, with the
Expanded ACPI Device Path node, but I'll have to know if the "bus_num"
argument matches the exact number that you're going to pass down in the
new fw_cfg file. Does it?
Yes.
Great, thanks.
Post by Marcel Apfelbaum
Post by Laszlo Ersek
Post by Marcel Apfelbaum
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index db32fd1..8fae3b9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -944,9 +944,8 @@ build_ssdt(GArray *table_data, GArray *linker,
scope = aml_scope("\\_SB");
dev = aml_device("PC%.02X", bus_num);
- aml_append(dev,
- aml_name_decl("_UID", aml_string("PC%.02X", bus_num)));
- aml_append(dev, aml_name_decl("_HID",
aml_string("PNP0A03")));
+ aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
+ aml_append(dev, aml_name_decl("_HID",
aml_eisaid("PNP0A03")));
aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
if (numa_node != NUMA_NODE_UNASSIGNED) {
As far as I can see in the QEMU source, filling in _HID and _UID like
this is existing practice.
I can submit the patch (or you can submit it and I'll ack) on top of the PXB series.
I think I'll apply this locally for now, and test it together with the
OVMF code I plan to write. One of us can submit it later (I'm unaware of
any urgency, but I might be wrong).
Post by Marcel Apfelbaum
I am going to be on PTO, so it will wait a week :)
Works for me. Have a nice vacation. :)

Thanks!
Laszlo

------------------------------------------------------------------------------
Marcel Apfelbaum
2015-06-04 15:18:49 UTC
Permalink
Post by Laszlo Ersek
Post by Marcel Apfelbaum
Post by Laszlo Ersek
What element type do you propose for the array in the new fw_cfg file?
(And what name for the fw_cfg file itself?)
"etc/extra-pci-roots" uses uint64_t, little endian, for the number of
extra root buses. (In fact if you expose the explicit list in a separate
file, then the element count is not even necessary separately, because
file sizes are available in the fw_cfg directory, and I can divide the
file size with the element size.)
I can prepare another file.
As long as we're crossing neither a QEMU nor a SeaBIOS release boundary,
I think we could just change the contents of the same file, with the
existing name.
The extra-roots file existed before PXB.
I am afraid of breaking something else.
This is why I prefer another file.
Post by Laszlo Ersek
Post by Marcel Apfelbaum
Regarding the new array, each element
should be
a number between 0x0 and 0xff, so a uint8_t seems fair.
Hm. The number of bytes to save here is really small, and it has been
suggested to maybe try to support segments? I don't know anything about
PCI segments; I vaguely recall that it allows for disjoint bus
intervals, with each interval having at most 256 elements. Maybe we
could accommodate that with a uint32_t element type?
While I don't really care about the type,
multiple PCI segments correspond to multiple *host bridges*,
as opposed to one host bridge with multiple root bridges.

Once you support multiple host bridges, pxbs are not really needed (on PCIe-based machines).
Post by Laszlo Ersek
In any case I'll leave it to you. I'll simply make the element type a
typedef in the OVMF code, and then I can easily flip it to another
integer type if necessary. One thing we should agree upon, though, is that
whatever the width, it should be little endian.
OK for little endian.
Post by Laszlo Ersek
Post by Marcel Apfelbaum
Post by Laszlo Ersek
I have two more questions (raised earlier), about the _HID and the _UIDs
in the SSDT.
First, I can see in your patch
hw/acpi: add support for i440fx 'snooping' root busses
that the _UID is populated for each root bus with a string of the form
PC%02X
where the argument is "bus_num". UEFI can accommodate this, with the
Expanded ACPI Device Path node, but I'll have to know if the "bus_num"
argument matches the exact number that you're going to pass down in the
new fw_cfg file. Does it?
Yes.
Great, thanks.
Post by Marcel Apfelbaum
Post by Laszlo Ersek
Post by Marcel Apfelbaum
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index db32fd1..8fae3b9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -944,9 +944,8 @@ build_ssdt(GArray *table_data, GArray *linker,
scope = aml_scope("\\_SB");
dev = aml_device("PC%.02X", bus_num);
- aml_append(dev,
- aml_name_decl("_UID", aml_string("PC%.02X", bus_num)));
- aml_append(dev, aml_name_decl("_HID",
aml_string("PNP0A03")));
+ aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
+ aml_append(dev, aml_name_decl("_HID",
aml_eisaid("PNP0A03")));
aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
if (numa_node != NUMA_NODE_UNASSIGNED) {
As far as I can see in the QEMU source, filling in _HID and _UID like
this is existing practice.
I can submit the patch (or you can submit it and I'll ack) on top of the PXB series.
I think I'll apply this locally for now, and test it together with the
OVMF code I plan to write. One of us can submit it later (I'm unaware of
any urgency, but I might be wrong).
Post by Marcel Apfelbaum
I am going to be on PTO, so it will wait a week :)
Works for me. Have a nice vacation. :)
Thanks!
Marcel
Post by Laszlo Ersek
Thanks!
Laszlo
------------------------------------------------------------------------------
Laszlo Ersek
2015-06-04 16:27:09 UTC
Permalink
Post by Marcel Apfelbaum
Post by Laszlo Ersek
Post by Marcel Apfelbaum
Post by Laszlo Ersek
What element type do you propose for the array in the new fw_cfg file?
(And what name for the fw_cfg file itself?)
"etc/extra-pci-roots" uses uint64_t, little endian, for the number of
extra root buses. (In fact if you expose the explicit list in a separate
file, then the element count is not even necessary separately, because
file sizes are available in the fw_cfg directory, and I can divide the
file size with the element size.)
I can prepare another file.
As long as we're crossing neither a QEMU nor a SeaBIOS release boundary,
I think we could just change the contents of the same file, with the
existing name.
The extra-roots file existed before PXB.
I am afraid of breaking something else.
This is why I prefer another file.
Noted.
Post by Marcel Apfelbaum
Post by Laszlo Ersek
Post by Marcel Apfelbaum
Regarding the new array, each element
should be
a number between 0x0 and 0xff, so a uint8_t seems fair.
Hm. The number of bytes to save here is really small, and it has been
suggested to maybe try to support segments? I don't know anything about
PCI segments; I vaguely recall that it allows for disjoint bus
intervals, with each interval having at most 256 elements. Maybe we
could accommodate that with a uint32_t element type?
While I don't really care about the type,
multiple PCI segments correspond to multiple *host bridges*,
as opposed to one host bridge with multiple root bridges.
Noted. UINT8 is fine then.

Thanks!
Laszlo

------------------------------------------------------------------------------