Host write guarantees source access scope - vulkan

According to the spec (1.2.138), section 6.9 (Host Write Ordering Guarantees):
When batches of command buffers are submitted to a queue via vkQueueSubmit, it defines a
memory dependency with prior host operations, and execution of command buffers submitted to
the queue.
[...]
The first access scope includes all host writes to mappable device memory that are available to the
host memory domain.
does "mappable" mean allocated using VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT?
does "available to the host memory domain" mean currently mapped?

does "mappable" mean allocated using VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT?
Yes.
does "available to the host memory domain" mean currently mapped?
No.
For writes made on the device, it means that synchronization including a memory domain operation to the host has been performed, that is, a barrier-like primitive using VK_PIPELINE_STAGE_HOST_BIT.
For writes made on the host through the mapped pointer, it means they have been flushed with vkFlushMappedMemoryRanges if the memory is not HOST_COHERENT. The write is already in the host domain, but the flush performs the availability operation.
In this chapter it means the latter. Writes through the mapped pointer that are either to HOST_COHERENT memory or flushed with vkFlushMappedMemoryRanges will implicitly become visible to the GPU on the next vkQueueSubmit.
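The rule above can be reduced to a small decision on the memory property flags. The following is only a sketch: the two constants are local stand-ins that mirror the VkMemoryPropertyFlagBits values so the code compiles without the Vulkan headers, and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and
 * VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, so no Vulkan SDK is needed. */
#define HOST_VISIBLE_BIT  0x00000002u
#define HOST_COHERENT_BIT 0x00000004u

/* Hypothetical helper: "mappable" means the memory type is HOST_VISIBLE. */
bool is_mappable(uint32_t property_flags)
{
    return (property_flags & HOST_VISIBLE_BIT) != 0;
}

/* Hypothetical helper: a host write through the mapped pointer needs an
 * explicit vkFlushMappedMemoryRanges before vkQueueSubmit only when the
 * memory type is mappable but not HOST_COHERENT. */
bool needs_flush_before_submit(uint32_t property_flags)
{
    return is_mappable(property_flags) &&
           (property_flags & HOST_COHERENT_BIT) == 0;
}
```

In a real application the flags would come from the memory type the allocation was made from. Whether the memory is currently mapped plays no part in the decision.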


NTPD Pseudo Servers

Good evening!
I'm configuring NTP on an embedded Linux system connected to a U-Blox GPS receiver, using NTPD and GPSD.
I would like to know the technical difference between:
the PPS signal provided through the GPSD shared memory (SHM) driver (pseudo IP address 127.127.28.1);
the "stand-alone" PPS signal, which is still connected to the GPS in some way I would like to understand (pseudo IP address 127.127.22.0).
It is critical for me to understand this because I need high-precision synchronization and I want to use the right information from the receiver.
Searching the web, I've found only confusing answers to my question...
Thanks in advance!
FL
The SHM driver is not designed to provide a PPS signal by itself. So maybe your notion here is misguided.
A PPS signal is used for getting a precise notion of the frequency of the local clock (the one used for measuring external signals), as it provides pulses at a well-known interval (1 s in this case). In effect, PPS is a frequency source.
GPSD, on the other hand, communicates with some device (which could be built into your hardware). It then provides the time data read from the GPS source to ntp via shared memory. This delivery of data does not guarantee any timing relation (delay); for example, it could occur earlier or later within the second due to load or scheduling.
From the perspective of ntp, you will have a true date/time label, but you might not know exactly when the related point in time occurred relative to your local clock. (Usually not precisely enough for common ntp use cases.) This is where PPS kicks in.
Depending on how the GPS device is being connected to your local machine (parallel, serial, internal bus) you will have some way of getting an interrupt on the pulse from the pps signal. (e.g. with serial connection you usually will get a transition on the DCD pin).
The internal processing of the related interrupt will read the local clock, and the resulting timing information is then provided for further processing. This information is exactly what the PPS clock discipline is using and providing to ntp. What you need to configure here is the offset from the triggering of the pulse to the reading of the local clock. (The pulse is usually assumed to occur "on the second".)
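To make the "on the second" assumption concrete, here is a minimal sketch of what can be derived from a local clock reading captured at the pulse. The function name and the nanosecond representation are assumptions for illustration; this is not ntpd code, and it ignores the configured delay between pulse and clock read.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical helper: local_ns is the local clock reading (nanoseconds,
 * assumed non-negative) captured in the PPS interrupt handler.
 * Returns the signed distance of that reading from the nearest whole
 * second, in the range [-500000000, 500000000). A positive value means
 * the local clock is ahead of the pulse. */
int64_t pps_offset_ns(int64_t local_ns)
{
    int64_t frac = local_ns % NSEC_PER_SEC;   /* 0 .. 999999999 */
    if (frac >= NSEC_PER_SEC / 2)
        frac -= NSEC_PER_SEC;                 /* just short of the next second */
    return frac;
}
```

The pulse only tells you where the second boundary is. Which second it is still has to come from the date/time label, for example the one GPSD delivers through shared memory.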
So, in your configuration, it is likely that the "source" of the PPS signal is the same GPSD is using for providing date/time data (your GPS device).
However, the actual signals used for date/time data and PPS are different. Date/time will use a data telegram or some register content read from the GPS device, while PPS will be a level change on an input pin provided by this very device.
For details, start with the interfacing information from your GPS receiver, especially any timings stated there. Then look at ntp and figure out which driver(s) will allow exploiting such input data for the best time quality.

GPU Memory Read Instruction Flow, Operand collector

I am trying to learn the architecture of a GPU with GPGPU-Sim and I am confused by the flow of memory operations. Let's say I have an arithmetic instruction like a = b + c. Before doing the calculation, memory load operations are required for b and c. Load instructions for these are sent to the memories. First of all, the cache tags are checked.
In case of a miss, the request is added to the MSHR and sent to the lower-level memory via the interconnection network from the GPU cores. When the request returns to the core from the interconnection network, it is added to some kind of memory response FIFO. Then cache lines are filled by ejecting those requests from the response FIFO.
In case of a hit, the data are available in the cache.
In both cases, the data for the arithmetic units are available in the caches. I know that the operand collector collects the required operands for issuing warps, but the part that confuses me is where the operand collector collects those operands from. Per-thread registers? If so, when do these registers get the required data from the caches?
Found the answer. One memory response is popped from the memory response FIFO each cycle when the FIFO is not empty and the writeback stage is not stalled. The popped response is written to the single-ported register file banks. The SIMD execution units load the required registers for arithmetic instructions from those register file banks when needed. Information about the operand collector and those register file banks is available online; the design is patented by NVIDIA.
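The writeback behaviour described in the answer can be modelled in a few lines. This is a toy model written for illustration, not code from GPGPU-Sim: the struct layout, the sizes and every name in it are assumptions, and register banking is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8
#define NUM_REGS   16

/* One load response: destination register index and the loaded value. */
struct mem_response { unsigned dst_reg; uint32_t value; };

struct response_fifo {
    struct mem_response slots[FIFO_DEPTH];
    unsigned head;   /* index of the oldest entry */
    unsigned count;  /* number of valid entries   */
};

/* Push a response returning from the interconnect; false if the FIFO is full. */
bool fifo_push(struct response_fifo *f, struct mem_response r)
{
    if (f->count == FIFO_DEPTH)
        return false;
    f->slots[(f->head + f->count) % FIFO_DEPTH] = r;
    f->count++;
    return true;
}

/* One writeback cycle: pop at most one response and write it into the
 * register file, unless the FIFO is empty or writeback is stalled.
 * Returns true if a register was written this cycle. */
bool writeback_cycle(struct response_fifo *f, uint32_t regs[NUM_REGS],
                     bool writeback_stalled)
{
    if (f->count == 0 || writeback_stalled)
        return false;
    struct mem_response r = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    regs[r.dst_reg % NUM_REGS] = r.value;
    return true;
}
```

For a = b + c, the add can only issue after two such cycles have deposited b and c in their registers. The execution unit then reads operands from the register file, never from the cache directly.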

Reading USB device Vendor ID and Device ID from PCI config space (EFI)

I want to get the Vendor ID and Device ID of a plugged-in USB device from an EFI program. I can read the whole PCI config space, I can find the USB host controller to which my USB device is plugged in, and I can also read the whole memory range addressed for this controller, but I don't know what exactly I am searching for in this memory to get these IDs. Can someone help me?
I will answer for xHCI. The USB protocols define a setup packet, which the xHCI Driver must construct and the xHCI controller hardware converts it to a USB transaction with the device – there are no registers that it accesses directly for this information like with PCIe.
After a hardware reset, all root hub ports will be in a disconnected state. The port will be powered on and waiting for a device connect. When the hardware detects a device attach, it sets the Current Connect Status and Connect Status Change flags in the PORTSC register to 1, and this action will cause the PSCEG signal to go high at the same time as it is a logical OR of the PORTSC register bits. This signal generates a Port Status Change Event in the controller, which causes the xHCI controller hardware to put a packet (known as a Transfer Request Block) on the Event Ring. The Event Ring segments and table were allocated from the non-paged pool and initialised during xHCI controller enumeration, likely during the StartDevice routine of the xHCI Driver, which is called when it is loaded; it also initialises the Event Ring Registers in the device MMIO space.
After enqueuing the event on the ring, the hardware triggers an interrupt at a particular offset into the MSI-X table (the MSI-X offset from a BAR and BAR no. are stored in the MSI-X capability in the PCIe configuration space of the xHCI controller; hence, the MSI-X table is in the MMIO space). The Primary Event Ring always receives all Port Status Change events. The Primary Event Ring is always mapped to the first MSI-X interrupt. The MSI-X interrupt will travel as a standard PCIe MSI to the LAPIC and it will queue an interrupt in the IRR for the vector specified in the MSI-X table store data. The xHCI Driver previously registered an ISR at the IDT entry corresponding to the vector using kernel and HAL APIs. The xHCI Driver's ISR realises the TRB packet is a Port Status Change event; it can evaluate the port ID to determine the root hub port that was the source of the change event (e.g. 5) and then examine the 5th PORTSC register to see what change has taken place, which it accesses at a certain offset from the operational base, which is an offset from the address in the BAR of the xHCI controller's PCIe configuration space that was allocated during PCIe enumeration at boot (MMIO base); the formula is Operational base + (400h + (10h*(n-1))), where n is the port number 1 through MaxPorts and Operational base is the value in the CAPLENGTH register + MMIO base.
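The PORTSC address formula quoted above is easy to get wrong by one port, so here is a small sketch of it. The function names are hypothetical, and `mmio_base` stands for whatever address the driver has mapped the controller's BAR to.

```c
#include <stdint.h>

/* Offset of the PORTSC register for root hub port n (1-based) from the
 * operational base: 400h + 10h * (n - 1). */
uint32_t portsc_offset(unsigned port)
{
    return 0x400u + 0x10u * (port - 1u);
}

/* Address of PORTSC for a port. The operational base is the MMIO base
 * plus the value read from the CAPLENGTH register. */
uintptr_t portsc_address(uintptr_t mmio_base, uint8_t caplength, unsigned port)
{
    return mmio_base + caplength + portsc_offset(port);
}
```

For the port 5 example in the text, the register sits 440h bytes past the operational base.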
The xHCI Driver notifies the Root Hub Driver that a device has arrived (I think by calling a Root Hub callback function, which it accesses through the PDO of the Root Hub maybe), and the Root Hub Driver creates a PDO for it and notifies the PnP manager that the set of child devices has changed for the Root Hub. Either the xHCI Driver automatically assigns a Slot ID and does the address device TRB procedure silently before calling the callback function and provides the slot ID on the spot, or the Hub Driver has to initiate this by sending a URB to the xHCI controller to request a slot ID be assigned and returned to it when it is informed that there is a port status change on a certain Port ID (I'm not sure. And I'm not sure which data structures are controlled by the hub driver instead of the xHCI driver, so this is a guess). When the xHCI Driver receives the URB, it will send an enable slot TRB on the command ring and get the slot ID from the command completion TRB on the event ring; it will allocate a Device Context structure (known as Output Device Context) and pointer to it in the Device Context Base Array, which the xHCI controller maintains in non-paged pool. In the URB response, the hub driver receives the slot ID; the hub driver then allocates an Input Device Context in memory. The Add Context flags for the slot context and endpoint 0 context in the Input Control Context structure in the Input Device Context are set to 1 to indicate they need to be added. The slot context in the input device context structure is then allocated with port number, root string and number of endpoints. The Endpoint 0 (aka. default control) context data structure in the Input Context must be configured with valid values for the TR Dequeue pointer (points to the transfer ring it allocates) , EP type, Error Count and Max Packet Size fields. The MaxPStreams, Max Burst Size and EP state values should be set to 0. 
The Input Context structure pointer is sent by the hub in an address device command addressed to the Slot ID of the new device, which is sent via a URB to the xHCI Driver and it places the address device TRB on the command ring and the Host Controller will copy the Input Context to the Output Context pointed to by the DCBA entry for the slot.
The Hub Driver then sends URBs to the xHCI Driver to get the device descriptor, in the following form:
Status = SubmitRequestToRootHub(RootHubDeviceObject,
IOCTL_INTERNAL_USB_SUBMIT_URB,
Urb,
NULL);
All URBs sent after the Slot ID is returned to the hub will contain the slot ID. It will also link ChildDeviceObject->DeviceExtension->UsbDeviceHandle to Urb->UrbHeader.UsbdDeviceHandle, which makes the PDO the hub allocated for the new device accessible through the URB. RootHubDeviceObject is the PDO of the Hub Driver, which is owned by the xHCI Controller Driver (or usbxhci-usbport pair), which will be used in the call to IoCallDriver inside this routine. The URB will be of type GET_DESCRIPTOR. The IRP is then initialised with a major code of IRP_MJ_INTERNAL_DEVICE_CONTROL and the stack location is initialised with the URB as one of the parameters and StackPtr->Parameters.DeviceIoControl.IoControlCode = IoControlCode;. It then calls IoCallDriver on the RootHubDeviceObject, which is owned by the xHCI Driver.
The xHCI driver will use Slot ID specified in the URB to index into the DCBA array and the doorbell array. It goes to the Control (Default, 0) Endpoint descriptor, which is at index 1 in the slot's device context array pointed to by DCBA[slotID] (slot context is at index 0) and it will write a Setup stage TD (Which consists of a single Setup TRB) to the Enqueue Pointer specified in the default (control) endpoint descriptor (which I presume is autoset to the same address as the dequeue pointer originally in the input device context when the xHCI controller handles the address device command), which is a physical address in RAM that the xHCI controller reads from using PCIe TLP transactions. In the TRB, it specifies the TRB type (Setup); Transfer Type (IN); Transfer Length (Device Descriptor size = 18); Immediate data = 0 (not sure what this is but only the setup stage appears to have it toggled to 1); Interrupt on completion (no); bmRequestType = 80h and bRequest = 6 together specify the GET_DESCRIPTOR request type; wValue is set to type: Device Descriptor, which is 0100h; and then wLength is 18 (length of the device descriptor). It then advances the Endpoint 0 Transfer ring Enqueue Pointer (adds the size of the previous TD to it). It then writes a Data Stage TD at the location of the new Enqueue Pointer that it wrote; but, indeed, it uses the virtual address the xHCI Driver defined over the MMIO space on xHCI enumeration (it used MmMapIoSpace on the CM_RESOURCE_LIST in Parameters.StartDevice.AllocatedResourcesTranslated in the start device routine) to write to the location in RAM because system software cannot use physical addresses unlike PCIe devices (the Host Controller). The Data Stage TD consists of one TRB with TRB type set to Data Stage TRB; Direction = 1; TRB transfer length = 18; Chain bit = 0; IOC = 0 (doesn't interrupt because only the Status Stage causes an interrupt i.e. 
when it is done); Immediate data = 0; Data buffer pointer (the 64 bit physical address that the xHCI controller will write the response to which is translated from the virtual address provided by the hub driver); and the Cycle bit (Current producer cycle state (toggled to 1 or 0 based on the enqueue pointer wrapping round the ring. The producer toggles the Cycle Bit from 0 to 1 if it encounters a link TRB (it reads before writing to a location to make sure there is not a Link TRB already there that points to the start of the ring)). It then advances the Enqueue pointer again. Finally it writes a Status Stage TD that consists of a single status TRB with TRB type = Status Stage TRB; Direction = '0'; TRB transfer length = 0; Chain bit = 0; Interrupt On Completion = 1; Immediate data = 0; Data buffer pointer = 0 (there isn't one as it's just a status stage); and Cycle bit = (Current producer cycle state).
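The request fields named above (bmRequestType = 80h, bRequest = 6, wValue = 0100h, wLength = 18) are the 8 bytes of USB setup data that the Setup Stage TRB carries. A minimal sketch of filling them in follows; the struct and function names are assumptions for illustration and do not come from any Windows or EDK II header.

```c
#include <stdint.h>

/* The 8-byte USB setup data. */
struct usb_setup_packet {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
};

#define USB_REQ_GET_DESCRIPTOR 0x06u
#define USB_DT_DEVICE          0x01u
#define USB_DEVICE_DESC_SIZE   18u

/* Build a GET_DESCRIPTOR(Device) request. */
struct usb_setup_packet make_get_device_descriptor(void)
{
    struct usb_setup_packet p;
    p.bmRequestType = 0x80u;  /* device-to-host, standard, recipient = device */
    p.bRequest = USB_REQ_GET_DESCRIPTOR;
    p.wValue = (uint16_t)(USB_DT_DEVICE << 8);  /* type in high byte, index 0 */
    p.wIndex = 0;
    p.wLength = USB_DEVICE_DESC_SIZE;
    return p;
}
```

The device answers in the data stage with the 18-byte device descriptor, which is where the Vendor ID and Product ID the question asks about actually live.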
The xHCI Driver then indexes into the Doorbell Array using the Slot ID and writes a sequence to the doorbell register at that index, which indicates that the Control EP 0 Enqueue Pointer was updated. The Host Controller then kicks into action and reads the TRBs, incrementing the Dequeue Pointer; and, when it is equal to the Enqueue Pointer, it stops. For each TRB, it sends the appropriate packet to the device. When it processes the Status stage TRB it will cause an interrupt on the Event Ring (I think ring 0), which causes an MSI-X interrupt, as stated before, to the LAPIC of the CPU to the specified vector, which will be picked up by the xHCI Controller ISR and DPC. The ISR will queue a DPC that will complete the IRP. The descriptor will be at the virtual address specified in the URB IRP by the Hub Driver.
The Hub Driver inserts the information it received in the URB IRPs into the PDO->DeviceExtension field, which is a pointer to a driver defined structure with which it can do what it wants, which means the information is essentially cached and no more URBs need to be sent to the xHCI Driver. The hub also sends GET_CONFIGURATION URBs to the xHCI Driver for each configuration number that the device reported in the Device Descriptor. The xHCI Driver would then pass that configuration value to the device in a GET_CONFIGURATION TRB with the correct configuration number, and the whole configuration hierarchy for that configuration number would be returned to the Hub Driver at the address specified in the URB.
The hub driver then alerts the PnP manager to the arrival of the device by calling IoInvalidateDeviceRelations() with Type parameter of BusRelations and a pointer to its PDO (Physical device object) that it was assigned by the xHCI Driver. The PnP manager queries the Device Stack of the PDO for the current list of devices on the bus using an IRP_MN_QUERY_DEVICE_RELATIONS request; to do so, it initialises an IRP structure (ideally reusing one from a lookaside list based on the stacksize hint in the device object; otherwise, it directly allocates memory from the non-paged pool for a new one). The IRP points to the stack (which is contiguous with the IRP) through the CurrentStackLocation member. It then initialises the first stack location for the call it wants to perform (In this case a Major function of IRP_MJ_PNP and a minor function of IRP_MN_QUERY_DEVICE_RELATIONS). It then calls the driver at the top of the Device Stack of the PDO that was sent, which could be an upper filter driver (which would just not implement that minor function and the function body would be code to pass it down – we will assume there isn't one for now). So, the top of the stack will be the FDO of the hub (which it reaches using IoGetAttachedDevice, which is the top of the stack). It calls it using IoCallDriver(*FDO, *IRP), a wrapper of IofCallDriver, which gets the next stack location by decrementing the CurrentStackLocation pointer, which causes it to point to the next stack location through the rules of C pointer arithmetic (which is the first stack location as the pointer was initialised one after it), and then it uses the major function number IRP_MJ_PNP indicated in the stack location to index into the MajorFunction array of the driver object of the FDO that was passed to IoCallDriver (Hub Driver) and calls the function at that position in the array.
The code for that call looks like this:
return FDO->DriverObject->MajorFunction[StackPtr->MajorFunction](FDO,
Irp);
You can see it passes the IRP. This allows the USB Hub Driver's function handler for IRP_MJ_PNP to check the minor function at the current stack location and then call the correct internal function. For each child device, the handler references the PDO in the DEVICE_RELATIONS structure, which is just an array of PDO pointers. It then sets Irp->IoStatus.Information to a pointer to the array and returns. The Plug and Play Manager then looks at the array of PDOs and compares the addresses to the addresses of the PDOs on the device tree that it has already enumerated. If there are new addresses, it queries the Device and Instance IDs and the resource requirements by sending a series of IRPs such as IRP_MN_QUERY_ID, IRP_MN_QUERY_CAPABILITIES, IRP_MN_QUERY_DEVICE_TEXT, IRP_MN_QUERY_BUS_INFORMATION, IRP_MN_QUERY_RESOURCES and IRP_MN_QUERY_RESOURCE_REQUIREMENTS; and, if any of the PDOs have been marked inactive, it also sends an IRP_MN_SURPRISE_REMOVAL to those PDOs using the same IRP initialisation process as previously described (the FDOs of the devices will not implement the surprise removal function and pass it down to the Hub Driver) and the Hub Driver will disable the device and release hardware resources assigned to it.
To query the Device and Instance IDs, the PnP Manager sends an IRP_MN_QUERY_ID (one for Device ID and a separate one for Instance ID) to each PDO in the array whose pointer it received that is new (which will be the PDO for the new device that was created by the Root Hub Driver). In the IRP_MN_QUERY_ID, it initialises the Parameters.QueryId.IdType member of the stack location to BusQueryDeviceID. The hub driver receives the PDO queried by the PnP manager through the IRP_MJ_PNP handler it installed on DriverEntry, and in response to this Device ID query, the Hub Driver needs to query the device for the information required to build and concatenate the Device ID, but remember it already did this as soon as the PDO was created (it already got the device descriptor) so it can just use the DeviceExtension in which the information from the device descriptor was inserted; the idVendor and idProduct fields of the device descriptor supply the VID and PID parts of the Device ID. The Instance ID is a device identification string that distinguishes a device from other devices of the same type; it is either a number supplied by the bus usually unique to a slot/port (if UniqueID in DEVICE_CAPABILITIES structure acquired when the PnP manager queried the device capabilities is FALSE), otherwise it is an identifier unique to the device (supplied by the device) obtained from the iSerialNumber of the device descriptor. The InstanceID is queried by the PnP manager in a separate call afterwards, using Parameters.QueryId.IdType = BusQueryInstanceID in the IRP. The PnP manager then concatenates the device ID, say USB\VID_1C4F&PID_0002, with the instance ID 7&15cdfaa&0&3 to get the Device Instance ID (DIID) USB\VID_1C4F&PID_0002\7&15cdfaa&0&3. The obvious result is that there will be a single registry entry for each device with a serial number and multiple entries for a device without a serial number.
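As an illustration of how the DIID string relates to the descriptor bytes, the sketch below pulls idVendor and idProduct out of a raw 18-byte device descriptor (little-endian fields at byte offsets 8 and 10) and formats them in the USB\VID_xxxx&PID_xxxx\instance pattern shown above. The function names are hypothetical; this is not how usbhub.sys is actually written.

```c
#include <stdint.h>
#include <stdio.h>

/* idVendor and idProduct are little-endian 16-bit fields at byte offsets
 * 8 and 10 of the USB device descriptor. */
uint16_t desc_id_vendor(const uint8_t *desc)
{
    return (uint16_t)(desc[8] | (desc[9] << 8));
}

uint16_t desc_id_product(const uint8_t *desc)
{
    return (uint16_t)(desc[10] | (desc[11] << 8));
}

/* Hypothetical helper: format "USB\VID_xxxx&PID_xxxx\<instance>" into out.
 * Returns the snprintf result (the length that was or would be written). */
int format_device_instance_id(char *out, size_t out_len,
                              const uint8_t *desc, const char *instance_id)
{
    return snprintf(out, out_len, "USB\\VID_%04X&PID_%04X\\%s",
                    (unsigned)desc_id_vendor(desc),
                    (unsigned)desc_id_product(desc),
                    instance_id);
}
```

Feeding it a descriptor with idVendor 1C4Fh and idProduct 0002h plus the instance ID 7&15cdfaa&0&3 reproduces the example DIID from the text.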
The PnP manager then uses the DIID to index into the registry at HKLM\SYSTEM\CurrentControlSet\Enum\Bus\DeviceID\InstanceID.
On my own system:
In it is a classguid value that leads to a class subkey under HKLM\SYSTEM\CurrentControlSet\Control\Class\<GUID>, which might be a keyboard class for instance. These values are filled in by driver .INF files.
The PnP manager checks the registry for the presence of a corresponding function driver, and when it doesn’t find one, it informs the user-mode PnP manager of the new device by its DIID. The user-mode PnP Manager first tries to perform an automatic install without user intervention. If the installation process involves the posting of dialog boxes that require user interaction and the currently logged-on user has administrator privileges, the user-mode PnP manager launches the Rundll32.exe application (the same application that hosts Control Panel utilities) to execute the Hardware Installation Wizard (%SystemRoot%\System32\Newdev.dll). If the currently logged-on user doesn’t have administrator privileges (or if no user is logged on) and the installation of the device requires user interaction, the user-mode PnP manager defers the installation until a privileged user logs on. The Hardware Installation Wizard uses Setupapi.dll and CfgMgr32.dll (configuration manager) API functions to locate INF files that correspond to drivers that are compatible with the detected device.
It selects the best-matching .INF file by ranking the candidates: it looks for Compatible IDs in each .INF file that match the Hardware / Compatible IDs generated from the DIID, which were inserted into the device extension. If it finds one, it installs the driver. Installation happens for each new device that is plugged in and is essentially just the filling in of the registry with the correct information under that DIID.
Installation is performed in two steps. In the first, the third-party driver developer imports the driver package into the driver store, and in the second step, the system performs the actual installation, which is always done through the %SystemRoot%\System32\Drvinst.exe process.
Control passes back to the PnP Manager and it uses the registry keys to load drivers in the following order:
Any lower-level filter drivers specified in the LowerFilters value of the device’s enumeration key.
Any lower-level filter drivers specified in the LowerFilters value of the device’s class key.
The function driver specified by the Service value in the device’s enumeration key. This value is interpreted as the driver’s key under HKLM\SYSTEM\CurrentControlSet\Services.
Any upper-level filter drivers specified in the UpperFilters value of the device’s enumeration key.
Any upper-level filter drivers specified in the UpperFilters value of the device’s class key.
USB drivers will have an intermediate devnode – usbccgp if it is a composite device and usbstor if it is a Mass Storage Device, which can be seen here. When the Hub Driver sends the DIID, it is usbstor that gets loaded by the PnP Manager, as can be seen in the image above. (We need the intermediate USB storage node to translate generic disk.sys IRPs to URBs and handle USB specific drive configuration rather than cramming the functionality all in usbhub.sys).
It loads the drivers and calls the DriverEntry function of each and then the AddDevice routine if they aren't already running (for another USB device that uses the same driver); otherwise, it just calls the AddDevice routine. The AddDevice routine creates an FDO for the passed PDO. In its AddDevice routine, the Filter Driver creates a Filter Device Object (FiDO) and attaches it to the Device Stack (IoAttachDeviceToDeviceStack). Then, the PnP Manager creates a Device Node and associates it with the PDO. The PnP Manager already acquired the Bus Device's (Hub Driver's) opinion on the resources of the device earlier, using IRP_MN_QUERY_RESOURCE_REQUIREMENTS at the same time as sending IRPs for the Device ID. The PnP Manager now sends an IRP_MN_FILTER_RESOURCE_REQUIREMENTS using IoCallDriver at the top of the new Device Node with the FDO specified. Only the FDO driver handles this and it will alter any requirements for the device object that it needs that the Hub Driver was unable to predict. A USB Mass Storage Device doesn't require an interrupt as it will just use the primary event ring. (If it did, it would specify in the response to the IRP the number of MSI-X messages it requires.) Once the PnP manager assigns resources to the device, it sends an IRP_MN_START_DEVICE IRP to the Device Stack. Although each USB device can have a separate interrupt and corresponding event ring, it doesn't really matter because it's always going to be the same lowest level driver that responds to interrupts: the xHCI driver; the USB devices do not have ISRs to register. Therefore all USB devices can share the single event ring and single interrupt.
In the usbstor routine that handles the Start Device IRP, the IRP gets passed down to the bus device (usbhub) after the routine sets an IoCompletionRoutine. The IoCompletionRoutine, when it is eventually called, will send a GET_CONFIGURATION URB which will be passed down to its PDO (owned by usbhub), and usbhub will present the configuration that it cached in that PDO device extension earlier. Usbstor eventually decides on the configuration to use and sends a SET_CONFIGURATION URB. Usbhub will fill in an input context from the cached configuration with endpoints, e.g. ISOCH IN, INTERRUPT IN, and adds the input context pointer to the URB. Usbhub then adds more information like the Slot ID and sends the URB to the xHCI Driver, which picks it up and inserts a TRB on the Default Endpoint 0 Transfer Ring for the device; in response, the Host Controller populates the Output Device Context of the slot from the Input Device Context according to the Input Control Context and informs the device about the configuration selected.
Usbstor will then allocate a PDO and call IoInvalidateDeviceRelations with BusRelations. When the PnP manager gets to it, usbstor will request the device information which was stored earlier in its PDO belonging to usbhub, and it will translate the DIID to a standard format for usbstor recognisable by disk.sys .inf drivers (format \Disk&Ven_xxx&Prod_xxx&Rev_x.xx) and prepend a USBSTOR\ prefix; this allows the registry lookup to load disk.sys, and partmgr.sys as a filter stored in the disk.sys class subkey. Usbstor is now the bus device for the device.
Disk.sys FDO will check a driver maintained table to see how many disks (N) have been enumerated in the system and will name the FDO \Device\HarddiskN\DRN. Partmgr.sys creates a symbolic link \Device\HarddiskN\Partition0 to \Device\HarddiskN\DRN. Partmgr will then call IoReadPartitionTableEx and create partition PDOs for each of the partitions, naming them \Device\HarddiskN\PartitionM and so on. For each partition, it sends an IOCTL_INTERNAL_VOLMGR_PARTITION_ARRIVED IRP to volmgr and provides the disk signature and offset to the partition. Volmgr creates a volume PDO for each volume and creates symlinks between \Device\HarddiskN\PartitionM and the volume PDO name \Device\HarddiskVolumeX (X >= 1) it assigns to the volume PDO (actually because of this symbolic link, the partition PDO with that name can never be accessed, and the PDO itself lacks a devnode and is internally managed by partmgr) and sends MntMgr an IRP_MJ_DEVICE_CONTROL request, specifying IOCTL_MOUNTMGR_VOLUME_ARRIVAL_NOTIFICATION by calling IoBuildDeviceIoControlRequest for each volume PDO (HarddiskVolume1 is always the first volume on disk.sys's \Device\Harddisk0\DR0). The Mount Manager responds by querying volmgr for the volume’s nonpersistent device object name (by sending 3 IOCTLs (control IRPs)) located in the Device directory of the system object tree (for example: "\Device\HarddiskVolume1"), the Unique ID generated for the volume and a suggested persistent symbolic link name e.g. \DosDevices\D:.
The persistent drive letter and mount points are stored with identical data fields. The data of the values is called the unique ID which is provided by volmgr to mntmgr with IOCTL_MOUNTDEV_QUERY_UNIQUE_ID. The Unique ID for basic disks is the signature and offset to the partition. The mount manager uses the suggested name if the mount manager's database does not already contain a persistent drive letter name for the volume paired with the unique ID. Otherwise, it ignores the suggestion and uses the drive letter name in its persistent name database. It ties \DosDevices\D: to the unique ID and develops a GUID and ties the GUID in the form \??\Volume{GUID} to the unique ID in its namespace and then it creates object manager namespace symbolic links between them. Later mount points i.e. \DosDevices\D:\mymount will also be paired with the unique ID. In photo above, the MBR drive contains A: (system reserved) and C: partitions and it is clear they have the same MBR signature and a different offset. The D: drive is a GPT drive and has a GPT signature and offset. E: is a USB MBR drive with a USBSTOR specific signature and offset.
When a file on C: is opened for the first time, the file system will not be mounted, so when the file path string is parsed in IopParseDevice, which calls IopCheckVpbMounted on the volume (C: corresponds to device name \Device\HarddiskVolume2 due to the symbolic link created by the mount manager), it will call IopMountDevice because VPB->DeviceObject == NULL, which sends an IRP_MJ_FILE_SYSTEM_CONTROL/IRP_MN_MOUNT_VOLUME to each file system that registered itself using IoRegisterFileSystem. The called FSs handle IRP_MN_MOUNT_VOLUME and determine if their file system is on the media and, if it is, the file system creates the File System Volume Device Object (VDO) and puts it in the VPB. C: gives the device and \file gives the file object. \ is the root directory object. IoGetRelatedDeviceObject gets the top of the VDO stack from the file object (performs IoGetAttachedDevice on FileObject->Vpb->DeviceObject).
NTFS driver internal structures:
Windows USB Drive device tree
Overview of the MMIO space
How the xHCI controller is enumerated
At system boot, pci.sys will be loaded by acpi.sys and one of the things it does is call IoInvalidateDeviceRelations, which triggers the PnP manager to send an IRP_MN_QUERY_DEVICE_RELATIONS, to which it responds with a list of PDOs. It creates the list on the spot by using the MCFG ACPI table's Base address of enhanced configuration mechanism field to get the base of the PCIe configuration space (PCIEXBAR) and then iterating through it on 4096 byte boundaries, and for any Vendor/Device IDs it finds on the boundaries, it creates a PDO and pairs it with the configuration no. One of the devices will be the xHCI controller. It goes through exactly the same process as described above, eventually creating a DIID and checking the registry etc., which will cause the xHCI Driver to be loaded; and the PnP Manager also asks the bus for resources its child will require with an IRP_MN_QUERY_RESOURCE_REQUIREMENTS to the PDO half stack (which will also be handled by the ACPI bus filter). After the xHCI Driver is loaded, it sends an IRP_MN_FILTER_RESOURCE_REQUIREMENTS to the xHCI Driver so that it can make modifications to the resource requirements. The Devnode of the xHCI controller PDO receives an IRP_MN_START_DEVICE, and the xHCI Driver notices it is for its own PDO and sets an IoCompletionRoutine and passes it down to pci.sys, which will see that the passed PDO is a child, and it will program the MMIO physical range decided on by the PnP manager, which it received in the resource list in the start device IRP parameters, into the BAR and also sets up an MSI-X interrupt decided upon in the IRP_MN_QUERY_RESOURCE_REQUIREMENTS and IRP_MN_FILTER_RESOURCE_REQUIREMENTS and calls IoCompleteRequest, which calls the IoCompletionRoutine the xHCI Driver set, which will call MmMapIoSpace over the CmResourceTypeMemory descriptors in the CM_RESOURCE_LIST in Parameters.StartDevice.AllocatedResourcesTranslated.
It will create the Event Ring, Command Ring and DCBAA (Device Context Base Address Array) in the virtual address space received from MmMapIoSpace and set the registers in the MMIO space to point to them. It then associates the Event Ring with the MSI-X vector received in IRP_MN_START_DEVICE. It will then set up the ISR using IoConnectInterrupt and register DPCs. I'm not sure how the USB Hub Driver is loaded, but potentially it is done in StartDevice of the xHCI driver; it could call IoInvalidateDeviceRelations and say it only has one child, the Hub. It provides a DIID affixed with IUSB3\.
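The walk over the configuration space on 4096-byte boundaries described above can be sketched as follows. This is only an illustration of the standard ECAM layout (bus in bits 27:20, device in bits 19:15, function in bits 14:12), not what pci.sys actually does internally. `read_u32` is a hypothetical callback standing in for a read through a mapped view of the window, and the sketch probes every function instead of first checking the multi-function bit.

```python
ECAM_FUNCTION_SIZE = 4096  # one 4 KiB configuration block per function


def ecam_offset(bus: int, device: int, function: int) -> int:
    """Byte offset of a function's configuration block from the ECAM base."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    return (bus << 20) | (device << 15) | (function << 12)


def scan(read_u32, ecam_base: int, bus_count: int = 1):
    """Yield (bus, device, function, vendor_id, device_id) for populated slots.

    read_u32 is a hypothetical helper returning the 32-bit value at a
    physical address; it is an assumption of this sketch, not a real API.
    """
    for bus in range(bus_count):
        for device in range(32):
            for function in range(8):
                addr = ecam_base + ecam_offset(bus, device, function)
                ids = read_u32(addr)  # offset 0: Vendor ID low 16 bits, Device ID high 16 bits
                vendor_id = ids & 0xFFFF
                if vendor_id == 0xFFFF:  # all ones: nothing answers at this slot
                    continue
                yield bus, device, function, vendor_id, ids >> 16
```

Each hit corresponds to one of the PDOs in the answer's description; the tuple of bus, device and function is what identifies the configuration block.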
PCI config space shows you PCI and PCI Express devices, not USB devices.
PCI config space will show you the Vendor and Device ID of the USB controller, but not the attached devices. You would have to enumerate the USB bus by reading/writing USB registers for that.
Note that taking over the USB controller will break currently-running USB stack and kill your USB keyboard and boot devices.
If you are at the UEFI shell, perhaps you can find what you need in the output of devtree.
If you are writing your own UEFI DXE code, it would have to query the USB driver.
Despite the question already being answered and marked accepted, I would just like to wave a flag for using:
EFI_PCI_IO_PROTOCOL for PCI operations
EFI_USB_IO_PROTOCOL for interacting with USB devices regardless of what bus the host controller happens to be connected to.
This way your application ends up being portable between all compliant UEFI platforms.
User #fpmurphy who posts answers here occasionally has examples of both in their github area.

multicast packet loss - running two instances of the same application

On Redhat Linux, I have a multicast listener listening to a very busy multicast data source. It runs perfectly by itself, with no packet loss. However, once I start a second instance of the same application with exactly the same settings (same src/dst IP address, sock buffer size, user buffer size, etc.), I start to see very frequent packet losses from both instances, and they lose exactly the same packets. If I stop one of the instances, the remaining one returns to normal without any packet loss.
Initially, I thought it was a CPU/kernel load issue: maybe it could not get the packets out of the buffer quickly enough. So I did another test. I kept one instance of the application running, but then started a totally different multicast listener on the same computer, using the second NIC card and listening to a different but even busier multicast source. Both applications ran fine without any packet loss.
So it looks like one NIC card is not powerful enough to support two multicast applications, even though they listen to exactly the same thing. The possible cause of the packet loss might be that, in this scenario, the NIC card driver needs to copy the incoming data to two sock buffers, and this extra copy task is too much for the ether card to handle, so it drops packets. Any deeper analysis of this issue, and any possible solutions?
Thanks
You are basically finding out that the kernel is inefficient at fan-out of multicast packets. In the worst case, for every incoming packet the code allocates two new buffers, the SKB object and the packet payload, and copies the NIC buffer twice.
Take the best case instead: for every incoming packet a new SKB is allocated, but the packet payload is shared between the two sockets with reference counting. Now imagine what happens with two applications, each on its own core and on separate sockets. Every reference to the packet payload is going to cause the memory bus to stall whilst both core caches have to flush and reload, and on top of that each application has to context switch into the kernel and back to pass the socket payload. The result is terrible performance.
You aren't the first to encounter such a problem and many vendors have created solutions to it. The basic design is to limit the incoming data to one thread on one core on one socket, then have that thread distribute the data to all other interested threads, preferably using user space code building upon shared memory and lockless data structures.
Examples are TIBCO's Rendezvous and 29 West's Ultra Messaging showing a 660ns IPC bus:
http://www.globenewswire.com/newsroom/news.html?d=194703
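The single-reader design described in this answer can be sketched roughly as below. This is a toy illustration only: it swaps in Python threads and `queue.Queue` (which uses locks) for the shared-memory, lockless structures the answer recommends, and `source` is any iterable of packets standing in for a loop around the one multicast socket. All function names are made up for the example.

```python
import queue
import threading


def fan_out(source, subscriber_queues):
    """Read each packet once and hand the same object to every subscriber."""
    for packet in source:
        for q in subscriber_queues:
            q.put(packet)
    for q in subscriber_queues:
        q.put(None)  # sentinel: no more packets


def consume(q, sink):
    """Drain one subscriber queue into sink until the sentinel arrives."""
    while True:
        packet = q.get()
        if packet is None:
            break
        sink.append(packet)


def run_demo(packets, subscriber_count=2):
    """Distribute packets from one reader to several consumer threads."""
    queues = [queue.Queue() for _ in range(subscriber_count)]
    sinks = [[] for _ in range(subscriber_count)]
    workers = [
        threading.Thread(target=consume, args=(q, s))
        for q, s in zip(queues, sinks)
    ]
    for w in workers:
        w.start()
    fan_out(packets, queues)  # the only place that touches the "socket"
    for w in workers:
        w.join()
    return sinks
```

The point of the shape is that only one socket joins the group, so the kernel delivers each packet once; the duplication happens in user space, where it can be made as cheap as passing a reference.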

When do USB Hosts require a zero-length IN packet at the end of a Control Read Transfer?

I am writing code for a USB device. Suppose the USB host starts a control read transfer to read some data from the device, and the amount of data requested (wLength in the Setup Packet) is a multiple of the Endpoint 0 max packet size. Then after the host has received all the data (in the form of several IN transactions with maximum-sized data packets), will it initiate another IN transaction to see if there is more data even though there can't be more?
Here's an example sequence of events that I am wondering about:
USB enumeration process: max packet size on endpoint 0 is reported to be 64.
SETUP-DATA-ACK transaction starts a control read transfer, wLength = 128.
IN-DATA-ACK transaction delivers first 64 bytes of data to host.
IN-DATA-ACK transaction delivers last 64 bytes of data to host.
IN-DATA-ACK with zero-length DATA packet? Does this transaction ever happen?
OUT-DATA-ACK transaction completes Status Phase of the transfer; transfer is over.
I tested this on my computer (Windows Vista, if it matters) and the answer was no: the host was smart enough to know that no more data can be received from the device, even though all the packets sent by the device were full (maximum size allowed on Endpoint 0). I'm wondering if there are any hosts that are not smart enough, and will try to perform another IN transaction and expect to receive a zero-length data packet.
I think I read the relevant parts of the USB 2.0 and USB 3.0 specifications from usb.org but I did not find this issue addressed. I would appreciate it if someone can point me to the right section in either of those documents.
I know that a zero-length packet can be necessary if the device chooses to send less data than the host requested in wLength.
I know that I could make my code flexible enough to handle either case, but I'm hoping I don't have to.
Thanks to anyone who can answer this question!
Read the USB specification carefully:
The Data stage of a control transfer from an endpoint to the host is complete when the endpoint does one of
the following:
Has transferred exactly the amount of data specified during the Setup stage
Transfers a packet with a payload size less than wMaxPacketSize or transfers a zero-length packet
So, in your case, when wLength == transfer size, answer is NO, you don't need ZLP.
In case wLength > transfer size, and (transfer size % ep0 size) == 0 answer is YES, you need ZLP.
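The two cases above can be written as a small predicate for device-side code. This is a minimal sketch of the rule as stated in this answer; the function and parameter names are invented for the example.

```python
def device_needs_zlp(w_length: int, transfer_size: int, ep0_size: int) -> bool:
    """Return True if the device must end the Data stage with a zero-length packet.

    w_length      -- wLength from the Setup packet (bytes the host asked for)
    transfer_size -- bytes the device actually has to send
    ep0_size      -- max packet size of endpoint 0
    """
    if transfer_size > w_length:
        raise ValueError("device must not send more than wLength bytes")
    if transfer_size == w_length:
        # The host already knows the Data stage is complete.
        return False
    # Short transfer: a ZLP is only needed when the last data packet was full,
    # otherwise the short packet itself marks the end.
    return transfer_size % ep0_size == 0
```

For the sequence in the question (wLength = 128, 128 bytes sent, 64-byte endpoint 0) this returns False; if the host had asked for 255 bytes and the device only had 128, it would return True.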
In general, USB uses a less-than-max-length packet to demarcate an end-of-transfer. So in the case of a transfer which is an integer multiple of max-packet-length, a ZLP is used for demarcation.
You see this in bulk pipes a lot. For example, if you have a 4096 byte transfer, that will be broken down into an integer number of max-length packets plus one zero-length-packet. If the SW driver has a big enough receive buffer set up, higher-level SW receives the entire transfer at once, when the ZLP occurs.
Control transfers are a special case because they have the wLength field, so ZLP isn't strictly necessary.
But I'd strongly suggest SW be flexible to both, as you may see variations with different USB host silicon or low-level HCD drivers.
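To make the bulk example in this answer concrete, here is a small sketch that lists the packet sizes for a transfer. It assumes the sender terminates a transfer that is an exact multiple of the max packet size with a ZLP, as the answer describes; whether that happens in practice depends on the protocol and driver, so it is exposed as a flag. The 512-byte max packet size used in the usage note is an assumption (the high-speed bulk value), not something stated in the answer.

```python
def packet_sizes(transfer_size: int, max_packet: int, zlp_terminated: bool = True):
    """Return the list of packet payload sizes for one transfer."""
    if transfer_size < 0 or max_packet <= 0:
        raise ValueError("sizes must be non-negative and max_packet positive")
    full, rest = divmod(transfer_size, max_packet)
    sizes = [max_packet] * full
    if rest:
        sizes.append(rest)  # the short packet marks end-of-transfer by itself
    elif zlp_terminated:
        sizes.append(0)  # every packet was full, so a ZLP marks the end
    return sizes
```

With `packet_sizes(4096, 512)` the result is eight 512-byte packets followed by a single 0; with `packet_sizes(1000, 512)` it is 512 then 488 and no ZLP.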
I would like to expand on MBR's answer. The USB specification 2.0, in section 5.5.3, says:
The Data stage of a control transfer from an endpoint to the host is
complete when the endpoint does one of the following:
Has transferred exactly the amount of data specified during the Setup stage
Transfers a packet with a payload size less than wMaxPacketSize or transfers a zero-length packet
When a Data stage is complete, the Host Controller advances to the
Status stage instead of continuing on with another data transaction.
If the Host Controller does not advance to the Status stage when the
Data stage is complete, the endpoint halts the pipe as was outlined in
Section 5.3.2. If a larger-than-expected data payload is received from
the endpoint, the IRP for the control transfer will be
aborted/retired.
One sentence in that quote (the one beginning "If the Host Controller does not advance to the Status stage") seems to say specifically what the device should do: it should "halt" the pipe if the host tries to continue the data phase after it is done, and it is done once all the requested data has been transmitted (i.e. the number of bytes transferred is greater than or equal to wLength). I think halting refers to sending a STALL packet.
In other words, the device does not need a zero-length packet in this situation and in fact the USB specification says it should not provide one.
You don't have to. (*)
The whole point of wLength is to tell the host the maximum number of bytes it should attempt to read (but it might read less!)
(*) I have seen devices that crash when IN/OUT requests were made at the wrong time during control transfers (when debugging our host solution). So any host doing what you are worried about would have killed those devices and is hopefully not on the market.