Linux Optimus Setup
I’ve finally obtained a laptop. It is an end-of-life Metabox Prime-S P950EP, based on the Clevo P950EP6. Its sole purpose is mobile development and demonstration of `Edict' (and other associated software) at some meetups around Melbourne; I will continue to use my desktop for almost all day-to-day work.
It includes what has become the standard combination of an Intel GPU, for simple desktop workloads, and an nVidia GPU, for the more resource intensive graphics operations.
Naturally (for me) I’m using Linux for the majority of my work. However, I rapidly encountered the constellation of platform quirks and driver frictions that one hears about so often in the Linux community in relation to nVidia devices.
ACPI
If your device hangs with a black screen as your graphical environment is starting, and the GPU is disabled, you may need to work around some ACPI bugs.
My particular laptop appears to require that Windows 10 ACPI functionality be disabled. We can do this by adding an option to the kernel’s command line; in my case by appending it to the default kernel command line in my GRUB configuration.
Other laptops require some combination of `Windows 2009', `Windows 2013', and `Windows 2015', or the negation of some or all of these. Unfortunately, the only way to discover the correct combination may be through brute force.
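As a rough sketch of what such an option can look like, the fragment below removes the `Windows 2015' string (the one Windows 10 reports) via the kernel's acpi_osi parameter. The file is assumed to be the usual /etc/default/grub, and `quiet' is only a placeholder for whatever options are already present; the exact string your laptop needs, and how it has to be quoted, may differ.

```shell
# Fragment of /etc/default/grub, which grub-mkconfig sources as shell.
# "quiet" stands in for any existing options (an assumption).
# acpi_osi="!Windows 2015" asks the kernel not to claim Windows 10 support.
GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_osi=\"!Windows 2015\""
```

After editing, regenerate the GRUB configuration (for example with grub-mkconfig -o /boot/grub/grub.cfg) and reboot.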
bumblebee/bbswitch
The most commonly cited mechanism for switching between GPUs on Linux is a combination of bumblebee and bbswitch.
Bumblebee provides a method to execute an application using a hidden X server (for exclusive use of the nVidia driver), and an environment that is modified to promote nVidia’s libGL.so (and friends) above the default Intel installation.
bbswitch is a kernel module that provides a robust mechanism to power down (and up) the nVidia GPU, and manage the loading and unloading of the nvidia kernel module.
However, given that this appears to trigger ACPI-related system hangs on newer systems, the better option appears to be avoiding the use of bbswitch altogether. Instead we can rely on the kernel’s default PCIe power management facilities.
Blacklisting the nvidia module through modprobe prevents it from being automatically loaded at boot, but does not prevent it from being manually loaded as the module_blacklist kernel parameter does.
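The behaviour described here matches modprobe's blacklist directive. A minimal sketch, assuming the entry lives in a file under /etc/modprobe.d (the file name is hypothetical):

```conf
# /etc/modprobe.d/nvidia-blacklist.conf (hypothetical file name)
# Stops automatic loading via module aliases; an explicit
# "modprobe nvidia" still works.
blacklist nvidia
```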
Bumblebee requires that the PMMethod directive be set to none so as to avoid the use of bbswitch. It will instead default to the kernel’s power management system.
The kernel will only power down the device when the driver is unloaded, so we also require AlwaysUnloadKernelDriver.
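Put together, the two settings might look like the following fragment of /etc/bumblebee/bumblebee.conf. This is a sketch that assumes both keys belong in the [driver-nvidia] section and that AlwaysUnloadKernelDriver takes a boolean value; check it against the configuration shipped with your Bumblebee version.

```ini
# /etc/bumblebee/bumblebee.conf (fragment; section placement is an assumption)
[driver-nvidia]
# Do not use bbswitch; leave power management to the kernel.
PMMethod=none
# Unload the nvidia module once it is no longer in use,
# so the kernel is able to power the device down.
AlwaysUnloadKernelDriver=true
```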
xinitrc.d
Alas, while the driver was not loaded at the point my greeter was displayed, it was loaded at some point while XFCE was starting up.
After evaluating some overkill solutions to answering the `who loaded the module' question via systemtap, I instead used a technique normally used for blacklisting the module.
The install directive will execute the listed command instead of loading the module.
Instead of actually loading the module we’ll dump a list of all running processes in the system. With a bit of luck one might see a likely candidate.
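As a sketch of this trick, a modprobe configuration line along the following lines records the process tree each time something asks for the module. The file name and log path are hypothetical, and ps auxf is just one way to capture the list.

```conf
# /etc/modprobe.d/nvidia-trace.conf (hypothetical file name)
# Run ps instead of loading nvidia, appending the process tree to a log.
install nvidia /bin/ps auxf >> /tmp/nvidia-load-attempts.log
```

Remember to remove the line once the culprit has been found, as modprobe will not actually load the module while it is in place.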
In our case the likely offender was nvidia-settings, which is a sufficiently unique name to just grep `/etc' and come out with a call to `/etc/X11/xinit/xinitrc.d/95-nvidia-settings'. It’s extraordinarily easy to accidentally trigger nVidia binaries/libraries into loading the kernel module, so anything related is a good candidate.
The 95-nvidia-settings script belongs to the nvidia-drivers package, which we obviously can’t remove. But we can disable it by removing execute permissions from the script (and thus punt the problem back to our future selves when we next reinstall the driver and undo our changes).
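The permission change itself is a one-liner. The sketch below wraps it in a small helper (the name disable_xinit_script is made up for illustration) that defaults to the script path found above; it needs to be run as root for that file.

```shell
# disable_xinit_script [PATH]: hypothetical helper that strips the execute
# bits from a script so that session startup skips it.
disable_xinit_script() {
    script="${1:-/etc/X11/xinit/xinitrc.d/95-nvidia-settings}"
    chmod a-x "$script"
}
```

Calling disable_xinit_script with no arguments targets the nvidia-settings script.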
Power Management
Now that the driver is likely to be unloaded by default we can set the PCIe bus to automatically power down when idle.
echo "auto" > /sys/bus/pci/devices/0000:01:00.0/power/control
An easy method for this is to use something like powertop --auto-tune, or to automate it via a `Laptop Mode Tools' rule, or via the systemd tmpfiles.d facilities.
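For the systemd route, a tmpfiles.d entry of type w writes a value to an existing file at boot. A minimal sketch, assuming the GPU sits at the same PCI address as in the echo command above and using a hypothetical file name:

```conf
# /etc/tmpfiles.d/nvidia-runtime-pm.conf (hypothetical file name)
# Type  Path                                             Mode User Group Age Argument
w       /sys/bus/pci/devices/0000:01:00.0/power/control  -    -    -     -   auto
```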
Confirmation
To verify we’ve got the correct behaviour after all this we reboot, log in, and then check:
- The GPU fan isn’t overly loud, and
- lsmod | grep nvidia does not report any loaded modules, and
- optirun glxinfo | grep NVIDIA reports the vendor is some variant of `NVIDIA'
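The module check can be scripted. The sketch below is a hypothetical helper (gpu_is_idle is a made-up name) that reads /proc/modules rather than calling lsmod, and additionally consults the device's power/runtime_status attribute in sysfs, which is not mentioned above. It assumes the GPU is at PCI address 0000:01:00.0.

```shell
# gpu_is_idle [MODULES_FILE] [DEVICE_DIR]
# Hypothetical helper: succeeds only if no nvidia module is loaded and the
# device reports that it is runtime-suspended.
gpu_is_idle() {
    modules_file="${1:-/proc/modules}"
    device_dir="${2:-/sys/bus/pci/devices/0000:01:00.0}"

    # /proc/modules lists one loaded module per line, name first.
    if grep -q '^nvidia' "$modules_file"; then
        echo "nvidia module is still loaded"
        return 1
    fi

    # runtime_status is the kernel's runtime PM state for the device.
    status="$(cat "$device_dir/power/runtime_status")"
    if [ "$status" != "suspended" ]; then
        echo "GPU runtime status is '$status', not 'suspended'"
        return 1
    fi

    echo "nvidia is unloaded and the GPU is suspended"
}
```

Run gpu_is_idle with no arguments after logging in; it complements, rather than replaces, the optirun check.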
I hope this helps someone avoid a goodly number of painful days rebooting their system.