The “Illegal Instruction” problem
If you recently started seeing errors like “illegal instruction” when running scripts or binaries that access the web (for example curl https://example.com), then most probably your CentOS 6 NSS package was just updated.
The impact of this bug is enormous, because every HTTP library that relies on nss/openssl for HTTPS connections will crash.
This includes all kinds of clients: wget, curl, libcurl, PHP + curl, Perl + curl, Perl + LWP … and so on (the list is endless).
The bug makes the script crash immediately. If you strace it, you will see that it ends with “SIGILL”, the signal a process receives when it tries to execute an illegal CPU instruction.
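If strace is not at hand, the exit status gives the same hint. Below is a minimal sketch, assuming a shell such as bash or dash that reports "killed by signal N" as exit status 128 + N, so SIGILL (signal 4) shows up as 132. The function name died_of_sigill is made up for this example.

```shell
#!/bin/sh
# Run a command and report whether it was killed by SIGILL (signal 4).
# Assumption: the shell encodes "killed by signal N" as exit status 128 + N,
# which is what bash and dash do. SIGILL is signal 4, hence 132.
died_of_sigill() {
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 132 ]; then
        echo "SIGILL"
        return 0
    fi
    echo "exit status $status"
    return 1
}

# Usage:
#   died_of_sigill curl -s https://example.com
```

If this prints SIGILL for a plain HTTPS request, the machine is very likely hitting the bug described here.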
The problem has already been reported in the Red Hat Bugzilla. Here is the thread:
https://bugzilla.redhat.com//show_bug.cgi?id=1249426
The problem is mostly seen when your VPS is running under a Xen hypervisor.
What is the temporary fix for this?
As suggested in the Red Hat bug report, the fix is to set the following environment variable:
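To see quickly whether a guest is running under Xen, something like the sketch below may help. It assumes the kernel exposes /sys/hypervisor/type, which on Xen guests normally contains the word xen. The function name hypervisor_type and its optional path argument are only there for illustration.

```shell
#!/bin/sh
# Print the hypervisor type reported by the kernel, or "unknown".
# Assumption: /sys/hypervisor/type exists on Xen guests and contains "xen".
# The optional first argument overrides the path (handy for trying it out).
hypervisor_type() {
    type_file=${1:-/sys/hypervisor/type}
    if [ -r "$type_file" ]; then
        cat "$type_file"
    else
        echo "unknown"
    fi
}

# Usage:
#   [ "$(hypervisor_type)" = "xen" ] && echo "Xen guest, may be affected"
```

An "unknown" result only means the file is missing. It does not prove the machine is unaffected.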
export NSS_DISABLE_HW_GCM=1
Fix script execution and init scripts
So if you run curl, or some other script hitting the bug, you can change your command to:
NSS_DISABLE_HW_GCM=1 curl https://example.com
or even edit your profile, and init scripts:
vim /etc/init.d/apache
--> put "export NSS_DISABLE_HW_GCM=1" at the beginning
vim /etc/init.d/some_other_software
--> put "export NSS_DISABLE_HW_GCM=1" at the beginning
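Editing several init scripts by hand gets tedious, so the step can be scripted. This is a rough sketch, not a tested deployment tool: the helper name add_nss_export is invented here, and it assumes the first line of each init script is the shebang. Try it on a copy first.

```shell
#!/bin/sh
# Insert "export NSS_DISABLE_HW_GCM=1" right after the first line
# (assumed to be the shebang) of a script, unless it is already there.
add_nss_export() {
    file=$1
    line='export NSS_DISABLE_HW_GCM=1'
    # Do nothing if the exact line is already present (safe to re-run).
    if grep -qxF "$line" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Print line 1, then the export line, then the rest of the file.
    awk -v line="$line" 'NR == 1 { print; print line; next } { print }' "$file" > "$tmp" &&
        cat "$tmp" > "$file"   # overwrite in place so mode and owner are kept
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage:
#   add_nss_export /etc/init.d/some_other_software
```

Keep in mind that a package update may replace the init script and silently drop the line again.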
Add it to profile.d
echo "export NSS_DISABLE_HW_GCM=1" > /etc/profile.d/fix_nss_bug.sh
Fix the Cron Jobs
In order to add the environment variable to all newly spawned cron jobs, follow the steps below:
Add the following line to /etc/pam.d/crond, then define the variable in /etc/security/pam_env.conf:
vim /etc/pam.d/crond
auth required pam_env.so
vim /etc/security/pam_env.conf
NSS_DISABLE_HW_GCM DEFAULT=1
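To double check that both files really contain these lines, a small test like the one below can be used. It is only a sketch: the function name cron_nss_fix_present is made up, and the two paths are parameters so that you can point it at copies of the files.

```shell
#!/bin/sh
# Succeed only if both cron-related files contain the lines shown above.
# Commented-out lines (starting with "#") do not count.
cron_nss_fix_present() {
    pam_file=${1:-/etc/pam.d/crond}
    env_file=${2:-/etc/security/pam_env.conf}
    grep -Eq '^[[:space:]]*auth[[:space:]]+required[[:space:]]+pam_env\.so' "$pam_file" &&
        grep -Eq '^[[:space:]]*NSS_DISABLE_HW_GCM[[:space:]]+DEFAULT=1([[:space:]]|$)' "$env_file"
}

# Usage:
#   cron_nss_fix_present && echo "cron fix in place" || echo "cron fix missing"
```

This only checks the configuration files. Whether a cron job actually sees the variable is best confirmed by a temporary job that writes its environment to a file.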
UPDATE (18.06.2016)
Fix PHP and PHP-FPM
If php-fpm or PHP CGI still doesn’t work after fixing the FPM init script or the Apache/Nginx init scripts, there is another dirty fix for this.
I was getting the following lines on one of my servers with php-fpm:
WARNING: [pool www] child 15439 exited on signal 4 (SIGILL) after 11815.774529 seconds from start
WARNING: [pool www] child 18368 exited on signal 4 (SIGILL) after 42061.277718 seconds from start
WARNING: [pool www] child 21132 exited on signal 4 (SIGILL) after 157757.037631 seconds from start
WARNING: [pool www] child 27533 exited on signal 4 (SIGILL) after 132962.416230 seconds from start
WARNING: [pool www] child 5517 exited on signal 4 (SIGILL) after 238677.092402 seconds from start
WARNING: [pool www] child 29918 exited on signal 4 (SIGILL) after 19513.135416 seconds from start
WARNING: [pool www] child 8551 exited on signal 4 (SIGILL) after 102877.660247 seconds from start
WARNING: [pool www] child 3138 exited on signal 4 (SIGILL) after 245.902968 seconds from start
This could be “dirty fixed” with the following steps:
1) Create a PHP file with the following content
root@server # vim temporary_illegal_instruction_php_fix.php
---
<?php
putenv("NSS_DISABLE_HW_AES=1");
$_ENV["NSS_DISABLE_HW_AES"]="1";
---
– Keep in mind there is NO “?>” closing tag.
2) Now you need to prepend the file above to all of your php scripts
Happily, PHP already has such functionality, so you only need to add the following line in your global php.ini file.
root@server # vim php.ini
# .....
auto_prepend_file = "/path/to/your/temporary_illegal_instruction_php_fix.php"
# .....
3) Finally, restart php-fpm or your web server for the change to take effect
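If you need to repeat step 1 on several servers, the prepend file can be written from the shell instead of vim. This is just a sketch: the helper name write_prepend_fix is hypothetical, and the target path is whatever you later put into auto_prepend_file.

```shell
#!/bin/sh
# Write the PHP prepend file from step 1 to the given path.
write_prepend_fix() {
    target=$1
    # The quoted 'EOF' delimiter stops the shell from expanding $_ENV.
    # There is deliberately no closing PHP tag at the end of the file.
    cat > "$target" <<'EOF'
<?php
putenv("NSS_DISABLE_HW_AES=1");
$_ENV["NSS_DISABLE_HW_AES"]="1";
EOF
}

# Usage:
#   write_prepend_fix /path/to/your/temporary_illegal_instruction_php_fix.php
```

Steps 2 and 3 stay the same: point auto_prepend_file at the file and restart the service.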
Fix cPanel Softaculous, RVSiteBuilder or another plugin/cPanel misbehavior
If you happen to use cPanel as a hosting control panel, then you might have problems with some 3rd party plugins (or 1st party), which are affected by this bug.
The problem is that cPanel uses its own web server instance for running the cPanel/WHM Web GUI.
In order to fix their internal PHP, you must do the same as in the “Fix PHP and PHP-FPM” step, with the only difference that you must include the file in their internal php.ini:
/usr/local/cpanel/3rdparty/etc/php.ini
When I first reported this problem to them, they didn’t have any relevant information about it, but they have since created a dedicated post with detailed instructions on how to fix it yourself on a cPanel-based server:
https://forums.cpanel.net/threads/xen-hw_aes-detection-issues-yum-update-illegal-instruction.551681/
Thanks a lot for this information! I’ve been struggling with this all weekend…
This was a big help – thank you!
Thanks for the Info. Ran into this while doing test-work in AWS. Thought it was because t1.micro instances have too little memory (stopping a t1.micro PV instance and restarting as an m1.small or m3.medium also seems to make the problem go away). Nice to see can continue to use t1.micros for lower-cost taskings.
Hi,
That sounds very interesting.
I suppose that Amazon is using Xen-based hypervisors for the t1.micro instances, and KVM (or something different) for the small and medium instances :)
Dunno. They *should* be using Xen for any of the paravirtual instance-types.
I can confirm that in our XenServer environment, we’ve seen the problem only on paravirtualized hosts–not on fully-virtualized hosts. So if you have an option to use HVM virtualization, it may “solve” things as far as nss-based applications are concerned.
Thanks a TON. I spent the better part of a day trying to figure this out after yum updating NSS. It immediately broke my curl calls to all AWS EC2 servers. Even after yum downgrade of all NSS stuff. Scripts calling AWS with cUrl would work fine when called from the shell, but when passed to fast-cgi PHP-FPM in Nginx using a web browser the exact same scripts would cause the SIGILL error mentioned above and terminate the process. This was a tough one to troubleshoot — only affected the AWS remotes… something with SSL (?) Again, thanks! I’m up and running again thanks to you.
Hi,
I’m glad this still helps someone.
I spent a good amount of time troubleshooting it, too :)
Thanks a lot man!