  1. 23 Jan, 2019 2 commits
    • crypto: authenc - fix parsing key with misaligned rta_len · b9119fd2
      Eric Biggers authored
      commit 8f9c4693 upstream.
      
      Keys for "authenc" AEADs are formatted as an rtattr containing a 4-byte
      'enckeylen', followed by an authentication key and an encryption key.
      crypto_authenc_extractkeys() parses the key to find the inner keys.
      
      However, it fails to consider the case where the rtattr's payload is
      longer than 4 bytes but not 4-byte aligned, and where the key ends
      before the next 4-byte aligned boundary.  In this case, 'keylen -=
      RTA_ALIGN(rta->rta_len);' underflows to a value near UINT_MAX.  This
      causes a buffer overread and crash during crypto_ahash_setkey().
      
      Fix it by restricting the rtattr payload to the expected size.
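
      For reference, a minimal userspace sketch of the fixed parsing logic
      (the shape follows crypto_authenc_extractkeys(), but this is an
      illustration of the check, not the verbatim patch):

      	#include <arpa/inet.h>
      	#include <errno.h>
      	#include <linux/rtnetlink.h>
      	#include <stdint.h>

      	struct authenc_key_param {
      		uint32_t enckeylen;	/* big-endian on the wire */
      	};

      	static int extractkeys(const uint8_t *key, unsigned int keylen,
      			       unsigned int *authkeylen, unsigned int *enckeylen)
      	{
      		const struct rtattr *rta = (const void *)key;
      		const struct authenc_key_param *param;

      		if (!RTA_OK(rta, keylen) || rta->rta_type != 1)
      			return -EINVAL;

      		/* The fix: require the payload to be exactly the param
      		 * struct.  sizeof(*param) == 4 is already 4-byte aligned,
      		 * so the subtraction below can no longer underflow when
      		 * the payload is longer but misaligned. */
      		if (RTA_PAYLOAD(rta) != sizeof(*param))
      			return -EINVAL;

      		param = RTA_DATA(rta);
      		*enckeylen = ntohl(param->enckeylen);

      		keylen -= RTA_ALIGN(rta->rta_len);
      		if (keylen < *enckeylen)
      			return -EINVAL;

      		*authkeylen = keylen - *enckeylen;
      		return 0;
      	}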
      
      Reproducer using AF_ALG:
      
      	#include <linux/if_alg.h>
      	#include <linux/rtnetlink.h>
      	#include <sys/socket.h>
      
      	int main()
      	{
      		int fd;
      		struct sockaddr_alg addr = {
      			.salg_type = "aead",
      			.salg_name = "authenc(hmac(sha256),cbc(aes))",
      		};
      		struct {
      			struct rtattr attr;
      			__be32 enckeylen;
      			char keys[1];
      		} __attribute__((packed)) key = {
      			.attr.rta_len = sizeof(key),
      			.attr.rta_type = 1 /* CRYPTO_AUTHENC_KEYA_PARAM */,
      		};
      
      		fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
      		bind(fd, (void *)&addr, sizeof(addr));
      		setsockopt(fd, SOL_ALG, ALG_SET_KEY, &key, sizeof(key));
      	}
      
      It caused:
      
      	BUG: unable to handle kernel paging request at ffff88007ffdc000
      	PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 2e05067 PTE 0
      	Oops: 0000 [#1] SMP
      	CPU: 0 PID: 883 Comm: authenc Not tainted 4.20.0-rc1-00108-g00c9fe37 #13
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
      	RIP: 0010:sha256_ni_transform+0xb3/0x330 arch/x86/crypto/sha256_ni_asm.S:155
      	[...]
      	Call Trace:
      	 sha256_ni_finup+0x10/0x20 arch/x86/crypto/sha256_ssse3_glue.c:321
      	 crypto_shash_finup+0x1a/0x30 crypto/shash.c:178
      	 shash_digest_unaligned+0x45/0x60 crypto/shash.c:186
      	 crypto_shash_digest+0x24/0x40 crypto/shash.c:202
      	 hmac_setkey+0x135/0x1e0 crypto/hmac.c:66
      	 crypto_shash_setkey+0x2b/0xb0 crypto/shash.c:66
      	 shash_async_setkey+0x10/0x20 crypto/shash.c:223
      	 crypto_ahash_setkey+0x2d/0xa0 crypto/ahash.c:202
      	 crypto_authenc_setkey+0x68/0x100 crypto/authenc.c:96
      	 crypto_aead_setkey+0x2a/0xc0 crypto/aead.c:62
      	 aead_setkey+0xc/0x10 crypto/algif_aead.c:526
      	 alg_setkey crypto/af_alg.c:223 [inline]
      	 alg_setsockopt+0xfe/0x130 crypto/af_alg.c:256
      	 __sys_setsockopt+0x6d/0xd0 net/socket.c:1902
      	 __do_sys_setsockopt net/socket.c:1913 [inline]
      	 __se_sys_setsockopt net/socket.c:1910 [inline]
      	 __x64_sys_setsockopt+0x1f/0x30 net/socket.c:1910
      	 do_syscall_64+0x4a/0x180 arch/x86/entry/common.c:290
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: e236d4a8 ("[CRYPTO] authenc: Move enckeylen into key itself")
      Cc: <stable@vger.kernel.org> # v2.6.25+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: authencesn - Avoid twice completion call in decrypt path · d196d2fd
      Harsh Jain authored
      commit a7773363 upstream.
      
      The authencesn template in the decrypt path unconditionally calls
      aead_request_complete() after ahash_verify(), which leads to the
      following kernel panic after decryption.
      
      [  338.539800] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [  338.548372] PGD 0 P4D 0
      [  338.551157] Oops: 0000 [#1] SMP PTI
      [  338.554919] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G        W I       4.19.7+ #13
      [  338.564431] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0        07/29/10
      [  338.572212] RIP: 0010:esp_input_done2+0x350/0x410 [esp4]
      [  338.578030] Code: ff 0f b6 68 10 48 8b 83 c8 00 00 00 e9 8e fe ff ff 8b 04 25 04 00 00 00 83 e8 01 48 98 48 8b 3c c5 10 00 00 00 e9 f7 fd ff ff <8b> 04 25 04 00 00 00 83 e8 01 48 98 4c 8b 24 c5 10 00 00 00 e9 3b
      [  338.598547] RSP: 0018:ffff911c97803c00 EFLAGS: 00010246
      [  338.604268] RAX: 0000000000000002 RBX: ffff911c4469ee00 RCX: 0000000000000000
      [  338.612090] RDX: 0000000000000000 RSI: 0000000000000130 RDI: ffff911b87c20400
      [  338.619874] RBP: 0000000000000000 R08: ffff911b87c20498 R09: 000000000000000a
      [  338.627610] R10: 0000000000000001 R11: 0000000000000004 R12: 0000000000000000
      [  338.635402] R13: ffff911c89590000 R14: ffff911c91730000 R15: 0000000000000000
      [  338.643234] FS:  0000000000000000(0000) GS:ffff911c97800000(0000) knlGS:0000000000000000
      [  338.652047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  338.658299] CR2: 0000000000000004 CR3: 00000001ec20a000 CR4: 00000000000006f0
      [  338.666382] Call Trace:
      [  338.669051]  <IRQ>
      [  338.671254]  esp_input_done+0x12/0x20 [esp4]
      [  338.675922]  chcr_handle_resp+0x3b5/0x790 [chcr]
      [  338.680949]  cpl_fw6_pld_handler+0x37/0x60 [chcr]
      [  338.686080]  chcr_uld_rx_handler+0x22/0x50 [chcr]
      [  338.691233]  uldrx_handler+0x8c/0xc0 [cxgb4]
      [  338.695923]  process_responses+0x2f0/0x5d0 [cxgb4]
      [  338.701177]  ? bitmap_find_next_zero_area_off+0x3a/0x90
      [  338.706882]  ? matrix_alloc_area.constprop.7+0x60/0x90
      [  338.712517]  ? apic_update_irq_cfg+0x82/0xf0
      [  338.717177]  napi_rx_handler+0x14/0xe0 [cxgb4]
      [  338.722015]  net_rx_action+0x2aa/0x3e0
      [  338.726136]  __do_softirq+0xcb/0x280
      [  338.730054]  irq_exit+0xde/0xf0
      [  338.733504]  do_IRQ+0x54/0xd0
      [  338.736745]  common_interrupt+0xf/0xf
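
      A toy model of the invariant the fix restores (illustrative only,
      not the authencesn or chcr code): a request must be completed
      exactly once, so a path whose request may already have been
      completed by the verification step must only propagate the error
      code.

      	#include <stdio.h>

      	struct req { int completed; };

      	static void complete(struct req *r, int err)
      	{
      		if (r->completed) {
      			printf("BUG: request completed twice (err=%d)\n", err);
      			return;
      		}
      		r->completed = 1;
      		printf("completed once (err=%d)\n", err);
      	}

      	int main(void)
      	{
      		struct req r = { 0 };

      		complete(&r, 0);	/* hash-verify path completes the request */
      		complete(&r, 0);	/* the unconditional second call: the bug */
      		return 0;
      	}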
      
      Fixes: 104880a6 ("crypto: authencesn - Convert to new AEAD...")
      Signed-off-by: Harsh Jain <harsh@chelsio.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 17 Aug, 2018 6 commits
    • crypto: skcipher - fix crash flushing dcache in error path · 7e179bff
      Eric Biggers authored
      commit 8088d3dd upstream.
      
      scatterwalk_done() is only meant to be called after a nonzero number of
      bytes have been processed, since scatterwalk_pagedone() will flush the
      dcache of the *previous* page.  But in the error case of
      skcipher_walk_done(), e.g. if the input wasn't an integer number of
      blocks, scatterwalk_done() was actually called after advancing 0 bytes.
      This caused a crash ("BUG: unable to handle kernel paging request")
      during '!PageSlab(page)' on architectures like arm and arm64 that define
      ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
      page-aligned as in that case walk->offset == 0.
      
      Fix it by reorganizing skcipher_walk_done() to skip the
      scatterwalk_advance() and scatterwalk_done() if an error has occurred.
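
      A toy model of the page arithmetic behind the crash (illustrative
      only, not the kernel's scatterwalk code):

      	#include <stdint.h>
      	#include <stdio.h>

      	#define PAGE_SIZE 4096UL

      	int main(void)
      	{
      		uintptr_t base = 0x10000;	/* page-aligned walk buffer */
      		unsigned int offset = 0;	/* error path: 0 bytes advanced */

      		/* The flush targets the page holding the last byte
      		 * processed, i.e. (base + offset - 1).  With offset == 0
      		 * that is base - 1: the page *before* the buffer, which
      		 * may be unmapped, hence the paging request BUG. */
      		uintptr_t last = base + offset - 1;
      		printf("flushed page %#lx, buffer page %#lx\n",
      		       (unsigned long)(last & ~(PAGE_SIZE - 1)),
      		       (unsigned long)(base & ~(PAGE_SIZE - 1)));
      		return 0;
      	}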
      
      This bug was found by syzkaller fuzzing.
      
      Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:
      
      	#include <linux/if_alg.h>
      	#include <sys/socket.h>
      	#include <unistd.h>
      
      	int main()
      	{
      		struct sockaddr_alg addr = {
      			.salg_type = "skcipher",
      			.salg_name = "cbc(aes-generic)",
      		};
      		char buffer[4096] __attribute__((aligned(4096))) = { 0 };
      		int fd;
      
      		fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
      		bind(fd, (void *)&addr, sizeof(addr));
      		setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
      		fd = accept(fd, NULL, NULL);
      		write(fd, buffer, 15);
      		read(fd, buffer, 15);
      	}

      Reported-by: Liu Chao <liuchao741@huawei.com>
      Fixes: b286d8b1 ("crypto: skcipher - Add skcipher walk interface")
      Cc: <stable@vger.kernel.org> # v4.10+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: skcipher - fix aligning block size in skcipher_copy_iv() · 0f2981ee
      Eric Biggers authored
      commit 0567fc9e upstream.
      
      The ALIGN() macro needs to be passed the alignment, not the alignmask
      (which is the alignment minus 1).
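
      A standalone demonstration of the macro's contract (ALIGN() written
      in its equivalent kernel form):

      	#include <stdio.h>

      	#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

      	int main(void)
      	{
      		unsigned int len = 9, alignmask = 7;

      		/* Passing the mask rounds to the wrong boundary: */
      		printf("%u\n", ALIGN(len, alignmask));		/* prints 9 */
      		/* Passing the alignment (mask + 1) is correct: */
      		printf("%u\n", ALIGN(len, alignmask + 1));	/* prints 16 */
      		return 0;
      	}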
      
      Fixes: b286d8b1 ("crypto: skcipher - Add skcipher walk interface")
      Cc: <stable@vger.kernel.org> # v4.10+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: ablkcipher - fix crash flushing dcache in error path · 68432fd1
      Eric Biggers authored
      commit 318abdfb upstream.
      
      Like the skcipher_walk and blkcipher_walk cases:
      
      scatterwalk_done() is only meant to be called after a nonzero number of
      bytes have been processed, since scatterwalk_pagedone() will flush the
      dcache of the *previous* page.  But in the error case of
      ablkcipher_walk_done(), e.g. if the input wasn't an integer number of
      blocks, scatterwalk_done() was actually called after advancing 0 bytes.
      This caused a crash ("BUG: unable to handle kernel paging request")
      during '!PageSlab(page)' on architectures like arm and arm64 that define
      ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
      page-aligned as in that case walk->offset == 0.
      
      Fix it by reorganizing ablkcipher_walk_done() to skip the
      scatterwalk_advance() and scatterwalk_done() if an error has occurred.

      Reported-by: Liu Chao <liuchao741@huawei.com>
      Fixes: bf06099d ("crypto: skcipher - Add ablkcipher_walk interfaces")
      Cc: <stable@vger.kernel.org> # v2.6.35+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: blkcipher - fix crash flushing dcache in error path · 2cde72d9
      Eric Biggers authored
      commit 0868def3 upstream.
      
      Like the skcipher_walk case:
      
      scatterwalk_done() is only meant to be called after a nonzero number of
      bytes have been processed, since scatterwalk_pagedone() will flush the
      dcache of the *previous* page.  But in the error case of
      blkcipher_walk_done(), e.g. if the input wasn't an integer number of
      blocks, scatterwalk_done() was actually called after advancing 0 bytes.
      This caused a crash ("BUG: unable to handle kernel paging request")
      during '!PageSlab(page)' on architectures like arm and arm64 that define
      ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
      page-aligned as in that case walk->offset == 0.
      
      Fix it by reorganizing blkcipher_walk_done() to skip the
      scatterwalk_advance() and scatterwalk_done() if an error has occurred.
      
      This bug was found by syzkaller fuzzing.
      
      Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:
      
      	#include <linux/if_alg.h>
      	#include <sys/socket.h>
      	#include <unistd.h>
      
      	int main()
      	{
      		struct sockaddr_alg addr = {
      			.salg_type = "skcipher",
      			.salg_name = "ecb(aes-generic)",
      		};
      		char buffer[4096] __attribute__((aligned(4096))) = { 0 };
      		int fd;
      
      		fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
      		bind(fd, (void *)&addr, sizeof(addr));
      		setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
      		fd = accept(fd, NULL, NULL);
      		write(fd, buffer, 15);
      		read(fd, buffer, 15);
      	}

      Reported-by: Liu Chao <liuchao741@huawei.com>
      Fixes: 5cde0af2 ("[CRYPTO] cipher: Added block cipher type")
      Cc: <stable@vger.kernel.org> # v2.6.19+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: vmac - separate tfm and request context · e7aefb13
      Eric Biggers authored
      commit bb296481 upstream.
      
      syzbot reported a crash in vmac_final() when multiple threads
      concurrently use the same "vmac(aes)" transform through AF_ALG.  The bug
      is pretty fundamental: the VMAC template doesn't separate per-request
      state from per-tfm (per-key) state like the other hash algorithms do,
      but rather stores it all in the tfm context.  That's wrong.
      
      Also, vmac_final() incorrectly zeroes most of the state including the
      derived keys and cached pseudorandom pad.  Therefore, only the first
      VMAC invocation with a given key calculates the correct digest.
      
      Fix these bugs by splitting the per-tfm state from the per-request state
      and using the proper init/update/final sequencing for requests.
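
      The shape of the resulting state split, as a hedged sketch (field
      names are illustrative, not the verbatim patch):

      	#include <stdint.h>

      	struct vmac_tfm_ctx {		/* per-key: written only by setkey() */
      		uint64_t nhkey[2];
      		uint64_t polykey[2];
      	};

      	struct vmac_desc_ctx {		/* per-request: one copy per message */
      		uint8_t partial[16];	/* pending partial block */
      		unsigned int partial_size;
      		uint64_t polyhash;	/* running polynomial hash */
      	};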
      
      Reproducer for the crash:
      
          #include <linux/if_alg.h>
          #include <sys/socket.h>
          #include <unistd.h>
      
          int main()
          {
                  int fd;
                  struct sockaddr_alg addr = {
                          .salg_type = "hash",
                          .salg_name = "vmac(aes)",
                  };
                  char buf[256] = { 0 };
      
                  fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
                  bind(fd, (void *)&addr, sizeof(addr));
                  setsockopt(fd, SOL_ALG, ALG_SET_KEY, buf, 16);
                  fork();
                  fd = accept(fd, NULL, NULL);
                  for (;;)
                          write(fd, buf, 256);
          }
      
      The immediate cause of the crash is that vmac_ctx_t.partial_size exceeds
      VMAC_NHBYTES, causing vmac_final() to memset() a negative length.
      
      Reported-by: syzbot+264bca3a6e8d645550d3@syzkaller.appspotmail.com
      Fixes: f1939f7c ("crypto: vmac - New hash algorithm for intel_txt support")
      Cc: <stable@vger.kernel.org> # v2.6.32+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: vmac - require a block cipher with 128-bit block size · ef70d145
      Eric Biggers authored
      commit 73bf20ef upstream.
      
      The VMAC template assumes the block cipher has a 128-bit block size, but
      it failed to check for that.  Thus it was possible to instantiate it
      using a 64-bit block size cipher, e.g. "vmac(cast5)", causing
      uninitialized memory to be used.
      
      Add the needed check when instantiating the template.
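
      A minimal sketch of such a check (hypothetical helper, assuming the
      template can inspect the underlying cipher's block size at
      instantiation time; not the verbatim patch):

      	#include <errno.h>

      	/* VMAC is only defined for 128-bit (16-byte) block ciphers. */
      	static int vmac_check_blocksize(unsigned int blocksize)
      	{
      		return blocksize == 16 ? 0 : -EINVAL;
      	}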
      
      Fixes: f1939f7c ("crypto: vmac - New hash algorithm for intel_txt support")
      Cc: <stable@vger.kernel.org> # v2.6.32+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 17 Jul, 2018 1 commit
    • crypto: x86/salsa20 - remove x86 salsa20 implementations · 19f39eff
      Eric Biggers authored
      commit b7b73cd5 upstream.
      
      The x86 assembly implementations of Salsa20 use the frame base pointer
      register (%ebp or %rbp), which breaks frame pointer convention and
      breaks stack traces when unwinding from an interrupt in the crypto code.
      Recent (v4.10+) kernels will warn about this, e.g.
      
      WARNING: kernel stack regs at 00000000a8291e69 in syzkaller047086:4677 has bad 'bp' value 000000001077994c
      [...]
      
      But after looking into it, I believe there's very little reason to still
      retain the x86 Salsa20 code.  First, these are *not* vectorized
      (SSE2/SSSE3/AVX2) implementations, which would be needed to get anywhere
      close to the best Salsa20 performance on any remotely modern x86
      processor; they're just regular x86 assembly.  Second, it's still
      unclear that anyone is actually using the kernel's Salsa20 at all,
      especially given that now ChaCha20 is supported too, and with much more
      efficient SSSE3 and AVX2 implementations.  Finally, in benchmarks I did
      on both Intel and AMD processors with both gcc 8.1.0 and gcc 4.9.4, the
      x86_64 salsa20-asm is actually slightly *slower* than salsa20-generic
      (~3% slower on Skylake, ~10% slower on Zen), while the i686 salsa20-asm
      is only slightly faster than salsa20-generic (~15% faster on Skylake,
      ~20% faster on Zen).  The gcc version made little difference.
      
      So, the x86_64 salsa20-asm is pretty clearly useless.  That leaves just
      the i686 salsa20-asm, which based on my tests provides a 15-20% speed
      boost.  But that's without updating the code to not use %ebp.  And given
      the maintenance cost, the small speed difference vs. salsa20-generic,
      the fact that few people still use i686 kernels, the doubt that anyone
      is even using the kernel's Salsa20 at all, and the fact that a SSE2
      implementation would almost certainly be much faster on any remotely
      modern x86 processor yet no one has cared enough to add one yet, I don't
      think it's worthwhile to keep.
      
      Thus, just remove both the x86_64 and i686 salsa20-asm implementations.
      
      Reported-by: syzbot+ffa3a158337bbc01ff09@syzkaller.appspotmail.com
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 03 Jul, 2018 1 commit
    • X.509: unpack RSA signatureValue field from BIT STRING · af20e4ec
      Maciej S. Szmigiero authored
      commit b65c32ec upstream.
      
      The signatureValue field of an X.509 certificate is encoded as a BIT
      STRING.  For RSA signatures this BIT STRING is of the so-called
      primitive subtype, which contains a u8 prefix indicating the count of
      unused bits in the encoding.

      We have to strip this prefix from the signature data, just as we
      already do for key data in the x509_extract_key_data() function.

      This wasn't noticed earlier because this prefix byte is zero for RSA
      key sizes divisible by 8.  Since BIT STRING is a big-endian encoding,
      adding zero prefixes has no bearing on its value.

      The signature length, however, was incorrect, which is a problem for
      RSA implementations that need it to be exactly correct (like AMD CCP).
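
      A standalone sketch of the unpacking step (illustrative; not the
      kernel's ASN.1 parser):

      	#include <errno.h>
      	#include <stddef.h>

      	static int unpack_rsa_sig(const unsigned char *value, size_t vlen,
      				  const unsigned char **sig, size_t *siglen)
      	{
      		/* A primitive BIT STRING starts with one byte giving the
      		 * count of unused trailing bits; for an RSA signature it
      		 * must be 0. */
      		if (vlen < 1 || value[0] != 0)
      			return -EBADMSG;
      		*sig = value + 1;	/* strip the prefix byte */
      		*siglen = vlen - 1;	/* now exactly the modulus size */
      		return 0;
      	}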

      Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
      Fixes: c26fd69f ("X.509: Add a crypto key parser for binary (DER) X.509 certificates")
      Cc: stable@vger.kernel.org
      Signed-off-by: James Morris <james.morris@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  5. 30 May, 2018 1 commit
    • PKCS#7: fix direct verification of SignerInfo signature · e47c1bf9
      Eric Biggers authored
      [ Upstream commit 6459ae38 ]
      
      If none of the certificates in a SignerInfo's certificate chain match a
      trusted key, nor is the last certificate signed by a trusted key, then
      pkcs7_validate_trust_one() tries to check whether the SignerInfo's
      signature was made directly by a trusted key.  But it fails to set
      the 'sig' variable correctly, so it actually verifies the last
      signature seen.  That is the SignerInfo's signature only if the
      certificate chain is empty; otherwise it is the last certificate's
      signature.
      
      This is not by itself a security problem, since verifying any of the
      certificates in the chain should be sufficient to verify the SignerInfo.
      Still, it's not working as intended, so it should be fixed.
      
      Fix it by setting 'sig' correctly for the direct verification case.
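
      A toy model of the control-flow bug and the one-line fix
      (illustrative; not the kernel code):

      	#include <stdio.h>

      	struct signature { const char *of; };

      	int main(void)
      	{
      		struct signature sinfo_sig = { "SignerInfo" };
      		struct signature cert_sig[2] = { { "cert 0" }, { "cert 1" } };
      		const struct signature *sig = &sinfo_sig;

      		for (int i = 0; i < 2; i++)	/* walk the certificate chain */
      			sig = &cert_sig[i];

      		/* The buggy fallback verified *sig here, i.e. "cert 1".
      		 * The fix: reset to the SignerInfo's own signature first. */
      		sig = &sinfo_sig;
      		printf("directly verifying: %s\n", sig->of);
      		return 0;
      	}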
      
      Fixes: 757932e6 ("PKCS#7: Handle PKCS#7 messages that contain no X.509 certs")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 12 Apr, 2018 1 commit
    • crypto: aes-generic - build with -Os on gcc-7+ · 7cae67e3
      Arnd Bergmann authored
      
      [ Upstream commit 148b974d ]
      
      While testing other changes, I discovered that gcc-7.2.1 produces badly
      optimized code for aes_encrypt/aes_decrypt. This is especially true when
      CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely
      large stack usage that in turn might cause kernel stack overflows:
      
      crypto/aes_generic.c: In function 'aes_encrypt':
      crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      crypto/aes_generic.c: In function 'aes_decrypt':
      crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      
      I verified that this problem exists on all architectures that are
      supported by gcc-7.2, though arm64 in particular is less affected than
      the others. I also found that gcc-7.1 and gcc-8 do not show the extreme
      stack usage but still produce worse code than earlier versions for this
      file, apparently because of optimization passes that generally provide
      a substantial improvement in object code quality but understandably fail
      to find any shortcuts in the AES algorithm.
      
      Possible workarounds include
      
      a) disabling -ftree-pre and -ftree-sra optimizations, this was an earlier
         patch I tried, which reliably fixed the stack usage, but caused a
         serious performance regression in some versions, as later testing
         found.
      
      b) disabling UBSAN on this file or all ciphers, as suggested by Ard
         Biesheuvel. This would lead to massively better crypto performance in
         UBSAN-enabled kernels and avoid the stack usage, but there is a concern
         over whether we should exclude arbitrary files from UBSAN at all.
      
      c) Forcing the optimization level in a different way. Similar to a),
         but rather than deselecting specific optimization stages,
         this now uses "gcc -Os" for this file, regardless of the
         CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE/SIZE option. This is a reliable
         workaround for the stack consumption on all architectures, and I've
         retested the performance results now on x86, cycles/byte (lower is
         better) for cbc(aes-generic) with 256 bit keys:
      
      			-O2     -Os
      	gcc-6.3.1	14.9	15.1
      	gcc-7.0.1	14.7	15.3
      	gcc-7.1.1	15.3	14.7
      	gcc-7.2.1	16.8	15.9
      	gcc-8.0.0	15.5	15.6
      
      This implements option c) by forcing -Os on all compiler versions
      starting with gcc-7.1.  As a workaround for PR83356, it would
      only be needed for gcc-7.2+ with UBSAN enabled, but since it also shows
      better performance on gcc-7.1 without UBSAN, it seems appropriate to
      use the faster version here as well.
      
      Side note: during testing, I also played with the AES code in libressl,
      which had a similar performance regression from gcc-6 to gcc-7.2,
      but was three times slower overall. It might be interesting to
      investigate that further and possibly port the Linux implementation
      into that.
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
      Cc: Richard Biener <rguenther@suse.de>
      Cc: Jakub Jelinek <jakub@gcc.gnu.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>