make-dynamic-linker-cache OOMs for LLVM 15 on i686-linux

  • Open
Details
4 participants
  • Ludovic Courtès
  • Ludovic Courtès
  • Marius Bakke
  • Maxim Cournoyer
Owner
unassigned
Submitted by
Marius Bakke
Severity
important
Marius Bakke wrote on 18 Nov 2022 10:41
(address . bug-guix@gnu.org)
87mt8o14xd.fsf@gnu.org
Hello,

LLVM 15.0.4 fails on i686-linux:

https://ci.guix.gnu.org/build/1702995/details

Because the 'make-dynamic-linker-cache' phase runs out of memory:

starting phase `make-dynamic-linker-cache'
GC Warning: Repeated allocation of very large block (appr. size 268439552):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 134221824):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 268439552):
May lead to memory leak and poor performance
GC Warning: Failed to expand heap by 285216768 bytes
GC Warning: Failed to expand heap by 268439552 bytes
GC Warning: Out of Memory! Heap size: 3620 MiB. Returning NULL!
Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
Warning: Unwind-only out of memory exception; skipping pre-unwind handler.

(excerpt from https://ci.guix.gnu.org/build/1702995/log/raw)

Not sure why this phase uses so much memory. Ideas?

Ludovic Courtès wrote on 22 Nov 2022 00:42
(name . Marius Bakke)(address . marius@gnu.org)
877cznidn8.fsf@gnu.org
Hi,

(Cc: Maxim and Greg for LLVM packaging questions below.)

Marius Bakke <marius@gnu.org> skribis:

> LLVM 15.0.4 fails on i686-linux:
>
> https://ci.guix.gnu.org/build/1702995/details
>
> Because the 'make-dynamic-linker-cache' phase runs out of memory:
>
> starting phase `make-dynamic-linker-cache'
> GC Warning: Repeated allocation of very large block (appr. size 268439552):
> May lead to memory leak and poor performance
> GC Warning: Repeated allocation of very large block (appr. size 134221824):
> May lead to memory leak and poor performance
> GC Warning: Repeated allocation of very large block (appr. size 268439552):
> May lead to memory leak and poor performance
> GC Warning: Failed to expand heap by 285216768 bytes
> GC Warning: Failed to expand heap by 268439552 bytes
> GC Warning: Out of Memory! Heap size: 3620 MiB. Returning NULL!
> Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
> Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
> Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
>
> (excerpt from https://ci.guix.gnu.org/build/1702995/log/raw)
>
> Not sure why this phase uses so much memory. Ideas?

Yes: the gremlin.scm code uses ‘file-dynamic-info’, which loads the
whole file in memory. Ridiculous.

We should instead mmap it (but there are no ‘mmap’ bindings in Guile,
yet) or arrange to load just the relevant parts (we’ll have to check but
maybe ‘file-dynamic-info’ can find everything it needs at the beginning
of a file, the PT_DYNAMIC segment.)

For example, with the patch below, things still appear to be fine with
LLVM:

scheme@(guix build gremlin)> (file-dynamic-info "/gnu/store/mj14k58lfc88jhcn6va0s2fpwkv3s35c-llvm-13.0.1/lib/libLLVMScalarOpts.so")
$11 = #<<elf-dynamic-info> soname: "libLLVMScalarOpts.so.13" needed: ("libLLVMAggressiveInstCombine.so.13" "libLLVMInstCombine.so.13" "libLLVMTransformUtils.so.13" "libLLVMAnalysis.so.13" "libLLVMCore.so.13" "libLLVMSupport.so.13" "libstdc++.so.6" "libm.so.6" "libgcc_s.so.1" "libc.so.6" "ld-linux-x86-64.so.2") rpath: () runpath: ("/gnu/store/mj14k58lfc88jhcn6va0s2fpwkv3s35c-llvm-13.0.1/lib" "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib" "/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib/lib" "/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib/lib/gcc/x86_64-unknown-linux-gnu/10.3.0/../../..")>
scheme@(guix build gremlin)> (file-dynamic-info "/gnu/store/mj14k58lfc88jhcn6va0s2fpwkv3s35c-llvm-13.0.1/lib/libLLVMX86CodeGen.so.13")
$12 = #<<elf-dynamic-info> soname: "libLLVMX86CodeGen.so.13" needed: ("libLLVMAsmPrinter.so.13" "libLLVMX86Desc.so.13" "libLLVMX86Info.so.13" "libLLVMGlobalISel.so.13" "libLLVMCFGuard.so.13" "libLLVMSelectionDAG.so.13" "libLLVMCodeGen.so.13" "libLLVMTarget.so.13" "libLLVMTransformUtils.so.13" "libLLVMAnalysis.so.13" "libLLVMProfileData.so.13" "libLLVMMC.so.13" "libLLVMCore.so.13" "libLLVMSupport.so.13" "libstdc++.so.6" "libm.so.6" "libgcc_s.so.1" "libc.so.6" "ld-linux-x86-64.so.2") rpath: () runpath: ("/gnu/store/mj14k58lfc88jhcn6va0s2fpwkv3s35c-llvm-13.0.1/lib" "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib" "/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib/lib" "/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib/lib/gcc/x86_64-unknown-linux-gnu/10.3.0/../../..")>

We could temporarily delete this phase for all 32-bit builds of LLVM.

But the crux of the problem is that llvm@15 has a single huge shared
library, unlike previous versions:

$ du -hL /gnu/store/bgqdvvi7k6l255332rfawgjmn2hpn13r-llvm-15.0.4/lib/*.so
133M /gnu/store/bgqdvvi7k6l255332rfawgjmn2hpn13r-llvm-15.0.4/lib/libLLVM-15.0.4.so
96K /gnu/store/bgqdvvi7k6l255332rfawgjmn2hpn13r-llvm-15.0.4/lib/libLTO.so
16K /gnu/store/bgqdvvi7k6l255332rfawgjmn2hpn13r-llvm-15.0.4/lib/libRemarks.so

(It also has tons of .a files, which shouldn’t be there.)

Is that big LLVM.so due to different build options on our side? Or is
it a radical upstream change (sounds unlikely, but who knows)?

Thanks,
Ludo’.
diff --git a/guix/build/gremlin.scm b/guix/build/gremlin.scm
index 2a74d51dd9..8a38dde1eb 100644
--- a/guix/build/gremlin.scm
+++ b/guix/build/gremlin.scm
@@ -250,7 +250,11 @@ (define (file-dynamic-info file)
 info."
   (call-with-input-file file
     (lambda (port)
-      (elf-dynamic-info (parse-elf (get-bytevector-all port))))))
+      (elf-dynamic-info (parse-elf
+                         ;; Read at most 10 MiB in memory, which should be
+                         ;; enough to get the PT_DYNAMIC segment.
+                         ;; TODO: mmap the whole file instead.
+                         (get-bytevector-n port (* 10 (expt 2 20))))))))
 
 (define (file-runpath file)
   "Return the DT_RUNPATH dynamic entry of FILE as a list of strings, or #f if
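
The idea of loading "just the relevant parts" can be sketched as follows. This is an illustration in Python, not the gremlin.scm code: the function name `find_pt_dynamic` is hypothetical, and the field layouts are those of the standard ELF32/ELF64 headers. It reads only the ELF header and the program header table to locate the PT_DYNAMIC segment, so memory use does not depend on the size of the file. Note that it only locates the segment; resolving DT_NEEDED or DT_RUNPATH strings would additionally require reading the dynamic string table, which this sketch does not attempt.

```python
import struct

PT_DYNAMIC = 2  # program header type of the dynamic segment


def find_pt_dynamic(path):
    """Return (file offset, size) of the PT_DYNAMIC segment, or None."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if len(ident) < 16 or ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        is64 = ident[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        endian = "<" if ident[5] == 1 else ">"   # EI_DATA: 1 = little-endian

        # Remainder of the ELF header, following the 16-byte e_ident.
        if is64:
            hdr = struct.unpack(endian + "HHIQQQIHHHHHH", f.read(48))
        else:
            hdr = struct.unpack(endian + "HHIIIIIHHHHHH", f.read(36))
        e_phoff, e_phentsize, e_phnum = hdr[4], hdr[8], hdr[9]

        # Walk the program header table, reading only the leading fields.
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            if is64:
                p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = \
                    struct.unpack(endian + "IIQQQQ", f.read(40))
            else:
                p_type, p_offset, _vaddr, _paddr, p_filesz = \
                    struct.unpack(endian + "IIIII", f.read(20))
            if p_type == PT_DYNAMIC:
                return p_offset, p_filesz
    return None
```

Only a few hundred bytes are read per file here, whereas the 10 MiB cap in the patch relies on the assumption that everything needed sits near the start of the file.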
Maxim Cournoyer wrote on 22 Nov 2022 02:22
(name . Ludovic Courtès)(address . ludo@gnu.org)
874jurp9tz.fsf@gmail.com
Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> (Cc: Maxim and Greg for LLVM packaging questions below.)
>
> Marius Bakke <marius@gnu.org> skribis:
>
>> LLVM 15.0.4 fails on i686-linux:
>>
>> https://ci.guix.gnu.org/build/1702995/details
>>
>> Because the 'make-dynamic-linker-cache' phase runs out of memory:
>>
>> starting phase `make-dynamic-linker-cache'
>> GC Warning: Repeated allocation of very large block (appr. size 268439552):
>> May lead to memory leak and poor performance
>> GC Warning: Repeated allocation of very large block (appr. size 134221824):
>> May lead to memory leak and poor performance
>> GC Warning: Repeated allocation of very large block (appr. size 268439552):
>> May lead to memory leak and poor performance
>> GC Warning: Failed to expand heap by 285216768 bytes
>> GC Warning: Failed to expand heap by 268439552 bytes
>> GC Warning: Out of Memory! Heap size: 3620 MiB. Returning NULL!
>> Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
>> Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
>> Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
>>
>> (excerpt from https://ci.guix.gnu.org/build/1702995/log/raw)
>>
>> Not sure why this phase uses so much memory. Ideas?
>
> Yes: the gremlin.scm code uses ‘file-dynamic-info’, which loads the
> whole file in memory. Ridiculous.

If it loaded just that file, it should be fine, no? It weighs 133 MiB,
as you've shown below:

> But the crux of the problem is that llvm@15 has a single huge shared
> library, unlike previous versions:
>
> $ du -hL /gnu/store/bgqdvvi7k6l255332rfawgjmn2hpn13r-llvm-15.0.4/lib/*.so
> 133M /gnu/store/bgqdvvi7k6l255332rfawgjmn2hpn13r-llvm-15.0.4/lib/libLLVM-15.0.4.so
> 96K /gnu/store/bgqdvvi7k6l255332rfawgjmn2hpn13r-llvm-15.0.4/lib/libLTO.so
> 16K /gnu/store/bgqdvvi7k6l255332rfawgjmn2hpn13r-llvm-15.0.4/lib/libRemarks.so
> (It also has tons of .a files, which shouldn’t be there.)

The static files are needed at least to build the clang runtime. I had
tried to get rid of them without success. Perhaps they could be moved
to a "static" output if they're needed only at that time.

> Is that big LLVM.so due to different build options on our side? Or is
> it a radical upstream change (sounds unlikely, but who knows)?

It's caused by -DLLVM_LINK_LLVM_DYLIB=ON and -DLLVM_BUILD_LLVM_DYLIB=ON,
which is the supported configuration to build a shared library of LLVM
(-DBUILD_SHARED_LIBS=ON is obsolete/deprecated) [0].

It also makes things conveniently easy to link to LLVM; you just need to
link to '-lLLVM', and everything it needs is available.


--
Thanks,
Maxim
Maxim Cournoyer wrote on 20 Nov 2023 11:33
(name . Ludovic Courtès)(address . ludo@gnu.org)
87bkbo9q3q.fsf@gmail.com
Hi,

This still happens:

starting phase `make-dynamic-linker-cache'
GC Warning: Repeated allocation of very large block (appr. size 16781312):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 67112960):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 134221824):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 67112960):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 67112960):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 134221824):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 134221824):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 134221824):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 268439552):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 33558528):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 67112960):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 67112960):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 268439552):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 268439552):
May lead to memory leak and poor performance
GC Warning: Repeated allocation of very large block (appr. size 268439552):
May lead to memory leak and poor performance
GC Warning: Failed to expand heap by 285216768 bytes
GC Warning: Failed to expand heap by 268439552 bytes
GC Warning: Out of Memory! Heap size: 3362 MiB. Returning NULL!
Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
builder for `/gnu/store/j4w1wrhgpjjcfqf2jskklr95r6hpy51i-llvm-15.0.7.drv' failed with exit code 1


--
Thanks,
Maxim
Ludovic Courtès wrote on 12 Apr 02:35 -0700
(address . 59365@debbugs.gnu.org)
87le5igbyk.fsf@gnu.org
Hello,

Ludovic Courtès <ludo@gnu.org> skribis:

>> GC Warning: Failed to expand heap by 285216768 bytes
>> GC Warning: Failed to expand heap by 268439552 bytes
>> GC Warning: Out of Memory! Heap size: 3620 MiB. Returning NULL!
>> Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
>> Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
>> Warning: Unwind-only out of memory exception; skipping pre-unwind handler.
>>
>> (excerpt from https://ci.guix.gnu.org/build/1702995/log/raw)
>>
>> Not sure why this phase uses so much memory. Ideas?
>
> Yes: the gremlin.scm code uses ‘file-dynamic-info’, which loads the
> whole file in memory. Ridiculous.
>
> We should instead mmap it (but there are no ‘mmap’ bindings in Guile,
> yet) or arrange to load just the relevant parts (we’ll have to check but
> maybe ‘file-dynamic-info’ can find everything it needs at the beginning
> of a file, the PT_DYNAMIC segment.)

Another instance of the problem that we just stumbled upon is ‘guix pack -RR’:
that too tries to load entire ELF files in memory, in
‘elf-loader-compile-flags’.

Mmap!

Ludo’.
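
For readers unfamiliar with the mmap approach, here is a minimal sketch of what it buys. It is written in Python because, as noted earlier in the thread, Guile has no mmap bindings yet; the helper name `mapped_slice` is hypothetical and is not part of Guix. The file is mapped read-only and only the requested byte range is copied out, so the process does not allocate a heap buffer the size of the whole file.

```python
import mmap


def mapped_slice(path, offset, size):
    """Return SIZE bytes of PATH starting at OFFSET, read through a mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing copies only the requested range into a bytes object;
            # the kernel pages in just the parts of the file that are touched.
            return m[offset:offset + size]
```

A mapping still consumes virtual address space, which is limited on 32-bit systems such as i686-linux, but it does not go through the garbage-collected heap that overflowed in the logs above.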
Ludovic Courtès wrote on 12 Apr 02:35 -0700
control message for bug #59365
(address . control@debbugs.gnu.org)
87jzl2gbye.fsf@gnu.org
severity 59365 important
quit