Android HAL Crash Investigation Notes

While debugging the system's HDMI CEC feature recently, I ran into a strange crash. These are my notes.

Initial analysis

First, the log:

  --------- beginning of crash
  03-06 10:48:25.503 1133 1133 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
  03-06 10:48:25.503 1133 1133 F DEBUG : Build fingerprint: ':13/TD1A.220804.031/3582:userdebug/release-keys'
  03-06 10:48:25.503 1133 1133 F DEBUG : Revision: '0'
  03-06 10:48:25.503 1133 1133 F DEBUG : ABI: 'arm64'
  03-06 10:48:25.503 1133 1133 F DEBUG : Timestamp: 2024-03-06 10:48:25.490260378-0500
  03-06 10:48:25.503 1133 1133 F DEBUG : Process uptime: 6s
  03-06 10:48:25.503 1133 1133 F DEBUG : Cmdline: /vendor/bin/hw/android.hardware.tv.cec@1.0-service
  03-06 10:48:25.503 1133 1133 F DEBUG : pid: 615, tid: 615, name: cec@1.0-service >>> /vendor/bin/hw/android.hardware.tv.cec@1.0-service <<<
  03-06 10:48:25.503 1133 1133 F DEBUG : uid: 1000
  03-06 10:48:25.503 1133 1133 F DEBUG : tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
  03-06 10:48:25.503 1133 1133 F DEBUG : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
  03-06 10:48:25.503 1133 1133 F DEBUG : Abort message: 'stack corruption detected (-fstack-protector)'
  03-06 10:48:25.503 1133 1133 F DEBUG : x0 0000000000000000 x1 0000000000000267 x2 0000000000000006 x3 0000007fe8d61420
  03-06 10:48:25.503 1133 1133 F DEBUG : x4 0000000000808080 x5 0000000000808080 x6 0000000000808080 x7 8080808080808080
  03-06 10:48:25.503 1133 1133 F DEBUG : x8 00000000000000f0 x9 00000077cc5b4a00 x10 0000000000000001 x11 00000077cc5f2ce4
  03-06 10:48:25.503 1133 1133 F DEBUG : x12 0101010101010101 x13 000000007fffffff x14 0000000000001686 x15 0000000000000030
  03-06 10:48:25.503 1133 1133 F DEBUG : x16 00000077cc657d60 x17 00000077cc634b70 x18 00000077d3ae2000 x19 0000000000000267
  03-06 10:48:25.504 1133 1133 F DEBUG : x20 0000000000000267 x21 00000000ffffffff x22 0000000000000030 x23 00000077d302a000
  03-06 10:48:25.504 1133 1133 F DEBUG : x24 0000000000000004 x25 00000077d302a000 x26 00000077d302a000 x27 b40000763c5972c8
  03-06 10:48:25.504 1133 1133 F DEBUG : x28 0000000000000000 x29 0000007fe8d614a0
  03-06 10:48:25.504 1133 1133 F DEBUG : lr 00000077cc5e4868 sp 0000007fe8d61400 pc 00000077cc5e4894 pst 0000000000001000
  03-06 10:48:25.504 1133 1133 F DEBUG : backtrace:
  03-06 10:48:25.504 1133 1133 F DEBUG : #00 pc 0000000000051894 /apex/com.android.runtime/lib64/bionic/libc.so (abort+164) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
  03-06 10:48:25.504 1133 1133 F DEBUG : #01 pc 00000000000664e8 /apex/com.android.runtime/lib64/bionic/libc.so (__stack_chk_fail+20) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
  03-06 10:48:25.504 1133 1133 F DEBUG : #02 pc 0000000000006954 /vendor/lib64/hw/android.hardware.tv.cec@1.0-impl.so (android::hardware::tv::cec::V1_0::implementation::HdmiCec::getPortInfo(std::__1::function<void (android::hardware::hidl_vec<android::hardware::tv::cec::V1_0::HdmiPortInfo> const&)>)+376) (BuildId: 647cc2659b38df33f681ae1d58a04c74)
  03-06 10:48:25.504 1133 1133 F DEBUG : #03 pc 0000000000016540 /vendor/lib64/android.hardware.tv.cec@1.0.so (android::hardware::tv::cec::V1_0::BnHwHdmiCec::_hidl_getPortInfo(android::hidl::base::V1_0::BnHwBase*, android::hardware::Parcel const&, android::hardware::Parcel*, std::__1::function<void (android::hardware::Parcel&)>)+252) (BuildId: 8ca54579dc40d30a62824bb0a91d98f4)
  03-06 10:48:25.504 1133 1133 F DEBUG : #04 pc 0000000000017668 /vendor/lib64/android.hardware.tv.cec@1.0.so (android::hardware::tv::cec::V1_0::BnHwHdmiCec::onTransact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+1132) (BuildId: 8ca54579dc40d30a62824bb0a91d98f4)
  03-06 10:48:25.504 1133 1133 F DEBUG : #05 pc 000000000008ee40 /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::BHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+156) (BuildId: 3fafcf3a9734f0d41045c2b5f828b363)
  03-06 10:48:25.504 1133 1133 F DEBUG : #06 pc 0000000000093dfc /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::IPCThreadState::executeCommand(int)+2784) (BuildId: 3fafcf3a9734f0d41045c2b5f828b363)
  03-06 10:48:25.504 1133 1133 F DEBUG : #07 pc 00000000000931bc /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::IPCThreadState::getAndExecuteCommand()+224) (BuildId: 3fafcf3a9734f0d41045c2b5f828b363)
  03-06 10:48:25.504 1133 1133 F DEBUG : #08 pc 0000000000094388 /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::IPCThreadState::joinThreadPool(bool)+172) (BuildId: 3fafcf3a9734f0d41045c2b5f828b363)
  03-06 10:48:25.504 1133 1133 F DEBUG : #09 pc 00000000000010e4 /vendor/bin/hw/android.hardware.tv.cec@1.0-service (main+144) (BuildId: f6a65dc725b06643501c269fa219b717)
  03-06 10:48:25.504 1133 1133 F DEBUG : #10 pc 000000000004a0f4 /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+96) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
  03-06 10:48:26.344 1267 1267 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

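At first glance, the crash is in the android.hardware.tv.cec@1.0-service process, inside HdmiCec::getPortInfo in android.hardware.tv.cec@1.0-impl.so (frame #02). Simple enough: reach for addr2line.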

addr2line

  addr2line --help
  Usage: addr2line [option(s)] [addr(s)]
  Convert addresses into line number/file name pairs.
  If no addresses are specified on the command line, they will be read from stdin
  The options are:
  @<file> Read options from <file>
  -a --addresses Show addresses
  -b --target=<bfdname> Set the binary file format
  -e --exe=<executable> Set the input file name (default is a.out)
  -i --inlines Unwind inlined functions
  -j --section=<name> Read section-relative offsets instead of addresses
  -p --pretty-print Make the output easier to read for humans
  -s --basenames Strip directory names
  -f --functions Show function names
  -C --demangle[=style] Demangle function names
  -R --recurse-limit Enable a limit on recursion whilst demangling. [Default]
  -r --no-recurse-limit Disable a limit on recursion whilst demangling
  -h --help Display this information
  -v --version Display the program's version
  addr2line: supported targets: elf64-x86-64 elf32-i386 elf32-iamcu elf32-x86-64 pei-i386 pe-x86-64 pei-x86-64 elf64-l1om elf64-k1om elf64-little elf64-big elf32-little elf32-big pe-bigobj-x86-64 pe-i386 srec symbolsrec verilog tekhex binary ihex plugin
  Report bugs to <https://sourceware.org/bugzilla/>

addr2line -ife out/target/product/aosp/symbols/vendor/lib64/hw/android.hardware.tv.cec@1.0-impl.so 0000000000006954

llvm-addr2line

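Then I remembered that Android's compile and link toolchain has been updated, so this GNU version can indeed no longer be used. Switch to llvm-addr2line instead.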

  prebuilts/clang/host/linux-x86/llvm-binutils-stable/llvm-addr2line --help
  OVERVIEW: llvm-addr2line
  USAGE: llvm-addr2line [options] addresses...
  OPTIONS:
  --addresses Show address before line information
  --adjust-vma=<offset> Add specified offset to object file addresses
  -a Alias for --addresses
  --basenames Strip directory names from paths
  -C Alias for --demangle
  --debug-file-directory=<dir>
      Path to directory where to look for debug files
  -demangle=false Alias for --no-demangle
  -demangle=true Alias for --demangle
  --demangle Demangle function names
  --dia Use the DIA library to access symbols (Windows only)
  --dwp=<file> Path to DWP file to be use for any split CUs
  -e=<file> Alias for --obj
  --exe=<file> Alias for --obj
  --exe <file> Alias for --obj
  -e <file> Alias for --obj
  -f=<value> Alias for --functions=
  --fallback-debug-path=<dir>
      Fallback path for debug binaries
  --functions=<value> Print function name for a given address
  --functions Print function name for a given address
  -f Alias for --functions
  --help Display this help
  --inlines Print all inlined frames for a given address
  --inlining=false Alias for --no-inlines
  --inlining=true Alias for --inlines
  --inlining Alias for --inlines
  -i Alias for --inlines
  --no-demangle Don't demangle function names
  --no-inlines Do not print inlined frames
  --no-untag-addresses Remove memory tags from addresses before symbolization
  --obj=<file> Path to object file to be symbolized (if not provided, object file should be specified for each input line)
  --output-style=style Specify print style. Supported styles: LLVM, GNU, JSON
  --pretty-print Make the output more human friendly
  --print-address Alias for --addresses
  --print-source-context-lines=<value>
      Print N lines of source file context
  -p Alias for --pretty-print
  --relative-address Interpret addresses as addresses relative to the image base
  --relativenames Strip the compilation directory from paths
  -s Alias for --basenames
  --verbose Print verbose line info
  --version Display the version
  -v Alias for --version
  llvm-symbolizer Mach-O Specific Options:
  --default-arch=<value> Default architecture (for multi-arch objects)
  --dsym-hint=<dir> Path to .dSYM bundles to search for debug info for the object files
  Pass @FILE as argument to read options from FILE.

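So the lookup command becomes:

  prebuilts/clang/host/linux-x86/llvm-binutils-stable/llvm-addr2line -ife out/target/product/aosp/symbols/vendor/lib64/hw/android.hardware.tv.cec@1.0-impl.so 0000000000006954
  _ZN7android8hardware2tv3cec4V1_014implementation7HdmiCec11getPortInfoENSt3__18functionIFvRKNS0_8hidl_vecINS3_12HdmiPortInfoEEEEEE
  hardware/interfaces/tv/cec/1.0/default/HdmiCec.cpp:0

How can it be line 0 of the source? Now it is my turn to crash. (Line 0 usually means the address belongs to compiler-generated code that has no source line of its own, which is consistent with a stack-protector check at the end of the function, but it says nothing about where the overrun happened.)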

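Background

So the generic approach above cannot take us straight to the offending line of code.

Let's see what can be learned from the process name, the function name that was printed (getPortInfo), and the corresponding abort message:

Abort message: 'stack corruption detected (-fstack-protector)'

Stack corruption detected by -fstack-protector

The compiler's -fstack-protector option inserts checks into functions that have on-stack buffers, to guard against buffer overruns. It is enabled by default for platform code, but not for apps. With the option enabled, the compiler adds instructions to the function prologue that write a random value just past the last local on the stack, and instructions to the function epilogue that read the value back and check that it has not changed. If the value has changed, it was overwritten by a buffer overrun, so the epilogue calls __stack_chk_fail to log a message and abort.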

pid: 26717, tid: 26717, name: crasher  >>> crasher <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: 'stack corruption detected'
    r0 00000000  r1 0000685d  r2 00000006  r3 00000008
    r4 ffd516d8  r5 0000685d  r6 0000685d  r7 0000010c
    r8 00000000  r9 00000000  sl 00000000  fp ffd518bc
    ip 00000000  sp ffd516c8  lr ee63ece3  pc ee66ef0c  cpsr 000e0010

backtrace:
    #00 pc 00049f0c  /system/lib/libc.so (tgkill+12)
    #01 pc 00019cdf  /system/lib/libc.so (abort+50)
    #02 pc 0001e07d  /system/lib/libc.so (__libc_fatal+24)
    #03 pc 0004863f  /system/lib/libc.so (__stack_chk_fail+6)
    #04 pc 000013ed  /system/xbin/crasher (smash_stack+76)
    #05 pc 00001591  /system/xbin/crasher (do_action+280)
    #06 pc 00002219  /system/xbin/crasher (main+100)
    #07 pc 000177a1  /system/lib/libc.so (__libc_init+48)
    #08 pc 00001144  /system/xbin/crasher (_start+96)
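
The prologue/epilogue check described above can be imitated by hand. The following is only a sketch: `struct fake_frame`, `FAKE_CANARY` and `copy_and_check` are hypothetical names, the canary is a fixed constant rather than a random per-process value, and the "frame" is an ordinary struct so that the overrun stays inside one object and remains well-defined C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a stack frame: a local buffer followed by a canary.
 * A real compiler lays this out itself; the struct only imitates it. */
struct fake_frame {
    char buf[8];
    uint64_t canary;
};

#define FAKE_CANARY UINT64_C(0x5ca1ab1e0ddba11)

/* Copies n bytes into the frame starting at buf and reports whether the
 * canary survived. Returns 1 if the canary was clobbered, 0 otherwise. */
int copy_and_check(const char *src, size_t n)
{
    struct fake_frame frame;
    frame.canary = FAKE_CANARY;          /* "prologue": plant the canary */

    assert(n <= sizeof(frame));          /* stay inside the model object */
    memcpy((char *)&frame, src, n);      /* n > 8 runs past buf into canary */

    return frame.canary != FAKE_CANARY;  /* "epilogue": verify the canary */
}
```

Copying 8 bytes leaves the canary intact, while copying 12 or 16 bytes changes it. In real instrumented code the equivalent of a non-zero result is the call to __stack_chk_fail seen in the backtraces.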

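0x00 Overview

Stack overflow protection is a mitigation against buffer overflow attacks. When a function has a buffer overflow vulnerability, an attacker can overwrite the return address on the stack so that shellcode gets executed. With stack protection enabled, the function first places a cookie on the stack when it starts executing, and verifies that the cookie is still valid just before it actually returns; if it is not, the program is aborted. An attacker who overwrites the return address usually overwrites the cookie as well, so the check fails and the shellcode never runs. On Linux the cookie is called a canary (the term used from here on).

GCC added the -fstack-protector and -fstack-protector-all options in version 4.1 to support stack protection, and 4.9 added -fstack-protector-strong to widen the coverage. The difference between -fstack-protector and -fstack-protector-strong is this: -fstack-protector guards only functions that call alloca or contain a buffer of at least 8 bytes, whereas -fstack-protector-strong also guards functions that define any local array or take references to local frame addresses.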
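
To make the difference between the options more concrete, here are three function shapes and the option that would typically instrument each one. This is a hedged illustration, not a guarantee: the function names are made up, and the exact heuristics depend on the compiler, its version and the target, so the comments describe typical GCC/Clang behaviour on Linux-like targets.

```c
#include <assert.h>
#include <string.h>

/* Has a 16-byte char buffer: instrumented even by plain -fstack-protector. */
size_t with_char_buffer(const char *s)
{
    char buf[16];
    assert(s != NULL);
    strncpy(buf, s, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return strlen(buf);
}

/* Has a small non-char array: typically only -fstack-protector-strong
 * (or -all) adds a canary here. */
int with_int_array(int a, int b)
{
    int pair[2] = { a, b };
    return pair[0] + pair[1];
}

/* No arrays and no address-taken locals: only -fstack-protector-all
 * instruments this one. */
int plain_add(int a, int b)
{
    return a + b;
}
```

Compiling the file with each option and searching the disassembly for calls to __stack_chk_fail is one way to check what a particular toolchain actually does.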

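There are three kinds of stack in a Linux system:

  1. Application stack: runs in Ring 3 and is maintained by the application;

  2. Kernel process-context stack: runs in Ring 0 and is created by the kernel when a thread is created;

  3. Kernel interrupt-context stack: runs in Ring 0; one is created per CPU core during kernel initialization.

So there is probably a memory overrun somewhere. The customer never saw this problem before multiple HDMI CEC ports were configured; it appeared only after several ports were configured. The Java-layer code is identical in both cases, so the part that changed is the HAL layer.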

Locating the change

Since a fairly specific change triggered the problem, let's start the investigation from that change.

This is where the HAL gets called, and it is also the function the crash points to:

  Return<void> HdmiCec::getPortInfo(getPortInfo_cb _hidl_cb) {
      struct hdmi_port_info* legacyPorts;  // out-parameter: pointer to the port array
      int numPorts;                        // out-parameter: number of ports
      hidl_vec<HdmiPortInfo> portInfos;
      mDevice->get_port_info(mDevice, &legacyPorts, &numPorts);
      portInfos.resize(numPorts);
      // Convert each legacy entry into the HIDL type.
      for (int i = 0; i < numPorts; ++i) {
          portInfos[i] = {
              .type = static_cast<HdmiPortType>(legacyPorts[i].type),
              .portId = static_cast<uint32_t>(legacyPorts[i].port_id),
              .cecSupported = legacyPorts[i].cec_supported != 0,
              .arcSupported = legacyPorts[i].arc_supported != 0,
              .physicalAddress = legacyPorts[i].physical_address
          };
      }
      _hidl_cb(portInfos);
      return Void();
  }

Original version

  struct hdmi_cec_context_t {
      hdmi_cec_device_t device;
      /* our private state goes below here */
      event_callback_t event_callback;
      void* cec_arg;
      struct hdmi_port_info port;
      int fd;
      int en_mask;
      bool enable;
      bool system_control;
      int phy_addr;
      bool hotplug;
      bool cec_init;
  };

  static void hdmi_cec_get_port_info(const struct hdmi_cec_device* dev,
          struct hdmi_port_info* list[], int* total)
  {
      ...
      list[0] = &ctx->port;
      list[0]->type = HDMI_OUTPUT;
      list[0]->port_id = HDMI_CEC_PORT_ID;
      list[0]->cec_supported = support;
      list[0]->arc_supported = 0;
      list[0]->physical_address = val;
      *total = 1;
  }

Problem version

  struct hdmi_cec_context_t {
      hdmi_cec_device_t device;
      /* our private state goes below here */
      event_callback_t event_callback;
      void* cec_arg;
      struct hdmi_port_info port[4];
      int fd;
      int en_mask;
      bool enable;
      bool system_control;
      int phy_addr;
      bool hotplug;
      bool cec_init;
  };

  static void hdmi_cec_get_port_info(const struct hdmi_cec_device* dev,
          struct hdmi_port_info* list[], int* total)
  {
      ...
      list[0] = &ctx->port[0];
      list[0]->type = HDMI_INPUT;
      list[0]->port_id = 1;
      list[0]->cec_supported = support;
      list[0]->arc_supported = 0;
      list[0]->physical_address = 0x1000; // CVT_DEF_ARC_PHYSICAL_ADDRESS;
      list[1] = &ctx->port[1];
      list[1]->type = HDMI_INPUT;
      list[1]->port_id = 2;
      list[1]->cec_supported = support;
      list[1]->arc_supported = 0;
      list[1]->physical_address = 0x3000;
      list[2] = &ctx->port[2];
      list[2]->type = HDMI_INPUT;
      list[2]->port_id = 3;
      list[2]->cec_supported = support;
      list[2]->arc_supported = 0;
      list[2]->physical_address = 0x4000;
      list[3] = &ctx->port[3];
      list[3]->type = HDMI_INPUT;
      list[3]->port_id = 4;
      list[3]->cec_supported = support;
      list[3]->arc_supported = 1;
      list[3]->physical_address = 0x2000;
      *total = 4;
  }

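As tested above, adding only two entries (list[0] and list[1]) does not crash either. It looks like a memory overrun. I went through the related count definitions and limits and found nothing that caps the number at two. It was also reported that a three-port configuration once worked.

Pay attention to how the following variable is defined and passed:

struct hdmi_port_info* legacyPorts;
mDevice->get_port_info(mDevice, &legacyPorts, &numPorts);

Parameters of hdmi_cec_get_port_info

As written, struct hdmi_port_info* list[] reads as an array of pointers, where each element points to a struct hdmi_port_info. As a function parameter, however, C adjusts it to struct hdmi_port_info** list: the callee only receives the address of the caller's first pointer slot and has no way of knowing how many slots actually exist behind it.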
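
The parameter adjustment can be checked with a C11 `_Generic` expression. This is a small sketch with made-up names (`struct port_info` stands in for struct hdmi_port_info, and both functions are hypothetical); it only shows what type the callee really sees.

```c
#include <assert.h>

/* Stand-in for struct hdmi_port_info; the field is hypothetical. */
struct port_info { int port_id; };

/* Same parameter shape as hdmi_cec_get_port_info: an "array of pointers". */
int list_param_is_pointer_to_pointer(struct port_info *list[])
{
    /* _Generic selects on the adjusted type of the parameter. */
    return _Generic(list, struct port_info **: 1, default: 0);
}

/* A caller that owns exactly one pointer, like legacyPorts in getPortInfo. */
int check_with_single_slot(void)
{
    struct port_info *single = 0;
    int is_ptr_to_ptr = list_param_is_pointer_to_pointer(&single);
    assert(is_ptr_to_ptr);   /* the array syntax did not survive */
    return is_ptr_to_ptr;
}
```

Passing `&single` compiles without a warning precisely because the parameter is a pointer to a pointer, so the compiler cannot flag a caller that provides only one slot.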

Fixed version

  static void hdmi_cec_get_port_info(const struct hdmi_cec_device* dev,
          struct hdmi_port_info* list[], int* total)
  {
      ...
      ctx->port[0].type = HDMI_INPUT;
      ctx->port[0].port_id = 1;
      ctx->port[0].cec_supported = 1;
      ctx->port[0].arc_supported = 1;
      ctx->port[0].physical_address = 0x1000;
      ctx->port[1].type = HDMI_INPUT;
      ctx->port[1].port_id = 2;
      ctx->port[1].cec_supported = 1;
      ctx->port[1].arc_supported = 0;
      ctx->port[1].physical_address = 0x2000;
      ctx->port[2].type = HDMI_INPUT;
      ctx->port[2].port_id = 3;
      ctx->port[2].cec_supported = 1;
      ctx->port[2].arc_supported = 0;
      ctx->port[2].physical_address = 0x3000;
      ctx->port[3].type = HDMI_INPUT;
      ctx->port[3].port_id = 4;
      ctx->port[3].cec_supported = 1;
      ctx->port[3].arc_supported = 0;
      ctx->port[3].physical_address = 0x4000;
      *list = &ctx->port[0];
      *total = 4;
  }

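Analysis

Let's go through the steps involved one by one:

1. Defining the `legacyPorts` pointer:

struct hdmi_port_info* legacyPorts;
   This line defines a pointer named `legacyPorts` of type `struct hdmi_port_info*`, that is, a pointer to a `struct hdmi_port_info`. It occupies a single pointer-sized slot on the stack of getPortInfo.

2. Calling hdmi_cec_get_port_info (reached through `mDevice->get_port_info`):

hdmi_cec_get_port_info(mDevice, &legacyPorts, &numPorts);
   Here we pass the address of `legacyPorts` (a pointer to the `legacyPorts` pointer) and the address of `numPorts` (a pointer to the `numPorts` variable) to `hdmi_cec_get_port_info`.

3. Inside `hdmi_cec_get_port_info`: *list = ctx->port;
   In the implementation, `list` is a pointer to a pointer: it holds the address of the caller's `legacyPorts`. `ctx->port` is an array of `struct hdmi_port_info`, which decays to a pointer to its first element, so this is the same as the fixed version's `*list = &ctx->port[0];`.
   The statement `*list = ctx->port;` stores the start address of the `ctx->port` array into the object that `list` points to.
   Because `list` holds the address of `legacyPorts`, `legacyPorts` points to the `ctx->port` array once the call returns.

To sum up: by passing `&legacyPorts`, we hand the address of `legacyPorts` to `hdmi_cec_get_port_info`; inside the function the address of `ctx->port` is stored into `*list`, so `legacyPorts` ends up pointing at the `ctx->port` array. The caller can then use `legacyPorts[i]` to read the port information that was filled into `ctx->port`.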
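
The out-parameter flow walked through above can be reduced to a small self-contained sketch. The types and names here (`struct port_info`, `struct cec_ctx`, `get_port_info`, `sum_port_ids`) are simplified, hypothetical stand-ins rather than the real HAL definitions; only the pointer handling mirrors the fixed version.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real HAL types; names are hypothetical. */
struct port_info { int port_id; int physical_address; };
struct cec_ctx   { struct port_info port[4]; };

/* Fills the context's own array, then hands back one pointer to it. */
void get_port_info(struct cec_ctx *ctx, struct port_info *list[], int *total)
{
    static const int addrs[4] = { 0x1000, 0x2000, 0x3000, 0x4000 };
    assert(ctx != NULL);
    for (int i = 0; i < 4; ++i) {
        ctx->port[i].port_id = i + 1;
        ctx->port[i].physical_address = addrs[i];
    }
    *list = &ctx->port[0];   /* the only write through list: one slot */
    *total = 4;
}

/* Caller side, shaped like HdmiCec::getPortInfo: one pointer, one count. */
int sum_port_ids(struct cec_ctx *ctx)
{
    struct port_info *legacy_ports = NULL;
    int num_ports = 0;
    get_port_info(ctx, &legacy_ports, &num_ports);

    int sum = 0;
    for (int i = 0; i < num_ports; ++i)
        sum += legacy_ports[i].port_id;   /* contiguous array access */
    return sum;
}
```

The callee writes exactly one pointer and one integer into the caller's frame, no matter how many ports it reports, which is why this shape is safe for any port count.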

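Summary

So, with the fixed version analyzed, the reason the problem version fails becomes clear.

In the problem version, we assign every element of struct hdmi_port_info* list[] one by one, starting with list[0] = &ctx->port[0]:

  list[0] = &ctx->port[0];
  ...
  list[3]->physical_address = 0x2000;

At the call site, however, `legacyPorts` is a single pointer of type `struct hdmi_port_info*`, a pointer to a `struct hdmi_port_info`, and only its address is passed in. list[0] is therefore the only slot the callee may write. The stores to list[1], list[2] and list[3] land in the memory right after `legacyPorts` on the stack of getPortInfo; once one of them hits the stack-protector canary, __stack_chk_fail fires when getPortInfo returns. Which neighbouring slots get hit depends on the frame layout, which presumably explains why the two-port configuration, and on one occasion a three-port one, did not crash.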
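
The overrun in the problem version can be reproduced safely with a model of the caller's stack. This is a sketch under stated assumptions: `struct fake_stack` and every other name are hypothetical, and the model deliberately provides four real slots so that the writes stay in bounds. In the real getPortInfo only the first slot exists.

```c
#include <assert.h>
#include <stddef.h>

struct port_info { int port_id; };
struct cec_ctx   { struct port_info port[4]; };

/* Model of the caller's stack: slot[0] plays legacyPorts, slots[1..3] play
 * whatever the compiler placed next to it (for example the canary). */
struct fake_stack {
    struct port_info *slot[4];
};

/* The problem version's pattern: one pointer store per port. */
void buggy_get_port_info(struct cec_ctx *ctx, struct port_info *list[], int *total)
{
    for (int i = 0; i < 4; ++i) {
        list[i] = &ctx->port[i];
        list[i]->port_id = i + 1;
    }
    *total = 4;
}

/* Returns how many neighbouring slots were overwritten. */
int count_clobbered_neighbours(void)
{
    static struct port_info sentinel;   /* marks "untouched" neighbours */
    struct cec_ctx ctx;
    struct fake_stack stack = { { NULL, &sentinel, &sentinel, &sentinel } };
    int total = 0;

    /* Safe only because the model really has four slots behind slot[0]. */
    buggy_get_port_info(&ctx, stack.slot, &total);
    assert(total == 4);

    int clobbered = 0;
    for (int i = 1; i < 4; ++i)
        clobbered += (stack.slot[i] != &sentinel);
    return clobbered;
}
```

All three neighbouring slots are overwritten. With 8-byte pointers on arm64, that corresponds to 24 bytes written past `legacyPorts`.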

References:

Diagnosing Native Crashes | Android Open Source Project

原创技术干货 | 解读Linux安全机制之栈溢出保护 (Understanding Linux security mechanisms: stack overflow protection) - 送码网
