Android System Java/Native Crash and ANR Handling Flow


1. Java crash handling flow in Android

Reference: Android8.0 系统异常处理流程_此男子淡漠-CSDN博客

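Java exposes uncaught exceptions through Thread.UncaughtExceptionHandler, and Android handles them by implementing that interface. The default system handler is installed in RuntimeInit.commonInit(); the path traced here is the one taken when the SystemServer process starts.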
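
To make the mechanism concrete, here is a minimal plain-Java sketch of a process-wide default handler. It is not Android code: the class UncaughtHandlerDemo and the method runAndCapture are hypothetical names introduced for illustration, and the handler only records the error instead of killing the process.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; none of these names come from the Android sources.
class UncaughtHandlerDemo {

    // Starts a thread that throws, and returns what the default handler observed.
    static String runAndCapture(String message) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        // Process-wide fallback handler, playing the role KillApplicationHandler plays in Android.
        Thread.setDefaultUncaughtExceptionHandler(
                (thread, error) -> seen.set(thread.getName() + ": " + error.getMessage()));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException(message);
            }, "worker");
            worker.start();
            // The handler runs on the dying thread, so it has finished once join() returns.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Restore whatever handler was installed before the demo.
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndCapture("boom"));
    }
}
```

The handler is invoked on the thread that is dying, which is why KillApplicationHandler can report the crash and then kill its own process from the same call.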

When Zygote starts SystemServer, it calls ZygoteInit.forkSystemServer(). That method uses handleSystemServerProcess() to set up the SystemServer process, and the call chain ends in RuntimeInit.commonInit():

frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

protected static final void commonInit() {

    Thread.setUncaughtExceptionPreHandler(new LoggingHandler());

    // This is where the default uncaught-exception handler, KillApplicationHandler, is installed

    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler());

   ...

}

KillApplicationHandler looks like this: frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {

    public void uncaughtException(Thread t, Throwable e) {

        try {

            ...

            // 1. mApplicationObject identifies the current application

            ActivityManager.getService().handleApplicationCrash(

                    mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));

        } ...

        finally {

            // Whatever happens, make sure the crashing process does not stay alive

            Process.killProcess(Process.myPid());

            System.exit(10);

        }

    }

}

Note: ActivityManager.getService() above returns a Binder proxy for ActivityManagerService (AMS), so the call crosses into system_server through Binder. Next, see how AMS handles it in handleApplicationCrash():

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

public void handleApplicationCrash(IBinder app,

        ApplicationErrorReport.ParcelableCrashInfo crashInfo) {

    ProcessRecord r = findAppProcess(app, "Crash");

    final String processName = app == null ? "system_server"

            : (r == null ? "unknown" : r.processName);

    handleApplicationCrashInner("crash", r, processName, crashInfo);

}

void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,

        ApplicationErrorReport.CrashInfo crashInfo) {

    // 1. Write the crash information to the event log

    EventLog.writeEvent(EventLogTags.AM_CRASH, Binder.getCallingPid(),

            UserHandle.getUserId(Binder.getCallingUid()), processName,

            r == null ? -1 : r.info.flags,

            crashInfo.exceptionClassName,

            crashInfo.exceptionMessage,

            crashInfo.throwFileName,

            crashInfo.throwLineNumber);

    addErrorToDropBox(eventType, r, processName, null, null, null, null, null, crashInfo);

    // 2.

    mAppErrors.crashApplication(r, crashInfo);

}

Note: comment 1 above records the crash in the event log. Comment 2 calls AppErrors.crashApplication():

frameworks/base/services/core/java/com/android/server/am/AppErrors.java

void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {

    final int callingPid = Binder.getCallingPid();

    final int callingUid = Binder.getCallingUid();

    final long origId = Binder.clearCallingIdentity();

    try {

        // Delegate to the internal crashApplicationInner()

        crashApplicationInner(r, crashInfo, callingPid, callingUid);

    } finally {

        Binder.restoreCallingIdentity(origId);

    }

}

Next, crashApplicationInner(): frameworks/base/services/core/java/com/android/server/am/AppErrors.java

void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,

        int callingPid, int callingUid) {

    ...

    synchronized (mService) {

        // 1. If an IActivityController is registered and has already handled the error, no dialog is shown

        if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,

                timeMillis, callingPid, callingUid)) {

            return;

        }

        ...

        AppErrorDialog.Data data = new AppErrorDialog.Data();

        data.result = result;

        data.proc = r;

        ...

        // 2. Send SHOW_ERROR_UI_MSG to AMS's mUiHandler, which pops up a dialog telling the user that a process crashed

        final Message msg = Message.obtain();

        msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;

        task = data.task;

        msg.obj = data;

        mService.mUiHandler.sendMessage(msg);

    }

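    // 3. AppErrorResult.get() calls wait() internally, so this thread blocks here. Once the user has dealt with the dialog, AppErrorResult.set() is called, and its notifyAll() wakes this thread.

    // Two threads are involved: crashApplicationInner runs on the thread serving the Binder call, while the dialog runs on AMS's UI thread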

   

    int res = result.get();

    Intent appErrorIntent = null;

    MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_CRASH, res);

    // 4. Inspect the user's choice and act on it

    if (res == AppErrorDialog.TIMEOUT || res == AppErrorDialog.CANCEL) {

        res = AppErrorDialog.FORCE_QUIT;

    }

    synchronized (mService) {

        // Stop showing crash dialogs for this app

        if (res == AppErrorDialog.MUTE) {

            stopReportingCrashesLocked(r);

        }

        // Try to restart the process

        if (res == AppErrorDialog.RESTART) {

            mService.removeProcessLocked(r, false, true, "crash");

            if (task != null) {

                try {

                    mService.startActivityFromRecents(task.taskId,

                            ActivityOptions.makeBasic().toBundle());

                } ...

            }

        }

        // Force-stop the process

        if (res == AppErrorDialog.FORCE_QUIT) {

            long orig = Binder.clearCallingIdentity();

            try {

                // Kill it with fire!

                mService.mStackSupervisor.handleAppCrashLocked(r);

                if (!r.persistent) {

                    mService.removeProcessLocked(r, false, false, "crash");

                    mService.mStackSupervisor.resumeFocusedStackTopActivityLocked();

                }

            } finally {

                Binder.restoreCallingIdentity(orig);

            }

        }

        // Stop the process and report the error

        if (res == AppErrorDialog.FORCE_QUIT_AND_REPORT) {

            appErrorIntent = createAppErrorIntentLocked(r, timeMillis, crashInfo);

        }

        ...

    }

    if (appErrorIntent != null) {

        try {

            // Launch the error-report UI

            mContext.startActivityAsUser(appErrorIntent, new UserHandle(r.userId));

        } catch (ActivityNotFoundException e) {

            Slog.w(TAG, "bug report receiver dissappeared", e);

        }

    }

}

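Notes: at comment 1 a crash observer, if one was registered through AMS's setActivityController(), gets the first chance to handle the crash; if it does, no error dialog is shown. Comment 2 sends SHOW_ERROR_UI_MSG to AMS's mUiHandler to request the dialog. Comment 3 blocks the thread by calling AppErrorResult.get(). Two threads are involved here: crashApplicationInner runs on the thread serving the Binder call, and the dialog is shown on AMS's UI thread. How AppErrorResult works is covered later. Once the user acts on the dialog or the timeout expires, get() wakes up and returns the result. Comment 4 then branches on that result, for example force-stopping or restarting the process.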
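
The branching at comment 4 can be summarized in a small plain-Java sketch. The enum and the class CrashDialogOutcome are hypothetical stand-ins for the AppErrorDialog constants, and the returned strings only describe what the real code does; nothing here calls into AMS.

```java
// Hypothetical stand-in for the AppErrorDialog result constants; not the AOSP class.
class CrashDialogOutcome {

    enum Result { FORCE_QUIT, FORCE_QUIT_AND_REPORT, RESTART, MUTE, TIMEOUT, CANCEL }

    // A timeout or a cancelled dialog is treated exactly like FORCE_QUIT.
    static Result normalize(Result raw) {
        return (raw == Result.TIMEOUT || raw == Result.CANCEL) ? Result.FORCE_QUIT : raw;
    }

    // Describes the branch taken for each normalized result.
    static String describe(Result raw) {
        switch (normalize(raw)) {
            case MUTE:
                return "stop reporting crashes for this app";
            case RESTART:
                return "remove the process and relaunch its task";
            case FORCE_QUIT:
                return "kill the process";
            case FORCE_QUIT_AND_REPORT:
                return "build the report intent and start the report UI";
            default:
                return "no action";
        }
    }

    public static void main(String[] args) {
        for (Result r : Result.values()) {
            System.out.println(r + " -> " + describe(r));
        }
    }
}
```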

Showing the crash dialog and handling the user's choice

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

final class UiHandler extends Handler {

    @Override

    public void handleMessage(Message msg) {

        switch (msg.what) {

        // Show the crash dialog

        case SHOW_ERROR_UI_MSG: {

            mAppErrors.handleShowAppErrorUi(msg);

            ensureBootCompleted();

        } break;

        // Show the ANR dialog

        case SHOW_NOT_RESPONDING_UI_MSG: {

            mAppErrors.handleShowAnrUi(msg);

            ensureBootCompleted();

        } break;

        ...

}

UiHandler handles both the crash dialog and the ANR dialog. For the crash dialog, the work is again delegated to AppErrors. frameworks/base/services/core/java/com/android/server/am/AppErrors.java

void handleShowAppErrorUi(Message msg) {

    ...

    synchronized (mService) {

        ProcessRecord proc = data.proc;

        AppErrorResult res = data.result;

        // 1. A crash dialog is already showing for this process, so do not show another

        if (proc != null && proc.crashDialog != null) {

            if (res != null) {

                res.set(AppErrorDialog.ALREADY_SHOWING);

            }

            return;

        }

       

       ...

        final boolean crashSilenced = mAppsNotReportingCrashes != null &&

                mAppsNotReportingCrashes.contains(proc.info.packageName);

        if ((mService.canShowErrorDialogs() || showBackground) && !crashSilenced) {

            // 2. Create the crash dialog

            proc.crashDialog = new AppErrorDialog(mContext, mService, data);

        } else {

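            // 3. AMS does not allow error dialogs right now (for example the device is asleep), or crashes are silenced for this app, so no dialog is shown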

            if (res != null) {

                res.set(AppErrorDialog.CANT_SHOW);

            }

        }

    }

    // 4. Call Dialog.show() to display the crash dialog

    if(data.proc.crashDialog != null) {

        data.proc.crashDialog.show();

    }

}

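Notes: comment 1 first checks whether a dialog is already showing for the crashed process; if so, nothing more is shown. At comment 2 the screen is on and AMS allows crash dialogs, so the dialog is created; otherwise comment 3 applies and the result records that no dialog is shown. Reaching comment 4 with a dialog means it must be displayed, so Dialog.show() is called. About res.set() at comments 1 and 3: res is the AppErrorResult created in crashApplicationInner. After asking AMS to show the dialog, that method blocks in result.get(); calling set() wakes the Binder thread, which continues and evaluates the result.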
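
The decision made in handleShowAppErrorUi can be reduced to a pure function of four flags. The sketch below is a simplification under that assumption: CrashDialogDecision, Decision and decide are hypothetical names, and the real method also handles cases hidden behind the elided "..." sections.

```java
// Hypothetical reduction of the dialog decision to a pure function; not AOSP code.
class CrashDialogDecision {

    enum Decision { ALREADY_SHOWING, SHOW_DIALOG, CANT_SHOW }

    static Decision decide(boolean dialogAlreadyShown, boolean canShowErrorDialogs,
                           boolean showBackground, boolean crashSilenced) {
        // Comment 1: one dialog per process is enough.
        if (dialogAlreadyShown) {
            return Decision.ALREADY_SHOWING;
        }
        // Comment 2: dialogs are allowed (or forced for background apps) and the app is not muted.
        if ((canShowErrorDialogs || showBackground) && !crashSilenced) {
            return Decision.SHOW_DIALOG;
        }
        // Comment 3: report back that nothing can be shown, which unblocks the waiting thread.
        return Decision.CANT_SHOW;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, true, false, false));
        System.out.println(decide(false, false, false, false));
    }
}
```

In both non-dialog outcomes the real code calls res.set(...), which matters because the Binder thread would otherwise stay blocked in get().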

Here is how AppErrorResult implements get() and set():

frameworks/base/services/core/java/com/android/server/am/AppErrorResult.java

final class AppErrorResult {

    public void set(int res) {

        synchronized (this) {

            mHasResult = true;

            // 1. set() stores the value in mResult

            mResult = res;

            // 2. notifyAll() wakes every thread that is waiting on this object's monitor

            notifyAll();

        }

    }

    public int get() {

        synchronized (this) {

            while (!mHasResult) {

                try {

                    // 3. wait() is what actually blocks the current thread

                    wait();

                } catch (InterruptedException e) {

                }

            }

        }

        // 4. Return mResult

        return mResult;

    }

    boolean mHasResult = false;

    int mResult;

}

get() blocks the calling thread; set() updates mResult and wakes the waiting threads. get() then resumes after wait() and returns the mResult value that set() stored.
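
The same hand-off can be tried outside Android. The sketch below is a plain-Java analogue of AppErrorResult: BlockingResultDemo, BlockingResult and demo are hypothetical names, the "ui" thread stands in for AMS's UI thread, and the 50 ms sleep stands in for the user thinking about the dialog.

```java
// Plain-Java analogue of the AppErrorResult hand-off; names are hypothetical.
class BlockingResultDemo {

    static final class BlockingResult {
        private boolean hasResult = false;
        private int result;

        // Called by the thread that owns the dialog.
        synchronized void set(int value) {
            hasResult = true;
            result = value;
            notifyAll(); // wake every thread waiting in get()
        }

        // Called by the thread that needs the answer; blocks until set() has run.
        synchronized int get() {
            boolean interrupted = false;
            while (!hasResult) { // loop guards against spurious wakeups
                try {
                    wait();
                } catch (InterruptedException e) {
                    interrupted = true;
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
            return result;
        }
    }

    // The caller plays the Binder thread; a second thread plays the UI thread.
    static int demo(int answer) {
        BlockingResult result = new BlockingResult();
        Thread ui = new Thread(() -> {
            try {
                Thread.sleep(50); // pretend the user is reading the dialog
            } catch (InterruptedException ignored) {
                // fall through and deliver the answer anyway
            }
            result.set(answer);
        }, "ui");
        ui.start();
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(3));
    }
}
```

The while loop around wait() is the important design choice: a thread may wake without set() having been called, so the flag has to be rechecked every time.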

Handling the user's choice or the timeout once the dialog is showing

frameworks/base/services/core/java/com/android/server/am/AppErrorDialog.java

@Override

public void onClick(View v) {

    // 1. Decide what to do based on which view was clicked

    switch (v.getId()) {

        // Request a process restart

        case com.android.internal.R.id.aerr_restart:

            mHandler.obtainMessage(RESTART).sendToTarget();

            break;

        // Request to send feedback about the crash

        case com.android.internal.R.id.aerr_report:

            mHandler.obtainMessage(FORCE_QUIT_AND_REPORT).sendToTarget();

            break;

        // Request to close the crash dialog and kill the process

        case com.android.internal.R.id.aerr_close:

            mHandler.obtainMessage(FORCE_QUIT).sendToTarget();

            break;

        // Request to stop showing the dialog

        case com.android.internal.R.id.aerr_mute:

            mHandler.obtainMessage(MUTE).sendToTarget();

            break;

        default:

            break;

    }

}

   

// 2. On receiving the request, call setResult() and dismiss the dialog

private final Handler mHandler = new Handler() {

    public void handleMessage(Message msg) {

        setResult(msg.what);

        dismiss();

    }

};

private void setResult(int result) {

    synchronized (mService) {

        if (mProc != null && mProc.crashDialog == AppErrorDialog.this) {

            mProc.crashDialog = null;

        }

    }

    // 3. AppErrorResult.set() unblocks the waiting thread and hands it the user's choice

    mResult.set(result);

    mHandler.removeMessages(TIMEOUT);

}

As shown above, mResult.set() finally wakes the blocked thread, which then continues executing:

frameworks/base/services/core/java/com/android/server/am/AppErrors.java

void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,

        int callingPid, int callingUid) {

    ...

    // 3. Block until the timeout expires or the user acts on the dialog

    int res = result.get();

    // 4. Inspect the user's choice and act on it

    ...

}

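Cleanup afterwards

From the flow above, a crashed process always ends up killed. AMS then still has cleanup work to do.

First, recall part of what happens when a newly started process registers with AMS: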

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

// After a process starts, its ActivityThread attaches to AMS

private final boolean attachApplicationLocked(IApplicationThread thread,

            int pid) {

    ...

    final String processName = app.processName;

    try {

        // 1. Create the death recipient (the "obituary" receiver)

        AppDeathRecipient adr = new AppDeathRecipient(

                app, pid, thread);

        thread.asBinder().linkToDeath(adr, 0);

        app.deathRecipient = adr;

    }

    ...

}

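When a process attaches to AMS, AMS links a death recipient to that process's Binder object.

So when the crashed process is killed, AppDeathRecipient.binderDied() is called back. The source shows that binderDied() in turn calls appDiedLocked():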

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,

        boolean fromBinderDied) {

    ...

    // 1. If the process has not been killed yet, kill it

    if (!app.killed) {

        if (!fromBinderDied) {

            killProcessQuiet(pid);

        }

        killProcessGroup(app.uid, pid);

        app.killed = true;

    }

    if (app.pid == pid && app.thread != null &&

            app.thread.asBinder() == thread.asBinder()) {

        ...

        // 2.

        handleAppDiedLocked(app, false, true);

        ...

    } ...

}

Note: comment 1 kills the process; comment 2 is where the app's death is actually handled.
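
The linkToDeath/binderDied pairing is an observer pattern keyed on process death. The sketch below imitates it in plain Java without Binder: DeathLinkDemo, FakeProcess and DeathRecipient are hypothetical types, and kill() stands in for the kernel reporting that the remote process is gone.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of death notification; no Binder involved, all names hypothetical.
class DeathLinkDemo {

    interface DeathRecipient {
        void onDied();
    }

    static final class FakeProcess {
        private final List<DeathRecipient> recipients = new ArrayList<>();
        private boolean alive = true;

        // Register interest in this process's death, as AMS does when a process attaches.
        void linkToDeath(DeathRecipient recipient) {
            if (!alive) {
                throw new IllegalStateException("process is already dead");
            }
            recipients.add(recipient);
        }

        // Deliver each notification exactly once, however often kill() is called.
        void kill() {
            if (!alive) {
                return;
            }
            alive = false;
            for (DeathRecipient recipient : recipients) {
                recipient.onDied();
            }
        }
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        FakeProcess process = new FakeProcess();
        process.linkToDeath(() -> log.add("clean up services, providers and receivers"));
        process.linkToDeath(() -> log.add("remove from LRU list"));
        process.kill();
        process.kill(); // second call is a no-op
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```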

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

private final void handleAppDiedLocked(ProcessRecord app,

        boolean restarting, boolean allowRestart) {

    int pid = app.pid;

    // 1. Clean up the process's services, ContentProviders, BroadcastReceivers and so on

    boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1,

            false );

    if (!kept && !restarting) {

        removeLruProcessLocked(app);

        if (pid > 0) {

            ProcessList.remove(pid);

        }

    }

    ...

    // 2. Check whether the process still had visible activities

    boolean hasVisibleActivities = mStackSupervisor.handleAppDiedLocked(app);

    // Clear the activity list

    app.activities.clear();

    ...

    try {

        if (!restarting && hasVisibleActivities

                && !mStackSupervisor.resumeFocusedStackTopActivityLocked()) {

            // 3. If the crashed process had visible activities, AMS still makes sure all visible activities are running, so the process is restarted

            mStackSupervisor.ensureActivitiesVisibleLocked(null, 0, !PRESERVE_WINDOWS);

        }

    } finally {

        mWindowManager.continueSurfaceLayout();

    }

}

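Notes: in comment 1, the important case is bound services in the crashed process: the connections between the service and its clients are cleaned up, and clients whose importance is too low may be killed outright. Comment 2 checks whether the app still had visible activities. At comment 3 the system has to keep visible activities running, so it restarts the process.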

2. Native crash handling flow in Android

Reference: Android稳定性系列8 Native crash处理流程_liuwg1226的专栏-CSDN博客

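System-wide, crashes fall into framework/app crashes, native crashes, and kernel crashes.

(1) A framework-level or app-level crash (a Java crash) is usually caused by an uncaught exception.

(2) A kernel crash is in many cases a kernel panic; the cause is often a driver or hardware fault.

(3) A native crash is a crash in C/C++ code, the layer between the framework and Linux. This is what the rest of this section covers.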

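While system_server starts, startOtherServices() launches the remaining system services, and at that point it also creates a NativeCrashListener (a Thread subclass) to listen for native crash events. The listener waits on a socket for debuggerd to connect and then handles the event. NativeCrashListener#run() then leads to AMS#handleApplicationCrashInner(), which processes the crash.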

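NativeCrashListener's main jobs:

(1) Create the server socket "/data/system/ndebugsocket";

(2) Wait for the socket client (debuggerd) to connect;

(3) Call NativeCrashListener#consumeNativeCrashData to process the native crash information;

(4) Acknowledge the connection by writing a reply message back to the debuggerd process.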
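
Step (3) amounts to decoding a small binary message. The sketch below assumes a layout of two 4-byte big-endian integers (pid, then signal) followed by the report text; that layout is an assumption made for illustration and should be checked against the NativeCrashListener source of the AOSP version in use. CrashPacketDemo, CrashPacket, parse and encode are hypothetical names, and the sketch works on a byte array rather than a real socket.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Decoder for an ASSUMED wire layout: [pid:int32 BE][signal:int32 BE][report text].
class CrashPacketDemo {

    static final class CrashPacket {
        final int pid;
        final int signal;
        final String report;

        CrashPacket(int pid, int signal, String report) {
            this.pid = pid;
            this.signal = signal;
            this.report = report;
        }
    }

    static CrashPacket parse(byte[] data) {
        if (data.length < 8) {
            throw new IllegalArgumentException("header needs 8 bytes, got " + data.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        int pid = buf.getInt();
        int signal = buf.getInt();
        // Everything after the header is treated as the textual crash report.
        String report = new String(data, 8, data.length - 8, StandardCharsets.UTF_8);
        return new CrashPacket(pid, signal, report);
    }

    // Test helper that plays the debuggerd side of the exchange.
    static byte[] encode(int pid, int signal, String report) {
        byte[] text = report.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + text.length).order(ByteOrder.BIG_ENDIAN)
                .putInt(pid).putInt(signal).put(text).array();
    }

    public static void main(String[] args) {
        CrashPacket p = parse(encode(7441, 11, "backtrace: ..."));
        System.out.println("Read pid=" + p.pid + " signal=" + p.signal);
    }
}
```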

The core of native crash handling is done by the debuggerd daemon. To understand native crashes, start from __linker_init, which the program entry point in begin.S calls.

2.1 begin.S

arch/arm/begin.S

ENTRY(_start)

  mov r0, sp

  // entry point [see section 2.2]

  bl __linker_init

 

  mov pc, r0

END(_start)

2.2 __linker_init

linker.cpp

extern "C" ElfW(Addr) __linker_init(void* raw_args) {

  KernelArgumentBlock args(raw_args);

  ElfW(Addr) linker_addr = args.getauxval(AT_BASE);

  ...

  // [see section 2.3]

  ElfW(Addr) start_address = __linker_init_post_relocation(args, linker_addr);

  return start_address;

}

2.3 __linker_init_post_relocation

linker.cpp

static ElfW(Addr) __linker_init_post_relocation(KernelArgumentBlock& args, ElfW(Addr) linker_base) {

  ...

  // Sanitize the environment.

  __libc_init_AT_SECURE(args);

  // Initialize system properties

  __system_properties_init();

  // [see section 2.4]

  debuggerd_init();

  ...

}

2.4 debuggerd_init

linker/debugger.cpp

__LIBC_HIDDEN__ void debuggerd_init() {

  struct sigaction action;

  memset(&action, 0, sizeof(action));

  sigemptyset(&action.sa_mask);

  // Signal handler; it goes on to call send_debuggerd_packet [see section 2.6]

  action.sa_sigaction = debuggerd_signal_handler;

  // SA_RESTART: a syscall interrupted by the signal is restarted automatically

  // SA_SIGINFO: the handler also receives a siginfo_t describing the signal

  action.sa_flags = SA_RESTART | SA_SIGINFO;

  // Use the alternate signal stack (if available) so that stack overflows can be caught

  action.sa_flags |= SA_ONSTACK;

  sigaction(SIGABRT, &action, nullptr);

  sigaction(SIGBUS, &action, nullptr);

  sigaction(SIGFPE, &action, nullptr);

  sigaction(SIGILL, &action, nullptr);

  sigaction(SIGPIPE, &action, nullptr);

  sigaction(SIGSEGV, &action, nullptr);

#if defined(SIGSTKFLT)

  sigaction(SIGSTKFLT, &action, nullptr);

#endif

  sigaction(SIGTRAP, &action, nullptr);

}

2.6 send_debuggerd_packet

linker/debugger.cpp

static void send_debuggerd_packet(siginfo_t* info) {

  // Mutex that stops several crashing threads from talking to debuggerd at the same time

  static pthread_mutex_t crash_mutex = PTHREAD_MUTEX_INITIALIZER;

  int ret = pthread_mutex_trylock(&crash_mutex);

  if (ret != 0) {

    if (ret == EBUSY) {

      __libc_format_log(ANDROID_LOG_INFO, "libc",

          "Another thread contacted debuggerd first; not contacting debuggerd.");

      // Block until the other thread releases the lock, then take it

      pthread_mutex_lock(&crash_mutex);

    }

    return;

  }

  // Open the socket channel to debuggerd

  int s = socket_abstract_client(DEBUGGER_SOCKET_NAME, SOCK_STREAM | SOCK_CLOEXEC);

  ...

  debugger_msg_t msg;

  msg.action = DEBUGGER_ACTION_CRASH;

  msg.tid = gettid();

  msg.abort_msg_address = reinterpret_cast<uintptr_t>(g_abort_message);

  msg.original_si_code = (info != nullptr) ? info->si_code : 0;

  // Send the DEBUGGER_ACTION_CRASH message to the debuggerd server

  ret = TEMP_FAILURE_RETRY(write(s, &msg, sizeof(msg)));

  if (ret == sizeof(msg)) {

    char debuggerd_ack;

    // Block until the debuggerd server replies

    ret = TEMP_FAILURE_RETRY(read(s, &debuggerd_ack, 1));

    int saved_errno = errno;

    notify_gdb_of_libraries();

    errno = saved_errno;

  }

  close(s);

}

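What this function does:

Calls socket_abstract_client to open the socket channel to debuggerd;

Sends a message with action = DEBUGGER_ACTION_CRASH to the debuggerd server;

Blocks until the debuggerd server replies.

Next, see how the debuggerd server handles DEBUGGER_ACTION_CRASH.

debuggerd server side

Once started, the debuggerd daemon waits for socket clients to connect. When a native crash occurs, the crashing process sends debuggerd a message with action = DEBUGGER_ACTION_CRASH.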

2.1 do_server

/debuggerd/debuggerd.cpp

static int do_server() {

  ...

  for (;;) {

    sockaddr_storage ss;

    sockaddr* addrp = reinterpret_cast<sockaddr*>(&ss);

    socklen_t alen = sizeof(ss);

    // Wait for a client to connect

    int fd = accept4(s, addrp, &alen, SOCK_CLOEXEC);

    if (fd == -1) {

      continue; // accept failed

    }

    // Handle the request sent by the crashing process

    handle_request(fd);

  }

  return 0;

}

The call chain eventually reaches worker_process. The server handles the client's request in a child process.

/debuggerd/debuggerd.cpp

static void worker_process(int fd, debugger_request_t& request) {

  std::string tombstone_path;

  int tombstone_fd = -1;

  switch (request.action) {

    case DEBUGGER_ACTION_CRASH:

      // Open the tombstone file

      tombstone_fd = open_tombstone(&tombstone_path);

      if (tombstone_fd == -1) {

        exit(1); // cannot open the tombstone file, so exit this process

      }

      break;

    ...

  }

  ...

  if (!attach_gdb) {

    // Report the crash to AMS

    activity_manager_write(request.pid, crash_signal, amfd, *amfd_data.get());

  }

  ...

}

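The whole procedure is fairly involved. Only the attach_gdb=false path is described here:

(1) For DEBUGGER_ACTION_CRASH, call open_tombstone and continue;

(2) Call ptrace to attach to the target process;

(3) Call BacktraceMap::Create to prepare the backtrace;

(4) For DEBUGGER_ACTION_CRASH, call activity_manager_connect;

(5) Call drop_privileges to give up privileged mode;

(6) Call perform_dump to do the dump;

(7) For fatal signals such as SIGBUS, call engrave_tombstone(); this is the core function;

(8) Call activity_manager_write to report the crash to AMS;

(9) Call ptrace to detach from the target process;

(10) For DEBUGGER_ACTION_CRASH, send SIGKILL to the target thread (tid).

Note: activity_manager_connect() above opens the socket connection to the ActivityManager in the framework. The server end of the "/data/system/ndebugsocket" socket is created and started in NativeCrashListener.java.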

3. ANR handling flow in Android

Reference: 深入探索Android稳定性优化 – Android开发中文站

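ANR (Application Not Responding) means an application has stopped responding. Android expects certain events to complete within a fixed time; if the app gives no effective response within that time, or takes too long, an ANR is raised. Typically a dialog then tells the user that xxx is not responding and offers a choice between waiting and Force Close.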

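Types of ANR:

(1) KeyDispatchTimeout (5 seconds): a key or touch event was not handled in time (usually the main UI thread is doing slow work; this is the most common ANR).

(2) BroadcastTimeout (10 seconds): dispatching and handling a broadcast did not finish within 10 s (usually onReceive() runs too long).

(3) ServiceTimeout (20 seconds): starting or executing a Service exceeded 20 s.

(4) ContentProviderTimeout (10 seconds): a ContentProvider did not finish within 10 s.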
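
The four limits can be captured in a small lookup, sketched below in plain Java. The numbers are the ones listed in this article; actual values depend on the Android version and on whether the component runs in the foreground or background. AnrTimeouts, Kind and isAnr are hypothetical names.

```java
// Timeouts as listed in this article; real values vary by Android version and component state.
class AnrTimeouts {

    enum Kind {
        KEY_DISPATCH(5_000),
        BROADCAST(10_000),
        SERVICE(20_000),
        CONTENT_PROVIDER(10_000);

        final long timeoutMs;

        Kind(long timeoutMs) {
            this.timeoutMs = timeoutMs;
        }
    }

    // True when the work took longer than the limit for its kind.
    static boolean isAnr(Kind kind, long elapsedMs) {
        return elapsedMs > kind.timeoutMs;
    }

    public static void main(String[] args) {
        System.out.println(isAnr(Kind.KEY_DISPATCH, 6_000));
        System.out.println(isAnr(Kind.SERVICE, 6_000));
    }
}
```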

ActivityManagerService.appNotResponding() is called when an app stops responding, that is, on an ANR:

/frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

    final void appNotResponding(ProcessRecord app, ActivityRecord activity,

            ActivityRecord parent, boolean aboveSystem, final String annotation) {

        ...

        updateCpuStatsNow(); // first update of the CPU statistics

        synchronized (this) {

          // PowerManager.reboot() can block for a long time, so ignore ANRs during shutdown

          if (mShuttingDown) {

              return;

          } else if (app.notResponding) {

              return;

          } else if (app.crashing) {

              return;

          }

          // Record the ANR in the event log

          EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,

                  app.processName, app.info.flags, annotation);

          // Add the ANR process to firstPids

          firstPids.add(app.pid);

          int parentPid = app.pid;

          // Add the system_server process to firstPids

          if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);

          for (int i = mLruProcesses.size() - 1; i >= 0; i--) {

              ProcessRecord r = mLruProcesses.get(i);

              if (r != null && r.thread != null) {

                  int pid = r.pid;

                  if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {

                      if (r.persistent) {

                          firstPids.add(pid); // persistent processes go into firstPids

                      } else {

                          lastPids.put(pid, Boolean.TRUE); // all other processes go into lastPids

                      }

                  }

              }

          }

        }

        // Build the ANR message for the main log

        StringBuilder info = new StringBuilder();

        info.setLength(0);

        info.append("ANR in ").append(app.processName);

        if (activity != null && activity.shortComponentName != null) {

            info.append(" (").append(activity.shortComponentName).append(")");

        }

        info.append("\n");

        info.append("PID: ").append(app.pid).append("\n");

        if (annotation != null) {

            info.append("Reason: ").append(annotation).append("\n");

        }

        if (parent != null && parent != activity) {

            info.append("Parent: ").append(parent.shortComponentName).append("\n");

        }

        // Create the CPU tracker

        final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

        // Dump the stack traces

        File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker,

                lastPids, NATIVE_STACKS_OF_INTEREST);

        updateCpuStatsNow(); // second update of the CPU statistics

        // Record the current CPU usage of each process

        synchronized (mProcessCpuTracker) {

            cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);

        }

        // Record the current CPU load

        info.append(processCpuTracker.printCurrentLoad());

        info.append(cpuInfo);

        // Record CPU usage since the ANR time

        info.append(processCpuTracker.printCurrentState(anrTime));

        // Log the ANR reason together with CPU usage and load

        Slog.e(TAG, info.toString());

        // Save the traces file and the CPU usage to DropBox, i.e. the /data/system/dropbox directory

        addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,

                cpuInfo, tracesFile, null);

        synchronized (this) {

            ...

            // For a background ANR, just kill the process

            if (!showBackground && !app.isInterestingToUserLocked() && app.pid != MY_PID) {

                app.kill("bg anr", true);

                return;

            }

            // Mark the app as not responding and look up the error-report receiver

            makeAppNotRespondingLocked(app,

                    activity != null ? activity.shortComponentName : null,

                    annotation != null ? "ANR " + annotation : "ANR",

                    info.toString());

            // Rename the trace file

            String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);

            if (tracesPath != null && tracesPath.length() != 0) {

                //traceRenameFile = "/data/anr/traces.txt"

                File traceRenameFile = new File(tracesPath);

                String newTracesPath;

                int lpos = tracesPath.lastIndexOf (".");

                if (-1 != lpos)

                    // New traces file = /data/anr/traces_<process name>_<current date>.txt

                    newTracesPath = tracesPath.substring (0, lpos) + "_" + app.processName + "_" + mTraceDateFormat.format(new Date()) + tracesPath.substring (lpos);

                else

                    newTracesPath = tracesPath + "_" + app.processName;

                traceRenameFile.renameTo(new File(newTracesPath));

            }

            // Pop up the ANR dialog

            Message msg = Message.obtain();

            HashMap map = new HashMap();

            msg.what = SHOW_NOT_RESPONDING_MSG;

            msg.obj = map;

            msg.arg1 = aboveSystem ? 1 : 0;

            map.put("app", app);

            if (activity != null) {

                map.put("activity", activity);

            }

            // Send the SHOW_NOT_RESPONDING_MSG message to the UI thread

            mUiHandler.sendMessage(msg);

        }

    }
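
The trace-file renaming step inside appNotResponding can be isolated as a pure string function. The sketch below mirrors the substring logic shown above; TracePathDemo and renamedTracePath are hypothetical names, and the timestamp is passed in as a string because the pattern behind mTraceDateFormat is not shown in the excerpt.

```java
// Isolated version of the path arithmetic; the date format is left to the caller.
class TracePathDemo {

    static String renamedTracePath(String tracesPath, String processName, String timestamp) {
        int dot = tracesPath.lastIndexOf('.');
        if (dot != -1) {
            // Insert "_<process>_<timestamp>" in front of the extension.
            return tracesPath.substring(0, dot) + "_" + processName + "_" + timestamp
                    + tracesPath.substring(dot);
        }
        // No extension: only the process name is appended, as in the original code.
        return tracesPath + "_" + processName;
    }

    public static void main(String[] args) {
        System.out.println(renamedTracePath("/data/anr/traces.txt", "com.example.app", "20221021"));
    }
}
```

Note that lastIndexOf('.') looks at the whole path, so a dot in a directory name would be picked up when the file name itself has no extension.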

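When an ANR occurs, the following steps run in order:

(1) Write the ANR reason to the event log. This means the am_anr entry in the event log is the closest record of when the ANR was triggered;

(2) Collect and write the stack traces of every thread in the important processes; this step is slow;

(3) Write the current CPU usage of each process and the CPU load;

(4) Save the traces file and the CPU usage to DropBox, i.e. the /data/system/dropbox directory;

(5) Depending on the process type, either kill it silently in the background or show a dialog to the user.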

On an ANR, traces are dumped for these important processes:

(1) firstPids: first the ANR process, then system_server, then all persistent processes;

(2) Native list: the mediaserver, sdcard and surfaceflinger processes under /system/bin/;

(3) lastPids: every process in mLruProcesses that is not in firstPids.
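
The split into firstPids and lastPids can be reproduced with a short loop. The sketch below follows the ordering in appNotResponding but uses plain lists instead of ProcessRecord and SparseArray; AnrPidOrderDemo, Proc and classify are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Reproduces the firstPids / lastPids ordering with plain collections; names are hypothetical.
class AnrPidOrderDemo {

    static final class Proc {
        final int pid;
        final boolean persistent;

        Proc(int pid, boolean persistent) {
            this.pid = pid;
            this.persistent = persistent;
        }
    }

    // Returns [firstPids, lastPids]; lru is ordered from least to most recently used.
    static List<List<Integer>> classify(int anrPid, int systemServerPid, List<Proc> lru) {
        List<Integer> first = new ArrayList<>();
        List<Integer> last = new ArrayList<>();
        first.add(anrPid); // the ANR process is always dumped first
        if (systemServerPid != anrPid) {
            first.add(systemServerPid); // then system_server
        }
        // Walk from most to least recently used, as the original loop does.
        for (int i = lru.size() - 1; i >= 0; i--) {
            Proc p = lru.get(i);
            if (p.pid <= 0 || p.pid == anrPid || p.pid == systemServerPid) {
                continue;
            }
            if (p.persistent) {
                first.add(p.pid);
            } else {
                last.add(p.pid);
            }
        }
        return Arrays.asList(first, last);
    }

    public static void main(String[] args) {
        List<Proc> lru = Arrays.asList(new Proc(100, false), new Proc(200, true),
                new Proc(300, false), new Proc(42, false));
        System.out.println(classify(42, 1000, lru));
    }
}
```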

5. Summary:

(1) Java/native crash entry point: AMS#handleApplicationCrashInner (note: a native crash in an app process also ends up here; NativeCrashListener#consumeNativeCrashData uses NativeCrashReporter to call into AMS)

(2) Native crash entry point: NativeCrashListener#consumeNativeCrashData (note: a crash in a native daemon such as wpa_supplicant ends up here)

// Key log lines:

NativeCrashListener: Read pid=7441 signal=11

/system/bin/tombstoned: Tombstone written to: /data/tombstones/tombstone_04

BootReceiver: Copying /data/tombstones/tombstone_04 to DropBox (SYSTEM_TOMBSTONE)

(3) ANR entry point: AMS#appNotResponding

Note: the dialog is only shown if the corresponding option is enabled in Developer options.


Original URL: https://54852.com/zaji/3977041.html
