2015年6月3日 星期三

Debug ANR issue

1. 利用 "InputDispatcher: Application is not responding" 做為關鍵字搜尋main log,找出發生anr的時間點、造成anr的兇手和ANR的類型

Ex: 05-20 13:08:48.863   819   944 I InputDispatcher: Application is not responding: AppWindowToken{eb23daa token=Token{33133f87 ActivityRecord{38efa3c6 u0 com.sony.nfx.app.sfrc/.npam.InitialActivity t405}}} - Window{38ae425f u0 com.sony.nfx.app.sfrc/com.sony.nfx.app.sfrc.npam.InitialActivity}.  It has been 15007.0ms since event, 15001.4ms since wait started.  Reason: Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 3.

2. 尋找ANR發生點附近的ActivityManager的資訊,可以得知ANR的類型和當時CPU的使用情況

如果CPU使用量接近100%,說明當前設備很忙,有可能是CPU飢餓導致了ANR
如果CPU使用量很少,說明主線程被BLOCK了
如果IOwait很高,說明ANR有可能是主線程在進行I/O操作造成的

04-0113:12:15.872 E/ActivityManager( 220): ANR in com.android.email(com.android.email/.activity.SplitScreenActivity)
04-0113:12:15.872 E/ActivityManager( 220): Reason:keyDispatchingTimedOut
04-0113:12:15.872 E/ActivityManager( 220): Load: 8.68 / 8.37 / 8.53
..............................
04-0113:12:15.872 E/ActivityManager( 220): 32%TOTAL: 28% user + 3.7% kernel

3. 分析 traces log

首先取得traces log檔案:
adb pull data/anr/traces.txt ./mytraces.txt

然後再分析traces log
使用關鍵字Cmdline: 搜尋,並找出發生ANR的兇手的traces log,並檢查 call stack,尋找造成ANR的原因

DALVIK THREADS:
(mutexes: tll=0tsl=0 tscl=0 ghl=0 hwl=0 hwll=0)
"main" prio=5 tid=1NATIVE                    // 表示process 沒有被 block住


DALVIK THREADS:
(mutexes: tll=0tsl=0 tscl=0 ghl=0 hwl=0 hwll=0)
"main" prio=5 tid=1BLOCK                     // 表示process 被block住

4. 常見ANR發生的原因

案例一:
CPU IOWait很高,說明當前系統在忙於I/O,因此資料庫操作被阻塞
atjava.lang.Thread.parkFor(Thread.java:1424)
atjava.lang.LangAccessImpl.parkFor(LangAccessImpl.java:48)
atsun.misc.Unsafe.park(Unsafe.java:337)
atjava.util.concurrent.locks.LockSupport.park(LockSupport.java:157)
atjava.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:808)
atjava.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:841)
atjava.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1171)
atjava.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:200)
atjava.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:261)
atandroid.database.sqlite.SQLiteDatabase.lock(SQLiteDatabase.java:378)
atandroid.database.sqlite.SQLiteCursor.(SQLiteCursor.java:222)
atandroid.database.sqlite.SQLiteDirectCursorDriver.query(SQLiteDirectCursorDriver.java:53)


案例二:
在UI線程進行網路數據的讀寫
atorg.apache.harmony.luni.platform.OSNetworkSystem.receiveStreamImpl(NativeMethod)
atorg.apache.harmony.luni.platform.OSNetworkSystem.receiveStream(OSNetworkSystem.java:478)
atorg.apache.harmony.luni.net.PlainSocketImpl.read(PlainSocketImpl.java:565)
atorg.apache.harmony.luni.net.SocketInputStream.read(SocketInputStream.java:87)

案例三:
DMS06424182
Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.
此案例可以直接從main log和sys log找出root cause,不透過traces log

Crash log DMS06424182_DS3 analysis(Another one is similar):
1. ESTA test app's sends back key event to camera app caused ANR, because camera window of CameraActivity has been set to null before CameraActivity receives the back key event. And due to time out, input event has been injected by ESTA app(pid: 3039).
[filter_log.txt]
05-16 02:07:22.968 V/ESTA    ( 3039): [Test Step] [d] pressKey KEYCODE_BACK
05-16 02:07:39.222 V/ESTA    ( 3039): [Test Step] [d] sleep 250
05-16 02:07:39.231 V/ESTA    ( 3039): appEarlyNotResponding com.sonyericsson.android.camera Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
05-16 02:07:39.232 V/ESTA    ( 3039): <<<<<<<<<<<< EarlyANR >>>>>>>>>>>>
05-16 02:07:39.236 V/ESTA    ( 3039): ProcessName: com.sonyericsson.android.camera
05-16 02:07:39.239 V/ESTA    ( 3039): PID: 19897
05-16 02:07:39.241 V/ESTA    ( 3039): annotation: Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
[SYS_ANDROID_LOG]
05-16 02:07:23.202   809  1853 V WindowManager: Changing focus from Window{3f36f72d u0 com.sonyericsson.android.camera/com.sonyericsson.android.camera.CameraActivity} to null Callers=com.android.server.wm.WindowManagerService.relayoutWindow:3670 com.android.server.wm.Session.relayout:202 android.view.IWindowSession$Stub.onTransact:237 com.android.server.wm.Session.onTransact:136
05-16 02:07:39.220   809  1896 W InputManager: Input event injection from pid 3039 timed out.
05-16 02:07:39.229  3039  3300 V AcceptanceTest: appEarlyNotResponding com.sonyericsson.android.camera Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
05-16 02:07:39.231  3039  3300 V ESTA    : appEarlyNotResponding com.sonyericsson.android.camera Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
05-16 02:07:39.232  3039  3300 V ESTA    : <<<<<<<<<<<< EarlyANR >>>>>>>>>>>>
05-16 02:07:39.236  3039  3300 V ESTA    : ProcessName: com.sonyericsson.android.camera
05-16 02:07:39.239  3039  3300 V ESTA    : PID: 19897
05-16 02:07:39.241  3039  3300 V ESTA    : annotation: Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)

2. In stability test like ESTA, too many input events are sent in short time(stress test). This ANR might be caused by time sequence problem.
InputDispatcher dispatches input event to wrong focused Activity. As we know, focused activity always be set before its task resumed, however, sometimes it doesn't set at all. Eg. task already exists in stack. That may make the focus info in InputDispatcher mismatched, and resulting input dispatcher timed out.

3. Proposal solution: update the focused activity in time with below codes.
BTW, please framework team double check it before delivering it.
/frameworks/base/services/core/java/com/android/server/am/ActivityStackSupervisor.java
1649    final int startActivityUncheckedLocked(ActivityRecord r, ActivityRecord sourceRecord,
1650            IVoiceInteractionSession voiceSession, IVoiceInteractor voiceInteractor, int startFlags,
1651            boolean doResume, Bundle options, TaskRecord inTask) {
......
......
2084                    if (!addingToTask && reuseTask == null) {
2085                        // We didn't do anything...  but it was needed (a.k.a., client
2086                        // don't use that intent!)  And for paranoia, make
2087                        // sure we have correctly resumed the top activity.
2088                        if (doResume) {
2089                            targetStack.resumeTopActivityLocked(null, options);
+                                   ActivityRecord top = topRunningActivityLocked();
+                                   if (top != null) {
+                                       mService.setFocusedActivityLocked(top);
+                                   }
2090                        } else {
2091                            ActivityOptions.abort(options);
2092                        }
2093                        /// M: Add debug task message @{
2094                        if (DEBUG_TASKS) {
2095                            Slog.i(TAG, "START_TASK_TO_FRONT, doResume = " + doResume);
2096                        }

5. 參考資料:

http://codex.wiki/post/186093-336/







沒有留言: