Android SDK Stability


#1

In the currently released GoTenna firmware and Android SDK there are a bunch of problems that prevent most applications from maintaining a connection to a GoTenna for more than 30 to 90 minutes or so (NS1 Dashboard is an exception due to the two months of fixes and stability testing I’ve ended up doing):

  1. GoTenna’s Bluetooth transceiver resets itself more or less randomly (every 30 minutes to 60 minutes I would estimate): https://github.com/gotenna/PublicSDK/issues/31
  2. The Android GoTenna SDK connects to the GoTenna device with “reconnects” enabled, but when it is notified of a disconnect, it also attempts to reconnect on its own. This creates a race condition: if the “original” connection reconnects before the “new” connection, the “original” connection is orphaned. A force stop of the application and subsequent relaunch is then required, since the orphaned connection will attempt to reconnect if you simply restart the GoTenna.
  3. The Android GoTenna SDK has some pretty bad bugs in its BluetoothGattCallback method implementations. In one, it purposely orphans the Bluetooth connection on status code 133. In another, it throws an exception which, due to a bug in many versions of Android, crashes the entire Bluetooth stack on the device and requires the Bluetooth adapter to be re-enabled.
  4. The Android GoTenna SDK has an issue where if a connection to a GoTenna is lost in the middle of receiving data during a command, the data of the command can occasionally cause the SDK to be unable to process any new commands.

Note that although issue #1 makes issues #2 through #4 much more common, those issues can still occur simply due to the GoTenna being power cycled or going out of range temporarily. I’ve communicated with Rahul quite a bit about these issues (and a couple other improvements), and provided suggested ways of addressing them and log information.

Are these issues going to be resolved when the new firmware comes out either through the firmware itself in the case of #1, or by an associated new version of the Android GoTenna SDK for the other issues?

Even with issues #2 through #4 fixed (along with a couple other minor changes), an application still needs to have extensive mitigations in place to maintain long connection durations:

  1. A watchdog that stops a reconnection attempt and restarts it after a period of time (simply relying on the reconnect doesn’t work).
  2. A watchdog that looks for the GoTenna being discovered, but the connection process is taking too long, and responds by restarting the Bluetooth adapter.
  3. A watchdog that checks to see if the Android Bluetooth service has shut itself down, and if so, re-enables it. In some cases issue #1 occurring results in the Android Bluetooth adapter becoming disabled without broadcasting the usual events (this seems to be specific to Android 7.0, and I haven’t released a version of the NS1 Dashboard app with this fix in place quite yet).
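
A minimal sketch of how the three watchdogs above could be combined into one timeout check. Everything here is hypothetical: the class name, the method names, and the timeout values are illustrative and are not part of the GoTenna SDK or the NS1 Dashboard code. On Android, the adapterEnabled flag could come from polling BluetoothAdapter.isEnabled(), and check() would be called from a periodic timer.

```java
/** Hypothetical watchdog; names and timeouts are illustrative assumptions. */
final class ReconnectWatchdog {
    enum Action { NONE, RESTART_SCAN, REENABLE_ADAPTER }

    private final long scanTimeoutMs;     // how long a reconnect attempt may run
    private final long connectTimeoutMs;  // how long "discovered but not connected" may last
    private long scanStartedAtMs = -1;    // -1 means no reconnect attempt is running
    private long deviceSeenAtMs = -1;     // -1 means the device has not been discovered

    ReconnectWatchdog(long scanTimeoutMs, long connectTimeoutMs) {
        this.scanTimeoutMs = scanTimeoutMs;
        this.connectTimeoutMs = connectTimeoutMs;
    }

    void onScanStarted(long nowMs) {
        scanStartedAtMs = nowMs;
        deviceSeenAtMs = -1;
    }

    void onDeviceDiscovered(long nowMs) {
        deviceSeenAtMs = nowMs;
    }

    void onConnected() {
        scanStartedAtMs = -1;
        deviceSeenAtMs = -1;
    }

    /** Decides what the caller should do at time nowMs. */
    Action check(long nowMs, boolean adapterEnabled) {
        // Watchdog 3: the adapter went away without the usual broadcasts.
        if (!adapterEnabled) {
            return Action.REENABLE_ADAPTER;
        }
        // Watchdog 2: device was discovered, but the connection is taking too long.
        if (deviceSeenAtMs >= 0 && nowMs - deviceSeenAtMs > connectTimeoutMs) {
            return Action.REENABLE_ADAPTER;
        }
        // Watchdog 1: the reconnect attempt itself has run too long; stop and restart it.
        if (deviceSeenAtMs < 0 && scanStartedAtMs >= 0
                && nowMs - scanStartedAtMs > scanTimeoutMs) {
            return Action.RESTART_SCAN;
        }
        return Action.NONE;
    }
}
```

Passing the current time in as a parameter keeps the decision logic free of Android dependencies, so it can be exercised without a device.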

The NS1 Dashboard application has the added complexity of maintaining a Bluetooth connection to the Vehicle Interface portion of the NS1 Gauge System at the same time it is maintaining a GoTenna connection. Before adding GoTenna support I had a single watchdog that was looking for a stalled connection, where Android thought it was connected, but data from the device wasn’t being received over the link. Adding GoTenna support involved adding a socket connect watchdog that detects connection attempts that take too long or finish too quickly, and responds with the usual re-enabling of the Bluetooth adapter.

The NS1 Dashboard also has the ugly complexity of reconnecting the GoTenna and Vehicle Interface after a Bluetooth adapter re-enable. This involves attempting to disconnect things before disabling, remembering what was connected, and of course kicking off reconnection attempts.

With fixes and mitigations in place the NS1 Dashboard app can maintain connections to a GoTenna indefinitely. This connection durability is quite important when you think about it:

Say you have 4 people in a group. With a restart of the Bluetooth transceiver in the GoTenna every hour it means that on average someone has a GoTenna disconnect and not reconnect every 15 minutes or so. If people stay on top of it they can usually catch the problem within 5 minutes. That still leaves the entire system of 4 people not functioning correctly 1/3 of the time.
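
The arithmetic behind that estimate, as a small sketch. The class and method names are made up for illustration, and the calculation assumes that outages are spread evenly and never overlap, which is a simplification.

```java
/** Hypothetical helper that reproduces the back-of-the-envelope estimate. */
final class GroupDowntime {
    /**
     * Fraction of time at least one member is disconnected,
     * assuming evenly spaced, non-overlapping outages.
     */
    static double brokenFraction(int members, double minutesBetweenResets,
                                 double minutesToNotice) {
        // With 4 members and one reset per hour each: 60 / 4 = 15 minutes.
        double minutesBetweenGroupOutages = minutesBetweenResets / members;
        // 5 minutes of downtime out of every 15 minutes.
        return Math.min(1.0, minutesToNotice / minutesBetweenGroupOutages);
    }
}
```

Calling GroupDowntime.brokenFraction(4, 60, 5) gives 5 / 15, or one third.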

In reality, after restarting the app and GoTenna every hour or so the users eventually get tired of devoting time to managing the system, which usually means that within a couple hours at least one or two people have just given up. So the system works for a couple hours the first day or two, and after that everyone has decided it isn’t worth the effort and doesn’t bother to even turn the GoTennas on.


#2

Hi there!

I am one of the Android Developers who has worked on the Android goTenna SDK. First off, thank you for the detailed descriptions of all of the issues you have faced. I will try to address each one of these issues individually.

  1. Yes, this issue should be addressed in an upcoming firmware update, which has been a long time coming, and will arrive with various Bluetooth 5.0 improvements as well. As you mentioned, this issue alone increases the frequency of other related issues, so we are looking forward to rolling this out as soon as we can.

  2. Can you clarify some of your points here? I am not sure why you think the SDK connects with “reconnects” enabled. Are you referring to the autoConnect param in the connectGatt function? The original connection is closed on our side upon a disconnect, so I am not sure how it is being “orphaned” as you mentioned.

  3. The 133 status code error was, from what we have seen, an issue that frequently occurred on older Samsung devices in the past, and the only way to recover was to manually disconnect and attempt a re-connect. Samsung devices appear to have many of their own quirks in the Android BLE realm. The Bluetooth stack crash you have mentioned has been fixed, and will be integrated into the next SDK update.

  4. This has been fixed as well, and should be integrated into the next SDK update.

As you are probably well aware by now, managing BLE connections on Android can be rather challenging due to the number of native Android issues out there.

Thanks again for bringing these issues to our attention.


#3

Hi Tcolligan,

Thanks for the quick response!

Here’s the analysis I saved regarding issue #2, which may be Android 8.0 specific, or maybe even Samsung Note8 or S8 specific:

On a Samsung Note 8 I found that I could only maintain a connection for 1 to 6 hours though.

The symptoms were that on connection loss the GoTenna would start flashing. However, when the GoTenna SDK initiated a scan to reconnect, the GoTenna would stop flashing as if it had connected, but this would not be followed by the series of flashes that usually happens on a successful reconnect. I would have to restart the app, or restart the GoTenna Mesh itself, to be able to successfully reconnect. In this case a Bluetooth adapter re-enable would not resolve the issue, and it was not triggering my automatic re-enable code anyway, since the ScanResult didn’t contain the GoTenna.

I tracked this problem down to the “auto reconnect” flag on the connectGatt call. On non-Note 8 builds, having this flag set to true, as it is in the SDK at the moment, didn’t cause problems. On the Note 8 it seems that the GoTenna SDK would be informed of the disconnect and initiate a scan, but at the same time Android would automatically try to reconnect. If that succeeded, this now orphaned connection to the GoTenna would prevent the scan initiated by the GoTenna SDK from finding the GoTenna and initiating a new connection to it.

Here’s the analysis I saved regarding issue #3:

When the GoTenna SDK’s BluetoothGattCallback implementation is called with a status of 133, the GoTenna SDK responds by closing the BluetoothGatt object and nulling out a bunch of variables. This causes a NullPointerException when the BluetoothGatt object later invokes the BluetoothGattCallback to report that a disconnect has occurred (this seems to happen about 10 seconds or so after the status of 133 is returned in the onCharacteristicWrite method). I’m 90% sure that this NullPointerException when attempting to call the BluetoothGattCallback crashes the Bluetooth stack, since the GoTenna thinks it is still connected until the Android app’s process is killed, and the app is unable to reconnect to the GoTenna until after the app’s process is killed.
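
One way to avoid that kind of late-callback NullPointerException, sketched under assumptions: this is not the SDK’s actual code or its fix. The Link interface is a stand-in for android.bluetooth.BluetoothGatt (which does have disconnect() and close() methods), and GattSession and its method names are hypothetical. The idea is to request a disconnect on status 133, keep the state the callback depends on until the disconnect is actually reported, and tolerate callbacks that arrive after teardown.

```java
/** Hypothetical model of a callback that survives late or duplicate events. */
final class GattSession {
    /** Stand-in for android.bluetooth.BluetoothGatt. */
    interface Link {
        void disconnect();
        void close();
    }

    static final int STATUS_133 = 133;

    private Link link;

    GattSession(Link link) {
        this.link = link;
    }

    /** Called where an onCharacteristicWrite status would be handled. */
    void onWriteStatus(int status) {
        if (status == STATUS_133 && link != null) {
            // Ask for teardown, but keep our state until the stack confirms it.
            link.disconnect();
        }
    }

    /** Called where a "disconnected" onConnectionStateChange would be handled. */
    void onDisconnected() {
        Link current = link;
        if (current == null) {
            return; // late or duplicate callback: nothing to release, nothing to throw
        }
        link = null;
        current.close();
    }

    boolean isReleased() {
        return link == null;
    }
}
```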

I have more details in emails that I can forward to you if you would like, including log files that contain exceptions and other logging information.

I had also requested a new GTConnectionState that indicates the GoTenna SDK has found a GoTenna in the ScanResult and is attempting to connect to it (GTConnectionState.CONNECTING). This is critical to differentiating between the happy path of no GoTenna being in range (GTConnectionState.SCANNING with no GTConnectionState.CONNECTING event) and the unhappy path of a GoTenna being found (GTConnectionState.SCANNING followed by a GTConnectionState.CONNECTING event) but the connection sequence stalling (no GTConnectionState.CONNECTED received after a reasonable time period) due to issues with the Bluetooth stack that require a re-enable of the Bluetooth adapter. Did this request make it into the next release of the Android GoTenna SDK?
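
To illustrate why the extra state matters, here is a sketch of how an app could classify a reconnect attempt once the timeout expires. LinkState is a stand-in for GTConnectionState with the requested CONNECTING value added, and ConnectAttempt, Verdict, and classify are hypothetical names, not SDK API.

```java
import java.util.List;

/** Hypothetical classifier for the states seen during one reconnect attempt. */
final class ConnectAttempt {
    /** Stand-in for GTConnectionState plus the proposed CONNECTING value. */
    enum LinkState { SCANNING, CONNECTING, CONNECTED, DISCONNECTED }

    enum Verdict { CONNECTED, NOT_IN_RANGE, STALLED }

    /** statesSeen holds the states reported before the timeout expired, in order. */
    static Verdict classify(List<LinkState> statesSeen) {
        boolean sawConnecting = false;
        for (LinkState state : statesSeen) {
            if (state == LinkState.CONNECTED) {
                return Verdict.CONNECTED;
            }
            if (state == LinkState.CONNECTING) {
                sawConnecting = true;
            }
        }
        // Device found but never connected: likely needs an adapter re-enable.
        // Device never found: probably just out of range, so keep scanning.
        return sawConnecting ? Verdict.STALLED : Verdict.NOT_IN_RANGE;
    }
}
```

Without a CONNECTING event the last two cases look identical to the app, which is why it cannot tell when an adapter re-enable is warranted.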

Thanks!
Stefan


#4

Hi Stefan,

Thank you for all of the additional info. I think you are right about the auto-reconnect flag. I have been doing some digging to see what I can find about similar issues and found a fairly helpful document that Nordic created, which elaborates on various Android BLE issues, and has a whole section devoted to the “auto reconnect” flag, which is far more helpful than how the official Android documentation describes the flag. I experimented with switching the flag to false, and in my limited testing so far, I have not seen any issues with doing so. Dependent on more testing, we may roll that change into the next SDK release.

I am fairly certain that the crash you have described is fixed in our internal build, but feel free to send extra details and logs to support@gotenna.com, so that they can be forwarded my way, and we can make sure.

As for adding a CONNECTING state, I do not see the request logged anywhere, so I have taken the liberty of adding the request to our internal feature/bug tracker. Off the top of my head, this seems like it is fairly low risk for us to add in, and I can see how it may be helpful for you and other developers, so I think there is a good chance this can make it into the next release.

Tom


#5

Great to hear that the initial results regarding setting the auto reconnect flag to false have been positive!

I apparently don’t have a saved log file containing the exception that I believe was causing the Bluetooth stack to crash on status code 133. I’m actually re-running all of my stability tests on the supported devices at the moment after adding support for the Asus Zenpad 8, so in the next week or two I will be able to forward you a log with that exception.

Thanks for logging a request for the GTConnectionState addition. It’s looking quite promising that in the next version of the Android GoTenna SDK I will not need to do any frowned-upon development activities to create a usable product!