Eaase: An Emulator-as-a-Service with ADB

Today we’re launching… well, beta-launching… a new Android emulator-as-a-service for your CI jobs. Say hello to Eaase.dev.

Unlike other emulator services, we have the simplest possible interface to the remote emulator: we just give you an ADB connection, so you can do whatever you want with it.

But why?

There are quite a few emulator services out there, the one you’re probably most familiar with being Firebase Test Lab (or Gradle Managed Devices). These services are not technically an emulator as a service; they’re more like a test-runner as a service. They take an APK, run the tests, and send back the results.

This has some advantages: Marathon Labs, for example, is a similar service, and since it has access to the APK it can do things like test sharding or re-running flaky tests.

But each of these services has its own non-standard API. This means that many Gradle plugins out there won’t work with them. In particular, we at Screenshotbot care about screenshot testing, and most Android screenshot testing libraries don’t even support Gradle Managed Devices.

We wanted an emulator-as-a-service that could help our customers with their screenshot testing infrastructure, and that’s how we got here.

Give it a try

Eaase is in Beta. For now, we’re keeping the service absolutely free, for reasonable use. If you sign up now, we’ll also waive any monthly fees on your account forever. We expect to fully launch in a few months.

First, sign up and create an API token, then export it as follows:

export EAASE_API_TOKEN=...

Now, we need to download the eaase client:

curl https://cdn.eaase.dev/installer.sh | sh 

(Ping me if you want the Windows binaries.)

Now you can check out an emulator and run arbitrary commands on it:

$ ~/eaase/eaase run -- adb shell ls
acct
adb_keys
apex
bin
...
vendor
$

What just happened here: we checked out an emulator, ran the command, and then discarded the emulator. If you have an Android project lying around, try running your androidTests via Eaase:

$ ~/eaase/eaase run -- ./gradlew connectedDebugAndroidTest
...

If you want to try screenshot testing, here’s a sample app that uses Shot. You can record or verify screenshot tests using Eaase:

$ ~/eaase/eaase run -- ./gradlew :executeScreenshotTests

In particular, this last command requires more complex ADB commands that other emulator services won’t be able to provide… or at least, not easily. (It can be done with Firebase Test Lab, but it’s super convoluted, and you’ll spend a lot of time testing the integration.)

Some screenshot tests will also require you to enable the hidden_api_policy setting (for example, adb shell settings put global hidden_api_policy 1). This is something that can’t be done with Firebase Test Lab, but is easy to do when you have an ADB connection with Eaase.

Performance

Currently we’re seeing 15-20s boot times for emulators, which we think is not bad. But we want to get to zero-second boot times (i.e. by keeping emulators booted and ready before you even request one).
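To illustrate the warm-pool idea, here’s a purely hypothetical sketch (made-up names, and an in-process queue standing in for real infrastructure): emulators are booted ahead of time, so a checkout returns one instantly instead of paying the cold-boot cost.

```python
import itertools
import queue

# Hypothetical sketch of a warm pool: pre-boot emulators so a checkout
# returns one instantly instead of paying the 15-20s cold-boot cost.
POOL_SIZE = 3
_ids = itertools.count()
warm: "queue.Queue[str]" = queue.Queue()

def boot_emulator() -> str:
    # Stand-in for the real (slow) emulator boot; returns a fake emulator id.
    return f"emulator-{next(_ids)}"

def refill() -> None:
    # Keep the pool topped up to POOL_SIZE warm emulators.
    while warm.qsize() < POOL_SIZE:
        warm.put(boot_emulator())

def checkout() -> str:
    emu = warm.get_nowait()  # instant when the pool is warm
    refill()                 # in a real system this would happen in the background
    return emu

refill()
```

In the real service the refill would of course happen asynchronously; the sketch just shows why the user-visible boot time can approach zero.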

Security

All ADB connections are tunneled over an encrypted channel. Our company has also completed its SOC 2 Type II certification: we are compliant as of today, soon after this post was first published. I’ll post a formal blog post announcing this soon.

Reliability

Each time we start up an emulator, we run a bunch of sanity checks on it to make sure it’s in a healthy state before sending it over to you. So you can be confident that the emulator you get is reliable for your CI jobs.

Pricing

As mentioned, this is currently completely free (for reasonable use). Once we fully launch, we expect to charge about $10 a month, plus $0.01 per minute of emulator use. If you sign up now and try out an emulator, we’ll waive the monthly fee forever, so you only pay for what you use.

Open Source

If you’re familiar with our work, then you’ll know we love open-source. We plan to open-source Eaase along with our full launch. If we open-source it too soon, it’ll slow down our ability to make significant architectural changes, which is why we aren’t doing it yet.

What works and doesn’t work

As I mentioned, this is in public Beta. As a company, we like getting early feedback, and we encourage you to give us any or all feedback, good or bad, even if your thoughts are half-baked. (Sometimes the half-baked feedback is the most valuable, because it gives us a lot of insight into how people think about our product.)

Here’s what to expect, and the future work we expect to do:

  • Capacity: we’re currently running on a limited pool of worker nodes. If we’re full, you might have to wait to get an emulator.
  • We have only one emulator configuration at the moment: an API level 30 emulator. We’ll make this configurable in the very near future.
  • We plan to provide access to logcats from within our dashboard; until then, you can still capture logcats as CI artifacts using ADB.
  • We also plan to enable WebRTC, so that emulators can be accessed and used for development purposes while still having access to the ADB connection.
  • We plan to make it even easier to run multiple commands against a single checked-out emulator. For this, we’ll create an eaase connect command that connects and daemonizes the Eaase process, so you can keep running commands after connecting. For instance, a CI job might look like this:
~/eaase/eaase connect
adb shell settings put global ...
./gradlew :...

And that’s all for now. Please send your feedback to me. As always, if you’re looking to improve your screenshot testing infrastructure and would just like to talk about Screenshotbot, please reach out to me at arnold@screenshotbot.io.

Getting your team to adopt Screenshot Testing: The Definitive Guide

You’ve seen the Medium posts, you’ve seen the rosy tech-talks. And you know that the next big thing in your mobile testing toolkit needs to be screenshot tests. But you quickly hit roadblocks getting your team to use it. In this guide, we’ll help you understand and navigate these roadblocks.

But before that, let’s address screenshot vs snapshot tests. Most Medium posts will tell you that snapshot tests are more than just screenshot tests… well, they’re partly right and mostly wrong. In reality, the iOS world calls them snapshot tests, and the Android world calls them screenshot tests. You see, back in 2015 when I built and open-sourced screenshot-tests-for-android, the iOS world already had ios-snapshot-test-case, built by my colleagues at Facebook. I was too lazy to rename the main Screenshot class, and so I decided to call it screenshot testing instead of snapshot testing. screenshot-tests-for-android was the first screenshot testing library on Android, and the name has stuck on Android ever since, across multiple libraries by different people.

But I digress, let’s convince your team to use screenshot testing.

Your immediate team

Alright, so you’re excited after reading that fantastic Medium post, and you share it on your Slack: “We need this now!” Your manager is like, “Oh great, let’s do it”, and you’re like … what now?

Maybe you write your first screenshot test for your product screen, and commit the test and the screenshot to the repo. High-fives all around. Then Ben over there tries to make a change to your product screen, the tests fail, and he comes over and asks “What now?”, and you’re like, “duh, re-record those screenshots.” Then Alice makes another UI change, has her tests fail, and asks you “What now?”, and you’re like, “duh, re-record those screenshots.” Eventually that happens enough times that you create a wiki page for it.

Once a week, you send that wiki page to a new-hire or intern.

But everyone knows you now, so that’s nice. You’re the screenshot testing person. They secretly hate you each time they re-record screenshots, but they certainly know you. And they’re also not going to add new screenshots, because each new screenshot means more screenshots for them to record.

That Linux person

So you’re basking in your newfound fame, and along comes a new hire, with their disdain for corporate America, refusing to use the company Macs. Only Linux, thank-you-very-much. IT is annoyed but gets them a beefy Intel machine and tells them to do whatever with it.

So this person is chugging along, makes a UI change, and the usual drill happens: they message you, you introduce yourself (they need to know you!), you send them the wiki-page. They try to re-record and tell you that “the tests still don’t pass.”

So you dig around more Medium posts, and you find out that screenshot tests are dependent on the platform where you record screenshots. Linux and Mac, Intel and ARM, they generate slightly different screenshots.

You help them debug things for a while and eventually get mad, and tell them to get a Mac “like everyone else”. They refuse. A brawl ensues, HR is called in, your manager mediates and makes you compromise: it’s the screenshot tests or your job. You choose your job, the screenshot tests are disabled.

The DevOps person

But let’s say you did convince your Linux colleague to switch to a Mac. Things are going smoothly, until one day not long after, a DevOps engineer pings you, asking why the repo size is growing so fast, and accusingly points at all those PNG files.

“Oh,” you say, “that’s the screenshot tests! Super cool stuff, it lets…” The DevOps engineer cuts you off and tells you that you need to delete those screenshots. You propose this magic tool called GitLFS that all those Medium posts talk about: “could you help me set it up?”

“Absolutely!”, says the DevOps engineer “Let me set up a meeting with the team, and prioritize this for Q3! anyway, thanks for understanding! In the meantime, let’s delete those screenshots.”

That Super-Senior-Staff Engineer

You know the one. This is the person that everybody looks up to. Their word is final. Everyone wants to be in their good graces for their promotion’s sake.

So this one time, their UI tests fail. Cue the annoyed back and forth with you, and you send them the wiki page etc.

“The screenshots look the same,” this person says.

You can’t argue, of course, so you put aside everything you’re doing, pull up ImageMagick, compare the images, and show them what the difference is.

“Ok,” they say, and move along. Crisis averted. Now to figure out what you were working on before this interruption…

To summarize

Here’s what we learnt:

  • You will be the screenshot testing person
  • Each new dev needs to learn how to re-record screenshots
  • Recorded screenshots will depend on the machine they were recorded on
  • At some point you will have to build and maintain a GitLFS infrastructure
  • Image comparison can be annoying, and people will ping you to do the comparison for them because it was your tests that blocked them.

How do you solve this?

Most of these problems can be solved with a simple trick: record your screenshots on CI. Usually, once a company has invested enough time in screenshot testing, they’ll eventually create a CI job to re-record screenshots and update the PR with the updated screenshots. This avoids all the issues with devs manually re-recording screenshots, and it also solves the platform dependence, since you’re always recording on an identical CI machine.

(This isn’t trivial to build from scratch though, you also need to figure out a way for devs to locally run screenshot tests.)

You still have the DevOps overhead of maintaining GitLFS, and you still have to deal with the subpar image comparison tools that GitHub provides.

Using an external Service

So companies like Airbnb and Facebook have gone in a different direction: on each CI run, the screenshots are generated and uploaded to a separate service that maintains them. (Note that it’s easier to scale a service than GitLFS, since in most cases you don’t need to read the entire image from disk; you only need a hash of the file content.)

Most of these services will just notify you with a build status on GitHub showing you the changed screenshots, and giving you a chance to review them.

A service can also provide better image comparison tools. In fact, Screenshotbot has some pretty fantastic image comparison tools, which let you zoom in to pixel changes, show you the exact RGB values of pixels that changed, and so on.
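As a toy illustration of what pixel-level comparison means (not Screenshotbot’s actual implementation), treating images as 2D arrays of RGB tuples:

```python
# Toy pixel-level diff: images are 2D lists of (R, G, B) tuples.
# Returns each changed pixel's position plus its old and new RGB values.
def changed_pixels(before, after):
    diffs = []
    for y, (row_a, row_b) in enumerate(zip(before, after)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                diffs.append((x, y, a, b))
    return diffs

before = [[(255, 255, 255), (0, 0, 0)],
          [(255, 255, 255), (255, 255, 255)]]
after  = [[(255, 255, 255), (0, 0, 0)],
          [(254, 255, 255), (255, 255, 255)]]  # one pixel is off by one
```

A change like the one above (a single channel off by one) is invisible to the naked eye, which is exactly why exact RGB reporting matters.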

Screenshotbot does not use GitLFS; we store images on our own infrastructure, which you don’t have to worry about, so it also reduces the operational overhead and costs for you.

An external service is also super easy to get started with. For example, we have a Gradle plugin that makes it trivial on Android, and for other platforms you just need to pass the recorded screenshots to our command line tool. Our backend is open source, so you don’t need to get approvals or budget commitments to set up a server.

But most importantly: you’ll no longer be known as the screenshot testing person. People will get notified when screenshot tests change, they don’t need to re-record screenshots and they don’t need to look up wiki pages. Nobody will ping you to compare screenshots for them since Screenshotbot will do it for them.

But your company’s screenshot tests will explode in number, since adding screenshots no longer slows developers down. You’ll be the hero that nobody will ever know… so make sure you brag about it and share metrics in your Slack channel every quarter about how much your screenshot coverage has grown. (We’re building an analytics page very soon just so you can do this.)

Introducing a Gradle plugin for Screenshotbot

Screenshotbot has always been easy to integrate in Android. We now introduce our new Gradle plugin that makes it even simpler for Android teams to integrate their projects using existing screenshot testing libraries. In addition, it simplifies some workflows that developers use locally during development.

If you haven’t tried Screenshotbot before, here’s a quick primer: Screenshotbot is a platform-agnostic screenshot testing service. We don’t care how you generate your screenshots, as long as you can provide us a directory of screenshots. Once you upload them with the scripts we provide, we will compare them against an appropriate base commit and notify you on the appropriate channel (e.g. a pull request or Slack). We are used by small and large Android, iOS, and web teams.

Library Support

Our Gradle plugin supports most of the popular Android screenshot testing libraries.

Local Runs

We recently introduced a Local Runs mode for Screenshotbot. Before this feature, even though teams used Screenshotbot in their CI, developers had to use the record/verify commands provided by their screenshot testing library when iterating locally. Screenshotbot’s Local Runs lets developers use a single workflow for both CI and local development, and they can also take advantage of our powerful image comparison tools.

It also reduces the context switching as people switch between different screenshot testing libraries.

Quick Setup and testing Local Runs

It’s pretty straightforward to install our plugin. Just include it in your build.gradle:

plugins {
  id "io.screenshotbot.plugin" version "1.9"
}

And that’s really it! The plugin will detect which screenshot testing libraries are being used, and generate appropriate tasks. The plugin does not change the behavior of any existing tasks, so it should be safe to add this plugin while migrating to Screenshotbot.

The plugin provides three tasks (which are named slightly differently for each screenshot testing library). We have a record and a verify task, and a CI task for automating the CI process.

For the rest of this discussion, we’ll focus on Paparazzi.

If you run gradle tasks, you’ll now see that our plugin has added the new tasks for each Paparazzi flavor combination.

Once you fully migrate to Screenshotbot, you’ll no longer have any images stored in Git or GitLFS. But our plugin handles the case where you still have screenshots stored: we just never verify against screenshots saved in the repository.

Instead, the developer can choose to record screenshots at any point in time by running gradle recordPaparazziDebugScreenshotbot. The recorded screenshots are not tied to any Git commit, so the developer can run this even during incremental refactoring steps.

To check that no screenshots changed, run gradle verifyPaparazziDebugScreenshotbot. If nothing changed, this will pass. If something changed, the task will fail and you’ll see a link to screenshotbot.io with a report of the changes. At this point, you can choose to re-record or revert your latest changes.

We also made it easy for developers to install their API keys by running gradle installScreenshotbot: this step will just direct you to a URL to get a key and paste it back. (If you haven’t created an account with us yet, running this step will guide you through the process.)

Using the plugin in CI

In the past, we suggested that people integrating with these libraries should add all their screenshots to .gitignore, call the screenshot testing library’s record step in CI, and then call the script we provide.

The Gradle plugin streamlines this process further.

If you previously called gradle verifyPaparazziDebug on CI, you just change it to gradle recordAndVerifyPaparazziDebugScreenshotbotCI. That’s it. In most cases, our script will read environment variables from your CI to correctly figure out all the details we need, such as whether this is a pull request or a main-branch commit.
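For illustration only (using GitHub Actions’ standard variable names; other CI providers expose different ones), the kind of environment sniffing involved looks roughly like:

```python
import os

# Sketch: detect whether a CI build is a pull request or a branch push,
# using GitHub Actions' standard environment variables.
def detect_context(env=None):
    env = os.environ if env is None else env
    if env.get("GITHUB_EVENT_NAME") == "pull_request":
        # For PR builds, GITHUB_HEAD_REF holds the source branch name.
        return ("pull-request", env.get("GITHUB_HEAD_REF"))
    # For pushes, GITHUB_REF_NAME holds the branch that was pushed.
    return ("push", env.get("GITHUB_REF_NAME"))
```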

What’s next

We chose to build this plugin because of the fantastic feedback we get from our users. If you think there’s another common configuration that we could automate, please let us know! In particular, we are working on Fastlane plugins for iOS developers.

If you’re using a screenshot testing library that’s not listed above, we’ll be happy to add it. It usually doesn’t take a lot of work for us to add an integration.

This Gradle plugin is also pretty new, so if you do see bugs, please reach out to support@screenshotbot.io; we’re pretty good about responding to issues quickly.

Scaling Screenshotbot

Introduction

At the heart of Screenshotbot is a mapping from Git commits to lists of screenshots.

Around this simple mapping, we build powerful workflows and interactions. How do we implement such a mapping?

Before we begin, let’s formally define this requirement. For the sake of this blog post, we’ll look at a simplified version of what Screenshotbot actually does.

The list of screenshots we mentioned earlier is actually a map from name->image (or in Java: Map&lt;String, File&gt;). But we need to store one such map for every commit, so the full structure looks like commit->(name->image) (or in Java: Map&lt;String, Map&lt;String, File&gt;&gt;).

We need to figure out how to efficiently store this. We have some specific querying requirements, but I think we can just focus on the storage considerations for this blog post.

A naive solution

There’s an obvious first approach, and it’s actually the one we initially built into Screenshotbot. For each commit we just store the full map. If you come from a relational database world, you can think of this as a table with columns COMMIT, NAME, and FILE, with an index on COMMIT and NAME.
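A toy in-memory sketch of this naive scheme (hypothetical names), storing the full name->image map per commit:

```python
# Naive approach: store the complete name->image map for every commit.
runs: dict = {}

def record_run(commit, screenshots):
    """Store the full map for a commit, even if nothing changed."""
    runs[commit] = dict(screenshots)

def lookup(commit, name):
    """Find the image recorded for a screenshot name at a commit."""
    return runs.get(commit, {}).get(name)

record_run("abc123", {"login_screen": "img/1.png", "cart_screen": "img/2.png"})
record_run("def456", {"login_screen": "img/1.png", "cart_screen": "img/2.png"})
# Nothing changed between the two commits, yet every row was stored twice.
```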

This naive solution is not bad, and will indeed get you quite far. In fact, this was the schema I used even at Facebook when I built the screenshot testing infrastructure there. (I’ve talked a bit about this in my Droidcon talk.)

And at Facebook this worked well… until we tried to build a feature where we commented on every diff with the screenshot changes. That’s when our database slowed to a crawl. We never solved this at Facebook, at least during my time there.

The scaling issue

The problem is that for a large company like Facebook, especially one that uses a mono-repo, you’re going to have thousands of commits every day.

And you could potentially have tens of thousands of screenshots. With this schema, the database grows at a rate of roughly 10,000,000 rows per day. That is just not scalable without some massively distributed architecture.

But a massively distributed architecture adds a significant operational overhead. It also feels … overkill.

Consider this: most of those commits in the mono-repo aren’t really changing screenshots. Most of the rows we’re storing are redundant. This gets us to our first optimization: Add an extra layer of indirection.

In relational database terms, we could have a screenshot_map table, with an ID, NAME, and FILE. And each commit will point to a specific screenshot_map‘s ID. You will need to store some hashes for each screenshot_map in a separate table, in order to query for duplicates. This will remove a huge chunk of those redundant rows.
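Sketched in Python (hypothetical names; the content hash plays the role of the separate hash table used to find duplicates):

```python
import hashlib
import json

# Indirection: store each distinct name->image map once, keyed by a
# content hash; commits only point at a map id.
screenshot_maps: dict = {}
commit_to_map: dict = {}

def map_id_for(screenshots):
    """Content-hash the whole map so identical maps dedupe to one entry."""
    canonical = json.dumps(sorted(screenshots.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()

def record_run(commit, screenshots):
    mid = map_id_for(screenshots)
    screenshot_maps.setdefault(mid, dict(screenshots))  # stored at most once
    commit_to_map[commit] = mid

record_run("c1", {"login": "hash1", "cart": "hash2"})
record_run("c2", {"login": "hash1", "cart": "hash2"})  # no change: same map id
```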

It also helps that in a large mono-repo screenshots are usually broken up into multiple product verticals: what Screenshotbot calls channels. This makes it more likely that some of the mappings will stay unchanged for longer periods of time. But this relies on the end-user being well-behaved and proactively trying to break up screenshots into channels.

So this solution, combined with a well-behaved user, is actually not that bad. You’ll get quite far with it. And at Screenshotbot, it got us quite far for a few years. But eventually it wasn’t good enough.

It turns out we do have some large channels. And it turns out that some of our largest channels also have flaky screenshots. Screenshotbot provides tooling to deal with flakiness, so we need to be able to handle it without scaling issues. But a flaky screenshot will cause a new screenshot_map to be created every time, even if only that single screenshot changed. This is not good.

Heuristics

This leads to our second insight: even if screenshots change, in most cases only a few screenshots are actually changing. Let \delta_c be the number of changes in commit c. Let M be the total number of unique images uploaded across all the commits, then it’s easy to see that:

\sum_{c \in \text{commits}} \delta_c \approx M

It’s possible that there are a few duplicates: for instance, for a revert commit, \delta_c might be non-zero, but no new images would be added, since we would be re-using a previously known image. In practice this is a small proportion of commits, so our approximation is good enough.

And M is a much more manageable number. In fact, it’s manageable enough that people have been storing all of their screenshots in Git or GitLFS for years, and each of the images would’ve been another file in Git. If we can scale with M, then that would be pretty ideal and would not require any kind of massively distributed architecture.

This leads to an obvious heuristic: perhaps we can just store the deltas? Progress! But there are a whole lot of edge cases to consider. We have to replay the deltas to generate the map; how do we do that efficiently? How do we pick which previous map to take the delta from? If we replay \delta_{c_1} and \delta_{c_2}, should we cache the intermediate state of c_1? But that would mean storing the full map for c_1 and c_2… It might be possible to make this work, but as a small team we like simple solutions that are easy to maintain and debug. This isn’t it.
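To make the bookkeeping concrete, here’s a minimal sketch of the delta idea (hypothetical representation): each commit stores only what changed, and reading a commit’s map means replaying the whole chain, which is exactly the cost that makes this approach awkward.

```python
# Each delta is (changed: name->image, removed: set of names).
# deltas[i] is the delta introduced by commit i over commit i-1.
deltas = []

def map_at(i):
    """Replay deltas 0..i; cost grows with the length of the chain."""
    snapshot = {}
    for changed, removed in deltas[: i + 1]:
        snapshot.update(changed)
        for name in removed:
            snapshot.pop(name, None)
    return snapshot

deltas.append(({"login": "h1", "cart": "h2"}, set()))  # commit 0: initial upload
deltas.append(({"cart": "h3"}, set()))                 # commit 1: one screenshot changed
deltas.append(({}, {"login"}))                         # commit 2: a screenshot deleted
```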

Functional collections

Enter functional collections.

If you’re coming from a functional programming language, this will not be new to you. If you’ve mostly been working with Java, Kotlin or the like: prepare to have your mind blown.

A functional map is like an immutable map, but with better complexity guarantees. In particular, inserting an element into a functional map is an O(\log{N}) operation that does not modify the old copy of the map; instead it returns a new copy. Internally, a functional map uses a special form of binary search tree. In particular, an O(\log{N}) operation means at most O(\log{N}) new nodes are created.

(For an intuition of how this works, consider two large binary search trees that differ in just a few values. A large number of sub-trees in the two trees could be identical and the two trees could potentially share the same nodes between them.)
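Here’s a minimal Python sketch of that intuition: path copying in a binary search tree (unbalanced, for brevity; a real functional map uses a balanced tree to guarantee the O(\log{N}) bound). Only the nodes along the insertion path are copied; everything else is shared with the old tree.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node, key, value):
    """Return a new tree; only nodes on the path to `key` are copied."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)

def get(node, key):
    """Standard BST lookup; works on any version of the map."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None

t1 = insert(insert(insert(None, "m", "1"), "c", "2"), "t", "3")
t2 = insert(t1, "z", "4")  # t1 is untouched; t2 shares t1's entire left subtree
```

Note that t2.left is literally the same object as t1.left: that shared subtree is the structural sharing the parenthetical above describes.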

See where we’re going with this?

If we apply \delta_{c_i} over the functional map for c_{i-1}, the cost of the operation (in terms of time complexity and additional memory) is O(\delta_{c_i}\log{N}) (where N is the number of screenshots per commit). Over the entire repository this will be

\sum_{c \in \text{commits}}{\delta_c \log{N}} = (\sum_{c \in \text{commits}}{\delta_c})\log{N} \approx M\log{N}

That is not bad! We already had an unavoidable need to scale with O(M), and this only adds a \log{N} factor. This will scale ridiculously well, even if it’s all hosted on a single machine. Even for a mono-repo.

(We add an extra optimization here to break up the “chain” of deltas, so that to compute the map for the latest commit, you don’t have to apply deltas from the beginning of time. In practice, this means most of the maps are never computed, and we would actually be scaling like O(M).)

We still have to address how we choose which commit to compute our deltas against. We use some heuristics here, and functional maps make it easy to run these heuristics efficiently: essentially, we look at the last k known maps and pick the parent map that creates the lowest cost for us.
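A sketch of that selection, with a hypothetical cost function (the size of the symmetric difference between maps):

```python
def delta_size(a, b):
    """Number of additions, removals, and changes needed to turn map a into b."""
    return sum(1 for k in set(a) | set(b) if a.get(k) != b.get(k))

def pick_parent(new_map, recent_maps):
    """Among the last k known maps, pick the one cheapest to delta against."""
    return min(recent_maps, key=lambda m: delta_size(m, new_map))

recent = [
    {"login": "h1", "cart": "h2"},                  # delta of 1 vs. the new map
    {"login": "h1", "cart": "h9", "promo": "h4"},   # delta of 3 vs. the new map
]
parent = pick_parent({"login": "h1", "cart": "h2", "search": "h5"}, recent)
```

The real heuristic is more involved than a plain minimum, but the shape is the same: cheap cost estimates over a handful of candidate parents.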

What does this mean for you?

In particular, this means that even our open-source version will scale pretty well for you, even when hosted on a single server.

But what about databases?

You might wonder what the schema for this looks like. Some of the operations I described seem like they would have to hit a database multiple times (e.g. walking up a graph of parent maps), and that seems slow. The answer is: we just don’t use a database. Everything is stored in memory. But that’s a blog post for another time.

Coming soon

In our future posts, we’ll also cover our image processing primitives, and how we’re able to efficiently provide features like Zoom to Change and Masks. We’ll cover how we handle high availability, and talk about some of the other popular open-source libraries we’ve built over time to support Screenshotbot.

Finally, here’s a link to our code that handles screenshot-maps.