A glimpse of the Android app. Source: the author

Face recognition is the problem of determining a person’s identity from a picture of their face by searching an annotated set of reference images stored in a database. It has wide applications in industry, with eKYC, video surveillance and attendance monitoring being a few of them.

Performing face recognition on a handheld device, like a smartphone, can be beneficial for applications like attendance monitoring or face unlock, as it does not need any specialized hardware and can be maintained/updated just like any other application on the device. Moreover, performing the recognition on-device helps achieve low latency and reduces the risk of data leakage. Building an app that performs face recognition on-device is challenging, as it requires answering the following questions:

  1. How do we detect faces from an image OR live camera preview in Android?
  2. Where do we store face images efficiently, such that the retrieval is fast and memory-efficient?
  3. Given two face images, how do we assert their common identity?

With the current state of Android development and machine learning, we are now able to answer all these questions and develop an app for on-device face recognition:

  1. Google’s MLKit or Mediapipe provide lightning-fast face detection SDKs for Android, iOS and Web.
  2. ObjectBox is a promising on-device NoSQL database faster than SQLite.
  3. FaceNet, an ML model, provides embeddings that can be compared to determine a person’s identity from their face image. Moreover, TensorFlow Lite allows us to run FaceNet on Android through its Android library.

Assembling these technologies together and building an on-device face recognition app is what we’ll be exploring in this blog! We will first develop a clear understanding of the face recognition pipeline and then check the code/implementation in Kotlin.

How does the face recognition pipeline work?
An overview of the face recognition pipeline. Source: Author
  1. First, we prompt the user to select a set of images that belong to the same person, and take the person’s name as input; this name serves as the annotation.
  2. We use Mediapipe to detect faces from the images selected by the user and crop them.
  3. The FaceNet model takes the cropped faces from the previous step and encodes them into a vector/embedding, which is called a face embedding. This embedding captures the essential features of a face, such as the distance between the eyes, the shape of the jaw, and so on.
  4. The face embeddings are then stored in an ObjectBox database. ObjectBox is a high-performance embedded database that is well-suited for storing data on mobile devices and provides a vector index.
  5. When a new face appears in front of the camera, we use Mediapipe, as in step 2, to detect faces in the frame received from the camera preview and crop the face.
  6. The FaceNet model again takes the detected faces and encodes them into face vectors/embeddings.
  7. The face embedding from the camera feed (query) is then compared to the face embeddings stored in the database (embeddings). A nearest neighbor search is performed to find the face embedding in the database that is most similar to the query embedding.
  8. If the nearest neighbor is close enough (determined by a threshold), then the system classifies the person in the camera feed as the person associated with the nearest neighbor. If the nearest neighbor is not close enough according to the threshold, then the system doesn’t recognize the person in the camera feed.
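
To make the data flow concrete before we dive into the implementation, here is a minimal, illustrative sketch of the two halves of the pipeline, enrollment and recognition. The interfaces and class names below are hypothetical placeholders for the real components (Mediapipe face detector, FaceNet, ObjectBox) that are implemented in the following sections,

import android.graphics.Bitmap
import kotlin.math.sqrt

// Hypothetical placeholders for the real components described above
interface FaceCropper { fun crop(image: Bitmap): Bitmap? }        // steps 2 and 5
interface FaceEmbedder { fun embed(face: Bitmap): FloatArray }    // steps 3 and 6
interface FaceVectorStore {                                       // steps 4 and 7
    fun add(name: String, embedding: FloatArray)
    fun nearest(embedding: FloatArray): Pair<String, FloatArray>? // (name, stored embedding)
}

class FacePipelineSketch(
    private val cropper: FaceCropper,
    private val embedder: FaceEmbedder,
    private val store: FaceVectorStore,
    private val threshold: Float = 0.4f
) {
    // Steps 1-4: enroll a person from a set of their images
    fun enroll(name: String, images: List<Bitmap>) {
        images.mapNotNull { cropper.crop(it) }
            .forEach { face -> store.add(name, embedder.embed(face)) }
    }

    // Steps 5-8: recognize the person in a camera frame, or return null
    fun recognize(frame: Bitmap): String? {
        val face = cropper.crop(frame) ?: return null
        val query = embedder.embed(face)
        val (name, neighbor) = store.nearest(query) ?: return null
        return if (cosineSimilarity(query, neighbor) >= threshold) name else null
    }

    private fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
        var dot = 0f
        var magA = 0f
        var magB = 0f
        for (i in a.indices) {
            dot += a[i] * b[i]
            magA += a[i] * a[i]
            magB += b[i] * b[i]
        }
        return dot / (sqrt(magA) * sqrt(magB))
    }
}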
Implementation Overview
  1. Reading images from the user’s device
  2. Cropping faces from the image with Mediapipe’s Face Detector
  3. Transforming cropped face images into embeddings with FaceNet/TFLite
  4. Using ObjectBox to build a vector database
  5. Performing nearest-neighbor search and recognizing faces

The GitHub repository can be found below; it contains well-documented code following Android development best practices and clean-architecture guidelines.

https://github.com/shubham0204/OnDevice-Face-Recognition-Android

This project is an upgraded version of another project, with 230+ stars on GitHub,

https://github.com/shubham0204/FaceRecognition_With_FaceNet_Android

1. Reading Images From User’s Device

We allow the user to select multiple images from the device through a photo-picker and group them under the name of the person. Next, we use Mediapipe’s face detector to crop faces from those images and use our FaceNet model to produce embeddings. The embeddings, along with the person’s data (i.e. the fields of the PersonRecord data class), are then stored in the ObjectBox database.

We use the ActivityResultContracts.PickMultipleVisualMedia() contract to open the device’s default photo picker and return the Uris of the selected images. The Uri list is stored in the view-model, where a function, also defined in the view-model, can process them.

val pickVisualMediaLauncher =
    rememberLauncherForActivityResult(
        contract = ActivityResultContracts.PickMultipleVisualMedia()
    ) {
        viewModel.selectedImageURIs.value = it
    }
Button(
    enabled = viewModel.personNameState.value.isNotEmpty(),
    onClick = {
        pickVisualMediaLauncher.launch(
            PickVisualMediaRequest(ActivityResultContracts.PickVisualMedia.ImageOnly)
        )
    }
) {
    Icon(imageVector = Icons.Default.Photo, contentDescription = "Choose photos")
    Text(text = "Choose photos")
}
2. Cropping faces from the image with Mediapipe’s Face Detector

In this step, we read a Bitmap from each Uri in selectedImageURIs. Each Uri is also parsed for its EXIF data, a form of image metadata describing the image’s orientation, camera details, location tags, etc. We use the EXIF data to restore the orientation of the image before performing face detection on it.

We also instantiate Mediapipe’s FaceDetector, which, given a Bitmap image, returns the bounding-box coordinates of each face found in the image. As each image is tied to the name of a single person, for simplicity we require that the image contains exactly one face.

suspend fun getCroppedFace(imageUri: Uri): Result<Bitmap> =
    withContext(Dispatchers.IO) {
        var imageInputStream =
            context.contentResolver.openInputStream(imageUri)
                ?: return@withContext Result.failure<Bitmap>(
                    AppException(ErrorCode.FACE_DETECTOR_FAILURE)
                )
        var imageBitmap = BitmapFactory.decodeStream(imageInputStream)
        imageInputStream.close()

        // Re-create an input-stream to reset its position
        // InputStream returns false with markSupported(), hence we cannot
        // reset its position
        // Without recreating the inputStream, no exif-data is read
        imageInputStream =
            context.contentResolver.openInputStream(imageUri)
                ?: return@withContext Result.failure<Bitmap>(
                    AppException(ErrorCode.FACE_DETECTOR_FAILURE)
                )
        val exifInterface = ExifInterface(imageInputStream)
        imageBitmap =
            when (
                exifInterface.getAttributeInt(
                    ExifInterface.TAG_ORIENTATION,
                    ExifInterface.ORIENTATION_UNDEFINED
                )
            ) {
                ExifInterface.ORIENTATION_ROTATE_90 -> rotateBitmap(imageBitmap, 90f)
                ExifInterface.ORIENTATION_ROTATE_180 -> rotateBitmap(imageBitmap, 180f)
                ExifInterface.ORIENTATION_ROTATE_270 -> rotateBitmap(imageBitmap, 270f)
                else -> imageBitmap
            }
        imageInputStream.close()

        // We need exactly one face in the image, in other cases, return the
        // necessary errors
        val faces = faceDetector.detect(BitmapImageBuilder(imageBitmap).build()).detections()
        if (faces.size > 1) {
            return@withContext Result.failure<Bitmap>(AppException(ErrorCode.MULTIPLE_FACES))
        } else if (faces.size == 0) {
            return@withContext Result.failure<Bitmap>(AppException(ErrorCode.NO_FACE))
        } else {
            // Validate the bounding box and
            // return the cropped face
            val rect = faces[0].boundingBox().toRect()
            if (validateRect(imageBitmap, rect)) {
                val croppedBitmap =
                    Bitmap.createBitmap(
                        imageBitmap,
                        rect.left,
                        rect.top,
                        rect.width(),
                        rect.height()
                    )
                return@withContext Result.success(croppedBitmap)
            } else {
                return@withContext Result.failure<Bitmap>(
                    AppException(ErrorCode.FACE_DETECTOR_FAILURE)
                )
            }
        }
    }

The TensorFlow Lite model used for face detection is stored in the assets folder of the app. The faceDetector is initialized as follows,

private val modelName = "blaze_face_short_range.tflite" // in assets folder
private val baseOptions = BaseOptions.builder().setModelAssetPath(modelName).build()
private val faceDetectorOptions =
    FaceDetector.FaceDetectorOptions.builder()
        .setBaseOptions(baseOptions)
        .setRunningMode(RunningMode.IMAGE)
        .build()
private val faceDetector = FaceDetector.createFromOptions(context, faceDetectorOptions)

Also, rotateBitmap and validateRect are two utility functions that rotate the given bitmap and check the bounds of the given Rect, respectively,

private fun rotateBitmap(source: Bitmap, degrees: Float): Bitmap {
    val matrix = Matrix()
    matrix.postRotate(degrees)
    return Bitmap.createBitmap(source, 0, 0, source.width, source.height, matrix, false)
}

// Check if the bounds of `boundingBox` fit within the
// limits of `cameraFrameBitmap`
private fun validateRect(cameraFrameBitmap: Bitmap, boundingBox: Rect): Boolean {
    return boundingBox.left >= 0 &&
            boundingBox.top >= 0 &&
            (boundingBox.left + boundingBox.width()) < cameraFrameBitmap.width &&
            (boundingBox.top + boundingBox.height()) < cameraFrameBitmap.height
}

Once we’ve cropped a face from each of the user-selected images, we can transform them into embeddings, or fixed-size vectors, using the FaceNet model.

3. Transforming cropped face images into embeddings with FaceNet/TFLite

TensorFlow Lite is a lightweight runtime for performing accelerated inference on a range of devices, including desktop, mobile and edge devices. Our FaceNet model has been converted to the TFLite format, and the TensorFlow team maintains a Maven package for the runtime. The Interpreter class is used to execute the model given an input tensor.
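
The snippets below assume that interpreter has already been created from the FaceNet TFLite model bundled in the app’s assets. As a rough, illustrative sketch (the model file name and interpreter options here are assumptions, not necessarily what the repository uses), the initialization could look like this,

import android.content.Context
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil

// Illustrative initialization of the TFLite Interpreter from a model file in the assets folder.
// "facenet_512.tflite" is a placeholder name; use the model actually bundled with the app.
private fun createInterpreter(context: Context): Interpreter {
    val modelBuffer = FileUtil.loadMappedFile(context, "facenet_512.tflite")
    val options = Interpreter.Options().apply {
        setNumThreads(4) // run inference on multiple CPU threads
    }
    return Interpreter(modelBuffer, options)
}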

Before passing the image to the model, we need to perform some pre-processing on it, which is possible with the ImageProcessor included in the TFLite support library,

// Input image size for FaceNet model.
private val imgSize = 160

// Output embedding size
private val embeddingDim = 512

private var interpreter: Interpreter
private val imageTensorProcessor =
    ImageProcessor.Builder()
        .add(ResizeOp(imgSize, imgSize, ResizeOp.ResizeMethod.BILINEAR))
        .add(StandardizeOp())
        .build()

// Op to perform standardization
// x' = ( x - mean ) / std_dev
class StandardizeOp : TensorOperator {

    override fun apply(p0: TensorBuffer?): TensorBuffer {
        val pixels = p0!!.floatArray
        val mean = pixels.average().toFloat()
        var std = sqrt(pixels.map { pi -> (pi - mean).pow(2) }.sum() / pixels.size.toFloat())
        std = max(std, 1f / sqrt(pixels.size.toFloat()))
        for (i in pixels.indices) {
            pixels[i] = (pixels[i] - mean) / std
        }
        val output = TensorBufferFloat.createFixedSize(p0.shape, DataType.FLOAT32)
        output.loadArray(pixels)
        return output
    }
}

Next, we define a method that takes a cropped face image as a Bitmap and returns the embedding/vector as a FloatArray,

// Gets a face embedding using FaceNet
suspend fun getFaceEmbedding(image: Bitmap): FloatArray =
    withContext(Dispatchers.Default) {
        return@withContext runFaceNet(convertBitmapToBuffer(image))[0]
    }

// Run the FaceNet model
private fun runFaceNet(inputs: Any): Array<FloatArray> {
    val faceNetModelOutputs = Array(1) { FloatArray(embeddingDim) }
    interpreter.run(inputs, faceNetModelOutputs)
    return faceNetModelOutputs
}

// Resize the given bitmap and convert it to a ByteBuffer
private fun convertBitmapToBuffer(image: Bitmap): ByteBuffer {
    return imageTensorProcessor.process(TensorImage.fromBitmap(image)).buffer
}

Chaining the functions getCroppedFace from step (2) and getFaceEmbedding from the above code snippet,

suspend fun addImage(personID: Long, personName: String, imageUri: Uri): Result<Boolean> {
    // Perform face-detection and get the cropped face as a Bitmap
    val faceDetectionResult = mediapipeFaceDetector.getCroppedFace(imageUri)
    if (faceDetectionResult.isSuccess) {
        // Get the embedding for the cropped face, and store it
        // in the database, along with `personId` and `personName`
        val embedding = faceNet.getFaceEmbedding(faceDetectionResult.getOrNull()!!)
        imagesVectorDB.addFaceImageRecord(
            FaceImageRecord(
                personID = personID,
                personName = personName,
                faceEmbedding = embedding
            )
        )
        return Result.success(true)
    } else {
        return Result.failure(faceDetectionResult.exceptionOrNull()!!)
    }
}

imagesVectorDB.addFaceImageRecord is responsible for storing the embedding in the vector database. In the next step, we build the database schema and a data-access object, and see how we can perform vector search, all with ObjectBox.

4. Using ObjectBox to build a vector database

In order to store the face embeddings and perform nearest-neighbor search on them, we need a database with vector-search capabilities. ObjectBox is a promising on-device embedded NoSQL database that supports vector indexing. We follow the official guide to add ObjectBox to our project, and then define data class entities that represent collections within our database.

@Entity
data class FaceImageRecord(
    // primary-key of `FaceImageRecord`
    @Id var recordID: Long = 0,

    // personId is derived from `PersonRecord`
    @Index var personID: Long = 0,

    var personName: String = "",

    // the FaceNet-512 model used here produces a 512-dimensional embedding
    // (the original FaceNet model produces a 128-dimensional embedding)
    @HnswIndex(dimensions = 512)
    var faceEmbedding: FloatArray = floatArrayOf()
)

@Entity
data class PersonRecord(
    // primary-key
    @Id var personID: Long = 0,

    var personName: String = "",

    // number of images selected by the user
    // under the name of the person
    var numImages: Long = 0,

    // time when the record was added
    var addTime: Long = 0
)

You may read more on vector databases and vector search here:

https://www.pinecone.io/learn/vector-database/

We need two collections on-device: one to store the face embeddings and another to hold information about the people whose images were added to the database. Each FaceImageRecord has its own recordID and stores the faceEmbedding alongside the personID, which is derived from a PersonRecord.
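
The data-access classes in the next snippet reference an ObjectBoxStore singleton that holds the BoxStore instance. A minimal sketch of such a singleton, assuming the MyObjectBox class generated by the ObjectBox Gradle plugin from the entities above (the repository’s actual implementation may differ), looks like this,

import android.content.Context
import io.objectbox.BoxStore

// Holds the application-wide BoxStore instance used by the data-access classes below
object ObjectBoxStore {

    lateinit var store: BoxStore
        private set

    // Call once, e.g. from Application.onCreate()
    fun init(context: Context) {
        store = MyObjectBox.builder()
            .androidContext(context.applicationContext)
            .build()
    }
}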

Next, we create two classes that manage interactions with these collections, and define methods for insertion, retrieval and deletion,

class ImagesVectorDB {

    private val imagesBox = ObjectBoxStore.store.boxFor(FaceImageRecord::class.java)

    fun addFaceImageRecord(record: FaceImageRecord) {
        imagesBox.put(record)
    }

    // Performs nearest-neighbor search and returns
    // the record of the closest embedding (if any)
    fun getNearestEmbeddingPersonName(embedding: FloatArray): FaceImageRecord? {
        return imagesBox
            .query(FaceImageRecord_.faceEmbedding.nearestNeighbors(embedding, 10))
            .build()
            .findWithScores()
            .map { it.get() }
            .firstOrNull()
    }

    fun removeFaceRecordsWithPersonID(personID: Long) {
        imagesBox.removeByIds(
            imagesBox.query(FaceImageRecord_.personID.equal(personID)).build().findIds().toList()
        )
    }
}
class PersonDB {

    private val personBox = ObjectBoxStore.store.boxFor(PersonRecord::class.java)

    fun addPerson(person: PersonRecord): Long {
        return personBox.put(person)
    }

    fun removePerson(personID: Long) {
        personBox.removeByIds(listOf(personID))
    }
    
    // Returns the number of records present in the collection
    fun getCount(): Long = personBox.count()

    @OptIn(ExperimentalCoroutinesApi::class)
    fun getAll(): Flow<MutableList<PersonRecord>> =
        personBox.query(PersonRecord_.personID.notNull()).build().flow().flowOn(Dispatchers.IO)
}

We have defined a schema for our database along with helper classes that execute queries for operations required by our application. Next, we’ll discover how to determine the identity of a face captured in the app’s live camera preview.

5. Performing nearest-neighbor search and recognizing faces

To implement face recognition, we first need to set up a live camera preview with CameraX and attach an ImageAnalysis.Analyzer that processes each camera frame. For every frame, we repeat steps (2) and (3), i.e. crop the face from the frame and transform it into a FaceNet embedding; a sketch of such an analyzer is shown below.
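
Below is a rough, illustrative sketch of such an analyzer. It assumes a recent CameraX version where ImageProxy.toBitmap() is available; the FrameAnalyzer class and the FaceRecognitionPipeline interface are hypothetical names, not taken from the repository,

import android.graphics.Bitmap
import android.graphics.Matrix
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.launch

// Hypothetical interface wrapping the recognition logic described in this step
interface FaceRecognitionPipeline {
    suspend fun getNearestPersonName(frameBitmap: Bitmap): String?
}

class FrameAnalyzer(
    private val pipeline: FaceRecognitionPipeline,
    private val scope: CoroutineScope,
    private val onResult: (String?) -> Unit
) : ImageAnalysis.Analyzer {

    override fun analyze(image: ImageProxy) {
        // Convert the frame to a Bitmap and undo the sensor rotation
        val rotation = image.imageInfo.rotationDegrees.toFloat()
        val frameBitmap = image.toBitmap().rotate(rotation)
        image.close() // must be closed so CameraX can deliver the next frame
        scope.launch {
            onResult(pipeline.getNearestPersonName(frameBitmap))
        }
    }

    private fun Bitmap.rotate(degrees: Float): Bitmap {
        if (degrees == 0f) return this
        val matrix = Matrix().apply { postRotate(degrees) }
        return Bitmap.createBitmap(this, 0, 0, width, height, matrix, false)
    }
}

The analyzer would then be attached to an ImageAnalysis use-case with setAnalyzer(executor, analyzer) and bound to the camera lifecycle alongside the Preview use-case.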

Each embedding is then passed to ImagesVectorDB’s getNearestEmbeddingPersonName method, which performs a nearest-neighbor search and returns the record of the nearest neighbor, including the person’s name. This gives us an embedding e that lies closest to our query embedding q, but we still have to decide whether e and q belong to the same person. To make this decision, we compute the cosine similarity between e and q and compare it against a fixed threshold, which is 0.4 in our case: if the similarity falls below the threshold, the face is treated as unrecognized.

The recognition is as follows,

if (cosine(q, e) < 0.4) {
    // Belong to different people
} else {
    // Belong to the same person
    // Hence recognized person name -> e.personName
}

The following code snippet performs these actions using the methods we defined in steps (2) and (3),

// From the given frame, return the name of the person by performing
// face recognition
suspend fun getNearestPersonName(frameBitmap: Bitmap): String? {
    // Perform face-detection and get the cropped face as a Bitmap
    val faceDetectionResult = mediapipeFaceDetector.getCroppedFace(frameBitmap)
    if (faceDetectionResult.isSuccess) {
        // Get the embedding for the cropped face (query embedding)
        val embedding = faceNet.getFaceEmbedding(faceDetectionResult.getOrNull()!!)
        // Perform nearest-neighbor search
        val recognitionResult =
            imagesVectorDB.getNearestEmbeddingPersonName(embedding) ?: return null
        // Calculate cosine similarity between the nearest-neighbor
        // and the query embedding
        val distance = cosineDistance(embedding, recognitionResult.faceEmbedding)
        // If the distance > 0.4, we recognize the person
        // else we conclude that the face does not match enough
        return if (distance > 0.4) {
            recognitionResult.personName
        } else {
            "Not recognized"
        }
    } else {
        return null
    }
}

// Compute the cosine of the angle between the two vectors
private fun cosineDistance(x1: FloatArray, x2: FloatArray): Float {
    var mag1 = 0.0f
    var mag2 = 0.0f
    var product = 0.0f
    for (i in x1.indices) {
        mag1 += x1[i].pow(2)
        mag2 += x2[i].pow(2)
        product += x1[i] * x2[i]
    }
    mag1 = sqrt(mag1)
    mag2 = sqrt(mag2)
    return product / (mag1 * mag2)
}

The String returned by getNearestPersonName is then displayed, along with a bounding box, over the camera preview using a custom View and its onDraw method; a sketch of such an overlay follows.
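
Here is a minimal sketch of such an overlay. The FaceOverlayView class, its update method and the paint values are illustrative assumptions, not the exact View used in the repository,

import android.content.Context
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF
import android.util.AttributeSet
import android.view.View

class FaceOverlayView(context: Context, attrs: AttributeSet? = null) : View(context, attrs) {

    private var boundingBox: RectF? = null
    private var personName: String? = null

    private val boxPaint = Paint().apply {
        style = Paint.Style.STROKE
        strokeWidth = 4f
        color = Color.GREEN
    }
    private val textPaint = Paint().apply {
        textSize = 42f
        color = Color.WHITE
    }

    // Called by the recognition code with the latest detection result
    fun update(box: RectF?, name: String?) {
        boundingBox = box
        personName = name
        invalidate() // triggers a redraw, i.e. another call to onDraw()
    }

    override fun onDraw(canvas: Canvas) {
        super.onDraw(canvas)
        val box = boundingBox ?: return
        canvas.drawRect(box, boxPaint)
        personName?.let { canvas.drawText(it, box.left, box.top - 12f, textPaint) }
    }
}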

This ends our face recognition pipeline, and we repeat step (5) for every face found in the live camera preview.

Conclusion

If you made it this far, congrats! I hope the face recognition pipeline and its code implementation were clear. I encourage enthusiastic readers to implement such a pipeline on iOS, Flutter or KMP, referring to the Android implementation on GitHub.

I’m an on-device ML enthusiast and developing ML apps on Android is my passion. Do check out my website and share your thoughts on this story. Keep learning and have a nice day ahead!

This article was previously published on proandroiddev.com.
