Optimizing a Heavy Web Service

Mateusz Kubuszok > Scalac

Agenda

  • when to optimize
  • optimizing queries
  • caching requests
  • keeping things simple

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

-- Donald Knuth

How can we tell if optimization is needed and where?

Measure!


  trait Service[Request, Response] {

    def apply(request: Request): Future[Response]
  }

  object Service {

    def apply[Request, Response](body: Request => Future[Response]) =
      new Service[Request, Response] {

        override def apply(request: Request) = body(request)
      }
  }
            

def monitored[Request, Response](name: String)
                                (service: Service[Request, Response]) =
  new Service[Request, Response] {

    override def apply(request: Request) = {
      NewRelic.incrementCounter(name)
      val start = System.currentTimeMillis
      val response = service(request)
      response onComplete { result =>
        val time = System.currentTimeMillis - start
        NewRelic.recordResponseTimeMetric(name, time)
        // scala.concurrent.Future has no onError; report failures from the same callback
        result.failed.foreach(ex => NewRelic.noticeError(ex))
      }
      response
    }
  }
            

val getUsersBillings: Service[UsersBillingsReq, UsersBillingsRes] =
  monitored("UserServices.getUsersBillings") {
    Service { request =>
      Future {
        val formattedBillings = for {
          user     <- userRepository.fetchUsers(request.userIds)
          contract <- user.contracts
          billing  <- contract.billings
        } yield formatBilling(user, contract, billing)

        UsersBillingsRes(formattedBillings)
      }
    }
  }
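
The wrapper above depends on the New Relic agent, so it is hard to try in isolation. Below is a minimal sketch of the same idea with a hypothetical in-memory recorder (`InMemoryMetrics` and its methods are assumptions made for illustration, not part of any library). It uses `Future.transform` (Scala 2.12+), so the metrics are written before the returned future completes, which differs slightly from the callback-based version above.

```scala
import scala.concurrent.{ExecutionContext, Future}

object MonitoringSketch {

  // Hypothetical in-memory stand-in for a metrics backend such as New Relic.
  final class InMemoryMetrics {
    private var calls   = Map.empty[String, Int]
    private var errors  = Map.empty[String, Int]
    private var timesMs = Map.empty[String, Vector[Long]]

    def incrementCounter(name: String): Unit = synchronized {
      calls = calls.updated(name, calls.getOrElse(name, 0) + 1)
    }
    def recordTime(name: String, millis: Long): Unit = synchronized {
      timesMs = timesMs.updated(name, timesMs.getOrElse(name, Vector.empty[Long]) :+ millis)
    }
    def recordError(name: String): Unit = synchronized {
      errors = errors.updated(name, errors.getOrElse(name, 0) + 1)
    }

    def callCount(name: String): Int        = synchronized(calls.getOrElse(name, 0))
    def errorCount(name: String): Int       = synchronized(errors.getOrElse(name, 0))
    def timings(name: String): Vector[Long] = synchronized(timesMs.getOrElse(name, Vector.empty[Long]))
  }

  // Wraps a request handler: counts the call, then records duration and failures.
  def monitored[Req, Res](name: String, metrics: InMemoryMetrics)
                         (service: Req => Future[Res])
                         (implicit ec: ExecutionContext): Req => Future[Res] =
    request => {
      metrics.incrementCounter(name)
      val start = System.nanoTime
      service(request).transform { result =>
        metrics.recordTime(name, (System.nanoTime - start) / 1000000)
        if (result.isFailure) metrics.recordError(name)
        result // pass the original outcome through unchanged
      }
    }
}
```

Swapping the recorder for a real backend should only require changing the three recording methods.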
            

Queries to the database

  • n+1 query problem
  • recalculating things all over again

n+1 query

  • often a consequence of using the Active Record pattern
  • usually not much of an issue when the number of requests is low, operations are simple and DB latency is low (e.g. small blogs, very simple CRUDs)
  • kills performance when the service is heavily used, requires data from a long chain of relations, and latency is non-negligible

// avg. user has 3 contracts
// avg. contract has 2 billings

val formattedBillings = for {
  user     <- userRepository.fetchUsers(request.userIds)
              // 1 DB query
  contract <- user.contracts
              // next 3 DB queries on avg.
  billing  <- contract.billings
              // next 3*2=6 DB queries on avg.
} yield formatBilling(user, contract, billing)
        // total 10 queries on avg.
            

// avg. user has 3 contracts
// avg. contract has 2 billings

val users     = userRepository.fetchUsers(request.userIds)
                  .map(user => (user.id, user)).toMap
val contracts = contractRepository.findForUserIds(users.keys)
                  .map(contract => (contract.id, contract)).toMap
val billings  = billingRepository.findByContractIds(contracts.keys)
                // 3 queries in total

val formattedBillings = for {
  billing  <- billings
  contract <- contracts.get(billing.contractId).toSeq
  user     <- users.get(contract.userId).toSeq
} yield formatBilling(user, contract, billing)
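
To make the difference measurable, here is a small sketch that counts queries against a hypothetical in-memory "database" (`Db`, its methods and the sample data are assumptions made for illustration). It counts one query per relation traversal, which is one plausible model of lazy loading; the exact number depends on the ORM in use.

```scala
object QueryCountSketch {

  final case class User(id: Long)
  final case class Contract(id: Long, userId: Long)
  final case class Billing(id: Long, contractId: Long)

  // Hypothetical in-memory "database" that counts every query it serves.
  final class Db(contracts: Seq[Contract], billings: Seq[Billing]) {
    var queries = 0

    def fetchUsers(ids: Seq[Long]): Seq[User] = { queries += 1; ids.map(User(_)) }

    // Lazy-loading style: one query per parent row.
    def contractsOf(userId: Long): Seq[Contract] = {
      queries += 1; contracts.filter(_.userId == userId)
    }
    def billingsOf(contractId: Long): Seq[Billing] = {
      queries += 1; billings.filter(_.contractId == contractId)
    }

    // Batched style: one query per table, filtered by a set of parent ids.
    def contractsForUsers(userIds: Set[Long]): Seq[Contract] = {
      queries += 1; contracts.filter(c => userIds(c.userId))
    }
    def billingsForContracts(contractIds: Set[Long]): Seq[Billing] = {
      queries += 1; billings.filter(b => contractIds(b.contractId))
    }
  }

  def lazyStyle(db: Db, userIds: Seq[Long]): Seq[(User, Contract, Billing)] =
    for {
      user     <- db.fetchUsers(userIds)
      contract <- db.contractsOf(user.id)
      billing  <- db.billingsOf(contract.id)
    } yield (user, contract, billing)

  def batched(db: Db, userIds: Seq[Long]): Seq[(User, Contract, Billing)] = {
    val users     = db.fetchUsers(userIds).map(u => u.id -> u).toMap
    val contracts = db.contractsForUsers(users.keySet).map(c => c.id -> c).toMap
    val billings  = db.billingsForContracts(contracts.keySet)
    for {
      billing  <- billings
      contract <- contracts.get(billing.contractId).toSeq
      user     <- users.get(contract.userId).toSeq
    } yield (user, contract, billing)
  }

  // 2 users, 3 contracts per user, 2 billings per contract.
  def sampleDb(): Db = new Db(
    contracts = (1L to 6L).map(id => Contract(id, (id - 1) / 3 + 1)),
    billings  = (1L to 12L).map(id => Billing(id, (id - 1) / 2 + 1))
  )
}
```

With this sample data and users 1 and 2, the lazy version issues 1 + 2 + 6 = 9 queries while the batched version issues 3, and both return the same 12 rows. The gap widens with every additional user.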
            

Caching DB requests

Result we'd like to cache


Future {
  val users     = userRepository.fetchUsers(request.userIds)
                    .map(user => (user.id, user)).toMap
  val contracts = contractRepository.findForUserIds(users.keys)
                    .map(contract => (contract.id, contract)).toMap
  val billings  = billingRepository.findByContractIds(contracts.keys)

  val formattedBillings = for {
    billing  <- billings
    contract <- contracts.get(billing.contractId).toSeq
    user     <- users.get(contract.userId).toSeq
  } yield formatBilling(user, contract, billing)

  UsersBillingsRes(formattedBillings)
}
            

How could we achieve that with Redis?


{
  "EntityKey(user, 1)" : {
    "user-billings-1,2,5": "",
    ...
  },
  "EntityKey(user, 2)" : {
    "user-billings-1,2,5": "",
    ...
  },
  ...
  "user-billings-1,2,5": [serialized value],
  ...
}
            

case class EntityKey(entityType: String, id: String) // `type` is a reserved word in Scala

trait CacheContext[T] {

  def entityKeys(value: T): Seq[EntityKey] // entities result depends on
  def serializer: Serializer[T]            // T -> String
  def deserializer: Deserializer[T]        // String -> T
}
            

trait CacheHandler {
  def getOrPut[T](valueKey: String, ttl: Duration)
                 (valueF: => Future[T])
                 (implicit cacheContext: CacheContext[T],
                           executionContext: ExecutionContext): Future[T]
  def invalidate(entityKeys: EntityKey*): Future[Unit]
}
            

Obtaining values from Redis


def get[T](key: String)
          (implicit cacheContext: CacheContext[T],
                    executionContext: ExecutionContext): Future[Option[T]] =
  redisClient.mget(key) map { gets =>
    gets.headOption map (_.toArray) map cacheContext.deserializer
  }
            

Storing values into Redis


def put[T](valueKey: String, value: T, ttl: Duration)
          (implicit cacheContext: CacheContext[T],
                    executionContext: ExecutionContext): Future[Unit] = {
  val entityKeys = cacheContext.entityKeys(value)
  val bytes = cacheContext.serializer(value).bytes
  val transaction = redisClient.multi()
  transaction.setex(valueKey, ttl.toSeconds, bytes)
  entityKeys map (_.toString) foreach { entityKey =>
    transaction.hmset(entityKey, Map(valueKey -> Array[Byte]()))
    transaction.expire(entityKey, maxTtl.toSeconds)
  }
  transaction.exec() map (_ => ())
}
            

def getOrPut[T](valueKey: String, ttl: Duration)
               (valueF: => Future[T])
               (implicit cacheContext: CacheContext[T],
                         executionContext: ExecutionContext): Future[T] =
  get[T](valueKey) flatMap { optionValue =>
    optionValue map Future.successful getOrElse {
      for {
        value <- valueF
        _     <- put[T](valueKey, value, ttl)
      } yield value
    }
  }
            

Invalidating values


val transaction = redisClient.multi()
val keys = entityKeys map (_.toString) map { key =>
  key -> transaction.hgetall(key) }
for {
  _ <- transaction.exec()
  invTransaction = redisClient.multi()
  allInvalidatedKeys <- Future.sequence(keys map { case (key, keyMapF) =>
    keyMapF map { keyMap =>
      val invKeys = keyMap.keys
      invTransaction.hdel(key, invKeys:_*)
      invKeys
    }
  }) map (_.flatten)
  _ = invTransaction.del(allInvalidatedKeys:_*)
  _ <- invTransaction.exec()
} yield ()
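
The bookkeeping behind `getOrPut` and `invalidate` is easier to follow without the Redis plumbing. The sketch below models the same layout in memory: cached values by key, plus an index from each entity to the value keys that depend on it. It is synchronous, ignores TTLs, and takes the entity-key function as a plain parameter instead of an implicit `CacheContext`; `InMemoryCache` and its `computations` counter are assumptions made for illustration.

```scala
import scala.collection.mutable

object EntityIndexedCacheSketch {

  final case class EntityKey(entityType: String, id: String)

  // In-memory stand-in for the Redis layout shown earlier. TTLs are omitted.
  final class InMemoryCache {
    private val values = mutable.Map.empty[String, Any]
    private val index  = mutable.Map.empty[EntityKey, mutable.Set[String]]

    // Number of times a value had to be computed (cache misses).
    var computations = 0

    def getOrPut[T](valueKey: String)(entityKeys: T => Seq[EntityKey])(compute: => T): T =
      values.get(valueKey) match {
        case Some(cached) => cached.asInstanceOf[T]
        case None =>
          computations += 1
          val value = compute
          values(valueKey) = value
          // Register the value key under every entity the value depends on.
          entityKeys(value).foreach { key =>
            index.getOrElseUpdate(key, mutable.Set.empty[String]) += valueKey
          }
          value
      }

    // Drops every cached value that depends on any of the given entities.
    def invalidate(entityKeys: EntityKey*): Unit =
      entityKeys.foreach { key =>
        index.remove(key).foreach { valueKeys =>
          valueKeys.foreach(valueKey => values.remove(valueKey))
        }
      }
  }
}
```

Invalidating `EntityKey("user", "2")` removes a value cached for users 1, 2 and 5, so the next `getOrPut` for that key recomputes it, while invalidating an unrelated entity leaves the cached value in place.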
            

implicit val userBillingsContext = new CacheContext[UsersBillingsRes] {

  def entityKeys(value: UsersBillingsRes): Seq[EntityKey] = ...
  def serializer: Serializer[UsersBillingsRes]            = ...
  def deserializer: Deserializer[UsersBillingsRes]        = ...
}
            

cacheHandler.getOrPut[UsersBillingsRes](
    s"user-billings-${request.userIds.mkString(",")}", 10 minutes) {
  Future {
    // ...

    UsersBillingsRes(formattedBillings)
  }
}
            

Caching API requests


trait UsersControllerImpl extends UsersController {

  def getBillingsForCurrentUserAndContractor(userId: Long) =
      authenticatedRequest { request =>
    val observedEntityId   = userId
    val observedEntityType = "user"
    val currentUserId = currentUser.id

    for {
      result <- userServices.getContractBillingsForPair(
          ContractBillingsForPairRequest(userId, currentUserId))
    } yield {
      // create JSON from user
    }
  }
}
            

Let's improve CacheHandler a little


def getOrPut[T](valueKey: String, ttl: Duration, request: Request)
               (valueF: => Future[T])
               (implicit cacheContext: RequestCacheContext[T],
                         executionContext: ExecutionContext): Future[T] = {
  val uri    = request.uri
  val header = request.headers.get(Headers.Authentication)
                              .getOrElse("")
  val params = request.params.map { case (name, value) =>
    name + ":" + value.sorted.toString
  }.toSeq.sorted
  val requestSuffix = s"$uri-$header-${params.mkString}"

  getOrPut[T](s"$valueKey-$requestSuffix", ttl)(valueF)
}
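
The key derivation is the part that decides whether two requests share a cache entry, so it helps to see it as a pure function. This is a minimal sketch with a hypothetical `Request` model (a real framework's request type will differ, and the `"Authorization"` header name and the `,` and `&` separators are assumptions). Parameters and their values are sorted so that ordering does not produce different keys.

```scala
object RequestKeySketch {

  // Hypothetical minimal request model, for illustration only.
  final case class Request(uri: String,
                           headers: Map[String, String],
                           params: Map[String, Seq[String]])

  // Builds a suffix that is stable under reordering of params and their values.
  def requestSuffix(request: Request): String = {
    val auth   = request.headers.getOrElse("Authorization", "")
    val params = request.params.toSeq
      .map { case (name, values) => name + ":" + values.sorted.mkString(",") }
      .sorted
    s"${request.uri}-$auth-${params.mkString("&")}"
  }

  def cacheKey(valueKey: String, request: Request): String =
    s"$valueKey-${requestSuffix(request)}"
}
```

Because the authentication header is part of the key, two users never share an entry. In a real deployment it may be preferable to hash the header rather than store it verbatim in the key.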
            

def getBillingsForCurrentUserAndContractor(userId: Long) =
    authenticatedRequest { request =>
  val observedEntityId = userId
  val currentUserId    = currentUser.id

  cacheHandler.getOrPut[BillingJsonResult](
      s"api-billings-for-$userId", 10 minutes, request) {
    for {
      result <- userServices.getContractBillingsForPair(
          ContractBillingsForPairRequest(userId, currentUserId))
    } yield {
      // create JSON from user
    }
  }
}
            

Removal of redundant queries


val users     = userRepository.fetchUsers(request.userIds)
                  .map(user => (user.id, user)).toMap
val contracts = contractRepository.findForUserIds(users.keys)
                  .map(contract => (contract.id, contract)).toMap
val billings  = billingRepository.findByContractIds(contracts.keys)

val formattedBillings = for {
  billing  <- billings
  contract <- contracts.get(billing.contractId).toSeq
  user     <- users.get(contract.userId).toSeq
} yield formatBilling(user, contract, billing)

UsersBillingsRes(formattedBillings)
            

[
  {
    "id": 5543,
    "date": "2015-10-23",
    ...
    "_links": ...,
    "_embedded": {
      "user": {
        "id": 1,
        "name": "John",
        "surname": "Smith",
        ...
      },
      "contract": {
        "id": 646,
        ...
      },
      "billings": {
        "id": 766,
        ...
      }
    }
  },
  ...
]
            

Was it all really needed?

No!

We actually needed:

  • Billing
  • User ID
  • Contract ID


// UserId -> Billings
val billingsByUsers: Map[Long, Seq[Billing]] =
    billingRepository.findByUserIds(request.userIds)

val formattedBillings = for {
  (userId, billings) <- billingsByUsers
  billing            <- billings
  contractId        = billing.contractId
} yield formatBilling(userId, contractId, billing)

UsersBillingsRes(formattedBillings)
            

[
  {
    "id": 5543,
    "date": "2015-10-23",
    "userId": 1,
    "contractId": 646,
    ...
  },
  ...
]
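
A minimal sketch of the slimmed-down version, assuming the repository can return billings already paired with their user ID from a single joined query. `BillingRow` and `FormattedBilling` are hypothetical types introduced for illustration; only the IDs travel with each billing, so no user or contract entities need to be loaded.

```scala
object SlimBillingsSketch {

  final case class Billing(id: Long, contractId: Long, date: String)

  // One row of a hypothetical single joined query: a billing plus its owner's ID.
  final case class BillingRow(userId: Long, billing: Billing)

  final case class FormattedBilling(id: Long, date: String, userId: Long, contractId: Long)

  // UserId -> Billings, built from the rows of that one query.
  def groupByUser(rows: Seq[BillingRow]): Map[Long, Seq[Billing]] =
    rows.groupBy(_.userId).map { case (userId, userRows) =>
      userId -> userRows.map(_.billing)
    }

  // Flattens the map into response items, sorted by billing ID for a stable order.
  def format(billingsByUsers: Map[Long, Seq[Billing]]): Seq[FormattedBilling] =
    (for {
      (userId, billings) <- billingsByUsers.toSeq
      billing            <- billings
    } yield FormattedBilling(billing.id, billing.date, userId, billing.contractId))
      .sortBy(_.id)
}
```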
            

Summary

  • measure what needs to be optimized
  • avoid n+1 queries
  • cache queries and API requests when necessary
  • keep things simple and small

Questions?

Thank you!