By: Mike Palei
So you trained your model, a real miracle in the world of AI, and now comes the time to serve it. But how exactly? What is the best way? Caboom! 💥💥💥
- A web server instance? Something like Flask if you are in the Python world. But how do you solve concurrency and scaling issues? 🚀
- A managed RESTful endpoint? For example, you could use AWS SageMaker with autoscaling. But then you need to keep very close tabs on the budget. 💵
- Kubernetes? You will have to invest quite a bit of DevOps effort to manage it. 👷
- A managed Kubernetes cluster as a service? Once again, it comes at a cost. 💵
Wait! What about AWS Lambda? There is very little management effort once you have configured it correctly, and you pay only for what you actually use. Sounds interesting, right? Especially if your customers pay you per API call. 😉
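To make the idea a bit more concrete, here is a minimal sketch of what a Lambda function serving a model might look like in Python. It is an illustration under assumptions, not a finished setup: `load_model` is a hypothetical stand-in for loading your real model, the `features` field is an invented request format, and the event is assumed to arrive in the API Gateway proxy style, with the request as a JSON string under `body`.

```python
import json


def load_model():
    """Hypothetical stand-in for loading a real model (e.g. from disk or S3)."""
    def predict(features):
        # Toy "model": returns the mean of the input features.
        return sum(features) / len(features)
    return predict


# Loaded at import time, so it runs once per execution environment
# and is reused by later invocations that hit the same warm environment.
MODEL = load_model()


def handler(event, context):
    """Function that Lambda invokes for each request."""
    # Assumption: API Gateway proxy-style event with a JSON string in "body".
    payload = json.loads(event.get("body") or "{}")
    features = payload.get("features")  # "features" is an assumed field name
    if not features:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "missing 'features'"}),
        }
    prediction = MODEL(features)
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction}),
    }
```

Keeping the model at module level rather than inside `handler` is the usual trick here: the expensive load happens on a cold start only, and warm invocations go straight to prediction.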