Beyond the Hype: Realities of ML/AI Abstraction
As a data scientist, I see a lot of ML/AI projects boasting about the amazing things their library can do with ‘pip install’ and 3 lines of Python. In parallel, data scientists have an insatiable appetite for libraries that handle every little computer science detail, so they can “offload” the mental burden of learning how the underlying system works and quickly reproduce the state-of-the-art thing they read about in some arXiv paper last week. I don’t know if this kind of thinking is a consequence of how simple Python is compared to compiled languages, or a direct result of pushing higher and higher levels of abstraction in data science …
While it is amazing that we live in a world where these 3 lines of Python can do what world-class scientists couldn’t do on HPC systems 10 years ago, I don’t think this habit is helpful. First of all, these 3 lines are hiding a huge amount of complexity! Secondly, constantly reaching for libraries out of laziness or fear doesn’t help data scientists, or engineers in general. One of the joys of coding is spending time deeply understanding and building complex systems. If you are going to listen to anybody, listen to the legendary John Carmack, who gave this advice: “Drive a nail down through whatever layer cake problem you’ve got and learn a cross-section from there.” So please, don’t just eat the frosting.
OK, but complexity is bad and abstraction is good, right?
Not quite. These amazing 3 lines are great when you start a project: you copy-paste them into a pretty little notebook and everything works like magic! Well, it’s not magic, my friend, and your facial expression will change when you need to productionize your pretty notebook, optimize some part of the code, integrate it with your existing services, deploy it on different infrastructure, or do any of a thousand other things.
Hiding code complexity behind an abstraction isn’t the answer. Having good abstractions is a whole different story. Good abstractions don’t conceal complexity; they simplify it. They let you modify behavior, optimize specific code paths, and deploy in modular pieces.
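To make that a bit more concrete, here is a minimal sketch of what I mean, using only the standard library. Every name in it (Embedder, mean_pool, max_pool, the toy lookup) is hypothetical and made up for illustration; it is not the API of any real library. The point is only that the convenient default call is still there, but each layer underneath is visible and replaceable.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical, illustrative names only; not any real library's API.
Vector = Sequence[float]


def mean_pool(rows: Sequence[Vector]) -> list[float]:
    """Default reduction: column-wise mean of the token vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def max_pool(rows: Sequence[Vector]) -> list[float]:
    """Alternative reduction a user can swap in: column-wise max."""
    return [max(col) for col in zip(*rows)]


@dataclass
class Embedder:
    # Each stage is a plain callable with a simple default, so the
    # "easy" path still works, but nothing is sealed inside a black box.
    tokenize: Callable[[str], list[str]] = str.split
    # Toy lookup (an assumption for the sketch): (token length, 1.0).
    lookup: Callable[[str], Vector] = lambda tok: (float(len(tok)), 1.0)
    pool: Callable[[Sequence[Vector]], list[float]] = mean_pool

    def __call__(self, text: str) -> list[float]:
        tokens = self.tokenize(text)
        return self.pool([self.lookup(tok) for tok in tokens])


# The convenient default path.
default_vec = Embedder()("hello big world")

# The same abstraction with one stage replaced, no forking required.
custom_vec = Embedder(pool=max_pool)("hello big world")
```

When you later need a faster pooling step or a different tokenizer in production, you replace that one callable instead of fighting the library or rewriting the notebook.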
Finally, I don’t believe you are actually saving that much time by using these amazing premade libraries. I’m not suggesting you code everything from scratch in assembly. By all means work at a higher layer of abstraction, but determine for yourself which layer of the tech stack is good enough for you to be productive.