If you have tried to understand the maths behind machine learning, including deep learning, you would have come across topics from Information Theory – Entropy, Cross Entropy, KL Divergence, etc. The concepts from information theory is ever prevalent in the realm of machine learning, right from the splitting criteria of a Decision Tree to loss functions in Generative Adversarial Networks.
If you are a beginner in Machine Learning, you might not have made an effort to go deep and understand the mathematics behind the “.fit()“, but as you mature and stumble across more and more complex problems, it becomes essential to understand the math or at least the intuition behind the maths to effectively apply the right technique at the right place.
When I was starting out, I was also guilty of the same. I’ll see “Cross Categorical Entropy” as a loss function in a Neural Network and I take it for granted – that it is some magical loss function that works with multi-class labels. I’ll see “entropy” as one of the splitting criterion in Decision Trees and I just experiment with it without understanding what it is. But as I matured,