PCA: Analyzing UFC fighting

As a longtime martial artist and MMA fan, I have watched mixed martial arts progress from a competition between old fighting styles into a real sport, where competitors take what works from each martial art and tailor it to their own abilities to serve them best when facing other trained killers.

From the first UFCs, where fighters were strict boxers, wrestlers, BJJ practitioners or tough guys, the sport evolved into a mixture of styles. There is no longer a ‘pure’ kickboxer or a ‘pure’ wrestler: each martial artist can now kick, punch, shoot for a takedown, or pass guard and submit. Nonetheless, before each fight, competitors are still presented as strikers, wrestlers or grapplers, depending on their background, what they showed in their last fight, or what the announcers think of the fighter’s style and ability.

As a data science enthusiast, this never sat right with me. I believe in a world view where assumptions and hypotheses must be tested against real-life data to gauge their truthfulness.

So I found a UFC DATASET on Kaggle and got to work 😄

MMA fighting styles

Overview of the project

Now armed with the dataset:

  • I first aggregated the data into a dataframe with one row per fighter, holding the mean of 81 stats across all their fights (e.g. mean body kicks landed, mean takedown attempts …)
  • Many of these features are highly correlated, so I dropped features whose pairwise correlation was above 0.9 (I could have used the p-value, something to try in the V2)
  • I applied PCA to the resulting dataframe
  • I applied K-means on the PCA result (with the features reduced to 5 components)
  • I plotted the result of the clustering first in 2D (features 1 and 2), then in 3D
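The steps above can be sketched roughly as follows. This is a minimal sketch and not the notebook's actual code: the `fighter` column name, the standardization step, the number of clusters and the random seed are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair of features is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


def cluster_fighters(fights: pd.DataFrame, n_components: int = 5,
                     n_clusters: int = 3, seed: int = 0):
    # One row per fighter: the mean of every numeric stat across all fights.
    # "fighter" is a hypothetical column name.
    per_fighter = fights.groupby("fighter").mean(numeric_only=True)
    reduced = drop_correlated(per_fighter)
    # Assumption: standardize so that stats on large scales do not dominate the PCA.
    scaled = StandardScaler().fit_transform(reduced)
    pca = PCA(n_components=n_components, random_state=seed)
    scores = pca.fit_transform(scaled)
    # Assumption: n_clusters=3 is a placeholder, not the value used in the notebook.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(scores)
    return per_fighter.index, reduced.columns, pca, scores, labels
```

The first two columns of `scores` are what a 2D scatter plot of the clusters would use, and the first three give the 3D view.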

Getting to work

I embedded the whole Jupyter notebook.

Well, well, well.

Let’s see what we’ve got: did we get the stereotypical features of the wrestle-boxer, the pure striker or the ‘specialist grappler’? The heatmap below shows which combinations of original features explain the variability in the data, which is a really useful insight when trying to work out which combinations of abilities explain different fighting styles.

PCA result

1.Ability 1 :

  • Top 5 features :
TIP_Control Time                             0.224394
Strikes_Ground Total Strikes_Attempts        0.212942
Strikes_Body Significant Strikes_Attempts    0.211967
TIP_Distance Time                            0.211406
Strikes_Kicks_Attempts                       0.205699
  • Bottom 5 features :
Grappling_Reversals_Landed                   0.053318
Strikes_Knock Down_Landed                    0.077610
Strikes_Ground Significant Kicks_Attempts    0.101966
Strikes_Ground Leg Strikes_Attempts          0.103285
TIP_Side Control Time                        0.105258

2.Ability 2 :

  • Top 5 features :
TIP_Ground Control Time                  0.262890
TIP_Control Time                         0.233163
TIP_Ground Time                          0.220465
Strikes_Ground Total Strikes_Attempts    0.220360
TIP_Half Guard Control Time              0.200544
  • Bottom 5 features :
Grappling_Reversals_Landed                   0.053318
Strikes_Knock Down_Landed                    0.077610
Strikes_Ground Significant Kicks_Attempts    0.101966
Strikes_Ground Leg Strikes_Attempts          0.103285
TIP_Side Control Time                        0.105258

3.Ability 3 :

  • Top 5 features :
Strikes_Clinch Head Strikes_Attempts         0.319705
Strikes_Body Significant Strikes_Attempts    0.301377
Strikes_Clinch Body Strikes_Attempts         0.278838
Strikes_Distance Head Strikes_Attempts       0.259668
Grappling_Standups_Landed                    0.259087
  • Bottom 5 features :
Strikes_Kicks_Attempts                        -0.214250
Strikes_Ground Significant Punches_Attempts   -0.206274
Strikes_Clinch Significant Kicks_Attempts     -0.188765
Strikes_Distance Leg Kicks_Attempts           -0.182671
Strikes_Distance Head Kicks_Attempts          -0.175149

4.Ability 4 :

  • Top 5 features :
TIP_Mount Control Time            0.246223
Grappling_Submissions_Attempts    0.237841
Strikes_Knock Down_Landed         0.223450
TIP_Ground Time                   0.196941
TIP_Back Control Time             0.182116
  • Bottom 5 features :
Strikes_Clinch Leg Strikes_Attempts           -0.309975
Strikes_Clinch Body Strikes_Attempts          -0.281264
Strikes_Clinch Significant Kicks_Attempts     -0.271919
Strikes_Clinch Significant Punches_Attempts   -0.267636
Strikes_Ground Body Strikes_Attempts          -0.247264

5.Ability 5 :

  • Top 5 features :
Strikes_Leg Total Strikes_Attempts           0.335641
Strikes_Clinch Leg Strikes_Attempts          0.319116
TIP_Mount Control Time                       0.267034
TIP_Clinch Time                              0.187386
Strikes_Clinch Significant Kicks_Attempts    0.187309
  • Bottom 5 features :
Strikes_Ground Significant Punches_Attempts   -0.356793
Strikes_Ground Significant Kicks_Attempts     -0.331809
Strikes_Ground Head Strikes_Attempts          -0.218538
Strikes_Ground Body Strikes_Attempts          -0.187797
TIP_Misc. Ground Control Time                 -0.184458
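
Top and bottom lists like the ones above can be read straight off a fitted PCA. Here is a minimal sketch, assuming a fitted scikit-learn `PCA` object and the list of feature names that went into it; the helper name `top_bottom_loadings` is made up for illustration.

```python
import pandas as pd
from sklearn.decomposition import PCA


def top_bottom_loadings(pca: PCA, feature_names, component: int, k: int = 5):
    """Return the k largest and the k smallest loadings of one principal component."""
    # Each row of components_ holds the weight of every original feature
    # in that principal component.
    loadings = pd.Series(pca.components_[component], index=list(feature_names))
    return loadings.nlargest(k), loadings.nsmallest(k)
```

Keep in mind that the sign of a principal component is arbitrary: flipping every loading of a component describes the same axis, so only the contrast between the positive and negative ends carries meaning.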


Well, first of all, I deleted features that were highly correlated with each other, so the features I got after the PCA should be taken with a grain of salt: a feature listed here is not the only ‘important feature’, since the ones it was highly correlated with were dropped.

Feature 1: It correlates positively with all features: strongly with striking, and less with knockdowns, grappling reversals, kicks on the ground (which is stupid) and side control. It’s the equivalent of having good striking and good ground strikes in some form of a guard.

Feature 2: The second feature is really interesting. It correlates highly with ground control time, ground strikes and half guard control, and less with grappling reversals and leg strikes on the ground (again, stupid: you’ll likely lose the position for little damage). To me this feels like the ability to grapple. This feature explains 12% of the variability of the data.

Feature 3: What sets this feature apart is the ability to land strikes at clinching distance, plus body strikes and, wait for it, grappling standups. It correlates negatively with ground punches, distance leg kicks and head kicks.

OK, so this is somebody who strikes at clinching distance, tries to stand up and doesn’t like to throw kicks at distance. This sounds like a striker fighting a wrestler/grappler: he will end up in a tie-up, and he’ll avoid throwing kicks at distance so he doesn’t get taken down.

Feature 4: This feature is a combination of mount control time, submission attempts, knockdowns and ground/back control time. It is negatively correlated with all clinch strikes and ground body strikes. Well, this is a strangler: the ability to grapple and throw haymakers because you’re not worried about going to the ground.

Feature 5: This one is weird. As a feature of fighting, it’s positively correlated with strikes at all distances, clinch time and mount control time, and negatively correlated with ground strikes and ground control time.

I can’t explain why this is a feature; maybe it is a feature of holes in your game …
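
A figure like the 12% quoted for feature 2 is the share of total variance captured by one component, which scikit-learn exposes as `explained_variance_ratio_`. A minimal sketch, assuming `X` is the per-fighter feature matrix; the standardization step and the helper name `variance_shares` are assumptions, not taken from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def variance_shares(X, n_components: int = 5) -> np.ndarray:
    """Fraction of the total variance captured by each principal component."""
    # Assumption: features are standardized before the PCA.
    scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components).fit(scaled)
    # Entry i is the share of variance explained by component i, largest first.
    return pca.explained_variance_ratio_
```

Summing the returned array tells you how much of the original variability the five abilities retain together, which helps judge how much weight a hard-to-explain component like feature 5 deserves.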

V2. What’s next ?

  1. I applied the K-means algorithm to cluster fighters
  2. A more in-depth analysis of the features can be done. I need to select the features more wisely (in the V1 I did this by eliminating features that are highly correlated with each other)
  2Bis. This analysis is all about fighting ‘offense’. This was a great insight my brother had: all the stats show offensive ability. A stat about fighting defense should be calculated relative to your opponents’ landed versus attempts ratio: a great boxer (Floyd Mayweather, for instance) reduces both the landed AND the attempted strikes of his opponents, relative to their previous fights, when they face him. This should be added to the model
  3. I applied the t-SNE algorithm, which is particularly well suited to the visualization of high-dimensional datasets. It shows a horizontally symmetrical distribution of data points:

t-SNE
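
For reference, a t-SNE projection of this kind can be produced with scikit-learn. This is a minimal sketch, assuming `scores` is a numeric array with one row per fighter (the PCA output, for example); the perplexity, the PCA initialization and the seed are placeholder choices, not the settings used for the plot.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_2d(scores, perplexity: float = 30.0, seed: int = 0) -> np.ndarray:
    """Project rows of `scores` to 2D with t-SNE for plotting."""
    # perplexity must be smaller than the number of rows in `scores`.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(scores)
```

t-SNE preserves local neighborhoods rather than global distances, so the overall shape of the point cloud (including any apparent symmetry) should be read with caution and checked across a few perplexity values.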