
Quantifying Ignorance

Ever since I was a child I have had a dream, albeit a naïve one, to know everything. Obviously, I am not the most brilliant person imaginable, but the result has been an extremely generalist skill set. I have a BA as well as an MSc, I have worked in both the public and private sectors, and I have done everything from very manual jobs to new millennial jobs like quantitative research. I like to think of myself as a jack of all trades and master of none.

Marshall McLuhan once wrote that violence is a quest for identity. He predicted, quite rightly, that as the ‘global village’ grows, people will realise that the little things they feel make them unique are not so unique, resulting in a need to redefine or destroy others to make themselves ‘personalised.’ I mention this because I feel I am on that same quest, sans destroying people. Every time I complete an analysis, learn a skill, or get a job or promotion, social media confronts me with people who have much more success, knowledge and experience and who are achieving even greater things.

At this stage I have two options. I can curl up in my bed and say “what’s the point, I will never be the best.” Or I can get up every day, keep trying to make myself better, painfully recognize my limitations and try to overcome them. I prefer to do the latter.

I am also a long-term planner. I like to work in five-year plans that include personal, professional and financial goals. A plan helps keep me focused, and it is usually a five-year projection with a slight ‘push’ goal. If I say I will normally save £100 a month, I force myself to save £110. If I say I will get a pay increase of 2%, I fight for 3%. So I am always pushing, but realistically, so that I both achieve my goals and avoid disappointment. It has been the same with my personal goals. For the past 10 years, becoming an intelligence analyst was that goal. In the process, I acquired more skills and seniority than the traditional role requires. Because of my own past success I have been forced to re-evaluate my future. The conclusion I reached is to pursue a career in Data Science. To achieve that goal I need a road map.

The name of this blog, Known Unknowns, is an infamously mocked but poorly understood intelligence concept. However, known unknowns are at the heart of what the intelligence community does. They are important because we need to understand what we do not know, which is just as important to decision making as what we do know. A blindly confident decision made on half the data is still a failure, even if you trip into success, because we have learned nothing and cannot replicate that result. So a known unknown implies two states of knowledge: the theoretical level of information necessary to know EVERYTHING about a situation, and what you actually know. In the gap between them is uncertainty that compromises our ability to achieve goals within that situation. We want that risk of failure to be as low as possible.

So here I am, with a dream to be a data scientist and an intelligence problem: creating a road map to get there in an uncertain environment. 500 words later I get to the point of the post. Is there a way I can quantify what I don’t know about data science that will help me make a proper learning plan? Can I quantify my own ignorance? Let’s try!

Method

As above, I need an external and objective criterion to know what the entirety of the problem is. For this I selected Udacity’s Data Career Skills Guidance. For those unfamiliar, Udacity is a skills training platform for the digital economy. They put out great material on what employers in the digital and tech industries are doing and looking for in employees. Udacity lists 28 skills in 10 categories that it recommends a data scientist should have. I separated out some of the combined skills, for example splitting R/Python into R and Python, so I ended up with 36 skills.

| Category | Skill |
| --- | --- |
| Programming and Tools | Spreadsheets |
| Programming and Tools | SQL |
| Programming and Tools | R |
| Programming and Tools | Python |
| Programming and Tools | Jupyter |
| Statistics | Descriptive |
| Statistics | Inferential |
| Statistics | A/B Testing |
| Mathematics | Algebra |
| Mathematics | Notation |
| Data Wrangling | Data Cleaning |
| Data Wrangling | Data Blending |
| Data Wrangling | Data Transforming |
| Data Wrangling | Data Formatting |
| Data Wrangling | Relational Databases (SQL) |
| Communication and Data Visual | Programming |
| Communication and Data Visual | Dashboarding |
| Data Intuition | Asking Questions |
| Data Intuition | Becoming Subject Matter Experts |
| Data Intuition | Business and product things |
| Data Intuition | Practice! |
| Machine Learning | Supervised Learning |
| Machine Learning | Unsupervised Learning |
| Software Skills | Coding Testing. Debugging |
| Software Skills | Version Control |
| Software Skills | Model Deployment |
| Software Skills | Data at Scale |
| Advanced Mathematics | Linear Algebra |
| Advanced Mathematics | Calculus |
| Experiment Design | Controlling variables and choosing good control and testing groups |
| Experiment Design | Sample Size and Power law |
| Experiment Design | Confidence level |
| Experiment Design | SMART experiments: Specific, Measurable, Actionable, Realistic, Timely |
| Experiment Design | Bayesian Statistics |
| Experiment Design | Bootstrapping |
| Experiment Design | Simulation |

Now I need to grade myself against each of these skills. I considered the following options:

Binary (1/0): You either have the skill or you do not. But this doesn’t give any nuance.

Continuous scale: 1 to n, with the numbers indicating a level of expertise. More nuance without a great deal of complexity.

I have arbitrarily chosen a continuous scale of 1 to 5. My fear with binary or a 1 to 3 scale is that there would not be enough nuance to make decisions about my level of knowledge, and most items would end up as either 1 in binary or 2 on the three-point scale. Anything higher than five would be too nuanced to draw comparative insight from. For defining the scores I tried to create an unbiased grading system. I chose to use the amount of help I would need to apply the skill, as follows:

1- Could not perform on my own even with help

2- Would require assistance or training to implement

3- With self-study or guidance, could understand and apply

4- Could likely apply with minimal assistance

5- Comfortable to perform on my own

I also added a second criterion: my assessment of my confidence in each skill score, following the guidance on estimative calibration in Applied Information Economics by Douglas Hubbard. This gives some check on my subjective skills assessment.

1- 1% to 20% Confidence

2- 21% to 40% Confidence

3- 41% to 60% Confidence

4- 61% to 80% Confidence

5- 81% to 100% Confidence
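The confidence bands above can be written as a small lookup. This is only a minimal sketch, assuming confidence is expressed as a whole-number percentage; the function name `confidence_band` is hypothetical and is not part of the original analysis.

```python
def confidence_band(percent: int) -> int:
    """Map a whole-number confidence percentage (1-100) to the 1-5 band."""
    if not 1 <= percent <= 100:
        raise ValueError("confidence must be between 1 and 100 percent")
    # Each band spans 20 percentage points: 1-20 -> 1, 21-40 -> 2, ..., 81-100 -> 5.
    return (percent - 1) // 20 + 1


# Values on either side of each band boundary.
for p in (1, 20, 21, 60, 61, 100):
    print(p, confidence_band(p))
```

Writing the bands as integer arithmetic rather than a chain of comparisons keeps the boundaries (20/21, 40/41 and so on) consistent with the list above.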

The intersection between the two scores will also help me understand any over- or underconfidence in my skill set that could affect a learning plan. My completed dataset looks like this:

| Category | Skill | Knowledge | Confidence |
| --- | --- | --- | --- |
| Programming and Tools | Spreadsheets | 5 | 5 |
| Programming and Tools | SQL | 3 | 4 |
| Programming and Tools | R | 4 | 3 |
| Programming and Tools | Python | 4 | 3 |
| Programming and Tools | Jupyter | 2 | 4 |
| Statistics | Descriptive | 4 | 3 |
| Statistics | Inferential | 4 | 3 |
| Statistics | A/B Testing | 1 | 4 |
| Mathematics | Algebra | 4 | 4 |
| Mathematics | Notation | 4 | 4 |
| Data Wrangling | Data Cleaning | 4 | 5 |
| Data Wrangling | Data Blending | 4 | 5 |
| Data Wrangling | Data Transforming | 4 | 4 |
| Data Wrangling | Data Formatting | 4 | 4 |
| Data Wrangling | Relational Databases (SQL) | 3 | 4 |
| Communication and Data Visual | Programming | 4 | 3 |
| Communication and Data Visual | Dashboarding | 5 | 5 |
| Data Intuition | Asking Questions | 5 | 4 |
| Data Intuition | Becoming Subject Matter Experts | 5 | 4 |
| Data Intuition | Business and product things | 4 | 4 |
| Data Intuition | Practice! | 4 | 2 |
| Machine Learning | Supervised Learning | 4 | 4 |
| Machine Learning | Unsupervised Learning | 3 | 4 |
| Software Skills | Coding Testing. Debugging | 3 | 5 |
| Software Skills | Version Control | 3 | 5 |
| Software Skills | Model Deployment | 2 | 5 |
| Software Skills | Data at Scale | 1 | 5 |
| Advanced Mathematics | Linear Algebra | 3 | 3 |
| Advanced Mathematics | Calculus | 4 | 3 |
| Experiment Design | Controlling variables and choosing good control and testing groups | 3 | 5 |
| Experiment Design | Sample Size and Power law | 3 | 5 |
| Experiment Design | Confidence level | 4 | 3 |
| Experiment Design | SMART experiments: Specific, Measurable, Actionable, Realistic, Timely | 4 | 5 |
| Experiment Design | Bayesian Statistics | 4 | 4 |
| Experiment Design | Bootstrapping | 2 | 4 |
| Experiment Design | Simulation | 3 | 4 |
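One possible way to look at the intersection of the two scores is to count how many skills land on each (knowledge, confidence) pair. The sketch below is an illustration, not the analysis actually run for this post: the scores are copied from the table above, while the names `SKILLS`, `grid` and `least_certain`, and the cut-off of 3 for a "less certain" rating, are assumptions made for the example.

```python
from collections import Counter

# (category, skill, knowledge, confidence), copied from the completed dataset.
SKILLS = [
    ("Programming and Tools", "Spreadsheets", 5, 5),
    ("Programming and Tools", "SQL", 3, 4),
    ("Programming and Tools", "R", 4, 3),
    ("Programming and Tools", "Python", 4, 3),
    ("Programming and Tools", "Jupyter", 2, 4),
    ("Statistics", "Descriptive", 4, 3),
    ("Statistics", "Inferential", 4, 3),
    ("Statistics", "A/B Testing", 1, 4),
    ("Mathematics", "Algebra", 4, 4),
    ("Mathematics", "Notation", 4, 4),
    ("Data Wrangling", "Data Cleaning", 4, 5),
    ("Data Wrangling", "Data Blending", 4, 5),
    ("Data Wrangling", "Data Transforming", 4, 4),
    ("Data Wrangling", "Data Formatting", 4, 4),
    ("Data Wrangling", "Relational Databases (SQL)", 3, 4),
    ("Communication and Data Visual", "Programming", 4, 3),
    ("Communication and Data Visual", "Dashboarding", 5, 5),
    ("Data Intuition", "Asking Questions", 5, 4),
    ("Data Intuition", "Becoming Subject Matter Experts", 5, 4),
    ("Data Intuition", "Business and product things", 4, 4),
    ("Data Intuition", "Practice!", 4, 2),
    ("Machine Learning", "Supervised Learning", 4, 4),
    ("Machine Learning", "Unsupervised Learning", 3, 4),
    ("Software Skills", "Coding Testing. Debugging", 3, 5),
    ("Software Skills", "Version Control", 3, 5),
    ("Software Skills", "Model Deployment", 2, 5),
    ("Software Skills", "Data at Scale", 1, 5),
    ("Advanced Mathematics", "Linear Algebra", 3, 3),
    ("Advanced Mathematics", "Calculus", 4, 3),
    ("Experiment Design", "Controlling variables and choosing good control and testing groups", 3, 5),
    ("Experiment Design", "Sample Size and Power law", 3, 5),
    ("Experiment Design", "Confidence level", 4, 3),
    ("Experiment Design", "SMART experiments", 4, 5),
    ("Experiment Design", "Bayesian Statistics", 4, 4),
    ("Experiment Design", "Bootstrapping", 2, 4),
    ("Experiment Design", "Simulation", 3, 4),
]

# Count how many skills fall on each (knowledge, confidence) intersection.
grid = Counter((k, c) for _, _, k, c in SKILLS)

# Ratings held with confidence 3 or lower (an arbitrary cut-off chosen for
# this example) are the ones most worth re-checking.
least_certain = [skill for _, skill, _, c in SKILLS if c <= 3]

for (k, c), n in sorted(grid.items()):
    print(f"knowledge={k} confidence={c}: {n} skill(s)")
print("Least certain ratings:", least_certain)
```

A pair such as knowledge 1 with confidence 5 reads as "I am very sure I cannot do this yet", which is a different planning signal from a high score held with low confidence.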

Analysis

The first thing we can look at is a radar graph that compares my skills to the ‘ideal’. To do this I first aggregated my scores by broad category. I then calculated the maximum score I could achieve in each category using our 1 to 5 scale and overlaid that on the graph.

While this graph is interesting, we need to home in on areas of weakness for an individual learning plan. We can do that by looking at my score in each category as a percentage of the total points available.

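| Category | My Score | Total Score | Percent |
| --- | --- | --- | --- |
| Advanced Mathematics | 7 | 10 | 70.00 |
| Communication and Data Visual | 9 | 10 | 90.00 |
| Data Intuition | 18 | 20 | 90.00 |
| Data Wrangling | 19 | 25 | 76.00 |
| Experiment Design | 23 | 35 | 65.71 |
| Machine Learning | 7 | 10 | 70.00 |
| Mathematics | 8 | 10 | 80.00 |
| Programming and Tools | 18 | 25 | 72.00 |
| Software Skills | 9 | 20 | 45.00 |
| Statistics | 9 | 15 | 60.00 |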
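The category percentages can be reproduced from the knowledge scores alone. The following is a minimal sketch rather than the code used for this post; the scores come from the completed dataset, and the names `KNOWLEDGE`, `MAX_SCORE` and `summarise` are made up for the illustration.

```python
# Knowledge scores per category, copied from the completed dataset.
KNOWLEDGE = {
    "Programming and Tools": [5, 3, 4, 4, 2],
    "Statistics": [4, 4, 1],
    "Mathematics": [4, 4],
    "Data Wrangling": [4, 4, 4, 4, 3],
    "Communication and Data Visual": [4, 5],
    "Data Intuition": [5, 5, 4, 4],
    "Machine Learning": [4, 3],
    "Software Skills": [3, 3, 2, 1],
    "Advanced Mathematics": [3, 4],
    "Experiment Design": [3, 3, 4, 4, 4, 2, 3],
}
MAX_SCORE = 5  # top of the 1 to 5 scale


def summarise(knowledge):
    """Return (category, my score, total score, percent) rows, sorted by category."""
    rows = []
    for category in sorted(knowledge):
        scores = knowledge[category]
        mine = sum(scores)
        total = MAX_SCORE * len(scores)  # best possible score for the category
        rows.append((category, mine, total, round(100 * mine / total, 2)))
    return rows


for category, mine, total, percent in summarise(KNOWLEDGE):
    print(f"{category}: {mine}/{total} = {percent:.2f}%")
```

Because the total is five points per skill, categories with more skills (such as Experiment Design, with seven) carry more points, which is why percentages rather than raw scores are compared.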

So in which broad categories am I underdeveloped? Statistics, Software Skills and Experiment Design are all under 70%. I find Statistics a bit surprising. Let’s go deeper into the individual skills again.

| Row | Category | Skill | Knowledge | Confidence |
| --- | --- | --- | --- | --- |
| 8 | Statistics | A/B Testing | 1 | 4 |
| 27 | Software Skills | Data at Scale | 1 | 5 |
| 26 | Software Skills | Model Deployment | 2 | 5 |
| 35 | Experiment Design | Bootstrapping | 2 | 4 |
| 24 | Software Skills | Coding Testing. Debugging | 3 | 5 |
| 25 | Software Skills | Version Control | 3 | 5 |
| 30 | Experiment Design | Controlling variables and choosing good control and testing groups | 3 | 5 |
| 31 | Experiment Design | Sample Size and Power law | 3 | 5 |
| 36 | Experiment Design | Simulation | 3 | 4 |
| 6 | Statistics | Descriptive | 4 | 3 |
| 7 | Statistics | Inferential | 4 | 3 |
| 32 | Experiment Design | Confidence level | 4 | 3 |
| 33 | Experiment Design | SMART experiments: Specific, Measurable, Actionable, Realistic, Timely | 4 | 5 |
| 34 | Experiment Design | Bayesian Statistics | 4 | 4 |
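The ordering in this table is consistent with a stable sort on the knowledge score, with ties keeping their original row order. A short sketch of that step follows, assuming the rows for the three weakest categories have already been picked out; the names `WEAK_CATEGORY_SKILLS` and `priority` are hypothetical.

```python
# (row, category, skill, knowledge, confidence) for the three categories
# that scored under 70%, listed in their original row order.
WEAK_CATEGORY_SKILLS = [
    (6, "Statistics", "Descriptive", 4, 3),
    (7, "Statistics", "Inferential", 4, 3),
    (8, "Statistics", "A/B Testing", 1, 4),
    (24, "Software Skills", "Coding Testing. Debugging", 3, 5),
    (25, "Software Skills", "Version Control", 3, 5),
    (26, "Software Skills", "Model Deployment", 2, 5),
    (27, "Software Skills", "Data at Scale", 1, 5),
    (30, "Experiment Design", "Controlling variables and choosing good control and testing groups", 3, 5),
    (31, "Experiment Design", "Sample Size and Power law", 3, 5),
    (32, "Experiment Design", "Confidence level", 4, 3),
    (33, "Experiment Design", "SMART experiments", 4, 5),
    (34, "Experiment Design", "Bayesian Statistics", 4, 4),
    (35, "Experiment Design", "Bootstrapping", 2, 4),
    (36, "Experiment Design", "Simulation", 3, 4),
]

# Lowest knowledge first. sorted() is stable, so skills with the same
# knowledge score stay in their original row order.
priority = sorted(WEAK_CATEGORY_SKILLS, key=lambda row: row[3])

for row, category, skill, knowledge, confidence in priority:
    print(row, category, skill, knowledge, confidence)
```

Reading the result from the top gives a learning-plan order: the skills with a knowledge score of 1 or 2 come first.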

Okay, so Statistics is pulled down by my lack of experience in A/B Testing (a topic I know on paper but have never had to do), but in terms of more traditional statistics I am alright. For Software Skills, I am being let down by my lack of ‘scale’ experience. Point taken! Every company I have worked for was desktop only. And for Experiment Design it is a mixed bag. In some areas I look great, and in others, like bootstrapping and simulation, I need some work. Great, so we now have a priority list of items to tackle in our learning plan!

Conclusion

The tagline for this blog is a quote often attributed to Albert Einstein:

As our circle of knowledge expands, so does the circumference of darkness surrounding it.

Being underconfident means I am at least internally consistent with that belief. Obviously there are problems with my analysis. Having a more objective knowledge score would be a big help and would negate the need for a confidence score. There are also many skills I possess that are necessary for a data scientist but do not appear in this list, such as communication and presentation skills, and my experience in intelligence, which is currently summed up here as ‘Asking Questions.’ This is as far as I have gotten. What do you think of my approach? Is there any other way I can quantify what I don’t know? My plan is to make this a project that I return to from time to time, updating the data and continuing to explore.
