Quantifying Ignorance

Ever since I was a child I have had a dream, albeit a naïve one, to know everything. Obviously, I am not the most brilliant person imaginable, but the dream has left me with an extremely generalist skill set. I have a BA as well as an MSc, I have worked in both the public and private sectors, and I have done very manual jobs as well as new millennial jobs like quantitative research. I like to think of myself as a jack of all trades and master of none.

Marshall McLuhan once wrote that violence is a quest for identity. He predicted, quite rightly, that as the ‘global village’ grows people will realise that the little things they feel make them unique are not so unique, resulting in a need to redefine or destroy others to make themselves ‘personalised.’ I mention this because I feel I am on that same quest, sans destroying people. Every time I complete an analysis, learn a skill, or get a job or promotion, social media confronts me with people with much more success, knowledge and experience who are achieving even greater things.

At this stage I have two options. I can curl up in my bed and say “what’s the point, I will never be the best.” Or I can get up every day, keep trying to make myself better, painfully recognize my limitations and try to overcome them. I prefer to do the latter.

I am also a long-term planner. I like to work in five-year plans that include personal, professional and financial goals. Each plan keeps me focused and is usually a five-year projection with a slight ‘push’ goal. If I say I will normally save £100 a month, I force myself to save £110. If I say I will get a pay increase of 2%, I fight for 3%. So I am always pushing, but realistically, so that I both achieve my goals and avoid disappointment. It has been the same with my personal goals. For the past 10 years, becoming an intelligence analyst was that goal. In the process, I acquired more skills and seniority than the traditional role requires. Because of my own past success I have been forced to re-evaluate my future. The conclusion I reached is to pursue a career in Data Science. To achieve that goal I need a road map.

The name of this blog, known unknowns, is an infamously mocked but poorly understood intelligence concept. However, known unknowns are at the heart of what the intelligence community does. They are important because understanding what we do not know is just as important to decision making as what we do know. A blindly confident decision on half the data is still a failure even if you trip into success, because we have learned nothing and cannot replicate that result. So a known unknown implies two states of knowledge: the theoretical level of information necessary to know EVERYTHING about a situation, and what you actually know. In the gap is uncertainty that compromises our ability to achieve goals within that situation. We want that risk of failure to be as low as possible.

So here I am, with a dream to be a data scientist and an intelligence problem: creating a road map to get there in an uncertain environment. 500 words later I get to the point of the post. Is there a way I can quantify what I don’t know about data science that will help me make a proper learning plan? Can I quantify my own ignorance? Let’s try!

Method

As above, I need an external and objective criterion to know what the entirety of the problem is. For this I selected Udacity’s Data Career Skills Guidance. For those unfamiliar, Udacity is a skills training platform for the digital economy. They put out great material on what employers in the digital and tech industries are doing and looking for in employees. Udacity lists 28 skills in 10 categories that they recommend a data scientist should have. I separated out some of the combined skills, such as R/Python into R and Python, so I ended up with 36 skills.

Category	Skill
Programming and Tools	Spreadsheets
Programming and Tools	SQL
Programming and Tools	R
Programming and Tools	Python
Programming and Tools	Jupyter
Statistics	Descriptive
Statistics	Inferential
Statistics	A/B Testing
Mathematics	Algebra
Mathematics	Notation
Data Wrangling	Data Cleaning
Data Wrangling	Data Blending
Data Wrangling	Data Transforming
Data Wrangling	Data Formatting
Data Wrangling	Relational Databases (SQL)
Communication and Data Visual	Programming
Communication and Data Visual	Dashboarding
Data Intuition	Asking Questions
Data Intuition	Becoming Subject Matter Experts
Data Intuition	Business and product things
Data Intuition	Practice!
Machine Learning	Supervised Learning
Machine Learning	Unsupervised Learning
Software Skills	Coding Testing. Debugging
Software Skills	Version Control
Software Skills	Model Deployment
Software Skills	Data at Scale
Advanced Mathematics	Linear Algebra
Advanced Mathematics	Calculus
Experiment Design	Controlling variables and choosing good control and testing groups
Experiment Design	Sample Size and Power law
Experiment Design	Confidence level
Experiment Design	SMART experiments: Specific, Measurable, Actionable, Realistic, Timely
Experiment Design	Bayesian Statistics
Experiment Design	Bootstrapping
Experiment Design	Simulation

Now I need to grade myself in these areas in some way. I considered the following options:

Binary (1/0) – You have the skill or you do not. But this gives no nuance.

Continuous Scale – 1 to n, with the numbers indicating a level of expertise. More nuance without a great deal of complexity.

I have arbitrarily chosen a continuous scale of 1 to 5. My fear with binary or a 1 to 3 scale is that there would not be enough nuance to make decisions about my level of knowledge, and most items would be either 1 in binary or 2 on the three-point scale. Anything higher than five would be too nuanced to draw comparative insight from. For defining the scores I tried to create an unbiased grading system. I chose to use the amount of help I would need to apply the skill:

1- Could not perform on my own even with help

2- Would require assistance or training to implement

3- With self study/ guidance, could understand and apply

4- Could likely apply with minimal assistance

5- Comfortable to perform on my own

I also added a second criterion, which is my assessment of confidence in that skill, as per the guidance of Applied Information Economics Estimative Calibration by Douglas Hubbard. This gives some check on my subjective skills assessment.

1- 1% to 20% Confidence

2- 21% to 40% Confidence

3- 41% to 60% Confidence

4- 61% to 80% Confidence

5- 81% to 100% Confidence
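
As a rough sketch, the five confidence bands above can be written as a small Python function. The function name `confidence_score` is my own invention for illustration, and treating fractional percentages by rounding up into the next band is an assumption, since the list only defines whole-number bands.

```python
import math


def confidence_score(percent: float) -> int:
    """Map a confidence percentage (1 to 100) onto the 1 to 5 scale above."""
    if not 1 <= percent <= 100:
        raise ValueError("confidence must be between 1 and 100 percent")
    # Each band is 20 points wide: 1-20 -> 1, 21-40 -> 2, ..., 81-100 -> 5.
    return math.ceil(percent / 20)


# Example: print the score at each band edge.
for pct in (1, 20, 21, 60, 61, 100):
    print(pct, confidence_score(pct))
```

Writing the bands down this way makes the edges explicit: 20% still scores 1, while 21% scores 2.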

The intersection between the two scores will also help me understand any over- or under-confidence in my skill set that will affect a learning plan. My completed dataset looks like:

Category	Skill	Knowledge	Confidence
Programming and Tools	Spreadsheets	5	5
Programming and Tools	SQL	3	4
Programming and Tools	R	4	3
Programming and Tools	Python	4	3
Programming and Tools	Jupyter	2	4
Statistics	Descriptive	4	3
Statistics	Inferential	4	3
Statistics	A/B Testing	1	4
Mathematics	Algebra	4	4
Mathematics	Notation	4	4
Data Wrangling	Data Cleaning	4	5
Data Wrangling	Data Blending	4	5
Data Wrangling	Data Transforming	4	4
Data Wrangling	Data Formatting	4	4
Data Wrangling	Relational Databases (SQL)	3	4
Communication and Data Visual	Programming	4	3
Communication and Data Visual	Dashboarding	5	5
Data Intuition	Asking Questions	5	4
Data Intuition	Becoming Subject Matter Experts	5	4
Data Intuition	Business and product things	4	4
Data Intuition	Practice!	4	2
Machine Learning	Supervised Learning	4	4
Machine Learning	Unsupervised Learning	3	4
Software Skills	Coding Testing. Debugging	3	5
Software Skills	Version Control	3	5
Software Skills	Model Deployment	2	5
Software Skills	Data at Scale	1	5
Advanced Mathematics	Linear Algebra	3	3
Advanced Mathematics	Calculus	4	3
Experiment Design	Controlling variables and choosing good control and testing groups	3	5
Experiment Design	Sample Size and Power law	3	5
Experiment Design	Confidence level	4	3
Experiment Design	SMART experiments: Specific, Measurable, Actionable, Realistic, Timely	4	5
Experiment Design	Bayesian Statistics	4	4
Experiment Design	Bootstrapping	2	4
Experiment Design	Simulation	3	4
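
One way to look at the intersection of the two scores is to count how many skills land on each (knowledge, confidence) pair. The Python sketch below does that with the rows from the table above. The 5 by 5 grid layout and the names `ROWS` and `grid` are my own choices for illustration, not part of the original analysis, and two long skill names are shortened.

```python
from collections import Counter

# (skill, knowledge, confidence) copied from the completed dataset.
ROWS = [
    ("Spreadsheets", 5, 5), ("SQL", 3, 4), ("R", 4, 3), ("Python", 4, 3),
    ("Jupyter", 2, 4), ("Descriptive", 4, 3), ("Inferential", 4, 3),
    ("A/B Testing", 1, 4), ("Algebra", 4, 4), ("Notation", 4, 4),
    ("Data Cleaning", 4, 5), ("Data Blending", 4, 5),
    ("Data Transforming", 4, 4), ("Data Formatting", 4, 4),
    ("Relational Databases (SQL)", 3, 4), ("Programming", 4, 3),
    ("Dashboarding", 5, 5), ("Asking Questions", 5, 4),
    ("Becoming Subject Matter Experts", 5, 4),
    ("Business and product things", 4, 4), ("Practice!", 4, 2),
    ("Supervised Learning", 4, 4), ("Unsupervised Learning", 3, 4),
    ("Coding Testing. Debugging", 3, 5), ("Version Control", 3, 5),
    ("Model Deployment", 2, 5), ("Data at Scale", 1, 5),
    ("Linear Algebra", 3, 3), ("Calculus", 4, 3),
    ("Controlling variables", 3, 5), ("Sample Size and Power law", 3, 5),
    ("Confidence level", 4, 3), ("SMART experiments", 4, 5),
    ("Bayesian Statistics", 4, 4), ("Bootstrapping", 2, 4),
    ("Simulation", 3, 4),
]

# Count the skills that fall on each (knowledge, confidence) pair.
grid = Counter((know, conf) for _, know, conf in ROWS)

# Rows are knowledge 5 down to 1, columns are confidence 1 to 5.
print("K\\C " + " ".join(f"{conf:>2}" for conf in range(1, 6)))
for know in range(5, 0, -1):
    cells = " ".join(f"{grid[(know, conf)]:>2}" for conf in range(1, 6))
    print(f"{know:>3} {cells}")
```

Cells far from the diagonal are the ones where the two scores disagree most, which is where a learning plan probably deserves a second look.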

Analysis

The first thing we can look at is a radar graph that compares my skill to the ‘ideal’. To do this I first aggregated my scores by broad category. I then calculated the maximum score I could achieve in each category using our 1 to 5 scale and overlaid that on the graph.

While this graph is interesting, we need to home in on areas of weakness for an individual learning plan. We can do that by looking at the percentage I scored against the total points available.

Category	My Score	Total Score	Percent
Advanced Mathematics	7	10	70.00
Communication and Data Visual	9	10	90.00
Data Intuition	18	20	90.00
Data Wrangling	19	25	76.00
Experiment Design	23	35	65.71
Machine Learning	7	10	70.00
Mathematics	8	10	80.00
Programming and Tools	18	25	72.00
Software Skills	9	20	45.00
Statistics	9	15	60.00
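
The figures in the table above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic (sum of knowledge scores divided by 5 points per skill), not necessarily the tool used for the original analysis. The names `KNOWLEDGE`, `summarise` and `weak` are hypothetical, and the 70% cut-off is taken from the discussion of the results.

```python
# Knowledge scores per category, copied from the completed dataset.
KNOWLEDGE = {
    "Programming and Tools": [5, 3, 4, 4, 2],
    "Statistics": [4, 4, 1],
    "Mathematics": [4, 4],
    "Data Wrangling": [4, 4, 4, 4, 3],
    "Communication and Data Visual": [4, 5],
    "Data Intuition": [5, 5, 4, 4],
    "Machine Learning": [4, 3],
    "Software Skills": [3, 3, 2, 1],
    "Advanced Mathematics": [3, 4],
    "Experiment Design": [3, 3, 4, 4, 4, 2, 3],
}
MAX_SCORE = 5  # top of the 1 to 5 scale


def summarise(scores):
    """Return {category: (my score, total available, percent)}."""
    summary = {}
    for category, values in scores.items():
        mine = sum(values)
        total = MAX_SCORE * len(values)
        summary[category] = (mine, total, round(100 * mine / total, 2))
    return summary


summary = summarise(KNOWLEDGE)
for category in sorted(summary):
    mine, total, pct = summary[category]
    print(f"{category}\t{mine}\t{total}\t{pct:.2f}")

# Categories scoring under 70% of the available points.
weak = sorted(c for c, (_, _, pct) in summary.items() if pct < 70)
print(weak)
```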

So in what broad categories am I underdeveloped? Statistics, Software Skills and Experiment Design are all under 70%. I find Statistics a bit surprising. Let’s go deeper into the skills again.

	Category	Skill	Knowledge	Confidence
8	Statistics	A/B Testing	1	4
27	Software Skills	Data at Scale	1	5
26	Software Skills	Model Deployment	2	5
35	Experiment Design	Bootstrapping	2	4
24	Software Skills	Coding Testing. Debugging	3	5
25	Software Skills	Version Control	3	5
30	Experiment Design	Controlling variables and choosing good control and testing groups	3	5
31	Experiment Design	Sample Size and Power law	3	5
36	Experiment Design	Simulation	3	4
6	Statistics	Descriptive	4	3
7	Statistics	Inferential	4	3
32	Experiment Design	Confidence level	4	3
33	Experiment Design	SMART experiments: Specific, Measurable, Actionable, Realistic, Timely	4	5
34	Experiment Design	Bayesian Statistics	4	4
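
The ordering in the table above is just the rows from the three weak categories sorted by knowledge score, lowest first. A minimal Python sketch follows. The name `WEAK_ROWS` and the cut-off `FIRST_PRIORITY_MAX` are assumptions of mine for illustration, and two long skill names are shortened.

```python
# (row number, category, skill, knowledge, confidence) for the three
# categories under 70%, copied from the completed dataset.
WEAK_ROWS = [
    (6, "Statistics", "Descriptive", 4, 3),
    (7, "Statistics", "Inferential", 4, 3),
    (8, "Statistics", "A/B Testing", 1, 4),
    (24, "Software Skills", "Coding Testing. Debugging", 3, 5),
    (25, "Software Skills", "Version Control", 3, 5),
    (26, "Software Skills", "Model Deployment", 2, 5),
    (27, "Software Skills", "Data at Scale", 1, 5),
    (30, "Experiment Design", "Controlling variables", 3, 5),
    (31, "Experiment Design", "Sample Size and Power law", 3, 5),
    (32, "Experiment Design", "Confidence level", 4, 3),
    (33, "Experiment Design", "SMART experiments", 4, 5),
    (34, "Experiment Design", "Bayesian Statistics", 4, 4),
    (35, "Experiment Design", "Bootstrapping", 2, 4),
    (36, "Experiment Design", "Simulation", 3, 4),
]

# sorted() is stable, so skills with equal knowledge keep their row order.
ranked = sorted(WEAK_ROWS, key=lambda row: row[3])

# Hypothetical cut-off: knowledge of 2 or below is tackled first.
FIRST_PRIORITY_MAX = 2
first_priority = [row[2] for row in ranked if row[3] <= FIRST_PRIORITY_MAX]

for number, category, skill, knowledge, confidence in ranked:
    print(f"{number}\t{category}\t{skill}\t{knowledge}\t{confidence}")
print(first_priority)
```

Raising or lowering the cut-off changes how long the first batch of the learning plan is; 2 is only a starting point.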

Okay, so Statistics is lowered by my lack of experience in A/B Testing (a topic I know on paper, but have never had to do), but in terms of more traditional statistics I am alright. For Software Skills, I am being let down by my lack of ‘scale’ experience. Point taken! Every company I have worked for was desktop only. And Experiment Design is a mixed bag. In some areas I look great, and in others, like bootstrapping and simulation, I need some work. Great, so we now have a priority list of items to tackle in our learning plan!

Conclusion

The tagline for this blog is Albert Einstein’s quote:

As our circle of knowledge expands, so does the circumference of darkness surrounding it.

Being under confident means I am at least internally consistent with that belief. Obviously there are problems with my analysis. Having a more objective knowledge score would be a big help and would negate the need for a confidence score. There are also many skills I do possess that are necessary for a data scientist but do not appear in this list, such as communication and presentation skills, and my experience in intelligence, which is currently summed up here as ‘Asking Questions.’ This is as far as I have gotten. What do you think of my approach? Is there any other way I can quantify what I don’t know? My plan is to make this a project which I return to from time to time, updating the data and continuing to explore.
