Roadmap to becoming an Artificial Intelligence Expert in 2020

Below you will find a set of charts showing the paths you can take and the technologies you would want to adopt in order to become a data scientist, machine learning or AI expert. We made these charts for our new employees to help them become AI experts, and we wanted to share them here to help the community.

# Disclaimer

The purpose of these roadmaps is to give you an idea of the landscape and to guide you if you are confused about what to learn next, not to encourage you to pick what is hip and trendy. You should develop an understanding of why one tool is better suited to some cases than another, and remember that hip and trendy never means best suited for the job.

# Introduction

Chart: AI Expert in 2020

  • Legend: Personal Recommendation!, Available Options
  • Required for any path: GIT – Version Control, Semantic Versioning, Keep a Changelog
  • Choose your path: Data Scientist, Machine Learning, Deep Learning, Data Engineer, Big Data Engineer
  • Papers with code
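The introduction chart names Semantic Versioning as something every path needs. As a rough illustration, here is a minimal sketch of how `MAJOR.MINOR.PATCH` versions compare and get bumped. The helper names `parse_version` and `bump` are made up for this example, and the sketch deliberately ignores pre-release and build metadata.

```python
def parse_version(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release or build metadata)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version, change):
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(version)
    if change == "major":  # incompatible API change
        return f"{major + 1}.0.0"
    if change == "minor":  # backwards-compatible feature
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


# Tuples compare element by element, so 1.10.0 correctly sorts after 1.9.3,
# which a plain string comparison would get wrong.
newer = parse_version("1.10.0") > parse_version("1.9.3")
next_release = bump("1.4.2", "minor")
print(newer, next_release)
```

Comparing parsed tuples rather than raw strings is the point of the sketch: it is what makes version ordering behave as expected.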

# Data Science Roadmap

Data Scientist

Fundamentals

  • Matrices & Linear Algebra Fundamentals
  • Database Basics: Relational vs. non-relational databases; SQL + Joins (Inner, Outer, Cross, Theta Join); NoSQL
  • Tabular Data
  • Data Frames & Series
  • Extract, Transform, Load (ETL)
  • Reporting vs BI vs Analytics
  • Data Formats: JSON, XML, CSV
  • Regular Expressions (RegEx)

Statistics

  • Probability Theory: Probability distribution; Randomness, random variable and…; Conditional probability and…; (Statistical) independence; iid
  • cdf, pdf, pmf: Cumulative distribution function (cdf); Probability density function (pdf); Probability mass function (pmf)
  • Continuous distributions (pdf’s): Normal / Gaussian; Uniform (continuous); Beta; Dirichlet; Exponential; χ2 (chi-squared)
  • Discrete distributions (pmf’s): Uniform (discrete); Binomial; Multinomial; Hypergeometric; Poisson; Geometric
  • Summary statistics: Expectation and mean; Variance and standard deviation (…); Covariance and correlation; Median, quartile; Interquartile range; Percentile / quantile; Mode
  • Important Laws: Law of large numbers (LLN); Central limit theorem (CLT)
  • Estimation: Maximum Likelihood Estimation (MLE); Kernel Density Estimation (KDE)
  • Hypothesis Testing: p-Value; Chi2 test; F-test; t-test
  • Confidence Interval (CI)
  • Monte Carlo Method

Python Programming

  • Python Basics: Expressions; Variables; Data Structures; Functions; Install packages (via pip, conda or similar); Codestyle, e.g. PEP8
  • Important libraries: Numpy; Pandas
  • Virtual Environments
  • Jupyter Notebooks / Lab
  • Ecosystem
  • Manipulate Data Frames
  • Subsetting Data
  • Reading CSV and raw data

Data Sources

  • Awesome Public Datasets
  • Kaggle

Exploratory Data Analysis / Data Munging / Wrangling

  • Principal Component Analysis (PCA)
  • Dimensionality & Numerosity…
  • Normalization
  • Data Scrubbing, Handling Missing Values
  • Unbiased Estimators
  • Binning sparse values
  • Feature Extraction
  • Denoising
  • Sampling

Visualization

  • Chart Suggestions thought starter
  • Python: Matplotlib; plotnine (like ggplot in R); seaborn; ipyvolume (3D data)
  • Web: Vega-Lite; D3.js
  • Dashboards: Dash; streamlit
  • BI: Tableau; PowerBI

Linked roadmaps: Machine Learning, Data Engineer
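Two of the statistics topics in this roadmap, the law of large numbers (LLN) and the central limit theorem (CLT), are easy to try out with a small Monte Carlo simulation. The following is a minimal sketch using only the Python standard library. The function name `sample_means`, the seed and the sample sizes are arbitrary choices made for this illustration.

```python
import random
import statistics


def sample_means(n_samples, sample_size, seed=0):
    """Draw `n_samples` samples of Uniform(0, 1) values and return each sample's mean."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        statistics.fmean([rng.random() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


sample_size = 50
means = sample_means(n_samples=2000, sample_size=sample_size)

# LLN: the average of many sample means approaches the true mean, 0.5.
grand_mean = statistics.fmean(means)

# CLT: the spread of the sample means approaches sigma / sqrt(n),
# where sigma^2 = 1/12 for the Uniform(0, 1) distribution.
observed_sd = statistics.stdev(means)
expected_sd = (1 / 12) ** 0.5 / sample_size ** 0.5

print(f"mean of sample means: {grand_mean:.3f} (theory: 0.500)")
print(f"sd of sample means:   {observed_sd:.4f} (theory: {expected_sd:.4f})")
```

Changing `sample_size` and rerunning shows how the spread of the sample means shrinks as each sample gets larger.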

# Machine Learning Roadmap

Machine Learning

General

  • Concepts, Inputs & Attributes: Categorical Variables; Ordinal Variables; Numerical Variables
  • Cost functions and gradient descent
  • Overfitting / Underfitting
  • Training, validation and test data
  • Precision vs Recall
  • Bias & Variance
  • Lift

Methods

  • Supervised Learning, Regression: Linear Regression; Poisson Regression
  • Supervised Learning, Classification: Classification Rate; Decision Trees; Naïve Bayes Classifiers; Logistic Regression; K-Nearest Neighbour; SVM
  • Unsupervised Learning, Clustering: Hierarchical Clustering; K-Means Clustering; DBSCAN; Fuzzy C-Means; Mean Shift; Agglomerative
  • Unsupervised Learning, Association Rule Learning
  • Unsupervised Learning, Dimensionality Reduction: Principal Component Analysis (PCA)
  • Ensemble Learning: Boosting; Bagging; Stacking
  • Reinforcement Learning: Q-Learning

Use Cases

  • Sentiment Analysis
  • Collaborative Filtering
  • Tagging
  • Prediction

Tools

  • Important libraries: scikit-learn; spacy (NLP); Huggingface Transform…

Linked roadmap: Deep Learning
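To make "Cost functions and gradient descent" and "Linear Regression" from this roadmap more concrete, here is a minimal sketch that fits a line by gradient descent on the mean squared error, in plain Python. The function name `fit_line`, the learning rate, the step count and the toy data are assumptions chosen for this illustration. In practice you would more likely use a library such as scikit-learn.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ≈ w * x + b by gradient descent on the mean squared error (MSE)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the fit should recover w ≈ 2 and b ≈ 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

If the learning rate is set too high for the data, the updates overshoot and the parameters diverge, which is a quick way to see why this hyperparameter matters.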

# Deep Learning Roadmap

Deep Learning

Papers

  • Deep Learning Papers Reading Roadmap
  • Papers with code
  • Papers with code – state of the art

Neural Networks

  • Understanding Neural Networks

Architectures

  • Perceptrons
  • Autoencoders
  • Convolutional Neural Networks
  • Recurrent Neural Networks: LSM; LSTM; GRU
  • Generative Adversarial Networks (GAN)

Tools

  • PyTorch
  • Tensorflow

Further reading

  • Awesome Deep Learning
  • keep exploring and s…
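The perceptron listed in this roadmap is the simplest architecture to write by hand. The sketch below trains a single perceptron on the logical AND function with the classic perceptron learning rule, using plain Python. The names `predict` and `train_perceptron`, the epoch count and the learning rate are assumptions made for this illustration, not part of any library.

```python
def predict(weights, bias, inputs):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Train a two-input perceptron with the perceptron learning rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # The error is +1, 0 or -1; weights move only when the prediction is wrong.
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so the learning rule is guaranteed to converge.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_samples]
print(predictions)  # [0, 0, 0, 1]
```

A single perceptron cannot learn XOR, because XOR is not linearly separable. That limitation is one motivation for the multi-layer architectures in the list above.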

# Data Engineer Roadmap

Data Engineer

  • Summary of Data Formats
  • Data Discovery
  • Data Source & Acquisition
  • Data Integration
  • Data Fusion
  • Transformation & Enrichment
  • OpenRefine
  • Data Survey
  • How much Data
  • Using ETL
  • Data Lake vs Data Warehouse
  • Dockerize your Python Application
  • keep exploring and st…
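As a rough illustration of the "Using ETL" and "Transformation & Enrichment" items, here is a minimal extract, transform, load sketch built on Python's standard `csv` and `sqlite3` modules. The input data, the column names and the `readings` table are hypothetical and exist only for this example. A real pipeline would read from files or services and write to a persistent store.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """name,temperature_c
berlin, 21.5
munich,
hamburg, 18.0
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with missing readings, tidy names, add Fahrenheit."""
    cleaned = []
    for row in rows:
        reading = row["temperature_c"].strip()
        if not reading:
            continue  # skip rows with a missing value
        celsius = float(reading)
        cleaned.append((row["name"].strip().title(), celsius, celsius * 9 / 5 + 32))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into a SQLite table."""
    connection.execute(
        "CREATE TABLE readings (city TEXT, celsius REAL, fahrenheit REAL)"
    )
    connection.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")  # in-memory database for the demo
load(transform(extract(RAW_CSV)), connection)
result = connection.execute(
    "SELECT city, fahrenheit FROM readings ORDER BY city"
).fetchall()
print(result)
```

Keeping the three stages as separate functions makes each one easy to test and replace, for example swapping SQLite for a data warehouse in the load step.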

# Big Data Engineer Roadmap

Big Data Engineer

Big Data Architectures

  • Architectural Patterns & Best Practices (video)

Principles

  • Horizontal vs vertical scaling
  • Map Reduce
  • Data Replication
  • Name & Data Nodes
  • Job & Task Tracker

Tools

  • Check the Awesome Big Data List
  • Hadoop (large data)
  • Spark (in memory)
  • RAPIDS (on GPU)
  • HDFS
  • Loading data with Sqoop and Pig
  • Flume, Scribe: For Unstructured Data
  • Data Warehouse with Hive
  • Elastic (ELK) Stack: to get data (e.g. logging), search, analyze and visualize it in realtime
  • Avro
  • Flink
  • MLFlow
  • Kafka & KSQL
  • Storm: Hadoop Realtime
  • Databases: Cassandra; MongoDB, Neo4j
  • Scalability: ZooKeeper; Kubernetes
  • Cloud Services: AWS SageMaker; Google ML Engine; Microsoft Azure Machine Learning Studio

Further reading

  • Awesome Production ML
  • keep exploring and st…
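"Map Reduce" appears among the principles in this roadmap. The idea can be sketched on a single machine in a few lines of Python: a map phase emits key-value pairs, a shuffle phase groups them by key, and a reduce phase aggregates each group. The function names and sample documents below are made up for this illustration. Frameworks such as Hadoop run the same phases distributed across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values of each key to get the final counts."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data big ideas", "data pipelines move data"]

# Each document could be mapped on a different worker; here they are mapped in turn.
mapped = [pair for document in documents for pair in map_phase(document)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'big': 2, 'data': 3, 'ideas': 1, 'pipelines': 1, 'move': 1}
```

Because the map step treats every document independently and the reduce step treats every key independently, both phases can be spread across machines, which is what makes the pattern scale horizontally.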

# 🚦 Wrap Up

If you think any of the roadmaps can be improved, please open a PR with your updates or submit an issue. We will also continue to improve them, so you might want to watch/star this repository and revisit it.

# 🙌 Contribution

Have a look at the contribution docs for how to update any of the roadmaps.

  • Open a pull request with improvements
  • Discuss ideas in issues
  • Spread the word
  • Reach out with any feedback

