Tag Archives: Data Science

Four waves of Artificial Intelligence & Machine Learning

While teaching two different courses (AIML and “Data Science and Analysis”), I needed to categorize the history of AI & ML along with its interface with Data Science.

To start: AI is the superset, ML is a subset of AI, and Neural Networks (Deep Learning) are a specialized subset of ML.

Below is a categorization of AIML across four waves and its interface with Data Science:

Wave 1:

Concepts: Traditional topics like state space search, heuristics, knowledge representation, expert systems, fuzzy logic, problem solving languages and such.

UseCases: Think of a small, basic robot moving through your home and deciding how to avoid obstacles.
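Wave 1 ideas such as state space search and heuristics can be made concrete with the robot example. Below is a minimal sketch, assuming the home is a hypothetical grid map where `#` marks an obstacle and the robot moves one cell up, down, left or right at a time. It uses A* search with a Manhattan-distance heuristic, one classic heuristic search algorithm among several, and is not a complete robot controller.

```python
import heapq

def astar(grid, start, goal):
    """Shortest path on a grid map using A* search.

    grid: list of equal-length strings; '#' is an obstacle, anything else is floor.
    start, goal: (row, col) tuples. Returns a list of cells from start to goal,
    or None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: never overestimates with 4-directional unit moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # priority queue of (f = g + h, g, cell)
    came_from = {start: None}
    best_g = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:  # walk back through parent links
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue  # stale entry: a cheaper route to this cell was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = new_g
                    came_from[nxt] = cell
                    heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt))
    return None

# Hypothetical floor plan: a wall in column 2 blocks the direct route
room = [
    "..#..",
    "..#..",
    ".....",
]
print(astar(room, (0, 0), (0, 4)))
```

Because the Manhattan distance never overestimates the remaining moves on such a grid, A* returns a shortest obstacle-free path when one exists, and `None` otherwise.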

Wave 2:

Concepts: Standard algorithms built on top of Regression, Statistics, Algebra, Probability, Calculus and such – Classification, Decision Trees, Association Mining, Clustering, Ensemble methods, Random Forest, SVM and so on. NLP, Computer vision, scanning solutions, advanced search and such areas also evolved here in parallel or with the help of these algorithms.

UseCases: Spam detection, decision making, predictions based on correlated variables, Prescriptive Analytics and so on.
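To illustrate the Wave 2 spam detection use case, here is a minimal multinomial Naive Bayes classifier written with only the Python standard library. The six training messages are made up for illustration, and the whitespace tokenizer and add-one (Laplace) smoothing are simplifying assumptions; production spam filters use far larger datasets and richer features.

```python
import math
from collections import Counter

def train_naive_bayes(messages):
    """messages: list of (text, label) pairs; returns per-class counts."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training messages
    for text, label in messages:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label with the highest log-posterior score for text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing avoids log(0) for unseen word/label pairs
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training data
training = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("cheap loans win cash", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team tomorrow", "ham"),
    ("please review the project report", "ham"),
]
model = train_naive_bayes(training)
print(classify(model, "free cash prize"))           # spam
print(classify(model, "project meeting tomorrow"))  # ham
```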

Wave 3:

Concepts: Replicating the human / animal brain with Neural Networks. Storing and managing very large amounts of data (structured & unstructured).

UseCases: BigData, self-driving cars, image recognition, complex reasoning, medical diagnosis, chatbots, personal assistants and potentially unlimited other use cases across AIML & Data Science.
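Wave 3 neural networks are built from artificial neurons. The sketch below trains a single perceptron, the simplest such unit, on the logical AND function using the classic perceptron learning rule. It is an illustrative toy under that assumption, far removed from the deep networks behind self-driving cars or image recognition.

```python
def step(x):
    """Threshold activation: the neuron 'fires' (1) only for positive input."""
    return 1 if x > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """samples: list of (inputs, target) pairs with target 0 or 1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = step(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = target - output
            # Perceptron rule: nudge weights and bias toward the correct answer
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

With zero initial weights, the learning rate only rescales the weights, so `lr=1.0` keeps the arithmetic exact. A single perceptron cannot learn XOR, which is one reason multi-layer networks trained with backpropagation were needed.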

Wave 4:

Concepts & UseCases: Explainability and interpretability, i.e. understanding the complexity of artificial intelligence & machine learning models. UI- and low-code-driven AIML (Neural Networks), one-shot learning, hardware-optimized AIML and Deep Learning. BERT and newer context-driven algorithms also belong here, as does Natural Language Generation.
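One common, model-agnostic way to approach Wave 4 explainability is permutation feature importance: shuffle one input column and measure how much the model's accuracy drops. The sketch below assumes a hypothetical black-box model (here a hand-written rule that secretly uses only feature 0) and synthetic data; libraries such as scikit-learn ship more complete implementations.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    rows: list of tuples (one tuple of feature values per example).
    A larger drop means the model relies more on that feature.
    """
    rng = random.Random(seed)  # seeded so results are repeatable
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the labels
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

def black_box(row):
    # Hypothetical "trained" model: it happens to depend only on feature 0
    return 1 if row[0] > 0.5 else 0

rows = [(i / 19, (i % 3) / 2) for i in range(20)]  # synthetic features
labels = [black_box(r) for r in rows]
print(permutation_importance(black_box, rows, labels))
```

Feature 1 gets an importance of exactly zero because the model never reads it, while shuffling feature 0 lowers accuracy, which reveals what the "black box" actually depends on.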

Where does Data Science interface with AIML:

  • Non structured data analysis
  • Natural language generation
  • Sentiment analysis
  • Use of standard algorithms to analyse structured data
  • Building insights & making predictions / prescriptions and so on
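For the sentiment analysis interface point, the simplest approach is lexicon-based scoring. This sketch assumes a tiny hand-made word list and a one-word negation rule, both hypothetical; real projects use curated lexicons (for example VADER in NLTK) or trained models.

```python
import re

# Tiny hand-made lexicon, for illustration only (hypothetical word scores)
LEXICON = {"good": 1, "great": 2, "excellent": 3, "love": 2,
           "bad": -1, "poor": -2, "terrible": -3, "hate": -2}
NEGATORS = {"not", "never", "no"}

def sentiment(text):
    """Return (score, label) for text by summing the lexicon scores of its words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        # Simple negation rule: "not good" flips the polarity of "good"
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return score, label

print(sentiment("The service was great and the food was excellent!"))  # (5, 'positive')
print(sentiment("Not good, the delivery was terrible"))                # (-4, 'negative')
```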

Email me: Neil@TechAndTrain.com

What are we doing in the AI / ML / Data Science / Decision Science / Analytics world? – Glossary

Over the last few years I have explored, programmed, worked in, researched and taught Data Science / AI / ML / Analytics / Decision Science with multiple students and many software professionals. Along the way I have collected many keywords that you can google and explore to keep pace with what is happening in these areas. Think of it as a glossary of terms to search on the internet: a mix of technologies, algorithms, concepts, AI / ML / Information Technology terms, BigData words and so on, in no particular order. I will keep expanding it until it is a relatively exhaustive list.

  • Automatic Machine Learning
  • Transfer Learning
  • Explainable Machine Learning
  • Keras
  • PyTorch
  • MLR
  • R
  • Python
  • ggplot2
  • Matplotlib
  • MLlib
  • Spark
  • Hadoop
  • Tableau
  • Chatbots
  • Talend
  • MongoDB
  • Neo4j
  • Kafka
  • ELK
  • NoSQL
  • Cassandra
  • AWS SageMaker
  • SVM
  • Decision Trees
  • Regression: Logistic, Multiple, Simple Linear, Polynomial
  • scikit-learn
  • KNIME
  • BERT
  • NLG
  • NLP
  • Random Forest
  • Hyper parameters
  • Boosting
  • Association rules / mining – Apriori, FP-Growth
  • Data mining
  • OpenCV
  • Self driving cars
  • AI / Memory embedded SOCs, GPUs, TPUs
  • Neural engine chipsets
  • Neural Networks
  • Deep Learning
  • EDA
  • Statistical & Algorithmic modelling
  • Sampling
  • Probability distributions
  • Hypothesis testing
  • Intervals, extrapolation, interpolation
  • Scaling
  • Normalization
  • Agents, search, constraint satisfaction
  • Rules based systems
  • Semantic net
  • Propositional logic
  • Fuzzy reasoning
  • Probabilistic learning
  • First order logic
  • Game theory
  • Pipeline building
  • Ludwig
  • Bayesian belief networks
  • Anaconda Navigator
  • Jupyter
  • Synthetic data
  • Google dataset search
  • Kaggle
  • CNN / RNN / Feed-forward / Backpropagation / Multi-layer
  • TensorFlow
  • Deepfakes
  • KNN
  • K-means clustering
  • Naive Bayes
  • Dimensionality reduction
  • Feature engineering
  • Supervised, unsupervised & reinforcement learning
  • Markov model
  • Time series
  • Categorical & Continuous data
  • Imputation
  • Data analysis
  • Classification / Clustering / Trees / Hyperplane
  • Differential calculus
  • Testing & training data
  • Visualization
  • Missing data treatment
  • SciPy
  • Pandas
  • LightGBM
  • NumPy
  • dplyr
  • Google Colaboratory
  • PyCharm
  • Plotly
  • Shiny
  • caret
  • NLTK, Stanford NLP, OpenNLP
  • Artificial intelligence
  • SQL / PL/SQL
  • Data warehousing
  • Cognitive computing
  • Coral
  • Arduino
  • Raspberry Pi
  • RTOS
  • DARPA Spectrum Challenge
  • The Hundred-Page Machine Learning Book
  • Equations, Functions, and Graphs
  • Differentiation and Optimization
  • Vectors and Matrices
  • Statistics and Probability
  • Operations management & research
  • Unstructured, semi-structured & structured data
  • Five Vs
  • Descriptive, Predictive & Prescriptive analytics
  • Model accuracy
  • IoT / IIoT
  • Recommendation Systems
  • Real Time Analytics
  • Google Analytics

If you learn something by googling these topics, feel free to suggest more words to add here. You are also welcome to discuss or build on this article. Thank you for reading.


Visit my creations:

  • www.TechAndTrain.com
  • www.QandA.in
  • www.TechTower.in

What should be the subjects & course structure for teaching Data Analytics / Data Science in MBA?

Data Science & Analytics, including Operations / Decision Science, are evolving fields that are currently in high demand. Many companies are experimenting with and building projects / products around analytics and data science. Below are the subjects and courses an MBA student should take to cover Data Science / Analytics:

  1. Mathematics — Intermediate level statistics, linear algebra, discrete mathematics & basic calculus
  2. Introduction to Business Analytics & Data Science — covering the basics, such as what machine learning and artificial intelligence are, major software / products, data science / analytics fundamentals including the various types of data, sentiment analysis, basics of algorithms and contemporary topics
  3. BigData ecosystem — Concepts of Hadoop, Spark, MapReduce, NoSQL and the ecosystem around it
  4. Business Intelligence — covering reporting, dashboarding, visualization and related contemporary topics
  5. Business Analysis — covering how to collect requirements, build a project plan / statement of work, proposals and proofs of concept, Agile / DevOps concepts, data analysis, business process re-engineering and similar
  6. Programming in R & Python for managers — Intermediate level topics including data manipulation / cleaning, charting / visualization, running major machine learning algorithms, mathematics functions and libraries
  7. Data warehousing — covering an introduction to data warehousing, multi-dimensional cubes, business dimensions, star / snowflake schemas, the ETL process and similar
  8. Data mining — covering major algorithms in supervised / unsupervised / semi-supervised areas and their implementation
  9. Cloud computing — covering cloud architecture, offerings & major product companies
  10. Operations subjects — which should include Operations management, Operations research, Project Management, Logistics & Supply Chain management, Total Quality Management
  11. Case studies, use cases, and industry-driven internships and projects that expose students to proprietary / open source tools and products in domains such as Digital Marketing, Financial Analytics, HR Analytics, Web / Mobile Analytics, Advertising, Operational Analytics, eCommerce, Manufacturing, Banking, etc., tying all of the above together in practice
  12. The above assumes that students already have intermediate-level skills in productivity tools such as MS Office / Google Docs / Sheets and Linux, plus Year 1 general management subjects like Finance, HR, Marketing, etc.

Reach out to me at neil@TechAndTrain.com if you want to discuss Data Science / R / Java / Python / etc., want to conduct a training for MBA / BE / MCA / MSc students, or are interested in a workshop on Data Science / R / Java / AWS / Excel / etc.

How to explore and learn “Analytics & Data Science”?

One of my students asked me how someone can explore and learn the Analytics / Data Science domain with the intention of building a career in it.

There are three types of roles available in Data Science / Analytics:

  1. Functional consultant, like a Business / Data Analyst
  2. Technical consultant, like a Data Engineer
  3. Mixed profiles, like a Data Scientist, where you need to know both the business domain and the technology

Here are some suggestions to start your journey in Analytics:

1. Learn either R or Python to start with. It helps if you know Java & AWS as well.

2. Explore concepts of Machine Learning, Artificial Intelligence, BigData, BlockChain, NoSQL & IoT

3. Check free or low-cost courses on Coursera, edX, Udemy, Khan Academy, NPTEL, MIT OCW, etc. for the above topics

4. If you have a LinkedIn Premium account, good courses are available on LinkedIn Learning as well

5. Regularly check job descriptions for Data Scientist, Data Analyst & Data Engineer – This tells you what’s happening in the market and where to align your skills

6. Follow people on LinkedIn / Twitter / Medium / etc. who are into Data Science / Analytics. They post really good information there

7. Regularly read EconomicTimes, LiveMint, Business Standard, CNN Money, BBC Business, Bloomberg and similar sites, and keep yourself updated in at least one functional domain such as Digital Marketing, Finance, HR, Operations, Banking or Insurance via NPTEL, MIT OCW, DataScienceCentral.com, Quora, etc. Explore certifications like Google Analytics.

8. You may especially want to follow people like Andriy Burkov, Andrew Ng, Liz Ryan, etc. and sites like Harvard Business Review, Inc., Forbes, Technology Review, ZDNet.com, Kaggle & Sloan Management Review. Here is an example list.

9. Make a list of blogs to follow around these topics too. Here is an example list. 

10. Meet like-minded professionals and students in your area using the Meetup app. Build your own blog / website / small startup around what you are learning, and write articles on LinkedIn / Medium, etc., which will help you network. Offer some consulting to startups in and around your area. You can get a target list for Gujarat here (some are established companies, some are small / young): https://www.techandtrain.com/gujjobs.html – I update this once a month

11. Revise / study concepts of statistics, calculus, linear algebra, operations research and discrete mathematics

12. Explore the tools used by Data Scientists. Here is an example list. 

Many jobs in Analytics / Data Science are available. You can go lighter on technical topics if you intend to be a functional consultant. This is an evolving field, and no single website or book will give you the full picture. Get into the habit of reading widely across the web, and buy a few good books on the above topics. Things change, update and evolve in Analytics every few months.

Reach out to me at neil@techandtrain.com if you want to discuss Data Science / R / Java / etc. or want to conduct a training for MBA / BE / MCA / MSc students or are interested in having a workshop for your managers / executives on Data Science / R / Java / AWS / Excel / etc.

Learning R Programming – Part 1

As per Glassdoor, the top 5 skills listed in Data Science job openings are:

  1. Python
  2. R
  3. SQL
  4. Hadoop
  5. Java

Many Java developers today already know SQL, Hadoop and Java reasonably well, so the two remaining skills, Python and R, are the ones a Java developer / architect / manager should learn to contribute to or work in the Data Science area. In this article you will find a structured, step-by-step approach to learning R programming.

  1. To start, install R and familiarize yourself with the R Console and the R Script editor. You can run commands in both, but it is best to write multiple commands in the script editor and try them out there. Use the shortcut CTRL + R to run the current line or selection from the R Script editor.
  2. Explore menu options like Packages -> Install package(s) / Load package / Set CRAN mirror. Many commands for statistics, visualization, etc. are built into R, a base set of packages is installed with it, and many more are available on CRAN. Select any mirror to download new packages, then install and load them step by step; you will need a working internet connection. Learn how to set the working directory (setwd() / getwd()). You can list the packages installed on your system by calling library() with no arguments.
  3. From there move on to various R Objects / Data Types – Explore Data Frame, Vector, List, Matrices, Arrays and Factors. Try out examples for the same.
  4. Next, learn to load / read and write datasets with commands like read.csv / write.csv. You can also read and write Excel sheets, but for that you will need additional packages. Look at basic commands like summary, str (structure) and fix to analyze / edit your dataset.
  5. Next, go through the categories of operators (logical, arithmetic, relational) and concepts like the pipe %>% (provided by the magrittr / dplyr packages), constants and rules for naming identifiers, followed by the statistical functions directly available in R. You can get help on a command by typing ?<COMMAND>. Also learn to create functions and use conditional statements like if / else.
  6. By now you should revise the basics of statistics and the visualization charts typically taught in Year 1 / Semester 1 of an MBA. Explore the statistics commands built into R, for example mean, var (variance) and sd (standard deviation).
  7. Learn to manipulate datasets using base R's subset and the dplyr package, which provides commands like select, filter, sample_n and sample_frac among others.
  8. Try R's default visualization commands for charts such as barplot and pie. After that, learn how to use the ggplot2 package.
  9. You can find many datasets on kaggle.com and on websites such as the stock exchanges (NSE / BSE), RBI, Open Data websites of various governments and others.
  10. Explore the top 20 R packages, categorized by area, listed in the "Top 20 packages in R" reference below.
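For readers learning Python alongside R (both sit at the top of the Glassdoor list), here is a rough standard-library counterpart to steps 4, 6 and 7: write.csv / read.csv, mean / var / sd, subset / filter / select and sample_n. The employee dataset and file name are hypothetical. Note that R's var() and sd() use the sample (n - 1) denominator, which matches Python's statistics.variance and statistics.stdev.

```python
import csv
import os
import random
import statistics
import tempfile

# Hypothetical dataset standing in for a CSV file you would load with read.csv()
employees = [
    {"name": "A", "dept": "HR", "salary": 42000},
    {"name": "B", "dept": "IT", "salary": 55000},
    {"name": "C", "dept": "IT", "salary": 61000},
    {"name": "D", "dept": "Finance", "salary": 58000},
    {"name": "E", "dept": "HR", "salary": 39000},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "employees.csv")  # hypothetical file name
    # write.csv() counterpart
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "dept", "salary"])
        writer.writeheader()
        writer.writerows(employees)
    # read.csv() counterpart; every value comes back as a string
    with open(path, newline="") as f:
        data = list(csv.DictReader(f))

salaries = [float(row["salary"]) for row in data]

# mean(), var() and sd(); variance and stdev use the sample (n - 1) formula, like R
print("mean:", statistics.mean(salaries))
print("variance:", statistics.variance(salaries))
print("sd:", statistics.stdev(salaries))

# subset(df, dept == "IT", select = c(name, salary)), or in dplyr:
# df %>% filter(dept == "IT") %>% select(name, salary)
it_staff = [{"name": row["name"], "salary": row["salary"]}
            for row in data if row["dept"] == "IT"]
print(it_staff)

# sample_n(df, 2): two random rows without replacement, seeded for repeatability
print(random.Random(42).sample(data, 2))
```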

An advantage of learning R is that you will become better at statistics & data science. Its syntax and structure are much simpler than Java's, and it feels closer to open source scripting environments such as those common on Linux.

Reach out to me at neil@techandtrain.com if you want to discuss R, conduct a training for MBA / BE / MCA / MSc students in R or want to conduct a workshop for your managers / executives on Data Science / R / Java / etc.

References:

Top 10 skills for Data Science – Glassdoor Economic Research

Top 20 packages in R

Universities & Colleges offering courses & degrees in Analytics / Data Science in India – Part 1

  • BITS Pilani – Offering MSc in Analytics, MTech in Computer Science with specialization in Analytics
  • IIM Calcutta, ISI Kolkata & IIT Kharagpur – PG Diploma in Business Analytics
  • Great Lakes, Chennai – PGP in Analytics
  • IIIT Bangalore – PGP in Analytics via UpGrad
  • XLRI – Executive Program in Data Science
  • NIRMA University – MBA – Courses in 2nd year MBA covering Analytics & Data Science
  • Shanti Business School, Ahmedabad – PGDM having 2nd year specialization in Analytics / Decision Science
  • Gujarat University – MSc in Machine Learning & Artificial Intelligence
  • Manipal Global Academy in Data Science – PG Diploma in Data Science
  • SP Jain School of Global Management – Big Data & Visual Analytics
  • Symbiosis Center for Information Technology – Master of Business Administration (Data Sciences and Data Analytics)
  • Indian Institute of Information Technology and Management – Kerala (IIITM-K) – M.Sc. in Computer Science with specialization in Data Analytics
  • Indian Institute of Science, Bangalore – M.Tech. in Computational and Data Science & M.Tech. (Research) / PhD
  • IIM Calcutta – Executive Program in Business Analytics
  • St. Xavier’s College (Autonomous), Ahmedabad, in collaboration with Tata Consultancy Services Limited – M.Sc. in Big Data Analytics
  • IIIT Hyderabad – MTech in Data Science / Analytics

This is not a complete list, but it gives you good data points on Data Science / Analytics courses and degrees in India.