Category Archives: Analytics

Short list of good courses / links / books on Mathematics, Operating Systems and AIML / ChatGPT – Part 1

Here is a short list of good courses / links / books on Mathematics, Operating Systems and AIML / ChatGPT – Part 1:

Email me: Neil@HarwaniSystems.in

Management outcomes for departmental analytics – Part 1

What could the outcomes be for various departments in an organization using analytics? Here is part 1.

Background and context: Most organizations struggle at some point in their lifecycle to achieve management outcomes and, following the latest trends, try to use analytics to support better decision making. With more than 30–50 areas of algorithms / frameworks and knowledge in the quantitative and mathematics fields, where should the focus be? Based on my observations and learnings, the focus could be on the areas below in terms of outcomes:

Human Resources:

  • Employee satisfaction
  • Career advancement
  • Skills & Knowledge analysis
  • Compensation & benefits analysis
  • Leadership pipeline analysis
  • Scalability analysis
  • Cost analysis

Finance:

  • Sustainability in terms of profitability & continued revenue
  • Local community engagement
  • Investments for sustainability
  • Product & services evolution & sustainability

Product team:

  • Customer satisfaction scores
  • Revenue trends
  • Features analysis in terms of usability & adoption
  • Product support & customer success scores

Marketing:

  • Customer retention
  • Net new customer addition
  • Brand awareness & recall
  • Cost analysis for digital marketing & customer acquisition

Operations:

  • Customer satisfaction scores
  • Turnaround and closure time for requests
  • Problem analysis & cause finding

Email me: Neil@HarwaniSystems.in

Four waves of Artificial Intelligence & Machine Learning

While teaching two different courses (AIML and “Data Science and Analysis”), I needed to categorize historical AI & ML along with its interface with Data Science.

To start: AI is the superset, ML is a subset of AI, and Neural Networks (Deep Learning) are a specialized subset of ML.

Below is a categorization of AIML across four waves and its interface with Data Science:

Wave 1:

Concepts: Traditional topics like state space search, heuristics, knowledge representation, expert systems, fuzzy logic, problem-solving languages and so on.

Use cases: Think of a small basic robot moving through your home and making decisions to avoid obstacles.

Wave 2:

Concepts: Standard algorithms built on top of regression, statistics, algebra, probability, calculus and such – classification, decision trees, association mining, clustering, ensemble methods, random forest, SVM and so on. NLP, computer vision, scanning solutions, advanced search and similar areas also evolved here, in parallel with or with the help of these algorithms.

Use cases: Spam detection, decision making, predictions from correlated variables, prescriptive analytics and so on.
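As a tiny illustration of the Wave 2 toolkit, here is ordinary least squares simple linear regression sketched in plain Python – a minimal sketch on invented toy data, not a production implementation:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, the backbone of Wave-2 regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Toy data lying exactly on y = 2x + 1
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # → 1.0 2.0
```

In practice the same fit is one call to `lm()` in R or `LinearRegression` in scikit-learn; the point here is only to show how little machinery the core idea needs.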

Wave 3:

Concepts: Replicating the human / animal brain. Neural networks. Storing and managing very large amounts of data (structured & unstructured).

Use cases: BigData, self-driving cars, image recognition, complex reasoning, medical diagnosis, chatbots, personal assistants; potentially unlimited use cases interfacing with all use cases across AIML & Data Science.

Wave 4:

Concepts & use cases: Explainability and interpretability – understanding the complexity of artificial intelligence & machine learning models. UI- and low-code-driven AIML (neural networks), one-shot learning, hardware-optimized AIML, Deep Learning. BERT and newer context-driven algorithms also sit in this area, as does Natural Language Generation.

Where does Data Science interface with AIML:

  • Unstructured data analysis
  • Natural language generation
  • Sentiment analysis
  • Use of standard algorithms to analyse structured data
  • Building insights & making predictions / prescriptions and so on
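As a minimal illustration of the textual side of this interface, here is a tiny term-frequency pass in plain Python; the stopword list and sample feedback are invented for the example, and real work would use NLTK, Stanford NLP or similar:

```python
import re
from collections import Counter

def top_terms(text, k=3, stopwords=frozenset({"the", "is", "a", "of", "and", "to"})):
    """Tiny text-mining pass: tokenize, drop stopwords, count term frequency."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(k)

# Invented customer feedback for illustration
feedback = ("The product is great and the support is great too. "
            "Shipping of the product was slow.")
print(top_terms(feedback))
```

Frequency counts like these are the starting point for the heavier techniques listed above – weighting influential words, clustering topics, and scoring sentiment.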

Email me: Neil@TechAndTrain.com

Building data models that everyone can understand and more importantly believe

Building data models that everyone can understand and, more importantly, believe. Faculty article – authors: Mr. Balakrishnan Unny & Mr. Neil Harwani. Thank you to Sapience – IMNU's (Nirma University) Alumni Newsletter – for publishing our article in Changing Times 2.0 (A Special Edition).

Data Analysis Process in Analytics / Data Science

This article is based on my understanding of the Wikipedia article on Data Analysis and my experience in Data Science / Analytics / AI / ML – https://en.wikipedia.org/wiki/Data_analysis

Various areas like Data Mining, Predictive Analysis, Exploratory Data Analysis, Text Analytics, Business Intelligence, Confirmatory Data Analysis and Data Visualization overlap with this area.

Before starting your journey to solve an industry, academic or research problem in Data Science / Analytics / AI / ML / Decision Science, a fundamental step where many students & professionals struggle is data analysis. In this article, I provide a step-by-step approach to analyzing your data. Directly starting by programming various algorithms or neural networks on your data can be counterproductive and should be avoided. The initial stage should involve robust data analysis via the steps given below, followed by model building, which can include custom or already proven algorithms or a derivative of some popular models. For each point discussed below, I have added information from my industry experience on top of my interpretation of the Wikipedia material, either at the end of the point or as new points after the interpretations.

Your steps for data analysis should generally be:

  1. Set up your data analysis process at a high level with your objectives – inspecting data, cleaning it, processing it (which can include dimensionality reduction / feature engineering), transformation, modelling and communicating results. Many forget the functional and feedback loop in this setup that improves data quality – it must be included too.
  2. The next step is understanding what the data is telling us. Data could be quantitative-style numbers, textual, or a mix of the two, and the treatment for all three is different. For quantitative / numerical data, we try to understand whether it is time-series, ranking, part-to-whole, deviation, frequency distribution, correlation, nominal, or geographical / geospatial data. For textual or mixed data we need the approaches of text mining, sentiment analysis and natural language processing to get insights around frequency of words, influential words & sentences by weight, trends, categories, clusters and more. Most of this article revolves around quantitative / numerical data per se, not textual data; this point gives only a very brief idea of textual data analysis.
  3. Next, apply quantitative techniques to the data: sanity checks, audit / reconciliation of totals via formulas, relationships between data, and checks on whether variables are related in terms of correlation / sufficiency / necessity, etc. I would suggest RStudio or a similar tool for this step.
  4. After this we actually perform actions like filtering, sorting, checking ranges and classes, summaries, clusters, relationships, context, extremes, etc. At this stage, exploratory data analysis techniques come in very handy, using libraries that provide graphical representations. Excel & Tableau are useful here.
  5. Our next step is to check for biases, separate facts from opinions, and catch numerically incorrect or irrelevant inferences that are being projected and need correction / improvement. This requires a detailed study of the data from a domain / functional perspective and applying statistical analysis to it. Working with a business / functional consultant is especially useful in this phase.
  6. Areas to take care of include quality of data, quality of measurements, transformation of variables / observations onto a log or similar scale (as with the Richter scale for earthquakes), and mapping to objectives and characteristics. This is an intuitive step where visualizing data through various transformations in R / Python / etc. using libraries like ggplot2, Plotly, Matplotlib, etc. helps.
  7. Next comes checking outliers, missing values and randomness, and plotting various charts based on whether the data is categorical or continuous. This is statistical analysis & visualization, for which I find R most suited.
  8. Building models around our data analysis steps could involve linear and non-linear models, checking values via hypothesis testing, and mapping to algorithms to process, predict, cluster, find trends and so on. Tools like R / Python with libraries such as scikit-learn, NumPy, Pandas, MLR, caret, Keras, TensorFlow, etc. help here.
  9. While running the models, take care of cross-validation of data & sensitivity analysis – this can generally be done via options in the model training & testing phase for supervised learning.
  10. A feedback loop to circle back and improve data & results, accuracy analysis and improvement, pipeline building, and interpretation of results & functional mapping to the domain are additional things to consider on top of the basics in the Wikipedia article. Dimensionality reduction techniques like PCA and SVD should also be explored in detail, as they are helpful in this analysis.
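As a sketch of the inspection steps above (summaries, missing values, outliers), here is a minimal data-analysis pass using only the Python standard library; the toy "age" column is invented for illustration, and in practice you would reach for R or pandas:

```python
import statistics

def summarize(column, values):
    """Basic inspection pass: missing values, spread, and IQR-rule outliers."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    mean = statistics.mean(present)
    stdev = statistics.stdev(present)
    q1, _, q3 = statistics.quantiles(present, n=4)   # quartiles
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr          # Tukey's 1.5*IQR fences
    outliers = [v for v in present if v < lo or v > hi]
    return {"column": column, "missing": n_missing, "mean": round(mean, 2),
            "stdev": round(stdev, 2), "outliers": outliers}

# Toy column with one missing value and one extreme value
report = summarize("age", [23, 25, 24, 26, None, 27, 95])
print(report)
```

The same pass in R would be `summary()` plus `boxplot.stats()`; the value of writing it out once is seeing exactly what "check missing values and outliers" computes.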

Additional information on top of what is in Wikipedia article:

  1. Explainable AI / ML – https://en.wikipedia.org/wiki/Explainable_artificial_intelligence
  2. Interpretable ML – https://statmodeling.stat.columbia.edu/2018/10/30/explainable-ml-versus-interpretable-ml/
  3. Tools / languages / products to use: R, Python, Pandas, Numpy, Tableau and so on
  4. EDA – https://en.wikipedia.org/wiki/Exploratory_data_analysis
  5. Which chart to use – https://www.tableau.com/learn/whitepapers/which-chart-or-graph-is-right-for-you
  6. List of charts – https://python-graph-gallery.com/all-charts/
  7. Confirmatory data analysis – https://en.wikipedia.org/wiki/Statistical_hypothesis_testing
  8. Singular Value Decomposition – https://en.wikipedia.org/wiki/Singular_value_decomposition
  9. Dimensionality Reduction – https://en.wikipedia.org/wiki/Dimensionality_reduction
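To give a flavour of points 8 and 9 above (SVD and dimensionality reduction), here is a minimal PCA-via-SVD sketch with NumPy on invented toy data – three correlated features compressed to two components:

```python
import numpy as np

# Toy data: 5 observations, 3 strongly correlated features (invented)
X = np.array([[2.5, 2.4, 4.9],
              [0.5, 0.7, 1.2],
              [2.2, 2.9, 5.1],
              [1.9, 2.2, 4.1],
              [3.1, 3.0, 6.1]])

Xc = X - X.mean(axis=0)              # PCA requires mean-centred data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = S**2 / np.sum(S**2)      # fraction of variance per component
X_reduced = Xc @ Vt[:2].T            # project onto the first two components

print("variance explained:", np.round(explained, 3))
print("reduced shape:", X_reduced.shape)
```

Because the columns are highly correlated, the first component captures almost all the variance – which is exactly why dimensionality reduction is useful before modelling.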

Email me: Neil@TechAndTrain.com

Visit my creations:

  • www.TechAndTrain.com
  • www.QandA.in
  • www.TechTower.in

What are we doing in AI / ML / Data Science / Decision Science / Analytics World? – Glossary

Over the last few years I have explored, programmed with, researched and taught Data Science / AI / ML / Analytics / Decision Science to many students and software professionals. I have collected many keywords that you can google and explore. This will help you keep pace and learn about what is happening in these areas. It's like a glossary of words to search on the internet: a mix of technologies, algorithms, concepts, AI / ML / Information Technology terms, BigData words and so on, in no particular order. I will keep expanding it until it is a relatively exhaustive list.

  • Automatic Machine Learning
  • Transfer Learning
  • Explainable Machine Learning
  • Keras
  • PyTorch
  • MLR
  • R
  • Python
  • Ggplot2
  • Matplotlib
  • MLlib
  • Spark
  • Hadoop
  • Tableau
  • Chatbots
  • Talend
  • MongoDB
  • Neo4j
  • Kafka
  • ELK
  • NoSQL
  • Cassandra
  • AWS SageMaker
  • SVM
  • Decision Trees
  • Regression: Logistic, Multiple, Simple Linear, Polynomial
  • Scikit Learn
  • KNIME
  • BERT
  • NLG
  • NLP
  • Random Forest
  • Hyper parameters
  • Boosting
  • Association rules / mining – Apriori, FP-Growth
  • Data mining
  • OpenCV
  • Self driving cars
  • AI / Memory embedded SOCs, GPUs, TPUs
  • Neural engine chipsets
  • Neural Networks
  • Deep Learning
  • EDA
  • Statistical & Algorithmic modelling
  • Sampling
  • Probability distributions
  • Hypothesis testing
  • Intervals, extrapolation, interpolation
  • Scaling
  • Normalization
  • Agents, search, constraint satisfaction
  • Rules based systems
  • Semantic net
  • Propositional logic
  • Fuzzy reasoning
  • Probabilistic learning
  • First order logic
  • Game theory
  • Pipeline building
  • Ludwig
  • Bayesian belief networks
  • Anaconda Navigator
  • Jupyter
  • Synthetic data
  • Google dataset search
  • Kaggle
  • CNN / RNN / Feed forward / Back propagation / Multi-layer
  • Tensorflow
  • Deepfakes
  • KNN
  • K means clustering
  • Naive Bayes
  • Dimensionality reduction
  • Feature engineering
  • Supervised, unsupervised & reinforcement learning
  • Markov model
  • Time series
  • Categorical & Continuous data
  • Imputation
  • Data analysis
  • Classification / Clustering / Trees / Hyperplane
  • Differential calculus
  • Testing & training data
  • Visualization
  • Missing data treatment
  • Scipy
  • Pandas
  • LightGBM
  • Numpy
  • Dplyr
  • Google Colaboratory
  • PyCharm
  • Plotly
  • Shiny
  • Caret
  • NLTK, Stanford NLP, OpenNLP
  • Artificial intelligence
  • SQL / PLSQL
  • Data warehousing
  • Cognitive computing
  • Coral
  • Arduino
  • Raspberry Pi
  • RTOS
  • DARPA Spectrum Challenge
  • 100 page ML book
  • Equations, Functions, and Graphs
  • Differentiation and Optimization
  • Vectors and Matrices
  • Statistics and Probability
  • Operations management & research
  • Unstructured, semi-structured & structured data
  • Five Vs
  • Descriptive, Predictive & Prescriptive analytics
  • Model accuracy
  • IoT / IIoT
  • Recommendation Systems
  • Real Time Analytics
  • Google Analytics

If you are learning something by googling these topics, feel free to suggest more words to add here. You are welcome to discuss / build on this article as well. Thank you for reading.

Email me: Neil@TechAndTrain.com


Three waves of Analytics – Notes on articles by Prof. Davenport


ANALYTICS 1.0 – Business Intelligence, RDBMS & Data Warehousing

  • Vertical scaling
  • Better results and analysis meant higher processing power & memory
  • Complex systems
  • Risk of a single point of failure
  • Backup was compulsory
  • Storage in RDBMS
  • Transformation in business dimensions and facts in Data Warehouse
  • Descriptive analytics mainly

ANALYTICS 2.0 – BigData, Hadoop, NoSQL & Spark – In memory computing

Problems with Analytics 1.0

  • Costly hardware
  • Large amounts of data
  • Unstructured data

Solution

  • BigData
  • Hadoop – Large files
  • NoSQL – small files or lower-volume data
  • Horizontal scaling
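The horizontal-scaling idea behind Hadoop can be sketched as a toy MapReduce word count in plain Python, where each "chunk" stands in for a file split that a separate node would process (data and chunking invented for illustration):

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    """Mapper: emit (word, 1) pairs for one chunk of the input."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Reducer: sum the counts for each key after the shuffle."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Each chunk stands in for a file split processed on a separate node
chunks = ["big data big", "data lake", "big lake"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
print(reduce_phase(mapped))  # → {'big': 3, 'data': 2, 'lake': 2}
```

The real framework adds distribution, fault tolerance and a shuffle phase between mappers and reducers, but the programming model is exactly this map-then-reduce split.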

Problems with BigData

  • Querying unstructured data
  • Large amount of data for real time processing not batch processing

Solution

  • Pig
  • Hive
  • Spark – In-memory computing
  • Predictive analytics mainly

ANALYTICS 3.0 – Edge Computing, Data Rich Organizations, Real Time Analytics & more

Problems with Analytics 2.0

  • Most analysis was retrospective, on past data
  • Organization-wide data also started getting collected but went unused
  • Real-time data started to flow in large amounts

Solution

  • Data rich organizations
  • Use data from the organization to build products mapped not just to the market but also to the organization itself
  • E.g. differentiated products in manufacturing to compete with mass economies-of-scale production
  • Edge computing
  • Real time processing
  • Combined data
  • Embedded analytics
  • Data discovery
  • Cross functional teams
  • Moving to Prescriptive & Real Time analytics

Email me: Neil@TechAndTrain.com


What should be the subjects & course structure for teaching Data Analytics / Data Science in MBA?

Data Science & Analytics, including Operations / Decision Science, are evolving fields currently in demand for various reasons. Most companies are experimenting and creating projects / products around analytics / data science. I am listing the subjects & courses that an MBA student should take to cover Data Science / Analytics:

  1. Mathematics — Intermediate level statistics, linear algebra, discrete mathematics & basic calculus
  2. Introduction to Business Analytics & Data Science — covering basics of the subjects like what is machine learning, artificial intelligence, major software / products, data science / analytics basics including various types of data, sentiment analysis, basics of algorithms and contemporary topics
  3. BigData ecosystem — Concepts of Hadoop, Spark, MapReduce, NoSQL and the ecosystem around it
  4. Business Intelligence — covering reporting, dashboarding, visualization and contemporary topics around it
  5. Business Analysis — covering concepts of how to collect requirements, build a project plan / statement of work, proposals, proof of concepts, concepts of AGILE / DevOps, data analysis, business process re-engineering and similar
  6. Programming in R & Python for managers — Intermediate level topics including data manipulation / cleaning, charting / visualization, running major machine learning algorithms, mathematics functions and libraries
  7. Data warehousing — covering the introduction of it and multi-dimensional cubes, business dimensions, star / snowflake schema, process of ETL and similar
  8. Data mining — covering major algorithms in supervised / unsupervised / semi-supervised areas and their implementation
  9. Cloud computing — covering cloud architecture, offerings & major product companies
  10. Operations subjects — which should include Operations management, Operations research, Project Management, Logistics & Supply Chain management, Total Quality Management
  11. Case study, use case and industry driven internships and projects which give exposure to students using proprietary / open source tools & products mapped to domains like Digital marketing, Financial analytics, HR analytics, Web / Mobile analytics, Advertising, Operational Analytics, eCommerce, Manufacturing, Banking, etc. used in industry to join all of the above together into implementation
  12. The above assumes that students already have intermediate-level skills in productivity tools like MS-Office / Google Docs/Sheets, Linux, and Year 1 general management subjects like Finance, HR, Marketing, etc.
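As a flavour of item 6 (programming in Python for managers), here is the kind of tiny data-cleaning and aggregation exercise such a course might open with; the sales records are invented for illustration:

```python
from collections import defaultdict

# Toy sales records of the kind an MBA analytics exercise might use
records = [
    {"region": "West", "revenue": 120.0},
    {"region": "East", "revenue": 80.0},
    {"region": "West", "revenue": 100.0},
    {"region": "East", "revenue": None},   # missing value to clean
]

# Cleaning: drop rows with missing revenue, then aggregate revenue by region
clean = [r for r in records if r["revenue"] is not None]
totals = defaultdict(float)
for r in clean:
    totals[r["region"]] += r["revenue"]

print(dict(totals))  # → {'West': 220.0, 'East': 80.0}
```

The same exercise in pandas is a one-line `dropna()` plus `groupby()`, which is a natural second lesson once the manual version is understood.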

Reach out to me at neil@TechAndTrain.com if you want to discuss Data Science / R / Java / Python / etc., want to conduct a training for MBA / BE / MCA / MSc students, or are interested in having a workshop on Data Science / R / Java / AWS / Excel / etc.

How to solve a machine learning problem? – 1


  1. Select a language, for example R or Python
  2. Select a machine learning package and the associated data manipulation, charting, output, etc. packages
  3. Get and explore the data using techniques like Exploratory Data Analysis for an initial understanding and some inferences
  4. Break your original data set into a training set and a testing set. Clarify what you want to predict – for example, whether to give a loan to a customer based on their profile, or what services to offer based on their past recorded behavior. Typically the testing set is smaller than the training set, and the testing set does not have the prediction output (result) column; that is available in the training set
  5. Find dependent / independent variables, skewness and outliers in the data, and check whether any numeric values should be converted to categorical values if they have only a few states – typical examples: levels like 1, 2, 3 or YES/NO type fields / columns
  6. Plot histograms, box plots, etc. to help with the step above
  7. Fill in missing values using various techniques: simpler options like the mean, median or mode depending on the type of data, or a machine learning algorithm to replace the missing value or create dummy fields / columns
  8. Move on to feature engineering by creating completely new variables from the available data and/or transforming it, e.g. adding thresholds to remove outliers. Find the important feature(s) and check the relevance of the newly created features. If the new features are highly correlated with existing features / variables you will mostly not get new inferences, so it is worth doing more manipulation to create new variables that yield new inferences / results / observations
  9. Select your statistical model and create the machine learning (ML) tasks
  10. Train your ML tasks with the training data using the selected algorithm – decision tree, regression, random forest, etc. – based on fitment and suitability
  11. Predict on your testing data set using the model trained in step 10
  12. Check your accuracy by comparing the result in the real situation against your result from step 11
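The split / train / predict / score loop in the steps above can be sketched end to end with a deliberately tiny nearest-neighbour classifier; the data and loan-approval labels are invented for illustration, and in practice you would use caret in R or scikit-learn in Python:

```python
import math

def nn_predict(train, point):
    """1-nearest-neighbour: return the label of the closest training example."""
    features, label = min(train, key=lambda row: math.dist(row[0], point))
    return label

# Toy data: (features, label); label 1 = approve loan, 0 = reject (invented)
data = [([1.0, 1.2], 0), ([1.1, 0.9], 0), ([0.9, 1.0], 0),
        ([3.0, 3.2], 1), ([3.1, 2.9], 1), ([2.9, 3.0], 1)]

train, test = data[:4], data[4:]                         # split the data set
predictions = [nn_predict(train, f) for f, _ in test]    # predict on test set
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(predictions, accuracy)                             # check accuracy
```

Every real pipeline is this loop with better models, cross-validation, and the data-preparation steps from the list layered on top.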

This is part 1 of the series on Machine Learning. Treat it as a generic guideline; often we will need to tailor it to specific situations and data sets, in which case the steps will be enhanced / substituted / refined as required.

Reach out to me at neil@techandtrain.com if you want to discuss Data Science / R / Java / Python / etc. or want to conduct a training for MBA / BE / MCA / MSc students or are interested in having a workshop for your managers / executives on Data Science / R / Java / AWS / Excel / etc.

How to explore and learn “Analytics & Data Science”?

One of my students asked me how someone can explore and learn the Analytics / Data Science domain with the intention of building a career in it.

There are three types of roles available in Data Science / Analytics:

  1. Functional consultant, like a Business / Data Analyst
  2. Technical consultant, like a Data Engineer
  3. Mixed profile, like a Data Scientist, where you need to know both the business domain and the technology

Here are some suggestions to start your journey in Analytics:

1. Learn either R or Python to start with. It's good if you know Java & AWS as well.

2. Explore concepts of Machine Learning, Artificial Intelligence, BigData, BlockChain, NoSQL & IoT

3. Check the free or cheap courses on Coursera, edX, Udemy, Khan Academy, NPTEL, MIT OCW, etc. for above topics

4. If you have LinkedIn premium account, good courses are available in LinkedIn Learning as well

5. Regularly check job descriptions for Data Scientist, Data Analyst & Data Engineer – This tells you what’s happening in the market and where to align your skills

6. Follow people on LinkedIn / Twitter / Medium / etc. who are into Data Science / Analytics. They post really good information there

7. Regularly read EconomicTimes, LiveMint, Business Standard, CNN Money, BBC Business, Bloomberg and similar sites, and update yourself in at least one functional domain like Digital Marketing, Finance, HR, Operations, Banking, Insurance, etc. via NPTEL, MIT OCW, DataScienceCentral.com, Quora, etc. Explore certifications like Google Analytics.

8. You may especially want to follow people like Andriy Burkov, Andrew Ng, Liz Ryan, etc. and sites like Harvard Business Review, Inc., Forbes, Technology Review, ZDNet.com, Kaggle & Sloan Management Review. Here is an example list.

9. Make a list of blogs to follow around these topics too. Here is an example list. 

10. Meet like-minded professionals and students in your area using the Meetup app. Build your own blog / website / small startup around what you are learning, and write articles on LinkedIn / Medium, etc., which will help you network. Offer some consulting to startups in and around your area. You can get a target list for Gujarat here (some are properly established companies, some are small / young): https://www.techandtrain.com/gujjobs.html – I update this once a month

11. Revise / study concepts of statistics, calculus, linear algebra, operations research and discrete mathematics

12. Explore the tools used by Data Scientists. Here is an example list. 

Many jobs are available in Analytics / Data Science. You can go light on technical topics if you intend to be a functional consultant. This is an evolving field, and one website or one book won't give you the full picture. Get into the habit of surfing across the net and buy a few good books on the above topics. Things change / update / evolve in Analytics every few months.

Reach out to me at neil@techandtrain.com if you want to discuss Data Science / R / Java / etc. or want to conduct a training for MBA / BE / MCA / MSc students or are interested in having a workshop for your managers / executives on Data Science / R / Java / AWS / Excel / etc.