Machine Learning | Innovation Ideas blog

1. Select a language, for example R or Python.
2. Select a machine learning package to use, along with the associated packages for data manipulation, charting, output, etc.
3. Get and explore the data using techniques such as Exploratory Data Analysis (EDA) to build an initial understanding of the data and draw some early inferences.
4. Break your original data set into a training set and a testing set. Clarify what you want to predict for the testing set: for example, whether to give a loan to a customer based on their profile, or which services to offer customers based on their past recorded behavior. Typically the testing set is smaller than the training set. The testing set would not have the prediction output (result) column; that column is available in the training set.
5. Identify the dependent and independent variables, and look for skewness and outliers in the data. Check whether any numeric values should be converted to categorical values because they take only a few states. Typical examples are levels like 1, 2, 3 or YES/NO type fields / columns.
6. Plot histograms, box plots, etc. to help with the previous step.
7. Fill in missing values using one of various techniques: either simpler options such as the mean, median, or mode, depending on the type of data, or a machine learning algorithm to replace the missing values or create dummy fields / columns.
8. Move on to feature engineering: create completely new variables from the available data and / or transform existing ones, for example by applying thresholds to remove outliers. Find the important features and check the relevance of the newly created ones. If a new feature is highly correlated with an existing feature / variable, it will mostly not yield many new inferences, so it is worth manipulating the data further to create variables that do lead to new inferences / results / observations.
9. Select your statistical model and create the tasks for machine learning (ML).
10. Train your ML tasks on the training data using the selected algorithm (decision tree, regression, random forest, etc.), chosen for its fit and suitability to the problem.
11. Predict on your testing data set using the model trained in step 10.
12. Check your accuracy by comparing the results observed in the real situation with your predictions from step 11.
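
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. The loan records, the field names (`income`, `has_job`, `approved`), and all helper functions are hypothetical and invented for this example. To stay dependency-free it uses only the standard library, and a one-level decision tree (a decision stump) stands in for the richer algorithms a real machine learning package would provide. The testing rows keep their recorded outcome here only so that accuracy can be checked afterwards.

```python
import random
import statistics

# Hypothetical loan records, invented purely for illustration.
# "income" is in thousands; None marks a missing value.
RECORDS = [
    {"income": 22, "has_job": "NO", "approved": 0},
    {"income": 28, "has_job": "YES", "approved": 0},
    {"income": 35, "has_job": "NO", "approved": 0},
    {"income": None, "has_job": "NO", "approved": 0},
    {"income": 41, "has_job": "YES", "approved": 0},
    {"income": 55, "has_job": "YES", "approved": 1},
    {"income": 62, "has_job": "YES", "approved": 1},
    {"income": None, "has_job": "YES", "approved": 1},
    {"income": 78, "has_job": "YES", "approved": 1},
    {"income": 90, "has_job": "NO", "approved": 1},
]


def split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and return (training set, testing set)."""
    shuffled = [dict(r) for r in rows]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def observed_median(rows, field):
    """Median of the non-missing values of a field."""
    return statistics.median(r[field] for r in rows if r[field] is not None)


def prepare(rows, income_fill):
    """Fill missing incomes and turn the YES/NO field into 1/0."""
    return [
        {
            "income": income_fill if r["income"] is None else r["income"],
            "has_job": 1 if r["has_job"] == "YES" else 0,
            "approved": r["approved"],
        }
        for r in rows
    ]


def train_stump(rows, features, target):
    """Fit a one-level decision tree: the single rule
    'feature >= threshold' with the best training accuracy."""
    best = None
    for feature in features:
        for threshold in sorted({r[feature] for r in rows}):
            hits = sum((r[feature] >= threshold) == bool(r[target]) for r in rows)
            if best is None or hits > best[0]:
                best = (hits, feature, threshold)
    return best[1], best[2]


def predict(stump, row):
    """Apply the learned rule to one row and return 1 or 0."""
    feature, threshold = stump
    return 1 if row[feature] >= threshold else 0


def accuracy(stump, rows, target):
    """Fraction of rows whose prediction matches the recorded outcome."""
    return sum(predict(stump, r) == r[target] for r in rows) / len(rows)


train_raw, test_raw = split(RECORDS)
fill = observed_median(train_raw, "income")  # learned from training data only
train, test = prepare(train_raw, fill), prepare(test_raw, fill)

stump = train_stump(train, ["income", "has_job"], "approved")
print("rule:", stump[0], ">=", stump[1])
print("training accuracy:", accuracy(stump, train, "approved"))
print("testing accuracy:", accuracy(stump, test, "approved"))
```

The median used to fill missing incomes is computed from the training set alone and then applied to both sets, so nothing from the testing set influences training. With a package such as scikit-learn the helper functions would be replaced by the package's own splitting, imputation, and model classes, while the order of the steps stays the same.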

This is part 1 of a series on Machine Learning. Treat it as a generic guideline. In practice the steps often need to be tailored to the situation and data set at hand, in which case they will be enhanced, substituted, or refined as required.

Reach out to me at neil@techandtrain.com if you want to discuss Data Science / R / Java / Python / etc. or want to conduct a training for MBA / BE / MCA / MSc students or are interested in having a workshop for your managers / executives on Data Science / R / Java / AWS / Excel / etc.

How to solve a machine learning problem? – 1

Ideas on Innovation around Software. We Thrive On Ideas. We are Learner Centered, Open Source & Digital Focused.
