{"id":385,"date":"2018-11-24T08:55:46","date_gmt":"2018-11-24T03:25:46","guid":{"rendered":"https:\/\/www.techandtrain.com\/blog\/?p=385"},"modified":"2025-10-22T14:55:34","modified_gmt":"2025-10-22T09:25:34","slug":"how-to-solve-a-machine-learning-problem-1","status":"publish","type":"post","link":"https:\/\/www.techandtrain.com\/blog\/2018\/11\/how-to-solve-a-machine-learning-problem-1\/","title":{"rendered":"How to solve a machine learning problem? &#8211; 1"},"content":{"rendered":"<p>&nbsp;<\/p>\n<ol>\n<li>Select a language, for example R or Python<\/li>\n<li>Select a machine learning package, along with the associated packages for data manipulation, charting, output, etc.<\/li>\n<li>Get the data and explore it using techniques like Exploratory Data Analysis to build an initial understanding of the data and draw some first inferences<\/li>\n<li>Split your original data set into a training set and a testing set. Clarify what you want to predict for the testing set &#8211; for example, whether to give a loan to a customer based on their profile, OR which services to offer customers based on their past recorded behavior in the data sets. Typically the testing set is smaller than the training set, and the prediction output (result) column is available in the training set but withheld from (or absent in) the testing set<\/li>\n<li>Identify the dependent \/ independent variables, check for skewness and outliers in the data, and check whether any numeric values should be converted into categorical values because they have only a few states &#8211; typical examples are levels like 1, 2, 3 or YES\/NO type fields \/ columns<\/li>\n<li>Plot histograms, box plots, etc. 
to help with the checks in step 5<\/li>\n<li>Fill in missing values using one of several techniques: either simple options such as the mean, median, or mode, depending on the type of data, OR a machine learning algorithm to replace the missing values or to create dummy fields \/ columns<\/li>\n<li>Move on to feature engineering by creating completely new variables from the available data AND \/ OR transforming existing ones, for example by adding thresholds to remove outliers. Identify the important features and check the relevance of the newly created features. If the new features are highly correlated with earlier features \/ variables, they will mostly not give you many new inferences, so it is worth doing some more manipulation to create variables that do lead to new inferences \/ results \/ observations<\/li>\n<li>Select your statistical model and create the tasks for machine learning (ML)<\/li>\n<li>Train your ML tasks on the training data using the selected algorithm, such as a decision tree, regression, random forest, etc., chosen based on fit and suitability<\/li>\n<li>Predict on your testing data set using the prediction task and the model trained in step 10<\/li>\n<li>Check your accuracy by comparing the results observed in the real situation against your predictions from step 11<\/li>\n<\/ol>\n<p>This is part 1 of the series on Machine Learning. Treat this as a generic guideline. Often we will need to tailor it to particular situations and data sets, in which case the steps will be enhanced \/ substituted \/ refined as required.<\/p>\n<p>Reach out to me at\u00a0<a href=\"http:\/\/mailto:neil@techandtrain.com\/\" target=\"_blank\" rel=\"nofollow noopener\">neil@techandtrain.com<\/a>\u00a0if you want to discuss Data Science \/ R \/ Java \/ Python \/ etc. 
or want to conduct a training for MBA \/ BE \/ MCA \/ MSc students or are interested in having a workshop for your managers \/ executives on Data Science \/ R \/ Java \/ AWS \/ Excel \/ etc.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Select a language like for example either of R or Python Select a machine learning package to use and [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":386,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[72],"tags":[83],"class_list":["post-385","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analytics","tag-machine-learning"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2018\/11\/ML.jpg?fit=1920%2C1288&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p7do02-6d","jetpack-related-posts":[{"id":557,"url":"https:\/\/www.techandtrain.com\/blog\/2020\/02\/what-are-we-doing-in-ai-ml-data-science-decision-science-analytics-world-glossary\/","url_meta":{"origin":385,"position":0},"title":"What are we doing in AI \/ ML \/ Data Science \/ Decision Science \/ Analytics World? &#8211; Glossary","author":"Neil Harwani","date":"February 17, 2020","format":false,"excerpt":"Over the last few years I have explored, programmed, worked in, researched and taught Data Science \/ AI \/ ML \/ Analytics \/ Decision Science to multiple students and with many software professionals. I have collected many keywords that you can google and explore. 
This will help you to keep\u2026","rel":"","context":"In &quot;Analytics&quot;","block_context":{"text":"Analytics","link":"https:\/\/www.techandtrain.com\/blog\/category\/analytics\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2020\/02\/Mathematics.jpg?fit=960%2C490&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2020\/02\/Mathematics.jpg?fit=960%2C490&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2020\/02\/Mathematics.jpg?fit=960%2C490&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2020\/02\/Mathematics.jpg?fit=960%2C490&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":1541,"url":"https:\/\/www.techandtrain.com\/blog\/2026\/04\/dimensions-for-artificial-intelligence-genai-llms-deep-learning-neural-networks-data-science-to-ponder-on-part-1-assisted-by-ai-chatgpt\/","url_meta":{"origin":385,"position":1},"title":"Dimensions for Artificial Intelligence \/ GenAI \/ LLMs \/ Deep Learning \/ Neural Networks \/ Data Science to ponder on &#8211; Part 1-Assisted by AI &#8211; ChatGPT","author":"Neil Harwani","date":"April 17, 2026","format":false,"excerpt":"\ud83e\udde0 1. Model Performance & Quality Beyond accuracy: Precision \/ Recall \/ F1-score ROC-AUC Calibration (probability correctness) Generalization ability Robustness (noise, adversarial inputs) Stability (variance across runs) Overfitting \/ Underfitting control Latency (response time) Throughput (requests per second) \u2696\ufe0f 2. 
Responsible AI \/ Ethics Along with fairness, bias, explainability, interpretability:\u2026","rel":"","context":"In &quot;AIML&quot;","block_context":{"text":"AIML","link":"https:\/\/www.techandtrain.com\/blog\/category\/aiml\/"},"img":{"alt_text":"Image credit: www.Pixabay.com","src":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2026\/04\/Dimensions.jpg?fit=640%2C360&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2026\/04\/Dimensions.jpg?fit=640%2C360&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2026\/04\/Dimensions.jpg?fit=640%2C360&ssl=1&resize=525%2C300 1.5x"},"classes":[]},{"id":1216,"url":"https:\/\/www.techandtrain.com\/blog\/2024\/02\/notes-on-explainability-interpretability-in-machine-learning-chatgpt-bard-generated\/","url_meta":{"origin":385,"position":2},"title":"Notes on explainability &amp; interpretability in Machine Learning &#8211; ChatGPT &amp; BARD generated","author":"Neil Harwani","date":"February 4, 2024","format":false,"excerpt":"Explainability and interpretability in neural networks are crucial for understanding how these models make decisions, especially in critical applications like healthcare, finance, and autonomous vehicles. Several software tools and libraries have been developed to aid in this process, providing insights into the inner workings of complex models. 
Here are some\u2026","rel":"","context":"In &quot;AIML&quot;","block_context":{"text":"AIML","link":"https:\/\/www.techandtrain.com\/blog\/category\/aiml\/"},"img":{"alt_text":"Credits: www.Pixabay.com","src":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2024\/02\/NN.png?fit=1069%2C1200&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2024\/02\/NN.png?fit=1069%2C1200&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2024\/02\/NN.png?fit=1069%2C1200&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2024\/02\/NN.png?fit=1069%2C1200&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2024\/02\/NN.png?fit=1069%2C1200&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":1532,"url":"https:\/\/www.techandtrain.com\/blog\/2026\/04\/keywords-notes-from-executive-masters-in-data-science-for-decision-making-at-iit-gandhinagar-part-1-assisted-by-chatgpt\/","url_meta":{"origin":385,"position":3},"title":"Keywords &amp; Notes from Executive Masters in Data Science for Decision Making at IIT Gandhinagar &#8211; Part 1 &#8211; Assisted by ChatGPT","author":"Neil Harwani","date":"April 11, 2026","format":false,"excerpt":"Here are 20 high-quality keywords for each category, structured for learning, research, and practical application: 1. 
Advanced Probability & Statistics Bayesian Inference Markov Chains Stochastic Processes Central Limit Theorem Hypothesis Testing Maximum Likelihood Estimation (MLE) Bayesian Networks Copulas Multivariate Distributions Monte Carlo Simulation Gibbs Sampling Hidden Markov Models (HMM) Variational\u2026","rel":"","context":"In &quot;Academics&quot;","block_context":{"text":"Academics","link":"https:\/\/www.techandtrain.com\/blog\/category\/academics\/"},"img":{"alt_text":"Image credit: www.Pixabay.com","src":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2026\/04\/DS-scaled.jpg?fit=1200%2C800&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2026\/04\/DS-scaled.jpg?fit=1200%2C800&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2026\/04\/DS-scaled.jpg?fit=1200%2C800&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2026\/04\/DS-scaled.jpg?fit=1200%2C800&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2026\/04\/DS-scaled.jpg?fit=1200%2C800&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":1426,"url":"https:\/\/www.techandtrain.com\/blog\/2025\/02\/top-100-mathematics-keywords-for-data-science-part-1\/","url_meta":{"origin":385,"position":4},"title":"Top 100 mathematics keywords for Data Science &#8211; Part 1","author":"Neil Harwani","date":"February 22, 2025","format":false,"excerpt":"Whoever is teaching you data science without teaching you Mathematics especially optimization is not teaching it right to you. 
That's my biggest learning from Master of Data Science at IIT Gandhinagar - it will take you good 2 years to learn the related mathematics in all four major areas below.\u2026","rel":"","context":"In &quot;Data Science&quot;","block_context":{"text":"Data Science","link":"https:\/\/www.techandtrain.com\/blog\/category\/data-science\/"},"img":{"alt_text":"Image credit: www.Pixabay.com","src":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2025\/02\/Data.jpg?fit=1200%2C801&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2025\/02\/Data.jpg?fit=1200%2C801&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2025\/02\/Data.jpg?fit=1200%2C801&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2025\/02\/Data.jpg?fit=1200%2C801&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2025\/02\/Data.jpg?fit=1200%2C801&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":1242,"url":"https:\/\/www.techandtrain.com\/blog\/2024\/04\/cyber-security-tips-for-portals-generated-by-chatgpt-gemini-part-1\/","url_meta":{"origin":385,"position":5},"title":"Cyber security tips for Portals &#8211; Generated by ChatGPT &amp; GEMINI &#8211; Part 1","author":"Neil Harwani","date":"April 8, 2024","format":false,"excerpt":"Cyber security is a critical concern for portal applications, which often serve as gateways to a wide range of resources and services. Here are some vital tips to enhance the cyber security posture of portal applications: 1. 
Use Strong Authentication Mechanisms: Implement multi-factor authentication (MFA) to add an extra layer\u2026","rel":"","context":"In &quot;Cyber Security&quot;","block_context":{"text":"Cyber Security","link":"https:\/\/www.techandtrain.com\/blog\/category\/cyber-security\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2024\/04\/CyberSecurity.png?fit=1200%2C675&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2024\/04\/CyberSecurity.png?fit=1200%2C675&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2024\/04\/CyberSecurity.png?fit=1200%2C675&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2024\/04\/CyberSecurity.png?fit=1200%2C675&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.techandtrain.com\/blog\/wp-content\/uploads\/2024\/04\/CyberSecurity.png?fit=1200%2C675&ssl=1&resize=1050%2C600 
3x"},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/posts\/385","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/comments?post=385"}],"version-history":[{"count":4,"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/posts\/385\/revisions"}],"predecessor-version":[{"id":391,"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/posts\/385\/revisions\/391"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/media\/386"}],"wp:attachment":[{"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/media?parent=385"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/categories?post=385"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.techandtrain.com\/blog\/wp-json\/wp\/v2\/tags?post=385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}