Data Science Methodology Class 12 Questions And Answers

Data Science Methodology Class 12 Questions and Answers – The CBSE has updated the syllabus for Class XII (Code 843). These NCERT solutions and questions and answers are based on the updated CBSE textbook. All the important information is taken from the Artificial Intelligence Class XII textbook, based on the CBSE Board pattern.

Data Science Methodology Class 12 Questions and Answers

1. How many steps are there in Data Science Methodology? Name them in order.

Answer: Data Science Methodology is a process with a prescribed sequence of iterative steps that data scientists follow to approach a problem and find a solution. It was introduced by John Rollins, a data scientist at IBM Analytics, and consists of 10 steps:

Business Understanding
Analytic approach
Data requirements
Data collection
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Feedback

2. What do you mean by Feature Engineering?

Answer: Feature Engineering is a part of Data Preparation. Feature engineering is the process of selecting, modifying, or creating new features (variables) from raw data to improve the performance of machine learning models.
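
As a minimal sketch of the idea, the plain-Python snippet below creates one new feature from two raw ones. The records, the field names (height_cm, weight_kg) and the derived bmi feature are assumptions chosen for illustration; they do not come from the textbook.

```python
# Hypothetical raw records; the field names are assumptions for illustration.
raw_records = [
    {"height_cm": 170, "weight_kg": 65},
    {"height_cm": 160, "weight_kg": 64},
]

def add_bmi(record):
    """Return a copy of the record with a new, derived 'bmi' feature."""
    height_m = record["height_cm"] / 100
    enriched = dict(record)  # copy, so the raw record stays unchanged
    enriched["bmi"] = round(record["weight_kg"] / height_m ** 2, 1)
    return enriched

features = [add_bmi(r) for r in raw_records]
print(features)
```

The model can now be given the single bmi value, which may carry more signal than height and weight taken separately.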

3. Data is collected from different sources. Explain the different types of sources with examples.

Answer: Data collection is a systematic process of gathering observations or measurements. There are mainly two sources of data collection:

Primary data source: A primary data source is the original source of data, where the data is collected firsthand through direct observation, experimentation, surveys, interviews, or other methods.
Secondary data source: A secondary data source is data that is already stored and ready for use, for example data given in books, journals, websites, internal transactional databases, etc.

4. Which step of Data Science Methodology is related to constructing the data set? Explain.

Answer: Data Understanding encompasses all activities related to constructing the dataset. In this stage, we check whether the data collected represents the problem to be solved or not. Techniques such as descriptive statistics and visualization can be applied to the dataset, to assess the content, quality, and initial insights about the data.
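
The descriptive statistics mentioned above can be sketched with Python's built-in statistics module. The ages column and its values are hypothetical; None is assumed to mark a missing entry.

```python
import statistics

# Hypothetical column of collected values; None marks a missing entry.
ages = [21, 25, 25, None, 30, 41, 25, None]

# Keep only the entries that are actually present.
present = [a for a in ages if a is not None]

summary = {
    "count": len(ages),
    "missing": len(ages) - len(present),
    "mean": statistics.mean(present),
    "median": statistics.median(present),
    "mode": statistics.mode(present),
    "min": min(present),
    "max": max(present),
}
print(summary)
```

A summary like this gives a first view of content (typical values) and quality (how much is missing) before any modelling starts.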

5. Write a short note on the steps done during Data Preparation.

Answer: Data preparation is an important step in Data Science Methodology that ensures the data is transformed into a state where it is easier to work with. Data preparation includes:

Cleaning the data
Combining data from multiple sources
Transforming data into meaningful input variables
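
The three activities listed above can be sketched in plain Python as follows. The two data sources (a marks list and an attendance lookup), the student ids and the derived marks_fraction variable are all assumptions made for illustration.

```python
# Hypothetical data from two sources, keyed by student id.
marks = [
    {"id": 1, "marks": 78},
    {"id": 2, "marks": None},   # missing value
    {"id": 3, "marks": 91},
    {"id": 3, "marks": 91},     # repeated row
]
attendance = {1: 0.92, 2: 0.85, 3: 0.60}

# 1. Clean: drop rows with a missing value and rows whose id was already seen.
seen = set()
clean = []
for row in marks:
    if row["marks"] is None or row["id"] in seen:
        continue
    seen.add(row["id"])
    clean.append(row)

# 2. Combine: attach the attendance value from the second source.
combined = [{**row, "attendance": attendance[row["id"]]} for row in clean]

# 3. Transform: express marks as a fraction of 100 for use as a model input.
prepared = [{**row, "marks_fraction": row["marks"] / 100} for row in combined]
print(prepared)
```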

6. Differentiate between descriptive modelling and predictive modelling.

Answer:

Descriptive Modeling: It is a concept in data science and statistics that focuses on summarizing and understanding the characteristics of a dataset without making predictions or decisions. The goal of descriptive modeling is to describe the data rather than predict or make decisions based on it.
Predictive modeling: It involves using data and statistical algorithms to identify patterns and trends in order to predict future outcomes or values. It relies on historical data and uses it to create a model that can predict future behavior or trends or forecast what might happen next.
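
To make the contrast concrete, here is a small sketch on the same hypothetical data (hours studied and marks obtained). The descriptive part only summarises the history; the predictive part fits a straight line by least squares, which is one simple choice of model assumed here for illustration, and uses it to estimate an unseen case.

```python
import statistics

# Hypothetical history: hours studied and the marks obtained.
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]

# Descriptive: summarise what the data looks like.
print("average marks:", statistics.mean(marks))
print("spread (std dev):", round(statistics.stdev(marks), 2))

# Predictive: fit a straight line to the history by least squares.
mean_x, mean_y = statistics.mean(hours), statistics.mean(marks)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks)) / sum(
    (x - mean_x) ** 2 for x in hours
)
intercept = mean_y - slope * mean_x

def predict(h):
    """Estimate marks for a number of study hours not seen in the history."""
    return intercept + slope * h

print("predicted marks for 6 hours:", round(predict(6), 1))
```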

7. Explain the different metrics used for evaluating Classification models.

Answer: Evaluation metrics help assess the performance of a trained model on a test dataset, providing insights into its strengths and weaknesses. The following metrics are commonly used for evaluating classification models:

Confusion Matrix
Accuracy
Precision and Recall
F-Score
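
The metrics listed above can all be derived from the four counts of a binary confusion matrix. The sketch below uses hypothetical labels and predictions, and assumes at least one positive is both present and predicted so that no division by zero occurs.

```python
# Hypothetical true labels and model predictions (1 = positive, 0 = negative).
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

# Rows are actual classes, columns are predicted classes.
confusion_matrix = [[tn, fp],
                    [fn, tp]]

accuracy = (tp + tn) / len(pairs)       # share of all predictions that are correct
precision = tp / (tp + fp)              # share of predicted positives that are correct
recall = tp / (tp + fn)                 # share of actual positives that were found
f_score = 2 * precision * recall / (precision + recall)  # harmonic mean (F1)

print(confusion_matrix)
print(accuracy, precision, recall, f_score)
```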

8. Is Feedback a necessary step in Data Science Methodology? Justify your answer.

Answer: Yes. Feedback from users and clients is necessary because it helps to understand and observe how the model works in the deployed environment. Based on this feedback the model is refined, and the process continues until the model provides satisfactory and acceptable results.

9. Write a comparative study on train-test split and cross validation.

Answer:

Train-Test Split	Cross Validation
Normally applied on large datasets	Normally applied on small datasets
Divides the data into training data set and testing dataset.	Divides a dataset into subsets (folds), trains the model on some folds, and evaluates its performance on the remaining data.
Clear demarcation on training data and testing data.	Every data point at some stage could be in either testing or training data set.

10. Why is model validation important?

Answer: Model validation offers a systematic approach to measuring a model's accuracy and reliability, providing insights into how well it generalizes to new, unseen data. The benefits of model validation include:

Enhancing the model quality
Reducing the risk of errors
Helping to detect and prevent overfitting and underfitting

11. Explain the procedure of k-fold cross validation with suitable diagram.

Answer: Cross validation is a technique used to evaluate a model’s performance. It splits the data into multiple parts, or folds, trains the model on some folds, tests it on the remaining folds, and repeats this process a number of times fixed by the data scientist.

In k-fold cross validation we work with k subsets of the dataset. For example, if we divide the data into 5 folds or 5 pieces, as shown in the image below, each being 20% of the full dataset, then k=5.
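
A minimal sketch of how the folds can be formed is given below. The helper name k_fold_indices is hypothetical, and the sketch keeps the rows in their original order; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Split the indices 0..n_samples-1 into k consecutive folds."""
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `remainder` folds take one extra item.
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

folds = k_fold_indices(n_samples=10, k=5)
for i, test_fold in enumerate(folds):
    # Every fold except the current one is used for training.
    train = [idx for fold in folds if fold is not test_fold for idx in fold]
    print(f"round {i + 1}: test={test_fold} train={train}")
```

With 10 rows and k=5, each fold holds 2 rows (20% of the data) and each row appears in the test fold exactly once.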

12. Data is the main part of any project. How will you find the requirements of data, collect it, understand the data and prepare it for modelling?

Answer: Data is important for any machine learning project. It passes through several stages before modelling: the requirements are identified, the data is collected, then it is explored and understood, and finally it is cleaned and prepared for modelling.

Stage 1: Understanding data requirements

Before collecting the data, you first have to understand the problem that you want to solve. You also have to identify whether the data needed is structured or unstructured, decide how much data is needed, and ensure compliance with privacy laws.

Stage 2: Data Collection

Before collecting the data, we have to ensure that the data will be complete, that it is not biased or noisy, and that privacy and ethical issues are addressed. Two types of data can be collected: (a) Primary data, which is collected firsthand, for example through surveys, interviews, experiments, and direct observation. (b) Secondary data, which already exists, for example in research reports, previous projects, open datasets, websites, and databases.

Stage 3: Understanding and Exploring Data

To gain insights from the data, you first have to understand it. This means analysing the data using statistics (mean, median, and mode), visualising it, finding relationships between variables, and identifying and handling missing or duplicate values.

Stage 4: Data Preparation for Modelling

Once the data is cleaned, it should be transformed into a form suitable for machine learning using feature engineering, transformation, and data splitting. Feature engineering creates new features from existing ones. Transformation rescales the features, for example with MinMaxScaler or StandardScaler. Splitting divides the data into training and testing data; a common choice is to use 80% of the data for training and 20% for testing. K-fold cross-validation gives a more reliable estimate of how well the model generalises by testing it on several different folds of the data.
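
The scaling and splitting steps of this stage can be sketched with the Python standard library alone. The arithmetic is intended to mirror what scikit-learn's MinMaxScaler and StandardScaler compute, but this is a hand-written illustration, not those classes; the feature values and the seed 42 are assumptions.

```python
import random
import statistics

values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # hypothetical feature

# Min-max scaling: map values onto the range [0, 1].
lo, hi = min(values), max(values)
min_max_scaled = [(v - lo) / (hi - lo) for v in values]

# Standardisation: shift to mean 0 and scale to standard deviation 1
# (population standard deviation).
mean = statistics.mean(values)
std = statistics.pstdev(values)
standardised = [(v - mean) / std for v in values]

# 80/20 train-test split after shuffling the row numbers with a fixed seed.
rows = list(range(len(values)))
random.Random(42).shuffle(rows)
cut = int(0.8 * len(rows))
train_rows, test_rows = rows[:cut], rows[cut:]
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```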

13. Can MSE be a negative value? Give reasons.

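Answer: No, the MSE value cannot be negative. The difference between each predicted and actual value is squared before averaging, and a squared number is never negative. As a result, every term is either positive or zero, so the MSE is always greater than or equal to zero.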
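
A short sketch of the calculation shows why: each error is squared before averaging, so no term can fall below zero. The function name and the sample values are chosen for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [3.0, 5.0, 2.0]
predicted = [4.0, 3.0, 2.0]   # errors: -1, +2, 0 -> squares: 1, 4, 0

print(mean_squared_error(actual, predicted))
print(mean_squared_error(actual, actual))   # perfect predictions give 0.0
```

The negative error (-1) and the positive error (+2) both contribute positive squares, and the smallest possible result is 0.0 for perfect predictions.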

14. Imagine that you want to create your first app. Create a list of questions you would develop to decompose this task.

Answer: To decompose this task, you would need to know the answer to a series of smaller problems:

What kind of app do you want to create?
What will your app look like?
Who is the target audience for your app?
What will the graphics look like?
What audio will you include?

15. Differentiate between training set and test set.

Answer: A training set is a set of historical data in which the outcomes are already known; it is used to fit the machine learning model. A test set is a separate portion of the data that the model has not seen during training; it is used to evaluate the fitted machine learning model.

16. List the considerations which data scientists have to keep in mind during the testing stage.

Answer:

Considerations:

The volume of test data can be large, which presents complexities.
Human biases in selecting test data can adversely impact the testing phase; therefore, data validation is important.
Your testing team should test the AI and ML algorithms keeping model validation, successful learnability, and algorithm effectiveness in mind. Regulatory compliance testing and security testing are important since the system might deal with sensitive data. Moreover, the large volume of data makes performance testing crucial.

17. Explain the cross-validation procedure. In which situation is it better than a train-test split?

Answer: In cross-validation, we run our modeling process on different subsets of the data to get multiple measures of model quality. In k-fold cross-validation, the original dataset is divided into k equal subparts or folds. In each iteration, one fold is selected as test data and the remaining (k-1) folds are selected as training data. This process is repeated k times, so that every fold is used once for testing. The final accuracy of the model is calculated by taking the mean of the k accuracies. When the dataset is small, cross-validation should be preferred over a single train-test split, because every data point is used for both training and testing, which gives a more reliable estimate of accuracy.
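
The procedure can be sketched end to end in plain Python. The model used here is a 1-nearest-neighbour classifier, chosen only because it is short to write; the data points, labels and function names are hypothetical, and the sketch assumes the number of rows is divisible by k.

```python
def nearest_neighbour_predict(train, x):
    """Label of the training point whose feature is closest to x (1-NN)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def cross_validate(data, k):
    """Return the accuracy obtained in each of the k rounds."""
    fold_size = len(data) // k          # assumes len(data) is divisible by k
    scores = []
    for i in range(k):
        # Fold i is the test data; the other k-1 folds are the training data.
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        correct = sum(nearest_neighbour_predict(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores

# Hypothetical (feature, label) pairs, already in mixed order.
data = [(1.0, "low"), (9.0, "high"), (2.0, "low"), (8.0, "high"),
        (3.0, "low"), (7.0, "high"), (5.6, "low"), (5.2, "high"),
        (1.5, "low"), (8.5, "high")]

scores = cross_validate(data, k=5)
print("accuracy per fold:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

The per-fold scores differ because one fold contains a point that is hard to classify; averaging over all folds smooths out that dependence on a single split.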

Disclaimer: We have made an effort to provide you with an accurate handout of “Data Science Methodology Class 12 Questions and Answers”. If you feel that there is any error or mistake, please contact me at anuraganand2017@gmail.com. The CBSE study material presented on our website is for educational purposes and is not our copyright. All the above content and screenshots are taken from the Artificial Intelligence Class 12 CBSE Textbook, Sample Paper, Old Sample Paper, Board Paper and Support Material available on the CBSEACADEMIC website. The Textbook and Support Material are copyrighted by the Central Board of Secondary Education. We are only providing a medium to help students improve their performance in the examination.

Images and content shown above are the property of individual organizations and are used here for reference purposes only.

For more information, refer to the official CBSE textbooks available at cbseacademic.nic.in

cbseskilleducation
