Overview
This week's assignment tests your understanding of basic linear regression concepts and their implementation in Python. You'll be building a simple linear regression model from scratch using gradient descent, evaluating its performance, and interpreting the results.
You'll submit a Jupyter notebook with your code and answers to the questions.
Learning Objectives
By completing this assignment, you will be able to:
- Implement gradient descent for linear regression from scratch
- Evaluate model performance using metrics like MSE and R-squared
- Interpret regression coefficients and their significance
- Visualize regression results and residuals
- Apply feature scaling and regularization techniques
Dataset
You'll be working with a synthetic dataset generated for this assignment. The dataset contains:
- X: a 2D NumPy array with 1000 samples and 3 features
- y: a 1D NumPy array with 1000 target values
The data has been generated with a known linear relationship plus some random noise, allowing you to validate your implementation.
Tasks
Task 1: Data Exploration and Preparation (20 points)
Load the dataset and perform basic exploratory data analysis.
- Load the provided data.npz file
- Examine the shape and basic statistics of the features and target
- Visualize the relationship between each feature and the target using scatter plots
- Check for missing values and outliers
- Split the data into training (70%) and test (30%) sets
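The Task 1 steps can be sketched as follows. This is a minimal outline, not the required solution: it generates a synthetic stand-in for the provided dataset (the array names X and y, the npz keys, and the true coefficients here are assumptions for illustration), then inspects shapes and statistics and performs the 70/30 split.

```python
import numpy as np

# Stand-in for the provided data.npz (1000 samples, 3 features).
# In the assignment you would instead load it, e.g.:
#   data = np.load("data.npz")  # key names depend on how the file was saved
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])          # hypothetical coefficients
y = X @ true_w + 3.0 + rng.normal(scale=0.5, size=1000)

# Shapes and per-feature statistics
print(X.shape, y.shape)
print(X.mean(axis=0), X.std(axis=0))

# 70/30 train/test split via a shuffled index
idx = rng.permutation(len(X))
split = int(0.7 * len(X))
train_idx, test_idx = idx[:split], idx[split:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```

Shuffling before splitting matters: if the file happens to be ordered, a plain slice would give train and test sets with different distributions.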
Task 2: Implement Linear Regression (40 points)
Implement a linear regression model using gradient descent.
- Add a bias term (column of ones) to your feature matrix
- Implement the cost function (mean squared error)
- Implement gradient descent to minimize the cost function
- Train your model on the training data
- Plot the learning curve (cost vs. iteration)
- Experiment with different learning rates and report your findings
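A minimal sketch of the Task 2 pieces, assuming the common 1/(2m) cost convention (your course may use 1/m; the learned weights are the same, only the cost scale differs). Function names and the demo data are illustrative, not prescribed.

```python
import numpy as np

def add_bias(X):
    # Prepend a column of ones so theta[0] acts as the intercept
    return np.hstack([np.ones((X.shape[0], 1)), X])

def mse_cost(Xb, y, theta):
    # Mean squared error with the 1/(2m) convention
    resid = Xb @ theta - y
    return (resid @ resid) / (2 * len(y))

def gradient_descent(Xb, y, lr=0.01, n_iters=1000):
    # Batch gradient descent; records the cost at each iteration
    m, n = Xb.shape
    theta = np.zeros(n)
    history = []
    for _ in range(n_iters):
        grad = Xb.T @ (Xb @ theta - y) / m   # vectorized gradient
        theta -= lr * grad
        history.append(mse_cost(Xb, y, theta))
    return theta, history

# Demo on toy 1D data with known intercept 3 and slope 2
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
theta, history = gradient_descent(add_bias(X), y, lr=0.1, n_iters=2000)
```

Plotting `history` against the iteration index gives the learning curve; with too large a learning rate the curve diverges instead of decreasing, which is exactly the behavior the learning-rate experiment should surface.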
Task 3: Model Evaluation (20 points)
Evaluate your trained model on the test set.
- Make predictions on the test set
- Calculate and report the following metrics:
  - Mean Squared Error (MSE)
  - Root Mean Squared Error (RMSE)
  - R-squared (coefficient of determination)
- Visualize the predictions vs. actual values
- Plot the residuals and check for patterns
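The three metrics can be computed in one helper; this is a sketch (the function name is illustrative) using the standard definitions: RMSE is the square root of MSE, and R-squared is 1 minus the ratio of residual to total sum of squares.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    # MSE, RMSE, and R^2 = 1 - SS_res / SS_tot
    mse = np.mean((y_true - y_pred) ** 2)
    rmse = np.sqrt(mse)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2

# Small worked example: one prediction off by 1
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 4.0])
mse, rmse, r2 = regression_metrics(y_true, y_pred)  # mse = 1/3, r2 = 0.5
```

Note that R-squared compares your model against the constant mean predictor, so it can be negative on the test set if the model is worse than that baseline.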
Task 4: Feature Scaling and Regularization (20 points)
Improve your model with feature scaling and regularization.
- Implement feature standardization (z-score normalization)
- Retrain your model with scaled features and compare results
- Implement ridge regression (L2 regularization)
- Experiment with different regularization strengths
- Discuss the effects of scaling and regularization on your model
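A sketch of the Task 4 pieces, under two common assumptions you should confirm against your course notes: standardization uses training-set statistics only (to avoid leaking test information), and ridge regression does not penalize the bias term. Names and the demo data are illustrative.

```python
import numpy as np

def standardize(X_train, X_test):
    # z-score using training statistics only, applied to both splits
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mu) / sigma, (X_test - mu) / sigma

def ridge_gradient_descent(Xb, y, lam=1.0, lr=0.01, n_iters=1000):
    # Gradient descent on MSE plus an L2 penalty on the non-bias weights
    m, n = Xb.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = Xb.T @ (Xb @ theta - y) / m
        grad[1:] += (lam / m) * theta[1:]   # bias (column 0) not penalized
        theta -= lr * grad
    return theta

# Demo: stronger regularization shrinks the learned weights
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=300)
X_s, _ = standardize(X, X)
Xb = np.hstack([np.ones((len(X_s), 1)), X_s])
theta_plain = ridge_gradient_descent(Xb, y, lam=0.0)
theta_ridge = ridge_gradient_descent(Xb, y, lam=100.0)
```

Scaling also speeds up gradient descent itself: when features share a common scale, the cost surface is better conditioned and a single learning rate works for all weights.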
Submission Requirements
Submit a single Jupyter notebook (.ipynb file) containing:
- All code for the implementation
- Visualizations and plots
- Answers to all questions in markdown cells
- Interpretation of your results
The notebook should be well-organized with clear section headings and comments.
Grading Rubric
| Component | Points | Criteria |
| --- | --- | --- |
| Task 1 | 20 | Complete data exploration, proper visualization, correct train/test split |
| Task 2 | 40 | Correct implementation, proper training, meaningful analysis of learning rates |
| Task 3 | 20 | Accurate evaluation metrics, appropriate visualizations, correct interpretation |
| Task 4 | 20 | Proper implementation of scaling and regularization, insightful analysis |
| Code Quality | Bonus | Clean, well-commented code with proper organization |
Tips
- Start early and ask questions if you get stuck
- Test each function independently before integrating
- Use vectorized operations for efficiency
- Document your thought process in markdown cells
- Compare your results with scikit-learn's implementation for validation
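One quick way to validate is against a closed-form reference. The sketch below uses NumPy's least-squares solver on synthetic data with assumed coefficients; scikit-learn's LinearRegression().fit(X, y) would recover the same intercept and weights, so either serves as a check on your gradient-descent result.

```python
import numpy as np

# Synthetic data with known coefficients (values chosen for illustration)
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=500)

# Closed-form least-squares fit: theta_ref[0] is the intercept
Xb = np.hstack([np.ones((len(X), 1)), X])
theta_ref, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(theta_ref)  # close to [3.0, 2.0, -1.0, 0.5]
```

If your gradient-descent coefficients differ noticeably from this reference after many iterations, suspect the learning rate, the gradient formula, or a missing bias column.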
Deadline
Due: Friday, October 27, 2023, 11:59 PM
Late submissions: 10% penalty per day, up to 3 days late