Hi Learners,
This thread is locked for this batch of learners.
Post your questions below.
Hello Jayanth,
We looked at dimensionality reduction during data pre-processing using PCA (Principal Component Analysis), and I have a question on this part.
Principal components are derived by feeding the independent columns into sklearn's PCA fit and transform in a Jupyter notebook. Cumulative variance lets us check the proportion of variance explained across the entire dataset once all the independent columns are fed into the PCA model. As I understood from class, the component-to-feature mapping can be identified using the eigenvalues.
My question is: do we have any technique in Python to find the relation between the components returned by the PCA model and the original features, so that we can drop the columns with a lower variance proportion?
Thank you.
Sheik
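On the question of mapping components back to features: a minimal sketch using sklearn's `components_` attribute, which holds the loading of every original feature on every component. The column names and data here are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical independent columns
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 4)),
                 columns=["age", "income", "spend", "tenure"])

# Standardize first; PCA is sensitive to scale
X_scaled = StandardScaler().fit_transform(X)

pca = PCA()
pca.fit(X_scaled)

# Rows = components, columns = original features; each entry (loading)
# tells how strongly a feature contributes to that component
loadings = pd.DataFrame(pca.components_,
                        columns=X.columns,
                        index=[f"PC{i+1}" for i in range(pca.n_components_)])
print(loadings)

# Proportion of variance explained by each component, largest first
print(pca.explained_variance_ratio_)
```

Note that after PCA you drop *components* (rows here), not original columns; the loadings table just tells you which features dominate each component.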
https://drive.google.com/open?id=1EaeClCrbtDJdocPdPMfQ7iaGBjnZgcts
Hi,
Please share the Drive link for the material discussed in the 1st April class.
Thanks
So the values of lambda in Ridge and Lasso aren't that important in themselves. What matters is the accuracy/RMSE we get after toggling between L1 and L2.

Jayant,
You explained to us that a regularization function (L1 or L2) can be applied to the cost function in the Decision Tree regression algorithm. What values would the weights take in the regularization function?
Thanks
Balaram
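On the earlier point about comparing RMSE rather than fixating on the lambda value itself, a quick sketch sweeping a few values (sklearn calls this parameter `alpha`, and the dataset here is synthetic):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data as a stand-in for any real dataset
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Sweep a few lambda values and compare test RMSE for L1 vs L2
for alpha in [0.01, 0.1, 1.0, 10.0]:
    for name, model in [("Ridge (L2)", Ridge(alpha=alpha)),
                        ("Lasso (L1)", Lasso(alpha=alpha))]:
        model.fit(X_train, y_train)
        rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
        print(f"{name:10s} alpha={alpha:<6} RMSE={rmse:.2f}")
```

The table this prints is the real decision aid: pick the alpha/penalty combination with the best held-out RMSE rather than reasoning about the lambda value in isolation.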
It's best to decide which columns to drop only after applying PCA. A column that might not seem interesting can turn out to be very important after transformation, and vice versa.
As part of pre-processing you could look into correlated variables (but that's just EDA). You might want to drop or leave out heavily correlated variables; e.g. 'batting strike rate' is heavily correlated with 'number_of_boundaries'.

Suppose we have a dataset and some independent variables are correlated. Do we need to remove the correlated variables first and then perform PCA?
How do we check for correlated variables? By p-value only? That is, do we simply find the p-values and, if a p-value is greater than 0.05, drop that variable?
Thanks
Mohammad Sharib khan
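On checking for correlated variables: a pairwise correlation matrix is the usual EDA tool here (p-values test significance, not redundancy). A small sketch with made-up cricket-style columns, two of which are deliberately correlated:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with two deliberately correlated columns
rng = np.random.default_rng(1)
base = rng.normal(50, 10, size=200)
df = pd.DataFrame({
    "batting_strike_rate": base + rng.normal(0, 2, size=200),
    "number_of_boundaries": base,
    "matches_played": rng.normal(30, 5, size=200),
})

# Pairwise Pearson correlations; |r| close to 1 signals redundancy
corr = df.corr()
print(corr.round(2))

# Flag pairs above a chosen threshold (0.9 is an arbitrary assumption)
high = [(a, b) for a in corr.columns for b in corr.columns
        if a < b and abs(corr.loc[a, b]) > 0.9]
print(high)
```

Note that PCA handles correlated inputs anyway (correlated features collapse into shared components), so removing them beforehand is optional, not required.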
Thank you Jayanth for your response. However, my actual question was: how do we know which columns to drop and which to keep after the PCA transformation? Thank you for helping out.
Ok Jay, thanks a lot. One final question: are the components returned by the PCA model in any particular order relative to the actual independent features?

Typically, you drop the components that don't contribute to the first ~90% of the cumulative variance.
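The 90% cumulative-variance rule above can be sketched as follows; the iris dataset is just a convenient stand-in for any standardized feature matrix:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Any standardized feature matrix works; iris is just a stand-in
X = StandardScaler().fit_transform(load_iris().data)

pca = PCA().fit(X)

# Components come out sorted by explained variance, largest first,
# so the cumulative sum is monotone increasing
cumvar = np.cumsum(pca.explained_variance_ratio_)
print(cumvar)

# Smallest number of leading components reaching ~90% cumulative variance
n_keep = int(np.searchsorted(cumvar, 0.90) + 1)
print(f"Keep {n_keep} of {pca.n_components_} components")

# sklearn can also do this directly: a float n_components is treated
# as a target fraction of explained variance
X_reduced = PCA(n_components=0.90).fit_transform(X)
print(X_reduced.shape)
```

This also answers the ordering question: the components are returned sorted by explained variance, not in any order tied to the original features.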