Fake News Prediction
1. id: Unique id for a news article
2. title: the title of a news article
3. author: author of the news article
4. text: the text of the article, could be incomplete
5. label: a label that marks whether the news article is real or fake
Here
1 means fake news
0 means real news
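As a rough sketch of how a dataset with these columns might be inspected, the snippet below builds a tiny in-memory DataFrame with the same five columns, fills the missing values, and counts the labels. The rows are made up for illustration and are not taken from the real dataset. Merging author and title into a content column is an assumption about how the text column used later could be built.

```python
import pandas as pd

# Hypothetical rows that mimic the dataset's columns; not real data.
news = pd.DataFrame({
    "id": [0, 1, 2],
    "title": ["Budget approved by council", None, "Aliens endorse candidate"],
    "author": ["Jane Roe", "John Doe", None],
    "text": ["The council voted on Monday...", "Full text here...", None],
    "label": [0, 0, 1],  # 1 = fake news, 0 = real news
})

# Replace missing values with empty strings so string operations do not fail.
news = news.fillna("")

# Assumption: combine author and title into a single text column.
news["content"] = (news["author"] + " " + news["title"]).str.strip()

# Count how many real (0) and fake (1) articles there are.
label_counts = news["label"].value_counts().to_dict()
print(label_counts)
```

Checking the label counts early shows whether the two classes are roughly balanced, which matters when the data is split and the model is evaluated.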
Stemming is the process of stripping affixes (mainly suffixes) from words so that they are reduced to a
simpler form called a stem. A stem does not have to be a dictionary word.
For example:
Words like history and historical are reduced to a stem such as histori. Words like finally, final, and
finalized are reduced to final. Similarly, go and goes are reduced toward go. The exact stem depends on the
stemming algorithm; some stemmers give slightly different results, such as histor for historical.
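A minimal sketch of stemming, assuming NLTK's PorterStemmer is the stemmer in use (the text above does not name one, so this choice is an assumption). The helper name stem_text is hypothetical.

```python
import re
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def stem_text(text):
    """Lowercase the text, keep letters only, and stem every word."""
    # Replace anything that is not a letter with a space, then split into words.
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    return " ".join(stemmer.stem(word) for word in words)

stemmed = stem_text("Finally, final, finalized!")
print(stemmed)
print(stemmer.stem("history"))
```

With the Porter stemmer, all three words in the first call should reduce to the same stem, which is the point of stemming: different forms of a word are counted as one feature.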
Stop words are words that carry little value in a sentence; removing them does not change its meaning. In short, here we remove words such as frequently repeated words, prepositions, and articles, which have little effect on a particular project or application.
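Stop-word removal can be sketched in a few lines. The word list below is a small hand-written sample used only for illustration; a real project would normally use a fuller list shipped with an NLP library. The names STOP_WORDS and remove_stop_words are hypothetical.

```python
# A tiny illustrative stop-word list, not a complete one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and", "it"}

def remove_stop_words(text):
    """Lowercase the text and drop every word found in STOP_WORDS."""
    words = text.lower().split()
    return " ".join(word for word in words if word not in STOP_WORDS)

cleaned = remove_stop_words("The mayor is in the city to open a bridge")
print(cleaned)  # mayor city open bridge
```

Storing the stop words in a set keeps each membership check fast, which helps when the same filter runs over thousands of articles.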
A machine learning model cannot work with raw text; it can only work with numerical data. The content column holds text data, so to use it for training we need to convert it into numerical form. To do this we will use the TF-IDF method.
Here we apply TF-IDF only to the X data, because the Y data is already in numerical form.
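A minimal sketch of this step, assuming scikit-learn's TfidfVectorizer with its default settings. The three short documents and their labels are made up for illustration; they stand in for the cleaned content column (X) and the label column (Y).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already-cleaned text (X) and labels (Y).
X = ["mayor open bridge", "alien endors candid", "council approv budget"]
Y = np.array([0, 1, 0])  # already numerical, so it is left unchanged

# Learn the vocabulary from X and turn each document into a TF-IDF vector.
vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(X)

print(X_tfidf.shape)  # (3, 9): 3 documents, 9 distinct words
```

The result is a sparse matrix with one row per document and one column per word in the vocabulary, which can be passed to a classifier together with Y.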