Machine Learning Day 2: Reading Different Dataset Formats Using Pandas

A short and crisp article on reading data for an ML model.

Hey folks, this is day 2 of my machine learning journey. In this article I will show you how to read different dataset formats using pandas.

DIFFERENT WAYS TO EXTRACT DATA FOR MACHINE LEARNING MODEL

There are mainly 5 ways to get data for machine learning models.

  1. using a CSV file
  2. using JSON
  3. using a DBMS
  4. using an API
  5. using web scraping

Let's discuss these methods one by one.

CSV FORMAT

CSV is the most common dataset format used to provide data to a machine learning model. CSV stands for comma-separated values.

CSV FILE EXAMPLE

csv_example.png

CODE

import numpy as np  # import the numpy library as np
import pandas as pd  # import the pandas library as pd
data = pd.read_csv("/IMDB_Top250Engmovies2_OMDB_Detailed.csv")  # read_csv is a function in the pandas library
data.head()  # data.head() displays the first 5 rows of the dataset

OUTPUT

ml screenshot.PNG
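
If you do not have the IMDB file at hand, you can try the same call on a small in-memory CSV. This is only a minimal sketch: the rows and column names below are made up for illustration and are not taken from the dataset above.

```python
import io

import pandas as pd

# Hypothetical sample data, made up for this sketch (not from the IMDB file).
csv_text = """title,year,rating
Movie A,1994,9.3
Movie B,1972,9.2
Movie C,2008,9.0
"""

# read_csv accepts a file path or a file-like object, so StringIO works too.
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())   # first rows of the table (up to 5 by default)
print(data.shape)    # (number of rows, number of columns)
print(data.dtypes)   # column types inferred by pandas
```

Checking shape and dtypes right after loading is a quick way to confirm that pandas split the columns the way you expected.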

TSV FORMAT

TSV is also a popular format in machine learning. TSV stands for tab-separated values: the values are separated by tabs instead of commas.

EXAMPLE OF TSV FILE

tsv_example.png

CODE

import pandas as pd  # import the pandas library
data = pd.read_csv('file.csv', sep='\t')  # read_csv also reads tab-separated files; sep='\t' sets the separator to a tab
data.head()  # display the first 5 rows
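
To see what the sep argument changes, here is a small sketch that reads the same tab-separated text with and without it. The names and values are hypothetical sample data, made up for this example.

```python
import io

import pandas as pd

# Hypothetical sample data; "\t" is a tab character between the fields.
tsv_text = "name\tcity\tscore\nAsha\tPune\t81\nRavi\tDelhi\t77\n"

# Same reader as for CSV, only the separator changes.
data = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(data.head())
print(data.shape)

# Without sep="\t", pandas splits on commas and finds none,
# so each line ends up in a single column.
wrong = pd.read_csv(io.StringIO(tsv_text))
print(wrong.shape)
```

If a freshly loaded table shows only one column, a wrong separator is one of the first things worth checking.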

JSON FILE FORMAT

The JSON file format is one of the most widely used file formats in the industry, so let's study some of its basics. JSON (JavaScript Object Notation) files are plain text files, so you can open them in any text editor.

EXAMPLE OF JSON FILE

JSON_FILE.jfif

CODE

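import numpy as np  # import the numpy library as np
import pandas as pd  # import the pandas library as pd
data = pd.read_json('restuarant_details.json')  # read_json loads a JSON file into a DataFrame
data.head()  # display the first 5 rows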

OUTPUT

json_example.PNG
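
To try read_json without the restaurant file, the sketch below reads a small JSON text from memory. The records are hypothetical and made up for this example. It assumes the JSON is a list of flat objects, which is the simplest shape for read_json.

```python
import io

import pandas as pd

# Hypothetical records, made up for this sketch.
json_text = """[
  {"name": "Cafe One", "city": "Pune", "rating": 4.2},
  {"name": "Cafe Two", "city": "Delhi", "rating": 3.9}
]"""

# A list of objects becomes one row per object and one column per key.
data = pd.read_json(io.StringIO(json_text))
print(data.head())
print(data.shape)
```

If the file is nested (objects inside objects), read_json may leave whole dictionaries inside a column. In that case pandas.json_normalize is one option for flattening the data.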

READING DATA THROUGH URL IN JSON

Sometimes the JSON file is too big to keep on our local machine, so we can read the data directly from the server by passing its URL.

CODE

import pandas as pd  # import the pandas library
import numpy as np  # import the numpy library
data = pd.read_json('file_url')  # read the JSON data with read_json; replace 'file_url' with the actual URL
data.head()  # display the first 5 rows using data.head()
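
The same idea can be wrapped in a small function. This is only a sketch: load_json_from_url is a hypothetical helper name, and the address in the commented call is a placeholder, not a real dataset. Keep in mind that pandas still loads the whole response into memory, so this saves disk space, not RAM.

```python
import pandas as pd


def load_json_from_url(url: str) -> pd.DataFrame:
    """Read a JSON document from a URL into a DataFrame.

    load_json_from_url is a hypothetical helper name used for this sketch.
    """
    # read_json accepts a URL string directly and fetches the content itself.
    return pd.read_json(url)


# Example call (placeholder URL, replace it with a real one):
# data = load_json_from_url("https://example.com/restaurants.json")
# print(data.head())
```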

I will cover the rest of the methods in a separate article.

For more information, read the pandas documentation: pandas.pydata.org/docs/user_guide/io.html