Just Dial Data Extractor is an easy-to-use Python script that collects business listing data from the Just Dial website and saves it in .csv format.
The script extracts the following data:
- Phone number
- Name
- Address
Let’s now get our hands dirty!!
- Python version: I’m using Python 3, but feel free to use Python 2 by making slight adjustments. I’m working in Jupyter Notebook, so you don’t need any command-line knowledge.
- Packages: Install Selenium, together with the webdriver_manager and pandas packages that the script also imports, using the following command.
!pip install selenium webdriver-manager pandas
That’s it! You are all set.
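Before moving on, it can help to confirm that the notebook can actually see the packages the script imports. The following is a minimal sketch using only the standard library. The helper name missing_packages is made up for this example, and the package list is taken from the imports used later in the script.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be imported here."""
    # find_spec() returns None when a top-level module is not installed
    return [name for name in names if find_spec(name) is None]


# Import names used by the script (the helper itself is hypothetical)
required = ["selenium", "webdriver_manager", "pandas"]
missing = missing_packages(required)
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported as missing, rerun the pip command in a notebook cell and try again.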
Approach:
- Import the following modules: webdriver from selenium, ChromeDriverManager, pandas, time and os.
- Use the driver.get() method and pass the link you want to get information from.
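- Use the driver.find_elements() method with By.CLASS_NAME and pass ‘store-details’ to collect every listing on the page. (Selenium releases before 4.3 spell this driver.find_elements_by_class_name().)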
- Instantiate empty lists to store the values.
- Iterate over storeDetails and fetch the individual details that are required.
- Create a user-defined function strings_to_num() to convert the codes extracted from the class names into the characters of the phone number.
- Display the details and save them as a CSV file according to the requirements.
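The decoding step is the least obvious part of the approach, so here is a browser-free sketch of it. Judging from the script, each character of a phone number appears on the page as an element whose class attribute ends in a short code, and that code is looked up in a table. The mapping below is copied from the script; the function name decode_phone and the sample class strings (the icon- prefix in particular) are made up for illustration.

```python
# Code-to-character table, copied from the script's strings_to_num()
CODE_TO_CHAR = {
    'dc': '+', 'fe': '(', 'hg': ')', 'ba': '-',
    'acb': '0', 'yz': '1', 'wx': '2', 'vu': '3', 'ts': '4',
    'rq': '5', 'po': '6', 'nm': '7', 'lk': '8', 'ji': '9',
}


def decode_phone(class_attributes):
    """Turn a list of class-attribute strings into a phone-number string."""
    chars = []
    for class_attr in class_attributes:
        # The code is the piece after the first hyphen, e.g. 'yz' in 'mobilesv icon-yz'
        code = class_attr.split("-")[1]
        # Unknown codes become '?' so gaps stay visible (an assumption of this
        # sketch; the script itself inserts the word "nothing")
        chars.append(CODE_TO_CHAR.get(code, "?"))
    return "".join(chars)


# Hypothetical class attributes, one per character of the number
sample = ["mobilesv icon-dc", "mobilesv icon-ji", "mobilesv icon-yz"]
print(decode_phone(sample))  # +91
```

Trying the lookup on a few hand-written strings like this makes it easier to spot a changed or missing code before running the full scrape.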
# code
import os
import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 style: the driver path is passed through a Service object.
# (Selenium 3 accepted webdriver.Chrome(ChromeDriverManager().install()).)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

# driver.get() navigates to the page at the given URL
driver.get("https://www.justdial.com/Delhi/Ceiling-Tile-Dealers-Armstrong/nct-11271379")


def strings_to_num(argument):
    # Map the code taken from an element's class name to the character it stands for
    switcher = {
        'dc': '+',
        'fe': '(',
        'hg': ')',
        'ba': '-',
        'acb': '0',
        'yz': '1',
        'wx': '2',
        'vu': '3',
        'ts': '4',
        'rq': '5',
        'po': '6',
        'nm': '7',
        'lk': '8',
        'ji': '9'
    }
    return switcher.get(argument, "nothing")


# Each listing sits in an element with the class 'store-details'.
# find_elements(By.CLASS_NAME, ...) replaces find_elements_by_class_name(),
# which was removed in Selenium 4.3.
storeDetails = driver.find_elements(By.CLASS_NAME, 'store-details')

nameList = []
addressList = []
numbersList = []

for store in storeDetails:
    name = store.find_element(By.CLASS_NAME, 'lng_cont_name').text
    address = store.find_element(By.CLASS_NAME, 'cont_sw_addr').text
    contactList = store.find_elements(By.CLASS_NAME, 'mobilesv')

    # Decode the phone number one character at a time
    myList = []
    for contact in contactList:
        myString = contact.get_attribute('class').split("-")[1]
        myList.append(strings_to_num(myString))

    nameList.append(name)
    addressList.append(address)
    numbersList.append("".join(myList))

# Initialise the data from the lists
data = {'Company Name': nameList,
        'Address': addressList,
        'Phone': numbersList}

# Create DataFrame
df = pd.DataFrame(data)
print(df)

# Save data as .csv (appends to the file and writes no header row)
df.to_csv('demo1.csv', mode='a', header=False)
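One thing to watch in the save step: with mode='a' and header=False, the file never gets a header row, and every rerun appends the same rows again. Here is a minimal sketch of one way around the missing header, writing it only when the file is first created (it does not remove duplicate rows). It uses the standard-library csv module in place of pandas so that it runs without extra packages; the function name append_rows, the file name, and the sample row are made up for this example.

```python
import csv
import os
import tempfile


def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV file, writing the header only for a new file."""
    is_new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new_file:
            writer.writeheader()
        writer.writerows(rows)


# Column names match the DataFrame in the script; the row itself is invented
columns = ["Company Name", "Address", "Phone"]
row = {"Company Name": "Example Traders", "Address": "Delhi", "Phone": "+911234567890"}

# Write twice into a temporary folder to show the header appears only once
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_sketch.csv")
    append_rows(path, columns, [row])
    append_rows(path, columns, [row])
    with open(path, newline="", encoding="utf-8") as handle:
        lines = handle.read().splitlines()

print(lines)
```

With pandas, the same idea can probably be expressed by passing header=not os.path.exists('demo1.csv') to df.to_csv(), keeping mode='a'.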
- No need to download ChromeDriver manually; ChromeDriverManager installs it for you.
- Feel free to modify the code. Try to:
- Extract the rating and other details.
- Code Link : https://github.com/alokm014/WhatsApp-Automation-Selenium
Comment below about your experience! There is no limit to scraping and the above is just an example to get you guys started. So happy coding!