David Moscoso | Rebuilding a Publisher's Book Catalog

Rebuilding a Publisher's Book Catalog

Contents of a Publisher's Book Catalog

A publisher’s catalog contains all the data and metadata related to its books. For example:

ISBN
Title
Author
Dimensions
Pages
Summary
Translator
Illustrator
Type of Cover
Image Url
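
One way to hold these fields in code is a small record type. The sketch below is only an illustration: the class name `BookRecord`, the field names, the types, and the choice of which fields are optional are all assumptions, not taken from the project.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BookRecord:
    """One catalog row. Names and types here are hypothetical."""
    isbn: str
    title: str
    author: str
    dimensions: Optional[str] = None   # e.g. "14 x 21 cm", kept as text
    pages: Optional[int] = None
    summary: Optional[str] = None
    translator: Optional[str] = None   # not every book has one
    illustrator: Optional[str] = None  # not every book has one
    cover_type: Optional[str] = None
    image_url: Optional[str] = None


# Column order for a CSV export, derived from the record definition.
CSV_COLUMNS = [f.name for f in fields(BookRecord)]
```

Deriving the column list from the record keeps the CSV header and the data structure in sync if a field is added later.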

This publisher has all this information on its website, but it was uploaded manually and there is no single file that contains it.

Scraping the Data

To collect the data, I iterate over a list of ISBNs (International Standard Book Numbers) to find each book's exact page on the website. Then, using Beautiful Soup, I extract the data from each book's page. Finally, I write the information to a CSV file using pandas.
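
The loop described above might look roughly like the following minimal sketch. The URL pattern, the CSS selectors, and the function names (`parse_book`, `fetch_html`, `build_catalog`) are hypothetical, because the layout of the publisher's site is not described here. The actual project may be organized differently.

```python
from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical URL pattern and CSS selectors: adjust both to the real site.
BASE_URL = "https://publisher.example/book/{isbn}"
SELECTORS = {
    "title": "h1.book-title",
    "author": "span.author",
    "pages": "span.pages",
}


def parse_book(html: str, isbn: str) -> dict:
    """Extract one catalog row from the HTML of a single book page."""
    soup = BeautifulSoup(html, "html.parser")
    row = {"isbn": isbn}
    for field, selector in SELECTORS.items():
        node = soup.select_one(selector)
        # A missing element becomes None so the gap stays visible in the CSV.
        row[field] = node.get_text(strip=True) if node else None
    return row


def fetch_html(isbn: str) -> str:
    """Download the book page for one ISBN (requires network access)."""
    with urlopen(BASE_URL.format(isbn=isbn), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def build_catalog(isbns, fetch=fetch_html) -> pd.DataFrame:
    """Scrape every ISBN in the list and collect the rows in a DataFrame."""
    return pd.DataFrame([parse_book(fetch(isbn), isbn) for isbn in isbns])
```

With a real list of ISBNs, `build_catalog(isbns).to_csv("catalog.csv", index=False)` would write the single file the publisher is missing. Passing `fetch` as a parameter makes it possible to test the parsing on saved HTML without touching the network.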

If an ISBN or any of its data contains errors, the ISBN is added to a list of problem ISBNs to be checked manually.
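
As an illustration of how such a problem list could be built, the sketch below flags a row when its ISBN fails the standard ISBN-13 check-digit test or when a required field is empty. The names `is_valid_isbn13`, `split_rows`, and `REQUIRED_FIELDS`, and the choice of required fields, are assumptions. The project may use different criteria.

```python
# Hypothetical choice of fields that every row must contain.
REQUIRED_FIELDS = ("title", "author")


def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def split_rows(rows):
    """Separate clean rows from the ISBNs that need a manual check."""
    good_rows, problem_isbns = [], []
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if not row.get(f)]
        if missing or not is_valid_isbn13(row["isbn"]):
            problem_isbns.append(row["isbn"])
        else:
            good_rows.append(row)
    return good_rows, problem_isbns
```

Only the clean rows would then go into the CSV, while the problem ISBNs can be saved to a separate file for review.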

Project link: https://github.com/dfmoscoso23/bookcatalog

Data Science
Web Scraping
Catalog
Beautiful Soup
Databases
Selenium