A publisher’s catalog contains all the data and metadata related to its books. For example:
This publisher has all of this information on its website, but it was uploaded manually and there is no single file that contains it.
To collect the data, I iterate over a list of ISBNs (International Standard Book Numbers) to find each book's page on the website. Then, using Beautiful Soup, I extract the data from each book's page. Finally, I write the information to a CSV file using pandas.
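The extraction and CSV steps could look roughly like the sketch below. It is only an illustration: the sample HTML, the CSS selectors, the field names, and the helper functions `text_of`, `parse_book_page`, and `build_catalog` are all assumptions, since the real site's markup is not shown here. Fetching the pages is left out; the sketch starts from HTML that has already been downloaded.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML for one book page; the real site's markup will differ.
SAMPLE_PAGE = """
<html><body>
  <h1 class="title">Example Title</h1>
  <span class="author">Jane Doe</span>
  <span class="price">12.50</span>
</body></html>
"""


def text_of(soup, selector):
    """Return the stripped text of the first match for a CSS selector, or None."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_book_page(isbn, html):
    """Extract one row of catalog data from a single book page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "isbn": isbn,
        "title": text_of(soup, "h1.title"),      # assumed selector
        "author": text_of(soup, "span.author"),  # assumed selector
        "price": text_of(soup, "span.price"),    # assumed selector
    }


def build_catalog(pages):
    """Turn a mapping of ISBN -> page HTML into a DataFrame, one row per book."""
    rows = [parse_book_page(isbn, html) for isbn, html in pages.items()]
    return pd.DataFrame(rows)


pages = {"9780306406157": SAMPLE_PAGE}
catalog = build_catalog(pages)

# With no path argument, to_csv returns the CSV as a string;
# pass a filename such as "catalog.csv" to write a file instead.
csv_text = catalog.to_csv(index=False)
```

Keeping the ISBN as a string avoids losing leading zeros or hyphens when the CSV is read back later.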
If an ISBN or its data contains errors, the ISBN is added to a list of problem ISBNs to be checked manually.
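One way to build that list of problem ISBNs is sketched below. The checks are assumptions rather than the project's actual rules: an ISBN is treated as a problem if it fails the standard ISBN-13 check-digit test, and a record is treated as a problem if any of its fields is missing or empty. The function names `is_valid_isbn13` and `triage` are hypothetical.

```python
def is_valid_isbn13(isbn):
    """Check the ISBN-13 check digit: weights alternate 1, 3 and the sum must be divisible by 10."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def triage(records):
    """Split scraped records into clean rows and a list of ISBNs to review by hand."""
    clean, problem_isbns = [], []
    for record in records:
        # A field counts as missing if it is None or an empty string.
        has_missing_field = any(value in (None, "") for value in record.values())
        if is_valid_isbn13(record["isbn"]) and not has_missing_field:
            clean.append(record)
        else:
            problem_isbns.append(record["isbn"])
    return clean, problem_isbns


records = [
    {"isbn": "978-0-306-40615-7", "title": "Example Title"},  # valid and complete
    {"isbn": "9780306406158", "title": "Wrong Check Digit"},  # fails the checksum
    {"isbn": "9780306406157", "title": None},                 # missing title
]
clean, problem_isbns = triage(records)
```

Only the clean rows would go on to the CSV; the problem ISBNs can be saved separately for manual review.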
Project link: https://github.com/dfmoscoso23/bookcatalog