digikey parts slurper
fetch www.digikey.com/product-search/en?FV=
grep for catfilterlink
strip each matching line from its start up to and including the first double quote (")
strip each line from the next double quote (") through the end of the line, inclusive
this produces the list of category URLs, one per line
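The grep and strip steps above can be sketched in Python as follows. This is a minimal sketch, not the original tooling: the function name `extract_category_urls` and the sample markup are hypothetical, and it assumes the href is the first double-quoted value on each catfilterlink line, as the strip steps imply.

```python
def extract_category_urls(html: str, marker: str = "catfilterlink") -> list[str]:
    """Return the first double-quoted value on every line that contains marker."""
    urls = []
    for line in html.splitlines():
        if marker not in line:
            continue  # the "grep for catfilterlink" step
        first = line.find('"')
        if first == -1:
            continue  # no quoted value on this line
        second = line.find('"', first + 1)
        if second == -1:
            continue  # unterminated quote, skip the line
        # drop everything up to the first quote and from the second quote on
        urls.append(line[first + 1:second])
    return urls


# Hypothetical sample lines; the real page markup and paths may differ.
sample = (
    '<li><a href="/product-search/en/example-category/example-family/1" '
    'class="catfilterlink">Example A</a></li>\n'
    '<p>unrelated line</p>\n'
    '<li><a href="/product-search/en/example-category/example-family/2" '
    'class="catfilterlink">Example B</a></li>'
)

if __name__ == "__main__":
    for url in extract_category_urls(sample):
        print(url)
```

Lines without the marker or without a complete quoted value are skipped rather than raising, since a scraped page will usually contain both.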
grabbing FV's
we need the FV values to crawl each subsection. fetch every URL above with Results per Page set to 500; the CSV download is capped at 500 results per fetch, so there is no point setting it higher. from each page, grab the hidden FV input:
- <input type=hidden name=FV value=fff40000,fff80000>
also grab the total page count
- <a class="Last" href="/product-search/en/undefined-category/undefined-family/0/page/8">Last</a>
the trailing page/8 in the href is the total page count (8 here); pages are numbered from 1
store the FV value and the page count for each of the above URLs
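Pulling the FV value and page count out of a listing page could look like the sketch below. The regular expressions are an assumption based only on the two HTML fragments quoted above (attribute quotes are treated as optional), and `parse_listing` is a hypothetical helper name. A page without a Last link yields `None` for the page count rather than guessing.

```python
import re

# Hidden FV input, e.g. <input type=hidden name=FV value=fff40000,fff80000>
FV_RE = re.compile(
    r"""<input[^>]*\bname=["']?FV["']?[^>]*\bvalue=["']?([^"'\s>]+)""", re.I
)
# "Last" pager link; the number after /page/ is the total page count
LAST_RE = re.compile(
    r"""<a[^>]*\bclass=["']?Last["']?[^>]*\bhref=["']?[^"'\s>]*/page/(\d+)""", re.I
)


def parse_listing(html: str) -> dict:
    """Return {'fv': str or None, 'pages': int or None} for one listing page."""
    fv = FV_RE.search(html)
    last = LAST_RE.search(html)
    return {
        "fv": fv.group(1) if fv else None,
        "pages": int(last.group(1)) if last else None,
    }


# The two fragments quoted in these notes, joined into one sample page.
sample_page = (
    "<input type=hidden name=FV value=fff40000,fff80000>\n"
    '<a class="Last" href="/product-search/en/undefined-category/'
    'undefined-family/0/page/8">Last</a>'
)

if __name__ == "__main__":
    # Store one record per category URL (the key here is a placeholder).
    catalog = {"example-category-url": parse_listing(sample_page)}
    print(catalog)
```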
crawl individual pages
fetch each page with curl and a valid user agent. i used --user-agent "Chrome/1.0", but vary the string to avoid rate limiters.
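A rough Python equivalent of the crawl step, using only the standard library in place of curl. Several things here are assumptions and not confirmed by these notes: the `https://` scheme, the page URL layout (the Last href with its page number swapped, plus `?FV=` appended), and every user agent string except "Chrome/1.0". `page_urls` and `fetch` are hypothetical helper names.

```python
import itertools
import re
import urllib.request

BASE = "https://www.digikey.com"  # scheme is an assumption

# Only "Chrome/1.0" comes from the notes; the others are placeholders.
USER_AGENTS = itertools.cycle(["Chrome/1.0", "Chrome/2.0", "Chrome/3.0"])


def page_urls(last_href: str, fv: str, pages: int) -> list[str]:
    """Build one URL per page, 1..pages, from the Last link's href."""
    stem = re.sub(r"/page/\d+$", "", last_href)  # drop the trailing /page/N
    # Assumed layout: <stem>/page/<n>?FV=<fv>
    return [f"{BASE}{stem}/page/{n}?FV={fv}" for n in range(1, pages + 1)]


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """GET one page, taking the next user agent from the rotation."""
    req = urllib.request.Request(url, headers={"User-Agent": next(USER_AGENTS)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    urls = page_urls(
        "/product-search/en/undefined-category/undefined-family/0/page/8",
        "fff40000,fff80000",
        8,
    )
    for url in urls:
        print(url)  # call fetch(url) here to actually download
```

Cycling through a fixed list is the simplest way to vary the user agent; picking one at random per request would work just as well.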
projects/digikey_partsdb.1381593057.txt.gz · Last modified: 2013/10/12 08:50 by charliex