projects:digikey

digikey parts slurper

fetch www.digikey.com/product-search/en?FV=

grep for catfilterlink

remove everything from the beginning of the line up to and including the first "

remove everything from the closing " to the end of the line

this produces the list of category urls
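The two trimming steps above amount to keeping whatever sits between the first pair of double quotes on each matching line. A minimal C sketch, assuming the URL is the first quoted attribute on the line; `extract_quoted` is a hypothetical helper name, not part of any existing tool:

```c
#include <stddef.h>
#include <string.h>

/* Copy the text between the first two double quotes of `line` into `out`.
   Returns 0 on success, -1 if the line has fewer than two quotes or the
   result does not fit in `out`. Assumes the URL is the first quoted value. */
int extract_quoted(const char *line, char *out, size_t outsize)
{
	const char *start = strchr(line, '"');
	if (start == NULL)
		return -1;
	start++;                          /* step past the opening quote */

	const char *end = strchr(start, '"');
	if (end == NULL)
		return -1;

	size_t len = (size_t)(end - start);
	if (len + 1 > outsize)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

Feed it each line that grep matched; lines without a quoted value are reported with -1 so they can be skipped.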

grabbing FV's

We need the FV values to crawl each subsection. Fetch each of the above URLs with Results per Page set to 500; the CSV download is capped at 500 results per fetch, so there is no point in going higher.

<input type=hidden name=FV value=fff40000,fff80000>
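Pulling the FV value out of that hidden input can be sketched in C as below. It assumes the attribute order shown above (`name=FV value=...`) and an optionally quoted value; `extract_fv` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Find "name=FV value=" in `line` and copy the value that follows it,
   stopping at the next quote, '>' or space.
   Returns 0 on success, -1 if the marker is missing, the value is empty,
   or it does not fit in `out`. */
int extract_fv(const char *line, char *out, size_t outsize)
{
	static const char marker[] = "name=FV value=";
	const char *p = strstr(line, marker);
	if (p == NULL)
		return -1;
	p += sizeof marker - 1;           /* move to the start of the value */

	if (*p == '"')                    /* tolerate a quoted value */
		p++;

	size_t len = strcspn(p, "\"> ");
	if (len == 0 || len + 1 > outsize)
		return -1;

	memcpy(out, p, len);
	out[len] = '\0';
	return 0;
}
```

For the sample line above this yields `fff40000,fff80000`, which is the string that later goes into the `FV=` query parameter with the comma encoded as `%2C`.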

also grab the total page count

<a class="Last" href="/product-search/en/undefined-category/undefined-family/0/page/8">Last</a>

The 8 in page/8 is the total page count; pages are numbered from 1.
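Reading the page count out of the "Last" link could look like the following C sketch. It assumes the link always contains `/page/` followed by a decimal number, as in the sample above; `last_page_count` is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Return the number that follows "/page/" in the "Last" link,
   or 0 if the marker is not present (0 is never a valid page count,
   since pages are numbered from 1). */
unsigned long last_page_count(const char *link)
{
	static const char marker[] = "/page/";
	const char *p = strstr(link, marker);
	if (p == NULL)
		return 0;
	/* strtoul stops at the first non-digit, e.g. the closing quote */
	return strtoul(p + sizeof marker - 1, NULL, 10);
}
```

A return value of 0 can be treated as "no Last link found", which is presumably what a single-page category looks like; that case is an assumption and worth checking against a real page.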

grab the FV value and page count, and store them for each of the above URLs

crawl individual pages

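Use curl with a valid user agent. I used --user-agent "Chrome/1.0", but vary it to avoid rate limiters. In the command below, %1 is the page number passed as the first batch-file argument; inside a batch file the literal %2C has to be written as %%2C so that cmd does not expand %2.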

curl.exe -o page%1.csv -L -v -G "http://www.digikey.com/product-search/download.csv?FV=fff40008%2Cfff801b9&mnonly=0&newproducts=0&ColumnSort=0&page=%1&stock=0&pbfree=0&rohs=0&quantity=0&ptm=0&fid=0&pageSize=500" --digest --user-agent "Chrome/1.0"
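To crawl every page of a category, the same URL can be generated once per page from the stored FV value and page count. A minimal C sketch, assuming the FV value is kept as its two comma-separated halves; `build_csv_url` is a hypothetical helper name and the query string is copied from the curl command above:

```c
#include <stddef.h>
#include <stdio.h>

/* Write the CSV download URL for one page into `out`.
   fv_a and fv_b are the two halves of the FV value; the comma between
   them is sent URL-encoded as %2C. Returns 0 on success, -1 if the URL
   does not fit in `out`. */
int build_csv_url(char *out, size_t outsize,
                  const char *fv_a, const char *fv_b, unsigned long page)
{
	int n = snprintf(out, outsize,
		"http://www.digikey.com/product-search/download.csv"
		"?FV=%s%%2C%s&mnonly=0&newproducts=0&ColumnSort=0&page=%lu"
		"&stock=0&pbfree=0&rohs=0&quantity=0&ptm=0&fid=0&pageSize=500",
		fv_a, fv_b, page);

	return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

Calling it in a loop from 1 up to the stored page count gives one URL per curl invocation.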

The response has 4 bytes at the front that we don't want, so strip them with a simple byteskip program:

 

#include <stdio.h>
#include <stdlib.h>

int main(int argc,char*argv[])
{
	FILE *fp,*ofp;

	if( argc < 4 ) {
		fprintf(stderr,"%s usage : infile outfile offset\n",argv[0]);
		exit(-1);
	}

	fp =fopen( argv[1],  "rb");
	if( fp == NULL ) {
		fprintf(stderr,"Couldnt open input file %s\n",argv[1]);
		exit(-2);
	}

	unsigned long length ;

	fseek(fp,0,SEEK_END);

	length = ftell( fp ) ;


	if( length == 0 ) {
		
		fclose( fp );

		fprintf(stderr,"zero length file %s\n",argv[1]);
		exit(-3);
	}

	unsigned long offset;

	//skip offset
	offset = strtoul (argv[3], NULL, 0);

	if( offset >= length ){
		
		fclose( fp );

		fprintf(stderr,"offset is outside file length %s at %lu\n",argv[1], offset);
		exit(-5);
	}

	// set to skip position
	fseek(fp,offset,SEEK_SET);

	unsigned char *buffer = NULL;

	buffer = (unsigned char *)malloc( length - offset );
	if( buffer == NULL ) {
		
		fclose(fp);

		fprintf(stderr,"Couldnt allocate output buffer of %lu bytes\n", length - offset );
		exit(-6);
	}

	// read everything after the skipped bytes.
	if( fread(buffer,1,length - offset ,fp ) != (length-offset) ) {
		fclose(fp);
		free( buffer );
		fprintf(stderr,"Couldnt read input file %s\n", argv[1] );
		exit(-7);
	}

	// open output file for writing.
	ofp = fopen( argv[2],  "wb");
	
	if( ofp == NULL ) {
		fclose(fp);
		
		free( buffer );
		buffer = NULL;
		fprintf(stderr,"Couldnt open output file %s\n",argv[2]);
		exit(-8);
	}

	if( fwrite(buffer,1,length-offset,ofp) != (length-offset) ) {
		fclose(fp);
		fclose(ofp);
		free( buffer );
		fprintf(stderr,"Couldnt write output file %s\n", argv[2]);
		exit(-9);
	}

	free( buffer );

	fclose(fp);
	fclose(ofp);


	return 0;
}

Process all the files.

for %a in (*.csv) do byteskip %a o%a 4

I tried one of the online CSV to MySQL converters, but most of them can't handle the variations in the CSV. To create the initial schema for each table, I converted one CSV to XLS by importing it into Google Docs and re-exporting it as XLS, then imported that into phpMyAdmin. That gives the base schema.

Rename the table in phpMyAdmin or with the mysql tool.

Then do the final import with the csvtosql tool (in progress).
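Since the csvtosql tool is still in progress, here is a rough C sketch of the field splitting such a tool would need. It assumes the troublesome variations are quoted fields, commas inside quotes, and doubled quotes as an escape; that is a guess about the data, and `split_csv_line` is a hypothetical name. Quoted fields that span several lines are not handled.

```c
#include <stddef.h>

/* Split one CSV record in place. Quoted fields may contain commas, and ""
   inside a quoted field stands for a literal quote. Pointers to at most
   max_fields fields are stored in `fields`; the return value is the number
   of fields stored. A trailing CR/LF is dropped. */
size_t split_csv_line(char *line, char **fields, size_t max_fields)
{
	size_t count = 0;
	char *src = line;

	while (count < max_fields) {
		char *dst = src;              /* unescaped text is written in place */
		fields[count++] = dst;

		if (*src == '"') {            /* quoted field */
			src++;
			while (*src != '\0') {
				if (*src == '"') {
					if (src[1] == '"') {      /* escaped quote */
						*dst++ = '"';
						src += 2;
						continue;
					}
					src++;                    /* closing quote */
					break;
				}
				*dst++ = *src++;
			}
		}

		/* unquoted field, or stray text after a closing quote */
		while (*src != '\0' && *src != ',' && *src != '\r' && *src != '\n')
			*dst++ = *src++;

		int more = (*src == ',');     /* remember before terminating */
		*dst = '\0';
		if (!more)
			break;
		src++;                        /* step past the separator */
	}
	return count;
}
```

Each returned field can then be escaped and placed into an INSERT statement for the table created from the base schema.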