Automagically download from rapidshare
Published on October 7, 2007
This script will crack the rapidshare captcha and download the file.
Requires:
- python-mechanize
- imagemagick
- ocrad/gocr
Run with ./downloadFromRapidshare.py http://rapidshare.com/somefile
downloadFromRapidshare.py:
#!/usr/bin/env python
from mechanize import *
import re
import urllib
import commands
import os
import sys
import time
url=sys.argv[1]
print "V2: "+ url
br = Browser()
response = br.open(url)
br.select_form(nr=0)
response = br.submit(nr=1)
html = response.read()
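# Turn one %XX escape into the character it encodes
def htc(m):
    return chr(int(m.group(1),16))
# Decode every %XX escape in the page (hex digits are 0-9 and a-f)
def urldecode(url):
    rex=re.compile('%([0-9a-fA-F][0-9a-fA-F])',re.M)
    return rex.sub(htc,url)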
htmlDec = urldecode(html)
# If the download limit was hit, wait only when a second argument 'y' was given
try:
    exit = 0
    waitTime = re.search('Or wait [0-9]* minute', html).group(0).split(' ')[2]
    try:
        if sys.argv[2] == 'y':
            print 'Waiting for '+waitTime+' minutes'
            time.sleep(int(waitTime)*60)
    except:
        print waitTime
        exit =1
except:
    pass
if exit == 1:
    sys.exit(2)
waitTime = re.search('var c=[0-9]*',htmlDec).group(0)[6:]
print 'Waiting for ' + waitTime + ' seconds'
for sec in range(int(waitTime),0,-1) :
    sys.stdout.write(' \r'+str(sec)+' ')
    sys.stdout.flush()
    time.sleep(1)
imgURL = re.search('http://[^"]*jpg', re.search('Please enter[^:]*:[^:]*:', htmlDec).group(0)).group(0)
print 'Downloading captcha image ('+imgURL+')'
f = urllib.urlopen(imgURL)
fp = open('captcha.jpg','wb')
fp.write(f.read())
fp.close()
print 'Breaking captcha...'
commands.getoutput('convert captcha.jpg captcha.pbm')
captchaText = commands.getoutput('gocr -d 10 -m 256 -m 2 -p ./db2/db2 captcha.pbm')
print 'Captcha is "'+captchaText[0:4]+'"'
# save a copy for later
commands.getoutput('mv captcha.jpg captchas/`md5sum captcha.jpg | cut -d " " -f 1`')
postUrl = re.search('action="http://[^"]*',htmlDec).group(0)[8:]
file = br.open(postUrl+'?accesscode='+captchaText[0:4])
url = postUrl
print "Downloading " + url.split('/')[-1] + "..."
fileSize = file.info().getheader("Content-Length")
# Skip the download if a complete copy is already on disk
try:
    exit = 0
    if os.stat(url.split('/')[-1]).st_size == int(fileSize):
        print 'File already fully downloaded.'
        exit =1
except:
    pass
if exit == 1:
    sys.exit(0)
fp = open(url.split('/')[-1],'wb')
numK = 256
#numK = 25
downloadedSize = 0
# Read the file in chunks and print progress after each one
while 1:
    st = time.time()
    chunk = file.read(1024*numK)
    sizeOfChunk = len(chunk)
    downloadedSize += sizeOfChunk
    if not chunk: break
    speed = int( ( sizeOfChunk /( time.time() - st ) ) / 1024 )
    fp.write(chunk)
    fp.flush()
    percentage = (float(downloadedSize)/float(fileSize))*100
    sys.stdout.write('\r ')
    sys.stdout.flush()
    sys.stdout.write('\r'+str(downloadedSize)+'/'+str(fileSize)+' '+str(int(percentage))+'% '+str(speed)+' kb/s ')
    sys.stdout.flush()
fp.close()
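The chunked download loop at the end of the script can also be written as a small reusable function. The following is only a sketch in Python 3 (the script itself is Python 2); the name copy_with_progress and the report callback are made up for illustration and are not part of the original script. It works on any file-like source, and it guards against a zero elapsed time when computing the speed.

```python
import time


def copy_with_progress(src, dst, total_size, chunk_size=256 * 1024, report=None):
    """Copy src to dst in chunks and return the number of bytes copied.

    After each chunk, report(done, total_size, percent, kb_per_s) is called
    if a callback was supplied.
    """
    done = 0
    while True:
        started = time.time()
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Very fast reads can take "zero" time, so clamp the divisor.
        elapsed = max(time.time() - started, 1e-6)
        dst.write(chunk)
        done += len(chunk)
        percent = int(done * 100 / total_size) if total_size else 0
        if report is not None:
            report(done, total_size, percent, int(len(chunk) / elapsed / 1024))
    return done
```

With an in-memory source such as io.BytesIO the function can be tried without any network access; in the script, src would be the response object and dst the output file.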
Oct 13th:
Updated code to improve performance
Oct 19th:
rapidshare.com updated their site and now write the form using JavaScript. They don't check where the parameters come from, though, so we can issue a GET request instead of a very messy POST request.
Nov 19th:
rapidshare.com updated their captchas. The new convert args seem to work, but only about one time in two or three.
convert captcha.jpg -monochrome -edge 23 -fuzz 60% -floodfill 1x1 white -negate captcha.pbm
It has problems with some characters (2, Z, 4, M). It's not a complicated captcha, so maybe if I have some free time soon I will improve the accuracy.
Feb 3rd:
Updated the code. It's a lot cleaner now and works ~90% of the time once gocr has been trained.
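As a side note on the urldecode helper in the script: the same percent-decoding can be sketched in Python 3 as below, and for plain ASCII escapes it gives the same result as the standard library's urllib.parse.unquote. This is an illustrative rewrite, not the updated code mentioned above. The two differ on non-ASCII input, since unquote decodes UTF-8 byte sequences while the regex version maps each escape to a single character.

```python
import re
from urllib.parse import unquote

# Exactly two hex digits after a percent sign
HEX_ESCAPE = re.compile(r"%([0-9a-fA-F]{2})")


def urldecode(text):
    """Replace each %XX escape with the character whose code is XX."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# For ASCII-only escapes both approaches agree.
sample = "var%20c%3D30"
same_result = urldecode(sample) == unquote(sample)
```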
Bluetooth remote control
Published on June 21, 2007
Remuco is a system to remotely control Linux music players with JavaME capable mobile devices via Bluetooth.
How to get Django working on digiweb.ie using django.cgi
Published on June 11, 2007
I just thought I'd post exactly what I did to get Django running on digiweb.ie shared hosting.
First I downloaded and extracted Django version 0.96 to ~/Django-0.96.
All requests will then be forwarded through the django.cgi script, which I have saved as cgi-bin/dj.
.htaccess
RewriteEngine on
RewriteRule ^cgi-bin/ - [L]
RewriteRule ^media/ - [L]
RewriteRule ^(.*)(/)$ cgi-bin/dj/$1/
RewriteRule ^$ cgi-bin/dj/home/
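To see what these rules do to a request, here is a rough Python model of a single pass through them. It is a simplification of mod_rewrite under stated assumptions: paths are given without the leading slash (as in a per-directory .htaccess), the "-" substitution is modelled as None, and the re-processing Apache performs after an internal rewrite is ignored. The names RULES and rewrite are made up for this sketch.

```python
import re

# (pattern, substitution, is_last), in the same order as the .htaccess rules.
# A substitution of None stands for "-", which leaves the path untouched.
RULES = [
    (r"^cgi-bin/", None, True),
    (r"^media/", None, True),
    (r"^(.*)(/)$", r"cgi-bin/dj/\1/", False),
    (r"^$", "cgi-bin/dj/home/", False),
]


def rewrite(path):
    """Apply the rules once, top to bottom, and return the resulting path."""
    for pattern, substitution, is_last in RULES:
        match = re.search(pattern, path)
        if match is None:
            continue
        if substitution is not None:
            # A matching rule replaces the whole path with the substitution.
            path = match.expand(substitution)
        if is_last:
            break
    return path
```

Under this model, a request for wiki/Start/ is handed to cgi-bin/dj/wiki/Start/, the empty path goes to cgi-bin/dj/home/, and anything under media/ or cgi-bin/ is served as-is. A path without a trailing slash matches none of the rules and is left alone.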
Django Wiki Part 2
Published on May 31, 2007
Building on the simple wikitags parser I wrote about earlier, I'm now trying to build an application around that template tag.
A basic wiki should have the following features:
- Give the user the ability to create and edit pages.
- Recent versions of a page should be stored to help see what has been changed and to mitigate the effects of spam.
- The syntax of the markup/formatting language should be simple and allow easy linking, especially internal linking.
- The wiki should be able to contain static pages.
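The revision-history and internal-linking requirements in the list above can be illustrated with a small framework-free Python sketch. It is not the Django code from this post: the WikiPage class, the limit of 10 stored revisions, and the [[Page Name]] link syntax are all assumptions chosen for the example.

```python
import re
from collections import deque
from dataclasses import dataclass, field

# Assumed internal-link syntax: [[Page Name]]. The post does not specify one.
LINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


@dataclass
class WikiPage:
    title: str
    content: str = ""
    # Older revisions, newest last; the limit of 10 is an arbitrary assumption.
    history: deque = field(default_factory=lambda: deque(maxlen=10))

    def edit(self, new_content):
        """Store the current text as a revision, then replace it."""
        self.history.append(self.content)
        self.content = new_content

    def revert(self):
        """Restore the most recent stored revision, for example after spam."""
        if self.history:
            self.content = self.history.pop()

    def internal_links(self):
        """Return the titles of the pages this page links to."""
        return LINK_RE.findall(self.content)
```

Keeping only a bounded number of revisions matches the "recent versions" wording; a real application would more likely store revisions in the database with an author and timestamp.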
The django.views.generic.create_update module looks like the obvious choice for creating/editing pages, but the basic structure will be something like this.

