seamusc.com

Automagically download from rapidshare

Published on October 7, 2007

This script cracks the rapidshare captcha and downloads the file.

Requires:

  • python-mechanize
  • imagemagick
  • ocrad/gocr

Run with ./downloadFromRapidshare.py http://rapidshare.com/somefile

downloadFromRapidshare.py:

#!/usr/bin/env python

from mechanize import *
import re
import urllib
import commands
import os
import sys
import time



url=sys.argv[1]

print "V2:  "+ url

br = Browser()

response = br.open(url)



br.select_form(nr=0)
response = br.submit(nr=1)

html = response.read()


def htc(m):
    # turn one matched %XX escape into the character it encodes
    return chr(int(m.group(1),16))

def urldecode(url):
    # hex digits are 0-9 and a-f only
    rex=re.compile('%([0-9a-fA-F][0-9a-fA-F])',re.M)
    return rex.sub(htc,url)

htmlDec = urldecode(html)


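# If the page asks us to wait some minutes, optionally sleep for that long
# (second argument 'y'). Either way the script stops here with exit status 2.
waitMatch = re.search('Or wait ([0-9]+) minute', html)
if waitMatch:
    waitTime = waitMatch.group(1)
    if len(sys.argv) > 2 and sys.argv[2] == 'y':
        print 'Waiting for '+waitTime+' minutes'
        time.sleep(int(waitTime)*60)
    else:
        print waitTime
    sys.exit(2)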

waitTime = re.search('var c=[0-9]*',htmlDec).group(0)[6:]


print 'Waiting for ' + waitTime + ' seconds'
for sec in range(int(waitTime),0,-1) :
    sys.stdout.write('    \r'+str(sec)+'  ')
    sys.stdout.flush()
    time.sleep(1) 

imgURL = re.search('http://[^"]*jpg', re.search('Please enter[^:]*:[^:]*:', htmlDec).group(0)).group(0)
print 'Downloading captcha image ('+imgURL+')'
f = urllib.urlopen(imgURL)
fp = open('captcha.jpg','wb')
fp.write(f.read())
fp.close()

print 'Breaking captcha...'


commands.getoutput('convert captcha.jpg captcha.pbm')
captchaText = commands.getoutput('gocr -d 10 -m 256 -m 2 -p ./db2/db2 captcha.pbm')
print 'Captcha is "'+captchaText[0:4]+'"'

# save a copy for later
commands.getoutput('mv captcha.jpg captchas/`md5sum captcha.jpg | cut -d " " -f 1`')


postUrl = re.search('action="http://[^"]*',htmlDec).group(0)[8:]


file = br.open(postUrl+'?accesscode='+captchaText[0:4])
url = postUrl



print "Downloading " + url.split('/')[-1] + "..."



fileSize = file.info().getheader("Content-Length")
try:
    exit = 0
    if os.stat(url.split('/')[-1]).st_size == int(fileSize):
        print 'File already fully downloaded.'
        exit =1
except:
    pass
if exit == 1:
    sys.exit(0)

fp = open(url.split('/')[-1],'wb')

numK = 256
#numK = 25

downloadedSize = 0
while 1:
    st = time.time()
    chunk = file.read(1024*numK)
    sizeOfChunk = len(chunk)
    downloadedSize += sizeOfChunk

    if not chunk: break

    speed =  int( ( sizeOfChunk /( time.time() - st ) ) / 1024 )

    fp.write(chunk)
    fp.flush()

    percentage = (float(downloadedSize)/float(fileSize))*100
    sys.stdout.write('\r                                                                    ')
    sys.stdout.flush()
    sys.stdout.write('\r'+str(downloadedSize)+'/'+str(fileSize)+'  '+str(int(percentage))+'%  '+str(speed)+' kb/s                                    ')
    sys.stdout.flush()

fp.close()
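
A side note on the urldecode helper in the script: it is hand-rolled, but the standard library already does this job (urllib.unquote in Python 2, urllib.parse.unquote in Python 3). Below is a minimal sketch, assuming Python 3 rather than the Python 2 the script was written for. The sample string in the usage line is made up.

```python
from urllib.parse import unquote

def urldecode(text):
    """Replace every %XX escape in text with the character it encodes."""
    # latin-1 maps each decoded byte to the code point of the same value,
    # which mirrors the chr(int(hex, 16)) approach used in the script.
    return unquote(text, encoding="latin-1")

print(urldecode("var%20c%3D45"))
```

Escapes that are not valid hex, such as %zz, are left untouched.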

Oct 13th:

Updated code to improve performance

Oct 19th:

rapidshare.com updated their site and now write the form using javascript. They don't check where the parameters come from, though, so we can issue a GET request instead of a very messy POST request.
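
Switching to GET just means putting the form field in the query string of the form's action URL. Here is a small sketch of that step, assuming Python 3. The accesscode parameter name comes from the script above; build_get_url and the example URL are made up for illustration.

```python
from urllib.parse import urlencode

def build_get_url(action_url, accesscode):
    """Append the form field to the action URL as a query string."""
    # urlencode escapes any characters that are not safe in a URL.
    return action_url + "?" + urlencode({"accesscode": accesscode})

print(build_get_url("http://example.com/files/somefile.zip", "AB12"))
```

Using urlencode rather than plain string concatenation keeps the URL valid even if the value contains spaces or other special characters.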


Nov 19th:

rapidshare.com updated their captchas. The new convert args seem to work, but only about one time in two or three.

convert captcha.jpg -monochrome -edge 23 -fuzz 60% -floodfill 1x1 white -negate  captcha.pbm

It has problems with some characters (2, Z, 4, M). It's not a complicated captcha, so maybe if I have some free time soon I will improve the accuracy.

Feb 3rd:

Updated the code. It's a lot cleaner now and works ~90% of the time once gocr has been trained.


Bluetooth remote control

Published on June 21, 2007

Remuco is a system to remotely control Linux music players with JavaME capable mobile devices via Bluetooth.



How to get Django working on digiweb.ie using django.cgi

Published on June 11, 2007

I just thought I'd post exactly what I did to get Django running on digiweb.ie shared hosting.

First I downloaded and extracted Django version 0.96 to ~/Django-0.96.

All requests will then be forwarded through the django.cgi script, which I have saved as cgi-bin/dj.

.htaccess

RewriteEngine on
RewriteRule ^cgi-bin/ - [L]
RewriteRule ^media/ - [L]
RewriteRule ^(.*)(/)$ cgi-bin/dj/$1/
RewriteRule ^$ cgi-bin/dj/home/
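
To make the effect of these rules easier to follow, here is a rough Python model of a single pass through them. It is only a sketch: RULES and rewrite are made-up names, paths are taken relative to the site root (as they are in .htaccess), and real mod_rewrite does more than this. In particular, Apache processes the rewritten path again, at which point the cgi-bin/ rule lets it through unchanged.

```python
import re

# (pattern, substitution, is_last). A substitution of None stands for "-",
# meaning "leave the path alone"; is_last stands for the [L] flag.
RULES = [
    (r"^cgi-bin/", None, True),
    (r"^media/", None, True),
    (r"^(.*)(/)$", r"cgi-bin/dj/\1/", False),
    (r"^$", "cgi-bin/dj/home/", False),
]

def rewrite(path):
    """Apply the rules once, in order, to a path relative to the site root."""
    for pattern, substitution, is_last in RULES:
        match = re.search(pattern, path)
        if match is None:
            continue
        if substitution is not None:
            path = match.expand(substitution)
        if is_last:
            break
    return path

for example in ["", "blog/2007/", "media/site.css", "about"]:
    print(repr(example), "->", repr(rewrite(example)))
```

In this model, only paths ending in a slash (and the empty path for the site root) reach Django. Anything under cgi-bin/ or media/, and any path without a trailing slash, is left as it is.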

Django Wiki Part 2

Published on May 31, 2007

Building on the simple wikitags parser I wrote about earlier I'm now trying to build an application around that template tag.

A basic wiki should have the following features:

  • Give the user the ability to create and edit pages.
  • Recent versions of a page should be stored to help see what has been changed and to mitigate the effects of spam.
  • The syntax of the markup/formatting language should be simple and allow easy linking, especially internal linking.
  • Have the ability to contain static pages.
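
As an illustration of the versioning feature only, here is a plain-Python sketch of a page that keeps its most recent versions. It is not a Django model and not the structure used in the application. WikiPage, max_versions, and the method names are all hypothetical.

```python
from collections import deque

class WikiPage:
    """A page that remembers its most recent versions (illustrative only)."""

    def __init__(self, title, content="", max_versions=10):
        self.title = title
        self.content = content
        # Once max_versions is reached, the oldest version is dropped.
        self.history = deque(maxlen=max_versions)

    def edit(self, new_content):
        """Store the current text as an old version, then apply the edit."""
        self.history.append(self.content)
        self.content = new_content

    def revert(self):
        """Restore the most recent stored version, e.g. after a spam edit."""
        if self.history:
            self.content = self.history.pop()

page = WikiPage("Home", "Welcome to the wiki")
page.edit("BUY CHEAP PILLS")
page.revert()
print(page.content)
```

Capping the history keeps storage bounded while still leaving enough versions around to undo a spam edit.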

The django.views.generic.create_update module looks like the obvious choice to use for creating/editing pages, but the basic structure will be something like this.
