seamusc.com

Automagically download from rapidshare

Published on October 7, 2007

This script cracks the rapidshare captcha and downloads the file. Requires:

  • python-mechanize
  • imagemagick
  • ocrad/gocr

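Run with ./downloadFromRapidshare.py http://rapidshare.com/somefile

If rapidshare makes you wait before downloading, the script prints the number of minutes and exits with status 2. Add y as a second argument to have it sleep for that long before exiting.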

downloadFromRapidshare.py:

#!/usr/bin/env python

from mechanize import *
import re
import urllib
import commands
import os
import sys
import time



url=sys.argv[1]

print "V2:  "+ url

br = Browser()

response = br.open(url)



br.select_form(nr=0)
response = br.submit(nr=1)

html = response.read()


def htc(m):
    # turn a matched %XX escape into the character it encodes
    return chr(int(m.group(1),16))

def urldecode(url):
    # hex digits only run 0-9 and a-f
    rex=re.compile('%([0-9a-fA-F][0-9a-fA-F])',re.M)
    return rex.sub(htc,url)

htmlDec = urldecode(html)


try:
    exit = 0
    waitTime = re.search('Or wait [0-9]* minute', html).group(0).split(' ')[2]
    try:
        if sys.argv[2] == 'y':
            print 'Waiting for '+waitTime+' minutes'
            time.sleep(int(waitTime)*60)
    except:
        print waitTime
    exit =1
except:
    pass
if exit == 1:
    sys.exit(2)

waitTime = re.search('var c=[0-9]*',htmlDec).group(0)[6:]


print 'Waiting for ' + waitTime + ' seconds'
for sec in range(int(waitTime),0,-1) :
    sys.stdout.write('    \r'+str(sec)+'  ')
    sys.stdout.flush()
    time.sleep(1) 

imgURL = re.search('http://[^"]*jpg', re.search('Please enter[^:]*:[^:]*:', htmlDec).group(0)).group(0)
print 'Downloading captcha image ('+imgURL+')'
f = urllib.urlopen(imgURL)
fp = open('captcha.jpg','wb')
fp.write(f.read())
fp.close()

print 'Breaking captcha...'


commands.getoutput('convert captcha.jpg captcha.pbm')
captchaText = commands.getoutput('gocr -d 10 -m 256 -m 2 -p ./db2/db2 captcha.pbm')
print 'Captcha is "'+captchaText[0:4]+'"'

# save a copy for later
commands.getoutput('mv captcha.jpg captchas/`md5sum captcha.jpg | cut -d " " -f 1`')


postUrl = re.search('action="http://[^"]*',htmlDec).group(0)[8:]


file = br.open(postUrl+'?accesscode='+captchaText[0:4])
url = postUrl



print "Downloading " + url.split('/')[-1] + "..."



fileSize = file.info().getheader("Content-Length")
try:
    exit = 0
    if os.stat(url.split('/')[-1]).st_size == int(fileSize):
        print 'File already fully downloaded.'
        exit =1
except:
    pass
if exit == 1:
    sys.exit(0)

fp = open(url.split('/')[-1],'wb')

numK = 256
#numK = 25

downloadedSize = 0
while 1:
    st = time.time()
    chunk = file.read(1024*numK)
    sizeOfChunk = len(chunk)
    downloadedSize += sizeOfChunk

    if not chunk: break

    # max() guards against a zero interval on very fast reads
    speed =  int( ( sizeOfChunk / max( time.time() - st, 0.001 ) ) / 1024 )

    fp.write(chunk)
    fp.flush()

    percentage = (float(downloadedSize)/float(fileSize))*100
    sys.stdout.write('\r                                                                    ')
    sys.stdout.flush()
    sys.stdout.write('\r'+str(downloadedSize)+'/'+str(fileSize)+'  '+str(int(percentage))+'%  '+str(speed)+' kb/s                                    ')
    sys.stdout.flush()

fp.close()
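For anyone porting the script to Python 3, here is a rough sketch of the download loop above pulled out into a function that works on any file-like objects, so it can be tried without a network connection. The name copy_with_progress and its parameters are my own, not part of mechanize or the script, and unlike the loop above it also copes with a zero elapsed time and a missing Content-Length header.

```python
import io
import sys
import time

CHUNK_SIZE = 256 * 1024  # 256 KB per read, like numK = 256 in the script


def copy_with_progress(src, dst, total_size=None, chunk_size=CHUNK_SIZE, out=None):
    """Copy file-like src to dst chunk by chunk, drawing a one-line progress meter.

    total_size may be None (no Content-Length header), in which case no
    percentage is shown. Returns the number of bytes copied.
    """
    if out is None:
        out = sys.stdout
    copied = 0
    while True:
        started = time.monotonic()
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # Speed of this chunk only; max() avoids dividing by a zero interval.
        elapsed = max(time.monotonic() - started, 1e-6)
        speed_kb = int(len(chunk) / elapsed / 1024)
        if total_size:
            percent = int(copied * 100 / total_size)
            line = f"{copied}/{total_size}  {percent}%  {speed_kb} kB/s"
        else:
            line = f"{copied} bytes  {speed_kb} kB/s"
        # "\r" returns to the start of the line; ljust() pads over longer old text.
        out.write("\r" + line.ljust(60))
        out.flush()
    out.write("\n")
    return copied


if __name__ == "__main__":
    # Demo with in-memory files instead of a network response.
    copy_with_progress(io.BytesIO(b"x" * 1000), io.BytesIO(), total_size=1000, chunk_size=300)
```

For a real download, src would be the response returned by br.open() (or urllib.request.urlopen()) and dst a file opened in 'wb' mode.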

Oct 13th:

Updated code to improve performance

Oct 19th:

rapidshare.com updated their site and now writes the form using JavaScript. They don't check where the parameters come from, though, so we can issue a GET request instead of a very messy POST request.
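Building the GET URL by string concatenation, as the script does with '?accesscode=', works for a four-character code, but urlencode takes care of escaping in general. A small sketch; as_get_url is a hypothetical helper and the example.com URLs are placeholders.

```python
from urllib.parse import urlencode, urlsplit


def as_get_url(action_url, params):
    """Append form fields to a form's action URL as a GET query string."""
    # Use "&" if the action URL already carries a query string, "?" otherwise.
    separator = "&" if urlsplit(action_url).query else "?"
    return action_url + separator + urlencode(params)
```

For example, as_get_url("http://example.com/dl?id=7", {"accesscode": "A B"}) gives http://example.com/dl?id=7&accesscode=A+B.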


Nov 19th:

rapidshare.com updated their captchas. The new convert arguments below seem to work, but only about one time in two or three.

convert captcha.jpg -monochrome -edge 23 -fuzz 60% -floodfill 1x1 white -negate  captcha.pbm

It has problems with some characters (2, Z, 4, M). It's not a complicated captcha, so if I have some free time soon I may improve the accuracy.

Feb 3rd:

Updated the code. It's a lot cleaner now and works about 90% of the time once gocr has been trained.


How to get Django working on digiweb.ie using django.cgi

Published on June 11, 2007

I just thought I'd post exactly what I did to get Django running on digiweb.ie shared hosting.

First, I downloaded and extracted Django version 0.96 to ~/Django-0.96.

All requests are then forwarded through the django.cgi script, which I have saved as cgi-bin/dj.

.htaccess

RewriteEngine on
RewriteRule ^cgi-bin/ - [L]
RewriteRule ^media/ - [L]
RewriteRule ^(.*)(/)$ cgi-bin/dj/$1/
RewriteRule ^$ cgi-bin/dj/home/
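To see how those four rules route a request, here is a small Python model of them. It is a simplification and my own reading of mod_rewrite's behaviour: the rules are applied once, in order, to the path as it appears inside an .htaccess file (no leading slash), - means "leave unchanged", [L] stops processing, and Apache's re-run of the rules after an internal rewrite is ignored. RULES and rewrite are hypothetical names.

```python
import re

# The four RewriteRules from the .htaccess above as (pattern, substitution, stop).
# "-" means "leave the path unchanged"; stop=True mirrors the [L] flag.
RULES = [
    (r"^cgi-bin/", "-", True),
    (r"^media/", "-", True),
    (r"^(.*)(/)$", r"cgi-bin/dj/\1/", False),
    (r"^$", "cgi-bin/dj/home/", False),
]


def rewrite(path):
    """Return the path the rules send a request to.

    path is relative to the site root with no leading slash. Simplified model:
    one pass through the rules, ignoring Apache's re-run after a rewrite.
    """
    for pattern, substitution, stop in RULES:
        match = re.match(pattern, path)
        if match is None:
            continue
        if substitution != "-":
            path = match.expand(substitution)
        if stop:
            break
    return path
```

So blog/2007/ ends up at cgi-bin/dj/blog/2007/ and the bare domain at cgi-bin/dj/home/, while media/ files and the CGI script itself pass through untouched. A path without a trailing slash, such as about, is left alone by these rules.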

Django Wiki Part 2

Published on May 31, 2007

Building on the simple wikitags parser I wrote about earlier, I'm now trying to build an application around that template tag.

A basic wiki should have the following features:

  • Give the user the ability to create and edit pages.
  • Recent versions of a page should be stored to help see what has been changed and to mitigate the effects of spam.
  • The syntax of the markup/formatting language should be simple and allow easy linking, especially internal linking.
  • Have the ability to contain static pages.
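Ignoring Django for a moment, the first two requirements boil down to a small data structure: a page that keeps a capped list of its recent revisions and can roll back to an earlier one. This is a plain-Python sketch, not Django models; WikiPage, MAX_REVISIONS and revert() are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

MAX_REVISIONS = 10  # hypothetical cap on how many versions are kept per page


@dataclass
class WikiPage:
    title: str
    # (timestamp, text) pairs, oldest first
    revisions: list = field(default_factory=list)

    @property
    def text(self):
        """The current text, i.e. the newest revision ("" for a new page)."""
        return self.revisions[-1][1] if self.revisions else ""

    def edit(self, new_text):
        """Save new_text as the newest revision and drop any beyond the cap."""
        self.revisions.append((datetime.now(timezone.utc), new_text))
        del self.revisions[:-MAX_REVISIONS]

    def revert(self, steps_back=1):
        """Re-save an older revision as the newest one, e.g. to undo spam.

        Raises IndexError if fewer than steps_back + 1 revisions are stored.
        """
        _, old_text = self.revisions[-1 - steps_back]
        self.edit(old_text)
```

Reverting saves the old text as a new revision rather than deleting history, so a revert can itself be undone.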

The django.views.generic.create_update module looks like the obvious choice for creating and editing pages, but the basic structure will be something like this.


Break page template tag

Published on May 31, 2007

Just looking over some of the posts here, I realised that many of them are very long. Django's template system will let you truncate a piece of text using the truncatewords:N filter, but this often is not what you want.

For this site, if I decide to truncate all my posts at, say, 200 words, a <code> or <p> tag may be left open, which will mess up the appearance of the page.

It would be nicer if I could just tell the template system where to break my post, so that no tags are left open and a post is never cut off with only a few words to go.

This little piece of code should implement this, breaking the page when it finds BREAKHERE on a line of its own.
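A rough sketch of the splitting logic itself, as two plain functions that a custom template filter could wrap. This is an illustration, not the tag described in the full post; before_break and without_break are hypothetical names, and the pattern assumes the marker may be surrounded by spaces or tabs.

```python
import re

# BREAKHERE only counts when it is the only thing on its line.
BREAK_MARKER = re.compile(r"^[ \t]*BREAKHERE[ \t]*$", re.MULTILINE)


def before_break(text):
    """Return the part of text before the first BREAKHERE line (all of it if none)."""
    match = BREAK_MARKER.search(text)
    return text[:match.start()] if match else text


def without_break(text):
    """Return the full text with each BREAKHERE marker blanked, for the full post view."""
    return BREAK_MARKER.sub("", text)
```

Because the author chooses the break point, the excerpt can always end between complete blocks such as </p> and the next <p>.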


RSS feeds in Django

Published on May 22, 2007

Adding RSS feeds to your site is another thing that Django makes so easy for you. To start off, add

(r'^feeds/(?P<url>.*)/$', 'django.contrib.syndication.views.feed',
    {'feed_dict': feeds}),

to your urls.py file
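To illustrate what the url group in that pattern captures, here is a framework-free sketch of how the syndication view of this era looks a feed up in feed_dict, as I understand it: the first path segment is the slug (the key in feed_dict) and anything after it is passed on as an extra parameter. The feeds contents and resolve_feed are hypothetical stand-ins; in a real urls.py, feeds maps slugs to Feed subclasses.

```python
import re

# The URL pattern from the urls.py entry above.
FEED_PATTERN = re.compile(r"^feeds/(?P<url>.*)/$")

# Hypothetical stand-in: in a real urls.py the values would be Feed subclasses.
feeds = {"latest": "LatestEntriesFeed", "comments": "LatestCommentsFeed"}


def resolve_feed(path, feed_dict):
    """Return (feed, extra) for a request path, or None if no feed matches.

    The first segment of the captured url is the slug looked up in feed_dict;
    anything after the next "/" is passed along as an extra parameter.
    """
    match = FEED_PATTERN.match(path)
    if match is None:
        return None
    slug, _, extra = match.group("url").partition("/")
    if slug not in feed_dict:
        return None
    return feed_dict[slug], extra
```

So feeds/latest/ would select the "latest" feed, and feeds/comments/42/ the "comments" feed with 42 as its extra parameter.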


Django Flowchart

Published on May 21, 2007

Just found this and need to stick it somewhere so I don't lose it.


Comments

Published on May 19, 2007

Django really is a wonderful piece of software. After watching a video of Jacob Kaplan-Moss' presentation at Google, where he mentioned that users can comment on almost all content on LJWorld.com, I thought "hey, that's a pretty cool idea, I want to do that too". So I did, but the really cool thing is just how easy it was to do.

First add 'django.contrib.comments', to your INSTALLED_APPS in settings.py
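As far as I know, the comment models in this version link to the sites and contenttypes apps, so those need to be installed as well. A minimal INSTALLED_APPS might look like the following sketch; 'blog' is a hypothetical name for your own app.

```python
# settings.py (excerpt)
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.comments',
    'blog',  # hypothetical: the app that holds your own views and models
)
```

Running manage.py syncdb afterwards should create the comment tables.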

If you're using a custom view instead of Django's generic views, then you'll need to add

from django.contrib.comments.models import FreeComment

to your views.py

At the top of the template where you want to display the comments, add this:

{% load comments.comments %}

Django Wiki

Published on May 19, 2007

Update: I have expanded on this some more here

Update: Fixed a number of issues, should now be working

Update: I have noticed a lot of people coming here after searching Google for "django wiki". If you decide to use this code as the basis of your own template tag, please let me know, just so I know whether it actually works reliably.

I've started designing a wiki for Django, as I can't seem to find one that exists already. It shouldn't be all that hard to do. I've started by just creating some simple search-and-replace statements using Python's regex module, re.
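Here is a minimal sketch of that search-and-replace approach. The markup (MediaWiki-style ''italic'', '''bold''' and [[PageName]] links) and the /wiki/PageName/ URL layout are assumptions for illustration, not necessarily the syntax this wiki uses. The input is HTML-escaped first, with quote=False so the apostrophes survive, so that users can't inject their own tags.

```python
import html
import re

# Hypothetical MediaWiki-style markup; bold must come before italic.
RULES = [
    (re.compile(r"'''(.+?)'''"), r"<strong>\1</strong>"),
    (re.compile(r"''(.+?)''"), r"<em>\1</em>"),
    (re.compile(r"\[\[([A-Za-z0-9_-]+)\]\]"), r'<a href="/wiki/\1/">\1</a>'),
]


def render_wiki(text):
    """Turn wiki markup into HTML with a series of regex substitutions."""
    # Escape &, < and > first so users can't inject tags; quote=False keeps
    # apostrophes intact for the bold/italic rules.
    text = html.escape(text, quote=False)
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Bold has to be handled before italic, otherwise the '' rule would consume the first two quotes of every '''.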


Django on digiweb.ie

Published on May 18, 2007

I finally got this site up and running on digiweb.ie's free student hosting package. The site is powered entirely by Django.

The Web framework for perfectionists with deadlines

As digiweb do not support mod_python or FastCGI on this package, I ended up having to use django.cgi to process requests. This method is described as being very slow, because each request has to start a new Python process and load all of Django's code into memory, but I really haven't noticed a delay. Kudos to digiweb.ie for their very generous hosting package!
