Automagically download from rapidshare
Published on October 7, 2007
This script will crack the rapidshare captcha and download the file.
Requires:
- python-mechanize
- imagemagick
- ocrad/gocr
Run with ./downloadFromRapidshare.py http://rapidshare.com/somefile
downloadFromRapidshare.py:
#!/usr/bin/env python
from mechanize import *
import re
import urllib
import commands
import os
import sys
import time

url = sys.argv[1]
print "V2: " + url
br = Browser()
response = br.open(url)
br.select_form(nr=0)
response = br.submit(nr=1)
html = response.read()

def htc(m):
    # Turn one matched %XX escape into the character it encodes.
    return chr(int(m.group(1), 16))

def urldecode(url):
    # Hex digits only run from a to f.
    rex = re.compile('%([0-9a-fA-F][0-9a-fA-F])', re.M)
    return rex.sub(htc, url)

htmlDec = urldecode(html)
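# If the page asks us to wait some minutes, either sleep (second
# argument 'y') or print the wait time and exit with status 2.
try:
    exit = 0
    waitTime = re.search('Or wait [0-9]* minute', html).group(0).split(' ')[2]
    try:
        if sys.argv[2] == 'y':
            print 'Waiting for '+waitTime+' minutes'
            time.sleep(int(waitTime)*60)
    except:
        # No second argument given.
        print waitTime
        exit = 1
except:
    # No "Or wait N minute" message on the page.
    pass
if exit == 1:
    sys.exit(2)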
waitTime = re.search('var c=[0-9]*',htmlDec).group(0)[6:]
print 'Waiting for ' + waitTime + ' seconds'
for sec in range(int(waitTime),0,-1):
    sys.stdout.write(' \r'+str(sec)+' ')
    sys.stdout.flush()
    time.sleep(1)
imgURL = re.search('http://[^"]*jpg', re.search('Please enter[^:]*:[^:]*:', htmlDec).group(0)).group(0)
print 'Downloading captcha image ('+imgURL+')'
f = urllib.urlopen(imgURL)
fp = open('captcha.jpg','wb')
fp.write(f.read())
fp.close()
print 'Breaking captcha...'
commands.getoutput('convert captcha.jpg captcha.pbm')
captchaText = commands.getoutput('gocr -d 10 -m 256 -m 2 -p ./db2/db2 captcha.pbm')
print 'Captcha is "'+captchaText[0:4]+'"'
# save a copy for later
commands.getoutput('mv captcha.jpg captchas/`md5sum captcha.jpg | cut -d " " -f 1`')
postUrl = re.search('action="http://[^"]*',htmlDec).group(0)[8:]
file = br.open(postUrl+'?accesscode='+captchaText[0:4])
url = postUrl
print "Downloading " + url.split('/')[-1] + "..."
fileSize = file.info().getheader("Content-Length")
# Skip the download if a local file of the same size already exists.
try:
    exit = 0
    if os.stat(url.split('/')[-1]).st_size == int(fileSize):
        print 'File already fully downloaded.'
        exit = 1
except:
    pass
if exit == 1:
    sys.exit(0)
fp = open(url.split('/')[-1],'wb')
numK = 256
#numK = 25
downloadedSize = 0
while 1:
    st = time.time()
    chunk = file.read(1024*numK)
    sizeOfChunk = len(chunk)
    downloadedSize += sizeOfChunk
    if not chunk:
        break
    speed = int( ( sizeOfChunk /( time.time() - st ) ) / 1024 )
    fp.write(chunk)
    fp.flush()
    percentage = (float(downloadedSize)/float(fileSize))*100
    sys.stdout.write('\r ')
    sys.stdout.flush()
    sys.stdout.write('\r'+str(downloadedSize)+'/'+str(fileSize)+' '+str(int(percentage))+'% '+str(speed)+' kb/s ')
    sys.stdout.flush()
fp.close()
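The hand-rolled urldecode helper in the script does the same job as the standard library's percent-decoding. Here is a minimal sketch in Python 3 (the script itself is Python 2, where the equivalent call is urllib.unquote). The sample string is made up for illustration and is not taken from a real page.

```python
import re
from urllib.parse import unquote

# Matches one %XX escape, where XX is two hexadecimal digits.
HEX_ESCAPE = re.compile(r'%([0-9a-fA-F]{2})')

def urldecode(text):
    """Replace each %XX escape with the character it encodes."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# Hypothetical sample, shaped like the escaped JavaScript the script searches.
sample = 'var%20c%3D30%3B'

print(urldecode(sample))  # var c=30;
print(unquote(sample))    # var c=30;
```

For plain ASCII escapes like these the two give the same result. They can differ on non-ASCII input, because unquote decodes the bytes as UTF-8 while the regex version maps each escape to a single character.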
Oct 13th:
Updated code to improve performance
Oct 19th:
rapidshare.com updated their site and now write the form using javascript. They don't check where the parameters are coming from though, so we can issue a GET request instead of a very messy POST request.
Nov 19th:
rapidshare.com updated their captchas. The new convert args seem to work, but only about 1 time in 2 or 3.
convert captcha.jpg -monochrome -edge 23 -fuzz 60% -floodfill 1x1 white -negate captcha.pbm
It has problems with some characters (2, Z, 4, M). It's not a complicated captcha, so maybe if I have some free time soon I will improve the accuracy.
Feb 3rd:
Updated code. It's a lot cleaner now and works ~90% of the time once gocr has been trained.
Read More
How to get Django working on digiweb.ie using django.cgi
Published on June 11, 2007
I just thought I'd post exactly what I did to get django running on digiweb.ie shared hosting.
First I downloaded and extracted Django version 0.96 to ~/Django-0.96
All requests will then be forwarded through the django.cgi script which I have saved as cgi-bin/dj
.htaccess
RewriteEngine on
RewriteRule ^cgi-bin/ - [L]
RewriteRule ^media/ - [L]
RewriteRule ^(.*)(/)$ cgi-bin/dj/$1/
RewriteRule ^$ cgi-bin/dj/home/
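A rough way to see what these rules do is to replay them in Python. This is only a sketch of a single pass through the rule list, not a faithful model of mod_rewrite. It assumes the path arrives without its leading slash (as it does in a per-directory .htaccess), and the RULES table and rewrite function are names made up for this illustration.

```python
import re

# (pattern, substitution, is_last) mirroring the four RewriteRule lines.
# A substitution of None stands for "-", meaning leave the path unchanged.
RULES = [
    (r'^cgi-bin/', None, True),
    (r'^media/', None, True),
    (r'^(.*)(/)$', r'cgi-bin/dj/\1/', False),
    (r'^$', 'cgi-bin/dj/home/', False),
]

def rewrite(path):
    """Apply the rules in order to a path given without its leading slash."""
    for pattern, substitution, is_last in RULES:
        if re.search(pattern, path):
            if substitution is not None:
                path = re.sub(pattern, substitution, path, count=1)
            if is_last:
                # The [L] flag stops processing at this rule.
                break
    return path

print(rewrite('blog/'))            # cgi-bin/dj/blog/
print(rewrite(''))                 # cgi-bin/dj/home/
print(rewrite('media/style.css'))  # media/style.css
```

In this sketch a path with no trailing slash, such as 'blog', matches none of the rules and is returned unchanged, so it would not reach the django.cgi script.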
Read More
Django Wiki Part 2
Published on May 31, 2007
Building on the simple wikitags parser I wrote about earlier, I'm now trying to build an application around that template tag.
A basic wiki should have the following features:
- Give the user the ability to create and edit pages.
- Recent versions of a page should be stored to help see what has been changed and to mitigate the effects of spam.
- The syntax of the markup/formatting language should be simple and allow easy linking, especially internal linking.
- It should be able to contain static pages.
The django.views.generic.create_update module looks like the obvious choice to use for creating/editing pages but the basic structure will be something like this.
Read More
Break page template tag
Published on May 31, 2007
Just looking over some of the posts here I realised that many of them are very long. Django's template system will let you truncate a piece of text using the truncatewords:N filter, but this often is not what you want.
For this site, if I decide to truncate all my posts at, say, 200 words, then it is possible that a <code> or <p> tag will be left open, which will mess up the appearance of the page.
It would be nicer if I could just tell the template system where to break my post, so that no tag is left open and a post is not cut off with only a few words to go.
This little piece of code should implement this, breaking the page when it finds BREAKHERE on its own on a line.
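The post's own code is not shown on this page, so the following is only a sketch of the idea in plain Python, not the original template tag. The function name break_page and its return shape are assumptions made for this illustration.

```python
def break_page(text, marker='BREAKHERE'):
    """Return (teaser, was_truncated), cutting text at the first marker line.

    The marker only counts when it is the sole content of a line, so the
    word appearing inside a sentence does not trigger a break.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == marker:
            return '\n'.join(lines[:index]), True
    return text, False

post = 'Intro paragraph.\nBREAKHERE\nThe rest of the post.'
print(break_page(post))  # ('Intro paragraph.', True)
```

In a Django project a function like this could be exposed to templates by registering it as a filter on a template.Library() instance. The was_truncated flag is there so a template can decide whether to show a "Read More" link.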
Read More
RSS feeds in Django
Published on May 22, 2007
Adding RSS feeds to your site is another thing that django makes so easy for you. To start off, add
(r'^feeds/(?P<url>.*)/$', 'django.contrib.syndication.views.feed',
{'feed_dict': feeds}),
to your urls.py file.
Read More
Django Flowchart
Published on May 21, 2007
Just found this and need to stick it somewhere so I don't lose it.
Read More
Comments
Published on May 19, 2007
Django really is a wonderful piece of software. After watching a video of Jacob Kaplan-Moss' presentation at Google, where he mentioned that users can comment on almost all content on LJWorld.com, I thought "hey that's a pretty cool idea, I want to do that too". So I did, but the really cool thing is just how easy it was to do.
First add 'django.contrib.comments', to your INSTALLED_APPS in settings.py
If you're using a custom view instead of django's generic views then you'll need to add
from django.contrib.comments.models import FreeComment
to your views.py
At the top of the template where you want to display the comments add this in
{% load comments.comments %}
Read More
Django Wiki
Published on May 19, 2007
Update: I have expanded on this some more here
Update: Fixed a number of issues, should now be working
Update: I have noticed a lot of people coming here after searching google for "django wiki". If you decide to use this code as a basis of your own template tag please let me know just so I know if it actually works reliably.
I've started designing a wiki for django as I can't seem to find one that exists already. It shouldn't be all that hard to do. I've started by just creating some simple search and replace statements using python's regex module, re.
Read More
Django on digiweb.ie
Published on May 18, 2007
I finally got this site up and running on digiweb.ie's free student hosting package. The site is powered fully by django
The Web framework for perfectionists with deadlines
As digiweb do not support mod_python or fastcgi on this package, I ended up having to use django.cgi to process requests. This method is described as being very slow, as for each request an instance of python plus all of django's code needs to be started and loaded into memory, but I really haven't noticed a delay. Kudos to digiweb.ie for their very generous hosting package!
Read More


