Notes on Python


web2py expects to receive a proper hostname or IP address.  If there is an alias set up to a machine then http://<alias>/admin/default/index will fail.  Replace the alias with the IP address.


mkdir cofa_9706
cd cofa_9706/
mkdir trunk
cd trunk/
cd ..
mkdir branches
mkdir tags
cd ..
svn import cofa_9706  file:///data-current/programming/svnroot/cofa_9706
Note that the directory name is repeated in the svn repo address.

Actually, after the import you still need to back up the original directory, check out (co) the repo you just created, diff the checkout against the backup, and then delete the backup.

Getting svn to work with Eric

Heavens above! This is ridiculously difficult. I don’t even know how I managed to do it.


create svn repository with trunk, tags, branches

manually import a directory with existing files into trunk using the command line

use eric to create a new project from svn

file:///path/to/project/excluding the trunk directory

create the directory into which the project is to be imported

try to use absolute paths everywhere rather than relative paths or paths including soft links.

os.path stuff


os.path.splitext(path)[-1] => extension
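
A few more os.path calls in the same vein, as a rough sketch (Python 3 syntax; the example path is made up):

```python
import os.path

path = "/home/user/docs/report.final.txt"  # made-up example path

directory, filename = os.path.split(path)      # directory part and file name
stem, extension = os.path.splitext(filename)   # only the last dot counts

print(directory)
print(filename)
print(stem)
print(extension)
print(os.path.join(directory, stem + ".bak"))  # build a sibling path
```

Note that the extension comes back with its leading dot.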


from optparse import OptionParser

oParser = OptionParser()
oParser.add_option("-o", "--oldFile", dest="oldFile",
                  help="Previous version of file against which a new markup is to be made")
oParser.add_option("-n", "--newFile",dest='newFile',
                  help="Current version of file ")
oParser.add_option("-m", "--markupStyle", dest='markupStyle', default = 'text',
                   help="What style is the markup to be presented in, currently 'text' and 'html' are supported")

(options, args) = oParser.parse_args()

see docs
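
Once parsed, the options come back as attributes named after each dest. A small sketch of using them, passing an explicit argument list instead of sys.argv so it can be tried interactively (the file names are made up; optparse still ships with Python 3, though argparse is the newer module):

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-o", "--oldFile", dest="oldFile", help="previous version of the file")
parser.add_option("-n", "--newFile", dest="newFile", help="current version of the file")
parser.add_option("-m", "--markupStyle", dest="markupStyle", default="text",
                  help="markup style: 'text' or 'html'")

# Made-up arguments; with no list given, parse_args() reads sys.argv[1:].
options, args = parser.parse_args(["-o", "v1.txt", "--newFile", "v2.txt", "extra"])

print(options.oldFile)      # value given with -o
print(options.newFile)      # value given with --newFile
print(options.markupStyle)  # falls back to the default
print(args)                 # leftover positional arguments
```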

Code Profiling
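
One standard-library option is cProfile. A minimal sketch (Python 3 syntax; slow_sum is a made-up function standing in for whatever code is being profiled):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Made-up workload, only here to give the profiler something to measure.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

For a whole script, running python -m cProfile script.py from the command line gives a similar table without touching the code.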

Refreshing imported module in interactive mode
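
In Python 2 the builtin reload(module) does this; in Python 3 it lives in importlib. A sketch of the Python 3 form, using a throwaway module written to a temp directory (scratch_mod is a made-up name):

```python
import importlib
import os
import sys
import tempfile

# Throwaway module written to a temp directory, purely for the demonstration.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "scratch_mod.py")
with open(module_path, "w") as handle:
    handle.write("VALUE = 1\n")

sys.path.insert(0, module_dir)
import scratch_mod

before = scratch_mod.VALUE

# Edit the file on disk, as you would in an editor while the interpreter stays open.
with open(module_path, "w") as handle:
    handle.write("VALUE = 2  # edited on disk\n")

importlib.reload(scratch_mod)  # re-executes the module source in place
after = scratch_mod.VALUE

print(before, after)
```

Names pulled in earlier with "from scratch_mod import VALUE" keep their old values after a reload; only attribute access through the module object sees the new code.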


For highlighting search objects on ansi terminal

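import re

HIGHLIGHT = '\033[7m'  # ANSI reverse video (one possible choice of highlight)
RESET = '\033[0m'      # ANSI reset

def highlightmatch(matchobj):
    # wrap the matched text in the highlight codes
    return HIGHLIGHT + matchobj.group(0) + RESET

matched with

line = re.sub(searchstring,highlightmatch,line)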

Need, of course, to import or define the relevant ansi control characters. Easily generalised for other forms of highlighting.

Finding UNO

import uno can fail if the right path is not in sys.path

locate then add that directory to the path

possibly by sys.path.append()

can also add to PYTHONPATH environment variable, but when I did this I kept mistyping ooo3 as 0003 (first are letters+3, second are numbers+3)
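
A sketch of the sys.path route (Python 3 syntax). The directory below is hypothetical; substitute whatever directory locate reports on your machine:

```python
import sys

# Hypothetical location; use the directory that actually contains uno.py.
UNO_DIR = "/opt/openoffice.org3/basis-link/program"

if UNO_DIR not in sys.path:
    sys.path.append(UNO_DIR)

try:
    import uno
    uno_available = True
except ImportError:
    # Still not found: the path above is wrong for this machine.
    uno_available = False

print(uno_available)
```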

More UNO

first run an instance of ooo3, consider headless


needs to be invoked with ooo’s version of python – <path to ooo>/program/python <path>/ <path>/somefile.odt  [from ooo3.1 this may not be necessary]

openoffice or soffice "-accept=socket,host=localhost,port=2002;urp;"

also look at other options which are out there: -headless -nologo etc

Regex for Clauses?

When clauses have been run together into one paragraph (could also combine with stripping out the line breaks), re.sub can insert line breaks to split the paragraph at each clause number.  Assumes each new clause is punctuation followed by spaces and then a number…

findclausenos = r'([\.\;]\ {1,10})([0-9]{1,2}\.[0-9]{1,2})'

def clauselines(matchobj):
        # code here

for line in licencefile.readlines():
    # here the pattern and the function are passed to re.sub; when a match is found it is
    # replaced by the return value of the function.  Where the pattern contains groups, the
    # text matched by the groups can be used as parameters for the return value.
    # this is crazy powerful
    line = re.sub(findclausenos,clauselines,line)
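
A runnable sketch of the whole idea (Python 3 syntax). The body of clauselines and the sample text are assumptions made for illustration; the pattern is the one above with the redundant escapes dropped:

```python
import re

# Punctuation plus 1-10 spaces, then a clause number such as 1.2 or 12.10.
findclausenos = r'([.;] {1,10})([0-9]{1,2}\.[0-9]{1,2})'

def clauselines(matchobj):
    # group(1) is the punctuation plus trailing spaces, group(2) the clause number;
    # keep the punctuation and swap the spaces for a line break.
    return matchobj.group(1).strip() + '\n' + matchobj.group(2)

# Made-up sample paragraph with three clauses run together.
text = "1.1 The licence is granted. 1.2 The fee is payable; 1.3 Notices must be in writing."
split_text = re.sub(findclausenos, clauselines, text)
print(split_text)
```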

Open files, reading, writing and closing



softcopy=infile.readlines() # read the lot into a list

for line in infile: # or read them one at a time
     outfile.write(line) # write each line out
infile.close() # close em
outfile.close()
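
The same steps with the with statement, which closes the files automatically. A sketch in Python 3 syntax; the file names are made up and live in a temp directory so it can be run as-is:

```python
import os
import tempfile

# Made-up file names in a temp directory so the sketch is self-contained.
workdir = tempfile.mkdtemp()
in_name = os.path.join(workdir, "in.txt")
out_name = os.path.join(workdir, "out.txt")

with open(in_name, "w") as handle:
    handle.write("first line\nsecond line\n")

# Both files are closed automatically when the with block ends.
with open(in_name) as infile, open(out_name, "w") as outfile:
    for line in infile:          # iterates one line at a time
        outfile.write(line)

with open(out_name) as handle:
    softcopy = handle.readlines()  # read the lot into a list

print(softcopy)
```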

Display http in a widget

download and install pykde3… which is probably in python-kde3-bindings or something

then locate

It has a working example.

Now I need to work out how to point it at a file, rather than a url.

Unfortunately I don’t know how to do this in tkinter, so I have to use pykde3 – and therefore learn that gui.

Find last word on a line

m = re.search(r"\w*\W*$",line)  # re.search is assumed; the call name is missing from these notes
if m:
        print "last word of line = ", m.group()
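
A small variation as a sketch (Python 3 syntax): putting the word in a group returns it without the trailing punctuation. The function name last_word is made up:

```python
import re

def last_word(line):
    # (\w+) captures the final run of word characters; \W*$ allows trailing
    # punctuation or whitespace before the end of the line.
    match = re.search(r"(\w+)\W*$", line)
    return match.group(1) if match else None

print(last_word("the quick brown fox."))
print(last_word("   "))
```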

Load lines from file, strip out cr, ff, lf, join lines, convert to string so that re. will operate on it

lines = str("".join([line.rstrip('\r\n\f') for line in open(filename)]))

Running Bash commands from Python

import os
status=os.system('bash command here')  # returns the exit status ('return' is a keyword, so it cannot be the variable name)


import commands
output = commands.getoutput('command')
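
The commands module was removed in Python 3; subprocess covers the same ground (subprocess.getoutput is the closest one-for-one replacement). A sketch using subprocess.run, assuming Python 3.7 or later:

```python
import subprocess
import sys

# Run a child process and capture what it prints. The child here is the Python
# interpreter itself, only so the sketch does not depend on a particular shell command.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,
    text=True,
)

print(completed.returncode)      # exit status, like os.system
print(completed.stdout.strip())  # captured output, like commands.getoutput
```

Passing the command as a list avoids the shell altogether, which also sidesteps most quoting problems.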

Tkinter Scrollable Grid Example.

Smart Quotes

Argh!  Apparently smart quotes are from cp1252 encoding… Some hex codes: double quotes \x93 and \x94.  How to get it actually working:

>>> a='\x93hi there\x94'
>>> b=a.decode('cp1252')
>>> b
u'\u201chi there\u201d'
>>> c=b.encode('utf8')
>>> c
'\xe2\x80\x9chi there\xe2\x80\x9d'
>>> c.decode('utf8')
u'\u201chi there\u201d'
>>> print c.decode('utf8')
“hi there”

Can also do stuff with the codecs module and codecs.open(filename,mode,'encoding') if necessary.   Can also simply replace the smart quotes with plain ones.

Think of there being glyphs (the things printed on the screen) and encodings for the glyphs. You need to decode a particular encoding in order to get unicode and then deal with it.  Something like '\xe2\x80\x9c' is actually meaningless (just a string of hex value bytes) in the absence of an encoding.  On the other hand '\xe2\x80\x9c'.decode('utf8') has a meaning: it is the glyph identified by unicode character u'\u201c'.  Once something has been mapped to a unicode character you can then do stuff with it (like trying to search for or replace it).
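
The same round trip in Python 3, where the bytes/str split makes the distinction explicit. A sketch reusing the byte values from the session above:

```python
raw = b'\x93hi there\x94'          # bytes as they arrive from a cp1252 source

text = raw.decode('cp1252')        # bytes -> str (unicode)
utf8_bytes = text.encode('utf8')   # str -> bytes in another encoding

# Once decoded, the curly quotes can be searched for or replaced like any character.
plain = text.replace('\u201c', '"').replace('\u201d', '"')

print(repr(text))
print(utf8_bytes)
print(plain)
```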

Searching text across line breaks: re.search(pattern,text,re.S) # re.S lets '.' match newlines, so the search runs across line breaks
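
A quick sketch of the difference the flag makes (Python 3 syntax; the sample text is made up):

```python
import re

text = "start of clause\ncontinues here end"  # made-up two-line sample

# Without re.S, '.' stops at the newline, so the pattern cannot span the two lines.
without_flag = re.search(r"start.*end", text)
print(without_flag)

# With re.S (alias re.DOTALL), '.' matches the newline as well.
with_flag = re.search(r"start.*end", text, re.S)
print(with_flag.group(0))
```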

Unescaping HTML

There is a short python program by Fredrik Lundh, but it doesn’t work for me.

So, I’m now trying

from xml.sax.saxutils import unescape
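
A sketch of how that function behaves (Python 3 syntax). It only knows three entities out of the box; anything else has to be passed in, and the extra dictionary here is just an example:

```python
from xml.sax.saxutils import unescape

# By default only &amp;, &lt; and &gt; are converted.
basic = unescape("1 &lt; 2 &amp;&amp; 3 &gt; 2")
print(basic)

# Other entities have to be supplied through the optional dictionary argument.
extra = {"&quot;": '"', "&apos;": "'"}
quoted = unescape("&quot;quoted&quot;", extra)
print(quoted)
```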

Decode CueCat

Script here

Get list of files in directory

files = commands.getoutput('ls').split('\n') #  (or equivalent command)

but escape them before feeding them back to the operating system (apparently there is no standard way to do this. Tch! pipes.quote is mentioned, but does not seem to work for me)
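
An alternative sketch that avoids parsing ls output (Python 3 syntax): os.listdir returns the names directly, and shlex.quote, the Python 3 home of the quoting function, escapes a name for a shell command line. The temp directory and file name are made up:

```python
import os
import shlex
import tempfile

# Temp directory with one awkwardly named file, so the sketch is self-contained.
workdir = tempfile.mkdtemp()
open(os.path.join(workdir, "my file.txt"), "w").close()

files = sorted(os.listdir(workdir))              # no shell involved
quoted = [shlex.quote(name) for name in files]   # safe to put on a shell command line

print(files)
print(quoted)
```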

Get list of all characters used in document

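charset = set([])

for i in txtList:
    charset.update(i)  # add every character of this line (loop body reconstructed; txtList assumed to be a list of strings)

print charset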

So add any pesky characters to potential search strings
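
A self-contained sketch of the same idea in Python 3 syntax, with made-up sample lines containing an accented letter and smart quotes:

```python
txtList = ["caf\u00e9 \u201cmenu\u201d\n", "plain text\n"]  # made-up sample lines

charset = set()
for line in txtList:
    charset.update(line)   # a string is iterated character by character

# Sorting makes the odd characters easier to spot at the end of the list.
print(sorted(charset))
```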

Tokenise Whitespace in PyParsing (demo code):

from pyparsing import *

test = '''   I  want to tokenise     the white space in a string'''

def parseIt(parseDefn,textToParse):
    spam = parseDefn.parseString(textToParse)
    return spam

ordWord=Word(alphas)  # definition missing from these notes; Word(alphas) is assumed
whiteWord=White(' ')

lines = []
lines.append(OneOrMore(ordWord & whiteWord))
lines.append(lines[0] + LineEnd())
lines.append(OneOrMore(ordWord ^ whiteWord))

for line in lines:
    print line
    print parseIt(line,test)
    print parseIt(line.leaveWhitespace(),test)

Misc PyParsing Notes:

Word(a,b) a = characteristics of first character, b = other characters

some recognised constants = printable, alphas, alphanums, nums

Word(srange('[A-Z]')) for caps

.parseWithTabs() to keep tabs, otherwise locn will be out of sync

parseAction function takes argument (apparently _only_) of form (s,loc,toks) or (loc, toks) or (toks)

use a parseAction to create a token with location data (need to define an extended token class)


import time


Upload file to web form

>>> import mechanize
>>> br=mechanize.Browser()
>>> myurl=''
>>> br.open(myurl)
<response_seek_wrapper at 0x9a2b00 whose wrapped object = <closeable_response at 0x9a2710 whose fp = <socket._fileobject object at 0xa281d0>>>

>>> for form in br.forms():
...    print form
<POST multipart/form-data
  <FileControl(store_file=<No files added>)>
  <SubmitControl(<None>=Submit) (readonly)>
  <HiddenControl(_formkey=8225c049-b596-47f1-a154-501a0c96db4d) (readonly)>
  <HiddenControl(_formname=uploadedFiles_create) (readonly)>>

>>> br.select_form(nr=0) # or some other select criteria
>>> br.form.add_file(open('README.txt'),'text/plain','README.txt',name='store_file')
>>> br.form.set_all_readonly(False)
>>> br.submit()
<response_seek_wrapper at 0x9a2950 whose wrapped object = <closeable_response at 0x9b61b8 whose fp = <socket._fileobject object at 0xa282d0>>>

Embed Mplayer in a Tkinter Frame:

from Tkinter import *
import commands

testVid = "BMW_Museum_-_Kinetic_Sculpture-hlx-M53dC7M.flv"

def playV(event):
  mpc = "mplayer -wid %s %s"%(fwid, testVid)
  output = commands.getoutput(mpc)

master = Tk()

L = Label(master,text="one")

separator = Frame(height=2, bd=1, relief=SUNKEN)
separator.pack(fill=X, padx=5, pady=5)


frame = Frame(width=768, height=576, bg="", colormap="new")
frame.pack()
frame.bind("<Button-1>", playV)  # assumed trigger: click the frame to start playback

fwid = frame.winfo_id()

master.mainloop()


Get those links which occur in a list on a page:

from lxml.html import parse
page = parse(someaddress).getroot()
p = page.xpath('//li/a')
for p1 in p:
   print p1.attrib
