Notes on Python

Web2Py

web2py expects to receive a proper hostname or IP address. If an alias has been set up for a machine, then http://<alias>/admin/default/index will fail. Replace <alias> with the IP address.

svn

mkdir cofa_9706
cd cofa_9706/
mkdir trunk
cd trunk/
touch __init__.py
touch main.py
cd ..
mkdir branches
mkdir tags
cd ..
svn import cofa_9706  file:///data-current/programming/svnroot/cofa_9706
Note the repeated directory name in the svn repository address.

Actually, you then need to back up the directory, check out (co) the repository you just created, diff the checkout against the backup, and then delete the backup.

Getting svn to work with Eric

Heavens above! This is ridiculously difficult. I don’t even know how I managed to do it.

try:

create svn repository with trunk, tags, branches

manually import a directory with existing files into trunk using the command line

use eric to create a new project from svn

file:///path/to/project (i.e. excluding the trunk directory)

create the directory into which the project is to be imported

try to use absolute paths everywhere rather than relative paths or paths including soft links.

os.path stuff

os.path.exists(path)

os.path.splitext(path)[-1] => extension
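A small Python 3 sketch of the two calls above; `describe_path` is a hypothetical helper name. Note that `splitext` only splits off the last extension, and the extension it returns keeps its leading dot (an empty string if there is none).

```python
import os.path

def describe_path(path):
    """Return (exists, root, extension) for a path; a hypothetical helper."""
    root, ext = os.path.splitext(path)  # ext keeps the leading dot, e.g. '.gz'
    return os.path.exists(path), root, ext

print(describe_path("archive.tar.gz"))
```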

optparse

from optparse import OptionParser

oParser = OptionParser()
oParser.add_option("-o", "--oldFile", dest="oldFile",
                   help="Previous version of file against which a new markup is to be made")
oParser.add_option("-n", "--newFile", dest="newFile",
                   help="Current version of file")
oParser.add_option("-m", "--markupStyle", dest="markupStyle", default="text",
                   help="What style is the markup to be presented in, currently 'text' and 'html' are supported")

(options, args) = oParser.parse_args()

see docs
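On newer Pythons, argparse is the standard-library module favoured over optparse. A rough argparse equivalent of the parser above, as a sketch: the `choices` restriction is my addition (it enforces the "text or html" note), and passing an explicit list to `parse_args` is only there to show the result without a real command line.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-o", "--oldFile", dest="oldFile",
                        help="Previous version of file against which a new markup is to be made")
    parser.add_argument("-n", "--newFile", dest="newFile",
                        help="Current version of file")
    # choices= is an addition: it rejects anything other than the two supported styles
    parser.add_argument("-m", "--markupStyle", dest="markupStyle", default="text",
                        choices=["text", "html"],
                        help="Style of the markup, currently 'text' or 'html'")
    return parser

# parse_args() with no argument would read sys.argv; a list is used here for the demo
options = build_parser().parse_args(["-o", "old.txt", "-n", "new.txt"])
print(options.oldFile, options.newFile, options.markupStyle)
```

Unlike optparse, `parse_args` returns a single namespace; positional arguments have to be declared with their own `add_argument` call instead of coming back as a leftover `args` list.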

Code Profiling

http://docs.python.org/library/profile.html
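A minimal Python 3 sketch of profiling a single function with cProfile and printing the top entries with pstats; `slow_sum` is just a stand-in workload.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # deliberately naive work to give the profiler something to measure
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_sum(200_000)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # only the five most expensive entries by cumulative time
report = stream.getvalue()
print(report)
```

For a whole script, `python -m cProfile -s cumulative myscript.py` does the same from the command line.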

Refreshing imported module in interactive mode

reload(modulename)
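`reload` is a built-in only in Python 2; in Python 3 it is `importlib.reload`. A self-contained sketch that writes a throwaway module (the name `scratch_mod` is made up), edits it on disk and reloads it:

```python
import importlib
import sys
import tempfile
from pathlib import Path

def demo_reload():
    with tempfile.TemporaryDirectory() as tmp:
        mod_path = Path(tmp) / "scratch_mod.py"  # hypothetical throwaway module
        mod_path.write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make sure the new file is seen by the finders
        try:
            import scratch_mod
            first = scratch_mod.VALUE
            # a different file size guarantees a stale cached .pyc is not reused
            mod_path.write_text("VALUE = 22\n")
            scratch_mod = importlib.reload(scratch_mod)
            second = scratch_mod.VALUE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("scratch_mod", None)
        return first, second

print(demo_reload())
```

Note that `reload` only re-executes the module; objects created from the old version (instances, names bound with `from scratch_mod import ...`) keep pointing at the old code.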

For highlighting search matches on an ANSI terminal

def highlightmatch(matchobj):
    return BOLD+matchobj.group(0)+RESET

used as the replacement function in

line = re.sub(searchstring,highlightmatch,line)

Need, of course, to import or define the relevant ansi control characters. Easily generalised for other forms of highlighting.
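A self-contained Python 3 version with the ANSI control sequences defined; `highlight` is a hypothetical wrapper name. `\x1b[1m` switches bold on and `\x1b[0m` resets all attributes.

```python
import re

BOLD = "\x1b[1m"   # ANSI escape: bold on
RESET = "\x1b[0m"  # ANSI escape: reset all attributes

def highlight(line, searchstring):
    """Wrap every match of the regex searchstring in BOLD ... RESET."""
    def highlightmatch(matchobj):
        return BOLD + matchobj.group(0) + RESET
    return re.sub(searchstring, highlightmatch, line)

print(highlight("foo bar foo", "foo"))
```

Swapping `BOLD` for another SGR code such as `"\x1b[31m"` (red foreground) gives other kinds of highlighting.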

Finding UNO

import uno can fail if the right path is not in sys.path

locate uno.py then add that directory to the path

possibly by sys.path.append()

can also add the directory to the PYTHONPATH environment variable, but when I did this I kept mistyping ooo3 as 0003 (the first is three letter o's plus 3, the second is three zeros plus 3)
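A Python 3 sketch of the "locate uno.py, then sys.path.append()" step. The helper name `add_dir_containing` and the candidate directories are hypothetical; use whatever `locate uno.py` actually reports on your machine.

```python
import os
import sys

def add_dir_containing(filename, candidate_dirs):
    """Append the first candidate directory that contains filename to sys.path.

    Returns that directory, or None if none of the candidates has the file.
    """
    for directory in candidate_dirs:
        if os.path.isfile(os.path.join(directory, filename)):
            if directory not in sys.path:
                sys.path.append(directory)
            return directory
    return None

# Hypothetical install locations; substitute the output of `locate uno.py`.
found = add_dir_containing("uno.py", ["/usr/lib/openoffice/program",
                                      "/opt/openoffice.org3/program"])
print("uno.py found in:", found)
```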

More UNO

first run an instance of ooo3, consider headless

google ooextract.py

needs to be invoked with ooo’s version of python – <path to ooo>/program/python <path>/ooextract.py <path>/somefile.odt  [from ooo3.1 this may not be necessary]

openoffice (or soffice) "-accept=socket,host=localhost,port=2002;urp;"

also look at other options which are out there: -headless -nologo etc

Regex for Clauses?

For when clauses have been run together into one paragraph (this could also be combined with stripping out the line breaks): re.sub can insert line breaks to split the text into paragraphs by clause number. This assumes a new clause starts with punctuation, then one or more spaces, then a clause number such as 1.2…

import re

findclausenos = r'([.;] {1,10})([0-9]{1,2}\.[0-9]{1,2})'

def clauselines(matchobj):
    # group(1) = the punctuation and spaces, group(2) = the clause number, e.g. 1.2
    return matchobj.group(1)+"\n\n"+matchobj.group(2)

processed = []
for line in licencefile.readlines(): # licencefile is an already-open text file
    processed.append(re.sub(findclausenos,clauselines,line))
# here re.sub calls the function for every match, passing it the match object, and
# the match is replaced by the function's return value.  Where the pattern contains
# groups, the text matched by each group (matchobj.group(n)) can be used to build the
# return value.
# this is crazy powerful

Open files, reading, writing and closing

infilename="infile.txt"
outfilename="outfile.txt"

infile=open(infilename,"r")
outfile=open(outfilename,"w")

softcopy=infile.readlines() # read the lot into a list of lines

infile.seek(0) # rewind; otherwise the loop below finds nothing left to read
for line in infile: # read them one at a time
    outfile.write(line) # write each line out
infile.close() # close 'em
outfile.close()
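In Python 3 (and 2.6+) the usual idiom is a `with` block, which closes both files even if an error occurs part-way. A sketch; `copy_lines` is a hypothetical name and the demo runs in a temporary directory so no real files are touched.

```python
import os
import tempfile

def copy_lines(infilename, outfilename):
    """Copy a text file line by line; both files are closed automatically."""
    count = 0
    with open(infilename, "r") as infile, open(outfilename, "w") as outfile:
        for line in infile:  # iterating the file object reads one line at a time
            outfile.write(line)
            count += 1
    return count

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "infile.txt")
    dst = os.path.join(tmp, "outfile.txt")
    with open(src, "w") as f:
        f.write("first line\nsecond line\n")
    copied = copy_lines(src, dst)
    with open(dst) as f:
        copied_text = f.read()

print(copied, repr(copied_text))
```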

Display http in a widget

download and install pykde3… which is probably in python-kde3-bindings or something

then locate pyKHTMLPart.py

It has a working example.

Now I need to work out how to point it at a file, rather than a url.

Unfortunately I don’t know how to do this in tkinter, so I have to use pykde3 – and therefore learn that gui.

Find last word on a line

import re
last_word=re.search(r"\w*\W*$",line) # last run of word characters plus any trailing punctuation
print "last word of line = ", last_word.group(0)
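The pattern above also returns any trailing punctuation ("fox." for "the quick fox."). A Python 3 variant that puts the word in a capture group so only the word comes back; `last_word` as a function name is my choice, and it returns None when the line has no word characters at all.

```python
import re

def last_word(line):
    """Return the last run of word characters, ignoring trailing punctuation/whitespace."""
    match = re.search(r"(\w+)\W*$", line)
    return match.group(1) if match else None

print(last_word("the quick fox."))
```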

Load lines from a file, strip out CR, FF and LF, join the lines and convert to a string so that re will operate on it

lines = str("".join([line.rstrip('\r\n\f') for line in open(filename)]))

Running Bash commands from Python

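status=os.system('bash command here') # returns the (encoded) exit status, not the output

or

import commands # Python 2 only; subprocess.getoutput() is the Python 3 equivalent
output = commands.getoutput('command')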
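In Python 3 the subprocess module covers both cases: the exit code and the output come back together. A sketch; running `sys.executable` (the current Python) as the child is only so the example works anywhere, and the shell example assumes a POSIX shell.

```python
import subprocess
import sys

# An argument list avoids the shell entirely, so no quoting is needed.
result = subprocess.run([sys.executable, "-c", "print('hello from a child process')"],
                        capture_output=True, text=True)
print(result.returncode)  # exit code of the child process
print(result.stdout)      # everything it wrote to stdout

# For a real shell command line, pass a string with shell=True (beware of quoting).
listing = subprocess.run("echo one; echo two", shell=True,
                         capture_output=True, text=True).stdout
```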

Tkinter Scrollable Grid Example.

Smart Quotes

Argh! Apparently these smart quotes come from the cp1252 (Windows-1252) encoding… In cp1252 the curly double quotes are the bytes \x93 and \x94. How to actually get it working:

>>> a='\x93hi there\x94'
>>> b=a.decode('cp1252')
>>> b
u'\u201chi there\u201d'
>>> c=b.encode('utf8')
>>> c
'\xe2\x80\x9chi there\xe2\x80\x9d'
>>> c.decode('utf8')
u'\u201chi there\u201d'
>>> print c.decode('utf8')
“hi there”

Can also do stuff with the codecs module and codecs.open(file,mode,'encoding') if necessary. Once decoded, the smart quotes can be replaced.

Think of there being glyphs, which are the things printed on the screen, and encodings for those glyphs. You need to decode a particular encoding in order to get Unicode and then work with it. Something like '\xe2\x80\x9c' is actually meaningless (just a string of hex-valued bytes) in the absence of an encoding. On the other hand, '\xe2\x80\x9c'.decode('utf8') has a meaning: it is the glyph identified by the Unicode character u'\u201c'. Once something has been mapped to a Unicode character you can then do things with it (like searching for it or replacing it).
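The same round trip in Python 3, where byte strings and text are separate types: `bytes.decode` gives a `str`, `str.encode` gives `bytes`. The replacement table for straightening the quotes is an illustrative choice.

```python
raw = b"\x93hi there\x94"          # bytes as they might come out of a cp1252 file
text = raw.decode("cp1252")        # '\u201chi there\u201d': now real Unicode characters
utf8_bytes = text.encode("utf-8")  # b'\xe2\x80\x9chi there\xe2\x80\x9d'

# Once decoded, the glyphs are ordinary characters and can be searched for or replaced.
STRAIGHTEN = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
plain = text.translate(STRAIGHTEN)
print(plain)
```

In Python 3, `open(filename, encoding="cp1252")` does the decoding while reading, which replaces most uses of `codecs.open`.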

Searching text across line breaks:

re.search(search,text,re.S) # re.S (re.DOTALL) makes '.' also match newlines, so a match can run across line breaks
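A short demonstration of what the flag changes: without it `.` stops at a newline.

```python
import re

text = "Clause 1 begins here\nand ends here."

# Without re.S, '.' does not match '\n', so nothing is found.
without_flag = re.search(r"begins.*ends", text)
# With re.S (re.DOTALL), '.' matches '\n' too, so the match spans both lines.
with_flag = re.search(r"begins.*ends", text, re.S)

print(without_flag, repr(with_flag.group(0)))
```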

Unescaping HTML

There is a short Python program by Fredrik Lundh, but it doesn’t work for me.

So, I’m now trying

from xml.sax.saxutils import unescape
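A sketch of what `saxutils.unescape` does and doesn't cover: by default it only handles `&amp;`, `&lt;` and `&gt;`, and other entities must be passed in a dictionary. On Python 3.4+, `html.unescape` handles all named and numeric HTML5 character references.

```python
import html
from xml.sax.saxutils import unescape

snippet = "&lt;b&gt;Fish &amp; Chips&lt;/b&gt; &quot;today&quot;"

basic = unescape(snippet)                       # &quot; is left alone
extended = unescape(snippet, {"&quot;": '"'})   # extra entities supplied by hand
full = html.unescape(snippet + " &eacute;&#233;")

print(basic)
print(extended)
print(full)
```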

Decode CueCat

Script here

Get list of files in directory

files = commands.getoutput('ls').split('\n') #  (or equivalent command)

but you need to escape (quote) them before feeding them back to the shell (apparently there is no standard way to do this. Tch! pipes.quote is mentioned, but does not seem to work for me)
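A Python 3 sketch that avoids parsing `ls` output: `os.listdir` returns the names directly, and `shlex.quote` (added in Python 3.3; `pipes.quote` was the older, undocumented spelling) quotes each one for a POSIX shell. The demo files are created in a temporary directory.

```python
import os
import shlex
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ("plain.txt", "with space.txt", "it's.txt"):
        open(os.path.join(tmp, name), "w").close()  # create empty demo files
    files = sorted(os.listdir(tmp))  # names straight from the OS, no shell involved
    quoted = [shlex.quote(name) for name in files]

print(quoted)
print(" ".join(quoted))  # safe to paste into a shell command line
```

Quoting is only needed when a name goes back through a shell; `subprocess.run([...])` with an argument list takes the raw names as they are.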

Get list of all characters used in document

charset = set([])

for i in txtList:
    charset.add(i)

print charset

So add any pesky characters to potential search strings
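The loop above is equivalent to `set(txtList)`. To find just the pesky ones, a Python 3 sketch (`non_ascii_chars` is a hypothetical name) that keeps only non-ASCII characters and counts them, printing each with its code point so it can be dropped into a search string:

```python
from collections import Counter

def non_ascii_chars(text):
    """Return sorted (character, count) pairs for every non-ASCII character in text."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return sorted(counts.items())

sample = "don\u2019t say \u201cmaybe\u201d"
for ch, n in non_ascii_chars(sample):
    print(repr(ch), hex(ord(ch)), n)
```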

Tokenise Whitespace in PyParsing (demo code):

from pyparsing import *

test = '''   I  want to tokenise     the white space in a string'''

def parseIt(parseDefn,textToParse):
    try:
        spam= parseDefn.parseString(textToParse)
    except:
        spam=None
    return spam

ordWord=Word(alphas,alphanums)
whiteWord=White(' ')

lines = []
lines.append(OneOrMore(ordWord & whiteWord))
lines.append(lines[0] + LineEnd())
lines.append(OneOrMore(ordWord ^ whiteWord))

for line in lines:
    print line
    print parseIt(line,test)
    print parseIt(line.leaveWhitespace(),test)

Misc PyParsing Notes:

Word(a,b): a = characters allowed as the first character, b = characters allowed in the rest of the word

some recognised constants = printable, alphas, alphanums, nums

Word(srange('[A-Z]')) for caps

.parseWithTabs() to keep tabs, otherwise the location (loc) values will be out of sync

a parse action function can take arguments of the form (s,loc,toks), (loc,toks), (toks), or no arguments at all

use a parseAction to create a token with location data (need to define an extended token class)

Waiting

import time

time.sleep(seconds)

Upload file to web form

>>> import mechanize
>>> br=mechanize.Browser()
>>> myurl='http://127.0.0.1:8000/testingUpload/default/upload_form'
>>> br.open(myurl)
<response_seek_wrapper at 0x9a2b00 whose wrapped object = <closeable_response at 0x9a2710 whose fp = <socket._fileobject object at 0xa281d0>>>

>>> for form in br.forms():
...    print form
...
<POST http://127.0.0.1:8000/testingUpload/default/upload_form multipart/form-data
  <FileControl(store_file=<No files added>)>
  <SubmitControl(<None>=Submit) (readonly)>
  <HiddenControl(_formkey=8225c049-b596-47f1-a154-501a0c96db4d) (readonly)>
  <HiddenControl(_formname=uploadedFiles_create) (readonly)>>

>>> br.select_form(nr=0) # or some other select criteria
>>> br.form.add_file(open('README.txt'),'text/plain','README.txt',name='store_file')
>>> br.form.set_all_readonly(False)
>>> br.submit()
<response_seek_wrapper at 0x9a2950 whose wrapped object = <closeable_response at 0x9b61b8 whose fp = <socket._fileobject object at 0xa282d0>>>

Embed Mplayer in a Tkinter Frame:

from Tkinter import *
import commands

testVid = "BMW_Museum_-_Kinetic_Sculpture-hlx-M53dC7M.flv"

def playV(event):
    mpc = "mplayer -wid %s %s"%(fwid, testVid)
    output = commands.getoutput(mpc)

master = Tk()

L = Label(master,text="one")
L.pack()
L.bind('<Button-1>',playV)

separator = Frame(height=2, bd=1, relief=SUNKEN)
separator.pack(fill=X, padx=5, pady=5)

Label(text="two").pack()

frame = Frame(width=768, height=576, bg="", colormap="new")
frame.pack()

fwid = frame.winfo_id()

mainloop()

Get those links which occur in a list on a page:

from lxml.html import parse
page = parse(someaddress).getroot()
p = page.xpath('//li/a')
for p1 in p:
   print p1.attrib
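If lxml isn't available, a rough standard-library substitute (Python 3) built on `html.parser`. It is not XPath: `ListLinkParser` and `list_links` are hypothetical names, and `//li/a` is approximated as "an `<a>` whose innermost open element is an `<li>`". It tracks open tags on a stack and, unlike lxml, does no real HTML tree repair.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not stay on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class ListLinkParser(HTMLParser):
    """Collect the attributes of <a> elements whose parent is an <li> (roughly //li/a)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, innermost last
        self.links = []   # one dict of attributes per matching <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.stack and self.stack[-1] == "li":
            self.links.append(dict(attrs))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # pop back to the matching open tag, tolerating children left unclosed
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break

def list_links(html_text):
    parser = ListLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links

page = ('<ul><li><a href="/a">A</a></li><li><b><a href="/x">X</a></b></li></ul>'
        '<p><a href="/b">B</a></p>')
for attrib in list_links(page):
    print(attrib)
```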
