In this chapter we look at integration with Microsoft Excel using COM. This is interesting for technical and business reasons. Technically, it will give us a chance to look at both server- and client-side COM. On the business side, we've built a financial application, and a vast amount of financial data comes from and ends up going back into spreadsheets. We'll develop some simple examples that allow you to import and export data from spreadsheets, and see how to build an alternative frontend for pulling out data in Excel.
Using Python for client-side COM basically means using Python to access an application somebody else has written. There is little to learn about client-side COM per se; what matters is learning the object model of the application or library you want to use.
Excel has a large object modelabout as complex as anything you are likely to encounter. For example, just one of the objects, Range, has 84 properties and 72 methods. What's more, it's not particularly straightforward; the Perl community's mantra "There's more than one way to do it" probably applies even more to Excel than it does to Perl.
In Chapter 7, Building a GUI with COM, we stressed the importance of rerunning Office Setup to get the right help files. If you didn't do it then, do it now. The key help file is the Microsoft Excel Visual Basic Reference, which covers the object model.
The easiest way to learn the Excel object model is to write programs in Visual Basic for Applications. The VB editor is available with every copy of Office and is a first-rate development environment. The following key features help a great deal in learning:
Drop-down auto-completion
If you type ActiveSheet into the editor, a list drops down showing all the properties and methods of a Sheet object. This saves an enormous amount of time learning the object model (see Figure 9-1).
Figure 9-1. Drop-down auto-completion in VBA |
Context-sensitive help
You can click on any variable, property, or method name and press F1 to get detailed help on that part of the object model. The help file also has a number of genuinely useful topic guides.
The F5 key and spreadsheet integration
You don't need to write a whole program to get started in Excel. You can write a subroutine in the editor, hit F5, and it runs. A great way to start learning about ranges and selections is to write a short routine to select some cells, and just keep running and extending it.
Having dispensed this piece of advice, we'll ignore it totally and begin by getting at Excel from Python. Within Python, you can't do any damage, but you need to know the objects and methods required.
A one-time step, which isn't required but which makes things faster and more pleasant to develop with, is to run the Python MakePy utility. What this does is explained fully in Chapter 12, Advanced Python and COM. On the Tools menu, choose COM Makepy utility, and select Microsoft Excel 8.0 Object Library from the list box. This may take one or two minutes to run but needs to be done only once per machine. This builds a support library of Python code for accessing the Excel object model, which allows early- rather than late-bound COM. Your code will run faster, and you'll know which Office objects you are accessing. If you don't do this, it all still works, but it uses a different technique behind the scenes.
Start up PythonWin and enter the following:
>>> from win32com.client import Dispatch
>>> xlApp = Dispatch("Excel.Application")
>>> xlApp.Visible = 1
>>> xlApp.Workbooks.Add()
<win32com.gen_py.Microsoft Excel 8.0 Object Library.Workbook>
>>>
There will be a few-second pause before Excel starts. You should see Excel appear when you enter xlApp.Visible = 1, but with an empty main window; the final line creates a blank workbook with three sheets. Note that the return value of Add is informative. This is part of what Makepy does for you; if you had not run it, you'd get a less informative string back. By the way, we've made Excel visible for teaching purposes; if you just want to manipulate data, keep it hidden and save processor cycles.
The Excel object hierarchy basically goes Application, Workbook, Sheet, Range, Cell. A Range is an arbitrary region on a Sheet. You can assign variables to the various items in the hierarchy or drill down in one long statement. If you want to modify the top left cell of Sheet1 in our new workbook, you can get to it in any of the following ways:
>>> xlApp.ActiveSheet.Cells(1,1).Value = 'Python Rules!'
>>> xlApp.ActiveWorkbook.ActiveSheet.Cells(1,1).Value = 'Python Rules!'
>>> xlApp.Workbooks("Book1").Sheets("Sheet1").Cells(1,1).Value = "Python Rules!"
>>> xlApp.Workbooks(1).Sheets(1).Cells(1,1).Value = "Python Rules!"
>>> xlApp.Workbooks(1).Sheets(1).Cells(1,1).Value = "Python Rules!"
>>>
>>> xlBook = xlApp.Workbook(1)
>>> xlSheet = xlApp.Sheets(1)
>>> xlSheet.Cells(1,1).Value = "Python Rules!"
>>>
We recommend getting a reference to the sheet you want and working with that.
The last few examples illustrate a common feature of Microsoft object modelstheir dependence on collections. A collection can be viewed as a cross between a list and a dictionary; it can be accessed by a numeric index or a named string key. Python allows you to access items via numeric indexes with both a function-call and an array syntax: in other words, with parentheses or square brackets. If you are using a string key, you must use parentheses:
>>> xlBook.Sheets(1)
<win32com.gen_py.Microsoft Excel 8.0 Object Library._Worksheet>
>>> xlBook.Sheets[1]
<win32com.gen_py.Microsoft Excel 8.0 Object Library._Worksheet>
>>> xlBook.Sheets["Sheet1"]
# Some error details omitted
TypeError: Only integer indexes are supported for enumerators
>>>
However, there is a trap to watch for. Collections can be written to start indexing from one or from zero. If you are dealing with a one-based collection, you will get different answers; using square brackets gives you the true position in the collection, but parentheses gives the position according to the numbering system chosen by the author of that collection:
>>> xlBook.Sheets(1).Name
'Sheet1'
>>> xlBook.Sheets[1].Name
'Sheet2'
>>>
We recommend using parentheses throughout and relying on the object model's documentation to find how the collections work. For Microsoft Office applications, most collections start at 1.
Both Python and Excel support keyword arguments. These are generally used when you have a long list of possible arguments to a function, most of which have default values. Excel takes this to extremes; for example, the function to save a workbook is:
WorkBook.SaveAs(Filename, FileFormat, Password, WriteResPassword, ReadOnlyRecommended,
CreateBackup, AddToMru, TextCodePage, TextVisualLayout)
And that's a short one: some of the formatting commands have literally dozens of arguments. To call these, you supply only the arguments you want, as follows:
>>> xlBook.SaveAs(Filename='C:\\temp\\mysheet.xls')
>>>
The capitalization of the keywords must be exactly right. Microsoft generally uses mixed case for everything but Filename, but you'll undoubtedly have a fun time discovering more exceptions.
We put data into a cell with the Value property of a cell. We can also retrieve data the same way. This works with numbers and strings. Excel always stores numbers as floating-point values:
>>> xlSheet.Cells(1,1).Value = 'What shall be the number of thy counting?'
>>> xlSheet.Cells(2,1).Value = 3
>>> xlSheet.Cells(1,1).Value
'What shall be the number of thy counting?'
>>> xlSheet.Cells(2,1).Value
3.0
>>>
Excel users know that dates are just numbers with formatting applied. However, Excel keeps track of which cells are known to be dates and which are ordinary numbers internally. Excel (and most Windows applications, as well as COM) define a date as the number of days since 1/1/1900, while Python (and Unix) counts the number of seconds. If you want to add a date, you can work out the number yourself, but the Python COM framework provides a utility to make a COM date, which ensures that it's recognized properly but also accessible in Python format:
>>> import time
>>> now = time.time()
>>> now # how many seconds since 1970?
923611182.35
>>> import pythoncom
>>> time_object = pythoncom.MakeTime(now)
>>> int(time_object) # can get the value back…
923611182
>>> xlSheet.Cells(3,1).Value = time_object # …or send it
>>> xlSheet.Cells(3,1).Value
<time object at 188c080>
>>>
When you send a date to Excel, it automatically formats the cell as well.
If you want to insert a formula, use the formula property and enter the formula exactly as you would in Excel:
>>> xlSheet.Cells(4,1).Formula = '=A2*2'
>>> xlSheet.Cells(4,1).Value
6.0
>>> xlSheet.Cells(4,1).Formula
'=A2*2'
>>>
Finally, empty cells are represented by the Python value None:
>>> xlSheet.Cells(1,1).Value = None # clear a cell
>>> xlSheet.Cells(1,1).Value # returns None
>>>
We've been calling the Value and Formula methods without really knowing what they refer to. They are methods of an object called a Range, which refers to a range of cells. You can use Ranges to get hold of regions of a spreadsheet in several ways:
>>> myRange1 = xlSheet.Cells(4,1) # one-cell range
>>> myRange2 = xlSheet.Range("B5:C10") # excel notation
>>> myRange3 = xlSheet.Range(xlSheet.Cells(2,2), xlSheet.Cells(3,8))
>>>
A sheet has a Range() method that returns a Range object. The Range method can accept the usual Excel notation or a pair of one-cell Range objects defining the top left and bottom right. You can even perform intersections and unions to build nonrectangular ranges. Once you have a Range object, as mentioned earlier, you have 84 methods and 72 properties to play with. These cover all the formatting options including data.
In Chapter 7 we built a view of an account. To do this, we fetched a 2D array of data from Python and looped over it, putting one number at a time into a grid. When we first started playing with Python and Excel, we expected to have to do something similar. Not so! Ask a range for its value, and you get an array. You can set the value of a range to an array, too. At this point we've typed a few more items into our spreadsheet to refer to (see Figure 9-2).
Figure 9-2. Passing arrays between Python and Excel |
First, grab a horizontal array:
>>> xlSheet.Range('C3:E3').Value
((L'left', L'to', L'right'),)
>>>
Note that you get back Unicode strings, which you could convert to Python with a str() operation. When you asked for a single cell value earlier, the Python COM framework was smart enough to convert the Unicode string to a Python string; with a big array, you have to do the work.
Now, for a matrix with several rows and columns:
>>> xlSheet.Range('C5:D7').Value
((L'North', L'South'), (100.0, 200.0), (300.0, 400.0))
>>>
This returns a tuple of tuples, exactly the natural representation you would choose in Python. (For the rest of this section we use the term array to mean a Python structure of this shapea list of lists, tuple of tuples, or list of tuples.) Finally, look at a vertical row, taking the items in column F:
>>> xlSheet.Range('F2:F4').Value
((1.0,), (2.0,), (3.0,))
>>>
As before, you get a tuple of tuples.
You may be wondering what those extra commas are inside the parentheses. They are Python's way of marking a one-element tuple. The expressions 2, (2), and ((((2))) all evaluate to 2 in Python, as you would expect from the use of parentheses in mathematical formulae in any language. The comma tells Python it's looking at a one-element tuple rather than an expression to simplify.
Passing arrays between Excel and Python is fast. We tried passing a matrix with 100 rows and 100 columns on a Pentium 266. Sending it to Excel took 1.7 seconds; fetching back the same amount of data took just 0.07 seconds.
Having been through the mechanics, we will now build a class to make it slightly easier to get data in and out of Excel. It's easy already, but if you want to do a lot of work with Excel, you can certainly save a few lines of code in the users' scripts. An example of this can be found in the file exceldemos.py.
We've created a class called easyExcel. When an instance is created, it starts Excel. This class provides methods to open, create, and save files, and to get and set cell values and ranges. It can also deal with the Unicode strings and time objects if you wish. You could easily extend it to add new methods, but here are a selection that should be useful:
class easyExcel:
"""A utility to make it easier to get at Excel. Remembering
to save the data is your problem, as is error handling.
Operates on one workbook at a time."""
def __init__(self, filename=None):
self.xlApp = win32com.client.Dispatch('Excel.Application')
if filename:
self.filename = filename
self.xlBook = self.xlApp.Workbooks.Open(filename)
else:
self.xlBook = self.xlApp.Workbooks.Add()
self.filename = "
def save(self, newfilename=None):
if newfilename:
self.filename = newfilename
self.xlBook.SaveAs(newfilename)
else:
self.xlBook.Save()
def close(self):
self.xlBook.Close(SaveChanges=0)
del self.xlApp
Now put in methods to set and get cells. Users can specify a sheet name or index, row, and column:
def getCell(self, sheet, row, col):
"Get value of one cell"
sht = self.xlBook.Worksheets(sheet)
return sht.Cells(row, col).Value
def setCell(self, sheet, row, col, value):
"set value of one cell"
sht = self.xlBook.Worksheets(sheet)
sht.Cells(row, col).Value = value
def getRange(self, sheet, row1, col1, row2, col2):
"return a 2d array (i.e. tuple of tuples)"
sht = self.xlBook.Worksheet(sheet)
return sht.Range(sht.Cells(row1, col1), sht.Cells(row2, col2)).Value
When you want to insert a block of data, just specify the first cell; there's no need for users to work out the number of rows:
def setRange(self, sheet, leftCol, topRow, data):
"""insert a 2d array starting at given location.
Works out the size needed for itself"""
bottomRow = topRow + len(data) - 1
rightCol = leftCol + len(data[0]) - 1
sht = self.xlBook.Worksheets(sheet)
sht.Range(
sht.Cells(topRow, leftCol),
sht.Cells(bottomRow, rightCol)
).Value = data
Sometimes you need to grab a chunk of data when you don't know how many columns or even rows to expect. The following method scans down and right until it hits a blank: all that is needed is the starting point:
def getContiguousRange(self, sheet, row, col):
"""Tracks down and across from top left cell until it
encounters blank cells; returns the non-blank range.
Looks at first row and column; blanks at bottom or right
are OK and return None within the array"""
sht = self.xlBook.Worksheets(sheet)
# find the bottom row
bottom = row
while sht.Cells(bottom + 1, col).Value not in [None, "]:
bottom = bottom + 1
# right column
right = col
while sht.Cells(row, right + 1).Value not in [None, "]:
right = right + 1
return sht.Range(sht.Cells(row, col), sht.Cells(bottom, right)).Value
Arrays coming back often contain either Unicode strings or COM dates. You could convert these on a per-column basis as needed (sometimes there's no need to convert them), but here's a utility that returns a new array in which these have been cleaned up:
def fixStringsAndDates(self, aMatrix):
# converts all unicode strings and times
newmatrix = []
for row in aMatrix:
newrow = []
for cell in row:
if type(cell) is UnicodeType:
newrow.append(str(cell))
elif type(cell) is TimeType:
newrow.append(int(cell))
else:
newrow.append(cell)
newmatrix.append(tuple(newrow))
return newmatrix
The wrapper class now makes it easy to grab data out of a sheet. You can keep extending this when you needed a new function, e.g., searching for cells.
Now we'll write an import script using our wrapper. Excel is a good medium for entering financial data; users can enter data more quickly in Excel than in a database; Excel builds pick lists automatically; and it's easy to copy, paste, and rear-range the data quickly. However, data entered in this way isn't always properly validated. Imagine that someone in your organization is preparing a list of new invoices raised once a month and emailing the data in a standardized spreadsheet. You want to import it, validate it, get the data into double-entry format, and save it in a BookSet, as part of assembling monthly management accounts. The examples for this chapter include a spreadsheet called invoices.xls, which looks like Figure 9-3.
Figure 9-3. Invoices to be imported |
You want to open this up, grab the three facts near the top, and import the matrix of cells describing the invoices. You're not sure how many invoices to expect, so you need to count down. If it's a cash sale (i.e., Date Raised is the same as Date Paid), create one transaction; otherwise create two: one for the bill and one for the payment. Note that some invoices are not yet paid; in this case, estimate a payment data for your forecasts and tag it as a scheduled transaction so that it can be filtered out of real accounts.
The full script to do this is in the file exceldemos.py, which you can find at http://starship.python.net/crew/mhammond/ppw32/. Using the wrapper class, it becomes easy to acquire the desired data:
def getInvoices():
# the demo - get some data from a spreadsheet, parse it, make
# transactions, save
# step 1 - acquire the data
spr = easyExcel('Invoices.xls')
MonthEnd = int(spr.getCell('Sheet1', 3, 2))
PreparedBy = spr.getCell('Sheet1', 4, 2)
Submitted = int(spr.getCell('Sheet1', 5, 2))
print 'Month end %s, prepared by %s, submitted %s' % (
time.ctime(MonthEnd),
PreparedBy,
time.ctime(Submitted)
)
# do not know how many rows
rawInvoices = spr.getContiguousRange('Sheet1',8,1)
rows = spr.fixStringsAndDates(rawInvoices)
We've extracted the needed data and cleaned it up in just a few lines. Logically, the next stage is to validate the data. You could go a long way with this, but let's just check that the main table is in the correct place and hasn't been rearranged:
# check correct columns
assert rows[0] == ('Invoice No', 'Date Raised',
'Customer', 'Comment', 'Category',
'Amount', 'Terms', 'Date Paid'
), 'Column structure is wrong!'
print '%d invoices found, processing' % len(rows)
The data looks fine; now all you have to do is build the transactions you want:
# make a BookSet to hold the data
bs = BookSet()
# process the rows after the headings
for row in rows[1:]:
# unpack it into separate variables
(invno, date, customer, comment,
category, fmt_amount, terms, datepaid) = row
# amounts formatted as currency may be returned as strings
amount = string.atof(fmt_amount)
if date == datepaid:
# cash payment, only one transaction
tran = Transaction()
tran.date = date
tran.comment = 'Invoiced - ' + comment
tran.customer = customer
tran.invoiceNo = invno
tran.addLine('MyCo.Capital.PL.Income.' + category, - amount)
tran.addLine('MyCo.Assets.NCA.CurAss.Cash', amount)
bs.add(tran)
else:
# need to create an invoice and a (possibly future) payment
# first the bill
tran = Transaction()
tran.date = date
tran.comment = 'Invoiced - ' + comment
tran.customer = customer
tran.invoiceNo = invno
tran.addLine('MyCo.Capital.PL.Income.' + category, - amount)
tran.addLine('MyCo.Assets.NCA.CurAss.Creditors', amount)
bs.add(tran)
# now the payment. If not paid, use the terms to estimate a
# date, and flag it as a Scheduled transaction (i.e., not real)
tran = Transaction()
if datepaid == None:
datepaid = date + (terms * 86400)
tran.mode = 'Scheduled' # tag it as not real
tran.date = date
tran.comment = 'Invoice Paid - ' + comment
tran.customer = customer
tran.invoiceNo = invno
tran.addLine('MyCo.Assets.NCA.CurAss.Creditors', - amount)
tran.addLine('MyCo.Assets.NCA.CurAss.Cash', amount)
bs.add(tran)
# we're done, save and pack up
filename = 'invoices.dtj'
bs.save(filename)
print 'Saved in file', filename
spr.close()
Although the code is fairly lengthy, most of it is transferring a field at a time from the input to the relevant field or line of a transaction. If you define classes to represent invoices and payments, each with a standard set of attributes and the right constructors, this is simplified even further.
Excel makes a wonderful frontend for financial applications. We've already built a COM server that can handle all of our data and return certain views. There's a strong case for turning things on their heads and having Excel use Python COM objects in many circumstances. We won't go through an example here, as we have covered all the necessary techniques in detail, but it's worth thinking about what's possible.
Imagine you regularly generate a set of management accounts, including results for the year to date and a forecast a year ahead, and that you extend this with various views of the data. You could easily build an Excel spreadsheet that starts a COM server when opened. Where our VB GUI had to do a double loop over the data to get it into a grid on screen, Excel can insert whole arrays into worksheets in a split second. You can configure sheets within the workbook to display arbitrary views of the data and link charts to these views. Some of these would be user-configurable; for example, selecting an account or customer from a combo box at the top of a sheet could fill the sheet with the relevant data, and a chart below it would be updated automatically.
This is the ideal format in which to deliver numbers to a user; they can immediately start doing their own calculations and building charts or comparisons of the data that interest them.
You could as easily build import applications that allow users to prepare data for import and click a submit button to save it into a BookSet. This involves either writing the table-processing code in VB or extending the COM server with some import functionality.
When should you use Python as the client and when as the server? There are two factors to consider:
Who's starting things off?
If a user wants to work interactively with a spreadsheet, that implies Excel on top and a Python COM server. If a spreadsheet import/export is one step in the execution of a much larger Python script that runs over 200 files at 2:00 a.m., do it the other way around.
Does it require complex formatting or manipulation in Excel?
If you're getting data out of Excel, you need to know only the cell references, which we've covered. If you want to build complex spreadsheets in the middle of the night, where the formatting depends on the data, consider writing the code in VBA. The spreadsheet can have an AutoOpen macro that starts up a Python COM server to deliver the data, and the fetching and formatting can be handled in Excel. The code will be the same length, but a lot easier to develop and debug in VBA.
In this chapter we looked at the other side of the coin, client-side COM. We learned how to control office applications from Python scripts and developed some reusable tools that allow us to easily extract data from and send data to Microsoft Excel.
Excel Visual Basic for Applications online help
This is not installed under the standard Office Setup; you need to rerun Office Setup, choose Custom Setup, and explicitly select the VBA help.
Your Python installation
After reading Chapter 5, Introduction to COM, you'll discover that Python can build a complete wrapper around Excel's object model, documenting all objects and properties. This doesn't tell you what all the methods do but provides a comprehensive and remarkably readable listing of the objects and methods.
Microsoft Office 97 Developer's Handbook, Microsoft Press
This book describes how to build custom applications using all the Office applications. There is a general overview of Excel but not enough detail on Range objects.
Microsoft Excel 97 Developer's Handbook, Microsoft Press
This book devotes 500 pages specifically to building applications in Excel. Regrettably, the majority focuses on GUI development and perhaps only 10% of the book relates to what you might use with Python. If you are developing routines in Excel VBA, this is the best reference.