pattern.db

The pattern.db module includes wrappers for popular databases (SQLite + MySQL) and Unicode CSV files. It offers a convenient way to work with tabular data, for example retrieved with the pattern.web module.

It can be used by itself or with other pattern modules: web | db | en | search  | vector | graph.


Documentation

 


Database

A Database is a collection of tables. A Table contains rows of data, with a specific data type (e.g., string, integer) for each field (column). A database is typically managed by a robust engine such as MySQL. Statements written in SQL (Structured Query Language) are used to interact with the engine. The pattern.db module is a wrapper for SQL. It supports the SQLITE and MYSQL engines.

SQLite is already included with Python 2.5+. With older versions you need to install PySQLite. To use MySQL, you need to install MySQL and MySQL-Python. If you have a 64-bit Python you'll need a 64-bit MySQL and MySQLdb.

Database() creates (if necessary) and returns a database. With SQLITE, this will generate a file with the given name in the current folder.

db = Database(
      name, 
      host = 'localhost', 
      port = 3306, 
  username = 'root', 
  password = '', 
      type = SQLITE
)
db.type                     # SQLITE | MYSQL
db.name                     # Database name.
db.host                     # Database host.
db.port                     # Database port.
db.username                 # Database username (MySQL).
db.password                 # Database password (MySQL).
db.tables                   # Dictionary of (name, Table)-items.
db.relations                # List of relations, see Database.link().
db.query                    # Last SQL query string.
db.connected                # True after Database.connect(). 
db.connect()                # Happens automatically.
db.disconnect()
db.create(table, fields=[])
db.remove(table)
db.link(table1, field1, table2, field2, join=LEFT) 
db.execute(SQL, commit=False)
db.commit()
db.escape(value)            # "a cat's tail" => "'a cat''s tail'"
  • Database.execute() returns the results from a raw SQL query (see the sketch below).
  • Database.commit() commits the changes after an INSERT, UPDATE or DELETE SQL query.
  • Database.escape() safely quotes and escapes a field value.
    Also see the source code for abs() and cmp() when constructing a raw SQL query.
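For example, a minimal sketch of a raw query (assuming a pets table like the one created in the next section):

>>> from pattern.db import Database
>>> db = Database('my_stuff')
>>> # A raw SELECT; escape() adds quotes and escapes the value safely.
>>> print db.execute('select * from pets where name=' + db.escape('Taxi'))
>>> # INSERT / UPDATE / DELETE statements need a commit to be stored:
>>> db.execute('delete from pets where id=1', commit=False)
>>> db.commit()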

Creating a table

Database.create() creates a new table in the database, from a given table name and a list of row fields, defined with the field() function:

field(name, type=STRING, default=None, index=False, optional=True)

Each field has a name and a type, with an optional default value for new rows. Field names should contain only characters (a-z) and underscores. The default value for a DATE field is NOW (current date & time). With optional=True, the field is allowed to contain None.

Type     Value                          Example
STRING   str, unicode (1-255 chars)     u'Schrödinger'
INTEGER  int                            42
FLOAT    float                          3.14159
TEXT     str, unicode                   open('file.txt').read()
BLOB     str (binary: PDF, PNG, ...)    Database.binary(open('cat.jpg').read())
BOOLEAN  bool                           True, False
DATE     Date                           date('1999-12-31 23:59:59')

By default, a STRING field can hold up to 100 characters, but the length (1-255) can be changed by calling STRING like a function: e.g., STRING(255). For longer strings, use TEXT.

With index=True, the field is indexed for faster search. The index can also be set to UNIQUE (no duplicates allowed in this column) or PRIMARY. A table should have a primary key field that uniquely identifies each row (i.e., an id). You can use the pk() function, which creates an INTEGER primary key field named id. Integer primary keys are auto-numbered – no need to set the value manually in a new row.

pk(name='id') # type=INTEGER + index=PRIMARY + optional=False 

For example:

>>> from pattern.db import Database, field, pk, STRING, BOOLEAN, DATE, NOW, date
>>> db = Database('my_stuff')
>>> db.create('pets', fields=[
>>>     pk(),
>>>     field('name', STRING(80), index=True),
>>>     field('type', STRING(20)),
>>>     field('tail', BOOLEAN),
>>>     field('date_birth', DATE, default=None),
>>>     field('date_created', DATE, default=NOW)
>>> ])
>>> db.pets.append(name=u'Schrödinger', type='cat', tail=True)
>>> print db.pets.rows()[0]

[1, u'Schrödinger', u'cat', True, None, '2011-08-17 19:16:46']

Creating a table from XML

You can also pass Table.xml or Query.xml (see below) to Database.create(). An optional name parameter can then be used to give the new table a different name.

The optional field parameter is a function that takes each field name from the XML and returns a new field name to be used in the table. This is useful for Query.xml, where field names may contain a period (e.g., pets.name), which is not allowed in a table. 

Alternatively, use the Query.aliases dictionary to define custom field names for the XML.
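For example, a minimal sketch (the pets_copy / pet_names names and the renaming function are illustrative; the XML string is assumed to be passed as the first argument):

>>> # Recreate a table from its own XML schema, under a new name:
>>> db.create(db.pets.xml, name='pets_copy')
>>> # Create a table from a query's XML, renaming fields like 'pets.name' to 'pets_name':
>>> q = db.pets.search(fields=('id', 'name'))
>>> db.create(q.xml, name='pet_names', field=lambda s: s.replace('.', '_'))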

 


Table

A Table is a list of rows, consisting of one or more fields (i.e., table columns) of a certain type (i.e., strings or numbers). New tables can be created with the Database.create() method. An error is raised if a table with the given name already exists. Existing tables can be retrieved with Database.tables[name], or Database.<name>.

table = Database.tables[name]
table.db                    # Parent Database.
table.name                  # Table name (characters a-z + underscore).
table.fields                # List of field names (i.e., columns).
table.schema                # Dictionary of (field, Schema)-items.
table.default               # Dictionary of (field, value)-items for new rows.
table.pk                    # Primary key field.
table.count()               # Total number of rows.
table.rows()                # List of rows, each a list of fields.
table.append(values={}, commit=True)
table.update(id, values={}, commit=True)
table.remove(id, commit=True)
table.filter(fields, filters)
table.search(*args, **kwargs) 
table.xml                   # XML-string with the table schema and rows. 
table.datasheet             # Datasheet object (see below).
  • Table.append(), Table.update() and Table.remove() modify the table contents – see further.
  • With commit=False, changes are only reflected after a Database.commit() (= faster in batch);
    see the sketch below.
  • Table.filter() returns a subset of rows with a subset of fields – see further.
    For example: table.filter('name', type='cat').
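For example, a minimal batch-insert sketch (assuming the pets table created above):

>>> for name, kind in ((u'Kitty', 'cat'), (u'Taxi', 'dog')):
>>>     db.pets.append(name=name, type=kind, commit=False)
>>> db.commit() # Write all appended rows in one go.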

Table schema

The Table.schema dictionary has a Schema object for each field name, containing field information:

schema = Table.schema[fieldname]
schema.name                 # Field name.
schema.type                 # STRING, INTEGER, FLOAT, TEXT, BLOB, BOOLEAN, DATE 
schema.length               # STRING field length.
schema.default              # Default value.
schema.index                # PRIMARY | UNIQUE | True | False
schema.optional             # True or False. 

For example:

>>> print db.pets.fields
>>> print db.pets.schema['name'].type
>>> print db.pets.schema['name'].length

['id', 'name', 'type', 'tail', 'date_birth', 'date_created']
'string'
80 

Append, update and remove rows

Table.append() adds a new row with the given field values. If the table has a primary key generated with pk(), it returns the new row's id (None otherwise).

>>> db.pets.append(name=u'Schrödinger', date_birth=date('2009-08-12'))
>>> db.pets.append({'name': u'Schrödinger', 'date_birth': date('2009-08-12')})

If a field name cannot be used as a keyword argument (e.g., a Unicode name such as data_de_creació), the field values can also be passed as a dictionary.

Table.update() updates the values in the row with the given primary key. Instead of a primary key value, a selection of rows can also be updated using any() or all() (see further). In the third example, all rows with type='cat' will have their tail field set to True.

>>> db.pets.update(1, type='cat') # set type='cat' in row with id=1.
>>> db.pets.update(1, {'type': 'cat'})
>>> db.pets.update(all(filter('type', 'cat')), tail=True)

Table.remove() removes the row with the given primary key:

>>> db.pets.remove(1)
>>> db.pets.remove(ALL)
>>> db.pets.remove(all(filter('type', 'cat'), filter(year('date_birth'), 1990, '<')))

The last example removes all rows with type='cat' and year of birth before 1990.

Note: to insert binary data into a BLOB field, wrap it in Database.binary().
For example: db.images.append(src=db.binary(open('cat.png').read()))

Filter rows

Table.filter() returns a list of rows filtered by field value(s), where each row is a list of fields. Its first argument defines which fields to yield for each row – either ALL, a field name or a list of field names. The second argument is a dictionary of field constraints. These can also be given as optional keyword parameters:

>>> db.pets.filter('name')
>>> db.pets.filter(('id', 'name'))
>>> db.pets.filter(ALL, type='cat')
>>> db.pets.filter(ALL, type=('cat', 'dog'))   # type='cat' OR type='dog'
>>> db.pets.filter(ALL, type='cat', tail=True) # type='cat' AND tail=True 
>>> db.pets.filter('id', {'type': 'cat', 'tail': True})

For more intricate queries a Query object can be used.

 


Query

Table.search() returns a Query object with options for filtering, sorting and ordering rows by field value(s). It can also include fields from other, related tables.

query = Table.search( 
     fields = ALL,
    filters = [],
  relations = [],
       sort = None,
      order = ASCENDING,
      group = None,
   function = FIRST,
      range = None
)
query.table                 # Parent Table.
query.fields                # Field name, list of field names or ALL.
query.aliases               # Dictionary of (field name, alias)-items.
query.filters               # List of filter() items.
query.relations             # List of relation() items.
query.sort                  # Field name or list of field names.
query.order                 # ASCENDING | DESCENDING
query.group                 # Field name or list of field names.
query.function              # FIRST, LAST, COUNT, MIN, MAX, SUM, AVG, CONCATENATE
query.range                 # (start, stop)-tuple, e.g. rows 11-20.
query.rows()                # List of rows, each a list of fields.
query.record(row)           # Dictionary of (field, value)-items for given row.
query.xml                   # XML-string with the query schema and rows.
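For example, a minimal sketch of sorting and slicing query results (assuming the pets table created earlier):

>>> from pattern.db import DESCENDING
>>> q = db.pets.search(
>>>      fields = ('name', 'date_birth'),
>>>        sort = 'date_birth',
>>>       order = DESCENDING,     # Youngest pets first.
>>>       range = (0, 10))        # The first 10 rows of the result.
>>> for row in q.rows():
>>>     print q.record(row)['name']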

Query filters

The filter() function creates a field-value constraint that will match certain rows in a table. A list of such filters can be passed to the filters parameter of the Query object.

filter(field, value, comparison='=')
Comparison  Description                  Example
=           equal to                     filter('type', ('cat', 'dog'), '=')
i=          equal to (case-insensitive)  filter('name', 'tig*', 'i=')
!=          not equal to                 filter('name', '*y', '!=')
>           greater than                 filter('weight', 10, '>')
<           less than                    filter('weight', 10, '<')
>=          greater than or equal to     filter(year('date'), 1999, '>=')
<=          less than or equal to        filter(year('date'), 2002, '<=')
:           between (inclusive)          filter(year('date'), (1999, 2002), ':')

To work with a specific part of a DATE field (e.g., the year), wrap it in a year(), month(), day(), hour(), minute() or second() function. 

Multiple filters can be chained together in an all() or any() list, using AND or OR logic respectively:

all(filter1, filter2, ...) # Rows must match all of the filters.
any(filter1, filter2, ...) # Rows must match one of the filters.

For example:

>>> from pattern.db import Database, filter, all, any
>>> db = Database('my_stuff')
>>> db.pets.search(filters=all(filter('name', 'Taxi'), filter('type', 'cat')))
>>> db.pets.search(filters=any(filter('name', 'Taxi'), filter('type', 'cat')))

The first query matches all cats named Taxi (i.e., name='Taxi' AND type='cat'). The second query matches any pet that is a cat or that is named 'Taxi' (i.e., name='Taxi' OR type='cat'). 

Furthermore, all() and any() can be nested to create complex search criteria. The example below matches all pets that are cats and whose name starts with K- or ends with -y:

>>> f = all()
>>> f.append(filter('type', 'cat'))
>>> f.append(any(filter('name', 'K*'), filter('name', '*y')))
>>> db.pets.search(filters=f)

Query relations

Relations are used to include fields from other tables. Say we have a products table and an orders table. We don't want all the product data in the orders table – this belongs in the products table so duplicate data is avoided (this is called database normalization). Each order contains the product id. Each product id can occur in multiple orders. When we run an invoice query for an order, we fetch the product description, price, etc. using a relation.

The relation() function defines a relation to another table, between two fields in the tables (usually id's):

relation(field1, field2, table, join=LEFT) # INNER | LEFT | RIGHT | FULL
  • LEFT selects all rows from the base table, with additional fields from the related table (if found).
    If there is no link between field1 and field2, the fields from the related table contain None.
  • RIGHT is the reverse of LEFT.
  • FULL means LEFT + RIGHT.
  • INNER is like LEFT, but if there is no link between field1 and field2 that row is omitted.

Below is an example of a simple products / customers / orders database:

products
id  description    price
1   pizza          15
2   garlic bread   3

customers
id  name
1   Schrödinger
2   Hofstadter

orders
id  product_id  customer_id
1   1           2

>>> from pattern.db import Database, field, pk, INTEGER
>>> db = Database('pizza_delivery')
>>> db.create('products', (pk(), field('description'), field('price', INTEGER)))
>>> db.create('customers', (pk(), field('name')))
>>> db.create('orders', (pk(), field('product_id', INTEGER), field('customer_id', INTEGER)))
Add data:
>>> db.products.append(description='pizza', price=15)
>>> db.products.append(description='garlic bread', price=3)
>>> db.customers.append(name=u'Schrödinger')
>>> db.customers.append(name=u'Hofstadter')
>>> db.orders.append(product_id=1, customer_id=2)
Hofstadter orders a pizza. We use a query with relations between the tables to display the order:
>>> from pattern.db import Database, relation, LEFT
>>> db = Database('pizza_delivery')
>>> q = db.orders.search(
>>>      fields = ('id', 'customers.name', 'products.description', 'products.price'),
>>>   relations = (
>>>         relation('customer_id', 'id', 'customers', join=LEFT),
>>>         relation('product_id', 'id', 'products', join=LEFT)
>>>     )
>>> )
>>> print q.fields
>>> print q.rows()

['id', 'customers.name', 'products.description', 'products.price']
[[1, u'Hofstadter', u'pizza', 15]] 
Note: if a relation is used often, you can define it once with Database.link(). It will then be available in every Query.
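For example, a minimal sketch following the Database.link(table1, field1, table2, field2) signature shown at the top (passing the Table objects, rather than table names, is an assumption here):

>>> db.link(db.orders, 'product_id', db.products, 'id', join=LEFT)
>>> db.link(db.orders, 'customer_id', db.customers, 'id', join=LEFT)
>>> # Every subsequent db.orders.search() can now use fields from products and customers.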

Grouping rows

Rows in a Query are folded on duplicate values in the given group field. The given function is applied to the other fields. It can also be a list with a separate function for each field.

Function     Field            Description
FIRST        any              The first field in the group.
LAST         any              The last field in the group.
COUNT        any              The number of rows in the group.
MIN          INTEGER + FLOAT  The lowest field value in the group.
MAX          INTEGER + FLOAT  The highest field value in the group.
SUM          INTEGER + FLOAT  The sum of all field values in the group.
AVG          INTEGER + FLOAT  The average of all field values in the group.
STDEV        INTEGER + FLOAT  The standard deviation (= variation from average).
CONCATENATE  STRING           Joins all field values with a comma.

For example, to get the total revenue per ordered product:

>>> print db.orders.search(
>>>        fields = ('products.description', 'products.price'),
>>>     relations = [relation('product_id', 'id', 'products')],
>>>         group = 'products.description', # Fold products with same description.
>>>      function = SUM
>>> ).rows()

To get the sum of a column for all rows in a table or query, simply use sum():

>>> print db.products.filter(sum('price'))[0][0]

 


View

View is a thin interface between a Table and an app. The View.render() method must be overridden in a subclass to yield rows in a certain output format (e.g., HTML). The outside world does not need to know about Table or Query – only what parameters can be fed to render().

If the given table does not exist, the constructor will call View.setup() with the given schema of field() items to create it.

view = View(database, table_name, schema=[])
view.db                     # Parent Database.
view.table                  # Parent Table.
view.schema                 # List of field()-items to create the table.
view.setup(overwrite=False)
view.render(*args, **kwargs) 

For example, the following view renders an HTML <table> with products that match a search query:

>>> from pattern.db import Database, View, field, pk, filter, STRING, FLOAT
>>> db = Database('pizza_delivery')
>>>
>>> class Products(View):
>>>     def __init__(self, database):
>>>         View.__init__(self, database, 'products', schema=[
>>>             pk(),
>>>             field('description', STRING),
>>>             field('price', FLOAT)
>>>         ])
>>>
>>>     def render(self, q='', **kwargs):
>>>         results = self.table.search(
>>>              fields = ['description', 'price'],
>>>             filters = [filter('description', '*' + q + '*')])
>>>         html = []
>>>         for row in results.rows():
>>>             r = []
>>>             for i, f in enumerate(results.fields):
>>>                 r.append('<td class="%s">%s</td>' % (f, row[i]))
>>>             html.append('<tr>%s</tr>' % ''.join(r))
>>>         return '<table>' + ''.join(html) + '</table>'
>>>
>>> print Products(db).render(q='bread')

'<table><tr><td class="description">garlic bread</td><td class="price">3</td>
 </tr></table>' 
When View is used as a CherryPy web page, render() will be called with args as a list of parts in the relative URL, and kwargs as a dictionary of all POST and GET variables sent from the client. For example, http://pizzadelivery.com/products/search?q=bread → ['products', 'search'] + {'q': 'bread'}.

 


Datasheet

Datasheet is a matrix of rows and columns, where each row and column can be retrieved as a Python list. The data is not stored in a database but can be imported / exported as a CSV file. The optional fields parameter is a list of (name, type)-items, where type can be STRING, TEXT, INTEGER, FLOAT, BOOLEAN, BLOB or DATE.

datasheet = Datasheet(rows=[], fields=None)
datasheet = Datasheet.load(path, separator=',', decoder=lambda v: v, headers=False)
datasheet.rows              # List of rows (each row = list of values).
datasheet.columns           # List of columns (each column = list of values).
datasheet.fields            # List of (name, type) column headers.
datasheet.<field>           # List of column values. 
datasheet[i]                # Row with index i.
datasheet[i, j]             # Value in row i at column j.
datasheet[i1:i2, j]         # Slice of column j from rows i1-i2.
datasheet[i, j1:j2]         # Slice of columns j1-j2 from row i.
datasheet[i1:i2, j1:j2]     # Datasheet with columns j1-j2 from rows i1-i2.
datasheet[:]                # Datasheet copy. 
datasheet.insert(i, row, default=None)
datasheet.append(row, default=None)
datasheet.extend(rows, default=None)
datasheet.copy(rows=ALL, columns=ALL) 
datasheet.group(j, function=FIRST, key=lambda value: value)
datasheet.save(path, separator=',', encoder=lambda v: v, headers=False)
datasheet.json              # JSON-formatted string.
  • Datasheet.insert() and append() fill missing columns (at the right) with a default value.
  • Datasheet.columns.insert() and append() fill missing rows (bottom) with the default.
    An optional field parameter can be used to supply a (name, type) column header. 
  • Datasheet.copy() returns a new Datasheet from a selective list of row and/or column indices.
    To flip a datasheet, do: datasheet.copy(rows=datasheet.columns).
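For example, a minimal sketch of indexing and column access (the sample values are illustrative):

>>> from pattern.db import Datasheet, INTEGER, STRING
>>> ds = Datasheet(
>>>      rows = [[1, u'Schrödinger'], [2, u'Hofstadter']],
>>>    fields = [('id', INTEGER), ('name', STRING)])
>>> print ds[0, 1]      # u'Schrödinger'
>>> print ds[0:2, 1]    # [u'Schrödinger', u'Hofstadter']
>>> print ds.columns[1] # The same column, by index.
>>> print ds.name       # The same column, by field name.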

Grouping rows

Datasheet.group() returns a new Datasheet with unique values in column j, by grouping rows with the given function. The function takes a list of column values and returns a single value. Predefined functions are: FIRST, LAST, COUNT, MIN, MAX, SUM, AVG, STDEV, CONCATENATE. It can also be a list of functions, one for each column. A TypeError is raised when a function cannot handle the data in its column.

The key argument can be used to prepare the values in column j for comparison, for example: key=lambda date: date.year groups a column of Date objects by year (see the sketch after the examples below).

>>> from pattern.db import Datasheet, uid, pprint
>>> ds = Datasheet(rows=[
>>>     [uid(), u'Schrödinger', 'cat'],
>>>     [uid(), u'Hofstadter',  'cat'],
>>>     [uid(), u'Taxi',        'dog']])
>>> g = ds.copy(columns=[2,0]) # A copy with the type and id columns.
>>> g = g.group(0, COUNT)      # Group by type, count rows per type.
>>> pprint(g, fill='')

cat  2 
dog  1
The example below uses a custom concatenate function:
>>> g = ds.copy(columns=[2,1])
>>> g = g.group(0, function=lambda list: '+'.join(list))
>>> pprint(g, fill='')

cat  Schrödinger+Hofstadter
dog  Taxi     
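A minimal sketch of the key argument, grouping a column of Date objects by year (the date / price layout here is illustrative):

>>> from pattern.db import Datasheet, date, SUM
>>> sales = Datasheet(rows=[
>>>     [date('2010-03-01'), 10],
>>>     [date('2010-06-01'), 20],
>>>     [date('2011-01-01'), 30]])
>>> # Rows with the same year are folded; SUM is applied to the price column.
>>> g = sales.group(0, function=SUM, key=lambda d: d.year)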

Sorting rows & columns

Datasheet.columns[j].sort() sorts the rows in the matrix according to the values in column j.
Datasheet.columns.sort() is useful when you set its order parameter, a list of column indices:

>>> ds.columns.sort(order=[0,2,1])
>>> pprint(ds, fill='')

1  cat  Schrödinger
2  cat  Hofstadter
3  dog  Taxi

Datasheet.columns.swap() swaps two individual columns with given indices.
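For example, a minimal sketch (continuing the sorted datasheet above):

>>> ds.columns.swap(1, 2) # Swap columns 1 and 2, restoring the id, name, type order.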

Helper functions: uid & pprint

The pattern.db module has two functions that are useful in combination with a Datasheet:

uid()
pprint(datasheet, truncate=40, padding=" ", fill=".")

The uid() function returns a unique number, starting from 1, useful for filling an id column.

The pprint() function pretty-prints the table, with columns aligned vertically. The truncate parameter defines the maximum column width, padding defines the spacing between columns, and fill the character used for spacing.

CSV import & export

CSV (comma-separated values) is a simple text format for tabular data, where each row is a new line and each value is separated by a comma.

Datasheet.save() exports the table as a CSV file.
Datasheet.load() returns a Datasheet from a given CSV file.

datasheet = Datasheet.load(path, separator=',', decoder=lambda v: v, headers=False)
datasheet.save(path, separator=',', encoder=lambda v: v, headers=False)

When saving a CSV file, str, int, float, bool and Date are converted to unicode. For other data types a custom encoder must be given. When loading a CSV file, all values in the datasheet will be unicode unless a custom decoder is given.

With headers=True, the headers in Datasheet.fields are saved to + loaded from CSV. In this case, the data type for each column is explicitly known and no encoder or decoder is necessary.

>>> from pattern.db import Datasheet, STRING, DATE, date
>>> ds1 = Datasheet(fields=[('name', STRING), ('date', DATE)])
>>> ds1.append([u'Schrödinger', date('August 12, 1887')])
>>> ds1.append([u'Hofstadter', date('February 15, 1945')])
>>> ds1.save('test.txt', headers=True)
>>> ds2 = Datasheet.load('test.txt', headers=True)
>>>
>>> print ds2.fields
>>> print ds2[0]
>>> print type(ds2[0,1])

[(u'name', 'string'), (u'date', 'date')]
[u'Schrödinger', Date('1887-08-12 00:00:00')]
<class 'pattern.db.Date'> 

 


Date

The Date object is a convenient subclass of Python's datetime.datetime. The date() function returns a Date object. A date can be generated from an integer (Unix timestamp), a string or NOW. Optionally, an input format (e.g. "%d/%m/%Y") and an output format for str() can be given. The default format is "YYYY-MM-DD hh:mm:ss". Use Date.timestamp to get a Unix timestamp.

date(int)
date(NOW, format=DEFAULT)
date(string)
date(string, format=DEFAULT)
date(string, inputformat, format=DEFAULT)
date(year, month, day, format=DEFAULT)
date(year, month, day, hours, minutes, seconds, format=DEFAULT)

If no input format is given, a number of common formats will be tried:

Format                    Example
%Y-%m-%d %H:%M:%S         2010-09-21 09:27:01
%a, %d %b %Y %H:%M:%S %z  Tue, 9 Sep 2010 17:58:28 +0000
%Y-%m-%dT%H:%M:%SZ        2010-09-20T09:27:01Z
%Y-%m-%dT%H:%M:%S+0000    2010-09-20T09:27:01+0000
%Y-%m-%d %H:%M            2010-09-20 09:27
%Y-%m-%d                  2010-09-20
%d/%m/%Y                  20/09/2010
%d %B %Y                  9 september 2010
%B %d %Y                  September 9 2010
%B %d, %Y                 September 09, 2010

All date formats used in pattern.web (e.g., a Bing search result) are automatically detected.
For a full overview of date format syntax, see: http://docs.python.org/library/time.html#time.strftime.
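For example, a minimal sketch with an explicit input format and output format:

>>> from pattern.db import date
>>> d = date('20/09/2010', '%d/%m/%Y', format='%Y-%m-%d')
>>> print d           # 2010-09-20
>>> print d.timestamp # Unix timestamp (seconds since 1970).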


Date calculations

The time() function can be used to add (or subtract) time to a Date:

time(days=0, seconds=0, minutes=0, hours=0)
>>> from pattern.db import date, time
>>> d  = date('23 august 2011')
>>> d += time(days=2, hours=5)
>>> print type(d)
>>> print d
>>> print d.year, d.month, d.day

<class 'pattern.db.Date'>
2011-08-25 05:00:00
2011 8 25 

 


See also

  • CherryPy (BSD): object-oriented HTTP framework for Python.
  • Django (BSD): model-view-controller framework for Python.