pattern.db
The pattern.db module includes wrappers for popular databases (SQLite + MySQL) and Unicode CSV files. It offers a convenient way to work with tabular data, for example retrieved with the pattern.web module.
It can be used by itself or with other pattern modules: web | db | en | search | vector | graph.
Documentation
Database
A Database is a collection of tables. A Table contains rows of data, with a specific data type (e.g., string, integer) for each field (column). A database is typically managed by a robust engine such as MySQL. Statements written in SQL (Structured Query Language) are used to interact with the engine. The pattern.db module is a wrapper for SQL. It supports the SQLITE and MYSQL engines.
SQLite is already included with Python 2.5+. With older versions you need to install PySQLite. To use MySQL, you need to install MySQL and MySQL-Python. If you have a 64-bit Python you'll need a 64-bit MySQL and MySQLdb.
Database() creates (if necessary) and returns a database. With SQLITE, this will generate a file with the given name in the current folder.
db = Database(
    name,
    host = 'localhost',
    port = 3306,
    username = 'root',
    password = '',
    type = SQLITE)
db.type       # SQLITE | MYSQL
db.name       # Database name.
db.host       # Database host.
db.port       # Database port.
db.username   # Database username (MySQL).
db.password   # Database password (MySQL).
db.tables     # Dictionary of (name, Table)-items.
db.relations  # List of relations, see Database.link().
db.query      # Last SQL query string.
db.connected  # True after Database.connect().
db.connect()      # Happens automatically.
db.disconnect()
db.create(table, fields=[])
db.remove(table)
db.link(table1, field1, table2, field2, join=LEFT)
db.execute(SQL, commit=False)
db.commit()
db.escape(value)  # "a cat's tail" => "'a cat\'s tail'"
- Database.execute() returns the results from a raw SQL query.
- Database.commit() commits the changes after an INSERT, UPDATE or DELETE SQL query.
- Database.escape() safely quotes and escapes a field value.
Also see the source code for abs() and cmp() when constructing a raw SQL query.
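For readers without pattern installed, the same execute / commit / escape workflow can be sketched with Python's stdlib sqlite3 module (table and column names below are made up for illustration; parameter binding plays the role of Database.escape()):

```python
import sqlite3

# In-memory database; Database('my_stuff') with SQLITE writes a file instead.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT)')

# The ? placeholder quotes and escapes the value safely, like Database.escape():
db.execute('INSERT INTO pets (name) VALUES (?)', ("a cat's tail",))
db.commit()  # Like Database.commit() after an INSERT / UPDATE / DELETE.

rows = db.execute('SELECT name FROM pets').fetchall()
print(rows)  # [("a cat's tail",)]
```

The design point is the same in both APIs: never paste raw user input into a SQL string; always escape or bind it.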
Creating a table
Database.create() creates a new table in the database, from a given table name and a list of row fields, defined with the field() function:
field(name, type=STRING, default=None, index=False, optional=True)
Each field has a name and a type, with an optional default value for new rows. Field names should contain only characters (a-z) and underscores. The default value for a DATE field is NOW (current date & time). With optional=True, the field is allowed to contain None.
Type | Value | Example |
STRING | str, unicode (1-255 characters) | u'Schrödinger' |
INTEGER | int | 42 |
FLOAT | float | 3.14159 |
TEXT | str, unicode | open('file.txt').read() |
BLOB | str (binary: PDF, PNG, ...) | Database.binary(open('cat.jpg').read()) |
BOOLEAN | bool | True, False |
DATE | Date | date('1999-12-31 23:59:59') |
By default, a STRING field can hold up to 100 characters, but the length (1-255) can be changed by calling STRING like a function: e.g., STRING(255). For longer strings, use TEXT.
With index=True, the field is indexed for faster search. The index can also be set to UNIQUE (no duplicates allowed in this column) or PRIMARY. A table should have a primary key field that uniquely identifies each row (i.e., an id). You can use the pk() function, which creates an INTEGER primary key field named id. Integer primary keys are auto-numbered – no need to set the value manually in a new row.
pk(name='id') # type=INTEGER + index=PRIMARY + optional=False
For example:
>>> from pattern.db import Database, field, pk, STRING, BOOLEAN, DATE, NOW, date
>>> db = Database('my_stuff')
>>> db.create('pets', fields=[
>>>     pk(),
>>>     field('name', STRING(80), index=True),
>>>     field('type', STRING(20)),
>>>     field('tail', BOOLEAN),
>>>     field('date_birth', DATE, default=None),
>>>     field('date_created', DATE, default=NOW)
>>> ])
>>> db.pets.append(name=u'Schrödinger', type='cat', tail=True)
>>> print db.pets.rows()[0]

[u'Schrödinger', u'cat', True, None, '2011-08-17 19:16:46']
Creating a table from XML
You can also pass Table.xml or Query.xml (see below) to Database.create(). An optional name parameter can then be used to give the new table a different name.
The optional field parameter is a function that takes each field name from the XML and returns a new field name to be used in the table. This is useful for Query.xml, where field names may contain a period (e.g., pets.name), which is not allowed in a table.
Alternatively, use the Query.aliases dictionary to define custom field names for the XML.
Table
A Table is a list of rows, consisting of one or more fields (i.e., table columns) of a certain type (i.e., strings or numbers). New tables can be created with the Database.create() method. An error is raised if a table with the given name already exists. Existing tables can be retrieved with Database.tables[name], or Database.<name>.
table = Database.tables[name]
table.db         # Parent Database.
table.name       # Table name (characters a-z + underscore).
table.fields     # List of field names (i.e., columns).
table.schema     # Dictionary of (field, Schema)-items.
table.default    # Dictionary of (field, value)-items for new rows.
table.pk         # Primary key field.
table.count()    # Total number of rows.
table.rows()     # List of rows, each a list of fields.
table.append(values={}, commit=True)
table.update(id, values={}, commit=True)
table.remove(id, commit=True)
table.filter(fields, filters)
table.search(*args, **kwargs)
table.xml        # XML-string with the table schema and rows.
table.datasheet  # Datasheet object (see below).
- Table.append(), update(), remove() modify the table contents – see further.
- With commit=False, changes are only reflected after a Database.commit() (= faster in batch).
- Table.filter() returns a subset of rows with a subset of fields – see further.
For example: table.filter('name', type='cat').
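The speedup from commit=False in a batch is easy to demonstrate with the stdlib sqlite3 module (an equivalent sketch, not pattern's own code): many inserts followed by a single commit, rather than one commit per row.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT)')

# Like Table.append(..., commit=False) in a loop + one Database.commit():
for i in range(1000):
    db.execute('INSERT INTO pets (name) VALUES (?)', ('pet%s' % i,))
db.commit()  # A single commit for the whole batch is much faster.

print(db.execute('SELECT COUNT(*) FROM pets').fetchone()[0])  # 1000
```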
Table schema
The Table.schema dictionary has a Schema object for each field name, containing field information:
schema = Table.schema[fieldname]
schema.name      # Field name.
schema.type      # STRING, INTEGER, FLOAT, TEXT, BLOB, BOOLEAN, DATE
schema.length    # STRING field length.
schema.default   # Default value.
schema.index     # PRIMARY | UNIQUE | True | False
schema.optional  # True or False.
For example:
>>> print db.pets.fields
>>> print db.pets.schema['name'].type
>>> print db.pets.schema['name'].length

['id', 'name', 'type', 'tail', 'date_birth', 'date_created']
'string'
80
Append, update and remove rows
Table.append() adds a new row with the given field values. If the table has a primary key generated with pk(), returns the row's id (None otherwise).
>>> db.pets.append(name=u'Schrödinger', date_birth=date('2009-08-12'))
>>> db.pets.append({'name': u'Schrödinger', 'date_birth': date('2009-08-12')})
If a field name is not allowed as a function parameter (e.g., Unicode names such as data_de_creació), the field values can be passed as a dictionary instead.
Table.update() updates the values in the row with the given primary key. Instead of a primary key value, a selection of rows can also be updated using any() or all() (see further). In the third example, all rows with type='cat' will have their tail field set to True.
>>> db.pets.update(1, type='cat') # set type='cat' in row with id=1.
>>> db.pets.update(1, {'type': 'cat'})
>>> db.pets.update(all(filter('type', 'cat')), tail=True)
Table.remove() removes the row with the given primary key:
>>> db.pets.remove(1)
>>> db.pets.remove(ALL)
>>> db.pets.remove(all(filter('type', 'cat'), filter(year('date_birth'), 1990, '<')))
The last example removes all rows with type='cat' and year of birth before 1990.
Note: to insert binary data into a BLOB field, wrap it in Database.binary().
For example: db.images.append(src=db.binary(open('cat.png').read()))
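The stdlib sqlite3 module has an analogous wrapper, sqlite3.Binary(); a sketch with made-up stand-in data (the table name and bytes are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE images (id INTEGER PRIMARY KEY, src BLOB)')

data = b'\x89PNG\r\n\x1a\n'  # Fake PNG header bytes standing in for an image.
# sqlite3.Binary() plays the role of Database.binary():
db.execute('INSERT INTO images (src) VALUES (?)', (sqlite3.Binary(data),))
db.commit()

stored = db.execute('SELECT src FROM images').fetchone()[0]
print(bytes(stored) == data)  # True
```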
Filter rows
Table.filter() returns a list of rows filtered by field value(s), where each row is a list of fields. Its first argument defines which fields to yield for each row – either ALL, a field name or a list of field names. The second argument is a dictionary of field constraints. These can also be given as optional keyword parameters:
>>> db.pets.filter('name')
>>> db.pets.filter(('id', 'name'))
>>> db.pets.filter(ALL, type='cat')
>>> db.pets.filter(ALL, type=('cat', 'dog')) # type='cat' OR type='dog'
>>> db.pets.filter(ALL, type='cat', tail=True) # type='cat' AND tail=True
>>> db.pets.filter('id', {'type': 'cat', 'tail': True})
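Under the hood such constraints amount to a SQL WHERE clause. A toy sketch of the translation (the where() helper is hypothetical, not part of pattern.db): a tuple of values becomes OR within one field, and separate fields are joined with AND.

```python
def where(constraints):
    # Build a parameterized WHERE clause from (field, value) constraints.
    parts, params = [], []
    for field, value in constraints.items():
        if isinstance(value, tuple):
            # type=('cat', 'dog') => (type=? OR type=?)
            parts.append('(%s)' % ' OR '.join('%s=?' % field for v in value))
            params.extend(value)
        else:
            parts.append('%s=?' % field)
            params.append(value)
    return ' AND '.join(parts), params

sql, params = where({'type': ('cat', 'dog'), 'tail': True})
print(sql)     # (type=? OR type=?) AND tail=?
print(params)  # ['cat', 'dog', True]
```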
For more intricate queries a Query object can be used.
Query
Table.search() returns a Query object with options for filtering, sorting and ordering rows by field value(s). It can also include fields from other, related tables.
query = Table.search(
    fields = ALL,
    filters = [],
    relations = [],
    sort = None,
    order = ASCENDING,
    group = None,
    function = FIRST,
    range = None)
query.table      # Parent Table.
query.fields     # Field name, list of field names or ALL.
query.aliases    # Dictionary of (field name, alias)-items.
query.filters    # List of filter() items.
query.relations  # List of relation() items.
query.sort       # Field name or list of field names.
query.order      # ASCENDING | DESCENDING
query.group      # Field name or list of field names.
query.function   # FIRST, LAST, COUNT, MIN, MAX, SUM, AVG, CONCATENATE
query.range      # (start, stop)-tuple, e.g. rows 11-20.
query.rows() # List of rows, each a list of fields.
query.record(row) # Dictionary of (field, value)-items for given row.
query.xml # XML-string with the query schema and rows.
Query filters
The filter() function creates a field-value constraint that will match certain rows in a table. A list of such filters can be passed to the filters parameter of the Query object.
filter(field, value, comparison='=')
Comparison | Description | Example |
= | equal to | filter('type', ('cat', 'dog'), '=') |
i= | equal to (case-insensitive) | filter('name', 'tig*', 'i=') |
!= | not equal to | filter('name', '*y', '!=') |
> | greater than | filter('weight', 10, '>') |
< | less than | filter('weight', 10, '<') |
>= | greater than or equal to | filter(year('date'), 1999, '>=') |
<= | less than or equal to | filter(year('date'), 2002, '<=') |
: | between (inclusive) | filter(year('date'), (1999, 2002), ':') |
To work with a specific part of a DATE field (e.g., the year), wrap it in a year(), month(), day(), hour(), minute() or second() function.
Multiple filters can be chained together in an all() or any() list, using AND or OR logic respectively:
all(filter1, filter2, ...) # Rows must match all of the filters.
any(filter1, filter2, ...) # Rows must match one of the filters.
For example:
>>> from pattern.db import Database, filter, all, any
>>> db = Database('my_stuff')
>>> db.pets.search(filters=all(filter('name', 'Taxi'), filter('type', 'cat')))
>>> db.pets.search(filters=any(filter('name', 'Taxi'), filter('type', 'cat')))
The first query matches all cats named Taxi (i.e., name='Taxi' AND type='cat'). The second query matches any pet that is a cat or that is named 'Taxi' (i.e., name='Taxi' OR type='cat').
Furthermore, all() and any() can be nested to create complex search criteria. The example below matches all pets that are cats, and whose name starts with a K- or ends with a -y:
>>> f = all()
>>> f.append(filter('type', 'cat'))
>>> f.append(any(filter('name', 'K*'), filter('name', '*y')))
>>> db.pets.search(filters=f)
Query relations
Relations are used to include fields from other tables. Say we have a products table and an orders table. We don't want all the product data in the orders table – this belongs in the products table so duplicate data is avoided (this is called database normalization). Each order contains the product id. Each product id can occur in multiple orders. When we run an invoice query for an order, we fetch the product description, price, etc. using a relation.
The relation() function defines a relation to another table, between two fields in the tables (usually id's):
relation(field1, field2, table, join=LEFT) # INNER | LEFT | RIGHT | FULL
- LEFT selects all rows from the base table, with additional fields from the related table (if found). If there is no link between field1 and field2, the fields from the related table contain None.
- RIGHT is the reverse of LEFT.
- FULL means LEFT + RIGHT.
- INNER is like LEFT, but rows with no link between field1 and field2 are omitted.
Below is an example of a simple products / customers / orders database:
>>> from pattern.db import Database, field, pk, INT
>>> db = Database('pizza_delivery')
>>> db.create('products', (pk(), field('description'), field('price', INT)))
>>> db.create('customers', (pk(), field('name')))
>>> db.create('orders', (pk(), field('product_id', INT), field('customer_id', INT)))
>>> db.products.append(description='pizza', price=15)
>>> db.products.append(description='garlic bread', price=3)
>>> db.customers.append(name=u'Schrödinger')
>>> db.customers.append(name=u'Hofstadter')
>>> db.orders.append(product_id=1, customer_id=2)
>>> from pattern.db import Database, relation, LEFT
>>> db = Database('pizza_delivery')
>>> q = db.orders.search(
>>>     fields = ('id', 'customers.name', 'products.description', 'products.price'),
>>>     relations = (
>>>         relation('customer_id', 'id', 'customers', join=LEFT),
>>>         relation('product_id', 'id', 'products', join=LEFT)
>>>     )
>>> )
>>> print q.fields
>>> print q.rows()

['id', 'customers.name', 'products.description', 'products.price']
[[1, u'Hofstadter', u'pizza', 15]]
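Relations with join=LEFT translate to SQL LEFT JOINs. The same invoice query, sketched directly against the stdlib sqlite3 module so it can be run without pattern installed:

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
    CREATE TABLE products  (id INTEGER PRIMARY KEY, description TEXT, price INTEGER);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, product_id INTEGER, customer_id INTEGER);
    INSERT INTO products (description, price) VALUES ('pizza', 15), ('garlic bread', 3);
    INSERT INTO customers (name) VALUES ('Schrödinger'), ('Hofstadter');
    INSERT INTO orders (product_id, customer_id) VALUES (1, 2);
''')
# relation('customer_id', 'id', 'customers', join=LEFT) == LEFT JOIN below:
rows = db.execute('''
    SELECT orders.id, customers.name, products.description, products.price
      FROM orders
      LEFT JOIN customers ON orders.customer_id = customers.id
      LEFT JOIN products  ON orders.product_id  = products.id
''').fetchall()
print(rows)  # [(1, 'Hofstadter', 'pizza', 15)]
```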
Grouping rows
Rows in a Query are folded on duplicate values in the given group field. The given function is applied to the other fields. It can also be a list with a separate function for each field.
Function | Field | Description |
FIRST | any | The first field in the group. |
LAST | any | The last field in the group. |
COUNT | any | The number of rows in the group. |
MIN | INTEGER + FLOAT | The lowest field value in the group. |
MAX | INTEGER + FLOAT | The highest field value in the group. |
SUM | INTEGER + FLOAT | The sum of all field values in the group. |
AVG | INTEGER + FLOAT | The average of all field values in the group. |
STDEV | INTEGER + FLOAT | The standard deviation (= variation from average). |
CONCATENATE | STRING | Joins all field values with a comma. |
For example, to get the total revenue per ordered product:
>>> print db.orders.search(
>>>     fields = ('products.description', 'products.price'),
>>>     relations = [relation('product_id', 'id', 'products')],
>>>     group = 'products.description', # Fold products with same description.
>>>     function = SUM
>>> ).rows()
To get the sum of a column for all rows in a table or query, simply use sum():
>>> print db.products.filter(sum('price'))[0][0]
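Grouping with an aggregate function corresponds to SQL's GROUP BY. A sketch of the revenue query with the stdlib sqlite3 module (a simplified one-table schema, invented for illustration):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
    CREATE TABLE orders (product TEXT, price INTEGER);
    INSERT INTO orders VALUES ('pizza', 15), ('pizza', 15), ('garlic bread', 3);
''')
# Fold rows with the same product and sum their prices (= group + SUM):
rows = db.execute(
    'SELECT product, SUM(price) FROM orders GROUP BY product ORDER BY product'
).fetchall()
print(rows)  # [('garlic bread', 3), ('pizza', 30)]
```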
View
View is a thin interface between a Table and an app. The View.render() method must be overridden in a subclass to yield rows in a certain output format (e.g., HTML). The outside world does not need to know about Table or Query – only what parameters can be fed to render().
If the given table does not exist, the constructor will call View.setup() with the given schema of field() items to create it.
view = View(database, table_name, schema=[])
view.db      # Parent Database.
view.table   # Parent Table.
view.schema  # List of field()-items to create the table.
view.setup(overwrite=False)
view.render(*args, **kwargs)
For example, the following view renders an HTML <table> with products that match a search query:
>>> from pattern.db import Database, View, field, pk, filter, string, STRING, FLOAT
>>> db = Database('pizza_delivery')
>>>
>>> class Products(View):
>>>     def __init__(self, database):
>>>         View.__init__(self, database, 'products', schema=[
>>>             pk(),
>>>             field('description', STRING),
>>>             field('price', FLOAT)
>>>         ])
>>>
>>>     def render(self, q, **kwargs):
>>>         query = self.table.search(
>>>             fields = ['description', 'price'],
>>>             filters = [filter('description', '*'+q+'*')])
>>>         html = []
>>>         for row in query.rows():
>>>             r = []
>>>             for i, f in enumerate(query.fields):
>>>                 r.append('<td class="%s">%s</td>' % (f, string(row[i])))
>>>             html.append('<tr>%s</tr>' % ''.join(r))
>>>         return '<table>' + ''.join(html) + '</table>'
>>>
>>> print Products(db).render(q='bread')

'<table><tr><td class="description">garlic bread</td><td class="price">3</td></tr></table>'
Datasheet
Datasheet is a matrix of rows and columns, where each row and column can be retrieved as a Python list. The data is not stored in a database but can be imported / exported as a CSV-file. Optionally, the given fields is a list of (name, type)-items where type can be STRING, TEXT, INTEGER, FLOAT, BOOLEAN, BLOB or DATE.
datasheet = Datasheet(rows=[], fields=None)
datasheet = Datasheet.load(path, separator=',', decoder=lambda v: v, headers=False)
datasheet.rows           # List of rows (each row = list of values).
datasheet.columns        # List of columns (each column = list of values).
datasheet.fields         # List of (name, type) column headers.
datasheet.<field>        # List of column values.
datasheet[i]             # Row with index i.
datasheet[i, j]          # Value in row i at column j.
datasheet[i1:i2, j]      # Slice of column j from rows i1-i2.
datasheet[i, j1:j2]      # Slice of columns j1-j2 from row i.
datasheet[i1:i2, j1:j2]  # Datasheet with columns j1-j2 from rows i1-i2.
datasheet[:]             # Datasheet copy.
datasheet.insert(i, row, default=None)
datasheet.append(row, default=None)
datasheet.extend(rows, default=None)
datasheet.copy(rows=ALL, columns=ALL)
datasheet.group(j, function=FIRST, key=lambda value: value)
datasheet.save(path, separator=',', encoder=lambda v: v, headers=False)
datasheet.json  # JSON-formatted string.
- Datasheet.insert() and append() fill missing columns (at the right) with a default value.
- Datasheet.columns.insert() and append() fill missing rows (at the bottom) with the default value. An optional field parameter can be used to supply a (name, type) column header.
- Datasheet.copy() returns a new Datasheet from a selective list of row and/or column indices.
- To flip a datasheet, do: datasheet.copy(rows=datasheet.columns).
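Copying and flipping are easy to picture with plain Python lists; a sketch of the idea behind copy() and the flip trick (not pattern's actual implementation):

```python
# A datasheet as a plain list of [id, name, type] rows:
rows = [[1, 'Schrödinger', 'cat'],
        [2, 'Hofstadter', 'cat'],
        [3, 'Taxi', 'dog']]

# A copy with only the type (j=2) and id (j=0) columns, like copy(columns=[2,0]):
subset = [[row[2], row[0]] for row in rows]
print(subset)  # [['cat', 1], ['cat', 2], ['dog', 3]]

# Flipping swaps rows and columns, like copy(rows=datasheet.columns):
flipped = [list(col) for col in zip(*rows)]
print(flipped[2])  # ['cat', 'cat', 'dog']
```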
Grouping rows
Datasheet.group() returns a new Datasheet with unique values in column j, by grouping rows with the given function. The function takes a list of column values and returns a single value. Predefined functions are: FIRST, LAST, COUNT, MIN, MAX, SUM, AVG, STDEV, CONCATENATE. It can also be a list of functions, one for each column. A TypeError is raised when a function cannot handle the data in its column.
The key argument can be used to prepare the values in column j for comparison, for example: key=lambda date: date.year groups a column of Date objects by year.
>>> from pattern.db import Datasheet, uid, pprint
>>> ds = Datasheet(rows=[
>>>     [uid(), u'Schrödinger', 'cat'],
>>>     [uid(), u'Hofstadter', 'cat'],
>>>     [uid(), u'Taxi', 'dog']])
>>> g = ds.copy(columns=[2,0]) # A copy with the type and id columns.
>>> g = g.group(0, COUNT)      # Group by type, count rows per type.
>>> pprint(g, fill='')

cat 2
dog 1

>>> g = ds.copy(columns=[2,1])
>>> g = g.group(0, function=lambda list: '+'.join(list))
>>> pprint(g, fill='')

cat Schrödinger+Hofstadter
dog Taxi
Sorting rows & columns
Datasheet.columns[j].sort() sorts the rows in the matrix according to the values in column j.
Datasheet.columns.sort() is useful when you set its order parameter, which is a list of column indices:
>>> ds.columns.sort(order=[0,2,1])
>>> pprint(ds, fill='')

1 cat Schrödinger
2 cat Hofstadter
3 dog Taxi
Datasheet.columns.swap() swaps two individual columns with given indices.
Helper functions: uid & pprint
The pattern.db module has two functions that are useful in combination with a Datasheet:
uid()
pprint(datasheet, truncate=40, padding=" ", fill=".")
The uid() function returns a unique number, starting from 1, useful for a column index.
The pprint() function pretty-prints the table, with columns aligned vertically. The truncate parameter defines the maximum column width, padding the spacing between columns, and fill the character used to pad values out to the column width.
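A toy re-implementation shows the idea behind truncate, padding and fill (this sketch is hypothetical and returns a string instead of printing rows one by one; it is not pattern's code):

```python
def pprint_rows(rows, truncate=40, padding=' ', fill='.'):
    # Align columns vertically: pad each value with `fill` up to the widest
    # value in its column (capped at `truncate`), join columns with `padding`.
    cols = list(zip(*rows))
    widths = [min(truncate, max(len(str(v)) for v in col)) for col in cols]
    return '\n'.join(
        padding.join(str(v)[:truncate].ljust(w, fill)
                     for v, w in zip(row, widths))
        for row in rows)

print(pprint_rows([[1, 'cat', 'Taxi'], [2, 'dog', 'Hofstadter']]))
# 1 cat Taxi......
# 2 dog Hofstadter
```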
CSV import & export
CSV (comma-separated values) is a simple text format for tabular data, where each row is a new line and each value is separated by a comma.
Datasheet.save() exports the table as a CSV file.
Datasheet.load() returns a Datasheet from a given CSV file.
datasheet = Datasheet.load(path, separator=',', decoder=lambda v: v, headers=False)
datasheet.save(path, separator=',', encoder=lambda v: v, headers=False)
When saving a CSV file, str, int, float, bool and Date are converted to unicode. For other data types a custom encoder must be given. When loading a CSV file, all values in the datasheet will be unicode unless a custom decoder is given.
With headers=True, the headers in Datasheet.fields are saved to + loaded from CSV. In this case, the data type for each column is explicitly known and no encoder or decoder is necessary.
>>> from pattern.db import Datasheet, STRING, DATE, date
>>> ds1 = Datasheet(fields=[('name', STRING), ('date', DATE)])
>>> ds1.append([u'Schrödinger', date('August 12, 1887')])
>>> ds1.append([u'Hofstadter', date('February 15, 1945')])
>>> ds1.save('test.txt', headers=True)
>>> ds2 = Datasheet.load('test.txt', headers=True)
>>>
>>> print ds2.fields
>>> print ds2[0]
>>> print type(ds2[0,1])

[(u'name', 'string'), (u'date', 'date')]
[u'Schrödinger', Date('1887-08-12 00:00:00')]
<class 'pattern.db.Date'>
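The stdlib csv module behaves like the headers=False case: every loaded value comes back as a string, and restoring types is up to you (paralleling the decoder argument). A self-contained round trip using an in-memory buffer instead of a file:

```python
import csv, io

buf = io.StringIO()
w = csv.writer(buf)
w.writerow(['name', 'born'])       # A header row, like headers=True.
w.writerow(['Schrödinger', 1887])
w.writerow(['Hofstadter', 1945])

buf.seek(0)
rows = list(csv.reader(buf))
print(rows[1])          # ['Schrödinger', '1887'] -- all values load as str.
born = int(rows[1][1])  # A "decoder" restoring the original type.
print(born)             # 1887
```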
Date
The Date object is a convenient subclass of Python's datetime.datetime. The date() function returns a Date object. A date can be generated from an integer (Unix timestamp), a string or NOW. Optionally, an input format (e.g. "%d/%m/%Y") and an output format for str() can be given. The default format is "YYYY-MM-DD hh:mm:ss". Use Date.timestamp to get a Unix timestamp.
date(int)
date(NOW, format=DEFAULT)
date(string)
date(string, format=DEFAULT)
date(string, inputformat, format=DEFAULT)
date(year, month, day, format=DEFAULT)
date(year, month, day, hours, minutes, seconds, format=DEFAULT)
If no input format is given, a number of common formats will be tried:
Format | Example |
%Y-%m-%d %H:%M:%S | 2010-09-21 09:27:01 |
%a, %d %b %Y %H:%M:%S %z | Tue, 9 Sep 2010 17:58:28 +0000 |
%Y-%m-%dT%H:%M:%SZ | 2010-09-20T09:27:01Z |
%Y-%m-%dT%H:%M:%S+0000 | 2010-09-20T09:27:01+0000 |
%Y-%m-%d %H:%M | 2010-09-20 09:27 |
%Y-%m-%d | 2010-09-20 |
%d/%m/%Y | 20/09/2010 |
%d %B %Y | 9 september 2010 |
%B %d %Y | September 9 2010 |
%B %d, %Y | September 09, 2010 |
All date formats used in pattern.web (e.g., a Bing search result) are automatically detected.
For a full overview of date format syntax, see: http://docs.python.org/library/time.html#time.strftime.
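The detection of common formats is easy to emulate with datetime.strptime: try each format in turn and return the first match. A sketch of the idea (the format list below is a small invented sample, not pattern's exact list):

```python
from datetime import datetime

FORMATS = ['%Y-%m-%d %H:%M:%S', '%Y-%m-%dT%H:%M:%SZ', '%Y-%m-%d', '%d/%m/%Y']

def parse_date(s, formats=FORMATS):
    # Return a datetime for the first format that matches,
    # like date() called without an input format.
    for f in formats:
        try:
            return datetime.strptime(s, f)
        except ValueError:
            pass
    raise ValueError('unknown date format: %r' % s)

print(parse_date('20/09/2010'))  # 2010-09-20 00:00:00
```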
Date calculations
The time() function can be used to add (or subtract) time to a Date:
time(days=0, seconds=0, minutes=0, hours=0)
>>> from pattern.db import date, time
>>> d = date('23 august 2011')
>>> d += time(days=2, hours=5)
>>> print type(d)
>>> print d
>>> print d.year, d.month, d.day

<class 'pattern.db.Date'>
2011-08-25 05:00:00
2011 8 25