Dagger evaluates file dependencies in a directed acyclic graph (DAG), like GNU make, but lets you enable timestamps or hashes per file. This allows fast timestamp comparisons for large files and hashing for small files. When hashing is used, the hashes are stored either in a simple two-column text file with one filename,hash pair per line, or in a sqlite database. Dagger can be used as a building block for a larger build system, serial or parallel.
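To make the timestamp-versus-hash trade-off concrete, here is a minimal sketch of the two kinds of check. It is not dagger's code: the helper names (newer_than, content_hash, changed_since) are hypothetical, and SHA-256 is an assumption chosen only for illustration.

```python
import hashlib
import os
import tempfile

def newer_than(dep, target):
    """Timestamp check: True if dep was modified after target."""
    return os.path.getmtime(dep) > os.path.getmtime(target)

def content_hash(path, chunk_size=65536):
    """Hash file contents in chunks so the file need not fit in memory."""
    digest = hashlib.sha256()  # assumption: digest picked for illustration
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def changed_since(path, stored_hashes):
    """Hash check: True if the contents differ from the stored hash."""
    return stored_hashes.get(path) != content_hash(path)

# Small demonstration on two temporary files.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'src.txt')
out = os.path.join(workdir, 'out.txt')
for path in (src, out):
    with open(path, 'w') as f:
        f.write('v1')

os.utime(out, (1000, 1000))
os.utime(src, (2000, 2000))
stale_by_time = newer_than(src, out)               # src is newer than out

stored = {src: content_hash(src)}
os.utime(src, (3000, 3000))                        # "touch": same contents
changed_after_touch = changed_since(src, stored)   # contents unchanged
with open(src, 'w') as f:
    f.write('v2')
changed_after_edit = changed_since(src, stored)    # contents changed
```

The timestamp check costs one stat call per file, while the hash check reads the whole file but ignores a touch that leaves the contents unchanged.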
Dagger is written in Python to make it portable and extensible. Its graph evaluation engine is non-recursive, so it can handle very deep dependency paths. A benchmark tool (see below) is available to test and visualize complex graphs, and demonstrates a graph of about 1 million files.
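As an illustration of why a non-recursive traversal copes with deep graphs, the following sketch walks a dependency graph depth-first with an explicit stack instead of Python's call stack. It is a generic technique, not dagger's actual engine; the function name postorder and the dict-of-lists graph layout are assumptions.

```python
def postorder(graph, roots):
    """Return nodes in depth-first post-order (children before parents)
    using an explicit stack, so depth is not limited by recursion."""
    order, seen = [], set()
    for root in roots:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in seen:
                    seen.add(child)
                    stack.append((child, iter(graph.get(child, ()))))
                    break  # descend into this child first
            else:
                # All children handled: emit the node itself.
                stack.pop()
                order.append(node)
    return order

# Same shape as the quick example graph: target -> list of dependencies.
graph = {'1': ['2', '3'], '3': ['4', '5'], '6': ['3', '7']}
order = postorder(graph, ['1', '6'])
```

A chain tens of thousands of nodes deep, which would exceed Python's default recursion limit in a recursive version, is handled the same way.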
import dagger, os
dag = dagger.dagger()
# Create some empty test files.
os.system('touch 1 2 3 4 5 6 7')
dag.add('1', ['2','3'])
dag.add('3', ['4','5'])
dag.add('6', ['3','7'])
# Force this node, and all parents that depend on it, to be stale.
# You can force "freshness" with dag.stale('4', 0).
dag.stale('4') # Or make it missing with os.remove('4')
dag.run()
# Export for visualizing.
dag.dot('example.dot')
# Get iterator.
it = dag.iter()
# See how many items there are in total.
print(len(it))
# 4
# Get next 2 files we can process in parallel.
todo = it.next(2)
print(todo)
# ['4']
# Only one is returned because others depend on it.
# Pretend we finished first task, so remove it and get next 2.
it.remove('4')
todo = it.next(2)
print(todo)
# ['3']
# We still get single item because 6 & 1 depend on 3.
it.remove('3')
todo = it.next(2)
print(todo)
# ['1', '6']
# We finally get two independent items.
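The manual steps above can be wrapped in a loop. The sketch below relies only on the three iterator operations shown in the example (len(it), it.next(n), it.remove(name)); the driver name run_in_batches and the StandInIterator class are hypothetical. The stand-in exists only so the sketch runs without dagger installed, and its behavior is an assumption modeled on the output shown above.

```python
def run_in_batches(it, batch_size, work):
    """Repeatedly take up to batch_size ready items, process them,
    then mark them done so their parents become available."""
    batches = []
    while len(it):
        todo = it.next(batch_size)
        if not todo:
            break  # nothing is ready; avoid spinning forever
        for name in todo:
            work(name)  # a real build could dispatch these in parallel
        for name in todo:
            it.remove(name)
        batches.append(todo)
    return batches

class StandInIterator(object):
    """Hypothetical stand-in for the object returned by dag.iter()."""
    def __init__(self, pending):
        # Maps each stale node to the stale children it still waits on.
        self.pending = dict((k, set(v)) for k, v in pending.items())
    def __len__(self):
        return len(self.pending)
    def next(self, n):
        ready = sorted(k for k, v in self.pending.items() if not v)
        return ready[:n]
    def remove(self, name):
        del self.pending[name]
        for deps in self.pending.values():
            deps.discard(name)

it = StandInIterator({'4': [], '3': ['4'], '1': ['3'], '6': ['3']})
done = []
batches = run_in_batches(it, 2, done.append)
```

With a real dagger iterator, the items in each batch are independent of one another, so work could hand them to threads or subprocesses before the remove calls.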
example.dot visualized with kgraphviewer. Old/stale nodes are colored in red by dagger.
make test
# With pip.
pip install dagger
# If pip is not available, install from the source distribution.
sudo python setup.py install
# For Windows, download and run the binary installer from PyPI.
There is a helper script in bench/ to help you see how fast dagger can evaluate the dependency graph. You can specify the depth of the mock graph and how many children each node has.
# Small trees visualized with dot.
$ python bench.py --levels 3 --width 2 --dot 3x2.dot
nodes: inner=6 outer=8 total=14
0.0s Run
Small 3-level, 2-child-wide graph output by bench.py with 1 old node ('7').
# Larger test case. Does your project use 56,000 files?
$ python bench.py --levels 6 --width 6 --allpaths
nodes: inner=9330 outer=46656 total=55986
0.24s Run
# See if it is even faster with pypy.
$ pypy bench.py --levels 6 --width 6 --allpaths
nodes: inner=9330 outer=46656 total=55986
0.16s Run
# Try the extreme case of simulating about 1 million files.
$ python bench.py --levels 7 --width 7 --allpaths
nodes: inner=137256 outer=823543 total=960799
6.35s Run
# pypy reports a shorter runtime for the 1 million node graph.
$ pypy ...
3.17s Run
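The node counts printed in the runs above are consistent with a tree in which each level holds width times as many nodes as the previous one, with the last level counted as "outer" and the levels before it as "inner". The sketch below reproduces that arithmetic; it is inferred from the printed numbers only, not from the bench.py source, and the function name bench_node_counts is hypothetical.

```python
def bench_node_counts(levels, width):
    """Return (inner, outer, total) node counts for a mock tree with the
    given number of levels and children per node."""
    inner = sum(width ** k for k in range(1, levels))
    outer = width ** levels
    return inner, outer, inner + outer

small = bench_node_counts(3, 2)    # matches --levels 3 --width 2
medium = bench_node_counts(6, 6)   # matches --levels 6 --width 6
large = bench_node_counts(7, 7)    # matches --levels 7 --width 7
```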
# Use a text file for file hashes.
# It's ok if it doesn't exist.
dag = dagger.dagger('/home/project/hash.txt')
# dag.add(...)
# Enable hashing for all files.
dag.hashall = 1
# Evaluates the dependencies and computes hashes if none existed.
dag.run()
# Export the file.
dag.exporthash()
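For reference, a two-column text file with one filename,hash pair per line can be read and written with a few lines of Python. This is only a sketch of such a format: the helper names load_hashes and save_hashes are hypothetical, and details such as splitting on the last comma and sorting by filename are assumptions, not dagger's documented behavior.

```python
import os
import tempfile

def load_hashes(path):
    """Read 'filename,hash' lines into a dict; a missing file is fine."""
    hashes = {}
    if not os.path.exists(path):
        return hashes
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue
            # Split on the last comma so filenames may contain commas.
            name, digest = line.rsplit(',', 1)
            hashes[name] = digest
    return hashes

def save_hashes(path, hashes):
    """Write one 'filename,hash' pair per line, sorted by filename."""
    with open(path, 'w') as f:
        for name in sorted(hashes):
            f.write('%s,%s\n' % (name, hashes[name]))

# Round trip through a temporary file.
hash_file = os.path.join(tempfile.mkdtemp(), 'hash.txt')
before = load_hashes(hash_file)  # file does not exist yet
save_hashes(hash_file, {'a.c': 'abc123', 'b,c.h': 'def456'})
after = load_hashes(hash_file)
```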
# This time use a sqlite (in-memory) database instead of a text file.
# It's ok if it doesn't exist yet; it will be created.
dag = dagger.dagger('hash.sqlite', sqlite=1, sqlite_memory=1)
# dag.add(...)
# Export the file.
dag.exporthash()
dag.hash('myfile.txt', 1) # Enable hashing for file.
dag.hash('myfile.txt', 0) # Turn it off (hashing is off by default for all files).
dag.add(...)
dag.run()
# See the depth-first-search node order.
print(dag.ordernames())
# 2,4,5,3,1,7,6
# Access the search ordered nodes directly.
print(dag.order.list)
# [<dagger.node object at ...>, <dagger.node object at ...>, ...]
# Get path names; root nodes are rightmost in list. For our quick example graph:
print(dag.pathnames('4'))
# [['3', '1'], ['3', '6']]
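To show what such a path listing represents, the sketch below enumerates every path from a node up to the roots that depend on it, again without recursion. It is a generic illustration, not dagger's implementation; the function name paths_to_roots and the dict-of-lists graph layout are assumptions.

```python
def paths_to_roots(graph, start):
    """Return every path from start up to a root, excluding start itself,
    with the root as the last element of each path."""
    # Invert the graph: child -> list of parents that depend on it.
    parents = {}
    for parent, children in graph.items():
        for child in children:
            parents.setdefault(child, []).append(parent)

    paths = []
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        ups = parents.get(node, [])
        if not ups and path:
            paths.append(path)  # reached a root
        # Push in reverse so parents are explored in their listed order.
        for parent in reversed(ups):
            stack.append((parent, path + [parent]))
    return paths

graph = {'1': ['2', '3'], '3': ['4', '5'], '6': ['3', '7']}
paths = paths_to_roots(graph, '4')
```

For the quick example graph this finds the same two paths as shown above, one ending at root 1 and one at root 6.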
v1.2.2 2012/11/22
v1.2.1 2012/11/21
v1.2.0 2012/11/21
v1.1.0 2012/11/19
Added __len__ function to iterator to see how many tasks remain.
v1.0.0 2012/11/14
v0.0.0 2012/10/15
make html
make dist VER=0.0.0
make pypidocs
make pypi
ssh -t rsz,pythondagger@shell.sourceforge.net create
scp html/* rsz,pythondagger@shell.sourceforge.net:/home/project-web/pythondagger/htdocs
scp ../pythondagger-0.0.0.tar.gz rsz,pythondagger@shell.sourceforge.net:/home/frs/project/p/py/pythondagger
Copyright 2012 Remik Ziemlinski under the terms of the GNU General Public License