Package hdmc

Created on Jul 28, 2010


Author: dwmclary

Functions
 
submit(script, output_data_name, iterations=1, supporting_file_list=None, reduction_script=None, arguments='', debug=False, num_mappers=None, num_reducers=None)
Submits script as a non-blocking job to a MapReduce cluster and collects the output in output_data_name.
 
submit_inline(script, output_data_name, iterations=1, supporting_file_list=None, reduction_script=None, arguments='', debug=False, num_mappers=None, num_reducers=None)
Submits script as a blocking job to a MapReduce cluster and collects the output in output_data_name.
 
make_frame(script, arguments='', iterations=1, debug=False)
Generates a basic python frame for running a batch job on a MapReduce cluster.
 
build_generic_hadoop_call(mapper, reducer, input, output, supporting_file_list=None, num_mappers=None, num_reducers=None, key_comparator=None)
Builds a call array suitable for subprocess.Popen which submits a streaming job to the configured MapReduce instance.
 
execute(hadoop_call)
Non-blocking execution of the given call array.
 
execute_and_wait(hadoop_call)
Blocking execution of the given call array.
 
submit_checkpoint_inline(script, output_data_name, file_list, supporting_file_list=[], reduction_script=None, arguments='', files=True, debug=False, num_mappers=None, num_reducers=None)
Submits a script to a MapReduce cluster for parallel operation on a number of files. Blocking.
 
submit_checkpoint(script, output_data_name, file_list, supporting_file_list=[], reduction_script=None, arguments='', files=True, debug=False, num_mappers=None, num_reducers=None)
Submits a script to a MapReduce cluster for parallel operation on a number of files. Non-blocking.
 
make_pseudo_checkpoints(file_list)
Designed to make checkpointing long lists of parameters (e.g. URLs) easier.
Variables
  __author__ = 'D. McClary (dan.mcclary@northwestern.edu)'
  __package__ = 'ziggy.hdmc'
Function Details

submit(script, output_data_name, iterations=1, supporting_file_list=None, reduction_script=None, arguments='', debug=False, num_mappers=None, num_reducers=None)

Submits script as a non-blocking job to a MapReduce cluster and collects the output in output_data_name. Supporting filenames can be passed as a list via supporting_file_list, and a reducing/filtering script can be passed via reduction_script. Arguments to the submitted script should be passed as a single string.

submit_inline(script, output_data_name, iterations=1, supporting_file_list=None, reduction_script=None, arguments='', debug=False, num_mappers=None, num_reducers=None)

Submits script as a blocking job to a MapReduce cluster and collects the output in output_data_name. Supporting filenames can be passed as a list via supporting_file_list, and a reducing/filtering script can be passed via reduction_script. Arguments to the submitted script should be passed as a single string.
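
A usage sketch for the two calls above, assuming the ziggy package is installed and a MapReduce cluster is configured. Only the parameter names come from the documented signatures; the script name, output name, supporting file, and argument string are hypothetical placeholders, and the import is deferred so the snippet can be loaded without a cluster.

```python
def wordcount_job_kwargs():
    """Keyword arguments matching the documented submit/submit_inline signature.

    Every value here is a hypothetical placeholder.
    """
    return dict(
        script="wordcount.py",                   # hypothetical mapper script
        output_data_name="wordcount_out",        # hypothetical output name
        iterations=1,
        supporting_file_list=["stopwords.txt"],  # filenames go in a list
        reduction_script=None,                   # no reducing/filtering script
        arguments="--min-length 3",              # script arguments as one string
        debug=False,
    )


def run_nonblocking():
    # Deferred import: requires ziggy and a configured cluster.
    from ziggy import hdmc
    hdmc.submit(**wordcount_job_kwargs())         # documented as non-blocking


def run_blocking():
    # Deferred import: requires ziggy and a configured cluster.
    from ziggy import hdmc
    hdmc.submit_inline(**wordcount_job_kwargs())  # documented as blocking
```

Both calls take the same arguments, so the choice between them comes down to whether the caller should wait for the job to finish.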

submit_checkpoint_inline(script, output_data_name, file_list, supporting_file_list=[], reduction_script=None, arguments='', files=True, debug=False, num_mappers=None, num_reducers=None)

Submits a script to a MapReduce cluster for parallel operation on a number of files. An optional reducer script can also be applied; it should filter the map results by splitting the file output on the ===HDMC_CHECKPOINT=== marker. Arguments to the submitted script should be passed as a single string. Blocking.

submit_checkpoint(script, output_data_name, file_list, supporting_file_list=[], reduction_script=None, arguments='', files=True, debug=False, num_mappers=None, num_reducers=None)

Submits a script to a MapReduce cluster for parallel operation on a number of files. An optional reducer script can also be applied; it should filter the map results by splitting the file output on the ===HDMC_CHECKPOINT=== marker. Arguments to the submitted script should be passed as a single string. Non-blocking.
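
The reducer requirement described for both checkpoint functions can be sketched as follows. This is a minimal illustration, not the package's own reducer: it assumes the marker appears between the outputs of successive files in the text the reducer reads, and the helper names split_checkpoints, reduce_stream, and main are hypothetical.

```python
import sys

CHECKPOINT_MARKER = "===HDMC_CHECKPOINT==="


def split_checkpoints(text, marker=CHECKPOINT_MARKER):
    """Split combined map output into per-file chunks, dropping empty ones."""
    return [chunk.strip() for chunk in text.split(marker) if chunk.strip()]


def reduce_stream(lines):
    """Hypothetical reduction: report the number of lines in each chunk."""
    chunks = split_checkpoints("".join(lines))
    return ["%d\t%d" % (index, len(chunk.splitlines()))
            for index, chunk in enumerate(chunks)]


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Entry point a standalone reducer script would call."""
    for record in reduce_stream(stdin):
        stdout.write(record + "\n")
```

A script passed as reduction_script would end with a call to main(); the per-chunk line count stands in for whatever filtering the job actually needs.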

make_pseudo_checkpoints(file_list)

Designed to make checkpointing long lists of parameters (e.g. URLs) easier.
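
The function summary describes execute as non-blocking and execute_and_wait as blocking, both operating on a call array suitable for subprocess.Popen. The difference can be illustrated with the standard library alone. This is a sketch under assumptions, not the package's implementation: run_nonblocking and run_blocking are hypothetical names, and the call array below runs a short Python command in place of a real Hadoop streaming invocation.

```python
import subprocess
import sys


def run_nonblocking(call):
    """Start the process and return its Popen handle without waiting."""
    return subprocess.Popen(call)


def run_blocking(call):
    """Start the process, wait for it to exit, and return its exit code."""
    return subprocess.Popen(call).wait()


# Stand-in call array; a real one would invoke the Hadoop streaming job.
call = [sys.executable, "-c", "import time; time.sleep(0.2)"]

handle = run_nonblocking(call)   # returns while the child may still be running
handle.wait()                    # the caller decides when to collect the result
exit_code = run_blocking(call)   # returns only after the child has exited
```

With the non-blocking form the caller keeps the handle and can poll or wait later; the blocking form is simpler when nothing else needs to happen until the job finishes.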