VRProt
vrprot.alphafold_db_parser module
- class vrprot.alphafold_db_parser.AlphafoldDBParser(WD: str = './', chimerax: str = 'chimerax', alphafold_ver: str = 'v1', batch_size: int = 50, processing: str = 'cartoons_ss_coloring', overview_file: str = './static/csv/overview.csv', structures: dict[str, vrprot.classes.ProteinStructure] = <factory>, not_fetched: list[str] = <factory>, keep_temp: dict[vrprot.classes.FileTypes, bool] = <factory>, log: vrprot.classes.Logger = Logger('AlphafoldDBParser'), img_size: int = 512, db: str = 'alphafold', overwrite: bool = False)
Bases: object
Class to parse PDB files and convert them to PLY files.
- Raises:
exceptions.ChimeraXException: Raised if ChimeraX is not installed or cannot be found.
- Args:
WD (str): Working directory to store processing files and output files. Defaults to "./".
chimerax (str): Path to the ChimeraX executable. Defaults to "chimerax".
alphafold_ver (str): Version of the AlphaFold DB to be used. Options are "v1", "v2", "v3" and "v4". Defaults to "v1".
batch_size (int): Size of each batch of structures to be processed. Defaults to 50.
processing (str): Processing mode used to color the protein structures in ChimeraX. Defaults to "cartoons_ss_coloring".
overview_file (str): Path to the overview file, which stores the scale and the color mode of each protein structure. Defaults to "./static/csv/overview.csv".
structures (dict[str, ProteinStructure]): Dictionary that maps protein identifiers to their ProteinStructure objects. Defaults to {}.
not_fetched (list[str]): List of protein structures which could not be fetched. Defaults to [].
keep_temp (dict[FT, bool]): Configuration to keep or remove processing files like PDB or GLB files after each processing step. Defaults to {FT.pdb_file: False, FT.glb_file: False, FT.ply_file: False, FT.ascii_file: False}.
log (Logger): Logger with a specific name. Defaults to Logger("AlphafoldDBParser").
- WD: str = './'
- alphafold_ver: str = 'v1'
- batch_size: int = 50
- check_dirs(file: str, source: str) None
Checks whether a source file is in a different directory than the default directory. If so, sets the corresponding directory to the source.
- chimerax: str = 'chimerax'
- chimerax_process(proteins: list[str], processing: str) None
Processes the .pdb files using ChimeraX and the bundle chimerax_bundle.py. The default processing mode is ColoringModes.cartoons_ss_coloring. By default, the source PDB file is NOT removed. To change this, set self.keep_temp[FT.pdb_file] = False.
- clear_default_dirs() None
Clears the default directories.
- convert_glbs(proteins: list[str]) None
Converts the .glb files to .ply files. By default, the source GLB file is removed afterwards. To change this, set self.keep_temp[FT.glb_file] = True.
- db: str = 'alphafold'
- execute_fetch(proteins: str) None
Uses a list of proteins to fetch the PDB files from the AlphaFold DB. These PDB files are then used to generate the color maps.
- execute_from_bulk(source: str)
Extracts all PDB files from a tar archive downloaded from the AlphaFold DB and processes all structures within it with the desired processing mode. Multi-fraction structures are combined into one large structure; these combined structures are not processed with the desired processing mode.
- execute_from_object(proteins: list[str]) None
Uses a list of proteins which are extracted from a Python object. This assumes that the PDB files of these structures already exist in the PDB directory.
- execute_local(source: str) None
Extracts all UniProt IDs from a local directory. Assumes that the file names have the following format: AF-<UniProt ID>-F1-model_<v1/v2>.[pdb/glb/ply/xyzrgb]
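A minimal sketch of how a UniProt ID and version could be pulled out of such a file name with a regular expression. The pattern and the helper name parse_alphafold_filename are hypothetical, not vrprot's implementation; the pattern accepts both "_" (as in files downloaded from the AlphaFold DB) and "-" before the version.

```python
import re
from typing import Optional

# Hypothetical pattern for the file names described above, e.g.
# "AF-P12345-F1-model_v4.pdb". Both "_" and "-" are accepted before the version.
AF_FILENAME = re.compile(
    r"^AF-(?P<uniprot_id>[A-Za-z0-9]+)-F1-model[_-](?P<version>v\d+)"
    r"\.(?P<ext>pdb|glb|ply|xyzrgb)$"
)


def parse_alphafold_filename(name: str) -> Optional[tuple[str, str]]:
    """Return (uniprot_id, version) for a matching file name, else None."""
    match = AF_FILENAME.match(name)
    if match is None:
        return None
    return match.group("uniprot_id"), match.group("version")


print(parse_alphafold_filename("AF-P12345-F1-model_v4.pdb"))  # ('P12345', 'v4')
print(parse_alphafold_filename("notes.txt"))  # None
```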
- fetch_pdb(proteins: list[str]) None
Fetches .pdb files from the AlphaFold server. This function uses the request module from the Python standard library to directly download .pdb files from the AlphaFold server.
- fetch_pipeline(proteins: list[str]) None
Fetches the structures from the AlphaFold DB.
- filter_already_processed(proteins: list[str]) list[str]
Filter out the proteins that have already been processed.
- gen_maps(proteins: list[str]) None
Generates the maps from the point cloud files. If all of the output files already exist, the protein is skipped. By default, the source ASCII point cloud is removed afterwards. To change this, set self.keep_temp[FT.ascii_file] = True.
- get_filename(protein: str) str
Get the filename of the protein.
- img_size: int = 512
- init_dirs(subs=True) None
Initialize the directories.
- init_structures_dict(proteins: list[str]) dict[dict[str]]
- keep_temp: dict[vrprot.classes.FileTypes, bool]
- log: Logger = Logger('AlphafoldDBParser')
- not_fetched: list[str]
- output_exists(structures: ProteinStructure) bool
Checks if the output files already exist in the output directory.
- overview_file: str = './static/csv/overview.csv'
- overwrite: bool = False
- pdb_pipeline(proteins: list[str]) None
Default pipeline which is used in all program modes. For each structure, the PDB file is processed in ChimeraX and exported as a GLB file. This GLB file is converted into a PLY file. The PLY file is used to sample a point cloud, which is saved as an ASCII point cloud. This ASCII point cloud is then used to generate the color maps (rgb, xyz_low and xyz_high).
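The stage order of this pipeline, together with the keep_temp rule described for convert_glbs, sample_pcd and gen_maps (a True entry keeps the source file, otherwise it is removed), can be sketched as follows. FT here is a hypothetical stand-in for vrprot.classes.FileTypes, and the file suffixes and map names are assumptions for illustration only.

```python
from enum import Enum
from pathlib import Path


class FT(Enum):
    """Hypothetical stand-in for vrprot.classes.FileTypes (suffixes assumed)."""

    pdb_file = ".pdb"
    glb_file = ".glb"
    ply_file = ".ply"
    ascii_file = ".xyzrgb"


# Stage order of the pipeline: PDB -> GLB -> PLY -> ASCII point cloud -> maps.
PIPELINE = [FT.pdb_file, FT.glb_file, FT.ply_file, FT.ascii_file]
# Hypothetical names for the three color maps (rgb, xyz_low, xyz_high).
MAP_SUFFIXES = ["_rgb.png", "_xyz_low.png", "_xyz_high.png"]


def stage_files(stem: str) -> list[str]:
    """List the file produced by each stage for one structure, in order."""
    return [stem + ft.value for ft in PIPELINE] + [stem + s for s in MAP_SUFFIXES]


def cleanup(source: Path, file_type: FT, keep_temp: dict[FT, bool]) -> None:
    """Delete an intermediate file after its step unless keep_temp keeps it."""
    if not keep_temp.get(file_type, False):
        source.unlink(missing_ok=True)
```

Treating a missing keep_temp entry as False mirrors the documented default, in which every file type maps to False.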
- processing: str = 'cartoons_ss_coloring'
- proteins_from_dir(source: str) None
Processes proteins from a directory. In the source directory, the program will search for each of the available file types. Based on this, the class directories are initialized. The program will then start at the corresponding step for each structure.
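The idea of starting each structure at the step matching the files that already exist can be sketched like this. The suffix-to-step table and the helper start_steps are hypothetical; the step labels reuse the method names documented here. When several file types exist for one structure, this sketch picks the most advanced one, which is an assumed policy.

```python
from pathlib import PurePath
from typing import Iterable

# Hypothetical mapping from file suffix to the step that consumes it, in order.
STAGES = [
    (".pdb", "chimerax_process"),
    (".glb", "convert_glbs"),
    (".ply", "sample_pcd"),
    (".xyzrgb", "gen_maps"),
]


def start_steps(filenames: Iterable[str]) -> dict[str, str]:
    """Map each structure (file stem) to the step it should start at.

    If several file types exist for one structure, the most advanced one wins,
    since its file already contains the work of the earlier steps.
    """
    rank = {suffix: i for i, (suffix, _) in enumerate(STAGES)}
    best: dict[str, int] = {}
    for name in filenames:
        path = PurePath(name)
        if path.suffix in rank:
            best[path.stem] = max(best.get(path.stem, -1), rank[path.suffix])
    return {stem: STAGES[i][1] for stem, i in best.items()}
```

Passing file names rather than a directory keeps the function easy to test; a caller would use something like start_steps(os.listdir(source)).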
- proteins_from_list(proteins: list[str]) None
Adds all UniProt IDs from the list to the set of proteins.
- sample_pcd(proteins: list[str]) None
Samples the point cloud from the PLY files. By default, the source PLY file is removed afterwards. To change this, set self.keep_temp[FT.ply_file] = True.
- set_alphafold_version(args: Namespace) None
Parses arguments from the argument parser Namespace and sets the AlphaFold version to the corresponding value.
- set_batch_size(args: Namespace) None
Parses arguments from the argument parser Namespace and sets the batch size to the corresponding value.
- set_chimerax(args: Namespace)
- set_coloring_mode(args: Namespace) None
- set_database(args: Namespace) None
- set_dirs(args: Namespace) None
Uses arguments from the argument parser Namespace and sets the directories to the corresponding values.
- set_img_size(args: Namespace) None
- set_keep_tmp(args: Namespace) None
Uses arguments from the argument parser Namespace and sets the switch to keep or to remove the corresponding file types after a processing step is completed.
- set_version_from_filenames() None
Iterates over all directories and searches for files whose names contain an AlphaFold version number. If one is found, the parser is set to this version. All files are treated as having this version.
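A minimal sketch of the first-match version detection described for set_version_from_filenames. The regular expression and both helper names are hypothetical; scanning is split from matching so the matching part can be used on any list of names.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Optional

# Hypothetical pattern: the version follows "model_" or "model-" in the name.
VERSION = re.compile(r"model[_-](v\d+)")


def iter_filenames(directories: Iterable[str]) -> Iterator[str]:
    """Yield the file names in all existing directories, skipping missing ones."""
    for directory in directories:
        path = Path(directory)
        if path.is_dir():
            for entry in sorted(path.iterdir()):
                yield entry.name


def version_from_filenames(names: Iterable[str]) -> Optional[str]:
    """Return the first AlphaFold version found in the file names, or None."""
    for name in names:
        match = VERSION.search(name)
        if match:
            return match.group(1)
    return None
```

A caller could combine both, for example version_from_filenames(iter_filenames([pdb_dir, glb_dir])), where pdb_dir and glb_dir are hypothetical directory paths.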
- structures: dict[str, vrprot.classes.ProteinStructure]
- update_existence(protein)
Updates the existence of the files for each protein structure.
- update_output_dir(output_dir)
Updates the output directory of resulting images.
- Args:
output_dir: Output directory in which the resulting images are stored.
- write_scale(protein) None
Writes the scale of the protein to the overview file. This file is used to keep track of the scale of each protein structure.
vrprot.argument_parser module
- vrprot.argument_parser.argument_parser(exec_name='main.py')
Argument parser function for the main function.
vrprot.batcher module
- vrprot.batcher.batch(functions, proteins, batch_size)
Executes the passed functions on batches of protein structures.
vrprot.exceptions module
- exception vrprot.exceptions.ChimeraXException
Bases: Exception
- exception vrprot.exceptions.StructureNotFoundError
Bases: Exception
vrprot.overview_util module
- vrprot.overview_util.add_protein_structure_from_scales(overview: DataFrame, file: str, single_pdb_dir: str, processing: str, multi_pdb_dir: Optional[str] = None, output: str = './static/csv/overview.csv')
- vrprot.overview_util.get_overview(file=None) DataFrame
- vrprot.overview_util.get_scale(uniprot_ids=[], mode=ColoringModes.cartoons_ss_coloring)
Looks up in the overview whether there is a PDB file for the UniProt ID and the requested mode.
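The lookup described for get_scale can be sketched with the standard csv module instead of a pandas DataFrame, so the example has no dependencies. The column names uniprot_id, pdb_file, processing and scale are assumptions about the overview layout, and lookup_scale is a hypothetical helper with a simpler signature than the real get_scale.

```python
import csv
from typing import Iterable, Optional


def lookup_scale(lines: Iterable[str], uniprot_id: str, mode: str) -> Optional[float]:
    """Return the stored scale if a PDB file is recorded for uniprot_id and mode."""
    # Assumed columns: uniprot_id, pdb_file, processing, scale.
    for row in csv.DictReader(lines):
        if (
            row.get("uniprot_id") == uniprot_id
            and row.get("processing") == mode
            and row.get("pdb_file")
        ):
            return float(row["scale"])
    return None
```

Any iterable of lines works, including an open file: with open(overview_file, newline="") as f: lookup_scale(f, "P12345", "cartoons_ss_coloring").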
- vrprot.overview_util.init_overview(columns=None) DataFrame
Initializes the overview table.
- vrprot.overview_util.main(proteins: list[vrprot.classes.ProteinStructure], mode: str)
- vrprot.overview_util.read_overview(file, index_col='uniprot_id') DataFrame
Reads the overview table.
- vrprot.overview_util.write_overview(overview, file=None) None
Writes the overview table.
- vrprot.overview_util.write_scale(uniprotid, scale, pdb_file, processing, overview)
vrprot.pointcloud2map_8bit module
- vrprot.pointcloud2map_8bit.gen_filennames(protein)
- vrprot.pointcloud2map_8bit.pcd_to_png(ascii_file, rgb_file, xyz_low_file, xyz_high_file, img_size=512)
- vrprot.pointcloud2map_8bit.run_batch(directory)
vrprot.sample_pointcloud module
- vrprot.sample_pointcloud.run_for_batch()
- vrprot.sample_pointcloud.sample_pcd(ply_file, output, SAMPLE_POINTS=262144, cube_no_line=None, debug=False)
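The default SAMPLE_POINTS=262144 equals 512 * 512, the default img_size of pcd_to_png, which suggests one sampled point per pixel. The map names xyz_low and xyz_high further suggest that coordinates are stored with 16-bit precision split across two 8-bit images. The sketch below shows that encoding under these assumptions. It is not taken from vrprot, and the helper names are hypothetical.

```python
IMG_SIZE = 512          # default img_size of pcd_to_png
SAMPLE_POINTS = 262144  # default SAMPLE_POINTS of sample_pcd
assert IMG_SIZE * IMG_SIZE == SAMPLE_POINTS  # one point per pixel


def pixel_of(index: int, img_size: int = IMG_SIZE) -> tuple[int, int]:
    """Row and column of the pixel that would hold point number `index`."""
    return divmod(index, img_size)


def encode_coordinate(value: float, lo: float, hi: float) -> tuple[int, int]:
    """Quantize value in [lo, hi] to 16 bits and split it into (high, low) bytes."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    q = round(t * 65535)
    return q >> 8, q & 0xFF


def decode_coordinate(high: int, low: int, lo: float, hi: float) -> float:
    """Inverse of encode_coordinate, up to the 16-bit quantization step."""
    return lo + ((high << 8) | low) / 65535 * (hi - lo)
```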
vrprot.util module
- vrprot.util.batch(funcs: list[object], proteins: list[str], batch_size: int = 50) None
Runs the functions listed in funcs in a batched process.
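A minimal sketch of batched execution. It assumes each function takes a list of protein IDs and that all functions run on one batch before the next batch starts. That ordering is an assumption, as is this exact signature.

```python
from typing import Callable, Sequence


def batch(
    funcs: list[Callable[[list[str]], None]],
    proteins: Sequence[str],
    batch_size: int = 50,
) -> None:
    """Apply every function in funcs to consecutive slices of proteins."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(proteins), batch_size):
        chunk = list(proteins[start:start + batch_size])
        for func in funcs:  # e.g. fetch, process, convert, ... for this chunk
            func(chunk)
```

Running all steps per batch keeps the number of intermediate files on disk bounded by the batch size when temporary files are removed after each step.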
- vrprot.util.call_ChimeraX_bundle(chimerax: str, script: str, working_Directory: str, file_names: str, mode: str, script_arg: list = []) None
Function to call chimeraX and run chimeraX Python script with the mode applied.
- Args:
chimerax (string): Path to the ChimeraX executable.
script (string): ChimeraX Python script/bundle which should be called.
working_Directory (string): Working directory to which ChimeraX should change (run(session, "cd " + arg[1])).
file_names (string): Target file(s) which will be processed.
mode (string): Pipeline used during ChimeraX processing (ss = secondary structures, aa = amino acids, ch = chain). Only ss is currently implemented.
script_arg (list of strings): All arguments needed by the function used in the ChimeraX Python script/bundle (size is dynamic). All arguments are strings.
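A hedged sketch of how such a call could be assembled for subprocess.run. It assumes ChimeraX's --nogui, --offscreen, --script and --exit command-line options, and it assumes the script receives its arguments as one quoted string so that arg[1] is the working directory. The exact command vrprot builds is not documented here, and chimerax_command is a hypothetical helper.

```python
import shlex
from typing import Optional


def chimerax_command(
    chimerax: str,
    script: str,
    working_directory: str,
    file_names: str,
    mode: str,
    script_arg: Optional[list[str]] = None,
    offscreen: bool = True,
) -> list[str]:
    """Build an argument list for subprocess.run; nothing is executed here."""
    # Inside the script, sys.argv[0] is the script and sys.argv[1] the working
    # directory, matching the run(session, "cd " + arg[1]) call described above.
    script_args = [script, working_directory, file_names, mode, *(script_arg or [])]
    command = [chimerax, "--nogui"]
    if offscreen:  # offscreen rendering is only available on Linux
        command.append("--offscreen")
    # --script takes the script path and its arguments as one string.
    command += ["--script", " ".join(shlex.quote(a) for a in script_args), "--exit"]
    return command
```

Usage would be subprocess.run(chimerax_command(...), check=True). Returning a list instead of running it directly keeps the command inspectable and avoids shell quoting issues.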
- vrprot.util.combine_fractions(directory: str, target: str, chimerax: str)
Combines a multi-fraction protein structure into a single structure and exports it as a GLB file.
- vrprot.util.convert_glb_to_ply(glb_file: str, ply_file: str, debug: bool = False) None
This function converts a glb file to a ply file.
- Args:
glb_file (string): Path to the glb file.
ply_file (string): Path to the output ply file.
- vrprot.util.fetch_pdb(uniprot_id: str, url: str, save_location: str, file_name: str) bool
- vrprot.util.fetch_pdb_from_alphafold(uniprot_id: str, save_location: str, db_version: AlphaFoldVersion = 'v1') bool
Fetches a .pdb file from the AlphaFold server. This function uses the request module from the Python standard library to directly download .pdb files from the AlphaFold server.
Requested .pdb files can be found at https://alphafold.ebi.ac.uk/files/AF-<UniProtID>-F1-model_<db_version>.pdb.
The loaded .pdb file is saved in a subfolder called “pdbs” in the current directory.
- Args:
uniprot_id (string): UniProt ID of the requested protein.
save_location (string): Path to the directory where the .pdb file should be saved.
db_version (string): Version of the database.
- Returns:
success (bool): Whether the fetching was successful.
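A minimal sketch of this download using urllib.request from the standard library and the URL template given above. The helper names are hypothetical, and writing the file under its original AlphaFold name inside save_location is an assumption.

```python
import urllib.request
from pathlib import Path

# URL template taken from the description above.
AF_URL = "https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_{db_version}.pdb"


def alphafold_pdb_url(uniprot_id: str, db_version: str = "v1") -> str:
    """Build the download URL for one structure."""
    return AF_URL.format(uniprot_id=uniprot_id, db_version=db_version)


def fetch_alphafold_pdb(uniprot_id: str, save_location: str, db_version: str = "v1") -> bool:
    """Download one .pdb file into save_location; return True on success."""
    url = alphafold_pdb_url(uniprot_id, db_version)
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            data = response.read()
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False
    target_dir = Path(save_location)
    target_dir.mkdir(parents=True, exist_ok=True)
    (target_dir / url.rsplit("/", 1)[-1]).write_bytes(data)
    return True
```

Returning False instead of raising matches the documented bool result, so a caller can collect failed IDs (as in not_fetched) and continue with the rest of the batch.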
- vrprot.util.fetch_pdb_from_rcsb(uniprot_id: str, save_location: str) None
- vrprot.util.remove_dirs(directory)
Removes a directory and all underlying subdirectories. WARNING: this can lead to loss of data!
- vrprot.util.run_chimerax_coloring_script(chimearx: str, pdb_dir: str, proteins: list[str], save_location: str, processing: str, colors: list) None
Uses the given ChimeraX installation to process the .pdb files. It colors the secondary structures in the given colors. Offscreen rendering only works under Linux.
- Args:
proteins (list[str]): UniProt IDs of the proteins which will be processed.
colors (list, optional): List containing three colors. The first is the color of the coils, the second the color of the helices and the last the color of the strands. Defaults to ["red", "green", "blue"], i.e. coils are red, helices are green and strands are blue.
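The mapping from the three-element colors list to secondary structures could be expressed as ChimeraX color commands, using ChimeraX's built-in coil, helix and strand specifiers. This is a sketch, and whether vrprot issues exactly these commands is an assumption. coloring_commands is a hypothetical helper.

```python
from typing import Sequence


def coloring_commands(colors: Sequence[str] = ("red", "green", "blue")) -> list[str]:
    """Translate [coil, helix, strand] colors into ChimeraX color commands."""
    if len(colors) != 3:
        raise ValueError("expected exactly three colors: coil, helix, strand")
    coil, helix, strand = colors
    return [f"color coil {coil}", f"color helix {helix}", f"color strand {strand}"]
```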
- vrprot.util.search_for_chimerax() str
Searches for the ChimeraX executable on the system. Does not work on Windows yet.
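A minimal sketch of such a search based on shutil.which, which looks through PATH and also honours PATHEXT on Windows. The candidate names and the extra_paths fallback are assumptions, and find_chimerax is a hypothetical helper rather than vrprot's search_for_chimerax.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def find_chimerax(
    names: Iterable[str] = ("chimerax", "ChimeraX"),
    extra_paths: Iterable[str] = (),
) -> Optional[str]:
    """Return the first ChimeraX executable found on PATH or in extra_paths."""
    for name in names:
        found = shutil.which(name)  # honours PATHEXT on Windows
        if found:
            return found
    for candidate in extra_paths:  # e.g. a binary inside an application bundle
        if Path(candidate).is_file():
            return candidate
    return None
```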