PatchTools is a tool to evaluate patches for inclusion into source trees managed by the GIT source code control system, notably the Linux kernel. The scenario in which PatchTools is intended to be used is:
Using “git am” or quilt to apply the patches often will not work because:
Often these problems are caused by the integration of other patches into the “c” source tree that modified the same files as your patches, thus invalidating the line numbers in your patches.
PatchTools is primarily designed for use with Linux patches, but can be used with any software system that uses GIT for source code control.
PatchTools assumes you have appropriate permissions to access any of the files referenced by PatchTools modules.
PatchTools has been tested on Linux Mint 16, using both Python 2.7.5 and 3.3.
There are two modes of installation: to one of your Python distributions, and as a stand alone source tree.
PatchTools may be installed into one of your Python distributions using pip or easy_install. It may also be installed from source into a distribution by entering ‘sudo python setup.py install’ or ‘sudo python3 setup.py install’ in a shell while you are located in the root source folder.
If you install to a Python2.x distribution, the Python modules will be placed, for example, in ‘/usr/local/lib/python2.7/dist-packages/patchtools’, and there will be a metadata file ‘patchtools-1.0-egg-info’ in the same folder to describe the package. The doc and examples data files will be placed in a ‘/usr/local/patchtools’ folder.
If you install to a Python3.x distribution, the Python modules will be placed, for example, in ‘/usr/local/lib/python3.3/site-packages/patchtools’, and there will be a metadata file ‘patchtools-1.0-egg-info’ in the same folder to describe the package. The doc and examples data files will be placed in the same ‘/usr/local/patchtools’ folder.
To use PatchTools as a stand alone source tree, download the tarball, and extract its files to a suitable location. Then you must create a PYTHONPATH environment variable as described in the “System Considerations” section below.
The software may also be imported into an Eclipse-PyDev project using the ‘File:Import‘ menu and the ‘File System’ option. Ensure that the Eclipse value of PYTHONPATH is set to the parent folder of the source tree. Note that if you select a Python3 interpreter, the Command module, which is used by the Viewer, will not be able to launch a gedit that is not Python3 compatible. The causes for this are unknown, but PyDev is suspected, since such requests work properly in a shell.
After installing PatchTools, you must set up some test data. Due to the large number of files contained in a Linux kernel, and in some patch sets, test data is not included in the release.
In a suitable location create a ‘data’ folder with these sub folders:
- archives: to hold any patch archive files you download from kernel.org, etc.
- patchset: a folder to contain a tree of patches. You may have more than one such folder, if they have different names.
Download any patch archive files to the archives folder.
Download any patch sets to the patchset folder(s).
Download any source trees (e.g. Linux kernels) to convenient locations. It is not recommended to store a Linux kernel in an Eclipse workspace, due to the large number of files a kernel contains.
Finally create suitable config.json files to describe and link to the data. See the config.json files in the examples folders for typical contents.
You can use PatchTools to speed up the process of determining the suitability of your patches for a new kernel by:
The finder module implements a class Finder to facilitate searching file trees for patterns of interest to you.
The checker module implements a class Checker to compare the contents of patch files to the source files they reference. Checker objects can be used to determine if the changes in a patch are compatible with your target kernel.
The viewer module implements a class Viewer to facilitate editing patches and the files they reference. Methods are implemented to handle special cases such as displaying a patch and the files referenced by the patch.
The walker module implements a class Walker to allow enumeration of selected directories and files in a file tree. Both directories and specific file types may be included in or excluded from the results.
The watcher module implements a class Watcher to compare the contents of patch files to a “patch archive” file associated with a kernel release. You can use the Watcher to determine if any of your patches have been integrated into newer kernels than the one on which the patches were developed.
The helper module implements a class Helper to ease the task of assembling the fairly numerous required and optional parameters of the modules listed above. It provides wrapper functions for the modules, and some useful utility functions.
The archive module implements a class Archive to extract information from “patch archive” files associated with kernel releases.
The command module implements a class Command to provide a simple wrapper for the Python subprocess module.
The functions module implements a class Functions to provide various utility functions.
The jsonconfig module implements a class JSONConfig to allow application configuration using enhanced JSON data files.
The strings module implements a class Strings to provide useful string like methods for lists of strings.
Archive objects are used by the Watcher to extract information from “patch archive” files provided by kernel.org for their kernel releases. The files contain a list of the patch diff sections that were applied to obtain the release version. See Appendix B for more information.
Checker objects compare the contents of patch files to the source files they reference. Checker objects are used to determine if the changes in a patch are compatible with your target kernel.
The Checker class accomplishes its goals by the following steps:
The Checker has two primary modes of operation:
The Checker has several optional features:
See the ‘Checker’ section of the API documentation for call details.
Since various mail systems and editors can corrupt the patch files as they are in transit, the Checker normalizes the patch path lines (“--- ...”, “+++ ...”), the ‘diff’ lines (“diff --git ...”) and the hunk range lines (“@@ ...”) before splitting them for extraction. The normalization consists of replacing all tabs by spaces and replacing multiple spaces between words of a line by single spaces.
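The normalization described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Checker's actual code; the function name normalize_header_line is hypothetical.

```python
import re

def normalize_header_line(line):
    """Replace tabs by spaces, then collapse runs of spaces into one space.

    Hypothetical helper: illustrates the normalization applied to the
    '---', '+++', 'diff --git' and '@@' lines of a patch.
    """
    line = line.replace("\t", " ")
    return re.sub(r" {2,}", " ", line)

print(normalize_header_line("@@  -10,7\t+10,8   @@"))
print(normalize_header_line("diff\t--git a/Makefile   b/Makefile"))
```

After this step, splitting the line on single spaces yields the same fields regardless of how a mail client or editor altered the white space.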
The Patch parsing logic will discard any patch lines that have been commented out by surrounding them with lines containing only ‘"""’. See the ‘Patch’ section of the API documentation for more details. This feature can be used to narrow the focus of your investigation to a small set of patches and files. But if you plan to use ‘git am’ on a patch later, you should do the commenting in a copy of the patch.
To start a test against your target kernel, you may execute:
f = h.load("dts_patch_refs.lst")
m = h.check(f, h.extend(c['defaults'], { "mode" : "full", "find" : False, }))
h.write(m, "match_dts.txt")
Assuming that you have used the Finder to locate patches that refer to some DTS files, and saved the results into “dts_patch_refs.lst”, this code will check those patches against your kernel version.
See the ‘Checker’ section of the API documentation for further info on its parameters and methods.
See ‘Appendix E - Checker Output’ for explanation of Checker output.
If the Checker is used on all patches in a large set, it can provide you with a very large amount of bad news concerning the state of your patches, in part because it does not take into account dependencies between patches. It is useful to narrow the scope of your investigations to a subsystem, group of patches or group of files to analyze. If you decide to fix a series of related patches, you should fix the first one in commit order, test the others again to see if any problems have been resolved, and repeat this process down to the last patch.
The Walker and Finder can be used to generate small lists of files related to specific subsystems, based on matches to text strings such as “am33xx”, etc:
l = h.load("patch_names.lst")
m = { "substr" : ["am335x-bone", "am33xx.dtsi", "tps65217.dtsi" ] }
f = h.find(m, { "root_path" : c['patchdir'], "file_paths" : l, "format" : "files" })
h.save(f, "dts_patch_refs.lst")
The Checker can be used on one or a few of these files at a time:
f = h.load("dts_patch_refs.lst")
g = h.PatchSet(c['defaults']).sort_patches({ "patches" : f, "order" : "patchset" })
m = h.check(g, h.extend(c['defaults'], { "find" : True }))
Note that the Walker and Finder do not return lists of patch names in patchset order, which is supposed to be the commit order needed for successful use of ‘git am’.
If the Checker ‘targets’ option is used, the Checker will only scan diff sections that modify the files specified in the targets list. For example, you could select BeagleBoard related device tree files for investigation:
f = h.load("dts_patch_refs.lst")
g = h.PatchSet(c['defaults']).sort_patches({ "patches" : f, "order" : "patchset" })
t = ["am335x-bone.dts", "am335x-boneblack.dts", "am335x-bone-common.dtsi",
"am33xx.dtsi", "am33xx-clocks.dtsi", "tps65217.dtsi", "Makefile"]
d = h.check(g, h.extend(c['defaults'], { "find" : True, "targets" : t }))
h.write(d, "dts_matches.txt")
When the Checker encounters a diff file name that is not in its targets list, it will issue a message like:
# SKIPPING DIFF: "diff --git a/arch/arm/boot/dts/am335x-bone.dts ...
The Patch module used by the Checker to parse patches will discard any patch lines that are surrounded by lines containing only ‘"""’. If you observe that an initial Checker report indicates that all the errors in a hunk are like ‘“delete” line not found’ or ‘“merge” line not found’ and that the target file does not have the specified “delete” lines, or that “add” lines are already in the target file, you can comment out the diff or hunk to further narrow the scope of your investigation to hunks or diffs that actually need to be fixed. This strategy was used heavily in Example 5 described below.
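The manual test described above (are the removed lines absent and the added lines already present?) can be approximated with a short script. The sketch below is not the Checker's algorithm: the names check_hunk and looks_applied are hypothetical, lines are compared as whole strings without regard to position, and unchanged lines are labeled "context" here rather than using the Checker's own categories.

```python
def check_hunk(hunk_lines, target_lines):
    """Return (kind, text, found) for each body line of a unified-diff hunk.

    kind is "context", "delete" or "add"; found tells whether the text
    occurs anywhere in the target file (position is ignored).
    """
    target = set(line.rstrip("\n") for line in target_lines)
    kinds = {" ": "context", "-": "delete", "+": "add"}
    report = []
    for line in hunk_lines:
        if not line or line[0] not in kinds:
            continue  # skip '@@' range lines and anything unrecognized
        text = line[1:].rstrip("\n")
        report.append((kinds[line[0]], text, text in target))
    return report

def looks_applied(report):
    """True if no 'delete' line is present and every 'add' line is present."""
    deletes = [found for kind, _, found in report if kind == "delete"]
    adds = [found for kind, _, found in report if kind == "add"]
    return bool(adds) and all(adds) and not any(deletes)

hunk = ["@@ -1,2 +1,2 @@", " a = 1", "-b = 2", "+b = 3"]
print(looks_applied(check_hunk(hunk, ["a = 1\n", "b = 3\n"])))
print(looks_applied(check_hunk(hunk, ["a = 1\n", "b = 2\n"])))
```

A hunk for which looks_applied returns True is a candidate for commenting out, subject to inspection of the target file.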
If all diffs in a patch are commented out, the Checker will issue a message like:
INFO: skipping empty patch
Some example uses of this feature are:
To comment out a diff for a file in which you are not interested:
"""
N/U not BeagleBone Black files
diff --git a/arch/arm/boot/dts/am335x-bone.dts b/arch/arm/boot/dts/am335x-bone.dts
...
"""
To comment out a diff or a hunk which is not needed:
"""
N/N am33xx.dtsi has the add lines and more values
diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi
...
"""
The use of ‘N/N’ and other codes to start explanatory notes is conventional, and is not interpreted by the software.
The ‘vcpf’ (view checker output, patch and its files) function of the Helper can be very helpful in determining if a patch is correct or if it is needed.
See the files in the “examples” folder for complete usage examples.
Command objects provide a simple wrapper for the Python subprocess module. Command execution can be synchronous, or asynchronous. Normally, synchronous mode will be used by the Helper, except that most of its view commands use asynchronous mode.
Commands may be passed as strings or as lists of arguments.
Command strings are executed in shell mode:
cd ~;pwd;ls
There appear to be some limitations to this approach, as in a command string like:
cd ~; source .profile;cdlsk
where ‘cdlsk’ is an alias defined in .profile:
or ‘~/.profile’.
The ‘cdlsk’ alias will not be recognized.
You cannot split the command into multiple invocations because the environment created by ‘source ./.profile’ will be lost when the sub process exits.
Commands that normally produce output organized in columns in an interactive shell, e.g. ‘ls’, will instead produce a list of items separated by ‘\n’ characters.
The usage of stdout and stderr by Linux programs is variable, for example:
- bash may return zero, indicating success, but also return error output.
- wget returns normal output on stderr.
Some programs use non-ASCII characters (i.e. ord(ch) >= 0x80 in Python terms) in their output. For example, certain commands surround filenames with left single quote and right single quote characters. Even though Eclipse and PyDev can display such characters, Python3 returns them as encoded bytes, which are shown as hexadecimal escape sequences (e.g. lsquo is represented as ‘\xe2\x80\x98’). The Command class converts lsquo and rsquo to a plain apostrophe (') as needed.
Python2.7 returns stdout and stderr data as strings, while Python3.x returns them as byte arrays. Command translates the values to strings as needed.
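The following sketch shows one way to obtain text output from a sub process on Python 3 and to replace curly single quotes with plain apostrophes. It is an illustration only and does not reproduce the Command class; the names run and to_text are hypothetical, and UTF-8 decoding is an assumption about the program's output encoding.

```python
import subprocess
import sys

def to_text(data):
    """Decode bytes (assumed UTF-8) and replace curly single quotes by '."""
    if isinstance(data, bytes):
        data = data.decode("utf-8", "replace")
    return data.replace(u"\u2018", "'").replace(u"\u2019", "'")

def run(command):
    """Run a command synchronously; return (returncode, stdout, stderr).

    A string is executed in shell mode, a list of arguments is not.
    """
    proc = subprocess.Popen(command,
                            shell=isinstance(command, str),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, to_text(out), to_text(err)

code, out, err = run([sys.executable, "-c", "print('hi')"])
print(code, out.strip())
print(to_text(b"\xe2\x80\x98file.c\xe2\x80\x99"))
```

Because some programs write normal output to stderr, the caller should examine both streams and the return code rather than rely on any one of them.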
See the ‘Command’ section of the API for call details.
Finder objects try to find references to patterns you specify in a file or a file tree. For each pattern you select, the Finder will return a list of references it finds in the target file or folder.
Unlike programs such as ctags and cscope, the Finder does not attempt to index your entire software tree, but instead focuses on the folders, files and text patterns you specify.
Either absolute or relative paths may be used in specifying the search root. When relative paths are used, they must be accessible from the caller’s current directory.
Using common patterns such as ‘dma’ may produce a large amount of output, particularly if you set the search root to the root of a kernel tree, so your choices of root and patterns should be made with care.
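To make the idea concrete, here is a minimal sketch of a substring search over an explicit list of files. It is not the Finder implementation: the function find_refs and its ‘path:line: text’ output format are hypothetical, and the demonstration file is created in a temporary folder.

```python
import os
import tempfile

def find_refs(root_path, file_paths, substrings, encoding="latin_1"):
    """Return 'path:lineno: text' for each line containing any substring."""
    refs = []
    for rel_path in file_paths:
        full_path = os.path.join(root_path, rel_path)
        with open(full_path, encoding=encoding) as handle:
            for number, line in enumerate(handle, 1):
                if any(s in line for s in substrings):
                    refs.append("%s:%d: %s" % (rel_path, number, line.strip()))
    return refs

# Demonstration on a throwaway file (contents are made up for the example).
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "board.dts"), "w") as handle:
        handle.write('/ {\n\tcompatible = "ti,am33xx";\n};\n')
    refs = find_refs(root, ["board.dts"], ["compatible = "])

print(refs)
```

Restricting the search to a prepared file list, rather than walking the whole tree on every call, is what keeps the amount of output and the run time manageable.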
See the ‘Finder’ section of the API documentation for more information on the class methods.
A JSONConfig object holds application configuration data taken from enhanced JSON encoded files.
The JSON files may contain line comments and block comments.
Line comments are lines that start with ‘#’ (ignoring leading white space), and will be removed in file loading.
Block comments are coded by inserting a line with only ‘"""’ before and after the lines to be commented out. Any lines between such lines are removed in file loading.
Once loaded the object may be accessed by key indexing of its dict super class instance, or by use of the get and set methods. These two methods support convenient multilevel indexing by the use of “path expressions”. For example, ‘...get(“mysql_options/admin_profile/data_base”)’ will return the value of self[“mysql_options”][“admin_profile”][“data_base”]. The user can specify the path separator character in the ‘separator’ option passed to the constructor.
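The path expression lookup can be pictured as a loop over the path components. The sketch below uses a plain dict and a hypothetical function get_path; it is not the JSONConfig get method, and the sample value is invented.

```python
def get_path(data, path, separator="/"):
    """Follow a path expression such as 'a/b/c' through nested dicts."""
    node = data
    for key in path.split(separator):
        node = node[key]  # raises KeyError if a component is missing
    return node

config = {"mysql_options": {"admin_profile": {"data_base": "test"}}}
print(get_path(config, "mysql_options/admin_profile/data_base"))
print(get_path(config, "mysql_options.admin_profile.data_base", separator="."))
```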
Python’s JSON loader strictly requires the data to have correct JSON syntax, and will generate exceptions if it doesn’t. To avoid confusing users by presenting them with line numbers in the stripped line set, JSONConfig will catch exceptions raised by the JSON decoder, map the exception line numbers back to their equivalents in the original file, and reraise the exceptions as JSONConfigError objects.
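The comment stripping and line number mapping can be sketched as follows. This is an illustration under stated assumptions, not the JSONConfig code: the names strip_comments and load_config are hypothetical, a plain ValueError stands in for JSONConfigError, and json.JSONDecodeError requires Python 3.5 or later.

```python
import json

def strip_comments(lines):
    """Remove '#' line comments and blocks delimited by lines of only triple quotes.

    Returns (kept_lines, line_map), where line_map[i] is the 1-based
    number that kept_lines[i] had in the original file.
    """
    kept, line_map, in_block = [], [], False
    for number, line in enumerate(lines, 1):
        stripped = line.strip()
        if stripped == '"""':
            in_block = not in_block
            continue
        if in_block or stripped.startswith("#"):
            continue
        kept.append(line)
        line_map.append(number)
    return kept, line_map

def load_config(text):
    """Parse enhanced JSON; report errors using original line numbers."""
    kept, line_map = strip_comments(text.splitlines())
    try:
        return json.loads("\n".join(kept))
    except json.JSONDecodeError as err:
        original = line_map[err.lineno - 1] if line_map else 1
        raise ValueError("JSON error at line %d of the original file: %s"
                         % (original, err.msg))

good = "\n".join(["# settings", "{", '"""', '"old": 0,', '"""', '"new": 1', "}"])
print(load_config(good))
```

Without the mapping, an error on the seventh line of a file with four comment lines would be reported as being on the third line, which is the confusion the remapping avoids.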
On Python 2.x, the get method will translate unicode dict values to str objects.
See the ‘JSONConfig’ section of the API documentation for call details.
See the test*.py programs in the examples folder, and the ‘__init__’ method of the Helper for typical usage.
Matcher objects match strings to patterns. Used by the Walker to filter file names and the Finder to filter text strings, the Matcher allows you to specify match patterns by:
- match: a list of exact match patterns
- prefix: a list of prefix patterns
- suffix: a list of suffix patterns
- substr: a list of substring patterns
- regexp: a list of regular expression patterns
- funcs: a list of callback functions
Patterns are tested against strings in the order shown above.
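The ordered testing of pattern types can be sketched as a single function. This is not the Matcher class: the function matches is hypothetical, and the use of re.search for ‘regexp’ patterns and of the truth value of a callback's result for ‘funcs’ are assumptions.

```python
import re

# Pattern types in the order in which they are tested.
ORDER = ("match", "prefix", "suffix", "substr", "regexp", "funcs")

def matches(text, params):
    """Return True as soon as any pattern in params matches text."""
    tests = {
        "match": lambda p: text == p,
        "prefix": lambda p: text.startswith(p),
        "suffix": lambda p: text.endswith(p),
        "substr": lambda p: p in text,
        "regexp": lambda p: re.search(p, text) is not None,
        "funcs": lambda p: bool(p(text)),
    }
    for kind in ORDER:
        for pattern in params.get(kind, []):
            if tests[kind](pattern):
                return True
    return False

print(matches("Makefile.boot", {"prefix": ["Kconfig", "Makefile"]}))
print(matches("am33xx.dtsi", {"suffix": [".dts", ".dtsi"]}))
print(matches("README", {"prefix": ["Kconfig"], "suffix": [".dts"]}))
```

Testing the cheap comparisons first means that a regular expression or callback is only evaluated for strings that the simpler patterns did not already accept.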
Some examples:
f1 = Matcher({ "prefix" : ["Kconfig", "Makefile"] })
f2 = Matcher({ "suffix" : [".dts",".dtsi"] })
f3 = Matcher({ "substr" : ["am33xx.dtsi", "am335x-bone.dts", "am335x-boneblack.dts",
"am335x-bone-common.dtsi", "am33xx_pwm-00A0.dts", "bone_pwm_P8_13-00A0.dts" ] })
f4 = Matcher({ "regexp" : [r".*am335x\-b.*\.dts.*", r".*am33xx.*\.dts.*", r".*bone.*\.dts.*"] })
Parameters like f1 can be used by the Walker to find all files whose names begin with “Kconfig” or “Makefile” in the folders you told it to search.
Parameters like f2 can be used by the Walker to find all files whose names end with “.dts” or “.dtsi”.
Parameters like f3 or f4 can be used by the Finder to find all references to the specified strings in your patch or source files.
Using regular expressions may eliminate the need to use line continuations, but it can be difficult to formulate expressions that produce exactly the same result as simpler combinations of ‘substr’, etc.
The ‘encoding’ example shows how to use callback functions to select files for testing.
See the ‘Matcher’ section of the API documentation for call details.
Patch objects parse the strings of a patch file into an object suitable for analysis. The object will contain a list of Diff objects.
Patch objects will discard any patch lines that have been commented out by surrounding them with lines containing only ‘"""’. If you plan to use ‘git am’ on a patch later, you should do the commenting in a copy of the patch.
See the ‘Patch’ section of the API documentation for call details.
See checker.py for example usage of the Patch class.
Patch sets are normally organized in two level trees, with a root folder and sub directories for specific topics, e.g. ‘dma’. The ‘name’ of a patch consists of its subdirectory name joined to its file name by ‘/’.
A patch set description is a dict that specifies the order in which the patches are to be applied. Its ‘groups’ element is a list of the topic specific sub folders mentioned above. Within each sub folder, patches are intended to be applied in the order indicated by the first four characters of the patch file names. This order was encoded by using ‘git format-patch’ or quilt to format the patches.
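The ordering rule can be expressed as a sort key: first the position of the patch's group in the ‘groups’ list, then the first four characters of the file name. The sketch below is not the PatchSet implementation; the function name, the group names and the patch names are made up for the example, and placing patches from unlisted groups last is an assumption.

```python
def sort_in_patchset_order(names, groups):
    """Sort 'group/filename' patch names by group order, then 4-char prefix."""
    rank = {group: index for index, group in enumerate(groups)}

    def key(name):
        group, _, filename = name.partition("/")
        # Patches from groups not listed in 'groups' are placed last.
        return (rank.get(group, len(groups)), filename[:4])

    return sorted(names, key=key)

groups = ["dma", "adc"]
names = ["adc/0002-b.patch", "dma/0018-x.patch",
         "adc/0001-a.patch", "dma/0003-y.patch"]
print(sort_in_patchset_order(names, groups))
```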
Strings objects provide useful string like methods for lists of strings.
Note that taking a slice of a Strings object will always return a Strings object.
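The slicing behavior can be obtained in Python 3 by overriding __getitem__ in a list subclass. The class below is a hypothetical stand-in named Lines, not the Strings class itself, and its unique method is only an example of a list level helper.

```python
class Lines(list):
    """A list of strings whose slices are also Lines objects."""

    def __getitem__(self, index):
        result = list.__getitem__(self, index)
        if isinstance(index, slice):
            return type(self)(result)  # keep the subclass for slices
        return result

    def unique(self):
        """Return a new Lines object without duplicates, keeping first occurrences."""
        seen = set()
        return type(self)(s for s in self if not (s in seen or seen.add(s)))

lines = Lines(["b", "a", "b", "c"])
print(type(lines[0:2]).__name__)
print(lines[1:].unique())
```

Because slices keep the subclass, methods such as unique can be chained directly onto a slice.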
Strings is used by the Watcher class, as well as by the Diff, Hunk and Patch classes. See those files and the examples files for more usage examples.
Viewer objects allow you to view sets of related files, using graphical or nongraphical editors.
The default editor on Linux is ‘gedit’, which allows numerous files to be displayed in a single window.
The editor default can be overridden by passing an editor specification to the constructor, as shown in the API section below.
Walker objects walk a file tree and return the path of each discovered file. Directory and file filters may be applied to narrow the scope of a search in a large file tree.
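A filtered tree walk of this kind can be built on os.walk from the standard library. The sketch below is not the Walker class: the function walk_files and its parameters are hypothetical, directories are excluded by name at any depth, and the demonstration tree is created in a temporary folder.

```python
import os
import tempfile

def walk_files(root, excl_dirs=(), suffixes=()):
    """Return '/'-separated paths, relative to root, of the selected files."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in excl_dirs)
        for name in sorted(filenames):
            if not suffixes or name.endswith(tuple(suffixes)):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                paths.append(rel.replace(os.sep, "/"))
    return paths

# Demonstration on a throwaway tree.
with tempfile.TemporaryDirectory() as root:
    for rel in (".git/config.c", "drivers/a.c", "drivers/a.txt", "Makefile"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = walk_files(root, excl_dirs=[".git"], suffixes=[".c"])

print(found)
```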
Watcher objects facilitate viewing diff sections in your patch files, diff sections in “patch archive” files, and the source files they reference. Patch archive files contain all the diff sections that were applied to the previous version of a kernel to obtain a new version.
See the ‘archives’ folder in the examples for typical usage.
Helper objects facilitate use of PatchTools’s tools, which have many required and optional parameters. The Helper provides wrapper functions for PatchTools classes, and some useful utility functions.
The Helper constructor creates a ‘defaults’ item in the application’s config data object, using the values of ‘sourcedir’, ‘patchdir’, etc., found in the data, and subsequently uses it in calls to wrapper functions where the caller does not provide a parameters object.
See the ‘Helper’ section of the API documentation for further info.
Many operations use configuration data that is loaded into a JSONConfig object during initialization. Items specified in the configuration data include the location of the source tree and of the patches directory. A description of the patch set may also be stored there.
See the ‘config.json’ file for an example, and the ‘JSONConfig’ section of the API documentation.
PatchTools objects are intended to be embedded in Python scripts, which can have various exception reporting and logging schemes. Consequently PatchTools classes generally do not catch exceptions except to translate them to other exceptions. For example, JSONConfig objects catch ‘ValueError’ exceptions generated by Python’s json module, and map their line numbers to those used in the source file, which may have different line numbers due to comment lines.
The exceptions.py file provides an ExceptionHandler class which can be used to print exception reports.
PatchTools classes uniformly use exceptions to report errors, for example parameter errors, while return values are used to deliver valid result data to the caller. The exceptions are defined in exception.py.
See the example files for a simple exception handling scheme.
All modules encode file paths in Unix style using ‘/’ characters.
The functions module determines if the Python version is 3.x when it is loaded. The Python version can then be obtained by other modules by calling the Functions.is_python3 method.
If PatchTools is not installed in your Python installation, and you are not using Eclipse, you must specify the PYTHONPATH environment variable to allow Python to find the PatchTools modules. The easiest way to do this is to export the definition from your .profile file:
export PYTHONPATH="$HOME/Projects/Eclipse/Linux/PatchTools"
Then you can run test programs from any location.
Eclipse-PyDev will define a PYTHONPATH variously according to the settings you choose when creating your project. The PYTHONPATH value should include the parent folder of the ‘patchtools’ source folder, as in the setting above.
Linux reportedly has adopted UTF-8 as the default text encoding, but some Linux kernel files contain ‘Latin_1’ characters whose numeric values are greater than 127, and are not valid UTF-8 start bytes. Consequently the file access methods in the Functions class default to ‘Latin_1’ encoding.
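The difference between the two encodings can be demonstrated with a single byte. In the sketch below the sample bytes are invented; 0xA9 is the copyright sign in Latin-1, but it is not a valid start byte in UTF-8.

```python
raw = b"Copyright \xa9 Example\n"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # 0xA9 cannot start a UTF-8 sequence

# Latin-1 maps every byte value to a character, so decoding cannot fail.
text = raw.decode("latin_1")

print(utf8_ok)
print(text.encode("latin_1") == raw)
```

Since Latin-1 decoding is lossless for any byte sequence, a file read and written back with that encoding is preserved byte for byte, even if it actually contains UTF-8 text.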
This section shows typical usage of PatchTools classes and the Helper class.
See the folders in the ‘examples’ tree for the example code referenced below.
This example shows the usage of many of the features described above.
Suppose we want to create a source file list for the Finder that enumerates selected files in the kernel tree, but excludes the ‘.git’ folder and any folders in ‘arch’ other than ‘arch/arm’. This script will do the job:
f = { "suffix" : [".c",".h"], "prefix": ["Makefile","Kconfig"] }
p = { "root_path" : c['sourcedir'], "incl_files" : f }
excl = { "excl_dirs" : [".git","arch"] }
incl = { "incl_dirs" : ["arch/arm"] }
p1 = h.extend(p, incl)
p2 = h.extend(p, excl)
w = h.walk(p1) + h.walk(p2)
h.save(w, c['logdir'] + "/src_files.txt")
Lines 1-4 create some dict variables to include in the parameters for the walk operations.
Lines 5 and 6 combine these dicts to make the final parameter dicts.
Line 7 executes the walk function on each parameter dict and combines their output.
Line 8 writes the data to a file for future use.
Running a script like this at the start of a project can substantially reduce the time required for subsequent Finder operations.
In this example we want to assess the difficulty of porting some patch changes to our kernel by displaying a Checker output file, a kernel source file, and the patches that modify the source file:
h.view(c['logdir'] + "/match_a.log")
h.vf2p("arch/arm/boot/dts/am33xx.dtsi")
If the ‘gedit’ editor is used (the Linux default editor), all the files will be displayed in a single window, in the following order for this example:
- The source file
- The patch files in patchset group order, which is presumably their commit order.
We observe that searching for ‘am33xx.dtsi’ at the top of the Checker output file takes us to the report for patch “dma/0018...”, which is the first patch file displayed by gedit.
The Viewer classes normally launch the editor in asynchronous (nowait) mode, which allows users to enter two commands like the ones above without being blocked at the first command.
This example shows how the wait/nowait feature of the Viewer classes could be used to display a patch file, its corresponding Checker output file, and the files referenced by the patch, one at a time:
p = c['patchdir'] + "/adc/0002-input-ti_am33x_tsc-Step-enable-bits-made-configurabl.patch"
m = h.check(p)
f = "matcher.txt"
h.write(m, f)
v = { "wait" : True }
s = c['sourcedir']  # keep 'c' bound to the config data
a = s + "/drivers/iio/adc/ti_am335x_adc.c"
b = s + "/drivers/input/touchscreen/ti_am335x_tsc.c"
d = s + "/drivers/mfd/ti_am335x_tscadc.c"
e = s + "/include/linux/mfd/ti_am335x_tscadc.h"
h.view([p,f,a],v)
h.view([p,f,b],v)
h.view([p,f,d],v)
h.view([p,f,e],v)
In this scenario, each view command will block until its editor is closed.
In this example we will identify all the drivers used to control the Beagle’s AM335X processor by cross referencing “compatible =” items in the Beagle’s .dts files to ”.compatible =” items in the “of_match” tables of the kernel source files.
First we will find the dts compatible items:
d = "arch/arm/boot/dts/"
l = [(d + s) for s in ["am33xx.dtsi","am335x-bone-common.dtsi", "am335x-boneblack.dts",
"am33xx-clocks.dtsi", "tps65217.dtsi"]]
f = h.find({ "substr" : ["compatible = "] }, { "root_path" : c['sourcedir'], "file_paths" : l })
h.write(f, "test3.txt")
test3.txt requires some post processing, as it contains extraneous text and duplicate entries. The Helper’s write method was used to save the list as ‘\n’ terminated strings, so this task can be automated:
strings = Strings(h.read("test3.txt"))
strings = strings.ltrim("compatible = ").rtrim(';').sort().unique()
h.write(strings, "test4.txt")
Next we will list the kernel source files that might contain the corresponding entries in their of_match tables:
p = { "root_path" : c['sourcedir'], "incl_files" : { "suffix" : [".c"] } }
p1 = h.extend(p, { "incl_dirs" : ["arch/arm"] })
p2 = h.extend(p, { "excl_dirs" : ["arch", ".git", "Documentation", "staging", "samples", "tools" ] })
w1 = h.walk(p1)
w2 = h.walk(p2)
h.save(w1+w2, "test5.txt")
In this code the output was saved as a JSON object to allow loading it as the ‘file_paths’ parameter for the following find operation.
Finally we will match the patterns in “test4.txt” to the source files:
p = h.read("test4.txt") # read compatible = items
f = h.load("test5.txt") # load candidate sources
r = c['sourcedir']
m = h.find({ "substr" : p },{ "root_path" : r ,"file_paths" : f })
h.write(m, "test6.txt")
Again some editing of the output file “test6.txt” is needed to eliminate extraneous text and duplicates:
strings = h.read("test6.txt")
strings = [string[:string.find(':')] for string in strings]
strings = Strings(strings).sort().unique()
h.write(strings, "am335x_drivers.lst")
The final list contains 36 am335x related drivers that are used to control the Beagle board. The list can be saved and passed to the Checker as a ‘targets’ option, or to the Watcher to be run whenever a new kernel version is released by kernel.org.
The ARM community, including Texas Instruments and the Beagle developers, have been making substantial progress in the last year or two in adopting the “Device Tree” system for their products. It is possible that, although they have submitted many patches related to this effort, not all the patches have been integrated into our target kernel. In this example we will investigate the state of Beagle related .dts and .dtsi files in the patches and the kernel.
Examination of the kernel’s ‘/arch/arm/boot/dts’ folder shows that there are several files related to our target device, the BeagleBone Black:
- am335x-boneblack.dts
- am335x-bone-common.dtsi
- am33xx.dtsi
- am33xx-clocks.dtsi
- tps65217.dtsi
We can find all the patches that touch these files with this Helper script:
l = h.list_patches()
p = { "substr" : ["am33xx.dtsi", "am335x-boneblack.dts", "am335x-bone-common.dtsi",
"tps65217.dtsi", "am33xx-clocks.dtsi"] }
f = h.find(p, { "root_path" : c['patchdir'], "file_paths" : l, "format" : "files" })
g = h.PatchSet(c['defaults']).sort_patches({ "patches" : f, "order" : "patchset" })
h.save(g, "dts_patch_refs.lst")
The first and sixth lines show use of the Helper’s utility functions, ‘list_patches’ and ‘save’.
The fourth line shows use of one of the Helper’s class wrapper functions, ‘find’. The “files” option tells the Finder to return only filenames, with no duplicates.
See the API sections for descriptions of the functions, classes and parameters used.
Now we have the file “dts_patch_refs.lst”, which contains the names of 76 patches that touch the files. From here we can use the Helper’s view commands to view the patches individually, and the files they use. For example:
h.vp2f("pm/0062-ARM-OMAP2-AM33XX-timer-Interchance-clkevt-and-clksrc.patch")
The files can also be compared against a Checker output file by using the ‘vcpf’ command:
h.vcpf("check.log", "pm/0062-ARM-OMAP2-AM33XX-timer-Interchance-clkevt-and-clksrc.patch")
If the default editor ‘gedit’ is used, all the files for a single command will appear in one window.
Inspecting the files in this way could take considerable time, but this process can be accelerated by using a Helper script like this:
fl = h.load("dts_patch_refs.lst")
cp = h.extend(c['defaults'], { "mode" : "complete", "find" : True })
ep = { "wait" : True }
pd = c['patchdir'] + '/'
for patch in fl:
    l = h.check(patch, cp)
    h.write(l, "test6.txt")
    h.vcpf("test6.txt", pd + patch, c['defaults'], ep)
After checking the 76 patches we find that:
- 27 of the patches specify changes that are already in the kernel, so the patches can be ignored.
- Another 28 of the patches relate to features we won’t use, i.e. certain capes, so these patches can also be ignored.
- One patch definitely needs to be fixed.
- 21 patches need fixing if we will use their features, e.g. AM335 reset control or TI’s PM firmware.
- One patch has an undesirable modification of generic kernel files to work around a device specific problem, and should be redone.
See the ‘report.txt’ file in the examples/alldrivers folder for details.
Similar investigations can be done for subsystems you may be interested in, e.g. dma, adc, gpio, etc., with comparable results.
In this example we look into the appropriateness of patches that modify the touch screen control (TSC) and analog-digital converter (ADC) control logic in the kernel and driver files. We will use the Helper to facilitate the investigation.
First we find patches that relate to terms such as “tscadc”, “tsc” and “adc”, using search terms such as “adc.c” to exclude extraneous matches to words that merely contain those terms:
l = h.list_patches()
p = { "substr" : ["tscadc.c","tscadc.h","tsc.c","tsc.h","adc.c","adc.h"] }
f = h.find(p, { "root_path" : c['patchdir'], "file_paths" : l, "format" : "full" })
h.save(f, "patch_refs.lst")
Next we make a list of the patches that touch each file, without duplicates:
l = h.list_patches()
t = ["drivers/iio/adc/ti_am335x_adc.c",
"drivers/input/touchscreen/ti_am335x_tsc.c",
"drivers/mfd/ti_am335x_tscadc.c",
"include/linux/mfd/ti_am335x_tscadc.h",
"include/linux/input/ti_am335x_tsc.h",
"include/linux/clk-provider.h"]
p = { "root_path" : c['patchdir'], "file_paths" : l, "format" : "files" }
f = h.find({ "substr" : t }, p)
f = [s[:s.find(':')] for s in f] # Extract patch name
s = Strings(f).sort().unique()
h.save(s, "patch_refs2.lst")
The “patch_refs2.lst” file now contains the names of 23 patches that touch the files. We can use the Checker and Viewer to examine the patches and files:
f = h.load("patch_refs2.lst")
p = h.extend(c['defaults'], { "mode" : "complete", "find" : True })
r = c['patchdir'] + '/'
cf = "../test/test6.txt"
for patch in f:
    m = h.check(patch, p)
    h.write(m, cf)
    h.vcpf(cf, r + patch, p, { "wait" : True })
The results have been summarized in the file “report.txt” in the examples/tsc_adc folder. See the Notes section at the bottom of that file for an explanation of the codes used.
We see that although the majority of diff and hunk sections are marked “N/N” (not needed), a significant minority are indicated to need fixing. Some of these only need to have their line numbers adjusted to fit the target code, but others have more serious problems. In particular, certain features such as enhancing interrupt logic and adding work queues have not been integrated into the target kernel drivers. In other cases, the changes requested by some patches may have been obsoleted by other patches that were integrated into the kernel.
The situation is more complicated than was the case in the DTS study, and more investigation is needed. However, we can drop further consideration of all the patch sections marked “N/N” or “N/U”, and focus on the remaining sections. The best approach may be to use the existing versions of the target files, and only fix and apply the patch sections that affect our project.
The BeagleBone board developers have devised a system to allow accessory boards to be plugged into a Beagle board using its GPIO connectors, and have supported the development of a kernel mode file ‘capemgr.c’ to provide some control of such boards. The patches that created and enhanced the file have not been integrated into the target kernel, so the file is not found there. In this example, we investigate the use of the ‘git am’ command and the patch utility to recreate the file from the patches.
See the examples/capemgr folder for the code, and Appendix D - Checker vs. quilt vs ‘git am’ for a discussion of the results. The generated file is found under the examples/capemgr/quilt folder.
Complete documentation of the APIs of the Helper and PatchTools classes is provided below.
Extract information from “patch archive” files
Constructor
Find archive file sections that modify files also modified by our patches
Validate the contents of Linux kernel patch files against the files specified in them
Constructor
Validate the contents of one or more Linux kernel patch files against a kernel
Find references to patterns you specify in a Linux kernel tree, a patch archive file, or in a set of patches
Constructor
Report matches by file to (patterns) found in selected files
Display a set of files in a user selected editor
Constructor
Launch specified editor to view the file(s)
List patch and files it uses
Display a patch file and the other patch files that use the same source files
Display patch and archive diff sections that use its files
List checker output file, patch and files it uses
Enumerate paths of selected files in a source tree
Constructor
Determine if your patches have been integrated into released kernel versions
Constructor
View files related to archive diff sections
Extract information from a diff section of a patch file
Constructor
Extract information from a hunk section of a patch file
Store configuration data obtained from enhanced JSON input files
Constructor
Ensure that a slice of our data is returned as a str object, when the value is a unicode string, on Python2.x
Get top level value or internal value
Set top level value or internal value
Add data to the current config
Determine if item is in the config data
Extract information from a patch file
Constructor
List the files referenced by a patch, without duplicates
Extract information from a set of patch files
Constructor
Get source file data for patches
Get patch data for source files
Return list of names of patches in our patch set
Return a list of source files referenced in one patch file
Return a list of patch files that refer to one source file
Get a list of patches that patchname depends on
PatchTools super class
Provides an identifiable super class for all PatchTools classes. Implements common parameter checking functions for the sub classes.
Provide some useful string like methods for lists of strings
Constructor
Ensure that slices of Strings objects are returned as Strings objects, not lists
Find the first string in our data that starts with (pattern)
Find the first string in our data that starts with a pattern in (patterns)
Find the last string in our data that starts with (pattern)
Find the last string in our data that starts with a string in (patterns)
Find all strings in our data that start with (pattern)
Return indices of strings that exactly match (pattern)
Remove leading lines that are empty or whitespace
Remove trailing lines that are empty or whitespace.
Split our data into two parts at a splitter pattern, searching forwards
Split our data into two parts at a splitter pattern, searching backwards
Split our data into two or more parts at occurrences of a splitter pattern
Extract a list of sections tagged by (begin) and (end)
Remove a list of sections tagged by (begin) and (end)
Remove duplicate successive instances of strings in our data.
Join a list of objects into a single Strings object. Each object is a list of strings or a Strings object.
Execute command in subprocess
Execute subprocess synchronously
Execute subprocess asynchronously
Wait for asynchronous subprocess to exit
Handle exceptions
Constructor
Facilitate use of PatchTools’s classes
Handle Finder request
Handle Checker request
Handle Watcher request
View specified file(s)
View patch and the source files it references
View source file and the patches that reference it
View patch and other patches that reference its files
View patch and archive diff sections that reference its files
View Checker output, patch file and the source files it references, one at a time.
Implement filter selection of strings
Constructor
Try to match string to stored filter
Utility functions
Linux source files and patches that describe them may contain byte values that are legal ‘Latin-1’ (aka ISO-8859-1) character codes, but not legal ‘UTF-8’ start bytes. For example, 0xb3 is the ‘Latin-1’ character for the cube symbol, i.e. a superscript 3. For this reason the file access functions below default to the ‘latin_1’ encoding.
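The encoding point can be checked with a few lines of standard Python. This is only an illustration of the byte value mentioned above, using a made-up byte string; it does not use any PatchTools function.

```python
raw = b"10 m\xb3 of data"   # 0xb3 is SUPERSCRIPT THREE in Latin-1

# Decoding as UTF-8 fails, because 0xb3 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as Latin-1 always succeeds: every byte maps to one character.
text = raw.decode("latin_1")
print(utf8_ok)
print(text)
```

Because Latin-1 maps each byte to exactly one character, encoding the text back with 'latin_1' reproduces the original bytes.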
Report whether the host system is Windows
Report whether the Python version is >= 3
Determine size of the file/folder at (path)
Determine whether the object at (path) is a directory
Determine whether the object at (path) is a file
Form a path from (head) and (tail).
Remove (head) from (path).
Read file data.
Read file lines.
Read file strings.
Write text to file at path
Write strings to file at (path)
Extract filename from string
Replace all internal string whitespace by single spaces
Split a string into words on space boundaries.
Determine if an object is a string
For a number of years, the Linux kernel source has been managed by the GIT source code control system. Normally you will use GIT commands to obtain and update kernels. GIT uses patches to modify the kernel sources. Linux kernel patches are usually produced by running the ‘git format-patch’ command, which generates one patch file for each commit you select; your changes must therefore be committed first.
A typical patch file is “0002-ARM-OMAP-Hack-AM33xx-clock-data-to-allow-JTAG-use.patch”:
From ac9bb90cbb8c81dd384e339063e72d2fc90221c2 Mon Sep 17 00:00:00 2001
From: Matt Porter <mporter@ti.com>
Date: Mon, 7 Jan 2013 11:55:00 -0500
Subject: [PATCH 02/35] ARM: OMAP: Hack AM33xx clock data to allow JTAG use
The debugss interface clock must remain enabled at init
in order to prevent an attached JTAG probe from hanging.
Signed-off-by: Matt Porter <mporter@ti.com>
---
arch/arm/mach-omap2/cclock33xx_data.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mach-omap2/cclock33xx_data.c b/arch/arm/mach-omap2/cclock33xx_data.c
index ea64ad6..a09d6d7 100644
--- a/arch/arm/mach-omap2/cclock33xx_data.c
+++ b/arch/arm/mach-omap2/cclock33xx_data.c
@@ -428,7 +428,7 @@ DEFINE_STRUCT_CLK(smartreflex1_fck, dpll_core_ck_parents, clk_ops_null);
* - usbotg_fck (its additional clock and not really a modulemode)
* - ieee5000
*/
-DEFINE_CLK_GATE(debugss_ick, "dpll_core_m4_ck", &dpll_core_m4_ck, 0x0,
+DEFINE_CLK_GATE(debugss_ick, "dpll_core_m4_ck", &dpll_core_m4_ck, ENABLE_ON_INIT,
AM33XX_CM_WKUP_DEBUGSS_CLKCTRL, AM33XX_MODULEMODE_SWCTRL_SHIFT,
0x0, NULL);
--
1.7.10.4
This format is a variation on the ‘diff -urN’ format that is sometimes described in the literature, and is still used in application packages. The patch content in this format is the body of an email message. The patch may have one or more ‘diff’ sections that refer to the same or different files, and each diff section may have one or more ‘hunk’ sections (marked by lines starting with ‘@@’).
See <http://wikipedia.org/wiki/Diff> for basic information on diff formats.
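The nesting of diff and hunk sections can be illustrated with a short sketch. It assumes the patch has already been read into a list of lines; the function names and the sample patch are hypothetical and are not part of the PatchTools API.

```python
def split_diff_sections(lines):
    """Group patch lines into diff sections, each starting at 'diff --git'."""
    sections = []
    for line in lines:
        if line.startswith("diff --git "):
            sections.append([line])
        elif sections:
            # Lines before the first diff (the email header) are skipped.
            sections[-1].append(line)
    return sections

def count_hunks(section):
    """Count the hunk header lines (starting with '@@') in one diff section."""
    return sum(1 for line in section if line.startswith("@@"))

sample = [
    "Subject: [PATCH] example",
    "---",
    "diff --git a/one.c b/one.c",
    "--- a/one.c",
    "+++ b/one.c",
    "@@ -1,2 +1,2 @@",
    "-old",
    "+new",
    " same",
    "@@ -10,1 +10,2 @@ int main(void)",
    " keep",
    "+added",
    "diff --git a/two.h b/two.h",
    "--- a/two.h",
    "+++ b/two.h",
    "@@ -5,1 +5,1 @@",
    "-x",
    "+y",
]
sections = split_diff_sections(sample)
hunk_counts = [count_hunks(s) for s in sections]
print(hunk_counts)
```

Note that in this simple sketch any trailer lines after the last hunk (such as the GIT version line) stay attached to the last diff section.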
Note that:
In hunk lines (that begin with ‘@@’) the text after the second ‘@@’ is termed a ‘hunk note’ in the PatchTools source code. Hunk notes appear to be copies of the closest previous source line that has a character in the first column, e.g. function declaration lines and label lines in a .c file. Hunk notes are useful in finding text that is not in the location expected by a patch, but may not be unique if the file uses conditional compilation (e.g. “#ifdef FOO”).
The index line contains abbreviated SHA1 checksums of the file contents before and after the patch, as computed by GIT on the patch developer’s system. When possible, a short unique prefix of a checksum is used.
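As an illustration of the hunk line layout, the following sketch splits a hunk line into its line numbers and its hunk note. The regular expression and the function name are assumptions made for this example, not the parser used inside PatchTools. In unified diffs a missing count means 1.

```python
import re

# "@@ -old_start[,old_count] +new_start[,new_count] @@ [hunk note]"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")

def parse_hunk_header(line):
    """Return the numbers and the hunk note of a hunk line as a dict."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError("not a hunk header: %r" % line)
    old_start, old_count, new_start, new_count, note = m.groups()
    return {
        "old_start": int(old_start),
        "old_count": int(old_count) if old_count is not None else 1,
        "new_start": int(new_start),
        "new_count": int(new_count) if new_count is not None else 1,
        "note": note,
    }

header = parse_hunk_header("@@ -40,3 +40,6 @@ config SHARP_PARAM")
print(header)
```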
Some special cases are handled differently by GIT than by ‘diff -urN’:
Addition of a new file to the tree:
diff --git ...
new file mode 100644
index ea64ad6..a09d6d7 100644
--- /dev/null
+++ b/arch/arm/mach-omap2/cclock33xx_data.c
@@ -0,0 +1,n @@ ...
...
where n is the number of lines in the new file.
Removal of an old file from the tree:
diff --git ...
deleted file mode 100644
index ea64ad6..a09d6d7 100644
--- a/arch/arm/mach-omap2/cclock33xx_data.c
+++ /dev/null
@@ -1,n +0,0 @@ ...
...
where n is the number of lines in the old file.
Addition of a binary file to the tree:
diff --git ...
new file mode 100755
index <old git index> .. <new git index>
GIT binary patch
literal 11764
zcmds7eQ;dWb-! ...
The Checker does not handle binary patches.
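The special cases above can be told apart from the header lines of a diff section alone. The sketch below is one possible way to do so; the function name and return values are hypothetical, and it assumes the section is given as a list of lines beginning with its 'diff --git' line.

```python
def classify_diff(section):
    """Return 'binary', 'add', 'delete' or 'modify' for one diff section."""
    kind = "modify"
    for line in section:
        if line.startswith("@@"):
            break                      # header lines end at the first hunk
        if line.startswith("GIT binary patch"):
            return "binary"            # takes precedence over add/delete
        if line.startswith("new file mode"):
            kind = "add"
        elif line.startswith("deleted file mode"):
            kind = "delete"
    return kind

added = [
    "diff --git ...",
    "new file mode 100644",
    "index ea64ad6..a09d6d7 100644",
    "--- /dev/null",
    "+++ b/arch/arm/mach-omap2/cclock33xx_data.c",
    "@@ -0,0 +1,n @@ ...",
]
print(classify_diff(added))
```

A caller could use the 'binary' result to skip sections that cannot be checked line by line.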
Each kernel released by kernel.org has an associated “patch” file shown on the site’s main page, which you can download. The file is an ‘xz’ archive that contains a single text file which is the concatenation of the diff sections from all the patches that were applied to produce the kernel version. Note that the file does not contain patch file names, since the patches were submitted to kernel.org as email messages. The archive also does not contain other information from the patch headers such as ‘Author’, ‘Date’, etc.
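Such an archive can be read without unpacking it first, using the standard `lzma` module of Python 3.3 and later. The sketch below is an assumption about how one might walk the diff sections, not the PatchTools implementation; the function name is hypothetical, and the demonstration uses a tiny stand-in archive written to a temporary folder rather than a real kernel.org file.

```python
import lzma
import os
import tempfile

def iter_archive_diffs(path, encoding="latin_1"):
    """Yield each diff section of an xz patch archive as a list of lines."""
    section = []
    with lzma.open(path, "rt", encoding=encoding) as f:
        for line in f:
            if line.startswith("diff --git ") and section:
                yield section
                section = []
            section.append(line.rstrip("\n"))
    if section:
        yield section

# Build a small stand-in archive so that the sketch can be run as is.
demo_text = (
    "diff --git a/one.c b/one.c\n"
    "--- a/one.c\n"
    "+++ b/one.c\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
    "diff --git a/two.c b/two.c\n"
    "--- a/two.c\n"
    "+++ b/two.c\n"
    "@@ -3 +3 @@\n"
    "-x\n"
    "+y\n"
)
demo_path = os.path.join(tempfile.mkdtemp(), "patch-demo.xz")
with lzma.open(demo_path, "wt", encoding="latin_1") as f:
    f.write(demo_text)

diffs = list(iter_archive_diffs(demo_path))
print(len(diffs), diffs[1][0])
```

Reading the archive as a stream keeps memory use low, which matters because real archives can be very large.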
Some data values in the diffs may be different than the corresponding values in the submitted patch files. A patch is submitted by an author as an email message. If the patch is accepted for inclusion into the kernel, the author will later receive some message like “please rebase your patch to the tip of branch x”. Rebasing may involve changing the hunk line numbers and possibly the text of edits. The kernel developer/maintainer will apply the changed patch. Consequently the index checksums will be different than those in the patch, and the hunk line numbers and edits may also differ.
Examples are found in the examples folder, and are coded using the Helper class for convenience.
PatchTools was developed using a fairly large set of patches assembled by eLinux.org to create an embedded Ubuntu version based on Linux kernel version 3.8.13 that could be used with the well known “BeagleBone” embedded computers, which use ARM based processors provided by Texas Instruments. Contact eLinux.org to get the patches.
We have used various other kernels for testing; most recently a stable kernel with version number 3.16.1 was used.
The source file ‘config.json’ shows how to set up configuration data for the Helper, which uses a JSONConfig object to store and access the data.
After studying the Helper documentation, you can embark on experiments related to your project.
PatchTools is implemented with Python new style classes, for consistency of style, and to allow integration of its components into other scripts or processes.
Class constructors process parameters that are expected not to change between invocations, while instance methods process parameters that are different in each invocation. For example:
c = Checker({ 'patchdir' : <path to patch dir>, 'sourcedir' : <path to kernel dir> })
r = c.check(<patchname1>)
s = c.check(<patchname2>)
Class constructors have a single dict argument to receive parameters.
Instance methods usually have one or two target file parameters, and may have a dict argument to receive context data.
Since parameter values are provided by end users, extensive validation is performed on the values.
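The conventions above can be summarized in a small sketch. The class below is hypothetical and only mimics the pattern (a single dict argument for the constructor, validation of user-supplied values, per-call arguments on instance methods); it is not a PatchTools class, and the real validation is more extensive.

```python
import os

class SketchChecker(object):
    """Hypothetical class that follows the constructor/method split."""

    REQUIRED = ("patchdir", "sourcedir")

    def __init__(self, params):
        # Constructor: values that stay the same between invocations.
        if not isinstance(params, dict):
            raise TypeError("params must be a dict")
        for key in self.REQUIRED:
            if key not in params:
                raise KeyError("missing parameter: " + key)
            if not isinstance(params[key], str):
                raise TypeError("parameter must be a string: " + key)
        self.patchdir = params["patchdir"]
        self.sourcedir = params["sourcedir"]

    def patch_path(self, patchname):
        # Instance method: a value that differs on each invocation.
        return os.path.join(self.patchdir, patchname)

c = SketchChecker({"patchdir": "/tmp/patches", "sourcedir": "/tmp/linux"})
print(c.patch_path("0001-example.patch"))
```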
The Helper class provides convenient wrappers for the classes and useful utility functions. It can be used as shown in the examples.
The software has been implemented using the Kepler version of Eclipse with the PyDev plugin on a Linux Mint 16 system. Use of these tools for further development is not required, but if they are not used, some additional setup may be needed.
All import statements use this form:
from patchtools.lib.utils.functions import Functions as ut
This verbose form will prevent components in other packages from being found in the Python path when the intention is to import a PatchTools component. On launching a program the Python interpreter builds a sys.path variable including some standard locations and the value of the environment variable PYTHONPATH if it is defined.
If PatchTools is not installed in your Python system, Python will only be able to find it if PYTHONPATH is set correctly. To match the absolute paths as shown above, PYTHONPATH should point to the directory that contains the patchtools directory, not to the patchtools directory itself.
Eclipse-PyDev may not configure PYTHONPATH correctly if you are not careful, and also adds a large set of folders to the Python path. You can edit the Eclipse PYTHONPATH variable on the project’s properties page.
PYTHONPATH won’t normally be defined in your environment when, for example, you try to launch an application in a shell. On Linux, you can overcome this problem by exporting a value from your .profile file or by wrapping calls to the application file in a script (call it ‘do_run’):
PYTHONPATH=`pwd` python "$@"
which may be used like so:
./do_run test_app.py
The do_run script should be located in the parent directory of patchtools to get a correct value for PYTHONPATH.
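Another option, offered here as an assumption rather than something PatchTools provides, is to extend sys.path at the top of a launcher script before any PatchTools import. The helper name and the example path below are hypothetical.

```python
import os
import sys

def add_parent_of(package_dir):
    """Put the directory that contains (package_dir) on the Python path."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent

# Hypothetical location of a stand alone PatchTools source tree.
parent = add_parent_of("/tmp/work/patchtools")
print(parent)
```

After this call, an import of the form shown earlier would be resolved relative to the parent directory, provided the package really exists there.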
The Checker analyzes patches and source code, and reports problems to the user. It does not try to fix a patch or make it fit the source.
Quilt and ‘git am’ will detect the same sort of problems as the Checker does in their analysis of patches and sources. Quilt and ‘git am’ will try to resolve some of these problems by shifting patch lines elsewhere in the source file, sometimes by a large distance. But they will reject patches that cannot be fixed in this simple way.
Quilt will notify you that it has shifted lines to make a hunk fit the source, but ‘git am’ does not.
Inspection of typical Checker output gathered on the ‘patches-3.8’ patch set shows a sizable number of different patterns of mismatch between patches and source files. Many of these patterns will be too complex for quilt or ‘git am’ to fix. But the Checker can show you in great detail what is wrong with the patches, and can allow you to develop a strategy to fix them.
It is also possible that the simplistic fixing strategies pursued by quilt and ‘git am’ may not produce a correct result, or may hide from you that there is a problem in the code. In the ‘capemgr’ tests in the ‘examples’ folder, it was observed that quilt shifted the last few hunks in the last patch by about 71 lines, and that ‘git am’ probably did also, but did not report doing so. Both programs claimed to have applied the patches successfully.
From the Checker output it appears that the next to last version of capemgr.c (produced by the ‘resources/0023...’ patch) had about 70 lines of extra text that was not expected to be present by the last patch. And examining the last two patches shows they both have the same ‘before’ checksum value in their index lines, which is incorrect. You may wish to investigate why this is so.
The Checker does not create any work files in the source tree, modify either the source or the patch files, require you to use a specific folder structure, or require you to have GIT metadata in the source tree.
Quilt requires you to create a tree of work files including the patches and kernel directory, and also requires you to reset the source files targeted by the patches to their original versions each time you run it to test a set of patches. It also requires you to reset the ‘applied-patches’ file to the empty state.
‘git am’, of course, can only be used in a “working directory”, i.e. a source tree that has been obtained from a GIT repo, and has GIT metadata.
The Checker emits various kinds of error messages:
Aside from errors in the patches themselves, there are some common reasons for failures:
These problems can sometimes be resolved by fixing and applying the early patch that failed. The ‘get_patch_patches’ method of the PatchSet object can be used to reveal dependencies between patches.
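The idea of a dependency between patches can be sketched as follows: a patch depends on every earlier patch that touches at least one of the same files. This is an illustrative assumption about what such a dependency means, not the code of ‘get_patch_patches’; the function name and the sample data are hypothetical.

```python
def patch_dependencies(ordered, patchname):
    """List earlier patches that share a file with (patchname).

    ordered: list of (patch name, list of file paths), in application order.
    """
    names = [name for name, _ in ordered]
    idx = names.index(patchname)          # raises ValueError if unknown
    target_files = set(ordered[idx][1])
    return [name for name, files in ordered[:idx]
            if target_files & set(files)]

ordered = [
    ("0001-add-adc.patch", ["drivers/iio/adc/ti_am335x_adc.c"]),
    ("0002-add-tsc.patch", ["drivers/input/touchscreen/ti_am335x_tsc.c"]),
    ("0003-fix-adc.patch", ["drivers/iio/adc/ti_am335x_adc.c",
                            "include/linux/mfd/ti_am335x_tscadc.h"]),
]
deps = patch_dependencies(ordered, "0003-fix-adc.patch")
print(deps)
```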
The most serious cause of failures is that other kernel developers have submitted patches that modify some of the same files as your patches, and that some of these patches have been integrated into Linux before yours. The most difficult case to resolve will be one in which another patch has partly fixed the same problem your patch is intended to fix, but does so with incompatible logic. You may only be able to resolve such a problem by rewriting your patches to work with your new kernel version. Alternately you may use the procedures shown in Example 5 of the Helper documentation to determine if you really need the patch.
For a file that is not in the target tree:
DIFF: "diff --git a/include/linux/input/ti_am335x_tsc.h b/include/linux/input/ti_am335x_tsc.h"
ERROR: "a" file not found: include/linux/input/ti_am335x_tsc.h
For a file with invalid length:
DIFF: "diff --git a/arch/arm/common/Kconfig b/arch/arm/common/Kconfig"
HUNK: "@@ -40,3 +40,6 @@ config SHARP_PARAM"
ERROR: invalid old start or count for file: start=40, count=3, length=22
Presumably the file previously had a greater length in the "a" version of the sources.
As it happens, the addition specified by the hunk ('+config TI_PRIV_EDMA') is in the target
file. So the hunk is not needed.
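The kind of range test that leads to such a message can be sketched in a few lines. This is an assumed reconstruction for illustration, not the Checker's own code; the function name is hypothetical.

```python
def hunk_range_ok(start, count, length):
    """Report whether old lines start..start+count-1 fit in (length) lines."""
    if count == 0:
        # Pure insertion: (start) is the line after which text is added.
        return 0 <= start <= length
    return start >= 1 and start + count - 1 <= length

# Values taken from the error message above.
print(hunk_range_ok(40, 3, 22))
```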
For a hunk whose lines can be shifted:
DIFF: "diff --git a/include/linux/mfd/ti_am335x_tscadc.h b/include/linux/mfd/ti_am335x_tscadc.h"
HUNK: "@@ -71,8 +71,6 @@"
WARN: "merge" line not found at 71: "#define STEPCONFIG_INM_ADCREFM STEPCONFIG_INM(8)"
WARN: "merge" line not found at 72: "#define STEPCONFIG_INP_MASK (0xF << 19)"
WARN: "merge" line not found at 73: "#define STEPCONFIG_INP(val) ((val) << 19)"
WARN: "merge" line not found at 74: "#define STEPCONFIG_INP_AN4 STEPCONFIG_INP(4)"
WARN: "merge" line not found at 75: "#define STEPCONFIG_INP_ADCREFM STEPCONFIG_INP(8)"
WARN: "merge" line not found at 76: "#define STEPCONFIG_FIFO1 BIT(26)"
FIND: "merge" line found at 79: "#define STEPCONFIG_INM_ADCREFM STEPCONFIG_INM(8)"
FIND: "merge" line found at 80: "#define STEPCONFIG_INP_MASK (0xF << 19)"
FIND: "merge" line found at 81: "#define STEPCONFIG_INP(val) ((val) << 19)"
FIND: "merge" line found at 82: "#define STEPCONFIG_INP_AN4 STEPCONFIG_INP(4)"
FIND: "merge" line found at 83: "#define STEPCONFIG_INP_ADCREFM STEPCONFIG_INP(8)"
FIND: "merge" line found at 84: "#define STEPCONFIG_FIFO1 BIT(26)"
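The shift in this example (lines expected at 71 but found at 79) can be computed with a simple search for the block of unchanged lines. The sketch below is an assumption about one way to do this, not the Checker's find logic; the function name and the generated test data are hypothetical.

```python
def find_shift(file_lines, context, start):
    """Return the nearest offset at which (context) matches, or None.

    start is the 1-based line number where the patch expects the block.
    """
    n = len(context)
    total = len(file_lines)
    for dist in range(total + 1):
        for offset in ((0,) if dist == 0 else (dist, -dist)):
            pos = start - 1 + offset      # 0-based index of the candidate
            if pos < 0 or pos + n > total:
                continue
            if file_lines[pos:pos + n] == context:
                return offset
    return None

# Made-up file of 90 distinct lines; the block really sits at lines 79-84.
file_lines = ["line %d" % i for i in range(1, 91)]
context = file_lines[78:84]
shift = find_shift(file_lines, context, 71)
print(shift)
```

Searching outward from the expected position means the nearest match is reported first, which matters when the same block of lines occurs more than once in a file.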
For a patch whose additions have been applied:
PATCH: "arm/0005-ARM-DTS-AM33XX-Add-PMU-support.patch"
DIFF: "diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi"
HUNK: "@@ -48,6 +48,11 @@"
WARN: "merge" line not found at 48: " };"
WARN: "merge" line not found at 49: " };"
WARN: "merge" line not found at 50: ""
INFO: "add" line not found at next line: " pmu {"
INFO: "add" line not found at next line: " compatible = "arm,cortex-a8-pmu";"
INFO: "add" line not found at next line: " interrupts = <3>;"
INFO: "add" line not found at next line: " };"
INFO: "add" line not found at next line: ""
WARN: "merge" line not found at 51: " /*"
WARN: "merge" line not found at 52: " * The soc node represents the soc top level view. It is uses for IPs"
WARN: "merge" line not found at 53: " * that are not memory mapped in the MPU view or for the MPU itself."
FIND: "add" line found at 69: " pmu {"
FIND: "add" line found at 70: " compatible = "arm,cortex-a8-pmu";"
FIND: "add" line found at 71: " interrupts = <3>;"
FIND: "merge" line found at 76: " * that are not memory mapped in the MPU view or for the MPU itself."
INFO: 1 patch errors
All of the "add" lines were found in the source file, although not at the locations where the patch expected
to insert them. Thus the patch is not needed.
For a patch whose deletions have been applied:
HUNK: "@@ -127,7 +138,7 @@ static int tiadc_read_raw(struct iio_dev *indio_dev,"
WARN: "merge" line not found at 127: " if (i == chan->channel)"
WARN: "merge" line not found at 128: " *val = readx1 & 0xfff;"
WARN: "merge" line not found at 129: " }"
ERROR: "delete" line not found at 130: " am335x_tsc_se_update(adc_dev->mfd_tscadc);"
WARN: "merge" line not found at 131: ""
WARN: "merge" line not found at 132: " return IIO_VAL_INT;"
WARN: "merge" line not found at 133: "}"
Note that, in the PMU example above, the Checker's find logic only looks for significant lines, so it did
not search for '};', etc. The logic also did not find the line starting with '* The soc node', which is in
the file, but with a slightly different spelling.