libleipzig-python provides a wrapper for the web services of the Deutscher Wortschatz project at the University of Leipzig. Deutscher Wortschatz is a German database of text corpora that can be used to analyze and contextualize words in its thesaurus. libleipzig currently supports all public service calls. These do not require authentication and are free of charge for private or scientific purposes, although you can supply Level-2 credentials to raise your rate limit.
Attention!
libleipzig prefetches all service interfaces on initial load. This process requires an Internet connection.
Subsequent imports use indefinitely cached definitions (WSDL files).
>>> from libleipzig import * # might take some time initially
>>> r = Baseform(u"Schlangen")
>>> r # doctest: +NORMALIZE_WHITESPACE
[(Grundform: u'Schlange', Wortart: u'N'),
 (Grundform: u'Schlangen', Wortart: u'S')]
>>> r[0].Grundform
u'Schlange'
>>> help(Baseform) # doctest: +NORMALIZE_WHITESPACE
Help on function Baseform in module libleipzig.protocol:
Baseform(*vectors, **options)
    Baseform(Wort) -> Grundform, Wortart
    Return the lemmatized (base) form.
>>>
Every service call takes exactly its request parameters (as defined in the list of webservices) as positional or keyword arguments and accepts a number of generic options, including corpus and auth.
In case of a remote error, every service raises a suds.WebFault (which can be imported directly from libleipzig).
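A minimal sketch of how such a fault might be handled. The helper name call_or_default and its parameters are hypothetical and not part of libleipzig. It takes the service and the exception class as arguments, so it makes no assumptions about the library beyond what is stated above.

```python
def call_or_default(service, args=(), kwargs=None,
                    fault_type=Exception, default=None):
    """Call service(*args, **kwargs); return `default` if it raises `fault_type`.

    Hypothetical helper, not part of libleipzig.
    """
    try:
        return service(*args, **(kwargs or {}))
    except fault_type:
        # Only the given fault type is swallowed; anything else propagates.
        return default

# Possible usage with libleipzig (requires network access):
#   from libleipzig import Baseform, WebFault
#   forms = call_or_default(Baseform, (u"Schlangen",),
#                           fault_type=WebFault, default=[])
```

Passing WebFault explicitly keeps unrelated errors (for example a programming mistake in the arguments) visible instead of hiding them behind the default value.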
The project collects corpora in a variety of languages; German (de) is the largest one and thus the default. According to the reference implementation the following corpora are available (those marked with asterisks actually worked as of the time of writing):
Note that these collections are not as comprehensive as the German corpus and thus might only provide selected services. Most often these are the simple text processing calls such as RightNeighbours. You can use these corpora in libleipzig by supplying the corpus parameter to any of the service calls:
>>> import libleipzig
>>> libleipzig.Cooccurrences("programming", 0, 1, corpus="en")
[(Wort: u'programming', Kookkurrenz: u'language', Signifikanz: u'4152')]
You can increase your rate limit or gain access to private services by supplying authentication credentials to a service call:
Baseform("programming", auth=("username", "password"))
Public service calls can be accessed with the combination anonymous/anonymous, which is also the default. If you wish to persist your credentials across several calls (to the same service) you can save them in the service:
Baseform.set_credentials("username", "password")
Baseform("programming")
You should only use the former syntax if you care about thread-safety or do not want to expose your credentials through the service's transport metadata for all of the program's runtime.
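If repeating the auth argument on every call becomes tedious, one possible compromise is to bind it with functools.partial. This is only a sketch: with_credentials is a hypothetical helper, not part of libleipzig, and it assumes nothing beyond the auth keyword shown above. The credentials then live in the returned callable rather than in the shared service object.

```python
import functools

def with_credentials(service, username, password):
    """Return a callable that passes auth=(username, password) on every call.

    Hypothetical helper, not part of libleipzig. The original service
    object is left unmodified.
    """
    return functools.partial(service, auth=(username, password))

# Possible usage with libleipzig (requires network access):
#   from libleipzig import Baseform
#   my_baseform = with_credentials(Baseform, "username", "password")
#   my_baseform("programming")
```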
For unauthenticated service calls the server might raise errors such as the following:
suds.WebFault: Server raised fault: 'java.lang.Exception: Communication link failure, message from server: "Server shutdown in progress"'
This is the API's way to impose rate limits on anonymous users. See Authentication for a way to avoid this issue.
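For anonymous use, a client could also wait and retry when such a fault occurs. The following is a sketch under assumptions: call_with_retry and its parameters are hypothetical, the exponential delay is an arbitrary choice, and whether retrying actually helps depends on how the server enforces its limits.

```python
import time

def call_with_retry(service, args=(), kwargs=None, fault_type=Exception,
                    attempts=3, delay=1.0, sleep=time.sleep):
    """Call service(*args, **kwargs), retrying on `fault_type`.

    Hypothetical helper, not part of libleipzig. Waits delay, 2*delay,
    4*delay, ... seconds between attempts and re-raises the last fault.
    """
    kwargs = kwargs or {}
    for attempt in range(attempts):
        try:
            return service(*args, **kwargs)
        except fault_type:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the fault
            sleep(delay * (2 ** attempt))

# Possible usage with libleipzig (requires network access):
#   from libleipzig import Baseform, WebFault
#   call_with_retry(Baseform, (u"Schlangen",), fault_type=WebFault)
```

The sleep parameter exists only so the waiting can be replaced in tests; in normal use the default time.sleep applies.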