--- title: Utility Functions keywords: fastai sidebar: home_sidebar nb_path: "nbs/03_util.ipynb" ---
{% raw %}
{% endraw %}

SHA-1 Digest

Calculating the base 32 encoded SHA-1 digest that is commonly used in WARC files and CDX indexes.

{% raw %}

sha1_digest[source]

sha1_digest(content:bytes)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
sha1_digest(b'12345')
'RSZCG7IGPHFIRW3EMTVMMDNJMNCVCOLE'
{% endraw %}

Making URLs Pretty

Sometimes I want to return something that looks like a URL in Jupyter, but works in other environments. Adapted from here.

{% raw %}

class URL[source]

URL(url:str)

Wrapper around a URL string to provide nice display in IPython environments.

{% endraw %} {% raw %}
{% endraw %}

It displays nicely

{% raw %}
url = URL('https://commoncrawl.org/')
url
{% endraw %}

The repr is usable

{% raw %}
repr(url)
"URL(url='https://commoncrawl.org/')"
{% endraw %}

The string form is what we need

{% raw %}
str(url)
'https://commoncrawl.org/'
{% endraw %}

Or we can extract it

{% raw %}
url.url
'https://commoncrawl.org/'
{% endraw %}

Session Helpers

Make a session that can run multiple concurrent requests and retry for intermittent failures.

{% raw %}

make_session[source]

make_session(pool_maxsize)

{% endraw %} {% raw %}
{% endraw %}

Joblib Helpers

Forcing a function with joblib.Memory

{% raw %}
def _forced(f, force):
    """Forced version of memoized function with Memory"""
    assert hasattr(f, 'call')
    if not force:
        return f
    def result(*args, **kwargs):
        # Force returns a tuple of result,metadata
        return f.call(*args, **kwargs)[0]
    return result
{% endraw %}