import numpy as np
import peakutils as pu
def get_F_0( signal, rate, time_step = 0.0, min_pitch = 75, max_pitch = 600,
max_num_cands = 15, silence_thres = .03, voicing_thres = .45,
octave_cost = .01, octave_jump_cost = .35,
voiced_unvoiced_cost = .14, accurate = False, pulse = False ):
"""
Computes median Fundamental Frequency ( :math:`F_0` ).
The fundamental frequency ( :math:`F_0` ) of a signal is the lowest
frequency, corresponding to the longest wavelength, of a periodic
waveform. In the context
of this algorithm, :math:`F_0` is calculated by segmenting a signal into
frames, then for each frame the most likely candidate is chosen from the
lowest possible frequencies to be :math:`F_0`. From all of these values,
the median value is returned. More specifically, the algorithm filters out
frequencies higher than the Nyquist Frequency from the signal, then
segments the signal into frames of at least 3 periods of the minimum
pitch. For each frame, it then calculates the normalized autocorrelation
( :math:`r_a` ), or the correlation of the signal to a delayed copy of
itself. :math:`r_a` is calculated according to Boersma's paper
( referenced below ), which is an improvement of previous methods.
:math:`r_a` is estimated by dividing the autocorrelation of the windowed
signal by the autocorrelation of the window. After :math:`r_a` is
calculated the maxima values of :math:`r_a` are found. These points
correspond to the lag domain, or points in the delayed signal, where the
correlation value has peaked. The higher peaks indicate a stronger
correlation. These points in the lag domain suggest places of wave
repetition and are the candidates for :math:`F_0`. The best candidate for
:math:`F_0` in each frame is picked by a cost function, which compares the
cost of transitioning from the best :math:`F_0` of the previous frame to
every possible :math:`F_0` of the current frame. Once the path of
:math:`F_0` values of least cost has been determined, the median
:math:`F_0` of all voiced frames is returned.
This algorithm is adapted from:
http://www.fon.hum.uva.nl/david/ba_shs/2010/Boersma_Proceedings_1993.pdf
and from:
https://github.com/praat/praat/blob/master/fon/Sound_to_Pitch.cpp
.. note::
It has been shown that depressed and suicidal men speak with a reduced
fundamental frequency range, ( described in:
http://ameriquests.org/index.php/vurj/article/download/2783/1181 ) and
patients responding well to depression treatment show an increase in
their fundamental frequency variability ( described in :
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3022333/ ). Because
acoustical properties of speech are the earliest and most consistent
indicators of mood disorders, early detection of fundamental frequency
changes could significantly improve recovery time for disorders with
psychomotor symptoms.
Args:
signal ( numpy.ndarray ): This is the signal :math:`F_0` will be calculated from.
rate ( int ): This is the number of samples taken per second.
time_step ( float ): ( optional, default value: 0.0 ) The measurement, in seconds, of time passing between each frame. The smaller the time_step, the more overlap that will occur. If 0 is supplied the degree of oversampling will be equal to four.
min_pitch ( float ): ( optional, default value: 75 ) This is the minimum value to be returned as pitch, which cannot be less than or equal to zero.
max_pitch ( float ): ( optional, default value: 600 ) This is the maximum value to be returned as pitch, which cannot be greater than the Nyquist Frequency.
max_num_cands ( int ): ( optional, default value: 15 ) This is the maximum number of candidates to be considered for each frame, the unvoiced candidate ( i.e. :math:`F_0` equal to zero ) is always considered.
silence_thres ( float ): ( optional, default value: 0.03 ) Frames that do not contain amplitudes above this threshold ( relative to the global maximum amplitude ), are probably silent.
voicing_thres ( float ): ( optional, default value: 0.45 ) This is the strength of the unvoiced candidate, relative to the maximum possible :math:`r_a`. To increase the number of unvoiced decisions, increase this value.
octave_cost ( float ): ( optional, default value: 0.01 per octave ) This is the degree of favouring of high-frequency candidates, relative to the maximum possible :math:`r_a`. This is necessary because in the case of a perfectly periodic signal, all undertones of :math:`F_0` are equally strong candidates as :math:`F_0` itself. To more strongly favour recruitment of high-frequency candidates, increase this value.
octave_jump_cost ( float ): ( optional, default value: 0.35 ) This is the degree of disfavouring of pitch changes, relative to the maximum possible :math:`r_a`. To decrease the number of large frequency jumps, increase this value.
voiced_unvoiced_cost ( float ): ( optional, default value: 0.14 ) This is the degree of disfavouring of voiced/unvoiced transitions, relative to the maximum possible :math:`r_a`. To decrease the number of voiced/unvoiced transitions, increase this value.
accurate ( bool ): ( optional, default value: False ) If False, the window is a Hanning window with a length of :math:`\\frac{ 3.0} {min\_pitch}`. If True, the window is a Gaussian window with a length of :math:`\\frac{6.0}{min\_pitch}`, i.e. twice the length.
pulse ( bool ): ( optional, default value: False ) If False, the function returns a list containing only the median :math:`F_0`. If True, the function returns a list with all values necessary to calculate pulses. This list contains the median :math:`F_0`, a list of the frequencies for each frame, a list of tuples containing the beginning and ending time of each frame, and the signal filtered by the Nyquist Frequency. The indices in the second and third lists correspond to each other.
Returns:
list: Index 0 contains the median :math:`F_0` in hz. If pulse is set
equal to True, indices 1, 2, and 3 will contain: a list of all voiced
periods in order, a list of tuples of the beginning and ending time
of a voiced interval, with each index in the list corresponding to the
previous list, and a numpy.ndarray of the signal filtered by the
Nyquist Frequency. If pulse is set equal to False, or left to the
default value, then the list will only contain the median :math:`F_0`.
Raises:
ValueError: min_pitch has to be greater than zero.
ValueError: octave_cost isn't in [ 0, 1 ].
ValueError: silence_thres isn't in [ 0, 1 ].
ValueError: voicing_thres isn't in [ 0, 1 ].
ValueError: max_pitch can't be larger than Nyquist Frequency.
Example:
The example below demonstrates what different outputs this function
gives, using a synthesized signal.
>>> import numpy as np
>>> from matplotlib import pyplot as plt
>>> domain = np.linspace( 0, 6, 300000 )
>>> rate = 50000
>>> y = lambda x: np.sin( 2 * np.pi * 140 * x )
>>> signal = y( domain )
>>> get_F_0( signal, rate )
[ 139.70588235294116 ]
>>> get_F_0( signal, rate, voicing_thres = .99, accurate = True )
[ 139.70588235294116 ]
>>> w, x, y, z = get_F_0( signal, rate, pulse = True )
>>> print( w )
139.70588235294116
>>> print( x[ :5 ] )
[ 0.00715789 0.00715789 0.00715789 0.00715789 0.00715789 ]
>>> print( y[ :5 ] )
[ ( 0.002500008333361111, 0.037500125000416669 ),
( 0.012500041666805555, 0.047500158333861113 ),
( 0.022500075000249999, 0.057500191667305557 ),
( 0.032500108333694447, 0.067500225000749994 ),
( 0.042500141667138891, 0.077500258334194452 ) ]
>>> print( z[ : 5 ] )
[ 0. 0.01759207 0.0351787 0.05275443 0.07031384 ]
The example below demonstrates the algorithm's ability to adjust for
signals with dynamic frequencies, by comparing a plot of a synthesized
signal with an increasing frequency, and the calculated frequencies for
that signal.
>>> domain = np.linspace( 1, 2, 10000 )
>>> rate = 10000
>>> y = lambda x : np.sin( x ** 8 )
>>> signal = y( domain )
>>> median_F_0, periods, time_vals, modified_sig = get_F_0( signal,
rate, pulse = True )
>>> plt.subplot( 211 )
>>> plt.plot( domain, signal )
>>> plt.title( "Synthesized Signal" )
>>> plt.ylabel( "Amplitude" )
>>> plt.subplot( 212 )
>>> plt.plot( np.linspace( 1, 2, len( periods ) ), 1.0 / np.array(
periods ) )
>>> plt.title( "Frequencies of Signal" )
>>> plt.xlabel( "Samples" )
>>> plt.ylabel( "Frequency" )
>>> plt.suptitle( "Comparison of Synthesized Signal and its Calculated Frequencies" )
>>> plt.show()
.. figure:: figures/F_0_synthesized_sig.png
:align: center
"""
if min_pitch <= 0:
raise ValueError( "min_pitch has to be greater than zero." )
if max_num_cands < max_pitch / min_pitch:
max_num_cands = int( max_pitch / min_pitch )
initial_len = len( signal )
total_time = initial_len / float( rate )
tot_time_arr = np.linspace( 0, total_time, initial_len )
max_place_poss = 1.0 / min_pitch
min_place_poss = 1.0 / max_pitch
#to silence formants
min_place_poss2 = 0.5 / max_pitch
if accurate: pds_per_window = 6.0
else: pds_per_window = 3.0
#degree of oversampling is 4
if time_step <= 0: time_step = ( pds_per_window / 4.0 ) / min_pitch
w_len = pds_per_window / min_pitch
#correcting for time_step
octave_jump_cost *= .01 / time_step
voiced_unvoiced_cost *= .01 / time_step
Nyquist_Frequency = rate / 2.0
upper_bound = .95 * Nyquist_Frequency
zeros_pad = 2 ** ( int( np.log2( initial_len ) ) + 1 ) - initial_len
signal = np.hstack( ( signal, np.zeros( zeros_pad ) ) )
fft_signal = np.fft.fft( signal )
fft_signal[ int( upper_bound ) : -int( upper_bound ) ] = 0
sig = np.fft.ifft( fft_signal )
sig = sig[ :initial_len ].real
#checking to make sure values are valid
if Nyquist_Frequency < max_pitch:
raise ValueError( "max_pitch can't be larger than Nyquist Frequency." )
if octave_cost < 0 or octave_cost > 1:
raise ValueError( "octave_cost isn't in [ 0, 1 ]" )
if voicing_thres< 0 or voicing_thres > 1:
raise ValueError( "voicing_thres isn't in [ 0, 1 ]" )
if silence_thres < 0 or silence_thres > 1:
raise ValueError( "silence_thres isn't in [ 0, 1 ]" )
#finding number of samples per frame and time_step
frame_len = int( w_len * rate + .5 )
time_len = int( time_step * rate + .5 )
#initializing list of candidates for F_0, and their strengths
best_cands, strengths, time_vals = [], [], []
#finding the global peak the way Praat does
global_peak = max( abs( sig - sig.mean() ) )
e = np.e
inf = np.inf
log = np.log2
start_i = 0
while start_i < len( sig ) - frame_len :
end_i = start_i + frame_len
segment = sig[ start_i : end_i ]
if accurate:
t = np.linspace( 0, w_len, len( segment ) )
numerator = e ** ( -12.0 * ( t / w_len - .5 ) ** 2.0 ) - e ** -12.0
denominator = 1.0 - e ** -12.0
window = numerator / denominator
interpolation_depth = 0.25
else:
window = np.hanning( len( segment ) )
interpolation_depth = 0.50
#shave off ends of time intervals to account for overlapping
start_time = tot_time_arr[ start_i + int( time_len / 4.0 ) ]
stop_time = tot_time_arr[ end_i - int( time_len / 4.0 ) ]
time_vals.append( ( start_time, stop_time ) )
start_i += time_len
long_pd_i = int( rate / min_pitch )
half_pd_i = int( long_pd_i / 2.0 + 1 )
long_pd_cushion = segment[ half_pd_i : - half_pd_i ]
#finding local peak and local mean the way Praat does
#the local mean is found by looking one longest period to either side of
#the center of the frame, and using only the values within this interval
#to calculate the local mean; similarly, the local peak is found by
#looking half of the longest period to either side of the center of the
#frame ( after the frame has been windowed ) and choosing the absolute
#maximum in this interval
local_mean = long_pd_cushion.mean()
segment = segment - local_mean
segment *= window
half_pd_cushion = segment[ long_pd_i : -long_pd_i ]
local_peak = max( abs( half_pd_cushion ) )
if local_peak == 0:
#shortcut -> complete silence and only candidate is silent candidate
best_cands.append( [ inf ] )
strengths.append( [ voicing_thres + 2 ] )
else:
#calculating autocorrelation, based off steps 3.2-3.10
intensity = local_peak / float( global_peak )
N = len( segment )
nFFT = 2 ** int( log( ( 1.0 + interpolation_depth ) * N ) + 1 )
window = np.hstack( ( window, np.zeros( nFFT - N ) ) )
segment = np.hstack( ( segment, np.zeros( nFFT - N ) ) )
x_fft = np.fft.fft( segment )
r_a = np.real( np.fft.fft( x_fft * np.conjugate( x_fft ) ) )
r_a = r_a[ : int( N / pds_per_window ) ]
x_fft = np.fft.fft( window )
r_w = np.real( np.fft.fft( x_fft * np.conjugate( x_fft ) ) )
r_w = r_w[ : int( N / pds_per_window ) ]
r_x = r_a / r_w
r_x /= r_x[ 0 ]
#creating an array of the points in time corresponding to sampled
#autocorrelation of the signal ( r_x )
time_array = np.linspace( 0 , w_len / pds_per_window, len( r_x ) )
peaks = pu.indexes( r_x , thres = 0 )
max_values, max_places = r_x[ peaks ], time_array[ peaks ]
#only consider places that are voiced over a certain threshold
max_places = max_places[ max_values > 0.5 * voicing_thres ]
max_values = max_values[ max_values > 0.5 * voicing_thres ]
for i in range( len( max_values ) ):
#reflecting values > 1 through 1.
if max_values[ i ] > 1.0 :
max_values[ i ] = 1.0 / max_values[ i ]
#calculating the relative strength value
rel_val = [ val - octave_cost * log( place * min_pitch ) for
val, place in zip( max_values, max_places ) ]
if len( max_values ) > 0 :
#finding the max_num_cands-1 maximizers, and maximums, then
#calculating their strengths ( eq. 23 and 24 ) and accounting for
#silent candidate
max_places = [ max_places[ i ] for i in np.argsort( rel_val )[
-max_num_cands + 1 : ] ]
max_values = [ max_values[ i ] for i in np.argsort( rel_val )[
-max_num_cands + 1 : ] ]
max_places = np.array( max_places )
max_values = np.array( max_values )
rel_val = list(np.sort( rel_val )[ -max_num_cands + 1 : ] )
#adding the silent candidate's strength to strengths
rel_val.append( voicing_thres + max( 0, 2 - ( intensity /
( silence_thres / ( 1 + voicing_thres ) ) ) ) )
#inf is our silent candidate
max_places = np.hstack( ( max_places, inf ) )
best_cands.append( list( max_places ) )
strengths.append( rel_val )
else:
#if there are no available maximums, only account for silent
#candidate
best_cands.append( [ inf ] )
strengths.append( [ voicing_thres + max( 0, 2 - intensity /
( silence_thres / ( 1 + voicing_thres ) ) ) ] )
#Calculate the least-cost path through the list of candidates ( forwards ).
best_total_cost, best_total_path = -inf, []
#for each initial candidate find the path of least cost, then of those
#paths, choose the one with the least cost.
for cand in range( len( best_cands[ 0 ] ) ):
start_val = best_cands[ 0 ][ cand ]
total_path = [ start_val ]
level = 1
prev_delta = strengths[ 0 ][ cand ]
maximum = -inf
while level < len( best_cands ) :
prev_val = total_path[ -1 ]
best_val = inf
for j in range( len( best_cands[ level ] ) ):
cur_val = best_cands[ level ][ j ]
cur_delta = strengths[ level ][ j ]
cost = 0
cur_unvoiced = cur_val == inf or cur_val < min_place_poss2
prev_unvoiced = prev_val == inf or prev_val < min_place_poss2
if cur_unvoiced:
#both voiceless
if prev_unvoiced:
cost = 0
#voiced-to-unvoiced transition
else:
cost = voiced_unvoiced_cost
else:
#unvoiced-to-voiced transition
if prev_unvoiced:
cost = voiced_unvoiced_cost
#both are voiced
else:
cost = octave_jump_cost * abs( log( cur_val /
prev_val ) )
#The score of a candidate is the accumulated score so far, minus the
#transition cost, plus the strength of the candidate
value = prev_delta - cost + cur_delta
if value > maximum: maximum, best_val = value, cur_val
prev_delta = maximum
total_path.append( best_val )
level += 1
if maximum > best_total_cost:
best_total_cost, best_total_path = maximum, total_path
f_0_forth = np.array( best_total_path )
#Calculate the least-cost path through the list of candidates ( backwards ).
#Going through the path backwards introduces frequencies previously marked
#as unvoiced, or increases undertones, to decrease frequency jumps.
best_total_cost, best_total_path2 = -inf, []
#Starting at the end, for each initial candidate find the path of least
#cost, then of those paths, choose the one with the least cost.
for cand in range( len( best_cands[ -1 ] ) ):
start_val = best_cands[ -1 ][ cand ]
total_path = [ start_val ]
level = len( best_cands ) - 2
prev_delta = strengths[ -1 ][ cand ]
maximum = -inf
while level > -1 :
prev_val = total_path[ -1 ]
best_val = inf
for j in range( len( best_cands[ level ] ) ):
cur_val = best_cands[ level ][ j ]
cur_delta = strengths[ level ][ j ]
cost = 0
cur_unvoiced = cur_val == inf or cur_val < min_place_poss2
prev_unvoiced = prev_val == inf or prev_val < min_place_poss2
if cur_unvoiced:
#both voiceless
if prev_unvoiced:
cost = 0
#voiced-to-unvoiced transition
else:
cost = voiced_unvoiced_cost
else:
#unvoiced-to-voiced transition
if prev_unvoiced:
cost = voiced_unvoiced_cost
#both are voiced
else:
cost = octave_jump_cost * abs( log( cur_val /
prev_val ) )
#The score of a candidate is the accumulated score so far, minus the
#transition cost, plus the strength of the candidate
value = prev_delta - cost + cur_delta
if value > maximum: maximum, best_val = value, cur_val
prev_delta = maximum
total_path.append( best_val )
level -= 1
if maximum > best_total_cost:
best_total_cost, best_total_path2 = maximum, total_path
f_0_back = np.array( best_total_path2 )
#reversing f_0_back so the initial value corresponds to the first frame
f_0_back = f_0_back[ -1 : : -1 ]
#the candidates are periods, so taking the minimum of the two paths picks
#the higher frequency at each frame
f_0 = np.array( [ min( i, j ) for i, j in zip( f_0_forth, f_0_back ) ] )
if pulse:
#removing all unvoiced time intervals from list
removed = 0
for i in range( len( f_0 ) ):
if f_0[ i ] > max_place_poss or f_0[ i ] < min_place_poss:
time_vals.remove( time_vals[ i - removed ] )
removed += 1
for i in range( len( f_0 ) ):
#if the frame is voiceless, assign the occurrence of the peak to inf ->
#when inverted this gives a frequency of 0, corresponding to an unvoiced
#frame
if f_0[ i ] > max_place_poss or f_0[ i ] < min_place_poss :
f_0[ i ] = inf
f_0 = f_0[ f_0 < inf ]
if pulse:
return [ np.median( 1.0 / f_0 ), list( f_0 ), time_vals, signal ]
if len( f_0 ) == 0:
return [ 0 ]
else:
return [ np.median( 1.0 / f_0 ) ]
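The per-frame core of get_F_0 above is Boersma's normalized autocorrelation: divide the autocorrelation of the windowed frame by the autocorrelation of the window, then read candidate periods off the peak lags. The stand-alone sketch below is a hypothetical helper, not part of this module; the 600 Hz ceiling and the one-third-of-frame lag cutoff are illustrative assumptions mirroring the defaults and the N / pds_per_window truncation used above. It reduces the step to a single frame and returns the lag of the strongest peak as a period estimate.

```python
import numpy as np

def estimate_period( frame, rate ):
    # illustrative sketch of Boersma's normalized autocorrelation:
    # r_x = autocorrelation( windowed frame ) / autocorrelation( window )
    frame = frame - frame.mean()
    window = np.hanning( len( frame ) )
    n = len( frame )
    # zero-pad so the circular FFT autocorrelation is valid at our lags
    nfft = 2 ** int( np.log2( 1.5 * n ) + 1 )

    def autocorr( x ):
        X = np.fft.rfft( x, nfft )
        return np.fft.irfft( X * np.conjugate( X ) )[ :n ]

    r_x = autocorr( frame * window ) / autocorr( window )
    r_x /= r_x[ 0 ]
    # search lags between 1 / max_pitch ( 600 Hz assumed ) and a third of
    # the frame, mirroring the N / pds_per_window truncation above
    lags = np.arange( n )
    valid = ( lags > rate // 600 ) & ( lags < n // 3 )
    best_lag = lags[ valid ][ np.argmax( r_x[ valid ] ) ]
    return best_lag / float( rate )

rate = 10000
t = np.arange( 0, 0.04, 1.0 / rate )                  # one 400-sample frame
period = estimate_period( np.sin( 2 * np.pi * 100 * t ), rate )
```

For a clean 100 Hz sine the strongest peak lands near a lag of 0.01 s; get_F_0 goes further by scoring several such peaks per frame and choosing among frames with the transition-cost path search.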
def get_HNR( signal, rate, time_step = 0, min_pitch = 75,
silence_threshold = .1, periods_per_window = 4.5 ):
"""
Computes mean Harmonics-to-Noise ratio ( HNR ).
The Harmonics-to-Noise ratio ( HNR ) is the ratio
of the energy of a periodic signal, to the energy of the noise in the
signal, expressed in dB. This value is often used as a measure of
hoarseness in a person's voice. By way of illustration, if 99% of the
energy of the signal is in the periodic part and 1% of the energy is in
noise, then the HNR is :math:`10 \cdot log_{10}( \\frac{99}{1} ) = 20`.
An HNR of 0 dB means there is equal energy in harmonics and in noise. The
first step for HNR determination of a signal, in the context of this
algorithm, is to set the maximum frequency allowable to the signal's
Nyquist Frequency. Then the signal is segmented into frames of length
:math:`\\frac{periods\_per\_window}{min\_pitch}`. Then for each frame, it
calculates the normalized autocorrelation ( :math:`r_a` ), or the
correlation of the signal to a delayed copy of itself. :math:`r_a` is
calculated according to Boersma's paper ( referenced below ). The highest
peak is picked from :math:`r_a`. If the height of this peak is larger than
the strength of the silent candidate, then the HNR for this frame is
calculated from that peak. The height of the peak corresponds to the energy
of the periodic part of the signal. Once the HNR value has been calculated
for all voiced frames, the mean is taken from these values and returned.
This algorithm is adapted from:
http://www.fon.hum.uva.nl/david/ba_shs/2010/Boersma_Proceedings_1993.pdf
and from:
https://github.com/praat/praat/blob/master/fon/Sound_to_Harmonicity.cpp
.. note::
The Harmonics-to-Noise ratio of a person's voice is strongly negatively
correlated to depression severity ( described in:
https://ll.mit.edu/mission/cybersec/publications/publication-files/full_papers/2012_09_09_MalyskaN_Interspeech_FP.pdf )
and can be used as an early indicator of depression, and suicide risk.
After this indicator has been realized, preventative medicine can be
implemented, improving recovery time or even preventing further
symptoms.
Args:
signal ( numpy.ndarray ): This is the signal the HNR will be calculated from.
rate ( int ): This is the number of samples taken per second.
time_step ( float ): ( optional, default value: 0.0 ) This is the measurement, in seconds, of time passing between each frame. The smaller the time_step, the more overlap that will occur. If 0 is supplied, the degree of oversampling will be equal to four.
min_pitch ( float ): ( optional, default value: 75 ) This is the minimum value to be returned as pitch, which cannot be less than or equal to zero.
silence_threshold ( float ): ( optional, default value: 0.1 ) Frames that do not contain amplitudes above this threshold ( relative to the global maximum amplitude ), are considered silent.
periods_per_window ( float ): ( optional, default value: 4.5 ) 4.5 is best for speech. The more periods contained per frame, the more the algorithm becomes sensitive to dynamic changes in the signal.
Returns:
float: The mean HNR of the signal expressed in dB.
Raises:
ValueError: min_pitch has to be greater than zero.
ValueError: silence_threshold isn't in [ 0, 1 ].
Example:
The example below adjusts parameters of the function, using the same
synthesized signal with added noise, to demonstrate the stability of
the function.
>>> import numpy as np
>>> from matplotlib import pyplot as plt
>>> domain = np.linspace( 0, 6, 300000 )
>>> rate = 50000
>>> y = lambda x:( 1 + .3 * np.sin( 2 * np.pi * 140 * x ) ) * np.sin(
2 * np.pi * 140 * x )
>>> signal = y( domain ) + .2 * np.random.random( 300000 )
>>> get_HNR( signal, rate )
21.885338007330802
>>> get_HNR( signal, rate, periods_per_window = 6 )
21.866307805597849
>>> get_HNR( signal, rate, time_step = .04, periods_per_window = 6 )
21.878451649148804
We'd expect an increase in noise to reduce the HNR, and similar energies
in noise and harmonics to produce an HNR that approaches zero. This is
demonstrated below.
>>> signals = [ y( domain ) + i / 10.0 * np.random.random( 300000 ) for
i in range( 1, 11 ) ]
>>> HNRx10 = [ get_HNR( sig, rate ) for sig in signals ]
>>> plt.plot( np.linspace( .1, 1, 10 ), HNRx10 )
>>> plt.xlabel( "Amount of Added Noise" )
>>> plt.ylabel( "HNR" )
>>> plt.title( "HNR Values of Signals with Added Noise" )
>>> plt.show()
.. figure:: figures/HNR_values_added_noise.png
:align: center
"""
#checking to make sure values are valid
if min_pitch <= 0:
raise ValueError( "min_pitch has to be greater than zero." )
if silence_threshold < 0 or silence_threshold > 1:
raise ValueError( "silence_threshold isn't in [ 0, 1 ]." )
#degree of overlap is four
if time_step <= 0: time_step = ( periods_per_window / 4.0 ) / min_pitch
Nyquist_Frequency = rate / 2.0
max_pitch = Nyquist_Frequency
global_peak = max( abs( signal - signal.mean() ) )
window_len = periods_per_window / float( min_pitch )
#finding number of samples per frame and time_step
frame_len = int( window_len * rate )
t_len = int( time_step * rate )
#segmenting signal, there has to be at least one frame
num_frames = max( 1, int( len( signal ) / t_len + .5 ) )
seg_signal = [ signal[ int( i * t_len ) : int( i * t_len ) + frame_len ]
for i in range( num_frames + 1 ) ]
#initializing list of candidates for HNR
best_cands = []
for index in range( len( seg_signal ) ):
segment = seg_signal[ index ]
#ignoring any potential empty segment
if len( segment) > 0:
window_len = len( segment ) / float( rate )
#calculating autocorrelation, based off steps 3.2-3.10
segment = segment - segment.mean()
local_peak = max( abs( segment ) )
if local_peak == 0 :
best_cands.append( .5 )
else:
intensity = local_peak / global_peak
window = np.hanning( len( segment ) )
segment *= window
N = len( segment )
nsampFFT = 2 ** int( np.log2( N ) + 1 )
window = np.hstack( ( window, np.zeros( nsampFFT - N ) ) )
segment = np.hstack( ( segment, np.zeros( nsampFFT - N ) ) )
x_fft = np.fft.fft( segment )
r_a = np.real( np.fft.fft( x_fft * np.conjugate( x_fft ) ) )
r_a = r_a[ : N ]
r_a = np.nan_to_num( r_a )
x_fft = np.fft.fft( window )
r_w = np.real( np.fft.fft( x_fft * np.conjugate( x_fft ) ) )
r_w = r_w[ : N ]
r_w = np.nan_to_num( r_w )
r_x = r_a / r_w
r_x /= r_x[ 0 ]
#creating an array of the points in time corresponding to the
#sampled autocorrelation of the signal ( r_x )
time_array = np.linspace( 0, window_len, len( r_x ) )
i = pu.indexes( r_x )
max_values, max_places = r_x[ i ], time_array[ i ]
max_place_poss = 1.0 / min_pitch
min_place_poss = 1.0 / max_pitch
max_values = max_values[ max_places >= min_place_poss ]
max_places = max_places[ max_places >= min_place_poss ]
max_values = max_values[ max_places <= max_place_poss ]
max_places = max_places[ max_places <= max_place_poss ]
for i in range( len( max_values ) ):
#reflecting values > 1 through 1.
if max_values[ i ] > 1.0 :
max_values[ i ] = 1.0 / max_values[ i ]
#eq. 23 and 24 with octave_cost, and voicing_threshold set to zero
if len( max_values ) > 0:
strengths = [ max( max_values ), max( 0, 2 - ( intensity /
( silence_threshold ) ) ) ]
#if the maximum strength is the unvoiced candidate, then .5
#corresponds to HNR of 0
if np.argmax( strengths ):
best_cands.append( 0.5 )
else:
best_cands.append( strengths[ 0 ] )
else:
best_cands.append( 0.5 )
best_cands = np.array( best_cands )
best_cands = best_cands[ best_cands > 0.5 ]
if len( best_cands ) == 0:
return 0
#eq. 4
best_cands = 10.0 * np.log10( best_cands / ( 1.0 - best_cands ) )
best_candidate = np.mean( best_cands )
return best_candidate
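The final conversion in get_HNR above is eq. 4: a normalized autocorrelation peak of height r, interpreted as the fraction of the frame's energy that is periodic, maps to :math:`10 \cdot log_{10}( \frac{r}{1 - r} )` dB. A minimal sketch, with a hypothetical helper name that is not part of this module:

```python
import numpy as np

def harmonicity_db( r ):
    # eq. 4: r is the normalized autocorrelation peak height, i.e. the
    # fraction of the frame's energy in the periodic part; 1 - r is the
    # fraction in the noise
    return 10.0 * np.log10( r / ( 1.0 - r ) )

twenty = harmonicity_db( 0.99 )   # 99% periodic energy, roughly 20 dB
zero = harmonicity_db( 0.5 )      # equal harmonic and noise energy, 0 dB
```

This is why get_HNR clamps candidates at 0.5 before the conversion: peak heights at or below 0.5 correspond to non-positive HNR and are treated as unvoiced or discarded.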
def get_Pulses( signal, rate, min_pitch = 75, max_pitch = 600,
include_max = False, include_min = True ):
"""
Computes glottal pulses of a signal.
This algorithm relies on the voiced/unvoiced decisions and fundamental
frequencies, calculated for each voiced frame by get_F_0. For every voiced
interval, a list of points is created by finding the initial point
:math:`t_1`, which is the absolute extremum ( or the maximum/minimum,
depending on the include_max and include_min parameters ) of the amplitude
of the sound in the interval
:math:`[\ t_{mid} - \\frac{T_0}{2},\ t_{mid} + \\frac{T_0}{2}\ ]`, where
:math:`t_{mid}` is the midpoint of the interval, and :math:`T_0` is the
period at :math:`t_{mid}`, as can be linearly interpolated from the periods
acquired from get_F_0. From this point, the algorithm searches for points
:math:`t_i` to the left until we reach the left edge of the interval. These
points are the absolute extrema ( or the maxima/minima ) in the interval
:math:`[\ t_{i-1} - 1.25 \cdot T_{i-1},\ t_{i-1} - 0.8 \cdot T_{i-1}\ ]`,
with :math:`t_{i-1}` being the last found point, and :math:`T_{i-1}` the
period at this point. The same is done to the right of :math:`t_1`. The
points are returned in consecutive order.
This algorithm is adapted from:
https://pdfs.semanticscholar.org/16d5/980ba1cf168d5782379692517250e80f0082.pdf
and from:
https://github.com/praat/praat/blob/master/fon/Sound_to_PointProcess.cpp
.. note::
This algorithm is a helper function for the jitter algorithm, that
returns a list of points in the time domain corresponding to minima or
maxima of the signal. These minima or maxima are the sequence of
glottal closures in vocal-fold vibration. The distance between
consecutive pulses is defined as the wavelength of the signal at this
interval, which can be used to later calculate jitter.
Args:
signal ( numpy.ndarray ): This is the signal the glottal pulses will be calculated from.
rate ( int ): This is the number of samples taken per second.
min_pitch ( float ): ( optional, default value: 75 ) This is the minimum value to be returned as pitch, which cannot be less than or equal to zero.
max_pitch ( float ): ( optional, default value: 600 ) This is the maximum value to be returned as pitch, which cannot be greater than the Nyquist Frequency.
include_max ( bool ): ( optional, default value: False ) This determines if maxima values will be used when calculating pulses.
include_min ( bool ): ( optional, default value: True ) This determines if minima values will be used when calculating pulses.
Returns:
numpy.ndarray: This is an array of points in a time series that
correspond to the signal's periodicity.
Raises:
ValueError: include_min and include_max can't both be False
Example:
Pulses are calculated for a synthesized signal, and the variation in
time between each pulse is shown.
>>> import numpy as np
>>> from matplotlib import pyplot as plt
>>> domain = np.linspace( 0, 6, 300000 )
>>> y = lambda x:( 1 + .3 * np.sin( 2 * np.pi * 140 * x ) ) * np.sin(
2 * np.pi * 140 * x )
>>> signal = y( domain ) + .2 * np.random.random( 300000 )
>>> rate = 50000
>>> p = get_Pulses( signal, rate )
>>> print( p[ :5 ] )
[ 0.00542001 0.01236002 0.01946004 0.02702005 0.03402006 ]
>>> print( np.diff( p[ :6 ] ) )
[ 0.00694001 0.00710001 0.00756001 0.00700001 0.00712001 ]
>>> p = get_Pulses( signal, rate, include_max = True )
>>> print( p[ :5 ] )
[ 0.00886002 0.01608003 0.02340004 0.03038006 0.03732007 ]
>>> print( np.diff( p[ :6 ] ) )
[ 0.00722001 0.00732001 0.00698001 0.00694001 0.00734001 ]
A synthesized signal with an increasing frequency, and the calculated
pulses of that signal, are plotted together to demonstrate the
algorithm's ability to adapt to dynamic pulses.
>>> domain = np.linspace( 1.85, 2.05, 10000 )
>>> rate = 50000
>>> y = lambda x : np.sin( x ** 8 )
>>> signal = np.hstack( ( np.zeros( 2500 ), y( domain[ 2500: -2500 ] ),
np.zeros( 2500 ) ) )
>>> pulses = get_Pulses( signal, rate )
>>> plt.plot( domain, signal, 'r', alpha = .5, label = "Signal" )
>>> plt.plot( ( 1.85 + pulses[ 0 ] ) * np.ones ( 5 ),
np.linspace( -1, 1, 5 ), 'b', alpha = .5, label = "Pulses" )
>>> plt.legend()
>>> for pulse in pulses[ 1: ]:
>>> plt.plot( ( 1.85 + pulse ) * np.ones ( 5 ),
np.linspace( -1, 1, 5 ), 'b', alpha = .5 )
>>> plt.xlabel( "Samples" )
>>> plt.ylabel( "Amplitude" )
>>> plt.title( "Signal with Pulses, Calculated from Minima of Signal" )
>>> plt.show()
.. figure:: figures/Pulses_sig.png
:align: center
"""
#first calculate F_0 estimates for each voiced interval
add = np.hstack
if not include_max and not include_min:
raise ValueError( "include_min and include_max can't both be False" )
median, period, intervals, signal = get_F_0( signal, rate,
min_pitch = min_pitch,
max_pitch = max_pitch,
pulse = True )
global_peak = max( abs( signal - signal.mean() ) )
#points will be a list of points where pulses occur, voiced_intervals will
#be a list of tuples consisting of voiced intervals with overlap
#eliminated
points, voiced_intervals = [], []
#f_times will be an array of times corresponding to our given frequencies,
#to be used for interpolating, v_time be an array consisting of all the
#points in time that are voiced
f_times, v_time = np.array( [] ), np.array( [] )
total_time = np.linspace( 0, len( signal ) / float( rate ), len( signal ) )
for interval in intervals:
start, stop = interval
#finding all midpoints for each interval
f_times = add( ( f_times, ( start + stop ) / 2.0 ) )
i = 0
while i < len( intervals ) - 1 :
start, stop = intervals[ i ]
i_start, prev_stop = intervals[ i ]
#while there is overlap, look to the next interval
while start <= prev_stop and i < len( intervals ) - 1 :
prev_start, prev_stop = intervals[ i ]
i += 1
start, stop = intervals[ i ]
if i == len( intervals ) - 1:
samp = int ( ( stop - i_start ) * rate )
v_time = add( ( v_time, np.linspace( i_start, stop, samp ) ) )
voiced_intervals.append( ( i_start, stop ) )
else:
samp = int ( ( prev_stop - i_start ) * rate )
v_time = add( ( v_time, np.linspace( i_start, prev_stop, samp ) ) )
voiced_intervals.append( ( i_start, prev_stop ) )
#interpolate the periods so that each voiced point has a corresponding
#period attached to it
periods_interp = np.interp( v_time, f_times, period )
for interval in voiced_intervals:
start, stop = interval
midpoint = ( start + stop ) / 2.0
#out of all the voiced points, look for index of the one that is
#closest to our calculated midpoint
midpoint_index = np.argmin( abs( v_time - midpoint ) )
midpoint = v_time[ midpoint_index ]
T_0 = periods_interp[ midpoint_index ]
frame_start = midpoint - T_0
frame_stop = midpoint + T_0
#finding points, start by looking to the left of the center of the
#voiced interval
while frame_start > start :
#out of all given time points in signal, find index of closest to
#start and stop
frame_start_index = np.argmin( abs( total_time - frame_start ) )
frame_stop_index = np.argmin( abs( total_time - frame_stop ) )
frame = signal[ frame_start_index : frame_stop_index ]
if include_max and include_min:
p_index = np.argmax( abs( frame ) ) + frame_start_index
elif include_max:
p_index = np.argmax( frame ) + frame_start_index
else:
p_index = np.argmin( frame ) + frame_start_index
if abs( signal[ p_index ] ) > .02333 * global_peak:
points.append( total_time[ p_index ] )
t = total_time[ p_index ]
t_index = np.argmin( abs( v_time - t ) )
T_0 = periods_interp[ t_index ]
frame_start = t - 1.25 * T_0
frame_stop = t - 0.80 * T_0
T_0 = periods_interp[ midpoint_index ]
frame_start = midpoint - T_0
frame_stop = midpoint + T_0
#finding points by now looking to the right of the center of the
#voiced interval
while frame_stop < stop :
#out of all given time points in signal, find index of closest to
#start and stop
frame_start_index = np.argmin( abs( total_time - frame_start ) )
frame_stop_index = np.argmin( abs( total_time - frame_stop ) )
frame = signal[ frame_start_index : frame_stop_index ]
if include_max and include_min:
p_index = np.argmax( abs( frame ) ) + frame_start_index
elif include_max:
p_index = np.argmax( frame ) + frame_start_index
else:
p_index = np.argmin( frame ) + frame_start_index
if abs( signal[ p_index ] ) > .02333 * global_peak:
points.append( total_time[ p_index ] )
t = total_time[ p_index ]
t_index = np.argmin( abs( v_time - t ) )
T_0 = periods_interp[ t_index ]
frame_start = t + 0.80 * T_0
frame_stop = t + 1.25 * T_0
#returning an ordered array of points with any duplicates removed
return np.array( sorted( list( set( points ) ) ) )
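The rightward frame-by-frame search above can be isolated for illustration. The sketch below is a simplification, not this module's implementation: `mark_pulses_right` is a hypothetical helper that uses a fixed period `T_0` (the real code re-interpolates the period at every step) and searches only to the right. It keeps the strongest sample in each frame, then re-centers the next frame 0.8 to 1.25 periods past the pulse just found.

```python
import numpy as np

def mark_pulses_right(signal, rate, start_t, stop_t, T_0, thresh=0.02333):
    """Toy rightward pulse search with a FIXED period T_0: keep the
    strongest sample in each frame, then re-center the next frame
    0.8-1.25 periods to the right of the pulse just found."""
    total_time = np.linspace(0, len(signal) / float(rate), len(signal))
    global_peak = np.max(np.abs(signal - signal.mean()))
    points = []
    frame_start, frame_stop = start_t, start_t + 2 * T_0
    while frame_stop < stop_t:
        #out of all time points, find indices closest to the frame edges
        i0 = np.argmin(np.abs(total_time - frame_start))
        i1 = np.argmin(np.abs(total_time - frame_stop))
        if i1 <= i0:
            break
        p_index = np.argmax(np.abs(signal[i0:i1])) + i0
        #ignore extrema that are tiny relative to the global peak
        if abs(signal[p_index]) > thresh * global_peak:
            points.append(total_time[p_index])
        t = total_time[p_index]
        frame_start, frame_stop = t + 0.80 * T_0, t + 1.25 * T_0
    return np.array(points)

rate = 10000
time = np.linspace(0, 1, rate)
tone = np.sin(2 * np.pi * 100 * time)   # 100 Hz tone, so T_0 = 0.01 s
pulses = mark_pulses_right(tone, rate, 0.0, 1.0, 0.01)
spacing = np.diff(pulses)               # clusters around one period, 0.01 s
```

On a clean tone the recovered pulse spacing matches the period to within one sample, which is the same property the real two-sided search relies on.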
def get_Jitter( signal, rate, period_floor = .0001, period_ceiling = .02,
max_period_factor = 1.3 ):
"""
Compute Jitter.
Jitter is the measurement of random perturbations in period length. For the
most accurate jitter measurements, calculations are typically performed on
long sustained vowels. This algorithm calculates 5 different types of jitter
for all voiced intervals. Each type of jitter describes a different
characteristic of the period perturbations. The 5 types of jitter
calculated are absolute jitter, relative jitter, relative average
perturbation ( rap ), the 5-point period perturbation quotient ( ppq5 ), and
the difference of differences of periods ( ddp ).\n
Absolute jitter is defined as the cycle-to-cycle variation of the
fundamental period, or in other words, the average absolute difference
between consecutive periods.
.. math::
\\frac{1}{N-1}\\sum_{i=1}^{N-1}|T_i-T_{i-1}|
Relative jitter is defined as the average absolute difference between
consecutive periods ( absolute jitter ), divided by the average period.
.. math::
\\frac{\\frac{1}{N-1}\\sum_{i=1}^{N-1}|T_i-T_{i-1}|}{\\frac{1}{N}\\sum_{i=1}^{N}T_i}
Relative average perturbation is defined as the average absolute difference
between a period and the average of it and its two neighbors divided by the
average period.
.. math::
\\frac{\\frac{1}{N-2}\\sum_{i=1}^{N-2}|T_i-(\\frac{1}{3}\\sum_{n=i-1}^{i+1}T_n)|}{\\frac{1}{N}\\sum_{i=1}^{N}T_i}
The 5-point period perturbation quotient is defined as the average absolute
difference between a period and the average of it and its 4 closest
neighbors, divided by the average period.
.. math::
\\frac{\\frac{1}{N-4}\\sum_{i=2}^{N-3}|T_i-(\\frac{1}{5}\\sum_{n=i-2}^{i+2}T_n)|}{\\frac{1}{N}\\sum_{i=1}^{N}T_i}
The difference of differences of periods is defined as the relative mean
absolute second-order difference of periods, which is equivalent to 3 times
rap.
.. math::
\\frac{\\frac{1}{N-2}\\sum_{i=1}^{N-2}|(T_{i+1}-T_i)-(T_i-T_{i-1})|}{\\frac{1}{N}\\sum_{i=1}^{N}T_i}
After each type of jitter has been calculated, the values are
returned in a dictionary.
.. warning::
This algorithm has 4.2% relative error when compared to Praat's values.
This algorithm is adapted from:
http://www.lsi.upc.edu/~nlp/papers/far_jit_07.pdf
and from:
http://ac.els-cdn.com/S2212017313002788/1-s2.0-S2212017313002788-main.pdf
and from:
https://github.com/praat/praat/blob/master/fon/VoiceAnalysis.cpp
.. note::
Significant differences can occur in jitter and shimmer measurements
between different speaking styles; these differences make it possible to
use jitter as a feature for speaker recognition ( referenced above ).
Args:
signal ( numpy.ndarray ): This is the signal the jitter will be calculated from.
rate ( int ): This is the number of samples taken per second.
period_floor ( float ): ( optional, default value: .0001 ) This is the shortest possible interval that will be used in the computation of jitter, in seconds. If an interval is shorter than this, it will be ignored in the computation of jitter ( the previous and next intervals will not be regarded as consecutive ).
period_ceiling ( float ): ( optional, default value: .02 ) This is the longest possible interval that will be used in the computation of jitter, in seconds. If an interval is longer than this, it will be ignored in the computation of jitter ( the previous and next intervals will not be regarded as consecutive ).
max_period_factor ( float ): ( optional, default value: 1.3 ) This is the largest possible difference between consecutive intervals that will be used in the computation of jitter. If the ratio of the durations of two consecutive intervals is greater than this, this pair of intervals will be ignored in the computation of jitter ( each of the intervals could still take part in the computation of jitter, in a comparison with its neighbor on the other side ).
Returns:
dict: a dictionary with keys: 'local', 'local, absolute', 'rap',
'ppq5', and 'ddp'. The values correspond to each type of jitter.\n
local jitter is expressed as a ratio of mean absolute period variation
to the mean period. \n
local absolute jitter is given in seconds.\n
rap is expressed as a ratio of the mean absolute difference between a
period and the mean of its 2 neighbors to the mean period.\n
ppq5 is expressed as a ratio of the mean absolute difference between a
period and the mean of its 4 neighbors to the mean period.\n
ddp is expressed as a ratio of the mean absolute second-order
difference to the mean period.
Example:
In the example below a synthesized signal is used to demonstrate random
perturbations in periods, and how get_Jitter responds.
>>> import numpy as np
>>> domain = np.linspace( 0, 6, 300000 )
>>> y = lambda x:( 1 - .3 * np.sin( 2 * np.pi * 140 * x ) ) * np.sin(
...     2 * np.pi * 140 * x )
>>> signal = y( domain ) + .2 * np.random.random( 300000 )
>>> rate = 50000
>>> get_Jitter( signal, rate )
{ 'ddp': 0.047411037373434134,
'local': 0.02581897560637415,
'local, absolute': 0.00018442618908563846,
'ppq5': 0.014805010237029443,
'rap': 0.015803679124478043 }
>>> get_Jitter( signal, rate, period_floor = .001,
...     period_ceiling = .01, max_period_factor = 1.05 )
{ 'ddp': 0.03264516540374475,
'local': 0.019927260366800197,
'local, absolute': 0.00014233584195389132,
'ppq5': 0.011472274162612033,
'rap': 0.01088172180124825 }
>>> y = lambda x:( 1 - .3 * np.sin( 2 * np.pi * 140 * x ) ) * np.sin(
...     2 * np.pi * 140 * x )
>>> signal = y( domain )
>>> get_Jitter( signal, rate )
{ 'ddp': 0.0015827628114371581,
'local': 0.00079043477724730755,
'local, absolute': 5.6459437833161522e-06,
'ppq5': 0.00063462518488944565,
'rap': 0.00052758760381238598 }
"""
pulses = get_Pulses( signal, rate )
periods = np.diff( pulses )
min_period_factor = 1.0 / max_period_factor
#finding local, absolute
#described at:
#http://www.fon.hum.uva.nl/praat/manual/PointProcess__Get_jitter__local__absolute____.html
sum_total = 0
num_periods = len( pulses ) - 1
for i in range( len( periods ) - 1 ):
p1, p2 = periods[ i ], periods[ i + 1 ]
ratio = p2 / p1
if (ratio < max_period_factor and ratio > min_period_factor and
p1 < period_ceiling and p1 > period_floor and
p2 < period_ceiling and p2 > period_floor ):
sum_total += abs( p2 - p1 )
else: num_periods -= 1
absolute = sum_total / ( num_periods - 1 )
#finding local,
#described at:
#http://www.fon.hum.uva.nl/praat/manual/PointProcess__Get_jitter__local____.html
sum_total = 0
num_periods = 0
#duplicating edges so there is no need to test edge cases
periods = np.hstack( ( periods[ 0 ], periods, periods[ -1 ] ) )
for i in range( len( periods ) - 2 ):
p1, p2, p3 = periods[ i ], periods[ i + 1 ], periods[ i + 2 ]
ratio_1, ratio_2 = p1 / p2, p2 / p3
if (ratio_1 < max_period_factor and ratio_1 > min_period_factor and
ratio_2 < max_period_factor and ratio_2 > min_period_factor and
p2 < period_ceiling and p2 > period_floor ):
sum_total += p2
num_periods += 1
#removing duplicated edges
periods = periods[ 1 : -1 ]
avg_period = sum_total / num_periods
relative = absolute / avg_period
#finding rap
#described at:
#http://www.fon.hum.uva.nl/praat/manual/PointProcess__Get_jitter__rap____.html
sum_total = 0
num_periods = 0
for i in range( len( periods ) - 2 ):
p1, p2, p3 = periods[ i ], periods[ i + 1 ], periods[ i + 2 ]
ratio_1, ratio_2 = p1 / p2, p2 / p3
if (ratio_1 < max_period_factor and ratio_1 > min_period_factor and
ratio_2 < max_period_factor and ratio_2 > min_period_factor and
p1 < period_ceiling and p1 > period_floor and
p2 < period_ceiling and p2 > period_floor and
p3 < period_ceiling and p3 > period_floor ):
sum_total += abs( p2 - ( p1 + p2 + p3 ) / 3.0 )
num_periods += 1
rap = ( sum_total / num_periods ) / avg_period
#finding ppq5
#described at:
#http://www.fon.hum.uva.nl/praat/manual/PointProcess__Get_jitter__ppq5____.html
sum_total = 0
num_periods = 0
for i in range( len( periods ) - 4 ):
p1, p2, p3 = periods[ i ], periods[ i + 1 ], periods[ i + 2 ]
p4, p5 = periods[ i + 3 ], periods[ i + 4 ]
ratio_1, ratio_2, ratio_3, ratio_4 = p1 / p2, p2 / p3, p3 / p4, p4 / p5
if (ratio_1 < max_period_factor and ratio_1 > min_period_factor and
ratio_2 < max_period_factor and ratio_2 > min_period_factor and
ratio_3 < max_period_factor and ratio_3 > min_period_factor and
ratio_4 < max_period_factor and ratio_4 > min_period_factor and
p1 < period_ceiling and p1 > period_floor and
p2 < period_ceiling and p2 > period_floor and
p3 < period_ceiling and p3 > period_floor and
p4 < period_ceiling and p4 > period_floor and
p5 < period_ceiling and p5 > period_floor ):
sum_total += abs( p3 - ( p1 + p2 + p3 + p4 + p5 ) / 5.0 )
num_periods += 1
ppq5 = ( sum_total / num_periods ) / avg_period
#Praat calculates ddp by multiplying rap by 3
#described at:
#http://www.fon.hum.uva.nl/praat/manual/PointProcess__Get_jitter__ddp____.html
return { 'local' : relative, 'local, absolute' : absolute, 'rap' : rap,
'ppq5' : ppq5, 'ddp' : 3 * rap }
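For intuition, the five measures above reduce to short expressions over a plain period sequence when every period passes the floor/ceiling and ratio checks, so nothing is excluded from the sums. The sketch below makes that assumption explicit; `jitter_from_periods` is a hypothetical helper for illustration, not part of this module, and it skips the edge duplication used in the gated computation.

```python
import numpy as np

def jitter_from_periods(periods):
    """Ungated versions of the five jitter measures: valid only when every
    period already passes the floor/ceiling and ratio checks."""
    periods = np.asarray(periods, dtype=float)
    avg = periods.mean()
    absolute = np.mean(np.abs(np.diff(periods)))   # mean |T_i - T_{i-1}|
    local = absolute / avg
    # rap: deviation of each period from the mean of itself + 2 neighbors
    rap = np.mean([abs(periods[i + 1] - periods[i:i + 3].mean())
                   for i in range(len(periods) - 2)]) / avg
    # ppq5: deviation from the mean of itself + 4 nearest neighbors
    ppq5 = np.mean([abs(periods[i + 2] - periods[i:i + 5].mean())
                    for i in range(len(periods) - 4)]) / avg
    # Praat computes ddp as exactly 3 times rap
    return {'local': local, 'local, absolute': absolute,
            'rap': rap, 'ppq5': ppq5, 'ddp': 3 * rap}

# a perfectly periodic pulse train has (numerically) zero jitter
flat = jitter_from_periods(np.full(100, 0.01))
# alternating short/long periods produce positive jitter
rough = jitter_from_periods(np.tile([0.009, 0.011], 50))
```

With alternating 9 ms / 11 ms periods, the mean absolute difference is 2 ms against a 10 ms mean period, so local jitter comes out at 0.2, which matches the relative-jitter formula directly.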