---
title: Music30 dataset
keywords: fastai
sidebar: home_sidebar
summary: "Music30 dataset."
description: "Music30 dataset."
nb_path: "nbs/datasets/music30.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

class Music30Dataset

Music30Dataset(root, process_method, min_session_length=2, min_item_support=2, num_slices=5, days_offset=0, days_shift=95, days_train=90, days_test=5) :: SessionDataset

Session dataset base class.

Args:
    root (string): Root directory where the dataset should be saved.
    process_method (string): How the data is split into train and test sets. One of:
        last: last day => test set
        last_min_date: last day => test set, but from a minimal date onwards
        days_test: last N days => test set
        slice: create multiple train-test combinations with a sliding-window approach
    min_date (string): Minimum date
    session_length (int): Session time length in seconds (default: 30 * 60, i.e. 30 minutes)
    min_session_length (int): Minimum number of items for a session to be valid
    min_item_support (int): Minimum number of interactions for an item to be valid
    num_slices (int): Number of slices to create
    days_offset (int): Offset in days from the first date in the data set
    days_shift (int): Number of days the training start date is shifted after creating one slice
    days_train (int): Days in train set in each slice
    days_test (int): Days in test set in each slice
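
The two support thresholds, `min_item_support` and `min_session_length`, can be illustrated with a minimal sketch. This is not the library's implementation: the `(session_id, item_id, timestamp)` tuple layout, the `filter_events` name, and the order of the two filtering steps are all assumptions made for illustration.

```python
from collections import Counter


def filter_events(events, min_item_support=2, min_session_length=2):
    """Filter a list of (session_id, item_id, timestamp) tuples.

    Hypothetical helper: drops rare items first, then drops sessions
    that are too short after the item filter. The real preprocessing
    may apply these steps in a different order or more than once.
    """
    # Count how often each item occurs and keep only supported items.
    item_counts = Counter(item for _, item, _ in events)
    events = [e for e in events if item_counts[e[1]] >= min_item_support]

    # Count the remaining events per session and keep long-enough sessions.
    session_sizes = Counter(session for session, _, _ in events)
    return [e for e in events if session_sizes[e[0]] >= min_session_length]


# Toy data: item "c" occurs once, so it is dropped; sessions 2 and 3
# are then left with a single event each and are dropped as well.
sample = [
    (1, "a", 0), (1, "b", 1), (1, "a", 2),
    (2, "a", 3), (2, "c", 4),
    (3, "b", 5),
]
filtered = filter_events(sample)
```

Because removing items can shorten sessions, the session filter runs after the item filter in this sketch.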

{% endraw %} {% raw %}
{% endraw %} {% raw %}
musicdata = Music30Dataset(root='/content/music30', process_method='last')
Downloading https://github.com/RecoHut-Datasets/30music/raw/v1/30music.zip
Extracting /content/music30/raw/30music.zip
Processing...
Loaded data set
	Events: 3707857
	Sessions: 200000
	Items: 1203432
	Span: 2014-01-20 / 2015-01-20


Filtered data set
	Events: 2953382
	Sessions: 190216
	Items: 452855
	Span: 2014-01-20 / 2015-01-20


Full train set
	Events: 2892862
	Sessions: 186627
	Items: 450895
Test set
	Events: 54606
	Sessions: 3468
	Items: 35100
Train set
	Events: 2847481
	Sessions: 183674
	Items: 449290
Validation set
	Events: 41785
	Sessions: 2852
	Items: 29293
Done!
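
The `last` method is described above as "last day => test set". A minimal sketch of that idea follows, assuming a session belongs to the test set when its final event falls within the last `days` days of the data. The `split_last_days` name and the tuple layout are hypothetical, and the actual class may differ (for example, by also removing test items that never occur in training).

```python
from datetime import datetime, timedelta


def split_last_days(events, days=1):
    """Split (session_id, item_id, datetime) tuples into train and test.

    Hypothetical helper: sessions whose last event is at or after
    (latest timestamp - days) go to the test set, all others to train.
    """
    # Find the timestamp of the last event in every session.
    session_end = {}
    for session, _, ts in events:
        if session not in session_end or ts > session_end[session]:
            session_end[session] = ts

    cutoff = max(session_end.values()) - timedelta(days=days)
    test_sessions = {s for s, end in session_end.items() if end >= cutoff}

    train = [e for e in events if e[0] not in test_sessions]
    test = [e for e in events if e[0] in test_sessions]
    return train, test


sample = [
    (1, "a", datetime(2015, 1, 17, 9)), (1, "b", datetime(2015, 1, 17, 10)),
    (2, "a", datetime(2015, 1, 19, 8)), (2, "c", datetime(2015, 1, 20, 7)),
    (3, "b", datetime(2015, 1, 20, 12)), (3, "a", datetime(2015, 1, 20, 13)),
]
train, test = split_last_days(sample, days=1)
```

Calling the same helper with a larger `days` value gives the shape of a "last N days => test set" split.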
{% endraw %} {% raw %}
musicdata = Music30Dataset(root='/content/music30', process_method='last')
Processing...
Loaded data set
	Events: 3707857
	Sessions: 200000
	Items: 1203432
	Span: 2014-01-20 / 2015-01-20


Filtered data set
	Events: 2149666
	Sessions: 165766
	Items: 139016
	Span: 2014-01-20 / 2015-01-20


Full train set
	Events: 2105847
	Sessions: 162634
	Items: 138861
Test set
	Events: 41871
	Sessions: 3091
	Items: 23508
Train set
	Events: 2073194
	Sessions: 160047
	Items: 138755
Validation set
	Events: 31937
	Sessions: 2564
	Items: 20210
Done!
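
The Events / Sessions / Items / Span figures printed in these logs are simple aggregate counts. A hedged sketch of how such a summary could be computed is shown here; the `describe` name and the `(session_id, item_id, datetime)` layout are assumptions, not the class's own code.

```python
from datetime import datetime


def describe(events):
    """Summarise a list of (session_id, item_id, datetime) tuples.

    Hypothetical helper returning the same four quantities the
    processing log reports.
    """
    timestamps = [ts for _, _, ts in events]
    return {
        "events": len(events),
        "sessions": len({session for session, _, _ in events}),
        "items": len({item for _, item, _ in events}),
        "span": (min(timestamps).date().isoformat(),
                 max(timestamps).date().isoformat()),
    }


sample = [
    (1, "a", datetime(2014, 1, 20, 9)),
    (1, "b", datetime(2014, 1, 20, 10)),
    (2, "a", datetime(2015, 1, 20, 8)),
]
stats = describe(sample)
```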
{% endraw %} {% raw %}
!tree --du -h -C /content/music30
/content/music30
├── [157M]  processed
│   ├── [1.6M]  events_test.txt
│   ├── [ 78M]  events_train_full.txt
│   ├── [ 77M]  events_train_tr.txt
│   └── [1.2M]  events_train_valid.txt
└── [137M]  raw
    └── [137M]  30music-200ks.csv

 295M used in 2 directories, 5 files
{% endraw %} {% raw %}
!rm /content/music30/processed/*
musicdata = Music30Dataset(root='/content/music30', process_method='days_test')
Processing...
Loaded data set
	Events: 3707857
	Sessions: 200000
	Items: 1203432
	Span: 2014-01-20 / 2015-01-20


Filtered data set
	Events: 2149666
	Sessions: 165766
	Items: 139016
	Span: 2014-01-20 / 2015-01-20


Full train set
	Events: 2073194
	Sessions: 160047
	Items: 138755
Test set
	Events: 73532
	Sessions: 5652
	Items: 36423
Done!
{% endraw %} {% raw %}
!tree --du -h -C /content/music30
/content/music30
├── [ 79M]  processed
│   ├── [2.7M]  events_test.txt
│   └── [ 77M]  events_train_full.txt
└── [137M]  raw
    └── [137M]  30music-200ks.csv

 217M used in 2 directories, 3 files
{% endraw %} {% raw %}
!rm /content/music30/processed/*
musicdata = Music30Dataset(root='/content/music30', process_method='slice')
Processing...
Loaded data set
	Events: 3707857
	Sessions: 200000
	Items: 1203432
	Span: 2014-01-20 / 2015-01-20


Filtered data set
	Events: 2149666
	Sessions: 165766
	Items: 139016
	Span: 2014-01-20 / 2015-01-20


Done!
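
To make the sliding-window parameters concrete, the sketch below computes the date windows that `num_slices`, `days_offset`, `days_shift`, `days_train`, and `days_test` would imply if each slice starts `days_shift` days after the previous one. The `SliceWindow` type and `slice_windows` function are hypothetical, and this reading of the parameters is an assumption based on the docstring, not a description of what `process_method='slice'` does internally.

```python
from datetime import date, timedelta
from typing import NamedTuple


class SliceWindow(NamedTuple):
    train_start: date
    train_end: date  # exclusive; the test window starts here
    test_end: date   # exclusive


def slice_windows(first_day, num_slices=5, days_offset=0, days_shift=95,
                  days_train=90, days_test=5):
    """Return one (train, test) date window per slice (assumed semantics)."""
    windows = []
    for i in range(num_slices):
        start = first_day + timedelta(days=days_offset + i * days_shift)
        train_end = start + timedelta(days=days_train)
        test_end = train_end + timedelta(days=days_test)
        windows.append(SliceWindow(start, train_end, test_end))
    return windows


# Defaults from the class signature, applied to the logged data span.
windows = slice_windows(date(2014, 1, 20))
last_day = date(2015, 1, 20)

# Indices of slices whose test window would end after the last logged day.
overrun = [i for i, w in enumerate(windows) if w.test_end > last_day]
```

Under these assumed semantics, each default slice covers 95 days, so five slices would need 475 days, while the logged span is 365 days.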
{% endraw %} {% raw %}
!tree --du -h -C /content/music30
/content/music30
├── [4.0K]  processed
└── [137M]  raw
    └── [137M]  30music-200ks.csv

 137M used in 2 directories, 1 file
{% endraw %}