Lost scrobbles and JavaScript Jupyter Notebooks
I've been a Last.fm user since 2008 and a pro user for the past few years to support the service.
For those not familiar, Last.fm lets you track the music you've listened to from almost any music client. These listens are called "scrobbles".
For example, Spotify has integration through a connected app and every time I listen to a song on Spotify it sends that song to Last.fm.
Spotify Wrapped
Every December, Spotify releases Spotify Wrapped, which I'm sure you're familiar with.
This year, I noticed some discrepancy between the results from Spotify and Last.fm.
Spotify - Top artists and songs
Last.fm - Top artists
Last.fm - Top songs
There are some clear discrepancies between these two sources, even though I use Spotify to listen to music ~99% of the time. Most notably, my top artist on Spotify is Alpha 9, yet Last.fm says it's i_o. Neither appears in the other's list.
My initial thought was that perhaps Spotify has a different reporting period than Last.fm. This is definitely the case with my top artist on Last.fm (i_o), as I listened to him a lot in early December 2022 and I was looking at these numbers around December 1st, 2023. That said, it doesn't explain everything. One issue I've had is that Last.fm scrobbling sometimes just stops working, and I then need to reconnect it to get it going again. I was curious just how many scrobbles were being lost.
Luckily it's possible to get all the raw data from Spotify and Last.fm so that I can try to better understand what's happening.
Getting the data
Spotify
Spotify provides this data for download under Profile > Account > Privacy settings.
https://www.spotify.com/ca-en/account/privacy/
Last.fm
Last.fm is a little trickier, but luckily there is a third party website that helps download your entire listening history via Last.fm's API:
https://benjaminbenben.com/lastfm-to-csv/
I found this did the job, but it would occasionally error on some API calls. A quick fix was to apply this patch to the repo to retry on failure in order to make it more reliable.
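As a rough illustration of the retry-on-failure idea (this is a minimal sketch, not the actual patch; the helper name `withRetry`, the attempt count, and the delay are all assumptions made up for the example):

```typescript
// Hypothetical helper: run an async action, retrying a few times on failure.
// The name, default attempt count, and default delay are illustrative only.
async function withRetry<T>(
  action: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // wait a little longer after each failed attempt
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // every attempt failed, so surface the last error to the caller
  throw lastError;
}
```

An API call would then be wrapped as something like withRetry(() => fetchPage(url)), where fetchPage stands in for whatever function performs the request and throws on an error response.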
JavaScript Jupyter Notebooks
Now that I had the data, I needed a way to analyze it. My data analysis background is very poor and usually in this case I would just write a quick script in whatever language is easiest for the task.
In this case, Deno recently released Jupyter Notebook support and I hadn't really tried it out yet (my colleague Bartek did most of the work implementing it) nor had I ever used a Jupyter Notebook. This seemed like a good occasion.
I set up Jupyter for Deno in VS Code, created a notebook.ipynb file and a deno.json file with {} in it (to activate Deno's language server and automatically get a Deno lockfile), then added the Last.fm & Spotify data to the same folder.
Loading and normalizing the data
Last.fm
The Last.fm data is a csv file. I created a notebook cell that loaded and normalized the data like so:
import $, { PathRef } from "https://deno.land/x/dax@0.36.0/mod.ts";
import { parse as parseCsv } from "npm:csv-string@4.1.1";
interface NormalizedRow {
  artist: string;
  album: string;
  track: string;
  date: Date;
}
const lastFmText = $.path("lastfm.csv").readTextSync();
const lastFmData: NormalizedRow[] = parseCsv(lastFmText).map((
  row: string[],
) => ({
  artist: row[0].trim(),
  album: row[1].trim(),
  track: row[2].trim(),
  date: new Date(row[3] + " GMT"),
})).reverse();
console.log("Loaded", lastFmData.length, "rows");
Spotify
Spotify stores the streaming data in multiple JSON files.
Streaming_History_Audio_2012-2014_0.json
Streaming_History_Audio_2014-2015_1.json
...
Streaming_History_Audio_2022-2023_10.json
Streaming_History_Audio_2023_11.json
I loaded it like so:
interface SpotifyRow {
  ts: string;
  ms_played: number;
  master_metadata_track_name: string;
  master_metadata_album_artist_name: string;
  master_metadata_album_album_name: string;
  reason_start: string;
  reason_end: string;
}
function normalizeSpotify(row: SpotifyRow): NormalizedRow {
  return {
    artist: row.master_metadata_album_artist_name.trim(),
    album: row.master_metadata_album_album_name.trim(),
    track: row.master_metadata_track_name.trim(),
    date: new Date(Date.parse(row.ts) - row.ms_played),
  };
}
function loadFromFile(path: PathRef) {
  return path
    .readJsonSync<SpotifyRow[]>()
    .filter((row) =>
      row.master_metadata_album_artist_name != null
      && row.master_metadata_album_album_name != null
      && row.master_metadata_track_name != null
      // not perfect, but probably a close enough approximation
      // to what counts as a scrobble
      && (row.reason_end === "trackdone" || row.ms_played > 120_000)
    )
    .map(normalizeSpotify);
}
const spotifyData = Array.from($.path(".").readDirSync())
  .filter(entry =>
    entry.isFile && entry.name.startsWith("Streaming_History_Audio")
  )
  .map(entry => entry.path)
  .sort((a, b) => a.toString().localeCompare(b.toString()))
  .map(path => loadFromFile(path))
  .flat();
console.log("Loaded", spotifyData.length, "rows");
What's hard to determine here is which Spotify plays are worthy of being counted as a Last.fm scrobble. Spotify stores all listens (even if it's only 3 seconds), whereas Last.fm only stores plays that it considers to be actually listening to the song. I don't know how Last.fm does this, so I approximated it with the condition row.reason_end === "trackdone" || row.ms_played > 120_000, which is definitely inaccurate.
Because I don't really know this condition, you should interpret the charts in this post with a low level of confidence, as how I modify this condition can have a big impact on the output. That said, I think the condition as structured is probably good enough to get some idea about what's going on.
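To get a feel for how sensitive the results are to that condition, one option is to make the threshold a parameter and count how many plays qualify at a few different values. The following is a minimal sketch: the function names, the sample plays, and the candidate thresholds are made up for illustration, and only the shape of the condition comes from the code above.

```typescript
// Minimal shape of a Spotify play needed for this sketch.
interface Play {
  reason_end: string;
  ms_played: number;
}

// Same condition as in the post, with the threshold turned into a parameter.
function countsAsScrobble(play: Play, minMsPlayed: number): boolean {
  return play.reason_end === "trackdone" || play.ms_played > minMsPlayed;
}

// Count how many plays qualify under each candidate threshold.
function countByThreshold(
  plays: Play[],
  thresholdsMs: number[],
): Map<number, number> {
  const result = new Map<number, number>();
  for (const threshold of thresholdsMs) {
    const qualifying = plays.filter((p) => countsAsScrobble(p, threshold));
    result.set(threshold, qualifying.length);
  }
  return result;
}

// Made-up sample data, only for illustration.
const samplePlays: Play[] = [
  { reason_end: "trackdone", ms_played: 200_000 },
  { reason_end: "fwdbtn", ms_played: 3_000 },
  { reason_end: "fwdbtn", ms_played: 90_000 },
  { reason_end: "endplay", ms_played: 150_000 },
];
const thresholdCounts = countByThreshold(
  samplePlays,
  [30_000, 120_000, 240_000],
);
console.log(thresholdCounts);
```

Running the same comparison over the real Spotify rows (before filtering) would show how much the total number of "scrobble-worthy" plays moves as the threshold changes.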
At this point I have the Last.fm data in lastFmData and the Spotify data in spotifyData.
Outputting difference in total plays in 2023
To start, I wanted to find out which days in 2023 had more total plays in Spotify than in Last.fm, and vice versa:
import { display } from "https://deno.land/x/display@v1.1.2/mod.ts";
import * as Plot from "npm:@observablehq/plot@^0.6";
import { DOMParser } from "npm:linkedom@^0.16";
interface DateRange {
  start: Date;
  end: Date;
}
function filterData(data: NormalizedRow[], dateRange: DateRange) {
  return data.filter(row =>
    row.date >= dateRange.start && row.date < dateRange.end
  );
}
async function displayPlaysPerDay(dateRange: DateRange) {
  // There is most definitely a better way of doing this directly with
  // observablehq/plot, but I didn't want to spend too much time going
  // through the documentation figuring it out
  function getDayKey(date: Date) {
    return date.getFullYear() + "-" + date.getMonth() + "-" + date.getDate();
  }
  function getPlaysPerDay(data: NormalizedRow[]) {
    const counts = new Map<string, number>();
    for (const { date } of data) {
      const day = getDayKey(date);
      counts.set(day, (counts.get(day) ?? 0) + 1);
    }
    return counts;
  }
  const spotifyPlaysPerDay = getPlaysPerDay(
    filterData(spotifyData, dateRange),
  );
  const lastFmPlaysPerDay = getPlaysPerDay(
    filterData(lastFmData, dateRange),
  );
  // copy the start date so stepping through the days below
  // doesn't mutate the caller's dateRange.start
  const date = new Date(dateRange.start);
  const rows = [];
  while (date < dateRange.end) {
    const day = getDayKey(date);
    const spotifyCount = spotifyPlaysPerDay.get(day) ?? 0;
    const lastFmCount = lastFmPlaysPerDay.get(day) ?? 0;
    rows.push({
      date: new Date(date),
      count: spotifyCount - lastFmCount,
    });
    date.setDate(date.getDate() + 1);
  }
  // create a virtual document
  const document = new DOMParser().parseFromString(
    `<!DOCTYPE html><html lang="en"></html>`,
    "text/html",
  );
  // output the results
  await display(Plot.plot({
    marks: [
      Plot.line(rows, {
        x: "date",
        y: "count",
        z: null,
        stroke: (r) => r.count >= 0 ? "green" : "red",
      }),
    ],
    document,
  }));
}
await displayPlaysPerDay({
  start: new Date("2023-01-01 00:00:00 EST"),
  // I downloaded all the data from Last.fm around December 1st, but
  // it would be several weeks until I received the data from Spotify.
  end: new Date("2023-12-01 00:00:00 EST"),
});
Outputs:
Positive (green) values show more Spotify plays on a day. Negative (red) values show more Last.fm scrobbles.
In an ideal world, the Spotify plays would be a subset of the Last.fm ones or only lag behind by a day or two (as shown occasionally in the chart), but that doesn't seem to be the case here. I looked further into these numbers and found that there are ~830 plays that are exclusive to Spotify and ~410 scrobbles exclusive to Last.fm. It's not clear which side is at fault for plays not making it into Last.fm, and I don't want to speculate in this post.
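One way to compute counts like these is a multiset difference between the two data sets. The sketch below is not necessarily the matching behind the numbers above. It assumes that two plays "match" when both services recorded the same artist and track on the same UTC day, and the names playKey and exclusiveTo are made up for the example.

```typescript
interface PlayRow {
  artist: string;
  track: string;
  date: Date;
}

// Assumption: a play is identified by artist, track and UTC calendar day.
function playKey(row: PlayRow): string {
  const day = row.date.toISOString().slice(0, 10);
  return `${row.artist.toLowerCase()}|${row.track.toLowerCase()}|${day}`;
}

// Returns the rows of `source` that have no matching row in `other`.
// Each row in `other` can cancel out at most one row in `source`.
function exclusiveTo(source: PlayRow[], other: PlayRow[]): PlayRow[] {
  const remaining = new Map<string, number>();
  for (const row of other) {
    const key = playKey(row);
    remaining.set(key, (remaining.get(key) ?? 0) + 1);
  }
  const exclusive: PlayRow[] = [];
  for (const row of source) {
    const key = playKey(row);
    const count = remaining.get(key) ?? 0;
    if (count > 0) {
      remaining.set(key, count - 1);
    } else {
      exclusive.push(row);
    }
  }
  return exclusive;
}

// Made-up sample data, only for illustration.
const spotifySample: PlayRow[] = [
  { artist: "Artist A", track: "Song 1", date: new Date("2023-03-01T10:00:00Z") },
  { artist: "Artist A", track: "Song 1", date: new Date("2023-03-01T11:00:00Z") },
  { artist: "Artist B", track: "Song 2", date: new Date("2023-03-02T09:00:00Z") },
];
const lastFmSample: PlayRow[] = [
  { artist: "Artist A", track: "Song 1", date: new Date("2023-03-01T10:03:00Z") },
  { artist: "Artist C", track: "Song 3", date: new Date("2023-03-02T12:00:00Z") },
];
const onlySpotify = exclusiveTo(spotifySample, lastFmSample);
const onlyLastFm = exclusiveTo(lastFmSample, spotifySample);
console.log(onlySpotify.length, onlyLastFm.length);
```

Matching by day is deliberately loose. Plays close to midnight, or scrobbles that arrive a day or two late, would be counted as exclusive on both sides, so a real comparison might need a wider time window.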
Overall, this is not terrible, but it also doesn't seem perfect.
Past years play count difference - 2017-2022
Looking at past years, here's 2017-2022 inclusive:
await displayPlaysPerDay({
  start: new Date("2017-01-01 00:00:00 EST"),
  end: new Date("2023-01-01 00:00:00 EST"),
});
The large red spike was me backfilling Last.fm manually with their API because scrobbling got disconnected for a month or so.
Past years play count difference - 2012-2016
Finally, here's 2012-2016 inclusive:
await displayPlaysPerDay({
  start: new Date("2012-01-01 00:00:00 EST"),
  end: new Date("2017-01-01 00:00:00 EST"),
});
My transition to Spotify started in 2012 and it seems like the Last.fm / Spotify integration for my account was very reliable at the start.
Top 10 artists plays exclusive to Spotify in 2023
I also looked at the plays exclusive to Spotify in 2023 and saw this huge standout, which explains why my top artist on Spotify (Alpha 9) was barely present in the Last.fm data:
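Ranking artists within a set of plays comes down to counting rows per artist and sorting by count. A minimal sketch, assuming the rows carry an artist field like the normalized rows above (the function name topArtists and the sample rows are made up for the example):

```typescript
// Count plays per artist and return the `limit` most played,
// ordered by play count (ties broken alphabetically).
function topArtists(
  rows: { artist: string }[],
  limit = 10,
): [string, number][] {
  const counts = new Map<string, number>();
  for (const { artist } of rows) {
    counts.set(artist, (counts.get(artist) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Made-up sample data, only for illustration.
const sampleRows = [
  { artist: "Artist B" },
  { artist: "Artist A" },
  { artist: "Artist A" },
  { artist: "Artist C" },
  { artist: "Artist B" },
  { artist: "Artist A" },
];
const topTwo = topArtists(sampleRows, 2);
console.log(topTwo);
```

The resulting array of [artist, count] pairs can be handed to a bar chart in the same way the per-day rows were plotted earlier.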
Thoughts on Deno Jupyter Notebook experience
Overall it was quite enjoyable to use Deno in a Jupyter Notebook.
- I like how dependencies are expressed in the notebook itself. Deno's single file scripting support translates well to notebooks.
- Of course, I could have put the dependencies in the deno.json file if I wanted. That's useful for scenarios like sharing the same dependencies between notebooks.
- TypeScript support in the editor makes it easy to understand APIs directly in VS Code.
- With TypeScript, type checking errors alert me in real time about certain mistakes.
There were a few annoyances that should be improved over time though.
- I didn't like that I had to write this code:
  const document = new DOMParser().parseFromString(
    `<!DOCTYPE html><html lang="en"></html>`,
    "text/html",
  );
  I'm not sure what the solution is to getting rid of that, but I feel like it's something that could be abstracted away. That said, it's not too big of a deal.
- TypeScript types flowing between notebook cells don't work at the moment.
- I opened https://github.com/denoland/deno/issues/21709
- It was pointed out that https://github.com/denoland/vscode_deno/issues/932 is the tracking issue.
- Looks like this depends on the lsp-types crate in Rust supporting notebook cells.
- In VS Code, the "Save As" button for SVGs should show the last saved location rather than the current folder in the "Save As" dialog, in my opinion.
Other than that, I really enjoyed the experience and I'm looking forward to JavaScript/TypeScript becoming more prevalent in Jupyter Notebooks.