
Lost scrobbles and JavaScript Jupyter Notebooks

by David Sherret

I've been a Last.fm user since 2008 and a pro user for the past few years to support the service.

For those not familiar, Last.fm allows someone to track the music they've listened to from almost any music client. These listens are called "scrobbles".

For example, Spotify has integration through a connected app and every time I listen to a song on Spotify it sends that song to Last.fm.

Shows the Last.fm Scrobbler in Spotify's apps.

Spotify Wrapped

Every December, Spotify releases Spotify Wrapped, which I'm sure you're familiar with.

This year, I noticed some discrepancies between the results from Spotify and Last.fm.

Spotify - Top artists and songs

Spotify Top Artists (Alpha 9, M83, Royksopp, The Knocks, Hania Rani). Top Songs (Sacrifice, Luna, Ordinary Love, kisses, If We Ever Find The Right Place)

Last.fm - Top artists

Last.fm Top Artists (i_o, Hammock, M83, Royksopp, The Knocks, Cinnamon Chasers, Roosevelt, Fay Wildhagen, Hania Rani, Cannons)

Last.fm - Top songs

Last.fm Top Songs (Luna, If We Ever Find The Right Place, kisses, Set Sails, One Man Band, Ordinary Love, anything but wet, The Murder of Love, Vegas High, Holding Me Like Water)

There are some clear discrepancies between these two sources, even though I use Spotify to listen to music ~99% of the time. Most notably, my top artist on Spotify is Alpha 9, yet Last.fm says it's i_o. Neither appears in the other's list.

My initial thought was that perhaps Spotify has a different reporting period than Last.fm. This is definitely the case with my top artist on Last.fm (i_o), as I listened to him a lot in early December 2022 and I was looking at these numbers around December 1st 2023. That said, it doesn't explain everything. One issue I've had is that Last.fm scrobbling sometimes just stops happening and I then need to reconnect it to get it working again. I was curious just how many scrobbles were being lost.

Luckily it's possible to get all the raw data from Spotify and Last.fm so that I can try to better understand what's happening.

Getting the data

Spotify

Spotify provides this data for download under Profile > Account > Privacy settings.

https://www.spotify.com/ca-en/account/privacy/

Spotify's extended streaming history form component

Last.fm

Last.fm is a little trickier, but luckily there is a third party website that helps download your entire listening history via Last.fm's API:

https://benjaminbenben.com/lastfm-to-csv/

I found this did the job, but it would occasionally error on some API calls. A quick fix was to apply this patch to the repo to retry on failure in order to make it more reliable.
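The patch itself isn't reproduced here, but retrying a flaky API call generally looks something like the following. This is only a minimal sketch of the idea, not the actual patch: withRetry, maxAttempts, and delayMs are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper (not from the lastfm-to-csv repo): runs an async
// action and retries it a few times before giving up.
async function withRetry<T>(
  action: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 1_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // wait a little longer after each failed attempt
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // every attempt failed, so surface the last error
  throw lastError;
}
```

Each API request would then be wrapped in a call to withRetry, so a single failed request no longer aborts the whole export.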

JavaScript Jupyter Notebooks

Now that I had the data, I needed a way to analyze it. My data analysis background is very poor and usually in this case I would just write a quick script in whatever language is easiest for the task.

In this case, Deno recently released Jupyter Notebook support and I hadn't really tried it out yet (my colleague Bartek did most of the work implementing it) nor had I ever used a Jupyter Notebook. This seemed like a good occasion.

I set up Jupyter for Deno in VS Code, created a notebook.ipynb file, a deno.json file with {} in it (to activate Deno's language server and automatically get a Deno lockfile), then added the Last.fm & Spotify data to the same folder.

Loading and normalizing the data

Last.fm

The Last.fm data is a csv file. I created a notebook cell that loaded and normalized the data like so:

import $, { PathRef } from "https://deno.land/x/dax@0.36.0/mod.ts";
import { parse as parseCsv } from "npm:csv-string@4.1.1";

interface NormalizedRow {
  artist: string;
  album: string;
  track: string;
  date: Date;
}

const lastFmText = $.path("lastfm.csv").readTextSync();
const lastFmData: NormalizedRow[] = parseCsv(lastFmText).map((
  row: string[],
) => ({
  artist: row[0].trim(),
  album: row[1].trim(),
  track: row[2].trim(),
  date: new Date(row[3] + " GMT"),
})).reverse();

console.log("Loaded", lastFmData.length, "rows");

Spotify

Spotify stores the streaming data in multiple JSON files.

Streaming_History_Audio_2012-2014_0.json
Streaming_History_Audio_2014-2015_1.json
...
Streaming_History_Audio_2022-2023_10.json
Streaming_History_Audio_2023_11.json

I loaded it like so:

interface SpotifyRow {
  ts: string;
  ms_played: number;
  master_metadata_track_name: string;
  master_metadata_album_artist_name: string;
  master_metadata_album_album_name: string;
  reason_start: string;
  reason_end: string;
}

function normalizeSpotify(row: SpotifyRow): NormalizedRow {
  return {
    artist: row.master_metadata_album_artist_name.trim(),
    album: row.master_metadata_album_album_name.trim(),
    track: row.master_metadata_track_name.trim(),
    date: new Date(Date.parse(row.ts) - row.ms_played),
  };
}

function loadFromFile(path: PathRef) {
  return path
    .readJsonSync<SpotifyRow[]>()
    .filter((row) =>
      row.master_metadata_album_artist_name != null
      && row.master_metadata_album_album_name != null
      && row.master_metadata_track_name != null
      // not perfect, but probably a close enough approximation
      // to what counts as a scrobble
      && (row.reason_end === "trackdone" || row.ms_played > 120_000)
    )
    .map(normalizeSpotify);
}

const spotifyData = Array.from($.path(".").readDirSync())
  .filter(entry =>
    entry.isFile && entry.name.startsWith("Streaming_History_Audio")
  )
  .map(entry => entry.path)
  .sort((a, b) => a.toString().localeCompare(b.toString()))
  .map(path => loadFromFile(path))
  .flat();

console.log("Loaded", spotifyData.length, "rows");

What's hard to determine here is which Spotify plays are worthy of being counted as a Last.fm scrobble. Spotify stores all listens, even if it's only 3 seconds, whereas Last.fm only stores plays that it considers to be actually listening to the song. I don't know how Last.fm does this, so I approximated it with the condition row.reason_end === "trackdone" || row.ms_played > 120_000 (the track played to the end, or was played for more than two minutes), which is definitely inaccurate.

Because I don't really know this condition, you should interpret the charts in this post with a low level of confidence, as how I modify this condition can have a big impact on the output. That said, I think how this condition is structured is probably good enough to get some idea about what's going on.
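One way to get a feel for how much this condition matters is to count how many plays pass it under a few different thresholds. The following is a rough sketch under stated assumptions: PlayLike and countQualifyingPlays are hypothetical names, and the sample rows are made up for illustration rather than taken from my export.

```typescript
// Only the two fields the filter condition looks at.
interface PlayLike {
  ms_played: number;
  reason_end: string;
}

// Counts the plays that would pass the filter for a given minimum play time.
function countQualifyingPlays(rows: PlayLike[], minMsPlayed: number): number {
  return rows.filter((row) =>
    row.reason_end === "trackdone" || row.ms_played > minMsPlayed
  ).length;
}

// Made-up sample rows; the non-"trackdone" reason_end value is illustrative.
const sampleRows: PlayLike[] = [
  { ms_played: 3_000, reason_end: "fwdbtn" },
  { ms_played: 95_000, reason_end: "fwdbtn" },
  { ms_played: 150_000, reason_end: "fwdbtn" },
  { ms_played: 200_000, reason_end: "trackdone" },
];

// Compare how many plays qualify at 30 seconds, 2 minutes, and 4 minutes.
for (const threshold of [30_000, 120_000, 240_000]) {
  console.log(threshold, countQualifyingPlays(sampleRows, threshold));
}
```

Even with four rows the count changes at every threshold, which is why the exact numbers in the charts shouldn't be read too closely.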

At this point I have the Last.fm data in lastFmData and Spotify data in spotifyData.

Outputting difference in total plays in 2023

To start, I wanted to find out on which days I had more total plays in Spotify vs Last.fm in 2023:

import { display } from "https://deno.land/x/display@v1.1.2/mod.ts";
import * as Plot from "npm:@observablehq/plot@^0.6";
import { DOMParser } from "npm:linkedom@^0.16";

interface DateRange {
  start: Date;
  end: Date;
}

function filterData(data: NormalizedRow[], dateRange: DateRange) {
  return data.filter(row =>
    row.date >= dateRange.start && row.date < dateRange.end
  );
}

async function displayPlaysPerDay(dateRange: DateRange) {
  // There is most definitely a better way of doing this directly with
  // observablehq/plot, but I didn't want to spend too much time going
  // through the documentation figuring it out

  function getDayKey(date: Date) {
    return date.getFullYear() + "-" + date.getMonth() + "-" + date.getDate();
  }

  function getPlaysPerDay(data: NormalizedRow[]) {
    const counts = new Map<string, number>();
    for (const { date } of data) {
      const day = getDayKey(date);
      counts.set(day, (counts.get(day) ?? 0) + 1);
    }
    return counts;
  }

  const spotifyPlaysPerDay = getPlaysPerDay(
    filterData(spotifyData, dateRange),
  );
  const lastFmPlaysPerDay = getPlaysPerDay(
    filterData(lastFmData, dateRange),
  );
  // copy the start date so the loop below doesn't mutate the caller's object
  const date = new Date(dateRange.start);
  const rows = [];
  while (date < dateRange.end) {
    const day = getDayKey(date);
    const spotifyCount = spotifyPlaysPerDay.get(day) ?? 0;
    const lastFmCount = lastFmPlaysPerDay.get(day) ?? 0;
    rows.push({
      date: new Date(date),
      count: spotifyCount - lastFmCount,
    });
    date.setDate(date.getDate() + 1);
  }

  // create a virtual document
  const document = new DOMParser().parseFromString(
    `<!DOCTYPE html><html lang="en"></html>`,
    "text/html",
  );

  // output the results
  await display(Plot.plot({
    marks: [
      Plot.line(rows, {
        x: "date",
        y: "count",
        z: null,
        stroke: (r) => r.count >= 0 ? "green" : "red",
      }),
    ],
    document,
  }));
}

await displayPlaysPerDay({
  start: new Date("2023-01-01 00:00:00 EST"),
  // I downloaded all the data from Last.fm around December 1st, but
  // it would be several weeks until I received the data from Spotify.
  end: new Date("2023-12-01 00:00:00 EST"),
});

Outputs:

Chart shows a lot of Spotify plays that don't happen on Last.fm

Positive (green) values show more Spotify plays on a day. Negative (red) values show more Last.fm scrobbles.

In an ideal world, the Spotify plays would be a subset of the Last.fm ones or only lag behind by a day or two (as shown occasionally in the chart), but that doesn't seem to be the case here. I looked further into these numbers and found that there are ~830 plays that are exclusive to Spotify and ~410 scrobbles that are exclusive to Last.fm. It's not clear which side is at fault for plays not making it into Last.fm, and I don't want to speculate in this post.

Overall, this is not terrible, but it also doesn't seem perfect.
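The code behind the exclusive-play counts isn't shown in this post. One possible way to compute them is to pair up plays with the same artist and track whose timestamps fall within some tolerance, then keep whatever is left unpaired. The sketch below assumes that approach; Play, playKey, findExclusivePlays, and the tolerance value are all hypothetical, and the greedy first-match pairing is a simplification.

```typescript
interface Play {
  artist: string;
  track: string;
  date: Date;
}

// Case-insensitive key for "same song by the same artist".
function playKey(play: Play): string {
  return `${play.artist.toLowerCase()}\u0000${play.track.toLowerCase()}`;
}

// Returns the plays in `source` that have no counterpart in `other`, where a
// counterpart is the same artist and track within `toleranceMs`.
function findExclusivePlays(
  source: Play[],
  other: Play[],
  toleranceMs: number,
): Play[] {
  // group the other side's timestamps by artist + track
  const otherTimes = new Map<string, number[]>();
  for (const play of other) {
    const key = playKey(play);
    const times = otherTimes.get(key) ?? [];
    times.push(play.date.getTime());
    otherTimes.set(key, times);
  }

  const exclusive: Play[] = [];
  for (const play of source) {
    const times = otherTimes.get(playKey(play)) ?? [];
    const time = play.date.getTime();
    const index = times.findIndex((t) => Math.abs(t - time) <= toleranceMs);
    if (index === -1) {
      exclusive.push(play);
    } else {
      // consume the match so one scrobble can't pair with two plays
      times.splice(index, 1);
    }
  }
  return exclusive;
}

// Made-up rows for illustration only.
const spotifySample: Play[] = [
  { artist: "Artist A", track: "Song 1", date: new Date("2023-03-01T10:00:00Z") },
  { artist: "Artist A", track: "Song 1", date: new Date("2023-03-01T10:05:00Z") },
  { artist: "Artist B", track: "Song 2", date: new Date("2023-03-01T11:00:00Z") },
];
const lastFmSample: Play[] = [
  { artist: "Artist A", track: "Song 1", date: new Date("2023-03-01T10:01:00Z") },
  { artist: "Artist C", track: "Song 3", date: new Date("2023-03-01T12:00:00Z") },
];

const tenMinutes = 10 * 60 * 1000;
const onlySpotify = findExclusivePlays(spotifySample, lastFmSample, tenMinutes);
const onlyLastFm = findExclusivePlays(lastFmSample, spotifySample, tenMinutes);
console.log(onlySpotify.length, "exclusive to Spotify");
console.log(onlyLastFm.length, "exclusive to Last.fm");
```

The result depends heavily on the tolerance and on how closely artist and track names agree between the two services, so counts produced this way are estimates.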

Past years play count difference - 2017-2022

Looking at past years, here's 2017-2022 inclusive:

await displayPlaysPerDay({
  start: new Date("2017-01-01 00:00:00 EST"),
  end: new Date("2023-01-01 00:00:00 EST"),
});

Chart shows a lot of Spotify plays that don't happen on Last.fm from 2017-2022

The large red spike was me backfilling Last.fm manually with their API because scrobbling got disconnected for a month or so.

Past years play count difference - 2012-2016

Finally, here's 2012-2016 inclusive:

await displayPlaysPerDay({
  start: new Date("2012-01-01 00:00:00 EST"),
  end: new Date("2017-01-01 00:00:00 EST"),
});

Chart shows a lot of Spotify plays that don't happen on Last.fm from 2012-2016

My transition to Spotify started in 2012 and it seems like the Last.fm / Spotify integration for my account was very reliable at the start.

Top 10 artists plays exclusive to Spotify in 2023

I also looked at the plays exclusive to Spotify in 2023 and saw one huge standout, which explains why my top artist on Spotify (Alpha 9) was barely present in the Last.fm data:

Chart shows Alpha 9 having ~110 plays that were never synced to Last.fm
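The code for this chart isn't included in the post. Given a list of plays that only exist on one side, ranking the artists could look roughly like this. It's a sketch under assumptions: topArtists is a hypothetical helper and the sample rows are made up, not my real counts.

```typescript
// Counts plays per artist and returns the most played ones, highest first.
function topArtists(
  plays: { artist: string }[],
  limit = 10,
): [string, number][] {
  const counts = new Map<string, number>();
  for (const { artist } of plays) {
    counts.set(artist, (counts.get(artist) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

// Made-up rows for illustration only.
const exclusiveSample = [
  { artist: "Alpha 9" },
  { artist: "M83" },
  { artist: "Alpha 9" },
  { artist: "Hania Rani" },
  { artist: "Alpha 9" },
  { artist: "Hania Rani" },
];

const topTwo = topArtists(exclusiveSample, 2);
console.log(topTwo);
```

The resulting [artist, count] pairs can then be passed to a bar chart in the same way the line chart was built earlier.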

Thoughts on Deno Jupyter Notebook experience

Overall it was quite enjoyable to use Deno in a Jupyter Notebook.

  1. I like how dependencies are expressed in the notebook itself. Deno's single file scripting support translates well to notebooks.
  2. TypeScript support in the editor makes it easy to understand APIs directly in VS Code.
  3. With TypeScript, type checking errors alert me in real time about certain mistakes.

There were a few annoyances that should be improved over time though.

  1. I didn't like that I had to write this code:
    const document = new DOMParser().parseFromString(
      `<!DOCTYPE html><html lang="en"></html>`,
      "text/html",
    );
    
    I'm not sure what the solution is to getting rid of that, but I feel like it's something that could be abstracted away. That said, it's not too big of a deal.
  2. TypeScript types flowing between notebook cells don't work at the moment.
  3. In VS Code, the "Save As" button for SVGs should show the last saved location rather than the current folder in the "Save As" dialog, in my opinion.

Other than that, I really enjoyed the experience and I'm looking forward to JavaScript/TypeScript becoming more prevalent in Jupyter Notebooks.