Martin Stoffers edited this page Nov 8, 2015 · 15 revisions

Building statistics for shownot.es

REST-API definitions under https://github.com/shownotes/snotes20-restapi/wiki/Statistics (@drake81)

Idea

Analyze all publications on shownot.es and generate statistics from that data. To do so, we need to generate word frequency tables and tf-idf tables for each episode of each podcast. We also need to build an overall corpus and generate the same tables for it. These analyses can be run separately on all text and on all URLs. First, the database tables must be defined. To keep the structure clear, the statistics feature should be implemented as a separate Django application named statistic.
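As a rough illustration of the tables described above, here is a minimal sketch in plain Python. It assumes the episodes are available as a dict mapping an episode id to its text; the function names, the simple regex tokenizer and the unsmoothed tf-idf variant (relative term frequency times log(N / document frequency)) are assumptions made for this example, not decisions taken for the statistic app.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on non-word characters.
    return re.findall(r"\w+", text.lower())


def word_frequencies(text):
    # Word frequency table for a single episode.
    return Counter(tokenize(text))


def corpus_frequencies(documents):
    # Word frequency table for the overall corpus.
    total = Counter()
    for text in documents.values():
        total.update(word_frequencies(text))
    return total


def tf_idf(documents):
    # documents: dict mapping episode id -> text (assumed input shape).
    counts = {doc_id: word_frequencies(text) for doc_id, text in documents.items()}
    n_docs = len(counts)
    # Number of episodes in which each word occurs at least once.
    doc_freq = Counter()
    for freq in counts.values():
        doc_freq.update(freq.keys())
    scores = {}
    for doc_id, freq in counts.items():
        total = sum(freq.values())
        scores[doc_id] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in freq.items()
        }
    return scores
```

With this variant, a word that occurs in every episode gets a score of 0, so only words that distinguish an episode from the rest of the corpus stand out. The same functions could be applied to a list of URLs per episode if the tokenizer were swapped for one that keeps URLs intact.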

Later on, the extracted features must be reachable via REST. To achieve this, we need to build a proper API that generates and delivers the data for each graph to the Angular frontend. The graphs will be implemented with the d3.js library, so we need to determine which JSON data each graph requires.

Interactive TimeLine-Plot for episodes and podcasts (@felipedsp)

Required JSON for graph

{
"foo":"bar"
}

Possible problems

  • All episodes have a created_date, but not all of them have a date
    • date is the date on which the episode went live (obtained from the hoersuppe API)
    • Maybe we should use episode numbers rather than dates as the X axis of the graph
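The trade-off in the list above can be sketched as follows. This is only an illustration: the dict keys "number", "date" and "created_date" and the function name are hypothetical, and the sketch does not define the JSON that the graph finally requires.

```python
from datetime import date


def timeline_points(episodes, use_numbers=True):
    """Return (x, episode) pairs sorted by x.

    Each episode is a dict with the hypothetical keys "number",
    "date" (may be None) and "created_date".
    """
    points = []
    for episode in episodes:
        if use_numbers:
            # Episode numbers are always present, so no fallback is needed.
            x = episode["number"]
        else:
            # Fall back to created_date when no live date is known.
            x = episode.get("date") or episode["created_date"]
        points.append((x, episode))
    return sorted(points, key=lambda point: point[0])
```

Using episode numbers avoids mixing live dates with creation dates on the same axis, at the cost of losing the real time distance between episodes.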

Similarity of Podcasts and Episodes (@bratwurscht)

Required JSON for graph

{
"foo":"bar"
}

Wordclouds beside search (@bratwurscht)

Required JSON for graph

{
"foo":"bar"
}