I am trying to learn a bit of Spanish very quickly.
I’m going to Peru in a few months, and I think it would be fun to see how much I can learn for the trip. I have never formally studied Spanish, but I’m familiar with other Romance languages, so I feel like I’m picking it up pretty quickly. I’ve been listening to Spanish music and Duolingo’s excellent Spanish podcast.
I am hoping to pick up the most common words and phrases and thought I would take a text analysis approach. I picked a few articles from a Peruvian news site, scraped their text, and then visualized the most common words.
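The counting step can be sketched in a few lines. I did the real analysis in R, so the snippet below is a Python stand-in rather than the code on GitHub, and the two sample sentences are made up for illustration. It lowercases each text, keeps only runs of letters (so accented characters like á, é, and ñ stay inside their words), and tallies the results.

```python
import re
import unicodedata
from collections import Counter

def word_frequencies(texts):
    """Count lowercase, letters-only word tokens across a list of article texts."""
    counts = Counter()
    for text in texts:
        # Normalize to composed form so "é" is one character, not "e" plus a combining accent.
        text = unicodedata.normalize("NFC", text).lower()
        # [^\W\d_]+ matches runs of Unicode letters: word characters minus digits and "_".
        counts.update(re.findall(r"[^\W\d_]+", text))
    return counts

# Hypothetical sample texts standing in for scraped articles.
articles = [
    "El presidente habló con la prensa.",
    "La economía de Perú creció en el año.",
]
freq = word_frequencies(articles)
print(freq.most_common(3))  # [('el', 2), ('la', 2), ('presidente', 1)]
```

This sketch removes no stop words: for a learner, articles and prepositions like el, la, and de are exactly the words worth learning first.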
I built the visualization in Tableau because I love how easy it is to filter the words by frequency: once you’ve learned the most common words, you can move down the list and focus on the next most common ones.
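Outside Tableau, the same "move down the list" workflow can be approximated by splitting the ranked words into fixed-size study bands. This is a rank-based stand-in for Tableau’s frequency filter, not how the Tableau workbook works; the `frequency_bands` helper, the band size, and the counts below are all hypothetical.

```python
from collections import Counter

def frequency_bands(counts, band_size):
    """Split words into consecutive groups of band_size, most frequent first."""
    ranked = [word for word, _ in counts.most_common()]
    return [ranked[i:i + band_size] for i in range(0, len(ranked), band_size)]

# Hypothetical word counts; in practice these would come from the scraped articles.
counts = Counter({"de": 40, "la": 35, "que": 30, "el": 28, "en": 20, "y": 18, "perú": 5})
bands = frequency_bands(counts, 3)
for number, band in enumerate(bands, start=1):
    print(number, band)
# 1 ['de', 'la', 'que']
# 2 ['el', 'en', 'y']
# 3 ['perú']
```

Fixed-size bands keep each study session the same length; a count threshold would instead group words by how often they appear, which is closer to filtering on frequency directly.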
I visualized the same data in a bar graph:
This data would obviously benefit from more diverse sources, and if the approach turns out to be helpful, I’ll probably add more.
I used R for the web scraping and the word counts. The code is on GitHub, and if you want to do this for another language, there are only a few tweaks you’ll need to make beyond finding relevant articles in that language.
Thanks for reading! Let me know if you have any questions, and if you enjoy analyses like this one, sign up below to get updated when I write a new post.