



Data

Introduction

My assignment submission focuses on Dublin, Ireland, and performs postcode clustering to determine which postcodes are the most similar. This will enable individuals looking to move house to choose the most appropriate locations.

The data I will use is a list of Dublin’s postcodes combined with their latitude and longitude coordinates. This dataset didn’t exist in the public domain, so it was made manually: Wikipedia provided the list of Dublin postcodes, and each postcode was then Googled to find its latitude and longitude coordinates. Due to an error in the manually created CSV, a null column had to be dropped.
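The loading step can be sketched roughly as follows. This is a minimal illustration, not the original notebook code: the column names and the three sample rows are placeholders I made up, and the trailing comma on each row is just one plausible way a hand-built CSV ends up with an empty column.

```python
import io

import pandas as pd

# Illustrative stand-in for the hand-built CSV (placeholder values, not the real data).
# The trailing comma on every row produces an extra, completely empty column.
RAW_CSV = """Postcode,Latitude,Longitude,
Dublin 1,53.35,-6.25,
Dublin 2,53.34,-6.25,
Dublin 4,53.33,-6.23,
"""


def load_postcodes(source):
    """Read the postcode CSV and drop any column that contains only nulls."""
    df = pd.read_csv(source)
    # how="all" removes a column only when every value in it is missing.
    return df.dropna(axis=1, how="all")


postcodes = load_postcodes(io.StringIO(RAW_CSV))
print(postcodes)
```

Using `how="all"` keeps any column that has at least one real value, so a single missing coordinate would not silently remove the whole latitude or longitude column.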


In total I had 22 postcodes for Dublin. For each of these postcodes, Foursquare location data was pulled, capturing up to one hundred venues and their venue categories.
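The venue pull might look something like the sketch below. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its usual JSON layout; the write-up does not say which endpoint or search radius was used, so the URL, the `radius` value, the version date and both helper names are assumptions. The sample payload is hand-written for illustration only.

```python
# Assumed endpoint: the legacy Foursquare v2 "explore" API.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_params(lat, lng, client_id, client_secret,
                         version="20180605", radius=500, limit=100):
    """Build the query parameters for one postcode (radius is an assumed value)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,  # up to one hundred venues per postcode
    }


def extract_venues(postcode, payload):
    """Turn one explore response into (postcode, venue name, category) rows."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((postcode, venue["name"], category))
    return rows


# In a real run the payload would come from something like:
#   requests.get(EXPLORE_URL, params=params).json()
# Here a tiny hand-written payload stands in for the response.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Pub", "categories": [{"name": "Pub"}]}},
        {"venue": {"name": "Example Shop", "categories": []}},
    ]}]}
}

params = build_explore_params(53.35, -6.25, "MY_ID", "MY_SECRET")
venue_rows = extract_venues("Dublin 1", sample_payload)
print(venue_rows)
```

Keeping the parsing separate from the network call makes it easy to check the row format without needing API credentials.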

K-means clustering was used to cluster the postcodes to determine which were the most similar. First, the venue categories were converted to indicator columns using one-hot encoding. The encoded data was then grouped to form one row per postcode.
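A small sketch of the encoding and grouping step is shown here. The venue rows are invented for illustration, and taking the mean of each indicator column within a postcode is my assumption about how the grouping was done; the text only says the data was grouped to one row per postcode.

```python
import pandas as pd

# Invented venue rows: one row per venue, tagged with its postcode and category.
venues = pd.DataFrame({
    "Postcode": ["Dublin 1", "Dublin 1", "Dublin 1", "Dublin 1", "Dublin 2", "Dublin 2"],
    "Venue Category": ["Pub", "Pub", "Coffee Shop", "Coffee Shop", "Pub", "Hotel"],
})

# One-hot encode the category: one 0/1 column per distinct venue category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Postcode", venues["Postcode"])

# Collapse to one row per postcode. The mean of each 0/1 column is the share
# of that postcode's venues that fall into the category (assumed aggregation).
grouped = onehot.groupby("Postcode").mean().reset_index()
print(grouped)
```

With this aggregation each row sums to 1, so postcodes with many venues and postcodes with few venues end up on the same scale before clustering.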

K-means clustering was then performed on the grouped data to assign each postcode to one of the four cluster centroids. The top ten venues per postcode were then found. These were combined into one overall dataset, and the results were plotted using Folium.
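The clustering and ranking steps could be sketched as below, assuming scikit-learn's `KMeans` was the implementation used (the write-up does not name the library). The frequency table is invented, the function names are hypothetical, and the demo call uses two clusters only because the toy table has four rows; the assignment itself used four.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Invented per-postcode venue frequencies (one row per postcode).
grouped = pd.DataFrame({
    "Postcode": ["Dublin 1", "Dublin 2", "Dublin 3", "Dublin 4"],
    "Pub": [0.6, 0.5, 0.1, 0.0],
    "Coffee Shop": [0.3, 0.4, 0.1, 0.2],
    "Clothing Store": [0.1, 0.1, 0.8, 0.8],
})


def cluster_postcodes(grouped, n_clusters=4, seed=0):
    """Fit k-means on the venue frequencies and return a copy with a Cluster column."""
    features = grouped.drop(columns="Postcode")
    model = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(features)
    result = grouped.copy()
    result["Cluster"] = model.labels_
    return result


def top_venues(grouped, n=10):
    """Map each postcode to its n most frequent venue categories."""
    features = grouped.set_index("Postcode")
    return {
        postcode: row.sort_values(ascending=False).head(n).index.tolist()
        for postcode, row in features.iterrows()
    }


clustered = cluster_postcodes(grouped, n_clusters=2)  # toy data, so k=2 here
ranking = top_venues(grouped, n=2)
print(clustered[["Postcode", "Cluster"]])
print(ranking["Dublin 1"])
```

The cluster numbers themselves are arbitrary labels, so what matters when reading the output is which postcodes share a label, not the label's value.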

We can see that, overall, the clustering algorithm has worked well. The city is dominated by two main cluster groupings, clusters 1 and 2, plus one standalone group.

This is likely due to the prevalence of certain venues within the city. Both main clusters are in very developed parts of the city and are likely dominated by service-centric venues such as pubs, clubs and cafes, depending on their clientele. The third, standalone cluster is in a highly commercial suburb dominated by retail shops.

In conclusion, the clustering algorithm has worked well, clearly identifying three standalone clusters within the city.

