A quick report applying some of what I’ve learned about data science.
Introduction: Business Problem
I love bubble tea, enough that I’ve thought about opening my own bubble tea shop. There are a few shops in my area, but there are definitely large gaps in where this delicacy is available. If I were to open a bubble tea shop in the Vancouver, Washington area, where would be the best place to open it?
Data
For general venue information, I used Foursquare. It provides excellent location-based data on which venues are in an area and where they are.
Unfortunately, bubble tea shops in the area do not match up perfectly with Foursquare’s data. To get more accurate shop information, I combined my own knowledge of the area with web listings to check every possible shop. Many of the listed venues had either closed or didn’t offer bubble tea, but by the time I had worked through the search results and venue listings, I was fairly confident I had found every real, currently operating bubble tea shop in the area. I gathered coordinates for these locations from Google Maps.
Additionally, I extracted ZIP code area coordinate data by inspecting the source code of the following page: Vancouver, Washington (WA) ZIP Code Maps, Data, Jobs
Methodology
To get a feel for the area I was working with, I used the ZIP code coordinate data to calculate the center of each ZIP code area, and used Python’s folium library to display the city and its ZIP codes. This data also let me calculate a rough area in which to search for venues.
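A minimal sketch of the center and search-area calculation, not the original code, assuming each ZIP code boundary was extracted from the page source as a list of (lat, lon) vertices. The center here is the area-weighted polygon centroid (a planar shoelace approximation, which is fine at ZIP code scale), and the search radius is the great-circle distance from that center to the farthest vertex. The names `polygon_centroid`, `haversine_m`, and `search_radius_m` are illustrative, and the boundary in the example is made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def polygon_centroid(boundary):
    """Area-weighted centroid of a polygon given as (lat, lon) vertices.

    Uses the planar shoelace formula with x = longitude and y = latitude,
    a reasonable approximation for an area the size of a ZIP code.
    """
    area2 = 0.0  # twice the signed area
    sum_lat = sum_lon = 0.0
    n = len(boundary)
    for i in range(n):
        lat1, lon1 = boundary[i]
        lat2, lon2 = boundary[(i + 1) % n]  # wrap around to close the ring
        cross = lon1 * lat2 - lon2 * lat1
        area2 += cross
        sum_lon += (lon1 + lon2) * cross
        sum_lat += (lat1 + lat2) * cross
    # Centroid = sum / (6 * area) = sum / (3 * area2); the sign cancels out.
    return sum_lat / (3 * area2), sum_lon / (3 * area2)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_radius_m(boundary):
    """Distance from the centroid to the farthest boundary vertex."""
    c_lat, c_lon = polygon_centroid(boundary)
    return max(haversine_m(c_lat, c_lon, lat, lon) for lat, lon in boundary)


# Made-up rectangular boundary, only to show the calls.
boundary = [(45.60, -122.60), (45.60, -122.50), (45.65, -122.50), (45.65, -122.60)]
center = polygon_centroid(boundary)
radius = search_radius_m(boundary)
```

Each center can then be drawn on a folium map, for example with `folium.Marker([lat, lon]).add_to(m)` on a `folium.Map` centered on the city, and each radius reused to size the venue query for that ZIP code.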

Next, I pulled Foursquare venue data. I queried for venues near the center of each ZIP code to get fairly good coverage of the region, then converted the query results into a pandas DataFrame with the following columns: Zip Code, Venue, Venue ID, Venue Latitude, Venue Longitude, and Venue Category.
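A sketch of the flattening step, assuming the responses follow the shape of Foursquare’s older v2 `venues/explore` endpoint (`response` → `groups[0]` → `items` → `venue`). That shape is an assumption here, so adjust the keys to whatever your responses actually contain. The function name `venues_to_rows` is hypothetical, and the sample response is made up.

```python
def venues_to_rows(zip_code, response_json):
    """Flatten one venue-search response into one dict per venue.

    ASSUMPTION: the JSON follows the Foursquare v2 venues/explore shape
    (response -> groups[0] -> items[] -> venue). Adjust keys to match your data.
    """
    rows = []
    for item in response_json["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append({
            "Zip Code": zip_code,
            "Venue": venue["name"],
            "Venue ID": venue["id"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            # Keep the first listed category; some venues have none.
            "Venue Category": categories[0]["name"] if categories else None,
        })
    return rows


# Minimal made-up response, only to show the call.
sample = {"response": {"groups": [{"items": [
    {"venue": {"id": "v1", "name": "Example Cafe",
               "location": {"lat": 45.62, "lng": -122.55},
               "categories": [{"name": "Coffee Shop"}]}},
]}]}}
rows = venues_to_rows("98683", sample)
```

Collecting the rows from every ZIP code query into one list and passing it to `pandas.DataFrame(rows)` yields a table with exactly the columns listed above.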

I explored the data to see which venues were common in the area, mostly to get a feel for it. Coffee shops and pizza places turned out to be the most common.
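Finding the most common categories is a simple frequency count. This sketch works on one dictionary per venue, keyed by the `Venue Category` column named above, and uses `collections.Counter`; with a pandas DataFrame, `df["Venue Category"].value_counts()` does the same job. The rows here are made up.

```python
from collections import Counter


def top_categories(rows, n=5):
    """Return the n most common venue categories as (category, count) pairs."""
    counts = Counter(row["Venue Category"] for row in rows if row["Venue Category"])
    return counts.most_common(n)


# Made-up rows, only to show the call.
rows = [
    {"Venue": "A", "Venue Category": "Coffee Shop"},
    {"Venue": "B", "Venue Category": "Pizza Place"},
    {"Venue": "C", "Venue Category": "Coffee Shop"},
    {"Venue": "D", "Venue Category": None},  # uncategorized venues are skipped
]
print(top_categories(rows, n=2))  # [('Coffee Shop', 2), ('Pizza Place', 1)]
```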

I then checked for venue data related to bubble tea.
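A first pass at this check can be sketched as a filter on category and name. The category label `Bubble Tea Shop` and the name keywords are assumptions chosen for illustration, and the rows are made up. A filter like this only produces candidates: it cannot tell whether a listing is stale or whether a tea shop actually serves bubble tea, which is why the manual verification described in the Data section was still needed.

```python
# ASSUMPTION: "Bubble Tea Shop" as the category label, plus hypothetical name hints.
BUBBLE_TEA_CATEGORY = "Bubble Tea Shop"
NAME_KEYWORDS = ("bubble tea", "boba")


def bubble_tea_candidates(rows):
    """Return rows whose category or venue name suggests bubble tea.

    These are only candidates; each still needs to be verified by hand.
    """
    hits = []
    for row in rows:
        category = row.get("Venue Category") or ""
        name = (row.get("Venue") or "").lower()
        if category == BUBBLE_TEA_CATEGORY or any(k in name for k in NAME_KEYWORDS):
            hits.append(row)
    return hits


# Made-up rows: a plain tea house should not match.
rows = [
    {"Venue": "Example Boba Bar", "Venue Category": "Coffee Shop"},
    {"Venue": "Example Tea House", "Venue Category": "Tea Room"},
    {"Venue": "Example Drinks", "Venue Category": "Bubble Tea Shop"},
]
candidates = bubble_tea_candidates(rows)
```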

As mentioned in the Data section, the bubble tea venue information had accuracy problems, probably for two reasons: the search returned many tea and coffee shops that didn’t sell bubble tea, and some listings were outdated because the shops had closed. So I ended up gathering the data myself as best I could through search engines, Google Maps, and personal knowledge of the area.

With this information, I turned to the original question: where would be the ideal location for another bubble tea shop? I created a grid of points and scored each one on location quality, with the score rising with proximity to other venues and falling with proximity to existing bubble tea shops. In my estimation, this should find a location that is both good for business and lacking in competition from other bubble tea venues.
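The scoring idea can be sketched as follows. The exact formula used for this report isn’t given, so this version is an assumption: each grid point gains an exponentially decaying contribution from every non-bubble-tea venue and loses a larger, weighted contribution from every existing bubble tea shop. The decay scale `scale_m` and `competitor_weight` are hypothetical tuning parameters, the function names are illustrative, and the inputs are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def score_point(lat, lon, venues, competitors, scale_m=1000.0, competitor_weight=5.0):
    """Score one grid point: nearby venues add, nearby bubble tea shops subtract.

    ASSUMPTION: exponential distance decay; scale_m and competitor_weight
    are hypothetical tuning parameters, not values from this report.
    """
    attraction = sum(math.exp(-haversine_m(lat, lon, v_lat, v_lon) / scale_m)
                     for v_lat, v_lon in venues)
    competition = sum(math.exp(-haversine_m(lat, lon, c_lat, c_lon) / scale_m)
                      for c_lat, c_lon in competitors)
    return attraction - competitor_weight * competition


def make_grid(south, west, north, east, steps):
    """steps x steps evenly spaced (lat, lon) points covering a bounding box."""
    return [
        (south + (north - south) * i / (steps - 1),
         west + (east - west) * j / (steps - 1))
        for i in range(steps)
        for j in range(steps)
    ]


def best_location(grid, venues, competitors, **score_kwargs):
    """Grid point with the highest score."""
    return max(grid, key=lambda p: score_point(p[0], p[1], venues, competitors, **score_kwargs))


# Made-up inputs: three venues in one corner, one competitor in the opposite corner.
venues = [(45.60, -122.50), (45.60, -122.50), (45.60, -122.50)]
competitors = [(45.70, -122.60)]
grid = make_grid(45.60, -122.60, 45.70, -122.50, steps=11)
best = best_location(grid, venues, competitors)
```

Scoring every point touches every venue and every shop, so the work grows with (grid points) × (venues + shops). That product is why a grid of several thousand points checked against hundreds of venues gets slow.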

Dot grid of location scores. Black dots are venue locations. Blue dots are locations with negative scores due to their proximity to existing bubble tea shops. Red dots are locations close to many non-bubble-tea venues. The large red dot near the bottom right is the highest-scoring point.
Results
After some tweaking of the logic and the number of grid points, both to refine the display and to get the calculations to finish in under an hour (computing distances to hundreds of venues for every point in a grid of several thousand points was a little too ambitious), I produced a map showing the rough suitability of locations and highlighting the highest-scoring point. That location ended up being near Southeast 164th Avenue and Southeast McGillivray Boulevard.
Discussion
Some of the locations that scored well surprised me, but the logic seems to check out. I think part of the reason is that I’m not used to thinking in terms of venue distribution; in my head, I was prioritizing distance from other bubble tea shops. Being far from the competition, though, is no guarantee of success, and if that were the only criterion, the ideal location for a new shop would be in the middle of nowhere. Any shop without many potential customers nearby is going to fail, so I used a calculation that lets venue density indicate, indirectly, where there are many potential customers.
These results could be improved further with a deeper understanding of who bubble tea’s potential customers are, combined with census information for the area. It may be that proximity to high schools or other types of locations is more advantageous for bubble tea shops than for other venues.
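One way to act on that idea, sketched here under assumptions: weight the venue-proximity part of the score by category, so that categories believed to attract bubble tea customers count for more. The categories and weights below are hypothetical placeholders; in practice they would come from the customer and census analysis suggested above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

# HYPOTHETICAL weights for categories assumed to draw bubble tea customers.
# Categories not listed fall back to DEFAULT_WEIGHT.
CATEGORY_WEIGHTS = {"High School": 3.0, "College": 2.0}
DEFAULT_WEIGHT = 1.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def weighted_attraction(lat, lon, venues, scale_m=1000.0):
    """Category-weighted venue term for one grid point.

    venues: iterable of (lat, lon, category) tuples.
    """
    total = 0.0
    for v_lat, v_lon, category in venues:
        weight = CATEGORY_WEIGHTS.get(category, DEFAULT_WEIGHT)
        total += weight * math.exp(-haversine_m(lat, lon, v_lat, v_lon) / scale_m)
    return total
```

Keeping the weights in a plain dictionary makes it easy to try different assumptions and compare the resulting maps.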
Conclusion
I think this is a solid recommendation for where to look when opening a bubble tea shop. The method used here, combining venue data on the competition with venue data for the area in general to generate a weighted grid of location scores, could also be useful for finding locations for other types of venues in other cities.