Affinio is an advanced marketing intelligence platform that enables brands to understand their users at a deeper and richer level. Affinio’s learning engine extracts marketing insights for its clients by mining billions of social media data points. To store and process billions of social network connections without the overhead of database management, partitioning, and indexing, the Affinio engineering team chose Azure DocumentDB.
Why are NoSQL databases a good fit for social data?
Affinio’s marketing platform extracts data from Twitter and other large social networks to feed its learning engine and derive insights about users and their interests. The biggest dataset consists of approximately one billion social media profiles, growing at 10 million per month. Affinio also needs to store and process a number of other feeds, including tweets (Twitter status messages), geolocation data, and machine learning results that predict which topics are likely to interest which users.
A NoSQL database is a natural choice for these data feeds for a number of reasons:
- The APIs from popular social networks produce data in JSON format.
- The data volume is in the terabytes and needs to be refreshed frequently (with both the volume and the refresh frequency expected to increase rapidly over time).
- Data from multiple social media producers is processed downstream, and each social media channel has its own schema that evolves independently.
- And crucially, a small development team needs to be able to iterate rapidly on new features, which means that the database must be easy to set up, manage, and scale.
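To make the schema point concrete, here is a minimal sketch in plain Python (standard library only, not the DocumentDB SDK) of how JSON documents from two channels with different shapes can sit side by side and still be filtered by field. The payloads, field names, and the helpers `load_documents` and `find` are hypothetical and chosen only for illustration; they are not actual social network API schemas or part of Affinio's code.

```python
import json

# Hypothetical payloads: field names are illustrative, not real API schemas.
twitter_profile = (
    '{"source": "twitter", "id": "1", '
    '"screen_name": "alice", "followers_count": 1200}'
)
other_profile = (
    '{"source": "other_network", "id": "a-9", '
    '"display_name": "Bob", "interests": ["cycling", "jazz"]}'
)


def load_documents(raw_payloads):
    """Parse each JSON payload as-is; no shared schema is imposed."""
    return [json.loads(payload) for payload in raw_payloads]


def find(documents, **criteria):
    """Return documents whose fields equal every criterion.

    A document that lacks one of the requested fields simply does not match.
    """
    return [
        doc
        for doc in documents
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


documents = load_documents([twitter_profile, other_profile])
twitter_docs = find(documents, source="twitter")
```

Because neither document has to conform to a predefined table layout, a channel can add or rename fields without a migration step, which is the property the list above relies on.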
Why does Affinio use DocumentDB over AWS DynamoDB and Elasticsearch?
The Affinio engineering team initially built their storage solution on top of Elasticsearch on AWS EC2 virtual machines. While Elasticsearch addressed their need for scalable JSON storage, they realized that setting up and managing their own Elasticsearch servers took precious time away from their development team. They then evaluated Amazon’s DynamoDB service, which was fully managed but did not have the query capabilities that Affinio needed.
Affinio then tried Microsoft Azure DocumentDB, Microsoft’s planet-scale NoSQL database service. DocumentDB is a fully managed NoSQL database with automatic indexing of JSON documents, elastic scaling of throughput and storage, and rich query capabilities, which met all their requirements for functionality and performance. As a result, Affinio decided to migrate its entire stack off AWS and onto Microsoft Azure.
“Before moving to DocumentDB, my developers would need to come to me to confirm that our Elasticsearch deployment would support their data or if I would need to scale things to handle it. DocumentDB removed me as a bottleneck, which has been great for me and them.”
-Stephen Hankinson, CTO, Affinio
Read the Affinio case study to learn more about how Affinio harnesses DocumentDB to process terabytes of social network data, and why they chose DocumentDB over Amazon DynamoDB and Elasticsearch:
https://customers.microsoft.com/en-US/story/CS-06632