
Project Description

This project is developed by members of the Cloud Architects team within Microsoft's Technical Evangelism & Development organization. The patterns presented here are based on real-world solutions that the Cloud Architects team implemented for our partners. We will publish these patterns, including code samples, over the next two months.

[Figure: generic architecture diagram, generic-architecture-v5.png]

Many application types rely on the rapid ingestion of data: sensor readings, log entries, mobile device actions, social media content, or any other live stream of data.

Streaming data of this kind poses a few design challenges, because every stage offers many options. Data could be ingested through IIS + Web API, or perhaps through Node.js. Content may end up in SQL, blobs, tables, or third-party solutions. Analysis could run through HDInsight or through custom code in worker roles or virtual machines.

Architectural Components

Ingestion API Frontend

How will you handle incoming data? At the edge, you have a few choices, such as IIS, OWIN, or Node.js. Data may arrive at the API via several protocols, for example HTTP, TCP, XMPP, or UDP. Additionally, the data itself could be encapsulated in a variety of formats, such as JSON, XML, or an application-specific binary or text format.
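Whatever the front end is, payloads in different formats usually need to be normalized into one internal shape before they are persisted. The following is a minimal sketch of that step, not part of the published samples. The function name parse_payload, the content-type mapping, and the flat one-element-per-field XML layout are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def parse_payload(body: bytes, content_type: str) -> dict:
    """Decode a request body into a dict, based on its declared content type.

    Hypothetical helper: assumes JSON bodies are objects and XML bodies
    are a single root element with one child element per field.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()

    if media_type == "application/json":
        event = json.loads(body.decode("utf-8"))
        if not isinstance(event, dict):
            raise ValueError("JSON payload must be an object")
        return event

    if media_type in ("application/xml", "text/xml"):
        root = ET.fromstring(body)
        # Each child element becomes one field; XML values stay as strings.
        return {child.tag: child.text for child in root}

    raise ValueError(f"unsupported content type: {media_type}")
```

Note that the XML branch keeps every value as a string, so any type conversion would have to happen in a later step.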

Persistence

Once data is received by the front ends, it needs to be persisted for event processing or analysis. There are various choices of persistent stores:
  • Cache: Azure Cache (service, in-role cache), Redis (in-memory or persistent), Memcached, etc.
  • Queue: Azure Storage queues, Azure Service Bus Queue (including pub/sub), single- vs multi-queue configurations (for priority, partitioning), RabbitMQ
  • NoSQL Database: Azure Tables, Cassandra, HBase, others.
  • Unstructured Storage: Data can be buffered and written as binary chunks or log files to Azure blob storage.
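As an illustration of the buffered, chunked writes mentioned for unstructured storage, here is a minimal sketch of an in-memory buffer that hands newline-delimited chunks to a sink once a size limit is reached. The class name ChunkBuffer, the max_bytes default, and the sink callback are assumptions. In a real deployment the sink would wrap a blob upload call, which is deliberately left out here.

```python
from typing import Callable, List


class ChunkBuffer:
    """Accumulate records and pass them to a sink as newline-delimited chunks.

    Hypothetical helper: `sink` stands in for whatever writes a chunk to
    durable storage (for example, a blob upload).
    """

    def __init__(self, sink: Callable[[bytes], None], max_bytes: int = 4 * 1024 * 1024):
        self._sink = sink
        self._max_bytes = max_bytes
        self._lines: List[bytes] = []
        self._size = 0

    def append(self, record: bytes) -> None:
        line = record + b"\n"
        # Flush first if this record would push the chunk past the limit.
        if self._lines and self._size + len(line) > self._max_bytes:
            self.flush()
        self._lines.append(line)
        self._size += len(line)

    def flush(self) -> None:
        """Write any buffered records as one chunk and reset the buffer."""
        if not self._lines:
            return
        self._sink(b"".join(self._lines))
        self._lines = []
        self._size = 0
```

A caller would also need to call flush() on a timer or at shutdown, since records still in the buffer are lost if the process stops before the limit is reached.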

Event processing

As data arrives, some near-realtime analysis may be needed based on specific events, high-priority messages, etc. This can be done with custom processing logic or frameworks such as Orleans or Storm.
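One simple form of custom processing logic is to keep a window of recent values and raise a flag when their mean crosses a threshold. The sketch below shows that idea in-process. The class name SlidingWindow and the threshold rule are assumptions chosen for illustration, and frameworks such as Orleans or Storm would structure this quite differently.

```python
from collections import deque


class SlidingWindow:
    """Track the last `size` readings and flag when their mean is too high."""

    def __init__(self, size: int, threshold: float):
        # A deque with maxlen silently discards the oldest value when full.
        self._values = deque(maxlen=size)
        self._threshold = threshold

    def add(self, value: float) -> bool:
        """Record a reading; return True if the window mean exceeds the threshold."""
        self._values.append(value)
        mean = sum(self._values) / len(self._values)
        return mean > self._threshold
```

For example, with SlidingWindow(3, 50.0), adding 40, 50, and 70 in turn raises the flag only on the third reading, when the mean of the window first exceeds 50.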

Analysis

Data analysis is usually done across a large data set, utilizing tools such as Hadoop for massive-scale map/reduce operations. Azure provides Hadoop-as-a-Service (HDInsight), which is the basis for the Hadoop analysis sample project here.

SQL Server is often used as well, especially with Analysis Services to produce searchable cubes.

MongoDB offers built-in map/reduce along with flexible schema and query constructs, which makes it a popular data-analysis engine.
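To make the shape of a map/reduce operation concrete, here is a small single-process sketch. It is not how Hadoop or MongoDB execute such jobs, since it only mirrors the map, group-by-key, and reduce steps in memory. The function name map_reduce and the record fields used in the usage example are assumptions.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(records, mapper, reducer):
    """Apply `mapper` to each record, group emitted values by key, then reduce.

    `mapper` returns an iterable of (key, value) pairs for one record.
    `reducer` combines two values for the same key into one.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reduce(reducer, values) for key, values in groups.items()}


# Usage: count readings per device (the field names are hypothetical).
readings = [
    {"device": "a", "temp": 1.0},
    {"device": "b", "temp": 2.0},
    {"device": "a", "temp": 3.0},
]
counts = map_reduce(readings, lambda r: [(r["device"], 1)], lambda x, y: x + y)
```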

Patterns

  1. Simple Ingest & Persist - Ingest data via an API front end and store it in persistent stores such as a NoSQL database, queues, or log files. Our implementation will include ingestion via IIS/WebAPI and persistence via Azure Tables, Azure Queues, and Event Tracing for Windows to write to log files. Target release date: Nov 2013
  2. Near-real-time event processing - Ingest data via an API front end, storing the last n data points in a low-latency, high-throughput in-memory cache. Lightweight data analysis is based on the last n values. This setup can tolerate some data loss, trading durability for performance. Our implementation will include IIS/WebAPI for the API front end and Azure Cache as the store. An alternate implementation will replace IIS with OWIN. Another implementation will use Node.js for the front end and Redis for the store. Target release date: Dec 2013
  3. Buffered Ingestion - Data is buffered at the data source, which can be on-premises hosts or devices with storage and processing logic. Such data sources write data directly to durable cloud storage after obtaining a secure URI from the Ingestion API front end. Our implementation will include IIS/WebAPI for the front end and Azure Blob storage for persisting data (with optional notification via Azure Queues). Target release date: Jan 2014
  4. Routed Ingestion - Route ingested data to receivers based on payload attributes or message priority for further processing. Our implementation will perform routing via Azure Service Bus Topics. Priority queues will be implemented as Azure Queues or Service Bus Topics. Target release date: Jan 2014
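The routing idea in pattern 4 can be sketched in-process as a list of subscriptions, each pairing a filter on the message with a receiver. This is only an illustration of the concept and does not use the Azure Service Bus API. The Router class, its method names, and the "priority" attribute are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Predicate = Callable[[Message], bool]
Receiver = Callable[[Message], None]


class Router:
    """Deliver each published message to every receiver whose filter matches."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Predicate, Receiver]] = []

    def subscribe(self, predicate: Predicate, receiver: Receiver) -> None:
        self._subscriptions.append((predicate, receiver))

    def publish(self, message: Message) -> int:
        """Route one message; return how many receivers accepted it."""
        delivered = 0
        for predicate, receiver in self._subscriptions:
            if predicate(message):
                receiver(message)
                delivered += 1
        return delivered


# Usage: one receiver for high-priority messages, one that sees everything.
high_priority: List[Message] = []
archive: List[Message] = []
router = Router()
router.subscribe(lambda m: m.get("priority") == "high", high_priority.append)
router.subscribe(lambda m: True, archive.append)
router.publish({"id": 1, "priority": "high"})
router.publish({"id": 2, "priority": "low"})
```

After these two calls, the high-priority receiver holds only message 1, while the archive receiver holds both.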

Team

This project is developed by members of the Cloud Architects team within Microsoft's Technical Evangelism & Development organization.

Last edited Nov 28, 2013 at 12:45 AM by sebastus, version 45