Kari Marttila

AWS IoT First Reflections

Introduction

I returned from my two-month summer vacation (yes, very relaxing), and when chatting with our CEO, I was happy to hear that my next project would be cloud and Clojure, just as I had wished before heading for my long, well-deserved summer break. Together with the team, I would be building an IoT storage and analytics platform for one of our customers.

I read the requirements documents and pondered the solution. I read about the AWS IoT Core and AWS IoT Analytics services. Then I decided that I needed more hands-on knowledge of these services before making any architectural decisions, so I ran some experiments in the customer's AWS account.

The Solutions

I considered two different solutions. One solution is to rely as much as possible on two off-the-shelf AWS services: AWS IoT Core and AWS IoT Analytics. That would be an excellent solution: you get much of the needed functionality from the services and don't have to build that much yourself. After consulting the Metosin IoT guru Kimmo Koskinen, I also had another solution in mind - but I'll write about that one in another blog post. This blog post focuses on those two services.

The architecture based on AWS IoT Core and AWS IoT Analytics services is depicted in the diagram below.

IoT solution architecture based on the AWS IoT Analytics service.

AWS Greengrass

AWS provides two components for building the device integration: AWS Greengrass, and FreeRTOS for lower-power devices. AWS Greengrass provides various components and libraries so that a developer can easily integrate the device into the AWS IoT ecosystem in the AWS cloud. I won't say more about AWS Greengrass since it's not the focus of my work in this project - I'm responsible for building the cloud infrastructure and the applications running in the cloud. Another team builds the device capabilities.

AWS IoT Core

AWS IoT Core provides an ingress interface for the devices. The devices need to have certificates installed so that they can authenticate when they subscribe or publish to IoT Core Topics (installing the certificates is typically done at the factory; it is not part of my work and not the focus of this blog post). Based on this authentication, AWS can identify the device as an IoT Thing. The certificate binds together the device, the AWS IoT Thing, and a policy saying what this Thing can do (e.g., which AWS IoT Topics it can subscribe or publish to). So, after everything is set up, the device can publish messages to an AWS IoT Core Topic, and the AWS IoT Core message broker delivers them to the subscribers of that Topic. I used the AWS Tutorials to tweak a small Python "IoT device" running on my laptop, sending messages to the topic:

Publishing message 400 , message:  
{'client_id': 1, 
 'device_id': 7, 
 'voltage': 647, 
 'temperature': 110, 
 'sequence': 400}
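The test device itself is not shown in this post. As a rough idea of what such a publisher does, here is a minimal sketch that builds messages shaped like the output above and hands them to a `publish` callable. The topic name, the value ranges, and the `publish` callable are assumptions made for illustration; in a real device, `publish` would wrap the publish call of an MQTT client, for example the one in the AWS IoT Device SDK for Python.

```python
import json
import random
from typing import Callable, Optional

# Hypothetical topic name, not taken from the original setup.
TOPIC = "demo/device-metrics"


def build_message(sequence: int, rng: Optional[random.Random] = None) -> dict:
    """Build one message with the same fields as the sample output."""
    if rng is None:
        rng = random.Random()
    return {
        "client_id": 1,
        "device_id": 7,
        "voltage": rng.randint(0, 1000),     # assumed value range
        "temperature": rng.randint(0, 150),  # assumed value range
        "sequence": sequence,
    }


def publish_messages(
    publish: Callable[[str, str], None],
    count: int,
    rng: Optional[random.Random] = None,
) -> None:
    """Publish `count` JSON messages, numbered from 1, to TOPIC."""
    for sequence in range(1, count + 1):
        message = build_message(sequence, rng)
        print(f"Publishing message {sequence}, message: {message}")
        # `publish` is a stand-in for a real MQTT client's publish call.
        publish(TOPIC, json.dumps(message))
```

Calling `publish_messages(lambda topic, payload: print(topic, payload), count=3)` runs the loop without any AWS connection, which is handy for checking the payload format before wiring in a real client.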

Now I have the development setup ready: two "IoT devices", one publishing messages to the AWS IoT Core Topic and another subscribing to the same Topic - I can see in a console that the messages are arriving.
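For a setup like this, the policy attached to each certificate could look roughly like the following. This is a sketch, not the policy used in the project: the function name is made up, the region, account id, client id, and topic name are placeholders, and a production policy would usually be narrower.

```python
import json


def device_policy(region: str, account_id: str, client_id: str, topic: str) -> dict:
    """Return an AWS IoT policy document letting one client use one topic."""
    arn_prefix = f"arn:aws:iot:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Allow the device to connect with this MQTT client id.
                "Effect": "Allow",
                "Action": "iot:Connect",
                "Resource": f"{arn_prefix}:client/{client_id}",
            },
            {   # Allow publishing to, and receiving messages from, the topic.
                "Effect": "Allow",
                "Action": ["iot:Publish", "iot:Receive"],
                "Resource": f"{arn_prefix}:topic/{topic}",
            },
            {   # Subscribing is authorized against a topic filter resource.
                "Effect": "Allow",
                "Action": "iot:Subscribe",
                "Resource": f"{arn_prefix}:topicfilter/{topic}",
            },
        ],
    }


# Placeholder values only; none of these come from the real environment.
policy = device_policy("eu-west-1", "123456789012", "laptop-device-1", "demo/device-metrics")
print(json.dumps(policy, indent=2))
```

A publish-only device would drop the `iot:Subscribe` and `iot:Receive` entries, which is one way the policy expresses "what this Thing can do".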

AWS IoT Analytics

AWS IoT Analytics Pipeline Options.

AWS IoT Analytics provides an off-the-shelf solution for storing and analyzing IoT messages. The story goes like this:

  • Channel. First, you need to create an AWS IoT Analytics Channel. The channel can be integrated with an AWS IoT Core Topic as depicted in the architecture diagram.
  • Pipeline. An AWS IoT Analytics Pipeline provides a place to filter and enrich the messages (see the picture at the beginning of this section).
  • Data Store. An AWS IoT Analytics Data Store is an abstraction for the backend storage, which is S3. You can let AWS IoT Analytics manage the S3 backend or use your own S3 bucket.
  • Dataset. An AWS IoT Analytics Dataset is the query interface to the Data Store. Since the AWS IoT Analytics Data Store is based on the S3 data lake idea, I'm pretty sure the Dataset uses Athena behind the scenes.
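To make the roles of these four pieces concrete, here is a toy model in plain Python. It does not call AWS at all; the function names, the filter rule, and the derived attribute are invented for illustration, and the temperature unit is an assumption.

```python
from typing import Callable, Iterable, List, Optional

Message = dict
# An activity returns the (possibly changed) message, or None to drop it.
Activity = Callable[[Message], Optional[Message]]


def drop_invalid_voltage(message: Message) -> Optional[Message]:
    """Filter activity: keep only messages with a non-negative voltage."""
    return message if message.get("voltage", -1) >= 0 else None


def add_fahrenheit(message: Message) -> Message:
    """Enrich activity: add a derived attribute (assumes Celsius input)."""
    return {**message, "temperature_f": message["temperature"] * 9 / 5 + 32}


def run_pipeline(channel: Iterable[Message], activities: List[Activity]) -> List[Message]:
    """Pass each channel message through the activities into a data store."""
    data_store: List[Message] = []
    for message in channel:
        current: Optional[Message] = message
        for activity in activities:
            current = activity(current)
            if current is None:
                break  # the message was filtered out
        if current is not None:
            data_store.append(current)
    return data_store


def dataset(data_store: List[Message], device_id: int) -> List[Message]:
    """Dataset: a saved query over the data store, here 'rows for one device'."""
    return [row for row in data_store if row["device_id"] == device_id]
```

For example, `dataset(run_pipeline(messages, [drop_invalid_voltage, add_fahrenheit]), device_id=7)` returns the enriched rows for device 7. In the real service the data store lives in S3 and the dataset is defined with a SQL query rather than a Python function.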

I added ECS to the diagram to represent the backend that utilizes the device metrics. In this solution, that backend reads the metrics through the AWS IoT Analytics Dataset.

Conclusions

What are the pros and cons of this solution? One obvious benefit is that you get the overall solution pretty much off the shelf. After experimenting with the solution using the AWS Console, you can create the actual infrastructure using, e.g., Terraform or AWS CloudFormation. Then you are good to go and ready to build the backend that utilizes the device metrics. By the way, it would be interesting to know why the AWS IoT team decided to use an S3-based data lake as their IoT Analytics data store and not their new Amazon Timestream service.

There is another side to the coin, however. The major disadvantage is the data storage model, the data lake. If the backend needs swift response times, this is a bit of a deal-breaker. Assuming the Dataset indeed uses Athena, each query goes to Athena, which starts collecting objects from the S3 bucket (remember: S3 is an object store) and then evaluates the SQL query against the information in those objects (either JSON or Parquet gzipped objects) - this takes time. Another disadvantage is that every query has a cost. A real relational database with indexing might be a better option in this case.
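The difference between the two storage models can be sketched locally. The example below is only an analogy and makes no claim about how Athena works internally: a list of JSON strings stands in for objects in S3, and an in-memory SQLite table with an index stands in for a relational database. The table, column, and function names are made up for the example.

```python
import json
import sqlite3
from typing import List, Optional


def make_messages(devices: int, per_device: int) -> List[dict]:
    """Generate deterministic sample messages for several devices."""
    return [
        {"device_id": d, "sequence": s, "temperature": (d * 31 + s) % 120}
        for d in range(devices)
        for s in range(per_device)
    ]


def latest_by_scan(objects: List[str], device_id: int) -> Optional[dict]:
    """Data-lake style: parse every stored object, keep the newest match."""
    latest = None
    for raw in objects:
        message = json.loads(raw)
        if message["device_id"] == device_id:
            if latest is None or message["sequence"] > latest["sequence"]:
                latest = message
    return latest


def load_into_sqlite(messages: List[dict]) -> sqlite3.Connection:
    """Relational style: store rows once, indexed by device and sequence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metrics (device_id INTEGER, sequence INTEGER, temperature INTEGER)"
    )
    conn.execute("CREATE INDEX idx_device_seq ON metrics (device_id, sequence)")
    conn.executemany(
        "INSERT INTO metrics VALUES (:device_id, :sequence, :temperature)", messages
    )
    return conn


def latest_by_index(conn: sqlite3.Connection, device_id: int) -> Optional[dict]:
    """Ask the database for the newest row of one device."""
    row = conn.execute(
        "SELECT device_id, sequence, temperature FROM metrics "
        "WHERE device_id = ? ORDER BY sequence DESC LIMIT 1",
        (device_id,),
    ).fetchone()
    if row is None:
        return None
    return {"device_id": row[0], "sequence": row[1], "temperature": row[2]}
```

Both functions return the same answer, but the scan has to read and parse every object for every query, while the database can use the index on `(device_id, sequence)` to go to the matching rows. That is the trade-off behind preferring an indexed database when the backend needs fast responses.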

I'll present another solution in my next blog post, so stay tuned!
