Amazon Kinesis Data Streams Basics

Amazon Kinesis Data Streams can ingest and store massive amounts of data in real time and make that data available to consumers almost immediately.

Important Terminology

  • Kinesis Data Stream: A Kinesis data stream is a set of one or more shards. Each shard contains a sequence of data records.
  • Data Record: A data record is the unit of data stored in a Kinesis data stream. It is made up of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes.
  • Shard: A shard is a uniquely identified sequence of data records in a stream and the base unit of throughput capacity. Each shard supports up to 5 read transactions per second, up to a maximum total data read rate of 2 MB per second, and up to 1,000 record writes per second, up to a maximum total data write rate of 1 MB per second (including partition keys). In provisioned capacity mode, you specify the number of shards when you create the stream. The total capacity of a stream is the sum of the capacities of its shards, and you can increase or decrease the number of shards as your requirements change. Because you are charged on a per-shard basis, the shard count directly affects cost.
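  • Partition Key: A partition key is used to group data by shard within a stream. Every data record is associated with a partition key, and Kinesis Data Streams uses it to determine which shard the record belongs to: the key is hashed with MD5 to a 128-bit integer, and the record goes to the shard whose hash key range contains that value. Records with the same partition key therefore land on the same shard.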
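  • Sequence Number: Each data record has a sequence number that is unique per partition key within its shard. Kinesis Data Streams assigns the sequence number when a producer writes the record to the stream (for example, with PutRecord or PutRecords), and sequence numbers for the same partition key generally increase over time.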
  • Producer: A producer puts data records into a Kinesis data stream, and each record is routed to a shard according to its partition key. For example, a web server that sends log data to a stream is a producer. A producer can also be your own Amazon Kinesis Data Streams application.
  • Consumer: A consumer gets data records from the shards of a Kinesis data stream and processes them. Consumers are typically your own Amazon Kinesis Data Streams applications.
  • Retention Period: Data records are accessible only for a limited time after they are added to the stream; this length of time is the retention period. The default retention period is 24 hours, which is also the minimum. You can increase it up to 8,760 hours (365 days) with the IncreaseStreamRetentionPeriod operation and decrease it with the DecreaseStreamRetentionPeriod operation. Retention periods longer than 24 hours incur additional charges.
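  • Kinesis Client Library (KCL): The KCL simplifies building consumers that read and process data from a stream. It handles tasks such as load balancing across multiple consumer instances, responding to instance failures, checkpointing processed records, and reacting to resharding.
  • Server-Side Encryption: When server-side encryption is enabled, Kinesis Data Streams automatically encrypts data as a producer puts it into the stream, using an AWS KMS key that you specify.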
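Because a stream's capacity is the sum of its shards' capacities, sizing a provisioned stream comes down to checking each per-shard limit against the expected workload and taking the largest shard count that any one limit requires. The sketch below uses the per-shard limits listed in the Shard entry above. It assumes traffic is spread evenly across shards (a single hot partition key can overload one shard long before the stream-wide totals are reached), and the helper names `stream_capacity` and `shards_needed` are hypothetical.

```python
import math
from dataclasses import dataclass

# Per-shard limits from the Shard entry above.
WRITE_MB_PER_SHARD = 1.0        # max data write rate per shard (MB/s, incl. partition keys)
WRITE_RECORDS_PER_SHARD = 1000  # max records written per second per shard
READ_MB_PER_SHARD = 2.0         # max data read rate per shard (MB/s)
READ_TX_PER_SHARD = 5           # max read transactions per second per shard


@dataclass(frozen=True)
class StreamCapacity:
    write_mb_per_sec: float
    write_records_per_sec: int
    read_mb_per_sec: float
    read_tx_per_sec: int


def stream_capacity(shard_count: int) -> StreamCapacity:
    """A stream's capacity is the sum of the capacities of its shards."""
    return StreamCapacity(
        write_mb_per_sec=shard_count * WRITE_MB_PER_SHARD,
        write_records_per_sec=shard_count * WRITE_RECORDS_PER_SHARD,
        read_mb_per_sec=shard_count * READ_MB_PER_SHARD,
        read_tx_per_sec=shard_count * READ_TX_PER_SHARD,
    )


def shards_needed(write_mb_per_sec: float, write_records_per_sec: int,
                  read_mb_per_sec: float) -> int:
    """Smallest shard count whose summed limits cover the workload.

    Each limit is checked separately; the tightest one decides the answer.
    """
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(write_records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SHARD),
    )


# Example workload: 3.5 MB/s written as 2,500 records/s, 5 MB/s read by consumers.
print(shards_needed(3.5, 2500, 5.0))  # 4 (write bandwidth is the binding limit)
print(stream_capacity(4))
```

In this example the record-count and read-bandwidth limits would each be satisfied by 3 shards, but 3.5 MB/s of writes needs 4, so write bandwidth decides the shard count.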
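Kinesis Data Streams maps each partition key to a 128-bit integer with MD5, and every shard owns a contiguous hash key range; a record goes to the shard whose range contains its key's hash. The sketch below models that routing. Two assumptions are worth flagging: the 16-byte digest is read as a big-endian unsigned integer, and the shards split the hash key space evenly, as for a freshly created stream (after resharding, real ranges can be uneven). The function names are hypothetical, and the shard labels only imitate the format of real shard IDs.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces 128-bit values


def partition_key_hash(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5 (big-endian, assumed)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def even_hash_key_ranges(shard_count: int) -> list[tuple[str, int, int]]:
    """Split the hash key space into contiguous, near-equal ranges.

    Assumption: models an evenly split stream; real ranges come from the service.
    """
    step = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else start + step - 1
        ranges.append((f"shardId-{i:012d}", start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: list[tuple[str, int, int]]) -> str:
    """Return the shard whose hash key range contains the key's hash."""
    h = partition_key_hash(partition_key)
    for shard_id, start, end in ranges:
        if start <= h <= end:
            return shard_id
    raise ValueError("hash key ranges do not cover the whole space")


ranges = even_hash_key_ranges(4)
for key in ["user-17", "user-42", "user-17"]:
    # The same partition key always lands on the same shard.
    print(key, "->", shard_for_key(key, ranges))
```

Because the mapping depends only on the key, choosing partition keys with many distinct values is what spreads load across shards; a few very busy keys concentrate traffic on a few shards.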
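To make data records and sequence numbers concrete, here is a toy, in-memory model of a single shard. `DataRecord`, `ToyShard`, `put`, and `read_after` are hypothetical names, not part of any AWS SDK. Real sequence numbers are large numeric strings assigned by the service and are guaranteed unique only per partition key within a shard; the plain counter used here is a simplification that happens to be unique across the whole toy shard.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class DataRecord:
    """Immutable record: a sequence number, a partition key, and a data blob."""
    sequence_number: int
    partition_key: str
    data: bytes


@dataclass
class ToyShard:
    """In-memory stand-in for one shard (illustration only)."""
    records: list[DataRecord] = field(default_factory=list)
    _next_seq: count = field(default_factory=lambda: count(1), repr=False)

    def put(self, partition_key: str, data: bytes) -> int:
        # The sequence number is assigned on write, never chosen by the producer.
        record = DataRecord(next(self._next_seq), partition_key, data)
        self.records.append(record)
        return record.sequence_number

    def read_after(self, sequence_number: int, limit: int = 100) -> list[DataRecord]:
        # Loosely analogous to reading with an AFTER_SEQUENCE_NUMBER shard iterator.
        later = [r for r in self.records if r.sequence_number > sequence_number]
        return later[:limit]


shard = ToyShard()
first = shard.put("sensor-1", b'{"temp": 21}')
shard.put("sensor-1", b'{"temp": 22}')
shard.put("sensor-2", b'{"temp": 19}')

for record in shard.read_after(first):
    print(record.sequence_number, record.partition_key, record.data)
```

Reading after the first sequence number returns the two later records in the order they were written, and because `DataRecord` is frozen, any attempt to modify a stored record raises an error, mirroring the immutability of real data records.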
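The producer and consumer roles map onto a handful of Kinesis API calls. The sketch below uses the method names of the boto3 Kinesis client (`put_record`, `get_shard_iterator`, `get_records`), but takes the client as a parameter so it can be exercised without AWS; `produce`, `consume_shard`, `max_batches`, and `poll_interval` are hypothetical names chosen for illustration. It reads one shard from its oldest retained record (`TRIM_HORIZON`) and pauses between calls to stay under the per-shard limit of 5 read transactions per second. Production consumers usually rely on the KCL instead, which also covers multiple shards, checkpointing, and failover.

```python
import time
from typing import Any, Iterator


def produce(client: Any, stream_name: str, partition_key: str, payload: bytes) -> str:
    """Put one record and return the sequence number Kinesis assigned to it."""
    response = client.put_record(
        StreamName=stream_name, Data=payload, PartitionKey=partition_key
    )
    # The response also names the receiving shard in response["ShardId"].
    return response["SequenceNumber"]


def consume_shard(client: Any, stream_name: str, shard_id: str,
                  max_batches: int = 10, poll_interval: float = 0.2) -> Iterator[bytes]:
    """Yield record payloads from one shard, starting at the oldest retained record."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
    )["ShardIterator"]
    for _ in range(max_batches):
        if iterator is None:  # no NextShardIterator: the shard is closed and fully read
            break
        response = client.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            yield record["Data"]
        iterator = response.get("NextShardIterator")
        # Pause between GetRecords calls to respect the 5 reads/s per-shard limit.
        time.sleep(poll_interval)
```

Against a real stream you would pass `boto3.client("kinesis")` as `client`; the stream name and shard ID (which `list_shards` can provide) are placeholders. Note that an open shard keeps returning a `NextShardIterator`, so `max_batches` is what bounds this loop in practice.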
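Because raising and lowering the retention period are separate operations, a helper that moves a stream to a target retention has to pick the right call based on the direction of the change. The sketch below uses the boto3 Kinesis client methods `increase_stream_retention_period` and `decrease_stream_retention_period`, and validates against current AWS limits (minimum 24 hours, maximum 8,760 hours, or 365 days). `set_retention` is a hypothetical name, and the client is passed in so the logic can be tested without AWS.

```python
from typing import Any

MIN_RETENTION_HOURS = 24
MAX_RETENTION_HOURS = 8760  # 365 days


def set_retention(client: Any, stream_name: str,
                  current_hours: int, target_hours: int) -> str:
    """Move a stream's retention period to target_hours.

    Returns the name of the Kinesis operation that was called, or "no change".
    """
    if not MIN_RETENTION_HOURS <= target_hours <= MAX_RETENTION_HOURS:
        raise ValueError(
            f"retention must be {MIN_RETENTION_HOURS}-{MAX_RETENTION_HOURS} hours"
        )
    if target_hours > current_hours:
        client.increase_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours
        )
        return "IncreaseStreamRetentionPeriod"
    if target_hours < current_hours:
        client.decrease_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours
        )
        return "DecreaseStreamRetentionPeriod"
    return "no change"
```

With a real client, the current value can be read from `describe_stream_summary(StreamName=...)["StreamDescriptionSummary"]["RetentionPeriodHours"]` before calling the helper. Keep in mind that every hour above the 24-hour default adds to the stream's cost.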