Tag Archives: kerberos

How to Delegate Administration of Your AWS Managed Microsoft AD Directory to Your On-Premises Active Directory Users

Post Syndicated from Vijay Sharma original https://aws.amazon.com/blogs/security/how-to-delegate-administration-of-your-aws-managed-microsoft-ad-directory-to-your-on-premises-active-directory-users/

You can now enable your on-premises users to administer your AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft AD. Using an Active Directory (AD) trust and the new AWS delegated AD security groups, you can grant administrative permissions to your on-premises users by managing group membership in your on-premises AD directory. This simplifies how you manage who can perform administration. It also makes it easier for your administrators because they can sign in to their existing workstations with their on-premises AD credentials to administer your AWS Managed Microsoft AD.

AWS created new domain local AD security groups (AWS delegated groups) in your AWS Managed Microsoft AD directory. Each AWS delegated group has unique AD administrative permissions. Users who are members of the new AWS delegated groups get permissions to perform administrative tasks, such as adding users, configuring fine-grained password policies, and enabling a Microsoft enterprise Certificate Authority. Because the AWS delegated groups are domain local in scope, you can use them through an AD trust to your on-premises AD. This eliminates the requirement to create and use separate identities to administer your AWS Managed Microsoft AD. Instead, by adding selected on-premises users to the desired AWS delegated groups, you can grant your administrators some or all of the permissions. You can simplify this even further by adding on-premises AD security groups to the AWS delegated groups. This enables you to manage administrative permissions in your AWS Managed Microsoft AD simply by adding and removing users from your on-premises AD security group.

In this blog post, I will show you how to delegate permissions to your on-premises users to perform an administrative task–configuring fine-grained password policies–in your AWS Managed Microsoft AD directory. You can follow the steps in this post to delegate other administrative permissions, such as configuring group Managed Service Accounts and Kerberos constrained delegation, to your on-premises users.

Background

Until now, AWS Managed Microsoft AD delegated administrative permissions for your directory by creating AD security groups in your Organizational Unit (OU) and authorizing these AWS delegated groups for common administrative activities. The admin user in your directory created user accounts within your OU, and granted these users permissions to administer your directory by adding them to one or more of these AWS delegated groups.

However, if you used your AWS Managed Microsoft AD with a trust to an on-premises AD forest, you couldn’t add users from your on-premises directory to these AWS delegated groups. This is because AWS created the AWS delegated groups with global scope, which restricts adding users from another forest. This necessitated that you create different user accounts in AWS Managed Microsoft AD for the purpose of administration. As a result, AD administrators typically had to remember additional credentials for AWS Managed Microsoft AD.

To address this, AWS created new AWS delegated groups with domain local scope in a separate OU called AWS Delegated Groups. These new AWS delegated groups with domain local scope are more flexible and permit adding users and groups from other domains and forests. This allows your admin user to delegate your on-premises users and groups administrative permissions to your AWS Managed Microsoft AD directory.

Note: If you already have an existing AWS Managed Microsoft AD directory containing the original AWS delegated groups with global scope, AWS preserved the original AWS delegated groups in the event you are currently using them with identities in AWS Managed Microsoft AD. AWS recommends that you transition to use the new AWS delegated groups with domain local scope. All newly created AWS Managed Microsoft AD directories have the new AWS delegated groups with domain local scope only.

Now, I will show you the steps to delegate administrative permissions to your on-premises users and groups to configure fine-grained password policies in your AWS Managed Microsoft AD directory.

Prerequisites

For this post, I assume you are familiar with AD security groups and how security group scope rules work. I also assume you are familiar with AD trusts.

The instructions in this blog post require you to have the following components running:

  • An AWS Managed Microsoft AD directory
  • An on-premises AD directory with a trust relationship to your AWS Managed Microsoft AD directory
  • An Amazon EC2 for Windows Server instance joined to your AWS Managed Microsoft AD directory, with the AD administration tools installed
  • An on-premises workstation joined to your on-premises AD directory

Solution overview

I will now show you how to efficiently manage which on-premises users have delegated permissions to administer your directory by using on-premises AD security groups. I will do this by:

  1. Adding on-premises groups to an AWS delegated group. In this step, you sign in to the management instance connected to your AWS Managed Microsoft AD directory as the admin user and add on-premises groups to AWS delegated groups.
  2. Administering your AWS Managed Microsoft AD directory as an on-premises user. In this step, you sign in to a workstation connected to your on-premises AD using your on-premises credentials and administer your AWS Managed Microsoft AD directory.

For the purpose of this blog, I already have an on-premises AD directory (in this case, on-premises.com). I also created an AWS Managed Microsoft AD directory (in this case, corp.example.com) that I use with Amazon RDS for SQL Server. To enable Integrated Windows Authentication to my on-premises.com domain, I established a one-way outgoing trust from my AWS Managed Microsoft AD directory to my on-premises AD directory. To administer my AWS Managed Microsoft AD, I created an Amazon EC2 for Windows Server instance (in this case, Cloud Management). I also have an on-premises workstation (in this case, On-premises Management), that is connected to my on-premises AD directory.
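If you prefer to script this setup, you can also create the trust with the AWS SDK. The following is a minimal boto3 sketch; the directory ID, trust password, and DNS forwarder IP addresses are placeholders for illustration, not values from this walkthrough.

import boto3

ds = boto3.client("ds", region_name="us-east-1")

# Placeholder directory ID, trust password, and on-premises DNS IPs.
response = ds.create_trust(
    DirectoryId="d-1234567890",          # AWS Managed Microsoft AD (corp.example.com)
    RemoteDomainName="on-premises.com",  # on-premises AD domain
    TrustPassword="example-trust-password",
    TrustDirection="One-Way: Outgoing",  # the trust goes out from the AWS directory
    TrustType="Forest",
    ConditionalForwarderIpAddrs=["10.0.10.5", "10.0.11.5"],
)
print(response["TrustId"])

Using TrustDirection="Two-Way" instead would create a two-way trust, which avoids the credential prompt described in step 8 below.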

The following diagram represents the relationships between the on-premises AD and the AWS Managed Microsoft AD directory.

The left side represents the AWS Cloud containing AWS Managed Microsoft AD directory. I connected the directory to the on-premises AD directory via a 1-way forest trust relationship. When AWS created my AWS Managed Microsoft AD directory, AWS created a group called AWS Delegated Fine Grained Password Policy Administrators that has permissions to configure fine-grained password policies in AWS Managed Microsoft AD.

The right side of the diagram represents the on-premises AD directory. I created a global AD security group called On-premises fine grained password policy admins and I configured it so all members can manage fine grained password policies in my on-premises AD. I have two administrators in my company, John and Richard, who I added as members of On-premises fine grained password policy admins. I want to enable John and Richard to also manage fine grained password policies in my AWS Managed Microsoft AD.

While I could add John and Richard to the AWS Delegated Fine Grained Password Policy Administrators individually, I want a more efficient way to delegate and remove permissions for on-premises users to manage fine grained password policies in my AWS Managed Microsoft AD. In fact, I want to assign permissions to the same people that manage password policies in my on-premises directory.

Diagram showing delegation of administrative permissions to on-premises users

To do this, I will:

  1. As the admin user, add the On-premises fine grained password policy admins group as a member of the AWS Delegated Fine Grained Password Policy Administrators security group from my Cloud Management machine.
  2. Manage who can administer password policies in my AWS Managed Microsoft AD directory by adding and removing users as members of the On-premises fine grained password policy admins group. Doing so enables me to perform all my delegation work in my on-premises directory without the need to use a remote desktop protocol (RDP) session to my Cloud Management instance. In this case, Richard, who is a member of the On-premises fine grained password policy admins group, can now administer the AWS Managed Microsoft AD directory from the On-premises Management workstation.

Although I’m showing a specific case using fine grained password policy delegation, you can do this with any of the new AWS delegated groups and your on-premises groups and users.

Let’s get started.

Step 1 – Add on-premises groups to AWS delegated groups

In this step, open an RDP session to the Cloud Management instance and sign in as the admin user in your AWS Managed Microsoft AD directory. Then, add your users and groups from your on-premises AD to AWS delegated groups in your AWS Managed Microsoft AD directory. In this example, I do the following:

  1. Sign in to the Cloud Management instance with the user name admin and the password that you set for the admin user when you created your directory.
  2. Open the Microsoft Windows Server Manager and navigate to Tools > Active Directory Users and Computers.
  3. Switch to the tree view and navigate to corp.example.com > AWS Delegated Groups. Right-click AWS Delegated Fine Grained Password Policy Administrators and select Properties.
  4. In the AWS Delegated Fine Grained Password Policy window, switch to Members tab and choose Add.
  5. In the Select Users, Contacts, Computers, Service Accounts, or Groups window, choose Locations.
  6. In the Locations window, select on-premises.com domain and choose OK.
  7. In the Enter the object names to select box, enter on-premises fine grained password policy admins and choose Check Names.
  8. Because I have a 1-way trust from AWS Managed Microsoft AD to my on-premises AD, Windows prompts me to enter credentials for an on-premises user account that has permissions to complete the search. (If I had a 2-way trust and the admin account in my AWS Managed Microsoft AD had permissions to read my on-premises directory, Windows would not prompt me.) In the Windows Security window, enter the credentials for an account with permissions for on-premises.com and choose OK.
  9. Click OK to add On-premises fine grained password policy admins group as a member of the AWS Delegated Fine Grained Password Policy Administrators group in your AWS Managed Microsoft AD directory.

At this point, any user that is a member of On-premises fine grained password policy admins group has permissions to manage password policies in your AWS Managed Microsoft AD directory.

Step 2 – Administer your AWS Managed Microsoft AD as on-premises user

Any member of the on-premises group(s) that you added to an AWS delegated group inherits the permissions of the AWS delegated group.

In this example, Richard signs in to the On-premises Management instance. Because Richard inherited permissions from the AWS Delegated Fine Grained Password Policy Administrators group, he can now administer fine grained password policies in the AWS Managed Microsoft AD directory using on-premises credentials.

  1. Sign in to the On-premises Management instance as Richard.
  2. Open the Microsoft Windows Server Manager and navigate to Tools > Active Directory Users and Computers.
  3. Switch to the tree view, right-click Active Directory Users and Computers, and then select Change Domain.
  4. In the Change Domain window, enter corp.example.com, and then choose OK.
  5. You’ll be connected to your AWS Managed Microsoft AD domain.

Richard can now administer the password policies. Because John is also a member of the AWS delegated group, John can also perform password policy administration the same way.

In the future, if Richard moves to another division within the company and you hire Judy as a replacement for Richard, you can simply remove Richard from the On-premises fine grained password policy admins group and add Judy to this group. Richard will no longer have administrative permissions, while Judy can now administer password policies for your AWS Managed Microsoft AD directory.

Summary

We’ve tried to make it easier for you to administer your AWS Managed Microsoft AD directory by creating AWS delegated groups with domain local scope. You can add your on-premises AD groups to the AWS delegated groups. You can then control who can administer your directory by managing group membership in your on-premises AD directory. Your administrators can sign in to their existing on-premises workstations using their on-premises credentials and administer your AWS Managed Microsoft AD directory. I encourage you to explore the new AWS delegated security groups by using Active Directory Users and Computers from the management instance for your AWS Managed Microsoft AD. To learn more about AWS Directory Service, see the AWS Directory Service home page. If you have questions, please post them on the Directory Service forum. If you have comments about this post, submit them in the “Comments” section below.

 

Best Practices for Running Apache Kafka on AWS

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-kafka-on-aws/

This post was written in partnership with Intuit to share learnings, best practices, and recommendations for running an Apache Kafka cluster on AWS. Thanks to Vaishak Suresh and his colleagues at Intuit for their contribution and support.

Intuit, in their own words: Intuit, a leading enterprise customer for AWS, is a creator of business and financial management solutions. For more information on how Intuit partners with AWS, see our previous blog post, Real-time Stream Processing Using Apache Spark Streaming and Apache Kafka on AWS. Apache Kafka is an open-source, distributed streaming platform that enables you to build real-time streaming applications.

The best practices described in this post are based on our experience in running and operating large-scale Kafka clusters on AWS for more than two years. Our intent for this post is to help AWS customers who are currently running Kafka on AWS, and also customers who are considering migrating on-premises Kafka deployments to AWS.

AWS offers Amazon Kinesis Data Streams, a Kafka alternative that is fully managed.

Running your Kafka deployment on Amazon EC2 provides a high performance, scalable solution for ingesting streaming data. AWS offers many different instance types and storage option combinations for Kafka deployments. However, given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.

In this blog post, we cover the following aspects of running Kafka clusters on AWS:

  • Deployment considerations and patterns
  • Storage options
  • Instance types
  • Networking
  • Upgrades
  • Performance tuning
  • Monitoring
  • Security
  • Backup and restore

Note: While implementing Kafka clusters in a production environment, make sure also to consider factors like your number of messages, message size, monitoring, failure handling, and any operational issues.

Deployment considerations and patterns

In this section, we discuss various deployment options available for Kafka on AWS, along with pros and cons of each option. A successful deployment starts with thoughtful consideration of these options. Considering availability, consistency, and operational overhead of the deployment helps when choosing the right option.

Single AWS Region, Three Availability Zones, All Active

One typical deployment pattern (all active) is in a single AWS Region with three Availability Zones (AZs). One Kafka cluster is deployed in each AZ along with Apache ZooKeeper and Kafka producer and consumer instances as shown in the illustration following.

In this pattern, this is the Kafka cluster deployment:

  • Kafka producers and Kafka cluster are deployed on each AZ.
  • Data is distributed evenly across three Kafka clusters by using Elastic Load Balancer.
  • Kafka consumers aggregate data from all three Kafka clusters.

Kafka cluster failover occurs this way:

  • Mark down all Kafka producers
  • Stop consumers
  • Debug and restack Kafka
  • Restart consumers
  • Restart Kafka producers

Following are the pros and cons of this pattern.

Pros:

  • Highly available
  • Can sustain the failure of two AZs
  • No message loss during failover
  • Simple deployment

Cons:

  • Very high operational overhead:
    • All changes need to be deployed three times, one for each Kafka cluster
    • Maintaining and monitoring three Kafka clusters
    • Maintaining and monitoring three consumer clusters
A restart is required for patching and upgrading brokers in a Kafka cluster. In this approach, a rolling upgrade is done separately for each cluster.

Single Region, Three Availability Zones, Active-Standby

Another typical deployment pattern (active-standby) is a single AWS Region with one Kafka cluster whose brokers and ZooKeeper nodes are distributed across three AZs. Another, similar Kafka cluster acts as a standby, as shown in the illustration following. You can use Kafka mirroring with MirrorMaker to replicate messages between the two clusters.

In this pattern, this is the Kafka cluster deployment:

  • Kafka producers are deployed on all three AZs.
  • Only one Kafka cluster is deployed across three AZs (active).
  • ZooKeeper instances are deployed on each AZ.
  • Brokers are spread evenly across all three AZs.
  • Kafka consumers can be deployed across all three AZs.
  • Standby Kafka producers and a Multi-AZ Kafka cluster are part of the deployment.

Kafka cluster failover occurs this way:

  • Switch traffic to standby Kafka producers cluster and Kafka cluster.
  • Restart consumers to consume from standby Kafka cluster.

Following are the pros and cons of this pattern.

Pros:

  • Less operational overhead when compared to the first option
  • Only one Kafka cluster to manage and consume data from
  • Can handle single AZ failures without activating a standby Kafka cluster

Cons:

  • Added latency due to cross-AZ data transfer among Kafka brokers
  • For Kafka versions before 0.10, replicas for topic partitions have to be assigned so they’re distributed to the brokers on different AZs (rack-awareness)
  • The cluster can become unavailable in case of a network glitch, where ZooKeeper does not see Kafka brokers
  • Possibility of in-transit message loss during failover

Intuit recommends using a single Kafka cluster in one AWS Region, with brokers distributed across three AZs (single Region, three AZs). This approach offers stronger fault tolerance because a failed AZ won’t cause Kafka downtime.

Storage options

There are two storage options for file storage in Amazon EC2: ephemeral (instance) storage and Amazon EBS.

Ephemeral storage is local to the Amazon EC2 instance. It can provide high IOPS based on the instance type. On the other hand, Amazon EBS volumes offer higher resiliency and you can configure IOPS based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. Your choice of storage is closely related to the type of workload supported by your Kafka cluster.

Kafka provides built-in fault tolerance by replicating data partitions across a configurable number of instances. If a broker fails, you can recover it by fetching all the data from other brokers in the cluster that host the other replicas. Depending on the size of the data transfer, this can affect the recovery process and network traffic, which in turn affect the cluster’s performance.

The following table contrasts the benefits of using an instance store versus using EBS for storage.

Instance store:

  • Instance storage is recommended for large- and medium-sized Kafka clusters. For a large cluster, read/write traffic is distributed across a high number of brokers, so the loss of a broker has less of an impact. However, for smaller clusters, quick recovery of a failed node is important, and recovering a failed broker takes longer and requires more network traffic in a smaller Kafka cluster.
  • Storage-optimized instances like h1, i3, and d2 are an ideal choice for distributed applications like Kafka.

EBS:

  • The primary advantage of using EBS in a Kafka deployment is that it significantly reduces data-transfer traffic when a broker fails or must be replaced. The replacement broker joins the cluster much faster.
  • Data stored on EBS is persisted in case of an instance failure or termination. The broker’s data stored on an EBS volume remains intact, and you can mount the EBS volume to a new EC2 instance. Most of the replicated data for the replacement broker is already available in the EBS volume and need not be copied over the network from another broker. Only the changes made after the original broker failure need to be transferred across the network. That makes this process much faster.

Intuit chose EBS because of their frequent instance restacking requirements and also other benefits provided by EBS.

Generally, Kafka deployments use a replication factor of three. Because EBS offers replication within its service, Intuit chose a replication factor of two instead of three.

Instance types

The choice of instance types is generally driven by the type of storage required for your streaming applications on a Kafka cluster. If your application requires ephemeral storage, h1, i3, and d2 instances are your best option.

Intuit used r3.xlarge instances for their brokers and r3.large for ZooKeeper, with ST1 (throughput optimized HDD) EBS for their Kafka cluster.

Here are sample benchmark numbers from Intuit tests.

Configuration: r3.xlarge brokers with ST1 EBS, 12 brokers, 12 partitions.

Aggregate broker throughput: 346.9 MB/s.

If you need EBS storage, then AWS has a newer-generation r4 instance. The r4 instance is superior to R3 in many ways:

  • It has a faster processor (Broadwell).
  • EBS is optimized by default.
  • It features networking based on Elastic Network Adapter (ENA), with up to 10 Gbps on smaller sizes.
  • It costs 20 percent less than R3.

Note: It’s always best practice to check for the latest changes in instance types.

Networking

The network plays a very important role in a distributed system like Kafka. A fast and reliable network ensures that nodes can communicate with each other easily. The available network throughput controls the maximum amount of traffic that Kafka can handle. Network throughput, combined with disk storage, is often the governing factor for cluster sizing.

If you expect your cluster to receive high read/write traffic, select an instance type that offers 10-Gb/s performance.

In addition, choose an option that keeps interbroker network traffic on the private subnet, because this approach allows clients to connect to the brokers. Communication between brokers and clients uses the same network interface and port. For more details, see the documentation about IP addressing for EC2 instances.

If you are deploying in more than one AWS Region, you can connect the two VPCs in the two AWS Regions using cross-region VPC peering. However, be aware of the networking costs associated with cross-AZ deployments.

Upgrades

Kafka has a history of not being backward compatible, but its support of backward compatibility is getting better. During a Kafka upgrade, you should keep your producer and consumer clients on a version equal to or lower than the version you are upgrading from. After the upgrade is finished, you can start using a new protocol version and any new features it supports. There are three upgrade approaches available, discussed following.

Rolling or in-place upgrade

In a rolling or in-place upgrade scenario, upgrade one Kafka broker at a time. Take into consideration the recommendations for doing rolling restarts to avoid downtime for end users.

Downtime upgrade

If you can afford the downtime, you can take your entire cluster down, upgrade each Kafka broker, and then restart the cluster.

Blue/green upgrade

Intuit followed the blue/green deployment model for their workloads, as described following.

If you can afford to create a separate Kafka cluster and upgrade it, we highly recommend the blue/green upgrade scenario. In this scenario, we recommend that you keep your clusters up-to-date with the latest Kafka version. For additional details on Kafka version upgrades, see the Kafka upgrade documentation.

The following illustration shows a blue/green upgrade.

In this scenario, the upgrade plan works like this:

  • Create a new Kafka cluster on AWS.
  • Create a new Kafka producers stack to point to the new Kafka cluster.
  • Create topics on the new Kafka cluster.
  • Test the green deployment end to end (sanity check).
  • Using Amazon Route 53, switch the Kafka producers stack on AWS to point to the new green Kafka environment that you have created.

The roll-back plan works like this:

  • Use Amazon Route 53 to switch the Kafka producers stack on AWS back to the old Kafka environment.

For additional details on blue/green deployment architecture using Kafka, see the re:Invent presentation Leveraging the Cloud with a Blue-Green Deployment Architecture.
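To make the Route 53 cutover step concrete, here is a minimal boto3 sketch of the switch; the hosted zone ID, record name, and load balancer DNS names are hypothetical.

import boto3

route53 = boto3.client("route53")

# Placeholder hosted zone and record; the CNAME target is flipped from the
# blue (old) producer endpoint to the green (new) one.
route53.change_resource_record_sets(
    HostedZoneId="Z1EXAMPLE",
    ChangeBatch={
        "Comment": "Cut Kafka producers over to the green cluster",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "kafka-producers.example.internal",
                "Type": "CNAME",
                "TTL": 60,
                "ResourceRecords": [
                    {"Value": "green-kafka-elb.us-east-1.elb.amazonaws.com"}
                ],
            },
        }],
    },
)

Rolling back is the same call with the blue endpoint as the record value.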

Performance tuning

You can tune Kafka performance in multiple dimensions. Following are some best practices for performance tuning.

These are some general performance tuning techniques (a producer configuration sketch follows this list):

  • If throughput is less than network capacity, try the following:
    • Add more threads
    • Increase batch size
    • Add more producer instances
    • Add more partitions
  • To improve latency when acks = -1, increase your num.replica.fetchers value.
  • For cross-AZ data transfer, tune your buffer settings for sockets and for OS TCP.
  • Make sure that num.io.threads is greater than the number of disks dedicated for Kafka.
  • Adjust num.network.threads based on the number of producers plus the number of consumers plus the replication factor.
  • Your message size affects your network bandwidth. To get higher performance from a Kafka cluster, select an instance type that offers 10 Gb/s performance.
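As a concrete starting point for the producer-side items above, here is a minimal sketch using the kafka-python client; the broker endpoints, topic name, and batch sizes are placeholders to benchmark against your own workload rather than recommendations.

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092", "broker3:9092"],
    acks="all",                      # equivalent to acks = -1 (wait for all in-sync replicas)
    batch_size=64 * 1024,            # larger batches improve throughput
    linger_ms=20,                    # wait briefly so batches can fill
    compression_type="gzip",         # smaller payloads reduce network bandwidth
    buffer_memory=64 * 1024 * 1024,
)

for i in range(1000):
    producer.send("clickstream", value=b"message-%d" % i)
producer.flush()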

For Java and JVM tuning, try the following:

  • Minimize GC pauses by using the Oracle JDK, which uses the new G1 garbage-first collector.
  • Try to keep the Kafka heap size below 4 GB.

Monitoring

Knowing whether a Kafka cluster is working correctly in a production environment is critical. Sometimes, just knowing that the cluster is up is enough, but Kafka applications have many moving parts to monitor. In fact, it can easily become confusing to understand what’s important to watch and what you can set aside. Items to monitor range from simple metrics about the overall rate of traffic, to producers, consumers, brokers, controller, ZooKeeper, topics, partitions, messages, and so on.

For monitoring, Intuit used several tools, including New Relic, Wavefront, Amazon CloudWatch, and AWS CloudTrail. Our recommended monitoring approach follows.

For system metrics, we recommend that you monitor:

  • CPU load
  • Network metrics
  • File handle usage
  • Disk space
  • Disk I/O performance
  • Garbage collection
  • ZooKeeper

For producers, we recommend that you monitor:

  • Batch-size-avg
  • Compression-rate-avg
  • Waiting-threads
  • Buffer-available-bytes
  • Record-queue-time-max
  • Record-send-rate
  • Records-per-request-avg

For consumers, we recommend that you monitor:

  • Records-lag-max
  • Bytes-consumed-rate
  • Records-consumed-rate
  • Fetch-rate
  • Fetch-latency-avg
  • Fetch-size-avg

Security

Like most distributed systems, Kafka provides the mechanisms to transfer data with relatively high security across the components involved. Depending on your setup, security might involve different services such as encryption, Kerberos, Transport Layer Security (TLS) certificates, and advanced access control list (ACL) setup in brokers and ZooKeeper. The following tells you more about the Intuit approach. For details on Kafka security not covered in this section, see the Kafka documentation.

Encryption at rest

For EBS-backed EC2 instances, you can enable encryption at rest by using Amazon EBS volumes with encryption enabled. Amazon EBS uses AWS Key Management Service (AWS KMS) for encryption. For more details, see Amazon EBS Encryption in the EBS documentation. For instance store–backed EC2 instances, you can enable encryption at rest by using Amazon EC2 instance store encryption.
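As a sketch of what this looks like when scripted, the following boto3 call creates an encrypted ST1 volume for a broker; the Availability Zone, size, and KMS key alias are placeholders, and you can omit KmsKeyId to use the default aws/ebs key.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=1000,                  # GiB
    VolumeType="st1",           # throughput-optimized HDD, as used for the brokers here
    Encrypted=True,
    KmsKeyId="alias/kafka-ebs-key",
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "Name", "Value": "kafka-broker-data"}],
    }],
)
print(volume["VolumeId"])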

Encryption in transit

Kafka uses TLS for client and internode communications.

Authentication

Authentication of connections to brokers from clients (producers and consumers), from other brokers, and from tools uses either Secure Sockets Layer (SSL) or Simple Authentication and Security Layer (SASL).

Kafka supports Kerberos authentication. If you already have a Kerberos server, you can add Kafka to your current configuration.
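For reference, this is roughly what a Kerberos-authenticated client connection looks like with the kafka-python library; the broker endpoint, port, and CA file path are assumptions, the brokers must already be configured for SASL_SSL with GSSAPI, and the client host needs a valid Kerberos ticket (kinit) plus the optional gssapi Python package.

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9094"],
    security_protocol="SASL_SSL",
    sasl_mechanism="GSSAPI",
    sasl_kerberos_service_name="kafka",   # Kerberos service principal name used by the brokers
    ssl_cafile="/etc/kafka/ssl/ca.pem",
)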

Authorization

In Kafka, authorization is pluggable and integration with external authorization services is supported.

Backup and restore

The type of storage used in your deployment dictates your backup and restore strategy.

The best way to back up a Kafka cluster based on instance storage is to set up a second cluster and replicate messages using MirrorMaker. Kafka’s mirroring feature makes it possible to maintain a replica of an existing Kafka cluster. Depending on your setup and requirements, your backup cluster might be in the same AWS Region as your main cluster or in a different one.

For EBS-based deployments, you can enable automatic snapshots of EBS volumes to back up volumes. You can easily create new EBS volumes from these snapshots to restore. We recommend storing backup files in Amazon S3.
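A minimal boto3 sketch of scripting those snapshots follows; the tag filter is a hypothetical convention for identifying broker data volumes.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Snapshot every EBS volume tagged as Kafka broker data.
volumes = ec2.describe_volumes(
    Filters=[{"Name": "tag:Name", "Values": ["kafka-broker-data"]}]
)["Volumes"]

for vol in volumes:
    snap = ec2.create_snapshot(
        VolumeId=vol["VolumeId"],
        Description="Nightly Kafka broker backup",
    )
    print(vol["VolumeId"], "->", snap["SnapshotId"])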

For more information on how to back up in Kafka, see the Kafka documentation.

Conclusion

In this post, we discussed several patterns for running Kafka in the AWS Cloud. AWS also provides an alternative managed solution, Amazon Kinesis Data Streams: there are no servers to manage or scaling cliffs to worry about, you can scale the size of your streaming pipeline in seconds without downtime, data replication across Availability Zones is automatic, and you benefit from security out of the box. Kinesis Data Streams is tightly integrated with a wide variety of AWS services, such as Lambda, Amazon Redshift, and Amazon Elasticsearch Service, and it supports open-source frameworks like Storm, Spark, and Flink. To bridge the two, you may refer to the Kafka-Kinesis Connector.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Implement Serverless Log Analytics Using Amazon Kinesis Analytics and Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics.


About the Author

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 

 

Best Practices for Running Apache Cassandra on Amazon EC2

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-cassandra-on-amazon-ec2/

Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2.

Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud. When combined, you can deploy Cassandra, allowing you to scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.

In this post, we outline three Cassandra deployment options, as well as provide guidance about determining the best practices for your use case in the following areas:

  • Cassandra resource overview
  • Deployment considerations
  • Storage options
  • Networking
  • High availability and resiliency
  • Maintenance
  • Security

Before we jump into best practices for running Cassandra on AWS, we should mention that we have many customers who decided to use DynamoDB instead of managing their own Cassandra cluster. DynamoDB is fully managed, serverless, and provides multi-master cross-region replication, encryption at rest, and managed backup and restore. Integration with AWS Identity and Access Management (IAM) enables DynamoDB customers to implement fine-grained access control for their data security needs.

Several customers who have been using large Cassandra clusters for many years have moved to DynamoDB to eliminate the complications of administering Cassandra clusters and maintaining high availability and durability themselves. Gumgum.com is one customer who migrated to DynamoDB and observed significant savings. For more information, see Moving to Amazon DynamoDB from Hosted Cassandra: A Leap Towards 60% Cost Saving per Year.

AWS provides options, so you’re covered whether you want to run your own NoSQL Cassandra database, or move to a fully managed, serverless DynamoDB database.

Cassandra resource overview

Here’s a short introduction to standard Cassandra resources and how they are implemented with AWS infrastructure. If you’re already familiar with Cassandra or AWS deployments, this can serve as a refresher.

Cluster
  Cassandra: A single Cassandra deployment. This typically consists of multiple physical locations, keyspaces, and physical servers.
  AWS: A logical deployment construct in AWS that maps to an AWS CloudFormation StackSet, which consists of one or many CloudFormation stacks to deploy Cassandra.

Datacenter
  Cassandra: A group of nodes configured as a single replication group.
  AWS: A logical deployment construct. A datacenter is deployed with a single CloudFormation stack consisting of Amazon EC2 instances, networking, storage, and security resources.

Rack
  Cassandra: A collection of servers. A datacenter consists of at least one rack. Cassandra tries to place the replicas on different racks.
  AWS: A single Availability Zone.

Server/node
  Cassandra: A physical or virtual machine running Cassandra software.
  AWS: An EC2 instance.

Token
  Cassandra: Conceptually, the data managed by a cluster is represented as a ring. The ring is then divided into ranges equal to the number of nodes, with each node responsible for one or more ranges of the data. Each node is assigned a token, which is essentially a random number from the range. The token value determines the node’s position in the ring and its range of data.
  AWS: Managed within Cassandra.

Virtual node (vnode)
  Cassandra: Responsible for storing a range of data. Each vnode receives one token in the ring. By default, each node is assigned 256 tokens (vnodes), which are uniformly distributed across all servers in the Cassandra datacenter.
  AWS: Managed within Cassandra.

Replication factor
  Cassandra: The total number of replicas across the cluster.
  AWS: Managed within Cassandra.

Deployment considerations

One of the many benefits of deploying Cassandra on Amazon EC2 is that you can automate many deployment tasks. In addition, AWS includes services, such as CloudFormation, that allow you to describe and provision all your infrastructure resources in your cloud environment.

We recommend orchestrating each Cassandra ring with one CloudFormation template. If you are deploying in multiple AWS Regions, you can use a CloudFormation StackSet to manage those stacks. All the maintenance actions (scaling, upgrading, and backing up) should be scripted with an AWS SDK. These may live as standalone AWS Lambda functions that can be invoked on demand during maintenance.
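As an illustration, the following boto3 sketch launches one such stack for a single ring; the template URL and parameter names are hypothetical and depend on how you author your CloudFormation template. The maintenance Lambda functions mentioned above would use the same SDK pattern.

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="cassandra-ring-us-east-1",
    TemplateURL="https://s3.amazonaws.com/my-bucket/cassandra-datacenter.yaml",
    Parameters=[
        {"ParameterKey": "NodeCount", "ParameterValue": "6"},
        {"ParameterKey": "InstanceType", "ParameterValue": "i3.2xlarge"},
        {"ParameterKey": "AvailabilityZones",
         "ParameterValue": "us-east-1a,us-east-1b,us-east-1c"},
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],
)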

You can get started by following the Cassandra Quick Start deployment guide. Keep in mind that this guide does not address the requirements to operate a production deployment and should be used only for learning more about Cassandra.

Deployment patterns

In this section, we discuss various deployment options available for Cassandra in Amazon EC2. A successful deployment starts with thoughtful consideration of these options. Consider the amount of data, network environment, throughput, and availability.

  • Single AWS Region, 3 Availability Zones
  • Active-active, multi-Region
  • Active-standby, multi-Region

Single region, 3 Availability Zones

In this pattern, you deploy the Cassandra cluster in one AWS Region and three Availability Zones. There is only one ring in the cluster. By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.

To ensure the even distribution of data across all Availability Zones, we recommend that you distribute the EC2 instances evenly in all three Availability Zones. The number of EC2 instances in the cluster is a multiple of three (the replication factor).

This pattern is suitable in situations where the application is deployed in one Region, or where deployments must be constrained to a single Region because of data privacy or other legal requirements.

Pros:

  • Highly available; can sustain failure of one Availability Zone.
  • Simple deployment.

Cons:

  • Does not protect in a situation when many of the resources in a Region are experiencing intermittent failure.

Active-active, multi-Region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.

Pros:

  • No data loss during failover.
  • Highly available; can sustain intermittent failures of many resources in a Region.
  • Read/write traffic can be localized to the Region closest to the user for lower latency and higher performance.

Cons:

  • High operational overhead.
  • The second Region effectively doubles the cost.

Active-standby, multi-Region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

However, the second Region does not receive traffic from the applications. It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster require low recovery point objective (RPO) and recovery time objective (RTO).

Pros:

  • No data loss during failover.
  • Highly available; can sustain failure or partitioning of one whole Region.

Cons:

  • High operational overhead.
  • High latency for writes, with eventual consistency across Regions.
  • The second Region effectively doubles the cost.

Storage options

In on-premises environments, Cassandra deployments use local disks to store data. There are two storage options for EC2 instances: instance store (ephemeral storage) and Amazon EBS.

Your choice of storage is closely related to the type of workload supported by the Cassandra cluster. Instance store works best for most general purpose Cassandra deployments. However, in certain read-heavy clusters, Amazon EBS is a better choice.

The choice of instance type is generally driven by the type of storage:

  • If ephemeral storage is required for your application, a storage-optimized (I3) instance is the best option.
  • If your workload requires Amazon EBS, it is best to go with compute-optimized (C5) instances.
  • Burstable instance types (T2) don’t offer good performance for Cassandra deployments.

Instance store

Ephemeral storage is local to the EC2 instance. It may provide high input/output operations per second (IOPS) based on the instance type. An SSD-based instance store can support up to 3.3M IOPS on I3 instances. This high performance makes it an ideal choice for transactional or write-intensive applications such as Cassandra.

In general, instance storage is recommended for transactional, large, and medium-size Cassandra clusters. For a large cluster, read/write traffic is distributed across a higher number of nodes, so the loss of one node has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important.

As an example, for a cluster with 100 nodes, the loss of one node means a 3.33 percent capacity loss (with a replication factor of three). Similarly, for a cluster with 10 nodes, the loss of one node means a 33 percent capacity loss (with a replication factor of three).

The following comparison summarizes the tradeoffs between ephemeral storage and Amazon EBS.

IOPS (translates to higher query performance)
  Ephemeral storage: up to 3.3M on I3.
  Amazon EBS: 80K per instance; 10K per gp2 volume; 32K per io1 volume.
  Comments: Higher IOPS results in higher query performance on each host. However, Cassandra scales well horizontally, so in general we recommend scaling horizontally first, then scaling vertically to mitigate specific issues. Note: 3.3M IOPS is observed with 100 percent random reads with a 4-KB block size on Amazon Linux.

AWS instance types
  Ephemeral storage: I3.
  Amazon EBS: compute optimized, C5.
  Comments: Being able to choose between different instance types is an advantage in terms of CPU, memory, and so on, for horizontal and vertical scaling.

Backup/recovery
  Ephemeral storage: custom.
  Amazon EBS: basic building blocks are available from AWS.
  Comments: Amazon EBS offers a distinct advantage here; it is a small engineering effort to establish a backup/restore strategy. (a) In case of an instance failure, the EBS volumes from the failing instance are attached to a new instance. (b) In case of an EBS volume failure, the data is restored by creating a new EBS volume from the last snapshot.

Amazon EBS

EBS volumes offer higher resiliency, and IOPS can be configured based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. EBS volumes can support up to 32K IOPS per volume and up to 80K IOPS per instance in a RAID configuration. They have an annualized failure rate (AFR) of 0.1–0.2%, which makes EBS volumes 20 times more reliable than typical commodity disk drives.

The primary advantage of using Amazon EBS in a Cassandra deployment is that it reduces data-transfer traffic significantly when a node fails or must be replaced. The replacement node joins the cluster much faster. However, Amazon EBS could be more expensive, depending on your data storage needs.

Cassandra has built-in fault tolerance by replicating data partitions across a configurable number of nodes. It can not only withstand node failures but, if a node fails, can also recover by copying data from other replicas into a new node. Depending on your application, this could mean copying tens of gigabytes of data. This adds additional delay to the recovery process, increases network traffic, and could possibly impact the performance of the Cassandra cluster during recovery.

Data stored on Amazon EBS is persisted in case of an instance failure or termination. The node’s data stored on an EBS volume remains intact and the EBS volume can be mounted to a new EC2 instance. Most of the replicated data for the replacement node is already available in the EBS volume and won’t need to be copied over the network from another node. Only the changes made after the original node failed need to be transferred across the network. That makes this process much faster.

EBS volumes are snapshotted periodically. So, if a volume fails, a new volume can be created from the last known good snapshot and be attached to a new instance. This is faster than creating a new volume and copying all the data to it.

Most Cassandra deployments use a replication factor of three. However, Amazon EBS does its own replication under the covers for fault tolerance. In practice, EBS volumes are about 20 times more reliable than typical disk drives. So, it is possible to go with a replication factor of two. This not only saves cost, but also enables deployments in a region that has two Availability Zones.

EBS volumes are recommended in case of read-heavy, small clusters (fewer nodes) that require storage of a large amount of data. Keep in mind that the Amazon EBS provisioned IOPS could get expensive. General purpose EBS volumes work best when sized for required performance.

Networking

If your cluster is expected to receive high read/write traffic, select an instance type that offers 10-Gb/s performance. As an example, i3.8xlarge and c5.9xlarge both offer 10-Gb/s networking performance. A smaller instance type in the same family leads to a relatively lower networking throughput.

Cassandra generates a universally unique identifier (UUID) for each node based on the IP address of the instance. This UUID is used for distributing vnodes on the ring.

In an AWS deployment, an IP address is assigned automatically when an EC2 instance is created, so a replacement instance comes up with a new IP address. With the new IP address, the data distribution changes and the whole ring has to be rebalanced. This is not desirable.

To preserve the assigned IP address, use a secondary elastic network interface with a fixed IP address. Before swapping an EC2 instance with a new one, detach the secondary network interface from the old instance and attach it to the new one. This way, the UUID remains same and there is no change in the way that data is distributed in the cluster.
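Here is a minimal boto3 sketch of that swap; the network interface and instance IDs are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Move the fixed-IP secondary interface from the old node to its replacement
# so the Cassandra node keeps its IP address (and therefore its UUID).
eni_id = "eni-0123456789abcdef0"
attachment_id = ec2.describe_network_interfaces(
    NetworkInterfaceIds=[eni_id]
)["NetworkInterfaces"][0]["Attachment"]["AttachmentId"]

ec2.detach_network_interface(AttachmentId=attachment_id)
ec2.get_waiter("network_interface_available").wait(NetworkInterfaceIds=[eni_id])

ec2.attach_network_interface(
    NetworkInterfaceId=eni_id,
    InstanceId="i-0abc1234567890def",
    DeviceIndex=1,   # attach as the secondary interface
)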

If you are deploying in more than one Region, you can connect the two VPCs in the two Regions using cross-Region VPC peering.

High availability and resiliency

Cassandra is designed to be fault-tolerant and highly available during multiple node failures. In the patterns described earlier in this post, you deploy Cassandra to three Availability Zones with a replication factor of three. Even though it limits the AWS Region choices to the Regions with three or more Availability Zones, it offers protection for the cases of one-zone failure and network partitioning within a single Region. The multi-Region deployments described earlier in this post protect when many of the resources in a Region are experiencing intermittent failure.

Resiliency is ensured through infrastructure automation. The deployment patterns all require a quick replacement of the failing nodes. In the case of a regionwide failure, when you deploy with the multi-Region option, traffic can be directed to the other active Region while the infrastructure is recovering in the failing Region. In the case of unforeseen data corruption, the standby cluster can be restored with point-in-time backups stored in Amazon S3.

Maintenance

In this section, we look at ways to ensure that your Cassandra cluster is healthy:

  • Scaling
  • Upgrades
  • Backup and restore

Scaling

Cassandra is horizontally scaled by adding more instances to the ring. We recommend doubling the number of nodes in a cluster to scale up in one scale operation. This leaves the data homogeneously distributed across Availability Zones. Similarly, when scaling down, it’s best to halve the number of instances to keep the data homogeneously distributed.

Cassandra is vertically scaled by increasing the compute power of each node. Larger instance types have proportionally bigger memory. Use deployment automation to swap instances for bigger instances without downtime or data loss.

Upgrades

All three types of upgrades (Cassandra, operating system patching, and instance type changes) follow the same rolling upgrade pattern.

In this process, you start with a new EC2 instance and install software and patches on it. Thereafter, remove one node from the ring. For more information, see Cassandra cluster Rolling upgrade. Then, you detach the secondary network interface from one of the EC2 instances in the ring and attach it to the new EC2 instance. Restart the Cassandra service and wait for it to sync. Repeat this process for all nodes in the cluster.

Backup and restore

Your backup and restore strategy is dependent on the type of storage used in the deployment. Cassandra supports snapshots and incremental backups. When using instance store, a file-based backup tool works best. Customers use rsync or other third-party products to copy data backups from the instance to long-term storage. For more information, see Backing up and restoring data in the DataStax documentation. This process has to be repeated for all instances in the cluster for a complete backup. These backup files are copied back to new instances to restore. We recommend using S3 to durably store backup files for long-term storage.

For Amazon EBS based deployments, you can enable automated snapshots of EBS volumes to back up volumes. New EBS volumes can be easily created from these snapshots for restoration.
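For illustration, a boto3 sketch of the restore path follows; the snapshot ID, Availability Zone, instance ID, and device name are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a volume from the last known good snapshot and attach it to the
# replacement Cassandra node.
volume = ec2.create_volume(
    SnapshotId="snap-0123456789abcdef0",
    AvailabilityZone="us-east-1a",
    VolumeType="gp2",
)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0abc1234567890def",
    Device="/dev/xvdf",
)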

Security

We recommend that you think about security in all aspects of deployment. The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation.

Encryption at rest

Encryption at rest can be achieved by using EBS volumes with encryption enabled. Amazon EBS uses AWS KMS for encryption. For more information, see Amazon EBS Encryption.

Instance store–based deployments require using an encrypted file system or an AWS partner solution. If you are using DataStax Enterprise, it supports transparent data encryption.

Encryption in transit

Cassandra uses Transport Layer Security (TLS) for client and internode communications.

Authentication

The security mechanism is pluggable, which means that you can easily swap out one authentication method for another. You can also provide your own method of authenticating to Cassandra, such as a Kerberos ticket, or if you want to store passwords in a different location, such as an LDAP directory.

Authorization

The authorizer that’s plugged in by default is org.apache.cassandra.auth.AllowAllAuthorizer. Cassandra also provides a role-based access control (RBAC) capability, which allows you to create roles and assign permissions to these roles.

Conclusion

In this post, we discussed several patterns for running Cassandra in the AWS Cloud. This post describes how you can manage Cassandra databases running on Amazon EC2. AWS also provides managed offerings for a number of databases. To learn more, see Purpose-built databases for all your application needs.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Analyze Your Data on Amazon DynamoDB with Apache Spark and Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight.


About the Authors

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 

 

 

Provanshu Dey is a Senior IoT Consultant with AWS Professional Services. He works on highly scalable and reliable IoT, data and machine learning solutions with our customers. In his spare time, he enjoys spending time with his family and tinkering with electronics & gadgets.

 

 

 

Build a Multi-Tenant Amazon EMR Cluster with Kerberos, Microsoft Active Directory Integration and EMRFS Authorization

Post Syndicated from Songzhi Liu original https://aws.amazon.com/blogs/big-data/build-a-multi-tenant-amazon-emr-cluster-with-kerberos-microsoft-active-directory-integration-and-emrfs-authorization/

One of the challenges faced by our customers—especially those in highly regulated industries—is balancing the need for security with flexibility. In this post, we cover how to enable multi-tenancy and increase security by using EMRFS (EMR File System) authorization, the Amazon S3 storage-level authorization on Amazon EMR.

Amazon EMR is an easy, fast, and scalable analytics platform enabling large-scale data processing. EMRFS authorization provides Amazon S3 storage-level authorization by configuring EMRFS with multiple IAM roles. With this functionality enabled, different users and groups can share the same cluster and assume their own IAM roles respectively.

Simply put, on Amazon EMR, we can now have an Amazon EC2 role per user assumed at run time instead of one general EC2 role at the cluster level. When the user is trying to access Amazon S3 resources, Amazon EMR evaluates against a predefined mappings list in EMRFS configurations and picks up the right role for the user.

In this post, we will discuss what EMRFS authorization is (Amazon S3 storage-level access control) and show how to configure the role mappings with detailed examples. You will then have the desired permissions in a multi-tenant environment. We also demo Amazon S3 access from HDFS command line, Apache Hive on Hue, and Apache Spark.

EMRFS authorization for Amazon S3

There are two prerequisites for using this feature:

  1. Users must be authenticated, because EMRFS needs to map the current user/group/prefix to a predefined user/group/prefix. There are several authentication options. In this post, we launch a Kerberos-enabled cluster that manages the Key Distribution Center (KDC) on the master node, and enable a one-way trust from the KDC to a Microsoft Active Directory domain.
  2. The application must support accessing Amazon S3 via EMRFS. Applications that have their own S3FileSystem APIs (for example, Presto) are not supported at this time.

EMRFS supports three types of mapping entries: user, group, and Amazon S3 prefix. Let’s use an example to show how this works.

Assume that you have the following three identities in your organization, defined in Active Directory: an admin user (admin1), a data engineering group (grp_data_engineering), and a data science group (grp_data_science).

To enable all these groups and users to share the EMR cluster, you need to define the following IAM roles: emrfs_auth_user_role_admin_user, emrfs_auth_group_role_data_eng, emrfs_auth_group_role_data_sci, emrfs_auth_prefix_role_default_s3_prefix, and a restricted base EC2 role (EMR_EC2_RestrictedRole).

In this case, you create a separate Amazon EC2 role that doesn’t give any permissions to Amazon S3. Let’s call this role the base role (the EC2 role attached to the EMR cluster), which in this example is named EMR_EC2_RestrictedRole. Then, you define all the Amazon S3 permissions for each specific user or group in their own roles. The restricted role serves as the fallback role when the user doesn’t match any user or group mapping and isn’t trying to access any of the Amazon S3 prefixes defined in the list.

Important: For all other roles, like emrfs_auth_group_role_data_eng, you need to add the base role (EMR_EC2_RestrictedRole) as a trusted entity so that the base role can assume these roles. See the following example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::511586466501:role/EMR_EC2_RestrictedRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

The following is an example policy for the admin user role (emrfs_auth_user_role_admin_user):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}

We are assuming the admin user has access to all buckets in this example.

The following is an example policy for the data science group role (emrfs_auth_group_role_data_sci):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::emrfs-auth-data-science-bucket-demo/*",
                "arn:aws:s3:::emrfs-auth-data-science-bucket-demo"
            ],
            "Action": [
                "s3:*"
            ]
        }
    ]
}

This role grants all Amazon S3 permissions to the emrfs-auth-data-science-bucket-demo bucket and all the objects in it. Similarly, the policy for the role emrfs_auth_group_role_data_eng is shown below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::emrfs-auth-data-engineering-bucket-demo/*",
                "arn:aws:s3:::emrfs-auth-data-engineering-bucket-demo"
            ],
            "Action": [
                "s3:*"
            ]
        }
    ]
}

Example role mappings configuration

To configure EMRFS authorization, you use an EMR security configuration. Here is the role mapping configuration we use in this post.
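A minimal sketch of what the role mappings section of that security configuration can look like, assuming a placeholder AWS account ID (123456789012) and the role and bucket names used in this post:

{
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::123456789012:role/emrfs_auth_user_role_admin_user",
          "IdentifierType": "User",
          "Identifiers": ["admin1"]
        },
        {
          "Role": "arn:aws:iam::123456789012:role/emrfs_auth_group_role_data_eng",
          "IdentifierType": "Group",
          "Identifiers": ["grp_data_engineering"]
        },
        {
          "Role": "arn:aws:iam::123456789012:role/emrfs_auth_group_role_data_sci",
          "IdentifierType": "Group",
          "Identifiers": ["grp_data_science"]
        },
        {
          "Role": "arn:aws:iam::123456789012:role/emrfs_auth_prefix_role_default_s3_prefix",
          "IdentifierType": "Prefix",
          "Identifiers": ["s3://emrfs-auth-default-bucket-demo/"]
        }
      ]
    }
  }
}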

Consider the following scenario.

First, the admin user admin1 tries to log in and run a command to access Amazon S3 data through EMRFS. The first role emrfs_auth_user_role_admin_user on the mapping list, which is a user role, is mapped and picked up. Then admin1 has access to the Amazon S3 locations that are defined in this role.

Then a user from the data engineer group (grp_data_engineering) tries to access a data bucket to run some jobs. When EMRFS sees that the user is a member of the grp_data_engineering group, the group role emrfs_auth_group_role_data_eng is assumed, and the user has proper access to Amazon S3 that is defined in the emrfs_auth_group_role_data_eng role.

Next, the third user comes, who is not an admin and doesn’t belong to any of the groups. After failing evaluation of the top three entries, EMRFS evaluates whether the user is trying to access a certain Amazon S3 prefix defined in the last mapping entry. This type of mapping entry is called the prefix type. If the user is trying to access s3://emrfs-auth-default-bucket-demo/, then the prefix mapping is in effect, and the prefix role emrfs_auth_prefix_role_default_s3_prefix is assumed.

If the user is not trying to access any of the Amazon S3 paths defined in the list—which means the evaluation of all the entries failed—the user only has the permissions defined in the EMR_EC2_RestrictedRole. This role is assumed by the EC2 instances in the cluster.

In this process, all the defined mappings are evaluated in order; the first role that matches is assumed, and the rest of the list is skipped.

Setting up an EMR cluster and mapping Active Directory users and groups

Now that we know how EMRFS authorization role mapping works, the next thing we need to think about is how we can use this feature in an easy and manageable way.

Active Directory setup

Many customers manage their users and groups using Microsoft Active Directory or other tools like OpenLDAP. In this post, we create the Active Directory on an Amazon EC2 instance running Windows Server and create the users and groups we will be using in the example below. After setting up Active Directory, we use the Amazon EMR Kerberos auto-join capability to establish a one-way trust from the KDC running on the EMR master node to the Active Directory domain on the EC2 instance. You can use your own directory service as long as it supports LDAP (Lightweight Directory Access Protocol).

To create and join Active Directory to Amazon EMR, follow the steps in the blog post Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory.

After configuring Active Directory, you can create all the users and groups using the Active Directory tools and add users to the appropriate groups. In this example, we created the users admin1, dataeng1, and datascientist1 and the groups grp_data_engineering and grp_data_science, and then added the users to the right groups.

Join the EMR cluster to an Active Directory domain

For clusters with Kerberos, Amazon EMR now supports automated Active Directory domain joins. You can use the security configuration to configure the one-way trust from the KDC to the Active Directory domain. You also configure the EMRFS role mappings in the same security configuration.

The following is an example of the EMR security configuration with a trusted Active Directory domain EMRKRB.TEST.COM and the EMRFS role mappings as we discussed earlier:
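A hedged sketch of the Kerberos (authentication) portion of such a security configuration; the KDC and admin server host names, ticket lifetime, and provider values are assumptions and will differ in your environment:

{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": {
        "TicketLifetimeInHours": 24,
        "CrossRealmTrustConfiguration": {
          "Realm": "EMRKRB.TEST.COM",
          "Domain": "emrkrb.test.com",
          "AdminServer": "emrkrb.test.com",
          "KdcServer": "emrkrb.test.com"
        }
      }
    }
  }
}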

The EMRFS role mappings go into the AuthorizationConfiguration section of the same security configuration, as shown in the earlier sketch.

We will also provide an example AWS CLI command that you can run.
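As a hedged sketch, assuming the two JSON fragments above are combined into a single document saved in a hypothetical file named security-config.json, you can create the security configuration from the AWS CLI like this:

aws emr create-security-configuration --name MyKerberosConfig \
--security-configuration file://security-config.json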

Launching the EMR cluster and running the tests

Now you have configured Kerberos and EMRFS authorization for Amazon S3.

Additionally, you need to configure Hue with Active Directory using the Amazon EMR configuration API in order to log in using the AD users created before. The following is an example of Hue AD configuration.

[
  {
    "Classification":"hue-ini",
    "Properties":{

    },
    "Configurations":[
      {
        "Classification":"desktop",
        "Properties":{

        },
        "Configurations":[
          {
            "Classification":"ldap",
            "Properties":{

            },
            "Configurations":[
              {
                "Classification":"ldap_servers",
                "Properties":{

                },
                "Configurations":[
                  {
                    "Classification":"AWS",
                    "Properties":{
                      "base_dn":"DC=emrkrb,DC=test,DC=com",
                      "ldap_url":"ldap://emrkrb.test.com",
                      "search_bind_authentication":"false",
                      "bind_dn":"CN=adjoiner,CN=users,DC=emrkrb,DC=test,DC=com",
                      "bind_password":"Abc123456",
                      "create_users_on_login":"true",
                      "nt_domain":"emrkrb.test.com"
                    },
                    "Configurations":[

                    ]
                  }
                ]
              }
            ]
          },
          {
            "Classification":"auth",
            "Properties":{
              "backend":"desktop.auth.backend.LdapBackend"
            },
            "Configurations":[

            ]
          }
        ]
      }
    ]
  }
]

Note: In the preceding configuration JSON file, change the values as required before pasting it into the software setting section in the Amazon EMR console.

Now let’s use this configuration and the security configuration you created before to launch the cluster.

In the Amazon EMR console, choose Create cluster, and then choose Go to advanced options. On the Step 1: Software and Steps page, under Edit software settings (optional), paste the configuration into the box.

The rest of the setup is the same as for an ordinary cluster, except for the security options. In Step 4: Security, under Permissions, choose Custom, and then choose the EMR_EC2_RestrictedRole that you created before as the EC2 instance profile.

Choose subnets that meet the base requirements for a successful Active Directory join (see the Amazon EMR Management Guide for details), and choose security groups that allow the cluster to communicate with the Active Directory domain controller. Choose an EC2 key pair so that you can log in and configure the cluster.

Most importantly, choose the security configuration that you created earlier to enable Kerberos and EMRFS authorization for Amazon S3.

You can use the following AWS CLI command to create a cluster.

aws emr create-cluster --name "TestEMRFSAuthorization" \
--release-label emr-5.10.0 \
--instance-type m3.xlarge \
--instance-count 3 \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair \
--service-role EMR_DefaultRole \
--security-configuration MyKerberosConfig \
--configurations file://hue-config.json \
--applications Name=Hadoop Name=Hive Name=Hue Name=Spark \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=<YourClusterKDCAdminPassword>,ADDomainJoinUser=<YourADUserLogonName>,ADDomainJoinPassword=<YourADUserPassword>,CrossRealmTrustPrincipalPassword=<MatchADTrustPwd>

Note: If you create the cluster using the CLI, you need to save the JSON configuration for Hue into a file named hue-config.json and place it on the machine from which you run the CLI command.

After the cluster reaches the Waiting state, try to connect to the cluster over SSH using an Active Directory user name and password.

ssh -l <YourADUserLogonName>@emrkrb.test.com <EMR IP or DNS name>

Quickly run two commands to show that the Active Directory join is successful:

  1. id [user name] shows the mapped AD users and groups in Linux.
  2. hdfs groups [user name] shows the mapped group in Hadoop.

Both should return the current Active Directory user and group information if the setup is correct.
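For example, assuming the datascientist1 user created earlier:

id datascientist1
hdfs groups datascientist1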

Now, you can test the user mapping first. Log in with the admin1 user, and run a Hadoop list directory command:

hadoop fs -ls s3://emrfs-auth-data-science-bucket-demo/

Now switch to a user from the data engineer group.

Retry the previous command. Because the data engineering group doesn't have access to the data science bucket, it should throw an Amazon S3 Access Denied exception.

Listing the Amazon S3 bucket that the data engineering group does have access to triggers the group mapping.

hadoop fs -ls s3://emrfs-auth-data-engineering-bucket-demo/

It successfully returns the listing results. Next we will test Apache Hive and then Apache Spark.

 

To run jobs successfully, you need to create a home directory in HDFS under /user/<username> for every user, for staging data. You can configure a step that creates these home directories at cluster launch time for every user who has access to the cluster (a sketch follows). In this example, we use Hue, because Hue creates the HDFS home directory for a user at first login. For this to work, Hue also needs to be integrated with the same Active Directory, as shown in the example configuration described earlier.
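A minimal sketch of what such a step might run for one user, using the names from this example (repeat for each user; the group assignment is an assumption):

hdfs dfs -mkdir -p /user/datascientist1
hdfs dfs -chown datascientist1:grp_data_science /user/datascientist1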

First, log in to Hue as a data engineer user, and open a Hive Notebook in Hue. Then run a query to create a new table pointing to the data engineer bucket, s3://emrfs-auth-data-engineering-bucket-demo/table1_data_eng/.
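A hedged HiveQL sketch of such a query, with a hypothetical two-column schema:

CREATE EXTERNAL TABLE IF NOT EXISTS table1_data_eng (
  id INT,
  value STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://emrfs-auth-data-engineering-bucket-demo/table1_data_eng/';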

You can see that the table was created successfully. Now try to create another table pointing to the data science group’s bucket, where the data engineer group doesn’t have access.

It failed and threw an Amazon S3 Access Denied error.

Now insert one line of data into the table you just created successfully.

Next, log out, switch to a data science group user, and create another table, test2_datasci_tb.

The creation is successful.

The last task is to test Spark (it requires the user directory, but Hue created one in the previous step).

Now let’s come back to the command line and run some Spark commands.

Log in to the master node as the datascientist1 user:

Start the SparkSQL interactive shell by typing spark-sql, and run the show tables command. It should list the tables that you created using Hive.

As a data science group user, try select on both tables. You will find that you can only select the table defined in the location that your group has access to.
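A hedged sketch of that session, using the table names from this example; the first select succeeds because the data science role grants access to that bucket, while the second fails with an Amazon S3 Access Denied error:

spark-sql> show tables;
spark-sql> select * from test2_datasci_tb limit 10;
spark-sql> select * from table1_data_eng limit 10;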

Conclusion

EMRFS authorization for Amazon S3 enables you to have multiple roles on the same cluster, providing flexibility to configure a shared cluster for different teams to achieve better efficiency. The Active Directory integration and group mapping make it much easier for you to manage your users and groups, and provides better auditability in a multi-tenant environment.


Additional Reading

If you found this post useful, be sure to check out Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory and Launching and Running an Amazon EMR Cluster inside a VPC.


About the Authors

Songzhi Liu is a Big Data Consultant with AWS Professional Services. He works closely with AWS customers to provide them Big Data & Machine Learning solutions and best practices on the Amazon cloud.

How AWS Managed Microsoft AD Helps to Simplify the Deployment and Improve the Security of Active Directory–Integrated .NET Applications

Post Syndicated from Peter Pereira original https://aws.amazon.com/blogs/security/how-aws-managed-microsoft-ad-helps-to-simplify-the-deployment-and-improve-the-security-of-active-directory-integrated-net-applications/

Companies using .NET applications to access sensitive user information, such as employee salary, Social Security Number, and credit card information, need an easy and secure way to manage access for users and applications.

For example, let’s say that your company has a .NET payroll application. You want your Human Resources (HR) team to manage and update the payroll data for all the employees in your company. You also want your employees to be able to see their own payroll information in the application. To meet these requirements in a user-friendly and secure way, you want to manage access to the .NET application by using your existing Microsoft Active Directory identities. This enables you to provide users with single sign-on (SSO) access to the .NET application and to manage permissions using Active Directory groups. You also want the .NET application to authenticate itself to access the database, and to limit access to the data in the database based on the identity of the application user.

Microsoft Active Directory supports these requirements through group Managed Service Accounts (gMSAs) and Kerberos constrained delegation (KCD). AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft AD, enables you to manage gMSAs and KCD through your administrative account, helping you to migrate and develop .NET applications that need these native Active Directory features.

In this blog post, I give an overview of how to use AWS Managed Microsoft AD to manage gMSAs and KCD and demonstrate how you can configure a gMSA and KCD in six steps for a .NET application:

  1. Create your AWS Managed Microsoft AD.
  2. Create your Amazon RDS for SQL Server database.
  3. Create a gMSA for your .NET application.
  4. Deploy your .NET application.
  5. Configure your .NET application to use the gMSA.
  6. Configure KCD for your .NET application.

Solution overview

The following diagram shows the components of a .NET application that uses Amazon RDS for SQL Server with a gMSA and KCD. The diagram also illustrates authentication and access and is numbered to show the six key steps required to use a gMSA and KCD. To deploy this solution, the AWS Managed Microsoft AD directory must be in the same Amazon Virtual Private Cloud (VPC) as RDS for SQL Server. For this example, my company name is Example Corp., and my directory uses the domain name, example.com.

Diagram showing the components of a .NET application that uses Amazon RDS for SQL Server with a gMSA and KCD

Deploy the solution

The following six steps (numbered to correlate with the preceding diagram) walk you through configuring and using a gMSA and KCD.

1. Create your AWS Managed Microsoft AD directory

Using the Directory Service console, create your AWS Managed Microsoft AD directory in your Amazon VPC. In my example, my domain name is example.com.

Image of creating an AWS Managed Microsoft AD directory in an Amazon VPC

2. Create your Amazon RDS for SQL Server database

Using the RDS console, create your Amazon RDS for SQL Server database instance in the same Amazon VPC where your directory is running, and enable Windows Authentication. To enable Windows Authentication, select your directory in the Microsoft SQL Server Windows Authentication section in the Configure Advanced Settings step of the database creation workflow (see the following screenshot).

In my example, I create my Amazon RDS for SQL Server db-example database, and enable Windows Authentication to allow my db-example database to authenticate against my example.com directory.

Screenshot of configuring advanced settings

3. Create a gMSA for your .NET application

Now that you have deployed your directory, database, and application, you can create a gMSA for your .NET application.

To perform the next steps, you must install the Active Directory administration tools on a Windows server that is joined to your AWS Managed Microsoft AD directory domain. If you do not have a Windows server joined to your directory domain, you can deploy a new Amazon EC2 for Microsoft Windows Server instance and join it to your directory domain.
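If you deploy a new instance for this purpose, a hedged sketch of installing the Active Directory administration tools from an elevated PowerShell session (feature names can vary by Windows Server version):

Install-WindowsFeature -Name GPMC,RSAT-AD-PowerShell,RSAT-AD-AdminCenter,RSAT-ADDS-Tools,RSAT-DNS-Server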

To create a gMSA for your .NET application:

  1. Log on to the instance on which you installed the Active Directory administration tools by using a user that is a member of the Admins security group or the Managed Service Accounts Admins security group in your organizational unit (OU). For my example, I use the Admin user in the example OU.

Screenshot of logging on to the instance on which you installed the Active Directory administration tools

  2. Identify which .NET application servers (hosts) will run your .NET application. Create a new security group in your OU and add your .NET application servers as members of this new group. This allows a group of application servers to use a single gMSA, instead of creating one gMSA for each server. In my example, I create a group, App_server_grp, in my example OU. I also add Appserver1, which is my .NET application server computer name, as a member of this new group.

Screenshot of creating a new security group

  3. Create a gMSA in your directory by running Windows PowerShell from the Start menu. The basic syntax to create the gMSA at the Windows PowerShell command prompt follows.
    PS C:\Users\admin> New-ADServiceAccount -name [gMSAname] -DNSHostName [domainname] -PrincipalsAllowedToRetrieveManagedPassword [AppServersSecurityGroup] -TrustedForDelegation $true <Enter>

    In my example, the gMSAname is gMSAexample, the DNSHostName is example.com, and the PrincipalsAllowedToRetrieveManagedPassword is the recently created security group, App_server_grp.

    PS C:\Users\admin> New-ADServiceAccount -name gMSAexample -DNSHostName example.com -PrincipalsAllowedToRetrieveManagedPassword App_server_grp -TrustedForDelegation $true <Enter>

    To confirm you created the gMSA, you can run the Get-ADServiceAccount command from the PowerShell command prompt.

    PS C:\Users\admin> Get-ADServiceAccount gMSAexample <Enter>
    
    DistinguishedName : CN=gMSAexample,CN=Managed Service Accounts,DC=example,DC=com
    Enabled           : True
    Name              : gMSAexample
    ObjectClass       : msDS-GroupManagedServiceAccount
    ObjectGUID        : 24d8b68d-36d5-4dc3-b0a9-edbbb5dc8a5b
    SamAccountName    : gMSAexample$
    SID               : S-1-5-21-2100421304-991410377-951759617-1603
    UserPrincipalName :

    You can also confirm that you created the gMSA by opening the Active Directory Users and Computers utility located in your Administrative Tools folder, expanding the domain (example.com in my case), and expanding the Managed Service Accounts folder.
    Screenshot of confirming the creation of the gMSA

4. Deploy your .NET application

Deploy your .NET application on IIS on Amazon EC2 for Windows Server instances. For this step, I assume you are the application’s expert and already know how to deploy it. Make sure that all of your instances are joined to your directory.

5. Configure your .NET application to use the gMSA

You can configure your .NET application to use the gMSA to enforce strong password security policy and ensure password rotation of your service account. This helps to improve the security and simplify the management of your .NET application. Configure your .NET application in two steps:

  1. Grant the gMSA the required permissions to run your .NET application in the respective application folders. This is a critical step because when you change the application pool identity account to use the gMSA, downtime can occur if the gMSA does not have the application's required permissions. Therefore, make sure you first test the configuration in your development and test environments.
  2. Configure your application pool identity on IIS to use the gMSA as the service account. When you configure a gMSA as the service account, you include the $ at the end of the gMSA name. You do not need to provide a password because AWS Managed Microsoft AD automatically creates and rotates the password. In my example, my service account is gMSAexample$, as shown in the following screenshot (a PowerShell alternative is sketched after the screenshot).

Screenshot of configuring application pool identity
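If you prefer to script this instead of using the IIS Manager console, a hedged PowerShell sketch (the application pool name ExampleAppPool is hypothetical, and the domain NetBIOS name is assumed to be example):

Import-Module WebAdministration
$pool = "ExampleAppPool"
# Use a specific account as the application pool identity
Set-ItemProperty "IIS:\AppPools\$pool" -Name processModel.identityType -Value SpecificUser
# The gMSA name ends with $ and needs no password; AWS Managed Microsoft AD rotates it automatically
Set-ItemProperty "IIS:\AppPools\$pool" -Name processModel.userName -Value 'example\gMSAexample$'
Set-ItemProperty "IIS:\AppPools\$pool" -Name processModel.password -Value ""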

You have completed all the steps to use gMSA to create and rotate your .NET application service account password! Now, you will configure KCD for your .NET application.

6. Configure KCD for your .NET application

You now are ready to allow your .NET application to have access to other services by using the user identity’s permissions instead of the application service account’s permissions. Note that KCD and gMSA are independent features, which means you do not have to create a gMSA to use KCD. For this example, I am using both features to show how you can use them together. To configure a regular service account such as a user or local built-in account, see the Kerberos constrained delegation with ASP.NET blog post on MSDN.

In my example, my goal is to delegate to the gMSAexample account the ability to enforce the user’s permissions to my db-example SQL Server database, instead of the gMSAexample account’s permissions. For this, I have to update the msDS-AllowedToDelegateTo gMSA attribute. The value for this attribute is the service principal name (SPN) of the service instance that you are targeting, which in this case is the db-example Amazon RDS for SQL Server database.

The SPN format for the msDS-AllowedToDelegateTo attribute is a combination of the service class, the Kerberos authentication endpoint, and the port number. The Amazon RDS for SQL Server Kerberos authentication endpoint format is [database_name].[domain_name]. The value for my msDS-AllowedToDelegateTo attribute is MSSQLSvc/db-example.example.com:1433, where MSSQLSvc and 1433 are the SQL Server Database service class and port number standards, respectively.

Follow these steps to perform the msDS-AllowedToDelegateTo gMSA attribute configuration:

  1. Log on to your Active Directory management instance with a user identity that is a member of the Kerberos Delegation Admins security group. In this case, I will use admin.
  2. Open the Active Directory Users and Computers utility located in your Administrative Tools folder, choose View, and then choose Advanced Features.
  3. Expand your domain name (example.com in this example), and then choose the Managed Service Accounts container. Right-click the gMSA account for the application pool you want to enable for Kerberos delegation, choose Properties, and then choose the Attribute Editor tab.
  4. Search for the msDS-AllowedToDelegateTo attribute on the Attribute Editor tab and choose Edit.
  5. Enter the MSSQLSvc/db-example.example.com:1433 value and choose Add.
    Screenshot of entering the value of the multi-valued string
  6. Choose OK and Apply, and your KCD configuration is complete.
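Alternatively, you can set the attribute from PowerShell. A hedged sketch using the names from this example (run it as a member of the Kerberos Delegation Admins group):

# Add the SQL Server SPN to the gMSA's msDS-AllowedToDelegateTo attribute
Set-ADServiceAccount gMSAexample -Add @{'msDS-AllowedToDelegateTo'='MSSQLSvc/db-example.example.com:1433'}

# Verify the value
Get-ADServiceAccount gMSAexample -Properties msDS-AllowedToDelegateTo |
    Select-Object -ExpandProperty msDS-AllowedToDelegateTo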

Congratulations! At this point, your application is using a gMSA rather than an embedded static user identity and password, and the application is able to access SQL Server using the identity of the application user. The gMSA eliminates the need for you to rotate the application’s password manually, and it allows you to better scope permissions for the application. When you use KCD, you can enforce access to your database consistently based on user identities at the database level, which prevents improper access that might otherwise occur because of an application error.

Summary

In this blog post, I demonstrated how to simplify the deployment and improve the security of your .NET application by using a group Managed Service Account and Kerberos constrained delegation with your AWS Managed Microsoft AD directory. I also outlined the main steps to get your .NET environment up and running on a managed Active Directory and SQL Server infrastructure. This approach will make it easier for you to build new .NET applications in the AWS Cloud or migrate existing ones in a more secure way.

For additional information about using group Managed Service Accounts and Kerberos constrained delegation with your AWS Managed Microsoft AD directory, see the AWS Directory Service documentation.

To learn more about AWS Directory Service, see the AWS Directory Service home page. If you have questions about this post or its solution, start a new thread on the Directory Service forum.

– Peter

Introducing AWS Directory Service for Microsoft Active Directory (Standard Edition)

Post Syndicated from Peter Pereira original https://aws.amazon.com/blogs/security/introducing-aws-directory-service-for-microsoft-active-directory-standard-edition/

Today, AWS introduced AWS Directory Service for Microsoft Active Directory (Standard Edition), also known as AWS Microsoft AD (Standard Edition), which is managed Microsoft Active Directory (AD) that is performance optimized for small and midsize businesses. AWS Microsoft AD (Standard Edition) offers you a highly available and cost-effective primary directory in the AWS Cloud that you can use to manage users, groups, and computers. It enables you to join Amazon EC2 instances to your domain easily and supports many AWS and third-party applications and services. It also can support most of the common use cases of small and midsize businesses. When you use AWS Microsoft AD (Standard Edition) as your primary directory, you can manage access and provide single sign-on (SSO) to cloud applications such as Microsoft Office 365. If you have an existing Microsoft AD directory, you can also use AWS Microsoft AD (Standard Edition) as a resource forest that contains primarily computers and groups, allowing you to migrate your AD-aware applications to the AWS Cloud while using existing on-premises AD credentials.

In this blog post, I help you get started by answering three main questions about AWS Microsoft AD (Standard Edition):

  1. What do I get?
  2. How can I use it?
  3. What are the key features?

After answering these questions, I show how you can get started with creating and using your own AWS Microsoft AD (Standard Edition) directory.

1. What do I get?

When you create an AWS Microsoft AD (Standard Edition) directory, AWS deploys two Microsoft AD domain controllers powered by Microsoft Windows Server 2012 R2 in your Amazon Virtual Private Cloud (VPC). To help deliver high availability, the domain controllers run in different Availability Zones in the AWS Region of your choice.

As a managed service, AWS Microsoft AD (Standard Edition) configures directory replication, automates daily snapshots, and handles all patching and software updates. In addition, AWS Microsoft AD (Standard Edition) monitors and automatically recovers domain controllers in the event of a failure.

AWS Microsoft AD (Standard Edition) has been optimized as a primary directory for small and midsize businesses with the capacity to support approximately 5,000 employees. With 1 GB of directory object storage, AWS Microsoft AD (Standard Edition) has the capacity to store 30,000 or more total directory objects (users, groups, and computers). AWS Microsoft AD (Standard Edition) also gives you the option to add domain controllers to meet the specific performance demands of your applications. You also can use AWS Microsoft AD (Standard Edition) as a resource forest with a trust relationship to your on-premises directory.

2. How can I use it?

With AWS Microsoft AD (Standard Edition), you can share a single directory for multiple use cases. For example, you can share a directory to authenticate and authorize access for .NET applications, Amazon RDS for SQL Server with Windows Authentication enabled, and Amazon Chime for messaging and video conferencing.

The following diagram shows some of the use cases for your AWS Microsoft AD (Standard Edition) directory, including the ability to grant your users access to external cloud applications and allow your on-premises AD users to manage and have access to resources in the AWS Cloud.

Diagram showing some ways you can use AWS Microsoft AD (Standard Edition)

Use case 1: Sign in to AWS applications and services with AD credentials

You can enable multiple AWS applications and services such as the AWS Management Console, Amazon WorkSpaces, and Amazon RDS for SQL Server to use your AWS Microsoft AD (Standard Edition) directory. When you enable an AWS application or service in your directory, your users can access the application or service with their AD credentials.

For example, you can enable your users to sign in to the AWS Management Console with their AD credentials. To do this, you enable the AWS Management Console as an application in your directory, and then assign your AD users and groups to IAM roles. When your users sign in to the AWS Management Console, they assume an IAM role to manage AWS resources. This makes it easy for you to grant your users access to the AWS Management Console without needing to configure and manage a separate SAML infrastructure.

Use case 2: Manage Amazon EC2 instances

Using familiar AD administration tools, you can apply AD Group Policy objects (GPOs) to centrally manage your Amazon EC2 for Windows or Linux instances by joining your instances to your AWS Microsoft AD (Standard Edition) domain.
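As one hedged example, assuming a directory domain named example.com and an Admin user (your values will differ), you can join a Windows instance to the domain from PowerShell and then manage it with GPOs:

# Prompts for the domain user's password, then joins the domain and reboots the instance
Add-Computer -DomainName example.com -Credential example\Admin -Restart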

In addition, your users can sign in to your instances with their AD credentials. This eliminates the need to use individual instance credentials or distribute private key (PEM) files. This makes it easier for you to instantly grant or revoke access to users by using AD user administration tools you already use.

Use case 3: Provide directory services to your AD-aware workloads

AWS Microsoft AD (Standard Edition) is an actual Microsoft AD that enables you to run traditional AD-aware workloads such as Remote Desktop Licensing Manager, Microsoft SharePoint, and Microsoft SQL Server Always On in the AWS Cloud. AWS Microsoft AD (Standard Edition) also helps you to simplify and improve the security of AD-integrated .NET applications by using group Managed Service Accounts (gMSAs) and Kerberos constrained delegation (KCD).

Use case 4: SSO to Office 365 and other cloud applications

You can use AWS Microsoft AD (Standard Edition) to provide SSO for cloud applications. You can use Azure AD Connect to synchronize your users into Azure AD, and then use Active Directory Federation Services (AD FS) so that your users can access Microsoft Office 365 and other SAML 2.0 cloud applications by using their AD credentials.

Use case 5: Extend your on-premises AD to the AWS Cloud

If you already have an AD infrastructure and want to use it when migrating AD-aware workloads to the AWS Cloud, AWS Microsoft AD (Standard Edition) can help. You can use AD trusts to connect AWS Microsoft AD (Standard Edition) to your existing AD. This means your users can access AD-aware and AWS applications with their on-premises AD credentials, without needing you to synchronize users, groups, or passwords.

For example, your users can sign in to the AWS Management Console and Amazon WorkSpaces by using their existing AD user names and passwords. Also, when you use AD-aware applications such as SharePoint with AWS Microsoft AD (Standard Edition), your logged-in Windows users can access these applications without needing to enter credentials again.

3. What are the key features?

AWS Microsoft AD (Standard Edition) includes the features detailed in this section.

Extend your AD schema

With AWS Microsoft AD, you can run customized AD-integrated applications that require changes to your directory schema, which defines the structures of your directory. The schema is composed of object classes such as user objects, which contain attributes such as user names. AWS Microsoft AD lets you extend the schema by adding new AD attributes or object classes that are not present in the core AD attributes and classes.

For example, if you have a human resources application that uses employee badge color to assign specific benefits, you can extend the schema to include a badge color attribute in the user object class of your directory. To learn more, see How to Move More Custom Applications to the AWS Cloud with AWS Directory Service.

Create user-specific password policies

With user-specific password policies, you can apply specific restrictions and account lockout policies to different types of users in your AWS Microsoft AD (Standard Edition) domain. For example, you can enforce strong passwords and frequent password change policies for administrators, and use less-restrictive policies with moderate account lockout policies for general users.

Add domain controllers

You can increase the performance and redundancy of your directory by adding domain controllers. This can help improve application performance by enabling directory clients to load-balance their requests across a larger number of domain controllers.

Encrypt directory traffic

You can use AWS Microsoft AD (Standard Edition) to encrypt Lightweight Directory Access Protocol (LDAP) communication between your applications and your directory. By enabling LDAP over Secure Sockets Layer (SSL)/Transport Layer Security (TLS), also called LDAPS, you encrypt your LDAP communications end to end. This helps you to protect sensitive information you keep in your directory when it is accessed over untrusted networks.

Improve the security of signing in to AWS services by using multi-factor authentication (MFA)

You can improve the security of signing in to AWS services, such as Amazon WorkSpaces and Amazon QuickSight, by enabling MFA in your AWS Microsoft AD (Standard Edition) directory. With MFA, your users must enter a one-time passcode (OTP) in addition to their AD user names and passwords to access AWS applications and services you enable in AWS Microsoft AD (Standard Edition).

Get started

To get started, use the Directory Service console to create your first directory with just a few clicks. If you have not used Directory Service before, you may be eligible for a 30-day limited free trial.

Summary

In this blog post, I explained what AWS Microsoft AD (Standard Edition) is and how you can use it. With a single directory, you can address many use cases for your business, making it easier to migrate and run your AD-aware workloads in the AWS Cloud, provide access to AWS applications and services, and connect to other cloud applications. To learn more about AWS Microsoft AD, see the Directory Service home page.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about this blog post, start a new thread on the Directory Service forum.

– Peter

Samba 4.7.0 released

Post Syndicated from corbet original https://lwn.net/Articles/734639/rss

The Samba 4.7.0 release is out. New features include whole DB read locks (a reliability improvement), Active Directory with Kerberos support, detailed audit trails for authentication and authorization activities, a multi-process LDAP server, better read-only domain controller support, and more. See the release notes for details.

PowerMemory – Exploit Windows Credentials In Memory

Post Syndicated from Darknet original http://feedproxy.google.com/~r/darknethackers/~3/xZfoefLWcs0/

PowerMemory is a PowerShell-based tool to exploit Windows credentials present in files and memory. It leverages Microsoft-signed binaries to hack Windows. The method is totally new. It proves that it can be extremely easy to get credentials or any other information from Windows memory without needing to code in C-type languages. In addition,…

Read the full post at darknet.org.uk

Some ideas about electronic voting

Post Syndicated from Delian Delchev original http://feedproxy.google.com/~r/delian/~3/3SzV7avcgtQ/blog-post_24.html

I constantly read hysterical claims from representatives or supporters of parties with a solid pensioner electorate, or of parties relying on less literate citizens, that electronic voting supposedly cannot be secure.

I think this is the most laughable argument the opponents of electronic voting are making ahead of the referendum. Whether electronic voting is secure or not depends solely on how it is implemented. That is a technical problem, and it has no bearing on whether electronic voting should exist at all. Electronic voting can be very secure; in fact, it can be significantly more secure than our current non-electronic, in-person voting. It can also be made very insecure, less secure than our current in-person voting.

There is no fundamental reason that makes remote voting insecure, other than the procedures chosen to conduct it. But that has no bearing on the referendum question itself, whether or not to have electronic voting. These are technical details that should be discussed openly and actively if the referendum passes and triggers a public debate leading to legislative changes that allow electronic voting.

To me, the technology discussions are premature at this stage. They are equivalent to arguing that we should have no voting at all because votes can be falsified. Would we be better off with an unfalsified tsar, or with a party dictatorship? 🙂 There is no essential difference between those arguments, and that is probably why the monarchist movements and the admirers of party dictatorship are the main opponents of electronic voting.

That does not mean I avoid discussing technology; it is a passion of mine. In this text I want to describe a procedural model for how electronic voting can be implemented securely and anonymously. It is not a proposal for how it should be implemented, nor a claim that it will be implemented exactly this way; it is only an illustration that it can be done. The technology discussion belongs after a decision has been made to have electronic voting. There we must be careful so that the state does not botch it, but at the same time we cannot assume the state exists only to botch things, because otherwise what would be the point of it? Better to have another tsar and master to rule us, right? Ah yes, there I have explained yet another of the opponents' arguments. 🙂


In fact, it does not matter whether we are talking about electronic or paper voting, in-person or remote: anything can be implemented securely or insecurely, and the essential point is never the technology as such but the algorithm, the procedure used for the implementation.

In that sense, let us not talk about technology for a moment; let us talk about procedure. My goal is not to claim that a given remote-voting procedure guarantees absolute security. I only want to point out that, at every point, it is better and more secure than the current procedure for in-person voting.

The requirements for voting, regardless of whether it is in-person or remote, are that it be personal, secret, and guaranteed (that is, that the vote cannot be swapped out).

This is not a question of electronic voting or not. We can have electronic remote voting and electronic in-person voting (which has been trialled for years in our elections, called machine voting because a machine assists the voter); the technology is not particularly important. The key question is in the procedures for applying it (including for applying the technology).


The procedure I discuss here applies to in-person and remote voting, by machine or with paper ballots. A similar procedure is already being developed in a suitable legislative form by another group of enthusiasts; the core ideas (even if I phrase some specific details differently) are the same.

In in-person voting, the "personal" requirement is typically guaranteed by the authentication mechanisms built around the identity card. If your face matches the photo on the card, and you trust the card and its issuer, you accept (though without an absolute guarantee) that the person is who they claim to be and has come to vote in person.

The secrecy of the vote is guaranteed by nameless ballots and by voting in a room that, although public, third parties assert contains no one else (again, without an absolute guarantee).

The integrity of the vote is supposed to be guaranteed by the fact that no ballot can enter the ballot box other than one cast by a voter, and that a ballot cannot be swapped, because it is watched closely by people who are presumed to be adversaries and to keep an eye on one another (an electoral commission made up of representatives of different parties).


We know very well, however, that more than 1 in 10,000 identity cards is forged.

We know about the "dead souls": people who have the right to vote and appear on the voter rolls but have died, and who are used so that leftover ballots can be added to the boxes at the last moment without raising suspicion during the central count, since the number of ballots stays at or below the expected figure.

Just now, for these elections, more than 500,000 dead souls were removed (because of the residency rule), but that is not all of them, and it is an excellent indicator of how inaccurate the voter rolls actually are. At earlier elections another 700,000 were removed. Constant urbanization (local elections have a residency requirement), emigration, and mortality naturally keep producing dead souls, since the local lists physically cannot be perfectly up to date (even though centrally they could be).

We know that some people vote at two different physical locations at the same time. Because in-person voting cannot provide a real-time national check of whether you have already voted, the vote is accepted; and because a ballot cannot be validated by origin, double votes cannot be removed. So even though we know about the violation, it stays in the counted result, and so on.

What is more, in some places local cartels of election commission members add ballots at the last moment, and even when many more ballots are counted than there are signatures of people who voted, since there is no way to tell which ballots are genuine, all of them are counted. Overall, the errors (invalid ballots, duplicated ballots, dead souls) float between 5 and 15% per election and usually do not materially affect the overall result. But they matter a great deal for the redistribution of local mandates and in local elections, where, because of fragmentation, a single mandate or a mayor is often decided by fewer than 5,000 votes. These errors, and the low turnout, also matter for which parties enter parliament or receive a subsidy. For example, at the previous parliamentary elections a mere 50,000 votes of additional turnout (under 1% of all eligible voters) decided whether ABV and ATAKA would enter parliament; it did not even matter whom those people voted for, because the issue was the fragmentation in the allocation of mandates, only the low turnout. Whether that is one of the reasons both parties are firm opponents of every method for raising turnout (from advertising, through compulsory voting, to opposing electronic voting; ABV is officially in favour of electronic voting, but its representatives speak publicly against it), I do not know.

All these defects, and others, stem from the fact that there is no mechanism for two-way validation. We have known about them for years, they are discussed at every election, and we keep changing the Electoral Code to attack one problem or another, so far not very effectively.

The security of in-person voting relies excessively on threats (fines), on politically staffed election commissions (which are supposed to watch one another because of natural political competition), and on election observers. But over the years the number of people and parties entitled to sit on an election commission has gradually been narrowed, down to parliamentary parties or coalition partners, and natural cartel agreements begin to be tolerated, significantly weakening the in-person oversight. Independent election observers are also being squeezed out (a complicated registration procedure, and a requirement that they mostly come from political parties), while party workers and observers themselves abuse the system as voters, mainly by duplicating votes (Bulgaria Without Censorship, BSP, and DPS at previous elections are examples): when they vote they can bypass the residency rule, which automatically allows duplicated votes, double voting, and so on.


Let us imagine, however, the following hypothetical situation.

Suppose we separate the components: personal authorization, acquiring the right to vote (a ballot), associating the ballot with a vote (casting), and counting the votes. Each of these elements is authorized and verified by a separate, independent organization, and something else (mathematics, say) guarantees the pairwise association between neighbouring steps, but not one that skips over a neighbour. Then many distortions of the current model (though not all) would be overcome automatically.

Let me explain in a bit more detail. We already have all of this today, but let us imagine it fully separated into distinct organizations, with the processes happening independently. I can identify the following separate and independent components:

  1. Authorization, in one place, that you are who you are (the equivalent of issuing an identity card) and receiving the corresponding identifier for it (today this is the ID card issued by the Ministry of Interior, but it could be an electronic signature or something else, issued by one organization).
  2. Authorization that you are entitled to receive a ballot to vote with (today the election commission does this, checking you against the voter roll and confirming that you are you via the credential issued by organization 1; in the electronic case you can receive a randomly generated electronic certificate, verifiable mathematically (it could be a hash function) and signed with this second organization's certificate so that it cannot be altered).
  3. Associating a vote with a ballot (today you do this in front of the election commission, which stamps your ballot to guarantee to third parties that it is valid; electronically this can be a third organization, for example a website where you electronically sign your choice with the certificate received from 2).
  4. Counting the ballots (today the Central Election Commission does this, typically with Information Services as a subcontractor; in the electronic world this is the counting and the second validation of each electronic ballot's validity, by a fourth organization).


Now let us imagine the system above implemented like this (again, the technology is only an example and is not what matters; the procedure and the principle are what matter):

  1. You go and obtain an electronic signature that validates that you are you. It contains your unique certificate, validated by the certificate issuer (today every bank branch does this, for about 15 leva), which is entitled to do so (the equivalent of the Ministry of Interior). The electronic signature is only an example: it could be your electronic identity card (the ID card you will be able to obtain from 2017 onwards), or a One-Time-Password system in some other form. It does not matter much; the technology is not important at this stage.
  2. You go and receive an electronic ballot against your electronic signature. It is, say, a hash value signed by your certificate (but carrying no identifying information), or produced by an alternative, more anonymous algorithm, say a modified DH (or a unique random private key; both can again be delivered via your electronic signature). The resulting information is in turn signed with a certificate of the organization issuing the electronic ballots (there is room for variation here; you could, for example, be given a unique public/private key pair, randomly generated for you).
  3. You go to the voting site, where you sign the ballot with the private key you received and send it back to the voting site, which signs it with its own certificate (the certificates play the role of the election commission's stamps) and passes it on, immediately or later, to the counting organization (or passes it immediately to an intermediary organization, which hands it to the counting organization after election day ends).
  4. The counting organization receives all the certificates. It has the public keys of organizations 1, 2, and 3 (but not the private ones). As a result it can read the information (while no one else can read more than their own part) and can perform the count and a full validation. Separately, it has received from organization 2 the list of generated hashes, can validate them mathematically (that they are genuine), and accepts only those certificates that contain a correct hash.
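Purely as an illustration of the signing chain described above (all file and key names are hypothetical), the flow can be mimicked with openssl: the ballot issuer signs a random ballot token, the voter signs their choice together with the token, and the counting side verifies both signatures.

# Organization 2 (ballot issuer): generate a ballot token and sign it with its private key
openssl rand -hex 32 > ballot_token.txt
openssl dgst -sha256 -sign issuer_private.pem -out ballot_token.sig ballot_token.txt

# Voter: sign the vote choice together with the token, using the voter's private key
cat ballot_token.txt vote_choice.txt > vote_payload.txt
openssl dgst -sha256 -sign voter_private.pem -out vote_payload.sig vote_payload.txt

# Organization 4 (counting): verify both signatures with the corresponding public keys
openssl dgst -sha256 -verify issuer_public.pem -signature ballot_token.sig ballot_token.txt
openssl dgst -sha256 -verify voter_public.pem -signature vote_payload.sig vote_payload.txt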


What do we get from this separation into distinct, independent, unconnected organizations?

It becomes impossible to generate forgeries: the signing in steps 2 and 3 happens locally on the voter's machine, only signed certificates travel over the network and reach the sites, and for a forgery to exist someone would have to compromise 1, 2, and 3 simultaneously (which could happen only on the voter's own computer, and even if that computer is compromised it is still not easy; try to substitute the content being signed by the electronic signature token in your hands on your own computer, and if you succeed, call me).

Even if we accept that as possible on an individual voter's computer, it would be impossible to do at scale (there is no way to influence and control millions of computers and millions of certificates within a limited period of time), which already puts it ahead of the current model (at every election the Central Election Commission publishes statistics showing at least around 100 thousand invalid or duplicated ballots, and that is with the rather conservative estimation method they use). Substitution also requires interactive action at the very moment the user authorizes. I would be extremely interested to hear how that could be done during a mass electronic vote, in real time, within election day. Even with one or two cases of substitution, they would be far too few to affect the result, and electronic voting in particular can include a mechanism for detecting and correcting them.

We also get a guarantee of vote secrecy. Only 1 knows whether you are entitled to vote and who you are. Only 2 issues a ballot against the information from 1 (it does not have to know who you are, as long as it knows you have a valid certificate, whose validation can be assured anonymously), and it does not know whether or for whom you voted. Only 3 could know which vote an electronic ballot is associated with (and even that is optional; 3 can be set up without its own public key so that it cannot know), but it does not know who voted. Only 4 can count and validate the ballots, again without knowing who voted.

To compromise the secrecy of the vote, you would have to compromise all of them at once. It is theoretically possible for the state to organize itself and do that, but it is unlikely, because, first, it can easily be detected (much more easily than with physical voting, since here all records are kept everywhere), and second, nothing prevents such abuse today either, so it is not a technological problem introduced by electronic voting but a fundamental problem of the state and of morality. If the problem exists, it is a problem of organization, the state, and culture.

Because the signatures are produced locally, where the voter performs them, a hacker who breaches 2, 3, or 4 individually cannot generate fake signatures. They can neither sign (they do not have the private keys or the client-side certificates from the preceding organization) nor create a valid certificate (because that requires end-to-end participation). Theoretically, mass fake signatures could be created by hacking 1, but the beauty here is that this can easily be detected (by GRAO, in real time or after the fact during the check at 4), and all votes generated that way can be removed from the tally (the hashes from 2 or the certificates from 1 are invalidated, which produces an invalidation during the count at 4), including after the fact, again while preserving the voter's full anonymity.


Duplication is eliminated thanks to the validation against the active hash at organization 4 (I will say more about duplication below). Every person who votes electronically ends up with exactly one valid vote.

How do we guarantee that voting is personal? In my hypothesis the electronic signature does that, but if necessary it can be validated (at 2) with a video or photo call, which gives the same form of validation as the election commission provides (someone checks that you look like your photo).

What about eavesdropping? Even by watching logs and IP addresses, we could guess that someone has voted, but not with certainty who exactly, and definitely not for whom. If the encrypted messages are of equal length, not even statistical techniques would let an eavesdropper tell who voted, or how many voted for whom; only organization 4 can answer that.

The protection here is in fact considerably better, even under full openness, than what we have with in-person voting. If, for example, a journalist stands in front of a building with polling stations and films who goes in and out (something we see on television at every election), they obtain far more, and more accurate, information than someone eavesdropping on the electronic voting network.

How do we guarantee that signatures are not handed over to other people? The guarantee that you have not given your signature (or your passport) to someone else is the same as with ordinary voting: none. But again, it is very hard and unlikely for this to happen at scale, and it can be validated with a video call (or a recording at the moment the right to vote is granted).

Does it sound complicated? At first glance the technology looks complex and multi-step, so surely users will find it hard to follow? Actually, no. The idea of separation and multiple signings is neither new nor accidental. The election commission in fact performs it today, in in-person voting: your ID card is a certificate/electronic signature issued by the Ministry of Interior; the watermarked ballot is a certificate from the ballot issuer; the check by the commission against your ID card and your signature in the voter roll is the equivalent of receiving the right to vote, my step 2; voting in the booth and the subsequent stamp on the ballot is the equivalent of step 3; and the validation of the stamps, the watermarked ballots, and their contents by the Central Election Commission and Information Services is the equivalent of step 4. This triple-signing scheme is the classic technique used by MIT's Kerberos authorization technology, which has not been broken to date and is in fact extremely widely used (Active Directory authentication in Windows is based on it). It is also used by the hugely popular OAuth2 authorization system. To the user it looks as if they enter their data in only one place, but behind the scenes there is double (or even triple) authorization, and no single organization holds the complete personal information of the participant. More importantly, users do not even notice how it happens and do not have to log in to three sites at once; and none of this removes the security guarantees.


Компрометирани броусъри, операционни системи, троянски коне не компрометират автоматично например електронният подпис (всъщност всички хакове, за които знаем не компрометират системата по същество, а само дебнат ситуация в която потребителят подписва в реално време и се опитват да изземат сесията. Дори да заподозрем 1-2 такива случая, това е невъзможно да се изпълни масово в изборният ден по начин по който да се повлияе на изборният резултат).


За дублиранията – тъй като имаме вторична анонимна проверка (при 4), потребителят може да гласува безброй пъти анонимно (издавайки си безброй бюлетини), и да му се брои само последният вот. Това не само не е недостатък, но може да е и предимство. Дори да бъде принуден от някой “да се гласува правилно” пред него, по късно потребителят може да гласува пак, и да инвалидизира предното си гласуване. Така насилственият вот може да стане много по труден (но не и продаденият, но той не може да бъде преборен с технология – това е лично решение), тъй като хората ще имат алтернатива. Особено в малките населени места локалното влияние (в изборната секция със заплаха от кмета, както често се случва) може да стане много трудно, тъй като хората могат да отидат и пак да си гласуват валидно – другаде.


Тъй като само потребителят може да е в състояние да прочете информацията от подписите си, то системата може да бъде направена така, че потребителят да може да провери как е гласувал (ако се предостави такава услуга от 4) и да открие фалшификации, хакове, или да инвалидизира гласа си. Нещо повече, никой няма да бъде в състояние да му попречи да го направи, или да разбере, че това се е случило, без да има абсолютно цялата информация от 1,2,3 и 4 (а пък ако дори само един се оплаче, това лесно ще се открива проблема и ще се хваща виновника. При добре работеща полиция, много бързо и точно ще се излавят хората опитващи да правят компрометиране на системата).


Обърнете внимание, технологиите не са толкова важни (ползвам публични-частни ключове, защото са добра и позната илюстрация), а процедурата и разделението са важни. Ние следваме същата процедура и сега, но нямаме разделение, просто всичко е концентрирано на едно място и не се проверява вторично. Така там където е концентрацията (ИК) може да се правят фалшификации, а поради липсата на вторична проверка, не могат да се изфилтрират. Също така в приръственото гласуване днес използваме значително по-лесни за фалшификация идентификатори (номера на кочани, водни знаци), отколкото в една електронна система.


Personally, not only do I see no greater drawbacks in the remote electronic voting procedure described above than in our current physical model, I also see it addressing some of the most important problems of the current process: raising turnout (and thereby addressing the distortions caused by low turnout) by making voting easier, removing the possibility of duplicates being counted as valid, and creating alternatives that protect civil rights and personal choice. With good communication with voters, this lets us directly attack the paid vote (at the very least by making it more expensive) and render the feudalized vote pointless.


To date there is not a single case of a genuinely hacked electronic voting or electronic banking system. We know of no case in which an electronic signature has been broken (though we know of thousands of forged passports). We know of no case in which an authentication algorithm has been broken.
There are cases of (physically) stolen electronic signatures (but, as with a passport, they can be invalidated, and unlike a passport the invalidation can be instantaneous). There are cases of stolen static certificates. There are cases of banking systems breached in other ways (access to the software that manages money transfers). There are classic Denial of Service attacks that block systems or slow them down. But none of these break users' anonymity or the authentication scheme; they are the result of flawed design in other parts of the software, so they cannot be used as evidence that the approach is insecure. The security of the end users' identity and decisions is preserved. The incompetence of particular systems is a separate problem.


In the scheme I propose, no single organization, and not even an arbitrary combination of two organizations, can generate a fake authorization, a fake ballot, or a fake vote association that would pass verification at number 4. They only record; the signing is done solely by the user, locally.
We could have a (deliberately) false count at 4, but all the certificates/ballots remain in place and are open to secondary verification, so on any suspicion they can be checked and the fraud caught. Considerably better than the current model of in-person voting.


Separately, one of the beauties of the whole design is that 1, 2, 3 and, partially, even 4 can be run privately, and more than one instance can run at the same time. You can have many issuers of authorization credentials (electronic signatures), many issuers of electronic ballots, many vote-association sites, many counters. Not only does this not weaken security, it strengthens it. A user who has suspicions, or who checks their own vote and sees a mismatch, can switch within the voting period (day), vote again somewhere else, and thereby invalidate the error and work around the problem. Conversely, the police can take fake sites down.


Phishing attacks (fake voting sites) will not work, because such sites do not hold the certificates for the public keys, either to capture the user's vote or to pass it on validly to 3 or 4.


The only attack that remains is a DoS attack against the electronic voting endpoints, blocking them so that users cannot vote and get frustrated. But DoS attacks are solved straightforwardly with well-distributed, well-scaled systems. Even without an attack a system can be painfully slow simply because it is badly designed (the Commercial Register, or the electronic counting site from the previous election, are examples). That is not a reason to reject electronic voting; it is a reason to press state organizations to be far more serious about how they build their software.


Postscript:
For those unfamiliar with the field, a few words about the basic cryptographic algorithms that can be used (and in fact are used in every system for electronic signatures, online authentication and electronic banking):


Challenge algorithm – Imagine two parties communicating over an insecure channel. Party A wants to verify that party B is who it claims to be. Party A knows that B will prove itself using information A also knows (say, a password).
When B wants to identify itself, it asks A for a random number. A sends one. B uses the random number to encrypt (or hash) its secret and sends the result to A. A knows the number and the encryption (hash) algorithm, performs B's computation locally, and compares its result with what it received from B. If they match, B is who it claims to be.
Anyone eavesdropping on the network learns that B is trying to authenticate to A, but not B's password: it never travels unencrypted. Nor can the eavesdropper reuse B's response to impersonate B elsewhere, because a different random number will be used in that other authentication.
The result is an active authentication that cannot be decrypted and cannot be replayed elsewhere, or here at another time.
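As a concrete illustration of the challenge idea, here is a minimal sketch using only the Python standard library; the shared secret and the HMAC construction are illustrative assumptions, not a specific production protocol.

```python
# Challenge-response: B proves knowledge of a shared secret without sending it.
import hmac, hashlib, secrets

PASSWORD = b"shared-secret-known-only-to-A-and-B"   # hypothetical shared secret

# Party A issues a fresh random challenge.
challenge = secrets.token_bytes(16)

# Party B mixes the challenge into the secret and returns only the digest.
response = hmac.new(PASSWORD, challenge, hashlib.sha256).digest()

# Party A recomputes the same digest locally and compares in constant time.
expected = hmac.new(PASSWORD, challenge, hashlib.sha256).digest()
assert hmac.compare_digest(response, expected)       # B is accepted

# A captured response is useless against a new challenge (replay fails).
new_challenge = secrets.token_bytes(16)
assert not hmac.compare_digest(
    response, hmac.new(PASSWORD, new_challenge, hashlib.sha256).digest()
)
```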


Asymmetric encryption algorithms – These are the public/private key algorithms. The idea is simple: mathematical constructions guarantee, with an extremely high level of assurance, a pair of keys, conventionally called public and private. Information encrypted with the private key can be decrypted only with the matching public key, and the private key cannot be derived from the public one. So if we take a private key and encrypt some text with it, and the other side can decrypt it with the corresponding public key, they know it was encrypted and signed by the person the key belongs to. Without the public key the text cannot be decrypted, and unless it was encrypted with the matching private key, the public key will not decrypt it. The clever part is that you can hand your public key to your friends, and only they will be able to read the encrypted information you send them, while nobody but you will be able to write and send them information in your name.
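A minimal sketch of the sign-with-private / verify-with-public principle, using Ed25519 from the third-party cryptography package purely as an illustration of the idea above (the package and algorithm choice are assumptions, not something the text prescribes):

```python
# Only the holder of the private key can produce signatures; anyone with the
# public key can verify them and detect tampering.
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

private_key = ed25519.Ed25519PrivateKey.generate()   # stays with the owner
public_key = private_key.public_key()                # handed out freely

message = b"only the key owner could have produced this"
signature = private_key.sign(message)

public_key.verify(signature, message)                # no exception: authentic

try:
    public_key.verify(signature, b"a forged message")
except InvalidSignature:
    print("forgery detected")
```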


Electronic certificate – A double (or multiple) signature with public and private keys. Example: imagine you create your own public/private key pair. The private key always stays with you, but you want the public key to be available to a group of people, or to everyone. You go to the police (or a Certificate Authority) and they verify that you are who you say you are. Then, with their own private key, they encrypt a piece of text that says: yes, this is the person Пешо Пешев, and his public key is this one (included inside). The result is called a certificate, and you can publish it publicly (or send it along with mail encrypted with your private key).
Anyone who receives your certificate and holds the police's (CA's) public key can read it. They then know that the police assert (guaranteed by their private key) that you are who you claim to be, and what your public key is. With that key the recipient can decrypt your mail and know it was written by you, because only you hold your private key.
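The certificate idea can be sketched in a few lines: the CA signs a statement binding a name to a public key, and a verifier who trusts the CA can then verify anything the subject signs. The JSON structure and names below are hypothetical simplifications (real certificates use X.509); the cryptography package is assumed only for illustration.

```python
# A toy certificate: the CA signs (name + public key); a verifier checks the
# CA's signature, extracts the subject's public key, and verifies the subject.
import json
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.hazmat.primitives import serialization

def raw_public_bytes(private_key):
    return private_key.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw
    )

ca_key = ed25519.Ed25519PrivateKey.generate()        # the CA's (police's) private key
subject_key = ed25519.Ed25519PrivateKey.generate()   # the citizen's private key

# The CA attests: this name goes with this public key.
claim = json.dumps({
    "subject": "Пешо Пешев",
    "public_key": raw_public_bytes(subject_key).hex(),
}).encode()
certificate = {"claim": claim, "ca_signature": ca_key.sign(claim)}

# A verifier who trusts the CA's public key checks the certificate...
ca_key.public_key().verify(certificate["ca_signature"], certificate["claim"])
subject_pub = bytes.fromhex(json.loads(certificate["claim"])["public_key"])

# ...and can then verify messages signed with the subject's private key.
message = b"signed by the certified citizen"
ed25519.Ed25519PublicKey.from_public_bytes(subject_pub).verify(
    subject_key.sign(message), message
)
```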


Diffie-Hellman algorithm – another public-key technique, this time for key agreement: the participants publicly exchange values derived from their keys, and each side regenerates the same shared secret from them, while an eavesdropper who sees the whole exchange cannot do the same.
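A toy Diffie-Hellman exchange, with deliberately small numbers only to show the mechanics; real deployments use large standardized parameters (e.g. RFC 3526 groups or elliptic curves), so the values below are illustrative assumptions.

```python
# Both sides publish only g^x mod p, yet derive the same shared secret.
import secrets

p = 4294967291   # a small prime, toy-sized on purpose
g = 5            # base used for the illustration

a = secrets.randbelow(p - 2) + 1          # Alice's private exponent
b = secrets.randbelow(p - 2) + 1          # Bob's private exponent
A = pow(g, a, p)                          # sent publicly by Alice
B = pow(g, b, p)                          # sent publicly by Bob

# An eavesdropper sees p, g, A and B, but cannot compute the shared secret.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob
```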


One-time password – a special case of the challenge algorithm in which the encrypted information exchanged for validation (the password) is extended with a component that depends on the time (or on the sequence number of the event), for example the current time mixed into the password. Even if someone eavesdrops and tries to crack the password with a dictionary, they cannot reuse what they captured: after, say, a minute the value is brand new and the old one is invalidated.
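A minimal time-based one-time password sketch in the spirit of the description above; it is a simplified TOTP-like construction using the standard library, not RFC 6238, and the secret and window length are illustrative assumptions.

```python
# The code is derived from a shared secret plus the current time window,
# so a captured code stops validating once the window rolls over.
import hmac, hashlib, time

SECRET = b"shared-secret"          # hypothetical secret agreed in advance
WINDOW = 60                        # the code changes every 60 seconds

def one_time_code(secret: bytes, at: float) -> str:
    counter = int(at // WINDOW)    # time-dependent component
    digest = hmac.new(secret, str(counter).encode(), hashlib.sha256).hexdigest()
    return digest[:6]              # short, human-enterable code

now = time.time()
code = one_time_code(SECRET, now)
assert code == one_time_code(SECRET, now)            # valid in this window
assert code != one_time_code(SECRET, now + WINDOW)   # stale in the next one
```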


Electronic signature – a combination of an electronic certificate and a one-time password. What travels over the network is certificate information, modified and encrypted with a form of one-time password. It can be implemented in simpler ways (the chips on credit cards, for example, use a much more simplified form of authentication) or in more complex ones, but that is an implementation choice, not a problem of the technology. The technology can be astonishingly secure even if all the public keys and public information are freely available on the network.
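To show how the two pieces combine, here is a minimal sketch in which the client signs a time-bound message with its private key, so the proof is tied both to the certificate holder and to the current time window. The framing, window length and message are assumptions for illustration, not a real e-signature protocol; the cryptography package is assumed as before.

```python
# Sign (message + time window) with the holder's private key; the verifier
# checks it against the public key from the holder's certificate.
import time
from cryptography.hazmat.primitives.asymmetric import ed25519

WINDOW = 60
holder_key = ed25519.Ed25519PrivateKey.generate()    # lives in the token/smartcard

def sign_time_bound(message: bytes, at: float) -> bytes:
    window = str(int(at // WINDOW)).encode()
    return holder_key.sign(message + b"|" + window)

now = time.time()
proof = sign_time_bound(b"login to the voting site", now)

# Verification succeeds only for the current time window.
window = str(int(now // WINDOW)).encode()
holder_key.public_key().verify(proof, b"login to the voting site" + b"|" + window)
```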

It is beyond dispute that something can be implemented insecurely. But it is equally beyond dispute that it can be implemented securely. That, however, is not part of the discussion about whether to have electronic voting; it is the discussion that comes after we decide to have it, about exactly how to build it. Saying "better not to have it, because it might be done badly" is like refusing to fly because planes occasionally crash. Yes, you will not die in a plane crash, but that does not mean you will live forever, or that a plane will not fall on your head. It only means you will always be slower than the people who fly.

PPS:
I could write this out in more detail, but the core idea is the separation. If we think of associating a ballot with a vote as the act of issuing a certificate, except that the private key is generated locally and the public key is tied to an anonymous hash, half of which goes, say, to the vote associator and half directly to the counter, then the associator (3) can only validate the information without knowing who voted or how, and the counter can validate without knowing who voted. Only the voter can check, read and (re)vote, locally on their side (a minimal sketch of this split follows below).
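A minimal, purely hypothetical sketch of that split: an anonymous hash of the voter's locally generated public key is cut in two, so neither the associator (3) nor the counter (4) alone can link a ballot to a person. All names and sizes here are assumptions for illustration.

```python
# Split an anonymous hash between two parties; each half alone is meaningless.
import hashlib, secrets

# The voter generates key material locally; only a hash of it ever leaves.
local_public_key = secrets.token_bytes(32)           # stand-in for a real public key
anon_hash = hashlib.sha256(local_public_key).digest()

half_for_associator = anon_hash[:16]                 # stored by party 3
half_for_counter = anon_hash[16:]                    # stored by party 4

# Only when both halves are brought together (e.g. in a dispute), or when the
# voter recomputes the hash locally, can the full value be reconstructed.
assert half_for_associator + half_for_counter == anon_hash
assert hashlib.sha256(local_public_key).digest() == anon_hash
```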
The mathematics allows this split, and many electronic authentication systems already work this way, so it can be made secure. The real question, however, was never a technological one, was it?