It is not the first time that, when Navalny publishes a high-profile investigation (this time the joint investigation with Bellingcat, CNN and The Insider), versions are actively pushed that it was, of course, a collaboration with the intelligence services. Because supposedly it is beyond the power of a group of enthusiasts to collect and analyse such a volume of information; you need special skills, access to classified data and the capacity for operational work (surveillance/wiretapping/video monitoring and so on). This time the theory appeared that German intelligence had in fact known everything for a long time.
Social networks are abuzz with the claim that such an investigation cannot possibly be carried out from phone records. Supposedly, Western intelligence services did the investigation and handed it to Navalny.
To an ordinary person these versions look plausible. After all, there are intelligence services with multi-billion budgets. Intelligence services have special capabilities. They can hire the best of the best and plant agents. So how do a few people on a shoestring budget regularly outdo them?
It may come as a surprise to some, but the notion of the power and analytical capabilities of state officials is greatly exaggerated, while the notion of what data is now easily available to anyone is understated.
I have extensive experience analysing various leaked databases. In 2005, when the database of Central Bank transfers appeared on the market, I decided to base my dissertation on it (at the time I was a doctoral student at the University of Chicago). I identified several tens of thousands of shell companies and calculated how much every company in Russia (including Gazprom, Russian Railways and RAO UES) evades in taxes and cheats its shareholders.
When I first presented my results, the first question from my academic advisers was: "If you managed to do this on your own in a few months, and in Chicago at that, why can't the Russian tax service and the Central Bank?" I had no answer to that question. When the results of my work were published in several Russian media outlets, I was invited to speak at a state conference on taxes. Andrei Kozlov, First Deputy Chairman of the Central Bank, also invited me to a meeting so that I could present my methodology to him.
Unfortunately, he was murdered a month before our probable meeting. But the fact remains: since at least 1989 the Russian Central Bank has been accumulating data that make it possible, in real time, to identify every shell company and to calculate the amount of tax evaded by every Russian company down to the kopeck. The Russian state does not do this, even though hundreds of publicly funded employees work in the analytical departments of the bank, the finance ministry and the tax service.
After that I carried out several more research projects, analysing individual-level data on officially declared ("white") salaries, driving licences, traffic violations, accidents, vehicle registrations, company shareholders and so on. From such data one can uncover the kickbacks taken by governors and traffic police, the size of undeclared ("black") salaries, and more. Some of my studies were published in leading international academic journals.
I am not writing this to boast about how good I am. I only want to show that even a person with minimal resources (buying all of that data cost me 1,500 dollars) can carry out a deep, detailed investigation. For example, from the database I bought, in 2005–2008 I knew everything about every Muscovite: date of birth, registered address, driving licence, every car they had ever owned, all their jobs, their salary month by month, traffic violations and the accidents they had been involved in. I used those data for this paper: (https://www.sciencedirect.com/science/article/abs/pii/S0304405X14000440)
There is far more data on the market now than when I began my research. That is why I am certain that all the analytical work described in the investigation could be done by one or a few people for a few thousand dollars. With 15 years of experience analysing various government databases, I do not see a single step in the investigation that could not have been done independently or that would have required the help of the intelligence services. All it takes is creativity and an analytical mind.
We entered long ago an era in which human brains matter far more than budgets and administrative resources. Navalny and his colleagues at the Anti-Corruption Foundation have exceptional brains. So do Bellingcat and The Insider.
As for quality brains in the Russian and foreign intelligence services, things look rather bleak. All they can do is drop murky hints that they had in fact investigated everything long ago and knew it all, but, you understand, that was classified information.
*** The Russian-language article was published in the blog of Maxim Mironov, an economist and professor of finance. He was born in 1980 in Novosibirsk, where he graduated from the Higher College of Informatics at the state university. He then took a degree in economics, followed by two master's degrees in finance and economics. In 2008 he received his doctorate from the University of Chicago. After returning to Russia he worked as chief investment officer at a large company and sat on the boards of directors of leading media outlets such as Argumenty i Fakty, Trud, Extra-M, Media-Pressa and others. Since 2009 he has taught finance at the IE Business School (Instituto de Empresa) in Madrid. Mironov was a member of the expert council of Russian presidential candidate Alexei Navalny.
Translation: Bivol
Photo: Christo Grozev (Bellingcat's lead investigator), Yulia and Alexei Navalny. Source: Navalny's Facebook profile
Raspberry Pi computers have always been used in a huge variety of settings, since the combination of low cost, high performance, and ease of use makes it an ideal device for almost any application. We’ve seen a large proportion of sales go into the industrial market – businesses using Raspberry Pi, rather than educational settings or individual consumers. Today we’re announcing new support for this group of customers: a dedicated area of our website for industry, and our Raspberry Pi Approved Design Partners programme, connecting businesses that want to integrate Raspberry Pi into their products with hand-picked design partners who can help.
The industrial market for Raspberry Pi has grown over the years, and now represents around 44% of our annual total sales. We’ve seen this borne out with new releases of Raspberry Pi products: typically sales of a consumer product drop off once a new product is released, but we still see incredible sales of older models of Raspberry Pi. Our inference is that these are destined for embedded applications, where changing to the latest model is not practical.
A new online resource for industry
To support Raspberry Pi’s industrial customers, we have developed a new, dedicated area of our website. Our For industry pages are the best place to go for industrial applications of Raspberry Pi. They provide access to the information and support you need when using our products in an industrial setting, with links to datasheets, compliance documents, and more.
As part of our commitment to industrial customers, we guarantee product lifetimes until at least 2026 on all products. We rarely ever end a product line – in fact, you can still buy Raspberry Pi 1 Model B+ from 2014. And we’ve made it easy for you to take a product through the necessary regulatory compliance steps, with the Raspberry Pi Integrator Programme.
Raspberry Pi Approved Design Partners
Along with our online resources for industry, we’re announcing a new programme to help customers who want to integrate Raspberry Pi into their products, and to recognise companies with specialist knowledge and proven expertise in designing with Raspberry Pi. The Raspberry Pi Approved Design Partners programme is a way of connecting trusted design consultancies with customers who need support designing Raspberry Pi computing solutions into their products.
We’re launching with a select set of designers whom we already know and work with, and we hope to grow this group over the coming years. If your company provides hardware, software, or mechanical design services with Raspberry Pi, and you’d like us to promote your offering on our website, you can find out more about applying to become a Raspberry Pi Approved Design Partner.
If you have a product or a piece of work that uses Raspberry Pi, and you need technical assistance, Raspberry Pi Approved Design Partners have the capacity to provide you with effective help. All our Design Partners have been through a rigorous application process, and we will monitor them regularly for quality and ability. You can be confident that Raspberry Pi Approved Design Partners have the backing of Raspberry Pi, and have access to a deep level of technical knowledge and support within Raspberry Pi.
We’re excited to help customers build fantastic products using Raspberry Pi, and we’re looking forward to working with a diverse range of designers across the world.
If you are tired of administering the infrastructure on your own and would prefer to gain time to focus on real monitoring activities rather than costly platform upgrades, you can easily lift and shift your MySQL-based Zabbix installation stack to Oracle Cloud.
Data is increasingly moving to the cloud: consumer data first, followed by enterprise data, as enterprises are always a bit slower to adopt new technologies.
Data moving to the cloud
Oracle Cloud Infrastructure, OCI, is the 4th cloud provider in the Cloud Infrastructure Ranking of the Gartner Magic Quadrant based on ‘Completeness of Vision’ and ‘Ability to Execute’.
OCI is available in 26 regions and has 26 data centers across the world, with 12 more regions planned, and holds more than 24 industry and regional certifications.
Moving Zabbix to Oracle Cloud
With Zabbix in the Oracle Cloud you can:
get the latest updates on the technology stack, minimizing downtime and service windows.
convert the time you spend managing your monitoring platform into the time you spend monitoring your platforms.
leverage the most secure and cost-effective cloud platform in the market, including security information and security updates made available by OCI.
Planning migration
To plan an effective migration of an on-premise Zabbix installation (clients, proxies, management server, interface, and database), we need to migrate the last three of these components. Basically, we need:
the server configuration;
the on-premise network topology of clients and proxies, to understand what can communicate with the outside and what would have to go over a VPN; and
the database.
Migration requirements
We also need to set up the following in the OCI tenancy:
MySQL Database System,
Compute instance for the Zabbix Server,
storage for database and backup,
networking/load balancing.
The target architecture involves setting up the VPN from your data center to the Oracle cloud tenancy and deploying the load balancer, the Zabbix server in redundancy over availability domains, and the MySQL database in a separate subnet.
Required Components:
• Cloud Networking,
• Zabbix Cloud Image,
• MySQL Database Service,
• VPN Connection for client/proxies.
Oracle Cloud target architecture for Zabbix
You can also have a lighter setup, for instance, with proxies communicating over TLS connections over the Internet or communicating directly with the Zabbix Server in the Oracle Cloud, and the Zabbix server interfacing with the database. Here, you will need fewer elements: server, database, and VCN.
Oracle Cloud target architecture for Zabbix — a simpler solution
Migrating Zabbix to Oracle Cloud
Zabbix migration to the Oracle Cloud is straightforward.
1. Before you begin:
set up tenancy and compartments,
set up cloud networking — public and private VCN.
2. Zabbix deployment on the VM:
select one-click deployment or DIY — use the official Zabbix OCI Marketplace Image or deploy an OCI Compute Instance and install manually,
choose the desired Compute ‘shape’ during deployment.
3. Configuration:
start the instance,
edit the config file,
point to the database with the IP address, username, and password (to do that, you’ll need to open several ports in the cloud network via the GUI).
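For the config file edit above, the database settings in the Zabbix server configuration (typically /etc/zabbix/zabbix_server.conf) look roughly like the sketch below; the endpoint address and credentials are placeholders for your MySQL Database Service values:
DBHost=10.0.1.25
DBName=zabbix
DBUser=zabbix
DBPassword=<your-password>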
The OCI infrastructure allows for multiple choices. The Zabbix Server is lightweight software with modest resource requirements, so in the majority of cases a powerful VM will be enough. Otherwise, the full range of Oracle Cloud compute options is available.
Compute services for any enterprise use case
In the Oracle Cloud you’ll have the bare metal option — the physical machines dedicated to a single customer, Kubernetes container engine, and a lot of fast storage possibilities, which end up being quite cheap.
Migrating the database to MySQL Database Service
MySQL Database Service is the managed offer for MySQL in Oracle Cloud, fully developed, managed, and supported by the MySQL team. It is secure and provides the latest features as it leverages the Oracle Cloud, which has been rated by various sources as one of the most secure cloud platforms.
In addition, the platform is built on the MySQL Enterprise Edition binaries, so it is fully compatible with the platform you might be using. Finally, it costs way less on a yearly basis than a full-blown on-premise MySQL Enterprise subscription.
MySQL Database Service — 100% developed, managed, and supported by the MySQL team
Considerations before migration
Before you begin:
check your MySQL 8.0 compatibility,
check your database size (to assess the time needed to migrate), and
plan a service window.
High-level migration plan
Set up cloud networking.
Set up your (on-premise) networking secure connection (to communicate with the cloud).
Create MySQL Database Service DB System with storage.
Move the data using MySQL Shell Dump & Load utility.
Creating MySQL DB system with just a few clicks
Create a customized configuration.
Start the wizard to create DB system.
Select Virtual Cloud Network (VCN).
Select subnet to place your MySQL endpoint.
Select MySQL configuration (or create customized instances for your workload).
The shape for the DB System (CPU and RAM) will be set automatically.
Select the size of the storage for data and backup.
Create a backup policy or accept the default.
Creating MySQL instances
You can use the MySQL Shell Upgrade Checker utility to check compatibility with MySQL 8.0.
util.checkForServerUpgrade()
Loading the data
To move the data, you can use the MySQL Shell Dump & Load utility, which is capable of multi-threading and is callable with the JavaScript methods from MySQL Shell.
So, you can dump to what can be a bastion machine, and load from there into your instance in the cloud. Loading a database of several gigabytes takes several minutes, so plan the service maintenance window accordingly.
In addition, the utility is easy to use. You just need to connect to an instance and dump.
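For illustration, a minimal sketch of the dump-and-load sequence from MySQL Shell's JavaScript mode; the dump directory, thread count, and compatibility options here are assumptions for a typical migration:
// Connected to the on-premise server, dump the instance to a directory on the bastion host
util.dumpInstance("/backups/zabbix-dump", { threads: 8, ocimds: true, compatibility: ["strip_definers"] })
// After reconnecting MySQL Shell to the MySQL Database Service endpoint, load the dump
util.loadDump("/backups/zabbix-dump", { threads: 8 })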
MySQL Shell Dump & Load
The operation is pretty straightforward and the migration time will depend on the size of the database.
Free trial
You can have a test drive of the MySQL Database Service with $300 in cloud credits, which you can spend in the Oracle Cloud on MySQL Database Service or other cloud services.
Questions & Answers
Question. Do you help with migrating the databases from older versions to MySQL 8.0?
Answer. Yes, this is the thing we normally do for our customers — providing guidance, though data migration is normally straightforward.
Question. Does the database size matter? How efficient is MySQL Shell Dump? What if my database is terabytes in size?
Answer. The MySQL Shell Dump & Load utility is much more efficient than MySQL Dump used to be. The database size still matters: a larger database will require more time, but still far less than it used to take.
The SF Chronicle is reporting (more details here), and the FBI is confirming, that a Melbourne mathematician and team has decrypted the 1969 message sent by the Zodiac Killer to the newspaper.
There’s no paper yet, but there are a bunch of details in the news articles.
Cryptologist David Oranchak, who has been trying to crack the notorious “340 cipher” (it contains 340 characters) for more than a decade, made a crucial breakthrough earlier this year when applied mathematician Sam Blake came up with about 650,000 different possible ways in which the code could be read. From there, using code-breaking software designed by Jarl Van Eycke, the team’s third member, they came up with a small number of valuable clues that helped them piece together a message in the cipher.
When a Lambda function consumes messages from a stream, an error in one of the items in a batch can result in reprocessing some of the same messages in that batch. With the new custom checkpoint feature, there is now much greater control over how you choose to process batches containing failed messages.
This blog post explains the default behavior of batch failures and options available to developers to handle this error state. I also cover how to use this new checkpoint capability and show the benefits of using this feature in your stream processing functions.
Overview
When using a Lambda function to consume messages from a stream, the batch size property controls the maximum number of messages passed in each event.
The stream manages two internal pointers: a checkpoint and a current iterator. The checkpoint is the last known item position that was successfully processed. The current iterator is the position in the stream for the next read operation. In a successful operation, here are two batches processed from a stream with a batch size of 10:
The first batch delivered to the Lambda function contains items 1–10. The function processes these items without error.
The checkpoint moves to item 11. The next batch delivered to the Lambda function contains items 11–20.
In default operation, the processing of the entire batch must succeed or fail. If a single item fails processing and the function returns an error, the batch fails. The entire batch is then retried until the maximum number of retries is reached. This can result in the same failure occurring multiple times and unnecessary processing of individual messages.
You can also enable the BisectBatchOnFunctionError property in the event source mapping. If there is a batch failure, the calling service splits the failed batch into two and retries the half-batches separately. The process continues recursively until there is a single item in a batch or messages are processed successfully. For example, in a batch of 10 messages, where item number 5 is failing, the processing occurs as follows:
Batch 1 fails. It’s split into batches 2 and 3.
Batch 2 fails, and batch 3 succeeds. Batch 2 is split into batches 4 and 5.
Batch 4 fails and batch 5 succeeds. Batch 4 is split into batches 6 and 7.
Batch 6 fails and batch 7 succeeds.
While this provides a way to process messages in a batch with one failing message, it results in multiple invocations of the function. In this example, message number 4 is processed four times before succeeding.
With the new custom checkpoint feature, you can return the sequence identifier for the failed messages. This provides more precise control over how to choose to continue processing the stream. For example, in a batch of 10 messages where the sixth message fails:
Lambda processes the batch of messages, items 1–10. The sixth message fails and the function returns the failed sequence identifier.
The checkpoint in the stream is moved to the position of the failed message. The batch is retried for only messages 6–10.
Existing stream processing behaviors
In the following examples, I use a DynamoDB table with a Lambda function that is invoked by the stream for the table. You can also use a Kinesis data stream if preferred, as the behavior is the same. The event source mapping is set to a batch size of 10 items so all the stream messages are passed in the event to a single Lambda invocation.
I use the following Node.js script to generate batches of 10 items in the table.
const AWS = require('aws-sdk')
AWS.config.update({ region: 'us-east-1' })
const docClient = new AWS.DynamoDB.DocumentClient()
const ddbTable = 'ddbTableName'
const BATCH_SIZE = 10
const createRecords = async () => {
// Create envelope
const params = {
RequestItems: {}
}
params.RequestItems[ddbTable] = []
// Add items to batch and write to DDB
for (let i = 0; i < BATCH_SIZE; i++) {
params.RequestItems[ddbTable].push({
PutRequest: {
Item: {
ID: Date.now() + i
}
}
})
}
await docClient.batchWrite(params).promise()
}
const main = async() => await createRecords()
main()
After running this script, there are 10 items in the DynamoDB table, which are then put into the DynamoDB stream for processing.
The processing Lambda function uses the following code. This contains a constant called FAILED_MESSAGE_NUM to force an error on the message with the corresponding index in the event batch:
The code uses the DynamoDB item’s sequence number, which is provided in each record of the stream event:
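The original listing is not reproduced in this excerpt; a minimal sketch of such a handler, with the FAILED_MESSAGE_NUM constant used as described, might look like this:
const FAILED_MESSAGE_NUM = 6
exports.handler = async (event) => {
  event.Records.forEach((record, index) => {
    // Log the sequence number provided in each DynamoDB stream record
    console.log('Sequence number:', record.dynamodb.SequenceNumber)
    // Force an error on the configured message position in the batch
    if (index + 1 === FAILED_MESSAGE_NUM) {
      throw new Error('Error processing message number ' + (index + 1))
    }
  })
  return 'Batch processed'
}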
In the default configuration of the event source mapping, the failure of message 6 causes the whole batch to fail. The entire batch is then retried multiple times. This appears in the CloudWatch Logs for the function:
Next, I enable the bisect-on-error feature in the function’s event trigger. The first invocation fails as before but this causes two subsequent invocations with batches of five messages. The original batch is bisected. These batches complete processing successfully.
Configuring a custom checkpoint
Finally, I enable the custom checkpoint feature. This is configured in the Lambda function console by selecting the “Report batch item failures” check box in the DynamoDB trigger:
I update the processing Lambda function with the following code:
In this version of the code, the processing of each message is wrapped in a try…catch block. When processing fails, the function stops processing any remaining messages. It returns the sequence number of the failed message in a JSON object:
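Again, the original listing is not shown here; a minimal sketch under the same assumptions follows, where the batchItemFailures/itemIdentifier response shape is the format the Lambda service expects when batch item failure reporting is enabled:
const FAILED_MESSAGE_NUM = 6
exports.handler = async (event) => {
  const batchItemFailures = []
  for (const [index, record] of event.Records.entries()) {
    try {
      // Simulate a failure on one message position in the batch
      if (index + 1 === FAILED_MESSAGE_NUM) throw new Error('Simulated failure')
      console.log('Processed:', record.dynamodb.SequenceNumber)
    } catch (err) {
      // Report the failed sequence number and stop processing the remaining messages
      batchItemFailures.push({ itemIdentifier: record.dynamodb.SequenceNumber })
      break
    }
  }
  // An empty array signals that the whole batch was processed successfully
  return { batchItemFailures }
}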
The calling service then updates the checkpoint value with the sequence number provided. If the batchItemFailures array is empty, the caller assumes all messages have been processed correctly. If the batchItemFailures array contains multiple items, the lowest sequence number is used as the checkpoint.
In this example, I also modify the FAILED_MESSAGE_NUM constant to 4 in the Lambda function. This causes the fourth message in every batch to throw an error. After adding 10 items to the DynamoDB table, the CloudWatch log for the processing function shows:
This is how the stream of 10 messages has been processed using the custom checkpoint:
In the first invocation, all 10 messages are in the batch. The fourth message throws an error. The function returns this position as the checkpoint.
In the second invocation, messages 4–10 are in the batch. Message 7 throws an error and its sequence number is returned as the checkpoint.
In the third invocation, the batch contains messages 7–10. Message 10 throws an error and its sequence number is now the returned checkpoint.
The final invocation contains only message 10, which is successfully processed.
Using this approach, subsequent invocations do not receive messages that have been successfully processed previously.
Conclusion
The default behavior for stream processing in Lambda functions enables entire batches of messages to succeed or fail. You can also use batch bisecting functionality to retry batches iteratively if a single message fails. Now with custom checkpoints, you have more control over handling failed messages.
This post explains the three different processing modes and shows example code for handling failed messages. Depending upon your use-case, you can choose the appropriate mode for your workload. This can help reduce unnecessary Lambda invocations and prevent reprocessing of the same messages in batches containing failures.
AWS Lambda now supports streaming analytics calculations for Amazon Kinesis and Amazon DynamoDB. This allows developers to calculate aggregates in near-real time and pass state across multiple Lambda invocations. This feature provides an alternative way to build analytics in addition to services like Amazon Kinesis Data Analytics.
In this blog post, I explain how this feature works with Kinesis Data Streams and DynamoDB Streams, together with example use-cases.
Overview
For workloads using streaming data, data arrives continuously, often from different sources, and is processed incrementally. Discrete data processing tasks, such as operating on files, have a known beginning and end boundary for the data. For applications with streaming data, the processing function does not know when the data stream starts or ends. Consequently, this type of data is commonly processed in batches or windows.
Before this feature, Lambda-based stream processing was limited to working on the incoming batch of data. For example, in Amazon Kinesis Data Firehose, a Lambda function transforms the current batch of records with no information or state from previous batches. This is also the same for processing DynamoDB streams using Lambda functions. This existing approach works well for MapReduce or tasks focused exclusively on the data in the current batch.
DynamoDB streams invoke a processing Lambda function asynchronously. After processing, the function may then store the results in a downstream service, such as Amazon S3.
Kinesis Data Firehose invokes a transformation Lambda function synchronously, which returns the transformed data back to the service.
This new feature introduces the concept of a tumbling window, which is a fixed-size, non-overlapping time interval of up to 15 minutes. To use this, you specify a tumbling window duration in the event-source mapping between the stream and the Lambda function. When you apply a tumbling window to a stream, items in the stream are grouped by window and sent to the processing Lambda function. The function returns a state value that is passed to the next tumbling window.
You can use this to calculate aggregates over multiple windows. For example, you can calculate the total value of a data item in a stream using 30-second tumbling windows:
Integer data arrives in the stream at irregular time intervals.
The first tumbling window consists of data in the 0–30 second range, passed to the Lambda function. It adds the items and returns the total of 6 as a state value.
The second tumbling window invokes the Lambda function with the state value of 6 and the 30–60 second batch of stream data. This adds the items to the existing total, returning 18.
The third tumbling window invokes the Lambda function with a state value of 18 and the next window of values. The running total is now 28 and returned as the state value.
The fourth tumbling window invokes the Lambda function with a state value of 28 and the 90–120 second batch of data. The final total is 32.
This feature is useful in workloads where you need to calculate aggregates continuously. For example, for a retailer streaming order information from point-of-sale systems, it can generate near-live sales data for downstream reporting. Using Lambda to generate aggregates only requires minimal code, and the function can access other AWS services as needed.
Using tumbling windows with Lambda functions
When you configure an event source mapping between Kinesis or DynamoDB and a Lambda function, use the new setting, Tumbling window duration. This appears in the trigger configuration in the Lambda console:
You can also set this value in AWS CloudFormation and AWS SAM templates. After the event source mapping is created, events delivered to the Lambda function have several new attributes:
These include:
Window start and end: the beginning and ending timestamps for the current tumbling window.
State: an object containing the state returned from the previous window, which is initially empty. The state object can contain up to 1 MB of data.
isFinalInvokeForWindow: indicates if this is the last invocation for the tumbling window. This only occurs once per window period.
isWindowTerminatedEarly: a window ends early only if the state exceeds the maximum allowed size of 1 MB.
In any tumbling window, there is a series of Lambda invocations following this pattern:
The first invocation contains an empty state object in the event. The function returns a state object containing custom attributes that are specific to the custom logic in the aggregation.
The second invocation contains the state object provided by the first Lambda invocation. This function returns an updated state object with new aggregated values. Subsequent invocations follow this same sequence.
The final invocation in the tumbling window has the isFinalInvokeForWindow flag set to true. This contains the state returned by the most recent Lambda invocation. This invocation is responsible for storing the result in S3 or in another data store, such as a DynamoDB table. There is no state returned in this final invocation.
Using tumbling windows with DynamoDB
DynamoDB streams can invoke a Lambda function using tumbling windows, enabling you to generate aggregates per shard. In this example, an ecommerce workload saves orders in a DynamoDB table and uses a tumbling window to calculate the near-real time sales total.
First, I create a DynamoDB table to capture the order data and a second DynamoDB table to store the aggregate calculation. I create a Lambda function with a trigger from the first orders table. The event source mapping is created with a Tumbling window duration of 30 seconds:
I use the following code in the Lambda function:
const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.AWS_REGION })
const docClient = new AWS.DynamoDB.DocumentClient()
const TableName = 'tumblingWindowsAggregation'
function isEmpty(obj) { return Object.keys(obj).length === 0 }
exports.handler = async (event) => {
// Save aggregation result in the final invocation
if (event.isFinalInvokeForWindow) {
console.log('Final: ', event)
const params = {
TableName,
Item: {
windowEnd: event.window.end,
windowStart: event.window.start,
sales: event.state.sales,
shardId: event.shardId
}
}
return await docClient.put(params).promise()
}
console.log(event)
// Create the state object on first invocation or use state passed in
let state = event.state
if (isEmpty (state)) {
state = {
sales: 0
}
}
console.log('Existing: ', state)
// Process records with custom aggregation logic
event.Records.map((item) => {
// Only processing INSERTs
if (item.eventName != "INSERT") return
// Add sales to total
let value = parseFloat(item.dynamodb.NewImage.sales.N)
console.log('Adding: ', value)
state.sales += value
})
// Return the state for the next invocation
console.log('Returning state: ', state)
return { state: state }
}
This function code processes the incoming event to aggregate a sales attribute, and return this aggregated result in a state object. In the final invocation, it stores the aggregated value in another DynamoDB table.
I then use this Node.js script to generate random sample order data:
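The script itself is not included in this excerpt; a sketch that mirrors the Kinesis version shown later (the orders table name here is an assumption) could be:
const AWS = require('aws-sdk')
AWS.config.update({ region: 'us-east-1' })
const docClient = new AWS.DynamoDB.DocumentClient()
const TableName = 'tumblingWindowsOrders' // assumed name of the orders table
const ITERATIONS = 100
const SLEEP_MS = 100
let totalSales = 0
function sleep(ms) { return new Promise(resolve => setTimeout(resolve, ms)) }
const createSales = async () => {
  for (let i = 0; i < ITERATIONS; i++) {
    const sales = Math.round(Math.random() * 100)
    totalSales += sales
    console.log({ i, sales, totalSales })
    // Write a single order item; the table's stream then invokes the aggregation function
    await docClient.put({
      TableName,
      Item: {
        ID: Date.now().toString(),
        sales,
        timeStamp: new Date().toString()
      }
    }).promise()
    await sleep(SLEEP_MS)
  }
}
const main = async () => await createSales()
main()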
Once the script is complete, the console shows the individual order transactions and the total sales:
After the tumbling window duration is finished, the second DynamoDB table shows the aggregate values calculated and stored by the Lambda function:
Since aggregation for each shard is independent, the totals are stored by shardId. If I continue to run the test data script, the aggregation function continues to calculate and store more totals per tumbling window period.
Using tumbling windows with Kinesis
Kinesis data streams can also invoke a Lambda function using a tumbling window in a similar way. The biggest difference is that you control how many shards are used in the data stream. Since aggregation occurs per shard, this controls the total number of aggregate results per tumbling window.
Using the same sales example, first I create a Kinesis data stream with one shard. I use the same DynamoDB tables from the previous example, then create a Lambda function with a trigger from the Kinesis data stream. The event source mapping is created with a Tumbling window duration of 30 seconds:
I use the following code in the Lambda function, modified to process the incoming Kinesis data event:
const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.AWS_REGION })
const docClient = new AWS.DynamoDB.DocumentClient()
const TableName = 'tumblingWindowsAggregation'
function isEmpty(obj) {
return Object.keys(obj).length === 0
}
exports.handler = async (event) => {
// Save aggregation result in the final invocation
if (event.isFinalInvokeForWindow) {
console.log('Final: ', event)
const params = {
TableName,
Item: {
windowEnd: event.window.end,
windowStart: event.window.start,
sales: event.state.sales,
shardId: event.shardId
}
}
console.log({ params })
await docClient.put(params).promise()
}
console.log(JSON.stringify(event, null, 2))
// Create the state object on first invocation or use state passed in
let state = event.state
if (isEmpty (state)) {
state = {
sales: 0
}
}
console.log('Existing: ', state)
// Process records with custom aggregation logic
event.Records.map((record) => {
const payload = Buffer.from(record.kinesis.data, 'base64').toString('ascii')
const item = JSON.parse(payload).Item
// Add sales to total
let value = parseFloat(item.sales)
console.log('Adding: ', value)
state.sales += value
})
// Return the state for the next invocation
console.log('Returning state: ', state)
return { state: state }
}
This function code processes the incoming event in the same way as the previous example. I then use this Node.js script to generate random sample order data, modified to put the data on the Kinesis stream:
const AWS = require('aws-sdk')
AWS.config.update({ region: 'us-east-1' })
const kinesis = new AWS.Kinesis()
const StreamName = 'testStream'
const ITERATIONS = 100
const SLEEP_MS = 10
let totalSales = 0
function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
const createSales = async() => {
for (let i = 0; i < ITERATIONS; i++) {
let sales = Math.round (parseFloat(100 * Math.random()))
totalSales += sales
console.log ({i, sales, totalSales})
const data = {
Item: {
ID: Date.now().toString(),
sales,
timeStamp: new Date().toString()
}
}
await kinesis.putRecord({
Data: Buffer.from(JSON.stringify(data)),
PartitionKey: 'PK1',
StreamName
}).promise()
await sleep(SLEEP_MS)
}
}
const main = async() => {
await createSales()
}
main()
Once the script is complete, the console shows the individual order transactions and the total sales:
After the tumbling window duration is finished, the second DynamoDB table shows the aggregate values calculated and stored by the Lambda function:
As there is only one shard in this Kinesis stream, there is only one aggregation value for all the data items in the test.
Conclusion
With tumbling windows, you can calculate aggregate values in near-real time for Kinesis data streams and DynamoDB streams. Unlike existing stream-based invocations, state can be passed forward by Lambda invocations. This makes it easier to calculate sums, averages, and counts on values across multiple batches of data.
In this post, I walk through an example that aggregates sales data stored in Kinesis and DynamoDB. In each case, I create an aggregation function with an event source mapping that uses the new tumbling window duration attribute. I show how state is passed between invocations and how to persist the aggregated value at the end of the tumbling window.
Apache Kafka is an open source event streaming platform used to support workloads such as data pipelines and streaming analytics. It is a distributed streaming platform that is conceptually similar to Amazon Kinesis.
With the launch of Kafka as an event source for Lambda, you can now consume messages from a topic in a Lambda function. This makes it easier to integrate your self-hosted Kafka clusters with downstream serverless workflows.
In this blog post, I explain how to set up an Apache Kafka cluster on Amazon EC2 and configure key elements in the networking configuration. I also show how to create a Lambda function to consume messages from a Kafka topic. Although the process is similar to using Amazon Managed Streaming for Apache Kafka (Amazon MSK) as an event source, there are also some important differences.
Overview
Using Kafka as an event source operates in a similar way to using Amazon SQS or Amazon Kinesis. In all cases, the Lambda service internally polls for new records or messages from the event source, and then synchronously invokes the target Lambda function. Lambda reads the messages in batches and provides the message batches to your function in the event payload.
Lambda is a consumer application for your Kafka topic. It processes records from one or more partitions and sends the payload to the target function. Lambda continues to process batches until there are no more messages in the topic.
Configuring networking for self-hosted Kafka
It’s best practice to deploy the Amazon EC2 instances running Kafka in private subnets. For the Lambda function to poll the Kafka instances, you must ensure that there is a NAT Gateway running in the public subnet of each Region.
It’s possible to route the traffic to a single NAT Gateway in one AZ for test and development workloads. For redundancy in production workloads, it’s recommended that there is one NAT Gateway available in each Availability Zone. This walkthrough creates the following architecture:
Deploy a VPC with public and private subnets and a NAT Gateway that enables internet access. To configure this infrastructure with AWS CloudFormation, deploy this template.
From the VPC console, edit the default security group created by this template to provide inbound access to the following ports:
Custom TCP: ports 2888–3888 from all sources.
SSH (port 22), restricted to your own IP address.
Custom TCP: port 2181 from all sources.
Custom TCP: port 9092 from all sources.
All traffic from the same security group identifier.
Deploying the EC2 instances and installing Kafka
Next, you deploy the EC2 instances using this network configuration and install the Kafka application:
From the EC2 console, deploy an instance running Ubuntu Server 18.04 LTS. Ensure that there is one instance in each private subnet, in different Availability Zones. Assign the default security group configured by the template.
Next, deploy another EC2 instance in either of the public subnets. This is a bastion host used to access the private instances. Assign the default security group configured by the template.
Connect to the bastion host, then SSH to the first private EC2 instance using the method for your preferred operating system. This post explains different methods. Repeat the process in another terminal for the second private instance.
On each instance, download and extract Kafka:
wget http://www-us.apache.org/dist/kafka/2.3.1/kafka_2.12-2.3.1.tgz
tar xzf kafka_2.12-2.3.1.tgz
ln -s kafka_2.12-2.3.1 kafka
Configure and start Zookeeper
Configure and start the Zookeeper service that manages the Kafka brokers:
On the first instance, configure the Zookeeper ID:
cd kafka
mkdir /tmp/zookeeper
touch /tmp/zookeeper/myid
echo "1" >> /tmp/zookeeper/myid
Repeat the process on the second instance, using a different ID value:
cd kafka
mkdir /tmp/zookeeper
touch /tmp/zookeeper/myid
echo "2" >> /tmp/zookeeper/myid
On the first instance, edit the config/zookeeper.properties file, adding the private IP address of the second instance:
initLimit=5
syncLimit=2
tickTime=2000
# list of servers: <ip>:2888:3888
server.1=0.0.0.0:2888:3888
server.2=<<IP address of second instance>>:2888:3888
On the second instance, edit the config/zookeeper.properties file, adding the private IP address of the first instance:
initLimit=5
syncLimit=2
tickTime=2000
# list of servers: <ip>:2888:3888
server.1=<<IP address of first instance>>:2888:3888
server.2=0.0.0.0:2888:3888
On each instance, start Zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
Configure and start Kafka
Configure and start the Kafka broker:
On the first instance, edit the config/server.properties file:
broker.id=1
zookeeper.connect=0.0.0.0:2181,<<IP address of second instance>>:2181
On the second instance, edit the config/server.properties file:
broker.id=2
zookeeper.connect=0.0.0.0:2181,<<IP address of first instance>>:2181
Start Kafka on each instance:
bin/kafka-server-start.sh config/server.properties
At the end of this process, Zookeeper and Kafka are running on both instances. If you use separate terminals, it looks like this:
Configuring and publishing to a topic
Kafka organizes channels of messages around topics, which are virtual groups of one or many partitions across Kafka brokers in a cluster. Multiple producers can send messages to Kafka topics, which can then be routed to and processed by multiple consumers. Producers publish to the tail of a topic and consumers read the topic at their own pace.
From either of the two instances:
Create a new topic called test:
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 2 --partitions 2 --topic test
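To type those messages you need a console producer running against the topic; the standard Kafka command for this (the local broker address here is an assumption) is:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test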
Enter test messages to check for successful publication:
At this point, you can successfully publish messages to your self-hosted Kafka cluster. Next, you configure a Lambda function as a consumer for the test topic on this cluster.
Configuring the Lambda function and event source mapping
You can create the Lambda event source mapping using the AWS CLI or AWS SDK, which provide the CreateEventSourceMapping API. In this walkthrough, you use the AWS Management Console to create the event source mapping.
Create a Lambda function that uses the self-hosted cluster and topic as an event source:
Enter a function name, and select Node.js 12.x as the runtime.
Select the Permissions tab, and select the role name in the Execution role panel to open the IAM console.
Choose Add inline policy and create a new policy called SelfHostedKafkaPolicy with the following permissions. Replace the resource example with the ARNs of your instances:
Choose Create policy and ensure that the policy appears in Permissions policies.
Back in the Lambda function, select the Configuration tab. In the Designer panel, choose Add trigger.
In the dropdown, select Apache Kafka:
For Bootstrap servers, add each of the two instances’ private IPv4 DNS addresses with port 9092 appended.
For Topic name, enter ‘test’.
Enter your preferred batch size and starting position values (see this documentation for more information).
For VPC, select the VPC created by the template.
For VPC subnets, select the two private subnets.
For VPC security groups, select the default security group.
Choose Add.
The trigger’s status changes to Enabled in the Lambda console after a few seconds. It then takes several minutes for the trigger to receive messages from the Kafka cluster.
Testing the Lambda function
At this point, you have created a VPC with two private and public subnets and a NAT Gateway. You have created a Kafka cluster on two EC2 instances in private subnets. You set up a target Lambda function with the necessary IAM permissions. Next, you publish messages to the test topic in Kafka and see the resulting invocation in the logs for the Lambda function.
In the Function code panel, replace the contents of index.js with the following code and choose Deploy:
exports.handler = async (event) => {
// Iterate through keys
for (let key in event.records) {
console.log('Key: ', key)
// Iterate through records
event.records[key].map((record) => {
console.log('Record: ', record)
// Decode base64
const msg = Buffer.from(record.value, 'base64').toString()
console.log('Message:', msg)
})
}
}
Back in the terminal with the producer script running, enter a test message:
In the Lambda function console, select the Monitoring tab then choose View logs in CloudWatch. In the latest log stream, you see the original event and the decoded message:
Using Lambda as event source
The Lambda function target in the event source mapping does not need to be connected to a VPC to receive messages from the private instance hosting Kafka. However, you must provide details of the VPC, subnets, and security groups in the event source mapping for the Kafka cluster.
The Lambda function must have permission to describe VPCs and security groups, and to manage elastic network interfaces. These execution role permissions are:
ec2:CreateNetworkInterface
ec2:DescribeNetworkInterfaces
ec2:DescribeVpcs
ec2:DeleteNetworkInterface
ec2:DescribeSubnets
ec2:DescribeSecurityGroups
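As an illustration, an inline policy granting these permissions could look like the following sketch; the wildcard resource is an assumption, and you may scope the network-interface actions more tightly:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeVpcs",
        "ec2:DeleteNetworkInterface",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups"
      ],
      "Resource": "*"
    }
  ]
}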
The event payload for the Lambda function contains an array of records. Each array item contains details of the topic and Kafka partition identifier, together with a timestamp and base64 encoded message:
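An illustrative payload (all field values here are invented for the example) has roughly this shape:
{
  "eventSource": "SelfManagedKafka",
  "bootstrapServers": "ip-10-0-1-23.ec2.internal:9092,ip-10-0-2-45.ec2.internal:9092",
  "records": {
    "test-0": [
      {
        "topic": "test",
        "partition": 0,
        "offset": 15,
        "timestamp": 1606322694874,
        "timestampType": "CREATE_TIME",
        "value": "SGVsbG8gZnJvbSBLYWZrYQ=="
      }
    ]
  }
}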
There is an important difference in the way the Lambda service connects to the self-hosted Kafka cluster compared with Amazon MSK. MSK encrypts data in transit by default so the broker connection defaults to using TLS. With a self-hosted cluster, TLS authentication is not supported when using the Apache Kafka event source. Instead, if accessing brokers over the internet, the event source uses SASL/SCRAM authentication, which can be configured in the event source mapping:
To learn how to configure SASL/SCRAM authentication for your self-hosted Kafka cluster, see this documentation.
Conclusion
Lambda now supports self-hosted Kafka as an event source so you can invoke Lambda functions from messages in Kafka topics to integrate into other downstream serverless workflows.
This post shows how to configure a self-hosted Kafka cluster on EC2 and set up the network configuration. I also cover how to set up the event source mapping in Lambda and test a function to decode the messages sent from Kafka.
Security has always been a top priority at Grab; our product security team works round the clock to ensure that our customers’ data remains safe. Five years ago, we launched our private bug bounty program on HackerOne, which evolved into a public program in August 2017. The idea was to complement the security efforts our team has been putting in to keep Grab secure. We were a pioneer in South East Asia in implementing a public bug bounty program, and now we stand among the Top 20 programs on HackerOne worldwide.
We started with a private bug bounty program which gave us fantastic results, encouraging us to increase our reach and benefit from the vibrant security community across the globe, which has helped us iron out security issues in our products and infrastructure 24×7. We then launched our bug bounty program publicly, offering competitive rewards; hackers can even earn additional bonuses if their report is well written and displays an innovative approach to testing.
In 2019, we also enrolled in the Google Play Security Reward Program (GPSRP). Offered by Google Play, GPSRP allows researchers to re-submit their resolved mobile security issues directly and earn additional bounties if the report qualifies under the GPSRP rules. A selected number of Android applications are eligible, including Grab’s Android mobile application. Through our participation in GPSRP, we hope to give researchers the recognition they deserve for their efforts.
In this blog post, we share our journey of running a bug bounty program, the challenges involved, and the lessons learned along the way, to help other companies in SEA and beyond establish and build a successful bug bounty program.
Transitioning from Private to a Public Program
At Grab, before starting the private program, we defined the policy and scope, allowing us to communicate the objectives of our bug bounty program and list the targets that can be tested for security issues. We did a security sweep of the targets to eliminate low-hanging security issues, assigned people from the security team to take care of incoming reports, and then launched the program in private mode on HackerOne with a few chosen researchers who had demonstrated a history of quality submissions.
One of the benefits of running a private bug bounty program is having some control over the number of incoming submissions of potential security issues and over which researchers can report them. This ensures the quality of submissions and helps control the volume of bug reports, so that a possibly small security team is not overwhelmed by a deluge of issues. The number of researchers invited to the program is limited, and it is possible to invite researchers with a known track record or with a specific skill set, which further works in the program’s favour.
The results and lessons from our private program were valuable, making our program and processes mature enough to open the bug bounty program to security researchers across the world. We still did another security sweep, reworded the policy, redefined the targets by expanding the scope, and allocated enough folks from our security team to take on the initial inflow of reports which was anticipated to be in tune with other public programs.
Noticeable spike in the number of incoming reports as we went public in July 2017.
Lessons Learned from the Public Program
Although we had been running our bug bounty program in private for some time before going public, we still had not worked much on building standard operating procedures and processes for managing our bug bounty program up until early 2018. Listed below are our key takeaways from 2018 till July 2020 in terms of improvements, challenges, and other insights.
Response Time: No researcher wants to work with a bug bounty team that doesn’t respect the time they put into reporting bugs to the program. We initially didn’t have a formal process around response times, because we wanted to encourage all security engineers to pick up reports. Still, we have been consistently delivering a first response to reports in a matter of hours, which is significantly lower than the top 20 bug bounty programs running on HackerOne. Know what structured (or unstructured) processes work for your team in this area, because your program can see significant rewards from fast response times.
Time to Bounty: In most bug bounty programs the payout for a bug is made in one of the following ways: full payment after the bug has been resolved, full payment after the bug has been triaged, or paying a portion of the bounty after triage and the remaining after resolution. We opt to pay the full bounty after triage. While we’re always working to speed up resolution times, that timeline is in our hands, not the researcher’s. Instead of making them wait, we pay them as soon as impact is determined to incentivize long-term engagement in the program.
Noise Reduction: With HackerOne Triage and Human-Augmented Signal, we’re able to focus our team’s efforts on resolving unique, valid vulnerabilities. Human-Augmented Signal flags any reports that are likely false-positives, and Triage provides a validation layer between our security team and the report inbox. Collaboration with the HackerOne Triage team has been fantastic and ultimately allows us to be more efficient by focusing our energy on valid, actionable reports. In addition, we take significant steps to block traffic coming from networks running automated scans against our Grab infrastructure and we’re constantly exploring this area to actively prevent automated external scanning.
Team Coverage: We introduced a team scheduling process, in which we assign a security engineer (chosen during sprint planning) on a weekly basis, whose sole responsibility is to review and respond to bug bounty reports. We have integrated our systems with HackerOne’s API and PagerDuty to ensure alerts are for valid reports and verified as much as possible.
Looking Ahead
One area where we haven’t been doing too well is ensuring higher rates of participation in our core mobile applications; some of the pain points researchers have told us about while testing our applications are:
Researchers’ accounts are getting blocked due to our anti-fraud checks.
Researchers are not able to register driver accounts (which is understandable, as our driver-partners have to go through a manual verification process).
Researchers who are not residing in the Southeast Asia region are unable to complete end-to-end flows of our applications.
We are open to community feedback and how we can improve. We want to hear from you! Please drop us a note at [email protected] for any program suggestions or feedback.
Last but not least, we’d like to thank all researchers who have contributed to the Grab program so far. Your immense efforts have helped keep Grab’s businesses and users safe. Here’s a shoutout to our program’s top-earning hackers 🏆:
Lastly, here is a special shoutout to @bagipro who has done some great work and testing on our Grab mobile applications!
Well done and from everyone on the Grab team, we look forward to seeing you on the program!
Join us
Grab is more than just the leading ride-hailing and mobile payments platform in Southeast Asia. We use data and technology to improve everything from transportation to payments and financial services across a region of more than 620 million people. We aspire to unlock the true potential of Southeast Asia and look for like-minded individuals to join us on this ride.
If you share our vision of driving South East Asia forward, apply to join our team today.
Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy to create and deliver insights to everyone in your organization. In this post, we explore how authors of QuickSight dashboards can use some of the new chart types, layout options, and dashboard formatting controls to deliver dashboards that intuitively deliver insights to all your users, whether within QuickSight or embedded in your websites or multi-tenant apps.
This blog post explores some of the visualization and dashboard customization features offered in Amazon QuickSight with the following datasets:
In this section, we explore some of the new charts QuickSight introduced in 2020 and how these help with various use cases.
Funnel charts
Funnel charts help visualize the progressive reduction of data as it passes from one phase to another. Data in each of these phases is represented as different portions of 100% (the whole). The most common use of the funnel chart is in visualizing conversion data. For example, you can represent sales lead generation showing different stages of sales conversion from first contact to lead generation.
To build a funnel chart with our Ads dataset, complete the following steps:
On the analysis page, choose Visualize.
Choose Add, then choose Add visual.
In the Visual types pane, choose the funnel chart icon.
For Group by, choose Stage.
For Value, choose Ad name.
To change default configuration, choose the gear icon.
In the Data labels section, for Metric label style, choose Value and percent of first.
The video below demonstrates these steps.
Stacked area charts
Stacked area charts are best used to visualize part-to-whole relationships, to show how each category contributes to the cumulative total. For this post, we create a stacked area chart with the Ads dataset.
On the analysis page, choose Visualize.
Choose Add, then choose Add visual.
In the Visual types pane, choose the stacked area chart icon.
For X axis, choose Date (MONTH).
For Value, choose Cost (Sum).
For Color, choose Segment.
Choose the gear icon.
Under Legend, deselect Legend title.
Under Y-Axis, select Show Y axis label.
Under Data labels, select Show data labels.
Choose your desired position, font size, font color, and label pattern.
Histograms
Histograms help visualize the frequency distribution of a dataset and display numerical data by grouping data into bins of equal width. Each bin is plotted as a bar whose height corresponds to the number of data points within the bin.
For this post, we use the Student Performance dataset to create a histogram.
On the analysis page, choose Visualize.
Choose Add, then choose Add visual.
In the Visual types pane, choose the histogram icon.
For Value, choose math score.
You can customize the histogram to show bins by bin count, bin width, or a custom start value. For this post, we sort by bin width.
Under Histogram, select Bin width.
For Bin width, enter 5.
Box plots
Box plot (also called box or whisker plot) is a standardized way of displaying distribution of data based on a five-number summary (minimum, first quartile (Q1), median, third quartile (Q3), and maximum). This is useful to determine if data is symmetrical, skewed, or tightly grouped. Box plots also show outliers.
For this post, we create a box plot on the Student Performance dataset.
On the analysis page, choose Visualize.
Choose Add, then choose Add visual.
In the Visual types pane, choose the box plot icon.
For Group by, choose Gender.
For Value, choose writing score and reading score.
In the visual settings, under Box plot, select Show outliers and Show all data points.
Under Legend, deselect Show legend title.
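To make the five-number summary concrete, here is a small Python sketch that computes it for a toy set of writing scores. The 1.5 × IQR outlier rule shown is the common convention and is assumed here for illustration; it is not a statement of QuickSight's exact implementation.

```python
import numpy as np

# Toy writing scores; QuickSight computes a summary like this per group.
writing_scores = np.array([44, 52, 57, 61, 63, 66, 70, 72, 74, 78, 81, 88, 95])

q1, median, q3 = np.percentile(writing_scores, [25, 50, 75])
iqr = q3 - q1
# Points beyond 1.5 * IQR from the quartiles are commonly treated as outliers.
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

print(f"min={writing_scores.min()}, Q1={q1}, median={median}, "
      f"Q3={q3}, max={writing_scores.max()}")
print("outliers:", writing_scores[(writing_scores < lower_fence) |
                                  (writing_scores > upper_fence)])
```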
Waterfall charts
Waterfall charts help you understand the cumulative effect of sequentially introduced positive or negative values. This is great for understanding contributions to a whole, for example the main contributors to a monthly increase in revenue, or the breakdown of revenue vs. costs.
We use the P&L dataset to create a waterfall chart.
On the analysis page, choose Visualize.
Choose Add, then choose Add visual.
In the Visual selection pane, choose the waterfall chart icon.
For Category, choose Line item.
For Value, choose Value (Sum).
Under Legend, deselect Show legend title.
For Position, select Bottom.
Under Title, deselect Show title.
Choropleth maps
Choropleth maps use differences in shading or coloring within geographical areas or regions to indicate the value of a variable in those areas.
We use the Patient Info dataset to create a choropleth map.
On the analysis page, choose Visualize.
Choose Add, then choose Add visual.
In the Visual types pane, choose the filled map icon.
For Location, choose State.
For Color, choose Revenue (Sum).
Choose the menu options icon (…) and choose Conditional formatting.
For Column, choose Revenue.
For Fill type, select Gradient.
For Max value, choose a color (for this post, we choose blue).
The video below demonstrates these steps, including how to control the color and shading of the geographic areas using conditional formatting.
Customization and formatting options
QuickSight also supports several formatting options that allow you to streamline visualizations and convey additional information in your dashboards.
Table/pivot table formatting
Pin or unpin totals and add custom total text
You can now pin totals to the top or bottom of tables and pivot tables in QuickSight. This feature helps you view the totals even while scrolling through the tables.
Go to the visual settings (gear icon on the visual menu).
Under Total, select Pin totals.
For Position, choose a position (for this post, we choose Bottom).
Additionally, you can edit the text you want to show on totals and subtotals.
For Total label, enter your custom text (for this post, we enter Grand Total).
Table alignment and wrapping
You can now set horizontal (left, right, center, auto) and vertical (top, middle, bottom) alignment on column headers and cell values in a table visual. Additionally, you can apply text wrapping to table and pivot table headers so that long headers remain readable without having to scroll over the header.
These options are available under Table options.
Hide +/- buttons on pivot tables
You can now show or hide +/- buttons on pivot tables. This allows you to improve presentation of pivot tables by removing these icons and keeping the pivot table simple. This option is available under Styling.
Visual customization options
In this section, we discuss additional customization options in QuickSight.
Custom sorting
If you want to sort your charts in a custom-defined order that differs from the default alphabetical order, you can now do so in QuickSight. For example, you can sort geographical regions in the order East, West, Central, South by ranking these regions 1–4 and then sorting on this rank field. The video below demonstrates how.
You can also sort on any other metric field that isn't part of the visual. Choose your field well and choose Sort options to see the available sort order options.
The following screenshot shows your sorted visualizations.
Adding descriptive text, images, and links
You can add images or logos to your dashboard using QuickSight’s narrative component available in Enterprise Edition.
On the analysis page, choose Visualize.
Choose Add, then choose Add visual.
In the Visual types pane, choose the insights icon.
Choose Customize insight.
Remove any existing text and add your custom text.
You can also add hyperlinks to text and images. Upload your image to a secure location where the image is accessible to QuickSight and your users.
The video below demonstrates these steps.
Customizing colors and fonts
QuickSight offers easy-to-build themes that allow customization of the color palette, background and foreground colors, spacing, fonts, and more. Themes can be created by authors and shared within an account, and are also accessible via APIs for programmatic management. Themes can also be defaulted for all users in the organization using APIs.
You can also prioritize the colors that you want to use in your dashboard by prioritizing them within your theme’s color palette.
You can apply predefined themes available out of the box or create your own themes that fit your corporate branding. The following screenshots show how a dashboard looks in both dark and light themes.
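If you prefer to manage themes programmatically, the following sketch shows roughly what that looks like with the AWS SDK for Python (boto3). The account ID, theme ID, and color values are hypothetical placeholders, and setting the theme as the account-wide default is shown only as a commented-out call.

```python
import boto3

qs = boto3.client("quicksight", region_name="us-east-1")
ACCOUNT_ID = "111122223333"  # hypothetical account ID

# Create a corporate theme based on the built-in MIDNIGHT (dark) theme.
qs.create_theme(
    AwsAccountId=ACCOUNT_ID,
    ThemeId="corporate-dark",
    Name="Corporate dark",
    BaseThemeId="MIDNIGHT",
    Configuration={
        "DataColorPalette": {
            # Colors are applied to series in priority order.
            "Colors": ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"],
            "EmptyFillColor": "#dddddd",
        },
        "UIColorPalette": {
            "PrimaryBackground": "#101820",
            "PrimaryForeground": "#f2f2f2",
            "Accent": "#ff7f0e",
        },
        "Sheet": {
            "Tile": {"Border": {"Show": False}},
        },
    },
)

# Optionally make it the account-wide default theme (assumes an account
# customization already exists; otherwise use create_account_customization).
# qs.update_account_customization(
#     AwsAccountId=ACCOUNT_ID,
#     AccountCustomization={
#         "DefaultTheme":
#             f"arn:aws:quicksight:us-east-1:{ACCOUNT_ID}:theme/corporate-dark"
#     },
# )
```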
You may occasionally have null values in your data and want to represent nulls with different annotations. For each of the dimensions and metrics in the dataset, you can provide custom text for null values. Go to More formatting options for any field.
The option is available under Null values.
Reference lines
You can draw reference lines based on a calculated field or a constant value. Choose the gear icon and navigate to the Reference lines section. In the following screenshot, the orange reference line is based on a calculation (average profit) and the black reference line is plotted at a constant value of 1,000.
You can also link reference lines to parameters via a calculated field, which allows you to create what-if scenarios within your dashboard.
Custom colors on heat and tree maps
Color gradient customization on heat and tree maps allows you to select colors for lower, intermediate, and upper limits so that the gradient is applied within these colors. You can configure this under Color in the visual settings.
Using logarithmic scale
If your metric values span different orders of magnitude, with some extremely high and others very low (for example, stock prices for different entities or pandemic spread rates), you can represent them on a logarithmic scale so they're normalized yet comparable. To use the logarithmic scale, go to the visual settings and under Y-Axis, for Scale, select Logarithmic.
The following screenshot shows your visualization after applying logarithmic scale.
Adjustable font size
You can now apply different font sizes to all visual content and visual titles. In pivot tables and tables, you can set font sizes for table headers, cells, totals, and subtotals. In key performance indicators (KPIs), you can set font sizes for primary and comparison values, which allows you to keep dashboards dense and add more KPIs.
Actions
Finally, to all these charts, you can apply the following actions:
Filter actions – Select points on a chart to filter across the dashboard. QuickSight supports hierarchical filter actions that allow you to trigger one filter action from more than one chart. For more information, see Enhancing dashboard interactivity with Amazon QuickSight Actions.
URL actions – Trigger navigation from the dashboard to an external website and pass dynamic values within a URL.
Layout enhancements
QuickSight dashboards default to auto-fit mode, which makes them responsive based on screen size. However, in many situations, it’s preferable that the view you design is exactly what end-users see, whether on a laptop or a large monitor. QuickSight offers optimized layouts that allow you to pick a specific screen resolution to optimize for (such as the screen size most of your users use on a daily basis), and QuickSight automatically scales the dashboard view to render appropriately on larger or smaller screens. This doesn’t affect mobile devices—QuickSight automatically optimizes for mobile devices using a single-column layout. To adjust the scaling mode, choose Settings in the navigation pane while in dashboard authoring (analysis) mode.
If you build your dashboard for a 1024 px screen, for example, QuickSight scales that view to a larger or smaller screen to ensure that all users see the same content (mobile devices continue to fall back to a single-column, mobile-specific layout to ensure usability). Opting for the optimized mode also makes sure that your email reports look exactly like the dashboard that your viewers interact with.
On-sheet filter controls
You can now add filters to your dashboard directly without having to create parameters. Choose the field that you need to filter on and choose Add filter for this field. Choose the newly added filter and choose Add to sheet.
If you need to pin it to the controls section, choose the filter and choose Pin to top.
The video below demonstrates these steps.
QuickSight allows you to choose from any of these control types to add to dashboards: single-select drop-downs, multi-select drop-downs, date and time pickers, single-sided sliders, single-line text boxes, time range pickers, relative date selections, and numeric range sliders. To learn more about on-sheet controls, see the related blog post.
Other launches in 2020
While this blog post covers the key charting and visualization launches in 2020, you can take a look at the new features launched across other areas of QuickSight in the related blog post.
Conclusion
With these new QuickSight feature releases, you can now choose the chart type that is best suited to represent your data. You can provide richer dashboards for your readers by using the new formatting table options, dynamic titles, and reference lines. For more information about authoring dashboards in QuickSight, watch the virtual workshop Build Advanced Analytics and Dashboards with Amazon QuickSight and consider subscribing to the Amazon QuickSight YouTube channel for the latest training and feature walkthroughs.
About the Author
Sapna Maheshwari is a Specialist Solutions Architect for Amazon QuickSight. She is passionate about telling stories with data. In her previous roles at American Express and Early Warning Services, she managed and led several projects in the data and analytics space. She enjoys helping customers unearth actionable insights from their data.
AWS Launch Wizard is a console-based service to quickly and easily size, configure, and deploy third-party applications, such as Microsoft SQL Server Always On and HANA-based SAP systems, on AWS without the need to identify and provision individual AWS resources. AWS Launch Wizard offers an easy way to deploy enterprise applications and optimize costs. Instead of selecting and configuring separate infrastructure services, you go through a few steps in AWS Launch Wizard and it deploys a ready-to-use application on your behalf. It reduces the time you need to spend investigating how to provision, configure, and estimate costs for your application on AWS.
You can now use AWS Launch Wizard to deploy and configure self-managed Microsoft Windows Server Active Directory Domain Services running on Amazon Elastic Compute Cloud (EC2) instances. With Launch Wizard, you can have fully-functioning, production-ready domain controllers within a few hours—all without having to manually deploy and configure your resources.
You can use AWS Directory Service to run Microsoft Active Directory (AD) as a managed service, without the hassle of managing your own infrastructure. If you need to run your own AD infrastructure, you can use AWS Launch Wizard to simplify the deployment and configuration process.
In this post, I walk through creation of a cross-region Active Directory domain using Launch Wizard. First, I deploy a single Active Directory domain spanning two regions. Then, I configure Active Directory Sites and Services to match the network topology. Finally, I create a user account to verify replication of the Active Directory domain.
Figure 1: Diagram of resources deployed in this post
Prerequisites
You must have a VPC in your home Region and in each remote Region, and the VPC CIDRs must not overlap. If you need to create VPCs and subnets that do not overlap, refer to the Amazon VPC documentation.
Each subnet used must have outbound internet connectivity. You can use either a NAT gateway or an internet gateway.
The VPCs must be peered in order to complete the steps in this post. For information on creating a VPC peering connection between Regions, refer to the VPC peering documentation; a scripted sketch follows this list.
If you choose to deploy your domain controllers to a private subnet, you must have an RDP jump/bastion instance set up so that you can RDP to your instances.
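If you need to set up the peering connection programmatically, the following boto3 sketch illustrates one way to do it. The Region names match this post, but the VPC IDs, route table ID, and CIDR are hypothetical placeholders.

```python
import boto3

HOME_REGION, REMOTE_REGION = "us-east-1", "us-west-2"
HOME_VPC = "vpc-0123456789abcdef0"    # hypothetical home-region VPC ID
REMOTE_VPC = "vpc-0fedcba9876543210"  # hypothetical remote-region VPC ID

ec2_home = boto3.client("ec2", region_name=HOME_REGION)
ec2_remote = boto3.client("ec2", region_name=REMOTE_REGION)

# Request a cross-Region peering connection from the home VPC to the remote VPC.
pcx_id = ec2_home.create_vpc_peering_connection(
    VpcId=HOME_VPC, PeerVpcId=REMOTE_VPC, PeerRegion=REMOTE_REGION
)["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# Accept the request in the remote Region once it becomes visible there.
ec2_remote.get_waiter("vpc_peering_connection_exists").wait(
    VpcPeeringConnectionIds=[pcx_id]
)
ec2_remote.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# Each VPC's route tables also need a route to the other VPC's CIDR via the
# peering connection, for example:
# ec2_home.create_route(RouteTableId="rtb-0abc...", DestinationCidrBlock="10.1.0.0/16",
#                       VpcPeeringConnectionId=pcx_id)
```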
Deploy Your Domain Controllers in the Home Region using Launch Wizard
In this section, I deploy the first set of domain controllers into us-east-1, the home region, using Launch Wizard. Throughout this post, I refer to us-east-1 as the home region and us-west-2 as the remote region.
Assign a Controller IP address for each domain controller
Remote Desktop Gateway preferences: Disregard for now; this is set up later.
Check the I confirm that a public subnet has been set up. Each of the selected private subnets have outbound connectivity enabled check box.
Select Next.
In the Define infrastructure requirements page, set the following inputs.
Storage and compute: Based on infrastructure requirements
Number of AD users: Up to 5000 users
Select Next.
In the Review and deploy page, review your selections. Then, select Deploy.
Note that it may take up to 2 hours for your domain to be deployed. Once the status has changed to Completed, you can proceed to the next section, where I prepare Active Directory Sites and Services for the second set of domain controllers in the other region.
Configure Active Directory Sites and Services
In this section, I configure the Active Directory Sites and Services topology to match my network topology. This step ensures proper Active Directory replication routing so that domain clients can find the closest domain controller. For more information on Active Directory Sites and Services, please refer here.
Retrieve your Administrator Credentials from Secrets Manager
From the AWS Secrets Manager Console in us-east-1, select the Secret that begins with LaunchWizard-UsEast1AD.
In the middle of the Secret page, select Retrieve secret value.
This displays the username and password keys with their values.
You need these credentials when you RDP into one of the domain controllers in the next steps.
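If you prefer to pull the credentials with the AWS SDK instead of the console, a minimal boto3 sketch follows. It assumes the secret name still begins with LaunchWizard-UsEast1AD and that the secret string is JSON with username and password keys, as shown above.

```python
import json
import boto3

sm = boto3.client("secretsmanager", region_name="us-east-1")

# Find the Launch Wizard secret by name prefix (unpaginated sketch; use a
# paginator if your account has many secrets).
match = next(
    s for s in sm.list_secrets()["SecretList"]
    if s["Name"].startswith("LaunchWizard-UsEast1AD")
)

secret = json.loads(sm.get_secret_value(SecretId=match["ARN"])["SecretString"])
print(secret["username"])  # the password is in secret["password"]
```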
Rename the Default First Site
Log in to one of the domain controllers in us-east-1.
Select Start, type dssite and hit Enter on your keyboard.
The Active Directory Sites and Services MMC should appear.
Expand Sites. There is a site named Default-First-Site-Name.
Right click on Default-First-Site-Name and select Rename.
Enter us-east-1 as the name.
Leave the Active Directory Sites and Services MMC open for the next set of steps.
Create a New Site and Subnet Definition for US-West-2
Using the Active Directory Sites and Services MMC from the previous steps, right click on Sites.
Select New Site… and enter the following inputs:
Name: us-west-2
Select DEFAULTIPSITELINK.
Select OK.
A pop-up appears telling you that some additional configuration is needed. Select OK.
Expand Sites and right click on Subnets and select New Subnet.
Enter the following information:
Prefix: the CIDR of your us-west-2 VPC, for example 10.1.0.0/24
Site: select us-west-2
Select OK.
Leave the Active Directory Sites and Services MMC open for the following set of steps.
Configure Site Replication Settings
Using the Active Directory Sites and Services MMC from the previous steps, expand Sites, then Inter-Site Transports, and select IP. You should see an object named DEFAULTIPSITELINK.
Right click on DEFAULTIPSITELINK.
Select Properties. Set or verify the following inputs on the General tab:
In the DEFAULTIPSITELINK Properties dialog, select the Attribute Editor tab and modify the following:
Scroll down and double-click the attribute you want to modify. Enter 1 for the Value, then select OK twice.
For more information on these settings, please refer here.
Close the Active Directory Sites and Services MMC, as it is no longer needed.
Deploy Your Domain Controllers in the Remote Region Using Launch Wizard
In this section, I deploy the second set of domain controllers into us-west-2, the remote region, using Launch Wizard. The inputs mirror the home region deployment.
Assign a Controller IP address for each domain controller
Remote Desktop Gateway preferences: disregard for now, as I set this later.
Check the I confirm that a public subnet has been set up. Each of the selected private subnets have outbound connectivity enabled check box.
In the Define infrastructure requirements page set the following:
Storage and compute: Based on infrastructure requirements
Number of AD users: Up to 5000 users
In the Review and deploy page, review your selections. Then, select Deploy.
Note that it may take up to 2 hours to deploy the domain controllers. Once the status has changed to Completed, proceed to the next section, where I modify the security group for the new domain controllers.
Prepare Your Remote Region Domain Controllers Security Group
In this section, I modify the Domain Controllers Security Group in us-west-2. This allows the domain controllers in us-east-1 and us-west-2 to communicate with each other (a scripted sketch follows these steps).
Select the Domain Controllers Security Group that was created by your Launch Wizard Active Directory deployment. The Security Group name should start with LaunchWizard-UsWest2AD-EC2ADStackExistingVPC-.
Select Edit inbound rules.
Choose Add rule and enter the following:
Type: Select All traffic
Protocol: All
Port range: All
Source: Select Custom
Enter the CIDR of the peered VPC. An example would be 10.0.0.0/24
Choose Save rules.
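The following boto3 sketch is a scripted equivalent of these steps; the security group ID and CIDR are hypothetical placeholders that you would replace with your own values.

```python
import boto3

REMOTE_VPC_CIDR = "10.0.0.0/24"      # CIDR of the peered VPC (placeholder)
SG_ID = "sg-0123456789abcdef0"       # LaunchWizard-UsWest2AD-... security group ID

ec2 = boto3.client("ec2", region_name="us-west-2")
ec2.authorize_security_group_ingress(
    GroupId=SG_ID,
    IpPermissions=[{
        "IpProtocol": "-1",          # all traffic, all ports
        "IpRanges": [{
            "CidrIp": REMOTE_VPC_CIDR,
            "Description": "Allow AD traffic from the peered VPC",
        }],
    }],
)
```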
Create an AD User and Verify Replication
In this section, I create a user in one region and verify that it replicated to the other region. I also use AD replication diagnostics tools to verify that replication is working properly.
Create a Test User Account
Log in to one of the domain controllers in us-east-1.
Select Start, type dsa and press Enter on your keyboard. The Active Directory Users and Computers MMC should appear.
Right click on the Users container and select New > User.
Enter the following inputs:
First name: John
Last name: Doe
User logon name: jdoe and select Next
Password and Confirm password: Your choice of complex password
Uncheck User must change password at next logon
Select Next.
Select Finish.
Verify Test User Account Has Replicated
Log in to one of the domain controllers in us-west-2.
Select Start, type dsa and press Enter on your keyboard. The Active Directory Users and Computers MMC should appear.
Select Users. You should see a user object named John Doe.
Note that if the user is not present, it may not have been replicated yet. Replication should not take longer than 60 seconds from when the item was created.
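If you'd rather verify replication from a script than from the MMC, here is a minimal sketch using the third-party ldap3 Python library. The domain name, domain controller host name, and credentials are hypothetical placeholders; in practice you would use the credentials retrieved from Secrets Manager.

```python
from ldap3 import ALL, NTLM, Connection, Server

# Hypothetical DC host and domain; substitute your own values.
server = Server("dc1.us-west-2.example.corp", get_info=ALL)
conn = Connection(
    server,
    user="EXAMPLE\\admin",
    password="REPLACE_ME",
    authentication=NTLM,
    auto_bind=True,
)

# Ask the us-west-2 domain controller for the user created in us-east-1.
conn.search(
    "DC=example,DC=corp",
    "(sAMAccountName=jdoe)",
    attributes=["cn", "whenCreated"],
)
print(conn.entries or "Not replicated yet - try again in a minute.")
conn.unbind()
```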
Summary
Congratulations, you have created a cross-region Active Directory! In this post you:
Launched a new Active Directory forest in us-east-1 using AWS Launch Wizard.
Configured Active Directory Sites and Service for a multi-region configuration.
Launched a set of new domain controllers in the us-west-2 region using AWS Launch Wizard.
Created a test user and verified replication.
This post only touches on a couple of the features available in the AWS Launch Wizard Active Directory deployment. AWS Launch Wizard also automates the creation of a single-tier PKI infrastructure and trust creation. One of the prime benefits of this solution is the simplicity of deploying a fully functional Active Directory environment in just a few clicks. You no longer need to do the undifferentiated heavy lifting required to deploy Active Directory. For more information, please refer to the AWS Launch Wizard documentation.
Toward the end of the second incident that Volexity worked involving Dark Halo, the actor was observed accessing the e-mail account of a user via OWA. This was unexpected for a few reasons, not least of which was that the targeted mailbox was protected by MFA. Logs from the Exchange server showed that the attacker provided username and password authentication as normal but was not challenged for a second factor through Duo. The logs from the Duo authentication server further showed that no attempts had been made to log into the account in question. Volexity was able to confirm that session hijacking was not involved and, through a memory dump of the OWA server, could also confirm that the attacker had presented a cookie tied to a Duo MFA session named duo-sid.
Volexity’s investigation into this incident determined that the attacker had accessed the Duo integration secret key (akey) from the OWA server. This key then allowed the attacker to derive a pre-computed value to be set in the duo-sid cookie. After successful password authentication, the server evaluated the duo-sid cookie and determined it to be valid. This allowed an attacker with knowledge of a user account and password to completely bypass the MFA set on the account. It should be noted that this is not a vulnerability in the MFA provider; it underscores the need to change all secrets associated with key integrations, such as those with an MFA provider, following a breach.
Again, this is not a Duo vulnerability. From ArsTechnica:
While the MFA provider in this case was Duo, it just as easily could have involved any of its competitors. MFA threat modeling generally doesn’t include a complete system compromise of an OWA server. The level of access the hacker achieved was enough to neuter just about any defense.
On November 26, version 6.1 of GNU Octave, a language and environment for numerical computing, was released. There are several new features and enhancements in this release, including improvements to graphics output, better communication with web services, and over 40 new functions. We will take a look at where Octave fits into the landscape of numerical tools for scientists and engineers, and recount some of its long history.