
ConexED uses Amazon QuickSight to empower its institutional partners by unifying and curating powerful insights using engagement data

Post Syndicated from Michael Gorham original https://aws.amazon.com/blogs/big-data/conexed-uses-amazon-quicksight-to-empower-its-institutional-partners-by-unifying-and-curating-powerful-insights-using-engagement-data/

This post was co-written with Michael Gorham, Co-Founder and CTO of ConexED.

ConexED is one of the country’s fastest-growing EdTech companies designed specifically for education to enhance the student experience and elevate student success. Founded as a startup in 2008 to remove obstacles that hinder student persistence and access to student services, ConexED provides advisors, counselors, faculty, and staff in all departments across campus the tools necessary to meet students where they are.

ConexED offers a student success and case management platform, HUB Kiosk – Queuing System, and now a business intelligence (BI) dashboard powered by Amazon QuickSight to empower its institutional partners.

ConexED strives to make education more accessible by providing tools that make it easy and convenient for all students to connect with the academic support services that are vital to their success in today’s challenging and ever-evolving educational environment. ConexED’s student- and user-friendly interface makes online academic communications intuitive and as personalized as face-to-face encounters, while also making on-campus meetings as streamlined and well-reported as online meetings.

One of the biggest obstacles facing school administrators is getting meaningful data quickly so that informed, data-driven decisions can be made. Reports can be time-consuming to produce, so they are often generated infrequently, which leads to outdated data. In addition, reporting often lacks customization, and data is typically captured in spreadsheets, which don’t provide a visual representation of the information that is easy to interpret. ConexED has always offered robust reporting features, but the problem was that in providing this kind of data for our partners, our development team was spending more than half its time creating custom reporting for the constantly increasing breadth of data the ConexED system generates.

Every new feature we build requires at least two or three new reports – and therefore more of our development team’s time. After we implemented QuickSight, not only can ConexED’s development team focus all its energies on building competitive features and accelerating product rollout, but the reporting and data visualization are now features our customers can control and customize. QuickSight features such as drill-down filtering, predictive forecasting, and aggregation insights have given us the competitive edge that our customers expect from a modern, cloud-based solution.

New technology enables strategic planning

With QuickSight, we’re able to focus on building customer-facing solutions that capture data rather than spending a large portion of our development time solving data visualization and custom report problems. Our development team no longer has to spend its time creating reports for all the data generated, and our customers don’t need to wait. Partnering with QuickSight has enabled ConexED to develop its business intelligence dashboard, which is designed to create operational efficiencies, identify opportunities, and empower institutions by connecting critical data insights to cross-campus student support services. ConexED’s BI dashboard, powered by QuickSight, analyzes collected information in real time, allowing our partners to project trends for the coming school year using predictive analytics to improve staff efficiency, enhance the student experience, and increase rates of retention and graduation.

The following image demonstrates heat mapping, which displays the recurring days and times when student requests for support services are most frequent, with the busiest hour segments appearing more saturated in color. This enables leadership to utilize staff efficiently so that students have the support services they need, when they need them, on their pathway to graduation. ConexED’s BI dashboard powered by QuickSight makes this kind of information possible so that our partners can plan strategically.

QuickSight dashboards allow our customers to drill down on the data to glean even more insights of what is happening on their campus. In the following example, the pie chart depicts a whole-campus view of meetings by department, but leadership can choose one of the colored segments to drill down further for more information about a specific department. Whatever the starting point, leadership now has the ability to access more specific, real-time data to understand what’s happening on their campus or any part of it.

Dashboards provide data visualization

Our customers have been extremely impressed with our QuickSight dashboards because they provide data visualizations that make the information easier to comprehend and parse. The dynamic, interactive nature of the dashboards allows ConexED’s partners to go deeper into the data with just a click of the mouse, which immediately generates new data, and therefore new visuals, based on what was clicked.

With QuickSight, not only can we programmatically display boilerplate dashboards based on role type, but we can also allow our clients to branch off these dashboards and customize the reporting to their liking. The development team is now able to move quickly to build interesting features that ingest data and easily provide insightful visualizations and reports on the gathered data. ConexED’s BI dashboard powered by QuickSight enables leadership at our partner institutions to understand how users engage with support services on their campus – when they meet, why they meet, how they meet – so that they can make informed decisions to improve student engagement and services.

The right people with the right information

In education, giving the right level of data access to the right people is essential. With intuitive row- and column-level security and anonymous tagging in QuickSight, the ConexED development team was able to quickly build visualizations that correctly display partitioned data to thousands of different users with varying levels of access across our client base.
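
To illustrate the pattern, the following minimal boto3 sketch (our illustration, not ConexED’s actual code) generates an embed URL for an anonymous QuickSight user and passes session tags that drive tag-based row-level security; the account ID, dashboard ID, and tag key are placeholders.

    import boto3

    quicksight = boto3.client("quicksight", region_name="us-east-1")

    # Hypothetical identifiers; replace with real values.
    ACCOUNT_ID = "111122223333"
    DASHBOARD_ID = "engagement-dashboard"
    DASHBOARD_ARN = f"arn:aws:quicksight:us-east-1:{ACCOUNT_ID}:dashboard/{DASHBOARD_ID}"

    def get_embed_url_for_institution(institution_id: str) -> str:
        """Return a QuickSight embed URL scoped to one institution's rows.

        Assumes the dataset has a RowLevelPermissionTagConfiguration whose tag key
        is "institution_id", so QuickSight filters rows to the tagged value for
        this anonymous session.
        """
        response = quicksight.generate_embed_url_for_anonymous_user(
            AwsAccountId=ACCOUNT_ID,
            Namespace="default",
            SessionLifetimeInMinutes=60,
            AuthorizedResourceArns=[DASHBOARD_ARN],
            ExperienceConfiguration={
                "Dashboard": {"InitialDashboardId": DASHBOARD_ID}
            },
            SessionTags=[{"Key": "institution_id", "Value": institution_id}],
        )
        return response["EmbedUrl"]

Because the session tag must match the tag key configured on the dataset’s row-level security, a single dashboard definition can safely serve many different institutions and user groups.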

At ConexED, student success is paramount, and with QuickSight powering our BI dashboard, the right people get the right data, and our institutional customers can now easily analyze vast amounts of data to identify trends in student acquisition, retention, and completion rates. They can also solve student support staffing allocation problems and improve the student experience at their institutions.

QuickSight does the heavy lifting

The ability to securely pull and aggregate data from disparate sources with very little setup work has given ConexED a head start on the predictive analytics space in the EdTech market. Now building visualizations is intuitive, insightful, and fun. In fact, the development team built an internal QuickSight dashboard to view our own customers’ QuickSight usage in only one day. The data visualization combinations are seemingly endless and infinitely valuable to our customers.

ConexED’s partnership with AWS has enabled us to use QuickSight to drive our BI dashboard and provide our customers with the power and information needed for today’s dynamic modern student support services teams.


About the Author

Michael Gorham is Co-Founder and CTO of ConexED. Michael is a multidisciplinary software architect with over 20 years of experience.

SEEK Asia modernizes search with CI/CD and Amazon OpenSearch Service

Post Syndicated from Fabian Tan original https://aws.amazon.com/blogs/big-data/seek-asia-modernizes-search-with-ci-cd-and-amazon-opensearch-service/

This post was written in collaboration with Abdulsalam Alshallah (Salam), Software Architect, and Hans Roessler, Principal Software Engineer at SEEK Asia.

SEEK is a market leader in online employment marketplaces with deep and rich insights into the future of work. As a global business, SEEK has a presence in Australia, New Zealand, Hong Kong, Southeast Asia, Brazil, and Mexico, and its websites attract over 400 million visits per year. SEEK Asia’s business operates across seven countries, includes leading portal brands such as jobsdb.com and jobstreet.com, and leverages data and technology to create innovative solutions for candidates and hirers.

In this post, we share how SEEK Asia modernized their search-based system with a continuous integration and continuous delivery (CI/CD) pipeline and Amazon OpenSearch Service (successor to Amazon Elasticsearch Service).

Challenges associated with a self-managed search system

SEEK Asia provides a search-based system that enables employers to manage interactions between hirers and candidates. Although the system was already on AWS, it was a self-managed system running on Amazon Elastic Compute Cloud (Amazon EC2) with limited automation.

The self-managed system posed several challenges:

  • Slower release cycles – Deploying new configurations or new field mappings into the Elasticsearch cluster was a high-risk activity because changes affected the stability of the system. The limited automation around both the self-managed cluster and its workflows led to slower release cycles.
  • Higher operational overhead – Sizing the cluster to deliver greater performance while managing cost effectively was another challenge. As with every other distributed system, even with sizing guidance, identifying the appropriate number of shards per node and the number of nodes needed to meet performance requirements still required some trial and error, turning the exercise into a tedious and time-consuming activity. This also contributed to slower release cycles. To overcome this challenge, on many occasions, oversizing the cluster became the quickest way to achieve the desired time to market, at the expense of cost.

Further challenges the team faced with self-managing their own Elasticsearch cluster included keeping up with new security patches, and minor and major platform upgrades.

Automating search delivery with Amazon OpenSearch Service

SEEK Asia knew that automation would be the key to solving the challenges of their existing search service. Automating the undifferentiated heavy lifting would enable them to deliver more value to their customers quickly and improve staff productivity.

With the problems defined, the team set out to solve the challenges by automating the following:

  • Search infrastructure deployment
  • Search A/B testing infrastructure deployment
  • Redeployment of search infrastructure for any new infrastructure configuration (such as security patches or platform upgrades) and index mapping updates

The key services enabling the automation would be Amazon OpenSearch Service and establishing a search infrastructure CI/CD pipeline.

Architecture overview

The following diagram illustrates the architecture of the SEEK infrastructure and CI/CD pipeline with Amazon OpenSearch Service.

The workflow includes the following steps:

  1. Before the workflow kicks off, an existing Amazon OpenSearch Service cluster is hydrated by a live feeder. The live feeder is a serverless application built on Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and AWS Lambda. Amazon SQS queues documents for processing, Amazon SNS enables data fanout (if required), and a Lambda function is invoked to process messages in the SQS queue and import the data into Amazon OpenSearch Service. The feeder receives live updates for changes that need to be reflected on the cluster. Write concurrency to Amazon OpenSearch Service is managed by limiting the number of concurrent Lambda function invocations.
  2. The Amazon OpenSearch Service index mapping is version controlled in SEEK’s Git repository. Whenever an update to the index mapping is committed, the CI/CD pipeline kicks off a new Amazon OpenSearch Service cluster provisioning workflow.
  3. As part of the workflow, a new data hydration initialization feeder is deployed. The initialization feeder construct is similar to the live feeder, with one additional component: a script that runs within the CI/CD pipeline to calculate the number of batches required to hydrate the newly provisioned Amazon OpenSearch Service cluster up to a specific timestamp. The feeder systems were designed for idempotent processing: unique identifiers (UIDs) from the source data stores are reused for each document, so a duplicated document simply updates the existing document with the exact same values.
  4. At the same time as Step 3, a new Amazon OpenSearch Service cluster is deployed. To temporarily accelerate the initial data hydration, the new cluster may be sized two or three times larger than sizing guidance suggests, with shard replicas and the index refresh interval disabled until hydration is complete. The existing Amazon OpenSearch Service cluster remains as is, which means that two clusters run concurrently.
  5. The script inspects the number of documents in the source data store and groups the documents into batches. After running numerous experiments, SEEK identified that 1,000 documents per batch provided the optimal ingestion time.
  6. Each batch is represented as one message and is queued into Amazon SQS via Amazon SNS. Every message that lands in Amazon SQS invokes a Lambda function. The Lambda function queries a separate data store, builds the document, and loads it into Amazon OpenSearch Service. The more messages that go into the queue, the more functions are invoked. To create baselines that allowed for further indexing optimization, the team took the following configurations into consideration and iterated to achieve higher ingestion performance (a minimal sketch of such a feeder function follows this list):
    1. Memory of the Lambda function
    2. Size of batch
    3. Size of each document in the batch
    4. Size of cluster (memory, vCPU, and number of primary shards)
  7. With the initialization feeder running, new documents are streamed to the cluster until it is synced with the data source. Eventually, the newly provisioned Amazon OpenSearch Service cluster catches up and is in the same state as the existing cluster. The hydration is complete when there are no remaining messages in the SQS queue.
  8. The initialization feeder is deleted and the Amazon OpenSearch Service cluster is downsized automatically to complete the deployment workflow, with replica shards created and the index refresh interval configured.
  9. Live search traffic is routed to the newly provisioned cluster when A/B testing is enabled via the API layer built on Application Load Balancer, Amazon Elastic Container Service (Amazon ECS), and Amazon CloudFront. The API layer decouples the client interface from the backend implementation that runs on Amazon OpenSearch Service.
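
The following is a minimal sketch of what such a feeder function could look like (an illustration under assumed names, not SEEK’s production code). It consumes SQS messages and bulk-loads documents into Amazon OpenSearch Service; for brevity it assumes each message body already carries the documents, whereas SEEK’s function queries a separate data store to build them. The domain endpoint, index name, and message format are placeholders.

    import json
    import os

    import boto3
    from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection, helpers

    # Hypothetical configuration; in practice these come from the deployment.
    HOST = os.environ["OPENSEARCH_ENDPOINT"]
    INDEX = os.environ.get("INDEX_NAME", "jobs")
    REGION = os.environ.get("AWS_REGION", "ap-southeast-1")

    credentials = boto3.Session().get_credentials()
    client = OpenSearch(
        hosts=[{"host": HOST, "port": 443}],
        http_auth=AWSV4SignerAuth(credentials, REGION),
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
    )

    def handler(event, context):
        """Triggered by SQS: each record carries one batch of documents to index."""
        actions = []
        for record in event["Records"]:
            batch = json.loads(record["body"])  # one message = one batch
            for doc in batch["documents"]:
                actions.append({
                    "_op_type": "index",
                    "_index": INDEX,
                    "_id": doc["uid"],  # reuse the source UID so loads stay idempotent
                    "_source": doc,
                })
        # Bulk-index the whole batch in a single round trip to the cluster.
        helpers.bulk(client, actions)
        return {"indexed": len(actions)}

Because each document reuses its source UID as the OpenSearch document ID, reprocessing a message simply overwrites the same documents with identical values, which is what keeps the load idempotent.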

Improved time to market and other outcomes

With Amazon OpenSearch Service, SEEK was able to automate the deployment of an entire cluster, complete with Kibana, in a secure, managed environment. If testing didn’t produce the desired results, the team could resize the cluster horizontally or vertically using different instance offerings within minutes. This enabled them to perform stress tests quickly to identify the sweet spot between performance and cost of the workload.

“By integrating Amazon OpenSearch Service with our existing CI/CD tools, we’re able to fully automate our search function deployments, which accelerated software delivery time,” says Abdulsalam Alshallah, APAC Software Architect. “The newly found confidence in the modern stack, alongside improved engineering practices, allowed us to mitigate the risk of changes—improving our time to market by 89% with zero impact to uptime.”

With the adoption of Amazon OpenSearch Service, other teams also saw improvements, including the following:

  • Common Vulnerabilities and Exposures (CVE) findings have dropped to zero, with Amazon OpenSearch Service handling the underlying hardware security updates on SEEK’s behalf, improving their security posture
  • Improved availability with the Amazon OpenSearch Service Availability Zone awareness feature

Conclusion

The managed capabilities of Amazon OpenSearch Service have helped SEEK Asia improve the customer experience through speed and automation. By removing the undifferentiated heavy lifting, teams can deploy changes quickly to their search engines, allowing customers to get the latest search features faster and ultimately contributing to the SEEK purpose of helping people live more productive working lives and organisations succeed.

To learn more about Amazon OpenSearch Service, see Amazon OpenSearch Service features, the Developer Guide, or Introducing OpenSearch.


About the Authors

Fabian Tan is a Principal Solutions Architect at Amazon Web Services. He has a strong passion for software development, databases, data analytics and machine learning. He works closely with the Malaysian developer community to help them bring their ideas to life.

Hans Roessler is a Principal Software Architect at SEEK Asia. He is excited about new technologies and about upgrading legacy systems to modern stacks. Staying in touch with the latest technologies is one of his passions.

Abdulsalam Alshallah (Salam) is a Software Architect at SEEK. Previously a Lead Cloud Architect for SEEK Asia, Salam has always been excited about new technologies, cloud, serverless, and DevOps, in addition to his passion for eliminating wasted time, effort, and resources. He is also one of the leaders of the AWS User Group Malaysia.

Lucerna Health uses Amazon QuickSight embedded analytics to help healthcare customers uncover new insights

Post Syndicated from David Atkins original https://aws.amazon.com/blogs/big-data/lucerna-health-uses-amazon-quicksight-embedded-analytics-to-help-healthcare-customers-uncover-new-insights/

This is a guest post by Lucerna Health. Founded in 2018, Lucerna Health is a data technology company that connects people and data to deliver value-based care (VBC) results and operational transformation.

At Lucerna Health, data is at the heart of our business. Every day, we use clinical, sales, and operational data to help healthcare providers and payers grow and succeed in the value-based care (VBC) environment. Through our HITRUST CSF® certified Healthcare Data Platform, we support payer-provider integration, health engagement, database marketing, and VBC operations.

As our business grew, we found that faster real-time analysis and reporting capabilities through our platform were critical to success. However, that was a challenge for our data analytics team, which was busier than ever developing our proprietary data engine and data model. No matter how many dashboards we built, we knew we could never keep up with user demand with our previous BI solutions. We needed a more scalable technology that could grow as our customer base continued to expand.

In this post, we will outline how Amazon QuickSight helped us overcome these challenges.

Embedding analytics with QuickSight

We had rising demand for business intelligence (BI) from our customers, and we needed a better tool to help us keep pace: one that met our security requirements, could be covered under a comprehensive business associate agreement (BAA), and met HIPAA and other privacy standards. We were using several other BI solutions internally for impromptu analysis and reporting, but we realized we needed a fully embedded solution to provide more automation and an integrated experience within our Healthcare Data Platform. After trying out a different solution, we discovered it wasn’t cost-effective for us. That’s when we turned our attention to AWS.

Three years ago, we decided to go all-in on AWS, implementing a range of AWS services for compute, storage, and networking. Today, each of the building blocks we have in our IT infrastructure runs on AWS. For example, we use Amazon Redshift, AWS Glue, and Amazon EMR for our Spark data pipelines, data lake, and data analytics. Because of our all-in approach, we were pleased to find that AWS had a BI platform called QuickSight. QuickSight is a powerful and cost-effective BI service that offers a strong feature set, including self-service BI capabilities and interactive dashboards, and we liked the idea of continuing to be all-in on AWS by implementing this service.

One of the QuickSight features we were most excited about was its ability to embed analytics deep within our Healthcare Data Platform. With QuickSight’s embedded analytics capabilities, we were able to integrate dashboards directly into our own platform. For example, we offer our customers a portal where they can register a new analytical dashboard through our user interface. That interface connects to the QuickSight application programming interface (API) to enable embedding in a highly configurable and secure way.
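
For registered platform users, the embedding call can look like the following minimal boto3 sketch (our illustration, not Lucerna Health’s actual code); the account ID, dashboard ID, and user ARN are placeholders.

    import boto3

    quicksight = boto3.client("quicksight", region_name="us-east-1")

    # Hypothetical identifiers; replace with values from your own account.
    ACCOUNT_ID = "111122223333"
    DASHBOARD_ID = "clinical-operations-dashboard"
    USER_ARN = f"arn:aws:quicksight:us-east-1:{ACCOUNT_ID}:user/default/analyst@example.com"

    def get_dashboard_embed_url() -> str:
        """Return a short-lived URL that renders the dashboard inside the portal."""
        response = quicksight.generate_embed_url_for_registered_user(
            AwsAccountId=ACCOUNT_ID,
            UserArn=USER_ARN,
            SessionLifetimeInMinutes=60,
            ExperienceConfiguration={
                "Dashboard": {"InitialDashboardId": DASHBOARD_ID}
            },
        )
        return response["EmbedUrl"]

The returned URL is short-lived and tied to the registered user’s permissions, so the portal can hand it straight to the browser without exposing any credentials.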

With this functionality, our customers can ingest and visualize complex healthcare data, such as clinical data from electronic medical record (EMR) systems, eligibility and claims data, and CRM and digital interaction data. Our Insights data model is projected into QuickSight’s high-performance in-memory calculation engine, enabling fast analysis on massive datasets.

Creating a developer experience for customers

We have also embedded the QuickSight console into our platform. Through this approach, our healthcare data customers can build their own datasets and quickly share that data with a wider group of users through our platform. This gives our customers a developer experience that enables them to customize and share analytical reports with their colleagues. With only a few clicks, users can aggregate and compare data from their sales and EMR solutions.

QuickSight has also improved collaboration for our own teams when it comes to custom reports. In the past, teams could only do monthly or specialized reports, spending a lot of time building them, downloading them as PDFs, and sending them out to clients as slides. It was a time-consuming and inefficient way to share data. Now, our users can get easy access to data from previously siloed sources, and then simply publish reports and share access to that data immediately.

Helping healthcare providers uncover new insights

Because healthcare providers now have centralized data at their fingertips, they can make faster and more strategic decisions. For instance, management teams can look at dashboards on our platform to see updated demand data to plan more accurate staffing models. We’ve also created patient and provider data models that provide a 360-degree view of patient and payer data, increasing visibility. Additionally, care coordinators can reprioritize tasks and take action if necessary because they can view gaps in care through the dashboards. Armed with this data, care coordinators can work to improve the patient experience at the point of care.

Building and publishing reports twice as fast

QuickSight is a faster BI solution than anything we’ve used before. We can now craft a new dataset, apply permissions to it, build out an analysis, and publish and share it in a report twice as fast as we could before. The solution also gives our developers a better overall experience. For rapid development and deployment at scale, QuickSight performs extremely well at a very competitive price.

Because QuickSight is a serverless solution, we no longer need to worry about our BI overhead. With our previous solution, we had a lot of infrastructure, maintenance, and licensing costs. We have eliminated those challenges by implementing QuickSight. This is a key benefit because we’re an early stage company and our lean product development team can now focus on innovation instead of spinning up servers.

As our platform has become more sophisticated over the past few years, QuickSight has introduced a vast number of great features for data catalog management, security, ML integrations, and look and feel that have really improved on our original solution’s BI capabilities. We look forward to continuing to use this powerful tool to help our customers get more out of their data.


About the Authors

David Atkins is the Co-Founder & Chief Operating Officer at Lucerna Health. Before co-founding Lucerna Health in 2018, David held multiple leadership roles in healthcare organizations, including spending six years at Centene Corporation as the Corporate Vice President of Enterprise Data and Analytic Solutions. Additionally, he served as the Provider Network Management Director at Anthem. When he isn’t spending time with his family, he can be found on the ski slopes or admiring his motorcycle, which he never rides.

Adriana Murillo is the Co-Founder & Chief Marketing Officer at Lucerna Health. Adriana has been involved in the healthcare industry for nearly 20 years. Before co-founding Lucerna Health, she founded Andes Unite, a marketing firm primarily serving healthcare provider organizations and health insurance plans. In addition, Adriana held leadership roles across market segment leadership, product development, and multicultural marketing at not-for-profit health solutions company Florida Blue. Adriana is a passionate cook who loves creating recipes and cooking for her family.

Accelo uses Amazon QuickSight to accelerate time to value in delivering embedded analytics to professional services businesses

Post Syndicated from Mahlon Duke original https://aws.amazon.com/blogs/big-data/accelo-uses-amazon-quicksight-to-accelerate-time-to-value-in-delivering-embedded-analytics-to-professional-services-businesses/

This is a guest post by Accelo. In their own words, “Accelo is the leading cloud-based platform for managing client work, from prospect to payment, for professional services companies. Each month, tens of thousands of Accelo users across 43 countries create more than 3 million activities, log 1.2 million hours of work, and generate over $140 million in invoices.”

Imagine driving a car with a blacked-out windshield. It sounds terrifying, but it’s the way things are for most small businesses. While they look into the rear-view mirror to see where they’ve been, they lack visibility into what’s ahead of them. The lack of real-time data and reliable forecasts leaves critical decisions like investment, hiring, and resourcing to “gut feel.” An industry survey conducted by Accelo shows 67% of senior leaders don’t have visibility into team utilization, and 54% of them can’t track client project budgets, much less profitability.

Professional services businesses generate most of their revenue directly from billable work they do for clients every day. Because no two clients, projects, or team members are the same, real-time and actionable insight is paramount to ensure happy clients and a successful, profitable business. A big part of the problem is that many businesses are trying to manage their client work with a cocktail of different, disconnected systems. No wonder KPMG found that 56% of CEOs have little confidence in the integrity of the data they’re using for decision-making.

Accelo’s mission is to solve this problem by giving businesses an integrated system to manage all their client work, from prospect to payment. By combining what have historically been disparate parts of the business—CRM, sales, project management, time tracking, client support, and billing—Accelo becomes the single source of truth for your business’s most important data.

Even with a trustworthy, automated and integrated system, decision makers still need to harness the data so they see what’s in front of them and can anticipate for the future. Accelo devoted all our resources and expertise to building a complete client work management platform, made up of essential products to achieve the greatest profitability. We recognized that in order to make the platform most effective, users needed to be empowered with the strongest analytics and actionable insights for strategic decision making. This drove us to seek out a leading BI solution that could seamlessly integrate with our platform and create the greatest user experience. Our objective was to ensure that Accelo users had access to the best BI tool without requiring them to spend more of their valuable time learning yet another tool – not to mention another login. We needed a powerful embedded analytics solution.

We evaluated dozens of leading BI and embedded reporting solutions, and Amazon QuickSight was the clear winner. In this post, we discuss why, and how QuickSight accelerated our time to value in delivering embedded analytics to our users.

Data drives insights

Even today, many organizations track their work manually. They extract data from different systems that don’t talk to each other, and manually manipulate it in spreadsheets, which wastes time and introduces the kinds of data integrity problems that cause CEOs to lose their confidence. As companies grow, these manual and error-prone approaches don’t scale with them, and the sheer level of effort required to keep data up to date can easily result in leaders just giving up.

With this in mind, Accelo’s embedded analytics solution was built from the ground up to grow with us and with our users. As a part of the AWS family, QuickSight eliminated one of the biggest hurdles for embedded analytics through its SPICE storage system. SPICE enables us to create unlimited, purpose-built datasets that are hosted in Amazon’s dynamic storage infrastructure. These smaller datasets load more quickly than your typical monolithic database, and can be updated as often as we need, all at an affordable per-gigabyte rate. This allows us to provide real-time analytics to our users swiftly, accurately, and economically.
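
As a rough illustration of how such a refresh can be automated (not Accelo’s actual implementation), the boto3 sketch below starts a SPICE ingestion for a dataset and polls until it finishes; the account and dataset IDs are placeholders.

    import time
    import uuid

    import boto3

    quicksight = boto3.client("quicksight", region_name="us-west-2")

    ACCOUNT_ID = "111122223333"          # placeholder account
    DATASET_ID = "client-work-metrics"   # placeholder SPICE dataset

    def refresh_spice_dataset() -> str:
        """Start a full SPICE refresh and wait for it to complete."""
        ingestion_id = str(uuid.uuid4())
        quicksight.create_ingestion(
            AwsAccountId=ACCOUNT_ID,
            DataSetId=DATASET_ID,
            IngestionId=ingestion_id,
            IngestionType="FULL_REFRESH",
        )
        while True:
            status = quicksight.describe_ingestion(
                AwsAccountId=ACCOUNT_ID,
                DataSetId=DATASET_ID,
                IngestionId=ingestion_id,
            )["Ingestion"]["IngestionStatus"]
            if status in ("COMPLETED", "FAILED", "CANCELLED"):
                return status
            time.sleep(15)

In practice a refresh like this would be scheduled or triggered after new data lands, so each small, purpose-built dataset stays current without manual intervention.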

“Being able to rely on Accelo to tell us everything about our projects saves us a lot of time, instead of having to go in and download a lot of information to create a spreadsheet to do any kind of analysis,” says Katherine Jonelis, Director of Operations, MHA Consulting. “My boss loves the dashboards. He loves just being able to look at that and instantly know, ‘Here’s where we are.’”

In addition to powering analytics for our users, QuickSight also helps our internal teams identify and track vital KPIs, which historically was done via third-party apps. These metrics can cover anything, from calculating the effective billable rate across hundreds of projects and thousands of time entries, to determining how much time is left for the team to finish their tasks profitably and on budget. Because the reports are embedded directly in Accelo, which already houses all the data, it was easy for our team to adapt to the new reports, and they required minimal training.

Integrated vs. embedded

One of the most important factors in our evaluation of BI platforms was the time to value. We asked ourselves two questions: How long would it take to have the solution up and running, and how long would it take for our users to see value from it?

While there are plenty of powerful third-party, integrated BI products out there, they often require a complete integration, adding authentication and configuration on top of basic data extraction and transformations. This makes them an unattractive option, especially in an increasingly security-focused landscape. Meanwhile, most of the embedded products we evaluated required a time to launch that numbered in the months—spending time on infrastructure, data sources, and more. And that’s without considering the infrastructure and engineering costs of ongoing maintenance. One key benefit that propels QuickSight above other products is that it allowed us to reduce that setup time from months to weeks, and completely eliminated any configuration work for the end-user. This is possible thanks to built-in tools like native connections for AWS data sources, row-level security for datasets, and a simple user provisioning process.
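
As one hedged example of what that provisioning can look like (not Accelo’s exact code), the sketch below registers a reader in a QuickSight namespace with boto3; the account ID, namespace, and email are placeholders.

    import boto3

    quicksight = boto3.client("quicksight", region_name="us-west-2")

    ACCOUNT_ID = "111122223333"  # placeholder account

    def provision_reader(email: str, namespace: str = "default") -> str:
        """Register a QuickSight reader so embedded dashboards can be shared with them."""
        response = quicksight.register_user(
            AwsAccountId=ACCOUNT_ID,
            Namespace=namespace,
            IdentityType="QUICKSIGHT",  # QuickSight-managed identity; IAM is also possible
            Email=email,
            UserName=email,
            UserRole="READER",
        )
        return response["User"]["Arn"]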

Developer hours can be expensive, and are always in high demand. Even in a responsive and agile development environment like Accelo’s, development work still requires lead time before it can be scheduled and completed. Engineering resources are also finite—if they’re working on one thing today, something else is probably going into the backlog. QuickSight enables us to eliminate this bottleneck by shifting the task of managing these analytics from developers to data analysts. We used QuickSight to easily create datasets and reports, and placed a simple API call to embed them for our clients so they can start using them instantly. Now we’re able to quickly respond to our users’ ever-changing needs without requiring developers. That further improves the speed and quality of our data by using both the analysts’ general expertise with data visualization and their unique knowledge of Accelo’s schema. Today, all of Accelo’s reports are created and deployed through QuickSight. We’re able to accommodate dozens of custom requests each month for improvements—major and minor—without ever needing to involve a developer.

Implementation and training were also key considerations during our evaluation. Our customers are busy running their businesses. The last thing they want is to get trained on a new tool, not to mention the typically high cost associated with implementation. As a turnkey solution that requires no configuration and minimal education, QuickSight was the clear winner.

Delivering value in an agile environment

It’s no secret that employees dislike timesheets and would rather spend time working with their clients. For many services companies, logged time is how they bill their clients and get paid. Therefore, it’s vital that employees log all their hours. To make that process as painless as possible, Accelo offers several tools that minimize the amount of work it takes an employee to log their time. For example, the Auto Scheduling tool automatically builds out employees’ schedules based on the work they’re assigned, and logs their time with a single click. Inevitably, however, someone always forgets to log their time, leading to lost revenue.

To address this issue, Accelo built the Missing Time report, which pulls hundreds of thousands of time entries, complex work schedules, and even holiday and PTO time together to offer answers to these questions: Who hasn’t logged their time? How much time is missing? And from what time period?

Every business needs to know whether they’re profitable. Professional services businesses are unique in that profitability is tied directly to their individual clients and the relationships with them. Some clients may generate high revenues but require so much extra maintenance that they become unprofitable. On the other hand, low-profile clients that don’t require a lot of attention can significantly contribute to the business’s bottom line. With all the client data under one roof, these centralized and embedded reports provide visibility into budgets, time entries, work status, and team utilization. This makes it possible to take real-time, data-driven action without having to spend all day gathering the data.

Summary

Clean and holistic data fosters deep insights that can lead to higher margins and profits. We’re excited to partner with AWS and QuickSight to provide professional services businesses with real-time insights into their operations so they can become truly data-driven, effortlessly. Learn more about Accelo and Amazon QuickSight Embedded Analytics!


About the Authors

Mahlon Duke, Accelo Product Manager of BI and Data.

Geoff McQueen, Accelo Founder and CEO.

How GE Aviation built cloud-native data pipelines at enterprise scale using the AWS platform

Post Syndicated from Alcuin Weidus original https://aws.amazon.com/blogs/big-data/how-ge-aviation-built-cloud-native-data-pipelines-at-enterprise-scale-using-the-aws-platform/

This post was co-written with Alcuin Weidus, Principal Architect from GE Aviation.

GE Aviation, an operating unit of GE, is a world-leading provider of jet and turboprop engines, as well as integrated systems for commercial, military, business, and general aviation aircraft. GE Aviation has a global service network to support these offerings.

From the turbosupercharger to the world’s most powerful commercial jet engine, GE’s history of powering the world’s aircraft features more than 90 years of innovation.

In this post, we share how GE Aviation built cloud-native data pipelines at enterprise scale using the AWS platform.

A focus on the foundation

At GE Aviation, we’ve been invested in the data space for many years. Witnessing the customer value and business insights that could be extracted from data at scale has propelled us forward. We’re always looking for new ways to evolve, grow, and modernize our data and analytics stack. In 2019, this meant moving from a traditional on-premises data footprint (with some specialized AWS use cases) to a fully AWS Cloud-native design. We understood the task was challenging, but we were committed to its success. We saw the tremendous potential in AWS, and were eager to partner closely with a company that has over a decade of cloud experience.

Our goal from the outset was clear: build an enterprise-scale data platform to accelerate and connect the business. Using the best of cloud technology would set us up to deliver on our goal and prioritize performance and reliability in the process. From an early point in the build, we knew that if we wanted to achieve true scale, we had to start with solid foundations. This meant first focusing on our data pipelines and storage layer, which serve as the ingest point for hundreds of source systems. Our team chose Amazon Simple Storage Service (Amazon S3) as our foundational data lake storage platform.

Amazon S3 was the first choice because it provides an optimal foundation for a data lake store, delivering virtually unlimited scalability and 11 nines of durability. In addition to its scalable performance, it has ease-of-use features, native encryption, and access control capabilities. Equally important, Amazon S3 integrates with a broad portfolio of AWS services, such as Amazon Athena, the AWS Glue Data Catalog, AWS Glue ETL (extract, transform, and load), Amazon Redshift, Amazon Redshift Spectrum, and many third-party tools, providing a growing ecosystem of data management tools.

How we started

The journey started with an internal hackathon that brought cross-functional team members together. We organized around an initial design and established an architecture to start the build using serverless patterns. A combination of Amazon S3, AWS Glue ETL, and the Data Catalog were central to our solution. These three services in particular aligned to our broader strategy to be serverless wherever possible and build on top of AWS services that were experiencing heavy innovation in the way of new features.

We felt good about our approach and promptly got to work.

Solution overview

Our cloud data platform built on Amazon S3 is fed from a combination of enterprise ELT systems. We have an on-premises system that handles change data capture (CDC) workloads and another that works more in a traditional batch manner.

Our design has the on-premises ELT systems dropping files into an S3 bucket set up to receive raw data for both situations. We made the decision to standardize our processed data layer into Apache Parquet format for our cataloged S3 data lake in preparation for more efficient serverless consumption.

Our enterprise CDC system can already land files natively in Parquet; however, our batch files are limited to CSV, so the landing of CSV files triggers another serverless process to convert these files to Parquet using AWS Glue ETL.

The following diagram illustrates this workflow.
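
To make the conversion step concrete, here is a minimal AWS Glue job of the kind that could perform it (a sketch under assumed job parameters, not GE Aviation’s actual script); the source and target paths are passed in as job arguments.

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # Hypothetical job arguments: the raw CSV drop location and the Parquet landing path.
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])

    sc = SparkContext()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the raw CSV batch file and rewrite it as Parquet for efficient serverless consumption.
    df = spark.read.option("header", "true").csv(args["source_path"])
    df.write.mode("append").parquet(args["target_path"])

    job.commit()

The job is triggered by the landing of a CSV file, so each batch drop produces a matching Parquet file alongside the raw data, ready for the downstream mirror load.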

When raw data is present and ready in Apache Parquet format, we have an event-triggered solution that processes the data and loads it to another mirror S3 bucket (this is where our users access and consume the data).

Pipelines are developed to support loading at a table level. We have specific AWS Lambda functions to identify schema errors by comparing each file’s schema against the last successful run. Another function validates that a necessary primary key file is present for any CDC tables.

Data partitioning and CDC updates

When our preprocessing Lambda functions are complete, the files are processed in one of two distinct paths based on the table type. Batch table loads are by far the simpler of the two and are handled via a single Lambda function.

For CDC tables, we use AWS Glue ETL to load and perform the updates against our tables stored in the mirror S3 bucket. The AWS Glue job uses Apache Spark data frames to combine historical data, filter out deleted records, and union with any records inserted. For our process, updates are treated as delete-then-insert. After performing the union, the entire dataset is written out to the mirror S3 bucket in a newly created bucket partition.

The following diagram illustrates this workflow.
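
As a rough illustration of that delete-then-insert pattern (our sketch, not the actual GE Aviation job), the PySpark fragment below merges a CDC batch into the historical data and writes the result to a new partition; the column names, op codes, and paths are assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("cdc-merge-sketch").getOrCreate()

    # Assumed layout: "uid" is the primary key and "op" marks insert/update/delete changes.
    history = spark.read.parquet("s3://mirror-bucket/sales_orders/load_dt=2021-10-01/")
    changes = spark.read.parquet("s3://raw-bucket/sales_orders/cdc_batch/")

    changed_keys = changes.select("uid").distinct()

    # Delete-then-insert: drop every historical row that appears in the CDC batch...
    surviving_history = history.join(changed_keys, on="uid", how="left_anti")

    # ...then union back only the inserts and updates (deleted records are filtered out).
    upserts = changes.filter(F.col("op") != "D").drop("op")
    merged = surviving_history.unionByName(upserts)

    # Write the entire dataset into a newly created partition.
    merged.write.mode("overwrite").parquet("s3://mirror-bucket/sales_orders/load_dt=2021-10-02/")

Writing the merged result to a fresh partition, rather than updating files in place, is what provides the read consistency described in the next paragraph.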

We write data into a new partition for each table load, so we can provide read consistency in a way that makes sense to our consuming business partners.

Building the Data Catalog

When each Amazon S3 mirror data load is complete, another separate serverless branch is triggered to handle catalog management.

The branch updates the location property within the catalog for pre-existing tables, indicating each newly added partition. When loading a table for the first time, we trigger a series of purpose-built Lambda functions to create the AWS Glue Data Catalog database (only required when it’s an entirely new source schema), create an AWS Glue crawler, start the crawler, and delete the crawler when it’s complete.

The following diagram illustrates this workflow.
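
A hedged sketch of that first-time-load branch, expressed as sequential boto3 calls rather than the chain of purpose-built Lambda functions GE Aviation uses, might look like the following; the database, crawler, role, and path names are placeholders.

    import time

    import boto3

    glue = boto3.client("glue")

    def onboard_new_table(database: str, table_path: str, crawler_role_arn: str) -> None:
        """Create the catalog database (if new), crawl the mirror location once, then clean up."""
        try:
            glue.create_database(DatabaseInput={"Name": database})
        except glue.exceptions.AlreadyExistsException:
            pass  # only required when it's an entirely new source schema

        crawler_name = f"onboard-{database}"
        glue.create_crawler(
            Name=crawler_name,
            Role=crawler_role_arn,
            DatabaseName=database,
            Targets={"S3Targets": [{"Path": table_path}]},
        )
        glue.start_crawler(Name=crawler_name)

        # Wait for the one-off crawl to finish, then delete the crawler.
        while True:
            time.sleep(30)
            if glue.get_crawler(Name=crawler_name)["Crawler"]["State"] == "READY":
                break
        glue.delete_crawler(Name=crawler_name)

In the actual pipeline these steps run as separate event-driven Lambda functions rather than one synchronous routine, which is what keeps the catalog management fully automated.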

These event-driven design patterns allow us to fully automate the catalog management piece of our architecture, which became a big win for our team because it lowered the operational overhead associated with onboarding new source tables. Every achievement like this mattered because it realized the potential the cloud had to transform how we build and support products across our technology organization.

Final implementation architecture and best practices

The solution evolved several times throughout the development cycle, typically as we learned something new about serverless and cloud-native development or worked further with AWS Solutions Architects and AWS Professional Services teams. Along the way, we discovered many cloud-native best practices and accelerated our serverless data journey on AWS.

The following diagram illustrates our final architecture.

We strategically added Amazon Simple Queue Service (Amazon SQS) between purpose-built Lambda functions to decouple the architecture. Amazon SQS gave our system a level of resiliency and operational observability that otherwise would have been a challenge.

Another best practice arose from using Amazon DynamoDB as a state table to help ensure our entire serverless integration pattern was writing to our mirror bucket with ACID guarantees.

On the topic of operational observability, we use Amazon EventBridge to capture and report on operational metadata like table load status, time of the last load, and row counts.
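
A minimal example of emitting that kind of operational metadata (illustrative only; the event source, detail type, and fields are assumptions) could look like this:

    import json
    from datetime import datetime, timezone

    import boto3

    events = boto3.client("events")

    def publish_load_status(table_name: str, status: str, row_count: int) -> None:
        """Emit a table-load event so downstream monitoring can track pipeline health."""
        events.put_events(
            Entries=[
                {
                    "Source": "data-platform.mirror-loader",  # assumed source name
                    "DetailType": "TableLoadStatus",
                    "EventBusName": "default",
                    "Detail": json.dumps({
                        "table": table_name,
                        "status": status,  # e.g. SUCCEEDED or FAILED
                        "rowCount": row_count,
                        "loadedAt": datetime.now(timezone.utc).isoformat(),
                    }),
                }
            ]
        )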

Bringing it all together

At the time of writing, we’ve had production workloads running through our solution for the better part of 14 months.

Production data is integrated from more than 30 source systems at present and totals several hundred tables. This solution has given us a great starting point for building our cloud data ecosystem. The flexibility and extensibility of AWS’s many services have been key to our success.

Appreciation for the AWS Glue Data Catalog has been an essential element. Without knowing it at the time we started building a data lake, we’ve been embracing a modern data architecture pattern and organizing around our transactionally consistent and cataloged mirror storage layer.

The introduction of a more seamless Apache Hudi experience within AWS has been a big win for our team. We’ve been busy incorporating Hudi into our CDC transaction pipeline and are thrilled with the results. We’re able to spend less time writing code managing the storage of our data, and more time focusing on the reliability of our system. This has been critical in our ability to scale. Our development pipeline has grown beyond 10,000 tables and more than 150 source systems as we approach another major production cutover.

Looking ahead, we’re intrigued by the potential for AWS Lake Formation governed tables to further accelerate our momentum and management of CDC table loads.

Conclusion

Building our cloud-native integration pipeline has been a journey. What started as an idea has turned into much more in a brief time. It’s hard to appreciate how far we’ve come when there’s always more to be done. That being said, the entire process has been extraordinary. We built deep and trusted partnerships with AWS, learned more about our internal value statement, and aligned more of our organization to a cloud-centric way of operating.

The ability to build solutions in a serverless manner opens up many doors for our data function and, most importantly, our customers. Speed to delivery and the pace of innovation is directly related to our ability to focus our engineering teams on business-specific problems while trusting a partner like AWS to do the heavy lifting of data center operations like racking, stacking, and powering servers. It also removes the operational burden of managing operating systems and applications with managed services. Finally, it allows us to focus on our customers and business process enablement rather than on IT infrastructure.

The breadth and depth of data and analytics services on AWS make it possible to solve our business problems by using the right resources to run whatever analysis is most appropriate for a specific need. AWS Data and Analytics has deep integrations across all layers of the AWS ecosystem, giving us the tools to analyze data using any approach quickly. We appreciate AWS’s continual innovation on behalf of its customers.


About the Authors

Alcuin Weidus is a Principal Data Architect for GE Aviation. Serverless advocate, perpetual data management student, and cloud native strategist, Alcuin is a data technology leader on a team responsible for accelerating technical outcomes across GE Aviation. Connect with him on LinkedIn.

Suresh Patnam is a Senior Solutions Architect at AWS. He works with customers to build their IT strategy and make digital transformation through the cloud more accessible, focusing on big data, data lakes, and AI/ML. In his spare time, Suresh enjoys playing tennis and spending time with his family. Connect with him on LinkedIn.

How Roche democratized access to data with Google Sheets and Amazon Redshift Data API

Post Syndicated from Dr. Yannick Misteli original https://aws.amazon.com/blogs/big-data/how-roche-democratized-access-to-data-with-google-sheets-and-amazon-redshift-data-api/

This post was co-written with Dr. Yannick Misteli, João Antunes, and Krzysztof Wisniewski from the Roche global Platform and ML engineering team as the lead authors.

Roche is a Swiss multinational healthcare company that operates worldwide. Roche is the largest pharmaceutical company in the world and the leading provider of cancer treatments globally.

In this post, Roche’s global Platform and machine learning (ML) engineering team discuss how they used the Amazon Redshift Data API to democratize access to the data in their Amazon Redshift data warehouse with Google Sheets (gSheets).

Business needs

Go-To-Market (GTM) is the domain that lets Roche understand customers and create and deliver valuable services that meet their needs. This lets them get a better understanding of the health ecosystem and provide better services for patients, doctors, and hospitals. It extends beyond health care professionals (HCPs) to a larger Healthcare ecosystem consisting of patients, communities, health authorities, payers, providers, academia, competitors, etc. Data and analytics are essential to supporting our internal and external stakeholders in their decision-making processes through actionable insights.

For this mission, Roche embraced the modern data stack and built a scalable solution in the cloud.

Driving true data democratization requires not only providing business leaders with polished dashboards or data scientists with SQL access, but also addressing the requirements of business users that need the data. For this purpose, most business users (such as Analysts) leverage Excel—or gSheet in the case of Roche—for data analysis.

Providing access to data in Amazon Redshift to these gSheets users is a non-trivial problem. Without a powerful and flexible tool that lets data consumers use self-service analytics, most organizations will not realize the promise of the modern data stack. To solve this problem, we want to empower every data analyst who doesn’t have an SQL skillset with a means by which they can easily access and manipulate data in the applications that they are most familiar with.

The Roche GTM organization uses the Redshift Data API to simplify the integration between gSheets and Amazon Redshift, and thus facilitate the data needs of their business users for analytical processing and querying. The Amazon Redshift Data API lets you painlessly access data from Amazon Redshift with all types of traditional, cloud-native, containerized, serverless, web service-based, and event-driven applications. The Data API simplifies data access, ingest, and egress from languages supported by the AWS SDK, such as Python, Go, Java, Node.js, PHP, Ruby, and C++, so that you can focus on building applications as opposed to managing infrastructure. The process they developed using the Amazon Redshift Data API has significantly lowered the barrier to entry for new users without any data warehousing experience.
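
For comparison with the Apps Script integration shown later, a minimal Python (boto3) sketch of the same Data API flow looks like the following; the cluster identifier, database, secret ARN, and SQL are placeholders.

    import time

    import boto3

    redshift_data = boto3.client("redshift-data", region_name="eu-west-1")

    def run_query(sql: str):
        """Submit a query through the Data API and return its result rows."""
        statement = redshift_data.execute_statement(
            ClusterIdentifier="dev-cluster",  # placeholder cluster
            Database="dev",                   # placeholder database
            SecretArn="arn:aws:secretsmanager:eu-west-1:111122223333:secret:dev_user",  # placeholder
            Sql=sql,
        )
        statement_id = statement["Id"]

        # The Data API is asynchronous: poll until the statement finishes.
        while True:
            description = redshift_data.describe_statement(Id=statement_id)
            if description["Status"] in ("FINISHED", "FAILED", "ABORTED"):
                break
            time.sleep(1)

        if description["Status"] != "FINISHED":
            raise RuntimeError(description.get("Error", "query failed"))
        return redshift_data.get_statement_result(Id=statement_id)["Records"]

    rows = run_query("SELECT country, COUNT(*) FROM gtm.interactions GROUP BY country;")

The Apps Script add-on described below issues the equivalent ExecuteStatement call as a signed REST request, since the AWS SDKs themselves cannot run on the Apps Script platform.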

Use-Case

In this post, you will learn how to integrate Amazon Redshift with gSheets to pull data sets directly back into gSheets. These mechanisms are facilitated through the use of the Amazon Redshift Data API and Google Apps Script. Google Apps Script is a programmatic way of manipulating and extending gSheets and the data that they contain.

Architecture

Because Apps Script is natively a cloud-based JavaScript platform, it is possible to include publicly available JS libraries such as jQuery QueryBuilder.

The jQuery QueryBuilder library facilitates the creation of standard SQL queries via a simple-to-use graphical user interface. With a query in place, the Redshift Data API can be used to retrieve the data directly into gSheets. The following diagram illustrates the overall process from a technical standpoint:

Even though Apps Script is, in fact, JavaScript, the AWS-provided SDKs for Node.js and the browser cannot be used on the Google platform, because they require specific properties that are native to the underlying infrastructure. It is still possible to authenticate and access AWS resources through the available API calls. Here is an example of how to achieve that.

You can use an access key ID and a secret access key to authenticate the requests to AWS by using the code in the link example above. We recommend following the least privilege principle when granting access to this programmatic user, or assuming a role with temporary credentials. Since each user will require a different set of permissions on the Redshift objects—database, schema, and table—each user will have their own user access credentials. These credentials are safely stored under the AWS Secrets Manager service. Therefore, the programmatic user needs a set of permissions that enable them to retrieve secrets from the AWS Secrets Manager and execute queries against the Redshift Data API.

Code example for AppScript to use Data API

In this section, you will learn how to pull existing data back into a new gSheets document. This section will not cover how to parse the data from the jQuery QueryBuilder library, as that is not within the main scope of the article. The QueryBuilder library itself can be included in the add-on's HTML with a standard script tag:

<script src="https://cdn.jsdelivr.net/npm/jQuery-QueryBuilder/dist/js/query-builder.standalone.min.js"></script>    
  1. In the AWS console, go to Secrets Manager and create a new secret to store the database credentials to access the Redshift Cluster: username and password. These will be used to grant Redshift access to the gSheets user.
  2. In the AWS console, create a new IAM user with programmatic access, and generate the corresponding Access Key credentials. The only set of policies required for this user is to be able to read the secret created in the previous step from the AWS Secrets Manager service and to query the Redshift Data API.

    Below is the policy document:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": [
            "secretsmanager:GetSecretValue",
            "secretsmanager:DescribeSecret"
          ],
          "Resource": "arn:aws:secretsmanager:*::secret:*"
        },
        {
          "Sid": "VisualEditor1",
          "Effect": "Allow",
          "Action": "secretsmanager:ListSecrets",
          "Resource": "*"
        },
        {
          "Sid": "VisualEditor2",
          "Effect": "Allow",
          "Action": "redshift-data:*",
          "Resource": "arn:aws:redshift:*::cluster:*"
        }
      ]
    }

  3. Access the Google Apps Script console. Create an aws.gs file with the code available here. This will let you perform authenticated requests to the AWS services by providing an access key and a secret access key.
  4. Initialize the AWS variable by providing the access key and secret access key created in step 2:
    AWS.init("<ACCESS_KEY>", "<SECRET_KEY>");

  5. Request the Redshift username and password from the AWS Secrets Manager:
    function runGetSecretValue_(secretId) {
      // Call the Secrets Manager GetSecretValue API through the aws.gs helper.
      var resultJson = AWS.request(
        getSecretsManagerTypeAWS_(),
        getLocationAWS_(),
        'secretsmanager.GetSecretValue',
        {"Version": getVersionAWS_()},
        method='POST',
        payload={
          "SecretId": secretId
        },
        headers={
          "X-Amz-Target": "secretsmanager.GetSecretValue",
          "Content-Type": "application/x-amz-json-1.1"
        }
      );

      Logger.log("Get Secret Value result: " + resultJson);
      return JSON.parse(resultJson);
    }

  6. Query a table using the Amazon Redshift Data API:
    function runExecuteStatement_(sql) {
      // Submit the SQL statement to the Redshift Data API for asynchronous execution.
      var resultJson = AWS.request(
        getTypeAWS_(),
        getLocationAWS_(),
        'RedshiftData.ExecuteStatement',
        {"Version": getVersionAWS_()},
        method='POST',
        payload={
          "ClusterIdentifier": getClusterIdentifierReshift_(),
          "Database": getDataBaseRedshift_(),
          "DbUser": getDbUserRedshift_(),
          "Sql": sql
        },
        headers={
          "X-Amz-Target": "RedshiftData.ExecuteStatement",
          "Content-Type": "application/x-amz-json-1.1"
        }
      );

      Logger.log("Execute Statement result: " + resultJson);
      return JSON.parse(resultJson);
    }
  7. The result can then be displayed as a table in gSheets:
    function fillGsheet_(recordArray) {
      // Make sure the sheet has enough rows for the result set.
      adjustRowsCount_(recordArray);

      var rowIndex = 1;
      for (var i = 0; i < recordArray.length; i++) {
        var rows = recordArray[i];
        for (var j = 0; j < rows.length; j++) {
          var columns = rows[j];
          rowIndex++;
          var columnIndex = 'A';

          // Write each field of the record into the next cell of the current row.
          for (var k = 0; k < columns.length; k++) {
            var field = columns[k];
            var value = getFieldValue_(field);
            var range = columnIndex + rowIndex;
            addToCell_(range, value);

            columnIndex = nextChar_(columnIndex);
          }
        }
      }
    }

  8. Once finished, the Apps Script can be deployed as an add-on that enables end-users across an entire organization to retrieve data from Amazon Redshift directly into their spreadsheets. Details on how Apps Script code can be deployed as an add-on can be found here.

How users access Google Sheets

  1. Open a gSheet and go to Manage add-ons -> Install add-on:
  2. Once the add-on is successfully installed, open the Add-ons menu and choose Redshift Synchronization. A dialog will appear prompting the user to select the combination of database, schema, and table from which to load the data.
  3. After choosing the intended table, a new panel will appear on the right side of the screen. Then, the user is prompted to select which columns to retrieve from the table, apply any filtering operation, and/or apply any aggregations to the data.
  4. Upon submitting the query, app scripts will translate the user selection into a query that is sent to the Amazon Redshift Data API. Then, the returned data is transformed and displayed as a regular gSheet table:
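The query builder itself isn't shown in the post. Purely as an illustration (the column, schema, and table names below are made up, not taken from the original), the translation step might look something like this:

    // Illustrative only: build a SQL statement from the user's panel selections
    // and send it to Amazon Redshift through the helper defined in step 6.
    function runUserSelection_() {
      var columns = ['customer_id', 'order_total'];   // columns chosen in the side panel
      var table = 'sales.orders';                     // schema and table chosen in the dialog
      var filter = "order_date >= '2021-01-01'";      // optional filter
      var sql = 'SELECT ' + columns.join(', ') +
                ' FROM ' + table +
                ' WHERE ' + filter;
      return runExecuteStatement_(sql);
    }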

Security and Access Management

In the scripts above, there is a direct integration between AWS Secrets Manager and Google Apps Script. The script can extract the currently authenticated user's Google email address. Using this value and a set of tags applied to the secrets, the script can securely pull that user's credentials and authenticate the requests made to the Amazon Redshift cluster (a sketch of this lookup follows the sample configuration below). Follow these steps to set up a new user in an existing Amazon Redshift cluster. Once the user has been created, follow these steps to create a new AWS Secrets Manager secret for your cluster. Make sure that the appropriate tag is applied with the key "email" and the corresponding user's Google email address as its value. Here is a sample configuration that is used for creating Redshift groups, users, and data shares via the Redshift Data API:

connection:
  redshift_super_user_database: dev
  redshift_secret_name: dev_
  redshift_cluster_identifier: dev-cluster
  redshift_secrets_stack_name: dev-cluster-secrets
  environment: dev
  aws_region: eu-west-1
  tags:
    - key: "Environment"
      value: "dev"
users:
  - name: user1
    email: [email protected]
data_shares:
  - name: test_data_share
    schemas:
      - schema1
    redshift_namespaces:
      - USDFJIL234234WE
group:
  - name: readonly
    users:
      - user1
    databases:
      - database: database1
        exclude-schemas:
          - public
          - pg_toast
          - catalog_history
        include-schemas:
          - schema1
        grant:
          - select
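The email-based secret lookup described above isn't included in the scripts shown earlier. As a rough sketch only, assuming the same aws.gs helper functions and standard Secrets Manager ListSecrets tag filters (and noting that Session.getActiveUser().getEmail() may return an empty string unless the add-on runs with the appropriate authorization scopes), it might look like this:

    // Sketch: find the secret tagged with the current user's Google email address.
    function findSecretForCurrentUser_() {
      var userEmail = Session.getActiveUser().getEmail();

      var resultJson = AWS.request(
          getSecretsManagerTypeAWS_(),
          getLocationAWS_(),
          'secretsmanager.ListSecrets',
          {"Version": getVersionAWS_()},
          'POST',
          {
            "Filters": [
              {"Key": "tag-key", "Values": ["email"]},
              {"Key": "tag-value", "Values": [userEmail]}
            ]
          },
          {
            "X-Amz-Target": "secretsmanager.ListSecrets",
            "Content-Type": "application/x-amz-json-1.1"
          }
      );

      var secrets = JSON.parse(resultJson).SecretList;
      return (secrets && secrets.length > 0) ? secrets[0].Name : null;
    }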

Operational Metrics and Improvement

Providing business users with direct access to live data hosted in Amazon Redshift, and enabling true self-service, decreases the burden on platform teams to provide data extracts or other mechanisms for delivering up-to-date information. Additionally, because different files and versions of the data are no longer circulating, the business risk of reporting different key figures or KPIs is reduced, and overall process efficiency improves.

The initial success of this add-on in GTM has led us to extend it to a broader audience, where we hope to serve hundreds of users with all of our internal and public data in the future.

Conclusion

In this post, you learned how to create new Amazon Redshift tables and pull existing Redshift tables into a Google Sheet so that business users can easily integrate with and manipulate the data. This integration was seamless and demonstrated how easy the Amazon Redshift Data API makes it to integrate external applications, such as Google Sheets, with Amazon Redshift. The use cases outlined above are just a few examples of how the Amazon Redshift Data API can be applied to simplify interactions between users and Amazon Redshift clusters.


About the Authors

Dr. Yannick Misteli is leading cloud platform and ML engineering teams in global product strategy (GPS) at Roche. He is passionate about infrastructure and operationalizing data-driven solutions, and he has broad experience in driving business value creation through data analytics.

João Antunes is a Data Engineer in the Global Product Strategy (GPS) team at Roche. He has a track record of deploying Big Data batch and streaming solutions for the telco, finance, and pharma industries.

Krzysztof Wisniewski is a back-end JavaScript developer in the Global Product Strategy (GPS) team at Roche. He is passionate about full-stack development from the front-end through the back-end to databases.

Matt Noyce is a Senior Cloud Application Architect at AWS. He works together primarily with Life Sciences and Healthcare customers to architect and build solutions on AWS for their business needs.

Debu Panda, a Principal Product Manager at AWS, is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world. Debu has published numerous articles on analytics, enterprise Java, and databases and has presented at multiple conferences such as re:Invent, Oracle Open World, and Java One. He is the lead author of EJB 3 in Action (Manning Publications, 2007 and 2014) and Middleware Management (Packt).

TrueBlue uses Amazon QuickSight to deliver more accurate pricing and grow business

Post Syndicated from Robert Ward original https://aws.amazon.com/blogs/big-data/trueblue-uses-amazon-quicksight-to-deliver-more-accurate-pricing-and-grow-business/

This is a guest post by TrueBlue. In their own words, “Founded in 1989, TrueBlue provides specialized workforce solutions, including staffing, talent management, and recruitment process outsourcing (RPO). In 2020, the company connected approximately 490,000 people with work.”

At TrueBlue, we offer solutions that help employers connect with workers worldwide. Every day, sales teams at our 500-plus locations offer our customers job quotes. These quotes show our staff the hourly rates they should charge and what the gross margin might be on a bill rate.

As part of our work, our sales professionals use a concept called lockout, which is the process for approving sales orders below standard margins. As our company has grown, these approval requests have skyrocketed. We have more than 850 people bidding for potential customers at any time, but only a few dozen managers can approve lockout requests. The number of requests that managers had in their inboxes was increasingly overwhelming and took time away from more important daily tasks. They wanted a way to avoid the process altogether by standardizing job rate information.

In this post, I discuss the steps we took to solve our problem using data analysis and Amazon QuickSight.

Identifying regional pricing differences

To begin, I looked at hourly worker rates across all our locations and added state tax data and other information. That gave me our customer billing rate, plus the overhead to calculate the gross margin. Through my research, I discovered that regionality is important in determining different rates and margins, and that pricing isn’t consistent overall.

Our sales leaders wanted to take this to the next level and figure out the gross margin they would need to maintain a specific hourly billing rate. I could only see 7 months of information, but it amounted to nearly 1 million rows of data. We needed a fast, easy way to use spreadsheet software to find what we were looking for.

Using QuickSight to give sales teams better pricing data

In 2020, we decided to go all in on AWS to create a new data lake and invest in other business intelligence (BI) solutions. After speaking with the AWS team, we learned that QuickSight, a powerful BI service that runs on AWS, could give us the detailed filtering and analytical capabilities we needed.

We used QuickSight to create a new customer job quoting engine for our sales teams in 40 of our branch offices. This solution provides our team with the price quotes that optimize profit margins and the data to calculate the precise charge in each market, all of which can be quickly accessed on their laptops. Now, the lockout requests are disappearing because the sales teams have the information at their fingertips and don’t need to ask for approvals. And because our sales leaders don’t have to read through countless emails every day, they can focus on more value-added tasks.

The following diagram illustrates our solution workflow, which sends data from AWS Database Migration Service (AWS DMS) through a data pipeline to Amazon Athena for analysis, and ultimately to QuickSight.

Boosting customer retention and acquisition by 3%

With the data we’re getting from QuickSight, we can present our customers with more accurate pricing and billing information. As a result, we’ve increased new customer acquisition and retention. Our sales teams are closing phone deals at rates 3% higher than an internal sales control group. We’ve also seen an 11% increase in gross margin for the market in which we’ve used the job quoting engine the longest. Applying the data we have now is really making a difference in our business.

And with the live data powering QuickSight, we’re able to increase our margins. Every time we pay someone, our pricing is updated based on real-time regional data. The solution is always adapting to market conditions, so we can give customers nationwide a price with detailed market segmentation. For example, they can see why we’re charging more in the Midwest than in the South.

Being more transparent with customers

Our frontline sales teams can be more transparent about pricing with potential customers because they have better, more accurate pricing data. When a salesperson is on the phone with a customer, they can view the data in QuickSight and accurately explain what’s going on in a specific market. The pricing information is no longer an estimate; it’s completely accurate and up to date, and we can talk more confidently about what’s driving the cost, such as local conditions or risk ratings.

Another advantage of QuickSight and AWS is the agility and speed they give us. With AWS services, we can control how quickly to roll out the solution and who gets access. And we have more flexibility with AWS, so we can change things as we go and create better, faster tools for our internal teams without relying on a time-consuming, cumbersome development process. We can try things tomorrow that would have previously taken us 6 weeks to get into production, giving salespeople the new features they ask for quickly. And as a rapid prototyping vehicle, QuickSight is perfect for defining the next generation of job quoting packages that we’ll create for our customers.

Our job quoting tool isn’t just helping our frontline sales employees, it’s also benefiting staffing specialists, branch managers, market managers, and even regional and senior vice presidents. They can all see pricing averages and trends (as in the following screenshot), and select the data for specific markets or TrueBlue branches.

Conclusion

The downstream implications of our new job quoting tool powered by QuickSight are huge. Now conversations are happening at the right level, with the right kinds of customers driving more value for our business.


About the Authors

Robert Ward is the Senior Director of Technology at PeopleReady. His teams are responsible for delivering data science and machine learning solutions, strategy and data insights, democratized data, and business analytics solutions. PeopleReady is modernizing how the North American staffing industry connects people with work. Robert Ward is driven to craft innovations for desired outcomes.

Ryan Coyle is the AWS Account Manager for TrueBlue. He has partnered with TrueBlue on their digital transformation efforts since the beginning of 2020. In this role, he has collaborated with them to close on-premises data center facilities, develop and deliver new products to market, and deliver data-driven results to TrueBlue business units.

Shivani Sharma is one of the Account Managers supporting TrueBlue. She joined the team in July 2020 and partners with TrueBlue to drive and collaborate on their transformation initiatives.

Integral Ad Science secures self-service data lake using AWS Lake Formation

Post Syndicated from Mat Sharpe original https://aws.amazon.com/blogs/big-data/integral-ad-science-secures-self-service-data-lake-using-aws-lake-formation/

This post is co-written with Mat Sharpe, Technical Lead, AWS & Systems Engineering from Integral Ad Science.

Integral Ad Science (IAS) is a global leader in digital media quality. The company’s mission is to be the global benchmark for trust and transparency in digital media quality for the world’s leading brands, publishers, and platforms. IAS does this through data-driven technologies with actionable real-time signals and insight.

In this post, we discuss how IAS uses AWS Lake Formation and Amazon Athena to efficiently manage governance and security of data.

The challenge

IAS processes over 100 billion web transactions per day. With strong growth and changing seasonality, IAS needed a solution to reduce cost, eliminate idle capacity during low utilization periods, and maximize data processing speeds during peaks to ensure timely insights for customers.

In 2020, IAS deployed a data lake in AWS, storing data in Amazon Simple Storage Service (Amazon S3), cataloging its metadata in the AWS Glue Data Catalog, ingesting and processing using Amazon EMR, and using Athena to query and analyze the data. IAS wanted to create a unified data platform to meet its business requirements. Additionally, IAS wanted to enable self-service analytics for customers and users across multiple business units, while maintaining critical controls over data privacy and compliance with regulations such as GDPR and CCPA. To accomplish this, IAS needed to securely ingest and organize real-time and batch datasets, as well as secure and govern sensitive customer data.

To meet the dynamic nature of IAS’s data and use cases, the team needed a solution that could define access controls by attribute, such as classification of data and job function. IAS processes significant volumes of data and this continues to grow. To support the volume of data, IAS needed the governance solution to scale in order to create and secure many new daily datasets. This meant IAS could enable self-service access to data from different tools, such as development notebooks, the AWS Management Console, and business intelligence and query tools.

To address these needs, IAS evaluated several approaches, including a manual ticket-based onboarding process to define permissions on new datasets, many different AWS Identity and Access Management (IAM) policies, and an AWS Lambda based approach to automate defining Lake Formation table and column permissions triggered by changes in security requirements and the arrival of new datasets.

Although these approaches worked, they were complex and didn’t support the self-service experience that IAS data analysts required.

Solution overview

IAS selected Lake Formation, Athena, and Okta to solve this challenge. The following architectural diagram shows how the company chose to secure its data lake.

The solution needed to support data producers and consumers in multiple AWS accounts. For brevity, this diagram shows a central data lake producer that includes a set of S3 buckets for raw and processed data. Amazon EMR is used to ingest and process the data, and all metadata is cataloged in the data catalog. The data lake consumer account uses Lake Formation to define fine-grained permissions on datasets shared by the producer account; users logging in through Okta can run queries using Athena and be authorized by Lake Formation.

Lake Formation enables column-level control, and all Amazon S3 access is provisioned via a Lake Formation data access role in the query account, ensuring only that service can access the data. Each business unit with access to the data lake is provisioned with an IAM role that only allows limited access to:

  • That business unit’s Athena workgroup
  • That workgroup’s query output bucket
  • The lakeformation:GetDataAccess API

Because Lake Formation manages all the data access and permissions, the configuration of the user’s role policy in IAM becomes very straightforward. By defining an Athena workgroup per business unit, IAS also takes advantage of assigning per-department billing tags and query limits to help with cost management.

Define a tag strategy

IAS commonly deals with two types of data: data generated by the company and data from third parties. The latter usually includes contractual stipulations on privacy and use.

Some datasets require even tighter controls, and defining a tag strategy is one key way that IAS ensures compliance with data privacy standards. With the tag-based access controls in Lake Formation, IAS can define a set of tags within an ontology that is assigned to tables and columns. This ensures users understand available data and whether or not they have access. It also helps IAS manage privacy permissions across numerous tables, with new ones added every day.

At a simplistic level, we can define policy tags for class with private and non-private, and for owner with internal and partner.

As we progressed, our tagging ontology evolved to include individual data owners and data sources within our product portfolio.

Apply tags to data assets

After IAS defined the tag ontology, the team applied tags at the database, table, and column level to manage permissions. Tags are inherited, so they only need to be applied at the highest level. For example, IAS applied the owner and class tags at the database level and relied on inheritance to propagate the tags to all the underlying tables and columns. The following diagram shows how IAS activated a tagging strategy to distinguish between internal and partner datasets, while classifying sensitive information within these datasets.

Only a small number of columns contain sensitive information; IAS relied on inheritance to apply a non-private tag to the majority of the database objects and then overrode it with a private tag on a per-column basis.

The following screenshot shows the tags applied to a database on the Lake Formation console.

With its global scale, IAS needed a way to automate how tags are applied to datasets. The team experimented with various options including string matching on column names, but the results were unpredictable in situations where unexpected column names are used (ipaddress vs. ip_address, for example). Ultimately, IAS incorporated metadata tagging into its existing infrastructure as code (IaC) process, which gets applied as part of infrastructure updates.

Define fine-grained permissions

The final piece of the puzzle was to define permission rules to associate with tagged resources. The initial data lake deployment involved creating permission rules for every database and table, with column exclusions as necessary. Although these were generated programmatically, it added significant complexity when the team needed to troubleshoot access issues. With Lake Formation tag-based access controls, IAS reduced hundreds of permission rules down to precisely two rules, as shown in the following screenshot.

When using multiple tags, the expressions are logically ANDed together. The preceding statements permit access only to data tagged non-private and owned by internal.

Tags allowed IAS to simplify permission rules, making it easy to understand, troubleshoot, and audit access. The ability to easily audit which datasets include sensitive information and who within the organization has access to them made it easy to comply with data privacy regulations.

Benefits

This solution provides self-service analytics to IAS data engineers, analysts, and data scientists. Internal users can query the data lake with their choice of tools, such as Athena, while maintaining strong governance and auditing. The new approach using Lake Formation tag-based access controls reduces the integration code and manual controls required. The solution provides the following additional benefits:

  • Meets security requirements by providing column-level controls for data
  • Significantly reduces permission complexity
  • Reduces time to audit data security and troubleshoot permissions
  • Deploys data classification using existing IaC processes
  • Reduces the time it takes to onboard data users including engineers, analysts, and scientists

Conclusion

When IAS started this journey, the company was looking for a fully managed solution that would enable self-service analytics while meeting stringent data access policies. Lake Formation provided IAS with the capabilities needed to deliver on this promise for its employees. With tag-based access controls, IAS optimized the solution by reducing the number of permission rules from hundreds down to a few, making it even easier to manage and audit. IAS continues to analyze data using more tools governed by Lake Formation.


About the Authors

Mat Sharpe is the Technical Lead, AWS & Systems Engineering at IAS where he is responsible for the company’s AWS infrastructure and guiding the technical teams in their cloud journey. He is based in New York.

Brian Maguire is a Solution Architect at Amazon Web Services, where he is focused on helping customers build their ideas in the cloud. He is a technologist, writer, teacher, and student who loves learning. Brian is the co-author of the book Scalable Data Streaming with Amazon Kinesis.

Danny Gagne is a Solutions Architect at Amazon Web Services. He has extensive experience in the design and implementation of large-scale high-performance analysis systems, and is the co-author of the book Scalable Data Streaming with Amazon Kinesis. He lives in New York City.

How Rapid7 built multi-tenant analytics with Amazon Redshift using near-real-time datasets

Post Syndicated from Rahul Monga original https://aws.amazon.com/blogs/big-data/how-rapid7-built-multi-tenant-analytics-with-amazon-redshift-using-near-real-time-datasets/

This is a guest post co-written by Rahul Monga, Principal Software Engineer at Rapid7.

Rapid7 InsightVM is a vulnerability assessment and management product that provides visibility into the risks present across an organization. It equips you with the reporting, automation, and integrations needed to prioritize and fix those vulnerabilities in a fast and efficient manner. InsightVM has more than 5,000 customers across the globe, runs exclusively on AWS, and is available for purchase on AWS Marketplace.

To provide near-real-time insights to InsightVM customers, Rapid7 has recently undertaken a project to enhance the dashboards in their multi-tenant software as a service (SaaS) portal with metrics, trends, and aggregated statistics on vulnerability information identified in their customer assets. They chose Amazon Redshift as the data warehouse to power these dashboards due to its ability to deliver fast query performance on gigabytes to petabytes of data.

In this post, we discuss the design options that Rapid7 evaluated to build a multi-tenant data warehouse and analytics platform for InsightVM. We dive deep into the challenges and solutions related to ingesting near-real-time datasets and how to create a scalable reporting solution that can efficiently run queries across more than 3 trillion rows. This post also discusses an option to address the scenario where a particular customer outgrows the average data access needs.

This post uses the terms customers, tenants, and organizations interchangeably to represent Rapid7 InsightVM customers.

Background

To collect data for InsightVM, customers can use scan engines or Rapid7’s Insight Agent. Scan engines allow you to collect vulnerability data on every asset connected to a network. This data is only collected when a scan is run. Alternatively, you can install the Insight Agent on individual assets to collect and send asset change information to InsightVM numerous times each day. The agent also ensures that asset data is sent to InsightVM regardless of whether or not the asset is connected to your network.

Data from scans and agents is sent in the form of packed documents, in micro-batches of hundreds of events. Around 500 documents per second are received across customers, and each document is around 2 MB in size. On a typical day, InsightVM processes 2–3 trillion rows of vulnerability data, which translates to around 56 GB of compressed data for a large customer. This data is normalized and processed by InsightVM’s vulnerability management engine and streamed to the data warehouse system for near-real-time availability of data for analytical insights to customers.

Architecture overview

In this section, we discuss the overall architectural setup for the InsightVM system.

Scan engines and agents collect and send asset information to the InsightVM cloud. Asset data is pooled, normalized, and processed to identify vulnerabilities. This is stored in an Amazon ElastiCache for Redis cluster and also pushed to Amazon Kinesis Data Firehose for use in near-real time by InsightVM's analytics dashboards. Kinesis Data Firehose delivers raw asset data to an Amazon Simple Storage Service (Amazon S3) bucket. The data is transformed using a custom-developed ingestor service and stored in a new S3 bucket. The transformed data is then loaded into the Redshift data warehouse. Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), and AWS Lambda are used to orchestrate this data flow. In addition, to identify the latest timestamp of vulnerability data for assets, an auxiliary table is maintained and updated periodically with the update logic in the Lambda function, which is triggered through an Amazon CloudWatch event rule. Custom-built middleware components interface between the web user interface (UI) and the Amazon Redshift cluster to fetch asset information for display in dashboards.

The following diagram shows the implementation architecture of InsightVM, including the data warehouse system:

Rapid-7 Multi-tenant Architecture

The architecture has built-in tenant isolation because data access is abstracted through the API. The application uses a dimensional model to support low-latency queries and extensibility for future enhancements.

Amazon Redshift data warehouse design: Options evaluated and selection

Considering Rapid7’s need for near-real-time analytics at any scale, the InsightVM data warehouse system is designed to meet the following requirements:

  • Ability to view asset vulnerability data at near-real time, within 5–10 minutes of ingest
  • Less than 5 seconds of latency, measured at the 95th percentile (p95), for reporting queries
  • Ability to support 15 concurrent queries per second, with the option to support more in the future
  • Simple and easy-to-manage data warehouse infrastructure
  • Data isolation for each customer or tenant

Rapid7 evaluated Amazon Redshift RA3 instances to support these requirements. When designing the Amazon Redshift schema to support these goals, they evaluated the following strategies:

  • Bridge model – Storage and access to data for each tenant is controlled at the individual schema level in the same database. In this approach, multiple schemas are set up, where each schema is associated with a tenant, with the same exact structure of the dimensional model.
  • Pool model – Data is stored in a single database schema for all tenants, and a new column (tenant_id) is used to scope and control access to individual tenant data. Access to the multi-tenant data is controlled using API-level access to the tables. Tenants aren’t aware of the underlying implementation of the analytical system and can’t query them directly.

For more information about multi-tenant models, see Implementing multi-tenant patterns in Amazon Redshift using data sharing.

Initially, the bridge model appeared advantageous: queries operate on tenant-only data, and a tenant can be decoupled to an independent cluster if they outgrow the resources that are available in the single cluster. Also, when the p95 metrics were evaluated in this setup, the query response times were less than 5 seconds, because each tenant's data is isolated into smaller tables. However, the major concern with this approach was the near-real-time data ingestion into over 50,000 tables (5,000 customer schemas x approximately 10 tables per schema) every 5 minutes. Having thousands of commits every minute into an online analytical processing (OLAP) system like Amazon Redshift can lead to most resources being exhausted in the ingestion process. As a result, the application suffers query latencies as data grows.

The pool model provides a simpler setup, but the concern was with query latencies when multiple tenants access the application from the same tables. Rapid7 hoped that these concerns would be addressed by using Amazon Redshift’s support for massively parallel processing (MPP) to enable fast execution of most complex queries operating on large amounts of data. With the right table design using the right sort and distribution keys, it’s possible to optimize the setup. Furthermore, with automatic table optimization, the Amazon Redshift cluster can automatically make these determinations without any manual input.

Rapid7 evaluated both the pool and bridge model designs, and decided to implement the pool model. This model provides simplified data ingestion and can support query latencies of under 5 seconds at p95 with the right table design. The following table summarizes the results of p95 tests conducted with the pool model setup.

Query | P95
Large customer: Query with multiple joins, which lists assets, their vulnerabilities, and all their related attributes, with aggregated metrics for each asset, and filters to scope assets by attributes like location, names, and addresses | Less than 4 seconds
Large customer: Query to return vulnerability content information given a list of vulnerability identifiers | Less than 4 seconds

Tenant isolation and security

Tenant isolation is fundamental to the design and development of SaaS systems. It enables SaaS providers to reassure customers that, even in a multi-tenant environment, their resources can’t be accessed by other tenants.

With the Amazon Redshift table design using the pool model, Rapid7 built a separate data access layer in the middleware that templatized queries, augmented with runtime parameter substitution to uniquely filter specific tenant and organization data.

The following is a sample of templatized query:

<#if useDefaultVersion()>
currentAssetInstances AS (
SELECT tablename.*
FROM tablename
<#if (applyTags())>
JOIN dim_asset_tag USING (organization_id, attribute2, attribute3)
</#if>
WHERE organization_id ${getOrgIdFilter()}
<#if (applyTags())>
AND tag_id IN ($(tagIds))
</#if>
),
</#if>

The following is a Java interface snippet to populate the template:

public interface TemplateParameters {

    boolean useDefaultVersion();

    boolean useVersion();

    default Set<String> getVersions() {
        return null;
    }

    default String getVersionJoin(String var1) {
        return "";
    }

    String getTemplateName();

    String getOrgIdString();

    default String getOrgIdFilter() {
        return "";
    }
}

Every query uses organization_id and additional parameters to uniquely access tenant data. During runtime, organization_id and other metadata are extracted from the secured JWT token that is passed to middleware components after the user is authenticated in the Rapid7 cloud platform.

Best practices and lessons learned

To fully realize the benefits of the Amazon Redshift architecture for multiple tenants and near-real-time ingestion, careful table design is needed to take full advantage of massively parallel processing and columnar data storage. In this section, we discuss the best practices and lessons learned from building this solution.

Sort key for effective data pruning

Sorting a table on an appropriate sort key can accelerate query performance, especially queries with range-restricted predicates, by requiring fewer table blocks to be read from disk. To have Amazon Redshift choose the appropriate sort order, the AUTO option was utilized. Automatic table optimization continuously observes how queries interact with tables and discovers the right sort key for the table. To effectively prune the data by the tenant, organization_id is identified as the sort key to perform the restricted scans. Furthermore, because all queries are routed through the data access layer, organization_id is automatically added in the predicate conditions to ensure effective use of the sort keys.

Micro-batches for data ingestion

Amazon Redshift is designed for large data ingestion, rather than transaction processing. The cost of commits is relatively high, and excessive use of commits can result in queries waiting for access to the commit queue. Data is micro-batched during ingestion as it arrives for multiple organizations. This results in fewer transactions and commits when ingesting the data.

Load data in bulk

If you use multiple concurrent COPY commands to load one table from multiple files, Amazon Redshift is forced to perform a serialized load, and this type of load is much slower.

The Amazon Redshift manifest file is used to ingest the datasets that span multiple files in a single COPY command, which allows fast ingestion of data in each micro-batch.

RA3 instances for data sharing

Rapid7 uses Amazon Redshift RA3 instances, which support data sharing so you can securely and easily share live data across Amazon Redshift clusters for reads. In this multi-tenant architecture, when a tenant outgrows the average data access needs, it can easily be isolated to a separate cluster and independently scaled using data sharing. This is accomplished by monitoring the STL_SCAN table to identify the different tenants and isolate them, allowing for independent scalability as needed.

Concurrency scaling for consistently fast query performance

When concurrency scaling is enabled, Amazon Redshift automatically adds additional cluster capacity when you need it to process an increase in concurrent read queries. To meet the uptick in user requests, the concurrency scaling feature is enabled to dynamically bring up additional capacity to provide consistent p95 values that meet Rapid7’s defined requirements for the InsightVM application.

Results and benefits

Rapid7 saw the following results from this architecture:

  • The new architecture has reduced the time required to make data accessible to customers to less than 5 minutes on average. The previous architecture had higher level of processing time variance, and could sometimes exceed 45 minutes
  • Dashboards load faster and have enhanced drill-down functionality, improving the end-user experience
  • With all data in a single warehouse, InsightVM has a single source of truth, compared to the previous solution where InsightVM had copies of data maintained in different databases and domains, which could occasionally get out of sync
  • The new architecture lowers InsightVM’s reporting infrastructure cost by almost three times, as compared to the previous architecture

Conclusion

With Amazon Redshift, the Rapid7 team has been able to centralize asset and vulnerability information for InsightVM customers. The team has simultaneously met its performance and management objectives with the use of a multi-tenant pool model and optimized table design. In addition, data ingestion via Kinesis Data Firehose and custom-built microservices to load data into Amazon Redshift in near-real time enabled Rapid7 to deliver asset vulnerability information to customers more than nine times faster than before, improving the InsightVM customer experience.


About the Authors

Rahul Monga is a Principal Software Engineer at Rapid7, currently working on the next iteration of InsightVM. Rahul’s focus areas are highly distributed cloud architectures and big data processing. Originally from the Washington DC area, Rahul now resides in Austin, TX with his wife, daughter, and adopted pup.

Sujatha Kuppuraju is a Senior Solutions Architect at Amazon Web Services (AWS). She works with ISV customers to help design secured, scalable and well-architected solutions on the AWS Cloud. She is passionate about solving complex business problems with the ever-growing capabilities of technology.

Thiyagarajan Arumugam is a Principal Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.

How Tophatter improved stability and lowered costs by migrating to Amazon Redshift RA3

Post Syndicated from Julien DeFrance original https://aws.amazon.com/blogs/big-data/how-tophatter-improved-stability-and-lowered-costs-by-migrating-to-amazon-redshift-ra3/

This is a guest post co-written by Julien DeFrance of Tophatter and Jordan Myers of Etleap. Tophatter is a mobile discovery marketplace that hosts live auctions for products spanning every major category. Etleap, an AWS Advanced Tier Data & Analytics partner, is an extract, transform, load, and transform (ETLT) service built for AWS.

As a company grows, it continually seeks out solutions that help its teams achieve better performance and scale of their data analytics, especially when business growth has eclipsed current capabilities. Migrating to a new architecture is often a key component of this. However, a migration path that is painless, flexible, and supported is not always available.

In this post, we walk through how Tophatter—a virtual auction house where buyers and sellers interact, chat, and transact in diverse categories—recently migrated from DS2 to RA3 nodes in Amazon Redshift. We highlight the steps they took, how they improved stability and lowered costs, and the lessons other companies can follow.

Tophatter’s data storage and ETL architecture

Tophatter stores the majority of their product data in MySQL databases, while sending some webhook and web events to Amazon Simple Storage Service (Amazon S3). Additionally, some of their vendors drop data directly into dedicated S3 buckets. Etleap integrates with these sources. Every hour (according to the schedule configured by Tophatter), Etleap extracts all the new data that has been added or changed in the source, transforms the new data according to the pipeline rules defined by the user in the UI, and loads the resulting data into Amazon Redshift.

Tophatter relies on Mode Analytics and Looker for data analysis, and uses Etleap’s model feature based on Amazon Redshift materialized views to persist the results of frequently used business intelligence (BI) queries. Tophatter configures the update schedule of the model to happen at defined times or when certain source tables have been updated with new data.

Ultimately, these critical data pipelines fuel Tophatter dashboards that both internal analysts and users interact with.

The following diagram illustrates how Tophatter uses Etleap’s AWS-native extract, transform, and load (ETL) tool to ingest data from their operational databases, applications, and Amazon S3 into Amazon Redshift.

Company growth leads to data latency

Before the migration, Tophatter’s team operated 4 DS2 Reserved Instance (RI) nodes (ds2.xlarge) in Amazon Redshift, which use HDD drives as opposed to relatively faster SSDs.

As their user base expanded and online auction activity increased exponentially, Tophatter’s ETL needs grew. In response, Etleap seamlessly scaled to support their increased volume of ingestion pipelines and materialized data models. But Tophatter’s Amazon Redshift cluster—which they managed internally—wasn’t as easy to scale. When Amazon Redshift usage increased, Tophatter had to resize the cluster manually or reduce the frequency of certain analytics queries or models. Finding the optimal cluster size often required multiple iterations.

Due to the time-sensitive nature of data needed for live online auctions, Tophatter used automated monitoring to notify on-call engineers when data pipeline latency had exceeded the desired threshold. Latencies and errors began to pop up more frequently—at least once or twice a week. These events caused distress for the on-call engineers. When the issue couldn’t be resolved internally, they notified Etleap support, who typically recommended either canceling or reducing the frequency of certain long-running model queries.

While the issue was still being resolved, the latencies resulted in downstream issues for the analytics team, such as certain tables being out of sync with others, resulting in incorrect query results.

Migrating to Amazon Redshift RA3

To improve stability and reduce engineering maintenance, Tophatter decided to migrate from DS2 to RA3 nodes. Amazon Redshift RA3 with managed storage is the latest generation node type and would allow Tophatter to scale compute and storage independently.

With DS2 nodes, there was pressure to offload or archive historical data to other storage because of fixed storage limits. RA3 nodes with managed storage are an excellent fit for analytics workloads that require high storage capacity, such as operational analytics, where the subset of data that’s most important continually evolves over time.

Moving to the RA3 instance type would also enable Tophatter to capitalize on the latest features of Amazon Redshift, such as AQUA (Advanced Query Accelerator), Data Sharing, Amazon Redshift ML, and cross-VPC support.

RA3 migration upgrade program

Tophatter understood the benefits of migrating to RA3, but worried that their 3-year DS2 Reserved Instances commitment would present a roadblock. They still had 2 years remaining in their agreement and were unsure if an option was available that would allow them to change course.

They found out about the AWS RA3 upgrade program from the AWS account team, which helps customers convert their DS2 Reserved Instance commitments into RA3 without breaking the commitment agreement. This path enables you to seamlessly migrate from your legacy Amazon Redshift clusters to one of three RA3 node types: ra3.xlplus, ra3.4xlarge, or ra3.16xlarge.

Tophatter’s engagement with the program consisted of five steps:

  1. Engage with the AWS account team.
  2. Receive pricing information from the AWS account team.
  3. Schedule the migration.
  4. Purchase RA3 Reserved Instances.
  5. Submit a case to cancel their DS2 Reserved Instances.

Tophatter had the opportunity to evaluate three options for their RA3 migration:

  • Elastic resize – This is the most efficient way to change the instance type and update the nodes in your Amazon Redshift cluster. The cluster endpoint doesn’t change and the downtime during resize is minimal.
  • Snapshot and restore method – Choose the snapshot and restore method if elastic resize is unavailable (from a mismatch between slice and node count). Or, use this method to minimize the amount of time it takes to write to your production database.
  • Classic resize – Choose the classic resize method if it’s the only option available. For single-node DS2 clusters, only a classic resize can be performed to convert the cluster into a multi-node cluster.

Achieving operational goals

Tophatter worked closely with their AWS account team for the migration and chose elastic resize due to the minimal downtime that option presented as well as prior experience using it. They completed the migration in under 2 hours, which included the requisite pre- and post-testing. As a pre-migration step, they took a snapshot of the cluster and were prepared to restore it if something went wrong.

After migrating to a 2 node ra3.4xlarge cluster, the Tophatter team realized numerous benefits:

  • Storage with up to 256 TB of Amazon Redshift managed storage for their cluster
  • Dramatically reduced latency of ingestion and data modeling
  • Control of the compute and storage capacities and costs, independently
  • Better system stability, with latency incidents requiring on-call engineer responses dropping to near zero

The following graph illustrates the average latency of a typical ingestion pipeline before and after the migration from DS2 to RA3.

Conclusion

When Tophatter set out to migrate their DS2 instances to RA3, they were unaware of the AWS RA3 upgrade program and its benefits. They were happy to learn that the program avoided the administrative overhead of getting approvals ahead of time for various specific configurations, and let them try several options in order to find a stable configuration.

Tophatter migrated from DS2 to RA3 without breaking their current commitment and enhanced their analytics to be agile and versatile. Now, Tophatter is aiming to realize greater scaling benefits to support its exponential growth by exploring new RA3 features such as:

  • Amazon Redshift Data Sharing – Provides instant, granular, high-performance data access without data copies or movement
  • Amazon Redshift ML – Allows you to create, train, and apply machine learning models using SQL commands in Amazon Redshift
  • AQUA – Provides a new distributed and hardware accelerated cache that brings compute to the storage layer for Amazon Redshift and delivers up to 10 times faster query performance than other enterprise cloud data warehouses
  • Cross-VPC support for Amazon Redshift – With an Amazon Redshift-managed VPC endpoint, you can privately access your Amazon Redshift data warehouse within your VPC from your client applications in another VPC within the same AWS account, another AWS account, or running on-premises without using public IPs or requiring encrypted traffic to traverse the internet

We hope Tophatter’s migration journey can help other AWS customers reap the benefits from the AWS RA3 upgrade program from DS2 or DC2 cluster families to RA3. We believe this enables better performance and cost benefits while unlocking valuable new Amazon Redshift features.


About the Authors

Julien DeFrance is a Principal Software Engineer at Tophatter based out of San Francisco. With a strong focus on backend and cloud infrastructure, he is part of the Logistics Engineering Squad, building and supporting integrations with third parties such as sellers, ERP systems, and carriers, architecting and implementing solutions to help optimize cost efficiency and service quality. Julien holds two AWS Certifications (Cloud Practitioner, Solutions Architect – Associate).

 

Jordan Myers is an engineer at Etleap with 5 years of experience in programming ETL software. In addition to programming, he provides deep-level technical customer support and writes technical documentation

 

 

Jobin George is a Big Data Solutions Architect with more than a decade of experience designing and implementing large-scale big data and analytics solutions. He provides technical guidance, design advice, and thought leadership to some of the key AWS customers and big data partners.

 

 

Maneesh Sharma is a Senior Database Engineer with Amazon Redshift. He works and collaborates with various Amazon Redshift Partners to drive better integration. In his spare time, he likes running, playing ping pong, and exploring new travel destinations.

Zabbix 5.0 – My happiness and disenchantment

Post Syndicated from Dennis Ananiev original https://blog.zabbix.com/zabbix-5-0-my-happiness-and-disenchantment/14107/

Zabbix is an open-source solution, and all features are available out of the box for free. You don’t have to pay for the pro, or business, or community versions. You can download Zabbix source files or packages from the official site and use them in your enterprise or your home lab, test and apply or even suggest your changes. Zabbix offers many new features in every release, and it’s an excellent approach to interact with the community. This post will share my experience with Zabbix and my opinion of improvements made in Zabbix 5.2.

Contents

I. Pros (3:49)

    1. Global view Dashboard (3:49)
    2. Host configuration (7:19)
    3. Discovery rules (11:56)
    4. Maintenance (15:46)

II. Cons (20:13)

Pros

Global view Dashboard

Improvements start from the central Zabbix 5.2 dashboard — it’s totally different from the earlier versions. Now it looks more clear and user-friendly.

Global view Dashboard

Now, we have a collapsible vertical menu. Since this is a Global view dashboard, we can see hosts by availability and problems by severity level (we didn't have this option in earlier versions), as well as system information.

From the Global view dashboard, you can configure the widgets. For instance, you can choose how many lines you can see in the problems panel.

Configuring widgets in the Dashboard

In earlier versions, you could see only 20 problems on your Dashboard, and you could change this parameter only in the Zabbix source code if you had some PHP knowledge. Now you can choose how many problems to display in the Show lines field. This is really convenient, as you might have an enormous infrastructure with almost 200 problems per day filling the Dashboard. In earlier versions, if the Zabbix Server was down, you could not see the previous problems without opening the Latest data menu. Now you can choose the number of problems to display. In addition, you can choose to display only the problems of a certain severity level or to show only specific tags. For duty admins, it's quite useful to see operational data together with problems and to show unacknowledged problems only.

This is convenient for Zabbix engineers and admins, as sometimes admins monitor only certain parts of the infrastructure: some servers, databases, or middleware layers. In this case, you can choose to display Host groups or Tags for different layers. Then all you need to do is click Apply.

Host configuration

There are many other configuration options that make the life of an engineer more comfortable. For instance, in Configuration > Hosts, new features are available.

New Hosts configuration

  • Here, as opposed to earlier Zabbix versions, you can filter hosts by a specific proxy or specific tags. Previously, it was hard to understand which proxy was monitoring a specific host, especially if you were monitoring, for instance, one or two thousand hosts. The new feature saves you a lot of time, as you don't have to open other pages and try to find the necessary information.
  • Another new feature in the Hosts dashboard is the improved Items configuration.

Improved Items configuration

Here, if you click any item, for instance, the one collecting CPU data, you can now use the new Execute now and Test buttons to test values without waiting for an update interval.

New Execute now and Test buttons

So, if you click Test > Get value and test, you can get the value from a remote host immediately.

Using Get value and test button

By clicking the Test button, you can also check that you chose the correct Type for your data collection. Execute now allows you to send a request to the remote host and get the data back without waiting for the next update interval, so you can immediately find the required information in Latest data.

Requesting data without waiting for update interval

You normally don’t need to collect data such as hostname or OS name very often. Such data is collected once per day or once per hour. However, you might not want to stay online waiting for collection. So, you can click Execute now and collect the data immediately.

NOTE. Execute now and Test buttons are available only starting from Zabbix 5.x.

Discovery rules

  • Another Zabbix configuration tool, Discovery rules, was also improved. Previously, if we needed to discover some data from a Linux server, for instance, Mounted filesystem discovery or Network interface discovery, we had to stay online and wait for the data to be collected. Now, with the Execute now and Test buttons, you don't have to wait for the configured update interval; you get values immediately.

New Discovery rules options

So, if you click Get value and test, you immediately get all data Types and all file system names for all partitions on the server, as well as the JSON array. Here, you can check which data you do and don't need, and then exclude certain data using regular expressions. It's a really big achievement to add the Test and Execute now buttons everywhere, because it makes the system more dynamic.

  • In earlier Zabbix versions, we couldn't change anything in bulk in Item prototypes. You had to open each of the items, for instance, Free nodes or Space utilization, and change what you needed for each of them. Now, you can check the All items box and use the Mass update button.

Mass update for Items prototype

For instance, we can change all update intervals for all items at once.

Changing all update intervals at once

Previously, we could mass update only items and some triggers, while now we can use Mass update for item prototypes as well. Item prototypes are used very often in our everyday operations, for instance, to discover data over SNMP, since SNMP collects data for network and storage devices where item prototypes are really important. For instance, a NetApp storage system may have about 1,500 items, and it is really difficult to change the update interval or history settings for such an enormous number of items. Now, you just click Mass update, change the parameters for the item prototypes, and apply the changes to all items at once.

Maintenance

Maintenance has been a headache for many Zabbix engineers and administrators for ages. In Zabbix 4.2, we had three Maintenance menus: Maintenance, Periods, and Hosts and groups.

Maintenance settings in earlier Zabbix versions

Windows or Linux administrators using Zabbix only to monitor their systems could just select the period using Active since and Active till, and didn't know what to do when data collection and maintenance didn't work correctly. For instance, if we started replacing RAM in the data center at 8 a.m. and spent two hours, we could set Active till to 10 a.m. However, surprisingly, it didn't work.

In Zabbix 5.x, the team used a different approach: a single form containing all the settings that previously were spread across three separate tabs.

Now you can set up all parameters in one window.

Improved Maintenance settings

NOTE. In most cases, Active since and Active till don’t work correctly for setting up downtime. To set up the downtime, the Period field should be used to choose Period type, date, and the number of days or hours needed to fix RAM in our example.

 

Maintenance period settings

Setting downtime period due to maintenance

This change is not intuitive; however, you should pay attention to your Maintenance period settings when you receive calls from your admins and engineers about maintenance alerts. In addition, the Maintenance period settings are now more detailed, so you just need to practice selecting the required parameters. Still, it remains a request to the Zabbix team to make these parameter settings more user-friendly.

Cons

Unfortunately, some problems have been inherited from the earlier Zabbix versions.

  • For instance, in Administration > Users you still can't change parameters in bulk or clone users with the same characteristics; you have to create each user separately. If you have a thousand users, it will be a headache to create all of them manually unless you know a lot about the Zabbix API or Ansible.

Limited Users setting options

  • In addition, Zabbix doesn’t have any mechanisms for importing LDAP/SAML users and LDAP?SAML groups. It is still hard to create and synchronize this account with, for instance, Active Directory or other service directories. Active Directory administrator might change the users’ surname and move them to some other department, and Zabbix administrator won’t know about this due to this synchronization gap.
  • There are obvious drawbacks to the Zabbix menu. For instance, Hosts are still available under the Monitoring, Inventory, and Configuration sections, which might be confusing for newbies, as it is difficult to decide which menu should be used. So, merging these menus would be a step forward for usability.
  • Lastly, in the Configuration > Hosts menu there was a drop-down list for host groups and templates, but in the newest Zabbix only the Select button is left. Now, without the drop-down list, it is tricky for newbies to choose host groups and templates.

Selecting host groups and templates

Zabbix migration in a mid-sized bank environment

Post Syndicated from Angelo Porta original https://blog.zabbix.com/zabbix-migration-in-a-mid-sized-bank-environment/13040/

A real CheckMK/LibreNMS to Zabbix migration for a mid-sized Italian bank (1,700 branches, many thousands of servers and switches). The customer needed a very robust architecture and ancillary services around the Zabbix engine to manage a reliable and error-free configuration.

Content

I. Bank monitoring landscape (1:45)
II. Zabbix monitoring project
III. Questions & Answers (19:40)

Bank monitoring landscape

The bank is one of the 25 largest European banks for market capitalization and one of the 10 largest banks in Italy for:

  • branch network,
  • loans to customers,
  • direct funding from customers,
  • total assets.

At the end of 2019, the bank was using at least 20 different monitoring tools:

  • LibreNMS for networking,
  • CheckMK for servers other than Microsoft Windows,
  • Zabbix for some limited areas inside DCs,
  • Oracle Enterprise Monitor,
  • Microsoft SCCM,
  • custom monitoring tools (periodic plain counters, direct HTML page access, complex dashboards, etc.)

For each alert, hundreds of emails were sent to different people, which made it impossible to really monitor the environment. There was no central monitoring, and monitoring effort was scattered across teams.

The bank requirements:

  • Single pane of glass for two Data Centers and branches.
  • Increased monitoring capabilities.
  • Secured environment (end-to-end encryption).
  • More automation and audit features.
  • Separate monitoring of two DCs and branches.
  • No direct monitoring: all traffic via Zabbix Proxy.
  • Revised and improved alerting schema/escalation.
  • Running in parallel with CheckMK and LibreNMS for a certain period of time.

Why Zabbix?

The bank has chosen Zabbix among its competitors for many reasons:

  • better cross-domain coverage of the network/server/software environment;
  • opportunity to integrate with other internal bank software;
  • continuous enhancements on every Zabbix release;
  • the best integration with automation software (Ansible); and
  • the team's previous experience and skills.

Zabbix central infrastructure — DCs

First, we had to design a single infrastructure able to monitor many thousands of devices across two data centers and the branches, with many items and thousands of new values per second.

The architecture is now based on two database servers clustered with Patroni and etcd, a number of Zabbix proxies (one for each environment: preproduction, production, test, and so on), and two Zabbix servers, one for the DCs and another for the branches. We also suggested deploying a third Zabbix server to monitor the two main Zabbix servers. The DC database is replicated on the branches DB server, while the branches DB is replicated on the server handling the DCs using Patroni, so two copies of each database are available at any point in time. The two data centers are located more than 50 kilometers apart. In this picture, the focus is on DC monitoring:

Zabbix central infrastructure — DCs

Zabbix central infrastructure — branches

In this picture the focus is on branches.

Before starting the project, we planned one proxy for each branch, that is, roughly 1,500 proxies. During implementation we revised this initial choice and reduced the number of branch proxies to four.

Zabbix central infrastructure — branches

Zabbix monitoring project

New infrastructure

Hardware

  • A two-node bare-metal cluster for the PostgreSQL DB.
  • Two bare-metal Zabbix engines, each with two Intel Xeon Gold 5120 2.2GHz 14C/28T processors, 4 NVMe disks, and 256GB RAM.
  • A single VM for the Zabbix MoM.
  • Another bare-metal server for database backups.

Software

  • OS RHEL 7.
  • PostgreSQL 12 with TimeScaleDB 1.6 extension.
  • Patroni Cluster 1.6.5 for managing Postgres/TimeScaleDB.
  • Zabbix Server 5.0.
  • Zabbix proxies for metrics collection (5 for each DC and 4 for the branches).

Zabbix templates customization

We started from the official Zabbix 5.0 templates, then deleted many metrics and adjusted the templates with the large number of servers and devices to monitor in mind. We have:

  • added throttling and keepalive tuning for massive monitoring;
  • relaxed some triggers and related recovery to have no false positives and false negatives;
  • developed a new Custom templates module for Linux Multipath monitoring;
  • developed a new Custom template for NFS/CIFS monitoring (ZBXNEXT 6257);
  • developed a new custom Webhook for event ingestion on third-party software (CMS/Ticketing).

Zabbix configuration and provisioning

  • An essential part of the project was Zabbix configuration and provisioning, which was handled using Ansible tasks and playbooks. This allowed us to distribute and automate agent installation and to associate templates with hosts according to their role in the environment and to host groups derived from the CMDB (an API sketch of this association follows the list).
  • We have also developed some custom scripts, for instance, to keep users aligned with Active Directory.
  • We implemented single sign-on using Active Directory Federation Services and Zabbix SAML 2.0 support in order to interface with Microsoft Active Directory.
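
The Ansible playbooks themselves are not shown in the talk. As an illustration only, the same host-to-template association can be done directly against the Zabbix API; this is a minimal sketch, assuming Zabbix 5.0, an $auth token obtained via user.login as in the scripts elsewhere in this archive, and placeholder host name, IP, group ID, and template ID values:

# register a branch host and attach its role template (names and IDs are placeholders)
curl -sk -X POST -H "Content-Type: application/json" -d "
{
  \"jsonrpc\": \"2.0\",
  \"method\": \"host.create\",
  \"params\": {
    \"host\": \"branch-srv-0001\",
    \"interfaces\": [{\"type\": 1, \"main\": 1, \"useip\": 1,
                      \"ip\": \"10.0.0.10\", \"dns\": \"\", \"port\": \"10050\"}],
    \"groups\": [{\"groupid\": \"15\"}],
    \"templates\": [{\"templateid\": \"10265\"}]
  },
  \"auth\": \"$auth\",
  \"id\": 1
}" https://zabbix.example.com/api_jsonrpc.php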

 

Issues found and solved

During the implementation, we found and solved many issues.

  • A dedicated proxy for each of the 1,500 branches turned out too expensive to maintain and support, so we decided to deploy fewer proxies and managed to connect all the branch devices using only four of them.
  • After deploying all the metrics and templates associated with over 10,000 devices, the data center database exceeded 3.5TB. To decrease its size, we worked on throttling and keep-alive: we increased the keep-alive from 15 to 60 minutes and relaxed the sample interval to 5 minutes.
  • There is no official Zabbix agent for the Solaris 10 operating system, so we needed to recompile and test the agent extensively.
  • The preprocessing step is not available for NFS stale status (ZBXNEXT-6257).
  • We needed to increase the maximum length of user macro to 2,048 characters on the server-side (ZBXNEXT-2603).
  • We needed to ask for JavaScript preprocessing user macros support (ZBXNEXT-5185).

Project deliverables

  • The project was started in April 2020, and massive deployment followed in July/August.
  • At the moment, we have over 5,000 monitored servers in two data centers and over 8,000 monitored devices in branches — servers, ATMs, switches, etc.
  • Currently, each data center database is under 3.5TB, and the branches' database is about 0.5TB.
  • We monitor the two data centers with over 3,800 NVPS (new values per second).
  • Decommissioning of LibreNMS and CheckMK is planned for the end of 2020.

Next steps

  • To complete the data center monitoring for other devices — to expand monitoring to networking equipment.
  • To complete branch monitoring for switches and Wi-Fi AP.
  • To implement Custom Periodic reporting.
  • To integrate with C-level dashboard.
  • To tune alerting and escalation to send the right messages to the right people so that messages will not be discarded.

Questions & Answers

Question. Have you considered upgrading to Zabbix 5.0 and using TimeScaleDB compression? What TimeScaleDB features are you interested in the most — partitioning or compression?

Answer. We plan to upgrade to Zabbix 5.0 later. First, we need to run stress tests on our infrastructure. So, we might wait for some minor release and then activate compression.

We use Postgres solutions for database, backup, and cluster management (Patroni), and TimeScaleDB is important to manage all this data efficiently.

Question. What is the expected NVPS for this environment?

Answer. Nearly 4,000 for the main DC and about 500 for the branches — a medium-large instance.

Question. What methods did you use to migrate from your numerous different solutions to Zabbix?

Answer. We used the easy method: we installed everything from scratch, as migrating from so many different solutions would have been a complex task. Most of the time, we kept the existing monitoring solutions running in parallel to check whether Zabbix could collect the same monitoring information.

Data solution for solar energy application

Post Syndicated from Brad Berwald original https://blog.zabbix.com/data-solution-for-solar-energy-application/13005/

Morningstar, the world's leading supplier of solar controllers for remote solar power systems, has partnered with Zabbix to provide pre-configured integration of their data-enabled solar power products with the Zabbix network monitoring solution. Now, both power system data and network performance metrics integrate seamlessly, allowing remote solar systems to be monitored and managed from a single software platform, on premises or in the cloud, using solutions from Zabbix.

Contents

I. About Morningstar (1:43)
II. Products and technology (6:01)
III. Data solution for solar application (10:28)
IV. Zabbix solution (15:09)
V. Conclusion (21:40)
VI. Questions and Answers (23:09)

 

This post presents an overview of Morningstar's diverse product line and the industry applications it supports, as well as how the Zabbix network monitor can be used to manage and log time-series data using the SNMP and Modbus protocols for powerful, scalable system oversight and trend analysis.

Morningstar has been working in partnership with Zabbix to provide integration for the Morningstar products, including easy-to-use templates and pre-formatted data sets in order to speed up getting these products online, so that customers can monitor both their network data and solar power system data at remote sites.

About Morningstar

Morningstar is the leading supplier of charge controllers and inverters generally used in remote power systems around the world.

Morningstar, located in Newtown, Pennsylvania, USA, has sold over 4 million products deployed into the field since the company’s inception in 1993. Morningstar currently works in over 100 countries and provides reliable remote power for mission-critical applications.

We'd like to think of ourselves as the 'charging experts' because of our focus on battery life and many years of charging innovation. We have a diverse product line and many models designed for application-specific needs, such as solar lighting and telecommunications. Morningstar has one of the lowest hardware failure rates in the industry.

Some of these mission-critical applications include:

— Residential and rural electrification.

— Commercial systems.

— Industrial products, including telecommunications, oil and gas, security applications.

— Mobile and marine applications, which generally include boats, RVs, and caravans, agricultural applications, etc.

Overview of Morningstar solar applications

 

— The railroad industry, where remote signaling and track management are often solar-powered because of their critical nature and the absence of a readily available electric grid.

— Traffic applications: early warning systems, signaling and messaging systems, and traffic and speed monitoring equipment can also be easily powered for mobile deployment with a battery-based system.

— The oil and gas industry is a notable market for Morningstar, because oil field automation, such as measurement of gas flow and pressure (RTUs), as well as methane injection points used to keep the gas flowing and avoid well freeze-ups, can run on solar with a very modest amount of power for data monitoring. Since pipelines often traverse very remote regions, this is a highly advantageous way to get power where it's needed.

— In telecommunications, cellular base stations and backhaul links that provide data to the sites, land mobile radio applications, and satellite-based infrastructure benefit from remote solar power. In these applications, the loads can be modest or quite significant. In the latter case, several controllers can be combined to charge a very large battery bank, often with a hybrid diesel gen-set and other renewable energy sources, to provide a hybrid power system. This increases reliability, provides diversity during inclement weather, and maintains the high integrity of the site link.

— For rural electrification, small amounts of DC power can be provided in remote locations with no grid access, in countries with large populations and a huge need for lighting and cell phone charging.

Recently, we did a notable project in Peru, where nearly 1 million Peruvians were provided remote power access using 200,000 DC energy boxes. These provided basic 12V DC power and USB charging and were distributed across the country in some of the most remote locations. These home systems easily met the needs for lighting, device charging, and other small equipment. In addition, 3,000 integrated power systems for community centers were deployed to provide 230V AC power for more critical loads, including more substantial lighting, communication and, in some cases, health equipment for the benefit of the local population.

So, we’re very proud to have deployed probably one of the largest and most ambitious rural electrification projects in the history of the off-grid industry. The project was completed last year with our partner Tozzi Green of Italy.

Products and technology

Morningstar has a diverse product set covering all power levels, anywhere from modest 50W needs up to models that handle 3.2kW per device and can be paralleled for even greater capacity. We also provide inverter systems: our SureSine and the MultiWave, which will come to market in the near future. These inverters provide AC power and enable hybrid system charging (combining both solar and AC sources). They meet more demanding load needs and add robust high-current charging capabilities from the grid or from diesel generators.

Morningstar products and technologies

So, together all these product lines make up a diverse set of products that really fulfill the variety of needs in an off-grid remote power system. In many cases, each of our products includes open communication protocols, which can be used for remote management.

Charge controllers

A charge controller is installed between the PV modules and the battery. It monitors various system power and voltage readings and temperatures. The charge controller also manages the batteries: it provides adequate charging for a long service life, takes the batteries through their various charging stages, and manages the DC loads connected to the device. It can, of course, extend battery life significantly if the battery setpoints are configured correctly and the right choice for the battery model is made. That depends a lot on the battery chemistry, the temperatures it will experience, and how deeply it will be cycled or discharged each day while providing power to the system.

Our charge controller line covers both PWM and MPPT topologies.

 

MPPT controllers convert DC power from the PV array to the proper battery voltage. Each has an integrated DC-to-DC converter and controls the charging of the battery, preventing overcharging and extending battery life.

 

A controller can also be used with one of our inverter products in order to power small AC loads, such as equipment that requires 120 or 230-volt power remotely in the field.

Our product line covers a variety of PWM charge controllers.

PWM charge controllers

  • Pulse width modulation products are more cost-effective, simpler in design (from a complexity standpoint), and provide direct charging from an equally sized nominal solar array.
  • The MPPT charge controller line tracks the maximum power point (MPPT), thereby optimizing power harvest. The modules can span a much wider range of voltages, including much higher voltages, and the controller continuously tracks them to find the optimal operating point for the system.
  • The SunSaver and ProStar MPPT lines are used extensively in smaller systems under a thousand watts.
  • Our TriStar family is used for 3kW or greater and can be paralleled. A notable product is our 600V controller, which allows the PV modules to be wired in series for very high-voltage input, providing advantages in efficiency and in PV array distance from the controller. All the MPPT controllers convert the input voltage to the expected output to support 12V, 24V, or 48V battery systems.
  • The Morningstar inverter line includes the SureSine and MultiWave inverter-chargers. We also have a very extensive line of accessories used with these controllers: protocol conversion hardware, interface adapters, and other items that control relays to support system control or actuate additional components in remote off-grid systems.

EMC-1 Morningstar’s Ethernet MeterBus converter

The EMC-1 is a simple serial-to-Ethernet converter that runs a real-time operating system and supports a variety of protocols. Morningstar products can therefore be connected to industry-specific applications using standard protocols such as Modbus over IP or SNMP. It can also serve a simple HTML web GUI, allowing a direct one-to-one connection to the product for basic status monitoring from any type of device, including mobile devices such as phones or tablets.

Data solution for solar application

Challenges to remote monitoring of solar power sources

Power for wireless ISP infrastructure is a common off-grid application that requires network traffic and power to be monitored together, while also providing customer access in the field over Wi-Fi or LTE.

When these devices are deployed, their network equipment clearly has to be monitored with a network management system (NMS). With the EMC-1 and SNMP, Zabbix makes it far easier to integrate the power systems into the same monitoring system. So, you have a single point of software and data collection, and both power and network bandwidth and status can be monitored at the same time.

What this monitoring can help achieve:

  • Measurement of the true load consumption in the field. The power levels will vary depending on the type, amount of usage, and the technology and frequencies used. So, the load in the field throughout the day, during peak and off-peak hours can be directly monitored in real time.
  • Detection and root cause of network outages. We need to minimize network outages and to ensure that the site is reliable and the network is on at all times to avoid customer dissatisfaction and frustration by the operating carrier. The ability to monitor both power and network allows the root cause of network outages to be determined, whether it’s a system configuration, bandwidth restrictions, or something that has caused difficulty with the power system itself, such as a depleted battery, insufficient solar, even electrical faults, or possibly tampering with the system.
  • Ensuring sufficient power at the site to prevent deep battery discharge. It also ensures that you have adequate PV to cycle the battery properly. So, when the battery is depleted each day from powering the loads, it can be fully recharged the next day when PV power is available again. This balance is difficult to manage because you have to always ensure power for the battery, protect the loads, but you may or may not have adequate sun each day. So, reserve power is often provided in the system to ensure that the site will remain up during lower than average or uneven periods of PV supply.

  • Measurement of the current system status, as well as historical data. The network monitoring software collects all of this data, sometimes with a high level of granularity, so that you can see what is happening in the system on a minute-by-minute or hour-by-hour basis. This helps detect system faults that would otherwise be missed.

  • Ensuring system resiliency during low periods of production. In peak times, you usually have more than adequate power. However, off-grid systems may be sized for the worst-case scenario, for instance, the winter months or off-peak months with fewer sun hours and lower levels of solar insolation than during the summer. During these worst-case periods of the year, monitoring is critical, because that is when you are most likely to experience an outage due to inadequate solar.

  • When long-term data is available, you can compare, for instance, month-to-month or season-to-season power output, look at trends over the system's lifetime of operation, and detect anomalies and negative trends that indicate pending battery failure.

With most lead-acid batteries, a minimum life of five years is generally acceptable. With newer lithium technologies, battery life extends to 10 or more years when the system is adequately sized. So, the batteries can have a robust life as long as they are sized correctly and given adequate power.

But monitoring a battery's end of life, which will most likely occur at some point in the system, is really critical. Many remote sites deployed for an extended period go through one or two battery replacement cycles. With long-term data analysis, you can detect a downward trend: power declining over time and batteries beginning to show signs that their health is no longer adequate to support the system.

Zabbix solution

Remote monitoring of a Morningstar EMC-1 adapter’s IP connection through Zabbix monitoring platform

The diagram shows a typical system: one of the ProStar MPPT controllers connected to a solar array and a battery storage system, with loads typical of many of these applications. An EMC-1 can be connected to the device to provide IP connectivity, which in turn can be used by a variety of services:

  • Modbus protocol to connect to SCADA or other HMI Data viewing solutions, which are common in automation and oil and gas.
  • Simple HTTP or HTML web pages to get a simple look at a dashboard to understand what’s going on in the system.
  • SNMP can be used with network monitoring software such as Zabbix, so the entire site can be monitored with just one tool.

A cloud-based or server-based deployment works well for energy storage applications, data logging, notifications, and alarms. Native or external databases in the cloud can be used to archive the large amounts of data that will accumulate. A fleet can grow to hundreds or even thousands of deployed systems, so the software tool and the server must be scalable to grow over time and keep up with the data.

Advantages:

  • The benefits of an IP-based solution include compatibility with any network transport layer. In addition, there is a variety of wireless options in the field, including point-to-point, licensed, unlicensed, Wi-Fi, proprietary wireless protocols, and cellular.
  • Recently, notable gains in the satellite industry have provided lower latency and higher bandwidth. With satellite, you can often reach almost any part of the world, which gives it great benefits for solar power applications.
  • SNMP is a very lightweight protocol. On a metered or wireless connection, especially in these hard-to-reach locations, low-overhead UDP packets and minimal monitoring infrastructure keep the impact on the system itself to a minimum (see the polling example after this list).
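
To give a feel for how lightweight this is, polling a single reading costs one small UDP request/response pair. A sketch using the standard net-snmp command-line tools; the IP address, community string, and OID below are placeholders (the real Morningstar OIDs come from the MIB files shipped for the EMC-1):

# poll one value from an EMC-1 over SNMP v2c (OID shown is a placeholder, not a real Morningstar OID)
snmpget -v2c -c public 192.168.1.50 .1.3.6.1.4.1.99999.1.2.1.0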

Zabbix dashboards for Morningstar solar systems

  • These tools provide native Morningstar SNMP support. We review use cases, system needs for solar applications, and data sets. The MIB files have already been imported, and device templates are pre-configured for a variety of Morningstar products that support the EMC-1.
  • Dedicated templates allow you to easily connect the hardware to an existing system and go about monitoring Morningstar’s tools using your existing Zabbix instance.
  • Performance visibility of solar-powered systems is available:

— on a very high level to see if there are any systems that have needs or are in a fault state, or

— in greater detail, to analyze the time-series data and correlate it with other aspects of the system, to determine the root cause of a problem and how the data is trending. Time-series data correlation provides for accelerated troubleshooting.

  • An at-a-glance dashboard makes it easy to monitor the status of all Morningstar devices on the network and to scale to hundreds of sites.
  • Advanced, custom alerts sent by Zabbix and triggered by power system events ensure proactive notification of a pending issue at the site, hopefully before critical loads drop. If you are notified in time, then, using the bi-directional nature of some of the other protocols, system changes, corrections, extended runtimes, or even auxiliary charging systems such as generators can be activated to prevent an outage. Such proactive monitoring can only take place at scale when using a tool such as Zabbix.

Advantages of monitoring with Zabbix

  • Morningstar provides some simple PC-based utilities that run on Windows and offer direct Modbus communication and very simple data logging for a small number of devices. The Morningstar MSView utility allows configuration files to be uploaded and deployed to the controllers in the field, as well as basic troubleshooting.
  • Morningstar Live View is our built-in web dashboard that runs on the EMC. It serves a simple web page with everything displayed in HTML, so it can be viewed on any device regardless of the operating system.

These two products are meant for troubleshooting, site deployment, and configuration of small-scale systems. They are not designed to scale.

  • With Zabbix, an almost unlimited number of devices can be connected depending on your computing resources and power.
  • Zabbix supports SNMP and Modbus, which is beneficial for both telecom and industrial automation or smart city applications.
  • Zabbix gives you a real-time data display, as well as custom alerts and notifications. You can set up custom logging intervals, download extensive amounts of log data, and so on. Reports can be generated based on custom filtering, as well as long-term historical data, which becomes more critical to understanding the site's longevity.
  • For cloud-based systems, APIs are available, cloud-to-cloud integrations can be utilized, and advanced data management, analysis, or intelligence can be added on top of existing servers using additional third-party tools.

So, it’s really the only way to manage data of this scale and size.

Conclusion

Zabbix adds a great deal of value and capability to Morningstar products in the field. When access is provided via satellite, cellular, or fixed wireless technologies, the charge controllers can perform their duty of providing remote power while integrating easily, through the existing protocols, into monitoring across the entire system deployment.

As solar equipment is often used to power remote network infrastructure, integrating data from the network components and power systems into a centralized NMS provides an essential management tool to optimize system health and increase uptime. Zabbix also adds configuration options and valuable data analytics to ensure full system visibility. More information on the Zabbix network monitoring tools or Morningstar data-enabled remote power products is available at https://www.zabbix.com and www.morningstarcorp.com or can be requested from [email protected] and [email protected], respectively.

Questions and Answers

Question. Are Morningstar templates shared somewhere? Are they available to the public?

Answer. Part of the partnership with Zabbix is getting all of this integrated. We're putting the finishing touches on how that will be made available and easily downloadable as part of our SNMP support documentation. In addition, we can do some cross-referencing to help our products get online and, hopefully, plugged into the major network monitoring systems. All of that will probably be available within the next month.

Question.  Zabbix starting from 5.2 natively supports Modbus and MQTT. Do you plan on using that in your environment?

Answer. Yes, MQTT has come up quite recently and is an ideal solution where IP addressing challenges exist and pub/sub-style data reporting, with sessions initiated from within the network, is preferred. Currently, we support Modbus and SNMP, though we are considering other protocols. Modbus has been used within the solar industry for a long time for automation and control. We also have an extensive market in the oil and gas industry, where Modbus is used both for polling data and for real-time control by making configuration changes to the product remotely.

SNMP is a more recent development and it helps to get on the bandwagon with telecommunications and IT-related markets. So, it’s an easy transition using a protocol that customers are already familiar with.

Question. How do you use report generation? How do you enable it and implement it in Zabbix?

Answer. A lot of our customers are looking for trending data over a certain period of time. So, they would set up regular intervals for the data to be collected and reported because the long-term trending data is about looking at the same site during different periods of time or looking at the same site next to its peers to see how the power system may be varying from what is expected. So, regular report intervals can be executed and filtered based on certain conditions.

There are really a few key parameters of a solar site to look at to understand the health of the system. You need to focus on the battery levels, the maximum power of the solar panel, and a quick diagnostic check to find out if the controller shows any faults or alarms. So, if you have a simple report you can quickly be sure that hundreds of sites are in good shape. If one of them isn’t, you could drill down into more detail on just that specific site.

Let me subscribe – Zabbix masters IoT topics

Post Syndicated from Wolfgang Alper original https://blog.zabbix.com/let-me-subscribe-zabbix-masters-iot-topics/12710/

Zabbix 5.2 supports two important protocols used in the world of the Internet of Things — MQTT and Modbus. Now we can benefit from the newest Zabbix features and integrate Zabbix network monitoring in the world of IoT.

Contents

I. What is MQTT? (3:32:13)
II. MQTT and Zabbix integration (3:39:48)

1. MQTT setup (3:40:03)
2. Node-RED (3:42:12)
3. Splitting data (3:45:45)
4. Publishing data from Zabbix (3:52:23)

III. Questions & Answers (3:55:42)

What is MQTT?

MQTT, the Message Queuing Telemetry Transport, was invented in 1999 and designed to be bandwidth-efficient and lightweight, and thus battery-efficient. Initially, it was developed for monitoring oil pipelines.

It is a well-defined ISO standard (ISO/IEC 20922) and is increasingly adopted due to its suitability for the Internet of Things (IoT), sensor networks, home automation, machine-to-machine (M2M), and mobile applications. MQTT usually uses TCP/IP as the transport protocol, over port 1883, and can be encrypted using TLS, with 8883 as the default port.

There is a variation of MQTT available — MQTT-SN (MQTT for Sensor Networks), used for non-TCP/IP networks, such as Zigbee (an IEEE 802.15.4 radio-based protocol) or other UDP- or Bluetooth-based implementations.

There are 2 types of network entities available: ‘Message broker‘ and ‘Clients‘.

MQTT supports three Quality-of-Service levels:

— 0: At most once – “Fire and forget” where you might or might not receive the message.
— 1: At least once – The message can be sent/delivered multiple times.
— 2: Exactly once – Safest and slowest service.

MQTT is based on a ‘publish’ / ‘subscribe-to-topic’ mechanism:

1. Publish/subscribe.

Publish/subscribe pattern

The MQTT Message Broker consumes messages published by clients (on the left) using two-level 'Topics' (such as, for instance, office temperature, office humidity, or indoor air quality). The clients on the right side act as subscribers, receiving any information published on a particular topic. Every time a message is published to the broker, the broker notifies all of the subscribers (Clients 3 and 4), and these clients get the sensor value.
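
With the Mosquitto command-line clients, the same pattern looks roughly like this; the broker hostname and topic name are illustrative:

# subscriber: receive every message published to this topic (blocks and prints incoming values)
mosquitto_sub -h broker.example.com -t "office/temperature"

# publisher: any client can push a new sensor value to the same topic
mosquitto_pub -h broker.example.com -t "office/temperature" -m "21.5"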

2. Combined publishing/subscribing

Combined pub/sub

A client can be a subscriber and a publisher at the same time. In this example, Client 1 publishes a brightness value and Client 3 subscribes to it. Client 3 may decide that a brightness of, say, 1,500 is too low, so it publishes a new message to the topic 'office' to tell the light controller to increase the brightness, while Client 2, the light controller with a subscription, changes the brightness level on receipt of the message.

3. Wildcards subs

+ = single-level, # = multi-level

Wildcards in MQTT are easy. You can subscribe, for instance, to the 'office' topic with a '+' in place of one level, so that it matches brightness as well as any other office metric. Because the '+' sign substitutes exactly one level of the topic, it is a single-level wildcard, while the pound sign ('#') works as a multi-level wildcard (see the examples below).
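
With the Mosquitto client, the two wildcard types would look roughly like this; the topic names are illustrative:

# '+' matches exactly one topic level: office/temperature, office/brightness, and so on
mosquitto_sub -h broker.example.com -t "office/+"

# '#' matches any number of levels below 'office'
mosquitto_sub -h broker.example.com -t "office/#"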

MQTT features:

  • Clients can publish and subscribe to one or more topics.
  • One client can publish and subscribe at the same time.
  • Clients can subscribe using single/multi-level wildcards.
  • Clients can choose between three different QoS levels.

MQTT advanced features:

  • Messages can be retained by the broker for new subscribers. So, if a new client subscribes to a particular topic, then the publisher can mark its messages as ‘Retained‘ so that the new subscriber gets the last retained message.
  • Clients can provide a “last will and testament” that will be published by the broker when the client “dies”.

MQTT and Zabbix integration

MQTT setup

Integrating Zabbix into the multiple-client mix

Integrated structure:

1. Four sensors:

    • Server room.
    • Training room.
    • Sales room.
    • Support room.

2. Four different topics:

    • office
    • bielefeld (home town)
    • serverroom
    • trainingroom

3. Mosquitto MQTT Message Broker, which is one of the well-known message brokers.

So, the sensors are publishing the data to the Mosquitto Message Broker, where any MQTT-enabled device or system can pick those values up. In our case, it’s the home automation system, which subscribes to the Message Broker and has access to all of the values published by the sensors.

Thanks to MQTT support in Zabbix 5.2, Zabbix can now subscribe to the Mosquitto Message Broker and immediately get access to all of the sensors publishing their values to the broker.

As we can have multiple subscribers, multiple clients can subscribe to one topic on the Message Broker. So the home automation system can subscribe to the same values published to the Message Broker, as well as Zabbix.

Node-RED

Sooner or later, you will need Node-RED, which is a flow-based programming tool allowing you to subscribe to the broker and to publish messages to the broker acting as the client, as well as to work with the data.

Data Processing in Node-RED

This setup might be useful, if, for instance, some Zabbix trigger fires and passes the information over to the MQTT to publish the outcome of the trigger to the Message Broker, which will be then picked up by the home automation system.

Zabbix publishes data to the broker

You can have two different Zabbix instances subscribing to the same Message Broker acting just as two different clients.

Multiple Zabbix servers sharing the same data

Node-RED:

    • Construction kit for the Internet of Things and home automation.
    • Acts as MQTT client able to publish and subscribe.
    • Flow-based tool for visual programming based on Node.js.
    • Graphical web editor.
    • Supports input, processing, and output nodes.
    • Extensible with plugins and custom function nodes.

Different types of nodes can be connected in the workspace. For instance, the nodes subscribing to a topic and transforming the data, or the nodes writing the data to a log file.

Node-RED

We can get the data from the sensors as the raw JSON string containing 20-30 metrics in a payload, and as a parsed JSON object in the Node-RED Debug node with easy-to-read metrics, such as, for instance, temperature, humidity, WiFi quality, indoor air quality, etc.

Multiple metrics in one message

Splitting data

We have different options for data splitting available:

  • Split on the MQTT level: use Node-RED to split the metrics and then publish them in their own topics (a good setup when other clients can handle only a single metric at a time).

Splitting data in Node-RED

 

  • Split on the Zabbix level: set up an MQTT item as a master item and use Zabbix JSON preprocessing with corresponding dependent items. It's more efficient because Zabbix needs only one subscription.

We can get the data with the brand-new mqtt.get item in Zabbix 5.2 (an example item key follows the list below):

— Requires Agent 2.
— Requires active checks. Every time a client publishes a message to the topic, we need the broker to push that data to us, so mqtt.get must listen on the subscription and get notified when new data comes in; that is exactly what active checks provide.
— Broker URL default is localhost.
— User name and password are optional.
— Uses Eclipse Paho Go client library.
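
An item key for one of the rooms could look roughly like this; the broker URL and topic are illustrative, and the general syntax is mqtt.get[<broker URL>,<topic>]:

mqtt.get["tcp://localhost:1883","serverroom/status"]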

One Zabbix agent in active mode sending data to multiple hosts

For our setup with four sensors (in the Sales Room, Server Room, Support Room, and Training Room), we need four hosts in Zabbix. Traditionally, you would need four different agents to handle them, as each agent running in active mode is configured with its own hostname. In our setup, however, just one installed agent handles the different hosts by subscribing to multiple topics.

This is possible because of the new feature, running active agent checks for multiple hosts, which is now available in Zabbix 5.2. All we need is (a configuration sketch follows below):

—  to set up hosts in Zabbix (as usual),
—  to define our MQTT items (as usual),
—  to set up just one agent with all of the hostnames the agent should be responsible for (the new feature),
—  to set up the master item, which is our mqtt.get item,
—  to define several dependent items and preprocessing for each of the dependent items, and
—  to start preprocessing with JSONPath.

NOTE. Every time the master item gets an update, so do all of the dependent items in Zabbix.
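
Putting this together, a minimal sketch of the relevant agent configuration follows. The server name and the JSON field are illustrative; the comma-separated Hostname list is the new Zabbix 5.2 feature mentioned above, and the JSONPath line is the kind of preprocessing step you would attach to each dependent item:

# /etc/zabbix/zabbix_agent2.conf (excerpt): one agent serving four Zabbix hosts
ServerActive=zabbix.example.com
Hostname=SalesRoom,ServerRoom,SupportRoom,TrainingRoom

# dependent item preprocessing (JSONPath), e.g. to extract the temperature from the payload:
# $.temperature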

Master item and dependent items

  • Combine both methods: let other clients subscribe to a single metric using their specific topic, but publish all sensor data for Zabbix in one topic.

NOTE. Data received and displayed on the dashboard is based on the MQTT item, the payload, and the MQTT messages received from the Message Broker.

Sensor data dashboard

Publishing data from Zabbix

Now you want to publish the outcome of a Zabbix trigger, so it can be consumed by other MQTT-enabled devices. Any MQTT subscriber, like Node-RED, should receive the alert. To do that, you need:

  • to define a new media type to send problems to the topic, that is, to pass the data over to the Message Broker:
  • to use the command-line tool for Mosquitto — mosquitto_pub, which allows us to publish the message:
#!/bin/sh
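# $1 is the message body (the JSON problem template rendered by Zabbix); $2 is appended to the topic,
# for example the host name, depending on how the media type script parameters are configured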
mosquitto_pub -h yourbroker.io -m "$1" -t "zabbix/problems/$2"

  • to make sure that the data is sent to the broker in the right format. In this case, we use JSON as the transport format and define a JSON problem template and a JSON problem recovery template (one possible layout is sketched below).
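
A problem message template for the media type could, for instance, look like the snippet below. The field names are just one possible layout; the macros are standard Zabbix macros that get expanded before the script publishes the message:

{
  "eventid": "{EVENT.ID}",
  "host": "{HOST.NAME}",
  "severity": "{EVENT.SEVERITY}",
  "problem": "{EVENT.NAME}",
  "status": "PROBLEM"
}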

 

In Zabbix, you'll see the problem, the actions, and the media type firing using the subscription, and in the Debug node of Node-RED, you'll see that the data has been received from Zabbix.

Zabbix problems  published via MQTT

This model with Node-RED can be used to create sophisticated setups. For instance, you can take the data from Zabbix, forward it via actions and media types, preprocess it in Node-RED, and transform the data in many different ways.

IoT devices and other subscribers can react to issues detected by Zabbix using Node-RED

NOTE. To try out the MQTT setup and the new Zabbix features, you can use the live broker available on IntelliTrend's new GitHub account, which gets data from the sensors every 10 minutes. You'll also find templates, access data, the address of the broker, etc. — everything you need to get started.

Questions & Answers

Question. If the MQTT client gets overloaded due to high message frequency on subscribe topics, how will that affect Zabbix?

Answer. Either the broker might be overloaded or the Zabbix agent might not be able to keep up. As for the broker side, quality-of-service levels are defined in the MQTT protocol, more specifically QoS level 2, which guarantees delivery. So if QoS 2 is used, the messages won't get lost but will be resent in case of failure.

Question. What else would you expect from the IoT side of Zabbix? What kind of protocols or things would get added? 

Answer. There’s always room for improvement. You can use third-party tools, custom scripts, or any tools to enhance Zabbix. I’m sure that using user script parameters was an excellent design decision. But the official support of MQTT is a quantum leap for Zabbix because it opens the door to most IoT infrastructures, as MQTT is the most important IoT protocol so far.

For instance, one of our customers is monitoring the infrastructure of electricity generators, production systems, etc. They use their own monitoring platform provided by vendors. The request was to integrate alerts or some metrics into Zabbix. The customer’s monitoring platform used MQTT protocol. So, all we had to do was to make their monitoring platform use external scripts and MQTT support.

Close problem automatically via Zabbix API

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/close-problem-automatically-via-zabbix-api/12461/

Today we are talking about a use case where it's impossible to find a proper way to write a recovery expression for a Zabbix trigger. In other words, we know how to identify problems, but there is no good way to detect when the problem is gone.

This mostly relates to huge environments, for example:

  • One log file with hundreds of patterns inside. We respect all of them; we need them.
  • An SNMP trap item (snmptrap.fallback) with different patterns being written into it.

In these situations, the trigger is most likely configured with “Event generation mode: Multiple.” In practice this means that every time a “problematic metric” hits the instance, one more problem is opened.

Goal:
I just need to receive an email about the record, then close the event.

As a workaround (let’s call it a solution here), we can define an action which will:

  1. contact an API endpoint
  2. manually acknowledge the event and close it

The reason this approach is possible is that when an event hits the action, the operation knows the event ID of the problem. The macro {EVENT.ID} saves the day.

To solve the problem, we need to define the API connection details as global macros:

{$Z_API_PHP}=http://127.0.0.1/api_jsonrpc.php
{$Z_API_USER}=api
{$Z_API_PASSWORD}=zabbix

NOTE. 'http://127.0.0.1/api_jsonrpc.php' assumes the frontend runs on the same server as systemd:zabbix-server. If that is not the case, use the frontend address of the Zabbix GUI plus 'api_jsonrpc.php'. A quick way to verify the endpoint is shown below.
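
A minimal sanity check of the endpoint (apiinfo.version is a standard Zabbix API method that requires no authentication):

curl -sk -X POST -H "Content-Type: application/json" -d \
  '{"jsonrpc":"2.0","method":"apiinfo.version","params":[],"id":1}' \
  http://127.0.0.1/api_jsonrpc.php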

We will have two actions. The first one will deliver a notification by email:

After 1 minute, a second action will close the event:

This is the full bash snippet to put inside the second action's operation. No need to change anything; it works with copy and paste:

url={$Z_API_PHP}
user={$Z_API_USER}
password={$Z_API_PASSWORD}

# authorization
auth=$(curl -sk -X POST -H "Content-Type: application/json" -d "
{
	\"jsonrpc\": \"2.0\",
	\"method\": \"user.login\",
	\"params\": {
		\"user\": \"$user\",
		\"password\": \"$password\"
	},
	\"id\": 1,
	\"auth\": null
}
" $url | \
grep -E -o "([0-9a-f]{32,32})")

# acknowledge and close event
curl -sk -X POST -H "Content-Type: application/json" -d "
{
	\"jsonrpc\": \"2.0\",
	\"method\": \"event.acknowledge\",
	\"params\": {
		\"eventids\": \"{EVENT.ID}\",
		\"action\": 1,
		\"message\": \"Problem resolved.\"
	},
	\"auth\": \"$auth\",
	\"id\": 1
}" $url

# log out to invalidate the API session
curl -sk -X POST -H "Content-Type: application/json" -d "
{
    \"jsonrpc\": \"2.0\",
    \"method\": \"user.logout\",
    \"params\": [],
    \"id\": 1,
    \"auth\": \"$auth\"
}
" $url