Tag Archives: Oath

Announcing the 2nd Annual Moloch Conference: Learn how to augment your current security infrastructure

Post Syndicated from amberwilsonla original https://yahooeng.tumblr.com/post/179556677346

yahoodevelopers:

We’re excited to share that the 2nd Annual MolochON will be Thursday, Nov. 1, 2018 in Dulles, Virginia, at the Oath campus. Moloch is a large-scale, open source, full packet capturing, indexing and database system.

There’s no cost to attend the event and we’d love to see you there! Feel free to register here.

We’ll be joined by many fantastic speakers from the Moloch community to present on the following topics:

Moloch: Recent Changes & Upcoming Features
by Andy Wick, Sr Princ Architect, Oath & Elyse Rinne, Software Dev Engineer, Oath

Since the last MolochON, many new features have been added to Moloch. We will review some of these features and demo how to use them. We will also discuss a few desired upcoming features.

Speaker Bios
Andy is the creator of Moloch and former Architect of AIM. He joined the security team in 2011 and hasn’t looked back.

Elyse is the UI and full stack engineer for Moloch. She revamped the UI to be more user-friendly and maintainable. Now that the revamp has been completed, Elyse is working on implementing awesome new Moloch features!



Small Scale at Large Scale: Putting Moloch on the Analyst’s Desk
by Phil Hagen, SANS Senior Instructor, DFIR Strategist, Red Canary

I’ve been excited to add Moloch to the FOR572 class, Advanced Network Forensics at the SANS Institute. In FOR572, we cover Moloch with nearly 1,000 students per year, via classroom discussions and hands-on labs. This presents an interesting engineering problem, in that we provide a self-contained VMware image for the classroom lab, but it is also suitable for use in forensic casework. In this talk, I’ll cover some of what we did to make a single VM into a stable and predictable environment, distributed to hundreds of students across the world.

Speaker Bio
Phil is a Senior Instructor with the SANS Institute and the DFIR Strategist at Red Canary. He is the course lead for SANS FOR572, Advanced Network Forensics, and has been in the information security industry for over 20 years. Phil is also the lead for the SOF-ELK project, which provides a free, open source, ready-to-use Elastic Stack appliance to aid and optimize security operations and forensic processing. Networking is in his blood, dating back to a 2400 baud modem in an Apple //e, which he still has.



Oath Deployments
by Andy Wick, Sr Princ Architect, Oath

The formation of Oath gave us an opportunity to rethink and create a new visibility stack. In this talk, we will be sharing our process for designing our stack for both office and data center deployments and discussing the technologies we decided to use.

Speaker Bio
Andy is the creator of Moloch and former Architect of AIM. He joined the security team in 2011 and hasn’t looked back.



Centralized Management and Deployment with Docker and Ansible
by Taylor Ashworth, Cybersecurity Analyst

I will focus on how to use Docker and Ansible to deploy, update, and manage Moloch along with other tools like Suricata, WISE, and ES. I will explain the time-saving benefits of Ansible and the workload reduction benefits of Docker,and I will also cover the topic “Pros and cons of using Ansible tower/AWX over Ansible in CLI.” If time permits, I’ll discuss “Using WISE for data enrichment.”

Speaker Bio
Taylor is a cybersecurity analyst who was tired of the terrible tools he was presented with and decided to teach himself how to set up tools to successfully do his job.



Automated Threat Intel Investigation Pipeline
by Matt Carothers, Principal Security Architect, Cox Communications

I will discuss integrating Moloch into an automated threat intel investigation pipeline with MISP.

Speaker Bio
Matt enjoys sunsets, long hikes in the mountains and intrusion detection. After studying Computer Science at the University of Oklahoma, he accepted a position with Cox Communications in 2001 under the leadership of renowned thought leader and virtuoso bass player William “Wild Bill” Beesley, who asked to be credited in this bio. There, Matt formed Cox’s abuse department, which he led for several years, and today he serves as Cox’s Principal Security Architect.



Using WISE
by Andy Wick, Sr Princ Architect, Oath

We will review how to use WISE and provide real-life examples of features added since the last MolochON.

Speaker Bio
Andy is the creator of Moloch and former Architect of AIM. He joined the security team in 2011 and hasn’t looked back.



Moloch Deployments
by Srinath Mantripragada, Linux Integrator, SecureOps

I will present a Moloch deployment with 20+ different Moloch nodes. A range will be presented, including small, medium, and large deployments that go from full hardware with dedicated capture cards to virtualized point-of-presence and AWS with transit network. All nodes run Moloch, Suricata and Bro.

Speaker Bio
Srinath has worked as a SysAdmin and related positions for most of his career. He currently works as an Integrator/SysAdmin/DevOps for SecureOps, a Security Services company in Montreal, Canada.



Elasticsearch for Time-series Data at Scale
by Andrew Selden, Solution Architect, Elastic

Elasticsearch has evolved beyond search and logging to be a first-class, time-series metric store. This talk will explore how to achieve 1 million metrics/second on a relatively modest cluster. We will take a look at issues such as data modeling, debugging, tuning, sharding, rollups and more.

Speaker Bio
Andrew Selden has been running Elasticsearch at scale since 2011 where he previously led the search, NLP, and data engineering teams at Meltwater News and later developed streaming analytics solutions for BlueKai’s advertising platform (acquired by Oracle). He started his tenure at Elastic as a core engineer and for the last two years has been helping customers architect and scale.


After the conference, enjoy a complimentary happy hour, sponsored by Arista.

Hope to see you there!

A Peek Behind the Mail Curtain

Post Syndicated from marcelatoath original https://yahooeng.tumblr.com/post/174023151641

USE IMAP TO ACCESS SOME UNIQUE FEATURES

By Libby Lin, Principal Product Manager

Well, we actually won’t show you how we create the magic in our big OATH consumer mail factory. But nevertheless we wanted to share how interested developers could leverage some of our unique features we offer for our Yahoo and AOL Mail customers.

To drive experiences like our travel and shopping smart views or message threading, we tag qualified mails with something we call DECOS and THREADID. While we will not indulge in explaining how exactly we use them internally, we wanted to share how they can be used and accessed through IMAP.

So let’s just look at a sample IMAP command chain. We’ll just assume that you are familiar with the IMAP protocol at this point and you know how to properly talk to an IMAP server.

So here’s how you would retrieve DECO and THREADIDs for specific messages:

1. CONNECT

   openssl s_client -crlf -connect imap.mail.yahoo.com:993

2. LOGIN

   a login username password

   a OK LOGIN completed

3. LIST FOLDERS

   a list “” “*”

   * LIST (\Junk \HasNoChildren) “/” “Bulk Mail”

   * LIST (\Archive \HasNoChildren) “/” “Archive”

   * LIST (\Drafts \HasNoChildren) “/” “Draft”

   * LIST (\HasNoChildren) “/” “Inbox”

   * LIST (\HasNoChildren) “/” “Notes”

   * LIST (\Sent \HasNoChildren) “/” “Sent”

   * LIST (\Trash \HasChildren) “/” “Trash”

   * LIST (\HasNoChildren) “/” “Trash/l2”

   * LIST (\HasChildren) “/” “test level 1”

   * LIST (\HasNoChildren) “/” “test level 1/nestedfolder”

   * LIST (\HasNoChildren) “/” “test level 1/test level 2”

   * LIST (\HasNoChildren) “/” “&T2BZfXso-”

   * LIST (\HasNoChildren) “/” “&gQKAqk7WWr12hA-”

   a OK LIST completed

4.SELECT FOLDER

   a select inbox

   * 94 EXISTS

   * 0 RECENT

   * OK [UIDVALIDITY 1453335194] UIDs valid

   * OK [UIDNEXT 40213] Predicted next UID

   * FLAGS (\Answered \Deleted \Draft \Flagged \Seen $Forwarded $Junk $NotJunk)

   * OK [PERMANENTFLAGS (\Answered \Deleted \Draft \Flagged \Seen $Forwarded $Junk $NotJunk)] Permanent flags

   * OK [HIGHESTMODSEQ 205]

   a OK [READ-WRITE] SELECT completed; now in selected state

5. SEARCH FOR UID

   a uid search 1:*

   * SEARCH 1 2 3 4 11 12 14 23 24 75 76 77 78 114 120 121 124 128 129 130 132 133 134 135 136 137 138 40139 40140 40141 40142 40143 40144 40145 40146 40147 40148     40149 40150 40151 40152 40153 40154 40155 40156 40157 40158 40159 40160 40161 40162 40163 40164 40165 40166 40167 40168 40172 40173 40174 40175 40176     40177 40178 40179 40182 40183 40184 40185 40186 40187 40188 40190 40191 40192 40193 40194 40195 40196 40197 40198 40199 40200 40201 40202 40203 40204     40205 40206 40207 40208 40209 40211 40212

   a OK UID SEARCH completed

6. FETCH DECOS BASED ON UID

   a uid fetch 40212 (X-MSG-DECOS X-MSG-ID X-MSG-THREADID)

   * 94 FETCH (UID 40212 X-MSG-THREADID “108” X-MSG-ID “ACfIowseFt7xWtj0og0L2G0T1wM” X-MSG-DECOS (“FTI” “F1” “EML”))

   a OK UID FETCH completed

Secure Images

Post Syndicated from marcelatoath original https://yahooeng.tumblr.com/post/172068649246

oath-postmaster:

By Marcel Becker

The mail team at OATH is busy  integrating  Yahoo and AOL technology to deliver an even better experience across all our consumer mail products. While privacy and security are top priority for us, we also want to improve the experience and remove unnecessary clutter across all of our products.

Starting this week we will be serving images in mails via our own secure proxy servers. This will not only increase speed and security in our own mail products and reduce the risk of phishing and other scams,  but it will also mean that our users don’t have to fiddle around with those “enable images” settings. Messages and inline images will now just show up as originally intended.

We are aware that commercial mail senders are relying on images (so-called pixels) to track delivery and open rates. Our proxy solution will continue to support most of these cases and ensure that true mail opens are recorded.

For senders serving dynamic content based on the recipient’s location (leveraging standard IP-based browser and app capabilities) we recommend falling back on other tools and technologies which do not rely on IP-based targeting.

All of our consumer mail applications (Yahoo and AOL) will benefit from this change. This includes our desktop products as well as our mobile applications across iOS and Android.

If you have any feedback or want to discuss those changes with us personally, just send us a note to [email protected].

Secure Images

Post Syndicated from marcelatoath original https://yahooeng.tumblr.com/post/172037447286

By Marcel Becker

The mail team at OATH is busy  integrating  Yahoo and AOL technology to deliver an even better experience across all our consumer mail products. While privacy and security are top priority for us, we also want to improve the experience and remove unnecessary clutter across all of our products.

Starting this week we will be serving images in mails via our own secure proxy servers. This will not only increase speed and security in our own mail products and reduce the risk of phishing and other scams,  but it will also mean that our users don’t have to fiddle around with those “enable images” settings. Messages and inline images will now just show up as originally intended.

We are aware that commercial mail senders are relying on images (so-called pixels) to track delivery and open rates. Our proxy solution will continue to support most of these cases and ensure that true mail opens are recorded.

For senders serving dynamic content based on the recipient’s location (leveraging standard IP-based browser and app capabilities) we recommend falling back on other tools and technologies which do not rely on IP-based targeting.

All of our consumer mail applications (Yahoo and AOL) will benefit from this change. This includes our desktop products as well as our mobile applications across iOS and Android.

If you have any feedback or want to discuss those changes with us personally, just send us a note to [email protected].

Success at Apache: A Newbie’s Narrative

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/170536010891

yahoodevelopers:

Kuhu Shukla (bottom center) and team at the 2017 DataWorks Summit


By Kuhu Shukla

This post first appeared here on the Apache Software Foundation blog as part of ASF’s “Success at Apache” monthly blog series.

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in and the new release that we tested out internally on our many thousand strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo – which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year – has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo jobs from Apache MapReduce to Apache Tez, a then-new DAG based execution engine.

Oath grid infrastructure is through and through driven by Apache technologies be it storage through HDFS, resource management through YARN, job execution frameworks with Tez and user interface engines such as Hive, Hue, Pig, Sqoop, Spark, Storm. Our grid solution is specifically tailored to Oath’s business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link on An Introduction to Apache Tez. I watched it carefully trying to keep up with all the questions I had and recognized a few names from my academic readings of Yarn ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and getting started on setting up on a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions and being ever more careful with whitespaces in my patches. One thing was clear – Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community nearly 80-90% of the Yahoo jobs were now running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – “Just needs an API fix,”  I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie’s rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open part of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, to the extent of which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates and extensive knowledge transfers were made.

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing lifecycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project’s inception. With several thousands of jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were tracked now through an umbrella JIRA TEZ-3334 and the to-do list was long. I picked a few JIRAs and as I started reading through I realized, this is all new code I get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the whiteboards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fixing big and small issues with this evolving feature.

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community and cross Apache product community interaction made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

About the Author:

Kuhu Shukla is a software engineer at Oath and did her Masters in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on “Tez Shuffle Handler: Shuffling At Scale With Apache Hadoop”. Prior to that she worked on Juniper Networks’ router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.

How to Make Your Web App More Reliable and Performant Using webpack: a Yahoo Mail Case Study

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/168508200981

yahoodevelopers:

image

By Murali Krishna Bachhu, Anurag Damle, and Utkarsh Shrivastava

As engineers on the Yahoo Mail team at Oath, we pride ourselves on the things that matter most to developers: faster development cycles, more reliability, and better performance. Users don’t necessarily see these elements, but they certainly feel the difference they make when significant improvements are made. Recently, we were able to upgrade all three of these areas at scale by adopting webpack® as Yahoo Mail’s underlying module bundler, and you can do the same for your web application.

What is webpack?

webpack is an open source module bundler for modern JavaScript applications. When webpack processes your application, it recursively builds a dependency graph that includes every module your application needs. Then it packages all of those modules into a small number of bundles, often only one, to be loaded by the browser.

webpack became our choice module bundler not only because it supports on-demand loading, multiple bundle generation, and has a relatively low runtime overhead, but also because it is better suited for web platforms and NodeJS apps and has great community support.

image

Comparison of webpack to other open source bundlers


How did we integrate webpack?

Like any developer does when integrating a new module bundler, we started integrating webpack into Yahoo Mail by looking at its basic config file. We explored available default webpack plugins as well as third-party webpack plugins and then picked the plugins most suitable for our application. If we didn’t find a plugin that suited a specific need, we wrote the webpack plugin ourselves (e.g., We wrote a plugin to execute Atomic CSS scripts in the latest Yahoo Mail experience in order to decrease our overall CSS payload**).

During the development process for Yahoo Mail, we needed a way to make sure webpack would continuously run in the background. To make this happen, we decided to use the task runner Grunt. Not only does Grunt keep the connection to webpack alive, but it also gives us the ability to pass different parameters to the webpack config file based on the given environment. Some examples of these parameters are source map options, enabling HMR, and uglification.

Before deployment to production, we wanted to optimize the javascript bundles for size to make the Yahoo Mail experience faster. webpack provides good default support for this with the UglifyJS plugin. Although the default options are conservative, they give us the ability to configure the options. Once we modified the options to our specifications, we saved approximately 10KB.

image

Code snippet showing the configuration options for the UglifyJS plugin


Faster development cycles for developers

While developing a new feature, engineers ideally want to see their code changes reflected on their web app instantaneously. This allows them to maintain their train of thought and eventually results in more productivity. Before we implemented webpack, it took us around 30 seconds to 1 minute for changes to reflect on our Yahoo Mail development environment. webpack helped us reduce the wait time to 5 seconds.

More reliability

Consumers love a reliable product, where all the features work seamlessly every time. Before we began using webpack, we were generating javascript bundles on demand or during run-time, which meant the product was more prone to exceptions or failures while fetching the javascript bundles. With webpack, we now generate all the bundles during build time, which means that all the bundles are available whenever consumers access Yahoo Mail. This results in significantly fewer exceptions and failures and a better experience overall.

Better Performance

We were able to attain a significant reduction of payload after adopting webpack.

  1. Reduction of about 75 KB gzipped Javascript payload
  2. 50% reduction on server-side render time
  3. 10% improvement in Yahoo Mail’s launch performance metrics, as measured by render time above the fold (e.g., Time to load contents of an email).

Below are some charts that demonstrate the payload size of Yahoo Mail before and after implementing webpack.

image

Payload before using webpack (JavaScript Size = 741.41KB)


image

Payload after switching to webpack (JavaScript size = 669.08KB)


image

Conclusion

Shifting to webpack has resulted in significant improvements. We saw a common build process go from 30 seconds to 5 seconds, large JavaScript bundle size reductions, and a halving in server-side rendering time. In addition to these benefits, our engineers have found the community support for webpack to have been impressive as well. webpack has made the development of Yahoo Mail more efficient and enhanced the product for users. We believe you can use it to achieve similar results for your web application as well.

**Optimized CSS generation with Atomizer

Before we implemented webpack into the development of Yahoo Mail, we looked into how we could decrease our CSS payload. To achieve this, we developed an in-house solution for writing modular and scoped CSS in React. Our solution is similar to the Atomizer library, and our CSS is written in JavaScript like the example below:

image

Sample snippet of CSS written with Atomizer


Every React component creates its own styles.js file with required style definitions. React-Atomic-CSS converts these files into unique class definitions. Our total CSS payload after implementing our solution equaled all the unique style definitions in our code, or only 83KB (21KB gzipped).

During our migration to webpack, we created a custom plugin and loader to parse these files and extract the unique style definitions from all of our CSS files. Since this process is tied to bundling, only CSS files that are part of the dependency chain are included in the final CSS.