Tag Archives: yahoo

Announcing the 2nd Annual Moloch Conference: Learn how to augment your current security infrastructure

Post Syndicated from amberwilsonla original https://yahooeng.tumblr.com/post/179556677346

yahoodevelopers:

We’re excited to share that the 2nd Annual MolochON will be Thursday, Nov. 1, 2018 in Dulles, Virginia, at the Oath campus. Moloch is a large-scale, open source, full packet capturing, indexing and database system.

There’s no cost to attend the event and we’d love to see you there! Feel free to register here.

We’ll be joined by many fantastic speakers from the Moloch community to present on the following topics:

Moloch: Recent Changes & Upcoming Features
by Andy Wick, Sr Princ Architect, Oath & Elyse Rinne, Software Dev Engineer, Oath

Since the last MolochON, many new features have been added to Moloch. We will review some of these features and demo how to use them. We will also discuss a few desired upcoming features.

Speaker Bios
Andy is the creator of Moloch and former Architect of AIM. He joined the security team in 2011 and hasn’t looked back.

Elyse is the UI and full stack engineer for Moloch. She revamped the UI to be more user-friendly and maintainable. Now that the revamp has been completed, Elyse is working on implementing awesome new Moloch features!



Small Scale at Large Scale: Putting Moloch on the Analyst’s Desk
by Phil Hagen, SANS Senior Instructor, DFIR Strategist, Red Canary

I’ve been excited to add Moloch to the FOR572 class, Advanced Network Forensics at the SANS Institute. In FOR572, we cover Moloch with nearly 1,000 students per year, via classroom discussions and hands-on labs. This presents an interesting engineering problem, in that we provide a self-contained VMware image for the classroom lab, but it is also suitable for use in forensic casework. In this talk, I’ll cover some of what we did to make a single VM into a stable and predictable environment, distributed to hundreds of students across the world.

Speaker Bio
Phil is a Senior Instructor with the SANS Institute and the DFIR Strategist at Red Canary. He is the course lead for SANS FOR572, Advanced Network Forensics, and has been in the information security industry for over 20 years. Phil is also the lead for the SOF-ELK project, which provides a free, open source, ready-to-use Elastic Stack appliance to aid and optimize security operations and forensic processing. Networking is in his blood, dating back to a 2400 baud modem in an Apple //e, which he still has.



Oath Deployments
by Andy Wick, Sr Princ Architect, Oath

The formation of Oath gave us an opportunity to rethink and create a new visibility stack. In this talk, we will be sharing our process for designing our stack for both office and data center deployments and discussing the technologies we decided to use.

Speaker Bio
Andy is the creator of Moloch and former Architect of AIM. He joined the security team in 2011 and hasn’t looked back.



Centralized Management and Deployment with Docker and Ansible
by Taylor Ashworth, Cybersecurity Analyst

I will focus on how to use Docker and Ansible to deploy, update, and manage Moloch along with other tools like Suricata, WISE, and Elasticsearch. I will explain the time-saving benefits of Ansible and the workload-reduction benefits of Docker, and I will also cover the pros and cons of using Ansible Tower/AWX over Ansible on the CLI. If time permits, I’ll discuss using WISE for data enrichment.

Speaker Bio
Taylor is a cybersecurity analyst who was tired of the terrible tools he was presented with and decided to teach himself how to set up tools to successfully do his job.



Automated Threat Intel Investigation Pipeline
by Matt Carothers, Principal Security Architect, Cox Communications

I will discuss integrating Moloch into an automated threat intel investigation pipeline with MISP.

Speaker Bio
Matt enjoys sunsets, long hikes in the mountains and intrusion detection. After studying Computer Science at the University of Oklahoma, he accepted a position with Cox Communications in 2001 under the leadership of renowned thought leader and virtuoso bass player William “Wild Bill” Beesley, who asked to be credited in this bio. There, Matt formed Cox’s abuse department, which he led for several years, and today he serves as Cox’s Principal Security Architect.



Using WISE
by Andy Wick, Sr Princ Architect, Oath

We will review how to use WISE and provide real-life examples of features added since the last MolochON.

Speaker Bio
Andy is the creator of Moloch and former Architect of AIM. He joined the security team in 2011 and hasn’t looked back.



Moloch Deployments
by Srinath Mantripragada, Linux Integrator, SecureOps

I will present a Moloch deployment with 20+ different Moloch nodes, covering a range of small, medium, and large deployments: from full hardware with dedicated capture cards, to virtualized points of presence, to AWS with a transit network. All nodes run Moloch, Suricata, and Bro.

Speaker Bio
Srinath has worked as a SysAdmin and related positions for most of his career. He currently works as an Integrator/SysAdmin/DevOps for SecureOps, a Security Services company in Montreal, Canada.



Elasticsearch for Time-series Data at Scale
by Andrew Selden, Solution Architect, Elastic

Elasticsearch has evolved beyond search and logging to be a first-class, time-series metric store. This talk will explore how to achieve 1 million metrics/second on a relatively modest cluster. We will take a look at issues such as data modeling, debugging, tuning, sharding, rollups and more.

Speaker Bio
Andrew Selden has been running Elasticsearch at scale since 2011. He previously led the search, NLP, and data engineering teams at Meltwater News and later developed streaming analytics solutions for BlueKai’s advertising platform (acquired by Oracle). He started his tenure at Elastic as a core engineer and for the last two years has been helping customers architect and scale.


After the conference, enjoy a complimentary happy hour, sponsored by Arista.

Hope to see you there!

Innovating on Authentication Standards

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/175238642656

yahoodevelopers:

By George Fletcher and Lovlesh Chhabra

When Yahoo and AOL came together a year ago as a part of the new Verizon subsidiary Oath,  we took on the challenge of unifying their identity platforms based on current identity standards. Identity standards have been a critical part of the Internet ecosystem over the last 20+ years. From single-sign-on and identity federation with SAML; to the newer identity protocols including OpenID Connect, OAuth2, JOSE, and SCIM (to name a few); to the explorations of “self-sovereign identity” based on distributed ledger technologies; standards have played a key role in providing a secure identity layer for the Internet.

As we navigated this journey, we ran across a number of different use cases where there was either no standard or no best practice available for our varied and complicated needs. Instead of creating entirely new standards to solve our problems, we found it more productive to use existing standards in new ways.

One such use case arose when we realized that we needed to migrate the identity stored in mobile apps from the legacy identity provider to the new Oath identity platform. For most browser (mobile or desktop) use cases, this doesn’t present a huge problem; some DNS magic and HTTP redirects and the user will sign in at the correct endpoint. Also it’s expected for users accessing services via their browser to have to sign in now and then.

However, for mobile applications it’s a completely different story. The normal user pattern for mobile apps is for the user to sign in (via OpenID Connect or OAuth2) and for the app to then be issued long-lived tokens (well, the refresh token is long lived) and the user never has to sign in again on the device (entering a password on the device is NOT a good experience for the user).

So the issue is, how do we allow the mobile app to move from one identity provider to another without the user having to re-enter their credentials? The solution came from researching what standards currently exist that might address this use case (see figure “Standards Landscape” below) and finding the OAuth 2.0 Token Exchange draft specification (https://tools.ietf.org/html/draft-ietf-oauth-token-exchange-13).

[Figure: Standards Landscape]

The Token Exchange draft allows for a given token to be exchanged for new tokens in a different domain. This could be used to manage the “audience” of a token that needs to be passed among a set of microservices to accomplish a task on behalf of the user, as an example. For the use case at hand, we created a specific implementation of the Token Exchange specification (a profile) to allow the refresh token from the originating Identity Provider (IDP) to be exchanged for new tokens from the consolidated IDP. By profiling this draft standard we were able to create a much better user experience for our consumers and do so without inventing proprietary mechanisms.
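For a concrete sense of what such an exchange looks like, here is a sketch of building the form-encoded request body a client would POST to the token endpoint, using the parameter names and token-type URIs from the Token Exchange draft. The token value and the choice to request a refresh token are placeholders for illustration, not Oath’s actual profile:

```javascript
// Sketch of an OAuth 2.0 Token Exchange request body per
// draft-ietf-oauth-token-exchange. Values are placeholders.
function buildTokenExchangeBody(refreshToken) {
  const params = new URLSearchParams();
  // Grant type URI defined by the Token Exchange draft
  params.set('grant_type', 'urn:ietf:params:oauth:grant-type:token-exchange');
  // The token being traded in: here, a refresh token from the legacy IDP
  params.set('subject_token', refreshToken);
  params.set('subject_token_type', 'urn:ietf:params:oauth:token-type:refresh_token');
  // Ask the consolidated IDP for a refresh token in its own domain
  params.set('requested_token_type', 'urn:ietf:params:oauth:token-type:refresh_token');
  return params.toString();
}
```

A profile of the draft, like the one described above, pins down which of the spec’s optional parameters (audience, scope, actor tokens) are required and what the issued tokens mean.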

During this identity technical consolidation we also had to address how to support sharing signed-in users across mobile applications written by the same company (technically, signed with the same vendor signing key). Specifically, how can a user signed in to Yahoo Mail avoid having to sign in again when they start using the Yahoo Sports app? The current best practice for this is captured in OAuth 2.0 for Native Apps (RFC 8252). However, the flow described by this specification requires that the mobile device’s system browser hold the user’s authenticated sessions. This has some drawbacks, such as users clearing their cookies, or using private browsing mode, or, even worse, requiring the IDPs to support multiple users signed in at the same time (not something most IDPs support).

While RFC 8252 provides a mechanism for single sign-on (SSO) across mobile apps provided by any vendor, we wanted a better solution for apps provided by Oath. So we looked at how we could enable mobile apps signed by the same vendor to share signed-in state in a more “back channel” way. One important fact is that mobile apps cryptographically signed by the same vendor can securely share data via the device keychain on iOS and the Account Manager on Android.

Using this as a starting point we defined a new OAuth2 scope, device_sso, whose purpose is to require the Authorization Server (AS) to return a unique “secret” assigned to that specific device. The precedent for using a scope to define specification behavior is OpenID Connect itself, which defines the “openid” scope as the trigger for the OpenID Provider (an OAuth2 AS) to implement the OpenID Connect specification. The device_secret is returned to the mobile app when the OAuth2 code is exchanged for tokens, and is then stored by the mobile app in the device keychain along with the id_token identifying the user who signed in.

At this point, a second mobile app signed by the same vendor can look in the keychain and find the id_token, ask the user if they want to use that identity with the new app, and then use a profile of the token exchange spec to obtain tokens for the second mobile app based on the id_token and the device_secret. The full sequence of steps looks like this:
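The post does not publish the wire format of this profile, but one plausible mapping (an assumption for illustration only) is for the second app to present the id_token as the subject token and the device_secret as an actor token in a Token Exchange request:

```javascript
// HYPOTHETICAL sketch of the second app's exchange. Mapping the id_token
// to subject_token and the device_secret to actor_token is an assumption;
// Oath's actual profile may differ.
function buildDeviceSsoExchangeBody(idToken, deviceSecret) {
  const params = new URLSearchParams();
  params.set('grant_type', 'urn:ietf:params:oauth:grant-type:token-exchange');
  // Identity found in the shared keychain, confirmed by the user
  params.set('subject_token', idToken);
  params.set('subject_token_type', 'urn:ietf:params:oauth:token-type:id_token');
  // Proof the request comes from an app on the same device
  params.set('actor_token', deviceSecret);
  // No token-type URI is registered for a device secret; a profile would
  // define its own (the URI below is a made-up placeholder).
  params.set('actor_token_type', 'urn:example:params:oauth:token-type:device_secret');
  return params.toString();
}
```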

[Figure: full sequence of the device SSO token exchange steps]

As a result of our identity consolidation work over the past year, we derived a set of principles identity architects should find useful for addressing use cases that don’t have a known specification or best practice. Moreover, these are applicable in many contexts outside of identity standards:

  1. Spend time researching the existing set of standards and draft standards. As the diagram shows, there are a lot of standards out there already, so understanding them is critical.
  2. Don’t invent something new if you can just profile or combine already existing specifications.
  3. Make sure you understand the spirit and intent of the existing specifications.
  4. For those cases where an extension is required, make sure to extend the specification based on its spirit and intent.
  5. Ask the community for clarity regarding any existing specification or draft.
  6. Contribute back to the community via blog posts, best practice documents, or a new specification.

As we learned during the consolidation of our Yahoo and AOL identity platforms, and as demonstrated in our examples, there is no need to resort to proprietary solutions for use cases that at first look do not appear to have a standards-based solution. Instead, it’s much better to follow these principles, avoid the NIH (not-invented-here) syndrome, and invest the time to build solutions on standards.

A Peek Behind the Mail Curtain

Post Syndicated from marcelatoath original https://yahooeng.tumblr.com/post/174023151641

USE IMAP TO ACCESS SOME UNIQUE FEATURES

By Libby Lin, Principal Product Manager

Well, we actually won’t show you how we create the magic in our big OATH consumer mail factory. But we did want to share how interested developers can leverage some of the unique features we offer to our Yahoo and AOL Mail customers.

To drive experiences like our travel and shopping smart views or message threading, we tag qualifying mails with something we call DECOS and THREADID. While we won’t go into detail about how exactly we use them internally, we wanted to share how they can be accessed through IMAP.

So let’s just look at a sample IMAP command chain. We’ll just assume that you are familiar with the IMAP protocol at this point and you know how to properly talk to an IMAP server.

So here’s how you would retrieve DECO and THREADIDs for specific messages:

1. CONNECT

   openssl s_client -crlf -connect imap.mail.yahoo.com:993

2. LOGIN

   a login username password

   a OK LOGIN completed

3. LIST FOLDERS

   a list "" "*"

   * LIST (\Junk \HasNoChildren) "/" "Bulk Mail"

   * LIST (\Archive \HasNoChildren) "/" "Archive"

   * LIST (\Drafts \HasNoChildren) "/" "Draft"

   * LIST (\HasNoChildren) "/" "Inbox"

   * LIST (\HasNoChildren) "/" "Notes"

   * LIST (\Sent \HasNoChildren) "/" "Sent"

   * LIST (\Trash \HasChildren) "/" "Trash"

   * LIST (\HasNoChildren) "/" "Trash/l2"

   * LIST (\HasChildren) "/" "test level 1"

   * LIST (\HasNoChildren) "/" "test level 1/nestedfolder"

   * LIST (\HasNoChildren) "/" "test level 1/test level 2"

   * LIST (\HasNoChildren) "/" "&T2BZfXso-"

   * LIST (\HasNoChildren) "/" "&gQKAqk7WWr12hA-"

   a OK LIST completed

4. SELECT FOLDER

   a select inbox

   * 94 EXISTS

   * 0 RECENT

   * OK [UIDVALIDITY 1453335194] UIDs valid

   * OK [UIDNEXT 40213] Predicted next UID

   * FLAGS (\Answered \Deleted \Draft \Flagged \Seen $Forwarded $Junk $NotJunk)

   * OK [PERMANENTFLAGS (\Answered \Deleted \Draft \Flagged \Seen $Forwarded $Junk $NotJunk)] Permanent flags

   * OK [HIGHESTMODSEQ 205]

   a OK [READ-WRITE] SELECT completed; now in selected state

5. SEARCH FOR UID

   a uid search 1:*

   * SEARCH 1 2 3 4 11 12 14 23 24 75 76 77 78 114 120 121 124 128 129 130 132 133 134 135 136 137 138 40139 40140 40141 40142 40143 40144 40145 40146 40147 40148     40149 40150 40151 40152 40153 40154 40155 40156 40157 40158 40159 40160 40161 40162 40163 40164 40165 40166 40167 40168 40172 40173 40174 40175 40176     40177 40178 40179 40182 40183 40184 40185 40186 40187 40188 40190 40191 40192 40193 40194 40195 40196 40197 40198 40199 40200 40201 40202 40203 40204     40205 40206 40207 40208 40209 40211 40212

   a OK UID SEARCH completed

6. FETCH DECOS BASED ON UID

   a uid fetch 40212 (X-MSG-DECOS X-MSG-ID X-MSG-THREADID)

   * 94 FETCH (UID 40212 X-MSG-THREADID "108" X-MSG-ID "ACfIowseFt7xWtj0og0L2G0T1wM" X-MSG-DECOS ("FTI" "F1" "EML"))

   a OK UID FETCH completed
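In practice a client would issue that UID FETCH through an IMAP library and then pick the extension items out of the untagged FETCH response. A minimal sketch of that parsing step (the function name is illustrative; on the wire the quotes are plain ASCII quotes):

```javascript
// Pull X-MSG-THREADID, X-MSG-ID, and X-MSG-DECOS out of an untagged
// FETCH response line like the one shown in step 6 above.
function parseDecoFetch(line) {
  const threadId = /X-MSG-THREADID "([^"]*)"/.exec(line);
  const msgId = /X-MSG-ID "([^"]*)"/.exec(line);
  const decos = /X-MSG-DECOS \(([^)]*)\)/.exec(line);
  return {
    threadId: threadId && threadId[1],
    msgId: msgId && msgId[1],
    // DECOS arrive as a parenthesized list of quoted atoms
    decos: decos ? decos[1].split(' ').map(d => d.replace(/"/g, '')) : [],
  };
}
```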

Yahoo! Fined 35 Million USD For Late Disclosure Of Hack

Post Syndicated from Darknet original https://www.darknet.org.uk/2018/05/yahoo-fined-35-million-usd-for-late-disclosure-of-hack/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

Yahoo! Fined 35 Million USD For Late Disclosure Of Hack

Ah, Yahoo! in trouble again. This time the news is that Yahoo! has been fined 35 million USD by the SEC for its two-year delayed disclosure of the massive hack. We actually reported on the incident in 2016 when it became public – Massive Yahoo Hack – 500 Million Accounts Compromised.

Yahoo! has been having a rocky time for quite a few years now and just recently sold Flickr to SmugMug for an undisclosed amount. I hope that at least helps pay off some of the fine.

Read the rest of Yahoo! Fined 35 Million USD For Late Disclosure Of Hack now! Only available at Darknet.

Achieving Major Stability and Performance Improvements in Yahoo Mail with a Novel Redux Architecture

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/173062946866

yahoodevelopers:

By Mohit Goenka, Gnanavel Shanmugam, and Lance Welsh

At Yahoo Mail, we’re constantly striving to upgrade our product experience. We do this not only by adding new features based on our members’ feedback, but also by providing the best technical solutions to power the most engaging experiences. As such, we’ve recently introduced a number of novel and unique revisions to the way in which we use Redux that have resulted in significant stability and performance improvements. Developers may find our methods useful in achieving similar results in their apps.

Improvements to product metrics

Last year Yahoo Mail implemented a brand new architecture using Redux. Since then, we have transformed the overall architecture to reduce latencies in various operations, reduce JavaScript exceptions, and better synchronize state. As a result, the product is much faster and more stable.

Stability improvements:

  • when checking for new emails – 20%
  • when reading emails – 30%
  • when sending emails – 20%

Performance improvements:

  • 10% improvement in page load performance
  • 40% improvement in frame rendering time

We have also reduced API calls by approximately 20%.

How we use Redux in Yahoo Mail

Redux architecture is reliant on one large store that represents the application state. In a Redux cycle, action creators dispatch actions to change the state of the store. React Components then respond to those state changes. We’ve made some modifications on top of this architecture that are atypical in the React-Redux community.

For instance, when fetching data over the network, the traditional methodology is to use Thunk middleware. Yahoo Mail fetches data over the network from our API. Thunks would create an unnecessary and undesirable dependency between the action creators and our API. If and when the API changes, the action creators must then also change. To keep these concerns separate, we dispatch the action payload from the action creator and store it in the Redux state for later processing by “action syncers”. Action syncers use the payload information from the store to make requests to the API and process responses. In other words, the action syncers form an API layer by interacting with the store. An additional benefit to keeping the concerns separate is that the API layer can change as the backend changes, thereby preventing such changes from bubbling back up into the action creators and components. This also allowed us to optimize the API calls by batching, deduping, and processing the requests only when the network is available. We applied similar strategies for handling other side effects like route handling and instrumentation. Overall, action syncers helped us to reduce our API calls by ~20% and bring down API errors by 20-30%.
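A stripped-down sketch of the action-syncer idea follows. The tiny store is a stand-in for Redux, and the names (QUEUE_API_REQUEST, apiClient) are illustrative, not Yahoo Mail’s actual code:

```javascript
// Action creators only write payloads into the store; a syncer
// subscribed to the store performs the actual API interaction.
function createStore(reducer, initialState) {
  let state = initialState;
  const listeners = [];
  return {
    getState: () => state,
    dispatch(action) {
      state = reducer(state, action);
      listeners.forEach(l => l());
    },
    subscribe(l) { listeners.push(l); },
  };
}

// Reducer keeps a queue of pending API requests in the state.
function reducer(state, action) {
  switch (action.type) {
    case 'QUEUE_API_REQUEST':
      return { ...state, pending: [...state.pending, action.payload] };
    case 'API_REQUESTS_SENT':
      return { ...state, pending: [] };
    default:
      return state;
  }
}

// The syncer: watches the store and flushes pending requests to the API.
// Batching (and, in a real system, deduping and offline checks) live here,
// not in the action creators.
function attachSyncer(store, apiClient) {
  store.subscribe(() => {
    const { pending } = store.getState();
    if (pending.length === 0) return;
    apiClient.send(pending); // one batched call for all queued requests
    store.dispatch({ type: 'API_REQUESTS_SENT' });
  });
}
```

The key property is that if the API changes, only `apiClient` and the syncer change; action creators and components are untouched.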

Another change to the normal Redux architecture was made to avoid unnecessary props. The React-Redux community has learned to avoid passing unnecessary props from high-level components through multiple layers down to lower-level components (prop drilling) for rendering. We have introduced action enhancers middleware to avoid passing additional unnecessary props that are purely used when dispatching actions. Action enhancers add data to the action payload so that data does not have to come from the component when dispatching the action. This spares the component from having to receive that data through props and has improved frame rendering by ~40%. The use of action enhancers also avoids writing utility functions to add commonly-used data to each action from action creators.
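The enhancer idea can be sketched as a dispatch wrapper that merges shared context into every action payload; the names (getContext, mailboxId) are illustrative:

```javascript
// An "action enhancer" as a dispatch wrapper: data that every action
// needs is merged in here, so components don't have to receive it as
// props just to put it on the action.
function withActionEnhancer(dispatch, getContext) {
  return function enhancedDispatch(action) {
    return dispatch({
      ...action,
      // Action-specific payload fields win over the shared context
      payload: { ...getContext(), ...action.payload },
    });
  };
}
```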

[Figure: Redux action cycle with action enhancers and action syncers]

In our new architecture, the store reducers accept the dispatched action via action enhancers to update the state. The store then updates the UI, completing the action cycle. Action syncers then initiate the call to the backend APIs to synchronize local changes.

Conclusion

Our novel use of Redux in Yahoo Mail has led to significant user-facing benefits through a more performant application. It has also reduced development cycles for new features due to its simplified architecture. We’re excited to share our work with the community and would love to hear from anyone interested in learning more.

Secure Images

Post Syndicated from marcelatoath original https://yahooeng.tumblr.com/post/172068649246

oath-postmaster:

By Marcel Becker

The mail team at OATH is busy integrating Yahoo and AOL technology to deliver an even better experience across all our consumer mail products. While privacy and security are top priorities for us, we also want to improve the experience and remove unnecessary clutter across all of our products.

Starting this week we will be serving images in mails via our own secure proxy servers. This will not only increase speed and security in our own mail products and reduce the risk of phishing and other scams, but it will also mean that our users don’t have to fiddle around with those “enable images” settings. Messages and inline images will now just show up as originally intended.

We are aware that commercial mail senders are relying on images (so-called pixels) to track delivery and open rates. Our proxy solution will continue to support most of these cases and ensure that true mail opens are recorded.

For senders serving dynamic content based on the recipient’s location (leveraging standard IP-based browser and app capabilities) we recommend falling back on other tools and technologies which do not rely on IP-based targeting.

All of our consumer mail applications (Yahoo and AOL) will benefit from this change. This includes our desktop products as well as our mobile applications across iOS and Android.

If you have any feedback or want to discuss those changes with us personally, just send us a note to [email protected].


Success at Apache: A Newbie’s Narrative

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/170536010891

yahoodevelopers:

Kuhu Shukla (bottom center) and team at the 2017 DataWorks Summit


By Kuhu Shukla

This post first appeared here on the Apache Software Foundation blog as part of ASF’s “Success at Apache” monthly blog series.

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in and the new release that we tested out internally on our many thousand strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo – which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year – has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo’s jobs from Apache MapReduce to Apache Tez, a then-new DAG-based execution engine.

Oath grid infrastructure is through and through driven by Apache technologies, be it storage through HDFS, resource management through YARN, job execution frameworks with Tez, or user interface engines such as Hive, Hue, Pig, Sqoop, Spark, and Storm. Our grid solution is specifically tailored to Oath’s business-critical data pipeline needs using the polymorphic technologies hosted, developed, and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link to An Introduction to Apache Tez. I watched it carefully, trying to keep up with all the questions I had, and recognized a few names from my academic readings of YARN ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and setting up a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions and being ever more careful with whitespace in my patches. One thing was clear – Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community, nearly 80-90% of the Yahoo jobs were running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – “Just needs an API fix,”  I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie’s rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open part of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, down to which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates, extensive knowledge transfers took place.

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing lifecycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project’s inception. With several thousands of jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were tracked now through an umbrella JIRA TEZ-3334 and the to-do list was long. I picked a few JIRAs and as I started reading through I realized, this is all new code I get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the whiteboards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fixing big and small issues with this evolving feature.

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community and cross Apache product community interaction made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

About the Author:

Kuhu Shukla is a software engineer at Oath and did her Master’s in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself, she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on “Tez Shuffle Handler: Shuffling At Scale With Apache Hadoop”. Prior to that she worked on Juniper Networks’ router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.

Sublist3r – Fast Python Subdomain Enumeration Tool

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/12/sublist3r-fast-python-subdomain-enumeration-tool/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

Sublist3r – Fast Python Subdomain Enumeration Tool

Sublist3r is a Python-based tool designed to enumerate subdomains of websites using OSINT. It helps penetration testers and bug hunters collect and gather subdomains for the domain they are targeting.

It also integrates with subbrute for subdomain brute-forcing with word lists.

Features of Sublist3r Subdomain Enumeration Tool

It enumerates subdomains using many search engines such as:

  • Google
  • Yahoo
  • Bing
  • Baidu
  • Ask

The tool also enumerates subdomains using:

  • Netcraft
  • Virustotal
  • ThreatCrowd
  • DNSdumpster
  • ReverseDNS

Requirements of Sublist3r Subdomain Search

It currently supports Python 2 and Python 3.

Read the rest of Sublist3r – Fast Python Subdomain Enumeration Tool now! Only available at Darknet.

How to Make Your Web App More Reliable and Performant Using webpack: a Yahoo Mail Case Study

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/168508200981

yahoodevelopers:


By Murali Krishna Bachhu, Anurag Damle, and Utkarsh Shrivastava

As engineers on the Yahoo Mail team at Oath, we pride ourselves on the things that matter most to developers: faster development cycles, more reliability, and better performance. Users don’t necessarily see these elements, but they certainly feel the difference they make when significant improvements are made. Recently, we were able to upgrade all three of these areas at scale by adopting webpack® as Yahoo Mail’s underlying module bundler, and you can do the same for your web application.

What is webpack?

webpack is an open source module bundler for modern JavaScript applications. When webpack processes your application, it recursively builds a dependency graph that includes every module your application needs. Then it packages all of those modules into a small number of bundles, often only one, to be loaded by the browser.

webpack became our module bundler of choice not only because it supports on-demand loading and multiple-bundle generation with relatively low runtime overhead, but also because it is better suited to web platforms and Node.js apps and has great community support.


Comparison of webpack to other open source bundlers


How did we integrate webpack?

Like any developer integrating a new module bundler, we started integrating webpack into Yahoo Mail by looking at its basic config file. We explored the available default and third-party webpack plugins and picked those most suitable for our application. If we didn’t find a plugin that suited a specific need, we wrote one ourselves (e.g., we wrote a plugin to execute Atomic CSS scripts in the latest Yahoo Mail experience in order to decrease our overall CSS payload**).

During the development process for Yahoo Mail, we needed a way to make sure webpack would continuously run in the background. To make this happen, we decided to use the task runner Grunt. Not only does Grunt keep the connection to webpack alive, but it also gives us the ability to pass different parameters to the webpack config file based on the given environment. Some examples of these parameters are source map options, enabling HMR, and uglification.

Before deploying to production, we wanted to optimize the JavaScript bundles for size to make the Yahoo Mail experience faster. webpack provides good default support for this with the UglifyJS plugin. The default options are conservative, but webpack gives us the ability to configure them. Once we modified the options to our specifications, we saved approximately 10KB.


Code snippet showing the configuration options for the UglifyJS plugin
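For reference, tightened UglifyJS options of this kind might look like the sketch below; these are standard UglifyJS settings, but the exact values Yahoo Mail used are not shown here:

```javascript
// Hypothetical UglifyJS options, stricter than the conservative defaults.
const uglifyOptions = {
  compress: {
    dead_code: true,     // drop unreachable code
    drop_console: true,  // remove console.* calls from production bundles
    passes: 2,           // run compress twice for extra savings
  },
  mangle: true,          // shorten local variable and function names
  output: {
    comments: false,     // strip comments from the emitted bundle
  },
};

module.exports = uglifyOptions;
```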


Faster development cycles for developers

While developing a new feature, engineers ideally want to see their code changes reflected on their web app instantaneously. This allows them to maintain their train of thought and eventually results in more productivity. Before we implemented webpack, it took us around 30 seconds to 1 minute for changes to reflect on our Yahoo Mail development environment. webpack helped us reduce the wait time to 5 seconds.

More reliability

Consumers love a reliable product, where all the features work seamlessly every time. Before we began using webpack, we were generating JavaScript bundles on demand or at run time, which meant the product was more prone to exceptions or failures while fetching the JavaScript bundles. With webpack, we now generate all the bundles at build time, which means that all the bundles are available whenever consumers access Yahoo Mail. This results in significantly fewer exceptions and failures and a better experience overall.

Better Performance

We were able to attain a significant reduction of payload after adopting webpack.

  1. Reduction of about 75 KB in gzipped JavaScript payload
  2. 50% reduction in server-side render time
  3. 10% improvement in Yahoo Mail’s launch performance metrics, as measured by render time above the fold (e.g., the time to load the contents of an email)

Below are some charts that demonstrate the payload size of Yahoo Mail before and after implementing webpack.


Payload before using webpack (JavaScript Size = 741.41KB)



Payload after switching to webpack (JavaScript size = 669.08KB)



Conclusion

Shifting to webpack has resulted in significant improvements. We saw a common build process go from 30 seconds to 5 seconds, large JavaScript bundle size reductions, and a halving in server-side rendering time. In addition to these benefits, our engineers have found the community support for webpack to have been impressive as well. webpack has made the development of Yahoo Mail more efficient and enhanced the product for users. We believe you can use it to achieve similar results for your web application as well.

**Optimized CSS generation with Atomizer

Before we implemented webpack into the development of Yahoo Mail, we looked into how we could decrease our CSS payload. To achieve this, we developed an in-house solution for writing modular and scoped CSS in React. Our solution is similar to the Atomizer library, and our CSS is written in JavaScript like the example below:


Sample snippet of CSS written with Atomizer


Every React component creates its own styles.js file with required style definitions. React-Atomic-CSS converts these files into unique class definitions. Our total CSS payload after implementing our solution equaled all the unique style definitions in our code, or only 83KB (21KB gzipped).
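The deduplication idea can be illustrated with a toy sketch (this is not the actual React-Atomic-CSS implementation; the function name and class format are invented): each unique declaration becomes one short class, shared by every component that uses it.

```javascript
// Toy atomic-CSS generator: one class per unique style declaration.
function atomize(styleSheets) {
  const classByDecl = new Map(); // declaration -> generated class name
  let counter = 0;
  for (const styles of styleSheets) {
    for (const decl of styles) {
      if (!classByDecl.has(decl)) {
        classByDecl.set(decl, `a${counter++}`);
      }
    }
  }
  // Emit exactly one rule per unique declaration.
  return [...classByDecl.entries()]
    .map(([decl, cls]) => `.${cls}{${decl}}`)
    .join('');
}

// Two components sharing 'color:red' contribute only one rule for it,
// so the total CSS payload equals the set of unique declarations.
const css = atomize([
  ['color:red', 'margin:0'],   // styles.js of component A
  ['color:red', 'padding:4px'] // styles.js of component B
]);
// css === '.a0{color:red}.a1{margin:0}.a2{padding:4px}'
```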

During our migration to webpack, we created a custom plugin and loader to parse these files and extract the unique style definitions from all of our CSS files. Since this process is tied to bundling, only CSS files that are part of the dependency chain are included in the final CSS.

Me on the Equifax Breach

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/me_on_the_equif.html

Testimony and Statement for the Record of Bruce Schneier
Fellow and Lecturer, Belfer Center for Science and International Affairs, Harvard Kennedy School
Fellow, Berkman Center for Internet and Society at Harvard Law School

Hearing on “Securing Consumers’ Credit Data in the Age of Digital Commerce”

Before the

Subcommittee on Digital Commerce and Consumer Protection
Committee on Energy and Commerce
United States House of Representatives

1 November 2017
2125 Rayburn House Office Building
Washington, DC 20515

Mister Chairman and Members of the Committee, thank you for the opportunity to testify today concerning the security of credit data. My name is Bruce Schneier, and I am a security technologist. For over 30 years I have studied the technologies of security and privacy. I have authored 13 books on these subjects, including Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World (Norton, 2015). My popular newsletter Crypto-Gram and my blog Schneier on Security are read by over 250,000 people.

Additionally, I am a Fellow and Lecturer at the Harvard Kennedy School of Government –where I teach Internet security policy — and a Fellow at the Berkman-Klein Center for Internet and Society at Harvard Law School. I am a board member of the Electronic Frontier Foundation, AccessNow, and the Tor Project; and an advisory board member of Electronic Privacy Information Center and VerifiedVoting.org. I am also a special advisor to IBM Security and the Chief Technology Officer of IBM Resilient.

I am here representing none of those organizations, and speak only for myself based on my own expertise and experience.

I have eleven main points:

1. The Equifax breach was a serious security breach that puts millions of Americans at risk.

Equifax reported that 145.5 million US customers, about 44% of the population, were impacted by the breach. (That’s the original 143 million plus the additional 2.5 million disclosed a month later.) The attackers got access to full names, Social Security numbers, birth dates, addresses, and driver’s license numbers.

This is exactly the sort of information criminals can use to impersonate victims to banks, credit card companies, insurance companies, cell phone companies and other businesses vulnerable to fraud. As a result, all 145.5 million US victims are at greater risk of identity theft, and will remain at risk for years to come. And those who suffer identity theft will have problems for months, if not years, as they work to clean up their names and credit ratings.

2. Equifax was solely at fault.

This was not a sophisticated attack. The security breach was a result of a vulnerability in the software for their websites: a program called Apache Struts. The particular vulnerability was fixed by Apache in a security patch that was made available on March 6, 2017. This was not a minor vulnerability; the computer press at the time called it “critical.” Within days, it was being used by attackers to break into web servers. Equifax was notified by Apache, US CERT, and the Department of Homeland Security about the vulnerability, and was provided instructions to make the fix.

Two months later, Equifax had still failed to patch its systems. It eventually got around to it on July 29. The attackers used the vulnerability to access the company’s databases and steal consumer information on May 13, over two months after Equifax should have patched the vulnerability.

The company’s incident response after the breach was similarly damaging. It waited nearly six weeks before informing victims that their personal information had been stolen and that they were at increased risk of identity theft. Equifax opened a website to aid customers, but the poor security around it — the site was at a domain separate from the Equifax domain — invited fraudulent imitators and even more damage to victims. At one point, official Equifax communications even directed people to one of those fraudulent sites.

This is not the first time Equifax failed to take computer security seriously. It confessed to another data leak in January 2017. In May 2016, one of its websites was hacked, resulting in 430,000 people having their personal information stolen. Also in 2016, a security researcher found and reported a basic security vulnerability in its main website. And in 2014, the company reported yet another security breach of consumer information. There are more.

3. There are thousands of data brokers with similarly intimate information, similarly at risk.

Equifax is more than a credit reporting agency. It’s a data broker. It collects information about all of us, analyzes it all, and then sells those insights. It might be one of the biggest, but there are 2,500 to 4,000 other data brokers that are collecting, storing, and selling information about us — almost all of them companies you’ve never heard of and have no business relationship with.

The breadth and depth of information that data brokers have is astonishing. Data brokers collect and store billions of data elements covering nearly every US consumer. Just one of the data brokers studied holds information on more than 1.4 billion consumer transactions and 700 billion data elements, and another adds more than 3 billion new data points to its database each month.

These brokers collect demographic information: names, addresses, telephone numbers, e-mail addresses, gender, age, marital status, presence and ages of children in household, education level, profession, income level, political affiliation, cars driven, and information about homes and other property. They collect lists of things we’ve purchased, when we’ve purchased them, and how we paid for them. They keep track of deaths, divorces, and diseases in our families. They collect everything about what we do on the Internet.

4. These data brokers deliberately hide their actions, and make it difficult for consumers to learn about or control their data.

If there were a dozen people who stood behind us and took notes of everything we purchased, read, searched for, or said, we would be alarmed at the privacy invasion. But because these companies operate in secret, inside our browsers and financial transactions, we don’t see them and we don’t know they’re there.

Regarding Equifax, few consumers have any idea what the company knows about them, whom it sells personal data to, or why. If anyone knows about the company at all, it’s as a credit bureau, not as a data broker. Its website lists 57 different offerings for business: products for industries like automotive, education, health care, insurance, and restaurants.

In general, options to “opt out” don’t work with data brokers. The process is confusing, and it doesn’t result in your data being deleted. Data brokers will still collect data about consumers who opt out. It will still be in those companies’ databases, and will still be vulnerable. It just won’t be included individually when they sell data to their customers.

5. The existing regulatory structure is inadequate.

Right now, there is no way for consumers to protect themselves. Their data has been harvested and analyzed by these companies without their knowledge or consent. They cannot improve the security of their personal data, and have no control over how vulnerable it is. They only learn about data breaches when the companies announce them — which can be months after the breaches occur — and at that point the onus is on them to obtain credit monitoring services or credit freezes. And even those only protect consumers from some of the harms, and only those suffered after Equifax admitted to the breach.

Right now, the press is reporting “dozens” of lawsuits against Equifax from shareholders, consumers, and banks. Massachusetts has sued Equifax for violating state consumer protection and privacy laws. Other states may follow suit.

If any of these plaintiffs wins in court, it will be a rare victory for victims of privacy breaches against the companies that have our personal information. Current law is too narrowly focused on people who have suffered financial losses directly traceable to a specific breach. Proving this is difficult. If you are the victim of identity theft in the next month, is it because of Equifax or does the blame belong to another of the thousands of companies who have your personal data? As long as one can’t prove it one way or the other, data brokers remain blameless and liability free.

Additionally, much of this market in our personal data falls outside the protections of the Fair Credit Reporting Act. And in order for the Federal Trade Commission to levy a fine against Equifax, it needs to have a consent order and then a subsequent violation. Any fines will be limited to credit information, which is a small portion of the enormous amount of information these companies know about us. In reality, this is not an effective enforcement regime.

Although the FTC is investigating Equifax, it is unclear if it has a viable case.

6. The market cannot fix this because we are not the customers of data brokers.

The customers of these companies are people and organizations who want to buy information: banks looking to lend you money, landlords deciding whether to rent you an apartment, employers deciding whether to hire you, companies trying to figure out whether you’d be a profitable customer — everyone who wants to sell you something, even governments.

Markets work because buyers choose from a choice of sellers, and sellers compete for buyers. None of us are Equifax’s customers. None of us are the customers of any of these data brokers. We can’t refuse to do business with the companies. We can’t remove our data from their databases. With few limited exceptions, we can’t even see what data these companies have about us or correct any mistakes.

We are the product that these companies sell to their customers: those who want to use our personal information to understand us, categorize us, make decisions about us, and persuade us.

Worse, the financial markets reward bad security. Given the choice between increasing their cybersecurity budget by 5% or saving that money and taking the chance, a rational CEO chooses to save the money. Wall Street rewards those whose balance sheets look good, not those who are secure. And if senior management gets unlucky and a public breach happens, they end up okay. Equifax’s CEO didn’t get his $5.2 million severance pay, but he did keep his $18.4 million pension. Any company that spends more on security than absolutely necessary is immediately penalized by shareholders when its profits decrease.

Even the negative PR that Equifax is currently suffering will fade. Unless we expect data brokers to put public interest ahead of profits, the security of this industry will never improve without government regulation.

7. We need effective regulation of data brokers.

In 2014, the Federal Trade Commission recommended that Congress require data brokers be more transparent and give consumers more control over their personal information. That report contains good suggestions on how to regulate this industry.

First, Congress should help plaintiffs in data breach cases by authorizing and funding empirical research on the harm individuals receive from these breaches.

Specifically, Congress should move forward with legislative proposals that establish a nationwide “credit freeze” — which is better described as changing the default for disclosure from opt-out to opt-in — and free lifetime credit monitoring services. By this I do not mean giving customers free credit-freeze options, a proposal by Senators Warren and Schatz, but rather that the default should be a credit freeze.

The credit card industry routinely notifies consumers when there are suspicious charges. It is obvious that credit reporting agencies should have a similar obligation to notify consumers when there is suspicious activity concerning their credit report.

On the technology side, more could be done to limit the amount of personal data companies are allowed to collect. Increasingly, privacy safeguards impose “data minimization” requirements to ensure that only the data that is actually needed is collected. On the other hand, Congress should not create a new national identifier to replace Social Security numbers. That would make the system of identification even more brittle. Better is to reduce dependence on systems of identification and to create contextual identification where necessary.

Finally, Congress needs to give the Federal Trade Commission the authority to set minimum security standards for data brokers and to give consumers more control over their personal information. This is essential as long as consumers are these companies’ products and not their customers.

8. Resist complaints from the industry that this is “too hard.”

The credit bureaus and data brokers, and their lobbyists and trade-association representatives, will claim that many of these measures are too hard. They’re not telling you the truth.

Take one example: credit freezes. This is an effective security measure that protects consumers, but the process of getting one and of temporarily unfreezing credit is made deliberately onerous by the credit bureaus. Why isn’t there a smartphone app that alerts me when someone wants to access my credit rating, and lets me freeze and unfreeze my credit at the touch of the screen? Too hard? Today, you can have an app on your phone that does something similar if you try to log into a computer network, or if someone tries to use your credit card at a physical location different from where you are.

Moreover, any credit bureau or data broker operating in Europe is already obligated to follow the more rigorous EU privacy laws. The EU General Data Protection Regulation will come into force in May 2018, requiring even more security and privacy controls for companies collecting and storing the personal data of EU citizens. Those companies have already demonstrated that they can comply with those more stringent regulations.

Credit bureaus, and data brokers in general, are deliberately not implementing these 21st-century security solutions, because they want their services to be as easy and useful as possible for their actual customers: those who are buying your information. Similarly, companies that use this personal information to open accounts are not implementing more stringent security because they want their services to be as easy-to-use and convenient as possible.

9. This has foreign trade implications.

The Canadian Broadcast Corporation reported that 100,000 Canadians had their data stolen in the Equifax breach. The British Broadcasting Corporation originally reported that 400,000 UK consumers were affected; Equifax has since revised that to 15.2 million.

Many American Internet companies have significant numbers of European users and customers, and rely on negotiated safe harbor agreements to legally collect and store personal data of EU citizens.

The European Union is in the middle of a massive regulatory shift in its privacy laws, and those agreements are coming under renewed scrutiny. Breaches such as Equifax give these European regulators a powerful argument that US privacy regulations are inadequate to protect their citizens’ data, and that they should require that data to remain in Europe. This could significantly harm American Internet companies.

10. This has national security implications.

Although it is still unknown who compromised the Equifax database, it could easily have been a foreign adversary that routinely attacks the servers of US companies and US federal agencies with the goal of exploiting security vulnerabilities and obtaining personal data.

When the Fair Credit Reporting Act was passed in 1970, the concern was that the credit bureaus might misuse our data. That is still a concern, but the world has changed since then. Credit bureaus and data brokers have far more intimate data about all of us. And it is valuable not only to companies wanting to advertise to us, but to foreign governments as well. In 2015, the Chinese breached the database of the Office of Personnel Management and stole the detailed security clearance information of 21 million Americans. North Korea routinely engages in cybercrime as a way to fund its other activities. In a world where foreign governments use cyber capabilities to attack US assets, requiring data brokers to limit collection of personal data, securely store the data they collect, and delete data about consumers when it is no longer needed is a matter of national security.

11. We need to do something about it.

Yes, this breach is a huge black eye and a temporary stock dip for Equifax — this month. Soon, another company will have suffered a massive data breach and few will remember Equifax’s problem. Does anyone remember last year when Yahoo admitted that it exposed personal information of a billion users in 2013 and another half billion in 2014?

Unless Congress acts to protect consumer information in the digital age, these breaches will continue.

Thank you for the opportunity to testify today. I will be pleased to answer your questions.

Improved Search for Backblaze’s Blog

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/using-relevannssi-wordpress-search/

Improved Search for Backblaze's Blog
Search has become the most powerful method to find content on the Web, both for finding websites themselves and for discovering information within websites. Our blog readers find content in both ways — using Google, Bing, Yahoo, Ask, DuckDuckGo, and other search engines to follow search results directly to our blog, and using the site search function once on our blog to find content in the blog posts themselves.

There’s a Lot of Great Content on the Backblaze Blog

Backblaze’s CEO Gleb Budman wrote the first post for this blog in March of 2008. Since that post there have been 612 more. There’s a lot of great content on this blog, as evidenced by the more than two million page views we’ve had since the beginning of this year. We typically publish two blog posts per week on a variety of topics, but we focus primarily on cloud storage technology and data backup, company news, and how-to articles on using cloud storage and various hardware and software solutions.

Earlier this year we initiated a series of posts on entrepreneurship by our CEO and co-founder, Gleb Budman, which has proven tremendously popular. We also occasionally publish something a little lighter, such as our current Halloween video contest — there’s still time to enter!


The Site Search Box — Your gateway to Backblaze blog content

We Could do a Better Job of Helping You Find It

I joined Backblaze as Content Director in July of this year. During the application process, I spent quite a bit of time reading through the blog to understand the company, the market, and its customers. That’s a lot of reading. I used the site search many times to uncover topics and posts, and discovered that site search had a number of weaknesses that made it less-than-easy to find what I was looking for.

These site search weaknesses included:

  • Searches were case sensitive, so a visitor could easily miss content capitalized differently than the search terms
  • Results showed no date or author information, so a visitor couldn’t tell how recent the post was or who wrote it
  • Search terms were not highlighted in context, so a visitor had to scrutinize the results to find the terms in the post
  • There was no indication of the number of results or pages of results, so a visitor didn’t know how fruitful the search was
  • There was no record of the search terms visitors used, so we couldn’t tell what our visitors were searching for!

I wanted to make it easier for blog visitors to find all the great content on the Backblaze blog and help me understand what our visitors are searching for. To do that, we needed to upgrade our site search.

I started with a list of goals I wanted for site search.

  1. Make it easier to find content on the blog
  2. Provide a summary of what was found
  3. Search the comments as well as the posts
  4. Highlight the search terms in the results to help find them in context
  5. Provide a record of searches to help me understand what interests our readers

I had the goals, now how could I find a solution to achieve them?

Our blog is built on WordPress, which has a built-in site search function that could be described as simply adequate. The most obvious of its limitations is that search results are listed chronologically, not based on “most popular,” “most occurring,” or any other metric that might make the results more relevant to your interests.

The Search for Improved (Site) Search

An obvious choice to improve site search would be to adopt Google Site Search, as many websites and blogs have done. Unfortunately, I quickly discovered that Google is sunsetting Site Search by April of 2018. That left the choice among a number of third-party services or WordPress-specific solutions. My immediate inclination was to see what is available specifically for WordPress.

There are a handful of search plugins for WordPress. One stood out to me for the number of installations (100,000+) and overwhelmingly high reviews: Relevanssi. Still, I had a number of questions. The first question was whether the plugin retained any search data from our site — I wanted to make sure that the privacy of our visitors is maintained, and even harvesting anonymous search data would not be acceptable to Backblaze. I wrote to the developer and was pleased by the responsiveness from Relevanssi’s creator, Mikko Saari. He explained to me that Relevanssi doesn’t have access to any of the search data from the sites using his plugin. Receiving a quick response from a developer is always a good sign. Other signs of a good WordPress plugin are recent updates and an active support forum.

Our solution: Relevanssi for Site Search

The WordPress plugin Relevanssi met all of our criteria, so we installed the plugin and switched to using it for site search in September.

In addition to solving the problems listed above, our search results are now displayed based on relevance instead of date, which is the default behavior of WordPress search. That capability is very useful on our blog where a lot of the content from years ago is still valuable — often called evergreen content. The new site search also enables visitors to search using the boolean expressions AND and OR. For example, a visitor can search for “seagate AND drive,” and see results that only include both words. Alternatively, a visitor can search for “seagate OR drive” and see results that include either word.


Search results showing total number of results, hits and their location, and highlighted search terms in context

Visitors can put search terms in quotation marks to search for an entire phrase. For example, a visitor can search for “2016 drive stats” and see results that include only that exact phrase. In addition, the site search results come with a summary, showing where the results were found (title, post, or comments). Search terms are highlighted in yellow in the content, showing exactly where the search result was found.

Here’s an example of a popular post that shows up in searches. Hard Drive Stats for Q1 2017 was published on May 9, 2017. Since September 4, it has shown up over 150 times in site searches, and in the last 90 days it has been viewed over 53,000 times on our blog.

Hard Drive Stats for Q1 2017

The Results Tell the Story

Since initiating the new search on our blog on September 4, there have been almost 23,000 site searches conducted, so we know you are using it. We’ve implemented pagination for the blog feed and search results so you know how many pages of results there are and made it easier to navigate to them.

Now that we have this site search data, you likely are wondering which are the most popular search terms on our blog. Here are some of the top searches:

What Do You Search For?

Please tell us how you use site search and whether there are any other capabilities you’d like to see that would make it easier to find content on our blog.

The post Improved Search for Backblaze’s Blog appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

SQLiv – SQL Injection Dork Scanning Tool

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/10/sqliv-sql-injection-dork-scanning-tool/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

SQLiv – SQL Injection Dork Scanning Tool

SQLiv is a Python-based massive SQL Injection dork scanning tool which uses Google, Bing or Yahoo for targeted scanning, multiple-domain scanning or reverse domain scanning.

SQLiv Massive SQL Injection Scanner Features

Both the SQLi scanning and domain info checking are done in a multiprocess manner so the script is super fast at scanning a lot of URLs. It’s a fairly new tool and there are plans for more features and to add support for other search engines like DuckDuckGo.
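A hedged sketch of the multiprocess pattern described above (this is not SQLiv’s actual code; the signature list and function names are illustrative): a pure detector function is mapped across already-fetched page bodies with `multiprocessing.Pool`, and in the real tool the URL fetching itself would be parallelized the same way:

```python
from multiprocessing import Pool

# Error strings that commonly leak from SQL backends (illustrative subset).
SQL_ERROR_SIGNATURES = [
    "you have an error in your sql syntax",
    "unclosed quotation mark",
    "pg_query(): query failed",
]

def looks_injectable(page_text):
    """Flag a page whose body contains a known SQL error signature."""
    body = page_text.lower()
    return any(sig in body for sig in SQL_ERROR_SIGNATURES)

def scan(pages, workers=4):
    """Check many fetched pages in parallel."""
    with Pool(workers) as pool:
        return pool.map(looks_injectable, pages)

if __name__ == "__main__":
    pages = [
        "Welcome to the shop",
        "Warning: You have an error in your SQL syntax near '1'",
    ]
    print(scan(pages))  # [False, True]
```

Because each check is independent, the pool scales linearly with worker count until the network, not the CPU, becomes the bottleneck.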

Read the rest of SQLiv – SQL Injection Dork Scanning Tool now! Only available at Darknet.

Open Sourcing Vespa, Yahoo’s Big Data Processing and Serving Engine

Post Syndicated from ris original https://lwn.net/Articles/734926/rss

Oath, parent company of Yahoo, has announced
that it has released Vespa as an open source
project on GitHub.
“Building applications increasingly means dealing with huge amounts of data. While developers can use the Hadoop stack to store and batch process big data, and Storm to stream-process data, these technologies do not help with serving results to end users. Serving is challenging at large scale, especially when it is necessary to make computations quickly over data while a user is waiting, as with applications that feature search, recommendation, and personalization.

By releasing Vespa, we are making it easy for anyone to build applications
that can compute responses to user requests, over large datasets, at real
time and at internet scale – capabilities that up until now, have been
within reach of only a few large companies.” (Thanks to Paul Wise)

On the Equifax Data Breach

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/on_the_equifax_.html

Last Thursday, Equifax reported a data breach that affects 143 million US customers, about 44% of the population. It’s an extremely serious breach; hackers got access to full names, Social Security numbers, birth dates, addresses, driver’s license numbers — exactly the sort of information criminals can use to impersonate victims to banks, credit card companies, insurance companies, and other businesses vulnerable to fraud.

Many sites posted guides to protecting yourself now that it’s happened. But if you want to prevent this kind of thing from happening again, your only solution is government regulation (as unlikely as that may be at the moment).

The market can’t fix this. Markets work because buyers choose between sellers, and sellers compete for buyers. In case you didn’t notice, you’re not Equifax’s customer. You’re its product.

This happened because your personal information is valuable, and Equifax is in the business of selling it. The company is much more than a credit reporting agency. It’s a data broker. It collects information about all of us, analyzes it all, and then sells those insights.

Its customers are people and organizations who want to buy information: banks looking to lend you money, landlords deciding whether to rent you an apartment, employers deciding whether to hire you, companies trying to figure out whether you’d be a profitable customer — everyone who wants to sell you something, even governments.

It’s not just Equifax. It might be one of the biggest, but there are 2,500 to 4,000 other data brokers that are collecting, storing, and selling information about you — almost all of them companies you’ve never heard of and have no business relationship with.

Surveillance capitalism fuels the Internet, and sometimes it seems that everyone is spying on you. You’re secretly tracked on pretty much every commercial website you visit. Facebook is the largest surveillance organization mankind has created; collecting data on you is its business model. I don’t have a Facebook account, but Facebook still keeps a surprisingly complete dossier on me and my associations — just in case I ever decide to join.

I also don’t have a Gmail account, because I don’t want Google storing my e-mail. But my guess is that it has about half of my e-mail anyway, because so many people I correspond with have accounts. I can’t even avoid it by choosing not to write to gmail.com addresses, because I have no way of knowing if [email protected] is hosted at Gmail.

And again, many companies that track us do so in secret, without our knowledge and consent. And most of the time we can’t opt out. Sometimes it’s a company like Equifax that doesn’t answer to us in any way. Sometimes it’s a company like Facebook, which is effectively a monopoly because of its sheer size. And sometimes it’s our cell phone provider. All of them have decided to track us and not compete by offering consumers privacy. Sure, you can tell people not to have an e-mail account or cell phone, but that’s not a realistic option for most people living in 21st-century America.

The companies that collect and sell our data don’t need to keep it secure in order to maintain their market share. They don’t have to answer to us, their products. They know it’s more profitable to save money on security and weather the occasional bout of bad press after a data loss. Yes, we are the ones who suffer when criminals get our data, or when our private information is exposed to the public, but ultimately why should Equifax care?

Yes, it’s a huge black eye for the company — this week. Soon, another company will have suffered a massive data breach and few will remember Equifax’s problem. Does anyone remember last year when Yahoo admitted that it exposed personal information of a billion users in 2013 and another half billion in 2014?

This market failure isn’t unique to data security. There is little improvement in safety and security in any industry until government steps in. Think of food, pharmaceuticals, cars, airplanes, restaurants, workplace conditions, and flame-retardant pajamas.

Market failures like this can only be solved through government intervention. By regulating the security practices of companies that store our data, and fining companies that fail to comply, governments can raise the cost of insecurity high enough that security becomes a cheaper alternative. They can do the same thing by giving individuals affected by these breaches the ability to sue successfully, citing the exposure of personal data itself as a harm.

By all means, take the recommended steps to protect yourself from identity theft in the wake of Equifax’s data breach, but recognize that these steps are only effective on the margins, and that most data security is out of your hands. Perhaps the Federal Trade Commission will get involved, but without evidence of “unfair and deceptive trade practices,” there’s nothing it can do. Perhaps there will be a class-action lawsuit, but because it’s hard to draw a line between any of the many data breaches you’re subjected to and a specific harm, courts are not likely to side with you.

If you don’t like how careless Equifax was with your data, don’t waste your breath complaining to Equifax. Complain to your government.

This essay previously appeared on CNN.com.

EDITED TO ADD: In the early hours of this breach, I did a radio interview where I minimized the ramifications of this. I didn’t know the full extent of the breach, and thought it was just another in an endless string of breaches. I wondered why the press was covering this one and not many of the others. I don’t remember which radio show interviewed me. I kind of hope it didn’t air.

ECtHR: monitoring of electronic communications in the workplace

Post Syndicated from nellyo original https://nellyo.wordpress.com/2017/09/13/echrretention/

The ECtHR’s judgment in Bărbulescu v. Romania (61496/08) concerns a company’s decision to dismiss an employee after monitoring his electronic communications within the company.

The Grand Chamber held (by 11 votes to 6) that there had been a violation of Article 8 (right to respect for private and family life, home and correspondence) of the European Convention on Human Rights. The Court concluded that the national authorities had not adequately protected Mr. Bărbulescu’s right to respect for his private life and correspondence. In particular, the national courts had failed to establish:

  • whether Mr. Bărbulescu had received prior notice from his employer that his communications might be monitored;
  • whether he had been informed of the nature and extent of the monitoring and the degree of intrusion into his private life and correspondence;
  • the specific reasons justifying the introduction of the monitoring measures;
  • whether the employer could have used measures entailing less intrusion into Mr. Bărbulescu’s private life and correspondence;
  • whether the communications could have been accessed without his knowledge, among other points.

The Facts

The applicant, Bogdan Mihai Bărbulescu, is a Romanian national who was employed by a private company as an engineer in charge of sales. At his employer’s request, he created a Yahoo Messenger account in order to respond to client enquiries. Bărbulescu was later dismissed on the grounds that his Yahoo Messenger communications had been monitored and that there was evidence he had used the internet for personal purposes, which was prohibited by internal regulations. Mr. Bărbulescu unsuccessfully challenged his employer’s decision before the Bucharest courts.

In its Chamber judgment of 12 January 2016, the European Court of Human Rights held (by 6 votes to 1) that there had been no violation of Article 8 of the Convention, finding that the national courts had struck a fair balance between Mr. Bărbulescu’s right to respect for his private life and correspondence under Article 8 and the interests of his employer.
The Grand Chamber

The judgment notes that the right to respect for private life and correspondence continues to exist in the workplace, even if restricted.
The national authorities were obliged to strike a balance between competing interests: Mr. Bărbulescu’s right to respect for his private life, on the one hand, and his employer’s right to take measures to ensure the smooth running of the company, on the other.

The national courts, however, did not establish whether Mr. Bărbulescu had been notified in advance of the possibility that his employer might introduce monitoring measures, or of the nature of such measures. To qualify as prior notice, the employer’s warning must be given before the monitoring begins. It is apparent from the case file that Mr. Bărbulescu had not been informed in advance of the extent and nature of the monitoring.
The national courts did not analyze whether the reasons justifying the monitoring of Mr. Bărbulescu’s communications were sufficient, i.e. whether there was a reasonable suspicion that the employee’s conduct exposed the company to risk and could engage its liability.
The national courts did not examine whether the aim pursued by the employer could have been achieved by methods entailing less interference with his private life and correspondence.
Nor was it clarified whether the employee’s dismissal was necessary, i.e. whether the monitoring had to result in the most severe disciplinary sanction.
Violation of Article 8 ECHR.
A judgment is also awaited in Libert v. France (588/13), in which an employee objects to a dismissal based on the employer’s access to the employee’s personal files on a work computer.

Filed under: Media Law Tagged: ECtHR

Yahoo Mail’s New Tech Stack, Built for Performance and Reliability

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/162320493306

By Suhas Sadanandan, Director of Engineering 

When it comes to performance and reliability, there is perhaps no application where this matters more than with email. Today, we announced a new Yahoo Mail experience for desktop based on a completely rewritten tech stack that embodies these fundamental considerations and more.

We built the new Yahoo Mail experience using a best-in-class front-end tech stack with open source technologies including React, Redux, Node.js, react-intl (open-sourced by Yahoo), and others. A high-level architectural diagram of our stack is below.


New Yahoo Mail Tech Stack

In building our new tech stack, we made use of the most modern tools available in the industry to come up with the best experience for our users by optimizing the following fundamentals:

Performance

A key feature of the new Yahoo Mail architecture is blazing-fast initial loading (aka, launch).

We introduced new network routing which sends users to their nearest geo-located email servers (proximity-based routing). This has resulted in a significant reduction in time to first byte and should be immediately noticeable to our international users in particular.

We now do server-side rendering to allow our users to see their mail sooner. This change will be immediately noticeable to our low-bandwidth users. Our application is isomorphic, meaning that the same code runs on the server (using Node.js) and the client. Prior versions of Yahoo Mail had programming logic duplicated on the server and the client because we used PHP on the server and JavaScript on the client.   

Using efficient bundling strategies (JavaScript code is separated into application, vendor, and lazy loaded bundles) and pushing only the changed bundles during production pushes, we keep the cache hit ratio high. By using react-atomic-css, our homegrown solution for writing modular and scoped CSS in React, we get much better CSS reuse.  

In prior versions of Yahoo Mail, the need to run various experiments in parallel resulted in additional branching and bloating of our JavaScript and CSS code. While rewriting all of our code, we solved this issue using Mendel, our homegrown solution for bucket testing isomorphic web apps, which we have open sourced.  

Rather than using custom libraries, we use native HTML5 APIs and ES6 heavily and use PolyesterJS, our homegrown polyfill solution, to fill the gaps. These factors have further helped us to keep payload size minimal.

With all the above optimizations, we have been able to reduce our JavaScript and CSS footprint by approximately 50% compared to the previous desktop version of Yahoo Mail, helping us achieve a blazing-fast launch.

In addition to initial launch improvements, key features like search and message read (when a user opens an email to read it) have also benefited from the above optimizations and are considerably faster in the latest version of Yahoo Mail.

We also significantly reduced the memory consumed by Yahoo Mail on the browser. This is especially noticeable during a long running session.

Reliability

With this new version of Yahoo Mail, we have a 99.99% success rate on core flows: launch, message read, compose, search, and actions that affect messages. Accomplishing this over several billion user actions a day is a significant feat. Client-side errors (JavaScript exceptions) are reduced significantly when compared to prior Yahoo Mail versions.

Product agility and launch velocity

We focused on independently deployable components. As part of the re-architecture of Yahoo Mail, we invested in a robust continuous integration and delivery flow. Our new pipeline allows for daily (or more) pushes to all Mail users, and we push only the bundles that are modified, which keeps the cache hit ratio high.

Developer effectiveness and satisfaction

In developing our tech stack for the new Yahoo Mail experience, we heavily leveraged open source technologies, which allowed us to ensure a shorter learning curve for new engineers. We were able to implement a consistent and intuitive onboarding program for 30+ developers and are now using our program for all new hires. During the development process, we emphasized predictable flows and easy debugging.

Accessibility

The accessibility of this new version of Yahoo Mail is state of the art and delivers outstanding usability (efficiency) in addition to accessibility. It features six enhanced visual themes that can provide accommodation for people with low vision and has been optimized for use with Assistive Technology including alternate input devices, magnifiers, and popular screen readers such as NVDA and VoiceOver. These features have been rigorously evaluated and incorporate feedback from users with disabilities. It sets a new standard for the accessibility of web-based mail and is our most-accessible Mail experience yet.

Open source 

We have open sourced some key components of our new Mail stack, like Mendel, our solution for bucket testing isomorphic web applications. We invite the community to use and build upon our code. Going forward, we plan on also open sourcing additional components like react-atomic-css, our solution for writing modular and scoped CSS in React, and lazy-component, our solution for on-demand loading of resources.

Many of our company’s best technical minds came together to write a brand new tech stack and enable a delightful new Yahoo Mail experience for our users.

We encourage our users and engineering peers in the industry to test the limits of our application, and to provide feedback by clicking on the Give Feedback call out in the lower left corner of the new version of Yahoo Mail.

More notes on US-CERTs IOCs

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/06/more-notes-on-us-certs-iocs.html

Yet another Russian attack against the power grid, and yet more bad IOCs from the DHS US-CERT.

IOCs are “indicators of compromise”, things you can look for in order to see if you, too, have been hacked by the same perpetrators. There are several types of IOCs, ranging from the highly specific to the uselessly generic.

A uselessly generic IOC would be like trying to identify bank robbers by the fact that their getaway car was “white” in color. It’s worth documenting, so that if the police ever show up at a suspected cabin in the woods, they can note that there’s a “white” car parked in front.

But if you work bank security, that doesn’t mean you should be on the lookout for “white” cars. That would be silly.

This is what happens with US-CERT’s IOCs. They list some potentially useful things, but they also list a lot of junk that wastes people’s time, with little ability to distinguish between the useful and the useless.

An example: a few months ago, US-CERT published the GRIZZLY STEPPE report. Among other things, it listed IP addresses used by hackers, with no description of which IP addresses would be useful to watch for and which would be useless.

Some of these IP addresses were useful, pointing to servers the group has been using for a long time as command-and-control servers. Other IP addresses are more dubious, such as Tor exit nodes. You aren’t concerned with any specific Tor exit IP address, because exit nodes change randomly and so have no relationship to the attackers. If you care about Tor IP addresses at all, what you should be looking for is a dynamically updated list of Tor nodes, refreshed daily.
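Consuming such a dynamic list is straightforward. A minimal sketch, assuming the Tor Project’s plain-text bulk exit list (one address per line; the URL and helper names here are illustrative, not from the report):

```python
import urllib.request

# Plain-text list of current exit-node IPs, one per line (assumed format).
TOR_EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"

def parse_exit_list(text):
    """Turn the raw exit-node list into a set for O(1) lookups."""
    return {line.strip() for line in text.splitlines()
            if line.strip() and not line.startswith("#")}

def fetch_exit_nodes(url=TOR_EXIT_LIST_URL):
    """Refresh daily; a stale snapshot is barely better than a static IOC."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_exit_list(resp.read().decode())

# Offline illustration with a fake two-line list:
sample = "# tor exit nodes\n185.220.101.1\n199.87.154.255\n"
exits = parse_exit_list(sample)
```

The point of the design is that the set is rebuilt on a schedule rather than baked into a sensor rule, so yesterday’s exit node doesn’t linger as a false indicator.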

And finally, they listed IP addresses of Yahoo, because attackers passed data through Yahoo servers. No, it wasn’t because those Yahoo servers had been compromised; it’s just that everyone passes things through them, like email.

A Vermont power-plant blindly dumped all those IP addresses into their sensors. As a consequence, the next morning when an employee checked their Yahoo email, the sensors triggered. This resulted in national headlines about the Russians hacking the Vermont power grid.

Today, the US-CERT made similar mistakes with CRASHOVERRIDE. They took a report from Dragos Security, then mutilated it. Dragos’s own IOCs focused on things like hostile strings and file hashes of the hostile files. They also included filenames, but for the same reason you’d note a white car: because it happened, not because you should be on the lookout for it. In context, there’s nothing wrong with noting the file name.

But the US-CERT pulled the filenames out of context. One of those filenames was, humorously, “svchost.exe”. It’s the name of an essential Windows service. Every Windows computer is running multiple copies of “svchost.exe”. It’s like saying “be on the lookout for Windows”.

Yes, it’s true that viruses use the same filenames as essential Windows files like “svchost.exe”. That’s, generally, something you should be aware of. But the fact that CRASHOVERRIDE did this is wholly meaningless.

What Dragos Security was actually reporting was that a “svchost.exe” with the file hash of 79ca89711cdaedb16b0ccccfdcfbd6aa7e57120a was the virus — it’s the hash that’s the important IOC. Pulling the filename out of context is just silly.
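Checking the hash rather than the name is a one-liner with `hashlib`. A minimal sketch (the constant below is the SHA-1 quoted above; the function names are illustrative):

```python
import hashlib
import os
import tempfile

# The hash from the Dragos report -- the IOC that actually matters.
CRASHOVERRIDE_SVCHOST_SHA1 = "79ca89711cdaedb16b0ccccfdcfbd6aa7e57120a"

def sha1_of_file(path, chunk_size=1 << 20):
    """Stream the file so large binaries aren't loaded into memory at once."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_crashoverride_sample(path):
    # Match on the hash, not the filename: every Windows box runs svchost.exe.
    return sha1_of_file(path) == CRASHOVERRIDE_SVCHOST_SHA1

# Quick self-check against a known digest (SHA-1 of b"hello").
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_digest = sha1_of_file(tmp.name)
os.unlink(tmp.name)
```

A scan built this way triggers only on the actual malicious binary, no matter what the file has been renamed to, and never on the thousands of legitimate svchost.exe copies in any fleet.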

Luckily, the DHS also provides some of the raw information from Dragos. But even then, there are problems: they provide it in formatted form, as HTML, PDF, or Excel documents. This corrupts the original data so that it’s no longer machine readable. For example, from their webpage, they have the following:

import “pe”
import “hash”

Among the problems is the fact that the quote marks have been altered, probably by Word’s “smart quotes” feature. In other cases, I’ve seen PDF documents confuse the number 0 and the letter O, as if the raw data had been scanned in from a printed document and OCRed.
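A small normalization pass, run before feeding pasted rules to a parser, undoes the most common “smart” substitutions. An illustrative sketch (extend the table as needed; it won’t fix OCR damage like 0/O swaps, which need manual review):

```python
# Map typographic characters that word processors and PDF export introduce
# back to the ASCII that rule parsers (e.g. YARA) expect.
SMART_CHARS = {
    "\u201c": '"', "\u201d": '"', "\u201e": '"',  # curly double quotes
    "\u2018": "'", "\u2019": "'",                 # curly single quotes
    "\u2013": "-", "\u2014": "-",                 # en/em dashes
}

_TABLE = str.maketrans(SMART_CHARS)

def normalize(text):
    """Replace typographic characters with their plain-ASCII equivalents."""
    return text.translate(_TABLE)

broken = "import \u201cpe\u201d\nimport \u201chash\u201d"
fixed = normalize(broken)
```

Better still is to publish the raw data in a machine-readable format (STIX, CSV, plain text) alongside the formatted report, so downstream consumers never have to de-mangle it.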

If this were a “threat intel” company, we’d call this snake oil. The US-CERT is using Dragos Security’s reports to promote itself, but ultimately providing negative value by mutilating the content.

This, ultimately, causes a lot of harm. The press trusted their content. So does the network of downstream entities, like municipal power grids. There are tens of thousands of such consumers of these reports, often with less expertise than even US-CERT. There are sprinklings of smart people in these organizations, I meet them at hacker cons, and am fascinated by their stories. But institutionally, they are dumbed down to the same level as these US-CERT reports, with the smart people marginalized.

There are two solutions to this problem. The first is that when the stupidity of what you do causes everyone to laugh at you, stop doing it. The second is to value technical expertise, empowering those who know what they are doing. An example of what not to do is giving power to people like Obama’s cyberczar, Michael Daniel, who once claimed his lack of technical knowledge was a bonus, because it allowed him to see the strategic picture instead of getting distracted by details.