Tag Archives: reports

The importance of paying attention in building community trust

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/44915.html

Trust is important in any kind of interpersonal relationship. It’s inevitable that there will be cases where something you do will irritate or upset others, even if only to a small degree. Handling small cases well helps build trust that you will do the right thing in more significant cases, whereas ignoring things that seem fairly insignificant (or saying that you’ll do something about them and then failing to do so) suggests that you’ll also fail when there’s a major problem. Getting the small details right is a major part of creating the impression that you’ll deal with significant challenges in a responsible and considerate way.

This isn’t limited to individual relationships. Something that distinguishes good customer service from bad customer service is getting the details right. There are many industries where significant failures happen infrequently, but minor ones happen a lot. Would you prefer to give your business to a company that handles those small details well (even if they’re not overly annoying) or one that just tells you to deal with them?

And the same is true of software communities. A strong and considerate response to minor bug reports makes it more likely that users will be patient with you when dealing with significant ones. Handling small patch contributions quickly makes it more likely that a submitter will be willing to do the work of making more significant contributions. These things are well understood, and most successful projects have actively worked to reduce barriers to entry and to be responsive to user requests in order to encourage participation and foster a feeling that they care.

But what’s often ignored is that this applies to other aspects of communities as well. Failing to use inclusive language may not seem like a big thing in itself, but it leaves people with the feeling that you’re less likely to do anything about more egregious exclusionary behaviour. Allowing a baseline level of sexist humour gives the impression that you won’t act if there are blatant displays of misogyny. The more examples of these “insignificant” issues people see, the more likely they are to choose to spend their time somewhere else, somewhere they can have faith that major issues will be handled appropriately.

There’s a more insidious aspect to this. Sometimes we can believe that we are handling minor issues appropriately, that we’re acting in a way that handles people’s concerns, while actually failing to do so. If someone raises a concern about an aspect of the community, it’s important to discuss solutions with them. Putting effort into “solving” a problem without ensuring that the solution has the desired outcome is not only a waste of time, it alienates those affected even more – they’re now not only left with the feeling that they can’t trust you to respond appropriately, but that you will actively ignore their feelings in the process.

It’s not always possible to satisfy everybody’s concerns. Sometimes you’ll be left in situations where you have conflicting requests. In that case the best thing you can do is to explain the conflict and why you’ve made the choice you have, and demonstrate that you took this issue seriously rather than ignoring it. Depending on the issue, you may still alienate some number of participants, but it’ll be fewer than if you just pretend that it’s not actually a problem.

One warning, though: while building trust in this way enhances people’s willingness to join your community, it also builds expectations. If a significant issue does arise, and if you fail to handle it well, you’ll burn a lot of that trust in the process. The fact that you’ve built that trust in the first place may be what saves your community from disintegrating completely, but people will feel even more betrayed if you don’t actively work to rebuild it. And if there’s a pattern of mishandling major problems, no amount of getting the details right will matter.

Communities that ignore these issues are, long term, likely to end up weaker than communities that pay attention to them. Making sure you get this right in the first place, and setting expectations that you will pay attention to your contributors, is a vital part of building a meaningful relationship between your community and its members.


Case 224: Unsupported Accusations

Post Syndicated from The Codeless Code original http://thecodelesscode.com/case/224

While passing by the temple’s Support Desk, the nun
Hwídah heard of strange behavior in a certain
application. Since she had been appointed by master
Banzen to assist with production issues, the nun
dutifully described the symptoms to the application’s senior
monk:

“Occasionally a user will return to a record they had
previously edited, only to discover that some information is
missing,” said Hwídah. “The behavior is not repeatable, and
the users confess that they may be imagining things.”

“I have heard these reports,” said the senior monk. “There is
no bug in the code that I can see, nor can we reproduce the
problem in a lower environment.”

“Still, it may be prudent to investigate further,” said the
nun.

The monk sighed. “We are all exceedingly busy. Only a few
users have reported this issue, and even they doubt
themselves. So far, all are content to simply re-enter the
‘missing’ information and continue about their business.
Can you offer me one shred of evidence that this is anything
more than user error?”

The nun shook her head, bowed, and departed.

- - -

That night, the senior monk was awoken from his sleep by a
squeaking under his bed, of the sort a mouse might make.
This sound continued throughout the night—sometimes in
one place, sometimes another, presumably as the intruder
wandered about in search of food. A sandal flung in the
direction of the sound resulted in immediate quiet, but
eventually the squeaking would begin again in a different
part of the room.

“This is doubtless some lesson that the meddlesome Hwídah
wishes to teach me,” he complained to his fellows the next
day, dark circles under his eyes. “Yet I will not be
bullied into chasing nonexistent bugs. If the nun is so
annoyed by the squeaking of our users, let her deal with
it!”

The monk set mousetraps in the corners and equipped himself
with a pair of earplugs. Thus he passed the next night, and
the night after, though his sleep was less restful than he
would have liked.

On the seventh night, the exhausted monk turned off the
light and fell hard upon his bed. There was a loud CRACK
and the monk found himself tumbling through space. With a
CRASH he bounced off his mattress and rolled onto a cold
stone floor. His bed had, apparently, fallen through the
floor into the basement.

Perched high on a ladder—just outside the gaping hole in
the basement’s wooden ceiling—was the nun Hwídah, her
face lit only by a single candle hanging nearby. She
descended and dropped an old brace-and-bit hand drill into
the monk’s lap. Then she crouched down next to his ear.

“If you don’t understand it, it’s dangerous,” whispered the
nun.

Selecting Service Endpoints for Reliability and Performance

Post Syndicated from Nate Dye original https://aws.amazon.com/blogs/architecture/selecting-service-endpoints-for-reliability-and-performance/

Choose Your Route Wisely

Much like a roadway, the Internet is subject to congestion and blockage that cause slowdowns and, at worst, prevent packets from arriving at their destination. Like too many cars jamming themselves onto a highway, too much data over a route on the Internet results in slowdowns. Transatlantic cable breaks have much the same effect as road construction, resulting in detours and further congestion, and may prevent access to certain sites altogether. Services with multiple points of presence, paired with either smart clients or service-side routing logic, improve performance and reliability by ensuring your viewers have access to alternative routes when roadways are blocked. This blog post provides you with an overview of service-side and client-side designs that can improve the reliability and performance of your Internet-facing services.

In order to understand how smart routing can increase performance, we must first understand why Internet downloads take time. To extend our roadway analogy, imagine we need to drive to the hardware store to obtain supplies. If we can get our supplies in one trip and it takes 10 minutes to drive one way, it takes 20 minutes to get the supplies and return to the house project. (For the sake of simplicity we’ll exclude the time involved in finding a parking spot, wandering aimlessly around the store in search of the right parts, and loading up materials.) Much of the time is taken up by driving the 5 km across multiple types of roadways to the hardware store.

Packets on the Internet must also transit physical distance and many links to reach their destination. A traceroute from my home network to my website hosted in eu-west-1 traverses 20 different links. It takes time for a packet to traverse the physical distance between each of these nodes. Traveling at a speed of 200,000 km/sec, my packets travel from Seattle to Dublin and back in 170 ms.
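
As a quick sanity check on those numbers, here is a back-of-the-envelope sketch in Python (the ~7,200 km Seattle-to-Dublin great-circle distance is an assumed figure, not from the original post):

    # Light in fiber travels at roughly 200,000 km/s (about 2/3 of c in vacuum).
    SPEED_KM_PER_S = 200_000

    def min_rtt_ms(one_way_km: float) -> float:
        """Theoretical minimum round-trip time, in milliseconds."""
        return 2 * one_way_km / SPEED_KM_PER_S * 1000

    # Seattle to Dublin is ~7,200 km as the crow flies, giving a floor of
    # ~72 ms; the observed 170 ms reflects a longer real-world path plus
    # queueing and processing at each of the ~20 links.
    print(min_rtt_ms(7_200))  # -> 72.0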

Unlike roadways, when congestion occurs on Internet paths, routers discard packets. Discarded packets result in retransmits by the sender, adding additional round trips to the overall download. The additional round trips result in a slower download, just as multiple trips to the hardware store result in more time spent driving and less time building.

To reduce the time of a round trip, a packet needs to travel a shorter physical distance. To avoid packet loss and additional round trips, a packet needs to avoid broken or congested routes. Performance and reliability of an Internet-facing service are improved by using the shortest physical path between the service and the client and by minimizing the number of round trips for data transfer.

Adding multiple points of presence to your architecture is like opening up a chain of hardware stores around the city. With multiple points of presence, drivers can choose a destination hardware store that allows them to drive the shortest physical distance. If the route to any one hardware store is congested or blocked entirely, drivers can instead drive to the next nearest store.

Measuring the Internet

Finding the fastest and most reliable paths across the Internet requires real-world measurement. The closest endpoint as the crow flies often isn’t the fastest via Internet paths. The diagrams below depict an all-too-common example from driving in the real world. Given the starting point (the white dot), hardware store B (the red marker in the second diagram) is actually closer as the crow flies, but requires a lot more driving distance. A driver who knows the roadways will know that it takes less overall time to drive to store A.

Hardware store A is 10 minutes away by car

Hardware store B would be much closer by way of the water, but is 16 minutes by car

The same is true on the Internet. Here’s a real-world example from CloudFront just after it launched in 2008. Below is a traceroute from an ISP network in Singapore to the SIN2 edge location in Singapore (internally, AWS uses airport codes to identify edge locations). The traceroute shows that a packet sourced from a Singapore ISP to the SIN2 edge location was routed by way of Hong Kong! Based on the connectivity of this viewer network, Singapore customers are better served from the HKG1 edge location rather than SIN2. If routing logic assumed that geographically closer edge locations resulted in lower latency, the routing engine would choose SIN2. By measuring latency instead, the system selects HKG1 as the preferred edge location, providing the viewer with 32 ms round-trip times versus 39 ms.

The topology of the Internet is always changing. The SIN to HKG example is old; today SIN2 is directly peered with the network above. Continuous or just-in-time measurements allow a system to adapt when Internet topology changes. Once the AWS networking team built a direct connection between our SIN edge location and the Singapore ISP, the CloudFront measurement system automatically observed lower latencies, and SIN2 became the preferred lowest-latency POP. The direct path results in 2.2 ms round-trip times versus 39 ms.

Leveraging Real World Measurements for Routing Decisions

Service-Side Routing

CloudFront and Route 53 both leverage similar mechanisms to route clients to the closest available endpoints. CloudFront uses routing logic in its DNS layer. When clients resolve DNS for a CloudFront distribution, they receive a set of IPs that map to the closest available edge location. Route 53’s Latency-Based Routing combined with DNS Failover and health checks allows customers to build the same routing logic into any service hosted across multiple AWS regions.

Availability failures are detected via continuous health checks. Each CloudFront edge location checks the availability of every other edge location and reports the health state to CloudFront DNS servers (which also happen to be Route 53 DNS servers). If an edge location starts failing health checks, CloudFront DNS servers respond to client queries with IPs that map to the client’s next-closest healthy edge location. AWS customers can implement similar health check failover capabilities within their services by using the Route 53 DNS failover and health checks feature.
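
As a rough sketch of what that looks like against the Route 53 API (using Python and boto3; the hosted zone ID, domain name, IP, and region here are placeholders, not values from the original post):

    import boto3

    route53 = boto3.client("route53")

    # A health check against the endpoint serving us-east-1.
    check = route53.create_health_check(
        CallerReference="api-us-east-1-v1",  # any unique string
        HealthCheckConfig={
            "IPAddress": "203.0.113.10",
            "Port": 443,
            "Type": "HTTPS",
            "ResourcePath": "/health",
            "RequestInterval": 30,
            "FailureThreshold": 3,
        },
    )

    # A latency-based record gated by that health check. Route 53 answers
    # queries with the lowest-latency healthy region for each resolver.
    route53.change_resource_record_sets(
        HostedZoneId="Z_EXAMPLE",
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "api.example.com.",
                "Type": "A",
                "SetIdentifier": "us-east-1",
                "Region": "us-east-1",
                "TTL": 60,
                "ResourceRecords": [{"Value": "203.0.113.10"}],
                "HealthCheckId": check["HealthCheck"]["Id"],
            },
        }]},
    )

A second record with Region set to, say, eu-west-1 and its own health check completes the setup: each client is then routed to its fastest healthy region.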

Lowest-latency routing is achieved by continuously sampling latency measurements from viewer networks across the Internet to each AWS region and edge location. Latency measurement data is compiled into a list of AWS sites for each viewer network, sorted by latency. CloudFront and Route 53 DNS servers use this data to direct traffic. When network topology shifts occur, the latency measurement system picks up the changes and reorders the list of lowest-latency AWS endpoints for that given network.
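
Conceptually, the result is just a per-network ranking of sites. A toy sketch (the network name and latency figures are invented, echoing the SIN/HKG example above):

    # Measured latency, in ms, from one viewer network to each AWS site.
    measurements = {
        "singapore-isp": {"SIN2": 39.0, "HKG1": 32.0, "NRT12": 71.0},
    }

    def ranked_sites(network: str) -> list[str]:
        """Sites for this network, lowest latency first."""
        latencies = measurements[network]
        return sorted(latencies, key=latencies.get)

    print(ranked_sites("singapore-isp"))  # -> ['HKG1', 'SIN2', 'NRT12']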

Client-Side Routing

But service-side technology isn’t the only way to perform latency-based routing. Client software also has access to all of the data it needs to select the best endpoint. A smart client can intelligently rotate through a list of endpoints whenever a connection failure occurs. Smart clients perform their own latency measurements and choose the lowest-latency endpoint as their connection destination. Most DNS resolvers do exactly that. Lee’s prior blog post made mention of DNS resolvers and provided a reference to this presentation.
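
A minimal smart client along those lines might look like this sketch (the endpoint hostnames are placeholders, and a TCP handshake stands in for a proper latency probe):

    import socket
    import time

    ENDPOINTS = ["ep1.example.com", "ep2.example.com", "ep3.example.com"]

    def probe_latency(host: str, port: int = 443, timeout: float = 2.0) -> float:
        """Time a TCP handshake as a rough round-trip measurement."""
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return time.monotonic() - start
        except OSError:
            return float("inf")  # unreachable endpoints sort last

    def fetch_with_failover(request_fn):
        """Try endpoints fastest-first, rotating on connection failure."""
        for host in sorted(ENDPOINTS, key=probe_latency):
            try:
                return request_fn(host)
            except OSError:
                continue  # endpoint failed; fall back to the next-fastest
        raise RuntimeError("all endpoints failed")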

The DNS protocol is well suited to smart-client endpoint selection. When a DNS resolver resolves a DNS name, say www.internetkitties.com, it must first determine which name servers are authoritative to answer queries for the domain. In my case, my domain is hosted by Route 53, which provides me with 4 assigned name servers. Smart DNS resolvers track the query response times to each name server and preferentially query the fastest one. If that name server fails to respond, the resolver falls back to the other name servers on the list.

There are examples in HTTP as well. The Kindle Fire web browser, Amazon Silk, makes use of backend resources hosted in multiple AWS regions to speed up the web browsing experience for its customers. The Amazon Silk client knows about and collects latency measurements against each endpoint of the Amazon Silk backend service. The client connects to the lowest-latency endpoint. As the device moves around from one network to another, the lowest-latency endpoint might change.

Service architecture with multiple points of presence on the Internet, combined with latency-based service-side or client-side endpoint selection, can improve the performance and reliability of your service. Let us know if you’d like a follow-up blog on best practices for measuring Internet latencies or lessons learned from the Amazon Silk client-side endpoint selector.

– Nate Dye

Running Multiple HTTP Endpoints as a Highly Available Health Proxy

Post Syndicated from John Winder original https://aws.amazon.com/blogs/architecture/running-multiple-http-endpoints-as-a-highly-available-health-proxy/

Route 53 Health Checks provide the ability to verify that endpoints are reachable and that HTTP and HTTPS endpoints successfully respond. However, there are many situations where DNS failover would be useful, but TCP, HTTP, and HTTPS health checks alone can’t sufficiently determine the health of the endpoint. In these cases, it’s possible for an application to determine its own health, and then use an HTTP endpoint as a health proxy to Route 53 Health Checks to perform DNS failover. Today we’ll look at an example using Route 53 Health Checks to provide DNS failover between many pairs of primary and standby databases.

Determining the Health Status of Resources

In this example, we have many pairs of primary and standby databases. For each pair, we would like to perform DNS failover from the primary to the standby whenever the primary is unable to respond to queries. Each of these pairs is independent of the others, so one primary database failing over should not have any impact on the remaining databases.

The existing Route 53 health check types aren’t really sufficient to determine the health of a database. A TCP health check could determine if the database was reachable, but not if it was able to respond to queries. In addition, we’d like to switch to the standby not only if the primary is unable to answer queries, but also proactively if we need to perform maintenance or updates on the primary.

Reporting Health

Having determined the criteria for failing over a database, we’ll then want to provide this information to Route 53 to perform DNS failover. Since the databases themselves can’t meaningfully respond to health checks, we set up an HTTP endpoint to respond to health checks on behalf of the databases. This endpoint will act as a health proxy for the database, responding to health checks with HTTP status 200 when the database is healthy and HTTP status 500 when the database is unhealthy. We can also use a single HTTP endpoint to publish the health of all our databases. Since HTTP health checks can specify a resource path in addition to IP address and port, we can assign a unique resource path for each database.
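
A minimal health proxy along these lines fits in a few lines of Python’s standard http.server (the database names and how health state gets updated are illustrative assumptions):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Health state per database, maintained elsewhere: e.g. by a background
    # poller running a trivial query against each database, or flipped by an
    # operator to proactively drain a primary for maintenance.
    HEALTH = {"DB1Primary": True, "DB1Standby": True}

    class HealthProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            db = self.path.lstrip("/")       # "/DB1Primary" -> "DB1Primary"
            healthy = HEALTH.get(db, False)  # unknown paths report unhealthy
            self.send_response(200 if healthy else 500)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), HealthProxy).serve_forever()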

To pull the information into Route 53, we then create an HTTP health check for each database. These health checks will all use the HTTP endpoint’s IP address and port, and each will have a unique resource path specific to the database we want to check. For example, in the diagram below, we see that the primary uses a path of “/DB1Primary” while the standby database uses a path of “/DB1Standby” on the same IP address endpoint.

We also configure the health checks with a failure threshold of 1 and an interval of 10 seconds to better suit our use case. The default interval of 30 seconds and failure threshold of 3 are useful when we want Route 53 to determine the health of an endpoint, but in this case we already know the health of the endpoint and are simply using health checks to surface this information. By setting the failure threshold to 1, Route 53 immediately takes the result of the latest health check request. This means our health check will become unhealthy on the first health check request after the HTTP endpoint reports the database as unhealthy, rather than waiting for three unhealthy responses. We also use the 10 second interval to speed up our reaction time to an unhealthy database. With both of these changes, Route 53 should detect the HTTP endpoint becoming unhealthy within approximately 10 seconds instead of approximately 90 seconds.
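
Via the API, one such health check might be created like this sketch (boto3; the IP, port, and caller reference are placeholders for one HTTP endpoint checking one database):

    import boto3

    route53 = boto3.client("route53")

    route53.create_health_check(
        CallerReference="db1-primary-via-endpoint-a",  # any unique string
        HealthCheckConfig={
            "IPAddress": "203.0.113.21",    # the health proxy, not the database
            "Port": 8080,
            "Type": "HTTP",
            "ResourcePath": "/DB1Primary",  # unique path per database
            "RequestInterval": 10,          # fastest supported interval
            "FailureThreshold": 1,          # trust the proxy's answer at once
        },
    )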

Increasing Availability

At this point we have created health checks for our databases which could be associated with resource record sets for DNS failover. Although this works well for providing the health of the resource to Route 53, it does have a downside that traditional health checks don’t. If the health proxy itself fails, all health checks targeting it will time out and fail as well. This would prevent DNS failover for databases from working at all, and could result in all DNS queries responding with the standby database.

The first step to fixing this is to run more than one HTTP endpoint, each providing the status of the same resources. This way, if one of the endpoints fails, we’ll still have others to report the same information. We’ll then create a health check for each combination of database and HTTP endpoint. So if we run three HTTP endpoints, we’ll create three health checks for each database.

The diagram below shows how the HTTP endpoints receive health statuses from a single primary and standby database, and then pass it on to the Route 53 health checks. The bold line in the diagram depicts how the health status for a single database is sent by multiple HTTP endpoints to multiple Route 53 health checks.

With multiple HTTP endpoints, we’ll want to consider the primary database healthy if at least one HTTP endpoint reports that it is healthy, i.e. if at least one of its health checks is healthy. We’ll want to fail over if none of the primary’s health checks report healthy, but at least one of the standby’s health checks reports healthy. If none of the primary’s or standby’s health checks report healthy, we would rather respond with the primary’s IP than return nothing at all. This also provides the desired behavior in case all HTTP endpoints go down; we’ll respond with the primary database for all database pairs, rather than failing over all databases.
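
Stated as code, the answer a query should receive follows this rule (a plain-Python sketch; the names are invented):

    def dns_answer(primary_checks: list[bool], standby_checks: list[bool],
                   primary_ip: str, standby_ip: str) -> str:
        """Which IP should a DNS query for this database pair return?"""
        if any(primary_checks):
            return primary_ip   # some proxy says the primary is healthy
        if any(standby_checks):
            return standby_ip   # primary looks down, standby looks up
        return primary_ip       # nothing looks healthy (e.g. all proxies
                                # down): prefer the primary to no answer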

To configure this behavior, we use multiple weighted round robin (WRR) resource record sets for each database. Each combination of database and health check has its own WRR set, with primaries weighted one and standbys weighted zero. When any of the primary database’s health checks are healthy, Route 53 will randomly choose one of the healthy WRR sets for the primary database with a weight of one. Since all of these contain the same value, the IP address of the primary database, it doesn’t matter which one is selected and we get the correct response. When all health checks for a primary database are unhealthy, its WRR sets will not be considered and Route 53 will choose randomly from the healthy zero-weighted standby sets. Once again, these all return the same IP address, so it doesn’t matter which is chosen. Finally, if all primary and standby health checks are unhealthy, Route 53 will consider all WRR sets and choose among the one-weighted primary records.

The diagram below shows an example of WRR sets configured for DNS failover of a single database.
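
Expressed against the Route 53 API, one primary/standby pair of WRR sets might look like this sketch (the zone ID, record name, IPs, and health check IDs are placeholders):

    import boto3

    route53 = boto3.client("route53")

    def wrr_change(set_id: str, weight: int, ip: str, health_check_id: str):
        """One weighted record for the database pair, gated by a health check."""
        return {"Action": "UPSERT", "ResourceRecordSet": {
            "Name": "db1.example.com.",
            "Type": "A",
            "SetIdentifier": set_id,
            "Weight": weight,
            "TTL": 30,
            "ResourceRecords": [{"Value": ip}],
            "HealthCheckId": health_check_id,
        }}

    route53.change_resource_record_sets(
        HostedZoneId="Z_EXAMPLE",
        ChangeBatch={"Changes": [
            # Weight 1: selected whenever any primary health check is healthy.
            wrr_change("db1-primary-via-a", 1, "10.0.0.10", "hc-primary-a"),
            # Weight 0: only considered once every weight-1 set is unhealthy.
            wrr_change("db1-standby-via-a", 0, "10.0.0.20", "hc-standby-a"),
        ]},
    )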

Applications and Tradeoffs

While this post discussed using health proxies in the context of health checking databases, the same technique can be used to apply DNS failover to many other resources. It could be applied to any resource which can’t easily be checked by TCP, HTTP, or HTTPS health checks, such as file servers, mail servers, etc. It could also be used to drive DNS failover on other criteria, such as manually moving traffic away from a resource for deployment, or when the error rate for a service rises above 5%.

It’s also worth mentioning that there are some tradeoffs to this approach that may make it less useful for some applications. The biggest of these is that in this approach the health checks no longer provide a connectivity check to the actual endpoint of interest. Since we configured the health checks to target the HTTP endpoints, we would actually be checking the connectivity of the HTTP endpoints instead of the databases themselves. For the use case in this example, this isn’t a problem. The databases are usually queried from other hosts in the same region, so global connectivity isn’t a concern.

The other tradeoff is having to maintain additional hosts for the HTTP endpoints. In the above example, we have more database pairs than HTTP endpoints, so this is a relatively small cost. If we only had a few databases, this design would be much less sensible.

– John Winder

 

Polypaudio 0.8 Released

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/polypaudio-0.8.html

The reports of Polypaudio’s death are greatly exaggerated.

We are proud to announce the release of Polypaudio 0.8, our
networked sound daemon for Linux, other Unix-like operating
systems, and Microsoft Windows. Since the last official release,
0.7, more than a year has passed. In the meantime Polypaudio has
seen major improvements, with major contributions from both Pierre
Ossman and me. Pierre is paid by Cendio AB to work on Polypaudio;
Cendio distributes Polypaudio along with their ThinLinc Terminal
Server.

Some of the major changes:

  • New playback buffer model that allows applications to freely seek in
    the server side playback buffer (both with relative and absolute indexes) and to synchronize
    multiple streams together, in a way that the playback times are guaranteed to
    stay synchronized even in the case of a buffer underrun. (Lennart)
  • Ported to Microsoft Windows and Sun Solaris (Pierre)
  • Many inner loops (like sample type conversions) have been ported
    to liboil, which
    enables us to take advantage of modern SIMD instruction sets, like MMX or SSE/SSE2. (Lennart)
  • Support for channel maps which allow applications to assign
    specific speaker positions to logical channels. This enables support
    for “surround sound”. In addition we now support separate volumes for
    all channels. (Lennart)
  • Support for hardware volume control for drivers that support
    it. (Lennart, Pierre)
  • Local users may now be authenticated just by membership in a
    UNIX group, without the need to exchange authentication cookies. (Lennart)
  • A new driver module, module-detect, which automatically detects
    which local output devices are available and loads the needed
    drivers. Supports ALSA, OSS, Solaris and Win32 devices. (Lennart, Pierre)
  • Two new modules implementing RTP/SDP/SAP based multicast audio
    streaming. Useful for streaming music to multiple PCs with speakers
    simultaneously. Or for implementing a simple “always-on” conferencing
    solution for the LAN. Or for sharing a single MIC/LINE-IN jack on the
    LAN. (Lennart)
  • Two new modules for connecting Polypaudio to a JACK audio server
    (Lennart)
  • A new Zeroconf (mDNS/DNS-SD) publisher module. (Lennart)
  • A new module to control the volume of an output sink with a
    LIRC-supported infrared remote control, and another one for doing so
    with a multimedia keyboard. (Lennart)
  • Support for resolving remote host names asynchronously using libasyncns. (Lennart)
  • A simple proof-of-concept HTTP module, which dumps the current daemon status to HTML. (Lennart)
  • Added proper validity checking of passed parameters to every single
    API function. (Lennart)
  • Last but not least, the documentation has been beefed up a lot and
    is no longer just simple doxygen-based API documentation. (Pierre, Lennart)

Sounds good, doesn’t it? But that’s not all!

We’re really excited about this new Polypaudio release. However,
there is more exciting news in the Polypaudio world. Pierre
implemented a Polypaudio plugin for alsa-libs. This means you
may now use any ALSA-aware application to access a Polypaudio sound
server! The patch has already been merged upstream, and will probably
appear in the next official release of alsa-plugins.

Due to the massive internal changes we had to make a lot of modifications to
the public API. Hence applications which currently make use of the Polypaudio
0.7 API need to be updated. The patches or packages I maintain will be updated
in the next weeks one-by-one. (That is: xmms-polyp, the MPlayer patch, the
libao patch, the GStreamer patch and the PortAudio patch)

A side note: I wonder what this new release means for Polypaudio in
Debian. I’ve never been informed by the Debian maintainers of
Polypaudio that it has been uploaded to Debian, and never of the
removal either. In fact I never exchanged a single line with those who
were the Debian maintainers of Polypaudio. Is this the way the
Debian project wants its developers to communicate with
upstream? I doubt that!

How does Polypaudio compare to ESOUND?

Polypaudio does everything that ESOUND does, and much more. It is a
fully compatible drop-in replacement. With a small script you can make
it command line compatible (including autospawning). ESOUND clients
may connect to our daemon just like they did to the original ESOUND
daemon, since we implemented a compatibility module for the ESOUND
protocol.

Support for other well known networked audio protocols (such as
NAS) should be easy to add – if there is a need.

For a full list of the features that Polypaudio has over ESOUND,
see Polypaudio’s
homepage
.

How does Polypaudio compare to ALSA‘s dmix?

Some people might ask whether there still is a need for a sound
server in times where ALSA’s dmix plugin is available. The
answer is: yes!

Firstly, Polypaudio is networked, which dmix is not. However,
there are many reasons why Polypaudio is useful on non-networked
systems as well. Polypaudio is portable: it is available not just
for Linux but for FreeBSD, Solaris and even Microsoft Windows.
Polypaudio is extensible: there is a broad range of additional
modules available which allow the user to use Polypaudio in many
exciting ways ALSA doesn’t offer. In Polypaudio, streams, devices
and other server internals can be monitored and introspected
freely. The volumes of multiple streams may be manipulated
independently of each other, which allows exciting new applications
like a work-alike of the per-application mixer tool featured in the
upcoming Windows Vista. In multi-user systems, Polypaudio offers a
secure and safe way to allow multiple users to access the sound
device simultaneously. Polypaudio may be accessed through both the
ESOUND and ALSA APIs. In addition, ALSA dmix is still not supported
properly by many ALSA clients, and is difficult to set up.

A side note: dmix forks off its own simple sound daemon anyway,
hence there is no big difference from using Polypaudio with the
ALSA plugin in auto-spawning mode. (Though admittedly, those ALSA
clients that don’t work properly with dmix won’t work properly with
our ALSA plugin either, since they actually use the ALSA API
incorrectly.)

How does Polypaudio compare to JACK?

Every time people discuss sound servers on Unix/Linux and which is
the right way to go for desktops, JACK gets mentioned and suggested
by some as a replacement for ESOUND for the desktop. However, this
is not practical. JACK is not intended to be a desktop sound
server; instead it is designed with professional audio in mind. Its
semantics are different from other sound servers: e.g. it uses
exclusively floating point samples, doesn’t deal directly with
interleaved channels, and maintains a server-global timeline which
may be stopped and seeked around. All that translates badly to
desktop usage. JACK is really nice software, but it is just not
designed for the normal desktop user, who is not working on
professional audio production.

Since we think that JACK is really a nice piece of work, we added
two new modules to Polypaudio which can be used to hook it up to a
JACK server.

Get Polypaudio 0.8, while it is hot!

BTW: We’re looking for a logo for Polypaudio. Feel free to send us your suggestions!

Update: The Debian rant is unjust to Jeff Waugh. In fact, he had informed me that he prepared Debian packages of Polypaudio. I just never realized that he had actually uploaded them to Debian. What still stands, however, is that I’ve not been informed or asked about the removal.