Tag Archives: pets

Wanted: Junior Support Technician

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-junior-support-technician/

Backblaze is growing, and as we grow we want to make sure our customers are very well taken care of! One of the departments that grows along with our customer base is Support, located in our San Mateo, California headquarters. Want to jump-start your career? Take a look below, and if this sounds like you, apply to join our team!

Responsibilities:

  • Answer questions in the support queue that are commonly covered in the FAQ.
  • Install and uninstall programs on Mac and PC.
  • Communicate clearly with customers via email.
  • Learn and expand your knowledge base to become a Tech Support Agent.
  • Learn how to navigate the Zendesk support tool and create helpful macros and workflow views.
  • Create receipts, via template, for users who ask for them.
  • Respond to any of your tickets that get a reply.
  • Ask questions to facilitate learning.
  • Obtain skills and knowledge to move into a Tier 2 position.

Requirements:

  • Excellent communication, time management, problem solving, and organizational skills.
  • Ability to learn quickly.
  • Position based in San Mateo, California.

Backblaze Employees Have:

  • Good attitude and willingness to do whatever it takes to get the job done.
  • Strong desire to work for a small, fast-paced company.
  • Desire to learn and adapt to rapidly changing technologies and work environment.
  • Comfortable with well-behaved pets in the office.

Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our “no policy” vacation policy.

If This Sounds Like You:
Send an email to jobscontact@backblaze.com with:

  1. The position title in the subject line
  2. Your resume attached
  3. An overview of your relevant experience

The post Wanted: Junior Support Technician appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Wanted: Sales Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-sales-engineer/

At inception, Backblaze was a consumer company. Thousands upon thousands of individuals came to our website and gave us $5/mo to keep their data safe. But we didn’t sell business solutions. It took us years before we had a sales team. In the last couple of years, we’ve released products that businesses of all sizes love: Backblaze B2 Cloud Storage and Backblaze for Business Computer Backup. Those businesses want to integrate Backblaze deeply into their infrastructure, so it’s time to hire our first Sales Engineer!

Company Description:
Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable, low-cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/GB/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

We’ve managed to nurture a team-oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

Some Backblaze Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

Backblaze B2 cloud storage is a building block for almost any computing service that requires storage. Customers need our help integrating B2 into everything from iOS apps to Docker containers. Some customers integrate directly with the API using the programming language of their choice; others want to solve a specific problem using ready-made software that is already integrated with B2.

At the same time, our computer backup product is deepening its integration into enterprise IT systems. We are commonly asked how to set Windows policies, integrate with Active Directory, and install the client via remote management tools.

We are looking for a sales engineer who can help our customers navigate the integration of Backblaze into their technical environments.

Are you 1/2” deep into many different technologies, and unafraid to dive deeper?

Can you confidently talk with customers about their technology, even if you have to look up all the acronyms right after the call?

Are you excited to set up complicated software in a lab and write knowledge base articles about your work?

Then Backblaze is the place for you!

Enough about Backblaze already, what’s in it for me?
In this role, you will be given the opportunity to learn about the technologies that drive innovation today: diverse technologies that customers are using day in and day out. And more importantly, you’ll learn how to learn new technologies.

Just as an example, in the past 12 months, we’ve had the opportunity to learn and become experts in these diverse technologies:

  • How to set up VM servers for lab environments, both on-prem and using cloud services.
  • How to create an automatically “resetting” demo environment for the sales team.
  • How to set up Microsoft Domain Controllers with Active Directory and AD Federation Services.
  • The basics of OAuth and web single sign-on (SSO).
  • Video archive workflows, from camera to media asset management systems.
  • How to upload/download files from JavaScript by enabling CORS.
  • How to install and monitor online backup installations using RMM tools like JAMF.
  • Tape (LTO) systems. (Yes – people still use tape for storage!)

How can I know if I’ll succeed in this role?

You have:

  • Confidence. Be able to ask customers questions about their environments and convey to them your technical acumen.
  • Curiosity. Always want to learn about customers’ situations, how they got there and what problems they are trying to solve.
  • Organization. You’ll work with customers, integration partners, and Backblaze team members on projects of various lengths. You can context switch and either have a great memory or keep copious notes. Your checklists have their own checklists.

You are versed in:

  • The fundamentals of Windows, Linux and Mac OS X operating systems. You shouldn’t be afraid to use a command line.
  • Building, installing, integrating and configuring applications on any operating system.
  • Debugging failures – reading logs, monitoring usage, and effective Google searching to fix problems should excite you.
  • The basics of TCP/IP networking and the HTTP protocol.
  • Novice development skills in any programming/scripting language. Have basic understanding of data structures and program flow.

Your background contains:

  • Bachelor’s degree in computer science or the equivalent.
  • 2+ years of experience as a pre or post-sales engineer.

The right extra credit:

There are literally hundreds of previous experiences you could have had that would make you perfect for this job. Some experiences that we know would be helpful for us are below, but make sure you tell us your stories!

  • Experience using or programming against Amazon S3.
  • Experience with large on-prem storage – NAS, SAN, Object. And backing up data on such storage with tools like Veeam, Veritas and others.
  • Experience with photo or video media. Media archiving is a key market for Backblaze B2.
  • Programming Arduinos to automatically feed your dog.
  • Experience programming against web or REST APIs. (Point us towards your projects, if they are open source and available to link to.)
  • Experience with sales tools like Salesforce.
  • 3D print door stops.
  • Experience with Windows Servers, Active Directory, Group policies and the like.

    What’s it like working with the Sales team?

    The Backblaze sales team collaborates. We help each other out by sharing ideas, templates, and our customers’ experiences. When we talk about our accomplishments, there is no “I did this,” only “we”. We are truly a team.

    We are honest with each other and our customers, and we communicate openly. We aim to have fun by embracing crazy ideas and creative solutions. We try to think not outside the box, but with no boxes at all. Customers are the driving force behind the success of the company, and we care deeply about their success.

    If this all sounds like you:

    1. Send an email to jobscontact@backblaze.com with the position in the subject line.
    2. Tell us a bit about your Sales Engineering experience.
    3. Include your resume.

    The post Wanted: Sales Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Wanted: Datacenter Technician

    Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-datacenter-technician/

    As we shoot way past 400 petabytes of data under management, we need some help scaling up our datacenters! We’re on the lookout for datacenter technicians who can help us. This role is located near the Sacramento, California area. If you want to join a dynamic team that helps keep our nearly 90,000 hard drives spinning, this might be the job for you!

    Responsibilities

    • Work as Backblaze’s physical presence in Sacramento area datacenter(s).
    • Help maintain physical infrastructure including racking equipment, replacing hard drives and other system components.
    • Repair and troubleshoot defective equipment with minimal supervision.
    • Support the datacenter’s 24×7 staff to install new equipment, handle after-hours emergencies, and perform other tasks.
    • Help manage onsite inventory of hard drives, cables, rails and other spare parts.
    • RMA defective components.
    • Setup, test and activate new equipment via the Linux command line.
    • Help train new Datacenter Technicians as needed.
    • Help with projects to install new systems and services as time allows.
    • Follow and improve Datacenter best practices and documentation.
    • Maintain a clean and well organized work environment.
    • On-call responsibilities require being within an hour of SunGard’s Rancho Cordova/Roseville facility and occasional 24×7 trips onsite to resolve issues that can’t be handled remotely.
    • Work days may include Saturday and/or Sunday (e.g. working Tuesday – Saturday).

    Requirements

    • Excellent communication, time management, problem solving and organizational skills.
    • Ability to learn quickly.
    • Ability to lift/move 50-75 lbs and work down near the floor on a daily basis.
    • Position based near Sacramento, California and may require periodic visits to the corporate office in San Mateo.
    • May require travel to other Datacenters to provide coverage and/or to assist
      with new site set-up.

    Backblaze Employees Have:

    • Good attitude and willingness to do whatever it takes to get the job done.
    • Strong desire to work for a small, fast-paced company.
    • Desire to learn and adapt to rapidly changing technologies and work environment.
    • Comfortable with well-behaved pets in the office.

    Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our “no policy” vacation policy.

    If This Sounds Like You:
    Send an email to jobscontact@backblaze.com with:

    1. Datacenter Tech in the subject line
    2. Your resume attached
    3. An overview of your relevant experience

    The post Wanted: Datacenter Technician appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Wanted: Fixed Assets Accountant

    Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-fixed-assets-accountant/

    As Backblaze continues to grow, we’re expanding our accounting team! We’re looking for a seasoned Fixed Asset Accountant to help us with fixed assets and equipment leases.

    Job Duties:

    • Maintain and review fixed assets.
    • Record fixed asset acquisitions and dispositions.
    • Review and update the detailed schedule of fixed assets and accumulated depreciation.
    • Calculate depreciation for all fixed assets.
    • Investigate the potential obsolescence of fixed assets.
    • Coordinate data center asset dispositions with the Operations team.
    • Conduct periodic physical inventory counts of fixed assets. Work with Operations team on cycle counts.
    • Reconcile the balance in the fixed asset subsidiary ledger to the summary-level account in the general ledger.
    • Track company expenditures for fixed assets in comparison to the capital budget and management authorizations.
    • Prepare audit schedules relating to fixed assets, and assist the auditors in their inquiries.
    • Recommend to management any updates to accounting policies related to fixed assets.
    • Manage equipment leases.
    • Engage and negotiate acquisition of new equipment lease lines.
    • Overall control of original lease documentation and maintenance of master lease files.
    • Facilitate and track the routing and execution of various lease-related agreements, documents, and forms.
    • Establish and maintain proper controls to track expirations, renewal options, and all other critical dates.
    • Perform other duties and special projects as assigned.

    Qualifications:

    • 5-6 years of relevant accounting experience.
    • Knowledge of inventory and cycle counting preferred.
    • QuickBooks, Excel, and Word experience desired.
    • Organized and meticulous, with excellent attention to detail; a quick learner.
    • Good interpersonal skills and a team player.
    • Flexibility and ability to adapt and wear different hats.

    Backblaze Employees Have:

    • Good attitude and willingness to do whatever it takes to get the job done.
    • Strong desire to work for a small, fast-paced company.
    • Desire to learn and adapt to rapidly changing technologies and work environment.
    • Comfortable with well-behaved pets in the office.

    This position is located in San Mateo, California. Regular attendance in the office is expected. Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our “no policy” vacation policy.

    If This Sounds Like You:
    Send an email to jobscontact@backblaze.com with:

    1. Fixed Asset Accountant in the subject line
    2. Your resume attached
    3. An overview of your relevant experience

    The post Wanted: Fixed Assets Accountant appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Bitcoin: In Crypto We Trust

    Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/12/bitcoin-in-crypto-we-trust.html

    Tim Wu, who coined “net neutrality”, has written an op-ed in the New York Times called “The Bitcoin Boom: In Code We Trust”. He is wrong about “code”.

    The wrong “trust”

    Wu builds a big manifesto about how real-world institutions can’t be trusted. Certainly, this reflects the rhetoric from a vocal wing of Bitcoin fanatics, but it’s not the Bitcoin manifesto.

    Instead, the word “trust” in the Bitcoin paper is much narrower, referring to how online merchants can’t trust credit-cards (for example). When I bought school supplies for my niece when she studied in Canada, the online site wouldn’t accept my U.S. credit card. They didn’t trust my credit card. However, they trusted my Bitcoin, so I used that payment method instead, and succeeded in the purchase.

    Real-world currencies like dollars are tethered to the real world, which means no single transaction can be trusted, because “they” (the credit-card company, the courts, etc.) may decide to reverse the transaction. The manifesto behind Bitcoin is that a transaction cannot be reversed — and thus, can always be trusted.

    Deliberately confusing the micro-trust in a transaction and macro-trust in banks and governments is a sort of bait-and-switch.

    The wrong inspiration

    Wu claims:

    “It was, after all, a carnival of human errors and misfeasance that inspired the invention of Bitcoin in 2009, namely, the financial crisis.”

    Not true. Bitcoin did not appear fully formed out of the void, but was instead based upon a series of innovations that predate the financial crisis by a decade. Moreover, the financial crisis had little to do with “currency”. The dollar and other major currencies were essentially unscathed by the crisis. Certainly, enthusiasts looking backward like to cherry-pick the financial crisis as yet one more reason why the offline world sucks, but it had little to do with Bitcoin.

    In crypto we trust

    It’s not in code that Bitcoin trusts, but in crypto. Satoshi makes that clear in one of his posts on the subject:

    A generation ago, multi-user time-sharing computer systems had a similar problem. Before strong encryption, users had to rely on password protection to secure their files, placing trust in the system administrator to keep their information private. Privacy could always be overridden by the admin based on his judgment call weighing the principle of privacy against other concerns, or at the behest of his superiors. Then strong encryption became available to the masses, and trust was no longer required. Data could be secured in a way that was physically impossible for others to access, no matter for what reason, no matter how good the excuse, no matter what.

    You don’t possess Bitcoins. Instead, all the coins are on the public blockchain under your “address”. What you possess is the secret, private key that matches the address. Transferring Bitcoin means using your private key to unlock your coins and transfer them to another. If you print out your private key on paper, and delete it from the computer, it can never be hacked.

    Trust is in this crypto operation. Trust is in your private crypto key.

    We don’t trust the code

    The manifesto “in code we trust” has been proven wrong again and again. We don’t trust computer code (software) in the cryptocurrency world.

    The most profound example is something known as the “DAO” on top of Ethereum, Bitcoin’s major competitor. Ethereum allows “smart contracts” containing code. The quasi-religious manifesto of the DAO smart-contract is that the “code is the contract”, that all the terms and conditions are specified within the smart-contract code, completely untethered from real-world terms-and-conditions.

    Then a hacker found a bug in the DAO smart-contract and stole most of the money.

    In principle, this is perfectly legal, because “the code is the contract”, and the hacker just used the code. In practice, the system didn’t live up to this. The Ethereum core developers, acting as central bankers, rewrote the Ethereum code to fix this one contract, returning the money back to its original owners. They did this because those core developers were themselves heavily invested in the DAO and got their money back.

    Similar things happen with the original Bitcoin code. A disagreement has arisen about how to expand Bitcoin to handle more transactions. One group wants smaller and “off-chain” transactions. Another group wants a “large blocksize”. This caused a “fork” in Bitcoin with two versions, “Bitcoin” and “Bitcoin Cash”. The fork championed by the core developers (central bankers) is worth around $20,000 right now, while the other fork is worth around $2,000.

    So it’s still “in central bankers we trust”, it’s just that now these central bankers are mostly online instead of offline institutions. They have proven to be even more corrupt than real-world central bankers. It’s certainly not the code that is trusted.

    The bubble

    Wu repeats the well-known reference to Amazon during the dot-com bubble. If you bought Amazon’s stock for $107 right before the dot-com crash, it would still be one of the wisest investments you could’ve made. Amazon shares are now worth around $1,200 each.

    The implication is that Bitcoin, too, may have such long term value. Even if you buy it today and it crashes tomorrow, it may still be worth ten-times its current value in another decade or two.

    This is a poor analogy, for three reasons.

    The first reason is that we knew the Internet had fundamentally transformed commerce. We knew there were going to be winners in the long run, it was just a matter of picking who would win (Amazon) and who would lose (Pets.com). We have yet to prove Bitcoin will be similarly transformative.

    The second reason is that businesses are real; they generate real income. While the stock price may include some irrational exuberance, it’s ultimately still based on the rational expectations of how much the business will earn. With Bitcoin, it’s almost entirely irrational exuberance — there are no long-term returns.

    The third flaw in the analogy is that there are an essentially infinite number of cryptocurrencies. We saw this today as Coinbase started trading Bitcoin Cash, a fork of Bitcoin. The two are nearly identical, so there’s little reason one should be so much more valuable than another. It’s only a fickle fad that makes one more valuable than another, not business fundamentals. The successful future cryptocurrency is unlikely to exist today, but will be invented in the future.

    The lesson of the dot-com bubble is not that Bitcoin will have long-term value, but that cryptocurrency companies like Coinbase and BitPay will have long-term value. Or, the lesson is that “old” companies like JPMorgan that are early adopters of the technology will grow faster than their competitors.

    Conclusion

    The point of Wu’s op-ed is to distinguish between trust in traditional real-world institutions and trust in computer software code. This is an inaccurate reading of the situation.

    Bitcoin is not about replacing real-world institutions but about untethering online transactions.

    The trust in Bitcoin is in crypto — the power crypto gives individuals instead of third-parties.

    The trust is not in the code. Bitcoin is a “cryptocurrency” not a “codecurrency”.

    Browser hacking for 280 character tweets

    Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/09/browser-hacking-for-280-character-tweets.html

    Twitter has raised the limit to 280 characters for a select number of people. However, they left open a hole, allowing anybody to make large tweets with a little bit of hacking. The skills needed are basic hacking skills, which I thought I’d write up in a blog post.


    Specifically, the skills you will exercise are:

    • basic command-line shell
    • basic HTTP requests
    • basic browser DOM editing

    The short instructions

    The basic instructions were found in tweets like the following:
    These instructions are clear to the average hacker, but of course, a bit difficult for those learning hacking, hence this post.

    The command-line

    The basics of most hacking start with knowledge of the command-line. This is the “Terminal” app under macOS or cmd.exe under Windows. Almost always when you see hacking dramatized in the movies, they are using the command-line.
    In the beginning, the command-line is all computers had. To do anything on a computer, you had to type a “command” telling it what to do. What we see as the modern graphical screen is a layer on top of the command-line, one that translates clicks of the mouse into the raw commands.
    On most systems, the command-line is known as “bash”. This is what you’ll find on Linux and macOS. Windows historically has had a different command-line that uses slightly different syntax, though in the last couple years, they’ve also supported “bash”. You’ll have to install it first, such as by following these instructions.
    You’ll see me use commands that may not yet be installed on your “bash” command-line, like nc and curl. You’ll need to run a command to install them, such as:
    sudo apt-get install nc curl
    The thing to remember about the command-line is that the mouse doesn’t work. You can’t click to move the cursor as you normally do in applications. That’s because the command-line predates the mouse by decades. Instead, you have to use arrow keys.
    I’m not going to spend much effort discussing the command-line, as a complete explanation is beyond the scope of this document. Instead, I’m assuming the reader either already knows it, or will learn-from-example as we go along.

    Web requests

    The basics of how the web works are really simple. A request to a web server is just a small packet of text, such as the following, which does a search on Google for the search-term “penguin” (presumably, you are interested in knowing more about penguins):
    GET /search?q=penguin HTTP/1.0
    Host: www.google.com
    User-Agent: human
    The command we are sending to the server is GET, meaning get a page. We are accessing the URL /search, which on Google’s website, is how you do a search. We are then sending the parameter q with the value penguin. We also declare that we are using version 1.0 of the HTTP (hyper-text transfer protocol).
    Following the first line there are a number of additional headers. In one header, we declare the Host name that we are accessing. Web servers can contain many different websites, with different names, so this header is usually important.
    We also add the User-Agent header. The “user-agent” means the “browser” that you use, like Edge, Chrome, Firefox, or Safari. It allows servers to send content optimized for different browsers. Since we are sending web requests without a browser here, we are joking around saying human.
    Here’s what happens when we use the nc program to send this to a google web server:
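    As a rough sketch of that session (this assumes the plain-HTTP port 80; the exact response from Google will vary), it looks something like this:

    nc www.google.com 80
    GET /search?q=penguin HTTP/1.0
    Host: www.google.com
    User-Agent: human

    HTTP/1.0 200 OK
    (more headers from the server, followed by many screens of HTML)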
    The first part is us typing, until we hit the [enter] key to create a blank line. After that point is the response from the Google server. We get back a result code (OK), followed by more headers from the server, and finally the contents of the webpage, which goes on for many screens. (We’ll talk about what web pages look like below.)
    Note that a lot of HTTP headers are optional and really have little influence on what’s going on. They are just junk added to web requests. For example, we see Google report a P3P header, which is some relic of 2002 that nobody uses anymore, as far as I can tell. Indeed, if you follow the URL in the P3P header, Google pretty much says exactly that.
    I point this out because the request I show above is a simplified one. In practice, most requests contain a lot more headers, especially Cookie headers. We’ll see that later when making requests.

    Using cURL instead

    Sending the raw HTTP request to the server, and getting raw HTTP/HTML back, is annoying. The better way of doing this is with the tool known as cURL, or plainly, just curl. You may be familiar with the older command-line tool wget. cURL is similar, but more flexible.
    To use curl for the experiment above, we’d do something like the following. We are saving the web page to “penguin.html” instead of just spewing it on the screen.
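    Roughly, the command would look like this (a sketch; the exact flags can vary, and -A just sets the User-Agent header as in the raw request above):

    curl -A human -o penguin.html "http://www.google.com/search?q=penguin"

    The -o option writes the response body to penguin.html instead of printing it to the screen.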
    Underneath, cURL builds an HTTP header just like the one we showed above, and sends it to the server, getting the response back.

    Web-pages

    Now let’s talk about web pages. When you look at the web page we got back from Google while searching for “penguin”, you’ll see that it’s intimidatingly complex. I mean, it intimidates me. But it all starts from some basic principles, so we’ll look at some simpler examples.
    The following is text of a simple web page:
    <html>
    <body>
    <h1>Test</h1>
    <p>This is a simple web page</p>
    </body>
    </html>
    This is HTML, “hyper-text markup language”. As its name implies, we “mark up” text, such as declaring the first text as a level-1 header (H1), and the following text as a paragraph (P).
    In a web browser, this gets rendered as something that looks like the following. Notice how a header is formatted differently from a paragraph. Also notice that web browsers can use local files as well as make remote requests to web servers:
    You can right-mouse click on the page and do a “View Source”. This will show the raw source behind the web page:
    Web pages don’t just contain marked-up text. They contain two other important features, style information that dictates how things appear, and script that does all the live things that web pages do, from which we build web apps.
    So let’s add a little bit of style and scripting to our web page. First, let’s view the source we’ll be adding:
    In our header (H1) field, we’ve added the attribute to the markup giving this an id of mytitle. In the style section above, we give that element a color of blue, and tell it to align to the center.
    Then, in our script section, we’ve told it that when somebody clicks on the element “mytitle”, it should send an “alert” message of “hello”.
    This is what our web page now looks like, with the center blue title:
    When we click on the title, we get a popup alert:
    Thus, we see an example of the three components of a webpage: markup, style, and scripting.

    Chrome developer tools

    Now we go off the deep end. Right-mouse click on “Test” (not normal click, but right-button click, to pull up a menu). Select “Inspect”.
    You should now get a window that looks something like the following. Chrome splits the screen in half, showing the web page on the left, and its debug tools on the right.
    This looks similar to what “View Source” shows, but it isn’t. Instead, it’s showing how Chrome interpreted the source HTML. For example, our style/script tags should’ve been marked up with a head (header) tag. We forgot it, but Chrome adds it in anyway.
    What Google is showing us is called the DOM, or document object model. It shows us all the objects that make up a web page, and how they fit together.
    For example, it shows us how the style information for #mytitle is created. It first starts with the default style information for an h1 tag, and then how we’ve changed it with our style specifications.
    We can edit the DOM manually. Just double-click on things you want to change. For example, in this screenshot, I’ve changed the style spec from blue to red, and I’ve changed the header and paragraph text. The original file on disk hasn’t changed, but I’ve changed the DOM in memory.
    This is a classic hacking technique. If you don’t like things like paywalls, for example, just right-click on the element blocking your view of the text, “Inspect” it, then delete it. (This works for some paywalls).
    This edits the markup and style info, but changing the scripting stuff is a bit more complicated. To do that, click on the [Console] tab. This is the scripting console, and allows you to run code directly as part of the webpage. We are going to run code that resets what happens when we click on the title. In this case, we are simply going to change the message to “goodbye”.
    Now when we click on the title, we indeed get the message:
    Again, a common way to get around paywalls is to run some code like that, changing which functions will be called.

    Putting it all together

    Now let’s put this all together in order to hack Twitter to allow us (the non-chosen) to tweet 280 characters. Review Dildog’s instructions above.
    The first step is to get to Chrome Developer Tools. Dildog suggests F12. I suggest right-clicking on the Tweet button (or Reply button, as I use in my example) and doing “Inspect”, as I describe above.
    You’ll now see your screen split in half, with the DOM toward the right, similar to how I describe above. However, Twitter’s app is really complex. Well, not really complex, it’s all basic stuff when you come right down to it. It’s just so much stuff — it’s a large web app with lots of parts. So we have to dive in without understanding everything that’s going on.
    The Tweet/Reply button we are inspecting is going to look like this in the DOM:
    The Tweet/Reply button is currently greyed out because it has the “disabled” attribute. You need to double-click on it and remove that attribute. There is also a “disabled” part in the class attribute. Double-click on that and remove just the word “disabled”, without disturbing the rest of the value. This should change the button from disabled to enabled. It won’t be greyed out, and it’ll respond when you click on it.
    Now click on it. You’ll get an error message, as shown below:
    What we’ve done here is bypass what’s known as client-side validation. The script in the web page prevented sending Tweets longer than 140 characters. Our editing of the DOM changed that, allowing us to send a bad request to the server. Bypassing client-side validation this way is the source of a lot of hacking.
    But Twitter still does server-side validation as well. They know any client-side validation can be bypassed, and are in on the joke. They tell us hackers “You’ll have to be more clever”. So let’s be more clever.
    In order to make longer 280-character tweets work for select customers, they had to change something on the server side. What they added was a “weighted_character_count=true” parameter to the HTTP request. We just need to repeat the request we generated above, adding this parameter.
    In theory, we could do this by fiddling with the scripting. Dildog describes a different way: he copies the request out of the browser, edits it, then sends it via the command-line using curl.
    We’ve used the [Elements] and [Console] tabs in Chrome’s DevTools. Now we are going to use the [Network] tab. This lists all the requests the web page has made to the server. The twitter app is constantly making requests to refresh the content of the web page. The request we made trying to do a long tweet is called “create”, and is red, because it failed.
    Google Chrome gives us a number of ways to duplicate the request. The most useful is that it copies it as a full cURL command we can just paste onto the command-line. We don’t even need to know cURL, it takes care of everything for us. On Windows, since you have two command-lines, it gives you a choice to use the older Windows cmd.exe, or the newer bash.exe. I use the bash version, since I don’t know where to get the Windows command-line version of cURL.exe.
    There’s a lot going on here. The first thing to notice is the long xxxxxx strings. That’s actually not in the original screenshot; I edited the picture. That’s because these are session cookies. If you inserted them into your browser, you’d hijack my Twitter session, and be able to tweet as me (such as making Carlos Danger style tweets). Therefore, I have to remove them from the example.
    At the top of the screen is the URL that we are accessing, which is https://twitter.com/i/tweet/create. Much of the rest of the screen uses the cURL -H option to add a header. These are all the HTTP headers that I describe above. Finally, at the bottom, is the --data section, which contains the data bits related to the tweet, especially the tweet itself.
    We need to edit either the URL above to read https://twitter.com/i/tweet/create?weighted_character_count=true, or add &weighted_character_count=true to the --data section at the bottom (either works). Remember: the mouse doesn’t work on the command-line, so you have to use the cursor keys to navigate backwards in the line. Also, since the line is larger than the screen, it wraps onto several visual lines, even though it’s all a single line as far as the command-line is concerned.
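    Stripped way down, the edit looks roughly like this (a sketch only; the real command copied from Chrome carries many more -H headers, the cookie values are redacted as xxxxxx, and the --data section is whatever the browser originally sent):

    # as copied from Chrome's "Copy as cURL":
    curl 'https://twitter.com/i/tweet/create' -H 'cookie: xxxxxx' ... --data '...'

    # after adding the parameter (everything else left exactly as copied):
    curl 'https://twitter.com/i/tweet/create?weighted_character_count=true' -H 'cookie: xxxxxx' ... --data '...'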
    Now just hit [return] on your keyboard, and the tweet will be sent to the server, which at the moment, works. Presto!
    Twitter will either enable or disable the feature for everyone in a few weeks, at which point, this post won’t work. But the reason I’m writing this is to demonstrate the basic hacking skills. We manipulate the web pages we receive from servers, and we manipulate what’s sent back from our browser back to the server.

    Easier: hack the scripting

    Instead of messing with the DOM and editing the HTTP request, the better solution would be to change the scripting that does both DOM client-side validation and HTTP request generation. The only reason Dildog above didn’t do that is that it’s a lot more work trying to find where all this happens.
    Others have, though. @Zemnmez did just that, though his technique works for the alternate TweetDeck client (https://tweetdeck.twitter.com) instead of the default client. Go copy his code from here, then paste it into the DevTools scripting [Console]. It’ll go in and replace some scripting functions, much like my simpler example above.
    The console will show a stream of error messages because TweetDeck has bugs; ignore those.
    Now you can effortlessly do long tweets as normal, without all the messing around I’ve spent so much text in this blog post describing.
    Now, as I’ve mentioned before, you are only editing what’s going on in the current web page. If you refresh this page, or close it, everything will be lost. You’ll have to re-open the DevTools scripting console and re-paste the code. The easier way of doing this is to use the [Sources] tab instead of [Console] and use the “Snippets” feature to save this bit of code in your browser, to make it easier next time.
    The even easier way is to use Chrome extensions like TamperMonkey and GreaseMonkey that’ll take care of this for you. They’ll save the script, and automatically run it when they see you open the TweetDeck webpage again.
    An even easier way is to use one of the several Chrome extensions written in the past day specifically designed to bypass the 140 character limit. Since the purpose of this blog post is to show you how to tamper with your browser yourself, rather than help you with Twitter, I won’t list them.

    Conclusion

    Tampering with the web-page the server gives you, and the data you send back, is a basic hacker skill. In truth, there is a lot to this. You have to get comfortable with the command-line, using tools like cURL. You have to learn how HTTP requests work. You have to understand how web pages are built from markup, style, and scripting. You have to be comfortable using Chrome’s DevTools for messing around with web page elements, network requests, scripting console, and scripting sources.
    So it’s rather a lot, actually.
    My hope with this page is to show you a practical application of all this, without getting too bogged down in fully explaining how every bit works.

    Pirate Sites and the Dying Art of Customer Service

    Post Syndicated from Andy original https://torrentfreak.com/pirate-sites-and-the-dying-art-of-customer-service-170803/

    Consumers of products and services in the West are now more educated than ever before. They often research before making a purchase and view follow-up assistance as part of the package. Indeed, many companies live and die on the levels of customer support they’re able to offer.

    In this ultra-competitive world, we send faulty technology items straight back to the store, cancel our unreliable phone providers, and switch to new suppliers for the sake of a few dollars, pounds or euros per month. But does this demanding environment translate to the ‘pirate’ world?

    It’s important to remember that when the first waves of unauthorized platforms appeared after the turn of the century, content on the Internet was firmly established as being ‘free’. When people first fired up KaZaA, LimeWire, or the few fledgling BitTorrent portals, few could believe their luck. Nevertheless, the fact that there was no charge for content was quickly accepted as the standard.

    That’s a position that continues today but for reasons that are not entirely clear, some users of pirate sites treat the availability of such platforms as some kind of right, holding them to the same standards of service that they would their ISP, for example.

    One only has to trawl the comments section on The Pirate Bay to see hundreds of examples of people criticizing the quality of uploaded movies, the fact that a software crack doesn’t work, or that some anonymous uploader failed to deliver the latest album quickly enough. That’s aside from the continual complaints screamed on various external platforms which bemoan the site’s downtime record.

    For people who recall the sheer joy of finding a working Suprnova mirror for a few minutes almost 15 years ago, this attitude is somewhat baffling. Back then, people didn’t go ballistic when a site went down, they savored the moment when enthusiastic volunteers brought it back up. There was a level of gratefulness that appears somewhat absent today, in a new world where free torrent and streaming sites are suddenly held to the same standards as Comcast or McDonalds.

    But while a cultural change among users has definitely taken place over the years, the way sites communicate with their users has taken a hit too. Despite the advent of platforms including Twitter and Facebook, the majority of pirate site operators today have a tendency to leave their users completely in the dark when things go wrong, leading to speculation and concern among grateful and entitled users alike.

    So why does The Pirate Bay’s blog stay completely unattended these days? Why do countless sites let dust gather on Twitter accounts that last made an announcement in 2012? And why don’t site operators announce scheduled downtime in advance or let people know what’s going on when the unexpected happens?

    “Honestly? I don’t have the time anymore. I also care less than I did,” one site operator told TF.

    “11 years of doing this shit is enough to grind anybody down. It’s something I need to do but not doing it makes no difference either. People complain in any case. Then if you start [informing people] again they’ll want it always. Not happening.”

    Rather less complimentary was the operator of a large public site. He told us that two decades ago relationships between operators and users were good but have been getting worse ever since.

    “Users of pirate content 20 years ago were highly technical. 10 years ago they were somewhat technical. Right now they are fucking watermelon head puppets. They are plain stupid,” he said.

    “Pirate sites don’t have customers. They have users. The definition of a customer, when related to the web, is a person that actually buys a service. Since pirate sites don’t sell services (I’m talking about public ones) they have no customers.”

    Another site operator told us that his motivations for not interacting with users are based on the changing legal environment, which has become steadily and markedly worse, year upon year.

    “I’m not enjoying being open like before. I used to chat keenly with the users, on the site and IRC [Internet Relay Chat] but i’m keeping my distance since a long time ago,” he told us.

    “There have always been risks but now I lock everything down. I’m not using Facebook in any way personally or for the site and I don’t need the dramas of Twitter. Everytime you engage on there, problems arise with people wanting a piece of you. Some of the staff use it but I advise the contrary where possible.”

    Interested in where the boundaries lie, we asked a couple of sites whether they should be doing more to keep users informed and if that should be considered a ‘customer service’ obligation these days.

    “This is not Netflix and i’m not the ‘have a nice day’ guy from McDonalds,” one explained.

    “If people want Netflix help then go to Netflix. There’s two of us here doing everything and I mean everything. We’re already in a pinch so spending time to answer every retarded question from kids is right out.”

    Our large public site operator agreed, noting that users complain about the most crazy things, including why they don’t have enough space on a drive to download, why a movie that’s out in 2020 hasn’t been uploaded yet, and why can’t they login – when they haven’t even opened an account yet.

    While the responses aren’t really a surprise given the ‘free’ nature of the sites and the volume of visitors, things don’t get any better when moving up (we use the term loosely) to paid ‘pirate’ services.

    Last week, one streaming platform in particular had an absolute nightmare with what appeared to be technical issues. Nevertheless, some of its users, despite only paying a few pounds per month, demanded their pound of flesh from the struggling service.

    One, who raised the topic on Reddit, was advised to ask for his money back for the trouble caused. It raised a couple of eyebrows.

    “Put in a ticket and ask [for a refund], morally they should,” the user said.

    The use of the word “morally” didn’t sit well with some observers, one of whom couldn’t understand how the word could possibly be mentioned in the context of a pirate paying another pirate money, for a pirate service that had broken down.

    “Wait let me get this straight,” the critic said. “You want a refund for a gray market service. It’s like buying drugs off the corner only to find out it’s parsley. Do you go back to the dealer and demand a refund? You live and you learn bud. [Shaking my head] at people in here talking about it being morally responsible…too funny.”

    It’s not clear when pirate sites started being held to the same standards as regular commercial entities but from anecdotal evidence at least, the problem appears to be getting worse. That being said and from what we’ve heard, users can stop holding their breath waiting for deluxe customer service – it’s not coming anytime soon.

    “There’s no way to monetize support,” one admin concludes.

    Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

    GitMiner – Advanced Tool For Mining Github

    Post Syndicated from Darknet original https://www.darknet.org.uk/2017/08/gitminer-advanced-tool-mining-github/?utm_source=darknet&utm_medium=rss&utm_campaign=feed

    GitMiner is an advanced search tool for automation in GitHub; it enables mining GitHub for useful or potentially dangerous information, for example, specific vulnerable or useful WordPress files. This tool aims to facilitate mining the code or snippets on GitHub through the site’s search page. What is Mining Github? GitHub is a web-based Git […]

    The post GitMiner – Advanced Tool For Mining Github appeared first on Darknet.

    Announcing the Winners of the AWS Chatbot Challenge – Conversational, Intelligent Chatbots using Amazon Lex and AWS Lambda

    Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/announcing-the-winners-of-the-aws-chatbot-challenge-conversational-intelligent-chatbots-using-amazon-lex-and-aws-lambda/

    A couple of months ago on the blog, I announced the AWS Chatbot Challenge in conjunction with Slack. The AWS Chatbot Challenge was an opportunity to build a unique chatbot that helped to solve a problem or that would add value for its prospective users. The mission was to build a conversational, natural language chatbot using Amazon Lex and leverage Lex’s integration with AWS Lambda to execute logic or data processing on the backend.

    I know that you all have been anxiously waiting, as much as I was, to hear who the winners of the AWS Chatbot Challenge are. Well, wait no longer: the winners of the AWS Chatbot Challenge have been decided.

    May I have the Envelope Please? (The Trumpets sound)

    The winners of the AWS Chatbot Challenge are:

    • First Place: BuildFax Counts by Joe Emison
    • Second Place: Hubsy by Andrew Riess, Andrew Puch, and John Wetzel
    • Third Place: PFMBot by Benny Leong and his team from MoneyLion.
    • Large Organization Winner: ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang

     

    Diving into the Winning Chatbot Projects

    Let’s take a walkthrough of the details for each of the winning projects to get a view of what made these chatbots distinctive, as well as learn more about the technologies used to implement each chatbot solution.

     

    BuildFax Counts by Joe Emison

    The BuildFax Counts bot was created as a real solution for the BuildFax company, to decrease the amount of time it takes for sales and marketing teams to get answers about permits, or about properties whose permits meet certain criteria.

    BuildFax, a company co-founded by bot developer Joe Emison, has the only national database of building permits, which updates data from approximately half of the United States on a monthly basis. In order to accommodate the many requests that come in from the sales and marketing team regarding permit information, BuildFax has a technical sales support team that fulfills these requests, sent to a ticketing system, by manually writing SQL queries that run across the shards of the BuildFax databases. Since the internal sales support team receives a large number of requests, and due to the manual nature of setting up the queries, it may take several days for the sales and marketing teams to receive an answer.

    The BuildFax Counts chatbot solves this problem by taking the permit inquiry that would normally be sent into a ticket from the sales and marketing team, as input from Slack to the chatbot. Once the inquiry is submitted into Slack, a query executes and the inquiry results are returned immediately.

    Joe built this solution by first creating a nightly export of the data in their BuildFax MySQL RDS database to CSV files that are stored in Amazon S3. From the exported CSV files, an Amazon Athena table was created in order to run quick and efficient queries on the data. He then used Amazon Lex to create a bot to handle the common questions and criteria that may be asked by the sales and marketing teams when seeking data from the BuildFax database, by modeling the language used in the BuildFax ticketing system. He added several different sample utterances and slot types, both custom and Lex-provided, in order to correctly parse every question and criteria combination that could be received from an inquiry. Using Lambda, Joe created a JavaScript Lambda function that receives information from the Lex intent and uses it to build a SQL statement that runs against the aforementioned Athena table, using the AWS SDK for JavaScript in Node.js to return the inquiry count result and the SQL statement used.
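    To give a flavor of the Athena piece, here is a rough sketch of the kind of query such a setup might run, shown via the AWS CLI rather than the AWS SDK for JavaScript the bot actually uses; the database, table, and column names here are hypothetical, not BuildFax’s actual schema:

    # submit a query against the Athena table built from the S3 CSV exports
    aws athena start-query-execution \
      --query-string "SELECT COUNT(*) FROM permits WHERE state = 'CA' AND permit_type = 'roofing'" \
      --query-execution-context Database=permits_db \
      --result-configuration OutputLocation=s3://example-athena-results/

    # fetch the results once the query finishes (using the execution ID returned above)
    aws athena get-query-results --query-execution-id <execution-id>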

    The BuildFax Counts bot is used today by the BuildFax sales and marketing team to immediately get data on inquiries that previously took up to a week to answer.

    Not only is the BuildFax Counts bot our 1st place winner and a wonderful solution, but its creator, Joe Emison, is a great guy. Joe has opted to donate his prize (the $5,000 cash, the $2,500 in AWS Credits, and one re:Invent ticket) to the Black Girls Code organization. I must say, you rock, Joe, for helping these kids get access and exposure to technology.

     

    Hubsy by Andrew Riess, Andrew Puch, and John Wetzel

    The Hubsy bot was created to redefine and personalize the way users traditionally manage their HubSpot account. HubSpot is a SaaS system providing marketing, sales, and CRM software. Hubsy allows users of HubSpot to create and log engagements with customers, provide sales teams with deal status, and retrieve client contact information quickly. Hubsy uses Amazon Lex’s conversational interface to execute commands from the HubSpot API so that users can gain insights, store and retrieve data, and manage tasks directly from Facebook, Slack, or Alexa.

    In order to implement the Hubsy chatbot, Andrew and the team members used AWS Lambda to create a Lambda function with Node.js to parse the user’s request and call the HubSpot API, which will fulfill the initial request or return to the user asking for more information. Terraform was used to automatically set up and update Lambda, CloudWatch Logs, and IAM profiles. Amazon Lex was used to build the conversational piece of the bot, which creates the utterances that a person on a sales team would likely say when seeking information from HubSpot. To integrate with Alexa, the Amazon Alexa skill builder was used to create an Alexa skill, which was tested on an Echo Dot. CloudWatch Logs are used to log the Lambda function information to CloudWatch in order to debug different parts of the Lex intents. In order to validate the code before the Terraform deployment, ESLint was additionally used to ensure the code was linted and proper development standards were followed.

     

    PFMBot by Benny Leong and his team from MoneyLion

    PFMBot, the Personal Finance Management Bot, is a bot to be used with the MoneyLion finance group, which offers customers online financial products: loans, credit monitoring, and a free credit score service to improve the financial health of their customers. Once a user signs up for an account on the MoneyLion app or website, the user has the option to link their bank accounts with the MoneyLion APIs. Once the bank account is linked to the APIs, the user will be able to log in to their MoneyLion account and start having a conversation with the PFMBot based on their bank account information.

    The PFMBot UI has a web interface built using JavaScript. The chatbot was created using Amazon Lex to build utterances based on the possible inquiries about the user’s MoneyLion bank account. PFMBot uses the Lex built-in AMAZON slots, and parses and converts the values from the built-in slots to pass to AWS Lambda. The AWS Lambda functions interacting with Amazon Lex are Java-based Lambda functions which call the MoneyLion Java-based internal APIs running on Spring Boot. These APIs obtain account data and related bank account information from the MoneyLion MySQL database.

     

    ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang

    ADP PI (Payroll Innovation) bot is designed to help employees of ADP customers easily review their own payroll details and compare different payroll data by just asking the bot for results. The ADP PI Bot additionally offers issue reporting functionality for employees to report payroll issues and aids HR managers in quickly receiving and organizing any reported payroll issues.

    The ADP Payroll Innovation bot is an ecosystem for ADP payroll consisting of two chatbots: the ADP PI Bot for external clients (employees and HR managers), and the ADP PI DevOps Bot for the internal ADP DevOps team.


    The architecture for the ADP PI DevOps bot is different from that of the ADP PI bot shown above, as it is deployed internally to ADP. The ADP PI DevOps bot allows input from both Slack and Alexa. When input comes in from Slack, Slack sends the request to Lex for it to process the utterance. Lex then calls the Lambda backend, which obtains ADP data sitting in the ADP VPC running within an Amazon VPC. When input comes in from Alexa, a Lambda function is called that also obtains data from the ADP VPC running on AWS.

    The architecture for the ADP PI bot consists of users entering in requests and/or entering issues via Slack. When requests/issues are entered via Slack, the Slack APIs communicate via Amazon API Gateway to AWS Lambda. The Lambda function either writes data into one of the Amazon DynamoDB databases for recording issues and/or sending issues or it sends the request to Lex. When sending issues, DynamoDB integrates with Trello to keep HR Managers abreast of the escalated issues. Once the request data is sent from Lambda to Lex, Lex processes the utterance and calls another Lambda function that integrates with the ADP API and it calls ADP data from within the ADP VPC, which runs on Amazon Virtual Private Cloud (VPC).

    Python and Node.js were the chosen languages for the development of the bots.

    The ADP PI bot ecosystem has the following functional groupings:

    Employee Functionality

    • Summarize Payrolls
    • Compare Payrolls
    • Escalate Issues
    • Evolve PI Bot

    HR Manager Functionality

    • Bot Management
    • Audit and Feedback

    DevOps Functionality

    • Reduce call volume in service centers (ADP PI Bot)
    • Track issues and generate reports (ADP PI Bot)
    • Monitor jobs for various environments (ADP PI DevOps Bot)
    • View job dashboards (ADP PI DevOps Bot)
    • Query job details (ADP PI DevOps Bot)


    Summary

    Let’s all offer the winners of the AWS Chatbot Challenge hearty congratulations on their excellent projects.

    You can review more details on the winning projects, as well as all of the submissions to the AWS Chatbot Challenge, at https://awschatbot2017.devpost.com/submissions. If you are curious about the details of the challenge, including resources, rules, prizes, and judges, you can review the original challenge website here: https://awschatbot2017.devpost.com/.

    Hopefully, you are just as inspired as I am to build your own chatbot using Lex and Lambda. For more information, take a look at the Amazon Lex developer guide or the AWS AI blog post Building Better Bots Using Amazon Lex (Part 1).

    Chat with you soon!

    Tara

    Wanted: Front End Developer

    Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-front-end-developer/

    Want to work at a company that helps customers in over 150 countries around the world protect the memories they hold dear? Do you want to challenge yourself with a business that serves consumers, SMBs, Enterprise, and developers? If all that sounds interesting, you might be interested to know that Backblaze is looking for a Front End Developer​!

    Backblaze is a 10 year old company. Providing great customer experiences is the “secret sauce” that enables us to successfully compete against some of technology’s giants. We’ll finish the year at ~$20MM ARR and are a profitable business. This is an opportunity to have your work shine at scale in one of the fastest growing verticals in tech – Cloud Storage.

    You will utilize HTML, ReactJS, CSS and jQuery to develop intuitive, elegant user experiences. As a member of our Front End Dev team, you will work closely with our web development, software design, and marketing teams.

    On a day-to-day basis, you must be able to convert image mockups to HTML or ReactJS – there’s some production work that needs to get done. But you will also be responsible for helping build out new features, rethinking old processes, and enabling third-party systems to empower our marketing, sales, and support teams.

    Our Front End Developer must be proficient in:

    • HTML, ReactJS
    • UTF-8, Java Properties, and Localized HTML (Backblaze runs in 11 languages!)
    • JavaScript, CSS, Ajax
    • jQuery, Bootstrap
    • JSON, XML
    • Understanding of cross-browser compatibility issues and ways to work around them
    • Basic SEO principles and ensuring that applications will adhere to them
    • Learning about third party marketing and sales tools through reading documentation. Our systems include Google Tag Manager, Google Analytics, Salesforce, and Hubspot

    Struts, Java, JSP, Servlet and Apache Tomcat are a plus, but not required.

    We’re looking for someone that is:

    • Passionate about building friendly, easy-to-use interfaces and APIs.
    • Likes to work closely with other engineers, support, and marketing to help customers.
    • Is comfortable working independently on a mutually agreed upon prioritization queue (we don’t micromanage, we do make sure tasks are reasonably defined and scoped).
    • Diligent with quality control. Backblaze prides itself on giving our team autonomy to get work done, do the right thing for our customers, and keep a pace that is sustainable over the long run. As such, we expect everyone to check in code that is stable. We also have a small QA team that operates as a secondary check when needed.

    Backblaze Employees Have:

    • Good attitude and willingness to do whatever it takes to get the job done
    • Strong desire to work for a small, fast-paced company
    • Desire to learn and adapt to rapidly changing technologies and work environment
    • Comfort with well behaved pets in the office

    This position is located in San Mateo, California. Regular attendance in the office is expected. Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

    If this sounds like you
    Send an email to [email protected] with:

    1. Front End Dev​ in the subject line
    2. Your resume attached
    3. An overview of your relevant experience

    The post Wanted: Front End Developer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Wanted: Automation Systems Administrator

    Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-automation-systems-administrator/

    Are you an Automation Systems Administrator who is looking for a challenging and fast-paced working environment? Want to join our dynamic team and help Backblaze grow to new heights? Our Operations team is a distributed and collaborative group of individual contributors. We work closely together to build and maintain our home-grown cloud storage farm, carefully controlling costs by utilizing open source and various brands of technology, as well as designing our own cloud storage servers. Members of Operations participate in the prioritization and decision making process, and make a difference every day. The environment is challenging, but we balance the challenges with rewards, and we are looking for clever and innovative people to join us.

    Responsibilities:

    • Develop and deploy automated provisioning & updating of systems
    • Lead projects across a range of IT disciplines
    • Understand environment thoroughly enough to administer/debug any system
    • Participate in the 24×7 on-call rotation and respond to alerts as needed

    Requirements:

    • Expert knowledge of automated provisioning
    • Expert knowledge of Linux administration (Debian preferred)
    • Scripting skills
    • Experience in automation/configuration management
    • Position based in the San Mateo, California Corporate Office

    Required for all Backblaze Employees

    • Good attitude and willingness to do whatever it takes to get the job done.
    • Desire to learn and adapt to rapidly changing technologies and work environment.
    • Relentless attention to detail.
    • Excellent communication and problem solving skills.
    • Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

    Company Description:
    Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable low cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/gb/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

    We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

    We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

    Some Backblaze Perks:

    • Competitive healthcare plans
    • Competitive compensation and 401k
    • All employees receive Option grants
    • Unlimited vacation days
    • Strong coffee
    • Fully stocked Micro kitchen
    • Catered breakfast and lunches
    • Awesome people who work on awesome projects
    • Childcare bonus
    • Normal work hours
    • Get to bring your pets into the office
    • San Mateo Office – located near Caltrain and Highways 101 & 280.

    If this sounds like you — follow these steps:

    1. Send an email to [email protected] with the position in the subject line.
    2. Include your resume.
    3. Tell us a bit about your experience and why you’re excited to work with Backblaze.

    The post Wanted: Automation Systems Administrator appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Wanted: Site Reliability Engineer

    Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-site-reliability-engineer/

    Are you a Site Reliability Engineer who is looking for a challenging and fast-paced working environment? Want to join our dynamic team and help Backblaze grow to new heights? Our Operations team is a distributed and collaborative group of individual contributors. We work closely together to build and maintain our home-grown cloud storage farm, carefully controlling costs by utilizing open source and various brands of technology, as well as designing our own cloud storage servers. Members of Operations participate in the prioritization and decision making process, and make a difference every day. The environment is challenging, but we balance the challenges with rewards, and we are looking for clever and innovative people to join us.

    Responsibilities:

    • Lead projects across a range of IT disciplines
    • Understand environment thoroughly enough to administer/debug any system
    • Collaborate on automated provisioning & updating of systems
    • Collaborate on network administration and security
    • Collaborate on database administration
    • Participate in the 24×7 on-call rotation and respond to alerts
      as needed

    Requirements:

    • Expert knowledge of Linux administration (Debian preferred)
    • Scripting skills
    • Experience in automation/configuration management (Ansible preferred)
    • Position based in the San Mateo, California Corporate Office

    Required for all Backblaze Employees

    • Good attitude and willingness to do whatever it takes to get the job done.
    • Desire to learn and adapt to rapidly changing technologies and work environment.
    • Relentless attention to detail.
    • Excellent communication and problem solving skills.
    • Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

    Company Description:
    Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable low cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/gb/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

    We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

    We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

    Some Backblaze Perks:

    • Competitive healthcare plans
    • Competitive compensation and 401k
    • All employees receive Option grants
    • Unlimited vacation days
    • Strong coffee
    • Fully stocked Micro kitchen
    • Catered breakfast and lunches
    • Awesome people who work on awesome projects
    • Childcare bonus
    • Normal work hours
    • Get to bring your pets into the office
    • San Mateo Office – located near Caltrain and Highways 101 & 280.

    If this sounds like you — follow these steps:

    1. Send an email to [email protected] with the position in the subject line.
    2. Include your resume.
    3. Tell us a bit about your experience and why you’re excited to work with Backblaze.

    The post Wanted: Site Reliability Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Wanted: Network Systems Administrator

    Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-network-systems-administrator/

    Are you a Network Systems Administrator who is looking for a challenging and fast-paced working environment? Want to join our dynamic team and help Backblaze grow to new heights? Our Operations team is a distributed and collaborative group of individual contributors. We work closely together to build and maintain our home-grown cloud storage farm, carefully controlling costs by utilizing open source and various brands of technology, as well as designing our own cloud storage servers. Members of Operations participate in the prioritization and decision making process, and make a difference every day. The environment is challenging, but we balance the challenges with rewards, and we are looking for clever and innovative people to join us.

    Responsibilities:

    • Own the network administration and security
    • Lead projects across a range of IT disciplines
    • Understand environment thoroughly enough to administer/debug any system
    • Participate in the 24×7 on-call rotation and respond to alerts as needed

    Requirements:

    • Expert knowledge of network administration and security
    • Expert knowledge of Linux administration (Debian preferred)
    • Scripting skills
    • Position based in the San Mateo, California Corporate Office

    Required for all Backblaze Employees

    • Good attitude and willingness to do whatever it takes to get the job done.
    • Desire to learn and adapt to rapidly changing technologies and work environment.
    • Relentless attention to detail.
    • Excellent communication and problem solving skills.
    • Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

    Company Description:
    Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable low cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/gb/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

    We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

    We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

    Some Backblaze Perks:

    • Competitive healthcare plans
    • Competitive compensation and 401k
    • All employees receive Option grants
    • Unlimited vacation days
    • Strong coffee
    • Fully stocked Micro kitchen
    • Catered breakfast and lunches
    • Awesome people who work on awesome projects
    • Childcare bonus
    • Normal work hours
    • Get to bring your pets into the office
    • San Mateo Office – located near Caltrain and Highways 101 & 280.

    If this sounds like you — follow these steps:

    1. Send an email to [email protected] with the position in the subject line.
    2. Include your resume.
    3. Tell us a bit about your experience and why you’re excited to work with Backblaze.

    The post Wanted: Network Systems Administrator appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Wanted: Database Systems Administrator

    Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-database-systems-administrator/

    Are you a Database Systems Administrator who is looking for a challenging and fast-paced working environment? Want to join our dynamic team and help Backblaze grow to new heights? Our Operations team is a distributed and collaborative group of individual contributors. We work closely together to build and maintain our home-grown cloud storage farm, carefully controlling costs by utilizing open source and various brands of technology, as well as designing our own cloud storage servers. Members of Operations participate in the prioritization and decision making process, and make a difference every day. The environment is challenging, but we balance the challenges with rewards, and we are looking for clever and innovative people to join us.

    Responsibilities:

    • Own the administration of Cassandra and MySQL
    • Lead projects across a range of IT disciplines
    • Understand environment thoroughly enough to administer/debug the system
    • Participate in the 24×7 on-call rotation and respond to alerts as needed

    Requirements:

    • Expert knowledge of Cassandra & MySQL
    • Expert knowledge of Linux administration (Debian preferred)
    • Scripting skills
    • Experience in automation/configuration management
    • Position is based in the San Mateo, California corporate office

    Required for all Backblaze Employees

    • Good attitude and willingness to do whatever it takes to get the job done.
    • Desire to learn and adapt to rapidly changing technologies and work environment.
    • Relentless attention to detail.
    • Excellent communication and problem solving skills.
    • Backblaze is an Equal Opportunity Employer and we offer competitive salary and benefits, including our no policy vacation policy.

    Company Description:
    Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable low cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/gb/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

    We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

    We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

    Some Backblaze Perks:

    • Competitive healthcare plans
    • Competitive compensation and 401k
    • All employees receive Option grants
    • Unlimited vacation days
    • Strong coffee
    • Fully stocked Micro kitchen
    • Catered breakfast and lunches
    • Awesome people who work on awesome projects
    • Childcare bonus
    • Normal work hours
    • Get to bring your pets into the office
    • San Mateo Office – located near Caltrain and Highways 101 & 280.

    If this sounds like you — follow these steps:

    1. Send an email to [email protected] with the position in the subject line.
    2. Include your resume.
    3. Tell us a bit about your experience and why you’re excited to work with Backblaze.

    The post Wanted: Database Systems Administrator appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    How To Get Your First 1,000 Customers

    Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/how-to-get-your-first-1000-customers/


    If you launch your startup and no one knows, did you actually launch? As mentioned in my last post, our initial launch target was to get 1,000 people to use our service. But how do you get even 1,000 people to sign up for your service when no one knows who you are?

    There are a variety of methods to attract your first 1,000 customers, but launching with the press is my favorite. I’ll explain why and how to do it below.

    Paths to Attract Your First 1,000 Customers

    Social following: If you have a massive social following, those people are a reasonable target for what you’re offering. In particular if your relationship with them is one where they would buy something you recommend, this can be one of the easiest ways to get your initial customers. However, building this type of following is non-trivial and often is done over several years.


    Paid advertising: The advantage of paid ads is you have control over when they are presented and what they say. The primary disadvantage is they tend to be expensive, especially before you have your positioning, messaging, and funnel nailed.

    Viral: There are certainly examples of companies that launched with a hugely viral video, blog post, or promotion. While fantastic if it happens, even if you do everything right, the likelihood of massive virality is minuscule and the conversion rate is often low.

    Press: As I said, this is my favorite. You don’t need to pay a PR agency and can go from nothing to launched in a couple weeks. Press not only provides awareness and customers, but credibility and SEO benefits as well.

    How to Pitch the Press

    It’s easy: Have a compelling story, find the right journalists, make their life easy, pitch and follow-up. Of course, each one of those has some nuance, so let’s dig in.

    Have a compelling story

    When you’ve been working for months on your startup, it’s easy to get lost in the minutiae when talking to others. A story that a journalist will write about needs to be something their readers will care about. Knowing what story to tell and how to tell it is part science and part art. Here’s how you can get there:

    The basics of your story

    Ask yourself the following questions, and write down the answers:

    • What are we doing? What product or service are we offering?
    • Why? What problem are we solving?
    • What is interesting or unique? Either about what we’re doing, how we’re doing it, or whom we’re doing it for.

    “But my story isn’t that exciting”

    Neither was announcing a data backup company, believe me. Look for angles that make it compelling. Here are some:

    • Did someone on your team do something major before? (build a successful company/product, create some innovation, market something we all know, etc.)
    • Do you have an interesting investor or board member?
    • Is there a personal story that drove you to start this company?
    • Are you starting it in a unique place?
    • Did you come upon the idea in a unique way?
    • Can you share something people want to know that’s not usually shared?
    • Are you partnered with a well-known company?
    • …is there something interesting/entertaining/odd/shocking/touching/etc.?

    It doesn’t get much less exciting than, “We’re launching a company that will backup your data.” But there were still a lot of compelling stories:

    • Founded by serial entrepreneurs, bootstrapped a capital-intensive company, committed to each other for a year without salary.
    • Challenging the way that every backup company before was set up by not asking customers to pick and choose files to backup.
    • Designing our own storage system.
    • Etc. etc.

    For the initial launch, we focused on “unlimited for $5/month” and statistics from a survey we ran with Harris Interactive that said that 94% of people did not regularly backup their data.

    It’s an old adage that “Everyone has a story.” Regardless of what you’re doing, there is always something interesting to share. Dig for that.

    The headline

    Once you’ve captured what you think the interesting story is, you’ve got to boil it down. Yes, you need the elevator pitch, but this is shorter…it’s the headline pitch. Write the headline that you would love to see a journalist write.


    Now comes the part where you have to be really honest with yourself: if you weren’t involved, would you care?

    The “Techmeme Test”

    One way I try to ground myself is what I call the “Techmeme Test”. Techmeme lists the top tech articles. Read the headlines. Imagine the headline you wrote in the middle of the page. If you weren’t involved, would you click on it? Is it more or less compelling than the others? Much of tech news is dominated by the largest companies. If you want to get written about, your story should be more compelling. If not, go back above and explore your story some more.

    Embargoes, exclusives and calls-to-action

    Journalists write about news. Thus, if you’ve already announced something and are then pitching a journalist to cover it, unless you’re giving her something significant that hasn’t been said, it’s no longer news. As a result, there are ‘embargoes’ and ‘exclusives’.

    Embargoes: An embargo simply means that you are sharing news with a journalist that they need to keep private until a certain date and time.

    If you’re Apple, this may be a formal and legal document. In our case, it’s as simple as saying, “Please keep embargoed until 4/13/17 at 8am California time.” in the pitch. Some sites explicitly will not keep embargoes; for example The Information will only break news. If you want to launch something later, do not share information with journalists at these sites. If you are only working with a single journalist for a story, and your announcement time is flexible, you can jointly work out a date and time to announce. However, if you have a fixed launch time or are working with a few journalists, embargoes are key.

    Exclusives: An exclusive means you’re giving something specifically to that journalist. Most journalists love an exclusive as it means readers have to come to them for the story. One option is to give a journalist an exclusive on the entire story. If it is your dream journalist, this may make sense. Another option, however, is to give exclusivity on certain pieces. For example, for your launch you could give an exclusive on funding detail & a VC interview to a more finance-focused journalist and insight into the tech & a CTO interview to a more tech-focused journalist.

    Call-to-Action: With our launch we gave TechCrunch, Ars Technica, and SimplyHelp URLs that gave the first few hundred of their readers access to the private beta. Once those first few hundred users from each site downloaded, the beta would be turned off.

    Thus, we used a combination of embargoes, exclusives, and a call-to-action during our initial launch to be able to brief journalists on the news before it went live, give them something they could announce as exclusive, and provide a time-sensitive call-to-action to the readers so that they would actually sign up and not just read and go away.

    How to Find the Most Authoritative Sites / Authors

    “If a press release is published and no one sees it, was it published?” Perhaps there was once a time when sending a press release out over the wire meant journalists would read it and write about it. That time has long passed. Over 1,000 unread press releases are published every day. If you want your compelling story to be covered, you need to find the handful of journalists who will care.

    Determine the publications

    Find the publications that cover the type of story you want to share. If you’re in tech, Techmeme has a leaderboard of publications ranked by leadership and presence. This list will tell you which publications are likely to have influence. Visit the sites and see if your type of story appears there. But once you’ve determined the publication, do NOT send a pitch to their “[email protected]” or “[email protected]” email addresses. In all the times I’ve done that, I have never had a single response. Those email addresses are likely on every PR, press release, and spam list and unlikely to get read. Instead…

    Determine the journalists

    Once you’ve determined which publications cover your area, check which journalists are doing the writing. Skim the articles and search for keywords and competitor names.


    Identify one primary journalist at the publication that you would love to have cover you, and secondary ones if there are a few good options. If you’re not sure which one should be the primary, consider a few tests:

    • Do they truly seem to care about the space?
    • Do they write interesting/compelling stories that ‘get it’?
    • Do they appear on the Techmeme leaderboard?
    • Do their articles get liked/tweeted/shared and commented on?
    • Do they have a significant social presence?

    Leveraging Google

    Google author search by date

    In addition to Techmeme, or if you aren’t in the tech space at all, Google will become a must-have tool for finding the right journalists to pitch. Below the search box you will find a number of tabs. Click on Tools and change the “Any time” setting to “Custom range.” I like to use the past six months to ensure I find authors who are actively writing about my market. I start with the All results tab. This will return a combination of product sites and articles, depending on your search term.

    Scan for articles and click on the link to see if the article is on topic. If it is, find the author’s name. Often, clicking on the author’s name will take you to a bio page that includes their Twitter, LinkedIn, and/or Facebook profile. Many times you will find their email address in the bio. Collect all the information and add it to your outreach spreadsheet (click here to get a copy). It’s always a good idea to comment on the article to start building awareness of your name; another good idea is to tweet or like the article.

    Next click on the News tab and set the same search parameters. You will get a different set of results. Repeat the same steps. Between the two searches you will have a list of authors that actively write for the websites that Google considers the most authoritative on your market.

    How to find the most socially shared authors

    Buzzsumo search for most shared by date

    Your next step is to find the writers whose articles get shared the most socially. Go to Buzzsumo and click on the Most Shared tab. Enter search terms for your market as well as competitor names. Again I like to use the past 6 months as the time range. You will get a list of articles that have been shared the most across Facebook, LinkedIn, Twitter, Pinterest, and Google+. In addition to finding the most shared articles and their authors you can also see some of the Twitter users that shared the article. Many of those Twitter users are big influencers in your market so it’s smart to start following and interacting with them as well as the authors.

    How to Find Author Email Addresses

    Some journalists publish their contact info right on the stories. For those that don’t, a bit of googling will often get you the email. For example, TechCrunch wrote a story a few years ago where they published all of their email addresses, which was in response to this new service that charges a small fee to provide journalist email addresses. Sometimes visiting their twitter pages will link to a personal site, upon which they will share an email address.

    Of course, all is not lost if you don’t find an email in the bio. There are two good services for finding emails: https://app.voilanorbert.com/ and https://hunter.io/. For Voila Norbert, enter the author’s name and the website you found their article on. The majority of the time you search for an author at a major publication, Norbert will return an accurate email address. If it doesn’t, try Hunter.io.

    On Hunter.io, enter the domain name and click on Personal Only, then scroll through the results to find the author’s email. I’ve found Norbert to be more accurate overall, but between the two you will find most major authors’ email addresses.

    Email, by the way, is not necessarily the best way to engage a journalist. Many are avid Twitter users. Follow them and engage – that means read/retweet/favorite their tweets; reply to their questions, and generally be helpful BEFORE you pitch them. Later when you email them, you won’t be just a random email address.

    Don’t spam

    Now that you have all these email addresses (possibly thousands if you purchased a list) – do NOT spam. It is incredibly tempting to think “I could try to figure out which of these folks would be interested, but if I just email all of them, I’ll save myself time and be more likely to get some of them to respond.” Don’t do it.


    First, you’ll want to tailor your pitch to the individual. Second, it’s a small world and you’ll be known as someone who spams – reputation is golden. Also, don’t call journalists. Unless you know them or they’ve said they’re open to calls, you’re most likely to just annoy them.

    Build a relationship

    Play the long game. You may be focusing just on the launch and hoping to get this one story covered, but if you don’t quickly flame out, you will have many more opportunities to tell interesting stories that you’ll want the press to cover. Be honest and don’t exaggerate. When you have 500 users it’s tempting to say, “We’ve got thousands!” Don’t. The good journalists will see through it, and it’ll likely come back to bite you later. If you don’t know something, say “I don’t know, but let me find out for you.” Most journalists want to write interesting stories that their readers will appreciate. Help them do that. Build deeper relationships with 5 to 10 journalists rather than spamming thousands.

    Stay organized

    It doesn’t need to be complicated, but keep a spreadsheet that includes the name, publication, and contact info of the journalists you care about. Then, use it to keep track of who you’ve pitched, who’s responded, whether you’ve sent them the materials they need, and whether they intend to write/have written.

    Make their life easy

    Journalists have a million PR people emailing them, are actively engaging with readers on Twitter and in the comments, are tracking their metrics, are working their sources…and all the while needing to publish new articles. They’re busy. Make their life easy and they’re more likely to engage with yours.

    Get to know them

    Before sending them a pitch, know what they’ve written in the space. If you tell them how your story relates to ones they’ve written, it’ll help them put the story in context, and enable them to possibly link back to a story they wrote before.

    Prepare your materials

    Journalists will need somewhere to get more info (prepare a fact sheet), a URL to link to, and at least one image (ideally a few to choose from.) A fact sheet gives bite-sized snippets of information they may need about your startup or product: what it is, how big the market is, what’s the pricing, who’s on the team, etc. The URL is where their reader will get the product or more information from you. It doesn’t have to be live when you’re pitching, but you should be able to tell what the URL will be. The images are ones that they could embed in the article: a product screenshot, a CEO or team photo, an infographic. Scan the types of images included in their articles. Don’t send any of these in your pitch, but have them ready. Studies, stats, customer/partner/investor quotes are also good to have.

    Pitch

    A pitch has to be short and compelling.

    Subject Line

    Think back to the headline you want. Is it really compelling? Can you shorten it to a subject line? Include what’s happening and when. For Mike Arrington at TechCrunch, our first subject line was “Startup doing an ‘online time machine’”. Later I would include, “launching June 6th”.

    For John Timmer at Ars Technica, it was “Demographics data re: your 4/17 article”. Why? Because he wrote an article titled “WiFi popular with the young people; backups, not so much”. Since we had run a demographics survey on backups, I figured as a science editor he’d be interested in this additional data.

    Body

    A few key things about the body of the email. It should be short and to the point, no more than a few sentences. Here was my actual, original pitch email to John:

    Hey John,

    We’re launching Backblaze next week which provides a Time Machine-online type of service. As part of doing some research I read your article about backups not being popular with young people and that you had wished Accenture would have given you demographics. In prep for our invite-only launch I sponsored Harris Interactive to get demographic data on who’s doing backups and if all goes well, I should have that data on Friday.

    Next week starts Backup Awareness Month (and yes, probably Clean Your House Month and Brush Your Teeth Month)…but nonetheless…good time to remind readers to backup with a bit of data?

    Would you be interested in seeing/talking about the data when I get it?

    Would you be interested in getting a sneak peak at Backblaze? (I could give you some invite codes for your readers as well.)

    Gleb Budman        

    CEO and Co-Founder

    Backblaze, Inc.

    Automatic, Secure, High-Performance Online Backup

    Cell: XXX-XXX-XXXX

    The Good: It said what we’re doing, why this relates to him and his readers, provides him information he had asked for in an article, ties to something timely, is clearly tailored for him, is pitched by the CEO and Co-Founder, and provides my cell.

    The Bad: It’s too long.

    I got better later. Here’s an example:

    Subject: Does temperature affect hard drive life?

    Hi Peter, there has been much debate about whether temperature affects how long a hard drive lasts. Following up on the Backblaze analyses of how long do drives last & which drives last the longest (that you wrote about) we’ve now analyzed the impact of heat on the nearly 40,000 hard drives we have and found that…

    We’re going to publish the results this Monday, 5/12 at 5am California-time. Want a sneak peak of the analysis?

    Timing

    A common question is “When should I launch?” What day, what time? I prefer to launch on Tuesday at 8am California-time. Launching earlier in the week gives breathing room for the news to live longer. While your launch may be a single article posted and that’s that, if it ends up a larger success, earlier in the week allows other journalists (including ones who are in other countries) to build on the story. Monday announcements can be tough because the journalists generally need to have their stories finished by Friday, and while ideally everything is buttoned up beforehand, startups sometimes use the weekend as overflow before a launch.

    Launching at 8am California time allows articles to be published at the beginning of the day on the West Coast and around lunchtime on the East Coast. Any later and you risk being past publishing time for the day. We used to launch at 5am in order to catch the morning on the East Coast, but it did not seem to provide a significant benefit in coverage or impact, and it did mean that the entire internal team needed to be up at 3am or 4am. Sometimes that’s critical, but I prefer not to burn the team out when it’s not.

    Finally, try to stay clear of holidays, major announcements and large conferences. If Apple is coming out with their next iPhone, many of the tech journalists will be busy at least a couple days prior and possibly a week after. Not always obvious, but if you can, find times that are otherwise going to be slow for news.

    Follow-up

    There is a fine line between persistence and annoyance. I once had a journalist write me after we had an announcement that was covered by the press, “Why didn’t you let me know?! I would have written about that!” I had sent him three emails about the upcoming announcement to which he never responded.


    Ugh. However, my takeaway from this isn’t that I should send 10 emails to every journalist. It’s that sometimes these things happen.

    My general rule is 3 emails. If I’ve identified a specific journalist that I think would be interested and have a pitch crafted for her, I’ll send her the email ideally 2 weeks prior to the announcement. I’ll follow-up a week later, and one more time 2 days prior. If she ever says, “I’m not interested in this topic,” I note it and don’t email her on that topic again.

    If a journalist writes about us, I read the article and engage in the comments (or someone on our team, such as our social guy, @YevP, does). We’ll often promote the story through our social channels and email our employees, who may choose to share the story as well. This helps us, but it also helps the journalist get broader reach for their story. Again, the goal is to build a relationship with the journalists in your space. If the journalist wrote something relevant to your customers, you’re providing a service to your customers AND helping the journalist get the word out about the article.

    At times the stories also end up shared on sites such as Hacker News, Reddit, Slashdot, or become active conversations on Twitter. Again, we try to engage there and respond to questions (when we do, we are always clear that we’re from Backblaze.)

    And finally, I’ll often send a short thank you to the journalist.

    Getting Your First 1,000 Customers With Press

    As I mentioned at the beginning, there is more than one way to get your first 1,000 customers. My favorite is working with the press to share your story. If you figure out your compelling story, find the right journalists, make their life easy, pitch, and follow up, you stand a high likelihood of getting coverage and customers. Better yet, that coverage will provide credibility for your company and, if done right, will establish you as a resource for the press for the future.

    Like any muscle, this process takes working out. The first time may feel a bit daunting, but just take the steps one at a time. As you do this a few times, the process will get easier, and you’ll know who to reach out to and will quickly determine which stories will be compelling.

    The post How To Get Your First 1,000 Customers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Deploying Java Microservices on Amazon EC2 Container Service

    Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/deploying-java-microservices-on-amazon-ec2-container-service/

    This post and accompanying code graciously contributed by:

    Huy Huynh
    Sr. Solutions Architect
    Magnus Bjorkman
    Solutions Architect

    Java is a popular language used by many enterprises today. To simplify and accelerate Java application development, many companies are moving from a monolithic to microservices architecture. For some, it has become a strategic imperative. Containerization technology, such as Docker, lets enterprises build scalable, robust microservice architectures without major code rewrites.

    In this post, I cover how to containerize a monolithic Java application to run on Docker. Then, I show how to deploy it on AWS using Amazon EC2 Container Service (Amazon ECS), a high-performance container management service. Finally, I show how to break the monolith into multiple services, all running in containers on Amazon ECS.

    Application Architecture

    For this example, I use the Spring Pet Clinic, a monolithic Java application for managing a veterinary practice. It is a simple REST API, which allows the client to manage and view Owners, Pets, Vets, and Visits.

    It is a simple three-tier architecture:

    • Client
      You simulate this by using curl commands.
    • Web/app server
      This is the Java and Spring-based application that you run using the embedded Tomcat. As part of this post, you run this within Docker containers.
    • Database server
      This is the relational database for your application that stores information about owners, pets, vets, and visits. For this post, use MySQL RDS.

    I decided to not put the database inside a container as containers were designed for applications and are transient in nature. The choice was made even easier because you have a fully managed database service available with Amazon RDS.

    RDS manages the work involved in setting up a relational database, from provisioning the infrastructure capacity that you request to installing the database software. After your database is up and running, RDS automates common administrative tasks, such as performing backups and patching the software that powers your database. With optional Multi-AZ deployments, Amazon RDS also manages synchronous data replication across Availability Zones with automatic failover.

    Walkthrough

    You can find the code for the example covered in this post at amazon-ecs-java-microservices on GitHub.

    Prerequisites

    You need the following to walk through this solution:

    • An AWS account
    • An access key and secret key for a user in the account
    • The AWS CLI installed

    Also, install the latest versions of the following:

    • Java
    • Maven
    • Python
    • Docker

    Step 1: Move the existing Java Spring application to a container deployed using Amazon ECS

    First, move the existing monolith application to a container and deploy it using Amazon ECS. This is a great first step because you get some benefits even before breaking the monolith apart:

    • An improved pipeline. The container also allows an engineering organization to create a standard pipeline for the application lifecycle.
    • No mutations to machines.

    You can find the monolith example at 1_ECS_Java_Spring_PetClinic.

    Container deployment overview

    The following diagram is an overview of what the setup looks like for Amazon ECS and related services:

    This setup consists of the following resources:

    • The client application that makes a request to the load balancer.
    • The load balancer that distributes requests across all available ports and instances registered in the application’s target group using round-robin.
    • The target group that is updated by Amazon ECS to always have an up-to-date list of all the service containers in the cluster. This includes the port on which they are accessible.
    • One Amazon ECS cluster that hosts the container for the application.
    • A VPC network to host the Amazon ECS cluster and associated security groups.

    Each container has a single application process that is bound to port 8080 within its namespace. In reality, each container is exposed on a different, randomly assigned port on the host.

    The architecture is containerized but still monolithic, because each container has all the same features as the rest of the containers.

    The following is also part of the solution but not depicted in the above diagram:

    • One Amazon EC2 Container Registry (Amazon ECR) repository for the application.
    • A service/task definition that spins up containers on the instances of the Amazon ECS cluster.
    • A MySQL RDS instance that hosts the application’s schema. The information about the MySQL RDS instance is sent in through environment variables to the containers, so that the application can connect to the MySQL RDS instance.

    I have automated the setup with the 1_ECS_Java_Spring_PetClinic/ecs-cluster.cf AWS CloudFormation template and a Python script.

    The Python script calls the CloudFormation template for the initial setup of the VPC, Amazon ECS cluster, and RDS instance. It then extracts the outputs from the template and uses those for API calls to create Amazon ECR repositories, tasks, services, Application Load Balancer, and target groups.
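
    The setup script in the repository is written in Python, but the pattern it follows (read the stack outputs, then feed them into subsequent API calls) looks roughly like the following Java sketch. The stack name and output key below are hypothetical placeholders, not values from the repository.

    import com.amazonaws.services.cloudformation.AmazonCloudFormation;
    import com.amazonaws.services.cloudformation.AmazonCloudFormationClientBuilder;
    import com.amazonaws.services.cloudformation.model.DescribeStacksRequest;
    import com.amazonaws.services.cloudformation.model.Output;
    import com.amazonaws.services.cloudformation.model.Stack;

    // Sketch: read a single CloudFormation stack output so later calls (ECR, ECS, ALB) can use it.
    public class StackOutputs {

        public static String outputValue(String stackName, String outputKey) {
            AmazonCloudFormation cfn = AmazonCloudFormationClientBuilder.defaultClient();
            Stack stack = cfn.describeStacks(new DescribeStacksRequest().withStackName(stackName))
                             .getStacks().get(0);
            for (Output output : stack.getOutputs()) {
                if (output.getOutputKey().equals(outputKey)) {
                    return output.getOutputValue();
                }
            }
            throw new IllegalArgumentException("No output " + outputKey + " on stack " + stackName);
        }

        public static void main(String[] args) {
            // "pet-clinic-ecs" and "RdsEndpoint" are made-up names for illustration.
            System.out.println(outputValue("pet-clinic-ecs", "RdsEndpoint"));
        }
    }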

    Environment variables and Spring properties binding

    As part of the Python script, you pass in a number of environment variables to the container as part of the task/container definition:

    'environment': [
        {
            'name': 'SPRING_PROFILES_ACTIVE',
            'value': 'mysql'
        },
        {
            'name': 'SPRING_DATASOURCE_URL',
            'value': my_sql_options['dns_name']
        },
        {
            'name': 'SPRING_DATASOURCE_USERNAME',
            'value': my_sql_options['username']
        },
        {
            'name': 'SPRING_DATASOURCE_PASSWORD',
            'value': my_sql_options['password']
        }
    ],

    The preceding environment variables work in concert with the Spring property system. The value in the SPRING_PROFILES_ACTIVE variable makes Spring use the MySQL version of the application property file. The other environment variables override the following properties in that file:

    • spring.datasource.url
    • spring.datasource.username
    • spring.datasource.password

    Optionally, you can also encrypt sensitive values by using Amazon EC2 Systems Manager Parameter Store. Instead of handing in the password, you pass in a reference to the parameter and fetch the value as part of the container startup. For more information, see Managing Secrets for Amazon ECS Applications Using Parameter Store and IAM Roles for Tasks.
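
    As a rough sketch of that approach, a container could resolve the database password at startup with the AWS SDK for Java. The parameter name below is a hypothetical example, and in practice the task’s IAM role needs permission to read the parameter and its KMS key.

    import com.amazonaws.services.simplesystemsmanagement.AWSSimpleSystemsManagement;
    import com.amazonaws.services.simplesystemsmanagement.AWSSimpleSystemsManagementClientBuilder;
    import com.amazonaws.services.simplesystemsmanagement.model.GetParameterRequest;

    // Sketch: fetch a SecureString parameter at startup instead of passing the raw password
    // in an environment variable. "/petclinic/db/password" is a hypothetical parameter name.
    public class DbPasswordResolver {

        public static String resolve() {
            AWSSimpleSystemsManagement ssm = AWSSimpleSystemsManagementClientBuilder.defaultClient();
            GetParameterRequest request = new GetParameterRequest()
                    .withName("/petclinic/db/password")
                    .withWithDecryption(true); // decrypt the SecureString value with KMS
            return ssm.getParameter(request).getParameter().getValue();
        }
    }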

    Spotify Docker Maven plugin

    Use the Spotify Docker Maven plugin to create the image and push it directly to Amazon ECR. This lets you build and push the image as part of the regular Maven build, integrating image generation into the overall build process. Use an explicit Dockerfile as input to the plugin.

    FROM frolvlad/alpine-oraclejdk8:slim
    VOLUME /tmp
    ADD spring-petclinic-rest-1.7.jar app.jar
    RUN sh -c 'touch /app.jar'
    ENV JAVA_OPTS=""
    ENTRYPOINT [ "sh", "-c", "java $JAVA_OPTS -Djava.security.egd=file:/dev/./urandom -jar /app.jar" ]

    The Python script discussed earlier uses the AWS CLI to authenticate you with AWS. The script places the token in the appropriate location so that the plugin can work directly against the Amazon ECR repository.

    Test setup

    You can test the setup by running the Python script:
    python setup.py -m setup -r <your region>

    After the script has successfully run, you can test by querying an endpoint:
    curl <your endpoint from output above>/owner

    You can clean this up before going to the next section:
    python setup.py -m cleanup -r <your region>

    Step 2: Converting the monolith into microservices running on Amazon ECS

    The second step is to convert the monolith into microservices. For a real application, you would likely not do this in one step, but rather re-architect the application piece by piece. You would continue to run your monolith, but it would keep getting smaller with each piece you break apart.

    By migrating to microservices, you get four benefits:

    • Isolation of crashes
      If one microservice in your application is crashing, then only that part of your application goes down. The rest of your application continues to work properly.
    • Isolation of security
      When microservice best practices are followed, the result is that if an attacker compromises one service, they only gain access to the resources of that service. They can’t horizontally access other resources from other services without breaking into those services as well.
    • Independent scaling
      When features are broken out into microservices, then the amount of infrastructure and number of instances of each microservice class can be scaled up and down independently.
    • Development velocity
      In a monolith, adding a new feature can potentially impact every other feature that the monolith contains. On the other hand, a proper microservice architecture has new code for a new feature going into a new service. You can be confident that any code you write won’t impact the existing code at all, unless you explicitly write a connection between two microservices.

    Find the microservices example at 2_ECS_Java_Spring_PetClinic_Microservices.
    You break apart the Spring Pet Clinic application by creating a microservice for each REST API operation, as well as creating one for the system services.

    Java code changes

    Comparing the project structure between the monolith and the microservices version, you can see that each service is now its own separate build.
    First, the monolith version:

    You can clearly see how each API operation is its own subpackage under the org.springframework.samples.petclinic package, all part of the same monolithic application.
    This changes as you break it apart in the microservices version:

    Now, each API operation is its own separate build, which you can build independently and deploy. You have also duplicated some code across the different microservices, such as the classes under the model subpackage. This is intentional: you don’t want to introduce artificial dependencies among the microservices, and you want to allow these classes to evolve differently for each microservice.

    Also, make the dependencies among the API operations more loosely coupled. In the monolithic version, the components are tightly coupled and use object-based invocation.

    Here is an example of this from the OwnerController operation, where the class is directly calling PetRepository to get information about pets. PetRepository is the Repository class (Spring data access layer) to the Pet table in the RDS instance for the Pet API:

    @RestController
    class OwnerController {
    
        @Inject
        private PetRepository pets;
        @Inject
        private OwnerRepository owners;
        private static final Logger logger = LoggerFactory.getLogger(OwnerController.class);
    
        @RequestMapping(value = "/owner/{ownerId}/getVisits", method = RequestMethod.GET)
        public ResponseEntity<List<Visit>> getOwnerVisits(@PathVariable int ownerId){
            List<Pet> petList = this.owners.findById(ownerId).getPets();
            List<Visit> visitList = new ArrayList<Visit>();
            petList.forEach(pet -> visitList.addAll(pet.getVisits()));
            return new ResponseEntity<List<Visit>>(visitList, HttpStatus.OK);
        }
    }

    In the microservice version, you call the Pet API operation rather than PetRepository directly. Decouple the components by using interprocess communication; in this case, the REST API. This provides for fault tolerance and disposability.

    @RestController
    class OwnerController {
    
        @Value("#{environment['SERVICE_ENDPOINT'] ?: 'localhost:8080'}")
        private String serviceEndpoint;
    
        @Inject
        private OwnerRepository owners;
        private static final Logger logger = LoggerFactory.getLogger(OwnerController.class);
    
        @RequestMapping(value = "/owner/{ownerId}/getVisits", method = RequestMethod.GET)
        public ResponseEntity<List<Visit>> getOwnerVisits(@PathVariable int ownerId){
            List<Pet> petList = this.owners.findById(ownerId).getPets();
            List<Visit> visitList = new ArrayList<Visit>();
            petList.forEach(pet -> {
                logger.info(getPetVisits(pet.getId()).toString());
                visitList.addAll(getPetVisits(pet.getId()));
            });
            return new ResponseEntity<List<Visit>>(visitList, HttpStatus.OK);
        }
    
        private List<Visit> getPetVisits(int petId){
            List<Visit> visitList = new ArrayList<Visit>();
            RestTemplate restTemplate = new RestTemplate();
            Pet pet = restTemplate.getForObject("http://"+serviceEndpoint+"/pet/"+petId, Pet.class);
            logger.info(pet.getVisits().toString());
            return pet.getVisits();
        }
    }

    You now have an additional method that calls the API. You are also handing in the service endpoint that should be called, so that you can easily inject dynamic endpoints based on the current deployment.

    Container deployment overview

    Here is an overview of what the setup looks like for Amazon ECS and the related services:

    This setup consists of the following resources:

    • The client application that makes a request to the load balancer.
    • The Application Load Balancer that inspects the client request. Based on routing rules, it directs the request to an instance and port from the target group that matches the rule.
    • The Application Load Balancer that has a target group for each microservice. The target groups are used by the corresponding services to register available container instances. Each target group has a path, so when you call the path for a particular microservice, it is mapped to the correct target group. This allows you to use one Application Load Balancer to serve all the different microservices, accessed by the path. For example, https:///owner/* would be mapped and directed to the Owner microservice.
    • One Amazon ECS cluster that hosts the containers for each microservice of the application.
    • A VPC network to host the Amazon ECS cluster and associated security groups.

    Because you are running multiple containers on the same instances, use dynamic port mapping to avoid port clashing. By using dynamic port mapping, the container is allocated an anonymous port on the host to which the container port (8080) is mapped. The anonymous port is registered with the Application Load Balancer and target group so that traffic is routed correctly.

    The following is also part of the solution but not depicted in the above diagram:

    • One Amazon ECR repository for each microservice.
    • A service/task definition per microservice that spins up containers on the instances of the Amazon ECS cluster.
    • A MySQL RDS instance that hosts the application's schema. The connection information for the MySQL RDS instance is passed to the containers through environment variables, so the application can connect to it (see the sketch after this list).
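    For illustration only, here is a minimal boto3 sketch of registering a container definition that uses dynamic port mapping and receives such environment variables. The family name, image URI, and variable names (SERVICE_ENDPOINT, SPRING_DATASOURCE_URL) are assumptions for this sketch, not values taken from the CloudFormation template or the setup script.

    import boto3

    ecs = boto3.client("ecs", region_name="<your region>")

    ecs.register_task_definition(
        family="spring-petclinic-rest-owner",
        containerDefinitions=[
            {
                "name": "spring-petclinic-rest-owner",
                "image": "<your ECR registry>/spring-petclinic-rest-owner:1.7",
                "memory": 512,
                # containerPort 8080 is mapped to hostPort 0, so ECS picks an
                # ephemeral host port for each task (dynamic port mapping) and
                # registers it with the ALB target group.
                "portMappings": [
                    {"containerPort": 8080, "hostPort": 0, "protocol": "tcp"}
                ],
                "environment": [
                    # Endpoint read by OwnerController through SERVICE_ENDPOINT
                    {"name": "SERVICE_ENDPOINT", "value": "<your ALB DNS name>"},
                    # Illustrative variable carrying the MySQL RDS connection info
                    {"name": "SPRING_DATASOURCE_URL",
                     "value": "jdbc:mysql://<your RDS endpoint>:3306/petclinic"},
                ],
            }
        ],
    )

    Setting hostPort to 0 is what lets several copies of the same container share one instance without port clashes.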

    I have again automated setup with the 2_ECS_Java_Spring_PetClinic_Microservices/ecs-cluster.cf CloudFormation template and a Python script.

    The CloudFormation template remains the same as in the previous section. In the Python script, you are now building five different Java applications, one for each microservice (plus a system application). There is a separate Maven POM file for each one. The resulting Docker images get pushed to their own Amazon ECR repositories and are deployed separately, each with its own service and task definition. This separation is critical to realizing the microservice benefits described earlier.
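    The setup script does this work for you; purely as a sketch of that flow (with hypothetical module directory names, and assuming the Spotify docker-maven-plugin's -DpushImage flag), the build-and-push loop might look like this:

    import base64
    import os
    import subprocess

    import boto3

    ecr = boto3.client("ecr", region_name="<your region>")

    # Hypothetical module directories; the actual project layout may differ.
    services = ["spring-petclinic-rest-owner", "spring-petclinic-rest-pet",
                "spring-petclinic-rest-visit", "spring-petclinic-rest-vet",
                "spring-petclinic-rest-system"]

    # Log Docker in to ECR with a temporary authorization token.
    auth = ecr.get_authorization_token()["authorizationData"][0]
    user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
    registry = auth["proxyEndpoint"].replace("https://", "")
    subprocess.check_call(["docker", "login", "-u", user, "-p", password, registry])

    for service in services:
        # One ECR repository per microservice.
        try:
            ecr.create_repository(repositoryName=service)
        except ecr.exceptions.RepositoryAlreadyExistsException:
            pass
        # Build the jar and Docker image, then push it, using the docker-maven-plugin
        # configured in each service's POM file (which reads docker_registry_host).
        env = dict(os.environ, docker_registry_host=registry)
        subprocess.check_call(
            ["mvn", "clean", "package", "docker:build", "-DpushImage"],
            cwd=service, env=env)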

    Here is an example of the POM file for the Owner microservice:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>org.springframework.samples</groupId>
        <artifactId>spring-petclinic-rest</artifactId>
        <version>1.7</version>
        <parent>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-parent</artifactId>
            <version>1.5.2.RELEASE</version>
        </parent>
        <properties>
            <!-- Generic properties -->
            <java.version>1.8</java.version>
            <docker.registry.host>${env.docker_registry_host}</docker.registry.host>
        </properties>
        <dependencies>
            <dependency>
                <groupId>javax.inject</groupId>
                <artifactId>javax.inject</artifactId>
                <version>1</version>
            </dependency>
            <!-- Spring and Spring Boot dependencies -->
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-actuator</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-data-rest</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-cache</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-data-jpa</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-web</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-test</artifactId>
                <scope>test</scope>
            </dependency>
            <!-- Databases - Uses HSQL by default -->
            <dependency>
                <groupId>org.hsqldb</groupId>
                <artifactId>hsqldb</artifactId>
                <scope>runtime</scope>
            </dependency>
            <dependency>
                <groupId>mysql</groupId>
                <artifactId>mysql-connector-java</artifactId>
                <scope>runtime</scope>
            </dependency>
            <!-- caching -->
            <dependency>
                <groupId>javax.cache</groupId>
                <artifactId>cache-api</artifactId>
            </dependency>
            <dependency>
                <groupId>org.ehcache</groupId>
                <artifactId>ehcache</artifactId>
            </dependency>
            <!-- end of webjars -->
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-devtools</artifactId>
                <scope>runtime</scope>
            </dependency>
        </dependencies>
        <build>
            <plugins>
                <plugin>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-maven-plugin</artifactId>
                </plugin>
                <plugin>
                    <groupId>com.spotify</groupId>
                    <artifactId>docker-maven-plugin</artifactId>
                    <version>0.4.13</version>
                    <configuration>
                        <imageName>${env.docker_registry_host}/${project.artifactId}</imageName>
                        <dockerDirectory>src/main/docker</dockerDirectory>
                        <useConfigFile>true</useConfigFile>
                        <registryUrl>${env.docker_registry_host}</registryUrl>
                        <!--dockerHost>https://${docker.registry.host}</dockerHost-->
                        <resources>
                            <resource>
                                <targetPath>/</targetPath>
                                <directory>${project.build.directory}</directory>
                                <include>${project.build.finalName}.jar</include>
                            </resource>
                        </resources>
                        <forceTags>false</forceTags>
                        <imageTags>
                            <imageTag>${project.version}</imageTag>
                        </imageTags>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </project>

    Test setup

    You can test this by running the Python script:

    python setup.py -m setup -r <your region>

    After the script has successfully run, you can test by querying an endpoint:

    curl <your endpoint from output above>/owner

    Conclusion

    Migrating a monolithic application to a containerized set of microservices can seem like a daunting task. Following the steps outlined in this post, you can begin to containerize monolithic Java apps, take advantage of the container runtime environment, and start the process of re-architecting them into microservices. On the whole, containerized microservices are faster to develop, easier to iterate on, and more cost effective to maintain and secure.

    This post focused on the first steps of microservice migration. You can learn more about optimizing and scaling your microservices with components such as service discovery, blue/green deployment, circuit breakers, and configuration servers at http://aws.amazon.com/containers.

    If you have questions or suggestions, please comment below.

    Hiring a Content Director

    Post Syndicated from Ahin Thomas original https://www.backblaze.com/blog/hiring-content-director/


    Backblaze is looking to hire a full time Content Director. This role is an essential piece of our team, reporting directly to our VP of Marketing. As the hiring manager, I’d like to tell you a little bit more about the role, how I’m thinking about the collaboration, and why I believe this to be a great opportunity.

    A Little About Backblaze and the Role

    Since 2007, Backblaze has earned a strong reputation as a leader in data storage. Our products are astonishingly easy to use and affordable to purchase. We have engaged customers and an involved community that helps drive our brand. Our audience numbers in the millions and our primary interaction point is the Backblaze blog. We publish content for engineers (data infrastructure, topics in the data storage world), consumers (how to’s, merits of backing up), and entrepreneurs (business insights). In all categories, our Content Director drives our earned position as a leader.

    Backblaze has a culture focused on being fair and good (to each other and our customers). We have created a sustainable business that is profitable and growing. Our team places a premium on open communication, being cleverly unconventional, and helping each other out. The Content Director, specifically, balances our needs as a commercial enterprise (at the end of the day, we want to sell our products) with the custodianship of our blog (and the trust of our audience).

    There’s a lot of ground to be covered at Backblaze. We have three discrete business lines:

    • Computer Backup -> a 10 year old business focusing on backing up consumer computers.
    • B2 Cloud Storage -> Competing with Amazon, Google, and Microsoft… just at ¼ of the price (but with the same performance characteristics).
    • Business Backup -> Both Computer Backup and B2 Cloud Storage, but focused on SMBs and enterprise.

    The Best Candidate Is…

    An excellent writer – possessing a solid academic understanding of writing, the creative process, and delivering against deadlines. You know how to write with multiple voices for multiple audiences. We do not expect our Content Director to be a storage infrastructure expert; we do expect a facility with researching topics, accessing our engineering and infrastructure team for guidance, and generally translating the technical into something easy to understand. The best Content Director must be an active participant in the business, strategy, and editorial debates and then must execute with ruthless precision.

    Our Content Director’s “day job” is making sure the blog is running smoothly and the sales team has compelling collateral (emails, case studies, white papers).

    Specifically, the Perfect Content Director Excels at:

    • Creating well researched, elegantly constructed content on deadline. For example, each week, 2 articles should be published on our blog. Blog posts should rotate to address the constituencies for our 3 business lines – not all blog posts will appeal to everyone, but over the course of a month, we want multiple compelling pieces for each segment of our audience. Similarly, case studies (and outbound emails) should be tailored to our sales team’s proposed campaigns / audiences. The Content Director creates ~75% of all content but is responsible for editing 100%.
    • Understanding organic methods for weaving business needs into compelling content. The majority of our content (but not EVERY piece) must tie to some business strategy. We hate fluff and hold our promotional content to a standard of being worth someone’s time to read. To be effective, the Content Director must understand the target customer segments and use cases for our products.
    • Straddling both Consumer & SaaS mechanics. A key part of the job will be working to augment the collateral used by our sales team for both B2 Cloud Storage and Business Backup. This content should be compelling and optimized for converting leads. And our foundational business line, Computer Backup, deserves to be nurtured and grown.
    • Product marketing. The Content Director “owns” the blog. But also assists in writing case studies / white papers, creating collateral (email, trade show). Each of these things has a variety of call to action(s) and audiences. Direct experience is a plus, experience that will plausibly translate to these areas is a requirement.
    • Articulating views on storage, backup, and cloud infrastructure. Not everyone has experience with this. That’s fine, but if you do, it’s strongly beneficial.

    A Thursday In The Life:

    • Coordinate Collaborators – We are a deliverables-driven culture, not a meeting-driven one. We expect you to collaborate with internal blog authors and the occasional guest poster.
    • Collaborate with Design – Ensure imagery for upcoming posts / collateral are on track.
    • Augment Sales team – Lock content for next week’s outbound campaign.
    • Self-directed blog agenda – Feedback for next Tuesday’s post is addressed; next Thursday’s post is circulated to the marketing team for feedback & SEO polish.
    • Review Editorial calendar, make any changes.

    Oh! And We Have Great Perks:

    • Competitive healthcare plans
    • Competitive compensation and 401k
    • All employees receive Option grants
    • Unlimited vacation days
    • Strong coffee & fully stocked Micro kitchen
    • Catered breakfast and lunches
    • Awesome people who work on awesome projects
    • Childcare bonus
    • Normal work hours
    • Get to bring your pets into the office
    • San Mateo Office – located near Caltrain and Highways 101 & 280.

    Interested in Joining Our Team?

    Send us an email to [email protected]blaze.com with the subject “Content Director”. Please include your resume and 3 brief abstracts for content pieces.
    Some hints for each of your three abstracts:

    • Create a compelling headline
    • Write clearly and concisely
    • Be brief, each abstract should be 100 words or less – no longer
    • Target each abstract to a different specific audience that is relevant to our business lines

    Thank you for taking the time to read and consider all this. I hope it sounds like a great opportunity for you or someone you know. Principals only need apply.

    The post Hiring a Content Director appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Near Zero Downtime Migration from MySQL to DynamoDB

    Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/near-zero-downtime-migration-from-mysql-to-dynamodb/

    Many companies consider migrating from relational databases like MySQL to Amazon DynamoDB, a fully managed, fast, highly scalable, and flexible NoSQL database service. For example, DynamoDB can increase or decrease capacity based on traffic, in accordance with business needs. The total cost of servicing can be optimized more easily than for the typical media-based RDBMS.

    However, migrations can have two common issues:

    • Service outage due to downtime, especially when customer service must be seamlessly available 24/7/365
    • Different key design between RDBMS and DynamoDB

    This post introduces two methods of seamlessly migrating data from MySQL to DynamoDB, minimizing downtime and converting the MySQL key design into one more suitable for NoSQL.

    AWS services

    I’ve included sample code that uses the following AWS services:

    • AWS Database Migration Service (AWS DMS) can migrate your data to and from most widely used commercial and open-source databases. It supports homogeneous and heterogeneous migrations between different database platforms.
    • Amazon EMR is a managed Hadoop framework that helps you process vast amounts of data quickly. Build EMR clusters easily with preconfigured software stacks that include Hive and other business software.
    • Amazon Kinesis can continuously capture and retain a vast amount of data such as transactions, IT logs, or clickstreams for up to 7 days.
    • AWS Lambda helps you run your code without provisioning or managing servers. Your code can be automatically triggered by other AWS services such as Amazon Kinesis Streams.

    Migration solutions

    Here are the two options I describe in this post:

    1. Use AWS DMS

    AWS DMS supports migration to a DynamoDB table as a target. You can use object mapping to restructure original data to the desired structure of the data in DynamoDB during migration.

    2. Use EMR, Amazon Kinesis, and Lambda with custom scripts

    Consider this method when more complex conversion processes and flexibility are required. Fine-grained user control is needed for grouping MySQL records into fewer DynamoDB items, determining attribute names dynamically, adding business logic programmatically during migration, supporting more data types, or adding parallel control for one big table.

    After the initial load/bulk-puts are finished, and the most recent real-time data is caught up by the CDC (change data capture) process, you can change the application endpoint to DynamoDB.

    The method of capturing changed data in option 2 is covered in the AWS Database post Streaming Changes in a Database with Amazon Kinesis. All code in this post is available in the big-data-blog GitHub repo, including test codes.

    Solution architecture

    The following diagram shows the overall architecture of both options.

    Option 1:  Use AWS DMS

    This section discusses how to connect to MySQL, read the source data, and then format the data for consumption by the target DynamoDB database using DMS.

    Create the replication instance and source and target endpoints

    Create a replication instance that has sufficient storage and processing power to perform the migration job, as mentioned in the AWS Database Migration Service Best Practices whitepaper. For example, if your migration involves a large number of tables, or if you intend to run multiple concurrent replication tasks, consider using one of the larger instances. The service consumes a fair amount of memory and CPU.

    Connect to MySQL as a user that has the SUPER and REPLICATION CLIENT privileges, so that DMS can retrieve data from the database and read the binary log. Enable the binary log and set the binlog_format parameter to ROW for CDC in the MySQL configuration. For more information about how to use DMS, see Getting Started in the AWS Database Migration Service User Guide.

    mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'welcome1';
    mysql> GRANT all ON <database name>.* TO 'repl'@'%';
    mysql> GRANT SUPER,REPLICATION CLIENT  ON *.* TO 'repl'@'%';

    Before you begin to work with a DynamoDB database as a target for DMS, make sure that you create an IAM role for DMS to assume, and grant access to the DynamoDB target tables. Two endpoints must be created to connect the source and target. The following screenshot shows sample endpoints.

    The following screenshot shows the details for one of the endpoints, source-mysql.

    Create a task with an object mapping rule

    In this example, assume that the MySQL table has a composite primary key (customerid + orderid + productid). You are going to restructure the key to the desired structure of the data in DynamoDB, using an object mapping rule.

    In this case, the DynamoDB table has the hash key that is a combination of the customerid and orderid columns, and the sort key is the productid column. However, the partition key should be decided by the user in an actual migration, based on data ingestion and access pattern. You would usually use high-cardinality attributes. For more information about how to choose the right DynamoDB partition key, see the Choosing the Right DynamoDB Partition Key AWS Database blog post.

    DMS automatically creates a corresponding attribute on the target DynamoDB table for the quantity column from the source table because rule-action is set to map-record-to-record and the column is not listed in the exclude-columns attribute list. For more information about map-record-to-record and map-record-to-document, see Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service.

    Migration starts immediately after the task is created, unless you clear the Start task on create option. I recommend enabling logging to make sure that you are informed about what is going on with the migration task in the background.

    The following screenshot shows the task creation page.

    You can use the console to specify the individual database tables to migrate and the schema to use for the migration, including transformations. On the Guided tab, use the Where section to specify the schema, table, and action (include or exclude). Use the Filter section to specify the column name in a table and the conditions to apply.

    Table mappings also can be created in JSON format. On the JSON tab, check Enable JSON editing.

    Here’s an example of an object mapping rule that determines where the source data is located in the target. If you copy the code, replace the values of the following attributes. For more examples, see Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service.

    • schema-name
    • table-name
    • target-table-name
    • mapping-parameters
    • attribute-mappings
    {
      "rules": [
       {
          "rule-type": "selection",
          "rule-id": "1",
          "rule-name": "1",
          "object-locator": {
            "schema-name": "mydatabase",
            "table-name": "purchase"
          },
          "rule-action": "include"
        },
        {
          "rule-type": "object-mapping",
          "rule-id": "2",
          "rule-name": "2",
          "rule-action": "map-record-to-record",
          "object-locator": {
            "schema-name": "mydatabase",
            "table-name": "purchase"
     
          },
          "target-table-name": "purchase",
          "mapping-parameters": {
            "partition-key-name": "customer_orderid",
            "sort-key-name": "productid",
            "exclude-columns": [
              "customerid",
              "orderid"           
            ],
            "attribute-mappings": [
              {
                "target-attribute-name": "customer_orderid",
                "attribute-type": "scalar",
                "attribute-sub-type": "string",
                "value": "${customerid}|${orderid}"
              },
              {
                "target-attribute-name": "productid",
                "attribute-type": "scalar",
                "attribute-sub-type": "string",
                "value": "${productid}"
              }
            ]
          }
        }
      ]
    }

    Start the migration task

    If the target table specified in the target-table-name property does not exist in DynamoDB, DMS creates the table according to data type conversion rules for source and target data types. There are many metrics to monitor the progress of migration. For more information, see Monitoring AWS Database Migration Service Tasks.

    The following screenshot shows example events and errors recorded by CloudWatch Logs.

    DMS replication instances that you used for the migration should be deleted once all migration processes are completed. Any CloudWatch logs data older than the retention period is automatically deleted.
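    If you prefer to script this step instead of using the console, a minimal boto3 sketch might look like the following. The task identifier and ARNs are placeholders, and table-mappings.json is assumed to contain the object mapping rules shown earlier.

    import boto3

    dms = boto3.client("dms", region_name="<your region>")

    with open("table-mappings.json") as f:   # the object mapping rules shown above
        table_mappings = f.read()

    task = dms.create_replication_task(
        ReplicationTaskIdentifier="mysql-to-dynamodb",
        SourceEndpointArn="<source MySQL endpoint ARN>",
        TargetEndpointArn="<target DynamoDB endpoint ARN>",
        ReplicationInstanceArn="<replication instance ARN>",
        MigrationType="full-load-and-cdc",   # full load plus ongoing change data capture
        TableMappings=table_mappings,
    )
    task_arn = task["ReplicationTask"]["ReplicationTaskArn"]

    # Wait until the task is ready, then start it. Progress can be followed with
    # describe_replication_tasks, in the console, or through CloudWatch metrics.
    dms.get_waiter("replication_task_ready").wait(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}])
    dms.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="start-replication",
    )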

    Option 2: Use EMR, Amazon Kinesis, and Lambda

    This section discusses an alternative option using EMR, Amazon Kinesis, and Lambda to provide more flexibility and precise control. If you have a MySQL replica in your environment, it would be better to dump data from the replica.

    Change the key design

    When you decide to change your database from an RDBMS to NoSQL, you need to find a key design that is more suitable for NoSQL, for performance as well as cost-effectiveness.

    Similar to option #1, assume that the MySQL source has a composite primary key (customerid + orderid + productid). However, for this option, group the MySQL records into fewer DynamoDB items by customerid (hash key) and orderid (sort key). Also, remove the last column (productid) of the composite key by converting the record values productid column in MySQL to the attribute name in DynamoDB, and setting the attribute value as quantity.

    This conversion method reduces the number of items. You can retrieve the same amount of information with fewer read capacity units, resulting in cost savings and better performance. For more information about how to calculate read/write capacity units, see Provisioned Throughput.
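    As a concrete, made-up illustration of this regrouping, the following snippet shows three MySQL rows for the same order collapsing into a single DynamoDB item:

    # Three MySQL rows that share customerid and orderid ...
    mysql_rows = [
        {"customerid": "customer1", "orderid": "order1", "productid": "product1", "quantity": 10},
        {"customerid": "customer1", "orderid": "order1", "productid": "product2", "quantity": 2},
        {"customerid": "customer1", "orderid": "order1", "productid": "product3", "quantity": 5},
    ]

    # ... collapse into one DynamoDB item: customerid is the hash key, orderid the
    # sort key, and each productid becomes an attribute name holding its quantity.
    item = {"customerid": "customer1", "orderid": "order1"}
    for row in mysql_rows:
        item[row["productid"]] = str(row["quantity"])

    print(item)
    # {'customerid': 'customer1', 'orderid': 'order1', 'product1': '10', 'product2': '2', 'product3': '5'}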

    Migration steps

    Option 2 has two paths for migration, performed at the same time:

    • Batch-puts: Export MySQL data, upload it to Amazon S3, and import into DynamoDB.
    • Real-time puts: Capture changed data in MySQL, send the insert/update/delete transaction to Amazon Kinesis Streams, and trigger the Lambda function to put data into DynamoDB.

    To keep the data consistent and maintain integrity, capturing and feeding data to Amazon Kinesis Streams should be started before the batch-puts process. The Lambda function should stand by, and the stream should retain the captured data until the batch-puts process on EMR finishes. Here’s the order:

    1. Start real-time puts to Amazon Kinesis Streams.
    2. As soon as real-time puts commences, start batch-puts.
    3. After batch-puts finishes, trigger the Lambda function to execute put_item from Amazon Kinesis Streams to DynamoDB.
    4. Change the application endpoints from MySQL to DynamoDB.

    Step 1:  Capture changing data and put into Amazon Kinesis Streams

    First, create an Amazon Kinesis stream to retain the transaction data from MySQL. Set the Data retention period value based on your estimate for the batch-puts migration process. For data integrity, the retention period should be long enough to hold all transactions until the batch-puts migration finishes. However, you do not necessarily need to select the maximum retention period; it depends on the amount of data to migrate.
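    For illustration, here is a minimal boto3 sketch of creating such a stream and extending its retention period. The stream name and the 72-hour retention value are assumptions, not values from this post; size both to your own change rate and migration window.

    import boto3

    kinesis = boto3.client("kinesis", region_name="<your region>")

    # Hypothetical stream name; one shard is enough for a modest change rate.
    kinesis.create_stream(StreamName="mysql-cdc-stream", ShardCount=1)
    kinesis.get_waiter("stream_exists").wait(StreamName="mysql-cdc-stream")

    # Extend retention beyond the 24-hour default so captured transactions survive
    # until the batch-puts migration on EMR finishes (168 hours is the maximum).
    kinesis.increase_stream_retention_period(
        StreamName="mysql-cdc-stream",
        RetentionPeriodHours=72,
    )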

    In the MySQL configuration, set binlog_format to ROW to capture transactions by using the BinLogStreamReader module. The log_bin parameter must be set as well to enable the binlog. For more information, see the Streaming Changes in a Database with Amazon Kinesis AWS Database blog post.

     

    [mysqld]
    secure-file-priv = ""
    log_bin=/data/binlog/binlog
    binlog_format=ROW
    server-id = 1
    tmpdir=/data/tmp

    The following sample code is a Python example that captures transactions and sends them to Amazon Kinesis Streams.

     

    #!/usr/bin/env python
    import json

    import boto3
    from pymysqlreplication import BinLogStreamReader
    from pymysqlreplication.row_event import (
      DeleteRowsEvent,
      UpdateRowsEvent,
      WriteRowsEvent,
    )
    
    def main():
      kinesis = boto3.client("kinesis")
    
      stream = BinLogStreamReader(
        connection_settings= {
          "host": "<host IP address>",
          "port": <port number>,
          "user": "<user name>",
          "passwd": "<password>"},
        server_id=100,
        blocking=True,
        resume_stream=True,
        only_events=[DeleteRowsEvent, WriteRowsEvent, UpdateRowsEvent])
    
      for binlogevent in stream:
        for row in binlogevent.rows:
          event = {"schema": binlogevent.schema,
          "table": binlogevent.table,
          "type": type(binlogevent).__name__,
          "row": row
          }
    
          kinesis.put_record(StreamName="<Amazon Kinesis stream name>", Data=json.dumps(event), PartitionKey="default")
          print json.dumps(event)
    
    if __name__ == "__main__":
    main()

    The following code is sample JSON data generated by the Python script. The type attribute defines the transaction recorded by that JSON record:

    • WriteRowsEvent = INSERT
    • UpdateRowsEvent = UPDATE
    • DeleteRowsEvent = DELETE
    {"table": "purchase_temp", "row": {"values": {"orderid": "orderidA1", "quantity": 100, "customerid": "customeridA74187", "productid": "productid1"}}, "type": "WriteRowsEvent", "schema": "test"}
    {"table": "purchase_temp", "row": {"before_values": {"orderid": "orderid1", "quantity": 1, "customerid": "customerid74187", "productid": "productid1"}, "after_values": {"orderid": "orderid1", "quantity": 99, "customerid": "customerid74187", "productid": "productid1"}}, "type": "UpdateRowsEvent", "schema": "test"}
    {"table": "purchase_temp", "row": {"values": {"orderid": "orderid100", "quantity": 1, "customerid": "customerid74187", "productid": "productid1"}}, "type": "DeleteRowsEvent", "schema": "test"}

    Step 2: Dump data from MySQL to DynamoDB

    The easiest way is to use DMS, which recently added Amazon S3 as a migration target. For an S3 target, both full load and CDC data is written to CSV format. However, CDC is not a good fit as UPDATE and DELETE statements are not supported. For more information, see Using Amazon S3 as a Target for AWS Database Migration Service.

    Another way to upload data to Amazon S3 is to use the INTO OUTFILE SQL clause and the aws s3 sync CLI command in parallel with your own script. The degree of parallelism depends on your server capacity and local network bandwidth. You might find a third-party tool useful, such as pt-archiver (part of the Percona Toolkit; see the appendix for details).

    SELECT * FROM purchase WHERE <condition_1>
    INTO OUTFILE '/data/export/purchase/1.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';
    SELECT * FROM purchase WHERE <condition_2>
    INTO OUTFILE '/data/export/purchase/2.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';
    ...
    SELECT * FROM purchase WHERE <condition_n>
    INTO OUTFILE '/data/export/purchase/n.csv' FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n';

    I recommend the aws s3 sync command for this use case. This command works internally with the S3 multipart upload feature. Pattern matching can exclude or include particular files. In addition, if the sync process crashes in the middle of processing, you do not need to upload the same files again. The sync command compares the size and modified time of files between local and S3 versions, and synchronizes only local files whose size and modified time are different from those in S3. For more information, see the sync command in the S3 section of the AWS CLI Command Reference.

    $ aws s3 sync /data/export/purchase/ s3://<your bucket name>/purchase/ 
    $ aws s3 sync /data/export/<other path_1>/ s3://<your bucket name>/<other path_1>/
    ...
    $ aws s3 sync /data/export/<other path_n>/ s3://<your bucket name>/<other path_n>/ 

    After all data is uploaded to S3, put it into DynamoDB. There are two ways to do this:

    • Use Hive with an external table
    • Write MapReduce code

    Hive with an external table

    Create a Hive external table against the data on S3 and insert it into another external table against the DynamoDB table, using the org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler property. To improve productivity and scalability, consider using Brickhouse, which is a collection of UDFs for Hive.

    The following sample code assumes that the Hive table for DynamoDB is created with the products column, which is of type ARRAY<STRING>. The productid and quantity columns are aggregated, grouped by customerid and orderid, and inserted into the products column with the CollectUDAF function provided by Brickhouse.

    hive> DROP TABLE purchase_ext_s3; 
    --- To read data from S3 
    hive> CREATE EXTERNAL TABLE purchase_ext_s3 (
    customerid string,
    orderid    string,
    productid  string,
    quantity   string) 
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
    LOCATION 's3://<your bucket name>/purchase/';
    
    hive> DROP TABLE purchase_ext_dynamodb;
    --- To connect to the DynamoDB table
    hive> CREATE EXTERNAL TABLE purchase_ext_dynamodb (
          customerid STRING, orderid STRING, products ARRAY<STRING>)
          STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' 
          TBLPROPERTIES ("dynamodb.table.name" = "purchase", 
          "dynamodb.column.mapping" = "customerid:customerid,orderid:orderid,products:products");
    
    --- Batch-puts to DynamoDB using Brickhouse 
    hive> add jar /<jar file path>/brickhouse-0.7.1-SNAPSHOT.jar ; 
    hive> create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
    hive> INSERT INTO purchase_ext_dynamodb 
    select customerid as customerid , orderid as orderid
           ,collect(concat(productid,':' ,quantity)) as products
          from purchase_ext_s3
          group by customerid, orderid; 

    Unfortunately, the MAP, LIST, BOOLEAN, and NULL data types are not supported by the DynamoDBStorageHandler class, so the ARRAY<STRING> data type has been chosen. The products column of ARRAY<STRING> data type in Hive is matched to the StringSet type attribute in DynamoDB. The sample code mainly shows how Brickhouse works, and is intended for those who want to aggregate multiple records into one StringSet type attribute in DynamoDB.
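    To make the resulting item shape concrete, here is a small, hypothetical boto3 sketch that reads one aggregated item back. The table name and key values mirror the Hive example above, but the snippet is not part of the original migration code.

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="<your region>")

    # The products attribute is a StringSet ("SS") whose members are the
    # "productid:quantity" strings built by the collect() UDAF above.
    response = dynamodb.get_item(
        TableName="purchase",
        Key={"customerid": {"S": "customer1"}, "orderid": {"S": "order1"}},
    )
    for entry in response["Item"]["products"]["SS"]:
        productid, quantity = entry.split(":")
        print(productid, quantity)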

    Python MapReduce with Hadoop Streaming

    A mapper task reads each record from the input data on S3 and maps input key-value pairs to intermediate key-value pairs. It divides the source data from S3 into two parts (a key part and a value part) delimited by a TAB character ("\t"). Mapper output is sorted by the intermediate key (customerid and orderid) and sent to the reducer. Records are put into DynamoDB in the reducer step.

    #!/usr/bin/env python
    import sys
     
    # get all lines from stdin
    for line in sys.stdin:
        line = line.strip()
        cols = line.split(',')
    # divide source data into a key part and an attribute (value) part
    # example output: "customer1,order1\tproduct1,10"
        print '%s,%s\t%s,%s' % (cols[0],cols[1],cols[2],cols[3] )

    Generally, the reduce task receives the output produced after map processing (which is key/list-of-values pairs) and then performs an operation on the list of values against each key.

    In this case, the reducer is written in Python and is based on STDIN/STDOUT/hadoop streaming. The enumeration data type is not available. The reducer receives data sorted and ordered by the intermediate key set in the mapper, customerid and orderid (cols[0],cols[1]) in this case, and stores all attributes for the specific key in the item_data dictionary. The attributes in the item_data dictionary are put, or flushed, into DynamoDB every time a new intermediate key comes from sys.stdin.

    #!/usr/bin/env python
    import sys
    import boto.dynamodb
     
    # create connection to DynamoDB
    current_keys = None
    conn = boto.dynamodb.connect_to_region( '<region>', aws_access_key_id='<access key id>', aws_secret_access_key='<secret access key>')
    table = conn.get_table('<dynamodb table name>')
    item_data = {}
    
    # input comes from STDIN emitted by Mapper
    for line in sys.stdin:
        line = line.strip()
        dickeys, items  = line.split('\t')
        products = items.split(',')
        if current_keys == dickeys:
           item_data[products[0]]=products[1]  
        else:
            if current_keys:
              try:
                  mykeys = current_keys.split(',') 
                  item = table.new_item(hash_key=mykeys[0],range_key=mykeys[1], attrs=item_data )
                  item.put() 
              except Exception ,e:
                  print 'Exception occurred! :', e.message,'==> Data:' , mykeys
            item_data = {}
            item_data[products[0]]=products[1]
            current_keys = dickeys
    
    # put last data
    if current_keys == dickeys:
       print 'Last one:' , current_keys #, item_data
       try:
           mykeys = dickeys.split(',')
           item = table.new_item(hash_key=mykeys[0] , range_key=mykeys[1], attrs=item_data )
           item.put()
       except Exception ,e:
           print 'Exception occurred! :', e.message, '==> Data:' , mykeys

    To run the MapReduce job, connect to the EMR master node and run a Hadoop streaming job. The hadoop-streaming.jar file location or name could be different, depending on your EMR version. Exception messages that occur while the reducers run are stored in the directory assigned to the --output option. Hash key and range key values are also logged to identify which data causes exceptions or errors.

    $ hadoop fs -rm -r s3://<bucket name>/<output path>
    $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
               -input s3://<bucket name>/<input path> -output s3://<bucket name>/<output path>\
               -file /<local path>/mapper.py -mapper /<local path>/mapper.py \
               -file /<local path>/reducer.py -reducer /<local path>/reducer.py

    In my migration experiment using the above scripts, with self-generated test data, I found the following results, including database size and the time taken to complete the migration.

    Server
    • MySQL instance: m4.2xlarge
    • EMR cluster: master 1 x m3.xlarge, core 2 x m4.4xlarge
    • DynamoDB: 2,000 write capacity units

    Data
    • Number of records: 1,000,000,000
    • Database file size (.ibc): 100.6 GB
    • CSV files size: 37 GB

    Performance (time)
    • Export to CSV: 6 min 10 sec
    • Upload to S3 (sync): 3 min 30 sec
    • Import to DynamoDB: depends on write capacity units

     

    The following screenshot shows the performance results by write capacity.

    Note that the performance results are flexible and can vary depending on server capacity, network bandwidth, degree of parallelism, conversion logic, programming language, and other conditions. All provisioned write capacity units are consumed by the MapReduce job for the data import, so the more you increase the size of the EMR cluster and the write capacity units of the DynamoDB table, the less time it takes to complete. Java-based MapReduce code would be more flexible, because it has full access to the MapReduce framework.

    Step 3: AWS Lambda function updates DynamoDB by reading data from Amazon Kinesis

    In the Lambda console, choose Create a Lambda function and the kinesis-process-record-python blueprint. Next, on the Configure triggers page, select the stream that you just created.

    The Lambda function must have an IAM role with permissions to read from Amazon Kinesis and put items into DynamoDB.
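    As a rough sketch only, an inline policy for that execution role might look like the following. The role name, policy name, and resource ARNs are placeholders; scope them to your own stream and table.

    import json

    import boto3

    iam = boto3.client("iam")

    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read records from the Amazon Kinesis stream
                "Effect": "Allow",
                "Action": ["kinesis:DescribeStream", "kinesis:GetShardIterator",
                           "kinesis:GetRecords", "kinesis:ListShards"],
                "Resource": "<your Kinesis stream ARN>",
            },
            {   # Read and write items in the target DynamoDB table
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
                "Resource": "<your DynamoDB table ARN>",
            },
            {   # Standard Lambda logging permissions
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream",
                           "logs:PutLogEvents"],
                "Resource": "*",
            },
        ],
    }

    iam.put_role_policy(
        RoleName="<your Lambda execution role>",
        PolicyName="mysql-to-dynamodb-migration",
        PolicyDocument=json.dumps(policy),
    )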

    The Lambda function can recognize the transaction type of the record by looking up the type attribute. The transaction type determines the method for conversion and update.

    For example, when a JSON record is passed to the function, the function looks up the type attribute. It also checks whether an existing item in the DynamoDB table has the same key as the incoming record. If so, the existing item must be retrieved and saved in a dictionary variable (item, in this case). Apply the new update information to the item dictionary before it is put back into the DynamoDB table. This prevents the existing item from being overwritten by the incoming record.

    from __future__ import print_function
    
    import base64
    import json
    import boto3
    
    print('Loading function')
    client = boto3.client('dynamodb')
    
    def lambda_handler(event, context):
        #print("Received event: " + json.dumps(event, indent=2))
        for record in event['Records']:
            # Amazon Kinesis data is base64-encoded so decode here
            payload = base64.b64decode(record['kinesis']['data'])
            print("Decoded payload: " + payload)
            data = json.loads(payload)
            
            # user logic for data triggered by WriteRowsEvent
            if data["type"] == "WriteRowsEvent":
                my_table = data["table"]
                my_hashkey = data["row"]["values"]["customerid"]
                my_rangekey = data["row"]["values"]["orderid"]
                my_productid = data["row"]["values"]["productid"]
                my_quantity = str( data["row"]["values"]["quantity"] )
                try:
                    response = client.get_item( Key={'customerid':{'S':my_hashkey} , 'orderid':{'S':my_rangekey}} ,TableName = my_table )
                    if 'Item' in response:
                        item = response['Item']
                        item[data["row"]["values"]["productid"]] = {"S":my_quantity}
                        result1 = client.put_item(Item = item , TableName = my_table )
                    else:
                        item = { 'customerid':{'S':my_hashkey} , 'orderid':{'S':my_rangekey} , my_productid :{"S":my_quantity}  }
                        result2 = client.put_item( Item = item , TableName = my_table )
                except Exception, e:
                    print( 'WriteRowsEvent Exception ! :', e.message  , '==> Data:' ,data["row"]["values"]["customerid"]  , data["row"]["values"]["orderid"] )
            
            # user logic for data triggered by UpdateRowsEvent
            if data["type"] == "UpdateRowsEvent":
                my_table = data["table"]
                
            # user logic for data triggered by DeleteRowsEvent    
            if data["type"] == "DeleteRowsEvent":
                my_table = data["table"]
                
                
        return 'Successfully processed {} records.'.format(len(event['Records']))

    Step 4:  Switch the application endpoint to DynamoDB

    Application codes need to be refactored when you change from MySQL to DynamoDB. The following simple Java code snippets focus on the connection and query part because it is difficult to cover all cases for all applications. For more information, see Programming with DynamoDB and the AWS SDKs.

    Query to MySQL

    The following sample code shows a common way to connect to MySQL and retrieve data.

    import java.sql.* ;
    ...
    try {
        Connection conn = DriverManager.getConnection("jdbc:mysql://<host name>/<database name>" , "<user>" , "<password>");
        Statement stmt = conn.createStatement();
        String sql = "SELECT quantity as quantity FROM purchase WHERE customerid = '<customerid>' and orderid = '<orderid>' and productid = '<productid>'";
        ResultSet rs = stmt.executeQuery(sql);
    
        while(rs.next()){
           int quantity = rs.getInt("quantity");        // Retrieve by column name
           System.out.print("quantity: " + quantity);   // Display values
        }
    } catch (SQLException ex) {
        // handle any errors
        System.out.println("SQLException: " + ex.getMessage());}
    ...
    ==== Output ====
    quantity: 1

    Query to DynamoDB

    To retrieve items from DynamoDB, follow these steps:

    1. Create an instance of the DynamoDB class.
    2. Create an instance of the Table class.
    3. Add the withHashKey and withRangeKeyCondition methods to an instance of QuerySpec.
    4. Execute the query method with the QuerySpec instance created previously. Items are retrieved in JSON format, so use the getJSON method to look up a specific attribute of an item.
    ...
    DynamoDB dynamoDB = new DynamoDB( new AmazonDynamoDBClient(new ProfileCredentialsProvider()));
    
    Table table = dynamoDB.getTable("purchase");
    
    QuerySpec querySpec = new QuerySpec()
            .withHashKey("customerid" , "customer1")  // hashkey name and its value 
            .withRangeKeyCondition(new RangeKeyCondition("orderid").eq("order1") ) ; // Range key and its condition value
    
    ItemCollection<QueryOutcome> items = table.query(querySpec); 
    
    Iterator<Item> iterator = items.iterator();          
    while (iterator.hasNext()) {
        Item item = iterator.next();
        System.out.println("quantity: " + item.getJSON("product1"));  // product1 attribute holds the quantity
    }
    ...
    ==== Output ====
    quantity:1

    Conclusion

    In this post, I introduced two options for seamlessly migrating data from MySQL to DynamoDB and minimizing downtime during the migration. Option #1 used DMS, and option #2 combined EMR, Amazon Kinesis, and Lambda. I also showed you how to convert the key design in accordance with database characteristics to improve read/write performance and reduce costs. Each option has advantages and disadvantages, so the best option depends on your business requirements.

    The sample code in this post is not a complete, efficient, and reliable data migration code base that can be reused across many different environments. Use it to get started, but design for the other variables in your actual migration.

    I hope this post helps you plan and implement your migration and minimizes service outages. If you have questions or suggestions, please leave a comment below.

    Appendix

    To install the Percona Toolkit:

    # Install Percona Toolkit

    $ wget https://www.percona.com/downloads/percona-toolkit/3.0.2/binary/redhat/6/x86_64/percona-toolkit-3.0.2-1.el6.x86_64.rpm

    $ yum install perl-IO-Socket-SSL

    $ yum install perl-TermReadKey

    $ rpm -Uvh percona-toolkit-3.0.2-1.el6.x86_64.rpm

    # run pt-archiver

    Example command:

    $ pt-archiver --source h=localhost,D=blog,t=purchase --file '/data/export/%Y-%m-%d-%D.%t' --where "1=1" --limit 10000 --commit-each

     


    About the Author

    Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

     

     

     



    AFL experiments, or please eat your brötli

    Post Syndicated from Michal Zalewski original http://lcamtuf.blogspot.com/2017/04/afl-experiments-or-please-eat-your.html

    When messing around with AFL, you sometimes stumble upon something unexpected or amusing. Say,
    having the fuzzer spontaneously synthesize JPEG files,
    come up with non-trivial XML syntax,
    or discover SQL semantics.

    It is also fun to challenge yourself to employ fuzzers in non-conventional ways. Two canonical examples are having your fuzzing target call abort() whenever two libraries that are supposed to implement the same algorithm produce different outputs when given identical input data; or when a library produces different outputs when asked to encode or decode the same data several times in a row.

    Such tricks may sound fanciful, but they actually find interesting bugs. In one case, AFL-based equivalence fuzzing revealed a
    bunch of fairly rudimentary flaws in common bignum libraries,
    with some theoretical implications for crypto apps. Another time, output stability checks revealed long-lived issues in
    IJG jpeg and other widely-used image processing libraries, leaking
    data across web origins.

    In one of my recent experiments, I decided to fuzz
    brotli, an innovative compression library used in Chrome. But since it’s been
    already fuzzed for many CPU-years, I wanted to do it with a twist:
    stress-test the compression routines, rather than the usually targeted decompression side. The latter is a far more fruitful
    target for security research, because decompression normally involves dealing with well-formed inputs, whereas compression code is meant to
    accept arbitrary data and not think about it too hard. That said, the low likelihood of flaws also means that the compression bits are a relatively unexplored surface that may be worth
    poking with a stick every now and then.

    In this case, the library held up admirably – save for a handful of computationally intensive plaintext inputs
    (that are now easy to spot due to the recent improvements to AFL).
    But the output corpus synthesized by AFL, after being seeded with just a single file containing “0”, featured quite a few peculiar finds:

    • Strings that looked like viable bits of HTML or XML:
      <META HTTP-AAA IDEAAAA,
      DATA="IIA DATA="IIA DATA="IIADATA="IIA,
      </TD>.

    • Non-trivial numerical constants:
      1000,1000,0000000e+000000,
      0,000 0,000 0,0000 0x600,
      0000,$000: 0000,$000:00000000000000.

    • Nonsensical but undeniably English sentences:
      them with them m with them with themselves,
      in the fix the in the pin th in the tin,
      amassize the the in the in the [email protected] in,
      he the themes where there the where there,
      size at size at the tie.

    • Bogus but semi-legible URLs:
      CcCdc.com/.com/m/ /00.com/.com/m/ /00(0(000000CcCdc.com/.com/.com

    • Snippets of Lisp code:
      )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))).

    The results are quite unexpected, given that they are just a product of randomly mutating a single-byte input file and observing the code coverage in a simple compression tool. The explanation is that brotli, in addition to more familiar binary coding methods, uses a static dictionary constructed by analyzing common types of web content. Somehow, by observing the behavior of the program, AFL was able to incrementally reconstruct quite a few of these hardcoded keywords – and then put them together in various semi-interesting ways. Not bad.