Tag Archives: What’s the Diff?

What’s the Diff: Bandwidth vs. Throughput

Post Syndicated from Vinodh Subramanian original https://backblazeprod.wpenginepowered.com/blog/whats-the-diff-bandwidth-vs-throughput/

A decorative image showing a pipe with water coming out of one end. In the center, the words "What's the Diff" are in a circle. On the left side of the image, brackets indicate that the pipe represents bandwidth, while on the right, brackets indicate that the amount of water flowing through the pipe represents throughput.

You probably wouldn’t buy a car without knowing its horsepower. The metric might not matter as much to you as things like fuel efficiency, safety, or spiffy good looks. It might not even matter at all, but it’s still something you want to know before driving off the lot.

Similarly, you probably wouldn’t buy cloud storage without knowing a little bit about how it performs. Whether you need the metaphorical Ferrari of cloud providers, the safety features of a Volvo, or the towing capacity of a semitruck, understanding how each performs can significantly impact your cloud storage decisions. And to understand cloud performance, you have to understand the difference between bandwidth and throughput.

In this blog, I’ll explain what bandwidth and throughput are and how they differ, as well as other key concepts like threading, multi-threading, and throttling, all of which add complexity (and potential confusion) to a cloud storage decision and affect the efficiency of data transfers.

Bandwidth, Throughput, and Latency: A Primer

Three critical components form the cornerstone of cloud performance: bandwidth, throughput, and latency. To understand their impact, compare the flow of data to water moving through a pipe—an analogy that paints a visual picture of how data travels across a network.

  • Bandwidth: The diameter of the pipe represents bandwidth. It’s the maximum width that dictates how much water (data) can flow through it at any given time. In technical terms, bandwidth is the maximum data transfer rate that a network connection can support, usually measured in bits per second (bps). A wider pipe (higher bandwidth) means more data can flow, similar to a multi-lane road where more vehicles can travel side by side.
  • Throughput: If bandwidth is the pipe’s width, then throughput is the rate at which water actually moves through the pipe. In the context of data, throughput is the actual rate at which data is successfully transferred over a network, also measured in bits per second (bps). Various factors affect throughput, such as network traffic, processing power, and packet loss. While bandwidth is the potential capacity, throughput is the reality of performance, which is often less than the theoretical maximum due to real-world constraints.
  • Latency: Now, consider the time it takes for water to start flowing from the pipe’s opening after the tap is turned on. That delay is latency. It’s the time it takes for a packet of data to travel from the source to the destination. Latency is crucial in use cases where time is of the essence, and even a slight delay can be detrimental to the user experience.

Understanding how bandwidth, throughput, and latency are interrelated is vital for anyone relying on cloud storage services. Bandwidth sets the stage for potential performance, but it’s the throughput that delivers actual results. Meanwhile, latency measures how long it takes for data to reach the end user.
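
Want to see the difference for yourself? Here’s a quick Python sketch that times a download and computes your observed throughput. (The URL is a hypothetical placeholder; swap in any file you have the rights to download.) The number you get will almost always come in below your nominal bandwidth, for exactly the real-world reasons above.

    import time
    import urllib.request

    # Hypothetical placeholder URL; point this at any file you can download.
    URL = "https://example.com/testfile.bin"

    start = time.monotonic()
    with urllib.request.urlopen(URL) as response:
        data = response.read()
    elapsed = time.monotonic() - start

    # Throughput is measured in bits per second, so convert bytes to bits.
    bits_transferred = len(data) * 8
    print(f"Transferred {len(data):,} bytes in {elapsed:.2f} seconds")
    print(f"Observed throughput: {bits_transferred / elapsed / 1e6:.1f} Mbps")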

Threading and Multi-Threading in Cloud Storage

When we talk about moving data in the cloud, two concepts often come up: threading and multi-threading. These might sound very technical, but they’re actually pretty straightforward once broken down into simpler terms. 

First of all, threads go by many different names. Different applications may refer to them as streams, concurrent threads, parallel threads, concurrent uploads, parallelism, etc. But in cloud storage, all of these terms refer to the pathways used to move your files. To understand threads, think of a big pipe with a bunch of garden hoses running through it. Each garden hose is a single thread in our pipe analogy: an individual pathway that carries water (your data) from one point to another, say from your computer to the cloud or vice versa.

Cloud storage systems use sophisticated algorithms to manage and prioritize threads. This ensures that resources are allocated efficiently to optimize data flow. Threads can be prioritized based on various criteria such as the type of data being transferred, network conditions, and overall load on the system.

Multi-Threading

Now, imagine that instead of just one garden hose within the pipe, you have several running in parallel. This setup is multi-threading. In the context of cloud storage, multi-threading enables the simultaneous transfer of multiple data streams, significantly speeding up uploads and downloads.

Cloud storage takes advantage of multi-threading. It can take pretty much as many threads as you can throw at it, and its performance should scale accordingly. But it doesn’t do so automatically, because the effectiveness of multi-threading depends on the underlying network infrastructure and on the software’s ability to manage multiple threads efficiently.

Chances are your device can’t handle, or take advantage of, the maximum number of threads cloud storage can accept, since each additional thread puts more load on your network and device. It often takes a trial-and-error approach to find the sweet spot that delivers optimal performance without severely affecting the usability of your device.

Managing Thread Count

Certain applications automatically manage threading and adjust the number of threads for optimal performance. When you’re using cloud storage with an integration like backup software or a network attached storage (NAS) device, the multi-threading setting is typically found in the integration’s settings. 

Many backup tools, like Veeam, are already set to multi-thread by default. However, some applications might default to using a single thread unless manually configured otherwise. 

That said, there are limitations associated with managing multiple threads. The gains from increasing the number of threads are limited by the bandwidth, processing power, and memory. Additionally, not all tasks are suitable for multi-threading; some processes need to be executed sequentially to maintain data integrity and dependencies between tasks. 
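
To make this concrete, here’s a minimal Python sketch of multi-threaded uploads using the standard library’s thread pool. The upload function body and the "backups" folder are placeholders (in practice, you’d call your storage provider’s SDK there), but the tunable worker count is exactly the trial-and-error knob described above.

    import concurrent.futures
    import pathlib

    def upload(path: pathlib.Path) -> str:
        # Placeholder: call your storage provider's SDK or API here.
        return path.name

    files = list(pathlib.Path("backups").glob("*"))

    # The worker count is the knob to tune: raise it until throughput
    # stops improving or your device and network start to struggle.
    THREADS = 8

    with concurrent.futures.ThreadPoolExecutor(max_workers=THREADS) as pool:
        for name in pool.map(upload, files):
            print(f"uploaded {name}")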

A diagram showing the differences between single and multi-threading.
Learn more about threads in our deep dive.

In essence, threading is about creating a pathway for your data and multi-threading is about creating multiple pathways to move more data at the same time. This makes storing and accessing files in the cloud much faster and more efficient. 

The Role of Throttling

Throttling is the deliberate slowing down of internet speed by service providers. In the pipe analogy, it’s similar to turning down the water flow from a faucet. Service providers use throttling to manage network traffic and prevent the system from becoming overloaded. By controlling the flow, they ensure that no single user or application monopolizes the bandwidth.
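
Providers implement throttling in different ways, but a classic illustration of the idea is a token bucket: tokens refill at a steady rate, and sending data spends them. Here’s a minimal Python sketch (illustrative only, not any particular provider’s implementation):

    import time

    class TokenBucket:
        """Illustrative token bucket: refills at `rate` tokens per second
        and holds at most `capacity` tokens (the allowed burst size)."""
        def __init__(self, rate: float, capacity: float):
            self.rate = rate
            self.capacity = capacity
            self.tokens = capacity
            self.last = time.monotonic()

        def consume(self, amount: float) -> None:
            """Block until `amount` tokens are available, then spend them."""
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= amount:
                    self.tokens -= amount
                    return
                # Sleep just long enough for the missing tokens to refill.
                time.sleep((amount - self.tokens) / self.rate)

    # Cap a simulated transfer at ~1MB/s: one token per byte.
    bucket = TokenBucket(rate=1_000_000, capacity=1_000_000)
    for chunk in range(5):
        bucket.consume(256 * 1024)  # "send" a 256KB chunk
        print(f"chunk {chunk} sent")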

Why Do Cloud Service Providers Throttle?

The primary reason cloud service providers would throttle is to maintain an equitable distribution of network resources. During peak usage times, networks can become congested, much like roads during rush hour. Throttling helps manage these peak loads, ensuring all users have access to the network without significant drops in quality or service. It’s a balancing act, aiming to provide a steady, reliable service to as many users as possible. 

Scenarios Where Throttling Can Be a Hindrance

While throttling aims to manage network traffic for fairness purposes, it can be frustrating in certain situations. For heavy data users, such as businesses that rely on real-time data access and media teams uploading and downloading large files, throttling can slow operations and impact productivity. Additionally, for services not directly causing any congestion, throttling can seem unnecessary and restrictive. 

Do CSPs Have to Throttle?

As a quick plug, Backblaze does not throttle, so customers can take advantage of all their bandwidth while uploading to B2 Cloud Storage. Many other public cloud storage providers do throttle, although they certainly may not make it widely known. If you’re considering a cloud storage provider and your use case demands high throughput or fast transfer times, it’s smart to ask the question upfront.

Optimizing Cloud Storage Performance

Achieving optimal performance in cloud storage involves more than just selecting a service; it requires a clear understanding of how bandwidth, throughput, latency, threading, and throttling interact and affect data transfer. Tailoring these elements to your specific needs can significantly enhance your cloud storage experience.

  • Balancing bandwidth, throughput, and latency: The key to optimizing cloud performance lies in your use case. For real-time applications like video conferencing or gaming, low latency is crucial, whereas for backup use cases, high throughput might be more important. Assessing the types and sizes of the files you’re transferring, along with using content delivery networks (CDNs), can help you achieve peak performance.
  • Effective use of threading and multi-threading: Utilizing multi-threading effectively means understanding when it can be beneficial and when it might lead to diminishing returns. For large file transfers, multi-threading can significantly reduce transfer times. However, for smaller files, the overhead of managing multiple threads might outweigh the benefits. Using tools that automatically adjust the number of threads based on file size and network conditions can offer the best of both worlds.
  • Navigating throttling for optimal performance: When selecting a cloud storage provider (CSP), it’s crucial to consider their throttling policies. Providers vary in how and when they throttle data transfer speeds, affecting performance. Understanding these policies upfront can help you choose a provider that aligns with your performance needs. 

In essence, optimizing cloud storage performance is an ongoing process of adjustment and adaptation. By carefully considering your specific needs, experimenting with settings, and staying informed about your provider’s policies, you can maximize the efficiency and effectiveness of your cloud storage solutions.

The post What’s the Diff: Bandwidth vs. Throughput appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: RAM vs. Storage

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/whats-diff-ram-vs-storage/

A decorative image showing a RAM chip and a hard drive with the words What's the Diff in the center.
Editor’s Note: This post was originally published in 2016 and has since been updated in 2022 and 2023 with the latest information on RAM vs. storage.

Memory is a finite resource for both humans and computers, and running low on it is one of the most common causes of computer issues. And if you’ve ever left the house without your keys, you know memory is one of the most common human problems, too.

If you’re unclear about the different types of memory in your computer, it makes pinpointing the cause of computer problems that much harder. You might hear folks use the terms memory and storage interchangeably, but there are some important differences. Understanding how both components work can help you understand what kind of computer you need, diagnose problems you’re having, and know when it’s time to consider upgrades. 

The Difference Between RAM and Storage

Random access memory (RAM) and storage are both forms of computer memory, but they serve different functions. 

What Is RAM?

RAM is volatile memory used by the computer’s processor to store and quickly access data that is actively being used or processed. Volatile memory maintains data only while the device is powered on. RAM takes the form of computer chips—integrated circuits—that are either soldered directly onto the main logic board of your computer or installed in memory modules that go in sockets on your computer’s logic board.

You can think of it like a desk—it’s where your computer gets work done. When you double-click on an app, open a document, or do much of anything, part of your “desk” is covered and can’t be used by anything else. As you open more files, it is like covering your desk with more and more items. Using a desk with a handful of files is easy, but a desk that is covered with a bunch of stuff gets difficult to use.

What Is Computer Storage?

On the other hand, storage is used for long-term data retention, like a hard disk drive (HDD) or solid state drive (SSD). Compared with RAM, this type of storage is non-volatile, which means it retains information even when a device is powered off. You can think of storage like a filing cabinet—a place next to your desk where you can retrieve information as needed. 

RAM vs. Storage: How Do They Compare?

Speed and Performance

Two of the primary differences between RAM and storage are speed and performance. RAM is significantly faster than storage. Data stored in RAM can be written and accessed almost instantly, so it’s very fast—nanoseconds fast. DDR4 RAM, one of the newer types of RAM technology, is capable of a peak transfer rate of 25.6GB/s! RAM has a very fast path to the computer’s central processing unit (CPU), the brain of the computer that does most of the work.

Storage, as it’s slower in comparison, is responsible for holding the operating system (OS), applications, and user data for the long term—it should still be fast, but it doesn’t need to be as fast as RAM.

That said, computer storage is getting faster thanks to the popularity of SSDs. SSDs are much faster than hard drives since they use integrated circuits instead of spinning mechanical platters and moving read/write heads. SSDs store data in non-volatile flash memory circuitry, so data stays in place even when the computer is turned off.

Even though SSDs are faster than HDDs, they’re still slower than RAM. There are two reasons for that difference in speed. First, the memory chips in SSDs are slower than those in RAM. Second, there is a bottleneck created by the interface that connects the storage device to the computer. RAM, in comparison, has a much faster interface.

Capacity and Size

RAM is typically smaller in capacity compared to storage. It is measured in gigabytes (GB) or terabytes (TB), whereas storage capacities can reach multiple terabytes or even petabytes. The smaller size of RAM is intentional, as it is designed to store only the data currently in use, ensuring quick access for the processor.

Volatility and Persistence

Another key difference is the volatility of RAM and the persistence of storage. RAM is volatile, meaning it loses its data when the power is turned off or the system is rebooted. This makes it ideal for quick data access and manipulation, but unsuitable for long-term storage. Storage is non-volatile or persistent, meaning it retains data even when the power is off, making it suitable for holding files, applications, and the operating system over extended periods.

How Much RAM Do I Have?

Understanding how much RAM you have might be one of your first steps for diagnosing computer performance issues. 

Use the following steps to confirm how much RAM your computer has installed. We’ll start with an Apple computer. Click on the Apple menu and then click About This Mac. In the screenshot below, we can see that the computer has 16GB of RAM.

A screenshot of the Mac system screen that shows a computer summary with total RAM.
How much RAM on macOS (Apple menu > About This Mac).

With a Windows 11 computer, use the following steps to see how much RAM you have installed. Open the Control Panel by clicking the Windows button and typing “control panel,” then click System and Security, and then click System. Look for the line “Installed RAM.” In the screenshot below, you can see that the computer has 32GB of RAM installed.

A screenshot from a Windows computer showing installed RAM.
How much RAM on Windows 11 (Control Panel > System and Security > System).
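
If you’d rather check from a script, the cross-platform psutil package (a third-party library; install it with pip install psutil) reports the same numbers on macOS, Windows, and Linux:

    import psutil  # third-party: pip install psutil

    mem = psutil.virtual_memory()
    print(f"Installed RAM: {mem.total / 1024**3:.1f} GB")
    print(f"Available:     {mem.available / 1024**3:.1f} GB "
          f"({100 - mem.percent:.0f}% free)")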

How Much Computer Storage Do I Have?

To view how much free storage space you have available on a Mac computer, use these steps. Click on the Apple menu, then System Settings, select General, and then open Storage. In the screenshot below, we’ve circled where your available storage is displayed.

A screenshot from a Mac showing total storage and usage.
Disk space on macOS (Apple Menu > System Settings > General > Storage).

With a Windows 11 computer, it is also easy to view how much available storage space you have. Click the Windows button and type in “file explorer.” When File Explorer opens, click on This PC from the list of options in the left-hand pane. In the screenshot below, we’ve circled where your available storage is displayed (in this case, 200GB).

A screenshot from a Windows computer showing available and used storage.
Disk space on Windows 11 (File Explorer > This PC).
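
You can also check disk space from a script. Python’s standard library handles this on any OS; just point it at the drive you care about:

    import shutil

    # Use "/" on macOS and Linux; use "C:\\" on Windows.
    total, used, free = shutil.disk_usage("/")
    gb = 1024**3
    print(f"Total: {total / gb:.0f} GB  "
          f"Used: {used / gb:.0f} GB  "
          f"Free: {free / gb:.0f} GB")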

How RAM and Storage Affect Your Computer’s Performance

RAM

For most general-purpose uses of computers—email, writing documents, surfing the web, or watching Netflix—the RAM that comes with your computer is enough. If you own your computer for a long enough time, you might need to add a bit more to keep up with memory demands from newer apps and OSes. Specifically, more RAM makes it possible for you to use more apps, documents, and larger files at the same time.

People that work with very large files like large databases, videos, and images can benefit significantly from having more RAM. If you regularly use large files, it is worth checking to see if your computer’s RAM is upgradeable.

Adding More RAM to Your Computer

In some situations, adding more RAM is worth the expense. For example, editing videos and high-resolution images takes a lot of memory. In addition, high-end audio recording and editing as well as some scientific work require significant RAM.

However, not all computers allow you to upgrade RAM. For example, Chromebooks typically have a fixed amount of RAM, and you cannot install more. So, when you’re buying a new computer (particularly if you plan on using it for more than five years), make sure you 1) know how much RAM the computer has, and 2) know whether you can upgrade the RAM later.

When your computer’s RAM is filled up, your computer has to get creative to keep working. Specifically, your computer starts to temporarily use your hard drive or SSD as “virtual memory.” If you have relatively fast storage like an SSD, virtual memory will be fast. On the other hand, using a traditional hard drive will be fairly slow.

Storage

Besides RAM, the most serious bottleneck to improving performance in your computer can be your storage. Even with plenty of RAM installed, computers need to read and write information from the storage system (i.e., the HDD or the SSD).

Hard drives come in different speeds and sizes. For laptops and desktops, the most common spindle speeds are 5400 and 7200 RPM. In some cases, you might even decide to use a 10,000 RPM drive. Faster drives cost more, are louder, have greater cooling needs, and use more power, but they may be a good option.

New disk technologies enable hard drives to be bigger and faster. These technologies include filling the drive with helium instead of air to reduce disk platter friction and using heat or microwaves to improve disk density, such as with heat-assisted magnetic recording (HAMR) drives and microwave-assisted magnetic recording (MAMR) drives.

Today, SSDs are becoming increasingly popular for computer storage. They’re popular because they’re faster, cooler, and take up less space than traditional hard drives. They’re also less susceptible to magnetic fields and physical jolts, which makes them great for laptops.

For more about the difference between HDDs and SSDs, check out our post, “Hard Disk Drive (HDD) vs. Solid-state Drive (SSD): What’s the Diff?”

Adding More Computer Storage

As a user’s disk storage needs increase, typically they will look to larger drives to store more data. The first step might be to replace an existing drive with a larger, faster drive. Or you might decide to install a second drive. One approach is to use different drives for different purposes. For example, use an SSD for the operating system, and then store your business videos on a larger SSD.

If more storage space is needed, you can also use an external drive, most often using USB or Thunderbolt to connect to the computer. This can be a single drive or multiple drives and might use a data storage virtualization technology such as RAID to protect the data.

If you have really large amounts of data, or simply wish to make it easy to share data with others in your location or elsewhere, you might consider network-attached storage (NAS). A NAS device can hold multiple drives, typically uses a data virtualization technology like RAID, and is accessible to anyone on your local network and—if you wish—on the internet, as well. NAS devices can offer a great deal of storage and other services that typically have been offered only by dedicated network servers in the past.

Back Up Early and Often

As a cloud storage company, we’d be remiss not to mention that you should back up your computer. No matter how you configure your computer’s storage, remember that technology can fail (we know a thing or two about that). You always want a backup so you can restore everything easily. The best backup strategy shouldn’t be dependent on any single device, either. Your backup strategy should always include three copies of your data on two different mediums with one off-site.

FAQs About Differences Between RAM and Storage

What is the difference between RAM and internal storage?

Internal storage is a method of data storage that writes data to a disk, holding onto that data until it’s erased. Think of it as your computer’s brain. RAM is a method of communicating data between your device’s CPU and its internal storage. Think of it as your brain’s short-term memory and ability to multi-task. The data the RAM receives is volatile, so it will only last until it’s no longer needed, usually when you turn off the power or reset the computer.

Is it better to have more RAM or more storage?

If you’re looking for better PC performance, you can upgrade either RAM or storage for a boost in performance. More RAM will make it easier for your computer to perform multiple tasks at once, while upgrading your storage will improve battery life, make it faster to open applications and files, and give you more space for photos and applications. This is especially true if you’re switching your storage from a hard disk drive (HDD) to a solid state drive (SSD).

Does RAM give you more storage?

More RAM does not provide you with more free space. If your computer is giving you notifications that you’re getting close to running out of storage or you’ve already started having to delete files to make room for new ones, you should upgrade the internal storage, not the RAM.

Are memory and storage the same?

Memory and storage are not the same thing, even though the words are often used interchangeably. Memory is another term for RAM.

The post What’s the Diff: RAM vs. Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: NAS vs. SAN

Post Syndicated from Vinodh Subramanian original https://www.backblaze.com/blog/whats-the-diff-nas-vs-san/

A diagram showing how NAS vs. SAN store data on a network.

The terms NAS and SAN can be confusing—the technology is similar and, making matters worse, the acronyms are the reverse of each other. NAS stands for network attached storage and SAN stands for storage area network. They were both developed to solve the problem of making stored data available to many users at once. But, they couldn’t be more different in how they achieve that goal.

The TL;DR:

  • NAS is a single storage device that serves files over ethernet and is relatively inexpensive. NAS devices are easier for a home user or small business to set up.
  • A SAN is a tightly coupled network of multiple devices that is more expensive and complex to set up and manage. A SAN is better suited for larger businesses and requires administration by IT staff. 

Read on and we’ll dissect the nuances of NAS and SANs to help you make informed decisions about which solution best suits your storage needs.

Check Out Our New Technical Documentation Portal

When you’re working on a storage project, you need to be able to find instructions about the tools you’re using quickly. And, it helps if those instructions are easy to use, easy to understand, and easy to share. Our Technical Documentation Portal has been completely overhauled to deliver on-demand content in a user-friendly way so you can find the information you need. Check out the NAS section, including all of our Integration Guides.

Basic Definitions: What Is NAS?

NAS is a device or devices with a large data storage capacity that provides file-based data storage services to other devices on a network. Usually, they also have a client or web portal interface that’s easy to navigate, as well as services like QNAP’s Hybrid Backup Sync or Synology’s Hyper Backup to help manage your files. In other words, NAS is synonymous with user-friendly file sharing. 

A photo of a Synology NAS device.
NAS with eight drive bays for 3.5″ disk drives.

At its core, NAS operates as a standalone device connected to a network, offering shared access to files and folders. NAS volumes appear to the user as network-mounted volumes. The files to be served are typically contained on one or more hard drives in the system, often arranged in RAID arrays. Generally, the more drive bays available within the NAS, the larger and more flexible storage options you have.

Key Characteristics of NAS:

  • File-Level Access: NAS provides file-level access, ideal for environments where collaborative work and content sharing are paramount.
  • Simplicity: NAS solutions offer straightforward setups and intuitive interfaces, making them accessible to users with varying levels of technical expertise.
  • Scalability: While NAS devices can be expanded by adding more drives, there may be limitations in terms of performance and scalability for large-scale enterprise use.

How NAS Works

The NAS device itself is a network node—much like computers and other TCP/IP devices, all of which maintain their own IP address—and the NAS file service uses the ethernet network to send and receive files. This system employs protocols like network file system (NFS) and server message block (SMB), enabling seamless data exchange between multiple users.

A diagram showing how a NAS stores information on a network. A NAS device is at the starting point, flowing into a network switch, then out to network connected clients (computers).
The NAS system and clients connect via your local network—all file service occurs via ethernet.

Benefits of NAS

NAS devices are designed to be easy to manage, making them a popular choice for home users, small businesses, and departments seeking straightforward centralized storage. They offer an easy way for multiple users in multiple locations to access data, which is valuable when users are collaborating on projects or need to share information. 

For individual home users, if you’re currently using external hard drives or direct attached storage, which can be vulnerable to drive failure, upgrading to a NAS ensures your data is better protected.  

For small business or departments, installing NAS is typically driven by the desire to share files locally and remotely, have files available 24/7, achieve data redundancy, have the ability to replace and upgrade hard drives in the system, and most importantly, support integrations with cloud storage that provide a location for necessary automatic data backups.

NAS offers robust access controls and security mechanisms to facilitate collaborative efforts. Moreover, it empowers non-technical individuals to oversee and manage data access through an embedded web server. Its built-in redundancy, often achieved through RAID configurations, ensures solid data resilience. This technology merges multiple drives into a cohesive unit, mimicking a single, expansive volume capable of withstanding the failure of a subset of its constituent drives.

Download Our Complete NAS Guide ➔ 

Summary of NAS Benefits:

  • Relatively inexpensive.
  • A self-contained solution.
  • Easy administration.
  • Remote data availability and 24/7 access.
  • Wide array of systems and sizes to choose from.
  • Drive failure-tolerant storage volumes.
  • Automatic backups to other devices and the cloud.

Limitations of NAS

The weaknesses of NAS primarily revolve around scalability and performance. If more users need access, the server might struggle to keep pace. If you overprovisioned your NAS, you may be able to add storage. But sooner or later you’ll need to upgrade to a more powerful system with a bigger on-board processor, more memory, and faster and larger network connections. 

Another drawback ties back to ethernet’s inherent nature. Ethernet divides data into packets, forwarding them to their destination. Yet, depending on network traffic or other issues, potential delays or disorder in packet transmission can hinder file availability until all packets arrive and are put back in order. 

Although minor latency (slowness) is not usually noticed by users for small files, in data-intensive domains like video production, where large files are at play, even milliseconds of latency can disrupt operations, particularly video editing workflows.

Basic Definitions: What Is a SAN?

On the other end of the spectrum, SANs are engineered for high-performance and mission-critical applications. They function by connecting multiple storage devices, such as disk arrays or tape libraries, to a dedicated network that is separate from the main local area network (LAN). This isolation ensures that storage traffic doesn’t interfere with regular network traffic, leading to optimized performance and data availability.

Unlike NAS, a SAN operates at the block level, allowing servers to access storage blocks directly. This architecture is optimized for data-intensive tasks like database management and virtualization or video editing, where low latency and consistent high-speed access are essential.

Key Characteristics of SANs:

  • Block-Level Access: SANs provide direct access to storage blocks, which is advantageous for applications requiring fast, low-latency data retrieval.
  • Performance: SANs are designed to meet the rigorous demands of enterprise-level applications, ensuring reliable and high-speed data access.
  • Scalability: SANs offer greater scalability by connecting multiple storage devices, making them suitable for businesses with expanding storage needs.

How Does a SAN Work?

A SAN is built from a combination of servers and storage over a high speed, low latency interconnect that allows direct Fibre Channel (FC) connections from the client to the storage volume to provide the fastest possible performance. The SAN may also require a separate, private ethernet network between the server and clients to keep the file request traffic out of the FC network for even more performance. 

By joining together the clients, SAN server, and storage on a FC network, SAN volumes appear and perform as if they were directly connected hard drives. Storage traffic over FC avoids the TCP/IP packetization and latency issues, as well as any LAN congestion, ensuring the highest access speed available for media and mission critical stored data.

A diagram showing how a SAN works. Several server endpoints, including a metadata server and storage arrays flow through a Fibre Channel switch, then to the network endpoints (computers).
The SAN management server, storage arrays, and clients all connect via a FC network—all file serving occurs over Fibre Channel.

Benefits of a SAN

Because it’s considerably more complex and expensive than NAS, a SAN is typically used by businesses versus individuals and typically requires administration by an IT staff. 

The primary strength of a SAN is that it allows simultaneous shared access to shared storage that becomes faster with the addition of storage controllers. SANs are optimized for data-intensive applications. For example, hundreds of video editors can simultaneously access tens of GB per second of storage without straining the network.

SANs can be easily expanded by adding more storage devices, making them suitable for growing storage needs. Storage resources can be efficiently managed and allocated from a central location. SANs also typically include redundancy and fault tolerance mechanisms to ensure data integrity and availability.

Summary of a SAN’s Benefits:

  • Extremely fast data access with low latency.
  • Relieves stress on a local area network.
  • Can be scaled up to the limits of the interconnect.
  • Operating system level (“native”) access to files.
  • Often the only solution for demanding applications requiring concurrent shared access.

Limitations of a SAN

The challenge of a SAN can be summed up in its cost and administration requirements—having to dedicate and maintain both a separate ethernet network for metadata file requests and implement a FC network can be a considerable investment. That being said, a SAN is often the only way to provide very fast data access for a large number of users that also can scale to supporting hundreds of users at the same time.

The Main Differences Between NAS and SANs

  • Use case: NAS is often used in homes and small to medium sized businesses. SANs are often used in professional and enterprise environments.
  • Cost: NAS is less expensive. SANs are more expensive.
  • Ease of administration: NAS is easier to manage. SANs require more IT administration.
  • How data is accessed: NAS data is accessed as if it were a network-attached drive. SAN servers access data as if it were a local hard drive.
  • Speed: NAS speed depends on the local TCP/IP ethernet network (typically 1GbE to 10GbE, but up to 25GbE or even 40GbE) and on how many other users are accessing the storage at the same time; ethernet packetization and waiting on the file server generally mean lower throughput and higher latency. SANs deliver high speed over Fibre Channel, most commonly 16Gb/s to 32Gb/s, though newer standards can go up to 128Gb/s; FC can also be delivered via high speed ethernet such as 10Gbit or 40Gbit+ networks using protocols such as FCoE and iSCSI.
  • Network connection: NAS uses SMB/CIFS, NFS, SFTP, and WebDAV. SANs use Fibre Channel, iSCSI, and FCoE.
  • Scalability: Lower-end NAS is not highly scalable, but high-end NAS can scale to petabytes using clusters or scale-out nodes. SANs can add more storage controllers or expanded storage arrays, allowing SAN admins to scale performance, storage, or both.
  • Networking method: NAS simply connects to your existing ethernet network. A SAN requires a dedicated Fibre Channel network, often alongside a separate ethernet network for metadata requests.
  • Fault tolerance: Entry level NAS systems often have a single point of failure, e.g., the power supply. SANs are fault tolerant networks and systems with redundant functionality.
  • Limitations: NAS is subject to general ethernet issues. SAN behavior is more predictable in controlled, dedicated environments.

Choosing the Right Solution

When considering a NAS device or a SAN, you might find it helpful to think of it this way: NAS is simple to set up, easy to administer, and great for general purpose applications. Meanwhile, a SAN can be more challenging to set up and administer, but it’s often the only way to make shared storage available for mission critical and high performance applications.

The choice between a NAS device and a SAN hinges on understanding your unique storage requirements and workloads. NAS is an excellent choice for environments prioritizing collaborative sharing and simple management. In contrast, a SAN shines when performance and scalability are top priorities, particularly for businesses dealing with data-heavy applications.

Ultimately, the decision should factor in aspects such as budget, anticipated growth, workload demands, and the expertise of your IT team. Striking the right balance between ease of use, performance, and scalability will help ensure your chosen storage solution aligns seamlessly with your goals.

Are You Using NAS, a SAN, or Both?

If you are using a NAS device or a SAN, we’d love to hear from you about what you’re using and how you’re using them in the comments.

The post What’s the Diff: NAS vs. SAN appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: Hot and Cold Data Storage

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/whats-the-diff-hot-and-cold-data-storage/

A decorative image showing two thermometers overlaying pictures of servers. The one on the left says "cold" and the one on the right says "hot".

This post was originally published in 2017 and updated in 2019 and 2023 to share the latest information on cloud storage tiering.

Temperature, specifically a range from cold to hot, is a common way to describe different levels of data storage. It’s possible these terms originated based on where data was historically stored. Hot data was stored close to the heat of the spinning drives and CPUs. Cold data was stored on drives or tape away from the warmer data center, likely tucked away on a shelf somewhere. 

Today, they’re used to describe how easily you can access your data. Hot storage is for data you need fast or access frequently. Cold storage is typically used for data you rarely need. The terms are used by most data storage providers to describe their tiered storage plans. However, there are no industry standard definitions for what hot and cold mean, which makes comparing services across different storage providers challenging. 

It’s a common misconception that hot storage means expensive storage and that cold storage means slower, less expensive storage. Today, we’ll explain why these terms may no longer be serving you when it comes to anticipating storage cost and performance.

Defining Hot Storage

Hot storage serves as the go-to destination for frequently accessed and mission-critical data that demands swift retrieval. Think of it as the fast lane of data storage, tailored for scenarios where time is of the essence. Industries relying on real-time data processing and rapid response times, such as video editing, web content, and application development, find hot storage to be indispensable.

To achieve the necessary rapid data access, hot storage is often housed in hybrid or tiered storage environments. The hotter the service, the more it embraces cutting-edge technologies, including the latest drives, fastest transport protocols, and geographical proximity to clients or multiple regions. However, the resource-intensive nature of hot storage warrants a premium, and leading cloud data storage providers like Microsoft’s Azure Hot Blobs and AWS S3 reflect this reality.

Data stored in the hottest tier might use solid-state drives (SSDs), which are optimized for lower latency and higher transactional rates compared to traditional hard drives. In other cases, hard disk drives are more suitable for environments where the drives are heavily accessed, thanks to their durability under intensive read/write cycles.

Regardless of the storage medium, hot data workloads necessitate fast and consistent response times, making them ideal for tasks like capturing telemetry data, messaging, and data transformation.

Defining Cold Storage

On the opposite end of the data storage spectrum lies cold storage, catering to information accessed infrequently and without the urgency of hot data. Cold storage houses data that might remain dormant for extended periods, months, years, decades, or maybe forever. Practical examples might include old projects or records mandated for financial, legal, HR, or other business record-keeping requirements.

Cold cloud storage systems prioritize durability and cost-effectiveness over real-time data manipulation capabilities. Services like Amazon Glacier and Google Coldline take this approach, offering slower retrieval and response times than their hot storage counterparts. Lower performing and less expensive storage environments, both on-premises and in the cloud, commonly host cold data. 

Linear Tape Open (LTO or Tape) has historically been a popular storage medium for cold data, though manual retrieval from storage racks renders it relatively slow. To access data from LTO, the tapes must be physically retrieved from storage racks and mounted in a tape reading machine, making it one of the slowest, therefore coldest, methods of storing data.

While cold cloud storage systems generally boast lower overall costs than warm or hot storage, they may incur higher per-operation expenses. Accessing data from cold storage demands patience and thoughtful planning, as the response times are intentionally sluggish.

With the landscape of data storage continually evolving, the definition of cold storage has also expanded. In modern contexts, cold storage might describe completely offline data storage, wherein information resides outside the cloud and remains disconnected from any network. This isolation, also described as air gapped, is crucial for safeguarding sensitive data. However, today, data can be virtually air-gapped using technology like Object Lock.

Traditional Views of Cold and Hot Data Storage

                    Cold                           Hot
Access Speed        Slow                           Fast
Access Frequency    Seldom or Never                Frequent
Data Volume         Low                            High
Storage Media       Slower drives, LTO, offline    Faster drives, durable drives, SSDs
Cost                Lower                          Higher

What Is Hot Cloud Storage?

Today there are new players in data storage, who, through innovation and efficiency, are able to offer cloud storage at the cost of cold storage, but with the performance and availability of hot storage.

The concept of organizing data by temperature has long been employed by diversified cloud providers like Amazon, Microsoft, and Google to describe their tiered storage services and set pricing accordingly. But, today, in a cloud landscape defined by the open, multi-cloud internet, customers have come to realize the value and benefits they can get from moving away from those diversified providers. 

A wave of independent cloud providers are disrupting the traditional notions of cloud storage temperatures, offering cloud storage that’s as cost-effective as cold storage, yet delivering the speed and availability associated with hot storage. If you’re familiar with Backblaze B2 Cloud Storage, you know where we’re going with this. 

Backblaze B2 falls into this category. We can compete on price with LTO and other traditionally cold storage services, but can be used for applications that are usually reserved for hot storage, such as media management, workflow collaboration, websites, and data retrieval.

The newfound efficiency of this model has prompted customers to rethink their storage strategies, opting to migrate entirely from cumbersome cold storage and archival systems.

What Temperature Is Your Cloud Storage?

When it comes to choosing the right storage temperature for your cloud data, organizations must carefully consider their unique needs. Ensuring that storage costs align with actual requirements is key to maintaining a healthy bottom line. The ongoing evolution of cloud storage services, driven by efficiency, technology, and innovation, further amplifies the need for tailored storage solutions.

Still have questions that aren’t answered here? Join the discussion in the comments.

The post What’s the Diff: Hot and Cold Data Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: VMs vs. Containers

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/vm-vs-containers/

A decorative image comparing VMs and containers.
This post was originally published in 2018 and updated in 2021. We’re sharing an update to this post to provide the latest information on VMs and containers.

Both virtual machines (VMs) and containers help you optimize computer hardware and software resources via virtualization. 

Containers have been around for a while, but their broad adoption over the past few years has fundamentally changed IT practices. On the other hand, VMs have enjoyed enduring popularity, maintaining their presence across data centers of various scales.

As you think about how to run services and build applications in the cloud, these virtualization techniques can help you do so faster and more efficiently.  Today, we’re digging into how they work, how they compare to each other, and how to use them to drive your organization’s digital transformation.

First, the Basics: Some Definitions

What Is Virtualization?

Virtualization is the process of creating a virtual version or representation of computing resources like servers, storage devices, operating systems (OS), or networks that are abstracted from the physical computing hardware. This abstraction enables greater flexibility, scalability, and agility in managing and deploying computing resources. You can create multiple virtual computers from the hardware and software components of a single machine. You can think of it as essentially a computer-generated computer.

What Is a Hypervisor?

The software that enables the creation and management of virtual computing environments is called a hypervisor. It’s a lightweight software or firmware layer that sits between the physical hardware and the virtualized environments and allows multiple operating systems to run concurrently on a single physical machine. The hypervisor abstracts and partitions the underlying hardware resources, such as central processing units (CPUs), memory, storage, and networking, and allocates them to the virtual environments.  You can think of the hypervisor as the middleman that pulls resources from the raw materials of your infrastructure and directs them to the various computing instances.

There are two types of hypervisors: 

  1. Type 1, bare-metal hypervisors, run directly on the hardware. 
  2. Type 2 hypervisors operate within a host operating system. 

Hypervisors are fundamental to virtualization technology, enabling efficient utilization and management of computing resources.

VMs and Containers

What Are VMs?

The computer-generated computers that virtualization makes possible are known as virtual machines (VMs)—separate virtual computers running on one set of hardware or a pool of hardware. Each virtual machine acts as an isolated and self-contained environment, complete with its own virtual hardware components, including CPU, memory, storage, and network interfaces. The hypervisor allocates and manages resources, ensuring each VM has its fair share and preventing interference between them.

Each VM requires its own OS. Thus each VM can host a different OS, enabling diverse software environments and applications to exist without conflict on the same machine. VMs provide a level of isolation, ensuring that failures or issues within one VM do not impact others on the same hardware. They also enable efficient testing and development environments, as developers can create VM snapshots to capture specific system states for experimentation or rollbacks. VMs also offer the ability to easily migrate or clone instances, making it convenient to scale resources or create backups.

Since the advent of affordable virtualization technology and cloud computing services, IT departments large and small have embraced VMs as a way to lower costs and increase efficiencies.

A diagram showing how virtual machines interact with and are stored on a server.

VMs, however, can take up a lot of system resources. Each VM runs not just a full copy of an OS, but a virtual copy of all the hardware that the operating system needs to run. It’s why VMs are sometimes associated with the term “monolithic”—they’re single, all-in-one units commonly used to run applications built as single, large files. (The nickname, “monolithic,” will make a bit more sense after you learn more about containers below.) This quickly adds up to a lot of RAM and CPU cycles. They’re still economical compared to running separate actual computers, but for some use cases, particularly applications, it can be overkill, which led to the development of containers.

Benefits of VMs

  • All OS resources available to apps.
  • Well-established functionality.
  • Robust management tools.
  • Well-known security tools and controls.
  • The ability to run different OS on one physical machine.
  • Cost savings compared to running separate, physical machines.

Popular VM Providers

Popular VM platforms include VMware (vSphere/ESXi), Microsoft Hyper-V, Oracle VM VirtualBox, and the open source KVM and Xen hypervisors.

What Are Containers?

With containers, instead of virtualizing an entire computer as with a VM, just the OS is virtualized.

Containers sit on top of a physical server and its host OS—typically Linux or Windows. Each container shares the host OS kernel and, usually, the binaries and libraries, too, resulting in more efficient resource utilization. (See below for definitions if you’re not familiar with these terms.) Shared components are read-only.

Why are they more efficient? Sharing OS resources, such as libraries, significantly reduces the need to reproduce the operating system code—a server can run multiple workloads with a single operating system installation. That makes containers lightweight and portable—they are only megabytes in size and take just seconds to start. What this means in practice is you can put two to three times as many applications on a single server with containers than you can with a VM. Compared to containers, VMs take minutes to start and are an order of magnitude larger than an equivalent container, measured in gigabytes versus megabytes.

Container technology has existed for a long time, but the launch of Docker in 2013 made containers essentially the industry standard for application and software development. Technologies like Docker and Kubernetes are used to create isolated environments for applications. And containers solve the problem of environment inconsistency—the old “works on my machine” problem often encountered in software development and deployment.

Developers generally write code locally, say on their laptop, then deploy that code on a server. Any differences between those environments—software versions, permissions, database access, etc.—lead to bugs. With containers, developers can create a portable, packaged unit that contains all of the dependencies needed for that unit to run in any environment, whether it’s local, development, testing, or production. This portability is one of containers’ key advantages.

Containers also offer scalability, as multiple instances of a containerized application can be deployed and managed in parallel, allowing for efficient resource allocation and responsiveness to changing demand.

Microservices architectures for application development evolved out of this container boom. With containers, applications could be broken down into their smallest component parts or “services” that serve a single purpose, and those services could be developed and deployed independently of each other instead of in one monolithic unit. 

For example, let’s say you have an app that allows customers to buy anything in the world. You might have a search bar, a shopping cart, a buy button, etc. Each of those “services” can exist in their own container, so that if, say, the search bar fails due to high load, it doesn’t bring the whole thing down. And that’s how you get your Prime Day deals today.

A diagram for how containers interact with and are stored on a server.

More Definitions: Binaries, Libraries, and Kernels

Binaries: In general, binaries are non-text files made up of ones and zeros that tell a processor how to execute a program.

Libraries: Libraries are sets of prewritten code that a program can use to do either common or specialized things. They allow developers to avoid rewriting the same code over and over.

Kernels: Kernels are the ringleaders of the OS. They’re the core programming at the center that controls all other parts of the operating system.

Container Tools

Linux Containers (LXC): Commonly known as LXC, these are the original Linux container technology. LXC is a Linux operating system-level virtualization method for running multiple isolated Linux systems on a single host.

Docker: Originally conceived as an initiative to develop LXC containers for individual applications, Docker revolutionized the container landscape by introducing significant enhancements to improve their portability and versatility. Gradually evolving into an independent container runtime environment, Docker emerged as a prominent Linux utility, enabling the seamless creation, transportation, and execution of containers with remarkable efficiency.

Kubernetes: Kubernetes, though not a container software in its essence, serves as a vital container orchestrator. In the realm of cloud-native architecture and microservices, where applications deploy numerous containers ranging from hundreds to thousands or even billions, Kubernetes plays a crucial role in automating the comprehensive management of these containers. While Kubernetes relies on complementary tools like Docker to function seamlessly, it’s such a big name in the container space it wouldn’t be a container post without mentioning it.

Benefits of Containers

  • Reduced IT management resources.
  • Faster spin ups.
  • Smaller size means one physical machine can host many containers.
  • Reduced and simplified security updates.
  • Less code to transfer, migrate, and upload workloads.

What’s the Diff: VMs vs. Containers

The virtual machine versus container debate gets at the heart of the debate between traditional IT architecture and contemporary DevOps practices.

VMs have been, and continue to be, tremendously popular and useful, but sadly for them, they now carry the term “monolithic” with them wherever they go like a 25-ton Stonehenge around the neck. Containers, meanwhile, pushed the old gods aside, bedecked in the glittering mantle of “microservices.” Cute.

To offer another quirky tech metaphor, VMs are to containers what glamping is to ultralight backpacking. Both equip you with everything you need to survive in the wilds of virtualization. Both are portable, but containers will get you farther, faster, if that’s your goal. And while VMs bring everything and the kitchen sink, containers leave the toothbrush at home to cut weight. To make a more direct comparison, we’ve consolidated the differences into a handy table:

VMs                                        Containers
Heavyweight.                               Lightweight.
Limited performance.                       Native performance.
Each VM runs in its own OS.                All containers share the host OS.
Hardware-level virtualization.             OS virtualization.
Startup time in minutes.                   Startup time in milliseconds.
Allocates required memory.                 Requires less memory space.
Fully isolated and hence more secure.      Process-level isolation, possibly less secure.

Uses for VMs vs. Uses for Containers

Both containers and VMs have benefits and drawbacks, and the ultimate decision will depend on your specific needs.

When it comes to selecting the appropriate technology for your workloads, virtual machines (VMs) excel in situations where applications demand complete access to the operating system’s resources and functionality. When you need to run multiple applications on servers, or have a wide variety of operating systems to manage, VMs are your best choice. If you have an existing monolithic application that you don’t plan to or need to refactor into microservices, VMs will continue to serve your use case well.

Containers are a better choice when your biggest priority is maximizing the number of applications or services running on a minimal number of servers and when you need maximum portability. If you are developing a new app and you want to use a microservices architecture for scalability and portability, containers are the way to go. Containers shine when it comes to cloud-native application development based on a microservices architecture.

You can also run containers on a virtual machine, making the question less of an either/or and more of an exercise in understanding which technology makes the most sense for your workloads.

In a nutshell:

  • VMs help companies make the most of their infrastructure resources by expanding the number of machines you can squeeze out of a finite amount of hardware and software.
  • Containers help companies make the most of the development resources by enabling microservices and DevOps practices.

Are You Using VMs, Containers, or Both?

If you are using VMs, containers, or both, we'd love to hear what you've chosen and how it's working for you. Drop a note in the comments.

The post What’s the Diff: VMs vs. Containers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: Programs, Processes, and Threads

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/

A decorative image showing three computers with the words programs, processes, and threads displayed. In the center, there's a circle with the words what's the diff.

Editor’s Note

This post has been updated since it was originally published in 2017.

Programs, processes, and threads are all terms that relate to software execution, but you may not know what they really mean. Whether you’re a seasoned developer, an aspiring enthusiast, or you’re just wondering what you’re looking at when you open Task Manager on a PC or Activity Monitor on a Mac, learning these terms is essential for understanding how a computer works.

This post explains the technical concepts behind computer programs, processes, and threads to give you a better understanding of the functionality of your digital devices. With this knowledge, you can quickly diagnose problems and come up with solutions, like knowing if you need to install more memory for better performance. If you care about having a fast, efficient computer, it is worth taking the time to understand these key terms. 

What Is a Computer Program?

A program is a sequence of coded commands that tells a computer to perform a given task. There are many types of programs, including programs built into the operating system (OS) and ones to complete specific tasks. Generally, task-specific programs are called applications (or apps). For example, you are probably reading this post using a web browser application like Google Chrome, Mozilla Firefox, or Apple Safari. Other common applications include email clients, word processors, and games.

The process of creating a computer program involves designing algorithms, writing code in a programming language, and then compiling or interpreting that code to transform it into machine-readable instructions that the computer can execute.

What Are Programming Languages?

Programming languages are the way that humans and computers talk to each other: formalized sets of rules and syntax for writing instructions a computer can carry out.

A decorative image showing stylized C# code.
C# example of program code.

Compiled vs. Interpreted Programs

Many programs are written in a compiled language like C, C++, or C#. The end result is a text file of source code that is compiled into binary form in order to run on the computer (more on binary form in a few paragraphs). The compiled binary speaks directly to your computer's processor. Compiled programs are typically fast, but they are also more fixed than interpreted programs. That has positives and negatives: you have more control over things like memory management, but you're platform dependent and, if you have to change something in your code, it typically takes longer to rebuild and test.

There is another kind of program called an interpreted program. Interpreted programs require an additional program, the interpreter, to translate your program's instructions into code your computer can execute as it runs. Compared with compiled languages, these programs are platform independent (you just need the interpreter for each platform, instead of building a whole new binary) and they typically take up less space. Some of the most common interpreted programming languages are Python, PHP, JavaScript, and Ruby.

Ultimately, both kinds of programs are run and loaded into memory in binary form. Programs have to run in binary because your computer’s CPU understands only binary instructions.

What Is Binary Code?

Binary is the native language of computers. At their most basic level, computers use only two states of electrical current—on and off. The on state is represented by 1 and the off state is represented by 0. Binary is different from the number system—base 10—that we use in daily life. In base 10, each digit position can be anything from 0 to 9. In the binary system, also known as base 2, each position is either a 0 or a 1.
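
If you'd like to see this mapping for yourself, here's a quick sketch in Python (any language with binary formatting would do):

```python
# Print the numbers 0 through 9 in both base 10 and base 2.
for n in range(10):
    print(f"{n} in base 10 is {n:04b} in base 2")

# And the reverse: interpret a string of bits as a base 10 number.
print(int("1010", 2))  # prints 10
```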

A chart showing the numerals zero through nine shown rendered in base 10 and base 2 numeral systems.

Perhaps you’ve heard the programmer’s joke, “There are only 10 types of people in the world, those who understand binary, and those who don’t.”

How Are Computer Programs Stored and Run?

Programs are typically stored on a disk or in nonvolatile memory in executable format. Let’s break that down to understand why.

In this context, we'll talk about your computer having two types of memory: volatile and nonvolatile. Volatile memory is temporary and handles data being actively processed in real time. It's faster, easily accessible, and increases the efficiency of your computer. However, it's not permanent. When your computer turns off, this type of memory resets.

Nonvolatile memory, on the other hand, is permanent unless deleted. While it's slower to access, it can store more information. That makes it a better place to store programs. A file in an executable format is simply one that contains a program your CPU (that's your processor) can run directly. Examples of these file types are .exe on Windows and .app on macOS.

What Resources Does a Program Need to Run?

Once a program has been loaded into memory in binary form, what happens next?

Your executing program needs resources from the OS and memory to run. Without these resources, you can’t use the program. Fortunately, your OS manages the work of allocating resources to your programs automatically. Whether you use Microsoft Windows, macOS, Linux, Android, or something else, your OS is always hard at work directing your computer’s resources needed to turn your program into a running process.

In addition to OS and memory resources, there are a few essential resources that every program needs.

  • Register. Think of a register as a holding pen for data a process may need right away, like instructions, storage addresses, or other working values.
  • Program counter. Also known as an instruction pointer, the program counter plays an organizational role. It keeps track of where a computer is in its program sequence.
  • Stack. A stack is a data structure that stores information about the active subroutines of a computer program. It is used as scratch space for the process. It is distinguished from dynamically allocated memory for the process that is known as the “heap.”

The main resources a program needs to run.

What Is a Computer Process?

When a program is loaded into memory along with all the resources it needs to operate, it is called a process. You might have multiple instances of a single program. In that situation, each instance of that running program is a process. 

Each process has a separate memory address space. That separate address space is helpful because it means that a process runs independently and is isolated from other processes. However, processes cannot directly access shared data in other processes. Switching from one process to another requires some amount of time (relatively speaking) for saving and loading registers, memory maps, and other resources.
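
To make that isolation concrete, here's a minimal sketch in Python (the variable and function names are our own): a change made in a child process never reaches the parent, because each process gets its own copy of memory.

```python
import multiprocessing
import os

counter = 0

def increment():
    global counter
    counter += 100
    print(f"child process {os.getpid()} sees counter = {counter}")

if __name__ == "__main__":
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    # The child's change never reaches the parent's copy of `counter`.
    print(f"parent process {os.getpid()} sees counter = {counter}")  # still 0
```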

Having independent processes matters for users because it means one process won’t corrupt or wreak havoc on other processes. If a single process has a problem, you can close that program and keep using your computer. Practically, that means you can end a malfunctioning program and keep working with minimal disruptions.

What Are Threads?

The final piece of the puzzle is threads. A thread is the unit of execution within a process.

A process can have anywhere from one thread to many.

When a process starts, it receives an assignment of memory and other computing resources. Each thread in the process shares that memory and resources. With single-threaded processes, the process contains one thread.

The difference between single thread and multi-thread processes.

In multi-threaded processes, the process contains more than one thread, and the process is accomplishing a number of things at the same time (to be more accurate, we should say “virtually” the same time—you can read more about that in the section below on concurrency).

Earlier, we talked about the stack and the heap, the two kinds of memory available to a thread or process. Distinguishing between these kinds of memory matters because each thread will have its own stack. However, all the threads in a process will share the heap.

Some people call threads lightweight processes because they have their own stack but can access shared data. Since threads share the same address space as the process and other threads within the process, it is easy to communicate between the threads. The disadvantage is that one malfunctioning thread in a process can impact the viability of the process itself.
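
Here's a small sketch of that in Python (names are ours): both threads append to the same list in shared memory, while each thread's loop counter lives in its own stack frame.

```python
import threading

shared = []  # heap data, visible to every thread in this process

def worker(name):
    local_count = 0          # local variable on this thread's own stack
    for _ in range(3):
        shared.append(name)  # no copying needed: threads share the heap
        local_count += 1
    print(f"{name} appended {local_count} items")

t1 = threading.Thread(target=worker, args=("thread-1",))
t2 = threading.Thread(target=worker, args=("thread-2",))
t1.start(); t2.start()
t1.join(); t2.join()
print(shared)  # six entries, contributed by both threads
```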

How Threads and Processes Work Step By Step

Here’s what happens when you open an application on your computer.

  • The program starts out as a text file of programming code.
  • The program is compiled or interpreted into binary form.
  • The program is loaded into memory.
  • The program becomes one or more running processes. Processes are typically independent of one another.
  • Threads exist as subsets of a process.
  • Threads can communicate with each other more easily than processes can.
  • Threads are more vulnerable to problems caused by other threads in the same process.
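
You can watch these steps play out on your own machine. The short Python sketch below prints the process and thread that the interpreter itself is running as:

```python
# Every running Python script is itself a process containing at least one thread.
import os
import threading

print(f"This program is running as process ID {os.getpid()}")
print(f"It currently contains {threading.active_count()} thread(s)")
print(f"This line is executing on: {threading.current_thread().name}")
```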

Computer Process vs. Threads

Aspect | Processes | Threads
Definition | Independent programs with their own memory space. | Lightweight, smaller units of a process that share its memory.
Creation Overhead | Higher overhead due to separate memory space. | Lower overhead as they share the same memory space.
Isolation | Processes are isolated from each other. | Threads share the same memory space.
Resource Allocation | Each process has its own set of system resources. | Threads share resources within the same process.
Independence | Processes are more independent of each other. | Threads are dependent on each other within a process.
Failure Impact | A failure in one process does not directly affect others. | A failure in one thread can affect others in the same process.
Synchronization | Less need for synchronization, as processes are isolated. | Requires careful synchronization due to shared resources.
Example Use Cases | Running multiple independent applications. | Multithreading within a single application for parallelism.
Memory Usage | Typically consumes more memory. | Consumes less memory compared to processes.

What About Concurrency and Parallelism?

A question you might ask is whether processes or threads can run at the same time. The answer is: it depends. In environments with multiple processors or CPU cores, simultaneous execution of multiple processes or threads is feasible. On a single processor system, however, true simultaneous execution isn't possible. In these cases, a process scheduling algorithm shares the CPU among running processes or threads, creating the illusion of parallel execution. Each task is allocated a "time slice," and the swift switching between tasks happens seamlessly, typically imperceptible to users. Two terms distinguish these modes of operation: "parallelism" denotes genuine simultaneous execution, while "concurrency" means interleaving tasks over time to simulate simultaneous execution.
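
One way to feel the difference is to run the same CPU-bound work under both models. This Python sketch (the task sizes are arbitrary) uses a thread pool for concurrency and a process pool for parallelism; on a multi-core machine, the process pool typically finishes faster because the threads take turns on one interpreter while the processes truly run side by side.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def busy_work(n):
    # CPU-bound task: sum the first n integers the slow way.
    return sum(range(n))

if __name__ == "__main__":
    tasks = [10_000_000] * 4

    start = time.perf_counter()
    with ThreadPoolExecutor() as pool:
        list(pool.map(busy_work, tasks))  # concurrency: threads take turns
    print(f"threads:   {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    with ProcessPoolExecutor() as pool:
        list(pool.map(busy_work, tasks))  # parallelism: one task per core
    print(f"processes: {time.perf_counter() - start:.2f}s")
```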

How Google Chrome Uses Processes and Threads

To illustrate the impact of processes and threads, let’s consider a real-world example with a program that many of us use, Google Chrome. 

When Google designed the Chrome browser, they faced several important decisions. For instance, how should Chrome handle the fact that many different tasks often happen at the same time when using a browser? Every browser window (or tab) may communicate with several servers on the internet to download audio, video, text, and other resources. In addition, many users have 10 to 20 browser tabs (or more…) open most of the time, and each of these tabs may perform multiple tasks.

Google had to decide how to handle all of these tasks. They chose to run each browser window in Chrome as a separate process rather than a thread or many threads. That approach brought several benefits.

  • Running each window as a process protects the overall application from bugs and glitches.
  • Isolating a JavaScript program in a process prevents it from using too much CPU time and memory and making the entire browser unresponsive.

That said, there is a trade-off to Google's design decision. Starting a new process for each browser window has a higher fixed cost in memory and resources compared to using threads. They were betting that their approach would end up with less memory bloat overall.

Using processes instead of threads also provides better memory usage when memory is low. In practice, an inactive browser window is treated as a lower priority. That means the operating system may swap it to disk when memory is needed for other processes. If the windows were threaded, it would be more difficult to allocate memory this efficiently, which ultimately costs performance.

For more insights on Google's design decisions for Chrome, check out Google's Chromium Blog or the Chrome Introduction Comic.

The screen capture below shows the Google Chrome processes running on a MacBook Air with many tabs open. You can see that some Chrome processes are using a fair amount of CPU time and resources (e.g., the one at the top is using 44 threads) while others are using fewer.

A screen capture of the Mac Activity Monitor.
Mac Activity Monitor displaying Google Chrome threads.

The Activity Monitor on the Mac (or Task Manager in Windows) on your system can be a valuable ally in fine-tuning your computer or troubleshooting problems. If your computer is running slowly or a program or browser window isn’t responding for a while, you can check its status using the system monitor.

In some cases, you’ll see a process marked as “Not Responding.” Try quitting that process and see if your system runs better. If an application is a memory hog, you might consider choosing a different application that will accomplish the same task.

Made It This Far?

We hope this Tron-like dive into the fascinating world of computer programs, processes, and threads has cleared up some questions.

At the start, we promised clarity on using these terms to improve performance. You can use Activity Monitor on the Mac or Task Manager on Windows to close applications and processes that are malfunctioning. That’s beneficial because it means you can end a malfunctioning program without the hassle of turning off your computer.

Still have questions? We’d love to hear from you in the comments.

FAQ

1. What are computer programs?

Computer programs are sets of coded instructions written in programming languages to direct computers in performing specific tasks or functions. Ranging from simple scripts to complex applications, computer programs enable users to interact with and leverage the capabilities of computing devices.

2. What are computer processes?

Computer processes are instances of executing computer programs. They represent the active state of a running application or task. Each process operates independently, with its own memory space and system resources, ensuring isolation from other processes. Processes are managed by the operating system, and they facilitate multitasking and parallel execution. 

3. What are computer threads?

Computer threads are smaller units within computer processes, enabling parallel execution of tasks. Threads share the same memory space and resources within a process, allowing for more efficient communication and coordination. Unlike processes, threads operate in a cooperative manner, sharing data and context, making them suitable for tasks requiring simultaneous execution.

4. What’s the difference between computer processes and threads?

Computer processes are independent program instances with their own memory space and resources, operating in isolation. In contrast, threads are smaller units within processes that share the same memory, making communication easier but requiring careful synchronization. Processes are more independent, while threads enable concurrent execution and resource sharing within a process. The choice depends on the application’s requirements, balancing isolation with the benefits of parallelism and resource efficiency.

5. What’s the difference between concurrency and parallel processing?

Concurrency involves the execution of multiple tasks during overlapping time periods, enhancing system responsiveness. It doesn’t necessarily imply true simultaneous execution but rather the interleaving of processes to create an appearance of parallelism. Parallel processing, on the other hand, refers to the simultaneous execution of multiple tasks using multiple processors or cores, achieving genuine parallelism. Concurrency emphasizes efficient task management, while parallel processing focuses on concurrent tasks executing simultaneously for improved performance in tasks that can be divided into independent subtasks.

The post What’s the Diff: Programs, Processes, and Threads appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: 3-2-1 vs. 3-2-1-1-0 vs. 4-3-2

Post Syndicated from Natasha Rabinov original https://www.backblaze.com/blog/whats-the-diff-3-2-1-vs-3-2-1-1-0-vs-4-3-2/

When it comes to having a backup plan, Navy SEALs go by the rule that “Two is one and one is none.” They’re not often one-upped, but in the world of computer backup, even two is none. The gold standard until recently has been the 3-2-1 rule—three copies of your data on two different media with one copy stored off-site.

The 3-2-1 rule still has value, especially for individuals who aren’t backing up at all. But today, the gold standard is evolving. In this post, we’ll explain why 3-2-1 is being replaced by more comprehensive strategies; we’ll look at the difference between the 3-2-1 rule and emerging rules, including 3-2-1-1-0 and 4-3-2; and we’ll help you decide which is best for you.

Why Is the 3-2-1 Backup Strategy Falling Out of Favor?

When the 3-2-1 backup strategy gained prominence, the world looked a lot different than it does today, technology-wise. The rule is thought to have originated in the world of photography in Peter Krogh’s 2009 book, “The DAM Book: Digital Asset Management for Photographers.” At that time, tape backups were still widely used, especially at the enterprise level, due to their low cost, capacity, and longevity.

The 3-2-1 strategy improved upon existing practices of making one copy of your data on tape and keeping it off-site. It advised keeping three copies of your data (e.g., one primary copy and two backups) on two different media (e.g., the primary copy on an internal hard disk, a backup copy on tape, and an additional backup copy on an external HDD or tape) with one copy off-site (likely the tape backup).

Before cloud storage was widely available, getting the third copy off-site usually involved hiring a storage service to pick up and store the tapes or physically driving them to an off-site location. (One of our co-founders used to mail a copy of his backup to his brother.) This meant off-site tape backups were "air gapped," or physically separated from the network that stored the primary copy by a literal gap of air. In the event the primary copy or on-site backup became corrupted or compromised, the off-site backup could be used for a restore.

As storage technology has evolved, the 3-2-1 backup strategy has gotten a little…cloudy. A company might employ a NAS device or SAN to store backups on-site, which is then backed up to object storage in the cloud. An individual might employ a 3-2-1 strategy by backing up their computer to an external hard drive as well as the cloud.

While a 3-2-1 strategy with off-site copies stored in the cloud works well for events like a natural disaster or accidental deletion, it lost the air gap protection that tape provided. Cloud backups are sometimes connected to production networks and thus vulnerable to a digital attack.

Ransomware: The Driver for Stronger Backup Strategies

With as many high-profile ransomware incidents as the past few months have seen, it shouldn't be news to anyone that ransomware is on the rise. Ransom demands hit an all-time high of $50 million so far in 2021, and attacks like the ones on Colonial Pipeline and JBS Foods threatened gas and food supply chains. In their 2021 report, "Detect, Protect, Recover: How Modern Backup Applications Can Protect You From Ransomware," Gartner predicted that at least 75% of IT organizations will face one or more attacks by 2025.

Backups are meant to be a company’s saving grace in the event of a ransomware attack, but they only work if they’re not compromised. And hackers know this. Ransomware operators like Sodinokibi, the outfit responsible for attacks on JBS Foods, Acer, and Quanta, are now going after backups in addition to production data.

Cloud backups are sometimes tied to a company's active directory, and they're often not virtually isolated from a company's production network. Once hackers compromise a machine connected to the network, they spread laterally through it, attempting to gain admin credentials by using tools like keyloggers, launching phishing attacks, or reading documentation stored on servers. With admin credentials, they can extract all of the credentials from the active directory and use that information to access backups if they're configured to authenticate through the active directory.

Is a 3-2-1 Backup Strategy Still Viable?

As emerging technology has changed the way backup strategies are implemented, the core principles of a 3-2-1 backup strategy still hold up:

  • You should have multiple copies of your data.
  • Copies should be geographically distanced.
  • One or more copies should be readily accessible for quick recoveries in the event of a physical disaster or accidental deletion.

But, they need to account for an additional layer of protection: One or more copies should be physically or virtually isolated in the event of a digital disaster like ransomware that targets all of their data, including backups.

What Backup Strategies Are Replacing 3-2-1?

A 3-2-1 backup strategy is still viable, but more extensive, comprehensive strategies exist that make up for the vulnerabilities introduced by connectivity. While not as catchy as 3-2-1, strategies like 3-2-1-1-0 and 4-3-2 offer more protection in the era of cloud backups and ransomware.

What Is 3-2-1-1-0?

A 3-2-1-1-0 strategy stipulates that you:

  • Maintain at least three copies of business data.
  • Store data on at least two different types of storage media.
  • Keep one copy of the backups in an off-site location.
  • Keep one copy of the media offline or air gapped.
  • Ensure all recoverability solutions have zero errors.

The 3-2-1-1-0 method reintroduced the idea of an offline or air gapped copy—either tape backups stored off-site as originally intended in 3-2-1, or cloud backups stored with immutability, meaning the data cannot be modified or changed.

If your company uses a backup software provider like Veeam, storing cloud backups with immutability can be accomplished by using Object Lock. Object Lock is a powerful backup protection tool that prevents a file from being altered or deleted until a given date. Only a few storage platforms currently offer the feature, but if your provider is one of them, you can enable Object Lock and specify the length of time an object should be locked in the storage provider’s user interface or by using API calls.

When Object Lock is set on data, any attempts to manipulate, encrypt, change, or delete the file will fail during that time. The files may be accessed, but no one can change them, including the file owner or whoever set the Object Lock and—most importantly—any hacker that happens upon the credentials of that person.
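
As an illustration only, here's roughly what setting Object Lock looks like through an S3-compatible API using Python and boto3. The endpoint, bucket name, keys, and retention period below are placeholders, and the bucket must have been created with Object Lock enabled.

```python
from datetime import datetime, timedelta, timezone

import boto3

# Placeholder endpoint and credentials: substitute your own.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-004.backblazeb2.com",
    aws_access_key_id="YOUR_KEY_ID",
    aws_secret_access_key="YOUR_APPLICATION_KEY",
)

# Upload a backup and lock it in compliance mode for 30 days. Until the
# retain-until date passes, no one (including the account owner) can
# overwrite or delete this version of the object.
with open("backup.tar.gz", "rb") as f:
    s3.put_object(
        Bucket="my-backup-bucket",  # must be created with Object Lock enabled
        Key="backups/2024-01-15.tar.gz",
        Body=f,
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=30),
    )
```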

The 3-2-1-1-0 strategy goes a step further to require that backups are stored with zero errors. This includes data monitoring on a daily basis, correcting for any errors as soon as they’re identified, and regularly performing restore tests.

A strategy like 3-2-1-1-0 offers the protection of air gapped backups with the added fidelity of more rigorous monitoring and testing.

What Is 4-3-2?

If your data is being managed by a disaster recovery expert like Continuity Centers, for example, your backups may subscribe to the 4-3-2 rule:

  • Four copies of your data.
  • Data in three locations (on-prem with you, on-prem with an MSP like Continuity Centers, and stored with a cloud provider).
  • Two locations for your data are off-site.

Continuity Centers’ CEO, Greg Tellone, explained the benefits of this strategy in a session with Backblaze’s VP of Sales, Nilay Patel, at VeeamON 2021, Veeam’s annual conference. A 4-3-2 strategy means backups are duplicated and geographically distant to offer protection from events like natural disasters. Backups are also stored on two separate networks, isolating them from production networks in the event they’re compromised. Finally, backup copies are stored with immutability, protecting them from deletion or encryption should a hacker gain access to systems.

Which Backup Strategy Is Right for You?

First, any backup strategy is better than no backup strategy. As long as it meets the core principles of 3-2-1 backup, you can still get your data back in the event of a natural disaster, a lost laptop, or an accidental deletion. To summarize, that means:

  • Keeping multiple copies of your data—at least three.
  • Storing copies of your data in geographically separate locations.
  • Keeping at least one copy on-site for quick recoveries.

With tools like Object Lock, you can apply the principles of 3-2-1-1-0 or 4-3-2, giving your data an additional layer of protection by virtually isolating it so it can’t be deleted or encrypted for a specific time. In the unfortunate event that you are attacked by ransomware, backups protected with Object Lock allow you to recover.

For more information on how you can protect your company from ransomware, check out our guide to recovering from and preventing a ransomware attack.

The post What’s the Diff: 3-2-1 vs. 3-2-1-1-0 vs. 4-3-2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.