Creating scalable Web sites using Microsoft technologies

(An article I originally wrote for Windows 2000 Experts Journal near the end of 2000; figures are still missing)

· Who will benefit from this article? NT administrators, Web managers and systems architects looking to create highly available and/or high-performance Web and e-commerce sites.
· What you’ll learn: How to design your Web site architecture to meet your performance and reliability demands.
· Bottom line: Your Web and e-commerce applications are in such demand that even a dedicated Windows 2000 server isn’t sufficient. In this article Northrup tells how to use redundancy and scaling strategies to get the reliability and performance you require.

In the past five years, Web sites have evolved from simple online brochures into a critical part of a company’s business. For dot-com companies, the Web site is the company’s entire presence! To support this constantly increasing level of importance, Web applications have become very complex. Creating a modern site requires much more than an understanding of HTML; modern Web applications consist of multiple levels and use several different communication protocols.

In this article, we’ll explore the factors involved in creating systems architectures for complex Web sites. It’s impossible to describe every technical detail of the different services involved, so I’ll focus on building a conceptual understanding of multi-tiered architectures. You can use these concepts to design sites that use any technology, even those hosted on UNIX platforms. However, when I dive into detailed examples, I’ll use Microsoft’s currently released set of Web services for Windows 2000 Server.

Understanding your reliability requirements

The first step in creating an architecture is determining what level of reliability you need. Most people’s instinct is that they require complete reliability.
Even if a completely reliable Web architecture could be created (which it can’t!), it would be cost prohibitive. As shown in Figure 1, cost increases exponentially as reliability increases. With that said, a financial analyst must determine what it will cost the company when the site is unexpectedly offline. A systems architect will use that figure to create an architecture that meets both the budget and reliability requirements of the company.

Reliability is not the same as redundancy, but they are related. Redundancy is the most common method used to improve the reliability of a system. For example, hard drives are a common point of failure in a computer. To avoid downtime as the result of such a failure, hard drives can be made redundant using RAID. If one drive fails, the RAID volume will continue to function, and the Web site will continue running.

Redundancy tips and techniques

Hard drives are only one component of a Web site, and configuring your servers with RAID does not make the site completely redundant. Indeed, it may not even significantly impact the reliability of the site. When considering redundancy as part of meeting reliability requirements, consider every component, including:

· Internal server components: disks, disk controllers, memory, CPUs, power supplies, fans and network cards
· Entire servers (i.e., clustering)
· Network hardware: cables, hubs, switches and routers
· Internet connectivity: leased lines and peering points
· Geography

If downtime is extraordinarily expensive (if, for example, the site you’re architecting is a large-scale e-commerce site), you should attempt to provide redundancy for every component listed above. For those of us with a limited budget, allocate the funds you have towards providing redundancy for those components that are less expensive and most likely to fail. Providing RAID is an obvious choice, because hard drives are the component most likely to fail, and because the cost is minimal.
Providing geographic redundancy to reduce downtime in the event of a natural disaster in the Web site’s hosting center can be very expensive, and is not justified for most purposes.

Hardware is only a small part of the reliability issue

While it is possible to make every component in a Web site redundant, this in itself does not provide complete reliability. In my experience, hardware failure accounts for less than 5% of a Web site’s unexpected downtime, even if the computers and networks lack redundancy. The bulk of downtime is caused by software failures. The massive number of patches released by Microsoft for Windows NT 4.0, IIS 4.0 and Site Server supports this claim: even with complete hardware redundancy, flaws in the operating system and other applications will cause unplanned downtime. Windows 2000 has already proven far more reliable, but with the vast number of new technologies included, it is bound to have serious flaws. But you can’t blame just Microsoft: Web developers make mistakes, too.

So, how does a systems architect decrease downtime caused by software? The easiest answer is to provide redundancy at a system level using services like clustering or Network Load Balancing Service (NLBS); more about these topics later in this article. The more difficult, but more effective, method is to institute solid development, testing and staging procedures. Testing a new COM+ object by issuing tens of thousands of requests to it in a staging environment will identify the majority of flaws in the code. It is much better to find your bugs in a testing environment than during the first day of your company’s new advertising campaign. So, while it may not be the responsibility of the systems architect to institute coding procedures, the systems architect should design development and staging environments that carefully mirror the production environment to limit the downtime caused by software problems.

Reliability checklist

· Server redundancy
· Network redundancy
· Solid development methodologies
· Thorough testing of all new code
· Recovery procedures

How to evaluate your performance requirements

Evaluating the performance requirements of a site is, to a large extent, a subset of evaluating reliability requirements. If a Web site becomes so busy that the browser returns an error to the user, the site is effectively offline. As a systems architect, you need to gather several pieces of information before you can properly size a Web architecture. Specifically, you should know:

1) The traffic expectations at peak time, including the number of users and hits per user.
2) The nature of these requests: are most pages HTML or Active Server Pages (ASPs), and how many images per page?
3) When a user sends a request, does it kick off a series of other requests? For example, it is common for an ASP page to call upon custom COM+ objects that, in turn, make a request to a Microsoft SQL Server 2000 server.

Understanding the complexity of the Web application, and the resources required by each different service, is a critical part of properly sizing a Web architecture.

I wish I could provide you with a formula that crunched the above information and provided you with a list of CPU and memory requirements. Such a formula could never exist because the site-specific code varies too much. An incoming request that triggers a database query could take milliseconds to perform, or it could take several hours, depending on the nature of the request. Even if a systems architect knew every piece of code that went into building a Web site, it’s impossible to predict which aspects of the site will be most popular with end-users.

If you’re re-designing a site that is already in production, you can very accurately predict future hardware needs by measuring the performance of the existing hardware.
Tools like the Performance administrative tool allow you to measure the utilization of the various resources used by the Web site. If Marketing predicts site traffic will triple, and your performance analysis indicates that the site is bottlenecked by the database server’s processing capabilities, then the new architecture you create should accommodate either more CPUs or multiple database servers.

If you’re designing a brand-new site, specifying the hardware needed to meet the performance requirements will involve a great deal of speculation. The only effective way to change your wild guess into an educated guess is by load testing the Web application. Tools like InetLoad, Web Capacity Analysis Tool (WCAT), and WebBench simulate thousands of users visiting the Web site in a development environment. By simulating a large number of users, you can determine, with reasonable accuracy, the hardware requirements of the services that compose the Web application. For more information on load testing tools useful for Microsoft-based sites, visit: http://support.microsoft.com/support/DNA/Bundles/QA/loadtest.asp.

Dealing with multi-tier Web sites

Complex Web sites use multiple logical tiers, even if the entire site operates on a single physical server. Incoming requests from browsers are received by the Web service, typically IIS 5.0. IIS interprets the request, and may spawn requests to other services in order to gather or update information. As shown in Figure 2, a common multi-tiered architecture involves IIS 5.0 communicating with Microsoft Transaction Server, which in turn communicates with Microsoft SQL Server 2000. Identifying the multiple tiers that compose a Web site is critical for scaling the site beyond a single server.

Multi-tiered architectures allow for a great deal of flexibility when creating the systems architecture. Because there is a distinct division between different layers, each tier can be placed onto distinct servers.
Different types of hardware can be used for each tier, depending on the requirements of that specific service. If either performance or reliability requirements for the site demand more than one box, you will need to determine how each tier will be scaled. This can be a complicated task, because tiers scale differently. Further, dividing tiers between different physical servers can positively or negatively affect performance. Let’s take a look at each tier individually, and discuss the best ways to scale them.

Scalability options for your Web site

Web services can scale either vertically or horizontally. Scaling a service vertically means using a single physical server, but pumping it up with more processing power, memory, or disks. Scaling horizontally involves using multiple physical servers that each handle the same kinds of requests. For example, if you have a Web service that currently overworks a single-processor server and need to scale it to 400% of its current capacity, you could either:

· scale vertically, by upgrading the Web server to a four-processor system, or
· scale horizontally, by adding three additional one-processor systems.

Either method would meet the new performance demand, and each has its advantages. Scaling vertically is the simpler of the two methods, and ongoing administration time is significantly reduced because there are fewer systems to manage. Scaling horizontally involves significantly more ongoing effort, but has the advantage of providing both improved performance and system-level redundancy, if configured correctly.

As shown in Figure 3, scaling Web services horizontally introduces a whole new tier: load distribution. If four Web servers have identical content, there must exist a mechanism to transparently direct incoming requests from users to one of these four servers. Typically, the load distribution mechanism includes the intelligence to detect the failure of a Web server and redirect traffic to a functioning server.
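To make the load distribution idea concrete, here is a minimal sketch in Python of a distributor that rotates requests across four Web servers and skips any server marked as failed. It is an illustration only, not how NLBS or any vendor's product works internally; the class name, method names and server names (web1 through web4) are hypothetical.

```python
class RoundRobinDistributor:
    """Toy load distributor: rotates requests across healthy servers only."""

    def __init__(self, servers):
        self.servers = list(servers)      # fixed rotation order
        self.healthy = set(self.servers)  # servers currently believed to be up
        self._next = 0                    # index of the next server to try

    def mark_down(self, server):
        """Record a detected failure so the server stops receiving traffic."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Return a repaired server to the rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, skipping failed ones."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy Web servers available")


# Hypothetical pool of four Web servers with identical content.
pool = RoundRobinDistributor(["web1", "web2", "web3", "web4"])
first_round = [pool.pick() for _ in range(4)]

pool.mark_down("web2")  # simulate a detected failure
after_failure = [pool.pick() for _ in range(6)]
```

In this sketch the failed server simply drops out of the rotation, so users never see it. A real product must also decide how failures are detected (for example, by periodic health checks), which is left out here.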
Several different vendors offer load distribution products, including Microsoft. Microsoft’s answer is Network Load Balancing Service (NLBS), a component included with Windows 2000 Advanced Server. When you use Windows 2000 Advanced Server as the operating system and enable NLBS, the Web servers negotiate among themselves to determine which system will respond to an incoming request. This particular mechanism requires no additional hardware, but has the limitation of only working with Windows-based Web servers.

Cisco offers the LocalDirector, a separate piece of network hardware that operates at OSI Layers 2 – 4 to transparently direct incoming requests between multiple Web servers. This is one of the most common solutions, and has proved its reliability over the course of several years. Other similar network devices include F5’s BigIP and the Content Smart Web switches from ArrowPoint, a company that was recently acquired by Cisco.

Keep transaction services on the same server

Transaction services are commonly used in complex Web sites. These services act as middleware, facilitating communications between Web, database and messaging services. For example, if a user places an order on an e-commerce Web site, the Web services initiate a transaction with Microsoft Transaction Server (MTS). MTS, in turn, updates the database with the customer’s order, communicates with an agency to validate the customer’s credit card, sends a message to the customer via Microsoft Exchange server, and then returns a positive response to the Web server.

MTS is a separate tier, and as such, can be implemented on dedicated servers.

Tip: The most sensible architecture, however, is to place transaction services on the same physical servers as the Web services. While it is very possible to have separate physical servers dedicated to MTS, performance tends to decrease because of the additional overhead of network communications between the Web and transaction servers.
Further, there is currently no transparent method to distribute requests from Web servers to multiple transaction servers, so the Web developer needs to provide that functionality in their code. By placing MTS on every Web server, developers receive the benefits of MTS without incurring unnecessary overhead. Microsoft’s AppCenter server promises to provide a great deal more flexibility when dividing services between physical servers. Until it is released, I recommend keeping transaction services on the same systems as your Web servers, and scaling either vertically or horizontally as your needs dictate.

Scale database and messaging services vertically

In the Microsoft Web hosting arena, Microsoft SQL Server is by far the most commonly used database server. Oracle, Sybase and DB2 each make appearances, but the majority of sites choose Microsoft technologies whenever possible. Regardless of the database software used in the site you are architecting, database services scale similarly. Microsoft Exchange Server (both versions 5.5 and 2000) offers identical scaling possibilities to those described for Microsoft SQL Server.

Transaction services can do anything

Almost all complex Web sites use transaction services to talk to database and messaging services. It doesn’t stop there, though. Transaction services allow almost any event to be initiated without putting the burden on a Web server. So, a mutual fund company could use transaction services to interface between the Web site and the mainframe-based financial records of their customers. An Application Service Provider could use transactions to create help tickets for users. A bank’s Web site could even use transactions to fax an online mortgage application to a real flesh-and-blood analyst!

Tip: Unless you have truly massive requirements for database performance, I recommend scaling the database vertically.
Windows NT 4.0 and Windows 2000 Advanced Server each allow for up to eight processors and four gigabytes of RAM, which is more than enough performance for the vast majority of Web uses. Windows 2000 Datacenter Server will provide even higher vertical scalability, though no vendors have released products based on the software as of this writing. If having a homogeneous Windows environment isn’t a requirement, several different UNIX vendors can create massive 32-processor and greater database servers.

If your Web site relies upon your database to function (most do), then you must also consider reliability. If you can’t tolerate downtime because the motherboard in your database server fails, you will need to find a way to make the entire server redundant. Today, the preferred method is Microsoft Cluster Server (MSCS), a component of Windows 2000 Advanced Server that requires very specific hardware configurations. Clustering involves attaching two servers (one primary and one backup server) to a single external RAID array. If the primary database server fails, the backup database server takes ownership of the RAID array, claims the name and IP address of the primary database server, and begins serving database queries. So, if you have a catastrophic hardware failure, your Web site is offline for seconds, not hours.

If an eight-processor database cluster still isn’t sufficient to meet your performance needs, you do have several options for scaling horizontally. Oracle Parallel Server (OPS) is an ingenious product that allows multiple servers to serve database requests simultaneously. It functions similarly to Web services that are scaled horizontally and provides for the automatic removal of failed systems. Both the hardware and software are very expensive, though. Part of the promise of Windows 2000 Datacenter Server is to provide horizontal scalability for Microsoft SQL Server 2000 Enterprise Edition, but I’ll wait until the product is released to discuss it.
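The primary/backup failover behavior described above for a two-server cluster can be sketched as a toy model in Python. This is not the MSCS API or its actual failover protocol; the class, the node names (db-a, db-b), the shared name (sqlprod) and the IP address are all hypothetical and chosen only to show the idea that clients keep using one name and address while ownership moves between nodes.

```python
class FailoverCluster:
    """Toy active/passive pair attached to one shared external disk array."""

    def __init__(self, primary, backup, virtual_name, virtual_ip):
        self.primary = primary
        self.backup = backup
        self.virtual_name = virtual_name  # name clients connect to
        self.virtual_ip = virtual_ip      # address clients connect to
        self.up = {primary: True, backup: True}
        self.owner = primary              # node that owns the shared array

    def node_failed(self, node):
        """Mark a node as failed; move ownership if it was the active node."""
        self.up[node] = False
        if node != self.owner:
            return  # the passive node failed; service is unaffected
        survivor = self.backup if node == self.primary else self.primary
        if not self.up[survivor]:
            raise RuntimeError("both nodes are down; the database is offline")
        self.owner = survivor  # survivor takes the array, name and address

    def serve(self, query):
        """Answer a query under the shared name, whichever node is active."""
        return f"{self.virtual_name} ({self.virtual_ip}) via {self.owner}: {query}"


cluster = FailoverCluster("db-a", "db-b", "sqlprod", "10.0.0.10")
before = cluster.serve("SELECT 1")
cluster.node_failed("db-a")  # simulate a catastrophic hardware failure
after = cluster.serve("SELECT 1")
```

From the Web servers' point of view nothing changes after the failure: the same name and address answer the query, and only the node behind them is different.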
For more information on OPS, go here: http://www.oracle.com/database/options/parallel/. For more information on Windows 2000 Datacenter Server, go here: http://www.microsoft.com/windows2000/guide/datacenter/overview/.

The physical architecture question: flat vs. tiered?

At this point, we’ve discussed analyzing the services used by a Web site, determining the performance and reliability requirements for each service, and meeting those requirements by scaling individual services either horizontally or vertically. You have enough information to draw a logical diagram of the interactions between the services. In this section, we’ll build on that information to create a physical diagram that includes network connectivity.

Web hosting physical architectures are said to be either flat or tiered. Flat architectures have every Web, transaction, and database server connected to the same networks. Tiered architectures isolate communications to the transaction services (if separated from the Web services) and the database services by connecting them only as required for inter-server communication. If that’s not clear, don’t worry, we’ve got more drawings.

A flat architecture, as shown in Figure 4, is very simple to implement. Regardless of how you divide services between physical systems, all servers are connected to a single network. This same network can be used for external traffic, inter-server traffic and management. The primary benefit of this architecture is simplicity: as the requirements of your site change, you can move services between physical systems without moving network connections. Because all systems are connected to the same network, you access each of them identically, making the site easy to manage.

A tiered architecture mirrors a logical architecture (for example, as shown in Figure 2). Separate networks are created for communications between different layers in the logical diagram.
As Figure 5 shows, transaction servers are connected to a private network that allows communication only with the Web servers. Because transaction servers must also communicate with database and messaging services, yet another tier is added to the architecture.

The tiered architecture offers several advantages over the flat architecture. Primarily, it enforces the logical architecture by physically separating systems that do not need to communicate. In other words, users on the Internet could never communicate directly with the database servers. If the Web and transaction services are on separate physical systems, even the Web servers themselves cannot speak directly to the database servers. This improves security by limiting the site’s exposure to the Internet. It can also improve performance, because communications between the transaction and database servers do not use the same bandwidth as communications with end-users, reducing the likelihood that the network will become a bottleneck.

The difference in monetary costs between the two types of architectures is minimal, since both architectures use the same network and server hardware. However, there is a hidden cost introduced with the additional complexity. If you manage your servers using network tools like Terminal Services, you will need to connect all the systems to a separate management network. Of course, this isn’t a disadvantage if you manage your systems by physically walking to the keyboard.

Final thoughts

Creating a systems architecture for a complex Web site requires an understanding of the specific performance and reliability requirements of the site. Both factors are critical to everyone, but most become willing to compromise once they discover the cost of deploying load-balanced Web servers and clustered databases. If you have the time, it is an interesting study to diagram several different architectures for your Web site.
In each, use different techniques from those described above, and compare the overall cost. You may find yourself overwhelmed with the price of an eight-processor SQL Server 2000 cluster, or you may realize that the cost is minimal compared to the business you lose when your site is offline.

Modern Web applications provide for a great deal of flexibility in the system architecture because they use a tiered structure. Because Web, transaction, database and messaging services are implemented in distinct processes, you have a great deal of flexibility in creating the physical architecture. Though I used Microsoft technologies as examples in this article, these rules apply to almost every Web platform, including Microsoft Site Server, HP BroadVision, Lotus Domino, and IBM WebSphere.

For architectures that rely on Microsoft technologies from top to bottom, it is advisable to implement Microsoft Transaction Server on the same physical systems as your Web servers. However, this is not required; transaction services can be completely abstracted. In fact, this architecture is so flexible that all services can be implemented on a single system or split across dozens of systems.
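The cost comparison suggested above can be roughed out with a few lines of Python. This is only a sketch under loud assumptions: every build cost, availability figure and hourly downtime cost below is invented for illustration, the build cost is treated as a one-year expense, and the function and architecture names are hypothetical. Substitute the numbers your own financial analyst provides.

```python
HOURS_PER_YEAR = 24 * 365

def expected_annual_cost(build_cost, availability, downtime_cost_per_hour):
    """Build cost plus the expected cost of a year's worth of downtime."""
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    return build_cost + downtime_hours * downtime_cost_per_hour

# Hypothetical candidates: (build cost in dollars, expected availability).
ARCHITECTURES = {
    "single server": (20_000, 0.99),
    "load-balanced Web + single database": (80_000, 0.995),
    "load-balanced Web + clustered database": (250_000, 0.999),
}

def cheapest(downtime_cost_per_hour):
    """Name of the architecture with the lowest expected annual cost."""
    return min(
        ARCHITECTURES,
        key=lambda name: expected_annual_cost(
            *ARCHITECTURES[name], downtime_cost_per_hour
        ),
    )

# Compare a site that loses little when offline with one that loses a lot.
for hourly_loss in (500, 5_000):
    print(f"${hourly_loss}/hour of downtime: {cheapest(hourly_loss)}")
```

With these made-up figures, the single server is the cheapest choice when an hour offline costs $500, and the clustered design becomes the cheapest when an hour offline costs $5,000, which is the trade-off the article describes.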