Lessons in server development and operation on Amazon EC2 from the ARMORED CORE V infrastructure



At CEDEC 2012, Takashi Eba of FromSoftware gave a lecture titled "Case examples of using cloud server in ARMORED CORE V online service", covering the problems that have to be solved when operating a game server on Amazon EC2 and the specific solutions to them.

Takashi Eba:
My talk is titled "Cloud server utilization example in ARMORED CORE V online service". Thank you for coming.


First of all, let me introduce myself. I joined FromSoftware in 2002 and mainly worked on library development, building development environments, and back-end support. From around 2007, starting with Meet-me, I became involved in developing online services, and for about five years I have been working on online games. ARMORED CORE V, which I will talk about today, was the first time I used a cloud service.


First, I will briefly explain what this session covers. It is based on our experience developing ARMORED CORE V, an online title for consoles. The ARMORED CORE series has always been a console title, but for ARMORED CORE V there was a desire to make the online element the main feature, and I ended up in charge of that part. As part of that, it was decided that we would use a cloud service as the infrastructure. This time I will mainly talk about three things: consoles, online, and in particular cloud services.


ARMORED CORE V was released on PS3 and Xbox 360, so when I talk about consoles I am referring to these two platforms. Depending on the topic, I think some of this can be applied to other platforms as well. There are also a few points to note. The term "online title" comes up often; as used here, it means a title built on a client-server network system.


Titles that only use P2P connections through the matching system provided by the platform are not included. In this session I cite several examples, problems, and solutions, but they are only examples and not necessarily the best methods, so please keep that in mind. Depending on the title, a given method may not be appropriate at all, so please be aware of that as well. As for cloud services, I will basically talk about IaaS, based on Amazon EC2, which we used for ARMORED CORE V. There are other cloud services, but everything here is based on Amazon EC2.


The agenda is as follows. First, I will cover points of caution in developing online titles: not the cloud yet, but what to watch out for in an online title for consoles. Next come the points of caution regarding infrastructure, and then server infrastructure itself. Finally, I will organize the problems raised by these points of caution, go through concrete countermeasures and the problems that emerged along the way, and close with some considerations and a summary.


Points of caution in developing online titles for consoles.


The first thing to keep in mind is the particular nature of console platforms. Policies differ from platform to platform, and each brings its own peculiarities, which cannot be avoided. Two of them deserve attention. The first is security: specifically, the communication protocol and user authentication.


The second is the QA check unique to consoles. The QA check is carried out by platform holders such as Microsoft and Sony, and it checks whether the title follows the required rules rather than whether the game has bugs. A game can be released only after passing this QA check, which is why I count it as a peculiarity. On PC there is nothing comparable, but on consoles you must always pass it, so you have to be careful when releasing patches and title updates. In other words, you cannot push out patch after patch.


Next are the differences from PC. There are differences in hardware specifications. The CPU and memory differ, not so much in performance as in structure, such as endianness. As for memory capacity, consoles have less of it. There are also restrictions on the libraries available. For example, even a freely licensed open source library is generally designed to run on common platforms such as Windows and Linux, and is not expected to run on a console. To run it on a console, it has to be ported. That in itself is doable, but some libraries depend on specific hardware or use OS-dependent system calls, so reusing them takes effort. That said, if you are developing multi-platform titles, this is a familiar issue and not something to worry about much.


Running an online service also means thinking about operations. First, you have to think carefully about product support. Console users are a little different from PC game users: quite a few are not technically familiar with networking, so the service has to be offered with a solid plan for supporting them. Also, since trouble will always occur, you need to decide in advance how to respond when online problems arise.


Then there are the circumstances of the platform's own online services. PlayStation Network and Xbox Live naturally undergo maintenance, so the question is what to do when the platform service stops. We basically cannot prevent those stops, but we do need to be aware of their timing.


Then there is monitoring of the service's operating status. This is not specific to consoles, but when running an online service you need to monitor whether the infrastructure is working properly and whether the server processes have died.


About infrastructure, which cannot be separated from online services.


What is required of the server infrastructure? Let me organize that briefly. In general, in a client-server network the load on the server rises as the number of clients increases. Server load here simply means CPU load, memory usage, and bandwidth usage. If the software suddenly sells well, the load rises sharply and servers have to be added in a hurry. If you manage the infrastructure yourself on physical hardware, you have to provision with a margin in advance, because it can take about a month from ordering the hardware until it is ready.


So you want to understand, with some degree of accuracy, how the number of clients will grow. Actual sales figures are hard to forecast in advance: real numbers do not become visible until the wholesalers' orders come in, about a week before the release date. Predicting sales of a title is quite difficult. It is even harder for a freemium, free-to-play title, where you cannot tell how many users will connect. When estimating the number of connecting clients, the important thing to watch is the sales pattern.


As for the sales pattern, packaged titles sell in a distinctive way, and the pattern differs greatly by sales region.


For example, suppose we have a graph like this one. As you can see, sales in Japan and in the West follow quite different patterns. The vertical axis is the number of units sold per day, so it represents the growth rate of the user base. In Japan, therefore, the number of concurrent connections rises sharply right after release. Awkwardly, this pattern is unique to Japan, so this particular sales pattern has to be anticipated in advance.


About product life. Because the content of a packaged title does not change, users decline unless there is a major update. Active users rise sharply right after release and then gradually decrease, eventually settling close to zero. The infrastructure has to be prepared with this product life and sales pattern in mind.


Based on what I have said so far, let us consider the options for server infrastructure. FromSoftware already had experience developing online software, but this was the first time we deployed a service on server infrastructure of our own, so in developing ARMORED CORE V we had to think about how to handle infrastructure. To keep things simple, I will compare on-premises and cloud services.


With on-premises, the issue is the considerable initial cost of hardware, cables, switches, and so on. Owning hardware also means repair costs when it breaks and monthly fees for housing it in a data center. Although it is expensive, the network can be designed freely, so if you want to do something special you can design for it. There are pros and cons, but the biggest bottleneck is securing people who specialize in infrastructure to do that design work.


With a cloud service, you can cut wasted cost because billing is pay-as-you-go: you pay only for what you use. Another advantage is that you do not have to worry about maintenance. It is not all upside, however: restrictions that stem from how the infrastructure works will inevitably appear. This time I will talk about IaaS.


Take the restrictions of Amazon EC2 (IaaS) as an example. The first is the service level. Each IaaS provider has an SLA, and Amazon EC2 states a target annual uptime of 99.95%. If you deploy a service on a single IaaS system, a service level above this is obviously impossible; to offer a higher one, the service has to be distributed. If you use a provider with a good SLA figure, I think you can reach a higher service level than with a poorly built, low-budget on-premises system. After all, raising quality on your own means paying for redundant networks and equipment, and in some situations that may not be feasible at all.


The bigger point is that IaaS is, in most cases, provided in a virtual environment. With Amazon EC2 you use a guest OS running in a Xen-based virtual environment, and basically you can only do what a Xen guest OS can do. So what are the restrictions? There is a single virtual NIC, and its IP address is assigned dynamically. The global IP address used for access from outside is mapped by NAT. The global IP address to be mapped can itself be reserved.
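To put the 99.95% uptime target mentioned above in concrete terms, the following sketch converts an availability figure into allowed downtime per year and shows why distributing a service can raise the combined figure. It is an illustration, not part of the lecture: the function names are hypothetical, and the second function assumes that the deployments fail independently and that the service stays up as long as one of them is up.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability: float) -> float:
    """Hours per year a service may be down while still meeting the target."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(availability: float, replicas: int) -> float:
    """Availability of several deployments taken together.

    Assumption: failures are independent, and the service counts as "up"
    whenever at least one deployment is up.
    """
    return 1.0 - (1.0 - availability) ** replicas


if __name__ == "__main__":
    # A 99.95% target leaves a little under four and a half hours per year.
    print(round(allowed_downtime_hours(0.9995), 2))
    # Two independent deployments at 99.95% each.
    print(combined_availability(0.9995, 2))
```

Real deployments rarely fail fully independently, so the combined figure should be read as an upper bound rather than a promise.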


Let me organize the problems raised so far. Roughly, they look like this. Of these, differences in hardware specifications and restrictions on available libraries are routine if you already handle multiple platforms, and not that difficult, so I will skip them this time. Service operations are also needed for any online service, whether or not it involves consoles or the cloud, so I will omit them as well.


These three items are what I will cover this time.


About concrete countermeasures.


First, security support for each platform.


The key points in security support are the communication protocol and user authentication. This time I will talk only about Xbox 360.


Regarding the communication protocol, Xbox 360 must use XSP (Xbox Secure Protocol). As for what XSP is: IP runs over Ethernet, UDP runs over IP, and XSP sits on top of UDP. From the application layer's point of view you rarely need to be aware of XSP itself, but whatever the application communicates is carried over it. Xbox 360 always uses XSP when communicating.


Regarding user authentication, Xbox 360 provides XLSP (Xbox Live Server Platform), and by using it you do not need to worry about authenticating users yourself. XLSP is a mechanism that lets your own servers work with XSP. Because the protocol is not public, your own server cannot handle XSP directly; instead, XLSP is placed in between, serving as the platform that provides secure communication.


In summary, XSP is mandatory on Xbox 360, but by using it you do not have to worry about security, and you do not have to implement security yourself. Game developers are not security experts, and security is both hard to get right and costly to get wrong, so this is a very welcome system. Unfortunately, it runs up against the restrictions of cloud services. When using the cloud for an Xbox 360 title, the network service cannot be realized unless XLSP runs on the cloud. I will go through how we got it running while explaining XLSP along the way.


XLSP refers to the two servers enclosed by the red dotted line in the middle of the diagram. It is software that runs on Windows and consists of two modules, the Web Proxy and the Security Gateway. It runs as a Windows service, basically on Windows Server. In the network layout it sits between the Internet and the data center. The Web Proxy and the Security Gateway each have two NICs, one facing the Internet and one facing the data center, which are assigned a global IP and a private IP respectively.


As for how it behaves, the Web Proxy works as a proxy, as its name suggests, and it is essentially the component that can communicate with the Xbox Live servers. The Security Gateway cannot communicate with the Xbox Live servers directly and does so via the Web Proxy. The Security Gateway is the node that clients connect to directly, and it can be scaled out as the number of clients grows. Scaling out increases the number of IP addresses and endpoints to connect to. To handle that, when a Security Gateway starts it registers its own global IP address with the Xbox Live servers via the Web Proxy. The client retrieves the registered global IP address and uses it to connect to the Security Gateway. The Security Gateway decrypts packets arriving from the client, authenticates the user, and then forwards the decrypted packets to our own server in the cloud. The important thing here is how the Security Gateway works.


The Security Gateway literally plays the role of a gateway. Basically, it performs protocol conversion: it decrypts XSP packets arriving from the Xbox 360 and converts them into raw UDP/TCP packets, and it converts raw packets arriving from the back-end server into XSP and returns them. Because it converts protocols, it rewrites the headers, including destination and source. When a packet first arrives from a client, the gateway authenticates the user, and if it can confirm that the user is logged in correctly, it decrypts the packet and sends it to the back end. Unauthenticated packets are rejected. The gateway is completely separate from the Windows IP stack and runs independently of the OS network settings.


Suppose we have client 1, client 2, the SG (Security Gateway), and our own server. Assume user authentication is complete and IP addresses and port numbers have been set up. First, a packet arrives from client 1. When an XSP packet reaches the Security Gateway, the gateway decrypts it back into a raw UDP packet, rewrites the header, and sends it to our server. A packet from client 2 is likewise decrypted and sent to our server, but its source port becomes 1001, so the server can tell that it came from a different client. The point is that the Security Gateway ties together the global IP address on the Internet side and the private IP address on the data center side. For both global and private IPs, the Security Gateway itself assigns addresses from a specified range to the NIC. This is where its separation from the Windows IP stack comes in. The Security Gateway behaves like a driver: it sets up IP addresses on its own and picks up the packets that arrive there on its own. Whatever Windows does, it intercepts and sends packets regardless. Even if you disable networking in Windows, the Security Gateway keeps running. This is the sticking point, and it is where XLSP's operating conditions come from.
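The header rewriting described above can be pictured as a small mapping table that gives each client its own source port on the data center side. The sketch below is a toy model only and says nothing about how XLSP is really implemented: the class name, the addresses (taken from documentation ranges), and the starting port of 1000 are all assumptions, the last one chosen so that the second client ends up with port 1001 as in the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class GatewayMapping:
    """Toy model of a gateway's address rewriting (not the real XLSP logic)."""

    private_ip: str        # address the gateway uses on the data center side
    next_port: int = 1000  # assumed first port handed out
    by_client: Dict[Endpoint, int] = field(default_factory=dict)
    by_port: Dict[int, Endpoint] = field(default_factory=dict)

    def inbound(self, client: Endpoint) -> Endpoint:
        """Return the rewritten source endpoint for a packet from `client`."""
        if client not in self.by_client:
            # First packet from this client: allocate a fresh port for it.
            self.by_client[client] = self.next_port
            self.by_port[self.next_port] = client
            self.next_port += 1
        return (self.private_ip, self.by_client[client])

    def outbound(self, port: int) -> Endpoint:
        """Return the client endpoint that a reply addressed to `port` goes to."""
        return self.by_port[port]


if __name__ == "__main__":
    sg = GatewayMapping(private_ip="192.168.0.10")
    print(sg.inbound(("203.0.113.5", 50000)))   # client 1
    print(sg.inbound(("198.51.100.7", 50000)))  # client 2
    print(sg.outbound(1001))                    # reply routed back to client 2
```

Because the back-end server only ever sees the gateway's private address, the port is what lets it tell clients apart.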


The first operating condition of XLSP is two NICs whose addresses can be assigned freely. It also needs a private IP address space that can be used freely, and a global IP address that can be used freely. These conditions collide with the limits of Amazon EC2. Amazon EC2 gives you only one NIC, and IP addresses cannot be freely assigned to that virtual NIC. As a result, the Security Gateway ends up on a network it cannot control.


Unless this is dealt with, XLSP cannot run on Amazon EC2. So how do we get it running? First, consider what XLSP really needs. The requirement of two NICs is not strictly true. The Security Gateway straddles the Internet side and the data center side, and communication on the data center side is not secure, so the two NICs are there to separate the networks physically and keep those packets from being seen. Just to make it run, a single NIC is enough. In summary, XLSP needs three things: a freely usable private IP address space, a global IP address, and a NIC to which addresses can be assigned freely. As it stands, that still cannot be done on Amazon EC2. In short, what we wanted was to configure the network freely, so we guessed that a virtual network might make it work.


So I tried running a VPN on Amazon EC2. Amazon EC2 has its own VPN service, and you can build a virtual network with it, but it does not let you set IP addresses from an application. You can set them from the management console, but since you cannot customize this yourself, I tried OpenVPN instead. It is implemented entirely in software, so it runs on Amazon EC2. OpenVPN provides a virtual interface (TAP), and IP addresses can be set on that TAP interface from an application, which solves one problem. However, because the VPN is implemented in software, the overhead is large. Even so, it looked manageable, so I gave it a try.


Suppose we have three machines here: the Web Proxy, the Security Gateway, and our own server. Their addresses are assigned arbitrarily by Amazon EC2. Deploying OpenVPN on top of this lets us assign private addresses of the form 192.168.x.x to the TAP interfaces. Because the network can now be configured freely, having the Security Gateway use the TAP interface satisfies XLSP's requirements, so I started XLSP up.


However, although XLSP runs, the Security Gateway sees only the TAP interface. The Internet and the VPN are completely cut off from each other, so packets from the Xbox 360 never arrive. To get packets to the Security Gateway, the Internet side and the VPN side have to be connected, so we relay between them with a packet repeater.


When the packet repeater receives a packet, it sends it to the Security Gateway through the TAP interface, and traffic then flows cleanly from the Xbox 360 to our server. With all of this in place, XLSP can run on Amazon EC2, and this is the configuration ARMORED CORE V runs in production. Stacking this many layers reduces throughput and adds overhead, but that is unavoidable. The requirements depend on what kind of service you run on top, but ARMORED CORE V does not use much bandwidth, so I decided to go with this approach.
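The lecture does not describe how the packet repeater was built, so the following is only a minimal sketch of the general idea: a user-space UDP relay that listens on the Internet-facing address and forwards each datagram to a gateway address reachable over the VPN, then passes replies back. The class name and all addresses are hypothetical, and details such as how the real repeater handled source addresses are not covered by the source.

```python
import select
import socket
from typing import Dict, Tuple

Address = Tuple[str, int]


class UdpRepeater:
    """Relays datagrams between a public UDP port and a gateway address."""

    def __init__(self, listen: Address, gateway: Address) -> None:
        self.gateway = gateway
        self.public = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.public.bind(listen)
        # One upstream socket per client, so replies can be matched to a client.
        self.upstream: Dict[Address, socket.socket] = {}
        self.clients: Dict[socket.socket, Address] = {}

    def step(self, timeout: float = 1.0) -> int:
        """Relay one datagram from each socket that is ready; return the count."""
        socks = [self.public, *self.clients]
        readable, _, _ = select.select(socks, [], [], timeout)
        for sock in readable:
            data, sender = sock.recvfrom(65535)
            if sock is self.public:
                # Client -> gateway: reuse or create this client's upstream socket.
                up = self.upstream.get(sender)
                if up is None:
                    up = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
                    self.upstream[sender] = up
                    self.clients[up] = sender
                up.sendto(data, self.gateway)
            else:
                # Gateway -> client: send the reply out of the public socket.
                self.public.sendto(data, self.clients[sock])
        return len(readable)

    def close(self) -> None:
        for sock in [self.public, *self.clients]:
            sock.close()
```

In a setup like the one described, `listen` would be the instance's EC2-assigned address and `gateway` the Security Gateway's address on the TAP interface; a long-running process would simply call `step()` in a loop. Every relayed packet crosses user space twice, which is one way to see where the extra overhead comes from.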


Next, handling the QA check.


This is a barrier you must always clear before releasing a product. Since it is a check for release, the QA check is basically run on the same build as the retail product, so the connection destination configured in it is the same as in the product. The QA check also applies after release, whenever a patch adds features or fixes defects. When a QA check is run after release, the problem is that the build under test connects to the production server, because its connection destination is the same as the live one. You therefore have to control the connection destination.


What I mean is: if the client fetches information from a web server over HTTP, what do you do about the URL it accesses? You can hard-code it in the source code, or obtain it dynamically in some way. Since the QA check needs to reach a dedicated web server prepared for it, that switching has to be handled properly.


The answer is simple: forward the packets with NAT. This is not the only way, but the quickest approach on the server side is packet forwarding. Set up iptables at the entrance to the production server and prepare a dedicated server for the QA check; packets arriving from the QA team are forwarded to the QA server, and the returning packets are forwarded back to the QA team. Users' packets then go to the production server, and the QA team's packets go to the QA server. For simple connections, this is all it takes.
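The forwarding decision described above can be modeled in a few lines. The lecture says iptables was used but does not say how the QA team's packets were recognized, so matching on the source network is an assumption here, as are the function name and the example addresses (all from documentation or private ranges). This is a model of the decision only, not a replacement for the iptables rules.

```python
import ipaddress
from typing import Iterable, Tuple

Endpoint = Tuple[str, int]


def pick_backend(source_ip: str,
                 qa_networks: Iterable[str],
                 production: Endpoint,
                 qa_server: Endpoint) -> Endpoint:
    """Return the backend a packet from `source_ip` should be forwarded to."""
    addr = ipaddress.ip_address(source_ip)
    for net in qa_networks:
        # Packets from a registered QA network are steered to the QA server.
        if addr in ipaddress.ip_network(net):
            return qa_server
    # Everyone else keeps reaching the production server.
    return production


if __name__ == "__main__":
    prod = ("10.0.0.10", 8080)
    qa = ("10.0.0.20", 8080)
    nets = ["203.0.113.0/24"]  # hypothetical QA team network
    print(pick_backend("203.0.113.42", nets, prod, qa))
    print(pick_backend("198.51.100.9", nets, prod, qa))
```

The appeal of doing this on the server side is that the client build stays byte-for-byte identical to the retail one; only the entrance to the production environment knows about the QA server.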


Finally, large fluctuations in server load.


In short, this is about load balancing. First, when you create an instance on Amazon EC2, it is given addresses like these.


When DNS is queried from outside EC2, this host name resolves to the global IP address; when queried from inside, it resolves to the local IP address. The global IP address is entirely invisible inside Amazon EC2, which is designed to deal only in local IP addresses internally.


The private IP is used only internally, and the global IP address is used for connections from outside EC2. If you use host names, connections work properly as long as you always use the Public DNS host name. However, an EC2 instance does not know its own Public DNS host name, so it has to be told in some way.


ARMORED CORE V used IP addresses. This was so that the servers could also run in environments other than EC2, such as development servers or our own machines. The global IP address and private IP address are written in each server process's configuration file. The server process reads them from the configuration file and records them in the DB, so the IP address of a connection destination can be looked up there.
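As a rough picture of the flow described above, the sketch below has a server process read its two addresses from a configuration file and record them in a database, from which other parties can look up where to connect. The file format, section and key names, table schema, and the use of SQLite are all assumptions made for illustration; the lecture does not specify any of them.

```python
import configparser
import sqlite3

# Hypothetical configuration file contents for one server process.
SAMPLE_CONFIG = """
[server]
name = lobby-01
global_ip = 203.0.113.10
private_ip = 10.0.0.10
port = 9000
"""


def register_server(conn: sqlite3.Connection, config_text: str) -> None:
    """Parse one server's config and record its addresses in the shared table."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    s = parser["server"]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS servers ("
        "name TEXT PRIMARY KEY, global_ip TEXT, private_ip TEXT, port INTEGER)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO servers VALUES (?, ?, ?, ?)",
        (s["name"], s["global_ip"], s["private_ip"], s.getint("port")),
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, name: str, from_inside: bool):
    """Return (ip, port): the private address for peers, the global one for clients."""
    column = "private_ip" if from_inside else "global_ip"
    return conn.execute(
        f"SELECT {column}, port FROM servers WHERE name = ?", (name,)
    ).fetchone()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    register_server(db, SAMPLE_CONFIG)
    print(lookup(db, "lobby-01", from_inside=True))
    print(lookup(db, "lobby-01", from_inside=False))
```

Keeping plain addresses in the file means the same process starts unchanged on a development machine, where no EC2 host name exists.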


From here I will talk about load balancing. Amazon EC2 has a load balancer mechanism called ELB (Elastic Load Balancing). If you use it, load is distributed automatically and you do not have to care about the IP addresses behind it, which is very convenient.


However, ELB has a problem: it supports only session-based protocols. It can be used for TCP, HTTP, and so on, but not for UDP. It cannot be used with Xbox 360 XLSP, which runs over UDP. Nor can it forward packets the way iptables does, so it cannot be used for the QA check packet forwarding. Since ELB cannot be used to distribute client connections, we do it ourselves.


For ARMORED CORE V we decided to fix the server that clients connect to first (the login server) and have the login server tell each client which server it should actually connect to.
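One way a login server could decide which server to hand out is sketched below. The lecture does not say which policy was used, so picking the least-loaded server is an assumption, and the class, field names, and capacity limit are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GameServer:
    global_ip: str
    port: int
    connections: int  # clients currently assigned to this server
    capacity: int     # assumed per-server limit, must be greater than zero


def assign_server(servers: List[GameServer]) -> Optional[GameServer]:
    """Pick the server with the lowest load ratio that still has room."""
    candidates = [s for s in servers if s.connections < s.capacity]
    if not candidates:
        return None  # every server is full; the caller must reject or queue
    chosen = min(candidates, key=lambda s: s.connections / s.capacity)
    chosen.connections += 1  # count the client we are about to send there
    return chosen


if __name__ == "__main__":
    pool = [
        GameServer("203.0.113.11", 9000, connections=90, capacity=100),
        GameServer("203.0.113.12", 9000, connections=10, capacity=100),
    ]
    target = assign_server(pool)
    print(target.global_ip, target.port)
```

Since the client only needs one fixed address (the login server), game servers can be added or removed behind it without touching the client, which fits the scale-out usage described in the talk.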


Now for some considerations. What we found after actually using Amazon EC2 is that it was very stable. Troubles attributable to EC2 have occurred only a handful of times so far. Another finding is that users of console titles concentrate heavily in a short period. During those peaks, scaling out in the cloud was very effective. Being able to scale up easily is also an advantage. And because servers can be provisioned on the spot, it was very convenient to be able to set up servers for development and QA checks right away.


As for open issues, the method of running XLSP described here has poor throughput. It will not work if, say, you want to deliver video, which requires sending and receiving large amounts of data; in that case you have to give up and go on-premises. Using the cloud makes it easy to add servers, which is very welcome, but as the number of server instances and the number of processes running on them grow, starting and stopping those processes becomes laborious. You need a mechanism that can drive a large number of instances and processes with simple operations. In short, you need a system for handling the cloud efficiently. There are paid services for this, so I think tools like that have to be considered.


To summarize: the points of caution for online console titles differ from platform to platform. Cloud services have restrictions, but they can be worked around with some ingenuity. Design the service with post-launch QA checks firmly in mind. You have to do load balancing yourself. In conclusion, I think cloud services can be one of the most suitable infrastructures for online console titles.

in Coverage, Posted by darkhorse_log