Which Database to Use with Rails
PostgreSQL's transaction support is solid: under the Serializable isolation level, it prevents such problems as dirty reads, nonrepeatable reads, and phantom reads. Although Postgres is not by any means a clone or emulation of any commercial database, it will nevertheless be familiar to programmers and DBAs who have experience with one of the commercial databases.
PostgreSQL has an unfortunate reputation for being slow. It got this reputation because the default configuration is optimized for performance on a tiny machine; like any database, Postgres must be tuned for any serious use. Another disadvantage of PostgreSQL is its smaller community. Because no single company stands behind PostgreSQL, there is no centralized support structure for it; however, several companies specialize in Postgres consulting and provide support contracts.
Multiversion concurrency control (MVCC) is one of the most powerful ways to achieve isolation between concurrent database transactions. MVCC gives each transaction a snapshot of the data it accesses, as the data existed at the start of the transaction. The transaction performs actions on the data, which are logged with timestamps. When the transaction commits, the DBMS checks the logs to ensure there are no conflicts with other transactions; if the transaction can be performed successfully, it is applied to the database at once, atomically.
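For example, under PostgreSQL's Serializable level, a transaction keeps reading from its snapshot even after another session commits a change. This is a sketch; the accounts table and values are illustrative:

```sql
-- Session A: the snapshot is taken when the transaction starts.
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;   -- returns 100

-- Meanwhile, session B runs and commits:
--   UPDATE accounts SET balance = 50 WHERE id = 1;

-- Session A still reads from its snapshot: no nonrepeatable read.
SELECT balance FROM accounts WHERE id = 1;   -- still returns 100
COMMIT;
```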
Row-level locking locks only those rows affected by an update during a transaction, as opposed to the coarser page- or table-level locking. Since all update transactions are applied atomically, the database is always in a consistent state. Pending transactions are stored as logs alongside the database, to be written upon commit rather than being applied to the database in the middle of the transaction. The most significant consequence of this is that reads never block: they always read from a consistent version of the database.
It is important to realize that isolation for concurrent transactions usually trades off against performance: MVCC uses more storage space than locking because it has to store a snapshot for each in-progress transaction.

MySQL, the other leading open source database, is more polarizing: some hold it to be a toy, while others consider it a good foundation for web applications. In the shared-nothing architecture common to web applications, the application tier scales horizontally, so the bottleneck is the database; a big assumption of this architecture is that the application servers all share that one database. Old versions of MySQL (prior to version 5) had some fairly serious issues, many revolving around data integrity and constraints. To be fair, newer versions of MySQL address a lot of these issues.
I would still recommend PostgreSQL as a general rule where speed is not the primary criterion, since it has had enterprise-level features for much longer.
If you use MySQL, take these recommendations:

- Use version 5 or later. Many of the issues that existed with previous versions have been fixed or improved in version 5.
- Use InnoDB for absolutely anything where data integrity or concurrency matter. In most business environments, these features are non-negotiable. InnoDB is a journaled storage engine that is much more resilient to failures. Rails does the right thing here and defaults to the InnoDB storage engine when creating tables.
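As an illustration, a migration can also request the engine explicitly. This sketch uses a hypothetical accounts table and passes the engine through create_table's :options parameter:

```ruby
# Hypothetical migration: Rails defaults to InnoDB on MySQL, but the
# engine can also be stated explicitly via the :options key.
class CreateAccounts < ActiveRecord::Migration
  def self.up
    create_table :accounts, :options => 'ENGINE=InnoDB' do |t|
      t.string  :name
      t.integer :balance
    end
  end

  def self.down
    drop_table :accounts
  end
end
```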
MyISAM is usually faster when reads vastly outnumber writes or vice versa, while InnoDB is generally faster when reads and writes are balanced. It all comes down to the requirements of the specific application; these are only general rules. You should always benchmark with your real data and an accurate sample of the queries and statements you will be issuing, in a realistic environment. For example, a logging server for web analytics might use MyISAM tables: you want to be able to dump logs into it as fast as possible, and reads are performed far less often than writes.
Finally, run the server in strict mode; this makes MySQL a little bit more strict, raising errors on incorrect data rather than silently discarding it.
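This can be accomplished with a statement like the following (a sketch: TRADITIONAL is MySQL's documented shorthand for a set of strict-mode flags; set it in my.cnf to make it permanent):

```sql
-- Enable strict behavior for all new connections (MySQL 5.x).
SET GLOBAL sql_mode = 'TRADITIONAL';
```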
On the whole, MySQL tends to be faster. For many web applications, query speed may be the most important factor. MySQL also has more stable, tested replication and clustering options available, and it is somewhat better at handling binary data stored in the database (we discuss this at length later in the chapter). For many web applications, MySQL may be a clear win.

SQLite is a minimalist database that is excellent for small projects. Although it does not support many fancy features, it is a great choice for projects that will not grow very large. It supports ACID transactions [33] out of the box. SQLite is a library that is linked into your program; there is no server process to speak of. Therefore, it uses file-level locking: the entire database file is locked at the filesystem level during a transaction.
Still, for many small applications, it fits the bill perfectly. It is a good replacement for data that may have been stored in flat files, as it supports most of the SQL standard and would be easy to migrate to a more traditional DBMS as needs grow.
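The Rails configuration for SQLite is minimal. A sketch (the adapter name matches the common sqlite3 bindings; the file path is illustrative):

```yaml
# config/database.yml -- SQLite keeps the whole database in one file.
development:
  adapter: sqlite3
  database: db/development.sqlite3
```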
Once the database configuration is in place, the standard ActiveRecord API methods can be used to manipulate data.

Rails can also connect to commercial databases such as Oracle. Windows, Linux, and OS X are supported as clients. The connection configuration is fairly standard, using oci as the connection adapter name. However, the Oracle client library still maps net service names to connection specifications, so the host parameter provides a service name rather than a physical hostname:
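A sketch of the corresponding database.yml entry; the service name and credentials are illustrative:

```yaml
# config/database.yml -- Oracle via the oci adapter; `host` names a
# net service, not a machine.
production:
  adapter: oci
  host: myapp          # net service name, resolved by the Oracle client
  username: app_user
  password: secret
```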
The service name is resolved through the client's TNSNAMES.ORA file, which will look something like the sketch following this paragraph. Alternatively, you can provide the connection specification on one line with the Rails database configuration, embedding the full specification in the host parameter. The connection setup is the hardest part. Stored procedures and other Oracle-specific syntax are available through the standard methods that expose an SQL interface, such as ActiveRecord::Base.connection.execute.
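Both forms are sketched here; the service name, host, port, and SID are illustrative:

```
# TNSNAMES.ORA entry mapping the service name `myapp` to a server:
myapp =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db.example.com)(PORT = 1521))
    (CONNECT_DATA = (SID = orcl)))

# One-line alternative, placed directly in database.yml:
#   host: "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=db.example.com)(PORT=1521))(CONNECT_DATA=(SID=orcl)))"
```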
Sooner or later, many web applications must deal with the issue of LOB (large object) data. LOB data may be small, but it is usually large compared to other attributes being stored (tens of kilobytes to hundreds of gigabytes or larger).
The defining characteristic of LOB data, however, is that the application has no knowledge of the semantics of its internal structure. LOBs are conventionally divided into character large objects (CLOBs) and binary large objects (BLOBs), and some DBMSs separate the two as distinct data types. The DBA types among us might prefer database storage of large objects.
From a theoretical standpoint, storing binary data in the database is the cleanest and most straightforward solution. It offers some immediate advantages:

- All of your application data is in the same place: the database. There is only one interface to the data, and one program is responsible for managing the data in all its forms.
- You have greater flexibility with access control, which really helps when working with large-scale projects. DBMS permitting, different permissions may be assigned to different tables within the same database.
- The binary data is not tied to a physical file path; when using filesystem storage, you must update the file paths in the referring database if you move the storage location.
PostgreSQL has some downright weird support for binary data. With in-table storage (the BYTEA type), nonprintable bytes must be escaped into multicharacter sequences on the way in and unescaped on the way out. Needless to say, this causes unnecessary expansion of the data. In addition, it is impossible to stream data from the database to the web browser without running it through an unescape filter. Pulling a 2 MB binary file from the database usually means streaming somewhere around 6 MB of data through the unescape code. A better option would be to have the postgres C library handle quoting and unquoting, but this is a lot of work and still suboptimal.
The other option is PostgreSQL's large objects. The large object features work well enough, but they are also a little bit clunky. Up to 2 GB of data may be stored in a large object, access is fast, and the APIs are good, but there are drawbacks: the large object mechanism is slightly deprecated in favor of in-table storage, as the TOAST storage technique allows values of up to 1 GB in length to be stored directly as attributes within the table.

My recommendation is to use filesystem storage for all binary objects if you use PostgreSQL. Although the database might be the more proper place for this type of data, it just does not work well enough yet.
If you have to use the database, large objects actually perform pretty well.

MySQL does a fairly good job with binary data. Actual storage and performance depend on the wire protocol being used, buffer size, and available memory. Storage is efficient, using up to 4 bytes to store the data length, followed by the binary data itself. However, MySQL suffers from issues similar to PostgreSQL's when streaming data, and it is always more awkward for a web application to stream data from the database than from the filesystem.
Amazon's Simple Storage Service (S3) is another possibility. It is supported by a fairly mature API and can be used directly from Rails. Consider it a formalization of the filesystem storage method discussed below.
This may prove to be of value in some situations. The reality, however, is that filesystem storage is the best option as a general rule. The Linux kernel has syscalls such as sendfile that work on physical files, and there are hundreds of third-party utilities that you can only leverage when using physical files:
Image processing is arguably the most popular application for storing binary data. Programs like ImageMagick are much easier to use in their command-line form, operating on files, than often-problematic libraries like RMagick are to get working with Ruby. Achieving high availability or load balancing with database large objects can also be tricky.
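For example, a thumbnail can be produced by shelling out to ImageMagick's convert command rather than loading RMagick in-process; the filenames and geometry here are illustrative:

```ruby
# Resize an uploaded image by invoking the ImageMagick CLI on files,
# avoiding any in-process RMagick dependency.
system('convert', 'upload.jpg', '-resize', '120x120', 'thumb.jpg') or
  raise 'convert failed'
```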
Any other utility that works on files would have to be integrated or otherwise modified to work from a database. Why does serving files from the filesystem win on performance? The short answer is that web servers are optimized for throwing binary files down a TCP socket, and the most common thing you do with binary files is throw them down a TCP socket. The long answer: the secret to this performance, under Linux and various BSDs, is the kernel sendfile syscall (not to be confused with X-Sendfile, discussed later).
The sendfile function copies data quickly from a file descriptor (which represents an open file) to a socket (which is connected to the client). This happens in kernel mode, not user mode; the entire process is handled by the operating system.
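A sketch of the Linux sendfile(2) call; the wrapper function and its names are illustrative:

```c
/* Stream an open file to a connected socket with Linux sendfile(2).
 * The copy happens entirely in kernel space; user code never touches
 * the file's bytes. */
#include <sys/sendfile.h>
#include <sys/types.h>

ssize_t stream_file(int client_sock, int file_fd, size_t length)
{
    off_t offset = 0;  /* start at the beginning of the file */
    return sendfile(client_sock, file_fd, &offset, length);
}
```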
When sendfile is invoked, the kernel streams the file directly to the socket. On the other hand, Rails is necessarily involved in the whole process when the data comes from the database: the file must be passed, chunk by chunk, from the database to Rails, which creates a response and sends the whole thing (including the file) to the web server. The web server then sends the response to the client. Using sendfile is impossible here because the data does not exist as a file; it must be buffered in memory, and the whole operation runs in user mode. The entire file is processed several times by user-mode code, which is a much more complicated process.

Often you will need to send a file to the client for download after doing some processing in Rails.
The most common example is an access-controlled file: you need to verify that the logged-in user has the appropriate level of access before sending it. The simple approach is Rails' send_file helper. This method is easy, but it is slow if you are sending static files: Rails reads the file and streams it byte by byte to the client. The X-Sendfile protocol is a very simple standard, first introduced in the Lighttpd web server, which directs the web server to send a file from the filesystem to the client rather than a response generated by the application server.
Because the web server requires access to the file in order to send it to the client, you must use filesystem large object storage. In addition, the files to be sent must have permissions set so as to be accessible to the web server. However, the files should be outside of the web root, lest someone guess a filename and have free access to your private files.
A typical response using X-Sendfile would look something like the sketch below. Assuming the web server is properly configured, it will ignore the response body and stream the file from disk to the client. From Rails, you set the response's X-Sendfile header directly. Of course, the front-end web server must be properly configured to recognize and process the X-Sendfile header. Mongrel does not support X-Sendfile, as it assumes you will proxy to it from a server more capable of serving static content.
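A sketch of the raw response, followed by a controller action that produces it; the controller, model, and file paths are hypothetical:

```
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Sendfile: /srv/private/files/report.pdf
```

```ruby
# Hypothetical action: check access, then delegate the actual file
# transfer to the front-end web server via the X-Sendfile header.
class AttachmentsController < ApplicationController
  def download
    attachment = Attachment.find(params[:id])
    # (verify the logged-in user's access rights here)
    response.headers['X-Sendfile']   = attachment.full_path
    response.headers['Content-Type'] = attachment.content_type
    render :nothing => true
  end
end
```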
If you are using Lighttpd, it has X-Sendfile support built in. If you are using Apache 2, things are a little more complicated (although not by much): you need the third-party mod_xsendfile module, which provides two configuration directives.
XSendFile: determines whether the X-Sendfile header is processed at all.

XSendFileAllowAbove: determines whether that header can send files above the path of the request. It defaults to off for security reasons.

Both of these configuration options can be used in any configuration context, down to the directory level.
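A sketch of a narrowly scoped configuration; the virtual host and server name are illustrative:

```apache
# Enable X-Sendfile only for the application's virtual host.
<VirtualHost *:80>
  ServerName app.example.com
  XSendFile on
  XSendFileAllowAbove on   # only if files live outside the request path
</VirtualHost>
```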
Best practice dictates that you should only specify XSendFile on in the narrowest possible context. Having X-Sendfile unnecessarily enabled is a security risk, as it allows a server application to send any file that the web server can access to the client. Used properly, though, X-Sendfile completely removes this load from the application servers and provides a more scalable solution.

Most applications that use large objects must deal with uploads. This can be tricky in any framework, but Rails handles most of the details, and there are some best practices to guide you with the rest.
One of the easiest ways to handle Rails uploads is to use one of the popular plugins for upload processing. The classic choice was Rick Olson's acts_as_attachment. Many Rails developers are familiar with its interface, and for quite a while it was the standard way to handle uploaded data. However, there were a few factors that made it unsuitable for many applications:

- It is tied to RMagick (and therefore ImageMagick) for image processing. ImageMagick is notoriously difficult to install, primarily because it depends on many backend libraries for processing different image formats. Now, however, there is a lighter alternative, ImageScience, based on the FreeImage library.
- The entire attachment data must be read into memory and converted to a Ruby string. For large files, this is expensive: Rails passes the application a TempFile, which is slurped into a String. If using filesystem storage, the string is then written back out into a file!

Luckily, there is an alternative.
Its successor, attachment_fu, can store attachments in a database, the filesystem, or Amazon S3 out of the box.

So what is a database, anyway? Hmmm… the first thought that comes to mind is something that stores data. But a database stores data in an organized and structured way, so we can retrieve it easily. What about organizing all the data in tables, where the fields hold the individual pieces of data stored in the database? And now we have a well-structured way to store data: in a table!
How about getting, deleting, inserting, and updating data? If we want to get all the people stored in the People table, we need to select them from that table. With the SQL language we can select, delete, insert, and update data, and we can attach conditions to our queries.
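A sketch of the four basic statements, assuming a People table with first_name, last_name, and age columns (the sample values are illustrative):

```sql
-- Get: read every person from the People table.
SELECT * FROM people;

-- Insert: add a new person.
INSERT INTO people (first_name, last_name, age)
VALUES ('Leandro', 'Kinoshita', 30);

-- Update: without a condition, every row is changed!
UPDATE people SET age = 31;

-- Delete: without a condition, every row is removed!
DELETE FROM people;
```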
But what if we want to delete just the records with the last name Kinoshita? Or update the specific person with first name Leandro and last name Kinoshita? Or select all the people and sort them by age, from youngest to oldest? For that we add WHERE and ORDER BY clauses, as sketched below.
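The same statements with conditions and sorting (values again illustrative):

```sql
-- Delete only the records whose last name is Kinoshita.
DELETE FROM people WHERE last_name = 'Kinoshita';

-- Update only the person named Leandro Kinoshita.
UPDATE people SET age = 31
WHERE first_name = 'Leandro' AND last_name = 'Kinoshita';

-- Select everyone, sorted by age from youngest to oldest.
SELECT * FROM people ORDER BY age ASC;
```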
Relationships between tables

We know how to execute queries, with or without conditions; now let's look at how tables relate to each other. In a one-to-one (1-1) relationship, a record in one table references exactly one record in another: a person has one passport, and that passport belongs to that specific person. So in this example, we have a People table, a Passports table, and a 1-1 relationship. Now imagine an e-commerce platform: users, orders, products, payments, and so on.
In a one-to-many (1-n) relationship, a record from one table can reference many records from another: a user can have many orders, and each order belongs to that specific user. So in this example, we have a Users table, an Orders table, and a 1-n relationship. In a many-to-many (n-n) relationship, records on each side can reference many records on the other. Take the e-commerce platform again: we divide products into categories. A category has many products (the Technology category has many products, like cell phones and notebooks), and a product can belong to many categories (the Cellphone product belongs to both the Technology and Electronics categories). So in this example, we have a Products table, a Categories table, and an n-n relationship.
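A sketch of the three relationship types as table definitions; table and column names are illustrative, and the join table is what implements the n-n relationship:

```sql
-- 1-1: each passport references exactly one person.
CREATE TABLE passports (
  id        INT PRIMARY KEY,
  person_id INT UNIQUE REFERENCES people(id)
);

-- 1-n: each order belongs to one user; a user can have many orders.
CREATE TABLE orders (
  id      INT PRIMARY KEY,
  user_id INT REFERENCES users(id)
);

-- n-n: the join table pairs products with categories.
CREATE TABLE categories_products (
  category_id INT REFERENCES categories(id),
  product_id  INT REFERENCES products(id),
  PRIMARY KEY (category_id, product_id)
);
```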
With these basics in hand, let's return to Rails. Ruby on Rails recommends creating three databases: one each for the development, test, and production environments. You should initialize all three of them and create a user and password for each with full read and write privileges. (Here we are using the root user ID for our application.) At this point, you need to let Rails know about the user name and password for the databases. You do this in the file config/database.yml.
This file has configuration sections for the three MySQL databases. In each of the sections you use, you need to change the username and password lines to reflect the permissions on the databases you've created.
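A sketch of the development section; the test and production sections follow the same pattern with their own database names (the credentials shown are illustrative):

```yaml
# config/database.yml -- one section per environment.
development:
  adapter: mysql
  database: myapp_development
  username: root
  password: yourpassword
  host: localhost
```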