Insights on NoSQL – Advanced Millennium Technologies

A NoSQL (originally referring to “non-SQL” or “non-relational”) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but the name “NoSQL” was only coined in the early 21st century, triggered by the needs of Web 2.0 companies. NoSQL databases are increasingly used in big data and real-time web applications. NoSQL systems are also sometimes called “Not only SQL” to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.

Motivations for this approach include: simplicity of design, simpler “horizontal” scaling to clusters of machines (which is a problem for relational databases), finer control over availability and limiting the object-relational impedance mismatch. The data structures used by NoSQL databases (e.g. key–value pair, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL. The particular suitability of a given NoSQL database depends on the problem it must solve. Sometimes the data structures used by NoSQL databases are also viewed as “more flexible” than relational database tables.

Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, partition tolerance, and speed. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages (instead of SQL, for instance), lack of ability to perform ad-hoc joins across tables, lack of standardized interfaces, and huge previous investments in existing relational databases. Most NoSQL stores lack true ACID transactions, although a few databases have made them central to their designs.

Instead, most NoSQL databases offer a concept of “eventual consistency”, in which database changes are propagated to all nodes “eventually” (typically within milliseconds), so queries might not return updated data immediately or might return data that is out of date, a problem known as stale reads. Additionally, some NoSQL systems may exhibit lost writes and other forms of data loss. Some NoSQL systems provide concepts such as write-ahead logging to avoid data loss. For distributed transaction processing across multiple databases, data consistency is an even bigger challenge that is difficult for both NoSQL and relational databases. Relational databases “do not allow referential integrity constraints to span databases”. Few systems maintain both ACID transactions and X/Open XA standards for distributed transaction processing.
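
The stale-read behavior described above can be illustrated with a small in-memory simulation. This is only a sketch under simplifying assumptions: the class names (Replica, EventuallyConsistentStore) and the explicit propagate() step are hypothetical and stand in for the background replication that a real distributed store would perform on its own.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and is propagated later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # (key, value) updates not yet propagated

    def write(self, key, value):
        # The write is acknowledged after reaching only the first replica.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # A read may be served by any replica, including one that is behind.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Apply the queued updates, in order, to the remaining replicas.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("username:42", "alice")
stale = store.read("username:42", replica_index=2)  # replica 2 has not seen the write yet
store.propagate()
fresh = store.read("username:42", replica_index=2)  # replica 2 has now caught up
```

Between write() and propagate(), a read served by replica 2 misses the new value; after propagation every replica agrees. In a real system that window is usually short, but the application still has to tolerate it.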

The term NoSQL was used by Carlo Strozzi in 1998 to name his lightweight Strozzi NoSQL open-source relational database that did not expose the standard Structured Query Language (SQL) interface, but was still relational. His NoSQL RDBMS is distinct from the around-2009 general concept of NoSQL databases. Strozzi suggests that, because the current NoSQL movement “departs from the relational model altogether, it should therefore have been called more appropriately ‘NoREL’”, referring to “not relational”.

Johan Oskarsson, then a developer at Last.fm, reintroduced the term NoSQL in early 2009 when he organized an event to discuss “open-source distributed, non-relational databases”. The name attempted to label the emergence of an increasing number of non-relational, distributed data stores, including open source clones of Google’s Bigtable/MapReduce and Amazon’s Dynamo.

There are various ways to classify NoSQL databases, with different categories and subcategories, some of which overlap. What follows is a basic classification by data model, with examples:

Wide column: Accumulo, Cassandra, Scylla, HBase
Document: Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, eXist-db, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB
Key–value: Aerospike, Apache Ignite, ArangoDB, Berkeley DB, Couchbase, Dynamo, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, SciDB, SDBM/Flat File dbm, ZooKeeper
Graph: AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4j, OrientDB, Virtuoso
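
To make the four data models in this classification more concrete, here is a rough sketch of how the same small piece of data might be shaped under each one. It uses plain Python structures rather than any real database client, and all keys, field names, and the followed_by helper are made up for illustration.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"user:100": json.dumps({"username": "alice", "city": "Oslo"})}

# Document: a self-describing, possibly nested record.
document = {"_id": 100, "username": "alice", "address": {"city": "Oslo"}}

# Wide column: row key -> column family -> column -> value.
wide_column = {
    "user:100": {
        "profile": {"username": "alice"},
        "address": {"city": "Oslo"},
    }
}

# Graph: nodes plus labeled edges between them.
nodes = {100: {"username": "alice"}, 101: {"username": "bob"}}
edges = [(100, "FOLLOWS", 101)]


def followed_by(user_id):
    """Usernames that the given user follows, found by scanning the edges."""
    return [
        nodes[dst]["username"]
        for src, label, dst in edges
        if src == user_id and label == "FOLLOWS"
    ]


city_from_kv = json.loads(key_value["user:100"])["city"]
city_from_doc = document["address"]["city"]
city_from_wide = wide_column["user:100"]["address"]["city"]
follows = followed_by(100)
```

The key-value store has to deserialize the whole value before it can read one field, while the document and wide-column shapes expose the field directly, and the graph shape makes the relationship itself the thing being queried.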

Since most NoSQL databases lack the ability to perform joins in queries, the database schema generally needs to be designed differently. There are three main techniques for handling relational data in a NoSQL database.

Multiple queries:

Instead of retrieving all the data with one query, it is common to do several queries to get the desired data. NoSQL queries are often faster than traditional SQL queries so the cost of additional queries may be acceptable. If an excessive number of queries would be necessary, one of the other two approaches is more appropriate.
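
The multiple-queries approach amounts to doing the join in application code. The sketch below assumes three hypothetical in-memory collections (posts, comments, users) and a get() helper that stands in for one round trip to the database; it counts the lookups so the cost is visible.

```python
# Hypothetical in-memory "collections" keyed by id.
posts = {1: {"title": "Hello NoSQL", "comment_ids": [10, 11]}}
comments = {
    10: {"user_id": 100, "text": "Nice post"},
    11: {"user_id": 101, "text": "Thanks"},
}
users = {100: {"username": "alice"}, 101: {"username": "bob"}}

query_count = 0


def get(collection, key):
    """Stand-in for a single key lookup against the database."""
    global query_count
    query_count += 1
    return collection[key]


def load_post_with_comments(post_id):
    """Assemble a post page with one lookup per related record."""
    post = get(posts, post_id)
    result = {"title": post["title"], "comments": []}
    for comment_id in post["comment_ids"]:
        comment = get(comments, comment_id)
        user = get(users, comment["user_id"])
        result["comments"].append(
            {"username": user["username"], "text": comment["text"]}
        )
    return result


page = load_post_with_comments(1)
```

One post with two comments already costs five lookups (one for the post, then two per comment). The count grows with the number of comments, which is the point at which the other two techniques become more attractive.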

Caching, replication and non-normalized data:

Instead of only storing foreign keys, it is common to store actual foreign values along with the model’s data. For example, each blog comment might include the username in addition to a user id, thus providing easy access to the username without requiring another lookup. When a username changes, however, it must be updated in every place it was copied. Thus this approach works better when reads are much more common than writes.
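
The trade-off in the blog comment example can be sketched as follows. The data and the function names (comment_author, rename_user) are hypothetical, and a plain list stands in for the comments collection.

```python
users = {100: {"username": "alice"}}

# Each comment carries a copy of the username next to the user id.
comments = [
    {"id": 10, "user_id": 100, "username": "alice", "text": "Nice post"},
    {"id": 11, "user_id": 100, "username": "alice", "text": "One more thing"},
]


def comment_author(comment):
    """Read path: the username is already on the comment, no extra lookup."""
    return comment["username"]


def rename_user(user_id, new_username):
    """Write path: update the user record and every copied username."""
    users[user_id]["username"] = new_username
    updated = 0
    for comment in comments:
        if comment["user_id"] == user_id:
            comment["username"] = new_username
            updated += 1
    return updated


touched = rename_user(100, "alice_b")
```

Reading an author is a single field access, but one rename had to touch every comment that user ever wrote. If the update loop were interrupted halfway, some comments would show the old name, which is why this layout suits read-heavy data.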

Nesting data:

With document databases like MongoDB it is common to put more data in a smaller number of collections. For example, in a blogging application, one might choose to store comments within the blog post document so that with a single retrieval one gets all the comments. Thus in this approach a single document contains all the data you need for a specific task.
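
As a sketch of the nesting approach, the blog post below embeds its comments directly in the post document. A plain dictionary stands in for a document collection here; the field names and the get_post and add_comment helpers are assumptions for illustration, not the API of MongoDB or any other product.

```python
# One "collection" of post documents; comments live inside each post.
posts = {
    1: {
        "title": "Hello NoSQL",
        "body": "First post body",
        "comments": [
            {"username": "alice", "text": "Nice post"},
            {"username": "bob", "text": "Thanks"},
        ],
    }
}


def get_post(post_id):
    """A single retrieval returns the post together with all its comments."""
    return posts[post_id]


def add_comment(post_id, username, text):
    """Adding a comment modifies the post document itself."""
    posts[post_id]["comments"].append({"username": username, "text": text})


add_comment(1, "carol", "Subscribed")
post = get_post(1)
comment_texts = [c["text"] for c in post["comments"]]
```

Everything needed to render the page arrives in one read. The cost is that the document grows with every comment, so this layout fits best when the nested data is always read together with its parent and stays reasonably small.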

The above is a brief overview of NoSQL. Watch this space for more updates on the latest trends in technology.
