{"id":1459,"date":"2021-11-30T08:36:23","date_gmt":"2021-11-30T08:36:23","guid":{"rendered":"https:\/\/blog.amt.in\/?p=1459"},"modified":"2021-11-30T08:36:23","modified_gmt":"2021-11-30T08:36:23","slug":"introduction-to-presto","status":"publish","type":"post","link":"https:\/\/blog.amt.in\/index.php\/2021\/11\/30\/introduction-to-presto\/","title":{"rendered":"Introduction to Presto"},"content":{"rendered":"<p>Presto is a high-performance, distributed SQL query engine for big data. Its architecture allows users to query a variety of data sources, such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB. A single query can even combine data from several of these sources. Presto is community-driven open-source software released under the Apache License.<\/p>\n<p>SQL (Structured Query Language) is a domain-specific language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful for handling structured data, i.e. data that incorporates relations among entities and variables.<\/p>\n<p>SQL offers two main advantages over older read\u2013write APIs such as ISAM or VSAM. First, it introduced the concept of accessing many records with a single command. Second, it eliminates the need to specify <i>how<\/i> to reach a record, e.g. with or without an index.<\/p>\n<p>Presto was originally designed and developed at Facebook so that its data analysts could run interactive queries on the company&#8217;s large data warehouse in Apache Hadoop. 
Before Presto, Facebook&#8217;s data analysts relied on Apache Hive to run SQL analytics on their multi-petabyte data warehouse. Hive could not deliver fast enough queries at Facebook&#8217;s scale, and Presto was built to fill that gap. Development started in 2012, and Presto was deployed at Facebook later that year. In November 2013, Facebook released it as open source. In 2014, Netflix disclosed that it used Presto on 10 petabytes of data stored in the Amazon Simple Storage Service (S3).<\/p>\n<p>In January 2019, the Presto Software Foundation was announced. The foundation is a not-for-profit organization dedicated to the advancement of the Presto open-source distributed SQL query engine. Development of Presto continues independently in two projects: PrestoDB, maintained by Facebook, and PrestoSQL, maintained by the Presto Software Foundation, with some cross-pollination of code between them.<\/p>\n<p>Presto&#8217;s architecture is very similar to that of a classic massively parallel processing (MPP) database management system. It can be visualized as one coordinator node working in sync with multiple worker nodes. Clients submit SQL statements to the coordinator, which parses and plans them and then schedules parallel tasks on the workers. The workers jointly process rows from the data sources and produce results that are returned to the client. The original Apache Hive execution model ran each query as Hadoop MapReduce jobs, which write intermediate results to disk; Presto does not, which results in a significant speed improvement. 
Presto is written in the Java programming language.<\/p>\n<p>Connolly and Begg define a database management system (DBMS) as a &#8220;software system that enables users to define, create, maintain and control access to the database&#8221;.<\/p>\n<p>The DBMS acronym is sometimes extended to indicate the underlying database model: RDBMS for the relational model, OODBMS for the object-oriented model, and ORDBMS for the object-relational model. Other extensions can indicate some other characteristic, such as DDBMS for a distributed database management system.<\/p>\n<p>The functionality provided by a DBMS can vary enormously, but its core functionality is the storage, retrieval and update of data. Codd proposed the following functions and services that a fully-fledged, general-purpose DBMS should provide:<\/p>\n<ul>\n<li>Data storage, retrieval and update<\/li>\n<li>A user-accessible catalog or data dictionary describing the metadata<\/li>\n<li>Support for transactions and concurrency<\/li>\n<li>Facilities for recovering the database should it become damaged<\/li>\n<li>Support for authorization of access to and update of data<\/li>\n<li>Access support from remote locations<\/li>\n<li>Enforcement of constraints to ensure that data in the database abides by certain rules<\/li>\n<\/ul>\n<p>A single Presto query can combine data from multiple sources. Presto offers connectors to data sources including files in Alluxio, the Hadoop Distributed File System (HDFS), Amazon S3, MySQL, PostgreSQL, Microsoft SQL Server, Amazon Redshift, Apache Kudu, Apache Phoenix, Apache Kafka, Apache Cassandra, Apache Accumulo, MongoDB and Redis. Unlike Hadoop distribution-specific tools such as Cloudera Impala, Presto can work with any flavor of Hadoop, or without Hadoop at all. 
Presto supports the separation of compute and storage and can be deployed both on premises and in the cloud.<\/p>\n<p>The above is a brief introduction to Presto. Watch this space for more updates on the latest trends in technology.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Presto is a high-performance, distributed SQL query<\/p>\n","protected":false},"author":1,"featured_media":1461,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[229,445,7],"tags":[230,446,18],"class_list":["post-1459","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mongodb","category-presto","category-techtrends","tag-mongodb","tag-presto","tag-technology"],"_links":{"self":[{"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/posts\/1459","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/comments?post=1459"}],"version-history":[{"count":1,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/posts\/1459\/revisions"}],"predecessor-version":[{"id":1460,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/posts\/1459\/revisions\/1460"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/media\/1461"}],"wp:attachment":[{"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/media?parent=1459"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/categories?post=1459"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/tags?po
st=1459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}