Amazon Web Services (AWS) has recently announced significant new analytics capabilities that help customers embrace data at today’s and tomorrow’s scale. AWS introduced several new Amazon Redshift capabilities that it says bring more than an order of magnitude better query performance and give customers greater flexibility when working across their data lake, data warehouse, and operational databases at scale. AWS also announced UltraWarm, a new, highly scalable, lower-cost warm storage tier for Amazon Elasticsearch Service.
AWS provides the broadest and deepest set of analytics services of any cloud provider, and is constantly innovating based on customer needs for this new scale of data.
Amazon Redshift RA3 instances
To scale their data warehouse, customers use Redshift’s Elastic Resize capability to add instances to their cluster. Because each Redshift instance includes a fixed amount of compute and storage, customers can end up over-provisioned on one or the other and paying for capacity they don’t use. Customers have asked for the ability to grow their storage without over-provisioning compute, and to grow their compute capacity without increasing their storage costs.
New Amazon Redshift RA3 instances with Managed Storage allow customers to optimize their data warehouse by scaling and paying for compute and storage independently. Customers choose the number of instances they need based on their data warehousing workload’s performance requirements, and only pay for the managed storage that they use. Customers can automatically scale their data warehouse storage capacity without adding and paying for additional instances. Redshift Managed Storage uses a variety of advanced data management techniques to optimize how efficiently data is offloaded to and retrieved from Amazon S3.
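The over-provisioning trade-off can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the per-node capacities and workload figures are hypothetical, not published instance specifications, and "compute units" is an abstract stand-in for whatever a workload needs. When storage and compute come bundled, the cluster must be sized for whichever resource runs out first; with separately managed storage, only compute drives the node count.

```python
import math


def nodes_coupled(storage_tb: float, compute_units: float,
                  tb_per_node: float, compute_per_node: float) -> int:
    """Nodes needed when every node bundles a fixed amount of storage and compute.

    The cluster must satisfy both requirements, so the larger count wins and
    the other resource ends up over-provisioned.
    """
    return max(math.ceil(storage_tb / tb_per_node),
               math.ceil(compute_units / compute_per_node))


def nodes_decoupled(compute_units: float, compute_per_node: float) -> int:
    """Nodes needed when storage is managed and billed separately:
    only the compute requirement drives the node count."""
    return math.ceil(compute_units / compute_per_node)


# Hypothetical workload: a lot of data, modest compute needs.
storage_tb, compute_units = 100, 8
tb_per_node, compute_per_node = 2, 1  # hypothetical per-node capacities

coupled = nodes_coupled(storage_tb, compute_units, tb_per_node, compute_per_node)
decoupled = nodes_decoupled(compute_units, compute_per_node)
print(coupled, decoupled)  # 50 8
```

With these made-up numbers, the bundled model needs 50 nodes just to hold the data even though 8 would cover the compute; in the decoupled model the 100 TB is paid for separately as managed storage.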
AQUA (Advanced Query Accelerator)
Cloud data warehouses typically move data from storage to compute nodes in order to process queries. As data volumes continue to grow at a rapid clip, this data movement saturates available network bandwidth and slows down performance.
AQUA (Advanced Query Accelerator) for Amazon Redshift (available mid-2020) is a new distributed and hardware-accelerated cache for Amazon Redshift that provides the next phase of performance improvement and innovation for analytics at the new scale of data.
AQUA brings compute to the storage layer, so data doesn’t have to move back and forth between the two; AWS says this enables Redshift to run up to 10x faster than any other cloud data warehouse. AQUA is a large, high-speed cache architecture on top of Amazon S3 that scales out and processes data in parallel across many nodes.
This new architecture makes queries run so much faster than in today’s cloud data warehouses that customers will be able to query raw data directly, even at scale, giving them more up-to-date dashboards, shorter development times, and easier-to-maintain systems. AQUA-powered Amazon Redshift will remain 100% compatible with the current version of Amazon Redshift, so customers can migrate existing data warehouses with no code changes.
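To see why moving compute to the storage layer helps, consider a toy model that counts how many rows cross the network. This Python sketch is a hypothetical illustration of filter pushdown in general, not a description of how AQUA is implemented; the row schema, partitions, and function names are invented for the example, and a thread pool stands in for parallel storage nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Row = Tuple[str, int]  # hypothetical (region, amount) rows
Predicate = Callable[[Row], bool]


def ship_then_filter(partitions: List[List[Row]], pred: Predicate) -> Tuple[List[Row], int]:
    """Conventional path: every row travels to the compute layer before filtering.

    Returns (matching rows, number of rows moved)."""
    moved = [row for part in partitions for row in part]
    return [row for row in moved if pred(row)], len(moved)


def filter_at_storage(partitions: List[List[Row]], pred: Predicate) -> Tuple[List[Row], int]:
    """Pushdown path: each storage partition applies the filter locally, in
    parallel, so only matching rows are sent to the compute layer."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves partition order, so results match the other path.
        filtered = list(pool.map(lambda part: [r for r in part if pred(r)], partitions))
    moved = [row for part in filtered for row in part]
    return moved, len(moved)


partitions = [
    [("us", 10), ("eu", 20), ("us", 5)],
    [("apac", 7), ("eu", 3)],
    [("us", 1), ("apac", 9), ("eu", 4)],
]


def is_us(row: Row) -> bool:
    return row[0] == "us"


print(ship_then_filter(partitions, is_us))   # ([('us', 10), ('us', 5), ('us', 1)], 8)
print(filter_at_storage(partitions, is_us))  # ([('us', 10), ('us', 5), ('us', 1)], 3)
```

Both paths return the same answer, but the pushdown path moves 3 rows instead of 8; at warehouse scale, that reduction is what relieves the network bottleneck described above.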
Amazon Redshift Data Lake Export
Customers require data to be combined across their data warehouse and data lake, and don’t want data locked in silos and proprietary formats. Amazon Redshift enables customers to directly query and join data across both their Amazon Redshift data warehouse and Amazon S3 data lake, giving customers a ‘lake house’ approach to data warehousing. In this lake house world, where data is stored both in Amazon Redshift and Amazon S3, customers also need an easy way to get the results from Amazon Redshift queries back into Amazon S3 in an open format that can be used by other services.
Amazon Redshift Data Lake Export allows customers to export data directly from Amazon Redshift to Amazon S3 in an open data format (Apache Parquet) that is optimized for analytics. Customers can now save the results of a query they have run in Amazon Redshift into their data lakes in open formats so that they can analyze that data with other analytics services like Amazon SageMaker, Amazon Athena, and Amazon EMR. According to AWS, no other cloud data warehouse makes it as easy to both query data and write data back to a data lake in open formats.
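In Redshift SQL, Data Lake Export is exposed through the `UNLOAD` command with a Parquet format option. The Python helper below is a minimal sketch that only assembles the statement text; the function name, table, bucket, prefix, and IAM role ARN are hypothetical placeholders, and the statement would still need to be run against a cluster whose role can write to the target bucket.

```python
def build_unload_parquet(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build an UNLOAD statement that writes a query's results to S3 as Parquet.

    UNLOAD takes the query as a single-quoted string literal, so any single
    quotes inside the query are escaped by doubling them.
    """
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical table, bucket, and role names.
print(build_unload_parquet(
    "SELECT region, SUM(amount) AS total FROM sales WHERE sale_year = '2019' GROUP BY region",
    "s3://example-data-lake/exports/sales_by_region_",
    "arn:aws:iam::123456789012:role/ExampleRedshiftUnloadRole",
))
```

The files written under the prefix are ordinary Parquet objects, which is what allows other services to read them without going through Redshift.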
Amazon Redshift Federated Query
Aggregating, transforming, and uploading large amounts of data from a relational database to a data warehouse can be resource-intensive and time-consuming, which is why many customers choose to do so only once a day. This can create problems when customers need to query their data warehouse for certain types of timely information that is initially stored in an operational database.
Amazon Redshift Federated Query (available in preview) gives customers the ability to run queries in Amazon Redshift on live data across their Amazon Redshift data warehouse, their Amazon S3 data lake, and their Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL operational databases. This simplifies application development by allowing customers to use familiar SQL statements to combine all of this data across their various data stores. With this capability, Amazon Redshift queries can now provide timely and up-to-date data from operational databases to drive better insights and decisions.
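Federated Query works by mapping a PostgreSQL schema into Redshift as an external schema, after which its tables can be joined with local tables in ordinary SQL. The sketch below assembles a `CREATE EXTERNAL SCHEMA ... FROM POSTGRES` statement and an example join in Python. It assumes the database credentials are stored in AWS Secrets Manager, and every schema, table, endpoint, and ARN shown is hypothetical; since the feature was in preview at announcement, clause details should be checked against current Redshift documentation.

```python
def create_postgres_external_schema(
    local_schema: str,
    database: str,
    remote_schema: str,
    uri: str,
    iam_role_arn: str,
    secret_arn: str,
    port: int = 5432,
) -> str:
    """Build DDL that exposes a PostgreSQL schema inside Redshift."""
    return (
        f"CREATE EXTERNAL SCHEMA {local_schema}\n"
        "FROM POSTGRES\n"
        f"DATABASE '{database}' SCHEMA '{remote_schema}'\n"
        f"URI '{uri}' PORT {port}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


# Hypothetical names throughout: 'ops' maps the operational database's
# 'public' schema; 'sales_history' is a local Redshift table.
ddl = create_postgres_external_schema(
    local_schema="ops",
    database="appdb",
    remote_schema="public",
    uri="example-cluster.cluster-example.us-east-1.rds.amazonaws.com",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleFederatedQueryRole",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example-appdb-creds",
)

# Join live operational rows with historical warehouse data in one query.
join_sql = """
SELECT o.order_id, o.status, h.lifetime_value
FROM ops.orders AS o
JOIN sales_history AS h ON h.customer_id = o.customer_id
WHERE o.status = 'open';
"""

print(ddl)
print(join_sql)
```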
UltraWarm for Amazon Elasticsearch Service
As more and more applications are built using microservices, containers, and purpose-built data stores, they produce an ever-increasing amount of log data. Amazon Elasticsearch Service makes it simple to collect, analyze, and visualize machine-generated log data from websites, mobile devices, and sensors.
AWS built a new storage tier for Amazon Elasticsearch Service called UltraWarm, which finally gives Elasticsearch customers a warm storage tier that both stores large amounts of data cost-effectively and provides the type of snappy, interactive experience that Elasticsearch customers expect. UltraWarm offers a distributed cache for more frequently accessed data, while using advanced placement techniques to determine which blocks of data are less frequently accessed and should be moved outside of the cache to Amazon S3.
UltraWarm also uses high-performance EC2 instances to interact with data stored in S3, which AWS says provides 50% faster query execution than competing warm-tier solutions and gives customers the same interactive analytics experience with all their log data. UltraWarm is a seamless extension of Amazon Elasticsearch Service. Customers can easily query and visualize both their recent and longer-term operational data, all from their Kibana interface, at a fraction of today’s cost.
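The idea of keeping frequently accessed blocks in a cache while colder blocks live in Amazon S3 can be illustrated with a toy two-tier store. This Python sketch uses a simple least-frequently-used eviction rule as a stand-in; AWS has not described UltraWarm’s placement techniques at this level of detail, so the class, its policy, and the in-memory dictionary standing in for S3 are all assumptions.

```python
from typing import Dict


class TieredBlockStore:
    """Toy two-tier store: a small cache in front of an object store.

    Eviction is least-frequently-used, with ties broken by least recent access.
    This is an illustrative stand-in, not UltraWarm's actual placement logic.
    """

    def __init__(self, cache_capacity: int, object_store: Dict[str, bytes]):
        self.capacity = cache_capacity
        self.object_store = object_store     # stands in for Amazon S3
        self.cache: Dict[str, bytes] = {}
        self.hits: Dict[str, int] = {}       # access count per block
        self.last_seen: Dict[str, int] = {}  # logical clock of last access
        self.clock = 0

    def read(self, block_id: str) -> bytes:
        self.clock += 1
        self.hits[block_id] = self.hits.get(block_id, 0) + 1
        self.last_seen[block_id] = self.clock
        if block_id not in self.cache:
            if len(self.cache) >= self.capacity:
                self._evict()
            # Cache miss: fetch the block from the object store.
            self.cache[block_id] = self.object_store[block_id]
        return self.cache[block_id]

    def _evict(self) -> None:
        # Drop the cached block with the fewest accesses (oldest access on ties).
        # The data remains durable in the object store.
        victim = min(self.cache, key=lambda b: (self.hits[b], self.last_seen[b]))
        del self.cache[victim]


s3 = {f"block-{i}": f"data-{i}".encode() for i in range(5)}
store = TieredBlockStore(cache_capacity=2, object_store=s3)
for block in ["block-0", "block-0", "block-1", "block-2"]:
    store.read(block)
print(sorted(store.cache))  # ['block-0', 'block-2']
```

Block 0 was read twice, so it stays cached when space runs out, while block 1, read only once, is left in the object store.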
Raju Gulabani, Vice President, Database Services, AWS, said: “Customers want to perform fast analytics on all of their raw data across their data warehouse and data lake, and cost-effectively deal with the explosion in log data to retain information that might help them run their businesses better. With these announcements we are helping AWS customers do all of this and fearlessly embrace data at scale.”
(Image Courtesy: www.tecklearn.com)