One of the main topics at the Build 2018 conference was AI and machine learning – which rely on data. As well as the updates to the cognitive APIs and tools for building custom data models, Microsoft talked about new and upcoming features for its three key database platforms: Cosmos DB, the Azure managed database platform (which now runs MySQL, Postgres and MariaDB as well as Azure SQL Database), and SQL Server (where the focus was on new tools for database developers working on Mac and Linux).
Cosmos DB is a petabyte-scale cloud database that supports multiple data models and multiple APIs for querying those database models, including MongoDB, Gremlin and Cassandra as well as JSON, SQL and Azure Tables, and multiple consistency models like bounded staleness, session consistency and consistent prefix rather than just the familiar strong and eventual consistency. Now it also supports multi-master replication between databases in different regions, thanks to an adaptive protocol that reduces conflicts to a minimum but still ensures the most recent write always wins, even when multiple sources are writing to the same record.
Microsoft demonstrated the possibilities of multi-master with PxDraw, a live drawing application with a million pixels, real-time shared canvas running in two Azure regions, US West and Asia, letting users collaborate on pixel by pixel drawings in close to real time.
Cosmos DB has an automatic conflict resolution policy that can handle the write conflicts that happen with multiple database masters with active workloads (like dozens of users drawing on a canvas at the same time) using options like Last Write Wins (synchronising clocks between database masters to make sure they all agree on timestamps, even if data updates propagate asynchronously to keep costs down), or you can create your own policy in your database application using stored procedures. Multi-master gives you more options for failover and load balancing because if a region goes offline, it can be integrated back into the database when it comes back online. That means Cosmos DB can also support intermittently available regions – which could be IoT devices or Azure Stack in your own data centres (once Microsoft publishes the protocol they’ll need to use to connect).
If you’re importing very large datasets of Cosmos DB (or updating existing datasets with terabytes of data), the new Cosmos DB bulk executor library offers up to ten times faster write throughput – over a terabyte of data in less than an hour; you can download it for .NET via Nuget and Java via Maven. You can also manage access to Cosmos DB endpoints from virtual subnets and change IP addresses manually with the new virtual service endpoint – something that enterprise customers have been asking Microsoft for.
Azure database platform
The Azure database platform options continue to get more powerful, adding features like Azure AD multi-factor authentication for SQL Database and BYOK integration with Azure Key Vault. The Azure Database Migration Service has now moved from preview to general availability. Preview customers have already used it to migrate 55,000 SQL Server databases to Azure and it will soon also support migrating MySQL, PostgresSQL and Oracle databases to Azure (DB2 and Sybase support is also coming).
The second generation Compute Optimised tier of Azure SQL Data Warehouse is also now GA. It runs on the latest generation of Azure server hardware with NVMe SSDs that automatically cache hot data from remote storage so you can provision up to five times as much compute and unlimited storage to get significantly better query performance, serving up to 128 concurrent queries from a single cluster, without losing the elastic architecture that lets you scale up and down ad necessary. If you’ve been using the preview, you’ll want to check what it will cost you in production as preview pricing ends soon (1 June 2018).
There are also new Azure services designed to reduce the amount of time database engineers spend ‘wrangling’ data. If you’re working with unstructured or semi-structured data, Azure Databricks – which lets you launch and scale Spark clusters on demand to create an interactive workspace with built-in data adapters where you can build a data management pipeline for analytics – is now generally available. For more structured data, Azure Data Factory will soon reach GA with more than 60 data connectors for building reliable data pipelines using a drag and drop user interface, with options like executing your SQL Server Integration Services packages in Azure.
Microsoft has added multiple new tools and services for its database platforms over the last year. (Source: Microsoft)
SQL Server is also getting more connectors for working with remote data stores, using PolyBase. Currently PolyBase lets you run queries over Hadoop clusters and push computation to those clusters, using a query optimiser that automatically decides where to run the job and generates the MapReduce code to do that. The next version of SQL Server, which is currently in private preview, will add many more connectors to external data sources including Azure blob storage, MongoDB Teradata, Oracle and any ODBC source, making them accessible to SQL Server developers without them having to learn new query tools.
Now that SQL Server can run on Linux, Microsoft is building new tools for users working against it on a Mac or Linux system. The SSMS and SSDT tools aren’t going away, but there are new options like mssql-cli – a command line interface with Intellisense autocomplete for things like table names and functions, which also makes SQL output easier to read on the command line by formatting it into columns that you can scroll through. There’s finally a preview of the Linux version of sqlpackage (which had been unavailable for months while it was rebuilt on .NET Core), so you can use this command line utility to import, export, extract, report on and update database schemas on both Windows and Linux SQL Server databases, which also makes it easier to work with SQL Server deployed in containers.
But the main tool for Linux SQL Server developers is SQL Operations Studio, built as an open source project on a fork of Visual Studio Code. The May update adds a dark theme and more options for extensions, including insight widgets that add visualisations and user interface extensions. Extensions from Microsoft and partners like RedGate are available on GitHub, adding features like search and SSMS key mappings, as well as an activity dashboard called whoisactive, and when there are enough extensions Microsoft will create a marketplace to host them. SQL agent manageability is being done as an extension, and extensions could also add connection types to abstractions like availability groups (using the same language services protocol model that Visual Studio Code uses to support different programming languages in extensions). SQL Operations Studio will get a profiler tool in June, with general availability for the tool following some time after.
The Grey Matter licensing specialists can help you with any database questions you may have: +44 (0)1364 654100 or firstname.lastname@example.org