Databricks Unity Catalog (UC) has gained significant attention lately, with Databricks investing heavily in it and making it the default choice for all new Databricks accounts.
Databricks SQL is a powerful tool for querying and analysing data in the Databricks Lakehouse.
Recently, I had the opportunity to explore the Databricks SQL extension for VSCode, and I was thoroughly impressed.
When writing to a JSON destination using the DataFrameWriter, the dataset is split into multiple files, one per RDD partition of the in-memory DataFrame – this is the most efficient way for Spark to write data out.
The execution plans in Databricks allow you to understand how your code will actually be executed across a cluster, which is useful for optimising queries.
As data sizes and demand increase over time, you will often see slowness in Databricks; this can be due to a number of factors, from security and network transfers to read/write requests and memory pressure.