Real-time mode is a breakthrough that lets Spark use all available CPUs to process records with single-digit-millisecond latency, while decoupling checkpointing from per-record processing.
Databricks goes native on Excel. You can now ingest and query .xls/.xlsx files directly in Databricks (SQL and PySpark, batch and streaming), with automatic schema/type inference, sheet and cell-range targeting, and evaluated formulas, with no extra libraries required.
Tags, whether assigned manually or automatically by the "data classification" service, can be protected using policies. Column masking can automatically mask columns carrying a given tag for all users except those with elevated access.
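As a hedged sketch of what a column mask looks like in Databricks SQL (the catalog, schema, function, table, and group names below are hypothetical), the mask is just a SQL function that returns the real value only for members of an elevated-access group, then gets attached to the column:

```python
# Hypothetical sketch of Unity Catalog column masking.
# The mask function returns the real value only for members of the
# 'pii_readers' group; everyone else sees a redacted string.
create_mask_fn = """
CREATE OR REPLACE FUNCTION main.security.mask_ssn(ssn STRING)
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN ssn
  ELSE '***-**-****'
END
"""

# Attach the mask to a column (ALTER COLUMN ... SET MASK).
apply_mask = """
ALTER TABLE main.hr.employees
ALTER COLUMN ssn SET MASK main.security.mask_ssn
"""

# In a notebook you would run:
# spark.sql(create_mask_fn)
# spark.sql(apply_mask)
```

With tag-based policies, the same mask can be applied automatically to every column carrying the tag instead of column by column.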
Imagine that all a data engineer or analyst needs to do to read from a REST API is call spark.read(): no direct request calls, no manual JSON parsing, just spark.read. That's the power of a custom Spark Data Source. Soon, we will see a surge of open-source...
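In PySpark 4.0 this lives behind the Python Data Source API (subclassing `pyspark.sql.datasource.DataSource`). At its core, the reader's job is just turning the API's JSON response into rows; a minimal, pure-Python sketch of that piece (the field names and payload are made up):

```python
import json

def rows_from_api_payload(payload: str, fields):
    """Turn a REST API's JSON response into row tuples, which is the kind
    of work a custom data source reader hides so users can just call
    spark.read. `fields` selects the columns, in order."""
    records = json.loads(payload)
    return [tuple(rec.get(f) for f in fields) for rec in records]

# Example payload; in a real reader this would come from the API call.
payload = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
rows = rows_from_api_payload(payload, ["id", "name"])
# rows -> [(1, "a"), (2, "b")]
```

A registered data source wraps this logic in a `read(partition)` generator, after which users simply write `spark.read.format("my_api").load()`.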
If you are executing that code from Databricks you don't need to go through the SQL Statement Execution API; that's unnecessary overhead. Just use spark.sql(f"CALL myCatalog.mySchema.insert_data_procedure('{col1_value}', '{col2_value}')")
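Since the values are interpolated into the statement, it's worth escaping quotes before building the CALL. A minimal helper (the catalog, schema, and procedure names are the hypothetical ones from above):

```python
def build_call_sql(catalog: str, schema: str, proc: str, *args: str) -> str:
    """Build a CALL statement, doubling single quotes in each argument so
    a stray quote can't break (or inject into) the statement."""
    quoted = ", ".join("'" + a.replace("'", "''") + "'" for a in args)
    return f"CALL {catalog}.{schema}.{proc}({quoted})"

stmt = build_call_sql("myCatalog", "mySchema", "insert_data_procedure",
                      "val1", "o'brien")
# stmt -> "CALL myCatalog.mySchema.insert_data_procedure('val1', 'o''brien')"
# In a notebook: spark.sql(stmt)
```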
Don't use
abfss://metastore@[name of storage resource].privatelink.dfs.core.windows.net
Just use the standard URL:
abfss://metastore@<storageaccount>.dfs.core.windows.net
DNS will resolve it to a private link.
Under ANSI rules, INT + STRING resolves to BIGINT (Long); that's why it crashes when the string isn't a valid number: https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html. There are cases where it does work, such as the typed literals 1Y or 1L.
Regarding 4.0.1, can you double-check ansi.enabled ...