Hi! We have a job that runs every hour. It extracts data from the API and saves it to a Databricks table. Sometimes the job fails with the error "org.apache.spark.SparkException". Here is the full error:

An error occurred while calling o7353.saveAsTable.
: org.ap...
Hello, we are receiving DB CDC binlogs through Kafka and synchronizing tables in an OLAP system using the apply_changes function in Delta Live Tables (DLT). A month ago, a column was added to our table, but due to a type mismatch, it's being stored incorr...
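A minimal sketch of one way to handle such a mismatch, assuming it can be fixed by casting in a staging view before apply_changes; the dataset, column, key, and sequence names below are hypothetical placeholders:

import dlt
from pyspark.sql import functions as F

# Cast the newly added column to the intended type before apply_changes,
# so the target table stores it correctly ("cdc_source", "new_col", "id",
# and "event_ts" are placeholders).
@dlt.view()
def cdc_source_casted():
    return dlt.read_stream("cdc_source").withColumn(
        "new_col", F.col("new_col").cast("bigint")
    )

dlt.create_streaming_table("target_table")

dlt.apply_changes(
    target="target_table",
    source="cdc_source_casted",
    keys=["id"],
    sequence_by=F.col("event_ts"),
)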
Hey, as previously stated, you could drop the duplicates in the columns that contain them (code you can find online pretty easily). I have had this problem myself, and it came up when creating a temporary view from a dataframe; the dataframe ...
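For reference, a minimal sketch of that deduplication step before creating the temporary view; the table and column names are placeholders:

df = spark.read.table("some_catalog.some_schema.some_table")

# Keep one row per combination of the columns carrying the duplicates.
deduped = df.dropDuplicates(["key_col", "other_col"])
deduped.createOrReplaceTempView("deduped_view")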
I have tried all the answers from the internet and Stack Overflow many times. I have already created the config section before these steps, and it passed, but the step below is not executing.
We were getting this problem when using directory-scoped SAS tokens. While I know there are a number of potential issues that can cause this problem, one potential explanation is that it turns out there is an undocumented Spark setting needed on the ...
Do the external tables we create or manage through Unity Catalog support ACID properties and time travel? And regarding performance, which is faster to query, and why?
Hi @Hilium, external tables in Unity Catalog reference an external storage path. They are used when you require direct access to the data from outside Azure Databricks clusters or Databricks SQL warehouses. However, the ACID properties and time-tra...
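If the external table is stored in Delta format, time travel reads work the same way as for managed Delta tables; a small sketch with a hypothetical table name:

# Read the table as of a specific version or timestamp (Delta format only).
df_v5 = spark.read.option("versionAsOf", 5).table("main.sales.orders_ext")
df_old = spark.read.option("timestampAsOf", "2024-01-01").table("main.sales.orders_ext")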
When writing a dataframe to a CSV file in PySpark, a folder is created and a partitioned CSV file is created inside it. I then have to rename this file in order to distribute it to my end user. Is there any way I can simply write my data to a CSV file, with the name ...
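A common workaround, sketched below under the assumption that you are on Databricks (dbutils available) and the output is small enough for a single partition; the paths are placeholders:

tmp_dir = "dbfs:/tmp/my_export"
final_path = "dbfs:/exports/report.csv"

# coalesce(1) forces a single part file inside the output folder.
df.coalesce(1).write.mode("overwrite").option("header", True).csv(tmp_dir)

# Move the lone part file to the desired name and clean up the folder.
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, final_path)
dbutils.fs.rm(tmp_dir, recurse=True)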
Hello community! Recently I have been working with Delta Live Tables on a big project. My team and I have been studying a lot, and finally we have built a good pipeline with CDC that loads 608 entities (and, therefore, 608 delta live tables and 608 mat...
Interesting... DLT probably spends x seconds per table on setup. If you have time, you could do some tests to see whether the table setup scales linearly (1 table, 5 sec for setup; 10 tables, 50 sec; etc.). If you do, please share the outcome.
Hi, I recently observed that, after creating a new catalog (without a managed location) in Unity Catalog, a column named 'url' is included in the definition of the information_schema.schemata view. However, there is no url column in the underlying tab...
Hi @gardener, based on the Databricks documentation, the information_schema.schemata view should contain the following columns:

- catalog_name: Catalog containing the schema.
- schema_name: Name of the schema.
- schema_owner: User or group (principal) that c...
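One quick way to check which columns the view actually exposes in your workspace (the catalog name is a placeholder):

spark.sql("DESCRIBE some_catalog.information_schema.schemata").show(truncate=False)
spark.table("some_catalog.information_schema.schemata").printSchema()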
Hi, I'm using the COPY INTO command to insert new data (in the form of CSVs) into an already existing table. The SQL query takes care of converting the fields to the target table schema (well, there isn't another way to do that), and schema update is n...
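For reference, a minimal sketch of COPY INTO with explicit casts in the SELECT; the table, path, and column names are placeholders:

spark.sql("""
  COPY INTO main.staging.events
  FROM (
    SELECT
      CAST(id AS BIGINT) AS id,
      CAST(amount AS DECIMAL(18, 2)) AS amount,
      to_timestamp(event_time) AS event_time
    FROM 'abfss://landing@storageacct.dfs.core.windows.net/events/'
  )
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true')
""")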
I actually found an option that could solve the newline issue I mentioned in my previous post: setting spark.sql.csv.parser.columnPruning.enabled to false with

spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", False)

will consider malformed r...
Hello, I created a compute on which I reference the secret inside the Spark config like this:

spark.hadoop.fs.azure.account.key.xxxxxxxxxx.dfs.core.windows.net {{secrets/kv-xxxxxxx-xxxx/secret-name}}

This, however, gives me the following warning. I've l...
Extra info: I used the format for the Spark configuration following the instructions on this page: https://learn.microsoft.com/en-us/azure/databricks/connect/storage/azure-storage#:~:text=Use%20the%20following%20format%20to%20set%20the%20cluster%20Spa...
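As an alternative to the cluster-level Spark config, a session-scoped sketch using dbutils.secrets directly in a notebook; the scope, key, and storage account names below are placeholders taken from the post:

storage_key = dbutils.secrets.get(scope="kv-xxxxxxx-xxxx", key="secret-name")
spark.conf.set(
    "fs.azure.account.key.xxxxxxxxxx.dfs.core.windows.net",
    storage_key,
)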
I have the following code:

org = spark.read.table("catalog.dbo.organisation")

@dlt.create_table()
def organization():
    return org

The catalog is an external Azure SQL database (using an external connector). When I validate this in the Delta Live Tables workflow I...
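One thing worth trying, sketched under the assumption that the failure comes from the read being evaluated at import time rather than when the pipeline runs: move the read inside the decorated function (@dlt.table is the current decorator name):

import dlt

@dlt.table(name="organization")
def organization():
    # Evaluated when the pipeline materializes the table, not at import time.
    return spark.read.table("catalog.dbo.organisation")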
I've noticed that the current development cycle for DLT jobs is quite time-consuming. The process of coding, saving, running in a workflow, and debugging seems arduous, and the feedback loop is slow. Is there a way to run DLT jobs without relying on ...
Hi @leelee3000, developing and iterating on Delta Live Tables (DLT) jobs can be time-consuming when relying solely on traditional workflows.
Databricks Jobs:
Databricks jobs allow you to orchestrate multiple tasks within a Databricks job, creating ...
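One common way to shorten the feedback loop, independent of the options above: keep the transformation logic in plain functions that run in an ordinary notebook, and keep the DLT decorator as a thin wrapper. The names below are illustrative:

from pyspark.sql import DataFrame, functions as F

def clean_orders(df: DataFrame) -> DataFrame:
    return df.filter(F.col("amount") > 0)

# In a regular notebook, test interactively:
#   display(clean_orders(spark.read.table("raw.orders")))

# In the DLT pipeline, wrap the same function:
import dlt

@dlt.table()
def orders_clean():
    return clean_orders(spark.read.table("raw.orders"))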
I'm experimenting with liquid clustering and have some questions about compatible types (somewhat similar to "Liquid clustering with boolean columns"). The table was created as:

CREATE TABLE IF NOT EXISTS <TABLE>
(
    _time DOUBLE
    , timestamp TIMESTAMP_NT...
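For context, a minimal sketch of a liquid-clustered table definition; the table and column names are placeholders:

spark.sql("""
  CREATE TABLE IF NOT EXISTS main.demo.events (
    _time DOUBLE,
    event_ts TIMESTAMP,
    device_id STRING
  )
  CLUSTER BY (event_ts, device_id)
""")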
Hi, just an educated guess: there is a limitation in the liquid clustering docs: "You can only specify columns with statistics collected for clustering keys." Perhaps it is related to the data types for which you can collect statistics? But I could not find related docs...
Hello, I was wondering if there is any timeline for Java 21 support in the Databricks JDBC driver (the current version is 2.34). One of the required changes is to update the Arrow dependency to version 13.0 (the current version is 9.0.0). The current worka...
Hello @Kaniz, any update on this topic of Java 21? Any timeline? Our clients really want to upgrade to Java 21, and we don't want to disable Arrow for performance reasons.