Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hello! I might have experienced a bug with the ODBC driver. We have an issue where, given certain privileges in Databricks, the ODBC driver is unable to show any schemas/tables. When we click the 'expand' button on any catalog in the list (of which we ...
Following this post - we are also facing the same issue. @KTheJoker - when I'm connecting and trying to expand a catalog, I do see the query fire off in the SQL Warehouse query history, but in Excel nothing is returned. I can see the schemas/tables...
Hi everyone, I’m new to Databricks and exploring its features. I’m trying to implement Change Data Capture (CDC) from the bronze layer to the silver layer using streaming. Could anyone share sample code or reference materials for implementing CDC wit...
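A minimal sketch of one common approach, using the DLT apply_changes API (the table, key, and sequencing column names below are illustrative assumptions, not from the thread):

import dlt
from pyspark.sql.functions import col

# Declare the silver streaming table that apply_changes will maintain.
dlt.create_streaming_table("silver_orders")

# Apply CDC from the bronze layer: rows are matched on the key and
# ordered by the sequencing column to resolve out-of-order events.
dlt.apply_changes(
    target="silver_orders",
    source="bronze_orders",
    keys=["order_id"],
    sequence_by=col("event_timestamp"),
    stored_as_scd_type=1,  # SCD type 1: update rows in place
)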
All my questions are around this code block:

@dlt.append_flow(target="target_table")
def flow_01():
    return spark.readStream.table("table_01")

@dlt.append_flow(target="target_table")
def flow_02():
    return spark.readStream.table("table_02")

The first qu...
Hello @guangyi, I am getting back to you with some insights.
Regarding your first question about checkpointing:
You can manually check the checkpoint location of your streaming table. The checkpoints of your Delta Live Tables are under Storage locatio...
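A hedged sketch of how to inspect those checkpoints from a notebook, assuming the pipeline's storage location from its settings (the placeholder path and folder layout here are assumptions, not confirmed by the thread):

# Storage location from the DLT pipeline settings (illustrative placeholder).
storage_location = "dbfs:/pipelines/<pipeline-id>"

# List the per-table checkpoint folders kept under the storage location.
for entry in dbutils.fs.ls(f"{storage_location}/checkpoints"):
    print(entry.path)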
I'm using the Databricks CLI to deploy an asset bundle for a job. I'm attempting to set up the configuration such that the "dev" target does not have a trigger on it, and the "prod" target does. Essentially the dev job is not scheduled to run and th...
In your dev target, you can add mode to pause all triggers:

targets:
  dev:
    mode: development

DABs also have a newer update: you can use presets to handle per-target settings:

targets:
  dev:
    presets:
      name_prefix: "testing_" # pre...
I've been trying to log in for the past two days and I'm still facing this error: "We've encountered an error logging you in." I've tried to reset the password multiple times and nothing happened. My friend is also not able to log in. I request you to resolve t...
Hi all, I have a very quick question that I hope someone can help with. I want to execute a very simple SQL statement like:

%sql
select * from json.`/Volumes/adfmeta/Objects.json`
where ObjectName like '%SGm$RITWebsader$911a%'

However, the SQL does not ...
Hi @JensV, dollar signs are used as parameter placeholders, so could you please try to escape them using backslashes?

select * from json.`/Volumes/adfmeta/Objects.json` where ObjectName like '%SGm\\$RITWebsader\\$911a%'

Alternatively, w...
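One hedged alternative sketch: pass the pattern as a named parameter from Python, so the dollar signs are never touched by notebook substitution (named parameter markers require a recent runtime; the variable name is illustrative):

# Bind the LIKE pattern as a named parameter instead of inlining it.
pattern = "%SGm$RITWebsader$911a%"
df = spark.sql(
    "select * from json.`/Volumes/adfmeta/Objects.json` where ObjectName like :pattern",
    args={"pattern": pattern},
)
display(df)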
I am trying to execute the same query on 3 different platforms - DBeaver, a Python notebook, and a SQL workflow. I was expecting that after the first execution of the query, irrespective of the platform, subsequent executions of the same query should NOT re-compute. However ...
I don't think it's possible unless the results are written into a table and that table is used in the queries across the clients. Please refer to https://docs.databricks.com/en/sql/user/queries/query-caching.html
I am trying to convert a JSON string stored in a variable into a Spark DataFrame without specifying a schema, because I have a big number of different tables, so it has to be dynamic. I managed to do it with sc.parallelize, but since we are moving to Uni...
Hi filipjankovic,
The SparkContext (sc) is a Spark 1.0-era API and is deprecated on Standard and Serverless compute. However, your input data is a list of dictionaries, which is supported by spark.createDataFrame.
This should give you identical output witho...
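A minimal sketch of that approach, assuming the variable holds a JSON array of objects (the sample string below is illustrative):

import json

# Illustrative JSON string; in practice this comes from your variable.
json_str = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'

# Parse into a list of dicts, then let Spark infer the schema dynamically.
records = json.loads(json_str)
df = spark.createDataFrame(records)
df.show()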
Not sure it exists, but maybe there is some trick to get these directly from Python code: the notebook name and the cell title. I'm just working on a logger script shared between notebooks, and it could make my life a bit easier.
I got the solution to work in terms of printing the notebook that I was running; however, what if you have notebook A that calls a function that prints the notebook name, and you run notebook B that %runs notebook A? I get notebook B's name when...
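For reference, a hedged sketch of the commonly used context call for the notebook path (this is an internal API that may change, not an officially documented interface):

# Fetch the current notebook's path from the notebook context.
path = (
    dbutils.notebook.entry_point.getDbutils()
    .notebook()
    .getContext()
    .notebookPath()
    .get()
)
print(path)  # under %run, this resolves to the top-level (calling) notebook's path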
Hello, I am having trouble using Managed Identity authentication in Unity Catalog using pyodbc in Azure Databricks. The same code works on a "Legacy Shared Compute". The code snippet is below:

import pyodbc
jdbc_url = ( "DRIVER={ODBC 17 DRIVER PATH...
Thank you very much! I have spent an enormous amount of hours fighting with this, and in the end it was the type of cluster... I hope that this problem will be solved in the future, because it affects development when you use databricks-connect and s...
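For context, a hedged sketch of the Managed Identity connection-string pattern in question (server and database names are illustrative assumptions, not from the thread):

import pyodbc

# ODBC Driver 17 for SQL Server supports MSI auth via this keyword.
conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;"
    "Authentication=ActiveDirectoryMsi;"  # authenticate with the compute's managed identity
)
conn = pyodbc.connect(conn_str)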
Hello, I have encountered an issue recently and have not been able to find a solution yet. I have a job on Databricks that creates a table using dbt (dbt-databricks>=1.0.0,<2.0.0). I am setting the location_root configuration so that this table is externa...
To recreate the issue:
PS. Good to know: using dbt to create materialized tables is equivalent to running "create or replace table table_name".
The following code will create an external table with row security:

create or replace table table_name using d...
I have a DLT pipeline that has bronze -> silver -> gold -> platinum. I need to include a table that is joined to the gold layer for platinum that allows upserts in the DLT pipeline. This table is managed externally via Databricks API. Anytime a chang...
You obtain this error: "Detected a data update in the source table at version 1. This is currently not supported..." because DLT is based on Structured Streaming, and for Structured Streaming any changes (deletes, updates) in the source table are n...
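One hedged workaround sketch (an assumption here, since the reply is truncated): read the source with skipChangeCommits so update/delete commits are skipped rather than failing the stream - note those changes are then not propagated downstream:

# Ignore commits that update or delete rows in the streaming source table.
df = (
    spark.readStream
    .option("skipChangeCommits", "true")
    .table("gold.external_managed_table")  # illustrative table name
)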
Hi, I'm doing something simple on a Databricks notebook:

spark.sparkContext.setCheckpointDir("/tmp/")
import pyspark.pandas as ps

sql = """select
    field1, field2
from table
where date >= '2021-01-01'"""
df = ps.sql(sql)
df.spark.checkpoint()

That...
I can connect to an on-prem Oracle DB using my single-user compute, but when I switch over to a shared compute, I get an invalid username/password error. I can connect to my on-prem SingleStore DB using either compute, so I'm not sure why Oracle would be diffe...
Based on internal research, I found that shared access mode does not currently support the Oracle JDBC connector; it works only in Single User/Assigned access mode.
There is a feature request to include the Oracle connector as part of Lakehouse Federation. Once it is in...
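For reference, a hedged sketch of the plain JDBC read that works on single-user compute (host, service, and credentials are illustrative assumptions):

# Standard Spark JDBC read against an on-prem Oracle database.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//onprem-host:1521/ORCLPDB1")
    .option("dbtable", "myschema.mytable")
    .option("user", "my_user")
    .option("password", "my_password")
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .load()
)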
I read a huge array with several columns into memory, then convert it into a Spark DataFrame. When I want to write it to a Delta table using the following command, it takes forever (I have a driver with large memory and 32 workers): df_exp.write.m...
The answers here are not correct. TL;DR: _after_ the Spark DF is materialized, saveAsTable takes ages - 35 seconds for 1 million rows. saveAsTable() is SLOW - terribly so. Why? It would be nice to get an answer. The workaround is to avoid Spark for Delta - no...