We want to use Databricks Asset Bundles and deploy code changes and tests using GitHub Actions. We have seen lots of content online, but nothing concrete on how this is done at scale. So I'm wondering, if we have many changes and therefore man...
Today, while reading a Delta load, my notebook failed and I wanted to report a bug. The withColumns command does not tolerate an empty dictionary and gives the following error in PySpark: flat_tuple = namedtuple("flat_tuple", ["old_col", "new_col", "lo...
Hello @Dhruv-22,
I have tested this internally, and this seems to be a bug with the new serverless environment version 4.
As a solution, you can try switching the version to 3 as shown below and re-running the above code; it should then work.
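Independent of the environment version fix, the call site can also guard against the empty dictionary, since withColumns only needs to run when there is something to add or rename. A minimal sketch (the rename_map name is illustrative; in the original code it would be built from the flat_tuple entries):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(3).withColumnRenamed("id", "old_col")

# Hypothetical mapping of new column names to expressions; it may legitimately
# end up empty when there is nothing to rename.
rename_map = {}  # e.g. {"new_col": F.col("old_col").cast("string")}

# Some serverless environment versions fail on withColumns({}), so only call it
# when the dictionary is non-empty.
if rename_map:
    df = df.withColumns(rename_map)
```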
Hello, we are trying to evaluate a Databricks solution to extract the data from an existing Cloudera schema hosted on a physical server. We are using the Databricks serverless compute provided by the Databricks Express setup, and we assume we will not need t...
I work for a Databricks partner called Cirata. Our Data Migrator offering allows both data and metadata replication from Cloudera to be delivered to the Databricks environment, whether this is just delivering it to ADLS Gen2 object storage or to ...
I'm playing a little bit with the Databricks free environment and I'm super confused by the documentation vs. actual behavior. Maybe you could help me understand it better. For the workspace I can define a base environment which I can use in serverless ...
Hello @pepco,
Is it possible to use environments with notebook tasks?
Yes, but only in a very specific way.
Notebook tasks can use base environments, but you don’t attach them in the job’s YAML. You pick the base env in the notebook’s Environment sid...
Hi all, I need some help with this masking problem. If you create a view that uses a masking function based on a table, the user reading this view has to have read access to the underlying table. So theoretically, they can access unmasked data in the table. I would...
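One direction worth evaluating here (a hedged sketch, not a confirmed fix for your setup) is to attach the masking function to the table itself as a Unity Catalog column mask, so readers query the table directly and get masked values without needing access to an unmasked object; all names below are placeholders:

```python
# Assumes a Databricks notebook where `spark` is available and the user has
# privileges to create functions and alter the table. Names are placeholders.
spark.sql("""
CREATE OR REPLACE FUNCTION main.security.mask_ssn(ssn STRING)
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN ssn
  ELSE '***-**-****'
END
""")

# Attach the mask to the column; anyone outside pii_readers now sees the
# masked value even when reading the table directly.
spark.sql("""
ALTER TABLE main.hr.employees
ALTER COLUMN ssn SET MASK main.security.mask_ssn
""")
```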
Hi all, I need to ingest data from on-prem MS SQL tables into Azure using Databricks. For the ingest, I previously used notebooks and JDBC connectors to read the SQL tables and write to Unity Catalog tables. Now, I want to experiment with Databricks connectors f...
This feature is good to go... I can't think of any disadvantages. Here is a guide:
https://landang.ca/2025/01/31/simple-data-ingestion-from-sql-server-to-databricks-using-lakeflow-connect/
If your company uses Databricks with many people, how do you manage security, organize teams, and control costs — and what tools do you use to make it all work smoothly?
Please take a look here to get some initial ideas.
https://medium.com/databricks-unity-catalog-sme/a-practical-guide-to-catalog-layout-data-sharing-and-distribution-with-databricks-unity-catalog-763e4c7b7351
"error_code": "INVALID_PARAMETER_VALUE", "message": "Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."}
Hi @Debayan, I'd suggest also mentioning this explicitly in the documentation of the workspace client for get_run_output. One has to pay extra attention to the example run_id=run.tasks[0].run_id, otherwise it can easily be missed.
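For anyone landing here later, a minimal sketch of that pattern with the Databricks Python SDK (the run_id value is a placeholder):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# For a multi-task job run, get_run_output() on the parent run_id raises
# INVALID_PARAMETER_VALUE; fetch the output of each individual task run instead.
run = w.jobs.get_run(run_id=1234567890)  # placeholder parent run_id

for task in run.tasks:
    output = w.jobs.get_run_output(run_id=task.run_id)
    print(task.task_key, output.notebook_output)
```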
Hey Community, I'm facing this error. It says "com.databricks.pipelines.common.errors.deployment.DeploymentException: Communication lost with driver. Cluster 1030-205818-yu28ft9s was not reachable for 120 seconds". This issue occurred in producti...
Thanks to the Databricks community for maintaining such a valuable platform. I would like to inquire whether there is a planned timeline for upgrading the GraphFrames library. We've noticed that the latest release on GitHub is v0.9.3, while the Databricks ...
You can try adding the Maven dependency to your cluster manually ... For example, for Spark 3.5.x it will be io.graphframes:graphframes-spark3_2.12:0.10.0, and add a PyPI dependency graphframes-py. Adding the Maven coordinates should download and install al...
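Once the Maven coordinate and the graphframes-py package are installed on the cluster, a quick smoke test such as the following (a minimal sketch, assuming a Databricks notebook where `spark` is already defined) confirms the library is picked up:

```python
from graphframes import GraphFrame

# Tiny toy graph purely to verify the installation.
vertices = spark.createDataFrame(
    [("a", "Alice"), ("b", "Bob"), ("c", "Carol")], ["id", "name"]
)
edges = spark.createDataFrame(
    [("a", "b", "follows"), ("b", "c", "follows")], ["src", "dst", "relationship"]
)

g = GraphFrame(vertices, edges)
g.inDegrees.show()
g.pageRank(resetProbability=0.15, maxIter=5).vertices.show()
```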
We're running a job that's throwing a NullPointerException without any traces of our job's code. Does anybody know the best course of action when it comes to debugging these issues? The job is a Scala job running on DBR 11.3 LTS. In case it's rel...
You could try enabling full stack traces and checking the Spark executor logs for hidden errors. NullPointerExceptions in Scala on DBR often come from lazy evaluation or missing schema fields during I/O. Reviewing your DataFrame transformations a...
Hi there, we have an industry data platform with multiple customers using it. We provide each customer with their own data every night via .csv. Some of our customers use Databricks and import their data from us into it. We would like to offer a more...
You could use external volumes with a Cloudflare R2 bucket as an intermediary - you write the nightly data files to R2 (using the S3-compatible API), and your customers create external volumes in their Databricks workspace pointing to their designated R2...
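A rough sketch of what the consumer side could look like, assuming the Unity Catalog storage credential and external location for the R2 bucket are already in place; every name and path below is a placeholder, and the exact R2 location URL format should be checked against your account setup:

```python
# Assumes a Databricks notebook where `spark` is available.
# Create a volume over the customer's designated R2 prefix (placeholder path).
spark.sql("""
CREATE EXTERNAL VOLUME IF NOT EXISTS main.ingest.partner_dropzone
LOCATION 'r2://<bucket>@<account-id>.r2.cloudflarestorage.com/<customer-prefix>'
""")

# Read the nightly CSV drop from the volume and land it in a managed table.
df = spark.read.option("header", "true").csv(
    "/Volumes/main/ingest/partner_dropzone/2025-01-31/"
)
df.write.mode("append").saveAsTable("main.ingest.partner_data")
```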
In DBR 16.4 LTS, I am trying to add the following Spark config: spark.scheduler.allocation.file: file:/Workspace/init/fairscheduler.xml. But the all-purpose cluster is throwing this Spark error: Driver down cause: com.databricks.backend.daemon.dri...
Hi everyone, I am writing a small function with a Spark read from a CSV and a Spark write into a table. I can execute this function within the notebook. But when I register the same function as a Unity Catalog function and call it from the Playground, i...
Hi @GiriSreerangam, you cannot use a Unity Catalog user-defined function (UDF) in Databricks to perform a Spark read from a CSV and write to a table. Unity Catalog Python UDFs execute in a secure, isolated environment without access to the file system ...
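For contrast, a minimal sketch of what a Unity Catalog Python UDF is suited for: pure computation over its arguments, with no Spark session, file-system, or catalog access inside the body (function and catalog names are placeholders):

```python
# Assumes a Databricks notebook where `spark` is available.
# The $$ block runs in an isolated sandbox: it may transform its inputs,
# but it cannot call spark.read / spark.write or touch files.
spark.sql("""
CREATE OR REPLACE FUNCTION main.default.clean_phone(raw STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
import re
return re.sub(r"[^0-9+]", "", raw) if raw else None
$$
""")

spark.sql("SELECT main.default.clean_phone(' +1 (555) 010-9999 ') AS phone").show()
```

The CSV-to-table logic itself would stay in regular notebook or job code, where the Spark session is available.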