Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

zmsoft
by New Contributor III
  • 738 Views
  • 7 replies
  • 2 kudos

Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' :

Hi there, my cluster version is 15.4 LTS, and the workspace has UC enabled. When I used the initialization script to install ODBC Driver 17 for SQL Server, there were no errors and the cluster started successfully. But when I use ODBC Driver 17 for SQ...

Data Engineering
ODBC Driver 17 for SQL Server
Runtime 15.4 LTS
Latest Reply
APat449
New Contributor II
  • 2 kudos

Also, is the init script route the only route, or is there another option available? The reason I am asking: some time back we had a call with Databricks and they mentioned that using an init script is not the right way or so.. can't recall the exact explana...

6 More Replies
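
For the init-script route discussed in this thread, a minimal sketch of what such a script could look like, assuming an Ubuntu 22.04 based runtime (15.4 LTS) and a hypothetical Unity Catalog volume path; neither is taken from the thread:

init_script = """#!/bin/bash
set -e
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl -fsSL https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev
"""

# Hypothetical Unity Catalog volume path; UC-enabled clusters expect init scripts
# on volumes or workspace files rather than legacy DBFS locations.
dbutils.fs.put("/Volumes/main/default/init_scripts/install_msodbc17.sh", init_script, overwrite=True)

The script is then referenced under the cluster's Advanced options > Init scripts.
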
Andolina
by New Contributor II
  • 263 Views
  • 4 replies
  • 2 kudos

Workflow concurrent runs not working as expected

Hello all, I am trying to fetch data from different sources for tables driven by a metadata table. Data will get fetched from the sources using the JDBC connector for each table mentioned in the metadata table. A scheduled job is responsible for fetching the ...

Latest Reply
elguitar
New Contributor II
  • 2 kudos

Soo.. You use a loop to go through the metadata table and then retrieve and ingest files using JDBC? If so, then the concurrent runs won't be helpful. Concurrent runs means how many runs of that job can run side by side. For you, this wou...

3 More Replies
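
One hedged reading of the reply above: "maximum concurrent runs" only matters if something actually launches several runs of the job. A sketch that triggers one run per metadata row via the Jobs REST API; the workspace URL, token, job ID, parameter name and metadata rows are hypothetical placeholders:

import requests

host = "https://<workspace-url>"   # placeholder
token = "<api-token>"              # placeholder
job_id = 123                       # hypothetical job that ingests one table per run

metadata_rows = [{"table": "orders"}, {"table": "customers"}]  # stand-in for the metadata table

for row in metadata_rows:
    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": job_id, "notebook_params": {"table_name": row["table"]}},
    )
    resp.raise_for_status()
    print(row["table"], "-> run_id", resp.json()["run_id"])

With the job's maximum concurrent runs raised above 1, these runs execute side by side; otherwise they queue.
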
nikhilkumawat
by New Contributor III
  • 9970 Views
  • 10 replies
  • 8 kudos

Resolved! Get file information while using "Trigger jobs when new files arrive" https://docs.databricks.com/workflows/jobs/file-arrival-triggers.html

I am currently trying to use the "Trigger jobs when new files arrive" feature in one of my projects. I have an S3 bucket in which files arrive on random days. So I created a job and set the trigger to "file arrival" type. And within the no...

Latest Reply
elguitar
New Contributor II
  • 8 kudos

I spent some time configuring a setup similar to this. Unfortunately, there's no simple way to do this. There's only the {{job.trigger.file_arrival.location}} parameter, but that is pretty much useless, since it points to the directory that we are watchi...

9 More Replies
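
A hedged workaround sketch for the limitation described above: since {{job.trigger.file_arrival.location}} only exposes the watched directory, the triggered job can list that directory and keep only files newer than a watermark it stores between runs. The widget name and watermark handling below are assumptions:

# The widget is assumed to be bound to {{job.trigger.file_arrival.location}} in the task parameters.
watch_location = dbutils.widgets.get("trigger_location")
last_processed_ms = 0  # in practice, load this watermark from a small state table or file

new_files = [f.path for f in dbutils.fs.ls(watch_location) if f.modificationTime > last_processed_ms]
print(new_files)
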
jar
by New Contributor II
  • 278 Views
  • 4 replies
  • 2 kudos

Resolved! Restricting access to secrets

Hi. I want to restrict access to secrets to a security group, as the secrets can be used to retrieve sensitive data only a few people should see. Up until now, we have been using KV-backed secret scopes, but as it's sufficient that Databricks has the...

Latest Reply
h_h_ak
Contributor
  • 2 kudos

Hi Johan, this should work for the restriction: https://learn.microsoft.com/en-us/azure/databricks/security/secrets/secrets. Fine-grained access based on individual secrets is currently not possible. BR

3 More Replies
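
For reference, a minimal sketch of the scope-level ACL the linked docs describe, via the secrets REST API; the workspace URL, token, scope and group name are placeholders, and note that permissions apply to the whole scope rather than to individual secrets:

import requests

host = "https://<workspace-url>"   # placeholder
token = "<api-token>"              # placeholder

resp = requests.post(
    f"{host}/api/2.0/secrets/acls/put",
    headers={"Authorization": f"Bearer {token}"},
    json={"scope": "sensitive-scope",               # placeholder scope name
          "principal": "restricted-security-group", # placeholder group
          "permission": "READ"},
)
resp.raise_for_status()
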
Shiva3
by New Contributor III
  • 113 Views
  • 1 reply
  • 0 kudos

Repartition method issue in Unity Catalog

We are in the process of upgrading our notebooks to Unity Catalog. Previously, I was able to write data to an external Delta table using df.repartition(8).write.save('path'), which correctly created multiple files. However, during the upgrade, in te...

Latest Reply
agallardrivilla
New Contributor II
  • 0 kudos

Hi @Shiva3, maybe you can try this option: Delta Lake in Unity Catalog may have optimizedWrites enabled by default, which can reduce the number of files by automatically coalescing partitions during writes. # Disable auto-compaction and optimized wr...

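
A hedged sketch of the settings the reply refers to, assuming the goal is to keep repartition(8) producing eight files; the write path is a placeholder:

# Session-level settings; the equivalent table properties are
# delta.autoOptimize.optimizeWrite / delta.autoOptimize.autoCompact.
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "false")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false")

(df.repartition(8)
   .write
   .format("delta")
   .mode("overwrite")
   .save("abfss://<container>@<account>.dfs.core.windows.net/path/to/table"))  # placeholder path
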
17780
by New Contributor II
  • 11465 Views
  • 5 replies
  • 2 kudos

databricks single user cluster is not able to assign service principals

I want to set the Databricks cluster access mode to single user and assign a service principal account as that user. In other words, after creating a single user mode cluster, how can I grant access only to service principals?

Latest Reply
Pat_IronBridges
New Contributor II
  • 2 kudos

So, here is an alternative to either the UI (doesn't work actually; not possible) or the CLI: use the Databricks API. endpoint_change = f"{databricksURL}/api/2.1/clusters/edit"  # change single owner; payload_change = { "cluster_id": cluster_id, "clu...

4 More Replies
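
A hedged reconstruction of the API route sketched in the truncated reply above: POST /api/2.1/clusters/edit, with data_security_mode set to SINGLE_USER and single_user_name set to the service principal's application ID. Note that clusters/edit expects the full cluster spec, so the other fields must echo the existing configuration; every value below is a placeholder:

import requests

host = "https://<workspace-url>"   # placeholder
token = "<api-token>"              # placeholder

payload = {
    "cluster_id": "<cluster-id>",
    "cluster_name": "single-user-sp-cluster",
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "<node-type>",
    "num_workers": 1,
    "data_security_mode": "SINGLE_USER",
    "single_user_name": "<service-principal-application-id>",
}

resp = requests.post(f"{host}/api/2.1/clusters/edit",
                     headers={"Authorization": f"Bearer {token}"},
                     json=payload)
resp.raise_for_status()
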
VovaVili
by New Contributor II
  • 2139 Views
  • 4 replies
  • 0 kudos

Databricks Runtime 13.3 - can I use Databricks Connect without Unity Catalog?

Hello all, the official documentation for Databricks Connect states that, for Databricks Runtime versions 13.0 and above, my cluster needs to have Unity Catalog enabled for me to use Databricks Connect and use a Databricks cluster through an IDE like...

Latest Reply
ZivadinM
New Contributor II
  • 0 kudos

Did you manage to configure Databricks Connect without Unity Catalog in the end? If so, can you share how?

3 More Replies
SDas1
by New Contributor
  • 7781 Views
  • 2 replies
  • 2 kudos

Identity column value of a Databricks Delta table does not start with 0 and increase by 1. It always starts with something like 1 or 2 and increases by 2. Below is the sample code; any logical input here is appreciated

spark.sql("CREATE TABLE integrated.TrailingWeeks (ID BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 0 INCREMENT BY 1), Week_ID INT NOT NULL) USING delta OPTIONS (path 'dbfs:/<Path in Azure datalake>/delta')")

Latest Reply
agallardrivilla
New Contributor II
  • 2 kudos

Hi, when you define an identity column in Databricks with GENERATED BY DEFAULT AS IDENTITY (START WITH 0 INCREMENT BY 1), it is expected to start at 0 and increment by 1. However, due to Databricks' distributed architecture, the values may not be str...

1 More Replies
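
To illustrate the reply's point: Delta identity columns guarantee unique, increasing values but not consecutive ones, because each writer task reserves a block of IDs. If strictly consecutive numbering is required, a window function is one alternative; a sketch using the table from the question:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.table("integrated.TrailingWeeks")
consecutive = df.withColumn("ConsecutiveID",
                            F.row_number().over(Window.orderBy("Week_ID")) - 1)  # starts at 0
consecutive.show()
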
Pavan578
by New Contributor II
  • 160 Views
  • 2 replies
  • 0 kudos

Cluster is not starting

Cluster 'xxxxxxx' was terminated. Reason: WORKER_SETUP_FAILURE (SERVICE_FAULT). Parameters: databricks_error_message: DBFS Daemon is not reachable., gcp_error_message: Unable to reach the colocated DBFS Daemon. Can anyone help me with how we can resolve thi...

Latest Reply
Pavan578
New Contributor II
  • 0 kudos

Thanks @agallardrivilla. I will check the above steps and let you know.

1 More Replies
elikvar
by New Contributor III
  • 18626 Views
  • 9 replies
  • 9 kudos

Cluster occasionally fails to launch

I have a daily running notebook that occasionally fails with the error: "Run result unavailable: job failed with error message Unexpected failure while waiting for the cluster Some((xxxxxxxxxxxxxxx) )to be readySome(: Cluster xxxxxxxxxxxxxxxx is in un...

Latest Reply
Pavan578
New Contributor II
  • 9 kudos

Cluster 'xxxxxxx' was terminated. Reason: WORKER_SETUP_FAILURE (SERVICE_FAULT). Parameters: databricks_error_message: DBFS Daemon is not reachable., gcp_error_message: Unable to reach the colocated DBFS Daemon. Can anyone help me with how we can resolve thi...

8 More Replies
tanjil
by New Contributor III
  • 13152 Views
  • 9 replies
  • 6 kudos

Resolved! Downloading sharepoint lists using python

Hello, I am trying to download lists from SharePoint into a pandas dataframe. However, I cannot get any information successfully. I have attempted many solutions mentioned on Stack Overflow. Below is one of those attempts: # https://pypi.org/project/sha...

Latest Reply
huntaccess
New Contributor II
  • 6 kudos

The error "<urlopen error [Errno -2] Name or service not known>" suggests that there's an issue with the server URL or network connectivity. Double-check the server URL to ensure it's correct and accessible. Also, verify that your network connection ...

8 More Replies
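
A small sketch of the connectivity check the reply suggests, to confirm the SharePoint host resolves and is reachable from the cluster before debugging the library itself; the host name is a placeholder:

import socket
import requests

sharepoint_host = "yourtenant.sharepoint.com"  # placeholder

print(socket.gethostbyname(sharepoint_host))  # raises "Name or service not known" if DNS fails
print(requests.get(f"https://{sharepoint_host}", timeout=10).status_code)
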
pesky_chris
by New Contributor III
  • 1187 Views
  • 5 replies
  • 0 kudos

Resolved! Problem with SQL Warehouse (Serverless)

I get the following error message on the attempt to use SQL Warehouse (Serverless) compute with Materialized Views (a simple interaction, e.g. DML, UI sample lookup). The MVs are created off the back of Federated Tables (Postgresql), MVs are created ...

Latest Reply
pesky_chris
New Contributor III
  • 0 kudos

Hey, to clarify, as I think I'm potentially hitting unintended Databricks "functionality": the materialised views are managed by a DLT pipeline, which was deployed with DABs off a CI/CD pipeline; the DLT pipeline runs a notebook with Python code creating MVs dynami...

4 More Replies
Edthehead
by Contributor II
  • 1655 Views
  • 2 replies
  • 0 kudos

Parameterized Delta live table pipeline

I'm trying to create an ETL framework on Delta Live Tables and basically use the same pipeline for all the transformations from bronze to silver to gold. This works absolutely fine when I hard code the tables and the SQL transformations as an array wi...

Data Engineering
Databricks
Delta Live Table
dlt
Latest Reply
canadiandataguy
New Contributor II
  • 0 kudos

Here is how you can do it

1 More Replies
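
Since the reply above is truncated, here is one commonly used pattern as a hedged sketch (an assumption, not necessarily what canadiandataguy had in mind): pass the table list through the DLT pipeline's configuration and generate tables in a loop. The configuration key, table names and source tables are placeholders:

import json
import dlt

# e.g. pipeline configuration: etl.table_specs = [{"name": "silver_orders", "source": "bronze.orders"}]
table_specs = json.loads(spark.conf.get("etl.table_specs"))

def make_table(spec):
    @dlt.table(name=spec["name"])
    def _target():
        return spark.read.table(spec["source"])
    return _target

for spec in table_specs:
    make_table(spec)
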
calvinchan_iot
by New Contributor II
  • 212 Views
  • 1 reply
  • 0 kudos

SparkRuntimeException: [UDF_ERROR.ENV_LOST] the execution environment was lost during execution

Hey everyone, I have been facing a weird error when I upgrade to use Unity Catalog: org.apache.spark.SparkRuntimeException: [UDF_ERROR.ENV_LOST] Execution of function line_string_linear_interp(geometry#1432) failed - the execution environment was lost ...

Latest Reply
Brahmareddy
Valued Contributor III
  • 0 kudos

Hi @calvinchan_iot, how are you doing today? As per my understanding, it sounds like the error may be due to environment instability when running the UDF after enabling Unity Catalog. The [UDF_ERROR.ENV_LOST] error often points to the UDF execution en...

KartRasi_10779
by New Contributor
  • 226 Views
  • 2 replies
  • 0 kudos

Glue Catalog Metadata Management with Enforced Tagging in Databricks

As part of the data governance team, we're trying to enforce table-level tagging when users create tables in a Databricks environment where metadata is managed by AWS Glue Catalog (non-Unity Catalog). Is there a way to require tagging at table creati...

Latest Reply
145676
New Contributor II
  • 0 kudos

You can use lakeFS pre-merge hooks to force this. Works great with this stack -> https://lakefs.io/blog/lakefs-hooks/ 

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group