Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by yinan, New Contributor III
  • 1977 Views
  • 5 replies
  • 2 kudos
Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 2 kudos

Hello @yinan, good day! Databricks, being a cloud-based platform, does not have direct built-in support for reading data from a truly air-gapped (completely offline, no network connectivity) Cloudera Distribution for Hadoop (CDH) environment. In such...

by Kurgod, New Contributor II
  • 995 Views
  • 2 replies
  • 0 kudos

Using Databricks to transform a Cloudera lakehouse on-prem without bringing the data to the cloud

I am looking for a solution to connect Databricks to a Cloudera lakehouse hosted on-prem and transform the data using Databricks without bringing the data into Databricks Delta tables or cloud storage. Once the transformation is done, the data needs to be ...

Latest Reply
BR_DatabricksAI
Databricks Partner
  • 0 kudos

Hello, what is your data volume? You can connect using JDBC/ODBC, but this process will be slow if the data volume is high. Another way of connecting, if your Cloudera storage is in HDFS, is through the HDFS API.
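For the JDBC route, a minimal sketch of the reader options, assuming HiveServer2 is reachable from the Databricks cluster and the Hive JDBC driver JAR is installed on it. The host, table, and credentials are hypothetical placeholders; `url`, `driver`, `dbtable`, `fetchsize`, `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` are standard Spark JDBC options. Note that a JDBC read still moves the rows into cluster memory for the transformation.

```python
def hive_jdbc_options(host, table, user, password, *, port=10000,
                      database="default", partition_column=None,
                      lower_bound=None, upper_bound=None,
                      num_partitions=None, fetch_size=10_000):
    """Build options for spark.read.format("jdbc").options(**opts).load().

    Assumes a HiveServer2 endpoint; host, table and credentials are
    hypothetical placeholders.
    """
    opts = {
        "url": f"jdbc:hive2://{host}:{port}/{database}",
        # Assumes the Hive JDBC driver JAR is installed on the cluster.
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "dbtable": table,
        "user": user,
        "password": password,
        # Rows per round trip; larger values mean fewer network calls.
        "fetchsize": str(fetch_size),
    }
    # Spark splits a partitioned read into num_partitions parallel range
    # queries on partition_column, which matters most for large volumes.
    partition_args = (partition_column, lower_bound, upper_bound, num_partitions)
    if any(arg is not None for arg in partition_args):
        if any(arg is None for arg in partition_args):
            raise ValueError("partition_column, lower_bound, upper_bound and "
                             "num_partitions must be set together")
        opts.update({
            "partitionColumn": partition_column,
            "lowerBound": str(lower_bound),
            "upperBound": str(upper_bound),
            "numPartitions": str(num_partitions),
        })
    return opts


if __name__ == "__main__":
    opts = hive_jdbc_options(
        "cdh-gateway.example.internal", "sales.orders", "svc_user", "change-me",
        partition_column="order_id", lower_bound=1, upper_bound=1_000_000,
        num_partitions=8,
    )
    print(opts["url"], opts["numPartitions"])
    # On a cluster with network access to the gateway (not run here):
    # df = spark.read.format("jdbc").options(**opts).load()
```

In practice the password would come from a secret scope rather than a literal.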

by azam-io, Databricks Partner
  • 2168 Views
  • 4 replies
  • 2 kudos

How can I structure pipeline-specific job params separately in a Databricks Asset Bundle?

Hi all, I am working with a Databricks Asset Bundle and want to separate environment-specific job params (for example, for "env" and "dev") for each pipeline within my bundle. I need each pipeline to have its own job param values for different environ...

Latest Reply
Michał
New Contributor III
  • 2 kudos

Hi azam-io, were you able to solve your problem? Are you trying to have different parameters depending on the environment, or a different parameter value? I think targets would allow you to specify different parameters per environment/target. As fo...

by seefoods, Valued Contributor
  • 3172 Views
  • 2 replies
  • 1 kudos

Resolved! Asset bundle

Hello guys, I am working on an asset bundle, and I want to make it generic for all teams (e.g. analytics, data engineering). Could someone share a best practice for this purpose? Cordially,

Latest Reply
Michał
New Contributor III
  • 1 kudos

Hi seefoods, were you able to achieve that generic asset bundle setup? I've been working on something potentially similar, and I'd be happy to discuss it and share experiences. While what I have works for a few teams, it is focused on declar...

by korijn, New Contributor II
  • 2061 Views
  • 4 replies
  • 0 kudos

Git integration inconsistencies between Git folders and job Git

It's a little confusing and limiting that Git integration support is inconsistent between the two available options. Sparse checkout is only supported when using a workspace Git folder, and checking out by commit hash is only supported when using ...

Latest Reply
_J
Databricks Partner
  • 0 kudos

Same here; this could be a good improvement for the jobs layer folks!

by IONA, New Contributor III
  • 3528 Views
  • 6 replies
  • 7 kudos

Resolved! Getting data from the Spark query profiler

When you navigate to Compute > Select Cluster > Spark UI > JDBC/ODBC, you can see grids of session stats and SQL stats. Is there any way to get this data in a query so that I can do some analysis? Thanks

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 7 kudos

Hi @IONA, as @Louis_Frolio correctly suggested, there is no native way to get the stats from the JDBC/ODBC tab of the Spark UI.
1. You can try to use the query history system table, but it has a limited number of metrics: %sql SELECT * FROM system.query.history
2. You can use /a...
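As a sketch of option 1 from a Python notebook, the helper below builds a query over `system.query.history` for the slowest recent statements. The column names (`statement_id`, `executed_by`, `execution_status`, `start_time`, `total_duration_ms`, `statement_text`) are assumed here; check them with `DESCRIBE system.query.history` in your workspace before relying on them.

```python
def slow_queries_sql(hours=24, limit=50):
    """Return SQL listing the slowest statements from the last `hours` hours.

    Column names are assumed from the system.query.history schema; verify
    them with DESCRIBE system.query.history before use.
    """
    # Validate inputs because they are interpolated into the SQL text.
    for name, value in (("hours", hours), ("limit", limit)):
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return (
        "SELECT statement_id, executed_by, execution_status, start_time,\n"
        "       total_duration_ms, statement_text\n"
        "FROM system.query.history\n"
        f"WHERE start_time >= current_timestamp() - INTERVAL {hours} HOURS\n"
        "ORDER BY total_duration_ms DESC\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(slow_queries_sql(hours=6, limit=20))
    # In a Databricks notebook (not run here):
    # display(spark.sql(slow_queries_sql(hours=6, limit=20)))
```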

by Yulei, New Contributor III
  • 35929 Views
  • 7 replies
  • 1 kudos

Resolved! Could not reach driver of cluster

Hi, recently I have been seeing the error "Could not reach driver of cluster <some_id>" with my Structured Streaming job when migrating to Unity Catalog, and found this when checking the traceback:
Traceback (most recent call last):
File "/databricks/python_shell/...

Latest Reply
osingh
Contributor
  • 1 kudos

It seems like a temporary connectivity or cluster initialization glitch. So if anyone else runs into this, try re-running the job before diving into deeper troubleshooting; it might just work! Hope this helps someone save time.
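Job tasks can also be given retry settings in the job configuration. Where the re-run is driven from your own code instead, a minimal sketch of retrying with exponential backoff and jitter looks like this. `TransientClusterError` is a hypothetical exception standing in for however your code surfaces the "Could not reach driver" failure, and the delays are illustrative, not recommendations.

```python
import random
import time


class TransientClusterError(Exception):
    """Hypothetical stand-in for a transient failure such as
    'Could not reach driver of cluster'."""


def run_with_retries(fn, *, max_attempts=3, base_delay_s=30.0,
                     max_delay_s=300.0, retry_on=(TransientClusterError,),
                     sleep=time.sleep):
    """Call fn(); on a retryable error, back off exponentially and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Cap the exponential delay, then pick a random wait up to it
            # ("full jitter") so many jobs do not retry in lockstep.
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_job():
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientClusterError("Could not reach driver of cluster")
        return "ok"

    # sleep is replaced with a no-op so the demo returns immediately.
    print(run_with_retries(flaky_job, sleep=lambda _: None), calls["n"])
```

Only the listed exception types are retried; anything else propagates on the first failure, so genuine bugs are not masked by repeated runs.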

by ChristianRRL, Honored Contributor
  • 1319 Views
  • 1 replies
  • 1 kudos

Resolved! Can schemaHints dynamically handle nested JSON structures? (Part 2)

Hi there, I'd like to follow up on a prior post: https://community.databricks.com/t5/data-engineering/can-schemahints-dynamically-handle-nested-json-structures/m-p/130209/highlight/true#M48731
Basically, I'm wondering what's the best way to set *both* d...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I am not aware of schemaHints supporting wildcards for now. It would be awesome to have, though, I agree. So I think you are stuck with what was already proposed in your previous post, exploding the JSON, or other transformations.
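Since wildcards are not available, one workaround is to generate the explicit list of nested paths from a sample record and pass it as the hint string. A minimal sketch, assuming Auto Loader accepts dot-separated paths for nested struct fields in `cloudFiles.schemaHints` (verify this against the schema hints documentation). The sample record and the `payload` prefix are hypothetical.

```python
import json


def leaf_paths(obj, prefix=""):
    """Yield dot-separated paths to every non-object value in a JSON object.

    Arrays are treated as leaves and are not descended into.
    """
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from leaf_paths(value, path)
        else:
            yield path


def expand_hint(sample, under, spark_type="STRING"):
    """Emulate a hypothetical '<under>.* <type>' wildcard hint by listing
    every leaf path under `under` that appears in the sample record."""
    paths = sorted(p for p in leaf_paths(sample) if p.startswith(under + "."))
    return ", ".join(f"{p} {spark_type}" for p in paths)


if __name__ == "__main__":
    sample = json.loads(
        '{"id": 1, "payload": {"temp": 21.5, "meta": {"unit": "C", "ok": true}}}'
    )
    hints = expand_hint(sample, "payload")
    print(hints)
    # Then, for example (not run here):
    # spark.readStream.format("cloudFiles") \
    #     .option("cloudFiles.format", "json") \
    #     .option("cloudFiles.schemaHints", hints) ...
```

Fields that do not appear in the sample are not covered, which is exactly the gap a real wildcard would close. Field names containing dots or spaces would also need backtick quoting, which this sketch does not do.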

by minhhung0507, Valued Contributor
  • 1183 Views
  • 1 replies
  • 1 kudos

Could not reach driver of cluster

I am running a pipeline job in Databricks and it failed with the following message: "Run failed with error message Could not reach driver of cluster 5824-145411-p65jt7uo." This message is not very descriptive, and I am not able to identify the root ca...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @minhhung0507, typically this error can appear when there's a high load on the driver node. Another reason could be heavy garbage collection on the driver node, as well as high memory and CPU usage, which leads to throttling and prevents the driv...

by erigaud, Honored Contributor
  • 13077 Views
  • 7 replies
  • 6 kudos

Resolved! SFTP Autoloader

Hello, I don't know if it is possible, but I am wondering whether I can ingest files from an SFTP server using Auto Loader. Or do I have to first copy the files to my DBFS and then use Auto Loader on that location? Thank you!

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @erigaud, we haven't heard from you since the last response from @BriceBuso, and I was checking back to see if her suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to others. Al...

by james_, New Contributor II
  • 1345 Views
  • 5 replies
  • 0 kudos

Low worker utilisation in Spatial SQL

I am finding low worker node utilization when using Spatial SQL features. My cluster is DBR 17.1 with 2 workers and Photon enabled. When I view the cluster metrics, they consistently show one worker around 30-50% utilized, the driver around 15-20%, a...

Latest Reply
james_
New Contributor II
  • 0 kudos

Thank you again, @-werners-. I still have a lot to learn about partitioning and managing spatial data. Perhaps I mainly need more patience!

by ScottH, New Contributor III
  • 1384 Views
  • 3 replies
  • 3 kudos

Resolved! Installing Marketplace Listing via Python SDK...

I am trying to use the Databricks Python SDK to install a Databricks Marketplace listing to Unity Catalog. I am getting stuck on how to provide a valid consumer terms version when passing the "accepted_consumer_terms" parameter to the w.consumer_inst...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @ScottH, it took me about 2 hours to get it right, but here it is. You need to provide a valid date. You may ask where that date comes from. It comes from the consumer listing: listings = w.consumer_listings.get(id= 'e913bea3-9a37-446c...

by der, Valued Contributor
  • 3025 Views
  • 2 replies
  • 2 kudos

Resolved! DBR 17.1 Spatial SQL Functions and Apache Sedona

I noticed in the DBR 17.1 release notes that ST geospatial functions are now in public preview, which is great news for us since this means native support in Databricks. https://docs.databricks.com/aws/en/release-notes/runtime/17.1#expanded-spatial-sql-expres...

Latest Reply
mjohns
Databricks Employee
  • 2 kudos

Here are a few answers; feel free to hit me up on LinkedIn (michaeljohns2) if you want to discuss more particulars of Databricks geospatial. It looks like Sedona 1.8.0 is the release to watch for Spark 4.0 support; see https://github.com/apache/se...

by mikvaar, Databricks Partner
  • 2909 Views
  • 4 replies
  • 1 kudos

Resolved! DLT Pipelines with DABs - Support for tags field?

Hi all, I'm working with DABs and trying to define tags for DLT pipelines in the bundle YAML config. However, adding a `tags:` block under the pipeline results in the following warning: "Warning: unknown field: tags". This suggests that tags might not be...

Latest Reply
nikhilj0421
Databricks Employee
  • 1 kudos

Hi @mikvaar, yes, tags are not supported yet in DABs, but they are on the roadmap. The ETA for this is around the first or second week of June.

by DRock, New Contributor II
  • 6960 Views
  • 7 replies
  • 0 kudos

Resolved! ODBC data source to connect to a Databricks catalog.database via MS Access Not Working

When using an ODBC data source to connect to a Databricks catalog database via Microsoft Access, the tables are not listing/appearing in the MS Access database for selection. However, when using the same ODBC data source to connect to Microsoft Excel,...

Latest Reply
Senefelder
New Contributor II
  • 0 kudos

Why do «Databricks employees» keep answering with the same AI-generated reply, when that is obviously not the solution? Has anyone been able to come up with a solution that actually works?
