Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

asrivas
by New Contributor II
  • 1266 Views
  • 3 replies
  • 0 kudos

Azure Databricks – Lakehouse Federation MySQL Connection Fails but Works in Notebook

I am trying to set up a Lakehouse Federation connection to an Azure MySQL database. When I connect from a Databricks notebook using Python (mysql.connector) on the same cluster, it works fine. But when I set up the Lakehouse Federation connection and test i...

Latest Reply
WiliamRosa
Databricks Partner
  • 0 kudos

Hi @asrivas, I've been trying to simulate this on my side, and in my case I was able to complete the connection, but I believe in your case the issue comes from the MySQL setting --require_secure_transport=ON. In the notebook it works because the dri...
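One way to confirm this theory from the same notebook is to attempt a connection with SSL explicitly disabled: if the server enforces --require_secure_transport=ON, the plain connection is rejected while the default (TLS-negotiating) one succeeds. A hedged sketch; the host and credentials are placeholders:

```python
import mysql.connector
from mysql.connector import Error

# Placeholders - substitute your Azure MySQL host and credentials
HOST = "myserver.mysql.database.azure.com"
USER = "admin_user"
PASSWORD = "<password>"

try:
    # Force a non-TLS connection; this fails if require_secure_transport=ON
    conn = mysql.connector.connect(
        host=HOST, user=USER, password=PASSWORD, ssl_disabled=True
    )
    print("plain TCP accepted -> secure transport is NOT enforced")
    conn.close()
except Error as e:
    print(f"plain TCP rejected ({e}) -> server likely enforces SSL/TLS")

# mysql.connector negotiates TLS by default, which is why the notebook test passes
conn = mysql.connector.connect(host=HOST, user=USER, password=PASSWORD)
conn.close()
```

This only diagnoses the server side; whether the Federation connection can be made to satisfy the TLS requirement depends on the connection options available in your workspace.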

2 More Replies
jeremy98
by Honored Contributor
  • 2442 Views
  • 7 replies
  • 10 kudos

Resolved! How to overwrite a job parameter inside a job task

Hi community, how can I overwrite a job parameter inside a job task? It seems that the job parameter has a higher priority than the task parameter, even when the task parameter tries to override it.

Latest Reply
jeremy98
Honored Contributor
  • 10 kudos

Hi Pilsner, thanks for your response. The issue is that I need to know the value beforehand. In this case, we need to set the task values inside a notebook, for example. I want to be able to set it as a task value; I don't think Databricks provides that.
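For reference, the pattern being discussed (setting a value in one notebook task and reading it in a downstream task of the same job run) looks roughly like this with task values; the task key and value names here are made up:

```python
# In the upstream notebook task (task key: "prepare"):
dbutils.jobs.taskValues.set(key="run_mode", value="incremental")

# In a downstream task of the same job run:
run_mode = dbutils.jobs.taskValues.get(
    taskKey="prepare",      # task that set the value
    key="run_mode",
    default="full",         # used if the upstream task did not set the key
    debugValue="full",      # used when running the notebook interactively
)
print(run_mode)
```

Note that task values are a separate mechanism from job/task parameters, which is the limitation jeremy98 is pointing out: they do not change the resolution order of job parameters themselves.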

6 More Replies
km1837
by Databricks Partner
  • 850 Views
  • 1 replies
  • 0 kudos

DLT Pipeline from Streaming Table

Hi, I have a bronze table with Product_id, *, start_at, end_at, which is a streaming SCD Type 2 table, meaning any change in product_attributes inserts a new row with end_at as null. So when we take this table with end_at as null, the tabl...

Latest Reply
ilir_nuredini
Honored Contributor
  • 0 kudos

Hi @km1837, instead of trying to implement a streaming table on top of a streaming table, I think a materialized view on the next child table would be the best choice for your use case. For example: @dlt.table(name="workspace.silver.current_product") def sample_trips...
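The suggestion above, sketched out with illustrative table and column names: a @dlt.table that reads the bronze SCD2 table as a batch source (rather than a stream) is maintained by the pipeline as a materialized view, recomputed from the current rows on each update:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="current_product",
    comment="Current (open) SCD2 rows from the bronze table",
)
def current_product():
    # Batch read (spark.table, not a stream) -> DLT maintains this
    # as a materialized view over the bronze SCD2 history
    return (
        spark.table("bronze.product_scd2")        # placeholder source table
        .filter(F.col("end_at").isNull())          # keep only the open version
    )
```

This sidesteps the problem of streaming from a source that receives updates, at the cost of recomputation on refresh.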

ganapati
by New Contributor III
  • 1730 Views
  • 9 replies
  • 3 kudos

Resolved! Issue updating DLT pipeline configurations using the Databricks SDK

I am updating DLT pipeline configs with the job_id, run_id, and run_datetime of the job, so that I can access these values inside the DLT pipeline. Below is the code I am using to do that: # Databricks notebook source import sys import logging from databricks....
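For context, the general shape of pushing job context into a pipeline's configuration with the Python SDK looks roughly like this (the pipeline ID and config keys are placeholders; note that update replaces the pipeline settings, so existing spec fields should be carried over in real use):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
pipeline_id = "<pipeline-id>"  # placeholder

# Read the current settings so existing configuration keys are not dropped
current = w.pipelines.get(pipeline_id=pipeline_id)
config = dict(current.spec.configuration or {})
config.update({
    "job_id": "123",                          # illustrative values
    "run_id": "456",
    "run_datetime": "2025-01-01T00:00:00Z",
})

# Caution: update replaces the pipeline spec; in practice you would pass the
# other fields from current.spec (name, libraries, clusters, ...) as well.
w.pipelines.update(pipeline_id=pipeline_id, configuration=config)
```

Inside the pipeline, these keys are then readable via spark.conf.get("job_id") and so on.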

Latest Reply
ganapati
New Contributor III
  • 3 kudos

Hi, just tested it out and it works! Thanks again for helping out.

8 More Replies
liu
by Databricks Partner
  • 696 Views
  • 2 replies
  • 1 kudos

Can Databricks serverless compute install Scala packages?

I need to use the spark-sftp package, but it seems that serverless is different from all-purpose compute, and I can only install Python packages? There is another question: I can use p...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

No Scala; you can't even run Scala notebooks. About the SFTP: serverless compute is much more limited than general-purpose clusters. Which folder can't be found, DBFS or S3?

1 More Replies
dbx_user
by New Contributor II
  • 2932 Views
  • 8 replies
  • 0 kudos

Intermittent error: "Command failed because warehouse <<warehouse id>> was stopped."

The error "Command failed because warehouse <<warehouse id>> was stopped." has started popping up during deployment runs. Sometimes the error correlates with the serverless warehouse cluster count dropping to zero while a query is running, sometimes it ...

Latest Reply
ADbksUser
New Contributor II
  • 0 kudos

Hey all, having the same issue here. Just doing some development work connected to a serverless SQL warehouse from dbt. Suddenly getting the error "Command failed because warehouse <warehouse_id> was stopped." Nothing's changed between those runs.

7 More Replies
tabinashabir
by New Contributor II
  • 1673 Views
  • 5 replies
  • 3 kudos

AutoLoader options includeExistingFiles and modifiedAfter not working

I'm using this code to read data from an ADLS Gen2 location. There are txt files present in sub-folders in the container. df_stream = spark.readStream \ .format("cloudFiles") \ .option("cloudFiles.format", "text") \ .optio...

Latest Reply
ManojkMohan
Honored Contributor II
  • 3 kudos

Root cause: includeExistingFiles is only evaluated the first time the stream is started with a fresh checkpoint. If the stream is restarted or the checkpoint folder is reused, changing this option will have no effect on subsequent runs; old files previ...
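The point above in code form (paths and table names are placeholders): cloudFiles options such as includeExistingFiles are captured when the stream first starts against a checkpoint, so changing them requires pointing the stream at a new checkpoint location:

```python
# Options like cloudFiles.includeExistingFiles are locked in at the first start
# of a stream against a given checkpoint; to change them, use a NEW checkpoint
# directory (or delete the old one, accepting possible reprocessing).
df_stream = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "text")
    .option("cloudFiles.includeExistingFiles", "false")
    # only consider files modified after this timestamp (UTC)
    .option("modifiedAfter", "2025-01-01T00:00:00.000000Z")
    .load("abfss://container@account.dfs.core.windows.net/input/")  # placeholder
)

(
    df_stream.writeStream
    # fresh checkpoint path so the new options take effect
    .option("checkpointLocation", "/Volumes/cat/sch/vol/checkpoints/txt_v2")
    .toTable("bronze.raw_text")  # placeholder target table
)
```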

4 More Replies
ManojkMohan
by Honored Contributor II
  • 2459 Views
  • 5 replies
  • 5 kudos

Resolved! Extracting PDFs and using AI queries | best practices

Problem I am solving: upload a PDF → available in /Volumes/<catalog>/<schema>/<volume>/; extract text with pdfplumber (or OCR if scanned); store in a Delta table for governance; parse intelligently using ai_query() with Databricks LLMs for flexible JSON outp...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 5 kudos

Hi @ManojkMohan, maybe you're using the wrong endpoint name. Try databricks-meta-llama-3-3-70b-instruct. In your case you're calling an endpoint named databricks-meta-llama-3-70b-instruct, which I guess has a small typo.
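The fix is just the extra "3" in the model name (Llama 3.3 vs. Llama 3); a minimal sketch of the corrected call, with a made-up prompt:

```python
# Wrong:  databricks-meta-llama-3-70b-instruct    (missing the second "3")
# Right:  databricks-meta-llama-3-3-70b-instruct  (Meta Llama 3.3 70B)
result = spark.sql("""
    SELECT ai_query(
        'databricks-meta-llama-3-3-70b-instruct',
        'Summarize this contract clause in one sentence: ...'
    ) AS summary
""")
result.show(truncate=False)
```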

4 More Replies
cchiaramelli
by Databricks Partner
  • 723 Views
  • 3 replies
  • 5 kudos

Resolved! Unable to Delete Failed Databricks Job VMs in Azure

My job compute had trouble starting the cluster, reporting "Unexpected failure while waiting for the cluster (xxxx) to be ready: Cluster 'xxxx' is unhealthy". After multiple retries, a new error message appeared: "Operation could not be completed as it...

Latest Reply
cchiaramelli
Databricks Partner
  • 5 kudos

UPDATE: Before opening the support ticket, the machines suddenly disappeared. I deleted the job definitions along with their job cluster definitions, and maybe that solved it, or the machines were cleaned up after a few hours. Not sure what cleaned it up. Also I n...

2 More Replies
TechExplorer
by New Contributor II
  • 2448 Views
  • 3 replies
  • 1 kudos

Resolved! Unable to unpack or read rar file

Hi everyone, I'm encountering an issue with the following code when trying to unpack or read a RAR file in Databricks: with rarfile.RarFile(s3_path) as rf: for file_info in rf.infolist(): with rf.open(file_info) as file: file_c...

Latest Reply
Upendra_Dwivedi
Databricks Partner
  • 1 kudos

Hi @Walter_C, I am also using the unrar utility, but the problem is that it is proprietary software; I am working for a client, and its license could cause issues. What is the alternative to unrar, so that we eliminate any legal-compliance risk?

2 More Replies
Datalight
by Contributor
  • 1757 Views
  • 5 replies
  • 1 kudos

Resolved! How to design Airship Integration with Azure Databricks

Hello, I have to push data from Airship and persist it to Delta tables. I think we can use SFTP. Could someone please help me design the inbound part, using SFTP on the Airship end to push files to ADLS Gen2? Networking and security consideratio...

Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

Inbound flow design: enable SFTP on the ADLS Gen2 (or Azure Blob Storage) account; generate and register an SSH public/private key pair with Airship; enter your SFTP endpoint credentials (username, host, port, key) in Airship's settings to authenticate ...

4 More Replies
Khaja_Zaffer
by Esteemed Contributor
  • 2270 Views
  • 10 replies
  • 5 kudos

Resolved! CONTAINER_LAUNCH_FAILURE

Hello everyone! I need some help; I'm unable to get a cluster up and running. I tried creating classic compute, but it fails. Is there any limit on using Databricks Community Edition? Error here: { "reason": { "code": "CONTAINER_LAUNCH_FAILURE", "type...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 5 kudos

To all: legacy Community Edition works fine if you use DBR <= 15.4 for both general and ML modes. I think legacy Community Edition is still far better than Free Edition. I was selecting DBR > 15.4. Thank you.

9 More Replies
SiarheiSintsou
by New Contributor
  • 557 Views
  • 2 replies
  • 0 kudos

Serverless performance_target option is not available for one-time jobs

Why is this option https://docs.databricks.com/api/workspace/jobs/create#performance_target not available for one-time runs (https://docs.databricks.com/api/workspace/jobs/submit)?

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @SiarheiSintsou! The performance_target isn’t currently supported in the SubmitRun API. However, it would be helpful if you could submit a feature request here.

1 More Replies
yvishal519
by Contributor
  • 3940 Views
  • 2 replies
  • 0 kudos

Identifying Full Refresh vs. Incremental Runs in Delta Live Tables

Hello Community, I am working with a Delta Live Tables (DLT) pipeline that primarily operates in incremental mode. However, there are specific scenarios where I need to perform a full refresh of the pipeline. I am looking for an efficient and reliable...

Latest Reply
Takuya-Omi
Valued Contributor III
  • 0 kudos

Hello, there are two ways to determine whether a DLT pipeline is running in full-refresh or incremental mode. DLT event log schema: the details column in the DLT event log includes information on "full_refresh". You can use this to identify whethe...
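The event-log option can be queried roughly like this; a hedged sketch assuming the event_log() table-valued function and a placeholder table name (the exact JSON path under details may vary by DLT release, so verify against your own event log):

```python
# Inspect whether pipeline updates were full refreshes by reading the DLT
# event log; 'create_update' events carry the full_refresh flag in `details`.
df = spark.sql("""
    SELECT
        timestamp,
        details:create_update.full_refresh::boolean AS full_refresh
    FROM event_log(TABLE(my_catalog.my_schema.my_streaming_table))  -- placeholder
    WHERE event_type = 'create_update'
    ORDER BY timestamp DESC
""")
df.show()
```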

1 More Replies
zyang
by Contributor II
  • 1176 Views
  • 2 replies
  • 0 kudos

Resolved! ModuleNotFoundError: No module named 'databricks.sdk.service.database'

Hi, the module from https://learn.microsoft.com/en-gb/azure/databricks/oltp/sync-data/sync-table?source=docs#python-sdk cannot be found. The cluster is as in the screenshot and the code is from the docs. Best regards,

Latest Reply
WiliamRosa
Databricks Partner
  • 0 kudos

The current version is the following: 
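A likely explanation (an assumption based on the module name, not confirmed in the thread): databricks.sdk.service.database only ships in recent databricks-sdk releases, so an older SDK pinned on the cluster raises ModuleNotFoundError. Upgrading the library and re-testing the import may resolve it:

```shell
# Upgrade the SDK on the cluster (or use %pip inside the notebook),
# then verify the submodule is importable.
pip install --upgrade databricks-sdk
python -c "import databricks.sdk.service.database; print('module found')"
```

Check the databricks-sdk release notes to confirm the minimum version that includes the database service.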

1 More Replies