Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Gary_Irick
by Databricks Partner
  • 16150 Views
  • 10 replies
  • 10 kudos

Delta table partition directories when column mapping is enabled

I recently created a table on a cluster in Azure running Databricks Runtime 11.1. The table is partitioned by a "date" column. I enabled column mapping, like this: ALTER TABLE {schema}.{table_name} SET TBLPROPERTIES('delta.columnMapping.mode' = 'nam...
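For context, a minimal sketch of the DDL involved (table name is a placeholder; per the Delta Lake docs, enabling column mapping also requires upgrading the table protocol, which is shown here as an assumption rather than taken from the post):

```python
def column_mapping_ddl(table: str) -> str:
    """Build the DDL that enables column mapping by name on a Delta table.

    Column mapping requires a reader/writer protocol upgrade
    (minReaderVersion >= 2, minWriterVersion >= 5).
    """
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.columnMapping.mode' = 'name', "
        "'delta.minReaderVersion' = '2', "
        "'delta.minWriterVersion' = '5')"
    )

# On Databricks this string would be executed via spark.sql(...)
print(column_mapping_ddl("my_schema.my_table"))
```

Note that after this change, partition directories on storage use physical column names (e.g. randomized identifiers) rather than the logical "date" name, which is the behaviour the post describes.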

Latest Reply
Narsikakunuri
New Contributor II
  • 10 kudos

Still the same behaviour when column mapping is enabled.

9 More Replies
Balazs
by New Contributor III
  • 13719 Views
  • 4 replies
  • 10 kudos

Unity Catalog Volume as Spark checkpoint location

Hi, I tried to set the Spark checkpoint location in a notebook to a folder in a Unity Catalog Volume, with the following command: sc.setCheckpointDir("/Volumes/catalog_name/schema_name/volume_name/folder_name"). Unfortunately, I receive the following err...

Latest Reply
aaonurdemir
New Contributor II
  • 10 kudos

Any progress on this? https://docs.databricks.com/aws/en/notebooks/source/graphframes-user-guide-py.html This is not working, both with checkpointing and with standard graph algorithms.

3 More Replies
Naveenkumar1811
by New Contributor III
  • 730 Views
  • 6 replies
  • 2 kudos

Reduce the Time for the First Spark Streaming Run Kick-off

Hi Team, currently I have a Silver Delta table (external) loading via streaming, and the Gold is loaded in batch. I need to make the Gold Delta table streaming as well. In my first run I can see the stream initialization process taking an hour or so, as my Silver ta...

Latest Reply
Naveenkumar1811
New Contributor III
  • 2 kudos

Yes, I understand about OPTIMIZE and VACUUM... but the Silver table is still very heavy; it is definitely going to take long. Any other suggestions for a prod scenario where we can perform this without data loss?

5 More Replies
ChrisLawford_n1
by Contributor II
  • 446 Views
  • 2 replies
  • 2 kudos

Resolved! Bug Report: SDP (DLT) with autoloader not passing through pipe delimiter/separator

I am noticing a difference between using Auto Loader in an interactive notebook vs using it in a Spark Declarative Pipeline (DLT pipeline). This issue seems very similar to this other unanswered post from a few years ago. Bug report: the delimit...
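For context, a hedged sketch of how a pipe delimiter is typically passed to Auto Loader in an interactive notebook (option values and the path are illustrative; whether a declarative pipeline forwards `sep` identically is exactly what this post questions):

```python
# Illustrative Auto Loader options for pipe-delimited CSV; in a notebook
# these are forwarded to the underlying CSV parser.
autoloader_options = {
    "cloudFiles.format": "csv",
    "sep": "|",        # the delimiter the post reports being dropped in DLT
    "header": "true",
}

def read_pipe_delimited(spark, path):
    # Apply each option to a cloudFiles streaming reader, then load the path.
    reader = spark.readStream.format("cloudFiles")
    for key, value in autoloader_options.items():
        reader = reader.option(key, value)
    return reader.load(path)
```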

Latest Reply
ChrisLawford_n1
Contributor II
  • 2 kudos

Hey, okay, thanks @nikhilj0421. I have now solved the issue, but not with a full refresh of the table. I had tried this previously and even deleted the DLT pipeline, hoping that would provide me a clean slate if this lingering schema was an issue, but w...

1 More Replies
matmad
by New Contributor III
  • 1415 Views
  • 5 replies
  • 2 kudos

Resolved! Job fails on clusters only with library dependency

Hello! I have the following problem: all my job runs fail when the job uses a library. Even the most basic job (print a string) and the most basic library package (no secondary dependencies; the script does not even import/use the library) fails with `Fai...

Latest Reply
gopal2026
New Contributor II
  • 2 kudos

Hi, can you please share the detailed solution? Did you include any config in databricks.yml? I'm also having the same issue.

4 More Replies
pop_smoke
by New Contributor III
  • 1274 Views
  • 4 replies
  • 5 kudos

Resolved! Switching to Databricks from Ab Initio (an old ETL software) - NEED ADVICE

As far as I know, all courses on the market and on YouTube for Databricks are outdated, as those courses are for Community Edition; there is no new course for the Free Edition of Databricks. I am a working professional and I do not get much time. Do you guys kno...

Latest Reply
markjvickers-im
Databricks Partner
  • 5 kudos

@pop_smoke What were the arguments that swayed your organization to switch to Databricks from Ab Initio? Purely a cost basis?

3 More Replies
Ved88
by Databricks Partner
  • 502 Views
  • 1 replies
  • 1 kudos

Resolved! All-purpose Databricks cluster disappears

Hi, I can see that sometimes the cluster disappears even though it was created some time back using a cluster pipeline. What could be the reason for it to disappear? We can recreate the cluster, but we wanted to know why this cluster disappeared. Thanks! Ve...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Ved88, 30 days after a compute is terminated, it is permanently deleted. To keep an all-purpose compute configuration after a compute has been terminated for more than 30 days, an administrator can pin the compute. Up to 100 compute resources can...

smpa01
by Contributor
  • 531 Views
  • 2 replies
  • 2 kudos

Resolved! Python DataSource API utilities/ Import Fails in Spark Declarative Pipeline

TLDR - UDFs work fine when imported from a `utilities/` folder in DLT pipelines, but custom Python DataSource APIs fail with `ModuleNotFoundError: No module named 'utilities'` during serialization. Only inline definitions work. Need reusable DataSource ...

Latest Reply
smpa01
Contributor
  • 2 kudos

@emma_s Thank you for the guidance! The wheel package approach worked perfectly. I also tried putting the .py directly in /Workspace/Libraries/custom_datasource.py, but it did not work.

1 More Replies
ChristianRRL
by Honored Contributor
  • 594 Views
  • 3 replies
  • 5 kudos

Resolved! Is Auto Loader open source now in Apache 4.1 SDP?

With Spark Declarative Pipelines (SDP) being open source now, does this mean that the Databricks Auto Loader functionality is also open source? Is it called something else? If not, how does the open-source version handle incremental data processing a...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 5 kudos

Hi @ChristianRRL, no, Auto Loader is proprietary to Databricks; it's not open-sourced. The open-source version of SDP uses Spark Structured Streaming for incremental processing. Keep in mind that Auto Loader is basically just Spark streaming under the hood ...
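To illustrate the reply's point, a sketch (path and schema are illustrative assumptions) of the open-source Structured Streaming file source, the rough OSS counterpart to Auto Loader's incremental file discovery:

```python
# Sketch: incremental file ingestion with the OSS Structured Streaming
# CSV file source. Unlike Auto Loader, this lists the directory on each
# micro-batch rather than using notification/listing optimizations.
def incremental_csv_stream(spark, path):
    return (
        spark.readStream
        .format("csv")
        .option("header", "true")
        .option("maxFilesPerTrigger", 100)  # bound files picked up per micro-batch
        .schema("id INT, value STRING")     # streaming file sources require an explicit schema
        .load(path)
    )
```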

2 More Replies
Bkr-dbricks
by New Contributor II
  • 429 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks free Edition to Azure Connectivity

Hello everyone, as a beginner in Databricks, I have a question: can we connect Databricks Free Edition to Azure Blob / ADLS Gen2 storage? I would like to create external tables on files in Azure and Delta Lake tables on top of them. Your help is apprec...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Bkr-dbricks, according to the following topic, Free Edition doesn't support external locations: Solved: If use databricks free version not free trail can ... - Databricks Community - 127421

  • 0 kudos
kivaniutenko
by New Contributor
  • 732 Views
  • 1 replies
  • 1 kudos

HTML Formatting Issue in Databricks Alerts

Hello everyone, I have recently encountered an issue with HTML formatting in custom templates for Databricks Alerts. Previously, the formatting worked correctly, but now the alerts display raw HTML instead of properly rendered content. For example, an ...

Latest Reply
mmayorga
Databricks Employee
  • 1 kudos

Hi @kivaniutenko, thanks for reaching out. Databricks alerts still support basic HTML in email templates, but HTML will render correctly only for email destinations and only with simple, allowed tags. Quick things to try: make sure you are using Ale...

SparkMan
by Databricks Partner
  • 705 Views
  • 2 replies
  • 2 kudos

Resolved! Job Cluster Reuse

Hi, I have a job where a job cluster is reused twice for task A and task C. Between A and C, task B runs for 4 hours on a different interactive cluster. The issue here is that the job cluster doesn't terminate as soon as Task A is completed and sits ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @SparkMan, this is expected behavior with Databricks job cluster reuse unless you change your job/task configuration. Look at the following documentation entry. So with your flow you have something like this: Task A (job cluster) → Task B (interactive c...

1 More Replies
nkrish
by New Contributor II
  • 472 Views
  • 1 replies
  • 1 kudos

Resolved! Regarding Accelerators

Are there any Databricks accelerators to convert C# and QlikView code to PySpark? We are using open-source AI tools to convert now, but wondering if there is a better way to do the same? Thanks in advance.

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @nkrish, unfortunately, I don't think so. The available accelerators can be found here: Databricks Solution Accelerators for Data & AI | Databricks. But I haven't heard anything about an accelerator for C# and QlikView specifically.

deepu1
by New Contributor
  • 617 Views
  • 1 replies
  • 0 kudos

Resolved! DLT Gold aggregation with apply_change

I am building a Gold table using Delta Live Tables (DLT). The Gold table contains aggregated data derived from a Silver table. Aggregation happens monthly. However, the requirement is that only the current (year, month) should be recalculated. Previous mo...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @deepu1 , Assuming that @dlt.table refers to a Materialized View (MV), you are correct that this is the standard way to create aggregated tables in the Gold layer. A Materialized View is essentially a table that stores the results of a specific qu...
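As an alternative to a full materialized-view recompute, one common pattern for this requirement is an incremental MERGE that touches only the current month. A sketch (table names, and the `amount`/`total` columns, are hypothetical):

```python
def current_month_merge_sql(gold: str, silver: str, year: int, month: int) -> str:
    # Build a MERGE that recomputes only the current (year, month) slice of
    # the gold aggregate; rows for earlier months are left untouched.
    return f"""
MERGE INTO {gold} AS g
USING (
  SELECT year, month, SUM(amount) AS total
  FROM {silver}
  WHERE year = {year} AND month = {month}
  GROUP BY year, month
) AS s
ON g.year = s.year AND g.month = s.month
WHEN MATCHED THEN UPDATE SET g.total = s.total
WHEN NOT MATCHED THEN INSERT (year, month, total) VALUES (s.year, s.month, s.total)
""".strip()

# On Databricks this string would be executed via spark.sql(...)
print(current_month_merge_sql("gold.monthly_agg", "silver.events", 2025, 1))
```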

PabloCSD
by Valued Contributor II
  • 790 Views
  • 5 replies
  • 3 kudos

Resolved! How to use/install a driver in Spark Declarative Pipelines (ETL)?

Salutations, I'm using SDP for an ETL that extracts data from HANA and puts it in Unity Catalog. I defined a policy with the needed driver, but I get this error: An error occurred while calling o1013.load. : java.lang.ClassNotFoundException: com.sap...

Latest Reply
anshu_roy
Databricks Employee
  • 3 kudos

At this time, Databricks does not offer native connectors for SAP HANA. You can find the complete list of managed connectors currently available in Databricks here. We generally recommend beginning with SAP’s own commercial tools, prioritizing SAP Bu...

4 More Replies