Data Engineering
Forum Posts

MCosta
by New Contributor III
  • 7066 Views
  • 11 replies
  • 20 kudos

Resolved! Debugging!

Hi ML folks, We are using Databricks to train deep learning models. The code, however, has a complex structure of classes. This would work fine in a perfect bug-free world like Alice in Wonderland. Debugging in Databricks is awkward. We ended up do...

Latest Reply
petern
New Contributor II
  • 20 kudos

Has this been solved yet? Is there a mature way to debug code on Databricks? I'm running into the same kind of issue. The variable explorer and pdb can be used, but it's not really the same...

10 More Replies
DatBoi
by Contributor
  • 913 Views
  • 2 replies
  • 1 kudos

Resolved! How big should a delta table be to benefit from liquid clustering?

My question is pretty straightforward - how big should a delta table be to benefit from liquid clustering? I know the answer will most likely depend on the details of how you are querying the data, but what is the recommendation? I know Databricks re...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@DatBoi Once you watch this video you'll understand more about Liquid Clustering: https://www.youtube.com/watch?v=5t6wX28JC_M&ab_channel=DeltaLake Long story short: I know Databricks recommends not partitioning tables smaller than 1 TB and aiming for 1 GB ...

1 More Replies
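To make the feature concrete, here is a minimal sketch of what enabling liquid clustering looks like. The table, schema, and column names are hypothetical, and the SQL would be run via spark.sql in a Databricks notebook (not executed here):

```python
# Sketch: DDL for a liquid-clustered Delta table (hypothetical names).
create_stmt = """
CREATE TABLE main.default.events (
  event_id BIGINT,
  event_ts TIMESTAMP,
  payload  STRING
)
CLUSTER BY (event_id)
"""

# Clustering keys are applied incrementally; OPTIMIZE triggers (re)clustering.
optimize_stmt = "OPTIMIZE main.default.events"

# In a Databricks notebook (requires a supporting runtime):
# spark.sql(create_stmt)
# spark.sql(optimize_stmt)
```

Unlike Hive-style partitioning, the clustering keys can be changed later without rewriting the table layout by hand, which is part of why the size threshold matters less than with partitions.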
Jagan_etl
by New Contributor II
  • 581 Views
  • 3 replies
  • 0 kudos

Avro file format generation

Hi All, we are using a cluster with the 9.1 runtime version, and I'm getting an "incompatible schema exception" error while writing the data into an Avro file. The Avro schema has more fields than the dataframe output has. I tried the same in Community Edition ...

Latest Reply
Jagan_etl
New Contributor II
  • 0 kudos

Hi All, any suggestions on this?

2 More Replies
BhaveshPatel
by New Contributor
  • 1239 Views
  • 1 replies
  • 0 kudos

Auto loader

Suppose I have thousands of historical .csv files stored since Jan 2022 in a folder of my Azure Blob Storage container. I want to use Auto Loader to read only the files from 1 Oct 2023 onwards, ignoring all the files before this date, to build a pipel...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@BhaveshPatel Three things that you can do:
  • Move the files to a separate folder,
  • Use a filter on metadata fields to filter out the unnecessary files,
  • Use a pathGlobFilter to match only the files you need.

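The second option above can be sketched with Auto Loader's standard reader options. `modifiedAfter` filters on file modification time and `pathGlobFilter` on file name; the storage path and cutoff timestamp below are hypothetical, and the Spark calls are shown commented because they need a Databricks runtime:

```python
# Sketch: Auto Loader options to skip files older than a cutoff date.
autoloader_options = {
    "cloudFiles.format": "csv",
    # Ignore files whose modification time is before 1 Oct 2023
    "modifiedAfter": "2023-10-01T00:00:00.000000Z",
    # Optionally also restrict by file-name pattern
    "pathGlobFilter": "*.csv",
}

# In a Databricks notebook (not executed here):
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("abfss://container@account.dfs.core.windows.net/landing/"))
```

Note that `modifiedAfter` works on the file metadata, so it avoids listing-cost surprises that come with filtering after the read.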
Bharathi7
by New Contributor II
  • 477 Views
  • 3 replies
  • 0 kudos

Python UDF fails with UNAVAILABLE: Channel shutdownNow invoked

I'm using a Python UDF to apply OCR to each row of a dataframe which contains the URL to a PDF document. This is how I define my UDF:

def extract_text(url: str):
    ocr = MyOcr(url)
    extracted_text = ocr.get_text()
    return json.dumps(extracte...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@Bharathi7 It's really hard to determine what's going on without knowing what the MyOcr function actually does. Maybe there's some kind of timeout on the service side? Too many parallel connections?

2 More Replies
nggianno
by New Contributor III
  • 1174 Views
  • 4 replies
  • 0 kudos

How to enable Delta live tables serverless in Databricks?

I am trying to enable Serverless mode in Delta Live Tables, based on the official Databricks YouTube video "Delta Live Tables A to Z: Best practices for Modern Data Pipelines", but I cannot find it in my UI. Could you help me with...

Latest Reply
nggianno
New Contributor III
  • 0 kudos

The problem is that the "Serverless" checkbox does not appear in my UI Pipeline Settings. So, I do not know how to enable serverless given your instructions. Can you tell me why the button is not displayed or how to display it or how to enable DLT se...

3 More Replies
Poovarasan
by New Contributor II
  • 671 Views
  • 2 replies
  • 0 kudos

com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id `2` in colum

Hi, currently I am using the below-mentioned query to create a materialized view. It was working fine until yesterday in the DLT pipeline, but as of today the code throws an error (com.databricks.sql.transaction.tahoe.ColumnMappingE...

Data Engineering
ColumnMapping
dlt
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Poovarasan, The error message you’re encountering, com.databricks.sql.transaction.tahoe.ColumnMappingException: Found duplicated column id 2 in column, indicates that there is a conflict related to column IDs in your query. Let’s break down the ...

1 More Replies
elgeo
by Valued Contributor II
  • 5783 Views
  • 6 replies
  • 4 kudos

Resolved! Data type length enforcement

Hello. Is there a way to enforce the length of a column in SQL? For example that a column has to be exactly 18 characters? Thank you!

Latest Reply
databricks31
New Contributor II
  • 4 kudos

We are facing similar issues while writing into an ADLS location in Delta format; on top of that Delta location we created Unity Catalog tables. Should it be possible to change the data type lengths in the format below, and is that supported by Spark SQL? Azure SQL Spark            ...

5 More Replies
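One common answer to the original question is a Delta CHECK constraint, which rejects writes that violate a predicate such as an exact string length. A minimal sketch, assuming a hypothetical table and column (the SQL would be run via spark.sql in a Databricks notebook):

```python
# Sketch: enforcing an exact column length with a Delta CHECK constraint.
# Table and column names (main.default.accounts, account_id) are hypothetical.
ddl = """
ALTER TABLE main.default.accounts
ADD CONSTRAINT account_id_len CHECK (length(account_id) = 18)
"""

# In a Databricks notebook (not executed here):
# spark.sql(ddl)
# After this, any INSERT/MERGE writing an account_id that is not exactly
# 18 characters fails with a constraint violation.
```

The constraint is enforced at write time, so it does not change the declared column type (which stays STRING) but does guarantee the length invariant for new data.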
NT911
by New Contributor II
  • 476 Views
  • 1 replies
  • 0 kudos

how to reduce file size in sedona o/p

I have shapefiles with polygon/geometry info. I am exporting the file after Sedona integration with Kepler. The output file is in .html, and I want to reduce its size. Please suggest if any option is available.

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @NT911,  When dealing with shape files and trying to reduce the file size, there are a few strategies you can consider: Simplify Geometries: One effective method is to simplify the geometries in your shape file. This involves reducing the numb...

Ajay-Pandey
by Esteemed Contributor III
  • 726 Views
  • 3 replies
  • 7 kudos

docs.databricks.com

Rename and drop columns with Delta Lake column mapping. Hi all, Databricks now supports column rename and drop. Column mapping requires the following Delta protocols: Reader version 2 or above; Writer version 5 or above. Blog URL##Available in D...

Latest Reply
Poovarasan
New Contributor II
  • 7 kudos

The above-mentioned feature is not working in the DLT pipeline if the script has more than 4 columns.

2 More Replies
afisl
by New Contributor II
  • 2790 Views
  • 6 replies
  • 2 kudos

Apply unitycatalog tags programmatically

Hello, I'm interested in the "Tags" feature of columns/schemas/tables of the Unity Catalog (described here: https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/tags). I've been able to play with them by hand and would now lik...

Data Engineering
tags
unitycatalog
Latest Reply
databass
New Contributor II
  • 2 kudos

Just confirming that, as at March 2024, you can use SQL to set/unset tags on:
  • Tables
  • Table Columns
  • Views
But NOT on View Columns; however, you CAN do this via the UI.

5 More Replies
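The SQL route mentioned above can be sketched as follows. The table name, column name, and tag key/value are hypothetical; the statements would be run via spark.sql (or a SQL warehouse) by a user with the required tag permissions:

```python
# Sketch: setting Unity Catalog tags programmatically with SQL.
# Securable and tag names below are hypothetical examples.
tag_statements = [
    # Tag a table
    "ALTER TABLE main.default.customers SET TAGS ('pii' = 'true')",
    # Tag a single column of that table
    "ALTER TABLE main.default.customers "
    "ALTER COLUMN email SET TAGS ('pii' = 'true')",
]

# In a Databricks notebook (not executed here):
# for stmt in tag_statements:
#     spark.sql(stmt)
```

The matching UNSET TAGS form removes a tag by key, so the same loop pattern works for cleanup.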
seydouHR
by New Contributor III
  • 1295 Views
  • 4 replies
  • 0 kudos

Resolved! CLONE not supported on delta table with Liquid Clustering

Hello all,We are building a data warehouse on Unity Catalog and we use the SHALLOW CLONE command to allow folks to spin up their own dev environments by light copying the prod tables. We also started using Liquid Clustering on our feature tables, tho...

Latest Reply
seydouHR
New Contributor III
  • 0 kudos

Thanks Kaniz for your reply. I was able to make it work using runtime 14.0. Regards,

3 More Replies
sumitdesai
by New Contributor II
  • 1042 Views
  • 1 replies
  • 2 kudos

How to reuse a cluster with Databricks Asset bundles

I am using Databricks Asset Bundles as an IaC tool with Databricks. I want to create a cluster using DAB and then reuse the same cluster in multiple jobs. I cannot find an example of this; whatever examples I found have all specified individual...

Latest Reply
Wojciech_BUK
Contributor III
  • 2 kudos

Hello, jobs are specific in Databricks; a job definition also contains the cluster definition, because when you run a job, a new cluster is created based on the cluster specification you provided for the job, and it exists only until the job is complet...

espenol
by New Contributor III
  • 8592 Views
  • 9 replies
  • 10 kudos

input_file_name() not supported in Unity Catalog

Hey, so our notebooks reading a bunch of json files from storage typically use input_file_name() when moving from raw to bronze, but after upgrading to Unity Catalog we get an error message: AnalysisException: [UC_COMMAND_NOT_SUPPORTED] input_file_n...

Latest Reply
JasonThomas
New Contributor III
  • 10 kudos

.withColumn("RECORD_FILE_NAME", col("_metadata.file_name")) will work with spark.read to get the file name, or .withColumn("RECORD_FILE_NAME", col("_metadata.file_path")) to get the whole file path.

8 More Replies
n-riesco
by New Contributor
  • 15337 Views
  • 5 replies
  • 0 kudos

How can I view an exported DBC notebook in my computer?

Is it possible to convert to or export as a .ipynb notebook?

Latest Reply
AlexV
New Contributor II
  • 0 kudos

You can rename somefile.dbc to somefile.zip and open it with the Windows File Explorer; however, the .python files inside cannot be opened directly in VS Code or PyCharm.

4 More Replies
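Since a .dbc export is a zip archive (which is why the rename trick above works), you can also inspect it without renaming anything. A small sketch using only the standard library; the file path is whatever your export is called:

```python
# Sketch: a .dbc export is a zip archive, so the stdlib can list its members
# without renaming the file first.
import zipfile

def list_dbc_members(dbc_path: str) -> list[str]:
    """Return the member file names inside a .dbc archive."""
    with zipfile.ZipFile(dbc_path) as zf:
        return zf.namelist()
```

For conversion to .ipynb, the cleaner route is usually to re-export the notebook from the Databricks workspace UI (or CLI) in IPython/Jupyter format rather than unpacking the .dbc by hand.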