Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

naga93
by New Contributor
  • 1263 Views
  • 1 replies
  • 0 kudos

How to read Delta Lake table with Spaces/Special Characters in Column Names in Dremio

Hello, I am currently writing a Delta Lake table from Databricks to Unity Catalog using PySpark 3.5.0 (15.4 LTS Databricks runtime). We want the EXTERNAL Delta Lake tables to be readable from both UC and Dremio. Our Dremio build version is 25.0.6. The ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi naga93, how are you doing today? As per my understanding, you've done a great job navigating all the tricky parts of Delta + Unity Catalog + Dremio integration! You're absolutely right to set minReaderVersion to 2 and disable deletion vectors to m...

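A minimal sketch of the table properties the reply mentions (reader protocol version 2, deletion vectors off), plus column mapping, which is what permits spaces and special characters in Delta column names; the catalog, schema, table, and column names below are hypothetical placeholders, not taken from the thread.

# `spark` is the built-in SparkSession in a Databricks notebook.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.demo.events (
        `order id` STRING,    -- column name containing a space
        amount DOUBLE
    )
    USING DELTA
    TBLPROPERTIES (
        'delta.columnMapping.mode'    = 'name',   -- required for spaces/special characters in column names
        'delta.minReaderVersion'      = '2',
        'delta.enableDeletionVectors' = 'false'   -- keeps the table readable by external engines such as Dremio
    )
""")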
surajitDE
by New Contributor III
  • 993 Views
  • 1 replies
  • 0 kudos

How can we change from GC to G1GC in serverless

My DLT jobs are experiencing throttling due to the following error message: [GC (GCLocker Initiated GC) [PSYoungGen: 5431990K->102912K(5643264K)] 9035507K->3742053K(17431552K), 0.1463381 secs] [Times: user=0.29 sys=0.00, real=0.14 secs]. I came across s...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi surajitDE, how are you doing today? As per my understanding, you're absolutely right to look into the GC (Garbage Collection) behavior. When you're seeing messages like GCLocker Initiated GC and frequent young-generation collections, it usually means your...

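For reference, a minimal sketch of how G1GC is typically enabled on classic (non-serverless) compute by adding JVM flags to the cluster's Spark config; this is an assumption about a classic-cluster workaround, since serverless compute does not expose these JVM options at all.

spark.driver.extraJavaOptions -XX:+UseG1GC
spark.executor.extraJavaOptions -XX:+UseG1GC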
prasadvaze
by Valued Contributor II
  • 9181 Views
  • 4 replies
  • 6 kudos

Resolved! Limit on number of result rows displayed on databricks SQL UI

The Databricks SQL UI currently limits the query results display to 64,000 rows. When will this limit go away? Using SSMS I get 40 million rows of results in the UI, and my users won't switch to Databricks SQL for this reason.

Latest Reply
User16765136105
New Contributor III
  • 6 kudos

Hi @prasad vaze - We do have a feature in the works that will increase this limit. If you reach out to your Databricks contact, they can give you more details regarding dates and the preview.

3 More Replies
drag7ter
by Contributor
  • 1047 Views
  • 3 replies
  • 0 kudos

Overwriting delta table takes a lot of time

I'm trying simply to overwrite data into a delta table. The table is not really huge: it has 50 million rows and is 1.9 GB in size. For running this code I use various cluster configurations, starting from a 1-node cluster with 64 GB and 16 vCPUs, and I also tried to s...

Latest Reply
thackman
New Contributor III
  • 0 kudos

1) You might need to cache the dataframe so it's not recomputed for the write. 2) What type of cloud storage are you using? We've noticed slow delta writes as well. We are using Azure standard storage, which is backed by spinning disks. It's limited to...

2 More Replies
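A minimal sketch of the caching suggestion in the reply above; the source filter and table names are hypothetical placeholders, not taken from the thread.

# `spark` is the built-in SparkSession in a Databricks notebook.
df = spark.table("main.demo.source").filter("event_date >= '2024-01-01'")  # hypothetical source
df.cache()    # keep the data in memory so the overwrite does not recompute the lineage
df.count()    # materialize the cache before writing

(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("main.demo.target"))  # hypothetical target

df.unpersist()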
PaoloF
by New Contributor II
  • 1035 Views
  • 3 replies
  • 0 kudos

Resolved! Re-Ingest Autoloader files foreachbatch

Hi all, I'm using Auto Loader to ingest files; each file contains changed data from a table and I merge it into a delta table. It works fine. But if I want to re-ingest all the files (deleting the checkpoint location, for example) I need to re-ingest the f...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Glad to help!

2 More Replies
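For context, a minimal sketch of the Auto Loader plus foreachBatch merge pattern the question describes; the paths, table name, file format, and join key are hypothetical placeholders, not taken from the thread.

from delta.tables import DeltaTable

def merge_batch(batch_df, batch_id):
    # Upsert each micro-batch of changed rows into the target Delta table.
    target = DeltaTable.forName(spark, "main.demo.target")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/demo/schemas/source")
    .load("/Volumes/main/demo/landing/source")
    .writeStream
    .foreachBatch(merge_batch)
    .option("checkpointLocation", "/Volumes/main/demo/checkpoints/source")
    .trigger(availableNow=True)
    .start())

Deleting the checkpoint location makes Auto Loader treat every file as new on the next run, which is what forces the full re-ingest the question mentions.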
dkxxx-rc
by Contributor
  • 1345 Views
  • 2 replies
  • 0 kudos

Can't "run all below" - "command is part of a batch that is still running"

Weirdness in Databricks on AWS.  In a notebook that is doing absolutely nothing, I click the "Run All Above" or "Run All Below" button on a cell, and it won't do anything at all except pop up a little message near the general "Run All" button, saying...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @dkxxx-rc! Can you check if any background processes are still running in your notebook that might be interfering with new executions? If you are using Databricks Runtime 14.0 or above, cells run in batches, so any error halts execution, and in...

1 More Replies
Prabakar
by Databricks Employee
  • 3148 Views
  • 1 replies
  • 2 kudos

Accessing the regions that are disabled by default in AWS from Databricks

Accessing the regions that are disabled by default in AWS from Databricks. In AWS we have 4 regions that are disabled by default. You must first enable them before you can create and manage resources. The following Regions are disabled by default: Africa...

Latest Reply
AndreaCuda
New Contributor II
  • 2 kudos

Hello - We are looking to deploy and run Databricks in AWS in Bahrain or the UAE. Is this possible? This post is older, so wondering if this is still a viable option.

JooseSauli
by New Contributor II
  • 1345 Views
  • 3 replies
  • 3 kudos

How to make .py files available for import?

Hello, I've looked around but cannot find an answer. In my Azure Databricks workspace, users have Python notebooks which all make use of the same helper functions and classes. Instead of housing the helper code in notebooks and having %run magics in ...

Latest Reply
JooseSauli
New Contributor II
  • 3 kudos

Hi Brahmareddy, thanks for your reply. Your second approach is quite close to what I already tried earlier. Your post got me to do some more testing, and although I don't know how to set the sys.path via the init script (it says here and here that it'...

2 More Replies
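A minimal sketch of making shared .py modules importable from notebooks by extending sys.path at the top of the notebook; the workspace folder and helper names are hypothetical placeholders, not taken from the thread.

import sys

shared_path = "/Workspace/Shared/helpers"   # hypothetical folder containing utils.py, etc.
if shared_path not in sys.path:
    sys.path.append(shared_path)

from utils import clean_column_names        # hypothetical helper function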
turagittech
by Contributor
  • 1388 Views
  • 2 replies
  • 1 kudos

Resolved! Schema updating with CI/CD development in SQL

Hi all, I am working to resolve how to build tables in a development workspace catalog and then easily migrate the code to a production catalog without manually altering the schema name. For those unaware, you can't have the same catalog names in deve...

Latest Reply
turagittech
Contributor
  • 1 kudos

Thanks for this. Now to work out how much I want to invest in Alembic or dbt. I don't see any reason to go with Liquibase. Still an area for some improvements, actually a lot of improvement. Being able to build manageable, governed data warehouse schem...

1 More Replies
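A minimal sketch of one way to avoid hand-editing catalog or schema references between workspaces: parameterize the catalog name and let the deployment (for example, a job or asset bundle variable) supply it. The catalog, schema, and table names are hypothetical placeholders, not taken from the thread.

# A notebook widget supplies the catalog; the deployment sets it to "dev" or "prod".
dbutils.widgets.text("catalog", "dev")
catalog = dbutils.widgets.get("catalog")

spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {catalog}.sales.orders (
        order_id BIGINT,
        amount DOUBLE
    )
""")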
MDV
by New Contributor III
  • 655 Views
  • 2 replies
  • 0 kudos

Problem with df.first() or collect() when collation different from UTF8_BINARY

I'm getting an error when I want to select the first() or collect() from a dataframe when using a collation different than UTF8_BINARY. Example that reproduces the issue: this works: df_result = spark.sql(f""" SELECT 'en-us' AS ET...

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @MDV, I guess the issue likely comes from how non-default collations like UTF8_LCASE behave during serialization when using first() or collect(). As a workaround, wrap the value in a subquery and re-cast the collation back to UTF8_BINARY before acce...

1 More Replies
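A minimal sketch of the workaround the reply describes, re-casting the collated value back to UTF8_BINARY before collecting it to the driver; the column alias and literal are hypothetical placeholders, not taken from the thread.

df_result = spark.sql("""
    SELECT CAST(etl_language AS STRING COLLATE UTF8_BINARY) AS etl_language
    FROM (
        SELECT 'en-us' COLLATE UTF8_LCASE AS etl_language
    )
""")
print(df_result.first()[0])   # first()/collect() now operate on a UTF8_BINARY value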
21f3001806
by New Contributor III
  • 887 Views
  • 3 replies
  • 1 kudos

Resolved! Dynamic inference tasks in workflows using dabs

I have some workflows where we use dynamic inference to set task values or capture job execution counts or output rows. I can set these dynamic values using the UI, but can I do the same at the time of DABs workflow creation? Can you...

Latest Reply
21f3001806
New Contributor III
  • 1 kudos

Thanks @ashraf1395, I got the idea of what I was looking for.

2 More Replies
bigkahunaburger
by New Contributor II
  • 1174 Views
  • 1 replies
  • 0 kudos

Databricks SQL row limits

Hi there, my dataset is approx 408K rows. I am trying to run a query that will return everything, but the result set seems to stop at 64K rows. I've seen a few posts in here asking about it, but they are several years old and a solution is promised. B...

Latest Reply
Renu_
Valued Contributor II
  • 0 kudos

Hi @bigkahunaburger, the 64K row limit in Databricks SQL applies only to the UI display, not the actual data processing. To access your full dataset, you can use the Download full results option to save the query output, or use Spark or a JDBC/ODBC conne...

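For reference, a minimal sketch of fetching the full result set programmatically instead of through the UI, assuming the databricks-sql-connector package and a SQL warehouse; the hostname, HTTP path, token, and table name are hypothetical placeholders, not taken from the thread.

from databricks import sql

with sql.connect(
        server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # hypothetical workspace URL
        http_path="/sql/1.0/warehouses/abc1234567890def",              # hypothetical warehouse HTTP path
        access_token="dapiXXXXXXXXXXXX") as conn:                      # hypothetical token
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM main.demo.events")
        rows = cur.fetchall()   # returns all rows, not just the 64K shown in the UI

print(len(rows))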
Soufiane_Darraz
by New Contributor II
  • 1341 Views
  • 2 replies
  • 4 kudos

Resolved! Generic pipeline with Databricks workflows with multiple triggers on a single job

A big limitation of Databricks Workflows is that you can’t have multiple triggers on a single job. If you have a generic pipeline using Databricks notebooks and need to trigger it at different times for different sources, there’s no built-in way to h...

Latest Reply
ashraf1395
Honored Contributor
  • 4 kudos

Hi there @Soufiane_Darraz, completely agreed with this point. It becomes frustrating when we cannot use multiple triggers in our workflows. Some examples we use in our Databricks work, or have seen being used in the industry, are - Simple: Using an ...

1 More Replies
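One common workaround (an assumption, not a quote from this thread) is to keep the generic job and create several small scheduled jobs that each trigger it with their own parameters. A minimal sketch using the databricks-sdk; the job ID and parameter names are hypothetical placeholders.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()   # picks up credentials from the environment or a config profile

# Each thin "trigger" job would run something like this with its own parameter values.
w.jobs.run_now(
    job_id=123456789,                                               # hypothetical generic job
    job_parameters={"source": "sales", "load_date": "2024-06-01"},
)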
Anonymous
by Not applicable
  • 7669 Views
  • 9 replies
  • 2 kudos

Resolved! Issue in creating workspace - Custom AWS Configuration

We have tried to create a new workspace using "Custom AWS Configuration", where we provided our own VPC (customer-managed VPC), but the workspace failed to launch. We are getting the below error and couldn't understand where the issue is. Workspace...

Latest Reply
Briggsrr
New Contributor II
  • 2 kudos

Experiencing workspace launch failures with custom AWS configuration is frustrating. The "MALFORMED_REQUEST" error and failed network validation checks suggest a VPC configuration issue. It feels like playing Infinite Craft, endlessly combining eleme...

8 More Replies
minhhung0507
by Valued Contributor
  • 2339 Views
  • 4 replies
  • 2 kudos

Optimizing Spark Read Performance on Delta Tables with Deletion Vectors Enabled

Hi Databricks Experts, I'm currently using Delta Live Tables to generate master data managed within Unity Catalog, with the data stored directly in Google Cloud Storage. I then utilize Spark to read these master data from the GCS bucket. However, I'm ...

Latest Reply
minhhung0507
Valued Contributor
  • 2 kudos

Hi @Louis_Frolio, thanks for your explanation. In case we can't optimize Spark locally to be as fast as Databricks, do you have any suggestions for us to optimize performance in this scenario?

3 More Replies
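One possible angle (an assumption, not something stated in this thread): if external Spark readers are slowed down by deletion vectors, the feature can be disabled on the table and the existing deletion vectors materialized away. The table name below is a hypothetical placeholder.

spark.sql("ALTER TABLE main.demo.master_data SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'false')")
spark.sql("REORG TABLE main.demo.master_data APPLY (PURGE)")   # rewrite files so existing deletion vectors are removed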
