Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Hubert-Dudek
by Esteemed Contributor III
  • 1169 Views
  • 1 reply
  • 1 kudos

Streaming Data Modeling Normalization with Databricks Delta Live Tables

Streamline Data Modeling Normalization with Databricks Delta Live Tables in Just a Few Steps:
- Use the "Apply changes" function to populate tables with slowly changing dimensions using auto-increment IDs.
- Register SQL mapping functions to associate ...
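Below is a minimal sketch of the "Apply changes" step described above, using the DLT Python API; the source feed, table, key, and sequence column names are hypothetical:

    import dlt
    from pyspark.sql.functions import col

    @dlt.table
    def customers_raw():
        # hypothetical CDC feed; replace with your own source
        return spark.readStream.table("cdc.customers_feed")

    dlt.create_streaming_table("customers_scd2")

    dlt.apply_changes(
        target="customers_scd2",
        source="customers_raw",
        keys=["customer_id"],           # hypothetical business key
        sequence_by=col("updated_at"),  # hypothetical ordering column
        stored_as_scd_type=2,           # keep history rows (SCD Type 2)
    )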

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Thank you for sharing this @Hubert-Dudek !!!

BAZA
by New Contributor III
  • 9791 Views
  • 8 replies
  • 2 kudos

Invisible empty spaces when reading .csv files

When importing a .csv file with leading and/or trailing empty spaces around the separators, the output results in strings that appear to be trimmed in the output table or when using .display(), but are not actually trimmed. It is possible to identify t...
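A minimal sketch of the usual fix, assuming a standard CSV read (the path and column name are hypothetical): the CSV reader preserves whitespace around separators unless told otherwise, so either set the whitespace options or trim explicitly.

    from pyspark.sql.functions import trim, col

    df = (spark.read.format("csv")
          .option("header", "true")
          .option("ignoreLeadingWhiteSpace", "true")   # strip spaces before values
          .option("ignoreTrailingWhiteSpace", "true")  # strip spaces after values
          .load("/mnt/raw/example.csv"))               # hypothetical path

    # Alternative: trim a column after reading
    df = df.withColumn("some_col", trim(col("some_col")))  # hypothetical column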

Latest Reply
Raluka
New Contributor III
  • 2 kudos

Thank you so much for helping me.

Nico1
by New Contributor II
  • 12198 Views
  • 11 replies
  • 2 kudos

Resolved! Problems connecting Simba ODBC with an M1 MacBook Pro

Hi, is there a way to make the Simba ODBC Driver work on M1 MacBook Pros? I was able to run it easily on an old Intel MacBook, but now every time I even test the connection with the iODBC Manager, it fails. Definitely, the issue is around no...

Latest Reply
kunalmishra9
New Contributor III
  • 2 kudos

Things seem to be mostly working for me now. I've added a bit more detail on my connection steps and process in case it's helpful for anyone on Stack Overflow: https://stackoverflow.com/questions/76407426/connecting-rstudio-desktop-to-databricks-comm...

DanBrown
by New Contributor
  • 1566 Views
  • 0 replies
  • 0 kudos

Remove WHERE 1=0

I am hoping someone can help me remove the WHERE 1=0 that is constantly getting added onto the end of my query (see below). Please let me know if I can provide more info here. This is running in a notebook, in Azure Databricks, against a cluster that has...
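One possible explanation, assuming the query goes through Spark's JDBC reader (all connection details below are hypothetical): before fetching any data, Spark runs the query once wrapped as SELECT * FROM (<query>) WHERE 1=0 to infer the result schema. That zero-row probe is what shows up on the database side, and it is separate from the actual data-fetch query.

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://myhost:1433;database=mydb")  # hypothetical
          .option("query", "SELECT id, amount FROM dbo.sales")          # hypothetical
          .option("user", "my_user")                                    # hypothetical
          .option("password", dbutils.secrets.get("scope", "pw"))       # hypothetical
          .load())
    # The WHERE 1=0 statement seen in database logs is the schema probe above,
    # not the query that actually reads the rows.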

zak_k
by New Contributor III
  • 4227 Views
  • 5 replies
  • 1 kudos

com.databricks.spark.safespark.UDFException: UNAVAILABLE: Channel shutdownNow invoked

Trying to determine a root cause of a UDFException that occurs when returning a variable-length ArrayType. If I hardcode the data returned from the UDF to a fixed length, say 19, the error does not occur. Setup code: split_runs_UDF = udf(split_runs_udf, ...
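For context, a minimal sketch of the shape being described, with a hypothetical function body: a Python UDF declared to return a variable-length array. Note that if the UDF body can loop forever for some input (the corner case found in the reply below), a channel shutdown error can surface instead of a result.

    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import ArrayType, IntegerType

    def split_runs_udf(values):
        # hypothetical logic; must return a list (of any length) or None per row
        return list(values or [])

    split_runs_UDF = udf(split_runs_udf, ArrayType(IntegerType()))
    # given an existing DataFrame df with a "raw_values" column (hypothetical)
    df = df.withColumn("runs", split_runs_UDF(col("raw_values")))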

Latest Reply
zak_k
New Contributor III
  • 1 kudos

After further investigation, it reproduces slightly differently in single-user mode. Single-user mode: runs forever. Shared: gives the above message. I've determined that there was a corner case in the dataset which led to the UDF never returning. I am as...

RiyuLite
by New Contributor III
  • 2189 Views
  • 0 replies
  • 0 kudos

How to retrieve cluster IDs of a deleted All Purpose cluster?

I need to retrieve the event logs of deleted All Purpose clusters of a certain workspace. The Databricks list API ({workspace_url}/api/2.0/clusters/list) provides me with the list of all active/terminated clusters, but not the clusters that are deleted. I ...
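A sketch of one avenue worth trying, assuming you still have the cluster IDs recorded somewhere (the host, token, and cluster ID below are hypothetical): the Clusters Events endpoint takes an explicit cluster_id rather than enumerating clusters, so it may return event history that /clusters/list no longer surfaces. Whether events are retained after permanent deletion is not guaranteed.

    import requests

    host = "https://<workspace_url>"    # hypothetical
    token = "<personal_access_token>"   # hypothetical
    resp = requests.post(
        f"{host}/api/2.0/clusters/events",
        headers={"Authorization": f"Bearer {token}"},
        json={"cluster_id": "0123-456789-abcdefgh", "limit": 50},  # hypothetical ID
    )
    print(resp.json())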

miiaramo
by New Contributor II
  • 2491 Views
  • 2 replies
  • 1 kudos

DLT current channel uses same runtime as the preview channel

Hi, according to the latest release notes, the current channel of DLT should be using Databricks Runtime 11.3 and the preview channel should be using 12.2. The current channel was still using the correct runtime version 11.3 yesterday morning, but since ...

Latest Reply
adriennn
Valued Contributor
  • 1 kudos

I'm seeing the same issue with 12 current / 13 preview. Updating the channel didn't bump the runtime version and even creating a pipeline with the preview channel uses the current version.

Databricks143
by New Contributor III
  • 2783 Views
  • 4 replies
  • 0 kudos

Correlated column is not allowed in non-equality predicate in SQL UDF

Hi Team, I am new to Databricks and currently working on creating SQL UDFs in Databricks. In the UDF we are calculating a min date, and that date column is also used in the WHERE clause. While running the UDF we get: Correlated column is not allowed in non predica...
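The post is cut off, but this restriction typically fires when a correlated subquery compares an outer column (or the UDF's parameter) with something other than equality. A hedged sketch of a join-based rewrite that avoids the correlated subquery entirely; all table and column names are hypothetical:

    spark.sql("""
        -- Instead of a correlated subquery such as
        --   WHERE d.event_date >= (SELECT MIN(e.event_date) FROM events e
        --                          WHERE e.id = d.id)
        -- precompute the MIN per key once and join back on equality:
        WITH min_dates AS (
            SELECT id, MIN(event_date) AS min_event_date
            FROM events
            GROUP BY id
        )
        SELECT d.*
        FROM events d
        JOIN min_dates m ON d.id = m.id
        WHERE d.event_date >= m.min_event_date
    """).display()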

Latest Reply
Noopur_Nigam
Databricks Employee
  • 0 kudos

Could you please provide your full code? I would also like to know which DBR version you are using in your cluster.

thomann
by New Contributor III
  • 6973 Views
  • 3 replies
  • 6 kudos

Bug? Unity Catalog incompatible with sparklyr in RStudio (on driver), and also when used on one cluster from multiple notebooks?

If I start an RStudio Server with an in-cluster init script, as described here, on a Unity Catalog cluster, the sparklyr connection fails with an error about a missing Credential Scope. I tried it both in 11.3 LTS and 12.0 Beta. I tried it only in a Persona...

Latest Reply
kunalmishra9
New Contributor III
  • 6 kudos

Have run into this issue as well. Let me know if there was any resolution 

soumyaPattnaik
by New Contributor III
  • 3671 Views
  • 3 replies
  • 6 kudos

How can I customize the Notebook Job # while using the dbutils.notebook.run method?

When running multiple notebooks in parallel using dbutils.notebook.run from a parent notebook, a URL to each running notebook is printed, like below: Notebook job #211371132480519. Is there a way I can print the notebook name or some customized string in...
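A workaround sketch (the auto-printed "Notebook job #..." link text is generated by the platform, so the usual approach is to log your own label around each call); the notebook paths and timeout below are hypothetical:

    from concurrent.futures import ThreadPoolExecutor

    def run_labeled(path):
        print(f"Starting child notebook: {path}")
        result = dbutils.notebook.run(path, 3600)  # 3600s timeout, hypothetical
        print(f"Finished {path}: {result}")
        return result

    notebooks = ["./ingest_orders", "./ingest_customers"]  # hypothetical paths
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_labeled, notebooks))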

Latest Reply
soumyaPattnaik
New Contributor III
  • 6 kudos

Hi @Debayan, thank you for your reply. However, the answer I am looking for is: how to print/get a more meaningful name for the jobs when running multiple notebooks in parallel using dbutils.notebook.run from a parent notebook. Now in the parent notebook...

Leo_138525
by New Contributor II
  • 3653 Views
  • 4 replies
  • 1 kudos

Resolved! RDD not picking up spark configuration for azure storage account access

I want to open some CSV files as an RDD, do some processing, and then load them as a DataFrame. Since the files are stored in an Azure blob storage account, I need to configure the access accordingly, which for some reason does not work when using an RDD...
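A sketch of the usual explanation, with hypothetical storage account, container, and secret names: spark.conf.set(...) applies to the DataFrame/SQL path, while the RDD API reads through the Hadoop configuration, so the account key has to be set there (or as a spark.hadoop.*-prefixed key in the cluster's Spark config).

    # hypothetical storage account and secret scope
    key = "fs.azure.account.key.mystorageacct.blob.core.windows.net"
    secret = dbutils.secrets.get("my-scope", "storage-key")

    # Set on the Hadoop configuration, which the RDD code path actually reads
    spark._jsc.hadoopConfiguration().set(key, secret)

    rdd = spark.sparkContext.textFile(
        "wasbs://mycontainer@mystorageacct.blob.core.windows.net/data/*.csv")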

Latest Reply
Leo_138525
New Contributor II
  • 1 kudos

I decided to load the files into a DataFrame with a single column and then do the processing before splitting it into separate columns, and this works just fine. @Hyper Guy thanks for the link, I didn't try that but it seems like it would resolve the ...

space25
by New Contributor
  • 3061 Views
  • 0 replies
  • 0 kudos

I am trying to use a SQL join to combine 3 tables, but the execution does not go beyond 93 million rows

Hi all, I ran code to join 3 tables in Azure Databricks using SQL. When I run it, it shows "93 million rows read (1GB)" and does not go beyond this. Does anyone know what the issue could be?

JohnJustus
by New Contributor III
  • 5834 Views
  • 3 replies
  • 2 kudos

TypeError: withColumn() takes 3 positional arguments but 4 were given.

Hi All, can someone please help me with the error? This is my small Python code: binmaster = binmasterdf.withColumnRenamed("tag_number","BinKey")\.withColumn("Category", when(length("Bin")==4,'SMALL LOT'),(length("Bin")==7,'RACKING')) TypeError: with...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hi @JohnJustus, if you look closely at .withColumn("Category", when(length("Bin")==4,'SMALL LOT'), when(length("Bin")==7,'RACKING'), otherwise('FLOOR')), withColumn takes 2 parameters: the first parameter is a string and the second is the colum...
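Building on that reply, a corrected sketch of the snippet from the question: withColumn takes the column name plus a single Column expression, so the conditions are chained into one when(...).when(...).otherwise(...) chain (the FLOOR default comes from the reply above).

    from pyspark.sql.functions import when, length, col

    # binmasterdf is the DataFrame from the question
    binmaster = (binmasterdf
        .withColumnRenamed("tag_number", "BinKey")
        .withColumn("Category",
            when(length(col("Bin")) == 4, "SMALL LOT")
            .when(length(col("Bin")) == 7, "RACKING")
            .otherwise("FLOOR")))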

Erik
by Valued Contributor III
  • 11964 Views
  • 3 replies
  • 4 kudos

Liquid clustering with structured streaming pyspark

I would like to try out liquid clustering, but all the examples I see seem to be SQL tables created by selecting from other tables. Our gold tables are written directly from PySpark, e.g. like this: silver_df.writeStream.partitionBy(["...
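A possible workaround sketch, with hypothetical table and path names: declare the target table with liquid clustering once in SQL, then stream into it without partitionBy (a table cannot be both partitioned and liquid-clustered).

    spark.sql("""
        CREATE TABLE IF NOT EXISTS gold.events (
            id BIGINT, event_date DATE, payload STRING
        ) CLUSTER BY (event_date)
    """)

    # silver_df is the streaming DataFrame from the question
    query = (silver_df.writeStream
        .option("checkpointLocation", "/chk/gold_events")  # hypothetical path
        .toTable("gold.events"))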

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

I did not find anything in the docs either.  I suppose a pyspark version will come in the future?

