Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Arihant
by New Contributor
  • 7034 Views
  • 0 replies
  • 0 kudos

Unable to login to Databricks Community Edition

Hello all, I have successfully created a Databricks account and went to log in to Community Edition with the exact same login credentials as my account, but it tells me that the email/password are invalid. I can log in with these same exact credenti...

Databricks143
by New Contributor III
  • 2126 Views
  • 4 replies
  • 0 kudos

Correlated column is not allowed in non predicate in UDF SQL

Hi Team, I am new to Databricks and am currently creating SQL UDFs in Databricks. In the UDF we calculate a min date, and we also use that date column in the WHERE clause. While running the UDF, I get the error: Correlated column is not allowed in non predica...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 0 kudos

Could you please provide your full code? I would also like to know which DBR version you are using in your cluster.

thomann
by New Contributor III
  • 5827 Views
  • 3 replies
  • 6 kudos

Bug? Unity Catalog incompatible with Sparklyr in RStudio (on Driver) and as well if used on one cluster from multiple notebooks?

If I start an RStudio Server with an in-cluster init script, as described here, on a Unity Catalog cluster, the sparklyr connection fails with an error about a missing Credential Scope. I tried it in both 11.3 LTS and 12.0 Beta. I tried it only in a Persona...

Latest Reply
kunalmishra9
New Contributor III
  • 6 kudos

I have run into this issue as well. Let me know if there was any resolution.

soumyaPattnaik
by New Contributor III
  • 3080 Views
  • 3 replies
  • 6 kudos

How can I customize the Notebook Job # while using dbutils.notebook.run method?

When running multiple notebooks in parallel using dbutils.notebook.run from a parent notebook, a URL to each running notebook is printed, like this: Notebook job #211371132480519. Is there a way I can print the notebook name or some customized string in...

Latest Reply
soumyaPattnaik
New Contributor III
  • 6 kudos

Hi @Debayan, thank you for your reply. However, the answer I am looking for is how to print/get a more meaningful name for the jobs when running multiple notebooks in parallel using dbutils.notebook.run from a parent notebook. Now in the parent notebook...

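One way to get meaningful names on the caller side is sketched below. It does not change the "Notebook job #..." link that dbutils.notebook.run prints; instead it keeps a mapping from each future back to its notebook path, so every result or failure is reported under a readable label. It assumes a thread pool in the parent notebook; run_notebook is a hypothetical stand-in that you would replace with dbutils.notebook.run(path, timeout_seconds, arguments) on Databricks, and the notebook paths are made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_notebook(path, timeout_seconds, arguments):
    """Hypothetical stand-in for dbutils.notebook.run.

    On Databricks, replace the body with:
        return dbutils.notebook.run(path, timeout_seconds, arguments)
    """
    return f"ok:{path}"


def run_labeled(paths, runner=run_notebook, max_workers=4, timeout_seconds=600):
    """Run notebooks in parallel and report each outcome under its notebook path."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which notebook each future belongs to.
        futures = {pool.submit(runner, path, timeout_seconds, {}): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
                print(f"[{path}] finished: {results[path]}")
            except Exception as exc:  # report per-notebook failures by name
                results[path] = exc
                print(f"[{path}] failed: {exc}")
    return results


if __name__ == "__main__":
    run_labeled(["/Shared/etl/load_orders", "/Shared/etl/load_customers"])
```

Because results are keyed by path, the parent notebook can also look up a specific child's return value afterwards instead of relying on completion order.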
Leo_138525
by New Contributor II
  • 2809 Views
  • 4 replies
  • 1 kudos

Resolved! RDD not picking up spark configuration for azure storage account access

I want to open some CSV files as an RDD, do some processing, and then load it as a DataFrame. Since the files are stored in an Azure Blob Storage account, I need to configure access accordingly, which for some reason does not work when using an RDD...

Latest Reply
Leo_138525
New Contributor II
  • 1 kudos

I decided to load the files into a DataFrame with a single column and then do the processing before splitting it into separate columns, and this works just fine. @Hyper Guy, thanks for the link; I didn't try that, but it seems like it would resolve the ...

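A likely explanation, and a common workaround: keys set with spark.conf.set are picked up by DataFrame reads, but RDD APIs such as sc.textFile read the SparkContext's Hadoop configuration, so the account key has to be set there as well (or added as a spark.hadoop.-prefixed key in the cluster's Spark config). The sketch below assumes account-key authentication; _jsc is an internal PySpark attribute rather than a public API, and the account name is a placeholder.

```python
def account_key_conf(account_name, endpoint="blob"):
    """Hadoop key for Azure Storage account-key auth.

    endpoint="blob" matches wasbs:// paths; endpoint="dfs" matches abfss:// paths.
    """
    return f"fs.azure.account.key.{account_name}.{endpoint}.core.windows.net"


def enable_rdd_access(spark, account_name, account_key, endpoint="blob"):
    """Set the account key for DataFrame reads and for RDD reads."""
    key = account_key_conf(account_name, endpoint)
    spark.conf.set(key, account_key)  # read by DataFrame readers
    # RDD APIs (sc.textFile and friends) read the Hadoop configuration instead.
    # _jsc is internal to PySpark; this is a workaround, not a documented API.
    spark.sparkContext._jsc.hadoopConfiguration().set(key, account_key)
```

After enable_rdd_access(spark, "<account>", "<key>"), a call such as spark.sparkContext.textFile("wasbs://<container>@<account>.blob.core.windows.net/<path>") should see the credential; the single-column DataFrame approach from the reply avoids the issue entirely.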
space25
by New Contributor
  • 2314 Views
  • 0 replies
  • 0 kudos

I am trying to use SQL join to combine 3 tables but the execution does not go beyond 93 million rows

Hi all, I ran code to join 3 tables in Azure Databricks using SQL. When I run the code, it shows "93 million rows read (1GB)" and does not go beyond this. Does anyone know what the issue could be?

Data Engineering
Azure Databricks SQL Databricks Join
JohnJustus
by New Contributor III
  • 4477 Views
  • 3 replies
  • 2 kudos

TypeError: withColumn() takes 3 positional arguments but 4 were given

Hi all, can someone please help me with this error? This is my small Python code:
binmaster = binmasterdf.withColumnRenamed("tag_number","BinKey")\
.withColumn ("Category", when (length("Bin")==4,'SMALL LOT'),(length("Bin")==7,'RACKING'))
TypeError : with...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 2 kudos

Hi @JohnJustus, if you look closely at .withColumn ("Category", when (length("Bin")==4,'SMALL LOT'), when (length("Bin")==7,'RACKING'), otherwise('FLOOR')), withColumn takes 2 parameters: the first is a string and the second is the colum...

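The error comes from the second condition being passed as a separate argument: withColumn(colName, col) accepts one name and one Column, so the stray (length("Bin")==7,'RACKING') tuple becomes a third argument (four, counting self). A minimal sketch of the usual fix chains every condition into a single expression with when(...).when(...).otherwise(...); it assumes a DataFrame with a string column Bin, as in the post. The _WithColumnSignature class is a hypothetical stand-in used only to reproduce the message without Spark.

```python
def with_category(df):
    """Add Category as ONE chained when/otherwise Column (assumes a "Bin" column)."""
    # Imported inside the function so this module loads without pyspark installed.
    from pyspark.sql import functions as F

    category = (
        F.when(F.length("Bin") == 4, "SMALL LOT")
        .when(F.length("Bin") == 7, "RACKING")
        .otherwise("FLOOR")  # default for every other length
    )
    # withColumn takes exactly two arguments: a column name and a Column.
    return df.withColumn("Category", category)


class _WithColumnSignature:
    """Hypothetical stand-in with the same parameters as DataFrame.withColumn."""

    def withColumn(self, colName, col):
        return colName, col


def reproduce_type_error():
    """Call the stand-in the way the original code called withColumn."""
    try:
        _WithColumnSignature().withColumn(
            "Category", "when(...)", ("length == 7", "RACKING")  # stray third argument
        )
    except TypeError as exc:
        return str(exc)
    return None
```

Applied to the post's code, this would read binmaster = with_category(binmasterdf.withColumnRenamed("tag_number", "BinKey")). Note that otherwise is a method on the Column returned by when, not a standalone function.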
Erik
by Valued Contributor II
  • 9387 Views
  • 3 replies
  • 4 kudos

Liquid clustering with structured streaming pyspark

I would like to try out liquid clustering, but all the examples I see seem to be SQL tables created from selecting from other tables. Our gold tables are pyspark tables written directly to a table, e.g. like this: silver_df.writeStream.partitionBy(["...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

I did not find anything in the docs either. I suppose a PySpark version will come in the future?

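Since the thread found no PySpark writer option for this, one possible workaround is to declare the clustering keys in SQL when the table is created and then point the stream at that existing table with toTable. This is a sketch under assumptions: a Databricks Runtime version that supports liquid clustering and the CREATE TABLE ... CLUSTER BY syntax, plus hypothetical table, column, and checkpoint names. partitionBy is dropped because liquid clustering is not compatible with partitioning.

```python
def clustered_table_ddl(table, columns_ddl, cluster_cols):
    """Build a CREATE TABLE statement that declares liquid clustering keys."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({columns_ddl}) "
        f"CLUSTER BY ({', '.join(cluster_cols)})"
    )


def start_gold_stream(spark, silver_df, table, checkpoint, columns_ddl, cluster_cols):
    """Create the clustered Delta table once, then append the stream into it."""
    spark.sql(clustered_table_ddl(table, columns_ddl, cluster_cols))
    return (
        silver_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint)
        # No partitionBy here: the clustering keys come from the table definition.
        .toTable(table)
    )
```

For example, start_gold_stream(spark, silver_df, "gold.events", "/checkpoints/gold_events", "event_date DATE, device_id STRING, value DOUBLE", ["event_date", "device_id"]) would create the table if needed and start the query; the declared columns must match the stream's schema.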
EDDatabricks
by Contributor
  • 1366 Views
  • 0 replies
  • 1 kudos

Multiple DLT pipelines same target table

Is it possible to have multiple DLT pipelines write data concurrently and in append mode to the same Delta table? Because of different data sources, with different data volumes and required processing, we would like to have different pipelines stream...

Data Engineering
Delta tables
DLT pipeline
MichaelO
by New Contributor III
  • 3350 Views
  • 4 replies
  • 2 kudos

Resolved! Call python image function in pyspark

I have a function for rotating images, written in Python:

from PIL import Image

def rotate_image(image, rotation_angle):
    im = Image.open(image)
    out = im.rotate(rotation_angle, expand = True)
    return out

I now want to use this function as a pyspark ...

Latest Reply
Raluka
New Contributor III
  • 2 kudos

Stock photos, I've come to realize, are the catalysts of imagination. This website's vast reservoir of images new york seal sparks ideas that ripple through my projects. They empower me to envision the previously unimagined, helping me breathe life i...

Lucifer
by New Contributor
  • 726 Views
  • 0 replies
  • 0 kudos

How to get job launch type in notebook

I want to find out in a notebook whether the job was launched by the scheduler or manually. I tried using the JobTriggerType property of the notebook context, but it only returns manual and repair, not scheduled: dbutils.notebook.entry_point.getDbutils().notebo...

ottomes
by New Contributor II
  • 2578 Views
  • 3 replies
  • 0 kudos

What is my subscription plan?

I am working as a data engineer and wanted to check our subscription plan. I would like to know how I can check it. I am "Admin", but I cannot "Manage accounts" on the Databricks workspace portal. This subscription information is pretty i...

Data Engineering
calculation
DBU
Premium
pricing
Standard
Latest Reply
ottomes
New Contributor II
  • 0 kudos

Hey, not really. An update: if you try to add an ACL to a secret scope, you can tell whether your subscription is Enterprise or Standard, because either you succeed, in which case you are working with Enterprise, or the API responds with the Standard subscrip...

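The check described in the reply can be scripted against the Secrets REST API. This minimal sketch only builds and sends a POST to /api/2.0/secrets/acls/put and returns the HTTP status and body for you to interpret; the workspace URL, token, scope, and principal are placeholders. A successful call really does grant the ACL, so use a throwaway scope, and keep in mind that an error response can also come from an authentication or permission problem rather than the plan tier.

```python
import json
import urllib.error
import urllib.request


def build_put_acl_request(workspace_url, token, scope, principal, permission="READ"):
    """Build a POST /api/2.0/secrets/acls/put request (not sent yet)."""
    body = json.dumps(
        {"scope": scope, "principal": principal, "permission": permission}
    ).encode("utf-8")
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + "/api/2.0/secrets/acls/put",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return (HTTP status, response body) without raising."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Non-2xx responses: the body usually says why the call was rejected.
        return err.code, err.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the exact URL and payload before touching a real workspace.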
alonisser
by Contributor
  • 1013 Views
  • 2 replies
  • 0 kudos

Trying to vacuum a table that is constantly being "createdOrReplaced"

and it seems that older data (from the "replaced" table) isn't being removed, long after the retention period. I'd be glad for clues on how to handle this.

Latest Reply
alonisser
Contributor
  • 0 kudos

Thanks, I know that, but while the table history shows 30 days, the actual data size, number of files, and all other indicators correspond to 170 days.

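To see where the extra storage comes from, it can help to compare the current table version with what VACUUM would actually remove. This sketch assumes a Delta table on Databricks and a hypothetical table name: DESCRIBE DETAIL reports numFiles and sizeInBytes for the current version only, and VACUUM ... DRY RUN lists candidate files without deleting them. VACUUM keeps files newer than delta.deletedFileRetentionDuration (7 days by default), while the 30 days of history mentioned in the reply is governed separately by delta.logRetentionDuration.

```python
def vacuum_sql(table, retain_hours=None, dry_run=False):
    """Build VACUUM <table> [RETAIN <n> HOURS] [DRY RUN]."""
    sql = f"VACUUM {table}"
    if retain_hours is not None:
        sql += f" RETAIN {int(retain_hours)} HOURS"
    if dry_run:
        sql += " DRY RUN"
    return sql


def inspect_storage(spark, table, retain_hours=None):
    """Return current-version file stats and the files VACUUM would delete."""
    detail = spark.sql(f"DESCRIBE DETAIL {table}").select("numFiles", "sizeInBytes").first()
    # DRY RUN only lists candidate files; nothing is deleted.
    candidates = spark.sql(vacuum_sql(table, retain_hours, dry_run=True))
    return detail, candidates
```

If sizeInBytes for the current version is far below the storage actually used and the dry run lists few or no files, the remaining data is either still within the retention window or not under the table path VACUUM inspects.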
