Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

space25
by New Contributor
  • 3745 Views
  • 0 replies
  • 0 kudos

I am trying to use SQL join to combine 3 tables but the execution does not go beyond 93 million rows

Hi all, I ran a query to join 3 tables in Azure Databricks using SQL. When I run it, the progress indicator shows "93 million rows read (1GB)" and does not go beyond this. Does anyone know what the issue could be?

Attachments: SQL Join Query.JPG, State.JPG
Data Engineering
Azure Databricks SQL Databricks Join
JohnJustus
by New Contributor III
  • 7133 Views
  • 3 replies
  • 2 kudos

TypeError : withcolumn() takes 3 positional arguments but 4 were given.

Hi All, can someone please help me with this error? This is my small Python code: binmaster = binmasterdf.withColumnRenamed("tag_number","BinKey")\.withColumn("Category", when(length("Bin")==4,'SMALL LOT'),(length("Bin")==7,'RACKING')) TypeError : with...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hi @JohnJustus, if you look closely at .withColumn("Category", when(length("Bin")==4,'SMALL LOT'), when(length("Bin")==7,'RACKING'), otherwise('FLOOR')), withColumn takes 2 parameters: the first parameter is a string and the second is the colum...

2 More Replies
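The fix the reply describes can be sketched as follows. The corrected PySpark call appears in comments (it needs a running cluster to execute), and the chained when()/otherwise() logic is mirrored by a pure-Python helper; the column names come from the post, while the sample category mapping is only what the snippet shows.

```python
# Corrected version of the call from the thread above: withColumn takes exactly
# two arguments (a column name and ONE Column expression), so extra conditions
# are chained with .when()/.otherwise() on the expression itself, not passed as
# additional positional arguments. On a cluster:
#
#   from pyspark.sql.functions import when, length
#   binmaster = (binmasterdf
#       .withColumnRenamed("tag_number", "BinKey")
#       .withColumn("Category",
#           when(length("Bin") == 4, "SMALL LOT")
#           .when(length("Bin") == 7, "RACKING")
#           .otherwise("FLOOR")))
#
# The chained expression evaluates like this pure-Python equivalent:
def categorize(bin_code: str) -> str:
    """Mirror of the when/when/otherwise chain for a single Bin value."""
    if len(bin_code) == 4:
        return "SMALL LOT"
    if len(bin_code) == 7:
        return "RACKING"
    return "FLOOR"
```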
Erik
by Valued Contributor III
  • 13367 Views
  • 3 replies
  • 4 kudos

Liquid clustering with structured streaming pyspark

I would like to try out liquid clustering, but all the examples I see seem to be SQL tables created from selecting from other tables. Our gold tables are pyspark tables written directly to a table, e.g. like this: silver_df.writeStream.partitionBy(["...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

I did not find anything in the docs either.  I suppose a pyspark version will come in the future?

2 More Replies
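A workaround sometimes used for this (an assumption, not confirmed in the thread): pre-create the Delta table with CLUSTER BY in SQL, then stream into it by name with toTable, since writeStream itself had no clusterBy hook at the time of the thread. The table, schema, column, and checkpoint names below are hypothetical.

```python
# Hedged sketch: build the CREATE TABLE ... CLUSTER BY statement up front, run
# it once, then point the structured-streaming write at the table by name.
def cluster_by_ddl(table: str, schema: str, cluster_cols: list) -> str:
    """Build a CREATE TABLE statement that enables liquid clustering."""
    cols = ", ".join(cluster_cols)
    return f"CREATE TABLE IF NOT EXISTS {table} ({schema}) CLUSTER BY ({cols})"

ddl = cluster_by_ddl("gold.events", "id BIGINT, ts TIMESTAMP", ["id"])

# On a cluster you would then run, roughly:
#   spark.sql(ddl)
#   (silver_df.writeStream
#        .option("checkpointLocation", "/chk/gold_events")  # hypothetical path
#        .toTable("gold.events"))  # streams into the pre-clustered table
```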
MichaelO
by New Contributor III
  • 5528 Views
  • 4 replies
  • 2 kudos

Resolved! Call python image function in pyspark

I have a function for rotating images written in Python:

from PIL import Image

def rotate_image(image, rotation_angle):
    im = Image.open(image)
    out = im.rotate(rotation_angle, expand=True)
    return out

I now want to use this function as a PySpark ...

Latest Reply
Raluka
New Contributor III
  • 2 kudos

Stock photos, I've come to realize, are the catalysts of imagination. This website's vast reservoir of images new york seal sparks ideas that ripple through my projects. They empower me to envision the previously unimagined, helping me breathe life i...

3 More Replies
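One way to wrap the rotate function from the post as a PySpark UDF is to operate on image bytes rather than file paths, so it composes with the binaryFile data source. This is a sketch; the column and path names in the comments are assumptions, not from the thread.

```python
# Sketch: a bytes-in/bytes-out version of rotate_image that can be registered
# as a UDF. PIL is imported lazily inside the function so each executor
# resolves it on its own.
import io

def rotate_image_bytes(data: bytes, rotation_angle: float) -> bytes:
    """Rotate PNG/JPEG bytes by rotation_angle degrees and return PNG bytes."""
    from PIL import Image
    im = Image.open(io.BytesIO(data))
    out = im.rotate(rotation_angle, expand=True)
    buf = io.BytesIO()
    out.save(buf, format="PNG")
    return buf.getvalue()

# On a cluster you would register and apply it along these lines:
#   from pyspark.sql.functions import udf, col, lit
#   from pyspark.sql.types import BinaryType
#   rotate_udf = udf(rotate_image_bytes, BinaryType())
#   images = spark.read.format("binaryFile").load("/mnt/images/")  # hypothetical path
#   rotated = images.withColumn("rotated", rotate_udf(col("content"), lit(90.0)))
```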
Lucifer
by New Contributor
  • 1138 Views
  • 0 replies
  • 0 kudos

How to get job launch type in notebook

I want to get the job launch status in a notebook: whether it was launched by the scheduler or manually. I tried using the JobTriggerType property of the notebook context, but it gives only manual and repair, not scheduled. dbutils.notebook.entry_point.getDbutils().notebo...

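Two approaches commonly suggested for this (both assumptions, not confirmed in the thread): parse the notebook context JSON for a trigger tag, and fall back to the Jobs API runs/get endpoint, whose trigger field distinguishes PERIODIC (scheduled) from ONE_TIME runs. The context shape below is hypothetical; the real blob comes from dbutils.notebook.entry_point.getDbutils().notebook().getContext().safeToJson().

```python
# Hedged sketch: extract a trigger tag from a notebook-context JSON blob.
# The key name "jobTriggerType" and the nesting under "attributes" are
# assumptions that vary by runtime version; fall back to the Jobs API
# (GET /api/2.1/jobs/runs/get, "trigger" field) when the tag is absent.
import json

def launch_type_from_context(context_json: str) -> str:
    """Return the trigger tag from a notebook-context JSON blob, or 'unknown'."""
    ctx = json.loads(context_json)
    tags = ctx.get("attributes", {}) or {}
    return tags.get("jobTriggerType", "unknown")

# Example blob (shape is an assumption for illustration):
sample = json.dumps({"attributes": {"jobTriggerType": "manual"}})
```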
ottomes
by New Contributor II
  • 5256 Views
  • 3 replies
  • 0 kudos

What is my subscription plan?

I am working as a data engineer and would like to know how I can check our subscription plan. I am an "Admin" but I cannot "Manage accounts" on the Databricks workspace portal. This subscription information is pretty i...

Data Engineering
calculation
DBU
Premium
pricing
Standard
Latest Reply
ottomes
New Contributor II
  • 0 kudos

Hey, not really. An update for you: if you try to add an ACL to a secret scope, you can be sure whether your subscription is Enterprise or Standard, because either you succeed, meaning you are working with Enterprise, or the API responds with the Standard subscrip...

2 More Replies
alonisser
by Contributor II
  • 2086 Views
  • 2 replies
  • 0 kudos

Trying to vacuum a table that is constantly being "createdOrReplaced"

and it seems that older data (from the "replaced" table) isn't being removed long after the retention period. I'd be glad for clues on how to handle this.

Latest Reply
alonisser
Contributor II
  • 0 kudos

Thanks, I know that, but the table history shows 30 days, while the actual data size, the number of files, and all the other indicators correlate to 170 days.

1 More Replies
Rishitha
by New Contributor III
  • 9180 Views
  • 3 replies
  • 2 kudos

Resolved! DLT pipeline

Hi all! I have a question about setting a target schema: how do I set different targets for 2 different tables in the same Delta Live Tables pipeline? We have 2 target schemas in a database, Bronze_chema and silver_schema. The pipeline has a streaming ra...

Latest Reply
Rishitha
New Contributor III
  • 2 kudos

Thanks again @btafur Hoping for this feature to release soon!

2 More Replies
Jayanth746
by New Contributor III
  • 22071 Views
  • 9 replies
  • 4 kudos

Kafka unable to read client.keystore.jks.

Below is the error we received when trying to read the stream: Caused by: kafkashaded.org.apache.kafka.common.KafkaException: Failed to load SSL keystore /dbfs/FileStore/Certs/client.keystore.jks Caused by: java.nio.file.NoSuchFileException: /dbfs...

Latest Reply
mwoods
New Contributor III
  • 4 kudos

Ok, scrub that - the problem in my case was that I was using the 14.0 databricks runtime, which appears to have a bug relating to abfss paths here. Switching back to the 13.3 LTS release resolved it for me. So if you're in the same boat finding abfss...

8 More Replies
mwoods
by New Contributor III
  • 12187 Views
  • 2 replies
  • 1 kudos

Resolved! Spark readStream kafka.ssl.keystore.location abfss path

Similar to https://community.databricks.com/t5/data-engineering/kafka-unable-to-read-client-keystore-jks/td-p/23301 - the documentation (https://learn.microsoft.com/en-gb/azure/databricks/structured-streaming/kafka#use-ssl-to-connect-azure-databricks...

Latest Reply
mwoods
New Contributor III
  • 1 kudos

@Retired_mod - quick update: managed to find the cause. It's neither of the above; it's a bug in the Databricks 14.0 runtime. I had switched back to the 13.3 LTS runtime, and that is what caused the error to disappear. As soon as I try to read directl...

1 More Replies
Jozhua
by New Contributor
  • 2794 Views
  • 0 replies
  • 0 kudos

Spark streaming auto loader wildcard not working

Need some help with an issue loading a subdirectory from an S3 bucket using Auto Loader. For example: S3://path1/path2/databases*/paths/ In databases there are various versions of databases. For example path1/path2/database_v1/sub_path/*.parquet path1/path...

Data Engineering
autoloader
S3
wildcard
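For reference, Auto Loader accepts glob patterns in the load path, and the pathGlobFilter option can filter file names separately; whether a wildcard on an intermediate directory level is honored depends on the source setup, so treat the sketch below as an assumption rather than a confirmed fix. The paths echo the (truncated) ones from the post.

```python
# Hedged sketch: wildcard in the directory portion of the path plus
# pathGlobFilter for the files. On a cluster:
#
#   df = (spark.readStream.format("cloudFiles")
#           .option("cloudFiles.format", "parquet")
#           .option("pathGlobFilter", "*.parquet")
#           .load("s3://path1/path2/database_*/sub_path/"))
#
# Locally, the intended directory-level match can be previewed with fnmatch
# (illustration only: fnmatch's * also crosses "/", unlike some cloud globs).
from fnmatch import fnmatch

def matches_database_dir(key: str) -> bool:
    """Check whether an object key falls under a database_* subdirectory."""
    return fnmatch(key, "path1/path2/database_*/sub_path/*.parquet")
```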
Sahha_Krishna
by New Contributor
  • 2014 Views
  • 1 replies
  • 0 kudos

Unable to start Cluster in Databricks because of `BOOTSTRAP_TIMEOUT`

Unable to start the cluster in AWS-hosted Databricks because of the below reason: { "reason": { "code": "BOOTSTRAP_TIMEOUT", "parameters": { "databricks_error_message": "[id: InstanceId(i-0634ee9c2d420edc8), status: INSTANCE_INITIALIZIN...

Latest Reply
Harrison_S
Databricks Employee
  • 0 kudos

Hi Sahha, it may be a DNS issue if that wasn't rolled back. Can you check the troubleshooting guide in the documentation and see if these configurations were rolled back as well? https://docs.databricks.com/en/administration-guide/cloud-configurations/aw...

vlado101
by New Contributor II
  • 4011 Views
  • 0 replies
  • 0 kudos

A way to run OPTIMIZE, VACUUM, ANALYZE on all schemas and tables

Hello everyone, I am not sure if this was asked before, but I am trying to find a way to create one Python (or Scala) script that would take a list of all the schemas and then run OPTIMIZE, VACUUM, and ANALYZE TABLE on each of their tables. I see a lot of web...

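A minimal sketch of such a script, assuming schema.table naming and the default VACUUM retention: build the maintenance statements per table with a pure-Python helper, then execute each one with spark.sql on a cluster.

```python
# Hedged sketch: generate OPTIMIZE / VACUUM / ANALYZE statements for one table.
# Schema and table names in the example are hypothetical.
def maintenance_statements(schema: str, table: str) -> list:
    """Return the three maintenance statements for schema.table."""
    fq = f"{schema}.{table}"
    return [
        f"OPTIMIZE {fq}",
        f"VACUUM {fq}",  # uses the default 7-day retention window
        f"ANALYZE TABLE {fq} COMPUTE STATISTICS",
    ]

# On a cluster, roughly:
#   for db in spark.catalog.listDatabases():
#       for t in spark.catalog.listTables(db.name):
#           for stmt in maintenance_statements(db.name, t.name):
#               spark.sql(stmt)
```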