Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

yopbibo
by Contributor II
  • 20023 Views
  • 4 replies
  • 4 kudos

How can I connect to an Azure SQL db from a Databricks notebook?

I know how to do it with Spark, and read/write tables (like https://docs.microsoft.com/en-gb/azure/databricks/data/data-sources/sql-databases#python-example ). But this time, I need to update only one field of a specific row in a table. I do not think I ...

Latest Reply
yopbibo
Contributor II
  • 4 kudos

Thanks for the link. I may be wrong, but they describe how to connect with Spark. They do not provide a connection engine that we could use directly (like with pyodbc) or an engine that we could use in pandas, for example.

  • 4 kudos
3 More Replies
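
For a single-row update outside Spark, the thread mentions pyodbc as one possible route. The following is a minimal sketch, assuming pyodbc and the Microsoft ODBC Driver 18 for SQL Server are installed on the cluster. The table `dbo.orders` and its `status` and `id` columns are hypothetical placeholders, as are both function names.

```python
def build_connection_string(server: str, database: str, user: str, password: str) -> str:
    """Assemble an ODBC connection string for Azure SQL (driver name is an assumption)."""
    return (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={server};DATABASE={database};"
        f"UID={user};PWD={password};"
        "Encrypt=yes;TrustServerCertificate=no;"
    )


def update_one_field(conn_str: str, row_id: int, new_status: str) -> int:
    """Update one column of one row and return the number of affected rows."""
    import pyodbc  # imported lazily so the helper above works without the driver

    # Hypothetical table and columns; '?' placeholders keep the statement parameterized.
    sql = "UPDATE dbo.orders SET status = ? WHERE id = ?"
    conn = pyodbc.connect(conn_str)
    try:
        cursor = conn.cursor()
        cursor.execute(sql, new_status, row_id)
        affected = cursor.rowcount
        conn.commit()
        return affected
    finally:
        conn.close()
```

In a notebook the password would more likely come from a secret scope than from a literal string.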
sunnyday
by New Contributor
  • 879 Views
  • 0 replies
  • 0 kudos

Naming jobs in the Spark UI in Databricks Runtime 15.4

I am asking almost the same question as: https://community.databricks.com/t5/data-engineering/how-to-improve-spark-ui-job-description-for-pyspark/td-p/48959 . I would like to know how to improve the readability of the Spark UI by naming jobs. I am...

prathameshJoshi
by New Contributor III
  • 2728 Views
  • 7 replies
  • 6 kudos

Resolved! How to obtain the server URL for using Spark's REST API

Hi, I want to access the stage and job information (usually available through the Spark UI) through the REST API provided by Spark: http://<server-url>:18080/api/v1/applications/[app-id]/stages. More information can be found at the following link: https://spa...

Latest Reply
prathameshJoshi
New Contributor III
  • 6 kudos

Hi @Retired_mod and @menotron, thanks a lot; your solutions are working. I apologise for the delay, as I had some issues logging in.

  • 6 kudos
6 More Replies
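
The URL pattern quoted in the question can be assembled and queried with the standard library. This is a minimal sketch: the function names are hypothetical, and how the server URL is obtained on Databricks is not shown in this excerpt, so `server_url` and `app_id` are inputs you have to supply yourself.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def stages_url(server_url: str, app_id: str, port: int = 18080) -> str:
    """Build the stages endpoint from the URL pattern quoted in the post."""
    return f"http://{server_url}:{port}/api/v1/applications/{quote(app_id, safe='')}/stages"


def fetch_stages(server_url: str, app_id: str) -> list:
    """Fetch and decode the stage list; needs a reachable server at that address."""
    with urlopen(stages_url(server_url, app_id), timeout=10) as resp:
        return json.load(resp)
```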
doodateika
by New Contributor III
  • 1486 Views
  • 4 replies
  • 1 kudos

Resolved! How to execute stored procedures on a Synapse SQL pool from Databricks

In the current version of Databricks, previous methods to execute stored procedures seem to fail. Calling spark.sparkContext._gateway.jvm.java.sql.DriverManager or spark._sc._gateway.jvm.java.sql.DriverManager returns a message that it is JVM dependent and will not work....

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Can you create a connection to external data in Unity Catalog, and then run: use <connectiondb>; exec <sp>

  • 1 kudos
3 More Replies
pinaki1
by New Contributor III
  • 374 Views
  • 1 replies
  • 0 kudos

Performance improvement of Databricks Spark job

Hi, I need to improve the performance of a Databricks job in my project. Here are some of the steps performed in the project:
1. Read small CSV/JSON files (100 MB, 50 MB) from multiple locations in S3.
2. Write the data to the bronze layer in delta/parquet form...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

In case of performance issues, always look for 'expensive' operations, mainly wide operations (shuffle) and collecting data to the driver. Start with checking how long the bronze part takes, then silver, etc. Pinpoint where it starts to get slow, then d...

  • 0 kudos
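
The advice to time each layer and find where the job slows down could be sketched as below. The helper names `timed` and `slowest_stage` are hypothetical. Because Spark evaluates lazily, each timed block would need to end in an action (such as the write) for the measurement to mean anything.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage_name):
    """Record wall-clock seconds spent inside the with-block under stage_name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage_name] = time.perf_counter() - start


def slowest_stage(results):
    """Return the name of the stage with the largest recorded duration."""
    return max(results, key=results.get)
```

Usage would look like `with timed("bronze"): ...` around the bronze read and write, then the same for silver, followed by `slowest_stage(timings)`.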
BricksGuy
by New Contributor III
  • 1419 Views
  • 7 replies
  • 0 kudos

Watermark error while joining multiple stream tables

I am creating an ETL pipeline where I read multiple stream tables into temp tables, and at the end I try to join those tables to feed the output into another live table. For that I am using the method below, where I am giving a list of tables as...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

It is necessary for the join, so if the DataFrame has a watermark, that's enough. No need to define it multiple times.

  • 0 kudos
6 More Replies
SrinuM
by New Contributor III
  • 307 Views
  • 0 replies
  • 0 kudos

Workspace Client dbutils issue

host = "https://adb-xxxxxx.xx.azuredatabricks.net"
token = "dapxxxxxxx"
We are using Databricks Connect.
from databricks.sdk import WorkspaceClient
dbutil = WorkspaceClient(host=host, token=token).dbutils
files = dbutil.fs.ls("abfss://container-name@storag...

emorgoch
by New Contributor II
  • 9010 Views
  • 1 replies
  • 0 kudos

Passing variables from Python to SQL in a notebook using serverless compute

I've written a notebook that executes some Python code to parse the workspace ID, figure out which of my environments I'm in, and set a value for it. I then want to take that value and pass it through to a code block of...

Latest Reply
emorgoch
New Contributor II
  • 0 kudos

Thanks Kaniz, this is a great suggestion. I'll look into it and how it can work for my projects.

  • 0 kudos
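
The suggestion being thanked here is not visible in this excerpt. One way to carry a Python value into SQL is a parameterized `spark.sql` call, sketched below on the assumption that the runtime's `spark.sql` accepts the `args` keyword (added in PySpark 3.4). The workspace IDs, the table `config_table`, and the function names are hypothetical.

```python
# Hypothetical mapping from workspace ID to environment name.
WORKSPACE_ENVIRONMENTS = {
    "1111111111111111": "dev",
    "2222222222222222": "prod",
}


def environment_for(workspace_id: str) -> str:
    """Resolve the environment name; fail loudly on an unknown workspace."""
    try:
        return WORKSPACE_ENVIRONMENTS[workspace_id]
    except KeyError:
        raise ValueError(f"Unknown workspace id: {workspace_id}") from None


def rows_for_environment(spark, workspace_id: str):
    """Run a parameterized query; :env is bound from the Python value."""
    env = environment_for(workspace_id)
    # Placeholder table name; named parameter markers bind values, not identifiers.
    return spark.sql(
        "SELECT * FROM config_table WHERE environment = :env",
        args={"env": env},
    )
```

Binding the value as a parameter avoids building the SQL text by string concatenation.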
tariq
by New Contributor III
  • 4889 Views
  • 5 replies
  • 1 kudos

SqlContext in DBR 14.3

I have a Databricks workspace in GCP and I am using the cluster with the Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am trying to set the checkpoint directory location using the following command in a notebook: spark.sparkContext.set...

Latest Reply
Dave1967
New Contributor III
  • 1 kudos

Has this been resolved? I am encountering the same issue with df.rdd.getNumPartitions()

  • 1 kudos
4 More Replies
MichaelO
by New Contributor III
  • 2746 Views
  • 1 replies
  • 0 kudos

Terminating cluster programmatically

Is there a Python script that lets me terminate (not delete) a cluster from a notebook, similar to this R equivalent: terminate_cluster(cluster_id, workspace, token = NULL, verbose = T, ...)
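
One possible Python counterpart is a direct call to the Clusters REST API, where the `clusters/delete` endpoint terminates a cluster and `clusters/permanent-delete` removes it for good. This is a minimal sketch using only the standard library; the function names are hypothetical, and the host, token, and cluster ID are values you would supply.

```python
import json
from urllib.request import Request, urlopen


def build_terminate_request(host: str, token: str, cluster_id: str) -> Request:
    """Build the POST request for clusters/delete, which terminates the cluster."""
    return Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/delete",
        data=json.dumps({"cluster_id": cluster_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def terminate_cluster(host: str, token: str, cluster_id: str) -> int:
    """Send the request and return the HTTP status code."""
    with urlopen(build_terminate_request(host, token, cluster_id), timeout=30) as resp:
        return resp.status
```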

lbdatauser
by New Contributor II
  • 314 Views
  • 0 replies
  • 0 kudos

dbx with serverless clusters

With dbx, is it impossible to create tasks that run on serverless clusters? Is it necessary to use Databricks bundles for it?
https://dbx.readthedocs.io/en/latest/reference/deployment/
https://learn.microsoft.com/en-us/azure/databricks/jobs/run-serverl...

PraveenReddy21
by New Contributor III
  • 1152 Views
  • 7 replies
  • 2 kudos

Resolved! I created an external database but am unable to transfer a table to the storage account (Blob container, Gold)

Hi, I have completed the Bronze and Silver activities. After that I tried to save a table to the Gold container, but I am unable to store it. I created an external database. I want to store the data as PARQUET, but that is not supported, only DELTA. Only MANAGED LOCATION is supported, but unabl...

Latest Reply
PraveenReddy21
New Contributor III
  • 2 kudos

Thank you, Rishabh.

  • 2 kudos
6 More Replies
Filippo
by New Contributor
  • 497 Views
  • 0 replies
  • 0 kudos

Issue with View Ownership Reassignment in Unity Catalog

Hello, it appears that the ownership rules for views and functions in Unity Catalog do not align with the guidelines provided in the “Manage Unity Catalog object ownership” documentation on Microsoft Learn. When attempting to reassign the ownership of ...

KosmaS
by New Contributor III
  • 783 Views
  • 1 replies
  • 0 kudos

Skewness / Salting with countDistinct

Hey everyone, I am experiencing data skew for:
df = (source_df
    .unionByName(source_df.withColumn("region", lit("Country")))
    .groupBy("zip_code", "region", "device_type")
    .agg(countDistinct("device_id").alias("total_active_unique"),
         count("device_id").a...

Latest Reply
KosmaS
New Contributor III
  • 0 kudos

Hey @Retired_mod, thanks for the reply. I spent some time on your response. You're suggesting 'double aggregation', and I'd guess it should look more or less like this:
df = (source_df
    .unionByName(source_df.withColumn("region", lit("Cou...

  • 0 kudos
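
The suggested reply is not visible here, so the following is only one common reading of 'double aggregation' for a distinct count, written in plain Python to show the logic rather than as PySpark. The salt is derived from `device_id`, so every device lands in exactly one bucket and the partial distinct counts can simply be summed. `NUM_SALTS` and both function names are assumptions for illustration.

```python
import zlib
from collections import defaultdict

NUM_SALTS = 8  # assumed number of salt buckets


def salt_of(device_id: str) -> int:
    """Derive the salt from the device id so each device maps to exactly one bucket."""
    return zlib.crc32(device_id.encode("utf-8")) % NUM_SALTS


def double_aggregate(rows):
    """rows: iterable of (group_key, device_id). Returns {group_key: (distinct, total)}."""
    # Stage 1: partial aggregation per (group_key, salt).
    partial_sets = defaultdict(set)
    partial_counts = defaultdict(int)
    for key, device_id in rows:
        bucket = (key, salt_of(device_id))
        partial_sets[bucket].add(device_id)
        partial_counts[bucket] += 1

    # Stage 2: sum the partial results per group_key.
    distinct = defaultdict(int)
    total = defaultdict(int)
    for (key, salt), devices in partial_sets.items():
        distinct[key] += len(devices)
        total[key] += partial_counts[(key, salt)]
    return {key: (distinct[key], total[key]) for key in distinct}
```

In Spark terms, stage 1 would correspond to a groupBy on the original keys plus the salt column, and stage 2 to a groupBy on the original keys alone. A random salt would not work for the distinct count, because the same device could then be counted in several buckets.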
