Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

ConfusedZebra
by New Contributor III
  • 1995 Views
  • 3 replies
  • 0 kudos

HTML form within notebooks

Hi all, I'm trying to make a small form in Databricks notebooks. I can't currently use Apps, so I want an interim solution. I can successfully make the form using HTML, and it displays correctly, but I cannot extract the values or use them, e.g. a form with thr...

Latest Reply
ConfusedZebra
New Contributor III
  • 0 kudos

Thanks both. Notebooks are a little too intimidating for some users, so we are trying to make them look and feel a bit more like what those users are used to. Ideally we would build an app, but Apps aren't available in our area yet, so we need an interim solut...
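As an interim alternative to raw HTML (whose input values can't be read back from `displayHTML` output), Databricks widgets behave like a simple form whose values notebook code can read. A minimal sketch, assuming illustrative field names; `dbutils` is only available on a Databricks cluster:

```python
# Sketch: use Databricks widgets as a form whose values Python can read.
# Field names below are illustrative assumptions, not from the original post.

def build_form_fields():
    """Pure helper: (name, default) pairs for the hypothetical form."""
    return [("first_name", ""), ("team", "Data Eng"), ("notes", "")]

def create_form(dbutils):
    # Runs only on Databricks, where `dbutils` is predefined in notebooks.
    for name, default in build_form_fields():
        dbutils.widgets.text(name, default)

def read_form(dbutils):
    # Returns the current widget values as a plain dict.
    return {name: dbutils.widgets.get(name) for name, _ in build_form_fields()}
```

On a cluster, `create_form(dbutils)` renders the input bar at the top of the notebook and `read_form(dbutils)` returns the entered values as a dict.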

2 More Replies
jura
by New Contributor II
  • 3157 Views
  • 3 replies
  • 1 kudos

SQL Identifier clause

Hi, I was trying to prepare some dynamic SQL to create a table using the IDENTIFIER clause and the WITH AS clause, but it seems I'm stuck on some bug. Could someone verify it or tell me that I am doing something wrong? The code is running on a SQL Warehouse T...

Data Engineering
identifier
Latest Reply
vinay_yogeesh
New Contributor II
  • 1 kudos

Hey, I am stuck with the same issue. Did you find any workaround? I am trying to run DESCRIBE & ALTER commands with IDENTIFIER() via databricks-sql-connector. Did you figure out how to run the IDENTIFIER statements?
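One pattern that tends to sidestep string-concatenation bugs here is passing the table name as a named parameter into `IDENTIFIER()`. A minimal sketch, assuming a runtime where `spark.sql` accepts named parameters via `args` (Spark 3.4+) and that DESCRIBE accepts the IDENTIFIER clause; the statement and names are illustrative:

```python
# Sketch: parameterize the object name through IDENTIFIER() instead of
# building the SQL string by hand. The statement below is an assumption.

def describe_stmt():
    """Pure helper: a DESCRIBE statement with a named parameter marker."""
    return "DESCRIBE TABLE IDENTIFIER(:tbl)"

def run_describe(spark, table_name):
    # On Databricks: spark.sql supports named parameters via `args`
    # (Spark 3.4+); the same marker style works with the parameterized
    # execute() of databricks-sql-connector.
    return spark.sql(describe_stmt(), args={"tbl": table_name})
```

The same `IDENTIFIER(:tbl)` marker can be reused for ALTER and CREATE statements, keeping the dynamic part out of the SQL text itself.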

2 More Replies
lprevost
by Contributor III
  • 2007 Views
  • 3 replies
  • 0 kudos

Using WorkspaceClient -- run a saved query

I've saved a query on my SQL warehouse which has a parameter called :list_parameter. I've found my query ID as follows: from databricks.sdk import WorkspaceClient w = WorkspaceClient() for query in w.queries.list(): print(f"query: {query.displ...

Latest Reply
koji_kawamura
Databricks Employee
  • 0 kudos

Hi @lprevost The WorkspaceClient provides APIs to manage Query objects, but it doesn't provide an API to run them. If you need to run the query from a notebook, you can pass the query text into `spark.sql`, which returns a Spark DataFrame. I hope this help...
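The workaround described above can be sketched as: fetch the saved query's text with the SDK, then execute it from the notebook with `spark.sql`. The query name and parameter value below are illustrative assumptions; `query_text`/`display_name` are the SDK Query fields this sketch assumes:

```python
# Sketch: run a saved SQL-warehouse query from a notebook by fetching its
# text via the SDK and handing it to spark.sql. Names are illustrative.

def find_query_text(queries, display_name):
    """Pure helper: pick a saved query's SQL text by display name."""
    for q in queries:
        if q.display_name == display_name:
            return q.query_text
    return None

def run_saved_query(spark, display_name):
    # Runs only on Databricks; the SDK import is kept inside the function.
    from databricks.sdk import WorkspaceClient
    w = WorkspaceClient()
    text = find_query_text(w.queries.list(), display_name)
    # Named parameters like :list_parameter can be supplied via `args`
    # (Spark 3.4+); the value here is an assumed example.
    return spark.sql(text, args={"list_parameter": "a,b,c"})
```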

2 More Replies
abhijeet_more
by New Contributor II
  • 2413 Views
  • 2 replies
  • 1 kudos

Resolved! DLT pipeline with generated identity column

I have a CSV file which I am looking to read into a streaming table. I always want to add a generated identity column as a surrogate key. I found a few blogs which say we can achieve this by explicitly specifying the schema. However, I have around 40-odd fiel...
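One way to avoid hand-writing a 40-column schema is to derive the column list from the CSV's inferred schema and prepend the identity column programmatically. A minimal sketch under that assumption; the column and table names are illustrative:

```python
# Sketch: build a DDL schema string from inferred (name, type) pairs and
# prepend a generated identity column. The id column name is an assumption.

def schema_with_identity(fields, id_col="sk_id"):
    """Pure helper: DDL schema string with an identity column first."""
    cols = [f"{id_col} BIGINT GENERATED ALWAYS AS IDENTITY"]
    cols += [f"{name} {dtype}" for name, dtype in fields]
    return ", ".join(cols)

# In a DLT pipeline this could be used roughly like (Databricks only):
# import dlt
# inferred = [(f.name, f.dataType.simpleString()) for f in
#             spark.read.csv(path, header=True, inferSchema=True).schema]
# @dlt.table(schema=schema_with_identity(inferred))
# def bronze(): ...
```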

Latest Reply
abhijeet_more
New Contributor II
  • 1 kudos

Thank you @koji_kawamura. This was helpful.

1 More Replies
jeremy98
by Honored Contributor
  • 6251 Views
  • 7 replies
  • 0 kudos

Optimizing .collect() Usage in Spark

Hi all! I'm facing an issue with driver memory after deploying a cluster with 14 GB of memory. My code utilizes the cluster's compute power continuously (it never shuts down, as I cannot otherwise communicate with the Azure PostgreSQL database at the m...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

I would expect both the Python process on the driver and Spark's JVM to release memory once you are done with each chunk of data. Otherwise, this sounds like a memory leak. If you suspect this is a problem in the JVM, you can look at heap dumps - the...
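The chunking idea above can be sketched by replacing `df.collect()` (which pulls every row into driver memory at once) with `toLocalIterator()`, which streams partitions to the driver one at a time. The batch size and writer callback are illustrative assumptions:

```python
# Sketch: stream rows to the driver in bounded batches instead of collect().

def batched(iterator, size):
    """Pure helper: group any iterator into lists of at most `size` items."""
    batch = []
    for item in iterator:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def write_in_chunks(df, writer, size=10_000):
    # On Databricks: toLocalIterator streams partitions to the driver,
    # so only one chunk is resident in driver memory at a time.
    for chunk in batched(df.toLocalIterator(), size):
        writer(chunk)   # e.g. a batched INSERT into Azure PostgreSQL
        del chunk       # let Python free the batch before the next one
```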

6 More Replies
knutasm
by New Contributor III
  • 10007 Views
  • 7 replies
  • 7 kudos

Run Delta Live Tables as service principal

How do you run a Delta Live Tables pipeline in production? It uses the owner's (creator's) permissions for writing to tables, and I can't change the owner of a UC-enabled pipeline after creation. I don't want regular users to have write access to prod ta...

Latest Reply
ashwini0723
New Contributor II
  • 7 kudos

@knutasm I have built a solution for this. The way to create a DLT pipeline using an SPN is to write code in which a new DLT pipeline is created via the Databricks API, with the owner specified as a service principal in the API request, as shown below. Below m...
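A rough sketch of that API-based approach: POST to the pipelines endpoint with a `run_as` block naming the service principal. The endpoint path is the standard pipelines API; the payload field values (and whether your workspace accepts `run_as` at creation time) are assumptions to verify against the API reference:

```python
# Sketch: create a DLT pipeline via the REST API with a service principal
# as the run-as identity. Field values are illustrative assumptions.

def pipeline_payload(name, notebook_path, sp_application_id):
    """Pure helper: JSON body for POST /api/2.0/pipelines."""
    return {
        "name": name,
        "libraries": [{"notebook": {"path": notebook_path}}],
        "run_as": {"service_principal_name": sp_application_id},
    }

def create_pipeline(host, token, payload):
    # Network call; runs only against a real workspace.
    import requests
    r = requests.post(f"{host}/api/2.0/pipelines",
                      headers={"Authorization": f"Bearer {token}"},
                      json=payload)
    r.raise_for_status()
    return r.json()
```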

6 More Replies
Ian_Neft
by New Contributor II
  • 13975 Views
  • 4 replies
  • 1 kudos

Data Lineage in Unity Catalog not Populating

I have been trying to get data lineage to populate with the simplest of queries on a Unity-enabled catalog with a Unity-enabled cluster. I am essentially running the example provided, with more data, to see how it works with various aggregates dow...

Latest Reply
pmahawar
New Contributor II
  • 1 kudos

Cluster running in shared mode with Databricks Runtime 15.4 LTS. UC is set up as per the Databricks guide. I can see the system tables and everything, but data is not populating in the table_lineage table. The EventHub firewall port 9093 is also open. Enabled runtime settin...

3 More Replies
HeyRam
by New Contributor II
  • 975 Views
  • 1 reply
  • 1 kudos

Resolved! Lab material for "Apache Spark Developer Learning Plan"

Hi, I just finished the course "Introduction to Python for Data Science and Data Engineering". The instructor talks about the lab material, but nowhere in the tabs on the left-hand side am I able to find any link to the lab material. I am ...

Latest Reply
Advika_
Databricks Employee
  • 1 kudos

Hello @HeyRam! Did you take the Self-paced course? To clarify, lab materials are not available in self-paced courses. To access them, you have two options: Enroll in the ILT (Instructor-Led Training) course - This will grant you access to the labs fo...

Anish_2
by New Contributor III
  • 1661 Views
  • 3 replies
  • 0 kudos

Removal of Delta Live Tables

Hello Team, I have removed the definition of a table from a Delta Live Tables pipeline, but the table is still present in Unity Catalog. The event log gives the message below: Materialized View '`catalog1`.`schema1`.`table1`' is no longer defined in the pipeline a...

Data Engineering
Delta Live Table
Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi @Anish_2, how are you doing today? I agree with @KaranamS's answer. Databricks marks the table as inactive instead of removing it to prevent accidental data loss, allowing you to restore it if needed. Once inactive, the table remains in Unity Catalo...

2 More Replies
N38
by New Contributor III
  • 2921 Views
  • 11 replies
  • 4 kudos

DLT Pipeline event_log error - invalid pipeline name / The Spark SQL phase analysis failed

I am trying the queries below using both a SQL warehouse and a shared cluster on Databricks Runtime (15.4/16.1) with Unity Catalog: SELECT * FROM event_log(table(my_catalog.myschema.bronze_employees)); SELECT * FROM event_log("6b317553-5c5a-40d5-9541-1a5...

Latest Reply
ron99
New Contributor II
  • 4 kudos

Hi, I am also facing the same issue. Is there any ETA for a fix?

10 More Replies
phoebe_dt
by New Contributor
  • 5854 Views
  • 1 reply
  • 0 kudos

Access denied error to s3 bucket in Databricks notebook

When running a Databricks notebook connected to S3, I randomly but frequently experience the following error: java.nio.file.AccessDeniedException: s3://mybucket: getFileStatus on s3://mybucket: com.amazonaws.services.s3.model.AmazonS3Except...

Data Engineering
access denied
AWS
databricks notebook
S3
Latest Reply
pg289
New Contributor II
  • 0 kudos

Hey! I'm getting this same error connecting to S3-compatible storage (MinIO). Were you able to resolve this? Thanks!

mkolonay
by New Contributor II
  • 1471 Views
  • 2 replies
  • 0 kudos

Widget management in Jupyter (ipynb) notebooks

Hello! We use widgets to set values for which catalogs and schemas to use in our solutions. These values change as we change environments: feature, dev, uat, prod. In dev, uat, and prod, the values are set in workflows, so the widget values are not an is...

Latest Reply
mkolonay
New Contributor II
  • 0 kudos

Thanks for the quick response. Your suggestion is essentially what we are currently doing. We have a config file for each environment that is used to create all the widgets. In the first cell of our notebooks, we call a script to create the widgets us...
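The per-environment config approach described above can be sketched as: derive widget name/default pairs from the environment's config, then create the widgets with `dbutils` in the notebook's first cell. The config keys and values are illustrative assumptions:

```python
# Sketch: create notebook widgets from a per-environment config.
# The environments and keys below are illustrative assumptions.

ENV_CONFIG = {
    "dev":  {"catalog": "dev_catalog",  "schema": "dev_schema"},
    "prod": {"catalog": "prod_catalog", "schema": "prod_schema"},
}

def widget_specs(env, config=ENV_CONFIG):
    """Pure helper: sorted (name, default) pairs for the given environment."""
    return sorted(config[env].items())

def create_widgets(dbutils, env):
    # Runs only on Databricks, where `dbutils` is predefined.
    for name, default in widget_specs(env):
        dbutils.widgets.text(name, default)
```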

1 More Replies
hnnhhnnh
by New Contributor II
  • 3043 Views
  • 3 replies
  • 1 kudos

Windowing/join function on streaming data failing in DLT

I have a DLT pipeline and want to update a particular column's value from other records. Trying it with a windowed first_value, I get the error below. Approach #1: join. I tried a self-join initially, failing because joins are not possible in DLT, with the error below: org...

Latest Reply
hnnhhnnh
New Contributor II
  • 1 kudos

Hi @jhonm_839, thanks for your response to the post. I appreciate your time. I have a watermark on the DataFrame in place and used last(), min(), etc. aggregate functions; it is still failing. final_df = df.withWatermark("kafka_ts", "1 hour").withColumn("s_featur...
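In streaming, ordered window functions like first_value over a window spec generally aren't supported; a watermarked groupBy with a max_by-style aggregate is a common substitute. A sketch under that assumption, reusing the `kafka_ts` and `s_feature` names from the snippet above; the grouping key is an assumed example:

```python
# Sketch: take "the value from the newest row per key" with max_by instead
# of an ordered-window first_value, which streaming does not support.

def pick_latest_expr(value_col, ts_col):
    """Pure helper: SQL expression selecting the value from the newest row."""
    return f"max_by({value_col}, {ts_col}) AS {value_col}"

# On Databricks, roughly (column names partly assumed):
# from pyspark.sql import functions as F
# agg = (df.withWatermark("kafka_ts", "1 hour")
#          .groupBy("entity_id", F.window("kafka_ts", "1 hour"))
#          .agg(F.expr(pick_latest_expr("s_feature", "kafka_ts"))))
```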

2 More Replies
the_dude
by New Contributor II
  • 1484 Views
  • 3 replies
  • 0 kudos

How are .whl files executed for Python wheel tasks?

Hello, we package a Poetry-managed project into a .whl and run it as a Python wheel task. Naturally, many of the dependencies referenced by the .whl file are already present on the Databricks cluster. Is this detected by the task setup (in its virtual...

Latest Reply
Nik_Vanderhoof
Contributor
  • 0 kudos

Hi David, I can't speak exactly to how Poetry handles the dependency resolution of libraries that are already installed, or how that interacts with the Databricks Runtime. However, I can offer you some advice on how my team handles this situation. It'...

2 More Replies
udi_azulay
by New Contributor II
  • 4603 Views
  • 6 replies
  • 1 kudos

Variant type table within DLT

Hi, I have a table with the Variant type (preview) that works well on 15.3. When I try to run code that references this Variant type in a DLT pipeline, I get: com.databricks.sql.transaction.tahoe.DeltaUnsupportedTableFeatureException: [DELTA_UNSUPPORTED_F...

Latest Reply
MAJVeld
New Contributor II
  • 1 kudos

I can indeed confirm that adding some additional table properties to the @dlt decorator in the DLT pipeline definition resolved the earlier issues. Thanks for pointing this out.
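The fix confirmed above can be sketched as declaring the Variant table feature in the table properties of the `@dlt.table` decorator. The exact feature key below is an assumption based on Delta's preview-feature naming and should be checked against the error message's feature name:

```python
# Sketch: table properties enabling the Variant type feature for a DLT
# table. The feature key is an assumption, not confirmed by the thread.

def variant_table_properties():
    """Pure helper: table properties declaring the Variant preview feature."""
    return {"delta.feature.variantType-preview": "supported"}

# In the pipeline, roughly (Databricks only):
# import dlt
# @dlt.table(table_properties=variant_table_properties())
# def bronze_variant(): ...
```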

5 More Replies