Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tom_shaffner
by New Contributor III
  • 12980 Views
  • 6 replies
  • 8 kudos

Resolved! Is there some form of enablement required to use Delta Live Tables (DLT)?

I'm trying to use Delta Live Tables, but even if I import the example notebooks I get a warning saying `ModuleNotFoundError: No module named 'dlt'`. If I try to install via pip, it attempts to install a deep learning framework of some sort. I checked ...

Latest Reply
Insight6
New Contributor II

Here's the solution I came up with... Replace `import dlt` at the top of your first cell with the following: `try: import dlt  # When run in a pipeline, this package will exist (no way to import it here) except ImportError: class dlt...`
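A minimal sketch of that fallback, assuming the notebook only uses the `@dlt.table` decorator when run interactively (the stub class and its method are illustrative, not part of the real `dlt` package):

```python
try:
    import dlt  # exists only when the notebook runs inside a DLT pipeline
except ImportError:
    class dlt:  # minimal stand-in so the notebook still runs interactively
        @staticmethod
        def table(*args, **kwargs):
            # return a pass-through decorator so @dlt.table(...) becomes a no-op
            def decorator(func):
                return func
            return decorator
```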

5 More Replies
dineshg
by New Contributor III
  • 4418 Views
  • 3 replies
  • 6 kudos

Resolved! pyspark - execute dynamically framed action statement stored in string variable

I need to execute a union statement which is framed dynamically and stored in a string variable. I framed the union statement, but I'm stuck on executing it. Does anyone know how to execute a union statement stored in a string variable? I'm using p...

Latest Reply
Shalabh007
Honored Contributor

@Dineshkumar Gopalakrishnan​ Python's exec() function can be used to execute a Python statement, which in your case could be a PySpark union statement. Refer to the sample code snippet below: `df1 = spark.sparkContext.parallelize([(1, 2...`
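A minimal runnable sketch of the `exec()` approach; `df1`, `df2`, and the statement string here are illustrative assumptions, not the original snippet:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([(1, "a")], ["id", "val"])
df2 = spark.createDataFrame([(2, "b")], ["id", "val"])

stmt = "result = df1.union(df2)"   # dynamically framed statement held in a string
scope = {"df1": df1, "df2": df2}   # names the statement is allowed to see
exec(stmt, scope)                  # executes the statement in that namespace
scope["result"].show()
```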

2 More Replies
BearInTheWoods
by New Contributor III
  • 3119 Views
  • 1 reply
  • 4 kudos

Importing Azure SQL data into Databricks

Hi, I am looking at building a data warehouse using Databricks. Most of the data will be coming from Azure SQL, and we now have Azure SQL CDC enabled to capture changes. I would also like to import this without paying for additional connectors like Fi...

Latest Reply
ravinchi
New Contributor III

@Bear Woods​ Hi! Were you able to create DLT tables using the CDC feature from sources like SQL tables? I'm in a similar situation: you need to leverage the apply_changes() function and the create_streaming_live_table() function, but it required intermediate...
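A minimal sketch of that `apply_changes()` flow, with assumed table, path, and column names (`customers_raw`, `customers`, `id`, `ts`); it only runs inside a DLT pipeline:

```python
import dlt
from pyspark.sql.functions import col

@dlt.view
def customers_raw():
    # CDC rows landed from Azure SQL; the path and format are assumptions
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/customers_cdc"))

# create the target streaming live table, then apply the CDC changes to it
dlt.create_streaming_live_table("customers")

dlt.apply_changes(
    target="customers",       # table created above
    source="customers_raw",   # streaming view carrying the CDC feed
    keys=["id"],              # business key column(s)
    sequence_by=col("ts"),    # ordering column that sequences the changes
)
```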

g96g
by New Contributor III
  • 7318 Views
  • 8 replies
  • 0 kudos

Resolved! ADF pipeline fails when passing the parameter to databricks

I have a project where I have to read data from NetSuite using an API. The Databricks notebook runs perfectly when I manually insert the table names I want to read from the source. I have a dataset (CSV) file in ADF with all the table names that I need to r...

Latest Reply
mcwir
Contributor

Have you tried debugging the JSON payload of the ADF trigger? Maybe it conveys the table names incorrectly.
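To see what actually arrives, a quick check worth doing (a sketch; `table_name` is an assumed parameter name, since ADF base parameters surface in the notebook as widgets):

```python
# ADF base parameters arrive in the notebook as widgets
dbutils.widgets.text("table_name", "")           # default if ADF sends nothing
received = dbutils.widgets.get("table_name")
print(f"Value received from ADF: {received!r}")  # repr() exposes stray quotes/whitespace
```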

7 More Replies
Ramabadran
by New Contributor II
  • 17373 Views
  • 3 replies
  • 4 kudos

java.lang.NoClassDefFoundError: scala/Product$class

Hi, I am getting a "java.lang.NoClassDefFoundError: scala/Product$class" error while using Deequ version 1.0.5. Please suggest a fix for this problem or any workaround. Error: Py4JJavaError Traceback (most recent call last) <command-2625366351750561> in...

Latest Reply
mcwir
Contributor

It seems like a Maven issue.

2 More Replies
tanin
by Contributor
  • 3089 Views
  • 4 replies
  • 7 kudos

Does anybody feel the unit test on Dataset is slow? (much slower than RDD). This is in Scala.

I profiled it and the slowness seems to come from Spark planning, especially for a more complex job (e.g. 100+ joins). Is there a way to speed it up (e.g. by disabling certain optimizations)?

Latest Reply
mcwir
Contributor

I had a similar feeling recently.
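One thing worth experimenting with (a sketch, not a vetted recipe; shown in Python for consistency, though the same options apply from Scala): shrink shuffle partitions and exclude individual optimizer rules via `spark.sql.optimizer.excludedRules`. The rule name below is only an example, and Spark keeps any rule that must not be excluded.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[1]")                           # single local thread for tests
    .config("spark.sql.shuffle.partitions", "1")  # skip the 200-partition default
    # excludedRules disables individual optimizer rules by class name
    .config(
        "spark.sql.optimizer.excludedRules",
        "org.apache.spark.sql.catalyst.optimizer.ConstantFolding",
    )
    .getOrCreate()
)
```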

3 More Replies
Merchiv
by New Contributor III
  • 4422 Views
  • 3 replies
  • 1 kudos

Resolved! How to use uuid in SQL merge into statement

I have a MERGE INTO statement that I use to update existing entries or create new entries in a dimension table based on a natural business key. When creating new entries I would also like to create a unique uuid for that entry that I can use to crossr...

Latest Reply
-werners-
Esteemed Contributor III

You might want to look into an identity column, which is now possible in Delta Lake: https://www.databricks.com/blog/2022/08/08/identity-columns-to-generate-surrogate-keys-are-now-available-in-a-lakehouse-near-you.html
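A short sketch of the identity-column approach; the table and column names are assumptions, and `updates` stands in for whatever source feeds the MERGE:

```python
spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_example (
        sk BIGINT GENERATED ALWAYS AS IDENTITY,  -- surrogate key, assigned by Delta
        business_key STRING,
        name STRING
    ) USING DELTA
""")

# stand-in source for the MERGE (assumption for the sketch)
spark.createDataFrame([("bk1", "Alice")], ["business_key", "name"]) \
    .createOrReplaceTempView("updates")

# Leave `sk` out of the INSERT column list and Delta fills it in for new rows.
spark.sql("""
    MERGE INTO dim_example t
    USING updates s
    ON t.business_key = s.business_key
    WHEN MATCHED THEN UPDATE SET t.name = s.name
    WHEN NOT MATCHED THEN INSERT (business_key, name) VALUES (s.business_key, s.name)
""")
```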

2 More Replies
KVNARK
by Honored Contributor II
  • 1902 Views
  • 3 replies
  • 11 kudos

Is there any limitation in querying the no. of SQL queries in Databricks SQL workspace.

Is there any limitation on the number of SQL queries in the Databricks SQL workspace?

Latest Reply
Rajeev_Basu
Contributor III

The default is documented to be 1000, though I have never verified that.

2 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 2017 Views
  • 2 replies
  • 9 kudos

Kafka integration with Databricks

Hi all, I want to integrate Kafka with Databricks. If anyone can share any doc or code, it will help me a lot. Thanks in advance.

Latest Reply
Hubert-Dudek
Esteemed Contributor III

This is the code that I am using to read from Kafka: `inputDF = (spark .readStream .format("kafka") .option("kafka.bootstrap.servers", host) .option("kafka.ssl.endpoint.identification.algorithm", "https") .option("kafka.sasl.mechanism", "PLAIN") .option("ka...`
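A filled-out version of that snippet under an assumed SASL_SSL setup; `host`, `user`, `password`, and `topic` are placeholders, and the trailing options are guesses at what the truncation hid (in practice, pull credentials from a secret scope):

```python
host, topic = "broker1:9093", "events"  # placeholder values (assumptions)
user, password = "client", "secret"     # placeholders; use dbutils.secrets in practice

inputDF = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", host)
    .option("kafka.ssl.endpoint.identification.algorithm", "https")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.security.protocol", "SASL_SSL")
    .option(
        "kafka.sasl.jaas.config",
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="{user}" password="{password}";',
    )
    .option("subscribe", topic)          # topic(s) to read
    .option("startingOffsets", "latest")
    .load()
)
```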

1 More Replies
rpshgupta
by New Contributor III
  • 9379 Views
  • 7 replies
  • 6 kudos

Databricks notebook failed with "Caused by: java.io.FileNotFoundException: Operation failed: "The specified path does not exist.", 404, HEAD, https://adls.dfs.core.windows.net/raw/file.csv?upn=false&action=getStatus&timeout=90".

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 458.0 failed 4 times, most recent failure: Lost task 0.3 in stage 458.0 (TID 2247) (172.18.102.75 executor 1): com.databricks.sql.io.FileReadException: Error while rea...

Latest Reply
Vidula
Honored Contributor

Hi @Rupesh gupta​ Hope you are well. Just wanted to see if you were able to find an answer to your question, and would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

6 More Replies
Ryan_Chynoweth
by Esteemed Contributor
  • 4320 Views
  • 2 replies
  • 4 kudos

Connecting to Azure SQL from Azure Databricks with firewalls

We are trying to connect to an Azure SQL Server from Azure Databricks using JDBC, but have faced issues because our firewall blocks everything. We decided to whitelist IPs from the SQL Server side and add a public subnet to make the connection work. ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor

Using subnets for Databricks connectivity is the correct thing to do. This way you ensure the resources (clusters) can connect to the SQL Database. We also recommend using NPIP (No Public IPs) so that there won't be any public IP associated with the...

1 More Replies
nameziane
by New Contributor III
  • 13239 Views
  • 4 replies
  • 2 kudos

Set version (VERSION AS OF) dynamically from return of a subquery

Hello, We have a business request to compare the evolution of a certain Delta table. We would like to compare the latest version of the table with the previous one using Delta time travel. The main issue we are facing is how to retrieve programmatically us...

Latest Reply
apingle
Contributor

The docs say that "Neither timestamp_expression nor version can be subqueries," so it does sound challenging. I also tried playing with widgets to see if the version could be populated using SQL, but didn't succeed. With Python it's really easy to do.
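A sketch of the Python route mentioned above (`my_table` is an assumed name): fetch the latest version from `DESCRIBE HISTORY` and interpolate it into the query, which sidesteps the no-subquery restriction.

```python
# highest version number currently in the table history
latest = (spark.sql("DESCRIBE HISTORY my_table")
          .selectExpr("max(version) AS v").first()["v"])

# compare the latest version against the previous one via time travel
current_df = spark.sql(f"SELECT * FROM my_table VERSION AS OF {latest}")
previous_df = spark.sql(f"SELECT * FROM my_table VERSION AS OF {latest - 1}")
diff = current_df.exceptAll(previous_df)  # rows added or changed since then
```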

3 More Replies
cmilligan
by Contributor II
  • 2454 Views
  • 3 replies
  • 4 kudos

Resolved! Pass through if a job was run as scheduled or if manual

I have a notebook that sets up parameters for the run based on some job parameters set by the user as well as the current date of the run. I want to supersede some of this logic and just use the manual values if kicked off manually. Is there a way to...

Latest Reply
SS2
Valued Contributor

You can create widgets by using this: `dbutils.widgets.text("widgetName", "")`. To get the value for that widget: `dbutils.widgets.get("widgetName")`. So by using this you can manually create widgets (variables) and run the process by giving the desired valu...
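A sketch of that idea applied to the scheduled-vs-manual question; the `run_mode` and `run_date` widget names are assumptions. The scheduled job leaves the defaults in place, while a manual run overrides them:

```python
from datetime import date

dbutils.widgets.text("run_mode", "scheduled")  # default used by the scheduled job
dbutils.widgets.text("run_date", "")           # only filled in for manual runs

if dbutils.widgets.get("run_mode") == "manual":
    run_date = dbutils.widgets.get("run_date")  # trust the user-supplied value
else:
    run_date = date.today().isoformat()         # derive it for scheduled runs
```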

2 More Replies
