Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tom_shaffner
by New Contributor III
  • 11595 Views
  • 6 replies
  • 8 kudos

Resolved! Is there some form of enablement required to use Delta Live Tables (DLT)?

I'm trying to use Delta Live Tables, but even if I import the example notebooks I get an error saying `ModuleNotFoundError: No module named 'dlt'`. If I try to install it via pip, it attempts to install a deep learning framework of some sort. I checked ...

Latest Reply
Insight6
New Contributor II
  • 8 kudos

Here's the solution I came up with... Replace `import dlt` at the top of your first cell with the following:

try:
    import dlt  # When run in a pipeline, this package will exist (no way to import it here)
except ImportError:
    class dlt...
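For anyone who wants the full shape of that workaround, here is a minimal sketch of how the try/except shim could be completed. The stub body is illustrative, not the poster's exact code; it only needs to satisfy the decorators your notebook uses:

```python
try:
    import dlt  # only importable when the notebook runs inside a DLT pipeline
except ImportError:
    # Interactive stub so the notebook can still be edited and run cell-by-cell.
    class dlt:
        @staticmethod
        def table(*args, **kwargs):
            def _decorator(func):
                return func  # no-op outside a pipeline
            return _decorator
```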

5 More Replies
Swapnil1998
by New Contributor III
  • 2898 Views
  • 2 replies
  • 2 kudos

Ingest Cosmos Mongo DB data using Databricks by applying filters

I need to add a filter condition while ingesting data from a Cosmos Mongo DB using Databricks. I am using the query below to ingest data from a Cosmos collection:

df = spark.read \
    .format('com.mongodb.spark.sql.DefaultSource') \
    .option('uri', sourc...
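For context, a minimal sketch of one way to push a filter down with the MongoDB Spark connector: the `pipeline` read option takes a Mongo aggregation pipeline that runs on the server side. The URI and the `$match` fields are placeholders, not values from this thread:

```python
source_uri = "mongodb://<user>:<password>@<host>:10255/db.collection?ssl=true"  # placeholder

# Aggregation pipeline applied server-side before rows reach Spark.
pipeline = '[{"$match": {"status": "active"}}]'

df = (spark.read
    .format("com.mongodb.spark.sql.DefaultSource")
    .option("uri", source_uri)
    .option("pipeline", pipeline)
    .load())
```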

Latest Reply
This widget could not be displayed.
  • 2 kudos
1 More Replies
dineshg
by New Contributor III
  • 3687 Views
  • 3 replies
  • 6 kudos

Resolved! pyspark - execute dynamically framed action statement stored in string variable

I need to execute a union statement which is framed dynamically and stored in a string variable. I framed the union statement, but I'm stuck executing it. Does anyone know how to execute a union statement stored in a string variable? I'm using p...

Latest Reply
Shalabh007
Honored Contributor
  • 6 kudos

@Dineshkumar Gopalakrishnan Python's exec() function can be used to execute a Python statement, which in your case could be the PySpark union statement. Refer to the sample code snippet below:

df1 = spark.sparkContext.parallelize([(1, 2...
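For reference, a runnable sketch of the exec() approach; the DataFrame names and the framed statement are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame([(1, "a")], ["id", "val"])
df2 = spark.createDataFrame([(2, "b")], ["id", "val"])

# The union statement arrives as a dynamically framed string.
union_stmt = "result = df1.union(df2)"

scope = {"df1": df1, "df2": df2}
exec(union_stmt, scope)   # executes the statement against the supplied namespace
scope["result"].show()
```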

2 More Replies
BearInTheWoods
by New Contributor III
  • 2566 Views
  • 1 replies
  • 4 kudos

Importing Azure SQL data into Databricks

Hi, I am looking at building a data warehouse using Databricks. Most of the data will be coming from Azure SQL, and we now have Azure SQL CDC enabled to capture changes. I would also like to import this without paying for additional connectors like Fi...

Latest Reply
ravinchi
New Contributor III
  • 4 kudos

@Bear Woods Hi! Were you able to create DLT tables using the CDC feature from sources like SQL tables? I'm in a similar situation. You need to leverage the apply_changes function and the create_streaming_live_table() function, but it required intermediate...
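For readers landing here, a minimal sketch of the apply_changes pattern the reply refers to, assuming a CDC feed has already landed in cloud storage; the view name, path, key, and CDC columns are illustrative:

```python
import dlt
from pyspark.sql.functions import col, expr

@dlt.view
def cdc_source():
    # CDC rows exported from Azure SQL into a raw zone (path is a placeholder)
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/cdc/customers/"))

dlt.create_streaming_live_table("customers")

dlt.apply_changes(
    target="customers",
    source="cdc_source",
    keys=["customer_id"],
    sequence_by=col("_commit_ts"),
    apply_as_deletes=expr("_operation = 'DELETE'"),
)
```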

g96g
by New Contributor III
  • 6023 Views
  • 8 replies
  • 0 kudos

Resolved! ADF pipeline fails when passing the parameter to Databricks

I have a project where I have to read data from NetSuite using an API. The Databricks notebook runs perfectly when I manually insert the table names I want to read from the source. I have a dataset (CSV) file in ADF with all the table names that I need to r...

Latest Reply
mcwir
Contributor
  • 0 kudos

Have you tried debugging the JSON payload of the ADF trigger? Maybe it wrongly conveys the table names.
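For reference, a minimal sketch of how a notebook typically receives an ADF base parameter; the parameter name is illustrative:

```python
# ADF passes base parameters to the notebook activity; the notebook reads them as widgets.
dbutils.widgets.text("table_name", "")            # default keeps manual runs working
table_name = dbutils.widgets.get("table_name")

if not table_name:
    raise ValueError("Expected 'table_name' in the ADF trigger payload")

df = spark.read.table(table_name)
```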

7 More Replies
Ramabadran
by New Contributor II
  • 11433 Views
  • 3 replies
  • 4 kudos

java.lang.NoClassDefFoundError: scala/Product$class

Hi, I am getting a "java.lang.NoClassDefFoundError: scala/Product$class" error while using Deequ version 1.0.5. Please suggest a fix for this problem or any workaround. Error: Py4JJavaError Traceback (most recent call last) <command-2625366351750561> in...

Latest Reply
mcwir
Contributor
  • 4 kudos

It seems like a Maven dependency issue: this particular error usually indicates a Scala version mismatch (for example, a library built for Scala 2.11 running on a Scala 2.12 cluster).

2 More Replies
tanin
by Contributor
  • 1972 Views
  • 4 replies
  • 7 kudos

Does anybody feel that unit tests on Dataset are slow (much slower than RDD)? This is in Scala.

I profiled it, and it seems the slowness comes from Spark planning, especially for a more complex job (e.g. 100+ joins). Is there a way to speed it up (e.g. by disabling certain optimizations)?

Latest Reply
mcwir
Contributor
  • 7 kudos

I had a similar feeling recently.

3 More Replies
Merchiv
by New Contributor III
  • 3686 Views
  • 3 replies
  • 1 kudos

Resolved! How to use uuid in SQL merge into statement

I have a MERGE INTO statement that I use to update existing entries or create new entries in a dimension table based on a natural business key. When creating new entries I would also like to create a unique uuid for that entry that I can use to crossr...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

You might want to look into an identity column, which is now possible in Delta Lake: https://www.databricks.com/blog/2022/08/08/identity-columns-to-generate-surrogate-keys-are-now-available-in-a-lakehouse-near-you.html
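A minimal sketch of both routes, with illustrative table and column names: a uuid() generated at insert time inside the MERGE, and the identity column the reply links to:

```python
# Option 1: mint a uuid for rows the MERGE creates.
spark.sql("""
    MERGE INTO dim_customer t
    USING updates s
      ON t.business_key = s.business_key
    WHEN MATCHED THEN UPDATE SET t.name = s.name
    WHEN NOT MATCHED THEN
      INSERT (id, business_key, name) VALUES (uuid(), s.business_key, s.name)
""")

# Option 2: let Delta Lake assign a surrogate key automatically.
spark.sql("""
    CREATE TABLE dim_customer (
      id BIGINT GENERATED ALWAYS AS IDENTITY,
      business_key STRING,
      name STRING
    ) USING DELTA
""")
```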

2 More Replies
KVNARK
by Honored Contributor II
  • 1682 Views
  • 3 replies
  • 11 kudos

Is there any limitation on the number of SQL queries in the Databricks SQL workspace?

Is there any limitation on the number of SQL queries in the Databricks SQL workspace?

Latest Reply
Rajeev_Basu
Contributor III
  • 11 kudos

The documented default is 1000, though I have never verified it.

2 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 1748 Views
  • 2 replies
  • 9 kudos

Kafka integration with Databricks

Hi all, I want to integrate Kafka with Databricks. If anyone can share any docs or code it would help me a lot. Thanks in advance.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

This is the code that I am using to read from Kafka:

inputDF = (spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", host)
    .option("kafka.ssl.endpoint.identification.algorithm", "https")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("ka...
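A hedged completion of the truncated snippet above: the remaining options (JAAS config, subscribe, startingOffsets) are standard Structured Streaming Kafka settings, and the `kafkashaded` prefix assumes Databricks' shaded Kafka client; host, topic, and credentials are placeholders:

```python
host = "<broker1:9093,broker2:9093>"   # placeholders
user, password, topic = "<user>", "<password>", "<topic>"

inputDF = (spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", host)
    .option("kafka.ssl.endpoint.identification.algorithm", "https")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config",
            'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
            f'required username="{user}" password="{password}";')
    .option("subscribe", topic)
    .option("startingOffsets", "latest")
    .load())
```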

1 More Replies
rpshgupta
by New Contributor III
  • 8186 Views
  • 7 replies
  • 6 kudos

Databricks notebook failed with "Caused by: java.io.FileNotFoundException: Operation failed: "The specified path does not exist.", 404, HEAD, https://adls.dfs.core.windows.net/raw/file.csv?upn=false&action=getStatus&timeout=90".

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 458.0 failed 4 times, most recent failure: Lost task 0.3 in stage 458.0 (TID 2247) (172.18.102.75 executor 1): com.databricks.sql.io.FileReadException: Error while rea...

Latest Reply
Vidula
Honored Contributor
  • 6 kudos

Hi @Rupesh gupta, hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you like to mark that answer as best? It would be really helpful for the other members too. Cheers!

6 More Replies
Ryan_Chynoweth
by Esteemed Contributor
  • 3775 Views
  • 2 replies
  • 4 kudos

Connecting to Azure SQL from Azure Databricks with firewalls

We are trying to connect to an Azure SQL Server from Azure Databricks using JDBC, but have faced issues because our firewall blocks everything. We decided to whitelist IPs from the SQL Server side and add a public subnet to make the connection work. ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 4 kudos

Using subnets for Databricks connectivity is the correct thing to do. This way you ensure that the resources (clusters) can connect to the SQL Database. We also recommend using NPIP (No Public IPs) so that there won't be any public IP associated with the...

1 More Replies
nameziane
by New Contributor III
  • 9901 Views
  • 4 replies
  • 2 kudos

Set version (VERSION AS OF) dynamically from return of a subquery

Hello, we have a business request to compare the evolution of a certain Delta table. We would like to compare the latest version of the table with the previous one using Delta time travel. The main issue we are facing is to retrieve programmatically us...

Latest Reply
apingle
Contributor
  • 2 kudos

In the docs it says that "neither timestamp_expression nor version can be subqueries," so it does sound challenging. I also tried playing with widgets to see if it could be populated using SQL, but didn't succeed. With Python it's really easy to do.
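A minimal sketch of the Python route the reply mentions; the table name is illustrative:

```python
# DESCRIBE HISTORY returns versions newest-first, so row 0 is the latest.
latest = spark.sql("DESCRIBE HISTORY my_table LIMIT 1").collect()[0]["version"]

current_df = spark.sql(f"SELECT * FROM my_table VERSION AS OF {latest}")
previous_df = spark.sql(f"SELECT * FROM my_table VERSION AS OF {latest - 1}")

# Rows present in the latest version but not in the previous one.
diff = current_df.exceptAll(previous_df)
```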

3 More Replies
cmilligan
by Contributor II
  • 2152 Views
  • 3 replies
  • 4 kudos

Resolved! Pass through whether a job was run on schedule or manually

I have a notebook that sets up parameters for the run based on some job parameters set by the user as well as the current date of the run. I want to supersede some of this logic and just use the manual values if kicked off manually. Is there a way to...

Latest Reply
SS2
Valued Contributor
  • 4 kudos

You can create widgets using dbutils.widgets.text("widgetName", ""). To get the value for that widget, use dbutils.widgets.get("widgetName"). So by using this you can manually create widgets (variables) and run the process by giving the desired valu...
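A minimal sketch of that widget-based pattern; the widget names and values are illustrative:

```python
# A scheduled job sets "run_mode" in its job parameters; manual runs keep the default.
dbutils.widgets.text("run_mode", "manual")
dbutils.widgets.text("manual_date", "")

if dbutils.widgets.get("run_mode") == "manual":
    run_date = dbutils.widgets.get("manual_date")   # user-supplied value wins
else:
    from datetime import date
    run_date = date.today().isoformat()             # derived from the schedule
```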

2 More Replies
