Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SRK
by Contributor III
  • 7577 Views
  • 2 replies
  • 0 kudos

How to get the count of dataframe rows when reading through spark.readstream using batch jobs?

I am trying to read messages from a Kafka topic using spark.readStream; I am using the following code to read it. My code: df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "192.1xx.1.1xx:9xx").option("subscr...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

You can try this approach: https://stackoverflow.com/questions/57568038/how-to-see-the-dataframe-in-the-console-equivalent-of-show-for-structured-st/62161733#62161733 readStream runs a thread in the background, so there's no easy equivalent of df.show().
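One common way to get per-batch row counts from a stream is a foreachBatch callback, since each micro-batch arrives as an ordinary DataFrame where count() works. A minimal sketch, with the callback kept Spark-free so the idea is visible; the name log_batch_count and the wiring in the comments are illustrative, not from the original post:

```python
# Sketch: count rows per micro-batch with foreachBatch.
# The callback receives a plain (non-streaming) DataFrame, so .count() works there.
batch_counts = []  # (epoch_id, row_count) pairs, for illustration


def log_batch_count(batch_df, epoch_id):
    """Record the row count of each micro-batch."""
    batch_counts.append((epoch_id, batch_df.count()))


# With a live Spark session and Kafka source, the wiring would look roughly like:
# (spark.readStream.format("kafka")
#     .option("kafka.bootstrap.servers", "host:9092")
#     .option("subscribe", "topic")
#     .load()
#     .writeStream.foreachBatch(log_batch_count)
#     .trigger(availableNow=True)
#     .start())
```

For a batch-style job, a trigger that drains the available data and stops (such as availableNow) pairs naturally with this pattern.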

1 More Replies
KVNARK
by Honored Contributor II
  • 6852 Views
  • 7 replies
  • 7 kudos

Resolved! Copying delta to Azure SQL DB.

How can we copy Delta to Azure SQL DB using ADF? Earlier we were using Parquet format. Now we have converted Parquet to Delta using the command below: CONVERT TO DELTA parquet.path (Azure Blob path)

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 7 kudos

Hi @Aviral Bhardwaj, in ADF there is a Delta Lake file option; you can directly save your file in Delta Lake format.

6 More Replies
Mado
by Valued Contributor II
  • 1769 Views
  • 2 replies
  • 2 kudos

How can I pull all branches at once in Databricks?

Hi, I have cloned a remote repository into my folder in Repos. The repository has several feature branches. When I want to pull any branch, I select the desired branch in Repos and click the "Pull" button, and then I need to select another branch. Is th...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 2 kudos

Hi @Mohammad Saber, the Q&A links below might help you: link, link2

1 More Replies
mrmota
by New Contributor
  • 1857 Views
  • 1 replies
  • 3 kudos

ssh connection with paramiko

I'm trying to use a Python notebook to access our server through SSH, but I am not able to connect through the paramiko library. I have already authorized our server's firewall to accept the Databricks IP. Have you ever had a similar case? Can ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 3 kudos

Yes, this can happen. You have to whitelist that port in the Databricks configuration; there is a Spark configuration that you can add to your cluster that will whitelist your particular port. As that configuration can only be given by Databricks, you can...

AK98
by New Contributor II
  • 4403 Views
  • 3 replies
  • 0 kudos

Py4JJavaError when trying to write dataframe to delta table

I'm trying to write a dataframe to a Delta table and am getting this error. I'm not sure what the issue is, as I had no problem successfully writing other dataframes to Delta tables. I attached a snippet of the data as well, along with the schema:

Latest Reply
ramravi
Contributor II
  • 0 kudos

Does your Delta table contain all the columns that your dataframe contains? If there is a schema mismatch, it might be the reason for the failure. df.write.format("delta").option("mergeSchema", "true").mode("append").sav...

2 More Replies
espenol
by New Contributor III
  • 2875 Views
  • 4 replies
  • 1 kudos

How to work with DLT pipelines? Best practices?

So I'm used to developing notebooks interactively: write some code, run it to see if I made an error, and if there's no error, filter and display the dataframe to see that I did what I intended. With DLT pipelines, however, I can't run interactively. Is my understan...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 1 kudos

Yes, exactly. I am also working on DLT, and what I have learned is that if we want to check our errors, we have to run the pipeline again and again for debugging, but this is not the best practice, so the other metho...

3 More Replies
Choolanadu
by New Contributor
  • 3272 Views
  • 1 replies
  • 0 kudos

Airflow - How to pull XComs value in the notebook task?

Using Airflow, I have created a DAG with a sequence of notebook tasks. The first notebook returns a batch id; the subsequent notebook tasks need this batch_id. I am using the DatabricksSubmitRunOperator to run the notebook task. This operator pushes ...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

From what I understand, you want to pass a run_id parameter to the second notebook task? You can: create a widget param inside your Databricks notebook (https://docs.databricks.com/notebooks/widgets.html) that will consume your run_id. Pass the paramet...
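The pattern above can be sketched as building the notebook_task payload for DatabricksSubmitRunOperator with the XCom value injected through base_parameters via a Jinja template (the operator's json argument is templated by Airflow). The task id produce_batch_id, the parameter name batch_id, and the notebook path are all hypothetical:

```python
# Sketch: pass an upstream XCom value into a Databricks notebook via
# base_parameters. Airflow renders the Jinja template before submitting the run.


def notebook_task_payload(notebook_path, upstream_task_id, param_name):
    """Build a notebook_task payload for DatabricksSubmitRunOperator."""
    return {
        "notebook_path": notebook_path,
        "base_parameters": {
            # Jinja template: Airflow fills this from the upstream task's XCom.
            param_name: "{{ ti.xcom_pull(task_ids='%s') }}" % upstream_task_id,
        },
    }


payload = notebook_task_payload("/Repos/etl/consume", "produce_batch_id", "batch_id")
# Inside the notebook, the value is then read back through a widget:
# dbutils.widgets.text("batch_id", "")
# batch_id = dbutils.widgets.get("batch_id")
```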

Viren123
by Contributor
  • 3148 Views
  • 1 replies
  • 1 kudos

MERGE in the delta table

Hello, I would like to UPDATE the data in the Delta table. For that, the MERGE option came to my attention. However, my code below ends with errors, as shown at the end. Kindly advise what I am missing here. Thank you. **********************************...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

I think that you're mixing Spark and pandas DataFrames. Try creating dfSource in Spark instead of pandas.

Kash
by Contributor III
  • 1824 Views
  • 1 replies
  • 0 kudos

Data-quality help: Save Data Profile dbutils.data.summarize(df) to table

Hi there, we would like to create a data quality database that helps us understand how complete our data is. We would like to run a job each day that outputs the same table data as dbutils.data.summarize(df) for a given table and saves it to ...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

From what I know, there's no easy way to save the output of dbutils.data.summarize() into a df. You can still write custom Python/PySpark code to profile your data and save the output.
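As a rough stand-in for one piece of what summarize() reports, a completeness profile (fraction of non-null values per column) can be computed and saved as an ordinary table. A minimal sketch in plain Python over rows-as-dicts, my own illustration; a real job would do the same aggregation in PySpark:

```python
# Sketch: per-column completeness (share of non-null values), the kind of
# metric a daily data-quality job could write to a table.


def completeness(rows, columns):
    """Return {column: fraction of non-null values} over a list of row dicts."""
    total = len(rows)
    return {
        col: sum(1 for r in rows if r.get(col) is not None) / total
        for col in columns
    }


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]
profile = completeness(rows, ["id", "name"])
# profile == {"id": 1.0, "name": 0.5}
```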

KVNARK
by Honored Contributor II
  • 1240 Views
  • 1 replies
  • 4 kudos

Resolved! a usecase to query millions of values.

I have a small use case where we need to query the SQL database with 1 million values (dynamically returned from Python code) in the condition, from a Python function, e.g. select * from id in (1, 2, 23, 33, ..., 1M). I feel this is a very bad approach. Is ther...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 4 kudos

You can also create a temporary view with the output from the Python code (one id = one row) and then inner join the view to the table. IMO it will improve the readability of your code.
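The join-instead-of-IN idea can be sketched with SQLite (from the standard library) standing in for the real database; the table and column names are made up for illustration:

```python
import sqlite3

# Sketch: instead of "SELECT * FROM t WHERE id IN (<1M values>)", load the ids
# into a temporary table and inner join against it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])

wanted_ids = [2, 4]  # in practice, the values returned from the Python code
conn.execute("CREATE TEMP TABLE wanted (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted VALUES (?)", [(i,) for i in wanted_ids])

rows = conn.execute(
    "SELECT t.id, t.val FROM t INNER JOIN wanted ON t.id = wanted.id ORDER BY t.id"
).fetchall()
# rows == [(2, "b"), (4, "d")]
```

Besides readability, this sidesteps any limit the database places on the number of values allowed in an IN list.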

ramravi
by Contributor II
  • 16375 Views
  • 2 replies
  • 6 kudos
Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 6 kudos

Hey @Ravi Teja, there are two methods by which we can limit our dataframe: take and limit. Refer to this concept: myDataFrame.take(10) -> results in an Array of Rows. This is an action and performs collecting the data (like collect does). myDataFra...

1 More Replies
Prototype998
by New Contributor III
  • 6018 Views
  • 2 replies
  • 1 kudos

Resolved! Connecting Databricks with FTP server

Hey, I want to know how to connect Databricks with an FTP server. Any help would be really appreciated.

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 1 kudos

Hey @Punit Chauhan, refer to this code to connect to an FTP server:

import ftplib

Host = ""
Login = ""
Passwd = ""
ftp_dir = ""

ftp = ftplib.FTP(Host)
ftp.login(Login, Passwd)
ftp.cwd(ftp_dir)
files = ftp.nlst(ftp_dir)
print(files)

1 More Replies
Prototype998
by New Contributor III
  • 3528 Views
  • 4 replies
  • 4 kudos

Resolved! Databricks notebook run

How do I run a Databricks notebook through ADF?

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 4 kudos

Hi @Punit Chauhan, you can use the Databricks Notebook activity in ADF to trigger your Databricks notebook via ADF.

3 More Replies
Ajay-Pandey
by Esteemed Contributor III
  • 6344 Views
  • 3 replies
  • 13 kudos

Resolved! Fetching data in excel through delta sharing

Hi all, is there any way we can access or push data in Delta Sharing using Microsoft Excel?

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 13 kudos

Hey @Ajay Pandey, yes, a new Excel feature recently came to market that lets us enable Delta Sharing from Excel as well, so whatever changes you make to the Delta table will automatically be saved in the Excel file too. Refer to this lin...

2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group