Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Aviral-Bhardwaj
by Esteemed Contributor III
  • 10619 Views
  • 6 replies
  • 33 kudos

Resolved! Timezone understanding

Today I was working with timezone data. My Singapore users want to see their local time in the data, and my USA users want to see theirs; instead, both are getting UTC time. How do I solve this issue? Please guide. Data can be anything...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 33 kudos

I got it, guys: it was happening due to a library conflict. Your answers were really helpful; I tried everything.

5 More Replies
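For the timezone question above, the usual pattern is to store timestamps once in UTC and convert at display time per viewer. A minimal pure-Python sketch of that conversion (in Spark SQL the equivalent is `from_utc_timestamp(col, 'Asia/Singapore')`); the sample timestamp and zone names are illustrative:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def localize(utc_ts: datetime, tz_name: str) -> datetime:
    """Render a UTC timestamp in the viewer's own time zone."""
    return utc_ts.replace(tzinfo=timezone.utc).astimezone(ZoneInfo(tz_name))

utc = datetime(2023, 1, 15, 12, 0)            # stored once, in UTC
print(localize(utc, "Asia/Singapore").hour)   # Singapore viewer sees 20:00
print(localize(utc, "America/New_York").hour) # New York viewer sees 07:00
```

Keeping a single UTC column and converting per user avoids storing the same instant twice.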
Ruby8376
by Valued Contributor
  • 3857 Views
  • 5 replies
  • 1 kudos

Resolved! Databricks authentication

Hi there! We are planning to use the Databricks-Tableau on-prem integration for reporting. Data would reside in Delta Lake, and using the Tableau-Databricks connector, users would be able to generate reports from that data. The question is: a private end point wi...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Also make sure that you are going with the Spark SQL connection, else it will always fail.

4 More Replies
Sharmila04
by New Contributor
  • 5017 Views
  • 3 replies
  • 0 kudos

DBFS File Browser Error RESOURCE_DOES_NOT_EXIST:

Hi, I am new to Databricks and was trying to follow a tutorial to upload a file and move it under a different folder. I used the DBFS option. While trying to move/rename the file I am getting the below error; can you please help me understand why I am g...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

Use these three commands and one of them will work:

dbutils.fs.ls('dbfs:/FileStore/vehicle_data.csv')
dbutils.fs.ls('/dbfs/FileStore/vehicle_data.csv')
dbutils.fs.ls('/dbfs/dbfs/FileStore/vehicle_data.csv')

Thanks, Aviral

2 More Replies
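The underlying confusion in this thread is the two spellings of a DBFS path: the `dbfs:/` URI that dbutils expects and the `/dbfs/` local mount. A small sketch that normalizes between them (the file name comes from the thread; the `dbutils.fs.mv` call is shown as a comment since dbutils only exists on a Databricks cluster, and the destination path is an assumption):

```python
def to_dbfs_uri(path: str) -> str:
    """Normalize a /dbfs/... mount path (or bare path) to a dbfs:/ URI."""
    if path.startswith("dbfs:/"):
        return path
    if path.startswith("/dbfs/"):
        return "dbfs:/" + path[len("/dbfs/"):]
    return "dbfs:/" + path.lstrip("/")

src = to_dbfs_uri("/dbfs/FileStore/vehicle_data.csv")
print(src)  # dbfs:/FileStore/vehicle_data.csv
# On a cluster you could then move/rename it:
# dbutils.fs.mv(src, "dbfs:/FileStore/archive/vehicle_data.csv")
```

RESOURCE_DOES_NOT_EXIST errors are often just one of these prefixes applied in the wrong context.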
SRK
by Contributor III
  • 10484 Views
  • 2 replies
  • 0 kudos

How to get the count of dataframe rows when reading through spark.readstream using batch jobs?

I am trying to read messages from a Kafka topic using spark.readStream, with the following code:

df = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "192.1xx.1.1xx:9xx")
    .option("subscr...

Latest Reply
daniel_sahal
Databricks MVP
  • 0 kudos

You can try this approach: https://stackoverflow.com/questions/57568038/how-to-see-the-dataframe-in-the-console-equivalent-of-show-for-structured-st/62161733#62161733. readStream runs a background thread, so there is no easy equivalent of df.show().

1 More Replies
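As the reply notes, a streaming DataFrame has no direct count; the common workaround is foreachBatch, where each micro-batch arrives as a plain DataFrame. A sketch under that assumption (the Kafka options are placeholders, and the stream wiring is commented out because it needs a live cluster and broker):

```python
batch_counts = []

def record_count(batch_df, epoch_id):
    # Inside foreachBatch, batch_df is a normal (non-streaming) DataFrame,
    # so .count() is allowed here.
    batch_counts.append((epoch_id, batch_df.count()))

# (spark.readStream
#      .format("kafka")
#      .option("kafka.bootstrap.servers", "host:9092")
#      .option("subscribe", "my_topic")
#      .load()
#      .writeStream
#      .foreachBatch(record_count)
#      .start())
```

Each entry in batch_counts then records how many rows the batch job consumed per micro-batch.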
KVNARK
by Honored Contributor II
  • 9876 Views
  • 7 replies
  • 7 kudos

Resolved! Copying delta to Azure SQL DB.

How do we copy Delta to Azure SQL DB using ADF? Earlier we were using the Parquet format. Now we have converted Parquet to Delta using the command below: CONVERT TO DELTA parquet.path (Azure Blob path)

Latest Reply
Ajay-Pandey
Databricks MVP
  • 7 kudos

Hi @Aviral Bhardwaj, in ADF there is a Delta Lake option; you can directly save your file in Delta Lake format.

6 More Replies
Mado
by Valued Contributor II
  • 2697 Views
  • 2 replies
  • 2 kudos

How can I pull all branches at once in Databricks?

Hi, I have cloned a remote repository into my folder in Repos. The repository has several feature branches. When I want to pull a branch, I select the desired branch in Repos and click the "Pull" button, and then I need to select another branch. Is th...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 2 kudos

Hi @Mohammad Saber, the QnA links below might help you: link, link2

1 More Replies
mrmota
by New Contributor
  • 3127 Views
  • 1 replies
  • 3 kudos

ssh connection with paramiko

I'm trying to use a Python notebook to access our server through SSH, but I am not able to connect through the paramiko library. I have already authorized our server's firewall to accept the Databricks IP. Have you ever had a similar case? Can ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 3 kudos

Yes, this can happen. You have to whitelist that port in the Databricks configuration; there is one Spark configuration that you can add to your cluster that will whitelist your particular port. As that configuration can only be given by Databricks, you can...

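Before debugging paramiko itself, it is worth confirming the cluster can even reach the SSH port, since this thread points at firewall and port whitelisting. A stdlib-only sketch (host name is a placeholder; the paramiko calls are commented out and shown only as the usual connection shape):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# if port_reachable("my.server.example", 22):
#     import paramiko
#     client = paramiko.SSHClient()
#     client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
#     client.connect("my.server.example", username="user", password="secret")
```

If the check fails from the cluster but succeeds from your laptop, the problem is network access rather than paramiko.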
AK98
by New Contributor II
  • 6925 Views
  • 3 replies
  • 0 kudos

Py4JJavaError when trying to write dataframe to delta table

I'm trying to write a DataFrame to a Delta table and am getting this error. I'm not sure what the issue is, as I had no problem successfully writing other DataFrames to Delta tables. I attached a snippet of the data along with the schema:

Latest Reply
ramravi
Contributor II
  • 0 kudos

Does your Delta table contain all the columns your DataFrame contains? A schema mismatch might be the reason for the failure.

df.write.format("delta") \
    .option("mergeSchema", "true") \
    .mode("append") \
    .sav...

2 More Replies
espenol
by New Contributor III
  • 5062 Views
  • 4 replies
  • 1 kudos

How to work with DLT pipelines? Best practices?

So I'm used to developing notebooks interactively: write some code, run it to see if I made an error, and if there is no error, filter and display the DataFrame to check that I did what I intended. With DLT pipelines, however, I can't run interactively. Is my understan...

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 1 kudos

Yes, exactly. I am also working on DLT, and what I have learned is that if we want to check an error, we have to run the pipeline again and again to debug it. But this is not best practice, so the other metho...

3 More Replies
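One way to make DLT development less rerun-heavy is to keep transformation logic in plain functions that can be exercised interactively, and only wrap them with @dlt.table inside the pipeline notebook. A sketch under that assumption (table, column, and function names are invented; the dlt part is commented out since the module only exists inside a pipeline run):

```python
def valid_orders(df):
    # Plain function: works on any object exposing .filter(), so it can be
    # tested in a normal notebook before it ever enters the pipeline.
    return df.filter("order_id IS NOT NULL")

# In the DLT pipeline notebook:
# import dlt
# @dlt.table(name="orders_clean")
# def orders_clean():
#     return valid_orders(dlt.read("orders_raw"))
```

The debugging loop then happens on the plain function with a sample DataFrame, not on full pipeline runs.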
Choolanadu
by New Contributor
  • 5487 Views
  • 1 replies
  • 0 kudos

Airflow - How to pull XComs value in the notebook task?

Using Airflow, I have created a DAG with a sequence of notebook tasks. The first notebook returns a batch id; the subsequent notebook tasks need this batch_id. I am using the DatabricksSubmitRunOperator to run the notebook task. This operator pushes ...

Latest Reply
daniel_sahal
Databricks MVP
  • 0 kudos

From what I understand, you want to pass a run_id parameter to the second notebook task? You can: create a widget param inside your Databricks notebook (https://docs.databricks.com/notebooks/widgets.html) that will consume your run_id, then pass the paramet...

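Concretely, the widget approach from the reply can be wired up by rendering an XCom pull into the notebook's base_parameters; Airflow templates the string before the run is submitted. A sketch (task ids and the notebook path are invented; the operator construction is commented out since it needs an Airflow deployment):

```python
def xcom_param(task_id: str, key: str = "return_value") -> str:
    """Build a Jinja template that Airflow resolves to the pushed XCom value."""
    return "{{ ti.xcom_pull(task_ids='%s', key='%s') }}" % (task_id, key)

notebook_task = {
    "notebook_path": "/Repos/etl/second_notebook",
    "base_parameters": {"batch_id": xcom_param("first_notebook")},
}
# second = DatabricksSubmitRunOperator(task_id="second_notebook",
#                                      json={"notebook_task": notebook_task})
# Inside the notebook: batch_id = dbutils.widgets.get("batch_id")
```

The template stays a plain string until Airflow renders it at task time, which is why it can be placed in the operator's JSON.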
Viren123
by Contributor
  • 5387 Views
  • 1 replies
  • 1 kudos

MERGE in the delta table

Hello, I would like to UPDATE the data in a Delta table, for which the MERGE option came to my attention. However, my code below ends with the errors shown at the end. Kindly advise what I am missing here. Thank you. ...

Latest Reply
daniel_sahal
Databricks MVP
  • 1 kudos

I think you're mixing Spark and pandas DataFrames. Try creating dfSource in Spark instead of pandas.

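To illustrate the reply: the MERGE source should be a Spark DataFrame, not a pandas one. A sketch of the usual Delta MERGE shape (table and column names are invented; the Spark and Delta calls are commented out since they need a cluster with Delta Lake installed):

```python
def merge_condition(keys, target="t", source="s"):
    """Build the ON clause for a MERGE from a list of key columns."""
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)

cond = merge_condition(["id"])
print(cond)  # t.id = s.id
# dfSource = spark.createDataFrame(rows, schema)   # Spark, not pandas
# from delta.tables import DeltaTable
# (DeltaTable.forName(spark, "my_table").alias("t")
#      .merge(dfSource.alias("s"), cond)
#      .whenMatchedUpdateAll()
#      .whenNotMatchedInsertAll()
#      .execute())
```

If the source started life in pandas, spark.createDataFrame(pandas_df) converts it before the merge.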
Kash
by Contributor III
  • 2823 Views
  • 1 replies
  • 0 kudos

Data-quality help: Save Data Profile dbutils.data.summarize(df) to table

Hi there, we would like to create a data quality database that helps us understand how complete our data is. We would like to run a job each day that outputs the same table data as dbutils.data.summarize(df) for a given table and saves it to ...

Latest Reply
daniel_sahal
Databricks MVP
  • 0 kudos

From what I know, there's no easy way to save the output of dbutils.data.summarize() into a DataFrame. You can still write custom Python/PySpark code to profile your data and save the output.

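A custom profile along the lines the reply suggests can start with per-column completeness, which is the metric the poster is after. A pure-Python sketch of the calculation over row dicts (in PySpark the same numbers come from count(col) / count(*) aggregations; column names here are illustrative):

```python
def completeness(rows, columns):
    """Fraction of non-null values per column for a list of row dicts."""
    n = len(rows)
    return {c: sum(r.get(c) is not None for r in rows) / n for c in columns}

sample = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]
print(completeness(sample, ["id", "name"]))  # {'id': 1.0, 'name': 0.5}
```

A daily job can compute these fractions per table and append them, with a date column, to the quality database.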
KVNARK
by Honored Contributor II
  • 2125 Views
  • 1 replies
  • 4 kudos

Resolved! a usecase to query millions of values.

I have a small use case where we need to query the SQL database with 1 million values (dynamically returned from Python code) in the condition, from a Python function, e.g. select * from table where id in (1, 2, 23, 33, ..., 1M). I feel this is a very bad approach. Is ther...

Latest Reply
daniel_sahal
Databricks MVP
  • 4 kudos

You can also create a temporary view with the output from the Python code (one id = one row) and then inner join the view to the table. IMO this will improve the readability of your code.

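The temporary-view suggestion looks roughly like this: shape the Python ids into one-column rows, register them as a view, and join, instead of building a million-value IN list. Table and view names are invented; the Spark calls are commented out since they need a session:

```python
def to_rows(ids):
    """Shape a flat id list into the one-column tuples createDataFrame expects."""
    return [(i,) for i in ids]

rows = to_rows([1, 2, 23, 33])
print(rows)  # [(1,), (2,), (23,), (33,)]
# spark.createDataFrame(rows, "id BIGINT").createOrReplaceTempView("wanted_ids")
# result = spark.sql("""
#     SELECT t.*
#     FROM my_table t
#     JOIN wanted_ids w ON t.id = w.id
# """)
```

Besides readability, a join lets the engine plan the lookup rather than parsing a million-literal predicate.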