Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

thushar
by Contributor
  • 4054 Views
  • 4 replies
  • 2 kudos

Can we use a variable to mention the path in the %run command

To compile the Python scripts in Azure notebooks, we are using the magic command %run. The first parameter for this command is the notebook path; is it possible to supply that path as a variable (we have to construct this path dynamically during the ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Thushar R, We haven’t heard from you on the last response from @Akash Bhat, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please do share that with the community as it can be helpful to oth...

3 More Replies
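A common workaround (an assumption based on typical Databricks usage, not the resolution confirmed in this thread) is that %run cannot take a variable, while dbutils.notebook.run() accepts any string, so the path can be built dynamically first:

```python
# Hedged sketch: %run cannot take a variable, but dbutils.notebook.run()
# accepts any string, so the path can be constructed at runtime.
env = "dev"                                # hypothetical runtime value
notebook_path = f"/Shared/{env}/setup"     # hypothetical folder layout

# In a Databricks notebook you would then call:
# dbutils.notebook.run(notebook_path, timeout_seconds=600)
```

Note that dbutils.notebook.run() executes the target notebook as a separate run, so unlike %run it does not import the target's functions and variables into the caller's scope.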
Aquib
by New Contributor
  • 2386 Views
  • 3 replies
  • 0 kudos

How to migrate DBFS from one tenant to another tenant

I am working on a Databricks workspace migration, where I need to copy the Databricks workspace including DBFS from source to target (the source and target are in different subscriptions/accounts). Can someone suggest what the approach could be to migrate D...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Aquib Javeed, We haven’t heard from you on the last response from me, and I was checking back to see if my suggestions helped you. Otherwise, if you have any solution, please share it with the community as it can be helpful to others.

2 More Replies
Saurav
by New Contributor III
  • 4385 Views
  • 6 replies
  • 7 kudos

spark cluster monitoring and visibility

Hey. I'm working on a project where I'd like to be able to view and play around with the spark cluster metrics. I'd like to know what the utilization % and max values are for metrics like CPU, memory and network. I've tried using some open source sol...

Latest Reply
Saurav
New Contributor III
  • 7 kudos

Hey @Kaniz Fatma, I appreciate the suggestions and will be looking into them. I haven't gotten to them yet, so I didn't want to say whether they worked for me or not. Since I'm looking to avoid solutions like DataDog, I'll be checking out the Prometh...

5 More Replies
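For reference, Spark 3.0+ ships a built-in Prometheus sink for its metrics system, which covers executor CPU, memory, and network counters. A minimal metrics.properties sketch follows; how the file is distributed to a Databricks cluster (e.g., via an init script) is an assumption to verify for your workspace.

```properties
# Sketch: enable Spark's built-in PrometheusServlet sink (Spark 3.0+).
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
```

A Prometheus server can then scrape these endpoints on the driver UI port, avoiding commercial agents like DataDog.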
irfanaziz
by Contributor II
  • 1590 Views
  • 3 replies
  • 3 kudos

How to make a string column with numeric and alphabet values use as partition?

So I have two partition columns defined for this delta table: one ('GJHAR') contains year values, and the other is a string column ('BUKS') with around 124 unique values. However, there is one problem with the 2nd partition column ('BUKS'): the values ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @nafri A, We haven’t heard from you on the last response from @Werner Stinckens, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please do share that with the community as it can be helpful to ot...

2 More Replies
marta_cdc
by New Contributor
  • 2465 Views
  • 4 replies
  • 0 kudos

Automate in code the launching of a sql script

Do you know how to automate launching a SQL script from code? Currently I do it by selection.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Marta Vicente Sánchez​ , We haven’t heard from you on the last response from me, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Otherwise, ...

3 More Replies
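One hedged sketch of such automation, assuming a Databricks notebook where a `spark` session exists: read the script file, split it into statements, and run each with spark.sql(). Note the naive split below breaks if a semicolon appears inside a string literal.

```python
# Hypothetical script contents; in practice read them with open(path).read().
sql_script = """
CREATE TABLE IF NOT EXISTS demo (id INT);
INSERT INTO demo VALUES (1);
"""

# Split on ";" and drop empty fragments (naive: assumes no ";" in literals).
statements = [s.strip() for s in sql_script.split(";") if s.strip()]

# In a notebook, execute each statement in order:
# for stmt in statements:
#     spark.sql(stmt)
```

The loop can then be scheduled as a Databricks job so the script runs without manual selection.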
Kaniz_Fatma
by Community Manager
  • 1307 Views
  • 4 replies
  • 2 kudos

I first set up a delta live table using Python as follows.@dlt.table def transaction(): return ( spark .readStream .format("cloudFi...

I first set up a delta live table using Python as follows.@dlt.table def transaction(): return ( spark .readStream .format("cloudFiles") .schema(transaction_schema) .option("cloudFiles.format", "parquet") .load(path) )And ...

Latest Reply
RiyazAli
Valued Contributor
  • 2 kudos

@Kaniz Fatma - is the error because the partition column is being created rather than using a predefined column? I'm intrigued to know the flow of execution of the dlt script written above. So as I see, once readStream creates a df with a new colum...

3 More Replies
Ramya
by New Contributor III
  • 12584 Views
  • 6 replies
  • 3 kudos

Resolved! Databricks Rest API

Hi, I am having an issue accessing the Databricks API 2.0/workspace/mkdirs through Python. I am using the below Azure method to generate the access token. I am not sure why I am getting a 404; any suggestions? token_credential = DefaultAzureCredential() sc...

Latest Reply
Ramya
New Contributor III
  • 3 kudos

Yes, that is correct! It worked. Thanks.

5 More Replies
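Since the accepted fix isn't shown in the excerpt, here is a hedged sketch of calling the endpoint directly; a 404 from this API is frequently a malformed URL (missing the `/api` prefix or a doubled slash when joining host and path). HOST is a placeholder workspace URL.

```python
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder

# Join host and path carefully: the /api prefix is required, and a doubled
# slash (HOST ending in "/" plus a leading "/") can also produce a 404.
url = f"{HOST}/api/2.0/workspace/mkdirs"

# With a valid AAD token (e.g., from DefaultAzureCredential) you would call:
# import requests
# resp = requests.post(
#     url,
#     headers={"Authorization": f"Bearer {token}"},
#     json={"path": "/Shared/new_dir"},
# )
# resp.raise_for_status()
```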
Dineshkumar_Raj
by New Contributor
  • 2237 Views
  • 2 replies
  • 1 kudos

Why the job running time and command execution time do not match in Databricks

I have an Azure Databricks job that is triggered via ADF using an API call. I want to see why the job has been taking n minutes to complete the tasks. In the job execution results, the job execution time says 15 mins and the individual cells/commands d...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @DineshKumar​ Does @Prabakar Ammeappin​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more help. Cheers!

1 More Replies
abaschkim
by New Contributor II
  • 1732 Views
  • 4 replies
  • 0 kudos

Delta Lake table: large volume due to versioning

I have set up a Spark standalone cluster and use Spark Structured Streaming to write data from Kafka to multiple Delta Lake tables - simply stored in the file system. So there are multiple writes per second. After running the pipeline for a while, I ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Kim Abasch​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

3 More Replies
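For context, Delta Lake never deletes old versions on its own; VACUUM removes data files older than the retention window, and frequent small writes also benefit from OPTIMIZE to compact files. A minimal sketch (the table path and retention period are assumptions to adjust):

```python
table_path = "/data/delta/events"   # hypothetical table location
retention_hours = 168               # 7 days, the default minimum retention

# Build the VACUUM statement; shorter retention requires also disabling
# the spark.databricks.delta.retentionDurationCheck.enabled safety check.
vacuum_sql = f"VACUUM delta.`{table_path}` RETAIN {retention_hours} HOURS"

# In a Spark session with Delta Lake configured:
# spark.sql(vacuum_sql)
```

Retention should stay longer than the oldest reader or time-travel query you need, since VACUUM permanently removes the files backing old versions.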
KumarShiv
by New Contributor III
  • 4179 Views
  • 5 replies
  • 11 kudos

Resolved! Databricks Issue:- assertion failed: Invalid shuffle partition specs:

I have a complex script which consumes more than 100 GB of data and does some aggregation on it, and at the end I simply try to write/display data from the DataFrame. Then I get the error (assertion failed: Invalid shuffle partition specs: ). Please hel...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

Please use display(df_FinalAction). Spark is lazily evaluated, but "display" is not, so you can debug by displaying each dataframe at the end of each cell.

4 More Replies
Constantine
by Contributor III
  • 1990 Views
  • 4 replies
  • 3 kudos

Error when writing dataframe to s3 location using PySpark

I get an error when writing a dataframe to an S3 location: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your... I have gone through all the columns and none of them have any special characters. Any idea how to fix this?

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @John Constantine, We haven’t heard from you on the last response from @Emilie Myth, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Other...

3 More Replies
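The rejected set is literally " ,;{}()\n\t=", and invisible whitespace (tabs, newlines, trailing spaces) in headers is a common culprit when the names look clean. A sanitizing sketch that rewrites every column name before the write:

```python
import re

def sanitize(name: str) -> str:
    # Replace each character the Delta/Parquet writer rejects with "_".
    return re.sub(r"[ ,;{}()\n\t=]", "_", name)

# PySpark usage (df assumed to be the dataframe being written):
# df = df.toDF(*[sanitize(c) for c in df.columns])
# df.write.format("delta").save("s3://bucket/path")  # hypothetical path
```

Printing repr(c) for each column name is a quick way to reveal any hidden characters the eye misses.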
Joe_C
by New Contributor
  • 1123 Views
  • 3 replies
  • 0 kudos

From what I'm seeing Databricks doesn't have DECLARE function, how can I ... ?

How can I rewrite this statement in a way that is compatible with Databricks? DECLARE @DATE_BEGIN_TEST AS DATE = DATEADD(DAY, -60, GETDATE()); DECLARE @DATE_END_TEST AS DATE = GETDATE();

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Joseph Collins, We haven’t heard from you on the last response from me, and I was checking back to see if you have a resolution yet. If you have any solution, please do share it with the community as it can be helpful to others. Otherwis...

2 More Replies
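One common translation (an assumption, not the thread's confirmed answer) is to compute the values in Python and interpolate them into the SQL; newer Databricks Runtimes also add a SQL `DECLARE VARIABLE` statement, though version support should be checked.

```python
from datetime import date, timedelta

date_end_test = date.today()                          # GETDATE()
date_begin_test = date_end_test - timedelta(days=60)  # DATEADD(DAY, -60, ...)

# "my_table" and "event_date" are hypothetical names for illustration.
query = (
    "SELECT * FROM my_table "
    f"WHERE event_date BETWEEN '{date_begin_test}' AND '{date_end_test}'"
)
# spark.sql(query)
```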
Reza
by New Contributor III
  • 1711 Views
  • 3 replies
  • 1 kudos

Can we order the widgets in Databricks?

I am trying to order the way that widgets are shown in Databricks, but I cannot. For example, I have two text widgets (start date and end date). Databricks shows "end_date" before "start_date" on top, as the default order is alphabetical. Obviously, ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Reza Rajabi, We haven’t heard from you on the last response from @Prabakar Ammeappin, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please do share that with the community as it can be helpfu...

2 More Replies
blakedwb
by New Contributor III
  • 5171 Views
  • 4 replies
  • 1 kudos

Resolved! How to Incorporate Historical Data in Delta Live Pipeline?

Now that delta live pipeline is GA we are looking to convert our existing processes to leverage it. One thing that remains unclear is how to populate new delta live tables with historical data? Currently we are looking to use CDC by leveraging create...

Latest Reply
blakedwb
New Contributor III
  • 1 kudos

@Kaniz Fatma Hello, sorry for the delayed response. The guide does not answer how to incorporate existing delta tables that contain historical data into a delta live pipeline. We ended up changing the source data to pull from the existing bronze t...

3 More Replies
