Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
My dashboard uses Athena as its data source because of its availability (I don't need to fire up the cluster and manually refresh the data), but it requires me to create the tables manually. I'm wondering if there is a method similar to .saveAsTable() to crea...
While connecting Databricks and Grafana, I have tried the following approaches. Install Grafana Agent on Databricks clusters from the Databricks console --> Not working, since the system is not booted with systemd as the init system. Since Spark 3 has Pro...
There is a repo with a Prometheus gateway: https://gist.github.com/Lowess/3a71792d2d09e38bf8f524644bbf8349 In the community, we usually use DataDog, as the two play nicely together: https://docs.datadoghq.com/integrations/databricks/?tabs=driveronly
Hi, I was going through this session https://tinyurl.com/databrickshcare but the link to the notebook on the slides is broken. Can you fix it and share the link so I can try these notebooks? This is the notebook link mentioned in the slides: https:...
To compile the Python scripts in Azure notebooks, we are using the magic command %run. The first parameter for this command is the notebook path. Is it possible to supply that path in a variable (we have to construct this path dynamically during the ...
@Thushar R I don't think it is possible to pass the notebook path in a variable and run it with %run. I believe you can make use of notebook workflows. Notebook workflows are a complement to %run https://docs.databricks.com/notebooks/notebook-workfl...
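A minimal sketch of the notebook-workflow approach suggested above, assuming the code runs inside a Databricks notebook where `dbutils` is available. The helper names `build_notebook_path` and `run_notebook`, and the folder and notebook names in the usage comment, are hypothetical and only illustrate how a path built at runtime could be passed to `dbutils.notebook.run`.

```python
import posixpath


def build_notebook_path(base_dir, *parts):
    """Join workspace path segments with forward slashes."""
    return posixpath.join(base_dir, *parts)


def run_notebook(path, timeout_seconds=600, arguments=None, runner=None):
    """Run a notebook by path and return its exit value.

    `runner` defaults to dbutils.notebook.run, which is only defined
    inside a Databricks notebook; pass a stand-in when testing elsewhere.
    """
    if runner is None:
        runner = dbutils.notebook.run  # noqa: F821 (Databricks-provided global)
    return runner(path, timeout_seconds, arguments or {})


# Inside a Databricks notebook (hypothetical folder and notebook names):
# path = build_notebook_path("/Shared/etl", "dev", "load_orders")
# result = run_notebook(path, arguments={"run_date": "2022-01-01"})
```

Unlike %run, a notebook started this way executes in its own context, so functions and variables it defines are not imported into the calling notebook; only the value it passes to `dbutils.notebook.exit` comes back.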
Hey. I'm working on a project where I'd like to be able to view and play around with the spark cluster metrics. I'd like to know what the utilization % and max values are for metrics like CPU, memory and network. I've tried using some open source sol...
Hey @Kaniz Fatma, I appreciate the suggestions and will be looking into them. I haven't gotten to it yet, so I didn't want to say whether they worked for me or not. Since I'm looking to avoid solutions like DataDog, I'll be checking out the Prometh...
So I have two partition columns defined for this Delta table. One is year ('GJHAR'), which contains year values, and the other is a string column ('BUKS') with around 124 unique values. However, there is one problem with the second partition column ('BUKS'): the values ...
@nafri A, to make sure I understand correctly: if you partition the table with only numeric data in BUKS, new incoming data cannot be added if it contains a string, but the other way around it does work? Could it be that Spark has inferred the co...
Hi, I am having an issue accessing the Databricks API 2.0/workspace/mkdirs through Python. I am using the Azure method below to generate the access token. I am not sure why I am getting a 404. Any suggestions? token_credential = DefaultAzureCredential() sc...
I have an Azure Databricks job that is triggered via ADF using an API call. I want to see why the job has been taking n minutes to complete the tasks. In the job execution results, the job execution time says 15 mins and the individual cells/commands d...
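One thing worth checking for a 404 is the request URL itself. The sketch below only builds the request so that the URL, method, and body can be inspected before sending; it assumes the endpoint lives under the `/api/` prefix of the workspace URL and that the token is obtained as in the post. The helper name `build_mkdirs_request`, the workspace URL, and the folder path are hypothetical, and this is not necessarily the cause of the error described above.

```python
import json
import urllib.request


def build_mkdirs_request(workspace_url, token, path):
    """Build (but do not send) a POST request for workspace/mkdirs."""
    # The REST endpoint is expected under /api/ on the workspace host.
    url = workspace_url.rstrip("/") + "/api/2.0/workspace/mkdirs"
    body = json.dumps({"path": path}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical workspace URL and folder path, for illustration only.
request = build_mkdirs_request(
    "https://adb-1234567890123456.7.azuredatabricks.net/",
    "<access-token>",
    "/Shared/demo",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```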
Hey there @DineshKumar​ Does @Prabakar Ammeappin​'s response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more help. Cheers!
I have set up a Spark standalone cluster and use Spark Structured Streaming to write data from Kafka to multiple Delta Lake tables - simply stored in the file system. So there are multiple writes per second. After running the pipeline for a while, I ...
Hey there @Kim Abasch​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....
I have a complex script that consumes more than 100 GB of data and runs some aggregations on it; at the end I simply try to write/display data from the DataFrame. Then I get this issue (assertion failed: Invalid shuffle partition specs: ). Pls hel...
I get an error when writing a DataFrame to an S3 location: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your... I have gone through all the columns and none of them have any special characters. Any idea how to fix this?
I got this error when I was running a query given to me, and the author didn't have aliases on aggregates. Something like: sum(dollars_spent) needed an alias: sum(dollars_spent) as sum_dollars_spent
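The alias fix above can also be applied in bulk. This is a rough sketch, assuming the offending names come from unaliased aggregates such as `sum(dollars_spent)`; the helpers `find_invalid_columns` and `sanitize` are hypothetical, and the character set is copied from the error message quoted in the question.

```python
import re

# Characters listed in the error message: space , ; { } ( ) \n \t =
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def find_invalid_columns(columns):
    """Return the column names that contain any of the rejected characters."""
    return [name for name in columns if INVALID_CHARS.search(name)]


def sanitize(name):
    """Replace rejected characters with underscores and tidy the result.

    Intended only for names flagged by find_invalid_columns, since it also
    trims leading and trailing underscores.
    """
    cleaned = INVALID_CHARS.sub("_", name)
    return re.sub(r"_+", "_", cleaned).strip("_")


columns = ["customer_id", "sum(dollars_spent)"]
bad = find_invalid_columns(columns)
renamed = [sanitize(c) if c in bad else c for c in columns]
print(bad, renamed)

# With a PySpark DataFrame `df`, the same idea would be:
# bad = find_invalid_columns(df.columns)
# df = df.toDF(*[sanitize(c) if c in bad else c for c in df.columns])
```

Printing `find_invalid_columns(df.columns)` first is a quick way to see which generated names trigger the error even when the source columns look clean.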
How can I rewrite this statement in a way that is compatible with Databricks? DECLARE @DATE_BEGIN_TEST AS DATE = DATEADD(DAY, - 60, GETDATE()); DECLARE @DATE_END_TEST AS DATE = GETDATE();
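One possible approach, sketched in Python for a Databricks notebook, is to compute the two dates on the driver and interpolate them into the query text. The helper `date_window`, the table `my_table`, and the column `event_date` are hypothetical. In SQL alone, `date_sub(current_date(), 60)` and `current_date()` can stand in for the two T-SQL expressions.

```python
from datetime import date, timedelta


def date_window(days_back=60, today=None):
    """Return (begin, end) where end is today and begin is days_back earlier."""
    end = today or date.today()
    begin = end - timedelta(days=days_back)
    return begin, end


date_begin_test, date_end_test = date_window()

# my_table and event_date are placeholder names, not from the original post.
query = (
    "SELECT * FROM my_table "
    f"WHERE event_date BETWEEN DATE '{date_begin_test.isoformat()}' "
    f"AND DATE '{date_end_test.isoformat()}'"
)
print(query)
# In a notebook: spark.sql(query)
```

Note that `date.today()` uses the driver's local time, while `current_date()` in Spark SQL follows the session time zone, so the two variants can differ around midnight.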