Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
Hi, I am just getting started in Databricks and would appreciate some help here. I have 10TB of TPC-DS data in S3 in a Hive partition structure. My goal is to benchmark a Databricks cluster on this data. After setting all IAM credentials according to this https://doc...
Hi Expert, how can we set up multiple notebooks in a sequential order in a flow? For example, one pipeline has Notebook1 (sequence 1) and Notebook2 (sequence 2), within a single pipeline.
Not sure how to approach your challenge, but something you can do is to use the Databricks Job Scheduler, or if you want an external solution in Azure, you can call several notebooks from Data Factory.
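The Job Scheduler approach above can be sketched as a Jobs API 2.1 payload, where depends_on enforces the execution order; the job name, task keys, and notebook paths below are hypothetical:

```python
import json

# Sketch of a Databricks Jobs API 2.1 payload running two notebooks in order.
# Job name, task keys, and notebook paths are hypothetical.
job_payload = {
    "name": "sequential-notebook-pipeline",
    "tasks": [
        {
            "task_key": "notebook1",
            "notebook_task": {"notebook_path": "/Workspace/pipeline/notebook1"},
        },
        {
            "task_key": "notebook2",
            # depends_on makes notebook2 start only after notebook1 succeeds
            "depends_on": [{"task_key": "notebook1"}],
            "notebook_task": {"notebook_path": "/Workspace/pipeline/notebook2"},
        },
    ],
}

print(json.dumps(job_payload, indent=2))
```

A body like this would be POSTed to the /api/2.1/jobs/create endpoint; the same depends_on pattern extends to any number of notebooks in sequence.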
Hello. After upgrading my cluster from DBR 12 to 14.1, I got a MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION error on some of my joins:

df1.join(
    df2,
    [df1["name"] == df2["name"], df1["age"] == df2["age"]],
    'left_outer'
)

I resolved it by...
Hello community! I'm currently working on Spark scripts for data processing and facing some performance challenges. Any tips or suggestions on optimizing code for better efficiency? Your expertise is highly appreciated! Thanks.
I was trying to push an .ipynb file from Azure Databricks to GitHub, and it appears that the original file is converted to source code as .py. Why does Databricks do this, and how can I control which files are converted and which are not? I need to keep some files as .ipynb. Thanks...
Hi, I'm trying to call the DLT API to kick off my Delta Live Tables flow with a Web API call block from Azure Data Factory. I have two environments: one DEV and one PROD. The DEV environment works fine, the response is giving me the update_id, but the PR...
I have a use case: call a REST API and then return a response file as base64. Is it possible to save the response directly to ADLS without converting it to a file first?
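A minimal sketch of the decode step: base64.b64decode turns the response payload straight into bytes, so no intermediate conversion is needed before uploading. The response shape, file name, and destination path here are assumptions:

```python
import base64

# Hypothetical API response carrying a base64-encoded file payload.
response_json = {
    "file_name": "report.pdf",
    "content": base64.b64encode(b"%PDF-1.4 demo").decode("ascii"),
}

# Decode straight to bytes: no intermediate file is required.
raw_bytes = base64.b64decode(response_json["content"])

# On Databricks the bytes could then be written directly to an ADLS location,
# e.g. via a path mounted under /dbfs, or with the azure-storage-file-datalake
# SDK's DataLakeFileClient.upload_data(raw_bytes, overwrite=True).
# The local path below stands in for that destination:
with open("/tmp/report.pdf", "wb") as f:
    f.write(raw_bytes)
```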
Hi. Recently, I've found that Databricks is really slow when editing notebooks, such as adding cells, copying and pasting text or cells, etc. It's just the past few weeks actually. I'm using Chrome version 118.0.5993.118. Everything else with Chrome ...
It seems related to the notebook length (number of cells). The notebook that was really slow had about 40-50 cells, which I've done before without issue. Anyway, after starting a new notebook using Chrome, it seems usable again. So without a specific...
I have installed the graphframes library from the Maven repository on the cluster (13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12), Standard DS4_v2). The library that I have installed is graphframes:graphframes:0.8.3-spark3.5-s_2.13. I can import the graph...
Unable to run query: INSERT INTO with a 'Nan' value in the SQL Editor.
Query: Insert into ABC with values('xyz', 123, Nan);
Error: org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation cannot be cast to org.apache.spark.sql.execution.datasources...
Hi all, in a Lakeview Dashboard I would like to visualize some Delta table info returned from the SQL statement 'DESCRIBE DETAIL'. In the 'Data' tab, the dataset that contained that statement returned all the detail info of my Delta table. But the visualiz...
Hi Team, I'm trying to connect to the system catalog (the system.billing.usage table; Unity Catalog is enabled) in my workspace from Tableau. I'm using Tableau version 2023.1 and ODBC driver version 2.7.5.1012-osx. I was able to create a connection, but when I'm conn...
Hi @Cert-Team, my test got suspended without any reason. The support person had secured the area and I had also shown my entire desk, but they still suspended my test. This is a huge loss; please help me and reschedule my exam. I have also tried to raise a t...
Hi @Cert-Team, I'm still not able to raise a ticket; after submitting the info for the ticket I'm not receiving any confirmation mail. Please help me reschedule the exam, as this suspension was done without any reason and I haven't done anything whic...
Hello, I'm having an issue with using a window in PySpark. I created a simple example that reproduces the error; basically I want to define a window with a dynamic size (say, from another column) instead of a fixed value. from pyspark.sql import SparkSessio...