Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hey, we are using DLT along with SCD I via the create_target_table function. It does not actually create the table as defined, but rather a view. However, on top of the expected table we see system-generated tables, e.g. __apply_changes_*. Is there a w...
Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001CAF52B4640>: Failed to establis...
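The `host='https'` in this error suggests the client treated the URL scheme as the hostname, which can happen when the configured workspace URL carries a duplicated or malformed `https://` prefix. That diagnosis is an assumption, since the post is truncated. A minimal Python sketch for checking which hostname a configured URL actually resolves to (the function name and the example URL are hypothetical):

```python
from urllib.parse import urlparse


def check_workspace_host(url: str) -> str:
    """Return the hostname an HTTP client would connect to for `url`.

    Raises ValueError when the hostname is missing or is itself a scheme
    name, which is what the host='https' in the error above looks like.
    """
    hostname = urlparse(url).hostname
    if hostname is None or hostname in ("http", "https"):
        raise ValueError(
            f"Malformed host: {url!r} resolves to hostname {hostname!r}"
        )
    return hostname


# Hypothetical workspace URL, used only for illustration.
print(check_workspace_host("https://example.cloud.databricks.com"))
```

A value such as `https://https://example.cloud.databricks.com` would raise here, because the parsed hostname is `https`.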
I have a huge number of small files in S3, and I was going through a few blogs where people say that providing a list of files, like spark.read.csv([file1, file2, file3]), is faster than giving a directory with a wildcard. Reason: Spark actually does fi...
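For reference, the explicit-list form mentioned in the question can be sketched as below. This only shows the call shape; whether it is faster than a wildcard for a given bucket is not verified here. The helper name is hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
def read_csv_from_paths(spark, paths):
    """Read CSV files from an explicit list of paths instead of a wildcard.

    `spark` is assumed to be an existing SparkSession; `paths` is any
    iterable of file path strings (for example s3:// URIs).
    """
    path_list = [p for p in paths if p]  # drop empty entries
    if not path_list:
        raise ValueError("no input paths given")
    # DataFrameReader.csv accepts either a single path or a list of paths.
    return spark.read.csv(path_list)
```

Usage would look like `df = read_csv_from_paths(spark, [file1, file2, file3])`, with the list built beforehand from whatever listing of the bucket is already available.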
I have a workflow with a task that depends on the execution of an external application (not residing in Databricks). After the external application finishes, how can I update the status of the task to complete? Currently, the Jobs API doesn't support status updat...
Sometimes when I am running R code in a Databricks notebook I am given this error. The cell I am running fails, and my whole R 'session' seems to get screwed up. For example my stored variables disappear, and I have to re-load my packages etc. It is ...
I couldn't find it clearly explained anywhere, so I hope somebody here can shed some light on this. A few questions: 1) Where are Delta tables stored? The docs say: "Delta Lake uses versioned Parquet files to store your data in your cloud storage". So where exactly i...
Hello, My team is currently working on Azure Databricks with a mid-sized repo. When we wish to import PySpark functions and classes from other notebooks, we currently use %run <relpath>, which is less than ideal. I would like to replicate the functionalit...
Running a PySpark script, I get the following error depending on which XML I query: cannot resolve 'explode(...)' due to data type mismatch. The PySpark code:
from pyspark.sql import SparkSession
JOB_NAME = "Complex file to delimeted files transformer"
...
Hello, We are trying to load a Delta table from an Azure Data Lake Storage container into Power BI using the Databricks SQL Endpoint. We configured the SQL Workspace data to have access to the ADLS Delta table and created a view; we are able to query t...
@Marius Condescu Could you please include the Spark config below and try:
spark.hadoop.fs.azure.account.oauth.provider.type.ariaprime.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.auth.typ...
Something fun for your Friday! If you are a visual person like me, you may like this image that was recently shared in our internal Databricks Slack instance. Who else 🧡s Legos? If you have seen data all 6 ways with Databricks, give this a 🧡!
Hi, I wondered if some of you have had this issue before and how it can be solved. In a Databricks Job, we have a UBQ with a Painless script for ES. These are the options. Staging and prod have the same configuration, but Staging is failing with the ...
I am trying to do something like this but am getting an error like: Error in SQL statement: AnalysisException: Undefined function: 'DATEADD'. This function is neither a registered temporary function nor a permanent function registered in the database 'default...
DATEADD was added in DBR 10.4 and is in DBSQL current. SELECT DATEADD(HOUR,IFNULL(100, 0),current_date) AS Date_Created_Local => 2022-05-31T04:00:00.000+0000. You can also use one of these casts to turn any well-formed string into an interval: SELECT curr...
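As a sanity check on the quoted result, adding 100 hours is 4 days plus 4 hours. The short Python sketch below reproduces the arithmetic outside Spark. It assumes `current_date` was 2022-05-27 at midnight UTC, which is inferred from the quoted output rather than stated in the reply.

```python
from datetime import datetime, timedelta

# Assumption: current_date was 2022-05-27 (midnight, UTC), inferred from
# the result quoted in the reply; it is not stated in the original post.
start = datetime(2022, 5, 27)

# DATEADD(HOUR, 100, current_date) adds 100 hours to the start of that day.
result = start + timedelta(hours=100)

print(result.isoformat())  # 2022-05-31T04:00:00
```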
Hi everyone, I recently upgraded the runtime version of one of our Databricks jobs to 10.4 LTS, but pattern matching is not working as expected; the same code works in 7.3 LTS. Basically doing this and returning Left or Right: val result = spark.sql...