Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RobDineen
by Contributor
  • 1020 Views
  • 4 replies
  • 0 kudos

Resolved! Pyspark to_date not coping with single digit Day or Month

Hi there, I have a simple PySpark to_date function but it fails for days or months from 1-9. So is there a nice easy way to get round this at all? Regards, Rob

RobDineen_0-1731324661487.png
Latest Reply
RobDineen
Contributor
  • 0 kudos

Resolved using format_string:
dff = df.withColumn("DayofMonthFormatted", when(df.DayofMonth.isin([1,2,3,4,5,6,7,8,9]), format_string("0%d", df.DayofMonth)).otherwise(df.DayofMonth))

3 More Replies
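The zero-padding the accepted answer performs can be sketched in plain Python (pad_two is a hypothetical helper, not part of any API):

```python
def pad_two(day_of_month: int) -> str:
    """Left-pad a 1-2 digit day or month to two digits, e.g. 3 -> "03"."""
    return f"{day_of_month:02d}"

print(pad_two(3))   # 03
print(pad_two(12))  # 12
```

In PySpark itself, lpad(col, 2, "0") does the same padding, and a to_date pattern using single-letter fields such as "M/d/yyyy" accepts one- or two-digit components, which can avoid the padding step entirely.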
Avinash_Narala
by Valued Contributor II
  • 1503 Views
  • 2 replies
  • 2 kudos

Fully serverless databricks SaaS

I'm exploring Databricks' fully serverless SaaS option, as shown in the attached image, which promises quick setup and $400 in initial credits. I'm curious about the pros and cons of using this fully serverless setup. Specifically, would this option b...

Latest Reply
gchandra
Databricks Employee
  • 2 kudos

There are; if you have Spark configs, custom JARs, or init scripts, they won't work. Please check this page for the long list of limitations: https://docs.databricks.com/en/compute/serverless/limitations.html

1 More Replies
rcostanza
by New Contributor III
  • 999 Views
  • 4 replies
  • 2 kudos

Resolved! Changing git's author field when committing through Databricks

I have a Git folder linked to a Bitbucket repo. Whenever I commit something, the commit uses my Bitbucket username (the unique name) in the author field, making it less readable when I'm reading a list of commits. For example, commits end up like this: commi...

Latest Reply
yermulnik
New Contributor II
  • 2 kudos

Just found us suffering from the same issue since we enforced a GitHub ruleset to require commit emails to match our Org email pattern of `*@ourorgdomain.com`.

3 More Replies
dfish8124
by New Contributor II
  • 586 Views
  • 1 replies
  • 1 kudos

Streaming IoT Hub Data Using Delta Live Table Pipeline

Hello, I'm attempting to stream IoT Hub data using a Delta Live Tables pipeline. The issue I'm running into is that Event Hubs streaming isn't supported on shared clusters ([UNSUPPORTED_STREAMING_SOURCE_PERMISSION_ENFORCED] Data source eventhubs is not ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @dfish8124, is it possible to share the code with us?

Subhasis
by New Contributor III
  • 672 Views
  • 2 replies
  • 0 kudos

Small JSON files issue: taking 2 hours to read 3000 files

Hello, I am trying to read 3000 JSON files, each of which has only one record. It is taking 2 hours to read all the files. How can I perform this operation faster? Please suggest.

Latest Reply
Subhasis
New Contributor III
  • 0 kudos

This is the code:
df1 = spark.read.format("json").options(inferSchema="true", multiLine="true").load(file1)

1 More Replies
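With inferSchema, Spark must open every one of the 3000 files just to work out the schema before the real read starts. One common remedy for the small-files problem is to consolidate the single-record files into one JSON Lines file so any reader makes a single pass; a stdlib sketch (directory layout and function name are assumptions, not from the thread):

```python
import json
from pathlib import Path

def consolidate(src_dir: str, dest_file: str) -> int:
    """Append each small file's record as one line of JSONL; return the count."""
    count = 0
    with open(dest_file, "w") as out:
        for path in sorted(Path(src_dir).glob("*.json")):
            record = json.loads(path.read_text())
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

In PySpark the analogous fix is to pass an explicit schema, e.g. spark.read.schema(my_schema).json("dir/*.json"), so the inference pass over all the files is skipped; multiLine="true" is only needed when a single record spans multiple lines, and it forces per-file parsing.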
MarkV
by New Contributor III
  • 437 Views
  • 2 replies
  • 0 kudos

DLT Runtime Values

When my pipeline runs, I have a need to query a table in the pipeline before I actually create another table. I need to know the target catalog and target schema for the query. I figured the notebook might run automatically in the context of the cata...

Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

Can you set up notebook parameters and pass them to the DLT pipeline? https://docs.databricks.com/en/jobs/job-parameters.html

1 More Replies
NemesisMF
by New Contributor II
  • 631 Views
  • 4 replies
  • 2 kudos

Obtain refresh mode from within Delta Live Table pipeline run

Is it possible to somehow determine, from within a notebook running in the pipeline, whether a DLT pipeline run is executing in full refresh or incremental mode? I looked into the pipeline configuration variables but could not find anything. It would be benefitial...

Latest Reply
NemesisMF
New Contributor II
  • 2 kudos

We found a solution where we no longer need to determine the refresh mode. But I still do not know how to get the current refresh mode of the current pipeline run from within a notebook that is running in the pipeline. This may would still be be...

3 More Replies
SALP_STELLAR
by New Contributor
  • 857 Views
  • 1 replies
  • 0 kudos

AzureException: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: Server failed to a

Actually my first part of the code works fine:
dbutils.widgets.text("AutoLoanFilePath", "")
inputPath = dbutils.widgets.get("AutoLoanFilePath")
# inputPath = 'SEPT_2024/FAMILY_SECURITY'
autoPath = 'dbfs:/mnt/dbs_adls_mnt/Prod_landing/' + inputPath
autoLoa...

Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

This looks like an authentication issue when trying to access Azure Blob Storage from your Databricks environment. Can you please check the storage credentials and the setup?  Consider using an Azure AD service principal with the appropriate RBAC rol...

L1000
by New Contributor III
  • 344 Views
  • 1 replies
  • 0 kudos

How to detect gap in filenames (Autoloader)

So my files arrive at the cloud storage and I have configured an Auto Loader to read these files. The files have a monotonically increasing ID in their name. How can I detect a gap and stop the DLT as soon as there is one? E.g. Auto Loader finds file1, in...

Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

It doesn't seem like this can be done through the DLT Auto Loader, particularly since you require an automatic stop without manual intervention. You can write a custom Structured Streaming job with sequence-checking logic and use foreachBatch to process i...

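The sequence-checking logic suggested in the reply can be sketched in plain Python (the file-naming scheme and helper name are assumptions based on the question's file1, file2, ... example):

```python
import re

def find_gaps(filenames):
    """Return the IDs missing between the smallest and largest ID
    embedded in the given file names, e.g. file1..file4 missing 3."""
    ids = sorted(int(re.search(r"\d+", name).group()) for name in filenames)
    present = set(ids)
    return [i for i in range(ids[0], ids[-1] + 1) if i not in present]

print(find_gaps(["file1", "file2", "file4"]))  # [3]
```

Inside foreachBatch one could run a check like this over the batch's ingested file names (combined with a stored high-water mark from previous batches) and raise an exception when the result is non-empty, which stops the stream.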
rvo19941
by New Contributor II
  • 417 Views
  • 1 replies
  • 0 kudos

Auto Loader with File Notification mode not picking up new files in Delta Live Tables pipeline

Dear all, I am developing a Delta Live Tables pipeline and use Auto Loader with file notification mode to pick up files inside an Azure storage account (which is not the storage used by the default catalog). When I full refresh the target streaming table, ...

rvo19941_0-1730383733629.png
Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

Based on the error "Invalid configuration value detected for fs.azure.account.key", the pipeline was still trying to use the account key authentication method instead of service principal au...

dipali_globant
by New Contributor II
  • 658 Views
  • 2 replies
  • 0 kudos

parsing json string value column into dataframe structure

Hi all, I have to read a Kafka payload whose value column contains a JSON string. The format of the JSON is as below: { "data": [ { "p_al4": "N/A", "p_a5": "N/A", "p_ad": "OA003", "p_aName": "Abc", "p_aFlag": true, ...(dynamic) } ] } In the data key it can ...

Latest Reply
dipali_globant
New Contributor II
  • 0 kudos

No, I don't know the elements in the JSON, so I can't define the structure.

1 More Replies
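When the keys inside data are dynamic, one option is to parse the string into generic dictionaries and discover the keys at runtime. A stdlib sketch (the payload below is a shortened version of the one in the question):

```python
import json

payload = '{"data": [{"p_ad": "OA003", "p_aName": "Abc", "p_aFlag": true}]}'

# Parse into plain dicts; unknown keys are simply carried along.
rows = json.loads(payload)["data"]
all_keys = sorted({key for row in rows for key in row})
print(all_keys)  # ['p_aFlag', 'p_aName', 'p_ad']
```

In PySpark, schema_of_json applied to a sample record can derive a schema at runtime for from_json, or the value can be parsed as a MapType column when field names vary from record to record.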
mickniz
by Contributor
  • 29871 Views
  • 8 replies
  • 19 kudos

cannot import name 'sql' from 'databricks'

I am working on a Databricks Runtime 10.4 premium cluster and, while importing sql from the databricks module, I am getting the below error: cannot import name 'sql' from 'databricks' (/databricks/python/lib/python3.8/site-packages/databricks/__init__.py). Trying...

Latest Reply
ameet9257
Contributor
  • 19 kudos

If you ever receive this kind of error after installing the correct Python package, try running the command below: dbutils.library.restartPython()

7 More Replies
15460
by New Contributor II
  • 735 Views
  • 3 replies
  • 0 kudos

Idempotency token

Hi team, I have used an idempotency token in my DAG code to avoid duplicate runs. Note: the idempotency token is given as a static value. Issue: if the DAG fails once ... because of this idempotency token, Airflow is not able to connect to Databricks ... can you please help me...

Latest Reply
15460
New Contributor II
  • 0 kudos

Hi Vivian, thanks for the response. I also feel it could be an Airflow issue, because even when I don't have a job running on the Databricks end, Airflow still points to that idempotency-token run ID and returns that error. Current version we are using: 2.2....

2 More Replies
Nid-cbs
by New Contributor III
  • 5954 Views
  • 8 replies
  • 3 kudos

Ownership change for table using SQL

It's not possible to use the ALTER TABLE tblname OWNER TO serviceprinc1 command in Azure Databricks, as this isn't supported. I was trying to set a catalog table's ownership, but it resulted in an error. How can I achieve this using a script?

Latest Reply
vjani
New Contributor III
  • 3 kudos

I was getting the same error in a Python notebook and found a typo in my SQL. Changing from ALTER TABLE table_name SET OWNER TO 'principal' to the below fixed the issue: ALTER TABLE table_name SET OWNER TO `principal`

7 More Replies
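The fix above hinges on quoting: in Databricks SQL, backticks quote an identifier, while single quotes produce a string literal. A small Python sketch of building the statement (the table and principal names are hypothetical):

```python
# Backticks mark the principal as an identifier; '...' would make it a
# string literal, which is why the original statement failed.
principal = "serviceprinc1"
stmt = f"ALTER TABLE my_table SET OWNER TO `{principal}`"
print(stmt)  # ALTER TABLE my_table SET OWNER TO `serviceprinc1`
```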
sukanya09
by New Contributor II
  • 1458 Views
  • 1 replies
  • 0 kudos

Photon is not supported for a query

(1) LocalTableScan Output [11]: [path#23524, partitionValues#23525, size#23526L, modificationTime#23527L, dataChange#23528, stats#23529, tags#23530, deletionVector#23531, baseRowId#23532L, defaultRowCommitVersion#23533L, clusteringProvider#23534] Arg...

Data Engineering
Databricks
MERGE
Photon
Latest Reply
rtreves
Contributor
  • 0 kudos

@sukanya09, any solution on this?

