Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Subhasis
by New Contributor III
  • 1452 Views
  • 2 replies
  • 0 kudos

Small JSON files issue: taking 2 hours to read 3,000 files

Hello, I am trying to read 3,000 JSON files, each of which contains only one record. It is taking 2 hours to read all the files. How can I perform this operation faster? Please suggest.

Latest Reply
Subhasis
New Contributor III
  • 0 kudos

This is the code:

    df1 = spark.read.format("json").options(inferSchema="true", multiLine="true").load(file1)
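
For anyone hitting the same bottleneck: a minimal sketch of a faster read, assuming all 3,000 files share one schema and sit under a single directory (the path and field names below are hypothetical). Supplying the schema up front skips the separate inference pass over every file, and loading the whole directory in one call lets Spark parallelize the work:

    from pyspark.sql.types import StructType, StructField, StringType, LongType

    schema = StructType([
        StructField("id", LongType()),
        StructField("payload", StringType()),
    ])

    df1 = (
        spark.read
        .schema(schema)                   # no inferSchema: one pass instead of two
        .option("multiLine", "true")      # keep only if each file is one pretty-printed object
        .json("dbfs:/path/to/json_dir/")  # one load over the whole directory
    )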

1 More Replies
NemesisMF
by New Contributor II
  • 1389 Views
  • 4 replies
  • 2 kudos

Obtain refresh mode from within Delta Live Table pipeline run

Is it possible to determine, from within a notebook running in the pipeline, whether a DLT pipeline run is executing in full refresh or incremental mode? I looked into the pipeline configuration variables but could not find anything. It would be beneficial...

Latest Reply
NemesisMF
New Contributor II
  • 2 kudos

We found a solution where we do not need to determine the refresh mode anymore. But I still do not know how to get the current refresh mode of the current pipeline run from within a notebook that is running in the pipeline. This would still be...

3 More Replies
SALP_STELLAR
by New Contributor
  • 1692 Views
  • 1 reply
  • 0 kudos

AzureException: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: Server failed to a

Actually, my first part of the code works fine:

    dbutils.widgets.text("AutoLoanFilePath", "")
    inputPath = dbutils.widgets.get("AutoLoanFilePath")
    # inputPath = 'SEPT_2024/FAMILY_SECURITY'
    autoPath = 'dbfs:/mnt/dbs_adls_mnt/Prod_landing/' + inputPath
    autoLoa...

Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

This looks like an authentication issue when trying to access Azure Blob Storage from your Databricks environment. Can you please check the storage credentials and the setup?  Consider using an Azure AD service principal with the appropriate RBAC rol...
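
A minimal sketch of the service-principal (OAuth) setup the reply suggests, assuming the credentials live in a secret scope named "kv"; the storage account name, scope, and key names are hypothetical:

    storage_account = "mystorageacct"
    suffix = f"{storage_account}.dfs.core.windows.net"

    spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
    spark.conf.set(
        f"fs.azure.account.oauth.provider.type.{suffix}",
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    )
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}",
                   dbutils.secrets.get("kv", "sp-client-id"))
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}",
                   dbutils.secrets.get("kv", "sp-client-secret"))
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{suffix}",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")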

L1000
by New Contributor III
  • 606 Views
  • 1 reply
  • 0 kudos

How to detect gap in filenames (Autoloader)

So my files arrive at the cloud storage and I have configured Auto Loader to read these files. The files have a monotonically increasing id in their name. How can I detect a gap and stop the DLT pipeline as soon as there is a gap? E.g. Auto Loader finds file1, in...

Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

It doesn't seem like this can be done through the DLT Auto Loader, particularly since you require an automatic stop without manual intervention. You can write a custom Structured Streaming job with sequence-checking logic and use foreachBatch to process i...
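
A minimal sketch of that foreachBatch approach, assuming file names look like file<N> starting at 1 (the regex, paths, and target table are hypothetical). The raised exception fails the stream as soon as the ids stop being contiguous; note the cursor below is in-memory only, so a restarted stream would need it persisted somewhere durable:

    from pyspark.sql.functions import col, regexp_extract

    expected = {"next_id": 1}  # in-memory cursor; persist in a table for real use

    def check_gaps(batch_df, batch_id):
        ids = sorted(
            int(r["fid"])
            for r in batch_df
                .select(regexp_extract("source_file", r"file(\d+)", 1).alias("fid"))
                .distinct()
                .collect()
        )
        for fid in ids:
            if fid != expected["next_id"]:
                raise ValueError(
                    f"Gap detected: expected file{expected['next_id']}, got file{fid}"
                )
            expected["next_id"] += 1
        batch_df.drop("source_file").write.mode("append").saveAsTable("bronze.files")

    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("abfss://landing@acct.dfs.core.windows.net/incoming/")
        .withColumn("source_file", col("_metadata.file_path"))  # file that produced each row
        .writeStream
        .foreachBatch(check_gaps)
        .option("checkpointLocation", "/Volumes/main/default/chk/gap_demo")
        .start()
    )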

rvo19941
by New Contributor II
  • 796 Views
  • 1 reply
  • 0 kudos

Auto Loader with File Notification mode not picking up new files in Delta Live Tables pipeline

Dear, I am developing a Delta Live Table pipeline and use Auto Loader with File Notification mode to pick up files inside an Azure storage account (which is not the storage used by the default catalog). When I full refresh the target streaming table, ...

Latest Reply
SparkJun
Databricks Employee
  • 0 kudos

Based on the error "Invalid configuration value detected for fs.azure.account.key", the pipeline was still trying to use an account key authentication method instead of service principal au...
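
A minimal sketch of passing service-principal credentials directly to Auto Loader's file notification mode instead of relying on fs.azure.account.key (the ids, secret scope, and path below are hypothetical):

    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")   # file notification mode
        .option("cloudFiles.clientId", dbutils.secrets.get("kv", "sp-client-id"))
        .option("cloudFiles.clientSecret", dbutils.secrets.get("kv", "sp-client-secret"))
        .option("cloudFiles.tenantId", "<tenant-id>")
        .option("cloudFiles.subscriptionId", "<subscription-id>")
        .option("cloudFiles.resourceGroup", "<resource-group>")
        .load("abfss://landing@otheracct.dfs.core.windows.net/input/")
    )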

dipali_globant
by New Contributor II
  • 1463 Views
  • 2 replies
  • 0 kudos

parsing json string value column into dataframe structure

Hi All, I have to read a Kafka payload which has a value column containing a JSON string. But the format of the JSON is as below: { "data": [ { "p_al4": "N/A", "p_a5": "N/A", "p_ad": "OA003", "p_aName": "Abc", "p_aFlag": true, ... (dynamic) } ] } In the data key it can ...

Latest Reply
dipali_globant
New Contributor II
  • 0 kudos

No, I don't know the elements in the JSON, so I can't define the structure.
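
Since the keys under "data" are dynamic, one hedged option is to derive the schema from a sampled message with schema_of_json and then parse with from_json. The broker, topic, and column names below are hypothetical, and the sampled message must contain the keys you care about:

    from pyspark.sql.functions import col, explode, from_json, schema_of_json

    kafka_df = (
        spark.read.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
    )

    # take one message as a schema sample
    sample = kafka_df.select(col("value").cast("string").alias("v")).first()["v"]

    parsed = (
        kafka_df
        .select(from_json(col("value").cast("string"), schema_of_json(sample)).alias("j"))
        .select(explode("j.data").alias("rec"))  # one row per element of the "data" array
        .select("rec.*")
    )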

1 More Replies
mickniz
by Contributor
  • 36492 Views
  • 8 replies
  • 19 kudos

cannot import name 'sql' from 'databricks'

I am working on a Databricks version 10.4 premium cluster, and while importing sql from the databricks module I am getting the below error: cannot import name 'sql' from 'databricks' (/databricks/python/lib/python3.8/site-packages/databricks/__init__.py). Trying...

Latest Reply
ameet9257
Contributor
  • 19 kudos

If you ever receive this kind of error after installing the correct Python package, try running the command below:

    dbutils.library.restartPython()
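
For the common case where this import comes from the Databricks SQL Connector, a minimal sketch (the `databricks-sql-connector` PyPI package is what provides `databricks.sql`; the hostname, HTTP path, and token below are hypothetical):

    # %pip install databricks-sql-connector
    # dbutils.library.restartPython()   # pick up the fresh install

    from databricks import sql

    with sql.connect(
        server_hostname="adb-1234567890123456.7.azuredatabricks.net",
        http_path="/sql/1.0/warehouses/abc123",
        access_token="<personal-access-token>",
    ) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            print(cur.fetchall())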

7 More Replies
15460
by New Contributor II
  • 1680 Views
  • 3 replies
  • 0 kudos

Idempotency token

Hi Team, I have used an idempotency token in my DAG code to avoid duplicate runs. Note: the idempotency token is given as a static value. Issue: if the DAG fails once... because of this idempotency token, Airflow is not allowing it to connect to dbx... Can you please help me...

Latest Reply
15460
New Contributor II
  • 0 kudos

Hi Vivian, thanks for the response. I also feel like it can be an Airflow issue, because even when I don't have a dbx job running at the dbx end, Airflow is still pointing to that idempotency token's run id and returns that error. The current version we are using: 2.2...
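
A minimal sketch of making the token unique per DAG run rather than static, via the Jobs 2.1 run-now endpoint (the host, job id, and how you source the DAG run id are hypothetical). A retried task reuses the same token and reattaches to the existing run, while a new DAG run triggers a fresh job run:

    import requests

    def trigger_job(host, token, job_id, dag_run_id):
        resp = requests.post(
            f"https://{host}/api/2.1/jobs/run-now",
            headers={"Authorization": f"Bearer {token}"},
            json={"job_id": job_id, "idempotency_token": dag_run_id},  # unique per DAG run
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["run_id"]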

2 More Replies
Nid-cbs
by New Contributor III
  • 11916 Views
  • 8 replies
  • 3 kudos

Ownership change for table using SQL

It's not possible to use the ALTER TABLE tblname OWNER TO serviceprinc1 command in Azure Databricks, as this isn't supported. I was trying to set a catalog table's ownership, but it resulted in an error. How can I achieve this using a script?

Latest Reply
vjani
New Contributor III
  • 3 kudos

I was getting the same error in a Python notebook, and I found a typo in my SQL. Changing from ALTER TABLE table_name SET OWNER TO 'principal' (single quotes) to the below (backticks) fixed the issue:

    ALTER TABLE table_name SET OWNER TO `principal`

7 More Replies
sukanya09
by New Contributor II
  • 2203 Views
  • 1 reply
  • 0 kudos

Photon is not supported for a query

(1) LocalTableScan Output [11]: [path#23524, partitionValues#23525, size#23526L, modificationTime#23527L, dataChange#23528, stats#23529, tags#23530, deletionVector#23531, baseRowId#23532L, defaultRowCommitVersion#23533L, clusteringProvider#23534] Arg...

Labels: Data Engineering, Databricks, MERGE, Photon
Latest Reply
rtreves
Contributor
  • 0 kudos

sukanya09 Any solution on this?

singhanuj2803
by Contributor
  • 1240 Views
  • 2 replies
  • 1 kudos

Resolved! Notebook Import Failed Due to Workspace Quota Exceeded Error

I am using Databricks Community Edition. I am writing to seek assistance regarding an issue I encountered while attempting to upload a notebook to my Databricks Community Edition Workspace. I received the following error message: It appears that I hav...

Latest Reply
singhanuj2803
Contributor
  • 1 kudos

Thanks. What is the maximum storage limit of Databricks Community Edition?

1 More Replies
Padmaja
by New Contributor II
  • 1193 Views
  • 2 replies
  • 0 kudos

Delta Live Table Pipeline Migration from one workspace to another workspace

Hi Team, can you please help me migrate a Delta Live Table pipeline from one workspace to another workspace in an automated way? Thanks, Padmaja M.

Latest Reply
Padmaja
New Contributor II
  • 0 kudos

Hi Walter, thank you for the information! Could you please guide me on where I can find more details about the idea DB-I-5757? Is there a public portal or forum where this idea is being tracked, or can I access updates on its status? Any additional in...
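
Until there is first-class support, a minimal sketch of copying a pipeline spec between workspaces with the Pipelines REST API (GET /api/2.0/pipelines/{id}, POST /api/2.0/pipelines). The hosts, tokens, and pipeline id below are hypothetical, and workspace-specific fields such as notebook paths may need rewriting before the create call:

    import requests

    SRC = {"host": "https://adb-src.azuredatabricks.net", "token": "<src-token>"}
    DST = {"host": "https://adb-dst.azuredatabricks.net", "token": "<dst-token>"}

    spec = requests.get(
        f"{SRC['host']}/api/2.0/pipelines/<pipeline-id>",
        headers={"Authorization": f"Bearer {SRC['token']}"},
        timeout=30,
    ).json()["spec"]

    spec.pop("id", None)  # ids are workspace-local; let the target assign a new one

    created = requests.post(
        f"{DST['host']}/api/2.0/pipelines",
        headers={"Authorization": f"Bearer {DST['token']}"},
        json=spec,
        timeout=30,
    )
    created.raise_for_status()
    print(created.json()["pipeline_id"])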

1 More Replies
William_Scardua
by Valued Contributor
  • 3422 Views
  • 1 reply
  • 3 kudos

How to use Pylint to check your pyspark code quality ?

Hi guys, I would like to use Pylint to check my PySpark scripts. Do you do that? Thank you!

Latest Reply
developer_lumo
New Contributor II
  • 3 kudos

Currently I am working in Databricks notebooks and have the same issue: I am unable to find a linter that is well integrated with Python, PySpark, and Databricks notebooks.
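
A minimal sketch of running Pylint over PySpark code from a notebook or CI step, with two options that quiet common PySpark false positives (the source path below is hypothetical):

    # %pip install pylint
    import subprocess

    result = subprocess.run(
        [
            "pylint",
            "--disable=invalid-name",         # Spark's short df/sc naming style
            "--generated-members=pyspark.*",  # dynamic attributes otherwise trip no-member
            "/Workspace/Repos/me/project/src/",
        ],
        capture_output=True,
        text=True,
    )
    print(result.stdout)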

ashraf1395
by Honored Contributor
  • 933 Views
  • 1 reply
  • 0 kudos

Resolved! Creating notebooks which work on both normal databricks jobs as well as dlt pipeline

We are working on automation of our Databricks ingestion. We want to write our Python scripts or notebooks such that they work in both Databricks Jobs and DLT pipelines. When I say Databricks Jobs, I mean a normal run without a DLT pipeline. How shall we wo...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello @ashraf1395, to address your goal of creating Python scripts or notebooks that work both in Databricks Jobs and Delta Live Tables (DLT) pipelines, here are some ideas. Unified Script Approach: Table Creation: As you mentioned, DLT supports two t...
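
One hedged pattern for the unified-script idea: the dlt module is only importable inside a DLT pipeline run, so its absence can signal a normal job run (the table and function names below are hypothetical):

    try:
        import dlt
        IN_DLT_PIPELINE = True
    except ImportError:
        IN_DLT_PIPELINE = False

    def build_orders_df():  # shared transformation used by both paths
        return spark.read.table("bronze.orders").where("amount > 0")

    if IN_DLT_PIPELINE:
        @dlt.table(name="silver_orders")
        def silver_orders():
            return build_orders_df()
    else:
        build_orders_df().write.mode("overwrite").saveAsTable("silver_orders")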

Dharinip
by Contributor
  • 1855 Views
  • 1 reply
  • 1 kudos

Resolved! Create a Delta Table with PK and FK constraints for a streaming source data

1. How can I create a Delta table with PK and FK constraints for streaming source data? 2. When the streaming data in the silver layer gets updated, will the Delta table also be updated? My use case is: we have streaming data in the silver layer as SCD...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

1. You can use primary key and foreign key relationships on fields in Unity Catalog tables. Primary and foreign keys are informational only and are not enforced. Foreign keys must reference a primary key in another table. You can declare primary keys...
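
A minimal sketch of declaring the informational PK/FK constraints described above on Unity Catalog tables (the catalog, schema, and column names are hypothetical; as noted, the constraints are not enforced):

    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.silver.customers (
            customer_id BIGINT NOT NULL,
            name STRING,
            CONSTRAINT customers_pk PRIMARY KEY (customer_id)
        )
    """)

    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.silver.orders (
            order_id BIGINT NOT NULL,
            customer_id BIGINT,
            CONSTRAINT orders_pk PRIMARY KEY (order_id),
            CONSTRAINT orders_customers_fk FOREIGN KEY (customer_id)
                REFERENCES main.silver.customers (customer_id)
        )
    """)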

