Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

WAHID
by New Contributor II
  • 227 Views
  • 1 replies
  • 0 kudos

GDAL on Databricks serverless compute

I am wondering if it's possible to install and use GDAL on Databricks serverless compute. I couldn't manage to do that using pip install gdal, and I discovered that init scripts are not supported on serverless compute.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @WAHID, GDAL (Geospatial Data Abstraction Library) is a powerful open-source library for working with geospatial data. While installing it directly using pip install gdal might not work on Databricks serverless compute, you can try this: In your ...

KosmaS
by New Contributor III
  • 350 Views
  • 1 replies
  • 1 kudos

Resolved! Lost Databricks' dependency in a job.

Hey, I had a stable notebook within a job. It contains one action, which dumps data to S3. Recently, it started generating some issues. Maybe someone can suggest either how to investigate it further or what to try to do with such kinds ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @KosmaS, AQE can affect the execution plan of queries. If you notice unexpected changes in query behavior after enabling AQE, consider disabling it temporarily to verify if it’s causing the issue.

tomph
by New Contributor II
  • 543 Views
  • 4 replies
  • 2 kudos

Cannot read from view if no access to underlying table

Hi, I created a view my_view in a schema project_schema in Unity Catalog catalog_dev that is a select * from a table my_table in my common_schema in the same catalog. I gave a service principal full grants on the project_schema. It is an owner of the sc...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @tomph, Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback no...

3 More Replies
Yyyyy
by New Contributor III
  • 492 Views
  • 4 replies
  • 2 kudos

Showing only a limited number of lines from the CSV file

Expected number of lines: 16400. Showing only 20 records.
Script:
spark.conf.set("REDACTED", "REDACTED")
# File location
file_location = "REDACTED"
# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema",...

Latest Reply
Yyyyy
New Contributor III
  • 2 kudos

Hi, please take a look and help me.
spark.conf.set("REDACTED", "REDACTED")
# File location
file_location = "REDACTED"
# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",")...

3 More Replies
Sadam97
by New Contributor
  • 383 Views
  • 1 replies
  • 0 kudos

Error: the Service Account Key in storage credential is not configured correctly

We run Databricks on GCP. Streaming jobs run 24/7, and storage credentials and external locations are created because we use managed Unity Catalog. We get a random error around midnight (UTC). Here is the error trace: ERROR MicroBatchExe...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Sadam97, could you please ensure that the Service Account Key associated with your Databricks workspace has the necessary permissions to access the storage resources (such as Google Cloud Storage or other external storage)? Please verify that th...

Sudharsan24
by New Contributor II
  • 712 Views
  • 3 replies
  • 2 kudos

Job aborted stage failure java.sql.SQLRecoverableException: IO Error: Connection reset by peer

While ingesting data from Oracle to Databricks (writing into ADLS) using JDBC, I get a connection reset by peer error when ingesting a large table with millions of rows. I am using Oracle SQL Developer and Azure Databricks. I tried every way li...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Sudharsan24, could you please consider adjusting connection pool settings (e.g., connection timeout, maximum connections) if applicable? You mentioned using partition columns, lower and upper bounds, and incremental loading. These are good practic...

2 More Replies
8b1tz
by Contributor
  • 2609 Views
  • 24 replies
  • 6 kudos

Resolved! ADF logs into Databricks

Hello, I would like to know the best way to insert Data Factory activity logs into my Databricks Delta table, so that I can use dashboards and create monitoring in Databricks itself. Can you help me? I would like every 5 minutes for all activity logs ...

Latest Reply
jacovangelder
Honored Contributor
  • 6 kudos

How fancy do you want to go? You can send ADF diagnostic settings to an event hub and stream them into a Delta table in Databricks. Or you can send them to a storage account and build a workflow with a 5-minute interval that loads the storage blob into...

23 More Replies
gazzyjuruj
by Contributor II
  • 8652 Views
  • 5 replies
  • 9 kudos

Cluster start is currently disabled?

Hi, I'm trying to run the notebooks, but nothing happens. I had to create a cluster in order to run my code. Pressing the play button inside the notebook does nothing at all. And under 'Compute', pressing play on the clusters gives the e...

Latest Reply
mrp12
New Contributor II
  • 9 kudos

This is a very common issue I see with Community Edition. I suppose the only workaround is to create a new cluster each time. More info on Stack Overflow: https://stackoverflow.com/questions/69072694/databricks-community-edition-cluster-wont-start

4 More Replies
Sathish_itachi
by New Contributor III
  • 4894 Views
  • 19 replies
  • 16 kudos

Resolved! Encountering an error while accessing dbfs root folders

dbfs file browser storagecontext com.databricks.backend.storage.storagecontexttype$dbfsroot$@4155a7bf for workspace 2818007466707254 is not set in the customerstorageinfo. The above is the error displayed on the UI.

Latest Reply
Kaniz_Fatma
Community Manager
  • 16 kudos

This was a global issue and this is fixed now!

18 More Replies
lei_armstrong
by New Contributor II
  • 8198 Views
  • 8 replies
  • 8 kudos

Resolved! Executing Notebooks - Run All Cells vs Run All Below

Due to dependencies, if one of our cells errors then we want the notebook to stop executing. We've noticed some odd behaviour when executing notebooks depending on if "Run all cells in this notebook" is selected from the header versus "Run All Below"....

Latest Reply
sukanya09
New Contributor II
  • 8 kudos

Has this been implemented? I have created a job using a notebook. My notebook has 6 cells, and if the code in the first cell fails it should not run the rest of the cells.

7 More Replies
Coders
by New Contributor II
  • 2072 Views
  • 5 replies
  • 0 kudos

Feedback on the data quality and consistency checks in Spark

I'm seeking validation from experts regarding the data quality and consistency checks we're implementing as part of a data migration using Spark and Databricks. Our migration involves transferring data from Azure Data Lake to a different data lake. As...

Latest Reply
joarobles
New Contributor III
  • 0 kudos

Hi @Coders, I'd also consider some profiling checks for column stats and distribution just to be sure everything is consistent after the migration. Afterwards, you should consider the best practice of implementing some data quality validations on the ...

4 More Replies
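
The profiling checks suggested in the reply above (column stats and distribution compared before and after a migration) might look roughly like the sketch below. It is a minimal plain-Python illustration, not the approach used in the thread: the function names `profile_column` and `compare_profiles`, the sample values, and the tolerance are all hypothetical, and on Databricks the same aggregates would more likely be computed with Spark than over in-memory lists.

```python
import math
import statistics
from typing import Optional, Sequence


def profile_column(values: Sequence[Optional[float]]) -> dict:
    """Compute simple profile statistics for one numeric column."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "mean": statistics.fmean(non_null) if non_null else None,
    }


def compare_profiles(source: dict, target: dict, rel_tol: float = 1e-9) -> list:
    """Return the names of the statistics that differ between two profiles."""
    mismatches = []
    for key in source:
        a, b = source[key], target.get(key)
        if isinstance(a, float) and isinstance(b, float):
            # Floats are compared with a tolerance to absorb rounding noise.
            same = math.isclose(a, b, rel_tol=rel_tol)
        else:
            same = a == b
        if not same:
            mismatches.append(key)
    return mismatches


# Hypothetical sample data: the target lost one value during migration.
source_values = [10.0, 20.0, None, 40.0]
target_values = [10.0, 20.0, None, None]

mismatches = compare_profiles(
    profile_column(source_values), profile_column(target_values)
)
print(mismatches)
```

An empty list from `compare_profiles` means the two profiles agree on every statistic. A non-empty list names the statistics to investigate, which is usually a cheaper first signal than a row-by-row comparison.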
laksh
by New Contributor II
  • 2941 Views
  • 5 replies
  • 3 kudos

What kind of data quality rules can be run using Unity Catalog?

We are trying to build a data quality process at the initial file level or data ingestion level for bronze, and add more specific business times for silver and business-related aggregates for the golden layer.

Latest Reply
joarobles
New Contributor III
  • 3 kudos

Hi @laksh! You could take a look at Rudol Data Quality, it has native Databricks integration and covers both basic and advanced data quality checks. Basic checks can be configured by non-technical roles using a no-code interface, but there's also the o...

4 More Replies
William_Scardua
by Valued Contributor
  • 1454 Views
  • 2 replies
  • 1 kudos

What Data Quality Framework do you use/recommend?

Hi guys, in your opinion, what is the best Data Quality Framework (or technique) you would recommend?

Data Engineering
dataquality
Latest Reply
joarobles
New Contributor III
  • 1 kudos

Hi there! You could also take a look at Rudol, it has native Databricks support and covers Data Quality validations and Data Governance enabling non-technical roles such as Business Analysts or Data Stewards to be part of data quality as well with no-...

1 More Replies
Phani1
by Valued Contributor II
  • 5481 Views
  • 6 replies
  • 0 kudos

Data Quality in Databricks

Hi Databricks Team, we would like to implement data quality rules in Databricks. Apart from DLT, is there any standard approach to apply data quality rules on the bronze layer before proceeding to the silver and gold layers?

Latest Reply
joarobles
New Contributor III
  • 0 kudos

Looks nice! However, I don't see Databricks support in the docs.

5 More Replies
narendra11
by New Contributor
  • 664 Views
  • 5 replies
  • 1 kudos

Resolved! Getting Status code: 301 Moved Permanently error

Getting this error while running the cells: Failed to upload command result to DBFS. Error message: Status code: 301 Moved Permanently, Error message: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>PermanentRedirect</Code><Message>The bucket you ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @narendra11, This has been fixed now. Could you please confirm?  

4 More Replies
