Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

naga_databricks
by Contributor
  • 4389 Views
  • 1 reply
  • 0 kudos

Reading BigQuery data using a query

To read BigQuery data using spark.read, I'm using a query. This query executes and creates a table in the materializationDataset. df = spark.read.format("bigquery").option("query", query).option("materializationProject", materializationProject)...
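For reference, a minimal sketch of a query-based read with the Spark BigQuery connector; the project, dataset, and query values here are placeholders, and note that query reads also require viewsEnabled:

```python
# Minimal sketch of a query-based BigQuery read; project/dataset names are
# placeholders. Query reads require viewsEnabled=true plus a materialization
# dataset where the connector can write the temporary result table.
materializationProject = "my-gcp-project"         # hypothetical project
materializationDataset = "spark_materialization"  # hypothetical dataset
query = "SELECT id, amount FROM `my-gcp-project.sales.orders` LIMIT 100"

df = (
    spark.read.format("bigquery")
    .option("viewsEnabled", "true")
    .option("materializationProject", materializationProject)
    .option("materializationDataset", materializationDataset)
    .option("query", query)
    .load()
)
df.show()
```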

EDDatabricks
by Contributor
  • 1478 Views
  • 2 replies
  • 2 kudos

Appropriate storage account type for reference data (Azure)

Hello, We are using a reference dataset for our Production applications. We would like to create a Delta table for this dataset to be used by our applications. Currently, manual updates occur on this dataset through a script on a weekly basis. ...

Data Engineering
Delta Live Table
Storage account
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

+1 for ADLS. Hierarchical storage and hot/cold/premium tiers are things not possible in blob storage.

1 More Reply
irispan
by New Contributor II
  • 4650 Views
  • 4 replies
  • 1 kudos

Recommended Hive metastore pattern for Trino integration

Hi, I have several questions regarding Trino integration: Is it recommended to use an external Hive metastore or to leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino? When I tried to use ex...

Latest Reply
JunlinZeng
Databricks Employee
  • 1 kudos

> Is it recommended to use an external Hive metastore or leverage the Databricks-maintained Hive metastore when it comes to enabling external query engines such as Trino? The Databricks-maintained Hive metastore is not suggested to be used externally. ...
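For context, pointing a cluster at an external Hive metastore is done through Spark configuration; a hedged sketch with placeholder values (the JDBC URL, driver, and credentials depend on your metastore database):

```python
# Hedged sketch of cluster Spark confs for an external Hive metastore;
# all values are placeholders for your own metastore database.
spark_conf = {
    "spark.sql.hive.metastore.version": "3.1.0",
    "spark.sql.hive.metastore.jars": "maven",
    "spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:mysql://<host>:3306/metastore",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "<user>",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "<password>",
}
```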

3 More Replies
Agus1
by New Contributor III
  • 5402 Views
  • 3 replies
  • 3 kudos

Update destination table when using Spark Structured Streaming and Delta tables

I’m trying to implement a streaming pipeline that will run hourly using Spark Structured Streaming, Scala and Delta tables. The pipeline will process different items with their details. The sources are Delta tables that already exist, written hourly u...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 3 kudos

@Agus1 Could you try using CDC in Delta? You could use readChangeFeed to read only the changes that were applied to the source table. This is also explained here: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed
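A minimal sketch of the change feed read, assuming the source table has delta.enableChangeDataFeed = true (the table name is a placeholder):

```python
# Read only the changes applied to the source table; requires the table
# property delta.enableChangeDataFeed = true on the source.
changes = (
    spark.readStream.format("delta")
    .option("readChangeFeed", "true")
    .table("my_schema.source_table")  # hypothetical table name
)
# Each row carries _change_type, _commit_version and _commit_timestamp columns.
```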

2 More Replies
Eric_Kieft
by New Contributor III
  • 3125 Views
  • 2 replies
  • 1 kudos

Unity Catalog Table/View Column Data Type Changes

When changing a Delta table column data type in Unity Catalog, we noticed a view that references that table did not automatically update to reflect the new data type. Is there a way to update the Delta table column data type so that it also update...

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

Can you try refreshing the view by running the command: REFRESH TABLE <viewname>
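If the refresh alone does not pick up the new type, re-creating the view is a common fallback, since a view's schema is captured at creation time; a hedged sketch with placeholder names:

```python
# Placeholder names throughout; REFRESH picks up data changes, while
# CREATE OR REPLACE VIEW rebinds the view to the table's current schema.
spark.sql("REFRESH TABLE main.my_schema.my_view")
spark.sql("""
    CREATE OR REPLACE VIEW main.my_schema.my_view AS
    SELECT id, CAST(amount AS DECIMAL(18, 2)) AS amount
    FROM main.my_schema.my_table
""")
```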

1 More Reply
Vibhor
by Contributor
  • 5008 Views
  • 5 replies
  • 4 kudos

Resolved! Cluster Performance

Facing an issue with cluster performance; in the event log we can see that the cluster is not responsive, likely due to GC. The number of pipelines (Databricks notebooks) running and the cluster configuration are the same as before, but we started seeing this issue sin...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Vibhor Sethi, Do you see any other error messages? Did your data volume increase? What kind of job are you running?

4 More Replies
ajain80
by New Contributor III
  • 22656 Views
  • 5 replies
  • 10 kudos

Resolved! SFTP Connect

How can I connect to an SFTP server from Databricks, so that I can write files into tables directly?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

The classic solution is to copy data from FTP to ADLS storage using Azure Data Factory and, after the copy is done in the ADF pipeline, trigger the Databricks notebook.
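If a direct pull from Databricks is preferred instead, a hedged sketch using the paramiko library (host, credentials, and paths are placeholders):

```python
import paramiko

# Download a file from the SFTP server to the driver, then load it with Spark.
# Host, credentials, and paths are placeholders.
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/remote/data.csv", "/tmp/data.csv")
sftp.close()
transport.close()

# file:/ paths on the driver work for small files; copy to DBFS for large ones.
df = spark.read.option("header", "true").csv("file:/tmp/data.csv")
df.write.mode("append").saveAsTable("my_schema.sftp_data")  # hypothetical table
```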

4 More Replies
Souvikng
by New Contributor
  • 2193 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks lakehouse fundamental certification page is not working

While going to take the Databricks Lakehouse Fundamentals certification quiz, it is showing that I don't have permission to access it. Could you please guide me on that, or provide a link through which I can get permission to access it and complet...

Latest Reply
APadmanabhan
Databricks Employee
  • 0 kudos

Hi @Souvikng, here is the link for the Accreditation.

parimalpatil28
by New Contributor III
  • 4685 Views
  • 1 reply
  • 0 kudos

How to Get spark history server logs using python REST API

Hello, I am trying to load logs from a Spark job in a remote location using the Python REST API. I want to collect those logs for particular job runs using the runID field; the logs should contain errors, exceptions, and print details. I have tried "/api/2.1/jobs/r...
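For what it's worth, run-level output (including the error field) is available from the Jobs API runs/get-output endpoint; a hedged sketch with placeholder host, token, and run_id:

```python
import requests

# Fetch a run's output via the Jobs 2.1 REST API; host, token, and run_id
# are placeholders. The response can include "error" and "logs" fields.
host = "https://<workspace>.cloud.databricks.com"
resp = requests.get(
    f"{host}/api/2.1/jobs/runs/get-output",
    headers={"Authorization": "Bearer <personal-access-token>"},
    params={"run_id": 123456},
)
resp.raise_for_status()
output = resp.json()
print(output.get("error"), output.get("logs"))
```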

Latest Reply
parimalpatil28
New Contributor III
  • 0 kudos

Hello, how can I access the endpoint (HTTP URL) of the Spark history server for a particular cluster? Thanks, Parimal

erigaud
by Honored Contributor
  • 8302 Views
  • 2 replies
  • 2 kudos

Resolved! Cannot delete file in dbfs

Hello, I am trying to delete a folder in /dbfs/mnt, but I am unable to do so. The folder was an old mounted storage account, which was deleted. The folder contains a single file 'mount.err'. On the command line I tried rm -rf my_folder, with and without s...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 2 kudos

Hi @erigaud The issue you're facing might be because the folder was a mounted storage account that has since been deleted. This can cause inconsistencies in the file system view, and hence you're unable to delete the folder. You can try...
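A hedged sketch of the usual cleanup steps for a stale mount (the path is a placeholder):

```python
# If the path still shows up in dbutils.fs.mounts(), unmount it first;
# then remove the leftover folder. The path is a placeholder.
dbutils.fs.unmount("/mnt/my_folder")
dbutils.fs.rm("dbfs:/mnt/my_folder", True)  # True = recursive
```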

1 More Reply
brendanc19
by New Contributor III
  • 3792 Views
  • 3 replies
  • 1 kudos

Resolved! Running DBT via Jobs Compute

Is it possible to run a DBT job using Jobs Compute? If not, is there a reason for that, and are there any plans to change it?

Latest Reply
User16502773013
Databricks Employee
  • 1 kudos

Hello @brendanc19, Running DBT with a Jobs compute cluster is actually supported. You will need to create a job task with type DBT. Kindly check the documentation here for a step-by-step guide on how to run DBT transformations in Databricks Jobs. Regards
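As a rough illustration, a Jobs API 2.1 payload with a dbt task might look like the sketch below; the repo URL, warehouse id, and cluster spec are placeholders, not a tested configuration:

```python
# Hedged sketch of a Jobs 2.1 job with a dbt task on jobs compute;
# repo URL, warehouse id, and cluster spec are placeholders.
job_payload = {
    "name": "dbt-on-jobs-compute",
    "git_source": {
        "git_url": "https://github.com/org/dbt-project.git",  # hypothetical repo
        "git_provider": "gitHub",
        "git_branch": "main",
    },
    "tasks": [
        {
            "task_key": "dbt_run",
            "dbt_task": {
                "commands": ["dbt deps", "dbt run"],
                "warehouse_id": "<sql-warehouse-id>",
            },
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 1,
            },
            "libraries": [{"pypi": {"package": "dbt-databricks"}}],
        }
    ],
}
```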

2 More Replies
chiaflute
by New Contributor
  • 1923 Views
  • 1 reply
  • 0 kudos

Error creating repo

I get this error when trying to clone a repo using sparse checkout mode: "Error creating repo: Git operation is terminated because of out of memory or CPU; possibly the repository is too big. Need help? See our Repos Limitations and FAQs docs." How...

Latest Reply
User16752239289
Databricks Employee
  • 0 kudos

From the error message, you could be exceeding one of the limits below:
- Working branches are limited to 200 MB.
- Individual files are limited to 200 MB.
- Files larger than 10 MB can't be viewed in the Databricks UI.
Databricks recommends that in a repo: The total number of ...
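If only part of the repo is needed, the Repos API accepts sparse checkout patterns at creation time, which keeps the clone small; a hedged sketch with placeholder host, token, and repo values:

```python
import requests

# Create a repo with sparse checkout patterns via the Repos API so only the
# listed directories are cloned; all values are placeholders.
resp = requests.post(
    "https://<workspace>/api/2.0/repos",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "url": "https://github.com/org/big-repo.git",
        "provider": "gitHub",
        "path": "/Repos/me/big-repo",
        "sparse_checkout": {"patterns": ["src/pipelines", "dbt"]},
    },
)
resp.raise_for_status()
```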

sanjay
by Valued Contributor II
  • 11091 Views
  • 8 replies
  • 3 kudos

How to stop continuous running streaming job over weekend

I have a continuously running streaming job that I would like to stop over the weekend and start again on Monday. Here is my streaming job code: (spark.readStream.format("delta").load(input_path).writeStream.option("checkpointLocation", input_checkpoint_p...
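One common pattern (a hedged sketch, with placeholder paths) is to switch to the availableNow trigger and let a weekday-only job schedule decide when it runs; each run drains the backlog and exits on its own:

```python
# Hedged sketch: an availableNow trigger processes everything available and
# then stops, so a weekday-only job schedule keeps weekends idle.
input_path = "/mnt/source"                         # placeholder, as in the post
input_checkpoint_path = "/mnt/checkpoints/source"  # placeholder

query = (
    spark.readStream.format("delta")
    .load(input_path)
    .writeStream
    .option("checkpointLocation", input_checkpoint_path)
    .trigger(availableNow=True)
    .toTable("my_schema.target_table")  # hypothetical target
)
query.awaitTermination()  # returns once the backlog is drained
```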

Latest Reply
NDK
New Contributor II
  • 3 kudos

@sanjay Any luck on that? I am also looking for a solution to the same issue.

7 More Replies
