Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Jackson1111 (New Contributor III)
  • 1454 Views
  • 3 replies
  • 1 kudos

get job detail API

Hello, is there an API for passing in a batch of run_ids to obtain job run details?

Latest Reply
mhiltner
Databricks Employee

Maybe this could help. It's not batch, but you can get the details for a single run_id: https://docs.databricks.com/en/workflows/jobs/jobs-2.0-api.html#runs-get-output

2 More Replies
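As the reply notes, the Jobs API has no batch endpoint, but you can loop over run_ids and call `runs/get` once per run. A minimal standard-library sketch (workspace URL and token below are placeholders, not real values):

```python
import json
import urllib.request

def runs_get_url(host: str) -> str:
    # The endpoint takes a single run_id; there is no batch variant.
    return f"{host.rstrip('/')}/api/2.1/jobs/runs/get"

def get_run_details(host: str, token: str, run_ids):
    """Fetch run details one run_id at a time via the Jobs API."""
    details = {}
    for run_id in run_ids:
        req = urllib.request.Request(
            f"{runs_get_url(host)}?run_id={run_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            details[run_id] = json.load(resp)
    return details

# Hypothetical usage:
# details = get_run_details("https://adb-123.azuredatabricks.net", "dapi...", [101, 102])
```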
by eva_mcmf (New Contributor II)
  • 1757 Views
  • 1 reply
  • 0 kudos

Autoloader with SQLite db files

Hi Everyone, Is it possible to ingest SQLite db files with Databricks Autoloader? Is it currently supported? If so, could you please share an example?

Data Engineering
autoloader
azure
ingestion
sqlite
Latest Reply
lucasrocha
Databricks Employee

Hello @eva_mcmf , I hope this message finds you well. As per the documentation, Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage. Auto Loader can load data files from AWS S3, Azure Data Lake Storage G...

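SQLite is not among Auto Loader's listed file formats, so one workaround (a sketch, not an officially documented pattern) is to ingest the raw `.db` files with `cloudFiles.format = binaryFile` and parse each file's bytes with Python's built-in `sqlite3` module; the table name and paths below are hypothetical:

```python
import os
import sqlite3
import tempfile

def rows_from_sqlite_bytes(content: bytes, table: str):
    """Write raw .db bytes to a temp file and read one table with sqlite3."""
    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
        f.write(content)
        path = f.name
    try:
        con = sqlite3.connect(path)
        try:
            return con.execute(f"SELECT * FROM {table}").fetchall()
        finally:
            con.close()
    finally:
        os.remove(path)

# Inside Databricks, ingest the files as binary and parse per file, e.g.:
# df = (spark.readStream.format("cloudFiles")
#       .option("cloudFiles.format", "binaryFile")
#       .load("abfss://container@account.dfs.core.windows.net/sqlite/"))
# then apply rows_from_sqlite_bytes to each row's `content` column.
```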
by Chengcheng (New Contributor III)
  • 4137 Views
  • 1 reply
  • 0 kudos

The default location of temporary files in the Azure Synapse Connector (com.databricks.spark.sqldw)

Hi everyone, I'm trying to query data in an Azure Synapse Dedicated SQL Pool according to the documentation ("Query data in Azure Synapse Analytics") using .format("com.databricks.spark.sqldw"). It says that an abfss temporary location is needed. But I found that...

Data Engineering
Azure Synapse Connector
Data Ingestion
JDBC
Latest Reply
lucasrocha
Databricks Employee

Hello @Chengcheng , I hope this message finds you well. As per the documentation, the "tempDir" parameter is required and there is no default value for it. Databricks Synapse connector options reference: https://docs.databricks.com/en/connect/ext...

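A sketch of a read that sets the required `tempDir` (the server, storage account, and query values are hypothetical; only the option names come from the connector docs — `tempDir` must point to an abfss staging location you choose, since there is no default):

```python
# Hypothetical connection values for illustration only.
synapse_options = {
    "url": "jdbc:sqlserver://myserver.sql.azuresynapse.net:1433;database=mydb",
    "tempDir": "abfss://tempdata@mystorageacct.dfs.core.windows.net/synapse-staging",
    "forwardSparkAzureStorageCredentials": "true",
    "query": "SELECT TOP 10 * FROM dbo.products",
}

# df = (spark.read.format("com.databricks.spark.sqldw")
#       .options(**synapse_options)
#       .load())
```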
by PabloCSD (Valued Contributor II)
  • 3408 Views
  • 4 replies
  • 1 kudos

Resolved! My Libraries are not being installed in dbx-pipelines

Hello, I have some libraries on Azure Artifacts, but when I'm using notebooks, they are unreachable even though I'm explicitly adding the pip extra-url option (I have validated the tokens). So I had to install them manually by downloading the wheel f...

Data Engineering
Databricks
dbx
Latest Reply
PabloCSD
Valued Contributor II

@shan_chandra We solved it; it was an issue with the DevOps key-vault token associated with the artifacts token.

3 More Replies
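For reference, a sketch of wiring an Azure Artifacts feed token from a secret scope into pip's extra index URL (the scope, key, org, and feed names are all hypothetical):

```python
def extra_index_url(org: str, feed: str, token: str) -> str:
    """Build an authenticated Azure Artifacts PyPI index URL."""
    return (
        f"https://build:{token}@pkgs.dev.azure.com/"
        f"{org}/_packaging/{feed}/pypi/simple/"
    )

# In a notebook, the token would come from a secret scope, e.g.:
# token = dbutils.secrets.get(scope="devops", key="artifacts-pat")
# %pip install my-package --extra-index-url {extra_index_url("my-org", "my-feed", token)}
```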
by AH (New Contributor III)
  • 1106 Views
  • 1 reply
  • 0 kudos

Resolved! Delta Lake Table Daily Read and Write job optimization

I have created 7 jobs, one for each business system, to extract product data from each Postgres source and then write all job data into one data lake Delta table [raw_product]. Each business system's product table has around 20 GB of data. We do the same thing for 15...

Latest Reply
shan_chandra
Databricks Employee

@AH - if the read/fetch from Postgres is slow, we can increase fetchsize and numPartitions (to increase parallelism). Kindly try a df.count() to check for slowness. https://spark.apache.org/docs/latest/sql-data-sou...

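The tuning in the reply can be sketched as JDBC read options (the host, table, credentials, and bounds are hypothetical; `partitionColumn`, `lowerBound`, and `upperBound` are needed alongside `numPartitions` for a parallel read):

```python
jdbc_options = {
    "url": "jdbc:postgresql://pg-host:5432/products_db",
    "dbtable": "public.product",
    "user": "reader",                 # hypothetical credentials
    "password": "***",
    "fetchsize": "10000",             # rows per round trip; the default is small
    "numPartitions": "8",             # parallel connections to Postgres
    "partitionColumn": "product_id",  # numeric column to split the read on
    "lowerBound": "1",
    "upperBound": "10000000",
}

# df = spark.read.format("jdbc").options(**jdbc_options).load()
# df.count()  # quick end-to-end read-speed check, as suggested above
```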
by SamGreene (Contributor II)
  • 12765 Views
  • 4 replies
  • 0 kudos

Resolved! Using parameters in a SQL Notebook and COPY INTO statement

Hi, My scenario is I have an export of a table being dropped in ADLS every day.  I would like to load this data into a UC table and then repeat the process every day, replacing the data.  This seems to rule out DLT as it is meant for incremental proc...

Latest Reply
SamGreene
Contributor II

The solution that worked was adding this Python cell to the notebook:
%python
from pyspark.dbutils import DBUtils
dbutils = DBUtils(spark)
dbutils.widgets.text("catalog", "my_business_app")
dbutils.widgets.text("schema", "dev")
Then in the SQL cell: CRE...

3 More Replies
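The same idea, sketched end to end: read the widget values in Python and splice them into a COPY INTO statement (the table name, storage path, and defaults are hypothetical):

```python
# In a notebook these would come from the widgets defined above:
# catalog = dbutils.widgets.get("catalog"); schema = dbutils.widgets.get("schema")
catalog, schema = "my_business_app", "dev"

copy_sql = f"""
COPY INTO {catalog}.{schema}.daily_export
FROM 'abfss://exports@mystorageacct.dfs.core.windows.net/daily/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true')
"""

# spark.sql(copy_sql)
```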
by galzamo (New Contributor)
  • 2074 Views
  • 1 reply
  • 0 kudos

Job running time too long

Hi all, I'm doing my first data jobs. I created one job that consists of 4 other jobs. Yesterday I ran the 4 jobs separately and it worked fine (about half an hour). Today I ran the big job, and the 4 jobs have been running for 2 hours (and are still running). Why is t...

Latest Reply
anardinelli
Databricks Employee

Hello @galzamo, how are you? You can check the Spark UI for long-running stages, which might give you a clue where each task spends the most time. Some things can be the reason: 1. Increase of data and partitions in your source data 2. Cluste...

by EDDatabricks (Contributor)
  • 2103 Views
  • 1 reply
  • 0 kudos

Expected size of managed Storage Accounts

Dear all, we are monitoring the size of the managed storage accounts associated with our deployed Azure Databricks instances. We have 5 Databricks instances for specific components of our platform, replicated in 4 environments (DEV, TEST, PREPROD, PROD). Dur...

Data Engineering
Filesize
LOGS
Managed Storage Account
by Kayla (Valued Contributor II)
  • 7423 Views
  • 3 replies
  • 7 kudos

Resolved! SQL Warehouse Timeout / Prevent Long Running Queries

We have an external service connecting to a SQL Warehouse, running a query that normally lasts 30 minutes. On occasion an error occurs and it will run for 6 hours. This happens overnight and is contributing to a larger bill. Is there any way to force l...

Latest Reply
Kayla
Valued Contributor II

@lucasrocha @raphaelblg That is exactly what I was hoping to find. Thank you!

2 More Replies
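The accepted answer is truncated above; assuming it points at the Databricks SQL `STATEMENT_TIMEOUT` setting (which caps query runtime in seconds), a sketch of setting it for a session:

```python
# STATEMENT_TIMEOUT is in seconds; 3600 here would cancel anything over 1 hour.
timeout_sql = "SET STATEMENT_TIMEOUT = 3600"

# Run against the SQL warehouse session, or set it workspace-wide in the
# SQL admin settings so the external service's sessions inherit it:
# spark.sql(timeout_sql)
```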
by thiagoawstest (Contributor)
  • 1514 Views
  • 0 replies
  • 0 kudos

add active directory group permission

Hi, I'm using Databricks on AWS. I did the single sign-on integration with Microsoft Entra ID (Active Directory), and everything is working fine; I can add users. But when I try to add a group that was created in AD, the group can't be found. How should I ...

by AlokThampi (New Contributor III)
  • 1073 Views
  • 0 replies
  • 0 kudos

Issues while writing into bad_records path

Hello all, I would like to get your input on a scenario I see while writing into the bad_records path. I am reading a 'Ԓ'-delimited CSV file based on a schema that I have already defined. I have enabled error handling while reading the file to ...

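For context, a sketch of the kind of read described (the schema, paths, and extra options are hypothetical; only `badRecordsPath` and the custom delimiter come from the question):

```python
read_options = {
    "delimiter": "Ԓ",    # the custom delimiter from the question
    "header": "true",
    # Rows that fail to parse against the schema land here as JSON files:
    "badRecordsPath": "abfss://data@mystorageacct.dfs.core.windows.net/bad_records/",
}

# df = (spark.read.format("csv")
#       .schema(predefined_schema)   # the schema the poster already defined
#       .options(**read_options)
#       .load("abfss://data@mystorageacct.dfs.core.windows.net/input/"))
```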
by MaximeGendre (New Contributor III)
  • 1263 Views
  • 0 replies
  • 0 kudos

Problem using from_avro function

Hello everyone, I need your help with a topic that has been preoccupying me for a few days. The "from_avro" function gives me a strange result when I pass it the JSON schema of a Kafka topic. ...

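Without the screenshots the exact failure is unclear, but one common cause of strange `from_avro` results on Kafka data is the Confluent wire format, which prepends a 5-byte header (magic byte plus schema id) that plain `from_avro` does not strip. A sketch (the schema and column names are hypothetical):

```python
import json

# from_avro expects the Avro schema as a JSON string:
avro_schema = json.dumps({
    "type": "record",
    "name": "Payment",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
})

# from pyspark.sql.avro.functions import from_avro
# from pyspark.sql.functions import expr
# payload = expr("substring(value, 6, length(value) - 5)")  # drop Confluent header
# decoded = kafka_df.select(from_avro(payload, avro_schema).alias("event"))
```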
by db_knowledge (New Contributor II)
  • 1324 Views
  • 2 replies
  • 0 kudos

Merge operation with outputMode('update') in Auto Loader (Databricks)

Hi team, I am trying to do a merge operation along with outputMode('update') and foreachBatch using the below code, but it is not updating data. Could you please help with this? output=(casting_df.writeStream.format('delta').trigger(availableNow=True).optio...

Latest Reply
anardinelli
Databricks Employee

Hi @db_knowledge  Please try .foreachBatch(upsertToDelta) instead of creating the lambda inside it. Best, Alessandro

1 More Replies
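The suggested `foreachBatch` pattern, sketched with a Delta MERGE (the target table name and join key are hypothetical):

```python
def upsert_to_delta(micro_batch_df, batch_id):
    """Merge each micro-batch into the target Delta table (upsert)."""
    from delta.tables import DeltaTable  # available on Databricks runtimes

    target = DeltaTable.forName(micro_batch_df.sparkSession, "main.dev.casting")
    (target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

# (casting_df.writeStream
#     .foreachBatch(upsert_to_delta)      # named function, not a lambda
#     .option("checkpointLocation", "/tmp/checkpoints/casting")
#     .trigger(availableNow=True)
#     .start())
```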