Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aranjan99
by New Contributor III
  • 1414 Views
  • 5 replies
  • 3 kudos

system.billing.usage table missing data for jobs running in my databricks account

I have some jobs running on Databricks. I can obtain their jobId from the Jobs UI or the List Job Runs API. However, when trying to get DBU usage for the corresponding jobs from system.billing.usage, I do not see the same job_id in that table. It's been mor...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi, @aranjan99. Apologies for the delayed response. If you're not seeing job IDs from the UI or API in the billing table, it's possible that the job run IDs are not being populated for long-running jobs. To address this, consider restarting the comput...

4 More Replies
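
A quick way to cross-check which job IDs actually appear in the billing data is to query the system table directly. The sketch below aggregates DBUs per job over the last week; usage_metadata.job_id follows the published system-tables schema, and note that billing records can lag job execution by several hours:

# Minimal sketch: list the job IDs that do appear in system.billing.usage,
# so they can be compared against the Jobs UI / List Job Runs API output.
dbus_by_job = spark.sql("""
    SELECT usage_metadata.job_id AS job_id,
           usage_date,
           SUM(usage_quantity)   AS dbus
    FROM system.billing.usage
    WHERE usage_metadata.job_id IS NOT NULL
      AND usage_date >= date_sub(current_date(), 7)
    GROUP BY usage_metadata.job_id, usage_date
""")
display(dbus_by_job)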
aranjan99
by New Contributor III
  • 1285 Views
  • 4 replies
  • 1 kudos

system.access.table_lineage table missing data

I am using the system.access.table_lineage table to figure out which tables are accessed by SQL queries, and the corresponding SQL queries. However, I am noticing that this table is missing data or values very often. For example, for SQL queries executed by our DBT jobs, ...

Latest Reply
jacovangelder
Contributor III
  • 1 kudos

Is all your ETL querying/referencing the full table name (i.e. catalog.schema.table)? If you query Delta files directly, for example, metadata for data lineage will not be captured.

3 More Replies
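
To verify what lineage is actually being captured for a given table, a query along these lines can help (the table name main.sales.orders is a hypothetical placeholder; the columns follow the published table_lineage schema):

# Minimal sketch: inspect captured lineage events for one table.
lineage = spark.sql("""
    SELECT entity_type, entity_id,
           source_table_full_name, target_table_full_name, event_time
    FROM system.access.table_lineage
    WHERE source_table_full_name = 'main.sales.orders'   -- hypothetical table
       OR target_table_full_name = 'main.sales.orders'
    ORDER BY event_time DESC
""")
display(lineage)

If statements reference tables without the full three-level name, they may surface here with NULL source or target columns, which would match the gaps described above.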
dm7
by New Contributor II
  • 1624 Views
  • 2 replies
  • 0 kudos

Resolved! Unit Testing DLT Pipelines

Now that we are moving our DLT pipelines into production, we would like to start looking at unit testing the transformation logic inside DLT notebooks. We want to know how we can unit test the PySpark logic/transformations independently without having to s...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @dm7, Instead of embedding all your transformation logic directly in the DLT notebook, create separate Python modules (files) for your transformations. This allows you to interactively test transformations from notebooks and write unit tests speci...

1 More Replies
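
A minimal sketch of the layout described in the reply above, with illustrative file and function names: the transformation lives in a plain Python module with no dlt import, so it can be tested with pytest on a local SparkSession, and the DLT notebook simply imports it.

# transformations.py -- plain PySpark logic, importable from DLT and from tests
from pyspark.sql import DataFrame
import pyspark.sql.functions as F

def add_revenue(df: DataFrame) -> DataFrame:
    # Pure transformation: no dlt dependency baked in, so it is unit-testable.
    return df.withColumn("revenue", F.col("quantity") * F.col("unit_price"))

# test_transformations.py -- run with pytest outside any pipeline
from pyspark.sql import SparkSession
from transformations import add_revenue

def test_add_revenue():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([(2, 5.0)], ["quantity", "unit_price"])
    assert add_revenue(df).first()["revenue"] == 10.0

Inside the DLT notebook, the @dlt.table function would then just call add_revenue on the source DataFrame.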
WWoman
by New Contributor III
  • 1575 Views
  • 2 replies
  • 0 kudos

Resolved! Persisting query history data

Hello, I am looking for a way to persist query history data. I do not have direct access to the system tables, but I do have access to a query_history view created by selecting from the system.query.history and system.access.audit system tables. I want ...

Latest Reply
syed_sr7
New Contributor II
  • 0 kudos

Is there a system table for query history?

1 More Replies
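
(To the question above: the original post mentions system.query.history as the system table backing the view.) For the archival itself, one workable pattern is a scheduled job that appends only new rows from the view into a regular Delta table. The sketch below assumes the view is named query_history and has an end_time timestamp column; the target catalog and schema names are hypothetical.

# Minimal sketch: incrementally persist rows from the query_history view.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.monitoring.query_history_archive
    AS SELECT * FROM query_history WHERE 1 = 0
""")
spark.sql("""
    INSERT INTO main.monitoring.query_history_archive
    SELECT * FROM query_history
    WHERE end_time > (SELECT COALESCE(MAX(end_time), TIMESTAMP'1970-01-01')
                      FROM main.monitoring.query_history_archive)
""")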
mbdata
by New Contributor II
  • 27205 Views
  • 6 replies
  • 4 kudos

Resolved! Toggle line comment

I work with Azure Databricks. The shortcut Ctrl + / to toggle line comments doesn't work on an AZERTY keyboard in Firefox... Are you aware of this issue? Is there another shortcut I can try? Thanks!

Latest Reply
Flo
New Contributor II
  • 4 kudos

'Cmd + Shift + 7' works for me! I'm using an AZERTY keyboard on Chrome for macOS.

5 More Replies
CarstenWeber
by New Contributor III
  • 1564 Views
  • 9 replies
  • 3 kudos

Resolved! Invalid configuration fs.azure.account.key trying to load ML Model with OAuth

Hi Community, I was trying to load an ML model from an Azure storage account (abfss://....) with: model = PipelineModel.load(path) I set the Spark config: spark.conf.set("fs.azure.account.auth.type", "OAuth") spark.conf.set("fs.azure.account.oauth.provi...

Latest Reply
chhavibansal
New Contributor III
  • 3 kudos

@daniel_sahal Do you know of any possible reason why it works in OSS Spark while it does not work in a Databricks notebook? Why is there a disparity?

8 More Replies
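
For reference, the documented OAuth pattern for ABFSS scopes each config key to the storage account. Below is a sketch with placeholder names (the storage account, secret scope, and tenant ID are all hypothetical); on Unity Catalog shared clusters these session-level configs may be restricted, which could explain the notebook-versus-OSS disparity discussed above:

storage = "mystorageacct"                       # hypothetical storage account
suffix = f"{storage}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}",
               dbutils.secrets.get("my-scope", "sp-client-id"))      # hypothetical scope/keys
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}",
               dbutils.secrets.get("my-scope", "sp-client-secret"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{suffix}",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")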
Aidzillafont
by New Contributor II
  • 128 Views
  • 1 reply
  • 0 kudos

How to pick the right cluster for your workflow

Hi All, I am attempting to execute a workflow on various job clusters, including general-purpose and memory-optimized clusters. My main bottleneck is that data is being written to disk because I'm running out of RAM. This is due to the large dataset t...

Latest Reply
Ravivarma
New Contributor III
  • 0 kudos

Hello @Aidzillafont, greetings! Please find below the document which explains compute configuration best practices: https://docs.databricks.com/en/compute/cluster-config-best-practices.html I hope this helps you! Regards, Ravi

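
As a concrete illustration of the advice in that doc: when the bottleneck is spill caused by RAM pressure, a memory-optimized node type in the job cluster spec is usually the first lever. The dict below is a hedged sketch of a Jobs API new_cluster block; the node type and sizes are hypothetical choices, not recommendations:

# Sketch of a Jobs API `new_cluster` payload biased toward RAM over cores.
new_cluster = {
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_E8ds_v4",   # memory-optimized family on Azure (hypothetical pick)
    "num_workers": 4,
    "spark_conf": {
        # Assumption: more, smaller shuffle tasks reduce per-task memory pressure and spill.
        "spark.sql.shuffle.partitions": "400"
    },
}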
Enrique1987
by New Contributor II
  • 1147 Views
  • 1 reply
  • 2 kudos

Resolved! When to activate Photon and when not to?

Photon appears as an option to check and uncheck as appropriate. Using Photon leads to higher consumption of DBUs and higher costs. At what point does it pay off, and when should it not be enabled? More costs for the use of Photon, but at the same time less...

Latest Reply
jacovangelder
Contributor III
  • 2 kudos

This is my own experience: for SQL workloads with not too many joins, it will speed things up. For building facts and dimensions using many joins, I found Photon to increase costs by a lot while not bringing much better performance. The only real w...

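
Since Photon is toggled per cluster, one cheap way to settle the question for a specific workload is to run the same job twice and compare wall-clock time and DBUs. In the Clusters/Jobs API the switch is the runtime_engine field, as in this sketch (the other values are hypothetical):

photon_cluster = {
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_D8ds_v4",   # hypothetical node type
    "num_workers": 4,
    "runtime_engine": "PHOTON",           # use "STANDARD" for the baseline run
}

Comparing the two runs' costs in system.billing.usage then shows whether the speedup outweighs the higher DBU rate.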
Sudheer_DB
by New Contributor II
  • 180 Views
  • 3 replies
  • 0 kudos

DLT SQL schema definition

Hi All, While defining a schema when creating a table using Auto Loader and DLT in SQL, I am getting a schema mismatch error between the defined schema and the inferred schema. CREATE OR REFRESH STREAMING TABLE csv_test(a0 STRING, a1 STRING, a2 STRING, a3 STRI...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@Sudheer_DB You can specify your own _rescued_data column name by setting the rescuedDataColumn option: https://docs.databricks.com/en/ingestion/auto-loader/schema.html#what-is-the-rescued-data-column

2 More Replies
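
For completeness, the same option in the Auto Loader Python API looks like the sketch below (the path and schema are hypothetical); in DLT SQL the option goes into the cloud_files(...) option map:

# Minimal sketch: Auto Loader with a custom rescued-data column name.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("rescuedDataColumn", "_my_rescued")            # custom name instead of _rescued_data
      .option("header", "true")
      .schema("a0 STRING, a1 STRING, a2 STRING, a3 STRING")  # hypothetical explicit schema
      .load("/Volumes/main/raw/csv_test/"))                  # hypothetical source path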
Hertz
by New Contributor II
  • 204 Views
  • 1 reply
  • 0 kudos

Serverless Compute Cost Monitoring (System Tables)

Hello, I have developed a dashboard for monitoring compute costs using system tables, allowing tracking of expenses by cluster name (user-created name), job name, or warehouse name. However, with the introduction of the new shared serverless compute, ...

Latest Reply
kaiz
New Contributor II
  • 0 kudos

Hi @Hertz, thanks for your question. For the first case (billing_origin_product = "SQL"), that represents usage of materialized views or streaming tables on serverless DBSQL. Databricks bills for such usage using the serverless SKU. For the second c...

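
Building on that breakdown, serverless usage can be attributed in a dashboard by grouping on billing_origin_product together with whichever identifier usage_metadata carries for each product. A hedged sketch (the SERVERLESS SKU filter is a heuristic, and not every row populates every identifier):

spark.sql("""
    SELECT billing_origin_product,
           usage_metadata.job_id       AS job_id,
           usage_metadata.warehouse_id AS warehouse_id,
           SUM(usage_quantity)         AS dbus
    FROM system.billing.usage
    WHERE upper(sku_name) LIKE '%SERVERLESS%'
    GROUP BY ALL
""")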
drii_cavalcanti
by New Contributor III
  • 1328 Views
  • 3 replies
  • 0 kudos

DBUtils commands do not work on shared access mode clusters

Hi there, I am trying to upload a file to an S3 bucket. However, none of the dbutils commands seem to work, and neither does the boto3 library. Clusters with the same configuration, except for the shared access mode, seem to work fine. These are the error m...

Latest Reply
mchugani
New Contributor II
  • 0 kudos

@drii_cavalcanti Were you able to resolve this?

2 More Replies
pm71
by New Contributor II
  • 470 Views
  • 4 replies
  • 3 kudos

Issue with os and sys Operations in Repo Path on Databricks

Hi, Starting from today, I have encountered an issue when performing operations using the os and sys modules within the Repo path in my Databricks environment. Specifically, any operation that involves these modules results in a timeout error. However...

Latest Reply
mgradowski
New Contributor II
  • 3 kudos

https://status.azuredatabricks.net/pages/incident/5d49ec10226b9e13cb6a422e/667c08fa17fef71767abda04
"Degraded performance" is a pretty mild way of saying almost nothing productive can be done ATM...

3 More Replies
Hertz
by New Contributor II
  • 390 Views
  • 2 replies
  • 0 kudos

System Tables / Audit Logs action_name createWarehouse/createEndpoint

I am creating a cost dashboard across multiple accounts. I am working to get SQL warehouse names and warehouse IDs so I can join with system.access.billing on warehouse_id. But the only action_names that include both the warehouse_id and warehouse_n...

Data Engineering
Audit Logs
cost monitor
createEndpoint
createWarehouse
Latest Reply
Hertz
New Contributor II
  • 0 kudos

I just wanted to circle back to this. It appears that the ID is returned in the response column of the create action_name.

1 More Replies
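
Following up on that finding, here is a sketch of pulling warehouse name/ID pairs out of the create events. The JSON path $.id inside response.result and the request_params key 'name' are assumptions to verify against your own audit rows:

warehouses = spark.sql("""
    SELECT event_time, action_name,
           request_params['name']                   AS warehouse_name,
           get_json_object(response.result, '$.id') AS warehouse_id
    FROM system.access.audit
    WHERE action_name IN ('createWarehouse', 'createEndpoint')
""")
display(warehouses)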
Bazhar
by New Contributor
  • 318 Views
  • 1 reply
  • 0 kudos

Understanding this Ipython related error in cluster logs

Hi Databricks Community! I'm getting this error in a cluster's logs: [IPKernelApp] ERROR | Exception in control handler: Traceback (most recent call last): File "/databricks/python/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 334, in p...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Bazhar, If you're using Databricks Connect, ensure that it can reach your cluster. Verify that your workspace instance name and cluster ID are correct.

JamesY
by New Contributor III
  • 267 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks JDBC write to table with PK column, error, key not found.

Hello, I am trying to write data to a table. It worked fine before, but after I recreated the table with one column as a PK, there is an error: "Unable to write into the A_Table table....key not found: id". What is the correct way of doing this? PK column: [...

Data Engineering
Databricks
SqlMi
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @JamesY, If you're using Databricks with SQL Server, you can use the OUTPUT clause to retrieve the primary key value after an INSERT query.

CREATE TABLE A_Table (
    ID BIGINT IDENTITY PRIMARY KEY
    -- Other columns...
);
INSERT INTO A_Table ...

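
A related note on the original "key not found: id" error: since ID is an IDENTITY column, a common workaround is to not send it from Spark at all and let SQL Server generate it. A sketch, with hypothetical connection details:

# Minimal sketch: append via JDBC while letting the IDENTITY column self-populate.
(df.drop("ID")                        # df is the DataFrame being written
   .write.format("jdbc")
   .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
   .option("dbtable", "dbo.A_Table")
   .option("user", jdbc_user)         # hypothetical credentials
   .option("password", jdbc_password)
   .mode("append")
   .save())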