Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

John_Rotenstein
by New Contributor II
  • 5359 Views
  • 6 replies
  • 3 kudos

Retrieve job-level parameters in Python

Parameters can be passed to Tasks and the values can be retrieved with: dbutils.widgets.get("parameter_name"). More recently, we have been given the ability to add parameters to Jobs. However, the parameters cannot be retrieved like Task parameters. Quest...

Latest Reply
xiangzhu
Contributor II
  • 3 kudos

Ah sorry, the thread asked for notebooks too. Nevertheless, I'm searching for a way to get job params in pure Python jobs.
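One common pattern for pure Python (non-notebook) tasks is to forward the job-level value into the script's command-line arguments with a {{job.parameters.<name>}} dynamic value reference and parse it with argparse. A minimal sketch, assuming a job parameter named run_date (the name is made up for illustration):

```python
# my_script.py -- runs as a Python script task, not a notebook.
# Assumes the task's "parameters" field forwards the job-level value, e.g.:
#   ["--run_date", "{{job.parameters.run_date}}"]
# so it arrives here as an ordinary command-line argument.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--run_date", required=True,
                    help="Job-level parameter forwarded by the task")
args = parser.parse_args()

print(f"run_date = {args.run_date}")
```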

5 More Replies
arthurandraderj
by New Contributor
  • 49 Views
  • 1 reply
  • 0 kudos

Error truncating #REF with spark.read

Hello guys, I am trying to read an Excel file and even using PERMISSIVE mode, it's truncating the records that contain #REF in any column. Can anyone please help me with that? schema = StructType([\        StructField('Col1', DateType(), True), \ <-------...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @arthurandraderj, You can try reading the Excel file using Pandas with the xlrd engine, which supports the errors parameter. Set errors='coerce' to convert #REF values to NaN.
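For what it's worth, pandas' read_excel does not itself take an errors= argument, so a workaround is to read the cells as strings (via the openpyxl engine) and coerce the #REF markers afterwards. A rough sketch; the file path and column name Col1 are placeholders, and spark is the session predefined in Databricks notebooks:

```python
import numpy as np
import pandas as pd

# Read everything as strings first so Excel error values like "#REF!" survive the load.
pdf = pd.read_excel("/dbfs/path/to/file.xlsx", engine="openpyxl", dtype=str)

# Coerce Excel error markers to NaN, then parse the real types.
pdf = pdf.replace({"#REF!": np.nan, "#REF": np.nan})
pdf["Col1"] = pd.to_datetime(pdf["Col1"], errors="coerce")

# Hand the cleaned frame back to Spark if needed.
df = spark.createDataFrame(pdf)
```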

joedata
by New Contributor
  • 43 Views
  • 1 reply
  • 0 kudos

pywin32

A python module called pywin32 enables users to read an excel file, make changes to specific cells, execute a Refresh All which refreshes all the data connections, and save the changes made to an excel file. This cannot be used on databricks because ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @joedata, Since Databricks notebooks allow you to run Python code, you can leverage Python libraries to manipulate Excel files. Instead of using pywin32, consider using libraries like pandas or openpyxl to read, modify, and save Excel files. You ca...
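As a rough illustration of the openpyxl route (the paths, sheet, and cell below are made up; note that openpyxl cannot run Excel's "Refresh All", since refreshing data connections requires the Excel application itself):

```python
from openpyxl import load_workbook

wb = load_workbook("/dbfs/tmp/report.xlsx")   # hypothetical input path
ws = wb["Sheet1"]                             # hypothetical sheet name

ws["B2"] = 42                                 # change a specific cell
wb.save("/dbfs/tmp/report_updated.xlsx")      # write the modified copy
```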

Avinash_Narala
by New Contributor III
  • 32 Views
  • 1 reply
  • 0 kudos

How to capture current_user in custom audit logs in a job

Hi, I want to build a custom audit log in my job. But when I do so, I'm able to capture everything except the current_user() who is running the job. How can I capture it in my custom audit log in the job itself?

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Avinash_Narala, Databricks provides system tables that store audit logs. You can query these tables to retrieve information about job executions, including user identities. Within your Databricks job, you can add custom logging to capture the cur...
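A minimal sketch of the in-job part: capture current_user() once via Spark SQL and attach it to each custom audit record (the table and column names are illustrative, and spark is the session available in the job's notebook or script):

```python
# Who is running this job right now?
run_user = spark.sql("SELECT current_user() AS u").first()["u"]

# Append one illustrative audit record that carries the user identity.
audit_row = [(run_user, "my_job", "started")]
(spark.createDataFrame(audit_row, "run_user STRING, job_name STRING, event STRING")
      .write.mode("append")
      .saveAsTable("main.audit.custom_job_log"))
```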

chaosBEE
by Visitor
  • 43 Views
  • 2 replies
  • 0 kudos

StructField Metadata Dictionary - What are the possible keys?

I have a Delta Live Table which is being deposited to Unity Catalog. In the Python notebook, I am defining the schema with a series of StructFields, for example: StructField(    "columnName",     StringType(),     True,     metadata = {        'comme...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @chaosBEE, Adding Tags to Columns Programmatically: To add tags to columns programmatically, you can leverage the @dlt.table decorator and define your schema with metadata. Adding Tags to the Specific Table: Unfortunately, the dlt.create_st...
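As a sketch of the pattern the question describes, using the 'comment' metadata key (the key most commonly surfaced as a column comment); the table, column, and source names are invented for illustration:

```python
import dlt
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField(
        "customer_id",
        StringType(),
        True,
        metadata={"comment": "Natural key from the source system"},
    ),
])

@dlt.table(name="customers_silver", schema=schema, comment="Illustrative DLT table")
def customers_silver():
    # Source table name is a placeholder.
    return spark.read.table("customers_bronze")
```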

1 More Replies
Devsql
by New Contributor III
  • 253 Views
  • 2 replies
  • 0 kudos

Measure size of all tables in Azure databricks

Hi Team, currently I am trying to find the size of all tables in my Azure Databricks, as I am trying to get an idea of current data loading trends, so I can plan for a data forecast (i.e. in the last 2 months, approx. 100 GB of data came in, so in the next 2-3 months there ...

Latest Reply
Devsql
New Contributor III
  • 0 kudos

Hi @Kaniz, 1- Regarding this issue I had found the link below: https://kb.databricks.com/sql/find-size-of-table#:~:text=You%20can%20determine%20the%20size,stats%20to%20return%20the%20size Now to try the above link, I need to decide: Delta-Table vs Non-Delta-Ta...
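For Delta tables, one rough way to sketch this (the catalog/schema main.sales is a placeholder; non-Delta tables will fail DESCRIBE DETAIL and are simply skipped):

```python
# Sum the sizeInBytes reported by DESCRIBE DETAIL across all tables in one schema.
tables = [r.tableName for r in spark.sql("SHOW TABLES IN main.sales").collect()]

total_bytes = 0
for t in tables:
    try:
        detail = spark.sql(f"DESCRIBE DETAIL main.sales.{t}").first()
        size = detail["sizeInBytes"] or 0
        total_bytes += size
        print(f"{t}: {size / 1024**3:.2f} GiB")
    except Exception as e:
        print(f"{t}: skipped ({e})")

print(f"Total: {total_bytes / 1024**3:.2f} GiB")
```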

1 More Replies
guangyi
by New Contributor
  • 24 Views
  • 1 reply
  • 0 kudos

How to make a DLT pipeline trigger another pipeline?

For example, I have 2 DLT pipelines: one is used for computing user gender distribution, another for computing user location distribution. In the first pipeline, I follow the medallion architecture, creating the bronze, silver, and gold tables one by...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @guangyi, To trigger your second pipeline when the silver table data is updated by the first pipeline, you can follow these steps: Event-Based Trigger: On the first pipeline (the one creating the silver table), add a WebActivity at the end. C...
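An alternative to a webhook-style call is hitting the Delta Live Tables REST endpoint that starts a pipeline update, for example from the last step of the first pipeline's job. A sketch, with the workspace host, token, and pipeline ID as placeholders:

```python
import requests

host = "https://<workspace-host>"        # placeholder
token = "<personal-access-token>"        # placeholder
pipeline_id = "<second-pipeline-id>"     # placeholder

# POST /api/2.0/pipelines/{id}/updates starts an update of the target pipeline.
resp = requests.post(
    f"{host}/api/2.0/pipelines/{pipeline_id}/updates",
    headers={"Authorization": f"Bearer {token}"},
    json={},  # empty body -> a normal (non-full-refresh) update
)
resp.raise_for_status()
print("Triggered update:", resp.json().get("update_id"))
```

Often the simpler route is a single Job containing both pipelines as pipeline tasks, with the second task depending on the first.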

Awoke101
by New Contributor
  • 39 Views
  • 1 reply
  • 0 kudos

UC_COMMAND_NOT_SUPPORTED.WITHOUT_RECOMMENDATION in shared access mode

I'm using a shared access cluster and am getting this error while trying to upload to Qdrant. This is the error. Is there any way I can make it work in shared access mode? It works on the personal cluster. [UC_COMMAND_NOT_SUPPORTED.WITHOUT_RECOMMENDATION] The...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Awoke101, Shared access clusters in Databricks have certain restrictions due to Unity Catalog limitations. I recommend trying the single-user Unity Catalog cluster or configuring data persistence outside the container. Let me know if you need fur...

yvuignie
by Contributor
  • 70 Views
  • 2 replies
  • 0 kudos

Asset Bundles webhook not working

Hello, the webhook notifications in Databricks jobs defined in the asset bundles are not taken into account and therefore not created. For instance, this is not working: resources: jobs: job1: name: my_job webhook_notifications: on...

Latest Reply
yvuignie
Contributor
  • 0 kudos

Hello @Kaniz, thank you for your help. However, we did check the job configuration multiple times. If we substitute 'webhook_notifications' with 'email_notifications' it works, so the syntax is correct. Here is a sample of our configuration: For the webho...

1 More Replies
N_M
by New Contributor III
  • 810 Views
  • 3 replies
  • 1 kudos

Access historical injected data of COPY INTO command

Dear Community, I'm using the COPY INTO command to automate the staging of files that I get in an S3 bucket into specific Delta tables (with some transformation on the fly). The command works smoothly, and files are indeed inserted only once (writing i...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

  Hi @N_M,  Accessing Inserted Filenames History Metadata: When using the COPY INTO command in Databricks, the filenames of staged files are indeed stored in metadata. However, accessing this information directly from the metadata or transaction...
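If capturing the file names going forward is acceptable, one sketch is to record the source path per row at load time. This assumes the _metadata file-source column is available to COPY INTO on your runtime; the table name, path, and format below are placeholders:

```python
# Record the source file for every ingested row, so the load history can be
# queried from the table itself later on.
spark.sql("""
  COPY INTO main.staging.events
  FROM (
    SELECT *, _metadata.file_path AS source_file
    FROM 's3://my-bucket/landing/events/'
  )
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true')
  COPY_OPTIONS ('mergeSchema' = 'true')
""")

# The distinct ingested files are then just a query away.
spark.sql("SELECT DISTINCT source_file FROM main.staging.events").show(truncate=False)
```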

2 More Replies
ksenija
by New Contributor III
  • 41 Views
  • 2 replies
  • 0 kudos

DLT pipeline - DebeziumJDBCMicroBatchProvider not found

Hi! I created a DLT pipeline and I'm getting this error: [STREAM_FAILED] Query [id = ***, runId = ***] terminated with exception: object com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found. I'm using Serverless. How to verify that the require...

Data Engineering
DebeziumJDBCMicroBatchProvider
dlt
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @ksenija, Here are some steps to verify and address the issue: First, ensure that the necessary library or package containing the missing class is correctly included in your Databricks environment. You can verify this by going to your Databri...

1 More Replies
N_M
by New Contributor III
  • 21 Views
  • 0 replies
  • 0 kudos

use job parameters in scripts

Hi Community, I did some research, but I wasn't lucky, and I'm a bit surprised I can't find anything about it. So, I would simply like to access the job parameters when using Python scripts (not notebooks). My flow doesn't use notebooks, but I still need to dri...

AdventureAce
by New Contributor II
  • 39 Views
  • 1 reply
  • 0 kudos

Short-live token from Unity Catalog

What is this short-lived token shared by unity-catalog in steps 4 and 5 here? And how does the cloud storage authenticate the token generated by unity catalog?

(attached screenshot: AdventureAce_0-1718918698276.png)
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @AdventureAce, The short-lived token shared by Unity Catalog in steps 4 and 5 is a security measure. It’s designed to expire after a certain period, reducing risk. When using cloud storage, the token is authenticated by the system based on its exp...
