With the newest version of databricks-connect, I cannot configure the extra JARs I want to use. In the older version, I did that via spark = SparkSession.builder.appName('DataFrame').\
config('spark.jars.packages','org.apache.spark:spark-avro_...
Hi @Lazloo, In the newer versions of Databricks Connect, configuring additional JARs for your Spark session is still possible.
Let’s adapt your previous approach to the latest version.
Adding JARs to a Databricks cluster:
If you want to add JAR f...
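A minimal sketch of that route, assuming Databricks Connect v2 (DatabricksSession) and the Databricks SDK for Python; the cluster ID, Maven coordinate, and path below are placeholders, not values from this thread:

```python
# Sketch only: assumes databricks-connect >= 13 (DatabricksSession) and the
# databricks-sdk package; cluster ID and Maven coordinate are placeholders.
from databricks.connect import DatabricksSession
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library, MavenLibrary

# With Databricks Connect v2 the code runs against a remote cluster, so extra
# JARs are installed on that cluster as libraries rather than passed through
# spark.jars.packages on the builder.
w = WorkspaceClient()
w.libraries.install(
    cluster_id="<your-cluster-id>",  # placeholder
    libraries=[Library(maven=MavenLibrary(coordinates="org.apache.spark:spark-avro_2.12:3.5.0"))],
)

spark = DatabricksSession.builder.getOrCreate()
df = spark.read.format("avro").load("/path/to/data")  # placeholder path
```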
Hello everyone, I am currently working on my first DLT pipeline, and I stumbled on a problem which I am struggling to solve. I am working on several tables where I have a column called "my_column" with an array of JSON with two keys: 1st key: score, 2n...
Hi @TiagoMag, It seems you’ve encountered an issue while working on your first DLT pipeline. Let’s dive into the problem and find a solution!
The error message you’re encountering states: “Detected a data update (for exam...
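As an illustration only (the second JSON key is truncated above, so "label" below is a purely hypothetical name), parsing an array-of-JSON column like the one described usually looks something like this:

```python
# Sketch only: assumes "my_column" holds a JSON array string with keys
# "score" and a second key (called "label" here purely as a placeholder).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import ArrayType, StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

schema = ArrayType(StructType([
    StructField("score", DoubleType()),
    StructField("label", StringType()),  # hypothetical second key
]))

df = spark.createDataFrame(
    [('[{"score": 0.9, "label": "a"}, {"score": 0.1, "label": "b"}]',)],
    ["my_column"],
)

parsed = (
    df.withColumn("parsed", F.from_json("my_column", schema))
      .withColumn("item", F.explode("parsed"))
      .select(F.col("item.score"), F.col("item.label"))
)
parsed.show()
```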
Hi, I have the problem that my "module" is not known when used in a user-defined function. The precise error message is posted below. I have a repo structure as follows:
analytics_pipelines
│ ├── __init__.py
│ ├── coordinate_transformation.py
│ ├── d...
Hi, yes, I discovered three working possibilities:
1. Define the pandas functions as inline functions, as pointed out above (see the sketch after this list).
2. Define the pandas function in the same script that is imported as a "library" in the DLT config (libraries:
   - notebook:
       path: ./pipeline...
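For the first option, a rough sketch of what the inline variant can look like inside a DLT pipeline (table and column names are placeholders; spark is assumed to be available, as it is in DLT notebooks):

```python
# Sketch only: the "inline function" option, so the workers do not need to
# import the repo module. Table and column names are placeholders.
import dlt
import pandas as pd
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

@F.pandas_udf(DoubleType())
def transform_coordinates(x: pd.Series) -> pd.Series:
    # Inline the logic from analytics_pipelines.coordinate_transformation here,
    # instead of importing it, so it ships with the UDF itself.
    return x * 2.0  # placeholder transformation

@dlt.table
def transformed():
    return spark.read.table("source_table").withColumn(  # placeholder source
        "x_transformed", transform_coordinates("x")
    )
```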
The process to export a Delta table is taking ~2 hrs. The Delta table has 66 partitions, with a total size of ~6 GB, 4 million rows and 270 columns. I used the command below: df.coalesce(1).write.csv("path"). What are my options to reduce the time?
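Not from this thread, but as a hedged sketch of the usual alternatives: coalesce(1) funnels every row through a single write task, so letting Spark write several files in parallel is typically much faster (paths below are placeholders):

```python
# Sketch only: paths are placeholders.

# Option 1: let Spark write many CSV parts in parallel, then combine the parts
# downstream (e.g. with a cloud-storage copy/concat step) if one file is required.
df.write.option("header", "true").csv("s3://bucket/export/parts/")

# Option 2: if a small number of files is acceptable, repartition instead of
# coalescing to 1 so several tasks still share the write.
df.repartition(8).write.option("header", "true").csv("s3://bucket/export/8-files/")
```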
I have a dataframe with a schema similar to the following:
id: string
array_field: array
    element: struct
        field1: string
        field2: string
        array_field2: array
            element: struct
                nested_field: stri...
It turns out that if the exploded fields don't match the schema that was defined when reading the JSON in the first place, all the data that doesn't match is silently dropped. This is not really nice default behaviour.
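A small sketch of that behaviour (names are illustrative): with the default PERMISSIVE mode a value that doesn't fit the supplied schema just becomes null, while FAILFAST raises an error instead:

```python
# Sketch only: demonstrates how fields that don't match the supplied schema are
# silently nulled out when parsing JSON, and how FAILFAST surfaces the mismatch.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()

schema = StructType([StructField("field1", IntegerType())])  # expects an int
df = spark.createDataFrame([('{"field1": "not-an-int"}',)], ["raw"])

# Default (PERMISSIVE): the mismatched value just becomes null.
df.select(F.from_json("raw", schema).alias("parsed")).show()

# FAILFAST raises an error instead of silently dropping the data.
df.select(
    F.from_json("raw", schema, {"mode": "FAILFAST"}).alias("parsed")
).show()
```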
Up until yesterday I was (sort of) able to read changes from target tables of apply changes operations (either through tables_changes() or using readChangeFeed). I say sort of because the meta columns (_change_type, _commit_version, _commit_timestamp...
Hi @afk, It seems you’ve been navigating the intricacies of Databricks Delta Live Tables and Change Data Capture (CDC). Let’s unravel this together!
Change Data Capture (CDC):
CDC is a process that identifies and captures incremental changes (data ...
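For reference, reading the change feed of a target table typically looks like the sketch below (table name and starting version are placeholders; the SQL counterpart is the table_changes() function):

```python
# Sketch only: reads the change data feed of a Delta table; the table name and
# starting version are placeholders.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 0)
    .table("catalog.schema.my_target_table")
)
changes.select("_change_type", "_commit_version", "_commit_timestamp").show()
```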
Hi! Regarding this info: "An Azure Databricks workspace is limited to 100 concurrent pipeline updates." (Release 2023.16 - Azure Databricks | Microsoft Learn), what is considered an update? Changes in pipeline logic, or each pipeline run?
I created a separate pipeline notebook to generate the table via DLT, and a separate notebook to write the entire output to Redshift at the end. The table created via DLT is read with spark.read.table("{schema}.{table}"). This way, I can import [MATERIALI...
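As a hedged sketch of what that second notebook can look like (assuming the built-in Databricks Redshift connector; all connection values and table names below are placeholders):

```python
# Sketch only: assumes the Databricks Redshift connector; all connection
# values and table names are placeholders.
df = spark.read.table("my_schema.my_dlt_table")  # table produced by the DLT pipeline

(
    df.write.format("redshift")
    .option("url", "jdbc:redshift://host:5439/db?user=...&password=...")
    .option("dbtable", "public.target_table")
    .option("tempdir", "s3a://bucket/redshift-temp/")
    .option("forward_spark_s3_credentials", "true")
    .mode("overwrite")
    .save()
)
```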
Our managed Databricks tables are stored in S3 by default. While reading that S3 path directly, I am getting the column name as a UUID. E.g.: the column name is ID in the Databricks table, while checking the S3 path the column name looks like COL- b400af61-9tha-4565-...
Hi @Kaniz, thank you for your reply, but the issue is I am not able to map ID with COL- b400af61-9tha-4565-89c4-d6ba43f948b7. I use DESCRIBE TABLE EXTENDED table_name as a query to get the list of UUID column names, and for the real column name, fetching from...
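Those COL-&lt;uuid&gt; names look like Delta column mapping (delta.columnMapping.mode = 'name'), in which case the logical-to-physical mapping is recorded in the table schema metadata inside the _delta_log. A rough sketch of pulling that mapping out (the path is a placeholder, and checkpointed logs are ignored for brevity):

```python
# Sketch only: reads the JSON commits in _delta_log and extracts each logical
# column's physical name from the schema metadata; path is a placeholder.
import json
from pyspark.sql import functions as F

log_df = (
    spark.read.json("s3://bucket/path/to/table/_delta_log/*.json")
         .withColumn("file", F.input_file_name())
)

# Take the schema from the most recent commit that carried a metaData action.
schema_str = (
    log_df.where("metaData IS NOT NULL")
          .orderBy("file")
          .select("metaData.schemaString")
          .collect()[-1][0]
)

mapping = {
    f["name"]: f.get("metadata", {}).get("delta.columnMapping.physicalName")
    for f in json.loads(schema_str)["fields"]
}
print(mapping)  # e.g. {"ID": "col-b400af61-...", ...}
```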
This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How do I flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...
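One way to get each plan node out as a row, side-stepping the array-of-struct intermediate, is to descend level by level with explode and union the levels. A sketch that assumes the event log was read with spark.read.json and that sparkPlanInfo has nodeName, simpleString and children fields (the path is a placeholder):

```python
# Sketch only: walks the recursive sparkPlanInfo struct level by level.
from pyspark.sql import functions as F

def flatten_plan(events_df, max_depth=20):
    levels = []
    current = events_df.select(F.col("sparkPlanInfo").alias("node"))
    for depth in range(max_depth):
        if not current.head(1):                      # nothing left at this depth
            break
        levels.append(current.select(
            F.lit(depth).alias("depth"),
            F.col("node.nodeName").alias("nodeName"),
            F.col("node.simpleString").alias("simpleString"),
        ))
        # JSON schema inference only materialises `children` as deep as the data
        # goes, so stop descending once the field is no longer present.
        if "children" not in current.select("node.*").columns:
            break
        current = current.select(F.explode("node.children").alias("node"))
    out = levels[0]
    for lvl in levels[1:]:
        out = out.unionByName(lvl)
    return out

# Usage: placeholder path to the Spark event log read as JSON.
events = spark.read.json("/path/to/eventlog")
plan_nodes = flatten_plan(events.where(F.col("Event").contains("SparkListenerSQLExecutionStart")))
plan_nodes.show(truncate=False)
```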
I know in the old version of Dashboards we have the "Created at:" field, and in the new Lakeview Dashboards we have the "Last Modified:" field. I'm searching for a field that allows the client to quickly identify the latest data update timestamp for a Dashboard.
Hello everyone, I am trying to get information from Power BI semantic models via the XMLA endpoint using PySpark in Databricks. Can someone help me with that? Thanks
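XMLA itself needs an ADOMD-style client, which PySpark does not provide. One commonly used workaround (a sketch, not something from this thread) is the Power BI REST executeQueries endpoint, which runs a DAX query over plain HTTPS from a notebook; the dataset ID, token and query below are placeholders:

```python
# Sketch only: queries a Power BI semantic model via the REST "executeQueries"
# endpoint instead of XMLA; dataset ID, token and DAX query are placeholders.
import requests

dataset_id = "<dataset-id>"
token = "<azure-ad-access-token>"   # token issued for the Power BI resource

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/executeQueries",
    headers={"Authorization": f"Bearer {token}"},
    json={"queries": [{"query": "EVALUATE TOPN(10, 'MyTable')"}]},
)
resp.raise_for_status()
rows = resp.json()["results"][0]["tables"][0]["rows"]

df = spark.createDataFrame(rows)   # turn the result rows into a Spark DataFrame
df.show()
```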
Hi, on Nov 29th I attempted the Databricks Certified Data Engineer Associate exam for the 1st time; unfortunately I ended up with a failing grade. The passing grade was 70%, and I received 64.00%. I am planning to reattempt the exam. Could you kindly give me a...
Thank you for posting your concern on Community! To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hours).
Hi, we have several clusters used with notebooks; we don't delete them, just start/stop according to the "minutes of inactivity" setting. I'm trying to set a custom tag, so I wait until the cluster shuts down, add a tag, check that the tag is among the "...
@alejandrofm the behavior you're describing, where the custom tag disappears after the cluster restarts, might be related to the cluster configuration or the specific settings of your Databricks environment. To troubleshoot this, ensure that the cust...
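One way to make the tag survive restarts is to write it into the cluster's own definition rather than applying it out of band; a rough sketch with the Clusters REST API (host, token and cluster ID are placeholders, and clusters/edit requires resending the full spec):

```python
# Sketch only: persists a custom tag by writing it into the cluster's own
# definition via the Clusters API, so restarts keep it. Host, token and
# cluster ID are placeholders.
import requests

host = "https://<workspace>.cloud.databricks.com"
headers = {"Authorization": "Bearer <personal-access-token>"}
cluster_id = "<cluster-id>"

# Fetch the current cluster spec ...
spec = requests.get(
    f"{host}/api/2.0/clusters/get",
    headers=headers,
    params={"cluster_id": cluster_id},
).json()

# ... merge in the custom tag ...
tags = spec.get("custom_tags", {})
tags["team"] = "analytics"          # placeholder tag

# ... and push the full definition back; adjust fields if the cluster uses
# autoscaling or instance pools instead of a fixed worker count.
edit_body = {
    "cluster_id": cluster_id,
    "cluster_name": spec["cluster_name"],
    "spark_version": spec["spark_version"],
    "node_type_id": spec["node_type_id"],
    "num_workers": spec.get("num_workers", 0),
    "autotermination_minutes": spec.get("autotermination_minutes", 60),
    "custom_tags": tags,
}
requests.post(f"{host}/api/2.0/clusters/edit", headers=headers, json=edit_body).raise_for_status()
```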