Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Wassim
by New Contributor III
  • 1764 Views
  • 2 replies
  • 1 kudos

Resolved! Cancelling the exam - what is the policy if the exam was scheduled with a voucher?

I have my exam scheduled for next month, but I am going to cancel it (I registered for this exam using a voucher). In the future I may schedule another exam; would I be able to utilize the voucher that I used for the exam I am going to cancel? I mean, could tha...

Latest Reply
Harun
Honored Contributor
  • 1 kudos

No, once redeemed the voucher cannot be used again; it's better to reschedule the exam now instead.

1 More Replies
Sujitha
by Databricks Employee
  • 1298 Views
  • 1 reply
  • 4 kudos

Documentation Update

Databricks documentation provides how-to guidance and reference information for data analysts, data scientists, and data engineers working in the Databricks Data Science & Engineering, Databricks Machine Learning, and Databricks ...

Latest Reply
Harun
Honored Contributor
  • 4 kudos

Thanks for sharing @Sujitha Ramamoorthy

bernardocouto
by New Contributor II
  • 1422 Views
  • 1 reply
  • 4 kudos

Resolved! Databricks SQL Connector Abstraction for Python

Databricks SQL framework, easy to learn, fast to code, ready for production. I built an abstraction of the databricks-sql-connector in order to follow a pattern closer to the concepts of ORM tools, in addition to facilitating the adoption of the data ...
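For context, a minimal sketch of the raw databricks-sql-connector calls that such an abstraction would wrap (server hostname, HTTP path, and token are placeholders):

from databricks import sql

# Open a connection to a SQL warehouse and run a trivial probe query.
with sql.connect(
        server_hostname="<workspace-hostname>",
        http_path="<sql-warehouse-http-path>",
        access_token="<personal-access-token>") as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1 AS probe")
        print(cursor.fetchall())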

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 4 kudos

Sure, I will try it and provide feedback on the same.

kskistad
by New Contributor III
  • 2683 Views
  • 1 reply
  • 2 kudos

Resolved! Identity column in DLT using Python

How would I implement the Identity column in Delta Live Tables using Python syntax? GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( [ START WITH start ] [ INCREMENT BY step ] ) ]

Latest Reply
LaurentLeturgez
Databricks Employee
  • 2 kudos

Hi @Kory Skistad, please find below the table schema definition to use in a Python DLT pipeline. You can see it mentions the identity column definition. @dlt.table( comment="Raw data on sales", schema=""" customer_id STRING, customer_name STR...
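The reply's code is cut off above; a sketch of the complete pattern it points at, with the identity clause declared inside the schema string (sale_id and the source table are illustrative names):

import dlt

@dlt.table(
    comment="Raw data on sales",
    schema="""
        sale_id BIGINT GENERATED ALWAYS AS IDENTITY,
        customer_id STRING,
        customer_name STRING,
        price DOUBLE
    """)
def sales_raw():
    # Hypothetical source table; any batch or streaming read works here.
    return spark.read.table("raw.sales_input")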

Bartek
by Contributor
  • 2250 Views
  • 2 replies
  • 1 kudos

Resolved! Spark UI simulator is not available online

About 2 weeks ago I started the course "Optimizing Apache Spark on Databricks" from the official Databricks academy. It is heavily based on Spark UI simulator experiments that were available here: https://www.databricks.training/spark-ui-simulator and for...

Latest Reply
LandanG
Databricks Employee
  • 1 kudos

Hi @Bartosz Maciejewski, can you try loading the website without HTTPS and instead just HTTP, like http://www.databricks.training/spark-ui-simulator/ ?

1 More Replies
Gilg
by Contributor II
  • 5113 Views
  • 4 replies
  • 5 kudos

Avro Deserialization from Event Hub capture and Autoloader

Hi all, I am getting data from Event Hub capture in Avro format and using Auto Loader to process it. I got to the point where I can read the Avro by casting the Body into a string. Now I want to deserialize the Body column so it will be in table forma...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 5 kudos

If you still want to go with the above approach and don't want to provide the schema manually, you can fetch a tiny batch with 1 record and build the schema into a variable using the .schema option. Once done, you can add a new Body column by providin...
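The reply is truncated; a sketch of the approach it describes, assuming the Event Hub Body holds JSON (paths and names are illustrative):

from pyspark.sql.functions import col, from_json

# 1. Batch-read one capture file and infer the payload schema from a single record.
sample = (spark.read.format("avro")
          .load("/mnt/capture/sample.avro")
          .select(col("Body").cast("string").alias("body"))
          .limit(1))
payload_schema = spark.read.json(sample.rdd.map(lambda r: r.body)).schema

# 2. Stream the capture folder with Auto Loader and deserialize Body with that schema.
stream = (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "avro")
          .load("/mnt/capture/")
          .withColumn("payload", from_json(col("Body").cast("string"), payload_schema)))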

3 More Replies
kskistad
by New Contributor III
  • 1718 Views
  • 0 replies
  • 1 kudos

Set and use variables in DLT pipeline notebooks

Using DLT, I have two streaming sources coming from autoloader. Source1 contains a single row of data in the file and Source2 has thousands of rows. There is a common key column between the two sources to join them together. So far, so good. I have a ...
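The thread has no replies; for orientation, a minimal sketch of the setup being described, assuming JSON sources and a hypothetical common_key join column:

import dlt

@dlt.table
def source1():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/source1/"))  # hypothetical path

@dlt.table
def source2():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/source2/"))  # hypothetical path

@dlt.table
def joined():
    # dlt.read() resolves the dependency tables within the pipeline.
    return dlt.read("source1").join(dlt.read("source2"), "common_key")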

mikaellognseth
by New Contributor III
  • 11554 Views
  • 7 replies
  • 0 kudos

Resolved! Databricks cluster start-up: Self Bootstrap Failure

When attempting to deploy/start an Azure Databricks cluster through the UI, the following error consistently occurs: { "reason": { "code": "SELF_BOOTSTRAP_FAILURE", "parameters": { "databricks_error_message": "Self-bootstrap failure d...

Latest Reply
mikaellognseth
New Contributor III
  • 0 kudos

Hi, in our case the issue turned out to be DNS: the DNS servers set on the Databricks workspace VNet are only available when peering the "management" VNet in our setup. It took a while to figure out, as the error didn't exactly give a lot of clarity...

6 More Replies
swzzzsw
by New Contributor III
  • 8599 Views
  • 3 replies
  • 9 kudos

"Run now with different parameters" - different parameters not recognized by jobs involving multiple tasks

I'm running a Databricks job involving multiple tasks and would like to run the job with a different set of task parameters. I can achieve that by editing each task and changing the parameter values. However, it gets very manual when I have a lot of tas...

Latest Reply
erens
New Contributor II
  • 9 kudos

Hello, I am also facing the same issue. The problem is described below: I have a multi-task job. This job consists of multiple "spark_python_task" kind tasks that execute a Python script on a Spark cluster. This pipeline is created within a CI/CD ...
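For reference, one way to avoid editing each task by hand is to trigger the run programmatically with overridden parameters through the Jobs run-now API; a sketch with placeholder host, token, and job id (python_params are handed to the job's spark_python_task tasks):

import requests

resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/run-now",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "job_id": 123,                          # hypothetical job id
        "python_params": ["--env", "staging"],  # per-run parameter override
    })
resp.raise_for_status()
print(resp.json()["run_id"])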

2 More Replies
NavyaD
by New Contributor III
  • 2342 Views
  • 2 replies
  • 4 kudos

How to read a SQL notebook in a Python notebook in the workspace

I have a notebook named ecom_sellout.sql under the path notebooks/python/dataloader/queries. I have another notebook (named dataloader, under the path notebooks/python/dataloader) in which I am calling this SQL notebook. My code runs perfectly fine on re...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 4 kudos

Use magic commands; that way you can use both Python and SQL, properly formatted, in the same notebook. It will work.
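Concretely, the %run magic inlines the SQL notebook using the relative path from the question (a sketch; this cell would live in the dataloader notebook):

# In a cell of notebooks/python/dataloader/dataloader:
%run ./queries/ecom_sellout

# Temp views and variables defined in ecom_sellout are then
# available in this notebook's session.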

1 More Replies
rami-lv
by New Contributor II
  • 3466 Views
  • 3 replies
  • 3 kudos

What gets overwritten when writing to a Delta Lake table in overwrite mode?

I just tried to write to a Delta Lake table using overwrite mode, and I found that the history is preserved. It's unclear to me how the data is overwritten, and how long the history could be preserved. As they say, code is better than a thousand words: my...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 3 kudos

Hi @Rami ALZEBAK, overwrite means it first removes the data and then writes the whole data again. If you want to see the history, you can use the DESCRIBE HISTORY command.
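A quick sketch of both points in the reply, using an illustrative table name:

# Overwrite a small Delta table twice, then inspect its history.
df = spark.range(10).toDF("id")
df.write.format("delta").mode("overwrite").saveAsTable("demo_overwrite")
df.write.format("delta").mode("overwrite").saveAsTable("demo_overwrite")

# Each overwrite adds a new table version; old versions stay readable
# until VACUUM removes their underlying files.
spark.sql("DESCRIBE HISTORY demo_overwrite").show(truncate=False)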

2 More Replies
Chris_Shehu
by Valued Contributor III
  • 1294 Views
  • 1 reply
  • 2 kudos

What are the options for extracting data from the delta lake for a vendor?

Our vendor is looking to use Microsoft API Manager to retrieve data from a variety of sources. Is it possible to extract records from the delta lake by using an API? What I've tried: I reviewed the available Databricks APIs; it looks like most of them ...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 2 kudos

Another possibility is to stand up a cluster and have a notebook running Flask to create an API interface. I'm still looking into options, but it seems like there should be a baked-in solution besides the JDBC connector. I'm not ...
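A hedged sketch of the notebook-hosted Flask idea, as an assumption about the setup rather than a supported product feature (table name and port are illustrative):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/records")
def records():
    # Hypothetical Delta table; bound the result so the driver isn't overloaded.
    rows = spark.sql("SELECT * FROM my_delta_table LIMIT 100").collect()
    return jsonify([row.asDict() for row in rows])

# Runs on the driver node; the vendor would need network access to reach it.
app.run(host="0.0.0.0", port=8080)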

gauthamchettiar
by New Contributor II
  • 1767 Views
  • 0 replies
  • 1 kudos

Spark always performs broadcasts irrespective of spark.sql.autoBroadcastJoinThreshold during a streaming merge operation with DeltaTable.

I am trying to do a streaming merge between delta tables using this guide: https://docs.delta.io/latest/delta-update.html#upsert-from-streaming-queries-using-foreachbatch Our code sample (Java): Dataset<Row> sourceDf = sparkSession ...
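The code sample above is truncated; for orientation, a Python sketch of the foreachBatch streaming-merge pattern from the linked Delta guide (the thread's original is Java; table names and the join key are illustrative):

from delta.tables import DeltaTable

def upsert_to_delta(micro_batch_df, batch_id):
    # Merge each micro-batch into the target Delta table on an illustrative key.
    target = DeltaTable.forName(micro_batch_df.sparkSession, "target_table")
    (target.alias("t")
     .merge(micro_batch_df.alias("s"), "t.id = s.id")
     .whenMatchedUpdateAll()
     .whenNotMatchedInsertAll()
     .execute())

(spark.readStream.table("source_table")
 .writeStream
 .foreachBatch(upsert_to_delta)
 .outputMode("update")
 .start())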

same213
by New Contributor III
  • 4771 Views
  • 4 replies
  • 8 kudos

Is it possible to create a sqlite database and export it?

I am trying to create a SQLite database in Databricks and add a few tables to it. Ultimately, I want to export this using Azure. Is this possible?

Latest Reply
same213
New Contributor III
  • 8 kudos

@Hubert Dudek, we currently have a process in place that reads in a SQLite file. We recently transitioned to using Databricks. We were hoping to be able to create a SQLite file so we didn't have to alter the current process we have in place.
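Given that confirmation, a minimal sketch of one way to do it: build the SQLite file with Python's built-in sqlite3 on the driver's local disk, then copy it to cloud-backed storage (paths and schema are illustrative):

import shutil
import sqlite3

# Build the SQLite file on the driver's local disk.
local_path = "/tmp/export.db"
with sqlite3.connect(local_path) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 3.25)])

# Copy it to a durable location, e.g. a DBFS mount backed by Azure storage.
shutil.copy(local_path, "/dbfs/mnt/exports/export.db")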

3 More Replies
