Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

ameya
by New Contributor
  • 155 Views
  • 1 reply
  • 0 kudos

Adding new columns to a Delta Live table in a CDC process

Hi, I am new to Databricks and still learning. I am trying to do CDC on a table:  APPLY CHANGES INTO LIVE.table1 FROM schema2.table2 KEYS (Id) SEQUENCE BY orderByColumn COLUMNS * EXCEPT (col1, col2) STORED AS SCD TYPE 1;  table1 is in schema1 and ...

Latest Reply
raphaelblg
Honored Contributor
  • 0 kudos

Hi @ameya , Scenario 1: Enabling Delta schema evolution in your table or at DLT pipeline level should suffice for the scenario of new fields being added to the schema.  Scenario 2: The INSERT statement doesn't support schema evolution as described in...

NarenderKumar
by New Contributor III
  • 137 Views
  • 2 replies
  • 0 kudos

Can we parameterize the compute in job cluster

I have created a workflow job in Databricks with job parameters. I want to run the same job with different workloads and data volumes, so I want the compute cluster to be parameterized so that I can pass the compute requirements (driver, executor size and...

Latest Reply
brockb
Valued Contributor
  • 0 kudos

Hi @NarenderKumar, have you considered leveraging autoscaling for the existing cluster? If this does not meet your needs, are the differing volumes/workloads known in advance? If so, could different compute be provisioned using Infrastructure as Code ...
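One way to approximate "parameterized compute" today is to choose a `new_cluster` spec per workload profile before submitting the job run. A minimal sketch, assuming hypothetical profile names and node types (not taken from the thread):

```python
# Sketch: pick a Jobs API new_cluster spec by workload profile.
# Profile names and node_type_id values are illustrative assumptions.

WORKLOAD_PROFILES = {
    "small": {"node_type_id": "i3.xlarge", "num_workers": 2},
    "large": {"node_type_id": "i3.2xlarge", "num_workers": 8},
}

def build_cluster_spec(profile: str, spark_version: str = "14.3.x-scala2.12") -> dict:
    """Return a new_cluster spec dict for the given workload profile."""
    try:
        sizing = WORKLOAD_PROFILES[profile]
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}")
    return {"spark_version": spark_version, **sizing}
```

The resulting dict could then be passed as the job run's cluster definition through whatever submission mechanism (SDK, CLI, IaC) is in use.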

1 More Replies
yusufd
by New Contributor III
  • 1484 Views
  • 9 replies
  • 8 kudos

Resolved! Pyspark serialization

Hi, I was looking for comprehensive documentation on implementing serialization in PySpark; most of the resources I have found cover serialization with Scala. Could you point out where I can get a detailed explanation of it?

Latest Reply
yusufd
New Contributor III
  • 8 kudos

Thank you @Kaniz_Fatma for the prompt reply. This clears things up and also distinguishes between Spark-Scala and PySpark. I appreciate the explanation; I will apply this and share any findings that may help the community!
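For anyone landing here: PySpark moves Python data between the driver and executors as pickled batches (functions are shipped via cloudpickle). A quick local round-trip with the standard library shows the same mechanism on plain data; this is a sketch, not PySpark's internal code:

```python
import pickle

# PySpark serializes Python rows into pickled batches when shipping them
# to executors; a local pickle round-trip demonstrates the same idea.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
payload = pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
assert restored == rows
```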

8 More Replies
tseader
by New Contributor II
  • 352 Views
  • 3 replies
  • 1 kudos

Resolved! Python SDK clusters.create_and_wait - Sourcing from cluster-create JSON

I am attempting to create a compute cluster using the Python SDK while sourcing a cluster-create configuration JSON file, which is how it's done for the databricks-cli and what databricks provides through the GUI.  Reading in the JSON as a Dict fails...

Latest Reply
tseader
New Contributor II
  • 1 kudos

@Kaniz_Fatma The structure of the `cluster-create.json` is perfectly fine. The issue, as stated above, is that the SDK does not allow nested structures from the JSON file to be used; instead they need to be cast to spec...
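The "cast nested dicts to typed spec objects" step can be illustrated with stdlib dataclasses. The class names below are hypothetical stand-ins for the SDK's typed classes (the real ones live in `databricks.sdk.service.compute`); this is a sketch of the pattern, not the SDK's API:

```python
import json
from dataclasses import dataclass

# Hypothetical stand-ins for the SDK's typed spec classes.
@dataclass
class AutoScale:
    min_workers: int
    max_workers: int

@dataclass
class ClusterSpec:
    cluster_name: str
    spark_version: str
    node_type_id: str
    autoscale: AutoScale

def load_cluster_spec(raw: str) -> ClusterSpec:
    """Cast a cluster-create JSON document into typed spec objects:
    nested dicts must be converted before building the top-level spec."""
    cfg = json.loads(raw)
    cfg["autoscale"] = AutoScale(**cfg["autoscale"])  # nested dict -> typed object
    return ClusterSpec(**cfg)
```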

2 More Replies
jacovangelder
by Contributor III
  • 156 Views
  • 2 replies
  • 2 kudos

How do you define PyPi libraries on job level in Asset Bundles?

Hello, reading the documentation, it does not state that it is possible to define libraries at job level instead of at task level. It feels really counter-intuitive putting libraries at task level in Databricks workflows provisioned by Asset Bundles. Is th...

Latest Reply
jacovangelder
Contributor III
  • 2 kudos

Thanks @Witold! Thought so. I decided to go with an init script where I install my dependencies rather than installing libraries. For future reference, this is what it looks like: job_clusters: - job_cluster_key: job_cluster new_cluster: ...
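Expanding the truncated snippet above, the init-script approach in an Asset Bundle might look like the following. This is a sketch only; the spark version, node type, and script path are placeholder assumptions, not values from the thread:

```yaml
job_clusters:
  - job_cluster_key: job_cluster
    new_cluster:
      spark_version: 14.3.x-scala2.12
      node_type_id: i3.xlarge
      num_workers: 2
      init_scripts:
        - workspace:
            destination: /Shared/init/install_dependencies.sh
```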

1 More Replies
William_Scardua
by Valued Contributor
  • 501 Views
  • 1 reply
  • 0 kudos

Get Block Size

Hi guys, how can I get the block size? Any ideas? Thank you.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @William_Scardua, To get the total directory size in Databricks, you can use the dbutils.fs.ls command. Here are a few options: To get the size of a specific directory (e.g., /mnt/abc/xyz), you can run the following Scala code: val path = "/mn...
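Outside Databricks (or as a local analog), the same idea of summing sizes from a file listing can be done in plain Python. A minimal sketch; `dbutils.fs.ls` itself only exists inside Databricks:

```python
import os
import tempfile

def dir_size_bytes(path: str) -> int:
    """Recursively sum file sizes under path -- the same fold over file
    listings that summing dbutils.fs.ls sizes performs on DBFS."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Quick local check against a temporary directory with one 1 KiB file
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "a.bin"), "wb") as f:
    f.write(b"x" * 1024)
size = dir_size_bytes(demo)
```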

KendraVant
by New Contributor II
  • 10764 Views
  • 7 replies
  • 2 kudos

Resolved! How do I clear all output results in a notebook?

I'm building notebooks for tutorial sessions and I want to clear all the output results from the notebook before distributing it to the participants. This functionality exists in Jupyter but I can't find it in Databricks. Any pointers?

Latest Reply
holly
Contributor II
  • 2 kudos

Yes! Run > Clear > Clear all cell outputs. Fun fact: this feature was added ~10 years ago when we realised all our customer demos looked very messy and had lots of spoilers in them!

6 More Replies
kmaley
by New Contributor
  • 141 Views
  • 1 reply
  • 0 kudos

Concurrent append exception - Two streaming sources writing to same record on the delta table

Hi all, I have a scenario with two streaming sources, Stream1(id, col1, col2) and Stream2(id, col3, col4), and my delta table has columns (id, col1, col2, col3, col4). My requirement is to insert the record into the delta table if the corre...

Latest Reply
Witold
New Contributor III
  • 0 kudos

I would keep both write operations separate, i.e. they should write to their own tables/partitions. In later stages (e.g. silver), you can easily merge them.

kiranpeesa
by New Contributor
  • 235 Views
  • 1 reply
  • 0 kudos

error in notebook execution

Error in callback <bound method UserNamespaceCommandHook.post_run_cell of <dbruntime.DatasetInfo.UserNamespaceCommandHook object at 0x7f5790c07070>> (for post_run_cell)

Latest Reply
Witold
New Contributor III
  • 0 kudos

Can you show us your code?

Soma
by Valued Contributor
  • 954 Views
  • 2 replies
  • 2 kudos

node_timeline system table is not producing any output in UC

The node_timeline system table is not producing any output in UC: the node_timeline table in system tables has no data. Do we need to enable any other services for UC?

Data Engineering
spark
Unity Catalog
Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Soma,  The node_timeline system table in Databricks is designed to provide historical observability across your account. However, if you’re not seeing any data in the node_timeline table, there are a few considerations to keep in mind: Unity ...

1 More Replies
RajaPalukuri
by New Contributor II
  • 841 Views
  • 4 replies
  • 0 kudos

Databricks -Terraform- (condition_task)

Hi Team, I am planning to create an IF/ELSE condition task in Databricks using Terraform code. My requirement is Task A (extract records from DB and count recs) --> Task B (validate the counts using condition_task) --> Task C (load data if Task B va...

Latest Reply
hendrykarlar
New Contributor II
  • 0 kudos

Implementing conditional logic in Databricks using Terraform involves setting up tasks and condition checks between them. Here's how you can structure your Terraform code to achieve the desired workflow: Step 1: Define Databricks notebooks as tasks. As...
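The A --> B --> C flow described above might be sketched with the Terraform Databricks provider roughly as follows. Treat this as a hedged sketch: the notebook paths and the task-value key are placeholder assumptions, the count would need to be published from Task A via task values, and cluster configuration is omitted for brevity:

```hcl
resource "databricks_job" "etl" {
  name = "conditional-load"

  task {
    task_key = "extract_and_count"
    notebook_task {
      notebook_path = "/Workspace/etl/extract"
    }
  }

  task {
    task_key = "validate_counts"
    depends_on {
      task_key = "extract_and_count"
    }
    condition_task {
      op    = "GREATER_THAN"
      left  = "{{tasks.extract_and_count.values.row_count}}"
      right = "0"
    }
  }

  task {
    task_key = "load_data"
    depends_on {
      task_key = "validate_counts"
      outcome  = "true"
    }
    notebook_task {
      notebook_path = "/Workspace/etl/load"
    }
  }
}
```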

3 More Replies
NCat
by New Contributor III
  • 6067 Views
  • 7 replies
  • 3 kudos

How can I start SparkSession out of Notebook?

Hi community, how can I start a SparkSession outside of a notebook? I want to split my notebook into small Python modules, and I want some of them to call Spark functionality.

Latest Reply
jacovangelder
Contributor III
  • 3 kudos

Just reuse the active Databricks SparkSession: from pyspark.sql import SparkSession; spark = SparkSession.getActiveSession()

6 More Replies
VovaVili
by New Contributor II
  • 1118 Views
  • 3 replies
  • 0 kudos

Databricks Runtime 13.3 - can I use Databricks Connect without Unity Catalog?

Hello all,The official documentation for Databricks Connect states that, for Databricks Runtime versions 13.0 and above, my cluster needs to have Unity Catalog enabled for me to use Databricks Connect, and use a Databricks cluster through an IDE like...

Latest Reply
mohaimen_syed
New Contributor III
  • 0 kudos

Hi, I'm currently using Databricks Connect without the Unity Catalog on VS Code. Although I have connected the Unity Catalog separately on multiple occasions, I don't think it's required. Here is the doc: https://docs.databricks.com/en/dev-tools/databrick...

2 More Replies
amartinez
by New Contributor III
  • 3443 Views
  • 6 replies
  • 4 kudos

Workaround for GraphFrames not working on Delta Live Table?

According to this page, the GraphFrames package is included in the databricks runtime since at least 11.0. However trying to run a connected components algorithm inside a delta live table notebook yields the error java.lang.ClassNotFoundException: or...

Latest Reply
lprevost
New Contributor III
  • 4 kudos

I'm also trying to use GraphFrames inside a DLT pipeline. I get an error that graphframes is not installed on the cluster. I'm using it successfully in test notebooks on the ML version of the cluster. Is there a way to use it inside a DLT job?

5 More Replies
johnb1
by Contributor
  • 16309 Views
  • 13 replies
  • 8 kudos

Problems with pandas.read_parquet() and path

I am doing the "Data Engineering with Databricks V2" learning path. I cannot run "DE 4.2 - Providing Options for External Sources", as the first code cell does not run successfully: %run ../Includes/Classroom-Setup-04.2 Screenshot 1: Inside the setup note...

Latest Reply
jonathanchcc
New Contributor III
  • 8 kudos

Thanks for sharing, this helped me too.

12 More Replies