Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Red_blue_green
by New Contributor III
  • 17015 Views
  • 3 replies
  • 0 kudos

Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspark?

Hello, I currently have a Delta folder as a table with several columns that are nullable. I want to migrate data to the table and overwrite the content using PySpark, add several new columns, and make them not nullable. I have found a way to make the c...

Latest Reply
kanjinghat
New Contributor II
  • 0 kudos

Not sure if you found a solution, but you can also try the approach below. In this case you pass the full path to the Delta table rather than the table name itself:

spark.sql(f"ALTER TABLE delta.`{full_delta_path}` ALTER COLUMN {column_name} SET NOT NULL")
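For the original question's "several columns" case, a minimal sketch of the same command in a loop; the path and column names are placeholders, and note that SET NOT NULL fails if a column still contains nulls, so backfill those first:

# Hedged sketch: apply NOT NULL to several columns of a Delta table by path.
full_delta_path = "s3://my-bucket/path/to/delta"   # placeholder path
for column_name in ["col_a", "col_b"]:             # placeholder columns
    spark.sql(
        f"ALTER TABLE delta.`{full_delta_path}` "
        f"ALTER COLUMN {column_name} SET NOT NULL"
    )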

  • 0 kudos
2 More Replies
venkata_kishore
by New Contributor
  • 2686 Views
  • 1 reply
  • 1 kudos

Delta Live Tables - Oracle connectivity

Do Delta Live Tables/pipelines support Oracle or external database connectivity? I am getting an Oracle Driver not found error. DLT does not support Maven installs through asset bundles. ERRORS: 1) py4j.protocol.Py4JJavaError: An error occurred while call...

Data Engineering
Delta Live Tables
dlt
oracle
pipelines
Latest Reply
RamGoli
Databricks Employee
  • 1 kudos

Hi @venkata_kishore, as of now DLT does not support Oracle, and one cannot install third-party libraries and JARs: https://docs.databricks.com/en/delta-live-tables/unity-catalog.html#limitations If Lakehouse Federation has support for Oracle, then ...
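If Lakehouse Federation does support Oracle in your workspace, a rough sketch of the setup follows; the connection and catalog names, host, and option keys are all assumptions to verify against the federation docs:

# Hedged sketch, not a confirmed API: create a federation connection and a
# foreign catalog over it, then query Oracle tables through Unity Catalog.
spark.sql("""
    CREATE CONNECTION IF NOT EXISTS oracle_conn TYPE oracle
    OPTIONS (
      host 'oracle.example.com',                    -- placeholder host
      port '1521',
      user secret('my_scope', 'oracle_user'),       -- placeholder secret scope/keys
      password secret('my_scope', 'oracle_password')
    )
""")
spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS oracle_cat
    USING CONNECTION oracle_conn
    OPTIONS (service_name 'ORCLPDB1')               -- placeholder service name
""")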

  • 1 kudos
AdityaM
by New Contributor II
  • 3654 Views
  • 1 reply
  • 0 kudos

Creating external tables using gzipped CSV file - S3 URI without extensions

Hi Databricks community, hope you are doing well. I am trying to create an external table using a gzipped CSV file uploaded to an S3 bucket. The S3 URI of the resource doesn't have any file extension, but the content of the file is a gzipped comma-sepa...

Latest Reply
AdityaM
New Contributor II
  • 0 kudos

Hey, thanks for your response. I tried using a SerDe (I think the OpenCSVSerde should work for me) but unfortunately I'm getting the below from Unity Catalog: [UC_DATASOURCE_NOT_SUPPORTED] Data source format hive is not supported in Unity Catalog....
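One workaround sketch that avoids SerDes entirely, relying on the fact that Spark infers gzip compression from a ".gz" extension: copy the extensionless object to a path that has one, then define a plain CSV table over the staged location. Paths and names below are placeholders, and in Unity Catalog the LOCATION must be inside a configured external location:

# Hedged workaround: give the object a .gz extension so Spark's codec
# inference kicks in, then create an external CSV table over it.
dbutils.fs.cp(
    "s3://my-bucket/raw/data_no_extension",  # placeholder source object
    "s3://my-bucket/staged/data.csv.gz",     # placeholder staged copy
)
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.my_schema.my_table
    USING CSV
    OPTIONS (header 'true', delimiter ',')
    LOCATION 's3://my-bucket/staged/'
""")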

  • 0 kudos
ashraf1395
by Honored Contributor
  • 2949 Views
  • 2 replies
  • 1 kudos

Resolved! Starting a serverless SQL cluster on GCP

Hello there, I am trying to start a serverless Databricks SQL cluster in GCP. I am following this Databricks doc: https://docs.gcp.databricks.com/en/admin/sql/serverless.html I have checked that all my requirements are fulfilled for activating the clus...

Latest Reply
ashraf1395
Honored Contributor
  • 1 kudos

I had another question, though not related to this thread. Does Databricks have any plans for startups, like the normal free trial?

  • 1 kudos
1 More Replies
jainshasha
by New Contributor III
  • 15630 Views
  • 11 replies
  • 2 kudos

Job Cluster in Databricks workflow

Hi, I have configured 20 different workflows in Databricks, all of them with job clusters with different names. All 20 workflows are scheduled to run at the same time, but even with a different job cluster configured in each of them, they run sequentially w...

Latest Reply
emora
New Contributor III
  • 2 kudos

Honestly, you shouldn't have any kind of limitation executing different workflows. I did a test case in my Databricks, and if your workflows use a job cluster you shouldn't hit a limitation. But I did all my tests in Azure, and just for you to kn...
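One way to sanity-check whether the runs are really serialized by Databricks is to trigger the jobs yourself from separate threads and watch whether they start in parallel. A minimal sketch assuming the databricks-sdk package is installed and using placeholder job IDs:

from concurrent.futures import ThreadPoolExecutor
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
job_ids = [101, 102, 103]  # placeholder job IDs

# Runs on distinct job clusters should start concurrently; if they queue,
# check cloud instance quotas rather than the Databricks scheduler.
with ThreadPoolExecutor(max_workers=len(job_ids)) as pool:
    runs = list(pool.map(lambda job_id: w.jobs.run_now(job_id=job_id), job_ids))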

  • 2 kudos
10 More Replies
Anske
by New Contributor III
  • 5304 Views
  • 3 replies
  • 1 kudos

Resolved! DLT apply_changes applies only deletes and inserts, not updates

Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest) with the following code: dlt.apply_changes(    target = "cdctest",    source = "cdctest_cdc_enriched",    keys = ["ID"],    sequence_by...

Data Engineering
Delta Live Tables
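For reference, a hedged sketch of what the full call can look like; "ts" and "operation" are assumed column names, not the poster's. A common cause of updates appearing to be skipped is a sequence_by column that does not strictly increase for updated rows, since apply_changes discards out-of-order events:

import dlt
from pyspark.sql.functions import col, expr

dlt.create_streaming_table("cdctest")

dlt.apply_changes(
    target = "cdctest",
    source = "cdctest_cdc_enriched",
    keys = ["ID"],
    sequence_by = col("ts"),                          # assumed ordering column
    apply_as_deletes = expr("operation = 'DELETE'"),  # assumed op column
    stored_as_scd_type = 1,
)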
namankhamesara
by New Contributor II
  • 1087 Views
  • 0 replies
  • 0 kudos

Error while running Databricks modules

Hi Databricks Community, I am following this course provided by Databricks: https://customer-academy.databricks.com/learn/course/1266/data-engineering-with-databricks?generated_by=575333&hash=6edddab97f2f528922e2d38d8e4440cda4e5302a In this, when I am ...

Data Engineering
databrickscommunity
MrD
by New Contributor
  • 1789 Views
  • 1 reply
  • 0 kudos

Issue with autoscaling the cluster

Hi All, my job is breaking as the cluster is not able to autoscale. Below is the log. Could it be due to AWS VMs not spinning up, or an issue with the Databricks configuration? Has anyone faced this before? TERMINATING Compute terminated. Reason:...

Latest Reply
koushiknpvs
New Contributor III
  • 0 kudos

Hey MrD, I faced this issue while running Azure VMs. A restart and re-attaching the cluster helped me. Please let me know if that works for you.
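To see the actual termination or resize-failure reason rather than guessing, the cluster event log can be read via the SDK. A minimal sketch assuming databricks-sdk is installed and using a placeholder cluster ID:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
# Cluster events include autoscale/resize failures and termination reasons,
# which help distinguish AWS capacity problems from Databricks configuration.
for event in w.clusters.events(cluster_id="0123-456789-abcdef12"):
    print(event)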

  • 0 kudos
Wolfoflag
by New Contributor II
  • 5605 Views
  • 1 reply
  • 0 kudos

Threads vs Processes (Parallel Programming) Databricks

Hi Everyone, I am trying to implement parallel processing in Databricks, and all the resources online point to using ThreadPool from Python's multiprocessing.pool library or the concurrent.futures library. These libraries offer methods for creating async...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

I am not a super expert, but I have been using Databricks for a while and I can say that when you use any Python library like asyncio, ThreadPool and so on, it is good only for some maintenance things, small API calls, etc. When you want to leverage s...
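A short sketch of the pattern the reply describes: threads work for orchestrating independent Spark actions, because each thread only submits jobs to the cluster's scheduler, which does the actual parallel compute. Database and table names are placeholders:

from concurrent.futures import ThreadPoolExecutor

def refresh(name):
    # Each call is a separate Spark job; the cluster scheduler runs them
    # concurrently, so the threads coordinate rather than compute.
    (spark.table(f"source_db.{name}")
          .write.mode("overwrite")
          .saveAsTable(f"target_db.{name}"))

tables = ["orders", "customers", "payments"]  # placeholder table names
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(refresh, tables))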

  • 0 kudos
digui
by New Contributor
  • 7295 Views
  • 3 replies
  • 0 kudos

Issues when trying to modify log4j.properties

Hi y'all. I'm trying to export metrics and logs to AWS CloudWatch, but while following their tutorial I ended up facing this error when trying to initialize my cluster with an init script they provided. This is the part where the script fail...

Latest Reply
cool_cool_cool
New Contributor II
  • 0 kudos

@digui Did you figure out what to do? We're facing the same issue; the script works for the executors. I was thinking of adding an if-check that tests whether log4j.properties exists and modifies it only if it does.

  • 0 kudos
2 More Replies
ashraf1395
by Honored Contributor
  • 7105 Views
  • 1 reply
  • 1 kudos

Optimising Clusters in Databricks on GCP

Hi there everyone, we are trying to get hands-on with the Databricks Lakehouse for a prospective client's project. Our major aim for the project is to compare a Data Lakehouse on Databricks and a BigQuery data warehouse in terms of cost and time to set up and run que...

smedegaard
by New Contributor III
  • 2290 Views
  • 1 reply
  • 0 kudos

DLT run fails with "com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found"

I've created a streaming live table from a foreign catalog. When I run the DLT pipeline it fails with "com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found". I haven't seen any documentation that suggests I need to install Debezium manuall...

MartinH
by New Contributor II
  • 19537 Views
  • 7 replies
  • 6 kudos

Resolved! Azure Data Factory and Photon

Hello, we have Databricks Python notebooks accessing Delta tables. These notebooks are scheduled/invoked by Azure Data Factory. How can I enable Photon on the linked services that are used to call Databricks? If I specify a new job cluster, there does n...

Latest Reply
CharlesReily
New Contributor III
  • 6 kudos

When you create a cluster on Databricks, you can enable Photon by selecting the "Photon" option in the cluster configuration settings. This is typically done when creating a new cluster, and you will find the option in the advanced cluster configura...
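For the ADF side specifically, there is no explicit Photon checkbox on the linked service; the usual route is selecting a Photon runtime through the cluster version string. A sketch of the relevant linked-service settings as a Python dict; the property names and the exact version string are assumptions to verify against the ADF documentation and your workspace's available runtimes:

# Hedged sketch of ADF Azure Databricks linked service typeProperties.
linked_service_type_properties = {
    "domain": "https://adb-1234567890123456.7.azuredatabricks.net",  # placeholder URL
    "newClusterVersion": "13.3.x-photon-scala2.12",  # Photon runtime string (assumption)
    "newClusterNodeType": "Standard_DS3_v2",
    "newClusterNumOfWorker": "2",
}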

  • 6 kudos
6 More Replies
dbdude
by New Contributor II
  • 13929 Views
  • 3 replies
  • 1 kudos

AWS Secrets Works In One Cluster But Not Another

Why can I use boto3 to retrieve a secret from Secrets Manager with a personal cluster, but get an error with a shared cluster? NoCredentialsError: Unable to locate credentials

Latest Reply
Husky
New Contributor III
  • 1 kudos

Hey @dbdude, I am facing the same error. Did you find a solution for accessing the AWS credentials on a shared cluster? This article describes a way of storing credentials in a Unity Catalog volume for the shared cluster to fetch: https://medium.com/@amluci...
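Another hedged option, given that shared/UC clusters may not expose the instance profile's credentials to boto3: keep the AWS keys in a Databricks secret scope and pass them explicitly. Scope name, key names, region, and secret ID below are placeholders:

import boto3

# Credentials come from a Databricks secret scope instead of the instance
# profile, which the shared cluster may not expose to user code.
sm = boto3.client(
    "secretsmanager",
    region_name="us-east-1",  # placeholder region
    aws_access_key_id=dbutils.secrets.get("aws_scope", "access_key_id"),
    aws_secret_access_key=dbutils.secrets.get("aws_scope", "secret_access_key"),
)
secret_value = sm.get_secret_value(SecretId="my/secret")["SecretString"]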

  • 1 kudos
2 More Replies
