Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anish_2
by New Contributor II
  • 1304 Views
  • 3 replies
  • 0 kudos

Delta live tables - ignore updates on some columns

Hello Team, I have a scenario where, in apply_changes, I want to ignore updates on one column. Is there any way we can achieve this in Delta Live Tables?

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi there @Anish_2, yes, you can do that. Here is the doc link: https://docs.databricks.com/aws/en/dlt/cdc?language=Python For Python you can simply add the except_column_list attribute, like this: dlt.apply_changes( target = "target", source = "users...
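For reference, a minimal sketch of what that call might look like; the table, key, and column names below are placeholders rather than anything from the thread, and per the linked docs the parameter is except_column_list:

```python
import dlt
from pyspark.sql.functions import col

# The target streaming table must exist before apply_changes runs.
dlt.create_streaming_table("target")

dlt.apply_changes(
    target = "target",
    source = "users",                      # hypothetical CDC source view/table
    keys = ["userId"],                     # hypothetical primary key
    sequence_by = col("sequenceNum"),      # hypothetical ordering column
    except_column_list = ["last_login"],   # column(s) to leave out, per the reply
)
```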

2 More Replies
LukaszJ
by Contributor III
  • 23231 Views
  • 7 replies
  • 2 kudos

Resolved! Install ODBC driver by init script

Hello, I want to install an ODBC driver (for pyodbc). I have tried to do it using Terraform, however I think it is impossible. So I want to do it with an init script in my cluster. I have the code from the internet and it works when it is on the beginning of ...

Latest Reply
MayaBakh_80151
New Contributor II
  • 2 kudos

Actually found this article and am using it to migrate my shell script to the workspace: Cluster-named and cluster-scoped init script migration notebook - Databricks
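For anyone following along, a rough sketch of the kind of cluster-scoped init script involved, written to a workspace file from a notebook; the workspace path, Ubuntu repo version, and driver package are assumptions, not details from the thread:

```python
import os

# Hypothetical init-script content that installs the Microsoft ODBC driver
# used by pyodbc. Adjust the repo URL and package for the Ubuntu release
# your Databricks Runtime is based on.
script = """#!/bin/bash
set -e
curl -sSL https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl -sSL https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc-dev
"""

# Save it as a workspace file (assumes workspace files are writable from the
# notebook), then reference it under the cluster's Advanced options > Init scripts.
path = "/Workspace/Shared/init-scripts/install_odbc.sh"
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
    f.write(script)
```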

6 More Replies
GowthamR
by New Contributor II
  • 399 Views
  • 2 replies
  • 2 kudos

Connecting SQL Server From Databricks

Hi Team, good day! I am trying to connect to SQL Server from Databricks using pyodbc, however I am not able to connect, and I have tried many ways like adding the init script in the cluster configuration etc., but it is showing me an error. I want to know ea...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 2 kudos

@GowthamR supplying the errors, if possible, given that they aren't leaking credentials (please mask those in the screenshots), will be really useful for helping us debug what's happening. All the best, BS
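Until the error details are posted, here is a minimal pyodbc connection sketch for context; it assumes the ODBC driver (e.g. msodbcsql18, as in the init-script thread above) is already installed on the cluster, and all server, database, and credential values are placeholders:

```python
import pyodbc

# Placeholder connection details -- in practice pull credentials from a
# secret scope rather than hard-coding them.
conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net,1433;"
    "DATABASE=mydb;"
    "UID=myuser;"
    "PWD=mypassword;"
    "Encrypt=yes;TrustServerCertificate=no;"
)

conn = pyodbc.connect(conn_str, timeout=30)
cursor = conn.cursor()
cursor.execute("SELECT 1")   # simple connectivity check
print(cursor.fetchone())
conn.close()
```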

1 More Replies
sandeepsuresh16
by New Contributor II
  • 1132 Views
  • 4 replies
  • 7 kudos

Resolved! Azure Databricks Job Run Failed with Error - Could not reach driver of cluster

Hello Community, I am facing an intermittent issue while running a Databricks job. The job fails with the following error message: Run failed with error message: Could not reach driver of cluster <cluster-id>. Here are some additional details: Cluster Typ...

Latest Reply
sandeepsuresh16
New Contributor II
  • 7 kudos

Hello Anudeep, thank you for your detailed response and the helpful recommendations. I would like to provide some additional context: for our jobs, we are running only one notebook at a time, not multiple notebooks or tasks concurrently. The issue occurs...

3 More Replies
Vinil
by New Contributor III
  • 769 Views
  • 7 replies
  • 1 kudos

Upgrading Drivers and Authentication Method for Snowflake Integration

Hello Databricks Support Team, I am reaching out to request assistance with upgrading the drivers and configuring authentication methods for our Snowflake–Databricks integration. We would like to explore and implement one of the recommended secure auth...

Latest Reply
Vinil
New Contributor III
  • 1 kudos

@Khaja_Zaffer, I need assistance on upgrading the Snowflake drivers on the cluster. We installed the Snowflake package on the cluster; how do we upgrade the Snowflake library? For authentication, I will reach out to the Azure team. Thanks for the details.
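One possible route, assuming the package in question is the notebook-scoped Python connector (snowflake-connector-python) rather than the Spark–Snowflake connector bundled with the runtime, is a %pip upgrade; cluster-installed libraries are instead upgraded by changing the version on the cluster's Libraries tab and restarting the cluster:

```python
# Cell 1: upgrade the notebook-scoped library
# (assumption: snowflake-connector-python is the installed package).
%pip install --upgrade snowflake-connector-python
```

```python
# Cell 2: restart the Python process so the upgraded version is picked up.
dbutils.library.restartPython()
```

```python
# Cell 3: verify the new version.
import snowflake.connector
print(snowflake.connector.__version__)
```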

6 More Replies
yit
by Contributor III
  • 820 Views
  • 3 replies
  • 6 kudos

Resolved! Schema hints: define column type as struct and incrementally add fields with schema evolution

Hey everyone, I want to set a column type as an empty struct via schema hints, without specifying subfields. Then I expect the struct to be evolved with subfields through schema evolution when new subfields appear in the data. But I've found in the documen...

Data Engineering
autoloader
schema hints
Struct
Latest Reply
K_Anudeep
Databricks Employee
  • 6 kudos

Hello @yit, You can’t. An “empty struct” is treated as a fixed struct with zero fields, so AutoLoader will not expand it later. The NOTE in the screenshot applies to JSON just as much as Parquet/Avro/CSV. If your goal is “discover whatever shows up u...
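As one concrete workaround consistent with the reply, the struct's known subfields can be spelled out in the schema hint and addNewColumns evolution left to pick up later additions; the paths, column name, and field names in this sketch are placeholders:

```python
# Auto Loader stream where the struct column's known subfields are hinted
# explicitly; schema evolution (addNewColumns) then adds fields discovered later.
# "profile", the load path, and the schema location are illustrative only.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/schemas/users")
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .option("cloudFiles.schemaHints", "profile STRUCT<name: STRING, age: INT>")
    .load("/mnt/raw/users")
)
```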

2 More Replies
ToBeDataDriven
by New Contributor II
  • 663 Views
  • 4 replies
  • 3 kudos

Resolved! Disable Logging in Python `dbutils.fs.put`?

Every time this function writes, it logs "Wrote n bytes." to stdout. I want to disable its logging, as I am writing thousands of files and it floods the log with meaningless information. Does anyone know if it's possible?

Latest Reply
K_Anudeep
Databricks Employee
  • 3 kudos

@ToBeDataDriven, if the above answered your question, could you please accept it as the solution?

3 More Replies
ManojkMohan
by Honored Contributor II
  • 1088 Views
  • 1 reply
  • 2 kudos

Parsing from PDF to a Structured Table | Looking for best practices

Use Case: Converting unstructured data from PDF to a structured format before sending to Salesforce. Ask: Best practices to structure my table better before sending to a system like Salesforce. Output in structured format looks like: My code: Extract Tables f...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 2 kudos

@ManojkMohan My advice for parsing PDFs: 1. Will your project have PDFs that are all the same in terms of formatting? I.e. invoices of a particular type where things like addresses and values might change but their position on the page is mostly the ...
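For the extraction step itself, a rough sketch using pdfplumber (not a library named in the thread, just one common choice), pulling a PDF's tables into a pandas DataFrame before any Salesforce-specific shaping; the file path and the header-row assumption are illustrative:

```python
import pdfplumber
import pandas as pd

rows = []
# Hypothetical path to a PDF stored in a Unity Catalog volume.
with pdfplumber.open("/Volumes/main/raw/docs/invoice.pdf") as pdf:
    for page in pdf.pages:
        for table in page.extract_tables():
            rows.extend(table)

# Assumes the first extracted row is the header; real PDFs often need
# per-layout cleanup before this step.
header, data = rows[0], rows[1:]
df = pd.DataFrame(data, columns=header)
print(df.head())
```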

animesh_kumar27
by New Contributor
  • 407 Views
  • 2 replies
  • 1 kudos

not able to create a compute

Hello all, I have deleted the resource group 3 times and selected 3 different regions: East US, Central India and South India. And every time I try to create a single-node compute, it takes a very long time and then at last says resource out of st...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 1 kudos

Resource out of stock is becoming a common issue with Microsoft these days. It happens when they don't have enough VMs in a region. I would say try a different SKU for your cluster; changing the SKU usually resolves the issue.

1 More Replies
gerard_gv
by New Contributor
  • 574 Views
  • 1 replies
  • 1 kudos

Resolved! readStream with readChangeFeed option in SQL

I have spent some days trying to find the equivalent SQL for: spark.readStream.option("readChangeFeed", "true").table("table_name"). I suspect that it works like AUTO CDC FROM SNAPSHOT, since CDF adds the column "_commit_version", a ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @gerard_gv, there isn’t currently a direct SQL equivalent to the readChangeFeed option. This option is only supported in streaming through the Python and Scala DataFrame APIs. As a workaround, take a look at the table_changes SQL function....
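For context, a small sketch of both sides: the streaming read from the question and the batch table_changes workaround from the reply. It assumes change data feed is enabled on the table, and the table name and starting version are placeholders:

```python
# Streaming change-feed read (Python API only, per the reply).
stream_df = (
    spark.readStream
    .option("readChangeFeed", "true")
    .table("catalog.schema.table_name")   # placeholder table name
)

# Batch workaround reachable from SQL: the table_changes function,
# here reading changes from an illustrative starting version 1.
batch_df = spark.sql(
    "SELECT * FROM table_changes('catalog.schema.table_name', 1)"
)
```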

Phani1
by Databricks MVP
  • 2434 Views
  • 3 replies
  • 4 kudos

AI/BI Dashboards

Hi, can Databricks replace Power BI with the announcement of these new features like AI/BI Dashboards? Regards, Phani

Latest Reply
nayan_wylde
Esteemed Contributor
  • 4 kudos

Whether Databricks AI/BI can fully replace Power BI depends on your organization's use case, data ecosystem, user expertise, and priorities (e.g., cost, integration, or advanced BI needs). Power BI is a mature, standalone BI tool from Microsoft, exce...

2 More Replies
Prashant_151
by New Contributor II
  • 593 Views
  • 1 reply
  • 1 kudos

Resolved! EDW Migration Specialization labs error while installing remorph

Context: This is regarding the lab assessment of the EDW migration accreditation assessment. Using: 1. Databricks CLI version 0.266.0, 2. Python 3.10 venv. When installing with databricks labs install remorph@v0.9.1 I am getting an error: $ databricks labs install remorph@v0....

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @Prashant_151! I suspect this issue is happening because the project has been sunset and replaced by Lakebridge. As you tried installing Remorph without a version pin, it actually sets up Lakebridge components, so the expected Remorph prompts w...

yit
by Contributor III
  • 1013 Views
  • 7 replies
  • 2 kudos

Resolved! Autoloader: struct field inferred as string

We are currently implementing Autoloader for JSON files with nested struct fields. The goal is to detect the fields as structs, and to have schema evolution. The schema evolution mode is set to addNewColumns, and the inferColumnTypes option is set to true...

Data Engineering
autoloader
json
schema inference
Struct
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @yit, this is expected and documented behaviour of Auto Loader schema inference: Configure schema inference and evolution in Auto Loader | Databricks on AWS

6 More Replies
bhargavabasava
by New Contributor III
  • 643 Views
  • 3 replies
  • 3 kudos

Resolved! Accessing Cloud Storage Files on serverless

Hi team, we want to read files from GCS on serverless compute. How can we authenticate serverless compute to access GCS? We attached the service account (with all required privileges) to serverless compute in settings. FYI, Databricks is deployed on G...

Latest Reply
bhargavabasava
New Contributor III
  • 3 kudos

Thank you @szymon_dybczak. It's working

2 More Replies
HW413
by New Contributor II
  • 600 Views
  • 4 replies
  • 3 kudos

COPY INTO: not able to find checkpoint location

Hi All, I have been using COPY INTO for ingesting data from managed volumes, and my destination is a managed Delta table. I would like to know where it is storing the metadata information or a checkpoint location to maintain its idempotent feature...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @HW413, you won't find a checkpoint. COPY INTO does not use a checkpoint like Auto Loader or Spark Structured Streaming. The COPY INTO command retrieves metadata about all files in the specified source directory/prefix. So, every time you run copy int...
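To make the idempotency concrete, a small sketch run from Python via spark.sql; the target table and volume path are placeholders, and the target Delta table is assumed to already exist:

```python
# Placeholder table and source path; FILEFORMAT should match the files in the volume.
copy_stmt = """
    COPY INTO main.bronze.events
    FROM '/Volumes/main/raw/events/'
    FILEFORMAT = JSON
"""

# First run ingests the files under the prefix.
spark.sql(copy_stmt)

# Second run skips files already loaded -- COPY INTO tracks loaded files as
# part of the target table's metadata rather than in a separate checkpoint location.
spark.sql(copy_stmt)
```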

3 More Replies
