Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

GowthamR
by New Contributor II
  • 396 Views
  • 2 replies
  • 2 kudos

Connecting SQL Server From Databricks

Hi Team, good day! I am trying to connect to SQL Server from Databricks using pyodbc, but I am not able to connect. I have tried many approaches, such as adding an init script in the cluster configuration, but it is still showing an error. I want to know ea...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III

@GowthamR Supplying the errors, if possible (please mask any credentials in the screenshots), will be really useful for helping us debug what's happening. All the best, BS

1 More Replies
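For anyone landing on this thread: a common alternative to pyodbc (which needs an ODBC driver installed via init script) is Spark's built-in JDBC reader, since the SQL Server JDBC driver ships with Databricks runtimes. A minimal sketch; the host, database, and table names below are placeholders:

```python
def sqlserver_jdbc_url(host, database, port=1433):
    """Build a JDBC URL for SQL Server (host/database here are illustrative)."""
    return (f"jdbc:sqlserver://{host}:{port};"
            f"database={database};encrypt=true;trustServerCertificate=false")

url = sqlserver_jdbc_url("myserver.database.windows.net", "mydb")

# In a notebook you would then read with the JDBC data source, e.g.:
# df = (spark.read.format("jdbc")
#       .option("url", url)
#       .option("dbtable", "dbo.my_table")
#       .option("user", user)
#       .option("password", password)
#       .load())
```

This avoids installing ODBC packages on the cluster entirely; pyodbc remains useful when you need non-Spark, row-at-a-time access.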
sandeepsuresh16
by New Contributor II
  • 1126 Views
  • 4 replies
  • 7 kudos

Resolved! Azure Databricks Job Run Failed with Error - Could not reach driver of cluster

Hello Community, I am facing an intermittent issue while running a Databricks job. The job fails with the following error message: "Run failed with error message: Could not reach driver of cluster <cluster-id>." Here are some additional details: Cluster Typ...

Latest Reply
sandeepsuresh16
New Contributor II

Hello Anudeep, thank you for your detailed response and the helpful recommendations. I would like to provide some additional context: for our jobs, we are running only one notebook at a time, not multiple notebooks or tasks concurrently. The issue occurs...

3 More Replies
Vinil
by New Contributor III
  • 766 Views
  • 7 replies
  • 1 kudos

Upgrading Drivers and Authentication Method for Snowflake Integration

Hello Databricks Support Team, I am reaching out to request assistance with upgrading the drivers and configuring authentication methods for our Snowflake–Databricks integration. We would like to explore and implement one of the recommended secure auth...

Latest Reply
Vinil
New Contributor III

@Khaja_Zaffer, we need assistance with upgrading the Snowflake drivers on the cluster. We installed the Snowflake package on the cluster; how do we upgrade the Snowflake library? For authentication, I will reach out to the Azure team. Thanks for the details.

6 More Replies
yit
by Contributor III
  • 818 Views
  • 3 replies
  • 6 kudos

Resolved! Schema hints: define column type as struct and incrementally add fields with schema evolution

Hey everyone, I want to set a column type as an empty struct via schema hints, without specifying subfields. I then expect the struct to be evolved with subfields through schema evolution when new subfields appear in the data. But I've found in the documen...

Data Engineering
autoloader
schema hints
Struct
Latest Reply
K_Anudeep
Databricks Employee

Hello @yit, you can’t. An “empty struct” is treated as a fixed struct with zero fields, so Auto Loader will not expand it later. The NOTE in the screenshot applies to JSON just as much as Parquet/Avro/CSV. If your goal is to “discover whatever shows up u...

2 More Replies
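Since an empty struct hint won't evolve, one workaround is to spell out the subfields you already know in `cloudFiles.schemaHints` and let schema evolution add the rest. A small sketch for rendering such a hint; the column and field names are illustrative:

```python
def struct_schema_hint(column, fields):
    """Render a cloudFiles.schemaHints entry for a struct column.
    `fields` maps known subfield names to Spark SQL type names."""
    inner = ", ".join(f"{name} {dtype}" for name, dtype in fields.items())
    return f"{column} struct<{inner}>"

hint = struct_schema_hint("payload", {"id": "bigint", "name": "string"})
# -> "payload struct<id bigint, name string>"
# Passed to the reader as: .option("cloudFiles.schemaHints", hint)
```

New subfields that later appear under `payload` can then still be picked up via `addNewColumns` evolution, while the hinted fields keep their declared types.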
ToBeDataDriven
by New Contributor II
  • 663 Views
  • 4 replies
  • 3 kudos

Resolved! Disable Logging in Python `dbutils.fs.put`?

This function prints "Wrote n bytes." to stdout every time it writes. I want to disable this logging, as I am writing thousands of files and it floods the log with meaningless information. Does anyone know if that is possible?

Latest Reply
K_Anudeep
Databricks Employee

@ToBeDataDriven, if the above answered your question, could you please accept it as the solution?

3 More Replies
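One generic option, assuming the "Wrote n bytes." message goes to Python's stdout: wrap the call with `contextlib.redirect_stdout` so the output is discarded. A sketch with a stand-in for `dbutils.fs.put` (the stand-in and paths are hypothetical):

```python
import io
from contextlib import redirect_stdout

def quiet_call(fn, *args, **kwargs):
    """Run fn while discarding anything it prints to stdout.
    Intended for chatty helpers like dbutils.fs.put (assumption: its
    'Wrote n bytes.' message is written to Python's stdout)."""
    with redirect_stdout(io.StringIO()):
        return fn(*args, **kwargs)

def noisy_put(path, contents):
    """Stand-in for dbutils.fs.put: prints a message, returns True."""
    print(f"Wrote {len(contents)} bytes.")
    return True

result = quiet_call(noisy_put, "/tmp/example.txt", "hello")  # prints nothing
```

If the message is emitted outside the Python process (e.g. by the JVM), redirecting Python's stdout won't catch it, so this is worth testing on your cluster first.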
ManojkMohan
by Honored Contributor II
  • 1084 Views
  • 1 replies
  • 2 kudos

Parsing from PDF to a Structured Table | Looking for best practices

Use Case: Converting unstructured data from a PDF to a structured format before sending it to Salesforce. Ask: Best practices to structure my table better before sending it to a system like Salesforce. Output in structured format looks like: My code: Extract Tables f...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III

@ManojkMohan My advice for parsing PDFs: 1. Will your project have PDFs that are all the same in terms of formatting? I.e. invoices of a particular type where things like addresses and values might change, but their position on the page is mostly the ...

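Whatever extractor you use, a useful pre-Salesforce step is normalizing the raw extracted rows into clean, consistently keyed records. A minimal sketch; the header and field names are made up for illustration:

```python
def normalize_rows(header, rows):
    """Turn raw extracted table rows into clean dicts: trim whitespace,
    drop empty rows, and snake_case the headers (all names illustrative)."""
    keys = [h.strip().lower().replace(" ", "_") for h in header]
    records = []
    for row in rows:
        cells = [(c or "").strip() for c in row]
        if not any(cells):
            continue  # PDF extractors often emit fully blank rows; skip them
        records.append(dict(zip(keys, cells)))
    return records

recs = normalize_rows(["Invoice No", "Amount"], [["  A-1 ", "100"], ["", ""]])
# -> [{"invoice_no": "A-1", "amount": "100"}]
```

From here you can add per-field type casting and validation before the rows are mapped to Salesforce object fields.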
animesh_kumar27
by New Contributor
  • 406 Views
  • 2 replies
  • 1 kudos

not able to create a compute

Hello all, I have deleted the resource group 3 times and selected 3 different regions: East US, Central India, and South India. Every time I try to create a single-node compute, it takes a very long time and then finally says resource out of st...

Latest Reply
nayan_wylde
Esteemed Contributor

"Resource out of stock" is becoming a common issue with Microsoft these days. It happens when they don't have enough VMs of that type in a region. I would suggest trying a different SKU for your cluster; changing the SKU usually resolves the issue.

1 More Replies
gerard_gv
by New Contributor
  • 571 Views
  • 1 replies
  • 1 kudos

Resolved! readStream with readChangeFeed option in SQL

I have spent some days trying to find the SQL equivalent of: spark.readStream.option("readChangeFeed", "true").table("table_name"). I suspect that it works like AUTO CDC FROM SNAPSHOT, since CDF adds the column "_commit_version", a ...

Latest Reply
Louis_Frolio
Databricks Employee

Greetings @gerard_gv , there isn’t currently a direct SQL equivalent to the readChangeFeed option. This option is only supported in streaming through the Python and Scala DataFrame APIs. As a workaround, take a look at the table_changes SQL function....

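The `table_changes` workaround mentioned in the reply covers batch reads from SQL. A sketch of building such a query; the table name and starting version are placeholders:

```python
def table_changes_query(table_name, starting_version):
    """Build a batch Change Data Feed query using the table_changes
    table-valued function. Note this is a batch read; streaming CDF reads
    remain Python/Scala DataFrame API only."""
    return f"SELECT * FROM table_changes('{table_name}', {starting_version})"

sql = table_changes_query("main.default.table_name", 0)
# In a notebook: df = spark.sql(sql)
```

For incremental batch processing you can track the last `_commit_version` you processed and pass it as the next starting version.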
Phani1
by Databricks MVP
  • 2421 Views
  • 3 replies
  • 4 kudos

AI/BI Dashboards

Hi, can Databricks replace Power BI, given the announcement of new features like AI/BI Dashboards? Regards, Phani

Latest Reply
nayan_wylde
Esteemed Contributor

Whether Databricks AI/BI can fully replace Power BI depends on your organization's use case, data ecosystem, user expertise, and priorities (e.g., cost, integration, or advanced BI needs). Power BI is a mature, standalone BI tool from Microsoft, exce...

2 More Replies
Prashant_151
by New Contributor II
  • 591 Views
  • 1 replies
  • 1 kudos

Resolved! EDW Migration Specialization labs error while installing remorph

Context: This is regarding the lab assessment of the EDW migration accreditation. Using: 1. Databricks CLI version 0.266.0; 2. Python 3.10 venv. When installing with `databricks labs install remorph@v0.9.1` I get an error: $ databricks labs install remorph@v0....

Latest Reply
Advika
Community Manager

Hello @Prashant_151! I suspect this issue is happening because the project has been sunset and replaced by Lakebridge. As you tried installing Remorph without a version pin, it actually sets up Lakebridge components, so the expected Remorph prompts w...

yit
by Contributor III
  • 1012 Views
  • 7 replies
  • 2 kudos

Resolved! Autoloader: struct field inferred as string

We are currently implementing Auto Loader for JSON files with nested struct fields. The goal is to detect the fields as structs and to have schema evolution. The schema evolution mode is set to addNewColumns, and the inferColumnTypes option is set to true...

Data Engineering
autoloader
json
schema inference
Struct
Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @yit, this is the expected and documented behaviour of Auto Loader schema inference: Configure schema inference and evolution in Auto Loader | Databricks on AWS

6 More Replies
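For reference, the option combination this thread revolves around can be gathered in one place. A sketch of the reader options as a plain dict; the schema location path is a placeholder:

```python
def autoloader_json_options(schema_location):
    """Auto Loader options for JSON with typed inference and evolution;
    schema_location is an illustrative path, not a real one."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.inferColumnTypes": "true",         # infer structs/numbers, not all strings
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }

opts = autoloader_json_options("/Volumes/main/default/_schemas/events")
# In a notebook:
# reader = spark.readStream.format("cloudFiles")
# for k, v in opts.items():
#     reader = reader.option(k, v)
# df = reader.load("/Volumes/main/default/raw/events")
```

Note that without `inferColumnTypes`, JSON inference defaults to strings, which is the symptom described in the question.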
bhargavabasava
by New Contributor III
  • 641 Views
  • 3 replies
  • 3 kudos

Resolved! Accessing Cloud Storage Files on serverless

Hi team, we want to read files from GCS on serverless compute. How can we authenticate serverless compute to access GCS? We attached the service account (with all required privileges) to serverless compute in settings. FYI, Databricks is deployed on G...

Latest Reply
bhargavabasava
New Contributor III

Thank you @szymon_dybczak, it's working now.

2 More Replies
HW413
by New Contributor II
  • 600 Views
  • 4 replies
  • 3 kudos

Copy into checkpoint location not able to find

Hi all, I have been using COPY INTO to ingest data from managed volumes, and my destination is a managed Delta table. I would like to know where it stores the metadata information or a checkpoint location to maintain its idempotent feature...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @HW413, you won't find a checkpoint. COPY INTO does not use checkpoints like Auto Loader or Spark Structured Streaming. The COPY INTO command retrieves metadata about all files in the specified source directory/prefix. So, every time you run copy int...

3 More Replies
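Since COPY INTO tracks already-loaded files against the target table rather than in a user-visible checkpoint, re-ingesting a file requires the `force` copy option. A sketch of building the statement; the table and path names are placeholders:

```python
def copy_into_sql(target, source_path, file_format="JSON", force=False):
    """Build a COPY INTO statement. Set force=True to re-load files the
    target table has already recorded as ingested (breaking idempotency
    on purpose, e.g. after a bad load)."""
    stmt = (f"COPY INTO {target}\n"
            f"FROM '{source_path}'\n"
            f"FILEFORMAT = {file_format}")
    if force:
        stmt += "\nCOPY_OPTIONS ('force' = 'true')"
    return stmt

print(copy_into_sql("main.default.events", "/Volumes/main/default/raw/", force=True))
```

Without `force`, rerunning the same statement over the same prefix is a no-op for files already loaded, which is the idempotency the question asks about.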
DatabricksEngi1
by Contributor
  • 1241 Views
  • 7 replies
  • 1 kudos

run a Databricks notebook on serverless environment version 4 with Asset Bundles

Hi everyone, I’m working with Databricks Asset Bundles and running jobs that use notebooks (.ipynb). According to the documentation, it should be possible to set an environment version for serverless jobs. I want to force all of my notebook tasks to ru...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @DatabricksEngi1, when you're defining a job in a DAB you're using the job mapping. One of the keys of the job mapping is called environments. This is the one you're looking for: Databricks Asset Bundles resources - Azure Databricks | Microsoft Learn

6 More Replies
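The `environments` mapping referenced in the reply sits under the job resource in `databricks.yml`. A rough sketch of the shape only; the job name, task key, version string, and dependency are illustrative, so check the current Jobs API schema before relying on it:

```yaml
resources:
  jobs:
    my_job:                          # illustrative job name
      environments:
        - environment_key: default
          spec:
            environment_version: "4" # serverless environment version (assumption)
            dependencies:
              - my-package==1.0.0    # illustrative dependency
      tasks:
        - task_key: run_notebook
          notebook_task:
            notebook_path: ./notebook.ipynb
          environment_key: default   # binds the task to the environment above
```

Each task opts into an environment via `environment_key`, so every notebook task that should run on version 4 needs that key set.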
aonurdemir
by Contributor
  • 750 Views
  • 2 replies
  • 2 kudos

Resolved! Creating an SCD Type 2 Table with Auto CDC API (One-Time Load + Ongoing Updates)

Hello everyone, I’m working with two CDC tables: table_x (23,467,761 rows and growing) and table_y (27,868,173,722 rows). My goal is to build an SCD Type 2 table (table_z) using the Auto CDC API. The workflow I’d like to achieve is: Initial Load: Populate table...

Latest Reply
aonurdemir
Contributor

I have solved it with the name parameter, like this:

dlt.create_streaming_table(name="table_z")
dlt.create_auto_cdc_flow(
    name="backfill",
    target="table_z",
    source="table_y",
    keys=["user_id"],
    sequence_by=col("source_ts_ms"),
    ignore_null_updates=False,
    apply_as_...

1 More Replies
