Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mutharasu
by New Contributor II
  • 7872 Views
  • 6 replies
  • 5 kudos

SAP BusinessObjects (BO) Integration with Databricks

Hi Team, we are analyzing how to connect SAP BusinessObjects to Databricks and build a report on top of the data in the data lakehouse. In our current architecture we have Delta tables on top of S3 storage. Please let us know any connectors/d...

Latest Reply
bharat4880
New Contributor II

Hi @HB83, may I ask which version of BO you are using? We have a similar requirement.

5 More Replies
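For threads like this one, the usual path is to point BO at a Databricks SQL warehouse through the Databricks JDBC/ODBC driver. A minimal sketch of the JDBC URL shape, assuming token authentication; the hostname, HTTP path, and token are hypothetical placeholders you would copy from the warehouse's Connection Details tab:

# All values below are hypothetical placeholders from a SQL warehouse's
# Connection Details tab; AuthMech=3 means personal-access-token auth.
jdbc_url = (
    "jdbc:databricks://dbc-xxxxxxxx-xxxx.cloud.databricks.com:443/default;"
    "transportMode=http;ssl=1;"
    "httpPath=/sql/1.0/warehouses/xxxxxxxxxxxxxxxx;"
    "AuthMech=3;UID=token;PWD=<personal-access-token>"
)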
Dave_Nithio
by Contributor II
  • 8804 Views
  • 4 replies
  • 2 kudos

Resolved! How to use Auto Loader with CSVs containing spaces in attribute names?

I am attempting to use Auto Loader to add a number of CSV files to a Delta table. The underlying CSV files have spaces in the attribute names, though (e.g. 'Account Number' instead of 'AccountNumber'). When I run my autoload, I get the following error ...

Latest Reply
Dave_Nithio
Contributor II

@Hubert Dudek thanks for your response! I was able to use what you proposed above to generate the schema. The issue is that the schema sets all attributes to STRING values and renames them numerically ('_c0', '_c1', etc.). Although this allows us to...

3 More Replies
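A minimal sketch of one workaround consistent with this thread: read the CSVs with Auto Loader keeping the original header, then strip the spaces from the column names before writing to Delta. The S3 paths and table name are hypothetical:

from pyspark.sql import functions as F

# Read the CSVs with Auto Loader, keeping the original header names
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("header", "true")
      .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/accounts")
      .load("s3://my-bucket/landing/accounts/"))

# Spaces are not valid in Delta column names by default, so strip them
renamed = df.select([F.col(f"`{c}`").alias(c.replace(" ", "")) for c in df.columns])

(renamed.writeStream
 .option("checkpointLocation", "s3://my-bucket/_checkpoints/accounts")
 .trigger(availableNow=True)
 .toTable("accounts"))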
suresh1122
by New Contributor III
  • 18780 Views
  • 12 replies
  • 7 kudos

DataFrame takes an unusually long time (around 2 hours) to save as a Delta table using SQL for a very small dataset with 30k rows. Is there a solution for this problem?

I am trying to save a DataFrame, after a series of data manipulations using UDFs, to a Delta table. I tried using this code: (df .write .format('delta') .mode('overwrite') .option('overwriteSchema', 'true') .saveAsTable('output_table')) but this...

Latest Reply
Lakshay
Databricks Employee

You should also look at the SQL plan to check whether the write phase is indeed the part that is taking time. Since Spark uses lazy evaluation, some other phase might be the real bottleneck.

11 More Replies
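A short sketch of how to act on that reply: because of lazy evaluation, forcing the UDF-heavy result to materialize before the write shows whether the transformations or the Delta write are the real bottleneck. Only output_table comes from the question; the rest is a generic pattern:

# If this count is the slow step, the UDF transformations, not the write,
# are the bottleneck.
df.cache()
print(df.count())

df.explain(True)  # inspect the physical plan for expensive stages

(df.write
 .format("delta")
 .mode("overwrite")
 .option("overwriteSchema", "true")
 .saveAsTable("output_table"))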
ImAbhishekTomar
by New Contributor III
  • 13398 Views
  • 7 replies
  • 4 kudos

kafkashaded.org.apache.kafka.common.errors.TimeoutException: topic-downstream-data-nonprod not present in metadata after 60000 ms.

I am facing an error when trying to write data to Kafka using Spark Structured Streaming. #Extract source_stream_df= (spark.readStream .format("cosmos.oltp.changeFeed") .option("spark.cosmos.container", PARM_CONTAINER_NAME) .option("spark.cosmos.read.inferSchema.en...

Latest Reply
devmehta
New Contributor III

Which Event Hubs namespace tier were you using? I had the same problem and resolved it by changing the pricing plan from Basic to Standard, since Kafka apps are not supported on the Basic plan. Let me know if you need anything else. Thanks.

6 More Replies
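For readers hitting the same timeout, a hedged sketch of writing a stream to the Event Hubs Kafka endpoint, which per the reply requires the Standard tier or above. The namespace, connection string, and checkpoint path are hypothetical:

# JAAS config for Event Hubs' Kafka endpoint; note the kafkashaded prefix
# of the relocated Kafka classes on Databricks.
EH_SASL = ('kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required '
           'username="$ConnectionString" password="<event-hubs-connection-string>";')

(source_stream_df
 .selectExpr("to_json(struct(*)) AS value")
 .writeStream
 .format("kafka")
 .option("kafka.bootstrap.servers", "my-namespace.servicebus.windows.net:9093")
 .option("kafka.security.protocol", "SASL_SSL")
 .option("kafka.sasl.mechanism", "PLAIN")
 .option("kafka.sasl.jaas.config", EH_SASL)
 .option("topic", "topic-downstream-data-nonprod")
 .option("checkpointLocation", "/tmp/checkpoints/kafka-sink")
 .start())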
Phani1
by Databricks MVP
  • 5401 Views
  • 6 replies
  • 1 kudos

Unity Catalog set up for multiple regions

Hello Team, I need some clarification on the diagram below. According to the documentation, Unity Catalog is set up per region. If we are using multiple clouds, the diagram shows only one Unity Catalog across regions. Shouldn't there be two...

Latest Reply
-werners-
Esteemed Contributor III

Yes, the reason they are grouped into a single rectangle is probably to show that they are both Unity Catalog enabled. It can indeed be confusing, represented like that. If you want to let them connect to each other, Delta Sharing or metastore fed...

5 More Replies
guangyi
by Contributor III
  • 2331 Views
  • 3 replies
  • 1 kudos

Resolved! Error when querying event log: Cannot read more than one event logs in the same query

I am trying to follow the instructions in Monitor Delta Live Tables pipelines to query the DLT expectation log. Here is the simplified code I copied from the Querying the event log section: CREATE TEMPORARY LIVE VIEW event_log_raw AS (SELECT * FROM event_log(T...

Latest Reply
guangyi
Contributor III

Hi @szymon_dybczak, thank you for the advice. After investigating, the problem has been solved. The error message mentioned in the title is not the key one; the error message below, “A pipeline with a different id is already registered in this Credential S...

2 More Replies
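A small sketch of one way to query a pipeline's event log from an ordinary notebook: the event_log table-valued function also accepts a pipeline id, so no LIVE view inside the pipeline is needed. The pipeline id is a placeholder:

# Query the Delta Live Tables event log by pipeline id from outside the pipeline.
events = spark.sql("""
    SELECT timestamp, event_type, message
    FROM event_log('<pipeline-id>')
    WHERE event_type = 'flow_progress'
""")
events.show(truncate=False)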
raghu2
by New Contributor III
  • 3619 Views
  • 3 replies
  • 0 kudos

DAB run

Hello All, I am running this command: databricks bundle run -t dev dltPpl_job --debug. Bundle name: dltPpl. The bundle was generated using: databricks bundle init --target dev. Error message: Error: exit status 1. Failed to marshal state to json: unsupported a...

Latest Reply
prar_shah
New Contributor III

@Retired_mod I was trying 'pip install --upgrade databricks' before, but after upgrading the version with 'brew upgrade databricks' it worked. Thanks for the help!

2 More Replies
thot
by New Contributor II
  • 1300 Views
  • 2 replies
  • 0 kudos

Spark config not working in job cluster

I am trying to rename a Delta table like this: spark.conf.set("spark.databricks.delta.alterTable.rename.enabledOnAWS", "true") spark.sql("ALTER TABLE db1.rz_test5 RENAME TO db1.rz_test6"). The data is on AWS S3, which is why I have to use the Spark config in ...

Latest Reply
MaximeGendre
New Contributor III

Hello, it would be interesting to test with the same runtime version. Does it work with a job running on 13.3?

1 More Replies
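Another hedged thing to try besides the runtime version: put the flag in the job cluster's Spark config, so it is applied at cluster startup rather than via spark.conf.set inside the notebook. A sketch of the relevant new_cluster fragment as a Python dict mirroring the Jobs API JSON; node type and sizes are hypothetical:

new_cluster = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",   # hypothetical
    "num_workers": 2,              # hypothetical
    "spark_conf": {
        # applied before any notebook code runs
        "spark.databricks.delta.alterTable.rename.enabledOnAWS": "true",
    },
}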
Karthig
by New Contributor III
  • 55280 Views
  • 15 replies
  • 8 kudos

Error Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient - while trying to create database

Hello All, I get the org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient while trying to create a database scr...

Latest Reply
prashasinghal
New Contributor III

Getting the same issue after installing the ojdbc driver for Oracle.

14 More Replies
cm04
by New Contributor III
  • 1283 Views
  • 2 replies
  • 3 kudos

Resolved! Why does my job run on shared compute instead of job compute?

I have configured a job using databricks.yml: resources: jobs: my_job: name: my_job tasks: - task_key: create_feature_tables job_cluster_key: my_job_cluster spark_python_task: python_file: ../src/c...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @cm04, you can try upgrading the CLI to the newest version. I've seen a similar issue before, and upgrading the CLI was the solution back then. See: Solved: Yml file replacing job cluster with all-purpose cl... - Databricks Community - 72248

1 More Replies
Graham
by New Contributor III
  • 6960 Views
  • 4 replies
  • 3 kudos

Resolved! Inline comment next to un-tickmarked SET statement = Syntax error

Running this code in Databricks SQL works great: SET USE_CACHED_RESULT = FALSE; -- Result: -- key value -- USE_CACHED_RESULT FALSE. If I add an inline comment, however, I get a syntax error: SET USE_CACHED_RESUL...

Latest Reply
rafal_walisko
New Contributor II

Hi, I'm getting the same error when trying to execute a statement through the API: "statement": "SET `USE_CACHED_RESULT` = FALSE; SELECT COUNT(*) FROM TABLE". Every combination fails: "status": { "state": "FAILED", "error": { "e...

3 More Replies
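One workaround that has helped with statements like this, offered as a hedged sketch: the SET command's remainder is parsed as raw key=value text, so a trailing inline comment can be swallowed into the value; keeping the comment on its own line sidesteps that:

# Comment on its own line instead of trailing the value.
spark.sql("""
-- disable result caching for this benchmark
SET use_cached_result = FALSE
""")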
shri0509
by New Contributor II
  • 3126 Views
  • 5 replies
  • 1 kudos

How to avoid iteration/loops in Databricks in the given scenario

Hi all, I need your input. I am new to Databricks and working with a dataset that consists of around 10,000 systems, each containing approximately 100 to 150 parts. These parts have attributes such as name, version, and serial number. The dataset size...

Latest Reply
AnnieWhite
New Contributor II

Thank you so much for the link.

4 More Replies
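A hedged sketch of the usual answer to this kind of question: express the per-system work as one grouped aggregation so Spark parallelizes it, instead of looping over each of the ~10,000 systems. The table and column names are hypothetical:

from pyspark.sql import functions as F

# One row per part: system_id, name, version, serial_number (hypothetical schema)
parts = spark.table("parts")

# A single pass over the data instead of ~10,000 per-system loops
per_system = (parts
    .groupBy("system_id")
    .agg(F.count("*").alias("part_count"),
         F.collect_set("version").alias("distinct_versions")))

per_system.show()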
Tico23
by Contributor
  • 18505 Views
  • 12 replies
  • 10 kudos

Connecting SQL Server (on-premise) to Databricks via jdbc:sqlserver

Is it possible to connect to an on-premises SQL Server (not Azure) from Databricks? I tried to ping my VirtualBox VM (with Windows Server 2022) from within Databricks and the request timed out: %sh ping 122.138.0.14. This is what my connection might look l...

Latest Reply
BharathKumarS
New Contributor II

I tried to connect to a localhost SQL Server through Databricks Community Edition, but it failed. I created an IP rule on port 1433 allowing inbound connections from all public networks, but it still didn't connect. I tried locally using Python and it work...

11 More Replies
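For completeness, a minimal JDBC read sketch, assuming the on-prem server is network-reachable from the cluster (VPN or peering plus the firewall rule on 1433). The IP comes from the question; the database, table, user, and secret names are hypothetical:

df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://122.138.0.14:1433;databaseName=mydb;"
                     "encrypt=true;trustServerCertificate=true")
      .option("dbtable", "dbo.my_table")
      .option("user", "sql_user")
      .option("password", dbutils.secrets.get("my-scope", "sqlserver-password"))
      .load())
df.show()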
guangyi
by Contributor III
  • 2714 Views
  • 4 replies
  • 2 kudos

Resolved! How to create a DLT pipeline with SQL statement

I need a DLT pipeline to create a materialized view for fetching event logs. All the approaches below failed: Attaching a notebook with pure SQL inside (no magic cells like `%sql`): failed. Attaching a notebook with `spark.sql` Python code: failed beca...

Latest Reply
guangyi
Contributor III

After just finishing my last reply, I realized what's wrong with my code: I should use the “file” property instead of “notebook” in the libraries section. It works now. Thank you guys, you are my rubber duck!

3 More Replies
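To make that resolution concrete, a sketch of the pipeline-settings fragment as a Python dict mirroring the JSON: a standalone .sql workspace file is registered under "file" rather than "notebook" in the libraries section. The path is hypothetical:

pipeline_settings_fragment = {
    "libraries": [
        # a plain .sql file goes under "file", not "notebook"
        {"file": {"path": "/Workspace/pipelines/event_log_view.sql"}},
    ],
}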
