Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

dhainik
by New Contributor II
  • 1894 Views
  • 3 replies
  • 1 kudos

ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3

I have a Databricks job that runs daily at 14:00 IST and typically finishes in about 2 hours. However, yesterday, the job got stuck and continued running indefinitely. After exceeding 5 hours, I canceled it and reran the job, which then completed suc...

Latest Reply
Witold
Contributor III
  • 1 kudos

When you navigate to the corresponding cluster, you'll see "Event log", "Spark UI", and "Driver logs". There you should find all the information you need.

2 More Replies
prashasinghal
by New Contributor III
  • 345 Views
  • 1 reply
  • 0 kudos

Resolved! Compute cluster not working after installing ojdbc

Hi, I have a Databricks 12.2 LTS compute cluster which was working as expected. I need to establish a connection to Oracle, so the ojdbc11 driver JAR is installed. Once it is, the cluster does not execute any cell (even a print statement) and gets stuck in a waiting state. In the driver logs it shows: '...

Latest Reply
prashasinghal
New Contributor III
  • 0 kudos

The issue is resolved. Installing drivers directly collides with Databricks' internal runtime libraries. We used an init script to copy the JAR from the workspace to /databricks/jars/.
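A minimal sketch of that init-script approach, with hypothetical paths (the JAR location, script location, and cluster configuration would all need to match your setup):

```python
# Run once from a notebook to create the init script; all paths are illustrative.
init_script_path = "dbfs:/databricks/init-scripts/copy_ojdbc.sh"

script = """#!/bin/bash
# Copy the Oracle JDBC driver into the runtime's jar directory at cluster start,
# instead of installing it as a cluster library (which can conflict with the
# libraries bundled in the Databricks runtime).
cp /Workspace/Shared/jars/ojdbc11.jar /databricks/jars/
"""

dbutils.fs.put(init_script_path, script, True)
# Then reference the script under the cluster's Advanced options > Init scripts.
```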

fpmsi
by New Contributor
  • 248 Views
  • 1 reply
  • 0 kudos

Best Approach to Store Data in Azure Gov Cloud Workspace without Unity Catalog

Our team is using a workspace in Azure Gov Cloud. We would like to download files from an external source into our workspace. Since Unity Catalog is not enabled in Azure Gov Cloud, we're looking for the best approach for securely storing data in our wor...

Latest Reply
Marlene495
New Contributor II
  • 0 kudos

Hello! For securely storing sensitive data in your Azure Gov Cloud workspace without Unity Catalog, use Azure Blob Storage with encryption, Azure Data Lake Storage with AAD access control, or Azure Key Vault for secrets. Managed identities can also he...
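As a rough illustration of the ADLS-with-AAD pattern mentioned above (storage account, secret scope, and key names are hypothetical, and Azure Gov endpoint suffixes differ from the commercial-cloud ones shown here):

```python
# Hypothetical names throughout; in Azure Gov the storage suffix and the AAD
# login endpoint differ from the commercial-cloud values used below.
storage = "mygovstorage"
client_secret = dbutils.secrets.get(scope="kv-backed-scope", key="sp-client-secret")

spark.conf.set(f"fs.azure.account.auth.type.{storage}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage}.dfs.core.windows.net", "<application-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Read files that were downloaded into the (hypothetical) 'raw' container
df = spark.read.parquet(f"abfss://raw@{storage}.dfs.core.windows.net/external-files/")
```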

TWib
by New Contributor III
  • 4610 Views
  • 6 replies
  • 3 kudos

DatabricksSession broken for 15.1

This code fails with the exception [NOT_COLUMN_OR_STR]: Argument `col` should be a Column or str, got Column. File <command-4420517954891674>, line 7: 4 spark = DatabricksSession.builder.getOrCreate() 6 df = spark.read.table("samples.nyctaxi.trips") ---->...
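For context, a minimal sketch of the pattern shown in the truncated snippet (the failing call on line 7 is cut off in the preview, so the last line here is only an assumed continuation):

```python
from databricks.connect import DatabricksSession
from pyspark.sql import functions as F

spark = DatabricksSession.builder.getOrCreate()
df = spark.read.table("samples.nyctaxi.trips")

# Assumed continuation of the truncated snippet; the post reports a call like
# this raising [NOT_COLUMN_OR_STR] even though a Column is being passed.
df.select(F.col("trip_distance")).show()
```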

Latest Reply
977073
New Contributor II
  • 3 kudos

I can see this issue in 13.3 LTS. Production code is still running on 11.3 LTS, but upgrading to a higher LTS DBR version gives this error. I believe you should fix it or provide a migration guide from one DBR version to the other.

5 More Replies
ruoyuqian
by New Contributor II
  • 979 Views
  • 3 replies
  • 2 kudos

How to print out logs during DLT pipeline run

I'm trying to debug my pipeline in DLT, and during runtime I need some log info. How do I do a print('something') during a DLT run?

Latest Reply
filipniziol
Contributor III
  • 2 kudos

Hi @ruoyuqian, @kranthi2. Why print() statements won't work in DLT: in Databricks Delta Live Tables (DLT) you do not see print() output; what is visible are the pipeline events. Alternative solution: use Log4j to log to the driver log. To log information ...
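A rough sketch of the Log4j approach described in that reply (the logger and table names are hypothetical, and `spark._jvm` is an internal handle, so treat this as a debugging aid rather than a stable API):

```python
import dlt
from pyspark.sql import functions as F

# Messages written through the JVM Log4j logger land in the cluster's driver log,
# not in the DLT event log shown in the pipeline UI.
log4j = spark._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("my_dlt_pipeline")

@dlt.table
def trips_clean():
    logger.info("Building trips_clean")
    return spark.read.table("samples.nyctaxi.trips").where(F.col("trip_distance") > 0)
```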

2 More Replies
Mutharasu
by New Contributor II
  • 4527 Views
  • 6 replies
  • 5 kudos

SAP Business Object(BO) Integration with Databricks

Hi Team, we are doing an analysis of SAP Business Objects connecting to Databricks to build a report on top of the data in the data lakehouse. In our current architecture we have Delta tables on top of S3 storage. Please let us know of any connectors/d...

Latest Reply
bharat4880
New Contributor II
  • 5 kudos

Hi @HB83, can I ask which version of BO you are using? We have a similar requirement.

5 More Replies
Dave_Nithio
by Contributor
  • 5499 Views
  • 4 replies
  • 2 kudos

Resolved! How to use autoloader with csv containing spaces in attribute names?

I am attempting to use autoloader to add a number of csv files to a delta table. The underlying csv files have spaces in the attribute names though (i.e. 'Account Number' instead of 'AccountNumber'). When I run my autoload, I get the following error ...

Latest Reply
Dave_Nithio
Contributor
  • 2 kudos

@Hubert Dudek thanks for your response! I was able to use what you proposed above to generate the schema. The issue is that the schema sets all attributes to STRING values and renames them numerically ('_c0', '_c1', etc.). Although this allows us to...
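One way to build on that, sketched with hypothetical paths and table names: enable the CSV header so the real column names are inferred, then strip the spaces before writing to Delta.

```python
# Paths, table name, and trigger are illustrative.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("header", "true")
      .option("cloudFiles.schemaLocation", "/tmp/autoloader/schema")
      .load("/mnt/raw/accounts"))

# 'Account Number' -> 'AccountNumber', etc., so the names are valid Delta identifiers
df = df.toDF(*[c.replace(" ", "") for c in df.columns])

(df.writeStream
   .option("checkpointLocation", "/tmp/autoloader/checkpoint")
   .trigger(availableNow=True)
   .toTable("accounts_bronze"))
```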

3 More Replies
suresh1122
by New Contributor III
  • 13157 Views
  • 12 replies
  • 7 kudos

DataFrame takes an unusually long time to save as a Delta table using SQL for a very small dataset with 30k rows. It takes around 2 hrs. Is there a solution for this problem?

I am trying to save a dataframe to a Delta table after a series of data manipulations using UDF functions. I tried using this code: (df.write.format('delta').mode('overwrite').option('overwriteSchema', 'true').saveAsTable('output_table')), but this...

Latest Reply
Lakshay
Databricks Employee
  • 7 kudos

You should also look at the SQL plan to check whether the writing phase is indeed the part that is taking the time. Since Spark uses lazy evaluation, some other phase might be the one taking the time.
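For example, a minimal sketch (the `df` and table name come from the original post; everything else is illustrative):

```python
# Inspect the plan the write will execute
df.explain(mode="formatted")

# Materialize the UDF-heavy transformations separately from the write;
# if this count is the slow step, the bottleneck is upstream of the save.
df = df.cache()
df.count()

(df.write
   .format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .saveAsTable("output_table"))
```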

11 More Replies
Nis
by New Contributor II
  • 3689 Views
  • 8 replies
  • 4 kudos

Resolved! Can we commit offsets in Spark Structured Streaming in Databricks?

We are storing offset details in the checkpoint location and wanted to know whether there is a way to commit the offset once we consume a message from Kafka.

Latest Reply
dmytro
New Contributor III
  • 4 kudos

Hi @raphaelblg, thanks a lot for providing an elaborate answer. Do you happen to know, by any chance, of solutions that developers use to track consumer lag when streaming with Spark from a Kafka topic? It's rather essential knowledge to hav...

7 More Replies
YannLevavasseur
by New Contributor
  • 2133 Views
  • 1 reply
  • 0 kudos

SQL function refactoring into Databricks environment

Hello all, I'm currently working on importing some SQL functions from an Informix database into Databricks, using an Asset Bundle to deploy a Delta Live Table to Unity Catalog. I'm struggling to import a recursive one; here is the code: CREATE FUNCTION "info...

-werners-
by Esteemed Contributor III
  • 7431 Views
  • 2 replies
  • 0 kudos

Performance issues using shared compute access mode in Scala

I created a cluster on our dev environment using the shared access mode, for our devs to use (instead of separate single-user clusters). What I notice is that the performance of this cluster is terrible. And I mean really terrible: notebook cells wit...

Latest Reply
prakharcode
New Contributor II
  • 0 kudos

I can confirm this behaviour. Running the same job on a shared cluster in "USER_ISOLATION" mode, with no changes to the job definition or source data, the performance drop is significant. So much so that there needs to be a radical change in ho...

1 More Replies
ImAbhishekTomar
by New Contributor III
  • 9070 Views
  • 7 replies
  • 4 kudos

kafkashaded.org.apache.kafka.common.errors.TimeoutException: topic-downstream-data-nonprod not present in metadata after 60000 ms.

I am facing an error when trying to write data to Kafka using a Spark stream. #Extract source_stream_df = (spark.readStream.format("cosmos.oltp.changeFeed").option("spark.cosmos.container", PARM_CONTAINER_NAME).option("spark.cosmos.read.inferSchema.en...

Latest Reply
devmehta
New Contributor III
  • 4 kudos

Which Event Hubs namespace were you using? I had the same problem and resolved it by changing the pricing plan from Basic to Standard, as Kafka apps are not supported on the Basic plan. Let me know if you need anything else. Thanks.
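For reference, a sketch of the Kafka-endpoint settings for an Event Hubs namespace on the Standard tier (namespace, secret scope, and checkpoint path are hypothetical; the topic name and `source_stream_df` come from the original post):

```python
# The connection string is the Event Hubs namespace's shared-access connection
# string, stored here in a hypothetical secret scope.
connection_string = dbutils.secrets.get(scope="eventhub", key="connection-string")
eh_sasl = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
    f'username="$ConnectionString" password="{connection_string}";'
)

(source_stream_df
 .selectExpr("to_json(struct(*)) AS value")
 .writeStream
 .format("kafka")
 .option("kafka.bootstrap.servers", "my-namespace.servicebus.windows.net:9093")
 .option("kafka.security.protocol", "SASL_SSL")
 .option("kafka.sasl.mechanism", "PLAIN")
 .option("kafka.sasl.jaas.config", eh_sasl)
 .option("topic", "topic-downstream-data-nonprod")
 .option("checkpointLocation", "/tmp/checkpoints/downstream")
 .start())
```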

6 More Replies
Phani1
by Valued Contributor II
  • 1272 Views
  • 6 replies
  • 1 kudos

Unity Catalog is set up for multi region

Hello Team, I need some clarification on the diagram below. According to the documentation, Unity Catalog is set up per region. If we are using multiple clouds, the diagram shows only one Unity Catalog across regions. Shouldn't there be two...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Yes, the reason they are grouped into a single rectangle is probably to show that they are both Unity Catalog enabled. It can indeed be confusing represented like that. If you want them to connect to each other, Delta Sharing or metastore fed...

5 More Replies
TimB
by New Contributor III
  • 8432 Views
  • 6 replies
  • 0 kudos

Foreign catalog - Connections using insecure transport are prohibited --require_secure_transport=ON

I have added a connection to a MySQL database in Azure, and I have created a foreign catalog in Databricks. But when I go to query the database I get the following error: Connections using insecure transport are prohibited while --require_secure_trans...

Latest Reply
TimB
New Contributor III
  • 0 kudos

I was unable to use serverless compute with Azure MySQL. I did have a meeting with tech support at Databricks, and the conclusion at the time was: if your Azure MySQL is not publicly accessible, you cannot connect to it from serverless compute. I'm ...

5 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group