Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Fnazar
by New Contributor II
  • 1986 Views
  • 1 replies
  • 0 kudos

Streaming delta table - Performance with incremental refresh

Hi Team, we are hitting performance issues with streaming live Delta tables, specifically when evaluating large tables of more than 10 million rows. What are the workarounds for loading such large tables as streaming live tables? ...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 0 kudos

Hi @Fnazar, when dealing with streaming data, you might end up with many small files, which can be inefficient. Use Delta Lake's OPTIMIZE command to compact files into larger ones and ZORDER to colocate related information in the same set of files. T...
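
The OPTIMIZE and ZORDER advice in this reply could be sketched roughly as follows. This is a minimal illustration, not the reply author's code: the helper name `build_optimize_sql`, the table name and the column name are all hypothetical, and the `spark.sql(...)` call is left as a comment because it assumes a Databricks notebook where `spark` is already defined.

```python
from typing import List, Optional


def build_optimize_sql(table: str, zorder_cols: Optional[List[str]] = None) -> str:
    """Build a Delta Lake OPTIMIZE statement, optionally with a ZORDER BY clause."""
    sql = f"OPTIMIZE {table}"
    if zorder_cols:
        # ZORDER BY colocates rows with similar values in the same files.
        sql += " ZORDER BY (" + ", ".join(zorder_cols) + ")"
    return sql


# Hypothetical table and column names, for illustration only.
statement = build_optimize_sql("finance_db.revenue", ["customer_id"])
print(statement)

# In a Databricks notebook (assumption: `spark` is predefined there):
# spark.sql(statement)
```

Columns chosen for ZORDER are usually the ones that appear most often in query filters; whether compaction helps a given pipeline depends on how the table is written and maintained.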

TCorr15
by Databricks Partner
  • 8094 Views
  • 1 replies
  • 0 kudos

Databricks Connect V2 - OPENSSL_internal: CERTIFICATE_VERIFY_FAILED

I am getting an error with Databricks Connect V2 when running anything related to databricks-sql-connector / databricks.sql.connect(). Would anyone know how to resolve this issue? Sample Error Message. Additional Details: Python Version 3.11.4. Sample Code...

Latest Reply
arpit
Databricks Employee
  • 0 kudos

Can you use Databricks Connect directly and validate whether it works from the CLI? Also, please confirm the databricks-connect version.
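
One way to answer the version question is to ask the Python environment which package versions are installed. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical, and the two distribution names are taken from the thread (databricks-connect and databricks-sql-connector).

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of the two packages mentioned in this thread.
for name in ("databricks-connect", "databricks-sql-connector"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this in the same interpreter that raises the certificate error shows which of the two packages is actually in use there.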

nitinsingh1
by Databricks Partner
  • 6076 Views
  • 5 replies
  • 2 kudos

Databricks Runtime compatibility error with latest version while reading from (ADLS) Dynamics 365

We are trying to set up ingestion from Dynamics 365 >> ADLS >> Databricks. To read the raw data from ADLS into Databricks we currently have to use Databricks Runtime 6.4; the latest Databricks Runtime can't be used. Need your help to...

Latest Reply
BobBubble2000
New Contributor II
  • 2 kudos

Hi @nitinsingh1, thank you for bringing up this topic. I'm also currently looking into how to ingest exported Dynamics 365 FO data (CSV files with CDM) from ADLS into Databricks. Could you share how you achieved this? I'd be very curious to see your a...

4 More Replies
Manjusha
by New Contributor II
  • 3297 Views
  • 3 replies
  • 0 kudos

Failed to create notebook on Community Edition

Hi, I am unable to create a new notebook on Databricks Community Edition. I get the error 'failed to create notebook' when I click Create -> Notebook. Is anyone else facing the same issue? If so, any tips on how to resolve it?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Thank you for the update. Please select the best response as the solution, so other community members can get unblocked if they hit this issue.

2 More Replies
Shivanshu_
by Contributor
  • 5704 Views
  • 4 replies
  • 3 kudos

Parallelizing function calls in Databricks

I have a use case where I have to process streaming data and create categorical tables (about 500 tables). I'm using concurrent thread pools to parallelize the whole process, but in the Spark UI I can see that my code doesn't utilize all the workers(...
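
The thread-pool approach described in this question could look roughly like the sketch below. It is not the poster's code: `run_in_parallel`, `write_fn` and the category names are hypothetical, and the stub function stands in for whatever Spark write the real job performs per table.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_in_parallel(
    categories: Iterable[str],
    write_fn: Callable[[str], None],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Call write_fn once per category from a thread pool and collect the outcomes."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(write_fn, c): c for c in categories}
        for future in as_completed(futures):
            category = futures[future]
            try:
                future.result()  # re-raises any exception from the worker thread
                results[category] = "ok"
            except Exception as exc:
                results[category] = f"failed: {exc}"
    return results


def fake_write(category: str) -> None:
    # Stand-in for the real per-table write (hypothetical).
    if category == "bad":
        raise ValueError("boom")


outcome = run_in_parallel(["electronics", "grocery", "bad"], fake_write, max_workers=4)
print(outcome)
```

Threads on the driver only submit Spark jobs concurrently; how many workers each job keeps busy still depends on how the underlying data is partitioned, which may be part of what the Spark UI is showing here.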

Data Engineering
parallelism
threading
threadpool executor
Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

You can use DLT and read from a many-to-one table.

3 More Replies
Fnazar
by New Contributor II
  • 2753 Views
  • 3 replies
  • 0 kudos

Streaming live table

I am trying to create a streaming live table using the syntax below: CREATE OR REFRESH STREAMING LIVE TABLE revenue_stream AS (SELECT * FROM STREAM(finance_silver.finance_db.revenue)) When I try to execute this notebook via a DLT pipeline, I a...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

You can use materialized views in serverless only

2 More Replies
FurqanAmin
by New Contributor II
  • 1799 Views
  • 1 replies
  • 0 kudos

Spark Logs inaccessible - from the UI and dbfs (GCS)

We have a lot of jobs with spark-submit tasks. Previously we were able to see the logs for these jobs, but now we cannot see them in the DBX UI. We created a test job, 'test_job_2', in our workspace to test this. When the job finishes ...

Latest Reply
Yeshwanth
Databricks Employee
  • 0 kudos

@FurqanAmin Could you please attach a screenshot of this?

OvZ
by New Contributor III
  • 34375 Views
  • 13 replies
  • 1 kudos

Resolved! Is it possible to disable JDBC/ODBC connections to an (Azure) Databricks cluster?

Hi, I want to know whether it is possible to disable JDBC/ODBC connections to an (Azure) Databricks cluster, so that no (downloaded) tools can connect this way. Thanks in advance, Oscar

Latest Reply
wdphilli
Databricks Partner
  • 1 kudos

Hi @OvZ & @LandanG, as a point of clarification, the script provided is intended to be run in a notebook first. Running the code below in a notebook creates the init script at the location "dbfs:/databricks/init_scripts/disable_jdbc_odbc.conf" %...

12 More Replies
manohar3
by New Contributor III
  • 5389 Views
  • 2 replies
  • 0 kudos

Resolved! Spark Databricks JDBC driver integration returns rows with column names as values

Hi all, I am using the code below to query a table, but the query returns rows that have the column names as values: spark.read .format("jdbc") .option("url", "jdbc:databricks://acme.cloud.databricks.com:443/myschema;transportMode=http;ssl=1;httpPath=<httppath>;Au...

Latest Reply
manohar3
New Contributor III
  • 0 kudos

This seems to be an issue with Spark. I was able to fix it by following these posts:
https://stackoverflow.com/questions/47020379/bigquery-simba-jdbc-error-with-spark
https://stackoverflow.com/questions/68013347/how-to-register-a-jdbc-spark-dialect-in-python...
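
A likely explanation for the symptom in this thread is identifier quoting: Spark's generic JDBC dialect wraps column names in double quotes, and Databricks SQL by default reads double-quoted text as a string literal, so each row comes back holding the column names. The plain-Python sketch below only illustrates the difference in the generated SELECT; it is a simplification, not Spark's actual query builder, and `quote_identifier`, `build_select` and the table name are hypothetical.

```python
from typing import Iterable


def quote_identifier(name: str, quote: str) -> str:
    """Wrap a column name in the given quote character, doubling embedded quotes."""
    return f"{quote}{name.replace(quote, quote * 2)}{quote}"


def build_select(table: str, columns: Iterable[str], quote: str) -> str:
    """Build a simplified SELECT like the one a JDBC reader might push down."""
    cols = ", ".join(quote_identifier(c, quote) for c in columns)
    return f"SELECT {cols} FROM {table}"


# Double quotes: read as string literals by default, so rows contain 'id', 'first_name'.
default_sql = build_select("myschema.t", ["id", "first_name"], '"')
# Backticks: read as identifiers, so rows contain the actual column values.
backtick_sql = build_select("myschema.t", ["id", "first_name"], "`")

print(default_sql)
print(backtick_sql)
```

Registering a custom JDBC dialect that quotes with backticks, as the linked posts describe, changes the first form into the second.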

1 More Replies
Yoni
by New Contributor
  • 17980 Views
  • 5 replies
  • 3 kudos

Resolved! MLFlow failed: You haven't configured the CLI yet

I'm getting an error: You haven’t configured the CLI yet! Please configure by entering `/databricks/python_shell/scripts/db_ipykernel_launcher.py configure` My cluster is running Databricks Runtime Version 10.1. I've also installed mlflow to the cluster l...

Latest Reply
HemantKumar
New Contributor II
  • 3 kudos

dbutils.library.restartPython() Add that after you run the pip install mlflow; it worked for me on a non-ML cluster.

4 More Replies
DumbBeaver
by New Contributor II
  • 3347 Views
  • 2 replies
  • 1 kudos

Resolved! ERROR: Writing to Unity Catalog from Remote Spark using JDBC

This is my code: df = spark.createDataFrame([[1,1,2]], schema=['id','first_name','last_name']) (df.write.format("jdbc") .option("url", <jdbc-url>) .option("dbtable","hive_metastore.default.test") .option("driver", "com.databricks.clien...

Latest Reply
feiyun0112
Honored Contributor
  • 1 kudos

 %scala import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects} JdbcDialects.registerDialect(new JdbcDialect() { override def canHandle(url: String): Boolean = url.toLowerCase.startsWith("jdbc:databricks:") override def quoteIde...

1 More Replies
Azure_Data_Bric
by New Contributor III
  • 5228 Views
  • 6 replies
  • 0 kudos

Historical Data Clean-up from Silver tables

Hi everyone, I need your help/suggestions. We are using a DLT framework for our ELT process. Data is received from the source into the RAW layer in Parquet format. This raw data is loaded into the Bronze layer, which acts like a history table. From the BRONZ...

Latest Reply
Azure_Data_Bric
New Contributor III
  • 0 kudos

Hi, I see that OPTIMIZE and VACUUM run automatically on all tables once per day. On the day we performed the historical deletion, we deleted the data first and then ran VACUUM with zero-hour retention. After some time Optimize and VACUUM (wi...
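
The zero-hour VACUUM mentioned here could be expressed along these lines. This is a sketch under assumptions: `build_vacuum_sql` and the table name `silver.orders` are hypothetical, and the `spark` calls are left as comments because they only make sense in a Databricks notebook. Delta Lake normally rejects a retention below its default threshold unless the `spark.databricks.delta.retentionDurationCheck.enabled` setting is turned off.

```python
def build_vacuum_sql(table: str, retain_hours: int, dry_run: bool = False) -> str:
    """Build a Delta Lake VACUUM statement with an explicit retention window."""
    if retain_hours < 0:
        raise ValueError("retain_hours must be zero or positive")
    sql = f"VACUUM {table} RETAIN {retain_hours} HOURS"
    if dry_run:
        # DRY RUN lists the files that would be deleted without removing them.
        sql += " DRY RUN"
    return sql


# Hypothetical table name, for illustration only.
preview = build_vacuum_sql("silver.orders", 0, dry_run=True)
statement = build_vacuum_sql("silver.orders", 0)
print(preview)
print(statement)

# In a Databricks notebook (assumption: `spark` is predefined there):
# spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
# spark.sql(statement)
```

A zero-hour retention removes the files that time travel and in-flight readers rely on, so previewing with DRY RUN first is a cautious default.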

5 More Replies
CloudPlatformer
by New Contributor II
  • 4283 Views
  • 1 replies
  • 0 kudos

Npip Tunnel Setup Failure

Hi everyone, I'm currently running into an issue when trying to create any type of compute cluster in a workspace (premium, with VNet Injection and private DNS zone + private Endpoint). The operation always fails with: Compute terminated. Reason: Npip...

Latest Reply
CloudPlatformer
New Contributor II
  • 0 kudos

I forgot to add: the workspace as well as the other resources are hosted in Azure.

Etyr
by Contributor II
  • 3193 Views
  • 2 replies
  • 0 kudos

Cannot connect to Databricks on Azure Machine Learning Compute Cluster.

Hello, I'm having an issue where I have:
  • A local machine in WSL 1: Python 3.8 and 3.10, OpenJDK 19.0.1 (version "build 19.0.1+10-21")
  • Compute Instance in Azure Machine Learning: Python 3.8, OpenJDK 8 (version "1.8.0_392")
  • Compute Cluster in Azure Machine Lear...

Latest Reply
Etyr
Contributor II
  • 0 kudos

Additional information I forgot to include: the Compute Instance has a User Managed Identity in Azure, and a Service Principal is created in Databricks with its Application ID. The same goes for the compute cluster: it has its own User Managed Identity that is a...

1 More Replies
learning_1989
by New Contributor II
  • 3270 Views
  • 2 replies
  • 1 kudos

How do you read a nested JSON file with multiple key-value pairs in Databricks?

You have a JSON file that is nested, with multiple key-value pairs. How do you read it in Databricks?

Latest Reply
Lakshay
Databricks Employee
  • 1 kudos

You should be able to read the JSON file with the code below: val df = spark.read.format("json").load("file.json") After this, you will need to use the explode function to add columns to the DataFrame from the nested values.
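
To show what the explode step does conceptually, here is a plain-Python sketch that turns one nested record into one row per array element. It is not Spark code: `explode_rows` and the sample document are hypothetical, and it flattens struct fields in the same pass, which in Spark would be a separate select after `pyspark.sql.functions.explode`.

```python
import json
from typing import Any, Dict, List


def explode_rows(records: List[Dict[str, Any]], array_key: str) -> List[Dict[str, Any]]:
    """Emit one output row per element of records[i][array_key], copying the other fields."""
    rows: List[Dict[str, Any]] = []
    for record in records:
        base = {k: v for k, v in record.items() if k != array_key}
        # Like explode, records with a missing or empty array produce no rows.
        for item in record.get(array_key) or []:
            row = dict(base)
            if isinstance(item, dict):
                # Flatten nested objects into "array_key.field" columns.
                row.update({f"{array_key}.{k}": v for k, v in item.items()})
            else:
                row[array_key] = item
            rows.append(row)
    return rows


# Hypothetical nested document, for illustration only.
doc = json.loads('{"id": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]}')
rows = explode_rows([doc], "items")
for row in rows:
    print(row)
```

The same shape of result is what exploding the array column and then selecting its fields would give on a Spark DataFrame.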

1 More Replies