Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vroste
by New Contributor III
  • 12487 Views
  • 8 replies
  • 5 kudos

Resolved! Unsupported Azure Scheme: abfss

Using Databricks Runtime 12.0, when attempting to mount an Azure Blob Storage container, I'm getting the following exception: `IllegalArgumentException: Unsupported Azure Scheme: abfss` from `dbutils.fs.mount(source="abfss://container@my-storage-accoun...`

Latest Reply
AdamRink
New Contributor III
  • 5 kudos

What configs did you tweak? I'm having the same issue.
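For reference, a minimal sketch of the documented pattern for mounting an ADLS Gen2 container over abfss with a service principal. The thread doesn't confirm which configs resolved the DBR 12.0 error, so treat this as an assumption; every identifier below is a placeholder.

```python
# Hedged sketch: OAuth (service principal) configs for an abfss mount.
# All IDs, scopes, and names are placeholders -- substitute your own.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<scope>", key="<service-credential-key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)
```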

7 More Replies
NLearn
by New Contributor II
  • 847 Views
  • 1 reply
  • 0 kudos

How can I programmatically get my notebook default language?

I'm writing some code to perform regression testing, which requires the notebook path and its default language. Based on the default language, it will perform further analysis. So how can I programmatically get my notebook's default language and save it in some vari...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

You can get the default language of a notebook using dbutils.notebook.get_notebook_language(). Try this example:

%python
import dbutils
default_language = dbutils.notebook.get_notebook_language()
print(default_language)
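If that helper isn't available on your runtime, a hedged alternative is the Workspace API's get-status endpoint, which reports a language field for notebook objects. The host, token, and notebook path below are placeholders.

```python
# Hedged sketch: look up a notebook's default language via the Workspace API.
# Assumes a valid workspace URL and personal access token (placeholders here).
import requests

host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"
notebook_path = "/Users/someone@example.com/my_notebook"

resp = requests.get(
    f"{host}/api/2.0/workspace/get-status",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": notebook_path},
)
resp.raise_for_status()
default_language = resp.json().get("language")  # e.g. "PYTHON", "SQL", "SCALA"
print(default_language)
```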

wissamimad
by New Contributor
  • 6830 Views
  • 1 reply
  • 1 kudos

Writing to Delta tables/files is taking a long time

I have a dataframe that is the result of a series of transformations on big data (167 million rows), and I want to write it to Delta files and tables using the below:

try:
    (df_new.write.format('delta')
        .option("delta.minReaderVersion", "2")
        .optio...

Latest Reply
prasu1222
New Contributor II
  • 1 kudos

Hi @Retired_mod I am having the same issue: when I do an inner join on two Spark dataframes, it runs on only a single node, and I'm not sure how to modify it to run on many nodes. The same happens when I write 30 GB of data to a Delta table; it is almost 3...
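One common mitigation (an assumption, since the thread doesn't show the full job) is to repartition on the join key before joining and writing, so the shuffle spreads across the cluster. All names and the partition count below are placeholders.

```python
# Rough sketch: repartition both sides on the join key so the join and the
# Delta write parallelize across executors. 200 partitions is arbitrary --
# tune to your cluster and data size.
df_joined = (
    df_left.repartition(200, "join_key")
    .join(df_right.repartition(200, "join_key"), on="join_key", how="inner")
)

(
    df_joined.write.format("delta")
    .mode("overwrite")
    .save("/mnt/delta/my_table")  # hypothetical output path
)
```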

quakenbush
by Contributor
  • 1794 Views
  • 3 replies
  • 1 kudos

Resolved! Time out importing DBC

Importing or cloning the .dbc folder from "advanced-data-engineering-with-databricks" into my own workspace fails with a timeout, and the folder is incomplete. How can I fix this? I tried downloading and importing the file, and via URL...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Great news and congrats on your exam!!!

2 More Replies
JonW
by New Contributor
  • 2136 Views
  • 2 replies
  • 0 kudos

Pandas finds parquet file, Spark does not

I am having an issue with Databricks (Community Edition) where I can use Pandas to read a parquet file into a dataframe, but when I use Spark it states the file doesn't exist. I have tried reformatting the file path for spark but I can't seem to find...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Are you getting any error messages? What happens when you run "ls /dbfs/"? Are you able to list all the parquet files?
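A frequent cause of this symptom (an assumption about this particular thread) is the two path styles on DBFS: pandas reads through the local /dbfs FUSE mount, while Spark expects a dbfs:/ URI. The same file, addressed both ways; the path is a placeholder.

```python
# Same file, two path conventions on Databricks (placeholder path):
import pandas as pd

pdf = pd.read_parquet("/dbfs/FileStore/tables/my_file.parquet")     # local FUSE path for pandas
sdf = spark.read.parquet("dbfs:/FileStore/tables/my_file.parquet")  # dbfs:/ URI for Spark
```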

1 More Replies
karthik_p
by Esteemed Contributor
  • 12262 Views
  • 3 replies
  • 1 kudos

Does Delta Live Tables support identity columns?

We are able to test identity columns using SQL/Python, but when we try the same using DLT, we are not seeing values in the identity column. It is always empty for the column we created: "id BIGINT GENERATED ALWAYS AS IDENTITY".

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@Retired_mod thank you for the quick response. We are able to generate them for streaming and materialized views. The only confusion I'm seeing is, in terms of the limitations mentioned for DLT, identity columns are not supported with tables that are...
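For context, a hedged sketch of declaring an identity column on a DLT streaming table in Python by passing a DDL schema string; the table and source names are placeholders, and the DLT limitations mentioned above (e.g. tables that are APPLY CHANGES targets) still apply.

```python
# Hedged sketch: identity column on a DLT streaming table via a DDL schema
# string. "source_table" and the column names are placeholders.
import dlt

@dlt.table(
    schema="id BIGINT GENERATED ALWAYS AS IDENTITY, value STRING"
)
def my_streaming_table():
    return spark.readStream.table("source_table").select("value")
```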

2 More Replies
AzaharNadaf
by New Contributor III
  • 1194 Views
  • 2 replies
  • 0 kudos

Resolved! Using FOR XML RAW in Spark SQL

How can I convert the SQL Server query below to a Spark SQL query?

SELECT DISTINCT HashBytes('md5', (SELECT a, b, c FOR XML RAW)) AS xyzzy FROM table name

Need help here, community.

Latest Reply
AzaharNadaf
New Contributor III
  • 0 kudos

The idea here was to create a unique ID based on a number of columns. I used dense_rank to resolve this issue. Thanks!
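Another common translation (not what the thread ultimately used) is to hash a delimited concatenation of the columns in Spark SQL. Note the resulting hashes will not match SQL Server's, since FOR XML RAW serializes the row to XML before hashing; the delimiter choice is an assumption.

```python
# Hedged Spark SQL analogue of HashBytes over multiple columns.
# "table_name" stands in for the table from the question; '|' is arbitrary.
df = spark.sql("""
    SELECT DISTINCT md5(concat_ws('|', a, b, c)) AS xyzzy
    FROM table_name
""")
df.show()
```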

1 More Replies
rt-slowth
by Contributor
  • 2988 Views
  • 3 replies
  • 0 kudos

What is the difference between SQS in S3 and Auto Loader in Databricks?

I'm curious about the difference between using S3's SQS to set the queue URL in Spark's readStream option and Auto Loader reading from cloudFiles. I would also like some advice on which one is better to use in which situation (from a cost and performan...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 0 kudos

Using Spark readStream with SQS will be very similar to using CloudFiles with FileNotification mode (which also uses SQS on the backend). CloudFiles comes with some additional options compared to native Spark readStream: Common Auto Loader Options. One b...
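For comparison, a minimal Auto Loader sketch in file-notification mode, which provisions the SQS queue for you on AWS. Paths, formats, and table names below are placeholders.

```python
# Hedged sketch: Auto Loader with file notifications (SQS is managed for you).
# Every path and name here is a placeholder.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")
    .load("s3://my-bucket/landing/")
)

(
    df.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/stream")
    .trigger(availableNow=True)
    .toTable("bronze.events")
)
```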

2 More Replies
ElaPG
by New Contributor III
  • 3617 Views
  • 6 replies
  • 1 kudos

Command restrictions

Is there any way to restrict usage of specific commands (like mount/unmount or SQL GRANT) based on group assignment? I do not want everybody to be able to execute these commands.

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 1 kudos

If you are OK with users having only SQL syntax available (no mounts), you can provision a SQL warehouse for users instead of clusters.

5 More Replies
Kazer
by New Contributor III
  • 6684 Views
  • 2 replies
  • 1 kudos

Resolved! com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption.

Hi. I am trying to read from our Microsoft SQL Server from Azure Databricks via spark.read.jdbc() as described here: Query databases using JDBC - Azure Databricks | Microsoft Learn. The SQL Server is on an Azure VM in a virtual network peered with th...

Latest Reply
databricks26
New Contributor II
  • 1 kudos

Hi @Kazer, Even if I use a new table name, I get the same error. Do you have any suggestions? Thanks,
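Not confirmed as the fix in this thread, but a common workaround when the driver cannot validate the server's certificate is to request encryption while trusting the certificate (acceptable for testing, not hardened for production). Host, database, table, and credentials below are placeholders.

```python
# Hedged sketch: spark.read JDBC against SQL Server with encrypt +
# trustServerCertificate (a common workaround for self-signed certs).
jdbc_url = (
    "jdbc:sqlserver://<vm-host>:1433;"
    "database=<database>;"
    "encrypt=true;"
    "trustServerCertificate=true"
)

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.my_table")  # hypothetical table
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
```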

1 More Replies
dzmitry_tt
by New Contributor
  • 1346 Views
  • 0 replies
  • 0 kudos

" Token is expiring within 30 seconds." when running a job using Databricks SDK

While attempting (in an Azure DevOps environment) to run a job (and get the result of the run) using the Databricks SDK for Python, I got this error: databricks.sdk.core.DatabricksError: Token is expiring within 30 seconds. (token expired). The stac...
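One hedged way around short-lived tokens (assuming the job was authenticated with a pre-issued AAD token) is to let the SDK mint and refresh tokens itself from service principal credentials. The job ID and all credential values below are placeholders.

```python
# Hedged sketch: WorkspaceClient with Azure service principal credentials,
# so the SDK refreshes tokens instead of relying on a short-lived AAD token.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://adb-<workspace-id>.azuredatabricks.net",
    azure_client_id="<application-id>",
    azure_client_secret="<client-secret>",
    azure_tenant_id="<tenant-id>",
)

run = w.jobs.run_now(job_id=123).result()  # hypothetical job_id; waits for the run
print(run.state)
```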

DaveLeach
by New Contributor III
  • 2092 Views
  • 3 replies
  • 0 kudos

Resolved! Community Edition Sign Up Issues

Hi, I had to sign up for the Community Edition using my work email address so I can use it for training courses. I used my work email address as it required me to enter my company name and position. The signup was successful, but I have 2 quest...

Latest Reply
BR_DatabricksAI
Contributor
  • 0 kudos

Hello Dave, the Community Edition will be valid only for 14 days, and you cannot log in with the same credentials unless you convert it to the pay-as-you-go model. If you would like to use the Community Edition with your personal email ID, that a...

2 More Replies
Karene
by New Contributor
  • 1700 Views
  • 1 reply
  • 0 kudos

Migrating jsonb data from Postgresql database to Databricks

Hi Team, I am trying to create a pipeline to incrementally ingest data from an RDS PostgreSQL database which contains tables that have some columns of jsonb data type. I am currently using AWS DMS with CDC to first load the data into an S3 bucket as c...

Latest Reply
BR_DatabricksAI
Contributor
  • 0 kudos

Hello Karene, you can do the transformation from string to struct in the following manner; refer to the example below:

data = [('001', '{"name":"bhupendra","zipcode":"260100"}')]
schema = ['id', 'propertytype']
df = spark.createDataFrame(data, schema)
df.sho...
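To complete the idea (the reply's excerpt is cut off), a hedged sketch that parses the JSON string into a struct with from_json; the field names mirror the reply's sample row.

```python
# Hedged completion: parse the JSON string column into a struct.
from pyspark.sql.functions import from_json
from pyspark.sql.types import StructType, StructField, StringType

data = [('001', '{"name":"bhupendra","zipcode":"260100"}')]
df = spark.createDataFrame(data, ['id', 'propertytype'])

json_schema = StructType([
    StructField("name", StringType()),
    StructField("zipcode", StringType()),
])

parsed = df.withColumn("propertytype", from_json("propertytype", json_schema))
parsed.printSchema()
parsed.show(truncate=False)
```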

Faisal
by Contributor
  • 4018 Views
  • 4 replies
  • 1 kudos

Urgent: APPLY CHANGES INTO Delta Live Tables

Hi @Retired_mod, @Avnish_Jain How can I implement CDC in SQL DLT pipelines with a live table (not streaming)? I am trying to implement the below, where I am reading from external tables, loading data into the bronze layer, and then want to apply these change...

Latest Reply
Avnish_Jain
Databricks Employee
  • 1 kudos

Hi Faisal, APPLY CHANGES INTO does not support a materialized view as a source; it must be a streaming table. Ideally, your bronze tables are append-only, with the source providing data incrementally. If you do get revisions to previous records in y...
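For illustration, a hedged sketch of the same pattern through the Python API (the SQL APPLY CHANGES INTO form is analogous): the source must be a streaming table, and every table, key, and sequencing column below is a placeholder.

```python
# Hedged sketch: CDC into a DLT streaming table. Names are placeholders.
import dlt

dlt.create_streaming_table("silver_customers")

dlt.apply_changes(
    target="silver_customers",
    source="bronze_customers",   # must be a streaming source
    keys=["customer_id"],
    sequence_by="updated_at",
    stored_as_scd_type=1,
)
```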

3 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group