Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Simon_T
by New Contributor III
  • 2241 Views
  • 1 reply
  • 0 kudos

CURL API - Error while parsing token: io.jsonwebtoken.ExpiredJwtException: JWT expired

I am running this command:

    curl --request GET -H "Authorization: Bearer <databricks token>" "https://adb-1817728758721967.7.azuredatabricks.net/api/2.0/clusters/list"

and I am getting this error:

    2024-01-17T13:21:41.4245092Z </head>
    2024-01-17T13:21:41.4...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, could you please renew the token and confirm?
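
For reference, a minimal sketch of retrying the same call with a freshly generated personal access token, using Python's requests (the workspace URL is the one from the post; the token environment variable is an assumption):

    import os
    import requests

    host = "https://adb-1817728758721967.7.azuredatabricks.net"
    token = os.environ["DATABRICKS_TOKEN"]  # a newly generated PAT, not the expired one

    resp = requests.get(
        f"{host}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()  # an expired token surfaces here as an HTTP error
    print(resp.json())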

jborn
by New Contributor III
  • 6785 Views
  • 7 replies
  • 1 kudo

Resolved! Connecting an Azure Databricks to Azure Gen 2 storage stuck on "Running Command..."

I recently had an Azure Databricks setup done behind a VPN. I'm trying to connect to my Azure Storage Account Gen 2. Using the following code, I haven't been able to connect and keep getting stuck on reading the file. What should I be checking? #i...

Latest Reply
jborn
New Contributor III
  • 1 kudo

I ended up opening a ticket with Microsoft support about this issue, and they walked us through debugging it. In the end, the route table was not attached to the subnet. Once attached, everything worked.
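
Once the route table is in place, a minimal connectivity check from a notebook might look like this (storage account, container, secret scope, and file path are all assumptions):

    storage_account = "mystorageaccount"  # hypothetical
    spark.conf.set(
        f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
        dbutils.secrets.get(scope="my-scope", key="storage-key"),
    )
    df = spark.read.text(
        f"abfss://mycontainer@{storage_account}.dfs.core.windows.net/test.txt"
    )
    df.show()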

6 More Replies
VJ3
by New Contributor III
  • 3853 Views
  • 3 replies
  • 2 kudos

Best Practice to use/implement SQL Persona using Azure Databricks

Hello, I am looking for details of the security controls for using/implementing the SQL persona on Azure Databricks.

Latest Reply
Debayan
Esteemed Contributor III
  • 2 kudos

Hi, there are several documents covering this; let me know if the links below help. https://learn.microsoft.com/en-us/answers/questions/1039176/whitelist-databricks-to-read-and-write-into-azure https://www.databricks.com/blog/2020/03/2...
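
As one concrete example of a SQL-persona control, table access for an analyst group can be granted explicitly; a minimal sketch, assuming Unity Catalog and hypothetical catalog, schema, and group names:

    # Grant an analyst group read-only access for use from the SQL persona.
    spark.sql("GRANT USE CATALOG ON CATALOG main TO `analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA main.reporting TO `analysts`")
    spark.sql("GRANT SELECT ON SCHEMA main.reporting TO `analysts`")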

2 More Replies
Twilight
by New Contributor III
  • 4427 Views
  • 5 replies
  • 3 kudos

Resolved! Bug - Databricks requires extra escapes in repl string in regexp_replace (compared to Spark)

In Spark (but not Databricks), these work:

    regexp_replace('1234567890abc', '^(?<one>\\w)(?<two>\\w)(?<three>\\w)', '$3$2$1')
    regexp_replace('1234567890abc', '^(?<one>\\w)(?<two>\\w)(?<three>\\w)', '${three}${two}${one}')

In Databricks, you have to use ...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Stephen Wilcoxon: No, it is not a bug. Databricks uses a different flavor of regular expression syntax than Apache Spark. In particular, Databricks uses Java's regular expression syntax, whereas Apache Spark uses Scala's regular expression syntax....
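
A quick way to compare behavior is to run the replacement through PySpark and inspect the result; the escaping of the group reference in the replacement string is exactly what differs between environments (a sketch with a throwaway DataFrame):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("1234567890abc",)], ["s"])

    # Swap the first three characters using numbered group references.
    # On some runtimes the '$' signs below need extra backslash escaping.
    df.select(F.regexp_replace("s", r"^(\w)(\w)(\w)", "$3$2$1").alias("out")).show()
    # expected on open-source Spark: 3214567890abc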

4 More Replies
ChristianRRL
by Contributor III
  • 1898 Views
  • 1 reply
  • 1 kudo

Resolved! DLT Bronze: Incremental File Updates

Hi there, I would like to clarify if there's a way for bronze data to be ingested from "the same" CSV file if the file has been modified (i.e. new file with new records overwriting the old file)? Currently in my setup my bronze table is a `streaming ...

Latest Reply
Lakshay
Esteemed Contributor
  • 1 kudo

You can use the option "cloudFiles.allowOverwrites" in DLT. This option allows you to read the same CSV file again, but use it cautiously, as it can lead to duplicate data being loaded.
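
A minimal sketch of a DLT table using that option (the landing path, format, and table name are assumptions):

    import dlt

    @dlt.table(name="bronze_raw")
    def bronze_raw():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("cloudFiles.allowOverwrites", "true")  # re-ingest files that were overwritten
            .option("header", "true")
            .load("/mnt/landing/bronze_source/")  # hypothetical landing path
        )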

otum
by New Contributor II
  • 2187 Views
  • 6 replies
  • 0 kudos

[Errno 2] No such file or directory

I am reading a JSON file at the location below, using this code:

    file_path = "/dbfs/mnt/platform-data/temp/ComplexJSON/sample.json"  # replace with the file path
    f = open(file_path, "r")
    print(f.read())

but it is failing with no such file...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, as Shan mentioned, could you please cat the file and see if it exists?
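
Two quick ways to verify the file from a notebook, using the path from the original post (note that open() needs the /dbfs FUSE prefix, while dbutils.fs uses the dbfs:/ scheme):

    import os

    file_path = "/dbfs/mnt/platform-data/temp/ComplexJSON/sample.json"
    print(os.path.exists(file_path))  # True only if the FUSE path resolves on the driver

    # List the parent directory through dbutils (dbfs:/ scheme, no /dbfs prefix):
    display(dbutils.fs.ls("dbfs:/mnt/platform-data/temp/ComplexJSON/"))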

5 More Replies
ac0
by New Contributor III
  • 3981 Views
  • 4 replies
  • 0 kudos

Resolved! Setting environment variables to use in a SQL Delta Live Table Pipeline

I'm trying to use the Global Init Scripts feature in Databricks to set an environment variable for use in a Delta Live Tables pipeline. I want to be able to reference a value passed in as a path rather than hard-coding it. Here is the code for my pipeline:

    CREATE ST...

Latest Reply
ac0
New Contributor III
  • 0 kudos

I was able to accomplish this by creating a Cluster Policy that put in place the scripts, config settings, and environment variables I needed.
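
A minimal sketch of the cluster-policy fragment that pins such an environment variable (the variable name and value are hypothetical):

    {
      "spark_env_vars.DATA_BASE_PATH": {
        "type": "fixed",
        "value": "abfss://raw@myaccount.dfs.core.windows.net/landing"
      }
    }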

3 More Replies
ChrisS
by New Contributor III
  • 4034 Views
  • 7 replies
  • 8 kudos

How to get data scraped from the web into your data storage

I am learning Databricks for the first time, following a book copyrighted in 2020, so I imagine it might be a little outdated at this point. What I am trying to do is move data from an online source (in this specific case using a shell script, but ...

Latest Reply
CharlesReily
New Contributor III
  • 8 kudos

In Databricks, you can install external libraries by going to the Clusters tab, selecting your cluster, and then adding the Maven coordinates for Deequ. In your notebook or script, y...
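
For the original question of moving data from an online source into storage, a minimal sketch that downloads on the driver and copies into DBFS (the URL and paths are hypothetical):

    import requests

    url = "https://example.com/export/data.csv"  # hypothetical source
    local_path = "/tmp/data.csv"

    with open(local_path, "wb") as f:
        f.write(requests.get(url, timeout=60).content)

    # Copy from the driver's local disk into DBFS-backed storage.
    dbutils.fs.cp(f"file:{local_path}", "dbfs:/mnt/raw/data.csv")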

6 More Replies
aockenden
by New Contributor III
  • 1697 Views
  • 3 replies
  • 0 kudos

Switching SAS Tokens Mid-Script With Spark Dataframes

Hey all, my team has settled on using directory-scoped SAS tokens to provision access to data in our Azure Gen2 Datalakes. However, we have encountered an issue when switching from a first SAS token (which is used to read a first parquet table in the...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @aockenden, The data in the Data Lake is not actually retrieved into cluster memory by the Spark dataframes until an action (like .show()) is executed. At this point, the fs.azure.sas.fixed.token Spark configuration setting has been switched to a ...
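
One way to avoid that lazy-evaluation trap is to force materialization of the first DataFrame before swapping tokens; a sketch assuming hypothetical account, paths, and tokens:

    account = "mystorageaccount"  # hypothetical
    conf_key = f"fs.azure.sas.fixed.token.{account}.dfs.core.windows.net"
    sas_token_a = "<directory-scoped SAS for table_one>"  # placeholder
    sas_token_b = "<directory-scoped SAS for table_two>"  # placeholder

    spark.conf.set(conf_key, sas_token_a)
    df1 = spark.read.parquet(f"abfss://data@{account}.dfs.core.windows.net/table_one")
    df1 = df1.cache()
    df1.count()  # force materialization while the first token is still in effect

    spark.conf.set(conf_key, sas_token_b)
    df2 = spark.read.parquet(f"abfss://data@{account}.dfs.core.windows.net/table_two")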

2 More Replies
JKR
by Contributor
  • 2095 Views
  • 1 reply
  • 0 kudos

Databricks sql variables and if/else workflow

I have 2 tasks in a Databricks job workflow. The first task is of type SQL, and its SQL task is a query. In that query I've declared 2 variables and SET the values by running the query, e.g.:

    DECLARE VARIABLE max_timestamp TIMESTAMP DEFAULT '1970-01-01';
    SET VARIABLE max_...

Data Engineering
databricks-sql
Workflows
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @JKR, You can indeed pass variables between tasks in a Databricks job workflow. This is done using the taskValues subutility in Databricks Utilities. This utility allows tasks to output values that can be referenced in subsequent tasks.   However,...
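
A minimal sketch of taskValues in practice (the task key and value are hypothetical):

    # In the upstream task's notebook:
    dbutils.jobs.taskValues.set(key="max_timestamp", value="2024-01-01 00:00:00")

    # In a downstream task, referenced by the upstream task's key:
    max_ts = dbutils.jobs.taskValues.get(
        taskKey="set_variables",  # hypothetical upstream task name
        key="max_timestamp",
        default="1970-01-01 00:00:00",
    )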

BobEng
by New Contributor
  • 1683 Views
  • 2 replies
  • 0 kudos

Delta Live Tables are dropped when pipeline is deleted

I created a simplistic DLT pipeline that creates one table. When I delete the pipeline, the table is dropped as well. That's not really desired behavior. As far as I remember, there was a strong distinction between data (stored in tables) and processing (spa...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @BobEng, here are a few things that might help:
  • Pipeline settings: Delta Live Tables provides a user interface for configuring and editing pipeline settings. You can configure most settings with either the UI or a JSON specification.
  • Table mana...

1 More Replies
rt-slowth
by Contributor
  • 2212 Views
  • 4 replies
  • 0 kudos

User: anonymous is not authorized to perform: sqs:receivemessage on resource

    from pyspark.sql import functions as F
    from pyspark.sql import types as T
    from pyspark.sql import DataFrame, Column
    from pyspark.sql.types import Row
    import dlt

    S3_PATH = 's3://datalake-lab/xxxx/'
    S3_SCHEMA = 's3://datalake-lab/xxxx/schemas/'

    @dl...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

3 More Replies
pyter
by New Contributor III
  • 5602 Views
  • 6 replies
  • 2 kudos

Resolved! [13.3] Vacuum on table fails if shallow clone without write access exists

Hello everyone, we use Unity Catalog, separating our dev, test and prod data into individual catalogs. We run weekly vacuums on our prod catalog using a service principal that only has (read+write) access to this production catalog, but no access to ou...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

5 More Replies
Databricks_Work
by New Contributor II
  • 1052 Views
  • 4 replies
  • 1 kudo
Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudo

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

3 More Replies
