Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

TinasheChinyati
by New Contributor
  • 2450 Views
  • 2 replies
  • 0 kudos

Is Databricks capable of housing OLTP and OLAP?

Hi data experts. I currently have an OLTP (Azure SQL DB) that keeps data only for the past 14 days. We use partition switching to achieve that and have an ETL (Azure Data Factory) process that feeds the data warehouse (Azure Synapse Analytics). My requ...

Latest Reply
ChrisCkx
New Contributor II
  • 0 kudos

Hi @Kaniz, I have looked at this topic extensively and have even tried to implement it. I am a champion of Databricks at my organization, but I do not think that it currently enables the OLTP scenarios. The closest I have gotten to it is by using the St...

1 More Replies
dbal
by New Contributor III
  • 816 Views
  • 2 replies
  • 0 kudos

withColumnRenamed does not work with databricks-connect 14.3.0

I am not able to run our unit test suite due to a possible bug in the databricks-connect library. The problem is with the DataFrame transformation withColumnRenamed. When I run it in a Databricks cluster (Databricks Runtime 14.3 LTS), the column is ren...

Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@dbal - can you please try withColumnsRenamed() instead? Reference: https://docs.databricks.com/en/release-notes/dbconnect/index.html#databricks-connect-1430-python
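
A minimal sketch of the suggested replacement, assuming a databricks-connect or notebook session is already available as spark (the DataFrame contents and column names here are illustrative):

# withColumnsRenamed (plural) takes a dict of {old: new} and renames
# several columns in one pass; per the release notes linked above it is
# the supported call in databricks-connect 14.3.0.
df = spark.createDataFrame([(1, "a")], ["id", "val"])
renamed = df.withColumnsRenamed({"id": "user_id", "val": "value"})
renamed.printSchema()  # columns are now user_id, value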

1 More Replies
Sushmg
by New Contributor
  • 1903 Views
  • 1 reply
  • 0 kudos

Call REST API

Hi, there is a requirement to create a pipeline that calls an API and stores that data in a data warehouse. Can you suggest the best way to do this?

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Sushmg, Please refer to the Databricks documentation and resources for more detailed instructions and examples.

StephanKnox
by New Contributor II
  • 370 Views
  • 1 reply
  • 1 kudos

Parametrized SQL - Pass column names as a parameter?

Hi all, is there a way to pass a column name (not a value) in a parametrized Spark SQL query? I am trying to do it like so; however, it does not work, as I think the column name gets expanded like 'value', i.e. surrounded by single quotes: def count_nulls(df:D...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @StephanKnox, you can use string interpolation (f-strings) to dynamically insert the column name into your query.
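
A minimal sketch of that approach, loosely modelled on the count_nulls helper from the truncated question (the function body is an assumption, not the poster's exact code; spark is the notebook session):

from pyspark.sql import DataFrame

def count_nulls(df: DataFrame, col_name: str):
    # Only values can be bound as SQL parameters, so the identifier is
    # spliced in with an f-string; checking it against df.columns keeps
    # arbitrary SQL from being injected.
    assert col_name in df.columns, f"unknown column: {col_name}"
    df.createOrReplaceTempView("src")
    return spark.sql(
        f"SELECT COUNT(*) AS null_count FROM src WHERE `{col_name}` IS NULL"
    )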

Dhruv-22
by New Contributor III
  • 594 Views
  • 2 replies
  • 0 kudos

Understanding least common type in Databricks

I was reading the data type rules and found out about the least common type. I have a doubt: what is the least common type of STRING and INT? The referred link gives the following example, saying the least common type is BIGINT. -- The least common type between...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Dhruv-22, the concept of the least common type can indeed be a bit tricky, especially when dealing with different data types like STRING and INT. Let's dive into this and clarify the behaviour in Apache Spark™ and Databricks. Coalesce Functi...
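
The doc example the poster cites can be checked directly in a notebook (a one-liner; the expected result is taken from the linked data type rules, not verified here):

# INT and STRING resolve to their least common type; per the linked
# rules that is BIGINT, since the STRING is read as a numeric literal.
spark.sql("SELECT typeof(coalesce(1, '1')) AS least_common_type").show()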

1 More Replies
SparkMaster
by New Contributor III
  • 5325 Views
  • 10 replies
  • 1 kudos

Why can't I delete experiments without deleting the notebook? Or better, organize experiments into folders?

My Databricks Experiments page is cluttered with a whole lot of experiments. Many of them are notebooks which are showing there for some reason (even though they didn't have an MLflow run associated with them). I would like to delete the experiments, but it...

Latest Reply
mhiltner
New Contributor III
  • 1 kudos

Hey @Debayan @SparkMaster, a bit late here, but I believe this is being caused by a click on the right-side Experiments icon. This may look like a meaningless click, but it actually triggers a run.

9 More Replies
210227
by New Contributor III
  • 867 Views
  • 1 reply
  • 0 kudos

Resolved! External table from external location

Hi, I'm creating an external table from an existing external location and am a bit puzzled as to what permissions I need for it, or what is the correct way of defining the S3 path with wildcards. This: create external table if not exists test_catalogue_dev.b...

Latest Reply
210227
New Contributor III
  • 0 kudos

Just for reference: the wildcard is not needed in this case; the error message is just misleading. Here 's3://test-data/full/2023/01/' instead of 's3://test-data/full/2023/01/*/' was the correct path.
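
For illustration, a corrected statement might look like the following (the catalog/schema/table names stand in for the truncated original, and USING DELTA is an assumption about the file format):

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS test_catalogue_dev.bronze.example_table
    USING DELTA
    LOCATION 's3://test-data/full/2023/01/'  -- the prefix itself, no trailing '*/'
""")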

vvt1976
by New Contributor
  • 414 Views
  • 1 reply
  • 0 kudos

Create table using a location

Hi, Databricks newbie here. I have copied Delta files from my Synapse workspace into DBFS. To add them as a table, I executed: create table audit_payload using delta location '/dbfs/FileStore/data/general/audit_payload'. The command executed properly. Ho...

Labels: Data Engineering, data engineering
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Can you read the Delta Lake files using spark.read.format("delta").load("path/to/delta/table")? If not, it is not a valid Delta Lake table, which is my guess, as creating a table from Delta Lake is nothing more than a semantic wrapper around the actual...
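
A quick way to run that check against the path from the question; note that Spark APIs and SQL LOCATION clauses expect a dbfs:/ URI, whereas /dbfs/... is the driver's local mount, which is one plausible (unconfirmed) cause of the original problem:

# If this load fails, the directory holds no valid Delta table (no
# _delta_log); if it succeeds, recreate the table pointing at the
# dbfs:/ URI rather than the /dbfs/ local path.
df = spark.read.format("delta").load("dbfs:/FileStore/data/general/audit_payload")
df.printSchema()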

PankajMendi
by New Contributor
  • 261 Views
  • 1 reply
  • 0 kudos

Error accessing Azure SQL from Azure Databricks using JDBC authentication=ActiveDirectoryInteractive

Getting the below error while accessing Azure SQL using JDBC from an Azure Databricks notebook: com.microsoft.sqlserver.jdbc.SQLServerException: Failed to authenticate the user p***** in Active Directory (Authentication=ActiveDirectoryInteractive). Unable to...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

It seems you are trying to do MFA authentication using JDBC. The driver used might not support that. It could also be an OS issue (if you are not using Windows, for example) or a browser issue (the browser will have to open a window/tab). Can you try to authent...
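
As one non-interactive alternative to try (an assumption, not a confirmed fix for this error), the Microsoft JDBC driver also accepts authentication=ActiveDirectoryPassword, which skips the browser round-trip; the server, table, user, and secret names below are placeholders:

jdbc_url = ("jdbc:sqlserver://myserver.database.windows.net:1433;"
            "database=mydb;authentication=ActiveDirectoryPassword")
df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.my_table")
      .option("user", "someone@mytenant.com")
      # Pull the password from a secret scope rather than hardcoding it.
      .option("password", dbutils.secrets.get("my-scope", "sql-password"))
      .load())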

anuintuceo
by New Contributor
  • 214 Views
  • 1 reply
  • 0 kudos

Unzip a password-protected file using a Synapse notebook

I have a zipped file containing 3 CSV files. It is password protected. When I tried extracting it manually, it would only extract with 7-Zip. I moved my zipped file to ADLS automatically and want to extract it with the password. How to unzip the file and ...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

It is most probably possible. If you use Python, the zipfile library can do it, something like this:

import zipfile

# zip_file_path, extract_to and password are assumed to be defined;
# note the password must be passed as bytes, not str.
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(path=extract_to, pwd=bytes(password, 'utf-8'))

In Scala there is f....

Sushmg
by New Contributor
  • 974 Views
  • 1 reply
  • 0 kudos

REST API call

There is a requirement to create a pipeline that calls a REST API, and we have to store the data in a data warehouse. Which is the best way to do this operation?

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

There are several ways to do this. You could use Python (or Scala, or ...) to do the call, then transform it and write it to the DWH. Or you could do the call, write the raw data, and process it later on. Or you could use an ETL/ELT tool that can do the r...
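
A minimal sketch of the first option (the endpoint and table names are invented, and the response is assumed to be a flat JSON array):

import requests

# Call the API and fail fast on HTTP errors.
resp = requests.get("https://api.example.com/v1/records", timeout=30)
resp.raise_for_status()

# Turn the parsed JSON into a DataFrame and land it in a warehouse-facing
# table; schema inference is good enough for a first pass.
df = spark.createDataFrame(resp.json())
df.write.mode("append").saveAsTable("dwh.raw_api_records")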

amitkmaurya
by New Contributor III
  • 848 Views
  • 2 replies
  • 2 kudos

Resolved! How to increase executor memory in Databricks jobs

Maybe it is because I am new to Databricks that I have this confusion. Suppose I have worker memory of 64 GB in a Databricks job with max 12 nodes... and my job is failing due to Executor Lost with exit code 137 (OOM, from what I found on the internet). So, to fix this I need to increase execut...

Latest Reply
amitkmaurya
New Contributor III
  • 2 kudos

Hi @raphaelblg, I have solved this issue. Yes, in my case data skewness was the issue that was causing this executor OOM, so adding a repartition just before writing resolved the skewness. I didn't change any worker or driver memory. Thanks for your h...
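
A sketch of what that fix can look like (the key, partition count, and table name are illustrative; the point is to rebalance a skewed key space just before the write):

# A few oversized partitions can push single executors past their memory
# limit (exit code 137); repartitioning on a high-cardinality key spreads
# the rows evenly across executors before the write.
(df.repartition(200, "customer_id")
   .write
   .mode("overwrite")
   .saveAsTable("analytics.events"))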

1 More Replies
amitkmaurya
by New Contributor III
  • 1018 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks job keeps failing due to executor lost

Getting the following error while saving a DataFrame partitioned by two columns: Job aborted due to stage failure: Task 5774 in stage 33.0 failed 4 times, most recent failure: Lost task 5774.3 in stage 33.0 (TID 7736) (13.2.96.110 executor 7): ExecutorLos...

Labels: Data Engineering, databricks jobs, spark
Latest Reply
amitkmaurya
New Contributor III
  • 2 kudos

Hi, I have solved the problem with the same workers and driver. In my case data skewness was the problem. Adding a repartition to the DataFrame just before writing evenly distributed the data across the nodes, and this stage failure was resolved. Thanks @Kani...

1 More Replies
Mirza1
by New Contributor
  • 307 Views
  • 1 reply
  • 0 kudos

Error while Running a Table

Hi all, I am trying to run a table schema and am facing the below error. Error - AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. com.databricks.backend.common.rpc.SparkDriverExceptions$SQLExecutionException: org.apache...

Latest Reply
Ayushi_Suthar
Honored Contributor
  • 0 kudos

Hi @Mirza1, greetings! Can you please confirm if it is an ADLS Gen2 table? If yes, can you please try running the table schema after setting the Spark configs for Gen2 at the cluster level? You can refer to this document to set the Spark co...
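
For reference, cluster-level Spark configs for ADLS Gen2 with a service principal typically look like the following (the storage account, tenant, and secret names are placeholders; the exact variant is in the Databricks ADLS Gen2 docs the reply points to):

storage = "mystorageacct"  # placeholder storage account name
spark.conf.set(f"fs.azure.account.auth.type.{storage}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage}.dfs.core.windows.net",
               "<application-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage}.dfs.core.windows.net",
               dbutils.secrets.get("my-scope", "sp-secret"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")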

Silabs
by New Contributor
  • 1051 Views
  • 3 replies
  • 3 kudos

Resolved! Set up connection to on-prem SQL Server

I've just set up our Databricks environment, hosted in AWS. We have an on-prem SQL Server and would like to connect. How can I do that?

Latest Reply
Yeshwanth
Honored Contributor
  • 3 kudos

@Silabs good day! To connect your Databricks environment (hosted on AWS) to your on-premises SQL Server, follow these steps: 1. Network Setup: Establish a connection between your SQL Server and the Databricks virtual private cloud (VPC) using VPN or A...
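
Once the networking above is in place, a basic JDBC read from the on-prem server might look like this (the hostname, database, table, and secret names are placeholders):

jdbc_url = "jdbc:sqlserver://onprem-sql.internal:1433;databaseName=mydb"
df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.orders")
      .option("user", dbutils.secrets.get("my-scope", "sql-user"))
      .option("password", dbutils.secrets.get("my-scope", "sql-password"))
      .load())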

2 More Replies
